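We ran into some problems while reproducing the work in batches (the instance on port 7070 of server 91 is used as the example here):
The expression matrix file that is finally imported can have a problem: it contains several SRR runs, but they all correspond to a single BioSample, or several of them correspond to the same BioSample.
Take transcript_fpkm_matrix_09.csv as an example (12 SRR runs but only 1 BioSample). Before running the Chado Expression Data Loader we normally edit the file and replace each SRR with its corresponding SAMN accession, but if several SRR runs are replaced with the SAMN of the same BioSample, the import fails: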
Running 'Chado Expression Data Loader' importer
NOTE: Loading of file is performed using a database transaction.
If it fails or is terminated prematurely then all insertions and
updates are rolled back and will not be found in the database
Step 1 of 23: Parsing input file...
Found 48829 features and 1 samples.
Step 2 of 23: Find Existing Features...
Step 3 of 23: Find Existing Elements...
Step 4 of 23: Insert New Elements...
Step 5 of 23: Get Newly Inserted Element IDs...
Step 6 of 23: Find NCBI BioSamples...
Note: This step may take a while. It queries NCBI if any samples use GEO
or SRA IDs.
Step 7 of 23: Find Local BioSamples...
Step 8 of 23: Insert New BioSamples...
Step 9 of 23: Get Newly Inserted BioSample IDs...
Step 10 of 23: Find Existing Assays for Each BioSample...
Step 11 of 23: Insert New Assays for BioSamples...
Step 12 of 23: Get Newly Inserted Assays IDs...
Step 13 of 23: Find Assay Biomaterial Links...
Step 14 of 23: Insert New Assay Biomaterial Links...
Step 15 of 23: Get Newly Inserted Assay Biomaterial Links...
Step 16 of 23: Clear Acquisition Data...
Step 17 of 23: Insert Acquisition Data...
Step 18 of 23: Get Newly Inserted Acquisition Data...
Step 19 of 23: Insert Quantification Records...
Step 20 of 23: Get Newly Inserted Quantification Records...
Step 22 of 23: Clear Quantification Data for this Assay...
Step 22 of 23: Insert Quantification Props...
Step 23 of 23: Insert Expression Data...
SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "elementresult_c1"
DETAIL: Key (element_id, quantification_id)=(48830, 109) already exists.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "elementresult_c1"DETAIL: Key (element_id, quantification_id)=(48830, 109) already exists.
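The cause is that the Chado database does not allow the same quantification_id to be repeated for one element: the unique constraint "elementresult_c1" covers (element_id, quantification_id). When the first row of the CSV file contains duplicate headers, the loader treats them as one sample (note "Found 48829 features and 1 samples" above), the duplicate columns end up under the same quantification_id, and the import fails. In other words, the header row of the CSV file must not contain duplicates.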
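The duplicate-header condition can be checked before starting the import job. The following is only a minimal sketch, not part of Tripal: the function name `duplicated_headers`, the SRR-to-SAMN mapping, and the accessions in the example are all made up for illustration, and it assumes the first column of the matrix holds the feature names while every other column is one sample.

```python
import csv
import io
from collections import Counter


def duplicated_headers(csv_text, srr_to_samn, delimiter=","):
    """Return the sample names that would occur more than once in the
    header row after renaming SRR columns to their SAMN accession."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    # Assumption: column 0 is the feature name, the rest are samples.
    renamed = [srr_to_samn.get(col, col) for col in header[1:]]
    counts = Counter(renamed)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical data: two runs belong to the same BioSample.
sample = "gene_id,SRR0000001,SRR0000002,SRR0000003\ng1,1.0,2.0,3.0\n"
mapping = {
    "SRR0000001": "SAMN00000001",
    "SRR0000002": "SAMN00000001",
    "SRR0000003": "SAMN00000002",
}

print(duplicated_headers(sample, mapping))  # ['SAMN00000001']
```

If the returned list is not empty, renaming the columns would produce the duplicate headers that trigger the unique violation shown above, so the SRR headers should be kept instead.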
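Several attempts with the renamed file (switching the match type from Unique name to Name, and enabling Create BioSample Records) all produced the error above. After reading the official documentation, we decided to keep the original format, that is, to keep the SRR accessions as headers and let the loader create new BioSamples from them.
Leave all other settings unchanged, set Name Match Type to Name, tick the Create BioSample Records option below Data Delimiter, and run the import: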
Running 'Chado Expression Data Loader' importer
NOTE: Loading of file is performed using a database transaction.
If it fails or is terminated prematurely then all insertions and
updates are rolled back and will not be found in the database
Step 1 of 23: Parsing input file...
Found 48829 features and 12 samples.
Step 2 of 23: Find Existing Features...
Step 3 of 23: Find Existing Elements...
Step 4 of 23: Insert New Elements...
Step 5 of 23: Get Newly Inserted Element IDs...
Step 6 of 23: Find NCBI BioSamples...
Note: This step may take a while. It queries NCBI if any samples use GEO
or SRA IDs.
Step 7 of 23: Find Local BioSamples...
Step 8 of 23: Insert New BioSamples...
Step 9 of 23: Get Newly Inserted BioSample IDs...
Step 10 of 23: Find Existing Assays for Each BioSample...
Step 11 of 23: Insert New Assays for BioSamples...
Step 12 of 23: Get Newly Inserted Assays IDs...
Step 13 of 23: Find Assay Biomaterial Links...
Step 14 of 23: Insert New Assay Biomaterial Links...
Step 15 of 23: Get Newly Inserted Assay Biomaterial Links...
Step 16 of 23: Clear Acquisition Data...
Step 17 of 23: Insert Acquisition Data...
Step 18 of 23: Get Newly Inserted Acquisition Data...
Step 19 of 23: Insert Quantification Records...
Step 20 of 23: Get Newly Inserted Quantification Records...
Step 22 of 23: Clear Quantification Data for this Assay...
Step 22 of 23: Insert Quantification Props...
Step 23 of 23: Insert Expression Data...
Percent complete: 100.00 %. Memory: 91,391,016 bytes.
Done.
Remapping Chado Controlled vocabularies to Tripal Terms...
Done.
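The import succeeded. Testing confirmed that twelve new BioSamples were created, each named after its SRR accession. Unlike the BioSamples named with a SAMN accession, they do not carry a reference to NCBI. The corresponding heatmap also displays the SRR accessions and its link-through function still works, so there is no impact there.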