By masiyi, 30 June, 2024

I ran into a few problems while batch-reproducing this work (the instance on port 7070 of server 91 is used as the example here):

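The expression file that is finally imported has one problem: it contains several SRR runs but only one corresponding BioSample, or, more generally, several SRRs map to the same BioSample.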

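Take transcript_fpkm_matrix_09.csv as an example (12 SRRs but only 1 BioSample). Before loading it with the Chado Expression Data Loader, we normally edit the file and replace each SRR with its corresponding SAMN accession. However, when several SRRs are replaced with the SAMN of the same BioSample, the import fails with this error: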

Running 'Chado Expression Data Loader' importer
NOTE: Loading of file is performed using a database transaction. 
If it fails or is terminated prematurely then all insertions and 
updates are rolled back and will not be found in the database

Step  1 of 23: Parsing input file...                                       
  Found 48829 features and 1 samples.
Step  2 of 23: Find Existing Features...                                   
Step  3 of 23: Find Existing Elements...                                   
Step  4 of 23: Insert New Elements...                                      
Step  5 of 23: Get Newly Inserted Element IDs...                           
Step  6 of 23: Find NCBI BioSamples...                                     
  Note: This step may take a while. It queries NCBI if any samples use GEO 
        or SRA IDs.
Step  7 of 23: Find Local BioSamples...                                    
Step  8 of 23: Insert New BioSamples...                                    
Step  9 of 23: Get Newly Inserted BioSample IDs...                         
Step 10 of 23: Find Existing Assays for Each BioSample...                  
Step 11 of 23: Insert New Assays for BioSamples...                         
Step 12 of 23: Get Newly Inserted Assays IDs...                            
Step 13 of 23: Find Assay Biomaterial Links...                             
Step 14 of 23: Insert New Assay Biomaterial Links...                       
Step 15 of 23: Get Newly Inserted Assay Biomaterial Links...               
Step 16 of 23: Clear Acquisition Data...                                   
Step 17 of 23: Insert Acquisition Data...                                  
Step 18 of 23: Get Newly Inserted Acquisition Data...                      
Step 19 of 23: Insert Quantification Records...                            
Step 20 of 23: Get Newly Inserted Quantification Records...                
Step 22 of 23: Clear Quantification Data for this Assay...                 
Step 22 of 23: Insert Quantification Props...                              
Step 23 of 23: Insert Expression Data...                                   
SQLSTATE[23505]: Unique violation: 7 ERROR:  duplicate key value violates unique constraint "elementresult_c1"
DETAIL:  Key (element_id, quantification_id)=(48830, 109) already exists.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] SQLSTATE[23505]: Unique violation: 7 ERROR:  duplicate key value violates unique constraint "elementresult_c1"DETAIL:  Key (element_id, quantification_id)=(48830, 109) already exists.

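The cause is that the Chado database cannot store the same quantification_id more than once for a given element (the elementresult_c1 constraint on (element_id, quantification_id) shown above). In practice, this means the first row of the CSV file must not contain duplicate column headers; otherwise the import fails.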
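Such collisions can be spotted before uploading. The following is a minimal Python sketch, assuming the header row is a list whose first entry is the feature column and whose remaining entries are SRR run IDs, and assuming an SRR to SAMN mapping is available as a dict. The function name `find_header_collisions` and the example accessions are hypothetical and only for illustration.

```python
from collections import defaultdict


def find_header_collisions(header, srr_to_samn):
    """Return {new_name: [SRR, ...]} for every renamed header that would repeat.

    header: the CSV's first row; header[0] is assumed to be the feature column.
    srr_to_samn: hypothetical SRR -> SAMN mapping; unmapped SRRs keep their own ID.
    """
    groups = defaultdict(list)
    for srr in header[1:]:
        # Group the original column names by the name they would be renamed to.
        groups[srr_to_samn.get(srr, srr)].append(srr)
    # Any new name shared by two or more columns yields duplicate headers.
    return {name: runs for name, runs in groups.items() if len(runs) > 1}


if __name__ == "__main__":
    # Hypothetical accessions: three runs from one BioSample, one from another.
    header = ["feature", "SRR0000001", "SRR0000002", "SRR0000003", "SRR0000004"]
    mapping = {
        "SRR0000001": "SAMN00000001",
        "SRR0000002": "SAMN00000001",
        "SRR0000003": "SAMN00000001",
        "SRR0000004": "SAMN00000002",
    }
    print(find_header_collisions(header, mapping))
    # {'SAMN00000001': ['SRR0000001', 'SRR0000002', 'SRR0000003']}
```

An empty result means the SAMN-renamed headers are unique. A non-empty result means the renamed file would likely hit the same elementresult_c1 error, so keeping the SRR headers is the safer choice.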

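After several attempts (changing the name match type from Unique Name to Name, and enabling Create BioSample Records), the same error appeared every time. So, after reading the official documentation, I decided to keep the original format, i.e. use the SRR IDs as headers and let the loader create new BioSamples from them.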

Leave all other settings unchanged, set Name Match Type to Name, tick the Create BioSample Records option under Data Delimiter, and run the import.
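Before running the import with these settings, the matrix can be checked locally. The sketch below is an assumption-based helper, not part of Tripal: it assumes the first row is the header, the first column holds feature IDs, and every other column is one sample. The name `preflight` is hypothetical. It reports the number of feature rows, sample columns, distinct sample names, and any duplicated header names.

```python
import csv
import io
from collections import Counter


def preflight(stream, delimiter=","):
    """Summarise an expression matrix before loading it.

    Returns (n_features, n_sample_columns, n_distinct_samples, duplicated_headers).
    Assumes: first row = header, first column = feature IDs, rest = samples.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    # Sample header names that occur more than once.
    duplicated = sorted(name for name, n in Counter(samples).items() if n > 1)
    # Count the remaining non-empty rows as features.
    n_features = sum(1 for row in reader if row)
    return n_features, len(samples), len(set(samples)), duplicated


if __name__ == "__main__":
    # Small in-memory demo; for a real file use open(path, newline="").
    demo = io.StringIO("feature,SRR1,SRR2\ngeneA,1.0,2.0\ngeneB,0.5,0.0\n")
    print(preflight(demo))  # (2, 2, 2, [])
```

If the distinct-sample count is lower than the number of sample columns, the header contains duplicates and the file should be fixed before importing.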

Running 'Chado Expression Data Loader' importer
NOTE: Loading of file is performed using a database transaction. 
If it fails or is terminated prematurely then all insertions and 
updates are rolled back and will not be found in the database

Step  1 of 23: Parsing input file...                                       
  Found 48829 features and 12 samples.
Step  2 of 23: Find Existing Features...                                   
Step  3 of 23: Find Existing Elements...                                   
Step  4 of 23: Insert New Elements...                                      
Step  5 of 23: Get Newly Inserted Element IDs...                           
Step  6 of 23: Find NCBI BioSamples...                                     
  Note: This step may take a while. It queries NCBI if any samples use GEO 
        or SRA IDs.
Step  7 of 23: Find Local BioSamples...                                    
Step  8 of 23: Insert New BioSamples...                                    
Step  9 of 23: Get Newly Inserted BioSample IDs...                         
Step 10 of 23: Find Existing Assays for Each BioSample...                  
Step 11 of 23: Insert New Assays for BioSamples...                         
Step 12 of 23: Get Newly Inserted Assays IDs...                            
Step 13 of 23: Find Assay Biomaterial Links...                             
Step 14 of 23: Insert New Assay Biomaterial Links...                       
Step 15 of 23: Get Newly Inserted Assay Biomaterial Links...               
Step 16 of 23: Clear Acquisition Data...                                   
Step 17 of 23: Insert Acquisition Data...                                  
Step 18 of 23: Get Newly Inserted Acquisition Data...                      
Step 19 of 23: Insert Quantification Records...                            
Step 20 of 23: Get Newly Inserted Quantification Records...                
Step 22 of 23: Clear Quantification Data for this Assay...                 
Step 22 of 23: Insert Quantification Props...                              
Step 23 of 23: Insert Expression Data...                                   
Percent complete: 100.00 %. Memory: 91,391,016 bytes.
Done.

Remapping Chado Controlled vocabularies to Tripal Terms...
Done.

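The import succeeded. Testing confirmed that twelve new BioSamples were created, named with their SRR IDs. Unlike the SAMN-named BioSamples, however, they have no cross-reference to NCBI. Testing the corresponding heatmap also showed that it displays the SRR IDs and their links work, so there is no adverse effect.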