By masiyi, 18 August, 2024

The whole procedure follows https://tripal.readthedocs.io/en/latest/user_guide/example_genomics/func_annots/interpro.html

1. Download the module from GitHub (https://github.com/tripal/tripal_analysis_interpro/archive/refs/heads/7.x-3.x.zip) into the modules directory

After unzipping it, go to the site's admin interface and install the Tripal Interpro module.
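The same download-and-enable step can also be done from the command line. The following is only a sketch: the helper name `install_tripal_interpro` is made up for illustration, the default modules path assumes a Drupal 7 site rooted at /var/www/html, and the machine name `tripal_analysis_interpro` is taken from the repository name.

```shell
# Hypothetical helper: download, unpack, and enable the module.
# The default path below is an assumption (the usual Drupal 7 location);
# pass your own modules directory as the first argument if it differs.
install_tripal_interpro() (
  set -e
  modules_dir="${1:-/var/www/html/sites/all/modules}"
  zip_url="https://github.com/tripal/tripal_analysis_interpro/archive/refs/heads/7.x-3.x.zip"

  cd "$modules_dir"
  wget -O tripal_analysis_interpro.zip "$zip_url"
  unzip -q tripal_analysis_interpro.zip
  # Intended to match enabling the module on the Modules admin page.
  drush pm-enable -y tripal_analysis_interpro
)
```

The body runs in a subshell, so calling `install_tripal_interpro` does not change your current directory. Enabling the module in the web interface, as described above, works just as well.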

2. Preparing the data:

First, you need to know about a web tool, InterPro. Reference:

Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, Gough J, Haft DH, Letunić I, Marchler-Bauer A, Mi H, Natale DA, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A. [InterPro in 2022](https://doi.org/10.1093/nar/gkac993). Nucleic Acids Research, Nov 2022, (doi: 10.1093/nar/gkac993) 

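Its input is mainly protein sequences. Because it is a web tool, the size of the uploaded file is limited (I don't know the exact limit, but submitting the complete avocado pep file fails), so I selected the protein sequences of 93 TPS gene family members and submitted those. (There is a text box for input, so you can also paste the sequences in directly; in fact, choosing a file seems simply to read its contents into that text box.)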
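One way to pull a subset such as the TPS family out of a full pep file is sketched below. It assumes you already have a plain-text list of sequence IDs, one per line, and that each FASTA header starts with the ID right after `>`. The function name and the file names in the comment are hypothetical.

```shell
# Hypothetical helper: print only the FASTA records whose ID is in a list.
#   $1 = file with one sequence ID per line
#   $2 = FASTA file (header format assumed: ">ID optional description")
extract_fasta_subset() {
  awk '
    NR == FNR { if ($1 != "") want[$1] = 1; next }     # first file: remember wanted IDs
    /^>/      { id = substr($1, 2); keep = (id in want) }  # header: decide for this record
    keep                                                # print lines of kept records
  ' "$1" "$2"
}

# Example call (file names are placeholders):
# extract_fasta_subset tps_ids.txt avocado.pep.fa > tps_subset.fa
```

The resulting file can be uploaded or pasted into the InterPro text box.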

Then click Search and wait for the results. There is one result per sequence, and they can all be downloaded together; download them in XML format.
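Before loading, it can be worth checking that the XML contains a result for every submitted sequence. The sketch below assumes each sequence's result appears as a `<protein>` element, as in InterProScan 5 XML output; if your file uses a different layout, adjust the tag. Both function names are made up for illustration.

```shell
# Hypothetical helpers for a quick sanity check.

# Number of sequences in a FASTA file (one header line per record).
count_fasta_records() { grep -c '^>' "$1"; }

# Number of <protein> elements in the XML (assumed InterProScan 5 layout).
# The character class keeps <protein-matches> from being counted.
count_xml_proteins() { grep -o '<protein[ >]' "$1" | wc -l | tr -d ' '; }
```

If the two counts differ, some sequences probably failed or were left out of the download.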

3. Uploading the data

First, go to Content -> Tripal Content -> Add Tripal Content -> InterPro Results and fill in the required fields as shown here; optional fields can be left blank.

Name: InterPro Annotations of P. americana TPS Gene family v1.0
InterPro Program: InterProScan
InterPro Version: 4.8
Date Performed: Current Date
Data Source Name: P. americana v1.0 mRNA
Data Source Version: v1.0
Data Source URI: n/a
Description: Materials & Methods: C. sinensis mRNA sequences were mapped to IPR domains and GO terms using a local installation of InterProScan executed on a computational cluster. InterProScan date files used were MATCH_DATA_v32, DATA_v32.0 and PTHR_DATA v31.0.

Save it, and you get a page: http://10.202.40.91:7070/InterProresults/91048

 

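Next, upload the XML data obtained above. Under Tripal > Data Loaders > Chado InterProScan XML results loader, enter the absolute path of the XML file to load, then for Analysis select the InterPro analysis just created, and set Query Type at the bottom to mRNA.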
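Since the loader needs an absolute path, a small helper can print it and catch a missing or empty file before you submit the form. This is a sketch; the function name is hypothetical.

```shell
# Hypothetical helper: print the absolute path of an existing, non-empty file.
xml_abs_path() {
  # -s is true only if the file exists and has a size greater than zero.
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  # Resolve the directory in a subshell so the caller's working directory is untouched.
  (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$1")")
}
```

Run it on the server where the site lives, for example from the directory holding the XML file, and paste the printed path into the loader form.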

After clicking Import, run the queued job with drush:

[root@f845b400013b raw]# drush trp-run-jobs --username=admin --root=/var/www/html

2024-08-14 07:26:38
Tripal Job Launcher
Running as user 'admin'
-------------------
2024-08-14 07:26:38: There are 4 jobs queued.
2024-08-14 07:26:38: Job ID 623.
2024-08-14 07:26:38: Calling: tripal_tripal_cron_notification()
2024-08-14 07:26:48: Job ID 624.
2024-08-14 07:26:48: Calling: tripal_expire_collections()
2024-08-14 07:26:48: Job ID 625.
2024-08-14 07:26:48: Calling: tripal_expire_files()
2024-08-14 07:26:48: Job ID 626.
2024-08-14 07:26:48: Calling: tripal_run_importer(113)

Running 'Chado InterProScan XML results loader' importer
NOTE: Loading of file is performed using a database transaction. 
If it fails or is terminated prematurely then all insertions and 
updates are rolled back and will not be found in the database

Percent complete: 100.00 %. Memory: 38,822,224 bytes.
Done


Done.

Remapping Chado Controlled vocabularies to Tripal Terms...
Done.

Now the information page of each mRNA we analyzed shows the gene family identification results for that mRNA, along with links.