The whole procedure follows https://tripal.readthedocs.io/en/latest/user_guide/example_genomics/func_annots/interpro.html
1. Download the module from GitHub, https://github.com/tripal/tripal_analysis_interpro/archive/refs/heads/7.x-3.x.zip , into the modules directory.
After unzipping it, go to the site's admin interface and install the Tripal InterPro module.
2. Preparing the data:
First you need to know about a web tool, InterPro. Reference:
Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, Gough J, Haft DH, Letunić I, Marchler-Bauer A, Mi H, Natale DA, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A. [InterPro in 2022](https://doi.org/10.1093/nar/gkac993). Nucleic Acids Research, Nov 2022, (doi: 10.1093/nar/gkac993)
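Its input is mainly protein sequences. The web version limits the size of the uploaded file (I don't know the exact limit, but submitting the complete avocado pep file fails), so I selected the protein sequences of 93 TPS gene family members and uploaded only those. (There is also a text box, so you can paste the sequences in directly; in fact, choosing a file seems to simply read the file into that same text box.)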
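The filtering step above (picking a subset of proteins out of the full pep file) can be sketched in Python. This is only a minimal sketch, not the method actually used for these notes: the function names `read_fasta`, `select_records` and `format_fasta` are made up for illustration, and it assumes the sequence ID is the first whitespace-separated word of each FASTA header.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines: Iterable[str], wanted_ids: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (first word of the header) is in wanted_ids."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id in wanted:
            yield header, seq


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width` residues."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass an open file handle instead.
    demo = [">tps1 example", "MKVLLA", ">other", "MTT", ">tps2", "MAA"]
    print(format_fasta(select_records(demo, ["tps1", "tps2"])), end="")
```

With real data, the same functions would take the opened pep file and a list of the 93 TPS IDs; the resulting text can then be pasted into the InterPro text box or saved as a smaller file for upload.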
Then click Search and wait for the results. There is one result per sequence, and the results can be downloaded together; download them in XML format.
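Before loading the downloaded XML into Tripal, it can help to check that the file is well-formed and contains the expected number of proteins. The sketch below assumes the InterProScan 5 style layout, where each result is a `protein` element containing an `xref` element with an `id` attribute. That layout is an assumption here, so compare it against your own downloaded file. The function name `summarize_interpro_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}protein'."""
    return tag.rsplit("}", 1)[-1]


def summarize_interpro_xml(xml_text: str) -> Tuple[int, List[str]]:
    """Return (number of protein elements, list of xref ids).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    Element names 'protein' and 'xref' are assumed, not taken from these notes.
    """
    root = ET.fromstring(xml_text)
    protein_count = 0
    xref_ids: List[str] = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "protein":
            protein_count += 1
        elif name == "xref" and elem.get("id"):
            xref_ids.append(elem.get("id"))
    return protein_count, xref_ids


if __name__ == "__main__":
    # Synthetic example document, not real InterPro output.
    demo = (
        '<protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5">'
        '<protein><sequence>MKV</sequence><xref id="tps1"/><matches/></protein>'
        '</protein-matches>'
    )
    count, ids = summarize_interpro_xml(demo)
    print(count, ids)
```

If the count does not match the number of sequences submitted (93 in this case), the download is probably incomplete and is worth repeating before importing.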
3. Uploading the data
First, go to Content -> Tripal Content -> Add Tripal Content -> InterPro Results and fill in the required fields as shown below; optional fields can be left blank.
Name | InterPro Annotations of P. americana TPS Gene family v1.0 |
InterPro Program | InterProScan |
InterPro Version | 4.8 |
Date Performed | Current Date |
Data Source Name | P. americana v1.0 mRNA |
Data Source Version | v1.0 |
Data Source URI | n/a |
Description |
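Save, and you get a page: http://10.202.40.91:7070/InterProresults/91048
Next, upload the XML data obtained above. Under Tripal > Data Loaders > Chado InterProScan XML results loader, enter the absolute path of the XML file to upload, then for analysis select the InterPro analysis just created, and set Query Type at the bottom to mRNA.
After clicking Import, run the queued job with drush: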
[root@f845b400013b raw]# drush trp-run-jobs --username=admin --root=/var/www/html
2024-08-14 07:26:38
Tripal Job Launcher
Running as user 'admin'
-------------------
2024-08-14 07:26:38: There are 4 jobs queued.
2024-08-14 07:26:38: Job ID 623.
2024-08-14 07:26:38: Calling: tripal_tripal_cron_notification()
2024-08-14 07:26:48: Job ID 624.
2024-08-14 07:26:48: Calling: tripal_expire_collections()
2024-08-14 07:26:48: Job ID 625.
2024-08-14 07:26:48: Calling: tripal_expire_files()
2024-08-14 07:26:48: Job ID 626.
2024-08-14 07:26:48: Calling: tripal_run_importer(113)
Running 'Chado InterProScan XML results loader' importer
NOTE: Loading of file is performed using a database transaction.
If it fails or is terminated prematurely then all insertions and
updates are rolled back and will not be found in the database
Percent complete: 100.00 %. Memory: 38,822,224 bytes.
Done
Done.
Remapping Chado Controlled vocabularies to Tripal Terms...
Done.
The page of each mRNA analyzed above now shows that mRNA's gene family identification results, together with the corresponding links.