All test files and results for this module are in the rhoddb-test container on host 91.
I. Installation and Enabling
1. Click the green "Clone or download" button at the top right of the module's repository page to get the web URL, then download the module by running git clone <URL> on the command line.
Place the cloned directory under /sites/all/modules, then enable the module.
2. Tripal Expression (listed under TRIPAL EXTENSIONS) has the following dependencies: Requires: Tripal (enabled), Views (enabled), Chaos tools (enabled), Path (enabled), Search (enabled), PHP filter (enabled), Entity API (enabled), Redirect (enabled), Tripal Chado (enabled), Date (enabled), Date API (enabled), Image (enabled), File (enabled), Field (enabled), Field SQL storage (enabled), Link (enabled), Tripal Chado Views (enabled), Tripal Biomaterials (disabled), jQuery Update (enabled)
Tripal Biomaterials is the only dependency shown as disabled, so it has to be enabled as well.
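II. Loading Data
This walkthrough uses Rhododendron delavayi data as the test data.
1. Download 100 Rhododendron delavayi BioSample records from NCBI (choose Send to at the lower right, then File, set Format to Full XML (text), and download all 100 records). Then import the file under Tripal -> Data Loaders -> Chado Biological... Run the job with drush on the backend and publish the records; 100 biological sample (BioSample) records should then show as inserted.
2. Properties inserted into the database by the biosample bulk loader become available as new fields. To find them, go to admin -> Structure -> Tripal Content Types -> Biological Sample and click the + Check for New Fields button at the top left of the screen.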
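Before importing, it can help to check which identifiers each downloaded record carries, since the sample names matter again when the expression matrix is loaded. The sketch below is only an illustration. It assumes the Full XML export wraps each record in a BioSample element with an accession attribute and Id children labelled by db or db_label attributes; list_biosamples is a hypothetical helper name and is not part of Tripal.

```python
import xml.etree.ElementTree as ET


def list_biosamples(xml_text):
    """Return one dict per BioSample record with its accession and Ids."""
    root = ET.fromstring(xml_text)
    records = []
    for sample in root.iter("BioSample"):
        record = {"accession": sample.get("accession")}
        for id_node in sample.iter("Id"):
            # Assumed layout: each Id is labelled by db_label or db.
            label = id_node.get("db_label") or id_node.get("db")
            if label:
                record[label] = (id_node.text or "").strip()
        records.append(record)
    return records


# Tiny hand-written example built from the identifiers quoted in these notes.
EXAMPLE_XML = """<BioSampleSet>
  <BioSample accession="SAMN09460246">
    <Ids>
      <Id db="BioSample">SAMN09460246</Id>
      <Id db_label="Sample name">para-A</Id>
      <Id db="SRA">SRS3441789</Id>
    </Ids>
  </BioSample>
</BioSampleSet>"""

if __name__ == "__main__":
    for rec in list_biosamples(EXAMPLE_XML):
        print(rec)
```

For a real download, read the file contents and pass them to list_biosamples in place of EXAMPLE_XML.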
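III. Loading Expression Data
Preparation: upload the Rhododendron delavayi transcriptome FASTA file (it can be extracted from the genome FASTA file and the GFF3 annotation file with gffread: gffread -w output.fasta -g genome.fasta input.gff), and create the analysis you want the data associated with (the analysis created when the site was first set up also works).
1. Expression data loader
The Chado Expression Data Loader (Tripal -> Data Loaders -> Chado Expression Data Loader) lets users load expression data associated with an experiment. The loader accepts two formats: matrix and column. The matrix format requires a header row containing the biological sample names, and the first column must hold unique feature names. The features must already be loaded in the database. Biological samples that do not exist will be added. Each expression value is mapped to the biosample of its column and the feature of its row. Only one matrix file can be loaded at a time. The column format requires the features in the first column and the expression values in the second. (Here we load a transcript matrix file; the matrix file produced by Huadong's transcriptome pipeline works.)
PS: Uploading the file initially failed because the user data directory had no write permission. Tripal's user data directory is sites/default/files/tripal; once write permission was granted on it, the upload succeeded.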
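The matrix layout described above (a header row of sample names, unique feature names in the first column) can be checked before uploading. The following is a minimal sketch under those assumptions, not the loader's own logic; summarize_matrix is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def summarize_matrix(matrix_text, delimiter=","):
    """Summarize a matrix file: samples, features, and obvious problems."""
    reader = csv.reader(io.StringIO(matrix_text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    features = [row[0] for row in body]
    counts = Counter(features)
    return {
        # Sample names are every header cell after the first.
        "samples": header[1:],
        "features": features,
        # Feature names that appear on more than one row.
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
        # Rows whose column count differs from the header.
        "ragged": [row[0] for row in body if len(row) != len(header)],
    }


if __name__ == "__main__":
    demo = "id,S1,S2\nf1,1,2\nf1,3,4\nf2,5\n"
    info = summarize_matrix(demo)
    print(f"Found {len(info['features'])} features "
          f"and {len(info['samples'])} samples.")
    print("duplicates:", info["duplicates"], "ragged:", info["ragged"])
```

An empty duplicates list and an empty ragged list suggest the file at least has the shape the loader expects; whether the features exist in the database still has to be checked on the Tripal side.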
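2. The required fields in this form are: Organism (Rhododendron delavayi), source file type (matrix), sequence type (mRNA), name match type (unique name), and, in the units section at the bottom, FPKM.
Finally, be sure to set Data Delimiter to comma. It defaults to tab, but the uploaded CSV is comma-separated. With the wrong delimiter each whole line is read as a single transcript ID (the real ID followed by 0,0,0,...), and the loader reports that the feature does not exist: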
Feature, Rhdel03G0200500.1,0,0,0,0.040786,0,0,0,0,0,0,0,0,0,0,0, does not exist in the database.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Feature, Rhdel03G0200500.1,0,0,0,0.040786,0,0,0,0,0,0,0,0,0,0,0, does not exist in the database.
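One way to avoid this mistake is to look at a line of the file before choosing the Data Delimiter value. The sketch below simply counts commas and tabs; guess_delimiter is a hypothetical helper, and the sample line is shortened from the error message above.

```python
def guess_delimiter(line):
    """Return "comma" or "tab", whichever occurs more often in the line."""
    counts = {"comma": line.count(","), "tab": line.count("\t")}
    # On a tie (including no delimiter at all) this falls back to "comma".
    return max(counts, key=counts.get)


if __name__ == "__main__":
    sample_line = "Rhdel03G0200500.1,0,0,0,0.040786,0"
    print("Set Data Delimiter to:", guess_delimiter(sample_line))
```

If the answer does not match the value selected in the form, the loader will treat each line as one long feature name, as in the error shown above.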
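3. Using the results from Huadong's run directly raises another problem. He ran the pipeline with one Project per group, so the sample columns in the matrix are named by run accession (SRRXXXX), while the sample names of the imported biosamples are not these SRR accessions. The loader therefore cannot find the samples and fails as follows: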
Step 1 of 23: Parsing input file...
Found 46348 features and 15 samples.
Step 2 of 23: Find Existing Features...
Step 3 of 23: Find Existing Elements...
Step 4 of 23: Insert New Elements...
Step 5 of 23: Get Newly Inserted Element IDs...
Step 6 of 23: Find NCBI BioSamples...
Note: This step may take a while. It queries NCBI if any samples use GEO
or SRA IDs.
Step 7 of 23: Find Local BioSamples...
Step 8 of 23: Insert New BioSamples...
Skipping: submitter indicated to not create BioSamples.
Step 9 of 23: Get Newly Inserted BioSample IDs...
Skipping: submitter indicated to not create BioSamples.
Step 10 of 23: Find Existing Assays for Each BioSample...
Step 11 of 23: Insert New Assays for BioSamples...
Step 12 of 23: Get Newly Inserted Assays IDs...
Step 13 of 23: Find Assay Biomaterial Links...
Step 14 of 23: Insert New Assay Biomaterial Links...
There is no biomaterial_id available for "SRR7403455". Either check the "Create BioSample Records" checkbox, or make sure the sample names exactly match those loaded with the biomaterial loader.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] There is no biomaterial_id available for "SRR7403455". Either check the "Create BioSample Records" checkbox, or make sure the sample names exactly match those loaded with the biomaterial loader.
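Looking the sample up on NCBI shows Identifiers BioSample: SAMN09460246; Sample name: para-A; SRA: SRS3441789. The SAMN accession follows the same naming as the records loaded into Tripal content, so replacing each SRRXXXX name with the corresponding SAMN accession lets the import succeed (the final test transcript expression matrix is at /var/www/html/sites/default/files/tripal/users/1/Rhododendron_delavayi_finaltestV3_transcript_fpkm_matrix.csv). The result: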
Step 1 of 23: Parsing input file...
Found 46348 features and 15 samples.
Step 2 of 23: Find Existing Features...
Step 3 of 23: Find Existing Elements...
Step 4 of 23: Insert New Elements...
Step 5 of 23: Get Newly Inserted Element IDs...
Step 6 of 23: Find NCBI BioSamples...
Note: This step may take a while. It queries NCBI if any samples use GEO
or SRA IDs.
Step 7 of 23: Find Local BioSamples...
Step 8 of 23: Insert New BioSamples...
Skipping: submitter indicated to not create BioSamples.
Step 9 of 23: Get Newly Inserted BioSample IDs...
Skipping: submitter indicated to not create BioSamples.
Step 10 of 23: Find Existing Assays for Each BioSample...
Step 11 of 23: Insert New Assays for BioSamples...
Step 12 of 23: Get Newly Inserted Assays IDs...
Step 13 of 23: Find Assay Biomaterial Links...
Step 14 of 23: Insert New Assay Biomaterial Links...
Step 15 of 23: Get Newly Inserted Assay Biomaterial Links...
Step 16 of 23: Clear Acquisition Data...
Step 17 of 23: Insert Acquisition Data...
Step 18 of 23: Get Newly Inserted Acquisition Data...
Step 19 of 23: Insert Quantification Records...
Step 20 of 23: Get Newly Inserted Quantification Records...
Step 22 of 23: Clear Quantification Data for this Assay...
Step 22 of 23: Insert Quantification Props...
Step 23 of 23: Insert Expression Data...
Percent complete: 100.00 %. Memory: 90,228,112 bytes.
Done.
Remapping Chado Controlled vocabularies to Tripal Terms...
Done.
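The header renaming that made this import succeed (SRR run names replaced by SAMN accessions) can be sketched as follows. This is an illustration only: rename_samples is a hypothetical helper, the transcript_id header cell is made up, and the pairing of SRR7403455 with SAMN09460246 is assumed here for the example, so the real mapping has to be taken from NCBI for every sample.

```python
import csv
import io


def rename_samples(matrix_text, name_map, delimiter=","):
    """Return the matrix text with header sample names replaced via name_map."""
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter=delimiter))
    header = rows[0]
    # Refuse to write a partially renamed header.
    missing = [name for name in header[1:] if name not in name_map]
    if missing:
        raise KeyError(f"no BioSample accession for: {missing}")
    rows[0] = [header[0]] + [name_map[name] for name in header[1:]]
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    matrix = "transcript_id,SRR7403455\nRhdel03G0200500.1,0\n"
    # Assumed pairing, for illustration only.
    mapping = {"SRR7403455": "SAMN09460246"}
    print(rename_samples(matrix, mapping), end="")
```

Raising an error on unmapped names is a deliberate choice: a header that still contains an SRR name would fail again at the biomaterial link step.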
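Two heatmap blocks then appear at the bottom of the block list and are ready to use. Place them in whichever position you want them displayed.
The example values can be changed under Tripal -> Extensions -> Expression Analysis -> Expression Heatmap Search Settings.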