By qianfaqiang, 31 January, 2026

Forums

RNACocktail流程由高精度工具组成，用于RNA-Seq分析的不同步骤。它对短读长和长读长技术进行广谱RNA-Seq分析，以从转录组数据中获得有意义的见解。该方法是在分析多种RNA-Seq样本（从生殖系、癌症到干细胞数据集）和技术后开发的，采用多种工具组合，确定了一个全面、快速且准确的流程。

该镜像已经下载到92服务器上

RNACocktail 需要用户自行构建基因组和/或转录组参考序列的索引文件。

我目前都是用的参考基因组比对的，所以只构建了HISAT2的索引，其它索引构建可参考链接文档http://bioinform.github.io/rnacocktail/

hisat2-build [options] reference.fa hisat2_index_basename

RNACocktail用于配对端短读序列（HISAT2）比对

# RNACocktail用于配对端短读序列（HISAT2）比对。
nohup run_rnacocktail.py align \
  --align_idx /expdb/index_dir/hisat2_index_dir/hisat2-idx \
  --outdir /expdb/rnacocktail_result2 \
  --workdir /expdb/rnacocktail_result2/work \
  --ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
  --1 "/expdb/data/GSE66737_fastq/SRR1909019_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909020_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909021_1.fastq.gz" \
      "/expdb/data/GSE66737_fastq/SRR1909022_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909023_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909024_1.fastq.gz" \
  --2 "/expdb/data/GSE66737_fastq/SRR1909019_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909020_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909021_2.fastq.gz" \
      "/expdb/data/GSE66737_fastq/SRR1909022_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909023_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909024_2.fastq.gz" \
  --hisat2 $(which hisat2) \
  --hisat2_sps $(which hisat2_extract_splice_sites.py) \
  --samtools $(which samtools) \
  --threads 10 \
  --sample "Ctrl_SRR1909019,Ctrl_SRR1909020,Ctrl_SRR1909021" \
            "Treat_SRR1909022,Treat_SRR1909023,Treat_SRR1909024" \
  > /expdb/rnacocktail_result2/HISAT2.log 2>&1 &

RNACocktail用于单端短读序列（HISAT2）比对

nohup run_rnacocktail.py align \
  --align_idx /expdb/index_dir/hisat2_index_dir/hisat2-idx \
  --outdir /expdb/rnacocktail_result2/GSE45989/EXPRSS_NlaIII-DGE_100nM \
  --workdir /expdb/rnacocktail_result2/GSE45989/EXPRSS_NlaIII-DGE_100nM/work \
  --ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
  --U /expdb/data/GSE45989_fastq/SRR825597.fastq.gz \
/expdb/data/GSE45989_fastq/SRR825598.fastq.gz \
/expdb/data/GSE45989_fastq/SRR825599.fastq.gz \
/expdb/data/GSE45989_fastq/SRR825600.fastq.gz \
  --hisat2 $(which hisat2) \
  --hisat2_sps $(which hisat2_extract_splice_sites.py) \
  --samtools $(which samtools) \
  --threads 4 \
  --sample "Ctrl_SRR825597,Ctrl_SRR825598,Ctrl_SRR825599.Ctrl_SRR825600" \
  > /expdb/rnacocktail_result2/GSE45989/EXPRSS_NlaIII-DGE_100nM/HISAT2.log 2>&1 &

RNACocktail用于短读长转录组重建（StringTie）

 nohup run_rnacocktail.py reconstruct \
 --alignment_bam /expdb/rnacocktail_result/hisat2/Ctrl_SRR1909019/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Ctrl_SRR1909020/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Ctrl_SRR1909021/alignments.sorted.bam /expdb/rnacocktail_result/hisat2/Treat_SRR1909022/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Treat_SRR1909023/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Treat_SRR1909024/alignments.sorted.bam \
 --outdir /expdb/rnacocktail_result \
 --workdir /expdb/rnacocktail_result/work \
 --ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
 --stringtie $(which stringtie) \
 --threads 5 \
 --sample "Ctrl_SRR1909019,Ctrl_SRR1909020,Ctrl_SRR1909021" \
            "Treat_SRR1909022,Treat_SRR1909023,Treat_SRR1909024" \
 > /expdb/rnacocktail_result/Stringtie.log 2>&1 &

RNACocktail运行，用于对StringTie计算转录组（DESeq2）上HISAT2比对序列的差异表达分析

 nohup run_rnacocktail.py diff \
 --alignments /expdb/rnacocktail_result/hisat2/Ctrl_SRR1909019/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Ctrl_SRR1909020/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Ctrl_SRR1909021/alignments.sorted.bam \
/expdb/rnacocktail_result/hisat2/Treat_SRR1909022/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Treat_SRR1909023/alignments.sorted.bam,/expdb/rnacocktail_result/hisat2/Treat_SRR1909024/alignments.sorted.bam \
 --transcripts_gtfs /expdb/rnacocktail_result/stringtie/Ctrl_SRR1909019/transcripts.gtf,/expdb/rnacocktail_result/stringtie/Ctrl_SRR1909020/transcripts.gtf,/expdb/rnacocktail_result/stringtie/Ctrl_SRR1909021/transcripts.gtf \
/expdb/rnacocktail_result/stringtie/Treat_SRR1909022/transcripts.gtf,/expdb/rnacocktail_result/stringtie/Treat_SRR1909023/transcripts.gtf,/expdb/rnacocktail_result/stringtie/Treat_SRR1909024/transcripts.gtf \
--sample "Ctrl_SRR1909019,Ctrl_SRR1909020,Ctrl_SRR1909021" \
            "Treat_SRR1909022,Treat_SRR1909023,Treat_SRR1909024" \
--ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
 --outdir /expdb/rnacocktail_result \
 --workdir /expdb/rnacocktail_result/work \
 --featureCounts $(which featureCounts) \
 --threads 5 \
 > /expdb/rnacocktail_result/diff.log 2>&1 &

基于双端序列HISAT2+StringTie+DEseq2定量和差异分析

nohup /usr/local/bin/run_rnacocktail.py all \
  --outdir /expdb/rnacocktail_result1 \
  --workdir /expdb/rnacocktail_result1/work \
  --threads 6 \
  --1 "/expdb/data/GSE66737_fastq/SRR1909019_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909020_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909021_1.fastq.gz" \
     "/expdb/data/GSE66737_fastq/SRR1909022_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909023_1.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909024_1.fastq.gz" \
  --2 "/expdb/data/GSE66737_fastq/SRR1909019_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909020_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909021_2.fastq.gz" \
     "/expdb/data/GSE66737_fastq/SRR1909022_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909023_2.fastq.gz,/expdb/data/GSE66737_fastq/SRR1909024_2.fastq.gz" \
  --sample "Ctrl_SRR1909019,Ctrl_SRR1909020,Ctrl_SRR1909021" \
            "Treat_SRR1909022,Treat_SRR1909023,Treat_SRR1909024" \
  --ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
  --ref_genome /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  --align_idx /expdb/index_dir/hisat2_index_dir/hisat2-idx \
  --unzip \
  --file_format fastq.gz \
  --exclude quantify denovo long_correct long_align long_reconstruct long_fusion variant editing fusion \
  --hisat2 $(which hisat2) \
  --hisat2_sps $(which hisat2_extract_splice_sites.py) \
  --samtools $(which samtools) \
  --stringtie $(which stringtie) \
  --stringtie_opts "-e -G /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf" \
  --featureCounts $(which featureCounts) \
  --R $(which R) \
  > /expdb/rnacocktail_result1/all_pipeline.log 2>&1 &

--align_idx /expdb/index_dir/hisat2_index_dir/hisat2-idx \在输入构建的索路径时，以我的为例，不应该只输入到目录/expdb/index_dir/hisat2_index_dir/，而是要加上你存放索引目录下文件的1.ht2的前缀hisat2-idx，这样RNACocktail才会查找到你的索引，否则会报错

# 给的路径不能是索引所在的目录路径， RNACocktail 期望的是索引文件的基础名称
No HISAT index file /expdb/index_dir/hisat2_index_dir.1.ht2

注：如果要运行all modle时，可以使用--exclude将不需要执行的任务排除掉。如不需要检查基因融合、变异等，可以--exclude variant fusion两个方法之间要用空格隔开不能使用逗号，并且不要加引号~~--exclude "variant,fusion"~~这样是错误的

注：--1 和 --2 参数同一分组的文件使用英文状态下的逗号“，”分隔，实验组与对照组区分则用空格分隔，或者“\”换行。其中我在运行run_rnacocktail.py all时发现不管你有几个样本，比对之后他就会生成各自对应的bam文件，但是单独运行每一步时如run_rnacocktail.py align时，它会将几个样本的bam文件合在一起。生成一个总的bam文件，所以这在做run_rnacocktail.py diff就会有问题了，因为只有一个bam文件，而做差异分析时，至少的2个样本以上。除非使用run_rnacocktail.py align一个样本一个样本比对。当然还是要用run_rnacocktail.py all更方便了。

在运行run_rnacocktail.py all还发现了一个问题就是--sample样本数要和--1和--2上面的样本数对应，不考虑双端，只考虑样本的数量。所以如果每一个组别的样本数过多的话，再执行DEseq2时，会报错创建的目录名称过长（得到差异分析文件时，他会自动将样本名称连在一块创建目录），而导致差异分析这一步运行失败。所以如果样本量过多时，可以将--sample后跟的样本简写，后续通过mv修改目录名称

INFO 2026-01-16 06:56:15,403 src.run_diff         DESeq2 failed!
ERROR 2026-01-16 06:56:15,403 src.run_diff         [Errno 36] File name too long: '/expdb/rnacocktail_result/GSE45989/work/diff-alignment/deseq2/Ctrl_rep1,Ctrl_rep2,Ctrl_rep3,Ctrl_rep4,Ctrl_rep5,Ctrl_rep6,Ctrl_rep7,Ctrl_rep8,Ctrl_rep9,Ctrl_rep10,Ctrl_rep11,Ctrl_rep12-Treat_rep1,Treat_rep2,Treat_rep3,Treat_rep4,Treat_rep5,Treat_rep6,Treat_rep7,Treat_rep8,Treat_rep9,Treat_rep10,Treat_rep11,Treat_rep12'

基于单端序列HISAT2+StringTie+DEseq2定量和差异分析

nohup /usr/local/bin/run_rnacocktail.py all \
  --outdir /expdb/rnacocktail_result1/GSE45989/EXPRSS_water_100nM \
  --workdir /expdb/rnacocktail_result1/GSE45989/EXPRSS_water_100nM/work \
  --threads 6 \
  --U /expdb/data/GSE45989_fastq/SRR825593.fastq.gz,/expdb/data/GSE45989_fastq/SRR825594.fastq.gz,/expdb/data/GSE45989_fastq/SRR825595.fastq.gz,/expdb/data/GSE45989_fastq/SRR825596.fastq.gz \
      /expdb/data/GSE45989_fastq/SRR825597.fastq.gz,/expdb/data/GSE45989_fastq/SRR825598.fastq.gz,/expdb/data/GSE45989_fastq/SRR825599.fastq.gz,/expdb/data/GSE45989_fastq/SRR825600.fastq.gz \
  --sample "Ctrl_SRR825593,Ctrl_SRR825594,Ctrl_SRR825595,Ctrl_SRR825596" \
            "Treal_SRR825597,Treal_SRR825598,Treal_SRR825599,TrealSRR825600" \
  --ref_gtf /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf \
  --ref_genome /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  --align_idx /expdb/index_dir/hisat2_index_dir/hisat2-idx \
  --unzip \
  --file_format fastq.gz \
  --exclude quantify denovo long_correct long_align long_reconstruct long_fusion variant editing fusion \
  --hisat2 $(which hisat2) \
  --hisat2_sps $(which hisat2_extract_splice_sites.py) \
  --samtools $(which samtools) \
  --stringtie $(which stringtie) \
  --stringtie_opts "-e -G /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf" \
  --featureCounts $(which featureCounts) \
  --R $(which R) \
  > /expdb/rnacocktail_result/GSE45989/EXPRSS_water_100nM/pipeline.log 2>&1 &

因为在跑不同植物的转录组是，不免有一些物种的测序以前是单端的，之前的snakemake应该只可以对双端的测序进行RNA-seq，而RNACocktail只需要加上参数--U，然后把参数--1和--2去掉就可以了。

该流程还包括基于长读的技术，检测基因编辑（editing），基因融合（fusion），变异（variant）等。使用详情可以参考下面链接

http://bioinform.github.io/rnacocktail/

关于stringtie定量基因的时候，最后输出很多MSTRG样式的geneid

用hisat2-stringtie-DESeq2这一套流程做差异表达基因分析的时候的时候，最后会输出很多带MSTRG字样的geneid。

网上有些人给出的答案是这个带MSTRG样id的是新发现的新转录本，但是做出来的结果几乎一半都是带这个id的基因，觉得不太可能。于是在外网搜索到了一篇关于这个问题的开发者的答案，大概的意思就是用stringtie在run脚本的时候因为是多线程的，所以每一个线程分开运行，当接收到一个gene_id的时候会先给他一个MSTRG id，方便之后在合并的时候不会乱
大概就是这个merge gtf这一步的时候，生成的，同一个基因没有重叠的转录本会分割成两个基因，所以会赋予MSTRG的id ，后来我重新搜索基于stringtie-DESeq2的分析流程,hisat2+stringtie+deseq2分析RNA-SEQ数据，发现可以不用merge gtf这一步，所以我直接用自己的注释基因组，最后做出来果然没有MSTRG id了。也就是说如果你对新发现的转录本不感兴趣，就可以不用merge gtf这一步。
内容来源：链接：https://www.jianshu.com/p/1697f2d91eb5

这里我因为最后导入数据库的数据不需要这些新发现的新转录本，所以我试着增加一些参数取消生成新转录本的这个步骤

-e 不预测新的转录本，该参数要配合-G使用。-G后跟参考注释文件 
--stringtie_opts "-e -G /expdb/data/ninanjie/Arabidopsis_thaliana.TAIR10.58.gtf" \

加入这个参数以后StingTie生成的定量文件确实是没有了，但是最后生成的差异分析文件，还是包含了新的转录本。这里我就开始在想为什么了。最后检查出来是，这个是在程序里写死了的一步，将重构的转录本--merge合并了。因此最后得到的差异分析文件还是包含了新的转录本。这也是上面说的如果不想发现新的转录本，可以不用merge gtf这一步。由于这个镜像里写死了，所以就不好再改了。只能再写一个小脚本给新的转录本和已知的分开了。