By ruiyuan Li, 31 July, 2025

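The raw data is in /data/public_92/Torreya_grandis/NCBI-20230420; this is the version organized by Xin (鑫哥).
The container used is Xin's, 061b51fa0bb3.
Data processing
Convert the SRA data downloaded from NCBI to FASTQ
Install sratoolkit
1. Create a dedicated environment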
conda create -n sra_tools python=3.9
2. Activate the environment
conda activate sra_tools
3. Install sratoolkit
conda install -c bioconda sra-tools
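Usage
Single-end sequencing (or RNA-seq data; I was initially misled by information online, where a blogger said RNA data is sequenced single-end, but after running this I found that the downloaded data was actually paired-end):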
fastq-dump SRR23105320.sra -O /home/lry 
After conversion, compress the output to .gz files, following the method of my senior labmate Huadong (华东师兄).
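The compression step can be sketched in Python as below. This is a minimal sketch, not Huadong's actual method, which these notes do not record: the helper name `gzip_fastq` and the `keep_original` switch are hypothetical, and it assumes the converted file sits next to where you want the .gz to be written.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq(fastq_path, keep_original=False):
    """Compress one FASTQ file to <name>.gz and return the new path.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    src = Path(fastq_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so a large FASTQ is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep_original:
        # Remove the uncompressed file, as the gzip command does by default.
        src.unlink()
    return dst
```

For the single-end command above, the call would be gzip_fastq("/home/lry/SRR23105320.fastq"), since fastq-dump without a split option writes one .fastq file named after the accession.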
Paired-end sequencing
fastq-dump SRR14306907.sra --split-3 --gzip -O ./ 
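With several accessions to convert, the same command can be generated in a loop. The sketch below only rebuilds the two fastq-dump invocations shown above; the function names `fastq_dump_cmd` and `convert_all` and the `dry_run` switch are hypothetical, and it assumes each .sra file is in the current directory and that fastq-dump is on the PATH of the activated sra_tools environment. For paired-end runs, --split-3 writes the two mates to _1 and _2 files and puts any unpaired reads in a separate file.

```python
import shlex
import subprocess


def fastq_dump_cmd(sra_file, outdir, paired=True, gzip_output=True):
    """Build the fastq-dump argument list for one .sra file."""
    cmd = ["fastq-dump", sra_file]
    if paired:
        cmd.append("--split-3")  # split mates into _1/_2 files
    if gzip_output:
        cmd.append("--gzip")     # compress the output directly
    cmd += ["-O", outdir]        # output directory
    return cmd


def convert_all(accessions, outdir, dry_run=True):
    """Print one fastq-dump command per accession; run it unless dry_run."""
    for acc in accessions:
        cmd = fastq_dump_cmd(f"{acc}.sra", outdir)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if fastq-dump fails.
            subprocess.run(cmd, check=True)
```

Calling convert_all(["SRR14306907"], "./") prints the command without running it; pass dry_run=False to execute it.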
Run with Huadong's pipeline; contents of the config and info files:
PROJECT: ganhan
SAMPLES: ["SRR23105320","SRR23105321","SRR23105322","SRR23105323","SRR23105324","SRR23105325","SRR23105326","SRR23105328"]
GENOME: /home/lry/Tgra.fa
ANNOTATION: /home/lry/Tgra.chr.convert2.gtf
INPUTPATH: /home/lry/ganhan1
OUTPUTPATH: /home/lry/ganhan/ganhanout
THREAD: "10"
INFO: /home/configs/info.csv
CONTROL: ["CK"]
TREAT: ["D60","D40"]
SPECIES: Torreya
(base) root@061b51fa0bb3:/home/configs# cat info.csv 
id,condition
SRR23105320,D60
SRR23105321,D60
SRR23105322,D60
SRR23105323,D40
SRR23105324,D40
SRR23105325,D40
SRR23105326,CK
SRR23105328,CK
Run in the /home/run directory:
snakemake -s exp.py -j 10
During the run, one sample in the CK group, SRR23105327, turned out to be problematic, so only two CK replicates were run.
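Because a sample was dropped by hand, it may help to confirm before launching snakemake that SAMPLES in the config and the id column of info.csv still agree. The sketch below is an assumption-laden illustration: `check_samples` is a hypothetical helper, the values are copied from the drought config and info.csv above, and how exp.py itself reads these files is not shown in these notes.

```python
import csv
import io

# Values copied from the config and info.csv shown above.
SAMPLES = ["SRR23105320", "SRR23105321", "SRR23105322", "SRR23105323",
           "SRR23105324", "SRR23105325", "SRR23105326", "SRR23105328"]

INFO_CSV = """id,condition
SRR23105320,D60
SRR23105321,D60
SRR23105322,D60
SRR23105323,D40
SRR23105324,D40
SRR23105325,D40
SRR23105326,CK
SRR23105328,CK
"""


def check_samples(samples, info_text):
    """Return (ids missing from info.csv, ids in info.csv but not in SAMPLES)."""
    rows = csv.DictReader(io.StringIO(info_text))
    info_ids = {row["id"] for row in rows}
    missing = sorted(set(samples) - info_ids)
    extra = sorted(info_ids - set(samples))
    return missing, extra
```

To check the real file instead of the pasted text, pass open("/home/configs/info.csv").read() as info_text. Both returned lists are empty when the two files agree.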

Config files for the different developmental stages
(base) root@061b51fa0bb3:/home/configs# cat config.yaml

PROJECT: ganhan
SAMPLES: ["SRR12964406","SRR12964407","SRR12964408","SRR12964409","SRR12964410","SRR12964411","SRR12964412","SRR12964413","SRR12964414"]
GENOME: /home/lry/Tgra.fa
ANNOTATION: /home/lry/Tgra.chr.convert2.gtf
INPUTPATH: /home/lry/fayu/fayu1
OUTPUTPATH: /home/lry/fayu/fayuout
THREAD: "10"
INFO: /home/configs/info.csv
CONTROL: ["CK"]
TREAT: ["D60","D40"]
SPECIES: Torreya
(base) root@061b51fa0bb3:/home/configs# cat info.csv 
id,condition
SRR12964406,2
SRR12964407,1
SRR12964408,1
SRR12964409,1
SRR12964410,3
SRR12964411,3
SRR12964412,3
SRR12964413,2
SRR12964414,2
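As pasted, this config still lists CONTROL: ["CK"] and TREAT: ["D60","D40"], while info.csv labels the conditions 1, 2 and 3. Whether the pipeline requires these labels to match is an assumption, since exp.py is not shown here. The sketch below counts replicates per condition and lists any CONTROL/TREAT label that never occurs in info.csv; the helper names are hypothetical.

```python
import csv
import io
from collections import Counter

# Values copied from the developmental-stage config and info.csv above.
CONTROL = ["CK"]
TREAT = ["D60", "D40"]

INFO_CSV = """id,condition
SRR12964406,2
SRR12964407,1
SRR12964408,1
SRR12964409,1
SRR12964410,3
SRR12964411,3
SRR12964412,3
SRR12964413,2
SRR12964414,2
"""


def replicates_per_condition(info_text):
    """Count how many samples carry each condition label."""
    reader = csv.DictReader(io.StringIO(info_text))
    return Counter(row["condition"] for row in reader)


def unused_labels(info_text, control, treat):
    """Return CONTROL/TREAT labels that never occur in the condition column."""
    present = set(replicates_per_condition(info_text))
    return [label for label in control + treat if label not in present]
```

If unused_labels returns a non-empty list, either the config or info.csv probably needs updating before the run, assuming the pipeline selects comparison groups by these labels.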