By ruiyuan Li, 31 October, 2025

pbsm pipeline: workflow

Pipeline: Gene Analysis Workflow

├─ Step 1: BLAST
│ ├─ Input: Query FASTA (-q), Target FASTA folder (-t)
│ ├─ Action: run BLASTN against all target files
│ └─ Output: XML result files (in output_dir)
├─ Step 2: Extract best hits
│ ├─ Input: BLAST XML files
│ ├─ Action: parse the XML and find the best homolog for each query
│ └─ Output: results/<gene>/id.txt (one subfolder per gene under results/)
├─ Step 3: Merge FASTA
│ ├─ Input: Query FASTA + Target FASTA
│ ├─ Action: merge all FASTA files into all.fa
│ └─ Output: all.fa
├─ Step 4: Extract FASTA sequences for the IDs
│ ├─ Input: all.fa + results/<gene>/id.txt
│ ├─ Action: extract the FASTA sequences for each gene
│ └─ Output: results/<gene>/fasta.txt
├─ Step 5: Align2CDS
│ ├─ Input: results/<gene>/fasta.txt
│ ├─ Action: run Align2CDS on each gene
│ └─ Output: results/<gene>/dna_seq_for_paml.txt
├─ Step 6: Build gene trees
│ ├─ Input: results/<gene>/dna_seq_for_paml.txt
│ ├─ Action: build a gene tree with FastTree, then remove branch lengths
│ └─ Output: results/<gene>/gene_tree.trees
├─ Step 7: OBSM
│ ├─ Input: results/<gene>/dna_seq_for_paml.txt + results/<gene>/gene_tree.trees
│ ├─ Action: run obsm on every gene in batch
│ └─ Output: results/<gene>/obsmout/
└─ Step 8: Classify results with best_model
  ├─ Input: results/<gene>/obsmout/
  ├─ Action: run best_model on every gene in batch
  └─ Output:
    ├─ positive_results.txt
    ├─ negative_results.txt
    ├─ error_log.txt
    └─ positiveid.txt (all gene IDs)
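Step 1 of the workflow above can be sketched in Python as follows. This is a minimal sketch, not the pipeline's actual script: it assumes NCBI BLAST+ `blastn` is on PATH, treats files ending in `.fa`, `.fasta` or `.fna` as targets, and writes one XML file per target, named after that file. The function names `build_blast_commands` and `run_blast` are hypothetical. `-outfmt 5` is the BLAST+ option for XML output, which is the format Step 2 parses.

```python
import subprocess
from pathlib import Path

# Extensions treated as target FASTA files (an assumption; adjust to your folder).
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def build_blast_commands(query_fasta, target_dir, output_dir):
    """Return one blastn argument list per target FASTA file in target_dir."""
    commands = []
    for target in sorted(Path(target_dir).iterdir()):
        if not target.is_file() or target.suffix.lower() not in FASTA_SUFFIXES:
            continue
        xml_out = Path(output_dir) / f"{target.stem}.xml"
        commands.append([
            "blastn",
            "-query", str(query_fasta),
            "-subject", str(target),  # search the FASTA directly, no makeblastdb step
            "-outfmt", "5",           # 5 = BLAST XML
            "-out", str(xml_out),
        ])
    return commands


def run_blast(query_fasta, target_dir, output_dir):
    """Run every command in turn; needs NCBI BLAST+ `blastn` on PATH."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_blast_commands(query_fasta, target_dir, output_dir):
        subprocess.run(cmd, check=True)
```

For example, `run_blast("query.fa", "targets", "output_dir")` would write `output_dir/<target>.xml` for each target file. Searching FASTA files with `-subject` avoids building a database; for large target genomes, running `makeblastdb` once and searching with `-db` is usually faster.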
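Step 2 of the workflow above can be sketched with the standard-library XML parser, reading the `Iteration`, `Hit` and `Hsp_bit-score` elements of BLAST XML output. Assumptions to check against your own files: the "best homolog" is taken to be the hit with the highest HSP bit score; the sequence ID is the first word of `Iteration_query-def` and `Hit_def` (depending on how BLAST was run, the ID may be in `Hit_id` instead); and each `id.txt` lists the query ID followed by its best hit from each XML file, one per line. `best_hits` and `write_id_files` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _first_token(text):
    """First whitespace-separated word of a defline, or '' if empty."""
    parts = (text or "").split()
    return parts[0] if parts else ""


def best_hits(xml_path):
    """Map each query ID to (best hit ID, bit score) for one BLAST XML file."""
    best = {}
    for iteration in ET.parse(xml_path).getroot().iter("Iteration"):
        query_id = _first_token(iteration.findtext("Iteration_query-def"))
        for hit in iteration.iter("Hit"):
            # Assumption: the target sequence ID is the first word of Hit_def.
            hit_id = _first_token(hit.findtext("Hit_def"))
            scores = [float(h.findtext("Hsp_bit-score")) for h in hit.iter("Hsp")]
            if scores and (query_id not in best or max(scores) > best[query_id][1]):
                best[query_id] = (hit_id, max(scores))
    return best


def write_id_files(xml_paths, results_dir):
    """Write results/<query>/id.txt: the query ID, then its best hit from each XML file."""
    members = {}
    for xml_path in xml_paths:
        for query_id, (hit_id, _score) in best_hits(xml_path).items():
            members.setdefault(query_id, [query_id]).append(hit_id)
    for query_id, ids in members.items():
        gene_dir = Path(results_dir) / query_id  # query ID used as the folder name
        gene_dir.mkdir(parents=True, exist_ok=True)
        (gene_dir / "id.txt").write_text("\n".join(ids) + "\n")
    return members
```

Queries with no hits in any file get no folder. Because query IDs become folder names, IDs containing characters such as `/` would need to be sanitized first.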
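Step 3 of the workflow above is a plain concatenation. The sketch below (hypothetical function names) adds a newline wherever a file does not end in one, so the next file's header is not glued onto the previous sequence line. It also reports repeated IDs, because Step 4 looks sequences up by ID and a repeated ID would be ambiguous.

```python
from pathlib import Path


def merge_fasta(fasta_paths, out_path="all.fa"):
    """Concatenate FASTA files into out_path, adding a newline where a file lacks one."""
    with open(out_path, "w") as out:
        for path in fasta_paths:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keeps the next file's header on its own line
            out.write(text)
    return out_path


def duplicate_ids(fasta_path):
    """Return header IDs (first word after '>') that occur more than once."""
    seen, dups = set(), set()
    for line in Path(fasta_path).read_text().splitlines():
        if not line.startswith(">"):
            continue
        words = line[1:].split()
        seq_id = words[0] if words else ""
        if seq_id in seen:
            dups.add(seq_id)
        seen.add(seq_id)
    return sorted(dups)
```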
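Step 4 of the workflow above can be sketched as a lookup by ID, assuming each line of `id.txt` holds one ID that matches the first word of a FASTA header in `all.fa`. In this sketch `all.fa` is parsed once and reused for every gene, and IDs that are not found are returned so they can be logged. `read_fasta` and `extract_gene_fastas` are hypothetical names.

```python
from pathlib import Path


def read_fasta(path):
    """Parse FASTA into {id: sequence}; the ID is the first word of the header.

    A repeated ID keeps the last sequence seen.
    """
    records, current = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # join multi-line sequences
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def extract_gene_fastas(all_fa, results_dir):
    """For every results/<gene>/id.txt, write results/<gene>/fasta.txt.

    Returns {gene: [IDs not found in all_fa]} for genes with missing IDs.
    """
    seqs = read_fasta(all_fa)  # parse all.fa once, reuse for every gene
    missing = {}
    for id_file in sorted(Path(results_dir).glob("*/id.txt")):
        ids = [line.strip() for line in id_file.read_text().splitlines() if line.strip()]
        with open(id_file.parent / "fasta.txt", "w") as out:
            for seq_id in ids:
                if seq_id in seqs:
                    out.write(f">{seq_id}\n{seqs[seq_id]}\n")
        absent = [seq_id for seq_id in ids if seq_id not in seqs]
        if absent:
            missing[id_file.parent.name] = absent
    return missing
```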
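For Step 6 of the workflow above, FastTree writes a Newick tree that includes branch lengths (for nucleotide alignments it is typically run as `FastTree -nt <alignment> > <tree>`; some installs name the binary `fasttree`). Removing the branch lengths only needs a regular expression that deletes each `:<number>`. This sketch assumes taxon names contain no colons and are not quoted. FastTree's internal support values (e.g. `0.95` after a closing parenthesis) are kept because they are not preceded by a colon. Function names are hypothetical.

```python
import re
from pathlib import Path

# ":" followed by a number (integer, decimal, or scientific notation) = one branch length.
BRANCH_LENGTH = re.compile(r":[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")


def strip_branch_lengths(newick):
    """Remove every ':<length>' from a Newick string, keeping topology and node labels."""
    return BRANCH_LENGTH.sub("", newick)


def strip_tree_file(in_path, out_path):
    """Read a Newick tree (e.g. FastTree output) and save it without branch lengths."""
    tree = Path(in_path).read_text().strip()
    Path(out_path).write_text(strip_branch_lengths(tree) + "\n")
```

For example, `((a:0.1,b:0.2)0.95:0.03,c:1e-05);` becomes `((a,b)0.95,c);`.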
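Before the Step 7 batch run in the workflow above, it can help to check which gene folders actually have both inputs, since a gene may drop out at an earlier step. The sketch below (hypothetical names) only does that bookkeeping. The obsm command line itself is not given in these notes, so it is not reproduced here.

```python
from pathlib import Path

# The two Step 7 inputs expected in every results/<gene>/ folder.
REQUIRED = ("dna_seq_for_paml.txt", "gene_tree.trees")


def split_ready_genes(results_dir):
    """Split results/<gene>/ folders into those with both inputs and those missing any.

    Returns (ready_gene_names, {gene_name: [missing file names]}).
    """
    ready, incomplete = [], {}
    for gene_dir in sorted(p for p in Path(results_dir).iterdir() if p.is_dir()):
        missing = [name for name in REQUIRED if not (gene_dir / name).is_file()]
        if missing:
            incomplete[gene_dir.name] = missing
        else:
            ready.append(gene_dir.name)
    return ready, incomplete
```

Genes in the first list would then be passed to obsm one by one; the second list could be recorded in a log such as error_log.txt.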