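# Generate the gVCF file for a single sample
# The thread setting here is capped at 4 (-XX:ParallelGCThreads=4, which controls the JVM garbage-collection threads); raising it further brings no speedup for HaplotypeCaller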
gatk --java-options "-Xmx10g -XX:ParallelGCThreads=4" HaplotypeCaller -R genome.fasta -I sample1.pe.sort.markdup.bam -ERC GVCF -O sample1.g.vcf.gz
# Running gatk to generate each sample's gVCF file failed with an error
The full error message:
03:26:02.151 INFO HaplotypeCaller - Requester pays: disabled
03:26:02.151 INFO HaplotypeCaller - Initializing engine
03:26:02.154 INFO HaplotypeCaller - Shutting down engine
[September 16, 2025 at 3:26:02 AM UTC] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=335544320
***********************************************************************
A USER ERROR has occurred: Fasta index file file:///home/Rmolle_calllsnp_work/Rmolle_genome_GCA025413875/Rmolle_genomic_GCA_025413875.1.fasta.fai for reference file:///home/Rmolle_calllsnp_work/Rmolle_genome_GCA025413875/Rmolle_genomic_GCA_025413875.1.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

GATK needs the reference genome file (FASTA) to come with two companion index files, .fai and .dict. This error means the .fai file was not found under the name GATK expects, which is the reference file name plus .fai.
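# First, check which folder the files are in and what they are named, and copy them all into the genome file's folder:
(base) root@961a4377e759:/home/Rmolle_calllsnp_work/Rmolle_results/bam_sort_bygatk# cp Rmolle_genomic_GCA_025413875.1.dict ../../Rmolle_genome_GCA025413875/
# Next, check the file names. The index was named .fna.fai, which GATK does not pick up for a reference passed as .fasta, so copy it to a .fasta.fai name:
(base) root@961a4377e759:/home/Rmolle_calllsnp_work/Rmolle_genome_GCA025413875# cp Rmolle_genomic_GCA_025413875.1.fna.fai Rmolle_genomic_GCA_025413875.1.fasta.fai
# Ran it again: success!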
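
To catch this kind of problem before starting a long HaplotypeCaller run, a small pre-flight check can help. The sketch below is only an illustration: `check_ref_index` is a hypothetical helper name, not part of GATK, and it assumes the naming seen in these notes (the index is the full reference file name plus `.fai`, the dictionary is the reference name with its last extension replaced by `.dict`). The `samtools faidx` and `gatk CreateSequenceDictionary` commands in the messages are the usual ways to create the two files, assuming both tools are installed.

```shell
# Hypothetical helper (not part of GATK): report which companion files of a
# reference FASTA are missing. Returns 0 if both exist, 1 otherwise.
check_ref_index() {
    ref="$1"
    fai="${ref}.fai"        # index: full reference file name + ".fai"
    dict="${ref%.*}.dict"   # dictionary: last extension replaced by ".dict"
    status=0
    if [ ! -f "$fai" ]; then
        echo "missing: $fai (can be created with: samtools faidx $ref)"
        status=1
    fi
    if [ ! -f "$dict" ]; then
        echo "missing: $dict (can be created with: gatk CreateSequenceDictionary -R $ref)"
        status=1
    fi
    return "$status"
}

# Usage (commented out): only start HaplotypeCaller when the check passes.
# check_ref_index genome.fasta && gatk --java-options "-Xmx10g" HaplotypeCaller -R genome.fasta -I sample1.pe.sort.markdup.bam -ERC GVCF -O sample1.g.vcf.gz
```

For the reference in these notes, the check would look for `Rmolle_genomic_GCA_025413875.1.fasta.fai` and `Rmolle_genomic_GCA_025413875.1.dict` next to the `.fasta` file, which matches the two files copied into place above.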