By Gengxin, 31 March, 2026

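Three bases (one codon) are translated into one amino acid, so the length of a complete CDS should be a multiple of three. But with tens of thousands of genes in a CDS file, how do you check whether each CDS has a length that is a multiple of three?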

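Run the script below. For every CDS whose length is not a multiple of 3, its header line (without the leading ">") is saved to bad_Tdistichum.cds.genes1.txt.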

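awk '
# Header line: report the previous record if its length is not a multiple of 3, then reset.
/^>/ { if (seqlen % 3 != 0) print substr(name, 2); name = $0; seqlen = 0; next }
# Sequence line: add its length to the running total for the current record.
{ seqlen += length($0) }
# End of file: the last record has no following header, so check it here.
END { if (seqlen % 3 != 0) print substr(name, 2) }
' /data2/yuweiliang/genome_data/Tdistichum.cds > bad_Tdistichum.cds.genes1.txt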
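
To sanity-check the approach before running it on the full genome, it may help to try it on a tiny file first. The sketch below is only an illustration under some assumptions: demo.cds, bad_demo.txt, the three toy records and the report() helper are made up for this example and are not part of the original data. It also assumes the gene ID is the first whitespace-delimited word of the header, and it prints the length and the remainder next to each ID.

```shell
# Build a tiny hypothetical FASTA file (demo.cds) to try the check on.
printf '>gene1 demo\nATGAAATAG\n>gene2 demo\nATGAA\nAT\n>gene3 demo\nATGA\n' > demo.cds

# For every record whose length is not a multiple of 3,
# print: ID <tab> length <tab> remainder.
awk '
# Hypothetical helper: report the current record if its length is not a multiple of 3.
function report() { if (seqlen % 3 != 0) print id "\t" seqlen "\t" (seqlen % 3) }
# Header line: report the previous record, then keep only the first word as the ID.
/^>/ { report(); id = substr($1, 2); seqlen = 0; next }
# Sequence line: sequences may be wrapped over several lines, so sum them up.
{ seqlen += length($0) }
# The last record has no following header, so report it at the end.
END { report() }
' demo.cds > bad_demo.txt

cat bad_demo.txt
```

With these toy records, gene1 (9 bases) passes, while gene2 (7 bases across two lines) and gene3 (4 bases) are written to bad_demo.txt, each with a remainder of 1. If the sequence file was edited on Windows, trailing carriage returns would be counted as bases, so it is probably worth converting line endings first.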