By masiyi, 14 April, 2024
Forums

以下两个脚本都是在91服务器名为msy_python的容器中,感兴趣的可以进这个容器进行尝试(我把测试数据目录放在了/home/masiyi下)

1.通过python脚本计算染色体长度

其中利用了Bio python库中的SeqIO模块

root@dc2147a07b60:/home/masiyi# cat gene_karyotype.py 
from Bio import SeqIO
fasta_file = "Cinnamomum_japonicum_genome.fasta"
chromosome_lengths = {}
for record in SeqIO.parse(fasta_file, "fasta"):
   if record.id.startswith("chr"):
       chromosome_lengths[record.id] = len(record.seq)
# print("Chr\tStart\tEnd")
# for chromosome, length in chromosome_lengths.items():
#    print("{}\t0\t{}".format(chromosome, length))
with open("karyotype.txt", "w") as file:
   file.write("Chr\tStart\tEnd\n")
   for chromosome, length in chromosome_lengths.items():
       file.write("{}\t0\t{}\n".format(chromosome, length))

可以通过运行gene_karyotype.py在当前目录下得到一个karyotype.txt以制表符分隔的文件(后面的可视化需要的前提文件)

 

2.绘制染色体长度和基因密度可视化的R脚本

其中利用了RIdeogram包进行主要的工作,如有需要请引用:Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. 2020. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Computer Science 6:e251

root@dc2147a07b60:/home/masiyi# cat test.R 
# 导入RIdeogram包
library(RIdeogram)

# 设置输入文件路径
input_file <- "./Cinnamomum_japonicum.gff3.gz"
karyotype_file <- "./gene_karyotype.txt"  # 是上一步py脚本得到的染色体相对长度的结果文件

# 运行GFFex函数并将结果保存为gene_density对象
gene_density <- GFFex(input = input_file, karyotype = karyotype_file, feature = "gene", window = 1000000)

gene_karyotype <- read.delim("/home/masiyi/gene_karyotype.txt")
# data(gene_karyotype, package="RIdeogram")

# 将gene_density保存为txt文件
# output_file <- "/home/masiyi/gene_density.txt"
# write.table(gene_density, file = output_file, sep = "\t", quote = FALSE, row.names = FALSE)

# gene_density <- read.table("./gene_density.txt")
# data(gene_density, package="RIdeogram")	

# 染色体及基因密度可视化保存为svg
ideogram(karyotype = gene_karyotype, overlaid = gene_density)
convertSVG("chromosome.svg", device = "png")

可以得到最终的svg和png格式的染色体长度和基因密度的矢量图文件,如果需要中间的基因密度的过程文件进行校正和其他工作,可以取消 '将gene_density保存为txt文件' 这一步的注释,会生成一个gene_density.txt的过程文件