以下两个脚本都是在91服务器名为msy_python的容器中,感兴趣的可以进这个容器进行尝试(我把测试数据目录放在了/home/masiyi下)
1.通过python脚本计算染色体长度
其中利用了Bio python库中的SeqIO模块
root@dc2147a07b60:/home/masiyi# cat gene_karyotype.py
from Bio import SeqIO
fasta_file = "Cinnamomum_japonicum_genome.fasta"
chromosome_lengths = {}
for record in SeqIO.parse(fasta_file, "fasta"):
if record.id.startswith("chr"):
chromosome_lengths[record.id] = len(record.seq)
# print("Chr\tStart\tEnd")
# for chromosome, length in chromosome_lengths.items():
# print("{}\t0\t{}".format(chromosome, length))
with open("karyotype.txt", "w") as file:
file.write("Chr\tStart\tEnd\n")
for chromosome, length in chromosome_lengths.items():
file.write("{}\t0\t{}\n".format(chromosome, length))
可以通过运行gene_karyotype.py在当前目录下得到一个karyotype.txt以制表符分隔的文件(后面的可视化需要的前提文件)
2.绘制染色体长度和基因密度可视化的R脚本
其中利用了RIdeogram包进行主要的工作,如有需要请引用: 2020. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Computer Science 6:e251 https://doi.org/10.7717/peerj-cs.251
root@dc2147a07b60:/home/masiyi# cat test.R
# 导入RIdeogram包
library(RIdeogram)
# 设置输入文件路径
input_file <- "./Cinnamomum_japonicum.gff3.gz"
karyotype_file <- "./gene_karyotype.txt" # 是上一步py脚本得到的染色体相对长度的结果文件
# 运行GFFex函数并将结果保存为gene_density对象
gene_density <- GFFex(input = input_file, karyotype = karyotype_file, feature = "gene", window = 1000000)
gene_karyotype <- read.delim("/home/masiyi/gene_karyotype.txt")
# data(gene_karyotype, package="RIdeogram")
# 将gene_density保存为txt文件
# output_file <- "/home/masiyi/gene_density.txt"
# write.table(gene_density, file = output_file, sep = "\t", quote = FALSE, row.names = FALSE)
# gene_density <- read.table("./gene_density.txt")
# data(gene_density, package="RIdeogram")
# 染色体及基因密度可视化保存为svg
ideogram(karyotype = gene_karyotype, overlaid = gene_density)
convertSVG("chromosome.svg", device = "png")
可以得到最终的svg和png格式的染色体长度和基因密度的矢量图文件,如果需要中间的基因密度的过程文件进行校正和其他工作,可以取消 '将gene_density保存为txt文件' 这一步的注释,会生成一个gene_density.txt的过程文件