跳转至

计算panel bed编码区长度

Bash
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/bash
#@File    :   run.sh
#@Time    :   2023/11/21 09:32:17
#@Author  :   biolxy
#@Version :   1.0
#@Contact :   biolxy@aliyun.com
#@Desc    :   None

inputbed=$1

# export PATH=/data/biogonco/lixy/bedtools/bedtools2/bin:$PATH  # v2.30.1 不行
export PATH=/data/bioinfo_project/bioinfo_miniconda3/bin:$PATH  # v2.30.0 不行
# /data/biogonco/lixy/bedtools/bedtools2/bin/bedtools

bedtools merge -i ${inputbed} > merge.bed

bedtools summary -i merge.bed -g /mnt/nas_001/project/oncodemo/genome_and_annotation/hg19/gatk/b37/chrom.sizes > merge.bed.summary  # 统计的是 bed 区间的长度


# 去 gencode 下载 gencode.v41.annotation.gff3
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/GRCh37_mapping/gencode.v44lift37.annotation.gff3.gz
# /mnt/nas_001/project/oncodemo/genome_and_annotation/genome/gffread-0.12.7.Linux_x86_64/gffread gencode.v44lift37.annotation.gff3 -T -o gencode.v44lift37.annotation.bed
# awk '$3 == "CDS"' gencode.v44lift37.annotation.bed | sed 's/^chr//g' > coding_CDS.bed

# coding_CDS.bed 为编码区bed

bedtools intersect -a merge.bed -b /mnt/nas_001/project/oncodemo/genome_and_annotation/genome/coding_CDS.bed | sort -k1,1 -k2,2n |  bedtools merge -i - > panel.cds.bed

# panel.cds.bed 为 pane 的 编码区 bed
bedtools summary -i panel.cds.bed -g /mnt/nas_001/project/oncodemo/genome_and_annotation/hg19/gatk/b37/chrom.sizes > panel.cds.summary 

注意

要注意bedtools的版本问题,我测试的 v2.30.0可以,v2.30.1 不行