Integrating ChIP-seq and RNA-seq with Cistrome BETA

Published: 2023-04-19 23:22:55 Author: Life·Intelligence

 

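Preface:

Once you have several kinds of sequencing data from the same sample, integrating them is the natural next goal. The general question is: how does the epigenome affect transcription?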

Basic data types:

  • TF binding: ChIP-seq and CUT&RUN
  • Histone profile: ChIP-seq and CUT&RUN
  • Open chromatin: ATAC-seq
  • Gene expression: RNA-seq

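The specific questions are:

  • How does epigenetic regulation of transcription actually work?
  • Can epigenomic data accurately predict gene expression (GEX)? This needs large sample sizes or single-cell data!
  • Can we work out how the DEGs are regulated by a given TF? Which pathways are involved? Which co-factors?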

 

What BETA does:

BETA has three functions:

  • (i) to predict whether the factor has activating or repressive function
  • (ii) to infer the factor’s target genes
  • (iii) to identify the motif of the factor and its collaborators, which might modulate the factor’s activating or repressive function.

In short: is the factor an activator or a repressor, which genes are its direct targets, and what are the motifs of the factor and its binding partners.

 

BETA-basic can be used to predict whether a factor has activating or repressive function and detect direct target genes. [You can do this by hand: it is just peak annotation, followed by splitting the genes by DEG direction.]

BETA-plus can be used to predict whether a factor has activating or repressive function, to detect direct target genes, and to analyze sequence motifs in target regions. [The statistics here are a bit more sophisticated.]

Both binding and differential expression data are required for BETA-basic and BETA-plus, whereas BETA-minus is used when only binding data are available to predict target genes. [Nothing special here: ChIPseeker, ChIPpeakAnno, and GREAT can all do this.]

 

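Pros:

  • BETA ranks everything for you and reports a P-value; doing the same by hand is somewhat tedious

Cons:

  • BETA involves hardly any algorithmic novelty, and there is nothing fancy about it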

 

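BETA inputs:

  • File 1: peak file in BED format. I think this should be the DAPs (differential peaks between conditions) rather than all peaks
  • File 2: DEG file, which needs at minimum the gene name, log2FC, and P-value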
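
To make the second input concrete, here is a minimal sketch of trimming a full differential-expression table down to the three columns listed above. The file names, the column layout of `deg_full.tsv` and the toy values are all assumptions for illustration. To my understanding this three-column layout (gene ID, up/down status, statistical value) is what BETA reads with `-k BSF`, but check that against the BETA manual before relying on it.

```shell
# Toy DEG table standing in for real DESeq2/limma output (values are made up).
workdir=$(mktemp -d)
printf 'gene\tlog2FoldChange\tpvalue\tpadj\n'  > "$workdir/deg_full.tsv"
printf 'KLK3\t2.10\t1e-8\t1e-6\n'             >> "$workdir/deg_full.tsv"
printf 'TMPRSS2\t1.50\t2e-5\t4e-4\n'          >> "$workdir/deg_full.tsv"
printf 'MYC\t-0.80\tNA\tNA\n'                 >> "$workdir/deg_full.tsv"

# Keep gene name, log2FC and adjusted P-value; drop the header and NA rows.
awk -F'\t' 'BEGIN{OFS="\t"} NR>1 && $2!="NA" && $4!="NA" {print $1,$2,$4}' \
    "$workdir/deg_full.tsv" > "$workdir/deg_bsf.txt"

cat "$workdir/deg_bsf.txt"
```

Rows with NA statistics are dropped here only to keep every remaining row numeric; whether to keep or filter such genes is a choice for your own analysis.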

 


 

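Code:

Installation

# BETA 1.0.7 runs on Python 2.7
conda create -y -n beta_chip python=2.7.15
conda activate beta_chip
# conda install -y -c hcc beta 
# conda install -y libiconv
pip install argparse
pip install numpy
# download and unpack the source, then install from inside it
wget http://cistrome.org/BETA/src/BETA_1.0.7.zip
unzip BETA_1.0.7.zip
cd BETA_1.0.7   # directory name assumed from the archive name
python setup.py install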

  

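# -p peaks, -e differential expression (-k LIM = LIMMA format), -g genome, -n output prefix
# --da 500: use the 500 most significant up and down genes (values in 0-1 are read as a fraction)
BETA basic \
    -p 3656_peaks.bed \
    -e AR_diff_expr.xls \
    -k LIM \
    -g hg19 \
    --da 500 \
    -n AR \
    -o basic_output_dir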

  

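# --gs is the genome FASTA used for the motif scan; it must be the same assembly as -g
# (hg19 here), because a GRCh38 FASTA does not match hg19 peak coordinates
BETA plus \
    -p 3656_peaks.bed \
    -e AR_diff_expr.xls \
    -k LIM \
    -g hg19 \
    --gs /path/to/hg19.fa \
    -n AR \
    -o plus_output_dir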

 

Test data: http://cistrome.org/BETA/src/BETA_test_data.zip

 

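How do you get meaningful results?

  1. Use DAPs together with DEGs: use differences to predict differences;
  2. Split the DAPs into gain and loss and look at each class separately;
  3. Other peak files can also be analyzed with BETA; again, split them into gain and loss.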
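
The gain/loss split can be sketched with awk. The input layout (chrom, start, end, name, log2FC, FDR), the file names, the 0.05 cutoff and the toy values are assumptions; adapt them to whatever your differential-peak tool writes. The output is 5-column BED (chrom, start, end, name, score) with |log2FC| as the score.

```shell
workdir=$(mktemp -d)
# Toy differential-peak table (made-up values): chrom, start, end, name, log2FC, FDR
printf 'chr1\t1000\t1500\tpeak1\t2.3\t0.001\n'  > "$workdir/dap.tsv"
printf 'chr1\t8000\t8400\tpeak2\t-1.7\t0.010\n' >> "$workdir/dap.tsv"
printf 'chr2\t500\t900\tpeak3\t0.2\t0.800\n'    >> "$workdir/dap.tsv"
printf 'chr3\t3000\t3600\tpeak4\t1.1\t0.040\n'  >> "$workdir/dap.tsv"

# Significant peaks with log2FC > 0 go to gain.bed, log2FC < 0 to loss.bed.
awk -F'\t' -v fdr=0.05 -v out="$workdir" 'BEGIN{OFS="\t"}
    $6 < fdr && $5 > 0 {print $1,$2,$3,$4,$5  > (out "/gain.bed")}
    $6 < fdr && $5 < 0 {print $1,$2,$3,$4,-$5 > (out "/loss.bed")}
' "$workdir/dap.tsv"

wc -l "$workdir/gain.bed" "$workdir/loss.bed"
```

Each of the two BED files would then be passed to `-p` in its own BETA run, so that gained and lost peaks are tested against the DEGs separately.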

 

 


 

Principle:

See the original paper.

Use peaks to predict DEGs.

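Each peak is weighted by a "monotonically decreasing function that is based on the distance between the binding site and transcription start site". I find this questionable: a single distal enhancer can raise gene expression 10-fold, so a strictly decreasing weight does not fit.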
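
To see how steep that decay is, here is a small sketch of the regulatory potential as I read it from the BETA paper: for each gene, sum exp(-(0.5 + 4·d/d0)) over its peaks, where d is the peak-to-TSS distance and d0 = 100 kb, ignoring peaks farther than d0. The input table, file names and values are made up, and the formula should be checked against the paper before use.

```shell
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
# Toy peak-to-TSS distances in bp (made-up values): gene, distance
printf 'GENE_A\t0\n'       > "$workdir/dist.tsv"
printf 'GENE_A\t50000\n'  >> "$workdir/dist.tsv"
printf 'GENE_B\t100000\n' >> "$workdir/dist.tsv"
printf 'GENE_B\t250000\n' >> "$workdir/dist.tsv"   # beyond d0, ignored

# Sum the distance-weighted contribution of every peak within d0 of each gene.
awk -F'\t' -v d0=100000 '
    $2 <= d0 { s[$1] += exp(-(0.5 + 4 * $2 / d0)) }
    END { for (g in s) printf "%s\t%.4f\n", g, s[g] }
' "$workdir/dist.tsv" | sort -k2,2nr > "$workdir/score.tsv"

cat "$workdir/score.tsv"
```

With these weights a peak 50 kb away counts about 7.4 times less than a peak at the TSS (a factor of e^2), which is exactly the behavior that penalizes strong distal enhancers.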

Core logic: gene expression changes associated with factor binding can give better confidence that a gene is a direct target
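
One way to combine the two lines of evidence is a rank product, which is how I understand BETA's target ranking from the paper: rank genes by regulatory potential and by differential expression, then multiply the two ranks after dividing each by the number of genes n. The sketch below uses a made-up three-gene table and ignores ties; it illustrates the idea and is not BETA's own code.

```shell
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
# Toy table (made-up values): gene, regulatory potential, differential-expression P-value
printf 'GENE_A\t0.69\t0.001\n'   > "$workdir/genes.tsv"
printf 'GENE_B\t0.01\t0.0001\n' >> "$workdir/genes.tsv"
printf 'GENE_C\t0.30\t0.2\n'    >> "$workdir/genes.tsv"

n=$(wc -l < "$workdir/genes.tsv" | tr -d ' ')

# 1) rank by regulatory potential (highest first) and append that rank as column 4
# 2) rank by P-value (smallest first); NR is then the expression rank
# 3) rank product = (binding rank / n) * (expression rank / n); smaller is better
sort -k2,2nr "$workdir/genes.tsv" \
    | awk 'BEGIN{OFS="\t"} {print $0, NR}' \
    | sort -k3,3n \
    | awk -v n="$n" '{printf "%s\t%.3f\n", $1, ($4 / n) * (NR / n)}' \
    | sort -k2,2n > "$workdir/rank_product.tsv"

cat "$workdir/rank_product.tsv"
```

GENE_A comes out on top because it is near the top of both lists, while GENE_B, despite having the smallest P-value, is pulled down by its weak binding evidence. Note that `sort -n` does not parse scientific notation, so P-values such as 1e-5 would need converting to plain decimals first.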

 

Can this be extended to other peak data?

  • Histone profile
  • Open chromatin

 

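Limitations:

BETA is still only finding associations; because it brings in a more reliable assumption (binding plus an expression change), the results should be more trustworthy than peak annotation alone.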

  • First, factor-binding sites and target genes usually lack a one-to-one relationship. The same factor could bind anywhere between the proximal promoter to hundreds of kilobases downstream to regulate gene expression.
  • Alternatively, the same binding site could regulate multiple genes by interacting with different promoters in different subpopulations of cells.
  • Second, not all factor-binding sites found in a ChIP-seq experiment are functional, potentially owing to the lack of collaborating factors or conditions favorable to their function.
  • Finally, the binding of one factor may cause secondary effects owing to transcriptional changes of its direct targets.

 

Other tools:

 

References: