methylCRF
Overview
No current assay of DNA methylation provides single-CpG resolution, comprehensive genome-wide coverage, and cost feasibility for a typical laboratory. To address this gap, we introduce methylCRF, a novel Conditional Random Field-based algorithm that integrates methylated DNA immunoprecipitation (MeDIP-seq) and methylation-sensitive restriction enzyme (MRE-seq) sequencing data to predict DNA methylation levels at single CpG resolution. Our method is a combined computational and experimental strategy to produce DNA methylomes of all 28 million CpGs in the human genome for a fraction (<10%) of the cost of whole genome bisulfite sequencing methods.
methylCRF was benchmarked for accuracy against Infinium arrays, RRBS, WGBS sequencing and locus specific-bisulfite sequencing performed on the same embryonic stem cell line. methylCRF transformation of MeDIP-seq/MRE-seq was equivalent to a biological replicate of WGBS in quantification, coverage and resolution.
This page describes how to download the methylCRF-suite of programs and run it on a MeDIP-seq/MRE-seq data set.
Usage
Name
- methylCRF.pl - Predict genome-wide CpG methylation levels using MeDIP-seq and MRE-seq assays.
Synapsis
- methylCRF.pl [options] <MeDIP_seq bed> <MRE_seq bed> <MRE virtual digest xx_cpg.bed> >mCRF.bed 2>err
Description
Arguments
MeDIP_seq.bed | MeDIP-seq (with with uniq alignemnts) |
MRE_seq.bed | MRE-seq (normalized read counts by MRE_norm.pl) |
MRE vdigest | CpG's sampled by MRE-seq. format: CpGid |
Options
-gdir | directory with model specific files: crf.list, cut.list, model files: [crfname].mdl [dflt: mdl] |
-gdat | directory with genomic specific files: cpg.bed (Epigenetic Roadmap blacklist removed), gdata.tbl, [crfnm]_cpg.bin [dflt: gdat] |
-gap | The size of gap between consecutive CpG's for which to start a new region: 750bp was used for model in paper. [dflt: 750] |
-eid | Prefix to add to all output files [dflt: eid] |
-sfrom/-gto | Optionally only run certain steps (sfrom: start from, gto: go untill):
|
Requirements
- a working Unix-like OS
- standard GNU tools and perl.
- samtools
- pre-aligned MeDIP-seq and MRE-seq libraries in bam format
How-to
- Get methylCRF code by either:
- git
git clone https://bitbucket.org/mscse/methylcrf.git
- or wget
wget https://bitbucket.org/mscse/methylcrf/get/fd50dc97b171.zip unzip fd50dc97b171.zip
- git
- Run make to compile binaries and set up links
cd methylCRF make
*Note: stay in the methylCRF directory for the following steps
- Retrieve a methylCRF model
- H1ES
wget http://methylcrf.wustl.edu/h1es_mdl.tgz tar -zxf h1es_mdl.tgz
- H1ES
- Retrieve species-specific data files
- hg19
wget http://methylcrf.wustl.edu/hg19/hg19_gdat.tgz tar -zxf hg19_gdat.tgz
- Please email to request other species not listed below.
- hg19
- Prepare MeDIP-seq and MRE-seq data
- Generate from sequencing experiments
- MeDIP Generate sorted bed file of unique alignments. Note that the reads should be virtually extended to expected fragment length. The score field should 1.
- ex: chr1 <begin> <begin+length> . 1 <strand>
- MRE Generate normalized MRE counts
-
Retreive an MRE fragment file for. Note for 5-enzyme MRE replace '3enz' with '5enz'. (email me for other species)
wget http://methylcrf.wustl.edu/hg19/MRE_frags/MRE_3enz_4_6000.bed
- Run MRE_norm.pl. It outputs <expr_prefix>_MRE.bed (as well as some diagnostic files)
NOTE: the call number is the number of bases that were cut off the read during sequencing (it's optional)( samtools view <bam1> |sam2bed.pl -r -q <minMapq> -c <bam1call> - samtools view <bam2> |sam2bed.pl -r -q <minMapq> -c <bam2call> - ) | MRE_norm.pl - MRE_{3,5}enz_4_6000.bed <expr_prefix>
- Generate from sequencing experiments
- Alternately Download preprocessed H1ES data used for the paper:
wget http://methylcrf.wustl.edu/h1es_dipmre.tgz tar -zxf h1es_dipmre.tgz
- Estimate Methylation by running methylCRF
-
for example, for the H1ES data set used in the paper:
export PATH="~/src/methylCRF/:$PATH" ./methylCRF.pl -eid H1ES -mdir h1es_mdl -gdat hg19_gdat H1ESmrg_DIP.bed H1ESmrg_MRE.bed MRE_3enz_4_6000_cpg.bin >H1ES_mCRF.bed 2>err
NOTE2: if some script can't find another, check that you have . in your path (ie, in: echo $PATH)
-
for example, for the H1ES data set used in the paper:
MRE Protocol Enzymes
- 3 enzyme:
HpaII (CCGG),
SsiI (CCGC),
Hin6I (GCGC)
- 4 enzyme:
above 3,
HpyCH4IV (ACGT)
- 5 enzyme:
above 4,
BstUI (CGCG)
Downloads
Experimental Data
- 3 enzyme: HpaII (CCGG), SsiI (CCGC), Hin6I (GCGC)
- 4 enzyme: above 3, HpyCH4IV (ACGT)
- 5 enzyme: above 4, BstUI (CGCG)
Downloads
Experimental Data
- H1ES
- MeDIP: unique alignment locations bed
- MRE: normalized read counts bed
Model
Species Data
Future
- As more WGBS data becomes evailable, we plan to retrain methylCRF with combinations of cell-types.
- We will continue to explore the integration of more types of data.