methylCRF

Combining MeDIP-seq and MRE-seq to estimate single CpG methylation genome wide.


Overview

No current assay of DNA methylation provides single-CpG resolution, comprehensive genome-wide coverage, and cost feasibility for a typical laboratory. To address this gap, we introduce methylCRF, a novel Conditional Random Field-based algorithm that integrates methylated DNA immunoprecipitation (MeDIP-seq) and methylation-sensitive restriction enzyme (MRE-seq) sequencing data to predict DNA methylation levels at single CpG resolution. Our method is a combined computational and experimental strategy to produce DNA methylomes of all 28 million CpGs in the human genome for a fraction (<10%) of the cost of whole genome bisulfite sequencing methods.

methylCRF was benchmarked for accuracy against Infinium arrays, RRBS, WGBS sequencing and locus specific-bisulfite sequencing performed on the same embryonic stem cell line. methylCRF transformation of MeDIP-seq/MRE-seq was equivalent to a biological replicate of WGBS in quantification, coverage and resolution.

This page describes how to download the methylCRF-suite of programs and run it on a MeDIP-seq/MRE-seq data set.


Usage

Name

Synapsis

Description

Arguments
MeDIP_seq.bedMeDIP-seq (with with uniq alignemnts)
MRE_seq.bedMRE-seq (normalized read counts by MRE_norm.pl)
MRE vdigestCpG's sampled by MRE-seq. format: CpGid[0|1] where CpGid corresponds to cpg.bed.
Options
-gdirdirectory with model specific files: crf.list, cut.list, model files: [crfname].mdl [dflt: mdl]
-gdatdirectory with genomic specific files: cpg.bed (Epigenetic Roadmap blacklist removed), gdata.tbl, [crfnm]_cpg.bin [dflt: gdat]
-gapThe size of gap between consecutive CpG's for which to start a new region: 750bp was used for model in paper. [dflt: 750]
-eidPrefix to add to all output files [dflt: eid]
-sfrom/-gtoOptionally only run certain steps (sfrom: start from, gto: go untill):
  1. Generate MRE/DIP avg window score files (1.5 DIPMRE count files)
  2. Make MRE-sampled CpG file [deprecated]
  3. Generate CRF tables (~ 1.5 hr)
  4. Prediction
  5. Combine predictions into ensemble
  6. make dirs to put extra files in [have to give explicitely]

Requirements

  1. a working Unix-like OS
  2. standard GNU tools and perl.
  3. samtools
  4. pre-aligned MeDIP-seq and MRE-seq libraries in bam format

How-to

  1. Get methylCRF code by either:
    • git
      git clone https://bitbucket.org/mscse/methylcrf.git
    • or wget
      wget https://bitbucket.org/mscse/methylcrf/get/fd50dc97b171.zip 
      unzip fd50dc97b171.zip
      

  2. Run make to compile binaries and set up links
    cd methylCRF
    make
    
  3. *Note: stay in the methylCRF directory for the following steps

  4. Retrieve a methylCRF model
    • H1ES
      wget http://methylcrf.wustl.edu/h1es_mdl.tgz
      tar -zxf h1es_mdl.tgz
      

  5. Retrieve species-specific data files
    • hg19
      wget http://methylcrf.wustl.edu/hg19/hg19_gdat.tgz
      tar -zxf hg19_gdat.tgz
      
    • Please email to request other species not listed below.
  6. Prepare MeDIP-seq and MRE-seq data
    • Generate from sequencing experiments
      • MeDIP
      • Generate sorted bed file of unique alignments. Note that the reads should be virtually extended to expected fragment length. The score field should 1.
        • ex: chr1 <begin> <begin+length> . 1 <strand>
      • MRE
      • Generate normalized MRE counts
        • Retreive an MRE fragment file for. Note for 5-enzyme MRE replace '3enz' with '5enz'. (email me for other species)
              wget http://methylcrf.wustl.edu/hg19/MRE_frags/MRE_3enz_4_6000.bed
          

        • Run MRE_norm.pl. It outputs <expr_prefix>_MRE.bed (as well as some diagnostic files)
        • ( 
           samtools view <bam1> |sam2bed.pl -r -q <minMapq> -c <bam1call> - 
           samtools view <bam2> |sam2bed.pl -r -q <minMapq> -c <bam2call> - 
          ) | MRE_norm.pl - MRE_{3,5}enz_4_6000.bed <expr_prefix> 
          NOTE: the call number is the number of bases that were cut off the read during sequencing (it's optional)
    • Alternately Download preprocessed H1ES data used for the paper:
      wget http://methylcrf.wustl.edu/h1es_dipmre.tgz
      tar -zxf h1es_dipmre.tgz
  7. Estimate Methylation by running methylCRF
    • for example, for the H1ES data set used in the paper:
      export PATH="~/src/methylCRF/:$PATH"
      ./methylCRF.pl -eid H1ES -mdir h1es_mdl -gdat hg19_gdat H1ESmrg_DIP.bed H1ESmrg_MRE.bed MRE_3enz_4_6000_cpg.bin >H1ES_mCRF.bed 2>err
      
      NOTE1: use the 3 or 5 enzyme fragment file use for MRE normalization.
      NOTE2: if some script can't find another, check that you have . in your path (ie, in: echo $PATH)

MRE Protocol Enzymes


Downloads

Experimental Data

Model

Species Data


Future






methylCRF Google Group

Please post questions here or send questions directly to methylcrf _at_ googlegroups.com