MethyC-Seq Analysis Notes

2014-10-29 4 minute read

Preliminary steps


  1. Enter a screen
screen -S bcl-conversion
  1. Navigate to directory with run (eg. 130909_SNL119_0105_AC2GYKACXX)
  2. Check sample sheet configured correctly. If only one adapter in lane, remove adapter sequence from sample sheet.
  3. Run the following (modified with correct run name, sample sheet etc). Can change final value to change number of reads in files:
/usr/local/packages/CASAVA_v1.8.2/bcl2fastq/build/bin/ --input-dir /dd_rundata/hiseq/Runs/130909_SNL119_0105_AC2GYKACXX/Data/Intensities/BaseCalls/ --sample-sheet /dd_rundata/hiseq/Runs/130909_SNL119_0105_AC2GYKACXX/SampleSheet.csv --fastq-cluster-count 50000000
  1. Navigate to newly created Unaligned directory (under top run directory) and enter:
nohup make -j 12

Moving and renaming files

  1. Copy run files from run directory to working directory:
cp Project_E_grandis ~/working_data
  1. Rename fastq files to s_1_sequence.txt, s_2_sequence.txt etc.
  2. Store sequence files a separate directory, eg. sequences. If you have data from the same library but multiple runs, store in separate directories.


Can do multiple samples at a time

Use map.php (0 mismatches, or use map_1mm.php or map_2mm.php for 1 or 2 mismatches) to map all the reads to the genome (follow instructions):

php /home/lister/working_data/php/methpipe_se/map.php | tee -a log.txt

For PE data

php /home/lister/working_data/php/methpipe_pe/map.php | tee -a log.txt


Do one sample at a time. This step will generate all the tables in mySQL (used for AnnoJ and DMR script).

This script can take mapped read runs, merge sets, convert to .slam format, sort reads, collapse, trim, split, stack, hammer, import reads, stacks and mC to MYSQL.

Start with a mapped dir containing the subdir that contain the \*_final mapped files.

Navigate to directory above mapped run data and start a screen:

screen -S postmap

Start the postmap script as follows:

php /home/lister/working_data/php/methpipe_se/post_map.php | tee -a log.txt

You will see the following prompts:

  -Do you want to perform stage 1 (merge mapped runs, convert to .slam, sort, collapse, trim reads) (y/n): y
  -Do you want to perform stage 2 (import mapped reads into MYSQL) (y/n): y
  -Do you want to perform stage 3 (stack and hammer) (y/n): y
  -Do you want to perform stage 4 (import stacks into MYSQL) (y/n): y
  -Do you want to perform stage 5 (import mC's into MYSQL) (y/n): y
  -Do you want to perform stage 6 (correct mammalian mCH for genotype) (y/n): n
  -Do you want to perform stage 7 (make and import allC tables) (y/n): y
  -Do you want to perform stage 8 (identify partially methylated domains) (y/n): n

Enter the path to mapped folder when prompted.

Shows a summary of all options, then:

- Based on these settings, do you want to proceed (y/n): y
- Number of libraries that make up the sample: 1
- Enter run folder names in library 1 (space delim): sequences  <-- name of sample folder

Methylpy DMR finder

Add the following to your .bashrc:

alias methylenv='source /usr/local/virtualenv/methylenv/bin/activate; export PYTHONPATH=/usr/local/packages/methylpy:/usr/local/packages/methylpy/methylpy'

All Methylpy steps must be done in using the methylenv. To exit the methylenv, type deactivate.

If you don’t have the package already it can be cloned from bitbucket:

git clone

To update, from the directory created by the clone:

git pull

To test methylpy:


There are 3 steps to the DMR finding algorithm:

  1. Perform a root mean square test (you can think of it like a chisquare test) on each site across all samples. P-values are simulated (i.e., randomize the data a bunch of times and see if you get a significant result), which adjusts for multiple testing.

  2. Calculate threshold p-value for a desired FDR.

  3. Aggregate any significant sites within X bp and showing changes in the same direction (e.g, sample A is methylated and sample B is unmethylated) into a window.

Generation of allC files

Edit the allC generating script:

Sample base names:

["sample_1_name", "sample_2_name"]

Database names:


MySQL server:


Run allC generating script

Move to folder called “allC”:

$ methylenv
(methylenv) $ python


Run all samples at the same time within an experiment.

Edit the python script named with your sample names and parameters.

Run script, in a folder named “DMR”:

python > dmr_find_ouput.txt

Histogram correction

Modify script with name of _rms_results.tsv file from allC step.

Run script:

python >> histogram_correction_output.txt

Use the p-value determined by histogram correction for the collapse step.


Edit the script with your sample names and parameters. This may need to be changed and run several times to find the right parameters. Run the script on the _rms_results.tsv file.