MethylC-Seq Analysis Notes
- Enter a screen
screen -S bcl-conversion
- Navigate to the directory containing the run (e.g. 130909_SNL119_0105_AC2GYKACXX)
- Check that the sample sheet is configured correctly. If there is only one adapter in the lane, remove the adapter sequence from the sample sheet.
- Run the following (modified with the correct run name, sample sheet, etc.). The final value (--fastq-cluster-count) can be changed to alter the number of reads per file:
/usr/local/packages/CASAVA_v1.8.2/bcl2fastq/build/bin/configureBclToFastq.pl --input-dir /dd_rundata/hiseq/Runs/130909_SNL119_0105_AC2GYKACXX/Data/Intensities/BaseCalls/ --sample-sheet /dd_rundata/hiseq/Runs/130909_SNL119_0105_AC2GYKACXX/SampleSheet.csv --fastq-cluster-count 50000000
- Navigate to newly created Unaligned directory (under top run directory) and enter:
nohup make -j 12
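The configure command can be assembled from the run name so that the input directory and the sample sheet always point at the same run. A minimal sketch, assuming the directory layout used in the command above; the function name build_configure_command and its parameters are illustrative only, and the command is printed rather than executed:

```python
import shlex

# Paths copied from the configure command shown in these notes.
CONFIGURE = "/usr/local/packages/CASAVA_v1.8.2/bcl2fastq/build/bin/configureBclToFastq.pl"
RUNS_ROOT = "/dd_rundata/hiseq/Runs"


def build_configure_command(run_name, cluster_count=50000000):
    """Return the configureBclToFastq.pl argument list for one run.

    build_configure_command is a hypothetical helper, not part of CASAVA.
    """
    run_dir = f"{RUNS_ROOT}/{run_name}"
    return [
        CONFIGURE,
        "--input-dir", f"{run_dir}/Data/Intensities/BaseCalls/",
        "--sample-sheet", f"{run_dir}/SampleSheet.csv",
        "--fastq-cluster-count", str(cluster_count),
    ]


cmd = build_configure_command("130909_SNL119_0105_AC2GYKACXX")
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

Printing the command first makes it easy to check the run name and sample sheet path before anything is launched.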
Moving and renaming files
- Copy run files from run directory to working directory:
cp Project_E_grandis ~/working_data
- Rename fastq files to
- Store sequence files in a separate directory, e.g. sequences. If you have data from the same library but multiple runs, store each run in a separate directory.
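One way to keep runs of the same library apart is to plan one directory per run before copying anything. A minimal sketch; the "sequences_<run>" naming scheme, the second run name and the helper plan_run_dirs are all hypothetical and only illustrate the idea:

```python
from pathlib import PurePosixPath


def plan_run_dirs(sample_dir, run_names):
    """Map each sequencing run to its own directory under sample_dir."""
    # "sequences_<run>" is an assumed naming scheme, not one taken from the notes.
    return {run: PurePosixPath(sample_dir) / f"sequences_{run}" for run in run_names}


layout = plan_run_dirs(
    "working_data/Project_E_grandis",
    ["130909_SNL119_0105_AC2GYKACXX", "second_run_placeholder"],
)
for run, directory in layout.items():
    print(run, "->", directory)
```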
Multiple samples can be mapped at a time. Use map.php (0 mismatches) or map_2mm.php (1 or 2 mismatches) to map all the reads to the genome (follow the instructions):
php /home/lister/working_data/php/methpipe_se/map.php | tee -a log.txt
For paired-end (PE) data:
php /home/lister/working_data/php/methpipe_pe/map.php | tee -a log.txt
Do one sample at a time. This step generates all the tables in MySQL (used for AnnoJ and the DMR script).
This script takes mapped read runs and can merge sets, convert to .slam format, sort reads, collapse, trim, split, stack, hammer, and import reads, stacks and mC into MySQL.
Start with a mapped directory containing the subdirectories that contain the *_final mapped files.
Navigate to the directory above the mapped run data and start a screen:
screen -S postmap
Start the postmap script as follows:
php /home/lister/working_data/php/methpipe_se/post_map.php | tee -a log.txt
You will see the following prompts:
- Do you want to perform stage 1 (merge mapped runs, convert to .slam, sort, collapse, trim reads) (y/n): y
- Do you want to perform stage 2 (import mapped reads into MYSQL) (y/n): y
- Do you want to perform stage 3 (stack and hammer) (y/n): y
- Do you want to perform stage 4 (import stacks into MYSQL) (y/n): y
- Do you want to perform stage 5 (import mC's into MYSQL) (y/n): y
- Do you want to perform stage 6 (correct mammalian mCH for genotype) (y/n): n
- Do you want to perform stage 7 (make and import allC tables) (y/n): y
- Do you want to perform stage 8 (identify partially methylated domains) (y/n): n
Enter the path to mapped folder when prompted.
Shows a summary of all options, then:
- Based on these settings, do you want to proceed (y/n): y
- Number of libraries that make up the sample: 1
- Enter run folder names in library 1 (space delim): sequences <-- name of sample folder
Methylpy DMR finder
Add the following to your
alias methylenv='source /usr/local/virtualenv/methylenv/bin/activate; export PYTHONPATH=/usr/local/packages/methylpy:/usr/local/packages/methylpy/methylpy'
All Methylpy steps must be done inside the methylenv. To exit the methylenv, type
If you don’t have the package already, it can be cloned from Bitbucket:
git clone git@bitbucket.org:schultzmattd/methylpy.git
To update, from the directory created by the clone:
To test methylpy:
There are 3 steps to the DMR finding algorithm:
Perform a root mean square test (you can think of it as similar to a chi-square test) on each site across all samples. P-values are simulated (i.e., the data are randomized many times to see how often a result at least as extreme as the observed one arises), which adjusts for multiple testing.
Calculate the threshold p-value for a desired FDR.
Aggregate any significant sites that lie within X bp of each other and show changes in the same direction (e.g., sample A is methylated and sample B is unmethylated) into a window.
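The first and third steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not methylpy's implementation: the statistic is a plain root mean square deviation from expected counts, the p-value is simulated by shuffling read labels across samples, every site is assumed to have coverage, and all sites are assumed to lie on one chromosome. The function names are hypothetical.

```python
import math
import random


def rms_statistic(counts):
    """Root mean square deviation of observed from expected counts.

    counts holds one (methylated, unmethylated) read-count pair per sample.
    Simplified statistic; not necessarily methylpy's exact definition.
    """
    total = sum(m + u for m, u in counts)
    total_meth = sum(m for m, _ in counts)
    squared = 0.0
    for m, u in counts:
        depth = m + u
        exp_m = depth * total_meth / total
        exp_u = depth * (total - total_meth) / total
        squared += (m - exp_m) ** 2 + (u - exp_u) ** 2
    return math.sqrt(squared / (2 * len(counts)))


def simulated_p_value(counts, n_sim=200, seed=0):
    """Estimate a p-value by shuffling read labels across samples."""
    rng = random.Random(seed)
    observed = rms_statistic(counts)
    depths = [m + u for m, u in counts]
    # 1 = methylated read, 0 = unmethylated read, pooled over all samples.
    labels = [1] * sum(m for m, _ in counts) + [0] * sum(u for _, u in counts)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for depth in depths:
            meth = sum(labels[start:start + depth])
            shuffled.append((meth, depth - meth))
            start += depth
        if rms_statistic(shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


def collapse_sites(sites, max_dist):
    """Merge position-sorted (position, direction) sites into windows.

    A site joins the current window when it lies within max_dist bp of the
    previous site and has the same direction (+1 or -1). Each window is
    returned as (start, end, direction, number_of_sites).
    """
    windows = []
    for pos, direction in sites:
        if windows and windows[-1][2] == direction and pos - windows[-1][1] <= max_dist:
            start, _, _, n = windows[-1]
            windows[-1] = (start, pos, direction, n + 1)
        else:
            windows.append((pos, pos, direction, 1))
    return windows


print(simulated_p_value([(20, 0), (0, 20)]))    # strongly different samples
print(simulated_p_value([(10, 10), (10, 10)]))  # identical samples
sites = [(100, 1), (150, 1), (220, 1), (900, -1), (950, -1), (960, 1)]
print(collapse_sites(sites, max_dist=100))
print(collapse_sites(sites, max_dist=50))
```

Running collapse_sites with two different max_dist values shows why the window distance usually needs tuning: a smaller distance splits the first group of sites into separate windows.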
Generation of allC files
Edit the allC generating script:
Sample base names:
Run allC generating script
Move to folder called “allC”:
$ methylenv
(methylenv) $ python create_allc_file_egrandis.py
Run all samples at the same time within an experiment.
Edit the Python script named DMR_find.py with your sample names and parameters.
Run the script in a folder named “DMR”:
python dmr_find.py > dmr_find_output.txt
Edit the histogram_correction.py script with the name of the _rms_results.tsv file from the allC step.
python histogram_correction.py >> histogram_correction_output.txt
Use the p-value determined by histogram correction for the collapse step.
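How a threshold p-value follows from a target FDR can be illustrated with the Benjamini-Hochberg procedure. This is a swapped-in textbook technique used only for illustration; it is not claimed to be what histogram_correction.py does, and the function name bh_threshold and the example p-values are made up.

```python
def bh_threshold(p_values, fdr=0.01):
    """Largest p-value passing the Benjamini-Hochberg criterion, or None."""
    ranked = sorted(p_values)
    n = len(ranked)
    threshold = None
    for rank, p in enumerate(ranked, start=1):
        # A p-value passes when it is at most fdr * rank / n.
        if p <= fdr * rank / n:
            threshold = p
    return threshold


# Made-up p-values for six sites.
p_values = [0.001, 0.004, 0.03, 0.2, 0.5, 0.8]
cutoff = bh_threshold(p_values, fdr=0.05)
significant = [p for p in p_values if cutoff is not None and p <= cutoff]
print(cutoff, significant)
```

Sites with a p-value at or below the cutoff would be the ones passed on to the collapse step.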
Edit the collapse.py script with your sample names and parameters. This may need to be changed and run several times to find the right parameters.
Run the script on the