WGBS Analysis notes -- BS-seeker2

Tim Stuart 2016-03-29 2 minute read

Whole genome bisulfite sequencing analysis notes for BS-seeker2.

Step 1: Trim adapters and low quality bases


| seqtk trimfq -l 50 - \
| pigz > filtered_reads.fq.gz

Step 2: Quality control

This step can also be performed on the unfiltered fastq file to compare output before and after read trimming.

fastqc filtered_reads.fq.gz

Step 3: Alignment

3.1: Build genome index

I added the lambda genome as an extra chromosome, L, to calculate the bisulfite nonconversion rate.

python bs_seeker2-build.py -f tair10_lambda.fa --aligner=bowtie2

3.2: Align

python bs_seeker2-align.py -g tair10_lambda.fa --aligner=bowtie2 \
-u unmapped.fa -o mapped.bam --bt2-p 4 -i filtered_reads.fq.gz

Step 4: Post-processing of alignment files

Sort and remove PCR duplicates

samtools sort -@ 10 mapped.bam sorted
picard MarkDuplicates I=sorted.bam O=filtered.bam M=dup_metrics.txt REMOVE_DUPLICATES=true AS=true

Step 5: Call methylation

python bs_seeker2-call_methylation.py -i filtered.bam --sorted -o sample_name \
--db /home/tstuart/working_data/GitHub/BSseeker2/bs_utils/reference_genomes/tair10_lambda.fa_bowtie2/

Step 6: Build database

I like to have the methylation data in a SQL database indexed by position so that it can be easily and quickly be queried.

First create an empty MySQL database that will contain the data tables. In MySQL type:

create database DATABASE_NAME

Now run the build_tables.py script that will create and index a table from the BS-Seeker2 CGMap file

python build_tables.py --host HOST_ADDRESS -p PASSWORD -u USER -d DATABASE_NAME -f CG_MAP_FILE -n SAMPLE_NAME

Program versions:

  • python 2.7.10
  • cutadapt 1.9.1
  • seqtk 1.0-r82-dirty
  • fastqc 0.11.5
  • BS-seeker2 v2.0.10
  • samtools 1.2
  • hstlib 1.2.1
  • picard 2.1.1