Tim Stuart     About     Archive     Publications     Feed

Useful bioinformatics

Trim reads

With cutadapt

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 30 -o output.fq.gz input.fq.gz

With trimmomatic

trimmomatic PE -threads 10 \
      reads_1.fastq.gz reads_2.fastq.gz \
      reads_1_trim.fq reads_1_se_trim.fq reads_2_trim.fq reads_2_se_trim.fq \
      ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 \
      LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Filter short reads from FASTQ

gzip -d -c *.fastq.gz | paste - - - - \
  | awk 'length($2) > 25' \
  | tr "\t" "\n" 

Map RNA-seq

# build index
STAR --runThreadN 10 \
     --runMode genomeGenerate \
     --genomeDir genome \
     --genomeFastaFiles genome.fa \
     --sjdbGTFfile genes.gtf
     
# map
STAR --runThreadN 20 \
     --genomeDir genome \
     --alignIntronMax 5000 \
     --alignIntronMin 10 \
     --readFilesCommand zcat \
     --quantMode GeneCounts \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix output_ \
     --readFilesIn input.fq.gz

Pseudomap RNA-seq with kallisto

# build index 
kallisto index -i index.idx Arabidopsis_thaliana.TAIR10.cds.all.fa

#quantify
for filename in *.fq.gz; do
    samplename=(${filename//.fq.gz/ })
    kallisto quant -i index.idx -o $samplename -b 100 --single -l 150 -s 20 $filename -t 4
done

Create coverage track from bam file

bamCoverage is part of deepTools.

bamCoverage -b reads.bam -o coverage.bw -p 10

Create UCSC browser track

You need a track hub, a trackdb file, and track files. These should be placed in a publicly accessible place.

Track hub:

hub Arabidopsis
shortLabel RootRNAseq
longLabel Root
genomesFile genomes.txt
email timstuart90@gmail.com

trackdb

include athTha1.annotationTracks.txt
include root/root.track.txt

Track

track root_coverage
type bigWig
bigDataUrl protoplast/root.bw
shortLabel root
longLabel root_coverage
visibility squish
priority 2
maxHeightPixels 100:60:10

On UCSC go to my data > track hubs > my hubs and enter the address of the track hub file.

Bedtools

bedtools merge -i dmrs_ddc.tsv -d 100 > dmrs_ddc_merged.tsv
bedtools intersect -a a.bed -b b.bed -wa -f 0.50 > targets.bed

Bowtie

Mapping PE data

bowtie2 -p8 --local --fr -q -R 5 -N 1 -x [path/to/bowtie2/index] -X 1000 \
        -1 read_1.fq.gz -2 read_2.fq.gz -S aligned.sam 

Samtools

samtools can read from a stream, so can pipe output in from other tools (eg bowtie to get .bam output) or other samtools commands.

Convert from sam to bam

samtools view -bS file.sam > file.bam

Sort bamfile

samtools sort -@ 20 -O bam -T tmp file.bam 

Samblaster

Extracts reads from .sam alignment files

Get discordant reads

bowtie2 [options] | samblaster -e -d file.sam | samtools view -bS - > file.bam