Recently a method for imputing single cell gene expression matricies was posted on biorxiv by David van Dijk et al., called magic (Markov Affinity-based Graph Imputation of Cells). I’ve been analysing single cell RNA-seq data recently, and this method looks like it could be useful when trying to find co-transcriptional networks, as single cell data suffers from dropout which makes finding co-transcriptional networks hard.read more
Table of Contents
Clone the repo if you haven’t already:
git clone https://github.com/timoast/dac.git
Install the following packages:read more
I recently analysed some smRNA data for a paper I’m working on. These are my analysis notes.
I used previously published data for Brachypodium, from this paper:
Garvin DF, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, et al. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463: 763–768. doi:10.1038/nature08747
First step is to download the data:
$ wget ftp://ftp-trace.ncbi.nlm.nih.gov//sra/sra-instant/reads/ByStudy/sra/SRP/SRP001/SRP001895/SRR035616/SRR035616.sra $ fastq-dump SRR035616.sra $ pigz SRR035616.fastq
Illumina TruSeq adapter sequence:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 50 reads.fastq \ | seqtk trimfq -l 50 - \ | pigz > filtered_reads.fq.gz
2 March 2016
After posting a my first preprint to bioRxiv a few weeks ago, I have been periodically checking the number of views and PDF downloads. I became interested to see how many downloads or views the preprints on bioRxiv typically get, but this type of information isn’t actually available. What are the all-time top bioRxiv preprints? How many people are reading bioRxiv preprints on average? No-one knows! Altmetric must track this data, as it will tell you how a particular preprint ranks in relation to others, but that data hasn’t been made publicly available (as far as I can tell).read more
In an effort to read more papers this year, I’m going to read a paper (or something paper-like) each day for the remainder of the year, and post each paper below as I go.
Galanter JM, Gignoux CR, Oh SS, Torgerson D, Pino-Yanes M, Thakur N, et al. Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures. eLife. doi:10.7554/eLife.20532read more
This week I presented some recent work at the Lorne Genome conference. It was my first time at Lorne and it’s a fantastic meeting with lots of interesting talks. It was great to see so many people getting interested in transposon biology too!
Here are a few links to the work I was presenting:
TEPID TE presence/absence variant discovery software:
While there are very fast and easy ways to extract reads from a bam file according to mapping location, extracting reads by read name is more difficult.
Simple methods, like using
grep, are incredibly slow if you want to look for more than a few reads.
pysam allows you to index a bam file by read name (using
pysam.IndexedReads(AlignmentFile)) while keeping the sort order. However, the index is stored in memory so this can use a lot of RAM (my tests with a 5.7 GB bam file used about 9 GB RAM).
I wrote a small python script (below) that uses pysam to extract reads by read name from a bam file.
Extracting 10 reads from a 5.7 GB bam file, just using
grep is slightly faster than the python script:
I’ve been looking for a simple way to estimate paired-end insert sizes from mapped data. There are a few more complicated tools available (I think picard will do it), as well as simple scripts different people have written that don’t exclude outliers. However outliers due to discordant read alignments will hugely change the mean, so need to be removed.
As I couldn’t find anything that was a nice middle ground, I ended up writing my own simple python script that will take input from stdin and exclude outliers more that two standard deviations from the median, and print the mean and standard deviation to stdout.
It can be used to estimate the insert size from sam or bam files:
$ head -10000 mapped.sam | python mean_size.py 220 35
$ samtools view mapped.bam | head -10000 | python mean_size.py 220 35
screen -S bcl-conversion