Tim Stuart     About     Archive     Publications     Feed

Using python decorators

Yesterday I wrote my first python decorator. Decorators have always seemed a bit mysterious to me, but having finally written one I can see a bit better how they work. This is the decorator I wrote:

read more

Create a computational lab notebook with bookdown

Every data analysis I do now is kept in an R Markdown document. These are great for mixing code with explanatory text, and you can run code in many languages not just R. Whenever I finished working on something, I would compile the R Markdown document into a self-contained html report and save is somewhere, usually with a descriptive filename like “coverage_genes” or “col_vs_cvi”.

read more

Installing Magic

Installing magic

Recently a method for imputing single cell gene expression matricies was posted on biorxiv by David van Dijk et al., called magic (Markov Affinity-based Graph Imputation of Cells). I’ve been analysing single cell RNA-seq data recently, and this method looks like it could be useful when trying to find co-transcriptional networks, as single cell data suffers from dropout which makes finding co-transcriptional networks hard.

read more

R demo

Table of Contents

Getting started

Clone the repo if you haven’t already:

git clone https://github.com/timoast/dac.git

Install RStudio.

Install the following packages:

read more

smRNA analysis notes

I recently analysed some smRNA data for a paper I’m working on. These are my analysis notes.

I used previously published data for Brachypodium, from this paper:

Garvin DF, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, et al. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463: 763–768. doi:10.1038/nature08747

First step is to download the data:

$ wget ftp://ftp-trace.ncbi.nlm.nih.gov//sra/sra-instant/reads/ByStudy/sra/SRP/SRP001/SRP001895/SRR035616/SRR035616.sra
$ fastq-dump SRR035616.sra
$ pigz SRR035616.fastq
read more

WGBS Analysis notes -- BS-seeker2

Whole genome bisulfite sequencing analysis notes for BS-seeker2.

Step 1: Trim adapters and low quality bases

Illumina TruSeq adapter sequence: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 50 reads.fastq \
| seqtk trimfq -l 50 - \
| pigz > filtered_reads.fq.gz
read more

bioRxiv

A look at bioRxiv preprints

Tim Stuart
2 March 2016

After posting a my first preprint to bioRxiv a few weeks ago, I have been periodically checking the number of views and PDF downloads. I became interested to see how many downloads or views the preprints on bioRxiv typically get, but this type of information isn’t actually available. What are the all-time top bioRxiv preprints? How many people are reading bioRxiv preprints on average? No-one knows! Altmetric must track this data, as it will tell you how a particular preprint ranks in relation to others, but that data hasn’t been made publicly available (as far as I can tell).

read more

A paper a day

In an effort to read more papers this year, I’m going to read a paper (or something paper-like) each day for the remainder of the year, and post each paper below as I go.

Dec 31

Galanter JM, Gignoux CR, Oh SS, Torgerson D, Pino-Yanes M, Thakur N, et al. Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures. eLife. doi:10.7554/eLife.20532

read more

Lorne Genome 2016 presentation

This week I presented some recent work at the Lorne Genome conference. It was my first time at Lorne and it’s a fantastic meeting with lots of interesting talks. It was great to see so many people getting interested in transposon biology too!

Here are a few links to the work I was presenting:

TEPID TE presence/absence variant discovery software:

https://github.com/ListerLab/TEPID

Poster:

https://dx.doi.org/10.6084/m9.figshare.2082757.v1

Manuscript:

http://dx.doi.org/10.1101/039511

read more

Extract reads from bam file by read name

While there are very fast and easy ways to extract reads from a bam file according to mapping location, extracting reads by read name is more difficult.

Simple methods, like using grep, are incredibly slow if you want to look for more than a few reads.

Luckily, pysam allows you to index a bam file by read name (using pysam.IndexedReads(AlignmentFile)) while keeping the sort order. However, the index is stored in memory so this can use a lot of RAM (my tests with a 5.7 GB bam file used about 9 GB RAM).

I wrote a small python script (below) that uses pysam to extract reads by read name from a bam file.

Extracting 10 reads from a 5.7 GB bam file, just using grep is slightly faster than the python script:

read more