Tim Stuart

bioRxiv 2017 update

I first looked at the bioRxiv submission data back in March 2016. A lot has changed since then, and bioRxiv has grown nearly 5-fold. Time for an update.

library(data.table)  # fread
library(dplyr)       # mutate, %>%
library(lubridate)   # ymd

collection_date <- ymd("2017_10_04")

dat <- fread("~/Documents/GitHub/biorxivData/data/biorxiv_data_2017_10_04.tsv") %>% 
  mutate(Age = collection_date - ymd(`Original submission`),
         Revised = `Original submission` != `Current submission`)

Useful bioinformatics

Trim reads

With cutadapt

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 30 -o output.fq.gz input.fq.gz

With trimmomatic

trimmomatic PE -threads 10 \
      reads_1.fastq.gz reads_2.fastq.gz \
      reads_1_trim.fq reads_1_se_trim.fq reads_2_trim.fq reads_2_se_trim.fq \
      ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 \
read more

Using python decorators

Yesterday I wrote my first python decorator. Decorators have always seemed a bit mysterious to me, but having finally written one I can see a bit better how they work. This is the decorator I wrote:

read more
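
The decorator itself is in the full post. As a general illustration of the mechanism, here is a minimal sketch of a decorator: a function that takes another function and returns a wrapped version of it. The names `timed` and `count_gc` are hypothetical, chosen only for this example, and this is not the decorator referred to above.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator: report how long each call to func takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} s")
        return result
    return wrapper


@timed  # equivalent to: count_gc = timed(count_gc)
def count_gc(sequence):
    """Count G and C bases in a DNA sequence (toy example)."""
    return sum(base in "GC" for base in sequence.upper())
```

Calling `count_gc("acgtGG")` runs `wrapper`, which times the original function, prints the elapsed time, and passes the return value through unchanged.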

Create a computational lab notebook with bookdown

Every data analysis I do now is kept in an R Markdown document. These are great for mixing code with explanatory text, and you can run code in many languages, not just R. Whenever I finished working on something, I would compile the R Markdown document into a self-contained html report and save it somewhere, usually with a descriptive filename like “coverage_genes” or “col_vs_cvi”.


Installing Magic

Recently a method for imputing single cell gene expression matrices was posted on bioRxiv by David van Dijk et al., called magic (Markov Affinity-based Graph Imputation of Cells). I’ve been analysing single cell RNA-seq data recently, and this method looks like it could be useful for finding co-transcriptional networks, which is hard in single cell data because of dropout.


R demo

Getting started

Clone the repo if you haven’t already:

git clone https://github.com/timoast/dac.git

Install RStudio.

Install the following packages:

read more

smRNA analysis notes

I recently analysed some smRNA data for a paper I’m working on. These are my analysis notes.

I used previously published data for Brachypodium, from this paper:

Garvin DF, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, et al. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463: 763–768. doi:10.1038/nature08747

First step is to download the data:

$ wget ftp://ftp-trace.ncbi.nlm.nih.gov//sra/sra-instant/reads/ByStudy/sra/SRP/SRP001/SRP001895/SRR035616/SRR035616.sra
$ fastq-dump SRR035616.sra
$ pigz SRR035616.fastq

WGBS Analysis notes -- BS-seeker2

Whole genome bisulfite sequencing analysis notes for BS-seeker2.

Step 1: Trim adapters and low quality bases


| seqtk trimfq -l 50 - \
| pigz > filtered_reads.fq.gz


A look at bioRxiv preprints

Tim Stuart
2 March 2016

After posting my first preprint to bioRxiv a few weeks ago, I have been periodically checking the number of views and PDF downloads. I became interested to see how many downloads or views the preprints on bioRxiv typically get, but this type of information isn’t actually available. What are the all-time top bioRxiv preprints? How many people are reading bioRxiv preprints on average? No-one knows! Altmetric must track this data, as it will tell you how a particular preprint ranks in relation to others, but that data hasn’t been made publicly available (as far as I can tell).


A paper a day

In an effort to read more papers this year, I’m going to read a paper (or something paper-like) each day for the remainder of the year, and post each paper below as I go.

Dec 31

Galanter JM, Gignoux CR, Oh SS, Torgerson D, Pino-Yanes M, Thakur N, et al. Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures. eLife. doi:10.7554/eLife.20532
