Self-paced
Introduction to R for Genomics
dplyr, ggplot2, vcfR, and Bioconductor. No prior coding experience needed.
The dataset
What You’ll Analyse
Every lesson uses data from a real next-generation sequencing experiment — not toy examples.
E. coli Long-Term Evolution Experiment
From Tenaillon et al. (2016): 12 populations of E. coli propagated for over 50,000 generations. You’ll work with three sequenced strains from the NCBI SRA, tracing how a citrate-using mutant (Cit+) arose and became dominant, and how hypermutability evolved alongside it.
- How many base-pair changes exist between Cit+ and Cit− strains?
- What specific mutations distinguish them?
- How do read depth and mapping quality vary across the genome?
# Load readr, dplyr, and ggplot2
library(tidyverse)

# Load the variant calls
variants <- read_csv("combined_tidy_vcf.csv")

# Filter high-confidence SNPs
variants %>%
  filter(QUAL >= 100, !INDEL) %>%
  select(sample_id, POS, REF, ALT, DP)

# Plot coverage across the genome
ggplot(variants, aes(x = POS, y = DP, color = sample_id)) +
  geom_point(alpha = 0.5) +
  labs(x = "Base Pair Position", y = "Read Depth (DP)")
Curriculum
Eight Lessons, One Genomics Workflow
Every concept is taught through real variant-calling data. You learn R because you’re solving a genomics problem.
01
Introducing R & RStudio
Set up your environment, create projects, write scripts, and learn the RStudio interface.
45 min
02
R Basics
Vectors, data types, arithmetic operators, subsetting, and logical operations — the building blocks.
80 min
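A small taste of the R Basics lesson, sketched in base R. The positions below are made-up values for illustration, not rows from the course dataset, and the variable names are hypothetical.

```r
# Hypothetical SNP positions (illustrative values only)
snp_positions <- c(9972, 263235, 281923, 433359)

# Every vector has a data type
class(snp_positions)

# Arithmetic is applied element by element
positions_kb <- snp_positions / 1000

# Subsetting by index and by logical condition
first_two <- snp_positions[1:2]
beyond_100kb <- snp_positions[snp_positions > 100000]

# Logical operators combine conditions; sum() counts the TRUE values
in_window <- snp_positions > 100000 & snp_positions < 300000
sum(in_window)
```

The same bracket-and-condition pattern carries over to data frames later in the course.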
03
The Example Dataset
Meet the SNP dataset and VCF file format you’ll work with throughout the workshop.
15 min
04
Factors & Data Frames
Load tabular data, understand tidy data principles, work with factors, and import from Excel.
90 min
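To give a feel for factors and data frames, here is a minimal base R sketch. The table is invented for illustration (the sample names and values are assumptions); the real lesson loads the course's variant file instead.

```r
# Hypothetical variant table: one row per variant, one column per variable
variants_demo <- data.frame(
  sample_id = c("strain_A", "strain_A", "strain_B", "strain_C"),
  POS = c(9972, 263235, 281923, 433359),
  REF = c("T", "G", "G", "C"),
  stringsAsFactors = FALSE
)

# Inspect column types and the first values
str(variants_demo)

# Convert a character column to a factor (a categorical variable)
variants_demo$REF <- factor(variants_demo$REF)

# Levels are the distinct categories, sorted alphabetically by default
levels(variants_demo$REF)

# Count how many rows fall in each level
ref_counts <- table(variants_demo$REF)
ref_counts
```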
05
Bioconductor Packages
Install and use packages from the Bioconductor repository for genomics analysis.
15 min
06
Data Wrangling with dplyr
Transform data using select, filter, mutate, pipes, and the split-apply-combine pattern.
55 min
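The split-apply-combine pattern can be sketched as follows, assuming the dplyr package is installed. The small table is made up for illustration; only the column names (sample_id, POS, QUAL, DP) mirror those used in the course's variant data.

```r
library(dplyr)

# Hypothetical variant calls (illustrative values only)
variants_demo <- tibble(
  sample_id = c("strain_A", "strain_A", "strain_B", "strain_B", "strain_C"),
  POS = c(9972, 263235, 281923, 433359, 473901),
  QUAL = c(91, 85, 217, 64, 228),
  DP = c(4, 6, 10, 12, 11)
)

depth_summary <- variants_demo %>%
  filter(QUAL >= 80) %>%              # keep rows that pass a quality cutoff
  mutate(POS_kb = POS / 1000) %>%     # add a derived column
  group_by(sample_id) %>%             # split: one group per sample
  summarise(                          # apply a summary, combine into one table
    n_variants = n(),
    mean_DP = mean(DP)
  )

depth_summary
```

Each verb takes a data frame and returns a data frame, which is what lets the pipe chain the steps together.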
07
Visualisation with ggplot2
Build scatter plots, histograms, box plots, facets, and custom themes for publication figures.
90 min
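As a rough sketch of the kind of figure this lesson builds, assuming ggplot2 is installed: a histogram of read depth, split into one panel per sample. The depth values are invented for illustration, and the output file name in the final comment is hypothetical.

```r
library(ggplot2)

# Hypothetical read depths for three samples (illustrative values only)
variants_demo <- data.frame(
  sample_id = rep(c("strain_A", "strain_B", "strain_C"), each = 4),
  DP = c(4, 6, 5, 7, 10, 12, 9, 11, 11, 13, 8, 10)
)

depth_plot <- ggplot(variants_demo, aes(x = DP, fill = sample_id)) +
  geom_histogram(binwidth = 2) +            # distribution of read depth
  facet_wrap(~ sample_id) +                 # one panel per sample
  labs(x = "Read Depth (DP)", y = "Number of variants") +
  theme_bw() +                              # a clean built-in theme
  theme(legend.position = "none")           # panels already name the samples

# To save the figure, something like:
# ggsave("depth_by_sample.png", depth_plot, width = 6, height = 3)
```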
08
Getting Help with R
Use built-in help, write effective search queries, and ask good questions on forums.
15 min
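A few base R tools that fit this lesson, as a minimal sketch. The built-in iris dataset stands in for your own data, and the variable names are hypothetical.

```r
# Open a function's help page (run these in the RStudio console)
# ?read.csv
# help("read.csv")

# Inspect a function's arguments without leaving the console
args(read.csv)

# Find functions whose names contain a keyword
mean_functions <- apropos("mean")

# Share a small, copy-pasteable piece of data when asking a question
example_data <- head(iris, 3)
dput(example_data)

# Report your R version and loaded packages alongside the question
session_details <- sessionInfo()
session_details
```

Pasting the dput() output and the session details into a forum post lets others reproduce your problem exactly.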
Who it’s for
Built for Bench Scientists, Not Developers
No computer science background needed. Just curiosity and a laptop.
Genomics Researchers
You generate sequencing data but rely on others for downstream analysis. This workshop gives you independence.
Graduate Students
Starting a computational project and need a structured, hands-on entry point with real biological data.
Wet Lab Researchers
You want to go from raw variant calls to figures without waiting weeks for a bioinformatician.
Beyond Excel
You’ve hit the limits of spreadsheets — data too large, analyses not reproducible, plots not publication-ready.
Before you begin
Get Set Up in 10 Minutes
STEP 01
Install R & RStudio
Download R (version 4.0+) and RStudio Desktop — both free for Mac, Windows, and Linux.
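Once R is installed, one way to confirm the version meets the 4.0+ requirement is to run the following in the RStudio console. This is a minimal sketch; the variable name is hypothetical.

```r
# Print the installed R version
R.version.string

# TRUE if the installed version is 4.0.0 or newer
meets_requirement <- getRversion() >= "4.0.0"
meets_requirement
```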
STEP 02
Download Materials
Grab the datasets and exercise files (15 MB). A quick reference guide is also available.
STEP 03
Start Lesson 1
Open RStudio, take a deep breath, and enter the workshop. We’ll walk you through everything from here.
Ready?
Start Analysing Your First Genome
Everything you need — data, code, and guided instruction — is waiting inside.
15–20 hours · Self-paced · Always available