Software & Tools
45+ bioinformatics tools curated by category - language tags, usage notes, and direct links for genomics, RNA-seq, metagenomics, variant calling, and data visualisation.
Essential Starting Points
Core tools every bioinformatician reaches for - sequence search, quality control, and variant discovery.
Sequence alignment and database search for similarities across DNA, RNA, and protein. Among the most widely used tools in bioinformatics.
Extended Unix awk tailored for biological data - processes FASTQ, FASTA, SAM, and BED natively.
Broad Institute's suite for variant discovery and genotyping. Industry standard for germline and somatic calling.
Ultra-fast all-in-one FASTQ preprocessor - adapter trimming, quality filtering, QC reporting, and deduplication in one pass.
Quality control for raw sequencing data - per-base quality scores, GC content, overrepresented sequences, adapter content.
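To make two of the QC metrics above concrete, here is a minimal Python sketch of GC content and per-base quality decoding for a single read. It is an illustration only, not how any listed tool is implemented. The function names and the example read are hypothetical, and the quality string is assumed to use Phred+33 encoding, which is typical of current Illumina output.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding by default; older data may use another offset.
    """
    return [ord(ch) - offset for ch in quality]


# Hypothetical read used only for illustration.
read_seq = "GATTACAGGC"
read_qual = "IIIIIIIII#"

print(gc_content(read_seq))     # fraction of G/C bases in the read
print(phred_scores(read_qual))  # one integer quality score per base
```

A dedicated QC tool aggregates metrics like these across millions of reads and per position, so the sketch is useful for understanding the numbers in a report rather than for replacing one.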
The R Ecosystem
Everything you need to get R running for bioinformatics - the essential starting points.
The statistical computing language behind DESeq2, edgeR, limma, Seurat, and 2,000+ Bioconductor packages.
Primary source for installing R on Windows, macOS, and Linux. Start every R project here.
Download RStudio Desktop (free) - the most popular IDE for R, maintained by Posit.
Home for 2,000+ R packages for bioinformatics - DESeq2, edgeR, limma, GenomicRanges, and hundreds more.
Pipeline & Workflow Systems
Scale beyond single scripts - workflow languages used in production bioinformatics and HPC environments.
Python-based workflow management for reproducible, scalable bioinformatics pipelines.
Scalable workflow framework with native HPC and cloud support. nf-core provides production pipelines.
Broad Institute's engine for portability across local, HPC, and cloud. Executes WDL and CWL.
Workflow Description Language - portable specification used on Terra and with Cromwell.
RNA-seq Analysis Tools
Alignment, quantification, and differential expression - the complete RNA-seq analysis stack.
R/Bioconductor package for differential gene expression using negative binomial modelling.
R package for differential expression using empirical Bayes and count-based models.
Fast splice-aware aligner for mapping RNA-seq reads to a reference genome.
Ultrafast splice-aware aligner - gold standard for mapping at scale.
Alignment-free transcript quantification. Extremely fast and memory-efficient.
Step-by-step RNA-seq differential expression guide using Limma, Glimma, and edgeR in R.
Visualisation Resources
Tools and references for publication-quality graphics and interactive dashboards.
Guidance on choosing the right chart, with R, Python, D3.js, and React examples.
Official printable cheat sheets for ggplot2, dplyr, tidyr, and RStudio.
RColorBrewer, viridis, and ggplot2 palettes with visual previews.
Data visualisation platform for interactive dashboards. Free tier available.
Metagenomics Tools
Classify and profile microbial communities from amplicon and shotgun sequencing data.
Variant Calling & Structural Analysis
Detect, genotype, and interpret genetic variants - from SNPs to large structural rearrangements.
Toolset for genome-wide association studies and population genetics.
Haplotype-based variant detector for SNPs, MNPs, indels, and complex variants using Bayesian modelling.
Command-line tools for variant calling and manipulating VCF/BCF files. Built on HTSlib, from the SAMtools project.
Structural variant detection using split reads, discordant pairs, and read depth signals.
Illumina's structural variant and indel caller for paired-end sequencing - rapid and scalable.
Additional Useful Tools
Networks, notebooks, containers, genome browsers, and open platforms for reproducible research.
Visualisation and analysis of biological networks and pathways.
Interactive notebook environment for Python, R, and Julia.
Containerise workflows for fully reproducible, portable pipelines.
Container system optimised for HPC environments - no root required.
Open-source platform for reproducible, browser-based bioinformatics.
Interactive tool for exploring genome annotations and custom data tracks.
Key Bioinformatics File Formats
A quick reference to the file formats you'll encounter most in genomics workflows.
Raw reads with per-base quality scores. Standard output from Illumina sequencers.
Biological sequences without quality information. Used for reference genomes.
Aligned reads. BAM is the compressed binary form of SAM. An index (.bai) is required for random access.
Compressed alternative to BAM using reference-based compression. Reduces storage costs.
Variant Call Format - SNPs, indels, and structural variants with genotype data.
Tab-delimited genomic intervals (chromosome, start, end) with optional extra fields.
Gene annotation files describing features - genes, transcripts, exons, and UTRs.
AnnData format - standard for single-cell analysis (Scanpy). Stores matrices and metadata.
Serialised R object - Seurat, DESeqDataSet, SummarizedExperiment saved to disk.
Universal tabular format for count matrices, metadata, and analysis results.
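As a rough illustration of two of the formats above, the following Python sketch reads four-line FASTQ records and computes the length of a BED interval. It is a simplified sketch under stated assumptions, not a replacement for an established parser: the names `FastqRecord`, `read_fastq`, and `bed_interval_length` are hypothetical, each FASTQ record is assumed to occupy exactly four lines, and BED coordinates are treated as 0-based with an exclusive end.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming four lines per record."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "")
        sep = next(it, "")
        qual = next(it, "")
        # A valid record has an '@' header, a '+' separator, and
        # sequence and quality strings of equal length.
        if (not header.startswith("@") or not sep.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def bed_interval_length(line: str) -> int:
    """Return the length of a BED interval (0-based start, exclusive end)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return int(end) - int(start)


# Hypothetical in-memory data used only for illustration.
fastq_text = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
for record in read_fastq(fastq_text):
    print(record.name, record.sequence, record.quality)

print(bed_interval_length("chr1\t100\t200\tgeneA\n"))
```

Because BED starts are 0-based and ends are exclusive, the interval length is simply end minus start, with no off-by-one adjustment.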