Databases & Data
Genomics and Sequence Databases
GenBank
Repository for nucleotide sequences and their protein translations. Includes annotations and metadata.
Ensembl Genome Browser
Genome-scale data for vertebrates, plants, and more. Includes annotations, comparative genomics, and regulatory data.
UCSC Genome Browser
Interactive platform for viewing genome assemblies with data on genes, sequences, regulatory elements, and conservation.
RefSeq
Curated collection of DNA, RNA, and protein sequences. Provides reference standards for genome annotation.
DDBJ (DNA Data Bank of Japan)
Partner to GenBank and EMBL-EBI in the International Nucleotide Sequence Database Collaboration (INSDC), sharing nucleotide sequence data globally.
GATK Resource Bundle
A comprehensive set of reference files, including:
- Human reference genomes (e.g., GRCh38, hg19)
- Variant calling reference files such as known SNPs (dbSNP) and indels (Mills_and_1000G)
- Precomputed files for Base Quality Score Recalibration (BQSR) and Variant Quality Score Recalibration (VQSR)
dbSNP
SNP reference files for population genetics and variant calling.
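Most of the sequence repositories above can also be queried programmatically. As a minimal sketch, assuming access through NCBI's E-utilities `efetch` endpoint, the following builds the request URL for a nucleotide record in GenBank flat-file format. The function names `build_efetch_url` and `fetch_record` are illustrative and not part of any library, and the accession used in the usage note is only an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an efetch URL for one record in the nucleotide (nuccore) database.

    rettype "gb" requests the GenBank flat file; "fasta" requests FASTA.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "gb") -> str:
    """Download the record as text (hypothetical helper; needs network access)."""
    with urlopen(build_efetch_url(accession, rettype), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `build_efetch_url("NM_000546")` only constructs the URL; `fetch_record` performs the actual download. For anything beyond occasional requests, NCBI's usage guidelines on request rates and API keys apply.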
Protein and Structural Databases
UniProt
Comprehensive resource for protein sequence and functional information.
Protein Data Bank (PDB)
Repository of 3D structural data of proteins, nucleic acids, and complexes.
InterPro
Functional analysis of proteins by classifying sequences into families.
Pfam
Database of protein families and associated annotations.
STRING
Database for protein-protein interaction networks.
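Protein sequences from these resources are commonly exchanged as FASTA text. The sketch below assumes the UniProt REST URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` and the UniProt header layout `>db|accession|entry_name description`. The helper names `uniprot_fasta_url` and `parse_fasta` are hypothetical, and the record in the usage note is a made-up toy entry, not real data.

```python
from urllib.parse import quote

# Base URL of the UniProtKB REST service.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession: str) -> str:
    """Return the URL of the FASTA download for one UniProtKB accession."""
    return f"{UNIPROT_REST}/{quote(accession)}.fasta"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a mapping of accession -> sequence.

    For UniProt-style headers (">sp|ACCESSION|NAME ...") the accession is the
    second pipe-separated field; otherwise the first word of the header is used.
    """
    records: dict[str, str] = {}
    accession = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split("|")
            words = line[1:].split()
            if len(fields) >= 3:
                accession = fields[1]
            else:
                accession = words[0] if words else ""
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[accession] += line
    return records
```

For example, parsing the toy text `">sp|P00000|TOY_HUMAN Toy protein\nMKTAY\nIAKQR\n"` yields a single entry keyed by `P00000` with the two sequence lines joined.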
Functional and Pathway Databases
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Collection of manually curated pathway maps and functional annotations.
Reactome
Curated pathways of biological processes.
BioCyc
Database of metabolic pathways and genome annotations.
WikiPathways
Open-access platform for biological pathway knowledge.
Oncology and Cancer Genomics Databases
COSMIC (Catalogue of Somatic Mutations in Cancer)
Comprehensive database of somatic mutations in human cancers. Includes data on mutations, gene fusions, copy-number changes, and clinical correlations.
GDC (Genomic Data Commons)
Repository for cancer genomic and clinical data.
ICGC (International Cancer Genome Consortium)
Platform providing genomic and clinical data from cancer research studies.
TCGA (The Cancer Genome Atlas)
Multi-omic dataset of tumor and matched normal samples spanning 33 cancer types, with associated clinical and phenotypic data.
Microbial and Metagenomics Databases
MG-RAST
Platform for analysis of metagenomic data.
Greengenes
16S rRNA gene database for microbial taxonomy and phylogeny.
SILVA
Comprehensive database for ribosomal RNA sequences.
IMG/M
Integrated Microbial Genomes and Microbiomes platform.
Epigenomics and Regulatory Databases
ENCODE
Resource of functional genomics data for gene regulation.
Roadmap Epigenomics
Epigenetic data for diverse cell types and tissues.
JASPAR
Open-access database of transcription factor binding profiles.
GTRD (Gene Transcription Regulation Database)
Transcription factor binding sites and gene regulation data.
Metagenomics Tools
QIIME2
Platform for analysing and visualising microbiome data.
Kraken2
A taxonomic sequence classification system.
MetaPhlAn
A tool for profiling the composition of microbial communities.
Variation and Population Genetics Databases
1000 Genomes Project
Catalog of human genetic variation.
gnomAD
Aggregated human genome and exome data.
HapMap
Resource for human haplotypes and SNP frequencies.
ExAC
Database of exome sequencing data from diverse populations; its data are now incorporated into gnomAD.
Miscellaneous Resources
Gene Ontology (GO)
Resource for functional annotation of genes.
ArrayExpress
Archive for functional genomics data.
BioSamples
Metadata repository for biological samples.
Human Protein Atlas
Comprehensive data on protein expression in human tissues and organs.