Resource Guides: NCBI and Sequencing Databases: NCBI Databases

Overview & Popular NCBI Databases and Tools

The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information.

Popular NCBI Databases and Tools:

Basic Local Alignment Search Tool (BLAST)
Finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.

Gene
Portal to gene information in most NCBI databases. Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide. Search with an organism name and gene symbol or name to find everything known about it (e.g., what it’s been called, associations with diseases, what it does, what it looks like, related experimental datasets, and more!).

Nucleotide
Collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.

Protein
Portal to protein information in most NCBI databases. The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
PubMed
Public access database to citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
To see the "check for full text at UMass Chan" button when you are logged in to NCBI, add the UMassMed Outside Tool to your NCBI account. To add it: sign in to My NCBI >> select your username in the upper right corner to open the Account Settings page >> select Site Preferences >> select Outside Tool under PubMed Preferences >> check the box next to "UMassMed users can check for full text" in the list of available Outside Tool services.

The power of NCBI's resources is found in their relationship to one another, as most are linked together (check out this example from the Structure Group), providing a comprehensive toolkit for researchers in biomedicine. View the full list of tools and resources:

NCBI Databases and Tools
The full list of biomedical and genomic databases and tools provided by the National Center for Biotechnology Information (NCBI).

To programmatically search and download records from NCBI databases, use the Entrez Programming Utilities (E-utilities).
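As a rough illustration of what a programmatic search and download can look like, the sketch below builds ESearch and EFetch request URLs with Python's standard library. The helper names (esearch_url, efetch_url, search_ids) are hypothetical and exist only for this example; the endpoint names and the db, term, retmax, id, rettype, and retmode parameters follow the E-utilities documentation, which should be checked for current usage limits and API key guidance before running anything at scale.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an EFetch URL that downloads the records with the given IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def search_ids(db, term, retmax=20):
    """Run the search over the network and return the list of record IDs."""
    with urlopen(esearch_url(db, term, retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]
```

For example, search_ids("nuccore", "BRCA1[Gene Name] AND Homo sapiens[Organism]", retmax=5) would query the Nucleotide database, and the resulting IDs could be passed to efetch_url("nuccore", ids) to obtain a FASTA download link. Only search_ids contacts NCBI; the two URL builders work offline.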

Help with NCBI Databases

Online tutorials and help are available at each site, and a collection of video tutorials can be found in the NCBI section of the National Library of Medicine (NLM) YouTube channel.

Other help and tutorials can be found on the NCBI Tutorials website.

NCBI Help Manual
Collection of help documents for resources at NCBI.
The NCBI Handbook, 2nd edition
Comprehensive overview of the breadth of informatics resources at NCBI, and an in-depth account of the scope, data, infrastructure, processing, and access for each major database or resource. Geared towards advanced users of NCBI resources to provide an understanding of how bioinformatics resources at NCBI work.

An annual update on database resources is published in Nucleic Acids Research:

Sayers, E., et al. (2024). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research.

Genetics & Medicine

Databases

ClinVar
Aggregates information about genomic variation and its relationship to human health.
dbGaP
Archives and distributes the results of studies that investigated the interaction of genotype and phenotype in humans. Users can obtain controlled access to data, download public data, and contribute their own results to the database.
Genetic Testing Registry
Provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials.

HuGE Navigator
An integrated, searchable knowledge base of genetic associations and human genome epidemiology.

MedlinePlus Genetics
Consumer-friendly information about the effects of genetic variation on human health.
NCBI Virus
Community portal for viral sequence data from RefSeq, GenBank and other NCBI repositories.
OMIM
Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. It was developed for the web by NCBI and is currently hosted, authored, and edited by Johns Hopkins University.

Publication

GeneReviews
Collection of expert-authored, peer-reviewed disease descriptions on the NCBI Bookshelf that apply genetic testing to the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.

Other NCBI Sequence Databases

BioProject
Database of complete and in-progress large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms, together with their data outputs. A BioProject record gives users a single place to find links to the diverse data types generated for that project.
Conserved Domain Database (CDD)
Resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.
dbSNP
Database of single nucleotide polymorphisms and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants.
dbVar
Database of genomic structural variation.

Gene Expression Omnibus (GEO)
Functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Search for studies in GEO Datasets and individual gene expression profiles based on gene annotation or pre-computed profile characteristics in GEO Profiles.

Pathogen Detection (Beta)
Bacterial and fungal pathogen genomic sequences from numerous ongoing surveillance and research efforts. System clusters related pathogen genome sequences to identify possible transmission chains and screens genomic sequences using AMRFinderPlus to identify the antimicrobial resistance, stress response, and virulence genes found in bacterial genomic sequences.
PopSet
Database contains related nucleotide sequences that originate from comparative studies: phylogenetic, population, environmental (ecosystem), and mutational. Each record in the database is a set of nucleotide sequences representing the same molecule from the same species (population, mutation), different identifiable species (phylogenetic), or anonymous species from the same biological community (ecosystem).
Protein Clusters
Collection of related protein sequences (clusters) consisting of proteins derived from the annotations of whole genomes, organelles, and plasmids. It is currently limited to Archaea, Bacteria, Plants, Fungi, Protozoans, and Viruses.
Protein Family Models
A collection of models representing homologous proteins with a common function. It includes conserved domain architecture, hidden Markov models and BlastRules.
RefSeq
Collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC).
Sequence Read Archive
High throughput sequencing data from next-generation sequencing technologies. Includes both raw sequencing data and alignment information in the form of read placements on a reference sequence.
Structure
Molecular modeling database that contains macromolecular 3D structures derived from the Protein Data Bank, as well as tools for their visualization and comparative analysis.
Taxonomy
Curated classification and nomenclature for all of the organisms in the public sequence databases. This currently represents about 10% of the described species of life on the planet. Good way to find all the NCBI records for a particular species.

Other Sequence Databases

BioGRID
General repository for interaction datasets.

Proteopedia
Collaborative 3D-encyclopedia of proteins & other molecules.

Other NCBI Sequence Analysis Tools

BLAST RefSeqGene
Performs a BLAST search of the genomic sequences in the RefSeqGene/LRG set. The default display provides ready navigation to review alignments in the Graphics display.
COBALT
COBALT is a protein multiple sequence alignment tool that finds a collection of pairwise constraints derived from the conserved domain database, the protein motif database, and sequence similarity, using RPS-BLAST, BLASTP, and PHI-BLAST.
Comparative Genome Viewer
Compare two genomes based on assembly-assembly alignments provided by NCBI.
Conserved Domain Search Service (CD Search)
Identifies the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (Reverse Position-Specific BLAST) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD).
Magic-BLAST
Maps large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons.
ORF Finder (Open Reading Frame Finder)
A graphical analysis tool that finds all open reading frames in a user's sequence or in a sequence already in the database. Sixteen different genetic codes can be used. The deduced amino acid sequence can be saved in various formats and searched against protein databases using BLAST.
ProSplign
A utility for computing alignments of proteins to genomic nucleotide sequences. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. As a result, ProSplign is accurate in determining splice sites and tolerant of sequencing errors.
Sequence Viewer
Provides a configurable graphical display of a nucleotide or protein sequence and features that have been annotated on that sequence. In addition to use on NCBI sequence database pages, this viewer is available as an embeddable webpage component. Detailed documentation including an API Reference guide is available for developers wishing to embed the viewer in their own pages.
Variation Viewer
Interactive examination and download of nucleotide variants for a specific locus.
VecScreen
A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen searches a query sequence for segments that match any sequence in a specialized non-redundant vector database (UniVec).
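
To make the idea behind an open reading frame search more concrete, here is a minimal sketch that scans all six reading frames of a DNA sequence for stretches that begin with ATG and end at a stop codon. It is not NCBI's ORF Finder implementation: it assumes the standard genetic code's three stop codons and ATG as the only start, whereas the real tool supports alternative genetic codes and start codons. The function names are hypothetical.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Find ATG-to-stop ORFs in all six frames.

    Returns tuples (strand, frame, start, end, orf_sequence). Coordinates
    are 0-based, end-exclusive, and for the "-" strand they refer to the
    reverse-complemented sequence. min_codons counts codons before the stop.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return results
```

Calling find_orfs("ATGAAATAG") reports a single ORF on the forward strand in frame 0 that spans the whole sequence. Open reading frames that run off the end of the sequence without a stop codon are ignored in this sketch.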

Other Sequence Analysis Tools

PSORT
Programs for subcellular localization prediction as well as other datasets and resources relevant to localization prediction.
Sanger Institute
Software, data downloads, databases available from the Institute.

Chemical & Bioassay Databases

BioSample
Descriptions of biological source materials used in experimental assays.

PubChem
Collection of freely accessible chemical information. Search chemicals by name, molecular formula, structure, and other identifiers. Find chemical and physical properties, biological activities, safety and toxicity information, patents, and citations.

NLM Technical Bulletin

The weekly update of technical news from the National Library of Medicine. You'll find previews of coming changes to databases, tips on tools, and more.

How to Obtain the Genomic Sequence for a Gene

How to retrieve data from GenBank

There are several ways to search and retrieve data from GenBank.

Search GenBank for sequence identifiers and annotations with Nucleotide.

Search and align GenBank sequences to a query sequence using BLAST (Basic Local Alignment Search Tool). BLAST searches Nucleotide; see BLAST info for more information about the numerous BLAST databases.

Search, link, and download sequences programmatically using NCBI E-utilities.
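
The programmatic route listed above can be sketched as follows. The example builds an EFetch URL for a single GenBank record, builds an ELink URL that asks which Nucleotide records are linked to a Gene record, and parses FASTA text once it has been downloaded. The function names are hypothetical, and the accession and Gene ID used in the usage note are illustrative values only; "nuccore" is the E-utilities name for the Nucleotide database, and "gb" and "fasta" are EFetch return types for GenBank flat files and FASTA.

```python
from urllib.parse import urlencode

# Base address of the E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genbank_record_url(accession, rettype="gb"):
    """Build an EFetch URL for one Nucleotide record ("gb" or "fasta")."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def gene_to_nucleotide_links_url(gene_id):
    """Build an ELink URL listing Nucleotide records linked to a Gene ID."""
    params = {"dbfrom": "gene", "db": "nuccore", "id": gene_id}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For instance, genbank_record_url("NM_007294", rettype="fasta") yields a link that can be opened in a browser or fetched with any HTTP client, and the returned text can then be passed to parse_fasta. None of these helpers contacts NCBI on its own.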
