Researcher Tools, Services and Support

The purpose of this guide is to provide resources and information to the UMass Medical School community about the Library's research and scholarly communication services.

BioProject (formerly Genome Project)

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The BioProject Database is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for a cellular organism.

The BioProject Quick Start Guide can help you begin using this resource.

Sequencing Databases

  • The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
  • The Expressed Sequence Tags database (dbEST) is a collection of short single-read transcript sequences from GenBank. These sequences provide a resource to evaluate gene expression, find potential variation, and annotate genes. 
  • The Genome Survey Sequences database (dbGSS) is a collection of unannotated short single-read primarily genomic sequences from GenBank including random survey sequences, clone-end sequences, and exon-trapped sequences.

FAQ: Which of these three databases should I search?

The Nucleotide, GSS, and EST databases all contain nucleic acid sequences. The data in GSS and EST are from two large bulk sequence divisions of GenBank. Searching any of the three will provide links to results in the others. However, unless you are looking for a specific set of EST or GSS sequences, searching the Nucleotide database with general text queries will produce the most relevant results. You can always follow links to results in EST and GSS from the Nucleotide database results.


Other Related Resources

How to Obtain the Genomic Sequence for a Gene


There are several ways to search and retrieve data from GenBank.

  • Search GenBank for sequence identifiers and annotations with Entrez Nucleotide, which has three divisions: CoreNucleotide (the main collection), dbEST (Expressed Sequence Tags), and dbGSS (Genome Survey Sequences).
  • Search and align GenBank sequences to a query sequence using BLAST (Basic Local Alignment Search Tool). BLAST searches CoreNucleotide, dbEST, and dbGSS independently; see BLAST info for more information about the numerous BLAST databases.
  • Search, link, and download sequences programmatically using NCBI E-utilities.
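
As a rough illustration of the programmatic route in the last item above, the following Python sketch builds E-utilities request URLs for searching the Nucleotide database and downloading FASTA records. It is a minimal sketch, not an official client: the helper names (esearch_url, efetch_fasta_url, search_ids) are hypothetical, the query text and accession numbers are only examples, and it assumes the public E-utilities endpoints esearch.fcgi and efetch.fcgi with the "nuccore" database name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the public NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore", retmax=5):
    """Build an ESearch URL for a text query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(ids, db="nuccore"):
    """Build an EFetch URL that returns the given records as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def search_ids(term, db="nuccore", retmax=5):
    """Run the search over the network and return the matching record IDs."""
    with urlopen(esearch_url(term, db, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Example query only; field tags such as [Organism] narrow the search.
    query = "BRCA1[Gene Name] AND Homo sapiens[Organism]"
    print(esearch_url(query))
    # Example accession only; replace with IDs returned by search_ids().
    print(efetch_fasta_url(["NM_007294"]))
```

The URL builders run without network access, so a query can be inspected before it is sent. search_ids() does contact NCBI; anyone scripting many requests should check NCBI's current usage guidelines first.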


The Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC).

Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information. They are generated to provide reference standards for multiple purposes ranging from genome annotation to reporting locations of sequence variation in medical records. The RefSeq collection can be retrieved in several different ways, including:

  • PubMed
  • Nucleotide
  • Protein
  • Gene
  • Map Viewer
  • RefSeq FTP site

Read more about RefSeq in Pruitt et al., The Reference Sequence (RefSeq) Database.


The database of single nucleotide polymorphisms (dbSNP) catalogs single nucleotide polymorphisms and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants. Kitts and Sherry's chapter, The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation, explains how to use this resource.