UMass Chan Medical School, Lamar Soutter Library. Education. Research. Health Care. Empowering the future. Preserving the past.

Researcher Tools, Services and Support

The purpose of this guide is to provide resources and information to the UMass Medical School community about the Library's research and scholarly communication services.

BioProject (formerly Genome Project)

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The BioProject Database is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for a cellular organism.

The BioProject Quick Start Guide can help you begin using this resource.


Epigenomics is a resource for exploring and visualizing richly annotated epigenomics datasets.

Other Useful Resources

Clone DB

The Clone database integrates information about clones and libraries, including sequence data, map positions and distributor information. (Formerly the NCBI Clone Registry.)

A comprehensive overview of this resource is available.


  • The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
  • The Expressed Sequence Tags database (dbEST) is a collection of short single-read transcript sequences from GenBank. These sequences provide a resource to evaluate gene expression, find potential variation, and annotate genes. 
  • The Genome Survey Sequences database (dbGSS) is a collection of unannotated, short, single-read, primarily genomic sequences from GenBank, including random survey sequences, clone-end sequences, and exon-trapped sequences.

FAQ: Which of these three databases should I search?

The Nucleotide, GSS, and EST databases all contain nucleic acid sequences. The data in GSS and EST come from two large bulk sequence divisions of GenBank. Searching any of the three provides links to results in the others. However, unless you are looking for a specific set of EST or GSS sequences, searching the Nucleotide database with general text queries will produce the most relevant results. You can always follow links to EST and GSS results from the Nucleotide database results.
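
For researchers who prefer to script a general text query rather than use the web search form, the following is a minimal sketch of how such a query against the Nucleotide database might be built for NCBI's Entrez E-utilities ESearch service. The function name build_esearch_url, the example search term, and the retmax default are illustrative assumptions, not part of this guide. The code only constructs the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an ESearch URL for a general text query.

    build_esearch_url is a hypothetical helper for illustration only.
    db defaults to the Nucleotide database, the starting point
    recommended above for general text queries.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and brackets so the query is URL-safe.
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example query (an assumed search term, chosen only for illustration).
url = build_esearch_url("BRCA1 AND human[Organism]")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Check NCBI's usage guidelines for request rate limits before sending queries in bulk.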


Genome organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations. It contains information on more than 1000 organisms. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.


UniSTS is a comprehensive database of sequence tagged sites (STSs) derived from STS-based maps and other experiments. STSs are defined by PCR primer pairs and are associated with additional information such as genomic position, genes, and sequences.

Genome Reference Consortium

This group is responsible for the human and mouse reference genomes. Along with NCBI, members include:

  • The Genome Center at Washington University
  • The Wellcome Trust Sanger Institute
  • The European Bioinformatics Institute (EBI)

The GRC works to correct misrepresented loci and to close remaining assembly gaps. The public website allows users to view genomic regions currently under review, report genome-related problems, and contact the Consortium.