UMass Chan Medical School, Lamar Soutter Library. Education. Research. Health Care. Empowering the future. Preserving the past.

Researcher Tools, Services and Support

The purpose of this guide is to provide resources and information to the UMass Medical School community about the Library's research and scholarly communication services.

Protein


The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.

Conserved Domain Database (CDD)

The CDD is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.

Learn more about Conserved Domains and Protein Classification.

Protein Clusters

The Protein Clusters database is a collection of protein sequences, consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools including multiple alignments, phylogenetic trees, and genomic neighborhoods (ProtMap).

BioProject (formerly Genome Project)

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The BioProject Database is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for a cellular organism.

The BioProject Quick Start Guide can help you begin using this resource.

Reference Sequence (RefSeq)

The Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC).
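Because RefSeq covers DNA, RNA, and protein records, it can help to tell them apart by accession number. The sketch below is illustrative only and is not part of this guide's source material: it assumes the commonly documented two-letter RefSeq accession prefixes (for example NM_ for mRNA and NP_ for protein), and the names `REFSEQ_PREFIXES` and `refseq_molecule_type` are hypothetical helpers introduced for this example. The prefix table is not exhaustive.

```python
import re

# Molecule type for some common RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "DNA", "NG": "DNA", "NT": "DNA", "NW": "DNA", "NZ": "DNA",
    "NM": "RNA", "NR": "RNA", "XM": "RNA", "XR": "RNA",
    "NP": "protein", "XP": "protein", "YP": "protein",
    "WP": "protein", "AP": "protein",
}

# RefSeq accessions look like two letters, an underscore, an identifier,
# and an optional ".version" suffix, e.g. NM_000518.5.
_ACCESSION = re.compile(r"^([A-Z]{2})_[A-Z0-9]+(\.\d+)?$")


def refseq_molecule_type(accession):
    """Return 'DNA', 'RNA', or 'protein' for a RefSeq-style accession.

    Returns None when the string does not look like a RefSeq accession
    or when its prefix is not in the table above.
    """
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NC_000011.10", "NM_000518.5", "NP_000509.1"):
        print(acc, refseq_molecule_type(acc))
```

A string without the underscore-delimited prefix, such as a GenBank accession, returns None rather than a guess.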

Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information. They are generated to provide reference standards for multiple purposes ranging from genome annotation to reporting locations of sequence variation in medical records. The RefSeq collection can be retrieved in several different ways, including:

  • PubMed
  • Nucleotide
  • Protein
  • Gene
  • Map Viewer
  • RefSeq FTP site
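
The routes above are interactive or bulk-download options. As a further illustration, the Nucleotide and Protein databases can also be queried programmatically through NCBI's E-utilities service, which this guide does not itself list. The sketch below is a minimal example under that assumption: it builds an `efetch` request for a single RefSeq accession in FASTA format. The function names `build_efetch_url` and `fetch_fasta` are hypothetical, and the accession in the usage example is only a sample.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein"):
    """Build an efetch URL that requests one record as plain-text FASTA.

    Use db="protein" for protein accessions and db="nuccore" for
    nucleotide accessions.
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="protein"):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(build_efetch_url(accession, db)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_fasta() to actually download.
    print(build_efetch_url("NP_000509.1"))
```

NCBI asks users of E-utilities to limit request rates, so a script that loops over many accessions should pause between calls or use an API key.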

Read more about RefSeq in Pruitt et al., The Reference Sequence (RefSeq) Database.