The Protein database is a collection of sequences from several sources including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
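Records in the Protein database can also be retrieved programmatically. The following is a minimal sketch, assuming access through the NCBI E-utilities `efetch` endpoint. The helper name `protein_fasta_url` and the accession shown are illustrative only and are not part of the description above. The sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def protein_fasta_url(accession: str) -> str:
    """Build an efetch URL requesting one protein record as FASTA text.

    `protein_fasta_url` is a hypothetical helper for illustration.
    """
    params = {
        "db": "protein",      # query the Protein database
        "id": accession,      # accession.version or other record identifier
        "rettype": "fasta",   # ask for FASTA-formatted sequence
        "retmode": "text",    # plain text rather than XML
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Example accession chosen only for illustration.
    print(protein_fasta_url("NP_000537.3"))
```

The resulting URL can be opened in a browser or passed to any HTTP client. Heavy use of E-utilities is subject to NCBI's usage guidelines.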
The CDD is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.
Learn more about Conserved Domains and Protein Classification.
The Protein Clusters database is a collection of protein sequences, consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools including multiple alignments, phylogenetic trees, and genomic neighborhoods (ProtMap).
A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record gives users a single place to find links to the diverse data types generated for that project. The BioProject Database is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms.
The BioProject Quick Start Guide can help you begin using this resource.
The Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC).
Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information. They are generated to provide reference standards for multiple purposes ranging from genome annotation to reporting locations of sequence variation in medical records. The RefSeq collection can be retrieved in several different ways including:
Read more about RefSeq in Pruitt et al., The Reference Sequence (RefSeq) Database.
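RefSeq records for DNA, RNA, and protein molecules are distinguished by a two-letter accession prefix followed by an underscore. The following is a rough sketch of how a script might map that prefix to a molecule type. The table covers only a subset of common prefixes, the labels are paraphrased, and `refseq_molecule_type` is a hypothetical helper, so the RefSeq documentation remains the authority on the full list.

```python
import re

# Partial, illustrative mapping of RefSeq accession prefixes to molecule types.
REFSEQ_PREFIXES = {
    "NC": "genomic (complete molecule)",
    "NG": "genomic (region)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "mRNA (predicted)",
    "XR": "non-coding RNA (predicted)",
    "XP": "protein (predicted)",
    "WP": "protein (non-redundant)",
}

# Two uppercase letters, an underscore, digits, and an optional ".version".
_ACCESSION = re.compile(r"^([A-Z]{2})_\d+(\.\d+)?$")


def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq-style accession, or None.

    None is returned when the text does not look like a RefSeq accession
    or when its prefix is not in the partial table above.
    """
    match = _ACCESSION.match(accession)
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))
```

A caller could use this to route records, for example sending accessions classified as protein to a protein-specific workflow and skipping anything that returns `None`.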