A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The BioProject Database is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for a cellular organism.
The BioProject Quick Start Guide can help you begin using this resource.
There are several ways to search and retrieve data from GenBank.
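One programmatic route is NCBI's Entrez Programming Utilities (E-utilities). The sketch below is illustrative rather than an official client: it only builds the request URLs for a search (ESearch) and a retrieval (EFetch) against the nucleotide database. The helper names `esearch_url` and `efetch_url`, the default parameter values, and the example query and accessions are assumptions chosen for demonstration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide", retmax=20):
    """Build an ESearch URL that returns IDs of records matching `term`.

    Hypothetical helper: name and defaults are chosen for illustration.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(ids, db="nucleotide", rettype="gb", retmode="text"):
    """Build an EFetch URL that retrieves records in GenBank flat-file format.

    `ids` is a list of accessions or UIDs; they are joined with commas.
    Hypothetical helper: name and defaults are chosen for illustration.
    """
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example: search by gene name and organism, then fetch two records by accession.
search = esearch_url("BRCA1[Gene Name] AND Homo sapiens[Organism]")
fetch = efetch_url(["U49845", "NM_000546"])
print(search)
print(fetch)
```

No network request is made here; the resulting URLs can be passed to any HTTP client. Keeping URL construction separate from the request makes the query parameters easy to inspect before anything is sent.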
The Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC).
Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information. They are generated to provide reference standards for multiple purposes ranging from genome annotation to reporting locations of sequence variation in medical records. The RefSeq collection can be retrieved in several different ways including:
Read more about RefSeq in Pruitt et al., The Reference Sequence (RefSeq) Database.
The database of single nucleotide polymorphisms (dbSNP) catalogs single nucleotide polymorphisms and multiple small-scale variations, including insertions/deletions, microsatellites, and non-polymorphic variants. Kitts and Sherry's chapter, The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation, explains how to use this resource.