UMass Chan Medical School, Lamar Soutter Library. Education. Research. Health Care. Empowering the future. Preserving the past.

Analysis-Ready Data Sets Tutorial

Step-by-step tutorial on how to make your data analysis-ready using Excel or REDCap.

Cleaning

Best Practices for Cleaning Data Sets

Before performing an analysis of your data or sharing it with an analyst, review the data set for any inaccuracies, inconsistencies, or sensitive data. Cleaning your data allows you to identify outliers or errors before you compile your results.

  1. Check for outliers
    Outliers are infrequent values far from the norm. They are often caused by conversion errors or data entry mistakes, and each one must be either explained or repaired.
    Ensure all data elements are in the correct formats and ranges.
  2. Check for missing data
    Ensure that no data items or records are missing and leaving null elements. If you are manually entering data from an original print document (e.g., a paper patient survey), check the digital record against the source to confirm that any missing values are truly absent from the source and are not data entry errors.
    Code missing data appropriately.
  3. Ensure that your data do not contain Protected Health Information (PHI)
    The Health Insurance Portability and Accountability Act (HIPAA) requires that researchers protect the privacy and confidentiality of their patients. Do not include any individually identifiable health information in your data sets.
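
If your data set has been exported to a script-friendly form, the three checks above could be sketched roughly as follows in Python. This is a minimal illustration and not part of the Excel or REDCap workflow: the column names, the allowed ranges, the "NA" missing-data code, and the list of name fragments that hint at PHI are all assumptions made up for the example. Replace them with the values from your own codebook. The column-name check is only a prompt for manual review and does not establish HIPAA compliance.

```python
# All names and limits below are hypothetical; take real ones from your codebook.
VALID_RANGES = {"age": (0, 120), "systolic_bp": (50, 250)}
MISSING_CODE = "NA"  # assumed code for missing values
PHI_HINTS = ("name", "ssn", "mrn", "address", "phone", "email", "dob")


def out_of_range(rows, ranges=VALID_RANGES):
    """Return (row_index, column, value) for values that are not numeric
    or that fall outside the allowed range for their column."""
    flagged = []
    for i, row in enumerate(rows):
        for col, (low, high) in ranges.items():
            raw = row.get(col)
            if raw is None or str(raw).strip() in ("", MISSING_CODE):
                continue  # missing values are handled by code_missing()
            try:
                value = float(raw)
            except (TypeError, ValueError):
                flagged.append((i, col, raw))  # wrong format
                continue
            if not low <= value <= high:
                flagged.append((i, col, raw))  # possible outlier
    return flagged


def code_missing(rows, code=MISSING_CODE):
    """Return a copy of the rows with blank cells replaced by the missing code."""
    return [
        {col: (code if val is None or str(val).strip() == "" else val)
         for col, val in row.items()}
        for row in rows
    ]


def possible_phi_columns(columns, hints=PHI_HINTS):
    """Return column names that contain a fragment suggesting identifiable data."""
    return [c for c in columns if any(h in c.lower() for h in hints)]


# Example with made-up records
records = [
    {"age": "34", "systolic_bp": "120", "patient_name": "example"},
    {"age": "340", "systolic_bp": "", "patient_name": "example"},
]
print(out_of_range(records))
print(code_missing(records))
print(possible_phi_columns(records[0].keys()))
```

Each flagged value still needs a human decision: compare it with the source document, then either correct it or record why it is valid.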