Before performing an analysis of your data or sharing it with an analyst, review the data set for any inaccuracies, inconsistencies, or sensitive data. Cleaning your data allows you to identify outliers or errors before you compile your results.
Check for outliers
Outliers are infrequent values far from the norm, often caused by conversion errors or data entry mistakes. Each outlier must be either explained or repaired.
Ensure all data elements are in the correct formats and ranges.
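The two checks above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the sample heights, the allowed range of 50 to 250, and the choice of the 1.5 x IQR rule for flagging outliers are all assumptions made for the example.

```python
import statistics

def find_out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside the allowed range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

def find_iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical heights in centimetres; 1.72 looks like a value entered in metres,
# which is the kind of conversion error described above.
heights_cm = [165, 170, 172, 168, 1.72, 175, 169, 171]

range_problems = find_out_of_range(heights_cm, low=50, high=250)
iqr_outliers = find_iqr_outliers(heights_cm)
print(range_problems)
print(iqr_outliers)
```

A flagged value is only a candidate for review. Whether it is repaired or kept with an explanation still depends on checking it against the source.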
Check for missing data
Ensure that no data items or records are missing and leaving null elements. If you are manually entering data from a printed original document (e.g., a paper patient survey), check the digital record against the source to make sure that any missing data are not entry errors.
Code missing data appropriately, using one consistent code that cannot be mistaken for a real value.
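As a rough sketch of the missing-data check, the Python below counts missing values per column and recodes them to a single code. The column names, the sample rows, the list of spellings treated as missing, and the code "NA" are all hypothetical; substitute whatever your project or analyst has agreed on.

```python
import csv
import io

# Assumed spellings that mean "missing" in the raw file (compared in lower case).
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}
# Assumed project-wide code for a missing value.
MISSING_CODE = "NA"

def recode_missing(rows):
    """Return (cleaned_rows, counts) where counts maps column name to number missing."""
    cleaned = []
    counts = {}
    for row in rows:
        new_row = {}
        for field, value in row.items():
            if value is None or value.strip().lower() in MISSING_MARKERS:
                new_row[field] = MISSING_CODE
                counts[field] = counts.get(field, 0) + 1
            else:
                new_row[field] = value.strip()
        cleaned.append(new_row)
    return cleaned, counts

# Hypothetical survey extract with blanks and mixed missing-value spellings.
sample = io.StringIO("id,age,score\n1,34,7\n2,,5\n3,41,N/A\n4,null,\n")
rows = list(csv.DictReader(sample))

cleaned_rows, missing_counts = recode_missing(rows)
print(missing_counts)
```

The per-column counts show where to look first. Each missing value should still be compared with the source document before it is accepted as truly missing.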
Ensure that your data does not contain Protected Health Information (PHI).
The Health Insurance Portability and Accountability Act (HIPAA) requires that researchers protect the privacy and confidentiality of their patients. No individually identifiable health information should be included in your data sets.
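A simple automated screen can help catch obvious identifiers before a data set is shared. The Python sketch below is a first pass under stated assumptions only: the list of suspect column-name fragments, the two value patterns, and the sample records are hypothetical, and a clean result does not establish that a data set is free of PHI. It does not replace a review against your institution's HIPAA requirements.

```python
import re

# Assumed fragments of column names that often indicate identifying fields.
SUSPECT_COLUMN_WORDS = ("name", "ssn", "dob", "birth", "address", "phone", "email", "mrn")

# Assumed patterns for two identifier-like value formats; far from exhaustive.
VALUE_PATTERNS = {
    "ssn-like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email-like": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def screen_for_identifiers(rows):
    """Return a sorted list of (column, reason) pairs that deserve manual review."""
    findings = set()
    for row in rows:
        for column, value in row.items():
            if any(word in column.lower() for word in SUSPECT_COLUMN_WORDS):
                findings.add((column, "suspect column name"))
            for label, pattern in VALUE_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.add((column, label))
    return sorted(findings)

# Entirely fictitious records used only to exercise the screen.
records = [
    {"record_id": "1", "patient_name": "Jane Doe", "notes": "contact: jane@example.org"},
    {"record_id": "2", "patient_name": "John Roe", "notes": "ref 000-00-0000"},
]

findings = screen_for_identifiers(records)
for column, reason in findings:
    print(column, reason)
```

Free-text fields such as notes are the most likely place for identifiers to hide, so they merit a manual read even when the screen reports nothing.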