Data validation helps ensure that data is collected correctly.
Data should be actively validated as much as possible throughout the process of data collection and entry. This will reduce the time spent later cleaning data, explaining outliers and accounting for missing data.
- Program valid ranges for inputting data into fields when applicable.
For example, reject an age greater than 150 years, and allow only numbers in numeric fields.
- Apply data types to fields so that values do not need to be reformatted later.
For example, set up date fields to only accept dates rather than setting them up as free text fields.
- Prevent the entry of leading and/or trailing spaces or other characters ($, #, %, *, /, etc.) that may interfere with data analysis.
- Plan for “other” data responses.
Lists of answers to single-answer questions should include all possible responses. To cover every possibility, include an “other” category as an answer and provide a text box to capture the response.
- Plan for “prefer not to answer.”
Provide a place for the interviewer to record “chose not to answer” or “does not know.” This avoids null values that cannot be told apart from questions that were skipped by mistake.
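
The checks above can be sketched in code. The following is a minimal illustration in Python, not a prescribed implementation. The field names (age, visit_date, name, color, color_other, income), the answer list, and the date format are all hypothetical and would need to match the actual data collection form. It assumes every value arrives as text, as it would from a typical entry form.

```python
from datetime import datetime

# Hypothetical rules for illustration only; adapt them to the real form.
MAX_AGE = 150
UNWANTED_CHARS = set("$#%*/")
COLOR_CHOICES = {"red", "blue", "green", "other"}
NON_RESPONSES = {"prefer not to answer", "does not know"}


def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []

    # Valid range: age must be a whole number no greater than MAX_AGE.
    age = record.get("age", "")
    if not age.isdecimal():
        errors.append("age: enter a whole number")
    elif int(age) > MAX_AGE:
        errors.append(f"age: must not exceed {MAX_AGE}")

    # Data type: the visit date must parse as a date, not free text.
    try:
        datetime.strptime(record.get("visit_date", ""), "%Y-%m-%d")
    except ValueError:
        errors.append("visit_date: enter a date as YYYY-MM-DD")

    # Stray characters: reject leading/trailing spaces and listed symbols.
    name = record.get("name", "")
    if not name or name != name.strip() or UNWANTED_CHARS & set(name):
        errors.append("name: remove extra spaces or symbols")

    # "Other" responses: the answer must be on the list, and "other"
    # must come with a description in the accompanying text box.
    color = record.get("color", "")
    if color not in COLOR_CHOICES:
        errors.append("color: choose one of the listed answers")
    elif color == "other" and not record.get("color_other", "").strip():
        errors.append("color_other: describe the 'other' answer")

    # Non-response: accept a number or an explicit non-response, never a blank.
    income = record.get("income", "")
    if not (income.isdecimal() or income in NON_RESPONSES):
        errors.append("income: enter a number or record a non-response")

    return errors
```

Calling validate_record at the point of entry, and refusing to save a record until the returned list is empty, keeps the problems from reaching the cleaning stage. Returning every error at once, instead of stopping at the first, lets the person entering data correct the whole record in one pass.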