With Excel Examples!
When you use Excel, you have to do this yourself.
Be sure to put field names at the top of each column in an Excel spreadsheet. The letters at the top of each column and the numbers at the start of each row are not interpreted by statistical programs. In addition, they do not tell statisticians anything about the type of data contained in cells.
When you use Excel, be sure to add a study ID of some sort.
Be sure to add a study ID to each subject about which you are collecting data; consider these the names of rows. Study IDs must be unique so that rows are not confused with one another.
The data below lacks unique IDs for each subject. The field names at the top of the file make it clear that the data contains data for four subjects from the study baseline (columns A-D) and three subjects from the follow-up (E-H). What is not clear is which subject did not show up for the follow-up. Without knowing that, we don’t know whether her MTOP score (whatever that is) increased or decreased.
When you use Excel to store study data, you should periodically check your subject IDs and ensure that they are unique.
For a small study, you might sort the data by the subject IDs and carefully examine them for duplicates. For a larger study, you can use MS Excel’s Advanced Filtering capability to identify duplicate IDs:
Rectangular format is also known as flat files, tidy data, pivot ready data, or long data.
When you use Excel, it is possible to enter data in a non-rectangular fashion.
Rectangular format means that there are no duplicate values in any single row.
Original
Non-rectangular formats like this may look like a reasonable way to store data, but the original file above cannot be read in that form by most statistical packages. Non-rectangular data is difficult to analyze-even in MS Excel. It cannot be sorted easily, cannot be used in a pivot table, cannot be examined by using filters, and can be easily misunderstood.
Even though there are very few data points, figuring out whether MTOP scores increased or decreased from follow-up to baseline would be a difficult task.
The file from the above image is much easier to analyze, through software and visually, when recorded rectangularly, such that each cell is meaningful. Each cell is meaningful because it contains an attribute (designated by the name of the column) for a record (designated by the ID of the row, or pid in this example.)
Version 2 below is stored longitudinally, and uses a phase variable to delineate the different visits.
Version 2
Alternatively, version 3 below uses different fields for baseline and follow-up information, so it doesn’t need a phase variable to be understood. This is often termed “wide.”
Version 3
Both of these forms are acceptable, and REDCap can store data either way. They each present analytical problems that depend on the type of analysis being conducted. If your study design and analysis are complicated, you might consider discussing these different options with a statistician prior to structuring your study database.
When you use Excel, it is possible to transpose this data.
If you are storing data in MS Excel, it is easy to create a dataset using columns to represent units of analysis and using rows to store the attributes of those units, as in the example below.
Some software packages allow the creation and/or analysis of data files like the above, but most do not. MS Excel allows the creation of such files, but it cannot analyze the data thus created (for example, the dataset above cannot be sorted by gender, or filtered, or used to create a pivot table showing cross-tabulations or frequencies). Plus, it’s difficult to read.
If you have already started, however, it is easy to use MS Excel to transpose the data, so that attributes are in columns and records are in rows. Simply
When you use Excel, problems arise when data are missing in the middle of a row.
Problems often arise when data are missing in the middle of a row such as in the example below, where data row 4 is missing the FollowUp_MTOP value and the subsequent values have shifted over.
To fix a problem like this in Excel, simply insert a cell where the missing value should be and shift the values over to their proper place.
When you use Excel, discrete data should not be combined in a single column.
In the example below, column G (Name), column H (Hospital) and column I (Site/Doctor) all contain data that should be separated out into discrete columns.
If you are working in Excel, the Text to Columns feature will help to fix this problem.
Select the data that contains values to be separated (in the example column G called Name is selected), add empty columns to the right of the column being split to accommodate the number of columns that the single column will be split into, then click on Text to Columns on the data tab as shown in the example below.
Excel will launch a dialog box that will walk you through the process of separating your data.
The result of text to column procedure is shown below, don’t forget to rename the columns and apply an appropriate name to the new column
Results of Text to Columns
With renamed columns