Data Dictionary
A data dictionary provides a human-readable description of the data, giving context on its nature and structure. This helps someone unfamiliar with the data understand and use it. At a minimum, a data dictionary should contain the following pieces of information about the data:
- variable names
- variable labels
- variable codes, and
- special values for missing data.
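As a minimal sketch, the four pieces of information listed above could be captured in a small record type. The `VariableEntry` class, its field names, the `is_valid` helper, and the `"NA"` missing-value marker are all hypothetical and chosen only for illustration; the codes are taken from the `urbanicity` variable in the example table.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One entry of a data dictionary (hypothetical structure for illustration)."""
    name: str                                            # variable name as it appears in the data
    label: str                                           # short human-readable label
    codes: dict = field(default_factory=dict)            # allowed code -> meaning
    missing_values: list = field(default_factory=list)   # special values marking missing data


# Example entry. The "NA" missing-value marker is an assumption, not from the source.
urbanicity = VariableEntry(
    name="urbanicity",
    label="County type",
    codes={
        "urban": "Urban county",
        "suburban": "Suburban county",
        "small/mid": "Small or mid-sized county",
        "rural": "Rural county",
    },
    missing_values=["NA"],
)


def is_valid(entry: VariableEntry, value: str) -> bool:
    """Return True if value is a documented code or a documented missing marker."""
    return value in entry.codes or value in entry.missing_values
```

Recording the codes and missing-value markers explicitly means a dataset can be checked against its dictionary, for example `is_valid(urbanicity, "rural")`. This check only makes sense for coded variables; free-form or numeric variables would need a different rule.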
Below is an example data dictionary table from the incarceration trends repository. It includes each variable's name, its class (type), and a longer description.
Variable | Class | Description |
---|---|---|
year | integer (date) | Year |
urbanicity | character | County-type (urban, suburban, small/mid, rural) |
pop_category | character | Category for population - either race, gender, or Total |
rate_per_100000 | double | Rate within a category for prison population per 100,000 people |
Every data dictionary should also be provided in its raw form (e.g., a CSV) in the repository.
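One way to produce such a raw file is sketched below with Python's standard `csv` module. The rows mirror the example table above; the lowercase column names, the function names, and any file name passed to `write_dictionary` (for instance `data_dictionary.csv`) are assumptions made for illustration, not a convention required by the source.

```python
import csv
import io

# Column names are an assumption; they mirror the headers of the example table.
FIELDS = ["variable", "class", "description"]

ROWS = [
    {"variable": "year", "class": "integer (date)", "description": "Year"},
    {"variable": "urbanicity", "class": "character",
     "description": "County-type (urban, suburban, small/mid, rural)"},
    {"variable": "pop_category", "class": "character",
     "description": "Category for population - either race, gender, or Total"},
    {"variable": "rate_per_100000", "class": "double",
     "description": "Rate within a category for prison population per 100,000 people"},
]


def dictionary_to_csv(rows, fields=FIELDS):
    """Serialize data dictionary rows to CSV text, quoting fields that contain commas."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_dictionary(path, rows):
    """Write the data dictionary to a CSV file at the given (hypothetical) path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(dictionary_to_csv(rows))
```

Keeping the dictionary as a CSV alongside the data lets both people and scripts read it, and descriptions containing commas are quoted automatically by the `csv` module.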