What are codebooks/data dictionaries?

Codebooks are important sources of data documentation. They can have different purposes and meanings depending on whether your data are quantitative or qualitative.

Codebooks for quantitative data

Codebooks (sometimes called data dictionaries) are essential to understanding your data, both for yourself and for any other users. Without a codebook it is often impossible to make sense of your data, since variable names alone rarely convey what a variable represents. A codebook is also required to correctly interpret categorical variables and the meaning of their values. Various statistical programs offer ways to label your variables and the values of your categorical variables. However, it is always advisable to keep a separate codebook containing this information, particularly when data are saved in a proprietary format. Codebooks can also be used to document other variable-level information, which is a further reason to maintain the codebook as a document separate from your dataset.

The RDM LibGuide describes the elements that are essential for a good codebook. You should also aim to include in your codebook (where applicable):

  • What each record in the dataset refers to
    • In the case of questionnaires, this is usually a single respondent, but might also be a household
    • In the case of experimental data, this could be repeated measurements for a single subject
  • Unit of measurement, if applicable
  • The meaning of any numerical representations of categorical variables (e.g. 1 = non-binary, 2 = female, 3 = male)
  • The numerical codes for missing values
    • If data can be missing for different reasons (e.g. missing due to non-response versus missing due to questionnaire routing), describe each type of missingness and the code that represents it.
  • The source used if variables (or the categories used in a variable) are based on existing standards
  • For questionnaire data, any weighting variables used, if applicable
  • Explanation of how new derived variables were created
    • You can also refer to the code used to create the derived variable or the section of the logbook that describes the process
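
As an illustration of the elements listed above, the following is a minimal sketch of a machine-readable codebook written to a CSV file, which stays readable outside any proprietary statistical package. All variable names, labels, codes, and the column layout are hypothetical and chosen only for this example; they are not a prescribed standard.

```python
import csv
import io

# Hypothetical codebook entries: every name, label, and code is illustrative.
CODEBOOK = [
    {
        "variable": "resp_id",
        "label": "Respondent identifier (one record per respondent)",
    },
    {
        "variable": "gender",
        "label": "Self-reported gender",
        "values": {1: "non-binary", 2: "female", 3: "male"},
        "missing": {-8: "not asked (questionnaire routing)", -9: "non-response"},
    },
    {
        "variable": "height_cm",
        "label": "Body height",
        "unit": "cm",
        "missing": {-9: "non-response"},
    },
    {
        "variable": "bmi",
        "label": "Body mass index (derived)",
        "unit": "kg/m^2",
        "derivation": "weight_kg / (height_cm / 100) ** 2; see logbook",
    },
]

FIELDS = ["variable", "label", "unit", "values", "missing", "source", "derivation"]


def format_codes(codes):
    """Turn {1: 'a', 2: 'b'} into the text '1 = a; 2 = b'."""
    return "; ".join(f"{code} = {meaning}" for code, meaning in codes.items())


def write_codebook(entries, handle):
    """Write one row per variable; absent fields are left empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for entry in entries:
        row = dict(entry)
        row["values"] = format_codes(entry.get("values", {}))
        row["missing"] = format_codes(entry.get("missing", {}))
        writer.writerow(row)


def decode(entries, variable, raw):
    """Translate a raw code into its documented meaning."""
    for entry in entries:
        if entry["variable"] == variable:
            missing = entry.get("missing", {})
            if raw in missing:
                return f"missing: {missing[raw]}"
            # Values without a documented label are returned unchanged.
            return entry.get("values", {}).get(raw, raw)
    raise KeyError(f"{variable} is not documented in the codebook")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(CODEBOOK, buffer)
    print(buffer.getvalue())
    print(decode(CODEBOOK, "gender", 2))
    print(decode(CODEBOOK, "gender", -9))
```

Keeping the missing-value codes separate from the substantive value labels makes it harder to analyse a code such as -9 as if it were a real measurement.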

While a codebook tends to provide more specific information about the data, there can be overlap between what you document in a codebook and what you document in a README file. You can opt to cross-reference between these two documents rather than documenting the same thing in two places. If you do, you must ensure that these cross-references are maintained (for example, do not change your folder structure in a way that makes it impossible to find the cross-referenced file(s) later on).

Codebooks for qualitative data

You may create a codebook when coding qualitative data (see Tip 1 on this page about qualitative coding). It is important to clearly document the meaning of the codes you create.

If you are using Atlas.ti for coding, a codebook can be created by exporting codes and code comments to Excel:

  • More information on Atlas.ti codebooks can be found at the bottom of this page
  • Ensure that, at a minimum, each code and its code comment are exported and included in your Atlas.ti codebook
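
A quick way to check that every exported code is documented is sketched below. It assumes the exported codebook has been saved as a CSV file with one row per code, and that the columns are named "Code" and "Comment". Both the CSV step and the column names are assumptions for this example and may not match your actual export, so adjust them as needed.

```python
import csv
import io


def codes_without_comments(handle, code_column="Code", comment_column="Comment"):
    """Return the codes whose comment is absent or blank.

    The default column names are assumptions, not the guaranteed export layout.
    """
    reader = csv.DictReader(handle)
    return [
        row[code_column]
        for row in reader
        if not (row.get(comment_column) or "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical export contents, used here instead of a real file.
    sample = io.StringIO(
        "Code,Comment\n"
        "workload,Mentions of time pressure at work\n"
        "autonomy,\n"
    )
    print(codes_without_comments(sample))
```

Any code reported by this check still needs a comment that explains its meaning before the codebook is shared or archived.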