Codebooks are important sources of data documentation. They can have
different purposes and meanings depending on whether your data are
quantitative or qualitative.
Codebooks (sometimes called data dictionaries) are essential to
understanding your data (both for yourself and any other users). Without
a codebook it’s often impossible to make sense of the information
present in your data, since variable names alone are rarely enough to
understand what a variable represents. Your codebook is also required
for the correct interpretation of categorical variables and the meaning
of their values. Various statistical programs offer ways to label your
variables and the values of your categorical variables. However, it is
always advised to have a separate codebook containing this information,
particularly when data are saved in a proprietary format, since labels
stored in such a file may not be readable without the original software.
Codebooks can also be used to document other variable-level information,
which makes a codebook kept separate from your dataset an essential
document.
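As an illustration of keeping variable and value labels in a separate, non-proprietary file, the following minimal Python sketch writes a small codebook to CSV and checks which columns of a dataset are still undocumented. The variable names, labels, value codes, and the helper functions `codebook_rows`, `write_codebook` and `undocumented` are all hypothetical and invented for this example; adapt the fields to the elements your own codebook needs.

```python
import csv
import io

# Hypothetical variable-level documentation. All names, labels and codes
# below are invented for illustration only.
CODEBOOK = [
    {"variable": "resp_id", "label": "Respondent identifier",
     "type": "integer", "values": {}, "missing": ""},
    {"variable": "edu", "label": "Highest completed education",
     "type": "categorical",
     "values": {1: "Primary", 2: "Secondary", 3: "Tertiary"},
     "missing": "-9"},
]

FIELDNAMES = ["variable", "label", "type", "values", "missing"]


def codebook_rows(codebook):
    """Flatten each entry into one plain-text row (value labels as 'code = text')."""
    rows = []
    for entry in codebook:
        values = "; ".join(
            f"{code} = {text}" for code, text in entry["values"].items()
        )
        rows.append({
            "variable": entry["variable"],
            "label": entry["label"],
            "type": entry["type"],
            "values": values,
            "missing": entry["missing"],
        })
    return rows


def write_codebook(codebook, handle):
    """Write the codebook as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(codebook_rows(codebook))


def undocumented(data_columns, codebook):
    """Return dataset columns that have no codebook entry yet."""
    documented = {entry["variable"] for entry in codebook}
    return [column for column in data_columns if column not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(CODEBOOK, buffer)
    print(buffer.getvalue())
    print("Undocumented:", undocumented(["resp_id", "edu", "age"], CODEBOOK))
```

To save the codebook next to your data, pass a file opened with `open("codebook.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer (the file name is only a suggestion). CSV is used here because it can be opened without any particular statistical package.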
The RDM LibGuide describes the elements that are
essential for a good codebook. You should also aim to include in your
codebook (where applicable):
While a codebook tends to provide more specific data information,
there can be overlap between what you document in a codebook and what
you document in a README
file. You can opt to cross-reference between these two documents
rather than documenting the same thing in two places. If you do, you must
ensure that these cross-references are maintained (i.e. don’t change
your folder structure in a way that makes it impossible to find the
cross-referenced file(s) later on).
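One way to keep such cross-references from going stale is to check them automatically. The sketch below assumes a convention that is not part of this guide: every cross-referenced file is written in the README or codebook as a relative path between backticks. The function name `broken_references` is hypothetical.

```python
import re
from pathlib import Path


def broken_references(doc_text, base_dir):
    """Return referenced relative paths that do not exist under base_dir.

    Assumption: references are written between backticks, for example
    `docs/codebook.csv`. Adjust the pattern to your own convention.
    """
    referenced = re.findall(r"`([^`]+)`", doc_text)
    return [path for path in referenced
            if not (Path(base_dir) / path).exists()]


if __name__ == "__main__":
    readme = Path("README.md")  # hypothetical file name
    if readme.exists():
        text = readme.read_text(encoding="utf-8")
        for path in broken_references(text, readme.parent):
            print("Missing cross-referenced file:", path)
```

Running a check like this after reorganising your folders shows which references need updating.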
You may create a codebook when coding qualitative data (see Tip 1 on
this page about qualitative coding). It is important to
clearly document the meaning of the codes you create.
If you are using Atlas.ti for coding, a codebook can be created by
exporting codes and code comments to Excel: