Data in the humanities: naming conventions, versioning, documentation
This is the second in a trio of posts about research data in the humanities.
Back with some more tips and tricks for managing research data in the humanities! Each of the tips below could appear as a consideration in a data management plan (DMP) for a grant application. If you’ve ever prepared an editorial procedure statement, you’ll find some of these processes familiar: it’s all about consistency.
Naming conventions
Naming conventions exist to make files easier to search for and find, to keep file management consistent across a project, and to provide transparency when more than one person needs to access the files (or for the sake of your future self!). So formalise it! In an accessible place linked to your project (more under ‘Documentation’ below), write down an example of how you’ll name all of your files, and stick with it. Do this before you start, so that you don’t end up with a backlog of disorganised files to rename.
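Windows has a default maximum path length of 260 characters, and this counts the full path (every folder name plus the file name), not just the file name itself. Deeply nested folders can reach that limit sooner than you’d expect, so it’s best to keep naming conventions short and sweet.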
Based on what’s important for YOUR project, consider including:
- Date (at least year and month, if not day). This might look like yyyymmdd or another variation; decide ahead of time and stick to it! Spelling out the full year is a good idea because a shortened date such as 180319 is ambiguous (18 March 2019, or 19 March 2018?).
- Key topic/marker/origin descriptor (if you have multiple files relating to this topic, you can group them in a folder)
- Avoid non-alphanumeric characters (other than the dashes or underscores described below). Remember that Windows reserves certain characters that can’t be used in file names: <, >, :, ", /, \, |, *, ?
- Dashes or underscores may be used to separate elements where this helps browsing, but don’t overuse them: each one lengthens the name, which may then not be fully visible in some file search views
- Camel case (capitalising the first letter of each word or section, e.g. BullerswoodNotes) can also make the file name more readable while saving characters
- Putting the date in year-month-day order means files sort chronologically when listed by name: yyyymmddOriginDatadescriptor_ID. For example, a file name for archive notes describing a textile in the V&A’s collection might look like: 20220701VandABullerswoodNotes_T.31-1923_V01
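To make the convention concrete, here is a minimal Python sketch that assembles a name in the yyyymmddOriginDatadescriptor_ID_V## pattern and refuses any of the Windows reserved characters listed above. The function names (`make_filename`, `to_camel_case`) and the two-digit version padding are illustrative assumptions rather than part of any standard tool; adapt the pieces to whichever elements your own project settles on.

```python
from datetime import date

# Characters Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|*?')


def to_camel_case(text: str) -> str:
    """Capitalise the first letter of each word and drop the spaces."""
    return "".join(word[:1].upper() + word[1:] for word in text.split())


def make_filename(day: date, origin: str, descriptor: str, item_id: str, version: int) -> str:
    """Build a name in the (assumed) form yyyymmddOriginDescriptor_ID_V##."""
    name = (
        f"{day:%Y%m%d}"                                     # full year, then month, then day
        f"{to_camel_case(origin)}{to_camel_case(descriptor)}"
        f"_{item_id}_V{version:02d}"                        # zero-padded version, e.g. V01
    )
    bad = WINDOWS_RESERVED.intersection(name)
    if bad:
        raise ValueError(f"reserved characters in file name: {sorted(bad)}")
    return name
```

For the V&A example, `make_filename(date(2022, 7, 1), "VandA", "Bullerswood notes", "T.31-1923", 1)` returns `"20220701VandABullerswoodNotes_T.31-1923_V01"`. Generating names from a function like this, rather than typing them by hand, is one way to keep everyone on a project consistent.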
Be explicit in your documentation:
‘My file naming convention is yyyymmddOriginDatadescriptor_ID_V[ersion]#, for example, 20220701VandABullerswoodNotes_T.31-1923_V01’.
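Once the convention is written down, it can also be checked mechanically, which is handy for spotting stray files in a shared folder. The sketch below is one way to do this in Python, assuming names follow the yyyymmddOriginDatadescriptor_ID_V## pattern with an optional file extension; `check_filename` and the exact pattern are hypothetical and would need adjusting to your own convention.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern for yyyymmddOriginDatadescriptor_ID_V##, plus an optional extension.
NAME_PATTERN = re.compile(
    r"(?P<date>\d{8})"               # yyyymmdd
    r"(?P<descriptor>[A-Za-z0-9]+)"  # origin + data descriptor, in camel case
    r"_(?P<id>[A-Za-z0-9.\-]+)"      # record ID, e.g. an accession number such as T.31-1923
    r"_V(?P<version>\d{2})"          # two-digit version number
    r"(?P<ext>\.[A-Za-z0-9]+)?"      # optional extension such as .docx
)


def check_filename(name: str) -> Optional[dict]:
    """Return the parts of a conforming name, or None if it breaks the convention."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    try:
        # Reject eight-digit strings that are not real calendar dates.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return None
    return match.groupdict()
```

Here `check_filename("20220701VandABullerswoodNotes_T.31-1923_V01")` returns the date, descriptor, ID and version as separate fields, while a name using a shortened date such as `180319Notes_X_V01` returns `None`.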
If you are collaborating, agree ahead of time on the naming conventions that will be used across the project. See below for more on documentation.
Versioning
Versioning, also known as revision control, is an important tool for data management in the humanities, especially for collaborative projects. Deliberate versioning tells users the context in which to approach the data and how one version relates to the others. This helps with data preservation and reuse, since it provides a reference point for the most current version of a file and allows previous versions to be audited and compared for consistency.
A versioning system records changes to files within a project over time, so that you can view or even revert to earlier versions later in the project, compare changes as the project develops, see who modified what and when, and identify when particular issues were introduced, among other things. This can be managed manually, through file creation, tracked changes, and naming conventions, or through version control software such as Git.
Indicate the version at the end of a file name. From the example above: 20220701VandABullerswoodNotes_T.31-1923_V01. The V01 included at the end indicates the version, and even if collaborators work on the file on the same date, they have a way of differentiating the versions of the file they’ve created. Again, be explicit about the language used to describe versions in file names when you document your naming conventions.
An example of versioning for notes collected by two researchers from a focus group of interns might look like:
- 20230523_InternFocusNotes_v1 (you created this version, raw)
- 20230523_InternFocusNotes_v1_he84edited (he84 edited v1)
- 20230523_InternFocusNotes_v2 (you accepted he84’s changes)
- 20230523_InternFocusNotes_v5_submitted (version submitted to repository)
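If you version by file name, a small script can find the latest version number so that nobody accidentally reuses one. Below is a minimal Python sketch, assuming the lowercase _v# suffix used in the focus-group example above, with any extra tag (such as he84edited or submitted) following a further underscore; the function names are hypothetical.

```python
import re
from typing import Iterable


def latest_version(names: Iterable[str], stem: str) -> int:
    """Highest _v<number> found for this stem, or 0 if there is none yet."""
    # The stem, then _v and digits, then optionally an _tag such as _he84edited.
    pattern = re.compile(re.escape(stem) + r"_v(\d+)(?:_\w+)?")
    versions = [int(m.group(1)) for name in names if (m := pattern.fullmatch(name))]
    return max(versions, default=0)


def next_version_name(names: Iterable[str], stem: str) -> str:
    """Name for the next version, e.g. ..._v3 after ..._v2."""
    return f"{stem}_v{latest_version(names, stem) + 1}"
```

Given the four focus-group names above and the stem `20230523_InternFocusNotes`, `next_version_name` suggests `20230523_InternFocusNotes_v6`. To run it on a real folder, you could pass in `[p.stem for p in Path("focus_group").iterdir()]` (with `from pathlib import Path`; the folder name is a placeholder).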
Documentation
The point of documentation is to make finding answers from your data simpler and less time-consuming. Consolidate all data documentation in one file or folder for maximum efficiency.
Types of data documentation will vary from project to project, but can include (among many other things):
- Criteria for data collected, such as which archival records you’ll examine and why
- Method of collection
- Dates of collection
- Non-sensitive data relating to respondents in open-ended studies, such as total number listed, refusal rates, total respondents in final sample, etc.
- Issues that occur during data collection, such as notes on why you weren’t able to photograph an object from a certain angle or why the numbers in certain categories of participants fluctuated due to inclement weather
- What will be considered outliers/inconsistencies and why
- Codes or keys for surveys or other information, such as criteria for textual imagery or coding topics of discussion in free-flowing interviews
- Definition of variables
Keeping this information clear will provide you with a fuller picture of your research, help you address incongruities, maintain consistency, and provide a strong base for writing up your research. As usual, contact your RDM team if you have any questions!
If you found this helpful, stay tuned for our next post, on storage, backups, and data publication for the humanities.