Data in the humanities: tips and tricks

Wednesday 28 June 2023

This is the first in a trio of posts about research data in the humanities.

Naming conventions

Naming conventions make files easier to search for and find, keep file management consistent across a project, and provide transparency when more than one person needs to access the files (or for the sake of your future self!). So formalise it! In an accessible place linked to your project (more under ‘documentation’ below), write down an example of how you’ll name all of your files, and stick with it. It is essential to do this before you start, so that you don’t end up with a backlog of disorganised files to rename.

Bear in mind that Windows has a maximum file path length of roughly 260 characters by default, and that this counts the folder names as well as the file name. While that’s almost certainly more than you’ll need, it’s best to keep naming conventions short and sweet.
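As a rough illustration of the path-length point above, the Python sketch below flags any path longer than a chosen threshold. The function name overlong_paths and the default threshold of 250 characters are assumptions made for this example (a deliberately cautious figure, not an official limit), so adjust them to suit your own setup.

```python
from typing import Iterable, List


def overlong_paths(paths: Iterable[str], limit: int = 250) -> List[str]:
    """Return the paths whose full length (folders plus file name) exceeds `limit`.

    `limit` is an assumed, cautious threshold chosen for this sketch;
    it is not an official operating-system value.
    """
    return [path for path in paths if len(path) > limit]
```

Running a check like this over a project folder from time to time can catch deeply nested folders before they cause problems when files are copied or shared.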

Consider including:

Date (at least year and month, if not day). This might look like yyyymmdd or another variation. Decide ahead of time and stick to it!
Key topic/marker/origin descriptor (if you have multiple files relating to this topic, you can group them in a folder)

A few further tips:

Avoid using non-alphanumeric characters where possible (see exception below for ‘V&A’)
Dashes or underscores may be used to divide where helpful for browsing, but avoid too many as this creates a longer title that may not be fully visible in some file search settings
Camel case (capitalising the first letter of each text section) can also make the file name more readable
Ordering the date from year to month to day means files are easy to sort

Put together, a convention might follow this pattern: YyyymmddOriginDatadescriptor_ID

For example, a file name for archive notes describing a textile in the V&A’s collection might look like: 20220701V&ABullerswoodNotes_T.31-1923_V01

Be explicit in your documentation: ‘My file naming convention is YyyymmddOriginDatadescriptor_ID_V[ersion]#, for example, 20220701V&ABullerswoodNotes_T.31-1923_V01’. If you are collaborating, agree ahead of time on the naming conventions that will be used across the project.
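If you would rather not type names by hand, a convention like this can be captured in a few lines of Python. The sketch below is only one possible approach: the function name build_file_name and its parameters are invented for illustration, and it assumes the YyyymmddOriginDatadescriptor_ID_V[ersion]# pattern with a two-digit version number.

```python
from datetime import date


def build_file_name(created: date, origin: str, descriptor: str,
                    item_id: str, version: int) -> str:
    """Assemble a name following YyyymmddOriginDatadescriptor_ID_V## (assumed pattern)."""
    # %Y%m%d gives yyyymmdd, so names sort chronologically when sorted as text.
    # The version is zero-padded to two digits (V01, V02, ...).
    return f"{created:%Y%m%d}{origin}{descriptor}_{item_id}_V{version:02d}"


# Reproduces the example above: 20220701V&ABullerswoodNotes_T.31-1923_V01
example = build_file_name(date(2022, 7, 1), "V&A", "BullerswoodNotes", "T.31-1923", 1)
```

Because the date comes first and runs from year to month to day, sorting a list of such names alphabetically also sorts them chronologically.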

See below for more on documentation.

Versioning

Versioning, also known as revision control, is an important tool for data management in the humanities, especially for collaborative projects. Deliberate versioning should communicate the context in which users should approach the data, as well as information about how one version relates to other versions. This helps with data preservation and reuse, since it provides a reference point for the most current version of a file and allows previous versions to be audited and compared for consistency.

A versioning system records changes to files within a project over time, so that you can view or even revert to earlier versions at a later point in the project, compare changes across the development of the project, see who modified what and when, or which issues were introduced at which points of the project, among other things. This can be managed through file creation, tracking changes, and naming conventions, or through versioning software.

Indicate the version at the end of a file name. From the example above: 20220701V&ABullerswoodNotes_T.31-1923_V01. The V01 at the end indicates the version, so even if collaborators work on the file on the same date, they have a way of differentiating the versions they’ve created. Again, be explicit about the language used to describe versions in file names when you document your naming conventions.

An example of versioning for notes collected by two researchers from a focus group of interns might look like:

20230523_InternFocusNotes_v1 (you created this version, raw)
20230523_InternFocusNotes_v1_he84edited (he84 edited v1)
20230523_InternFocusNotes_v2 (you accepted he84’s changes)
20230523_InternFocusNotes_v5_submitted (version submitted to repository)
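Version markers like these can also be read by a short script, which helps when a folder fills up with drafts. The Python sketch below is a minimal example under stated assumptions: names end in _v (or _V) plus a number, optionally followed by a note such as _he84edited, and the helper names parse_version, latest_version and next_version_name are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ParsedName(NamedTuple):
    stem: str            # everything before the version marker
    version: int         # numeric version, e.g. 2 for "_v2"
    note: Optional[str]  # trailing note such as "he84edited", if any


# Assumed shape: <stem>_v<number>[_<note>], with "v" in either case.
_PATTERN = re.compile(r"^(?P<stem>.+)_[vV](?P<num>\d+)(?:_(?P<note>.+))?$")


def parse_version(name: str) -> ParsedName:
    """Split a file name (without extension) into stem, version and note."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"No version marker found in {name!r}")
    return ParsedName(match["stem"], int(match["num"]), match["note"])


def latest_version(names: Iterable[str]) -> str:
    """Return the name carrying the highest version number."""
    return max(names, key=lambda name: parse_version(name).version)


def next_version_name(names: Iterable[str]) -> str:
    """Suggest a name for the next version, one above the current highest."""
    newest = parse_version(latest_version(names))
    return f"{newest.stem}_v{newest.version + 1}"
```

Applied to the four names listed above, latest_version picks out the _v5_submitted file, and next_version_name suggests 20230523_InternFocusNotes_v6.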

Documentation

If this was helpful, stay tuned for our next post about tips and tricks for data management in the humanities!
