Deposit of data underpinning theses, #lovedata18
In our last post to mark Love Data Week 2018, we summarise the work done towards the deposit of data underpinning theses.
Almost a year ago, the Research Data Management (RDM) team began work on a process for the deposit of data underpinning PhD theses. The project has been a close collaboration with our colleagues from the e-theses team, Janet Aucock and David Collins. The new process was launched at the end of January 2018.
How did we approach this task? Janet, David and I met regularly (a few times a month) to look at different aspects of the process: workflows, communication, guidance and more. Most of the project team meetings were dedicated to developing detailed workflows. We wanted to make sure that the new data deposit workflow integrated seamlessly with the existing student submission process and the e-theses team's processes. Most importantly, we didn’t want to affect how the students submit their theses.
We also knew that the two existing workflows (for research data deposit and e-theses deposit) had to cross at some point, but we wanted to minimise (ideally keep down to zero) the number of extra emails required between the teams. How to achieve this? We knew we had to look for integrated ‘triggers’: standard actions in one team’s workflow that would prompt actions for the other team, a domino effect.
Luckily for us, a new online theses declaration tool was being introduced by Registry, in a project led by Jacqui Ritchie (Registry Co-ordinator, Continuous improvement). The tool allows the student to create their declaration by answering a series of questions (the tool is much more interesting than my very simple description of it). After meeting with Jacqui, it became clear that the declaration tool could offer some of the triggers we were looking for. When students complete key tasks, the tool sends them a confirmation email and copies the RDM team in. Other triggers are emails that the e-theses team send to the students as part of their workflow and that are now also sent to us, and vice versa.
So, what about the data deposit workflow? The deposit of data happens alongside the thesis submission. Students are asked to create dataset records in Pure before submitting their theses for examination, and to upload their data files after examination, once the revisions are approved. At this point we provide them with a DOI that they can include in the full text of their thesis. Of course, this is the simple version of the workflow. We need to check whether there are dataset embargo requests, which can be on files only, on files and description, or on files, description and title. We also need to make sure the Pure dataset record is updated accordingly and that we deal with sensitive or confidential data correctly. Finally, there is the question of whether students have used secondary data and how we can record this information in Pure.
The process we developed during our meetings should cover most of the possible scenarios for the publication of data underpinning theses. Still, as this is a new process, we are of course keeping it under constant review to further improve and optimise it.
Some questions remain to be answered, such as the difference between what we define as ‘underpinning’ and ‘supplementary’ data. When is a dataset an integral part of a thesis, such as an appendix? When is it an independent output? Some cases are straightforward to define; others require investigation and dialogue with the students.
If you want to find out more, next week (20 February 2018) Janet and I will present an overview of the project and the workflows at the OCLC EMEA Regional Council Meeting in Edinburgh. If you can’t make the meeting, we will also post updates on this blog over the next few months, sharing the improvements we make to the workflows and our lessons learnt, so watch this space!