The D’s of Data: Data Management Plans
This is the first part in our 2026 series on data.
When it comes to planning for data, it's easy to feel overwhelmed. Templates and other guidance can help, but sometimes they still don't clarify what's required. In this post we'll break down seven topics that a best-practice data management plan should cover.
1. Details: data types, formats, and expected volume
This may seem obvious, but the answers to these simple questions will determine what you can do in the rest of the data management plan. For instance, if you plan to submit raw data to a repository to get a DOI that can be used to cite your work, but you will end up with several terabytes of raw data, you might need to explore other options. Storage considerations could shift as well. Where and how you will work on analysing the data will depend on the file formats/types of data. This information usually takes up little of the word count and lends a lot of credibility to the plan if you're writing a grant application.
2. Collecting the data
The first step is to plan your data collection or generation and how you will transfer it to the platform for analysis during your project. This includes both getting the data and securing it at the point where you plan to work on it. Will you be travelling to collect the data? How will you secure it for transfer? Where will you be analysing it, and how will you get it there? If you have physical data (samples), how will you obtain them and convey them to the lab or other location where you are planning to analyse them? For digital data, will you need to alter the format for analysis (e.g. transcribing interview audio or converting file types)? Is code involved? If you have sensitive data, at what point will it be anonymised/pseudonymised and how?
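To make the pseudonymisation question concrete, here is a minimal sketch of one common approach: replacing a direct identifier with a salted hash, so records stay linkable across files without exposing identities. The column names and participant values are hypothetical, and a real project would follow whatever method was approved in its ethics application; the key point is that the salt (or the lookup key) must be stored separately from the dataset.

```python
import hashlib

def pseudonymise(rows, id_field, salt):
    """Replace a direct identifier with a short salted hash.

    The same (salt, identifier) pair always yields the same code,
    so records remain linkable across files; without the salt, the
    original identity cannot be recovered from the code alone.
    """
    out = []
    for row in rows:
        row = dict(row)  # copy, so the original rows are untouched
        digest = hashlib.sha256((salt + row[id_field]).encode()).hexdigest()
        row[id_field] = digest[:12]
        out.append(row)
    return out

# Hypothetical interview log: 'participant' is the direct identifier.
rows = [
    {"participant": "Alice Example", "duration_min": "42"},
    {"participant": "Bob Example", "duration_min": "35"},
]
pseudo = pseudonymise(rows, "participant", salt="keep-this-secret-separately")
```

Note that hashing a name is pseudonymisation, not anonymisation: whoever holds the salt can re-identify participants, so the data are still personal data under most regulations.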
3. Storage, backup, and access
Storage, backup, and access relate to storing and securing the data while you (and anyone else who has access to it) are actively working on it. Is the planned storage cloud-based or not, and why? Reasons for cloud-based storage might include easier access for collaborators, autosaving, and support from your university's IT services department. Security of sensitive data is a common reason for server-based storage. Storing data solely on external hard drives or a personal computer is generally not recommended, since there is little support or recourse available if something goes wrong. Talk to your university's research data or IT services teams for advice about the best options for storage. Backups might be built into one of your storage systems, as in many electronic research notebooks, but it's always good to have a backup schedule in mind: for example, daily incremental backups, weekly full backups, and whether the backups will be automated or manual. If you're using cloud storage as your primary option, you can usually find the automated backup schedule, location of the servers and other details online, then augment these with your own schedule.
You may have heard of the 3-2-1 backup rule. It’s a data protection strategy that stores 3 total copies of data, on 2 different media types, with 1 copy stored off-site. This approach prevents data loss from single points of failure like ransomware, hardware theft, or natural disasters.
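As a concrete illustration, here is a minimal sketch of the copying-and-verifying part of a 3-2-1 setup: the working copy plus one copy on a second medium (say, an external drive) and one in an off-site location (say, a folder synced to institutional cloud storage). The destination paths are hypothetical, and in practice you would schedule this rather than run it by hand, but checksumming each copy after writing is the step people most often skip.

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path):
    """Checksum a file so copies can be verified against the original."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def backup_321(source, second_medium, offsite):
    """Make the 2nd and 3rd copies of `source` for a 3-2-1 scheme.

    Copy 1 is the original working file; this writes copy 2 to a
    second medium and copy 3 to an off-site location, verifying
    each copy by checksum immediately after writing it.
    """
    source = Path(source)
    reference = sha256(source)
    copies = []
    for dest_dir in (Path(second_medium), Path(offsite)):
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / source.name
        shutil.copy2(source, dest)  # copy2 preserves timestamps/metadata
        if sha256(dest) != reference:
            raise IOError(f"copy to {dest} did not verify")
        copies.append(dest)
    return copies
```

A synced cloud folder only counts as "off-site" if it keeps version history: plain sync will happily replicate a ransomware-encrypted file to every copy.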
4. Documentation
Documentation should cover anything about the data generated as part of your research that would enable it to be reused by your team, other researchers, or yourself in future research. Methods for generating data, analytical or procedural information, planning to capture instrument metadata alongside data, recording provenance of data and their coding, or detailed descriptions for variables are just a few examples of documentation you might include in a DMP.
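To give a sense of scale, a minimal README deposited alongside a dataset might look something like the skeleton below. The file names, variables, and licence shown are hypothetical placeholders; the structure is what matters.

```
Project:      <project title and grant reference>
Contact:      <name, email, ORCID>
Description:  What the dataset contains and how it was generated
Methods:      Instrument or software used, versions, key settings
Files:        data/interviews_2026.csv  - one row per interview
              docs/codebook.pdf         - coding scheme for responses
Variables:    participant   - pseudonymised participant ID
              duration_min  - interview length in minutes
Licence:      e.g. CC BY 4.0
```

Even a short file like this can be the difference between a reusable dataset and an opaque pile of files five years on.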
5. Risks and risk mitigation
It’s important to evaluate the main risks to security of information, confidentiality, and anything else that might impact the security of your data. Summarise the main risks and the processes that will keep your data safe. Assess the severity of the risks—what are the implications if data is lost, hacked, or accidentally released?—and respond with appropriate measures to mitigate those risks. Describe the main processes and security measures for storage and processing of commercial, confidential or personal data, and for data access. Describe the controls put in place and the auditing of user compliance with consent and security conditions.
6. Long-term: retention, preservation, destruction
Although the future may seem a long way off, long-term storage, preservation, and retention of the research data are a key part of the planning process. Long-term retention is about how long you keep the data, and is key for research integrity and reproducibility, which matter to funders, journals, and the wider academic community when verifying or replicating findings. It’s also important for your own future reuse and impact—your high-quality dataset can have value long after the initial project ends. Keeping data accessible long-term allows for secondary analysis, new research questions, teaching use, future collaboration opportunities, increased citations, and broader research impact. Additionally, many research councils, charities, institutions, or government bodies specify minimum retention periods for data (usually 5-10 years as a minimum, sometimes longer for environmental, clinical, or longitudinal data). Don’t forget to indicate whether any data will not be retained and why!
Preservation is not just storage: it’s keeping data usable and understandable over time. File formats, software, and storage media change, so you’ll need to think about making sure your data are stored in durable, open formats and include sufficient documentation (yep, back to documentation, but with a longer view) such as metadata, codebooks, or README files. This is also a gift to your future self—it’s so much easier to reuse your own data when it’s been thoughtfully preserved. Funders also increasingly expect researchers not just to store data but to ensure it remains interpretable, curated, and, often, preserved in a trusted repository. This strengthens the credibility of the project and ensures public money produces enduring, accessible outputs.
Finally, not all data should be kept forever. For example, data destruction might be considered where there are ethical obligations or legal compliance reasons. Secure destruction should be planned in advance and the methods (e.g. secure wiping, physical destruction) should be documented.
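To illustrate what "secure wiping" means in practice, here is a minimal sketch that overwrites a file with random bytes before deleting it. Be aware of the caveat in the comment: on SSDs, journaling filesystems, and cloud-synced folders, overwriting in place is not guaranteed to destroy every physical copy, so for genuinely sensitive data follow your institution's disposal policy rather than a home-grown script.

```python
import os
import secrets
from pathlib import Path

def wipe_file(path, passes=3):
    """Overwrite a file with random bytes, then delete it.

    Caveat: on SSDs (wear levelling), journaling filesystems, and
    synced cloud folders, in-place overwriting does NOT reliably
    destroy all copies - use institutionally approved disposal
    methods for sensitive data on those media.
    """
    path = Path(path)
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(secrets.token_bytes(size))
            f.flush()
            os.fsync(f.fileno())  # push each pass to disk
    path.unlink()
```

Whatever method you use, the DMP should record which data will be destroyed, when, how, and who is responsible for confirming it happened.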
7. Publication and sharing
It’s helpful at this point to think about your data access statement. The UKRI and the University, as well as some other major funders, require a data access statement even if no data were used during the creation of the article. If you have sensitive data, make sure that you follow what was approved in your ethics form, participant information sheet and/or consent form. If publishing a sensitive dataset, did you inform participants that you would be sharing the data publicly? Did you specify any conditions? It is crucial to consider potential future data sharing early, and whether it will meet funder, publisher, or other requirements. Consider where (institutional repository? data centre?), how (In part? In full? Publicly? With access restrictions?), and in what form the data will be shared. These details will help in future when crafting your data access statement.
What if I don’t have data?
No problem. It’s usually sufficient to write a short paragraph describing your work and explaining (for example) that it’s purely theoretical, or that the data examined is textual third-party data (i.e. not belonging to you) and will be covered in the reference list.
Data management planning may seem intimidating, but breaking it down into manageable chunks can support best practice and the right level of detail. Approaching planning strategically is an excellent opportunity to organise your thoughts. In reality, you probably already know most of the information that’s needed. If this post has been helpful for you in thinking about data management planning, stay tuned for our next post in the ‘D’s of Data’ series.
