Digital transformation has significantly changed the landscape of higher education, research, and knowledge and technology transfer, and has created a growing need for effective research data management (RDM). Julius-Maximilians-Universität Würzburg (JMU) provides guidance and support for researchers who are involved in the acquisition of third-party funding, the implementation of research projects, and other aspects of RDM. The University encourages its researchers to involve all collaborators who contribute to a research project in RDM from an early stage and to develop a data management plan (DMP).
At the planning stage of any research project, you should devote particular attention to developing a strategy for the management, publication, and use of the data generated by that project. To ensure the success of your project and the re-usability of the generated data, you should also make arrangements for the long-term storage of that data. Having proper data management procedures in place ensures long-term protection from data loss that is independent of the individuals involved. To facilitate access to and use of your research data, you should use transparent archiving structures and provide contextual information for that data.
Please also bear in mind that pure data publications are considered fully-fledged publications that can help you make your work more visible and raise your research profile. Your data will be assigned a DOI; this means that they will be citable without having been discussed in journal articles and can be used in some forms of citation analysis, lists of publications, etc.
The sections below will discuss good practice in RDM in more detail:
Develop a DMP to outline how you are planning to manage your research data throughout the project lifecycle. Start out by identifying what type of data your project will generate and discuss how you will process, maintain, store, and archive them as well as how they might be re-used in future research. To ensure the effectiveness of your RDM and the long-term usability of your data, you must develop your DMP at the planning stage of your research project, i. e. before data are generated, not after project completion. Why it is important to preserve and maintain data during a research project’s lifecycle and beyond can best be explained with the help of the DCC Curation Lifecycle Model and the WissGrid guidelines:
- Planning: The first stage of the lifecycle involves the planning of your research project, the development of your DMP, and the collection of data (either new primary data or already existing data). The analysis of these data will, in turn, generate additional data that are essential for your research project.
- Selection: It is neither possible nor recommended to retain all data generated in the course of research activities. This is why, before you archive your data, you must determine what should be retained and what should be disposed of (e. g. unnecessary intermediate results). You must also determine the retention period for your data.
- Ingestion: This is the transfer of data selected for retention to a data archive. Before you can transfer your data to an archive, you will have to process and structure them, homogenise them, and prepare metadata that describe them. This is usually a labour-intensive process. You should also keep a record of the steps involved in the processing of your data and should preserve the software environment you used to process them.
- Storage: The effective long-term archiving of your data beyond the lifecycle of your research project depends upon having an appropriate technical infrastructure in place. This can be provided by experienced storage service providers. Research data must be archived for a minimum of ten years.
- Preservation: You should not assume that digital data remain usable outside the environment in which they were generated and/or used originally. Therefore, please determine what technology environment is needed to ensure the usability of your data and how you are planning to ensure that your data remain usable even if technology changes.
- Access and re-use: Research data are useless if they are not discoverable. This is why, in the final stage, you must determine how you will be making your data discoverable, who you want to be able to access them, and how you will provide access to them.
Data management planning is about defining processes to be applied throughout the research data lifecycle, defining responsibilities, and clarifying what technologies you are going to apply.
Questions you may want to consider include:
- What is the purpose of your research?
- Which institutions and individuals will be involved?
- What type of data will you generate and use?
- How will you generate and use your research data?
- What discipline-specific standards will you apply (data formats, ontologies, etc.)?
- What quality assurance processes will you adopt?
- What additional information will you have to provide to make your data understandable?
- What data will have to be retained and why?
- When will you select data for retention?
- How long can and should your data be retained?
- Who will be allowed to use your data? Are any restrictions (legal or otherwise) on data sharing required?
Since requirements and standards vary from discipline to discipline, we are not able to provide a complete step-by-step guide to data management planning. Rather, you will have to tailor your planning to meet the requirements of your specific research project.
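As a minimal sketch, the questions listed above could be tracked in a simple checklist while a DMP is being drafted. The keys, the shortened question texts, and the `open_questions` helper are hypothetical and are not taken from any official DMP template; the answers are placeholders.

```python
# Hypothetical DMP checklist: keys and wording paraphrase the questions above.
DMP_QUESTIONS = {
    "purpose": "What is the purpose of your research?",
    "partners": "Which institutions and individuals will be involved?",
    "data_types": "What type of data will you generate and use?",
    "standards": "What discipline-specific standards will you apply?",
    "quality_assurance": "What quality assurance processes will you adopt?",
    "retention": "What data will have to be retained, why, and for how long?",
    "access": "Who will be allowed to use your data, and with what restrictions?",
}


def open_questions(answers: dict[str, str]) -> list[str]:
    """Return the questions that have no non-empty answer yet."""
    return [
        question
        for key, question in DMP_QUESTIONS.items()
        if not answers.get(key, "").strip()
    ]


# Placeholder answers for illustration only.
draft = {
    "purpose": "Example answer (placeholder).",
    "data_types": "Example answer (placeholder).",
    "standards": "   ",  # whitespace only counts as unanswered
}

for question in open_questions(draft):
    print("Still open:", question)
```

A list like this does not replace a discipline-specific plan; it only makes it visible which points still need to be addressed before the proposal is submitted.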
Funding bodies such as the DFG (German Research Foundation), the German Federal Ministry of Education and Research (BMBF), and the European Union increasingly require that researchers applying for funding submit, together with their proposals, a data management plan or, at least, a statement describing how data generated by the project will be managed. In addition, many aspects of RDM and of project-specific processes are being assigned greater weight in the evaluation of project proposals. There is no successful project without RDM.
Open data management in the context of EU funded projects
Open access to research data is an underlying principle of the European Commission’s Horizon 2020 programme. Researchers applying for funding under this programme (all sections and calls of the 2017 work programmes) must develop a data management plan outlining a strategy for the collection, storage, and open accessibility of data generated by EU funded projects. The EU has issued official guidelines on the development of data management plans for projects funded under Horizon 2020. The DMPonline tool helps you develop DMPs that meet the European Commission’s requirements.
BMBF funded projects
The BMBF’s relevant (auxiliary) terms and conditions for project funding (Bestimmungen/Nebenbestimmungen für Fördervorhaben) require that researchers applying for funding submit an exploitation plan (Verwertungsplan) discussing, among other aspects, the potential for re-use in research and technology development of data generated by funded projects.
DFG funded projects
In 2015, the DFG adopted its Guidelines on the Handling of Research Data. The DFG expects applicants to address the following questions at the planning stage of their research projects: Is it likely that the data generated by the project will be re-used in future research? Which datasets are likely to be re-used? How can the accessibility and re-usability of the data be ensured? Project proposals submitted to the DFG must include information on the type of data expected to be generated by the project and on the management of these data. Applicants may also request funding for project-specific costs incurred in connection with making the data re-usable by third parties. Collaborative research centres can submit proposals for information infrastructure projects. Research data must be archived for a minimum of ten years. For more information and discipline-specific recommendations for RDM, please refer to the web pages of the DFG.
Whether, and how easily, your research data are re-usable depends, among other factors, on your choice of data format.
Considerations when choosing formats:
- Is the format standard to your discipline?
- Does it guarantee long-term data usability and access? This cannot be known with absolute certainty, but the more software tools can read a format, the more likely it is that it will continue to be supported in the future.
- Is the format’s documentation openly accessible? If it is not, it will not be possible to develop software that can read the format at a later date.
- Are technologies (DRMs, encryption, etc.) restricting the use of the format?
- Are there legal restrictions on the use of the format (e. g. patents in the case of MP3)?
- Does the format contain relevant information only? In particular, you should avoid including irrelevant formatting information (e. g. font sizes in ODS or XLS spreadsheets).
A list of recommended data formats is available on the pages of the RADAR Project.
We would encourage you to use one of the following formats:
|Data type||Recommended format|
|Text with formulas||LaTeX (TEX)|
|Edition projects in the humanities||TEI/P5|
|Numerical data||HDF5|
|Raster graphics||PNG, TIFF (baseline)|
|Vector graphics||SVG, EPS|
|Relational databases||SQL dump, XML, cf. Spreadsheet|
|Structured data||XML or popular XML dialects, JSON, YAML|
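The following sketch shows one way to check a list of project files against the formats named in the table above before they are archived. The mapping from file extensions to formats and the `check_formats` helper are illustrative assumptions; an extension alone cannot confirm, for example, that a TIFF file is actually baseline TIFF.

```python
from pathlib import Path

# Assumed mapping from file extensions to formats named in the table above.
# The extensions are an illustrative assumption, not part of the recommendation.
RECOMMENDED_EXTENSIONS = {
    ".tex": "LaTeX",
    ".h5": "HDF5",
    ".hdf5": "HDF5",
    ".png": "PNG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".svg": "SVG",
    ".eps": "EPS",
    ".sql": "SQL dump",
    ".xml": "XML",
    ".json": "JSON",
    ".yaml": "YAML",
    ".yml": "YAML",
}


def check_formats(filenames):
    """Split file names into (recommended, needs_review) by extension."""
    recommended, needs_review = {}, []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare case-insensitively
        if suffix in RECOMMENDED_EXTENSIONS:
            recommended[name] = RECOMMENDED_EXTENSIONS[suffix]
        else:
            needs_review.append(name)
    return recommended, needs_review


# Hypothetical file names for illustration.
ok, review = check_formats(["figure1.PNG", "results.h5", "notes.docx"])
print("Recommended:", ok)
print("Needs review:", review)
```

Files that end up in the review list are not necessarily unsuitable; they simply need to be checked against the considerations listed above or converted before archiving.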
Metadata make your research data identifiable, discoverable, and accessible to other researchers. Use a metadata schema to describe and annotate your datasets. Appropriate metadata will help you manage your research data, understand and use them over long periods of time, and make them accessible to others. The Dublin Core schema, a widely adopted metadata element set, comprises, for example, elements such as identifiers (ISBN, DOI, etc.), technical descriptions (e. g. file formats), content descriptions (e. g. research topic, research methods), and references to documents to which your data relate.
For lists of established disciplinary metadata standards, please refer to the websites of the Stanford University Libraries and the Digital Curation Centre (DCC).
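To illustrate what a Dublin Core description can look like, the sketch below builds a small record with Python's standard `xml.etree.ElementTree` module. The element names (`title`, `identifier`, `format`, `subject`, `description`, `relation`) belong to the Dublin Core element set; the `metadata` wrapper element, the `dublin_core_record` helper, and all values, including the DOIs, are placeholders assumed for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# All values below are placeholders for illustration only.
record = dublin_core_record([
    ("title", "Example dataset"),
    ("identifier", "doi:10.0000/example"),
    ("format", "application/json"),
    ("subject", "example research topic"),
    ("description", "Placeholder description of the research methods."),
    ("relation", "doi:10.0000/related-article"),
])

print(ET.tostring(record, encoding="unicode"))
```

Repositories usually prescribe their own schema and submission format, so a record like this mainly serves as a checklist of the information you will be asked for.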
To make the retrieval of information through metadata searches more efficient, you should use standardised terms and phrases wherever possible. Only if the same term is consistently used for the same concept can a metadata search retrieve exact and complete results and can links be established between different sets of research data. This is why you should use a controlled vocabulary when you populate metadata elements. A controlled vocabulary is a set of pre-defined rules and terms that represent specific concepts. It is provided in the form of a word list or structured thesaurus. You can create and use your own controlled vocabulary for your specific project. We would recommend, however, that you use standardised vocabularies and thesauri. These are often discipline- or institution-specific and are updated and disseminated by the competent bodies on an ongoing basis.
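The effect of a controlled vocabulary can be sketched in a few lines: free-text keywords are mapped to one preferred term per concept, and keywords that are not in the vocabulary are flagged for review. The vocabulary below, its terms, and the `normalise_keywords` helper are hypothetical; a real project would draw its terms from a standardised vocabulary or thesaurus.

```python
# Hypothetical project vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "climate change": {"global warming", "climatic change"},
    "questionnaire": {"survey form"},
}

# Reverse lookup from every accepted spelling to its preferred term.
LOOKUP = {
    variant: preferred
    for preferred, variants in VOCABULARY.items()
    for variant in variants | {preferred}
}


def normalise_keywords(keywords):
    """Map keywords to preferred terms; collect those not in the vocabulary."""
    preferred, unknown = [], []
    for keyword in keywords:
        key = " ".join(keyword.lower().split())  # fold case and whitespace
        if key in LOOKUP:
            term = LOOKUP[key]
            if term not in preferred:  # keep each concept only once
                preferred.append(term)
        else:
            unknown.append(keyword)
    return preferred, unknown


terms, rejected = normalise_keywords(
    ["Global  Warming", "climate change", "Survey form", "weather"]
)
print("Preferred terms:", terms)
print("Not in vocabulary:", rejected)
```

Because two differently worded keywords end up as the same preferred term, a later metadata search for that term finds both datasets.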
So-called authority data are another type of controlled vocabulary. These are datasets that describe persons and organisations as well as concepts or pieces of literature, music, film, artefacts, buildings, and geographical places. They provide semantic context and are assigned persistent identifiers. Many research institutions in German-speaking countries use authority data that are available in the Integrated Authority File (GND) of the German National Library.
Classifications, too, are a type of controlled vocabulary. They are used to systematically categorise data and describe them in a standardised way.
The University of Basel is building up a database of controlled vocabularies, authority files, classifications, thesauri, ontologies, and taxonomies worldwide (BARTOC Basel Register of Thesauri, Ontologies & Classifications).
Different academic disciplines prefer different publication media. You should choose a publication medium that is commonly used in your discipline. In principle, you can choose between publishing your research data:
- in a media repository,
- in a data paper, or
- in the form of a supplement to a journal article or book.
A repository is a server for data storage. Institutional repositories hold the publications of one particular institution. Disciplinary repositories contain data associated with a particular academic discipline or particular types of media, and multidisciplinary repositories provide a general publication platform.
You should choose a repository that is commonly used in your discipline. In addition, you should make sure that the repository of your choice is reputable and independent. It should also provide options for the granting of rights and licences and should allow persistent identifier linking.
See below for a list of selected disciplinary and multidisciplinary repositories:
- Inspire-HEP - high energy physics
- Pangaea - earth and environmental science
- Psychdata - psychology
- GESIS - quantitative social research
- DARIAH-DE - humanities and cultural studies
An exhaustive list of data repositories by discipline is available at the Open Access Directory (OAD).
Re3data.org is a registry of around 1,500 repositories that also offers a rating system.
It is important that your research data are accompanied by a data paper. A data paper is a metadata document that describes a particular dataset or a group of datasets. Unlike conventional research articles, data papers do not report hypotheses and conclusions; they merely describe data and the circumstances of their collection, their quality and limitations as well as their potential for re-use in future research. The actual datasets are usually stored on a media server or deposited in a repository and are merely linked to the paper.
Data papers are published in the form of peer-reviewed articles in academic journals. In addition to these data papers, most journals also contain other scientific or scholarly articles.
Open access to data generated by EU funded projects
Open access to all peer-reviewed scientific publications that result from research funded under Horizon 2020 is an obligation in the European Commission’s Horizon 2020 programme. It is not enough to merely provide a link on the project website; researchers will have to publish the final version of the peer-reviewed article, choosing one of the following options:
- ‘Green option’ / self-archiving: The article is archived by the researcher in an online repository (e. g. JMU’s OPUS publication server) before, after, or alongside its publication in a scientific or scholarly journal with a maximum embargo period of 6 or, in the humanities and social sciences, 12 months.
- ‘Gold option’ / open access publishing: The article is immediately provided in open access mode. Under certain conditions, the University of Würzburg will cover some of the associated costs, the so-called article processing charges (APC). Many funding bodies will also reimburse APCs incurred in connection with a funded project.
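The embargo limits of the green option can be turned into a concrete deadline with simple date arithmetic. The sketch below assumes that the embargo is counted in calendar months from the publication date and that a day which does not exist in the target month is clamped to the last day of that month; both conventions, the field keys, and the function names are assumptions for illustration, so please check the funding conditions for the authoritative rule.

```python
import calendar
from datetime import date

# Maximum embargo periods in months, as stated for the green option above.
MAX_EMBARGO_MONTHS = {"default": 6, "humanities_social_sciences": 12}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


def latest_open_access_date(published: date, field: str = "default") -> date:
    """Latest date by which a self-archived article should be openly accessible."""
    return add_months(published, MAX_EMBARGO_MONTHS[field])


# Hypothetical publication date for illustration.
published = date(2017, 8, 31)
print(latest_open_access_date(published))  # 2018-02-28
print(latest_open_access_date(published, "humanities_social_sciences"))  # 2018-08-31
```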
Research data published in open access mode should also be shared as open data. Projects can opt out of publishing their research data if, for example, publication is incompatible with the obligation to protect results that are capable of commercial exploitation. Proposals will not be penalised for opting out. We expect, however, that more and more consortia and projects will want to make their research data openly available.
There are a number of legal aspects that must be considered when managing and, in particular, when publishing research data. The use of certain research methods in disciplines such as the life sciences is, for example, subject to approval by an ethics committee. If you want to use personal data, you must meet stringent data protection requirements. In addition, the use and publication of data is governed by the provisions of the German Gesetz über Arbeitnehmererfindungen (Employee Invention Act, ArbEG) and copyright law and is subject to the rules on protection of third party interests.
Please make sure you clarify any legal issues at the planning stage of your project.
JMU provides guidance and support on issues surrounding RDM:
- project proposals and structuring of research projects
- publishing via JMU's publishing house Würzburg University Press
- publishing via the OPUS publication server
- open access publications
- research data repositories
- support for research projects in the humanities
- interoperable, open, documented data formats
- short and medium-term data storage for periods of up to 10 years
- publication of research data
For contact details, please go to the 'Contact' section.
For more information on RDM, the DFG, and the requirements of the European Commission, please refer to:
- EU funded support on issues surrounding open access and open data:
- OpenAIRE information portal
- German contact point for issues surrounding open access
- ZENODO interdisciplinary repository