Section overview

  • processing time: 15 minutes, 36 seconds

    • 1.1 Introduction & Learning Objectives

      This learning module provides you with information on how to handle research data and shows you the advantages of well-structured and organised research data management (RDM).

      After completing this chapter, you will be able to...

      • ...classify and define the terms 'research data' and 'research data management',
      • ...value the advantages of a well-structured RDM,
      • ...get an overview of the further contents of the learning module and know which aspects are most relevant for you.

    • 1.2 What is research data and what is research data management?

      According to the "Guidelines on the Handling of Research Data" published by the DFG (German Research Foundation) in 2015, research data includes “among other things: Measurement data, laboratory values, audiovisual information, texts, survey data, objects from collections or samples that are created, developed or evaluated in scientific work. Methodological testing procedures such as questionnaires, software and simulations can also represent central results of scientific research and should therefore also be included under the term research data.”

      Research data can therefore vary considerably depending on the subject area. It plays a role not only in the disciplines typically associated with data, such as the natural sciences or the social and economic sciences (see Fig. 1.2 & Fig. 1.3), but also includes, for example, language data from linguistics or image descriptions from art studies.

      Fig. 1.2: Research data from chemistry

      Fig. 1.3: Research data from economic sciences

      The focus is primarily on handling digital research data. The particular challenge is that, due to the digitalisation and automation of work processes, ever larger and more heterogeneous amounts of data are being created, and handling them sensibly and in a coordinated way is very time-consuming. This heterogeneity is characterised on the one hand by the wide variety of file formats in use (.txt, .docx, .pdf, .ods, etc.) and on the other hand by different forms of presentation with different levels of abstraction (graphics, 3D models, simulations, survey data, etc.).

      Conventional scientific procedures often do not yet guarantee that these large amounts of data are used sufficiently. Furthermore, there are still only a few overarching standards for handling (digital) research data. Handling is mainly shaped by individual or subject-specific practices. Data loss and non-reproducible data are not uncommon, especially after project completion. Research data can then only be reused or reproduced for further research purposes to a limited extent, for example, due to a lack of documentation of the work steps or outdated formats (cf. Büttner, Hobohm and Müller 2011: 13 et seq.).

      It is precisely this problem that research data management addresses. It is intended to offer sustainable ways of handling research data. Research data management, or RDM for short, encompasses the entire handling of research data, from planning, collection, processing, and quality assurance to storage and provision or publication. All steps of RDM should be documented and aligned with the current subject-specific standards and practices of the individual scientific disciplines. As a first step, many scientific institutions have now published a research data guideline to regulate the handling of research data. The research data guideline of the Frankfurt UAS can be found here.

    • 1.3 Advantages of good research data management

      But what advantages do you actually get from good research data management (RDM)? As a first step, Figure 1.4 breaks down the various goals that can be pursued through RDM along different dimensions.


      [Image: Jens Ludwig, what are research data? Nestor PERICLES School 2016]

      Fig. 1.4: Goals of the RDM for different dimensions

      The goals are influenced by different dimensions (internal/external context; active/rare use of data). Research data management should support researchers in the handling and traceability of their data (the two left blocks) and meet the demands of the public (the two right blocks). Furthermore, it should ensure that generated data can be actively used for further research (upper blocks) and support long-term quality assurance in the form of documentation of the research process (lower blocks) (cf. Broschard and Wellenkamp 2019: section Benefits of research data management).

      Research data management should lead to long-term traceability and reproducibility of data through appropriate documentation of the research process and minimise data loss. The transparency of data collection and processing is thus promoted and validation of research results, e.g., in case of allegations, is further facilitated. In the long run, successful research data management saves time and resources. Reasons for this include better collaboration (e.g. through common standards, use of common platforms, etc.), avoidance of errors and protection against data loss.

      In addition to these practical benefits during research, publishing well-documented and reusable datasets improves your visibility and reputation as a researcher, since data publications are increasingly valued alongside scientific articles.

    • 1.4 Research data and good scientific practice

      The DFG's "Guidelines for Safeguarding Good Research Practice" (often referred to as the DFG Code) provide a common basis for science by setting requirements for scientific excellence and collaborative scientific work. These also include requirements for working with research data. The DFG Code consists of a total of nineteen guidelines: the first six deal with scientific principles, guidelines 7 to 17 with the actual research process, and the last two with non-compliance with good research practice.

      The explanations here focus on the guidelines that relate directly to research data. Guideline 7, "Cross-phase quality assurance", states with regard to research data:

      "The origin of the data, organisms, materials and software used in the research process is disclosed and the reuse of data is clearly indicated; original sources are cited. The nature and the scope of research data generated during the research process are described. Research data are handled in accordance with the requirements of the relevant subject area. The source code of publicly available software must be persistent, citable and documented. Depending on the particular subject area, it is an essential part of quality assurance that results or findings can be replicated or confirmed by other researchers (for example with the aid of a detailed description of materials and methods)" (DFG 2019, 14 et seq., emphasis by the author).

      Research data, including the associated research software, is considered to be of great value in the context of good scientific practice with regard to the quality assurance of research. Therefore, make sure that you document all work steps in such a way that other scientists have the possibility to check your results. This also includes citing external (data) sources that you may have used to extend your own data.

      Guideline 10, "Legal and ethical frameworks, usage rights", points out, in addition to the responsible handling of research data, that the legal framework conditions of a research project also include "documented agreements on the rights of use to research data arising from it and research results". (DFG 2019, 16) For you as a researcher, this means obtaining these agreements and disclosing the rights of use in the metadata descriptions of the data for subsequent users.

      In Guideline 12, "Documentation", the DFG requires that "all information relevant to the achievement of a research result [should be documented] as comprehensibly as is necessary and appropriate in the subject area concerned in order to be able to review and evaluate the result". (DFG 2019, 17 et seq.) In order to ensure this traceability, it is necessary, among other things, to provide information on the research data used and on the research data generated during the project period, presented openly and in a form that third parties can understand.

      Guideline 13, "Providing public access to research results", calls for researchers to move towards open access, also with regard to the research data used. "In the interest of transparency and to enable research to be referred to and reused by others, whenever possible researchers make the research data and principal materials on which a publication is based available in recognised archives and repositories in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable)." (DFG 2019, 19) However, the DFG also explicitly points out that in some cases it may not be possible to publish data in open access (e.g., in the case of third-party patent rights). The following principle should therefore always apply with regard to Open Access: As open as possible, as restrictive as necessary.

      The last guideline that relates to research data is Guideline 17, "Archiving". It states that, when research results are published, the research data on which the publication is based "are generally archived in an accessible and identifiable manner for a period of ten years at the institution where the data were produced or in cross-location repositories". (DFG 2019, 22) Find out about archiving options from the Research Data Unit at the [name of university] before you start a research project. Especially if it is a project with a very high volume of data, funds can be applied for to ensure the necessary storage infrastructure for archiving.

      If you need more information on good scientific practice, it is worth visiting the DFG's newly created portal on the DFG Code. Additional information can be found on the website Ombudsman for Science, a "body established by the DFG to assist all scientists and scholars in Germany with questions and conflicts in the area of good scientific practice (GSP) or scientific integrity." Here you will find further literature that deals specifically with the handling of research data according to good scientific practice. At this address you will find references to international literature on so-called codes of conduct in science. This article deals with the question of cooperation and granting access to data after the completion of a third-party funded project, when the researchers may no longer be at the institution where they collected the data.

      The guidelines of the Frankfurt UAS for safeguarding good scientific practice (Leitlinien der Frankfurt UAS zur Sicherung guter wissenschaftlicher Praxis) closely follow the guidelines of the DFG.

    • 1.5 Structure of this learning module on RDM

      The aim of good research data management is to keep research data available and usable for others for as long as possible, i.e., far beyond the duration of the project. This is why, in the context of research data management, we often talk about the lifespan of data and, associated with this, about the research data life cycle. The research data life cycle, which is discussed in Chapter 2, illustrates what this means and what tasks can arise in RDM.

      When you are ready to start your own project, the major research funders (especially the DFG, BMBF and EU) now often require you as a researcher to draw up a data management plan that comprehensively describes how you will handle the research data throughout the duration of the project. Chapter 3 will show what a data management plan can look like and what you should bear in mind.

      If you then actually collect and process the data and want to make the data usable for subsequent research, you should provide the research data with metadata that give people who are not familiar with the project a comprehensive understanding of the data. If you want to make the data available to a large subject-specific community, the use of so-called metadata standards should also be included. Chapter 4 will give you an overview of the benefits of metadata and metadata standards.
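
      To make the idea of metadata more concrete, the following is a minimal sketch of a machine-readable description stored alongside a dataset. The field names (title, creator, license, etc.) and the helper functions are illustrative assumptions and are not taken from any particular metadata standard; a real project would follow the standard used in its discipline.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset."""
    title: str
    creator: str
    description: str
    date_created: str          # ISO 8601 date, e.g. "2024-03-05"
    license: str               # usage rights for subsequent users
    keywords: List[str] = field(default_factory=list)


def to_sidecar_json(meta: DatasetMetadata) -> str:
    """Serialise the record as JSON, e.g. for a file saved next to the data."""
    return json.dumps(asdict(meta), indent=2, ensure_ascii=False)


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """Return the names of all fields that are still empty."""
    return [name for name, value in asdict(meta).items() if not value]


if __name__ == "__main__":
    record = DatasetMetadata(
        title="Example survey data",
        creator="",
        description="Illustrative record only",
        date_created="2024-03-05",
        license="CC-BY-4.0",
    )
    print(to_sidecar_json(record))
    print("Still to fill in:", missing_fields(record))
```

      A check such as missing_fields can be run before data are handed over, so that incomplete descriptions are noticed while the people who know the project are still available.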

      Chapter 5 deals with the FAIR principles, which formulate a quality standard for making data findable, accessible, interoperable, and reusable. Even though this development is still comparatively young, research data must increasingly be measured against these criteria. In addition to the more technical FAIR principles, the CARE principles are also presented, which cover the ethical requirements of handling research data professionally.

      If your data is supposed to be a useful resource for other researchers, it must reach a certain qualitative standard. What options there are to increase the quality of your data and what to look out for will be presented in Chapter 6.

      Chapter 7 provides guidance on how to better organise your data during the research project. This includes, on the one hand, the use of a versioning concept to be able to directly recognise and compare both old and new data, and on the other hand, the creation of specific folder structures or the use of uniform naming of files and research data.
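
      Uniform file naming combined with a simple versioning concept can be sketched in a few lines. The naming pattern used here (project, description, date, two-digit version) and the function names are assumptions chosen for illustration, not a convention prescribed by this module.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Compose a name of the form project_description_YYYY-MM-DD_vNN.ext."""
    ext = extension.lstrip(".").lower()
    return (f"{slug(project)}_{slug(description)}_"
            f"{created.isoformat()}_v{version:02d}.{ext}")


def next_version(filename: str) -> str:
    """Return the same file name with its version number increased by one."""
    match = re.search(r"_v(\d+)(?=\.[^.]+$)", filename)
    if match is None:
        raise ValueError(f"no version suffix found in {filename!r}")
    width = len(match.group(1))
    bumped = int(match.group(1)) + 1
    return (filename[:match.start(1)]
            + f"{bumped:0{width}d}"
            + filename[match.end(1):])


if __name__ == "__main__":
    name = build_filename("RDM Demo", "Survey wave 1", date(2024, 3, 5), 2, ".CSV")
    print(name)
    print(next_version(name))
```

      Putting the date in ISO format and zero-padding the version number means that an alphabetical listing of a folder also sorts the files chronologically and by version.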

      The collection of data is usually followed by the storage of this data on a data medium so that you can retrieve it and use it later. In addition, according to good scientific practice, the data should be stored somewhere after the research is completed so that other researchers can access and re-use the data. What you should pay attention to and what support the Frankfurt UAS offers is the subject of Chapter 8.

      Legal issues often arise in connection with the processing and subsequent publication of research data. Chapter 9 provides an overview of the legal particularities you need to be aware of when dealing with research data and how to deal with them. However, the explanations in this chapter are of a purely informative nature and are not legally binding. In the event of acute legal questions regarding the collection or publication of data, you should therefore always additionally consult the legal advisory service and/or the data protection officer of the Frankfurt UAS (dsb@fra-uas.de).


    • Test your knowledge about the content of the chapter

    • Here is a summary with the most important facts.