Chapter 9: Legal aspects of research data management | Research Data Management - An Online Introduction

9.1 Introduction & learning objectives

Legal issues in dealing with research data arise at every stage of the research data life cycle. Figure 9.1 provides an initial overview of legal aspects that need to be considered in each phase of data handling.

Adapted from: Paul Baumann/Philipp Krahn, Rechtliche Rahmenbedingungen des FDM - Grundlagen und Praxisbeispiele, Dresden 2020, Slide 4

Fig. 9.1: Legal aspects of research data management in the research data life cycle

You do not have to find solutions for all the legal details of handling your research data yourself. However, if you want to work in the spirit of good scientific practice and research ethics, you should know at least the basics of some legal scenarios.

After completing this chapter, you will be able to...

• ...name the most important legal areas in dealing with research data
• ...take concrete steps to implement your research project in compliance with the law
• ...decide whether and how you can publish your data
• ...contact the right place if you have any questions

If you have complex legal questions, you can contact the legal department and/or the data protection officer of the university. In addition, your research data management officer will also be happy to help you.

9.2 Which areas of law are relevant?

The following areas of law are particularly relevant for the responsible handling of data:

• Data protection law
• Copyright and neighbouring rights
• Contract law

Depending on the research project, other areas of law may also be affected. For example, if your research involves inventions, you must also observe patent law. Likewise, especially in the case of cooperation with companies or contract research, there may be contractual agreements that need to be observed (e.g. confidentiality agreement).

Especially in epidemiological research with personal data and in research with therapeutic objectives, ethical considerations should also be taken into account. These are often already summarised in discipline-specific guidelines.

For example:

For some projects, an expert opinion from an ethics committee may be obligatory. As a rule, universities maintain such committees to assess fundamental ethical issues in science and research as well as ethical issues in scientific investigations. At Frankfurt UAS, an ethics committee is currently being established. If you need an ethics review now, you may contact a subject-specific ethics committee. A collection of ethics committees in Germany is offered by KonsortSWD.

9.3 Data protection

Data protection rights must be observed when collecting, storing, processing, and passing on research data relating to individuals. If you work as a researcher at a Hessian university with such data, it is advisable to know the main features of the following legal texts in particular:

General Data Protection Regulation of the European Union (GDPR)
German Federal Data Protection Act (BDSG)
Hessian Data Protection and Freedom of Information Act (HDSIG)

The following video briefly introduces the data protection laws that are particularly relevant to scientific research and explains how they relate to each other:

Source: Excerpt from MLS LEGAL - Data Protection in Research (YouTube) [Creative Commons licence with attribution (reuse permitted)]

Data without personal reference or anonymised information, on the other hand, do not fall under data protection law and can usually be processed freely, taking into account other rights (e.g. copyrights).

What exactly distinguishes personal data from other (anonymous) research data is explained in detail in the following section. In case of doubt, you should assume a personal reference to avoid liability risks.

9.3.1 Personal data and special categories of personal data

According to Art. 4 (1) of the GDPR, personal data is any information relating to an identified or identifiable natural person. Examples of personal research data include survey data in the social sciences or health data in medical research.

An identifiable person is one who can be identified, directly or indirectly, in particular by reference to:

an identifier such as a name, an identification number, location data, or an online identifier, or
one or more characteristics that express the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person.

Case law has recently confirmed that the following, in particular, are personal data:

Images, film, and sound recordings if there is a reference to a person
IP addresses
Written answers of a candidate in a vocational examination
Examiner's comments on the assessment of these answers

In determining whether a person is identifiable, the GDPR requires that account be taken of all the means reasonably likely to be used by the controller or by any other person to identify the person, considering factors such as the cost of and the time required for identification (Recital 26 GDPR).

Source: Excerpt from MLS LEGAL - Data protection in research (YouTube) [Creative Commons licence with attribution (re-use permitted)].

In addition, the law defines categories of data that are considered particularly sensitive. These include, for example, data on a person's state of health, sexual orientation, and political or religious views. A list of these special categories of personal data can be found in Article 9 of the GDPR.

This data is subject to special protection and special due diligence obligations during processing. This means, for example, that participants in scientific studies must explicitly consent to the processing of these special categories of personal data before the data is collected. Further aspects are explained in the following video:

Source: Excerpt from MLS LEGAL - Data protection in research (YouTube) [Creative Commons licence with attribution (re-use permitted)].

When processing personal data, the so-called Principles relating to processing of personal data (Art. 5 GDPR) must be observed:

Personal research data may only be collected if they are necessary to achieve the research purpose.
The collection and processing must be carried out transparently and fairly vis-à-vis the data subjects.
Data subjects must at all times be able to understand the processing of their personal data and must not be misled by false or omitted information.
Protecting privacy by safeguarding personal data should be central to all collection and processing considerations.
The data must also correctly reflect the circumstances of the person concerned, i.e. it must not falsify them.
They shall be protected against misuse (e.g. removal, alteration, damage) technically and organisationally within the bounds of what is reasonable.

9.3.2 Informed consent and legal permission standards

Personal research data may only be collected and processed with the informed consent of the person concerned or with a legal standard of permission (so-called principle of prohibition with reservation of permission).

According to Recital 32, sentence 2 GDPR, the following requirements can be stated for informed consent:

Consent must be freely given (i.e. without physical or psychological pressure).
Especially when processing sensitive personal data (according to Art. 9 or 10 GDPR), it is advisable to obtain the consent in writing.
The persons giving consent must be able to understand in advance which of their personal data will be used how, for what, by whom and for how long. In other words, people should be put in a position in which they are able to assess the consequences of their own consent.

On the other hand, legal permissions are granted without the consent of the data subject. Particular importance is attached to the exceptions for scientific research purposes contained in § 27 BDSG, but also in many state data protection laws (e.g. § 13 LDSG-BW, § 17 DSG-NRW, § 13 NDSG).

According to this, the processing of personal data is permitted if the interests pursued with the research project outweigh those of the persons concerned (cf. forschungsdaten.info). However, since this rarely applies, you should always obtain consent in case of doubt.

Consent does not require any special form. However, it must be verifiable – e.g. in the event of a review by the data protection supervisory authority – so that written or electronic documentation is strongly recommended. The declaration of consent should contain at least the following information:

Person responsible for data collection (legal entity) who is also the addressee of the declaration of consent;
Project title;
Specific information on the type of data collected;
Data processing procedures, data protection officer;
Reference to voluntariness, to the right of withdrawal, reference to the consequences or the absence of consequences in the event of refusal or withdrawal;
particularly important: Intended use(s).
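The items listed above can be treated as a simple checklist. The following Python sketch checks a draft declaration for missing items. The field names, the `missing_items` helper, and the sample draft are hypothetical illustrations. They are not a legal template and do not replace a review by the data protection officer.

```python
# Hypothetical field names mirroring the list above; not a legal template.
REQUIRED_ITEMS = {
    "responsible_entity": "Person responsible for data collection (legal entity)",
    "project_title": "Project title",
    "data_types": "Type of data collected",
    "processing_procedures": "Data processing procedures",
    "data_protection_officer": "Data protection officer",
    "voluntariness_and_withdrawal": "Voluntariness, right of withdrawal, consequences",
    "intended_uses": "Intended use(s)",
}


def missing_items(declaration):
    """Return the descriptions of required items that are absent or empty."""
    return [
        label
        for key, label in REQUIRED_ITEMS.items()
        if not declaration.get(key)
    ]


# Made-up draft with several gaps (an empty string counts as missing)
draft = {
    "responsible_entity": "Example University",
    "project_title": "Example Study",
    "data_types": ["interview audio", "age"],
    "intended_uses": "",
}

for label in missing_items(draft):
    print("Missing:", label)
```

An empty result only shows that every item has been filled in; whether the wording is legally sufficient still has to be assessed separately.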

Above all, the data subject must be informed that their consent is completely voluntary, that they can therefore refuse to consent and, if they do consent, that they can revoke the consent at any time with effect for the future, but that previous uses cannot be reversed (cf. https://www.forschungsdaten-bildung.de/einwilligung).

The declaration of consent must be supplemented with information on the processing of the data. This includes the legal basis and purposes of the processing (insofar as these go beyond the processing), any data transfer to countries outside the EU, the storage or deletion periods of the personal data and the right of appeal to a data protection supervisory authority (cf. Watteler/Ebel 2019: 60).

Consent can also be given in the abstract for scientific purposes that are not known at the time of collection (so-called broad consent). However, the more specific the description, the more likely it is that the scope of the consent can extend to uses that go beyond the primary purpose.

If the data are to be published as part of RDM, the consent should explicitly include the storage and publication of the data. A practicable compromise between abstract and concrete broad consent can, for example, be a graded consent.

Fig. 9.2: Example of informed consent in "broad consent format" (source: Baumann/Krahn 2020).

The following video summarises all aspects of informed consent and legal permission standards once again:

Source: Excerpt from MLS LEGAL - Data protection in research (YouTube) [Creative Commons licence with attribution (re-use permitted)]

Further information

Some disciplines offer assistance and examples of wording for written informed consent (cf. e.g. VerbundFDB, RatSWD).

Model declaration for oral or written interviews of the Arbeitskreis Deutscher Markt- und Sozialinstitute (Working Group of German Market and Social Institutes)
Recommendations and model consent forms of the RatSWD
Formulation examples for “informed consents” of the VerbundFDB
Template for the Informed Consent for the Processing of Personal Data (German / English) by Qualiservice
Handout on Informed Consent (explanations on the use of the QA templates)

9.3.3 Means of removing identifying features

In general, personal research data must be anonymised after collection as soon as the research purpose permits (at the latest when the research project is completed).

Anonymisation

A change in the data to such an extent that the individual data on personal or factual circumstances can no longer be attributed to a specific or identifiable natural person (so-called absolute anonymisation) or can only be attributed to a specific or identifiable natural person with a disproportionate effort in terms of time, costs, and manpower (so-called de facto anonymisation).

The first step is to remove direct identification features (name, address, telephone number, etc.). Often, however, this is not sufficient to eliminate a reference to a person. In this case, reducing the accuracy of the information (aggregation) can be an effective measure that also allows certain parts of the information to be retained.
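As a minimal sketch of this first step, the following Python snippet drops direct identifiers from a list of records. The field names (`name`, `address`, `phone`) and the sample record are hypothetical. Which fields count as direct identifiers must be decided for each dataset, and removing them alone does not guarantee anonymity.

```python
# Hypothetical set of direct identifiers; adapt it to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the given identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Illustrative, made-up record
participants = [
    {
        "name": "Jane Doe",
        "address": "1 Example Street",
        "phone": "000",
        "age": 34,
        "city": "Frankfurt",
    },
]

cleaned = drop_direct_identifiers(participants)
print(cleaned)  # [{'age': 34, 'city': 'Frankfurt'}]
```

The remaining fields (here `age` and `city`) may still allow indirect identification, so they need to be assessed as well.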

Aggregation

Summary of several individual values of the same kind to reduce the granularity of information. From the summarised information, it is no longer possible to draw conclusions about the individual pieces of information.

Here, detailed individual information (e.g. salary in the last month) is grouped into classes (e.g. lower, middle, upper class). The degree of aggregation necessary to exclude a personal reference can vary. It essentially depends on which other potential identification features are available in the data or can be obtained from external sources.
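The salary example can be sketched in Python as follows. The class boundaries (2000 and 5000) and the sample values are purely illustrative assumptions. Suitable boundaries depend on the dataset and on which other identifying features remain in it.

```python
def salary_class(monthly_salary, lower_limit=2000, upper_limit=5000):
    """Map an exact salary to a coarse class.

    The default limits are illustrative assumptions, not recommended values.
    """
    if monthly_salary < lower_limit:
        return "lower"
    if monthly_salary < upper_limit:
        return "middle"
    return "upper"


# Made-up salaries from the last month
salaries = [1450, 3200, 7800]
classes = [salary_class(salary) for salary in salaries]
print(classes)  # ['lower', 'middle', 'upper']
```

Only the class labels would then be kept in the dataset, while the exact amounts are discarded.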

Example of gradual aggregation

Address → City → State → East/West → Country → Continent
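This chain can be sketched as a small Python helper that keeps only the chosen level of detail and everything coarser. The field names, the `generalise_location` function, and the sample location are hypothetical and serve only to illustrate the idea.

```python
# Levels ordered from most to least detailed, following the chain above.
LEVELS = ["address", "city", "state", "east_west", "country", "continent"]


def generalise_location(location, level):
    """Keep only the requested level and everything coarser than it."""
    start = LEVELS.index(level)  # raises ValueError for an unknown level
    return {key: location[key] for key in LEVELS[start:] if key in location}


# Made-up location record
location = {
    "address": "1 Example Street",
    "city": "Frankfurt am Main",
    "state": "Hesse",
    "east_west": "West",
    "country": "Germany",
    "continent": "Europe",
}

generalised = generalise_location(location, "state")
print(generalised)
```

Choosing `"state"` removes the address and the city but retains the coarser information, so part of the analytical value of the data is preserved.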

In each case, consider carefully which of the available means is the most suitable and proportionate for removing the identifying characteristics, so that de-anonymisation is impossible or possible only to a very limited extent, even for someone with additional knowledge and extensive capacities for researching and combining data.

Postponement of anonymisation is only possible if characteristics that reveal a personal reference are needed to achieve the research purpose or individual research steps. This is the case, for example, during an ongoing research project that uses biometric data.

In this case, however, the personal characteristics must be securely and separately stored immediately after collection. This can be done, for example, by pseudonymising the personal research data.

Pseudonymisation

The separation of personal characteristics immediately after collection from the rest of the data, so that the data can no longer be assigned to a specific person without adding information.

One example is the use of a key table that assigns corresponding ID codes to the plain names of persons. In this way, the personal reference can only be established if one is in possession of the key table. If necessary, this can also be held by an independent trustee.
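A key table of this kind can be sketched in Python as follows. The function name `pseudonymise`, the `P-` prefix for ID codes, and the sample records are assumptions made for illustration. In practice, the key table would have to be stored separately from the research data and protected accordingly.

```python
import secrets


def pseudonymise(records, name_field="name"):
    """Replace names with random ID codes.

    Returns the pseudonymised records and the key table (name -> ID code).
    """
    key_table = {}
    pseudonymised = []
    for record in records:
        name = record[name_field]
        if name not in key_table:
            # Draw random codes until one is unused (collisions are unlikely).
            code = "P-" + secrets.token_hex(4)
            while code in key_table.values():
                code = "P-" + secrets.token_hex(4)
            key_table[name] = code
        rest = {k: v for k, v in record.items() if k != name_field}
        pseudonymised.append({"id": key_table[name], **rest})
    return pseudonymised, key_table


# Made-up interview records; the same person appears twice
records = [
    {"name": "Jane Doe", "answer": "yes"},
    {"name": "John Roe", "answer": "no"},
    {"name": "Jane Doe", "answer": "maybe"},
]

data, key_table = pseudonymise(records)
print(data)       # research data with ID codes only
print(key_table)  # keep this separately, e.g. with a trustee
```

Random codes are used here instead of codes derived from the name, so the personal reference cannot be recomputed from the data without the key table.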

However, data processed in this way continue to have a personal reference, and are therefore subject to the requirements of data protection law, until the separately stored personal characteristics are deleted.

Source: Excerpt from MLS LEGAL - Data protection in research (YouTube) [Creative Commons licence with attribution (re-use permitted)].

9.4 Decision-making authority

In addition to data protection, another important question is who can decide on the handling of the research data, especially its publication. As a rule, the person to whom the research data are “assigned” can also decide on their handling, such as their publication. Such an “assignment” can result from copyright law, service contract law or patent law, for example.

9.4.1 Copyright and Ancillary Copyright Law

As a rule, the protectability of individual research data under copyright law can only be assessed on a case-by-case basis and even then, not with sufficient legal certainty. Nevertheless, different case groups of research data can be distinguished according to the concrete type of content and, above all, how it was obtained:

Qualitative research data are, for example, linguistic works such as qualitative interviews or longer texts. They can contain copyrighted formulations, structures and thought processes. Copyright protection does not apply if the wording, structure, and line of thought are essentially predetermined by professional practice.
Scientific representations, such as drawings, plans, maps, sketches, and tables, may be subject to copyright protection if the representation is not dictated by factual constraints or scientific conventions, but instead gives the scientist room for manoeuvre.
Under the same conditions, photographs and other photographic images are also protected by copyright. In addition to photographs, images from imaging procedures, such as X-ray, magnetic resonance and computer tomography images are included, as well as photographs and individual images from films.
Quantitative data are, for example, measurement results or statistical data. In the context of standardised surveys, there will be no copyright protection in most cases.
(Quantitative) research data, the arrangement and compilation of which has the effect of establishing individuality, is a so-called database work (Section 4 UrhG (German Copyright Act)). Only its structure and not the information as such is subject to copyright protection.
Metadata often are relatively short, purely descriptive representations. They are usually not protected by copyright. In principle, they can only be protected in the rare cases where they contain, for example, longer sections of text or photographs.

Photographs and other photographic images may also be protected by a neighbouring right under Section 72 UrhG. The following figure by Brettschneider (2020) attempts a generalisation of the protectability of research data as copyright works:

Fig. 9.3: Work quality of research data, source: https://zenodo.org/record/3763031, slide 5.

Compilations of research data within the framework of a database can be protected by copyright as a database work, but also by the database producer right (Section 87a UrhG). This ancillary copyright requires a substantial investment in terms of collecting, organising, and making research data accessible.

The owner of the database producer rights is usually the person who makes the essential investments, e.g. pays the researchers' remuneration and bears the economic risk. Generally, this is also the employing university or research institution. In some cases, a third-party commissioning or funding institution may also be the owner.

In the case of non-protected research data (e.g. measurement results), it is largely unclear from a legal point of view who has the decision-making authority over the data in a specific individual case. Whether a possible personal right of the scientist also allows an assignment of the research data to a person in these cases is disputed.

9.4.2 Granting of rights of use within the framework of service and employment contracts

If the creation of copyright-protected works is one of the duties or central tasks of the employment contract, the employer is granted rights of use to these so-called “compulsory works” on the basis of the employment contract or employment relationship (Section 43 UrhG (German Copyright Act)). The following “mapping” of research data results from the balance of interests with the freedom of research (Art. 5 para. 3 GG (German Basic Law)):

As a rule, university teachers are entitled to all rights of exploitation, use and publication of the works they have created, unless express contractual agreements provide otherwise (e.g. third-party funding, non-disclosure agreements). Section 43 UrhG (so-called “compulsory works”) does not apply here.
Scientific assistants and employees are privileged under Article 5 (3) of the GG (German Basic Law) if and to the extent that the scientific work is carried out free of instructions. If the research is carried out in accordance with instructions, a tacit granting of the right to use the research data generated is to be assumed.
In the case of students and external doctoral candidates, no rights of use are granted to the university, as they are not employees. However, different contractual agreements can be made, e.g. in the case of third-party funded projects, through which the university is granted rights of use.

The following figure illustrates the issues of the transfer of exploitation rights to the employer (“compulsory work” under Section 43 UrhG) and the balancing of interests with the freedom of research (Article 5(3) GG) according to roles as they are to be weighed in the scientific field in individual cases:

Fig. 9.4: Ownership of research data, source: https://zenodo.org/record/3763031, slide 7

It should be noted that the granting of rights of use within the framework of service and employment contracts may also be tacit if the granting of rights of use is not expressly regulated in the contract. Within the framework of the (tacit) granting, the scientist also leaves to the employer the right to determine whether and how the work is published. On the other hand, each scientist retains their right to be named.

9.4.3 Summary

The following video summarises the complex interplay of all the legal positions on the “mapping” of research data elaborated so far and goes beyond them in a few additional aspects (e.g. software, data carriers):

Source: Excerpt from "Open Science: From Data to Publications" - Brettschneider, Peter (2020): Legal Issues in Publishing (https://www.youtube.com/watch?v=CrvnMLxGppI) [Creative Commons licence with attribution (reuse permitted) CC BY 4.0]

9.5 Publication and licensing of research data

Before data can be made publicly available, there are a number of legal aspects to consider – because not all data can or should be made public. The most important legal aspects are considered in the following decision aid in the form of a flow chart. Answering the questions will guide you through the decision-making process to a recommendation:

Fig. 9.5: Decision-making process for data publication, (source: forschungsdaten.info, https://zenodo.org/record/3368293)

Essentially, but not exclusively, questions of data protection and copyright must be clarified before publication. Whether research data can be published in a repository is therefore often already decided when the data are collected and the corresponding declarations of consent are obtained.

9.5.1 What are suitable licensing models?

In order for others to be allowed to use your copyrighted data, the conditions of use must be regulated. This is done by issuing a licence. If no licence exists, copyrighted data may only be used with the express consent of the copyright holder.

On the other hand, non-copyrighted research data whose use is already permitted without contractual permission (e.g. a licence) should neither be restricted nor made subject to conditions. For this reason, the CC-BY 4.0 licence, for example, creates no enforceable attribution obligation for such data (see clause 8a of the licence agreement).

Creative Commons licences are often used to make research data available. Like the European Commission in its Horizon 2020 programme, the DFG recommends the use of these licence types. When deciding on a specific licence, the guiding principle is “as open as possible, as restrictive as necessary”:

Fig. 9.6: Possible uses of data under different Creative Commons licences, source: Apel et al.

The “Abridged version of the expert opinion on the legal framework conditions of research data management” of the BMBF-funded DataJus project at TU Dresden, which investigated the legal framework conditions of research data management, advocates the following two licences:


Licence	Description
CC0(Plus)	The CC0 licence enables maximum release of the data and facilitates subsequent use. There is no right to credit. This licence is particularly recommended for metadata.
CC-BY 4.0	The CC-BY 4.0 licence makes sense if credit is desired. At the same time, the requirement to cite the source is met (safeguarding good scientific practice). The CC-BY 4.0 licence is therefore recommended for the publication of research data.

The use of further licence modules is not recommended. For example, the Creative Commons (CC) licences with the attribute “ND” (e.g. CC-BY-ND) rule out the distribution of “modified” material. This would make it impossible to make a new database publicly available that was created from parts of other databases.

Software, unlike much other research data, requires a separate licence. Creative Commons licences are not recommended for this purpose. Instead, dedicated software licences are available, such as the MIT licence, the GNU General Public License (GPL), the GNU Lesser General Public License (LGPL), and the Apache licence. The most important distinction here is between permissive licences (such as Apache) and so-called copyleft licences (such as the GNU GPL). Copyleft corresponds largely to the "Share Alike" element of Creative Commons licences.

9.5.2 What can inhibit publication?

Not all research data may or should actually be published. Before you decide not to publish your data at all, you should always check whether you can take measures to enable legally and ethically unobjectionable publication. The following figure provides an overview of possible legal hurdles and corresponding solutions:

Fig. 9.7: Decision tree for data publication, source: Böker/Brettschneider (2020)

In addition, you should also take research ethics aspects into account when deciding whether to publish your research data. The following points should give you some criteria without claiming to be complete:

Can the data be used in a way that is harmful to society?
For example, does publication pose risks to the researched individuals (even if they have consented to the use of their data)?
Do participating working group members have legitimate interests in preventing or delaying data publication (e.g. for the completion of qualification work)?

9.5.3 Protection of confidential information in research data centres

By using data centres or even archives, it is possible to restrict access to confidential and sensitive data and at the same time enable data sharing for research and educational purposes. The data held in data centres and archives are generally not publicly accessible. Their use after user registration is restricted to specific purposes. Users sign an end-user licence in which they agree to certain conditions, such as not using data for commercial purposes or not identifying potentially identifiable individuals. The type of data access permitted is determined in advance with the originator. Furthermore, data centres can impose additional access regulations for confidential data.¹

9.6 Summary

The following video “From Data to Publications” concludes this chapter by explaining the complex interplay of all the legal positions on research data that have been worked out so far, and in a few aspects also goes beyond what has been explained (e.g. software, data carriers):

Source: "Open Science: From Data to Publications" - Brettschneider, Peter (2020): Legal issues in publishing (https://www.youtube.com/watch?v=CrvnMLxGppI) [Creative Commons licence with attribution (reuse permitted) CC BY 4.0]
