  • processing time: 14 minutes, 4 seconds

    • 5.1 Introduction & learning goals

      When you start reading about the technical requirements for data exchange in research data management, you will quickly come across the term “FAIR Data Principles” or “FAIR Principles” (rarely also “FAIR Criteria”). In addition, anthropology, the social sciences and similar disciplines place ethical requirements on data, for example in studies of indigenous peoples. For this reason, the CARE principles were developed as a counterpart to the more technically oriented FAIR principles.

      After completing this chapter, you will be able to...

      • ...name the FAIR principles.
      • ...process research data according to the FAIR principles.
      • ...name the CARE principles.
      • ...name what needs to be considered when applying the CARE principles.
    • 5.2 What are the FAIR principles?

      Collecting and analysing research data takes many steps. It costs time and energy and demands the intellectual effort of researchers. In addition, it often consumes large amounts of material, electricity and energy for mobility, equipment, computers, or elaborate settings. Especially when humans are the subject of research or animal testing is necessary, it quickly becomes clear that the collected research data should, wherever possible, be used as widely and diversely as possible, and that repetitions of the same research should be avoided.

      Research data should therefore be usable without restrictions for as long as possible. This applies to the use of research data collected by the researchers themselves, but also to research data that researchers make available to each other.

      For this, research data must have certain properties. These are described in more detail in the FAIR principles. The abbreviation FAIR is composed of the first letters of the descriptive words:

      • Findable
      • Accessible
      • Interoperable
      • Reusable.

      They were developed in 2014 at a workshop at the Lorentz Center in the Netherlands and first published in March 2016 in the journal Scientific Data (cf. Wilkinson et al. 2016).

      The vision behind the FAIR principles is that researchers worldwide can benefit from research data published in this way and, in turn, produce research data of their own in accordance with the FAIR principles. At the European level, for example, the European Commission's European Open Science Cloud (EOSC) project relies on strict compliance with the FAIR principles when research data is created and published, so that this data can be made available to European researchers in a European science cloud.


    • 5.3 How do I prepare research data according to the FAIR principles?

      The following outlines how to prepare research data in accordance with the FAIR principles. It is based on the four properties named above and on the original publication, and refers to the steps of the research data cycle (planning, collection, archiving, etc.). Although the four properties are considered separately here, they depend on each other.

      The following explanations serve only as a brief summary of the individual requirements of the FAIR principles. For a much more detailed overview of how you can implement them as a researcher, see, for example, the TIB weblog.

      Findability

      Ensuring the findability of research data is crucial for its reusability. An important step towards making data findable is the assignment of persistent identifiers, which ensure the globally unique and permanent identification of a digital resource. A frequently used type of persistent identifier is the DOI (Digital Object Identifier). The identifier must also appear in the metadata (see Chapter 4) and point to the actual research data, so that metadata and data are linked. To improve findability further, collect and document metadata that is as complete as possible, including all parameters of the actual research data. Finally, the data must be fed into a searchable system that humans can use.
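
      As a minimal sketch of the identifier requirement described above, the following Python snippet checks that a metadata record carries a plausible DOI and turns it into a link via the https://doi.org/ resolver. The record, its field names and the DOI itself are made-up placeholders, and the pattern is only a loose syntax check; it cannot tell whether a DOI is actually registered.

```python
import re

# Loose syntax check only: "10.", a numeric prefix, a slash, a non-empty suffix.
# Passing this check does not prove that the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link that the doi.org resolver can follow."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return "https://doi.org/" + doi


def has_persistent_identifier(metadata: dict) -> bool:
    """Check that the metadata record itself carries a plausible DOI."""
    identifier = metadata.get("identifier", "")
    return bool(DOI_PATTERN.match(identifier.strip()))


# Hypothetical metadata record; the DOI below is a made-up placeholder.
record = {
    "title": "Example survey data",
    "identifier": "10.1234/example-dataset.v1",
}

print(has_persistent_identifier(record))
print(doi_to_url(record["identifier"]))
```

      In practice, the repository that registers the DOI usually also generates this link, so a check like this is mainly useful for your own metadata files before upload.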

      Accessibility

      Once users have found interesting research data via a search system, they face the next problem: accessing the data. To guarantee accessibility in the first place, the FAIR principles stipulate the use of standardised communication protocols (mainly HTTP[S] and FTP), which any browser can use.

      Data can be published directly either in research data journals or in research data centres. Research data publications make it possible to publish all research data and metadata, not just a selection of research results as is common for peer-reviewed journal articles.

      When publishing research data, persistent metadata is very important. To be compliant with the FAIR principles, metadata of research data once published must continue to be available even if the research data may need to be withdrawn later. This condition should be met by all repositories, but check this anyway before publishing.

      It should be noted, however, that not all research data is suitable for free publication. Great care must be taken with sensitive and personal data, as well as with the rights of other persons or institutions to the research data. The same applies if further use is still pending, for example a patent application: all ambiguities must be resolved before publication. If the data is sensitive and therefore cannot be made freely available, it is sufficient for compliance with the FAIR principles to state in the metadata whom to contact if one is interested in this data (e.g., e-mail address, telephone number, etc.). FAIR is therefore not necessarily synonymous with Open Access, even though Open Access is desirable.

      Interoperability

      The term “interoperability” originally comes from IT system development and refers to the ability of systems to work with other systems that already exist or are planned for the future, as far as possible without restrictions. Transferred to research data, this means on the one hand that it should be possible to integrate data into other similar data without a major effort and on the other hand that the research data should be compatible with different systems for analysis, processing and archiving.

      To ensure this, the FAIR principles propose the use of widely used formal languages and data models that are readable by both machines and humans. Examples of such languages include RDF and OWL, as well as subject-specific controlled vocabularies (see Chapter 4.5) and thesauri.
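
      To illustrate what a description that is readable by both machines and humans can look like, here is a small sketch that writes dataset metadata as JSON-LD, a JSON-based serialisation of RDF. The choice of JSON-LD and of the schema.org vocabulary is an assumption made for this example and is not prescribed by the FAIR principles; all values are placeholders.

```python
import json

# Hypothetical dataset description; every value is a placeholder.
# The property names come from the schema.org "Dataset" type.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example survey data",
    "identifier": "https://doi.org/10.1234/example-dataset.v1",
    "creator": {"@type": "Person", "name": "Jane Doe"},
    "dateCreated": "2021-05-01",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["survey", "example"],
}

# JSON-LD is plain JSON, so the standard library is enough to write and read it.
serialised = json.dumps(dataset, indent=2, ensure_ascii=False)
print(serialised)

# Round trip: another program can parse the same text back into a structure.
restored = json.loads(serialised)
```

      Because the property names are taken from a shared vocabulary rather than invented locally, other systems can interpret fields such as the licence or the identifier without a project-specific explanation.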

      Reusability

      To enable a high degree of reusability by humans and machines, research data and its metadata must be described well enough that the data can be replicated or reproduced and, in the best case, also applied to different settings. It helps to choose reproducible settings from the outset where possible, and to give the data a large number of unique and relevant attributes. Among other things, these attributes should answer the following questions so that other users can draw conclusions about how the data was generated:

      • For what purpose or area of application was the data collected or generated?
      • When was the data collected?
      • Is the data based on own or third-party data?
      • Who collected the data and under what conditions (e.g. laboratory equipment)?
      • Which software and software versions were used?
      • Which version of the data is available, if more than one?
      • What were fixed baseline parameters in the survey?
      • Is it raw data or already processed data?
      • Are all variables used either explained somewhere or self-explanatory?

      Furthermore, the data must contain information on its licence status, i.e., it must state which data use licence it falls under (see Chapter 9). In the age of Open Science, Open Access licences for one's own data are desirable and are also required by many funders. The best-known OA licences include Creative Commons and MIT, both of which also comply with the FAIR principles. To ensure that the data can also be used by others and that its origin can be traced accurately, the metadata should also contain standardised citation information.
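
      The questions above, together with the licence and citation information, can be treated as a checklist for a metadata record. The following sketch reports which entries are still missing. The field names are hypothetical and chosen only for this example; a real project would use the fields of its metadata standard.

```python
# Hypothetical field names, one per reuse question from the list above,
# plus licence and citation information.
REUSE_FIELDS = [
    "purpose",
    "collection_date",
    "data_origin",             # own or third-party data
    "collector",
    "conditions",              # e.g. laboratory equipment
    "software",                # names and versions
    "data_version",
    "baseline_parameters",
    "processing_level",        # raw or already processed
    "variable_documentation",
    "licence",
    "citation",
]


def missing_reuse_fields(metadata: dict) -> list:
    """Return the reuse fields that are absent or empty in a metadata record."""
    return [name for name in REUSE_FIELDS if not metadata.get(name)]


# Placeholder record that answers only some of the questions.
record = {
    "purpose": "Pilot study on reading habits",
    "collection_date": "2021-05-01",
    "data_origin": "own",
    "collector": "Jane Doe",
    "software": "R 4.1.0",
    "processing_level": "raw",
    "licence": "CC-BY-4.0",
}

print(missing_reuse_fields(record))
```

      A check like this only shows whether an entry exists, not whether it is detailed enough for someone else to reproduce the data.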

    • 5.4 Possibilities of implementation

      Implementing the FAIR principles in every aspect is a challenge. To get a first indication of how FAIR your data is, you can use the FAIR self-assessment tool from the Australian Research Data Commons. Furthermore, when selecting a data repository for storing and publishing your data, make sure that it has a “FAIR Compliance” designation. To qualify, it must meet the following requirements:

      • The data sets (or ideally the individual files of a data set) are provided with unique, persistent identifiers (e.g. DOIs).
      • The repository allows the upload of intrinsic metadata (e.g. name of the author, content of the data set, associated publication) as well as metadata defined by the person responsible (e.g. names of variables).
      • The licences (e.g., CC0, CC-BY, MIT) under which the data can be made available in the repository are clearly identifiable or selectable by the user.
      • The source information, including metadata, is always publicly available, even in the case of restricted-access data sets.
      • The repository provides an input screen that prescribes a specific format for the intrinsic metadata (to ensure machine readability/compatibility).
      • The repository has a plan for the long-term preservation of the archived data.

      Source: Swiss National Science Foundation. Data Management Plan (DMP) – Guidelines for researchers

      When searching for a suitable repository that meets the FAIR data principles, you can also use the Repository Finder. If you activate the option “See the repositories in re3data that meet the criteria of the FAIRsFAIR Project”, you will get an overview of certified repositories that offer Open Access and persistent identifiers for the data. The Repository Finder uses the Registry of Research Data Repositories (re3data) for the search. It provides a good overview of international research data repositories in a variety of scientific disciplines.



      Fig. 5.1: The contents of the FAIR principles. CC-BY 4.0 Henrike Becker, graphically adapted by Andre Pietsch





    • 5.5 What are the CARE principles?

      The FAIR principles focus on characteristics of data that facilitate increased data sharing. Ethical issues play no role in them. To address these, the Global Indigenous Data Alliance (GIDA) published the CARE Principles for the responsible use of indigenous data in 2019 as a complementary guide to the FAIR Principles. They were drafted during International Data Week and the parallel Research Data Alliance Plenary on 8 November 2018 in Gaborone, Botswana, and focus on the individual and collective rights of indigenous peoples to self-determination and to control over data collected about them. Such data includes, for example, surveys of their language, knowledge, customs, technologies, natural resources, and territories. In Germany, the application of the CARE Principles is not yet widespread.

      The abbreviation CARE is composed of the first letters of the following requirements for data to help achieve this goal:

      • Collective Benefit
      • Authority to Control
      • Responsibility
      • Ethics
    • 5.6 The CARE principles in detail

      Collective Benefit

      The first of the CARE Principles is that data systems must be designed in such a way that indigenous peoples can benefit from the data. For inclusive development, governments and institutions must actively support the use and re-use of data by indigenous nations or communities. They do so by helping to lay the foundations for innovation, value creation and the promotion of local, self-determined development processes.

      Data can improve planning, implementation and evaluation processes and support indigenous communities in addressing their needs. Decision-making processes can also be improved through data collection at all levels, involving citizens as well as institutions and governments in the collection process, giving them a better understanding of their peoples, territories, and resources. At the same time, the open sharing of such data also provides researchers with better insights into research and policy programmes that affect the respective indigenous peoples.

      Indigenous data is based on community values, which in turn are part of an overall society. Any value created as a result of research with such data should therefore also benefit indigenous communities in an equitable way, so that they can derive their own benefit from it and, if necessary, change their future actions based on this data.

      Authority to Control

      When data is collected in research about indigenous peoples, it is important to plan at the collection stage how to enable the research subjects to control this data themselves in order to protect their rights and interests even when the data is published. Self-governance of this data in the form of self-management should empower both indigenous peoples and the controlling institutions to determine how populations, lands and territories, resources, designations of origin and their knowledge are represented and identified in such data.

      In addition, Indigenous Peoples have a right to free, prior, and informed consent to the collection and use of such data, including the development of data policies and protocols for collection. This also includes making the collected data available and accessible. They must therefore have an active leadership role in the actual management and subsequent access to this data.

      Responsibility

      The collection of data from indigenous peoples goes hand in hand with certain responsibilities of the researchers in dealing with these data. For example, surveys must always be conducted in a way that research results and analysed data contribute to the collective benefit of the indigenous peoples and are made available to them in an understandable manner.

      To ensure a positive relationship between researchers and Indigenous Peoples, the use of data is only possible if there is mutual respect, trust and understanding. Importantly, what respect, trust and understanding look like in the particular cultural setting is determined by the indigenous peoples, not the researchers. When working with data, it must be ensured at all times that the production, interpretation, and any further use of the data preserves and respects the dignity of the indigenous community.

      To improve the skills and capacities of indigenous peoples in handling data collected about them, data use is tied to a mutual responsibility to improve data literacy in the communities. This responsibility also includes supporting the development of digital infrastructure as far as possible, to enable the collection, management, security, and subsequent use of data. This is to be achieved by, among other things, providing resources to generate data based on the languages, worldviews and lived experiences (including values and principles) of the respective indigenous peoples.

      Ethics

      The rights and well-being of Indigenous Peoples should be the primary concern at all stages of the data lifecycle. In order to minimise harm to Indigenous Peoples and maximise benefits, data must be collected and used in a manner consistent with the ethical framework of Indigenous Peoples and the rights affirmed in the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP). The assessment of benefits and harms should be made from the perspective of the Indigenous peoples, nations, or communities to which the data relate, not from the researcher's basis of assessment.

      Ethical decision-making processes address imbalances in power and resources and their impact on indigenous rights and human rights. To increase equity, such processes must always include relevant representatives of the indigenous community. In addition, data governance should take into account potential future use and harm. The metadata should therefore include the origin (provenance) and purpose of the data, as well as any restrictions or obligations on secondary use, including any consents.
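
      The metadata elements named in the paragraph above can be sketched as a simple record structure. The class name, the field names and the completeness rule below are assumptions made for illustration only. Which information is required, and whether a given use is acceptable, is decided together with the community concerned and not by a script.

```python
from dataclasses import dataclass, field


@dataclass
class GovernanceMetadata:
    """Hypothetical record of the information named in the paragraph above."""

    provenance: str  # origin of the data
    purpose: str     # why the data was collected
    # Restrictions or obligations that apply to secondary use.
    secondary_use_conditions: list = field(default_factory=list)
    # References to the consents that were given.
    consents: list = field(default_factory=list)

    def is_complete(self) -> bool:
        """Assumed rule: provenance, purpose and at least one consent are recorded."""
        return bool(self.provenance and self.purpose and self.consents)


# Placeholder values only.
entry = GovernanceMetadata(
    provenance="Placeholder community, recorded in 2020",
    purpose="Language documentation",
    secondary_use_conditions=["Contact the community's data committee before reuse"],
    consents=["Collective consent, placeholder reference"],
)

print(entry.is_complete())
```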
