Persistent Identifiers in Canada – Position Paper

Prepared for NDRIO in November 2020
Contact: Eugene Barsky, Head, Research Commons, University of British Columbia, eugene.barsky@ubc.ca

For the ORCID Canada Consortium, DataCite Canada Consortium, and the Canadian Research Data Management (RDM) Community

What is a Persistent Identifier (PID)?

The research enterprise generates a great deal of information in both physical and digital form, from descriptive information about researchers to publications to datasets resulting from a research project. This information is scattered across many systems and technologies, including human resource systems, grant management systems, publication databases, data repositories, and web pages. Persistent Identifiers (PIDs) are the anchors that facilitate links between related information.

Essentially, these IDs are labels that refer to a specific entity in the information landscape, such as an object, organization, person, or dataset. For example, in the same way that a ‘Person’ record has the fields ‘First Name’ and ‘Last Name,’ it is best practice for each such record to also have an ‘ID’ field that uniquely identifies the person, since there can be more than one author with the same first and last names. A persistent ID adds value by providing a long-lasting reference to a digital object that gives information about that object, regardless of what happens to it.

PIDs serve two primary functions: first, PIDs uniquely identify an object, person, or organization, facilitating unambiguous reference to that entity; and second, PIDs provide a mechanism to find entities over time, even if they change location.
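
As a minimal sketch of these two functions, the toy registry below keeps one stable identifier per entity and lets its location change behind the scenes. The class name, the identifier string, and the example URLs are all hypothetical, chosen only for illustration; real PID systems are considerably more involved.

```python
class PidRegistry:
    """Toy registry mapping a persistent identifier to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # Function 1: each identifier refers to exactly one entity.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def relocate(self, pid, new_url):
        # The entity moves; the identifier itself never changes.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        # Function 2: find the entity wherever it currently lives.
        return self._locations[pid]


registry = PidRegistry()
registry.register("pid:example-0001", "https://old.example.org/dataset/42")
registry.relocate("pid:example-0001", "https://new.example.org/records/42")
print(registry.resolve("pid:example-0001"))
```

A citation that records only "pid:example-0001" keeps working after the move, because readers look the identifier up rather than storing the location.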

A PID system adds a third function: a framework (e.g. software and associated repository) for discovering objects described by a PID and doing something with them (e.g. viewing the object, getting usage statistics, etc.) in a sustainable way. In Canada, a number of frameworks associated with particular PIDs have already received broad support from key communities of practice; however, these communities require ongoing support to remain sustainable and, ultimately, persistent.

PIDs are urgently needed in research to unambiguously locate, link, and cite research outputs (e.g. journal articles, data, and other research products such as samples, software, and formulas), as well as entities in the research process, such as authors, funders, and institutions. PIDs are now available for literature, data, samples, authors, and much more.

We believe that “persistent identifier” is a new name for a concept that has been a part of the scholarly environment for decades. In the past, publishers used identifiers such as ISBNs and ISSNs to distinguish unique textual objects. However, the proliferation of digitally available research and technical publications has created a need for machine-readable, interoperable PIDs. Machine-readable PIDs, such as DOIs and ORCID iDs, are valuable assets in enabling information sharing across systems.

For this white paper, we will focus on three main categories of identifiers, and provide our recommendations for each. Notably, we are recommending support for two Canadian member-focused consortia, while also looking toward the future, where scalable PID implementation plans at the national and provincial levels will be required. We recognize that in addition to the PID frameworks identified below, other infrastructure exists or is being developed and will require further consideration going forward. This includes systems like Crossref DOIs, as well as PIDs for other kinds of objects, for example: Research Activity iDs (RaIDs), which aggregate all the PIDs associated with a specific research activity or project (making it easier to discover related objects); Grant IDs (a special type of DOI), which provide a unique ID for specific funding awards; and Research Resource IDs (RRIDs), which provide an ID for resources used in research (e.g. cell lines, organisms).

1. Object Identifiers

The term object refers to a meaningful piece of data and is intentionally broad. Objects include books, articles, white papers, chapters, datasets, tables, figures, videos, etc. A single resource, such as an electronic book, may have multiple object identifiers associated with it, such as an identifier for the entire book, identifiers for each chapter, and identifiers for individual figures within chapters.

Digital Object Identifier (DOI)

A DOI is a digital identifier of an object, and can be assigned to any object, whether physical or digital. The DOI system [1] is managed by the International DOI Foundation (IDF), which provides oversight to DOI registration agencies and maintains the DOI resolver and APIs.

DOIs are unique, permanent alphanumeric strings assigned to specific objects; once assigned, a DOI does not change. DOIs are the most common type of identifier for digital objects, particularly for scholarly, research, and technical publications. Different versions of a work may have different DOIs, but these should be brought together on the landing page for the object.
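
To illustrate how a DOI string relates to the DOI resolver, the sketch below splits a DOI into its prefix and suffix and builds the corresponding https://doi.org/ link. It relies on the general DOI syntax (a prefix beginning with "10.", a slash, then a suffix), which this paper does not spell out; the function names and the sample DOI are hypothetical, and the sample is not a registered identifier.

```python
from urllib.parse import quote

# Leading forms commonly seen in front of a bare DOI.
_LEADS = ("https://doi.org/", "http://doi.org/", "doi:")


def parse_doi(doi):
    """Split a DOI into (prefix, suffix), accepting a few common written forms."""
    doi = doi.strip()
    for lead in _LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and len(prefix) > 3 and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Build the resolver link for a DOI, percent-encoding unsafe characters."""
    prefix, suffix = parse_doi(doi)
    return "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")


# Hypothetical DOI, for illustration only.
print(resolver_url("doi:10.1234/example-dataset.v2"))
```

The prefix identifies the registrant that minted the DOI, while the suffix is chosen by that registrant, which is why a single resolver can serve DOIs from many registration agencies.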

DataCite is a global non-profit organization that develops and supports tools and methods that make data and scholarly content more accessible, useful, and citable. On January 1, 2020, the Canadian Research Knowledge Network (CRKN) and the Canadian Association of Research Libraries (CARL) Portage Network jointly formed the DataCite Canada consortium [2]. The DataCite Canada consortium currently has over 50 members, ranging from small science publishers like CybelePress to premier U15 research universities like the University of British Columbia. There are currently more than 550,000 DOIs minted with DataCite in Canada; over 128,000 of these were minted in 2019 alone. In addition to providing consortial membership, DataCite Canada also provides a robust community of practice and governance for DOIs in Canada.

We recommend that NDRIO provides sustainable yearly funding for the DataCite Canada consortium and further promotes the adoption of DOIs for data sharing in Canada:

  • Ensure integration of DOIs in all NDRIO-managed infrastructure
  • Require new research software projects funded by NDRIO to support DOIs

2. Researcher Identifiers

Researcher Identifiers, also known as Author Identifiers or Scholar Identifiers, are alphanumeric strings that establish a unique identity for a given author or creator. Researcher IDs are becoming more critical in the scholarly-communication system. Individual or contributor identifiers encompass researchers, authors, scientists, artists, musicians, etc. These identifiers establish a profile for a contributor to a work that disambiguates that contributor from others. Unique identifiers enable contributors with the same or similar names to accurately track citations of their own research.

Open Researcher and Contributor Identifier (ORCID iD)

An ORCID iD [3] is a PID for researchers that records professional activities and disambiguates one researcher from another. An ORCID record connects researchers with their contributions and affiliations over time, despite name changes or different name formats (e.g. John Q. Smith, J. Smith, John Smith, JQ Smith, etc.). It can be connected in some way to most other creator profiles and, in our opinion, is the most interoperable researcher PID.
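
Part of what makes ORCID iDs dependable as machine-readable identifiers is that the final character is a check digit, so many typing errors can be caught before a record is linked to the wrong person. The sketch below is our own illustration based on ORCID's published identifier structure (16 characters, with an ISO 7064 MOD 11-2 check character); that detail is not drawn from this paper, and the function names are hypothetical. The two sample iDs are the ones ORCID uses in its public documentation.

```python
import re


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the iD has the right shape and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: check digit does not match
```

A check of this kind only confirms that the string is well formed; confirming that an iD belongs to a particular researcher still requires the ORCID registry itself.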

ORCID iDs have been gaining traction internationally and are being integrated into institutions and systems across the world. According to the ORCID landing page, at the time of writing this paper, there were 10 million ORCIDs assigned to researchers around the globe, at least 127,000 of which are associated with students, staff, and scholars at Canadian institutions. ORCID supports principles of the Canadian government’s Roadmap for Open Science, and can ultimately enhance the global impact of Canadian work as it becomes more visible via ORCID.

ORCID partners with local organizations to form regional consortia. In Canada, the ORCID-CA consortium [4] was established in 2016 by the ORCID-CA implementation group with twenty-seven members. Its goal is to support Canadian scholars and institutions in the adoption of ORCID iDs across the Canadian research ecosystem. ORCID-CA is sustained by institutional membership, which includes thirty-eight member organizations as of November 2020 across university, government, research data management, and medical sectors. ORCID-CA primarily supports the Canadian ORCID community of practice, with an emphasis on communications and technical integrations with local systems. Integrations provide direct value to researchers and institutions by saving administrative time and facilitating the flow of information about scholarly activities tied to specific scholars into and out of Canadian and global databases. To date, ORCID-CA members have completed 20 system or platform integrations, with many more to come.

We recommend that NDRIO provides sustainable yearly funding for the ORCID Canada consortium and further promotes the adoption of ORCIDs for data sharing in Canada:

  • Ensure integration of ORCIDs in all NDRIO-managed infrastructure
  • Require new research software projects funded by NDRIO to support ORCID integrations
  • Encourage the adoption of ORCID iDs in grant application systems

We recommend that NDRIO facilitate the adoption of, and support for, additional PIDs that help associate the researcher with other important elements of the research endeavour, such as Projects (RaID) and funding awards (Grant ID), thereby simplifying the discovery and impact of Canada’s research outputs.

3. Organization Identifiers

An organization identifier is a (typically alphanumeric) string that establishes a unique identity for a specific organizational entity. The goal of organization identifiers is to enable clear, long-term linking between the organizations supporting creators and the creation of objects. For example, an organization identifier can be used to uniquely identify a researcher’s institutional affiliation or the funder of a research project. Organization identifiers cover research institutions, funders, corporations, government agencies, etc. Whereas object and researcher identifiers are more solidly established and adopted, organization identifiers are still in the development stage.

Unambiguous organizational affiliations are important to a variety of stakeholders, including academic administrators, funders, publishers, repository managers, software developers, rights agencies, and individual researchers.

Research Organization Registry (ROR) Identifier

ROR IDs [5] are globally unique, stable, discoverable, and resolvable identifiers for research organizations. Launched in 2019, ROR is a community-led project originating from the OrgID initiative, in which 17 different organizations (representing publishers, libraries, platform providers, and metadata services) worked together to define a vision for a community-led registry of organization identifiers.

ROR maps its IDs to other identifiers for the same organization, such as GRID, Wikidata, ISNI, and Crossref’s Funder ID. This kind of interoperability and ability to link and crosswalk identifiers is central to ROR’s aims.
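
The crosswalk idea can be sketched as a simple lookup table: each organization record carries its ROR ID together with the other identifiers known for it, and a reverse index lets a system that only holds, say, an ISNI find the matching ROR ID. The record layout, the function name, and every identifier value below are made-up placeholders for illustration; they do not reflect ROR's actual data model or any real organization.

```python
# Hypothetical records; all identifier values are placeholders, not real IDs.
records = [
    {
        "ror": "https://ror.org/EXAMPLE01",
        "name": "Example University",
        "external_ids": {"GRID": "grid.example.1", "ISNI": "0000 0000 0000 0001"},
    },
    {
        "ror": "https://ror.org/EXAMPLE02",
        "name": "Example Research Funder",
        "external_ids": {"Wikidata": "Q-EXAMPLE-2", "FundRef": "funder-example-2"},
    },
]


def build_crosswalk(records):
    """Index every (scheme, value) pair to the ROR ID of the same organization."""
    index = {}
    for record in records:
        for scheme, value in record["external_ids"].items():
            index[(scheme, value)] = record["ror"]
    return index


crosswalk = build_crosswalk(records)
print(crosswalk[("ISNI", "0000 0000 0000 0001")])
print(crosswalk.get(("GRID", "grid.unknown")))  # None: no mapping recorded
```

Keeping the mapping inside the registry means that each participating system needs to store only one organization identifier and can still interoperate with systems that use another.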

We recommend that NDRIO encourages Canadian institutions to consistently use RORs when working within the PID frameworks (such as DOIs and ORCIDs) and encourages the adoption of RORs in grant application systems.

Summary:

Research is increasingly international and multidisciplinary, resulting in a very complex scholarly environment. Unique identifiers facilitate discovery and reuse of content, global interoperability, and a better understanding of the impact and value of scholarship. PIDs are the building blocks of the FAIR Principles [6]. They reduce time and administrative burden by enabling data entered into one system to be automatically reused in the context of other systems, supporting more streamlined and automated processes for researchers, funders, vendors, and institutions. Additionally, PIDs facilitate the transfer of information across organizations and establish links across systems institutionally, nationally, and internationally. Used together, PIDs represent a sophisticated digital infrastructure of interconnected people, organizations, and resources that enables the community to innovate in new ways. Clearly, there are many advantages to adopting sustainable solutions for unique identifiers for Canadian research entities and outputs.

Our recommendations:

Given the many potential benefits and implications of using PIDs, we recommend that NDRIO supports the Canadian PID consortia and services that adhere to best practices and the principles of trust and good governance. The five succinct recommendations below are designed to support the development of a rich national framework for research information in Canada, including improved management, discovery, and interoperability of research data.

  1. We recommend that NDRIO assumes a leadership role in this area, commits to the adoption of best-practice PIDs, and ensures that the solutions adopted are national in scale, not-for-profit, and incorporate appropriate community governance.
  2. We recommend that NDRIO provides sustainable yearly funding for the DataCite Canada Consortium and further promotes the adoption of DOIs for data sharing in Canada by:
    • Ensuring integration of DOIs in all NDRIO-managed infrastructure
    • Requiring new research software projects funded by NDRIO to support DOIs
  3. We recommend that NDRIO provides sustainable yearly funding for the ORCID Canada Consortium and further promotes the adoption of ORCIDs for data sharing in Canada by:
    • Ensuring integration of ORCIDs in all NDRIO-managed infrastructure
    • Requiring new research software projects funded by NDRIO to support ORCID integrations
    • Encouraging the adoption of ORCID iDs in grant application systems
  4. We recommend that NDRIO facilitate the adoption of, and support for, additional PIDs that help associate the researcher with other important elements of the research endeavour, such as Projects (RaID) and funding awards (Grant ID), thereby simplifying the discovery and impact of Canada’s research outputs.
  5. We recommend that NDRIO encourages Canadian institutions to consistently use RORs when working within the PID frameworks (such as DOIs and ORCIDs) and supports the adoption of RORs in grant application systems.

1 DOI System – https://www.doi.org/
2 DataCite Canada Consortium – https://www.crkn-rcdr.ca/en/datacite-canada-consortium
3 ORCID – https://orcid.org/about
4 ORCID-CA: The ORCID Consortium in Canada – https://orcid-ca.org/home
5 Research Organization Registry (ROR) – https://ror.org/about/
6 FAIR principles – https://www.go-fair.org/fair-principles/