Karen Payne <email@example.com>, International Science Council, World Data System
David Castle <firstname.lastname@example.org>, University of Victoria
Mark Leggott <email@example.com>, Research Data Canada
In this paper, we propose a vision for Canada’s role in building a global digital research infrastructure (DRI). We begin by advocating for a principles- and policy-based approach that will thoughtfully guide DRI investments in Canada, drawing on the research community’s significant expertise and willingness to engage. We then recommend constructive engagement with leading international DRI organizations and communities of researchers to ensure interoperability. Our approach has two major motivations. The first is to ensure that Canadian researchers are equipped to excel in research; it is widely accepted that international collaboration is both a pathway to, and a hallmark of, research excellence.1 The second motivation is impact. Global climate circulation models, viruses that do not respect borders, and complex economic systems built with sovereign partners represent just a fraction of the ledger of scientific inquiry that necessitates international cooperation. What is good for scientists internationally is also good for scientists domestically, and vice versa. This was a key finding of the Office of the Chief Science Advisor of Canada in its Roadmap for Open Science: “The Chief Science Advisor should monitor the dynamic international context and make recommendations to ensure that the Open Science strategy for federally supported intramural and extramural science continues to keep pace with international developments.”2 Equally important, strengthening scientific institutions is a key component of securing a fair and just world order.
Ideal Future DRI State in Canada
The ideal future state of DRI in Canada and abroad consists of highly trained personnel easily accessing, utilizing, contributing to, and managing a global set of interoperable scientific resources, or a Global Open Research Commons (GORC). It is a vision of seamless open access by all to research content including data, publications, visualizations, software and computing resources, metadata, vocabulary, identification and other services. Envisioned in this way, a GORC enables research excellence and raises the potential for research to have global relevance and socio-economic impact.
NDRIO is in a position to drive Canada toward adopting the vision of a GORC. The path forward must be paved with good governance. We recommend that NDRIO develop and publicly state an explicit commitment to a principles- and policy-based approach to DRI investments, and that it provide critical support to research communities by fostering consensus about strategy, priorities and standards. Over the last two decades, this principled approach has been enormously successful as institutions mandated open access to data and publications resulting from publicly funded research. That phase of mandated access is now giving way to a phase of mandated interoperability and sustainability, as laid out in the FAIR3, CARE4 and TRUST5 principles, which articulate values and desirable characteristics that ensure the responsible inclusion and reusability of scientific assets. Looking forward, we anticipate that this phase will shape funding requirements to include evidence of a commitment to best practices in global DRI, made manifest by nationally accessible and appropriate use of enabling components including, but not limited to, global persistent identifiers, well-formed ontologies and machine-actionable metadata.
Call to Action and Key Considerations
Our call to action is for NDRIO to adopt organizational policies and principles for decision making that align with international best practices. Further, as NDRIO invests in national research infrastructure, international interoperability must be adopted as a precondition for research excellence and impact.
We raise three key considerations that we believe are germane to securing our vision of the ideal state of DRI in Canada, and for NDRIO to contemplate as its organizational policies and principles evolve.
1. Differentiate NDRIO’s Role in DRI From Commercial Providers
The only model for a loosely organized international collaboration creating easy-to-use global infrastructure in a relatively short time span is the internet. Global DRI relies heavily on the internet and looks to technology companies for guidance on how to build a web of value-added services on top of that infrastructure. The research community, however, faces a unique set of challenges that set it apart from technology companies and make their success stories difficult to emulate. The foundation of the internet, for instance, was built on a coherent set of practices and norms, beginning with very few players.
Corporations that built services on top of the internet are top-down systems with the ability to focus on a single use case, mandate standards and implementation choices, and measure success against a clear metric: profit. The research landscape is decidedly different. It spans a wide range of stakeholders and domains, representing education, internet and critical infrastructure communities, each with its own abilities, goals, traditions and needs. Researchers dedicate themselves to building solutions for their own needs, with uncertain funding, a shifting landscape of technological innovation, and no central authority to mandate standards. There are multiple considerations regarding commercial entities in DRI, including but not limited to:
a. Vendor Lock-In – There are few commercial cloud providers capable of hosting big data sets and providing advanced research computing (ARC) resources, and their platforms lack interoperability with one another. The risk of vendor lock-in is real, so pathways for easy migration and information exchange must be negotiated up front.
b. Interoperability – Lack of interoperability can arise in multiple areas of DRI. For example, DOIs, one of the most common persistent identifier (PID) services used to identify publications and other research content, rely on the Handle System to resolve the PID (i.e., to match the identifier to the resource). Other PID systems, such as ORCID for identifying researchers, use a different resolver.6
c. Archive Capture – Commercial publishers increasingly serve as gatekeepers in research by offering repository services for a growing number of scholarly work types.7 Consuming open access resources and capturing scientific artefacts without adhering to best practices will reduce interoperability and reusability; for example, it can prevent virtual notebooks from being re-run in the future.
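To make the interoperability concern in (b) concrete, consider what a research tool must do when identifiers from different PID systems arrive in the same workflow: recognize the scheme, validate the identifier, and route it to the right resolver. The following is a minimal, hypothetical sketch of that logic, not a description of any existing NDRIO, RDA or vendor service; the function names, regular expressions and two-entry resolver table are our own illustrative assumptions. It relies only on two public facts: DOIs are dereferenced through the doi.org proxy of the Handle System, and ORCID iDs resolve at orcid.org and end in an ISO 7064 MOD 11-2 check character.

```python
import re

# Hypothetical resolver table: each PID scheme needs its own entry because
# the schemes do not share a common resolution service.
RESOLVERS = {
    "doi": "https://doi.org/",      # public proxy for the Handle System
    "orcid": "https://orcid.org/",  # ORCID's own resolver
}

# Simplified syntax checks (illustrative assumptions, not full specifications).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def to_resolver_url(pid: str) -> str:
    """Route a bare identifier to the resolver for its scheme."""
    pid = pid.strip()
    if DOI_PATTERN.match(pid):
        return RESOLVERS["doi"] + pid
    if ORCID_PATTERN.match(pid) and orcid_checksum_ok(pid):
        return RESOLVERS["orcid"] + pid
    raise ValueError(f"unrecognized or invalid PID: {pid!r}")


if __name__ == "__main__":
    print(to_resolver_url("10.1000/182"))
    print(to_resolver_url("0000-0002-1825-0097"))
```

Every additional PID scheme means another pattern, another validation rule and another resolver entry in a table like this one, replicated in each tool that consumes research metadata. Shared conventions for identifier resolution would remove much of that duplicated effort.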
2. Meet Researchers Where They Are
Investments in DRI must take stock of the tools researchers already use. Fortunately, Canada has a rich history of engaging researchers to identify their DRI challenges.8,9 The national researcher survey that NDRIO is currently developing will be central to acting on this recommendation, and it provides an opportunity to find new ways to architect the DRI system so that it delivers contemporary and improved services to the research community. We anticipate the survey will reveal a dual challenge in the development of DRI: the components of the DRI system need to be “plug and play” interoperable services, yet we also need to ease the burden on researchers by offering platforms with a single point of entry to help them manage their workflows. To be clear, we do not promote a single monolithic platform for all researchers. Rather, we anticipate prioritizing development of the most commonly used services and platforms, directly linked to real-world use cases in demand by researchers.
Many researchers currently analyze data and run simulations using virtual research environments (VREs) or electronic notebooks. Some VREs, like the iReceptor gateway, have strong connections to tightly managed data assets.10 Going forward, data discovery services need to be reconfigured for notebooks and VREs so that searches across any data source can be conducted within the workspace. The Australian Research Data Commons is developing a platform architecture that lets researchers browse data sources and load them into a single working space accessible for processing by JupyterHub.11
To the extent possible, developers should reuse existing standards in new contexts as they enhance functionality in research platforms. The most advanced reusable components will likely arise from the four science domains that have, out of necessity, created systems to manage big data: earth observation, physics, ‘omics and astronomy. Digital infrastructure developers at CERN have provided good examples of this approach across several big data domains. The CERN Virtual Machine File System, which was devised to host several hundred million software analytics packages that are distributed to data stores in order to process high energy physics data in situ, has also been used to standardise and automate the transfer of biomedical data across federated Galaxy compute platforms.12 More recently, the ESCAPE (European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures) project is investigating how CERN workflows and protocols can be re-used to facilitate the interoperability and re-use of data from the International Virtual Observatory Alliance (IVOA). Finally, La Referencia has a collaborative agreement with CERN to use Zenodo as a temporary repository while countries develop their own infrastructure.13 We should promote the use of these standards and services among domains that are less well versed in digital infrastructure as they work to increase their capacity.
3. Support International Coordination Mechanisms
International coordination of research is difficult and requires constant attention. If Canada forges its own path without looking abroad, it will forgo opportunities to integrate DRI with global partners and make it more difficult for Canadian researchers to avail themselves of international services and infrastructure. Coordination mechanisms like working groups and federations ensure that skilled personnel and technical infrastructure are aligned, while laying the groundwork for Canadian research excellence, competitiveness and impact in the long term. Trusted actors can work together to determine principles for engagement that inform frameworks for development and serve as the basis for policies and laws. A compelling example is the update to the 1995 European Data Protection Directive, initiated by an opinion written by the European Data Protection Supervisor. The opinion later grew through a series of working parties to become the GDPR, the gold standard for data protection legislation globally and a model for Canada’s new privacy legislation.14
More recently, at the request of the European Commission, Canadians joined international colleagues on the RDA COVID-19 working group to create recommendations for data sharing across the clinical, ‘omics, epidemiological and social sciences, as well as guidance on community participation, Indigenous data sovereignty, legal and ethical concerns, and software development.15 This type of work is a central value proposition for RDA, and international coordination is explicitly addressed in RDA’s forthcoming strategic plan. This EU-RDA program can be a conduit for NDRIO to ensure federal funds flow internationally to support these efforts and enable Canadian researchers to engage.
Canada has a history of supporting research commons with international scope. The polar community is a compelling use case: Canada’s geography, its well-earned reputation for being able to work with any other country, its research expertise and its existing investments place Canadian researchers in a prime position to retain a leadership role in this area. Achieving our vision for an ideal future for DRI in Canada is increasingly within reach because international coordination groups can help Canada identify and break down required tasks, while simultaneously developing a community of practice alongside the identified infrastructure and standards. NDRIO can prioritize DRI investments in a GORC that will catalyze Canadian research excellence and potential for impact.
In conclusion, we reiterate that NDRIO’s commitment to a principled approach to building scientific infrastructure will best serve Canada and our international partners. No single community or country can address every consideration in the DRI landscape, making it incumbent upon NDRIO to coordinate with international scientific federations as they marshal their strengths to address global challenges.
- Wagner, C. 2008. The New Invisible College: Science for Development. Washington, DC: Brookings Institution Press.
- https://www.rdc-drc.ca/the-ireceptor-gateway-fair-open-data-interoperability-and-data-curation-promotes-rapid-response-to-covid-19/