FIT4RRI Co-Creation Experiments: Text and Data Mining (TDM) Experiment at The Open University

Maxie Gottschling

Co-creation experiments, FIT4RRI Project, Open Science, RRI

FIT4RRI Experiments

FIT4RRI is conducting four co-creation experiments to observe Responsible Research and Innovation (RRI) and Open Science in action. Our experiments are an exercise in engaging the quadruple helix actors (university, industry, policy makers, society) in the design and implementation of research projects.

Our intention is to understand how institutions need to change their organisational frameworks to embed RRI more effectively and provide value to the actors involved.

Each experiment focuses on a different topic: material science, text and data mining, energy, or photometry.

Experiment by The Open University

One of the FIT4RRI experiments is implemented by our partner institution The Open University (OU) and deals with Text and Data Mining (TDM) of Big Scholarly Data.

TDM is the process of extracting high-quality, meaningful information from text to answer previously unanswered questions. In the past, the OU was a partner in the OpenMinTeD project and investigated the machine accessibility of Hybrid Gold Open Access publications for text and data mining purposes. This resulted in the creation of a publisher connector, a tool providing an interoperability layer at the granularity of metadata and full text files over non-standardised publisher systems.

In the FIT4RRI experiment, the OU extends this work to closed access publications. As text and data mining performs better on large volumes of data, the goal is to take advantage of the UK copyright law exception. Under this exception, the aim is to offer lawful text miners a massive corpus of subscription-based scientific publications ready to text mine, helping to unlock unexplored knowledge. The OU has put together a Working Group (WG) called “eduTDM”. The WG consists of a variety of stakeholders, such as publishers, text and data miners, organisations that make recommendations and spread best practices, and industry experts. The purpose of this co-creation experiment is to investigate the technical and organisational challenges of creating a new model to support researchers in text mining content from multiple publishers.

Figure 1. Services needed for eduTDM implementation

So far, the eduTDM WG has met three times. The first meeting had an exploratory purpose, and the WG was asked to discuss:

  • the benefits of TDM for research,
  • awareness of the UK copyright exception,
  • whether UK-affiliated researchers should have TDM access to subscribed content,
  • identification of the technical and organisational challenges of establishing this idea.

Having a strong technical background, the OU decided that the second and third meetings would have a technical focus. A white paper, which is currently in draft, will describe the technical infrastructure in detail. The paper aims to assist users, who are currently individually responsible for maintaining their own systems and must adapt them to different publishers and the heterogeneous standards they expose. This approach is prone to manual error, is limited by the domain knowledge of individual researchers and does not automatically scale to accommodate changes or updates that publishers introduce in the future. eduTDM aims to provide a better alternative for machine access for TDM of content from multiple publishers.

In the eduTDM model, users can issue an API call to eduTDM to obtain all documents matching their need. This involves a workflow comprising the following sequence of steps:

  • User issues a “request for content” call to eduTDM.
  • On behalf of the user, eduTDM issues separate content request calls to each of the publishers or can satisfy the request by means of cached content.
  • Each publisher independently processes the incoming request and responds to eduTDM with the requested content.
  • eduTDM aggregates and harmonises the content from all the responses received.
  • eduTDM responds to the user with the aggregated content.
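
The workflow above can be illustrated with a minimal Python sketch. It is not the eduTDM implementation, whose details are left to the draft white paper. The class name, the connector functions, the record fields and the sample identifiers are all hypothetical and chosen only to show the request, fan-out, caching, harmonisation and response steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical harmonised record; the field names are assumptions."""
    publisher: str
    doc_id: str
    title: str
    full_text: str


# A connector takes a query and returns one publisher's raw records.
# Real publisher APIs differ; this signature is assumed for illustration.
Connector = Callable[[str], List[dict]]


class EduTDMSketch:
    """Toy aggregator mirroring the five workflow steps."""

    def __init__(self, connectors: Dict[str, Connector]) -> None:
        self.connectors = connectors
        self.cache: Dict[Tuple[str, str], List[dict]] = {}

    def _fetch(self, publisher: str, query: str) -> List[dict]:
        # Step 2: call the publisher, or reuse cached content if available.
        key = (publisher, query)
        if key not in self.cache:
            # Step 3 happens on the publisher side, inside the connector.
            self.cache[key] = self.connectors[publisher](query)
        return self.cache[key]

    @staticmethod
    def _harmonise(publisher: str, raw: dict) -> Document:
        # Step 4: map heterogeneous field names onto one schema.
        return Document(
            publisher=publisher,
            doc_id=str(raw.get("id") or raw.get("doi") or ""),
            title=(raw.get("title") or raw.get("articleTitle") or "").strip(),
            full_text=raw.get("text") or raw.get("body") or "",
        )

    def request_content(self, query: str) -> List[Document]:
        # Step 1: the user's single "request for content" call.
        documents: List[Document] = []
        for publisher in self.connectors:
            for raw in self._fetch(publisher, query):
                documents.append(self._harmonise(publisher, raw))
        # Step 5: respond with the aggregated content.
        return documents


# Two made-up publishers that expose different, non-standardised formats.
def publisher_a(query: str) -> List[dict]:
    return [{"id": "a-1", "title": " Mining scholarly data ",
             "text": f"Text about {query}."}]


def publisher_b(query: str) -> List[dict]:
    return [{"doi": "10.0000/b.1", "articleTitle": "TDM at scale",
             "body": f"Body on {query}."}]


service = EduTDMSketch({"publisher_a": publisher_a,
                        "publisher_b": publisher_b})
results = service.request_content("open science")
for doc in results:
    print(doc.publisher, doc.doc_id, doc.title)
```

In this sketch the user writes one call and receives records in a single schema, while the per-publisher differences stay inside the connectors and the harmonisation step. A change on a publisher's side would then be handled once, centrally, rather than by each researcher.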

Table 1. Direct comparisons with existing services

The OU co-creation experiment revolves mainly around the Open Access and Open Science RRI pillars. It also applies the Governance pillar, more specifically organisational change and governmental, funder and institutional policies.


By Nancy Pontika, Petr Knoth and Bikash Gyawali