
Text and Data Mining (TDM)

Publishing and Text and Data Mining

Publishers are aware of the growing interest in text and data mining (TDM), and of the different methods used to perform it.

Publishers usually have methods in place to detect bots and other forms of TDM, so it is important to know what restrictions a given publisher places on our usage. Stepping over the boundaries we have agreed to can have a large impact on everyone's ability to use a resource.

The following is a list of publishers that allow TDM. Restrictions are noted, along with links for getting started.

Contact Ask a librarian for further assistance. 

Database/Vendor

Details

More information
American Chemical Society

No API available at this time. 

 
American Physical Society

Researchers may request the data set for use in research about networks and the social aspects of science.

APS Data Sets for Research

arXiv

arXiv provides access to metadata through several APIs, and to other data through a number of other avenues.

arXiv Bulk Access Overview
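As an illustration of the metadata route, the following is a minimal Python sketch, not an official client. It assumes the public arXiv query endpoint at `http://export.arxiv.org/api/query`, which returns an Atom feed. The function names `build_query_url` and `parse_feed` are hypothetical helpers introduced here for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of the arXiv query API (returns an Atom feed).
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(search_query, start=0, max_results=10):
    """Build a query URL for the arXiv API from a search expression."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract id, title and summary from each entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id", default="").strip(),
            # Collapse the line breaks and repeated spaces found in feed text.
            "title": " ".join(entry.findtext(ATOM + "title", default="").split()),
            "summary": " ".join(entry.findtext(ATOM + "summary", default="").split()),
        })
    return entries
```

To use it, fetch the URL from `build_query_url("all:electron")` with `urllib.request.urlopen` and pass the response body to `parse_feed`. Pace repeated requests and check arXiv's API terms before running anything at scale.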

Clarivate

May retrieve reasonable amounts of content required for your own work and for CSIRO’s internal business purposes.

The database does not provide access to full-text metadata via an API. However, the search results metadata contains the DOI which may be useful for text mining through CrossRef. 

Getting started:

  • Register for an account in the Clarivate API portal. 
  • The WoS Article Match Retrieval API and WoS Lite API are available for use.
  • To use the WoS Expanded API, email askalibrarian@csiro.au with your personal WoS account and details of the TDM project. 

Clarivate Web of Knowledge Text Mining Example

Clarivate Developer Portal
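The DOI route mentioned for Web of Science search results can be sketched as follows. This is a minimal illustration, not a supported workflow. It assumes the Crossref REST API endpoint `https://api.crossref.org/works/{doi}` and that the JSON response carries the work under a `message` key with `DOI` and `title` fields. The helper names and the `mailto` parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"

_DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalise_doi(raw):
    """Strip resolver prefixes and lower-case a DOI (DOIs are case-insensitive)."""
    doi = raw.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def crossref_url(doi):
    """Build the Crossref lookup URL for a DOI taken from search results."""
    return CROSSREF_WORKS + urllib.parse.quote(normalise_doi(doi), safe="/")


def summarise(record):
    """Pull the DOI and first title out of a Crossref JSON response."""
    message = record.get("message", {})
    titles = message.get("title") or [""]
    return {"doi": message.get("DOI", ""), "title": titles[0]}


def fetch_record(doi, mailto):
    """Fetch one record; the contact address identifies the script to Crossref."""
    request = urllib.request.Request(
        crossref_url(doi),
        headers={"User-Agent": f"tdm-sketch (mailto:{mailto})"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `summarise(fetch_record(doi, "you@example.org"))` would return the DOI and title for one search result. Any access to full text found this way remains subject to the publisher's own terms.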

 

CORE

CORE (COnnecting REpositories) provides access to a large database of full-text items from repositories and open access journals. CORE data can be downloaded as a bulk dataset. CORE also provides access to data through the CORE API.

CORE Dataset
CORE Dataset documentation
CORE API

Elsevier: ScienceDirect/Scopus

For non-commercial use only. API keys may not be shared. See Elsevier Provisions for Text and Data Mining. By default, the ScienceDirect API returns article metadata only. To add full-text data, email askalibrarian@csiro.au with your personal Elsevier account details.

 

Getting started:

Elsevier Research Products APIs

Text Mining Example Elsevier

IEEE

For non-commercial research purposes. Only CSIRO staff are permitted to access search results and click-through to full-text articles.

The database does not provide access to full-text metadata via an API. However, the search results metadata contains fields such as the DOI which may be useful for text mining through CrossRef. 

Getting started:

  • Sign in or register for an IEEE account in order to obtain an API key.  A summary of the TDM project is required before an API key is issued. 

IEEE Available APIs 

 Text Mining Example IEEE

Public Library of Science (PLOS)

PLOS articles may be mined, reused, and shared by anyone, anywhere, for any purpose. The entire PLOS text corpus is available for download. PLOS also provides access to an API for non-bulk downloading of articles.

PLOS Text and Data Mining overview

PubMed

The National Library of Medicine provides access to several large datasets of journal articles and other scientific publications.

License terms vary for each dataset and should be checked before downloading.

PMC Article Datasets

Springer Nature

Content and TDM materials may only be stored on an internal server for the duration of a TDM project.  TDM output is for internal personal use only and may not be used to create derivative products.

Getting started:

Springer Nature Developer portal
Text and data mining at Springer Nature

 Text Mining Example Springer

Taylor and Francis

No API available at this time.  Publisher will advise when available. 

Wiley

CSIRO does not have a separate TDM agreement with Wiley at this time.

For non-commercial, scholarly research related to specific projects. TDM may not be used for direct or indirect commercial purposes. Wiley Text and Data Mining License v1.1.

Search for relevant articles on WileyOnline, then use the Crossref API to access full-text abstracts and PDFs. 

Getting started:

  • Use your ORCID iD to obtain an API token.
  • Identify relevant articles on WileyOnline and obtain their DOIs.
  • Use the Crossref API as an intermediary to access Wiley full-text abstracts and PDFs. 

Wiley Text and Data Mining 
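The last step of the Wiley workflow can be sketched as follows. This is a hedged illustration only. It assumes that the Crossref metadata for a DOI includes a `link` list whose entries have `URL`, `content-type` and `intended-application` fields, and that full-text links meant for mining are marked `text-mining`. The function name `text_mining_links` is hypothetical. How the API token is sent with the download request is not covered here, so follow Wiley's own instructions for that part.

```python
def text_mining_links(crossref_message):
    """Return the full-text links that a Crossref record marks for text mining.

    `crossref_message` is the dictionary found under the "message" key of a
    Crossref works response (an assumption about the response layout).
    """
    selected = []
    for link in crossref_message.get("link", []):
        if link.get("intended-application") != "text-mining":
            continue  # skip links meant for other uses
        if "URL" not in link:
            continue  # nothing to download
        selected.append({
            "url": link["URL"],
            "content_type": link.get("content-type", "unspecified"),
        })
    return selected


def pdf_links(crossref_message):
    """Narrow the text-mining links down to PDF versions only."""
    return [
        link["url"]
        for link in text_mining_links(crossref_message)
        if link["content_type"] == "application/pdf"
    ]
```

A record with no `link` entries yields an empty list, which usually means the publisher has not deposited full-text links for that DOI.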

 

 

Text Mining Examples: