Publishers are aware of the growing interest in text and data mining (TDM) and of the different methods used to perform it.
Publishers usually have methods in place to track bots and other TDM activity, so it is important to be aware of any restrictions a given publisher places on our usage. Stepping over the boundaries we have agreed to can have a large impact on everyone's ability to use a resource.
The following is a list of publishers that allow TDM. Restrictions are noted, along with links for getting started.
Contact Ask a librarian for further assistance.
Database | Details | More information |
---|---|---|
American Chemical Society | No API available at this time. | |
American Physical Society | Researchers may request the data set for use in research about networks and the social aspects of science. | |
arXiv | arXiv provides access to metadata through several APIs and to other data through several different avenues. | |
Clarivate | You may retrieve reasonable amounts of content required for your own work and for CSIRO’s internal business purposes. The database does not provide access to full-text metadata via an API. However, the search results metadata contains the DOI, which may be useful for text mining through CrossRef. Getting started: | Clarivate Web of Knowledge Text Mining Example |
CORE | CORE (COnnecting REpositories) provides access to a large database of full-text items from repositories and open access journals. CORE data can be downloaded as a bulk dataset. CORE also provides access to data through the CORE API. | |
Elsevier: ScienceDirect | For non-commercial use only. API keys may not be shared. See the Elsevier Provisions for Text and Data Mining. By default the ScienceDirect API returns article metadata only. To add full-text data, email askalibrarian@csiro.au, including your personal Elsevier account details. Getting started: | |
IEEE | For non-commercial research purposes. Only CSIRO staff are permitted to access search results and click through to full-text articles. The database does not provide access to full-text metadata via an API. However, the search results metadata contains fields such as the DOI, which may be useful for text mining through CrossRef. Getting started: | |
Public Library of Science (PLOS) | PLOS articles may be mined, reused, and shared by anyone, anywhere, for any purpose. The entire PLOS text corpus is available for download. PLOS also provides access to an API for non-bulk downloading of articles. | |
PubMed | The National Library of Medicine provides access to several large datasets of journal articles and other scientific publications. | |
Springer | Content and TDM materials may only be stored on an internal server for the duration of a TDM project. TDM output is for internal personal use only and may not be used to create derivative products. Getting started: | Springer Nature Developer portal |
Taylor and Francis | No API available at this time. Publisher will advise when available. | |
Wiley | CSIRO does not have a separate TDM agreement with Wiley at this time. For non-commercial, scholarly research related to specific projects. TDM may not be used for direct or indirect commercial purposes. See the Wiley Text and Data Mining License v1.1. Search for relevant articles using Wiley Online Library, then use the CrossRef API to access full-text abstracts and PDFs. Getting started: | |
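
Several entries above (Clarivate, IEEE, Wiley) suggest taking DOIs from search results and resolving them through CrossRef. The following is a minimal Python sketch of that step, not an official workflow. It assumes the public Crossref REST API at `api.crossref.org` and the `link` field of a work record, where entries intended for mining carry `intended-application` set to `text-mining`. The function names, the `mailto` contact address, and the one-second pause are illustrative assumptions; whether a returned link can actually be downloaded still depends on the publisher terms listed in the table.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of the public Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi, mailto=None):
    """Build the Crossref REST API URL for a single DOI.

    `mailto` is an optional contact address passed as a query
    parameter so the API operator can identify the requester.
    """
    url = CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")
    if mailto:
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    return url


def text_mining_links(work):
    """Return the URLs that a work record flags for text mining."""
    return [
        link["URL"]
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


def fetch_work(doi, mailto=None, pause_seconds=1.0):
    """Fetch one work record, pausing first to keep the request rate low.

    The pause length is an arbitrary illustrative value, not a limit
    stated by any publisher.
    """
    time.sleep(pause_seconds)
    with urllib.request.urlopen(crossref_work_url(doi, mailto), timeout=30) as resp:
        return json.load(resp)["message"]
```

A typical use would be to loop over the DOIs exported from a search, call `fetch_work(doi, mailto="you@example.org")` for each one, and pass the result to `text_mining_links` to see which full-text locations, if any, the publisher has registered. The address shown is a placeholder.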