When using a dataset for text and data mining, it is important to consider the limits on the use and reuse of the material you have accessed and downloaded.
While there is some discussion in government and legal circles about allowing exceptions in copyright law for training technologies such as Large Language Models, in most cases dataset owners place limits on the purposes for which their products can be used.
Text and data mining agreements may restrict the amount of material that can be retrieved, may require attribution in any outputs, or may forbid creating any derivative work based on the dataset or using the dataset, directly or indirectly, for any commercial activity. Such terms may, for example, disallow using the dataset to train a Large Language Model.
Consult any guidelines that outline such limits before undertaking text and data mining work, so that you know what you can and cannot do with the material you seek to access. More information on a number of resources is available on the Publisher Resources page.