Files in this item
This item is publicly available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)



- Name: README.txt
- Size: 1.82 KB
- Format: Text file
- Description: Readme
- MD5: dbea4cf9d8eba2dae318a74c1a9dc3f0
OAGS Title Generation Dataset
===============================
OAGS is a title generation dataset consisting of 34993700 abstracts
and titles from scientific articles. Texts were lowercased and
tokenized with the Stanford CoreNLP tokenizer. No other preprocessing
steps were applied in this release version. Dataset records
(samples) are stored as JSON lines in each text file.
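
Because each record is stored as one JSON document per line, the files
can be streamed without loading them fully into memory. The following
is a minimal sketch in Python, not part of the original release. The
function name read_jsonl and the local path OAGS/oags_val.txt are
assumptions for illustration, and the record field names are not
documented here, so the example only prints whatever keys it finds.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


if __name__ == "__main__":
    # Assumed local path; adjust to wherever OAGS.zip was extracted.
    for i, record in enumerate(read_jsonl("OAGS/oags_val.txt")):
        # Field names are not stated in the README, so inspect them first.
        print(sorted(record.keys()))
        if i >= 2:
            break
```

Reading line by line keeps memory use flat, which matters for the
larger training files.
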
The data is derived from the OAG data collection
(https://aminer.org/open-academic-graph), which was released
under the ODC-BY licence.
This data (OAGS Title Generation Dataset) is released under
the CC-BY licence (https://creativecommons.org/licenses/by/4.0/).
Download
--------
This dataset can be downloaded from the LINDAT/CLARIN repository:
http://hdl.handle.net/11234/1-3043
Publications
------------
If you use this dataset, please cite the following paper:
Çano, Erion and Bojar, Ondřej, 2019, "Efficiency Metrics for
Data-Driven Models: A Text Summarization Case Study", INLG 2019,
The 12th Inter . . .

- Name: OAGS.zip
- Size: 14.89 GB
- Format: application/zip
- Description: Data
- MD5: b3def7c79f11d2c109c48cc0a72b88ae
- OAGS
  - oags_train3.txt (1 GB)
  - oags_val.txt (14 MB)
  - oags_val-test_backup.txt (657 MB)
  - oags_train2.txt (1 GB)
  - oags_test.txt (14 MB)
  - oags_train_backup.txt (42 GB)
  - oags_train1.txt (557 MB)