Creative Commons - Attribution 4.0 International (CC BY 4.0)
OAGS Title Generation Dataset
=============================

OAGS is a title generation dataset consisting of 34993700 abstracts and titles from scientific articles. Texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file.

The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY licence. This data (OAGS Title Generation Dataset) is released under the CC-BY licence (https://creativecommons.org/licenses/by/4.0/).

Download
--------

This dataset can be downloaded from the LINDAT/CLARIN repository:
http://hdl.handle.net/11234/1-3043

Publications
------------

If using it, please cite the following paper:
Çano, Erion and Bojar, Ondřej, 2019, "Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study", INLG 2019, The 12th Inter . . .
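
Because the dataset records are stored as JSON lines, each text file can be read one record at a time without loading it whole. The following is a minimal Python sketch under that assumption only. The function names `read_jsonl` and `peek` are hypothetical helpers, not part of the release, and the README does not state the field names of a record, so the sketch makes no assumption about them.

```python
import json
from itertools import islice
from typing import Iterator, List


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def peek(path: str, n: int = 3) -> List[dict]:
    """Return the first n records, e.g. to inspect which fields are present."""
    return list(islice(read_jsonl(path), n))
```

Calling `peek` on one of the downloaded files and printing the keys of the returned records is a cheap way to find out the actual field names before processing the full dataset, which is large enough that streaming is preferable to reading a file into memory.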