Files in this item

License category:
Publicly Available

Licence: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name: README.txt
Size: 1.82 KB
Format: Text file
Description: Readme
MD5: dbea4cf9d8eba2dae318a74c1a9dc3f0

File preview:

OAGS Title Generation Dataset
=============================

OAGS is a title generation dataset consisting of 34993700 abstracts
and titles from scientific articles. Texts were lowercased and
tokenized with the Stanford CoreNLP tokenizer. No other preprocessing
steps were applied in this release version. Dataset records
(samples) are stored as JSON lines in each text file.
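
Because each record is a single JSON object on its own line, the files
can be streamed without loading them fully into memory. The following is
a minimal Python sketch, not part of the original release. It assumes the
archive was extracted to a local OAGS/ directory, and `iter_records` is
an illustrative name. The field names of the records are not listed in
this README, so the example prints the keys it finds instead of assuming
them.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    # Assumed location of the extracted archive; adjust to your setup.
    sample_path = Path("OAGS/oags_val.txt")
    if sample_path.exists():
        for i, record in enumerate(iter_records(sample_path)):
            # Inspect the field names rather than assuming them.
            print(sorted(record))
            if i >= 2:
                break
```

Reading lazily matters mostly for the larger training files, which are
too big to parse comfortably in one call.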

The data is derived from the OAG data collection
(https://aminer.org/open-academic-graph), which was released
under the ODC-BY licence.

This data (OAGS Title Generation Dataset) is released under
the CC-BY licence (https://creativecommons.org/licenses/by/4.0/).


Download
--------

This dataset can be downloaded from the LINDAT/CLARIN repository:
http://hdl.handle.net/11234/1-3043


Publications
------------

If you use this dataset, please cite the following paper:

Çano, Erion and Bojar, Ondřej, 2019, "Efficiency Metrics for 
Data-Driven Models: A Text Summarization Case Study", INLG 2019, 
The 12th Inter . . .
                                            
Name: OAGS.zip
Size: 14.89 GB
Format: application/zip
Description: Data
MD5: b3def7c79f11d2c109c48cc0a72b88ae

File preview:

  • OAGS
    • oags_train3.txt (1 GB)
    • oags_val.txt (14 MB)
    • oags_val-test_backup.txt (657 MB)
    • oags_train2.txt (1 GB)
    • oags_test.txt (14 MB)
    • oags_train_backup.txt (42 GB)
    • oags_train1.txt (557 MB)
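
With an archive of this size, it can be worth checking the download
against the MD5 value listed for OAGS.zip before extracting it. The
following is a minimal Python sketch under the assumption that the file
was saved as OAGS.zip in the current directory; `md5_of_file` is an
illustrative helper name, and the expected digest is copied from the
listing above.

```python
import hashlib
from pathlib import Path

# MD5 published for OAGS.zip in the file listing.
EXPECTED_MD5 = "b3def7c79f11d2c109c48cc0a72b88ae"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("OAGS.zip")  # assumed download location
    if archive.exists():
        actual = md5_of_file(archive)
        if actual == EXPECTED_MD5:
            print("checksum matches")
        else:
            print(f"checksum mismatch: {actual}")
```

MD5 here serves only to detect a corrupted or incomplete transfer; it is
not a guarantee of authenticity.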