Files in this item

License category:
Publicly Available

License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: oagkx.zip
Size: 8.51 GB
Format: application/zip
Description: data
MD5: 8a6475ea0d5a38c7aff97a0f5260df20
File preview:
  • oagkx
    • part_11_0.txt (11 MB)
    • part_3_1.txt (1 GB)
    • part_0_1.txt (900 MB)
    • part_13_0.txt (69 MB)
    • part_10_0.txt (873 MB)
    • part_2_1.txt (1 GB)
    • part_12_0.txt (11 MB)
    • part_5_1.txt (877 MB)
    • part_1_1.txt (867 MB)
    • part_14_0.txt (1 GB)
    • part_7_1.txt (120 MB)
    • part_4_1.txt (1 GB)
    • part_9_1.txt (867 MB)
    • part_6_1.txt (1 GB)
    • part_8_1.txt (541 MB)
    • part_0_0.txt (752 MB)
    • part_3_0.txt (1 GB)
    • part_5_0.txt (1 GB)
    • part_2_0.txt (1 GB)
    • part_7_0.txt (1 GB)
    • part_4_0.txt (1 GB)
    • part_1_0.txt (1 GB)
    • part_9_0.txt (709 MB)
    • part_6_0.txt (789 MB)
    • part_8_0.txt (561 MB)
    • part_11_1.txt (9 MB)
    • part_13_1.txt (108 MB)
    • part_10_1.txt (58 MB)
    • part_3_2.txt (437 MB)
    • part_0_2.txt (770 MB)
    • part_5_2.txt (880 MB)
    • part_2_2.txt (345 MB)
    • part_12_1.txt (9 MB)
    • part_4_2.txt (568 MB)
    • part_1_2.txt (759 MB)
    • part_14_1.txt (1 GB)
    • part_7_2.txt (311 MB)

Name: README.txt
Size: 1.93 KB
Format: Text file
Description: readme
MD5: a286e714b793d3a196864122183a7fa1
File preview:

OAGKX Keyword Generation Dataset
================================

OAGKX is a keyword extraction/generation dataset consisting
of 22674436 abstracts, titles and keyword strings from scientific
articles. The texts were lowercased and tokenized with the
Stanford CoreNLP tokenizer. No other preprocessing steps
were applied in this release. Dataset records
(samples) are stored as JSON lines, one record per line, in each text file.
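
Because each text file holds one JSON record per line, the files can be streamed without loading them fully into memory. The following is a minimal sketch, not part of the official release: the helper name iter_records and the field names "title", "abstract" and "keywords" in the sample are assumptions for illustration only, so inspect a real record to see the actual keys.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical two-record sample; the field names are assumed, not confirmed
# by the README.
sample = io.StringIO(
    '{"title": "a title", "abstract": "an abstract .", "keywords": "k1 , k2"}\n'
    "\n"
    '{"title": "another title", "abstract": "text .", "keywords": "k3"}\n'
)
records = list(iter_records(sample))
print(len(records), sorted(records[0].keys()))

# With the extracted archive, the same helper could be used as:
# with open("oagkx/part_0_0.txt", encoding="utf-8") as fh:
#     for record in iter_records(fh):
#         ...
```

Streaming line by line matters here because several of the part files are around 1 GB each.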

The data is derived from the OAG data collection 
(https://aminer.org/open-academic-graph), which was released 
under the ODC-BY license. 

This data (OAGKX Keyword Generation Dataset) is released under 
the CC-BY license (https://creativecommons.org/licenses/by/4.0/). 


Download
--------

This dataset can be downloaded from the LINDAT/CLARIN repository:
http://hdl.handle.net/11234/1-3062
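
The repository page lists an MD5 checksum for oagkx.zip (8a6475ea0d5a38c7aff97a0f5260df20), which can be used to check that the 8.51 GB download is complete. A minimal sketch, assuming the archive was saved locally as oagkx.zip; the helper name md5_of_file is hypothetical:

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small self-contained demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)

# For the real archive (local file name is an assumption):
# EXPECTED_MD5 = "8a6475ea0d5a38c7aff97a0f5260df20"
# if md5_of_file("oagkx.zip") != EXPECTED_MD5:
#     raise SystemExit("checksum mismatch: download may be corrupted")
```

Reading in chunks keeps memory use constant regardless of the archive size.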


Publications
------------

If you use this dataset, please cite the following paper:

Çano Erion, Bojar Ondřej. Keyphrase Generation: A Multi-Aspect Survey. FRUCT 2019,
Proceedings of th . . .