Files in this item

This item is publicly available and licensed under Creative Commons Attribution 4.0 International (CC BY 4.0); distribution requires attribution.
Name:        oagl.zip
Size:        7.28 GB
Format:      application/zip
Description: Data
MD5:         e2d6dfc1a6d7c76499e4c1c27ad86a89
Contents:
  • oagl
    • val.txt (829 kB)
    • test.txt (1 MB)
    • val-test_bck.txt (274 MB)
    • train_bck.txt (22 GB)
    • train.txt (5 MB)
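The listing publishes an MD5 checksum for oagl.zip. A minimal Python sketch for checking a downloaded copy against it, assuming the archive was saved locally as `oagl.zip` (that path and the helper names `md5_of_file` and `verify_archive` are hypothetical). It streams the file in 1 MiB chunks so the 7.28 GB archive never has to fit in memory:

```python
import hashlib
from pathlib import Path

# Checksum published for oagl.zip in the file listing.
EXPECTED_MD5 = "e2d6dfc1a6d7c76499e4c1c27ad86a89"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("oagl.zip")  # hypothetical local download location
    if archive.exists():
        print("OK" if verify_archive(archive) else "MD5 mismatch")
```

The same helper can check README.txt against its own listed checksum by passing that value as `expected`.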
Name:        README.txt
Size:        1.13 KB
Format:      Text file
Description: Readme
MD5:         324f45558e52c18d41aab080763a2fc0
OAGL Paper Length Dataset
=========================

OAGL is a paper length prediction dataset consisting of
17528680 records of scientific publication metadata such as
abstracts, titles, keywords, publication years, and venues.
The last field of each record is the length, in pages, of
the corresponding publication. Records (samples) are stored
in each text file as JSON lines, one record per line.
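Because each record sits on its own line as JSON, the files can be streamed one record at a time. The sketch below assumes each line is a JSON object (or array) whose final value is the page length, as described above; it relies on no field names, since those are not documented here. The helper names `iter_records` and `split_target` and the path `oagl/val.txt` are hypothetical:

```python
import json
from pathlib import Path


def iter_records(path):
    """Yield one parsed record per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def split_target(record):
    """Split a record into (features, page_length).

    Assumes the page length is the last field: the last key of a JSON
    object (json.loads preserves key order) or the last item of an array.
    """
    if isinstance(record, dict):
        *feature_keys, target_key = record.keys()
        features = {k: record[k] for k in feature_keys}
        return features, record[target_key]
    if isinstance(record, list):
        return record[:-1], record[-1]
    raise TypeError(f"unexpected record type: {type(record).__name__}")


if __name__ == "__main__":
    path = Path("oagl/val.txt")  # assumed location after unzipping oagl.zip
    if path.exists():
        # Print the feature names and target of the first three records.
        for i, rec in enumerate(iter_records(path)):
            features, length = split_target(rec)
            print(sorted(features), length)
            if i == 2:
                break
```

Streaming line by line matters mainly for the larger files such as train_bck.txt, which would not comfortably fit in memory if loaded whole.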

The data is derived from the OAG data collection
(https://aminer.org/open-academic-graph), which was released
under the ODC-BY license.

This data (OAGL Paper Length Dataset) is released under the
CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).


Download
--------

This dataset can be downloaded from:
http://hdl.handle.net/11234/1-3257


Publications
------------


Acknowledgements
----------------


Statistics of OAGL:
-------------------

Total samples:     	17528680 
Title tokens*	   	mean: 11.96 	std: 4.49 
Abstract tokens*	mean: 144.86 	st . . .