
 
dc.contributor.author Çano, Erion
dc.date.accessioned 2023-09-22T10:36:37Z
dc.date.available 2023-09-22T10:36:37Z
dc.date.issued 2023-09-19
dc.identifier.uri http://hdl.handle.net/11234/1-5214
dc.description AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled following the CoNLL-2003 shared task annotation scheme, explained at https://aclanthology.org/W03-0419.pdf, which uses the tags I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O. AlbNER data are released under the CC BY license (https://creativecommons.org/licenses/by/4.0/). If using the AlbNER corpus, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Recognition in Albanian. CoRR, abs/2309.08741, 2023. URL https://arxiv.org/abs/2309.08741.
dc.language.iso sqi
dc.publisher University of Vienna
dc.relation.isreferencedby https://arxiv.org/abs/2309.08741
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject named entity recognition
dc.subject under-resourced languages
dc.subject albanian language
dc.title AlbNER Named Entity Recognition in Albanian
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LRT + Open Submissions
contact.person Erion Çano erion.cano@univie.ac.at University of Vienna
size.info 900 entries
size.info 900 sentences
size.info 15822 tokens
files.size 51597
files.count 2


 Files in this item

 Download all files in this item (50.39 KB)
License category: Publicly Available

License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons, attribution required
Name: README.txt
Size: 1.07 KB
Format: Text file
Description: Documentation
MD5: 924c1618bb557042e41dddfc91c4a165

File preview:

AlbNER Named Entity Recognition in Albanian
===========================================

AlbNER is a Named Entity Recognition corpus of
Wikipedia sentences in Albanian, consisting of
900 records. The sentence tokens are manually
labeled complying with the CoNLL-2003 shared
task annotation scheme explained at
https://aclanthology.org/W03-0419.pdf that uses
I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC,
B-MISC and O tags. Of the 900 records, 500 should
be used for model training (file train.txt), 100
for model development (file dev.txt) and the
remaining 300 (file test.txt) for model testing.
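
The following Python sketch shows one way the tagged files could be read.
It is not part of the corpus distribution. It assumes the common
CoNLL-style layout (one token per line, the tag in the last
whitespace-separated column, a blank line between sentences), which this
record does not confirm. The function names and the sample sentences are
made up for illustration and are not taken from the corpus. Entity
grouping starts a new entity on any B- tag or on a change of entity type,
so it does not depend on whether B- marks every entity start.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs

# Tag set listed in the corpus description.
VALID_TAGS = {"O"} | {
    f"{prefix}-{kind}"
    for prefix in ("B", "I")
    for kind in ("ORG", "PER", "LOC", "MISC")
}

# Split sizes stated in the README (records per file).
EXPECTED_SPLIT_SIZES = {"train.txt": 500, "dev.txt": 100, "test.txt": 300}

# Invented sample, NOT taken from the corpus; the layout is an assumption.
SAMPLE = (
    "Tirana B-LOC\nështë O\nkryeqyteti O\ni O\nShqipërisë B-LOC\n. O\n"
    "\n"
    "Universiteti B-ORG\ni I-ORG\nTiranës I-ORG\n"
)


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        token, tag = parts[0], parts[-1]
        if tag not in VALID_TAGS:
            raise ValueError(f"unexpected tag {tag!r} for token {token!r}")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs found in one sentence."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype = None
    for token, tag in sentence:
        if tag == "O":
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [], None
            continue
        prefix, kind = tag.split("-", 1)
        if prefix == "B" or kind != etype:
            # A B- tag or a type change closes the open entity, if any.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [token], kind
        else:
            tokens.append(token)
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


if __name__ == "__main__":
    for sentence in read_conll(SAMPLE.splitlines()):
        print(extract_entities(sentence))
```

If the layout assumption holds, calling read_conll on the opened
train.txt, dev.txt and test.txt and comparing the number of returned
sentences with EXPECTED_SPLIT_SIZES gives a quick consistency check
against the split described above.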


License
-------

The AlbNER corpus data are released under the CC BY license
(https://creativecommons.org/licenses/by/4.0/).


Download
--------

The AlbNER corpus can be downloaded from:
http://hdl.handle.net/11234/1-5214


Publications
------------

If using AlbNER data, please cite the following paper:

Çano Erion. AlbNER: A Corpus for Named Entity Re . . .
                                            
Name: AlbNER.zip
Size: 49.32 KB
Format: application/zip
Description: Data
MD5: b006ca2a7dc12e3b7132ec3f20be4f92

File preview:
  • AlbNER
    • test.txt (45 kB)
    • dev.txt (14 kB)
    • train.txt (76 kB)
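
The MD5 values listed for README.txt and AlbNER.zip can be used to check that a download is intact. The sketch below is one possible way to do that with Python's standard hashlib module; the function names and the local file paths passed to them are hypothetical, and only the two checksums come from this record.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record, keyed by file name.
EXPECTED_MD5 = {
    "README.txt": "924c1618bb557042e41dddfc91c4a165",
    "AlbNER.zip": "b006ca2a7dc12e3b7132ec3f20be4f92",
}


def md5_of_file(path, chunk_size: int = 65536) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Compare a downloaded file with the checksum listed for its name."""
    expected = EXPECTED_MD5[Path(path).name]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the archive was saved.
    archive = Path("AlbNER.zip")
    if archive.exists():
        print("checksum ok" if verify_download(archive) else "checksum mismatch")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.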
