dc.contributor.author | Çano, Erion |
dc.date.accessioned | 2023-09-22T10:36:37Z |
dc.date.available | 2023-09-22T10:36:37Z |
dc.date.issued | 2023-09-19 |
dc.identifier.uri | http://hdl.handle.net/11234/1-5214 |
dc.description | AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled following the CoNLL-2003 shared task annotation scheme, explained at https://aclanthology.org/W03-0419.pdf, which uses the I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O tags. AlbNER data are released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using the AlbNER corpus, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Recognition in Albanian. CoRR, abs/2309.08741, 2023. URL https://arxiv.org/abs/2309.08741. |
dc.language.iso | sqi |
dc.publisher | University of Vienna |
dc.relation.isreferencedby | https://arxiv.org/abs/2309.08741 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
dc.subject | named entity recognition |
dc.subject | under-resourced languages |
dc.subject | albanian language |
dc.title | AlbNER Named Entity Recognition in Albanian |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LRT + Open Submissions |
contact.person | Erion Çano erion.cano@univie.ac.at University of Vienna |
size.info | 900 entries |
size.info | 900 sentences |
size.info | 15822 tokens |
files.size | 51597 |
files.count | 2 |
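The tag set named in dc.description (B-/I- prefixes over ORG, PER, LOC and MISC, plus O) can be turned into entity spans with a short routine. The following is a minimal sketch, not part of the record: the function name `extract_spans` is hypothetical, and the example tag sequence is made up for illustration rather than taken from the corpus. It accepts both conventions in which a span opens on a B- tag or on an I- tag with no span currently open.

```python
from typing import List, Tuple


def extract_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (entity_type, start, end_exclusive) spans from a tag sequence.

    Hypothetical helper: a span opens on B-X, or on I-X when no span is
    open, and closes on O, on any B- tag, or when the entity type changes.
    """
    spans = []
    start, current = None, None  # start index and type of the open span
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        # Close the open span if this tag cannot continue it.
        if current is not None and (tag == "O" or prefix == "B" or etype != current):
            spans.append((current, start, i))
            start, current = None, None
        # Open a new span on any non-O tag when nothing is open.
        if tag != "O" and current is None:
            start, current = i, etype
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Illustrative tag sequence (not corpus data).
example = ["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "O", "B-ORG", "I-ORG"]
print(extract_spans(example))
# [('PER', 0, 2), ('LOC', 3, 4), ('LOC', 4, 5), ('ORG', 6, 8)]
```

The B-LOC at index 4 splits two adjacent LOC entities, which is the case the B- prefix exists to mark.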
Files in this item
Download all files in this item (50.39 KB)
License category:
Licence: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Publicly Available
- Name: README.txt
- Size: 1.07 KB
- Format: Text file
- Description: Documentation
- MD5: 924c1618bb557042e41dddfc91c4a165
AlbNER Named Entity Recognition in Albanian
===========================================

AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled following the CoNLL-2003 shared task annotation scheme, explained at https://aclanthology.org/W03-0419.pdf, which uses the I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O tags. Of the 900 records, 500 should be used for model training (file train.txt), 100 for model development (file dev.txt) and the remaining 300 (file test.txt) for model testing.

License
-------
AlbNER corpus data are released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/).

Download
--------
The AlbNER corpus can be downloaded from: http://hdl.handle.net/11234/1-5214

Publications
------------
If using AlbNER data, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Re . . .
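The README assigns 500, 100 and 300 records to train.txt, dev.txt and test.txt. A quick way to check those counts after download is to parse each file into sentences. The sketch below rests on an assumption the record does not confirm: that the files use the usual CoNLL-style layout, with one token per line, whitespace-separated columns, the tag in the last column, and a blank line between sentences. The function name `read_conll` and the placeholder tokens are hypothetical.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs.

    ASSUMPTION: one token per line, tag in the last whitespace-separated
    column, blank line between sentences. Adjust if the files differ.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence, if one is open.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# In-memory demo with placeholder tokens (not corpus data).
sample = ["w1 I-PER", "w2 O", "", "w3 I-LOC"]
print(len(read_conll(sample)))  # 2
```

With the extracted files on disk, `len(read_conll(open("train.txt", encoding="utf-8")))` should give 500 if the layout assumption holds, and likewise 100 for dev.txt and 300 for test.txt.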
- Name: AlbNER.zip
- Size: 49.32 KB
- Format: application/zip
- Description: Data
- MD5: b006ca2a7dc12e3b7132ec3f20be4f92
- AlbNER
  - test.txt (45 kB)
  - dev.txt (14 kB)
  - train.txt (76 kB)
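The MD5 values listed for README.txt and AlbNER.zip make it possible to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `verify` are hypothetical, and the local paths are assumed to be wherever the files were saved.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as listed in this record.
EXPECTED_MD5 = {
    "README.txt": "924c1618bb557042e41dddfc91c4a165",
    "AlbNER.zip": "b006ca2a7dc12e3b7132ec3f20be4f92",
}


def verify(path: str, expected_md5: str) -> bool:
    """True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify("AlbNER.zip", EXPECTED_MD5["AlbNER.zip"])` returns True when the archive in the current directory matches the listed checksum.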