Please use the following text to cite this item:
Ševčíková, Magda; Žabokrtský, Zdeněk; Straková, Jana and Straka, Milan, 2014,
Czech Named Entity Corpus 1.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C.
| dc.contributor.author | Ševčíková, Magda |
| dc.contributor.author | Žabokrtský, Zdeněk |
| dc.contributor.author | Straková, Jana |
| dc.contributor.author | Straka, Milan |
| dc.date.accessioned | 2014-01-09T10:03:56Z |
| dc.date.available | 2014-01-09T10:03:56Z |
| dc.date.issued | 2014-01-09 |
| dc.description | Czech Named Entity Corpus 1.1 fixes several issues in Czech Named Entity Corpus 1.0: misannotated entities are corrected, all formats contain the same data, the tmt format is replaced by the treex format, and all formats are split into training, development, and testing portions. |
| dc.description.sponsorship | SVV 267 314 (Teoretické základy informatiky a výpočetní lingvistiky), LM2010013 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat), GPP406/12/P175 (Vybrané derivační vztahy pro automatické zpracování češtiny), PRVOUK (PRVOUK) |
| dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C |
| dc.language.iso | ces |
| dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
| dc.relation.replaces | http://hdl.handle.net/11858/00-097C-0000-0022-C73C-7 |
| dc.rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
| dc.rights.label | PUB |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
| dc.source.uri | http://ufal.mff.cuni.cz/cnec/ |
| dc.subject | named entity recognition |
| dc.subject | corpus |
| dc.title | Czech Named Entity Corpus 1.1 |
| dc.type | corpus |
| local.branding | LINDAT / CLARIAH-CZ |
| local.files.count | 1 |
| local.files.size | 10987946 |
| local.has.files | yes |
| local.language.name | Czech |
| local.size.info | 5868 sentences |
| local.sponsor | nationalFunds SVV 267 314 Univerzita Karlova v Praze (mimo GAUK) Teoretické základy informatiky a výpočetní lingvistiky |
| local.sponsor | nationalFunds LM2010013 Ministerstvo školství, mládeže a tělovýchovy České republiky LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat |
| local.sponsor | nationalFunds GPP406/12/P175 Grantová agentura České republiky Vybrané derivační vztahy pro automatické zpracování češtiny |
| local.sponsor | nationalFunds PRVOUK Univerzita Karlova v Praze (mimo GAUK) PRVOUK |
| metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | strakova@ufal.mff.cuni.cz |
| metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics in Prague |
| metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Jana |
| metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Straková |
| metashare.ResourceInfo#ContentInfo.description | Czech Named Entity Corpus 1.1 fixes several issues in Czech Named Entity Corpus 1.0: misannotated entities are corrected, all formats contain the same data, the tmt format is replaced by the treex format, and all formats are split into training, development, and testing portions. |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| metashare.ResourceInfo#ContentInfo.resourceType | corpus |
| metashare.ResourceInfo#DistributionInfo#LicenseInfo.license | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
| metashare.ResourceInfo#DistributionInfo.availability | unrestrictedUse |
| metashare.ResourceInfo#IdentificationInfo.resourceName | Czech Named Entity Corpus 1.1 |
| metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | SVV 267 314 (Teoretické základy informatiky a výpočetní lingvistiky) |
| metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | LM2010013 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat) |
| metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | GPP406/12/P175 (Vybrané derivační vztahy pro automatické zpracování češtiny) |
| metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | PRVOUK (PRVOUK) |
| metashare.ResourceInfo#TextInfo#SizeInfo.size | 5868 |
| metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | sentences |
This item is Publicly Available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Files in this item
| Name | Czech_Named_Entity_Corpus_1.1.zip |
| Size | 10.48 MB |
| Format | application/zip |
| Description | Zip |
| MD5 | 9457d49807c494a23a5f029f88fa09a6 |
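
The MD5 value listed for Czech_Named_Entity_Corpus_1.1.zip can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library; the local file path and the helper names `md5_of_file` and `verify_archive` are assumptions made for illustration and are not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published in the file listing for the archive.
EXPECTED_MD5 = "9457d49807c494a23a5f029f88fa09a6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current directory.
    archive = Path("Czech_Named_Entity_Corpus_1.1.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch most likely indicates an incomplete or corrupted download. MD5 is adequate for detecting accidental corruption but should not be relied on as a defence against deliberate tampering.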

- cnec1.1
- LICENSE (21 kB)
- README (3 kB)
- data
- xml
- named_ent_train.xml (1 MB)
- named_ent_etest.xml (156 kB)
- named_ent_dtest.xml (153 kB)
- named_ent.xml (1 MB)
- html
- named_ent_train.html (1 MB)
- named_ent.html (1 MB)
- named_ent_dtest.html (207 kB)
- named_ent_etest.html (212 kB)
- plain
- named_ent_train.txt (835 kB)
- named_ent_etest.txt (106 kB)
- named_ent_dtest.txt (105 kB)
- named_ent.txt (1 MB)
- treex
- named_ent.treex (43 MB)
- named_ent_train.treex (34 MB)
- named_ent_dtest.treex (4 MB)
- named_ent_etest.treex (4 MB)
- xml
- tools
- doc
- techrep-ne-2007.pdf (600 kB)
- doc.pdf (151 kB)
- statistics.txt (923 B)
- ne-type-hierarchy.pdf (54 kB)

