Please use the following text to cite this item:
Ševčíková, Magda; Žabokrtský, Zdeněk; Straková, Jana and Straka, Milan, 2014, Czech Named Entity Corpus 1.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C.
dc.contributor.author: Ševčíková, Magda
dc.contributor.author: Žabokrtský, Zdeněk
dc.contributor.author: Straková, Jana
dc.contributor.author: Straka, Milan
dc.date.accessioned: 2014-01-09T10:03:56Z
dc.date.available: 2014-01-09T10:03:56Z
dc.date.issued: 2014-01-09
dc.description: Czech Named Entity Corpus 1.1 fixes several issues in Czech Named Entity Corpus 1.0: misannotated entities are corrected, all formats contain the same data, the tmt format is replaced with the treex format, and every format is split into training, development and testing portions of the data.
dc.description.sponsorship: SVV 267 314 (Theoretical Foundations of Computer Science and Computational Linguistics), LM2010013 (LINDAT-CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data), GPP406/12/P175 (Selected Derivational Relations for Automatic Processing of Czech), PRVOUK (PRVOUK)
dc.identifier.uri: http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C
dc.language.iso: ces
dc.publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.replaces: http://hdl.handle.net/11858/00-097C-0000-0022-C73C-7
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source.uri: http://ufal.mff.cuni.cz/cnec/
dc.subject: named entity recognition
dc.subject: corpus
dc.title: Czech Named Entity Corpus 1.1
dc.type: corpus
local.branding: LINDAT / CLARIAH-CZ
local.files.count: 1
local.files.size: 10987946
local.has.files: yes
local.language.name: Czech
local.size.info: 5868 sentences
local.sponsor: nationalFunds; SVV 267 314; Charles University in Prague (outside GAUK); Theoretical Foundations of Computer Science and Computational Linguistics
local.sponsor: nationalFunds; LM2010013; Ministry of Education, Youth and Sports of the Czech Republic; LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data
local.sponsor: nationalFunds; GPP406/12/P175; Grant Agency of the Czech Republic; Selected Derivational Relations for Automatic Processing of Czech
local.sponsor: nationalFunds; PRVOUK; Charles University in Prague (outside GAUK); PRVOUK
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email: strakova@ufal.mff.cuni.cz
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName: Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics in Prague
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName: Jana
metashare.ResourceInfo#ContactInfo#PersonInfo.surname: Straková
metashare.ResourceInfo#ContentInfo.description: Czech Named Entity Corpus 1.1 fixes several issues in Czech Named Entity Corpus 1.0: misannotated entities are corrected, all formats contain the same data, the tmt format is replaced with the treex format, and every format is split into training, development and testing portions of the data.
metashare.ResourceInfo#ContentInfo.mediaType: text
metashare.ResourceInfo#ContentInfo.resourceType: corpus
metashare.ResourceInfo#DistributionInfo#LicenseInfo.license: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
metashare.ResourceInfo#DistributionInfo.availability: unrestrictedUse
metashare.ResourceInfo#IdentificationInfo.resourceName: Czech Named Entity Corpus 1.1
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName: SVV 267 314 (Theoretical Foundations of Computer Science and Computational Linguistics)
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName: LM2010013 (LINDAT-CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data)
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName: GPP406/12/P175 (Selected Derivational Relations for Automatic Processing of Czech)
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName: PRVOUK (PRVOUK)
metashare.ResourceInfo#TextInfo#SizeInfo.size: 5868
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit: sentences

Version History

Version   Date                  Summary
2*        2014-01-09 00:00:00
          2007-01-01 00:00:00
* Selected version
This item is Publicly Available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Files in this item
Name: Czech_Named_Entity_Corpus_1.1.zip
Size: 10.48 MB
Format: application/zip
Description: Zip
MD5: 9457d49807c494a23a5f029f88fa09a6
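
The MD5 value listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch in Python, not part of the corpus distribution: the function names `md5_of_file` and `verify_download` are hypothetical, and the script assumes the archive was saved under its listed name in the current directory.

```python
import hashlib
from pathlib import Path

# MD5 digest as listed on this page for Czech_Named_Entity_Corpus_1.1.zip.
EXPECTED_MD5 = "9457d49807c494a23a5f029f88fa09a6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("Czech_Named_Entity_Corpus_1.1.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a guarantee against deliberate tampering.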
File Preview
  • cnec1.1
    • LICENSE (21 kB)
    • README (3 kB)
    • data
      • xml
        • named_ent_train.xml (1 MB)
        • named_ent_etest.xml (156 kB)
        • named_ent_dtest.xml (153 kB)
        • named_ent.xml (1 MB)
      • html
        • named_ent_train.html (1 MB)
        • named_ent.html (1 MB)
        • named_ent_dtest.html (207 kB)
        • named_ent_etest.html (212 kB)
      • plain
        • named_ent_train.txt (835 kB)
        • named_ent_etest.txt (106 kB)
        • named_ent_dtest.txt (105 kB)
        • named_ent.txt (1 MB)
      • treex
        • named_ent.treex (43 MB)
        • named_ent_train.treex (34 MB)
        • named_ent_dtest.treex (4 MB)
        • named_ent_etest.treex (4 MB)
    • tools
      • statistics.pl (509 B)
      • Treex
      • namedent_annotations_to_html.pl (3 kB)
      • namedent_annotations_to_xml_simple.pl (559 B)
      • compare_ne_outputs_v2.pl (14 kB)
    • doc
      • techrep-ne-2007.pdf (600 kB)
      • doc.pdf (151 kB)
      • statistics.txt (923 B)
      • ne-type-hierarchy.pdf (54 kB)
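
Each format directory in the listing above holds one unsplit file plus three files whose names end in `_train`, `_dtest` and `_etest`. The sketch below shows one way to map those names to the training, development and testing portions mentioned in the description. It is illustrative only: reading `dtest` as the development portion and `etest` as the testing portion is an assumption based on the file names, and the helper names `split_of` and `split_paths` are hypothetical. The README in the archive is the authoritative source.

```python
from pathlib import PurePosixPath

# File-name suffix -> data portion. ASSUMPTION: "dtest" is the development
# portion and "etest" is the testing (evaluation) portion.
SPLIT_SUFFIXES = {"train": "training", "dtest": "development", "etest": "testing"}

# Format directory -> file extension, as shown in the file preview.
FORMAT_EXTENSIONS = {"xml": "xml", "html": "html", "plain": "txt", "treex": "treex"}


def split_of(filename):
    """Return the portion a corpus file belongs to, or 'full' if unsplit."""
    stem = PurePosixPath(filename).stem      # e.g. "named_ent_train"
    suffix = stem.rsplit("_", 1)[-1]         # e.g. "train"
    return SPLIT_SUFFIXES.get(suffix, "full")


def split_paths(fmt, root="cnec1.1/data"):
    """Return a dict mapping each portion to its path inside the archive."""
    ext = FORMAT_EXTENSIONS[fmt]
    paths = {"full": f"{root}/{fmt}/named_ent.{ext}"}
    for suffix, portion in SPLIT_SUFFIXES.items():
        paths[portion] = f"{root}/{fmt}/named_ent_{suffix}.{ext}"
    return paths


if __name__ == "__main__":
    for portion, path in split_paths("plain").items():
        print(f"{portion:12s} {path}")
```

Keeping the suffix table in one place means the mapping can be corrected in a single spot if the README defines the portions differently.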