
 
dc.contributor.author Spoustová, Johanka
dc.contributor.author Spousta, Miroslav
dc.date.accessioned 2012-06-21T11:53:56Z
dc.date.available 2012-06-21T11:53:56Z
dc.date.issued 2012-06-21
dc.identifier.uri http://hdl.handle.net/11858/00-097C-0000-0006-B847-6
dc.description Web corpus of Czech, created in 2011. Contains newspapers+magazines, discussions, blogs. See http://www.lrec-conf.org/proceedings/lrec2012/summaries/120.html for details.
dc.description.sponsorship GA405/09/0278
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0)
dc.rights.uri http://creativecommons.org/licenses/by/3.0/
dc.subject corpus
dc.subject Czech
dc.subject web
dc.title CWC2011
dc.type corpus
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Spoustová
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Johanka
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName Charles University in Prague, UFAL
metashare.ResourceInfo#DistributionInfo.availability unrestrictedUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium download
metashare.ResourceInfo#ValidationInfo.validated True
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName #1-Internet as a Language Corpus
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType #1-National
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#LanguageInfo.languageCoding ces
metashare.ResourceInfo#TextInfo#SizeInfo.size 2650000000
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit words
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email johanka@ucw.cz
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
sponsor Grantová agentura České republiky (Czech Science Foundation) GA405/09/0278 Internet as a Language Corpus (Internet jako jazykový korpus) nationalFunds
size.info 2650000000 words
files.size 6074441470
files.count 6
featuredService.kontext basic|https://lindat.mff.cuni.cz/services/kontext/first_form?corpname=cwc_11_cs_w
featuredService.kontext with syntactic annotation|https://lindat.mff.cuni.cz/services/kontext/first_form?corpname=cwc_parsed_cs_a


Files in this item

License category: Publicly Available

License: Creative Commons - Attribution 3.0 Unported (CC BY 3.0)
Distributed under Creative Commons. Attribution required.

Name: plain.articles_shuffled.txt.bz2
Size: 1.17 GB
Format: application/x-bzip2
Description: Articles, 700M tokens, sentence-shuffled, plain forms only, sentence breaks (<s>), one token per line. UTF-8.
MD5: cf9bc9b5d0425af41e3f40dcef62c2e1

Name: plain.blogs_shuffled.txt.bz2
Size: 2.16 GB
Format: application/x-bzip2
Description: Blogs, 1.2B tokens, sentence-shuffled, plain forms only, sentence breaks (<s>), one token per line. UTF-8.
MD5: b37a4cdf02b414793adbb2bab7d5641a

Name: plain.discussions_shuffled.txt.bz2
Size: 2.27 GB
Format: application/x-bzip2
Description: Discussions, 1.4B tokens, sentence-shuffled, plain forms only, sentence breaks (<s>), one token per line. UTF-8.
MD5: 0cccab42183d211515dfbed99aa48b26

Name: urls-articles.bz2
Size: 20.58 MB
Format: application/x-bzip2
Description: URL list of the articles section
MD5: 1a2034c69c80225d666ff80526b7c884

Name: urls-blogs.bz2
Size: 31.5 MB
Format: application/x-bzip2
Description: URL list of the blogs section
MD5: 34a1e6760880d661d7ab7a2da94c9a70

Name: urls-discussions.bz2
Size: 14.12 MB
Format: application/x-bzip2
Description: URL list of the discussions section
MD5: 858ec3d95e6eae67a8a15241dc499801
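
The file listing above gives an MD5 checksum for each download and describes the plain-text files as bzip2-compressed UTF-8 with one token per line and <s> sentence breaks. The following Python sketch shows one possible way to verify a download and stream sentences from it. It is not part of the record: the function names md5_of_file and iter_sentences are hypothetical, and the reader assumes that a line containing only <s> separates sentences (whether the marker precedes or follows each sentence is not stated in the record, so the code accepts both).

```python
import bz2
import hashlib
from typing import Iterator, List


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_sentences(path: str, marker: str = "<s>") -> Iterator[List[str]]:
    """Yield each sentence of a bzip2-compressed token file as a token list.

    Assumption (not confirmed by the record): a line consisting solely of
    `marker` is a sentence break, and every other non-empty line is a token.
    """
    sentence: List[str] = []
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            token = line.rstrip("\r\n")
            if token == marker:
                if sentence:
                    yield sentence
                sentence = []
            elif token:
                sentence.append(token)
    # Emit the last sentence if the file does not end with a marker.
    if sentence:
        yield sentence


# Example use, assuming the file has been downloaded to the working directory:
#   expected = "cf9bc9b5d0425af41e3f40dcef62c2e1"  # MD5 listed in the record
#   if md5_of_file("plain.articles_shuffled.txt.bz2") != expected:
#       raise ValueError("checksum mismatch, download may be corrupted")
#   for tokens in iter_sentences("plain.articles_shuffled.txt.bz2"):
#       ...
```

Both functions stream their input, so the multi-gigabyte archives never need to be held in memory or decompressed to disk first.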
