C4Corpus (CC BY-NC part)
Please use the following text to cite this item or export to a predefined format:
Gurevych, Iryna; Habernal, Ivan and Zayed, Omnia, 2016,
C4Corpus (CC BY-NC part), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-2204.
Authors
Gurevych, Iryna; Habernal, Ivan; Zayed, Omnia
Item identifier
http://hdl.handle.net/11372/LRT-2204
Project URL
Date issued
2016-04-14
Size
10,000,000,000 tokens
Language(s)
Thai,
Description
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Publisher
Acknowledgement
German Research Foundation (DFG)
Project code: DIP DA 1600/1-1
Project name: Information Consolidation: A New Paradigm in Knowledge Search
Amazon
Project code: Amazon Web Services in Education Grant
Project name: Web Services in Education Grant
Subject(s)
Collections
This item is Publicly Available
and licensed under:


