Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2020 – VERSION 1)
Please use the following text to cite this item or export to a predefined format:
Rüdiger, Jan Oliver, 2024,
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2020 – VERSION 1), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-5793.
Authors
Item identifier
Date issued
2024-11-14
Size
4102984860 tokens,
3587840 texts
Language(s)
Description
The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed to enable broad-based linguistic analysis of the German-language (visible) internet over time and to achieve comparability with DeReKo (the ‘German Reference Corpus’ of the Leibniz Institute for the German Language; DeReKo size: 57 billion tokens as of DeReKo Release 2024-I). The corpus is separated by year (here: 2020) and versioned (here: version 1). Across all years (2013-2024), version 1 comprises 97.45 billion tokens.
The corpus is based on the data dumps from CommonCrawl (https://commoncrawl.org/). CommonCrawl is a non-profit organisation that provides copies of the visible Internet free of charge for research purposes.
The CommonCrawl WET raw data were first filtered by TLD (top-level domain). Only pages ending in the following TLDs were taken into account: ‘.at; .bayern; .berlin; .ch; .cologne; .de; .gmbh; .hamburg; .koeln; .nrw; .ruhr; .saarland; .swiss; .tirol; .wien; .zuerich’. These are the exclusively German-language TLDs according to ICANN (https://data.iana.org/TLD/tlds-alpha-by-domain.txt) as of 1 June 2024; TLDs with a purely corporate reference (e.g. ‘.edeka; .bmw; .ford’) were excluded. In the second step, the language of each document (URL) was estimated with NTextCat (https://github.com/ivanakcheurov/ntextcat), using its CORE14 profile. Only documents/URLs for which German was the most likely language were processed further, in order to exclude foreign-language material such as individual subpages as far as possible. The third step filtered by manually defined selectors and removed 1:1 duplicates (within one year).
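The filtering steps described above can be sketched roughly as follows. This is only an illustration, not the author's CorpusExplorer pipeline: the function names are hypothetical, `is_german` is a placeholder for the NTextCat language check, the manual-selector step is left out because its rules are not documented here, and hashing the text with SHA-1 to find exact duplicates is an assumption.

```python
import hashlib
from urllib.parse import urlsplit

# TLD list copied from the description above.
GERMAN_TLDS = {
    "at", "bayern", "berlin", "ch", "cologne", "de", "gmbh", "hamburg",
    "koeln", "nrw", "ruhr", "saarland", "swiss", "tirol", "wien", "zuerich",
}


def has_german_tld(url: str) -> bool:
    """Return True if the URL's host ends in one of the listed TLDs."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() in GERMAN_TLDS


def filter_documents(docs, is_german):
    """Yield (url, text) pairs that pass the TLD, language and duplicate checks.

    docs      -- iterable of (url, text) pairs belonging to one year
    is_german -- hypothetical callable standing in for the language estimate
    """
    seen = set()  # digests of texts that were already emitted
    for url, text in docs:
        if not has_german_tld(url):
            continue  # step 1: TLD filter
        if not is_german(text):
            continue  # step 2: language filter
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # step 3: exact (1:1) duplicate within the year
        seen.add(digest)
        yield url, text
```

Because the set of seen digests lives inside one call, duplicates are only detected within the batch passed in, which mirrors the "within one year" scope if each year is processed separately.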
The filtering and subsequent processing were carried out using CorpusExplorer (http://hdl.handle.net/11234/1-2634) and our own (supplementary) scripts, and the TreeTagger (http://hdl.handle.net/11372/LRT-323) was used for automatic annotation. The corpus was processed on the HELIX HPC cluster. The author would like to thank the state of Baden-Württemberg and the German Research Foundation (DFG) for the opportunity to use the bwHPC/HELIX HPC cluster - funding code HPC cluster: INST 35/1597-1 FUGG.
Data content:
- Tokens and sentence boundaries
- Automatic lemma and POS annotation (using TreeTagger)
- Metadata:
- GUID - Unique identifier of the document
- YEAR - Year of capture (please use this information for data slices)
- Url - Full URL
- Tld - Top-Level Domain
- Domain - Domain without TLD (but with sub-domains if applicable)
- DomainFull - Complete domain (incl. TLD)
- Datum - (System Information): Date recorded by CorpusExplorer (date of capture by CommonCrawl, not the date the document was created or modified).
- Hash - (System Information): SHA1 hash of the CommonCrawl
- Pfad - (System Information): Path of the cluster (raw data) - is supplied by the system.
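To illustrate how the Url, Tld, Domain and DomainFull fields relate to one another, here is a minimal sketch that derives them from a URL. The function name is hypothetical, and the exact stored form (for example, whether Tld carries a leading dot) is an assumption; the corpus files themselves are the authority.

```python
from urllib.parse import urlsplit


def domain_fields(url: str) -> dict:
    """Split a URL into the domain-related metadata fields listed above."""
    host = urlsplit(url).hostname or ""
    # Everything before the last dot is the domain (including sub-domains),
    # everything after it is the top-level domain.
    domain, _, tld = host.rpartition(".")
    return {"Url": url, "Tld": tld, "Domain": domain, "DomainFull": host}


print(domain_fields("https://blog.example.de/post/1"))
```

For the example URL this gives Tld `de`, Domain `blog.example` and DomainFull `blog.example.de`.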
Please note that the files are saved as *.cec6.gz. These are binary files of the CorpusExplorer (see above). These files ensure efficient archiving. You can use both CorpusExplorer and the ‘CEC6-Converter’ (available for Linux, MacOS and Windows - see: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-5705) to convert the data. The data can be exported in the following formats:
- CATMA v6
- CoNLL
- CSV
- CSV (only meta-data)
- DTA TCF-XML
- DWDS TEI-XML
- HTML
- IDS I5-XML
- IDS KorAP XML
- IMS Open Corpus Workbench
- JSON
- OPUS Corpus Collection XCES
- Plaintext
- SaltXML
- SlashA XML
- SketchEngine VERT
- SPEEDy/CODEX (JSON)
- TLV-XML
- TreeTagger
- TXM
- WebLicht
- XML
Please note that an export increases the storage space requirement considerably. The ‘CorpusExplorerConsole’ (https://github.com/notesjor/CorpusExplorer.Terminal.Console - available for Linux, MacOS and Windows) also offers a simple solution for editing and analysing the data. If you have any questions, please contact the author.
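Before converting, it may be worth checking each downloaded *.cec6.gz file against the MD5 value given under ‘Files in this item’. The following is a minimal sketch using only the Python standard library; the function names and the local file path are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5) -> bool:
    """Return True if the file's MD5 matches the value from the file listing."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, `verify_download("2020_0008.cec6.gz", "da646d8d9688463840da04db903114ad")` should return True for an intact copy of that file, assuming it was saved under that name in the current directory.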
Legal information
The data was downloaded on 01.11.2024. Use, processing and distribution are subject to §60d UrhG (German copyright law), which authorises use for non-commercial purposes in research and teaching. LINDAT/CLARIN is responsible for long-term archiving in accordance with §69d para. 5 and ensures that only authorised persons can access the data. The data has been checked on a random basis to the best of our knowledge and belief. Should you nevertheless find legal violations (e.g. right to be forgotten, personal rights, etc.), please write an e-mail to the author (amc_report@jan-oliver-ruediger.de) with the following information: 1) why this content is undesirable (please outline only briefly) and 2) how the content can be identified - e.g. file name, URL or domain, etc. The author will endeavour to identify and remove the content and to re-upload the modified data as a new version within two weeks. If you have any further questions, please contact CLARIN.
Publisher
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name
- 2020_0008.cec6.gz
- Size
- 202.5 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- da646d8d9688463840da04db903114ad

- Name
- 2020_0010.cec6.gz
- Size
- 196.27 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- c222e28f16b34b62959609747be83110

- Name
- 2020_0009.cec6.gz
- Size
- 200.02 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 7dfa9430519158e3457d6be94f39b849

- Name
- 2020_0013.cec6.gz
- Size
- 201.76 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 95d666d6708390ba6cfedbc4f0404e19

- Name
- 2020_0012.cec6.gz
- Size
- 199.92 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 55b6a6cab0bd82747cac218f8101f353

- Name
- 2020_0011.cec6.gz
- Size
- 197.93 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- d663475308d07f7b38087b9544f2a5a2

- Name
- 2020_0016.cec6.gz
- Size
- 203.55 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 8eea18636a6b4fe5aed564b8053c6b17

- Name
- 2020_0024.cec6.gz
- Size
- 204.76 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 32828aa3e8a20f76f054519a2b219a84

- Name
- 2020_0025.cec6.gz
- Size
- 192.38 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- db3766acf5d239e9ede3c72f592e2307

- Name
- 2020_0021.cec6.gz
- Size
- 200.11 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- ab49e7757a98a541600c1c233ed1e0ee

- Name
- 2020_0001.cec6.gz
- Size
- 200.24 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 6d1fdae486a3cff6614504d85636cf04

- Name
- 2020_0002.cec6.gz
- Size
- 199.71 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 967f9516dda1970e345df5d4531f669b

- Name
- 2020_0023.cec6.gz
- Size
- 202.73 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 11f205e0b2787935b71cdab8e3f2cc9e

- Name
- 2020_0022.cec6.gz
- Size
- 199.88 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 05e5ef9c1d4c92a94bddd81d537fa4ea

- Name
- 2020_0003.cec6.gz
- Size
- 197.7 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- d819c5f72e3f7adc65fc624eaa35af52

- Name
- 2020_0006.cec6.gz
- Size
- 200.88 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- c4c3b4423f8e5fccf9c4c77883c1ad35

- Name
- 2020_0004.cec6.gz
- Size
- 197.9 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 27ee1937bf9301e25677397cbc24ca7f

- Name
- 2020_0005.cec6.gz
- Size
- 203.79 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 65487826bc7f3115820a79c44d1eb481

- Name
- 2020_0007.cec6.gz
- Size
- 200.68 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- f0230f38f98c1a6effb7cb05006d5ea4

- Name
- 2020_0015.cec6.gz
- Size
- 194.45 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 4a7bd987eddab2d475b201fbf90f9c3b

- Name
- 2020_0018.cec6.gz
- Size
- 195.51 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 3ccbbae60ec5b36f8894d5efb0981788

- Name
- 2020_0017.cec6.gz
- Size
- 195.2 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 36532edd6e14fa21ecb40014da123691

- Name
- 2020_0014.cec6.gz
- Size
- 206.66 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 87fef0c513f21e2e4d87475804091325

- Name
- 2020_0019.cec6.gz
- Size
- 201.13 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 1bc7ab6b4b4493cc0d80ca1d201c3281

- Name
- 2020_0020.cec6.gz
- Size
- 202.08 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 9fdcff55e68c3e5359c084b9946d1987

- Name
- 2020_0044.cec6.gz
- Size
- 193.57 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 27f38603a0ca784a604a436d872a5c40

- Name
- 2020_0042.cec6.gz
- Size
- 191.01 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- a52d6391889224f8028c1b1e621d8e57

- Name
- 2020_0045.cec6.gz
- Size
- 202.14 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 7c9e38526aa78415a731b60ef29af853

- Name
- 2020_0047.cec6.gz
- Size
- 198.89 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2d85737dafc754f9159edb265f2c0bba

- Name
- 2020_0046.cec6.gz
- Size
- 200.68 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 1d13e03eca109344ae2f0bce4f36fccb

- Name
- 2020_0048.cec6.gz
- Size
- 201.75 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 4135950d81202bae06776a71b134e9e8

- Name
- 2020_0049.cec6.gz
- Size
- 197.91 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 31f1087533ee4c955c53db5f87beefdf

- Name
- 2020_0050.cec6.gz
- Size
- 203.08 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 12f4269ae5455f18539fd1319a44f483

- Name
- 2020_0029.cec6.gz
- Size
- 197.01 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 31c9fed609cc02b4e64db2638a70f5ea

- Name
- 2020_0027.cec6.gz
- Size
- 198.39 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- aafbb4437fee49e65ab260d388afd521

- Name
- 2020_0026.cec6.gz
- Size
- 198.31 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 9c0cd0485106e3f926ce7324029392bb

- Name
- 2020_0030.cec6.gz
- Size
- 190.6 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 6bcf9e5aae6d3760774dda4879235c78

- Name
- 2020_0040.cec6.gz
- Size
- 198.76 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- a2a376a52dfab3803c9899e6eaaf6103

- Name
- 2020_0028.cec6.gz
- Size
- 203.76 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- d2611f63f13462b64ccba00c4c71f276

- Name
- 2020_0031.cec6.gz
- Size
- 201.81 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 666f745673f9993a4a860e9c2e60e14f

- Name
- 2020_0043.cec6.gz
- Size
- 200.78 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- cdd5a0a32a6b14fbd99f797be1580d38

- Name
- 2020_0041.cec6.gz
- Size
- 203.83 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 65db6cb1fc9199ec3ae0481d6befd020

- Name
- 2020_0034.cec6.gz
- Size
- 197.02 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 52bf68166ced3881bc156fb80c4da826

- Name
- 2020_0036.cec6.gz
- Size
- 199.52 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2868ce80d1efdc54e9ef7c477a24f235

- Name
- 2020_0033.cec6.gz
- Size
- 200.02 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 6418493755bf5a74ef54388f6fa98706

- Name
- 2020_0035.cec6.gz
- Size
- 198.95 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 4ba1dcd10dda14ccb047af21d40d8fa6

- Name
- 2020_0037.cec6.gz
- Size
- 203.13 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- e08f50dafd85de1eb69831b68d972ece

- Name
- 2020_0039.cec6.gz
- Size
- 195.71 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- c1e00853aabf36e955fa4487d7027a15

- Name
- 2020_0032.cec6.gz
- Size
- 199.7 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- ac396bb4dda026b9cdc940d985728f18

- Name
- 2020_0038.cec6.gz
- Size
- 198.15 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 3101e8b14ed3d691d04bb3111ec2f119

- Name
- 2020_0060.cec6.gz
- Size
- 200.45 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- fb729c37de4c018ad7525f9fe39439af

- Name
- 2020_0053.cec6.gz
- Size
- 203.55 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 5b94280f88ab4b5cc7337490bf287d44

- Name
- 2020_0051.cec6.gz
- Size
- 197.7 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 994f6242cbac7cbe482a06cc68c88b5f

- Name
- 2020_0052.cec6.gz
- Size
- 200.56 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 5fa7310dfbfe78023ed8d6a5a16cb521

- Name
- 2020_0057.cec6.gz
- Size
- 200.74 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 5efb9ac704ad3856d5c25a10618926a2

- Name
- 2020_0054.cec6.gz
- Size
- 202.62 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- d27c07c3091b57dd8e3ece4027f6109b

- Name
- 2020_0055.cec6.gz
- Size
- 204.7 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 6f1e37c1aa589c3169d0c43910758e39

- Name
- 2020_0056.cec6.gz
- Size
- 199.87 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2580d8691e613d40418fec92b2d1fdcd

- Name
- 2020_0058.cec6.gz
- Size
- 198.43 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 1b274b2d86805148788cbc5940564844

- Name
- 2020_0059.cec6.gz
- Size
- 201.59 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- fb22d2ab355d0c8dc14cbfc9bc451736

- Name
- 2020_0065.cec6.gz
- Size
- 204.13 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 36e48ebc27d53f99d918e81747a9dde1

- Name
- 2020_0066.cec6.gz
- Size
- 200.61 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 89ed17f2fd0fceec77e8fa86b89b9167

- Name
- 2020_0067.cec6.gz
- Size
- 195.18 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- af5fa3aa754e141872dcada59d45830d

- Name
- 2020_0061.cec6.gz
- Size
- 198.09 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- e5dd0cce53f8135d09e6b7eaf167d264

- Name
- 2020_0064.cec6.gz
- Size
- 201.68 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- f35ca1cf1f7ecd3a13ab4dc2af857319

- Name
- 2020_0070.cec6.gz
- Size
- 205.59 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 4d25dedc7d05bde3f5dace74f44925a8

- Name
- 2020_0063.cec6.gz
- Size
- 195.65 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 61311508b443c88655528c8b5c1adcdf

- Name
- 2020_0062.cec6.gz
- Size
- 197.94 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- fc0355dcc6fddb0370fe82b07828f6f0

- Name
- 2020_0068.cec6.gz
- Size
- 200.62 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- b771c6c421864972e91099473c000842

- Name
- 2020_0069.cec6.gz
- Size
- 199.25 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 7d67999b669365017807a394c31e035e

- Name
- 2020_0072.cec6.gz
- Size
- 203.73 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 5e3f314ef534d1fcb20b96eac5c6e226

- Name
- 2020_0080.cec6.gz
- Size
- 205.64 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- df52b2242f8b99782a9dfda87cfdf9f3

- Name
- 2020_0079.cec6.gz
- Size
- 197.5 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- eb0dc9872f7184966a2864159cf83868

- Name
- 2020_0071.cec6.gz
- Size
- 191.15 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 32e7cbcb3f2f51f8994bc9e626a59d42

- Name
- 2020_0077.cec6.gz
- Size
- 201.05 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- aa82024907ea17bf1eca743a735e5b16

- Name
- 2020_0078.cec6.gz
- Size
- 192.22 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 00200c77f5fc08fbe066c245ef2fe68c

- Name
- 2020_0075.cec6.gz
- Size
- 193.89 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 68d2bb6e7b10c202596206781318c75e

- Name
- 2020_0074.cec6.gz
- Size
- 201.41 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- f13b11bbd05b7a074edd2a37b4df8352

- Name
- 2020_0073.cec6.gz
- Size
- 203.88 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- b25a17edce70b1bf3bc1ebb4e59d970f

- Name
- 2020_0076.cec6.gz
- Size
- 204.2 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- b0940f1e405e593e20e848ddf721500a

- Name
- 2020_0083.cec6.gz
- Size
- 192.32 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- c917f525e214cffd5d75562c77607b0f

- Name
- 2020_0082.cec6.gz
- Size
- 202.08 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2e43b497c0979664ca1d8195cb986bab

- Name
- 2020_0087.cec6.gz
- Size
- 199.03 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 864e155d6cc2fedb4e7b67312381e3d6

- Name
- 2020_0086.cec6.gz
- Size
- 201.31 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2584da77934d87f1dabe5564aa112160

- Name
- 2020_0085.cec6.gz
- Size
- 198.89 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 9878454408c753a7dc74bad3575d4e92

- Name
- 2020_0084.cec6.gz
- Size
- 202.55 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- e9c4f9bf1ec2c3d449bd8a8de0419400

- Name
- 2020_0090.cec6.gz
- Size
- 200.44 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 407eae18093794c9778f8489a8e791e1

- Name
- 2020_0081.cec6.gz
- Size
- 203.79 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 4c3c67441e24eccbcd5c063f6192d3c2

- Name
- 2020_0089.cec6.gz
- Size
- 200.65 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- ffa3b8a000c48830ef2e578e3993469f

- Name
- 2020_0088.cec6.gz
- Size
- 196.9 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 43add20e2df4abfbbcd7abb976b8c2eb

- Name
- 2020_0103.cec6.gz
- Size
- 98.46 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- a77996dbcb7bdd48ecb64f3c95996f36

- Name
- 2020_0102.cec6.gz
- Size
- 202.18 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 8178505e669fba3aabddc6662ec5648b

- Name
- 2020_0099.cec6.gz
- Size
- 203.1 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 2f107a4d27057e9cdb752c7a70920ac3

- Name
- 2020_0098.cec6.gz
- Size
- 200.33 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- bf96c58558a2a38c052e63061e1e9f23

- Name
- 2020_0100.cec6.gz
- Size
- 194.5 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 21ce5c977589c6704925ee9472b231e9

- Name
- 2020_0101.cec6.gz
- Size
- 195.61 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- e833a522fd1538ecad32ce59f2ba2004

- Name
- 2020_0092.cec6.gz
- Size
- 194.63 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- fb44efeab12f731bf85ced282251cb50

- Name
- 2020_0095.cec6.gz
- Size
- 210.58 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 6a611dd9c2de913b7c70d26c25232a7e

- Name
- 2020_0093.cec6.gz
- Size
- 205.48 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 15b0b655ba41cf60ba7f5a833540ff24

- Name
- 2020_0096.cec6.gz
- Size
- 189.59 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 0b733cd28fdfe5334226b8db0ab1e0d9

- Name
- 2020_0091.cec6.gz
- Size
- 197.95 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 8d8909f90c32377a63248f406b91d862

- Name
- 2020_0094.cec6.gz
- Size
- 198.75 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 5778bcc1d9bbd2e3eeca2ad627431acf

- Name
- 2020_0097.cec6.gz
- Size
- 197.97 MB
- Format
- application/x-gzip
- Description
- gzip Archive
- MD5
- 0a7c100f38b0cefe37d6882f01f944df
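
Each file in this listing is published with an MD5 checksum, so a download can be checked for corruption before it is unpacked. The following is a minimal sketch of such a check, not a tool provided by the repository. The helper names `md5_of_file` and `verify_download` and the local `downloads` directory are assumptions made for illustration; the file names and checksums are copied from the listing above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with the published value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Checksums taken from the listing; "downloads" is a hypothetical local folder.
    published = {
        "2020_0103.cec6.gz": "a77996dbcb7bdd48ecb64f3c95996f36",
        "2020_0097.cec6.gz": "0a7c100f38b0cefe37d6882f01f944df",
    }
    for name, checksum in published.items():
        local_file = Path("downloads") / name
        if not local_file.exists():
            print(f"{name}: not found locally")
        elif verify_download(local_file, checksum):
            print(f"{name}: checksum matches")
        else:
            print(f"{name}: checksum mismatch, download again")
```

Reading in chunks keeps memory use low for archives of roughly 200 MB. MD5 is adequate here for detecting transfer errors, but it is not a safeguard against deliberate tampering.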


