
C4Corpus (CC BY-NC-ND part)

Please use the following text to cite this item:
Gurevych, Iryna; Habernal, Ivan and Zayed, Omnia, 2016, C4Corpus (CC BY-NC-ND part), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-2205.
Date issued
2016-04-14
Size
10000000000 tokens
Description
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
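The file names in this item appear to encode the license, the language, two processing flags, and part/segment numbers (for example `Lic_by-nc-nd_Lang_nl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz`). The following is a minimal sketch for splitting such a name into fields. The pattern is inferred only from the names listed on this page and is not a documented specification; `CorpusFileName` and `parse_corpus_file_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the file names listed in this item (an assumption,
# not an official naming specification).
_NAME_RE = re.compile(
    r"^Lic_(?P<license>[a-z-]+)"
    r"_Lang_(?P<lang>[A-Za-z-]+)"
    r"_NoBoilerplate_(?P<no_boilerplate>true|false)"
    r"_MinHtml_(?P<min_html>true|false)"
    r"-r-(?P<part>\d+)\.seg-(?P<segment>\d+)\.warc(?:\.gz)?$"
)


class CorpusFileName(NamedTuple):
    license_code: str      # e.g. "by-nc-nd"
    lang: str              # e.g. "nl", "zh-cn", "unknown"
    no_boilerplate: bool   # value of the NoBoilerplate flag in the name
    min_html: bool         # value of the MinHtml flag in the name
    part: int              # number after "-r-"
    segment: int           # number after ".seg-"


def parse_corpus_file_name(name: str) -> Optional[CorpusFileName]:
    """Return the fields encoded in a file name, or None if it does not match."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return CorpusFileName(
        license_code=match.group("license"),
        lang=match.group("lang"),
        no_boilerplate=match.group("no_boilerplate") == "true",
        min_html=match.group("min_html") == "true",
        part=int(match.group("part")),
        segment=int(match.group("segment")),
    )
```

Grouping the parsed results by `lang` is one way to collect all segments of a language, such as the four English segments listed in this item.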
Files in this item
Name
Lic_by-nc-nd_Lang_nl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
9.12 MB
Format
application/x-gzip
Description
MD5
cc980a0a114ba92bae0dc7fcfac2021e
Preview
    • Lic_by-nc-nd_Lang_nl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (24 MB)
Name
Lic_by-nc-nd_Lang_no_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc.gz
Size
604.14 KB
Format
application/x-gzip
Description
MD5
01d3e8243c339e306d42dcaaf45588f0
Preview
    • Lic_by-nc-nd_Lang_no_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_pl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
3.19 MB
Format
application/x-gzip
Description
MD5
855da3c770120baf0ae00f889f9235d3
Preview
    • Lic_by-nc-nd_Lang_pl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (8 MB)
Name
Lic_by-nc-nd_Lang_pt_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc.gz
Size
94.81 MB
Format
application/x-gzip
Description
MD5
8905485e48c60409506ee6551de02a98
Preview
    • Lic_by-nc-nd_Lang_pt_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc (263 MB)
Name
Lic_by-nc-nd_Lang_ro_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc.gz
Size
7.01 MB
Format
application/x-gzip
Description
MD5
12c5eeee8ad9de164b2b3b2b533d0f05
Preview
    • Lic_by-nc-nd_Lang_ro_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc (18 MB)
Name
Lic_by-nc-nd_Lang_ru_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc.gz
Size
2.5 MB
Format
application/x-gzip
Description
MD5
07c4909f600ae1d2354dcec1eb26d263
Preview
    • Lic_by-nc-nd_Lang_ru_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc (8 MB)
Name
Lic_by-nc-nd_Lang_sk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc.gz
Size
508.3 KB
Format
application/x-gzip
Description
MD5
16bf8e286e72d012bbbb896b94873bdb
Preview
    • Lic_by-nc-nd_Lang_sk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_sl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
843.34 KB
Format
application/x-gzip
Description
MD5
806b6b8d4086dae12d6f90b6f0ac68e0
Preview
    • Lic_by-nc-nd_Lang_sl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_so_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc.gz
Size
180.85 KB
Format
application/x-gzip
Description
MD5
083a395b26194c8ea058b7b4d701208d
Preview
    • Lic_by-nc-nd_Lang_so_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc (513 kB)
Name
Lic_by-nc-nd_Lang_sq_NoBoilerplate_true_MinHtml_true-r-00020.seg-00000.warc.gz
Size
272.04 KB
Format
application/x-gzip
Description
MD5
9cdaeee4e1e76b23fc26f814646f0b7a
Preview
    • Lic_by-nc-nd_Lang_sq_NoBoilerplate_true_MinHtml_true-r-00020.seg-00000.warc (819 kB)
Name
Lic_by-nc-nd_Lang_sv_NoBoilerplate_true_MinHtml_true-r-00025.seg-00000.warc.gz
Size
1.31 MB
Format
application/x-gzip
Description
MD5
e3d5f60275df7f59cfa761d2e1f7334b
Preview
    • Lic_by-nc-nd_Lang_sv_NoBoilerplate_true_MinHtml_true-r-00025.seg-00000.warc (3 MB)
Name
Lic_by-nc-nd_Lang_sw_NoBoilerplate_true_MinHtml_true-r-00026.seg-00000.warc.gz
Size
62.72 KB
Format
application/x-gzip
Description
MD5
a34ba1d8320bf614bf92c813fa66d686
Preview
    • Lic_by-nc-nd_Lang_sw_NoBoilerplate_true_MinHtml_true-r-00026.seg-00000.warc (169 kB)
Name
Lic_by-nc-nd_Lang_ta_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc.gz
Size
1.53 MB
Format
application/x-gzip
Description
MD5
5a023eddd92c402d127217bd3075d824
Preview
    • Lic_by-nc-nd_Lang_ta_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc (7 MB)
Name
Lic_by-nc-nd_Lang_te_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc.gz
Size
52.85 KB
Format
application/x-gzip
Description
MD5
ef3ef717ef84f738f9b323191623afae
Preview
    • Lic_by-nc-nd_Lang_te_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc (209 kB)
Name
Lic_by-nc-nd_Lang_th_NoBoilerplate_true_MinHtml_true-r-00011.seg-00000.warc.gz
Size
7.77 MB
Format
application/x-gzip
Description
MD5
3ffda6791e169566ba9cc051ccb2c4ca
Preview
    • Lic_by-nc-nd_Lang_th_NoBoilerplate_true_MinHtml_true-r-00011.seg-00000.warc (33 MB)
Name
Lic_by-nc-nd_Lang_tl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
779.85 KB
Format
application/x-gzip
Description
MD5
76dea75971e43d32b1e032577e34cf62
Preview
    • Lic_by-nc-nd_Lang_tl_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_tr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
7.76 MB
Format
application/x-gzip
Description
MD5
d79ea300a3eacd2b5fbd99563e00f5cc
Preview
    • Lic_by-nc-nd_Lang_tr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (21 MB)
Name
Lic_by-nc-nd_Lang_uk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc.gz
Size
394.12 KB
Format
application/x-gzip
Description
MD5
a911fa929596527e2e4b66da4a0bdf5f
Preview
    • Lic_by-nc-nd_Lang_uk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_unknown_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc.gz
Size
34.68 MB
Format
application/x-gzip
Description
MD5
eac29c8586b6514bd89e42f8e2588558
Preview
    • Lic_by-nc-nd_Lang_unknown_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc (91 MB)
Name
Lic_by-nc-nd_Lang_ur_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
78.57 KB
Format
application/x-gzip
Description
MD5
421b1ff78fe344087ca0a796d6fd5393
Preview
    • Lic_by-nc-nd_Lang_ur_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (250 kB)
Name
Lic_by-nc-nd_Lang_vi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc.gz
Size
7 MB
Format
application/x-gzip
Description
MD5
242f1bed62fcfcf07ac0a198cc678e8e
Preview
    • Lic_by-nc-nd_Lang_vi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc (23 MB)
Name
Lic_by-nc-nd_Lang_zh-cn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc.gz
Size
607.65 KB
Format
application/x-gzip
Description
MD5
c1d05bf9c964cd34a975c781c03459de
Preview
    • Lic_by-nc-nd_Lang_zh-cn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_zh-tw_NoBoilerplate_true_MinHtml_true-r-00026.seg-00000.warc.gz
Size
630.03 KB
Format
application/x-gzip
Description
MD5
cf84cdad88541b4df8b4a38639951862
Preview
    • Lic_by-nc-nd_Lang_zh-tw_NoBoilerplate_true_MinHtml_true-r-00026.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_af_NoBoilerplate_true_MinHtml_true-r-00009.seg-00000.warc.gz
Size
36.86 KB
Format
application/x-gzip
Description
MD5
69576a3e19707595aa71930b605f4627
Preview
    • Lic_by-nc-nd_Lang_af_NoBoilerplate_true_MinHtml_true-r-00009.seg-00000.warc (108 kB)
Name
Lic_by-nc-nd_Lang_ar_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
4.74 MB
Format
application/x-gzip
Description
MD5
6c02fdcceb78ef9e122886236866f138
Preview
    • Lic_by-nc-nd_Lang_ar_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (16 MB)
Name
Lic_by-nc-nd_Lang_bg_NoBoilerplate_true_MinHtml_true-r-00010.seg-00000.warc.gz
Size
4.72 MB
Format
application/x-gzip
Description
MD5
ef86643e43206b0f4f6ee29df6054171
Preview
    • Lic_by-nc-nd_Lang_bg_NoBoilerplate_true_MinHtml_true-r-00010.seg-00000.warc (16 MB)
Name
Lic_by-nc-nd_Lang_bn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc.gz
Size
389.08 KB
Format
application/x-gzip
Description
MD5
e61bd49a4eb0869e3d9b4b4f0de54c2b
Preview
    • Lic_by-nc-nd_Lang_bn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_cs_NoBoilerplate_true_MinHtml_true-r-00022.seg-00000.warc.gz
Size
3.39 MB
Format
application/x-gzip
Description
MD5
bf3bd2ef9709f65099fca54c20048830
Preview
    • Lic_by-nc-nd_Lang_cs_NoBoilerplate_true_MinHtml_true-r-00022.seg-00000.warc (8 MB)
Name
Lic_by-nc-nd_Lang_da_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc.gz
Size
1.7 MB
Format
application/x-gzip
Description
MD5
543fc4c405cf9c2ecb2c5f2a09dcd0d7
Preview
    • Lic_by-nc-nd_Lang_da_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc (6 MB)
Name
Lic_by-nc-nd_Lang_de_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc.gz
Size
97 MB
Format
application/x-gzip
Description
MD5
69bba32a551bf406157fa45776063cb9
Preview
    • Lic_by-nc-nd_Lang_de_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc (298 MB)
Name
Lic_by-nc-nd_Lang_el_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
4.59 MB
Format
application/x-gzip
Description
MD5
fc64bc4d976fdb11fc9b925aa48b68e2
Preview
    • Lic_by-nc-nd_Lang_el_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (15 MB)
Name
Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc.gz
Size
953.7 MB
Format
application/x-gzip
Description
MD5
712b703c52e6dc4f35e86d319baae75f
Preview
    • Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc (2 GB)
Name
Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00001.warc.gz
Size
953.72 MB
Format
application/x-gzip
Description
MD5
8d7e5b1137285d1fb32f292ec56aa6ef
Preview
    • Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00001.warc (2 GB)
Name
Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00002.warc.gz
Size
953.7 MB
Format
application/x-gzip
Description
MD5
0a6ba521fd9e01552734b8ab45560c48
Preview
    • Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00002.warc (2 GB)
Name
Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00003.warc.gz
Size
422.15 MB
Format
application/x-gzip
Description
MD5
d0bc2bec0bd2c21a54e4ab0866ec1820
Preview
    • Lic_by-nc-nd_Lang_en_NoBoilerplate_true_MinHtml_true-r-00017.seg-00003.warc (1 GB)
Name
Lic_by-nc-nd_Lang_es_NoBoilerplate_true_MinHtml_true-r-00022.seg-00000.warc.gz
Size
434.93 MB
Format
application/x-gzip
Description
MD5
9dccc9d83345aa23609153b014926f22
Preview
    • Lic_by-nc-nd_Lang_es_NoBoilerplate_true_MinHtml_true-r-00022.seg-00000.warc (1 GB)
Name
Lic_by-nc-nd_Lang_et_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc.gz
Size
962.27 KB
Format
application/x-gzip
Description
MD5
f4b1f78285e8d86845fd5fcba44809ba
Preview
    • Lic_by-nc-nd_Lang_et_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc (3 MB)
Name
Lic_by-nc-nd_Lang_fa_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc.gz
Size
1.24 MB
Format
application/x-gzip
Description
MD5
0fe1c018920afce4a44c32399dcaf839
Preview
    • Lic_by-nc-nd_Lang_fa_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc (4 MB)
Name
Lic_by-nc-nd_Lang_fi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc.gz
Size
886.21 KB
Format
application/x-gzip
Description
MD5
6e0ada40860fec5077fd9117a4c1da1f
Preview
    • Lic_by-nc-nd_Lang_fi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_fr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
83.28 MB
Format
application/x-gzip
Description
MD5
aad91d581ff4dfe4aa699a01c4dd27c5
Preview
    • Lic_by-nc-nd_Lang_fr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (236 MB)
Name
Lic_by-nc-nd_Lang_gu_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc.gz
Size
170.27 KB
Format
application/x-gzip
Description
MD5
3344cbbeee8499f071f40fc1a4650ee9
Preview
    • Lic_by-nc-nd_Lang_gu_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc (726 kB)
Name
Lic_by-nc-nd_Lang_he_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc.gz
Size
923.04 KB
Format
application/x-gzip
Description
MD5
ff765f9fc47e3f3b1729ebb8aa703d78
Preview
    • Lic_by-nc-nd_Lang_he_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_hi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc.gz
Size
1.58 MB
Format
application/x-gzip
Description
MD5
7f0c265afa29786ac6470d211a50b6e3
Preview
    • Lic_by-nc-nd_Lang_hi_NoBoilerplate_true_MinHtml_true-r-00012.seg-00000.warc (6 MB)
Name
Lic_by-nc-nd_Lang_hr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
15.25 MB
Format
application/x-gzip
Description
MD5
c98433037a24fce60e9b8b272861850a
Preview
    • Lic_by-nc-nd_Lang_hr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (40 MB)
Name
Lic_by-nc-nd_Lang_hu_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc.gz
Size
8.22 MB
Format
application/x-gzip
Description
MD5
c9cef762687f8d9996f3bae4e65e723f
Preview
    • Lic_by-nc-nd_Lang_hu_NoBoilerplate_true_MinHtml_true-r-00024.seg-00000.warc (23 MB)
Name
Lic_by-nc-nd_Lang_id_NoBoilerplate_true_MinHtml_true-r-00007.seg-00000.warc.gz
Size
8.34 MB
Format
application/x-gzip
Description
MD5
7b415acc3d3eb7a64ffbe3a932280141
Preview
    • Lic_by-nc-nd_Lang_id_NoBoilerplate_true_MinHtml_true-r-00007.seg-00000.warc (23 MB)
Name
Lic_by-nc-nd_Lang_it_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc.gz
Size
146.19 MB
Format
application/x-gzip
Description
MD5
396277746745a2519d5528f5745de8d7
Preview
    • Lic_by-nc-nd_Lang_it_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc (387 MB)
Name
Lic_by-nc-nd_Lang_ja_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc.gz
Size
1.04 MB
Format
application/x-gzip
Description
MD5
c3c2d94ce2d10dc663b31a0331cbddc1
Preview
    • Lic_by-nc-nd_Lang_ja_NoBoilerplate_true_MinHtml_true-r-00004.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_kn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc.gz
Size
23.46 KB
Format
application/x-gzip
Description
MD5
996bcfb730f933c0eea50911437c1324
Preview
    • Lic_by-nc-nd_Lang_kn_NoBoilerplate_true_MinHtml_true-r-00017.seg-00000.warc (89 kB)
Name
Lic_by-nc-nd_Lang_ko_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc.gz
Size
10.87 MB
Format
application/x-gzip
Description
MD5
f5d7063fdf6d8afdac64111840b622db
Preview
    • Lic_by-nc-nd_Lang_ko_NoBoilerplate_true_MinHtml_true-r-00018.seg-00000.warc (29 MB)
Name
Lic_by-nc-nd_Lang_lt_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc.gz
Size
309.45 KB
Format
application/x-gzip
Description
MD5
e6ac5906d36e99a391eb2a3283c8c2e9
Preview
    • Lic_by-nc-nd_Lang_lt_NoBoilerplate_true_MinHtml_true-r-00023.seg-00000.warc (918 kB)
Name
Lic_by-nc-nd_Lang_lv_NoBoilerplate_true_MinHtml_true-r-00025.seg-00000.warc.gz
Size
504.83 KB
Format
application/x-gzip
Description
MD5
63a85d3ce224b9ba39ce417726121e22
Preview
    • Lic_by-nc-nd_Lang_lv_NoBoilerplate_true_MinHtml_true-r-00025.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_mk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc.gz
Size
631.48 KB
Format
application/x-gzip
Description
MD5
23de489f81917f2fbb2196c3a2b6613f
Preview
    • Lic_by-nc-nd_Lang_mk_NoBoilerplate_true_MinHtml_true-r-00014.seg-00000.warc (2 MB)
Name
Lic_by-nc-nd_Lang_ml_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc.gz
Size
240.73 KB
Format
application/x-gzip
Description
MD5
aab00f383f979fe3b63630c8b4fb5f24
Preview
    • Lic_by-nc-nd_Lang_ml_NoBoilerplate_true_MinHtml_true-r-00015.seg-00000.warc (1 MB)
Name
Lic_by-nc-nd_Lang_mr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc.gz
Size
116.49 KB
Format
application/x-gzip
Description
MD5
e7330c41903b5426be128824c6315222
Preview
    • Lic_by-nc-nd_Lang_mr_NoBoilerplate_true_MinHtml_true-r-00021.seg-00000.warc (448 kB)
Name
Lic_by-nc-nd_Lang_ne_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc.gz
Size
7.98 KB
Format
application/x-gzip
Description
MD5
921fb655b00dc97b7c2e778640470f4c
Preview
    • Lic_by-nc-nd_Lang_ne_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc (26 kB)
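
Each file above is listed with an MD5 value, which can be used to check that a download is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for the larger archives
    in this item (some are close to 1 GB compressed).
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed_md5: str) -> bool:
    """Compare a local file against an MD5 value copied from the listing."""
    return md5_of_file(path) == listed_md5.strip().lower()
```

For example, after downloading the Nepali file, `matches_listed_md5("Lic_by-nc-nd_Lang_ne_NoBoilerplate_true_MinHtml_true-r-00008.seg-00000.warc.gz", "921fb655b00dc97b7c2e778640470f4c")` should return `True` if the local copy matches the listed checksum. The checksum applies to the compressed `.warc.gz` file as listed, not to the decompressed `.warc` content.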