SELEXINI corpus

Please use the following text to cite this item:
Scholivet, Manon; Savary, Agata; Estève, Louis Clément; Candito, Marie and Ramisch, Carlos, 2024, SELEXINI corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-5822.
Date issued
2024-12-12
Size
3,000,000,000 tokens
Language(s)
French
Description
We present here a large automatically annotated corpus for French. This corpus is divided into two parts: the first from BigScience, and the second from HPLT. The annotated documents from HPLT were selected in order to optimise the lexical diversity of the final corpus SELEXINI.
This item is Publicly Available
and licensed under:
 Files in this item
Name                       Size      Format              Description   MD5
bigscience_FTB-dep.tar.gz  10.69 GB  application/x-gzip  gzip Archive  52e499d41a83ca820585efb888b5524a
bigscience_UD.tar.gz       15.03 GB  application/x-gzip  gzip Archive  11dae92039406f4190045609a233aad6
hplt_FTB-dep.tar.gz        11.68 GB  application/x-gzip  gzip Archive  547801aae560d9cfdbde76a692fb7543
hplt_UD.tar.gz             16.36 GB  application/x-gzip  gzip Archive  1ba2fa0de3166f404b2bd0aa8a21b857
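
Because each archive is larger than 10 GB, it may be worth checking a download against the MD5 value listed for it before unpacking. The following is a minimal sketch in Python, not part of the corpus distribution: the helper names md5_of_file and verify_download are hypothetical, and the sketch assumes the archives are saved locally under the file names shown in the listing.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this item.
EXPECTED_MD5 = {
    "bigscience_FTB-dep.tar.gz": "52e499d41a83ca820585efb888b5524a",
    "bigscience_UD.tar.gz": "11dae92039406f4190045609a233aad6",
    "hplt_FTB-dep.tar.gz": "547801aae560d9cfdbde76a692fb7543",
    "hplt_UD.tar.gz": "1ba2fa0de3166f404b2bd0aa8a21b857",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for
    multi-gigabyte archives. (Hypothetical helper, for illustration.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the listed value.

    Assumes the local file keeps its original name, since the name
    is used to look up the expected checksum. Raises KeyError for a
    file name that is not in the listing.
    """
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

Calling verify_download("hplt_UD.tar.gz") on a complete download should return True; a False result suggests a truncated or corrupted transfer, in which case the file should be fetched again.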