Automatically Annotated Corpora with Stanza and UDPipe for Czech, English, and Greek
Please use the following text to cite this item:
Diamantopoulos, Konstantinos, 2026, Automatically Annotated Corpora with Stanza and UDPipe for Czech, English, and Greek, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6120.
Authors
Diamantopoulos, Konstantinos
Item identifier
http://hdl.handle.net/11234/1-6120
Date issued
2026-03-04
Size
30,000,000 sentences
Language(s)
Czech, English, Greek
Description
This resource contains six automatically annotated corpora derived from the Leipzig Corpora Collection, covering three languages: Czech, English, and Greek. For each language, two corpora are provided: one annotated with Stanza and one annotated with UDPipe.
Acknowledgement
Czech Science Foundation
Project code: 26-21822S
Project name: Complexity of inflection and word-formation: An intra- and cross-linguistic perspective
MŠMT ČR
Project code: CZ.02.01.01/00/23_025/0008691
Project name: HumanAId PN
Subject(s)
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: corpora.tar.gz
- Size: 6.32 GB
- Format: application/x-gzip
- MD5: 4588f035baf7271cf4afb18f38ad9ecb
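Since the record publishes an MD5 checksum for the archive, a downloaded copy can be checked against it before unpacking. The following is a minimal sketch, assuming the file has been saved locally as corpora.tar.gz in the working directory; the helper names md5_of_file and verify_archive are hypothetical and not part of the repository. The file is read in chunks so that the 6.32 GB archive does not need to fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 value listed for corpora.tar.gz in the file table of this record.
EXPECTED_MD5 = "4588f035baf7271cf4afb18f38ad9ecb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    archive = Path("corpora.tar.gz")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.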


