What's New

corpus
Description:
Danish Fungi 2020 (DF20) is a fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish Fungal Atlas, is unique in its taxonomy-accurate class labels, small number of errors, ...
 This item contains 2 files (138.95 GB).
 
Publicly Available
corpus
Description:
Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains, including essays written by native students, informal website texts, essays written by Romani ethnic minority ...
 This item contains 1 file (15.08 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
corpus
Description:
Corpus of contemporary written (printed) Czech sized 4.7 GW (i.e. 5.7 billion tokens). It covers mostly the 1990-2019 period and features rich metadata including detailed bibliographical information, text-type classification ...
 This item contains 1 file (21.87 GB).
 
Academic Use Attribution Required Noncommercial

Most Viewed Items

Top Last Week
corpus
Author(s):
Zeman, Daniel ; Nivre, Joakim ; Abrams, Mitchell ; Ackermann, Elia ; Aepli, Noëmi ; Aghaei, Hamid ; Agić, Željko ; Ahmadi, Amir ; Ahrenberg, Lars ; Ajede, Chika Kennedy ; Aleksandravičiūtė, Gabrielė ; Alfina, Ika ; Antonsen, Lene ; Aplonova, Katya ; Aquino, Angelina ; Aragon, Carolina ; Aranzabe, Maria Jesus ; Arıcan, Bilge Nas ; Arnardóttir, Þórunn ; Arutie, Gashaw ; Arwidarasti, Jessica Naraiswari ; Asahara, Masayuki ; Aslan, Deniz Baran ; Ateyah, Luma ; Atmaca, Furkan ; Attia, Mohammed ; Atutxa, Aitziber ; Augustinus, Liesbeth ; Badmaeva, Elena ; Balasubramani, Keerthana ; Ballesteros, Miguel ; Banerjee, Esha ; Bank, Sebastian ; Barbu Mititelu, Verginica ; Barkarson, Starkaður ; Basile, Rodolfo ; Basmov, Victoria ; Batchelor, Colin ; Bauer, John ; Bedir, Seyyit Talha ; Bengoetxea, Kepa ; Berk, Gözde ; Berzak, Yevgeni ; Bhat, Irshad Ahmad ; Bhat, Riyaz Ahmad ; Biagetti, Erica ; Bick, Eckhard ; Bielinskienė, Agnė ; Bjarnadóttir, Kristín ; Blokland, Rogier ; Bobicev, Victoria ; Boizou, Loïc ; Borges Völker, Emanuel ; Börstell, Carl ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Boyd, Adriane ; Braggaar, Anouck ; Brokaitė, Kristina ; Burchardt, Aljoscha ; Candito, Marie ; Caron, Bernard ; Caron, Gauthier ; Cassidy, Lauren ; Cavalcanti, Tatiana ; Cebiroğlu Eryiğit, Gülşen ; Cecchini, Flavio Massimiliano ; Celano, Giuseppe G. A. ; Čéplö, Slavomír ; Cesur, Neslihan ; Cetin, Savas ; Çetinoğlu, Özlem ; Chalub, Fabricio ; Chauhan, Shweta ; Chi, Ethan ; Chika, Taishi ; Cho, Yongseok ; Choi, Jinho ; Chun, Jayeol ; Chung, Juyeon ; Cignarella, Alessandra T. 
; Cinková, Silvie ; Collomb, Aurélie ; Çöltekin, Çağrı ; Connor, Miriam ; Courtin, Marine ; Cristescu, Mihaela ; Daniel, Philemon ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Derin, Mehmet Oguz ; de Souza, Elvis ; Diaz de Ilarraza, Arantza ; Dickerson, Carly ; Dinakaramani, Arawinda ; Di Nuovo, Elisa ; Dione, Bamba ; Dirix, Peter ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eckhoff, Hanne ; Eiche, Sandra ; Eli, Marhaba ; Elkahky, Ali ; Ephrem, Binyam ; Erina, Olga ; Erjavec, Tomaž ; Etienne, Aline ; Evelyn, Wograine ; Facundes, Sidney ; Farkas, Richárd ; Ferdaousi, Jannatul ; Fernanda, Marília ; Fernandez Alcalde, Hector ; Foster, Jennifer ; Freitas, Cláudia ; Fujita, Kazunori ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Gärdenfors, Moa ; Garza, Sebastian ; Gerardi, Fabrício Ferraz ; Gerdes, Kim ; Ginter, Filip ; Godoy, Gustavo ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; González Saavedra, Berta ; Griciūtė, Bernadeta ; Grioni, Matias ; Grobol, Loïc ; Grūzītis, Normunds ; Guillaume, Bruno ; Guillot-Barbance, Céline ; Güngör, Tunga ; Habash, Nizar ; Hafsteinsson, Hinrik ; Hajič, Jan ; Hajič jr., Jan ; Hämäläinen, Mika ; Hà Mỹ, Linh ; Han, Na-Rae ; Hanifmuti, Muhammad Yudistira ; Hardwick, Sam ; Harris, Kim ; Haug, Dag ; Heinecke, Johannes ; Hellwig, Oliver ; Hennig, Felix ; Hladká, Barbora ; Hlaváčová, Jaroslava ; Hociung, Florinel ; Hohle, Petter ; Huber, Eva ; Hwang, Jena ; Ikeda, Takumi ; Ingason, Anton Karl ; Ion, Radu ; Irimia, Elena ; Ishola, Ọlájídé ; Ito, Kaoru ; Jannat, Siratun ; Jelínek, Tomáš ; Jha, Apoorva ; Johannsen, Anders ; Jónsdóttir, Hildur ; Jørgensen, Fredrik ; Juutinen, Markus ; K, Sarveswaran ; Kaşıkara, Hüner ; Kaasen, Andre ; Kabaeva, Nadezhda ; Kahane, Sylvain ; Kanayama, Hiroshi ; Kanerva, Jenna ; Kara, Neslihan ; Katz, Boris ; Kayadelen, Tolga ; Kenney, Jessica ; Kettnerová, Václava ; Kirchner, Jesse ; 
Klementieva, Elena ; Klyachko, Elena ; Köhn, Arne ; Köksal, Abdullatif ; Kopacewicz, Kamil ; Korkiakangas, Timo ; Köse, Mehmet ; Kotsyba, Natalia ; Kovalevskaitė, Jolanta ; Krek, Simon ; Krishnamurthy, Parameswari ; Kübler, Sandra ; Kuyrukçu, Oğuzhan ; Kuzgun, Aslı ; Kwak, Sookyoung ; Laippala, Veronika ; Lam, Lucia ; Lambertino, Lorenzo ; Lando, Tatiana ; Larasati, Septina Dian ; Lavrentiev, Alexei ; Lee, John ; Lê Hồng, Phương ; Lenci, Alessandro ; Lertpradit, Saran ; Leung, Herman ; Levina, Maria ; Li, Cheuk Ying ; Li, Josie ; Li, Keying ; Li, Yuan ; Lim, KyungTae ; Lima Padovani, Bruna ; Lindén, Krister ; Ljubešić, Nikola ; Loginova, Olga ; Lusito, Stefano ; Luthfi, Andry ; Luukko, Mikko ; Lyashevskaya, Olga ; Lynn, Teresa ; Macketanz, Vivien ; Mahamdi, Menel ; Maillard, Jean ; Makazhanov, Aibek ; Mandl, Michael ; Manning, Christopher ; Manurung, Ruli ; Marşan, Büşra ; Mărănduc, Cătălina ; Mareček, David ; Marheinecke, Katrin ; Martínez Alonso, Héctor ; Martín-Rodríguez, Lorena ; Martins, André ; Mašek, Jan ; Matsuda, Hiroshi ; Matsumoto, Yuji ; Mazzei, Alessandro ; McDonald, Ryan ; McGuinness, Sarah ; Mendonça, Gustavo ; Merzhevich, Tatiana ; Miekka, Niko ; Mischenkova, Karina ; Misirpashayeva, Margarita ; Missilä, Anna ; Mititelu, Cătălin ; Mitrofan, Maria ; Miyao, Yusuke ; Mojiri Foroushani, AmirHossein ; Molnár, Judit ; Moloodi, Amirsaeid ; Montemagni, Simonetta ; More, Amir ; Moreno Romero, Laura ; Moretti, Giovanni ; Mori, Keiko Sophie ; Mori, Shinsuke ; Morioka, Tomohiko ; Moro, Shigeki ; Mortensen, Bjartur ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Munro, Robert ; Murawaki, Yugo ; Müürisep, Kaili ; Nainwani, Pinkey ; Nakhlé, Mariam ; Navarro Horñiacek, Juan Ignacio ; Nedoluzhko, Anna ; Nešpore-Bērzkalne, Gunta ; Nevaci, Manuela ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikaido, Yoshihiro ; Nikolaev, Vitaly ; Nitisaroj, Rattima ; Nourian, Alireza ; Nurmi, Hanna ; Ojala, Stina ; Ojha, Atul Kr. 
; Olúòkun, Adédayọ̀ ; Omura, Mai ; Onwuegbuzia, Emeka ; Osenova, Petya ; Östling, Robert ; Øvrelid, Lilja ; Özateş, Şaziye Betül ; Özçelik, Merve ; Özgür, Arzucan ; Öztürk Başaran, Balkız ; Park, Hyunji Hayley ; Partanen, Niko ; Pascual, Elena ; Passarotti, Marco ; Patejuk, Agnieszka ; Paulino-Passos, Guilherme ; Peljak-Łapińska, Angelika ; Peng, Siyao ; Perez, Cenel-Augusto ; Perkova, Natalia ; Perrier, Guy ; Petrov, Slav ; Petrova, Daria ; Phelan, Jason ; Piitulainen, Jussi ; Pirinen, Tommi A ; Pitler, Emily ; Plank, Barbara ; Poibeau, Thierry ; Ponomareva, Larisa ; Popel, Martin ; Pretkalniņa, Lauma ; Prévost, Sophie ; Prokopidis, Prokopis ; Przepiórkowski, Adam ; Puolakainen, Tiina ; Pyysalo, Sampo ; Qi, Peng ; Rääbis, Andriela ; Rademaker, Alexandre ; Rahoman, Mizanur ; Rama, Taraka ; Ramasamy, Loganathan ; Ramisch, Carlos ; Rashel, Fam ; Rasooli, Mohammad Sadegh ; Ravishankar, Vinit ; Real, Livy ; Rebeja, Petru ; Reddy, Siva ; Regnault, Mathilde ; Rehm, Georg ; Riabov, Ivan ; Rießler, Michael ; Rimkutė, Erika ; Rinaldi, Larissa ; Rituma, Laura ; Rizqiyah, Putri ; Rocha, Luisa ; Rögnvaldsson, Eiríkur ; Romanenko, Mykhailo ; Rosa, Rudolf ; Roșca, Valentin ; Rovati, Davide ; Rudina, Olga ; Rueter, Jack ; Rúnarsson, Kristján ; Sadde, Shoval ; Safari, Pegah ; Sagot, Benoît ; Sahala, Aleksi ; Saleh, Shadi ; Salomoni, Alessio ; Samardžić, Tanja ; Samson, Stephanie ; Sanguinetti, Manuela ; Sanıyar, Ezgi ; Särg, Dage ; Saulīte, Baiba ; Sawanakunanon, Yanin ; Saxena, Shefali ; Scannell, Kevin ; Scarlata, Salvatore ; Schneider, Nathan ; Schuster, Sebastian ; Schwartz, Lane ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shahzadi, Syeda ; Shen, Mo ; Shimada, Atsuko ; Shirasu, Hiroyuki ; Shishkina, Yana ; Shohibussirri, Muh ; Sichinava, Dmitry ; Siewert, Janine ; Sigurðsson, Einar Freyr ; Silveira, Aline ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Skachedubova, Maria ; Smith, Aaron ; Soares-Bastos, Isabela 
; Sourov, Shafi ; Spadine, Carolyn ; Sprugnoli, Rachele ; Steingrímsson, Steinþór ; Stella, Antonio ; Straka, Milan ; Strickland, Emmett ; Strnadová, Jana ; Suhr, Alane ; Sulestio, Yogi Lesmana ; Sulubacak, Umut ; Suzuki, Shingo ; Szántó, Zsolt ; Taguchi, Chihiro ; Taji, Dima ; Takahashi, Yuta ; Tamburini, Fabio ; Tan, Mary Ann C. ; Tanaka, Takaaki ; Tanaya, Dipta ; Tella, Samson ; Tellier, Isabelle ; Testori, Marinella ; Thomas, Guillaume ; Torga, Liisi ; Toska, Marsida ; Trosterud, Trond ; Trukhina, Anna ; Tsarfaty, Reut ; Türk, Utku ; Tyers, Francis ; Uematsu, Sumire ; Untilov, Roman ; Urešová, Zdeňka ; Uria, Larraitz ; Uszkoreit, Hans ; Utka, Andrius ; Vajjala, Sowmya ; van der Goot, Rob ; Vanhove, Martine ; van Niekerk, Daniel ; van Noord, Gertjan ; Varga, Viktor ; Villemonte de la Clergerie, Eric ; Vincze, Veronika ; Vlasova, Natalia ; Wakasa, Aya ; Wallenberg, Joel C. ; Wallin, Lars ; Walsh, Abigail ; Wang, Jing Xian ; Washington, Jonathan North ; Wendt, Maximilan ; Widmer, Paul ; Wijono, Sri Hartati ; Williams, Seyi ; Wirén, Mats ; Wittern, Christian ; Woldemariam, Tsegay ; Wong, Tak-sum ; Wróblewska, Alina ; Yako, Mary ; Yamashita, Kayo ; Yamazaki, Naoki ; Yan, Chunxiao ; Yasuoka, Koichi ; Yavrumyan, Marat M. ; Yenice, Arife Betül ; Yıldız, Olcay Taner ; Yu, Zhuoran ; Yuliawati, Arlisa ; Žabokrtský, Zdeněk ; Zahra, Shorouq ; Zeldes, Amir ; Zhou, He ; Zhu, Hanzhi ; Zhuravleva, Anna ; Ziane, Rayan
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 3 files (534.14 MB).
 
Publicly Available GNU General Public License, version 3.0 Distributed under Creative Commons
corpus
Description:
Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains, including essays written by native students, informal website texts, essays written by Romani ethnic minority ...
 This item contains 1 file (15.08 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
toolService
Description:
Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation ...
 This item contains 96 files (2.61 GB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial Share Alike