What's New

corpus
Description:
Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech National Corpus. The corpus includes fiction (ca ...
 This item contains 1 file (17.22 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.
languageDescription
Description:
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized ...
 This item contains 2 files (1.01 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.
corpus
Description:
The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are also available in text form. The dataset comprises two ...
 This item contains 1 file (77.18 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.

Most Viewed Items

Top Last Week
corpus
Author(s):
Zeman, Daniel ; Nivre, Joakim ; Abrams, Mitchell ; Ackermann, Elia ; Aepli, Noëmi ; Aghaei, Hamid ; Agić, Željko ; Ahmadi, Amir ; Ahrenberg, Lars ; Ajede, Chika Kennedy ; Aleksandravičiūtė, Gabrielė ; Alfina, Ika ; Antonsen, Lene ; Aplonova, Katya ; Aquino, Angelina ; Aragon, Carolina ; Aranzabe, Maria Jesus ; Arıcan, Bilge Nas ; Arnardóttir, Þórunn ; Arutie, Gashaw ; Arwidarasti, Jessica Naraiswari ; Asahara, Masayuki ; Aslan, Deniz Baran ; Ateyah, Luma ; Atmaca, Furkan ; Attia, Mohammed ; Atutxa, Aitziber ; Augustinus, Liesbeth ; Badmaeva, Elena ; Balasubramani, Keerthana ; Ballesteros, Miguel ; Banerjee, Esha ; Bank, Sebastian ; Barbu Mititelu, Verginica ; Barkarson, Starkaður ; Basmov, Victoria ; Batchelor, Colin ; Bauer, John ; Bedir, Seyyit Talha ; Bengoetxea, Kepa ; Berk, Gözde ; Berzak, Yevgeni ; Bhat, Irshad Ahmad ; Bhat, Riyaz Ahmad ; Biagetti, Erica ; Bick, Eckhard ; Bielinskienė, Agnė ; Bjarnadóttir, Kristín ; Blokland, Rogier ; Bobicev, Victoria ; Boizou, Loïc ; Borges Völker, Emanuel ; Börstell, Carl ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Boyd, Adriane ; Braggaar, Anouck ; Brokaitė, Kristina ; Burchardt, Aljoscha ; Candito, Marie ; Caron, Bernard ; Caron, Gauthier ; Cassidy, Lauren ; Cavalcanti, Tatiana ; Cebiroğlu Eryiğit, Gülşen ; Cecchini, Flavio Massimiliano ; Celano, Giuseppe G. A. ; Čéplö, Slavomír ; Cesur, Neslihan ; Cetin, Savas ; Çetinoğlu, Özlem ; Chalub, Fabricio ; Chauhan, Shweta ; Chi, Ethan ; Chika, Taishi ; Cho, Yongseok ; Choi, Jinho ; Chun, Jayeol ; Cignarella, Alessandra T. ; Cinková, Silvie ; Collomb, Aurélie ; Çöltekin, Çağrı ; Connor, Miriam ; Courtin, Marine ; Cristescu, Mihaela ; Daniel, Philemon. ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Derin, Mehmet Oguz ; de Souza, Elvis ; Diaz de Ilarraza, Arantza ; Dickerson, Carly ; Dinakaramani, Arawinda ; Di Nuovo, Elisa ; Dione, Bamba ; Dirix, Peter ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eckhoff, Hanne ; Eiche, Sandra ; Eli, Marhaba ; Elkahky, Ali ; Ephrem, Binyam ; Erina, Olga ; Erjavec, Tomaž ; Etienne, Aline ; Evelyn, Wograine ; Facundes, Sidney ; Farkas, Richárd ; Fernanda, Marília ; Fernandez Alcalde, Hector ; Foster, Jennifer ; Freitas, Cláudia ; Fujita, Kazunori ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Gärdenfors, Moa ; Garza, Sebastian ; Gerardi, Fabrício Ferraz ; Gerdes, Kim ; Ginter, Filip ; Godoy, Gustavo ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; González Saavedra, Berta ; Griciūtė, Bernadeta ; Grioni, Matias ; Grobol, Loïc ; Grūzītis, Normunds ; Guillaume, Bruno ; Guillot-Barbance, Céline ; Güngör, Tunga ; Habash, Nizar ; Hafsteinsson, Hinrik ; Hajič, Jan ; Hajič jr., Jan ; Hämäläinen, Mika ; Hà Mỹ, Linh ; Han, Na-Rae ; Hanifmuti, Muhammad Yudistira ; Hardwick, Sam ; Harris, Kim ; Haug, Dag ; Heinecke, Johannes ; Hellwig, Oliver ; Hennig, Felix ; Hladká, Barbora ; Hlaváčová, Jaroslava ; Hociung, Florinel ; Hohle, Petter ; Huber, Eva ; Hwang, Jena ; Ikeda, Takumi ; Ingason, Anton Karl ; Ion, Radu ; Irimia, Elena ; Ishola, Ọlájídé ; Ito, Kaoru ; Jelínek, Tomáš ; Jha, Apoorva ; Johannsen, Anders ; Jónsdóttir, Hildur ; Jørgensen, Fredrik ; Juutinen, Markus ; K, Sarveswaran ; Kaşıkara, Hüner ; Kaasen, Andre ; Kabaeva, Nadezhda ; Kahane, Sylvain ; Kanayama, Hiroshi ; Kanerva, Jenna ; Kara, Neslihan ; Katz, Boris ; Kayadelen, Tolga ; Kenney, Jessica ; Kettnerová, Václava ; Kirchner, Jesse ; Klementieva, Elena ; Köhn, Arne ; Köksal, Abdullatif ; Kopacewicz, Kamil ; Korkiakangas, Timo ; Kotsyba, Natalia ; Kovalevskaitė, Jolanta ; Krek, Simon ; Krishnamurthy, Parameswari ; Kuyrukçu, Oğuzhan ; Kuzgun, Aslı ; Kwak, Sookyoung ; Laippala, Veronika ; Lam, Lucia ; Lambertino, Lorenzo ; Lando, Tatiana ; Larasati, Septina Dian ; Lavrentiev, Alexei ; Lee, John ; Lê Hồng, Phương ; Lenci, Alessandro ; Lertpradit, Saran ; Leung, Herman ; Levina, Maria ; Li, Cheuk Ying ; Li, Josie ; Li, Keying ; Li, Yuan ; Lim, KyungTae ; Lima Padovani, Bruna ; Lindén, Krister ; Ljubešić, Nikola ; Loginova, Olga ; Luthfi, Andry ; Luukko, Mikko ; Lyashevskaya, Olga ; Lynn, Teresa ; Macketanz, Vivien ; Makazhanov, Aibek ; Mandl, Michael ; Manning, Christopher ; Manurung, Ruli ; Marşan, Büşra ; Mărănduc, Cătălina ; Mareček, David ; Marheinecke, Katrin ; Martínez Alonso, Héctor ; Martins, André ; Mašek, Jan ; Matsuda, Hiroshi ; Matsumoto, Yuji ; Mazzei, Alessandro ; McDonald, Ryan ; McGuinness, Sarah ; Mendonça, Gustavo ; Miekka, Niko ; Mischenkova, Karina ; Misirpashayeva, Margarita ; Missilä, Anna ; Mititelu, Cătălin ; Mitrofan, Maria ; Miyao, Yusuke ; Mojiri Foroushani, AmirHossein ; Molnár, Judit ; Moloodi, Amirsaeid ; Montemagni, Simonetta ; More, Amir ; Moreno Romero, Laura ; Moretti, Giovanni ; Mori, Keiko Sophie ; Mori, Shinsuke ; Morioka, Tomohiko ; Moro, Shigeki ; Mortensen, Bjartur ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Munro, Robert ; Murawaki, Yugo ; Müürisep, Kaili ; Nainwani, Pinkey ; Nakhlé, Mariam ; Navarro Horñiacek, Juan Ignacio ; Nedoluzhko, Anna ; Nešpore-Bērzkalne, Gunta ; Nevaci, Manuela ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikaido, Yoshihiro ; Nikolaev, Vitaly ; Nitisaroj, Rattima ; Nourian, Alireza ; Nurmi, Hanna ; Ojala, Stina ; Ojha, Atul Kr. ; Olúòkun, Adédayọ̀ ; Omura, Mai ; Onwuegbuzia, Emeka ; Osenova, Petya ; Östling, Robert ; Øvrelid, Lilja ; Özateş, Şaziye Betül ; Özçelik, Merve ; Özgür, Arzucan ; Öztürk Başaran, Balkız ; Park, Hyunji Hayley ; Partanen, Niko ; Pascual, Elena ; Passarotti, Marco ; Patejuk, Agnieszka ; Paulino-Passos, Guilherme ; Peljak-Łapińska, Angelika ; Peng, Siyao ; Perez, Cenel-Augusto ; Perkova, Natalia ; Perrier, Guy ; Petrov, Slav ; Petrova, Daria ; Phelan, Jason ; Piitulainen, Jussi ; Pirinen, Tommi A ; Pitler, Emily ; Plank, Barbara ; Poibeau, Thierry ; Ponomareva, Larisa ; Popel, Martin ; Pretkalniņa, Lauma ; Prévost, Sophie ; Prokopidis, Prokopis ; Przepiórkowski, Adam ; Puolakainen, Tiina ; Pyysalo, Sampo ; Qi, Peng ; Rääbis, Andriela ; Rademaker, Alexandre ; Rama, Taraka ; Ramasamy, Loganathan ; Ramisch, Carlos ; Rashel, Fam ; Rasooli, Mohammad Sadegh ; Ravishankar, Vinit ; Real, Livy ; Rebeja, Petru ; Reddy, Siva ; Rehm, Georg ; Riabov, Ivan ; Rießler, Michael ; Rimkutė, Erika ; Rinaldi, Larissa ; Rituma, Laura ; Rocha, Luisa ; Rögnvaldsson, Eiríkur ; Romanenko, Mykhailo ; Rosa, Rudolf ; Roșca, Valentin ; Rovati, Davide ; Rudina, Olga ; Rueter, Jack ; Rúnarsson, Kristján ; Sadde, Shoval ; Safari, Pegah ; Sagot, Benoît ; Sahala, Aleksi ; Saleh, Shadi ; Salomoni, Alessio ; Samardžić, Tanja ; Samson, Stephanie ; Sanguinetti, Manuela ; Sanıyar, Ezgi ; Särg, Dage ; Saulīte, Baiba ; Sawanakunanon, Yanin ; Saxena, Shefali ; Scannell, Kevin ; Scarlata, Salvatore ; Schneider, Nathan ; Schuster, Sebastian ; Schwartz, Lane ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shen, Mo ; Shimada, Atsuko ; Shirasu, Hiroyuki ; Shishkina, Yana ; Shohibussirri, Muh ; Sichinava, Dmitry ; Siewert, Janine ; Sigurðsson, Einar Freyr ; Silveira, Aline ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Skachedubova, Maria ; Smith, Aaron ; Soares-Bastos, Isabela ; Spadine, Carolyn ; Sprugnoli, Rachele ; Steingrímsson, Steinþór ; Stella, Antonio ; Straka, Milan ; Strickland, Emmett ; Strnadová, Jana ; Suhr, Alane ; Sulestio, Yogi Lesmana ; Sulubacak, Umut ; Suzuki, Shingo ; Szántó, Zsolt ; Taji, Dima ; Takahashi, Yuta ; Tamburini, Fabio ; Tan, Mary Ann C. ; Tanaka, Takaaki ; Tella, Samson ; Tellier, Isabelle ; Testori, Marinella ; Thomas, Guillaume ; Torga, Liisi ; Toska, Marsida ; Trosterud, Trond ; Trukhina, Anna ; Tsarfaty, Reut ; Türk, Utku ; Tyers, Francis ; Uematsu, Sumire ; Untilov, Roman ; Urešová, Zdeňka ; Uria, Larraitz ; Uszkoreit, Hans ; Utka, Andrius ; Vajjala, Sowmya ; van der Goot, Rob ; Vanhove, Martine ; van Niekerk, Daniel ; van Noord, Gertjan ; Varga, Viktor ; Villemonte de la Clergerie, Eric ; Vincze, Veronika ; Vlasova, Natalia ; Wakasa, Aya ; Wallenberg, Joel C. ; Wallin, Lars ; Walsh, Abigail ; Wang, Jing Xian ; Washington, Jonathan North ; Wendt, Maximilan ; Widmer, Paul ; Williams, Seyi ; Wirén, Mats ; Wittern, Christian ; Woldemariam, Tsegay ; Wong, Tak-sum ; Wróblewska, Alina ; Yako, Mary ; Yamashita, Kayo ; Yamazaki, Naoki ; Yan, Chunxiao ; Yasuoka, Koichi ; Yavrumyan, Marat M. ; Yenice, Arife Betül ; Yıldız, Olcay Taner ; Yu, Zhuoran ; Žabokrtský, Zdeněk ; Zahra, Shorouq ; Zeldes, Amir ; Zhu, Hanzhi ; Zhuravleva, Anna ; Ziane, Rayan
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 3 files (501.12 MB).
 
Publicly Available. GNU General Public License, version 3.0. Distributed under Creative Commons.
languageDescription
Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
 This item contains 47 files (629.67 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.
corpus
Description:
HindEnCorp parallel texts (sentence-aligned) come from the following sources: Tides, which contains 50K sentence pairs taken mainly from news articles. This dataset was originally collected for the DARPA-TIDES ...
 This item contains 3 files (66.13 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.