What's New

corpus
Description:
This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013–2015. Compared to the original version, it provides complete query translations into Czech, ...
 This item contains 2 files (6.31 GB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial
corpus
Description:
The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual ...
 This item contains 5 files (1.99 MB).
 
Publicly Available; GNU General Public License, version 3.0; Distributed under Creative Commons
corpus
Description:
Indonesian web corpus crawled in 2010. Encoded in UTF-8, cleaned, deduplicated, tagged by Morphind.
 This item contains 1 file (207.88 MB).
 
Academic Use

Most Viewed Items

Top Last Week
corpus
Description:
Human post-edited test sentences for the WMT 2017 Automatic post-editing task. The set consists of 2,000 German sentences from the IT domain, already tokenized. Source and target segments can be downloaded from: ...
 This item contains 1 file (223.23 KB).
 
Publicly Available
corpus
Author(s):
Nivre, Joakim ; Abrams, Mitchell ; Agić, Željko ; Ahrenberg, Lars ; Antonsen, Lene ; Aplonova, Katya ; Aranzabe, Maria Jesus ; Arutie, Gashaw ; Asahara, Masayuki ; Ateyah, Luma ; Attia, Mohammed ; Atutxa, Aitziber ; Augustinus, Liesbeth ; Badmaeva, Elena ; Ballesteros, Miguel ; Banerjee, Esha ; Bank, Sebastian ; Barbu Mititelu, Verginica ; Basmov, Victoria ; Bauer, John ; Bellato, Sandra ; Bengoetxea, Kepa ; Berzak, Yevgeni ; Bhat, Irshad Ahmad ; Bhat, Riyaz Ahmad ; Biagetti, Erica ; Bick, Eckhard ; Blokland, Rogier ; Bobicev, Victoria ; Börstell, Carl ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Boyd, Adriane ; Burchardt, Aljoscha ; Candito, Marie ; Caron, Bernard ; Caron, Gauthier ; Cebiroğlu Eryiğit, Gülşen ; Cecchini, Flavio Massimiliano ; Celano, Giuseppe G. A. ; Čéplö, Slavomír ; Cetin, Savas ; Chalub, Fabricio ; Choi, Jinho ; Cho, Yongseok ; Chun, Jayeol ; Cinková, Silvie ; Collomb, Aurélie ; Çöltekin, Çağrı ; Connor, Miriam ; Courtin, Marine ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Diaz de Ilarraza, Arantza ; Dickerson, Carly ; Dirix, Peter ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eli, Marhaba ; Elkahky, Ali ; Ephrem, Binyam ; Erjavec, Tomaž ; Etienne, Aline ; Farkas, Richárd ; Fernandez Alcalde, Hector ; Foster, Jennifer ; Freitas, Cláudia ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Gärdenfors, Moa ; Garza, Sebastian ; Gerdes, Kim ; Ginter, Filip ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; Gonzáles Saavedra, Berta ; Grioni, Matias ; Grūzītis, Normunds ; Guillaume, Bruno ; Guillot-Barbance, Céline ; Habash, Nizar ; Hajič, Jan ; Hajič jr., Jan ; Hà Mỹ, Linh ; Han, Na-Rae ; Harris, Kim ; Haug, Dag ; Hladká, Barbora ; Hlaváčová, Jaroslava ; Hociung, Florinel ; Hohle, Petter ; Hwang, Jena ; Ion, Radu ; Irimia, Elena ; Ishola, Ọlájídé ; Jelínek, Tomáš ; Johannsen, Anders ; Jørgensen, Fredrik ; Kaşıkara, Hüner ; Kahane, 
Sylvain ; Kanayama, Hiroshi ; Kanerva, Jenna ; Katz, Boris ; Kayadelen, Tolga ; Kenney, Jessica ; Kettnerová, Václava ; Kirchner, Jesse ; Kopacewicz, Kamil ; Kotsyba, Natalia ; Krek, Simon ; Kwak, Sookyoung ; Laippala, Veronika ; Lambertino, Lorenzo ; Lam, Lucia ; Lando, Tatiana ; Larasati, Septina Dian ; Lavrentiev, Alexei ; Lee, John ; Lê Hồng, Phương ; Lenci, Alessandro ; Lertpradit, Saran ; Leung, Herman ; Li, Cheuk Ying ; Li, Josie ; Li, Keying ; Lim, KyungTae ; Ljubešić, Nikola ; Loginova, Olga ; Lyashevskaya, Olga ; Lynn, Teresa ; Macketanz, Vivien ; Makazhanov, Aibek ; Mandl, Michael ; Manning, Christopher ; Manurung, Ruli ; Mărănduc, Cătălina ; Mareček, David ; Marheinecke, Katrin ; Martínez Alonso, Héctor ; Martins, André ; Mašek, Jan ; Matsumoto, Yuji ; McDonald, Ryan ; Mendonça, Gustavo ; Miekka, Niko ; Misirpashayeva, Margarita ; Missilä, Anna ; Mititelu, Cătălin ; Miyao, Yusuke ; Montemagni, Simonetta ; More, Amir ; Moreno Romero, Laura ; Mori, Keiko Sophie ; Mori, Shinsuke ; Mortensen, Bjartur ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Murawaki, Yugo ; Müürisep, Kaili ; Nainwani, Pinkey ; Navarro Horñiacek, Juan Ignacio ; Nedoluzhko, Anna ; Nešpore-Bērzkalne, Gunta ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikolaev, Vitaly ; Nitisaroj, Rattima ; Nurmi, Hanna ; Ojala, Stina ; Olúòkun, Adédayọ̀ ; Omura, Mai ; Osenova, Petya ; Östling, Robert ; Øvrelid, Lilja ; Partanen, Niko ; Pascual, Elena ; Passarotti, Marco ; Patejuk, Agnieszka ; Paulino-Passos, Guilherme ; Peng, Siyao ; Perez, Cenel-Augusto ; Perrier, Guy ; Petrov, Slav ; Piitulainen, Jussi ; Pitler, Emily ; Plank, Barbara ; Poibeau, Thierry ; Popel, Martin ; Pretkalniņa, Lauma ; Prévost, Sophie ; Prokopidis, Prokopis ; Przepiórkowski, Adam ; Puolakainen, Tiina ; Pyysalo, Sampo ; Rääbis, Andriela ; Rademaker, Alexandre ; Ramasamy, Loganathan ; Rama, Taraka ; Ramisch, Carlos ; Ravishankar, Vinit ; Real, Livy ; Reddy, Siva ; Rehm, Georg ; Rießler, Michael ; Rinaldi, Larissa ; Rituma, Laura ; 
Rocha, Luisa ; Romanenko, Mykhailo ; Rosa, Rudolf ; Rovati, Davide ; Roșca, Valentin ; Rudina, Olga ; Rueter, Jack ; Sadde, Shoval ; Sagot, Benoît ; Saleh, Shadi ; Samardžić, Tanja ; Samson, Stephanie ; Sanguinetti, Manuela ; Saulīte, Baiba ; Sawanakunanon, Yanin ; Schneider, Nathan ; Schuster, Sebastian ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shen, Mo ; Shimada, Atsuko ; Shohibussirri, Muh ; Sichinava, Dmitry ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Smith, Aaron ; Soares-Bastos, Isabela ; Spadine, Carolyn ; Stella, Antonio ; Straka, Milan ; Strnadová, Jana ; Suhr, Alane ; Sulubacak, Umut ; Szántó, Zsolt ; Taji, Dima ; Takahashi, Yuta ; Tanaka, Takaaki ; Tellier, Isabelle ; Trosterud, Trond ; Trukhina, Anna ; Tsarfaty, Reut ; Tyers, Francis ; Uematsu, Sumire ; Urešová, Zdeňka ; Uria, Larraitz ; Uszkoreit, Hans ; Vajjala, Sowmya ; van Niekerk, Daniel ; van Noord, Gertjan ; Varga, Viktor ; Villemonte de la Clergerie, Eric ; Vincze, Veronika ; Wallin, Lars ; Wang, Jing Xian ; Washington, Jonathan North ; Williams, Seyi ; Wirén, Mats ; Woldemariam, Tsegay ; Wong, Tak-sum ; Yan, Chunxiao ; Yavrumyan, Marat M. ; Yu, Zhuoran ; Žabokrtský, Zdeněk ; Zeldes, Amir ; Zeman, Daniel ; Zhang, Manying ; Zhu, Hanzhi
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 3 files (335.18 MB).
 
Publicly Available; GNU General Public License, version 3.0; Distributed under Creative Commons
languageDescription
Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
 This item contains 47 files (629.67 GB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike