What's New

corpus
Author(s):
Description:
This is an XML dataset of 17 lecture recordings randomly sampled from the lectures recorded at the Faculty of Informatics, Brno, Czechia during 2010–2016. We drew a stratified sample of up to 25 video frames from each ...
 This item contains 1 file (528.95 MB).
 
Publicly Available
corpus
Author(s):
Description:
OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from scientific articles. Texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other ...
 This item contains 2 files (1.01 GB).
 
Publicly Available Distributed under Creative Commons Attribution Required
corpus
Description:
Syntactic annotation of 1600 sentences from the Czesl-MAN corpus using the framework of Universal Dependencies 2.3
 This item contains 1 file (905.31 KB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike

Most Viewed Items

Top Last Week
corpus
Author(s):
Nivre, Joakim ; Abrams, Mitchell ; Agić, Željko ; Ahrenberg, Lars ; Antonsen, Lene ; Aplonova, Katya ; Aranzabe, Maria Jesus ; Arutie, Gashaw ; Asahara, Masayuki ; Ateyah, Luma ; Attia, Mohammed ; Atutxa, Aitziber ; Augustinus, Liesbeth ; Badmaeva, Elena ; Ballesteros, Miguel ; Banerjee, Esha ; Bank, Sebastian ; Barbu Mititelu, Verginica ; Basmov, Victoria ; Bauer, John ; Bellato, Sandra ; Bengoetxea, Kepa ; Berzak, Yevgeni ; Bhat, Irshad Ahmad ; Bhat, Riyaz Ahmad ; Biagetti, Erica ; Bick, Eckhard ; Blokland, Rogier ; Bobicev, Victoria ; Börstell, Carl ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Boyd, Adriane ; Burchardt, Aljoscha ; Candito, Marie ; Caron, Bernard ; Caron, Gauthier ; Cebiroğlu Eryiğit, Gülşen ; Cecchini, Flavio Massimiliano ; Celano, Giuseppe G. A. ; Čéplö, Slavomír ; Cetin, Savas ; Chalub, Fabricio ; Choi, Jinho ; Cho, Yongseok ; Chun, Jayeol ; Cinková, Silvie ; Collomb, Aurélie ; Çöltekin, Çağrı ; Connor, Miriam ; Courtin, Marine ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Diaz de Ilarraza, Arantza ; Dickerson, Carly ; Dirix, Peter ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eli, Marhaba ; Elkahky, Ali ; Ephrem, Binyam ; Erjavec, Tomaž ; Etienne, Aline ; Farkas, Richárd ; Fernandez Alcalde, Hector ; Foster, Jennifer ; Freitas, Cláudia ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Gärdenfors, Moa ; Garza, Sebastian ; Gerdes, Kim ; Ginter, Filip ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; Gonzáles Saavedra, Berta ; Grioni, Matias ; Grūzītis, Normunds ; Guillaume, Bruno ; Guillot-Barbance, Céline ; Habash, Nizar ; Hajič, Jan ; Hajič jr., Jan ; Hà Mỹ, Linh ; Han, Na-Rae ; Harris, Kim ; Haug, Dag ; Hladká, Barbora ; Hlaváčová, Jaroslava ; Hociung, Florinel ; Hohle, Petter ; Hwang, Jena ; Ion, Radu ; Irimia, Elena ; Ishola, Ọlájídé ; Jelínek, Tomáš ; Johannsen, Anders ; Jørgensen, Fredrik ; Kaşıkara, Hüner ; Kahane, 
Sylvain ; Kanayama, Hiroshi ; Kanerva, Jenna ; Katz, Boris ; Kayadelen, Tolga ; Kenney, Jessica ; Kettnerová, Václava ; Kirchner, Jesse ; Kopacewicz, Kamil ; Kotsyba, Natalia ; Krek, Simon ; Kwak, Sookyoung ; Laippala, Veronika ; Lambertino, Lorenzo ; Lam, Lucia ; Lando, Tatiana ; Larasati, Septina Dian ; Lavrentiev, Alexei ; Lee, John ; Lê Hồng, Phương ; Lenci, Alessandro ; Lertpradit, Saran ; Leung, Herman ; Li, Cheuk Ying ; Li, Josie ; Li, Keying ; Lim, KyungTae ; Ljubešić, Nikola ; Loginova, Olga ; Lyashevskaya, Olga ; Lynn, Teresa ; Macketanz, Vivien ; Makazhanov, Aibek ; Mandl, Michael ; Manning, Christopher ; Manurung, Ruli ; Mărănduc, Cătălina ; Mareček, David ; Marheinecke, Katrin ; Martínez Alonso, Héctor ; Martins, André ; Mašek, Jan ; Matsumoto, Yuji ; McDonald, Ryan ; Mendonça, Gustavo ; Miekka, Niko ; Misirpashayeva, Margarita ; Missilä, Anna ; Mititelu, Cătălin ; Miyao, Yusuke ; Montemagni, Simonetta ; More, Amir ; Moreno Romero, Laura ; Mori, Keiko Sophie ; Mori, Shinsuke ; Mortensen, Bjartur ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Murawaki, Yugo ; Müürisep, Kaili ; Nainwani, Pinkey ; Navarro Horñiacek, Juan Ignacio ; Nedoluzhko, Anna ; Nešpore-Bērzkalne, Gunta ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikolaev, Vitaly ; Nitisaroj, Rattima ; Nurmi, Hanna ; Ojala, Stina ; Olúòkun, Adédayọ̀ ; Omura, Mai ; Osenova, Petya ; Östling, Robert ; Øvrelid, Lilja ; Partanen, Niko ; Pascual, Elena ; Passarotti, Marco ; Patejuk, Agnieszka ; Paulino-Passos, Guilherme ; Peng, Siyao ; Perez, Cenel-Augusto ; Perrier, Guy ; Petrov, Slav ; Piitulainen, Jussi ; Pitler, Emily ; Plank, Barbara ; Poibeau, Thierry ; Popel, Martin ; Pretkalniņa, Lauma ; Prévost, Sophie ; Prokopidis, Prokopis ; Przepiórkowski, Adam ; Puolakainen, Tiina ; Pyysalo, Sampo ; Rääbis, Andriela ; Rademaker, Alexandre ; Ramasamy, Loganathan ; Rama, Taraka ; Ramisch, Carlos ; Ravishankar, Vinit ; Real, Livy ; Reddy, Siva ; Rehm, Georg ; Rießler, Michael ; Rinaldi, Larissa ; Rituma, Laura ; 
Rocha, Luisa ; Romanenko, Mykhailo ; Rosa, Rudolf ; Rovati, Davide ; Roșca, Valentin ; Rudina, Olga ; Rueter, Jack ; Sadde, Shoval ; Sagot, Benoît ; Saleh, Shadi ; Samardžić, Tanja ; Samson, Stephanie ; Sanguinetti, Manuela ; Saulīte, Baiba ; Sawanakunanon, Yanin ; Schneider, Nathan ; Schuster, Sebastian ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shen, Mo ; Shimada, Atsuko ; Shohibussirri, Muh ; Sichinava, Dmitry ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Smith, Aaron ; Soares-Bastos, Isabela ; Spadine, Carolyn ; Stella, Antonio ; Straka, Milan ; Strnadová, Jana ; Suhr, Alane ; Sulubacak, Umut ; Szántó, Zsolt ; Taji, Dima ; Takahashi, Yuta ; Tanaka, Takaaki ; Tellier, Isabelle ; Trosterud, Trond ; Trukhina, Anna ; Tsarfaty, Reut ; Tyers, Francis ; Uematsu, Sumire ; Urešová, Zdeňka ; Uria, Larraitz ; Uszkoreit, Hans ; Vajjala, Sowmya ; van Niekerk, Daniel ; van Noord, Gertjan ; Varga, Viktor ; Villemonte de la Clergerie, Eric ; Vincze, Veronika ; Wallin, Lars ; Wang, Jing Xian ; Washington, Jonathan North ; Williams, Seyi ; Wirén, Mats ; Woldemariam, Tsegay ; Wong, Tak-sum ; Yan, Chunxiao ; Yavrumyan, Marat M. ; Yu, Zhuoran ; Žabokrtský, Zdeněk ; Zeldes, Amir ; Zeman, Daniel ; Zhang, Manying ; Zhu, Hanzhi
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 3 files (335.18 MB).
 
Publicly Available GNU General Public License, version 3.0 Distributed under Creative Commons
corpus
Description:
Training data for the WMT 2017 Automatic Post-Editing task (the same data used for the Sentence-level Quality Estimation task). It consists of 11,000 English-German triplets (source, target and post-edit) belonging to the IT ...
 This item contains 1 file (1.05 MB).
 
Publicly Available
corpus
Description:
HindEnCorp parallel texts (sentence-aligned) come from the following sources: Tides, which contains 50K sentence pairs taken mainly from news articles. This dataset was originally collected for the DARPA-TIDES ...
 This item contains 3 files (66.13 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial Share Alike