What's New

toolService
Description:
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging ...
 This item contains no files.
corpus
Description:
The corpus presented consists of job ads in Spanish related to Engineering positions in Peru. The documents were preprocessed and annotated for POS tagging, NER, and topic modeling tasks. The corpus is divided in two ...
 This item contains 1 file (10.99 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required.
corpus
Description:
Corpus of texts in 12 languages. For each language, we provide one training, one development and one testing set acquired from Wikipedia articles. Moreover, each language dataset contains (substantially larger) training ...
 This item contains 13 files (17.37 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.

Most Viewed Items

Top Last Week
languageDescription
Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
 This item contains 46 files (629.66 GB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.
corpus
Author(s):
Nivre, Joakim ; Agić, Željko ; Ahrenberg, Lars ; Aranzabe, Maria Jesus ; Asahara, Masayuki ; Atutxa, Aitziber ; Ballesteros, Miguel ; Bauer, John ; Bengoetxea, Kepa ; Bhat, Riyaz Ahmad ; Bick, Eckhard ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Candito, Marie ; Cebiroğlu Eryiğit, Gülşen ; Celano, Giuseppe G. A. ; Chalub, Fabricio ; Choi, Jinho ; Çöltekin, Çağrı ; Connor, Miriam ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Diaz de Ilarraza, Arantza ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eli, Marhaba ; Erjavec, Tomaž ; Farkas, Richárd ; Foster, Jennifer ; Freitas, Cláudia ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Ginter, Filip ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; Gonzáles Saavedra, Berta ; Grioni, Matias ; Grūzītis, Normunds ; Guillaume, Bruno ; Habash, Nizar ; Hajič, Jan ; Hà Mỹ, Linh ; Haug, Dag ; Hladká, Barbora ; Hohle, Petter ; Ion, Radu ; Irimia, Elena ; Johannsen, Anders ; Jørgensen, Fredrik ; Kaşıkara, Hüner ; Kanayama, Hiroshi ; Kanerva, Jenna ; Kotsyba, Natalia ; Krek, Simon ; Laippala, Veronika ; Lê Hồng, Phương ; Lenci, Alessandro ; Ljubešić, Nikola ; Lyashevskaya, Olga ; Lynn, Teresa ; Makazhanov, Aibek ; Manning, Christopher ; Mărănduc, Cătălina ; Mareček, David ; Martínez Alonso, Héctor ; Martins, André ; Mašek, Jan ; Matsumoto, Yuji ; McDonald, Ryan ; Missilä, Anna ; Mititelu, Verginica ; Miyao, Yusuke ; Montemagni, Simonetta ; More, Amir ; Mori, Shunsuke ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Mustafina, Nina ; Müürisep, Kaili ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikolaev, Vitaly ; Nurmi, Hanna ; Ojala, Stina ; Osenova, Petya ; Øvrelid, Lilja ; Pascual, Elena ; Passarotti, Marco ; Perez, Cenel-Augusto ; Perrier, Guy ; Petrov, Slav ; Piitulainen, Jussi ; Plank, Barbara ; Popel, Martin ; Pretkalniņa, Lauma ; Prokopidis, Prokopis ; Puolakainen, Tiina ; Pyysalo, Sampo ; Rademaker, Alexandre ; 
Ramasamy, Loganathan ; Real, Livy ; Rituma, Laura ; Rosa, Rudolf ; Saleh, Shadi ; Sanguinetti, Manuela ; Saulīte, Baiba ; Schuster, Sebastian ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shakurova, Lena ; Shen, Mo ; Sichinava, Dmitry ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Smith, Aaron ; Suhr, Alane ; Sulubacak, Umut ; Szántó, Zsolt ; Taji, Dima ; Tanaka, Takaaki ; Tsarfaty, Reut ; Tyers, Francis ; Uematsu, Sumire ; Uria, Larraitz ; van Noord, Gertjan ; Varga, Viktor ; Vincze, Veronika ; Washington, Jonathan North ; Žabokrtský, Zdeněk ; Zeldes, Amir ; Zeman, Daniel ; Zhu, Hanzhi
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 4 files (399.22 MB).
 
Publicly Available. GNU General Public License, version 3.0. Distributed under Creative Commons.
corpus
Author(s):
Nivre, Joakim ; Agić, Željko ; Ahrenberg, Lars ; Antonsen, Lene ; Aranzabe, Maria Jesus ; Asahara, Masayuki ; Ateyah, Luma ; Attia, Mohammed ; Atutxa, Aitziber ; Augustinus, Liesbeth ; Badmaeva, Elena ; Ballesteros, Miguel ; Banerjee, Esha ; Bank, Sebastian ; Barbu Mititelu, Verginica ; Bauer, John ; Bengoetxea, Kepa ; Bhat, Riyaz Ahmad ; Bick, Eckhard ; Bobicev, Victoria ; Börstell, Carl ; Bosco, Cristina ; Bouma, Gosse ; Bowman, Sam ; Burchardt, Aljoscha ; Candito, Marie ; Caron, Gauthier ; Cebiroğlu Eryiğit, Gülşen ; Celano, Giuseppe G. A. ; Cetin, Savas ; Chalub, Fabricio ; Choi, Jinho ; Cinková, Silvie ; Çöltekin, Çağrı ; Connor, Miriam ; Davidson, Elizabeth ; de Marneffe, Marie-Catherine ; de Paiva, Valeria ; Diaz de Ilarraza, Arantza ; Dirix, Peter ; Dobrovoljc, Kaja ; Dozat, Timothy ; Droganova, Kira ; Dwivedi, Puneet ; Eli, Marhaba ; Elkahky, Ali ; Erjavec, Tomaž ; Farkas, Richárd ; Fernandez Alcalde, Hector ; Foster, Jennifer ; Freitas, Cláudia ; Gajdošová, Katarína ; Galbraith, Daniel ; Garcia, Marcos ; Gärdenfors, Moa ; Gerdes, Kim ; Ginter, Filip ; Goenaga, Iakes ; Gojenola, Koldo ; Gökırmak, Memduh ; Goldberg, Yoav ; Gómez Guinovart, Xavier ; Gonzáles Saavedra, Berta ; Grioni, Matias ; Grūzītis, Normunds ; Guillaume, Bruno ; Habash, Nizar ; Hajič, Jan ; Hajič jr., Jan ; Hà Mỹ, Linh ; Harris, Kim ; Haug, Dag ; Hladká, Barbora ; Hlaváčová, Jaroslava ; Hociung, Florinel ; Hohle, Petter ; Ion, Radu ; Irimia, Elena ; Jelínek, Tomáš ; Johannsen, Anders ; Jørgensen, Fredrik ; Kaşıkara, Hüner ; Kanayama, Hiroshi ; Kanerva, Jenna ; Kayadelen, Tolga ; Kettnerová, Václava ; Kirchner, Jesse ; Kotsyba, Natalia ; Krek, Simon ; Laippala, Veronika ; Lambertino, Lorenzo ; Lando, Tatiana ; Lee, John ; Lê Hồng, Phương ; Lenci, Alessandro ; Lertpradit, Saran ; Leung, Herman ; Li, Cheuk Ying ; Li, Josie ; Li, Keying ; Ljubešić, Nikola ; Loginova, Olga ; Lyashevskaya, Olga ; Lynn, Teresa ; Macketanz, Vivien ; Makazhanov, Aibek ; Mandl, Michael ; Manning, Christopher ; 
Mărănduc, Cătălina ; Mareček, David ; Marheinecke, Katrin ; Martínez Alonso, Héctor ; Martins, André ; Mašek, Jan ; Matsumoto, Yuji ; McDonald, Ryan ; Mendonça, Gustavo ; Miekka, Niko ; Missilä, Anna ; Mititelu, Cătălin ; Miyao, Yusuke ; Montemagni, Simonetta ; More, Amir ; Moreno Romero, Laura ; Mori, Shinsuke ; Moskalevskyi, Bohdan ; Muischnek, Kadri ; Müürisep, Kaili ; Nainwani, Pinkey ; Nedoluzhko, Anna ; Nešpore-Bērzkalne, Gunta ; Nguyễn Thị, Lương ; Nguyễn Thị Minh, Huyền ; Nikolaev, Vitaly ; Nurmi, Hanna ; Ojala, Stina ; Osenova, Petya ; Östling, Robert ; Øvrelid, Lilja ; Pascual, Elena ; Passarotti, Marco ; Perez, Cenel-Augusto ; Perrier, Guy ; Petrov, Slav ; Piitulainen, Jussi ; Pitler, Emily ; Plank, Barbara ; Popel, Martin ; Pretkalniņa, Lauma ; Prokopidis, Prokopis ; Puolakainen, Tiina ; Pyysalo, Sampo ; Rademaker, Alexandre ; Ramasamy, Loganathan ; Rama, Taraka ; Ravishankar, Vinit ; Real, Livy ; Reddy, Siva ; Rehm, Georg ; Rinaldi, Larissa ; Rituma, Laura ; Romanenko, Mykhailo ; Rosa, Rudolf ; Rovati, Davide ; Sagot, Benoît ; Saleh, Shadi ; Samardžić, Tanja ; Sanguinetti, Manuela ; Saulīte, Baiba ; Schuster, Sebastian ; Seddah, Djamé ; Seeker, Wolfgang ; Seraji, Mojgan ; Shen, Mo ; Shimada, Atsuko ; Sichinava, Dmitry ; Silveira, Natalia ; Simi, Maria ; Simionescu, Radu ; Simkó, Katalin ; Šimková, Mária ; Simov, Kiril ; Smith, Aaron ; Stella, Antonio ; Straka, Milan ; Strnadová, Jana ; Suhr, Alane ; Sulubacak, Umut ; Szántó, Zsolt ; Taji, Dima ; Tanaka, Takaaki ; Trosterud, Trond ; Trukhina, Anna ; Tsarfaty, Reut ; Tyers, Francis ; Uematsu, Sumire ; Urešová, Zdeňka ; Uria, Larraitz ; Uszkoreit, Hans ; Vajjala, Sowmya ; van Niekerk, Daniel ; van Noord, Gertjan ; Varga, Viktor ; Villemonte de la Clergerie, Eric ; Vincze, Veronika ; Wallin, Lars ; Washington, Jonathan North ; Wirén, Mats ; Wong, Tak-sum ; Yu, Zhuoran ; Žabokrtský, Zdeněk ; Zeldes, Amir ; Zeman, Daniel ; Zhu, Hanzhi
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
 This item contains 3 files (274.16 MB).
 
Publicly Available. GNU General Public License, version 3.0. Distributed under Creative Commons.