dc.contributor.author | Galuščáková, Petra |
dc.contributor.author | Devaud, Romain |
dc.contributor.author | Gonzalez-Saez, Gabriela |
dc.contributor.author | Mulhem, Philippe |
dc.contributor.author | Goeuriot, Lorraine |
dc.contributor.author | Piroi, Florina |
dc.contributor.author | Popel, Martin |
dc.date.accessioned | 2023-04-28T08:50:24Z |
dc.date.available | 2023-04-28T08:50:24Z |
dc.date.issued | 2023-04-27 |
dc.identifier.uri | http://hdl.handle.net/11234/1-5139 |
dc.description | The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, which were issued by Qwant users, are based on selected trending topics. The documents in the collection are webpages that were selected for these queries using the Qwant click model; in addition, the collection contains documents randomly selected from the Qwant index. The collection serves as the official test collection for the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/), organised at CLEF. It contains test datasets for the two sub-tasks: short-term persistence (sub-task A) and long-term persistence (sub-task B). The data for the short-term persistence sub-task was collected in July 2022; this dataset contains 1,593,376 documents and 882 queries. The data for the long-term persistence sub-task was collected in September 2022; this dataset contains 1,081,334 documents and 923 queries. Besides the original French versions of the webpages and queries, the collection also contains their English translations. |
dc.language.iso | fra |
dc.language.iso | eng |
dc.publisher | Université Grenoble Alpes |
dc.publisher | Qwant |
dc.publisher | Research Studios Austria |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation.isreferencedby | https://arxiv.org/abs/2303.03229 |
dc.rights | Qwant LongEval Attribution-NonCommercial-ShareAlike License |
dc.rights.uri | https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License |
dc.source.uri | https://clef-longeval.github.io/ |
dc.subject | information retrieval |
dc.subject | cross-language |
dc.subject | cross-lingual information retrieval |
dc.subject | parallel corpus |
dc.subject | search |
dc.title | LongEval Test Collection |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Petra Galuščáková galuscakova@gmail.com Université Grenoble Alpes |
sponsor | French Agence Nationale de la Recherche ANR-19-CE23-0029 Kodicare nationalFunds |
sponsor | Austrian Science Fund I4471-N Kodicare nationalFunds |
sponsor | Ministerstvo školství, mládeže a tělovýchovy České republiky LM2023062 LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy nationalFunds |
sponsor | Ministerstvo školství, mládeže a tělovýchovy České republiky LM2018101 LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy nationalFunds |
size.info | 2674710 files |
size.info | 1805 other |
files.size | 21576970656 |
files.count | 1 |
Files in this item
License category: Publicly Available
License: Qwant LongEval Attribution-NonCommercial-ShareAlike License
- Name: longeval-test-collection.tgz
- Size: 20.1 GB
- Format: application/x-gzip
- Description: test collection
- MD5: c084272b1f666d57469e7ca683342fdf
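Because the archive is about 20 GB, it is worth checking the download against the published MD5 checksum before unpacking it (on Linux, `md5sum longeval-test-collection.tgz` does the same). The following is a minimal Python sketch using only the standard library; the helper name `md5_of_file` and the assumption that the archive sits in the current directory under its listed name are illustrative, not part of the repository record.

```python
import hashlib
import os
import sys

# MD5 checksum published in the repository record for longeval-test-collection.tgz
EXPECTED_MD5 = "c084272b1f666d57469e7ca683342fdf"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use constant, which matters for a ~20 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical default path; pass the actual archive location as the first argument.
    archive = sys.argv[1] if len(sys.argv) > 1 else "longeval-test-collection.tgz"
    if os.path.isfile(archive):
        actual = md5_of_file(archive)
        print("OK" if actual == EXPECTED_MD5 else f"MISMATCH: {actual}")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before extraction.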