
 
dc.contributor.author Galuščáková, Petra
dc.contributor.author Devaud, Romain
dc.contributor.author Gonzalez-Saez, Gabriela
dc.contributor.author Mulhem, Philippe
dc.contributor.author Goeuriot, Lorraine
dc.contributor.author Piroi, Florina
dc.contributor.author Popel, Martin
dc.date.accessioned 2023-04-28T08:50:24Z
dc.date.available 2023-04-28T08:50:24Z
dc.date.issued 2023-04-27
dc.identifier.uri http://hdl.handle.net/11234/1-5139
dc.description The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, issued by Qwant users, are based on selected trending topics. The documents are webpages selected for these queries using the Qwant click model; in addition, the collection contains documents randomly sampled from the Qwant index. The collection serves as the official test collection for the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/), organised at CLEF. It contains test datasets for the lab's two sub-tasks: short-term persistence (sub-task A) and long-term persistence (sub-task B). The data for the short-term persistence sub-task was collected in July 2022 and contains 1,593,376 documents and 882 queries. The data for the long-term persistence sub-task was collected in September 2022 and contains 1,081,334 documents and 923 queries. Besides the original French versions of the webpages and queries, the collection also includes their English translations.
dc.language.iso fra
dc.language.iso eng
dc.publisher Université Grenoble Alpes
dc.publisher Qwant
dc.publisher Research Studios Austria
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreferencedby https://arxiv.org/abs/2303.03229
dc.rights Qwant LongEval Attribution-NonCommercial-ShareAlike License
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License
dc.source.uri https://clef-longeval.github.io/
dc.subject information retrieval
dc.subject cross-language
dc.subject cross-lingual information retrieval
dc.subject parallel corpus
dc.subject search
dc.title LongEval Test Collection
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Petra Galuščáková galuscakova@gmail.com Université Grenoble Alpes
sponsor French Agence Nationale de la Recherche ANR-19-CE23-0029 Kodicare nationalFunds
sponsor Austrian Science Fund I4471-N Kodicare nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2023062 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2018101 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
size.info 2674710 files
size.info 1805 other
files.size 21576970656
files.count 1
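The `size.info` totals match the per-sub-task counts in the description: 1,593,376 + 1,081,334 = 2,674,710 documents and 882 + 923 = 1,805 queries. This suggests (an inference from the matching numbers, not something the record states) that "files" counts documents and "other" counts queries. A minimal Python check of that arithmetic, and of `files.size` (in bytes) against the 20.1 GB shown for the download, assuming that figure uses binary gigabytes:

```python
# Per-sub-task counts copied from dc.description above.
SUBTASKS = {
    "A (short-term, July 2022)": {"documents": 1_593_376, "queries": 882},
    "B (long-term, September 2022)": {"documents": 1_081_334, "queries": 923},
}

# Sum each count across both sub-tasks.
total_documents = sum(s["documents"] for s in SUBTASKS.values())
total_queries = sum(s["queries"] for s in SUBTASKS.values())

# files.size is given in bytes; convert to binary gigabytes (GiB = 2**30 bytes).
ARCHIVE_BYTES = 21_576_970_656
archive_gib = ARCHIVE_BYTES / 2**30

print(total_documents)           # 2674710, matches "size.info 2674710 files"
print(total_queries)             # 1805, matches "size.info 1805 other"
print(f"{archive_gib:.1f} GiB")  # 20.1 GiB, matches the 20.1 GB download size
```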


 Files in this item

This item is publicly available and licensed under the Qwant LongEval Attribution-NonCommercial-ShareAlike License (Attribution Required, Noncommercial, Share Alike).
Name: longeval-test-collection.tgz
Size: 20.1 GB
Format: application/x-gzip
Description: test collection
MD5: c084272b1f666d57469e7ca683342fdf
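Because the archive is roughly 20 GB, it is worth checking the download against the listed MD5 checksum before unpacking it. A minimal sketch using Python's standard `hashlib`, assuming the file was saved under its original name `longeval-test-collection.tgz` (the helper names `md5_of_file` and `verify` are illustrative, not part of the collection):

```python
import hashlib
import sys
from pathlib import Path

# MD5 checksum listed for longeval-test-collection.tgz in this record.
EXPECTED_MD5 = "c084272b1f666d57469e7ca683342fdf"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks
    so the ~20 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical default location: the current working directory.
    archive = Path(sys.argv[1] if len(sys.argv) > 1 else "longeval-test-collection.tgz")
    if archive.exists():
        print("OK" if verify(archive) else "MD5 mismatch: re-download the archive")
    else:
        print(f"{archive} not found")
```

Reading in 1 MiB chunks keeps memory use constant regardless of file size; on most Linux systems, `md5sum longeval-test-collection.tgz` gives the same digest.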
