LongEval Train Collection
Please use the following text to cite this item:
Galuščáková, Petra; et al., 2023,
LongEval Train Collection, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5010.
Authors
Galuščáková, Petra ; et al.
Item identifier
Project URL
Date issued
2023-02-16
Size
1570734 articles,
672 other
Description
The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, which were issued by Qwant users, are based on selected trending topics. The documents in the collection were selected with respect to these queries using the Qwant click model. In addition to the documents selected using this model, the collection contains randomly selected documents from the Qwant index. All data were collected during June 2022. In total, the collection contains 672 train queries with 9,656 corresponding assessments from the Qwant click model, and 98 held-out queries. The document set consists of 1,570,734 downloaded, cleaned, and filtered web pages. Besides the original French versions, the collection also contains English translations of the web pages and queries. The collection serves as the official training collection for the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/) organised at CLEF.
Acknowledgement
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2018101
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
French Agence Nationale de la Recherche
Project code: ANR-19-CE23-0029
Project name: Kodicare
Austrian Science Fund
Project code: I4471-N
Project name: Kodicare
This item is Publicly Available
and licensed under:
Files in this item
- Name: longeval-train-v2.tgz
- Size: 11.7 GB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: e34cf8b5e9b2de98628759bbd621a4ca
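
Because the archive is large, it may be worth checking the download against the MD5 value listed for the file before unpacking it. The following is a minimal sketch using only the Python standard library. The local path `longeval-train-v2.tgz` and the helper names `md5_of_file` and `archive_is_intact` are assumptions made for illustration; they are not part of the collection or of any LINDAT tooling.

```python
import hashlib
from pathlib import Path

# MD5 value copied from the file listing of this item.
EXPECTED_MD5 = "e34cf8b5e9b2de98628759bbd621a4ca"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the computed digest with the expected one (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the file was saved.
    archive = Path("longeval-train-v2.tgz")
    if archive.exists():
        print("checksum OK" if archive_is_intact(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.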