
Synthetic Test Collection

Synthetic Test Collections for Retrieval Evaluation (SIGIR 2024)


Overview


Abstract

Test collections play a vital role in the evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work has exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that, using LLMs, it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.

Folders and Files

  • dl-2023-runs: includes the run submissions for TREC Deep Learning Track 2023
  • 2023_queries.tsv: TREC Deep Learning track 2023 test queries
  • 2023.qrels.pass.withDupes.txt: TREC Deep Learning track 2023 passage qrels -- judged by NIST assessors
  • 2023.qrels.pass.gpt4.txt: TREC Deep Learning track 2023 passage qrels -- judged by GPT-4
  • prompts: includes the prompts for the different tasks (e.g., passage quality rating, query generation, synthetic judgments)
  • metadata_models.csv: includes metadata for the TREC Deep Learning track 2023 run submissions (e.g., the type of model)
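Both qrels files above use the standard four-column TREC qrels format (`qid iteration docid relevance`, whitespace-separated). A minimal Python sketch for loading them, assuming that format:

```python
from collections import defaultdict

def load_qrels(path):
    """Parse a TREC-format qrels file: qid, iteration, docid, relevance."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels[qid][docid] = int(rel)
    return qrels

# e.g., compare NIST and GPT-4 labels for the same (query, passage) pairs
human = load_qrels("2023.qrels.pass.withDupes.txt")
gpt4 = load_qrels("2023.qrels.pass.gpt4.txt")
```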

TREC Deep Learning 2023 Passages

The TREC Deep Learning 2023 passages can be downloaded from the following URL: msmarco_v2_passage.tar

Test Queries

  • qid in the 2M range: the human/real queries for TREC Deep Learning track 2023
  • qid in the 3M range: the synthetic queries for TREC Deep Learning track 2023
  • qid below 3.1M (within the 3M range): the 250 T5-generated queries
  • qid above 3.1M: the 250 GPT-4-generated queries

These ranges make it easy to split the query file by origin, as in the sketch below.
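A minimal sketch, assuming 2023_queries.tsv is tab-separated with a qid column followed by the query text:

```python
import csv

human, t5, gpt4 = {}, {}, {}
with open("2023_queries.tsv") as f:
    for qid_str, query in csv.reader(f, delimiter="\t"):
        qid = int(qid_str)
        if qid < 3_000_000:        # 2M range: human/real queries
            human[qid] = query
        elif qid < 3_100_000:      # 3.0M-3.1M: T5-generated queries
            t5[qid] = query
        else:                      # above 3.1M: GPT-4-generated queries
            gpt4[qid] = query
```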

Synthetic Queries

T5

We used the BeIR pre-trained T5 query generation model to generate the T5-based queries: BeIR/query-gen-msmarco-t5-large-v1
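A minimal sketch of generating queries from a passage with this model via Hugging Face transformers; the sampling hyperparameters below are illustrative, not necessarily the exact settings used in the paper:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "BeIR/query-gen-msmarco-t5-large-v1"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

passage = "Python is an interpreted, high-level, general-purpose programming language."

inputs = tokenizer(passage, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,           # sampling gives more diverse queries than greedy decoding
    top_p=0.95,               # illustrative value
    num_return_sequences=3,   # several candidate queries per passage
)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```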

GPT-4

The gpt4-query-generation-prompt.txt prompt in the prompts folder is used for the GPT-4 query generation.
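As a sketch only: one way to use that prompt file with the legacy openai Python package (pre-1.0, Azure-style `engine` parameter, matching the judgment setup below). The `{passage}` placeholder name is an assumption about the prompt template, not taken from the repository:

```python
import openai  # assumes openai<1.0 and Azure OpenAI credentials already configured

with open("prompts/gpt4-query-generation-prompt.txt") as f:
    template = f.read()

passage = "..."  # a passage from the MS MARCO v2 collection

response = openai.ChatCompletion.create(
    engine="gpt-4-32k",
    messages=[{"role": "user", "content": template.format(passage=passage)}],
)
print(response["choices"][0]["message"]["content"])  # the generated query
```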

Synthetic Judgments

We used the synthetic-judgments-prompt.txt prompt in the prompts folder for the GPT-4 synthetic judgment generation, with the generation parameters set as follows (a code sketch follows the list):

  • engine = gpt-4-32k
  • temperature = 0
  • top_p = 1
  • frequency_penalty = 0.5
  • presence_penalty = 0
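Putting those parameters together, the call might look like the following sketch (again legacy openai<1.0, Azure-style `engine`; the `{query}`/`{passage}` placeholder names are assumptions about the prompt template):

```python
import openai  # assumes openai<1.0 and Azure OpenAI credentials already configured

with open("prompts/synthetic-judgments-prompt.txt") as f:
    template = f.read()

def judge(query: str, passage: str) -> str:
    """Ask GPT-4 for a relevance label using the generation settings listed above."""
    response = openai.ChatCompletion.create(
        engine="gpt-4-32k",
        messages=[{"role": "user",
                   "content": template.format(query=query, passage=passage)}],
        temperature=0,
        top_p=1,
        frequency_penalty=0.5,
        presence_penalty=0,
    )
    return response["choices"][0]["message"]["content"]
```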

Runs

The runs in dl-2023-runs can be evaluated against either the NIST or the GPT-4 qrels with trec_eval:

treceval

trec_eval -q {qrel_file} {run_file}

ndcgeval

trec_eval -m Rndcg -m ndcg_cut -c -q {qrel_file} {run_file}
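The key question in the paper is whether the GPT-4 qrels rank systems the same way the NIST qrels do. A minimal sketch of one way to check this, correlating per-run trec_eval scores with Kendall's tau; the run file names are hypothetical, and the parsing assumes trec_eval's standard `measure  topic  score` summary output:

```python
import subprocess
from scipy.stats import kendalltau

def eval_run(qrels: str, run: str, measure: str = "map") -> float:
    """Run trec_eval and return the aggregate score for a single measure."""
    out = subprocess.run(
        ["trec_eval", "-m", measure, qrels, run],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        name, topic, score = line.split()
        if name == measure and topic == "all":
            return float(score)
    raise ValueError(f"{measure} not found in trec_eval output")

runs = ["dl-2023-runs/runA", "dl-2023-runs/runB"]  # list all run files here (names hypothetical)
human = [eval_run("2023.qrels.pass.withDupes.txt", r) for r in runs]
synth = [eval_run("2023.qrels.pass.gpt4.txt", r) for r in runs]
tau, _ = kendalltau(human, synth)
print(f"Kendall's tau between human and GPT-4 system rankings: {tau:.3f}")
```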

Cite

@inproceedings{rahmani2024synthetic,
  title={Synthetic Test Collections for Retrieval Evaluation},
  author={Rahmani, Hossein A and Craswell, Nick and Yilmaz, Emine and Mitra, Bhaskar and Campos, Daniel},
  booktitle={Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={2647--2651},
  year={2024}
}
@inproceedings{craswell2024overview,
  title={Overview of the TREC 2023 Deep Learning Track},
  author={Craswell, Nick and Mitra, Bhaskar and Yilmaz, Emine and Rahmani, Hossein A. and Campos, Daniel and Lin, Jimmy and Voorhees, Ellen M. and Soboroff, Ian},
  booktitle={Text REtrieval Conference (TREC)},
  organization={NIST},
  publisher={TREC},
  year={2024},
  month={February},
  url={https://www.microsoft.com/en-us/research/publication/overview-of-the-trec-2023-deep-learning-track/}
}

Contact

If you have any questions, do not hesitate to contact us at [email protected]; we will be happy to assist.

Acknowledgments

  • This research is supported by the Engineering and Physical Sciences Research Council [EP/S021566/1] and the EPSRC Fellowship titled “Task Based Information Retrieval” [EP/P024289/1].
  • TREC 2023 Deep Learning Track Guidelines
