Synthetic Test Collections for Retrieval Evaluation (SIGIR 2024)
Test collections play a vital role in the evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work has exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias that such test collections may exhibit towards LLM-based models. Our experiments indicate that, using LLMs, it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.
- dl-2023-runs: the run submissions for the TREC Deep Learning Track 2023
- 2023_queries.tsv: TREC Deep Learning Track 2023 test queries
- 2023.qrels.pass.withDupes.txt: TREC Deep Learning Track 2023 passage qrels, judged by NIST assessors
- 2023.qrels.pass.gpt4.txt: TREC Deep Learning Track 2023 passage qrels, judged by GPT-4
- prompts: the prompts for the different tasks (passage quality rating, query generation)
- metadata_models.csv: metadata for the TREC Deep Learning Track 2023 run submissions (i.e., type of model)
The TREC Deep Learning 2023 passages can be downloaded from the following URL: msmarco_v2_passage.tar
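The queries and qrels files follow the standard TREC formats, so they can be loaded with a few lines of Python. A minimal sketch, using the file names listed above:

```python
# Load the test queries (TSV: qid <TAB> query text)
queries = {}
with open("2023_queries.tsv") as f:
    for line in f:
        qid, text = line.rstrip("\n").split("\t", 1)
        queries[qid] = text

# Load qrels (TREC format: qid  iteration  docid  relevance)
qrels = {}
with open("2023.qrels.pass.withDupes.txt") as f:
    for line in f:
        qid, _, docid, rel = line.split()
        qrels.setdefault(qid, {})[docid] = int(rel)
```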
- qid in the 2M range: human/real queries for the TREC Deep Learning track 2023
- qid in the 3M range: synthetic queries for the TREC Deep Learning track 2023
  - qid < 3.1M: 250 T5-generated queries
  - qid > 3.1M: 250 GPT-4-generated queries
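Given these ranges, a small helper can tag each qid with its query source. This is a sketch; the handling of the exact 3.1M boundary is an assumption, since the list above only states < and >:

```python
def query_source(qid: int) -> str:
    """Map a TREC DL 2023 qid to its query source using the ranges above."""
    if 2_000_000 <= qid < 3_000_000:
        return "human"   # real user queries
    if 3_000_000 <= qid < 3_100_000:
        return "t5"      # 250 T5-generated synthetic queries
    if qid >= 3_100_000:
        return "gpt4"    # 250 GPT-4-generated synthetic queries
    raise ValueError(f"qid {qid} is outside the documented ranges")
```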
We used the BeIR pre-trained T5 query generation model (BeIR/query-gen-msmarco-t5-large-v1) to generate the T5-based queries.
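For reference, a minimal sketch of passage-to-query generation with this model via Hugging Face transformers; the decoding parameters follow the model card's example and are not necessarily the exact settings used to produce the released queries:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "BeIR/query-gen-msmarco-t5-large-v1"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

passage = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."
inputs = tokenizer(passage, return_tensors="pt", truncation=True, max_length=512)

# Sample a synthetic query for the passage
outputs = model.generate(**inputs, max_length=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```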
The gpt4-query-generation-prompt.txt prompt in the prompts folder was used for GPT-4 query generation.
We used the synthetic-judgments-prompt.txt prompt in the prompts folder to generate the GPT-4 synthetic judgments, with the following generation parameters:
```
engine = gpt-4-32k
temperature = 0
top_p = 1
frequency_penalty = 0.5
presence_penalty = 0
```
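Illustrative only: a sketch of a judgment call with the openai Python client, plugging in the parameters above. How the query and passage are spliced into synthetic-judgments-prompt.txt is an assumption here (the real prompt file defines its own placeholders), and the engine name corresponds to an Azure-style gpt-4-32k deployment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("prompts/synthetic-judgments-prompt.txt") as f:
    prompt_template = f.read()

def judge(query: str, passage: str) -> str:
    # Hypothetical templating; substitute the prompt file's actual placeholders.
    prompt = prompt_template.format(query=query, passage=passage)
    response = client.chat.completions.create(
        model="gpt-4-32k",       # "engine" in the original (Azure-style) setup
        temperature=0,
        top_p=1,
        frequency_penalty=0.5,
        presence_penalty=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```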
Runs can be evaluated against either qrels file with trec_eval:

```bash
# per-query results for the default measure set
trec_eval -q {qrel_file} {run_file}

# NDCG measures (Rndcg and ndcg_cut), averaged over all qrels queries (-c), per query (-q)
trec_eval -m Rndcg -m ndcg_cut -c -q {qrel_file} {run_file}
```
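The ndcg_cut measures can also be computed in Python with pytrec_eval (a trec_eval binding; not shipped with this repository). A sketch, assuming a TREC-format run file:

```python
import pytrec_eval

with open("2023.qrels.pass.gpt4.txt") as f_qrel:
    qrels = pytrec_eval.parse_qrel(f_qrel)
with open("run.txt") as f_run:  # any TREC-format run, e.g. from dl-2023-runs
    run = pytrec_eval.parse_run(f_run)

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut"})
results = evaluator.evaluate(run)  # per-query scores, e.g. results[qid]["ndcg_cut_10"]

mean_ndcg10 = sum(r["ndcg_cut_10"] for r in results.values()) / len(results)
print(f"NDCG@10 = {mean_ndcg10:.4f}")
```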
```bibtex
@inproceedings{rahmani2024synthetic,
  title     = {Synthetic Test Collections for Retrieval Evaluation},
  author    = {Rahmani, Hossein A. and Craswell, Nick and Yilmaz, Emine and Mitra, Bhaskar and Campos, Daniel},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages     = {2647--2651},
  year      = {2024}
}

@inproceedings{craswell2024overview,
  author    = {Craswell, Nick and Mitra, Bhaskar and Yilmaz, Emine and Rahmani, Hossein A. and Campos, Daniel and Lin, Jimmy and Voorhees, Ellen M. and Soboroff, Ian},
  title     = {Overview of the TREC 2023 Deep Learning Track},
  booktitle = {Text REtrieval Conference (TREC)},
  organization = {NIST},
  publisher = {TREC},
  year      = {2024},
  month     = {February},
  url       = {https://www.microsoft.com/en-us/research/publication/overview-of-the-trec-2023-deep-learning-track/}
}
```
If you have any questions, do not hesitate to contact us at [email protected]; we will be happy to assist.
- This research is supported by the Engineering and Physical Sciences Research Council [EP/S021566/1] and the EPSRC Fellowship titled “Task Based Information Retrieval” [EP/P024289/1].
- TREC 2023 Deep Learning Track Guidelines