Commit
Update format
ronakice committed Apr 21, 2024
1 parent 86001c0 commit 6351599
Showing 5 changed files with 12 additions and 12 deletions.
4 changes: 2 additions & 2 deletions docs/_posts/2024-02-12-hello-world.md
@@ -1,5 +1,5 @@
---
title: "TREC RAG 2024 - Hello World!"
title: "TREC 2024 RAG - Hello World!"
date: 2024-02-12T08:00:00-00:00
categories:
- Annoucements
@@ -12,7 +12,7 @@ toc: true
Hello World.

Best regards,
TREC-RAG Organizers
TREC RAG Organizers

## Useful links

4 changes: 2 additions & 2 deletions docs/_posts/2024-03-25-corpus.md
@@ -1,5 +1,5 @@
---
title: "TREC RAG 2024 - Corpus Discussion"
title: "TREC 2024 RAG - Corpus Discussion"
date: 2024-03-25T08:00:00-00:00
categories:
- Annoucements
@@ -16,4 +16,4 @@ We now pass this on to the community for feedback. We are particularly intereste

Live Long and Prosper,

TREC-RAG Organizers
TREC RAG Organizers
@@ -1,5 +1,5 @@
---
title: "TREC RAG 2024 Corpus Finalization (Draft)"
title: "TREC 2024 RAG Corpus"
date: 2024-04-18T08:00:00-00:00
categories:
- Annoucements
@@ -40,7 +40,7 @@ Consider the document above, the `docid` `msmarco_v2.1_doc_29_677149` encodes th


## MS MARCO V2.1 Document Corpus - Segmented Version
We also provide the community with a segmented version of the MS MARCO V2.1 document corpus. The segmented version of V2.1 has two new fields `start_char` and `end_char` containing the start and end position of the segment in the body of the corresponding MS MARCO V2.1 document. To get this corpus, we first reproduced the original segmentation from the MS MARCO V2 document corpus, involving steps like finding the precise `spacy` version. The resultant process leverages a sliding window size of 10 sentences and a stride of 5 sentences to create segments of text, roughly between 500-1000 characters, making it more manageable for users and baselines. We plan to leverage this resultant segmented MS MARCO V2.1 document corpus heavily in the TREC RAG 2024 Track, more in the upcoming task description. The corpus contains 113,520,750 segments in total.
We also provide the community with a segmented version of the MS MARCO V2.1 document corpus. The segmented version of V2.1 has two new fields `start_char` and `end_char` containing the start and end position of the segment in the body of the corresponding MS MARCO V2.1 document. To get this corpus, we first reproduced the original segmentation from the MS MARCO V2 document corpus, involving steps like finding the precise `spacy` version. The resultant process leverages a sliding window size of 10 sentences and a stride of 5 sentences to create segments of text, roughly between 500-1000 characters, making it more manageable for users and baselines. We plan to leverage this resultant segmented MS MARCO V2.1 document corpus heavily in the TREC 2024 RAG Track, more in the upcoming task description. The corpus contains 113,520,750 segments in total.
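
The sliding-window segmentation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the organizers' pipeline: the post says the real process reproduces the original `spacy`-based segmentation, whereas the `split_sentences` helper here is a hypothetical regex stand-in, the `segment` output key is an assumed name, and the handling of the final short window is a guess. Only the window of 10 sentences, the stride of 5, and the `start_char`/`end_char` fields come from the text.

```python
import re


def split_sentences(text):
    """Return (start, end) character spans for each sentence.

    Hypothetical stand-in for a real sentence splitter such as spaCy;
    it simply cuts after '.', '!' or '?' followed by whitespace.
    """
    pattern = re.compile(r"\S.*?(?:[.!?](?=\s|$)|$)", flags=re.S)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def segment(body, window=10, stride=5):
    """Build overlapping segments of `window` sentences, advancing by `stride`.

    Each segment records where it starts and ends in the document body,
    mirroring the `start_char` and `end_char` fields mentioned in the post.
    """
    spans = split_sentences(body)
    segments = []
    for i in range(0, len(spans), stride):
        chunk = spans[i:i + window]
        start_char, end_char = chunk[0][0], chunk[-1][1]
        segments.append({
            "start_char": start_char,
            "end_char": end_char,
            "segment": body[start_char:end_char],  # assumed field name
        })
        # Assumption: stop once a window has reached the last sentence.
        if i + window >= len(spans):
            break
    return segments


if __name__ == "__main__":
    body = " ".join(f"Sentence number {n}." for n in range(12))
    for seg in segment(body):
        print(seg["start_char"], seg["end_char"])
```

Because the stride is half the window, consecutive segments share five sentences, and slicing the document body with `start_char` and `end_char` recovers the segment text.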


### Segment Format
@@ -118,7 +118,7 @@ Topics corresponding to the TREC DL 2021-2023 and Dev/Dev2 sets are the same as
## Next Steps
We have already implemented [Anserini]()/Pyserini retrieval baselines for these sets and are in the process of packaging things! Additionally, we hope to provide reranking baselines with state-of-the-art RankZephyr, RankGPT, and Cohere Rerank 3 models through [RankLLM](rankllm.ai).

More information on the topics for the TREC RAG 2024 Track will be released soon. Stay tuned!
More information on the topics for the TREC 2024 RAG Track will be released soon. Stay tuned!

So long and thanks for all the fish 🐟,

4 changes: 2 additions & 2 deletions docs/_posts/2024-04-18-2024-task-descriptions.md
@@ -1,5 +1,5 @@
---
title: "TREC RAG 2024 Task Description (Draft)"
title: "TREC 2024 RAG Task Description (Draft)"
date: 2024-04-18T09:00:00-00:00
categories:
- Annoucements
@@ -10,7 +10,7 @@ classes: wide
toc: false
---

We are excited to provide details on each task which will be conducted in the first year of the TREC RAG 2024! 🎉🎉
We are excited to provide details on each task which will be conducted in the first year of the TREC RAG! 🎉🎉

# TASK 1: Retrieval Track (R)

6 changes: 3 additions & 3 deletions docs/about.md
@@ -9,7 +9,7 @@ The [(TREC)](https://trec.nist.gov/) **R**etrieval-**A**ugmented **G**eneration

## 2024 Tasks Overview

We are conducting three tasks in TREC-RAG 2024 track. These tasks are as follows:
We are conducting three tasks in TREC 2024 RAG track. These tasks are as follows:

1. **(R) Retrieval Task** : The “R” track requires participants to rank and retrieve the topmost relevant segments from the MS MARCO Segment v2.1 collection for the provided set of input topics, i.e., queries.

@@ -20,7 +20,7 @@ We are conducting three tasks in TREC-RAG 2024 track. These tasks are as follows

## Proposed Evaluation Methodology - Version 0.1

The proposed evaluation methodology for the TREC RAG 2024 track is as follows:
The proposed evaluation methodology for the TREC 2024 RAG track is as follows:

![Gather Answers](/assets/images/eval-step1.png){: .align-center}
<figcaption>Given a set of queries, the participants will be asked to generate a set of answers. The answers can be generated using any method, including but not limited to retrieval, generation, or a combination of both. </figcaption>
@@ -45,7 +45,7 @@ The proposed evaluation methodology for the TREC RAG 2024 track is as follows:

Runs should be due around the end of July. More granular timeline will be released soon.

## Organizers of TREC RAG 2024
## Organizers of TREC 2024 RAG Track

- Ronak Pradeep, University of Waterloo
- Nandan Thakur, University of Waterloo