Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts about API design #110

Open
lintool opened this issue Apr 19, 2024 · 17 comments
Open

Thoughts about API design #110

lintool opened this issue Apr 19, 2024 · 17 comments

Comments

@lintool
Copy link
Member

lintool commented Apr 19, 2024

Just thoughts... @ronakice @sahel-sh reactions welcome.

Two methods: rerank and rerank_batch.

Method signature:

    def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:

Commentary:

  • I prefer an actual Query object as opposed to just a str. This will allow us to attach additional fields to include metadata, etc.
  • I prefer an actual Candidates object as opposed to just a list of json objects. Again, this will allow us to attach other metadata, such as execution traces, etc.
  • Contract is that the returned Candidates is a fresh object. I.e., the method is non-destructive - does not mess with the input.

The, rerank_batch would be:

    def rerank_batch(self, query: List[Query], candidates: List[Candidates], k: int = 10) -> Candidates:

rerank can just route to rerank_batch with a list of one object.

@sahel-sh
Copy link
Member

This looks clean, below are my thoughts:
If we want the rerank to be compatible with many retrievers, we should add creators to Query and Candidates classes to convert str and json/dict to instances of these classes and expect the user to call them before calling rerank.

I think more important that adding meta data to Query and its Candidates separately. It is important to have metadata for a reranking run: PTAL at the https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py
Where we store each prompt, response, and input output token counts in an execution info. Each rerank for a given query will have a list of this information. This doesn't belong to a query nor its candidate, but an execution summary for query + its candidates.

@lintool
Copy link
Member Author

lintool commented Apr 19, 2024

This looks clean, below are my thoughts: If we want the rerank to be compatible with many retrievers, we should add creators to Query and Candidates classes to convert str and json/dict to instances of these classes and expect the user to call them before calling rerank.

Agreed. There should be a pretty straightforward mapping from json/dict to Python objects. I envision from_json "importers" and to_json "exporters". (We can quibble about method naming later.)

I think more important that adding meta data to Query and its Candidates separately. It is important to have metadata for a reranking run: PTAL at the https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py Where we store each prompt, response, and input output token counts in an execution info. Each rerank for a given query will have a list of this information. This doesn't belong to a query nor its candidate, but an execution summary for query + its candidates.

Agreed. An example of metadata attached to a Query might be its id; in the case of TREC topics, different fields (title, narrative, description), etc.

@sahel-sh
Copy link
Member

sahel-sh commented Apr 20, 2024 via email

@sahel-sh
Copy link
Member

This is what we agreed on with @ronakice.

New Query and Candidate classes:

class Query(TypedDict):
    text: str
    qid: str


class Candidate(TypedDict):
    docid: str
    score: str
    title: NotRequired[str]
    content: str

Existing Classes:

class RankingExecInfo:
    def __init__(
        self, prompt, response: str, input_token_count: int, output_token_count: int
    ):
        self.prompt = prompt
        self.response = response
        self.input_token_count = input_token_count
        self.output_token_count = output_token_count

    def __repr__(self):
        return str(self.__dict__)

class Result:
    def __init__(
        self,
        query: **Query**,
        candidates: **List[Candidate]**,
        ranking_exec_summary: List[RankingExecInfo] = None,
    ):
        self.query = query
        self.candidates = candidates
        self.ranking_exec_summary = ranking_exec_summary

    def __repr__(self):
        return str(self.__dict__)

Then batch_batch would look like this:

def rerank_batch(self, results: List[Result], k: int = 10) -> List[Result]:

@lintool PTAL and let me know what you think. The remaining issue is the name Result for the class that contains a query, a list of candidates, and execution summery (after reranking)

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

Captured discussion over Slack w/ @sahel-sh ... still unresolved points.

This seems uncontroversial:

 def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:

Note that Candidates is a container object around something like List[Candidate], plus metadata.

However, what should rerank_batch look like?

The obvious extension would be:

S1: def rerank_batch(self, query: List[Query], candidates: List[Candidates], k: int = 10) -> List[Candidates]:

But @sahel-sh points out: ...not batch friendly: when you always need to zip two lists to iterate through them together (list of queries and list of candidates), it is a strong signal that you should have a list of pairs. I also think the execution info belongs to both query and its candidates not just candidates.

An alternative would be:

S2: def rerank_batch(self, input: Dict[Query, Candidates], k: int = 10) -> Dict[Query, Candidates]:

But at this point, you might as well wrap Dict[Query, Candidates] in a container object, e.g., Results, so we have:

S3: def rerank_batch(self, input: Results, k: int = 10) -> Results:

So, we have S1, S2, S3 as options...

@sahel-sh
Copy link
Member

S3 doesn't capture what I had in mind accurately:

def rerank(self, input: Result, k: int=10) ->Result
def rerank_batch(self, input: List[Result], k: int=10) ->List[Result]

where each Result (for lack of a better name) has a Query , List[Candidate], and optional execution metadata.

@sahel-sh
Copy link
Member

There are a couple of minor issues with s1 naming as well that we should resolve if we decide to adopt it:
Ideally rerank should have singular args, and rerank_batch plural:

  • In s1 rerank_batch can have queries (vs query in rerank), but it cannot have candidatess (vs candidates in rerank) since the arg is already plural in rerank.
  • Candidates class has a List[Candidate] field. What would be the name of this class field? maybe candidates or candidate_list? Then accessing this field will look like candidates[of type Candidates].candidates or candidates.candidate_list which is not very readable.

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

I guess I'm hung up over naming of Result... it's really a Tuple[Query, List[Candidates]]:

Let me propose S4 then?

def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int=10) -> List[Tuple[Query, Candidates]]

Then the issue is that we don't have anywhere to stuff the metadata.

@sahel-sh
Copy link
Member

I guess I'm hung up over naming of Result... it's really a Tuple[Query, List[Candidates]]:

Let me propose S4 then?

def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int=10) -> List[Tuple[Query, Candidates]]

Then the issue is that we don't have anywhere to stuff the metadata.

I like s4 better than s1. Similar to s1, the metadata can go into Candidates class (not the best fit imho). I think the only advantages that s3 has over s4 are:

  • inputs for the batch version of the function are a natural extension of the single query version.
  • candidates.candidates or candidates.candidate_list does not seem very intuitive to me.
    But for s3, I don't like the name of the Class containing query and List[Candidate]. Result is to vague and confusing specially since it will be type of the input parameter too.

I think my preference ranking is s3 > s4 > s1 > s2 (s2 is my least favorite since chunking a list in batch mode to be handled with multiple threads is more natural than chunking a dictionary)

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

S5: another strategy - use Python's ability to return multiple values:

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> (List[Tuple[Query, List[Candidate]]], List[ExecutionTraceObject])

Note, in this design, we don't have Candidates anymore.

Or S5a

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]

Or S6, making more prominent use of Tuple

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Tuple[Query, List[Candidate], ExecutionTraceObject]
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

Okay, after further thoughts, S7:

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Result
def rerank(self, request: Request, k: int = 10) -> Result
def rerank_batch(self, input: List[Request], k: int=10) -> List[Result]

Commentary:

  • I don't mind polymorphism on rerank. Just syntactic sugar.
  • I like this better than S6. Actual objects instead of Tuple provides more flexibility.

So I would envision a snippet like:

q = Query('myqid', 'the actual query')
candidates = ...

results = rank_llm.rerank(query=q, documents=candidates, k=10)

print(results.query) # you just get the query back
print(results.execution_trace) # or whatever we want to call it.
for hit in results.candidates:
    # iterate over candidates, these are Candidate objects

And:

request = Request(Query(qid, query), candidates)
# iterate to build up a list of requests

@ronakice
Copy link
Member

Having slept throught most of this, I like S7 the most. Polymorphism is perfectly fine. I think we should discuss more in detail about if we want rerank to be calling rerank_batch (vllm introduces some differences) but a discussion not for this thread.

@sahel-sh
Copy link
Member

I also like s7. It follows many best practices like separation of the input param type and output param (even if they are very similar now (they might diverge in the future).
My only comment is on the Request as the query, candidates wrapper. I think we should reserve Request and Response for RestAPI input and output. I think we can call the wrapper class something like RankingInput or RankingInstance or something like this.
Later on a List[RankingInput] can be included in the RestAPI's request as is. Similarly, the output wrapper name could be RankingResult.

re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests, our calls to the llm backend is still online and blocking, which just have a for loop to iterate over. @ronakice did you have a cl to do an actual batch inference with the llm?

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

I also like s7. It follows many best practices like separation of the input param type and output param (even if they are very similar now (they might diverge in the future). My only comment is on the Request as the query, candidates wrapper. I think we should reserve Request and Response for RestAPI input and output. I think we can call the wrapper class something like RankingInput or RankingInstance or something like this. Later on a List[RankingInput] can be included in the RestAPI's request as is. Similarly, the output wrapper name could be RankingResult.

Not sure how much I agree with this. At a high level, shouldn't methods take conceptual versions of the objects that abstract away from the implementation? That is, we should have a Request "concept" independent of whether it's implemented via REST, protobufs, or whatever. The method should hide invocation details, object marshaling, etc. behind the scenes? And if we really need, we'll have concrete subclasses like HttpRequest, ProtobufRequest, etc.

re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests, our calls to the llm backend is still online and blocking, which just have a for loop to iterate over. @ronakice did you have a cl to do an actual batch inference with the llm?

I really mean rerank_batch as in "rerank a batch of requests". To me at least, it doesn't specify an implementation... could be a for loop, a distributed scatter/gather call, or map (in MapReduce)?

My $0.02... but happy to discuss further though.

@sahel-sh
Copy link
Member

Thanks for explaining, I will implement s7 as is, then

@lintool
Copy link
Member Author

lintool commented Apr 22, 2024

Two more issues to sort out:

  1. JSON format.
  2. Anserini/Pyserini command-line options.

For (2), in Anserini, SearchCollection currently takes -output to generate a standard TREC runfile. So I propose to add a new option-outputRerankerRequests to generate whatever we decide for (1) to feed into RankLLM. I propose to also add option -outputRerankerRequests.format, taking in either json or jsonl. I can see either being useful. In particular, it would make sense to output json pretty-printed (i.e., with proper human readable identation).

In Pyserini, the options would be --output-reranker-requests and --output-reranker-requests-format, following standard Python conventions.

Thoughts? Happy to consider alternate names for the options.

@lintool
Copy link
Member Author

lintool commented Apr 24, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants