normalize "in" predicate as disjunction of "==" #423
base: master
Does the `any` call short-circuit? Edit: Yes, it does:

```
In [4]: class Boom:
   ...:     def __bool__(self):
   ...:         raise Exception()

In [7]: any([b])
---------------------------------------------------------------------------
Exception
....

In [8]: any([True, b])
Out[8]: True
```
Eventually, we might want to consider passing the filters directly to pyarrow, since it now implements this as well. I would expect it to handle these things much faster than we can in Python.
Do we have any benchmarks? My concern is that we now interact with the Parquet reader for every element in the list, and I don't know how expensive that is.
Very valid concern. We can isolate the operator evaluation logic in a separate function and call that directly, so that we don't have to worry about this overhead.
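Isolating the evaluation logic could look roughly like the sketch below: the column is loaded once, and a pure function evaluates each "==" against the in-memory data, so the reader is not touched once per list element. All names here (`evaluate_eq`, `evaluate_in`, and the `read_column` callback) are hypothetical stand-ins, not the project's actual API.

```python
from typing import Any, Callable, List, Sequence


def evaluate_eq(column: Sequence[Any], value: Any) -> List[bool]:
    """Pure "==" evaluation on already-loaded data; never touches the reader."""
    return [x == value for x in column]


def evaluate_in(
    read_column: Callable[[], Sequence[Any]], values: Sequence[Any]
) -> List[bool]:
    """Evaluate "in" as a disjunction of "==" while reading the column only once.

    `read_column` is a hypothetical callback standing in for the Parquet reader.
    """
    column = read_column()  # single round-trip to the reader
    mask = [False] * len(column)
    for value in values:
        # OR the per-value "==" mask into the running result.
        mask = [m or e for m, e in zip(mask, evaluate_eq(column, value))]
    return mask
```

With a stub reader that counts its calls, `evaluate_in(stub, [1, 2, 3])` triggers a single read no matter how many values the list holds; only the cheap in-memory comparison runs per value.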
cc @mlondschien, this might interest you. xref #325
It all depends on how complex the end state will be, since your intention is to simplify things. I think this would be OK, though.
Good point. Do you know how "mature" this logic is in pyarrow at the moment? It might make more sense to invest in using its functionality directly rather than in this kind of work.
Description:
The following two statements are equivalent:
x in [1,2,3]
(x == 1) or (x == 2) or (x == 3)
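A quick way to check the equivalence is to compare both forms on a range of inputs. This is a minimal sketch in plain Python; the helper names `in_predicate` and `disjunction_of_eq` are hypothetical and not part of the PR.

```python
def in_predicate(x, values):
    # Membership test, as in: x in [1, 2, 3]
    return x in values


def disjunction_of_eq(x, values):
    # The same test written as (x == 1) or (x == 2) or (x == 3).
    # any() short-circuits, so it stops at the first value that matches.
    return any(x == v for v in values)


values = [1, 2, 3]
for x in range(-2, 6):
    assert in_predicate(x, values) == disjunction_of_eq(x, values)
```

One caveat for plain Python objects: `in` checks identity before equality, so with `nan = float("nan")`, `nan in [nan]` is `True` while `any(nan == v for v in [nan])` is `False`. Whether that edge case matters depends on how NaN values are handled in the filters.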
This approach simplifies the function: for an "in" predicate, it just calls itself with "==" once per value, instead of iterating over the values in a for loop that essentially re-runs the code block under `if op == "==":` for each one.
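A minimal sketch of the recursive normalization, assuming a hypothetical `evaluate_predicate(column, op, value)` that returns a boolean row mask over in-memory values (the real function's name and signature are not shown here). The "in" branch delegates each value to the "==" branch by calling itself and ORs the resulting masks.

```python
import operator
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical mapping from operator strings to Python comparisons.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate_predicate(column: Sequence[Any], op: str, value: Any) -> List[bool]:
    """Return a row mask for `column <op> value`."""
    if op == "in":
        # Normalize: x in [v1, v2, ...]  ->  (x == v1) or (x == v2) or ...
        # Each "==" goes through a recursive call instead of a duplicated loop body.
        result = [False] * len(column)
        for v in value:
            eq_mask = evaluate_predicate(column, "==", v)
            result = [r or e for r, e in zip(result, eq_mask)]
        return result
    compare = _COMPARISONS[op]
    return [compare(x, value) for x in column]
```

For example, `evaluate_predicate([1, 5, 3, 7], "in", [1, 2, 3])` returns `[True, False, True, False]`. An empty list yields an all-False mask, which matches `x in []` being `False`.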