-
-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce len() calls #412
base: main
Are you sure you want to change the base?
reduce len() calls #412
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really convinced this will make a measurable difference in performance. However I would be fine with the changes in the core library as long as the bug in the Cython implementation is fixed.
bench/benchmark_scorer.py
Outdated
print("Words :", len(words)) | ||
print("Sample:", len(sample)) | ||
print("Words :", len_words) | ||
print("Sample:", len_sample) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in the bench directory aren't relevant, since they are outside the benchmarks. So readability is more important here. Really this part is super slow because of:
words = ["".join(random.choice(string.ascii_letters + string.digits) for _ in range(10)) for _ in range(10000)]
anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So readability is more important here.
That makes sense. Reverted changes to the bench
folder.
src/rapidfuzz/process_cpp_impl.pyx
Outdated
@@ -1208,6 +1208,7 @@ def extract(query, choices, *, scorer=WRatio, processor=None, limit=5, score_cut | |||
cdef RF_Scorer* scorer_context = NULL | |||
cdef RF_ScorerFlags scorer_flags | |||
cdef int64_t c_limit | |||
cdef int64_t choices_len = <int64_t>len(choices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash when choices is a generator. So this would need to be handled inside the try/except
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, nice catch, changed: 74b248d
src/rapidfuzz/process_cpp_impl.pyx
Outdated
if limit is None or limit > choices_len: | ||
limit = choices_len | ||
|
||
c_limit = limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unnecessary to convert choices_len
back to a Python Object here. This could probably be something like:
if limit is None or limit > choices_len: | |
limit = choices_len | |
c_limit = limit | |
c_limit = choices_len | |
if limit is not None: | |
c_limit = min(c_limit, <int64_t>limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I'm not really sure what's happening here. Sounds good, made that change.
Reduce duplicate
len()
calls by assigning length to variable.Probably only has a microscopic effect on performance, but it is something.