Consider migrating from codesearch to zoekt #68

stapelberg · 2016-05-15T18:06:48Z

In initial tests, https://github.com/google/zoekt is between 2x and 10x faster than codesearch and degrades much more gracefully for pathological queries (queries which have many potential matches).

For 1.4G of source code, zoekt writes a 1.7G index, which is a 1.21x blow-up. Our nodes currently have 22-24G used and 52-54G available, so disk-wise, we could actually switch to zoekt.
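
As a rough sanity check of the disk numbers above, the blow-up factor and the headroom can be computed directly. This is a minimal sketch: the function names are made up for illustration, the 52G figure is the lower bound of the free space quoted above, and it is an assumption that all sizes use the same unit. Whether 1.4G is a node's full share of the source or only a test sample is not stated here, so `projectIndex` takes the source size as a parameter.

```go
package main

import "fmt"

// blowUp returns the index size relative to the source size.
func blowUp(sourceG, indexG float64) float64 {
	return indexG / sourceG
}

// projectIndex estimates the index size for a given amount of source,
// assuming the blow-up factor stays constant (an assumption, not measured).
func projectIndex(sourceG, factor float64) float64 {
	return sourceG * factor
}

// fits reports whether an index of the given size fits into the free space.
func fits(indexG, freeG float64) bool {
	return indexG <= freeG
}

func main() {
	const (
		sourceG = 1.4  // source size from the measurement above
		indexG  = 1.7  // zoekt index size from the measurement above
		freeG   = 52.0 // lower bound of the free space per node
	)
	factor := blowUp(sourceG, indexG)
	fmt.Printf("blow-up: %.2fx\n", factor)
	fmt.Println("fits:", fits(projectIndex(sourceG, factor), freeG))
}
```

If a node holds more source than the measured sample, calling `projectIndex` with that larger size gives a first estimate of the index size to compare against the free space.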

TODO list:

- How can we keep our incremental indexing, i.e. could we store one zoekt shard per package, and/or could we merge the per-package shards into a single big shard?
  - zoekt by default indexes into one file per repository, so if we treat one Debian package as one repository, we already get cheap updates.
- Which features (query keywords) would we need to drop, and which could we keep with a compatibility layer?
- Do we need to fork zoekt to get all the features our search result page has (context lines etc.)?
  - zoekt does not sort the results within a file, at least not within its own UI.
  - There are no context lines around matches in zoekt.
- How do we get our own ranking into zoekt?
- Could we use the repo/branch feature of zoekt for multiple Debian versions (e.g. sid, testing, …)?
  - How much extra disk space would adding other Debian versions need?
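
To illustrate the incremental-indexing item above: if every Debian package maps to one repository and therefore one index file, an update only needs to touch the packages whose version changed. The following is a minimal sketch of that bookkeeping, not zoekt code. `shardPlan` and `planUpdate` are hypothetical names, the package versions are made-up examples, and actually building or deleting a shard is left out because it depends on how zoekt is invoked.

```go
package main

import (
	"fmt"
	"sort"
)

// shardPlan lists which per-package shards to rebuild and which to delete.
// (Hypothetical type, for illustration only.)
type shardPlan struct {
	Reindex []string
	Remove  []string
}

// planUpdate compares the versions that are already indexed with the
// versions in the current archive. Both maps are package name -> version.
func planUpdate(indexed, current map[string]string) shardPlan {
	var plan shardPlan
	for pkg, version := range current {
		// New packages are missing from indexed, so the lookup yields ""
		// and they are scheduled for indexing as well.
		if indexed[pkg] != version {
			plan.Reindex = append(plan.Reindex, pkg)
		}
	}
	for pkg := range indexed {
		if _, ok := current[pkg]; !ok {
			plan.Remove = append(plan.Remove, pkg)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(plan.Reindex)
	sort.Strings(plan.Remove)
	return plan
}

func main() {
	// Made-up example data.
	indexed := map[string]string{"i3-wm": "4.12-1", "zsh": "5.2-3", "oldpkg": "1.0-1"}
	current := map[string]string{"i3-wm": "4.12-2", "zsh": "5.2-3", "newpkg": "0.1-1"}

	plan := planUpdate(indexed, current)
	fmt.Println("reindex:", plan.Reindex)
	fmt.Println("remove:", plan.Remove)
}
```

Unchanged packages (`zsh` in the example) never appear in the plan, which is what makes per-package shards cheap to update. Merging the shards into one big shard would give up this property unless the merge itself is incremental.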

stapelberg mentioned this issue Jun 22, 2017

Evaluate how much the Hyperscan approach speeds up searching #82

Open
