Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer full-text query match on ordering #34

Closed
jmargatan opened this issue Nov 24, 2015 · 29 comments
Closed

Prefer full-text query match on ordering #34

jmargatan opened this issue Nov 24, 2015 · 29 comments

Comments

@jmargatan
Copy link

Consider the following use case.

Tab Opens:

  • George Matthew - Yahoo Mail
  • Inbox - george.matthew@gmail.com - Gmail

When the query is gmail, the results are ordered as such (bold indicates highlighted letters):

  • George Matthew - Yahoo Mail
  • Inbox - george.matthew@gmail.com - Gmail

The proposal is to return the tab with full-text match on top of the fuzzy result, here is how it looks:

  • Inbox - george.matthew@gmail.com - Gmail
  • George Matthew - Yahoo Mail

@janraasch: What do you think? :)

@janraasch
Copy link
Owner

Hi @jmargatan, you are absolutely right. There is an open issue on the fuzzy search module used here: mattyork/fuzzy#3. I am sure they would accept a pull request implementing this.

@jmargatan
Copy link
Author

@janraasch: I glanced through mattyork/fuzzy's code. The current aggressive/greedy matching strategy will need some major refactoring to have full-text capability and this may not align with the direction of the project, "1k standalone fuzzy search / fuzzy filter".

After digging into this, I think the most optimal way to achieve a full-text search will require some dynamic programming approach, commonly referred to as the LCS problem. It won't be as fast as greedy matching--"fast" is a relative term--but we can tweak it a little bit by caching intermediary computational result.

As I start typing my library I bumped into jeancroy/fuzzaldrin-plus. I read through the description, it looked promising, haven't looked at the code yet, but it seems to align to what we want to achieve. Would you be open to check if this would be a good alternative? Thanks in advance.

@janraasch
Copy link
Owner

Hi @jmargatan. Thank you for your comment. I appreciate that you have not forgotten about this.

I would like to reply with two things.

  1. I think considering the use case of fuzzy.js here, which is really just filtering through a couple of urls and titles (let's assume a maximum of 100 or 200 items/urls+titles to search), we do not necessarily have to solve the LCS problem in its entirety and most performant way to satisfy the use case at hand. There is already a pull request (Added sublime fuzzy search behavior. mattyork/fuzzy#12) on the fuzzy.js repo which looks promising. What do you think?
  2. Of course fuzzaldrin-plus looks great, too. Plus I am pretty sure it is well proven at least for the file path use case. However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin. Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '<bold>', post: '</bold>' } to fuzzy.js.

@janraasch
Copy link
Owner

Hi @jmargatan, I finally remember the other js-library I really wanted to use for this, but could not at the time. Now, once krisk/Fuse#6 lands in a non-beta release of https://github.com/krisk/Fuse I think this will fix our issue.

@janraasch
Copy link
Owner

Alright, good news. This seems to work. Would you be willing to test this? I released a beta for you to check it out: https://github.com/janraasch/tab-ahead/releases/tag/1.3.0-beta1. You can download the TabAhead.zip file, then unzip it, and under chrome://extensions/ enable Developer mode to be able to Load unpacked extension....

I would love to get your feedback on this, @jmargatan.

@jmargatan
Copy link
Author

Woohoo. I'll give it a try once I get home tonight. Thanks @janraasch!

@jmargatan
Copy link
Author

@janraasch I wasn't able to test it. I disabled the original one and upload the beta version as you suggested but Alt+T does not work and the search box itself does not produce any matches. Thoughts?

@janraasch
Copy link
Owner

hi @jmargatan, I'm sorry. You're right. There is some problem when packaging the new dependencies. Don't worry, I'll figure it out and release another beta. Thank you for being willing to test this.

@janraasch
Copy link
Owner

Alright, can you please try again with https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta2.

Also, you do not have to deinstall the original version from the app store, the two can run alongside each other, so you can really compare the results.

Finally, the keyboard shortcut can be configured on the chrome://extensions/ page as well. There is a link named Keyboard shortcuts on the bottom right.

Let me know, when you get around to having a look at this, @jmargatan,

@jmargatan
Copy link
Author

@janraasch, I think this new one is much better. :shipit: Thanks for addressing this issue, I really appreciate it.

screen shot 2016-03-04 at 10 33 27 am

Also thanks for pointing the keyboard shortcut. Although after uploading the the beta, the Alt-T shortcut stopped working. I have yet to figure that out.

@janraasch
Copy link
Owner

Very well then, I'll cut a release once the version of Fuse.js we use is out of beta. Thank you for your help. I'll close this issue, once the new version is available on the Chrome Web Store.

@jeancroy
Copy link

Hi, author of fuzzaldrin-plus here, sorry of being late to the party.

Of course fuzzaldrin-plus looks great, too. Plus I am pretty sure it is well proven at least for the file path use case.

Actually both works moslty the same, the main difference is that there's a second match context in the file name.

However I could not find an API reference on the repo. It does say that it is compatible with fuzzaldrin.

Yes it was written as a substitute for addressing user report in atom text editor.
It may now have a live of it's own do I'll fix the documentation accordingly.
Basic is this

fuzzaldrin  = require("fuzzaldrin-plus")

//Filter a list of candidate - strings
candidates = ...
query = ...
results = fuzzaldrin.filter(candidates, query)

//Filter list of candidate objects
results = fuzzaldrin.filter(candidates, query,  key: 'name')

Do you know, if there is a way to get the positions of the matched characters in the string or is there another way to easily wrap the matched characters? Something like passing options = { pre: '', post: '' } to fuzzy.js.

character_positions = fuzzaldrin.match(candidate, query)

I dont really like this syntax but it was what was required for atom. I may provide a function that wrap inside specified stard/end tag.

I'll cut a release once the version of Fuse.js we use is out of beta.

Fuse is based on edit distance. And thus is much better comparing word (eg spellcheck) between them than expanding shortcut into a large result. I see v2 of Fuse tokenize the candidate into multiple word then do the comparison over each, summing result.

That approach may works well, up to the point the user type a bit of this word, and a bit of that word. You'll also see in the screenshot that every email address part is compared to gmail so a lot is highlighted.

@jmargatan
Copy link
Author

@janraasch: The current solution is fantastic but some side of the aggressive matching still hurts the performance. Please refer to the example below. What's your thought on it? :)

screen shot 2016-04-19 at 10 29 19 am

Thanks for the insight on jeancroy/fuzzaldrin-plus, @jeancroy, would be interesting to compare the result between these two.

@janraasch
Copy link
Owner

janraasch commented Apr 19, 2016

@jeancroy, thank you very much for your detailed explanation. I am sorry for answering so late, but I am quiet busy as I am sure we all are (open source... :)).

@jmargatan Could you elaborate a little bit on what you mean by

aggressive matching still hurts the performance

I am sorry, but I am not sure what you mean just by looking at the screenshot.

And, lastly:

would be interesting to compare the result between these two.

Definitely, I agree. I will make sure to do this. (Now that I know the API ;))

@jmargatan
Copy link
Author

@janraasch Certainly. :)

On the screenshot, I was trying to illustrate there are 2 candidates and the first one does not have the full text, "yahoo mail", but the the second one does at the end. In this case, I would assume the second candidate should have been ranked higher. At least, from LCS perspective. What's your thought on it?

@janraasch
Copy link
Owner

Just to clarify is that screenshot taken using the beta version or the one currently available on the web store?

@jmargatan
Copy link
Author

This is from the Chrome Web Store version.

screen shot 2016-04-19 at 12 19 48 pm

I am not aware of a way to update a Chrome Extension. I assume I am on the latest version. Although the Web Store does say "Updated: November 16, 2015".

screen shot 2016-04-19 at 12 22 47 pm

@janraasch
Copy link
Owner

Ok.

@jeancroy
Copy link

jeancroy commented Apr 20, 2016

Would be interesting to compare the result between these two

Original
gmail0

Fuse.js beta
fuse

FuzzAldrin-Plus
gmail

Made this demo to toy with the project, hopefully it clarify the api/usage a bit too.

@jmargatan
Copy link
Author

@jeancroy: Whoa this is cool! Let's see what @janraasch thinks. :)

@janraasch
Copy link
Owner

janraasch commented Apr 20, 2016

IMHO the Fuse.js beta version seems to work fine, especially, with this commit 877dbb8 on master, which takes care of making the results less noisy by only highlighting matches with more than one character. I should probably release a new beta using the current master so you can check it out, but maybe it is also easy enough to imagine the screenshot above w/o the single character matches :). What do you think, @jmargatan?

@janraasch
Copy link
Owner

janraasch commented Apr 20, 2016

@jeancroy: For your examples you seem to be only searching in the title String, can you search through title and url like with Fuse.js?

var tabs = [{title: 'my-title', url: 'my-url'}]
var options = {
  keys: ['url', 'title']
}
var f = new Fuse(tabs, options);
var result = f.search('my');

@janraasch
Copy link
Owner

Alright, so here https://github.com/janraasch/tab-ahead/releases/tag/v1.3.0-beta3 is the current version built from master.

You can download the TabAhead.zip file, then unzip it, and under chrome://extensions/ enable Developer mode to be able to Load unpacked extension.... The keyboard shortcut can be configured on the chrome://extensions/ page as well. There is a link named Keyboard shortcuts on the bottom right.

Could you take this for a spin, @jmargatan? Maybe compare the results with your screenshot from this comment #34 (comment).

@jmargatan
Copy link
Author

@janraasch Thanks for packing up this build. I tested with both scenarios:

screen shot 2016-04-20 at 1 25 32 am

screen shot 2016-04-20 at 1 26 28 am

From the second screenshot, the ranking does improve although the highlighted characters are rather confusing, note that the oo at ...Google Search... was highlighted.

@janraasch
Copy link
Owner

Ok, one little caveat as this discussion is getting quiet long: We have to be a little bit careful not too overestimate the meaning of a few handpicked samples, because there are currently about 4600 people using this extension (with the old version) and none of them complained even though the current algorithm used is simply incorrect from a technical point of view.

Which is why I would say even though

highlighted characters are rather confusing, note that the oo at ...Google Search... was highlighted.

this is still a better solution, because

ranking does improve

so I would like to

  • Take this version (the current beta using Fuse.JS) and release that as v1.3.0 to the public.
  • Close this issue and see what the feedback of the other users is like. (If there is no feedback, we did well, that is free software for you :)). If user counts start dropping then we'll rollback and go on from there.

What do you say?

@jeancroy
Copy link

jeancroy commented Apr 20, 2016

For your examples you seem to be only searching in the title String, can you search through title and url like with Fuse.js?

There's the api to search for a specific key. fuzzaldrin.filter(candidates, query, {key:"title"})There's not yet the API to search for multiple of them. I'm thinking of averaging the score over search space, maybe let the user specify a weigh for every property. Other option is to select the best field, and still weight from which field it is.

I may add it, I'm a bit busy now, and most importantly, i'm now being super careful with updating that project because atom has so many users. It is still possible to achieve this goal using the fuzzaldrin.score(candidate,query) inside a double loop: foreach items / foreach indexedFields

none of them complained even though the current algorithm used is simply incorrect from a technical point of view.

I guess it depend on the complexity of the search text as well as the expectation of the user.
One thing I think fuse will handle better is spelling mistake (gnail vs gmail)
One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail)
If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.

Close this issue and see what the feedback of the other users is like.

That sound like I plan. I can ping back if the multiple field feature is implemented. Or more cleanly you can open/subscribe to an issue that ask for it

@janraasch
Copy link
Owner

@jeancroy thank you very much for the detailed insight into fuzzaldrin and your informed opinion on the fuse vs. fuzzaldrin matter considering the special use case at hand.

With your comment

One thing I think fuse will handle better is spelling mistake (gnail vs gmail)
One thing I think fuzzaldrin will handle better is acronym. (GMYM vs George Matthew Yahoo Mail)
If people don't expect to search by acronym then fuse is probably the library that meet your immediate need.

I am now even more confident to release the fuse-version to the world.

@jmargatan
Copy link
Author

I agree with you both. Thank you both for looking into this.

@janraasch
Copy link
Owner

janraasch commented Apr 26, 2016

Just released v1.3.0 to the Chrome Web Store.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants