
SynonymV2GraphFilterFactory and NrtsearchSynonymParser #632

Merged: 3 commits, merged on Mar 20, 2024


swethakann (Contributor)

No description provided.

assertAnalyzesTo(
    analyzer,
    "str",
    new String[] {"strada", "strasse", "straxdfe", "str"},
Contributor

I think something is not working right with the string processing: \xDF should decode to a single character (ß), not to the literal xdf.
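For illustration, a minimal sketch of the decoding the reviewer expects, assuming synonym entries write a character as a two-digit hex escape of the form \xHH. The escape convention and the helper name decodeHexEscapes are assumptions, not the parser's actual code. With this decoding, stra\xDFe becomes straße (U+00DF is ß) instead of straxdfe.

```java
public class HexEscapeDemo {

  /**
   * Hypothetical helper: replaces each backslash-x escape followed by two hex digits with the
   * single character it encodes. Any other backslash sequence is copied through unchanged.
   */
  static String decodeHexEscapes(String s) {
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (c == '\\'
          && i + 3 < s.length()
          && s.charAt(i + 1) == 'x'
          && isHexDigit(s.charAt(i + 2))
          && isHexDigit(s.charAt(i + 3))) {
        // Two hex digits become one char, e.g. "DF" -> U+00DF (sharp s).
        out.append((char) Integer.parseInt(s.substring(i + 2, i + 4), 16));
        i += 4;
      } else {
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  private static boolean isHexDigit(char c) {
    return Character.digit(c, 16) >= 0;
  }

  public static void main(String[] args) {
    String decoded = decodeHexEscapes("stra\\xDFe");
    System.out.println(decoded.equals("stra\u00DFe")); // true
    System.out.println(decoded.length()); // 6
  }
}
```

Decoding to one char is what lets "straße" and "strasse" remain distinct tokens, rather than producing the mangled "straxdfe" seen in the test expectation.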


class NrtsearchSynonymParser extends SynonymMap.Parser {
  private final boolean expand;
  private static final String SYNONYMS_SEPARATOR = "\\|";
Contributor

Can this separator be made configurable?
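One way the separator could be made configurable, sketched under assumptions: the separator is supplied as a plain string and wrapped with Pattern.quote, so a character such as | is matched literally (the hard-coded "\\|" above escapes it by hand for the same reason). The class and the helper splitSynonyms are hypothetical and are not the parser's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SeparatorDemo {

  /**
   * Hypothetical helper: splits one synonym line on a caller-supplied separator. The separator
   * is matched literally, so "|" does not need to be escaped by the caller.
   */
  static List<String> splitSynonyms(String line, String separator) {
    if (separator == null || separator.isEmpty()) {
      throw new IllegalArgumentException("separator must be a non-empty string");
    }
    // A real parser would compile this once (e.g. in its constructor) rather than per line.
    Pattern pattern = Pattern.compile(Pattern.quote(separator));
    return Arrays.stream(pattern.split(line))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(splitSynonyms("strada|strasse|str", "|")); // [strada, strasse, str]
    System.out.println(splitSynonyms("strada, strasse, str", ",")); // [strada, strasse, str]
  }
}
```

Quoting the user-supplied value avoids surprises when someone configures a separator that happens to be a regex metacharacter.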

swethakann changed the title from "SynonymV2GraphFilterFactory POC code" to "SynonymV2GraphFilterFactory and NrtsearchSynonymParser" on Mar 19, 2024
swethakann marked this pull request as ready for review on March 19, 2024 at 16:02
swethakann requested a review from aprudhomme on March 19, 2024 at 16:38
Comment on lines 103 to 120
Analyzer analyzer;
String analyzerClassName = MessageFormat.format(LUCENE_ANALYZER_PATH, analyzerName);
try {
  analyzer =
      (Analyzer)
          Analyzer.class
              .getClassLoader()
              .loadClass(analyzerClassName)
              .getDeclaredConstructor()
              .newInstance();
} catch (InstantiationException
    | IllegalAccessException
    | NoSuchMethodException
    | ClassNotFoundException
    | InvocationTargetException e) {
  throw new RuntimeException(e);
}
return analyzer;
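Since Java 7, all five checked exceptions caught above (InstantiationException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException, InvocationTargetException) extend ReflectiveOperationException, so the multi-catch could be collapsed into one clause. A minimal, self-contained sketch of that pattern follows; the helper newInstanceOf is hypothetical, and it uses Class.forName with a JDK class in place of the Lucene analyzer lookup so that it runs on its own.

```java
public class ReflectiveLoadDemo {

  /**
   * Hypothetical helper: loads className, checks that it is a subtype of type, and invokes its
   * no-arg constructor. asSubclass throws an unchecked ClassCastException on a type mismatch.
   */
  static <T> T newInstanceOf(String className, Class<T> type) {
    try {
      Class<? extends T> clazz = Class.forName(className).asSubclass(type);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // One clause covers ClassNotFoundException, NoSuchMethodException,
      // InstantiationException, IllegalAccessException and InvocationTargetException.
      throw new IllegalArgumentException("Could not instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    CharSequence cs = newInstanceOf("java.lang.StringBuilder", CharSequence.class);
    System.out.println(cs.getClass().getName()); // java.lang.StringBuilder
  }
}
```

Wrapping the failure with the class name also makes a misspelled analyzer name easier to diagnose than a bare RuntimeException.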

Comment on lines 92 to 97
if (parserFormat.equals("nrtsearch")) {
  parser = new NrtsearchSynonymParser(separatorPattern, true, expand, analyzer);
} else {
  throw new IllegalArgumentException(
      "The parser format: " + parserFormat + " is not valid. It should be nrtsearch");
}
Contributor

If this can only have one value, is it needed at all?
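If more formats might be added later, one option is to keep the setting but make it optional, defaulting to nrtsearch; otherwise it could simply be dropped. For comparison, Lucene's own SynonymGraphFilterFactory accepts a format argument that defaults to solr. The sketch below assumes the filter's arguments arrive as a Map<String, String>; the key parserFormat and the helper resolveParserFormat are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ParserFormatDemo {

  // Hypothetical: the only format currently supported doubles as the default.
  static final String DEFAULT_FORMAT = "nrtsearch";
  static final Set<String> SUPPORTED_FORMATS = Set.of(DEFAULT_FORMAT);

  /** Reads the optional "parserFormat" argument, falling back to the default when absent. */
  static String resolveParserFormat(Map<String, String> args) {
    String format = args.getOrDefault("parserFormat", DEFAULT_FORMAT);
    if (!SUPPORTED_FORMATS.contains(format)) {
      throw new IllegalArgumentException(
          "The parser format: " + format + " is not valid. Supported: " + SUPPORTED_FORMATS);
    }
    return format;
  }

  public static void main(String[] args) {
    System.out.println(resolveParserFormat(new HashMap<>())); // nrtsearch
    System.out.println(resolveParserFormat(Map.of("parserFormat", "nrtsearch"))); // nrtsearch
  }
}
```

With a default in place, existing configurations never need to mention the format, while the validation still rejects unknown values.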

/** SPI name */
public static final String NAME = "synonymV2";

public static final String MAPPINGS = "mappings";
Contributor

Could this be named synonyms for consistency?

Comment on lines 79 to 81
public TokenStream create(TokenStream input) {
  return new SynonymGraphFilter(input, synonymMap, ignoreCase);
}
Contributor

Looking at this method in SynonymGraphFilter, it should also handle the case where there are no synonyms.

import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class SynonymV2GraphFilterFactory extends TokenFilterFactory {
Contributor

Could you add a class docstring outlining the usage and configuration of this token filter? Also, please add a reference to it in https://github.com/Yelp/nrtsearch/blob/master/docs/analysis.rst

swethakann requested a review from aprudhomme on March 20, 2024 at 18:42
swethakann merged commit 92a8b5b into master on Mar 20, 2024
2 checks passed
2 participants