-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SynonymV2GraphFilterFactory and NrtsearchSynonymParser #632
Conversation
assertAnalyzesTo( | ||
analyzer, | ||
"str", | ||
new String[] {"strada", "strasse", "straxdfe", "str"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something is not working right with string processing. \xDF
should decode to a single character, not xdf
.
|
||
class NrtsearchSynonymParser extends SynonymMap.Parser { | ||
private final boolean expand; | ||
private static final String SYNONYMS_SEPARATOR = "\\|"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this separator be made configurable?
Analyzer analyzer; | ||
String analyzerClassName = MessageFormat.format(LUCENE_ANALYZER_PATH, analyzerName); | ||
try { | ||
analyzer = | ||
(Analyzer) | ||
Analyzer.class | ||
.getClassLoader() | ||
.loadClass(analyzerClassName) | ||
.getDeclaredConstructor() | ||
.newInstance(); | ||
} catch (InstantiationException | ||
| IllegalAccessException | ||
| NoSuchMethodException | ||
| ClassNotFoundException | ||
| InvocationTargetException e) { | ||
throw new RuntimeException(e); | ||
} | ||
return analyzer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be replaced with a call to https://github.com/Yelp/nrtsearch/blob/master/src/main/java/com/yelp/nrtsearch/server/luceneserver/analysis/AnalyzerCreator.java#L59
if (parserFormat.equals("nrtsearch")) { | ||
parser = new NrtsearchSynonymParser(separatorPattern, true, expand, analyzer); | ||
} else { | ||
throw new IllegalArgumentException( | ||
"The parser format: " + parserFormat + " is not valid. It should be nrtsearch"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this can only have one value, is it needed at all?
/** SPI name */ | ||
public static final String NAME = "synonymV2"; | ||
|
||
public static final String MAPPINGS = "mappings"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be named synonyms
for consistency
public TokenStream create(TokenStream input) { | ||
return new SynonymGraphFilter(input, synonymMap, ignoreCase); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this method in SynonymGraphFilter
, it should also handle the no synonyms case
import org.apache.lucene.analysis.synonym.SynonymMap; | ||
import org.apache.lucene.analysis.util.TokenFilterFactory; | ||
|
||
public class SynonymV2GraphFilterFactory extends TokenFilterFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a class docstring outlining usage and configuration of this token filter. Also, add a reference in https://github.com/Yelp/nrtsearch/blob/master/docs/analysis.rst
No description provided.