ArXiv: Handle the "new" search interface #3168
zoe-translates
commented
Oct 23, 2023
- Detect the new (2018) search interface
- Move the code that obtains search results from the new search interface and from the old search/listing/catchup pages into their respective functions
- Asyncify doWeb() and doSearch()
- In the legacy search-result function, prefer selector-based approach to XPath
- Add test cases for new search interface
- Add a test for doSearch()
@ potential reviewers: This translator is for a very high-traffic service and it underpins some other ones. Please take extra care when reviewing. Thanks!
@@ -53,10 +53,10 @@ var arxivDOI;
 function detectWeb(doc, url) {
-	var searchRe = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:find|list|catchup)/;
+	var searchRe = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)\b/;
\b is a bit of a funky anchor to use in a non-natural-language regex (or even a natural-language regex, honestly, because it doesn't deal well with Unicode; I wish I had known that before I added it to a bunch of translators' data cleaning routines). Is there something specific we're trying to match here? Perhaps ($|[?#/]) instead?
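To make the difference between the two anchors concrete, here is a minimal sketch that runs both variants against a few URLs. The helper name compareAnchors and the sample URLs are illustrative assumptions, not part of the PR, and the second regex is only one possible reading of the suggestion above.

```javascript
// Variant from the diff: ends with a word-boundary anchor.
const withWordBoundary = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)\b/;

// Variant sketched from the review suggestion: the path segment must be
// followed by end-of-string, "?", "#", or "/".
const withExplicitEnd = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)(?:$|[?#\/])/;

// Hypothetical helper: reports whether each variant matches a URL.
function compareAnchors(url) {
	return {
		wordBoundary: withWordBoundary.test(url),
		explicitEnd: withExplicitEnd.test(url),
	};
}

// Sample URLs are made up for illustration; "search-tips" is not a known arXiv path.
const sampleUrls = [
	"https://arxiv.org/search/?query=test",
	"https://export.arxiv.org/list/math/new",
	"https://arxiv.org/search-tips",
	"https://arxiv.org/searching",
];

for (const url of sampleUrls) {
	console.log(url, compareAnchors(url));
}
```

The two variants disagree only when the segment is followed by a non-word character other than "?", "#", or "/", such as the hyphen in the third URL: \b accepts it, while the explicit character class does not.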
let dds = root.querySelectorAll("dl > dd");
if (dts.length !== dds.length) {
	Z.debug(`Warning: unexpected number of <dt> and <dd> elements: ${dts.length} !== ${dds.length}`);
}
Instead of grabbing all the dts and dds and warning if there's a mismatch, why not grab just the dts, iterate through the list, and for each one, check if its nextElementSibling is a dd? They are supposed to be adjacent children, right?
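The sibling-based approach could look roughly like the sketch below. The function name pairDefinitionItems and the "dl > dt" selector are assumptions made for illustration (the diff only shows the "dl > dd" query), so this is not the code that ended up in the translator.

```javascript
// Hypothetical helper: pairs each <dt> with the <dd> that immediately follows it.
// `root` is anything with querySelectorAll, e.g. a Document or an Element.
function pairDefinitionItems(root) {
	const pairs = [];
	// Assumed selector, mirroring the "dl > dd" query shown in the diff.
	for (const dt of root.querySelectorAll("dl > dt")) {
		const dd = dt.nextElementSibling;
		// Skip any <dt> whose next element sibling is missing or is not a <dd>.
		if (!dd || dd.tagName.toUpperCase() !== "DD") {
			continue;
		}
		pairs.push({ dt, dd });
	}
	return pairs;
}
```

Because each dt is matched only with its own neighbor, a stray or missing dd affects just that one entry instead of shifting every later pair, which is the failure mode the length check could only warn about.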