
ArXiv: Handle the "new" search interface #3168

Closed


zoe-translates
Collaborator

  • Detect the new (2018) search interface
  • Move the code that obtains search results from the new search interface and from the old search/listing/catchup into their respective functions
  • Asyncify doWeb() and doSearch()
  • In the legacy search-result function, prefer a selector-based approach to XPath
  • Add test cases for the new search interface
  • Add a test for doSearch()
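For readers less familiar with the async style, here is a rough sketch of what an asyncified `doWeb()` can look like. It is modeled on Zotero's translator template as we understand it, not taken from this PR: `detectWeb()`, `getSearchResults()` and `scrape()` stand in for the translator's own functions, and `Zotero.selectItems()` and `requestDocument()` are assumed to be awaitable functions supplied by the translator environment.

```javascript
// Sketch of an async doWeb() (assumed pattern, not this PR's code).
// detectWeb(), getSearchResults() and scrape() are assumed to be defined
// elsewhere in the translator; Zotero.selectItems() and requestDocument()
// are assumed to come from the translator environment and return Promises.
async function doWeb(doc, url) {
	if (detectWeb(doc, url) == "multiple") {
		// Ask the user to choose from { url: title } pairs; resolves to the
		// chosen subset, or to a falsy value if the dialog is cancelled.
		const items = await Zotero.selectItems(getSearchResults(doc, false));
		if (!items) return;
		// Fetch and scrape the selected results one at a time.
		for (const itemURL of Object.keys(items)) {
			await scrape(await requestDocument(itemURL));
		}
	}
	else {
		await scrape(doc, url);
	}
}
```

Awaiting each `scrape()` in turn keeps the requests sequential, which is kinder to a high-traffic service than firing them all in parallel.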

- Detect the new (2018) search interface
- Move the code that obtains search results from the new search interface
  and from the old search/listing/catchup into their respective functions
- Asyncify doWeb
- For the legacy search function, prefer a selector-based approach to
  XPath
- Add test cases for new search
Collaborator Author

zoe-translates commented Oct 23, 2023

@ potential reviewers: This translator serves a very high-traffic site, and some other translators depend on it. Please take extra care when reviewing. Thanks!

@@ -53,10 +53,10 @@ var arxivDOI;


 function detectWeb(doc, url) {
-	var searchRe = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:find|list|catchup)/;
+	var searchRe = /^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)\b/;
Member

\b is a bit of a funky anchor to use in a non-natural-language regex (or even in a natural-language regex, honestly, because it doesn't handle Unicode well; I wish I had known that before I added it to a bunch of translators' data-cleaning routines). Is there something specific we're trying to match here? Maybe ($|[?#/])?
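To make the difference concrete, here is a small comparison of the PR's `\b` version against a variant ending in the suggested `(?:$|[?#/])`, written as a non-capturing group so the optional subdomain stays the only capture. Apart from `/search` and `/list/...`, the sample paths are hypothetical and exist only to probe edge cases.

```javascript
// Variant from the PR: a word boundary after the path segment.
const withWordBoundary =
	/^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)\b/;

// Suggested variant: the segment must be followed by the end of the string,
// a query string, a fragment, or another path segment.
const withExplicitEnd =
	/^https?:\/\/(?:([^.]+\.))?(?:arxiv\.org|xxx\.lanl\.gov)\/(?:search|find|list|catchup)(?:$|[?#/])/;

const samples = [
	"https://arxiv.org/search/?query=zotero&searchtype=all",
	"https://arxiv.org/list/cs.DL/recent",
	"https://arxiv.org/search",
	"https://arxiv.org/search-help", // hypothetical path
	"https://arxiv.org/find.html",   // hypothetical path
	"https://arxiv.org/listing",     // hypothetical path
	"https://arxiv.org/searché",     // non-ASCII: "é" is not \w, so \b matches before it
];

// Columns: \b result, explicit-end result, URL.
for (const url of samples) {
	console.log(`${withWordBoundary.test(url)}\t${withExplicitEnd.test(url)}\t${url}`);
}
```

Both patterns accept ordinary search and listing URLs and both reject `/listing`. Only the explicit version rejects `search-help`, `find.html` and `searché`; in the last case `\b` reports a boundary because `é` is not a `\w` character in a JavaScript regex.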

Comment on lines +108 to +111
+	let dds = root.querySelectorAll("dl > dd");
+	if (dts.length !== dds.length) {
+		Z.debug(`Warning: unexpected number of <dt> and <dd> elements: ${dts.length} !== ${dds.length}`);
+	}
Member

Instead of grabbing all the dts and dds and warning if there's a mismatch, why not grab just the dts, iterate through them, and for each one check whether its nextElementSibling is a dd? They're supposed to be adjacent siblings, right?
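A minimal sketch of that suggestion: walk only the `<dt>` elements and accept each one's `nextElementSibling` if it is a `<dd>`. The name `getResultPairs` and its `debug` callback are hypothetical; `root` is whatever element the legacy search-result function already queries, and in a translator the callback could be `(msg) => Z.debug(msg)`.

```javascript
// Hypothetical helper: pair each <dt> in the results <dl> with the <dd>
// immediately after it, instead of collecting both lists separately and
// comparing their lengths.
function getResultPairs(root, debug = () => {}) {
	const pairs = [];
	for (const dt of root.querySelectorAll("dl > dt")) {
		const dd = dt.nextElementSibling;
		// Log and skip an unpaired <dt> instead of letting it shift every
		// later entry out of alignment.
		if (!dd || dd.tagName.toUpperCase() !== "DD") {
			debug(`Warning: <dt> without an adjacent <dd>: ${dt.textContent.trim()}`);
			continue;
		}
		pairs.push({ dt, dd });
	}
	return pairs;
}
```

Pairing by adjacency means one malformed entry costs only that entry, whereas a count mismatch between two independent lists misaligns every pair after the first gap.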
