-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CISG-Online #3169
base: master
Are you sure you want to change the base?
Conversation
Instead of filling the abstract, info is now saved as formatted notes. Now supports downloading multiple PDF attachments. Parsing logic cleaned up by avoiding trailing array functions.
57c52e4
to
91c2648
Compare
if (url.includes('caseId=')) { | ||
return 'case'; | ||
} | ||
else if (url.includes('search-for-cases')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always true due to target regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's better? A target regex that matches the entire website with a detectWeb that may return false, or a specific target regex where detectWeb always returns an itemType?
for (let block of blocks.values()) { | ||
result += `<h2>${label}</h2><ul>`; | ||
// TODO: check fragility | ||
let rows = block.querySelectorAll("div.row > div > div.row"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does look pretty fragile! Anything else we can match on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tragically no... Example:
<div class="is-content">
<div class="row">
<div class="col-xs-10">
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Name</b>
</div>
<div class="col-xs-12 col-lg-8">
Brands International Corporation
</div>
</div>
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Place of business</b>
</div>
<div class="col-xs-12 col-lg-8">
Canada
</div>
</div>
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Role in transaction</b>
</div>
<div class="col-xs-12 col-lg-8">
Seller
</div>
</div>
</div>
<div class="col-xs-2 text-right">
<img src="/css/flags/blank.gif" class="flag flag-ca align-top" />
</div>
</div>
</div>
Let's just say the website does not really adhere to semantic web...
"url": "https://cisg-online.org/search-for-cases?caseId=14425", | ||
"attachments": [ | ||
{ | ||
"title": "Full text of decision Original language: English", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full Text PDF
(can match on common title patterns like "Full text of decision"). Language should go in language
field on item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This database provides both the original language PDF and translations (most of the time). The name of the PDF files comes from the database itself. How should the files be "canonically" named in Zotero when there are multiple?
CISG-Online is (chiefly) a database of cases related to the UN Convention on United Nations Convention on Contracts for the International Sale of Goods. This translator allows Zotero to save cases found on there.