Ingest enrichment into K10plus #10

Open
nichtich opened this issue Dec 2, 2024 · 2 comments

@nichtich
Member

nichtich commented Dec 2, 2024

It turns out we cannot have a simple HTTP API to write PICA+ to K10plus. Instead, the process will be a little more complicated, as follows:

  • coli-rich-web gets an API endpoint enrichment
  • enrichment is called via POST with the client function submitEnrichments. The API endpoint checks the token and payload. On success it
    • generates a UUID
    • writes the enrichment to an internal queue
    • returns HTTP status code 201 "Created" and a Location header enrichment/$UUID
  • GET enrichment/$UUID for an existing UUID in the queue returns the enrichment payload, with its creation timestamp as an HTTP header
  • GET enrichment returns a list of the entries in the queue (plain text, one entry per line: URL, followed by a space, followed by the timestamp)
  • DELETE enrichment/$UUID for an existing UUID is only allowed from localhost
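
To make the proposed behaviour concrete, here is a minimal in-memory sketch of the queue in Python. It is only a model of the list above, not the coli-rich-web code: the class name `EnrichmentQueue`, the header name `X-Created`, the error codes 400, 401, 403 and 404, and the 204 on delete are all assumptions, and the Location value is assumed to match the GET path enrichment/$UUID.

```python
import uuid
from datetime import datetime, timezone

# Addresses treated as "localhost" in this sketch (assumption).
LOCALHOST = {"127.0.0.1", "::1"}


class EnrichmentQueue:
    """Hypothetical in-memory stand-in for the internal enrichment queue."""

    def __init__(self):
        self._entries = {}  # UUID string -> (payload, creation timestamp)

    def post(self, payload, token_valid):
        """Model of POST enrichment: returns (status, headers)."""
        if not token_valid:
            return 401, {}  # assumed status for a rejected token
        if not payload:
            return 400, {}  # assumed status for a rejected payload
        entry_id = str(uuid.uuid4())
        self._entries[entry_id] = (payload, datetime.now(timezone.utc))
        return 201, {"Location": f"enrichment/{entry_id}"}

    def get(self, entry_id):
        """Model of GET enrichment/$UUID: returns (status, headers, payload)."""
        entry = self._entries.get(entry_id)
        if entry is None:
            return 404, {}, None
        payload, created = entry
        # The issue does not name the header; "X-Created" is a placeholder.
        return 200, {"X-Created": created.isoformat()}, payload

    def listing(self):
        """Model of GET enrichment: one 'URL timestamp' pair per line."""
        return "\n".join(
            f"enrichment/{entry_id} {created.isoformat()}"
            for entry_id, (_, created) in self._entries.items()
        )

    def delete(self, entry_id, remote_addr):
        """Model of DELETE enrichment/$UUID, restricted to localhost."""
        if remote_addr not in LOCALHOST:
            return 403
        if self._entries.pop(entry_id, None) is None:
            return 404
        return 204
```

A real implementation would sit behind an HTTP framework and persist the queue; the sketch only pins down the status codes and the shape of each response.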

This way a script on the K10plus server (called every minute or more frequently) can query for new enrichments to be ingested. Purging the enrichment queue is not the task of this script.

This is enough to start with a first version. For full production we need a method to purge the queue of enrichments. This can be done by a cronjob calling a script from the coli-rich-web directory that cycles through the enrichments, checks whether each has been applied (retrieve the PICA record via PPN and check its contents), and deletes those that have.
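
The purge step could look roughly like the following Python sketch. Everything here is hypothetical: the function `purge_applied` and its four callables are placeholders for whatever the cronjob script would actually use to list the queue, retrieve a PICA record by PPN, compare contents, and delete an entry. It also assumes the PPN can be derived from each queued enrichment.

```python
def purge_applied(list_queue, fetch_record, is_applied, delete_entry):
    """Delete queued enrichments that are already present in the catalogue.

    All four arguments are caller-supplied callables (assumptions):
      list_queue()                    -> iterable of (entry_id, ppn, enrichment)
      fetch_record(ppn)               -> PICA record, or None if not retrievable
      is_applied(record, enrichment)  -> True if the record contains the enrichment
      delete_entry(entry_id)          -> removes the entry from the queue
    Returns the list of deleted entry ids.
    """
    deleted = []
    # Materialise the listing first so deleting does not disturb iteration.
    for entry_id, ppn, enrichment in list(list_queue()):
        record = fetch_record(ppn)
        if record is not None and is_applied(record, enrichment):
            delete_entry(entry_id)
            deleted.append(entry_id)
    return deleted
```

Entries whose record cannot be fetched are left in the queue, so a temporary catalogue outage does not cause enrichments to be dropped.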

@stefandesu
Member

Should there be a check whether the same enrichment is already in the queue? Or should we simply hope that people won't get impatient? 😅

@stefandesu
Member

Implemented!

POST /enrichment

  • Takes PICA patch data as body
  • Creates a hash of that body as an identifier (instead of using a UUID)
  • Writes body to file system with identifier as filename
  • Returns HTTP status code 201 and full URI as Location header
  • Also returns a JSON body which is currently unused
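
As an illustration of the hash-as-identifier approach, here is a minimal Python sketch. It is not the actual coli-rich-web implementation: the function name `store_enrichment`, the choice of SHA-256, and the `base_url` parameter are assumptions, since the comment does not say which hash is used or how the full URI is built.

```python
import hashlib
from pathlib import Path


def store_enrichment(body: bytes, directory: Path, base_url: str):
    """Write the request body to a file named after its hash.

    Returns (status, headers) in the spirit of POST /enrichment.
    SHA-256 is an assumed choice of hash function.
    """
    identifier = hashlib.sha256(body).hexdigest()
    directory.mkdir(parents=True, exist_ok=True)
    (directory / identifier).write_bytes(body)
    return 201, {"Location": f"{base_url}/enrichment/{identifier}"}
```

With a content-derived identifier like this, posting an identical body twice maps to the same filename, so in this sketch a repeated submission overwrites the existing file instead of adding a second queue entry.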

GET /enrichment/:id

  • Tries to read id from file system
  • If successful, it returns the file's content as body, with the file's creation date as Date header
  • If unsuccessful, it returns an HTTP status code 404 (currently no further differentiation of errors)
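
A rough Python sketch of this read path, under stated assumptions: `read_enrichment` is a hypothetical name, the file's modification time stands in for its creation date (creation time is not portably available), and the alphanumeric check on the identifier is an added safeguard that the comment does not mention.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path


def read_enrichment(identifier: str, directory: Path):
    """Return (status, headers, body) in the spirit of GET /enrichment/:id."""
    # Assumed safeguard: reject anything that could escape the directory.
    if not identifier.isalnum():
        return 404, {}, b""
    path = directory / identifier
    try:
        body = path.read_bytes()
        # Modification time used as an approximation of the creation date.
        created = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    except OSError:
        # Any read failure maps to 404, with no further differentiation.
        return 404, {}, b""
    # format_datetime with usegmt=True yields an HTTP-style date ending in "GMT".
    return 200, {"Date": format_datetime(created, usegmt=True)}, body
```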

GET /enrichment

  • Reads all written enrichment files
  • Returns a newline-delimited plain-text list of enrichment URIs and creation timestamps (in UTC, I hope that's okay)
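
For a consumer such as the script on the K10plus server, parsing this listing could be sketched as follows in Python. The exact line format is an assumption: the sketch expects each line to hold a URI, a single space, and an ISO 8601 timestamp, and it treats timestamps without an offset as UTC. The function name `parse_queue_listing` is hypothetical.

```python
from datetime import datetime, timezone


def parse_queue_listing(text: str):
    """Parse 'URI timestamp' lines into a list of (uri, datetime) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        uri, _, stamp = line.partition(" ")
        # Older Python versions do not accept a trailing "Z", so normalise it.
        created = datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
        if created.tzinfo is None:
            created = created.replace(tzinfo=timezone.utc)  # assumed UTC
        entries.append((uri, created))
    return entries
```

If the real endpoint emits a different timestamp format, only the `fromisoformat` line would need to change.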

No DELETE /enrichment/:id at this point. It's still open whether files will be deleted manually or by something like a cronjob that checks whether enrichments were actually written into the catalogue. Currently, enrichments will pile up, but that shouldn't be an issue in the beginning.

I also adjusted the frontend so that success and error messages are displayed in a better way. You can test it here: https://coli-conc.gbv.de/coli-rich/dev/ / https://coli-conc.gbv.de/coli-rich/dev/enrichment
