Complex books that can contain other books #12

Open
glenrobson opened this issue Sep 13, 2023 · 11 comments
Labels
discuss (Issues to flag for discussion), enhancement (New feature or request)

Comments

@glenrobson
Collaborator

As noted on the IIIF call, this currently isn't working:

https://archive.org/details/BollettiniEcn/

The manifest seems to show only the first item:

https://iiif.archive.org/iiif/BollettiniEcn/manifest.json

@digitaldogsbody
Collaborator

Complex item that also has things in subdirectories: https://iiif.archive.org/iiif/mareful/manifest.json

Mike to write up the subdirectory issue in more detail here.

@digitaldogsbody
Collaborator

digitaldogsbody commented Nov 2, 2023

For the item above, there are multiple JP2 zip files, both at the top level and in a subdirectory:

Root level: https://archive.org/download/mareful/Maariful-Quran-01%28Almodina.com%29_jp2.zip/
Subdir (new): https://archive.org/download/mareful/new/mareful-quran-01-new-edition_jp2.zip/

Cantaloupe can provide images (and info.json, etc.) for all of these; you just need to include the full path after the item name, with its slashes percent-encoded (%2F):
Root level: https://iiif.archive.org/image/iiif/3/mareful%2fMaariful-Quran-03(Almodina.com)_jp2.zip%2FMaariful-Quran-03(Almodina.com)_jp2%2fMaariful-Quran-03(Almodina.com)_0013.jp2/info.json
Subdir: https://iiif.archive.org/image/iiif/3/mareful%2fnew%2Fmareful-quran-08-new-edition_jp2.zip%2Fmareful-quran-08-new-edition_jp2%2Fmareful-quran-08-new-edition_0921.jp2/info.json
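A minimal Python sketch of building these info.json URLs. The function name image_info_url is hypothetical, and the layout it assumes (each _jp2.zip contains a folder named after the zip minus ".zip") is inferred from the two URLs above rather than from any documented convention.

```python
import posixpath
from urllib.parse import quote

# Image API base taken from the example URLs in this thread.
IMAGE_API_BASE = "https://iiif.archive.org/image/iiif/3/"


def image_info_url(item_id: str, zip_path: str, page_file: str) -> str:
    """Build the info.json URL for one page inside a JP2 zip.

    zip_path is the zip's path relative to the item root, e.g.
    "new/mareful-quran-08-new-edition_jp2.zip" for a file in a subdirectory.
    Assumption: the zip holds a folder named after the zip minus ".zip".
    """
    zip_name = posixpath.basename(zip_path)
    folder = zip_name[: -len(".zip")] if zip_name.endswith(".zip") else zip_name
    identifier = posixpath.join(item_id, zip_path, folder, page_file)
    # Percent-encode every "/" so the whole path is a single identifier;
    # parentheses are left as-is to match the example URLs.
    return IMAGE_API_BASE + quote(identifier, safe="()") + "/info.json"


print(image_info_url(
    "mareful",
    "new/mareful-quran-08-new-edition_jp2.zip",
    "mareful-quran-08-new-edition_0921.jp2",
))
```

The output differs from the hand-written URL only in the case of the %2F escapes, which percent-decoding treats as equivalent.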

So in theory it's not a problem for us to get these items into a manifest; it's more a question of approach.
My initial thinking is that the trigger logic is something like: if mediatype == 'texts' and count_of_originals > 1. In that case, we could consider returning a Collection instead of a manifest, with one manifest for each item. However, this means we would need to adapt the manifest generation code to handle multiple manifests per item, which might be awkward.
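A rough sketch of that trigger against the JSON returned by https://archive.org/metadata/<identifier>, which lists each file with name and source fields. Counting only original PDFs and _jp2.zip files is an assumption about what "originals" should mean here; the function names and the sample data are hypothetical.

```python
def count_book_originals(item: dict) -> int:
    """Count original PDFs and JP2 zips in an archive.org metadata API response.

    Hypothetical filter: other originals (e.g. the item's metadata XML can
    also be listed as original) are ignored so they do not trip the trigger.
    """
    return sum(
        1
        for f in item.get("files", [])
        if f.get("source") == "original"
        and f.get("name", "").lower().endswith((".pdf", "_jp2.zip"))
    )


def should_return_collection(item: dict) -> bool:
    """Proposed trigger: a texts item with more than one original book file."""
    mediatype = item.get("metadata", {}).get("mediatype")
    return mediatype == "texts" and count_book_originals(item) > 1


# Hypothetical response, trimmed to the fields used above.
sample = {
    "metadata": {"identifier": "example-item", "mediatype": "texts"},
    "files": [
        {"name": "volume-01.pdf", "source": "original"},
        {"name": "volume-01_jp2.zip", "source": "derivative"},
        {"name": "volume-02.pdf", "source": "original"},
        {"name": "example-item_meta.xml", "source": "original"},
    ],
}
print(should_return_collection(sample))  # prints True
```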

The other option would be to use the same trigger logic but instead produce a single manifest with every page (for this item it will be monstrously big!) and add a Range for each individual item within it.

Thoughts very much welcome!

digitaldogsbody added the discuss label on Nov 2, 2023
glenrobson changed the title from "Compex books that can contain other books" to "Complex books that can contain other books" on Jan 18, 2024
@glenrobson
Collaborator Author

This seems like a collection. We'll need to sort something out, since the URL will be /manifest.json; maybe we could forward it to collection.json.

glenrobson added this to the Spring 2024 sprint milestone on Feb 8, 2024
@glenrobson
Collaborator Author

Maybe add it to the isCollection code.

@glenrobson
Collaborator Author

Discussed this today, and Sara and Ben convinced Glen that it should be a single manifest with a structure, because there is no metadata for each individual item.

The test for whether an item falls into this use case is whether it has multiple image zip files. If it does, include all the images in the manifest, with one range per zip file.
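A sketch of that test plus the structures it would produce, using IIIF Presentation 3 Range objects. The helper names, the Range ID scheme, and the mapping from zip names to canvas IDs are all hypothetical; the real manifest code would supply its own canvas IDs.

```python
def has_multiple_image_zips(file_names: list) -> bool:
    """True when the item holds more than one JP2 image zip."""
    return sum(1 for name in file_names if name.lower().endswith("_jp2.zip")) > 1


def build_ranges(manifest_base: str, pages_by_zip: dict) -> list:
    """Return a Presentation 3 "structures" list with one Range per zip file.

    pages_by_zip maps each zip file name to its canvas IDs, in page order.
    """
    structures = []
    for index, (zip_name, canvas_ids) in enumerate(pages_by_zip.items(), start=1):
        structures.append({
            "id": f"{manifest_base}/range/r{index}",  # hypothetical ID scheme
            "type": "Range",
            "label": {"none": [zip_name]},
            "items": [{"id": cid, "type": "Canvas"} for cid in canvas_ids],
        })
    return structures


pages = {
    "volume-01_jp2.zip": ["https://example.org/canvas/1", "https://example.org/canvas/2"],
    "new/volume-02_jp2.zip": ["https://example.org/canvas/3"],
}
if has_multiple_image_zips(list(pages)):
    print(len(build_ranges("https://example.org/iiif/example-item", pages)))  # prints 2
```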

@digitaldogsbody
Collaborator

I think we can duplicate the metadata. The example item I am using (mareful) has 14,611 images across the various JP2 zip files. Do we really want to make a manifest with that many canvases? Will it work in any viewers without exploding?

@glenrobson
Copy link
Collaborator Author

Due to scale, back to collections again. Copy all the metadata and use the filename as the label. Work on seeAlso and rendering so that they only link to the relevant files.
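A minimal sketch of one child manifest under that approach: the item's IIIF metadata is copied wholesale, the file name becomes the label, and rendering points only at that manifest's own PDF. The function name and manifest ID are hypothetical; seeAlso could be narrowed to the matching files in the same way.

```python
import copy
import posixpath
from urllib.parse import quote


def child_manifest_stub(item_id: str, item_metadata: list, pdf_name: str, manifest_id: str) -> dict:
    """Skeleton Presentation 3 manifest for one PDF inside a multi-book item."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        # Use the file name (without any subdirectory) as the label.
        "label": {"none": [posixpath.basename(pdf_name)]},
        # Deep copy so later edits to one child do not leak into its siblings.
        "metadata": copy.deepcopy(item_metadata),
        # Link only to the PDF this manifest was built from.
        "rendering": [{
            "id": f"https://archive.org/download/{item_id}/{quote(pdf_name)}",
            "type": "Text",
            "label": {"en": ["PDF"]},
            "format": "application/pdf",
        }],
        "items": [],  # canvases for this PDF's pages would be added here
    }
```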

@saracarl
Collaborator

We should also test this with the following item (which has multiple PDFs and JPGs):
https://archive.org/details/st-anthony-relics-01/

@benwbrum
Collaborator

Looking at mareful:

  1. Test to see whether there is more than one original PDF in the item
  2. If there is, display a collection instead of an item manifest
  3. For each PDF, create an item manifest (copying the metadata from the collection) following the existing patterns.
  4. Add each item to the collection manifest
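Steps 1, 2 and 4 could look roughly like this in Python, again reading the files list from the archive.org metadata API. The collection and per-PDF manifest URL patterns are hypothetical placeholders, and the per-PDF manifests themselves (step 3) are assumed to come from the existing manifest code.

```python
from urllib.parse import quote

PRESENTATION_3 = "http://iiif.io/api/presentation/3/context.json"


def original_pdfs(files: list) -> list:
    """Step 1: names of original PDFs; JPGs and derivatives are skipped."""
    return [
        f["name"]
        for f in files
        if f.get("source") == "original" and f.get("name", "").lower().endswith(".pdf")
    ]


def build_collection(item_id: str, title: str, files: list, base: str):
    """Steps 2 and 4: a Collection listing one Manifest per original PDF.

    Returns None when there is at most one PDF, so the caller can fall back
    to the existing single-manifest code.
    """
    pdfs = original_pdfs(files)
    if len(pdfs) <= 1:
        return None
    return {
        "@context": PRESENTATION_3,
        "id": f"{base}/{item_id}/collection.json",  # hypothetical URL pattern
        "type": "Collection",
        "label": {"none": [title]},
        "items": [
            {
                # Hypothetical per-PDF manifest URL; the file name is fully escaped.
                "id": f"{base}/{item_id}/{quote(name, safe='')}/manifest.json",
                "type": "Manifest",
                "label": {"none": [name]},
            }
            for name in pdfs
        ],
    }
```

Because only original PDFs are counted, any images uploaded alongside them are ignored, which is the behavior Ben proposes for st-anthony-relics-01.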

@benwbrum
Copy link
Collaborator

For the strange one at https://archive.org/metadata/st-anthony-relics-01/, all the Internet Archive does is display the PDF files; it ignores the images that were uploaded with them. We should do the same, treating it as a multi-PDF item and regarding the images as dross.

glenrobson modified the milestones: Autumn 2024, Spring 2025 on Jan 9, 2025
4 participants