Re-run technical metadata for all media items #1441

Open
3 of 7 tasks
peetucket opened this issue Dec 16, 2024 · 1 comment
peetucket commented Dec 16, 2024

After sul-dlss/technical-metadata-service#572 is merged, we should run all media items back through technical metadata generation in batches, so that volume levels are available and we can correctly send items to speechToTextWF. Otherwise, the work in #1439 will prevent these items from working correctly.

Blocked by sul-dlss/technical-metadata-service#572

  • Exported all 65k druids
  • Break into 10k chunks
  • Run each chunk
  • druids_1
  • druids_2
  • druids_3
  • druids_4
  • druids_5
  • druids_6
  • druids_7
peetucket commented Jan 7, 2025

From the technical-metadata-service README, it sounds like we need a list of all media druids in a file called druids.txt in the root of the Rails app on the server, and then run something like this:

RAILS_ENV=production bundler exec rake techmd:generate_for_moab_list['true']
  1. Produce the list of druids by filtering for media items and then running a report: https://argo.stanford.edu/report?f%5Bcontent_type_ssim%5D%5B%5D=media&f%5BobjectType_ssim%5D%5B%5D=item
  2. Download the report as CSV. Open it in Excel, remove all columns except the druid column, remove the header row, and save it as a txt file.
  3. Suggest working in 10k batches by splitting the list into multiple txt files.
  4. Place the txt files on the server. The one currently being run should be named druids.txt and sit in the root of the techmd app: scp ~/Downloads/techmd/druids_1.txt dor-techmd-prod-a.stanford.edu:/opt/app/techmd/dor_techmd/current/druids.txt
  5. Start the rake task above, which will queue all the jobs. Suggest running it in a screen session to avoid getting disconnected mid-queuing: screen -S queue
  6. Monitor queues at https://dor-techmd-prod-a.stanford.edu/queues/busy
  7. Monitor Honeybadger (HB) at https://app.honeybadger.io/projects/68956/faults?q=-is%3Aresolved+-is%3Aignored&sort=last_seen_desc
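
Steps 2 and 3 could also be scripted instead of done by hand in Excel. The following is only a rough sketch, not something taken from the technical-metadata-service repo: the function names are made up for illustration, and the column header "Druid" is an assumption about the Argo CSV export that should be checked against the downloaded file. The output names druids_1.txt, druids_2.txt, ... follow the chunk names in the checklist above. Druid values are written exactly as they appear in the CSV, so any prefix handling the rake task expects would still need to be confirmed.

```python
import csv
from pathlib import Path


def read_druids(csv_path, column="Druid"):
    """Return the non-empty values of one CSV column (header row excluded).

    The column name "Druid" is an assumption about the report export.
    """
    # utf-8-sig tolerates a byte-order mark, which spreadsheet exports often add.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        return [
            (row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()
        ]


def chunk(items, size=10_000):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_chunks(druids, out_dir, size=10_000):
    """Write each batch to druids_<n>.txt (one druid per line) and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, batch in enumerate(chunk(druids, size), start=1):
        path = out_dir / f"druids_{number}.txt"
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_chunks(read_druids("report.csv"), "techmd")` (both paths hypothetical) would leave one txt file per 10k batch in the techmd directory. Each file can then be copied to the server as druids.txt in turn, as in step 4.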
