
Bug: Ensure that we can run the db ingestion pipeline safely without --import-everything #1389

Open
jgadling opened this issue Dec 9, 2024 · 0 comments
Labels: backend, bug

jgadling (Contributor) commented Dec 9, 2024

Right now the DB ingestion workflow runs 3 steps:

  1. Ingest data into the v1 db
  2. Copy data from the v1 db to the v2 db -- we do this primarily because we need to keep IDs in sync between the old & new API
  3. Ingest data into the v2 db

However, step 2 indiscriminately copies all relevant data from the v1 db to the v2 db. The v2 db supports several features that the v1 db does not, so this copy is imperfect and can introduce errors into the v2 data. When we run the full v2 import after this db copy, the workflow is fairly safe to use, but a full import can be slow and costly when we only want to update a few fields.

I think the best solution here is to update the copy script (scrape.py) to accept most or all of the same flags that the ingestion scripts do, so we never wind up with stale data in the v2 db.
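One way to share flags between scrape.py and the ingestion scripts is a common argparse parent parser that all three steps import. This is only a sketch: `--import-everything` comes from this issue's title, but the per-entity flags and the entity names below are hypothetical placeholders, not the project's actual flags.

```python
import argparse


def build_shared_parser() -> argparse.ArgumentParser:
    """Flags shared between the ingestion scripts and scrape.py.

    --import-everything is the existing flag; the per-entity flags
    are hypothetical examples of selective imports.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--import-everything", action="store_true",
                        help="Ingest/copy all entity types")
    parser.add_argument("--import-datasets", action="store_true",
                        help="Hypothetical: only copy datasets")
    parser.add_argument("--import-runs", action="store_true",
                        help="Hypothetical: only copy runs")
    return parser


def selected_entities(args: argparse.Namespace) -> set:
    """Translate the shared flags into the set of entity types to process."""
    all_entities = {"datasets", "runs"}
    if args.import_everything:
        return all_entities
    return {e for e in all_entities if getattr(args, "import_" + e)}


if __name__ == "__main__":
    # scrape.py would build its own parser on top of the shared one, so
    # step 2 copies exactly the entity types that steps 1 and 3 ingest.
    parser = argparse.ArgumentParser(parents=[build_shared_parser()])
    args = parser.parse_args()
    print(sorted(selected_entities(args)))
```

With this shape, `scrape.py --import-datasets` would copy only datasets from v1 to v2, so a partial v2 import never leaves the other tables stale relative to what was copied.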
