
Bug: Ensure that we can run the db ingestion pipeline safely without --import-everything #1389

Open
jgadling opened this issue Dec 9, 2024 · 0 comments
Labels: backend, bug

jgadling (Contributor) commented Dec 9, 2024

Right now the DB ingestion workflow runs 3 steps:

  1. Ingest data into the v1 db
  2. Copy data from the v1 db to the v2 db -- we do this primarily because we need to keep IDs in sync between the old & new API
  3. Ingest data into the v2 db

However, step 2 indiscriminately copies all relevant data from the v1 db to the v2 db. The v2 db supports several features that the v1 db does not, so this copy is imperfect and can introduce errors into the v2 data. When we run the full v2 import after this db copy, the workflow is fairly safe to use, but a full import can be slow and costly when we only want to update a few fields.

I think the best solution here is to update the copy script (scrape.py) to accept most or all of the same flags that the ingestion scripts do, so we never wind up with stale data in the v2 db.
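One way to share flags between scrape.py and the ingestion scripts is a common argparse parent parser that all three steps import. This is only a sketch: `--import-everything` comes from this issue's title, but the per-entity flags and the entity names below are hypothetical placeholders, not the project's actual flags.

```python
import argparse


def build_shared_parser() -> argparse.ArgumentParser:
    """Flags shared between the ingestion scripts and scrape.py.

    --import-everything is the existing flag; the per-entity flags
    are hypothetical examples of selective imports.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--import-everything", action="store_true",
                        help="Ingest/copy all entity types")
    parser.add_argument("--import-datasets", action="store_true",
                        help="Hypothetical: only copy datasets")
    parser.add_argument("--import-runs", action="store_true",
                        help="Hypothetical: only copy runs")
    return parser


def selected_entities(args: argparse.Namespace) -> set:
    """Translate the shared flags into the set of entity types to process."""
    all_entities = {"datasets", "runs"}
    if args.import_everything:
        return all_entities
    return {e for e in all_entities if getattr(args, "import_" + e)}


if __name__ == "__main__":
    # scrape.py would build its own parser on top of the shared one, so
    # step 2 copies exactly the entity types that steps 1 and 3 ingest.
    parser = argparse.ArgumentParser(parents=[build_shared_parser()])
    args = parser.parse_args()
    print(sorted(selected_entities(args)))
```

With this shape, `scrape.py --import-datasets` would copy only datasets from v1 to v2, so a partial v2 import never leaves the other tables stale relative to what was copied.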
