Feat pb 1160 harvest package data #57

Open
wants to merge 3 commits into
base: develop
from

Contributor

@msom msom commented Dec 18, 2024

This PR adds a stac_harvest management command to the distributions application to sync package distributions with service-stac.

The command uses the STAC collection ID as the package distribution slug and relies on the managed_by_stac flag to identify package distributions managed by service-stac.

The command ensures that for each STAC collection, a corresponding package distribution exists by:

  • Creating new package distributions if no package distribution matches the STAC collection.
  • Updating package distributions with a matching STAC collection. This will update the dataset and force the managed_by_stac flag to be set.
  • Removing package distributions with the managed_by_stac flag set but no matching STAC collection.
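
The three cases above can be sketched as plain set arithmetic. This is only an illustration, not the code in this PR: `plan_sync`, `SyncPlan`, and the `slug -> managed_by_stac` mapping are hypothetical names, and the real command works on Django models rather than dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Slugs to create, update, and remove (hypothetical helper type)."""
    create: frozenset
    update: frozenset
    remove: frozenset


def plan_sync(collection_ids, distributions):
    """Decide what to do for each slug.

    collection_ids: iterable of STAC collection IDs (used as slugs).
    distributions: dict mapping an existing package distribution slug
        to its managed_by_stac flag (assumed shape, for illustration).
    """
    collection_ids = set(collection_ids)
    existing = set(distributions)
    managed = {slug for slug, flag in distributions.items() if flag}
    return SyncPlan(
        # No distribution matches the collection yet: create one.
        create=frozenset(collection_ids - existing),
        # A distribution matches: update it and set managed_by_stac.
        update=frozenset(collection_ids & existing),
        # Managed by STAC but the collection is gone: remove it.
        remove=frozenset(managed - collection_ids),
    )
```

Note that a distribution without the managed_by_stac flag and without a matching collection is left untouched in this sketch, which matches the removal rule as described above.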

Assumptions and Behavior

  • Dataset Matching: A package distribution is created only if a dataset exists with a slug matching the STAC collection ID. If no such dataset exists, no package distribution will be created.
  • Provider Matching:
    • Each STAC collection must have exactly one provider, which should match the dataset’s provider (by its English name).
    • If a mismatch is detected, a warning is issued, but the package distribution is still created.
    • To suppress warnings for near-matching provider names (e.g., 'Federal Office for Agriculture - FOAG' vs. 'Federal Office for Agriculture FOAG'), a similarity ratio can be used.
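
One possible way to compare provider names with a similarity ratio is `difflib.SequenceMatcher` from the standard library. This is an assumption for illustration: the description does not say which similarity measure or threshold the command uses, and `providers_match` and `min_ratio` are hypothetical names.

```python
from difflib import SequenceMatcher


def providers_match(stac_name, dataset_name, min_ratio=1.0):
    """Return True if the two provider names are similar enough.

    min_ratio=1.0 requires an exact match; lower values tolerate
    small differences such as punctuation. The threshold and the use
    of SequenceMatcher are assumptions, not taken from the PR.
    """
    ratio = SequenceMatcher(None, stac_name, dataset_name).ratio()
    return ratio >= min_ratio


# The near-match from the description passes with a relaxed threshold
# but not with the strict default.
a = "Federal Office for Agriculture - FOAG"
b = "Federal Office for Agriculture FOAG"
strict = providers_match(a, b)
relaxed = providers_match(a, b, min_ratio=0.9)
```

With a check like this, a warning would be issued only when `providers_match` returns False; the package distribution is created either way.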

Customization Options

  • --url: Specifies the STAC endpoint. By default, the command uses the service-stac endpoint (https://data.geo.admin.ch/api/stac/v0.9).
  • --clear: Deletes all package distributions previously marked as managed by service-stac.
  • --dry-run: Runs the command without making any changes, allowing you to preview the actions.
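
The options above could be declared roughly as follows. This is a standalone `argparse` sketch under stated assumptions: in the actual command the arguments would be registered in the management command's `add_arguments` method, and `build_parser` and `DEFAULT_STAC_URL` are hypothetical names.

```python
import argparse

# Default endpoint as stated in the description.
DEFAULT_STAC_URL = "https://data.geo.admin.ch/api/stac/v0.9"


def build_parser():
    """Build a parser with the three options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="stac_harvest")
    parser.add_argument(
        "--url",
        default=DEFAULT_STAC_URL,
        help="STAC endpoint to harvest from",
    )
    parser.add_argument(
        "--clear",
        action="store_true",
        help="delete all package distributions managed by service-stac",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="preview the actions without making any changes",
    )
    return parser


# argparse exposes --dry-run as the attribute dry_run.
args = build_parser().parse_args(["--dry-run"])
```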

@msom msom requested a review from boecklic December 18, 2024 07:16
Contributor Author

msom commented Dec 18, 2024

@boecklic I'm not sure whether introducing pystac-client is overkill here (see 8ec3478).

@msom msom force-pushed the feat-PB-1160-harvest-package-data branch 2 times, most recently from c377ccf to 92d2a41 Compare December 19, 2024 13:02
@msom msom force-pushed the feat-PB-1160-harvest-package-data branch from 92d2a41 to 11ff3ba Compare December 19, 2024 13:42