Cache latest version per dataset #528

Merged
merged 6 commits into from
May 28, 2024

Conversation

dmannarino
Member

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check that the message style of every commit matches our requested structure.
  • Check that your code additions fail neither the code linting checks nor the unit tests.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

  • get_latest_version is called A LOT, and every time it's called it makes a DB call.

Issue Number: N/A

What is the new behavior?

  • Cache the latest version value to reduce DB calls. Invalidate cache in new and update version functions, but set a TTL of 1 hour in case I missed something.
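
A minimal sketch of the pattern described above (cache per dataset, explicit invalidation on writes, and a TTL as a safety net). It is not the code from this PR: `TTLCache`, `fetch_latest_version`, and `demo` are hypothetical names, and a plain dictionary stands in for the database.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


class TTLCache:
    """Per-key async cache with a time-to-live (illustrative only)."""

    def __init__(
        self, loader: Callable[[str], Awaitable[str]], ttl_seconds: float
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        # key -> (cached value, monotonic expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    async def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh hit: no call to the loader
        value = await self._loader(key)  # miss or expired: reload
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> bool:
        """Drop one key; return True if an entry was actually removed."""
        return self._entries.pop(key, None) is not None


async def demo() -> Tuple[str, str, str, bool, str, int]:
    db = {"my_dataset": "v1"}  # hypothetical stand-in for the versions table
    calls = {"count": 0}

    async def fetch_latest_version(dataset: str) -> str:
        calls["count"] += 1  # counts simulated DB round trips
        return db[dataset]

    cache = TTLCache(fetch_latest_version, ttl_seconds=3600.0)

    first = await cache.get("my_dataset")   # miss: one DB call
    second = await cache.get("my_dataset")  # hit: served from the cache
    db["my_dataset"] = "v2"                 # a write that skips invalidation
    stale = await cache.get("my_dataset")   # still the old value until TTL expiry
    invalidated = cache.invalidate("my_dataset")
    fresh = await cache.get("my_dataset")   # reloaded after invalidation
    return first, second, stale, invalidated, fresh, calls["count"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `stale` read shows why the TTL matters: if a write path forgets to invalidate, the old value is served only until the entry expires.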

Does this introduce a breaking change?

  • Yes
  • No

Other information

@dmannarino dmannarino changed the base branch from master to develop May 24, 2024 23:23
@codecov-commenter

codecov-commenter commented May 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.72%. Comparing base (3cb83d7) to head (e66fefb).


Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #528      +/-   ##
===========================================
+ Coverage    81.70%   81.72%   +0.01%     
===========================================
  Files          125      125              
  Lines         5615     5619       +4     
===========================================
+ Hits          4588     4592       +4     
  Misses        1027     1027              
Flag Coverage Δ
unittests 81.72% <100.00%> (+0.01%) ⬆️


@@ -100,6 +99,18 @@ async def create_version(dataset: str, version: str, **data) -> ORMVersion:
)
new_version.metadata = metadata

# FIXME: Should we really allow one to specify a new version as
# latest on creation? I thought we didn't. Seems like it will cause
# requests to temporarily go to a perhaps incompletely-imported
Collaborator

If you look at all the callers, it is only used for testing (see test_versions.py::test_latest_versions). So, how about you remove the FIXME but add to the function comment that data.is_latest is only allowed to be true for testing? (You'll see is_latest is not a valid arg in the version-create route.)

Member Author

Ahh, okay, thanks for figuring that out! Will add the note.

# f"Setting version {version} to latest for dataset {dataset}. "
# f"Cache info: {get_latest_version.cache_info()}"
# )
_: bool = get_latest_version.cache_invalidate(dataset)
Collaborator

Seems like it might make sense to put the cache_invalidate call in _reset_is_latest, so we don't forget to call it whenever we call _reset_is_latest?

Member Author

Yeah, I'll do that. I was on the fence about it, but now that you say it, that seems better.
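
A rough sketch of the suggestion above, assuming an in-memory stand-in for both the database and the cache. `VersionStore` and its attributes are hypothetical; only the function names `get_latest_version`, `update_version`, and `_reset_is_latest` come from this thread. The point is that invalidation lives inside `_reset_is_latest`, so every caller that changes the latest version clears the cache without having to remember a separate call.

```python
import asyncio
from typing import Dict, Tuple


class VersionStore:
    """Hypothetical in-memory stand-in for the versions table plus its cache."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], bool] = {}  # (dataset, version) -> is_latest
        self.cache: Dict[str, str] = {}              # dataset -> latest version

    async def get_latest_version(self, dataset: str) -> str:
        if dataset in self.cache:
            return self.cache[dataset]  # cached: no table scan
        for (ds, version), is_latest in self.rows.items():
            if ds == dataset and is_latest:
                self.cache[dataset] = version
                return version
        raise LookupError(f"no latest version for {dataset}")

    async def _reset_is_latest(self, dataset: str, version: str) -> None:
        """Set is_latest to False for all other versions of a dataset."""
        for ds, other in list(self.rows):
            if ds == dataset and other != version:
                self.rows[(ds, other)] = False
        # Invalidate here so no caller can forget to do it.
        self.cache.pop(dataset, None)

    async def update_version(
        self, dataset: str, version: str, is_latest: bool = False
    ) -> None:
        # Set the flag on this version first, then reset all the others.
        self.rows[(dataset, version)] = is_latest
        if is_latest:
            await self._reset_is_latest(dataset, version)


async def demo() -> Tuple[str, str]:
    store = VersionStore()
    await store.update_version("my_dataset", "v1", is_latest=True)
    before = await store.get_latest_version("my_dataset")  # populates the cache
    await store.update_version("my_dataset", "v2", is_latest=True)
    after = await store.get_latest_version("my_dataset")   # cache was cleared
    return before, after


if __name__ == "__main__":
    print(asyncio.run(demo()))
```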

@@ -155,6 +171,12 @@ async def _update_is_downloadable(


async def _reset_is_latest(dataset: str, version: str) -> None:
    """Set is_latest to False for all other versions of a dataset."""
    # TODO: Should we make sure the version is valid to avoid setting nothing
    # to latest? Or is being able to do that a desired feature?
Collaborator

I think you can remove this TODO and just add a line to the header comment saying that it is recommended to set the is_latest flag for the version before calling this method to reset is_latest for all other versions (since that is what update_version does).

@dmannarino dmannarino merged commit a10fa4b into develop May 28, 2024
2 checks passed
@dmannarino dmannarino deleted the cache_latest branch May 28, 2024 19:27
@manukala6 manukala6 mentioned this pull request Jul 1, 2024
13 tasks
4 participants