Add selector feature #13

Merged
merged 16 commits into from
May 17, 2024

Conversation

mraspaud
Member

This PR adds the selector feature and its script, for use when multiple sources provide the same data.

@mraspaud mraspaud added the enhancement New feature or request label May 14, 2024
@mraspaud mraspaud self-assigned this May 14, 2024

codecov bot commented May 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.72%. Comparing base (f15af05) to head (ba3b20b).
Report is 40 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #13      +/-   ##
==========================================
+ Coverage   95.00%   95.72%   +0.72%     
==========================================
  Files          10       11       +1     
  Lines         540      632      +92     
==========================================
+ Hits          513      605      +92     
  Misses         27       27              



usage: pytroll-selector [-h] [-l LOG_CONFIG] config

Selects unique messages from multiple sources.
Member

Should there be a mention of what defines "uniqueness"?

Member Author

Clarified in the help message and in the module docstring.
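
For reference, the usage line quoted above can be reproduced with a small `argparse` parser. This is only a sketch based on that usage line, not the code from the PR: the help strings and the `log_config` attribute name are assumptions, and only the short `-l` flag is used because no long option appears in the excerpt.

```python
import argparse


def build_parser():
    """Build a parser that matches the usage line quoted in the review."""
    parser = argparse.ArgumentParser(
        prog="pytroll-selector",
        description="Selects unique messages from multiple sources.",
    )
    # Positional argument: the selector configuration file (help text assumed).
    parser.add_argument("config", help="selector configuration file")
    # Optional logging configuration; attribute name and help text are assumed.
    parser.add_argument(
        "-l",
        dest="log_config",
        metavar="LOG_CONFIG",
        help="logging configuration file",
    )
    return parser


# Hypothetical file names, used only to exercise the parser.
args = build_parser().parse_args(["-l", "log.yaml", "selector.yaml"])
print(args.config, args.log_config)
```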

messages.

"""
with _running_redis_server(port=selector_config.get("port"), directory=selector_config.pop("directory", None)):
Member

Can an existing Redis instance be used?

Member

Not that it is necessary to implement that option in this PR, just curious.

Member Author

@mraspaud mraspaud May 15, 2024

In principle we could add the feature, yes. But I thought it would be best to keep the instances separate, not really being familiar with redis yet and how that could affect an existing instance.

Comment on lines 124 to 126
key = msg.data["uid"]
try:
    _ = sel[key]
Member

I think the uniqueness check should be a separate function for clarity.

Member Author

Good idea
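
A separate uniqueness check along the lines suggested here might look like the sketch below. It is not the function that ended up in the PR: the name `is_first_occurrence` is hypothetical, and a plain `dict` stands in for the redis-backed `sel` object, which is assumed to raise `KeyError` for unknown keys as the `try` block in the snippet implies.

```python
def is_first_occurrence(seen, uid):
    """Return True the first time `uid` is seen, False for duplicates.

    `seen` is any mapping that raises KeyError for missing keys; here a
    plain dict stands in for the redis-backed store used in the PR.
    """
    try:
        _ = seen[uid]
    except KeyError:
        # Not seen before: record it so later copies are recognised.
        seen[uid] = "seen"
        return True
    return False


seen = {}
first = is_first_occurrence(seen, "file_a.nc")   # new uid
second = is_first_occurrence(seen, "file_a.nc")  # same uid from another source
other = is_first_occurrence(seen, "file_b.nc")   # different uid
print(first, second, other)
```

Keeping the check in one place means the calling loop only has to decide what to do with a duplicate (log and discard) or a new message (publish).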

key = msg.data["uid"]
try:
    _ = sel[key]
    logger.info(f"Discarded {str(msg)}")
Member

Suggested change
- logger.info(f"Discarded {str(msg)}")
+ logger.debug(f"Discarded {str(msg)}")

I think INFO is too high a log level for discarding duplicates.

Member Author

Fixed



@contextmanager
def _running_redis_server(port=None, directory=None):
Member

Suggested change
- def _running_redis_server(port=None, directory=None):
+ def _start_redis_server(port=None, directory=None):

To me, "running" sounds like we are just connecting to an already running instance, not starting a new server process.

Member Author

Fixed
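
The distinction between starting a process and connecting to a running one can be illustrated with a generic context manager that owns the lifetime of a child process. This is a sketch of the pattern only, not the PR's `_start_redis_server`: the helper name `start_server_process` is hypothetical, and a sleeping Python process is used as a placeholder where the real code would launch the redis server.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def start_server_process(command):
    """Start `command` as a child process and stop it when the block exits."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Stop the child even if the body of the `with` block raised.
        proc.terminate()
        proc.wait(timeout=10)


# Placeholder "server": a Python process that just sleeps for a while.
placeholder_command = [sys.executable, "-c", "import time; time.sleep(60)"]

with start_server_process(placeholder_command) as proc:
    running_inside = proc.poll() is None  # still alive inside the block

stopped_after = proc.poll() is not None  # terminated once the block exits
print(running_inside, stopped_after)
```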

Comment on lines +8 to +9
At the moment, this module makes use of redis as a refined dictionary for keeping track of the received files. Hence a
redis server instance will be started along with some of the functions here, and with the cli.
Member

Is the Redis database persisted? Where is it stored? If the database is not persisted, would a more lightweight solution (like a dict) be an option?

Member Author

I haven't explored all of redis' possibilities, so I'm using the defaults now. The directory where the data is persisted can be defined through the directory parameter in the selector config (see docstrings).
I have looked for alternatives such as a dict with ttl for items but I haven't found anything that would work really.
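
To make the "dict with ttl" idea concrete, here is a minimal in-process sketch. It is not a replacement for what the PR does: the class name `TTLDict` and its parameters are hypothetical, nothing is persisted to disk, entries are only purged when they are looked up, and whether it would meet the selector's needs is not established here.

```python
import time


class TTLDict:
    """Mapping whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}

    def __setitem__(self, key, value):
        # Store the value together with its expiry time.
        self._items[key] = (value, self._clock() + self._ttl)

    def __getitem__(self, key):
        value, expires_at = self._items[key]  # raises KeyError if missing
        if self._clock() >= expires_at:
            # Expired entries behave as if they were never stored.
            del self._items[key]
            raise KeyError(key)
        return value


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
store = TTLDict(ttl=10, clock=lambda: now[0])
store["uid-1"] = "seen"
fresh_value = store["uid-1"]

now[0] = 11.0  # advance the fake clock past the ttl
try:
    store["uid-1"]
    expired = False
except KeyError:
    expired = True
print(fresh_value, expired)
```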

@mraspaud mraspaud merged commit 0a4c946 into pytroll:main May 17, 2024
4 checks passed
@mraspaud mraspaud deleted the feature-selector branch May 17, 2024 12:18