Lack of explanation of difference between data that describes tasks #478
Comments
Hey @Make42! My mental mapping is as follows:
With these in mind, for your questions:
The ones that appear in the
I agree that The
I think this is a great suggestion, maybe @ainoam can comment on this? |
That's an excellent summary @idantene. I will add that
Tags are not the only query arguments (see |
@idantene, @ainoam: Thank you so much, you helped me out a lot! Based on your content, I would like to improve on my suggestions. Maybe it is helpful. I put quite a bit of thought into it, but it is probably not yet perfect.
Queries
I did not mean that one should be able to query every field in the entire Web UI. I wanted to suggest that one could query the fields in CONFIGURATION in addition to tags.
Renaming Suggestion for Tasks
Considering what @idantene wrote, I would suggest naming them as you have described them.
Renaming Suggestion for Models
The model configuration does not have to relate to neural networks. However, I would in fact suggest to simply have the same scheme as for tasks. |
Thanks for summarizing @Make42 :)
Querying hyperparameters does indeed make a lot of sense - we should definitely make it available in an upcoming version.
`Task.get_tasks(project_name='my project', task_filter={'hyperparams.Args.batch_size.value': '64', '_allow_extra_fields_': True})`
Perhaps it is better to open this as a dedicated issue in the package repo?
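The `task_filter` dictionary in the call above can be built by a small helper. This is only a sketch: `hyperparam_filter` is a hypothetical name, and it assumes, as in the call above, that keys follow the pattern `hyperparams.<section>.<name>.value` and that values are compared as strings.

```python
def hyperparam_filter(section, name, value):
    """Build a task_filter dict for one hyperparameter (hypothetical helper).

    Assumption: the key layout and the string-typed value mirror the
    example in this thread; they are not checked against the server here.
    """
    return {
        f"hyperparams.{section}.{name}.value": str(value),
        "_allow_extra_fields_": True,
    }


batch_filter = hyperparam_filter("Args", "batch_size", 64)

# With ClearML installed and configured, the dict would be passed on like this:
#   from clearml import Task
#   tasks = Task.get_tasks(project_name="my project", task_filter=batch_filter)
```

Converting the value with `str` keeps the helper consistent with the `'64'` in the original call.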
I much appreciate the time you took to consider these - we'll take these under advisement, though I'm not sure all terms will make it through. |
Configuration
Configuration or Parameters suggests that this is "input". However, currently, I am reporting statistics about the task in "Hyperparameters". "Properties", on the other hand, suggests something like statistics about the task, something that results from the way the task is set up or something that is part of the nature of the task. I suggested "Attributes" because it (kind of) entails both input ("configuration", "parameters", etc.) and resulting information (statistics, properties, etc.). "Metadata" is a type of "resulting information" as well, I think.
User Properties
How about "Descriptors" or "Annotations"? Everything in "Attributes" / "Configuration" is "from the user", but with "User Properties" you mean something that can be changed even after the task is final, so it should not be a real property of the task or a configuration or something like that. However, a description or annotation might be changed later.
Configuration Objects
What are "configuration objects" then? They are blobs of data, right? If so, "object" (as in "object store") would be right, so "attribute object" might actually be good. One might store more than just configuration in them, which is why the change might be sensible. |
Chiming in, hope that's fine with you 😉 Per @Make42's suggestions, here's my two cents on these (plus explanations, in the hope that it yields a fruitful discussion): tl;dr: I think
|
@idantene, sure, chime away :-D. Let me respond in the same spirit. I am going to be rather matter-of-fact; I hope you can still read it in a friendly spirit. And because this kind of terminology stuff is part of my academic research, I might be a bit picky.
|
I very much agree with the majority of your comments there 👍🏻 (and also highlighting that I'm not a ClearML member in any way - I'm mostly sharing your frustration with this terminology issue). My last thoughts on this before chiming out 😁: One thing I perhaps disagree with, is that you use The input/output data is available to varying degrees - inputs include the Finally, I completely agree about the concept of iterations generally in ML. My issue with it (in ClearML) is that it is of no practical use in many cases, and making it a mandatory argument (and forcing it to be an integer as well) is extremely annoying IMO. |
@idantene: Regarding "going with ClearML's paradigm": I would love to, but I do not understand the paradigm. That is what my original question here was about. Your last post cleared the fog a bit more. So, how do you save statistics/output, "going with the intended approach"?
is so true! |
We also discussed this on Slack, but in case anyone follows up on this - We save any statistics/outputs using the One more side note to the ClearML team (@ainoam) - it is quite weird that a dataframe is under the "Plots" tab. I would say it's more of "Scalars" -- could just be another naming issue though. I believe the original "Scalars" were meant to be used with iterations (on the X axis), and anything else was "Plots". If the concept of iterations is dropped (as well it should!), then DataFrames definitely belong in the "Scalars" tab, and "Debug Samples" can fit nicely in the "Plots". |
I have thought some more about the whole topic. I think one should stick to one of two approaches:
ClearML follows both approaches simultaneously it seems, which leads to considerable confusion:
|
It is unclear what the differences are conceptually between the following things.
In the web app there are
task.connect or task.connect_configuration
task.set_user_properties
Programmatically, hyperparameters are more convenient to set and more flexible than user properties.
Conceptually they are all data describing the task, but it is not clear to me what the differences conceptually are.
Also, it seems to me that the hyperparameters seem to be able to do anything the user properties can do, so what is the point of the user properties?
The main advantage of user properties over hyperparameters seems to be that they can be changed after the task has finished; but tags can also be changed after the task has finished, and tags can be query arguments when searching for tasks.
However, tags are effectively only boolean (either a task has a tag or it does not), while user properties have a value, e.g., explicit boolean, integer, float, string.
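To make the comparison concrete, here is a minimal sketch of the ways of describing a task mentioned here. `RecordingTask` is a hypothetical stand-in that only records calls, so the snippet runs without a ClearML server. The method names mirror the ClearML `Task` methods `connect`, `connect_configuration`, `set_user_properties`, and `add_tags`, but all keys and values are made up for illustration.

```python
class RecordingTask:
    """Hypothetical stand-in for a ClearML task; it only records calls."""

    def __init__(self):
        self.calls = []

    def connect(self, mutable, name=None):
        self.calls.append(("connect", dict(mutable)))
        return mutable

    def connect_configuration(self, configuration, name=None):
        self.calls.append(("connect_configuration", name, dict(configuration)))
        return configuration

    def set_user_properties(self, **properties):
        self.calls.append(("set_user_properties", properties))
        return True

    def add_tags(self, tags):
        self.calls.append(("add_tags", list(tags)))


def describe_task(task):
    # Hyperparameters: individual key/value inputs of the run.
    task.connect({"batch_size": 64, "lr": 0.001})
    # Configuration object: a larger named blob of settings.
    task.connect_configuration({"layers": [128, 64]}, name="model")
    # User properties: carry typed values (per this issue, editable later).
    task.set_user_properties(reviewed=False, owner="alice")
    # Tags: effectively boolean, a task either has the tag or not.
    task.add_tags(["baseline"])
    return task


task = describe_task(RecordingTask())

# With ClearML installed and configured, a real task could be passed instead:
#   from clearml import Task
#   describe_task(Task.init(project_name="my project", task_name="demo"))
```

The tag `"baseline"` carries no value of its own, while the user property `reviewed=False` does, which is the distinction drawn above.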
The tabs "execution" and "info" both seem to contain information that I cannot set, but that is extracted automatically by ClearML from running the task. But what is the reasoning behind those two different tabs?
Everything here is "information", so the name "info" for the tab is not very descriptive.
Here is my interpretation:
Assuming my interpretation is correct, this raises a couple of questions for me:
Maybe all of this can be documented in more detail.
If someone explains those things here, I am happy to make a pull request.