Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table and standardID management #49

Open
Bonnarel opened this issue Mar 4, 2024 · 6 comments
Open

table and standardID management #49

Bonnarel opened this issue Mar 4, 2024 · 6 comments

Comments

@Bonnarel
Copy link
Collaborator

Bonnarel commented Mar 4, 2024

Discussion started after 2023-11-98 release

@Bonnarel
Copy link
Collaborator Author

Bonnarel commented Mar 4, 2024

[The ObsCore extension for radio (including or not visibility data) described above SHOULD be added to the main ObsCore table.

This is painful for data centres that have only a few radio items and millions of non-radio items (such as me). In effect, you'd be forcing me to do a massive denormalisation of my data. And yes, people shouldn't do SELECT *, but they do, and then they have all these extra columns with NULLs in them for no benefit at all, in particular because you acknowledge it'd be a joined view anyway.

On the other hand, if there were some overriding reason to have the columns in the main obscore table, then don't make it SHOULD. If this is supposed to work at all, it must be a MUST -- software doing radio obscore queries has to be able to rely on it if it's necessary for some of your usecases.

But as I said: I don't believe it's necessary, and then we shouldn't do it, neither as SHOULD or MUST (I take the liberty of citing my own https://blog.g-vo.org/requirements-and-validators.html in support of this argument).

Doc : In practice a table containing only the extension attributes MAY be added to the same schema.

That's where I believe the MUST needs to sit. That way, your sample queries will have a JOIN (my advice based on RegTAP experience: make it a NATURAL JOIN and leave it to the implementors what the join columns actually are). Side benefit discovery rules are a lot simpler.

You'd just be saying:

Register the obscore extension as a VODataService CatalogService with a tableset only containing the ivoa.obsradio [or whatever; feel free to suggest a different name. I'd also be open to loosening naming schemes for ivoa.obscore, but I don't think the radio extension is the place to start that discussion] table. Assign a table utype of ivo://ivoa.net/std/obscore#radioext-1.0 to that table. For later extensibility, discover radio-extended obscore tables using this utype (rather than the table name). In RegTAP, you would find TAP services having radio extensions like this:

SELECT access_url, table_name FROM rr.capability NATURAL JOIN rr.interface NATURAL JOIN rr.res_table WHERE standard_id='ivo://ivoa.net/std/tap' AND table_utype like 'ivo://ivoa.net/std/obscore#radioext-1.%'

I'd do a PR for that if you don't flame me too hard.

Also, if someone gives me radio data needing these extra columns, I'm happy to do a reference implementation.

-- MarkusDemleitner 2023-11-09

@BaptisteCecconi
Copy link
Contributor

I understand your concern as a data centre provider, and also as the developer of DaCHS. But on the point of view of the user, I'm not sure having to do a join is really a simple and appealing solution, at least at the CatalogService level. At server level (with ivoa.obscore and ivoa.obsradio tables), that may be reasonable.

I'm thinking of how we do it in EPNcore, where we don't have the global ivoa.obscore table at server level. In our case, EPNcore tables include extra columns (some from extensions, some are specific to each table).

In VESPA, we have not worked out (yet) the multiple typing of the table as in EPNcore + extensions. In VODataService/CatalogService, it is not an issue since the cardinality of the utype child of vs:Table is minOccurs="0", so there can be several utype's per table. However, I'm not sure how this will behave in RegTAP...

So one other option could be to have all columns in a table of a CatalogService (including as many extensions as necessary), and building the ivoa.obscore selecting only the relevant columns. The ivoa.obsradio, ivoa.obstimeseries, etc, could contain a subset of the columns too, relevant of that extension, and requiring the JOIN operation.

@msdemlei
Copy link
Contributor

It's hard to speculate what the eventual consumers will or will not like, but I'm rather convinced that "write 'NATURAL JOIN ivoa.obs_radio' if you want to get and/or constrain radio columns" is about the simplest thing we can give to people.

And I'm sure that hiding the join is something that will lead to no end of confusion; sure, the nice thing about relational algebra is that it's an algebra, but in practice, abstractions do leak.

On the other hand, establishing "write NATURAL JOIN x to get more columns" sounds like something that can scale very well (though I will give you that when there's 1:n relationships coming into the game, that abstraction will leak, too).

@Bonnarel
Copy link
Collaborator Author

[The ObsCore extension for radio (including or not visibility data) described above SHOULD be added to the main ObsCore table.

This is painful for data centres that have only a few radio items and millions of non-radio items (such as me). In effect, you'd be forcing me to do a massive denormalisation of my data. And yes, people shouldn't do SELECT *, but they do, and then they have all these extra columns with NULLs in them for no benefit at all, in particular because you acknowledge it'd be a joined view anyway.

On the other hand, if there were some overriding reason to have the columns in the main obscore table, then don't make it SHOULD. If this is supposed to work at all, it must be a MUST -- software doing radio obscore queries has to be able to rely on it if it's necessary for some of your usecases.

But as I said: I don't believe it's necessary, and then we shouldn't do it, neither as SHOULD or MUST (I take the liberty of citing my own https://blog.g-vo.org/requirements-and-validators.html in support of this argument).

I think there was a misunderstanding and I shoud rephrase the document (see your PR later today). I'm not proposing to have the radio extension fields in the main obscore table but to have two different tables : obscore only with appropriate standardID and ObsCore+"extension fields" as a secondtable with a different standardID. Al other things (exposing or not the extension fields only in a specific view. Working with views, etc...) are implementation details.

@Bonnarel
Copy link
Collaborator Author

Bonnarel commented Mar 25, 2024

I agree with all you wrote Baptiste

So one other option could be to have all columns in a table of a CatalogService (including as many extensions as necessary), and building the ivoa.obscore selecting only the relevant columns. The ivoa.obsradio, ivoa.obstimeseries, etc, could contain a subset of the columns too, relevant of that extension, and requiring the JOIN operation.

So there will be the full table with all extensions, another one with the basic Obscore only, and then if needed the variousn extenion tables ? But do we really have multi utypes on the full table in RegTAP ? or should we define a single combined standardID for a single table utype ?

@msdemlei
Copy link
Contributor

msdemlei commented Mar 25, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants