Proposal: convert GAZ to an instance-based representation #20

Open
cmungall opened this issue Apr 29, 2019 · 18 comments
Labels
EnvO assignment Assignment of GAZ entities to ENVO classes question questions or discussion items (comnsider splitting) technical Anything regarding the build/release pipeline or requiring dev help

Comments

@cmungall
Member

Currently every entity in GAZ is represented as a class; for example, Andorra is a class. This is clearly incorrect, but it is well known that the choice was purely pragmatic: Michael would edit in OBO-Edit, which only handles classes.

I propose the following:

  • gaz.owl is converted to an instance-based representation
  • gaz.obo is converted to classes in the release process (instances are not well-supported in .obo format)

Any consumers of gaz.obo (which includes a number of relational database systems) should be unaffected, provided the conversion does not lose anything.

We would also need to check with all consumers of gaz.owl to make sure this does not break anything.

I think the OLS display would break somewhat, but we can just point OLS at gaz.obo

This will need considerable planning.

  • Will editing instances in Protege be as easy as editing classes? The current representation of GAZ is not just geared towards OBO-Edit; it is also an easy structure to browse (especially now that Protege supports existential hierarchies directly).
  • We will need at least preliminary rdf:type / ClassAssertion axioms. I had a conversion at some point and will look it up...

@beckyjackson
Collaborator

Interestingly, a lot of the entities in GAZ are both classes and instances, which gets quite weird. The 'located in' axioms are on the instances. There are some subclass axioms, I believe mostly under populated places.

Anyway, I say +1 to moving entirely to instances. We could run SPARQL UPDATE to convert? I haven't thought much about that though.

@cmungall
Member Author

UPDATE: I just remembered that gaz.owl is already instance-based, using a transformation I made possibly a decade ago...

$ curl -L -s http://purl.obolibrary.org/obo/GAZ_00052098 
...
    <!-- http://purl.obolibrary.org/obo/GAZ_00004906 -->

    <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00004906">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dundee City</rdfs:label>
    </NamedIndividual>
    


    <!-- http://purl.obolibrary.org/obo/GAZ_00052098 -->

    <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00052098">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dundee</rdfs:label>
        <ns2:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A populated place. A city on the E coast of Scotland.</ns2:IAO_0000115>
        <obo:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ</obo:hasOBONamespace>
        <obo:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ:00052098</obo:id>
        <ns2:RO_0001025 rdf:resource="http://purl.obolibrary.org/obo/GAZ_00004906"/>
    </NamedIndividual>

Also sometimes punning is induced:

   <!-- http://purl.obolibrary.org/obo/GAZ_00002561 -->

    <Class rdf:about="http://purl.obolibrary.org/obo/GAZ_00002561">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Province (Canada)</rdfs:label>
        <obo:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ</obo:hasOBONamespace>
        <obo:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ:00002561</obo:id>
        <ns2:RO_0001025 rdf:resource="http://purl.obolibrary.org/obo/GAZ_00002560"/>
    </Class>
    


    <!-- 
    ///////////////////////////////////////////////////////////////////////////////////////
    //
    // Individuals
    //
    ///////////////////////////////////////////////////////////////////////////////////////
     -->

    


    <!-- http://purl.obolibrary.org/obo/GAZ_00002560 -->

    <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00002560">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Canada</rdfs:label>
    </NamedIndividual>
    


    <!-- http://purl.obolibrary.org/obo/GAZ_00002561 -->

    <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00002561">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Province (Canada)</rdfs:label>
        <obo:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ</obo:hasOBONamespace>
        <obo:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ:00002561</obo:id>
        <ns2:RO_0001025 rdf:resource="http://purl.obolibrary.org/obo/GAZ_00002560"/>
    </NamedIndividual>
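
A quick way to list the punned IRIs in a fragment like the one above might look like the following. This is only a sketch: it assumes the unprefixed `Class` and `NamedIndividual` elements are in the OWL namespace (the snippets do not show the `xmlns` declarations), it only sees typed-node elements and not `rdf:Description` with an explicit `rdf:type`, and `punned_iris` and `SAMPLE` are names made up for illustration.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down fragment modelled on the snippet above (labels only).
# Assumption: the default namespace of gaz.owl is the OWL namespace.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://www.w3.org/2002/07/owl#">
  <Class rdf:about="http://purl.obolibrary.org/obo/GAZ_00002561">
    <rdfs:label>Province (Canada)</rdfs:label>
  </Class>
  <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00002560">
    <rdfs:label>Canada</rdfs:label>
  </NamedIndividual>
  <NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00002561">
    <rdfs:label>Province (Canada)</rdfs:label>
  </NamedIndividual>
</rdf:RDF>"""


def punned_iris(rdfxml):
    """Return IRIs declared both as owl:Class and as owl:NamedIndividual."""
    root = ET.fromstring(rdfxml)
    about = f"{{{RDF}}}about"
    classes = {el.get(about) for el in root.iter(f"{{{OWL}}}Class")}
    individuals = {el.get(about) for el in root.iter(f"{{{OWL}}}NamedIndividual")}
    # Drop None in case an element has no rdf:about attribute.
    return sorted((classes & individuals) - {None})


print(punned_iris(SAMPLE))
# → ['http://purl.obolibrary.org/obo/GAZ_00002561']
```

For the full gaz.owl a streaming parser or a proper OWL toolkit would probably be a better fit than loading the whole file into ElementTree.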

It looks like OLS loads the OBO file.

OntoBee loads the OWL file; however, it doesn't show instance-instance connections, so things look sparse:

http://purl.obolibrary.org/obo/GAZ_00004906

@cmungall
Member Author

cmungall commented Apr 29, 2019

My comment crossed with Becky's, who had also noticed the weirdness.

It's not a completely straightforward transform, since there are some class-like things in GAZ:

https://www.ebi.ac.uk/ols/ontologies/gaz/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGAZ_00051071

https://www.ebi.ac.uk/ols/ontologies/gaz/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGAZ_00000869


However, it may be most straightforward to treat everything as instances, even if the name is class-like. It could be framed as a mereological sum. E.g. "Neotropical region" = population of all such regions, subClassOf => part-of

@cmungall cmungall added question questions or discussion items (comnsider splitting) technical Anything regarding the build/release pipeline or requiring dev help labels Apr 29, 2019
@pbuttigieg
Member

The biogeographic regions are pertinent to some classes we've been incubating in ENVO. See EnvironmentOntology/envo#658

I'll create a new issue about that, but what's relevant here is that, in general, ENVO should provide the classes for GAZ instances. @rctauber feel free to request such classes if they're missing.

@cmungall cmungall added the EnvO assignment Assignment of GAZ entities to ENVO classes label May 9, 2019
@lschriml
Collaborator

lschriml commented Jun 6, 2019

For modularization: using GAZ.owl
-- Identify classes in the GAZ - auto create definitions for instances
-- will put this on hold for now.

@beckyjackson
Collaborator

We have the class 'populated place, X' and then we have subclasses of that populated place. For example, 'populated place, Yemen' is both a class and an instance. It has the object property assertion 'located in' Yemen.

All populated places are subclasses of 'populated place, X' AND instances with "located in" object property assertions (but they do not have class assertions). Here, I think it would make more sense to have these actual places be instances of 'populated place' that are located in X (using object property assertions). There would be no more 'populated place, X' classes.

Governorates, municipalities, regions, rivers... all of these follow this pattern. In this representation, each country becomes only an instance of country, instead of both a class and an instance.
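
To make the decomposition concrete, here is a minimal sketch of how one precomposed entry could be rewritten. It assumes the superclass label always has the shape '<type>, <container>', and `decompose` plus the place name 'Example Town' are hypothetical, not taken from GAZ.

```python
def decompose(label, precomposed_parent, located_in):
    """Split a parent such as 'populated place, Yemen' into a plain type.

    Returns triples that type the place by the generic class and keep the
    existing 'located in' assertion. The container parsed from the label is
    discarded, because the assertion already on the instance may point at a
    finer-grained region than the country named in the label.
    """
    type_label, sep, _container = precomposed_parent.partition(", ")
    if not sep:
        raise ValueError(f"not a precomposed label: {precomposed_parent!r}")
    return [
        (label, "rdf:type", type_label),
        (label, "located in", located_in),
    ]


# Hypothetical entry, for illustration only.
for triple in decompose("Example Town", "populated place, Yemen", "Yemen"):
    print(triple)
# → ('Example Town', 'rdf:type', 'populated place')
# → ('Example Town', 'located in', 'Yemen')
```

Labels that do not follow the pattern raise an error here so that they can be reviewed by hand rather than silently converted.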

That said, an instance-based representation is not easy to browse and edit using Protege. One solution is to switch completely to template-based development. Another question is how this will be displayed on OLS. Can we somehow have an instance-based browser that follows the located_in hierarchy, or will we need to release a version of GAZ that converts all the instances to classes?

@ddooley
Collaborator

ddooley commented Aug 1, 2019

The protege folks (Matthew Horridge) might have a suggestion or be open to changes for improving instance data views, and navigation via 'located in'. Web protege 5.5 apparently has the facility for navigating via other object relations besides is-a. (Display relationships section in http://protegeproject.github.io/protege/views/class-hierarchy/ ). I can picture one extra toggle allowing instances to be listed under each class using object properties. Protege seems to handle long subclass lists no problem so a long list of instances might be easy to render too.

@lschriml
Collaborator

lschriml commented Aug 1, 2019 via email

@lschriml
Collaborator

lschriml commented Aug 1, 2019 via email

@beckyjackson
Collaborator

beckyjackson commented Aug 1, 2019

In my opinion, it makes more sense for these to be instances because they are specific countries, cities, etc. There are not multiple instances of the city of Paris located in France. That said, I'm sure there's an argument for ease of use with a class-based representation. Either way, each entity should be one or the other. The fact that most, if not all, of the entities in GAZ are both instances and classes makes it very difficult to run some ROBOT commands.

@cmungall talked about changing everything to instances:

However, it may be most straightforward to treat everything as instances, even if the name is class-like. It could be framed as a mereological sum. E.g. "Neotropical region" = population of all such regions, subClassOf => part-of

Any of the proper nouns (like Neotropical region) are talking about specific things. I like saying that the 'Central Andes bioregion' is part of the 'Neotropical region' and that would be very easy to convert from subclass axioms to 'part of' axioms. I don't think terms like 'biogeographic realm' would be instances, though.


All of the current subclasses of 'biogeographic realm' would become instances of 'biogeographic realm' instead.
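
As a rough sketch of that rewrite rule: a subclass edge whose parent is a generic kind becomes an rdf:type assertion, and an edge between two proper-noun entities becomes 'part of'. The set of generic classes is assumed to be curated by hand, the example edges are illustrative rather than copied from GAZ, and `rewrite_subclass_edges` is a made-up name.

```python
# Assumption: a small, hand-maintained set of labels that stay classes.
GENERIC_CLASSES = {"biogeographic realm"}


def rewrite_subclass_edges(edges, generic_classes):
    """Turn (child, parent) subClassOf pairs into instance-level triples."""
    triples = []
    for child, parent in edges:
        # A generic parent keeps its class status and types the child;
        # a proper-noun parent is itself an instance, so use parthood.
        predicate = "rdf:type" if parent in generic_classes else "part of"
        triples.append((child, predicate, parent))
    return triples


# Illustrative edges only.
edges = [
    ("Neotropical region", "biogeographic realm"),
    ("Central Andes bioregion", "Neotropical region"),
]
for triple in rewrite_subclass_edges(edges, GENERIC_CLASSES):
    print(triple)
# → ('Neotropical region', 'rdf:type', 'biogeographic realm')
# → ('Central Andes bioregion', 'part of', 'Neotropical region')
```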

@lschriml
Collaborator

lschriml commented Aug 1, 2019 via email

@cmungall
Member Author

cmungall commented Aug 1, 2019

@rctauber - that plan sounds good. Note that decomposition of pre-composed terms will have some impact on hierarchical browsing, regardless of questions of class vs instance.

Quick summary so far

  • GAZ is already mostly instance-based, but there is some punning, which is what this ticket is mostly about. Becky's plan will get rid of it.
  • Keeping GAZ instance-based has multiple advantages: it is the correct way to model these entities, as Becky says, and it will be easier to work with using most modern tools.
  • For legacy apps we can maintain an OBO-format release that converts instances back to Terms/classes.
  • We can also use this release for any browser that expects classes, e.g. OLS.
  • Questions remain about the edit workflow.
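
For the legacy OBO release, the conversion back to Terms could be sketched roughly as below. The values come from the Dundee individual quoted earlier in the thread. The function name `to_term_stanza` is hypothetical, and writing the relation as `RO:0001025` is an assumption: the real gaz.obo may use a shorthand relation ID instead.

```python
def to_term_stanza(curie, label, definition=None, relationships=()):
    """Render one individual as an OBO-format [Term] stanza."""
    lines = ["[Term]", f"id: {curie}", f"name: {label}"]
    if definition:
        # Escape backslashes and double quotes inside the quoted definition.
        escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'def: "{escaped}" []')
    for relation, target in relationships:
        lines.append(f"relationship: {relation} {target}")
    return "\n".join(lines)


stanza = to_term_stanza(
    "GAZ:00052098",
    "Dundee",
    definition="A populated place. A city on the E coast of Scotland.",
    # Assumption: the object property assertion is emitted as a
    # relationship tag using the RO CURIE.
    relationships=[("RO:0001025", "GAZ:00004906")],
)
print(stanza)
```

A real release step would more likely go through an OWL toolkit than through string formatting; this only shows the shape of the output that consumers of gaz.obo would keep seeing.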

@ddooley
Collaborator

ddooley commented Aug 2, 2019

I popped the protege class-instance navigation question on protegeproject/protege#912 .

@lschriml
Collaborator

lschriml commented Aug 2, 2019 via email

@ddooley
Collaborator

ddooley commented Aug 13, 2019

Really basic question: are all geopolitical GAZ entities ultimately sites? Each site has a context and a fiat boundary. (It may be moving only to the extent that redistricting and plate tectonics allow ;) ).

@cmungall
Member Author

cmungall commented Aug 7, 2020

I strongly recommend against modeling anything as immaterial entities. This has multiple negative consequences. Certainly the Atlantic Ocean, London, Mt Everest, etc are material. Given the existing part-of and overlaps axioms in GAZ already, OWL inconsistencies will immediately arise if we have a mix of immaterial and material entities.

@ddooley
Collaborator

ddooley commented Aug 7, 2020

I've always struggled with the concept of "site". It seems useful for things like "lung cavity". It also seemed appropriate for things whose boundaries are arbitrary (politically defined) rather than inherent. But I see the wisdom of avoiding mixing material and immaterial entities. Perhaps simply: "a geopolitical reference to some region at any time t is a reference to a material entity as defined politically at that time."

5 participants