gene -- causes -> phenotype edge from uniprot? #415

Open
amykglen opened this issue Oct 3, 2024 · 4 comments
Labels
question Further information is requested

@amykglen
Member

amykglen commented Oct 3, 2024

At today's data modeling call, Sierra was a bit concerned to see that the KG2 API returns a UniProt 'causes' edge for the query NCBIGene:351 (APP) -- related_to --> HP:0100543 (Cognitive impairment).

I traced this to the edge with this ID in KG2.10.0pre:

UniProtKB:P05067---biolink:causes---None---None---None---OMIM:104300---identifiers_org_registry:uniprot

(which was returned from the KG2 CI API for this query due to concept subclass reasoning)

does anything seem fishy here? or do we expect to have this sort of edge from uniprot?

@edeutsch
Contributor

edeutsch commented Oct 3, 2024

In principle UniProtKB does contain such information:
https://www.uniprot.org/uniprotkb/P05067/entry#disease_variants

@amykglen amykglen added the question Further information is requested label Oct 17, 2024
@saramsey
Member

See this code:

    if 'disease' in record_dict:
        for disease_rec in record_dict['disease']:
            mims = REGEX_MIM.findall(disease_rec)
            for m in mims:
                mp = REGEX_PUBLICATIONS.findall(disease_rec)
                pubs = [fix_publications(pub) for pub in mp]
                e = kg2_util.make_edge_biolink(curie_id,
                                               kg2_util.CURIE_PREFIX_OMIM + ':' + m,
                                               kg2_util.EDGE_LABEL_BIOLINK_CAUSES,
                                               UNIPROTKB_PROVIDED_BY_CURIE_ID,
                                               update_date)
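
For readers without the KG2 codebase at hand, here is a minimal standalone sketch of what the snippet above appears to do: pull MIM identifiers out of a UniProt disease annotation and emit one 'causes' edge per identifier. Everything in it is an assumption for illustration: the two regular expressions stand in for REGEX_MIM and REGEX_PUBLICATIONS (whose real definitions are not shown here), `disease_edges` and the dict layout are hypothetical replacements for `kg2_util.make_edge_biolink`, and the sample record is made-up text that only reuses the MIM number from this thread.

```python
import re

# Hypothetical stand-ins; the real REGEX_MIM / REGEX_PUBLICATIONS may differ.
MIM_PATTERN = re.compile(r"\[MIM:(\d+)\]")
PUBMED_PATTERN = re.compile(r"PubMed:(\d+)")


def disease_edges(subject_curie, disease_records):
    """Return one 'causes' edge (as a plain dict) per MIM ID found."""
    edges = []
    for rec in disease_records:
        # Collect the publications cited anywhere in this disease record.
        pubs = sorted({"PMID:" + p for p in PUBMED_PATTERN.findall(rec)})
        for mim in MIM_PATTERN.findall(rec):
            edges.append({
                "subject": subject_curie,
                "predicate": "biolink:causes",
                "object": "OMIM:" + mim,
                "publications": pubs,
            })
    return edges


# Made-up sample text, not copied from UniProt.
sample = ("DISEASE: Example disease (EXD) [MIM:104300]: "
          "Illustrative text. {ECO:0000269|PubMed:1234567}")
edges = disease_edges("UniProtKB:P05067", [sample])
print(edges[0]["subject"], edges[0]["predicate"], edges[0]["object"])
```

The point of the sketch is only that any MIM ID mentioned in a protein's disease annotation becomes the object of a 'causes' edge from that protein, which is how UniProtKB:P05067 ends up linked to OMIM:104300.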

@saramsey
Member

OK to close this issue, @amykglen ?

@amykglen
Member Author

yes, thank you - I suppose we should let Sierra know:

@sierra-moxon - just letting you know that we investigated this UniProt 'causes' edge in KG2 that you brought up on the Data Modeling call in October, and everything looks ok to us. basically the raw form of the edge being returned was UniProtKB:P05067--causes-->OMIM:104300 (Alzheimer's disease), which is actual info that UniProt contains (see above comments). and this edge was returned for the original query of NCBIGene:351 (APP) -- related_to --> HP:0100543 (Cognitive impairment) since UniProtKB:P05067 is a synonym for NCBIGene:351, and Alzheimer's disease is a descendant concept of Cognitive impairment (at least according to our concept subclass reasoning, which is based on MONDO and HP subclass_of edges).
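
To make the reasoning above concrete, here is a toy sketch of how synonym expansion plus subclass expansion can surface that edge for the original query. It is not the actual KG2 or query-engine code: the `SYNONYMS` and `SUBCLASS_OF` tables, the function names, and the single direct subclass link are all simplifying assumptions (the real chain runs through MONDO and HP subclass_of edges), and predicate hierarchy (related_to covering causes) is ignored.

```python
# Toy lookup tables; hypothetical stand-ins for node normalization
# and for the MONDO/HP subclass_of edges.
SYNONYMS = {"NCBIGene:351": {"NCBIGene:351", "UniProtKB:P05067"}}
SUBCLASS_OF = {"OMIM:104300": {"HP:0100543"}}  # child -> direct parents

EDGES = [("UniProtKB:P05067", "biolink:causes", "OMIM:104300")]


def descendants(curie, subclass_of):
    """Return the concept itself plus everything below it."""
    found = {curie}
    frontier = [curie]
    while frontier:
        parent = frontier.pop()
        for child, parents in subclass_of.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def matching_edges(edges, subject, obj):
    """Match edges after expanding subject synonyms and object descendants."""
    subjects = SYNONYMS.get(subject, {subject})
    objects = descendants(obj, SUBCLASS_OF)
    return [e for e in edges if e[0] in subjects and e[2] in objects]


print(matching_edges(EDGES, "NCBIGene:351", "HP:0100543"))
```

With both expansions switched on, the query for NCBIGene:351 and HP:0100543 matches the UniProtKB:P05067 to OMIM:104300 edge even though neither query identifier appears in the stored edge.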

also interestingly, in CI, this 'causes' edge is no longer returned - the only 'causes' edges now linking NCBIGene:351 and HP:0100543 are from SemmedDB - perhaps due to changes in the underlying KG2 graph, node normalization, or other parts of the system.
