This is a list of identifier issues encountered in real-world use; it aims to be representative rather than exhaustive. The list could be used to:
- Convince funders of the problem
- Provide a set of references for a paper or specification
- See what can be done to improve informatics/tooling around identifiers

Contributions from anyone are warmly welcome.

Reported by | Reported about | Problems referenced | Problem category |
---|---|---|---|
EBI Ontology Lookup Service (OLS) | various ontologies | underscore-delimited vs. colon-delimited forms, case sensitivity | search, delimiters |
Not clear | Darwin Core Triples | institutional code collisions among Darwin Core triples | collisions, institution identifiers |
PrefixCommons | NCBI | number of short-form and HTTP URI permutations found in the wild for a single identifier in NCBI Gene | data integration, text mining |
General (Wikipedia entry) | Web-at-large | 17 different ways in which URLs could be determined to be equivalent; some of these are lossy | data integration |
Biostars | HGNC | Mapping between similar entities across databases | mapping |
Human Phenotype Ontology | OMIM | Prefix heterogeneity (OMIM vs. MIM); special processors have to be built to collapse the variants | prefix variation, data integration |
Monarch Initiative | TAIR | TAIR prefix variation is difficult to resolve | type-specificity |
Stian | EU grants | No obvious documentation for permalinks in EU grants, nor any correlation between destination URL and project ID | documentation |
H. pylori paper | HP protein identifiers | Naming problems that result from embedded meaning in identifiers and evolving scientific knowledge | embedded meaning |
PrefixCommons | HGNC | co-occurring identifier complexities in HGNC (multiple entity types, multiple identifier types, prefixed/unprefixed versions, type-specific URLs without type-specific determinism in local IDs) | type-specificity |
WebProNews | eBay | need for location-independent IDs | data integration |
PrefixCommons | Zenodo | Impact is not rolled up across all DOI versions | DOI versions |
Monarch Initiative | Monarch's ingest of FlyBase | Faulty ingest process resulted in fly and human genes being treated as equivalents instead of orthologs | data integration |
Monarch Initiative | EBI-OLS | Tricky to support searches of identifiers because of the standard query-parsing behavior of Solr | data applications |
Ziemann et al. | Several journals | Gene name corruption in supplementary data affects roughly 20% of papers with supplementary Excel gene lists | data quality |
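
Several rows above (underscore vs. colon delimiters, case sensitivity, OMIM vs. MIM) describe the need for "special processors" that collapse identifier variants. The following is a minimal Python sketch of what such a processor might look like. The synonym table `PREFIX_SYNONYMS` and the function `normalize_identifier` are hypothetical names introduced here for illustration; they are not taken from any of the reporting projects, and a real processor would draw its prefix mappings from a curated registry.

```python
import re

# Hypothetical synonym table: maps prefix spellings seen in the wild
# (lowercased) to one preferred form. Illustrative only, not a registry.
PREFIX_SYNONYMS = {
    "mim": "OMIM",
    "omim": "OMIM",
    "hp": "HP",
    "go": "GO",
}

# A prefix, then a colon or underscore delimiter, then the local ID.
_CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9.]*)[:_](.+)$")


def normalize_identifier(raw: str) -> str:
    """Return a colon-delimited identifier with a canonical prefix.

    Strings that do not look like prefix + delimiter + local ID, or whose
    prefix is not in PREFIX_SYNONYMS, are returned unchanged apart from
    stripping surrounding whitespace.
    """
    text = raw.strip()
    match = _CURIE_PATTERN.match(text)
    if match is None:
        return text
    prefix, local_id = match.groups()
    canonical = PREFIX_SYNONYMS.get(prefix.lower())
    if canonical is None:
        return text
    return f"{canonical}:{local_id}"


if __name__ == "__main__":
    for example in ["MIM:154700", "omim_154700", "HP_0000118", "hp:0000118"]:
        print(example, "->", normalize_identifier(example))
```

Unknown prefixes are deliberately passed through untouched, so the sketch never guesses at a mapping it has not been told about. It also does not address the harder cases in the table, such as type-specific URLs or unprefixed local IDs.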