Wikidata adapter v2 #341

Merged 5 commits on Jan 31, 2024
37 changes: 3 additions & 34 deletions packages/wikidata-experimental-adapter/README.md
@@ -1,39 +1,8 @@
# Wikidata experimental adapter
# Wikidata experimental adapter v2

A first iteration of a Wikidata integration to the Dataspecer tool.
The client queries the Wikidata SPARQL endpoint.
A second iteration of the Wikidata integration for the Dataspecer tool.
The client queries a Wikidata backend that serves an extracted ontology.

## Comments

- The root search
  - It queries the entire Wikidata, so the root can be any Wikidata entity, including instances and properties.
  - It handles only English for now.
- Hierarchy
  - The hierarchy is built by following `subclass of` properties up to the parents.
  - Using SPARQL, it can also follow the `subclass of` property in reverse to get the children.
- Surroundings
  - Each part of the surroundings (parents, children, and associations with their endpoints) has its own SPARQL query.
  - Parents and children are the same as in the hierarchy, but only to depth 1.
  - Associations:
    - Associations are created from the `subject type` and `value type` constraints on properties.
    - To find the properties of a class, it queries the SPARQL endpoint for properties that the class can be `subject of` or `value of`.
    - If the class is `subject of` a property, associations are created as outgoing edges whose ends are the property's `value types`. Conversely, if the class is `value of` a property, the edges are incoming and their endpoints are the property's `subject types`.
    - If the class is `subject of` a property but the property has a literal type, the property is an attribute.
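
The hierarchy bullets above describe following `subclass of` in both directions. A minimal sketch of what such queries could look like, assuming the `wd:` and `wdt:` prefixes predefined by the Wikidata query service and property `P279` (subclass of). The function names are hypothetical and are not taken from the adapter's source.

```typescript
// Hypothetical helpers for illustration; not part of the adapter.
// P279 is the Wikidata "subclass of" property.
const SUBCLASS_OF = "wdt:P279";

// Parents: follow `subclass of` from the class to its direct superclasses.
function parentsQuery(classId: number): string {
  return `SELECT ?parent WHERE { wd:Q${classId} ${SUBCLASS_OF} ?parent . }`;
}

// Children: follow `subclass of` in reverse, so the class is the object.
function childrenQuery(classId: number): string {
  return `SELECT ?child WHERE { ?child ${SUBCLASS_OF} wd:Q${classId} . }`;
}

console.log(parentsQuery(35120));
console.log(childrenQuery(35120));
```

Both queries return only direct neighbours, which matches the "depth 1" case of the surroundings; a full hierarchy would need a property path or repeated queries.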

## What can it do?

- search
  - search based on string
  - search based on IRI
- full hierarchy
  - children
  - parents
- surroundings
  - parents in height 1
  - children in depth 1
  - attributes (Wikidata properties that do not point to items, based on the subject constraint)
  - associations

## How to start it up for development?

1. `> git clone repository`
@@ -0,0 +1,44 @@
import type { EntityId, EntityIdsList } from './wd-entity';

export enum PropertyScopeValue {
  AS_MAIN = 0,
  AS_QUALIFIER = 1,
  AS_REFERENCE = 2,
}

export enum AllowedEntityTypesValue {
  ITEM = 0,
  PROPERTY = 1,
  LEXEME = 2,
  FORM = 3,
  SENSE = 4,
  MEDIA_INFO = 5,
}

export type StatementAllowanceMap = Record<string, EntityIdsList>;

export interface SubjectValueTypeContraint {
  readonly subclassOf: EntityIdsList;
  readonly instanceOf: EntityIdsList;
  readonly subclassOfInstanceOf: EntityIdsList;
}

export interface GeneralConstraints {
  readonly propertyScope: readonly PropertyScopeValue[];
  readonly allowedEntityTypes: readonly AllowedEntityTypesValue[];
  readonly allowedQualifiers: EntityIdsList;
  readonly requiredQualifiers: EntityIdsList;
  readonly conflictsWith: StatementAllowanceMap;
  readonly itemRequiresStatement: StatementAllowanceMap;
  readonly subjectType: SubjectValueTypeContraint;
}

export interface ItemTypeConstraints {
  readonly valueType: SubjectValueTypeContraint;
  readonly valueRequiresStatement: StatementAllowanceMap;
  readonly isSymmetric: boolean;
  readonly oneOf: EntityIdsList;
  readonly noneOf: EntityIdsList;
  readonly inverse: null | EntityId;
}
export type EmptyTypeConstraint = null;
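
As a rough illustration of how a consumer might read `propertyScope`, the sketch below checks whether a property may be used as a main statement value. The helper name is hypothetical, the local constant only mirrors the enum values above, and treating an empty list as "no scope constraint" is an assumption about the backend's encoding, not something this diff confirms.

```typescript
// Local stand-in mirroring the PropertyScopeValue enum (values copied from it).
const PropertyScopeValue = { AS_MAIN: 0, AS_QUALIFIER: 1, AS_REFERENCE: 2 } as const;
type PropertyScopeValue = (typeof PropertyScopeValue)[keyof typeof PropertyScopeValue];

// Hypothetical helper.
// ASSUMPTION: an empty scope list means the property carries no scope constraint.
function allowsUseAsMainValue(scope: readonly PropertyScopeValue[]): boolean {
  return scope.length === 0 || scope.indexOf(PropertyScopeValue.AS_MAIN) !== -1;
}

console.log(allowsUseAsMainValue([PropertyScopeValue.AS_QUALIFIER])); // false
console.log(allowsUseAsMainValue([]));                                // true
```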
@@ -0,0 +1,12 @@
import { IWdEntity, EntityIdsList, ExternalOntologyMapping } from './wd-entity';

export const ROOT_CLASS_ID = 35120;

export interface IWdClass extends IWdEntity {
  readonly subclassOf: EntityIdsList;
  readonly children?: EntityIdsList;
  readonly propertiesForThisType: EntityIdsList;
  readonly equivalentExternalOntologyClasses: ExternalOntologyMapping;
  readonly subjectOfProperty: EntityIdsList;
  readonly valueOfProperty: EntityIdsList;
}
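
Because each class lists its parents by id in `subclassOf`, ancestors can be gathered from an in-memory lookup. The following is a minimal sketch under that reading; `ClassNode` is a reduced stand-in for `IWdClass`, and `collectAncestors` is a hypothetical helper rather than part of this PR.

```typescript
type EntityId = number;

// Reduced stand-in for IWdClass: only the fields this sketch needs.
interface ClassNode {
  readonly id: EntityId;
  readonly subclassOf: readonly EntityId[];
}

// Hypothetical helper: collect all ancestors breadth-first. Ids missing from
// the map are skipped, and the `seen` set guards against cycles.
function collectAncestors(
  start: EntityId,
  classes: ReadonlyMap<EntityId, ClassNode>,
): EntityId[] {
  const seen = new Set<EntityId>([start]);
  const queue: EntityId[] = [start];
  const result: EntityId[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as EntityId;
    const node = classes.get(current);
    if (node === undefined) continue;
    for (const parent of node.subclassOf) {
      if (!seen.has(parent)) {
        seen.add(parent);
        result.push(parent);
        queue.push(parent);
      }
    }
  }
  return result;
}

// Toy data: 1 -> 2 -> 35120 (the root class id from above).
const sample = new Map<EntityId, ClassNode>([
  [1, { id: 1, subclassOf: [2] }],
  [2, { id: 2, subclassOf: [35120] }],
  [35120, { id: 35120, subclassOf: [] }],
]);
console.log(collectAncestors(1, sample)); // [ 2, 35120 ]
```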
@@ -0,0 +1,19 @@
export type LanguageMap = Record<string, string>;

export type EntityId = number;
export type EntityIdsList = readonly EntityId[];

export type ExternalEntityId = string;
export type ExternalOntologyMapping = readonly ExternalEntityId[];

export enum EntityTypes {
  CLASS,
  PROPERTY,
}

export interface IWdEntity {
  readonly id: EntityId;
  readonly labels: LanguageMap;
  readonly descriptions: LanguageMap;
  readonly instanceOf: EntityIdsList;
}
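
Two small helpers show how these types might be consumed. This is a sketch only: both function names are hypothetical, and it assumes the numeric `EntityId` is the Wikidata identifier without its `Q` or `P` letter prefix, which the root class id `35120` suggests but the diff does not state.

```typescript
type LanguageMap = Record<string, string>;
type EntityId = number;

// Wikidata entity IRIs look like http://www.wikidata.org/entity/Q42 (items)
// or .../P31 (properties).
// ASSUMPTION: the numeric EntityId is the identifier minus its letter prefix.
function toWikidataIri(id: EntityId, kind: "class" | "property"): string {
  const prefix = kind === "class" ? "Q" : "P";
  return `http://www.wikidata.org/entity/${prefix}${id}`;
}

// Hypothetical helper: prefer the requested language, fall back to English,
// and yield undefined when neither label exists.
function pickLabel(labels: LanguageMap, lang: string): string | undefined {
  return labels[lang] ?? labels["en"];
}

console.log(toWikidataIri(35120, "class")); // http://www.wikidata.org/entity/Q35120
console.log(pickLabel({ en: "entity", cs: "entita" }, "cs")); // entita
```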
@@ -0,0 +1,45 @@
import type { IWdEntity, EntityIdsList, ExternalOntologyMapping } from './wd-entity';
import type { EmptyTypeConstraint, GeneralConstraints, ItemTypeConstraints } from './constraints';

export enum UnderlyingType {
  ENTITY = 0,
  STRING = 1,
  TIME = 2,
  QUANTITY = 3,
  GLOBE_COORDINATE = 4,
}

export enum Datatype {
  ITEM = 0,
  PROPERTY = 1,
  LEXEME = 2,
  SENSE = 3,
  FORM = 4,
  MONOLINGUAL_TEXT = 5,
  STRING = 6,
  EXTERNAL_IDENTIFIER = 7,
  URL = 8,
  COMMONS_MEDIA_FILE = 9,
  GEOGRAPHIC_SHAPE = 10,
  TABULAR_DATA = 11,
  MATHEMATICAL_EXPRESSION = 12,
  MUSICAL_NOTATION = 13,
  QUANTITY = 14,
  POINT_IN_TIME = 15,
  GEOGRAPHIC_COORDINATES = 16,
}

export interface IWdProperty extends IWdEntity {
  readonly datatype: Datatype;
  readonly underlyingType: UnderlyingType;
  readonly subpropertyOf: EntityIdsList;
  readonly relatedProperty: EntityIdsList;
  readonly equivalentExternalOntologyProperties: ExternalOntologyMapping;
  readonly generalConstraints: GeneralConstraints;

  readonly itemConstraints?: ItemTypeConstraints;
  readonly stringConstraints?: EmptyTypeConstraint;
  readonly quantityConstraints?: EmptyTypeConstraint;
  readonly timeConstraints?: EmptyTypeConstraint;
  readonly coordinatesConstraints?: EmptyTypeConstraint;
}
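
The README's distinction between attributes (literal-valued properties) and associations (properties pointing to items) could be expressed over `underlyingType`. A minimal sketch, assuming `ENTITY` marks an association and every other underlying type marks an attribute; `PropertyLike` and `splitByKind` are hypothetical names, and the constant only mirrors the enum values above.

```typescript
// Local stand-in mirroring the UnderlyingType enum (values copied from it).
const UnderlyingType = {
  ENTITY: 0, STRING: 1, TIME: 2, QUANTITY: 3, GLOBE_COORDINATE: 4,
} as const;
type UnderlyingType = (typeof UnderlyingType)[keyof typeof UnderlyingType];

// Reduced stand-in for IWdProperty.
interface PropertyLike {
  readonly id: number;
  readonly underlyingType: UnderlyingType;
}

// ASSUMPTION: entity-valued properties become associations, all others attributes.
function splitByKind(props: readonly PropertyLike[]) {
  const attributes: PropertyLike[] = [];
  const associations: PropertyLike[] = [];
  for (const p of props) {
    if (p.underlyingType === UnderlyingType.ENTITY) {
      associations.push(p);
    } else {
      attributes.push(p);
    }
  }
  return { attributes, associations };
}

const split = splitByKind([
  { id: 1, underlyingType: UnderlyingType.ENTITY },
  { id: 2, underlyingType: UnderlyingType.STRING },
]);
console.log(split.associations.length, split.attributes.length); // 1 1
```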
57 changes: 57 additions & 0 deletions packages/wikidata-experimental-adapter/src/connector/response.ts
@@ -0,0 +1,57 @@
import { IWdClass } from "./entities/wd-class";
import { IWdProperty } from "./entities/wd-property";

// Error response

export interface IErrorResponse {
  statusCode: number;
  error: string;
  message: string;
}

// Search api

export interface ISearchResults {
  classes: IWdClass[];
}

export interface ISearchResponse {
  results: ISearchResults;
}

// Get class api

export interface IGetClassResults {
  classes: IWdClass[];
}

export interface IGetClassResponse {
  results: IGetClassResults;
}

// Get hierarchy api

export interface IHierarchyResults {
  root: IWdClass;
  parents: IWdClass[];
  children: IWdClass[];
}

export interface IHierarchyResponse {
  results: IHierarchyResults;
}

// Get surroundings api

export interface ISurroundingsResults {
  root: IWdClass;
  parents: IWdClass[];
  children: IWdClass[];
  subjectOf: IWdProperty[];
  valueOf: IWdProperty[];
  propertyEndpoints: IWdClass[];
}

export interface ISurroundingsResponse {
  results: ISurroundingsResults;
}
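
A surroundings response spreads classes over several arrays, while properties refer to classes by id. One plausible way to consume it is to index every returned class once. The sketch below assumes exactly that; `ClassLike`, `SurroundingsLike`, and `indexClassesById` are hypothetical reduced stand-ins, not code from this PR.

```typescript
// Reduced stand-ins for IWdClass and ISurroundingsResults.
interface ClassLike {
  readonly id: number;
}
interface SurroundingsLike {
  root: ClassLike;
  parents: ClassLike[];
  children: ClassLike[];
  propertyEndpoints: ClassLike[];
}

// Hypothetical helper: build an id -> class lookup over every class in the
// response. When an id occurs more than once, the first occurrence is kept.
function indexClassesById(results: SurroundingsLike): Map<number, ClassLike> {
  const byId = new Map<number, ClassLike>();
  const all = [
    results.root,
    ...results.parents,
    ...results.children,
    ...results.propertyEndpoints,
  ];
  for (const cls of all) {
    if (!byId.has(cls.id)) byId.set(cls.id, cls);
  }
  return byId;
}

const index = indexClassesById({
  root: { id: 5 },
  parents: [{ id: 35120 }],
  children: [{ id: 7 }],
  propertyEndpoints: [{ id: 35120 }, { id: 9 }],
});
console.log(index.size); // 4
```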
@@ -0,0 +1,49 @@
import { HttpFetch } from "@dataspecer/core/io/fetch/fetch-api";
import { IGetClassResponse, IHierarchyResponse, ISearchResponse, ISurroundingsResponse } from "./response";
import { EntityId } from "./entities/wd-entity";

export class WdConnector {
  // Backend base URL; hard-coded to a local development server.
  private readonly BASE_URL = "http://localhost:3042/api/v1";
  private readonly API_ENDPOINTS = {
    // encodeURIComponent (not encodeURI) so that reserved characters such as `&` and `=` in the query are escaped.
    search: (query: string) => this.BASE_URL + `/search?query=${encodeURIComponent(query)}`,
    getClass: (id: EntityId) => this.BASE_URL + `/classes/${id}`,
    hierarchy: (id: EntityId, part: 'full' | 'parents' | 'children') => this.BASE_URL + `/classes/${id}/hierarchy?part=${part}`,
    surroundings: (id: EntityId) => this.BASE_URL + `/classes/${id}/surroundings`,
  };

  private readonly httpFetch: HttpFetch;

  constructor(httpFetch: HttpFetch) {
    this.httpFetch = httpFetch;
  }

  // An error payload is recognised by the presence of all three IErrorResponse fields.
  private isIErrorResponse(response: object): boolean {
    return 'statusCode' in response &&
      'message' in response &&
      'error' in response;
  }

  public async search(query: string): Promise<ISearchResponse | undefined> {
    const url = this.API_ENDPOINTS.search(query);
    const resp = await ((await this.httpFetch(url)).json()) as object;
    return this.isIErrorResponse(resp) ? undefined : resp as ISearchResponse;
  }

  public async getClass(id: EntityId): Promise<IGetClassResponse | undefined> {
    const url = this.API_ENDPOINTS.getClass(id);
    const resp = await ((await this.httpFetch(url)).json()) as object;
    return this.isIErrorResponse(resp) ? undefined : resp as IGetClassResponse;
  }

  // Note: only the 'parents' part of the hierarchy is requested here.
  public async hierarchy(id: EntityId): Promise<IHierarchyResponse | undefined> {
    const url = this.API_ENDPOINTS.hierarchy(id, 'parents');
    const resp = await ((await this.httpFetch(url)).json()) as object;
    return this.isIErrorResponse(resp) ? undefined : resp as IHierarchyResponse;
  }

  public async surroundings(id: EntityId): Promise<ISurroundingsResponse | undefined> {
    const url = this.API_ENDPOINTS.surroundings(id);
    const resp = await ((await this.httpFetch(url)).json()) as object;
    return this.isIErrorResponse(resp) ? undefined : resp as ISurroundingsResponse;
  }
}
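
Two details of the connector can be illustrated in isolation: escaping the search term and recognising an error payload. The sketch below is a standalone variant, not the PR's code; `searchUrl` and `isErrorResponse` are hypothetical free functions, and the type-predicate form is one possible refinement of the boolean check.

```typescript
interface IErrorResponse {
  statusCode: number;
  error: string;
  message: string;
}

// Type-guard variant of the error check: the predicate lets the compiler
// narrow the value instead of relying on an `as` cast afterwards.
function isErrorResponse(response: unknown): response is IErrorResponse {
  return typeof response === "object" && response !== null &&
    "statusCode" in response && "message" in response && "error" in response;
}

// Query-string values are escaped with encodeURIComponent; encodeURI would
// leave reserved characters such as `&` and `=` untouched.
function searchUrl(baseUrl: string, query: string): string {
  return `${baseUrl}/search?query=${encodeURIComponent(query)}`;
}

console.log(searchUrl("http://localhost:3042/api/v1", "cat & dog"));
// → http://localhost:3042/api/v1/search?query=cat%20%26%20dog
console.log(encodeURI("cat & dog"));
// → cat%20&%20dog (the bare `&` would start a new query parameter)
```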

This file was deleted.

This file was deleted.
