
Data Models

Nacho edited this page Feb 21, 2015 · 5 revisions

We believe it is important to keep the databases mostly unaware of the format in which the data was originally modelled and stored. A reference to this format is stored only for specific purposes involving file transfers. Data models can be extended and changed to improve database performance; these changes are transparent to users.

Data models for CellBase data have been designed and implemented in Java. They explicitly specify the most commonly used fields, and at the same time provide mechanisms for preserving all the information of a certain format. For instance, the fields specified for a variant would be (among others) chromosome, position, reference and alternatives; if a VCF file is being stored, then columns such as INFO are also saved in a key-value data structure.
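As a rough illustration of this design, the sketch below pairs explicit fields with a key-value map for format-specific data. It is a minimal sketch under assumptions, not the actual Biodata class: the name VariantSketch, the method addVcfInfo and the attributes map are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a variant model; NOT the real Biodata class.
final class VariantSketch {
    // Commonly used fields are declared explicitly.
    private final String chromosome;
    private final int position;
    private final String reference;
    private final List<String> alternatives;

    // Format-specific data (for example VCF INFO entries) is preserved
    // as key-value pairs, so nothing from the source file is lost.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    VariantSketch(String chromosome, int position, String reference,
                  List<String> alternatives) {
        this.chromosome = chromosome;
        this.position = position;
        this.reference = reference;
        this.alternatives = new ArrayList<>(alternatives);
    }

    // Splits a VCF INFO column such as "DP=100;DB" into key-value pairs.
    // Flags without a value (like "DB") are stored with an empty string.
    void addVcfInfo(String info) {
        if (info == null || info.isEmpty() || info.equals(".")) {
            return; // "." marks a missing INFO column in VCF
        }
        for (String entry : info.split(";")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                attributes.put(entry, "");
            } else {
                attributes.put(entry.substring(0, eq), entry.substring(eq + 1));
            }
        }
    }

    String getChromosome() { return chromosome; }
    int getPosition() { return position; }
    String getReference() { return reference; }
    List<String> getAlternatives() { return Collections.unmodifiableList(alternatives); }
    Map<String, String> getAttributes() { return Collections.unmodifiableMap(attributes); }
}
```

With this shape, queries on chromosome or position read typed fields directly, while any extra column from the original file stays available through the map.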

Implementation

CellBase data models are stored in a related project, Biodata. This guarantees that all OpenCB projects speak the same language whether or not they use CellBase. You can visit the Biodata wiki for more detailed information. For a brief overview, see the following sections.

Gene
id: "ENSG00000139618",
name: "BRCA2",
biotype: "protein_coding",
status: "KNOWN",
chromosome: "13",
start: 32889611,
end: 32973805,
strand: "+",
source: "Ensembl",
description: "breast cancer 2, early onset [Source:HGNC Symbol;Acc:1101]",
transcripts: [
 {
 "id": "ENST00000380152",
 "name": "BRCA2-001",
 "biotype": "protein_coding",
 "status": "KNOWN",
 "chromosome": "13",
 "start": 32889611,
 "end": 32973347,
 "strand": "+",
 "genomicCodingStart": 32890598,
 "genomicCodingEnd": 32972907,
 "cdnaCodingStart": 234,
 "cdnaCodingEnd": 10490,
 "cdsLength": 10256,
 "proteinID": "ENSP00000369497",
 "description": "",
 "xrefs": [],
 "exons": []
 },
 {
 "id": "ENST00000544455",
 "name": "BRCA2-201",
 "biotype": "protein_coding",
 "status": "KNOWN",
 "chromosome": "13",
 "start": 32889617,
 "end": 32973805,
 "strand": "+",
 "genomicCodingStart": 32890598,
 "genomicCodingEnd": 32972907,
 "cdnaCodingStart": 228,
 "cdnaCodingEnd": 10484,
 "cdsLength": 10256,
 "proteinID": "ENSP00000439902",
 "description": "",
 "xrefs": [],
 "exons": []
 }
]
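The fields in the example above could map onto Java classes along the following lines. This is a hedged sketch only: GeneSketch, TranscriptSketch and findTranscript are hypothetical names that mirror a subset of the JSON keys shown, and the real Biodata classes may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; mirrors a subset of the keys in the example above.
final class TranscriptSketch {
    final String id;
    final String name;
    final String biotype;
    final int start;
    final int end;
    final String proteinID;

    TranscriptSketch(String id, String name, String biotype,
                     int start, int end, String proteinID) {
        this.id = id;
        this.name = name;
        this.biotype = biotype;
        this.start = start;
        this.end = end;
        this.proteinID = proteinID;
    }
}

// Hypothetical sketch of a gene that owns a list of transcripts.
final class GeneSketch {
    final String id;
    final String name;
    final String biotype;
    final String chromosome;
    final int start;
    final int end;
    final String strand;
    final List<TranscriptSketch> transcripts = new ArrayList<>();

    GeneSketch(String id, String name, String biotype, String chromosome,
               int start, int end, String strand) {
        this.id = id;
        this.name = name;
        this.biotype = biotype;
        this.chromosome = chromosome;
        this.start = start;
        this.end = end;
        this.strand = strand;
    }

    // Returns the transcript with the given id, or null if none matches.
    TranscriptSketch findTranscript(String transcriptId) {
        for (TranscriptSketch t : transcripts) {
            if (t.id.equals(transcriptId)) {
                return t;
            }
        }
        return null;
    }
}
```

Nesting the transcripts inside the gene matches the document structure shown in the example, where each gene carries its own transcript list.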

A full example can be found here:

https://wwwdev.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/feature/gene/BRCA2/info