Data Models
We believe that it is important to keep the databases mostly unaware of the format in which the data was originally modelled and stored. A reference to this format is only stored for specific purposes involving file transfers. Data models can be extended and changed to improve database performance; these changes are transparent to users.
Data models for CellBase data have been designed and implemented in Java. They explicitly specify the most commonly used fields, and at the same time provide mechanisms for preserving all the information of a certain format. For instance, the fields specified for a variant would be (among others) chromosome, position, reference and alternatives; if a VCF file is being stored, then columns such as INFO are also saved in a key-value data structure.
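The idea of explicit common fields plus a key-value store for format-specific data can be sketched as follows. This is only an illustration: the class name `VariantSketch`, its methods, and the sample values are hypothetical and are not taken from the actual Biodata code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; not the actual Biodata variant model.
class VariantSketch {
    // The most commonly used fields are declared explicitly.
    private final String chromosome;
    private final int position;
    private final String reference;
    private final List<String> alternatives;

    // Format-specific information (e.g. the VCF INFO column) is preserved as key-value pairs.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    VariantSketch(String chromosome, int position, String reference, List<String> alternatives) {
        this.chromosome = chromosome;
        this.position = position;
        this.reference = reference;
        this.alternatives = new ArrayList<>(alternatives);
    }

    void addAttribute(String key, String value) {
        attributes.put(key, value);
    }

    String getChromosome() { return chromosome; }
    int getPosition() { return position; }
    String getReference() { return reference; }
    List<String> getAlternatives() { return alternatives; }
    Map<String, String> getAttributes() { return attributes; }

    public static void main(String[] args) {
        // Made-up sample values, used only to show how the fields are filled.
        List<String> alts = new ArrayList<>();
        alts.add("G");
        VariantSketch variant = new VariantSketch("13", 32890598, "A", alts);
        variant.addAttribute("INFO", "DP=100");
        System.out.println(variant.getChromosome() + ":" + variant.getPosition()
                + " " + variant.getReference() + ">" + variant.getAlternatives()
                + " " + variant.getAttributes());
    }
}
```

Keeping the extra columns in a map means that a new input format does not require changing the explicit fields.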
CellBase data models are stored in a related project, Biodata. This guarantees that all OpenCB projects speak the same language, whether or not they use CellBase. You can visit the Biodata wiki for more detailed information. For a brief overview, take a look at the following sections.
id: "ENSG00000139618",
name: "BRCA2",
biotype: "protein_coding",
status: "KNOWN",
chromosome: "13",
start: 32889611,
end: 32973805,
strand: "+",
source: "Ensembl",
description: "breast cancer 2, early onset [Source:HGNC Symbol;Acc:1101]",
transcripts: [
  {
    "id": "ENST00000380152",
    "name": "BRCA2-001",
    "biotype": "protein_coding",
    "status": "KNOWN",
    "chromosome": "13",
    "start": 32889611,
    "end": 32973347,
    "strand": "+",
    "genomicCodingStart": 32890598,
    "genomicCodingEnd": 32972907,
    "cdnaCodingStart": 234,
    "cdnaCodingEnd": 10490,
    "cdsLength": 10256,
    "proteinID": "ENSP00000369497",
    "description": "",
    "xrefs": [],
    "exons": []
  },
  {
    "id": "ENST00000544455",
    "name": "BRCA2-201",
    "biotype": "protein_coding",
    "status": "KNOWN",
    "chromosome": "13",
    "start": 32889617,
    "end": 32973805,
    "strand": "+",
    "genomicCodingStart": 32890598,
    "genomicCodingEnd": 32972907,
    "cdnaCodingStart": 228,
    "cdnaCodingEnd": 10484,
    "cdsLength": 10256,
    "proteinID": "ENSP00000439902",
    "description": "",
    "xrefs": [],
    "exons": []
  }
]
A full example can be found here:
https://wwwdev.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/feature/gene/BRCA2/info
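The gene structure shown in the example could be mirrored in Java roughly as follows. This is a minimal sketch that covers only a subset of the fields; the class names `GeneSketch` and `TranscriptSketch` and their methods are hypothetical, and the real Biodata classes may be organised differently. The length calculations assume 1-based, inclusive coordinates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a gene model; the actual Biodata classes may differ.
class GeneSketch {
    final String id;
    final String name;
    final String chromosome;
    final int start;
    final int end;
    final String strand;
    // A gene holds a list of its transcripts, as in the example document.
    final List<TranscriptSketch> transcripts = new ArrayList<>();

    GeneSketch(String id, String name, String chromosome, int start, int end, String strand) {
        this.id = id;
        this.name = name;
        this.chromosome = chromosome;
        this.start = start;
        this.end = end;
        this.strand = strand;
    }

    void addTranscript(TranscriptSketch transcript) {
        transcripts.add(transcript);
    }

    // Assumption: coordinates are 1-based and inclusive.
    int length() {
        return end - start + 1;
    }

    public static void main(String[] args) {
        // Values copied from the BRCA2 example.
        GeneSketch brca2 = new GeneSketch("ENSG00000139618", "BRCA2", "13", 32889611, 32973805, "+");
        brca2.addTranscript(new TranscriptSketch("ENST00000380152", "BRCA2-001", 32890598, 32972907));
        brca2.addTranscript(new TranscriptSketch("ENST00000544455", "BRCA2-201", 32890598, 32972907));

        System.out.println(brca2.name + " spans " + brca2.length() + " bp");
        for (TranscriptSketch t : brca2.transcripts) {
            System.out.println(t.id + " coding region spans " + t.genomicCodingSpan() + " bp");
        }
    }
}

// Hypothetical sketch of a transcript model with a subset of the fields.
class TranscriptSketch {
    final String id;
    final String name;
    final int genomicCodingStart;
    final int genomicCodingEnd;

    TranscriptSketch(String id, String name, int genomicCodingStart, int genomicCodingEnd) {
        this.id = id;
        this.name = name;
        this.genomicCodingStart = genomicCodingStart;
        this.genomicCodingEnd = genomicCodingEnd;
    }

    // Genomic span of the coding region (introns included), assuming inclusive coordinates.
    int genomicCodingSpan() {
        return genomicCodingEnd - genomicCodingStart + 1;
    }
}
```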