MINDS is a framework designed to integrate multimodal oncology data. It queries and integrates data from multiple sources, including clinical data, genomic data, and imaging data from the NIH NCI CRDC and TCIA portals.
Note
We are currently updating MINDS to include more data sources and improve the user experience. If you have any suggestions or would like to contribute, please feel free to reach out to us. Here is a list of the projects to be included in MINDS (115,974 total patients).
Projects in MINDS
Project Name | Cases | Clinical | Radiology | Histopathology | Molecular |
---|---|---|---|---|---|
Foundation Medicine (FM) | 18,004 | ✓ | ✓ | ||
The Cancer Genome Atlas (TCGA) | 11,428 | ✓ | ✓ | ✓ | ✓ |
Therapeutically Applicable Research to Generate Effective Treatments (TARGET) | 6,543 | ✓ | ✓ | ||
Clinical Proteomic Tumor Analysis Consortium (CPTAC) | 1,656 | ✓ | ✓ | ✓ | |
The Molecular Profiling to Predict Response to Treatment (MP2PRT) | 1,562 | ✓ | ✓ | ||
Multiple Myeloma Research Foundation (MMRF) | 995 | ✓ | ✓ | ||
BEATAML1.0 | 882 | ✓ | ✓ | ||
Cancer Genome Characterization Initiatives (CGCI) | 645 | ✓ | ✓ | ✓ | |
NCI Center for Cancer Research (NCICCR) | 489 | ✓ | ✓ | ||
REBC | 449 | ✓ | ✓ | ||
MATCH | 448 | ✓ | ✓ | ||
Ukrainian National Research Center for Radiation Medicine Trio Study (TRIO) | 339 | ✓ | ✓ | ||
Count Me In (CMI) | 299 | ✓ | ✓ | ||
Human Cancer Model Initiative (HCMI) | 278 | ✓ | ✓ | ✓ | |
West Coast Prostate Cancer Dream Team (WCDT) | 101 | ✓ | ✓ | ||
Oregon Health and Science University (OHSU) | 176 | ✓ | ✓ | ||
Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) | 87 | ✓ | ✓ | ||
EXCEPTIONAL RESPONDERS | 84 | ✓ | ✓ | ||
Environment And Genetics in Lung Cancer Etiology (EAGLE) | 50 | ✓ | ✓ | ||
ORGANOID | 70 | ✓ | ✓ | ||
Clinical Trials Sequencing Project (CTSP) | 45 | ✓ | ✓ | ||
VA Research Precision Oncology Program (VAREPOP) | 7 | ✓ | ✓ | ||
4D-Lung | 20 | ✓ | |||
A091105 | 83 | ✓ | |||
AAPM-RT-MAC | 55 | ✓ | |||
ACNS0332 | 85 | ✓ | |||
ACRIN-6698 | 385 | ✓ | |||
ACRIN-Contralateral-Breast-MR | 984 | ✓ | |||
ACRIN-DSC-MR-Brain | 123 | ✓ | |||
ACRIN-FLT-Breast | 83 | ✓ | ✓ | ||
ACRIN-FMISO-Brain | 45 | ✓ | |||
ACRIN-HNSCC-FDG-PET-CT | 260 | ✓ | |||
ACRIN-NSCLC-FDG-PET | 242 | ✓ | |||
Adrenal-ACC-Ki67-Seg | 53 | ✓ | ✓ | ||
Advanced-MRI-Breast-Lesions | 632 | ✓ | ✓ | ✓ | |
AHEP0731 | 80 | ✓ | |||
AHOD0831 | 165 | ✓ | |||
AML-Cytomorphology_LMU | 200 | ✓ | |||
AML-Cytomorphology_MLL_Helmholtz | 189 | ✓ | |||
Anti-PD-1_Lung | 46 | ✓ | |||
Anti-PD-1_MELANOMA | 47 | ✓ | |||
APOLLO-5 | 414 | ✓ | |||
ARAR0331 | 108 | ✓ | |||
AREN0532 | 544 | ✓ | |||
AREN0533 | 294 | ✓ | |||
AREN0534 | 239 | ✓ | |||
B-mode-and-CEUS-Liver | 120 | ✓ | |||
Bone-Marrow-Cytomorphology_MLL_Helmholtz_Fraunhofer | 945 | ✓ | |||
Brain-TR-GammaKnife | 47 | ✓ | |||
Brain-Tumor-Progression | 20 | ✓ | |||
Breast-Cancer-Screening-DBT | 5,060 | ✓ | |||
BREAST-DIAGNOSIS | 88 | ✓ | |||
Breast-Lesions-USG | 256 | ✓ | |||
Breast-MRI-NACT-Pilot | 64 | ✓ | |||
Burdenko-GBM-Progression | 180 | ✓ | |||
C-NMC 2019 | 118 | ✓ | |||
C4KC-KiTS | 210 | ✓ | |||
CALGB50303 | 155 | ✓ | |||
CBIS-DDSM | 1,566 | ✓ | |||
CC-Radiomics-Phantom | 17 | ✓ | |||
CC-Radiomics-Phantom-2 | 251 | ✓ | |||
CC-Tumor-Heterogeneity | 23 | ✓ | |||
CDD-CESM | 326 | ✓ | |||
CMB-AML | 8 | ✓ | ✓ | ||
CMB-CRC | 49 | ✓ | ✓ | ||
CMB-GEC | 7 | ✓ | ✓ | ||
CMB-LCA | 61 | ✓ | ✓ | ||
CMB-MEL | 44 | ✓ | ✓ | ||
CMB-MML | 64 | ✓ | ✓ | ||
CMB-PCA | 12 | ✓ | ✓ | ||
CMMD | 1,775 | ✓ | ✓ | ✓ | |
CODEX imaging of HCC | 15 | ✓ | |||
Colorectal-Liver-Metastases | 197 | ✓ | |||
COVID-19-AR | 105 | ✓ | |||
COVID-19-NY-SBU | 1,384 | ✓ | |||
CRC_FFPE-CODEX_CellNeighs | 35 | ✓ | |||
CT COLONOGRAPHY | 825 | ✓ | ✓ | ||
CT Images in COVID-19 | 661 | ✓ | |||
CT Lymph Nodes | 176 | ✓ | |||
CT-ORG | 140 | ✓ | |||
CT-Phantom4Radiomics | 1 | ✓ | |||
CT-vs-PET-Ventilation-Imaging | 20 | ✓ | |||
CTpred-Sunitinib-panNET | 38 | ✓ | |||
DFCI-BCH-BWH-PEDs-HGG | 61 | ✓ | |||
DLBCL-Morphology | 209 | ✓ | |||
DRO-Toolkit | 32 | ✓ | |||
Duke-Breast-Cancer-MRI | 922 | ✓ | |||
EA1141 | 500 | ✓ | |||
ExACT | 30 | ✓ | |||
FDG-PET-CT-Lesions | 900 | ✓ | |||
GammaKnife-Hippocampal | 390 | ✓ | |||
GBM-DSC-MRI-DRO | 3 | ✓ | |||
GLIS-RT | 230 | ✓ | |||
HCC-TACE-Seg | 105 | ✓ | |||
HE-vs-MPM | 12 | ✓ | |||
Head-Neck Cetuximab | 111 | ✓ | |||
Head-Neck-PET-CT | 298 | ✓ | |||
HEAD-NECK-RADIOMICS-HN1 | 137 | ✓ | |||
Healthy-Total-Body-CTs | 30 | ✓ | |||
HER2 tumor ROIs | 273 | ✓ | |||
HistologyHSI-GB | 13 | ✓ | |||
HNC-IMRT-70-33 | 211 | ✓ | |||
HNSCC | 627 | ✓ | |||
HNSCC-3DCT-RT | 31 | ✓ | |||
HNSCC-mIF-mIHC-comparison | 8 | ✓ | |||
Hungarian-Colorectal-Screening | 200 | ✓ | |||
ISPY1 | 222 | ✓ | |||
ISPY2 | 719 | ✓ | |||
IvyGAP | 39 | ✓ | |||
LCTSC | 60 | ✓ | |||
LDCT-and-Projection-data | 299 | ✓ | |||
LGG-1p19qDeletion | 159 | ✓ | |||
LIDC-IDRI | 1,010 | ✓ | |||
Lung Phantom | 1 | ✓ | |||
Lung-Fused-CT-Pathology | 6 | ✓ | |||
Lung-PET-CT-Dx | 355 | ✓ | |||
LungCT-Diagnosis | 61 | ✓ | |||
Meningioma-SEG-CLASS | 96 | ✓ | |||
MIDRC-RICORD-1A | 110 | ✓ | |||
MIDRC-RICORD-1B | 117 | ✓ | |||
MIDRC-RICORD-1C | 361 | ✓ | |||
MiMM_SBILab | 5 | ✓ | |||
NADT-Prostate | 37 | ✓ | |||
NaF PROSTATE | 9 | ✓ | |||
NLST | 26,254 | ✓ | ✓ | ||
NRG-1308 | 12 | ✓ | |||
NSCLC Radiogenomics | 211 | ✓ | |||
NSCLC-Cetuximab | 490 | ✓ | |||
NSCLC-Radiomics | 422 | ✓ | |||
NSCLC-Radiomics-Genomics | 89 | ✓ | |||
NSCLC-Radiomics-Interobserver1 | 22 | ✓ | |||
OPC-Radiomics | 606 | ✓ | |||
Osteosarcoma-Tumor-Assessment | 4 | ✓ | |||
Ovarian Bevacizumab Response | 78 | ✓ | |||
Pancreas-CT | 82 | ✓ | |||
Pancreatic-CT-CBCT-SEG | 40 | ✓ | |||
PCa_Bx_3Dpathology | 50 | ✓ | ✓ | ||
Pediatric-CT-SEG | 359 | ✓ | |||
Pelvic-Reference-Data | 58 | ✓ | |||
Phantom FDA | 7 | ✓ | |||
Post-NAT-BRCA | 64 | ✓ | |||
Pretreat-MetsToBrain-Masks | 200 | ✓ | ✓ | ||
Prostate Fused-MRI-Pathology | 28 | ✓ | |||
Prostate-3T | 64 | ✓ | |||
Prostate-Anatomical-Edge-Cases | 131 | ✓ | |||
PROSTATE-DIAGNOSIS | 92 | ✓ | |||
PROSTATE-MRI | 26 | ✓ | |||
Prostate-MRI-US-Biopsy | 1,151 | ✓ | |||
PROSTATEx | 346 | ✓ | |||
Pseudo-PHI-DICOM-Data | 21 | ✓ | |||
PTRC-HGSOC | 174 | ✓ | |||
QIBA CT-1C | 1 | ✓ | |||
QIBA-CT-Liver-Phantom | 3 | ✓ | |||
QIN Breast DCE-MRI | 10 | ✓ | |||
QIN GBM Treatment Response | 54 | ✓ | |||
QIN LUNG CT | 47 | ✓ | |||
QIN PET Phantom | 2 | ✓ | |||
QIN PROSTATE | 22 | ✓ | |||
QIN-BRAIN-DSC-MRI | 49 | ✓ | |||
QIN-BREAST | 67 | ✓ | |||
QIN-BREAST-02 | 13 | ✓ | |||
QIN-HEADNECK | 279 | ✓ | |||
QIN-PROSTATE-Repeatability | 15 | ✓ | |||
QIN-SARCOMA | 15 | ✓ | |||
RADCURE | 3,346 | ✓ | ✓ | ||
REMBRANDT | 130 | ✓ | |||
ReMIND | 114 | ✓ | |||
RHUH-GBM | 40 | ✓ | |||
RIDER Breast MRI | 5 | ✓ | |||
RIDER Lung CT | 32 | ✓ | |||
RIDER Lung PET-CT | 244 | ✓ | |||
RIDER NEURO MRI | 19 | ✓ | |||
RIDER PHANTOM MRI | 10 | ✓ | |||
RIDER PHANTOM PET-CT | 20 | ✓ | |||
RIDER Pilot | 8 | ✓ | |||
S0819 | 1,299 | ✓ | |||
SLN-Breast | 78 | ✓ | |||
SN-AM | 60 | ✓ | |||
Soft-tissue-Sarcoma | 51 | ✓ | |||
SPIE-AAPM Lung CT Challenge | 70 | ✓ | |||
StageII-Colorectal-CT | 230 | ✓ | |||
UCSF-PDGM | 495 | ✓ | |||
UPENN-GBM | 630 | ✓ | |||
Vestibular-Schwannoma-MC-RC | 124 | ✓ | |||
Vestibular-Schwannoma-SEG | 242 | ✓ | |||
VICTRE | 2,994 | ✓ |
The cloud version of MINDS is currently in closed beta, but you can still recreate the MINDS database locally. To run the local version, you need to set up a MySQL database and populate it with the MINDS schema, which is easiest to do with a Docker container. First, install Docker by following the installation instructions for your operating system. Next, pull the MySQL Docker image and run a container with the following command.
Note
Please replace `my-secret-pw` with your desired password and `port` with the port you want to use to access the database. The default port for MySQL is 3306. The following command will not work until you replace `port` with a valid port number.
docker run -d --name minds -e MYSQL_ROOT_PASSWORD=my-secret-pw -e MYSQL_DATABASE=minds -p port:3306 mysql
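Because the command above fails until `port` is replaced, it can help to generate it from concrete values. The following is a minimal sketch, not part of MINDS: the helper name `docker_run_command` is hypothetical, and it only assembles the same `docker run` arguments shown above after checking that the host port is a valid TCP port.

```python
import shlex


def docker_run_command(host_port: int, root_password: str) -> str:
    """Build the `docker run` command from the README with a concrete host port.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"host_port must be in 1..65535, got {host_port}")
    if not root_password:
        raise ValueError("root_password must not be empty")
    args = [
        "docker", "run", "-d",
        "--name", "minds",
        "-e", f"MYSQL_ROOT_PASSWORD={root_password}",
        "-e", "MYSQL_DATABASE=minds",
        # Map the chosen host port to MySQL's default port inside the container.
        "-p", f"{host_port}:3306",
        "mysql",
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


if __name__ == "__main__":
    print(docker_run_command(3306, "my-secret-pw"))
```

Copy the printed line into a terminal to start the container.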
Finally, install the MINDS Python package with the following pip command:
pip install git+https://github.com/lab-rasool/MINDS.git
After installing the package, please create a .env file in the root directory of the project with the following variables:
HOST=127.0.0.1
PORT=3306
DB_USER=root
PASSWORD=my-secret-pw
DATABASE=minds
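Before running MINDS, you may want to confirm that the `.env` file defines all five variables listed above. The sketch below is an assumption-laden sanity check using only the standard library: `parse_env` and `missing_keys` are hypothetical helpers, and how MINDS itself reads the file is not specified here.

```python
REQUIRED_KEYS = ("HOST", "PORT", "DB_USER", "PASSWORD", "DATABASE")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw_line!r}")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    sample = """\
HOST=127.0.0.1
PORT=3306
DB_USER=root
PASSWORD=my-secret-pw
DATABASE=minds
"""
    env = parse_env(sample)
    print("missing:", missing_keys(env))
```

If you mapped the container to a host port other than 3306 in the `docker run` command, `PORT` here presumably needs to match that host port.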
If you have set up the MINDS database locally, you will need to populate it with data. To do this, or to update the database with the latest data, run the following Python code:
# Import the minds package
import minds
# Update the database with the latest data
minds.update()
The MINDS Python package provides a Python interface to the MINDS database. You can use this interface to query the database and return the results as a pandas DataFrame.
import minds
# get a list of all the tables in the database
tables = minds.get_tables()
# get a list of all the columns in a table
columns = minds.get_columns("clinical")
# Query the database directly
query = "SELECT * FROM minds.clinical WHERE project_id = 'TCGA-LUAD' LIMIT 10"
df = minds.query(query)
# Generate a cohort to download from query
query_cohort = minds.build_cohort(query=query, output_dir="./data")
# or you can now directly supply a cohort from GDC
gdc_cohort = minds.build_cohort(gdc_cohort="cohort_Unsaved_Cohort.2024-02-12.tsv", output_dir="./data")
# to get the cohort details
gdc_cohort.stats()
# to download the data from the cohort to the output directory specified
# you can also specify the number of threads to use and the modalities to exclude or include
gdc_cohort.download(threads=12, exclude=["Slide Image"])
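When the project ID comes from user input or a loop, building the SQL string by hand invites quoting mistakes. The sketch below shows one way to generate the same query as in the example above. It assumes `minds.query` takes a raw SQL string, as shown there; the helper `clinical_query` and the set of characters it allows in a project ID are hypothetical choices, not part of MINDS.

```python
import re

# Assumption: project IDs such as 'TCGA-LUAD' use only letters, digits, '_', '.', and '-'.
_PROJECT_ID = re.compile(r"[A-Za-z0-9_.-]+")


def clinical_query(project_id: str, limit: int = 10) -> str:
    """Return a SELECT on minds.clinical for one project (hypothetical helper)."""
    if not _PROJECT_ID.fullmatch(project_id):
        raise ValueError(f"unexpected characters in project_id: {project_id!r}")
    if type(limit) is not int or limit < 1:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    return (
        "SELECT * FROM minds.clinical "
        f"WHERE project_id = '{project_id}' LIMIT {limit}"
    )


if __name__ == "__main__":
    print(clinical_query("TCGA-LUAD"))
```

The resulting string can then be passed to `minds.query(...)` or `minds.build_cohort(query=...)` in the same way as the hand-written query.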
@Article{s24051634,
AUTHOR = {Tripathi, Aakash and Waqas, Asim and Venkatesan, Kavya and Yilmaz, Yasin and Rasool, Ghulam},
TITLE = {Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets},
JOURNAL = {Sensors},
VOLUME = {24},
YEAR = {2024},
NUMBER = {5},
ARTICLE-NUMBER = {1634},
URL = {https://www.mdpi.com/1424-8220/24/5/1634},
ISSN = {1424-8220},
DOI = {10.3390/s24051634}
}
We welcome contributions from the community. If you would like to contribute to the MINDS project, please read our contributing guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.