cBioPortal Data Collection Automation #84

inodb · 2018-01-25T18:21:30Z

Background:

The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.

Whenever data submissions come from external sources, a lot of manual curation is needed to make sure the data is imported smoothly and rendered correctly in the cBioPortal. We would like to automate parts of this data curation process. The automation will be handled in part through our datahub, a data repository that stores all cancer study data currently available in the cBioPortal.

Currently, whenever a Pull Request is made to datahub, the data undergoes a series of validation steps run by our data validation tool. However, to ensure that the data looks and renders as expected in the cBioPortal, one must manually import the data into a live instance of the portal. Automating this step in particular will be hugely beneficial to the QC process and greatly improve the turnaround time from data submission to import and visualization in the cBioPortal.

Goal:

Streamline and improve the turnaround time and review process for cancer study data submissions by automating the import of validated data files into a live instance of the cBioPortal.

Approach:

One option for spinning up review apps is Heroku, which we already use for reviewing changes to the backend of cBioPortal.

Another option might be the GitHub Action for AWS Lightsail.

Both platforms support Docker Compose, for which configuration files already exist.
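
As a rough illustration of the first step such an automation would need, the sketch below maps the files changed in a Pull Request to the study directories that would have to be imported into a review instance. It is only a sketch under stated assumptions: the `public/<study_id>/...` layout, the `changed_studies` helper, and the example study names are hypothetical and not taken from the datahub tooling.

```python
from pathlib import PurePosixPath

# Assumption: study folders live directly under this top-level directory.
STUDY_ROOT = "public"


def changed_studies(changed_paths, study_root=STUDY_ROOT):
    """Return the sorted study IDs touched by a list of changed file paths.

    A path counts only if it looks like <study_root>/<study_id>/<file...>.
    """
    studies = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == study_root:
            studies.add(parts[1])
    return sorted(studies)


if __name__ == "__main__":
    # Hypothetical list of files changed in a Pull Request.
    example = [
        "public/brca_example/meta_study.txt",
        "public/brca_example/data_clinical.txt",
        "public/other_study/case_lists/cases_all.txt",
        "README.md",
    ]
    for study in changed_studies(example):
        print(study)
```

The resulting list could then be handed to whichever import step the review app runs, so that only the studies touched by the Pull Request are loaded.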

Needed skills:

General problem solving skills.
Some basic knowledge of *nix, bash and devops would be useful, but can be learned during the project.

Possible mentors:
@inodb


css911 · 2019-02-28T09:13:10Z

Hello! It's Chetan. The idea is quite interesting and I would like to work on it. What task should I start with?

inodb · 2020-08-10T16:41:02Z

@ao508 I noticed this was transferred from GSoC. If we are not working on it, maybe we can transfer it back?

ao508 · 2020-08-10T19:19:22Z

@inodb that's okay with me

daniocionini · 2022-04-05T20:38:38Z

Very interesting idea. I would like to have a go at it. Where is the open source code to start from?

jagnathan · 2022-04-13T20:36:45Z

The source code is available on GitHub: https://github.com/cBioPortal

devharsh2k4 · 2023-02-23T14:00:04Z

Hey, I am interested in this project. Can you guide me further, @inodb?

muskan-k · 2023-02-25T08:24:43Z

Hi @inodb! I'm Muskan Kothari, currently a CSE senior at PES University, India. I'm here to contribute to this project through GSoC '23. I studied biology prior to starting undergrad in CSE and I'm highly interested in applying CSE to interdisciplinary domains. I have completed multiple projects applying computer science fundamentals to biology (measures of lexical diversity and Alzheimer's detection) and physics (tree-based models for the critical temperature of superconductors).

I also have experience working with big data and DevOps technologies like Docker and Kubernetes (converting a monolithic application to microservices), PySpark and Hadoop (sentiment analysis of Twitter).

I am proficient in programming languages like Python, C++ and Java, and comfortable using Git.

I found the cBioPortal organization a perfect mix of my interests in interdisciplinary projects and my skills in various technologies that particularly help this project - cBioPortal Data Collection Automation. I'd love to learn and contribute to this project.

I understand that working on some issues would strengthen my application, and I will also be spending time understanding the organization. I'd like to get started with my proposal. I've joined the Slack as well.

Could we perhaps set up a discussion call? Could you tell me what technologies would be involved under DevOps?

Thanks!
Muskan

inodb self-assigned this Jan 25, 2018

ao508 transferred this issue from cBioPortal/GSoC Jan 24, 2020

inodb transferred this issue from cBioPortal/datahub Aug 10, 2020

inodb added the GSoC-2021 GSoC 2021 Candidate Projects label Nov 16, 2020

cBioPortal deleted a comment from pieterlukasse Jan 25, 2021

inodb added GSoC-2022 GSoC 2022 Candidate Projects devops Size: Medium (175h) and removed GSoC-2021 GSoC 2021 Candidate Projects labels Feb 17, 2022

inodb removed their assignment Feb 22, 2022

cBioPortal deleted a comment from stale bot Feb 24, 2022

ao508 added cBioPortal enhancement Difficulty: Medium and removed enhancement labels Feb 24, 2022

jagnathan closed this as completed Apr 14, 2022

jagnathan reopened this Apr 14, 2022

inodb added GSoC-2023 GSoC 2023 Candidate Projects and removed GSoC-2022 GSoC 2022 Candidate Projects labels Jan 25, 2023

This was referenced Sep 11, 2023

Streamline Data Validation Process with Automated Staging Environment cBioPortal/datahub#1908

Open

Add preview workflow cBioPortal/datahub#1909

Draft
