cBioPortal Data Collection Automation #84
Comments
Hello! It's Chetan. The idea is quite interesting, and I would like to work on it. What task should I start with?
@ao508 I noticed this was transferred from GSoC. If we are not working on it, maybe we can transfer it back?
@inodb That's okay with me.
Very interesting idea. I would like to have a go at it. Where is the open-source code to start from?
The source code is available on GitHub: https://github.com/cBioPortal
Hey, I am interested in this project. Can you guide me further, @inodb?
Hi @inodb! I'm Muskan Kothari, currently a CSE senior at PES University, India. I'm here to contribute to this project through […]. I also have experience working in big data and DevOps technologies like […]. I am proficient in programming languages like […]. I found the cBioPortal organization a perfect mix of my interests in interdisciplinary projects and my skills in various technologies that particularly help this project […]. I understand that working on some issues would strengthen my application, and I will also spend time understanding the organization. I'd like to get started on my proposal. I've joined the Slack as well. Could we perhaps set up a discussion call? Could you tell me which technologies would be involved under DevOps? Thanks!
Background:
The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.
Whenever data submissions come from external sources, a lot of manual curation is needed to make sure the data is imported smoothly and rendered correctly in the cBioPortal. We would like to automate parts of this curation process, which will be handled in part through our datahub, a data repository that stores all cancer study data currently available in the cBioPortal.
Currently, whenever a pull request is opened against datahub, the data undergoes a series of validation steps run by our data validation tool. However, to ensure that the data looks and renders as expected in the cBioPortal, one must still import it manually into a live instance of the portal. Automating this step in particular would greatly benefit the QC process and shorten the turnaround time from data submission to import and visualization in the cBioPortal.
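As a rough illustration of the first thing such an automated import would need, a CI job has to work out which studies a pull request touches so that only those are loaded into the review instance. The sketch below assumes, for illustration only, that each study lives in its own folder under a top-level `public/` directory of datahub; the function name `studies_touched` and the example study names are hypothetical, not part of the existing tooling.

```python
from pathlib import PurePosixPath
from typing import Iterable


def studies_touched(changed_paths: Iterable[str], root: str = "public") -> list[str]:
    """Return the sorted study folder names touched by a set of changed files.

    Assumes a hypothetical layout of <root>/<study_id>/<file>; files outside
    <root>, or sitting directly inside it, are ignored.
    """
    studies = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Need at least root / study_id / file to identify a study.
        if len(parts) >= 3 and parts[0] == root:
            studies.add(parts[1])
    return sorted(studies)


if __name__ == "__main__":
    # Hypothetical list of files changed in a pull request.
    changed = [
        "public/brca_example/data_clinical_sample.txt",
        "public/brca_example/meta_study.txt",
        "public/lung_example/data_mutations.txt",
        "README.md",
    ]
    print(studies_touched(changed))
```

In a real workflow the list of changed paths would come from the CI system (for example a `git diff --name-only` against the base branch); deduplicating per study keeps the review instance from importing the same study several times.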
Goal:
Streamline and improve the turnaround time and review process for cancer study data submissions by automating the import of validated data files into a live instance of the cBioPortal.
Approach:
One option for spinning up review apps is Heroku, which we already use for reviewing changes to the backend of cBioPortal.
Another option might be a GitHub Action for AWS Lightsail.
Both platforms support Docker Compose, for which configuration files already exist.
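Whichever platform is chosen, the job would roughly need to bring the Docker Compose stack up, wait until the portal answers, and then import the touched studies. The sketch below only builds and checks those pieces; the compose file name and the `import_cmd` prefix are hypothetical placeholders for whatever the existing configuration and importer actually use. The readiness check takes the HTTP probe as a parameter so it can be exercised without a running portal.

```python
import time
import urllib.request
from typing import Callable, Sequence


def compose_up_cmd(compose_file: str) -> list[str]:
    # `docker compose -f FILE up -d` starts the services in the background.
    return ["docker", "compose", "-f", compose_file, "up", "-d"]


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that returns True once `url` answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except OSError:  # connection refused, timeouts, HTTP errors
            return False
    return probe


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 600.0,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it returns True or the attempt budget runs out."""
    attempts = max(1, int(timeout_s // interval_s))
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no pointless sleep after the last try
            sleep(interval_s)
    return False


def import_cmds(studies: Sequence[str], import_cmd: Sequence[str]) -> list[list[str]]:
    # import_cmd is a hypothetical command prefix; the study folder is appended.
    return [list(import_cmd) + [study] for study in studies]


if __name__ == "__main__":
    print(compose_up_cmd("docker-compose.yml"))
    # Hypothetical importer prefix; replace with the real import command.
    print(import_cmds(["brca_example"], ["import-study"]))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` in the CI job; keeping command construction separate from execution makes the plan easy to log and review before anything touches a live instance.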
Needed skills:
*nix, bash, and DevOps would be useful, but can be learned during the project.
Possible mentors:
@inodb