Repository for storing all sower jobs.
Contains a job for ingesting metadata from a file.
NOTE: These job configurations assume you have set up Service Accounts in k8s with fine-grained IAM roles in AWS to interact with an S3 bucket. See the Cloud Automation docs.
IMPORTANT NOTE: You must supply the correct "serviceAccountName" in the following examples. By default these are in the form "jobs-${hostname//./-}-sa". Example: jobs-example-planx-pla-net-sa. Cloud Automation enables the creation of the necessary infrastructure (buckets, SAs, roles) by running gen3 kube-setup-sower-jobs.
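As a rough illustration of the naming form quoted in the note above, the bash expansion `${hostname//./-}` replaces every dot in the hostname with a dash. The sketch below mirrors that in Python. The function name `service_account_name` and the hostname `example.planx-pla.net` are assumptions made for illustration (the hostname is inferred from the example name); check the service account actually created in your cluster before relying on the result.

```python
def service_account_name(hostname: str, suffix: str = "-sa") -> str:
    """Mirror the bash form "jobs-${hostname//./-}-sa": every dot becomes a dash.

    The helper name and the suffix parameter are hypothetical, not part of
    Cloud Automation.
    """
    return "jobs-" + hostname.replace(".", "-") + suffix


# Hostname assumed for illustration; it matches the example name in the note.
print(service_account_name("example.planx-pla.net"))
# prints: jobs-example-planx-pla-net-sa
```

The `suffix` parameter is exposed only so the helper can be adapted if your environment uses a different naming convention.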
{
"name": "ingest-metadata-manifest",
"action": "ingest-metadata-manifest",
"serviceAccountName": "jobs-${hostname//./-}",
"container": {
"name": "job-task",
"image": "quay.io/cdis/metadata-manifest-ingestion:master",
"pull_policy": "Always",
"env": [
{
"name": "GEN3_HOSTNAME",
"valueFrom": {
"configMapKeyRef": {
"name": "manifest-global",
"key": "hostname"
}
}
}
],
"volumeMounts": [
{
"name": "creds-volume",
"readOnly": true,
"mountPath": "/creds.json",
"subPath": "creds.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "creds-volume",
"secret": {
"secretName": "sower-jobs-g3auto"
}
}
],
"restart_policy": "Never"
}
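Before adding a job configuration like the one above to a manifest, a quick consistency check can catch common mistakes. The following is a minimal sketch, not part of sower itself: the function `check_job_config` and the list of required keys are assumptions chosen for illustration. It flags an unexpanded "serviceAccountName" placeholder and any volumeMount that has no matching entry in "volumes".

```python
from typing import List

# Assumption: these keys appear in every example config in this README.
REQUIRED_KEYS = ("name", "action", "container")


def check_job_config(job: dict) -> List[str]:
    """Return a list of human-readable problems found in a sower job config."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in job:
            problems.append(f"missing key: {key}")
    # The README examples use a bash-style placeholder that must be replaced.
    if "${" in job.get("serviceAccountName", ""):
        problems.append("serviceAccountName still contains an unexpanded placeholder")
    # Every volumeMount must refer to a volume declared at the job level.
    declared = {volume["name"] for volume in job.get("volumes", [])}
    for mount in job.get("container", {}).get("volumeMounts", []):
        if mount["name"] not in declared:
            problems.append(f"volumeMount {mount['name']!r} has no matching volume")
    return problems


job = {
    "name": "ingest-metadata-manifest",
    "action": "ingest-metadata-manifest",
    "serviceAccountName": "jobs-${hostname//./-}",
    "container": {"volumeMounts": [{"name": "creds-volume"}]},
    "volumes": [{"name": "creds-volume"}],
}
print(check_job_config(job))
# prints: ['serviceAccountName still contains an unexpanded placeholder']
```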
Contains a job that parses dbGaP, associates samples with indexed file objects, and returns the resulting file. You can then QA the file and use the "Metadata Ingestion" job above to get the metadata into a commons.
{
"name": "get-dbgap-metadata",
"action": "get-dbgap-metadata",
"serviceAccountName": "jobs-${hostname//./-}",
"container": {
"name": "job-task",
"image": "quay.io/cdis/get-dbgap-metadata:master",
"pull_policy": "Always",
"env": [],
"volumeMounts": [
{
"name": "creds-volume",
"readOnly": true,
"mountPath": "/creds.json",
"subPath": "creds.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "creds-volume",
"secret": {
"secretName": "sower-jobs-g3auto"
}
}
],
"restart_policy": "Never"
}
Contains jobs for indexing an object manifest and for downloading an indexd manifest.
The following are the manifest configs for the manifest indexing job and the indexd manifest download job:
{
"name": "manifest-indexing",
"action": "index-object-manifest",
"serviceAccountName": "jobs-${hostname//./-}",
"container": {
"name": "job-task",
"image": "quay.io/cdis/manifest-indexing:master",
"pull_policy": "Always",
"env": [
{
"name": "GEN3_HOSTNAME",
"valueFrom": {
"configMapKeyRef": {
"name": "manifest-global",
"key": "hostname"
}
}
}
],
"volumeMounts": [
{
"name": "sower-jobs-creds-volume",
"readOnly": true,
"mountPath": "/creds.json",
"subPath": "creds.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "sower-jobs-creds-volume",
"secret": {
"secretName": "sower-jobs-g3auto"
}
}
],
"restart_policy": "Never"
}
{
"name": "indexd-manifest",
"action": "download-indexd-manifest",
"serviceAccountName": "jobs-${hostname//./-}",
"container": {
"name": "job-task",
"image": "quay.io/cdis/download-indexd-manifest:master",
"pull_policy": "Always",
"env": [
{
"name": "GEN3_HOSTNAME",
"valueFrom": {
"configMapKeyRef": {
"name": "manifest-global",
"key": "hostname"
}
}
}
],
"volumeMounts": [
{
"name": "sower-jobs-creds-volume",
"readOnly": true,
"mountPath": "/creds.json",
"subPath": "creds.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "sower-jobs-creds-volume",
"secret": {
"secretName": "sower-jobs-g3auto"
}
}
],
"restart_policy": "Never"
}
Contains a job for merging bucket manifests of file objects. It can be set up with the following cdis-manifest configuration:
{
"name": "manifest-merging",
"action": "merge-manifests",
"activeDeadlineSeconds": 86400,
"serviceAccountName": "jobs-${hostname//./-}",
"container": {
"name": "job-task",
"image": "quay.io/cdis/manifest-merging:master",
"pull_policy": "Always",
"env": [
{
"name": "GEN3_HOSTNAME",
"valueFrom": {
"configMapKeyRef": {
"name": "manifest-global",
"key": "hostname"
}
}
}
],
"volumeMounts": [
{
"name": "sower-jobs-creds-volume",
"readOnly": true,
"mountPath": "/creds.json",
"subPath": "creds.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "sower-jobs-creds-volume",
"secret": {
"secretName": "sower-jobs-g3auto"
}
}
],
"restart_policy": "Never"
}
The export job accepts a list of discovery metadata GUIDs and returns an S3 presigned URL pointing to a zip of the __manifest-referenced files of all GUIDs.
This job uses the aggregate metadata service for lookups and generates a temporary API key for downloads via the gen3 SDK.
Export sizes are limited to 250 megabytes and are stored under a user-unique S3 object. Each new download replaces the previous, if there was one.
{
"name": "batch-export",
"action": "batch-export",
"activeDeadlineSeconds": 600,
"container": {
"name": "job-task",
"image": "quay.io/cdis/batch-export:master",
"pull_policy": "Always",
"env": [
{
"name": "GEN3_HOSTNAME",
"valueFrom": {
"configMapKeyRef": {
"name": "manifest-global",
"key": "hostname"
}
}
}
],
"volumeMounts": [
{
"name": "batch-export-creds-volume",
"readOnly": true,
"mountPath": "/batch-export-creds.json",
"subPath": "config.json"
}
],
"cpu-limit": "1",
"memory-limit": "1Gi"
},
"volumes": [
{
"name": "batch-export-creds-volume",
"secret": {
"secretName": "batch-export-g3auto"
}
}
],
"restart_policy": "Never"
}
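The 250 megabyte export limit described above can be pictured as a simple pre-check over the sizes of the referenced files. This is only a sketch under stated assumptions: the names `MAX_EXPORT_BYTES` and `fits_export_limit` are hypothetical, and the limit is interpreted here as decimal megabytes (250 × 10^6 bytes), which this document does not confirm. The actual job may measure and enforce the limit differently.

```python
from typing import Iterable

# Assumption: "250 megabytes" is read as 250 * 10**6 bytes.
MAX_EXPORT_BYTES = 250 * 1000 * 1000


def fits_export_limit(file_sizes: Iterable[int], limit: int = MAX_EXPORT_BYTES) -> bool:
    """Return True if the combined size of the files stays within the limit."""
    return sum(file_sizes) <= limit


# Two files of 100 MB each fit; adding a third of 100 MB exceeds the limit.
print(fits_export_limit([100_000_000, 100_000_000]))
print(fits_export_limit([100_000_000, 100_000_000, 100_000_000]))
```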
See here.
The secret sower-jobs-g3auto should be set up automatically with Cloud Automation and contains a JSON blob with:
{
"index-object-manifest": {
"job_requires": {
"arborist_url": "http://arborist-service",
"job_access_req": [
{
"resource": "/sower",
"action": {
"service": "job",
"method": "access"
}
},
{
"resource": "/programs",
"action": {
"service": "indexd",
"method": "write"
}
}
]
},
"bucket": "some-bucket",
"indexd_user": "",
"indexd_password": ""
},
"download-indexd-manifest": {
"job_requires": {
"arborist_url": "http://arborist-service",
"job_access_req": [
{
"resource": "/sower",
"action": {
"service": "job",
"method": "access"
}
}
]
},
"bucket": "some-bucket"
},
"get-dbgap-metadata": {
"job_requires": {
"arborist_url": "http://arborist-service",
"job_access_req": [
{
"resource": "/sower",
"action": {
"service": "job",
"method": "access"
}
},
{
"resource": "/mds_gateway",
"action": {
"service": "mds_gateway",
"method": "access"
}
}
]
},
"bucket": "some-bucket"
},
"ingest-metadata-manifest": {
"job_requires": {
"arborist_url": "http://arborist-service",
"job_access_req": [
{
"resource": "/sower",
"action": {
"service": "job",
"method": "access"
}
},
{
"resource": "/mds_gateway",
"action": {
"service": "mds_gateway",
"method": "access"
}
}
]
},
"bucket": "some-bucket"
}
}
NOTE: Some of the above fields are set to a default value if they are missing or empty. Specifically, you can leave out "arborist_url" and "job_access_req", and the default Arborist URL and access requirements will be used. Also note that the "bucket" and AWS creds can be the same for all the jobs, or different if necessary.
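The defaulting behavior described in the note can be sketched as follows. This is an illustration, not the jobs' actual code: the function `with_defaults` is hypothetical, the default Arborist URL is taken from the example blob above, and the default access requirement shown here (job access on /sower) is an assumption, since this document does not state what the real default is.

```python
import copy

# Taken from the example blob above.
DEFAULT_ARBORIST_URL = "http://arborist-service"
# Assumption: placeholder default; the real default is not stated in this README.
DEFAULT_JOB_ACCESS_REQ = [
    {"resource": "/sower", "action": {"service": "job", "method": "access"}}
]


def with_defaults(job_creds: dict) -> dict:
    """Return a copy of one job's creds with missing or empty fields filled in."""
    result = copy.deepcopy(job_creds)  # leave the caller's dict untouched
    requires = result.setdefault("job_requires", {})
    if not requires.get("arborist_url"):
        requires["arborist_url"] = DEFAULT_ARBORIST_URL
    if not requires.get("job_access_req"):
        requires["job_access_req"] = copy.deepcopy(DEFAULT_JOB_ACCESS_REQ)
    return result


filled = with_defaults({"bucket": "some-bucket"})
print(filled["job_requires"]["arborist_url"])
# prints: http://arborist-service
```

Explicitly provided values are kept as they are; only missing or empty fields are replaced.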
- Go to quay.io and "Create a new repository" by clicking the plus at the top
- Use whatever name you want; you'll need to use this same name in the "image" option in the example configurations above
  - For example, we can use get-dbgap-metadata and then reference the image with quay.io/cdis/get-dbgap-metadata:master
- Select "Public" visibility and "Link to a Github Repository Push"
- Click "Create Public Repo" button
- Select the uc-cdis org and find this repo (sower-jobs)
- Keep default "Trigger for all branches and tags (default)" option
- Hit "Continue" button
- Leave defaults for "Configure Tagging" section
- Hit "Continue" button
- For "Select Dockerfile", enter the path to the new Dockerfile you're trying to build
  - NOTE: If you're trying to build something on a branch for testing, this box won't autopopulate, but you can still type the path to the file on the branch and it'll work
  - Example: /metadata_ingestion/get_dbgap_metadata_manifest.Dockerfile
- For the "Select Context" section, enter the path to the folder that contains the Dockerfile
  - Example: /metadata_ingestion
- If the Dockerfile is on a branch, you may get a warning: "Verification Warning: Specified Dockerfile path for the trigger was not found on the main branch. This trigger may fail."
  - This is fine. Hit the "Continue" button
- Hit final green "Continue" button!
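The values entered in the steps above are related to each other: the image reference follows from the repository name, and the build context is the folder containing the Dockerfile. The sketch below shows that relationship only. The helper `quay_build_settings` is hypothetical, it does not call any Quay API, and the `cdis` organization and `master` tag are taken from the image names used in the example configurations.

```python
import posixpath


def quay_build_settings(repo_name: str, dockerfile_path: str,
                        org: str = "cdis", tag: str = "master") -> dict:
    """Derive the image reference and build context for a Quay build trigger."""
    return {
        # Value to use for the "image" option in the job configuration.
        "image": f"quay.io/{org}/{repo_name}:{tag}",
        # Value for the "Select Dockerfile" step.
        "dockerfile": dockerfile_path,
        # Value for the "Select Context" step: the folder holding the Dockerfile.
        "context": posixpath.dirname(dockerfile_path),
    }


settings = quay_build_settings(
    "get-dbgap-metadata",
    "/metadata_ingestion/get_dbgap_metadata_manifest.Dockerfile",
)
print(settings["image"])
# prints: quay.io/cdis/get-dbgap-metadata:master
print(settings["context"])
# prints: /metadata_ingestion
```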