update readmes (#319)
rcannood authored Dec 19, 2023
1 parent 2cf2a73 commit 45e9e43
Showing 3 changed files with 40 additions and 26 deletions.
4 changes: 3 additions & 1 deletion src/tasks/denoising/README.md
@@ -93,7 +93,7 @@ Format:
<div class="small">

AnnData object
-obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
+obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score'
obsm: 'X_pca'
obsp: 'knn_distances', 'knn_connectivities'
@@ -120,6 +120,8 @@ Slot description:
| `obs["disease_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). |
| `obs["donor_id"]` | `string` | (*Optional*) Identifier for the donor from whom the cell sample is obtained. |
| `obs["is_primary_data"]` | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
+| `obs["organism"]` | `string` | (*Optional*) Organism from which the cell sample is obtained. |
+| `obs["organism_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. |
| `obs["self_reported_ethnicity"]` | `string` | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
| `obs["self_reported_ethnicity_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. |
| `obs["sex"]` | `string` | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
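The ontology constraints named in the rows above can be illustrated with a small prefix check. This is only a sketch under stated assumptions: the function name `check_ontology_terms` and its two-argument shape are hypothetical and not part of the repository. It checks prefixes only and does not verify that an organism term is a child of `NCBITaxon:33208`, which would need an ontology lookup. It also treats any non-`HANCESTRO:` value for a human sample as a problem, which may be stricter than the schema intends.

```python
from typing import List, Optional

HUMAN_TERM = "NCBITaxon:9606"  # human, as quoted in the slot description


def check_ontology_terms(
    organism_term: Optional[str],
    ethnicity_term: Optional[str],
) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: both slots are optional, so None is accepted.
    """
    problems: List[str] = []

    # The organism term must come from the NCBI Taxonomy Ontology.
    if organism_term is not None and not organism_term.startswith("NCBITaxon:"):
        problems.append(f"organism term lacks NCBITaxon: prefix: {organism_term!r}")

    # For human samples, the ethnicity term is expected to use HANCESTRO.
    if (
        organism_term == HUMAN_TERM
        and ethnicity_term is not None
        and not ethnicity_term.startswith("HANCESTRO:")
    ):
        problems.append(f"human ethnicity term lacks HANCESTRO: prefix: {ethnicity_term!r}")

    return problems
```

A check like this could be run per row of `obs` before a dataset is accepted; the ancestry test against `NCBITaxon:33208` is deliberately left out.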
4 changes: 3 additions & 1 deletion src/tasks/dimensionality_reduction/README.md
@@ -85,7 +85,7 @@ Format:
<div class="small">

AnnData object
-obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
+obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score'
obsm: 'X_pca'
obsp: 'knn_distances', 'knn_connectivities'
@@ -112,6 +112,8 @@ Slot description:
| `obs["disease_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). |
| `obs["donor_id"]` | `string` | (*Optional*) Identifier for the donor from whom the cell sample is obtained. |
| `obs["is_primary_data"]` | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
+| `obs["organism"]` | `string` | (*Optional*) Organism from which the cell sample is obtained. |
+| `obs["organism_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. |
| `obs["self_reported_ethnicity"]` | `string` | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
| `obs["self_reported_ethnicity_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. |
| `obs["sex"]` | `string` | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
58 changes: 34 additions & 24 deletions src/tasks/predict_modality/README.md
@@ -50,7 +50,7 @@ the information about cellular state from one modality to the other.

``` mermaid
flowchart LR
-file_dataset_rna("Raw dataset RNA")
+file_common_dataset_rna("Raw dataset RNA")
comp_process_dataset[/"Data processor"/]
file_train_mod1("Train mod1")
file_train_mod2("Train mod2")
@@ -61,8 +61,8 @@ flowchart LR
comp_metric[/"Metric"/]
file_prediction("Prediction")
file_score("Score")
-file_dataset_other_mod("Raw dataset mod2")
-file_dataset_rna---comp_process_dataset
+file_common_dataset_other_mod("Raw dataset mod2")
+file_common_dataset_rna---comp_process_dataset
comp_process_dataset-->file_train_mod1
comp_process_dataset-->file_train_mod2
comp_process_dataset-->file_test_mod1
@@ -79,14 +79,15 @@ flowchart LR
comp_method-->file_prediction
comp_metric-->file_score
file_prediction---comp_metric
-file_dataset_other_mod---comp_process_dataset
+file_common_dataset_other_mod---comp_process_dataset
```

## File format: Raw dataset RNA

The RNA modality of the raw dataset.

-Example file: `resources_test/common/bmmc_cite_starter/dataset_rna.h5ad`
+Example file:
+`resources_test/common/neurips2021_bmmc_cite/dataset_rna.h5ad`

Description:

@@ -100,24 +101,31 @@ Format:
obs: 'batch', 'size_factors'
var: 'gene_ids'
obsm: 'gene_activity'
-layers: 'counts'
-uns: 'dataset_id', 'gene_activity_var_names'
+layers: 'counts', 'normalized'
+uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'gene_activity_var_names'

</div>

Slot description:

<div class="small">

-| Slot | Type | Description |
-|:---------------------------------|:----------|:-------------------------------------------------------------------|
-| `obs["batch"]` | `string` | Batch information. |
-| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. |
-| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). |
-| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. |
+| Slot | Type | Description |
+|:---------------------------------|:----------|:-------------------------------------------------------------------------------|
+| `obs["batch"]` | `string` | Batch information. |
+| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. |
+| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). |
+| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `layers["normalized"]` | `double` | Normalized expression values. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. |

</div>
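To make the effect of the new required slots concrete, here is a minimal sketch that reports which non-optional slots from the updated table are absent. The names `REQUIRED_SLOTS` and `missing_required_slots` are hypothetical, and the input is a plain mapping from slot group to slot names rather than a real AnnData object, so the example stays dependency-free.

```python
from typing import Dict, Iterable, List

# Slots from the updated "Raw dataset RNA" table that are not marked (*Optional*).
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "obs": ["batch"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
}


def missing_required_slots(present: Dict[str, Iterable[str]]) -> List[str]:
    """List required slots that are absent, formatted like the table (e.g. uns["dataset_name"])."""
    missing: List[str] = []
    for group, names in REQUIRED_SLOTS.items():
        have = set(present.get(group, ()))
        missing.extend(f'{group}["{name}"]' for name in names if name not in have)
    return missing


# A file laid out according to the previous format description.
old_layout = {
    "obs": ["batch", "size_factors"],
    "var": ["gene_ids"],
    "obsm": ["gene_activity"],
    "layers": ["counts"],
    "uns": ["dataset_id", "gene_activity_var_names"],
}
print(missing_required_slots(old_layout))
```

With the `anndata` package, the `present` mapping could presumably be built from `adata.obs.columns`, `adata.layers.keys()`, and `adata.uns.keys()`; that wiring is left out here.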

@@ -149,7 +157,7 @@ Arguments:
The mod1 expression values of the train cells.

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/train_mod1.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/train_mod1.h5ad`

Description:

@@ -191,7 +199,7 @@ Slot description:
The mod2 expression values of the train cells.

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/train_mod2.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/train_mod2.h5ad`

Description:

@@ -233,7 +241,7 @@ Slot description:
The mod1 expression values of the test cells.

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/test_mod1.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/test_mod1.h5ad`

Description:

@@ -280,7 +288,7 @@ Slot description:
The mod2 expression values of the test cells.

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/test_mod2.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/test_mod2.h5ad`

Description:

@@ -387,7 +395,7 @@ Arguments:
A prediction of the mod2 expression values of the test cells

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/prediction.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/prediction.h5ad`

Description:

@@ -420,7 +428,7 @@ Slot description:
Metric score file

Example file:
-`resources_test/predict_modality/bmmc_cite_starter/score.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/score.h5ad`

Description:

@@ -453,7 +461,8 @@ Slot description:
The second modality of the raw dataset. Must be an ADT or an ATAC
dataset

-Example file: `resources_test/common/bmmc_cite_starter/dataset_adt.h5ad`
+Example file:
+`resources_test/common/neurips2021_bmmc_cite/dataset_other_mod.h5ad`

Description:

@@ -467,7 +476,7 @@ Format:
obs: 'batch', 'size_factors'
var: 'gene_ids'
obsm: 'gene_activity'
-layers: 'counts'
+layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'gene_activity_var_names'

</div>
@@ -483,6 +492,7 @@ Slot description:
| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). |
| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. |
| `layers["counts"]` | `integer` | Raw counts. |
+| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |