update readmes (#319)

openproblems-bio · Dec 19, 2023 · 45e9e43
1 parent 2cf2a73
commit 45e9e43

Showing 3 changed files with 40 additions and 26 deletions.
diff --git a/src/tasks/denoising/README.md b/src/tasks/denoising/README.md
@@ -93,7 +93,7 @@ Format:
 <div class="small">
 
     AnnData object
-     obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
+     obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
      var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score'
      obsm: 'X_pca'
      obsp: 'knn_distances', 'knn_connectivities'
@@ -120,6 +120,8 @@ Slot description:
 | `obs["disease_ontology_term_id"]`                 | `string`  | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`).                                                                                                                                                                                                                              |
 | `obs["donor_id"]`                                 | `string`  | (*Optional*) Identifier for the donor from whom the cell sample is obtained.                                                                                                                                                                                                                                                                                                                                                                                                                  |
 | `obs["is_primary_data"]`                          | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data.                                                                                                                                                                                                                                                                                                                                          |
+| `obs["organism"]`                                 | `string`  | (*Optional*) Organism from which the cell sample is obtained.                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| `obs["organism_ontology_term_id"]`                | `string`  | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`.                                                                                                                                                                                                                                                                             |
 | `obs["self_reported_ethnicity"]`                  | `string`  | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits.                                                                                                                                                                                                                                                                                                                                                      |
 | `obs["self_reported_ethnicity_ontology_term_id"]` | `string`  | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used.                                                                                                                                                                                                                    |
 | `obs["sex"]`                                      | `string`  | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions.                                                                                                                                                                                                                                                                                                                                                                 |
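
Review note: the new `obs["organism_ontology_term_id"]` row requires an `NCBITaxon:` term. A minimal sketch of a syntax check follows, assuming a term is the prefix followed by a numeric taxon ID. The function name `invalid_organism_terms` and the pattern are illustrative and not part of the repository. Whether a term is a child of `NCBITaxon:33208` would need an ontology lookup, which this sketch does not attempt.

```python
import re

# Assumption: a valid term is "NCBITaxon:" followed by a numeric taxon ID.
NCBITAXON_PATTERN = re.compile(r"NCBITaxon:\d+")


def invalid_organism_terms(terms):
    """Return (position, value) pairs for terms that do not look like NCBITaxon IDs.

    Only the identifier syntax is checked; ancestry in the taxonomy is not.
    """
    return [
        (i, term)
        for i, term in enumerate(terms)
        if not NCBITAXON_PATTERN.fullmatch(str(term))
    ]


if __name__ == "__main__":
    # Hypothetical values, not taken from any dataset in the repository.
    print(invalid_organism_terms(["NCBITaxon:9606", "9606", "NCBITaxon:10090"]))
```

With an AnnData object, the values of `adata.obs["organism_ontology_term_id"]` could be passed in directly. The column is optional, so a caller would first need to check that it exists.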

diff --git a/src/tasks/dimensionality_reduction/README.md b/src/tasks/dimensionality_reduction/README.md
@@ -85,7 +85,7 @@ Format:
 <div class="small">
 
     AnnData object
-     obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
+     obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
      var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score'
      obsm: 'X_pca'
      obsp: 'knn_distances', 'knn_connectivities'
@@ -112,6 +112,8 @@ Slot description:
 | `obs["disease_ontology_term_id"]`                 | `string`  | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`).                                                                                                                                                                                                                              |
 | `obs["donor_id"]`                                 | `string`  | (*Optional*) Identifier for the donor from whom the cell sample is obtained.                                                                                                                                                                                                                                                                                                                                                                                                                  |
 | `obs["is_primary_data"]`                          | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data.                                                                                                                                                                                                                                                                                                                                          |
+| `obs["organism"]`                                 | `string`  | (*Optional*) Organism from which the cell sample is obtained.                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| `obs["organism_ontology_term_id"]`                | `string`  | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`.                                                                                                                                                                                                                                                                             |
 | `obs["self_reported_ethnicity"]`                  | `string`  | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits.                                                                                                                                                                                                                                                                                                                                                      |
 | `obs["self_reported_ethnicity_ontology_term_id"]` | `string`  | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used.                                                                                                                                                                                                                    |
 | `obs["sex"]`                                      | `string`  | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions.                                                                                                                                                                                                                                                                                                                                                                 |

diff --git a/src/tasks/predict_modality/README.md b/src/tasks/predict_modality/README.md
@@ -50,7 +50,7 @@ the information about cellular state from one modality to the other.
 
 ``` mermaid
 flowchart LR
-  file_dataset_rna("Raw dataset RNA")
+  file_common_dataset_rna("Raw dataset RNA")
   comp_process_dataset[/"Data processor"/]
   file_train_mod1("Train mod1")
   file_train_mod2("Train mod2")
@@ -61,8 +61,8 @@ flowchart LR
   comp_metric[/"Metric"/]
   file_prediction("Prediction")
   file_score("Score")
-  file_dataset_other_mod("Raw dataset mod2")
-  file_dataset_rna---comp_process_dataset
+  file_common_dataset_other_mod("Raw dataset mod2")
+  file_common_dataset_rna---comp_process_dataset
   comp_process_dataset-->file_train_mod1
   comp_process_dataset-->file_train_mod2
   comp_process_dataset-->file_test_mod1
@@ -79,14 +79,15 @@ flowchart LR
   comp_method-->file_prediction
   comp_metric-->file_score
   file_prediction---comp_metric
-  file_dataset_other_mod---comp_process_dataset
+  file_common_dataset_other_mod---comp_process_dataset
 ```
 
 ## File format: Raw dataset RNA
 
 The RNA modality of the raw dataset.
 
-Example file: `resources_test/common/bmmc_cite_starter/dataset_rna.h5ad`
+Example file:
+`resources_test/common/neurips2021_bmmc_cite/dataset_rna.h5ad`
 
 Description:
 
@@ -100,24 +101,31 @@ Format:
      obs: 'batch', 'size_factors'
      var: 'gene_ids'
      obsm: 'gene_activity'
-     layers: 'counts'
-     uns: 'dataset_id', 'gene_activity_var_names'
+     layers: 'counts', 'normalized'
+     uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'gene_activity_var_names'
 
 </div>
 
 Slot description:
 
 <div class="small">
 
-| Slot                             | Type      | Description                                                        |
-|:---------------------------------|:----------|:-------------------------------------------------------------------|
-| `obs["batch"]`                   | `string`  | Batch information.                                                 |
-| `obs["size_factors"]`            | `double`  | (*Optional*) The size factors of the cells prior to normalization. |
-| `var["gene_ids"]`                | `string`  | (*Optional*) The gene identifiers (if available).                  |
-| `obsm["gene_activity"]`          | `double`  | (*Optional*) ATAC gene activity.                                   |
-| `layers["counts"]`               | `integer` | Raw counts.                                                        |
-| `uns["dataset_id"]`              | `string`  | A unique identifier for the dataset.                               |
-| `uns["gene_activity_var_names"]` | `string`  | (*Optional*) Names of the gene activity matrix.                    |
+| Slot                             | Type      | Description                                                                    |
+|:---------------------------------|:----------|:-------------------------------------------------------------------------------|
+| `obs["batch"]`                   | `string`  | Batch information.                                                             |
+| `obs["size_factors"]`            | `double`  | (*Optional*) The size factors of the cells prior to normalization.             |
+| `var["gene_ids"]`                | `string`  | (*Optional*) The gene identifiers (if available).                              |
+| `obsm["gene_activity"]`          | `double`  | (*Optional*) ATAC gene activity.                                               |
+| `layers["counts"]`               | `integer` | Raw counts.                                                                    |
+| `layers["normalized"]`           | `double`  | Normalized expression values.                                                  |
+| `uns["dataset_id"]`              | `string`  | A unique identifier for the dataset.                                           |
+| `uns["dataset_name"]`            | `string`  | Nicely formatted name.                                                         |
+| `uns["dataset_url"]`             | `string`  | (*Optional*) Link to the original source of the dataset.                       |
+| `uns["dataset_reference"]`       | `string`  | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]`         | `string`  | Short description of the dataset.                                              |
+| `uns["dataset_description"]`     | `string`  | Long description of the dataset.                                               |
+| `uns["dataset_organism"]`        | `string`  | (*Optional*) The organism of the sample in the dataset.                        |
+| `uns["gene_activity_var_names"]` | `string`  | (*Optional*) Names of the gene activity matrix.                                |
 
 </div>
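
Review note: the updated slot table (the `+` rows above) separates required slots from those marked (*Optional*). The sketch below shows how a file could be checked against it. The slot names and optional flags are copied from the table, while `RAW_DATASET_SLOTS` and `missing_required_slots` are hypothetical names introduced here. Plain Python collections stand in for the AnnData object so the example has no dependencies.

```python
# Slots copied from the updated "Raw dataset RNA" table; True marks a required slot,
# False marks a slot the table labels (*Optional*).
RAW_DATASET_SLOTS = {
    "obs": {"batch": True, "size_factors": False},
    "var": {"gene_ids": False},
    "obsm": {"gene_activity": False},
    "layers": {"counts": True, "normalized": True},
    "uns": {
        "dataset_id": True,
        "dataset_name": True,
        "dataset_url": False,
        "dataset_reference": False,
        "dataset_summary": True,
        "dataset_description": True,
        "dataset_organism": False,
        "gene_activity_var_names": False,
    },
}


def missing_required_slots(present):
    """List required slots absent from `present`.

    `present` maps a struct name ("obs", "layers", ...) to the keys found in it.
    """
    missing = []
    for struct, slots in RAW_DATASET_SLOTS.items():
        found = set(present.get(struct, ()))
        for name, required in slots.items():
            if required and name not in found:
                missing.append(f'{struct}["{name}"]')
    return missing


if __name__ == "__main__":
    # Hypothetical file that predates this commit: it has no "normalized" layer.
    old_style = {
        "obs": ["batch", "size_factors"],
        "layers": ["counts"],
        "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
    }
    print(missing_required_slots(old_style))
```

For a real file, `present` could be built from `adata.obs.columns`, `adata.var.columns`, and the keys of `adata.obsm`, `adata.layers`, and `adata.uns`. That wiring is not exercised here.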
 
@@ -149,7 +157,7 @@ Arguments:
 The mod1 expression values of the train cells.
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/train_mod1.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/train_mod1.h5ad`
 
 Description:
 
@@ -191,7 +199,7 @@ Slot description:
 The mod2 expression values of the train cells.
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/train_mod2.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/train_mod2.h5ad`
 
 Description:
 
@@ -233,7 +241,7 @@ Slot description:
 The mod1 expression values of the test cells.
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/test_mod1.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/test_mod1.h5ad`
 
 Description:
 
@@ -280,7 +288,7 @@ Slot description:
 The mod2 expression values of the test cells.
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/test_mod2.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/test_mod2.h5ad`
 
 Description:
 
@@ -387,7 +395,7 @@ Arguments:
 A prediction of the mod2 expression values of the test cells
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/prediction.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/prediction.h5ad`
 
 Description:
 
@@ -420,7 +428,7 @@ Slot description:
 Metric score file
 
 Example file:
-`resources_test/predict_modality/bmmc_cite_starter/score.h5ad`
+`resources_test/predict_modality/neurips2021_bmmc_cite/score.h5ad`
 
 Description:
 
@@ -453,7 +461,8 @@ Slot description:
 The second modality of the raw dataset. Must be an ADT or an ATAC
 dataset
 
-Example file: `resources_test/common/bmmc_cite_starter/dataset_adt.h5ad`
+Example file:
+`resources_test/common/neurips2021_bmmc_cite/dataset_other_mod.h5ad`
 
 Description:
 
@@ -467,7 +476,7 @@ Format:
      obs: 'batch', 'size_factors'
      var: 'gene_ids'
      obsm: 'gene_activity'
-     layers: 'counts'
+     layers: 'counts', 'normalized'
      uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'gene_activity_var_names'
 
 </div>
@@ -483,6 +492,7 @@ Slot description:
 | `var["gene_ids"]`                | `string`  | (*Optional*) The gene identifiers (if available).                              |
 | `obsm["gene_activity"]`          | `double`  | (*Optional*) ATAC gene activity.                                               |
 | `layers["counts"]`               | `integer` | Raw counts.                                                                    |
+| `layers["normalized"]`           | `double`  | Normalized expression values.                                                  |
 | `uns["dataset_id"]`              | `string`  | A unique identifier for the dataset.                                           |
 | `uns["dataset_name"]`            | `string`  | Nicely formatted name.                                                         |
 | `uns["dataset_url"]`             | `string`  | (*Optional*) Link to the original source of the dataset.                       |