BiocPy · jkanche · Feb 20, 2024
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -15,7 +15,7 @@ jobs:
       contents: write
     steps:
       - name: Check out repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
 
       - name: Set up Quarto
         uses: quarto-dev/quarto-actions/setup@v2
@@ -27,9 +27,24 @@ jobs:
         with:
           python-version: '3.9'
           cache: 'pip'
-      - run: pip install jupyter
       - run: pip install -r requirements.txt
 
+      - name: Install R
+        uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: 'devel'
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          cache-version: 2
+          packages: |
+            any::BiocManager
+            any::alabaster
+            any::scRNAseq
+            any::rmarkdown
+            any::knitr
+            any::reticulate
+      # - run: Rscript r_requirements.r
+
       - name: Render
         uses: quarto-dev/quarto-actions/render@v2
         with:

diff --git a/_quarto.yml b/_quarto.yml
@@ -43,6 +43,7 @@ book:
         - chapters/experiments/extending_se.qmd
         - chapters/experiments/multiassay_expt.qmd
     - chapters/interop.qmd
+    - chapters/workflow.qmd
     - part: chapters/extras/index.qmd
       chapters:
         - chapters/extras/iranges.qmd

diff --git a/chapters/interop.qmd b/chapters/interop.qmd
@@ -1,6 +1,6 @@
 # Interop with R
 
-The [rds2py](https://github.com/BiocPy/rds2py) package serves as a Python interface to the [rds2cpp](https://github.com/LTLA/rds2cpp) library, enabling direct reading of RDS files within Python. This eliminates the need for additional data conversion tools or intermediate formats, streamlining the transition between Python and R for seamless analysis.
+The [rds2py](https://github.com/BiocPy/rds2py) package provides Python bindings to the [rds2cpp](https://github.com/LTLA/rds2cpp) library, enabling direct reading of RDS files within Python. This eliminates the need for additional data conversion tools or intermediate formats, streamlining the transition between Python and R for seamless analysis.
 
 One notable feature is the use of memory views (excluding strings) to access the same memory from C++ in Python, facilitated through Cython. This approach is particularly advantageous for handling large datasets, as it avoids unnecessary duplication of data.
 

diff --git a/chapters/workflow.qmd b/chapters/workflow.qmd
@@ -0,0 +1,95 @@
+---
+engine: knitr
+---
+
+# Interchange data between Python and R
+
+This section illustrates a workflow that stores genomic data in language-agnostic representations, so that datasets and analysis results can be accessed from multiple programming languages such as R and Python. The [ArtifactDB](https://github.com/artifactdb) framework provides this functionality.
+
+To begin, we will download the "zilionis lung" dataset from the [scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html) package. Subsequently, we will store this dataset in a language-agnostic format using the [alabaster suite](https://github.com/ArtifactDB/alabaster.base) of R packages.
+
+```{r}
+library(scRNAseq)
+library(alabaster)
+
+sce <- ZilionisLungData()
+saveObject(sce, path=paste(getwd(), "zilinoislung", sep="/"))
+```
+
+:::{.callout-note}
+Additionally, you can save this dataset as an RDS object for access in Python. Refer to the [interop with R](./interop.qmd) section for more details.
+:::
+
+We can now load this dataset in Python using the [dolomite suite](https://github.com/ArtifactDB/dolomite-base) of Python packages. Both dolomite and alabaster are integral parts of the ArtifactDB ecosystem designed to read artifacts stored in language-agnostic formats.
+
+```{python}
+from dolomite_base import read_object
+
+data = read_object("./zilinoislung")
+print(data)
+```
+
+To demonstrate this workflow, we will employ the [CellTypist](https://github.com/Teichlab/celltypist) model to annotate cell types for this dataset. CellTypist operates on an AnnData representation.
+
+```{python}
+adata = data.to_anndata()
+```
+
+Before annotation, let's download the "human lung atlas" model from celltypist.
+
+```{python}
+import celltypist
+from celltypist import models
+
+models.download_models()
+model_name = "Human_Lung_Atlas.pkl"
+model = models.Model.load(model = model_name)
+print(model)
+```
+
+Now, let's annotate our dataset.
+
+```{python}
+predictions = celltypist.annotate(adata, model = model_name, majority_voting = True)
+print(predictions.predicted_labels)
+```
+
+:::{.callout-note}
+The celltypist workflow is based on the tutorial described [here](https://colab.research.google.com/github/Teichlab/celltypist/blob/main/docs/notebook/celltypist_tutorial.ipynb#scrollTo=postal-chicken).
+:::
+
+Next, let's retrieve the `AnnData` object with the predicted labels embedded into the `obs` dataframe.
+
+```{python}
+adata = predictions.to_adata()
+```
+
+We can now reverse the workflow and save this object in an ArtifactDB format from Python. However, the object must first be converted to a `SingleCellExperiment`. Read more about our experiment representations [here](./experiments/singlecell_expt.qmd).
+
+```{python}
+from singlecellexperiment import SingleCellExperiment
+
+sce = SingleCellExperiment.from_anndata(adata)
+print(sce)
+```
+
+We use the dolomite package to save it into a language-agnostic format.
+```{python}
+import dolomite_base
+
+dolomite_base.save_object(sce, "./zilinoislung_with_celltypist")
+```
+
+Finally, read the object back in R.
+```{r}
+sce_with_celltypist = readObject(path=paste(getwd(), "zilinoislung_with_celltypist", sep="/"))
+sce_with_celltypist
+```
+
+And that concludes the workflow. Leveraging the generic **read** functions `readObject` (R) and `read_object` (Python), along with the **save** functions `saveObject` (R) and `save_object` (Python), you can seamlessly store most Bioconductor objects in language-agnostic formats.
+
+----
+
+## Further reading
+
+- [ArtifactDB GitHub organization](https://github.com/ArtifactDB)
diff --git a/r_requirements.r b/r_requirements.r
@@ -0,0 +1,6 @@
+install.packages(c("BiocManager", "devtools"), repos='http://cran.us.r-project.org')
+BiocManager::install(version = "3.18", ask=FALSE)
+
+# install alabaster
+BiocManager::install(c("alabaster", "scRNAseq"))
+
diff --git a/requirements.txt b/requirements.txt
@@ -21,4 +21,6 @@ mudata
 delayedarray[dask]
 joblib
 dolomite
-hdf5array
+hdf5array
+celltypist
+rpy2