Include R #6

Closed
wants to merge 17 commits into from
19 changes: 17 additions & 2 deletions .github/workflows/publish.yml
@@ -15,7 +15,7 @@ jobs:
contents: write
steps:
- name: Check out repository
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Set up Quarto
uses: quarto-dev/quarto-actions/setup@v2
@@ -27,9 +27,24 @@ jobs:
with:
python-version: '3.9'
cache: 'pip'
- run: pip install jupyter
- run: pip install -r requirements.txt

- name: Install R
uses: r-lib/actions/setup-r@v2
with:
r-version: 'devel'
- uses: r-lib/actions/setup-r-dependencies@v2
with:
cache-version: 2
packages: |
any::BiocManager
any::alabaster
any::scRNAseq
any::rmarkdown
any::knitr
any::reticulate
# - run: Rscript r_requirements.r

- name: Render
uses: quarto-dev/quarto-actions/render@v2
with:
1 change: 1 addition & 0 deletions _quarto.yml
@@ -43,6 +43,7 @@ book:
- chapters/experiments/extending_se.qmd
- chapters/experiments/multiassay_expt.qmd
- chapters/interop.qmd
- chapters/workflow.qmd
- part: chapters/extras/index.qmd
chapters:
- chapters/extras/iranges.qmd
2 changes: 1 addition & 1 deletion chapters/interop.qmd
@@ -1,6 +1,6 @@
# Interop with R

The [rds2py](https://github.com/BiocPy/rds2py) package serves as a Python interface to the [rds2cpp](https://github.com/LTLA/rds2cpp) library, enabling direct reading of RDS files within Python. This eliminates the need for additional data conversion tools or intermediate formats, streamlining the transition between Python and R for seamless analysis.
The [rds2py](https://github.com/BiocPy/rds2py) package provides Python bindings to the [rds2cpp](https://github.com/LTLA/rds2cpp) library, enabling direct reading of RDS files within Python. This eliminates the need for additional data conversion tools or intermediate formats, streamlining the transition between Python and R for seamless analysis.

One notable feature is the use of memory views (excluding strings) to access the same memory from C++ in Python, facilitated through Cython. This approach is particularly advantageous for handling large datasets, as it avoids unnecessary duplication of data.
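The zero-copy idea can be sketched with Python's built-in `memoryview`. This is only an illustration using the standard library, with an `array` standing in for a buffer owned by C++; it does not use or reflect the rds2py API.

```python
from array import array

# A typed buffer standing in for data owned by a C++ library (assumption for illustration).
values = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the same bytes without copying them.
view = memoryview(values)
window = view[1:3]  # slicing a memoryview is also zero-copy

# A write through the view is visible in the original buffer...
window[0] = 20.0
shared = values[1] == 20.0

# ...whereas a list built from the buffer is an independent copy.
copied = list(values)
copied[0] = -1.0
independent = values[0] == 1.0

print(shared, independent)
```

Because the view and the buffer share memory, no second copy of a large dataset is ever allocated; only the explicit `list(...)` conversion duplicates the data.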

95 changes: 95 additions & 0 deletions chapters/workflow.qmd
@@ -0,0 +1,95 @@
---
engine: knitr
---

# Interchange data between Python and R

This section illustrates a workflow that stores genomic data in language-agnostic representations, so that datasets and analysis results can be accessed from multiple programming frameworks such as R and Python. The [ArtifactDB](https://github.com/artifactdb) framework supports this functionality.

To begin, we will download the "zilionis lung" dataset from the [scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html) package. Subsequently, we will store this dataset in a language-agnostic format using the [alabaster suite](https://github.com/ArtifactDB/alabaster.base) of R packages.

```{r}
library(scRNAseq)
library(alabaster)

sce <- ZilionisLungData()
saveObject(sce, path=paste(getwd(), "zilinoislung", sep="/"))
```

:::{.callout-note}
Additionally, you can save this dataset as an RDS object for access in Python. Refer to the [interop with R](./interop.qmd) section for more details.
:::

We can now load this dataset in Python using the [dolomite suite](https://github.com/ArtifactDB/dolomite-base) of Python packages. Both dolomite and alabaster are integral parts of the ArtifactDB ecosystem designed to read artifacts stored in language-agnostic formats.

```{python}
from dolomite_base import read_object

data = read_object("./zilinoislung")
print(data)
```

To demonstrate this workflow, we will employ the [CellTypist](https://github.com/Teichlab/celltypist) model to annotate cell types for this dataset. CellTypist operates on an AnnData representation.

```{python}
adata = data.to_anndata()
```

Before annotation, let's download the "human lung atlas" model from celltypist.

```{python}
import celltypist
from celltypist import models

models.download_models()
model_name = "Human_Lung_Atlas.pkl"
model = models.Model.load(model = model_name)
print(model)
```

Now, let's annotate our dataset.

```{python}
predictions = celltypist.annotate(adata, model = model_name, majority_voting = True)
print(predictions.predicted_labels)
```
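The `majority_voting = True` option refines the per-cell predictions: cells are over-clustered and each cluster takes its most frequent predicted label. A toy sketch of that idea in plain Python follows. The cluster ids, labels, and the `majority_vote` helper are made up for illustration and are not CellTypist's implementation.

```python
from collections import Counter, defaultdict


def majority_vote(cluster_ids, labels):
    """Replace each cell's label with the most common label in its cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        by_cluster[cluster][label] += 1
    # Pick the dominant label for every cluster.
    winners = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    return [winners[c] for c in cluster_ids]


# Hypothetical cluster assignments and per-cell predictions.
clusters = [0, 0, 0, 1, 1, 1, 1]
per_cell = ["T cell", "T cell", "B cell",
            "Macrophage", "Macrophage", "Monocyte", "Macrophage"]

voted = majority_vote(clusters, per_cell)
print(voted)
```

In this sketch the lone "B cell" and "Monocyte" calls are overruled by their clusters, which is the smoothing effect majority voting is meant to provide.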

:::{.callout-note}
The celltypist workflow is based on the tutorial described [here](https://colab.research.google.com/github/Teichlab/celltypist/blob/main/docs/notebook/celltypist_tutorial.ipynb#scrollTo=postal-chicken).
:::

Next, let's retrieve the `AnnData` object with the predicted labels embedded into the `obs` dataframe.

```{python}
adata = predictions.to_adata()
```

We can now reverse the workflow and save this object into an ArtifactDB format from Python. However, the object needs to be converted to a `SingleCellExperiment` object first. Read more about our experiment representations [here](./experiments/singlecell_expt.qmd).

```{python}
from singlecellexperiment import SingleCellExperiment

sce = SingleCellExperiment.from_anndata(adata)
print(sce)
```

We use the dolomite package to save it into a language-agnostic format.
```{python}
import dolomite_base

dolomite_base.save_object(sce, "./zilinoislung_with_celltypist")
```

Finally, read the object back in R.
```{r}
sce_with_celltypist = readObject(path=paste(getwd(), "zilinoislung_with_celltypist", sep="/"))
sce_with_celltypist
```

And that concludes the workflow. Leveraging the generic **read** functions `readObject` (R) and `read_object` (Python), along with the **save** functions `saveObject` (R) and `save_object` (Python), you can seamlessly store most Bioconductor objects in language-agnostic formats.
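The pattern behind a generic save/read pair can be sketched with a toy example: the save function dispatches on the object's class and records a type tag on disk, and the read function uses that tag to choose a reader. Everything below (`toy_save`, `toy_read`, the `TYPE.json` and `contents.json` file names) is hypothetical and only illustrates the idea; it is not the alabaster or dolomite on-disk layout or API.

```python
import json
import tempfile
from functools import singledispatch
from pathlib import Path


@singledispatch
def toy_save(x, path):
    """Generic save: concrete classes register their own method."""
    raise NotImplementedError(f"no save method for {type(x).__name__}")


@toy_save.register
def _(x: dict, path):
    path = Path(path)
    path.mkdir(parents=True)
    # Record a type tag so any reader, in any language, knows what is stored.
    (path / "TYPE.json").write_text(json.dumps({"type": "toy_mapping"}))
    (path / "contents.json").write_text(json.dumps(x))


def _read_mapping(path):
    return json.loads((Path(path) / "contents.json").read_text())


# Hypothetical registry mapping type tags to reader functions.
READERS = {"toy_mapping": _read_mapping}


def toy_read(path):
    """Generic read: look up the type tag, then call the matching reader."""
    kind = json.loads((Path(path) / "TYPE.json").read_text())["type"]
    return READERS[kind](path)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example"
    toy_save({"cell_type": ["T cell", "B cell"]}, target)
    restored = toy_read(target)

print(restored)
```

Since the stored files are plain JSON with an explicit type tag, a reader written in another language needs only the same tag-to-reader lookup to reconstruct the object.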

----

## Further reading

- ArtifactDB GitHub organization - https://github.com/ArtifactDB.
6 changes: 6 additions & 0 deletions r_requirements.r
@@ -0,0 +1,6 @@
install.packages(c("BiocManager", "devtools"), repos='http://cran.us.r-project.org')
BiocManager::install(version = "3.18", ask=FALSE)

# install alabaster
BiocManager::install(c("alabaster", "scRNAseq"))

4 changes: 3 additions & 1 deletion requirements.txt
@@ -21,4 +21,6 @@ mudata
delayedarray[dask]
joblib
dolomite
hdf5array
hdf5array
celltypist
rpy2