No common gene when running Tangram in Sopa-CLI model #174

KunHHE · 2024-12-31T02:53:51Z

Hi @quentinblampey, I ran sopa in CLI mode and want to use Tangram directly through sopa on my MERFISH data. But it fails with an error: no common gene found between the .zarr and the .h5ad reference. It looks like the .h5ad reference uses Ensembl gene IDs as var_names instead of gene names during the cell type training, which is why the two datasets cannot be matched.
I think so because, when I run this in Jupyter:
gene_name_mapping = adata_sc.var['feature_name']
adata_sc.var_names = gene_name_mapping

then the overlapping genes show up for training.
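
A quick way to confirm this kind of mismatch before launching the CLI is to compare the two sets of variable names directly. The sketch below is only an illustration: `shared_genes` is a hypothetical helper (not part of sopa or Tangram), and the identifiers are made-up placeholders.

```python
def shared_genes(spatial_names, reference_names):
    """Return the sorted identifiers that appear in both collections."""
    return sorted(set(map(str, spatial_names)) & set(map(str, reference_names)))


# Placeholder identifiers: the spatial panel uses gene symbols, while the
# reference is indexed either by Ensembl-style IDs or by symbols.
spatial = ["Gad1", "Slc17a7", "Pvalb"]
reference_ids = ["ENSMUSG_placeholder_1", "ENSMUSG_placeholder_2"]
reference_symbols = ["Gad1", "Slc17a7", "Sst"]

print(shared_genes(spatial, reference_ids))      # [] : nothing to train on
print(shared_genes(spatial, reference_symbols))  # ['Gad1', 'Slc17a7']
```

With real data, the two arguments would be something like `adata_sp.var_names` and `adata_sc.var_names` (the first name is an assumption; only `adata_sc` appears in this thread).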

Is there any way to handle this when running from the CLI?

Thanks!

quentinblampey · 2025-01-02T13:38:59Z

Hi @KunHHE, indeed adata_sc.var_names should contain the gene names.
Could you update your reference (as you did), and update your .h5ad file?

NB: I believe it's easier to update the reference rather than adding an argument to the CLI

KunHHE · 2025-01-02T14:50:55Z

Thanks very much, @quentinblampey! So you mean I should update reference.h5ad using this code: gene_name_mapping = adata_sc.var['feature_name']; adata_sc.var_names = gene_name_mapping, then save it and reuse it in sopa for Tangram?

quentinblampey · 2025-01-03T18:20:04Z

Yes @KunHHE, exactly! Let me know if this works

KunHHE · 2025-01-04T20:03:58Z

Hi @quentinblampey, I tested it and it works. But I have a question: based on the introduction on the Tangram GitHub, is the uniform mode recommended for single-cell-resolution technologies such as MERSCOPE, Xenium, and Visium HD, while the rna_count_based density_prior is recommended for non-single-cell technologies? In the sopa CLI, is it not possible to switch to 'uniform'?
Also, once Tangram is done, it automatically logs INFO:root:spatial prediction dataframe is saved in obsm tangram_ct_pred of the spatial AnnData. I checked the Table in the zarr folder and only see a folder named 'tangram_pred', not 'tangram_ct_pred'. It contains cell type sub-folders, so it looks like something is wrong. Any idea?

And can I ask what the next coding steps are, once the AnnData is read and opened, to project the cell types onto either the Leiden clusters or the spatial coordinates? This would be different from 'tutorial_tangram_with_squidpy.ipynb'.

For example, should I normalize the probability values and then project them to cell types?

import numpy as np

# np.array copies the data, so the in-place normalization below leaves obsm untouched
probabilities = np.array(comb_adata.obsm['tangram_pred'], dtype=float)
n_voxels = probabilities.shape[0]
n_cell_types = probabilities.shape[1]

predicted_cell_types = [XXXX cell types]
assert len(predicted_cell_types) == n_cell_types, f"Mismatch: {len(predicted_cell_types)} vs {n_cell_types}"
sampled_cell_types = []
for voxel_idx in range(n_voxels):
    # Normalize the row so that it sums to 1, then draw one cell type at random
    voxel_probabilities = probabilities[voxel_idx, :]
    voxel_probabilities /= np.sum(voxel_probabilities)
    sampled_cell_type_idx = np.random.choice(n_cell_types, p=voxel_probabilities)
    sampled_cell_type = predicted_cell_types[sampled_cell_type_idx]
    sampled_cell_types.append(sampled_cell_type)
comb_adata.obs['sampled_cell_type'] = sampled_cell_types

Thanks!!!

(sopa) C:\Users\hekun>sopa annotate tangram C:/Users/hekun/Downloads/S3R1.zarr --sc-reference-path C:/Users/hekun/Downloads/M1_modified.h5ad --cell-type-key cell_type
C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\dask\dataframe\__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option dataframe.query-planning to True or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\anndata\utils.py:429: FutureWarning: Importing read_text from anndata is deprecated. Import anndata.io.read_text instead.
  warnings.warn(msg, FutureWarning)
[INFO] (sopa.annotation.tangram.run) Using device: cpu
[INFO] (sopa.annotation.tangram.run) Running on level 0
[INFO] (sopa.annotation.tangram.run) Subsampling reference to 10000 cells...
[INFO] (sopa.annotation.tangram.run) (n_obs_spatial=22373, n_obs_ref=10000)
[INFO] (sopa.annotation.tangram.run) --- Split 1 / 3 ---
[INFO] (sopa.annotation.tangram.run) Using raw counts for the spatial adata object
[INFO] (sopa.annotation.tangram.run) Genes with zero counts: 0 spatial, 3312 ref
[INFO] (sopa.annotation.tangram.run) Keeping 404 shared genes
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 404 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.
Score: 0.189, KL reg: 0.322
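
For readers unfamiliar with the two density priors named in this post, the sketch below illustrates how they differ, as I understand their definitions: a uniform prior gives every spatial observation the same weight, while an RNA-count-based prior weights each observation by its share of the total counts. This is an illustration of the definitions on toy data, not code taken from Tangram or sopa, and `density_priors` is a hypothetical helper.

```python
import numpy as np


def density_priors(counts):
    """Return (uniform, rna_count_based) priors for an observations-by-genes matrix."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]
    # Uniform: every observation gets the same weight, 1 / n_obs.
    uniform = np.full(n_obs, 1.0 / n_obs)
    # RNA-count-based: weight proportional to the total counts per observation.
    per_obs = counts.sum(axis=1)
    rna_count_based = per_obs / per_obs.sum()
    return uniform, rna_count_based


# Toy matrix: 3 observations with 10, 2, and 8 total counts.
toy_counts = [[5, 0, 5], [1, 1, 0], [4, 4, 0]]
uniform, rna_count_based = density_priors(toy_counts)
print(uniform)
print(rna_count_based)
```

Both vectors sum to 1; they only differ in how the mass is spread across observations.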

quentinblampey · 2025-01-07T17:50:33Z

I'm not sure I understand your question. We use a uniform density, indeed.
Regarding the predictions, Tangram saves them under tangram_ct_pred, but we then move them under tangram_pred, so you're looking at the right thing.
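
Regarding the earlier question about turning these predictions into one label per cell, a deterministic alternative to random sampling is to take the highest-scoring cell type in each row. The sketch below assumes the predictions are available as a pandas DataFrame with one column per cell type (an assumption based on the per-cell-type sub-folders mentioned above); `assign_cell_types` and the output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def assign_cell_types(pred):
    """Return the top-scoring cell type and its normalized score for each row."""
    probs = pred.to_numpy(dtype=float)
    row_sums = probs.sum(axis=1, keepdims=True)
    # Normalize each row to sum to 1; all-zero rows are left as zeros.
    probs = probs / np.where(row_sums > 0, row_sums, 1.0)
    best = probs.argmax(axis=1)
    return pd.DataFrame(
        {
            "cell_type": pred.columns.to_numpy()[best],
            "score": probs[np.arange(len(best)), best],
        },
        index=pred.index,
    )


# Toy predictions for two cells and three made-up cell types.
toy_pred = pd.DataFrame(
    [[0.2, 0.6, 0.2], [3.0, 1.0, 0.0]],
    columns=["A", "B", "C"],
    index=["cell_1", "cell_2"],
)
print(assign_cell_types(toy_pred))
```

Normalization does not change which column wins; it only makes the scores comparable across cells. Note that an all-zero row would default to the first column, so such rows may deserve a separate check.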

quentinblampey mentioned this issue Jan 2, 2025

data reproduce on annotation for sopa paper #158

Closed
