No common gene when running Tangram in Sopa-CLI model #174

Open
KunHHE opened this issue Dec 31, 2024 · 5 comments

KunHHE commented Dec 31, 2024

Hi @quentinblampey, I ran sopa in CLI mode and want to use Tangram directly through sopa on my MERFISH data, but it fails with an error: no common gene found between the .zarr and the .h5ad reference. It looks like the .h5ad reference exposes Ensembl gene IDs as var_names and keeps the gene names in a separate column, so the two datasets cannot be matched.
In Jupyter, I ran:
gene_name_mapping = adata_sc.var['feature_name']
adata_sc.var_names = gene_name_mapping

After that, the overlapping genes were found for training.

Is there a way to handle this when running from the CLI?

Thanks!
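
Before changing the reference, it can help to confirm that the mismatch really comes from Ensembl IDs. The following is a minimal sketch, not part of sopa or Tangram: the function name `diagnose_gene_overlap` and the example identifiers are hypothetical, and the regular expression only assumes the usual Ensembl gene ID layout (`ENS`, an optional species code, `G`, then 11 digits).

```python
import re

# Matches Ensembl gene IDs such as ENSG00000139618 or ENSMUSG00000017167,
# with an optional version suffix (e.g. ".12").
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def diagnose_gene_overlap(spatial_genes, reference_genes):
    """Count shared gene identifiers and report how many reference
    identifiers look like Ensembl gene IDs (hypothetical helper)."""
    spatial = {str(g) for g in spatial_genes}
    reference = {str(g) for g in reference_genes}
    shared = spatial & reference
    n_ensembl = sum(1 for g in reference if ENSEMBL_GENE_ID.match(g))
    fraction = n_ensembl / len(reference) if reference else 0.0
    return {"n_shared": len(shared), "reference_ensembl_fraction": fraction}


# Illustrative identifiers only; in practice pass the var_names of the
# spatial table and of the single-cell reference.
report = diagnose_gene_overlap(
    spatial_genes=["Gad1", "Slc17a7", "Pvalb"],
    reference_genes=["ENSMUSG00000070880", "ENSMUSG00000070570"],
)
print(report)
```

A report with zero shared genes and a reference fraction close to 1 suggests that the reference var_names need to be replaced by gene names.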

quentinblampey (Collaborator) commented

Hi @KunHHE, indeed adata_sc.var_names should contain the gene names.
Could you update your reference (as you did), and update your .h5ad file?

NB: I believe it's easier to update the reference than to add an argument to the CLI.

KunHHE commented Jan 2, 2025

Thanks very much, @quentinblampey! So you mean I should update reference.h5ad with this code: gene_name_mapping = adata_sc.var['feature_name']; adata_sc.var_names = gene_name_mapping. Then save it and reuse it in sopa for Tangram?

quentinblampey (Collaborator) commented

Yes @KunHHE, exactly! Let me know if this works
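
The update discussed above can be sketched as a small script. This is a sketch under assumptions rather than an official sopa recipe: the helper names `symbols_as_index` and `update_reference` are hypothetical, the column name `feature_name` is the one used in this thread, and duplicate gene names are made unique with a simple numeric suffix so that the resulting index is safe to use as var_names.

```python
import pandas as pd


def symbols_as_index(var: pd.DataFrame, column: str = "feature_name") -> pd.Index:
    """Build a unique, string-typed index from a gene-name column."""
    seen = {}
    unique = []
    for symbol in var[column].astype(str):
        n = seen.get(symbol, 0)
        # First occurrence keeps its name; repeats get "-1", "-2", ...
        unique.append(symbol if n == 0 else f"{symbol}-{n}")
        seen[symbol] = n + 1
    return pd.Index(unique)


def update_reference(in_path: str, out_path: str, column: str = "feature_name") -> None:
    """Replace var_names by gene names and write a new .h5ad file."""
    import anndata as ad  # imported lazily; only needed for file I/O

    adata_sc = ad.read_h5ad(in_path)
    adata_sc.var_names = symbols_as_index(adata_sc.var, column)
    adata_sc.write_h5ad(out_path)


# Small in-memory demonstration with made-up identifiers
var = pd.DataFrame(
    {"feature_name": ["Gad1", "Slc17a7", "Gad1"]},
    index=["id_1", "id_2", "id_3"],
)
print(list(symbols_as_index(var)))
```

The written file can then be passed to `--sc-reference-path`, as in the command shown later in this thread.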

KunHHE commented Jan 4, 2025

Hi @quentinblampey, I tested it and it works. But I have a question: according to the introduction on the Tangram GitHub, is the uniform mode recommended for single-cell-resolution technologies such as MERSCOPE, Xenium, and Visium HD, while the rna_count_based density_prior is recommended for non-single-cell technologies? Is it not possible to switch to 'uniform' in the sopa CLI?
Also, when Tangram finishes, it logs INFO:root:spatial prediction dataframe is saved in obsm tangram_ct_pred of the spatial AnnData. I checked the zarr folder (Table) and only see a folder named 'tangram_pred', not 'tangram_ct_pred'. It contains sub-folders for the cell types, so it looks like something is wrong. Any idea?

And may I ask what the next coding steps are once the AnnData is read and opened, to project the cell types onto either the Leiden clusters or the spatial coordinates? This would be different from 'tutorial_tangram_with_squidpy.ipynb'.

For example, should I normalize the probability values and then assign the cell types?

import numpy as np

# Prediction matrix: one row per cell (voxel), one column per cell type
probabilities = np.array(comb_adata.obsm['tangram_pred'], dtype=float)
n_voxels, n_cell_types = probabilities.shape

predicted_cell_types = [XXXX cell types]
assert len(predicted_cell_types) == n_cell_types, f"Mismatch: {len(predicted_cell_types)} vs {n_cell_types}"

# Sample one cell type per voxel from its row-normalized scores
sampled_cell_types = []
for voxel_idx in range(n_voxels):
    voxel_probabilities = probabilities[voxel_idx, :]
    voxel_probabilities = voxel_probabilities / np.sum(voxel_probabilities)
    sampled_cell_type_idx = np.random.choice(n_cell_types, p=voxel_probabilities)
    sampled_cell_types.append(predicted_cell_types[sampled_cell_type_idx])
comb_adata.obs['sampled_cell_type'] = sampled_cell_types

Thanks!!!

(sopa) C:\Users\hekun>sopa annotate tangram C:/Users/hekun/Downloads/S3R1.zarr --sc-reference-path C:/Users/hekun/Downloads/M1_modified.h5ad --cell-type-key cell_type
C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\dask\dataframe\__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option dataframe.query-planning to True or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
C:\Users\hekun\miniconda3\envs\sopa\lib\site-packages\anndata\utils.py:429: FutureWarning: Importing read_text from anndata is deprecated. Import anndata.io.read_text instead.
  warnings.warn(msg, FutureWarning)
[INFO] (sopa.annotation.tangram.run) Using device: cpu
[INFO] (sopa.annotation.tangram.run) Running on level 0
[INFO] (sopa.annotation.tangram.run) Subsampling reference to 10000 cells...
[INFO] (sopa.annotation.tangram.run) (n_obs_spatial=22373, n_obs_ref=10000)
[INFO] (sopa.annotation.tangram.run) --- Split 1 / 3 ---
[INFO] (sopa.annotation.tangram.run) Using raw counts for the spatial adata object
[INFO] (sopa.annotation.tangram.run) Genes with zero counts: 0 spatial, 3312 ref
[INFO] (sopa.annotation.tangram.run) Keeping 404 shared genes
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 404 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.
Score: 0.189, KL reg: 0.322
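
For readers comparing the two density priors named in the question and in the log above, here is a minimal sketch of what they amount to numerically. It reflects my understanding of Tangram's preprocessing and is an assumption, not something confirmed in this thread: the uniform prior gives every voxel the same weight, while the rna_count_based prior weights each voxel by its share of the total counts. The function name `density_priors` is hypothetical.

```python
import numpy as np


def density_priors(counts):
    """Return (uniform, rna_count_based) priors for a voxels x genes
    count matrix. Both vectors sum to 1 over the voxels."""
    counts = np.asarray(counts, dtype=float)
    n_voxels = counts.shape[0]

    # Uniform: every voxel is expected to hold the same amount of cells
    uniform = np.full(n_voxels, 1.0 / n_voxels)

    # Count-based: voxels with more RNA are expected to hold more cells
    per_voxel = counts.sum(axis=1)
    rna_count_based = per_voxel / per_voxel.sum()
    return uniform, rna_count_based


# Toy matrix with 3 voxels and 2 genes
uniform, count_based = density_priors([[1, 1], [2, 2], [0, 4]])
print(uniform, count_based)
```

With segmented single cells, each row is one cell, which is the usual argument for the uniform prior with single-cell-resolution data.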

quentinblampey (Collaborator) commented

I'm not sure I understand your question. We do use a uniform density.
Regarding the predictions, Tangram saves them under tangram_ct_pred, but we then move them to tangram_pred, so you're looking at the right thing.
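
As a possible next step for turning the scores in tangram_pred into one label per cell, the sketch below takes the highest-scoring cell type per row instead of sampling. It assumes the predictions form a table with one row per cell and one column per cell type, which is an assumption based on the folder layout described above. The function name `assign_cell_types`, the `min_confidence` parameter, and the "unassigned" label are hypothetical.

```python
import numpy as np
import pandas as pd


def assign_cell_types(pred: pd.DataFrame, min_confidence: float = 0.0) -> pd.DataFrame:
    """Pick the best-scoring cell type per cell from a cells x cell-types table."""
    values = pred.to_numpy(dtype=float)
    row_sums = values.sum(axis=1, keepdims=True)

    # Row-normalize so each cell's scores sum to 1; all-zero rows stay at 0
    probs = np.divide(values, row_sums, out=np.zeros_like(values), where=row_sums > 0)

    best = probs.argmax(axis=1)
    confidence = probs[np.arange(len(probs)), best]
    labels = pred.columns.to_numpy()[best].astype(object)

    # Cells without signal, or below the threshold, are left unassigned
    labels[(confidence < min_confidence) | (row_sums[:, 0] <= 0)] = "unassigned"
    return pd.DataFrame({"cell_type": labels, "confidence": confidence}, index=pred.index)


# Toy scores for three cells and three cell types
pred = pd.DataFrame(
    [[0.2, 0.6, 0.2], [3.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    columns=["A", "B", "C"],
    index=["cell_1", "cell_2", "cell_3"],
)
print(assign_cell_types(pred))
```

The resulting `cell_type` column could then be stored in `.obs` of the AnnData table and plotted against the Leiden clusters or the spatial coordinates. Taking the maximum is deterministic, unlike the sampling approach in the snippet earlier in this thread.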
