Make obs and var optional, make obs_names and var_names required #76

Closed
rcannood opened this issue May 10, 2023 · 5 comments
Labels
design Discussion about how things should be designed
@rcannood
Collaborator

At the moment, we require that obs and var are passed to the InMemoryAnnData and HDF5AnnData, while other slots are optional.

However, looking at what happens when I create an empty AnnData in Python, it's actually the obs_names and the var_names that are required, and not obs and var -- I guess.

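# Python: create an empty AnnData and write it to disk
import anndata as ad
adata = ad.AnnData(shape=[10, 20])
adata.write_h5ad("empty.h5ad")

# R: list the contents of the resulting file
file <- rhdf5::H5Fopen("empty.h5ad")
rhdf5::h5ls(file)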
  group   name       otype dclass dim
0     / layers   H5I_GROUP           
1     /    obs   H5I_GROUP           
2  /obs _index H5I_DATASET STRING  10
3     /   obsm   H5I_GROUP           
4     /   obsp   H5I_GROUP           
5     /    uns   H5I_GROUP           
6     /    var   H5I_GROUP           
7  /var _index H5I_DATASET STRING  20
8     /   varm   H5I_GROUP           
9     /   varp   H5I_GROUP

As such, I propose changing InMemoryAnnData and HDF5AnnData to require that obs_names and var_names be passed, rather than obs and var.
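
A minimal Python sketch of the proposed contract, for illustration only: the names are required and determine the shape, while obs and var default to tables with no columns. `SketchAnnData` and `_check_table` are hypothetical names, not part of anndataR or of Python anndata, and plain dicts stand in for data frames.

```python
from typing import Optional, Sequence, Union

Name = Union[str, int]


class SketchAnnData:
    """Hypothetical container; not the anndataR or anndata API."""

    def __init__(
        self,
        obs_names: Sequence[Name],
        var_names: Sequence[Name],
        obs: Optional[dict] = None,
        var: Optional[dict] = None,
    ):
        # The names are required and define the shape on their own.
        self.obs_names = list(obs_names)
        self.var_names = list(var_names)
        # obs and var are optional; a missing table becomes one with no columns.
        self.obs = self._check_table(obs, len(self.obs_names), "obs")
        self.var = self._check_table(var, len(self.var_names), "var")

    @staticmethod
    def _check_table(table, n_rows, label):
        table = {} if table is None else dict(table)
        # Every column must have one value per name.
        for column, values in table.items():
            if len(values) != n_rows:
                raise ValueError(
                    f"{label}[{column!r}] has {len(values)} values, expected {n_rows}"
                )
        return table

    @property
    def shape(self):
        return (len(self.obs_names), len(self.var_names))


adata = SketchAnnData(
    obs_names=[str(i) for i in range(10)],
    var_names=[str(i) for i in range(20)],
)
print(adata.shape)  # (10, 20)
print(adata.obs)    # {}
```

This mirrors the file listing above, where the obs and var groups contain only an _index dataset of length 10 and 20 respectively.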

@rcannood rcannood added the design Discussion about how things should be designed label May 10, 2023
@lazappi
Collaborator

lazappi commented May 10, 2023

I think this makes sense, but is it much different from just doing obs = data.frame(row.names = obs_names)?

What does Python anndata put in obs_names/var_names when you just set the shape?

@rcannood
Collaborator Author

I agree this is very practical, but given #73 we can't add the obs_names to the obs as rownames or to the X as rownames because the obs_names might be integers instead of characters.
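
The concern can be illustrated with a small Python sketch. It assumes that row labels are always stored as character strings (as R matrix dimnames are); `as_row_labels` is a hypothetical helper, not a function from either library.

```python
def as_row_labels(names):
    # Assumption: row labels can only be stored as strings.
    return [str(name) for name in names]


obs_names = [101, 102, 103]          # integer names, as discussed in #73
labels = as_row_labels(obs_names)

print(labels)               # ['101', '102', '103']
print(labels == obs_names)  # False: the integer type is lost
```

Keeping obs_names as its own field avoids this coercion, because the names never pass through the row labels.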

@rcannood
Collaborator Author

> What does Python anndata put in obs_names/var_names when you just set the shape?

This is what's shown in the original message, no?

@lazappi
Collaborator

lazappi commented May 12, 2023

> > What does Python anndata put in obs_names/var_names when you just set the shape?
>
> This is what's shown in the original message, no?

Not entirely, but I could just check myself rather than asking you silly questions 😸

@rcannood
Collaborator Author

:p

Closed with #81

lazappi added a commit that referenced this issue May 22, 2023
* origin/main: (23 commits)
  initial implementation of from_Seurat (#64)
  only check size when the names are already defined
  todo: add back obs_names and var_names length check
  update roxygen
  move required args to the front
  Update R/HDF5-read.R
  fix lint issues
  update tests
  refactor code to assume obs_names and var_names are defined
  Style and lint
  Add rhdf5 package skipe to HDF5 tests
  Add 1D sparse array to example.h5ad
  Remove requireNamespace("rhdf5")
  Fix bug in read_h5ad_string_array()
  Add support for reading H5AD rec arrays
  Covert 1D string arrays to vectors in read_h5ad_string_array()
  Update tests to use example.h5ad
  wip changes for #76
  Remove Krumsiek augemented dataset
  Reduce example H5AD to 50 cells x 100 genes
  ...
@rcannood rcannood added this to the 1.0.0 milestone Sep 20, 2023