Dropping the geometry column no longer produces a Dask DataFrame object. #321

Open
fbunt opened this issue Dec 11, 2024 · 2 comments · May be fixed by #322

Comments

@fbunt
Contributor

fbunt commented Dec 11, 2024

Dropping the geometry column from a dask-geopandas GeoDataFrame no longer produces a dask DataFrame object; the result is still a dask-geopandas GeoDataFrame. This causes errors in subsequent operations that do not expect a GeoDataFrame, or in which the resulting GeoDataFrame assumes it still has a geometry column. The issue seems to stem from the dask-expr backend.

Example:

>>> import dask_geopandas as dgpd
>>> import geopandas as gpd
>>>
>>> gdf = gpd.GeoSeries.from_xy(x=[1, 0], y=[0, 1], crs=4326).to_frame("geometry")
>>> gdf["data"] = [1, 2]
>>> gdf = dgpd.from_geopandas(gdf, npartitions=1)
>>> df = gdf.drop(columns="geometry")
>>> print(type(df))
<class 'dask_geopandas.expr.GeoDataFrame'>

Example Error:

>>> df.shape
Traceback (most recent call last):
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/geopandas/geodataframe.py", line 517, in crs
    return self.geometry.crs
           ^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/geopandas/geodataframe.py", line 253, in _get_geometry
    raise AttributeError(msg)
AttributeError: You are calling a geospatial method on the GeoDataFrame, but the active geometry column ('geometry') is not present. 
There are no existing columns with geometry data type. You can add a geometry column as the active geometry column with df.set_geometry. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_core.py", line 493, in __getattr__
    return object.__getattribute__(self, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_reductions.py", line 435, in _meta_chunk
    meta = meta_nonempty(self.frame._meta)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask/utils.py", line 772, in __call__
    return meth(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_geopandas/backends.py", line 71, in _nonempty_geodataframe
    return geopandas.GeoDataFrame(df, geometry=x._geometry_column_name, crs=x.crs)
                                                                            ^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/geopandas/geodataframe.py", line 519, in crs
    raise AttributeError(
AttributeError: The CRS attribute of a GeoDataFrame without an active geometry column is not defined. Use GeoDataFrame.set_geometry to set the active geometry column.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fred/homes/rts/work/raster_tools/tmp.py", line 9, in <module>
    df.shape
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_collection.py", line 2706, in shape
    return self.size // max(len(self.columns), 1), len(self.columns)
           ^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_collection.py", line 361, in size
    return new_collection(self.expr.size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_collection.py", line 4835, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_reductions.py", line 440, in _meta
    meta = self._meta_chunk
           ^^^^^^^^^^^^^^^^
  File "/home/fred/homes/rts/anaconda3/envs/rts-dev/lib/python3.12/site-packages/dask_expr/_core.py", line 498, in __getattr__
    raise RuntimeError(
RuntimeError: Failed to generate metadata for (Drop(frame=df, columns='geometry')).size(). This operation may not be supported by the current backend.

Environment:

dask                              2024.12.0     pyhd8ed1ab_1             conda-forge
dask-core                         2024.12.0     pyhd8ed1ab_1             conda-forge
dask-expr                         1.1.20        pyhd8ed1ab_1             conda-forge
dask-geopandas                    0.4.2         pyhd8ed1ab_0             conda-forge
dask-image                        2024.5.3      pyhd8ed1ab_0             conda-forge
geopandas                         1.0.1         pyhd8ed1ab_2             conda-forge
geopandas-base                    1.0.1         pyha770c72_2             conda-forge

@TomAugspurger
Contributor

Thanks for the report. drop ends up as the expression at https://github.com/dask/dask-expr/blob/edb6fd54a5520491b9ebb16c9ed91333d7e02b22/dask_expr/_expr.py#L1825, which calls drop_by_shallow_copy. That does an in-place drop, which can't do the geopandas -> pandas conversion:

In [16]: import geopandas

In [17]: df = geopandas.GeoDataFrame({"geometry": geopandas.points_from_xy([0, 0], [0, 1]), "a": 1})

In [18]: df.drop(columns="geometry", inplace=True)

In [19]: type(df)
Out[19]: geopandas.geodataframe.GeoDataFrame

I'll put up a PR with a fix.
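
For readers following along, the difference between an in-place drop and a drop that returns a new object can be sketched with a toy model. This is not the geopandas or dask-expr implementation: the Frame and GeoFrame classes, their _constructor hook, and the drop signature are all hypothetical stand-ins, written only to show why mutating an existing object cannot change its class, while building a new object can.

```python
class Frame:
    """Toy stand-in for a plain DataFrame (hypothetical, not the real class)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def _constructor(self, columns):
        # Decides the result type for operations that return a new object.
        return type(self)(columns)

    def drop(self, column, inplace=False):
        remaining = {k: v for k, v in self.columns.items() if k != column}
        if inplace:
            # Mutates the existing object; its class cannot change here.
            self.columns = remaining
            return None
        # Builds a new object, so the constructor may pick a different class.
        return self._constructor(remaining)


class GeoFrame(Frame):
    """Toy stand-in for a GeoDataFrame with an active geometry column."""

    geometry_name = "geometry"

    def _constructor(self, columns):
        # Fall back to the plain class once the geometry column is gone.
        if self.geometry_name in columns:
            return GeoFrame(columns)
        return Frame(columns)


# Non-in-place drop: a new object is built and the fallback applies.
copied = GeoFrame({"geometry": ["POINT (0 0)"], "a": [1]}).drop("geometry")

# In-place drop: the same object is modified and keeps its original class.
mutated = GeoFrame({"geometry": ["POINT (0 0)"], "a": [1]})
mutated.drop("geometry", inplace=True)

print(type(copied).__name__)   # Frame
print(type(mutated).__name__)  # GeoFrame
```

In this toy model the in-place path leaves behind a GeoFrame with no geometry column, which mirrors the state the traceback above complains about.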

TomAugspurger linked a pull request Dec 21, 2024 that will close this issue
@fbunt
Contributor Author

fbunt commented Dec 21, 2024

Thanks!
