read_deltalake breaks with dask>=2024.3.1 #79

Closed
avriiil opened this issue Apr 5, 2024 · 8 comments

Comments

@avriiil
Contributor

avriiil commented Apr 5, 2024

Hey folks! Amazing work on the query planning functionality with dask-expr :)
I was just running some deltalake queries and it seems like this upgrade is interfering with read_deltalake

I'm getting a NotImplementedError: dask_expr does not support a token argument. when running the code snippet below:

To reproduce:

from deltalake import write_deltalake
import dask_deltatable as ddt
import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 2, 2, 3, 3, 4, 4],
    "num": list(range(8)),
    "letter": ["a", "b", "c", "d", "e", "f", "g", "h"],
})

write_deltalake("some-table", df, partition_by="group")

delta_path = "some-table"
ddf = ddt.read_deltalake(delta_path)

To fix:

  • downgrade to dask==2024.2.1
  • OR run
import dask
dask.config.set({'dataframe.query-planning': False})

EDIT
I've tried upgrading dask-expr as suggested here. This upgrades Dask to 2024.4.1 but does not fix the issue for me.

@milesgranger
Collaborator

Hi @avriiil, this should have been fixed in ec1c90c, but that fix hasn't been released yet. Could you try installing from source and see if that fixes things for you?

@avriiil
Contributor Author

avriiil commented Apr 5, 2024

Thanks @milesgranger! I don't have time to test that right now. The workaround is fine for me atm, I just wanted to flag it for the dev team. Happy to wait for the release, thanks so much for your help :)

@jacobtomlinson
Collaborator

Good to see you around @avriiil! I'm going to close this as resolved in #78.

@avriiil
Contributor Author

avriiil commented May 20, 2024

@jacobtomlinson @milesgranger I'm still getting the same NotImplementedError on dask==2024.5.1 with deltalake==0.13.0

EDIT: strangely enough setting dask.config.set({'dataframe.query-planning': False}) as a workaround doesn't seem to be working.

And when I downgrade to dask==2024.2.1, import dask_deltatable fails with TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import dask_deltatable as ddt

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask_deltatable/__init__.py:8
      1 from __future__ import annotations
      3 __all__ = [
      4     "read_deltalake",
      5     "to_deltalake",
      6 ]
----> 8 from .core import read_deltalake as read_deltalake
      9 from .write import to_deltalake as to_deltalake

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask_deltatable/core.py:8
      5 from functools import partial
      6 from typing import Any, cast
----> 8 import dask.dataframe as dd
      9 import pyarrow as pa
     10 import pyarrow.parquet as pq

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/__init__.py:100
     98 import dask.dataframe._pyarrow_compat
     99 from dask.base import compute
--> 100 from dask.dataframe import backends, dispatch, rolling
    101 from dask.dataframe.core import (
    102     DataFrame,
    103     Index,
   (...)
    109     to_timedelta,
    110 )
    111 from dask.dataframe.groupby import Aggregation

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/backends.py:15
     13 from dask.backends import CreationDispatch, DaskBackendEntrypoint
     14 from dask.dataframe._compat import PANDAS_GE_220, is_any_real_numeric_dtype
---> 15 from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame
     16 from dask.dataframe.dispatch import (
     17     categorical_dtype_dispatch,
     18     concat,
   (...)
     35     union_categoricals_dispatch,
     36 )
     37 from dask.dataframe.extensions import make_array_nonempty, make_scalar

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/core.py:36
     34 from dask.blockwise import Blockwise, BlockwiseDep, BlockwiseDepDict, blockwise
     35 from dask.context import globalmethod
---> 36 from dask.dataframe import methods
     37 from dask.dataframe._compat import (
     38     PANDAS_GE_140,
     39     PANDAS_GE_150,
   (...)
     47     is_string_dtype,
     48 )
     49 from dask.dataframe.accessor import CachedAccessor, DatetimeAccessor, StringAccessor

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/methods.py:34
     22 #  preserve compatibility while moving dispatch objects
     23 from dask.dataframe.dispatch import (  # noqa: F401
     24     concat,
     25     concat_dispatch,
   (...)
     32     union_categoricals,
     33 )
---> 34 from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like
     35 from dask.utils import _deprecated_kwarg
     37 # cuDF may try to import old dispatch functions

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/utils.py:20
     18 from dask.base import get_scheduler, is_dask_collection
     19 from dask.core import get_deps
---> 20 from dask.dataframe import (  # noqa: F401 register pandas extension types
     21     _dtypes,
     22     methods,
     23 )
     24 from dask.dataframe._compat import PANDAS_GE_150, tm  # noqa: F401
     25 from dask.dataframe.dispatch import (  # noqa : F401
     26     make_meta,
     27     make_meta_obj,
     28     meta_nonempty,
     29 )

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/_dtypes.py:9
      6 import pandas as pd
      8 from dask.dataframe._compat import PANDAS_GE_150
----> 9 from dask.dataframe.extensions import make_array_nonempty, make_scalar
     12 @make_array_nonempty.register(pd.DatetimeTZDtype)
     13 def _(dtype):
     14     return pd.array([pd.Timestamp(1), pd.NaT], dtype=dtype)

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/extensions.py:8
      1 """
      2 Support for pandas ExtensionArray in dask.dataframe.
      3 
      4 See :ref:`extensionarrays` for more.
      5 """
      6 from __future__ import annotations
----> 8 from dask.dataframe.accessor import (
      9     register_dataframe_accessor,
     10     register_index_accessor,
     11     register_series_accessor,
     12 )
     13 from dask.utils import Dispatch
     15 make_array_nonempty = Dispatch("make_array_nonempty")

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/accessor.py:126
    113         token = f"{self._accessor_name}-{attr}"
    114         return self._series.map_partitions(
    115             self._delegate_method,
    116             self._accessor_name,
   (...)
    122             token=token,
    123         )
--> 126 class DatetimeAccessor(Accessor):
    127     """Accessor object for datetimelike properties of the Series values.
    128 
    129     Examples
   (...)
    132     >>> s.dt.microsecond  # doctest: +SKIP
    133     """
    135     _accessor_name = "dt"

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/accessor.py:81, in Accessor.__init_subclass__(cls, **kwargs)
     79 attr, min_version = item if isinstance(item, tuple) else (item, None)
     80 if not hasattr(cls, attr):
---> 81     _bind_property(cls, pd_cls, attr, min_version)

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/dataframe/accessor.py:35, in _bind_property(cls, pd_cls, attr, min_version)
     33 except Exception:
     34     pass
---> 35 setattr(cls, attr, property(derived_from(pd_cls, version=min_version)(func)))

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/utils.py:987, in derived_from.<locals>.wrapper(method)
    985 try:
    986     extra = getattr(method, "__doc__", None) or ""
--> 987     method.__doc__ = _derived_from(
    988         original_klass,
    989         method,
    990         ua_args=ua_args,
    991         extra=extra,
    992         skipblocks=skipblocks,
    993         inconsistencies=inconsistencies,
    994     )
    995     return method
    997 except AttributeError:

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/utils.py:940, in _derived_from(cls, method, ua_args, extra, skipblocks, inconsistencies)
    938 # Mark unsupported arguments
    939 try:
--> 940     method_args = get_named_args(method)
    941     original_args = get_named_args(original_method)
    942     not_supported = [m for m in original_args if m not in method_args]

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/site-packages/dask/utils.py:701, in get_named_args(func)
    699 def get_named_args(func) -> list[str]:
    700     """Get all non ``*args/**kwargs`` arguments for a function"""
--> 701     s = inspect.signature(func)
    702     return [
    703         n
    704         for n, p in s.parameters.items()
    705         if p.kind in [p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY, p.KEYWORD_ONLY]
    706     ]

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/inspect.py:3263, in signature(obj, follow_wrapped, globals, locals, eval_str)
   3261 def signature(obj, *, follow_wrapped=True, globals=None, locals=None, eval_str=False):
   3262     """Get a signature object for the passed callable."""
-> 3263     return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
   3264                                    globals=globals, locals=locals, eval_str=eval_str)

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/inspect.py:3011, in Signature.from_callable(cls, obj, follow_wrapped, globals, locals, eval_str)
   3007 @classmethod
   3008 def from_callable(cls, obj, *,
   3009                   follow_wrapped=True, globals=None, locals=None, eval_str=False):
   3010     """Constructs Signature for the given callable object."""
-> 3011     return _signature_from_callable(obj, sigcls=cls,
   3012                                     follow_wrapper_chains=follow_wrapped,
   3013                                     globals=globals, locals=locals, eval_str=eval_str)

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/inspect.py:2599, in _signature_from_callable(obj, follow_wrapper_chains, skip_bound_arg, globals, locals, eval_str, sigcls)
   2597     call = getattr_static(type(obj), '__call__', None)
   2598     if call is not None:
-> 2599         call = _descriptor_get(call, obj)
   2600         return _get_signature_of(call)
   2602 raise ValueError('callable {!r} is not supported by signature'.format(obj))

File ~/miniforge3/envs/delta-jupyter/lib/python3.11/inspect.py:2432, in _descriptor_get(descriptor, obj)
   2430 if get is _sentinel:
   2431     return descriptor
-> 2432 return get(descriptor, obj, type(obj))

TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object

@jacobtomlinson
Collaborator

cc @phofl @rjzamora in case you have thoughts here?

@rjzamora
Contributor

--> 701     s = inspect.signature(func)

I only took a very brief look so far, but this looks like the Python 3.11.9 bug that was supposed to be fixed in dask-2024.4.1
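
For anyone checking whether their environment matches the combination mentioned above, here is a minimal sketch that uses only the standard library. `parse_version` and `hits_signature_bug` are hypothetical helper names, not part of dask, and the thresholds (Python 3.11.9, dask older than 2024.4.1) are taken from this thread only, so the check covers nothing beyond that combination.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2024.4.1' into (2024, 4, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def hits_signature_bug(python_version, dask_version):
    """True for Python 3.11.9 combined with a dask release older than 2024.4.1.

    Assumption: these two thresholds come from the discussion in this issue.
    """
    is_affected_python = tuple(python_version[:3]) == (3, 11, 9)
    is_old_dask = parse_version(dask_version) < (2024, 4, 1)
    return is_affected_python and is_old_dask


# Look up the installed dask version, if dask is installed at all.
try:
    installed_dask = metadata.version("dask")
except metadata.PackageNotFoundError:
    installed_dask = None

if installed_dask is not None and hits_signature_bug(sys.version_info, installed_dask):
    print("Python 3.11.9 with dask", installed_dask, "matches the combination from this thread")
```

If the check matches, the two ways out reported in this thread are a newer dask or Python 3.11.8.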

@avriiil
Contributor Author

avriiil commented May 27, 2024

@rjzamora -- I can confirm that downgrading to python==3.11.8 works.

EDIT:

  • it works fine with dask==2024.2.1
  • it doesn't work with dask==2024.5.1 --> I get a NotImplementedError: dask_expr does not support a token argument. even though I've set dask.config.set({'dataframe.query-planning': False})

UPDATE:

  • I need to call dask.config.set({'dataframe.query-planning': False}) before importing dask_deltatable; then it works fine with the latest dask as well.
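
The ordering described in the update above can be sketched as a small guard. `dataframe_already_imported` and `disable_query_planning` are hypothetical helper names, not part of dask or dask-deltatable. The sketch assumes the option only takes effect if it is set before `dask.dataframe` is first imported, which matches what was observed here, since `dask_deltatable` imports `dask.dataframe` itself.

```python
import sys


def dataframe_already_imported():
    """Report whether dask.dataframe has been imported in this process."""
    return "dask.dataframe" in sys.modules


def disable_query_planning():
    """Turn off query planning, refusing to run if it is already too late.

    Assumption: the setting is only honoured when applied before
    dask.dataframe (and anything that imports it) is loaded.
    """
    if dataframe_already_imported():
        raise RuntimeError(
            "dask.dataframe is already imported; call this before "
            "importing dask.dataframe or dask_deltatable"
        )
    import dask  # imported lazily so the guard above runs first

    dask.config.set({"dataframe.query-planning": False})
```

Call `disable_query_planning()` as the first statement of the script or notebook, and only then run `import dask_deltatable as ddt`.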

@phofl
Collaborator

phofl commented Jul 17, 2024

We've merged a PR now that should make stuff run smoothly with dask-expr.

@phofl phofl closed this as completed Jul 17, 2024