bug: postgres.to_pyarrow(ibis.uuid()) errors #8902

Closed
1 task done
NickCrews opened this issue Apr 5, 2024 · 1 comment
Labels
bug Incorrect behavior inside of ibis

Comments

@NickCrews
Contributor

NickCrews commented Apr 5, 2024

What happened?

See the xfailing test in #8901

ibis/backends/__init__.py:221: in to_pyarrow
    table = pa.Table.from_batches(reader, schema=arrow_schema)
pyarrow/table.pxi:4104: in pyarrow.lib.Table.from_batches
    ???
pyarrow/ipc.pxi:666: in pyarrow.lib.RecordBatchReader.__next__
    ???
pyarrow/ipc.pxi:700: in pyarrow.lib.RecordBatchReader.read_next_batch
    ???
pyarrow/error.pxi:88: in pyarrow.lib.check_status
    ???
ibis/backends/sql/__init__.py:375: in <genexpr>
    pa.array(map(tuple, batch), type=array_type)
pyarrow/array.pxi:343: in pyarrow.lib.array
    ???
pyarrow/array.pxi:42: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowTypeError: Expected bytes, got a 'UUID' object

pyarrow/error.pxi:91: ArrowTypeError

In ibis/formats/pyarrow.py, we say that ibis.UUID should map to pa.string. This makes sense, since there is no built-in UUID type in pyarrow.

But then in SQLBackend.to_pyarrow_batches(), we do:

schema = expr.as_table().schema()
array_type = schema.as_struct().to_pyarrow()
arrays = (
    pa.array(map(tuple, batch), type=array_type)
    for batch in self._cursor_batches(
        expr, params=params, limit=limit, chunk_size=chunk_size
    )
)

Here, each batch is a sequence of tuples holding Python standard library uuid.UUID objects from the psycopg2 cursor, so the pa.array(<uuid.UUIDs>, type=pa.string) call fails.

I see a few possible paths here:

  • Do the conversion on the backend side, before fetching results, so that the psycopg2 cursor returns actual strings. We already have the ibis->pyarrow type mappings, so we could round-trip the other way first, e.g. expr.cast(type.to_arrow().to_ibis()). Probably the easiest, but it might lead to the wrong types sometimes if some info is lost during this round trip?
  • Do the conversion after fetching from the cursor. Maybe hardcode the few datatypes that are problems?
  • Submit a patch to upstream pyarrow to accept uuid.UUID types.
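
The second path (convert after fetching) could look roughly like the sketch below. It uses only the standard library, and stringify_uuids is a hypothetical name, not an existing ibis function. It hardcodes a single problem type, uuid.UUID, and passes every other value through unchanged.

```python
import uuid
from typing import Any, Iterable, Iterator


def stringify_uuids(rows: Iterable[tuple[Any, ...]]) -> Iterator[tuple[Any, ...]]:
    """Hypothetical helper: yield rows with uuid.UUID values replaced by str.

    Values of any other type, including None, are left untouched.
    """
    for row in rows:
        yield tuple(
            str(value) if isinstance(value, uuid.UUID) else value
            for value in row
        )


# Stand-in for one batch of rows as a cursor might return them.
batch = [(uuid.UUID(int=1), 10), (None, 20)]
clean = list(stringify_uuids(batch))
print(clean)
```

Under this assumption, to_pyarrow_batches() would pass stringify_uuids(batch) to pa.array in place of map(tuple, batch). The cost is an extra Python-level pass over every row, which is one reason the backend-side cast in the first path may be preferable.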

What version of ibis are you using?

main

What backend(s) are you using, if any?

postgres

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cpcloud
Member

cpcloud commented Aug 11, 2024

Closing in favor of #8532.

@cpcloud closed this as not planned on Aug 11, 2024
@github-project-automation bot moved this from backlog to done in Ibis planning and roadmap on Aug 11, 2024

3 participants