When deserializing/serializing in a Spark job, _Missing fields throw TypeErrors #1974

albertocalderari · 2022-04-12T13:59:34Z

I've noticed that when running under Spark, missing is not skipped.
Instead, it is propagated to the deserializer/serializer, which, in the case of an int value, fails with:

TypeError(int() argument must be a string, a bytes-like object or a real number, not '_Missing')

Also, all the missing string fields are serialized as '<marshmallow.missing>'.

Oddly, this doesn't happen in a unit test, only when I run within Spark.
The exception gets thrown here: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L520

See below my pip freeze:

attrs==21.4.0
iniconfig==1.1.1
marshmallow==3.15.0
marshmallow-dataclass==8.4.1
marshmallow-enum==1.5.1
marshmallow-union==0.1.15
mypy-extensions==0.4.3
packaging==21.3
pluggy==1.0.0
py==1.11.0
py4j==0.10.9
pyparsing==3.0.8
pyspark==3.1.2
pytest==6.2.5
PyYAML==6.0
toml==0.10.2
typeguard==2.13.3
typing-inspect==0.7.1
typing_extensions==4.1.1

I'm running under Python 3.10.2.

albertocalderari changed the title ~~Ehn deserializing/Serializing in a spark job _Missing fileds throw TypeErrrors~~ When deserializing/serializing in a spark job _Missing fileds throw TypeErrrors Apr 12, 2022
