Serialization of pd.NA #37

goncas23 · 2020-10-22T14:10:10Z

When trying to write an integer64 field, I was getting an error due to the presence of missing values. The missing values were in the form of pd.NA rather than np.nan, so they were not being excluded during serialization.

I made an attempt to fix this and it worked, though it might not be the most elegant solution. In the _replace function, I added a new replacement tuple to the list of replacements, very similar to the one that handles the NaNs:

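from itertools import chain

import numpy as np


def _replace(df):
    # Object columns show up quoted in the output; all other columns are unquoted.
    obj_cols = {k for k, v in dict(df.dtypes).items() if v == np.dtype('O')}
    other_cols = set(df.columns) - obj_cols
    # Patterns for missing values rendered from np.nan
    obj_nans = (f'{k}="nan"' for k in obj_cols)
    other_nans = (f'{k}=nani?' for k in other_cols)
    # New: patterns for missing values rendered from pd.NA, which prints as <NA>
    obj_nas = (f'{k}="<NA>"' for k in obj_cols)
    other_nas = (f'{k}=<NA>i?' for k in other_cols)
    replacements = [
        ('|'.join(chain(obj_nans, other_nans)), ''),
        ('|'.join(chain(obj_nas, other_nas)), ''),  # new tuple for pd.NA
        (',{2,}', ','),  # collapse the commas left behind by removed fields
        ('|'.join([', ,', ', ', ' ,']), ' '),  # tidy commas next to spaces
    ]
    return replacements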
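
To show what the extra tuple does, here is a rough sketch that applies the same kind of (pattern, replacement) list to one serialized row with re.sub. The helper names build_replacements and apply_replacements, the column names and the sample line are all made up for illustration, and I am assuming the library applies the tuples in order as regex substitutions; I have not checked how it actually does it.

```python
import re
from itertools import chain


def build_replacements(obj_cols, other_cols):
    # Hypothetical stand-in for _replace that takes column names directly,
    # so the sketch runs without pandas.
    obj_nans = (f'{k}="nan"' for k in obj_cols)
    other_nans = (f'{k}=nani?' for k in other_cols)
    obj_nas = (f'{k}="<NA>"' for k in obj_cols)
    other_nas = (f'{k}=<NA>i?' for k in other_cols)
    return [
        ('|'.join(chain(obj_nans, other_nans)), ''),
        ('|'.join(chain(obj_nas, other_nas)), ''),
        (',{2,}', ','),
        ('|'.join([', ,', ', ', ' ,']), ' '),
    ]


def apply_replacements(line, replacements):
    # Assumption: each tuple is applied in order as a regex substitution.
    for pattern, repl in replacements:
        line = re.sub(pattern, repl, line)
    return line


# Made-up row: "count" and "note" hold pd.NA, which is rendered as <NA>.
line = 'weather,city=lisbon temp=21.5,count=<NA>i,note="<NA>" 1603375810000000000'
cleaned = apply_replacements(line, build_replacements(['note'], ['temp', 'count']))
print(cleaned)  # weather,city=lisbon temp=21.5 1603375810000000000
```

Without the second tuple, the two <NA> fields would stay in the row untouched, which matches the write error I was seeing.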

Hope this ends up helping someone.
