This repository has been archived by the owner on Aug 5, 2023. It is now read-only.

Serialization of pd.NA #37

Open
goncas23 opened this issue Oct 22, 2020 · 0 comments

Comments

@goncas23

When trying to write an integer64 field, I got an error caused by missing values. The missing values were pd.NA rather than np.nan, so they were not being excluded during serialization.

I attempted a fix and it worked, though it might not be the most elegant solution. In the _replace function, I added a new replacement tuple to the list of replacements, very similar to the one that handles the NaNs. It matches the text <NA>, which is how pd.NA is rendered as a string:

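from itertools import chain

import numpy as np


def _replace(df):
    # Object-dtype columns get the quoted patterns below; all other columns the unquoted ones
    obj_cols = {k for k, v in dict(df.dtypes).items() if v is np.dtype('O')}
    other_cols = set(df.columns) - obj_cols
    # Patterns for fields whose value is np.nan
    obj_nans = (f'{k}="nan"' for k in obj_cols)
    other_nans = (f'{k}=nani?' for k in other_cols)
    # New: patterns for fields whose value is pd.NA, which is rendered as <NA>
    obj_nas = (f'{k}="<NA>"' for k in obj_cols)
    other_nas = (f'{k}=<NA>i?' for k in other_cols)
    replacements = [
        ('|'.join(chain(obj_nans, other_nans)), ''),  # drop NaN fields
        ('|'.join(chain(obj_nas, other_nas)), ''),    # new: drop pd.NA fields
        (',{2,}', ','),                               # collapse repeated commas
        ('|'.join([', ,', ', ', ' ,']), ' '),         # tidy commas next to spaces
    ]
    return replacements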
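
To show what the extra tuple does, here is a rough sketch that applies the same kind of patterns to hand-written serialized lines using only the standard `re` module. The column names, the sample lines, and the helpers `build_replacements` and `strip_missing` are hypothetical. How the library itself applies the replacements is not shown in this issue, so the sequential `re.sub` loop is an assumption.

```python
import re


def build_replacements(obj_cols, other_cols):
    """Build (pattern, replacement) pairs like those returned by _replace.

    Hypothetical helper: takes column names directly instead of a DataFrame.
    """
    obj_nans = [f'{k}="nan"' for k in obj_cols]
    other_nans = [f'{k}=nani?' for k in other_cols]
    obj_nas = [f'{k}="<NA>"' for k in obj_cols]
    other_nas = [f'{k}=<NA>i?' for k in other_cols]
    return [
        ('|'.join(obj_nans + other_nans), ''),   # drop NaN fields
        ('|'.join(obj_nas + other_nas), ''),     # drop pd.NA fields
        (',{2,}', ','),                          # collapse repeated commas
        ('|'.join([', ,', ', ', ' ,']), ' '),    # tidy commas next to spaces
    ]


def strip_missing(line, replacements):
    """Apply each replacement in order (assumed application strategy)."""
    for pattern, repl in replacements:
        line = re.sub(pattern, repl, line)
    return line


# Hypothetical columns: one object column and two numeric columns.
replacements = build_replacements(obj_cols=['note'], other_cols=['count', 'value'])

# Missing values at the start and end of the field set.
edge = 'measurement,host=a count=<NA>i,value=1.5,note="<NA>" 1603324800000000000'
edge_clean = strip_missing(edge, replacements)

# Missing value in the middle of the field set.
middle = 'measurement,host=a value=1.5,count=<NA>i,note="ok" 1603324800000000000'
middle_clean = strip_missing(middle, replacements)

print(edge_clean)    # measurement,host=a value=1.5 1603324800000000000
print(middle_clean)  # measurement,host=a value=1.5,note="ok" 1603324800000000000
```

Without the second tuple, both `count=<NA>i` and `note="<NA>"` would stay in the output, which matches the failure described above. Note that column names are interpolated into the patterns unescaped, as in `_replace`, so names containing regex metacharacters would need `re.escape`.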

Hope this ends up helping someone.
