What are the different metrics I can use to check quality of PII information produced?
report.get_diagnostics() checks the coverage and range of numerical/categorical data. But is there a single metric I can use to check things like the duplication or quality of the generated PII?
Thanks for filing this issue @yash-rathore. This requires some more thought. We can keep it open to communicate updates and have discussions.
At a high level, it would be interesting to identify the useful properties of PII columns.
Duplication may be one, in the sense that we can check whether the sensitive values in the synthetic data are repeats of values in the real data. Do note that some duplication might be OK; it may even be good for privacy, as it prevents an "attack by omission" (wherein an attacker learns what is in the real data by identifying what is missing from the synthetic data).
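In the meantime, a rough check along these lines can be done by hand. The sketch below is not part of the library; `pii_overlap_report` and the keys it returns are hypothetical names chosen for illustration, and it assumes exact string matching is a reasonable notion of "repeat" for the column in question. It reports how many synthetic values also appear in the real data, and how many synthetic values repeat each other.

```python
from collections import Counter


def pii_overlap_report(real_values, synthetic_values):
    """Hypothetical helper: compare one PII column across real and synthetic data.

    Both arguments are iterables of hashable values (for example, lists of
    email strings). Matching is exact, so normalize case/whitespace first
    if that matters for your data.
    """
    real_set = set(real_values)
    synthetic = list(synthetic_values)
    if not synthetic:
        raise ValueError("synthetic_values is empty")

    # Synthetic rows whose value is copied verbatim from the real data.
    copied = [value for value in synthetic if value in real_set]

    # Rows that repeat an earlier synthetic value (each extra copy counts once).
    counts = Counter(synthetic)
    internal_repeats = sum(count - 1 for count in counts.values())

    return {
        # Fraction of synthetic rows that reuse a real value.
        "overlap_rate": len(copied) / len(synthetic),
        # Fraction of synthetic rows that duplicate another synthetic row.
        "internal_duplicate_rate": internal_repeats / len(synthetic),
        # Fraction of distinct real values that show up in the synthetic data.
        "real_values_reused": len(set(copied)) / len(real_set) if real_set else 0.0,
    }


# Made-up example values, purely for illustration.
real = ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]
synth = ["a@x.com", "z@y.org", "z@y.org", "q@y.org"]
report = pii_overlap_report(real, synth)
print(report)
```

An `overlap_rate` of exactly 0 is not necessarily the goal, given the "attack by omission" point above, so it is probably better read as a number to monitor than as a pass/fail score.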
Quality is an interesting one. How are you thinking about quality in PII values?