-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump Pydantic to version 2 #160
Conversation
@NevilleS a major question we need to answer is if we also want to strip out everything that is I think it makes sense to me from a versioning standpoint, but I also want to get more informed opinions from |
@@ -93,9 +93,9 @@ jobs: | |||
Pytest-Matrix: | |||
strategy: | |||
matrix: | |||
python_version: ["3.8", "3.9", "3.10", "3.11"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 3.8 wasn't playing nicely, so I axed it. It is almost EOL and we can only make the matrix so large
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, how do we generally communicate things like this to customers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than changelog, it'll automatically fail to pip install if they have an older version.
Afaik most people are using Docker containers, this would be most disruptive to customers using the CLI
pydantic_version: ["1.8.2", "1.9.2", "1.10.9"] | ||
pyyaml_version: ["5.4.1", "6.0"] | ||
python_version: ["3.9", "3.10", "3.11"] | ||
pydantic_version: ["2.2.1", "2.3.0", "2.4.2", "2.5.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the most recent ones, but Pydantic moves fast so we'll need to keep an eye on this
@@ -1,3 +1,3 @@ | |||
pydantic>=1.8.1,<1.11.0 | |||
pydantic>=2.2.1,<=2.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not cross-supporting Pydantic 1.0!
src/fideslang/validation.py
Outdated
""" | ||
A FidesKey type that creates a custom constrained string. | ||
""" | ||
FidesKey = Annotated[str, PlainValidator(validate_fides_key)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CustomTypes are now handled very differently, as Annotated instead of Classes that inherit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all these helpful review comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied you! haha yours are always so helpful
raise FidesValidationError("FidesKey can not self-reference!") | ||
return value | ||
|
||
|
||
def deprecated_version_later_than_added( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated almost all of these functions to be pure functions, not set up as specific validators. They are then more transparently composed on the model itself
@@ -16,14 +16,14 @@ def resources_dict(): | |||
""" | |||
resources_dict: Dict[str, Any] = { | |||
"data_category": models.DataCategory( | |||
organization_fides_key=1, | |||
organization_fides_key="1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydantic is more picky now! It will not do these subtle conversions (int -> str in this case) but instead will throw an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, feels like something that will cause potential issues with existing yaml files etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially for sure...I'm going to test this branch out in Fides and see what kind of damage it does 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it did, so I had to update everything in Fides
....will need to think about how to handle this if we don't want to break things...
although coming from the db they should already be correct
system_type="SYSTEM", | ||
tags=["some", "tags"], | ||
) | ||
|
||
def test_system_no_egress(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused here, and I think this was a bug? Not sure why these were expected to fail before but aren't now. Especially given that they seem fine given how the model was written? This is really old code though so I'm not sure if just deleting it is right, but I'm not sure what it is solving either.
I would appreciate any possible insight here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to put this back and fix the model-level validation.
It looks like privacy_declarations_reference_data_flows
is not getting called now. This first test, the privacy declaration seems to reference a system but there are no system egresses?
"Config for the Evaluation" | ||
extra = "ignore" | ||
orm_mode = True | ||
model_config = ConfigDict(extra="ignore", from_attributes=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another change, this time with how models are configured
description="An array of data categories describing a system in a privacy declaration.", | ||
) | ||
data_use: FidesKey = Field( | ||
description="The Data Use describing a system in a privacy declaration.", | ||
) | ||
data_subjects: List[FidesKey] = Field( | ||
default_factory=list, | ||
default=[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I nitted myself here! As taught by @pattisdr it is safe to use []
as a default value in Pydantic
Starting review..a lot to catch up on here! |
I apologize! I just wasn't sure how to break it up further I added comments everywhere I thought made sense but of course feel free to dig in and ask other questions and I'll reply as best as I can! |
Oh no this is exciting! Lots of great improvements here, I more meant big picture understanding what Pydantic 2 brings as well. |
There's a whole new lingo to learn! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge effort here Thomas. Main thing I'd make sure these validators that are working on a list of items still work, in places they're not getting called, but only small things were noted here, quick to turn around.
Have you just pinned this latest fideslang commit in Fides and experimented with the level effort required here? I wonder if trying to integrate will cause more changes needed in Fideslang so it might be nice to verify before we do the big 3.0 release.
check_valid_country_code | ||
) | ||
matching_parent_key_validator = validator("parent_key", allow_reuse=True, always=True)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, easier to follow too on the model!
name: Optional[str] = Field( | ||
default=None, description="Human-Readable name for this resource." | ||
) | ||
description: Optional[str] = Field( | ||
default=None, description="A detailed description of what this resource is." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was behind pulling out the usage of name_field and description_field here, when you're still using them on other models? Just curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kind of a messy/bad abstraction. Names can sometimes be optional, and sometimes are not (they're treated as keys) so in my opinion this was a premature optimization/abstraction on my part :)
"""Config for the cookies""" | ||
|
||
orm_mode = True | ||
path: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it thanks!
default=None, | ||
description="Deprecated. " | ||
+ ( | ||
DataProtectionImpactAssessment.__doc__ or "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like DataProtectionImpactAssessment > progress/link files might need default=None too (although this is supposed to be deprecated anyway)
src/fideslang/models.py
Outdated
@validator("legitimate_interest", always=True) | ||
@field_validator("legitimate_interest") | ||
@classmethod | ||
def set_legitimate_interest(cls, value: bool, values: Dict) -> bool: | ||
def set_legitimate_interest(cls, value: bool, info: ValidationInfo) -> bool: | ||
"""Sets if a legitimate interest is used.""" | ||
values = info.data | ||
|
||
if values["legal_basis"] == "Legitimate Interests": | ||
value = True | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to run if legitimate_interest is not an argument when instantiating a DataUse. Looks like you can add validate_default=True
to the legitimate_interest
Field above to get the same always=True
behavior.
(Doesn't matter much in practice, this field is deprecated, an argument to remove these deprecated fields soon)
@@ -93,9 +93,9 @@ jobs: | |||
Pytest-Matrix: | |||
strategy: | |||
matrix: | |||
python_version: ["3.8", "3.9", "3.10", "3.11"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, how do we generally communicate things like this to customers?
src/fideslang/validation.py
Outdated
""" | ||
A FidesKey type that creates a custom constrained string. | ||
""" | ||
FidesKey = Annotated[str, PlainValidator(validate_fides_key)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all these helpful review comments
@@ -1084,48 +1096,60 @@ class System(FidesModel): | |||
""" | |||
|
|||
registry_id: Optional[int] = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in FIdeslang the type is an integer, but the tests updated to registry_id="1"
The ctl_systems
table has it as a string though, so should this be updated to a string? Come to think of it I think I've run into 500 errors before around this - the discrepancy between fideslang and the database here.
Table "public.ctl_systems"
Column | Type | Collation | Nullable | Default
--------------------------------------+--------------------------+-----------+----------+---------------------------
registry_id | character varying | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh oh...hmm ok I'll need to do more testing over in fides
for this
def __get_validators__(cls) -> Generator: | ||
yield cls.validate | ||
regex: Pattern[str] = re.compile(FIDES_KEY_PATTERN) | ||
if not regex.match(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This throws confusing Type errors if the fides key is not a string, should we add more validation to require FidesKeys to be a string upfront?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which situations? We might need to pass in as str
specifically
system_type="SYSTEM", | ||
tags=["some", "tags"], | ||
) | ||
|
||
def test_system_no_egress(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to put this back and fix the model-level validation.
It looks like privacy_declarations_reference_data_flows
is not getting called now. This first test, the privacy declaration seems to reference a system but there are no system egresses?
no kidding! Soooo many major changes here, hopefully the performance improvements and new ergonomics are worth it! |
Migrated this one to Ethyca's downstream fork: ethyca#9. Once we finish this update for the Python bindings it'll be useful to merge upstream 👍 |
Closes #159
Description Of Changes
Pydantic 2 sees major changes that bring large performance advantages. Worth upgrading to stay up-to-date as well as enjoy downstream effects in increased performance
Pydantic 1 -> 2 Migration Guide: link
I fully expected we might need to tweak more things here based on testing against
Fides
directly, but we'll need to do more checks there to verify. Here is the corresponding Fides PRAs part of this migration, I adopted a very different general pattern for validation than we used before. Due to the major changes, I found it easier to use
model_validator
(formerly known asroot_validator
) and composed multiple validators together into a single, full-model validator instead of always using discrete validators by field.Code Changes
Pydantic
v2requirements.txt
Steps to Confirm
Pre-Merge Checklist
CHANGELOG.md