Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump Pydantic to version 2 #160

Closed
wants to merge 29 commits into from

Conversation

ThomasLaPiana
Copy link
Contributor

@ThomasLaPiana ThomasLaPiana commented Sep 7, 2023

Closes #159

Description Of Changes

Pydantic 2 sees major changes that bring large performance advantages. Worth upgrading to stay up-to-date as well as enjoy downstream effects in increased performance

Pydantic 1 -> 2 Migration Guide: link

I fully expected we might need to tweak more things here based on testing against Fides directly, but we'll need to do more checks there to verify. Here is the corresponding Fides PR

As part of this migration, I adopted a very different general pattern for validation than we used before. Due to the major changes, I found it easier to use model_validator (formerly known as root_validator) and composed multiple validators together into a single, full-model validator instead of always using discrete validators by field.

Code Changes

  • make updates required for Pydantic v2
  • update the requirement bounds in requirements.txt
  • remove Python 3.8 from supported versions

Steps to Confirm

  • CI checks pass

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md

@ThomasLaPiana ThomasLaPiana self-assigned this Sep 7, 2023
@ThomasLaPiana ThomasLaPiana changed the base branch from main to ThomasLaPiana-remove-data-qualifiers November 10, 2023 08:30
requirements.txt Outdated Show resolved Hide resolved
@ThomasLaPiana
Copy link
Contributor Author

@NevilleS a major question we need to answer is if we also want to strip out everything that is deprecated as part of 3.0

I think it makes sense to me from a versioning standpoint, but I also want to get more informed opinions from

@SteveDMurphy @pattisdr & @adamsachs

@@ -93,9 +93,9 @@ jobs:
Pytest-Matrix:
strategy:
matrix:
python_version: ["3.8", "3.9", "3.10", "3.11"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.8 wasn't playing nicely, so I axed it. It is almost EOL and we can only make the matrix so large

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, how do we generally communicate things like this to customers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than changelog, it'll automatically fail to pip install if they have an older version.

Afaik most people are using Docker containers, this would be most disruptive to customers using the CLI

pydantic_version: ["1.8.2", "1.9.2", "1.10.9"]
pyyaml_version: ["5.4.1", "6.0"]
python_version: ["3.9", "3.10", "3.11"]
pydantic_version: ["2.2.1", "2.3.0", "2.4.2", "2.5.0"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the most recent ones, but Pydantic moves fast so we'll need to keep an eye on this

@@ -1,3 +1,3 @@
pydantic>=1.8.1,<1.11.0
pydantic>=2.2.1,<=2.6.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not cross-supporting Pydantic 1.0!

"""
A FidesKey type that creates a custom constrained string.
"""
FidesKey = Annotated[str, PlainValidator(validate_fides_key)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CustomTypes are now handled very differently, as Annotated instead of Classes that inherit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all these helpful review comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied you! haha yours are always so helpful

raise FidesValidationError("FidesKey can not self-reference!")
return value


def deprecated_version_later_than_added(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated almost all of these functions to be pure functions, not set up as specific validators. They are then more transparently composed on the model itself

@@ -16,14 +16,14 @@ def resources_dict():
"""
resources_dict: Dict[str, Any] = {
"data_category": models.DataCategory(
organization_fides_key=1,
organization_fides_key="1",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic is more picky now! It will not do these subtle conversions (int -> str in this case) but instead will throw an error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, feels like something that will cause potential issues with existing yaml files etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially for sure...I'm going to test this branch out in Fides and see what kind of damage it does 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it did, so I had to update everything in Fides....will need to think about how to handle this if we don't want to break things...

although coming from the db they should already be correct

system_type="SYSTEM",
tags=["some", "tags"],
)

def test_system_no_egress(self) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here, and I think this was a bug? Not sure why these were expected to fail before but aren't now. Especially given that they seem fine given how the model was written? This is really old code though so I'm not sure if just deleting it is right, but I'm not sure what it is solving either.

I would appreciate any possible insight here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to put this back and fix the model-level validation.

It looks like privacy_declarations_reference_data_flows is not getting called now. This first test, the privacy declaration seems to reference a system but there are no system egresses?

"Config for the Evaluation"
extra = "ignore"
orm_mode = True
model_config = ConfigDict(extra="ignore", from_attributes=True)
Copy link
Contributor Author

@ThomasLaPiana ThomasLaPiana Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another change, this time with how models are configured

description="An array of data categories describing a system in a privacy declaration.",
)
data_use: FidesKey = Field(
description="The Data Use describing a system in a privacy declaration.",
)
data_subjects: List[FidesKey] = Field(
default_factory=list,
default=[],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I nitted myself here! As taught by @pattisdr it is safe to use [] as a default value in Pydantic

@ThomasLaPiana ThomasLaPiana marked this pull request as ready for review November 14, 2023 08:38
@pattisdr
Copy link
Contributor

Starting review..a lot to catch up on here!

@ThomasLaPiana
Copy link
Contributor Author

Starting review..a lot to catch up on here!

I apologize! I just wasn't sure how to break it up further

I added comments everywhere I thought made sense but of course feel free to dig in and ask other questions and I'll reply as best as I can!

@pattisdr
Copy link
Contributor

Oh no this is exciting! Lots of great improvements here, I more meant big picture understanding what Pydantic 2 brings as well.

@pattisdr
Copy link
Contributor

There's a whole new lingo to learn!

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge effort here Thomas. Main thing I'd make sure these validators that are working on a list of items still work, in places they're not getting called, but only small things were noted here, quick to turn around.

Have you just pinned this latest fideslang commit in Fides and experimented with the level effort required here? I wonder if trying to integrate will cause more changes needed in Fideslang so it might be nice to verify before we do the big 3.0 release.

check_valid_country_code
)
matching_parent_key_validator = validator("parent_key", allow_reuse=True, always=True)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, easier to follow too on the model!

Comment on lines +67 to +72
name: Optional[str] = Field(
default=None, description="Human-Readable name for this resource."
)
description: Optional[str] = Field(
default=None, description="A detailed description of what this resource is."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was behind pulling out the usage of name_field and description_field here, when you're still using them on other models? Just curious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kind of a messy/bad abstraction. Names can sometimes be optional, and sometimes are not (they're treated as keys) so in my opinion this was a premature optimization/abstraction on my part :)

"""Config for the cookies"""

orm_mode = True
path: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it thanks!

default=None,
description="Deprecated. "
+ (
DataProtectionImpactAssessment.__doc__ or ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like DataProtectionImpactAssessment > progress/link files might need default=None too (although this is supposed to be deprecated anyway)

Comment on lines 386 to 405
@validator("legitimate_interest", always=True)
@field_validator("legitimate_interest")
@classmethod
def set_legitimate_interest(cls, value: bool, values: Dict) -> bool:
def set_legitimate_interest(cls, value: bool, info: ValidationInfo) -> bool:
"""Sets if a legitimate interest is used."""
values = info.data

if values["legal_basis"] == "Legitimate Interests":
value = True
return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to run if legitimate_interest is not an argument when instantiating a DataUse. Looks like you can add validate_default=True to the legitimate_interest Field above to get the same always=True behavior.

(Doesn't matter much in practice, this field is deprecated, an argument to remove these deprecated fields soon)

@@ -93,9 +93,9 @@ jobs:
Pytest-Matrix:
strategy:
matrix:
python_version: ["3.8", "3.9", "3.10", "3.11"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, how do we generally communicate things like this to customers?

"""
A FidesKey type that creates a custom constrained string.
"""
FidesKey = Annotated[str, PlainValidator(validate_fides_key)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all these helpful review comments

@@ -1084,48 +1096,60 @@ class System(FidesModel):
"""

registry_id: Optional[int] = Field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like in FIdeslang the type is an integer, but the tests updated to registry_id="1"

The ctl_systems table has it as a string though, so should this be updated to a string? Come to think of it I think I've run into 500 errors before around this - the discrepancy between fideslang and the database here.

                                            Table "public.ctl_systems"
                Column                |           Type           | Collation | Nullable |          Default          
--------------------------------------+--------------------------+-----------+----------+---------------------------
registry_id                          | character varying        |           |          | 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh oh...hmm ok I'll need to do more testing over in fides for this

def __get_validators__(cls) -> Generator:
yield cls.validate
regex: Pattern[str] = re.compile(FIDES_KEY_PATTERN)
if not regex.match(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws confusing Type errors if the fides key is not a string, should we add more validation to require FidesKeys to be a string upfront?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which situations? We might need to pass in as str specifically

system_type="SYSTEM",
tags=["some", "tags"],
)

def test_system_no_egress(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to put this back and fix the model-level validation.

It looks like privacy_declarations_reference_data_flows is not getting called now. This first test, the privacy declaration seems to reference a system but there are no system egresses?

@ThomasLaPiana
Copy link
Contributor Author

There's a whole new lingo to learn!

no kidding! Soooo many major changes here, hopefully the performance improvements and new ergonomics are worth it!

Base automatically changed from ThomasLaPiana-remove-data-qualifiers to fideslang-3 November 27, 2023 05:01
@NevilleS
Copy link
Contributor

NevilleS commented Jan 8, 2024

Migrated this one to Ethyca's downstream fork: ethyca#9. Once we finish this update for the Python bindings it'll be useful to merge upstream 👍

@NevilleS NevilleS closed this Jan 8, 2024
@NevilleS NevilleS deleted the ThomasLaPiana-bump-pydantic-2 branch January 8, 2024 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update to Pydantic 2.X
4 participants