Clarify the meaning of "match" between sdist and wheel #1594
In the current version of the description of `Dynamic` in the core metadata, it is unclear whether "match" means string equivalence or semantic equivalence. Given that pypa/packaging reorders version specifiers between sdist and wheel when used by a build backend (which, to the best of my knowledge, covers all build backends written in Python), we can clarify this to mean semantic equivalence.
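To make the distinction between string equivalence and an order-insensitive comparison concrete, here is a minimal sketch using only the standard library. The helpers `specifier_clauses` and `same_clauses` are hypothetical names introduced for illustration; this is not how pypa/packaging compares specifiers, and it deliberately ignores deeper questions such as whether `>=4.0` and `>=4` should count as equivalent.

```python
def specifier_clauses(spec: str) -> frozenset[str]:
    """Split a comma-separated specifier string into whitespace-free clauses."""
    return frozenset(
        "".join(clause.split()) for clause in spec.split(",") if clause.strip()
    )


def same_clauses(left: str, right: str) -> bool:
    """True when both strings contain the same clauses, in any order."""
    return specifier_clauses(left) == specifier_clauses(right)


# Hypothetical values: the same constraint written in two different orders.
sdist_value = ">=4, <5"
wheel_value = "<5,>=4"

print(sdist_value == wheel_value)               # False: the strings differ
print(same_clauses(sdist_value, wheel_value))   # True: same clauses, different order
```

Under this simplified notion, reordering and whitespace changes are invisible, while adding or dropping a clause is not.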
@konstin, this seems like a reasonable clarification to me but I don't know what is the process when making edits to a specification. I think maybe it should be discussed in the "Packaging" category of |
This is a clarification on what is already happening in practice (like e.g. #1550), since |
It's technically up to @pfmoore whether we need to have a discussion or not. |
For the example of specifiers I agree [1] that this is the intended meaning, and the proposed change is just a textual clarification, which can be done by a PR. For the So while the PR as it stands is OK, I'd like it to add a note that makes it clear that the equivalence rule does not imply permission to alter user-supplied values in a way that could be visible to the user (with the project name given as an explicit example of this). Footnotes
|
That's a really interesting perspective, because the inverse is how we initially came to recognize this as a problem: In uv, we were preserving the order of specifiers exactly because we didn't want to alter user-provided values. We were then seeing differences in the lockfile (which serializes the requirements of workspace packages so we can check lockfile freshness) when this information was read directly from In uv, we are also currently normalizing all package and extra names (though we currently don't have a build backend), since we found that these were often inconsistent, even for a single wheel or project. In the resolver, we couldn't determine a source for a canonical name, so we chose the normalized name instead. Effectively, our implementation is assuming the inverse: You can normalize the package name, but we were treating the specifier order as user input. For context, I have a PR up normalizing the specifier order (astral-sh/uv#6333), and my intent is to upstream this concept. If we want to preserve certain characteristics (e.g. the spelling of the project name), should we annotate this for the applicable core metadata fields? |
OK, I've thought some more about this and changed my mind. There appears to be a lot more subtlety here than I originally thought, and I'm not comfortable with simply changing "match" to "semantically equivalent". I'm happy for now to explore the issue further here, rather than going to a full community debate and possibly a PEP, but I'm not ruling that out at this point.
First of all, what is If I had to clarify "match", I'd probably say that it means "tools consuming the data can use data from either source without fear of errors, and tools that produce the data must write the data so that the previous assumption holds". But I'm not sure that's better than simply saying "match" - it's just longer.
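One way to read the "either source" interpretation above is as a check that every sdist field not listed under `Dynamic` has a matching value in the wheel, with the notion of "matching" left pluggable. The sketch below is an assumption-laden illustration, not any tool's actual behaviour: `mismatched_fields` and `loose` are hypothetical helpers, the metadata strings are made up, multi-use fields are compared pairwise in order, and the `Dynamic` field itself is skipped as a simplification.

```python
import operator
import re
from email import message_from_string
from typing import Callable


def mismatched_fields(
    sdist_text: str,
    wheel_text: str,
    equivalent: Callable[[str, str], bool] = operator.eq,
) -> list[str]:
    """Return non-Dynamic sdist fields whose wheel values do not match."""
    sdist = message_from_string(sdist_text)
    wheel = message_from_string(wheel_text)
    dynamic = {name.lower() for name in sdist.get_all("Dynamic", [])}
    mismatches = []
    for field in sorted({key.lower() for key in sdist.keys()}):
        if field == "dynamic" or field in dynamic:
            continue  # Dynamic fields may legitimately differ in the wheel
        sdist_values = sdist.get_all(field, [])
        wheel_values = wheel.get_all(field, [])
        if len(sdist_values) != len(wheel_values) or not all(
            equivalent(a, b) for a, b in zip(sdist_values, wheel_values)
        ):
            mismatches.append(field)
    return mismatches


def loose(a: str, b: str) -> bool:
    """Hypothetical comparator: same leading name, same clauses in any order."""
    def parts(value: str):
        match = re.match(r"\s*([A-Za-z0-9._-]+)\s*(.*)", value)
        name, rest = match.groups() if match else (value, "")
        clauses = frozenset("".join(c.split()) for c in rest.split(",") if c.strip())
        return name, clauses
    return parts(a) == parts(b)


# Made-up metadata: identical except for specifier order and a Dynamic field.
SDIST = "Metadata-Version: 2.2\nName: albatross\nVersion: 0.1.0\nRequires-Dist: tqdm>=4,<5\nDynamic: License\n"
WHEEL = "Metadata-Version: 2.2\nName: albatross\nVersion: 0.1.0\nRequires-Dist: tqdm<5,>=4\nLicense: MIT\n"

print(mismatched_fields(SDIST, WHEEL))         # ['requires-dist'] under string equality
print(mismatched_fields(SDIST, WHEEL, loose))  # [] under the looser comparator
```

The point of the `equivalent` parameter is that the disagreement in this thread is exactly about which comparator the word "match" implies.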
Of course you can normalise the package name for the purposes of the resolver, but you also need to retain the unnormalised name, because that's what you have to use when displaying the package name to the user. If you're displaying normalised names to the user, and you haven't received bug reports yet, let me find the users who complained at pip for doing precisely that, and ask them to express their unhappiness to you as well 🙂
Personally, I think that might cause more disruption than it warrants. This is the first time I've seen anyone suggest that it's an issue, and I'd like to understand why it's causing you a problem before we start trying to "solve" it. Footnotes
|
I just realized there's a slight misunderstanding here:
I've seen projects store the original name as the "display name" and/or the normalized name as, well, the "normalized name" if they wanted to avoid normalizing at file input time.
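The "display name" plus "normalized name" approach can be sketched as follows. The normalization rule is the one from the PyPA name normalization specification (runs of `-`, `_` and `.` collapsed to a single `-`, then lowercased); the `ProjectName` class is a hypothetical illustration and not taken from pip, uv, or any other tool.

```python
import re
from dataclasses import dataclass


def normalize_name(name: str) -> str:
    """Normalize a project name: collapse runs of -, _ and . then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class ProjectName:
    """Keeps the spelling the user wrote, and derives the comparison key."""

    display: str

    @property
    def normalized(self) -> str:
        return normalize_name(self.display)

    def same_project(self, other: "ProjectName") -> bool:
        # Compare on the normalized form only; display spelling is irrelevant here.
        return self.normalized == other.normalized


written = ProjectName("Bird_Feeder")
print(written.display)      # Bird_Feeder (what is shown to the user)
print(written.normalized)   # bird-feeder (what a resolver would compare)
print(written.same_project(ProjectName("bird.feeder")))  # True
```

Keeping both forms means the resolver can compare names reliably while messages and written metadata still show the user's own spelling.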
I'm going to guess it's about what gets written out to |
Please share! We haven't received any reports, but we always try to learn from other projects' experiences. So far, we're showing the name in all sorts of messages and serializing the name into the lockfile (which needs to be user-readable, for auditing and PR reviews). Relatively early in development we switched to normalized names after seeing that the spelling of the name in the wheel filename, in For reference, here's an excerpt from a

```toml
[[package]]
name = "albatross"
version = "0.1.0"
source = { editable = "." }
dependencies = [
    { name = "bird-feeder" },
    { name = "tqdm" },
]

[package.metadata]
requires-dist = [
    { name = "bird-feeder", editable = "packages/bird-feeder" },
    { name = "tqdm", specifier = ">=4,<5" },
]

[[package]]
name = "anyio"
version = "4.4.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "idna" },
    { name = "sniffio" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e6/e3/c4c8d473d6780ef1853d630d581f70d655b4f8d7553c6997958c283039a2/anyio-4.4.0.tar.gz", hash = "sha256:5aadc6a1bbb7cdb0bede386cac5e2940f5e2ff3aa20277e991cf028e0585ce94", size = 163930 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/7b/a2/10639a79341f6c019dedc95bd48a4928eed9f1d1197f4c04f546fc7ae0ff/anyio-4.4.0-py3-none-any.whl", hash = "sha256:c1b2d8f46a8a812513012e1107cb0e68c17159a7a594208005a57dc776e1bdc7", size = 86780 },
]
```
|
I'll be honest, I'm not sure I can. We've been using the project name unnormalised for years now, and have treated cases where we've accidentally displayed normalised names as bugs for as long, so any reports would be from a long time ago. I did a search on the tracker, and couldn't find anything specific, but it's difficult to think of a good search term. So I may well be misremembering the history here. There's a lot of tangential discussion of issues related to name normalisation on Discourse. One that I found is https://discuss.python.org/t/change-in-pypi-upload-behavior-intentional-accidental-pebkac/27707. It's mostly about normalising the project name in wheel filenames and It's possible that I'm either mistaken in how much importance I'm putting on this, or flat-out spreading FUD based on what I recall of old discussions. So feel free to ignore me on this point. None of this is relevant to this issue, though. I'd still like to understand why the precise wording matters to you. I clarified my intended meaning of "match" above:
and that remains my position. If this is about building a backend, then I'm not the person to advise you as I have no experience with that. I'd suggest asking other backend developers what their view is, as no-one else seems to have hit an issue with the wording as it stands. If it turns out we do need a more precise wording, I'd like to see what's being proposed before saying whether it needs a PEP - but the change in this PR to say "semantically equivalent" is no better in practice than the current form, because we don't have a precise definition of what "semantic equivalence" means (both in terms of the complexities of specifier equivalence, and in terms of the question of whether the display form of a project name counts as "semantically significant"). |
My intention with this change was to clarify that the strings don't need to be identical between
We can totally use this as clarification for what "match" means. For a tool that does this kind of transformation, it's important to know which changes are safe to make without fear of errors, and which changes could lead to an error. I think we agree that, say, removing whitespace in version specifiers (like |
But that's got nothing to do with the core metadata
I think this is pretty explicit that the data provided by the user must be copied unchanged. |
📚 Documentation preview 📚: https://python-packaging-user-guide--1594.org.readthedocs.build/en/1594/