Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyPI JSON and "Simple" API instead of deprecated XML-RPC API #201

Merged
merged 4 commits into from
Jul 9, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
File renamed without changes.
55 changes: 6 additions & 49 deletions stdeb/downloader.py
Original file line number Diff line number Diff line change
@@ -1,18 +1,11 @@
from __future__ import print_function
import os
try:
# Python 2.x
import xmlrpclib
except ImportError:
# Python 3.x
import xmlrpc.client as xmlrpclib
from functools import partial
import requests
import hashlib
import warnings
import stdeb
from stdeb.transport import RequestsTransport
import time
from stdeb.pypi_simple import PyPIClient

myprint = print

Expand All @@ -22,11 +15,7 @@

def find_tar_gz(package_name, pypi_url='https://pypi.org',
verbose=0, release=None):
transport = RequestsTransport()
transport.user_agent = USER_AGENT
if pypi_url.startswith('https://'):
transport.use_https = True
pypi = xmlrpclib.ServerProxy(pypi_url, transport=transport)
pypi_client = PyPIClient(pypi_url, USER_AGENT)

download_url = None
expected_md5_digest = None
Expand All @@ -35,8 +24,7 @@ def find_tar_gz(package_name, pypi_url='https://pypi.org',
myprint('querying PyPI (%s) for package name "%s"' % (pypi_url,
package_name))

show_hidden = True
all_releases = _call(pypi.package_releases, package_name, show_hidden)
all_releases = pypi_client.release_versions(package_name)
if release is not None:
# A specific release is requested.
if verbose >= 2:
Expand All @@ -49,51 +37,20 @@ def find_tar_gz(package_name, pypi_url='https://pypi.org',
'releases %r' % (release, all_releases))
version = release
else:
default_releases = _call(pypi.package_releases, package_name)
if len(default_releases) != 1:
raise RuntimeError('Expected one and only one release. '
'Non-hidden: %r. All: %r' %
(default_releases, all_releases))
default_release = default_releases[0]
default_release = pypi_client.release_version(package_name)
if verbose >= 2:
myprint(
'found default release: %s' % (', '.join(default_releases),))
'found default release: %s' % (', '.join(default_release),))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a long-lived typo and the intent was always to log the single default release found.


version = default_release

urls = _call(pypi.release_urls, package_name, version)
for url in urls:
if url['packagetype'] == 'sdist':
assert url['python_version'] == 'source', \
'how can an sdist not be a source?'
if url['url'].endswith(('.tar.gz', '.zip')):
download_url = url['url']
if 'md5_digest' in url:
expected_md5_digest = url['md5_digest']
break
download_url, expected_md5_digest = pypi_client.download_url(package_name, version)

if download_url is None:
# PyPI doesn't have package. Is download URL provided?
result = _call(pypi.release_data, package_name, version)
if result['download_url'] != 'UNKNOWN':
download_url = result['download_url']
# no download URL provided, see if PyPI itself has download
urls = _call(pypi.release_urls, result['name'], result['version'])
Comment on lines -75 to -81
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behavior for which I could find no sensible replacement in the current PyPI APIs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think the concept carries over to the new API.

if download_url is None:
raise ValueError('no package "%s" was found' % package_name)
return download_url, expected_md5_digest


def _call(callable_, *args, **kwargs):
try:
return callable_(*args, **kwargs)
except xmlrpclib.Fault as e:
if not e.faultString.startswith('HTTPTooManyRequests'):
raise
time.sleep(1) # try again after rate limit
return callable_(*args, **kwargs)


def md5sum(filename):
# from
# http://stackoverflow.com/questions/7829499/using-hashlib-to-compute-md5-digest-of-a-file-in-python-3
Expand Down
48 changes: 48 additions & 0 deletions stdeb/pypi_simple.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
import re
import requests
import stdeb

USER_AGENT = "pypi-install/%s ( https://github.com/astraw/stdeb )" % \
stdeb.__version__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the only current use case passes in an explicit USER_AGENT anyway, it might be good to remove the duplicated string here and make the default behavior not specify a user agent at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to make this change but I agree with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 0787a84 I ended up importing the USER_AGENT constant from the downloader. I think adding a download method to the pypi client would also be another way to push this down. But this cuts the duplication although it keeps the default user agent.



def normalize_package_name(package_name):
return re.sub(r"[-_.]+", "-", package_name).lower()


class PyPIClient(object):
def __init__(self, pypi_url="https://pypi.org", user_agent=USER_AGENT):
self.pypi_url = pypi_url
self._http_session = requests.Session()
self._http_session.headers["User-Agent"] = user_agent

def release_version(self, package_name):
package_name = normalize_package_name(package_name)
response = self._http_session.get("%s/pypi/%s/json" % (self.pypi_url, package_name))
data = response.json()
return data["info"]["version"]

def release_versions(self, package_name):
package_name = normalize_package_name(package_name)
response = self._http_session.get(
"%s/simple/%s" % (self.pypi_url, package_name),
headers={"Accept": "application/vnd.pypi.simple.latest+json"})
data = response.json()
return data["versions"]

def download_url(self, package_name, version):
download_url = None
md5_digest = None
package_name = normalize_package_name(package_name)
response = self._http_session.get(
"%s/pypi/%s/%s/json" % (self.pypi_url, package_name, version))
data = response.json()
for url in data["urls"]:
if url["packagetype"] == "sdist" and \
url["python_version"] == "source" and \
url["url"].endswith((".tar.gz", ".zip")):
download_url = url["url"]
md5_digest = url["digests"]["md5"]
Comment on lines +38 to +42
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original algorithm has a break here, meaning that the first encountered match is used. This continues to loop, meaning that the last match is used. Is that an intentional deviation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! No it isn't. I don't know that there would be a situation with multiple sdist payloads for a given version so I think the break should just be a minor optimization, but it is worth preserving the behavior. In the event that there are multiple sdist urls I'm not sure that the sorting from the server is reliable so we just have to trust that the first one we get is good enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored the break in 86d8edc

break

return download_url, md5_digest