TL;DR
This blog post dives into the Python PyPI repository metadata format. We’ll cover the metadata files that make up a PyPI repository, what those files mean, and how you can inspect the metadata yourself.
What is a Python PyPI repository?
A Python PyPI repository is a collection of Python packages and metadata that is readable by the pip command line tool. Most Python programmers will be familiar with running pip install to install Python packages.
Python PyPI root repository metadata
Python PyPI repository metadata is a set of two simple HTML documents describing the available Python packages and the versions of each package.
The first metadata file is located at the /simple/ endpoint of a PyPI repository URL. This document contains an API version number and one HTML link for each available package in the repository.
For example, a PyPI repository on packagecloud would have its main metadata available at https://packagecloud.io/joe/hi/pypi/simple/.
You can use curl to download the metadata and examine it, for example by running curl against the URL above.
This repository has a single package called packagecloud-test. Its metadata contains an HTML link pointing to a document that lists the available versions of the packagecloud-test package.
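To make the shape of the root document concrete, here is a minimal sketch that pulls the package links out of a /simple/ page using only the Python standard library. The sample HTML is hypothetical (it is not captured from a real server), and the names SimpleIndexParser and list_packages are ours, chosen for illustration.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect (link text, href) pairs from a /simple/ index page."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (name, href) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def list_packages(html):
    """Return every (name, href) link found in the given HTML string."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, written by hand for this example.
SAMPLE_INDEX = """<html>
  <head><meta name="api-version" value="2" /></head>
  <body>
    <a href="/joe/hi/pypi/simple/packagecloud-test/">packagecloud-test</a>
  </body>
</html>"""

print(list_packages(SAMPLE_INDEX))
```

In practice you would feed the function the body that curl downloaded instead of the hand-written sample.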
The metadata format is relatively simple and is documented in the Warehouse docs.
This documentation shows that the server should include a response header named X-PyPI-Last-Serial. It appears that this header was originally added to support a PyPI mirroring tool called bandersnatch.
The need for this header was later removed from bandersnatch. As far as we can tell, it is not necessary for any other application, but is probably necessary for backward compatibility with older versions of bandersnatch.
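If you want to check whether a repository sends this header, a small sketch like the following can help. It assumes the header value is an integer serial, which this post has not confirmed, and the helper names last_serial and fetch_last_serial are hypothetical. The demonstration at the bottom runs offline against a stand-in headers object with a made-up value.

```python
import urllib.request
from email.message import Message
from typing import Optional

HEADER_NAME = "X-PyPI-Last-Serial"


def last_serial(headers) -> Optional[int]:
    """Return the serial from a headers object, or None if absent or malformed.

    Assumption: the header carries an integer value.
    """
    value = headers.get(HEADER_NAME)
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


def fetch_last_serial(url: str) -> Optional[int]:
    """Request `url` and read the header from the live response (needs network)."""
    with urllib.request.urlopen(url) as response:
        return last_serial(response.headers)


# Offline demonstration: Message looks headers up case-insensitively,
# much like the headers object on a real HTTP response.
demo = Message()
demo["x-pypi-last-serial"] = "12345"  # made-up value
print(last_serial(demo))
```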
Python PyPI package metadata
Individual Python package metadata can also be retrieved by following the links in the root metadata or by constructing the metadata URL manually. Package metadata is located at the URL endpoint /simple/packagename.
Following our previous example, we can request the metadata for the Python package packagecloud-test by running curl against https://packagecloud.io/joe/hi/pypi/simple/packagecloud-test.
The metadata for individual packages is relatively straightforward as well. It contains one HTML link for each available version of the package. The link to the package can also include a checksum that clients like pip can verify. According to the documentation, the supported checksum algorithms are: md5, sha1, sha224, sha256, sha384, and sha512.
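As a rough illustration of how a client might check such a checksum, the sketch below assumes the checksum is carried in the link's URL fragment in the form #algorithm=hexdigest. That form is our assumption about the link layout, and parse_checksum and verify are hypothetical helper names. The file name and payload are made up for the example.

```python
import hashlib
from urllib.parse import urlsplit

# Algorithms listed in the documentation, per the paragraph above.
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def parse_checksum(href):
    """Return (algorithm, hex digest) from a link fragment, or None.

    Assumption: the fragment looks like "sha256=<hexdigest>".
    """
    fragment = urlsplit(href).fragment
    algorithm, separator, digest = fragment.partition("=")
    if not separator or not digest or algorithm.lower() not in SUPPORTED:
        return None
    return algorithm.lower(), digest.lower()


def verify(data, href):
    """Return True only if `data` hashes to the digest named in `href`."""
    parsed = parse_checksum(href)
    if parsed is None:
        return False  # no usable checksum in the link
    algorithm, expected = parsed
    return hashlib.new(algorithm, data).hexdigest() == expected


# Made-up payload and link, standing in for a downloaded package file.
payload = b"example package contents"
link = "packagecloud-test-1.0.tar.gz#sha256=" + hashlib.sha256(payload).hexdigest()

print(verify(payload, link))
print(verify(b"tampered contents", link))
```

Returning False when no checksum is present is a deliberately strict choice for this sketch. A real client may prefer to treat a missing checksum differently from a mismatched one.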
The rel attribute has special meaning. Per the documentation, internal refers to a URL that is a direct package link. There are other possible values for rel, which include things like a homepage for the project.
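To see how the rel values could be used when reading a package page, here is a small sketch that groups link targets by their rel attribute. The sample markup, including the homepage URL, is hypothetical, and RelLinkParser and links_by_rel are names we made up for illustration.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Group the href of every <a> tag by its rel value."""

    def __init__(self):
        super().__init__()
        self.by_rel = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens; links without a
        # rel attribute are stored under the empty string.
        rels = (attributes.get("rel") or "").split() or [""]
        for rel in rels:
            self.by_rel.setdefault(rel.lower(), []).append(href)


def links_by_rel(html):
    """Return a dict mapping each rel value to the list of matching hrefs."""
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.by_rel


# Hypothetical package page, written by hand for this example.
SAMPLE_PACKAGE_PAGE = """<html><body>
  <a href="packagecloud-test-1.0.tar.gz" rel="internal">packagecloud-test-1.0.tar.gz</a>
  <a href="https://example.com/" rel="homepage">Home page</a>
</body></html>"""

grouped = links_by_rel(SAMPLE_PACKAGE_PAGE)
print(grouped.get("internal", []))
```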
This page also includes the X-PyPI-Last-Serial header.
An important point to note when examining PyPI metadata: package names in metadata URLs are case insensitive, and hyphens and underscores should be treated as interchangeable.
This means that requests for https://packagecloud.io/joe/hi/pypi/simple/packagecloud-test and https://packagecloud.io/joe/hi/pypi/simple/packagecloud-TEST should return the same metadata.
This and other important information about Python package metadata is documented in PEP 0426.
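The name-matching rule described above can be sketched in a few lines. This version implements only what this post describes (case folding, plus treating hyphens and underscores as the same character). Picking lowercase with hyphens as the canonical form is our assumption, and normalize, same_package, and metadata_url are hypothetical helper names.

```python
def normalize(name):
    """Fold case and map underscores to hyphens.

    Assumption: lowercase-with-hyphens is used as the canonical spelling.
    """
    return name.lower().replace("_", "-")


def same_package(first, second):
    """Return True if two spellings refer to the same package name."""
    return normalize(first) == normalize(second)


def metadata_url(simple_base, name):
    """Build a /simple/packagename URL from a repository's /simple/ base URL."""
    return simple_base.rstrip("/") + "/" + normalize(name)


print(same_package("packagecloud-test", "packagecloud-TEST"))
print(metadata_url("https://packagecloud.io/joe/hi/pypi/simple/", "Packagecloud_Test"))
```

A repository server could apply the same normalization to incoming request paths before looking up a package, so that every spelling resolves to one metadata document.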
Conclusion
Python PyPI metadata consists of a set of two HTML documents. The HTML documents describe the packages available, the versions of those packages, and a URL for downloading each of the package files.
Python PyPI metadata can be manually retrieved and examined on the command line using curl. This is useful if you need to debug some sort of issue with your repository or are curious about the inner workings of Python PyPI repositories.