TL;DR
The APT repository metadata format is inherently racy.
This bug makes it impossible to guarantee that:
- Frequently updated APT repositories will remain consistent for users
- Mirrors of APT repositories will be consistent
A new feature has been added to APT 1.2.0 and newer to prevent this race condition. Support for this feature has been added to packagecloud.io for all APT repositories. No action is required by our users to take advantage of this new APT feature; it is enabled automatically for everyone.
This bug is largely responsible for the infamous “Hash sum mismatch” error seen by thousands of Ubuntu and Debian users. We previously wrote about another cause of this error, as well.
At the time of this writing, there is no support for this feature in reprepro, aptly, or any other commercially available tool that we could find.
Hash sum mismatch
Many Ubuntu and Debian users have been faced with the cryptic “Hash sum mismatch” error message while running apt-get update. To understand this error message, you must first have a general understanding of APT repository metadata.
APT repository metadata
Several files make up an APT repository. Two of them are central to this bug:
- The Release (or InRelease) file.
- The Packages file.
The Packages file contains a list of every package in the repository for a given CPU architecture. There are separate Packages files for each supported CPU architecture. For example, a repository that has packages for amd64 and i386 CPUs will have two Packages files.
The Release file contains a list of all the available Packages files and their checksums (or hash sums). When you run apt-get update, APT will attempt to verify that the checksum of the downloaded Packages file matches the checksum listed for that file in the Release file.
If these checksums mismatch, a hash sum mismatch error is generated.
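To make the check concrete, here is a minimal sketch of the comparison in Python. It is not APT's actual implementation. It assumes the Release file has a SHA256 section whose indented lines each hold a checksum, a size, and a path; the function names and the sample data are hypothetical.

```python
import hashlib


def parse_release_sha256(release_text):
    """Return {path: (sha256_hex, size)} from the SHA256 section of a Release file."""
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):
                # A non-indented line starts the next field and ends the section.
                in_section = False
                continue
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
    return entries


def verify_index(release_text, path, data):
    """Return True if data matches the size and SHA256 recorded for path."""
    expected = parse_release_sha256(release_text).get(path)
    if expected is None:
        raise KeyError(f"{path} is not listed in the Release file")
    digest, size = expected
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


# Hypothetical sample data: a tiny Packages file and a Release file that lists it.
packages = b"Package: hello\nVersion: 1.0\n"
release = (
    "Suite: stable\n"
    "SHA256:\n"
    f" {hashlib.sha256(packages).hexdigest()} {len(packages)} main/binary-amd64/Packages\n"
)

print(verify_index(release, "main/binary-amd64/Packages", packages))
print(verify_index(release, "main/binary-amd64/Packages", packages + b"stale"))
```

The second call is given content that differs from what the Release file describes, which is the situation APT reports as a hash sum mismatch.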
How does Hash sum mismatch happen?
There are at least 3 ways this can happen for most Ubuntu and Debian based systems today:
- Stale metadata cached between the client and server. This is unlikely in most cases and not possible if SSL is used.
- The metadata does not match because of a bug during the extraction of the metadata.
- The repository is being updated while an apt-get update is run, or apt has cached a stale Release file.
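The third case, the race itself, can be illustrated with a toy model. The sketch below is a simulation under simplifying assumptions, not a description of APT internals: the repository is an in-memory object, the Release file is reduced to a single checksum, and the names ToyRepo and toy_update are hypothetical.

```python
import hashlib


class ToyRepo:
    """In-memory stand-in for a repository with one Release and one Packages file."""

    def __init__(self, packages):
        self.publish(packages)

    def publish(self, packages):
        # Both files are replaced in place, so the old Packages content is lost.
        self.packages = packages
        self.release_sha256 = hashlib.sha256(packages).hexdigest()


def toy_update(repo, publish_between=None):
    """Mimic the two-step fetch: Release first, then Packages by file name."""
    expected = repo.release_sha256           # step 1: fetch Release
    if publish_between is not None:
        repo.publish(publish_between)        # the repository changes mid-update
    data = repo.packages                     # step 2: fetch Packages
    if hashlib.sha256(data).hexdigest() != expected:
        return "Hash sum mismatch"
    return "ok"


print(toy_update(ToyRepo(b"packages v1")))
print(toy_update(ToyRepo(b"packages v1"), publish_between=b"packages v2"))
```

In the second call the client holds a checksum for the old Packages file but downloads the new one, so the comparison fails even though nothing is corrupt.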
Users can avoid all 3 cases by:
- Using SSL.
- Disabling XZ compressed metadata, or ensuring a newer version of APT is used.
- Using the new Acquire-by-hash feature available in APT 1.2.0.
Fixing APT Hash sum mismatch
Users typically “fix” this issue by running apt-get clean and manually cleaning the APT directory /var/lib/apt/lists/ (which is not cleaned by apt-get clean). The real solution is to take advantage of a new feature of APT repositories: Acquire-by-hash.
Acquire-by-hash
Repository servers can set Acquire-by-hash to “Yes” in their Release/InRelease file. This indicates that APT clients can download the Packages files by issuing a request against a URL with the hash sum of the file instead of the file name. Thus, as long as the server retains enough copies of older metadata, the client will always request a file with a correct checksum.
Acquire-by-hash defaults to “No” for backward compatibility.
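The following sketch shows why fetching by hash removes the race. It is a toy model rather than a real server: it assumes the common layout in which the hashed copy sits in a by-hash/SHA256/ directory next to the index file, and the names by_hash_path, ByHashRepo, and update_by_hash are hypothetical.

```python
import hashlib
import posixpath


def by_hash_path(index_path, sha256_hex):
    """Map an index path such as main/binary-amd64/Packages to its by-hash location."""
    directory = posixpath.dirname(index_path)
    return posixpath.join(directory, "by-hash", "SHA256", sha256_hex)


class ByHashRepo:
    """Toy repository that keeps every published index under its hash."""

    def __init__(self):
        self.files = {}     # by-hash path -> content; older entries are retained
        self.release = {}   # index path -> current SHA256, as a Release file would list

    def publish(self, index_path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.files[by_hash_path(index_path, digest)] = data
        self.release[index_path] = digest


def update_by_hash(repo, index_path, publish_between=None):
    """Fetch Release, then fetch the index by the hash that Release named."""
    expected = repo.release[index_path]
    if publish_between is not None:
        repo.publish(index_path, publish_between)   # repository changes mid-update
    data = repo.files[by_hash_path(index_path, expected)]
    return hashlib.sha256(data).hexdigest() == expected


repo = ByHashRepo()
repo.publish("main/binary-amd64/Packages", b"packages v1")
print(by_hash_path("main/binary-amd64/Packages", "0123abcd"))
print(update_by_hash(repo, "main/binary-amd64/Packages", publish_between=b"packages v2"))
```

Because the old copy is still stored under its own hash, the client receives exactly the file its Release checksum describes, even though the repository was updated between the two requests.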
Availability of the fix
On the client side: you must use APT 1.2.0 or newer. If you are using Ubuntu Xenial (16.04) or Debian Stretch, or anything newer, you will be running a version of APT that supports this feature.
We’ve backported this APT version to Ubuntu 12.04 (Ubuntu Precise) and Ubuntu 14.04 (Ubuntu Trusty). Learn more here.
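If you want to script the client-side check, a rough sketch follows. It assumes the first line of the version banner printed by apt-get --version looks like "apt 1.2.10 (amd64)", and it compares only the leading numeric components rather than implementing full Debian version ordering. The function names are hypothetical.

```python
import re

MIN_VERSION = (1, 2, 0)


def parse_apt_version(banner):
    """Extract the leading numeric components from a banner like 'apt 1.2.10 (amd64)'."""
    match = re.search(r"\bapt\s+(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError(f"unrecognized version banner: {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_by_hash(banner):
    """Return True if the banner's version is at least 1.2.0."""
    version = parse_apt_version(banner)
    # Pad short versions such as (1, 2) so the tuple comparison is well defined.
    padded = version + (0,) * (len(MIN_VERSION) - len(version))
    return padded >= MIN_VERSION


print(supports_by_hash("apt 1.2.10 (amd64)"))
print(supports_by_hash("apt 1.0.1ubuntu2 for amd64"))
```

Distribution suffixes such as "ubuntu2" are ignored by this sketch, so treat the result as a quick hint rather than an authoritative answer.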
On the server side: the official Ubuntu 16.04 and Debian Stretch APT repositories support Acquire-by-hash. Earlier versions do not, so running a more recent APT on your Ubuntu 12.04 systems won’t help as far as the official Ubuntu mirrors are concerned.
All repositories on packagecloud.io support this feature, so users of our service simply need to ensure they are running a recent enough version of APT. At the time this article was written, no open source project that we could find supported this new feature.
Conclusion
Acquire-by-hash is a welcome addition to the suite of APT features. It ensures that internally consistent APT repository metadata is available for clients and helps to eliminate one of the most cryptic and difficult-to-debug errors facing APT users: Hash sum mismatch.