
yum and createrepo generate incorrect metadata

TL;DR

createrepo 0.9.9 uses a library called rpmUtils, which is provided by yum. This library contains code that parses version strings from RPM packages. A bug in this code causes rpmUtils to output incorrect version and release strings, which in turn causes createrepo to generate incorrect metadata for several packages.

CentOS 6 and CentOS 7 use createrepo to generate metadata for the official YUM package repository, and because of this, the official metadata has incorrect version and release strings for several packages.

RPM file format

RPM is a binary file format containing a few sections of metadata followed by a compressed CPIO archive. The RPM header contains an index structure which consists of 16-byte entries. An index entry contains a tag value, tag type, data offset, and a count.

Some useful tag values include name, version, release, architecture, and so on.
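As a rough illustration, the Python sketch below decodes one 16-byte index entry with the standard struct module. It assumes the four fields are stored as big-endian (network byte order) 32-bit integers in the order listed above. The tag numbers are the values librpm assigns in its rpmtag.h header, and the helper name parse_index_entry is hypothetical.

```python
import struct

# Tag numbers for a few common header tags (from librpm's rpmtag.h).
RPMTAG_NAME = 1000
RPMTAG_VERSION = 1001
RPMTAG_RELEASE = 1002
RPMTAG_EPOCH = 1003
RPMTAG_ARCH = 1022

# Tag type used for plain strings such as the version and release.
RPM_STRING_TYPE = 6

# Four big-endian signed 32-bit integers: tag, type, offset, count (16 bytes).
INDEX_ENTRY = struct.Struct(">iiii")


def parse_index_entry(buf, pos=0):
    """Decode the 16-byte index entry starting at byte `pos` of `buf`."""
    tag, tag_type, offset, count = INDEX_ENTRY.unpack_from(buf, pos)
    return {"tag": tag, "type": tag_type, "offset": offset, "count": count}


# A hand-built entry: a string-typed RPMTAG_VERSION whose data sits at offset 12.
raw = struct.pack(">iiii", RPMTAG_VERSION, RPM_STRING_TYPE, 12, 1)
print(parse_index_entry(raw))
```

The offset is relative to the header's data store, which follows the index; reading the actual version string would mean seeking to that offset and reading up to the terminating NUL byte.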

The librpm library can be used to work with RPM files; it allows programs to read and extract RPM file information, among other operations. The rpm command-line program is built on librpm.

For example, it’s possible to dump all known tags on the command line by running: rpm --querytags


RPM EVR strings

RPM versions are fully expressed as strings of the form epoch:version-release. This is where the acronym EVR comes from: Epoch, Version, Release. Throughout the librpm source, these strings are referred to as EVRs.

An epoch is a special version number that the person writing the RPM packaging information can set when librpm cannot correctly order the software's version numbers on its own (for example, after upstream changes its versioning scheme). In most cases, the packaging author does not set this value.


Official algorithm for parsing EVRs

The official algorithm for parsing EVR strings can be found in librpm. The exact file and line number vary between versions of the source, but in version 4.12.0.1 (available here: http://www.rpm.org/wiki/Releases/4.12.0.1) it is a function named parseEVR at line 949 of ./lib/rpmds.c.

The important take-away from this algorithm is that the release string begins with the character following the last hyphen in the EVR string, which parseEVR locates with the C function strrchr.

For example, the string: 7.4.160-1 is parsed by librpm into a version of 7.4.160 and a release of 1.
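To make the rule concrete, here is a small Python port of the splitting logic. It is an illustrative sketch, not librpm code, and the name parse_evr is hypothetical. Beyond the last-hyphen rule described above, it assumes parseEVR's handling of the epoch: leading digits count as an epoch only when a colon follows them.

```python
def parse_evr(evr):
    """Split an EVR string on the LAST hyphen, mirroring librpm's parseEVR.

    Returns (epoch, version, release); epoch and release are None when absent.
    """
    # Skip leading digits: they form the epoch only if a ':' follows them.
    i = 0
    while i < len(evr) and evr[i] in "0123456789":
        i += 1
    # Equivalent of strrchr(s, '-'): the last hyphen ends the version.
    se = evr.rfind("-", i)
    if i < len(evr) and evr[i] == ":":
        epoch = evr[:i] or "0"  # an empty epoch (":1.0-1") becomes "0"
        vstart = i + 1
    else:
        epoch = None  # no epoch present
        vstart = 0
    if se != -1:
        return epoch, evr[vstart:se], evr[se + 1:]
    return epoch, evr[vstart:], None


print(parse_evr("7.4.160-1"))    # (None, '7.4.160', '1')
print(parse_evr("1.0-alpha-2"))  # (None, '1.0-alpha', '2')
```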

I assume the source provided by librpm contains the official parsing algorithm since librpm is used to power rpm and the other command line tools that actually install, remove, and deal with rpm files once they are placed on the target system.


Buggy code in createrepo, yum, and rpmUtils

You can clone the yum source from git by following the instructions here: http://yum.baseurl.org/.

yum contains a directory named rpmUtils which is installed as a python library package when yum is installed. In the rpmUtils library, a function named stringToVersion around line 391 attempts to parse an EVR string and output the version, release, and epoch.

Tragically, this code locates the start of the release string using Python's find function (described here: https://docs.python.org/2/library/string.html#string.find), which returns the index of the first occurrence of a substring.

As a result, the release string starts with the first character following the first hyphen, not the last.
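For comparison, the Python sketch below models the first-hyphen behavior. It is a simplified, hypothetical function (split_first_hyphen), not the verbatim rpmUtils or createrepo source; it only reproduces how find picks the first hyphen. The epoch defaults to "0" here, matching the epoch="0" attribute in the createrepo metadata shown later in this post.

```python
def split_first_hyphen(evr):
    """Simplified model of a first-hyphen EVR split (hypothetical helper).

    Returns (epoch, version, release); release is None if there is no hyphen.
    """
    i = evr.find(":")  # -1 when there is no epoch
    epoch = evr[:i] if i != -1 else "0"
    j = evr.find("-")  # index of the FIRST hyphen
    if j == -1:
        return epoch, evr[i + 1:], None
    return epoch, evr[i + 1:j], evr[j + 1:]


print(split_first_hyphen("7.4.160-1"))    # ('0', '7.4.160', '1')
print(split_first_hyphen("1.0-alpha-2"))  # ('0', '1.0', 'alpha-2')
```

With a single hyphen the version and release match librpm's; with two hyphens, the release absorbs everything after the first one.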

createrepo uses rpmUtils for generating metadata, but it also duplicates this code when dealing with deltarpms.

You can clone the createrepo source from git by following the instructions here: http://createrepo.baseurl.org/.

createrepo 0.9.9 contains code to parse EVR strings in a method named _stringToVersion, around line 70 of ./createrepo/deltarpms.py.


Consequences

The result of the yum and createrepo algorithms for most packages is equivalent to the result of the librpm algorithm.

However, for a few packages createrepo generates different version and release strings than librpm does. Since rpm and many other command-line tools are built on librpm, this mismatch can lead to subtle bugs in version comparison that affect upgrades, downgrades, and package installation.
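Which strings are affected? Assuming well-formed EVRs, where any epoch before the colon consists only of digits, every hyphen lies in the version-release part, so the first and last hyphen are the same character exactly when there is at most one hyphen. A quick Python check along those lines (both helper names are hypothetical):

```python
def splits_differ(evr):
    """True if first-hyphen and last-hyphen splitting disagree for `evr`.

    Assumes any epoch prefix ("N:") is all digits, so every hyphen belongs to
    the version-release part of the string.
    """
    return evr.count("-") > 1


def find_affected(evrs):
    """Return the EVR strings whose version/release split is ambiguous."""
    return [evr for evr in evrs if splits_differ(evr)]


print(find_affected(["7.4.160-1", "1.0-alpha-2", "0:2.4.6-17.el7"]))
# ['1.0-alpha-2']
```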

createrepo generates metadata describing what each package “requires” so that it can be installed and what it “provides” after it has been installed. When doing so, it includes the version and release of each requirement.

An example of this bug can be seen with the package maven-repository-builder-1.0-0.5.alpha2.el7.noarch.rpm.

createrepo generates the following “provide” entry for maven-repository-builder:

<rpm:entry name="mvn(org.apache.maven.shared:maven-repository-builder)"
           flags="EQ"
           epoch="0"
           ver="1.0"
           rel="alpha-2"/>

Note that the provide entry has the EVR string “1.0-alpha-2” which was split (incorrectly) on the first hyphen.

The correct metadata that should have been generated is:

<rpm:entry name="mvn(org.apache.maven.shared:maven-repository-builder)"
           flags="EQ"
           epoch="0"
           ver="1.0-alpha"
           rel="2"/>
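The correct entry can be reproduced by splitting on the last hyphen before filling in the ver and rel attributes. The Python sketch below does this with the standard xml.etree.ElementTree module. The helper name provides_entry is hypothetical, and the sketch assumes the rpm: prefix maps to the http://linux.duke.edu/metadata/rpm namespace used in repodata primary.xml.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the rpm: prefix in repodata primary.xml.
RPM_NS = "http://linux.duke.edu/metadata/rpm"
ET.register_namespace("rpm", RPM_NS)


def provides_entry(name, evr, epoch="0"):
    """Build an EQ <rpm:entry>, splitting version/release on the LAST hyphen.

    Assumes `evr` has no epoch prefix and contains at least one hyphen.
    """
    version, _, release = evr.rpartition("-")  # last hyphen, like librpm
    return ET.Element(
        f"{{{RPM_NS}}}entry",
        {"name": name, "flags": "EQ", "epoch": epoch, "ver": version, "rel": release},
    )


entry = provides_entry(
    "mvn(org.apache.maven.shared:maven-repository-builder)", "1.0-alpha-2"
)
print(ET.tostring(entry, encoding="unicode"))
```

str.rpartition splits at the last occurrence of the separator, which mirrors the strrchr lookup in parseEVR; swapping in str.partition would reproduce the incorrect "1.0" / "alpha-2" entry instead.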

This bug is unfortunate because when yum, rpm, or other tools attempt to satisfy the dependency graph, they will use different interpretations of the version and release strings for maven-repository-builder and may be unable to find a path through the graph even though one exists.

Bugs have been filed against both CentOS and Red Hat.

Affected packages

There are 12 affected packages in the official CentOS 7 repository according to my tests:

base64coder-20101219-10.el7.noarch.rpm
httpd-2.4.6-17.el7.centos.1.x86_64.rpm
maven-repository-builder-1.0-0.5.alpha2.el7.noarch.rpm
plexus-ant-factory-1.0-0.12.a2.3.el7.noarch.rpm
plexus-bsh-factory-1.0-0.14.a7.el7.noarch.rpm
plexus-cdc-1.0-0.20.a14.el7.noarch.rpm
plexus-component-api-1.0-0.16.alpha15.el7.noarch.rpm
plexus-component-factories-pom-1.0-0.7.alpha11.el7.noarch.rpm
plexus-i18n-1.0-0.6.b10.4.el7.noarch.rpm
plexus-interactivity-1.0-0.14.alpha6.el7.noarch.rpm
plexus-mail-sender-1.0-1.a2.25.el7.noarch.rpm
plexus-resources-1.0-0.15.a7.el7.noarch.rpm


Conclusion

You should use createrepo from the createrepo_0_4_10 tag in the git source tree (SHA e9ab4444d67cd79533441e8d9b65488f423661a2), as this version has an EVR parsing algorithm identical to librpm's. This version of createrepo does not use rpmUtils and does not suffer from this bug.

Hopefully, both yum and createrepo will be modified to fix this bug and reduce code duplication in the future.

Alternatively, you can avoid tracking this (and other) bugs in packaging tools by uploading your RPMs and other packages to packagecloud.io.
