
yum repository internals

TL;DR

This blog post goes into detail on the internals of yum repositories by examining each of the index files created as part of a yum repository’s metadata. We’ll cover what each index file means and take a look at how you can inspect the metadata yourself.

What is a yum repository?

A yum repository is a collection of RPM packages with metadata that is readable by the yum command line tool. Having a yum repository allows you to perform package install, removal, upgrade, and other operations on individual packages or groups of packages.

yum repositories are essential for storing, managing, and delivering software to end systems.

 

Create a yum repository with createrepo

Before taking a closer look at the repository metadata itself, let’s show how to get a repository set up using the open source createrepo command line tool.

You can create a yum repository by using the command line tool createrepo, which you can install on CentOS and Red Hat systems by running:

$ sudo yum install createrepo

In its simplest usage, createrepo takes a single command line argument: the directory in which to write the yum repository metadata.

Assuming you have some RPM files in your current directory, you can run one command to generate the yum repository:

$ createrepo .

This will create a directory named repodata, containing the repository metadata which is described in depth below.

You can then use GPG to sign the repository metadata. Doing this assures users of your repository that you generated the metadata. This is not the same as using rpm or rpmsign, which sign individual packages rather than the repository metadata:

$ gpg --detach-sign --armor myrepo/repodata/repomd.xml

More details about using GPG with RPM packages and yum repositories can be found in a previous blog post dedicated to this topic.

If you’d like others to access your yum repository, you’ll need to set up Apache, nginx, or some other web server and point it at the base directory of the repository. It is recommended that you also obtain SSL certificates so that package data is securely transferred to end systems.

Of course, using packagecloud is a much faster and simpler solution :) with SSL, GPG, authentication, collaboration, and everything else you need ready to go!

 

yum repository metadata

yum repository metadata is structured as a series of XML files that contain checksums of other files and of the packages to which they refer.

The metadata files usually found in a yum repository are:

  • repomd.xml: Essentially an index that contains the location, checksums, and timestamp of the other XML metadata files listed below.
  • repomd.xml.asc: This file is generated only if the repository creator has signed the repomd.xml file using GPG, as shown in the above example. yum will download and verify this signature if the user has the pygpgme package installed.
  • primary.xml.gz: Contains detailed information about each package in the repository. You’ll find information like name, version, license, dependency information, timestamps, size, and more.
  • filelists.xml.gz: Contains information about every file and directory in each package in the repository.
  • other.xml.gz: Contains the changelog entries found in the RPM SPEC file for each package in the repository.

There are a handful of other files, but they aren’t widely used and for most package repositories, the above metadata files are sufficient.

Typically, repository metadata is namespaced under ‘repodata’ in yum repository URLs, and/or stored in a directory named ‘repodata’ on the repository server.

It is common practice to organize your yum repository so that packages of the same architecture are stored together. Doing this also segments your repository metadata by architecture type, reducing the amount of metadata to serve to clients and to regenerate on updates.

A typical set of URLs from packagecloud for both x86_64 and i386 repomd.xml files would be:

Most publicly accessible repositories have a similar scheme. For example, the official CentOS 7 repomd.xml files:

 

Examining and verifying yum repository metadata

You can use a series of command line tools to examine yum repository metadata, calculate checksums, and verify GPG signatures.

Let’s use the CentOS 7 repository located at https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7.

 

Examine repomd.xml first

Begin by examining the repomd.xml file with curl. The paths to the other index files and their checksums are found here.

$ curl -Ls https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/repomd.xml

(TIP: If you like seeing greater detail or need to debug, try using the curl flags -Lv for verbose mode.)
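The same inspection can be scripted. Below is a minimal Python sketch, not part of yum itself, that lists the index files described in a repomd.xml document. The helper names (local_name, list_index_files) and the trimmed-down SAMPLE document are illustrative assumptions; a real repomd.xml usually declares XML namespaces and contains more entries, which is why the sketch ignores namespace prefixes.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def list_index_files(repomd_xml):
    """Return one dict per <data> element found in repomd.xml text."""
    entries = []
    for data in ET.fromstring(repomd_xml):
        if local_name(data.tag) != "data":
            continue  # skip anything that is not an index file entry
        # Collect the child elements by their namespace-free names.
        children = {local_name(child.tag): child for child in data}
        entries.append({
            "type": data.get("type"),
            "href": children["location"].get("href"),
            "checksum": children["checksum"].text,
            "checksum_type": children["checksum"].get("type"),
        })
    return entries


# Hypothetical, trimmed-down repomd.xml used only for illustration.
SAMPLE = """<repomd>
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
    <checksum type="sha">6eb7ecc041f69a5ffeabdebcb466c443aa5e8028</checksum>
    <timestamp>1413137274</timestamp>
    <open-checksum type="sha">0b08c81e46081059cbe56d2f0871017ef8073d93</open-checksum>
  </data>
</repomd>"""

for entry in list_index_files(SAMPLE):
    print(entry["type"], entry["href"], entry["checksum"])
```

To run it against a live repository, pass in the text saved from the curl command above instead of SAMPLE.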

 

Verify GPG signature of repomd.xml

yum will automatically attempt to verify the repository GPG signature if repo_gpgcheck is set to 1 in the yum configuration, but you can also verify the signature manually if you like.

If the repository is GPG signed and you have imported the public GPG key, you can verify the signature by downloading the repomd.xml file and the repomd.xml.asc file, and then running gpg --verify:

$ curl -Ls https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/repomd.xml > repomd.xml
$ curl -Ls https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/repomd.xml.asc > repomd.xml.asc
$ gpg --verify repomd.xml.asc repomd.xml
gpg: Signature made Sun Oct 12 11:07:54 2014 PDT using RSA key ID 7AD95B3F
gpg: Good signature from "packagecloud ops (production key) <ops@packagecloud.io>"

 

Examine primary.xml.gz metadata

Next, let’s examine the primary.xml.gz metadata. As mentioned above, this file contains information about the packages in the repository.

The repomd.xml file mentions the location of this file:

<data type="primary">
  <location href="repodata/primary.xml.gz"/>
  <checksum type="sha">6eb7ecc041f69a5ffeabdebcb466c443aa5e8028</checksum>
  <timestamp>1413137274</timestamp>
  <open-checksum type="sha">0b08c81e46081059cbe56d2f0871017ef8073d93</open-checksum>
</data>

NOTE: The location of this file isn’t always so straightforward, as some repositories include the SHA or MD5 sum of the primary.xml.gz file in the file name itself.

The location is relative to the base URL of the repository, which is the directory that contains repodata.
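As a small illustration of how the fields in this data element can be interpreted, here is a hedged Python sketch. It assumes the base URL is the x86_64 repository URL used throughout this post and that timestamp is seconds since the Unix epoch; the function names are made up for this example.

```python
from datetime import datetime, timezone
from urllib.parse import urljoin

# Base URL of the x86_64 repository used in this post. The trailing slash
# matters: without it, urljoin would replace the last path segment.
BASE_URL = "https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/"


def resolve_location(base_url, href):
    """Turn a <location href="..."> value into an absolute URL."""
    return urljoin(base_url, href)


def timestamp_to_utc(timestamp):
    """Interpret a <timestamp> value as seconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)


print(resolve_location(BASE_URL, "repodata/primary.xml.gz"))
print(timestamp_to_utc("1413137274").isoformat())  # 2014-10-12T18:07:54+00:00
```

The decoded timestamp, 18:07:54 UTC on October 12, 2014, lines up with the signature time that gpg reported earlier (11:07:54 PDT).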

Let’s first check that the checksum matches what repomd.xml lists. The value listed for checksum is the SHA-1 checksum of the compressed file as it is served. The value listed for open-checksum is the checksum of the gunzipped version of the file.

$ curl -Ls https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/primary.xml.gz | shasum
6eb7ecc041f69a5ffeabdebcb466c443aa5e8028  -

Good news, the checksum matches! Let’s use zless to gunzip the file and page through it:

$ curl -Ls https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/primary.xml.gz | zless
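The checksum and open-checksum comparison described above can also be sketched in Python. This is an illustration rather than how yum does it: the function names are hypothetical, SHA-1 is assumed because the checksums in this repository are 40 hexadecimal digits and match the default output of shasum, and a small stand-in payload replaces the real download so the example runs offline.

```python
import gzip
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of some bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def verify_index_file(compressed, checksum, open_checksum):
    """Check raw gzip bytes and decompressed bytes against repomd.xml values."""
    return {
        # <checksum> covers the file exactly as served (still gzipped).
        "checksum_ok": sha1_hex(compressed) == checksum,
        # <open-checksum> covers the gunzipped content.
        "open_checksum_ok": sha1_hex(gzip.decompress(compressed)) == open_checksum,
    }


# Stand-in data so the sketch runs offline; in practice `compressed` would be
# the downloaded bytes of primary.xml.gz and the two expected values would
# come from repomd.xml.
xml_bytes = b"<metadata packages='0'/>"
compressed = gzip.compress(xml_bytes)
result = verify_index_file(compressed, sha1_hex(compressed), sha1_hex(xml_bytes))
print(result)
```

Checking both values is slightly stronger than the shasum pipeline alone, since it also confirms that the file decompresses to the content the repository creator indexed.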

A sample of the data in the primary.xml.gz file:

  <package type="rpm">
    <name>jake</name>
    <arch>x86_64</arch>
    <version epoch="87" rel="3.el6" ver="1.0"/>
    <checksum pkgid="YES" type="sha">ea721867eb0389e28bcd32e2deef7d4472c6ced8</checksum>
    <summary>jake douglas is a very nice young man.</summary>
    <description>as it so happens, jake douglas is a very nice young man.</description>
    <packager></packager>
    <url>https://twitter.com/jakedouglas</url>
    <time build="1401650103" file="1413137269"/>
    <size archive="4536" installed="4280" package="3740"/>
    <location href="jake-1.0-3.el6.x86_64.rpm"/>
    <format>
      <rpm:license>GPL</rpm:license>
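To pull a few of these fields out programmatically, a minimal Python sketch could look like the following. The helper names are hypothetical, and SAMPLE is a cut-down version of the package entry above wrapped in a metadata root element; namespace prefixes are ignored so the same code should also work on a real, namespaced primary.xml.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def list_packages(primary_xml):
    """Return name, arch, epoch:version-release and file name per package."""
    packages = []
    for package in ET.fromstring(primary_xml):
        if local_name(package.tag) != "package":
            continue
        fields = {local_name(child.tag): child for child in package}
        version = fields["version"]
        packages.append({
            "name": fields["name"].text,
            "arch": fields["arch"].text,
            "evr": "{}:{}-{}".format(
                version.get("epoch"), version.get("ver"), version.get("rel")
            ),
            "href": fields["location"].get("href"),
        })
    return packages


# Cut-down copy of the sample entry above, for illustration only.
SAMPLE = """<metadata>
  <package type="rpm">
    <name>jake</name>
    <arch>x86_64</arch>
    <version epoch="87" rel="3.el6" ver="1.0"/>
    <location href="jake-1.0-3.el6.x86_64.rpm"/>
  </package>
</metadata>"""

for pkg in list_packages(SAMPLE):
    print(pkg["name"], pkg["arch"], pkg["evr"], pkg["href"])
```

A listing like this is a quick way to confirm that a package you expect, at the version you expect, is actually present in the index.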

 

filelists.xml.gz and other.xml.gz

To examine these files, repeat the process used above for primary.xml.gz:

  1. Obtain the location and checksums of the metadata file from the repomd.xml file.
  2. Use curl -Ls <url> | shasum to verify the checksum of the file matches what is found in repomd.xml.
  3. Use curl -Ls <url> | zless to examine the file.
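The three steps above can be sketched as one Python loop over every entry in repomd.xml. This is an illustrative sketch under stated assumptions: check_repository and digest are hypothetical names, the checksum type "sha" is treated as SHA-1, every index file is assumed to be gzip-compressed, and an in-memory dictionary with a made-up base URL stands in for the web server so the example runs offline.

```python
import gzip
import hashlib
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def digest(data, checksum_type):
    """Hash bytes with the algorithm named by a checksum 'type' attribute."""
    # "sha" is assumed to mean SHA-1; other names are passed to hashlib as-is.
    name = "sha1" if checksum_type == "sha" else checksum_type
    return hashlib.new(name, data).hexdigest()


def check_repository(base_url, fetch):
    """Yield (type, url, checksum_ok, text) for each index file.

    `fetch` is any callable that takes a URL and returns the body as bytes.
    """
    repomd = ET.fromstring(fetch(urljoin(base_url, "repodata/repomd.xml")))
    for data in repomd:
        if local_name(data.tag) != "data":
            continue
        fields = {local_name(child.tag): child for child in data}
        # Step 1: location and checksum come from repomd.xml.
        url = urljoin(base_url, fields["location"].get("href"))
        expected = fields["checksum"]
        # Step 2: download the file and compare checksums.
        body = fetch(url)
        ok = digest(body, expected.get("type")) == expected.text
        # Step 3: gunzip the file so it can be examined.
        text = gzip.decompress(body).decode("utf-8")
        yield data.get("type"), url, ok, text


# In-memory stand-in for a web server; the URL below is made up.
BASE = "https://repo.example.invalid/el/7/x86_64/"
other_gz = gzip.compress(b"<otherdata/>")
repomd_xml = (
    '<repomd><data type="other">'
    '<location href="repodata/other.xml.gz"/>'
    '<checksum type="sha">' + hashlib.sha1(other_gz).hexdigest() + "</checksum>"
    "</data></repomd>"
).encode()
FILES = {
    BASE + "repodata/repomd.xml": repomd_xml,
    BASE + "repodata/other.xml.gz": other_gz,
}

for row in check_repository(BASE, FILES.__getitem__):
    print(row)
```

Against a real repository, fetch could be something like lambda url: urllib.request.urlopen(url).read(). Passing the downloader in as a parameter keeps the parsing and checksum logic easy to test without network access.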

Conclusion

yum repository metadata consists of a set of XML files, checksums, and in some cases a GPG signature. The metadata describes which packages can be found in a repository, various attributes of each package, file and directory listings, as well as changelog information.

yum repository metadata can be manually examined and verified on the command line using a combination of curl, less/zless, gpg, and shasum. This is useful if you need to debug an issue with your repository (missing package, missing dependency, incorrect version, etc.) or are curious about the inner workings of one of the most important pieces of your operating system.
