TL;DR
This blog post goes into detail on the internals of yum repositories by examining each of the index files created as part of a yum repository’s metadata. We’ll cover what each index file means and look at how you can inspect the metadata yourself.
What is a yum repository?
A yum repository is a collection of RPM packages with metadata that is readable by the yum command line tool. Having a yum repository allows you to perform package install, removal, upgrade, and other operations on individual packages or groups of packages.
yum repositories are essential for storing, managing, and delivering software to end systems.
Create a yum repository with createrepo
Before taking a closer look at the repository metadata itself, let’s show how to get a repository set up by using the open source createrepo command line tool.
You can create a yum repository by using the command line tool createrepo, which you can install on CentOS and Red Hat systems by running sudo yum install createrepo.
In its simplest usage, createrepo takes a single command line argument: the directory in which to output the yum repository metadata.
Assuming you have some RPM files in your current directory, a single createrepo command pointed at that directory generates the yum repository.
It creates a directory named repodata containing the repository metadata, which is described in depth below.
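As a minimal sketch of this step, the wrapper below runs createrepo against a directory and then confirms that repodata/repomd.xml exists. The function name build_yum_repo is hypothetical, the check for .rpm files is a safeguard added for this sketch, and createrepo is assumed to be installed.

```shell
# Hypothetical helper: generate yum metadata for a directory of RPMs.
build_yum_repo() {
  repo_dir=${1:-.}

  # Safeguard for this sketch: refuse to index a directory with no RPMs.
  if ! ls "$repo_dir"/*.rpm >/dev/null 2>&1; then
    echo "no .rpm files found in $repo_dir" >&2
    return 1
  fi

  # createrepo takes the directory to index and writes repodata/ inside it.
  createrepo "$repo_dir" || return 1

  # repomd.xml is the index that points at the other metadata files.
  [ -f "$repo_dir/repodata/repomd.xml" ]
}
```

Calling build_yum_repo . from a directory of RPMs is equivalent to running createrepo . by hand, with the extra checks before and after.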
You can then use GPG to sign the repository metadata. Doing this assures users of your repository that you generated the metadata. Note that signing the metadata is not the same as signing individual packages with rpm or rpmsign.
More details about using GPG with RPM packages and yum repositories can be found in a previous blog post dedicated to this topic.
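One way to sign the metadata, sketched under the assumption that gpg is installed and a default signing key is available, is a detached, ASCII-armored signature over repomd.xml. The function name sign_repomd is hypothetical.

```shell
# Hypothetical helper: write an ASCII-armored detached signature for repomd.xml.
sign_repomd() {
  repomd="${1:-.}/repodata/repomd.xml"

  if [ ! -f "$repomd" ]; then
    echo "missing $repomd; generate the metadata first" >&2
    return 1
  fi

  # --detach-sign leaves repomd.xml untouched; --armor writes repomd.xml.asc.
  gpg --detach-sign --armor "$repomd"
}
```

Running sign_repomd /path/to/repo should leave repodata/repomd.xml.asc next to repomd.xml. The signature needs to be regenerated whenever the metadata is rebuilt.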
If you’d like others to access your yum repository, you’ll need to set up Apache, nginx, or some other web server and point it at the base directory of the repository. It is recommended that you also obtain SSL certificates so that package data is securely transferred to end systems.
Of course, using packagecloud is a much faster and simpler solution :) with SSL, GPG, authentication, collaboration, and everything else you need ready to go!
yum repository metadata
yum repository metadata is structured as a series of XML files that contain checksums of other files and of the packages to which they refer.
The metadata files usually found in a yum repository are:
- repomd.xml: Essentially an index that contains the location, checksums, and timestamp of the other XML metadata files listed below.
- repomd.xml.asc: This file is generated only if the repository creator has signed the repomd.xml file using GPG, as described above. yum will download and verify this signature if the user has the pygpgme package installed.
- primary.xml.gz: Contains detailed information about each package in the repository. You’ll find information like name, version, license, dependency information, timestamps, size, and more.
- filelists.xml.gz: Contains information about every file and directory in each package in the repository.
- other.xml.gz: Contains the changelog entries found in the RPM SPEC file for each package in the repository.
There are a handful of other files, but they aren’t widely used and for most package repositories, the above metadata files are sufficient.
Typically, repository metadata is namespaced under ‘repodata’ in yum repository URLs, and/or stored in a directory named ‘repodata’ on the repository server.
It is a common practice to organize your yum repository so that packages of the same architecture are stored together. Doing this also segments your repository metadata by architecture type, reducing the amount of metadata to serve to clients and to regenerate on updates.
A typical set of URLs from packagecloud for both x86_64 and i386 repomd.xml files would be:
- https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/i386/repodata/repomd.xml
- https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64/repodata/repomd.xml
Most publicly accessible repositories have a similar scheme. For example, the official CentOS 6 repomd.xml files:
- http://mirror.centos.org/centos/6/os/i386/repodata/repomd.xml
- http://mirror.centos.org/centos/6/os/x86_64/repodata/repomd.xml
Examining and verifying yum repository metadata
You can use a series of command line tools to examine yum repository metadata, calculate checksums, and verify GPG signatures.
Let’s use the CentOS 7 repository located at https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7.
Examine repomd.xml first
Begin by examining the repomd.xml file with curl. The paths to the other index files and their checksums are found here.
(TIP: If you like seeing greater detail or need to debug, try using the curl flags -Lv for verbose mode.)
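A small sketch of that first step: the helper below prints the repomd.xml for a given repository base URL. The function name fetch_repomd is hypothetical, and the -f flag (fail on HTTP errors instead of printing the error page) is an addition to the -L and -s flags used in this post.

```shell
# Hypothetical helper: print the repomd.xml for a repository base URL.
fetch_repomd() {
  # -L follows redirects, -s hides the progress meter, -f fails on HTTP errors.
  curl -fLs "${1%/}/repodata/repomd.xml"
}
```

For the x86_64 repository mentioned earlier, the call would be fetch_repomd https://packagecloud.io/computology/packagecloud-cookbook-test-public/el/7/x86_64, optionally piped into less.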
Verify GPG signature of repomd.xml
yum will automatically attempt to verify the repository GPG signature (if repo_gpgcheck is set to 1 in the yum configuration), but you can also verify the signature manually if you like.
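To make the repo_gpgcheck setting concrete, here is a sketch that prints a yum repository stanza with metadata signature checking turned on. The function name print_repo_config and the URLs in the usage note are hypothetical; the option names (baseurl, enabled, repo_gpgcheck, gpgcheck, gpgkey) are standard yum repository options.

```shell
# Hypothetical helper: print a yum .repo stanza that enables metadata signature checks.
# Arguments: repository id, base URL, URL of the public GPG key.
print_repo_config() {
  cat <<EOF
[$1]
name=$1
baseurl=$2
enabled=1
# repo_gpgcheck verifies the signature on repomd.xml (repomd.xml.asc).
repo_gpgcheck=1
# gpgcheck covers signatures on the RPM packages themselves; off in this sketch.
gpgcheck=0
gpgkey=$3
EOF
}
```

For example, print_repo_config example https://example.com/el/7/x86_64 https://example.com/gpgkey prints a stanza that could be saved under /etc/yum.repos.d/ as example.repo.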
If the repository is GPG signed and you have imported the public GPG key, you can verify the signature by downloading the repomd.xml file and the repomd.xml.asc file, and then using gpg --verify.
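A minimal sketch of the manual check, assuming both files have already been downloaded into one directory and the repository's public key has been imported with gpg --import. The function name verify_repomd is hypothetical.

```shell
# Hypothetical helper: check repomd.xml against its detached signature.
verify_repomd() {
  dir=${1:-.}

  for f in repomd.xml repomd.xml.asc; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done

  # The first argument is the signature, the second is the signed file.
  gpg --verify "$dir/repomd.xml.asc" "$dir/repomd.xml"
}
```

gpg exits with a non-zero status when the signature does not verify, so verify_repomd can be used directly in a script's if condition.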
Examine primary.xml.gz metadata
Next, let’s examine the primary.xml.gz metadata. As mentioned above, this file contains information about the packages in the repository.
The repomd.xml file lists the location of this file in the location element of its primary entry.
NOTE: The location of this file isn’t always so straightforward, as some repositories will include the SHA or MD5 sum of the primary.xml.gz file in the URL itself.
The location is relative to the base repository.
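As a rough sketch, one entry can be pulled out of a saved copy of repomd.xml with sed. The function name show_repomd_entry is hypothetical, and the approach assumes each XML element sits on its own line, which is typical of createrepo output but not guaranteed.

```shell
# Hypothetical helper: print one <data> entry from a repomd.xml file.
# Arguments: path to repomd.xml, entry type (primary, filelists, other).
show_repomd_entry() {
  # Print from the opening <data type="..."> line through the next </data>.
  sed -n "/<data type=\"$2\">/,/<\/data>/p" "$1"
}
```

For example, show_repomd_entry repomd.xml primary prints the checksum, open-checksum, and location lines for primary.xml.gz.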
Let’s first check that the SHA checksum matches what repomd.xml has listed. The value listed for checksum is the SHA checksum of the file. The value listed for open-checksum is the checksum of the gunzipped version of the file.
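The two values can be computed locally along these lines. This is a sketch that assumes the repository uses SHA-256; the type attribute on the checksum element in repomd.xml says which algorithm was actually used. All function names here are hypothetical.

```shell
# Hypothetical helpers for comparing a metadata file with the values in repomd.xml.

# Print the SHA-256 digest of whatever arrives on stdin.
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | awk '{print $1}'
  else
    shasum -a 256 | awk '{print $1}'
  fi
}

# checksum: digest of the compressed file exactly as served.
file_checksum() { sha256_stdin < "$1"; }

# open-checksum: digest of the gunzipped contents.
open_checksum() { gunzip -c "$1" | sha256_stdin; }

# Succeeds only when the computed digest equals the expected one.
checksum_matches() {
  [ "$(file_checksum "$1")" = "$2" ]
}
```

With primary.xml.gz saved locally, checksum_matches primary.xml.gz followed by the checksum value copied from repomd.xml succeeds only when the two agree.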
Good news, the checksum matches! Let’s pipe the download into zless to gunzip the file and page through it.
A sample of the data in the primary.xml.gz file:
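One way to pull a quick sample out of the file yourself is to list the package names it records. This is a sketch: the function name list_package_names is hypothetical, and it assumes each name element sits on its own line, as createrepo typically writes it.

```shell
# Hypothetical helper: list the package names recorded in a primary.xml.gz file.
list_package_names() {
  # Decompress to stdout and keep only the text inside <name>...</name>.
  gunzip -c "$1" | sed -n 's:.*<name>\(.*\)</name>.*:\1:p'
}
```

Running list_package_names primary.xml.gz prints one package name per line. For the full entries, page through the file with zless instead.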
filelists.xml.gz and other.xml.gz
Repeat the process used above for primary.xml.gz to examine these files:
- Obtain the location and checksums of the metadata file from the repomd.xml file.
- Use curl -Ls <url> | shasum to verify the checksum of the file matches what is found in repomd.xml.
- Use curl -Ls <url> | zless to examine the file.
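The steps above can be strung together into one rough, end-to-end check. This sketch assumes SHA-256 checksums, one XML element per line in repomd.xml, and that sha256sum is available (shasum -a 256 is the usual substitute elsewhere). The function name check_metadata is hypothetical.

```shell
# Hypothetical helper: verify one metadata file listed in repomd.xml.
# Arguments: repository base URL, entry type (primary, filelists, other).
check_metadata() {
  base=${1%/}
  repomd=$(curl -fLs "$base/repodata/repomd.xml") || return 1

  # Step 1: pull the location and checksum for the requested entry.
  entry=$(printf '%s\n' "$repomd" | sed -n "/<data type=\"$2\">/,/<\/data>/p")
  href=$(printf '%s\n' "$entry" | sed -n 's:.*<location href="\([^"]*\)".*:\1:p')
  want=$(printf '%s\n' "$entry" | sed -n 's:.*<checksum [^>]*>\([^<]*\)</checksum>.*:\1:p')
  [ -n "$href" ] && [ -n "$want" ] || return 1

  # Step 2: download the file and compare digests (SHA-256 assumed).
  got=$(curl -fLs "$base/$href" | sha256sum | awk '{print $1}')
  [ "$got" = "$want" ]
}
```

A loop such as for t in filelists other; do check_metadata "$url" "$t" && echo "$t ok"; done covers both files, where $url holds the repository base URL.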
Conclusion
yum repository metadata consists of a set of XML files, checksums, and in some cases a GPG signature. The metadata describes which packages can be found in a repository, various attributes about each package, file and directory listings, as well as changelog information.
yum repository metadata can be manually examined and verified on the command line using a combination of curl, less/zless, gpg, and shasum. This is useful if you need to debug some sort of issue with your repository (missing package, missing dependency, incorrect version, etc.) or are curious about the inner workings of one of the most important pieces of your operating system.