Continuous integration and deployment pipelines depend on package registries to install the dependencies required to build an application. A package registry can be hosted internally, or it can be a public one provided by the programming language, framework, or operating system that the application uses.
Package registries host reusable code that can be shared across multiple applications, saving the considerable time that would otherwise go into reimplementing common functionality. Since developers depend on packages downloaded from public or private sources for critical functionality, the authenticity of the source and the integrity of the packages are very important for the organization.
This is why many organizations choose to implement their own private package registry. If you are looking for an easy way to build a highly available and secure package registry, look no further than packagecloud.
Sign up for the packagecloud free trial to get your machines set up and updated easily.
Now, let’s dive into our post about how to build a package registry.
What are package registries?
Package registries host reusable code assets, as well as their version details and configuration details, and maintain the metadata about them. The repositories allow one to store multiple versions of the same packages, ensure security through signature-based verification procedures, and offer a mechanism for remote machines to download and use them.
Package registries help developers search for and install the dependencies their software needs. The version details help developers choose the right package and keep their software up to date. In short, a package registry is a collection of packages that supports reliable searching, configuration, and installation. All in all, registries play a key role in ensuring the reproducibility of your build and deployment processes.
Examples of well-known package registries are Python's PyPI repository, Debian's APT repositories, Red Hat's RPM repositories, etc. Most of these are package registries specific to a programming language or operating system.
Does this mean package registries always have to be locked to one language or framework? If this were the case, then how would an organization that uses multiple programming languages, frameworks, and operating systems manage its packages?
This is where third-party package repositories like packagecloud come into the picture. They support most of the common languages and frameworks that are used in the enterprise space. Hence, such package repositories enable organizations to manage all their packages in one place.
Public vs. private package registries
As discussed in the above sections, a package repository can be a publicly hosted one like Python's public PyPI server or the Debian APT mirrors. Such public repositories let anyone download and install the required packages. Uploads to such repositories are overseen by open-source communities or consortiums. Hence, they are generally secure, but organizations with tight security requirements still have reason to be concerned about them. The advantage of using a public package repository is that your developers are always working with the latest and greatest packages.
Other than the public open-source packages, most organizations will also work with reusable assets that are created in-house. Those packages also have to be hosted somewhere the organization's developers can freely access them. It may not be a good idea to host them in public repositories since that can lead to security vulnerabilities or intellectual property theft. As a result, such companies have to implement their own package repositories inside their private network.
The build workflow at organizations that use private repositories is as follows: First, the build routine searches for the required package in the private repository. If the package is found, it is used to build the software. If it cannot be found, then there is no custom version of that dependency, and the build routine fetches it from an authorized public repository instead. While this looks like a good solution, this kind of routine has recently come under serious threat from substitution attacks.
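The private-first lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name resolve and the dictionary-based indexes are assumptions made for the example, and a real registry would be queried over HTTP.

```python
def resolve(package, private_index, public_index):
    """Return (source, version) for a package, preferring the private index.

    Each index is a plain dict mapping package name -> available version.
    This stands in for a real registry lookup (hypothetical structure).
    """
    if package in private_index:
        # A custom, in-house build of the dependency exists: use it.
        return ("private", private_index[package])
    if package in public_index:
        # No in-house version, so fall back to the authorized public source.
        return ("public", public_index[package])
    raise LookupError(f"{package} not found in any configured index")


private = {"acme-utils": "1.2.0"}
public = {"requests": "2.31.0"}
print(resolve("acme-utils", private, public))
print(resolve("requests", private, public))
```

The fallback branch is the part an attacker targets: any name that is missing from the private index is looked up in a repository the organization does not control.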
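A substitution attack, also called a dependency confusion attack, works by publishing packages to public repositories under the same names as an organization's internal dependencies, usually with a higher version number. If the build routine consults both the private and the public repository, the package manager may pick the public, attacker-controlled version over the internal one. (A related technique, typosquatting, uses look-alike names and waits for a developer to misspell a dependency while installing packages.) Package repositories like packagecloud can help mitigate this risk by enforcing the integrity of downloaded packages.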
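One common form of this attack relies on a resolver that treats every configured index equally and simply picks the highest version it can see. The sketch below illustrates that failure mode and one possible mitigation, assuming a simplified dotted-integer version scheme. The names naive_pick, pinned_pick, and acme-utils are hypothetical and do not describe how any particular package manager is implemented.

```python
def parse_version(text):
    # Simplified: handles dotted integers only, e.g. "1.4.0" -> (1, 4, 0).
    return tuple(int(part) for part in text.split("."))


def naive_pick(package, indexes):
    """Pick the highest version across all indexes, regardless of source."""
    candidates = [
        (parse_version(version), source)
        for source, index in indexes.items()
        for version in index.get(package, [])
    ]
    if not candidates:
        raise LookupError(package)
    version, source = max(candidates)
    return source, ".".join(str(part) for part in version)


def pinned_pick(package, indexes, internal_names):
    """Like naive_pick, but internal names may only come from 'private'."""
    if package in internal_names:
        indexes = {"private": indexes.get("private", {})}
    return naive_pick(package, indexes)


indexes = {
    "private": {"acme-utils": ["1.2.0"]},
    # An attacker has published the same name publicly with a huge version.
    "public": {"acme-utils": ["99.0.0"], "requests": ["2.31.0"]},
}
print(naive_pick("acme-utils", indexes))                   # public copy wins
print(pinned_pick("acme-utils", indexes, {"acme-utils"}))  # private copy wins
```

Pinning internal names to the private source is only one option; reserving the internal names on the public registry or verifying package hashes are other approaches an organization might combine with it.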
A common way of mitigating this risk, without using smart package managers like packagecloud, is to avoid dependencies from public repositories altogether. This requires the organization to clone the public repository to the internal network, verify it, and then create a carefully curated list of packages for their developers.
Such a mechanism helps them to verify the authenticity and integrity of individual packages, but this is hard and tedious work. It means a package available in a public package repository may not be available in the private repository in time for a developer to take advantage of it, so the developer may miss the latest and greatest packages.
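As a rough illustration of what verifying a curated list can involve, the sketch below checks a downloaded file against an allowlist of SHA-256 digests. The allowlist format and the function names are assumptions made for this example; pip offers a comparable built-in check through its --require-hashes option.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a package file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(filename: str, data: bytes, allowlist: dict) -> bool:
    """True only if the file is on the curated list and its digest matches."""
    expected = allowlist.get(filename)
    return expected is not None and sha256_hex(data) == expected


# The curated list maps approved filenames to the digests recorded at review time.
reviewed = b"contents of the reviewed package"
allowlist = {"acme_utils-1.2.0.tar.gz": sha256_hex(reviewed)}

print(verify_artifact("acme_utils-1.2.0.tar.gz", reviewed, allowlist))
print(verify_artifact("acme_utils-1.2.0.tar.gz", b"tampered", allowlist))
print(verify_artifact("unknown-0.1.tar.gz", reviewed, allowlist))
```

A digest match shows that the file is the one that was reviewed; it does not by itself establish who published it, which is why registries pair hashes with signature checks.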
How do we build a package registry?
A package registry must be able to store different versions of packages, maintain the metadata about them, enable searching based on this metadata, and reliably install the packages on the client system. To accomplish all of this, a package registry has the following components at its core.
- A storage space where the packages can be physically stored. Usually, this is a file system, but some advanced package repositories even allow storing them in completely managed services like S3, Dropbox, etc.
- A database that can maintain the metadata and configuration.
- A way of uploading the packages to the repository.
- A set of APIs for the client applications to use while searching, configuring, and installing dependencies.
- A dashboard or webpage that makes it easy to search for dependencies and control administration parameters.
As the description of the above components suggests, implementing all of this with custom code is not easy. Organizations that do it typically build on the open-source package management utilities provided by the specific programming language or operating system. Such utilities generally come with support for all the above functionalities, including web servers and databases.
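To make the components above concrete, here is a toy in-memory model of them in Python. It is a sketch under heavy simplification: the class names, the fields, and the choice to reject re-uploads of an existing file are all assumptions for illustration, and a real registry would persist data to disk or object storage and serve it over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    """Metadata kept for one uploaded file (hypothetical schema)."""
    name: str
    version: str
    filename: str
    summary: str = ""


@dataclass
class ToyRegistry:
    """In-memory stand-in for the storage, metadata, upload, and API parts."""
    blobs: dict = field(default_factory=dict)    # filename -> bytes ("storage")
    records: list = field(default_factory=list)  # metadata "database"

    def upload(self, record: PackageRecord, content: bytes) -> None:
        # Refuse to overwrite so that a published version stays immutable.
        if record.filename in self.blobs:
            raise ValueError(f"{record.filename} already exists")
        self.blobs[record.filename] = content
        self.records.append(record)

    def versions(self, name: str) -> list:
        # Client API: every stored version of a package, in upload order.
        return [r.version for r in self.records if r.name == name]

    def search(self, term: str) -> list:
        # Client API: case-insensitive match on name or summary.
        term = term.lower()
        return sorted({r.name for r in self.records
                       if term in r.name.lower() or term in r.summary.lower()})

    def download(self, filename: str) -> bytes:
        # Client API: fetch the stored file for installation.
        return self.blobs[filename]
```

A client would call upload once per released file and then use versions, search, and download the way a package manager uses a registry's HTTP endpoints.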
Let us try to build a private package registry for Python using pypiserver, a minimal PyPI-compatible server. On the client side, Python package management is done using a command-line utility called pip. This guide assumes that you have a working Python 3 installation. Follow the steps below to accomplish this.
Install the ‘pip’ utility to manage packages on the Python client side.
This can be done by executing the command below if you are using a Debian-based operating system:
sudo apt install python3-pip
Install the virtualenv package if it is not already installed:
pip install virtualenv
Now, create a directory to hold the packages and the libraries for the server:
mkdir ~/packages_storage
cd ~/packages_storage
Create a virtual environment here and activate it:
virtualenv venv
source venv/bin/activate
Install pypiserver:
pip install pypiserver
Start the server using the following command (recent pypiserver 2.x releases expect the run subcommand shown here; older 1.x releases omit it):
pypi-server run -p 8080 ~/packages_storage
You can now head to localhost:8080 in your browser to ensure your server is up and running. The browser will show the welcome page and a link to a simple index of all the packages that are present on the server.
Adding packages to this repository can be done by simply moving your package files to the packages_storage directory. There are also tools like Twine that can upload packages to the registry.
You can access the packages in this registry by using pip's --extra-index-url option. Note that this option adds your registry alongside the public PyPI rather than replacing it, so pip still considers public packages with the same name:
pip install --extra-index-url http://localhost:8080/simple/ package_name
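The same simple index that pip reads can also be inspected from a script, which is handy for checking what the registry currently serves. The sketch below assumes the server from the steps above is listening on localhost:8080 and that its /simple/ page lists one link per project, as PEP 503 style indexes do. The helper names are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkTextParser(HTMLParser):
    """Collect the text of every <a> element on a simple-index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def parse_simple_index(html: str) -> list:
    """Return the project names linked from a simple-index HTML page."""
    parser = LinkTextParser()
    parser.feed(html)
    return parser.names


def list_projects(base_url: str = "http://localhost:8080") -> list:
    # Needs the registry to be running; nothing here runs at import time.
    with urlopen(f"{base_url}/simple/") as response:
        return parse_simple_index(response.read().decode("utf-8"))
```

Calling list_projects() while the server is running should return the names of the packages placed in packages_storage; parse_simple_index can be tried offline on any saved copy of the index page.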
That is how you build a package registry. While that is simple enough to accomplish, there is still a lot of work to be done for it to be used in a production environment. Some of the most critical items are listed below.
- The registry server requires authenticated access and HTTPS support. All this needs to be enabled with a multitude of configurations.
- The infrastructure to set this up is still your headache. You can choose an instance from a cloud provider like AWS or an on-premises instance.
After all these efforts, what you get is a package registry that can only handle Python packages—if you work with multiple languages and frameworks, you will have to repeat this for all of them! This is where packagecloud can help.
How can packagecloud help?
Packagecloud can help you create package registries that support multiple languages, frameworks, and operating systems at the same time. It supports Python, Java, Node.js, Debian, RPM, and much more. It helps you create private or public cloud-based registries. You don’t have to think about infrastructure when you work with packagecloud.
You can create a robust Python package registry in under 10 seconds using packagecloud. Packagecloud can help you avoid risks like dependency confusion attacks. Its security features help you verify the authenticity and integrity of packages.
Most languages and frameworks provide package registry utilities that can be used to set up your own private registries. However, using them requires one to keep track of many package registries. This can get very tedious in large engineering organizations because of the breadth of tools that they use. Packagecloud can refine the package registry setup and management process while providing the best security possible.
Check out the packagecloud free trial to see how easy it is to distribute packages throughout your entire organization. Never worry about the scaling, consistency, or security of your packages again.