What Is a Package Registry?

Sep 12, 2021 · 11 min read


Introduction

Modern engineering organizations are driven by the race to achieve the lowest time to market. An important element of lowering your time to market is to take advantage of reusable software components as much as possible. The engineering processes should be able to take advantage of utilities that are built by internal teams within the organization and open-source alternatives available in the public domain.

For large organizations with globally scattered teams, even ensuring that the wheel does not get reinvented within the organization is a big challenge. The key to solving this problem is to use reusable packages whenever possible. Robust engineering processes that take advantage of a secure and highly available package registry like packagecloud are also critical.

Packagecloud is a cloud-based service for distributing different software packages in a unified, reliable, and scalable way, without owning any infrastructure. You can keep all of the packages that need to be distributed across your organization's machines in one repo, regardless of OS or programming language. Then, you can efficiently distribute your packages to your devices in a secure way, without having to own any of the infrastructure involved in doing so.

This enables users to save time and money on setting up servers for hosting packages for each OS. Packagecloud allows users to set up and update machines faster and with less overhead than ever before.

Sign up for the packagecloud free trial to get your machines set up and updated easily!

This article is about package registries and the factors you must consider while selecting one for your organization.

What are packages?

As mentioned above, packages are simply reusable software components. Developers use packages to accelerate their development time. In a nutshell, a package contains the following basic elements:

Code for the reusable component. For some languages, this is the source code itself; for others, it is a compiled executable or binary.
The metadata that makes them searchable.
Configurations that are needed to integrate and execute it.
A mechanism to install it in a compatible system.
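To make these four elements concrete, here is a minimal Python sketch of a package as a plain data structure. The class name Package, its fields, and the example values are all hypothetical and chosen only for illustration; real package formats define their own layouts.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical in-memory model of the four elements listed above."""

    name: str                 # metadata: makes the package searchable
    version: str              # metadata: identifies this release
    description: str = ""     # metadata: free-text summary
    payload: bytes = b""      # the code itself (source or compiled binary)
    config: dict = field(default_factory=dict)         # settings needed to integrate and run it
    install_steps: list = field(default_factory=list)  # how to install it on a compatible system

    def matches(self, query: str) -> bool:
        """Return True if the query appears in the name or description."""
        q = query.lower()
        return q in self.name.lower() or q in self.description.lower()


# Illustrative values only; none of this refers to a real package.
pkg = Package(
    name="image-resizer",
    version="1.2.0",
    description="Resize and crop images",
    payload=b"...archive bytes...",
    config={"max_width": 1024},
    install_steps=["unpack archive", "copy files into place"],
)

print(pkg.matches("crop"))
```

The search helper only looks at metadata, which reflects the point above: the payload can stay opaque to everything except the client that installs it.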

Packages come in different formats depending on the operating system, programming language, or framework. For example, packages for the Debian operating system come in .deb format, Red Hat-based Linux distributions use the .rpm format, Python distributes packages as wheels (.whl) or source archives, Java uses the .jar format, and the npm format used by Node.js is also well known. While only the client applications that use these packages can make sense of them, all of them consist of the same core elements listed above.

Since packages come in different formats, the utilities that host them and facilitate the installation also need specialized features to handle each type. Imagine having an all-in-one package manager that can handle all your package needs for different languages and operating systems.

That is where packagecloud comes into the picture. Packagecloud can handle all these in one place.

Check out packagecloud here.

Need for package registry

The major elements of a package are discussed in the above section. To take advantage of a package, you need facilitator software that can make sense of the package's information. Ultimately, packages are unusable if a developer who needs their functionality cannot search for and find them. Once found, the packages need to be installed using client software present on the developer's build or development instance. In short, a facilitator utility must be present to store the packages and manage the search and install processes.

Packages used in modern software development can run to tens of megabytes and need quite a bit of storage space to manage. At times, the model files or dictionary files that come with packages, especially in machine learning and ETL projects, can take up gigabytes of storage.

Another critical element of using packages is handling the updates that package authors publish. The author of the package should be able to update the artifact. What happens to the developers who are already using the original package? They cannot be forced to migrate to the new package. All the developers who are already using the original package must be able to maintain and rebuild their applications if needed. That is why packages are generally versioned, and tracking the version becomes another key responsibility of the facilitator.
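As a rough illustration of version tracking, the sketch below picks the newest release overall and the newest release within one major line, so existing users can stay on the line they already depend on. It assumes simple dotted numeric versions such as 1.10.1; real versioning schemes have more rules, and the function names here are hypothetical.

```python
def parse_version(text):
    """Turn '1.10.1' into (1, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def latest(versions):
    """Newest version overall."""
    return max(versions, key=parse_version)


def latest_compatible(versions, major):
    """Newest version within one major line, or None if that line is empty."""
    candidates = [v for v in versions if parse_version(v)[0] == major]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical release history for one package.
published = ["1.2.0", "1.10.1", "2.0.0", "1.9.3"]

print(latest(published))                # 2.0.0
print(latest_compatible(published, 1))  # 1.10.1
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "1.9.3" sorts after "1.10.1", which would select the wrong release.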

Packages are not always built in-house. Developers often get open-source packages from external sources. This presents a big security risk for engineering organizations. The recent attacks against organizations that use a mix of private and public packages throw light on the extent of security vulnerability that an organization can face if they are not careful with packages.
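One such risk, often called dependency confusion, can be sketched in a few lines of Python. The example assumes a resolver that consults both a private and a public index and simply takes the highest version it finds. The package names, versions, and both resolver functions are hypothetical and do not describe any particular tool.

```python
def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical index contents: an attacker has published a high-numbered
# package on the public index under the same name as an internal one.
PRIVATE_INDEX = {"acme-billing": ["1.4.0"]}
PUBLIC_INDEX = {"acme-billing": ["99.0.0"], "http-helper": ["2.1.0"]}


def naive_resolve(name):
    """Merge candidates from both indexes and take the highest version."""
    candidates = []
    for source, index in (("private", PRIVATE_INDEX), ("public", PUBLIC_INDEX)):
        for version in index.get(name, []):
            candidates.append((parse_version(version), version, source))
    if not candidates:
        return None
    _, version, source = max(candidates)
    return source, version


def scoped_resolve(name, internal_names):
    """Resolve internal names only from the private index."""
    if name in internal_names:
        index, source = PRIVATE_INDEX, "private"
    else:
        index, source = PUBLIC_INDEX, "public"
    versions = index.get(name, [])
    if not versions:
        return None
    return source, max(versions, key=parse_version)


print(naive_resolve("acme-billing"))                     # ('public', '99.0.0')
print(scoped_resolve("acme-billing", {"acme-billing"}))  # ('private', '1.4.0')
```

The naive resolver installs the attacker's public package because its version number is higher, while the scoped resolver never looks at the public index for names the organization owns.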

You can read about substitution attacks and dependency confusion attacks here. Ensuring the authenticity or integrity of the packages is also a key requirement in facilitating the seamless use of packages. Package registries exist to cover all the above functionalities. Let us now shift our focus to what package registries are.

What is a package registry?

A package registry stores packages, the metadata associated with them, and the configurations needed to install them, and it keeps track of package versions. A package registry generally has the following features.

A way to upload packages.
A set of APIs for the client to build routines to interact with them. This includes searching, installing, checking for upgrades, etc.
An admin dashboard or command-line utility to manage user access and view all the packages that are present.
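The features above can be sketched as a small in-memory registry in Python. This is a toy model under stated assumptions: the Registry class and its method names are hypothetical, versions are simple dotted numbers, and nothing here mirrors the API of a real registry product.

```python
class Registry:
    """Toy registry: upload, search, download, and upgrade checks."""

    def __init__(self):
        self._packages = {}  # name -> {version: payload bytes}

    def upload(self, name, version, payload):
        """Store a new release; published versions are never overwritten."""
        versions = self._packages.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} {version} already exists")
        versions[version] = payload

    def search(self, term):
        """Return the names of packages whose name contains the term."""
        return sorted(n for n in self._packages if term.lower() in n.lower())

    def download(self, name, version):
        """Return the payload for one exact release."""
        return self._packages[name][version]

    def available_upgrades(self, name, installed):
        """Return versions newer than the installed one, oldest first."""
        def key(version):
            return tuple(int(part) for part in version.split("."))

        newer = [v for v in self._packages.get(name, {}) if key(v) > key(installed)]
        return sorted(newer, key=key)


# Hypothetical package names and contents.
registry = Registry()
registry.upload("json-tools", "1.0.0", b"first release")
registry.upload("json-tools", "1.2.0", b"second release")
registry.upload("yaml-tools", "0.3.0", b"other package")

print(registry.search("json"))                             # ['json-tools']
print(registry.available_upgrades("json-tools", "1.0.0"))  # ['1.2.0']
```

Refusing to overwrite an existing version is a deliberate choice in this sketch: it keeps builds that pin an exact version reproducible.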

Most languages and frameworks provide their own public package registries, from which anyone can download and use the packages. Since such repositories are maintained and verified by open-source communities or consortiums, there is a minimum standard of security associated with them, but this may not be enough for an organization that places the utmost importance on security. Python’s PyPI, Debian’s APT repositories, and the npm registry for Node.js are examples of such public registries.

This is why many languages and frameworks provide utilities for engineering organizations to set up their own package registries. For example, a PyPI-compatible server such as pypiserver can be deployed inside a virtual private network on a standalone machine. Managing the network firewalls, infrastructure maintenance, and so on then becomes the responsibility of the organization.

Thus, there are many ways in which a package registry can be set up. Let us now look at the common types of package registries found in the industry.

Types of package registries

Cloud deployed vs. on-premises package registries

Based on the physical location in which package registries are hosted, they can be divided into cloud-based and on-premises package registries.

Setting up your own package registry is easy with the tools that are provided by the languages and frameworks. However, the caveat is that you need to maintain the infrastructure to enable a highly secure and highly available package repository. This takes a significant amount of work, and the effort has to be continued as long as the package registry remains in use. Cloud-based package registries can solve this problem.

The cloud-based registry packagecloud abstracts away all the infrastructure and maintenance details from the actual users. It is designed to deliver high uptime and availability by absorbing the problems related to designing and scaling the infrastructure needed for hosting packages.

There are other well-known players like Cloudsmith, JFrog, Gemfury, etc. that provide similar functionality, but, on deeper research, you will find that packagecloud offers the most value for money in terms of storage and bandwidth compared to all of them.

On-premises package registries are generally set up using the tools provided by the specific languages and frameworks. The biggest challenges here are in setting up the infrastructure and security, which is why this is generally preferred by large engineering organizations with dedicated infrastructure teams. If properly done, this kind of package registry provides the maximum control and security possible. Because the registry sits entirely inside the organization's virtual private network, it is very difficult for outsiders to tamper with it. GitLab is a popular choice when it comes to on-premises package registries.

Some organizations use private on-premises repositories to store packages that have strict intellectual property (IP) constraints. They will still allow the developers to install packages from the outside world but will keep their most valuable, proprietary packages in the on-premises repository.

Public vs. private package registries

As the name suggests, public registries allow anyone to upload packages and download packages for use. They can be cloud-hosted or on-premises depending on the organization’s engineering requirements. Most open-source languages and frameworks use public package registries. Some entities that want to contribute back to the community, but do not want to push code to public repositories, host their own public repositories. This happens when there is considerable effort in making the packages adhere to the licensing terms of the public repository managed by the framework’s authors.

Private package repositories are accessible only to the employees of the organization. They can only be accessed with authentication credentials and in some cases even two-factor authentication. They can be cloud-based or on-premises. Packagecloud can help you set up a cloud-based private package registry in a matter of minutes.

Considerations when selecting a package registry

Let us now summarize what we have covered so far to define the factors to consider when selecting a package registry for your organization.

Do you have a dedicated engineering team to take care of infrastructure requirements associated with setting up a package registry? If not, then a cloud-based package registry like packagecloud is your best bet.
Do your organization's security policies allow you to use a public package registry? It is likely that your organization has different policies and priorities for different packages.
Does your organization use one single language or framework for all its development? If not, do you want to create package registries separately for each of them, or do you want an all-in-one solution like packagecloud that can handle all your package requirements?
Do you have security consultants who can monitor and ensure the authenticity and integrity of your packages?

How can packagecloud help?

Packagecloud is an all-in-one solution to handle package requirements in multiple languages, frameworks, and operating systems. It can keep your packages secure and verify their integrity through checksums and GPG-based signatures. Packagecloud supports all major platforms, including Debian, RPM, Python, Maven, and Ruby.
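To show what a checksum-based integrity check looks like in general, here is a minimal Python sketch using SHA-256 from the standard library. It illustrates the idea only and is not a description of how packagecloud implements it; the function names and sample data are hypothetical, and GPG signature verification is not shown.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against the checksum published with the package."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


# Hypothetical artifact; in practice the checksum comes from the registry.
artifact = b"example package contents"
published_checksum = sha256_hex(artifact)

print(verify(artifact, published_checksum))                # True
print(verify(artifact + b" tampered", published_checksum)) # False
```

A checksum only shows that the bytes match what was published. Confirming who published them is the job of a signature, which is why the two are usually used together.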

You can try packagecloud here.