Dependency confusion and substitution attacks

Jul 12, 2021

Introduction

Package managers make programmers’ lives easier by simplifying the use of reusable libraries in development. These libraries may be developed internally by other teams or downloaded from public repositories.

A side effect of this convenience is that developers are often unaware of where a library comes from or how the package manager decides which source to fetch it from. In such development models, developers place blind trust in the package manager’s ability to fetch the right package and in the authenticity of packages fetched from community-powered public repositories.

Dependency confusion attacks exploit this trust and lack of transparency.

Dependency confusion is also known as a substitution attack. Both terms describe a subset of a broader category of attacks called software supply chain attacks.

A software supply chain attack is any attack that exploits vulnerabilities in the software supply chain, and dependency confusion is one of many kinds of such attacks.

To manage your system’s exposure to package dependency attacks, check out packagecloud. Packagecloud is a cloud-based service for distributing software packages in a unified and reliable manner. You can use packagecloud to make sure that you are only using verified packages throughout your system.

Packagecloud supports all the common programming languages and provides a one-stop shop for hosting all your packages without owning any infrastructure. Sign up for the packagecloud free trial to get your machines set up and updated easily!

What is a Dependency Confusion Attack?

A dependency confusion attack deliberately confuses the package manager by placing malicious artifacts in public repositories, tricking it into downloading and installing them. The attackers identify the names of internal packages used by companies and then publish malicious code under the same names in public package repositories.

The package managers, in their quest to get the latest versions of packages, fetch the malicious code from these uncurated public feeds. This causes a “confusion” between the desired package and the malicious package, leading to the applications being compromised.

Why is a dependency confusion attack possible?

Since organizations host their libraries in internal repositories, one might wonder how the package manager could ever confuse them with something else. The reason is that many organizations use a hybrid approach: package managers download dependencies from internal private repositories but are also allowed to use public repositories in certain cases.

In those cases, the package managers are configured to fetch libraries from private and public repositories based on availability, rather than on explicit rules that tie each package to a specific source. This creates a critical security hole and is the main reason substitution attacks are viable.

Now, imagine an attacker who already knows the package names that the organization uses. The attacker can publish malicious code under the same names in public repositories, hoping that flaws or configuration mistakes in the package manager setup will cause the attacker’s code to be pulled into the organization’s applications.

Such configuration problems and flaws turn out to be common even in large organizations. Most of the vulnerable companies discovered by Alex Birsan, the security researcher who publicized this attack technique, had more than 1,000 employees.

Let us now explore the strategies generally used by attackers to take advantage of this vulnerability.

Dependency Confusion - How is it done?

Dependency confusion or substitution attacks rely on a number of strategies, ranging from typos that developers make when entering package names to weaknesses in package manager configurations.

You might be thinking, “how in the world do the attackers know the names of the internal libraries?” It turns out to be surprisingly easy for anyone who can write a crawler that churns through public repositories on GitHub. Many organizations, in their quest to give back to the open-source community, publish reusable components on hosting platforms such as GitHub and GitLab.

Buried in those repositories are package manager dependency lists, which often reveal the names of private libraries the organization uses. Python’s requirements.txt and npm’s package.json files are typical examples.
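The extraction an attacker would run can just as well be pointed at your own public repositories to see which internal names leak. The following is a minimal sketch using only the standard library; it assumes simple name-plus-version lines in requirements.txt and the standard dependency sections of package.json, and the acme-* package names in the example input are hypothetical.

```python
from __future__ import annotations

import json
import re

# A requirement line starts with the distribution name; the match stops at the
# first character that cannot be part of a name (version operator, extras, space).
REQ_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def names_from_requirements(text: str) -> set[str]:
    """Collect package names from a requirements.txt-style file."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        # Skip blank lines, options such as --extra-index-url, and URL-based requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = REQ_NAME.match(line)
        if match:
            names.add(match.group(1).lower())
    return names


def names_from_package_json(text: str) -> set[str]:
    """Collect dependency names from the dependency sections of package.json."""
    data = json.loads(text)
    names = set()
    for section in ("dependencies", "devDependencies",
                    "peerDependencies", "optionalDependencies"):
        names.update(data.get(section, {}))  # the keys of each section are package names
    return names


requirements = """\
requests==2.25.1
acme-etl-util>=1.0   # internal package
--extra-index-url https://pypi.internal.example.com/simple
"""
package_json = '{"dependencies": {"express": "^4.17.1", "acme-auth-client": "1.2.0"}}'

print(sorted(names_from_requirements(requirements)))  # ['acme-etl-util', 'requests']
print(sorted(names_from_package_json(package_json)))  # ['acme-auth-client', 'express']
```

Any extracted name that does not exist on the public registry is, by elimination, likely an internal package, which is exactly the list an attacker is after.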

Let us now look into some of the strategies that attackers use to trick the dependency managers.

Dependencies with higher version numbers

Sometimes dependency managers are configured to fetch a package from a public repository whenever the public repository has a higher version number than the private one. While this helps ensure that you always use the latest and greatest version of every library, it opens up a big security hole.

Such configurations also apply to internal libraries that are hosted only privately. This means attackers can publish malicious code in a public repository under the same name as an internal package, just with a higher version number. The malicious artifact tricks the package manager into thinking there is a newer version in the public repository, and the harmful code makes its way into the application.

For example, the package manager looking for an internal library named etl-util.jar may find a library with version number 1.02 in the internal repository and a malicious one with the same name and version number 1.03 in a public repository. Since the package manager is configured to prefer the latest library, it will download the malicious code and install it. As discussed above, it’s easy to find out the names of the internal packages used by organizations, because they are usually present in the public code repositories maintained by companies as part of their open-source contributions.
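To make the etl-util scenario concrete, here is a minimal sketch of a resolver that merges a private and a public index and simply picks the highest version. The in-memory dictionaries stand in for real repositories, and the numeric-only version parsing is a simplifying assumption; Maven and pip apply richer version-ordering rules.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '1.03' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve_highest(name: str, indexes: dict[str, dict[str, str]]) -> tuple[str, str]:
    """Return (index name, version) for the highest version of `name` in any index.

    This mimics a resolver configured to "always take the latest" without
    regard to which repository a package is supposed to come from.
    """
    candidates = [
        (parse_version(packages[name]), index_name, packages[name])
        for index_name, packages in indexes.items()
        if name in packages
    ]
    if not candidates:
        raise LookupError(f"{name!r} not found in any index")
    _, index_name, version = max(candidates)
    return index_name, version


indexes = {
    "private": {"etl-util": "1.02"},  # the genuine internal library
    "public": {"etl-util": "1.03"},   # attacker-published package with the same name
}
print(resolve_highest("etl-util", indexes))  # ('public', '1.03')
```

Nothing in the selection step asks where etl-util is supposed to live, so the public copy wins purely on its version number.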

Dependencies with different package names and import names

At times, a dependency’s import name is very different from its package name. For example, the well-known image processing library OpenCV is imported in Python as cv2. Python’s pip package manager can install a dependency with a single command:

pip install <dependency_name>

Now, it is natural for a junior developer who copies a snippet containing ‘import cv2’ to try ‘pip install cv2’ on the command line. But the real package is called opencv-python, and the correct command is pip install opencv-python.

Now imagine an environment where the dependency manager or proxy is configured to fall back to the public repository when it cannot find a dependency internally. An attacker can upload a package named cv2 to a public repository, and the proxy will download it, harmful code included.
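The fallback behaviour can be sketched as a lookup that checks the private index first and quietly serves a miss from the public one. Everything below is hypothetical: the resolve function, the allow_public_fallback switch, and the index contents and version numbers are illustrations, not the behaviour of any particular proxy product.

```python
from __future__ import annotations


def resolve(name: str, private_index: dict[str, str], public_index: dict[str, str],
            allow_public_fallback: bool = True) -> tuple[str, str]:
    """Return (source, version) for `name`, checking the private index first.

    With allow_public_fallback=True this behaves like the misconfigured proxy:
    a miss in the private index is silently served from the public one.
    """
    if name in private_index:
        return "private", private_index[name]
    if allow_public_fallback and name in public_index:
        return "public", public_index[name]
    raise LookupError(f"{name!r} is not available from an allowed source")


private_index = {"opencv-python": "1.0.0"}   # hypothetical internal mirror
public_index = {"opencv-python": "1.0.0",
                "cv2": "0.0.1"}               # hypothetical attacker upload

print(resolve("cv2", private_index, public_index))  # ('public', '0.0.1')
try:
    resolve("cv2", private_index, public_index, allow_public_fallback=False)
except LookupError as err:
    print(err)  # 'cv2' is not available from an allowed source
```

Turning the fallback off converts a silent compromise into a visible build failure, which is the trade-off the prevention strategies below build on.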

Typosquatting - dependencies with possible wrong names

Typosquatting is an attack that targets users who mistype a search term, package name, or URL. The attacker registers likely misspellings in advance, so that the mistyped name leads to the attacker’s content.

For example, consider the previous case of opencv-python. Since it is a long name, it is reasonable to assume that some developers will type it wrong at some point. An attacker can register different combinations of plausible typos for well-known names; in this case, pythonopencv, python-opencv, or pythonopncv could serve as bait.
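Defenders can generate the same kind of variants to monitor, pre-register, or block them. The sketch below is a hypothetical helper that covers only a few typo classes (dropped or reordered hyphen-separated parts, a single deleted character, two adjacent characters swapped); a name like pythonopncv, which combines a reorder with a missing letter, would require chaining these transformations.

```python
from __future__ import annotations


def typo_variants(name: str) -> set[str]:
    """Generate simple misspellings of a (possibly hyphenated) package name."""
    variants = set()
    parts = name.split("-")
    variants.add("".join(parts))                 # opencv-python -> opencvpython
    if len(parts) == 2:
        swapped = parts[::-1]
        variants.add("-".join(swapped))          # -> python-opencv
        variants.add("".join(swapped))           # -> pythonopencv
    for i in range(len(name)):                   # one character deleted
        variants.add(name[:i] + name[i + 1:])    # e.g. -> opncv-python
    for i in range(len(name) - 1):               # two adjacent characters swapped
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)                       # the real name is not a typo
    return variants


def looks_like_typosquat(requested: str, popular: list[str]) -> str | None:
    """Return the popular package that `requested` appears to imitate, if any."""
    for legit in popular:
        if requested != legit and requested in typo_variants(legit):
            return legit
    return None


popular = ["opencv-python", "requests"]
print(looks_like_typosquat("python-opencv", popular))  # opencv-python
print(looks_like_typosquat("reqests", popular))        # requests
print(looks_like_typosquat("requests", popular))       # None
```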

How can dependency confusion attacks be prevented?

Dependency confusion or substitution attacks can be prevented by following strict rules in your package management process. From the details above, it is clear that a substitution attack is possible only if the package manager fetches dependencies from public repositories or if the configuration rules are broad enough to allow nonspecific versions or names. Let us formalize these observations into a few best practices to prevent dependency confusion.

Using a single private internal repository

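Using a single private internal repository is a straightforward way to combat dependency confusion or substitution attacks. Here, the package manager is configured to never go beyond the internal repository. For example, in Python, this means explicitly setting the --index-url option to a private URL and not adding a second source with --extra-index-url. Note that pip does not treat an extra index as a lower-priority fallback: it considers candidates from every configured index and installs the best match, so a higher version on a public index can win.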
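pip reads these options from its configuration file as well as from the command line (for example, pip install --index-url https://pypi.internal.example.com/simple/ followed by the package name). The sketch below is a hypothetical audit helper, not part of pip: it parses the text of a pip configuration file with the standard library’s configparser and flags any extra-index-url entry, or an index-url that points somewhere other than the private index. The pypi.internal.example.com URL is a placeholder.

```python
from __future__ import annotations

import configparser


def audit_pip_config(text: str, allowed_index: str) -> list[str]:
    """Report pip config settings that let pip look beyond one private index."""
    # interpolation=None stops '%' characters in URLs from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    saw_index_url = False
    for section in parser.sections():
        index_url = parser.get(section, "index-url", fallback=None)
        if index_url is not None:
            saw_index_url = True
            if index_url.rstrip("/") != allowed_index.rstrip("/"):
                problems.append(f"[{section}] index-url is {index_url}, not the private index")
        if parser.has_option(section, "extra-index-url"):
            extra = " ".join(parser.get(section, "extra-index-url").split())
            problems.append(f"[{section}] extra-index-url adds more indexes: {extra}")
    if not saw_index_url:
        problems.append("no index-url in this file; unless set elsewhere, pip uses PyPI")
    return problems


config = """\
[global]
index-url = https://pypi.internal.example.com/simple
extra-index-url = https://pypi.org/simple
"""
for problem in audit_pip_config(config, "https://pypi.internal.example.com/simple/"):
    print(problem)
# [global] extra-index-url adds more indexes: https://pypi.org/simple
```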

The drawback is that the organization then has to implement a carefully curated process for loading dependencies from public repositories into this private repository. This means more effort and manpower.

Many companies opt to have this process managed by a third party. Packagecloud is a good option for teams looking to offload the management of a secure package repository to a qualified team. Check out the packagecloud free trial.

Using dependency scopes or namespaces

Many package managers and public repositories let an organization establish a controlled area of its own. In npm, this concept is called scopes; in Maven, it is accomplished with namespaces (group IDs). Organizations can claim these scopes or namespaces, and the public repository verifies ownership through the registration process or through DNS verification.

One drawback is that developers have to use the scope names when fetching dependencies. It also means changing a lot of existing build scripts if you are migrating to scope-based repositories. While this takes additional effort, it helps ensure that dependencies come from authentic sources.
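One way to enforce the scope convention is a build check that rejects any dependency that is neither under the organization’s scope nor on an allow-list of approved public packages. The sketch below assumes a hypothetical scope @acme and a hand-maintained allow-list. In npm, a scope can additionally be mapped to a private registry with a line such as @acme:registry=https://npm.internal.example.com/ in .npmrc (the registry URL is a placeholder).

```python
from __future__ import annotations

import json


def unscoped_dependencies(package_json: str, scope: str, approved_public: set[str]) -> list[str]:
    """List dependencies that are neither under the organization's scope
    nor on an explicit allow-list of approved public packages."""
    data = json.loads(package_json)
    deps: dict[str, str] = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(data.get(section, {}))
    prefix = scope + "/"
    return sorted(
        name for name in deps
        if not name.startswith(prefix) and name not in approved_public
    )


package_json = """{
  "dependencies": {"@acme/etl-util": "1.0.2", "express": "4.17.1", "etl-util": "1.0.2"}
}"""
print(unscoped_dependencies(package_json, "@acme", {"express"}))  # ['etl-util']
```

The unscoped etl-util is exactly the kind of name an attacker could claim on the public registry, while @acme/etl-util can only be published by whoever owns the scope.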

Using version pinning

Substitution attacks can be prevented to an extent by explicitly specifying the version numbers of dependencies. This ensures that package managers will not download a dependency from a public repository just because a higher version number is available there. In practice, this means pinning each dependency to an exact version such as 2.5.4 rather than a range such as >= 2.5 or 2.5.*.
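A pinning policy is easy to check mechanically. The sketch below is a hypothetical helper that flags requirements.txt lines using anything other than an exact == pin. It relies on a deliberately simplified regular expression rather than the full PEP 440 grammar (it would, for example, also flag === pins), and the package names are placeholders.

```python
from __future__ import annotations

import re

# Name, optional [extras], then exactly "==" and a version with no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[A-Za-z0-9.+!-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]        # drop comments
        line = line.split(";", 1)[0]       # drop environment markers
        line = line.split(" --", 1)[0]     # drop per-requirement options such as --hash
        line = line.replace(" ", "").rstrip("\\").strip()  # spaces, line continuations
        if not line or line.startswith("-"):  # blank lines and global options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


requirements = """\
somepkg==2.5.4
otherpkg>=2.5
thirdpkg==2.5.*
"""
print(unpinned_requirements(requirements))  # ['otherpkg>=2.5', 'thirdpkg==2.5.*']
```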

Version pinning is a client-side control. Hence it will not help if your index is already compromised and someone has already uploaded dependencies with the exact version that the developer is looking for.

Using client-side verification

Integrity checking on the client side while installing dependencies can help avoid dependency confusion and substitution attacks. For example, Python’s pip supports a hash-checking mode that verifies every downloaded dependency against a hash (such as SHA-256) recorded on the client side, typically in the requirements file. With such hash verification in place, a substitution attack has to control both the server and the client to succeed. Maven also has plugins that can verify the PGP signatures of dependencies; these signatures show whether an artifact has been changed after it was signed.
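The idea behind hash checking fits in a few lines: the expected digest is recorded on the client ahead of time, and any downloaded artifact that does not match is rejected, no matter which repository served it. The sha256_of and verify_artifact functions below are hypothetical helpers that illustrate the check; they are not pip’s implementation.

```python
from __future__ import annotations

import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the hash recorded on the client."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"hash mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

With pip itself, the same effect comes from a requirements line such as somepkg==2.5.4 --hash=sha256:<digest> installed with pip install --require-hashes -r requirements.txt; the pip hash command can compute the digest of a local archive.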

If all these best practices to prevent dependency confusion or substitution attacks feel like too much work, you may want to take a look at packagecloud, an artifact repository that supports all programming languages and infrastructure setups.

Packagecloud

Packagecloud can manage all of your packages and deploy them to any infrastructure, on-premise or cloud. It can help set up private registries for npm, Python, Java, and many other popular package types. It abstracts away all the manual configuration needed to prevent dependency confusion or substitution attacks behind a simple, intuitive interface.

It automatically scans all the packages for vulnerabilities, trojan-horse attacks, and dependency poisoning attacks to make sure the packages that go into your mission-critical applications are free of any malicious code. Sign up for the packagecloud free trial to get your machines set up and updated easily!