Similar to our APT Repository Internals and YUM Repository Internals posts, this post aims to illustrate the inner workings of a Maven repository. Read on if you have ever been curious as to how
mvn compile figures out which dependencies to download and how to retrieve them in order to build your project.
In this post we’ll examine how dependencies are defined and resolved within your maven project, then we’ll dive into how maven repositories make these dependencies available for consumption.
- What is a maven dependency?
- What is a maven repository?
- Related Posts
What is a maven dependency?
A maven dependency is an artifact that your project (or Maven itself, in the case of Maven plugins) needs to have during the maven build lifecycle.
These are declared in the
<dependencies/> section of your project’s
pom.xml file like this:
<dependencies>
  <dependency>
    <groupId>io.packagecloud</groupId>
    <artifactId>client</artifactId>
    <version>3.0.0</version>
  </dependency>
</dependencies>
Most dependency declarations consist of groupId, artifactId, and version fields. A group of these key/value pairs is referred to as the Maven Coordinates for a particular dependency and, much like geographical coordinates, they allow you to precisely specify a particular dependency in an absolute way.
How does maven locate and resolve dependencies?
Unlike other repository formats (APT, YUM, RubyGems), there is no main index file that enumerates all possible artifacts available for that repository. Maven uses the coordinate values for a given dependency to construct a URL according to the Maven repository layout.
Maven Repository Layout mapping
For primary artifacts (explained below) the URL template looks like:
According to the specification, the rule is:
$groupId is an array of strings made by splitting the groupId on “.”; each string becomes a directory.
So for the groupId value of org.example.subdepartment, the $groupId array would be [org, example, subdepartment], which when translated into directories, becomes /org/example/subdepartment.
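The mapping from coordinates to a path can be sketched in a few lines of Python. This is only an illustration, assuming the standard Maven 2 layout of group directories, then artifactId, then version, then a file named artifactId-version.extension. The function names are hypothetical and are not part of Maven.

```python
def group_path(group_id: str) -> str:
    """Split a groupId on '.' and join the pieces as directories."""
    return "/".join(group_id.split("."))


def primary_artifact_path(group_id: str, artifact_id: str, version: str,
                          extension: str = "jar") -> str:
    """Relative path of a primary artifact (no classifier).

    Assumes the standard layout:
    <group dirs>/<artifactId>/<version>/<artifactId>-<version>.<extension>
    """
    file_name = f"{artifact_id}-{version}.{extension}"
    return f"{group_path(group_id)}/{artifact_id}/{version}/{file_name}"


if __name__ == "__main__":
    print(group_path("org.example.subdepartment"))
    print(primary_artifact_path("io.packagecloud", "client", "3.0.0"))
    print(primary_artifact_path("io.packagecloud", "client", "3.0.0", "pom"))
```

Appending the resulting relative path to a repository's base URL gives the location Maven would request.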
One of the core features of Maven is its ability to handle Transitive Dependencies. That is, to find and download the dependencies of your dependencies, and their dependencies also, recursively, until they are all satisfied.
Just as your own Maven project has a
pom.xml file listing its main dependencies, those dependencies also have a remote
pom file serving a similar purpose. Maven uses this file to figure out what other dependencies to download. When a coordinate does not contain a
classifier, it is considered a primary artifact and is expected to have a remote pom.
Let’s resolve the
jar for the given coordinates at the beginning of this post:
<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0</version>
</dependency>
We turn the groupId into
/io/packagecloud, then construct the rest of the URL with
$versionId, like so:
Similarly, for the extension of
Secondary artifacts, or “attached artifacts”, are dependencies that you want maven to download that are ancillary to your project. Most often they are used to download the
sources for a particular dependency. However, unlike a primary artifact, a secondary artifact is not expected to have a remote
pom and thus never has any dependencies.
They can be specified in the
<dependencies/> section just like primary artifacts:
<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0</version>
  <classifier>sources</classifier>
</dependency>
Or, you can download them using
mvn install:install-file, like so:
$ mvn install:install-file -DgroupId=io.packagecloud \ -DartifactId=client \ -Dversion=3.0.0 \ -Dclassifier=sources \
The URL template for secondary artifacts is just like the one for primary artifacts, but with an additional classifier segment in the file name, placed between the version and the extension.
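As a rough sketch, the path of an attached artifact can be built the same way as a primary one, assuming the standard layout in which the classifier is appended to the version in the file name. The helper name artifact_path is hypothetical.

```python
from typing import Optional


def artifact_path(group_id: str, artifact_id: str, version: str,
                  extension: str = "jar",
                  classifier: Optional[str] = None) -> str:
    """Relative repository path for a primary or attached artifact.

    Assumes the standard layout; when a classifier is given, the file
    name becomes <artifactId>-<version>-<classifier>.<extension>.
    """
    directory = "/".join(group_id.split(".") + [artifact_id, version])
    suffix = f"-{classifier}" if classifier else ""
    return f"{directory}/{artifact_id}-{version}{suffix}.{extension}"


if __name__ == "__main__":
    # The sources artifact from the dependency declaration above.
    print(artifact_path("io.packagecloud", "client", "3.0.0",
                        classifier="sources"))
```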
To verify the downloaded artifacts Maven computes the
sha1 checksum for that artifact and compares it to the values found in the checksum files located at
NOTE: This is strictly meant as a way to quickly verify downloads, and it is NOT meant to be used for authentication or security purposes. This is also NOT a substitute for using HTTPS, as checksums can be trivially intercepted and modified along with the modified artifacts.
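The comparison itself is simple. Here is a minimal sketch, assuming the checksum file's first whitespace-separated token is the hex digest (some files also carry a file name after it). The function names are hypothetical, and this is not how Maven implements the check internally.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def checksum_matches(artifact_bytes: bytes, checksum_file_text: str) -> bool:
    """Compare a downloaded artifact against the text of its .sha1 file.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected digest; anything after it is ignored.
    """
    tokens = checksum_file_text.split()
    if not tokens:
        return False
    return sha1_hex(artifact_bytes) == tokens[0].lower()


if __name__ == "__main__":
    payload = b"example artifact bytes"
    sha1_file = sha1_hex(payload) + "  client-3.0.0.jar\n"
    print(checksum_matches(payload, sha1_file))
    print(checksum_matches(payload + b"tampered", sha1_file))
```

As the note above says, a match only indicates that the download was not corrupted in transit.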
For example, the
sha1 file for our
jar artifact would be located at:
And the md5 file for our
pom artifact would be located at:
To absolutely ensure the authenticity of downloaded artifacts, you can configure Maven to download and validate the cryptographic signatures for the artifacts and checksums it downloads (if available).
The artifact is signed and deployed to a repository at the following URLs:
The checksums for those artifacts are also signed and deployed at the following URLs:
What is a maven repository?
A Maven repository is wherever these constructed artifact URLs live. Most of the time, this is a Web server with a
/maven2 document root, but it can actually be any protocol Maven has a transport plugin for.
To make it easier for humans to discover artifacts, most Web-based repositories are configured to render virtual directory listings. For instance, the Maven Central repository lets you browse the entire
org.apache group this way: http://repo1.maven.org/maven2/org/apache/.
The local repository
Before Maven attempts to download a particular artifact from a remote repository it checks the local repository. This is usually located at
$HOME/.m2/repository. The local repository follows the same standard repository layout as remote repositories.
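Because the local repository uses the same layout, checking it amounts to building a file path instead of a URL. The sketch below assumes the default location under the user's home directory. The function names and the local_repo parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def local_artifact_file(group_id: str, artifact_id: str, version: str,
                        extension: str = "jar",
                        local_repo: Optional[str] = None) -> Path:
    """Where an artifact would sit in the local repository.

    Assumes the default root of ~/.m2/repository unless local_repo
    is supplied.
    """
    root = Path(local_repo) if local_repo else Path.home() / ".m2" / "repository"
    file_name = f"{artifact_id}-{version}.{extension}"
    return root.joinpath(*group_id.split("."), artifact_id, version, file_name)


def needs_download(group_id: str, artifact_id: str, version: str,
                   extension: str = "jar",
                   local_repo: Optional[str] = None) -> bool:
    """True when the artifact is not already present locally."""
    target = local_artifact_file(group_id, artifact_id, version,
                                 extension, local_repo)
    return not target.is_file()


if __name__ == "__main__":
    print(local_artifact_file("io.packagecloud", "client", "3.0.0"))
    print(needs_download("io.packagecloud", "client", "3.0.0"))
```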
Remote repositories are defined in your project’s
pom.xml file under the
<repositories/> section. For example:
<repositories>
  <repository>
    <id>computology-packagecloud-test-packages</id>
    <url>https://packagecloud.io/computology/packagecloud-test-packages/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
You’ll notice that besides an
<id/> element, there are two elements that can be enabled or disabled, <releases/> and <snapshots/>.
If you are on Maven 2.x, then this would be
<repository/> and <snapshotRepository/>, respectively. Previously,
<repository/> definitions were implicitly release repositories, and it was not possible to support both releases and snapshots.
Repository search order
As of Maven 3.x, repositories are searched in the order in which they are declared.
Release and SNAPSHOT repositories
As seen above, there are two features that can be enabled on repositories, even at the same time.
This is enabled by default on all defined repositories and it simply means that this repository should be added to the list of repositories to use for resolving “released” artifacts. These are artifacts that once published to a coordinate, must not be changed.
Because of the heavily cached and distributed nature of Maven repositories (think of everyone's local repository and remote mirrors), you are strongly discouraged from deleting and republishing a changed artifact under the same coordinates. Unless every copy of the previous artifact can be purged from all repositories containing it, this makes it difficult to ensure that everyone receives the same artifact given the same coordinates.
When a repository has the “snapshot” feature enabled, this means that Maven will add this to the list of repositories to use only when resolving
SNAPSHOT versions of your dependencies.
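One way to picture how the two features interact with the search order is a filter over the declared repositories. This is a simplified sketch, not Maven's resolver: the Repository class and candidate_repositories function are hypothetical, and the two entries mirror the repository definitions shown in this post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repository:
    id: str
    url: str
    releases_enabled: bool
    snapshots_enabled: bool


def candidate_repositories(repositories: List[Repository],
                           version: str) -> List[Repository]:
    """Repositories to consult for a version, kept in declared order.

    SNAPSHOT versions only use repositories with snapshots enabled;
    every other version only uses repositories with releases enabled.
    """
    is_snapshot = version.endswith("-SNAPSHOT")
    return [
        repo for repo in repositories
        if (repo.snapshots_enabled if is_snapshot else repo.releases_enabled)
    ]


if __name__ == "__main__":
    repos = [
        Repository("central", "http://repo1.maven.org/maven2", True, False),
        Repository(
            "computology-packagecloud-test-packages",
            "https://packagecloud.io/computology/packagecloud-test-packages/maven2",
            True, True),
    ]
    print([r.id for r in candidate_repositories(repos, "3.0.0")])
    print([r.id for r in candidate_repositories(repos, "3.0.0-SNAPSHOT")])
```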
What are SNAPSHOT versions?
Having to increase the version and permanently release your software every iteration can painfully lengthen your feedback cycles. Maven solves this problem with SNAPSHOT versions.
SNAPSHOT version dependencies look just like regular dependencies, except the version will have
-SNAPSHOT appended to it. For example:
<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
</dependency>
The idea is that you can continuously push your latest changes to
3.0.0-SNAPSHOT and anyone depending on it will get the latest changes every time they build their project. Then, after a few iterations, once everyone is happy with the latest state of
3.0.0-SNAPSHOT, it can be permanently released as
3.0.0, and rapid development can continue on
In order to determine the latest artifact to download for a particular
SNAPSHOT version, Maven uses the Standard Repository Layout to locate a
maven-metadata.xml file for that dependency. For example, using our SNAPSHOT dependency above, Maven constructs the following URL:
This file looks like this:
<metadata modelVersion="1.1.0">
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20161003.234325</timestamp>
      <buildNumber>2</buildNumber>
    </snapshot>
    <lastUpdated>20161003234325</lastUpdated>
    <snapshotVersions>
      <snapshotVersion>
        <extension>jar</extension>
        <value>3.0.0-20161003.234325-2</value>
        <updated>20161003234325</updated>
      </snapshotVersion>
      <snapshotVersion>
        <extension>pom</extension>
        <value>3.0.0-20161003.234325-2</value>
        <updated>20161003234325</updated>
      </snapshotVersion>
    </snapshotVersions>
  </versioning>
</metadata>
According to version 1.1.0 of the Maven Repository Metadata Model (latest at time of writing),
<snapshotVersion> contains the latest artifact corresponding to this snapshot version.
Using the <value> of that
<snapshotVersion> as the
$version in our URL construction scheme, we get the following URL for the
Checksums and signatures work as expected:
As more snapshot artifacts are pushed to 3.0.0-SNAPSHOT,
maven-metadata.xml will always get updated to reflect the latest
<snapshotVersion> to use.
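The lookup described above can be sketched with Python's standard XML parser, using a trimmed copy of the metadata shown earlier. It assumes the metadata has no XML namespace and that entries are matched on extension only (classifiers are ignored). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the maven-metadata.xml example from this post.
METADATA = """
<metadata modelVersion="1.1.0">
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
  <versioning>
    <snapshotVersions>
      <snapshotVersion>
        <extension>jar</extension>
        <value>3.0.0-20161003.234325-2</value>
      </snapshotVersion>
      <snapshotVersion>
        <extension>pom</extension>
        <value>3.0.0-20161003.234325-2</value>
      </snapshotVersion>
    </snapshotVersions>
  </versioning>
</metadata>
"""


def snapshot_value(metadata_xml: str, extension: str) -> str:
    """Return the <value> of the <snapshotVersion> for an extension."""
    root = ET.fromstring(metadata_xml)
    for entry in root.findall("./versioning/snapshotVersions/snapshotVersion"):
        if entry.findtext("extension") == extension:
            return entry.findtext("value")
    raise LookupError(f"no snapshotVersion for extension {extension!r}")


def snapshot_artifact_path(metadata_xml: str, extension: str = "jar") -> str:
    """Relative path of the latest snapshot artifact.

    The directory keeps the -SNAPSHOT version, while the file name
    uses the timestamped value from the metadata.
    """
    root = ET.fromstring(metadata_xml)
    group_id = root.findtext("groupId")
    artifact_id = root.findtext("artifactId")
    base_version = root.findtext("version")
    value = snapshot_value(metadata_xml, extension)
    file_name = f"{artifact_id}-{value}.{extension}"
    return "/".join(group_id.split(".") + [artifact_id, base_version, file_name])


if __name__ == "__main__":
    print(snapshot_artifact_path(METADATA, "jar"))
    print(snapshot_artifact_path(METADATA, "pom"))
```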
Unique vs Non-Unique Snapshots
There are two snapshot “styles” that Maven can use.
These are the snapshot versions detailed in the example above. They use a high-resolution timestamp as a version, and clients must consult a
maven-metadata.xml file to resolve the latest. This is the only snapshot style supported by Maven 3.
Maven 2 allowed you to set a
<uniqueVersion>false</uniqueVersion> on a repository definition. When this behavior is selected, there is no
maven-metadata.xml file that is used and “-SNAPSHOT” versions are not treated any differently. The artifact is resolved just like any other. Thus, the URL for our example in a non-unique repository context would look like this:
This artifact URL simply gets overwritten every time there is a new version pushed up at those coordinates.
Due to the obvious issues this introduces, this style has been deprecated for a while now and is completely unsupported in Maven 3.
Maven Central and the Super pom.xml
In addition to your project’s
pom.xml, Maven uses a “Super”
pom.xml to inherit some default configuration shared by all Maven installations. This is where the default repository, Maven Central, is defined:
<repositories>
  <repository>
    <id>central</id>
    <name>Maven Repository Switchboard</name>
    <layout>default</layout>
    <url>http://repo1.maven.org/maven2</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
That is why you can depend on artifacts hosted at Maven Central without having to define the repository.
Knowing how Maven constructs URLs and resolves dependencies can help you debug issues with your Maven repository. For more information, be sure to check out the official Maven documentation and Maven Source Code.