RubyGem Index Internals

Dec 15, 2015 • packagecloud

TL;DR

This post briefly goes over the contents of a gem package, the RubyGem Gem::Specification, Gem::Indexer and Gem::Server classes, a breakdown of the different RubyGem index files (specs.4.8, prerelease_specs.4.8, and latest_specs.4.8) and how they are used when determining dependencies and installing gem packages onto a system.

What’s in a gem?

A gem is just a tar archive containing the gem’s files and metadata. The files we are looking for are the metadata.gz and data.tar.gz files.

Extract a gem archive like this:

 $ tar -vxf coolgem-1.6.4.gem
 x metadata.gz
 x data.tar.gz
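
The same listing can be produced from Ruby. The sketch below is only an illustration: it builds a stand-in archive in memory (the entry bodies are placeholder bytes, not real gem data) and lists its entries with Gem::Package::TarReader, which ships with RubyGems. The helper names gem_entry_names and build_fake_gem are made up for this example.

```ruby
require 'rubygems/package'
require 'stringio'

# Hypothetical helper: return the entry names of a .gem (tar) archive read from io.
def gem_entry_names(io)
  names = []
  Gem::Package::TarReader.new(io) do |tar|
    tar.each { |entry| names << entry.full_name }
  end
  names
end

# Build a stand-in archive in memory; the bodies are placeholder bytes.
def build_fake_gem
  io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(io) do |tar|
    %w[metadata.gz data.tar.gz].each do |name|
      body = "placeholder for #{name}"
      tar.add_file_simple(name, 0o444, body.bytesize) { |f| f.write(body) }
    end
  end
  io.rewind
  io
end

p gem_entry_names(build_fake_gem)
```

For a gem on disk, something like File.open('coolgem-1.6.4.gem', 'rb') { |f| gem_entry_names(f) } should work the same way.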

Let’s take a look at these data.tar.gz and metadata.gz files.

data.tar.gz

The data.tar.gz file contains the data payload for the gem package. The data folder that gets extracted contains the executable code, library files, and anything else that was listed in the files attribute of the .gemspec file.

uncompressed data folder

├── LICENSE
├── README
├── Rakefile
├── bin
│   └── coolgem
├── lib
│   └── coolgem
│       └── test
│           ├── gem
│           │   └── version.rb
│           └── gem.rb
└── coolgem.gemspec
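
Because data.tar.gz is a gzipped tar archive, reading it takes one extra step compared to the outer .gem file: gunzip first, then walk the tar entries. The following is a minimal sketch under that assumption. The helpers payload_paths and build_fake_payload are hypothetical, and the payload is built in memory purely so the example is self-contained.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Hypothetical helper: list the paths stored inside data.tar.gz bytes.
def payload_paths(data_tar_gz)
  paths = []
  tar_io = StringIO.new(Zlib.gunzip(data_tar_gz))
  Gem::Package::TarReader.new(tar_io) do |tar|
    tar.each { |entry| paths << entry.full_name }
  end
  paths
end

# Build a small gzipped tar in memory from a { path => body } hash.
def build_fake_payload(files)
  tar_io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |path, body|
      tar.add_file_simple(path, 0o644, body.bytesize) { |f| f.write(body) }
    end
  end
  Zlib.gzip(tar_io.string)
end

payload = build_fake_payload('LICENSE' => 'MIT', 'bin/coolgem' => "#!/usr/bin/env ruby\n")
puts payload_paths(payload)
```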

metadata.gz

The metadata.gz file is just a gzipped file containing a YAML representation of the gem package, as defined in a gem’s .gemspec file. The uncompressed YAML metadata contains the gem package information and includes version information, dependencies, and lists all the required paths and files related to the data payload.
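
To make the gzip-plus-YAML layering concrete, here is a small round trip. It is a sketch, not the code gem build runs: pack_metadata and unpack_metadata are hypothetical helpers that gzip a spec's YAML form and rebuild a Gem::Specification from it using Gem::Specification.from_yaml.

```ruby
require 'rubygems'
require 'zlib'

# Gzip a spec's YAML form, roughly what ends up in metadata.gz.
def pack_metadata(spec)
  Zlib.gzip(spec.to_yaml)
end

# Gunzip metadata.gz bytes and rebuild the Gem::Specification from the YAML.
def unpack_metadata(metadata_gz)
  Gem::Specification.from_yaml(Zlib.gunzip(metadata_gz))
end

spec = Gem::Specification.new do |s|
  s.name    = 'coolgem'
  s.version = '1.6.4'
  s.summary = 'Do cool stuff with things'
  s.authors = ['An Person']
  s.add_development_dependency 'rake', '10.4.2'
end

loaded = unpack_metadata(pack_metadata(spec))
puts loaded.full_name
puts loaded.dependencies.map(&:name)
```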

The RubyGem Gem::Specification Class

The Gem::Specification class contains information (including runtime and development dependency information) for a specific gem. It’s typically defined in a .gemspec file, and it’s used by the gem build command when creating a gem package. This is an example of a Gem::Specification used within a .gemspec file:

Gem::Specification.new do |s|
  s.name          = 'coolgem'
  s.version       = '1.6.4'
  s.licenses      = ['MIT']
  s.summary       = "Do cool stuff with things"
  s.description   = "This does cool stuff with some things"
  s.authors       = ["An Person"]
  s.email         = 'an.person@packagecloud.io'
  s.files         = Dir.glob("{bin,lib}/**/*")+ %w(LICENSE README Rakefile)
  s.executables   = ['coolgem'] # names are relative to bindir, which defaults to bin/
  s.require_paths = ["lib", "bin"]
  s.homepage      = 'https://packagecloud.io'

  s.add_development_dependency "rake", "10.4.2"
end
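
Once a specification is loaded, its runtime and development dependencies can be queried separately. The sketch below assumes a spec similar to the one above, with one extra runtime dependency on rack added purely for illustration. dependency_summary is a hypothetical helper.

```ruby
require 'rubygems'

# Hypothetical helper: group a spec's dependency names by type.
def dependency_summary(spec)
  {
    runtime:     spec.runtime_dependencies.map(&:name),
    development: spec.development_dependencies.map(&:name)
  }
end

spec = Gem::Specification.new do |s|
  s.name    = 'coolgem'
  s.version = '1.6.4'
  s.summary = 'Do cool stuff with things'
  s.authors = ['An Person']
  s.add_runtime_dependency     'rack', '~> 1.6' # assumed for illustration only
  s.add_development_dependency 'rake', '10.4.2'
end

p dependency_summary(spec)
```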

The RubyGem Gem::Indexer Class

The Gem::Indexer class is used to build the gem repository index. On initialization, Gem::Indexer takes a directory as its first argument and an optional options hash as its second. The directory must contain a gems sub-directory that holds (or will hold) all the .gem files to be indexed. The options hash controls which kind of index is built: build_modern selects the modern indices described below, while build_modern: false (or build_legacy on older versions) targets the indices used by versions of RubyGems prior to 1.2.

> Gem::Indexer.new(directory, { build_modern: true })

Building the index files

The generate_index method can be called on an instance of Gem::Indexer to build the necessary indices used by the RubyGems API.

> Gem::Indexer.new('/path/to/repo', { build_modern: true }).generate_index

Generating Marshal quick index gemspecs for 2 gems
.
Complete
Generated Marshal quick index gemspecs: 0.001s
Generating specs index
Generated specs index: 0.000s
Generating latest specs index
Generated latest specs index: 0.000s
Generating prerelease specs index
Generated prerelease specs index: 0.000s
Compressing indicies
Compressed indicies: 0.001s
 => ["specs.4.8", "specs.4.8.gz", "latest_specs.4.8", "latest_specs.4.8.gz", "prerelease_specs.4.8", "prerelease_specs.4.8.gz"]

RubyGem Index Files

The RubyGem index files are Marshal’d arrays, each written both uncompressed and gzipped (.gz). The 4.8 in each filename refers to the current version of Ruby’s Marshal format. Read more in the documentation for Ruby’s Marshal format and the Marshal library.

specs.4.8
specs.4.8.gz
latest_specs.4.8
latest_specs.4.8.gz
prerelease_specs.4.8
prerelease_specs.4.8.gz
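
The 4.8 in these filenames can be checked from Ruby itself. The sketch below derives an index file name from the Marshal version constants. index_file_name is a hypothetical helper, while Gem.marshal_version is what RubyGems uses for the same purpose.

```ruby
require 'rubygems'

# Hypothetical helper: build an index file name from the running Ruby's Marshal version.
def index_file_name(base)
  "#{base}.#{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
end

puts index_file_name('specs')
puts Gem.marshal_version

# Every Marshal dump starts with the major and minor version bytes.
p Marshal.dump([]).bytes.first(2)
```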

specs.4.8.gz

The specs.4.8.gz file is a Marshal’d and gzipped array that contains smaller arrays that hold the name, version and platform for each non-prerelease gem that has been indexed.

> Marshal.load(Gem.gunzip(File.read("specs.4.8.gz")))
=> [["coolgem", #<Gem::Version "1.6.4">, "ruby"]]
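
Going the other way shows how little is in the file. The following sketch writes and reads a specs index in memory using plain Marshal and Zlib. dump_index and load_index are hypothetical helpers, not the methods Gem::Indexer uses internally.

```ruby
require 'rubygems'
require 'zlib'

# Serialize [name, version, platform] tuples: Marshal first, then gzip.
def dump_index(tuples)
  Zlib.gzip(Marshal.dump(tuples))
end

# Reverse of dump_index. Only call Marshal.load on data you trust.
def load_index(bytes)
  Marshal.load(Zlib.gunzip(bytes))
end

tuples = [['coolgem', Gem::Version.new('1.6.4'), 'ruby']]
index  = load_index(dump_index(tuples))
index.each { |name, version, platform| puts "#{name} #{version} (#{platform})" }
```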

prerelease_specs.4.8.gz

Similarly, the prerelease_specs.4.8.gz file is a Marshal’d and gzipped array that contains smaller arrays holding the name, version and platform for each prerelease gem that has been indexed.

> Marshal.load(Gem.gunzip(File.read("prerelease_specs.4.8.gz")))
=> [["rack", #<Gem::Version "1.6.0.beta2">, "ruby"]]
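
Whether a version counts as a prerelease can be checked with Gem::Version#prerelease?. As a rough sketch of how a list of tuples could be split between the two index files, assuming that check is the only criterion (split_by_prerelease is a hypothetical helper):

```ruby
require 'rubygems'

# Hypothetical helper: split tuples into release and prerelease groups.
def split_by_prerelease(tuples)
  prerelease, release = tuples.partition { |_name, version, _platform| version.prerelease? }
  { release: release, prerelease: prerelease }
end

tuples = [
  ['coolgem', Gem::Version.new('1.6.4'), 'ruby'],
  ['rack',    Gem::Version.new('1.6.0.beta2'), 'ruby']
]

groups = split_by_prerelease(tuples)
puts groups[:release].map(&:first)
puts groups[:prerelease].map(&:first)
```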

latest_specs.4.8.gz

The latest_specs.4.8.gz file is also a Marshal’d and gzipped array containing smaller arrays holding the name, version and platform for only the latest non-prerelease gems that have been indexed. This index file is only useful when you are certain you want to install the latest version of a gem.

> Marshal.load(Gem.gunzip(File.read("latest_specs.4.8.gz")))
=> [["coolgem", #<Gem::Version "1.6.4">, "ruby"]]
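
Conceptually, the latest index is the full specs index reduced to the highest non-prerelease version per gem. The sketch below shows one way to compute that, grouping by name and platform. latest_tuples is a hypothetical helper and an approximation, not the indexer's own implementation.

```ruby
require 'rubygems'

# Hypothetical helper: keep the highest non-prerelease version per name and platform.
def latest_tuples(tuples)
  tuples
    .reject   { |_name, version, _platform| version.prerelease? }
    .group_by { |name, _version, platform| [name, platform] }
    .map      { |_key, group| group.max_by { |_name, version, _platform| version } }
end

tuples = [
  ['coolgem', Gem::Version.new('1.6.3'), 'ruby'],
  ['coolgem', Gem::Version.new('1.6.4'), 'ruby'],
  ['rack',    Gem::Version.new('1.6.0.beta2'), 'ruby']
]

latest_tuples(tuples).each { |name, version, _platform| puts "#{name} #{version}" }
```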

Gem dependencies and RubyGem gemspec files

The dependency information for an indexed gem can be found inside the gem’s gemspec. When Gem::Indexer generates the index for a set of gems, it iterates over each .gem file in the gems directory and parses its gemspec. Each parsed gemspec is then Marshal’d, deflated with zlib (note: deflated, not gzipped, which is why the example further down uses inflate rather than gunzip), and written as a .gemspec.rz file under quick/Marshal.4.8/. The quick directory lives inside the directory that was passed to Gem::Indexer on initialization, and the Marshal.4.8 directory is namespaced to Ruby’s current Marshal format:

quick
└── Marshal.4.8
    ├── coolgem-1.6.4.gemspec.rz
    └── rack-1.6.0.beta2.gemspec.rz

Taking a look at the prerelease rack gem, we can see the dependency information:

> spec = Marshal.load(Gem.inflate(File.read('quick/Marshal.4.8/rack-1.6.0.beta2.gemspec.rz')))
> spec.dependencies
[<Gem::Dependency type=:development name="bacon" requirements=">= 0">, <Gem::Dependency type=:development name="rake" requirements=">= 0">]
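
The .gemspec.rz format can be reproduced with the standard library alone: Marshal for serialization and Zlib deflate for compression. The following sketch round-trips a spec that way. The helper names are hypothetical, and the real indexer may trim the spec before dumping it.

```ruby
require 'rubygems'
require 'zlib'

# Marshal a spec, then deflate it, as in a .gemspec.rz file.
def dump_quick_spec(spec)
  Zlib::Deflate.deflate(Marshal.dump(spec))
end

# Inflate and unmarshal .gemspec.rz bytes. Only load data you trust.
def load_quick_spec(bytes)
  Marshal.load(Zlib::Inflate.inflate(bytes))
end

# Relative path of a spec inside the quick directory.
def quick_spec_path(spec)
  "quick/Marshal.#{Gem.marshal_version}/#{spec.full_name}.gemspec.rz"
end

spec = Gem::Specification.new do |s|
  s.name    = 'rack'
  s.version = '1.6.0.beta2'
  s.summary = 'Stand-in spec for illustration'
  s.authors = ['An Person']
  s.add_development_dependency 'bacon'
  s.add_development_dependency 'rake'
end

loaded = load_quick_spec(dump_quick_spec(spec))
puts quick_spec_path(loaded)
puts loaded.dependencies.map(&:name)
```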

The Gem::Server Class

The Gem::Server class provides a way for users to consume gem packages via gem install. Gem::Server.new creates a server for a given port, and calling run on it starts the server, which allows users to download the different index files, gemspec files, rdoc documentation, and installable gem packages on a set of routes.

Gem::Server.new(Gem.dir, 8089, false).run

Routes

From the stdlib docs on Gem::Server:

/                    - Browsing of gem spec files for installed gems
/specs.4.8.gz        - specs name/version/platform index
/latest_specs.4.8.gz - latest specs name/version/platform index
/quick/              - Individual gemspecs
/gems                - Direct access to download the installable gems
/rdoc?q=             - Search for installed rdoc documentation

Gem Indices and Installing Gem Packages

These index files (latest_specs.4.8, specs.4.8, prerelease_specs.4.8) are requested when the gem install command is used to install a gem package. The following examples show the different specs files that are used:

when installing a gem with a specific version: gem install rails -v=4.0.0

https://packagecloud.io/computology/test-gems/specs.4.8.gz
200 OK

when installing a gem without specific version: gem install rails

https://packagecloud.io/computology/test-gems/latest_specs.4.8.gz
200 OK

when installing a prerelease version of a gem using the --pre flag: gem install rails --pre

https://packagecloud.io/computology/test-gems/prerelease_specs.4.8.gz
200 OK
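
The three cases above can be summarized as a small lookup. This is only a sketch of the mapping described in this post, not the logic of the gem command itself, which may request more than one index. index_file_for and index_url are hypothetical helpers.

```ruby
# Hypothetical helper: pick the index file for the three cases described above.
def index_file_for(version: nil, prerelease: false)
  return 'prerelease_specs.4.8.gz' if prerelease
  return 'specs.4.8.gz' if version

  'latest_specs.4.8.gz'
end

# Hypothetical helper: full URL of the index file on a given gem source.
def index_url(source, version: nil, prerelease: false)
  "#{source.chomp('/')}/#{index_file_for(version: version, prerelease: prerelease)}"
end

source = 'https://packagecloud.io/computology/test-gems'
puts index_url(source, version: '4.0.0')
puts index_url(source)
puts index_url(source, prerelease: true)
```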

Once the required specs file has been fetched, the gemspec file for the gem to be installed is downloaded:

https://packagecloud.io/computology/test-gems/quick/Marshal.4.8/coolgem-1.6.4.gemspec.rz
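
The gemspec URL follows directly from the source URL plus the gem's name and version. A minimal sketch, assuming a gem for the default ruby platform (quick_gemspec_url is a hypothetical helper):

```ruby
require 'rubygems'

# Hypothetical helper: URL of the deflated gemspec for a ruby-platform gem.
def quick_gemspec_url(source, name, version)
  "#{source.chomp('/')}/quick/Marshal.#{Gem.marshal_version}/#{name}-#{version}.gemspec.rz"
end

puts quick_gemspec_url('https://packagecloud.io/computology/test-gems', 'coolgem', '1.6.4')
```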

Resolving gem dependencies

Once the gemspec file for a gem is downloaded, its dependencies can be resolved. This process of walking through the specs and gemspec files is repeated for each dependency until all required dependencies have been installed.
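
As a rough illustration of that walk, here is a deliberately naive resolver over an in-memory repository. It is not the resolver RubyGems or Bundler uses: it ignores cycles and conflicting requirements, the cooldep gem is invented for the example, and example_repo and resolve are hypothetical helpers.

```ruby
require 'rubygems'

# Hypothetical stand-in for a repository: gem name => { version => dependencies }.
# A real client would read this from the specs index and the quick/ gemspecs.
def example_repo
  {
    'coolgem' => {
      Gem::Version.new('1.6.4') => [Gem::Dependency.new('cooldep', '>= 1.0')]
    },
    'cooldep' => {
      Gem::Version.new('1.0.0') => [],
      Gem::Version.new('1.2.0') => []
    }
  }
end

# Naive depth-first walk: pick the newest version satisfying each dependency,
# resolve that version's own dependencies first, then record it.
def resolve(dependency, repo, resolved = {})
  return resolved if resolved.key?(dependency.name)

  versions = repo.fetch(dependency.name) { raise "unknown gem: #{dependency.name}" }
  version  = versions.keys.select { |v| dependency.requirement.satisfied_by?(v) }.max
  raise "no version of #{dependency.name} matches #{dependency.requirement}" unless version

  versions[version].each { |dep| resolve(dep, repo, resolved) }
  resolved[dependency.name] = version
  resolved
end

order = resolve(Gem::Dependency.new('coolgem', '= 1.6.4'), example_repo)
order.each { |name, version| puts "install #{name} #{version}" }
```

Because a gem is recorded only after its dependencies, the resulting hash lists dependencies before the gems that need them.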

Downloading and Unpacking a gem package

The actual .gem packages are found in the indexed gems directory. The RubyGems API expects direct access to download the installable gems via the /gems path of your source:

https://packagecloud.io/computology/test-gems/gems/coolgem-1.6.4.gem
200 OK

After all the dependencies are installed, the gem command will unpack and install the gem contents to your system:

$ sudo gem install -V coolgem
...
GET https://packagecloud.io/computology/test-gems/latest_specs.4.8.gz
200 OK
GET https://packagecloud.io/computology/test-gems/quick/Marshal.4.8/coolgem-1.6.4.gemspec.rz
200 OK
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/Gemfile
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/LICENSE.txt
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/README.md
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/Rakefile
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/bin/coolgem
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/lib/coolgem/test/gem.rb
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/lib/coolgem/test/gem/version.rb
/Users/person/.rvm/rubies/ruby-2.2.3-p451/lib/ruby/gems/2.2.3/gems/coolgem-1.6.4/coolgem.gemspec
/Users/person/.rvm/rubies/ruby-2.2.3-p451/bin/coolgem
Successfully installed coolgem-1.6.4
1 gem installed

Conclusion

When building your own gem server, or trying to understand how the gem command installs packages and resolves dependencies, it’s helpful to understand what the index and gemspec files are, and how they are used by the gem command. While knowing how these files and indices are used isn’t a requirement, it can definitely assist when debugging why gem or Bundler isn’t finding a particular gem package or when a gem dependency breaks your app. Happy packaging.
