Dependency Resolvers in Galaxy

There are two parts to building a link between Galaxy and command line bioinformatics tools: the tool XML, which specifies a mapping between the Galaxy web user interface and the tool command line, and the tool dependencies, which specify how to source the actual packages that implement the tool’s commands. The final script that Galaxy submits to run a job includes commands, such as changes to the PATH environment variable, that are generated by dependency resolvers. There is a default dependency resolver configuration, but administrators can provide their own using the dependency_resolvers_conf.xml configuration file in the Galaxy config/ directory.

The binding between a tool's XML and the packages it needs to run is specified in the tool XML using requirement tags, for example:

<requirement type="package" version="0.7.10.039ea20639">bwa</requirement>

In some cases these requirement tags can be specified without a version:

<requirement type="package">bedtools</requirement>

These requirements become the inputs to the dependency resolvers. Each dependency resolver is thus given one or two inputs: the name of the dependency to resolve and, in most cases, the version string of the dependency.
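To make the mapping from requirement tags to resolver inputs concrete, here is a minimal Python sketch. It is an illustration only, not Galaxy's own parsing code: the surrounding tool and requirements elements and the resolver_inputs helper are hypothetical, and only the two requirement tags are taken from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool XML fragment that reuses the two requirement tags shown above.
TOOL_XML = """
<tool id="example">
  <requirements>
    <requirement type="package" version="0.7.10.039ea20639">bwa</requirement>
    <requirement type="package">bedtools</requirement>
  </requirements>
</tool>
"""


def resolver_inputs(tool_xml):
    """Return (name, version) pairs; version is None when the tag omits it."""
    root = ET.fromstring(tool_xml)
    return [
        (req.text.strip(), req.get("version"))
        for req in root.iter("requirement")
        if req.get("type") == "package"
    ]


inputs = resolver_inputs(TOOL_XML)
for name, version in inputs:
    print(name, version)
```

The second pair carries no version, which is the case that versionless resolvers are meant to handle.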

Default Dependency Resolvers

The default configuration of dependency resolvers is equivalent to the following dependency_resolvers_conf.xml:

<dependency_resolvers>
  <!-- the default configuration, first look for dependencies installed from the toolshed -->
  <tool_shed_packages />
  <!-- then look for env.sh files laid out according to the "galaxy packages" schema -->
  <galaxy_packages />
  <galaxy_packages versionless="true" />
  <conda />
  <conda versionless="true" />
</dependency_resolvers>

This default dependency resolver configuration contains five items. The Tool Shed dependency resolver is tried first, followed by the Galaxy packages dependency resolver, which looks for a package first by name and version string and then by name alone. The Conda dependency resolver is then tried in the same two modes. The default configuration thus prefers packages installed from the Galaxy Tool Shed, then tries to find a “Galaxy package” that satisfies the specific version the dependency requires, and then falls back to a Galaxy package with merely the correct name, before consulting Conda. As soon as one of the dependency resolvers succeeds, a dependency resolution object is returned and no more resolvers are called. This dependency resolution object provides shell commands to prepend to the shell script that runs the tool.
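The "first resolver that succeeds wins" behaviour can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Galaxy's implementation: the resolve function and the three stub resolvers are hypothetical, the /deps paths are made up, and a resolution is modelled as a plain list of shell commands rather than a dependency resolution object.

```python
def resolve(name, version, resolvers):
    """Return the commands from the first resolver that succeeds, else None."""
    for resolver in resolvers:
        commands = resolver(name, version)
        if commands is not None:
            return commands  # later resolvers are never consulted
    return None


# Hypothetical stand-ins for configured resolvers.
def tool_shed_stub(name, version):
    return None  # pretend nothing was installed from the Tool Shed


def galaxy_packages_stub(name, version):
    installed = {("bedtools", "2.20.1")}
    if (name, version) in installed:
        return [f". /deps/{name}/{version}/env.sh"]
    return None


def galaxy_packages_versionless_stub(name, version):
    if name in {"bedtools", "bwa"}:
        return [f". /deps/{name}/default/env.sh"]
    return None


chain = [tool_shed_stub, galaxy_packages_stub, galaxy_packages_versionless_stub]
exact = resolve("bedtools", "2.20.1", chain)
fallback = resolve("bwa", "0.7.10.039ea20639", chain)
unresolved = resolve("samtools", None, chain)
```

Here the bedtools request is satisfied by the versioned stub, the bwa request falls through to the versionless stub, and the samtools request is not resolved by any of them.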

Tool Shed Dependency Resolver

The tool_shed_packages dependency resolver works with packages installed from the Galaxy Tool Shed. When a package is installed from the Tool Shed it creates a directory structure under the directory that is specified as the tool_dependency_dir in Galaxy’s configuration. This directory structure contains references to the tool’s name, owner (in the Tool Shed) and version string (amongst other things) and ultimately contains a file named env.sh that contains commands to make the dependency runnable. This is installed, along with the packaged tool, by the tool package and doesn’t require any configuration by the Galaxy administrator.

The Tool Shed dependency resolver is not able to resolve package requirements that do not have a version string, like the bedtools example above.

Galaxy Packages Dependency Resolver

The galaxy_packages dependency resolver allows Galaxy admins to specify how Galaxy should load manually installed packages. This resolver can be configured either to use the version string or in versionless mode.

The Galaxy Packages dependency resolver takes a base_path argument that specifies the path under which it starts looking for the files it requires. The default value for this base_path is the tool_dependency_dir configured in Galaxy’s config/galaxy.ini. Below the base path, the Galaxy Packages resolver looks for directories named after tools, e.g. bedtools. As mentioned before, this resolver works in versioned and versionless mode. The default mode is versioned, where the dependency resolver looks for a directory named after the dependency’s version string. For example, if the Galaxy tool specifies that it needs bedtools version 2.20.1, the dependency resolver will look for a directory bedtools/2.20.1.

If the Galaxy Packages dependency resolver finds a bin directory in this directory, it adds it to the PATH used by the scripts Galaxy uses to run tools. If, however, it finds an env.sh script, it sources this script before running the tool that requires this dependency. This can be used to set up the environment needed for the tool to run. For example, this env.sh uses Environment Modules to set up the environment for bedtools:

#!/bin/sh

if [ -z "$MODULEPATH" ] ; then
  . /etc/profile.d/module.sh
fi

module add bedtools/bedtools-2.20.1

The Galaxy Packages dependency resolver operates quite similarly when used in versionless mode. Instead of looking for a directory named after a version, it looks for a directory named default, for example bedtools/default. It then looks for a bin subdirectory or an env.sh file and incorporates these in the tool script that finally gets run. This versionless (i.e. default) lookup is also used if the package requirement does not specify a version string.
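The lookup rules described above can be summarised in a short Python sketch. It is only an approximation for illustration: the galaxy_package_commands function is hypothetical, the exact shell text Galaxy emits may differ, and checking env.sh before bin is an assumption based on the wording above.

```python
import os
import tempfile


def galaxy_package_commands(base_path, name, version=None, versionless=False):
    """Return a shell snippet for the package, or None if nothing is found."""
    # Versionless mode, or a requirement without a version, uses "default".
    subdir = "default" if versionless or version is None else version
    package_dir = os.path.join(base_path, name, subdir)
    env_sh = os.path.join(package_dir, "env.sh")
    bin_dir = os.path.join(package_dir, "bin")
    if os.path.isfile(env_sh):  # assumption: env.sh takes precedence over bin
        return f". '{env_sh}'"
    if os.path.isdir(bin_dir):
        return f'PATH="{bin_dir}:$PATH"; export PATH'
    return None


# Build a throwaway directory tree to exercise the three outcomes.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "bedtools", "2.20.1", "bin"))
    os.makedirs(os.path.join(base, "bedtools", "default"))
    open(os.path.join(base, "bedtools", "default", "env.sh"), "w").close()

    versioned = galaxy_package_commands(base, "bedtools", "2.20.1")
    by_default = galaxy_package_commands(base, "bedtools", "2.20.1", versionless=True)
    missing = galaxy_package_commands(base, "bedtools", "2.19.0")
```

In this layout the versioned lookup yields a PATH change, the versionless lookup sources bedtools/default/env.sh, and the request for an uninstalled version yields nothing.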

Environment Modules Dependency Resolver

The example above used Environment Modules to set the PATH (and other settings) for bedtools. With the modules dependency resolver it is possible to use Environment Modules directly. This resolver takes these parameters:

modulecmd
path to Environment Modules’ modulecmd tool
modulepath
value used for MODULEPATH environment variable, used to locate modules
versionless
whether to resolve tools using a version string or not (default: false)
find_by
whether to use the DirectoryModuleChecker or AvailModuleChecker (permissible values are “directory” or “avail”, default is “avail”)
prefetch
whether the AvailModuleChecker prefetches module information using module avail (default: true)
default_indicator
the string that indicates to the AvailModuleChecker that a module is the default version (default: “(default)”). Note that the first module found is considered the default when no version is used by the resolver, so the sort order of modules matters.

The Environment Modules dependency resolver can work in two modes. The AvailModuleChecker searches the output of the module avail command for the name of the dependency. If it is configured in versionless mode, or is looking for a package with no version specified, it accepts a module whose name matches and is a bare word, or otherwise the first module whose name matches. For this reason, the default version of the module should be the first one listed, which can be achieved by tagging it with a word that appears first in sort order, for example the string “(default)” (yielding a module name like bedtools/(default)). So when looking for bedtools in versionless mode the search matches the first module called bedtools, whereas in versioned mode a search for bedtools/2.20.1 matches only if a module named bedtools/2.20.1 is present.

The DirectoryModuleChecker looks for files or directories in the path specified by MODULEPATH or MODULESHOME that match the dependency being resolved. In versionless mode a match on simply the dependency name is needed, and in versioned mode a match on the dependency name and version string is needed.

If a module that matches the dependency is found, code to execute modulecmd sh load with the name of the dependency is added to the script that runs the tool, e.g. modulecmd sh load bedtools. If version strings are being used, they are included in the load command, e.g. modulecmd sh load bwa/0.7.10.039ea20639.
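As a rough illustration of how the load command relates to the resolver inputs, the following Python sketch builds the command string for both modes. The module_load_command helper is hypothetical, and the real resolver may wrap or evaluate the command differently; only the two resulting command lines come from the text above.

```python
import shlex


def module_load_command(name, version=None, versionless=False, modulecmd="modulecmd"):
    """Build the load command; the version is dropped in versionless mode."""
    module = name if versionless or version is None else f"{name}/{version}"
    # Quote each part so unusual module names cannot break the shell script.
    return " ".join(shlex.quote(part) for part in (modulecmd, "sh", "load", module))


versionless_cmd = module_load_command("bedtools", versionless=True)
versioned_cmd = module_load_command("bwa", "0.7.10.039ea20639")
print(versionless_cmd)
print(versioned_cmd)
```

The modulecmd argument stands in for the resolver's modulecmd parameter, so a full path to the executable could be passed instead.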

Homebrew Dependency Resolver

This dependency resolver uses Homebrew packages to resolve requirements.

Brew Tool Shed Package Resolver

This dependency resolver would resolve tool shed packages that had been auto-converted to the tool shed. It is highly experimental, undocumented, and will almost certainly be removed from the code base.

Conda Dependency Resolver

The conda XML tag can be used to configure a conda dependency resolver. This resolver can be configured with the following options.

prefix
The conda_prefix in which to locate dependencies (default: <tool_dependency_dir>/_conda).
exec
The conda executable to use; it defaults to the one on the PATH (if available) and then to <conda_prefix>/bin/conda.
versionless
whether to resolve tools using a version string or not (default: false)
debug
Pass debug flag to conda commands (default: false).
ensure_channels
conda channels to enable by default. See http://conda.pydata.org/docs/custom-channels.html for more information about channels. (default: iuc,bioconda,r,defaults,conda-forge).
auto_install
Set to True to instruct Galaxy to look for and install missing tool dependencies before each job runs. (default: False)
auto_init
Set to True to instruct Galaxy to install conda from the web automatically if it cannot find a local copy and conda_exec is not configured.