Dependency Resolvers in Galaxy
There are two parts to building a link between Galaxy and command line bioinformatics tools: (1) the tool XML that
specifies a mapping between the Galaxy web user interface and the tool command line, and (2) the actual command-line
tools, known as Galaxy tool dependencies, which must be installed and available on the system(s) where Galaxy is
configured to run those tools. The job script that Galaxy uses to run a job includes commands (such as changes to the
PATH environment variable) that are generated by dependency resolvers. These same dependency resolvers are used by
the Galaxy administrative UI to display whether an installed tool’s dependencies have been installed on the Galaxy
server, and to show how they will be resolved at job runtime. There is a default dependency resolver configuration, but
administrators can provide their own configuration using the dependency_resolvers option in Galaxy’s
configuration file, galaxy.yml. Previously this configuration was stored in a separate XML file,
dependency_resolvers_conf.xml. Loading the dependency resolvers configuration from that XML file is deprecated but
still supported; however, the documentation and sample configuration file for the XML format can only be found in Galaxy
releases prior to 21.09.
Note
The tool XML referred to below is different from the deprecated dependency resolvers XML referred to above.
The binding between tool XML and the command-line tools they need to run is specified in the tool XML using
<requirement> tags, for example:
<requirement type="package" version="0.7.10.039ea20639">bwa</requirement>
In some cases these requirement tags can be specified without a version:
<requirement type="package">bedtools</requirement>
These declared requirements are passed as inputs to the dependency resolver in order to generate the environmental setup
in the job script so that the correct tool dependencies required by the tool are found on the $PATH.
Default Dependency Resolvers
The default configuration of dependency resolvers is equivalent to the following configuration in galaxy.yml:
galaxy:
  dependency_resolvers:
    - type: tool_shed_packages
    - type: galaxy_packages
    - type: conda
    - type: galaxy_packages
      versionless: true
    - type: conda
      versionless: true
This default dependency resolver configuration contains five items:
1. First, the Tool Shed packages dependency resolver is used, which resolves packages installed from the Galaxy Tool Shed using legacy tool_dependencies.xml files,
2. then the Galaxy packages dependency resolver is checked for a package matching the requirement name and version,
3. then the Conda dependency resolver is checked for a package matching the requirement name and version.
If no versioned match can be found, it then moves on to searching for unversioned matches, that is,
4. the Galaxy packages dependency resolver is checked for a package matching the required name only, and
5. finally the Conda dependency resolver is checked for a package matching the required name only.
If any of the dependency resolvers succeed, a dependency resolution object is returned and no more resolvers are called. This dependency resolution object provides shell commands to prepend to the shell script that runs the command-line tool.
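The first-match-wins behaviour described above can be illustrated with a minimal sketch. This is not Galaxy's implementation: the resolve function, the Resolver signature, the two toy resolvers, and the /opt/pkgs paths are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A requirement is a (name, version) pair; version is None when the
# <requirement> tag carries no version attribute.
Requirement = Tuple[str, Optional[str]]
# Hypothetical resolver signature: shell commands on success, None on failure.
Resolver = Callable[[Requirement], Optional[List[str]]]


def resolve(requirement: Requirement, resolvers: List[Resolver]) -> Optional[List[str]]:
    """Try each resolver in its configured order and stop at the first success."""
    for resolver in resolvers:
        commands = resolver(requirement)
        if commands is not None:
            return commands  # later resolvers are never consulted
    return None


def versioned_packages(req: Requirement) -> Optional[List[str]]:
    # Toy stand-in for a versioned resolver: only knows bedtools 2.20.1.
    if req == ("bedtools", "2.20.1"):
        return ["export PATH=/opt/pkgs/bedtools/2.20.1/bin:$PATH"]
    return None


def versionless_packages(req: Requirement) -> Optional[List[str]]:
    # Toy stand-in for a versionless resolver: matches on the name only.
    if req[0] == "bedtools":
        return ["export PATH=/opt/pkgs/bedtools/default/bin:$PATH"]
    return None


# Versioned resolvers are listed before versionless ones, as in the default order.
chain = [versioned_packages, versionless_packages]
```

With this chain, a request for bedtools 2.20.1 is answered by the versioned resolver, while a request for any other bedtools version falls through to the versionless one.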
This order can be thought of as a descending order of deliberation. Tool Shed dependencies must be declared next to the tool by the tool author and must be selected for installation at tool installation time; this requires specific actions by both the tool author and the deployer who installed the tools. The dependency is therefore highly crafted to the individual tool. If Galaxy packages have been set up, the deployer of a Galaxy tool has purposely crafted tool dependency statements for a specific installation. This is slightly less deliberate than Tool Shed packages, but such requirements are less likely to be incidentally resolved than Conda packages. Conda recipes are tied neither to tools nor to a specific installation and are maintained in Conda channels such as Bioconda.
So although Tool Shed packages come first in this order, they are also somewhat deprecated. Maintaining Conda recipes makes it easier to describe software dependencies both inside and outside of Galaxy.
Tool Shed Dependency Resolver
- type: tool_shed_packages
The tool_shed_packages dependency resolver works with explicit software packages installed from the Galaxy Tool
Shed as described by legacy tool_dependencies.xml files. When such a package is installed from the Tool Shed it
creates a directory structure under the directory that is specified as the tool_dependency_dir in Galaxy’s
configuration. This directory structure contains references to the tool’s ID, owner (in the Tool Shed) and version
string (amongst other things) and ultimately contains a file named env.sh that contains commands to make the
dependency runnable. This env.sh file is installed, along with the packaged tool, by the tool package and doesn’t
require any configuration by the Galaxy administrator. The Tool Shed-specific components of the path come from
Galaxy’s install database (see the install_database_connection option in the Galaxy configuration) and are not
configured by hand.
All new and updated tools in the Tool Shed that follow Galaxy IUC best practices no longer use Tool Shed dependencies, and must have dependencies resolvable via Conda. Because of this, the Tool Shed resolver is largely only relevant to older Galaxy servers that have old tools installed.
The Tool Shed dependency resolver is not able to resolve package requirements that do not have a version string,
like the bedtools example above.
Galaxy Packages Dependency Resolver
- type: galaxy_packages
  versionless: <true|false>
  base_path: <filesystem path>
The galaxy_packages dependency resolver allows Galaxy admins to specify how Galaxy should load manually
installed packages.
This resolver can be configured with the following parameters, all of which are optional:
- base_path
The path under which the resolver looks for packages matching the tool’s specified requirements. The default value is the value of the tool_dependency_dir option in Galaxy’s configuration file.
- versionless
Ignore requirement versions and use the “default” version instead (see below).
Below the base path, the Galaxy Packages resolver looks for a directory matching the requirement name, e.g.
bedtools. Inside the name directory, the resolver looks for a directory matching the requirement version. For
example, if the Galaxy tool specifies that it needs bedtools version 2.20.1, the dependency resolver will look for
a directory <base_path>/bedtools/2.20.1.
If the Galaxy Package dependency resolver finds a bin directory in this directory, it adds it to the PATH
used by the scripts Galaxy uses to run tools. If, however, it finds an env.sh script, it sources this
script before running the tool that requires this dependency. This can be used to set up the environment
needed for the tool to run.
A simple example might be to assume that a collection of bioinformatics software is manually installed in various
directories under /opt/biosoftware. In this case a <tool_dependency_dir>/bedtools/2.20.1/env.sh could be
set up to add the corresponding bedtools installation to the Galaxy tool execution’s PATH:
#!/bin/sh
export PATH=$PATH:/opt/biosoftware/bedtools/2.20.1/bin
As another example, this env.sh uses Environment Modules to set up the environment for bedtools:
#!/bin/sh
if [ -z "$MODULEPATH" ] ; then
    . /etc/profile.d/module.sh
fi
module add bedtools/bedtools-2.20.1
The Galaxy Package dependency resolver operates quite similarly when used in versionless mode. Instead of looking
for a directory named after a version, it looks for a symbolic link named default that points to a directory for a
concrete version, such as the 2.20.1 example above. For example, bedtools/default might link to bedtools/2.20.1.
The resolver then looks for a bin subdirectory or an env.sh file under default and incorporates it in the tool script
that finally gets run.
This versionless (i.e. default) lookup is also used if the package requirement does not specify a version string.
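The directory lookup described in this section can be sketched roughly as follows. This is an illustration, not Galaxy's code: the function name find_package and the ("source", ...) / ("prepend_path", ...) return values are hypothetical, and checking env.sh before bin is an assumption of this sketch.

```python
import os
from typing import Optional, Tuple


def find_package(base_path: str, name: str, version: Optional[str],
                 versionless: bool = False) -> Optional[Tuple[str, str]]:
    """Return ("source", env_sh_path) or ("prepend_path", bin_dir), else None."""
    # Versionless mode, or a requirement without a version, uses "default".
    subdir = "default" if versionless or version is None else version
    package_dir = os.path.join(base_path, name, subdir)
    if subdir == "default" and not os.path.islink(package_dir):
        return None  # "default" is expected to be a symbolic link
    # Assumption: an env.sh file is preferred over a bin directory.
    env_sh = os.path.join(package_dir, "env.sh")
    if os.path.isfile(env_sh):
        return ("source", env_sh)
    bin_dir = os.path.join(package_dir, "bin")
    if os.path.isdir(bin_dir):
        return ("prepend_path", bin_dir)
    return None
```

For a layout where <base_path>/bedtools/default links to 2.20.1, calling find_package(base_path, "bedtools", None) would follow the link and report the bin directory or env.sh found there.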
Conda Dependency Resolver
- type: conda
  versionless: <true|false>
  prefix: <filesystem path>
  exec: <filesystem path>
  debug: <true|false>
  ensure_channels: [channel, channel...]
  auto_install: <true|false>
  auto_init: <true|false>
  copy_dependencies: <true|false>
  read_only: <true|false>
The conda dependency resolver is used to find (and optionally install-on-demand) dependencies using the Conda
Package Manager. For a very detailed discussion of Conda dependency resolution, check out the Conda FAQ.
Additionally, the conda resolver makes use of mulled dependencies, where all of the tool’s specified requirements are installed into a single Conda environment. More details about mulled dependencies can be found in the Mulled Containers documentation.
This resolver can be configured with the following parameters, all of which are optional:
- prefix
The root of the conda installation used to locate dependencies in (default: value of global conda_prefix option or <tool_dependency_dir>/_conda otherwise).
- exec
The conda executable to use; it will default to the one on $PATH (if available) and then to <conda_prefix>/bin/conda.
- versionless
Whether to resolve tools using a version string or not (default: false).
- debug
Pass debug flag to conda commands (default: false).
- ensure_channels
Conda channels to enable by default. See https://conda.io/docs/user-guide/tasks/manage-channels.html for more information about channels. This defaults to the value of the global conda_ensure_channels option or conda-forge,bioconda otherwise. This order should be consistent with the Bioconda prescribed order if it includes bioconda.
- auto_install
If true, Galaxy will look for and install missing tool dependencies before running a job (default: value of the global conda_auto_install option or false otherwise).
- auto_init
If true, Galaxy will try to install Conda from the web automatically if it cannot find a local copy and conda_exec is not configured (default: the value of the global conda_auto_init option or true otherwise).
- copy_dependencies
If true, Galaxy will copy dependencies over instead of symbolically linking them when creating per-job environments. This is deprecated because newer versions of Conda, such as the versions targeted with Galaxy 17.01 and later, will do this as needed.
- read_only
If true, Galaxy will not attempt to install or uninstall requirement sets into this environment.
The conda resolver will search for Conda environments named:
__<requirement_name>@<requirement_version>
in the case that a tool only has one requirement tag, or:
mulled-v1-<hash>
when a tool has multiple requirement tags, where <hash> is a hash derived from the requirements’ names and
versions.
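As a small worked example of the single-requirement naming pattern above, the following sketch builds the environment name for one requirement. The function name conda_env_name is hypothetical, and the multi-requirement mulled hash is deliberately not reproduced here because its derivation is not described in this document.

```python
from typing import List, Tuple


def conda_env_name(requirements: List[Tuple[str, str]]) -> str:
    """Build the environment name for a tool with exactly one versioned requirement."""
    if len(requirements) != 1:
        # Multiple requirements use a mulled-v1-<hash> name; the hash is not sketched here.
        raise ValueError("only the single-requirement pattern is illustrated")
    name, version = requirements[0]
    return f"__{name}@{version}"
```

For the bwa requirement shown earlier, this yields __bwa@0.7.10.039ea20639.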
For example, to try an administrator-maintained read-only Conda installation at /hpc/conda first and then a
Galaxy-maintained writable Conda installation at /galaxy/conda second (where any missing dependencies will
be automatically installed at tool runtime), use the following:
- type: conda
  auto_init: false
  auto_install: false
  prefix: /hpc/conda
- type: conda
  auto_init: true
  auto_install: true
  prefix: /galaxy/conda
Lmod Dependency Resolver
- type: lmod
  versionless: <true|false>
  lmodexec: <filesystem path>
  settargexec: <filesystem path>
  modulepath: <filesystem path[:filesystem path:...]>
  mapping_files: <filesystem path>
The lmod dependency resolver interacts with the Lmod environment modules system
commonly found on HPC systems.
This resolver can be configured with the following parameters, all of which are optional:
- lmodexec
Path to the Lmod executable on your system. This cannot be just “module” because module is actually a bash function and not the real Lmod binary (see the result of the “type module” command). Default: value of the $LMOD_CMD environment variable.
- settargexec
Path to the settarg executable on your system. Default: value of the $LMOD_SETTARG_CMD environment variable.
- modulepath
Path to the folder that contains the Lmod module files on your system. This can be a single path or a colon-separated list of paths. Default: value of the $MODULEPATH environment variable.
- versionless
Set to true to resolve a dependency based on its name only (the version number is ignored). Only modules marked as Default will be listed by the “avail” command (the -d option is used). Default: false.
- mapping_files
Path to a YAML configuration file that can be used to link tool requirements with existing Lmod modules. Default: config/lmod_modules_mapping.yml
Environment Modules Dependency Resolver
- type: modules
  versionless: <true|false>
  modulecmd: <filesystem path>
  modulepath: <filesystem path[:filesystem path:...]>
  find_by: <directory|avail>
  prefetch: <true|false>
  default_indicator: <string>
The modules dependency resolver interacts with the Environment Modules system
commonly found on HPC systems.
This resolver can be configured with the following parameters, all of which are optional:
- modulecmd
Path to Environment Modules’ modulecmd tool.
- modulepath
Value used for $MODULEPATH environment variable, used to locate modules.
- versionless
Whether to resolve tools using a version string or not (default: false).
- find_by
Whether to use the DirectoryModuleChecker or AvailModuleChecker (permissible values are directory or avail, default is avail).
- prefetch
In the AvailModuleChecker, prefetch module info with module avail (default: true).
- default_indicator
What indicates to the AvailModuleChecker that a module is the default version (default: (default)). Note that the first module found is considered the default when no version is used by the resolver, so the sort order of modules matters.
The Environment Modules dependency resolver can work in two modes. The AvailModuleChecker searches the results
of the module avail command for the name of the dependency. If it is configured in versionless mode,
or is looking for a package with no version specified, it accepts any module whose name matches and is a bare word
or the first module whose name matched. For this reason, the default version of the module should be the first one
listed, something that can be achieved by tagging it with a word that appears first in sort order, for example the
string (default) (yielding a module name like bedtools/(default)). So when looking for bedtools in
versionless mode the search would match the first module called bedtools, and in versioned mode the search would
only match if a module named bedtools/2.20.1 was present (assuming you’re looking for bedtools/2.20.1).
The DirectoryModuleChecker looks for files or directories in the path specified by MODULEPATH or
MODULESHOME that match the dependency being resolved. In versionless mode a match on simply
the dependency name is needed, and in versioned mode a match on the dependency name and
version string is needed.
If a module matching the dependency is found, code to execute modulecmd sh load with the name of the dependency
is added to the script that runs the tool, e.g. modulecmd sh load bedtools. If version strings are being
used, they’ll be used in the load command, e.g. modulecmd sh load bwa/0.7.10.039ea20639.
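The matching rules and the resulting load command can be sketched as below. This is a simplified illustration rather than the resolver's actual code: find_module and load_command are hypothetical names, the list of available modules stands in for parsed module avail output, and handling of the default_indicator tag is omitted.

```python
from typing import List, Optional


def find_module(available: List[str], name: str, version: Optional[str],
                versionless: bool = False) -> Optional[str]:
    """Return the module string to load, or None when nothing matches."""
    if versionless or version is None:
        for module in available:
            # A bare name or the first "name/<anything>" entry counts as a match.
            if module == name or module.startswith(name + "/"):
                return name
        return None
    # Versioned mode needs an exact name/version entry.
    wanted = f"{name}/{version}"
    return wanted if wanted in available else None


def load_command(module: str) -> str:
    """Build the command that is added to the job script for a matched module."""
    return f"modulecmd sh load {module}"
```

Given available modules bedtools/2.20.1 and bwa/0.7.10.039ea20639, a versionless lookup for bedtools produces modulecmd sh load bedtools, while a versioned lookup for bedtools 2.19.0 finds no match.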
Homebrew Dependency Resolver
The homebrew dependency resolver uses the Homebrew Package Manager to resolve requirements.
It is highly experimental, undocumented, and unmaintained, and likely to be dropped from the code base.
Brewed Tool Shed Package Resolver
The brewed_tool_shed dependency resolver was an attempt to resolve tool shed packages that had been auto converted
to the tool shed. It is highly experimental, undocumented, unmaintained, and will almost certainly be removed from the
code base.