.. _dependency_resolvers:


Dependency Resolvers in Galaxy
==============================

There are two parts to building a link between Galaxy and command line bioinformatics tools: (1) the tool XML that
specifies a mapping between the Galaxy web user interface and the tool command line, and (2) system tool dependencies that
specify how to source the actual packages that implement the tool’s commands. The job script that Galaxy uses to run a job
includes commands (such as changes to the ``PATH`` environment variable) that are generated by *dependency
resolvers*. These same dependency resolvers are used by the Galaxy administrative UI to display whether an installed
tool's dependencies have been installed on the Galaxy server, and to show how they will be resolved at job runtime.
Galaxy ships with a default dependency resolver configuration, but administrators can provide their own using the
``dependency_resolvers_conf.xml`` configuration file in the Galaxy ``config/`` directory.

The binding between a tool's XML and the system tools it needs to run is specified in the tool XML using
``<requirement>`` tags, for example:

.. code-block:: xml

    <requirement type="package" version="0.7.10.039ea20639">bwa</requirement>

In some cases these requirement tags can be specified without a version:

.. code-block:: xml

    <requirement type="package">bedtools</requirement>

These declared requirements are passed as inputs to the dependency resolver.

Default Dependency Resolvers
----------------------------

The default configuration of dependency resolvers is equivalent to the following ``dependency_resolvers_conf.xml``:

.. code-block:: xml

  <dependency_resolvers>
    <tool_shed_packages />
    <galaxy_packages />
    <conda />
    <galaxy_packages versionless="true" />
    <conda versionless="true" />
  </dependency_resolvers>

This default dependency resolver configuration contains five items:

1. First, the *Tool Shed dependency resolver* is used, which resolves packages installed from the Galaxy Tool Shed
   using legacy ``tool_dependencies.xml`` files,
2. then the *Galaxy packages dependency resolver* is checked for a package matching the requirement name and version,
3. then the *Conda dependency resolver* is checked for a package matching the requirement name and version. If no
   versioned match can be found, it then moves on to searching for unversioned matches, that is,
4. the *Galaxy packages dependency resolver* is checked for a package matching the required name only, and
5. finally the *Conda dependency resolver* is checked for a package matching the required name only.

If any of the dependency resolvers succeed, a dependency resolution object is returned and no more resolvers are
called. This dependency resolution object provides shell commands to prepend to the shell script that runs the system tool.

This order can be thought of as a descending order of deliberation. Tool Shed dependencies must be declared next to the
tool by the tool author and must be selected for installation at tool installation time; this requires specific actions
by both the tool author and the deployer who installed the tools. The dependency is therefore highly
tailored to the individual tool. If Galaxy packages have been set up, the deployer of a Galaxy tool has purposely crafted
tool dependency statements for a specific installation. This is slightly less deliberate than Tool Shed packages, but
such requirements are less likely to be incidentally resolved than Conda packages. Conda recipes are tied neither to
tools nor to a specific installation and are maintained in Conda channels such as Bioconda.

So although Tool Shed packages come first in the resolution order, they are also somewhat deprecated. Maintaining
Conda recipes makes it easier to describe software dependencies both inside and outside of Galaxy.

Tool Shed Dependency Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``tool_shed_packages`` dependency resolver works with explicit software packages installed from the Galaxy Tool
Shed as described by legacy ``tool_dependencies.xml`` files. When such a package is installed from the Tool Shed it
creates a directory structure under the directory that is specified as the ``tool_dependency_dir`` in Galaxy's
configuration. This directory structure contains references to the tool's ID, owner (in the Tool Shed) and version
string (amongst other things) and ultimately contains a file named ``env.sh`` that contains commands to make the
dependency runnable. This is installed, along with the packaged tool, by the tool package and doesn't require any
configuration by the Galaxy administrator.

Tools installed from the Tool Shed may also install Conda recipes and most new best practice tools do this
by default now.

The Tool Shed dependency resolver is not able to resolve package requirements that do not have a version string,
like the ``bedtools`` example above.

Galaxy Packages Dependency Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``galaxy_packages`` dependency resolver allows Galaxy admins to specify how Galaxy should load manually
installed packages. This resolver can be configured either to use the version string or in *versionless* mode.

The Galaxy Packages dependency resolver takes a ``base_path`` argument that specifies the path under which
it starts looking for the files it requires. The default value for this ``base_path`` is the
``tool_dependency_dir`` configured in Galaxy's ``config/galaxy.yml``. Below the base path, the Galaxy Packages
resolver looks for directories named after tools, e.g. ``bedtools``. As mentioned before, this resolver
works in versioned and versionless mode. The default mode is versioned, where the dependency resolver looks for a
directory named after the dependency's version string. For example, if the Galaxy tool specifies that it
needs ``bedtools`` version 2.20.1, the dependency resolver will look for a directory ``bedtools/2.20.1``.

If the Galaxy Package dependency resolver finds a ``bin`` directory in this directory, it adds it to the ``PATH``
used by the scripts Galaxy uses to run tools. If, however, it finds an ``env.sh`` script, it sources this
script before running the tool that requires this dependency. This can be used to set up the environment
needed for the tool to run.
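
The lookup described above can be sketched as a small shell function. This is only an illustration and not Galaxy's
own code: the function name ``sketch_resolve`` is hypothetical, and it assumes that an ``env.sh`` takes precedence
over a ``bin`` directory when both are present.

.. code-block:: bash

    #!/bin/sh

    # Hypothetical helper (not part of Galaxy): print the line that a
    # Galaxy-packages-style lookup would contribute to the job script.
    # Usage: sketch_resolve BASE_PATH NAME VERSION
    sketch_resolve() {
        pkg_dir="$1/$2/$3"
        if [ -f "$pkg_dir/env.sh" ]; then
            # Assumption: env.sh is preferred over a plain PATH change.
            echo ". $pkg_dir/env.sh"
        elif [ -d "$pkg_dir/bin" ]; then
            # Otherwise the bin directory is put on the PATH.
            echo "PATH=$pkg_dir/bin:\$PATH; export PATH"
        else
            # Nothing usable here; the next resolver would be tried.
            return 1
        fi
    }

Calling ``sketch_resolve "$tool_dependency_dir" bedtools 2.20.1`` would print either a line that sources ``env.sh``
or a ``PATH`` assignment, depending on what exists under ``bedtools/2.20.1``.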

As a simple example, assume that a collection of bioinformatics software is manually installed in various
directories under ``/opt/biosoftware``. In this case a ``<tool_dependency_dir>/bedtools/2.20.1/env.sh`` could be
set up to add the corresponding bedtools installation to the Galaxy tool execution's ``PATH``:

.. code-block:: bash

    #!/bin/sh

    export PATH=$PATH:/opt/biosoftware/bedtools/2.20.1/bin


As another example, this ``env.sh`` uses `Environment Modules <http://modules.sourceforge.net/>`_
to set up the environment for ``bedtools``:

.. code-block:: bash

    #!/bin/sh

    if [ -z "$MODULEPATH" ] ; then
      . /etc/profile.d/module.sh
    fi

    module add bedtools/bedtools-2.20.1

The Galaxy Packages dependency resolver operates quite similarly when used in versionless mode. Instead of looking
for a directory named after a version, it looks for a symbolic link named ``default`` that links to a
concrete version such as the ``2.20.1`` example above. For example, ``bedtools/default`` might link to
``bedtools/2.20.1``. It then looks for a ``bin`` subdirectory or ``env.sh`` and incorporates these in the tool script
that finally gets run. This versionless (i.e. default) lookup is also used if the package requirement does not specify
a version string.
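
A minimal sketch of such a layout, built in a scratch directory so it can be tried safely. The ``BASE`` variable is
only a stand-in for the configured ``tool_dependency_dir``; the package name and version follow the example above.

.. code-block:: bash

    #!/bin/sh

    # BASE stands in for tool_dependency_dir (illustrative scratch directory).
    BASE="$(mktemp -d)"

    # Versioned layout: <base_path>/bedtools/2.20.1/bin
    mkdir -p "$BASE/bedtools/2.20.1/bin"

    # Relative symbolic link: bedtools/default -> 2.20.1
    ln -s 2.20.1 "$BASE/bedtools/default"

    # A versionless lookup now reaches the same bin directory via "default".
    ls -d "$BASE/bedtools/default/bin"

Using a relative link target keeps the layout valid if the whole dependency directory is later moved.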

Environment Modules Dependency Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The example above used Environment Modules to set the ``PATH`` (and other settings) for ``bedtools``. With
the ``modules`` dependency resolver it is possible to use Environment Modules directly. This resolver
takes these parameters:

modulecmd
    path to Environment Modules' ``modulecmd`` tool

modulepath
    value used for MODULEPATH environment variable, used to locate modules

versionless
    whether to resolve tools using a version string or not (default: ``false``)

find_by
    whether to use the ``DirectoryModuleChecker`` or ``AvailModuleChecker`` (permissible values are ``directory`` or ``avail``,
    default is ``avail``)

prefetch
    in the AvailModuleChecker prefetch module info with ``module avail`` (default: ``true``)

default_indicator
    the string that indicates to the AvailModuleChecker that a module is the default version (default: ``(default)``).
    Note that the first module found is considered the default when no version is used by the resolver, so
    the sort order of modules matters.

The Environment Modules dependency resolver can work in two modes. The ``AvailModuleChecker`` searches the results
of the ``module avail`` command for the name of the dependency. If it is configured in versionless mode,
or is looking for a package with no version specified, it accepts any module whose name matches and is a bare word
or the first module whose name matched. For this reason, the default version of the module should be the first one
listed, something that can be achieved by tagging it with a word that appears first in sort order, for example the
string ``(default)`` (yielding a module name like ``bedtools/(default)``). So when looking for ``bedtools`` in
versionless mode the search would match the first module called ``bedtools``, and in versioned mode the search would
only match if a module named ``bedtools/2.20.1`` was present (assuming you're looking for ``bedtools/2.20.1``).

The ``DirectoryModuleChecker`` looks for files or directories in the path specified by ``MODULEPATH`` or
``MODULESHOME`` that match the dependency being resolved. In versionless mode a match on the dependency name
alone is sufficient, and in versioned mode a match on both the dependency name and
version string is needed.

If a module matching the dependency is found, code to execute ``modulecmd sh load`` with the name of the dependency
is added to the script that runs the tool, e.g. ``modulecmd sh load bedtools``. If version strings are being
used, they are included in the ``load`` command, e.g. ``modulecmd sh load bwa/0.7.10.039ea20639``.
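
The two forms of the ``load`` command can be illustrated with a short shell function. The name
``module_load_command`` is hypothetical and exists only for this sketch; it builds the command string and does not
run ``modulecmd`` or show how Galaxy evaluates its output.

.. code-block:: bash

    #!/bin/sh

    # Hypothetical helper: build the load command for a resolved module.
    # Usage: module_load_command MODULECMD NAME [VERSION]
    module_load_command() {
        if [ -n "${3:-}" ]; then
            # Versioned mode: the version string is part of the module name.
            echo "$1 sh load $2/$3"
        else
            # Versionless mode: only the dependency name is used.
            echo "$1 sh load $2"
        fi
    }

    module_load_command modulecmd bedtools
    module_load_command modulecmd bwa 0.7.10.039ea20639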


Homebrew Dependency Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This dependency resolver uses Homebrew packages to resolve requirements. It is highly experimental
and undocumented.


Brew Tool Shed Package Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This dependency resolver would resolve tool shed packages that had been
auto converted to the tool shed. It is highly experimental, undocumented,
and will almost certainly be removed from the code base.


Conda Dependency Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``conda`` directive can be used to configure a Conda dependency resolver.
This resolver can be configured with the following options. For a detailed
discussion of Conda dependency resolution, see the :ref:`Conda FAQ <conda_faq>`.

prefix
    The Conda prefix (``conda_prefix``) in which dependencies are located (default: ``<tool_dependency_dir>/_conda``).

exec
    The Conda executable to use. It defaults to the one on the
    ``PATH`` (if available) and then to ``<conda_prefix>/bin/conda``.

versionless
    Whether to resolve tools using a version string or not (default: ``False``).

debug
    Pass the debug flag to Conda commands (default: ``False``).

ensure_channels
    Conda channels to enable by default. See
    https://conda.io/docs/user-guide/tasks/manage-channels.html for more
    information about channels. This defaults to ``iuc,conda-forge,bioconda,defaults``.
    If the list includes ``bioconda``, the order should be consistent with the
    `Bioconda prescribed order <https://github.com/bioconda/bioconda-recipes/blob/master/config.yml>`__.

auto_install
    If ``True``, Galaxy will look for and install missing tool
    dependencies before running a job (default: ``False``).

auto_init
    If ``True``, Galaxy will try to install Conda from the web
    automatically if it cannot find a local copy and ``conda_exec`` is not
    configured. This defaults to ``True`` as of Galaxy 17.01.

copy_dependencies
    If ``True``, Galaxy will copy dependencies over instead of symbolically
    linking them when creating per-job environments. This should be considered somewhat
    deprecated because newer versions of Conda, such as the version targeted by
    Galaxy 17.01+, will do this as needed.