Containers in Galaxy¶
Galaxy can run tools inside containers using
The containers can be either explicit or mulled (also called multi package containers).
The former are given by
<container> requirements pointing to a specific container.
The latter are containers built for a set of requirements of type
Mulled containers are described by a hash that is unique for a set of
packages and versions (for mulled v2), e.g.
(mulled-v2-PACKAGEHASH:VERSIONHASH-BUILDNUMBER). For mulled containers
of single packages simply the package name and version are used instead of the hashes,
Bioconda and the Galaxy project provide infrastructure to create mulled
containers and to make them globally available on the
For each bioconda package a container is deployed
Mulled containers are created and deployed by the infrastructure provided by the multi-package-containers repository. Mulled containers are added automatically to this repository for all tools in tool repositories that are crawled by the planemo monitor repository (which includes for instance tools-iuc and several other tool repositories).
Container Resolvers in Galaxy¶
A container resolver tries to get a container description, i.e. the information (URI/path to the container image, …) that is needed to execute a tool in a container (in the execution environment), given the requirements specified in this tool. Galaxy implements various container resolvers that are suitable for different needs.
Galaxy tries to execute jobs using containers if they are sent
to execution environments (previously called destinations) with either
enabled. Note, the links to the sample configurations exemplify this for local execution environments,
but this works for any environment as long as
For jobs that are sent to such an execution environment Galaxy tries to obtain a
container description by sequentially executing the configured container
resolvers (see below). The job is then executed using the description returned
by the first successful container resolver.
If all configured container resolvers failed, i.e. no container description
could be obtained, the tool is by default executed using
standard dependency resolvers, e.g.
Alternatively, if the execution environment specifies
the job fails in this case.
Besides determining a container description, some container resolvers also cache and/or build containers.
The list of container resolvers is defined using YAML. This can be either
globally in an extra file (
container_resolvers_config_file) or inline the Galaxy configuration (
per execution environment using
Container resolvers defined for the execution environment take precedence over globally defined container resolvers. A sample YAML file showing the default configuration which is active if neither a global or local configuration is given in container_resolvers.yml.sample.
During the container resolution the configured container resolvers are sequentially applied, stopping at the first resolver that yields a container description.
Main resolver types:¶
The main types of container resolvers follow this naming scheme:
[cached_][explicit,mulled][_singularity]. That is
a container resolver is either
cached if it is prefixed with
cached_and non-cached otherwise.
yield a container description suitable for singularity if suffixed by
_singularityand docker otherwise.
It’s important to note that similarities in the names not necessarily imply any similarity in the function of the container resolvers.
There are the following mulled container resolvers:
Furthermore there are the following explicit container resolvers:
Note that there is no
1. docker vs singularity¶
Galaxy can execute tools in containers using
The corresponding container resolvers yield container descriptions suitable
for the corresponding “executor”, i.e., docker (singularity, resp.)
container resolvers will resolve a container only in execution environments
with enabled docker (singularity, resp.). Thus, if only execution environments
with docker (resp. singularity) are present then singularity (resp. docker)
container resolvers are ignored (and may be omitted).
Note that, for the execution with
singularity Galaxy relies mostly on
docker containers that are either executed directly or are converted
to singularity images. An exception is for instance explicit container
2. mulled vs explicit¶
Mulled container resolvers apply for requirements defined by tools that are a set of packages:
<requirements> <requirement type="package" version="0.5">foo</requirement> <requirement type="package" version="1.0">bar</requirement> </requirements>
Explicit container resolvers apply for requirements defined by tools in the form of a container requirement:
<requirements> <container type="docker">quay.io/qiime2/core:2022.8</container> </requirements>
See also Additional resolver types.
3. cached vs non-cached¶
While non-cached resolvers will yield a container description pointing to an online available docker container, cached resolvers will store container images on disk and use those.
This distinction is the weakest: some (by name) non-cached container resolvers
can also resolve cached containers and are even responsible for the caching itself,
i.e. they execute a
There are important differences between Galaxy’s cached docker and singularity
container resolvers. The caching mechanism essentially executes a
docker pull or
singularity pull, respectively. For docker this creates
an entry in the docker image cache (on the local node) whereas for
singularity an image file is created in the specified
On distributed systems
cache_directory needs to be accessible on all
For singularity, admins should also take care of the
docker inspect ... ; [ $? -ne 0 ] && docker pull ...
command is used in each job script to ensure that images are available on a compute node.
Thereby a container will be cached after the tool run even if no cached container resolver was used.
Admins need to take care of docker caches of the main and compute nodes.
For distributed compute systems, built-in techniques of docker may be useful:
Function and use of the
resolve function of the main resolver types:¶
The resolve function is called when
listing the container tab in the dependency admin UI (using
triggering a build from the admin UI (using
when a job is prepared
resolve function implements the caching of images then this only
happens if its
install parameter is set to
True. This is the case
in case 2 and case 3 (but see https://github.com/galaxyproject/tools-iuc/pull/5221#discussion_r1152025883).
It’s important to understand that 1 and 2 rely on the global container resolver config and do not set a resolver type!
This becomes relevant (e.g.) for setups specifying either:
container resolver config(s) only per execution environment (i.e. no global container resolver config) or
different global and execution environment container resolver config(s)
In case a) the default container config will be used which contains docker
and singularity container resolvers (see container_resolvers.yml.sample).
If both container backends (i.e. the
are available then only the docker container resolvers will be used.
In case b) using the Admin UI for building/caching containers might be impossible, but one needs to use the API directly which allows to specify the container type and the resolver(s) that should be used.
1. Explicit resolvers¶
The uncached explicit resolvers (
compute a container description using an URI that suites the
explicit will still cache the docker container on tool run, since
the job script contains
docker pull ...
The cached explicit resolver, i.e.
cached_explicit_singularity (no docker
analog available), downloads the image to the
cache_directory if needed and
return a container description that points to the image file in the
cached_explicit_singularity will automatically cache the container
on first tool run (and when the build/installation is triggered via the Admin
UI or the API). When listing the container the container resolver will always
yield the path (even if non existent, i.e. before the 1st tool run or the
caching was triggered).
2. Mulled resolvers¶
All mulled resolvers compute a mulled hash that describes the requirements and is included in the container name (see above).
For the cached mulled resolvers (
resolve function only queries if the required image is already cached
and returns a container description pointing to the cached image. For docker this is
done by executing
docker images and for
singularity the content of the
cache directory (
cache_directory) is queried.
In contrast to the cached explicit resolver the cached mulled resolvers do not cache images, but they only query the available cached images.
The “uncached” mulled resolvers (
default just return a container description containing the URI of the container
and download the image to the cache if
install=True (see also
Function and use of the resolve function of the main resolver types:). The caching
is done by a call to
docker pull and
singularity pull, respectively.
Note that, by default the URI is returned in any case, i.e. even if the image
just has been downloaded or if the image is already in the cache. Only if the
resolvers are initialized with
returns a container description pointing to the cached image. Note that this
makes a difference only for singularity (since for docker the URI is identical
to the name of the cached image).
In contrast to the uncached explicit resolver, the uncached mulled resolvers
do cache images, but the returned container description by default points to
the uncached URI (if the default of
auto_install=True is used; otherwise
the cached image is used).
Additional resolver types¶
In addition there are several resolvers that allow to hardcode container identifiers for certain conditions:
mappingresolver allows to map pairs of tool IDs and tool versions to container identifiers and container types. This allows to hardcode or overwrite container definitions for specific tools.
fallback_no_requirementsfor tools specifying no requirements
requires_galaxy_environmentfor (internal) tools that need Galaxy’s (python) environment
fallbacka fallback container for tools that don’t match any resolver
Building resolver types:¶
There are two container resolvers that locally create a mulled container.
Note that at the moment
build_mulled_singularity also requires docker for
"biocontainers"for the non-building and
"local"for the building mulled resolvers. Available for all mulled container resolvers except
cached_mulled_singularity. Used to set the namespace that is used to query quay.io. Note that there is no “local” namespace at quay.io, but Galaxy uses it to refer to locally built images (that’s why it is the default for the building resolvers).
"v2"(default: “v2”): Applies to all mulled container resolvers. Sets the version of the mulled hash that is used in the image name.
/bin/bashand sets the shell to be used in the container. Applies only to the resolvers listed in Additional resolver types.
auto_install: defaults to
True. Applies to
build_mulled_singularity. For the non-building resolvers this controls if a container description pointing to the cached image shall be returned (
auto_install==False). For the building resolvers the parameter controls if the container should be built also if the resolve function is called with
install=False(e.g. when listing the container in the Admin UI and no other container resolver worked for a tool).
Admins certainly should think carefully about
auto_install, since there are
many scenarios where the default is not desirable.
cache_directory: applies to singularity container resolvers that allow to cache images and sets the directory where to save images. If not set, containers are saved in
"dir_mtime". The singularity resolvers iterate over the contents of the cache directory. The contents of the directory can be accessed uncached (in which case the file listing is computed for each access) or cached (then the listing is computed only if the mtime of the cache dir changes and on first access). (applies to all singularity resolvers that can cache images, except explicit_singularity)
Note on the built-in caching capabilities of singularity and docker¶
It is important to note that docker as well as singularity have their own built-in caching mechanism.
In case of docker, a
docker pull (e.g. executed from a container resolver) or
docker run (e.g. executed on the compute node running the job) will add the
image to the local image cache.
Galaxy’s docker container resolvers rely on docker’s built-in image cache,
i.e. they query the image cache on the node that is executing Galaxy.
If the nodes that execute jobs are different from the node executing Galaxy
it’s important to note that these nodes will have independent caches that
admins might want to control.
For the the execution of jobs Galaxy already implement the support for using
tarballs of container images.
container_image_cache_path (set in galaxy.yml) or the destination
docker_container_image_cache_path. But at the moment none of the
docker container resolvers creates these image tarballs.
Also singularity has its own caching mechanism and caches by default to
It can be cleaned regularly using the
singularity cache command, or disabled by using the
SINGULARITY_DISABLE_CACHE environment variable.
Setting up Galaxy using docker / singularity on distributed compute resources (in particular in real user setups) requires careful planning.
Tools frequently use
$TMPDIR (or simply use hardcoded
/tmp) for storing temporary data. In containerized environments
is by default bound to a directory in the job working dir (
$_GALAXY_JOB_TMP_DIR:/tmp:rw is in the bind strings (in addition to
Galaxy automatically passes the environment variables
$TMPDIR to the container and bind-mounts these.