Containers for Tool Dependencies

Galaxy tools (also called wrappers) can use Conda packages (see the Galaxy Conda documentation for more information) and Docker containers as dependency resolvers. The IUC recommends using Conda packages as the primary dependency resolver, mainly because Docker is not available on every (HPC) system. Conda, on the other hand, can be installed by Galaxy and maintained entirely in user space. Nevertheless, Docker and containers in general have some unique features, and many use cases in the Galaxy community make containerized tools very appealing.

Since 2014, Galaxy has supported running tools in Docker containers via a special container annotation inside the requirements field.

<requirements>
    <!-- Container based dependency handling -->
    <container type="docker">busybox:ubuntu-14.04</container>
    <!-- Conda based dependency handling -->
    <requirement type="package" version="8.22">gnu_coreutils</requirement>
</requirements>

This approach has shown two limitations that slowed its adoption by tool developers. First, every tool needs to be annotated with a container name (as shown above), and this container needs to be created beforehand, usually manually. Second, a Galaxy tool aims to be deployable everywhere, independent of the underlying system, meaning that if Docker is not available Galaxy should use Conda packages. This puts an additional burden on tool developers, who need to take care of two dependency resolvers. It can also lead to different tool results depending on the resolver, because the Conda package and the Docker container are usually not created from the same recipe and may have been compiled differently, use different sources, and so on.

This is not an ideal situation, and it is something we wanted to solve.

Here we demonstrate a solution that automatically creates containers from Conda packages. It can be used either to support communities like BioContainers in creating containers before a Galaxy tool is deployed, or by Galaxy itself to create containers on demand if one is not already available.

Automatic build of Linux containers

We use mulled with involucro to automatically convert all packages in Bioconda into Linux container images (Docker and rkt at the moment) and make them available at the BioContainers Quay.io account.

We have developed small utilities around this technology stack, which are currently included in galaxy-lib. Here is a short introduction:

Search for containers

The following command searches for Docker containers (in the biocontainers organisation on quay.io), Singularity containers (located at https://depot.galaxyproject.org/singularity/), Conda packages (in the bioconda channel), and GitHub files (in the bioconda-recipes repository).

$ mulled-search --destination docker conda --search vsearch

The user can specify the location(s) for a search using the --destination option. The search term is specified using --search. Multiple search terms can be specified simultaneously; in this case, the search will also encompass multi-package containers. For example, --search samtools bamtools will return mulled-v2-0560a8046fc82aa4338588eca29ff18edab2c5aa:c17ce694dd57ab0ac1a2b86bb214e65fedef760e-0 in addition to all individual samtools and bamtools results.

If the user wishes to specify a quay.io organization or Conda channel for the search, this may be done using the --organization and --channel options respectively, e.g. --channel conda-forge. Enabling --json causes results to be returned in JSON format.
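The multi-package container name shown above follows the usual Docker repository:tag layout. The sketch below splits such a name with plain shell parameter expansion. Reading the trailing -0 as a build number is an assumption, made by analogy with the --0 build suffix of single-package images, and the variable names are purely illustrative.

```shell
# Illustrative only: split a multi-package image name into repository and tag.
image='mulled-v2-0560a8046fc82aa4338588eca29ff18edab2c5aa:c17ce694dd57ab0ac1a2b86bb214e65fedef760e-0'

repository="${image%%:*}"   # everything before the first ':'
tag="${image#*:}"           # everything after the first ':'
build="${tag##*-}"          # text after the last '-' (assumed to be the build number)

echo "repository: $repository"
echo "tag:        $tag"
echo "build:      $build"
```

Splitting on the first colon keeps the sketch independent of how the two hashes are computed, which is not covered here.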

Build all packages from bioconda from the last 24h

The Bioconda community builds a container for every package it creates, using a command similar to this:

$ mulled-build-channel --channel bioconda --namespace biocontainers \
   --involucro-path ./involucro --recipes-dir ./bioconda-recipes --diff-hours 25 build

Building Docker containers for local Conda packages

Conda packages can be tested by creating a busybox-based container for the package in question, as follows. This also demonstrates how you can build a container locally and on the fly.

> We modified the samtools package to version 3.0 to make it clear that we are using a local version.
  1. Build your recipe
$ conda build recipes/samtools
  2. Index your local builds
$ conda index /home/bag/miniconda2/conda-bld/linux-64/
  3. Build a container for your local package
$ mulled-build build-and-test 'samtools=3.0--0' \
   --extra-channel file:///home/bag/miniconda2/conda-bld/ --test 'samtools --help'

The --0 suffix indicates the build version of the Conda package. It is recommended to specify this number; otherwise you will overwrite already existing images. For Python Conda packages this suffix might look like --py35_1.
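To make the name=version--build layout concrete, here is a minimal sketch that splits a target into its three parts using shell parameter expansion and assembles the image reference one would expect under the biocontainers organisation on quay.io. The variable names are illustrative, and the quay.io/biocontainers prefix is an assumption that only holds when the biocontainers namespace is used.

```shell
# Illustrative only: take apart a target of the form name=version--build.
target='samtools=3.0--0'

name="${target%%=*}"      # package name, before the first '='
rest="${target#*=}"       # version and build, e.g. 3.0--0
version="${rest%%--*}"    # version, before the first '--'
build="${rest#*--}"       # build string, after the first '--' (e.g. 0 or py35_1)

# Assumed image reference when the biocontainers namespace on quay.io is used.
image="quay.io/biocontainers/${name}:${version}--${build}"
echo "$image"   # prints quay.io/biocontainers/samtools:3.0--0
```

The same expansions work for Python-style build strings: with a target ending in --py35_1, build becomes py35_1.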

Build, test, and push a conda-forge package to biocontainers

> You need to have write access to the biocontainers repository

You can build packages from other Conda channels as well, not only from Bioconda. The pandoc tool is available from the conda-forge channel, and conda-forge is also enabled by default in Galaxy. To build pandoc and push it to biocontainers, you could do something along these lines:

$ mulled-build build-and-test 'pandoc=1.17.2--0' --test 'pandoc --help' -n biocontainers
$ mulled-build push 'pandoc=1.17.2--0' --test 'pandoc --help' -n biocontainers

Build Singularity containers from Docker containers

Singularity containers can be built from Docker containers using the mulled-update-singularity-containers command.

To generate a single container:

$ mulled-update-singularity-containers --containers samtools:1.6--0 --logfile /tmp/sing/test.log --filepath /tmp/sing/ --installation /usr/local/bin/singularity

--containers indicates the container name (here samtools:1.6--0), --filepath the location where the containers should be placed, and --installation the location of the Singularity installation. (This can be found using whereis singularity.)

Multiple containers can be installed simultaneously by giving --containers more than one argument:

$ mulled-update-singularity-containers --containers samtools:1.6--0 bamtools:2.4.1--0 --filepath /tmp/sing/ --installation /usr/local/bin/singularity

For a large number of containers, it may be more convenient to employ the --container-list option:

$ mulled-update-singularity-containers --container-list list.txt --filepath /tmp/sing/ --installation /usr/local/bin/singularity

Here list.txt should contain a list of containers, each on a new line.
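As a minimal sketch of the expected file format, the following writes two of the container names used above to a list file, one per line. It writes into a scratch directory created with mktemp so that no existing file is overwritten; the variable names are illustrative.

```shell
# Illustrative only: write one container name per line to list.txt.
workdir="$(mktemp -d)"          # scratch directory for the example
list_file="$workdir/list.txt"

printf '%s\n' \
    'samtools:1.6--0' \
    'bamtools:2.4.1--0' > "$list_file"

# Show the file and count its entries.
cat "$list_file"
entries="$(wc -l < "$list_file" | tr -d ' ')"
echo "entries: $entries"
```

The resulting file path can then be passed to --container-list.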

In order to generate the list file, the mulled-list command may be useful. The following command returns a list of all Docker containers available in the quay.io biocontainers organization, excluding those already available as Singularity containers via https://depot.galaxyproject.org/singularity/.

$ mulled-list --source docker --not-singularity --blacklist blacklist.txt --file output.txt

The list of containers will be saved as output.txt. The (optional) --blacklist option may be used to exclude containers which should not be included in the output; blacklist.txt should contain a list of the ‘blacklisted’ containers, each on a new line.
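The effect of a blacklist on a container list can be illustrated with standard tools. The sketch below is not how mulled-list is implemented; it only mimics the file formats, and it assumes that each blacklist entry must match a whole line exactly, which may differ from the matching rules mulled-list actually applies. The file all.txt and the variable names are hypothetical.

```shell
# Illustrative only: drop blacklisted entries from a container list.
workdir="$(mktemp -d)"   # scratch directory for the example

printf '%s\n' 'samtools:1.6--0' 'bamtools:2.4.1--0' 'pandoc:1.17.2--0' > "$workdir/all.txt"
printf '%s\n' 'bamtools:2.4.1--0' > "$workdir/blacklist.txt"

# -F: fixed strings, -x: match whole lines, -v: keep non-matching lines,
# -f: read the patterns from the blacklist file.
grep -vxFf "$workdir/blacklist.txt" "$workdir/all.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Here output.txt ends up with the samtools and pandoc entries, in their original order.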

Containers, once generated, should be tested. This can be achieved by appending --testing test-output.log to the command or, alternatively, by using the dedicated mulled-singularity-testing tool.

$ mulled-singularity-testing --container-list list.txt --filepath /tmp/sing/ --installation /usr/local/bin/singularity --logfile test-output.txt