Scaling and Load Balancing¶
The Galaxy framework is written in Python and makes extensive use of threads. However, one of the drawbacks of Python is the Global Interpreter Lock, which prevents more than one thread from being on CPU at a time. Because of this, having a multi-core system will not improve the Galaxy framework’s performance out of the box since Galaxy can use (at most) one core at a time in its default configuration. However, Galaxy can easily run in multiple separate processes, which solves this problem. For a more thorough explanation of this problem and why you will almost surely want to switch to the multiprocess configuration if running for more than a small handful of users, see the production configuration page.
Just to be clear: Increasing the number of plugin workers in
job_conf.xml will not make your Galaxy server much more responsive.
The key to scaling Galaxy is the ability to run multiple Galaxy servers which co-operatively work on the same database.
web worker - Galaxy server process responsible for servicing web requests for the UI/API
job handler - Galaxy server process responsible for setting up, starting, and monitoring jobs, submitting jobs to a cluster (if configured), for setting metadata (if not set on the cluster), and cleaning up after jobs
tags - Handlers can be grouped in to a “pool” of handlers using tags, after which, individual tools may be mapped to a handler tag such that all executions of that tool are handled by the tagged handler(s).
default - Any handlers without defined tags - aka “untagged handlers” - will handle executions of all tools not mapped to a specific handler ID or tag.
Webless Galaxy application - The Galaxy application run as a standalone Python application with no web/ASGI server
It is possible to run the Galaxy server in many different ways, including under different web application frameworks, or as a standalone server with no web stack.
Starting with the 22.01 release the default application server is Gunicorn. Gunicorn serves Galaxy as an ASGI web application.
Prior to the 18.01 release, Galaxy (by default) used the Python Paste web stack, and ran in a single process. Between the 18.01 release and the 22.01 release, uWSGI was used as the default application server. In release 22.05 we dropped support for running Galaxy as a WSGI application via uWSGI or paste. For more information about this, please consult the version of this document that is appropriate to your Galaxy release.
There are multiple deployment strategies for the Galaxy application that you can choose from. The right one depends on the configuration of the infrastructure on which you are deploying. In all cases, all Galaxy job features such as running on a cluster are supported.
Although Gunicorn implements many features that were previously the responsibility of an upstream proxy server, it is recommended to place a proxy server in front of Gunicorn and utilize it for all of its traditional roles (serving static content, serving dataset downloads, etc.) as described in the production configuration documentation.
Gunicorn with jobs handled by web workers (default configuration)¶
Referred to in this documentation as the all-in-one strategy.
Job handlers and web workers are the same processes and cannot be separated
A random web worker will be the job handler for that job
Under this strategy, jobs will be handled by Gunicorn workers. Having web processes handle jobs will negatively impact UI/API performance.
This is the default out-of-the-box configuration.
Gunicorn for web serving and Webless Galaxy applications as job handlers¶
Referred to in this documentation as the Gunicorn + Webless strategy.
Job handlers are started as standalone Python applications with no web stack
Jobs are dispatched from web workers to job handlers via the Galaxy database
Jobs can be dispatched to job handlers running on any host
Additional job handlers can be added dynamically without reconfiguring/restarting Galaxy (19.01 or later)
The recommended deployment strategy for production Galaxy instances
By default, handler assignment will occur using the Database Transaction Isolation or Database SKIP LOCKED methods (see below).
Job Handler Assignment Methods¶
Job handler assignment methods are configurable with the
attribute on the
<handlers> tag in
job_conf.xml. The available methods are:
Database Transaction Isolation (
db-transaction-isolation, new in 19.01) - Jobs are assigned a handler by handlers selecting the unassigned job from the database using SQL transaction isolation, which uses database locks to guarantee that only one handler can select a given job. This occurs by the web worker that receives the tool execution request (via the UI or API) setting a new job’s ‘handler’ column in the database to the configured tag/default (or
_default_if no tag/default is configured). Handlers “listen” for jobs by selecting jobs from the database that match the handler tag(s) for which they are configured.
db-transaction-isolationis the default assignment method if no handlers are defined, or handlers are defined but no
assign_withattribute is set on the
handlerstag and Database SKIP LOCKED is not available.
Database SKIP LOCKED (
db-skip-locked, new in 19.01) - Jobs are assigned a handler by handlers selecting the unassigned job from the database using
SELECT ... FOR UPDATE SKIP LOCKEDon databases that support this query (see the next section for details). This occurs via the same process as Database Transaction Isolation, the only difference is the way in which handlers query the database. This is the default if no handlers are defined, or handlers are defined but no
assign_withattribute is set on the
handlerstag and Database SKIP LOCKED is available.
Database Self Assignment (
db-self) - Like In-memory Self Assignment but assignment occurs by setting a new job’s ‘handler’ column in the database to the process that created the job at the time it is created. Additionally, if a tool is configured to use a specific handler (ID or tag), that handler is assigned (tags by Database Preassignment). This is the default fallback if no handlers are defined and the database does not support Database SKIP LOCKED or Database Transaction Isolation.
In-memory Self Assignment (
mem-self) - Jobs are assigned to the web worker that received the tool execution request from the user via an internal in-memory queue. If a tool is configured to use a specific handler, that configuration is ignored; the process that creates the job always handles it. This can be slightly faster than Database Self Assignment but only makes sense in single process environments without dedicated job handlers. This option supersedes the former
galaxy.ymland corresponds to setting that option to
Database Preassignment (
db-preassign) - Jobs are assigned a handler by selecting one at random from the configured tag or default handlers at the time the job is created. This occurs by the web worker that receives the tool execution request (via the UI or API) setting a new job’s ‘handler’ column in the database to the randomly chose handler ID (hence “preassignment”). This is the default only if handlers are defined and the database does not support Database SKIP LOCKED or Database Transaction Isolation.
In all cases, if a tool is configured to use a specific handler (by ID, not tag), configured assignment methods are ignored and that handler is directly assigned in the job’s ‘handler’ column at job creation time.
config/job_conf.xml.sample_advanced file in the Galaxy distribution for instructions on setting the
Choosing an Assignment Method¶
Prior to Galaxy 19.01, the most common deployment strategies assigned handlers using what is now (since 19.01) referred to as Database Preassignment. Although still a fallback option when the database does not support Database SKIP LOCKED or Database Transaction Isolation, preassignment has a few drawbacks:
Web workers do not have a way to know whether a particular handler is alive when assigning that handler
Jobs are not load balanced across handlers
Changing the number of handlers requires changing
job_conf.xmland restarting all Galaxy processes
The “database locking” methods (Database SKIP LOCKED and Database Transaction Isolation) were created to solve these issues. The preferred method between the two options is Database SKIP LOCKED, but it requires PostgreSQL 9.5 or newer, SQLite 3.25 or newer or MySQL 8.0 or newer (untested), or MariaDB 10.3 or newer (untested). If using an older database version, use Database Transaction Isolation instead. A detailed explanation of these database locking methods in PostgreSQL can be found in the excellent What is SKIP LOCKED for in PostgreSQL 9.5? entry on the 2ndQuadrant PostgreSQL Blog.
The preferred assignment method is Database SKIP LOCKED or Database Transaction Isolation.
We will only outline a few of Gunicorn’s options, consult the Gunicorn documentation for more.
Note that by default Galaxy will use gravity to create a supervisor configuration in Gravity’s state directory.
Gravity’s state directory is located in
database/gravity, and you can find the generated supervisor configuration in
database/gravity/supervisor. The location of the state directory can be controlled using the
galaxyctl, or using the
GRAVITY_STATE_DIR environment variable.
Configuration values for the supervisor configuration are read from the
gravity section of your
This is the preferred and out-of-the box way of configuring Gunicorn for serving Galaxy.
If you are not using
./run.sh for starting Galaxy or you would like to use another process manager,
all the Gunicorn configuration values can also directly be set on the command line.
Configuration is performed in the
gravity section of
galaxy.yml. You will find that the default, if copied from
galaxy.yml.sample, is commented out. The default configuration options are provided to Gunicorn on the command line by
gravity within the
After making changes to the
gravity section, you always need to activate Galaxy’s virtualenv and run
or one of the
graceful subcommands of the
galaxyctl command, which will run the
update command internally.
Common Gunicorn configuration¶
galaxy.yml, define a
gravity section. Shown below are the options common to all deployment strategies:
gravity: app_server: gunicorn gunicorn: # listening options bind: '127.0.0.1:8080' # performance options workers: 1 # Other options that will be passed to gunicorn extra_args:
Some of these options deserve explanation:
workers: Controls the number of Galaxy application processes Gunicorn will spawn. Increased web performance can be attained by increasing this value. If Gunicorn is the only application on the server, a good starting value is the number of CPUs * 2 + 1. 4-12 workers should be able to handle hundreds if not thousands of requests per second.
extra_args: You can specify additional arguments to pass to gunicorn here.
Note that the performance option values given above are just examples and should be tuned per your specific needs. However, as given, they are a good place to start.
Listening and proxy options¶
With a proxy server:
To use a socket for the communication between the proxy and Gunicorn, set the
bind option to a path:
gravity: app_server: gunicorn gunicorn: # listening options bind: 'unix:/srv/galaxy/var/gunicorn.sock' extra_args: '--forwarded-allow-ips="*"'
Here we’ve used a UNIX domain socket because there’s less overhead than a TCP socket and it can be secured by filesystem
permissions. Note that we’ve added
--forwarded-allow-ips="*" to ensure that the domain socket is trusted as a source from which to proxy headers.
You can also listen on a port:
app_server: gunicorn gunicorn: # listening options bind: '127.0.0.1:4001'
If you are listening on a port do not set
The choice of port 4001 is arbitrary, but in both cases, the socket location must match whatever socket the proxy server
is configured to communicate with. If using a UNIX domain socket, be sure that the proxy server’s user has read/write
permission on the socket. Because Galaxy and the proxy server most likely run as different users, this is not likely to
be the case by default. One common solution is to add the proxy server’s user to the Galaxy user’s primary group.
umask option can also help here.
By setting the
bind option to a socket,
run.sh will no longer automatically serve Galaxy via HTTP (since it is assumed that
you are setting a socket to serve Galaxy via a proxy server). If you wish to continue serving HTTP directly with Gunicorn
while using a socket, you can add an additional
--bind argument via the
gravity: app_server: gunicorn gunicorn: # listening options bind: 'unix:/srv/galaxy/var/gunicorn.sock' extra_args: '--forwarded-allow-ips="*" --bind 127.0.0.1:8080'
Note that this should only be used for debugging purposes due to
Without a proxy server:
It is strongly recommended to use a proxy server.
Gunicorn can be configured to serve HTTPS directly:
# listening options gunicorn: # listening options bind: '0.0.0.0:443' keyfile: server.key certfile: server.crt
See Gunicorn’s SSL documentation for more details.
To bind to ports < 1024 (e.g. if you want to bind to the standard HTTP/HTTPS ports 80/443), you must bind as the
user and drop privileges to the Galaxy user. However you are strongly encouraged to setup a proxy server
as described in the production configuration documentation.
In all strategies, once a handler has been assigned jobs, you cannot unconfigure that handler (e.g. to decrease the
number of handlers) until it has finished processing all its assigned jobs, or else its jobs will never reach a
terminal state. In order to allow a handler to run but not receive any new jobs, configure it with an unused tag (e.g
<handler id="handler5" tags="drain" />) and restart all Galaxy processes.
Alternatively, you can stop the handler and reassign its jobs to another handler, but this is currently only possible
UPDATE query in the database and is only recommended for advanced Galaxy administrators.
all-in-one job handling¶
Ensure that no
<handlers> section exists in your
job_conf.xml (or no
job_conf.xml exists at all) and start Galaxy
normally. No additional configuration is required. To increase the number of web workers/job handlers, increase the
workers in the gunicorn section of your galaxy.yml file:
gravity: ... gunicorn: ... # performance options workers: 4 ...
Jobs will be handled according to rules outlined above in Job Handler Assignment Methods.
<handlers> section is defined in
job_conf.xml, Galaxy’s web workers will no longer load and start the
job handling code, so tools cannot be mapped to specific handlers in this strategy. If you wish to control job
handling, choose another deployment strategy.
Statically defined handlers¶
<handlers> section in
job_conf.xml, define the webless handlers you plan to start. Tools can be mapped
to specific handlers, or to handler tags, as in the following example:
<job_conf> <handlers> <handler id="handler1" /> <handler id="handler2" /> <handler id="handler3" tags="nodefault" /> <handler id="handler4" tags="special" /> <handler id="handler5" tags="special" /> </handlers> <tools> <tool id="test1" handler="handler3" /> <tool id="test2" handler="special" /> <tool id="test3" handler="handler2" /> </tools> </job_conf>
Any untagged handler will be automatically considered a default handler. As seen in the example above, it is possible
to map any tool to any handler or tag, however, a handler must be tagged to prevent it from handling jobs created for
tools that are not explicitly mapped to handlers. Thus,
handler2 will handle all executions of tool
but it will also (along with
handler1) handle tools that are not explicitly mapped to handlers. In contrast,
handler3 will only handle executions of tool
run.sh will start the gunicorn and job handler process(es), but if you are not using
run.sh or the generated supervisor setup you will need to start the webless handler processes yourself. This is done on the command line like so:
$ cd /srv/galaxy/server $ ./scripts/galaxy-main -c config/galaxy.yml --server-name handler0 --daemonize $ ./scripts/galaxy-main -c config/galaxy.yml --server-name handler1 --daemonize $ ./scripts/galaxy-main -c config/galaxy.yml --server-name handler2 --daemonize
Dynamically defined handlers¶
In order to define handlers dynamically, you must be using one of the new “database locking” handler assignment methods
as explained in Job Handler Assignment Methods, such as in the following
<job_conf> <handlers assign_with="db-skip-locked" /> <tools> <tool id="test1" handler="special" /> </tools> </job_conf>
Note that we have defined a
<handlers> section without any
and we have explicitly configured the assignment method with
To let gravity know how many webless handler processes should be started set the number of processes to start in the
gravity: section of galaxy.yml:
gravity: handlers: handler: processes: 3 pools: - job-handlers - workflow-schedulers special: pools: - job-handlers.special
In this example 4 processes will be started in total:
3 processes will act as job handlers and workflow schedulers, and one process will be dedicated to handling jobs for the
special tag only. With the
job_conf.xml configuration above these would be jobs created by the
You can omit the
pools argument, this will then default to:
... pools: - job-handlers - workflow-schedulers ...
If you omit the
processes argument this will default to a single process.
You can further customize the handler names using the
for a complete example see this gravity test case.
You can define arbitrary environment variables for dynamic handlers using the
environnment key on a handler
gravity: handlers: handler: processes: 3 pools: - job-handlers environment: FOO: bar BAZ: quux
If you are not using dynamic handlers please omit the
handlers entry completely, as these will
otherwise be idle and not handle jobs or workflows.
As with statically defined handlers,
run.sh will start the process(es), but if you are not using
run.sh or the generated supervisor config you will need to start the webless handler processes yourself. This is done on the command line like so (note the addition of the
$ cd /srv/galaxy/server ./scripts/galaxy-main -c config/galaxy.yml --server-name handler_0 --attach-to-pool job-handlers --attach-to-pool workflow-scheduler --daemonize ./scripts/galaxy-main -c config/galaxy.yml --server-name handler_1 --attach-to-pool job-handlers --attach-to-pool workflow-scheduler --daemonize ./scripts/galaxy-main -c config/galaxy.yml --server-name handler_3 --attach-to-pool job-handlers --attach-to-pool workflow-scheduler --daemonize ./scripts/galaxy-main -c config/galaxy.yml --server-name special_0 --attach-to-pool job-handlers.special --daemonize
In this example:
handler2will handle tool executions that are not explicitly mapped to handlers
special_0will handle tool executions that are mapped to the
gravity & galaxyctl¶
Gravity is a management tool for Galaxy servers, and is installed when you set up Galaxy.
It provides two executables,
galaxyctl, which is used to manage the starting, stopping, and logging of Galaxy’s various processes, and
galaxy, which can be used to run a Galaxy server in the foreground.
These commands are available from within Galaxy’s virtualenv, or you can install them globally.
If you have used the standard installation method for Galaxy by running
./run.sh or executing
./scripts/common_startup.sh, a default directory has been configured in which gravity stores
its state. If you have installed Galaxy or gravity by another means, you can use the
GRAVITY_STATE_DIR environment variable to control the state directory. If a state dir is not specified, it defaults to
In the following sections we assume you have correctly set up gravity and can use the
Logging and daemonization¶
./run.sh the log output is shown on screen, and ^ctrl-c will stop all processes.
Galaxy can be started in the background by running
Alternatively, you can control Galaxy using the
galaxyctl command provided by gravity.
After activating Galaxy’s virtual environment you can start Galaxy in the background
$ galaxyctl start celery STARTING celery-beat STARTING gunicorn STARTING Log files are in /Users/mvandenb/src/doc_test/database/gravity/log
All process logs can be seen with
$ galaxyctl follow ==> /Users/mvandenb/src/doc_test/database/gravity/log/gunicorn.log <== galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,344 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(JobHandlerQueue.monitor_thread, started daemon 123145470246912)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,345 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(JobHandlerStopQueue.monitor_thread, started daemon 123145487036416)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,345 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(WorkflowRequestMonitor.monitor_thread, started daemon 123145503825920)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,345 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(database_heartbeart_main.thread, started daemon 123145520615424)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,345 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <GalaxyQueueWorker(Thread-1, started daemon 123145537404928)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,346 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(Thread-2, started daemon 123145554194432)> is alive. galaxy.webapps.galaxy.buildapp DEBUG 2022-02-17 16:31:33,346 [pN:main,p:91480,tN:MainThread] Prior to webapp return, Galaxy thread <Thread(Thread-3, started daemon 123145570983936)> is alive. [2022-02-17 16:31:34 +0100]  [INFO] Started server process  [2022-02-17 16:31:34 +0100]  [INFO] Waiting for application startup. [2022-02-17 16:31:34 +0100]  [INFO] Application startup complete. ==> /Users/mvandenb/src/doc_test/database/gravity/log/celery.log <== [2022-02-17 16:31:27,008: DEBUG/MainProcess] ^-- substep ok [2022-02-17 16:31:27,009: DEBUG/MainProcess] | Consumer: Starting Events [2022-02-17 16:31:27,009: DEBUG/MainProcess] ^-- substep ok [2022-02-17 16:31:27,009: DEBUG/MainProcess] | Consumer: Starting Tasks [2022-02-17 16:31:27,035: DEBUG/MainProcess] ^-- substep ok [2022-02-17 16:31:27,035: DEBUG/MainProcess] | Consumer: Starting Heart [2022-02-17 16:31:27,035: DEBUG/MainProcess] ^-- substep ok [2022-02-17 16:31:27,035: DEBUG/MainProcess] | Consumer: Starting event loop [2022-02-17 16:31:27,035: INFO/MainProcess] celery@MacBook-Pro-2.local ready. [2022-02-17 16:31:27,035: DEBUG/MainProcess] basic.qos: prefetch_count->8 ==> /Users/mvandenb/src/doc_test/database/gravity/log/celery-beat.log <== [2022-02-17 16:29:04,023: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes [2022-02-17 16:29:04,024: DEBUG/MainProcess] beat: Waking up in 5.00 minutes. No Galaxy config file found, running from current working directory: /Users/mvandenb/src/doc_test [2022-02-17 16:31:26,818: DEBUG/MainProcess] Setting default socket timeout to 30 [2022-02-17 16:31:26,819: INFO/MainProcess] beat: Starting... [2022-02-17 16:31:26,824: DEBUG/MainProcess] Current schedule: <ScheduleEntry: prune-history-audit-table galaxy.celery.tasks.prune_history_audit_table() <freq: 1.00 hour> <ScheduleEntry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)> [2022-02-17 16:31:26,824: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes [2022-02-17 16:31:26,825: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
More advanced logging options are described in the Galaxy Logging Configuration documentation.
Starting and Stopping¶
If you want to run your Galaxy server as a persistent service, you can include the
galaxy script from Galaxy’s
virtualenv in the configuration of your process manager (e.g. systemd). You can then continue using the
galaxyctl command as usual
to start/stop/restart Galaxy or follow the logs.
For zero-downtime restarts use the command
This is a sample config for systemd. More information on systemd.service environment
settings can be found in the documentation
The filename follows the pattern
<service_name>.service. In this case we will use
[Unit] Description=Galaxy processes After=network.target After=time-sync.target [Service] PermissionsStartOnly=true Type=simple User=galaxy Group=galaxy Restart=on-abort WorkingDirectory=/srv/galaxy/server TimeoutStartSec=10 ExecStart=/srv/galaxy/venv/bin/galaxy --state-dir /srv/galaxy/database/gravity Environment=VIRTUAL_ENV=/srv/galaxy/venv PATH=/srv/galaxy/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin [Install] WantedBy=multi-user.target
We can now enable and start the the Galaxy services with systemd:
# systemctl enable galaxy Created symlink from /etc/systemd/system/multi-user.target.wants/galaxy.service to /etc/systemd/system/galaxy.service.