Warning
This document is for an in-development version of Galaxy. You can alternatively view this page in the latest release if it exists or view the top of the latest release's documentation.
Server-Sent Events (SSE) for real-time updates
Galaxy can push history changes, interactive-tool entry-point changes, and in-app notifications to connected browsers via Server-Sent Events instead of polling. This replaces the legacy 3-second history poll and 10-second entry-point poll with a single long-lived HTTP connection per browser tab, dramatically reducing API load on busy servers and giving users sub-second update latency.
This document describes the architecture, the configuration knobs, the metrics admins should watch, and how to configure NGINX so the long-lived event connection is not buffered or prematurely terminated.
How it works
A high-level view of the moving parts on the server:
┌────────────────┐ LISTEN/NOTIFY ┌──────────────────────┐
│ Postgres │ ─────────────────► │ HistoryAuditMonitor │
│ history_audit │ (or audit poll │ (one elected process) │
│ │ on SQLite) └──────────┬───────────┘
└────────────────┘ │ Kombu
▼
┌─────────────────────────────┐
│ galaxy_queue_worker │
│ control task fan-out: │
│ • history_update │
│ • entry_point_update │
│ • notification_update │
│ • broadcast_update │
└──────────────┬──────────────┘
│ in-process
▼
┌─────────────────────────────────────┐
│ Each gunicorn worker: │
│ SSEConnectionManager → asyncio │
│ queue per connected browser tab │
└──────────────┬──────────────────────┘
│ HTTP chunked
▼
┌─────────────────────────────────────┐
│ Browser EventSource │
│ (single stream, multiplexed events)│
└─────────────────────────────────────┘
Concretely:
One stream per browser, many event types. The browser opens a single
EventSourceagainst/api/events/stream. The same connection carrieshistory_update,entry_point_update,notification_update,broadcast_update, andnotification_statusevents.Per-process registries. Each Gunicorn worker keeps an
SSEConnectionManagerthat holds anasyncio.Queueper connected tab, indexed by user id (and Galaxy session id, so anonymous users still receive their own history’s updates).Producers.
History updates come from a
HistoryAuditMonitorthat watcheshistory_auditvia PostgreSQLLISTEN/NOTIFY(instant) or by polling the audit table on SQLite. Only one process in the cluster is the producer, picked byDatabaseHeartbeatleader election. If a standalonegalaxy-sse-monitorprocess is running it always wins; otherwise one webapp picks it up.Entry-point updates are dispatched directly from the code paths that mutate interactive-tool entry points — there is no separate watcher.
Notifications dispatch SSE events from
NotificationManagerwhenever a notification or broadcast is created.
Cross-process fan-out. Producers don’t know which worker holds the recipient’s connection, so all events go through a Kombu control task broadcast on the internal AMQP bus. Every worker receives every event and locally drops the ones for users it doesn’t currently hold a connection for.
Reconnect catch-up. When a browser reconnects after a network blip, it sends
Last-Event-ID. The server replays an aggregatednotification_statuscovering everything since that timestamp. History updates come with anupdate_timecursor in the payload, so the client can request the delta itself.
Standalone monitor (recommended for production)
The galaxy-sse-monitor console script (installed by the galaxy-app
package) runs the HistoryAuditMonitor outside the webapp processes. This
is the recommended layout for production because:
The webapp processes never compete with each other for the audit-monitor role on cold starts.
Restarting the webapp tier doesn’t briefly stall history updates while another worker is elected.
The monitor needs only DB + AMQP access, so it can be sized independently and runs with a much smaller resident-set than a webapp.
A typical Gravity supervisor entry looks like the existing
galaxy-celery-worker block — point at the same galaxy.yml and run
galaxy-sse-monitor on its own. With the daemon present, the
HistoryAuditMonitor registered on the webapp side stays idle (the
heartbeat election picks the daemon) but is still wired up so it can take
over if the daemon goes away.
If enable_sse_updates is false, galaxy-sse-monitor will start, log a
warning, and idle — it does no work and produces no events.
Configuration
There is a single admin-facing flag for SSE-driven updates:
galaxy:
enable_sse_updates: true
This controls all three SSE-driven paths (history, entry-point,
notifications). When false:
HistoryAuditMonitoris not registered, so the cluster does noLISTEN/NOTIFYor audit-table polling for history changes.The browser falls back to its existing 3-second history poll and 10-second entry-point poll.
Notifications fall back to the existing 30-second polling against
/api/notifications/status.
enable_notification_system is independent: it gates whether the
notification system is available at all (notification creation, delivery,
preferences, broadcasts). With enable_notification_system: true:
enable_sse_updates: true→ notifications arrive via SSE.enable_sse_updates: false→ notifications are polled.
With enable_notification_system: false the entire notification system is
off — there is nothing to push or poll.
The polling-fallback knob for the history audit monitor stays available:
galaxy:
history_audit_monitor_poll_interval: 2 # seconds, SQLite / no-LISTEN only
This only matters when running on SQLite or in setups where PostgreSQL
LISTEN/NOTIFY is unavailable.
What to monitor
When statsd is configured (via statsd_host and friends), the SSE
plumbing emits the following metrics. Capture these on the same dashboard
you use for Gunicorn worker health:
Metric |
Type |
Source |
Meaning |
|---|---|---|---|
|
counter |
|
A per-connection asyncio queue filled up; an event was lost. |
|
counter |
|
Control-task fan-outs by event kind. |
|
timing |
|
Wall time spent enqueueing the control task. |
|
counter |
|
Producer tried to dispatch with no queue worker bound — events would have been dropped. |
You can also expose a “currently connected SSE clients” gauge if you wire
one up: each SSEConnectionManager instance publishes
total_broadcast_connections (all connections, including anonymous) and
total_per_user_connections (connections bound to a specific user). These
are per-worker numbers; sum across workers for a cluster total.
Alerting recommendations:
galaxy.sse.connections.dropped> 0 sustained indicates a slow or stuck client whose queue filled up. Occasional drops on a network blip are normal; a steady rate is a bug or a misconfigured proxy holding events back too long.galaxy.sse.dispatch.skipped_no_qw> 0 means events are being lost because the producer process can’t reach the AMQP bus. Check theamqp_internal_connectionconfig and the AMQP broker health.galaxy.sse.dispatch.latency_msp95 climbing points at an AMQP bottleneck (broker load, network) — events will land late.
In addition, watch the standard worker metrics. SSE connections are
long-lived (tens of minutes is common), but Galaxy runs Gunicorn with
uvicorn.workers.UvicornWorker (configured by Gravity by default), so
each worker process is async and can hold thousands of idle SSE
connections without blocking other requests. The practical concern on
busy servers is therefore memory and file-descriptor headroom, not
worker exhaustion: budget a few KB per connection plus one fd per
connection per worker, and raise ulimit -n accordingly.
Configuring NGINX
The SSE endpoint is served at /api/events/stream. It is a normal
HTTP/1.1 chunked response, so it works through NGINX without any special
modules — but you must turn off response buffering and raise the
read/send timeouts, otherwise events will arrive in batched bursts (or
not at all until the connection times out).
Galaxy already sets X-Accel-Buffering: no on the response, which
disables NGINX’s response buffering for that one endpoint without
affecting buffering on the rest of Galaxy. That alone is enough on most
setups. The block below adds the read/send timeouts and HTTP/1.1
upgrade-friendly headers explicitly so the connection survives long
idle periods between events:
# Long-lived Server-Sent Events stream.
# Galaxy sends ``X-Accel-Buffering: no`` on the response, which
# disables nginx response buffering just for this endpoint.
# The keepalive comment fires every 30s so the read timeout
# only needs to be a comfortable margin above that.
location /api/events/stream {
proxy_pass http://unix:/srv/galaxy/var/gunicorn.sock;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Connection "";
# Disable buffering and gzip explicitly. ``X-Accel-Buffering``
# already does this, but pinning it here also covers setups
# where a sub-filter strips upstream headers.
proxy_buffering off;
proxy_cache off;
gzip off;
# Keepalives fire every 30s; allow generous slack on top.
proxy_read_timeout 1h;
proxy_send_timeout 1h;
}
Place this location block above the catch-all location / block in
your existing Galaxy server {} (see the NGINX proxy guide).
NGINX matches longest prefix first, so the order doesn’t matter for
correctness, but keeping all the override blocks together at the top of
the server block makes the special-case handling easy to find.
If you serve Galaxy at a URL prefix (/galaxy), prefix the location too:
location /galaxy/api/events/stream {
proxy_pass http://unix:/srv/galaxy/var/gunicorn.sock:/galaxy;
# ...same body as above
}
Other proxies
If you front Galaxy with something other than NGINX, the same rules
apply: disable response buffering for /api/events/stream, allow the
connection to stay open for at least the keepalive interval (30 s by
default) plus a healthy margin, and pass the request through HTTP/1.1
without forcing Connection: close.
Apache
mod_proxy_http: addProxyPasswithflushpackets=on flushwait=5and bumpProxyTimeoutto at least a few minutes. Avoidmod_deflateon this endpoint.HAProxy: the connection is a plain HTTP/1.1 chunked response and needs no special handling beyond
timeout serverandtimeout tunnelraised above 30 s.Cloudflare and other CDNs: many CDNs buffer HTTP/1.1 chunked responses by default. Either bypass the CDN for
/api/events/streamor follow the CDN’s documented pattern for SSE streaming.
Verifying the deployment
After enabling enable_sse_updates, three quick checks confirm the
stream is healthy end-to-end:
From the browser DevTools Network tab, open Galaxy and look for a
GET /api/events/streamrequest that stays in the pending state with a constantly incrementing transferred-bytes count. The responseContent-Typeistext/event-stream.Trigger a history change (run a tool, rename a dataset). The Network tab should show a
history_updateevent in the EventStream view of that connection within a second or two, and the history panel refreshes without a polling round-trip.On the server side,
galaxy.sse.dispatch.countshould be ticking up for each event kind your users exercise. If you wired up the connection gauges, they should reflect roughly one connection per open browser tab.
If the connection opens but no events arrive, the most common causes
are: a proxy buffering responses (revisit the NGINX section), or a
producer that can’t reach AMQP (see galaxy.sse.dispatch.skipped_no_qw).