Warning

This document is for an in-development version of Galaxy. You can alternatively view this page in the latest release if it exists or view the top of the latest release's documentation.

Server-Sent Events (SSE) for real-time updates

Galaxy can push history changes, interactive-tool entry-point changes, and in-app notifications to connected browsers via Server-Sent Events instead of polling. This replaces the legacy 3-second history poll and 10-second entry-point poll with a single long-lived HTTP connection per browser tab, dramatically reducing API load on busy servers and giving users sub-second update latency.

This document describes the architecture, the configuration knobs, the metrics admins should watch, and how to configure NGINX so the long-lived event connection is not buffered or prematurely terminated.

How it works

A high-level view of the moving parts on the server:

   ┌────────────────┐  LISTEN/NOTIFY     ┌──────────────────────┐
   │  Postgres      │ ─────────────────► │ HistoryAuditMonitor  │
   │  history_audit │  (or audit poll    │ (one elected process) │
   │                │   on SQLite)       └──────────┬───────────┘
   └────────────────┘                               │ Kombu
                                                    ▼
                                     ┌─────────────────────────────┐
                                     │  galaxy_queue_worker        │
                                     │  control task fan-out:      │
                                     │   • history_update          │
                                     │   • entry_point_update      │
                                     │   • notification_update     │
                                     │   • broadcast_update        │
                                     └──────────────┬──────────────┘
                                                    │ in-process
                                                    ▼
                              ┌─────────────────────────────────────┐
                              │  Each gunicorn worker:              │
                              │   SSEConnectionManager → asyncio    │
                              │   queue per connected browser tab   │
                              └──────────────┬──────────────────────┘
                                             │ HTTP chunked
                                             ▼
                              ┌─────────────────────────────────────┐
                              │  Browser EventSource                │
                              │  (single stream, multiplexed events)│
                              └─────────────────────────────────────┘

Concretely:

  1. One stream per browser, many event types. The browser opens a single EventSource against /api/events/stream. The same connection carries history_update, entry_point_update, notification_update, broadcast_update, and notification_status events.

  2. Per-process registries. Each Gunicorn worker keeps an SSEConnectionManager that holds an asyncio.Queue per connected tab, indexed by user id (and Galaxy session id, so anonymous users still receive their own history’s updates).

  3. Producers.

    • History updates come from a HistoryAuditMonitor that watches history_audit via PostgreSQL LISTEN/NOTIFY (instant) or by polling the audit table on SQLite. Only one process in the cluster is the producer, picked by DatabaseHeartbeat leader election. If a standalone galaxy-sse-monitor process is running it always wins; otherwise one webapp picks it up.

    • Entry-point updates are dispatched directly from the code paths that mutate interactive-tool entry points — there is no separate watcher.

    • Notifications dispatch SSE events from NotificationManager whenever a notification or broadcast is created.

  4. Cross-process fan-out. Producers don’t know which worker holds the recipient’s connection, so all events go through a Kombu control task broadcast on the internal AMQP bus. Every worker receives every event and locally drops the ones for users it doesn’t currently hold a connection for.

  5. Reconnect catch-up. When a browser reconnects after a network blip, it sends Last-Event-ID. The server replays an aggregated notification_status covering everything since that timestamp. History updates come with an update_time cursor in the payload, so the client can request the delta itself.

Configuration

There is a single admin-facing flag for SSE-driven updates:

galaxy:
  enable_sse_updates: true

This controls all three SSE-driven paths (history, entry-point, notifications). When false:

  • HistoryAuditMonitor is not registered, so the cluster does no LISTEN/NOTIFY or audit-table polling for history changes.

  • The browser falls back to its existing 3-second history poll and 10-second entry-point poll.

  • Notifications fall back to the existing 30-second polling against /api/notifications/status.

enable_notification_system is independent: it gates whether the notification system is available at all (notification creation, delivery, preferences, broadcasts). With enable_notification_system: true:

  • enable_sse_updates: true → notifications arrive via SSE.

  • enable_sse_updates: false → notifications are polled.

With enable_notification_system: false the entire notification system is off — there is nothing to push or poll.

The polling-fallback knob for the history audit monitor stays available:

galaxy:
  history_audit_monitor_poll_interval: 2  # seconds, SQLite / no-LISTEN only

This only matters when running on SQLite or in setups where PostgreSQL LISTEN/NOTIFY is unavailable.

What to monitor

When statsd is configured (via statsd_host and friends), the SSE plumbing emits the following metrics. Capture these on the same dashboard you use for Gunicorn worker health:

Metric

Type

Source

Meaning

galaxy.sse.connections.dropped

counter

SSEConnectionManager

A per-connection asyncio queue filled up; an event was lost.

galaxy.sse.dispatch.count (tag: task)

counter

SSEEventDispatcher

Control-task fan-outs by event kind.

galaxy.sse.dispatch.latency_ms (tag: task)

timing

SSEEventDispatcher

Wall time spent enqueueing the control task.

galaxy.sse.dispatch.skipped_no_qw

counter

SSEEventDispatcher

Producer tried to dispatch with no queue worker bound — events would have been dropped.

You can also expose a “currently connected SSE clients” gauge if you wire one up: each SSEConnectionManager instance publishes total_broadcast_connections (all connections, including anonymous) and total_per_user_connections (connections bound to a specific user). These are per-worker numbers; sum across workers for a cluster total.

Alerting recommendations:

  • galaxy.sse.connections.dropped > 0 sustained indicates a slow or stuck client whose queue filled up. Occasional drops on a network blip are normal; a steady rate is a bug or a misconfigured proxy holding events back too long.

  • galaxy.sse.dispatch.skipped_no_qw > 0 means events are being lost because the producer process can’t reach the AMQP bus. Check the amqp_internal_connection config and the AMQP broker health.

  • galaxy.sse.dispatch.latency_ms p95 climbing points at an AMQP bottleneck (broker load, network) — events will land late.

In addition, watch the standard worker metrics. SSE connections are long-lived (tens of minutes is common), but Galaxy runs Gunicorn with uvicorn.workers.UvicornWorker (configured by Gravity by default), so each worker process is async and can hold thousands of idle SSE connections without blocking other requests. The practical concern on busy servers is therefore memory and file-descriptor headroom, not worker exhaustion: budget a few KB per connection plus one fd per connection per worker, and raise ulimit -n accordingly.

Configuring NGINX

The SSE endpoint is served at /api/events/stream. It is a normal HTTP/1.1 chunked response, so it works through NGINX without any special modules — but you must turn off response buffering and raise the read/send timeouts, otherwise events will arrive in batched bursts (or not at all until the connection times out).

Galaxy already sets X-Accel-Buffering: no on the response, which disables NGINX’s response buffering for that one endpoint without affecting buffering on the rest of Galaxy. That alone is enough on most setups. The block below adds the read/send timeouts and HTTP/1.1 upgrade-friendly headers explicitly so the connection survives long idle periods between events:

        # Long-lived Server-Sent Events stream.
        # Galaxy sends ``X-Accel-Buffering: no`` on the response, which
        # disables nginx response buffering just for this endpoint.
        # The keepalive comment fires every 30s so the read timeout
        # only needs to be a comfortable margin above that.
        location /api/events/stream {
            proxy_pass http://unix:/srv/galaxy/var/gunicorn.sock;
            proxy_set_header Host $http_host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Disable buffering and gzip explicitly. ``X-Accel-Buffering``
            # already does this, but pinning it here also covers setups
            # where a sub-filter strips upstream headers.
            proxy_buffering off;
            proxy_cache off;
            gzip off;

            # Keepalives fire every 30s; allow generous slack on top.
            proxy_read_timeout 1h;
            proxy_send_timeout 1h;
        }

Place this location block above the catch-all location / block in your existing Galaxy server {} (see the NGINX proxy guide). NGINX matches longest prefix first, so the order doesn’t matter for correctness, but keeping all the override blocks together at the top of the server block makes the special-case handling easy to find.

If you serve Galaxy at a URL prefix (/galaxy), prefix the location too:

        location /galaxy/api/events/stream {
            proxy_pass http://unix:/srv/galaxy/var/gunicorn.sock:/galaxy;
            # ...same body as above
        }

Other proxies

If you front Galaxy with something other than NGINX, the same rules apply: disable response buffering for /api/events/stream, allow the connection to stay open for at least the keepalive interval (30 s by default) plus a healthy margin, and pass the request through HTTP/1.1 without forcing Connection: close.

  • Apache mod_proxy_http: add ProxyPass with flushpackets=on flushwait=5 and bump ProxyTimeout to at least a few minutes. Avoid mod_deflate on this endpoint.

  • HAProxy: the connection is a plain HTTP/1.1 chunked response and needs no special handling beyond timeout server and timeout tunnel raised above 30 s.

  • Cloudflare and other CDNs: many CDNs buffer HTTP/1.1 chunked responses by default. Either bypass the CDN for /api/events/stream or follow the CDN’s documented pattern for SSE streaming.

Verifying the deployment

After enabling enable_sse_updates, three quick checks confirm the stream is healthy end-to-end:

  1. From the browser DevTools Network tab, open Galaxy and look for a GET /api/events/stream request that stays in the pending state with a constantly incrementing transferred-bytes count. The response Content-Type is text/event-stream.

  2. Trigger a history change (run a tool, rename a dataset). The Network tab should show a history_update event in the EventStream view of that connection within a second or two, and the history panel refreshes without a polling round-trip.

  3. On the server side, galaxy.sse.dispatch.count should be ticking up for each event kind your users exercise. If you wired up the connection gauges, they should reflect roughly one connection per open browser tab.

If the connection opens but no events arrive, the most common causes are: a proxy buffering responses (revisit the NGINX section), or a producer that can’t reach AMQP (see galaxy.sse.dispatch.skipped_no_qw).