Job Metrics
Galaxy contains a plugin infrastructure for collecting and displaying metrics for job execution.
Configuration
An example file, config/job_metrics_conf.xml.sample, is included in the Galaxy distribution; it describes which plugins are enabled and how they are configured. It is updated as each new plugin is added, so it should be considered the most complete, up-to-date source of documentation for which plugins are available and how to configure them.
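As a hedged sketch of what such a file can look like (the plugin element names below follow the sample file, but the sample itself is the authoritative reference):

```xml
<?xml version="1.0"?>
<!-- Sketch of a minimal job_metrics_conf.xml; consult
     config/job_metrics_conf.xml.sample for the full list of
     plugins and their options. -->
<job_metrics>
  <core />
  <env />
</job_metrics>
```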
Per Destination
If Galaxy targets different clusters or simply different resources, it may make sense to configure different plugins for different resources, jobs, and so on. To enable this, individual job destinations may disable metrics, load a different job metrics file, or define metrics directly in job_conf.xml
in an embedded fashion.
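A hedged sketch of per-destination configuration in job_conf.xml follows; the destination ids and runner names here are hypothetical, and the exact embedding syntax should be verified against the sample configuration for your Galaxy version:

```xml
<!-- Hypothetical job_conf.xml fragment showing embedded metrics. -->
<destinations default="local">
  <destination id="local" runner="local">
    <!-- define metric plugins directly for this destination -->
    <job_metrics>
      <core />
    </job_metrics>
  </destination>
  <destination id="no_metrics" runner="local">
    <!-- an empty element disables metric collection here -->
    <job_metrics />
  </destination>
</destinations>
```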
Pulsar
Job metrics can be collected for Pulsar jobs, but with added complexity. The deployer is responsible for ensuring the same plugins are available to both applications and that the Pulsar app is configured with a job_metrics_conf.xml
that matches Galaxy's configuration for that destination. In practice, the Pulsar and Galaxy code bases should stay in sync; if per-destination metric configurations are not used, this is simply a matter of ensuring Pulsar and Galaxy are configured with identical job_metrics_conf.xml
files.
Available Plugins
See job_metrics_conf.xml.sample
for the most up-to-date information.
The core
plugin is enabled by default; it is a cross-platform plugin used to capture job runtime and core allocation. env
is the only other cross-platform plugin (it should work on any *nix) and can be used to record all environment variables set at job runtime, or just specific variables (e.g. PATH
); this can be useful for debugging cluster issues.
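As a hedged sketch, restricting the env plugin to specific variables might look like the following; the sample file suggests a comma-separated variables attribute, but this should be verified against job_metrics_conf.xml.sample before use:

```xml
<!-- Record only selected environment variables at job runtime. -->
<env variables="PATH,HOSTNAME" />
```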
The remaining plugins are Linux-only; these include cpuinfo
and meminfo
, which collect CPU and memory information about the node the Galaxy job runs on, and uname
, which likewise collects host and operating system information.
A more experimental and more powerful plugin providing deep integration with Collectl is available. Its possible uses are many, and it has many configuration options. Two of the more useful possibilities it provides are aggregating statistics across the process tree generated by a job to produce per-job resource statistics (detailed memory usage, a detailed CPU breakdown, coarse I/O information, and so on), or alternatively simply recording a Collectl output file for each job with detailed time-series data for all processes on the runtime system.
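As a minimal hedged sketch, the plugin can be enabled with its defaults as below; the many tuning options it supports are documented in job_metrics_conf.xml.sample and are not reproduced here:

```xml
<!-- Enable the experimental Collectl integration with default options. -->
<job_metrics>
  <collectl />
</job_metrics>
```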
Viewing Collected Data
After a job completes successfully, all information collected by job metric plugins should be visible to admin users under the dataset details for each dataset produced by the job.
Developing New Plugins
Galaxy ships with a good variety of plugins ranging from very simple to very complex; a good place to start is to find the most similar existing plugin and use it as a template. Galaxy's existing plugins, and any new ones to be added, live in lib/galaxy/job_metrics/instrumenters/.
New plugins should subclass galaxy.job_metrics.instrumenters:InstrumentPlugin. The general strategy for implementing a plugin is to describe the commands Galaxy should run on the remote server with post_execute_instrument
and/or pre_execute_instrument
. These commands should write their output to files in the job's working directory (produce relative file names with _instrument_file_name
or full paths with _instrument_file_path
so that remote Galaxy job runners such as Pulsar can determine what needs to be shipped back to Galaxy for processing). Finally, plugins must implement a job_properties
method that parses these runtime-created files back on the Galaxy server and produces a dictionary of properties summarizing the job.
Plugins can also affect how these properties are displayed to the user by overriding the formatter
attribute on the InstrumentPlugin
(see existing plugins for examples).
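The pattern above can be sketched in Python. This is a hypothetical, self-contained illustration: the real base class lives at galaxy.job_metrics.instrumenters:InstrumentPlugin and provides the helper methods, but it is stubbed out here so the example runs standalone; the HostnamePlugin class and its file-naming scheme are invented for illustration, not taken from Galaxy's code.

```python
import os


class InstrumentPlugin:
    """Stub standing in for galaxy.job_metrics.instrumenters:InstrumentPlugin
    so this sketch runs standalone; the real base class supplies these
    helpers (among other things)."""
    plugin_type = None
    formatter = None  # override to control how properties are displayed

    def _instrument_file_name(self, name):
        # A relative name lets remote runners (e.g. Pulsar) know which
        # files must be shipped back to Galaxy for processing.
        return "__instrument_%s_%s" % (self.plugin_type, name)

    def _instrument_file_path(self, job_directory, name):
        # Full path into the job's working directory.
        return os.path.join(job_directory, self._instrument_file_name(name))


class HostnamePlugin(InstrumentPlugin):
    """Toy plugin recording the hostname of the node the job ran on."""
    plugin_type = "hostname"

    def pre_execute_instrument(self, job_directory):
        # Shell command Galaxy runs on the remote server before the job.
        return "hostname > '%s'" % self._instrument_file_path(job_directory, "hostname")

    def post_execute_instrument(self, job_directory):
        # Nothing to collect after the job for this plugin.
        return None

    def job_properties(self, job_id, job_directory):
        # Parse the file written at runtime, back on the Galaxy server,
        # into a dictionary of properties summarizing the job.
        path = self._instrument_file_path(job_directory, "hostname")
        with open(path) as fh:
            return {"hostname": fh.read().strip()}
```

The formatter attribute could then be overridden, as the existing plugins do, to control how the collected hostname is rendered on the dataset details page.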
Future Plans
Future plans can be tracked on this GitHub issue.