galaxy.jobs package

Support for running a tool in Galaxy via an internal job management system

class galaxy.jobs.ComputeEnvironment[source]

Bases: object

Definition of the job as it will be run on the (potentially) remote compute server.

config_directory()[source]

Directory containing config files (potentially remote)

input_paths()[source]

Input DatasetPaths defined by job.

new_file_path()[source]

Absolute path to dump new files for this job on compute server.

output_paths()[source]

Output DatasetPaths defined by job.

sep()[source]

os.path.sep for the platform this job will execute in.

tool_directory()[source]

Absolute path to tool files for this job on compute server.

unstructured_path_rewriter()[source]

Return a function that takes a value and determines whether it is a path to be rewritten. The function will be passed non-path values as well; the onus is on it to determine both whether its input is a path and whether that path should be rewritten.
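
As a sketch of that contract, such a rewriter might look like the following. The prefix-swapping scheme and the `make_path_rewriter` name are illustrative, not Galaxy's actual implementation:

```python
def make_path_rewriter(local_prefix, remote_prefix):
    """Build a rewriter that maps paths under local_prefix to remote_prefix."""
    def rewrite(value):
        # Non-string values cannot be paths; pass them through unchanged.
        if not isinstance(value, str):
            return value
        # Only rewrite values that look like paths under the local prefix.
        if value.startswith(local_prefix):
            return remote_prefix + value[len(local_prefix):]
        return value
    return rewrite
```

The returned function can then safely be called on every value the tool evaluator encounters, path or not.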

version_path()[source]

Location of the version file for the underlying tool.

working_directory()[source]

Job working directory (potentially remote)

class galaxy.jobs.HasResourceParameters[source]
get_resource_parameters(job=None)[source]
class galaxy.jobs.JobConfiguration(app)[source]

Bases: object

A parser and interface to advanced job management features.

These features are configured in the job configuration file, job_conf.xml by default.

DEFAULT_NWORKERS = 4
JOB_RESOURCE_CONDITIONAL_XML = '<conditional name="__job_resource">\n <param name="__job_resource__select" type="select" label="Job Resource Parameters">\n <option value="no">Use default job resource parameters</option>\n <option value="yes">Specify job resource parameters</option>\n </param>\n <when value="no"/>\n <when value="yes"/>\n </conditional>'
__init__(app)[source]

Parse the job configuration XML.

convert_legacy_destinations(job_runners)[source]

Converts legacy (from a URL) destinations to contain the appropriate runner params defined in the URL.

Parameters:job_runners (list of job runner plugins) – All loaded job runner plugins.
default_job_tool_configuration

The default JobToolConfiguration, used if a tool does not have an explicit definition in the configuration. It consists of a reference to the default handler and default destination.

Returns:JobToolConfiguration – a representation of a <tool> element that uses the default handler and destination
get_destination(id_or_tag)[source]

Given a destination ID or tag, return the JobDestination matching the provided ID or tag

Parameters:id_or_tag (str) – A destination ID or tag.
Returns:JobDestination – A valid destination

Destinations are deepcopied because they are expected to be passed to job runners, which may modify them in order to persist params set at runtime.

get_destinations(id_or_tag)[source]

Given a destination ID or tag, return all JobDestinations matching the provided ID or tag

Parameters:id_or_tag (str) – A destination ID or tag.
Returns:list or tuple of JobDestinations

Destinations are not deepcopied, so they should not be passed to anything which might modify them.

get_handler(id_or_tag)[source]

Given a handler ID or tag, return the provided ID or an ID matching the provided tag

Parameters:id_or_tag (str) – A handler ID or tag.
Returns:str – A valid job handler ID.
get_job_runner_plugins(handler_id)[source]

Load all configured job runner plugins

Returns:list of job runner plugins
get_job_tool_configurations(ids)[source]

Get all configured JobToolConfigurations for a tool ID, or, if given a list of IDs, the JobToolConfigurations for the first id in ids matching a tool definition.

Note

You should not mix tool shed tool IDs, versionless tool shed IDs, and tool config tool IDs that refer to the same tool.

Parameters:ids (list or str.) – Tool ID or IDs to fetch the JobToolConfiguration of.
Returns:list – JobToolConfiguration Bunches representing <tool> elements matching the specified ID(s).

Example tool ID strings include:

  • Full tool shed id: toolshed.example.org/repos/nate/filter_tool_repo/filter_tool/1.0.0
  • Tool shed id less version: toolshed.example.org/repos/nate/filter_tool_repo/filter_tool
  • Tool config tool id: filter_tool
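
A minimal sketch of the first-match lookup described above; the `_tool_configs` dict and `lookup` name are illustrative, not Galaxy's internals:

```python
# Hypothetical mapping from tool IDs to their configured <tool> elements.
_tool_configs = {
    "toolshed.example.org/repos/nate/filter_tool_repo/filter_tool/1.0.0": ["versioned tool config"],
    "filter_tool": ["tool config by short id"],
}

def lookup(ids):
    """Return configurations for the first ID in ids matching a definition."""
    if isinstance(ids, str):
        ids = [ids]
    for tool_id in ids:
        if tool_id in _tool_configs:
            return _tool_configs[tool_id]
    return []
```
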
get_tool_resource_xml(tool_id, tool_type)[source]

Given a tool id, return XML elements describing parameters to insert into job resources.

Tool id:A tool ID (a string)
Tool type:A tool type (a string)
Returns:List of parameter elements.
is_handler(server_name)[source]

Given a server name, indicate whether the server is a job handler

Parameters:server_name (str) – The name to check
Returns:bool
is_id(collection)[source]

Given a collection of handlers or destinations, indicate whether the collection represents a tag or a real ID

Parameters:collection (tuple or list) – A representation of a destination or handler
Returns:bool
is_tag(collection)[source]

Given a collection of handlers or destinations, indicate whether the collection represents a tag or a real ID

Parameters:collection (tuple or list) – A representation of a destination or handler
Returns:bool
class galaxy.jobs.JobDestination(**kwds)[source]

Bases: galaxy.util.bunch.Bunch

Provides details about where a job runs

__init__(**kwds)[source]
class galaxy.jobs.JobToolConfiguration(**kwds)[source]

Bases: galaxy.util.bunch.Bunch

Provides details on what handler and destination a tool should use

A JobToolConfiguration will have the required attribute ‘id’ and optional attributes ‘handler’, ‘destination’, and ‘params’

__init__(**kwds)[source]
get_resource_group()[source]
class galaxy.jobs.JobWrapper(job, queue, use_persisted_destination=False)[source]

Bases: object, galaxy.jobs.HasResourceParameters

Wraps a ‘model.Job’ with convenience methods for running processes and state management.

__init__(job, queue, use_persisted_destination=False)[source]
can_split()[source]
change_ownership_for_run()[source]
change_state(state, info=False, flush=True, job=None)[source]
check_limits(runtime=None)[source]
check_tool_output(stdout, stderr, tool_exit_code, job)[source]
cleanup(delete_files=True)[source]
cleanup_job

Remove the job after it is complete; should return “always”, “onsuccess”, or “never”.

clear_working_directory()[source]
commands_in_new_shell
compute_outputs()[source]
default_compute_environment(job=None)[source]
disable_commands_in_new_shell()[source]

Provide an extension point to disable this isolation. Pulsar builds its own job script, so this is not needed for remote jobs.

fail(message, exception=False, stdout='', stderr='', exit_code=None)[source]

Indicate job failure by setting state and message on all output datasets.

finish(stdout, stderr, tool_exit_code=None, remote_working_directory=None, remote_metadata_directory=None)[source]

Called to indicate that the associated command has been run. Updates the output datasets based on stderr and stdout from the command, and the contents of the output files.

galaxy_lib_dir
galaxy_system_pwent
galaxy_virtual_env
get_command_line()[source]
get_dataset_finish_context(job_context, dataset)[source]
get_destination_configuration(key, default=None)[source]

Get a destination parameter that can be defaulted back in app.config if it needs to be applied globally.

get_env_setup_clause()[source]
get_id_tag()[source]
get_input_dataset_fnames(ds)[source]
get_input_fnames()[source]
get_input_paths(job=None)[source]
get_job()[source]
get_job_runner()
get_job_runner_url()[source]
get_mutable_output_fnames()[source]
get_output_destination(output_path)[source]

Destination for outputs marked as from_work_dir. This is the normal case: just copy these files directly to the ultimate destination.

get_output_file_id(file)[source]
get_output_fnames()[source]
get_output_hdas_and_fnames()[source]
get_output_sizes()[source]
get_parallelism()[source]
get_param_dict()[source]

Restore the dictionary of parameters from the database.

get_session_id()[source]
get_state()[source]
get_tool_provided_job_metadata()[source]
get_version_string_path()[source]
has_limits()[source]
invalidate_external_metadata()[source]
is_ready_for_resubmission(job=None)[source]
job_destination

Return the JobDestination that this job will use to run. This will either be a configured destination, a randomly selected destination if the configured destination was a tag, or a dynamically generated destination from the dynamic runner.

Calling this method for the first time causes the dynamic runner to do its calculation, if any.

Returns:JobDestination
mark_as_resubmitted(info=None)[source]
pause(job=None, message=None)[source]
prepare(compute_environment=None)[source]

Prepare the job to run by creating the working directory and the config files.

reclaim_ownership()[source]
requires_containerization
requires_setting_metadata
set_job_destination(job_destination, external_id=None, flush=True, job=None)[source]

Persist job destination params in the database for recovery.

self.job_destination is not used because a runner may choose to rewrite parts of the destination (e.g. the params).

set_runner(runner_url, external_id)[source]
setup_external_metadata(exec_dir=None, tmp_dir=None, dataset_files_path=None, config_root=None, config_file=None, datatypes_config=None, resolve_metadata_dependencies=False, set_extension=True, **kwds)[source]
shell
strict_shell
user
user_system_pwent
class galaxy.jobs.NoopQueue[source]

Bases: object

Implements the JobQueue / JobStopQueue interface but does nothing

put(*args, **kwargs)[source]
put_stop(*args)[source]
shutdown()[source]
class galaxy.jobs.ParallelismInfo(tag)[source]

Bases: object

Stores the information (if any) for running multiple instances of the tool in parallel on the same set of inputs.

__init__(tag)[source]
class galaxy.jobs.SharedComputeEnvironment(job_wrapper, job)[source]

Bases: galaxy.jobs.SimpleComputeEnvironment

Default ComputeEnvironment for job and task wrappers to pass to ToolEvaluator - valid when Galaxy and the compute resource share all the relevant file systems.

__init__(job_wrapper, job)[source]
input_paths()[source]
new_file_path()[source]
output_paths()[source]
tool_directory()[source]
version_path()[source]
working_directory()[source]
class galaxy.jobs.SimpleComputeEnvironment[source]

Bases: object

config_directory()[source]
sep()[source]
unstructured_path_rewriter()[source]
class galaxy.jobs.TaskWrapper(task, queue)[source]

Bases: galaxy.jobs.JobWrapper

Extension of JobWrapper intended for running tasks. Should be refactored into a generalized executable-unit wrapper parent, with jobs and tasks as subclasses.

__init__(task, queue)[source]
can_split()[source]
change_state(state, info=False, flush=True, job=None)[source]
cleanup(delete_files=True)[source]
fail(message, exception=False)[source]
finish(stdout, stderr, tool_exit_code=None)[source]

Called to indicate that the associated command has been run. Updates the output datasets based on stderr and stdout from the command, and the contents of the output files.

get_command_line()[source]
get_dataset_finish_context(job_context, dataset)[source]
get_exit_code()[source]
get_id_tag()[source]
get_job()[source]
get_output_destination(output_path)[source]

Destination for outputs marked as from_work_dir. These must be copied with the same basename as the path for the ultimate output destination. This is required in the task case so that they can be merged.

get_output_file_id(file)[source]
get_param_dict()[source]

Restore the dictionary of parameters from the database.

get_session_id()[source]
get_state()[source]
get_task()[source]
get_tool_provided_job_metadata()[source]
prepare(compute_environment=None)[source]

Prepare the job to run by creating the working directory and the config files.

set_runner(runner_url, external_id)[source]
setup_external_metadata(exec_dir=None, tmp_dir=None, dataset_files_path=None, config_root=None, config_file=None, datatypes_config=None, set_extension=True, **kwds)[source]
galaxy.jobs.config_exception(e, file)[source]

Subpackages

Submodules

galaxy.jobs.command_factory module

galaxy.jobs.command_factory.build_command(runner, job_wrapper, container=None, modify_command_for_container=True, include_metadata=False, include_work_dir_outputs=True, create_tool_working_directory=True, remote_command_params={}, metadata_directory=None)[source]

Compose the sequence of commands necessary to execute a job. This will currently include:

  • environment settings corresponding to any requirement tags
  • preparing input files
  • command line taken from job wrapper
  • commands to set metadata (if include_metadata is True)
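
The composition order above can be sketched as follows. This is a simplified stand-in for build_command; the parameter names and the use of "; " as a joiner are assumptions, not the actual implementation:

```python
def compose_command(env_setup, prepare_inputs, tool_command, metadata_command=None):
    """Join job script parts in the order build_command composes them."""
    parts = list(env_setup)          # environment settings for requirement tags
    parts.extend(prepare_inputs)     # input file preparation
    parts.append(tool_command)       # command line taken from the job wrapper
    if metadata_command:             # only when include_metadata is True
        parts.append(metadata_command)
    return "; ".join(parts)
```
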

galaxy.jobs.datasets module

Utility classes allowing Job interface to reason about datasets.

class galaxy.jobs.datasets.DatasetPath(dataset_id, real_path, false_path=None, false_extra_files_path=None, mutable=True)[source]

Bases: object

__init__(dataset_id, real_path, false_path=None, false_extra_files_path=None, mutable=True)[source]
with_path_for_job(false_path, false_extra_files_path=None)[source]

Clone the dataset path but with a new false_path.

class galaxy.jobs.datasets.DatasetPathRewriter[source]

Bases: object

Used by runner to rewrite paths.

rewrite_dataset_path(dataset, dataset_type)[source]

Dataset type is ‘input’ or ‘output’. Return None to indicate not to rewrite this path.
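
A hypothetical rewriter following this contract; the class name, prefix scheme, and the output-only policy are illustrative:

```python
class PrefixDatasetPathRewriter:
    """Rewrite output dataset paths under a fixed prefix; leave inputs alone."""

    def __init__(self, prefix):
        self.prefix = prefix

    def rewrite_dataset_path(self, dataset, dataset_type):
        # dataset_type is 'input' or 'output'; returning None means
        # the dataset's original path is kept.
        if dataset_type == "output":
            return "%s/dataset_%s.dat" % (self.prefix, dataset.id)
        return None
```
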

class galaxy.jobs.datasets.NullDatasetPathRewriter[source]

Bases: object

Used by default for jobwrapper, do not rewrite anything.

rewrite_dataset_path(dataset, dataset_type)[source]

Keep path the same.

class galaxy.jobs.datasets.OutputsToWorkingDirectoryPathRewriter(working_directory)[source]

Bases: object

Rewrites all paths to place them in the specified working directory for normal jobs when Galaxy is configured with app.config.outputs_to_working_directory. Job runner base class is responsible for copying these out after job is complete.

__init__(working_directory)[source]
rewrite_dataset_path(dataset, dataset_type)[source]

Keep path the same.

class galaxy.jobs.datasets.TaskPathRewriter(working_directory, job_dataset_path_rewriter)[source]

Bases: object

Rewrites all paths to place them in the specified working directory for TaskWrapper. TaskWrapper is responsible for putting them there and pulling them out.

__init__(working_directory, job_dataset_path_rewriter)[source]
rewrite_dataset_path(dataset, dataset_type)[source]
galaxy.jobs.datasets.dataset_path_rewrites(dataset_paths)[source]

galaxy.jobs.error_level module

class galaxy.jobs.error_level.StdioErrorLevel[source]

Bases: object

FATAL = 3
LOG = 1
MAX = 3
NO_ERROR = 0
WARNING = 2
static desc(error_level)[source]
descs = {0: 'No error', 1: 'Log', 2: 'Warning', 3: 'Fatal error'}

galaxy.jobs.handler module

Galaxy job handler, prepares, runs, tracks, and finishes Galaxy jobs

class galaxy.jobs.handler.DefaultJobDispatcher(app)[source]

Bases: object

__init__(app)[source]
put(job_wrapper)[source]
recover(job, job_wrapper)[source]
shutdown()[source]
stop(job)[source]

Stop the given job. The input variable job may be either a Job or a Task.

url_to_destination(url)[source]

This is used by the runner mapper (a.k.a. dynamic runner) and recovery methods to have runners convert URLs to destinations.

New-style runner plugin IDs must match the URL’s scheme for this to work.

class galaxy.jobs.handler.JobHandler(app)[source]

Bases: object

Handle the preparation, running, tracking, and finishing of jobs

__init__(app)[source]
shutdown()[source]
start()[source]
class galaxy.jobs.handler.JobHandlerQueue(app, dispatcher)[source]

Bases: object

The job handler’s internal queue; this is what actually implements waiting for jobs to become runnable and dispatching them to a JobRunner.

STOP_SIGNAL = <object object>
__init__(app, dispatcher)[source]

Initializes the Job Handler Queue and creates the (unstarted) monitoring thread.

get_total_job_count_per_destination()[source]
get_user_job_count(user_id)[source]
get_user_job_count_per_destination(user_id)[source]
increase_running_job_count(user_id, destination_id)[source]
job_pair_for_id(id)[source]
job_wrapper(job, use_persisted_destination=False)[source]
put(job_id, tool_id)[source]

Add a job to the queue (by job identifier)

shutdown()[source]

Attempts to gracefully shut down the worker thread

start()[source]

Starts the JobHandler’s thread after checking for any unhandled jobs.

class galaxy.jobs.handler.JobHandlerStopQueue(app, dispatcher)[source]

Bases: object

A queue for jobs which need to be terminated prematurely.

STOP_SIGNAL = <object object>
__init__(app, dispatcher)[source]
monitor()[source]

Continually iterate over the waiting jobs, stopping any that are found.

monitor_step()[source]

Called repeatedly by monitor to stop jobs.

put(job_id, error_msg=None)[source]
shutdown()[source]

Attempts to gracefully shut down the worker thread

galaxy.jobs.manager module

Top-level Galaxy job manager, moves jobs to handler(s)

class galaxy.jobs.manager.JobManager(app)[source]

Bases: object

Highest level interface to job management.

TODO: Currently the app accesses “job_queue” and “job_stop_queue” directly.
This should be decoupled.
__init__(app)[source]
shutdown()[source]
start()[source]
class galaxy.jobs.manager.NoopHandler(*args, **kwargs)[source]

Bases: object

__init__(*args, **kwargs)[source]
shutdown(*args)[source]
start()[source]

galaxy.jobs.mapper module

exception galaxy.jobs.mapper.JobMappingException(failure_message)[source]

Bases: exceptions.Exception

__init__(failure_message)[source]
exception galaxy.jobs.mapper.JobNotReadyException(job_state=None, message=None)[source]

Bases: exceptions.Exception

__init__(job_state=None, message=None)[source]
class galaxy.jobs.mapper.JobRunnerMapper(job_wrapper, url_to_destination, job_config)[source]

Bases: object

This class is responsible for managing the mapping of jobs (in the form of job_wrappers) to job runner URL strings.

__init__(job_wrapper, url_to_destination, job_config)[source]
cache_job_destination(raw_job_destination)[source]
get_job_destination(params)[source]

Cache the job_destination to avoid recalculation.

galaxy.jobs.output_checker module

galaxy.jobs.output_checker.check_output(tool, stdout, stderr, tool_exit_code, job)[source]

Check the output of a tool - given the stdout, stderr, and the tool’s exit code, return True if the tool exited successfully and False otherwise. No exceptions should be thrown. If this code encounters an exception, it returns True so that the workflow can continue; otherwise, a bug in this code could halt workflow progress.

Note that, if the tool did not define any exit code handling or any stdio/stderr handling, then it reverts to the previous behavior: if stderr contains anything, False is returned.
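
That legacy fallback can be sketched as follows; this is an illustrative stand-in, not the actual check_output code:

```python
def legacy_check_output(stdout, stderr):
    """Legacy behavior: any content on stderr marks the job as failed."""
    return not stderr
```
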

Note that the job id is just for messages.

galaxy.jobs.rule_helper module

class galaxy.jobs.rule_helper.RuleHelper(app)[source]

Bases: object

Utility to allow job rules to interface cleanly with the rest of Galaxy and shield them from low-level details of models, metrics, etc.

Currently focus is on figuring out job statistics for a given user, but could interface with other stuff as well.

__init__(app)[source]
choose_one(lst, hash_value=None)[source]

Choose a random value from the supplied list. If hash_value is passed in, then every request with that same hash_value will produce the same choice from the supplied list.

job_count(**kwds)[source]
job_hash(job, hash_by=None)[source]

Produce a reproducible hash for the given job based on various criteria. For instance, if hash_by is “workflow_invocation,history”, all jobs within the same workflow invocation will receive the same hash; for jobs outside of workflows, all jobs within the same history will receive the same hash; other jobs will be hashed on the job’s ID.

Primarily intended for use with choose_one above, to consistently route or schedule related jobs.
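
Combined with choose_one, this yields consistent routing: jobs sharing a hash value always map to the same destination. A sketch of the idea follows; the SHA-256 scheme here is an assumption for illustration, not Galaxy's actual hashing:

```python
import hashlib
import random

def choose_one(lst, hash_value=None):
    """Pick from lst: random without a hash_value, deterministic with one."""
    if hash_value is None:
        return random.choice(lst)
    # The same hash_value always selects the same index.
    digest = hashlib.sha256(str(hash_value).encode()).hexdigest()
    return lst[int(digest, 16) % len(lst)]
```

For example, routing every job in a history with `choose_one(destinations, hash_value="history_%s" % history_id)` sends related jobs to the same destination.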

metric_query(select, metric_name, plugin, numeric=True)[source]
query(select_expression)[source]
should_burst(destination_ids, num_jobs, job_states=None)[source]

Check whether the specified destinations destination_ids have at least num_jobs assigned to them; pass job_states as “queued” to limit this check to the number of jobs queued.

See stock_rules for a simple example of using this function. To get the most out of it, it should probably be used with custom job rules that can respond to bursting by allocating resources, launching cloud nodes, etc.
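
A hypothetical custom rule built on should_burst; the destination IDs and the threshold of 50 are made-up values:

```python
def burst_rule(rule_helper, job):
    """Route to a cloud destination once the local queue is saturated."""
    # If the (hypothetical) local_cluster destination already has 50 or
    # more queued jobs, burst this job to the (hypothetical) cloud.
    if rule_helper.should_burst(["local_cluster"], num_jobs=50, job_states="queued"):
        return "cloud_destination"
    return "local_cluster"
```
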

sum_job_runtime(**kwds)[source]
supports_docker(job_or_tool)[source]

Job rules can pass this function a job, job_wrapper, or tool to determine whether the underlying tool believes it can be run in a container.

galaxy.jobs.stock_rules module

Stock job ‘dynamic’ rules for use in job_conf.xml. These may cover some simple use cases, but they just proxy into functions in rule_helper, so similar - but more tailored and composable - functionality can be utilized in custom rules.

galaxy.jobs.stock_rules.burst(rule_helper, job, from_destination_ids, to_destination_id, num_jobs, job_states=None)[source]
galaxy.jobs.stock_rules.choose_one(rule_helper, job, destination_ids, hash_by='job')[source]
galaxy.jobs.stock_rules.docker_dispatch(rule_helper, tool, docker_destination_id, default_destination_id)[source]

galaxy.jobs.transfer_manager module

Manage transfers from arbitrary URLs to temporary files. Socket interface for IPC with multiple process configurations.

class galaxy.jobs.transfer_manager.TransferManager(app)[source]

Bases: object

Manage simple data transfers from URLs to temporary locations.

__init__(app)[source]
get_state(transfer_jobs, via_socket=False)[source]
new(path=None, **kwd)[source]
run(transfer_jobs)[source]

This method blocks, so if invoking the transfer manager ever starts taking too long, we should move it to a thread. However, the transfer_manager will either daemonize or return after submitting to a running daemon, so it should be fairly quick to return.

shutdown()[source]