Data managers

What are Data Managers?

Data Managers are a special class of Galaxy tool that download and/or create data stored in Tool Data Tables and their underlying flat files (e.g. .loc files). These tools handle tasks such as building indexes and adding entries (lines) to the data table and its .loc file, all via the Galaxy admin interface.

Data Managers can be defined locally or installed through the Tool Shed.
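
To make the relationship between a Tool Data Table and its flat file concrete, here is a minimal sketch of a table definition of the kind found in Galaxy's `tool_data_table_conf.xml`. The table name `all_fasta`, its columns, and the .loc path are illustrative assumptions, not values that Data Managers require; the sample entry shown in the trailing comment is made up.

```xml
<tables>
    <!-- One Tool Data Table; each non-comment line of the .loc file is one entry -->
    <table name="all_fasta" comment_char="#">
        <!-- Column order must match the tab-separated fields of the .loc file -->
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/all_fasta.loc" />
    </table>
</tables>
<!-- A matching (hypothetical) all_fasta.loc line, fields separated by tabs:
     hg19    hg19    Human (hg19)    /data/genomes/hg19/hg19.fa -->
```

A Data Manager that targets this table would append lines of this shape to the .loc file instead of an administrator editing it by hand.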

A Video Introduction

For a video overview on Data Managers, see this presentation from GCC2013.

Tutorial

The most up-to-date methods, including how to use Data Manager repositories in the Tool Shed: GCC2014 TrainingDay

What Kind of Data is Supported

The Data Manager framework supports any kind of built-in (“pre-cached”) data that a tool developer would like to make available via a Tool Data Table. This includes reference genomes, indexes on a reference genome, BLAST databases, protein or pathway domain databases, and so on. This built-in data does not need to be associated with any type of reference, build, or dbkey (genomic or otherwise), but in many cases Tool Data Table entries and their Data Manager will be tied to a specific genomic build.

Graphical Overview of Interplay between Built-in Data and Galaxy Tools

../_images/data_managers_schematic_overview.png

Galaxy Data Manager XML File

The XML file for a Galaxy Data Manager, generally referred to as the “data manager config file”, serves several purposes. It makes Data Managers available to a Galaxy instance by specifying the id of each Data Manager and the Data Manager tool associated with it. It lists the Tool Data Tables that the Data Manager can add entries to. Finally, it specifies how to manipulate the raw column values provided by the Data Manager tool and under what directory structure to place the finalized data values.
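
The purposes listed above can be seen together in a single config file. The following is a sketch only: the tags nested inside `<data_table>` (`<output>`, `<column>`, `<move>`, `<value_translation>`) and the `${GALAXY_DATA_MANAGER_DATA_PATH}` variable are shown as they commonly appear in data manager configs and should be checked against the tag reference; the table name `twobit`, the output name `out_file`, and the target directory layout are assumptions chosen for illustration.

```xml
<data_managers>
    <!-- Registers the Data Manager and names the tool that backs it -->
    <data_manager tool_file="data_manager/twobit_builder.xml" id="twobit_builder">
        <!-- A Tool Data Table this Data Manager may add entries to (assumed name) -->
        <data_table name="twobit">
            <output>
                <!-- Column taken as-is from the tool's output -->
                <column name="value" />
                <!-- Column whose raw value is relocated and rewritten -->
                <column name="path" output_ref="out_file">
                    <!-- Move the produced file into the final directory structure -->
                    <move type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${value}/seq/${path}</target>
                    </move>
                    <!-- Rewrite the stored value to point at the moved file -->
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${value}/seq/${path}</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
</data_managers>
```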

Pay attention to the following when creating a new Data Manager:

Make sure your XML is valid - Improper XML will most likely cause Galaxy to not load your Data Managers. The easiest way to validate your XML is just to open the XML file itself in e.g. Firefox, which will either parse the file and display it, or show the error and its location in large letters.
Don’t forget to restart Galaxy - Galaxy loads and parses these XML files when it starts, so you’ll have to restart it after updating any of them. A restart is not needed if you only update an executable.
Make sure you use an id that is unique within your Galaxy instance - Galaxy can only load one Data Manager with a given id at a time.
When completed, make your Data Manager available in a ToolShed and install it from there - This will avoid any possible collisions due to non-unique IDs, as specialized name-spacing is utilized when Data Managers are installed from a ToolShed.

A Galaxy Data Manager’s config file consists of a subset of the following XML tag sets - each of these is described in detail in the following sections.

Details of XML tag sets

`<data_managers>` tag set

The outermost tag set. It takes no attributes. Any number of `<data_manager>` tags can be included within it.
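
As a rough illustration of the container, the sketch below registers two Data Managers side by side. The second entry's `tool_file` and `id` (`example_fetcher`) are hypothetical names invented for this example, and the contents of each `<data_manager>` are omitted.

```xml
<!-- The root element takes no attributes -->
<data_managers>
    <data_manager tool_file="data_manager/twobit_builder.xml" id="twobit_builder">
        <!-- data_table entries omitted -->
    </data_manager>
    <!-- Hypothetical second Data Manager -->
    <data_manager tool_file="data_manager/example_fetcher.xml" id="example_fetcher">
        <!-- data_table entries omitted -->
    </data_manager>
</data_managers>
```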

`<data_manager>` tag set

This tag defines a particular Data Manager. Any number of `<data_table>` tags can be included within it.

attribute	values	required	example	details
`tool_file`	A string*	yes	`tool_file="data_manager/twobit_builder.xml"`	The filename of the Data Manager Tool’s XML file, relative to the Galaxy root. Multiple Data Managers can use the same Tool, but doing so requires `id` to be declared for each of them.
`id`	A string*	no	`id="twobit_builder"`	Must be unique across all Data Managers; should be lowercase and contain only letters, numbers, and underscores. Although technically optional, specifying this value is best practice. When not specified, the id of the underlying Data Manager Tool is used.
`version`	A string*	no	`version="0.0.1"`	Deprecated as of release 21.09. The version of the Data Manager defaults to the version of the Data Manager Tool.
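
The note on `tool_file` about sharing one Tool between several Data Managers can be sketched as follows. Both entries point at the same tool XML, so each declares its own `id`; the second id, `twobit_builder_alt`, is a hypothetical name, and the deprecated `version` attribute is left out so that the tool's version applies.

```xml
<data_managers>
    <!-- Same tool_file twice, so an explicit and distinct id is given for each -->
    <data_manager tool_file="data_manager/twobit_builder.xml" id="twobit_builder">
        <!-- data_table entries omitted -->
    </data_manager>
    <data_manager tool_file="data_manager/twobit_builder.xml" id="twobit_builder_alt">
        <!-- data_table entries omitted -->
    </data_manager>
</data_managers>
```

If the `id` attributes were omitted here, both entries would fall back to the id of the underlying tool and collide.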