galaxy.visualization.data_providers package

Galaxy visualization/visual analysis data providers.

Submodules

galaxy.visualization.data_providers.basic module

class galaxy.visualization.data_providers.basic.BaseDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]

Bases: object

Base class for data providers. Data providers (a) read and package data from datasets; and (b) write subsets of data to new datasets.

__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]

Create basic data provider.

has_data(**kwargs)[source]

Returns true if dataset has data in the specified genome window, false otherwise.

get_iterator(**kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end.

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

get_data(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Returns data as specified by kwargs. start_val is the first element to return and max_vals indicates the number of values to return.

Return value must be a dictionary with the following attributes:
dataset_type, data
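
As a rough illustration of this contract, the standalone sketch below mimics the get_iterator / process_data / get_data flow without importing Galaxy. The class name ListDataProvider, its rows attribute, and the 'list' dataset type are hypothetical, and the chrom, start, and end arguments are omitted for brevity; only the method names and the dataset_type and data keys come from this page.

```python
import sys
from itertools import islice


class ListDataProvider:
    """Hypothetical stand-in for a BaseDataProvider subclass (not Galaxy code)."""

    dataset_type = "list"  # assumed label, for illustration only

    def __init__(self, rows):
        self.rows = rows  # in-memory rows standing in for a dataset

    def get_iterator(self, **kwargs):
        return iter(self.rows)

    def process_data(self, iterator, start_val=0, max_vals=None, **kwargs):
        # Skip start_val elements, then take at most max_vals elements.
        # islice rejects stop values above sys.maxsize, so clamp the sum.
        stop = None if max_vals is None else min(start_val + max_vals, sys.maxsize)
        return {"data": list(islice(iterator, start_val, stop))}

    def get_data(self, start_val=0, max_vals=sys.maxsize, **kwargs):
        result = self.process_data(self.get_iterator(**kwargs), start_val, max_vals)
        result["dataset_type"] = self.dataset_type
        return result


provider = ListDataProvider(["a", "b", "c", "d"])
payload = provider.get_data(start_val=1, max_vals=2)
```

Here payload holds the two elements after the first one under data, together with the provider's dataset_type.
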
write_data_to_file(filename, **kwargs)[source]

Write data in region defined by chrom, start, and end to a file.

class galaxy.visualization.data_providers.basic.ColumnDataProvider(original_dataset, max_lines_returned=30000)[source]

Bases: galaxy.visualization.data_providers.basic.BaseDataProvider

Data provider for columnar data.

MAX_LINES_RETURNED = 30000
__init__(original_dataset, max_lines_returned=30000)[source]
get_data(columns=None, start_val=0, max_vals=None, skip_comments=True, **kwargs)[source]

Returns data from the specified columns in the dataset. The format is a list of lists, where each inner list is a line of data.
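
The list-of-lists return format can be pictured with a small standalone sketch. It assumes tab-separated input, treats lines starting with # as comments, and leaves every value as a string; the helper name read_columns is hypothetical and this is not the Galaxy implementation.

```python
import io


def read_columns(handle, columns, start_val=0, max_vals=None, skip_comments=True):
    """Return selected columns of a tab-separated file as a list of lists."""
    rows = []
    seen = 0  # number of data (non-comment) lines encountered so far
    for line in handle:
        if max_vals is not None and len(rows) >= max_vals:
            break
        if skip_comments and line.startswith("#"):
            continue
        if seen >= start_val:
            fields = line.rstrip("\n").split("\t")
            rows.append([fields[c] for c in columns])
        seen += 1
    return rows


text = "#chrom\tstart\tend\nchr1\t10\t20\nchr1\t30\t40\nchr2\t5\t15\n"
rows = read_columns(io.StringIO(text), columns=[0, 2], start_val=1, max_vals=2)
```

With start_val=1 the first data line is skipped, so rows contains columns 0 and 2 of the second and third data lines.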

galaxy.visualization.data_providers.cigar module

Functions for working with SAM/BAM CIGAR representation.

galaxy.visualization.data_providers.cigar.get_ref_based_read_seq_and_cigar(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]

Returns a ( new_read_seq, new_cigar ) that can be used with reference sequence to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft clipped bases are not included). The new cigar replaces Ms with =s and Xs because the M operation can denote a sequence match or mismatch.
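
The transformation described above can be sketched as follows. This is a simplified illustration, not the Galaxy function: the name sketch_ref_based_read is hypothetical, the CIGAR is assumed to be a list of (operation letter, length) tuples, only the M, I, D, and S operations are handled, and ref_seq is assumed to cover the whole aligned read.

```python
from itertools import groupby


def sketch_ref_based_read(read_seq, read_start, ref_seq, ref_seq_start, cigar):
    """Return (new_read_seq, new_cigar) for a read aligned against ref_seq."""
    ref_pos = read_start - ref_seq_start  # offset of the read within ref_seq
    read_pos = 0
    new_seq = []
    new_cigar = []
    for op, length in cigar:
        if op == "M":
            # Compare base by base, then collapse runs into = and X operations.
            flags = [
                read_seq[read_pos + i] == ref_seq[ref_pos + i]
                for i in range(length)
            ]
            offset = 0
            for is_match, run in groupby(flags):
                run_len = len(list(run))
                if is_match:
                    new_cigar.append(f"{run_len}=")
                else:
                    new_cigar.append(f"{run_len}X")
                    # Mismatched bases cannot be recovered from the reference.
                    begin = read_pos + offset
                    new_seq.append(read_seq[begin:begin + run_len])
                offset += run_len
            read_pos += length
            ref_pos += length
        elif op == "I":
            # Inserted bases are absent from the reference, so keep them.
            new_cigar.append(f"{length}I")
            new_seq.append(read_seq[read_pos:read_pos + length])
            read_pos += length
        elif op == "D":
            new_cigar.append(f"{length}D")
            ref_pos += length
        elif op == "S":
            # Soft-clipped bases are skipped and not included in the new read.
            new_cigar.append(f"{length}S")
            read_pos += length
        else:
            raise ValueError(f"unsupported CIGAR op in this sketch: {op}")
    return "".join(new_seq), "".join(new_cigar)


result = sketch_ref_based_read(
    "GACACG", 2, "ACGTACGT", 0, [("M", 2), ("I", 1), ("M", 3)]
)
```

In this example the read matches the reference at its first base, mismatches at the second, and then carries a one-base insertion, so result is ("AC", "1=1X1I3=").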

galaxy.visualization.data_providers.genome module

Data providers for genome visualizations.

galaxy.visualization.data_providers.genome.float_nan(n)[source]

Returns None instead of NaN so that the value passes jQuery 1.4’s strict JSON parsing.
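
The motivation is that Python's json module serializes NaN as the bare token NaN, which is not valid strict JSON, whereas None becomes null. The sketch below shows one way to get this behaviour; the name float_or_none is hypothetical and the Galaxy source may be written differently.

```python
import json
import math


def float_or_none(n):
    """Return None for NaN, otherwise the value as a float."""
    value = float(n)
    return None if math.isnan(value) else value


encoded = json.dumps([float_or_none(1.5), float_or_none(float("nan"))])
```

Here encoded is the string [1.5, null], which a strict JSON parser accepts.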

galaxy.visualization.data_providers.genome.get_bounds(reads, start_pos_index, end_pos_index)[source]

Returns the minimum and maximum position for a set of reads.
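
A minimal sketch of the idea, assuming each read is an indexable sequence and the list of reads is not empty. The function name bounds and the sample reads are illustrative only.

```python
def bounds(reads, start_pos_index, end_pos_index):
    """Return (lowest start, highest end) over a non-empty list of reads."""
    lows = [read[start_pos_index] for read in reads]
    highs = [read[end_pos_index] for read in reads]
    return min(lows), max(highs)


reads = [["r1", 100, 150], ["r2", 90, 140], ["r3", 120, 170]]
low, high = bounds(reads, start_pos_index=1, end_pos_index=2)
```

For these three reads, low is 90 and high is 170.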

class galaxy.visualization.data_providers.genome.FeatureLocationIndexDataProvider(converted_dataset)[source]

Bases: galaxy.visualization.data_providers.basic.BaseDataProvider

Reads/writes/queries feature location index (FLI) datasets.

__init__(converted_dataset)[source]
get_data(query)[source]
class galaxy.visualization.data_providers.genome.GenomeDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.basic.BaseDataProvider

Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
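
For readers less familiar with the BED convention, the sketch below converts a 1-based, closed interval to 0-based, half-open form and tests whether a position falls inside it. Both helper names are hypothetical and are not part of Galaxy.

```python
def to_bed_interval(start_1based, end_1based_inclusive):
    """Convert a 1-based, closed interval to 0-based, half-open."""
    return start_1based - 1, end_1based_inclusive


def contains(start, end, position):
    """True if a 0-based position lies inside the half-open interval [start, end)."""
    return start <= position < end


bed = to_bed_interval(1, 100)  # the first 100 bases of a chromosome
length = bed[1] - bed[0]       # half-open intervals make length a plain subtraction
```

The first 100 bases become (0, 100): position 0 is inside the interval, position 100 is not, and the length is 100.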

dataset_type = None

Mapping from column name to payload data; this mapping is used to create filters. Key is column name, value is a dict with mandatory key ‘index’ and optional key ‘name’. E.g. this defines column 4:

col_name_data_attr_mapping = {4: {'index': 5, 'name': 'Score'}}

col_name_data_attr_mapping = {}
__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
write_data_to_file(regions, filename)[source]

Write data in the given regions to a file; each region is defined by chrom, start, and end.

valid_chroms()[source]

Returns the chroms/contigs that the dataset contains.

has_data(chrom, start, end, **kwargs)[source]

Returns true if dataset has data in the specified genome window, false otherwise.

open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end.

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

get_data(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Returns data in the region defined by chrom, low (start), and high (end). start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.

Return value must be a dictionary with the following attributes:
dataset_type, data
get_genome_data(chroms_info, **kwargs)[source]

Returns data for complete genome.

get_filters()[source]

Returns filters for provider’s data. Return value is a list of filters; each filter is a dictionary with the keys ‘name’, ‘index’, ‘type’. NOTE: This method uses the original dataset’s datatype and metadata to create the filters.
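
To make the shape of the returned filters concrete, the sketch below builds them from a col_name_data_attr_mapping. It is an assumption-laden illustration: the function name build_filters and the column_types argument are hypothetical stand-ins for the datatype and metadata lookup, and falling back to the column name when 'name' is missing is a guess, not documented behaviour.

```python
def build_filters(col_name_data_attr_mapping, column_types):
    """Build filter dicts from a column mapping (illustrative only)."""
    filters = []
    for col_name, attrs in col_name_data_attr_mapping.items():
        if col_name not in column_types:
            continue  # no type information for this column; skip it
        filters.append(
            {
                "name": attrs.get("name", str(col_name)),  # 'name' is optional
                "index": attrs["index"],                   # 'index' is mandatory
                "type": column_types[col_name],
            }
        )
    return filters


mapping = {4: {"index": 4, "name": "Score"}}
filters = build_filters(mapping, column_types={4: "int"})
```

Each resulting filter carries the three keys named above: name, index, and type.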

get_default_max_vals()[source]
class galaxy.visualization.data_providers.genome.FilterableMixin[source]

Bases: object

get_filters()[source]

Returns a dataset’s filters.

class galaxy.visualization.data_providers.genome.TabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin

dataset_type = 'tabix'

Tabix index data provider for the Galaxy track browser.

col_name_data_attr_mapping = {4: {'index': 4, 'name': 'Score'}}
open_data_file()[source]
get_iterator(data_file, chrom, start, end, **kwargs)[source]
write_data_to_file(regions, filename)[source]
class galaxy.visualization.data_providers.genome.IntervalDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

dataset_type = 'interval_index'

Processes interval data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]

get_iterator(data_file, chrom, start, end, **kwargs)[source]
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Provides data from the iterator in the payload format described above.

write_data_to_file(regions, filename)[source]
class galaxy.visualization.data_providers.genome.IntervalTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.IntervalDataProvider

Provides data from a BED file indexed via tabix.

class galaxy.visualization.data_providers.genome.BedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

Processes BED data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
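
A rough sketch of packaging one tab-separated BED line into this payload layout. The function name bed_line_to_payload is hypothetical, the representation of blocks as absolute (start, end) pairs is an assumption, and optional fields are only added when the line has enough columns; the Galaxy implementation may differ in these details.

```python
def bed_line_to_payload(line, offset):
    """Package a BED line as [uid, start, end, name, strand, thick_start, thick_end, blocks]."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    payload = [offset, start, end]  # the file offset serves as the uid
    if len(fields) >= 6:
        payload += [fields[3], fields[5]]  # name, strand (score in fields[4] is skipped)
    if len(fields) >= 8:
        payload += [int(fields[6]), int(fields[7])]  # thick_start, thick_end
    if len(fields) >= 12:
        sizes = [int(s) for s in fields[10].split(",") if s]
        starts = [int(s) for s in fields[11].split(",") if s]
        # BED block starts are relative to the feature start; make them absolute.
        payload.append([(start + bs, start + bs + size) for bs, size in zip(starts, sizes)])
    return payload


line = "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t100,200\t0,3800\n"
payload = bed_line_to_payload(line, offset=0)
```

For this BED12 line the payload ends with two blocks, (1000, 1100) and (4800, 5000); a three-column line yields only the uid, start, and end.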

dataset_type = 'interval_index'
get_iterator(data_file, chrom, start, end, **kwargs)[source]
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Provides data from the iterator in the payload format described above.

write_data_to_file(regions, filename)[source]
class galaxy.visualization.data_providers.genome.BedTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.BedDataProvider

Provides data from a BED file indexed via tabix.

class galaxy.visualization.data_providers.genome.RawBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.BedDataProvider

Provides data from a BED file.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
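
The cost comes from having to read every line to answer a region query. The sketch below shows such a linear scan over tab-separated BED lines; the generator name scan_bed is hypothetical and this is not the provider's actual code.

```python
def scan_bed(lines, chrom, start, end):
    """Yield BED fields for features overlapping chrom:start-end (0-based, half-open)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != chrom:
            continue  # header line or a different chromosome
        feature_start, feature_end = int(fields[1]), int(fields[2])
        if feature_start < end and feature_end > start:
            yield fields


lines = ["chr1\t0\t100\n", "chr1\t100\t200\n", "chr2\t50\t150\n"]
hits = list(scan_bed(lines, "chr1", 50, 100))
```

Only the first feature is returned: the second starts exactly at the half-open end of the query and the third is on another chromosome. Every query still walks the whole input, which is what an index avoids.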

get_iterator(data_file, chrom=None, start=None, end=None, **kwargs)[source]
class galaxy.visualization.data_providers.genome.VcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

Abstract class that processes VCF data from native format to payload format.

Payload format: An array of entries for each locus in the file. Each array has the following entries:

  1. GUID (unused)
  2. location (0-based)
  3. reference base(s)
  4. alternative base(s)
  5. quality score
  6. whether variant passed filter
  7. sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype
  8. (through end) allele counts for each alternative

col_name_data_attr_mapping = {'Qual': {'index': 6, 'name': 'Qual'}}
dataset_type = 'variant'
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Returns a dict with the following attributes:

data - a list of variants with the format
    [<guid>, <start>, <end>, <name>, cigar, seq]

message - error/informative message
write_data_to_file(regions, filename)[source]
class galaxy.visualization.data_providers.genome.VcfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.VcfDataProvider

Provides data from a VCF file indexed via tabix.

dataset_type = 'variant'
class galaxy.visualization.data_providers.genome.RawVcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.VcfDataProvider

Provides data from a VCF file.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.

open_data_file()[source]
get_iterator(data_file, chrom, start, end, **kwargs)[source]
class galaxy.visualization.data_providers.genome.BamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin

Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.

dataset_type = 'bai'
get_filters()[source]

Returns filters for dataset.

write_data_to_file(regions, filename)[source]

Write reads in regions to file.

open_data_file()[source]
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end.

process_data(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]

Returns a dict with the following attributes:

data - a list of reads with the format
    [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>]

    where <read_1> has the format
        [<start>, <end>, <cigar>, <strand>, <read_seq>]

    and <read_2> has the format
        [<start>, <end>, <cigar>, <strand>, <read_seq>]

    Field 7 is empty so that mapq scores' location matches that in single-end reads.
    For single-end reads, read has format:
        [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]

    NOTE: read end and sequence data are not valid for reads outside of
    requested region and should not be used.

max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
class galaxy.visualization.data_providers.genome.SamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None)[source]

Bases: galaxy.visualization.data_providers.genome.BamDataProvider

dataset_type = 'bai'
__init__(converted_dataset=None, original_dataset=None, dependencies=None)[source]

Create SamDataProvider.

class galaxy.visualization.data_providers.genome.BBIDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

BBI data provider for the Galaxy track browser.

dataset_type = 'bigwig'
valid_chroms()[source]
has_data(chrom)[source]
get_data(chrom, start, end, start_val=0, max_vals=None, num_samples=1000, **kwargs)[source]
class galaxy.visualization.data_providers.genome.BigBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.BBIDataProvider

class galaxy.visualization.data_providers.genome.BigWigDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.BBIDataProvider

Provides data from BigWig files; position data is reported in a 1-based coordinate system, i.e. wiggle format.

class galaxy.visualization.data_providers.genome.IntervalIndexDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin

Interval index files used for GFF and Pileup files.

col_name_data_attr_mapping = {4: {'index': 4, 'name': 'Score'}}
dataset_type = 'interval_index'
write_data_to_file(regions, filename)[source]
open_data_file()[source]
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator for data in data_file in the region chrom:start-end.

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
class galaxy.visualization.data_providers.genome.RawGFFDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

Provides data from a GFF file that has not been indexed.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.

dataset_type = 'interval_index'
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end as well as a file offset.

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

class galaxy.visualization.data_providers.genome.GtfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider

Returns data from GTF datasets that are indexed via tabix.

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
class galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

Abstract class that processes ENCODEPeak data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]

get_iterator(data_file, chrom, start, end, **kwargs)[source]
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Provides data from the iterator in the payload format described above.

class galaxy.visualization.data_providers.genome.ENCODEPeakTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider

Provides data from an ENCODEPeak dataset indexed via tabix.

get_filters()[source]

Returns filters for dataset.

class galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.GenomeDataProvider

process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]

Provides

get_default_max_vals()[source]
class galaxy.visualization.data_providers.genome.ChromatinInteractionsTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider

get_iterator(data_file, chrom, start=0, end=9223372036854775807, interchromosomal=False, **kwargs)[source]
galaxy.visualization.data_providers.genome.package_gff_feature(feature, no_detail=False, filter_cols=[])[source]

Package a GFF feature in an array for data providers.

galaxy.visualization.data_providers.registry module

class galaxy.visualization.data_providers.registry.DataProviderRegistry[source]

Bases: object

Registry for data providers that enables listing and lookup.

__init__()[source]
get_data_provider(trans, name=None, source='data', raw=False, original_dataset=None)[source]

Returns data provider matching parameter values. For standalone data sources, source parameter is ignored.