Warning

This document is for an in-development version of Galaxy.

galaxy.visualization.data_providers package

Galaxy visualization/visual analysis data providers.

Submodules

galaxy.visualization.data_providers.basic module

class galaxy.visualization.data_providers.basic.BaseDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]

Bases: object

Base class for data providers. Data providers both:

  • read and package data from datasets

  • write subsets of data to new datasets

__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]

Create basic data provider.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.basic.ColumnDataProvider(original_dataset, max_lines_returned=30000)[source]

Bases: BaseDataProvider

Data provider for columnar data

MAX_LINES_RETURNED = 30000
__init__(original_dataset, max_lines_returned=30000)[source]

Create basic data provider.

original_dataset: DatasetInstance
get_data(columns=None, start_val=0, max_vals=None, skip_comments=True, **kwargs)[source]

Returns data from the specified columns in the dataset as a list of lists, where each inner list holds one line of data.
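A minimal sketch of the documented behaviour, assuming tab-separated lines, 0-based column indices, and comment lines that start with `#`. `get_column_data` is a hypothetical stand-alone helper, not Galaxy's method, and it returns raw strings rather than converting values.

```python
from itertools import islice
from typing import Iterable, List, Optional


def get_column_data(
    lines: Iterable[str],
    columns: List[int],
    start_val: int = 0,
    max_vals: Optional[int] = None,
    skip_comments: bool = True,
) -> List[List[str]]:
    """Return the requested columns as a list of lists, one inner list per line."""
    # Strip newlines and drop blank lines.
    rows = (ln.rstrip("\n") for ln in lines)
    rows = (ln for ln in rows if ln)
    if skip_comments:
        # Assumption: comment lines begin with '#'.
        rows = (ln for ln in rows if not ln.startswith("#"))
    # Skip the first start_val data lines, then take at most max_vals lines.
    stop = None if max_vals is None else start_val + max_vals
    data = []
    for row in islice(rows, start_val, stop):
        fields = row.split("\t")
        data.append([fields[c] for c in columns])
    return data


lines = ["#name\tcount\tlabel\n", "a\t1\tx\n", "b\t2\ty\n", "c\t3\tz\n"]
print(get_column_data(lines, columns=[0, 2], start_val=1, max_vals=1))  # [['b', 'y']]
```

A real provider would also cap the result at its `max_lines_returned` limit (30000 by default) when `max_vals` is not given.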

galaxy.visualization.data_providers.cigar module

Functions for working with SAM/BAM CIGAR representation.

galaxy.visualization.data_providers.cigar.get_ref_based_read_seq_and_cigar(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]

Returns a (new_read_seq, new_cigar) tuple that, together with the reference sequence, can be used to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft-clipped bases are not included). The new CIGAR replaces M operations with = and X operations, because M can denote either a sequence match or a mismatch.
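The sketch below illustrates this transformation on CIGAR strings. It is not Galaxy's implementation: it assumes CIGAR strings such as `3M1I2M` (rather than parsed operations), that `read_start` is the reference position of the first aligned base, and that `ref_seq` covers the aligned region. The name `ref_based_read_seq_and_cigar` is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def _append(ops: List[Tuple[str, int]], op: str, length: int) -> None:
    """Append an operation, merging it with the previous one if the type matches."""
    if ops and ops[-1][0] == op:
        ops[-1] = (op, ops[-1][1] + length)
    else:
        ops.append((op, length))


def ref_based_read_seq_and_cigar(
    read_seq: str, read_start: int, ref_seq: str, ref_seq_start: int, cigar: str
) -> Tuple[str, str]:
    new_seq: List[str] = []
    new_ops: List[Tuple[str, int]] = []
    read_pos = 0
    ref_pos = read_start - ref_seq_start  # offset of the read into ref_seq
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "M":
            # M may be a match or a mismatch: compare base by base.
            for i in range(length):
                base = read_seq[read_pos + i]
                if base == ref_seq[ref_pos + i]:
                    _append(new_ops, "=", 1)
                else:
                    _append(new_ops, "X", 1)
                    new_seq.append(base)  # mismatches cannot be recovered from the reference
            read_pos += length
            ref_pos += length
        elif op in ("I", "X"):
            # Inserted or mismatched bases must be kept in the read sequence.
            new_seq.append(read_seq[read_pos:read_pos + length])
            _append(new_ops, op, length)
            read_pos += length
            if op == "X":
                ref_pos += length
        elif op == "=":
            _append(new_ops, op, length)
            read_pos += length
            ref_pos += length
        elif op in ("D", "N"):
            # Deletions and skips consume reference bases only.
            _append(new_ops, op, length)
            ref_pos += length
        elif op == "S":
            # Soft-clipped bases are dropped from the new sequence.
            _append(new_ops, op, length)
            read_pos += length
        else:
            # H and P consume neither read nor reference bases.
            _append(new_ops, op, length)
    return "".join(new_seq), "".join(f"{n}{op}" for op, n in new_ops)


print(ref_based_read_seq_and_cigar("ACGGTA", 0, "ACGTAC", 0, "3M1I2M"))  # ('G', '3=1I2=')
```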

galaxy.visualization.data_providers.genome module

Data providers for genome visualizations.

galaxy.visualization.data_providers.genome.float_nan(n)[source]

Returns None instead of NaN so that values pass jQuery 1.4’s strict JSON parsing.
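Python's `json` module serializes NaN as the bare token `NaN`, which is not valid JSON, so a strict parser rejects it. A minimal sketch of the conversion (not necessarily Galaxy's exact code):

```python
import json
import math


def float_nan(n):
    """Return None for NaN so it serializes as JSON null; otherwise return n as a float."""
    if math.isnan(n):
        return None
    return float(n)


print(json.dumps([float("nan")]))  # [NaN]  (not valid JSON)
print(json.dumps([float_nan(float("nan")), float_nan(1.5)]))  # [null, 1.5]
```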

galaxy.visualization.data_providers.genome.get_bounds(reads, start_pos_index, end_pos_index)[source]

Returns the minimum and maximum position for a set of reads.
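A sketch of the idea, assuming each read is a sequence whose start and end coordinates sit at the given indices (as in the read format documented for BamDataProvider.process_data). This version raises ValueError on an empty list of reads; how Galaxy handles that case is not shown here.

```python
from typing import Any, Sequence, Tuple


def get_bounds(reads: Sequence[Sequence[Any]], start_pos_index: int, end_pos_index: int) -> Tuple[int, int]:
    """Return (lowest start, highest end) across all reads."""
    low = min(read[start_pos_index] for read in reads)
    high = max(read[end_pos_index] for read in reads)
    return low, high


reads = [["r1", 120, 170], ["r2", 100, 150], ["r3", 140, 210]]
print(get_bounds(reads, 1, 2))  # (100, 210)
```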

class galaxy.visualization.data_providers.genome.FeatureLocationIndexDataProvider(converted_dataset)[source]

Bases: object

Reads/writes/queries feature location index (FLI) datasets.

__init__(converted_dataset)[source]
get_data(query)[source]
class galaxy.visualization.data_providers.genome.GenomeDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: BaseDataProvider

Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
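BED intervals are 0-based and half-open, so a feature at 1-based, closed positions 101 to 150 (as in GFF or wiggle) becomes start=100, end=150, and its length is simply end - start. A small illustration with hypothetical helper names:

```python
from typing import Tuple


def one_based_closed_to_bed(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive coordinates to 0-based half-open (BED) coordinates."""
    return start - 1, end


def bed_to_one_based_closed(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-based half-open (BED) coordinates to 1-based inclusive coordinates."""
    return start + 1, end


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


bed_start, bed_end = one_based_closed_to_bed(101, 150)
print(bed_start, bed_end, bed_end - bed_start)  # 100 150 50
print(overlaps(100, 150, 150, 200))  # False: the intervals only touch
```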

dataset_type: str
col_name_data_attr_mapping: Dict[str | int, Dict] = {}
__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Create basic data provider.

valid_chroms()[source]

Returns chroms/contigs that the dataset contains

has_data(chrom)[source]

Returns True if the dataset has data for the specified chromosome, False otherwise.

open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]

Returns an iterator that provides data in the region chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

get_data(chrom: str, start: str | int, end: str | int, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Returns data in region defined by chrom, start, and end. start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.

Return value must be a dictionary with the keys dataset_type and data.
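The start_val/max_vals contract amounts to skipping the first start_val elements and returning at most max_vals of the rest; the default 9223372036854775807 is `sys.maxsize` on 64-bit platforms. A minimal sketch, assuming the region iterator already yields processed elements; `paginate_region` is a hypothetical helper and the `dataset_type` value is only an example.

```python
import sys
from itertools import islice
from typing import Any, Dict, Iterable


def paginate_region(
    iterator: Iterable[Any], dataset_type: str, start_val: int = 0, max_vals: int = sys.maxsize
) -> Dict[str, Any]:
    """Return up to max_vals elements, starting at index start_val, in the documented dict shape."""
    # Skip first, then limit: start_val + max_vals could exceed islice's sys.maxsize bound.
    data = list(islice(islice(iterator, start_val, None), max_vals))
    return {"dataset_type": dataset_type, "data": data}


features = (f"feature{i}" for i in range(10))
print(paginate_region(features, "interval_index", start_val=4, max_vals=3))
```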

get_genome_data(chroms_info, **kwargs)[source]

Returns data for complete genome.

get_filters()[source]

Returns filters for provider’s data. Return value is a list of filters; each filter is a dictionary with the keys ‘name’, ‘index’, ‘type’. NOTE: This method uses the original dataset’s datatype and metadata to create the filters.

get_default_max_vals()[source]
class galaxy.visualization.data_providers.genome.FilterableMixin[source]

Bases: object

original_dataset: DatasetInstance
get_filters()[source]

Returns a dataset’s filters.

class galaxy.visualization.data_providers.genome.TabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider, FilterableMixin

Tabix index data provider for the Galaxy track browser.

dataset_type: str = 'tabix'
col_name_data_attr_mapping: Dict[str | int, Dict] = {4: {'index': 4, 'name': 'Score'}}
open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]

Returns an iterator that provides data in the region chrom:start-end

class galaxy.visualization.data_providers.genome.IntervalDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

Processes interval data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]

dataset_type: str = 'interval_index'
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Processes data from the iterator into the payload format described above.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.IntervalTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider, IntervalDataProvider

Provides data from an interval file indexed via tabix.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

Processes BED data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
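As an illustration of this payload format, the sketch below converts one BED12 line into the list, using the line's file offset as the uid and computing blocks as absolute (start, end) pairs from the blockSizes and blockStarts columns. `bed_line_to_payload` is hypothetical, and the handling of shorter BED lines is an assumption: here the trailing fields are simply omitted.

```python
from typing import List, Tuple, Union

Payload = List[Union[int, str, List[Tuple[int, int]]]]


def bed_line_to_payload(line: str, offset: int) -> Payload:
    fields = line.rstrip("\n").split("\t")
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else ""
    strand = fields[5] if len(fields) > 5 else "."
    payload: Payload = [offset, chrom_start, chrom_end, name, strand]
    if len(fields) >= 12:
        thick_start, thick_end = int(fields[6]), int(fields[7])
        # blockSizes/blockStarts are comma-separated, often with a trailing comma;
        # blockStarts are relative to chromStart.
        sizes = [int(s) for s in fields[10].rstrip(",").split(",")]
        starts = [int(s) for s in fields[11].rstrip(",").split(",")]
        blocks = [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
        payload += [thick_start, thick_end, blocks]
    return payload


line = "chr1\t100\t500\tgeneA\t0\t+\t150\t450\t0\t2\t100,50,\t0,350,\n"
print(bed_line_to_payload(line, offset=0))
# [0, 100, 500, 'geneA', '+', 150, 450, [(100, 200), (450, 500)]]
```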

dataset_type: str = 'interval_index'
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Processes data from the iterator into the payload format described above.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BedTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider, BedDataProvider

Provides data from a BED file indexed via tabix.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.RawBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: BedDataProvider

Provides data from a BED file.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.

get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end
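Without an index, the only option is a linear scan of the whole file, which is why this provider is slow on large datasets. A sketch of such a scan, assuming tab-separated BED lines and the half-open overlap test that BED coordinates call for; `scan_bed_region` is hypothetical.

```python
from typing import Iterable, Iterator


def scan_bed_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield BED lines on chrom that overlap the half-open region [start, end)."""
    for line in lines:
        # Assumption: header/comment lines start with '#', 'track' or 'browser'.
        if line.startswith(("#", "track", "browser")) or not line.strip():
            continue
        fields = line.split("\t")
        if fields[0] != chrom:
            continue
        feature_start, feature_end = int(fields[1]), int(fields[2])
        if feature_start < end and start < feature_end:
            yield line


bed = ["track name=demo\n", "chr1\t10\t20\tA\n", "chr1\t20\t30\tB\n", "chr2\t10\t20\tC\n"]
print(list(scan_bed_region(bed, "chr1", 15, 20)))  # ['chr1\t10\t20\tA\n']
```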

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.VcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

Abstract class that processes VCF data from native format to payload format.

Payload format: An array of entries for each locus in the file. Each array has the following entries:

  1. GUID (unused)

  2. location (0-based)

  3. reference base(s)

  4. alternative base(s)

  5. quality score

  6. whether variant passed filter

  7. sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype

  8. allele counts for each alternative
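A sketch of how one VCF data line could map onto these eight entries. It is a hypothetical illustration, not Galaxy's parser, and it assumes: the GUID is supplied by the caller, a GT field is present in FORMAT, a FILTER of `PASS` or `.` counts as passed, and an all-reference genotype is encoded as an empty string.

```python
from typing import List, Optional, Union


def vcf_line_to_payload(line: str, guid: int) -> List[Union[int, str, Optional[float], bool, List[int]]]:
    fields = line.rstrip("\n").split("\t")
    location = int(fields[1]) - 1  # VCF POS is 1-based; the payload uses 0-based positions
    ref, alts = fields[3], fields[4].split(",")
    qual = None if fields[5] == "." else float(fields[5])
    passed = fields[6] in ("PASS", ".")  # assumption: a missing filter counts as passed
    genotypes: List[str] = []
    allele_counts = [0] * len(alts)
    if len(fields) > 9:
        gt_index = fields[8].split(":").index("GT")  # assumes a GT field is present
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            for allele in alleles:
                if allele not in (".", "0"):
                    allele_counts[int(allele) - 1] += 1  # allele 1 is the first alternative
            # An all-reference genotype is encoded as an empty string.
            genotypes.append("" if all(a == "0" for a in alleles) else gt)
    return [guid, location, ref, ",".join(alts), qual, passed, ",".join(genotypes), allele_counts]


line = "chr1\t101\t.\tA\tG,T\t50\tPASS\t.\tGT:DP\t0/0:10\t0/1:12\t2|2:8"
print(vcf_line_to_payload(line, guid=7))
# [7, 100, 'A', 'G,T', 50.0, True, ',0/1,2|2', [1, 2]]
```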

col_name_data_attr_mapping: Dict[str | int, Dict] = {'Qual': {'index': 6, 'name': 'Qual'}}
dataset_type: str = 'variant'
process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Returns a dict with the following attributes:

data - a list of variants with the format

    [<guid>, <start>, <end>, <name>, cigar, seq]

message - error/informative message
class galaxy.visualization.data_providers.genome.VcfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider, VcfDataProvider

Provides data from a VCF file indexed via tabix.

dataset_type: str = 'variant'
original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.RawVcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: VcfDataProvider

Provides data from a VCF file.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.

open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider, FilterableMixin

Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.

dataset_type: str = 'bai'
get_filters()[source]

Returns filters for dataset.

open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]

Returns an iterator that provides data in the region chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]

Returns a dict with the following attributes:

data - a list of reads with the format
    [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>]

    where <read_1> has the format
        [<start>, <end>, <cigar>, <strand>, <read_seq>]

    and <read_2> has the format
        [<start>, <end>, <cigar>, <strand>, <read_seq>]

    Field 7 is left empty so that the mapq scores occupy the same position as in single-end reads.
    For single-end reads, read has format:
        [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]

    NOTE: read end and sequence data are not valid for reads outside of
    requested region and should not be used.

max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.SamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None)[source]

Bases: BamDataProvider

dataset_type: str = 'bai'
__init__(converted_dataset=None, original_dataset=None, dependencies=None)[source]

Create SamDataProvider.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BBIDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

BBI data provider for the Galaxy track browser.

dataset_type: str = 'bigwig'
valid_chroms()[source]

Returns chroms/contigs that the dataset contains

has_data(chrom)[source]

Returns True if the dataset has data for the specified chromosome, False otherwise.

get_data(chrom: str, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Returns data in region defined by chrom, start, and end. start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.

Return value must be a dictionary with the keys dataset_type and data.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BigBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: BBIDataProvider

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.BigWigDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: BBIDataProvider

Provides data from BigWig files; position data is reported in 1-based coordinate system, i.e. wiggle format.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.IntervalIndexDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider, FilterableMixin

Provides data from interval index files, which are used for GFF and Pileup datasets.

col_name_data_attr_mapping: Dict[str | int, Dict] = {4: {'index': 4, 'name': 'Score'}}
dataset_type: str = 'interval_index'
open_data_file()[source]

Open data file for reading data.

get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]

Returns an iterator for data in data_file in chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.RawGFFDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

Provides data from a GFF file that has not been indexed.

NOTE: this data provider does not use indices, and hence will be very slow for large datasets.

dataset_type: str = 'interval_index'
get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end as well as a file offset.
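A sketch of an unindexed scan that yields each overlapping GFF line together with its byte offset in the file, converting GFF's 1-based, closed coordinates to BED's half-open ones before the overlap test. The (offset, line) tuple shape and the name `scan_gff_region` are assumptions, not the provider's actual return type.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def scan_gff_region(data_file: BinaryIO, chrom: str, start: int, end: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) for GFF features on chrom overlapping the BED region [start, end)."""
    offset = 0
    for raw in data_file:
        line_offset = offset
        offset += len(raw)  # raw lines are bytes, so len() is the size in bytes
        line = raw.decode("utf-8")
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        # GFF columns 4 and 5 are 1-based and inclusive; convert to half-open BED.
        feature_start, feature_end = int(fields[3]) - 1, int(fields[4])
        if feature_start < end and start < feature_end:
            yield line_offset, line


gff = (
    b"##gff-version 3\n"
    b"chr1\tsrc\tgene\t11\t20\t.\t+\t.\tID=g1\n"
    b"chr1\tsrc\tgene\t31\t40\t.\t+\t.\tID=g2\n"
)
for off, gff_line in scan_gff_region(io.BytesIO(gff), "chr1", 25, 35):
    print(off, gff_line.split("\t")[8].strip())  # 48 ID=g2
```

Recording the offset lets a client refer back to a feature's line in the original file without rescanning it.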

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.GtfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider

Returns data from GTF datasets that are indexed via tabix.

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Process data from an iterator to a format that can be provided to client.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

Abstract class that processes ENCODEPeak data from native format to payload format.

Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]

get_iterator(data_file, chrom, start, end, **kwargs)[source]

Returns an iterator that provides data in the region chrom:start-end

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Processes data from the iterator into the payload format described above.

dataset_type: str
original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.ENCODEPeakTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider, ENCODEPeakDataProvider

Provides data from an ENCODEPeak dataset indexed via tabix.

get_filters()[source]

Returns filters for dataset.

original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: GenomeDataProvider

process_data(iterator, start_val=0, max_vals=9223372036854775807, **kwargs)[source]

Processes data from an iterator into a format that can be provided to the client.

get_default_max_vals()[source]
dataset_type: str
original_dataset: DatasetInstance
class galaxy.visualization.data_providers.genome.ChromatinInteractionsTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]

Bases: TabixDataProvider, ChromatinInteractionsDataProvider

get_iterator(data_file, chrom, start=0, end=9223372036854775807, interchromosomal=False, **kwargs) → Iterator[str][source]
original_dataset: DatasetInstance
galaxy.visualization.data_providers.genome.package_gff_feature(feature, no_detail=False, filter_cols=None) → List[str | int | float | List[Tuple[int, int]] | None][source]

Package a GFF feature in an array for data providers.

galaxy.visualization.data_providers.registry module

class galaxy.visualization.data_providers.registry.DataProviderRegistry[source]

Bases: object

Registry for data providers that enables listing and lookup.

__init__()[source]
get_data_provider(trans, name=None, source='data', raw=False, original_dataset=None)[source]

Returns data provider matching parameter values. For standalone data sources, source parameter is ignored.
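A toy registry illustrating the listing-and-lookup idea. It is a hypothetical sketch, not Galaxy's DataProviderRegistry: the class names are placeholders, lookup is by name only, and the optional raw flag selects a non-indexed variant, loosely mirroring the raw parameter in the signature above.

```python
from typing import Dict, List, Optional, Type


class Provider:
    """Placeholder base class standing in for a real data provider."""


class IndexedBedProvider(Provider):
    pass


class RawBedProvider(Provider):
    pass


class ToyProviderRegistry:
    def __init__(self) -> None:
        # name -> {"indexed": class, "raw": class}
        self._providers: Dict[str, Dict[str, Type[Provider]]] = {}

    def register(self, name: str, indexed: Type[Provider], raw: Optional[Type[Provider]] = None) -> None:
        # Fall back to the indexed class when no raw variant exists.
        self._providers[name] = {"indexed": indexed, "raw": raw or indexed}

    def list_providers(self) -> List[str]:
        return sorted(self._providers)

    def get_data_provider(self, name: str, raw: bool = False) -> Provider:
        entry = self._providers[name]  # raises KeyError for unknown names
        return entry["raw" if raw else "indexed"]()


registry = ToyProviderRegistry()
registry.register("bed", IndexedBedProvider, raw=RawBedProvider)
print(registry.list_providers())  # ['bed']
print(type(registry.get_data_provider("bed", raw=True)).__name__)  # RawBedProvider
```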