galaxy.visualization.data_providers package¶
Galaxy visualization/visual analysis data providers.
Submodules¶
galaxy.visualization.data_providers.basic module¶
- class galaxy.visualization.data_providers.basic.BaseDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶
Bases:
object
Base class for data providers. Data providers both read and package data from datasets and write subsets of data to new datasets.
- __init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶
Create basic data provider.
- original_dataset: galaxy.model.DatasetInstance¶
- has_data(**kwargs)[source]¶
Returns true if dataset has data in the specified genome window, false otherwise.
- get_iterator(data_file, chrom, start, end, **kwargs) Iterator[str] [source]¶
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶
Process data from an iterator to a format that can be provided to client.
- get_data(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶
Returns data as specified by kwargs. start_val is the index of the first element to return and max_vals is the maximum number of values to return.
The return value must be a dictionary with the following attributes: dataset_type, data.
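The start_val/max_vals windowing described above can be illustrated with a minimal, standalone sketch. It does not use Galaxy's classes: process_lines and the "toy" dataset type are hypothetical names chosen for this example, and the actual providers may package data differently.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Optional


def process_lines(
    iterator: Iterable[str],
    start_val: int = 0,
    max_vals: Optional[int] = None,
) -> Dict[str, Any]:
    """Skip the first start_val items, then keep at most max_vals items.

    Hypothetical helper: it only mirrors the documented contract that the
    result is a dictionary with dataset_type and data entries.
    """
    stop = None if max_vals is None else start_val + max_vals
    data = list(islice(iterator, start_val, stop))
    return {"dataset_type": "toy", "data": data}


lines = [f"line{i}" for i in range(10)]
result = process_lines(iter(lines), start_val=2, max_vals=3)
print(result["data"])
```

With max_vals left as None, the sketch returns everything from start_val onward.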
- class galaxy.visualization.data_providers.basic.ColumnDataProvider(original_dataset, max_lines_returned=30000)[source]¶
Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Data provider for columnar data
- MAX_LINES_RETURNED = 30000¶
- original_dataset: galaxy.model.DatasetInstance¶
galaxy.visualization.data_providers.cigar module¶
Functions for working with SAM/BAM CIGAR representation.
- galaxy.visualization.data_providers.cigar.get_ref_based_read_seq_and_cigar(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]¶
Returns a (new_read_seq, new_cigar) tuple that can be used with the reference sequence to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft-clipped bases are not included). The new CIGAR replaces each M with = and X operations because the M operation can denote either a sequence match or a mismatch.
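The idea of splitting M into = and X can be sketched as follows. This is a simplified illustration, not Galaxy's implementation: split_match_ops is a hypothetical name, the CIGAR is assumed to be a list of (operation letter, length) pairs, and the read and reference are assumed to start at the same position, so the read_start and ref_seq_start offsets of the real function are left out.

```python
from typing import List, Tuple

Cigar = List[Tuple[str, int]]


def split_match_ops(read_seq: str, ref_seq: str, cigar: Cigar) -> Tuple[str, Cigar]:
    """Rewrite M operations as = / X and keep only unrecoverable read bases."""
    kept_bases: List[str] = []
    new_cigar: Cigar = []
    read_pos = 0
    ref_pos = 0

    def push(op: str, length: int) -> None:
        # Merge with the previous operation when it has the same type.
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    for op, length in cigar:
        if op == "M":
            for _ in range(length):
                if read_seq[read_pos] == ref_seq[ref_pos]:
                    push("=", 1)
                else:
                    push("X", 1)
                    kept_bases.append(read_seq[read_pos])  # mismatch base
                read_pos += 1
                ref_pos += 1
        elif op == "I":
            kept_bases.append(read_seq[read_pos:read_pos + length])  # inserted bases
            push("I", length)
            read_pos += length
        elif op == "D":
            push("D", length)
            ref_pos += length
        elif op == "S":
            push("S", length)  # soft-clipped bases are skipped, not kept
            read_pos += length
        else:
            raise ValueError(f"operation {op!r} is not handled in this sketch")

    return "".join(kept_bases), new_cigar


new_seq, new_cigar = split_match_ops("ACCTGGAC", "ACGTAC", [("M", 4), ("I", 2), ("M", 2)])
print(new_seq, new_cigar)
```

In this toy input the third read base mismatches the reference and two bases are inserted, so only those three bases need to be stored alongside the new CIGAR.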
galaxy.visualization.data_providers.genome module¶
Data providers for genome visualizations.
- galaxy.visualization.data_providers.genome.float_nan(n)[source]¶
Return None instead of NaN so that the value passes jQuery 1.4’s strict JSON parsing.
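The motivation is that Python's json module serializes NaN as the bare token NaN, which is not valid JSON, whereas None becomes null. A minimal sketch of the behaviour, using a hypothetical nan_to_none helper rather than the Galaxy function itself:

```python
import json
import math
from typing import Optional


def nan_to_none(n) -> Optional[float]:
    """Return None for NaN input, otherwise the value as a float."""
    value = float(n)
    return None if math.isnan(value) else value


raw = json.dumps([1.5, float("nan")])
cleaned = json.dumps([nan_to_none(1.5), nan_to_none(float("nan"))])
print(raw)
print(cleaned)
```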
- galaxy.visualization.data_providers.genome.get_bounds(reads, start_pos_index, end_pos_index)[source]¶
Returns the minimum and maximum position for a set of reads.
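As an illustration of the description above, a function of this shape could look like the following sketch. read_bounds is a hypothetical stand-in, the reads are assumed to be indexable records, and raising on empty input is a choice made for this example that may differ from the actual function.

```python
from typing import Sequence, Tuple


def read_bounds(
    reads: Sequence[Sequence],
    start_pos_index: int,
    end_pos_index: int,
) -> Tuple[int, int]:
    """Return (lowest start, highest end) over all reads."""
    if not reads:
        raise ValueError("reads must not be empty")
    low = min(read[start_pos_index] for read in reads)
    high = max(read[end_pos_index] for read in reads)
    return low, high


# Toy records: [name, start, end]
reads = [["r1", 120, 170], ["r2", 100, 150], ["r3", 140, 190]]
bounds = read_bounds(reads, 1, 2)
print(bounds)
```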
- class galaxy.visualization.data_providers.genome.FeatureLocationIndexDataProvider(converted_dataset)[source]¶
Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Reads/writes/queries feature location index (FLI) datasets.
- get_data(query)[source]¶
Returns data for the given query against the feature location index.
The return value must be a dictionary with the following attributes: dataset_type, data.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.GenomeDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
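For readers used to 1-based, inclusive coordinates (as in GFF or VCF), the BED convention can be sketched with a few helpers. The function names are hypothetical and are not part of Galaxy.

```python
from typing import Tuple


def one_based_to_bed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive interval to 0-based, half-open."""
    return start - 1, end


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to 1-based, inclusive."""
    return start + 1, end


def bed_overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals overlap only if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


first_hundred = one_based_to_bed(1, 100)
print(first_hundred, first_hundred[1] - first_hundred[0])
```

One convenience of the half-open form is that the interval length is simply end minus start, and adjacent intervals such as 0-100 and 100-200 do not overlap.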
- __init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Create basic data provider.
- write_data_to_file(regions, filename)[source]¶
Write data in the given regions, each defined by chrom, start, and end, to the file filename.
- has_data(chrom, start, end, **kwargs)[source]¶
Returns true if dataset has data in the specified genome window, false otherwise.
- get_iterator(data_file, chrom, start, end, **kwargs) Iterator[str] [source]¶
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶
Process data from an iterator to a format that can be provided to client.
- get_data(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶
Returns data in the region defined by chrom, low, and high. start_val and max_vals denote the data to return: start_val is the index of the first element to return and max_vals is the maximum number of values to return.
The return value must be a dictionary with the following attributes: dataset_type, data.
- class galaxy.visualization.data_providers.genome.FilterableMixin[source]¶
Bases:
object
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.TabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Tabix index data provider for the Galaxy track browser.
- class galaxy.visualization.data_providers.genome.IntervalDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes interval data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end
- write_data_to_file(regions, filename)[source]¶
Write data in the given regions, each defined by chrom, start, and end, to the file filename.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.IntervalTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.IntervalDataProvider
Provides data from a BED file indexed via tabix.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes BED data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end
- write_data_to_file(regions, filename)[source]¶
Write data in the given regions, each defined by chrom, start, and end, to the file filename.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BedTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file indexed via tabix.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.RawBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.BedDataProvider
Provide data from BED file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
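To see why an unindexed provider is slow, consider a sketch of a region query done as a plain linear scan. iter_bed_region is a hypothetical function, not Galaxy's get_iterator; it assumes tab-separated BED lines and skips comment, track, and browser lines.

```python
from typing import Iterable, Iterator


def iter_bed_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield BED lines overlapping chrom:start-end (0-based, half-open).

    Every line is inspected, so the cost grows with the size of the whole
    file rather than with the size of the requested region.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        feature_start, feature_end = int(fields[1]), int(fields[2])
        if feature_start < end and feature_end > start:
            yield line


bed_lines = [
    "track name=toy\n",
    "chr1\t50\t100\tf1\n",
    "chr1\t90\t150\tf2\n",
    "chr2\t90\t150\tf3\n",
    "chr1\t200\t300\tf4\n",
]
hits = [line.split("\t")[3].strip() for line in iter_bed_region(bed_lines, "chr1", 100, 200)]
print(hits)
```

An index such as tabix avoids this by seeking directly to the part of the file that covers the requested region.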
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.VcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes VCF data from native format to payload format.
Payload format: An array of entries for each locus in the file. Each array has the following entries:
GUID (unused)
location (0-based)
reference base(s)
alternative base(s)
quality score
whether variant passed filter
sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype
allele counts for each alternative
- class galaxy.visualization.data_providers.genome.VcfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file indexed via tabix.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.RawVcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.VcfDataProvider
Provide data from VCF file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.
- get_iterator(data_file, chrom, start, end, **kwargs) Iterator[str] [source]¶
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]¶
Returns a dict with the following attributes:
data - a list of reads with the format [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>], where <read_1> has the format [<start>, <end>, <cigar>, <strand>, <read_seq>] and <read_2> has the format [<start>, <end>, <cigar>, <strand>, <read_seq>]. Field 7 is empty so that the location of the mapq scores matches that in single-end reads. For single-end reads, a read has the format [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]. NOTE: read end and sequence data are not valid for reads outside of the requested region and should not be used.
max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
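The two read layouts can be compared with a small sketch. All values below are made-up placeholders (including the string CIGARs and GUIDs), and mapq_of and mate_spans are hypothetical helpers; the sketch only illustrates that the mapq field sits at the same position in both layouts.

```python
from typing import Any, List, Tuple

MAPQ_INDEX = 7  # eighth field in both the paired and the single-end layout


def mapq_of(read: List[Any]) -> Any:
    """Return the mapq field, which has the same index in both layouts."""
    return read[MAPQ_INDEX]


def mate_spans(paired_read: List[Any]) -> List[Tuple[int, int]]:
    """Return (start, end) for <read_1> and <read_2> of a paired entry."""
    return [(mate[0], mate[1]) for mate in (paired_read[4], paired_read[5])]


# Placeholder entries following the layouts described above.
single = ["guid-1", 100, 104, "readA", "4M", "+", "ACGT", 60]
paired = [
    "guid-2", 100, 304, "readB",
    [100, 104, "4M", "+", "ACGT"],
    [300, 304, "4M", "-", "TTGA"],
    [],
    [60, 58],
]
print(mapq_of(single), mapq_of(paired), mate_spans(paired))
```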
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.SamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None)[source]¶
Bases:
galaxy.visualization.data_providers.genome.BamDataProvider
- __init__(converted_dataset=None, original_dataset=None, dependencies=None)[source]¶
Create SamDataProvider.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BBIDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
BBI data provider for the Galaxy track browser.
- has_data(chrom)[source]¶
Returns true if the dataset has data for the specified chromosome, false otherwise.
- get_data(chrom, start, end, start_val=0, max_vals=None, num_samples=1000, **kwargs)[source]¶
Returns data in the region defined by chrom, start, and end. start_val and max_vals denote the data to return: start_val is the index of the first element to return and max_vals is the maximum number of values to return.
The return value must be a dictionary with the following attributes: dataset_type, data.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BigBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.BigWigDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
Provides data from BigWig files; position data is reported in 1-based coordinate system, i.e. wiggle format.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.IntervalIndexDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Data provider for interval index files, which are used for GFF and pileup files.
- write_data_to_file(regions, filename)[source]¶
Write data in the given regions, each defined by chrom, start, and end, to the file filename.
- get_iterator(data_file, chrom, start, end, **kwargs) Iterator[str] [source]¶
Returns an iterator for data in data_file in chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶
Process data from an iterator to a format that can be provided to client.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.RawGFFDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Provide data from GFF file that has not been indexed.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end as well as a file offset.
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶
Process data from an iterator to a format that can be provided to client.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.GtfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
Returns data from GTF datasets that are indexed via tabix.
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶
Process data from an iterator to a format that can be provided to client.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes ENCODEPeak data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]¶
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.ENCODEPeakTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider
Provides data from an ENCODEPeak dataset indexed via tabix.
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
- original_dataset: galaxy.model.DatasetInstance¶
- class galaxy.visualization.data_providers.genome.ChromatinInteractionsTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider
- get_iterator(data_file, chrom, start=0, end=9223372036854775807, interchromosomal=False, **kwargs) Iterator[str] [source]¶
- original_dataset: galaxy.model.DatasetInstance¶