Warning: This document is for an old release of Galaxy.
galaxy.visualization.data_providers package¶
Galaxy visualization/visual analysis data providers.
Submodules¶
galaxy.visualization.data_providers.basic module¶
-
class
galaxy.visualization.data_providers.basic.
BaseDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Bases:
object
Base class for data providers. Data providers both:
read and package data from datasets
write subsets of data to new datasets
-
__init__
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Create basic data provider.
-
has_data
(**kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator
(**kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data
(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs) Returns data as specified by kwargs. start_val is the index of the first element to return and max_vals is the maximum number of values to return; the default, 9223372036854775807, equals sys.maxsize on 64-bit platforms, so it is effectively unlimited.
- Return value must be a dictionary with the following keys:
dataset_type, data
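The start_val / max_vals contract described above can be sketched with itertools.islice. This is a minimal illustration, not Galaxy's implementation: the function name get_data_sketch, the dataset_type value "example", and the extra message key are assumptions made for the example.

```python
import sys
from itertools import islice

# Same wording as the default error_max_vals shown in the signatures above.
ERROR_MAX_VALS = "Only the first %i values are returned."

_MISSING = object()


def get_data_sketch(rows, start_val=0, max_vals=sys.maxsize, dataset_type="example"):
    """Return at most max_vals rows, starting at index start_val (hypothetical helper)."""
    # Skip the first start_val rows, then take up to max_vals of the rest.
    remaining = islice(iter(rows), start_val, None)
    data = list(islice(remaining, max_vals))
    # If anything is left over, the result was truncated.
    truncated = next(remaining, _MISSING) is not _MISSING
    message = ERROR_MAX_VALS % max_vals if truncated else None
    # "message" is an assumed extra key; only dataset_type and data are required.
    return {"dataset_type": dataset_type, "data": data, "message": message}


result = get_data_sketch(range(10), start_val=2, max_vals=3)
print(result["data"])
print(result["message"])
```

Skipping and taking in two separate islice calls avoids adding start_val to a max_vals that may already be sys.maxsize.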
-
class
galaxy.visualization.data_providers.basic.
ColumnDataProvider
(original_dataset, max_lines_returned=30000)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Data provider for columnar data
-
MAX_LINES_RETURNED
= 30000¶
-
galaxy.visualization.data_providers.cigar module¶
Functions for working with SAM/BAM CIGAR representation.
-
galaxy.visualization.data_providers.cigar.
get_ref_based_read_seq_and_cigar
(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]¶ Returns a ( new_read_seq, new_cigar ) that can be used with reference sequence to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft clipped bases are not included). The new cigar replaces Ms with =s and Xs because the M operation can denote a sequence match or mismatch.
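The idea of replacing M with = and X can be sketched as follows. This is a simplified illustration rather than Galaxy's code: the function name is hypothetical, the CIGAR is given as (operation letter, length) tuples, and ref_seq is assumed to begin at the read's first aligned base, so the read_start and ref_seq_start offsets are ignored.

```python
def ref_based_read_sketch(read_seq, ref_seq, cigar):
    """Return (new_read_seq, new_cigar) with M split into = and X (hypothetical helper)."""
    new_seq = []
    new_cigar = []
    read_pos = 0
    ref_pos = 0

    def push(op, length):
        # Merge with the previous operation when it is the same kind.
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    for op, length in cigar:
        if op == "M":
            # Compare base by base; keep only mismatching read bases.
            for i in range(length):
                read_base = read_seq[read_pos + i]
                if read_base == ref_seq[ref_pos + i]:
                    push("=", 1)
                else:
                    push("X", 1)
                    new_seq.append(read_base)
            read_pos += length
            ref_pos += length
        elif op == "I":
            # Inserted bases cannot be recovered from the reference.
            new_seq.append(read_seq[read_pos:read_pos + length])
            push("I", length)
            read_pos += length
        elif op == "D":
            push("D", length)
            ref_pos += length
        elif op == "S":
            # Soft-clipped bases consume the read but are not kept.
            push("S", length)
            read_pos += length
        else:
            raise ValueError("operation not handled in this sketch: %r" % op)
    return "".join(new_seq), new_cigar


seq, cigar = ref_based_read_sketch("ACTTGAC", "ACGTAC", [("M", 4), ("I", 1), ("M", 2)])
print(seq, cigar)
```

Here the read differs from the reference at one aligned base and carries one inserted base, so only those two bases remain in the new read sequence.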
galaxy.visualization.data_providers.genome module¶
Data providers for genome visualizations.
-
galaxy.visualization.data_providers.genome.
float_nan
(n) Return None instead of NaN so that the result passes jQuery 1.4's strict JSON parsing.
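A minimal sketch of this behavior, assuming the input can be converted with float(); the name float_nan_sketch is hypothetical and the body is not copied from Galaxy. Python's json module serializes NaN as the bare token NaN, which is not valid JSON, whereas None becomes null.

```python
import json
import math


def float_nan_sketch(n):
    """Return float(n), or None when the value is NaN (hypothetical helper)."""
    value = float(n)
    return None if math.isnan(value) else value


values = [float_nan_sketch(x) for x in (1.5, float("nan"), "2")]
print(json.dumps(values))
```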
-
galaxy.visualization.data_providers.genome.
get_bounds
(reads, start_pos_index, end_pos_index)[source]¶ Returns the minimum and maximum position for a set of reads.
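One way to compute such bounds, assuming each read is an indexable sequence and that the two index arguments select its start and end positions. The name get_bounds_sketch is hypothetical and the handling of an empty list is a choice made for this example only.

```python
def get_bounds_sketch(reads, start_pos_index, end_pos_index):
    """Return (lowest start, highest end) over all reads (hypothetical helper)."""
    if not reads:
        # Assumption for this sketch: no reads means no bounds.
        return None, None
    lowest = min(read[start_pos_index] for read in reads)
    highest = max(read[end_pos_index] for read in reads)
    return lowest, highest


reads = [["r1", 10, 60], ["r2", 5, 40], ["r3", 30, 90]]
print(get_bounds_sketch(reads, 1, 2))
```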
-
class
galaxy.visualization.data_providers.genome.
FeatureLocationIndexDataProvider
(converted_dataset)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Reads/writes/queries feature location index (FLI) datasets.
-
class
galaxy.visualization.data_providers.genome.
GenomeDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
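For readers less familiar with the convention, the following sketch shows what 0-based, half-open coordinates imply when converting from 1-based, inclusive positions and when testing for overlap. The helper names are hypothetical and are not part of Galaxy.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open."""
    # Only the start shifts; the inclusive end equals the exclusive end.
    return start - 1, end


def bed_length(start, end):
    """Length of a 0-based, half-open interval."""
    return end - start


def bed_overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


start, end = one_based_to_bed(1, 100)
print(start, end, bed_length(start, end))
print(bed_overlaps(0, 100, 100, 200))
```

Because the end is exclusive, the intervals 0-100 and 100-200 are adjacent rather than overlapping.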
-
dataset_type
: str
The following description refers to col_name_data_attr_mapping: a mapping from column name to payload data; this mapping is used to create filters. The key is the column name; the value is a dict with the mandatory key 'index' and the optional key 'name'. For example, this defines column 4:
col_name_data_attr_mapping = {4: {'index': 5, 'name': 'Score'}}
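To make the shape of this mapping concrete, here is a small sketch that uses it to pull filterable values out of a payload row. The helper filter_values and the sample row are assumptions for illustration; how Galaxy actually builds filters from the mapping is not shown on this page.

```python
# Column 4 is stored at payload index 5 and displayed as "Score".
col_name_data_attr_mapping = {4: {"index": 5, "name": "Score"}}


def filter_values(payload_row, mapping):
    """Return {display name: value} for mapped columns present in the row (hypothetical)."""
    values = {}
    for col_name, attrs in mapping.items():
        index = attrs["index"]                   # mandatory key
        name = attrs.get("name", str(col_name))  # optional key
        if index < len(payload_row):
            values[name] = payload_row[index]
    return values


row = [0, 100, 200, "feature_1", "+", 37.5]  # made-up payload row
print(filter_values(row, col_name_data_attr_mapping))
```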
-
__init__
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Create basic data provider.
-
write_data_to_file
(regions, filename) Write data in the given regions, each defined by chrom, start, and end, to a file.
-
has_data
(chrom, start, end, **kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator
(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data
(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs) Returns data in the region defined by chrom, low, and high (the region's start and end). start_val and max_vals denote the data to return: start_val is the index of the first element to return and max_vals is the maximum number of values to return.
- Return value must be a dictionary with the following keys:
dataset_type, data
-
-
class
galaxy.visualization.data_providers.genome.
TabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
-
class
galaxy.visualization.data_providers.genome.
IntervalDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
-
dataset_type
: str = 'interval_index'
Processes interval data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
-
class
galaxy.visualization.data_providers.genome.
IntervalTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.IntervalDataProvider
Provides data from a BED file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
BedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes BED data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
dataset_type
: str = 'interval_index'
-
-
class
galaxy.visualization.data_providers.genome.
BedTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
RawBedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
class
galaxy.visualization.data_providers.genome.
VcfDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes VCF data from native format to payload format.
Payload format: An array of entries for each locus in the file. Each array has the following entries:
GUID (unused)
location (0-based)
reference base(s)
alternative base(s)
quality score
whether variant passed filter
sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype
allele counts for each alternative
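The first six payload entries listed above can be sketched from a single VCF data line. This is an illustration only: the function name is hypothetical, the pass/fail entry is assumed to be a boolean derived from the FILTER column, and the genotype and allele-count entries are omitted because their exact encoding is not specified here.

```python
def vcf_line_to_payload_sketch(line, guid=None):
    """Build a partial payload entry from one VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, filt = fields[:7]
    return [
        guid,                                  # GUID (unused)
        int(pos) - 1,                          # VCF POS is 1-based; payload is 0-based
        ref,                                   # reference base(s)
        alt,                                   # alternative base(s)
        None if qual == "." else float(qual),  # quality score ("." means missing)
        filt == "PASS",                        # assumed boolean encoding
    ]


print(vcf_line_to_payload_sketch("chr1\t101\trs1\tA\tG\t50\tPASS\tDP=10"))
```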
-
dataset_type
: str = 'variant'
-
class
galaxy.visualization.data_providers.genome.
VcfTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
RawVcfDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
class
galaxy.visualization.data_providers.genome.
BamDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.
-
dataset_type
: str = 'bai'
-
get_iterator
(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs) Returns a dict with the following attributes:
data - a list of reads. Paired reads have the format [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>], where <read_1> and <read_2> each have the format [<start>, <end>, <cigar>, <strand>, <read_seq>]. Field 7 is empty so that the location of the mapq scores matches that in single-end reads. Single-end reads have the format [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]. NOTE: read end and sequence data are not valid for reads outside of the requested region and should not be used.
max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
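The point of the empty seventh field is that a client can read the shared fields from the same positions in both layouts. A small sketch, with made-up reads and placeholder CIGAR and strand values; read_summary is a hypothetical helper, not a Galaxy function.

```python
MAPQ_INDEX = 7  # eighth entry in both the paired and the single-end layout


def read_summary(read):
    """Return (name, start, end, mapq) using only positions shared by both layouts."""
    return read[3], read[1], read[2], read[MAPQ_INDEX]


# Placeholder values throughout; only the positions matter here.
paired = ["g1", 100, 250, "readA",
          [100, 150, "50M", "+", "ACGT"],
          [200, 250, "50M", "-", "TTGA"],
          [], [60, 58]]
single = ["g2", 300, 350, "readB", "50M", "+", "ACGT", 42]

for read in (paired, single):
    print(read_summary(read))
```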
-
-
class
galaxy.visualization.data_providers.genome.
SamDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None)[source]¶ Bases:
galaxy.visualization.data_providers.genome.BamDataProvider
-
class
galaxy.visualization.data_providers.genome.
BBIDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
BBI data provider for the Galaxy track browser.
-
dataset_type
: str = 'bigwig'
-
has_data
(chrom) Returns true if the dataset has data for the specified chromosome, false otherwise.
-
get_data
(chrom, start, end, start_val=0, max_vals=None, num_samples=1000, **kwargs)[source]¶ Returns data in region defined by chrom, start, and end. start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
dataset_type, data
-
-
class
galaxy.visualization.data_providers.genome.
BigBedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
-
class
galaxy.visualization.data_providers.genome.
BigWigDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
Provides data from BigWig files; position data is reported in a 1-based coordinate system, i.e. wiggle format.
-
class
galaxy.visualization.data_providers.genome.
IntervalIndexDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
Interval index files used for GFF and Pileup files.
-
dataset_type
: str = 'interval_index'
-
write_data_to_file
(regions, filename) Write data in the given regions, each defined by chrom, start, and end, to a file.
-
-
class
galaxy.visualization.data_providers.genome.
RawGFFDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Provides data from a GFF file that has not been indexed.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
dataset_type
: str = 'interval_index'
-
-
class
galaxy.visualization.data_providers.genome.
GtfTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
Returns data from GTF datasets that are indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
ENCODEPeakDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes ENCODEPeak data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
class
galaxy.visualization.data_providers.genome.
ENCODEPeakTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider
Provides data from an ENCODEPeak dataset indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
ChromatinInteractionsDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
-
class
galaxy.visualization.data_providers.genome.
ChromatinInteractionsTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider