galaxy.visualization.data_providers package
Galaxy visualization/visual analysis data providers.
Submodules
galaxy.visualization.data_providers.basic module
-
class
galaxy.visualization.data_providers.basic.
BaseDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Bases:
object
Base class for data providers. Data providers (a) read and package data from datasets; and (b) write subsets of data to new datasets.
-
__init__
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Create basic data provider.
-
has_data
(**kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator
(**kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data
(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶ Returns data as specified by kwargs. start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
- dataset_type, data
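The method descriptions above suggest a simple pipeline: get_data obtains an iterator for a region, and process_data packages a slice of it chosen by start_val and max_vals. The following is a minimal, self-contained sketch of that pattern. It does not import Galaxy; the class ListDataProvider, its rows argument, and the 'list' dataset type are hypothetical, and the real BaseDataProvider may connect these methods differently.

```python
import sys
from itertools import islice


class ListDataProvider:
    """Hypothetical stand-in for a data provider; not part of Galaxy."""

    dataset_type = "list"  # assumed label, for illustration only

    def __init__(self, rows):
        # rows: iterable of (chrom, start, end, value) tuples
        self.rows = list(rows)

    def get_iterator(self, chrom, start, end, **kwargs):
        # Yield rows that overlap the half-open window [start, end).
        return (r for r in self.rows if r[0] == chrom and r[1] < end and r[2] > start)

    def process_data(self, iterator, start_val=0, max_vals=None, **kwargs):
        # Skip start_val items, then keep at most max_vals items.
        if max_vals is None:
            stop = None
        else:
            stop = min(start_val + max_vals, sys.maxsize)
        return {"data": list(islice(iterator, start_val, stop))}

    def get_data(self, chrom, start, end, start_val=0, max_vals=sys.maxsize, **kwargs):
        iterator = self.get_iterator(chrom, start, end)
        result = self.process_data(iterator, start_val, max_vals)
        result["dataset_type"] = self.dataset_type
        return result


provider = ListDataProvider(
    [("chr1", 0, 10, 1.0), ("chr1", 5, 20, 2.0), ("chr2", 0, 10, 3.0)]
)
# Second matching row only: skip one item, return one item.
result = provider.get_data("chr1", 0, 100, start_val=1, max_vals=1)
```

The returned dictionary carries the two keys named above, dataset_type and data, so a client can page through a region by increasing start_val.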
-
-
class
galaxy.visualization.data_providers.basic.
ColumnDataProvider
(original_dataset, max_lines_returned=30000)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Data provider for columnar data.
-
MAX_LINES_RETURNED
= 30000¶
-
galaxy.visualization.data_providers.cigar module
Functions for working with SAM/BAM CIGAR representation.
-
galaxy.visualization.data_providers.cigar.
get_ref_based_read_seq_and_cigar
(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]¶ Returns a (new_read_seq, new_cigar) pair that can be used with the reference sequence to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft-clipped bases are not included). The new CIGAR replaces each M with = and X operations, because the M operation can denote either a sequence match or a mismatch.
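The idea of splitting M operations into = and X can be illustrated with a simplified sketch. This is not Galaxy's implementation: the function name is hypothetical, it assumes the read and the reference sequence start at the same position (so it ignores read_start and ref_seq_start), it takes the CIGAR as a list of (operation character, length) pairs, and it keeps S operations in the new CIGAR while dropping their bases, which is an assumption.

```python
def ref_based_seq_and_cigar(read_seq, ref_seq, cigar):
    """Simplified sketch; see the stated assumptions in the text above."""
    new_seq = []
    new_cigar = []
    read_pos = 0
    ref_pos = 0

    def push(op, length):
        # Merge with the previous operation when the op repeats.
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    for op, length in cigar:
        if op == "M":
            # Compare base by base; keep only mismatching read bases.
            for _ in range(length):
                if read_seq[read_pos] == ref_seq[ref_pos]:
                    push("=", 1)
                else:
                    push("X", 1)
                    new_seq.append(read_seq[read_pos])
                read_pos += 1
                ref_pos += 1
        elif op in ("I", "X"):
            # Inserted or mismatched bases cannot be recovered from the reference.
            new_seq.append(read_seq[read_pos:read_pos + length])
            read_pos += length
            if op == "X":
                ref_pos += length
            push(op, length)
        elif op == "=":
            read_pos += length
            ref_pos += length
            push(op, length)
        elif op == "S":
            read_pos += length  # soft-clipped bases are skipped
            push(op, length)
        elif op in ("D", "N"):
            ref_pos += length  # consumes reference only
            push(op, length)
        else:
            push(op, length)  # H and P consume neither sequence
    return "".join(new_seq), new_cigar


# Read ACG|T|TA against reference ACC|TA: one mismatch (G) and one insertion (T).
new_seq, new_cigar = ref_based_seq_and_cigar(
    "ACGTTA", "ACCTA", [("M", 3), ("I", 1), ("M", 2)]
)
```

In this example only the mismatched base and the inserted base are retained; every other base can be read back from the reference using the = operations.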
galaxy.visualization.data_providers.genome module
Data providers for genome visualizations.
-
galaxy.visualization.data_providers.genome.
float_nan
(n)[source]¶ Return None instead of NaN so that the value passes jQuery 1.4’s strict JSON parsing.
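The reason for this conversion is that NaN is not a valid JSON value, whereas None serializes to null. A minimal sketch of the idea follows; the helper name nan_to_none is hypothetical and the body is not necessarily how float_nan is written.

```python
import json
import math


def nan_to_none(n):
    """Sketch of the idea: map NaN to None, anything else to a float."""
    value = float(n)
    return None if math.isnan(value) else value


values = [1.5, float("nan")]

# By default json.dumps emits the non-standard token NaN,
# which a strict JSON parser rejects.
raw = json.dumps(values)

# After conversion the NaN becomes null, which is valid JSON.
clean = json.dumps([nan_to_none(v) for v in values])
```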
-
galaxy.visualization.data_providers.genome.
get_bounds
(reads, start_pos_index, end_pos_index)[source]¶ Returns the minimum and maximum position for a set of reads.
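As an illustration, the bounds can be computed by indexing each read at the two given positions. The sketch below uses a hypothetical function name and requires a non-empty list of reads; how get_bounds itself treats an empty list is not stated here.

```python
def bounds(reads, start_pos_index, end_pos_index):
    """Sketch: return (lowest start, highest end) over a non-empty list of reads."""
    lows = [read[start_pos_index] for read in reads]
    highs = [read[end_pos_index] for read in reads]
    return min(lows), max(highs)


# Each read is a list; here the start is at index 1 and the end at index 2.
reads = [["r1", 100, 150], ["r2", 90, 140], ["r3", 120, 180]]
low, high = bounds(reads, 1, 2)
```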
-
class
galaxy.visualization.data_providers.genome.
FeatureLocationIndexDataProvider
(converted_dataset)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Reads/writes/queries feature location index (FLI) datasets.
-
class
galaxy.visualization.data_providers.genome.
GenomeDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
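Because callers often hold 1-based, fully closed coordinates (as used in GFF or VCF), a conversion to the 0-based, half-open BED convention is usually needed before querying. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, fully closed interval to 0-based, half-open."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based, half-open interval to 1-based, fully closed."""
    return start + 1, end


def bed_length(start, end):
    """In half-open coordinates the length is simply end - start."""
    return end - start


# The first 100 bases of a chromosome: 1..100 closed equals [0, 100) half-open.
bed_start, bed_end = one_based_to_bed(1, 100)
```

Only the start coordinate shifts; the end is numerically the same in both conventions.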
-
dataset_type
= None¶
-
col_name_data_attr_mapping
= {}¶ Mapping from column name to payload data; this mapping is used to create filters. Key is column name, value is a dict with mandatory key 'index' and optional key 'name'. E.g., this defines column 4:
col_name_data_attr_mapping = {4: {'index': 5, 'name': 'Score'}}
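To make the shape of col_name_data_attr_mapping concrete, the sketch below reads filterable values out of one payload row using such a mapping. The helper filter_values and the example row are hypothetical; how Galaxy itself consumes the mapping when building filters is not shown here.

```python
def filter_values(mapping, payload_row):
    """Hypothetical helper: return {display name: value} for each mapped column."""
    values = {}
    for column, attrs in mapping.items():
        # 'name' is optional, so fall back to the column key.
        name = attrs.get("name", str(column))
        # 'index' is mandatory and points into the payload row.
        values[name] = payload_row[attrs["index"]]
    return values


mapping = {4: {"index": 4, "name": "Score"}}
row = ["uid-0", 100, 200, "feature_a", 37.5]  # invented payload row
scores = filter_values(mapping, row)
```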
-
__init__
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
-
write_data_to_file
(regions, filename)[source]¶ Write the data in each of the given regions, where a region is defined by chrom, start, and end, to a file.
-
has_data
(chrom, start, end, **kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator
(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data
(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶ Returns data in the region defined by chrom, low, and high. start_val and max_vals select the data to return: start_val is the first element to return and max_vals is the number of values to return.
- Return value must be a dictionary with the following attributes:
- dataset_type, data
-
-
class
galaxy.visualization.data_providers.genome.
TabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
Tabix index data provider for the Galaxy track browser.
-
dataset_type
= 'tabix'¶
-
col_name_data_attr_mapping
= {4: {'index': 4, 'name': 'Score'}}¶
-
-
class
galaxy.visualization.data_providers.genome.
IntervalDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes interval data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
dataset_type
= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.
IntervalTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.IntervalDataProvider
Provides data from a BED file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
BedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes BED data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
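The payload layout above can be illustrated by converting one 12-column BED line. This is a sketch under assumptions, not BedDataProvider's code: the function name is hypothetical, it expects a full BED12 line, and it represents blocks as a list of absolute (start, end) pairs, which is an assumed representation.

```python
def bed_line_to_payload(line, offset):
    """Sketch: build [uid, start, end, name, strand, thick_start, thick_end, blocks]."""
    f = line.rstrip("\n").split("\t")
    chrom_start = int(f[1])
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(s) for s in f[10].rstrip(",").split(",")]
    starts = [int(s) for s in f[11].rstrip(",").split(",")]
    # blockStarts are relative to chromStart; convert to absolute coordinates.
    blocks = [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
    return [offset, chrom_start, int(f[2]), f[3], f[5], int(f[6]), int(f[7]), blocks]


line = "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t500,1000,\t0,3000,\n"
payload = bed_line_to_payload(line, offset=0)
```

Here offset stands in for the uid, following the "uid (offset)" wording of the payload description.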
-
dataset_type
= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.
BedTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file indexed via tabix.
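Classes such as this one combine two bases: one that knows how to fetch lines from an index and one that knows how to parse a format. The sketch below shows that composition pattern with hypothetical stand-in classes. It does not import Galaxy, and which methods each real class contributes is an assumption.

```python
class GenomeBase:
    """Stand-in for a common base class."""

    def get_data(self, chrom, start, end):
        return self.process_data(self.get_iterator(chrom, start, end))


class IndexedAccess(GenomeBase):
    """Stand-in for an index-backed provider: supplies get_iterator."""

    def get_iterator(self, chrom, start, end):
        # A real provider would query an index; this returns canned lines.
        return iter(["chr1\t10\t20", "chr1\t15\t30"])


class BedParsing(GenomeBase):
    """Stand-in for a format-specific provider: supplies process_data."""

    def process_data(self, iterator):
        return [line.split("\t") for line in iterator]


class IndexedBed(IndexedAccess, BedParsing):
    """Combines index access with BED parsing; defines nothing itself."""


# Method lookup follows the method resolution order, left base first.
mro_names = [cls.__name__ for cls in IndexedBed.__mro__]
data = IndexedBed().get_data("chr1", 0, 100)
```

Listing the index-backed class first means its methods take precedence wherever both bases define the same name.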
-
class
galaxy.visualization.data_providers.genome.
RawBedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
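The cost of working without an index can be seen in a minimal sketch of a region query over raw BED text: every line has to be read and parsed, however small the requested window. The function scan_bed is hypothetical and is not the provider's actual code.

```python
import io


def scan_bed(handle, chrom, start, end):
    """Linear scan: visits every line regardless of the query region."""
    for line in handle:
        # Skip blank lines, comments, and header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep features overlapping the half-open window [start, end).
        if fields[0] == chrom and int(fields[1]) < end and int(fields[2]) > start:
            yield fields


bed = io.StringIO("chr1\t0\t50\ta\nchr1\t100\t200\tb\nchr2\t0\t50\tc\n")
hits = list(scan_bed(bed, "chr1", 40, 120))
```

The running time grows with the size of the file rather than with the number of features in the window, which is what an index such as tabix avoids.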
-
class
galaxy.visualization.data_providers.genome.
VcfDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes VCF data from native format to payload format.
Payload format: An array of entries for each locus in the file. Each array has the following entries:
1. GUID (unused)
2. location (0-based)
3. reference base(s)
4. alternative base(s)
5. quality score
6. whether variant passed filter
7. sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype
8-end. allele counts for each alternative
-
col_name_data_attr_mapping
= {'Qual': {'index': 6, 'name': 'Qual'}}¶
-
dataset_type
= 'variant'¶
-
class
galaxy.visualization.data_providers.genome.
VcfTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file indexed via tabix.
-
dataset_type
= 'variant'¶
-
-
class
galaxy.visualization.data_providers.genome.
RawVcfDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
class
galaxy.visualization.data_providers.genome.
BamDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.
-
dataset_type
= 'bai'¶
-
get_iterator
(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data
(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]¶ Returns a dict with the following attributes:
data - a list of reads. Paired reads have the format
[<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>]
where <read_1> and <read_2> each have the format
[<start>, <end>, <cigar>, <strand>, <read_seq>]
Field 7 is empty so that the location of the mapq scores matches that in single-end reads. Single-end reads have the format
[<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]
NOTE: read end and sequence data are not valid for reads outside of the requested region and should not be used.
max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
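A client consuming the data list has to handle both read layouts, which share the same length and the same position for the mapq field. The sketch below is a hypothetical helper, not Galaxy code: it tells the two layouts apart by checking whether field 5 is a list, which is an inference from the formats described above, and it uses CIGAR strings such as "50M" purely for illustration.

```python
def describe_read(read):
    """Hypothetical helper: summarize one entry of the 'data' list."""
    name = read[3]
    if isinstance(read[4], (list, tuple)):
        # Paired layout: fields 5 and 6 each hold
        # [start, end, cigar, strand, read_seq]; skip an empty mate.
        mates = [mate for mate in (read[4], read[5]) if mate]
        cigars = [mate[2] for mate in mates]
        return {"name": name, "paired": True, "cigars": cigars, "mapq": read[7]}
    # Single-end layout: the cigar sits directly in field 5.
    return {"name": name, "paired": False, "cigars": [read[4]], "mapq": read[7]}


paired = [
    "g1", 100, 400, "pair1",
    [100, 150, "50M", "+", "ACGT"],
    [350, 400, "50M", "-", "TTGA"],
    [],
    [60, 58],
]
single = ["g2", 500, 550, "read2", "50M", "+", "ACGT", 60]

paired_summary = describe_read(paired)
single_summary = describe_read(single)
```

Because field 7 is left empty in the paired layout, the mapq value is found at the same index in both cases.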
-
-
class
galaxy.visualization.data_providers.genome.
SamDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None)[source]¶ Bases:
galaxy.visualization.data_providers.genome.BamDataProvider
-
dataset_type
= 'bai'¶
-
-
class
galaxy.visualization.data_providers.genome.
BBIDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
BBI data provider for the Galaxy track browser.
-
dataset_type
= 'bigwig'¶
-
-
class
galaxy.visualization.data_providers.genome.
BigBedDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
-
class
galaxy.visualization.data_providers.genome.
BigWigDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
Provides data from BigWig files; position data is reported in a 1-based coordinate system, i.e. wiggle format.
-
class
galaxy.visualization.data_providers.genome.
IntervalIndexDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
,galaxy.visualization.data_providers.genome.FilterableMixin
Interval index files used for GFF and Pileup files.
-
col_name_data_attr_mapping
= {4: {'index': 4, 'name': 'Score'}}¶
-
dataset_type
= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.
RawGFFDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Provides data from a GFF file that has not been indexed.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
dataset_type
= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.
GtfTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
Returns data from GTF datasets that are indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
ENCODEPeakDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes ENCODEPeak data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
class
galaxy.visualization.data_providers.genome.
ENCODEPeakTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider
Provides data from an ENCODEPeak dataset indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.
ChromatinInteractionsDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
-
class
galaxy.visualization.data_providers.genome.
ChromatinInteractionsTabixDataProvider
(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
,galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider