galaxy.visualization.data_providers package
Galaxy visualization/visual analysis data providers.
Subpackages
- galaxy.visualization.data_providers.phyloviz package
PhylovizDataProvider
- Submodules
- galaxy.visualization.data_providers.phyloviz.baseparser module
- galaxy.visualization.data_providers.phyloviz.newickparser module
- galaxy.visualization.data_providers.phyloviz.nexusparser module
- galaxy.visualization.data_providers.phyloviz.phyloxmlparser module
Submodules
galaxy.visualization.data_providers.basic module
- class galaxy.visualization.data_providers.basic.BaseDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]
Bases:
object
Base class for data providers. Data providers both read and package data from datasets and write subsets of data to new datasets.
- __init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]
Create basic data provider.
- original_dataset: DatasetInstance
- has_data(**kwargs)[source]
Returns true if dataset has data in the specified genome window, false otherwise.
- get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
Process data from an iterator to a format that can be provided to client.
- get_data(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]
Returns data as specified by kwargs. start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
dataset_type, data
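The start_val/max_vals contract described for get_data can be illustrated with a small sketch. This is not Galaxy's implementation: window_values and the "example" dataset_type value are hypothetical names, and the sketch assumes that start_val is the index of the first value to return and that max_vals caps how many values follow.

```python
from itertools import islice


def window_values(iterator, start_val=0, max_vals=None):
    """Hypothetical helper: skip start_val items, then keep at most max_vals."""
    # islice needs an absolute stop index; None means "no upper limit".
    stop = None if max_vals is None else start_val + max_vals
    data = list(islice(iterator, start_val, stop))
    # Mirrors the documented return contract: a dict with 'dataset_type'
    # and 'data' keys. The 'example' value is made up for illustration.
    return {"dataset_type": "example", "data": data}


result = window_values(iter(range(10)), start_val=2, max_vals=3)
```

Here result["data"] holds the three values starting at index 2 of the underlying iterator.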
- class galaxy.visualization.data_providers.basic.ColumnDataProvider(original_dataset, max_lines_returned=30000)[source]
Bases:
BaseDataProvider
Data provider for columnar data
- MAX_LINES_RETURNED = 30000
- original_dataset: DatasetInstance
galaxy.visualization.data_providers.cigar module
Functions for working with SAM/BAM CIGAR representation.
- galaxy.visualization.data_providers.cigar.get_ref_based_read_seq_and_cigar(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]
Returns a (new_read_seq, new_cigar) tuple that can be used with the reference sequence to reconstruct the read. The new read sequence includes only bases that cannot be recovered from the reference: mismatches and insertions (soft-clipped bases are not included). The new cigar replaces Ms with =s and Xs because the M operation can denote either a sequence match or a mismatch.
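The idea of splitting M operations into = and X can be sketched as follows. This is an illustrative simplification, not the Galaxy function: expand_matches is a hypothetical name, the CIGAR is given as (op, length) pairs with single-letter ops, and the read and reference are assumed to start at the same reference position. Keeping the S operation in the output CIGAR is also an assumption.

```python
def expand_matches(read_seq, ref_seq, cigar):
    """Hypothetical sketch: rewrite M ops as = (match) and X (mismatch)."""
    new_cigar = []
    new_seq = []
    read_pos = ref_pos = 0

    def push(op, length):
        # Merge with the previous op when identical to keep the CIGAR compact.
        if new_cigar and new_cigar[-1][0] == op:
            new_cigar[-1] = (op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((op, length))

    for op, length in cigar:
        if op == "M":
            for _ in range(length):
                if read_seq[read_pos] == ref_seq[ref_pos]:
                    push("=", 1)
                else:
                    push("X", 1)
                    new_seq.append(read_seq[read_pos])  # mismatched base is kept
                read_pos += 1
                ref_pos += 1
        elif op == "I":
            # Inserted bases cannot be recovered from the reference, so keep them.
            new_seq.append(read_seq[read_pos:read_pos + length])
            read_pos += length
            push(op, length)
        elif op == "S":
            read_pos += length  # soft-clipped bases are skipped, not stored
            push(op, length)
        elif op in ("D", "N"):
            ref_pos += length  # consumes reference only
            push(op, length)
        else:
            push(op, length)  # H and P consume neither sequence
    return "".join(new_seq), new_cigar


new_read_seq, new_cigar = expand_matches(
    "ACGTTA", "ACCTA", [("M", 4), ("I", 1), ("M", 1)]
)
```

In this example the only bases retained are the mismatched G and the inserted T; every other base can be read back from the reference.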
galaxy.visualization.data_providers.genome module
Data providers for genome visualizations.
- galaxy.visualization.data_providers.genome.float_nan(n)[source]
Return None instead of NaN so that the value passes jQuery 1.4’s strict JSON parsing.
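The reason for this conversion is that Python's json module serializes NaN as the bare token NaN by default, which strict JSON parsers reject, whereas None serializes as null. A minimal sketch of the behaviour, using a hypothetical stand-in named nan_to_none rather than the Galaxy function itself:

```python
import json
import math


def nan_to_none(n):
    """Hypothetical stand-in: map NaN to None, otherwise return a float."""
    value = float(n)
    return None if math.isnan(value) else value


# None becomes null, which any strict JSON parser accepts.
payload = json.dumps([nan_to_none(1.5), nan_to_none(float("nan"))])
```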
- galaxy.visualization.data_providers.genome.get_bounds(reads, start_pos_index, end_pos_index)[source]
Returns the minimum and maximum position for a set of reads.
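A sketch of what such a bounds computation might look like, assuming each read is an indexable record and the two index arguments select its start and end fields. The name bounds is hypothetical and the code is not taken from Galaxy.

```python
def bounds(reads, start_pos_index, end_pos_index):
    """Hypothetical sketch: smallest start and largest end across reads."""
    low = min(read[start_pos_index] for read in reads)
    high = max(read[end_pos_index] for read in reads)
    return low, high


reads = [["r1", 100, 150], ["r2", 90, 140], ["r3", 120, 180]]
low, high = bounds(reads, 1, 2)
```

Note that this sketch raises ValueError for an empty list of reads; how the real function treats that case is not stated here.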
- class galaxy.visualization.data_providers.genome.FeatureLocationIndexDataProvider(converted_dataset)[source]
Bases:
BaseDataProvider
Reads/writes/queries feature location index (FLI) datasets.
- get_data(query)[source]
Returns data as specified by kwargs. start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
dataset_type, data
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.GenomeDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
BaseDataProvider
Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
- __init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Create basic data provider.
- write_data_to_file(regions, filename)[source]
Write data in region defined by chrom, start, and end to a file.
- has_data(chrom, start, end, **kwargs)[source]
Returns true if dataset has data in the specified genome window, false otherwise.
- get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
Process data from an iterator to a format that can be provided to client.
- get_data(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs)[source]
Returns data in region defined by chrom, start, and end. start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
dataset_type, data
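Because genome providers use BED coordinates (0-based, half-open), callers that hold 1-based, fully closed positions need to convert before querying. The helpers below are hypothetical and only illustrate the arithmetic of the convention.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, closed interval to 0-based, half-open (BED)."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based, half-open (BED) interval to 1-based, closed."""
    return start + 1, end


def bed_length(start, end):
    """In BED coordinates the length is simply end - start."""
    return end - start


# The first 100 bases of a chromosome: 1..100 inclusive, or [0, 100) in BED.
bed_start, bed_end = one_based_to_bed(1, 100)
```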
- class galaxy.visualization.data_providers.genome.FilterableMixin[source]
Bases:
object
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.TabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider, FilterableMixin
Tabix index data provider for the Galaxy track browser.
- class galaxy.visualization.data_providers.genome.IntervalDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
Processes interval data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end
- write_data_to_file(regions, filename)[source]
Write data in region defined by chrom, start, and end to a file.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.IntervalTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider, IntervalDataProvider
Provides data from a BED file indexed via tabix.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.BedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
Processes BED data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end
- write_data_to_file(regions, filename)[source]
Write data in region defined by chrom, start, and end to a file.
- original_dataset: DatasetInstance
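The payload format listed for BedDataProvider can be illustrated by converting one BED12 line by hand. This is a sketch under assumptions, not the provider's code: bed_line_to_payload is a hypothetical name, the uid is taken to be a caller-supplied file offset, and blocks are assumed to be absolute (start, end) pairs derived from the BED blockSizes and blockStarts columns.

```python
def bed_line_to_payload(line, offset):
    """Hypothetical sketch: BED line -> [uid, start, end, name, strand, ...]."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    payload = [offset, start, end]
    if len(fields) > 3:
        payload.append(fields[3])  # name
    if len(fields) > 5:
        payload.append(fields[5])  # strand
    if len(fields) > 7:
        payload.extend([int(fields[6]), int(fields[7])])  # thick start/end
    if len(fields) > 11:
        # blockStarts are relative to the feature start in BED12.
        sizes = [int(s) for s in fields[10].split(",") if s]
        starts = [int(s) for s in fields[11].split(",") if s]
        payload.append(
            [(start + s, start + s + size) for s, size in zip(starts, sizes)]
        )
    return payload


line = "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t500,1000\t0,3000\n"
payload = bed_line_to_payload(line, offset=0)
```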
- class galaxy.visualization.data_providers.genome.BedTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider, BedDataProvider
Provides data from a BED file indexed via tabix.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.RawBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
BedDataProvider
Provide data from BED file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: DatasetInstance
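The performance note for RawBedDataProvider follows from how an unindexed lookup has to work: every line is read and tested against the requested region. A minimal sketch of such a linear scan, assuming tab-separated BED input and half-open coordinates (scan_bed is a hypothetical name, not part of Galaxy):

```python
import io


def scan_bed(handle, chrom, start, end):
    """Yield BED lines overlapping chrom:start-end by reading every line."""
    for line in handle:
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if fields[0] != chrom:
            continue
        feature_start, feature_end = int(fields[1]), int(fields[2])
        # Half-open overlap test: [feature_start, feature_end) vs [start, end).
        if feature_start < end and feature_end > start:
            yield line


data = "chr1\t0\t100\ta\nchr1\t200\t300\tb\nchr2\t50\t60\tc\n"
hits = list(scan_bed(io.StringIO(data), "chr1", 90, 200))
```

The cost grows with the size of the whole file rather than with the size of the requested region, which is what a tabix index avoids.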
- class galaxy.visualization.data_providers.genome.VcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
Abstract class that processes VCF data from native format to payload format.
Payload format: An array of entries for each locus in the file. Each array has the following entries:
GUID (unused)
location (0-based)
reference base(s)
alternative base(s)
quality score
whether variant passed filter
sample genotypes – a single string with samples separated by commas; empty string denotes the reference genotype
allele counts for each alternative
- class galaxy.visualization.data_providers.genome.VcfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider, VcfDataProvider
Provides data from a VCF file indexed via tabix.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.RawVcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
VcfDataProvider
Provide data from VCF file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.BamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider, FilterableMixin
Provides access to intervals from a sorted indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.
- get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]
Returns an iterator that provides data in the region chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]
Returns a dict with the following attributes:
data - a list of reads. Paired-end reads have the format [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>], where <read_1> and <read_2> each have the format [<start>, <end>, <cigar>, <strand>, <read_seq>]. Field 7 is empty so that the location of the mapq scores matches that in single-end reads. Single-end reads have the format [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]. NOTE: read end and sequence data are not valid for reads outside of the requested region and should not be used.
max_low - lowest coordinate for the returned reads
max_high - highest coordinate for the returned reads
message - error/informative message
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.SamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None)[source]
Bases:
BamDataProvider
- __init__(converted_dataset=None, original_dataset=None, dependencies=None)[source]
Create SamDataProvider.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.BBIDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
BBI data provider for the Galaxy track browser.
- has_data(chrom)[source]
Returns true if dataset has data in the specified genome window, false otherwise.
- get_data(chrom, start, end, start_val=0, max_vals=None, num_samples=1000, **kwargs)[source]
Returns data in region defined by chrom, start, and end. start_val and max_vals are used to denote the data to return: start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
dataset_type, data
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.BigBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
BBIDataProvider
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.BigWigDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
BBIDataProvider
Provides data from BigWig files; position data is reported in 1-based coordinate system, i.e. wiggle format.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.IntervalIndexDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider, FilterableMixin
Provides data from interval index files, which are used for GFF and pileup datasets.
- write_data_to_file(regions, filename)[source]
Write data in region defined by chrom, start, and end to a file.
- get_iterator(data_file, chrom, start, end, **kwargs) → Iterator[str][source]
Returns an iterator for data in data_file in chrom:start-end
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
Process data from an iterator to a format that can be provided to client.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.RawGFFDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
Provide data from GFF file that has not been indexed.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end as well as a file offset.
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
Process data from an iterator to a format that can be provided to client.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.GtfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider
Returns data from GTF datasets that are indexed via tabix.
- process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]
Process data from an iterator to a format that can be provided to client.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
Abstract class that processes ENCODEPeak data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
- get_iterator(data_file, chrom, start, end, **kwargs)[source]
Returns an iterator that provides data in the region chrom:start-end
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.ENCODEPeakTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider, ENCODEPeakDataProvider
Provides data from an ENCODEPeak dataset indexed via tabix.
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
GenomeDataProvider
- original_dataset: DatasetInstance
- class galaxy.visualization.data_providers.genome.ChromatinInteractionsTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]
Bases:
TabixDataProvider, ChromatinInteractionsDataProvider
- get_iterator(data_file, chrom, start=0, end=9223372036854775807, interchromosomal=False, **kwargs) → Iterator[str][source]
- original_dataset: DatasetInstance