galaxy.visualization.data_providers package¶
Galaxy visualization/visual analysis data providers.
Submodules¶
galaxy.visualization.data_providers.basic module¶
-
class
galaxy.visualization.data_providers.basic.BaseDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Bases:
object
Base class for data providers. Data providers (a) read and package data from datasets; and (b) write subsets of data to new datasets.
-
__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i values are returned.')[source]¶ Create basic data provider.
-
has_data(**kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator(**kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data(chrom, start, end, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶ Returns data as specified by kwargs. start_val is the first element to return and max_vals indicates the number of values to return.
- Return value must be a dictionary with the following attributes:
- dataset_type, data
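The method contract above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: `ListDataProvider`, its in-memory `rows` list, and the `"example"` dataset type are hypothetical and are not part of Galaxy. It mirrors the documented method names and shows how `start_val` and `max_vals` could bound the returned values. The default `max_vals` of 9223372036854775807 equals `sys.maxsize` on 64-bit CPython builds.

```python
import sys


class ListDataProvider:
    """Hypothetical stand-in that mirrors the BaseDataProvider method names."""

    def __init__(self, rows, error_max_vals="Only the first %i values are returned."):
        # rows: list of (chrom, start, end) tuples held in memory (an assumption).
        self.rows = rows
        self.error_max_vals = error_max_vals

    def get_iterator(self, chrom, start, end, **kwargs):
        # Yield rows that overlap the half-open window [start, end).
        return (r for r in self.rows if r[0] == chrom and r[1] < end and r[2] > start)

    def has_data(self, chrom, start, end, **kwargs):
        # True as soon as one row is found in the window.
        return any(True for _ in self.get_iterator(chrom, start, end))

    def process_data(self, iterator, start_val=0, max_vals=None, **kwargs):
        data, message = [], None
        for count, row in enumerate(iterator):
            if count < start_val:
                continue  # skip elements before the first requested one
            if max_vals is not None and len(data) >= max_vals:
                message = self.error_max_vals % max_vals
                break
            data.append(list(row))
        return {"data": data, "message": message}

    def get_data(self, chrom, start, end, start_val=0, max_vals=sys.maxsize, **kwargs):
        iterator = self.get_iterator(chrom, start, end)
        result = self.process_data(iterator, start_val, max_vals, **kwargs)
        result["dataset_type"] = "example"  # hypothetical type name
        return result
```

The returned dictionary carries the two required attributes, `dataset_type` and `data`; the `message` key is an extra used here to report truncation.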
-
-
class
galaxy.visualization.data_providers.basic.ColumnDataProvider(original_dataset, max_lines_returned=30000)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Data provider for columnar data.
-
MAX_LINES_RETURNED= 30000¶
-
galaxy.visualization.data_providers.cigar module¶
Functions for working with SAM/BAM CIGAR representation.
-
galaxy.visualization.data_providers.cigar.get_ref_based_read_seq_and_cigar(read_seq, read_start, ref_seq, ref_seq_start, cigar)[source]¶ Returns a (new_read_seq, new_cigar) pair that can be used with the reference sequence to reconstruct the read. The new read sequence includes only the bases that cannot be recovered from the reference: mismatches and insertions (soft-clipped bases are not included). The new CIGAR replaces each M with = and X operations, because the M operation can denote either a sequence match or a mismatch.
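The idea can be sketched as follows. This is an illustration, not Galaxy's implementation: the function name `ref_based_read` is hypothetical, the CIGAR is assumed to be a SAM-style string (the real function may take a different representation), and `read_start` is assumed to be the position of the first aligned, non-clipped base in the same coordinate system as `ref_seq_start`.

```python
import re

# One CIGAR element: a length followed by a SAM operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_based_read(read_seq, read_start, ref_seq, ref_seq_start, cigar):
    """Return (new_read_seq, new_cigar) with M split into = and X."""
    read_pos = 0                          # index into read_seq
    ref_pos = read_start - ref_seq_start  # index into ref_seq
    new_seq = []                          # bases not recoverable from the reference
    new_ops = []                          # [op, length] pairs, merged as we go

    def push(op, length=1):
        # Extend the previous operation when it has the same type.
        if new_ops and new_ops[-1][0] == op:
            new_ops[-1][1] += length
        else:
            new_ops.append([op, length])

    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "M":
            # Compare base by base to decide between match and mismatch.
            for _ in range(length):
                if read_seq[read_pos] == ref_seq[ref_pos]:
                    push("=")
                else:
                    push("X")
                    new_seq.append(read_seq[read_pos])
                read_pos += 1
                ref_pos += 1
        elif op in ("I", "X"):
            # Inserted and mismatched bases must be kept.
            new_seq.append(read_seq[read_pos:read_pos + length])
            read_pos += length
            if op == "X":
                ref_pos += length
            push(op, length)
        elif op == "=":
            read_pos += length
            ref_pos += length
            push(op, length)
        elif op in ("D", "N"):
            ref_pos += length  # consumes reference only
            push(op, length)
        elif op == "S":
            read_pos += length  # soft-clipped bases are dropped
            push(op, length)
        else:
            push(op, length)  # H and P consume neither sequence

    new_cigar = "".join(f"{length}{op}" for op, length in new_ops)
    return "".join(new_seq), new_cigar
```

For example, a read `ATGCTA` aligned with `3M1I2M` against a reference window `ACGTA` keeps only the mismatched `T` and the inserted `C`.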
galaxy.visualization.data_providers.genome module¶
Data providers for genome visualizations.
-
galaxy.visualization.data_providers.genome.float_nan(n)[source]¶ Returns None instead of NaN so that the value passes jQuery 1.4's strict JSON parsing.
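A minimal sketch of the behavior described above, assuming the function returns the input as a float when it is not NaN (the real implementation may differ in detail). Python's `json.dumps` writes NaN as the bare token `NaN` by default, which strict JSON parsers reject, whereas `None` serializes to `null`.

```python
import json
import math


def float_nan(n):
    """Return None for NaN, otherwise the value as a float."""
    value = float(n)
    return None if math.isnan(value) else value


# None serializes to null, which strict JSON parsers accept.
encoded = json.dumps([float_nan(float("nan")), float_nan(1.5)])
```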
-
galaxy.visualization.data_providers.genome.get_bounds(reads, start_pos_index, end_pos_index)[source]¶ Returns the minimum and maximum position for a set of reads.
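A sketch of what this computes, assuming each read is an indexable sequence and that the two index arguments select its start and end fields. The handling of an empty `reads` collection is not documented above; this version simply raises `ValueError` in that case.

```python
def get_bounds(reads, start_pos_index, end_pos_index):
    """Return (lowest start, highest end) over a non-empty set of reads."""
    lows = [read[start_pos_index] for read in reads]
    highs = [read[end_pos_index] for read in reads]
    return min(lows), max(highs)


# Reads laid out as [guid, start, end, name]; indices 1 and 2 hold the positions.
reads = [["a", 120, 170, "r1"], ["b", 100, 150, "r2"], ["c", 140, 190, "r3"]]
bounds = get_bounds(reads, 1, 2)
```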
-
class
galaxy.visualization.data_providers.genome.FeatureLocationIndexDataProvider(converted_dataset)[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Reads/writes/queries feature location index (FLI) datasets.
-
class
galaxy.visualization.data_providers.genome.GenomeDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.basic.BaseDataProvider
Base class for genome data providers. All genome providers use BED coordinate format (0-based, half-open coordinates) for both queries and returned data.
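The BED convention is easy to get wrong when data arrives in a 1-based, inclusive system such as wiggle or VCF positions. The helpers below are hypothetical and not part of Galaxy; they only illustrate the arithmetic.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based, half-open interval to 1-based, inclusive."""
    return start + 1, end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def length(start, end):
    """Number of bases covered by a 0-based, half-open interval."""
    return end - start
```

With half-open intervals, adjacent features such as [0, 10) and [10, 20) do not overlap, and the length is simply `end - start`.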
-
dataset_type= None¶
-
col_name_data_attr_mapping= {}¶ Mapping from column name to payload data; this mapping is used to create filters. Key is column name, value is a dict with mandatory key 'index' and optional key 'name'. E.g. this defines column 4:
col_name_data_attr_mapping = {4: {'index': 5, 'name': 'Score'}}
-
__init__(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶
-
write_data_to_file(regions, filename)[source]¶ Write data in region defined by chrom, start, and end to a file.
-
has_data(chrom, start, end, **kwargs)[source]¶ Returns true if dataset has data in the specified genome window, false otherwise.
-
get_iterator(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data(iterator, start_val=0, max_vals=None, **kwargs)[source]¶ Process data from an iterator to a format that can be provided to client.
-
get_data(chrom=None, low=None, high=None, start_val=0, max_vals=9223372036854775807, **kwargs)[source]¶ Returns data in the region defined by chrom, low, and high. start_val and max_vals select the data to return: start_val is the first element to return and max_vals is the number of values to return.
- Return value must be a dictionary with the following attributes:
- dataset_type, data
-
-
class
galaxy.visualization.data_providers.genome.TabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Tabix index data provider for the Galaxy track browser.
-
dataset_type= 'tabix'¶
-
col_name_data_attr_mapping= {4: {'index': 4, 'name': 'Score'}}¶
-
-
class
galaxy.visualization.data_providers.genome.IntervalDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes interval data from native format to payload format.
-
dataset_type= 'interval_index'¶
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
-
class
galaxy.visualization.data_providers.genome.IntervalTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.IntervalDataProvider
Provides data from a BED file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.BedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Processes BED data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
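As an illustration of the payload layout, the sketch below turns one tab-separated BED line into a list in that order. The function name `bed_line_to_payload`, the choice to omit trailing fields when the BED line has fewer columns, and the representation of blocks as absolute [start, end] pairs are all assumptions made for this example, not confirmed details of Galaxy's provider.

```python
def bed_line_to_payload(line, offset):
    """Build [uid, start, end, name, strand, thick_start, thick_end, blocks]."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    payload = [offset, start, end]  # the file offset serves as the uid
    if len(fields) > 3:
        payload.append(fields[3])  # name
    if len(fields) > 5:
        payload.append(fields[5])  # strand
    if len(fields) > 7:
        payload.extend([int(fields[6]), int(fields[7])])  # thick start/end
    if len(fields) > 11:
        # blockSizes and blockStarts are comma-separated and may end with a comma.
        sizes = [int(s) for s in fields[10].split(",") if s]
        starts = [int(s) for s in fields[11].split(",") if s]
        # Block starts are relative to the feature start in BED12.
        payload.append([[start + s, start + s + size] for size, s in zip(sizes, starts)])
    return payload


line = "chr1\t100\t200\tgeneA\t0\t+\t120\t180\t0\t2\t10,20,\t0,80,\n"
payload = bed_line_to_payload(line, offset=42)
```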
-
dataset_type= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.BedTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.RawBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BedDataProvider
Provides data from a BED file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
class
galaxy.visualization.data_providers.genome.VcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes VCF data from native format to payload format.
Payload format: An array of entries for each locus in the file. Each array has the following entries:
1. GUID (unused)
2. location (0-based)
3. reference base(s)
4. alternative base(s)
5. quality score
6. whether variant passed filter
7. sample genotypes: a single string with samples separated by commas; empty string denotes the reference genotype
8-end: allele counts for each alternative
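The layout can be sketched for a single VCF data line as follows. This is a simplified illustration, not Galaxy's code: the function name `vcf_line_to_payload` is hypothetical, the filter check is assumed to be `FILTER == "PASS"`, the genotype is taken from the first sub-field of each sample column, and allele counts are tallied from those genotypes.

```python
import re


def vcf_line_to_payload(line, guid=None):
    """Build [guid, pos0, ref, alt, qual, passed, genotypes, *allele_counts]."""
    fields = line.rstrip("\n").split("\t")
    pos, ref, alt, qual, flt = fields[1], fields[3], fields[4], fields[5], fields[6]
    alts = alt.split(",")
    counts = [0] * len(alts)  # one counter per alternative allele
    genotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[0]  # GT is assumed to be the first sub-field
        alleles = [a for a in re.split(r"[/|]", gt) if a.isdigit()]
        for a in alleles:
            if int(a) > 0:
                counts[int(a) - 1] += 1
        # An all-reference genotype is written as the empty string.
        is_ref = bool(alleles) and all(a == "0" for a in alleles)
        genotypes.append("" if is_ref else gt)
    return [
        guid,
        int(pos) - 1,                          # VCF POS is 1-based
        ref,
        alt,
        None if qual == "." else float(qual),  # "." marks a missing quality
        flt == "PASS",
        ",".join(genotypes),
    ] + counts


line = "chr1\t101\trs1\tA\tC,T\t50\tPASS\t.\tGT\t0/0\t0/1\t1|2\n"
payload = vcf_line_to_payload(line)
```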
-
col_name_data_attr_mapping= {'Qual': {'index': 6, 'name': 'Qual'}}¶
-
dataset_type= 'variant'¶
-
class
galaxy.visualization.data_providers.genome.VcfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file indexed via tabix.
-
dataset_type= 'variant'¶
-
-
class
galaxy.visualization.data_providers.genome.RawVcfDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.VcfDataProvider
Provides data from a VCF file.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
class
galaxy.visualization.data_providers.genome.BamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Provides access to intervals from a sorted, indexed BAM file. Coordinate data is reported in BED format: 0-based, half-open.
-
dataset_type= 'bai'¶
-
get_iterator(data_file, chrom, start, end, **kwargs)[source]¶ Returns an iterator that provides data in the region chrom:start-end
-
process_data(iterator, start_val=0, max_vals=None, ref_seq=None, iterator_type='nth', mean_depth=None, start=0, end=0, **kwargs)[source]¶ Returns a dict with the following attributes:
- data - a list of reads with the format [<guid>, <start>, <end>, <name>, <read_1>, <read_2>, [empty], <mapq_scores>], where <read_1> and <read_2> each have the format [<start>, <end>, <cigar>, <strand>, <read_seq>]. Field 7 is empty so that the location of the mapq scores matches that in single-end reads. For single-end reads, a read has the format [<guid>, <start>, <end>, <name>, <cigar>, <strand>, <seq>, <mapq_score>]. NOTE: read end and sequence data are not valid for reads outside of the requested region and should not be used.
- max_low - lowest coordinate for the returned reads
- max_high - highest coordinate for the returned reads
- message - error/informative message
-
-
class
galaxy.visualization.data_providers.genome.SamDataProvider(converted_dataset=None, original_dataset=None, dependencies=None)[source]¶ Bases:
galaxy.visualization.data_providers.genome.BamDataProvider
-
dataset_type= 'bai'¶
-
-
class
galaxy.visualization.data_providers.genome.BBIDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
BBI data provider for the Galaxy track browser.
-
dataset_type= 'bigwig'¶
-
-
class
galaxy.visualization.data_providers.genome.BigBedDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
-
class
galaxy.visualization.data_providers.genome.BigWigDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.BBIDataProvider
Provides data from BigWig files; position data is reported in a 1-based coordinate system, i.e. wiggle format.
-
class
galaxy.visualization.data_providers.genome.IntervalIndexDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider, galaxy.visualization.data_providers.genome.FilterableMixin
Interval index files used for GFF and Pileup files.
-
col_name_data_attr_mapping= {4: {'index': 4, 'name': 'Score'}}¶
-
dataset_type= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.RawGFFDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Provides data from a GFF file that has not been indexed.
NOTE: this data provider does not use indices, and hence will be very slow for large datasets.
-
dataset_type= 'interval_index'¶
-
-
class
galaxy.visualization.data_providers.genome.GtfTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider
Returns data from GTF datasets that are indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
Abstract class that processes ENCODEPeak data from native format to payload format.
Payload format: [ uid (offset), start, end, name, strand, thick_start, thick_end, blocks ]
-
class
galaxy.visualization.data_providers.genome.ENCODEPeakTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ENCODEPeakDataProvider
Provides data from an ENCODEPeak dataset indexed via tabix.
-
class
galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.GenomeDataProvider
-
class
galaxy.visualization.data_providers.genome.ChromatinInteractionsTabixDataProvider(converted_dataset=None, original_dataset=None, dependencies=None, error_max_vals='Only the first %i %s in this region are displayed.')[source]¶ Bases:
galaxy.visualization.data_providers.genome.TabixDataProvider, galaxy.visualization.data_providers.genome.ChromatinInteractionsDataProvider