Warning

This document is for an in-development version of Galaxy.

galaxy.datatypes.util package

Utilities for Galaxy datatypes.

Submodules

galaxy.datatypes.util.generic_util module

galaxy.datatypes.util.generic_util.count_special_lines(word, filename, invert=False)[source]

Searches a file for special ‘words’ using the grep tool. grep is used to speed up the searching and counting. The number of hits is returned.
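
As a rough illustration of what the returned number means, the sketch below counts matching lines in pure Python instead of calling grep. The function name count_matching_lines is hypothetical. Treating word as a regular expression, and invert as "count the lines that do not match" (in the spirit of grep -v), are assumptions that this page does not confirm.

```python
import re


def count_matching_lines(word, filename, invert=False):
    """Hypothetical pure-Python stand-in; not Galaxy's implementation."""
    pattern = re.compile(word)  # assumption: `word` is a regular expression
    hits = 0
    with open(filename) as handle:
        for line in handle:
            matched = pattern.search(line) is not None
            # assumption: invert=True counts the lines that do NOT match
            if matched != invert:
                hits += 1
    return hits
```

For a FASTA-like file with two header lines and three sequence lines, count_matching_lines("^>", path) would return 2 and the inverted call would return 3.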

galaxy.datatypes.util.gff_util module

Provides utilities for working with GFF files.

class galaxy.datatypes.util.gff_util.GFFInterval(reader, fields, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False)[source]

Bases: bx.intervals.io.GenomicInterval

A GFF interval, including attributes. If the file is strictly a GFF file, the only attribute is ‘group’.

__init__(reader, fields, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False)[source]
copy()[source]
class galaxy.datatypes.util.gff_util.GFFFeature(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False, intervals=[], raw_size=0)[source]

Bases: galaxy.datatypes.util.gff_util.GFFInterval

A GFF feature, which can include multiple intervals.

__init__(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False, intervals=[], raw_size=0)[source]
name()[source]

Returns the feature’s name.

copy()[source]
lines()[source]
class galaxy.datatypes.util.gff_util.GFFIntervalToBEDReaderWrapper(reader, **kwargs)[source]

Bases: bx.intervals.io.NiceReaderWrapper

Reader wrapper that reads GFF intervals/lines and automatically converts them to BED format.

parse_row(line)[source]
class galaxy.datatypes.util.gff_util.GFFReaderWrapper(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, fix_strand=False, convert_to_bed_coord=False, **kwargs)[source]

Bases: bx.intervals.io.NiceReaderWrapper

Reader wrapper for GFF files.

Wrapper has two major functions:

  1. group entries for GFF files (via the group column), GFF3 files (via the id attribute), or GTF files (via gene_id/transcript_id);
  2. convert coordinates from GFF format (start and end coordinates are 1-based, closed) to the ‘traditional’/BED interval format (0-based, half-open). This is useful when using GFF files as inputs to tools that expect the traditional interval format.
__init__(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, fix_strand=False, convert_to_bed_coord=False, **kwargs)[source]
parse_row(line)[source]
galaxy.datatypes.util.gff_util.convert_bed_coords_to_gff(interval)[source]

Converts an interval object’s coordinates from BED format to GFF format. Accepted object types include GenomicInterval and list (where the first element in the list is the interval’s start, and the second element is the interval’s end).

galaxy.datatypes.util.gff_util.convert_gff_coords_to_bed(interval)[source]

Converts an interval object’s coordinates from GFF format to BED format. Accepted object types include GFFFeature, GenomicInterval, and list (where the first element in the list is the interval’s start, and the second element is the interval’s end).
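
The two coordinate systems differ only in the start position, which the following minimal sketch shows on plain [start, end] lists. The function names gff_to_bed and bed_to_gff are hypothetical. The sketch returns new lists, whereas this page does not say whether the Galaxy functions modify the interval in place.

```python
def gff_to_bed(interval):
    """1-based, closed [start, end] -> 0-based, half-open [start, end)."""
    start, end = interval[0], interval[1]
    # Only the start shifts; the closed 1-based end equals the open 0-based end.
    return [start - 1, end]


def bed_to_gff(interval):
    """0-based, half-open [start, end) -> 1-based, closed [start, end]."""
    start, end = interval[0], interval[1]
    return [start + 1, end]


gff = [101, 200]          # bases 101..200 inclusive, 100 bases long
bed = gff_to_bed(gff)
print(bed)                # [100, 200]
print(bed[1] - bed[0])    # 100, the length is simply end - start in BED
print(bed_to_gff(bed))    # [101, 200]
```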

galaxy.datatypes.util.gff_util.parse_gff_attributes(attr_str)[source]

Parses a GFF/GTF attribute string and returns a dictionary of name-value pairs. The general format for a GFF3 attributes string is

name1=value1;name2=value2

The general format for a GTF attribute string is

name1 "value1" ; name2 "value2"

The general format for a GFF attribute string is a single string that denotes the interval’s group; in this case, the method returns a dictionary with a single key-value pair whose key is ‘group’.
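
The three cases described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Galaxy's implementation: the name parse_attributes_sketch is hypothetical, and the sketch ignores edge cases such as semicolons or equals signs inside quoted GTF values.

```python
def parse_attributes_sketch(attr_str):
    """Hypothetical parser for GFF3, GTF, and plain GFF attribute strings."""
    attributes = {}
    for pair in attr_str.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue
        if "=" in pair:
            # GFF3 style: name=value
            name, value = pair.split("=", 1)
        elif " " in pair:
            # GTF style: name "value"
            name, value = pair.split(None, 1)
        else:
            # Neither style; handled by the 'group' fallback below.
            continue
        attributes[name.strip()] = value.strip().strip('"')
    if not attributes:
        # Plain GFF: the whole string denotes the interval's group.
        attributes["group"] = attr_str.strip()
    return attributes


print(parse_attributes_sketch("ID=gene1;Name=abc"))
# {'ID': 'gene1', 'Name': 'abc'}
print(parse_attributes_sketch('gene_id "g1" ; transcript_id "t1"'))
# {'gene_id': 'g1', 'transcript_id': 't1'}
print(parse_attributes_sketch("group1"))
# {'group': 'group1'}
```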

galaxy.datatypes.util.gff_util.parse_gff3_attributes(attr_str)[source]

Parses a GFF3 attribute string and returns a dictionary of name-value pairs. The general format for a GFF3 attributes string is

name1=value1;name2=value2
galaxy.datatypes.util.gff_util.gff_attributes_to_str(attrs, gff_format)[source]

Convert GFF attributes to string. Supported formats are GFF3, GTF.
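
A minimal sketch of the reverse direction, assuming attrs is a dictionary and gff_format is the string "GFF3" or "GTF". The name attributes_to_str_sketch and the exact separators are assumptions based on the general attribute formats quoted on this page; the real function may format its output differently.

```python
def attributes_to_str_sketch(attrs, gff_format):
    """Hypothetical serializer; not Galaxy's implementation."""
    if gff_format == "GFF3":
        # name1=value1;name2=value2
        return ";".join(f"{name}={value}" for name, value in attrs.items())
    if gff_format == "GTF":
        # name1 "value1" ; name2 "value2"
        return " ; ".join(f'{name} "{value}"' for name, value in attrs.items())
    raise ValueError(f"unsupported format: {gff_format}")


print(attributes_to_str_sketch({"ID": "gene1", "Name": "abc"}, "GFF3"))
# ID=gene1;Name=abc
print(attributes_to_str_sketch({"gene_id": "g1", "transcript_id": "t1"}, "GTF"))
# gene_id "g1" ; transcript_id "t1"
```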

galaxy.datatypes.util.gff_util.read_unordered_gtf(iterator, strict=False)[source]

Returns GTF features found in an iterator. GTF lines need not be ordered or clustered for the reader to work. The reader returns GFFFeature objects sorted by transcript_id, chrom, and start position.
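
The idea of collecting unordered lines into features and then sorting them can be sketched as below. This is an illustration only: group_unordered_gtf is a hypothetical name, the sketch returns plain tuples rather than GFFFeature objects, and skipping lines that lack a transcript_id is an assumption.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def group_unordered_gtf(lines):
    """Hypothetical sketch: yields (transcript_id, chrom, start, intervals)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # assumption: lines without a transcript_id are skipped
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        groups[(match.group(1), chrom)].append((start, end))
    features = []
    for (transcript_id, chrom), intervals in groups.items():
        intervals.sort()
        features.append((transcript_id, chrom, intervals[0][0], intervals))
    # Sort by transcript_id, then chrom, then start position.
    features.sort(key=lambda feature: feature[:3])
    return features


lines = [
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t2";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";\n',
]
for feature in group_unordered_gtf(lines):
    print(feature)
# ('t1', 'chr1', 100, [(100, 200)])
# ('t2', 'chr1', 300, [(300, 400), (500, 600)])
```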

galaxy.datatypes.util.maf_utilities module

Provides wrappers and utilities for working with MAF files and alignments.

galaxy.datatypes.util.maf_utilities.maketrans()

Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
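
A short example of the two-argument and three-argument forms, using Python's built-in str.maketrans. The helper reverse_complement is a hypothetical name for illustration; the table built here has the same contents as the DNA_COMPLEMENT value listed for RegionAlignment on this page.

```python
# Two-argument form: each character of the first string maps to the
# character at the same position in the second string.
complement = str.maketrans("ACGTacgt", "TGCAtgca")
print(complement[ord("A")] == ord("T"))  # True


def reverse_complement(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(complement)[::-1]


print(reverse_complement("AACGt"))  # aCGTT

# Three-argument form: characters in the third string are mapped to None,
# so translate() deletes them (here, alignment gap characters).
strip_gaps = str.maketrans("", "", "-")
print("AC--GT".translate(strip_gaps))  # ACGT
```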

galaxy.datatypes.util.maf_utilities.src_split(src)[source]
galaxy.datatypes.util.maf_utilities.src_merge(spec, chrom, contig=None)[source]
galaxy.datatypes.util.maf_utilities.get_species_in_block(block)[source]
galaxy.datatypes.util.maf_utilities.tool_fail(msg='Unknown Error')[source]
class galaxy.datatypes.util.maf_utilities.TempFileHandler(max_open_files=None, **kwds)[source]

Bases: object

Handles creating, opening, closing, and deleting of Temp files, with a maximum number of files open at one time.
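
The general technique of capping the number of simultaneously open temporary files can be sketched as follows. The class BoundedTempFiles, its method names, and the policy of closing the least recently opened file are all assumptions made for illustration; they are not taken from TempFileHandler itself.

```python
import os
import tempfile


class BoundedTempFiles:
    """Hypothetical sketch: keeps at most `max_open` temp files open."""

    def __init__(self, max_open=2):
        self.max_open = max_open
        self.paths = []       # index -> path on disk
        self.open_files = {}  # index -> open file object, in opening order

    def get(self, index=None):
        """Return (index, file); creates a new temp file when index is None."""
        if index is None:
            fd, path = tempfile.mkstemp()
            os.close(fd)
            self.paths.append(path)
            index = len(self.paths) - 1
        if index not in self.open_files:
            # Close the least recently opened files until there is room.
            while len(self.open_files) >= self.max_open:
                oldest = next(iter(self.open_files))
                self.open_files.pop(oldest).close()
            # "a+" keeps existing content when a file is reopened.
            self.open_files[index] = open(self.paths[index], "a+")
        return index, self.open_files[index]

    def close(self, index, delete=False):
        handle = self.open_files.pop(index, None)
        if handle is not None:
            handle.close()
        if delete:
            os.remove(self.paths[index])
```

A caller would request a file with get(), write to it, and later pass the same index back to get() to reopen it transparently if it was closed in the meantime.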

DEFAULT_MAX_OPEN_FILES = 524288.0
__init__(max_open_files=None, **kwds)[source]
get_open_tempfile(index=None, **kwds)[source]
close(index, delete=False)[source]
flush(index)[source]
class galaxy.datatypes.util.maf_utilities.RegionAlignment(size, species=[], temp_file_handler=None)[source]

Bases: object

DNA_COMPLEMENT = {65: 84, 67: 71, 71: 67, 84: 65, 97: 116, 99: 103, 103: 99, 116: 97}
MAX_SEQUENCE_SIZE = 9223372036854775807
__init__(size, species=[], temp_file_handler=None)[source]
add_species(species)[source]
get_species_names(skip=[])[source]
get_sequence(species)[source]
get_sequence_reverse_complement(species)[source]
set_position(index, species, base)[source]
set_range(index, species, bases)[source]
flush(species=None)[source]
class galaxy.datatypes.util.maf_utilities.GenomicRegionAlignment(start, end, species=[], temp_file_handler=None)[source]

Bases: galaxy.datatypes.util.maf_utilities.RegionAlignment

__init__(start, end, species=[], temp_file_handler=None)[source]
class galaxy.datatypes.util.maf_utilities.SplicedAlignment(exon_starts, exon_ends, species=[], temp_file_handler=None)[source]

Bases: object

DNA_COMPLEMENT = {65: 84, 67: 71, 71: 67, 84: 65, 97: 116, 99: 103, 103: 99, 116: 97}
__init__(exon_starts, exon_ends, species=[], temp_file_handler=None)[source]
get_species_names(skip=[])[source]
get_sequence(species)[source]
get_sequence_reverse_complement(species)[source]
start
end
galaxy.datatypes.util.maf_utilities.maf_index_by_uid(maf_uid, index_location_file)[source]
galaxy.datatypes.util.maf_utilities.open_or_build_maf_index(maf_file, index_filename, species=None)[source]
galaxy.datatypes.util.maf_utilities.build_maf_index_species_chromosomes(filename, index_species=None)[source]
galaxy.datatypes.util.maf_utilities.build_maf_index(maf_file, species=None)[source]
galaxy.datatypes.util.maf_utilities.component_overlaps_region(c, region)[source]
galaxy.datatypes.util.maf_utilities.chop_block_by_region(block, src, region, species=None, mincols=0)[source]
galaxy.datatypes.util.maf_utilities.orient_block_by_region(block, src, region, force_strand=None)[source]
galaxy.datatypes.util.maf_utilities.get_oriented_chopped_blocks_for_region(index, src, region, species=None, mincols=0, force_strand=None)[source]
galaxy.datatypes.util.maf_utilities.get_oriented_chopped_blocks_with_index_offset_for_region(index, src, region, species=None, mincols=0, force_strand=None)[source]
galaxy.datatypes.util.maf_utilities.iter_blocks_split_by_src(block, src)[source]
galaxy.datatypes.util.maf_utilities.iter_blocks_split_by_species(block, species=None)[source]
galaxy.datatypes.util.maf_utilities.get_chopped_blocks_for_region(index, src, region, species=None, mincols=0)[source]
galaxy.datatypes.util.maf_utilities.get_chopped_blocks_with_index_offset_for_region(index, src, region, species=None, mincols=0)[source]
galaxy.datatypes.util.maf_utilities.get_region_alignment(index, primary_species, chrom, start, end, strand='+', species=None, mincols=0, overwrite_with_gaps=True, temp_file_handler=None)[source]
galaxy.datatypes.util.maf_utilities.reduce_block_by_primary_genome(block, species, chromosome, region_start)[source]
galaxy.datatypes.util.maf_utilities.fill_region_alignment(alignment, index, primary_species, chrom, start, end, strand='+', species=None, mincols=0, overwrite_with_gaps=True)[source]
galaxy.datatypes.util.maf_utilities.get_spliced_region_alignment(index, primary_species, chrom, starts, ends, strand='+', species=None, mincols=0, overwrite_with_gaps=True, temp_file_handler=None)[source]
galaxy.datatypes.util.maf_utilities.line_enumerator(lines, comment_start='#')[source]
galaxy.datatypes.util.maf_utilities.get_starts_ends_fields_from_gene_bed(line)[source]
galaxy.datatypes.util.maf_utilities.iter_components_by_src(block, src)[source]
galaxy.datatypes.util.maf_utilities.get_components_by_src(block, src)[source]
galaxy.datatypes.util.maf_utilities.iter_components_by_src_start(block, src)[source]
galaxy.datatypes.util.maf_utilities.get_components_by_src_start(block, src)[source]
galaxy.datatypes.util.maf_utilities.sort_block_components_by_block(block1, block2)[source]
galaxy.datatypes.util.maf_utilities.get_species_in_maf(maf_filename)[source]
galaxy.datatypes.util.maf_utilities.parse_species_option(species)[source]
galaxy.datatypes.util.maf_utilities.remove_temp_index_file(index_filename)[source]
galaxy.datatypes.util.maf_utilities.get_fasta_header(component, attributes={}, suffix=None)[source]
galaxy.datatypes.util.maf_utilities.get_attributes_from_fasta_header(header)[source]
galaxy.datatypes.util.maf_utilities.iter_fasta_alignment(filename)[source]