Warning

This document is for an old release of Galaxy.

galaxy.datatypes.util package

Utilities for Galaxy datatypes.

Submodules

galaxy.datatypes.util.generic_util module

galaxy.datatypes.util.generic_util.count_special_lines(word, filename, invert=False)

Searches for special ‘words’ using the grep tool. grep is used to speed up the searching and counting. The number of hits is returned.
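As a rough illustration of the grep-based approach, the sketch below counts matching lines by calling grep -c. It is not Galaxy's implementation: the function name count_lines_with_grep, the use of -F (literal matching), the mapping of invert to grep's -v flag, and the pure-Python fallback for systems without grep are all assumptions made for this example.

```python
import subprocess


def count_lines_with_grep(word, filename, invert=False):
    """Hypothetical helper: count the lines of `filename` that contain `word`."""
    cmd = ["grep", "-c", "-F"]  # -c prints a count; -F matches literally (assumption)
    if invert:
        cmd.append("-v")  # assumption: invert counts the non-matching lines instead
    cmd.extend(["--", word, filename])
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        # grep exits with status 1 when the count is 0, but still prints the count.
        if result.returncode in (0, 1) and result.stdout.strip():
            return int(result.stdout.strip())
    except OSError:
        pass  # grep is not installed; fall through to the pure-Python count
    with open(filename) as handle:
        return sum((word in line) != invert for line in handle)
```

Delegating the scan to grep avoids reading large files line by line in Python, which is the speed-up the description refers to.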

galaxy.datatypes.util.gff_util module

Provides utilities for working with GFF files.

class galaxy.datatypes.util.gff_util.GFFInterval(reader, fields, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False)

Bases: bx.intervals.io.GenomicInterval

A GFF interval, including attributes. If the file is strictly a GFF file, the only attribute is ‘group’.

__init__(reader, fields, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False)

copy()

class galaxy.datatypes.util.gff_util.GFFFeature(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False, intervals=[], raw_size=0)

Bases: galaxy.datatypes.util.gff_util.GFFInterval

A GFF feature, which can include multiple intervals.

__init__(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, default_strand='.', fix_strand=False, intervals=[], raw_size=0)

name()

Returns the feature’s name.

copy()

lines()

class galaxy.datatypes.util.gff_util.GFFIntervalToBEDReaderWrapper(reader, **kwargs)

Bases: bx.intervals.io.NiceReaderWrapper

Reader wrapper that reads GFF intervals/lines and automatically converts them to BED format.

parse_row(line)

class galaxy.datatypes.util.gff_util.GFFReaderWrapper(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, fix_strand=False, convert_to_bed_coord=False, **kwargs)

Bases: bx.intervals.io.NiceReaderWrapper

Reader wrapper for GFF files.

The wrapper has two major functions:

  1. group entries for a GFF file (via the group column), GFF3 (via the id attribute), or GTF (via gene_id/transcript_id);
  2. convert coordinates from GFF format (start and end coordinates are 1-based, closed) to the ‘traditional’/BED interval format (0-based, half-open). This is useful when using GFF files as inputs to tools that expect the traditional interval format.
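The two functions listed above can be sketched in a few lines. The example is only an illustration under simplifying assumptions, not the wrapper's implementation: it handles plain GFF only (the ninth column is treated as the group), it groups consecutive lines only, and the helper name group_gff_lines is hypothetical.

```python
from itertools import groupby


def group_gff_lines(lines, convert_to_bed_coord=False):
    """Hypothetical helper: yield (group, intervals) for consecutive GFF lines.

    Each interval is a (chrom, start, end) tuple.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#"))
    # Column 9 (index 8) holds the group in plain GFF.
    for group, members in groupby(rows, key=lambda fields: fields[8]):
        intervals = []
        for fields in members:
            start, end = int(fields[3]), int(fields[4])
            if convert_to_bed_coord:
                start -= 1  # 1-based, closed -> 0-based, half-open
            intervals.append((fields[0], start, end))
        yield group, intervals


gff = [
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgeneA\n",
    "chr1\tsrc\texon\t300\t400\t.\t+\t.\tgeneA\n",
    "chr1\tsrc\texon\t900\t950\t.\t-\t.\tgeneB\n",
]
features = list(group_gff_lines(gff, convert_to_bed_coord=True))
```

Here the two geneA exons end up in one feature, and every start is shifted down by one while the end is left unchanged.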
__init__(reader, chrom_col=0, feature_col=2, start_col=3, end_col=4, strand_col=6, score_col=5, fix_strand=False, convert_to_bed_coord=False, **kwargs)

parse_row(line)

galaxy.datatypes.util.gff_util.convert_bed_coords_to_gff(interval)

Converts an interval object’s coordinates from BED format to GFF format. Accepted object types include GenomicInterval and list (where the first element in the list is the interval’s start, and the second element is the interval’s end).

galaxy.datatypes.util.gff_util.convert_gff_coords_to_bed(interval)

Converts an interval object’s coordinates from GFF format to BED format. Accepted object types include GFFFeature, GenomicInterval, and list (where the first element in the list is the interval’s start, and the second element is the interval’s end).
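For list inputs, converting between the two coordinate systems amounts to shifting the start by one. The sketch below illustrates the arithmetic with hypothetical stand-in functions (gff_to_bed and bed_to_gff). It is not the module's code, it returns new lists rather than saying anything about how the real functions treat their argument, and it ignores the GenomicInterval and GFFFeature cases.

```python
def gff_to_bed(interval):
    """Hypothetical: [start, end] 1-based, closed -> 0-based, half-open."""
    start, end = interval[0], interval[1]
    return [start - 1, end]


def bed_to_gff(interval):
    """Hypothetical: [start, end] 0-based, half-open -> 1-based, closed."""
    start, end = interval[0], interval[1]
    return [start + 1, end]


# The first ten bases of a sequence in each convention.
gff_interval = [1, 10]
bed_interval = gff_to_bed(gff_interval)
round_trip = bed_to_gff(bed_interval)
```

The end coordinate is the same in both conventions because the closed GFF end and the half-open BED end name the same boundary. Only the start moves.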

galaxy.datatypes.util.gff_util.parse_gff_attributes(attr_str)

Parses a GFF/GTF attribute string and returns a dictionary of name-value pairs. The general format for a GFF3 attributes string is

name1=value1;name2=value2

The general format for a GTF attribute string is

name1 "value1" ; name2 "value2"

The general format for a GFF attribute string is a single string that denotes the interval’s group. In this case, the function returns a dictionary with a single key-value pair whose key is ‘group’.
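The three formats can be told apart by their separators. The following is a simplified sketch, not the module's parser: parse_attributes_sketch is a hypothetical name, and the handling of whitespace and quotes is an assumption made for illustration (for example, semicolons inside quoted GTF values are not supported).

```python
import re


def parse_attributes_sketch(attr_str):
    """Hypothetical parser for GFF3, GTF, and plain GFF attribute strings."""
    attributes = {}
    for pair in attr_str.split(";"):
        pair = pair.strip()
        # Split at the first "=" (GFF3) or the first space (GTF),
        # whichever comes first in the pair.
        parts = re.split(r"[= ]", pair, maxsplit=1)
        if len(parts) != 2 or not parts[0]:
            continue
        name, value = parts
        attributes[name] = value.strip().strip('"')
    if not attributes:
        # Nothing could be split, so treat the whole string as the group.
        attributes["group"] = attr_str.strip()
    return attributes


gff3_attrs = parse_attributes_sketch("ID=gene1;Name=my gene")
gtf_attrs = parse_attributes_sketch('gene_id "g1" ; transcript_id "t1"')
gff_attrs = parse_attributes_sketch("geneA")
```

Splitting at whichever separator comes first lets GFF3 values contain spaces while still recognising GTF pairs.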

galaxy.datatypes.util.gff_util.parse_gff3_attributes(attr_str)

Parses a GFF3 attribute string and returns a dictionary of name-value pairs. The general format for a GFF3 attributes string is

name1=value1;name2=value2

galaxy.datatypes.util.gff_util.gff_attributes_to_str(attrs, gff_format)

Converts GFF attributes to a string. Supported formats are GFF3 and GTF.
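Serialising is the inverse of parsing. The sketch below writes a dictionary in the GFF3 and GTF attribute layouts described for parse_gff_attributes. The function name attributes_to_str_sketch, the format labels "GFF3" and "GTF" as plain strings, and the exact separators are assumptions for illustration and may differ from what the module emits.

```python
def attributes_to_str_sketch(attrs, gff_format):
    """Hypothetical serialiser for a dict of attribute name-value pairs."""
    if gff_format == "GFF3":
        # name1=value1;name2=value2
        return ";".join(f"{name}={value}" for name, value in attrs.items())
    if gff_format == "GTF":
        # name1 "value1" ; name2 "value2"
        return " ; ".join(f'{name} "{value}"' for name, value in attrs.items())
    raise ValueError(f"unsupported format: {gff_format}")


gff3_str = attributes_to_str_sketch({"ID": "g1", "Name": "x"}, "GFF3")
gtf_str = attributes_to_str_sketch({"gene_id": "g1", "transcript_id": "t1"}, "GTF")
```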

galaxy.datatypes.util.gff_util.read_unordered_gtf(iterator, strict=False)

Returns GTF features found in an iterator. GTF lines need not be ordered or clustered for the reader to work. The reader returns GFFFeature objects sorted by transcript_id, chrom, and start position.
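One way to handle unordered input is to collect every line's interval under its transcript_id first and sort only at the end. The sketch below takes that approach with plain tuples instead of GFFFeature objects. The name group_gtf_by_transcript, the regular expression used to find transcript_id, and the reading of the sort order as the tuple (transcript_id, chrom, start) are assumptions, not the module's behaviour.

```python
import re
from collections import defaultdict


def group_gtf_by_transcript(lines):
    """Hypothetical: return (transcript_id, chrom, start, intervals) tuples."""
    by_transcript = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        match = re.search(r'transcript_id "([^"]*)"', fields[8])
        if match is None:
            continue  # lines without a transcript_id are skipped in this sketch
        interval = (fields[0], int(fields[3]), int(fields[4]))
        by_transcript[match.group(1)].append(interval)

    features = []
    for transcript_id, intervals in by_transcript.items():
        intervals.sort(key=lambda interval: interval[1])  # order by start
        chrom, start = intervals[0][0], intervals[0][1]
        features.append((transcript_id, chrom, start, intervals))
    features.sort(key=lambda feature: feature[:3])
    return features


gtf = [
    'chr2\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
features = group_gtf_by_transcript(gtf)
```

Because all lines are held in memory before anything is returned, this approach trades memory for the freedom to accept lines in any order.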