galaxy.datatypes package

Subpackages

Submodules

galaxy.datatypes.assembly module

Velvet datatypes. James E. Johnson, University of Minnesota, for the Velvet assembler tool in Galaxy.

class galaxy.datatypes.assembly.Amos(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing the AMOS assembly file

edam_data = 'data_0925'
edam_format = 'format_3582'
file_ext = 'afg'
sniff(filename)[source]

Determines whether the file is in the AMOS assembly file format. Example:

{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
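The sniff() docstring above shows the nested block structure; a minimal check along those lines might look like the following. This is a hypothetical sketch based only on the example shown, not the actual Galaxy implementation.

```python
# Sketch of an AMOS sniffer: AMOS message files are built from nested
# blocks such as {CTG ... {TLE ... } ... }, each opened by "{" plus a
# three-letter message code, so we test the first non-blank line for
# that shape.

def sniff_amos(text):
    """Return True if text looks like an AMOS assembly message file."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # e.g. "{CTG" opens a contig message, "{TLE" a tile
        return len(line) >= 4 and line[0] == "{" and line[1:4].isalpha()
    return False
```

A real sniffer would also validate the key:value fields inside the block before accepting the file.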
class galaxy.datatypes.assembly.Sequences(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fasta

Class describing the Sequences file generated by velveth

edam_data = 'data_0925'
sniff(filename)[source]

Determines whether the file is a velveth-produced FASTA file. The id line has 3 fields separated by tabs: sequence_name, sequence_index, category:

>SEQUENCE_0_length_35   1       1
GGATATAGGGCCAACCCAACTCAACGGCCTGTCTT
>SEQUENCE_1_length_35   2       1
CGACGAATGACAGGTCACGAATTTGGCGGGGATTA
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
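The three-tab-field rule above translates directly into a small check. The following is a sketch of that rule (an assumption from the docstring, not the real Galaxy code):

```python
# velveth writes FASTA whose id lines carry three tab-separated fields:
# sequence_name, sequence_index, category.

def sniff_velveth_fasta(text):
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(">"):
            return False
        fields = line.split("\t")
        # a name plus two integer columns
        return (len(fields) == 3
                and fields[1].strip().isdigit()
                and fields[2].strip().isdigit())
    return False

velveth = ">SEQUENCE_0_length_35\t1\t1\nGGATATAGGGCCAACCCAACTCAACGGCCTGTCTT\n"
plain = ">seq1\nACGT\n"
```

Ordinary FASTA headers lack the two integer columns, which is what separates this subtype from its Fasta base class.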
class galaxy.datatypes.assembly.Roadmaps(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing the Roadmaps file generated by velveth

edam_format = 'format_2561'
sniff(filename)[source]
Determines whether the file is a velveth-produced Roadmaps file:

142858 21 1
ROADMAP 1
ROADMAP 2
...
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
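Based on the example above, the Roadmaps layout is a header line of integers followed by numbered ROADMAP records. A sniff sketch under that assumption (not the real implementation):

```python
# Roadmaps: an integer header line, then "ROADMAP n" records.

def sniff_roadmaps(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    header = lines[0].split()
    return (bool(header)
            and all(tok.isdigit() for tok in header)
            and lines[1].startswith("ROADMAP"))
```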
class galaxy.datatypes.assembly.Velvet(**kwd)[source]

Bases: galaxy.datatypes.text.Html

composite_type = 'auto_primary_file'
allow_datatype_change = False
file_ext = 'velvet'
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]
regenerate_primary_file(dataset)[source]

Cannot do this until we are setting metadata.

set_meta(dataset, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for velveth dataset, defaults to 'velvet', paired_end_reads (MetadataParameter): has paired-end reads, defaults to 'False', long_reads (MetadataParameter): has long reads, defaults to 'False', short2_reads (MetadataParameter): has 2nd short reads, defaults to 'False'
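generate_primary_file() for a composite datatype like Velvet typically emits an HTML index linking the component files. A standalone sketch of the idea follows; the file names are purely illustrative, and the real method walks the dataset's registered composite files rather than taking a list:

```python
def generate_primary_file(component_files):
    """Render an HTML index page linking each composite component."""
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (name, name)
        for name in component_files
    )
    return "<html><body><ul>\n%s\n</ul></body></html>" % items

# velveth output names, used here only as an illustration
html = generate_primary_file(["Sequences", "Roadmaps", "Log"])
```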

galaxy.datatypes.binary module

Binary classes

class galaxy.datatypes.binary.Binary(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Binary data

edam_format = 'format_2333'
sniffable_binary_formats = [{'ext': 'idat', 'type': 'idat', 'class': <class 'galaxy.datatypes.binary.Idat'>}, {'ext': 'cel', 'type': 'cel', 'class': <class 'galaxy.datatypes.binary.Cel'>}, {'ext': 'bam', 'type': 'bam', 'class': <class 'galaxy.datatypes.binary.Bam'>}, {'ext': 'cram', 'type': 'cram', 'class': <class 'galaxy.datatypes.binary.CRAM'>}, {'ext': 'bcf', 'type': 'bcf', 'class': <class 'galaxy.datatypes.binary.Bcf'>}, {'ext': 'h5', 'type': 'h5', 'class': <class 'galaxy.datatypes.binary.H5'>}, {'ext': 'sff', 'type': 'sff', 'class': <class 'galaxy.datatypes.binary.Sff'>}, {'ext': 'bigwig', 'type': 'bigwig', 'class': <class 'galaxy.datatypes.binary.BigWig'>}, {'ext': 'bigbed', 'type': 'bigbed', 'class': <class 'galaxy.datatypes.binary.BigBed'>}, {'ext': 'twobit', 'type': 'twobit', 'class': <class 'galaxy.datatypes.binary.TwoBit'>}, {'ext': 'gemini.sqlite', 'type': 'gemini.sqlite', 'class': <class 'galaxy.datatypes.binary.GeminiSQLite'>}, {'ext': 'idpdb', 'type': 'idpdb', 'class': <class 'galaxy.datatypes.binary.IdpDB'>}, {'ext': 'mz.sqlite', 'type': 'mz.sqlite', 'class': <class 'galaxy.datatypes.binary.MzSQlite'>}, {'ext': 'sqlite', 'type': 'sqlite', 'class': <class 'galaxy.datatypes.binary.SQlite'>}, {'ext': 'xlsx', 'type': 'xlsx', 'class': <class 'galaxy.datatypes.binary.Xlsx'>}, {'ext': 'sra', 'type': 'sra', 'class': <class 'galaxy.datatypes.binary.Sra'>}, {'ext': 'rdata', 'type': 'RData', 'class': <class 'galaxy.datatypes.binary.RData'>}, {'ext': 'oxlicg', 'type': 'oxli.countgraph', 'class': <class 'galaxy.datatypes.binary.OxliCountGraph'>}, {'ext': 'oxling', 'type': 'oxli.nodegraph', 'class': <class 'galaxy.datatypes.binary.OxliNodeGraph'>}, {'ext': 'oxlits', 'type': 'oxli.tagset', 'class': <class 'galaxy.datatypes.binary.OxliTagSet'>}, {'ext': 'oxlist', 'type': 'oxli.stoptags', 'class': <class 'galaxy.datatypes.binary.OxliStopTags'>}, {'ext': 'oxliss', 'type': 'oxli.subset', 'class': <class 'galaxy.datatypes.binary.OxliSubset'>}, {'ext': 'oxligl', 'type': 
'oxli.graphlabels', 'class': <class 'galaxy.datatypes.binary.OxliGraphLabels'>}, {'ext': 'searchgui_archive', 'type': 'searchgui_archive', 'class': <class 'galaxy.datatypes.binary.SearchGuiArchive'>}, {'ext': 'netcdf', 'type': 'netcdf', 'class': <class 'galaxy.datatypes.binary.NetCDF'>}, {'ext': 'dmnd', 'type': 'dmnd', 'class': <class 'galaxy.datatypes.binary.DMND'>}, {'ext': 'pdf', 'type': 'pdf', 'class': <class 'galaxy.datatypes.images.Pdf'>}]
unsniffable_binary_formats = ['ab1', 'compressed_archive', 'zip', 'asn1-binary', 'scf']
static register_sniffable_binary_format(data_type, ext, type_class)[source]
static register_unsniffable_binary_ext(ext)[source]
static is_sniffable_binary(filename)[source]
static is_ext_unsniffable(ext)[source]
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

get_mime()[source]

Returns the mime type of the datatype

display_data(trans, dataset, preview=False, filename=None, to_ext=None, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
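The four register_*/is_* static methods above maintain the two class-level registries shown (sniffable_binary_formats and unsniffable_binary_formats). A simplified standalone model of that mechanism, to show how the pieces fit together (a sketch, not Galaxy's code):

```python
sniffable_binary_formats = []    # consulted in order when sniffing uploads
unsniffable_binary_formats = []  # extensions assigned without sniffing

def register_sniffable_binary_format(data_type, ext, type_class):
    sniffable_binary_formats.append(
        {"type": data_type, "ext": ext.lower(), "class": type_class})

def register_unsniffable_binary_ext(ext):
    unsniffable_binary_formats.append(ext.lower())

def is_ext_unsniffable(ext):
    return ext in unsniffable_binary_formats

class FakeBam:
    """Stand-in for galaxy.datatypes.binary.Bam in this sketch."""

register_sniffable_binary_format("bam", "bam", FakeBam)
register_unsniffable_binary_ext("ab1")
```

In Galaxy itself, each Binary subclass module calls these registration helpers at import time, which is how the long sniffable_binary_formats list above gets populated.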
class galaxy.datatypes.binary.Ab1(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an ab1 binary sequence file

file_ext = 'ab1'
edam_format = 'format_3000'
edam_data = 'data_0924'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.Idat(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Binary data in idat format

file_ext = 'idat'
edam_format = 'format_2058'
edam_data = 'data_2603'
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.Cel(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Binary data in CEL format.

file_ext = 'cel'
edam_format = 'format_1638'
edam_data = 'data_3110'
sniff(filename)[source]

Try to guess if the file is a CEL file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.CEL')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.mz5')
>>> Cel().sniff(fname)
False
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.CompressedArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.

file_ext = 'compressed_archive'
compressed = True
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.CompressedZipArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.

file_ext = 'zip'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.GenericAsn1Binary(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for generic ASN.1 binary format

file_ext = 'asn1-binary'
edam_format = 'format_1966'
edam_data = 'data_0849'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.Bam(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a BAM binary file

edam_format = 'format_2572'
edam_data = 'data_0863'
file_ext = 'bam'
track_type = 'ReadTrack'
data_sources = {'index': 'bigwig', 'data': 'bai'}
static merge(split_files, output_file)[source]
dataset_content_needs_grooming(file_name)[source]

See if file_name is a sorted BAM file

groom_dataset_content(file_name)[source]

Ensures that the Bam file contents are sorted. This function is called on an output dataset after the content is initially generated.

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Creates the index for the BAM file.

sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
line_dataprovider(*args, **kwargs)[source]
regex_line_dataprovider(*args, **kwargs)[source]
column_dataprovider(*args, **kwargs)[source]
dict_dataprovider(*args, **kwargs)[source]
header_dataprovider(*args, **kwargs)[source]
id_seq_qual_dataprovider(*args, **kwargs)[source]
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
samtools_dataprovider(*args, **kwargs)[source]

Generic samtools interface - all options available through settings.

dataproviders = {'chunk64': <function chunk64_dataprovider>, 'id-seq-qual': <function id_seq_qual_dataprovider>, 'header': <function header_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'samtools': <function samtools_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'line': <function line_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', bam_index (FileParameter): BAM Index File, defaults to 'None', bam_version (MetadataParameter): BAM Version, defaults to 'None', sort_order (MetadataParameter): Sort Order, defaults to 'None', read_groups (MetadataParameter): Read Groups, defaults to '[]', reference_names (MetadataParameter): Chromosome Names, defaults to '[]', reference_lengths (MetadataParameter): Chromosome Lengths, defaults to '[]', bam_header (MetadataParameter): Dictionary of BAM Headers, defaults to '{}'
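dataset_content_needs_grooming() asks whether the BAM is coordinate-sorted, which the SAM/BAM specification records in the @HD header line's SO: tag. A sketch of that decision over header text (the real code inspects the BAM file itself via samtools/pysam):

```python
def needs_grooming(header_text):
    """True unless the @HD line declares SO:coordinate."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t"):
                if field.startswith("SO:"):
                    return field[3:] != "coordinate"
    # no sort-order declaration: sort it to be safe
    return True
```

When this returns True, groom_dataset_content() sorts the file so downstream indexing in set_meta() succeeds.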
class galaxy.datatypes.binary.CRAM(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

file_ext = 'cram'
edam_format = 'format_3462'
edam_data = 'data_0863'
set_meta(dataset, overwrite=True, **kwd)[source]
get_cram_version(filename)[source]
set_index_file(dataset, index_file)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', cram_version (MetadataParameter): CRAM Version, defaults to 'None', cram_index (FileParameter): CRAM Index File, defaults to 'None'
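get_cram_version() can read the version straight from the CRAM file definition: a CRAM file begins with the 4-byte magic b"CRAM" followed by one-byte major and minor version numbers. A sketch over raw bytes (the real method reads from the dataset file):

```python
def cram_version(data):
    """Return (major, minor) from a CRAM file-definition block, or None."""
    if len(data) < 6 or data[:4] != b"CRAM":
        return None
    # bytes 4 and 5 are the major and minor format version
    return data[4], data[5]
```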
class galaxy.datatypes.binary.Bcf(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a BCF file

edam_format = 'format_3020'
edam_data = 'data_3498'
file_ext = 'bcf'
sniff(filename)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Creates the index for the BCF file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', bcf_index (FileParameter): BCF Index File, defaults to 'None'
class galaxy.datatypes.binary.H5(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an HDF5 file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.mz5' )
>>> H5().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> H5().sniff( fname )
False
file_ext = 'h5'
edam_format = 'format_3590'
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
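The doctest above exercises sniff(); HDF5 files are recognizable by a fixed 8-byte signature, so the core of the check can be as small as this (a sketch, not the actual implementation):

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # the 8-byte HDF5 file signature

def sniff_h5(data):
    """True if the leading bytes carry the HDF5 signature."""
    return data[:8] == HDF5_MAGIC
```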
class galaxy.datatypes.binary.Scf(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an scf binary sequence file

edam_format = 'format_1632'
edam_data = 'data_0924'
file_ext = 'scf'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.Sff(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Standard Flowgram Format (SFF)

edam_format = 'format_3284'
edam_data = 'data_0924'
file_ext = 'sff'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.BigWig(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Accessing binary BigWig files from UCSC. The supplemental info in the paper has the binary details: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq351v1

edam_format = 'format_3006'
edam_data = 'data_3002'
track_type = 'LineTrack'
data_sources = {'data_standalone': 'bigwig'}
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.BigBed(**kwd)[source]

Bases: galaxy.datatypes.binary.BigWig

BigBed support from UCSC.

edam_format = 'format_3004'
edam_data = 'data_3002'
data_sources = {'data_standalone': 'bigbed'}
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.TwoBit(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a TwoBit format nucleotide file

edam_format = 'format_3009'
edam_data = 'data_0848'
file_ext = 'twobit'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.SQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a Sqlite database

file_ext = 'sqlite'
edam_format = 'format_3621'
init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sqlite_dataprovider(*args, **kwargs)[source]
sqlite_datatableprovider(*args, **kwargs)[source]
sqlite_datadictprovider(*args, **kwargs)[source]
dataproviders = {'chunk64': <function chunk64_dataprovider>, 'chunk': <function chunk_dataprovider>, 'sqlite': <function sqlite_dataprovider>, 'base': <function base_dataprovider>, 'sqlite-dict': <function sqlite_datadictprovider>, 'sqlite-table': <function sqlite_datatableprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}'
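SQLite databases start with the fixed 16-byte header string "SQLite format 3\0", so sniffing reduces to a header comparison. The sketch below builds a real database with the standard library and sniffs it (illustrative only; Galaxy's sniff() additionally opens the database to validate it):

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # fixed 16-byte header of every SQLite 3 file

def sniff_sqlite(filename):
    """Compare the first 16 bytes of the file to the SQLite header."""
    with open(filename, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC

# build a real database to sniff
fd, path = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()
conn.close()
assert sniff_sqlite(path)
os.remove(path)
```

set_meta() then fills the tables/table_columns/table_row_count metadata by querying sqlite_master and each table.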
class galaxy.datatypes.binary.GeminiSQLite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Gemini Sqlite database

file_ext = 'gemini.sqlite'
edam_format = 'format_3622'
edam_data = 'data_3498'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}', gemini_version (MetadataParameter): Gemini Version, defaults to '0.10.0'
class galaxy.datatypes.binary.MzSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Proteomics Sqlite database

file_ext = 'mz.sqlite'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}'
class galaxy.datatypes.binary.IdpDB(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing an IDPicker 3 idpDB (sqlite) database

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.idpDB' )
>>> IdpDB().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> IdpDB().sniff( fname )
False
file_ext = 'idpdb'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}'
class galaxy.datatypes.binary.Xlsx(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for Excel 2007 (xlsx) files

file_ext = 'xlsx'
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.Sra(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Sequence Read Archive (SRA) datatype, originally from mdshw5/sra-tools-galaxy

file_ext = 'sra'
sniff(filename)[source]

The first 8 bytes of any NCBI sra file is ‘NCBI.sra’, and the file is binary. For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
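Since the docstring above pins down the magic bytes, the sniff is a single comparison (a sketch of the idea, not the actual implementation):

```python
def sniff_sra(data):
    """Per the SRA format notes: every NCBI SRA file begins with the
    8-byte magic b'NCBI.sra'."""
    return data[:8] == b"NCBI.sra"
```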
class galaxy.datatypes.binary.RData(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Generic R Data file datatype implementation

file_ext = 'rdata'
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliBinary(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliCountGraph(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliCountGraph starts with “OXLI” + one byte version number + 8-bit binary ‘1’. Test file generated via:

load-into-counting.py --n_tables 1 --max-tablesize 1 \
    oxli_countgraph.oxlicg khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliCountGraph().sniff( fname )
False
>>> fname = get_test_fname( "oxli_countgraph.oxlicg" )
>>> OxliCountGraph().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
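All the oxli classes in this family share one layout: b"OXLI", then a one-byte version number, then a one-byte type code (1 = countgraph, 2 = nodegraph, 3 = tagset, and so on). A single sniffer parameterized by the expected code covers them all; this is a sketch of the shared OxliBinary logic, not the real implementation, and the version byte used below is arbitrary:

```python
def sniff_oxli(data, expected_type):
    """True if data starts with the OXLI magic and the given type code."""
    return (len(data) >= 6
            and data[:4] == b"OXLI"
            and data[5] == expected_type)

# byte 4 is the format version; byte 5 selects the structure type
countgraph_header = b"OXLI" + bytes([4, 1])
nodegraph_header = b"OXLI" + bytes([4, 2])
```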
class galaxy.datatypes.binary.OxliNodeGraph(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliNodeGraph starts with “OXLI” + one byte version number + 8-bit binary ‘2’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliNodeGraph().sniff( fname )
False
>>> fname = get_test_fname( "oxli_nodegraph.oxling" )
>>> OxliNodeGraph().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliTagSet(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliTagSet starts with “OXLI” + one byte version number + 8-bit binary ‘3’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2;
mv oxli_nodegraph.oxling.tagset oxli_tagset.oxlits

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliTagSet().sniff( fname )
False
>>> fname = get_test_fname( "oxli_tagset.oxlits" )
>>> OxliTagSet().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliStopTags(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliStopTags starts with “OXLI” + one byte version number + 8-bit binary ‘4’. Test file adapted from khmer 2.0’s “khmer/tests/test-data/goodversion-k32.stoptags”

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliStopTags().sniff( fname )
False
>>> fname = get_test_fname( "oxli_stoptags.oxlist" )
>>> OxliStopTags().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliSubset(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliSubset starts with “OXLI” + one byte version number + 8-bit binary ‘5’. Test file generated via:

load-graph.py -k 20 example tests/test-data/random-20-a.fa;
partition-graph.py example;
mv example.subset.0.pmap oxli_subset.oxliss

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliSubset().sniff( fname )
False
>>> fname = get_test_fname( "oxli_subset.oxliss" )
>>> OxliSubset().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.OxliGraphLabels(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliGraphLabels starts with “OXLI” + one byte version number + 8-bit binary ‘6’. Test file generated via:

python -c "from khmer import GraphLabels; \
    gl = GraphLabels(20, 1e7, 4); \
    gl.consume_fasta_and_tag_with_labels('tests/test-data/test-labels.fa'); \
    gl.save_labels_and_tags('oxli_graphlabels.oxligl')"

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> OxliGraphLabels().sniff( fname )
False
>>> fname = get_test_fname( "oxli_graphlabels.oxligl" )
>>> OxliGraphLabels().sniff( fname )
True
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.SearchGuiArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a SearchGUI archive

file_ext = 'searchgui_archive'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', searchgui_version (MetadataParameter): SearchGui Version, defaults to '1.28.0', searchgui_major_version (MetadataParameter): SearchGui Major Version, defaults to '1'
class galaxy.datatypes.binary.NetCDF(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Binary data in netCDF format

file_ext = 'netcdf'
edam_format = 'format_3650'
edam_data = 'data_0943'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.binary.DMND(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a DMND file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'diamond_db.dmnd' )
>>> DMND().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> DMND().sniff( fname )
False

file_ext = 'dmnd'
edam_format = ''
__init__(**kwd)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'

galaxy.datatypes.checkers module

Module proxies galaxy.util.checkers for backward compatibility.

External datatypes may make use of these functions.

galaxy.datatypes.checkers.check_binary(name, file_path=True)[source]
galaxy.datatypes.checkers.check_bz2(file_path)[source]
galaxy.datatypes.checkers.check_gzip(file_path)[source]
galaxy.datatypes.checkers.check_html(file_path, chunk=None)[source]
galaxy.datatypes.checkers.check_image(file_path)[source]

Simple wrapper around image_type to yield a True/False verdict

galaxy.datatypes.checkers.check_zip(file_path)[source]
galaxy.datatypes.checkers.is_gzip(file_path)[source]
galaxy.datatypes.checkers.is_bz2(file_path)[source]
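The compression checkers reduce to magic-byte tests: gzip streams begin with 0x1f 0x8b and bzip2 streams with "BZh". A sketch of the idea, verified against real compressed data (the real checkers read from files and handle short/empty inputs):

```python
import bz2
import gzip

def is_gzip_bytes(data):
    return data[:2] == b"\x1f\x8b"

def is_bz2_bytes(data):
    return data[:3] == b"BZh"

# verify against genuinely compressed streams
assert is_gzip_bytes(gzip.compress(b"hello"))
assert is_bz2_bytes(bz2.compress(b"hello"))
```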

galaxy.datatypes.chrominfo module

class galaxy.datatypes.chrominfo.ChromInfo(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'len'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chrom (ColumnParameter): Chrom column, defaults to '1', length (ColumnParameter): Length column, defaults to '2'

galaxy.datatypes.constructive_solid_geometry module

Constructive Solid Geometry file formats.

class galaxy.datatypes.constructive_solid_geometry.Ply(**kwd)[source]

Bases: object

The PLY format describes an object as a collection of vertices, faces and other elements, along with properties such as color and normal direction that can be attached to these elements. A PLY file contains the description of exactly one object.

subtype = ''
__init__(**kwd)[source]
sniff(filename)[source]

The structure of a typical PLY file: Header, Vertex List, Face List, (lists of other elements)

set_meta(dataset, **kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
class galaxy.datatypes.constructive_solid_geometry.PlyAscii(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Ply, galaxy.datatypes.data.Text

file_ext = 'plyascii'
subtype = 'ascii'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', file_format (MetadataParameter): File format, defaults to 'None', vertex (MetadataParameter): Vertex, defaults to 'None', face (MetadataParameter): Face, defaults to 'None', other_elements (MetadataParameter): Other elements, defaults to '[]'
class galaxy.datatypes.constructive_solid_geometry.PlyBinary(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Ply, galaxy.datatypes.binary.Binary

file_ext = 'plybinary'
subtype = 'binary'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', file_format (MetadataParameter): File format, defaults to 'None', vertex (MetadataParameter): Vertex, defaults to 'None', face (MetadataParameter): Face, defaults to 'None', other_elements (MetadataParameter): Other elements, defaults to '[]'
class galaxy.datatypes.constructive_solid_geometry.Vtk(**kwd)[source]

Bases: object

The Visualization Toolkit provides a number of source and writer objects to read and write popular data file formats. The Visualization Toolkit also provides some of its own file formats.

There are two different styles of file formats available in VTK. The simplest are the legacy, serial formats that are easy to read and write either by hand or programmatically. However, these formats are less flexible than the XML based file formats which support random access, parallel I/O, and portable data compression and are preferred to the serial VTK file formats whenever possible.

All keyword phrases are written in ASCII form whether the file is binary or ASCII. The binary section of the file (if in binary form) is the data proper; i.e., the numbers that define points coordinates, scalars, cell indices, and so forth.

Binary data must be placed into the file immediately after the newline (‘\n’) character from the previous ASCII keyword and parameter sequence.

TODO: only legacy formats are currently supported and support for XML formats should be added.

subtype = ''
__init__(**kwd)[source]
sniff(filename)[source]

VTK files can be either ASCII or binary, with two different styles of file formats: legacy or XML. We’ll assume if the file contains a valid VTK header, then it is a valid VTK file.

set_meta(dataset, **kwd)[source]
set_initial_metadata(i, line, dataset)[source]
set_structure_metadata(line, dataset, dataset_type)[source]

The fourth part of legacy VTK files is the dataset structure. The geometry part describes the geometry and topology of the dataset. This part begins with a line containing the keyword DATASET followed by a keyword describing the type of dataset. Then, depending upon the type of dataset, other keyword/ data combinations define the actual data.

get_blurb(dataset)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
class galaxy.datatypes.constructive_solid_geometry.VtkAscii(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Vtk, galaxy.datatypes.data.Text

file_ext = 'vtkascii'
subtype = 'ASCII'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', vtk_version (MetadataParameter): Vtk version, defaults to 'None', file_format (MetadataParameter): File format, defaults to 'None', dataset_type (MetadataParameter): Dataset type, defaults to 'None', dimensions (MetadataParameter): Dimensions, defaults to '[]', origin (MetadataParameter): Origin, defaults to '[]', spacing (MetadataParameter): Spacing, defaults to '[]', points (MetadataParameter): Points, defaults to 'None', vertices (MetadataParameter): Vertices, defaults to 'None', lines (MetadataParameter): Lines, defaults to 'None', polygons (MetadataParameter): Polygons, defaults to 'None', triangle_strips (MetadataParameter): Triangle strips, defaults to 'None', cells (MetadataParameter): Cells, defaults to 'None', field_names (MetadataParameter): Field names, defaults to '[]', field_components (MetadataParameter): Field names and components, defaults to '{}'
class galaxy.datatypes.constructive_solid_geometry.VtkBinary(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Vtk, galaxy.datatypes.binary.Binary

file_ext = 'vtkbinary'
subtype = 'BINARY'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', vtk_version (MetadataParameter): Vtk version, defaults to 'None', file_format (MetadataParameter): File format, defaults to 'None', dataset_type (MetadataParameter): Dataset type, defaults to 'None', dimensions (MetadataParameter): Dimensions, defaults to '[]', origin (MetadataParameter): Origin, defaults to '[]', spacing (MetadataParameter): Spacing, defaults to '[]', points (MetadataParameter): Points, defaults to 'None', vertices (MetadataParameter): Vertices, defaults to 'None', lines (MetadataParameter): Lines, defaults to 'None', polygons (MetadataParameter): Polygons, defaults to 'None', triangle_strips (MetadataParameter): Triangle strips, defaults to 'None', cells (MetadataParameter): Cells, defaults to 'None', field_names (MetadataParameter): Field names, defaults to '[]', field_components (MetadataParameter): Field names and components, defaults to '{}'
class galaxy.datatypes.constructive_solid_geometry.STL(**kwd)[source]

Bases: galaxy.datatypes.data.Data

file_ext = 'stl'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
galaxy.datatypes.constructive_solid_geometry.get_next_line(fh)[source]

galaxy.datatypes.coverage module

Coverage datatypes

class galaxy.datatypes.coverage.LastzCoverage(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'coverage'
get_track_window(dataset, data, start, end)[source]

Assumes we have a numpy file.

get_track_resolution(dataset, start, end)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', positionCol (ColumnParameter): Position column, defaults to '2', forwardCol (ColumnParameter): Forward or aggregate read column, defaults to '3', reverseCol (ColumnParameter): Optional reverse read column, defaults to 'None'

galaxy.datatypes.data module

class galaxy.datatypes.data.DataMeta(name, bases, dict_)[source]

Bases: abc.ABCMeta

Metaclass for Data class. Sets up metadata spec.

__init__(name, bases, dict_)[source]
class galaxy.datatypes.data.Data(**kwd)[source]

Bases: object

Base class for all datatypes. Implements basic interfaces as well as class methods for metadata.

>>> class DataTest( Data ):
...     MetadataElement( name="test" )
...
>>> DataTest.metadata_spec.test.name
'test'
>>> DataTest.metadata_spec.test.desc
'test'
>>> type( DataTest.metadata_spec.test.param )
<class 'galaxy.model.metadata.MetadataParameter'>
edam_data = 'data_0006'
edam_format = 'format_1915'
CHUNKABLE = False
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'

Dictionary of metadata fields for this datatype

copy_safe_peek = True
is_binary = True
allow_datatype_change = True
composite_type = None
primary_file_name = 'index'
track_type = None
data_sources = {}
__init__(**kwd)[source]

Initialize the datatype

supported_display_apps = {}
composite_files = {}
write_from_stream(dataset, stream)[source]

Writes data from a stream

set_raw_data(dataset, data)[source]

Saves the data on the disc

get_raw_data(dataset)[source]

Returns the full data. To stream it open the file_name and read/write as needed

dataset_content_needs_grooming(file_name)[source]

This function is called on an output dataset file after the content is initially generated.

groom_dataset_content(file_name)[source]

This function is called on an output dataset file if dataset_content_needs_grooming returns True.

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Unimplemented method, allows guessing of metadata from contents of file

missing_meta(dataset, check=[], skip=[])[source]

Checks for empty metadata values. Returns True if non-optional metadata is missing. Specifying a list of ‘check’ values will only check those names provided; when used, optionality is ignored. Specifying a list of ‘skip’ items will return True even when a named metadata value is missing

set_max_optional_metadata_filesize(max_value)[source]
get_max_optional_metadata_filesize()[source]
max_optional_metadata_filesize
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

display_peek(dataset)[source]

Create HTML table, used for displaying peek

display_data(trans, data, preview=False, filename=None, to_ext=None, **kwd)[source]

Old display method, for transition - though still used by the API and test framework. Datatypes should be very careful if overriding this method; this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

display_name(dataset)[source]

Returns formatted html of dataset name

display_info(dataset)[source]

Returns formatted html of dataset info

validate(dataset)[source]

Unimplemented validate, return no exceptions

repair_methods(dataset)[source]

Unimplemented method, returns dict with method/option for repairing errors

get_mime()[source]

Returns the mime type of the datatype

add_display_app(app_id, label, file_function, links_function)[source]

Adds a display app to the datatype. app_id is a unique id label is the primary display label, e.g., display at ‘UCSC’ file_function is a string containing the name of the function that returns a properly formatted display links_function is a string containing the name of the function that returns a list of (link_name,link)

remove_display_app(app_id)[source]

Removes a display app from the datatype

clear_display_apps()[source]
add_display_application(display_application)[source]

New style display applications

get_display_application(key, default=None)[source]
get_display_applications_by_dataset(dataset, trans)[source]
get_display_types()[source]

Returns display types available

get_display_label(type)[source]

Returns primary label for display app

as_display_type(dataset, type, **kwd)[source]

Returns modified file contents for a particular display type

Returns a list of tuples of (name, link) for a particular display type. No check on ‘access’ permissions is done here - if you can view the dataset, you can also save it or send it to a destination outside of Galaxy, so Galaxy security restrictions do not apply anyway.

get_converter_types(original_dataset, datatypes_registry)[source]

Returns available converters by type for this dataset

find_conversion_destination(dataset, accepted_formats, datatypes_registry, **kwd)[source]

Returns ( target_ext, existing converted dataset )

convert_dataset(trans, original_dataset, target_type, return_output=False, visible=True, deps=None, target_context=None, history=None)[source]

This function adds a job to the queue to convert a dataset to another type. Returns a message about success/failure.

after_setting_metadata(dataset)[source]

This function is called on the dataset after metadata is set.

before_setting_metadata(dataset)[source]

This function is called on the dataset before metadata is set.

add_composite_file(name, **kwds)[source]
writable_files
get_composite_files(dataset=None)[source]
generate_primary_file(dataset=None)[source]
has_resolution
matches_any(target_datatypes)[source]

Check if this datatype is of any of the target_datatypes or is a subtype thereof.

static merge(split_files, output_file)[source]

Merge files with shutil.copyfileobj(), which will not hit the max argument limitation of cat. gz and bz2 files also work.
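
A minimal sketch of this merging approach (the helper name is hypothetical, not the exact Galaxy implementation), streaming each part into the output with shutil.copyfileobj:

```python
import shutil

def merge_files(split_files, output_file):
    # Stream each part into the output file in order; copyfileobj
    # copies in chunks, so large files are never read fully into
    # memory, and there is no argument-length limit as with `cat`.
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

Because the copy is byte-for-byte, concatenating gzip or bzip2 members this way still yields a valid compressed stream.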

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

has_dataprovider(data_format)[source]

Returns True if data_format is available in dataproviders.

dataprovider(dataset, data_format, **settings)[source]

Base dataprovider factory for all datatypes that returns the proper provider for the given data_format or raises a NoProviderAvailable.

base_dataprovider(*args, **kwargs)[source]
chunk_dataprovider(*args, **kwargs)[source]
chunk64_dataprovider(*args, **kwargs)[source]
dataproviders = {'chunk64': <function chunk64_dataprovider>, 'base': <function base_dataprovider>, 'chunk': <function chunk_dataprovider>}
class galaxy.datatypes.data.Text(**kwd)[source]

Bases: galaxy.datatypes.data.Data

edam_format = 'format_2330'
file_ext = 'txt'
line_class = 'line'
write_from_stream(dataset, stream)[source]

Writes data from a stream

set_raw_data(dataset, data)[source]

Saves the data on the disc

get_mime()[source]

Returns the mime type of the datatype

set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

estimate_file_lines(dataset)[source]

Perform a rough estimate by extrapolating number of lines from a small read.
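
The extrapolation idea can be sketched as follows (a simplified stand-in, not Galaxy's exact implementation): count newlines in a small prefix and scale by the ratio of total size to sampled size.

```python
import os

def estimate_file_lines(path, sample_bytes=65536):
    # Count newlines in a small prefix of the file and scale by
    # the ratio of total file size to sampled size.
    total = os.path.getsize(path)
    with open(path, "rb") as fh:
        sample = fh.read(sample_bytes)
    if not sample:
        return 0
    return int(sample.count(b"\n") * total / len(sample))
```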

count_data_lines(dataset)[source]

Count the number of lines of data in dataset, skipping all blank lines and comments.
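
A sketch of the counting logic under the stated rules (blank lines and comment lines are skipped; the helper and its comment_char default are assumptions, not the exact Galaxy code):

```python
def count_data_lines(path, comment_char="#"):
    # A data line is any line that is neither blank nor a comment.
    count = 0
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if stripped and not stripped.startswith(comment_char):
                count += 1
    return count
```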

set_peek(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=None, line_wrap=True)[source]

Set the peek. This method is used by various subclasses of Text.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by line.

line_dataprovider(*args, **kwargs)[source]

Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.

regex_line_dataprovider(*args, **kwargs)[source]

Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
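
The filtering behavior can be illustrated with a plain generator (a hedged sketch of the idea, not the dataprovider machinery itself):

```python
import re

def regex_line_iter(path, regex_list, invert=False):
    # Yield stripped lines that match at least one pattern, or,
    # with invert=True, lines that match none of them.
    patterns = [re.compile(r) for r in regex_list]
    with open(path) as fh:
        for line in fh:
            matched = any(p.search(line) for p in patterns)
            if matched != invert:
                yield line.rstrip("\n")
```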

dataproviders = {'chunk64': <function chunk64_dataprovider>, 'base': <function base_dataprovider>, 'line': <function line_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.data.GenericAsn1(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class for generic ASN.1 text format

edam_data = 'data_0849'
edam_format = 'format_1966'
file_ext = 'asn1'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.data.LineCount(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Dataset contains a single line with a single integer that denotes the line count for a related dataset. Used for custom builds.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.data.Newick(**kwd)[source]

Bases: galaxy.datatypes.data.Text

New Hampshire/Newick Format

edam_data = 'data_0872'
edam_format = 'format_1910'
file_ext = 'nhx'
__init__(**kwd)[source]

Initialize Newick datatype

init_meta(dataset, copy_from=None)[source]
sniff(filename)[source]

Returns False, as the Newick format is too general to be sniffed reliably.

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.data.Nexus(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Nexus format as used by Paup, Mr Bayes, etc

edam_data = 'data_0872'
edam_format = 'format_1912'
file_ext = 'nex'
__init__(**kwd)[source]

Initialize Nexus datatype

init_meta(dataset, copy_from=None)[source]
sniff(filename)[source]

All Nexus files simply put a ‘#NEXUS’ in their first line
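
Since the marker is fixed, the sniff reduces to a one-line check (a minimal sketch with a hypothetical function name, not the Galaxy method itself):

```python
def sniff_nexus(filename):
    # A Nexus file announces itself with '#NEXUS' on its first line.
    with open(filename) as fh:
        return fh.readline().strip().upper().startswith("#NEXUS")
```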

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
galaxy.datatypes.data.get_test_fname(fname)[source]

Returns test data filename

galaxy.datatypes.data.get_file_peek(file_name, is_multi_byte=False, WIDTH=256, LINE_COUNT=5, skipchars=None, line_wrap=True)[source]

Returns the first LINE_COUNT lines wrapped to WIDTH

>>> fname = get_test_fname('4.bed')
>>> get_file_peek(fname, LINE_COUNT=1)
u'chr22\t30128507\t31828507\tuc003bnx.1_cds_2_0_chr22_29227_f\t0\t+\n'
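
A simplified sketch of the peek logic (it truncates each line to WIDTH rather than wrapping, and omits the multi-byte and skipchars handling of the real function):

```python
def file_peek(file_name, WIDTH=256, LINE_COUNT=5):
    # Collect at most LINE_COUNT lines, truncating (not wrapping)
    # each to WIDTH characters.
    lines = []
    with open(file_name) as fh:
        for _ in range(LINE_COUNT):
            line = fh.readline()
            if not line:
                break
            lines.append(line.rstrip("\n")[:WIDTH])
    return "\n".join(lines) + "\n"
```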

galaxy.datatypes.genetics module

rgenetics datatypes. Use at your peril. Ross Lazarus, for the rgenetics and galaxy projects

genome graphs datatypes derived from Interval datatypes. Genome graphs datasets have a header row with appropriate column names. The first column is always the marker - e.g. column name = rs, first row = rs12345 if the rows are SNPs. Subsequent row values are all numeric! Will fail if any values are non-numeric (e.g. ‘+’ or ‘NA’). Ross Lazarus, for rgenetics, August 20 2007

class galaxy.datatypes.genetics.GenomeGraphs(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Tab delimited data containing a marker id and any number of numeric values

file_ext = 'gg'
__init__(**kwd)[source]

Initialize gg datatype, by adding UCSC display apps

set_meta(dataset, **kwd)[source]
as_ucsc_display_file(dataset, **kwd)[source]

Returns file

From the ever-helpful Angie Hinrichs (angie@soe.ucsc.edu), a genome graphs call looks like this:

http://genome.ucsc.edu/cgi-bin/hgGenome?clade=mammal&org=Human&db=hg18&hgGenome_dataSetName=dname &hgGenome_dataSetDescription=test&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess &hgGenome_columnLabels=best%20guess&hgGenome_maxVal=&hgGenome_labelVals= &hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http://galaxy.esphealth.org/datasets/333/display/index &hgGenome_doSubmitUpload=submit

Galaxy gives this for an interval file

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr1:1-1000&hgt.customText= http%3A%2F%2Fgalaxy.esphealth.org%2Fdisplay_as%3Fid%3D339%26display_app%3Ducsc

make_html_table(dataset, skipchars=[])[source]

Create HTML table, used for displaying peek

validate(dataset)[source]

Validate a gg file - all numeric after header row
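
The "all numeric after header row" rule can be sketched like this (a hypothetical list-based helper for illustration, not the Galaxy method, which reads from the dataset file):

```python
def validate_gg(lines):
    # A gg file has a header row; each subsequent row starts with a
    # marker id, and every remaining value must parse as a number.
    errors = []
    for n, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split("\t")
        for value in fields[1:]:
            try:
                float(value)
            except ValueError:
                errors.append("row %d: non-numeric value %r" % (n, value))
    return errors
```

Values such as ‘+’ or ‘NA’ fail the float() parse, which is exactly the failure mode the module docstring warns about.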

sniff(filename)[source]

Determines whether the file is in gg format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> GenomeGraphs().sniff( fname )
False
>>> fname = get_test_fname( '1.gg' )
>>> GenomeGraphs().sniff( fname )
True
get_mime()[source]

Returns the mime type of the datatype

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (MetadataParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', markerCol (ColumnParameter): Marker ID column, defaults to '1'
class galaxy.datatypes.genetics.rgTabList(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

For sampleid and featureid lists of exclusions or inclusions used by the clean tool. Featureid subsets on statistical criteria -> specialized display such as gg

file_ext = 'rgTList'
__init__(**kwd)[source]

Initialize featurelist datatype

display_peek(dataset)[source]

Returns formatted html of peek

get_mime()[source]

Returns the mime type of the datatype

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.genetics.rgSampleList(**kwd)[source]

Bases: galaxy.datatypes.genetics.rgTabList

For sampleid exclusions or inclusions used by the clean tool. Output from QC, e.g. excess het, gender error, ibd pair member, eigen outlier, excess mendel errors, ... Since they can be uploaded, these should be flexible, but they are at least persistent. Same infrastructure for expression?

file_ext = 'rgSList'
__init__(**kwd)[source]

Initialize samplelist datatype

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.genetics.rgFeatureList(**kwd)[source]

Bases: galaxy.datatypes.genetics.rgTabList

For featureid lists of exclusions or inclusions used by the clean tool. Output from QC, e.g. low maf, high missingness, bad hwe in controls, excess mendel errors, ... Featureid subsets on statistical criteria -> specialized display such as gg. Same infrastructure for expression?

file_ext = 'rgFList'
__init__(**kwd)[source]

Initialize featurelist datatype

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.genetics.Rgenetics(**kwd)[source]

Bases: galaxy.datatypes.text.Html

base class to use for rgenetics datatypes derived from html - composite datatype elements stored in extra files path

composite_type = 'auto_primary_file'
allow_datatype_change = False
file_ext = 'rgenetics'
generate_primary_file(dataset=None)[source]
regenerate_primary_file(dataset)[source]

cannot do this until we are setting metadata

get_mime()[source]

Returns the mime type of the datatype

set_meta(dataset, **kwd)[source]

for lped/pbed eg

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.SNPMatrix(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

BioC SNPMatrix Rgenetics data collections

file_ext = 'snpmatrix'
set_peek(dataset, **kwd)[source]
sniff(filename)[source]

need to check the file header hex code

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Lped(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

linkage pedigree (ped,map) Rgenetics data collections

file_ext = 'lped'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Pphe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Plink phenotype file - header must have FID IID... Rgenetics data collections

file_ext = 'pphe'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Fphe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

FBAT pedigree file - mad format with ! as the first char on the header row. Rgenetics data collections

file_ext = 'fphe'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Phe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Phenotype file

file_ext = 'phe'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Fped(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

FBAT pedigree format - single file, map is header row of rs numbers. Strange. Rgenetics data collections

file_ext = 'fped'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Pbed(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Plink Binary compressed 2bit/geno Rgenetics data collections

file_ext = 'pbed'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.ldIndep(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

LD (a good measure of redundancy of information) depleted Plink binary compressed 2bit/geno. This is really a plink binary, but some tools work better with less redundancy and so are constrained to these files

file_ext = 'ldreduced'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Eigenstratgeno(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Eigenstrat format - may be able to get rid of this if we move to shellfish Rgenetics data collections

file_ext = 'eigenstratgeno'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Eigenstratpca(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Eigenstrat PCA file for case control adjustment Rgenetics data collections

file_ext = 'eigenstratpca'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Snptest(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

BioC snptest Rgenetics data collections

file_ext = 'snptest'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'
class galaxy.datatypes.genetics.Pheno(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

base class for pheno files

file_ext = 'pheno'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.genetics.RexpBase(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class for BioC data structures in Galaxy. Must be constructed with the pheno data in place, since that goes into the metadata for each instance

file_ext = 'rexpbase'
html_table = None
is_binary = True
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]

This is called only at upload, to write the html file. Cannot rename the datasets here - they come with the default names, unfortunately

get_mime()[source]

Returns the mime type of the datatype

get_phecols(phenolist=[], maxConc=20)[source]

Sept 2009: cannot use whitespace to split - make a more complex structure here and adjust the methods that rely on it. Returns interesting phenotype column names for an rexpression eset or affybatch, for use in array subsetting and so on, as a data structure for a dynamic Galaxy select parameter. A column with only 1 value doesn’t change, so it is not interesting for analysis. A column with a different value in every row is equivalent to a unique identifier, so it is also not interesting for anova or limma analysis - both of these are removed after the concordance (count of unique terms) is constructed for each column. Then a complication - each remaining pair of columns is tested for redundancy: if two columns are always paired, then only one is needed :)

get_pheno(dataset)[source]

expects a .pheno file in the extra_files_dir - ugh. Note that R is weird and adds the row.name in the header so the columns are all wrong - unless you tell it not to. A file can be written as write.table(file='foo.pheno', pData(foo), sep=' ', quote=F, row.names=F)

set_peek(dataset, **kwd)[source]

expects a .pheno file in the extra_files_dir - ugh note that R is weird and does not include the row.name in the header. why?

get_peek(dataset)[source]

expects a .pheno file in the extra_files_dir - ugh

get_file_peek(filename)[source]

can’t really peek at a filename - need the extra_files_path and such?

regenerate_primary_file(dataset)[source]

cannot do this until we are setting metadata

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, **kwd)[source]

NOTE: we apply the tabular machinery to the phenodata extracted from a BioC eSet or affybatch.

make_html_table(pp='nothing supplied from peek\n')[source]

Create HTML table, used for displaying peek

display_peek(dataset)[source]

Returns formatted html of peek

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'
class galaxy.datatypes.genetics.Affybatch(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'affybatch'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'
class galaxy.datatypes.genetics.Eset(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'eset'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'
class galaxy.datatypes.genetics.MAlist(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'malist'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'

galaxy.datatypes.graph module

Graph content classes.

class galaxy.datatypes.graph.Xgmml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

XGMML graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

file_ext = 'xgmml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Returns False; the user must set the format manually.

static merge(split_files, output_file)[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

node_edge_dataprovider(*args, **kwargs)[source]
dataproviders = {'xml': <function xml_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'node-edge': <function node_edge_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.graph.Sif(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

SIF graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

First column: node id Second column: relationship type Third to Nth column: target ids for link
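
That column layout can be parsed with a few lines of Python (a hedged sketch with a hypothetical helper name; the actual provider works through the columnar dataprovider machinery):

```python
def parse_sif(lines):
    # <node> <relationship> <target> [<target> ...]; a line with a
    # single field declares an isolated node with no edges.
    nodes, edges = set(), []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        nodes.add(fields[0])
        if len(fields) >= 3:
            relationship = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((fields[0], relationship, target))
    return nodes, edges
```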

file_ext = 'sif'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Returns False; the user must set the format manually.

static merge(split_files, output_file)[source]
node_edge_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'node-edge': <function node_edge_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.graph.XGMMLGraphDataProvider(source, selector=None, max_depth=None, **kwargs)[source]

Bases: galaxy.datatypes.dataproviders.hierarchy.XMLDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
settings = {'offset': 'int', 'limit': 'int', 'max_depth': 'int', 'selector': 'str'}
class galaxy.datatypes.graph.SIFGraphDataProvider(source, indeces=None, column_count=None, column_types=None, parsers=None, parse_columns=True, deliminator='\t', filters=None, **kwargs)[source]

Bases: galaxy.datatypes.dataproviders.column.ColumnarDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
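For illustration, a minimal sketch of how a columnar provider could derive the two lists above from SIF lines (source id, relationship type, then target ids). This is a hypothetical helper, not the actual SIFGraphDataProvider implementation.

```python
# Hypothetical sketch (not the actual SIFGraphDataProvider code): build
# the 'nodes'/'edges' lists described above from tab-separated SIF lines.
def sif_to_graph(lines):
    nodes, edges = [], []
    index = {}  # node id -> position in `nodes`

    def node_index(node_id):
        if node_id not in index:
            index[node_id] = len(nodes)
            nodes.append({'id': node_id, 'data': {}})
        return index[node_id]

    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 3:
            continue  # need a source, a relationship, and at least one target
        source, relation = fields[0], fields[1]
        src = node_index(source)
        for target in fields[2:]:
            edges.append({'source': src,
                          'target': node_index(target),
                          'data': {'type': relation}})
    return nodes, edges
```

Edges reference nodes by index, matching the `'source'`/`'target'` shape documented above.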
settings = {'strip_newlines': 'bool', 'strip_lines': 'bool', 'comment_char': 'str', 'provide_blank': 'bool', 'regex_list': 'list:escaped', 'limit': 'int', 'filters': 'list:str', 'offset': 'int', 'column_types': 'list:str', 'parse_columns': 'bool', 'indeces': 'list:int', 'invert': 'bool', 'column_count': 'int', 'deliminator': 'str'}

galaxy.datatypes.images module

Image classes

class galaxy.datatypes.images.Image(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Class describing an image

edam_data = 'data_2968'
edam_format = 'format_3547'
file_ext = ''
__init__(**kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Determine if the file is in this format

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Jpg(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3579'
file_ext = 'jpg'
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Png(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3603'
file_ext = 'png'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Tiff(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3591'
file_ext = 'tiff'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Hamamatsu(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'vms'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Mirax(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'mrxs'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Sakura(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'svslide'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Nrrd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'nrrd'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Bmp(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3592'
file_ext = 'bmp'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Gif(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3467'
file_ext = 'gif'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Im(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3593'
file_ext = 'im'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Pcd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3594'
file_ext = 'pcd'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Pcx(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3595'
file_ext = 'pcx'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Ppm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3596'
file_ext = 'ppm'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Psd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3597'
file_ext = 'psd'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Xbm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3598'
file_ext = 'xbm'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Xpm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3599'
file_ext = 'xpm'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Rgb(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3600'
file_ext = 'rgb'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Pbm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3601'
file_ext = 'pbm'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Pgm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3602'
file_ext = 'pgm'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Eps(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3466'
file_ext = 'eps'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Rast(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3605'
file_ext = 'rast'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Pdf(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3508'
file_ext = 'pdf'
sniff(filename)[source]

Determine if the file is in pdf format.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
galaxy.datatypes.images.create_applet_tag_peek(class_name, archive, params)[source]
class galaxy.datatypes.images.Gmaj(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Class describing a GMAJ Applet

edam_format = 'format_3547'
file_ext = 'gmaj.zip'
copy_safe_peek = False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
get_mime()[source]

Returns the mime type of the datatype

sniff(filename)[source]

NOTE: the sniff.convert_newlines() call in the upload utility will keep Gmaj data types from being correctly sniffed, but the files can be uploaded (they’ll be sniffed as ‘txt’). This sniff function is here to provide an example of a sniffer for a zip file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.images.Html(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Deprecated class. Do not use this class anymore; use galaxy.datatypes.text:Html instead. It remains only for backwards compatibility.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.images.Laj(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a LAJ Applet

file_ext = 'laj'
copy_safe_peek = False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'

galaxy.datatypes.interval module

Interval datatypes

class galaxy.datatypes.interval.Interval(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Tab delimited data containing interval information

edam_data = 'data_3002'
edam_format = 'format_3475'
file_ext = 'interval'
line_class = 'region'
track_type = 'FeatureTrack'
data_sources = {'index': 'bigwig', 'data': 'tabix'}

Add metadata elements

__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display apps

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, first_line_is_header=False, **kwd)[source]

Tries to guess from the file's lines which columns hold the chromosome, the region start and end, and the strand

displayable(dataset)[source]
get_estimated_display_viewport(dataset, chrom_col=None, start_col=None, end_col=None)[source]

Return a chrom, start, stop tuple for viewing a file.

as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents with only the bed data

display_peek(dataset)[source]

Returns formatted html of peek

Generate links to UCSC genome browser sites based on the dbkey and content of dataset.

validate(dataset)[source]

Validate an interval file using the bx GenomicIntervalReader

repair_methods(dataset)[source]

Return options for removing errors along with a description

sniff(filename)[source]

Checks for ‘intervalness’

This format is mostly used by galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
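The column-guessing step described for set_meta above can be sketched as a simple heuristic over one tab-separated line. This is illustrative only, not Galaxy's actual set_meta logic.

```python
import re

# Illustrative heuristic (not Galaxy's actual set_meta code): guess which
# 1-based columns of a tab-separated interval line hold the chromosome,
# start, end, and strand.
def guess_interval_columns(line):
    fields = line.rstrip('\n').split('\t')
    chrom_col = start_col = end_col = strand_col = None
    for i, value in enumerate(fields, start=1):
        if chrom_col is None and re.match(r'chr', value):
            chrom_col = i  # chromosome names commonly start with 'chr'
        elif value.isdigit():
            if start_col is None:
                start_col = i  # first integer column -> start
            elif end_col is None:
                end_col = i    # second integer column -> end
        elif strand_col is None and value in ('+', '-'):
            strand_col = i
    return chrom_col, start_col, end_col, strand_col
```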
get_track_window(dataset, data, start, end)[source]

Assumes the incoming track data is sorted already.

get_track_resolution(dataset, start, end)[source]
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
interval_dataprovider(*args, **kwargs)[source]
interval_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'interval-dict': <function interval_dict_dataprovider>, 'chunk': <function chunk_dataprovider>, 'interval': <function interval_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'
class galaxy.datatypes.interval.BedGraph(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Tab delimited chrom/start/end/datavalue dataset

edam_format = 'format_3583'
file_ext = 'bedgraph'
track_type = 'LineTrack'
data_sources = {'index': 'bigwig', 'data': 'bigwig'}
as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents as is with no modifications. TODO: this is a functional stub and will need to be enhanced moving forward to provide additional support for bedgraph.

get_estimated_display_viewport(dataset, chrom_col=0, start_col=1, end_col=2)[source]

Set viewport based on dataset’s first 100 lines.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'
class galaxy.datatypes.interval.Bed(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Tab delimited data in BED format

edam_format = 'format_3003'
file_ext = 'bed'
data_sources = {'index': 'bigwig', 'data': 'tabix', 'feature_search': 'fli'}
track_type = 'FeatureTrack'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts']

Add metadata elements

set_meta(dataset, overwrite=True, **kwd)[source]

Sets the metadata information for datasets previously determined to be in bed format.

as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents with only the bed data. If bed 6+, treat as interval.

sniff(filename)[source]

Checks for ‘bedness’

BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used. The data type of all 12 columns is: 1-str, 2-int, 3-int, 4-str, 5-int, 6-str, 7-int, 8-int, 9-int or list, 10-int, 11-list, 12-list

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_tab.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'interval1.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'complete.bed' )
>>> Bed().sniff( fname )
True
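The per-column rules above (3 to 12 tab-separated fields, integer start/end, a valid strand in column 6) can be sketched as a single-line check. This is a hedged sketch of the 'bedness' test, not the actual Bed.sniff implementation.

```python
# Hypothetical sketch (not the actual Bed.sniff code): check one data
# line against the BED column rules described above.
def looks_like_bed_line(line):
    fields = line.rstrip('\n').split('\t')
    if not 3 <= len(fields) <= 12:
        return False  # 3 required fields, up to 9 optional ones
    try:
        start, end = int(fields[1]), int(fields[2])  # columns 2-3 are ints
    except ValueError:
        return False
    if start > end:
        return False
    if len(fields) >= 6 and fields[5] not in ('+', '-', '.'):
        return False  # column 6 is the strand
    return True
```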
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'
class galaxy.datatypes.interval.BedStrict(**kwd)[source]

Bases: galaxy.datatypes.interval.Bed

Tab delimited data in strict BED format - no non-standard columns allowed

edam_format = 'format_3584'
file_ext = 'bedstrict'
allow_datatype_change = False
__init__(**kwd)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'
class galaxy.datatypes.interval.Bed6(**kwd)[source]

Bases: galaxy.datatypes.interval.BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6

edam_format = 'format_3585'
file_ext = 'bed6'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'
class galaxy.datatypes.interval.Bed12(**kwd)[source]

Bases: galaxy.datatypes.interval.BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12

edam_format = 'format_3586'
file_ext = 'bed12'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'
class galaxy.datatypes.interval.Gff(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular, galaxy.datatypes.interval._RemoteCallMixin

Tab delimited data in Gff format

edam_data = 'data_1255'
edam_format = 'format_2305'
file_ext = 'gff'
valid_gff_frame = ['.', '0', '1', '2']
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group']
data_sources = {'index': 'bigwig', 'data': 'interval_index', 'feature_search': 'fli'}
track_type = 'FeatureTrack'

Add metadata elements

__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_attribute_metadata(dataset)[source]

Sets metadata elements for dataset’s attributes.

set_meta(dataset, overwrite=True, **kwd)[source]
display_peek(dataset)[source]

Returns formatted html of peek

get_estimated_display_viewport(dataset)[source]

Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 formats. This function should correctly handle both...

sniff(filename)[source]

Determines whether the file is in gff format

GFF lines have nine required fields that must be tab-separated.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format3

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'gff_version_3.gff' )
>>> Gff().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gff().sniff( fname )
True
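The nine-field requirement above can be sketched as a per-line check. This is a hedged sketch, not the actual Gff.sniff code; the frame values come from the valid_gff_frame list documented on this class.

```python
# Hypothetical sketch (not the actual Gff.sniff code): a GFF data line
# must have exactly nine tab-separated fields, integer start/end, and a
# frame drawn from valid_gff_frame.
VALID_GFF_FRAME = ['.', '0', '1', '2']

def looks_like_gff_line(line):
    if line.startswith('#'):
        return True  # comment/header lines are allowed
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 9:
        return False
    try:
        if int(fields[3]) > int(fields[4]):
            return False  # start must not exceed end
    except ValueError:
        return False
    return fields[7] in VALID_GFF_FRAME
```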
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
interval_dataprovider(*args, **kwargs)[source]
interval_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'interval-dict': <function interval_dict_dataprovider>, 'chunk': <function chunk_dataprovider>, 'interval': <function interval_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'int', 'str', 'str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'
class galaxy.datatypes.interval.Gff3(**kwd)[source]

Bases: galaxy.datatypes.interval.Gff

Tab delimited data in Gff3 format

edam_format = 'format_1975'
file_ext = 'gff3'
valid_gff3_strand = ['+', '-', '.', '?']
valid_gff3_phase = ['.', '0', '1', '2']
column_names = ['Seqid', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes']
track_type = 'FeatureTrack'

Add metadata elements

__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]

Determines whether the file is in GFF version 3 format

GFF 3 format:

  1. adds a mechanism for representing more than one level of hierarchical grouping of features and subfeatures.
  2. separates the ideas of group membership and feature name/id
  3. constrains the feature type field to be taken from a controlled vocabulary.
  4. allows a single feature, such as an exon, to belong to more than one group at a time.
  5. provides an explicit convention for pairwise alignments
  6. provides an explicit convention for features that occupy disjunct regions

The format consists of 9 columns, separated by tabs (NOT spaces).

Undefined fields are replaced with the '.' character, as described in the original GFF spec.

For complete details see http://song.sourceforge.net/gff3.shtml

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.gff' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname('gff_version_3.gff')
>>> Gff3().sniff( fname )
True
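The ninth column of a GFF3 line carries semicolon-separated key=value attribute pairs. The following is an illustrative parser for that column, a hypothetical helper rather than part of the Gff3 class.

```python
# Hypothetical helper (not part of the Gff3 class): parse a GFF3
# column-9 attribute string of ';'-separated key=value pairs.
def parse_gff3_attributes(column9):
    attributes = {}
    for pair in column9.strip().split(';'):
        if '=' in pair:
            key, _, value = pair.partition('=')
            attributes[key.strip()] = value.strip()
    return attributes
```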
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'float', 'str', 'int', 'list']', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'
class galaxy.datatypes.interval.Gtf(**kwd)[source]

Bases: galaxy.datatypes.interval.Gff

Tab delimited data in Gtf format

edam_format = 'format_2306'
file_ext = 'gtf'
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes']
track_type = 'FeatureTrack'

Add metadata elements

sniff(filename)[source]

Determines whether the file is in gtf format

GTF lines have nine required fields that must be tab-separated. The first eight GTF fields are the same as GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair. Attributes must end in a semi-colon, and be separated from any following attribute by exactly one space. The attribute list must begin with the two mandatory attributes:

gene_id value - A globally unique identifier for the genomic source of the sequence.
transcript_id value - A globally unique identifier for the predicted transcript.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format4

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.bed' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gtf().sniff( fname )
True
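GTF attributes differ from GFF3: each is a type/value pair with a quoted value, terminated by a semicolon (e.g. `gene_id "g1"; transcript_id "t1";`). A minimal illustrative parser, assuming quoted values and not part of the Gtf class:

```python
import re

# Hypothetical helper (not part of the Gtf class): parse the GTF
# attribute list of type/value pairs with quoted values.
def parse_gtf_attributes(column9):
    # each pair looks like: key "value";
    return dict(re.findall(r'(\w+)\s+"([^"]*)"', column9))
```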
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'float', 'str', 'int', 'list']', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'
class galaxy.datatypes.interval.Wiggle(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular, galaxy.datatypes.interval._RemoteCallMixin

Tab delimited data in wiggle format

edam_format = 'format_3005'
file_ext = 'wig'
track_type = 'LineTrack'
data_sources = {'index': 'bigwig', 'data': 'bigwig'}
__init__(**kwd)[source]
get_estimated_display_viewport(dataset)[source]

Return a chrom, start, stop tuple for viewing a file.

display_peek(dataset)[source]

Returns formated html of peek

set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]

Determines whether the file is in wiggle format

The .wig format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line is the track data, which can be entered in several different formats.

The track definition line begins with the word ‘track’ followed by the track type. The track type with version is REQUIRED, and it currently must be wiggle_0. For example, track type=wiggle_0...

For complete details see http://genome.ucsc.edu/goldenPath/help/wiggle.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interval1.bed' )
>>> Wiggle().sniff( fname )
False
>>> fname = get_test_fname( 'wiggle.wig' )
>>> Wiggle().sniff( fname )
True
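The required track-definition check above can be sketched as follows. This is a minimal sketch, not the actual Wiggle.sniff code.

```python
# Minimal sketch (not the actual Wiggle.sniff code): the first
# non-comment line must be a track definition of type wiggle_0.
def has_wiggle_track_line(lines):
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank and comment lines
        return line.startswith('track') and 'type=wiggle_0' in line
    return False
```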
get_track_window(dataset, data, start, end)[source]

Assumes we have a numpy file.

get_track_resolution(dataset, start, end)[source]
wiggle_dataprovider(*args, **kwargs)[source]
wiggle_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'wiggle-dict': <function wiggle_dict_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'wiggle': <function wiggle_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.interval.CustomTrack(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

UCSC CustomTrack

edam_format = 'format_3588'
file_ext = 'customtrack'
__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display app

set_meta(dataset, overwrite=True, **kwd)[source]
display_peek(dataset)[source]

Returns formatted html of peek

get_estimated_display_viewport(dataset, chrom_col=None, start_col=None, end_col=None)[source]

Return a chrom, start, stop tuple for viewing a file.

sniff(filename)[source]

Determines whether the file is in customtrack format.

CustomTrack files are built within Galaxy and are basically bed or interval files with the first line looking something like this.

track name="User Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> CustomTrack().sniff( fname )
False
>>> fname = get_test_fname( 'ucsc.customtrack' )
>>> CustomTrack().sniff( fname )
True
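The track definition line shown above is a sequence of key=value options with quoted or bare values. An illustrative parser for such a line, a hypothetical helper rather than the actual CustomTrack.sniff implementation:

```python
import re

# Hypothetical helper (not the actual CustomTrack.sniff code): parse the
# key=value options of a track definition line; values may be quoted
# ("User Track") or bare (0,0,0).
def parse_track_line(line):
    if not line.startswith('track'):
        return None
    options = {}
    for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|(\S+))', line):
        options[key] = quoted or bare
    return options
```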
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.interval.ENCODEPeak(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Human ENCODE peak format. There are both broad and narrow peak formats; the formats are very similar, but narrow peak has one additional column.

Broad peak ( http://genome.ucsc.edu/FAQ/FAQformat#format13 ): This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+3 format.

Narrow peak ( http://genome.ucsc.edu/FAQ/FAQformat#format12 ): This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+4 format.

edam_format = 'format_3612'
file_ext = 'encodepeak'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'SignalValue', 'pValue', 'qValue', 'Peak']
data_sources = {'index': 'bigwig', 'data': 'tabix'}

Add metadata elements

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'
class galaxy.datatypes.interval.ChromatinInteractions(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Chromatin interactions obtained from 3C/5C/Hi-C experiments.

file_ext = 'chrint'
track_type = 'DiagonalHeatmapTrack'
data_sources = {'index': 'bigwig', 'data': 'tabix'}
column_names = ['Chrom1', 'Start1', 'End1', 'Chrom2', 'Start2', 'End2', 'Value']

Add metadata elements

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '7', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None', chrom1Col (ColumnParameter): Chrom1 column, defaults to '1', start1Col (ColumnParameter): Start1 column, defaults to '2', end1Col (ColumnParameter): End1 column, defaults to '3', chrom2Col (ColumnParameter): Chrom2 column, defaults to '4', start2Col (ColumnParameter): Start2 column, defaults to '5', end2Col (ColumnParameter): End2 column, defaults to '6', valueCol (ColumnParameter): Value column, defaults to '7'
sniff(filename)[source]
class galaxy.datatypes.interval.ScIdx(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

ScIdx files are 1-based and consist of strand-specific coordinate counts. They always have 5 columns, and the first row contains the column labels: ‘chrom’, ‘index’, ‘forward’, ‘reverse’, ‘value’. Each subsequent line consists of data: chromosome name (str), peak index (int), forward strand peak count (int), reverse strand peak count (int), and value (int). The 5th ‘value’ column is the sum of the forward and reverse peak counts.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
file_ext = 'scidx'
__init__(**kwd)[source]

Initialize scidx datatype.

sniff(filename)[source]

Checks for ‘scidx-ness.’
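The invariants described above (a fixed label row and value = forward + reverse) can be checked with a few lines of Python. This is a hedged sketch of the idea, not the actual sniff() implementation; `looks_like_scidx` and the sample data are invented for illustration.

```python
# Illustrative check of the ScIdx invariants: a fixed header row and
# value == forward + reverse on every data row.
def looks_like_scidx(lines):
    header = lines[0].rstrip('\n').split('\t')
    if header != ['chrom', 'index', 'forward', 'reverse', 'value']:
        return False
    for line in lines[1:]:
        chrom, idx, fwd, rev, val = line.rstrip('\n').split('\t')
        if int(fwd) + int(rev) != int(val):
            return False
    return True

sample = ['chrom\tindex\tforward\treverse\tvalue\n',
          'chr1\t10\t5\t7\t12\n']
```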

galaxy.datatypes.metadata module

Expose the model metadata module as a datatype module as well. Allowing it to live in galaxy.model means the model module doesn’t have any dependencies on the datatypes module. This module will need to remain here for datatypes living in the Tool Shed, so we might as well keep and use this interface from the datatypes module.

class galaxy.datatypes.metadata.Statement(target)[source]

Bases: object

This class inserts its target into a list in the surrounding class. The data.Data class has a metaclass which executes these statements. This is how we insert the metadata element spec into the class.

__init__(target)[source]
classmethod process(element)[source]
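The "declarations in a class body collected by a metaclass" pattern described above can be sketched in miniature. Galaxy's real implementation inspects the calling frame to find the surrounding class; this toy version instead gathers a reserved `_declared` list, purely to show how per-class specs can accumulate down the inheritance chain. All names here are invented for illustration.

```python
# Toy version of the Statement idea: declarations made in a class body are
# collected by the metaclass into a per-class metadata_spec list, and
# subclasses extend what their bases declared.
class CollectingMeta(type):
    def __new__(mcs, name, bases, namespace):
        declared = namespace.pop('_declared', [])
        cls = super().__new__(mcs, name, bases, namespace)
        # Start from the inherited spec (if any), then append new declarations.
        cls.metadata_spec = list(getattr(cls, 'metadata_spec', [])) + declared
        return cls

class Data(metaclass=CollectingMeta):
    _declared = [('dbkey', '?')]

class Text(Data):
    _declared = [('data_lines', 0)]
```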
class galaxy.datatypes.metadata.MetadataCollection(parent)[source]

Bases: object

MetadataCollection is not a collection at all, but rather a proxy to the real metadata which is stored as a Dictionary. This class handles processing the metadata elements when they are set and retrieved, returning default values in cases when metadata is not set.

__init__(parent)[source]
element_is_set(name)[source]
from_JSON_dict(filename=None, path_rewriter=None, json_dict=None)[source]
get(key, default=None)[source]
get_html_by_name(name, **kwd)[source]
get_parent()[source]
items()[source]
make_dict_copy(to_copy)[source]

Makes a deep copy of input iterable to_copy according to self.spec

parent
remove_key(name)[source]
set_parent(parent)[source]
spec
to_JSON_dict(filename=None)[source]
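The proxy behaviour described for MetadataCollection (reads fall back to the spec's default when an element was never set) can be sketched as follows. `MiniMetadataCollection` and `SpecElement` are hypothetical stand-ins, not the real Galaxy classes.

```python
# Minimal sketch of the MetadataCollection contract: stored values win,
# otherwise the spec's default is returned.
class SpecElement:
    def __init__(self, default):
        self.default = default

class MiniMetadataCollection:
    def __init__(self, spec):
        self._spec = spec      # name -> SpecElement
        self._values = {}      # the actual stored metadata dict

    def set(self, name, value):
        self._values[name] = value

    def get(self, name):
        if name in self._values:
            return self._values[name]
        return self._spec[name].default

    def element_is_set(self, name):
        return name in self._values

md = MiniMetadataCollection({'dbkey': SpecElement('?'),
                             'data_lines': SpecElement(0)})
md.set('data_lines', 42)
```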
class galaxy.datatypes.metadata.MetadataSpecCollection(dict=None)[source]

Bases: galaxy.util.odict.odict

A simple extension of dict which allows cleaner access to items and allows the values to be iterated over directly as if it were a list. append() is also implemented for simplicity, but it does not actually “append”.

__init__(dict=None)[source]
append(item)[source]
iter()[source]
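The access pattern above can be illustrated with a small dict subclass: `append()` keys on the item's name (so it does not literally append), and iteration yields the values directly. `MiniSpecCollection` and `Element` are invented for this sketch.

```python
# Sketch of the MetadataSpecCollection access pattern: keyed "append",
# attribute-style item access, and value iteration.
class MiniSpecCollection(dict):
    def append(self, item):
        self[item.name] = item   # keyed insert, not list-style append

    def __iter__(self):
        return iter(self.values())

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

class Element:
    def __init__(self, name):
        self.name = name

coll = MiniSpecCollection()
coll.append(Element('dbkey'))
```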
class galaxy.datatypes.metadata.MetadataParameter(spec)[source]

Bases: object

__init__(spec)[source]
from_external_value(value, parent)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

get_html(value, context=None, other_values=None, **kwd)[source]

The “context” is simply the metadata collection/bunch holding this piece of metadata. This is passed in to allow for metadata to validate against each other (note: this could turn into a huge, recursive mess if not done with care). For example, a column assignment should validate against the number of columns in the dataset.

get_html_field(value=None, context=None, other_values=None, **kwd)[source]
make_copy(value, target_context=None, source_context=None)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

to_external_value(value)[source]

Turns a value read from a metadata into its value to be pushed directly into the external dict.

to_safe_string(value)[source]
to_string(value)[source]
unwrap(form_value)[source]

Turns a value into its storable form.

validate(value)[source]

Throw an exception if the value is invalid.

wrap(value, session)[source]

Turns a value into its usable form.

class galaxy.datatypes.metadata.MetadataElementSpec(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]

Bases: object

Defines a metadata element and adds it to the metadata_spec (which is a MetadataSpecCollection) of datatype.

__init__(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]
get(name, default=None)[source]
unwrap(value)[source]

Turns an incoming value into its storable form.

wrap(value, session)[source]

Turns a stored value into its usable form.

class galaxy.datatypes.metadata.SelectParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

__init__(spec)[source]
get_html(value, context=None, other_values=None, values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
classmethod marshal(value)[source]
to_string(value)[source]
wrap(value, session)[source]
class galaxy.datatypes.metadata.DBKeyParameter(spec)[source]

Bases: galaxy.model.metadata.SelectParameter

get_html(value=None, context=None, other_values=None, values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
class galaxy.datatypes.metadata.RangeParameter(spec)[source]

Bases: galaxy.model.metadata.SelectParameter

__init__(spec)[source]
get_html(value, context=None, other_values=None, values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
classmethod marshal(value)[source]
class galaxy.datatypes.metadata.ColumnParameter(spec)[source]

Bases: galaxy.model.metadata.RangeParameter

get_html(value, context=None, other_values=None, values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
class galaxy.datatypes.metadata.ColumnTypesParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.ListParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.DictParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_safe_string(value)[source]
to_string(value)[source]
class galaxy.datatypes.metadata.PythonObjectParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

get_html(value=None, context=None, other_values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, **kwd)[source]
classmethod marshal(value)[source]
to_string(value)[source]
class galaxy.datatypes.metadata.FileParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

from_external_value(value, parent, path_rewriter=None)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

get_html(value=None, context=None, other_values=None, **kwd)[source]
get_html_field(value=None, context=None, other_values=None, **kwd)[source]
make_copy(value, target_context, source_context)[source]
classmethod marshal(value)[source]
new_file(dataset=None, **kwds)[source]
to_external_value(value)[source]

Turns a value read from a metadata into its value to be pushed directly into the external dict.

to_safe_string(value)[source]
to_string(value)[source]
wrap(value, session)[source]
class galaxy.datatypes.metadata.MetadataTempFile(**kwds)[source]

Bases: object

__init__(**kwds)[source]
classmethod cleanup_from_JSON_dict_filename(filename)[source]
file_name
classmethod from_JSON(json_dict)[source]
classmethod is_JSONified_value(value)[source]
tmp_dir = 'database/tmp'
to_JSON()[source]
class galaxy.datatypes.metadata.JobExternalOutputMetadataWrapper(job)[source]

Bases: object

Class with methods allowing set_meta() to be called externally to the Galaxy head. This class allows access to external metadata filenames for all outputs associated with a job. We will use JSON as the medium of exchange, except for the DatasetInstance object, which will use pickle (in the future this could be JSONified as well).

__init__(job)[source]
cleanup_external_metadata(sa_session)[source]
external_metadata_set_successfully(dataset, sa_session)[source]
get_dataset_metadata_key(dataset)[source]
get_output_filenames_by_dataset(dataset, sa_session)[source]
invalidate_external_metadata(datasets, sa_session)[source]
set_job_runner_external_pid(pid, sa_session)[source]
setup_external_metadata(datasets, sa_session, exec_dir=None, tmp_dir=None, dataset_files_path=None, output_fnames=None, config_root=None, config_file=None, datatypes_config=None, job_metadata=None, compute_tmp_dir=None, include_command=True, max_metadata_value_size=0, kwds=None)[source]

galaxy.datatypes.msa module

class galaxy.datatypes.msa.Hmmer(**kwd)[source]

Bases: galaxy.datatypes.data.Text

edam_data = 'data_1364'
edam_format = 'format_1370'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.msa.Hmmer2(**kwd)[source]

Bases: galaxy.datatypes.msa.Hmmer

edam_format = 'format_3328'
file_ext = 'hmm2'
sniff(filename)[source]

HMMER2 files start with HMMER2.0

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.msa.Hmmer3(**kwd)[source]

Bases: galaxy.datatypes.msa.Hmmer

edam_format = 'format_3329'
file_ext = 'hmm3'
sniff(filename)[source]

HMMER3 files start with HMMER3/f

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
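Both sniffers above boil down to a magic-string check on the first line: HMMER2 files start with HMMER2.0 and HMMER3 files start with HMMER3/f. The standalone helper below mirrors that logic; it is an illustrative sketch, not the actual Galaxy methods.

```python
# Illustrative: classify a HMMER profile by its first-line magic string,
# as described in the Hmmer2/Hmmer3 sniff docstrings.
def sniff_hmmer_version(first_line):
    if first_line.startswith('HMMER2.0'):
        return 'hmm2'
    if first_line.startswith('HMMER3/f'):
        return 'hmm3'
    return None
```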
class galaxy.datatypes.msa.HmmerPress(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for hmmpress database files.

file_ext = 'hmmpress'
allow_datatype_change = False
composite_type = 'basic'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text.

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.msa.Stockholm_1_0(**kwd)[source]

Bases: galaxy.datatypes.data.Text

edam_data = 'data_0863'
edam_format = 'format_1961'
file_ext = 'stockholm'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
set_meta(dataset, **kwd)[source]

Set the number of models in dataset.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by model records.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', number_of_models (MetadataParameter): Number of multiple alignments, defaults to '0'
class galaxy.datatypes.msa.MauveXmfa(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'xmfa'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', number_of_models (MetadataParameter): Number of aligned sequences, defaults to '0'
set_meta(dataset, **kwd)[source]

galaxy.datatypes.ngsindex module

NGS indexes

class galaxy.datatypes.ngsindex.BowtieIndex(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class for Bowtie indexes; subclassed by BowtieColorIndex and BowtieBaseIndex.

is_binary = True
composite_type = 'auto_primary_file'
allow_datatype_change = False
generate_primary_file(dataset=None)[source]

This is called only at upload, to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with default names.

regenerate_primary_file(dataset)[source]

Cannot do this until metadata is being set.

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'unknown'
class galaxy.datatypes.ngsindex.BowtieColorIndex(**kwd)[source]

Bases: galaxy.datatypes.ngsindex.BowtieIndex

Bowtie color space index

file_ext = 'bowtie_color_index'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'color'
class galaxy.datatypes.ngsindex.BowtieBaseIndex(**kwd)[source]

Bases: galaxy.datatypes.ngsindex.BowtieIndex

Bowtie base space index

file_ext = 'bowtie_base_index'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'base'

galaxy.datatypes.proteomics module

Proteomics Datatypes

class galaxy.datatypes.proteomics.Wiff(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for wiff files.

edam_data = 'data_2536'
edam_format = 'format_3710'
file_ext = 'wiff'
allow_datatype_change = False
composite_type = 'auto_primary_file'
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.proteomics.PepXmlReport(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

pepxml converted to tabular report

edam_data = 'data_2536'
file_ext = 'pepxml.tsv'
__init__(**kwd)[source]
display_peek(dataset)[source]

Returns formatted HTML of the peek.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.proteomics.ProtXmlReport(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

protxml converted to tabular report

edam_data = 'data_2536'
file_ext = 'protxml.tsv'
comment_lines = 1
__init__(**kwd)[source]
display_peek(dataset)[source]

Returns formatted HTML of the peek.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.proteomics.ProteomicsXml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

An enhanced XML datatype used to reuse code across several proteomic/mass-spec datatypes.

edam_data = 'data_2536'
edam_format = 'format_2032'
sniff(filename)[source]

Determines whether the file is the correct XML type.

set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
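Each ProteomicsXml subclass below supplies a `root` attribute (sometimes a regex alternation, e.g. '(mzML|indexedmzML)'). The shared sniffing idea is that a file matches when the name of its first element tag matches that pattern. The helper below is a hedged sketch of that idea; the real implementation differs in detail.

```python
import re

# Illustrative: does the first XML element tag match the datatype's
# `root` pattern? Processing instructions and comments are skipped first.
def matches_root(xml_text, root_pattern):
    stripped = re.sub(r'<\?.*?\?>|<!--.*?-->', '', xml_text, flags=re.S)
    m = re.search(r'<(\w+)', stripped)
    return bool(m and re.match(root_pattern + '$', m.group(1)))
```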
class galaxy.datatypes.proteomics.PepXml(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

pepXML data

edam_format = 'format_3655'
file_ext = 'pepxml'
blurb = 'pepXML data'
root = 'msms_pipeline_analysis'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.MzML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

mzML data

edam_format = 'format_3244'
file_ext = 'mzml'
blurb = 'mzML Mass Spectrometry data'
root = '(mzML|indexedmzML)'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.ProtXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

protXML data

file_ext = 'protxml'
blurb = 'prot XML Search Results'
root = 'protein_summary'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.MzXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

mzXML data

edam_format = 'format_3654'
file_ext = 'mzxml'
blurb = 'mzXML Mass Spectrometry data'
root = 'mzXML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.MzData(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

mzData data

edam_format = 'format_3245'
file_ext = 'mzdata'
blurb = 'mzData Mass Spectrometry data'
root = 'mzData'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.MzIdentML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

edam_format = 'format_3247'
file_ext = 'mzid'
blurb = 'XML identified peptides and proteins.'
root = 'MzIdentML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.TraML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

edam_format = 'format_3246'
file_ext = 'traml'
blurb = 'TraML transition list'
root = 'TraML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.MzQuantML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

edam_format = 'format_3248'
file_ext = 'mzq'
blurb = 'XML quantification data'
root = 'MzQuantML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.ConsensusXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

file_ext = 'consensusxml'
blurb = 'OpenMS multiple LC-MS map alignment file'
root = 'consensusXML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.FeatureXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

file_ext = 'featurexml'
blurb = 'OpenMS feature file'
root = 'featureMap'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.IdXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

file_ext = 'idxml'
blurb = 'OpenMS identification file'
root = 'IdXML'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.TandemXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

edam_format = 'format_3711'
file_ext = 'tandem'
blurb = 'X!Tandem search results file'
root = 'bioml'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.UniProtXML(**kwd)[source]

Bases: galaxy.datatypes.proteomics.ProteomicsXml

file_ext = 'uniprotxml'
blurb = 'UniProt Proteome file'
root = 'uniprot'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.Mgf(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Mascot Generic Format data

edam_data = 'data_2536'
edam_format = 'format_3651'
file_ext = 'mgf'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
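MGF spectra are delimited by BEGIN IONS / END IONS blocks, preceded by optional KEY=value parameter lines, so a quick plausibility check is to scan the top of the file for the first BEGIN IONS. This is an illustrative stand-in for the sniffer, not the exact Galaxy logic; `looks_like_mgf` is a hypothetical helper.

```python
# Illustrative MGF check: accept a file whose early lines are parameters
# (KEY=value), comments, or blanks leading up to a BEGIN IONS marker.
def looks_like_mgf(lines, max_scan=100):
    for line in lines[:max_scan]:
        line = line.strip()
        if line == 'BEGIN IONS':
            return True
        if line and not (line.startswith('#') or '=' in line):
            return False   # something other than a header parameter
    return False
```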
class galaxy.datatypes.proteomics.MascotDat(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Mascot search results

edam_data = 'data_2536'
edam_format = 'format_3713'
file_ext = 'mascotdat'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.ThermoRAW(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a Thermo Finnigan binary RAW file

edam_data = 'data_2536'
edam_format = 'format_3712'
file_ext = 'raw'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.proteomics.Msp(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Output of the NIST MS Search Program ( chemdata.nist.gov/mass-spc/ftp/mass-spc/PepLib.pdf ).

file_ext = 'msp'
static next_line_starts_with(contents, prefix)[source]
sniff(filename)[source]

Determines whether the file is a NIST MSP output file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.SPLibNoIndex(**kwd)[source]

Bases: galaxy.datatypes.data.Text

SPlib without index file

file_ext = 'splib_noindex'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.SPLib(**kwd)[source]

Bases: galaxy.datatypes.proteomics.Msp

SpectraST Spectral Library. Closely related to msp format

file_ext = 'splib'
composite_type = 'auto_primary_file'
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Determines whether the file is a SpectraST generated file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.Ms2(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'ms2'
sniff(filename)[source]

Determines whether the file is a valid ms2 file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.proteomics.XHunterAslFormat(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Annotated Spectra in the HLF format http://www.thegpm.org/HUNTER/format_2006_09_15.html

file_ext = 'hlf'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.proteomics.Sf3(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing Scaffold SF3 files.

file_ext = 'sf3'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'

galaxy.datatypes.qualityscore module

Qualityscore class

class galaxy.datatypes.qualityscore.QualityScore(**kwd)[source]

Bases: galaxy.datatypes.data.Text

until we know more about quality score formats

edam_data = 'data_2048'
edam_format = 'format_3606'
file_ext = 'qual'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.qualityscore.QualityScoreSOLiD(**kwd)[source]

Bases: galaxy.datatypes.qualityscore.QualityScore

until we know more about quality score formats

edam_format = 'format_3610'
file_ext = 'qualsolid'
sniff(filename)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScoreSOLiD().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qualsolid' )
>>> QualityScoreSOLiD().sniff( fname )
True
set_meta(dataset, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.qualityscore.QualityScore454(**kwd)[source]

Bases: galaxy.datatypes.qualityscore.QualityScore

until we know more about quality score formats

edam_format = 'format_3611'
file_ext = 'qual454'
sniff(filename)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScore454().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qual454' )
>>> QualityScore454().sniff( fname )
True
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.qualityscore.QualityScoreSolexa(**kwd)[source]

Bases: galaxy.datatypes.qualityscore.QualityScore

until we know more about quality score formats

edam_format = 'format_3608'
file_ext = 'qualsolexa'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.qualityscore.QualityScoreIllumina(**kwd)[source]

Bases: galaxy.datatypes.qualityscore.QualityScore

until we know more about quality score formats

edam_format = 'format_3609'
file_ext = 'qualillumina'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'

galaxy.datatypes.registry module

Provides mapping between extensions and datatypes, mime-types, etc.

exception galaxy.datatypes.registry.ConfigurationError[source]

Bases: exceptions.Exception

class galaxy.datatypes.registry.Registry(config=None)[source]

Bases: object

__init__(config=None)[source]
load_datatypes(root_dir=None, config=None, deactivate=False, override=True)[source]

Parse a datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom datatypes is being deactivated or uninstalled, so appropriate loaded datatypes will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting data types.

get_legacy_sites_by_build(site_type, build)[source]
get_display_sites(site_type)[source]
load_datatype_sniffers(root, deactivate=False, handling_proprietary_datatypes=False, override=False)[source]

Process the sniffers element from a parsed datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom sniffers is being deactivated or uninstalled, so appropriate loaded sniffers will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting sniffers.

get_datatype_class_by_name(name)[source]

Return the datatype class where the datatype’s type attribute (as defined in the datatypes_conf.xml file) contains name.

get_available_tracks()[source]
get_mimetype_by_extension(ext, default='application/octet-stream')[source]

Returns a mimetype based on an extension

get_datatype_by_extension(ext)[source]

Returns a datatype based on an extension
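The extension-lookup contract described above (unknown extensions fall back to the supplied default) can be sketched with a toy registry. The real Registry is populated from the datatypes XML config, not a literal dict; `MiniRegistry` and its sample entries are invented for illustration.

```python
# Toy sketch of the extension-to-mimetype lookup contract.
class MiniRegistry:
    def __init__(self):
        # In Galaxy this mapping is built from the datatypes config.
        self.mimetypes_by_extension = {
            'bed': 'text/plain',
            'bam': 'application/octet-stream',
        }

    def get_mimetype_by_extension(self, ext, default='application/octet-stream'):
        return self.mimetypes_by_extension.get(ext, default)

reg = MiniRegistry()
```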

change_datatype(data, ext)[source]
load_datatype_converters(toolbox, installed_repository_dict=None, deactivate=False, use_cached=False)[source]

If deactivate is False, add datatype converters from self.converters or self.proprietary_converters to the calling app’s toolbox. If deactivate is True, eliminates relevant converters from the calling app’s toolbox.

load_display_applications(app, installed_repository_dict=None, deactivate=False)[source]

If deactivate is False, add display applications from self.display_app_containers or self.proprietary_display_app_containers to appropriate datatypes. If deactivate is True, eliminates relevant display applications from appropriate datatypes.

reload_display_applications(display_application_ids=None)[source]

Reloads display applications by id, or all if no ids are provided. Returns tuple( [reloaded_ids], [failed_ids] ).

load_external_metadata_tool(toolbox)[source]

Adds a tool which is used to set external metadata

set_default_values()[source]
get_converters_by_datatype(ext)[source]

Returns available converters by source type

get_converter_by_target_type(source_ext, target_ext)[source]

Returns a converter based on source and target datatypes

find_conversion_destination_for_dataset_by_extensions(dataset, accepted_formats, converter_safe=True)[source]

Returns ( target_ext, existing converted dataset )

get_composite_extensions()[source]
get_upload_metadata_params(context, group, tool)[source]

Returns dict of case value:inputs for metadata conditional for upload tool

edam_formats
edam_data
integrated_datatypes_configs
to_xml_file()[source]
get_extension(elem)[source]

Returns the extension of the given element, lowercased :param elem: :return extension:

galaxy.datatypes.sequence module

Sequence classes

class galaxy.datatypes.sequence.SequenceSplitLocations(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class storing information about a sequence file composed of multiple gzip files concatenated as one OR an uncompressed file. In the GZIP case, each sub-file’s location is stored in start and end.

The format of the file is JSON:

{ "sections" : [
        { "start" : "x", "end" : "y", "sequences" : "z" },
        ...
]}
file_ext = 'fqtoc'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.sequence.Sequence(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a sequence

edam_data = 'data_2044'

Add metadata elements

set_meta(dataset, **kwd)[source]

Set the number of sequences and the number of data lines in dataset.

set_peek(dataset, is_multi_byte=False)[source]
static get_sequences_per_file(total_sequences, split_params)[source]
classmethod do_slow_split(input_datasets, subdir_generator_function, split_params)[source]
classmethod do_fast_split(input_datasets, toc_file_datasets, subdir_generator_function, split_params)[source]
classmethod write_split_files(input_datasets, toc_file_datasets, subdir_generator_function, sequences_per_file)[source]
split(input_datasets, subdir_generator_function, split_params)[source]

Split a generic sequence file (not sensible or possible, see subclasses).

static get_split_commands_with_toc(input_name, output_name, toc_file, start_sequence, sequence_count)[source]

Uses a Table of Contents dict, parsed from an FQTOC file, to come up with a set of shell commands that will extract the parts necessary

>>> three_sections = [dict(start=0, end=74, sequences=10), dict(start=74, end=148, sequences=10), dict(start=148, end=148+76, sequences=10)]
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=10)
['dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=1, sequence_count=5)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +5 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=20)
['dd bs=1 skip=0 count=148 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=10)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', '(dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=10, sequence_count=10)
['dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=20)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', 'dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz', '(dd bs=1 skip=148 count=76 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
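The dd commands above exploit the fact that each TOC section is an independent gzip member: when a requested sequence range lines up exactly with whole sections, the bytes can be copied verbatim with no decompression round trip. A simplified sketch of that whole-section case (an illustration, not Galaxy's implementation):

```python
def whole_section_command(input_name, output_name, section):
    """Build a dd command that copies one complete TOC section
    (a byte range [start, end) of a concatenated-gzip file) verbatim
    onto the output. Only valid when the requested sequences span
    the section exactly; partial sections require the zcat/tail/head
    pipeline shown in the doctests above.
    """
    start, end = section["start"], section["end"]
    return ("dd bs=1 skip=%d count=%d if=%s 2> /dev/null >> %s"
            % (start, end - start, input_name, output_name))
```

Because the copied bytes are already a valid gzip stream, appending several such sections still yields a readable .gz file.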

static get_split_commands_sequential(is_compressed, input_name, output_name, start_sequence, sequence_count)[source]

Does a brain-dead sequential scan & extract of certain sequences

>>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
>>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.Alignment(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing an alignment

edam_data = 'data_0863'

Add metadata elements

split(input_datasets, subdir_generator_function, split_params)[source]

Split a generic alignment file (not sensible or possible, see subclasses).

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', species (SelectParameter): Species, defaults to '[]'
class galaxy.datatypes.sequence.Fasta(**kwd)[source]

Bases: galaxy.datatypes.sequence.Sequence

Class representing a FASTA sequence

edam_format = 'format_1929'
file_ext = 'fasta'
sniff(filename)[source]

Determines whether the file is in fasta format

A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (“>”) symbol in the first column. All lines should be shorter than 80 characters.

For complete details see http://www.ncbi.nlm.nih.gov/blast/fasta.shtml

Rules for sniffing as True:

We don’t care about line length (other than empty lines).

The first non-empty line must start with ‘>’ and the Very Next line.strip() must have sequence data and not be a header.

‘sequence data’ here is loosely defined as non-empty lines which do not start with ‘>’

This will cause Color Space FASTA (csfasta) to be detected as True (they are, after all, still FASTA files - they have a header line followed by sequence data)

Previously this method did some checking to determine if the sequence data had integers (presumably to differentiate between fasta and csfasta)

This should be done through sniff order, where csfasta (which currently has a null sniff function) is detected first (stricter definition), followed sometime after by fasta

We will only check that the first purported sequence is correctly formatted.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Fasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Fasta().sniff( fname )
True
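The sniffing rules listed above reduce to a short check on the first two non-empty lines. A minimal sketch of that logic (an assumption for illustration, not Galaxy's exact implementation):

```python
def looks_like_fasta(lines):
    """Apply the FASTA sniff rules described above: the first
    non-empty line must start with '>', and the very next non-empty
    line must be sequence data (non-empty and not another header).
    Line length is not checked.
    """
    stripped = [ln.strip() for ln in lines if ln.strip()]
    if len(stripped) < 2:
        return False
    return stripped[0].startswith('>') and not stripped[1].startswith('>')
```

As noted above, this loose definition also accepts Color Space FASTA, which is why csfasta must be sniffed earlier in the sniff order.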
classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split a FASTA file sequence by sequence.

Note that even if split_mode=”number_of_parts”, the actual number of sub-files produced may not match that requested by split_size.

If split_mode=”to_size” then split_size is treated as the number of FASTA records to put in each sub-file (not size in bytes).

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.csFasta(**kwd)[source]

Bases: galaxy.datatypes.sequence.Sequence

Class representing the SOLID Color-Space sequence ( csfasta )

edam_format = 'format_3589'
file_ext = 'csfasta'
sniff(filename)[source]
Color-space sequence:
>2_15_85_F3 T213021013012303002332212012112221222112212222
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> csFasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> csFasta().sniff( fname )
True
set_meta(dataset, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.BaseFastq(**kwd)[source]

Bases: galaxy.datatypes.sequence.Sequence

Base class for FastQ sequences

edam_format = 'format_1930'
file_ext = 'fastq'
set_meta(dataset, **kwd)[source]

Set the number of sequences and the number of data lines in dataset. FIXME: This does not properly handle line wrapping

sniff(filename)[source]

Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml

Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina. These differ in the representation of the quality scores.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.fastqsanger' )
>>> Fastq().sniff( fname )
True
>>> fname = get_test_fname( '2.fastqsanger' )
>>> Fastq().sniff( fname )
True
display_data(trans, dataset, preview=False, filename=None, to_ext=None, **kwd)[source]
classmethod split(input_datasets, subdir_generator_function, split_params)[source]

FASTQ files are split on cluster boundaries, in increments of 4 lines

static process_split_file(data)[source]

This is called in the context of an external process launched by a Task (possibly not on the Galaxy machine) to create the input files for the Task. The parameters: data - a dict containing the contents of the split file

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
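Splitting "on cluster boundaries, in increments of 4 lines" means every FASTQ record (@id, sequence, '+', quality) stays intact across sub-files. A hedged sketch of the record grouping that the split relies on (illustrative only, not Galaxy's implementation):

```python
def fastq_records(lines):
    """Yield successive 4-line FASTQ records from a list of lines.

    Any trailing partial record (fewer than 4 lines) is dropped;
    this sketch does not handle the line-wrapped FASTQ variants
    that the set_meta FIXME above refers to.
    """
    for i in range(0, len(lines) - len(lines) % 4, 4):
        yield lines[i:i + 4]
```

Cutting on any boundary that is not a multiple of 4 would pair sequence lines with the wrong quality lines, which is why the split size is always rounded to whole records.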
class galaxy.datatypes.sequence.Fastq(**kwd)[source]

Bases: galaxy.datatypes.sequence.BaseFastq

Class representing a generic FASTQ sequence

edam_format = 'format_1930'
file_ext = 'fastq'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSanger(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fastq

Class representing a FASTQ sequence ( the Sanger variant )

edam_format = 'format_1932'
file_ext = 'fastqsanger'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSolexa(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fastq

Class representing a FASTQ sequence ( the Solexa variant )

edam_format = 'format_1933'
file_ext = 'fastqsolexa'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqIllumina(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fastq

Class representing a FASTQ sequence ( the Illumina 1.3+ variant )

edam_format = 'format_1931'
file_ext = 'fastqillumina'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqCSSanger(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fastq

Class representing a Color Space FASTQ sequence ( e.g. a SOLiD variant )

file_ext = 'fastqcssanger'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqGz(**kwd)[source]

Bases: galaxy.datatypes.sequence.BaseFastq, galaxy.datatypes.binary.Binary

Class representing a generic compressed FASTQ sequence

edam_format = 'format_1930'
file_ext = 'fastq.gz'
compressed = True
sniff(filename)[source]

Determines whether the file is in gzip-compressed FASTQ format

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSangerGz(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqGz

Class representing a compressed FASTQ sequence ( the Sanger variant )

edam_format = 'format_1932'
file_ext = 'fastqsanger.gz'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSolexaGz(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqGz

Class representing a compressed FASTQ sequence ( the Solexa variant )

edam_format = 'format_1933'
file_ext = 'fastqsolexa.gz'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqIlluminaGz(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqGz

Class representing a compressed FASTQ sequence ( the Illumina 1.3+ variant )

edam_format = 'format_1931'
file_ext = 'fastqillumina.gz'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqCSSangerGz(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqGz

Class representing a Color Space compressed FASTQ sequence ( e.g. a SOLiD variant )

file_ext = 'fastqcssanger.gz'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqBz2(**kwd)[source]

Bases: galaxy.datatypes.sequence.BaseFastq, galaxy.datatypes.binary.Binary

Class representing a generic compressed FASTQ sequence

edam_format = 'format_1930'
file_ext = 'fastq.bz2'
compressed = True
sniff(filename)[source]

Determine whether the file is in bzip2-compressed FASTQ format

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSangerBz2(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqBz2

Class representing a compressed FASTQ sequence ( the Sanger variant )

edam_format = 'format_1932'
file_ext = 'fastqsanger.bz2'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqSolexaBz2(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqBz2

Class representing a compressed FASTQ sequence ( the Solexa variant )

edam_format = 'format_1933'
file_ext = 'fastqsolexa.bz2'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqIlluminaBz2(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqBz2

Class representing a compressed FASTQ sequence ( the Illumina 1.3+ variant )

edam_format = 'format_1931'
file_ext = 'fastqillumina.bz2'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.FastqCSSangerBz2(**kwd)[source]

Bases: galaxy.datatypes.sequence.FastqBz2

Class representing a Color Space compressed FASTQ sequence ( e.g. a SOLiD variant )

file_ext = 'fastqcssanger.bz2'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
class galaxy.datatypes.sequence.Maf(**kwd)[source]

Bases: galaxy.datatypes.sequence.Alignment

Class describing a Maf alignment

edam_format = 'format_3008'
file_ext = 'maf'
init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Parses and sets species, chromosomes, index from MAF file.

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]

Returns formatted html of peek

make_html_table(dataset, skipchars=[])[source]

Create HTML table, used for displaying peek

sniff(filename)[source]

Determines whether the file is in maf format

The .maf format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.

The first line of a .maf file begins with ##maf. This word is followed by white-space-separated variable=value pairs. There should be no white space surrounding the “=”.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format5

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Maf().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Maf().sniff( fname )
False
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', species (SelectParameter): Species, defaults to '[]', blocks (MetadataParameter): Number of blocks, defaults to '0', species_chromosomes (FileParameter): Species Chromosomes, defaults to 'None', maf_index (FileParameter): MAF Index File, defaults to 'None'
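The MAF description above pins down the first line precisely: it must begin with ##maf, followed by white-space-separated variable=value pairs. A minimal check along those lines (an illustrative sketch, not the registry's sniffer):

```python
def looks_like_maf(first_line):
    """Check a candidate first line against the rules above:
    it starts with '##maf' and every following white-space-separated
    field is a variable=value pair (no space around '=').
    """
    if not first_line.startswith('##maf'):
        return False
    fields = first_line.split()
    return all('=' in f for f in fields[1:])
```

Because MAF data lines are otherwise free-form, the first line carries nearly all of the sniffing signal.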
class galaxy.datatypes.sequence.MafCustomTrack(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'mafcustomtrack'
set_meta(dataset, overwrite=True, **kwd)[source]

Parses and sets viewport metadata from MAF file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', vp_chromosome (MetadataParameter): Viewport Chromosome, defaults to 'chr1', vp_start (MetadataParameter): Viewport Start, defaults to '1', vp_end (MetadataParameter): Viewport End, defaults to '100'
class galaxy.datatypes.sequence.Axt(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing an axt alignment

edam_data = 'data_0863'
edam_format = 'format_3013'
file_ext = 'axt'
sniff(filename)[source]

Determines whether the file is in axt format

axt alignment files are produced from Blastz, an alignment tool available from Webb Miller’s lab at Penn State University.

Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines.

The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.

The sequence lines contain the sequence of the primary assembly (line 2) and aligning assembly (line 3) with inserts. Repeats are indicated by lower-case letters.

For complete details see http://genome.ucsc.edu/goldenPath/help/axt.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.axt' )
>>> Axt().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.lav' )
>>> Axt().sniff( fname )
False
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.sequence.Lav(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a LAV alignment

edam_data = 'data_0863'
edam_format = 'format_3014'
file_ext = 'lav'
sniff(filename)[source]

Determines whether the file is in lav format

LAV is an alignment format developed by Webb Miller’s group. It is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.

For complete details see http://www.bioperl.org/wiki/LAV_alignment_format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.lav' )
>>> Lav().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.axt' )
>>> Lav().sniff( fname )
False
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.sequence.RNADotPlotMatrix(**kwd)[source]

Bases: galaxy.datatypes.data.Data

edam_format = 'format_3466'
file_ext = 'rna_eps'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Determine if the file is in RNA dot plot format.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.sequence.DotBracket(**kwd)[source]

Bases: galaxy.datatypes.sequence.Sequence

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'
edam_data = 'data_0880'
edam_format = 'format_1457'
file_ext = 'dbn'
sequence_regexp = <_sre.SRE_Pattern object>
structure_regexp = <_sre.SRE_Pattern object>
set_meta(dataset, **kwd)[source]

Set the number of sequences and the number of data lines in dataset.

sniff(filename)[source]

Galaxy Dbn (Dot-Bracket notation) rules:

  • The first non-empty line is a header line: no comment lines are allowed.

    • A header line starts with a ‘>’ symbol and continues with 0 or multiple symbols until the line ends.
  • The second non-empty line is a sequence line.

  • The third non-empty line is a structure (Dot-Bracket) line and only describes the 2D structure of the sequence above it.

    • A structure line must consist of the following chars: ‘.{}[]()’.
    • A structure line must be of the same length as the sequence line, and each char represents the structure of the nucleotide above it.
    • A structure line has no prefix and no suffix.
    • A nucleotide pairs with only 1 or 0 other nucleotides.
      • In a structure line, the number of ‘(‘ symbols equals the number of ‘)’ symbols, the number of ‘[‘ symbols equals the number of ‘]’ symbols and the number of ‘{‘ symbols equals the number of ‘}’ symbols.
  • The format accepts multiple entries per file, given that each entry is provided as three lines: the header, sequence and structure line.

    • Sniffing is only applied on the first entry.
  • Empty lines are allowed.
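The structure-line rules above can be condensed into a short validity check. This is a hypothetical helper sketching those rules, not the actual sniffer (in particular, it checks bracket counts rather than proper nesting):

```python
def valid_structure(structure, sequence):
    """Apply the Dot-Bracket structure-line rules listed above:
    same length as the sequence line, only '.{}[]()' characters,
    and matching counts for each bracket pair.
    """
    if len(structure) != len(sequence):
        return False
    if set(structure) - set('.{}[]()'):
        return False
    return all(structure.count(o) == structure.count(c)
               for o, c in ('()', '[]', '{}'))
```

Since sniffing is only applied to the first entry, a real sniffer would run a check like this on the first header/sequence/structure triple only.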

class galaxy.datatypes.sequence.Genbank(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class representing a Genbank sequence

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
edam_format = 'format_1936'
edam_data = 'data_0849'
file_ext = 'genbank'
sniff(filename)[source]

galaxy.datatypes.sniff module

File format detector

galaxy.datatypes.sniff.get_test_fname(fname)[source]

Returns test data filename

galaxy.datatypes.sniff.stream_to_open_named_file(stream, fd, filename, source_encoding=None, source_error='strict', target_encoding=None, target_error='strict')[source]

Writes a stream to the provided file descriptor, returns the file’s name and bool( is_multi_byte ). Closes the file descriptor.

galaxy.datatypes.sniff.stream_to_file(stream, suffix='', prefix='', dir=None, text=False, **kwd)[source]

Writes a stream to a temporary file, returns the temporary file’s name

galaxy.datatypes.sniff.check_newlines(fname, bytes_to_read=52428800)[source]

Determines if there are any non-POSIX newlines in the first bytes_to_read bytes (by default, 50MB) of the file.

galaxy.datatypes.sniff.convert_newlines(fname, in_place=True, tmp_dir=None, tmp_prefix=None)[source]

Converts in place a file from universal line endings to POSIX line endings.

>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("1 2\r3 4")
>>> convert_newlines(fname, tmp_prefix="gxtest", tmp_dir=tempfile.gettempdir())
(2, None)
>>> open(fname).read()
'1 2\n3 4\n'
galaxy.datatypes.sniff.sep2tabs(fname, in_place=True, patt='\\s+')[source]

Transforms in place a ‘sep’ separated file to a tab separated one

>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("1 2\n3 4\n")
>>> sep2tabs(fname)
(2, None)
>>> open(fname).read()
'1\t2\n3\t4\n'
galaxy.datatypes.sniff.convert_newlines_sep2tabs(fname, in_place=True, patt='\\s+', tmp_dir=None, tmp_prefix=None)[source]

Combines above methods: convert_newlines() and sep2tabs() so that files do not need to be read twice

>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("1 2\r3 4")
>>> convert_newlines_sep2tabs(fname, tmp_prefix="gxtest", tmp_dir=tempfile.gettempdir())
(2, None)
>>> open(fname).read()
'1\t2\n3\t4\n'
galaxy.datatypes.sniff.get_headers(fname, sep, count=60, is_multi_byte=False)[source]

Returns a list with the first ‘count’ lines split by ‘sep’

>>> fname = get_test_fname('complete.bed')
>>> get_headers(fname,'\t')
[['chr7', '127475281', '127491632', 'NM_000230', '0', '+', '127486022', '127488767', '0', '3', '29,172,3225,', '0,10713,13126,'], ['chr7', '127486011', '127488900', 'D49487', '0', '+', '127486022', '127488767', '0', '2', '155,490,', '0,2399']]
galaxy.datatypes.sniff.is_column_based(fname, sep='\t', skip=0, is_multi_byte=False)[source]

Checks whether the file is column based with respect to a separator (defaults to tab separator).

>>> fname = get_test_fname('test.gff')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab.bed')
>>> is_column_based(fname)
True
>>> is_column_based(fname, sep=' ')
False
>>> fname = get_test_fname('test_space.txt')
>>> is_column_based(fname)
False
>>> is_column_based(fname, sep=' ')
True
>>> fname = get_test_fname('test_ensembl.tab')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname, sep=' ', skip=0)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname)
True
galaxy.datatypes.sniff.guess_ext(fname, sniff_order, is_multi_byte=False)[source]

Returns an extension that can be used in the datatype factory to generate a data for the ‘fname’ file

>>> from galaxy.datatypes import registry
>>> sample_conf = os.path.join(util.galaxy_directory(), "config", "datatypes_conf.xml.sample")
>>> datatypes_registry = registry.Registry()
>>> datatypes_registry.load_datatypes(root_dir=util.galaxy_directory(), config=sample_conf)
>>> sniff_order = datatypes_registry.sniff_order
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> guess_ext(fname, sniff_order)
'blastxml'
>>> fname = get_test_fname('interval.interval')
>>> guess_ext(fname, sniff_order)
'interval'
>>> fname = get_test_fname('interval1.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('test_tab.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('sequence.maf')
>>> guess_ext(fname, sniff_order)
'maf'
>>> fname = get_test_fname('sequence.fasta')
>>> guess_ext(fname, sniff_order)
'fasta'
>>> fname = get_test_fname('file.html')
>>> guess_ext(fname, sniff_order)
'html'
>>> fname = get_test_fname('test.gtf')
>>> guess_ext(fname, sniff_order)
'gtf'
>>> fname = get_test_fname('test.gff')
>>> guess_ext(fname, sniff_order)
'gff'
>>> fname = get_test_fname('gff_version_3.gff')
>>> guess_ext(fname, sniff_order)
'gff3'
>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("a\t2")
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("a\t2\nc\t1\nd\t0")
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('temp.txt')
>>> open(fname, 'wt').write("a 1 2 x\nb 3 4 y\nc 5 6 z")
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('test_tab1.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('alignment.lav')
>>> guess_ext(fname, sniff_order)
'lav'
>>> fname = get_test_fname('1.sff')
>>> guess_ext(fname, sniff_order)
'sff'
>>> fname = get_test_fname('1.bam')
>>> guess_ext(fname, sniff_order)
'bam'
>>> fname = get_test_fname('3unsorted.bam')
>>> guess_ext(fname, sniff_order)
'bam'
>>> fname = get_test_fname('test.idpDB')
>>> guess_ext(fname, sniff_order)
'idpdb'
>>> fname = get_test_fname('test.mz5')
>>> guess_ext(fname, sniff_order)
'h5'
>>> fname = get_test_fname('issue1818.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> guess_ext(fname, sniff_order)
'cml'
>>> fname = get_test_fname('q.fps')
>>> guess_ext(fname, sniff_order)
'fps'
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> guess_ext(fname, sniff_order)
'inchi'
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> guess_ext(fname, sniff_order)
'mol2'
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> guess_ext(fname, sniff_order)
'sdf'
>>> fname = get_test_fname('5e5z.pdb')
>>> guess_ext(fname, sniff_order)
'pdb'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.otu')
>>> guess_ext(fname, sniff_order)
'mothur.otu'
>>> fname = get_test_fname('1.gg')
>>> guess_ext(fname, sniff_order)
'gg'
>>> fname = get_test_fname('diamond_db.dmnd')
>>> guess_ext(fname, sniff_order)
'dmnd'
galaxy.datatypes.sniff.handle_compressed_file(filename, datatypes_registry, ext='auto')[source]
galaxy.datatypes.sniff.handle_uploaded_dataset_file(filename, datatypes_registry, ext='auto', is_multi_byte=False)[source]
exception galaxy.datatypes.sniff.InappropriateDatasetContentError[source]

Bases: exceptions.Exception

galaxy.datatypes.tabular module

Tabular datatype

class galaxy.datatypes.tabular.TabularData(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Generic tabular data

edam_format = 'format_3475'
CHUNKABLE = True

Add metadata elements

set_meta(dataset, **kwd)[source]
set_peek(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=None)[source]
displayable(dataset)[source]
get_chunk(trans, dataset, offset=0, ck_size=None)[source]
display_data(trans, dataset, preview=False, filename=None, to_ext=None, offset=None, ck_size=None, **kwd)[source]
make_html_table(dataset, **kwargs)[source]

Create HTML table, used for displaying peek

make_html_peek_header(dataset, skipchars=None, column_names=None, column_number_format='%s', column_parameter_alias=None, **kwargs)[source]
make_html_peek_rows(dataset, skipchars=None, **kwargs)[source]
display_peek(dataset)[source]

Returns formatted html of peek

column_dataprovider(*args, **kwargs)[source]

Uses column settings that are passed in

dataset_column_dataprovider(*args, **kwargs)[source]

Attempts to get column settings from dataset.metadata

dict_dataprovider(*args, **kwargs)[source]

Uses column settings that are passed in

dataset_dict_dataprovider(*args, **kwargs)[source]

Attempts to get column settings from dataset.metadata

dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.Tabular(**kwd)[source]

Bases: galaxy.datatypes.tabular.TabularData

Tab delimited data

set_meta(dataset, overwrite=True, skip=None, max_data_lines=100000, max_guess_type_data_lines=None, **kwd)[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
  2. If a tabular file has no data, it will have one column of type ‘str’.
  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
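The column-counting and type-guessing behavior described above can be sketched roughly as follows. This is an illustrative, simplified reimplementation, not Galaxy's actual Tabular.set_meta(); the function name and heuristics are assumptions for demonstration.

```python
# Minimal sketch of tab-delimited metadata guessing in the spirit of
# Tabular.set_meta(); illustrative only, not Galaxy's implementation.
def guess_tabular_metadata(path, skip=0, max_data_lines=100000):
    def guess_type(value):
        for caster, name in ((int, "int"), (float, "float")):
            try:
                caster(value)
                return name
            except ValueError:
                pass
        return "str"

    columns, column_types, data_lines, comment_lines = 0, [], 0, 0
    with open(path) as fh:
        for i, line in enumerate(fh):
            line = line.rstrip("\n\r")
            if i < skip or line.startswith("#"):
                comment_lines += 1
                continue
            fields = line.split("\t")
            if data_lines < max_data_lines:
                columns = max(columns, len(fields))
                types = [guess_type(f) for f in fields]
                # Widen a previously guessed type when a later row disagrees.
                for j, t in enumerate(types):
                    if j >= len(column_types):
                        column_types.append(t)
                    elif column_types[j] != t:
                        column_types[j] = "str"
            data_lines += 1
    return {"columns": columns, "column_types": column_types,
            "data_lines": data_lines, "comment_lines": comment_lines}
```

Note how all data lines are still counted even after max_data_lines is reached, matching the merged line-count behavior described in item 3.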
as_gbrowse_display_file(dataset, **kwd)[source]
as_ucsc_display_file(dataset, **kwd)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.Taxonomy(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

__init__(**kwd)[source]

Initialize taxonomy datatype

display_peek(dataset)[source]

Returns formatted HTML of peek

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.Sam(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

edam_format = 'format_2573'
edam_data = 'data_0863'
file_ext = 'sam'
track_type = 'ReadTrack'
data_sources = {'index': 'bigwig', 'data': 'bam'}
__init__(**kwd)[source]

Initialize the SAM datatype

display_peek(dataset)[source]

Returns formatted HTML of peek

sniff(filename)[source]

Determines whether the file is in SAM format

A file in SAM format consists of lines of tab-separated data. The following header line may be the first line:

@QNAME  FLAG    RNAME   POS     MAPQ    CIGAR   MRNM    MPOS    ISIZE   SEQ     QUAL
or
@QNAME  FLAG    RNAME   POS     MAPQ    CIGAR   MRNM    MPOS    ISIZE   SEQ     QUAL    OPT

Data in the OPT column is optional and can consist of tab-separated data.

For complete details see http://samtools.sourceforge.net/SAM1.pdf

Rules for sniffing as True:

- There must be 11 or more columns of data on each line
- Columns 2 (FLAG), 4 (POS), 5 (MAPQ), 8 (MPOS), and 9 (ISIZE) must be numbers (9 can be negative)
- We will only check that up to the first 5 alignments are correctly formatted.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Sam().sniff( fname )
False
>>> fname = get_test_fname( '1.sam' )
>>> Sam().sniff( fname )
True
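The sniffing rules above can be sketched as a standalone check. This is an illustrative reimplementation, not the actual Sam.sniff(); the function name is an assumption.

```python
# Sketch of the SAM sniffing rules: 11+ tab-separated columns, with
# FLAG, POS, MAPQ and MPOS non-negative integers and ISIZE any integer.
# Illustrative only, not Galaxy's Sam.sniff().
def looks_like_sam(path, max_alignments=5):
    checked = 0
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n\r")
            if not line or line.startswith("@"):  # skip header lines
                continue
            fields = line.split("\t")
            if len(fields) < 11:
                return False
            try:
                # Columns 2, 4, 5, 8 (0-based 1, 3, 4, 7) must be >= 0.
                if any(int(fields[i]) < 0 for i in (1, 3, 4, 7)):
                    return False
                int(fields[8])  # ISIZE may be negative
            except ValueError:
                return False
            checked += 1
            if checked >= max_alignments:
                break
    return checked > 0
```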
set_meta(dataset, overwrite=True, skip=None, max_data_lines=5, **kwd)[source]
static merge(split_files, output_file)[source]

Multiple SAM files may each have headers. Since the headers should all be the same, remove the headers from all files except the first, keeping them in the first file only.
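A minimal sketch of such a header-aware merge is shown below; it is illustrative, not Galaxy's actual Sam.merge(), and the function name is an assumption.

```python
# Concatenate SAM files, keeping '@'-prefixed header lines only from the
# first file. Illustrative sketch, not Galaxy's implementation.
def merge_sam(split_files, output_file):
    with open(output_file, "w") as out:
        for index, path in enumerate(split_files):
            with open(path) as fh:
                for line in fh:
                    # Header lines start with '@'; drop them after file 0.
                    if index > 0 and line.startswith("@"):
                        continue
                    out.write(line)
```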

line_dataprovider(*args, **kwargs)[source]
regex_line_dataprovider(*args, **kwargs)[source]
column_dataprovider(*args, **kwargs)[source]
dataset_column_dataprovider(*args, **kwargs)[source]
dict_dataprovider(*args, **kwargs)[source]
dataset_dict_dataprovider(*args, **kwargs)[source]
header_dataprovider(*args, **kwargs)[source]
id_seq_qual_dataprovider(*args, **kwargs)[source]
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'id-seq-qual': <function id_seq_qual_dataprovider>, 'header': <function header_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.Pileup(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Tab delimited data in pileup (6- or 10-column) format

edam_format = 'format_3015'
file_ext = 'pileup'
line_class = 'genomic coordinate'
data_sources = {'data': 'tabix'}

Add metadata elements

init_meta(dataset, copy_from=None)[source]
display_peek(dataset)[source]

Returns formatted HTML of peek

repair_methods(dataset)[source]

Return options for removing errors along with a description

sniff(filename)[source]

Checks for ‘pileup-ness’

There are two main types of pileup: 6-column and 10-column. For both, the first three and last two columns are the same. We only check the first three to allow for some personalization of the format.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interval.interval' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '6col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '10col.pileup' )
>>> Pileup().sniff( fname )
True
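The "first three columns" check described above can be sketched as follows; this is an illustrative helper, not the actual Pileup.sniff(), and the function name is an assumption.

```python
# Sketch of the pileup check: validate only chrom, 1-based position and
# reference base, so both 6- and 10-column variants pass. Illustrative.
def looks_like_pileup(path, max_lines=5):
    checked = 0
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n\r").split("\t")
            if len(fields) < 6:
                return False
            chrom, pos, base = fields[0], fields[1], fields[2]
            if not chrom or not pos.isdigit():
                return False
            if len(base) != 1 or base.upper() not in "ACGTN*":
                return False
            checked += 1
            if checked >= max_lines:
                break
    return checked > 0
```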
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '2', baseCol (ColumnParameter): Reference base column, defaults to '3'
class galaxy.datatypes.tabular.Vcf(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Variant Call Format for describing SNPs and other simple genome variations.

edam_format = 'format_3016'
track_type = 'VariantTrack'
data_sources = {'index': 'bigwig', 'data': 'tabix'}
file_ext = 'vcf'
column_names = ['Chrom', 'Pos', 'ID', 'Ref', 'Alt', 'Qual', 'Filter', 'Info', 'Format', 'data']
sniff(filename)[source]
display_peek(dataset)[source]

Returns formatted HTML of peek

set_meta(dataset, **kwd)[source]
static merge(split_files, output_file)[source]
genomic_region_dataprovider(*args, **kwargs)[source]
genomic_region_dict_dataprovider(*args, **kwargs)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '10', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'int', 'str', 'str', 'str', 'int', 'str', 'list', 'str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[5]', sample_names (MetadataParameter): Sample names, defaults to '[]'
class galaxy.datatypes.tabular.Eland(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Support for the export.txt.gz file used by Illumina’s ELANDv2e aligner

file_ext = '_export.txt.gz'
__init__(**kwd)[source]

Initialize the ELAND datatype

make_html_table(dataset, skipchars=None)[source]

Create HTML table, used for displaying peek

sniff(filename)[source]

Determines whether the file is in ELAND export format

A file in ELAND export format consists of lines of tab-separated data. There is no header.

Rules for sniffing as True:

- There must be 22 columns on each line
- LANE, TILE, X, Y, INDEX, READ_NO, SEQ, QUAL, POSITION, STRAND, FILT must be correct
- We will only check that up to the first 5 alignments are correctly formatted.
set_meta(dataset, overwrite=True, skip=None, max_data_lines=5, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comments, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' ', tiles (ListParameter): Set of tiles, defaults to '[]', reads (ListParameter): Set of reads, defaults to '[]', lanes (ListParameter): Set of lanes, defaults to '[]', barcodes (ListParameter): Set of barcodes, defaults to '[]'
class galaxy.datatypes.tabular.ElandMulti(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'elandmulti'
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.FeatureLocationIndex(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

An index that stores feature locations in tabular format.

file_ext = 'fli'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '2', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.BaseCSV(**kwd)[source]

Bases: galaxy.datatypes.tabular.TabularData

Delimiter-separated table data. This includes CSV, TSV and other dialects understood by the Python csv module (https://docs.python.org/2/library/csv.html). Must be extended to define the dialect to use, strict_width and file_ext. See the Python csv module documentation for dialect settings.
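A subclass only needs to pin down the dialect and a couple of class attributes. The sketch below shows the shape using the stdlib csv module; the class itself (a hypothetical semicolon-separated variant) is illustrative and not part of Galaxy.

```python
import csv

# Hypothetical BaseCSV-style subclass: attribute names follow the
# description above, but the class itself is illustrative.
class SemicolonCSV:
    file_ext = "scsv"
    strict_width = False

    class dialect(csv.excel):   # reuse the excel dialect settings...
        delimiter = ";"         # ...but split fields on semicolons

    @classmethod
    def parse(cls, path):
        with open(path, newline="") as fh:
            return list(csv.reader(fh, dialect=cls.dialect))
```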

delimiter = ','
peek_size = 1024
big_peek_size = 10240
is_int(column_text)[source]
is_float(column_text)[source]
guess_type(text)[source]
sniff(filename)[source]

Return True if it recognizes the dialect and header.

set_meta(dataset, **kwd)[source]
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.CSV(**kwd)[source]

Bases: galaxy.datatypes.tabular.BaseCSV

Comma-separated table data. Only sniffs comma-separated files with at least 2 rows and 2 columns.

file_ext = 'csv'
dialect

alias of excel

strict_width = False
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.TSV(**kwd)[source]

Bases: galaxy.datatypes.tabular.BaseCSV

Tab-separated table data. Only sniff tab-separated files with at least 2 rows and 2 columns.

Note: Use of this datatype is optional as the general tabular datatype will handle most tab-separated files. This datatype is only required for datasets with tabs INSIDE double quotes.

This datatype currently does not support TSV files where the header has one fewer column, indicating that the first column contains row names. That kind of file is handled fine by the tabular datatype.

file_ext = 'tsv'
dialect

alias of excel_tab

strict_width = True
dataproviders = {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
class galaxy.datatypes.tabular.ConnectivityTable(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

edam_format = 'format_3309'
file_ext = 'ct'
header_regexp = <_sre.SRE_Pattern object>
structure_regexp = <_sre.SRE_Pattern object>
__init__(**kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', delimiter (MetadataParameter): Data delimiter, defaults to ' '
set_meta(dataset, **kwd)[source]
sniff(filename)[source]

The ConnectivityTable (CT) is a file format used for describing RNA 2D structures by tools including MFOLD, UNAFOLD and the RNAStructure package. The tabular file format is defined as follows:

5   energy = -12.3  sequence name
1   G       0       2       0       1
2   A       1       3       0       2
3   A       2       4       0       3
4   A       3       5       0       4
5   C       4       6       1       5

The links given at the EDAM ontology page do not indicate which separator is used (space or tab), and different implementations exist. The implementation that uses spaces as the separator (implemented in RNAStructure) looks as follows:

10    ENERGY = -34.8  seqname
1 G       0    2    9    1
2 G       1    3    8    2
3 G       2    4    7    3
4 a       3    5    0    4
5 a       4    6    0    5
6 a       5    7    0    6
7 C       6    8    3    7
8 C       7    9    2    8
9 C       8   10    1    9
10 a       9    0    0   10
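Because both tab- and space-separated variants exist, a parser for this layout can simply split on any whitespace. The sketch below is illustrative (the function name is an assumption), keeping only the sequence length and the base-pairing column:

```python
# Minimal connectivity-table (CT) parser for the layout shown above.
# Splits on any whitespace so it handles both separator variants.
# Illustrative only, not Galaxy's implementation.
def parse_ct(path):
    with open(path) as fh:
        header = fh.readline().split()
        length = int(header[0])            # number of bases
        pairs = {}
        for line in fh:
            fields = line.split()
            if len(fields) < 6:
                continue
            index, partner = int(fields[0]), int(fields[4])
            if partner:                    # column 5: 0 means unpaired
                pairs[index] = partner
    return length, pairs
```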
get_chunk(trans, dataset, chunk)[source]

galaxy.datatypes.text module

Clearing house for generic text datatypes that are not XML or tabular.

class galaxy.datatypes.text.Html(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing an html file

edam_format = 'format_2331'
file_ext = 'html'
set_peek(dataset, is_multi_byte=False)[source]
get_mime()[source]

Returns the mime type of the datatype

sniff(filename)[source]

Determines whether the file is in html format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> Html().sniff( fname )
False
>>> fname = get_test_fname( 'file.html' )
>>> Html().sniff( fname )
True
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.Json(**kwd)[source]

Bases: galaxy.datatypes.data.Text

edam_format = 'format_3464'
file_ext = 'json'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Try to load the file with the json module. If successful, it's a JSON file.
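That strategy amounts to a one-call check, sketched below; this is an illustrative helper (the function name is an assumption), not the actual Json.sniff().

```python
import json

# Sniff JSON by simply attempting to parse the file. json.JSONDecodeError
# subclasses ValueError, so catching ValueError covers parse failures.
def looks_like_json(path):
    try:
        with open(path) as fh:
            json.load(fh)
        return True
    except (ValueError, UnicodeDecodeError):
        return False
```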

display_peek(dataset)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.Ipynb(**kwd)[source]

Bases: galaxy.datatypes.text.Json

file_ext = 'ipynb'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Try to load the file with the json module. If successful, it's a JSON file.

display_data(trans, dataset, preview=False, filename=None, to_ext=None, **kwd)[source]
set_meta(dataset, **kwd)[source]

Set the number of models in dataset.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.Biom1(**kwd)[source]

Bases: galaxy.datatypes.text.Json

BIOM version 1.0 file format description http://biom-format.org/documentation/format_versions/biom-1.0.html

file_ext = 'biom1'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
set_meta(dataset, **kwd)[source]

Store metadata information from the BIOM file.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', table_rows (MetadataParameter): table_rows, defaults to '[]', table_matrix_element_type (MetadataParameter): table_matrix_element_type, defaults to '', table_format (MetadataParameter): table_format, defaults to '', table_generated_by (MetadataParameter): table_generated_by, defaults to '', table_matrix_type (MetadataParameter): table_matrix_type, defaults to '', table_shape (MetadataParameter): table_shape, defaults to '[]', table_format_url (MetadataParameter): table_format_url, defaults to '', table_date (MetadataParameter): table_date, defaults to '', table_type (MetadataParameter): table_type, defaults to '', table_id (MetadataParameter): table_id, defaults to 'None', table_columns (MetadataParameter): table_columns, defaults to '[]'
class galaxy.datatypes.text.Obo(**kwd)[source]

Bases: galaxy.datatypes.data.Text

OBO file format description http://www.geneontology.org/GO.format.obo-1_2.shtml

edam_data = 'data_0582'
edam_format = 'format_2549'
file_ext = 'obo'
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Try to guess the OBO filetype. It usually starts with a “format-version:” string and has several stanzas which start with “id:”.
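A keyword heuristic of that kind can be sketched as follows; the function name and the 200-line cap are assumptions for illustration, not the actual Obo.sniff().

```python
# Sketch of the OBO heuristic: look for a "format-version:" line and at
# least one "id:" stanza line near the top of the file. Illustrative.
def looks_like_obo(path, max_lines=200):
    saw_version = saw_id = False
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i >= max_lines:
                break
            if line.startswith("format-version:"):
                saw_version = True
            elif line.startswith("id:"):
                saw_id = True
            if saw_version and saw_id:
                return True
    return False
```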

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.Arff(**kwd)[source]

Bases: galaxy.datatypes.data.Text

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. http://weka.wikispaces.com/ARFF

edam_format = 'format_3581'
file_ext = 'arff'

Add metadata elements

set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Try to guess the ARFF filetype. An ARFF file usually starts with comment lines and a “@RELATION” declaration, followed by “@ATTRIBUTE” lines and a “@DATA” section.

set_meta(dataset, **kwd)[source]

Tries to count the comment lines and the number of columns included. A typical ARFF data block looks like this:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0'
class galaxy.datatypes.text.SnpEffDb(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a SnpEff genome build

edam_format = 'format_3624'
file_ext = 'snpeffdb'
__init__(**kwd)[source]
getSnpeffVersionFromFile(path)[source]
set_meta(dataset, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', genome_version (MetadataParameter): Genome Version, defaults to 'None', snpeff_version (MetadataParameter): SnpEff Version, defaults to 'SnpEff4.0', regulation (MetadataParameter): Regulation Names, defaults to '[]', annotation (MetadataParameter): Annotation Names, defaults to '[]'
class galaxy.datatypes.text.SnpSiftDbNSFP(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a dbNSFP database prepared for use by SnpSift dbnsfp

file_ext = 'snpsiftdbnsfp'
composite_type = 'auto_primary_file'
allow_datatype_change = False

The dbNSFP file is a tabular file with one header line. The first four columns are required to be: chrom pos ref alt. These match columns 1, 2, 4 and 5 of the VCF file. SnpSift requires the file to be block-gzipped and then indexed with samtools tabix. Example:

# Compress using the block-gzip algorithm
bgzip dbNSFP2.3.txt
# Create the tabix index
tabix -s 1 -b 2 -e 2 dbNSFP2.3.txt.gz

__init__(**kwd)[source]
init_meta(dataset, copy_from=None)[source]
generate_primary_file(dataset=None)[source]

This is called only at upload to write the HTML file. We cannot rename the datasets here; they come with the default name, unfortunately.

regenerate_primary_file(dataset)[source]

Cannot do this until we are setting metadata.

set_meta(dataset, overwrite=True, **kwd)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', reference_name (MetadataParameter): Reference Name, defaults to 'dbSNFP', bgzip (MetadataParameter): dbNSFP bgzip, defaults to 'None', index (MetadataParameter): Tabix Index File, defaults to 'None', annotation (MetadataParameter): Annotation Names, defaults to '[]'
class galaxy.datatypes.text.Smat(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'smat'
display_peek(dataset)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

The use of ESTScan implies the creation of score matrices which reflect the codon preferences in the studied organisms. The ESTScan package includes scripts for generating these files. The output of these scripts consists of the matrices, one for each isochore, which look like this:

FORMAT: hse_4is.conf CODING REGION 6 3 1 s C+G: 0 44 -1 0 2 -2 2 1 -8 0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_space.txt')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('test_tab.bed')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('1.smat')
>>> Smat().sniff(fname)
True
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.PlantTribesOrtho(**kwd)[source]

Bases: galaxy.datatypes.text.Html

PlantTribes sequences classified into precomputed, orthologous gene family clusters.

file_ext = 'ptortho'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.PlantTribesOrthoCodingSequence(**kwd)[source]

Bases: galaxy.datatypes.text.Html

PlantTribes sequences classified into precomputed, orthologous gene family clusters and corresponding coding sequences.

file_ext = 'ptorthocs'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.PlantTribesPhylogeneticTree(**kwd)[source]

Bases: galaxy.datatypes.text.Html

PlantTribes multiple sequence alignments and inferred maximum likelihood phylogenies for orthogroups.

file_ext = 'pttree'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.text.PlantTribesMultipleSequenceAlignment(**kwd)[source]

Bases: galaxy.datatypes.text.Html

PlantTribes multiple sequence alignments for orthogroups.

file_ext = 'ptalign'
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
set_peek(dataset, is_multi_byte=False)[source]

galaxy.datatypes.tracks module

Datatype classes for tracks/track views within galaxy.

class galaxy.datatypes.tracks.GeneTrack(**kwargs)[source]

Bases: galaxy.datatypes.binary.Binary

edam_data = 'data_3002'
edam_format = 'format_2919'
file_ext = 'genetrack'
__init__(**kwargs)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?'
class galaxy.datatypes.tracks.UCSCTrackHub(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Datatype for UCSC TrackHub

file_ext = 'trackhub'
composite_type = 'auto_primary_file'
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]

This is called only at upload to write the HTML file. We cannot rename the datasets here; they come with the default name, unfortunately.

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'

galaxy.datatypes.xml module

XML format classes

class galaxy.datatypes.xml.GenericXml(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Base format class for any XML file.

edam_format = 'format_2332'
file_ext = 'xml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Determines whether the file is XML or not

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
static merge(split_files, output_file)[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

xml_dataprovider(*args, **kwargs)[source]
dataproviders = {'xml': <function xml_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'base': <function base_dataprovider>, 'line': <function line_dataprovider>}
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.xml.MEMEXml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

MEME XML Output data

file_ext = 'memexml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.xml.CisML(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

CisML XML data

file_ext = 'cisml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.xml.Phyloxml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

Format for defining phyloxml data http://www.phyloxml.org/

edam_data = 'data_0872'
edam_format = 'format_3159'
file_ext = 'phyloxml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Checks for the keyword ‘phyloxml’, always in lowercase, in the first few lines.

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
class galaxy.datatypes.xml.Owl(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

Web Ontology Language OWL format description http://www.w3.org/TR/owl-ref/

edam_format = 'format_3262'
file_ext = 'owl'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'
sniff(filename)[source]

Checks for the keyword ‘<owl’ in the first 200 lines.