galaxy.datatypes package

Subpackages

Submodules

galaxy.datatypes.annotation module

class galaxy.datatypes.annotation.SnapHmm(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'snaphmm'
edam_data = 'data_1364'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff_prefix(file_prefix)[source]

SNAP model files start with zoeHMM
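
As a minimal sketch of the prefix check described above (not Galaxy's actual implementation), the test reduces to comparing the start of the file's text with the `zoeHMM` marker. The function name `looks_like_snap_hmm` is hypothetical and takes a plain string rather than Galaxy's file prefix object.

```python
def looks_like_snap_hmm(prefix: str) -> bool:
    """Return True if the text begins with the SNAP model marker 'zoeHMM'."""
    # Hypothetical helper: operates on a plain string holding the
    # first characters of the file.
    return prefix.startswith("zoeHMM")
```

For example, `looks_like_snap_hmm("zoeHMM\tmodel")` is true, while a FASTA header such as `">seq1"` is rejected.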

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edd1a7c50>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.annotation.Augustus(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing an Augustus prediction model

file_ext = 'augustus'
edam_data = 'data_0950'
compressed = True
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]

Augustus archives always contain the same files

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbaa8550>}

galaxy.datatypes.anvio module

Datatypes for Anvi’o https://github.com/merenlab/anvio

class galaxy.datatypes.anvio.AnvioComposite(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class to use for Anvi’o composite datatypes. These generally consist of a SQLite database plus optional additional files.

file_ext = 'anvio_composite'
composite_type = 'auto_primary_file'
generate_primary_file(dataset=None)[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

get_mime()[source]

Returns the mime type of the datatype

set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioComposite

Class for AnvioDB database files.

file_ext = 'anvio_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(*args, **kwd)[source]
set_meta(dataset, **kwd)[source]

Set the anvio_basename based upon actual extra_files_path contents.

metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5bf98>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioStructureDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Structure DB database files.

file_ext = 'anvio_structure_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edd4c60f0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioGenomesDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Genomes DB database files.

file_ext = 'anvio_genomes_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edd4bef98>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioContigsDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Contigs DB database files.

file_ext = 'anvio_contigs_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(*args, **kwd)[source]
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edd4bedd8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioProfileDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Profile DB database files.

file_ext = 'anvio_profile_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(*args, **kwd)[source]
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbafe240>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioPanDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Pan DB database files.

file_ext = 'anvio_pan_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbafe668>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.anvio.AnvioSamplesDB(*args, **kwd)[source]

Bases: galaxy.datatypes.anvio.AnvioDB

Class for Anvio Samples DB database files.

file_ext = 'anvio_samples_db'
composite_type = 'auto_primary_file'
allow_datatype_change = False
metadata_spec = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbafe8d0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd5b518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}

galaxy.datatypes.assembly module

Velvet datatypes for the velvet assembler tool in Galaxy (James E Johnson, University of Minnesota).

class galaxy.datatypes.assembly.Amos(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing the AMOS assembly file

edam_data = 'data_0925'
edam_format = 'format_3582'
file_ext = 'afg'
sniff_prefix(file_prefix)[source]

Determines whether the file is in the AMOS assembly file format. Example:

{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda10e518>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.assembly.Sequences(**kwd)[source]

Bases: galaxy.datatypes.sequence.Fasta

Class describing the Sequences file generated by velveth

edam_data = 'data_0925'
file_ext = 'sequences'
sniff_prefix(file_prefix)[source]

Determines whether the file is a velveth-produced FASTA file. The id line has 3 fields separated by tabs: sequence_name, sequence_index, and category:

>SEQUENCE_0_length_35   1       1
GGATATAGGGCCAACCCAACTCAACGGCCTGTCTT
>SEQUENCE_1_length_35   2       1
CGACGAATGACAGGTCACGAATTTGGCGGGGATTA
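
The three-field id line described above could be checked roughly as follows. This is a sketch under stated assumptions, not the sniffer Galaxy ships: the helper name `parse_velveth_id_line` is hypothetical, and it assumes that the index and category fields are non-negative integers, as in the example.

```python
def parse_velveth_id_line(line: str):
    """Split a velveth id line into (name, index, category), or return None."""
    if not line.startswith(">"):
        return None
    # Drop the leading '>' and the line ending, then split on tabs.
    fields = line[1:].rstrip("\r\n").split("\t")
    if len(fields) != 3:
        return None
    name, index, category = fields
    # Assumption: index and category are non-negative integers.
    if not (index.isdigit() and category.isdigit()):
        return None
    return name, int(index), int(category)
```

An ordinary FASTA header without the two trailing tab-separated integers yields `None`, which is what distinguishes this format from plain FASTA.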
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda10e780>}
sniff(filename)
class galaxy.datatypes.assembly.Roadmaps(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing the Roadmaps file generated by velveth

edam_format = 'format_2561'
file_ext = 'roadmaps'
sniff_prefix(file_prefix)[source]
Determines whether the file is a velveth-produced RoadMap:

142858 21 1
ROADMAP 1
ROADMAP 2
…
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edabe14e0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.assembly.Velvet(**kwd)[source]

Bases: galaxy.datatypes.text.Html

composite_type = 'auto_primary_file'
allow_datatype_change = False
file_ext = 'velvet'
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]
regenerate_primary_file(dataset)[source]

This cannot be done until metadata is being set.

set_meta(dataset, **kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edabe15f8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'long_reads': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edabe16d8>, 'paired_end_reads': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edabe1668>, 'short2_reads': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edabe1748>}

galaxy.datatypes.binary module

Binary classes

class galaxy.datatypes.binary.Binary(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Binary data

edam_format = 'format_2333'
static register_sniffable_binary_format(data_type, ext, type_class)[source]

Deprecated method.

static register_unsniffable_binary_ext(ext)[source]

Deprecated method.

set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

get_mime()[source]

Returns the mime type of the datatype

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>}
class galaxy.datatypes.binary.Ab1(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an ab1 binary sequence file

file_ext = 'ab1'
edam_format = 'format_3000'
edam_data = 'data_0924'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885160>}
class galaxy.datatypes.binary.Idat(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Binary data in idat format

file_ext = 'idat'
edam_format = 'format_2058'
edam_data = 'data_2603'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8852b0>}
class galaxy.datatypes.binary.Cel(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Cel File format described at: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html

file_ext = 'cel'
edam_format = 'format_1638'
edam_data = 'data_3110'
sniff(filename)[source]

Try to guess if the file is a Cel file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('affy_v_agcc.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_3.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_4.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Cel().sniff(fname)
False

set_meta(dataset, **kwd)[source]

Set metadata for Cel file.

set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8854a8>}
class galaxy.datatypes.binary.MashSketch(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Mash Sketch file. Sketches are used by the MinHash algorithm to allow fast distance estimations with low storage and memory requirements. To make a sketch, each k-mer in a sequence is hashed, which creates a pseudo-random identifier. By sorting these identifiers (hashes), a small subset from the top of the sorted list can represent the entire sequence (these are min-hashes). The more similar another sequence is, the more min-hashes it is likely to share.
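
The sketching idea described above can be illustrated with a small, self-contained example. It is a simplified sketch, not the Mash file format or the Mash tool itself: SHA-1 from the standard library stands in for whatever hash function Mash uses, the names `min_hash_sketch` and `shared_fraction` are hypothetical, and the similarity measure is a plain overlap ratio rather than a proper distance estimate.

```python
import hashlib


def kmers(sequence: str, k: int):
    """Yield every overlapping k-mer of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def min_hash_sketch(sequence: str, k: int = 4, sketch_size: int = 8) -> set:
    """Hash each k-mer and keep the smallest `sketch_size` distinct hashes."""
    hashes = {
        # Use the first 8 bytes of the SHA-1 digest as a pseudo-random identifier.
        int.from_bytes(hashlib.sha1(kmer.encode("ascii")).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return set(sorted(hashes)[:sketch_size])


def shared_fraction(sketch_a: set, sketch_b: set) -> float:
    """Fraction of min-hashes shared, relative to the smaller sketch."""
    if not sketch_a or not sketch_b:
        return 0.0
    return len(sketch_a & sketch_b) / min(len(sketch_a), len(sketch_b))
```

Two identical sequences share all of their min-hashes, while sequences with no k-mer in common share none; similar sequences fall in between.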

file_ext = 'msh'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8856a0>}
class galaxy.datatypes.binary.CompressedArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.

file_ext = 'compressed_archive'
compressed = True
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>}
class galaxy.datatypes.binary.DynamicCompressedArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

matches_any(target_datatypes)[source]

Treat two aspects of compressed datatypes separately.

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885a90>}
class galaxy.datatypes.binary.GzDynamicCompressedArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.DynamicCompressedArchive

compressed_format = 'gzip'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885c88>}
class galaxy.datatypes.binary.Bz2DynamicCompressedArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.DynamicCompressedArchive

compressed_format = 'bz2'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885e80>}
class galaxy.datatypes.binary.CompressedZipArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.

file_ext = 'zip'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f0b8>}
class galaxy.datatypes.binary.GenericAsn1Binary(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for generic ASN.1 binary format

file_ext = 'asn1-binary'
edam_format = 'format_1966'
edam_data = 'data_0849'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f2b0>}
class galaxy.datatypes.binary.BamNative(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a BAM binary file that is not necessarily sorted

edam_format = 'format_2572'
edam_data = 'data_0863'
file_ext = 'unsorted.bam'
sort_flag = None
static merge(split_files, output_file)[source]

Merges BAM files

Parameters:
  • split_files – List of bam file paths to merge
  • output_file – Write merged bam file to this location
init_meta(dataset, copy_from=None)[source]
sniff(filename)[source]
classmethod is_bam(filename)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
to_archive(trans, dataset, name='')[source]
groom_dataset_content(file_name)[source]

Ensures that the BAM file contents are coordinate-sorted. This function is called on an output dataset after the content is initially generated.

get_chunk(trans, dataset, offset=0, ck_size=None)[source]
display_data(trans, dataset, preview=False, filename=None, to_ext=None, offset=None, ck_size=None, **kwd)[source]
validate(dataset, **kwd)[source]
metadata_spec = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f780>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f550>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f8d0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f860>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f7f0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f630>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f710>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f6a0>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f5c0>}
class galaxy.datatypes.binary.Bam(**kwd)[source]

Bases: galaxy.datatypes.binary.BamNative

Class describing a BAM binary file

edam_format = 'format_2572'
edam_data = 'data_0863'
file_ext = 'bam'
track_type = 'ReadTrack'
data_sources = {'data': 'bai', 'index': 'bigwig'}
get_index_flag(file_name)[source]

Return the pysam flag for a bai index (default) or a csi index (used when a contig size exceeds 2**29 - 1).
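
The size rule in this docstring can be sketched as below. This is an illustration only: the function `choose_index_format` is hypothetical, it takes a list of reference lengths instead of a file name, and it returns the index format name rather than the flag that is passed to pysam.

```python
# Largest contig length a bai index is chosen for, per the docstring above.
BAI_MAX_CONTIG_LENGTH = 2**29 - 1


def choose_index_format(reference_lengths) -> str:
    """Return 'csi' if any contig is too long for a bai index, else 'bai'."""
    if any(length > BAI_MAX_CONTIG_LENGTH for length in reference_lengths):
        return "csi"
    return "bai"
```

With human chromosome-sized contigs (about 2.5e8 bases) the default bai index is chosen; a single contig of 2**29 bases or more switches the choice to csi.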

dataset_content_needs_grooming(file_name)[source]

Check if file_name is a coordinate-sorted BAM file
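
One way to picture this check is to read the sort order recorded in the `@HD` line of the BAM header, where the SAM specification stores it in the `SO` tag. The sketch below works on header text that has already been extracted; the function names are hypothetical, and whether Galaxy inspects the header in exactly this way is an assumption.

```python
def header_sort_order(header_text: str) -> str:
    """Return the SO value from the @HD header line, or 'unknown' if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def needs_coordinate_grooming(header_text: str) -> bool:
    """True when the header does not declare coordinate sorting."""
    return header_sort_order(header_text) != "coordinate"
```

A header declaring `SO:queryname`, or no `@HD` line at all, would be reported as needing grooming under this rule.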

set_meta(dataset, overwrite=True, **kwd)[source]
sniff(file_name)[source]
line_dataprovider(dataset, **settings)[source]
regex_line_dataprovider(dataset, **settings)[source]
column_dataprovider(dataset, **settings)[source]
dict_dataprovider(dataset, **settings)[source]
header_dataprovider(dataset, **settings)[source]
id_seq_qual_dataprovider(dataset, **settings)[source]
genomic_region_dataprovider(dataset, **settings)[source]
genomic_region_dict_dataprovider(dataset, **settings)[source]
samtools_dataprovider(dataset, **settings)[source]

Generic samtools interface - all options available through settings.

dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'column': <function Bam.column_dataprovider at 0x7f1efa8940d0>, 'dict': <function Bam.dict_dataprovider at 0x7f1efa894268>, 'genomic-region': <function Bam.genomic_region_dataprovider at 0x7f1efa894730>, 'genomic-region-dict': <function Bam.genomic_region_dict_dataprovider at 0x7f1efa8948c8>, 'header': <function Bam.header_dataprovider at 0x7f1efa894400>, 'id-seq-qual': <function Bam.id_seq_qual_dataprovider at 0x7f1efa894598>, 'line': <function Bam.line_dataprovider at 0x7f1efa890d08>, 'regex-line': <function Bam.regex_line_dataprovider at 0x7f1efa890ea0>, 'samtools': <function Bam.samtools_dataprovider at 0x7f1efa894a60>}
metadata_spec = {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88ff98>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f780>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88ff28>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f550>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f8d0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f860>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f7f0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f630>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f710>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f6a0>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f5c0>}
class galaxy.datatypes.binary.ProBam(**kwd)[source]

Bases: galaxy.datatypes.binary.Bam

Class describing a BAM binary file - extended for proteomics data

edam_format = 'format_3826'
edam_data = 'data_0863'
file_ext = 'probam'
metadata_spec = {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897240>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f780>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8971d0>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f550>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f8d0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f860>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f7f0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f630>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f710>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f6a0>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa88f5c0>}
class galaxy.datatypes.binary.BamInputSorted(**kwd)[source]

Bases: galaxy.datatypes.binary.BamNative

sort_flag = '-n'
file_ext = 'qname_input_sorted.bam'

A class for BAM files that can formally be unsorted or queryname sorted. Alignments are either ordered based on the order in which the queries appear when producing the alignment, or ordered by their queryname. This notably keeps alignments produced by paired-end sequencing adjacent.

sniff(file_name)[source]
dataset_content_needs_grooming(file_name)[source]

Groom if the file is coordinate sorted

metadata_spec = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897668>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897438>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8977b8>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897748>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8976d8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897518>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8975f8>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897588>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8974a8>}
class galaxy.datatypes.binary.BamQuerynameSorted(**kwd)[source]

Bases: galaxy.datatypes.binary.BamInputSorted

A class for queryname sorted BAM files.

sort_flag = '-n'
file_ext = 'qname_sorted.bam'
sniff(file_name)[source]
dataset_content_needs_grooming(file_name)[source]

Check if file_name is a queryname-sorted BAM file

metadata_spec = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897be0>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8979b0>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897d30>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897cc0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897c50>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897a90>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897b70>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897b00>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897a20>}
class galaxy.datatypes.binary.CRAM(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

file_ext = 'cram'
edam_format = 'format_3462'
edam_data = 'format_0863'
set_meta(dataset, overwrite=True, **kwd)[source]
get_cram_version(filename)[source]
set_index_file(dataset, index_file)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]
metadata_spec = {'cram_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897f98>, 'cram_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa897f28>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>}
class galaxy.datatypes.binary.BaseBcf(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

edam_format = 'format_3020'
edam_data = 'data_3498'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c198>}
class galaxy.datatypes.binary.Bcf(**kwd)[source]

Bases: galaxy.datatypes.binary.BaseBcf

Class describing a (BGZF-compressed) BCF file

file_ext = 'bcf'
sniff(filename)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Creates the index for the BCF file.

metadata_spec = {'bcf_index': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c390>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c198>}
class galaxy.datatypes.binary.BcfUncompressed(**kwd)[source]

Bases: galaxy.datatypes.binary.BaseBcf

Class describing an uncompressed BCF file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.bcf_uncompressed')
>>> BcfUncompressed().sniff(fname)
True
>>> fname = get_test_fname('1.bcf')
>>> BcfUncompressed().sniff(fname)
False
file_ext = 'bcf_uncompressed'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c588>}
class galaxy.datatypes.binary.H5(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an HDF5 file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mz5')
>>> H5().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> H5().sniff(fname)
False
file_ext = 'h5'
edam_format = 'format_3590'
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c780>}
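
A rough illustration of how an HDF5 file can be recognised: the format begins with a fixed 8-byte signature. The sketch below checks only offset 0, which is the common case (the HDF5 specification also permits the signature at later offsets when a user block is present), and the function name is hypothetical rather than part of the H5 class.

```python
# The 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def starts_with_hdf5_signature(path: str) -> bool:
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Subclasses such as Loom, Biom2, or Cool would need further checks on the file's contents, since every HDF5-based format shares this signature.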
class galaxy.datatypes.binary.Loom(**kwd)[source]

Bases: galaxy.datatypes.binary.H5

Class describing a Loom file: http://loompy.org/

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.loom')
>>> Loom().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Loom().sniff(fname)
False
file_ext = 'loom'
edam_format = 'format_3590'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
metadata_spec = {'col_attrs_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cef0>, 'col_attrs_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cf60>, 'col_graphs_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cfd0>, 'col_graphs_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824080>, 'creation_date': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cc18>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c780>, 'description': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ca58>, 'doi': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cb38>, 'layers_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ccf8>, 'layers_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cd68>, 'loom_spec_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cba8>, 'row_attrs_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ce10>, 'row_attrs_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ce80>, 'row_graphs_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8240f0>, 'row_graphs_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824160>, 'shape': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cc88>, 'title': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c9e8>, 'url': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81cac8>}
class galaxy.datatypes.binary.Anndata(**kwd)[source]

Bases: galaxy.datatypes.binary.H5

Class describing an HDF5 anndata file: http://anndata.rtfd.io

>>> from galaxy.datatypes.sniff import get_test_fname
>>> Anndata().sniff(get_test_fname('pbmc3k_tiny.h5ad'))
True
>>> Anndata().sniff(get_test_fname('test.mz5'))
False

file_ext = 'h5ad'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824358>}
class galaxy.datatypes.binary.GmxBinary(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Base class for GROMACS binary files - xtc, trr, cpt

magic_number = None
file_ext = ''
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824550>}
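
The subclasses below differ only in their `magic_number`. A minimal sketch of how such a check might work, assuming the magic number is stored as a big-endian signed 32-bit integer in the first four bytes of the file (an assumption about the file layout, not something stated in this reference); the helper names are hypothetical.

```python
import struct


def read_magic_number(path: str):
    """Read the first 4 bytes as a big-endian signed int, or None if too short."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        return None
    return struct.unpack(">i", header)[0]


def has_magic_number(path: str, magic_number: int) -> bool:
    """Compare the file's leading integer with the expected magic number."""
    return read_magic_number(path) == magic_number
```

Under this assumption, a trr file would match `has_magic_number(path, 1993)` and an edr file `has_magic_number(path, -55555)`.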
class galaxy.datatypes.binary.Trr(**kwd)[source]

Bases: galaxy.datatypes.binary.GmxBinary

Class describing a trr file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.trr')
>>> Trr().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Trr().sniff(fname)
False
file_ext = 'trr'
magic_number = 1993
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824748>}
class galaxy.datatypes.binary.Cpt(**kwd)[source]

Bases: galaxy.datatypes.binary.GmxBinary

Class describing a checkpoint (.cpt) file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.cpt')
>>> Cpt().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Cpt().sniff(fname)
False
file_ext = 'cpt'
magic_number = 171817
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824940>}
class galaxy.datatypes.binary.Xtc(**kwd)[source]

Bases: galaxy.datatypes.binary.GmxBinary

Class describing an xtc file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.xtc')
>>> Xtc().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Xtc().sniff(fname)
False
file_ext = 'xtc'
magic_number = 1995
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824b38>}
class galaxy.datatypes.binary.Edr(**kwd)[source]

Bases: galaxy.datatypes.binary.GmxBinary

Class describing an edr file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.edr')
>>> Edr().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Edr().sniff(fname)
False
file_ext = 'edr'
magic_number = -55555
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824d30>}
class galaxy.datatypes.binary.Biom2(**kwd)[source]

Bases: galaxy.datatypes.binary.H5

Class describing a biom2 file (http://biom-format.org/documentation/biom_format.html)

file_ext = 'biom2'
edam_format = 'format_3746'
sniff(filename)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Biom2().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Biom2().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Biom2().sniff(fname)
False
set_meta(dataset, overwrite=True, **kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'creation_date': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f278>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81c780>, 'format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f128>, 'format_url': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f048>, 'format_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f0b8>, 'generated_by': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f208>, 'id': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa824f98>, 'nnz': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f2e8>, 'shape': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f358>, 'type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f198>}
class galaxy.datatypes.binary.Cool(**kwd)[source]

Bases: galaxy.datatypes.binary.H5

Class describing the cool format (https://github.com/mirnylab/cooler)

file_ext = 'cool'
sniff(filename)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.cool')
>>> Cool().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Cool().sniff(fname)
False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f588>}
class galaxy.datatypes.binary.MCool(**kwd)[source]

Bases: galaxy.datatypes.binary.H5

Class describing the multi-resolution cool format (https://github.com/mirnylab/cooler)

file_ext = 'mcool'
sniff(filename)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.mcool')
>>> MCool().sniff(fname)
True
>>> fname = get_test_fname('matrix.cool')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('test.mz5')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> MCool().sniff(fname)
False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f7b8>}
class galaxy.datatypes.binary.Scf(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an scf binary sequence file

edam_format = 'format_1632'
edam_data = 'data_0924'
file_ext = 'scf'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82f9b0>}
class galaxy.datatypes.binary.Sff(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Standard Flowgram Format (SFF)

edam_format = 'format_3284'
edam_data = 'data_0924'
file_ext = 'sff'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82fba8>}
class galaxy.datatypes.binary.BigWig(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Accessing binary BigWig files from UCSC. The supplemental info in the paper has the binary details: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq351v1
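A magic-number check of the kind a BigWig or BigBed sniffer might perform can be sketched as follows. The two constants are the signature values from the UCSC bbi file headers as best recalled here; they do not appear on this page. `has_magic` is a hypothetical helper, not Galaxy's implementation.

```python
import struct

# Assumed signature values (UCSC bbi headers); not taken from this page.
BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB


def has_magic(path, magic):
    """Return True if the first four bytes of `path` encode `magic` in either byte order."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if len(head) < 4:
        return False
    # The file may have been written on a little- or big-endian machine.
    little = struct.unpack("<I", head)[0]
    big = struct.unpack(">I", head)[0]
    return magic in (little, big)
```

Checking both byte orders keeps the sketch independent of the machine that wrote the file.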

edam_format = 'format_3006'
edam_data = 'data_3002'
file_ext = 'bigwig'
track_type = 'LineTrack'
data_sources = {'data_standalone': 'bigwig'}
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82fda0>}
class galaxy.datatypes.binary.BigBed(**kwd)[source]

Bases: galaxy.datatypes.binary.BigWig

BigBed support from UCSC.

edam_format = 'format_3004'
edam_data = 'data_3002'
file_ext = 'bigbed'
data_sources = {'data_standalone': 'bigbed'}
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa82ff98>}
class galaxy.datatypes.binary.TwoBit(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a TwoBit format nucleotide file

edam_format = 'format_3009'
edam_data = 'data_0848'
file_ext = 'twobit'
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835240>}
class galaxy.datatypes.binary.SQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a Sqlite database

file_ext = 'sqlite'
edam_format = 'format_3621'
init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
sniff_table_names(filename, table_names)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sqlite_dataprovider(dataset, **settings)[source]
sqlite_datatableprovider(dataset, **settings)[source]
sqlite_datadictprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'sqlite': <function SQlite.sqlite_dataprovider at 0x7f1efa832ea0>, 'sqlite-dict': <function SQlite.sqlite_datadictprovider at 0x7f1efa839268>, 'sqlite-table': <function SQlite.sqlite_datatableprovider at 0x7f1efa8390d0>}
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.GeminiSQLite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Gemini Sqlite database

file_ext = 'gemini.sqlite'
edam_format = 'format_3622'
edam_data = 'data_3498'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'gemini_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8358d0>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.ChiraSQLite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a ChiRAViz Sqlite database

file_ext = 'chira.sqlite'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835ba8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835c18>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835b38>}
class galaxy.datatypes.binary.CuffDiffSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a CuffDiff SQLite database

file_ext = 'cuffdiff.sqlite'
edam_format = 'format_3621'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'cuffdiff_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835e80>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'genes': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835ef0>, 'samples': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835f60>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.MzSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Proteomics Sqlite database

file_ext = 'mz.sqlite'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e278>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e2e8>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e208>}
class galaxy.datatypes.binary.PQP(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Peptide query parameters file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.pqp')
>>> PQP().sniff(fname)
True
>>> fname = get_test_fname('test.osw')
>>> PQP().sniff(fname)
False
file_ext = 'pqp'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]

Table definition according to https://github.com/grosenberger/OpenMS/blob/develop/src/openms/source/ANALYSIS/OPENSWATH/TransitionPQPFile.cpp#L264 (for now the VERSION, GENE, and PEPTIDE_GENE_MAPPING tables are excluded, since there is test data without these tables; see also https://github.com/OpenMS/OpenMS/issues/4365).
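A table-name check of this kind can be sketched with the standard-library `sqlite3` module. `sqlite_has_tables` is a hypothetical helper and not the actual `sniff_table_names` code; the table names a caller passes are up to the caller.

```python
import sqlite3

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_HEADER = b"SQLite format 3\x00"


def sqlite_has_tables(path, required_tables):
    """Return True if `path` is an SQLite file containing every table in `required_tables`."""
    with open(path, "rb") as handle:
        if handle.read(16) != SQLITE_HEADER:
            return False
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    present = {name for (name,) in rows}
    # Extra tables are tolerated; only missing ones cause a rejection.
    return set(required_tables) <= present
```

Requiring only a subset of tables, as the note above does by leaving some out, keeps older files that lack the optional tables recognizable.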

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e5c0>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e630>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e550>}
class galaxy.datatypes.binary.OSW(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing OpenSwath output

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.osw')
>>> OSW().sniff(fname)
True
>>> fname = get_test_fname('test.sqmass')
>>> OSW().sniff(fname)
False
file_ext = 'osw'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e908>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e978>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83e898>}
class galaxy.datatypes.binary.SQmass(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Sqmass database

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.sqmass')
>>> SQmass().sniff(fname)
True
>>> fname = get_test_fname('test.pqp')
>>> SQmass().sniff(fname)
False
file_ext = 'sqmass'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83ec50>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83ecc0>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83ebe0>}
class galaxy.datatypes.binary.BlibSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Proteomics Spectral Library Sqlite database

file_ext = 'blib'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'blib_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa83ef28>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.DlibSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Proteomics Spectral Library Sqlite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.dlib')
>>> DlibSQlite().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DlibSQlite().sniff(fname)
False
file_ext = 'dlib'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'dlib_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8461d0>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.ElibSQlite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a Proteomics Chromatogram Library Sqlite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.elib')
>>> ElibSQlite().sniff(fname)
True
>>> fname = get_test_fname('test.dlib')
>>> ElibSQlite().sniff(fname)
False
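The DLIB/ELIB distinction described above depends on which tables hold rows, not only on which tables exist. A minimal sketch follows, assuming that one populated ELIB-only table is enough to call a file an ELIB. `looks_like_elib` and that rule are illustrative assumptions, not the Galaxy implementation.

```python
import sqlite3

# Tables the description names as populated only in ELIBs (an illustrative subset).
ELIB_ONLY_TABLES = ("peptidequants", "peptidescores")


def looks_like_elib(path):
    """Return True if any ELIB-only table exists and holds at least one row."""
    conn = sqlite3.connect(path)
    try:
        for table in ELIB_ONLY_TABLES:
            try:
                row = conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone()
            except sqlite3.Error:
                continue  # table missing, or the file is not readable as SQLite
            if row is not None:
                return True
        return False
    finally:
        conn.close()
```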
file_ext = 'elib'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>, 'version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846438>}
class galaxy.datatypes.binary.IdpDB(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing an IDPicker 3 idpDB (sqlite) database

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.idpdb')
>>> IdpDB().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> IdpDB().sniff(fname)
False
file_ext = 'idpdb'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846710>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846780>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8466a0>}
class galaxy.datatypes.binary.GAFASQLite(**kwd)[source]

Bases: galaxy.datatypes.binary.SQlite

Class describing a GAFA SQLite database

file_ext = 'gafa.sqlite'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'gafa_schema_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8469e8>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8355f8>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835668>, 'tables': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa835588>}
class galaxy.datatypes.binary.Xlsx(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for Excel 2007 (xlsx) files

file_ext = 'xlsx'
compressed = True
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846be0>}
class galaxy.datatypes.binary.ExcelXls(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an Excel (xls) file

file_ext = 'excel.xls'
edam_format = 'format_3468'
sniff(filename)[source]
get_mime()[source]

Returns the mime type of the datatype

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846dd8>}
class galaxy.datatypes.binary.Sra(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Sequence Read Archive (SRA) datatype originally from mdshw5/sra-tools-galaxy

file_ext = 'sra'
sniff(filename)[source]

The first 8 bytes of any NCBI sra file are ‘NCBI.sra’, and the file is binary. For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
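The prefix test described here can be sketched in a few lines. `looks_like_sra` is a hypothetical name, and the sketch checks only the 8-byte signature; it makes no further check that the rest of the file is binary.

```python
SRA_SIGNATURE = b"NCBI.sra"


def looks_like_sra(path):
    """Return True if the file at `path` begins with the 8-byte SRA signature."""
    with open(path, "rb") as handle:
        return handle.read(len(SRA_SIGNATURE)) == SRA_SIGNATURE
```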

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa846fd0>}
class galaxy.datatypes.binary.RData(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Generic R Data file datatype implementation

file_ext = 'rdata'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84f208>}
class galaxy.datatypes.binary.OxliBinary(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84f438>}
class galaxy.datatypes.binary.OxliCountGraph(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliCountGraph starts with “OXLI” + one byte version number + 8-bit binary ‘1’. Test file generated via:

load-into-counting.py --n_tables 1 --max-tablesize 1 \
    oxli_countgraph.oxlicg khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliCountGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_countgraph.oxlicg")
>>> OxliCountGraph().sniff(fname)
True
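The header layout described for the Oxli formats (a four-byte “OXLI” signature, one version byte, one file-type byte) can be read as sketched here. This assumes the type is stored as the raw byte value (1 for a count graph, 2 for a node graph, and so on). `read_oxli_header` and `is_oxli_type` are hypothetical helpers, not the Galaxy code.

```python
OXLI_SIGNATURE = b"OXLI"


def read_oxli_header(path):
    """Return (version, file_type) from an OXLI header, or None if the signature is absent."""
    with open(path, "rb") as handle:
        head = handle.read(6)
    if len(head) < 6 or head[:4] != OXLI_SIGNATURE:
        return None
    # Indexing a bytes object yields integers in Python 3.
    return head[4], head[5]


def is_oxli_type(path, file_type):
    """Return True if the file carries an OXLI header with the given type byte."""
    header = read_oxli_header(path)
    return header is not None and header[1] == file_type
```

With one shared header reader, each subclass only has to supply its own type value.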
file_ext = 'oxlicg'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84f630>}
class galaxy.datatypes.binary.OxliNodeGraph(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliNodeGraph starts with “OXLI” + one byte version number + 8-bit binary ‘2’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliNodeGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_nodegraph.oxling")
>>> OxliNodeGraph().sniff(fname)
True
file_ext = 'oxling'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84f828>}
class galaxy.datatypes.binary.OxliTagSet(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliTagSet starts with “OXLI” + one byte version number + 8-bit binary ‘3’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2;
mv oxli_nodegraph.oxling.tagset oxli_tagset.oxlits

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliTagSet().sniff(fname)
False
>>> fname = get_test_fname("oxli_tagset.oxlits")
>>> OxliTagSet().sniff(fname)
True
file_ext = 'oxlits'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84fa20>}
class galaxy.datatypes.binary.OxliStopTags(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliStopTags starts with “OXLI” + one byte version number + 8-bit binary ‘4’. Test file adapted from khmer 2.0’s “khmer/tests/test-data/goodversion-k32.stoptags”

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliStopTags().sniff(fname)
False
>>> fname = get_test_fname("oxli_stoptags.oxlist")
>>> OxliStopTags().sniff(fname)
True
file_ext = 'oxlist'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84fc18>}
class galaxy.datatypes.binary.OxliSubset(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliSubset starts with “OXLI” + one byte version number + 8-bit binary ‘5’. Test file generated via:

load-graph.py -k 20 example tests/test-data/random-20-a.fa;
partition-graph.py example;
mv example.subset.0.pmap oxli_subset.oxliss

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliSubset().sniff(fname)
False
>>> fname = get_test_fname("oxli_subset.oxliss")
>>> OxliSubset().sniff(fname)
True
file_ext = 'oxliss'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa84fe10>}
class galaxy.datatypes.binary.OxliGraphLabels(**kwd)[source]

Bases: galaxy.datatypes.binary.OxliBinary

OxliGraphLabels starts with “OXLI” + one byte version number + 8-bit binary ‘6’. Test file generated via:

python -c "from khmer import GraphLabels; \
    gl = GraphLabels(20, 1e7, 4); \
    gl.consume_fasta_and_tag_with_labels('tests/test-data/test-labels.fa'); \
    gl.save_labels_and_tags('oxli_graphlabels.oxligl')"

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliGraphLabels().sniff(fname)
False
>>> fname = get_test_fname("oxli_graphlabels.oxligl")
>>> OxliGraphLabels().sniff(fname)
True
file_ext = 'oxligl'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855048>}
class galaxy.datatypes.binary.PostgresqlArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a Postgresql database packed into a tar archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('postgresql_fake.tar.bz2')
>>> PostgresqlArchive().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> PostgresqlArchive().sniff(fname)
False
file_ext = 'postgresql'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855278>}
class galaxy.datatypes.binary.Fast5Archive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5Archive().sniff(fname)
True
file_ext = 'fast5.tar'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8554a8>}
class galaxy.datatypes.binary.Fast5ArchiveGz(**kwd)[source]

Bases: galaxy.datatypes.binary.Fast5Archive

Class describing a gzip-compressed FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveGz().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveGz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveGz().sniff(fname)
False
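The doctests above separate gzip, bzip2 and uncompressed archives. One way to get that behaviour is to open the file with an explicit `tarfile` mode and look for `.fast5` members. The sketch below is an assumption about how such a check could work, with `tar_contains_fast5` as a hypothetical helper; it is not the Galaxy implementation.

```python
import tarfile


def tar_contains_fast5(path, mode="r:gz"):
    """Return True if `path` opens as a tar archive in `mode` and holds a .fast5 file."""
    try:
        with tarfile.open(path, mode) as archive:
            return any(
                member.isfile() and member.name.endswith(".fast5")
                for member in archive
            )
    except (tarfile.TarError, OSError, EOFError):
        # Wrong compression, truncated data, or not a tar archive at all.
        return False
```

Passing `"r:bz2"` or `"r:"` instead of `"r:gz"` restricts the check to bzip2-compressed or uncompressed archives, so each variant rejects the other two.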
file_ext = 'fast5.tar.gz'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8556a0>}
class galaxy.datatypes.binary.Fast5ArchiveBz2(**kwd)[source]

Bases: galaxy.datatypes.binary.Fast5Archive

Class describing a bzip2-compressed FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveBz2().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveBz2().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveBz2().sniff(fname)
False
file_ext = 'fast5.tar.bz2'
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855898>}
class galaxy.datatypes.binary.SearchGuiArchive(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Class describing a SearchGUI archive

file_ext = 'searchgui_archive'
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa885898>, 'searchgui_major_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855b38>, 'searchgui_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855ac8>}
class galaxy.datatypes.binary.NetCDF(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Binary data in netCDF format

file_ext = 'netcdf'
edam_format = 'format_3650'
edam_data = 'data_0943'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855d30>}
class galaxy.datatypes.binary.Dcd(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a dcd file from the CHARMM molecular simulation program

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_glucose_vacuum.dcd')
>>> Dcd().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Dcd().sniff(fname)
False
file_ext = 'dcd'
edam_data = 'data_3842'
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa855f28>}
class galaxy.datatypes.binary.Vel(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a velocity file from the CHARMM molecular simulation program

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_charmm.vel')
>>> Vel().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Vel().sniff(fname)
False
file_ext = 'vel'
__init__(**kwd)[source]
sniff(filename)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7df160>}
class galaxy.datatypes.binary.DAA(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a DAA (diamond alignment archive) file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.daa')
>>> DAA().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DAA().sniff(fname)
False

file_ext = 'daa'
__init__(**kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7df358>}
class galaxy.datatypes.binary.RMA6(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an RMA6 (MEGAN6 read-match archive) file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.rma6')
>>> RMA6().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> RMA6().sniff(fname)
False

file_ext = 'rma6'
__init__(**kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7df550>}
class galaxy.datatypes.binary.DMND(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing a DMND file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond_db.dmnd')
>>> DMND().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DMND().sniff(fname)
False

file_ext = 'dmnd'
__init__(**kwd)[source]
sniff(filename)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7df748>}
class galaxy.datatypes.binary.ICM(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class describing an ICM (interpolated context model) file, used by Glimmer

file_ext = 'icm'
edam_data = 'data_0950'
set_peek(dataset, is_multi_byte=False)[source]
sniff(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7df940>}
class galaxy.datatypes.binary.BafTar(**kwd)[source]

Bases: galaxy.datatypes.binary.CompressedArchive

Base class for common behavior of tar files of directory-based raw file formats

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> BafTar().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> BafTar().sniff(fname)
False

edam_data = 'data_2536'
edam_format = 'format_3712'
file_ext = 'brukerbaf.d.tar'
get_signature_file()[source]
sniff(filename)[source]
get_type()[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7dfb38>}
class galaxy.datatypes.binary.YepTar(**kwd)[source]

Bases: galaxy.datatypes.binary.BafTar

A tar’d up .d directory containing Agilent/Bruker YEP format data

file_ext = 'agilentbrukeryep.d.tar'
get_signature_file()[source]
get_type()[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7dfd30>}
class galaxy.datatypes.binary.TdfTar(**kwd)[source]

Bases: galaxy.datatypes.binary.BafTar

A tar’d up .d directory containing Bruker TDF format data

file_ext = 'brukertdf.d.tar'
get_signature_file()[source]
get_type()[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7dff28>}
class galaxy.datatypes.binary.MassHunterTar(**kwd)[source]

Bases: galaxy.datatypes.binary.BafTar

A tar’d up .d directory containing Agilent MassHunter format data

file_ext = 'agilentmasshunter.d.tar'
get_signature_file()[source]
get_type()[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7e8160>}
class galaxy.datatypes.binary.MassLynxTar(**kwd)[source]

Bases: galaxy.datatypes.binary.BafTar

A tar’d up .d directory containing Waters MassLynx format data

file_ext = 'watersmasslynx.raw.tar'
get_signature_file()[source]
get_type()[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7e8358>}
class galaxy.datatypes.binary.WiffTar(**kwd)[source]

Bases: galaxy.datatypes.binary.BafTar

A tar’d up .wiff/.scan pair containing Sciex WIFF format data

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('some.wiff.tar')
>>> WiffTar().sniff(fname)
True
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> WiffTar().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> WiffTar().sniff(fname)
False

file_ext = 'wiff.tar'
sniff(filename)[source]
get_type()[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7e8550>}

galaxy.datatypes.blast module

NCBI BLAST datatypes.

Covers the blastxml format and the BLAST databases.

class galaxy.datatypes.blast.BlastXml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

NCBI Blast XML Output data

file_ext = 'blastxml'
edam_format = 'format_3331'
edam_data = 'data_0857'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff_prefix(file_prefix)[source]

Determines whether the file is blastxml

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('tblastn_four_human_vs_rhodopsin.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> BlastXml().sniff(fname)
False
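A prefix-based check for BLAST XML could look roughly like the sketch below. It assumes the layout typical of NCBI BLAST XML output: an XML declaration, then a `BlastOutput` DOCTYPE, then the `<BlastOutput>` root element. `looks_like_blastxml` is a hypothetical function that works on already-decoded text; it is not the `sniff_prefix` implementation.

```python
def looks_like_blastxml(prefix_text):
    """Return True if the first three non-blank lines match the assumed BLAST XML header."""
    lines = [line.strip() for line in prefix_text.splitlines() if line.strip()]
    if len(lines) < 3:
        return False
    return (
        lines[0].startswith("<?xml ")
        and lines[1].startswith("<!DOCTYPE BlastOutput ")
        and lines[2] == "<BlastOutput>"
    )
```

Looking only at the start of the file keeps the check cheap for large result files, which is presumably why the method takes a file prefix and not a full path.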
static merge(split_files, output_file)[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9d0da90>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.blast.BlastNucDb(**kwd)[source]

Bases: galaxy.datatypes.blast._BlastDb, galaxy.datatypes.data.Data

Class for nucleotide BLAST database files.

file_ext = 'blastdbn'
allow_datatype_change = False
composite_type = 'basic'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda84ce10>}
class galaxy.datatypes.blast.BlastProtDb(**kwd)[source]

Bases: galaxy.datatypes.blast._BlastDb, galaxy.datatypes.data.Data

Class for protein BLAST database files.

file_ext = 'blastdbp'
allow_datatype_change = False
composite_type = 'basic'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda84c9e8>}
class galaxy.datatypes.blast.BlastDomainDb(**kwd)[source]

Bases: galaxy.datatypes.blast._BlastDb, galaxy.datatypes.data.Data

Class for domain BLAST database files.

file_ext = 'blastdbd'
allow_datatype_change = False
composite_type = 'basic'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda84cb38>}

galaxy.datatypes.checkers module

Module proxies galaxy.util.checkers for backward compatibility.

External datatypes may make use of these functions.

galaxy.datatypes.checkers.check_binary(name, file_path=True)[source]
galaxy.datatypes.checkers.check_bz2(file_path, check_content=True)[source]
galaxy.datatypes.checkers.check_gzip(file_path, check_content=True)[source]
galaxy.datatypes.checkers.check_html(name, file_path=True)[source]

Returns True if the file/string contains HTML code.

galaxy.datatypes.checkers.check_image(file_path)[source]

Simple wrapper around image_type to yield a True/False verdict

galaxy.datatypes.checkers.check_zip(file_path, check_content=True, files=1)[source]
galaxy.datatypes.checkers.is_gzip(file_path)[source]
galaxy.datatypes.checkers.is_bz2(file_path)[source]
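
As a rough illustration of what a compression check can look like, the sketch below tests the leading magic bytes of a file. It is not the implementation in galaxy.util.checkers; the helper names are hypothetical, and only the magic numbers themselves (gzip: 1f 8b, bzip2: "BZh") are taken from the respective formats.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
BZ2_MAGIC = b"BZh"        # first three bytes of every bzip2 stream


def starts_with(path, magic):
    """Return True if the file at path begins with the given bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(magic)) == magic


def looks_like_gzip(path):
    return starts_with(path, GZIP_MAGIC)


def looks_like_bz2(path):
    return starts_with(path, BZ2_MAGIC)
```

A magic-byte test says nothing about what the archive contains, which is presumably why the check_* functions above accept a check_content flag.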

galaxy.datatypes.chrominfo module

class galaxy.datatypes.chrominfo.ChromInfo(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'len'
metadata_spec = {'chrom': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda756860>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'length': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda756978>}

galaxy.datatypes.constructive_solid_geometry module

Constructive Solid Geometry file formats.

class galaxy.datatypes.constructive_solid_geometry.Ply(**kwd)[source]

Bases: object

The PLY format describes an object as a collection of vertices, faces and other elements, along with properties such as color and normal direction that can be attached to these elements. A PLY file contains the description of exactly one object.

subtype = ''
__init__(**kwd)[source]
sniff_prefix(file_prefix)[source]

The structure of a typical PLY file: Header, Vertex List, Face List, (lists of other elements)
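
The header portion of that structure can be read with a few lines of code. The following is a minimal sketch, assuming the standard PLY header keywords (`ply`, `format`, `element`, `end_header`); the function name `parse_ply_header` is hypothetical and this is not the Galaxy sniffer itself.

```python
def parse_ply_header(lines):
    """Return (file_format, {element_name: count}), or None if not a PLY header."""
    iterator = iter(lines)
    # The first line of a PLY file is the magic word "ply".
    if next(iterator, "").strip() != "ply":
        return None
    file_format = None
    elements = {}
    for line in iterator:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "format" and len(fields) >= 2:
            file_format = fields[1]  # e.g. "ascii" or "binary_little_endian"
        elif fields[0] == "element" and len(fields) == 3:
            elements[fields[1]] = int(fields[2])  # e.g. "element vertex 8"
        elif fields[0] == "end_header":
            return file_format, elements
    return None  # the header was never terminated
```

The element counts recovered this way correspond to the kind of values stored in the vertex, face and other_elements metadata of the PLY datatypes.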

set_meta(dataset, **kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.PlyAscii(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Ply, galaxy.datatypes.data.Text

file_ext = 'plyascii'
subtype = 'ascii'
__init__(**kwd)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'face': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6630>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6d68>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6860>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6f98>}
class galaxy.datatypes.constructive_solid_geometry.PlyBinary(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Ply, galaxy.datatypes.binary.Binary

file_ext = 'plybinary'
subtype = 'binary'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'face': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6a90>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6208>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6ba8>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92f6a20>}
class galaxy.datatypes.constructive_solid_geometry.Vtk(**kwd)[source]

Bases: object

The Visualization Toolkit provides a number of source and writer objects to read and write popular data file formats. The Visualization Toolkit also provides some of its own file formats.

There are two different styles of file formats available in VTK. The simplest are the legacy, serial formats that are easy to read and write either by hand or programmatically. However, these formats are less flexible than the XML based file formats which support random access, parallel I/O, and portable data compression and are preferred to the serial VTK file formats whenever possible.

All keyword phrases are written in ASCII form whether the file is binary or ASCII. The binary section of the file (if in binary form) is the data proper; i.e., the numbers that define point coordinates, scalars, cell indices, and so forth.

Binary data must be placed into the file immediately after the newline (‘\n’) character from the previous ASCII keyword and parameter sequence.

TODO: only legacy formats are currently supported and support for XML formats should be added.

subtype = ''
__init__(**kwd)[source]
sniff_prefix(file_prefix)[source]

VTK files can be either ASCII or binary, with two different styles of file formats: legacy or XML. We’ll assume if the file contains a valid VTK header, then it is a valid VTK file.

set_meta(dataset, **kwd)[source]
set_initial_metadata(i, line, dataset)[source]
set_structure_metadata(line, dataset, dataset_type)[source]

The fourth part of legacy VTK files is the dataset structure. The geometry part describes the geometry and topology of the dataset. This part begins with a line containing the keyword DATASET followed by a keyword describing the type of dataset. Then, depending upon the type of dataset, other keyword/data combinations define the actual data.
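
A minimal sketch of reading the first four parts of a legacy VTK file follows. It assumes the standard legacy layout (a `# vtk DataFile Version x.y` line, a title line, an ASCII or BINARY keyword, then the DATASET line); the function name and the returned dictionary keys are hypothetical, chosen to echo the metadata names listed for the Vtk datatypes.

```python
import re

VERSION_LINE = re.compile(r"^# vtk DataFile Version (\d+\.\d+)$")


def parse_vtk_legacy_header(lines):
    """Return a dict describing the header, or None if it is not legacy VTK."""
    if len(lines) < 4:
        return None
    match = VERSION_LINE.match(lines[0].strip())
    if not match:
        return None
    file_format = lines[2].strip()
    if file_format not in ("ASCII", "BINARY"):
        return None
    structure = lines[3].split()
    # The fourth part starts with the keyword DATASET and a dataset type.
    if len(structure) != 2 or structure[0] != "DATASET":
        return None
    return {
        "vtk_version": match.group(1),
        "title": lines[1].strip(),
        "file_format": file_format,
        "dataset_type": structure[1],
    }
```

Because the keywords are ASCII even in binary files, the same header logic can serve both the ASCII and the binary variant.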

get_blurb(dataset)[source]
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.VtkAscii(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Vtk, galaxy.datatypes.data.Text

file_ext = 'vtkascii'
subtype = 'ASCII'
__init__(**kwd)[source]
metadata_spec = {'cells': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329128>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374400>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374cf8>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329978>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93296a0>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374470>, 'lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93297b8>, 'origin': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93744e0>, 'points': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374080>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329ba8>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374a90>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93290f0>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9374a20>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda3fe438>}
class galaxy.datatypes.constructive_solid_geometry.VtkBinary(**kwd)[source]

Bases: galaxy.datatypes.constructive_solid_geometry.Vtk, galaxy.datatypes.binary.Binary

file_ext = 'vtkbinary'
subtype = 'BINARY'
__init__(**kwd)[source]
metadata_spec = {'cells': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb240>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329630>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329a58>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb358>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb320>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329cf8>, 'lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb198>, 'origin': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329080>, 'points': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329780>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb128>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329b00>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed93bb8d0>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329ac8>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9329b70>}
class galaxy.datatypes.constructive_solid_geometry.STL(**kwd)[source]

Bases: galaxy.datatypes.data.Data

file_ext = 'stl'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9410358>}
galaxy.datatypes.constructive_solid_geometry.get_next_line(fh)[source]

galaxy.datatypes.coverage module

Coverage datatypes

class galaxy.datatypes.coverage.LastzCoverage(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'coverage'
get_track_resolution(dataset, start, end)[source]
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb7779e8>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb777d68>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'forwardCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb777ac8>, 'positionCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb777a20>, 'reverseCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb777cc0>}

galaxy.datatypes.data module

class galaxy.datatypes.data.DatatypeValidation(state, message)[source]

Bases: object

__init__(state, message)[source]
static validated()[source]
static invalid(message)[source]
static unvalidated()[source]
galaxy.datatypes.data.validate(dataset_instance)[source]
class galaxy.datatypes.data.DataMeta(name, bases, dict_)[source]

Bases: abc.ABCMeta

Metaclass for Data class. Sets up metadata spec.

__init__(name, bases, dict_)[source]
class galaxy.datatypes.data.Data(**kwd)[source]

Bases: object

Base class for all datatypes. Implements basic interfaces as well as class methods for metadata.

>>> class DataTest( Data ):
...     MetadataElement( name="test" )
...
>>> DataTest.metadata_spec.test.name
'test'
>>> DataTest.metadata_spec.test.desc
'test'
>>> type( DataTest.metadata_spec.test.param )
<class 'galaxy.model.metadata.MetadataParameter'>
edam_data = 'data_0006'
edam_format = 'format_1915'
file_ext = 'data'
CHUNKABLE = False
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}

Dictionary of metadata fields for this datatype

copy_safe_peek = True
is_binary = True
allow_datatype_change = True
composite_type = None
primary_file_name = 'index'
track_type = None
data_sources = {}
__init__(**kwd)[source]

Initialize the datatype

supported_display_apps = {}
composite_files = {}
get_raw_data(dataset)[source]

Returns the full data. To stream it, open the file_name and read/write as needed.

dataset_content_needs_grooming(file_name)[source]

This function is called on an output dataset file after the content is initially generated.

groom_dataset_content(file_name)[source]

This function is called on an output dataset file if dataset_content_needs_grooming returns True.

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Unimplemented method, allows guessing of metadata from contents of file

missing_meta(dataset, check=None, skip=None)[source]

Checks for empty metadata values. Returns True if non-optional metadata is missing. Specifying a list of ‘check’ values will only check those names provided; when used, optionality is ignored. Specifying a list of ‘skip’ items will return True even when a named metadata value is missing.

set_max_optional_metadata_filesize(max_value)[source]
get_max_optional_metadata_filesize()[source]
max_optional_metadata_filesize
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

Parameters: is_multi_byte (bool) – deprecated
display_peek(dataset)[source]

Create HTML table, used for displaying peek

to_archive(trans, dataset, name='')[source]

Collect archive paths and file handles that need to be exported when archiving dataset.

Parameters:
  • dataset – HistoryDatasetAssociation
  • name – archive name, in collection context corresponds to collection name(s) and element_identifier, joined by ‘/’, e.g. ‘fastq_collection/sample1/forward’
display_data(trans, data, preview=False, filename=None, to_ext=None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful if overriding this method, and this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

display_as_markdown(dataset_instance, markdown_format_helpers)[source]

Prepare for embedding dataset into a basic Markdown document.

This is a somewhat experimental interface and should not be implemented on datatypes not tightly tied to a Galaxy version (e.g. datatypes in the Tool Shed).

Speaking very loosely - the datatype should load a bounded amount of data from the supplied dataset instance and prepare for embedding it into Markdown. This should be relatively vanilla Markdown - the result of this is bleached and it should not contain nested Galaxy Markdown directives.

If the data cannot reasonably be displayed, just indicate this and do not throw an exception.

display_name(dataset)[source]

Returns formatted html of dataset name

display_info(dataset)[source]

Returns formatted html of dataset info

repair_methods(dataset)[source]

Unimplemented method, returns dict with method/option for repairing errors

get_mime()[source]

Returns the mime type of the datatype

add_display_app(app_id, label, file_function, links_function)[source]

Adds a display app to the datatype. app_id is a unique id. label is the primary display label, e.g., display at ‘UCSC’. file_function is a string containing the name of the function that returns a properly formatted display. links_function is a string containing the name of the function that returns a list of (link_name, link).

remove_display_app(app_id)[source]

Removes a display app from the datatype

clear_display_apps()[source]
add_display_application(display_application)[source]

New style display applications

get_display_application(key, default=None)[source]
get_display_applications_by_dataset(dataset, trans)[source]
get_display_types()[source]

Returns display types available

get_display_label(type)[source]

Returns primary label for display app

as_display_type(dataset, type, **kwd)[source]

Returns modified file contents for a particular display type

Returns a list of tuples of (name, link) for a particular display type. No check on ‘access’ permissions is done here - if you can view the dataset, you can also save it or send it to a destination outside of Galaxy, so Galaxy security restrictions do not apply anyway.

get_converter_types(original_dataset, datatypes_registry)[source]

Returns available converters by type for this dataset

find_conversion_destination(dataset, accepted_formats, datatypes_registry, **kwd)[source]

Returns ( target_ext, existing converted dataset )

convert_dataset(trans, original_dataset, target_type, return_output=False, visible=True, deps=None, target_context=None, history=None)[source]

This function adds a job to the queue to convert a dataset to another type. Returns a message about success/failure.

after_setting_metadata(dataset)[source]

This function is called on the dataset after metadata is set.

before_setting_metadata(dataset)[source]

This function is called on the dataset before metadata is set.

add_composite_file(name, **kwds)[source]
writable_files
get_composite_files(dataset=None)[source]
generate_primary_file(dataset=None)[source]
has_resolution
matches_any(target_datatypes)[source]

Check if this datatype is of any of the target_datatypes or is a subtype thereof.
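
The subtype test can be pictured with a small, standalone sketch. It assumes the targets may be given either as classes or as instances; the free function and the three toy classes below are hypothetical stand-ins, not Galaxy datatypes.

```python
import inspect


class Text:      # hypothetical stand-in for a base datatype
    pass


class Tabular(Text):  # hypothetical subtype
    pass


class Binary:    # hypothetical unrelated datatype
    pass


def matches_any(instance, targets):
    """True if instance is an instance of any target class (or target's class)."""
    classes = tuple(t if inspect.isclass(t) else type(t) for t in targets)
    return isinstance(instance, classes)
```

For example, `matches_any(Tabular(), [Text])` holds because Tabular is a subtype of Text, while `matches_any(Tabular(), [Binary])` does not.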

static merge(split_files, output_file)[source]

Merges files with shutil.copyfileobj(), which avoids the maximum argument limitation of cat. gz and bz2 files are also supported.
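
A minimal sketch of this kind of merge is shown below, using shutil.copyfileobj() from the standard library (the copy module has no copyfileobj function). The function name `merge_files` is hypothetical and the real method may handle more cases.

```python
import shutil


def merge_files(split_files, output_file):
    """Concatenate the given files, in order, into output_file."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as part:
                # Streams in chunks, so neither file size nor the number
                # of parts is limited by command-line length.
                shutil.copyfileobj(part, out)
```

Byte-wise concatenation also works for gzip and bzip2 parts because both formats allow several compressed streams back to back in one file.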

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

has_dataprovider(data_format)[source]

Returns True if data_format is available in dataproviders.

dataprovider(dataset, data_format, **settings)[source]

Base dataprovider factory for all datatypes that returns the proper provider for the given data_format or raises a NoProviderAvailable.
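
The factory pattern described here can be sketched as a dictionary lookup. Everything below is a hypothetical stand-in: the class `SketchDatatype`, the "upper" provider and the local `NoProviderAvailable` exception only mimic the names used in the text and do not reproduce Galaxy's dataprovider machinery.

```python
class NoProviderAvailable(Exception):
    """Stand-in for the exception named in the text."""


class SketchDatatype:
    # Maps a data_format name to a function(dataset, **settings).
    dataproviders = {
        "base": lambda dataset, **settings: iter(dataset),
        "upper": lambda dataset, **settings: (item.upper() for item in dataset),
    }

    def has_dataprovider(self, data_format):
        return data_format in self.dataproviders

    def dataprovider(self, dataset, data_format, **settings):
        if not self.has_dataprovider(data_format):
            raise NoProviderAvailable(data_format)
        return self.dataproviders[data_format](dataset, **settings)
```

Callers that want to avoid the exception can test has_dataprovider() first, as the two methods are documented side by side.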

validate(dataset, **kwd)[source]
base_dataprovider(dataset, **settings)[source]
chunk_dataprovider(dataset, **settings)[source]
chunk64_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>}
class galaxy.datatypes.data.Text(**kwd)[source]

Bases: galaxy.datatypes.data.Data

edam_format = 'format_2330'
file_ext = 'txt'
line_class = 'line'
is_binary = False
get_mime()[source]

Returns the mime type of the datatype

set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

estimate_file_lines(dataset)[source]

Perform a rough estimate by extrapolating number of lines from a small read.
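
One way to extrapolate a line count from a small read is sketched below. The function name, the sample size default and the rounding are assumptions made for illustration; the actual method may sample and scale differently.

```python
import os


def estimate_line_count(path, sample_size=1024 * 1024):
    """Estimate the number of lines in a file from its first sample_size bytes."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return 0
    newlines = sample.count(b"\n")
    if len(sample) == file_size:
        # The whole file fit in the sample, so the count is exact.
        return newlines if sample.endswith(b"\n") else newlines + 1
    # Scale the newline density of the sample up to the full file size.
    return max(1, newlines * file_size // len(sample))
```

The estimate is only as good as the assumption that line lengths at the start of the file are representative of the rest.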

count_data_lines(dataset)[source]

Count the number of lines of data in dataset, skipping all blank lines and comments.
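
The counting rule can be sketched in a few lines. The sketch assumes that a comment is any line whose first non-blank character is `#`; the function name and the `comment_prefix` parameter are hypothetical.

```python
def count_non_comment_lines(lines, comment_prefix="#"):
    """Count lines that are neither blank nor comments (assumed '#' prefix)."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count
```

Passing an open text file works as well as a list, since the function only iterates over its argument.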

set_peek(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=None, line_wrap=True)[source]

Set the peek. This method is used by various subclasses of Text.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by line.

line_dataprovider(dataset, **settings)[source]

Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.

regex_line_dataprovider(dataset, **settings)[source]

Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
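
The behaviour of the two line-oriented providers can be approximated with plain generators. This is a sketch under assumptions: the function names and the parameters `skip_blank`, `comment_char`, `patterns` and `invert` are hypothetical and do not reflect the settings accepted by the real providers.

```python
import re


def iter_lines(lines, skip_blank=True, comment_char=None):
    """Yield stripped lines, optionally dropping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if skip_blank and not line:
            continue
        if comment_char and line.startswith(comment_char):
            continue
        yield line


def iter_matching_lines(lines, patterns, invert=False):
    """Yield stripped, non-blank lines that match (or, inverted, miss) every filter."""
    compiled = [re.compile(pattern) for pattern in patterns]
    for line in iter_lines(lines):
        matched = any(regex.search(line) for regex in compiled)
        if matched != invert:
            yield line
```

Both functions are lazy, so they can be chained over a large file without loading it into memory.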

dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>}
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.data.Directory(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Class representing a directory of files.

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedcc0>}
class galaxy.datatypes.data.GenericAsn1(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class for generic ASN.1 text format

edam_data = 'data_0849'
edam_format = 'format_1966'
file_ext = 'asn1'
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedeb8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.data.LineCount(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Dataset contains a single line with a single integer that denotes the line count for a related dataset. Used for custom builds.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbf60f0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.data.Newick(**kwd)[source]

Bases: galaxy.datatypes.data.Text

New Hampshire/Newick Format

edam_data = 'data_0872'
edam_format = 'format_1910'
file_ext = 'newick'
__init__(**kwd)[source]

Initialize Newick datatype

init_meta(dataset, copy_from=None)[source]
sniff(filename)[source]

Returns False, as the Newick format is too general and cannot be sniffed.

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbf62e8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.data.Nexus(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Nexus format as used by PAUP, MrBayes, etc.

edam_data = 'data_0872'
edam_format = 'format_1912'
file_ext = 'nex'
__init__(**kwd)[source]

Initialize Nexus datatype

init_meta(dataset, copy_from=None)[source]
sniff_prefix(file_prefix)[source]

All Nexus files simply have ‘#NEXUS’ on their first line.
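
That rule translates into a very small check. The sketch below assumes a case-sensitive comparison against the literal `#NEXUS`; the function name is hypothetical.

```python
def looks_like_nexus(text):
    """True if the first line of text starts with the literal '#NEXUS'."""
    first_line = text.splitlines()[0] if text else ""
    return first_line.startswith("#NEXUS")
```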

get_visualizations(dataset)[source]

Returns a list of visualizations for datatype.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbf64e0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
galaxy.datatypes.data.get_test_fname(fname)[source]

Returns test data filename

galaxy.datatypes.data.get_file_peek(file_name, is_multi_byte=False, WIDTH=256, LINE_COUNT=5, skipchars=None, line_wrap=True)[source]

Returns the first LINE_COUNT lines wrapped to WIDTH.

Parameters: is_multi_byte (bool) – deprecated
>>> def assert_peek_is(file_name, expected, *args, **kwd):
...     path = get_test_fname(file_name)
...     peek = get_file_peek(path, *args, **kwd)
...     assert peek == expected, "%s != %s" % (peek, expected)
>>> assert_peek_is('0_nonewline', u'0')
>>> assert_peek_is('0.txt', u'0\n')
>>> assert_peek_is('4.bed', u'chr22\t30128507\t31828507\tuc003bnx.1_cds_2_0_chr22_29227_f\t0\t+\n', LINE_COUNT=1)
>>> assert_peek_is('1.bed', u'chr1\t147962192\t147962580\tCCDS989.1_cds_0_0_chr1_147962193_r\t0\t-\nchr1\t147984545\t147984630\tCCDS990.1_cds_0_0_chr1_147984546_f\t0\t+\n', LINE_COUNT=2)

galaxy.datatypes.genetics module

rgenetics datatypes. Use at your peril. Ross Lazarus, for the rgenetics and galaxy projects.

Genome graphs datatypes are derived from Interval datatypes. Genome graphs datasets have a header row with appropriate column names. The first column is always the marker, e.g. column name = rs, first row = rs12345 if the rows are SNPs. Subsequent row values are all numeric! Will fail on any non-numeric values (e.g. ‘+’ or ‘NA’). Ross Lazarus for rgenetics, August 20 2007.

class galaxy.datatypes.genetics.GenomeGraphs(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Tab delimited data containing a marker id and any number of numeric values

file_ext = 'gg'
__init__(**kwd)[source]

Initialize gg datatype, by adding UCSC display apps

set_meta(dataset, **kwd)[source]
as_ucsc_display_file(dataset, **kwd)[source]

Returns file

From the ever-helpful Angie Hinrichs (angie@soe.ucsc.edu): a genome graphs call looks like this

http://genome.ucsc.edu/cgi-bin/hgGenome?clade=mammal&org=Human&db=hg18&hgGenome_dataSetName=dname&hgGenome_dataSetDescription=test&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess&hgGenome_columnLabels=best%20guess&hgGenome_maxVal=&hgGenome_labelVals=&hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http://galaxy.esphealth.org/datasets/333/display/index&hgGenome_doSubmitUpload=submit

Galaxy gives this for an interval file

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr1:1-1000&hgt.customText=http%3A%2F%2Fgalaxy.esphealth.org%2Fdisplay_as%3Fid%3D339%26display_app%3Ducsc
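
The interval-file URL above (ignoring any whitespace introduced by line wrapping) can be rebuilt by percent-encoding the Galaxy display link and appending it as hgt.customText. The sketch uses only urllib.parse.quote from the standard library; the function name and its parameters are hypothetical.

```python
from urllib.parse import quote


def ucsc_custom_track_url(db, position, data_url):
    """Build an hgTracks link whose custom track is fetched from data_url."""
    return (
        "http://genome.ucsc.edu/cgi-bin/hgTracks"
        "?db=%s&position=%s&hgt.customText=%s"
        # safe="" also encodes ':' '/' '?' '=' and '&' inside the nested URL
        % (db, position, quote(data_url, safe=""))
    )
```

Encoding the nested URL in full keeps its own `?` and `&` from being read as parameters of the outer hgTracks request.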

make_html_table(dataset, skipchars=[])[source]

Create HTML table, used for displaying peek

validate(dataset, **kwd)[source]

Validate a gg file - all numeric after header row
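
The "all numeric after the header row" rule can be sketched as follows. The sketch assumes rows that are already split on tabs, with the marker id in the first column; the function name and the shape of the returned list are hypothetical.

```python
def find_non_numeric_cells(rows):
    """Return (row_index, column_index, value) for each non-numeric data cell."""
    problems = []
    for row_index, row in enumerate(rows):
        if row_index == 0:
            continue  # header row
        # Column 0 holds the marker id, so only later columns must be numeric.
        for column_index, cell in enumerate(row[1:], start=1):
            try:
                float(cell)
            except ValueError:
                problems.append((row_index, column_index, cell))
    return problems
```

Note that Python's float() accepts strings such as "nan" and "inf", so a stricter validator may want to reject those explicitly; "NA" and "+" are rejected as the module description requires.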

sniff_prefix(file_prefix)[source]

Determines whether the file is in gg format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> GenomeGraphs().sniff( fname )
False
>>> fname = get_test_fname( '1.gg' )
>>> GenomeGraphs().sniff( fname )
True
get_mime()[source]

Returns the mime type of the datatype

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad9fcc0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad9f908>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'markerCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad91080>}
sniff(filename)
class galaxy.datatypes.genetics.rgTabList(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

For sampleid and featureid lists of exclusions or inclusions in the clean tool. featureid subsets on statistical criteria -> specialized display such as gg.

file_ext = 'rgTList'
__init__(**kwd)[source]

Initialize featurelist datatype

display_peek(dataset)[source]

Returns formatted html of peek

get_mime()[source]

Returns the mime type of the datatype

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9ef73c8>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad83da0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad9fc50>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad9f278>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad9fb70>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad83c50>}
class galaxy.datatypes.genetics.rgSampleList(**kwd)[source]

Bases: galaxy.datatypes.genetics.rgTabList

For sampleid exclusions or inclusions in the clean tool. Output from QC, e.g. excess het, gender error, IBD pair member, eigen outlier, excess Mendel errors, … Since they can be uploaded, they should be flexible, but they are persistent at least. Same infrastructure for expression?

file_ext = 'rgSList'
__init__(**kwd)[source]

Initialize samplelist datatype

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda16fbe0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda16fb38>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad83588>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad83978>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad83908>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc5a550>}
class galaxy.datatypes.genetics.rgFeatureList(**kwd)[source]

Bases: galaxy.datatypes.genetics.rgTabList

For featureid lists of exclusions or inclusions in the clean tool. Output from QC, e.g. low MAF, high missingness, bad HWE in controls, excess Mendel errors, … featureid subsets on statistical criteria -> specialized display such as gg. Same infrastructure for expression?

file_ext = 'rgFList'
__init__(**kwd)[source]

Initialize featurelist datatype

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a400>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a390>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a320>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a208>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a278>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc4a470>}
class galaxy.datatypes.genetics.Rgenetics(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class to use for rgenetics datatypes, derived from html: composite datatype elements are stored in the extra files path.

composite_type = 'auto_primary_file'
allow_datatype_change = False
file_ext = 'rgenetics'
generate_primary_file(dataset=None)[source]
regenerate_primary_file(dataset)[source]

cannot do this until we are setting metadata

get_mime()[source]

Returns the mime type of the datatype

set_meta(dataset, **kwd)[source]

for lped/pbed eg

metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95c1cf8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.SNPMatrix(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

BioC SNPMatrix Rgenetics data collections

file_ext = 'snpmatrix'
set_peek(dataset, **kwd)[source]
sniff(filename)[source]

need to check the file header hex code

metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9a49a90>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Lped(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

linkage pedigree (ped,map) Rgenetics data collections

file_ext = 'lped'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9a37da0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Pphe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Plink phenotype file - header must have FID IID… Rgenetics data collections

file_ext = 'pphe'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed9a61908>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Fphe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

FBAT pedigree file - mad format with ! as first char on header row. Rgenetics data collections

file_ext = 'fphe'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95c39b0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Phe(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Phenotype file

file_ext = 'phe'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95c3f98>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Fped(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

FBAT pedigree format - single file, map is header row of rs numbers. Strange. Rgenetics data collections

file_ext = 'fped'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95c5f60>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Pbed(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Plink Binary compressed 2bit/geno Rgenetics data collections

file_ext = 'pbed'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95ca128>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.ldIndep(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

LD (a good measure of redundancy of information) depleted Plink Binary compressed 2bit/geno. This is really a plink binary, but some tools work better with less redundancy, so they are constrained to these files.

file_ext = 'ldreduced'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed95caac8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Eigenstratgeno(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Eigenstrat format - may be able to get rid of this if we move to shellfish Rgenetics data collections

file_ext = 'eigenstratgeno'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbc2f860>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Eigenstratpca(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

Eigenstrat PCA file for case control adjustment Rgenetics data collections

file_ext = 'eigenstratpca'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad98e48>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.Snptest(**kwd)[source]

Bases: galaxy.datatypes.genetics.Rgenetics

BioC snptest Rgenetics data collections

file_ext = 'snptest'
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad985c0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.IdeasPre(**kwd)[source]

Bases: galaxy.datatypes.text.Html

This datatype defines the input format required by IDEAS: https://academic.oup.com/nar/article/44/14/6721/2468150 The IDEAS preprocessor tool produces an output using this format. The extra_files_path of the primary input dataset contains the following files and directories:

- chromosome_windows.txt (optional)
- chromosomes.bed (optional)
- IDEAS_input_config.txt
- compressed archived tmp directory containing a number of compressed bed files

composite_type = 'auto_primary_file'
allow_datatype_change = False
file_ext = 'ideaspre'
__init__(**kwd)[source]
set_meta(dataset, **kwd)[source]
generate_primary_file(dataset=None)[source]
regenerate_primary_file(dataset)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad980f0>, 'chrom_bed': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad98080>, 'chrom_windows': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edad986a0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'input_config': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb9e8>, 'tmp_archive': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0ebb38>}
class galaxy.datatypes.genetics.Pheno(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

base class for pheno files

file_ext = 'pheno'
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb898>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb668>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0ebef0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0ebe10>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0ebc18>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb908>}
class galaxy.datatypes.genetics.RexpBase(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class for BioC data structures in Galaxy. Must be constructed with the pheno data in place, since that goes into the metadata for each instance.

file_ext = 'rexpbase'
html_table = None
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]

This is called only at upload to write the html file. Cannot rename the datasets here; they come with the default, unfortunately.

get_mime()[source]

Returns the mime type of the datatype

get_phecols(phenolist=[], maxConc=20)[source]

Return the interesting phenotype column names of an rexpression eset or affybatch, for use in array subsetting and so on. Returns a data structure for a dynamic Galaxy select parameter. (Sept 2009: whitespace cannot be used to split, so a more complex structure is built here and the methods that rely on it are adjusted.) A column with only 1 value doesn’t change, so is not interesting for analysis. A column with a different value in every row is equivalent to a unique identifier, so is also not interesting for anova or limma analysis. Both kinds are removed after the concordance (count of unique terms) is constructed for each column. Then a complication: each remaining pair of columns is tested for redundancy, and if two columns are always paired, only one is needed.
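The filtering described for get_phecols can be illustrated with a minimal standalone sketch. It is not the Galaxy implementation: the function name interesting_columns, the list-of-rows input, and the reading of maxConc as a cap on the number of distinct values are all assumptions made for illustration.

```python
from itertools import combinations


def interesting_columns(header, rows, max_conc=20):
    """Return names of columns worth offering for subsetting (hypothetical helper)."""
    n_rows = len(rows)
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}

    # Concordance: the number of unique terms in each column.
    keep = []
    for name in header:
        n_unique = len(set(columns[name]))
        if n_unique <= 1:        # constant column: nothing to compare
            continue
        if n_unique == n_rows:   # one value per row: behaves like an identifier
            continue
        if n_unique > max_conc:  # ASSUMPTION: maxConc caps the distinct values
            continue
        keep.append(name)

    # Redundancy: two columns are "always paired" when the number of distinct
    # value pairs equals the number of distinct values in each column alone.
    dropped = set()
    for a, b in combinations(keep, 2):
        if a in dropped or b in dropped:
            continue
        pairs = set(zip(columns[a], columns[b]))
        if len(pairs) == len(set(columns[a])) == len(set(columns[b])):
            dropped.add(b)
    return [name for name in keep if name not in dropped]
```

For a table with columns id, sex, gender, batch and species, where gender always mirrors sex and species is constant, this sketch would keep only sex and batch.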

get_pheno(dataset)[source]

Expects a .pheno file in the extra_files_dir. Note that R is weird and adds the row.name in the header, so the columns are all wrong unless you tell it not to. A file can be written as write.table(file='foo.pheno',pData(foo),sep='\t',quote=F,row.names=F)

set_peek(dataset, **kwd)[source]

Expects a .pheno file in the extra_files_dir. Note that R is weird and does not include the row.name in the header. Why?

get_peek(dataset)[source]

expects a .pheno file in the extra_files_dir - ugh

get_file_peek(filename)[source]

can’t really peek at a filename - need the extra_files_path and such?

regenerate_primary_file(dataset)[source]

cannot do this until we are setting metadata

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, **kwd)[source]

NOTE: we apply the tabular machinery to the phenodata extracted from a BioC eSet or affybatch.

make_html_table(pp='nothing supplied from peek\n')[source]

Create HTML table, used for displaying peek

display_peek(dataset)[source]

Returns formatted html of peek

metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb550>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb358>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eba58>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb4e0>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb5c0>}
class galaxy.datatypes.genetics.Affybatch(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'affybatch'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb438>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb6a0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb320>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb780>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb470>}
class galaxy.datatypes.genetics.Eset(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'eset'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e40b8>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb1d0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0eb160>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4048>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4128>}
class galaxy.datatypes.genetics.MAlist(**kwd)[source]

Bases: galaxy.datatypes.genetics.RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'malist'
__init__(**kwd)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4470>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4390>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4320>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4400>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e44e0>}
class galaxy.datatypes.genetics.LinkageStudies(**kwd)[source]

Bases: galaxy.datatypes.data.Text

superclass for classical linkage analysis suites

test_files = ['linkstudies.allegro_fparam', 'linkstudies.alohomora_gts', 'linkstudies.linkage_datain', 'linkstudies.linkage_map']
__init__(**kwd)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e46d8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.genetics.GenotypeMatrix(**kwd)[source]

Bases: galaxy.datatypes.genetics.LinkageStudies

Sample matrix of genotypes - GTs as columns

file_ext = 'alohomora_gts'
__init__(**kwd)[source]
header_check(fio)[source]
sniff_prefix(file_prefix)[source]
>>> classname = GenotypeMatrix
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4908>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.genetics.MarkerMap(**kwd)[source]

Bases: galaxy.datatypes.genetics.LinkageStudies

Map of genetic markers including physical and genetic distance. Common input format for linkage programs.

chrom, genetic pos, markername, physical pos, Nr

file_ext = 'linkage_map'
header_check(fio)[source]
sniff_prefix(file_prefix)[source]
>>> classname = MarkerMap
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4b00>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.genetics.DataIn(**kwd)[source]

Bases: galaxy.datatypes.genetics.LinkageStudies

Common linkage input file for intermarker distances and recombination rates

file_ext = 'linkage_datain'
__init__(**kwd)[source]
sniff_prefix(file_prefix)[source]
>>> classname = DataIn
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4d30>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.genetics.AllegroLOD(**kwd)[source]

Bases: galaxy.datatypes.genetics.LinkageStudies

Allegro output format for LOD scores

file_ext = 'allegro_fparam'
header_check(fio)[source]
sniff_prefix(file_prefix)[source]
>>> classname = AllegroLOD
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eda0e4f28>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)

galaxy.datatypes.gis module

GIS classes

class galaxy.datatypes.gis.Shapefile(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

The Shapefile data format: For more information please see http://en.wikipedia.org/wiki/Shapefile

composite_type = 'auto_primary_file'
file_ext = 'shp'
allow_datatype_change = False
__init__(**kwd)[source]
generate_primary_file(dataset=None)[source]
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text.

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e5d908>}

galaxy.datatypes.graph module

Graph content classes.

class galaxy.datatypes.graph.Xgmml(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

XGMML graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

file_ext = 'xgmml'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Always returns False; the user must set this datatype manually.

static merge(split_files, output_file)[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

node_edge_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'node-edge': <function Xgmml.node_edge_dataprovider at 0x7f1ed8e20158>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>, 'xml': <function GenericXml.xml_dataprovider at 0x7f1edf3fcf28>}
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e7ad30>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.graph.Sif(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

SIF graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

First column: node id
Second column: relationship type
Third to Nth column: target ids for link

file_ext = 'sif'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

sniff(filename)[source]

Always returns False; the user must set this datatype manually.

static merge(split_files, output_file)[source]
node_edge_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'column': <function TabularData.column_dataprovider at 0x7f1efa7ec7b8>, 'dataset-column': <function TabularData.dataset_column_dataprovider at 0x7f1efa7ec950>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider at 0x7f1efa7ecc80>, 'dict': <function TabularData.dict_dataprovider at 0x7f1efa7ecae8>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'node-edge': <function Sif.node_edge_dataprovider at 0x7f1ed8e206a8>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>}
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e36470>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e364e0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e365c0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e361d0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e366a0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8e36d68>}
class galaxy.datatypes.graph.XGMMLGraphDataProvider(source, selector=None, max_depth=None, **kwargs)[source]

Bases: galaxy.datatypes.dataproviders.hierarchy.XMLDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
settings = {'limit': 'int', 'max_depth': 'int', 'offset': 'int', 'selector': 'str'}
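
The nodes/edges structure described above can be sketched with the standard library alone. This is an illustration rather than the provider's code: the function name xgmml_to_graph is hypothetical, and the sketch assumes that nodes are elements named node with an id attribute and that edges are elements named edge with source and target attributes.

```python
import xml.etree.ElementTree as ET


def xgmml_to_graph(xml_text):
    """Parse XGMML text into {'nodes': [...], 'edges': [...]} (illustrative only)."""
    root = ET.fromstring(xml_text)
    nodes, edges, index_of = [], [], {}

    def local(tag):
        # Drop any '{namespace}' prefix so namespaced documents also match.
        return tag.rsplit("}", 1)[-1]

    # ASSUMPTION: every <node> carries an 'id' attribute.
    for elem in root.iter():
        if local(elem.tag) == "node":
            node_id = elem.get("id")
            index_of[node_id] = len(nodes)
            data = {k: v for k, v in elem.attrib.items() if k != "id"}
            nodes.append({"id": node_id, "data": data})

    # Edges store indices into the nodes list, not the original ids.
    for elem in root.iter():
        if local(elem.tag) == "edge":
            source, target = elem.get("source"), elem.get("target")
            if source not in index_of or target not in index_of:
                continue  # skip edges that point at undeclared nodes
            data = {k: v for k, v in elem.attrib.items()
                    if k not in ("source", "target")}
            edges.append({"source": index_of[source],
                          "target": index_of[target],
                          "data": data})

    return {"nodes": nodes, "edges": edges}
```

Storing indices rather than ids in the edge list matches the "an index into nodes" wording of the description.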
class galaxy.datatypes.graph.SIFGraphDataProvider(source, indeces=None, column_count=None, column_types=None, parsers=None, parse_columns=True, deliminator='\t', filters=None, **kwargs)[source]

Bases: galaxy.datatypes.dataproviders.column.ColumnarDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
settings = {'column_count': 'int', 'column_types': 'list:str', 'comment_char': 'str', 'deliminator': 'str', 'filters': 'list:str', 'indeces': 'list:int', 'invert': 'bool', 'limit': 'int', 'offset': 'int', 'parse_columns': 'bool', 'provide_blank': 'bool', 'regex_list': 'list:escaped', 'strip_lines': 'bool', 'strip_newlines': 'bool'}
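
A small sketch of how SIF lines (node id, relationship type, then one or more target ids) could map onto the nodes/edges structure described above. The helper name sif_to_graph and the choice to store the relationship under a "type" key are assumptions for illustration; the provider's actual output may be shaped differently.

```python
def sif_to_graph(lines):
    """Build {'nodes': [...], 'edges': [...]} from SIF lines (illustrative only)."""
    nodes, edges, index_of = [], [], {}

    def node_index(node_id):
        # Register each node id once and return its position in the nodes list.
        if node_id not in index_of:
            index_of[node_id] = len(nodes)
            nodes.append({"id": node_id, "data": {}})
        return index_of[node_id]

    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Prefer tabs when present; otherwise fall back to any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        source = node_index(fields[0])
        if len(fields) < 3:
            continue  # a lone node id declares a node with no links
        relation = fields[1]
        for target_id in fields[2:]:
            # ASSUMPTION: the relationship type is kept as edge data.
            edges.append({"source": source,
                          "target": node_index(target_id),
                          "data": {"type": relation}})
    return {"nodes": nodes, "edges": edges}
```

Each target id in the third and later columns yields its own edge, so one SIF line can expand into several entries in the edge list.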

galaxy.datatypes.images module

Image classes

class galaxy.datatypes.images.Image(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Class describing an image

edam_data = 'data_2968'
edam_format = 'format_3547'
file_ext = ''
__init__(**kwd)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)[source]

Determine if the file is in this format

handle_dataset_as_image(hda)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d9d860>}
class galaxy.datatypes.images.Jpg(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3579'
file_ext = 'jpg'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d9ddd8>}
class galaxy.datatypes.images.Png(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3603'
file_ext = 'png'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d5ada0>}
class galaxy.datatypes.images.Tiff(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3591'
file_ext = 'tiff'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8dcc2b0>}
class galaxy.datatypes.images.Hamamatsu(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'vms'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8f38e48>}
class galaxy.datatypes.images.Mirax(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'mrxs'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d61d30>}
class galaxy.datatypes.images.Sakura(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'svslide'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d493c8>}
class galaxy.datatypes.images.Nrrd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

file_ext = 'nrrd'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d1c2b0>}
class galaxy.datatypes.images.Bmp(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3592'
file_ext = 'bmp'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d1cb38>}
class galaxy.datatypes.images.Gif(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3467'
file_ext = 'gif'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d1cd30>}
class galaxy.datatypes.images.Im(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3593'
file_ext = 'im'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d1ceb8>}
class galaxy.datatypes.images.Pcd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3594'
file_ext = 'pcd'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d880f0>}
class galaxy.datatypes.images.Pcx(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3595'
file_ext = 'pcx'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d882b0>}
class galaxy.datatypes.images.Ppm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3596'
file_ext = 'ppm'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d88438>}
class galaxy.datatypes.images.Psd(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3597'
file_ext = 'psd'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d885c0>}
class galaxy.datatypes.images.Xbm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3598'
file_ext = 'xbm'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d887b8>}
class galaxy.datatypes.images.Xpm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3599'
file_ext = 'xpm'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d889b0>}
class galaxy.datatypes.images.Rgb(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3600'
file_ext = 'rgb'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d88ba8>}
class galaxy.datatypes.images.Pbm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3601'
file_ext = 'pbm'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d88d30>}
class galaxy.datatypes.images.Pgm(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3602'
file_ext = 'pgm'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d88f28>}
class galaxy.datatypes.images.Eps(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3466'
file_ext = 'eps'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6e128>}
class galaxy.datatypes.images.Rast(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3605'
file_ext = 'rast'
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6e2e8>}
class galaxy.datatypes.images.Pdf(**kwd)[source]

Bases: galaxy.datatypes.images.Image

edam_format = 'format_3508'
file_ext = 'pdf'
sniff(filename)[source]

Determine if the file is in pdf format.
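
One common way to make such a check is to look for the "%PDF-" marker at the start of the file. The sketch below assumes that approach; the function name looks_like_pdf is hypothetical and the Galaxy method may test the header differently.

```python
def looks_like_pdf(filename):
    """Return True when the file begins with the PDF header marker."""
    # ASSUMPTION: checking the first five bytes is enough for a quick sniff.
    with open(filename, "rb") as handle:
        return handle.read(5) == b"%PDF-"
```

Reading in binary mode avoids any newline or encoding translation before the comparison.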

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6e4e0>}
galaxy.datatypes.images.create_applet_tag_peek(class_name, archive, params)[source]
class galaxy.datatypes.images.Gmaj(**kwd)[source]

Bases: galaxy.datatypes.data.Data

Class describing a GMAJ Applet

edam_format = 'format_3547'
file_ext = 'gmaj.zip'
copy_safe_peek = False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
get_mime()[source]

Returns the mime type of the datatype

sniff(filename)[source]

NOTE: the sniff.convert_newlines() call in the upload utility will keep Gmaj data types from being correctly sniffed, but the files can be uploaded (they’ll be sniffed as ‘txt’). This sniff function is here to provide an example of a sniffer for a zip file.
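
As a rough illustration of what a zip-file sniffer can look like, the sketch below uses only the standard zipfile module. The helper name zip_contains_extension is hypothetical, and which member files a real Gmaj archive must contain is not stated here, so the extension to look for is left as a parameter.

```python
import zipfile


def zip_contains_extension(filename, extension):
    """Return True if filename is a zip archive with a member ending in extension."""
    if not zipfile.is_zipfile(filename):
        return False
    with zipfile.ZipFile(filename) as archive:
        # ASSUMPTION: one matching member name is enough to accept the archive.
        wanted = extension.lower()
        return any(name.lower().endswith(wanted) for name in archive.namelist())
```

Calling zipfile.is_zipfile first keeps the function from raising on plain text or other non-archive uploads.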

metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6e630>}
class galaxy.datatypes.images.Html(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Deprecated class. Use galaxy.datatypes.text:Html instead. This class is kept for backwards compatibility only.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6e828>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.images.Laj(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Class describing a LAJ Applet

file_ext = 'laj'
copy_safe_peek = False
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d6ea20>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}

galaxy.datatypes.interval module

Interval datatypes

class galaxy.datatypes.interval.Interval(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

Tab delimited data containing interval information

edam_data = 'data_3002'
edam_format = 'format_3475'
file_ext = 'interval'
line_class = 'region'
track_type = 'FeatureTrack'
data_sources = {'data': 'tabix', 'index': 'bigwig'}

Add metadata elements

__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display apps

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, first_line_is_header=False, **kwd)[source]

Tries to guess from the line which columns hold the chromosome, region start, region end and strand
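
A minimal sketch of this kind of guessing, assuming the line is a tab-separated header comment such as "#chrom start end strand". The alias table, the function name guess_columns and the 1-based numbering are all assumptions made for illustration; the header names Galaxy actually recognises may differ.

```python
# Hypothetical header aliases; not taken from the Galaxy source.
ALIASES = {
    "chromCol": {"chrom", "chromosome", "chr"},
    "startCol": {"start", "chromstart"},
    "endCol": {"end", "stop", "chromend"},
    "strandCol": {"strand"},
}


def guess_columns(header_line):
    """Map metadata names to 1-based column numbers found in a header comment."""
    fields = header_line.lstrip("#").strip().split("\t")
    guessed = {}
    for position, field in enumerate(fields, start=1):
        label = field.strip().lower()
        for meta_name, names in ALIASES.items():
            # Keep the first match so a repeated label cannot override it.
            if label in names and meta_name not in guessed:
                guessed[meta_name] = position
    return guessed
```

Columns with no recognised label are simply left out of the result, so a caller can fall back to defaults for them.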

displayable(dataset)[source]
get_estimated_display_viewport(dataset, chrom_col=None, start_col=None, end_col=None)[source]

Return a chrom, start, stop tuple for viewing a file.

as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents with only the bed data

display_peek(dataset)[source]

Returns formatted html of peek

Generate links to UCSC genome browser sites based on the dbkey and content of dataset.

validate(dataset, **kwd)[source]

Validate an interval file using the bx GenomicIntervalReader

repair_methods(dataset)[source]

Return options for removing errors along with a description

sniff_prefix(file_prefix)[source]

Checks for ‘intervalness’

This format is mostly used by galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
get_track_resolution(dataset, start, end)[source]
genomic_region_dataprovider(dataset, **settings)[source]
genomic_region_dict_dataprovider(dataset, **settings)[source]
interval_dataprovider(dataset, **settings)[source]
interval_dict_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'column': <function TabularData.column_dataprovider at 0x7f1efa7ec7b8>, 'dataset-column': <function TabularData.dataset_column_dataprovider at 0x7f1efa7ec950>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider at 0x7f1efa7ecc80>, 'dict': <function TabularData.dict_dataprovider at 0x7f1efa7ecae8>, 'genomic-region': <function Interval.genomic_region_dataprovider at 0x7f1efa810488>, 'genomic-region-dict': <function Interval.genomic_region_dict_dataprovider at 0x7f1efa810620>, 'interval': <function Interval.interval_dataprovider at 0x7f1efa8107b8>, 'interval-dict': <function Interval.interval_dict_dataprovider at 0x7f1efa810950>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>}
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8087b8>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f98>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808e48>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f28>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808dd8>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808eb8>}
sniff(filename)
class galaxy.datatypes.interval.BedGraph(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Tab delimited chrom/start/end/datavalue dataset

edam_format = 'format_3583'
file_ext = 'bedgraph'
track_type = 'LineTrack'
data_sources = {'data': 'bigwig', 'index': 'bigwig'}
as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents as is with no modifications. TODO: this is a functional stub and will need to be enhanced moving forward to provide additional support for bedgraph.

get_estimated_display_viewport(dataset, chrom_col=0, start_col=1, end_col=2)[source]

Set viewport based on dataset’s first 100 lines.
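
As a rough illustration of a viewport estimate built from the first 100 lines, the sketch below keeps the first chromosome it sees and widens the start/end range over matching lines. The function name `estimate_viewport` and the skipping of header lines are assumptions for this example; this is not Galaxy's implementation.

```python
from itertools import islice


def estimate_viewport(lines, chrom_col=0, start_col=1, end_col=2, max_lines=100):
    """Return (chrom, start, end) from the first max_lines lines (hypothetical helper)."""
    chrom = start = end = None
    needed = max(chrom_col, start_col, end_col)
    for line in islice(lines, max_lines):
        fields = line.rstrip("\n").split("\t")
        # Skip comment/track/browser lines and lines that are too short.
        if line.startswith(("#", "track", "browser")) or len(fields) <= needed:
            continue
        try:
            line_start, line_end = int(fields[start_col]), int(fields[end_col])
        except ValueError:
            continue
        if chrom is None:
            # The first data line decides which chromosome the viewport covers.
            chrom, start, end = fields[chrom_col], line_start, line_end
        elif fields[chrom_col] == chrom:
            start, end = min(start, line_start), max(end, line_end)
    return chrom, start, end
```

If no usable data line is found, the sketch returns (None, None, None).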

metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f160>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f400>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f2b0>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f390>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f240>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f320>}
class galaxy.datatypes.interval.Bed(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Tab delimited data in BED format

edam_format = 'format_3003'
file_ext = 'bed'
data_sources = {'data': 'tabix', 'feature_search': 'fli', 'index': 'bigwig'}
track_type = 'FeatureTrack'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts']

Add metadata elements

set_meta(dataset, overwrite=True, **kwd)[source]

Sets the metadata information for datasets previously determined to be in bed format.

as_ucsc_display_file(dataset, **kwd)[source]

Returns file contents with only the bed data. If bed 6+, treat as interval.

sniff_prefix(file_prefix)[source]

Checks for ‘bedness’

BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used. The data type of all 12 columns is: 1-str, 2-int, 3-int, 4-str, 5-int, 6-str, 7-int, 8-int, 9-int or list, 10-int, 11-list, 12-list

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_tab.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'interv1.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'complete.bed' )
>>> Bed().sniff( fname )
True
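
The column types listed above can be checked per line roughly as follows. This is a minimal sketch, not Galaxy's sniffer; `looks_like_bed_line` and the helper names are hypothetical, and the check for a consistent field count across lines is left out.

```python
def _is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_int_list(value):
    # Comma-separated integers; a trailing comma is tolerated.
    items = [item for item in value.split(",") if item]
    return bool(items) and all(_is_int(item) for item in items)


# One checker per BED column (1-12), in order; None means "any string".
BED_COLUMN_CHECKS = [
    None, _is_int, _is_int, None, _is_int, None,
    _is_int, _is_int, lambda v: _is_int(v) or _is_int_list(v),
    _is_int, _is_int_list, _is_int_list,
]


def looks_like_bed_line(line):
    """Return True if a tab-delimited line has 3-12 fields of the expected types."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        return False
    # zip() stops at the last populated field, so optional columns may be absent.
    for check, value in zip(BED_COLUMN_CHECKS, fields):
        if check is not None and not check(value):
            return False
    return True
```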
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f588>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f7b8>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f6d8>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f28>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f668>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f748>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f7f0>}
class galaxy.datatypes.interval.ProBed(**kwd)[source]

Bases: galaxy.datatypes.interval.Bed

Tab delimited data in proBED format - adaptation of BED for proteomics data.

edam_format = 'format_3827'
file_ext = 'probed'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts', 'ProteinAccession', 'PeptideSequence', 'Uniqueness', 'GenomeReferenceVersion', 'PsmScore', 'Fdr', 'Modifications', 'Charge', 'ExpMassToCharge', 'CalcMassToCharge', 'PsmRank', 'DatasetID', 'Uri']
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f9b0>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fbe0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fb00>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f28>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fa90>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fb70>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fc18>}
class galaxy.datatypes.interval.BedStrict(**kwd)[source]

Bases: galaxy.datatypes.interval.Bed

Tab delimited data in strict BED format - no non-standard columns allowed

edam_format = 'format_3584'
file_ext = 'bedstrict'
allow_datatype_change = False
__init__(**kwd)[source]
set_meta(dataset, overwrite=True, **kwd)[source]
sniff(filename)[source]
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fdd8>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a048>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80feb8>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80ff98>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80fe48>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80ff28>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f7f0>}
class galaxy.datatypes.interval.Bed6(**kwd)[source]

Bases: galaxy.datatypes.interval.BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6

edam_format = 'format_3585'
file_ext = 'bed6'
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a240>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a470>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a320>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a400>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a2b0>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a390>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f7f0>}
class galaxy.datatypes.interval.Bed12(**kwd)[source]

Bases: galaxy.datatypes.interval.BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12

edam_format = 'format_3586'
file_ext = 'bed12'
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a668>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a898>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a748>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a828>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a6d8>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81a7b8>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa80f7f0>}
class galaxy.datatypes.interval.Gff(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular, galaxy.datatypes.interval._RemoteCallMixin

Tab delimited data in Gff format

edam_data = 'data_1255'
edam_format = 'format_2305'
file_ext = 'gff'
valid_gff_frame = ['.', '0', '1', '2']
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group']
data_sources = {'data': 'interval_index', 'feature_search': 'fli', 'index': 'bigwig'}
track_type = 'FeatureTrack'

Add metadata elements

__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_attribute_metadata(dataset)[source]

Sets metadata elements for dataset’s attributes.

set_meta(dataset, overwrite=True, **kwd)[source]
display_peek(dataset)[source]

Returns formatted HTML of peek

get_estimated_display_viewport(dataset)[source]

Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 formats. This function should correctly handle both…

sniff_prefix(file_prefix)[source]

Determines whether the file is in gff format

GFF lines have nine required fields that must be tab-separated.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format3

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('gff.gff3')
>>> Gff().sniff( fname )
False
>>> fname = get_test_fname('test.gff')
>>> Gff().sniff( fname )
True
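
A single-line version of the nine-field check might look like the sketch below. It is illustrative only: `looks_like_gff_line` is a hypothetical name, the frame values mirror `valid_gff_frame` above, and the accepted strand values ('+', '-', '.') follow the GFF format rather than anything stated in this docstring.

```python
VALID_GFF_FRAME = [".", "0", "1", "2"]  # mirrors Gff.valid_gff_frame


def looks_like_gff_line(line):
    """Return True if the line has nine tab-separated, plausibly typed GFF fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return False
    start, end, score, strand, frame = fields[3:8]
    if not (start.isdigit() and end.isdigit()):
        return False
    if score != ".":
        # A score is either "." (undefined) or a number.
        try:
            float(score)
        except ValueError:
            return False
    if strand not in ("+", "-", "."):
        return False
    return frame in VALID_GFF_FRAME
```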
genomic_region_dataprovider(dataset, **settings)[source]
genomic_region_dict_dataprovider(dataset, **settings)[source]
interval_dataprovider(dataset, **settings)[source]
interval_dict_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'column': <function TabularData.column_dataprovider at 0x7f1efa7ec7b8>, 'dataset-column': <function TabularData.dataset_column_dataprovider at 0x7f1efa7ec950>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider at 0x7f1efa7ecc80>, 'dict': <function TabularData.dict_dataprovider at 0x7f1efa7ecae8>, 'genomic-region': <function Gff.genomic_region_dataprovider at 0x7f1eebbe3488>, 'genomic-region-dict': <function Gff.genomic_region_dict_dataprovider at 0x7f1eebbe3620>, 'interval': <function Gff.interval_dataprovider at 0x7f1eebbe37b8>, 'interval-dict': <function Gff.interval_dict_dataprovider at 0x7f1eebbe3950>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>}
metadata_spec = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ada0>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ad30>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81acc0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ac50>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>}
sniff(filename)
class galaxy.datatypes.interval.Gff3(**kwd)[source]

Bases: galaxy.datatypes.interval.Gff

Tab delimited data in Gff3 format

edam_format = 'format_1975'
file_ext = 'gff3'
valid_gff3_strand = ['+', '-', '.', '?']
valid_gff3_phase = ['.', '0', '1', '2']
column_names = ['Seqid', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes']
track_type = 'FeatureTrack'

Add metadata elements

__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_meta(dataset, overwrite=True, **kwd)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in GFF version 3 format

GFF 3 format:

  1. adds a mechanism for representing more than one level of hierarchical grouping of features and subfeatures.
  2. separates the ideas of group membership and feature name/id
  3. constrains the feature type field to be taken from a controlled vocabulary.
  4. allows a single feature, such as an exon, to belong to more than one group at a time.
  5. provides an explicit convention for pairwise alignments
  6. provides an explicit convention for features that occupy disjunct regions

The format consists of 9 columns, separated by tabs (NOT spaces).

Undefined fields are replaced with the “.” character, as described in the original GFF spec.

For complete details see http://song.sourceforge.net/gff3.shtml

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.gff' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname('gff.gff3')
>>> Gff3().sniff( fname )
True
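
The nine-column layout and the "." convention for undefined fields can be sketched as below. `parse_gff3_line` is a hypothetical helper, not part of Galaxy; the strand and phase lists copy `valid_gff3_strand` and `valid_gff3_phase` above.

```python
VALID_GFF3_STRAND = ["+", "-", ".", "?"]
VALID_GFF3_PHASE = [".", "0", "1", "2"]
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Split one GFF3 line into a dict, mapping undefined fields to None."""
    fields = line.rstrip("\n").split("\t")  # tabs only, never spaces
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    if fields[6] not in VALID_GFF3_STRAND:
        raise ValueError("invalid strand: %r" % fields[6])
    if fields[7] not in VALID_GFF3_PHASE:
        raise ValueError("invalid phase: %r" % fields[7])
    # "." marks an undefined field; expose it as None.
    return {name: (None if value == "." else value)
            for name, value in zip(GFF3_COLUMNS, fields)}
```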
metadata_spec = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ada0>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ad30>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81af98>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ac50>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>}
class galaxy.datatypes.interval.Gtf(**kwd)[source]

Bases: galaxy.datatypes.interval.Gff

Tab delimited data in Gtf format

edam_format = 'format_2306'
file_ext = 'gtf'
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes']
track_type = 'FeatureTrack'

Add metadata elements

sniff_prefix(file_prefix)[source]

Determines whether the file is in gtf format

GTF lines have nine required fields that must be tab-separated. The first eight GTF fields are the same as GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair. Attributes must end in a semi-colon, and be separated from any following attribute by exactly one space. The attribute list must begin with the two mandatory attributes:

gene_id value - A globally unique identifier for the genomic source of the sequence.

transcript_id value - A globally unique identifier for the predicted transcript.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format4

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.bed' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gtf().sniff( fname )
True
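
The attribute list in the ninth field could be split as in this sketch. Both function names are hypothetical, and the parser is deliberately naive: it assumes no semicolons appear inside quoted values.

```python
def parse_gtf_attributes(attribute_field):
    """Return the attributes as an ordered list of (type, value) pairs."""
    attributes = []
    for chunk in attribute_field.strip().split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Each attribute is a type, one space, then a (possibly quoted) value.
        key, _, value = chunk.partition(" ")
        attributes.append((key, value.strip().strip('"')))
    return attributes


def has_mandatory_gtf_attributes(attribute_field):
    """True if the list begins with gene_id followed by transcript_id."""
    keys = [key for key, _ in parse_gtf_attributes(attribute_field)]
    return keys[:2] == ["gene_id", "transcript_id"]
```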
metadata_spec = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ada0>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa81ad30>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8240>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe81d0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>}
class galaxy.datatypes.interval.Wiggle(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular, galaxy.datatypes.interval._RemoteCallMixin

Tab delimited data in wiggle format

edam_format = 'format_3005'
file_ext = 'wig'
track_type = 'LineTrack'
data_sources = {'data': 'bigwig', 'index': 'bigwig'}
__init__(**kwd)[source]
get_estimated_display_viewport(dataset)[source]

Return a chrom, start, stop tuple for viewing a file.

display_peek(dataset)[source]

Returns formatted HTML of peek

set_meta(dataset, overwrite=True, **kwd)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in wiggle format

The .wig format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line is the track data, which can be entered in several different formats.

The track definition line begins with the word ‘track’ followed by the track type. The track type with version is REQUIRED, and it currently must be wiggle_0. For example, track type=wiggle_0…

For complete details see http://genome.ucsc.edu/goldenPath/help/wiggle.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interv1.bed' )
>>> Wiggle().sniff( fname )
False
>>> fname = get_test_fname( 'wiggle.wig' )
>>> Wiggle().sniff( fname )
True
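
A check for the track definition line described above might be sketched like this. `is_wiggle_track_line` is a hypothetical name, and the real sniffer may examine more than one line.

```python
def is_wiggle_track_line(line):
    """True if the line starts with 'track' and declares type=wiggle_0."""
    tokens = line.strip().split()
    return bool(tokens) and tokens[0] == "track" and "type=wiggle_0" in tokens[1:]
```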
get_track_resolution(dataset, start, end)[source]
wiggle_dataprovider(dataset, **settings)[source]
wiggle_dict_dataprovider(dataset, **settings)[source]
dataproviders = {'base': <function Data.base_dataprovider at 0x7f1efdbef9d8>, 'chunk': <function Data.chunk_dataprovider at 0x7f1efdbefb70>, 'chunk64': <function Data.chunk64_dataprovider at 0x7f1efdbefd08>, 'column': <function TabularData.column_dataprovider at 0x7f1efa7ec7b8>, 'dataset-column': <function TabularData.dataset_column_dataprovider at 0x7f1efa7ec950>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider at 0x7f1efa7ecc80>, 'dict': <function TabularData.dict_dataprovider at 0x7f1efa7ecae8>, 'line': <function Text.line_dataprovider at 0x7f1efdbf1488>, 'regex-line': <function Text.regex_line_dataprovider at 0x7f1efdbf1620>, 'wiggle': <function Wiggle.wiggle_dataprovider at 0x7f1eebbe78c8>, 'wiggle-dict': <function Wiggle.wiggle_dict_dataprovider at 0x7f1eebbe7a60>}
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe84e0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>}
sniff(filename)
class galaxy.datatypes.interval.CustomTrack(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

UCSC CustomTrack

edam_format = 'format_3588'
file_ext = 'customtrack'
__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display app

set_meta(dataset, overwrite=True, **kwd)[source]
display_peek(dataset)[source]

Returns formatted HTML of peek

get_estimated_display_viewport(dataset, chrom_col=None, start_col=None, end_col=None)[source]

Return a chrom, start, stop tuple for viewing a file.

sniff_prefix(file_prefix)[source]

Determines whether the file is in customtrack format.

CustomTrack files are built within Galaxy and are basically bed or interval files with the first line looking something like this.

track name="User Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> CustomTrack().sniff( fname )
False
>>> fname = get_test_fname( 'ucsc.customtrack' )
>>> CustomTrack().sniff( fname )
True
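
A track line like the one above can be split into key/value options with the standard library's shlex module, which honors the double quotes. This is only a sketch; `parse_track_line` is a hypothetical helper and not how Galaxy sniffs the format.

```python
import shlex


def parse_track_line(line):
    """Return the track options as a dict, or None if this is not a track line."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as not being a track line.
        return None
    if not tokens or tokens[0] != "track":
        return None
    options = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:
            options[key] = value
    return options
```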
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8898>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8828>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe87b8>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe86d8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8748>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8908>}
sniff(filename)
class galaxy.datatypes.interval.ENCODEPeak(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Human ENCODE peak format. There are both broad and narrow peak formats. Formats are very similar; narrow peak has an additional column, though.

Broad peak ( http://genome.ucsc.edu/FAQ/FAQformat#format13 ): This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+3 format.

Narrow peak ( http://genome.ucsc.edu/FAQ/FAQformat#format12 ): This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+4 format.
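
Since broad peak is BED 6+3 (nine columns) and narrow peak is BED 6+4 (ten columns), the two can be told apart by field count. The sketch below assumes tab-delimited input; `classify_encode_peak_line` and its return labels are hypothetical.

```python
def classify_encode_peak_line(line):
    """Return 'broadPeak', 'narrowPeak', or None based on the column count."""
    column_count = len(line.rstrip("\n").split("\t"))
    if column_count == 9:
        return "broadPeak"   # BED 6+3
    if column_count == 10:
        return "narrowPeak"  # BED 6+4, with the extra Peak column
    return None
```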

edam_format = 'format_3612'
file_ext = 'encodepeak'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'SignalValue', 'pValue', 'qValue', 'Peak']
data_sources = {'data': 'tabix', 'index': 'bigwig'}

Add metadata elements

sniff(filename)[source]
metadata_spec = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8b00>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8d30>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8c50>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f28>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8be0>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8cc0>}
class galaxy.datatypes.interval.ChromatinInteractions(**kwd)[source]

Bases: galaxy.datatypes.interval.Interval

Chromatin interactions obtained from 3C/5C/Hi-C experiments.

file_ext = 'chrint'
track_type = 'DiagonalHeatmapTrack'
data_sources = {'data': 'tabix', 'index': 'bigwig'}
column_names = ['Chrom1', 'Start1', 'End1', 'Chrom2', 'Start2', 'End2', 'Value']

Add metadata elements

metadata_spec = {'chrom1Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8ef0>, 'chrom2Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf20f0>, 'chromCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa8087b8>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf22b0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'end1Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf2080>, 'end2Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf21d0>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808e48>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808f28>, 'start1Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbe8fd0>, 'start2Col': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf2160>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808dd8>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa808eb8>, 'valueCol': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf2240>}
sniff(filename)[source]
class galaxy.datatypes.interval.ScIdx(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

ScIdx files are 1-based and consist of strand-specific coordinate counts. They always have 5 columns, and the first row is the column labels: ‘chrom’, ‘index’, ‘forward’, ‘reverse’, ‘value’. Each line following the first consists of data: chromosome name (type str), peak index (type int), Forward strand peak count (type int), Reverse strand peak count (type int) and value (type int). The value of the 5th ‘value’ column is the sum of the forward and reverse peak count values.
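
The constraints above (five columns, a fixed header row, integer counts, and value equal to forward plus reverse) can be verified roughly as follows. This is a sketch under the assumption of tab-delimited input with no leading comment lines; `check_scidx_lines` is a hypothetical name.

```python
SCIDX_HEADER = ["chrom", "index", "forward", "reverse", "value"]


def check_scidx_lines(lines):
    """Return True if the lines form a header row followed by consistent data rows."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if not rows or rows[0] != SCIDX_HEADER:
        return False
    for row in rows[1:]:
        if len(row) != 5:
            return False
        try:
            index, forward, reverse, value = (int(item) for item in row[1:])
        except ValueError:
            return False
        # Coordinates are 1-based, and value is the sum of both strand counts.
        if index < 1 or forward + reverse != value:
            return False
    return True
```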

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf24a8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1eebbf2438>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>}
sniff(filename)
file_ext = 'scidx'
__init__(**kwd)[source]

Initialize scidx datatype.

sniff_prefix(file_prefix)[source]

Checks for ‘scidx-ness.’

galaxy.datatypes.isa module

ISA datatype

See https://github.com/ISA-tools

galaxy.datatypes.isa.utf8_text_file_open(path)[source]
class galaxy.datatypes.isa.IsaTab(**kwd)[source]

Bases: galaxy.datatypes.isa._Isa

file_ext = 'isa-tab'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed89b2208>}
class galaxy.datatypes.isa.IsaJson(**kwd)[source]

Bases: galaxy.datatypes.isa._Isa

file_ext = 'isa-json'
__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed89b2438>}

galaxy.datatypes.metadata module

Expose the model metadata module as a datatype module as well. Keeping the implementation in galaxy.model means the model module does not have any dependencies on the datatypes module. This module will need to remain here for datatypes living in the tool shed, so we might as well keep and use this interface from the datatypes module.

class galaxy.datatypes.metadata.Statement(target)[source]

Bases: object

This class inserts its target into a list in the surrounding class. The data.Data class has a metaclass which executes these statements. This is how we shove the metadata element spec into the class.
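
The general pattern (a callable that records itself in the enclosing class body, plus a metaclass that replays the recorded calls) can be sketched as below. This is an illustration of the technique, not Galaxy's code: the attribute name `_pending_statements`, the `StatementMeta` metaclass, and the `_register_element` target are all assumptions, and the sketch relies on CPython's `sys._getframe`.

```python
import sys

PENDING = "_pending_statements"  # hypothetical attribute name


class Statement:
    def __init__(self, target):
        self.target = target

    def __call__(self, *args, **kwargs):
        # Frame 1 is the class body invoking the statement; its locals
        # are the namespace of the class being defined.
        class_namespace = sys._getframe(1).f_locals
        class_namespace.setdefault(PENDING, []).append((self, args, kwargs))

    @classmethod
    def process(cls, element):
        # Replay every recorded statement against the finished class.
        for statement, args, kwargs in getattr(element, PENDING, []):
            statement.target(element, *args, **kwargs)


class StatementMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        Statement.process(cls)


def _register_element(datatype, name):
    datatype.spec_names = list(getattr(datatype, "spec_names", [])) + [name]


element = Statement(_register_element)


class Example(metaclass=StatementMeta):
    element("dbkey")
    element("columns")
```

After the class statement finishes, `Example.spec_names` holds the two registered names in declaration order.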

__init__(target)[source]
classmethod process(element)[source]
class galaxy.datatypes.metadata.MetadataCollection(parent)[source]

Bases: object

MetadataCollection is not a collection at all, but rather a proxy to the real metadata which is stored as a Dictionary. This class handles processing the metadata elements when they are set and retrieved, returning default values in cases when metadata is not set.
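
The proxy idea can be shown with a small stand-in: attribute reads fall back to defaults when nothing is stored, and attribute writes land in the backing dictionary. `MetadataProxy` and its constructor arguments are hypothetical; the real class works against a parent dataset and its spec.

```python
class MetadataProxy:
    """Attribute-style access to a dict, with defaults for unset elements."""

    def __init__(self, store, defaults):
        # Bypass our own __setattr__ while wiring up the internals.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_defaults", defaults)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self._store:
            return self._store[name]
        if name in self._defaults:
            return self._defaults[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        self._store[name] = value

    def element_is_set(self, name):
        return name in self._store
```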

__init__(parent)[source]
get_parent()[source]
set_parent(parent)[source]
parent
spec
get(key, default=None)[source]
items()[source]
remove_key(name)[source]
element_is_set(name)[source]
get_metadata_parameter(name, **kwd)[source]
make_dict_copy(to_copy)[source]

Makes a deep copy of input iterable to_copy according to self.spec

requires_dataset_id
from_JSON_dict(filename=None, path_rewriter=None, json_dict=None)[source]
to_JSON_dict(filename=None)[source]
class galaxy.datatypes.metadata.MetadataSpecCollection(*args, **kwds)[source]

Bases: collections.OrderedDict

A simple extension of OrderedDict which allows cleaner access to items and allows the values to be iterated over directly as if it were a list. append() is also implemented for simplicity and does not “append”.
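
A minimal sketch of such an OrderedDict extension follows. The class name `SpecCollection`, the `iter_specs` method, and the `Spec` tuple are assumptions for illustration; they show why `append()` does not really append: it stores the item under its name, so re-adding a name replaces the earlier entry.

```python
from collections import OrderedDict, namedtuple

Spec = namedtuple("Spec", ["name", "default"])  # hypothetical stand-in for a spec


class SpecCollection(OrderedDict):
    def append(self, item):
        # Keyed by name: an existing entry is replaced, not duplicated.
        self[item.name] = item

    def iter_specs(self):
        # Iterate over the stored specs themselves rather than their keys.
        return iter(self.values())

    def __getattr__(self, name):
        # Attribute-style access to stored items.
        if name not in self:
            raise AttributeError(name)
        return self[name]


specs = SpecCollection()
specs.append(Spec("dbkey", "?"))
specs.append(Spec("columns", 0))
specs.append(Spec("dbkey", "hg19"))  # replaces the first entry in place
```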

__init__(*args, **kwds)[source]
append(item)[source]
class galaxy.datatypes.metadata.MetadataParameter(spec)[source]

Bases: object

__init__(spec)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
to_string(value)[source]
to_safe_string(value)[source]
make_copy(value, target_context=None, source_context=None)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

validate(value)[source]

Throw an exception if the value is invalid.

unwrap(form_value)[source]

Turns a value into its storable form.

wrap(value, session)[source]

Turns a value into its usable form.

from_external_value(value, parent)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

to_external_value(value)[source]

Turns a value read from the metadata dict into its value to be pushed directly into the external dict.

class galaxy.datatypes.metadata.MetadataElementSpec(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]

Bases: object

Defines a metadata element and adds it to the metadata_spec (which is a MetadataSpecCollection) of datatype.

__init__(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]
get(name, default=None)[source]
wrap(value, session)[source]

Turns a stored value into its usable form.

unwrap(value)[source]

Turns an incoming value into its storable form.

class galaxy.datatypes.metadata.SelectParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

__init__(spec)[source]
to_string(value)[source]
get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
wrap(value, session)[source]
classmethod marshal(value)[source]
class galaxy.datatypes.metadata.DBKeyParameter(spec)[source]

Bases: galaxy.model.metadata.SelectParameter

get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
class galaxy.datatypes.metadata.RangeParameter(spec)[source]

Bases: galaxy.model.metadata.SelectParameter

__init__(spec)[source]
get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
classmethod marshal(value)[source]
class galaxy.datatypes.metadata.ColumnParameter(spec)[source]

Bases: galaxy.model.metadata.RangeParameter

get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
class galaxy.datatypes.metadata.ColumnTypesParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.ListParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.DictParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
to_safe_string(value)[source]
class galaxy.datatypes.metadata.PythonObjectParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
classmethod marshal(value)[source]
class galaxy.datatypes.metadata.FileParameter(spec)[source]

Bases: galaxy.model.metadata.MetadataParameter

to_string(value)[source]
to_safe_string(value)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
wrap(value, session)[source]
make_copy(value, target_context, source_context)[source]
classmethod marshal(value)[source]
from_external_value(value, parent, path_rewriter=None)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

to_external_value(value)[source]

Turns a value read from metadata into its value to be pushed directly into the external dict.

new_file(dataset=None, **kwds)[source]
class galaxy.datatypes.metadata.MetadataTempFile(**kwds)[source]

Bases: object

tmp_dir = 'database/tmp'
__init__(**kwds)[source]
file_name
to_JSON()[source]
classmethod from_JSON(json_dict)[source]
classmethod is_JSONified_value(value)[source]
classmethod cleanup_from_JSON_dict_filename(filename)[source]

galaxy.datatypes.microarrays module

class galaxy.datatypes.microarrays.GenericMicroarrayFile(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Abstract class for most of the microarray files.

set_peek(dataset, is_multi_byte=False)[source]
get_mime()[source]
metadata_spec = {'block_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed866b400>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fccf8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85eca20>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed866b588>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed866b160>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85ec748>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85ab8d0>}
class galaxy.datatypes.microarrays.Gal(**kwd)[source]

Bases: galaxy.datatypes.microarrays.GenericMicroarrayFile

Gal File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gal

edam_format = 'format_3829'
edam_data = 'data_3110'
file_ext = 'gal'
sniff_prefix(file_prefix)[source]

Try to guess if the file is a Gal file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gal')
>>> Gal().sniff(fname)
True
>>> fname = get_test_fname('test.gpr')
>>> Gal().sniff(fname)
False

set_meta(dataset, **kwd)[source]

Set metadata for Gal file.

metadata_spec = {'block_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fcc88>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed861b550>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fc438>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fcc18>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fc160>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fcbe0>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed85fc780>}
sniff(filename)
class galaxy.datatypes.microarrays.Gpr(**kwd)[source]

Bases: galaxy.datatypes.microarrays.GenericMicroarrayFile

Gpr File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gpr

edam_format = 'format_3829'
edam_data = 'data_3110'
file_ext = 'gpr'
sniff_prefix(file_prefix)[source]

Try to guess if the file is a Gpr file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gpr')
>>> Gpr().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Gpr().sniff(fname)
False

set_meta(dataset, **kwd)[source]

Set metadata for Gpr file.

metadata_spec = {'block_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed86088d0>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8608e48>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed861bb38>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed8d18da0>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed861b3c8>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed861bba8>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed861b898>}
sniff(filename)

galaxy.datatypes.molecules module

galaxy.datatypes.molecules.count_lines(filename, non_empty=False)[source]

Count the number of lines in the 'filename' file.

class galaxy.datatypes.molecules.GenericMolFile(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Abstract class for most of the molecule files.

set_peek(dataset, is_multi_byte=False)[source]
get_mime()[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed838c908>}
class galaxy.datatypes.molecules.MOL(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

file_ext = 'mol'
set_meta(dataset, **kwd)[source]

Set the number of molecules; in the case of MOL it is always one.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed838ccc0>}
class galaxy.datatypes.molecules.SDF(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

file_ext = 'sdf'
sniff_prefix(file_prefix)[source]

Try to guess if the file is an SDF2 file.

An SDfile (structure-data file) can contain multiple compounds.

Each compound starts with a block in V2000 or V3000 molfile format, which ends with a line equal to ‘M END’. This is followed by a non-structural data block, which ends with a line equal to ‘$$$$’.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('github88.v3k.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('chebi_57262.v3k.mol')
>>> SDF().sniff(fname)
False
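
The record layout described above (a molfile block ending in an M END line, then a data block ending in a $$$$ line) can be checked with a short sketch. This is not Galaxy's sniffer; looks_like_sdf and count_sdf_records are hypothetical helpers that work on a list of lines, and the M END comparison is made whitespace-insensitive as a simplifying assumption.

```python
def count_sdf_records(lines):
    """Count records, each of which is terminated by a '$$$$' line."""
    return sum(1 for line in lines if line.rstrip("\r\n") == "$$$$")


def looks_like_sdf(lines):
    """Return True if the first '$$$$' terminator follows an 'M END' line."""
    saw_mol_end = False
    for line in lines:
        stripped = line.rstrip("\r\n")
        if stripped.split() == ["M", "END"]:
            # End of the structural (molfile) block.
            saw_mol_end = True
        elif stripped == "$$$$":
            # End of the first record: valid only if a molfile block came first.
            return saw_mol_end
    # No record terminator seen, e.g. a plain MOL file.
    return False
```

A plain MOL file has the M END line but no $$$$ terminator, so the sketch rejects it, which matches the last doctest above.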
set_meta(dataset, **kwd)[source]

Set the number of molecules in dataset.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by molecule records.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84a29e8>}
sniff(filename)
class galaxy.datatypes.molecules.MOL2(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

file_ext = 'mol2'
sniff_prefix(file_prefix)[source]

Try to guess if the file is a MOL2 file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> MOL2().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> MOL2().sniff(fname)
False
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by molecule records.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84c9630>}
sniff(filename)
class galaxy.datatypes.molecules.FPS(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

chemfp fingerprint file: http://code.google.com/p/chem-fingerprints/wiki/FPS

file_ext = 'fps'
sniff_prefix(file_prefix)[source]

Try to guess if the file is an FPS file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('q.fps')
>>> FPS().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> FPS().sniff(fname)
False
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by fingerprint records.

static merge(split_files, output_file)[source]

Merging fps files requires merging the header manually. We take the header from the first file.
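A minimal sketch of the header handling described above, assuming that FPS header lines are the ones starting with '#'. merge_fps_lines is a hypothetical helper that operates on lists of lines rather than on files, so it is not the signature of the merge method documented here.

```python
def merge_fps_lines(chunks):
    """Merge several FPS line lists, keeping only the first chunk's header.

    chunks is a list of lists of lines. Lines starting with '#' are treated
    as header lines (an assumption of this sketch).
    """
    merged = []
    for index, lines in enumerate(chunks):
        for line in lines:
            if line.startswith("#") and index > 0:
                # Header lines from later chunks are dropped.
                continue
            merged.append(line)
    return merged
```

Fingerprint records from every chunk are kept in order; only the '#' lines of the second and later chunks are discarded.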

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84c9898>}
sniff(filename)
class galaxy.datatypes.molecules.OBFS(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

OpenBabel Fastsearch format (fs).

file_ext = 'obfs'
composite_type = 'basic'
allow_datatype_change = False
__init__(**kwd)[source]

A Fastsearch Index consists of a binary file with the fingerprints and a pointer to the actual molecule file.

set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text.

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

get_mime()[source]

Returns the mime type of the datatype (pretend it is text for peek)

merge(split_files, output_file, extra_merge_args)[source]

Merging Fastsearch indices is not supported.

split(input_datasets, subdir_generator_function, split_params)[source]

Splitting Fastsearch indices is not supported.

metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84c9a90>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa882e48>}
class galaxy.datatypes.molecules.DRF(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

file_ext = 'drf'
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84c9c88>}
class galaxy.datatypes.molecules.PHAR(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

Pharmacophore database format from silicos-it.

file_ext = 'phar'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84c9e48>}
class galaxy.datatypes.molecules.PDB(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

Protein Databank format. http://www.wwpdb.org/documentation/format33/v3.3.html

file_ext = 'pdb'
sniff_prefix(file_prefix)[source]

Try to guess if the file is a PDB file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pdb')
>>> PDB().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDB().sniff(fname)
False
set_meta(dataset, **kwd)[source]

Find Chain_IDs for metadata.

set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7080>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed838c908>}
sniff(filename)
class galaxy.datatypes.molecules.PDBQT(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

PDBQT Autodock and Autodock Vina format http://autodock.scripps.edu/faqs-help/faq/what-is-the-format-of-a-pdbqt-file

file_ext = 'pdbqt'
sniff_prefix(file_prefix)[source]

Try to guess if the file is a PDBQT file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('NuBBE_1_obabel_3D.pdbqt')
>>> PDBQT().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDBQT().sniff(fname)
False
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7278>}
sniff(filename)
class galaxy.datatypes.molecules.PQR(**kwd)[source]

Bases: galaxy.datatypes.molecules.GenericMolFile

PQR format (PDB-like, with per-atom charge and radius fields). https://apbs-pdb2pqr.readthedocs.io/en/latest/formats/pqr.html

file_ext = 'pqr'
get_matcher()[source]
Atom and HETATM line fields are space separated, match group:
0: Field_name
A string which specifies the type of PQR entry: ATOM or HETATM.
1: Atom_number
An integer which provides the atom index.
2: Atom_name
A string which provides the atom name.
3: Residue_name
A string which provides the residue name.
5: Chain_ID (Optional, group 4 is whole field)
An optional string which provides the chain ID of the atom. Note that chain ID support is a new feature of APBS 0.5.0 and later versions.
6: Residue_number
An integer which provides the residue index.
7: X 8: Y 9: Z
3 floats which provide the atomic coordinates (in angstroms)
10: Charge
A float which provides the atomic charge (in electrons).
11: Radius
A float which provides the atomic radius (in angstroms).
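
The field list above can be turned into a regular expression along the following lines. This is a sketch under assumptions, not the pattern returned by get_matcher(): PQR_LINE and parse_pqr_atom are hypothetical names, named groups are used instead of the numbered groups listed above, and the chain ID is assumed to be a single letter.

```python
import re

# Whitespace-separated ATOM/HETATM fields, with an optional chain ID.
PQR_LINE = re.compile(
    r"^(?P<field>ATOM|HETATM)\s+"
    r"(?P<atom_number>\d+)\s+"
    r"(?P<atom_name>\S+)\s+"
    r"(?P<residue_name>\S+)\s+"
    r"(?:(?P<chain_id>[A-Za-z])\s+)?"
    r"(?P<residue_number>-?\d+)\s+"
    r"(?P<x>-?\d+\.\d+)\s+(?P<y>-?\d+\.\d+)\s+(?P<z>-?\d+\.\d+)\s+"
    r"(?P<charge>-?\d+\.\d+)\s+"
    r"(?P<radius>\d+\.\d+)\s*$"
)


def parse_pqr_atom(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = PQR_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["atom_number"] = int(record["atom_number"])
    record["residue_number"] = int(record["residue_number"])
    for key in ("x", "y", "z", "charge", "radius"):
        record[key] = float(record[key])
    return record
```

Lines without a chain ID still match; in that case the chain_id entry is None.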
sniff_prefix(file_prefix)[source]

Try to guess if the file is a PQR file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pqr')
>>> PQR().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PQR().sniff(fname)
False

set_meta(dataset, **kwd)[source]

Find Optional Chain_IDs for metadata.

set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7470>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed838c908>}
sniff(filename)
class galaxy.datatypes.molecules.grd(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'grd'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7668>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.molecules.grdtgz(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

file_ext = 'grd.tgz'
set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7860>}
class galaxy.datatypes.molecules.InChI(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'inchi'
column_names = ['InChI']
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

set_peek(dataset, is_multi_byte=False)[source]
sniff_prefix(file_prefix)[source]

Try to guess if the file is an InChI file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> InChI().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> InChI().sniff(fname)
False
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7a90>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7a20>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7b00>}
sniff(filename)
class galaxy.datatypes.molecules.SMILES(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'smi'
column_names = ['SMILES', 'TITLE']
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7d30>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7cc0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d7da0>}
class galaxy.datatypes.molecules.CML(**kwd)[source]

Bases: galaxy.datatypes.xml.GenericXml

Chemical Markup Language http://cml.sourceforge.net/

file_ext = 'cml'
set_meta(dataset, **kwd)[source]

Set the number of lines of data in dataset.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edf4034e0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84af048>}
set_peek(dataset, is_multi_byte=False)[source]
sniff(filename)
sniff_prefix(file_prefix)[source]

Try to guess if the file is a CML file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('interval.interval')
>>> CML().sniff(fname)
False
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> CML().sniff(fname)
True
classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by molecule records.

static merge(split_files, output_file)[source]

Merging CML files.

galaxy.datatypes.mothur module

Mothur Metagenomics Datatypes

class galaxy.datatypes.mothur.Otu(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'mothur.otu'
__init__(**kwd)[source]
set_meta(dataset, overwrite=True, **kwd)[source]

Set metadata for Otu files.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> otu = Otu()
>>> dataset.file_name = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> dataset.has_data = lambda: True
>>> otu.set_meta(dataset)
>>> dataset.metadata.columns
100
>>> len(dataset.metadata.labels) == 37
True
>>> len(dataset.metadata.otulabels) == 98
True
sniff_prefix(file_prefix)[source]

Determines whether the file is otu (operational taxonomic unit) format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> Otu().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.otu' )
>>> Otu().sniff( fname )
False
metadata_spec = {'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed800cc50>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'labels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed800cb38>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d86d8>}
sniff(filename)
class galaxy.datatypes.mothur.Sabund(**kwd)[source]

Bases: galaxy.datatypes.mothur.Otu

file_ext = 'mothur.sabund'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Sabund_file

init_meta(dataset, copy_from=None)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in otu (operational taxonomic unit) format:

label<TAB>count[<TAB>value(1..n)]

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.sabund' )
>>> Sabund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.sabund' )
>>> Sabund().sniff( fname )
False
metadata_spec = {'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7f432b0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'labels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed92a67f0>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edd4389b0>}
class galaxy.datatypes.mothur.GroupAbund(**kwd)[source]

Bases: galaxy.datatypes.mothur.Otu

file_ext = 'mothur.shared'
__init__(**kwd)[source]
init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, skip=1, **kwd)[source]
sniff_prefix(file_prefix, vals_are_int=False)[source]

Determines whether the file is in otu (operational taxonomic unit) Shared format:

label<TAB>group<TAB>count[<TAB>value(1..n)]

The first line is column headings as of Mothur v 1.2

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.shared' )
>>> GroupAbund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.shared' )
>>> GroupAbund().sniff( fname )
False
metadata_spec = {'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed800cc50>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7f04240>, 'labels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed800cb38>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed84d86d8>}
class galaxy.datatypes.mothur.SecondaryStructureMap(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.map'
__init__(**kwd)[source]

Initialize secondary structure map datatype

sniff_prefix(file_prefix)[source]

Determines whether the file is in secondary structure map format: a single column with an integer value that indicates the row that this row maps to. Checks that if structMap[10] = 380 then structMap[380] = 10, and vice versa.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
False
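
The symmetry check described above (if structMap[10] = 380 then structMap[380] = 10) can be sketched as follows. is_symmetric_map is a hypothetical helper, not the Galaxy sniffer; it assumes rows are numbered from 1 and that a value of 0 marks a row with no partner.

```python
def is_symmetric_map(values):
    """Check that every row's partner points back at that row.

    values[i - 1] is the partner row of row i (1-based).
    A value of 0 is treated as "unpaired" (an assumption of this sketch).
    """
    n = len(values)
    for row, partner in enumerate(values, start=1):
        if partner == 0:
            continue
        if not 1 <= partner <= n:
            # The partner row does not exist.
            return False
        if values[partner - 1] != row:
            # The partner does not map back to this row.
            return False
    return True
```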
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6c780>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6c898>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6c908>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6cac8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6ca20>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6c208>}
sniff(filename)
class galaxy.datatypes.mothur.AlignCheck(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.align.check'
__init__(**kwd)[source]

Initialize AlignCheck datatype

set_meta(dataset, overwrite=True, **kwd)[source]
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f0f0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f080>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6cfd0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6cef0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e6cf60>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f160>}
class galaxy.datatypes.mothur.AlignReport(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template
AY457915 501 82283 1525 kmer 89.07 needleman 5 501 1 499 499 2 0 0 97.6

file_ext = 'mothur.align.report'
__init__(**kwd)[source]

Initialize AlignReport datatype

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f550>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f4e0>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f470>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f390>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f400>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f5c0>}
class galaxy.datatypes.mothur.DistanceMatrix(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'mothur.dist'

Add metadata elements

init_meta(dataset, copy_from=None)[source]
set_meta(dataset, overwrite=True, skip=0, **kwd)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7f828>}
class galaxy.datatypes.mothur.LowerTriangleDistanceMatrix(**kwd)[source]

Bases: galaxy.datatypes.mothur.DistanceMatrix

file_ext = 'mothur.lower.dist'
__init__(**kwd)[source]

Initialize lower-triangle distance matrix datatype

init_meta(dataset, copy_from=None)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in lower-triangle distance matrix (phylip) format. The first line has the number of sequences in the matrix. The remaining lines have the sequence name followed by a list of distances from all preceding sequences.

5 # possibly but not always preceded by a tab :/
U68589
U68590 0.3371
U68591 0.3609 0.3782
U68592 0.4155 0.3197 0.4148
U68593 0.2872 0.1690 0.3361 0.2842

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
False
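
The lower-triangle layout described above implies a simple shape rule: after the count line, the row at index i (starting from 0) holds one name and i distances. The sketch below checks that rule. is_lower_triangle is a hypothetical helper that assumes the whole file is available as a list of lines, whereas the documented sniff_prefix only sees a prefix of the file.

```python
def is_lower_triangle(lines):
    """Check the count line and the growing row lengths of a lower triangle."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or len(rows[0]) != 1 or not rows[0][0].isdigit():
        return False
    count = int(rows[0][0])
    body = rows[1:]
    if len(body) != count:
        return False
    for index, fields in enumerate(body):
        # One name followed by `index` distances.
        if len(fields) != index + 1:
            return False
        try:
            for item in fields[1:]:
                float(item)
        except ValueError:
            return False
    return True
```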
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7fa90>}
sniff(filename)
class galaxy.datatypes.mothur.SquareDistanceMatrix(**kwd)[source]

Bases: galaxy.datatypes.mothur.DistanceMatrix

file_ext = 'mothur.square.dist'
__init__(**kwd)[source]
init_meta(dataset, copy_from=None)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in square distance matrix (column-formatted distance matrix) format. The first line has the number of sequences in the matrix. The following lines have the sequence name in the first column plus a column for the distance to each sequence in the row order in which they appear in the matrix.

3
U68589 0.0000 0.3371 0.3610
U68590 0.3371 0.0000 0.3783
U68590 0.3371 0.0000 0.3783

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
False
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7fcf8>}
sniff(filename)
class galaxy.datatypes.mothur.PairwiseDistanceMatrix(**kwd)[source]

Bases: galaxy.datatypes.mothur.DistanceMatrix, galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.pair.dist'
__init__(**kwd)[source]

Initialize pairwise distance matrix datatype

set_meta(dataset, overwrite=True, skip=None, **kwd)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is in pairwise distance matrix (column-formatted distance matrix) format. The first and second columns have the sequence names and the third column is the distance between those sequences.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
False
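
The three-column layout described above (name, name, distance) can be tested line by line. This is a sketch with hypothetical helper names, not the Galaxy implementation, and it assumes whitespace-separated columns.

```python
def is_pairwise_distance_line(line):
    """True if the line has two names followed by a numeric distance."""
    fields = line.split()
    if len(fields) != 3:
        return False
    try:
        float(fields[2])
    except ValueError:
        return False
    return True


def looks_like_pairwise_matrix(lines):
    """True if there is at least one data line and every data line matches."""
    data = [line for line in lines if line.strip()]
    return bool(data) and all(is_pairwise_distance_line(line) for line in data)
```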
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e7fef0>}
sniff(filename)
class galaxy.datatypes.mothur.Names(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.names'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Name_file

Name file shows the relationship between a representative sequence (col 1) and the comma-separated sequences it represents (col 2).
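
A short sketch of reading the two-column layout described above into a dictionary. parse_names is a hypothetical helper and not part of Galaxy; it assumes the two columns are separated by a single tab.

```python
def parse_names(lines):
    """Map each representative sequence to the list of sequences it represents."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Column 1: representative; column 2: comma-separated members.
        representative, members = line.split("\t")
        mapping[representative] = members.split(",")
    return mapping
```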

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d2e8>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d278>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d208>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d128>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d198>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d358>}
class galaxy.datatypes.mothur.Summary(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.summary'
__init__(**kwd)[source]

Summarizes the quality of sequences in an unaligned or aligned FASTA-formatted sequence file.

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d748>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d6d8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d668>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d588>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d5f8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d7b8>}
class galaxy.datatypes.mothur.Group(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.groups'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Groups_file A group file assigns each sequence (column 1) to a group (column 2).
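One plausible way to derive the set of groups from such a file is sketched below. The helper `collect_groups` is hypothetical, and it is an assumption that the `groups` metadata element is populated along similar lines; the actual `set_meta` logic may differ.

```python
def collect_groups(lines):
    """Return the sorted, de-duplicated group names found in column 2."""
    groups = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            groups.add(fields[1])
    return sorted(groups)
```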

set_meta(dataset, overwrite=True, skip=None, max_data_lines=None, **kwd)[source]
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8d9e8>}
class galaxy.datatypes.mothur.AccNos(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.accnos'
__init__(**kwd)[source]

A list of names

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8dda0>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8dd30>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8dcc0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8dbe0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8dc50>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e8de10>}
class galaxy.datatypes.mothur.Oligos(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'mothur.oligos'
sniff_prefix(file_prefix)[source]

http://www.mothur.org/wiki/Oligos_File Determines whether the file is in oligos format.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.oligos' )
>>> Oligos().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.oligos' )
>>> Oligos().sniff( fname )
False
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.mothur.Frequency(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.freq'
__init__(**kwd)[source]

A list of names

sniff_prefix(file_prefix)[source]

Determines whether the file is in frequency tabular format for chimera analysis:

    #1.14.0
    0 0.000
    1 0.000
    …
    155 0.975

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.freq' )
>>> Frequency().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.freq' )
>>> Frequency().sniff( fname )
False
>>> # Expression count matrix (EdgeR wrapper)
>>> fname = get_test_fname( 'mothur_datatypetest_false_2.mothur.freq' )
>>> Frequency().sniff( fname )
False
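A minimal sketch of a check for the layout shown in the example follows. It assumes, based only on that example, that the first line is a `#`-prefixed version header and that every later line holds an integer and a float; the helper name `looks_like_freq` is hypothetical and this is not the sniffer's actual code.

```python
def looks_like_freq(lines):
    """Return True for a '#version' header followed by 'int float' rows."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) < 2 or not rows[0].startswith("#"):
        return False
    for row in rows[1:]:
        fields = row.split()
        if len(fields) != 2:
            return False
        try:
            int(fields[0])
            float(fields[1])
        except ValueError:
            return False
    return True
```

A wider table, such as an expression count matrix, fails the two-column test.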
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94438>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e943c8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94358>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94278>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e942e8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e944a8>}
sniff(filename)
class galaxy.datatypes.mothur.Quantile(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.quan'
__init__(**kwd)[source]

Quantiles for chimera analysis

sniff_prefix(file_prefix)[source]

Determines whether the file is in quantiles tabular format for chimera analysis:

    1 0 0 0 0 0 0
    2 0.309198 0.309198 0.37161 0.37161 0.37161 0.37161
    3 0.510982 0.563213 0.693529 0.858939 1.07442 1.20608
    …

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.quan' )
>>> Quantile().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.quan' )
>>> Quantile().sniff( fname )
False
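The row shape in the example (an integer followed by six floats) can be tested as sketched below. The column count of seven is inferred from the example only, so it is exposed as a parameter; `looks_like_quantiles` is a hypothetical helper, not the class's sniffer.

```python
def looks_like_quantiles(lines, expected_columns=7):
    """Return True if each row is an integer followed by float columns."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return False
    for row in rows:
        if len(row) != expected_columns:
            return False
        try:
            int(row[0])
            for value in row[1:]:
                float(value)
        except ValueError:
            return False
    return True
```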
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'filtered': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e946a0>, 'masked': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94710>}
sniff(filename)
class galaxy.datatypes.mothur.LaneMask(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'mothur.filter'
sniff_prefix(file_prefix)[source]

Determines whether the file is a lane mask filter: 1 line consisting of zeros and ones.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.filter' )
>>> LaneMask().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.filter' )
>>> LaneMask().sniff( fname )
False
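The rule stated above (one line made up of zeros and ones) is simple enough to express directly. The sketch below is illustrative only; the name `looks_like_lane_mask` is an assumption, and ignoring blank lines is a choice made here rather than documented behaviour.

```python
def looks_like_lane_mask(text):
    """Return True if the text is exactly one line containing only 0s and 1s."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return len(lines) == 1 and set(lines[0]) <= {"0", "1"}
```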
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e948d0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.mothur.CountTable(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.count_table'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Count_File A table whose first column holds names and whose following columns hold integer counts.

# Example 1:
    Representative_Sequence total
    U68630 1
    U68595 1
    U68600 1

# Example 2 (with group columns):
    Representative_Sequence total forest pasture
    U68630 1 1 0
    U68595 1 1 0
    U68600 1 1 0
    U68591 1 1 0
    U68647 1 0 1
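The sketch below reads a count table of the shape shown in the examples. It assumes that group names are the header columns after the first two, which is inferred from Example 2; `parse_count_table` is a hypothetical helper and not the logic used by `set_meta`.

```python
def parse_count_table(lines):
    """Return (group_names, counts) where counts maps a name to its integer columns."""
    rows = [line.split() for line in lines if line.strip()]
    # An empty input raises IndexError on the next line.
    header, body = rows[0], rows[1:]
    groups = header[2:]  # columns after 'Representative_Sequence' and 'total'
    counts = {row[0]: [int(value) for value in row[1:]] for row in body}
    return groups, counts
```

With a header that has no group columns, as in Example 1, the returned group list is empty.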

set_meta(dataset, overwrite=True, skip=1, max_data_lines=None, **kwd)[source]
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'groups': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94b38>}
class galaxy.datatypes.mothur.RefTaxonomy(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.ref.taxonomy'
__init__(**kwd)[source]
sniff_prefix(file_prefix)[source]

Determines whether the file is a Reference Taxonomy

http://www.mothur.org/wiki/Taxonomy_outline A table with 2 or 3 columns:

- SequenceName
- Taxonomy (semicolon-separated taxonomy in descending order)
- integer ?

Example: 2-column (http://www.mothur.org/wiki/Taxonomy_outline)

    X56533.1 Eukaryota;Alveolata;Ciliophora;Intramacronucleata;Oligohymenophorea;Hymenostomatida;Tetrahymenina;Glaucomidae;Glaucoma;
    X97975.1 Eukaryota;Parabasalidea;Trichomonada;Trichomonadida;unclassified_Trichomonadida;
    AF052717.1 Eukaryota;Parabasalidea;

Example: 3-column (http://vamps.mbl.edu/resources/databases.php)

    v3_AA008 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus 5
    v3_AA016 Bacteria 120
    v3_AA019 Archaea;Crenarchaeota;Marine_Group_I 1
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
False
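To make the 2- and 3-column layouts concrete, the sketch below splits one line into its parts. The function `parse_taxonomy_line` is hypothetical; treating the optional third column as an integer follows the examples, whose meaning the docstring itself leaves open.

```python
def parse_taxonomy_line(line):
    """Split a reference taxonomy line into (name, taxa, optional_integer)."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError("expected 2 or 3 columns, got %d" % len(fields))
    name = fields[0]
    # Drop the empty item produced by a trailing semicolon.
    taxa = [taxon for taxon in fields[1].split(";") if taxon]
    number = int(fields[2]) if len(fields) == 3 else None
    return name, taxa, number
```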
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94f28>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94eb8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94e48>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94d68>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94dd8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e94f98>}
sniff(filename)
class galaxy.datatypes.mothur.ConsensusTaxonomy(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.cons.taxonomy'
__init__(**kwd)[source]

A list of names

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b390>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b320>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b2b0>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b1d0>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b240>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b400>}
class galaxy.datatypes.mothur.TaxonomySummary(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.tax.summary'
__init__(**kwd)[source]

A Summary of taxon classification

metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b7b8>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b748>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b6d8>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b5f8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b668>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b828>}
class galaxy.datatypes.mothur.Axes(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.axes'
__init__(**kwd)[source]

Initialize axes datatype

sniff_prefix(file_prefix)[source]

Determines whether the file is in axes format. The first line may have column headings. The following lines have the name in the first column plus float columns for each axis.

==> 98_sq_phylip_amazon.fn.unique.pca.axes <==

    group axis1 axis2
    forest 0.000000 0.145743
    pasture 0.145743 0.000000

==> 98_sq_phylip_amazon.nmds.axes <==

    axis1 axis2
    U68589 0.262608 -0.077498
    U68590 0.027118 0.195197
    U68591 0.329854 0.014395

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.axes' )
>>> Axes().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.axes' )
>>> Axes().sniff( fname )
False
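A check for the layout described above (optional heading line, then a name followed by float columns) might look like the sketch below. It is an illustration under those stated assumptions, not the sniffer's code, and `looks_like_axes` is a hypothetical name.

```python
def looks_like_axes(lines):
    """Return True for an optional header followed by 'name float float ...' rows."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return False

    def is_data(row):
        # A data row needs a name plus at least one float column.
        if len(row) < 2:
            return False
        try:
            for value in row[1:]:
                float(value)
        except ValueError:
            return False
        return True

    # Treat the first row as a header when it does not parse as data.
    body = rows if is_data(rows[0]) else rows[1:]
    return bool(body) and all(is_data(row) for row in body)
```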
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9bba8>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9bb38>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9bac8>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9b9e8>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9ba58>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9bc18>}
sniff(filename)
class galaxy.datatypes.mothur.SffFlow(**kwd)[source]

Bases: galaxy.datatypes.tabular.Tabular

file_ext = 'mothur.sff.flow'

https://mothur.org/wiki/flow_file/ The first line is the total number of flow values: 800 for Titanium data; for GS FLX it would be 400. The following lines contain:

- SequenceName
- the number of useable flows as defined by 454’s software
- the flow intensity for each base going in the order of TACG

Example:

    800
    GQY1XT001CQL4K 85 1.04 0.00 1.00 0.02 0.03 1.02 0.05 …
    GQY1XT001CQIRF 84 1.02 0.06 0.98 0.06 0.09 1.05 0.07 …
    GQY1XT001CF5YW 88 1.02 0.02 1.01 0.04 0.06 1.02 0.03 …
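The flow file layout can be read with a short parser such as the sketch below. The name `parse_flow` and the returned structure are assumptions made for illustration; the class's own `set_meta` may extract different information.

```python
def parse_flow(lines):
    """Return (total_flow_values, reads) for a mothur flow file.

    reads maps a sequence name to (useable_flows, [intensities]).
    """
    rows = [line.split() for line in lines if line.strip()]
    # An empty input raises IndexError on the next line.
    flow_values = int(rows[0][0])  # first line: total number of flow values
    reads = {}
    for row in rows[1:]:
        name, useable = row[0], int(row[1])
        reads[name] = (useable, [float(value) for value in row[2:]])
    return flow_values, reads
```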
__init__(**kwd)[source]
metadata_spec = {'column_names': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f160>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa7667b8>, 'columns': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766278>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa75bd30>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa766048>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efa77f438>, 'flow_order': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9beb8>, 'flow_values': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed7e9be48>}
set_meta(dataset, overwrite=True, skip=1, max_data_lines=None, **kwd)[source]
make_html_table(dataset, skipchars=None)[source]

Create HTML table, used for displaying peek

galaxy.datatypes.msa module

class galaxy.datatypes.msa.InfernalCM(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'cm'
set_peek(dataset, is_multi_byte=False)[source]
sniff_prefix(file_prefix)[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'infernal_model.cm' )
>>> InfernalCM().sniff( fname )
True
>>> fname = get_test_fname( '2.txt' )
>>> InfernalCM().sniff( fname )
False
set_meta(dataset, **kwd)[source]

Set the number of models and the version of CM file in dataset.

metadata_spec = {'cm_version': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed78f2f60>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed78f2ef0>}
sniff(filename)
class galaxy.datatypes.msa.Hmmer(**kwd)[source]

Bases: galaxy.datatypes.data.Text

edam_data = 'data_1364'
edam_format = 'format_1370'
set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
sniff_prefix(filename)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83df198>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)
class galaxy.datatypes.msa.Hmmer2(**kwd)[source]

Bases: galaxy.datatypes.msa.Hmmer

edam_format = 'format_3328'
file_ext = 'hmm2'
sniff_prefix(file_prefix)[source]

HMMER2 files start with HMMER2.0

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83df358>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.msa.Hmmer3(**kwd)[source]

Bases: galaxy.datatypes.msa.Hmmer

edam_format = 'format_3329'
file_ext = 'hmm3'
sniff_prefix(file_prefix)[source]

HMMER3 files start with HMMER3/f
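The two prefix rules stated for Hmmer2 and Hmmer3 (`HMMER2.0` and `HMMER3/f`) amount to a check on the first line of the file. The sketch below combines them in one hypothetical helper, `hmmer_version`; the real classes each implement their own `sniff_prefix`, and the sample header text after each prefix is made up.

```python
def hmmer_version(first_line):
    """Return 2 or 3 based on the file's leading magic string, else None."""
    if first_line.startswith("HMMER2.0"):
        return 2
    if first_line.startswith("HMMER3/f"):
        return 3
    return None
```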

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83df550>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.msa.HmmerPress(**kwd)[source]

Bases: galaxy.datatypes.binary.Binary

Class for hmmpress database files.

file_ext = 'hmmpress'
allow_datatype_change = False
composite_type = 'basic'
set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text.

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

__init__(**kwd)[source]
metadata_spec = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83df748>}
class galaxy.datatypes.msa.Stockholm_1_0(**kwd)[source]

Bases: galaxy.datatypes.data.Text

edam_data = 'data_0863'
edam_format = 'format_1961'
file_ext = 'stockholm'
set_peek(dataset, is_multi_byte=False)[source]
sniff_prefix(file_prefix)[source]
set_meta(dataset, **kwd)[source]

Set the number of models in dataset.

classmethod split(input_datasets, subdir_generator_function, split_params)[source]

Split the input files by model records.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83df940>}
sniff(filename)
class galaxy.datatypes.msa.MauveXmfa(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'xmfa'
set_peek(dataset, is_multi_byte=False)[source]
sniff_prefix(file_prefix)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed83dfb38>}
set_meta(dataset, **kwd)[source]
sniff(filename)

galaxy.datatypes.neo4j module

Neo4j Composite Dataset

class galaxy.datatypes.neo4j.Neo4j(**kwd)[source]

Bases: galaxy.datatypes.images.Html

Base class for neostore datatypes, derived from Html. Composite datatype elements are stored in the extra files path.

generate_primary_file(dataset=None)[source]

Called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they arrive with the default names.

get_mime()[source]

Returns the mime type of the datatype

set_peek(dataset, is_multi_byte=False)[source]

Set the peek and blurb text

display_peek(dataset)[source]

Create HTML content, used for displaying peek.

metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed791add8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.neo4j.Neo4jDB(**kwd)[source]

Bases: galaxy.datatypes.neo4j.Neo4j, galaxy.datatypes.data.Data

Class for neo4jDB database files.

file_ext = 'neostore'
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(**kwd)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed791af60>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
class galaxy.datatypes.neo4j.Neo4jDBzip(**kwd)[source]

Bases: galaxy.datatypes.neo4j.Neo4j, galaxy.datatypes.data.Data

Class for neo4jDB database files.

file_ext = 'neostore.zip'
composite_type = 'auto_primary_file'
allow_datatype_change = False
__init__(**kwd)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed791add8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'neostore_zip': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed791ab38>, 'reference_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed791a208>}

galaxy.datatypes.ngsindex module

NGS indexes

class galaxy.datatypes.ngsindex.BowtieIndex(**kwd)[source]

Bases: galaxy.datatypes.text.Html

Base class for Bowtie indexes; subclassed by BowtieColorIndex and BowtieBaseIndex.

composite_type = 'auto_primary_file'
allow_datatype_change = False
generate_primary_file(dataset=None)[source]

Called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they arrive with the default names.

regenerate_primary_file(dataset)[source]

Cannot be done until metadata is being set.

set_peek(dataset, is_multi_byte=False)[source]
display_peek(dataset)[source]
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910d978>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910d9b0>}
class galaxy.datatypes.ngsindex.BowtieColorIndex(**kwd)[source]

Bases: galaxy.datatypes.ngsindex.BowtieIndex

Bowtie color space index

file_ext = 'bowtie_color_index'
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910d978>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910db70>}
class galaxy.datatypes.ngsindex.BowtieBaseIndex(**kwd)[source]

Bases: galaxy.datatypes.ngsindex.BowtieIndex

Bowtie base space index

file_ext = 'bowtie_base_index'
metadata_spec = {'base_name': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910d978>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edbd33cf8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1ed910d8d0>}

galaxy.datatypes.phylip module

Created on January 05, 2018

@authors: Kenzo-Hugo Hillion and Fabien Mareuil, Institut Pasteur, Paris
@contacts: kehillio@pasteur.fr and fabien.mareuil@pasteur.fr
@project: galaxy
@githuborganization: C3BI

Phylip datatype sniffer

class galaxy.datatypes.phylip.Phylip(**kwd)[source]

Bases: galaxy.datatypes.data.Text

Phylip format stores a multiple sequence alignment

edam_data = 'data_0863'
edam_format = 'format_1997'
file_ext = 'phylip'

Add metadata elements

set_meta(dataset, **kwd)[source]

Set the number of sequences and the number of data lines in dataset.

set_peek(dataset, is_multi_byte=False)[source]
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbedac8>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb596be0>}
sniff(filename)
sniff_prefix(file_prefix)[source]

All Phylip files start with the number of sequences, so we can use this value to count the sequences that follow in the first ‘stack’.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.phylip')
>>> Phylip().sniff(fname)
True
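The idea described above can be sketched as follows. This is an approximation under stated assumptions, not the sniffer's code: it assumes the header line holds two integers (sequence count and alignment length), and that the first stack ends at the first blank line or at end of file. The helper name `looks_like_phylip` is hypothetical.

```python
def looks_like_phylip(text):
    """Return True if the header's sequence count is matched by the first stack."""
    lines = text.splitlines()
    if not lines:
        return False
    header = lines[0].split()
    if len(header) != 2 or not all(item.isdigit() for item in header):
        return False
    n_seq = int(header[0])
    # Collect the lines of the first stack (up to the first blank line).
    stack = []
    for line in lines[1:]:
        if not line.strip():
            break
        stack.append(line)
    return n_seq > 0 and len(stack) >= n_seq
```

The `>=` comparison keeps the sketch tolerant of sequential files, where one sequence may span several lines.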

galaxy.datatypes.plant_tribes module

class galaxy.datatypes.plant_tribes.Smat(**kwd)[source]

Bases: galaxy.datatypes.data.Text

file_ext = 'smat'
display_peek(dataset)[source]
set_peek(dataset, is_multi_byte=False)[source]
sniff_prefix(file_prefix)[source]

The use of ESTScan implies the creation of score matrices that reflect the codon preferences of the studied organisms. The ESTScan package includes scripts for generating these files. The output of these scripts consists of the matrices, one for each isochore, which look like this:

    FORMAT: hse_4is.conf CODING REGION 6 3 1 s C+G: 0 44
    -1 0 2 -2
    2 1 -8 0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_space.txt')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('test_tab.bed')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('1.smat')
>>> Smat().sniff(fname)
True
metadata_spec = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1edb5966a0>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object at 0x7f1efdbed7f0>}
sniff(filename)