Warning

This document is for an in-development version of Galaxy.

galaxy.datatypes package

Subpackages

Submodules

galaxy.datatypes.annotation module

class galaxy.datatypes.annotation.SnapHmm(**kwd)[source]

Bases: Text

file_ext = 'snaphmm'
edam_data = 'data_1364'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

SNAP model files start with zoeHMM
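A minimal sketch of the prefix check described above, assuming the sniffer only needs to inspect the leading text of the file. The function name `looks_like_snap_hmm` is hypothetical and is not part of Galaxy's API.

```python
def looks_like_snap_hmm(prefix: str) -> bool:
    """Return True if the text begins with the SNAP model marker 'zoeHMM'."""
    return prefix.startswith("zoeHMM")


# The header content after the marker is made up for illustration.
print(looks_like_snap_hmm("zoeHMM rest-of-header"))  # True
print(looks_like_snap_hmm(">seq1\nACGT"))            # False
```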

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.annotation.Augustus(**kwd)[source]

Bases: CompressedArchive

Class describing an Augustus prediction model

file_ext = 'augustus'
edam_data = 'data_0950'
compressed = True
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff(filename: str) bool[source]

Augustus archives always contain the same files

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.anvio module

Datatypes for Anvi’o https://github.com/merenlab/anvio

class galaxy.datatypes.anvio.AnvioComposite(**kwd)[source]

Bases: Html

Base class to use for Anvi’o composite datatypes. These generally consist of a SQLite database plus optional additional files.

file_ext = 'anvio_composite'
composite_type: str | None = 'auto_primary_file'
generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

get_mime() str[source]

Returns the mime type of the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioDB(*args, **kwd)[source]

Bases: AnvioComposite

Class for AnvioDB database files.

file_ext = 'anvio_db'
__init__(*args, **kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the anvio_basename based upon actual extra_files_path contents.

metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioStructureDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Structure DB database files.

file_ext = 'anvio_structure_db'
metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioGenomesDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Genomes DB database files.

file_ext = 'anvio_genomes_db'
metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioContigsDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Contigs DB database files.

file_ext = 'anvio_contigs_db'
__init__(*args, **kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioProfileDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Profile DB database files.

file_ext = 'anvio_profile_db'
__init__(*args, **kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioPanDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Pan DB database files.

file_ext = 'anvio_pan_db'
metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.anvio.AnvioSamplesDB(*args, **kwd)[source]

Bases: AnvioDB

Class for Anvio Samples DB database files.

file_ext = 'anvio_samples_db'
metadata_spec: metadata.MetadataSpecCollection = {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.assembly module

Velvet datatypes, by James E Johnson (University of Minnesota), for the Velvet assembler tool in Galaxy.

class galaxy.datatypes.assembly.Amos(**kwd)[source]

Bases: Text

Class describing the AMOS assembly file

edam_data = 'data_0925'
edam_format = 'format_3582'
file_ext = 'afg'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in the AMOS assembly file format. Example:

{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}
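The nested `{XXX ... }` records in the example suggest a simple structural check. The following is a rough heuristic sketch, not Galaxy's actual sniffer: it assumes that a block opens with a line of the form `{` plus three uppercase letters and closes with a line containing only `}`, and it expects the complete text rather than a truncated prefix. The name `looks_like_amos` is hypothetical.

```python
import re

# Assumed shape of a block-opening line, based on the example above.
BLOCK_OPEN = re.compile(r"^\{[A-Z]{3}$")


def looks_like_amos(text: str) -> bool:
    """Return True if the text contains balanced AMOS-style record blocks."""
    depth = 0
    saw_block = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if BLOCK_OPEN.match(line):
            depth += 1
            saw_block = True
        elif line == "}":
            depth -= 1
            if depth < 0:  # a close without a matching open
                return False
    return saw_block and depth == 0
```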
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.assembly.Sequences(**kwd)[source]

Bases: Fasta

Class describing the Sequences file generated by velveth

edam_data = 'data_0925'
file_ext = 'sequences'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in the FASTA format produced by velveth. The id line has 3 fields separated by tabs (sequence_name, sequence_index, category):

>SEQUENCE_0_length_35   1       1
GGATATAGGGCCAACCCAACTCAACGGCCTGTCTT
>SEQUENCE_1_length_35   2       1
CGACGAATGACAGGTCACGAATTTGGCGGGGATTA
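A minimal sketch of the id-line check, assuming that the second and third fields are always integers as in the example. The helper name `looks_like_velveth_sequences` is hypothetical and this is not Galaxy's implementation.

```python
def looks_like_velveth_sequences(text: str) -> bool:
    """Check that every FASTA id line has three tab-separated fields:
    sequence_name, sequence_index, category."""
    saw_header = False
    for line in text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are not inspected in this sketch
        fields = line[1:].split("\t")
        if len(fields) != 3:
            return False
        # Assumption: index and category are integers, as in the example.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            return False
        saw_header = True
    return saw_header
```

A plain FASTA header such as `>seq1 some description` has no tab-separated fields, so it is rejected.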
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.assembly.Roadmaps(**kwd)[source]

Bases: Text

Class describing the Roadmaps file generated by velveth

edam_format = 'format_2561'
file_ext = 'roadmaps'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a velveth produced RoadMap:

142858  21      1
ROADMAP 1
ROADMAP 2
…
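A minimal sketch of a RoadMap check that follows the example: a first line of three integers, then a `ROADMAP n` line. The three-field header is taken from the example only and is an assumption; the helper name `looks_like_roadmaps` is hypothetical.

```python
import re

ROADMAP_LINE = re.compile(r"ROADMAP \d+")


def looks_like_roadmaps(text: str) -> bool:
    """Return True if the first two lines match the RoadMap example layout."""
    lines = text.splitlines()
    if len(lines) < 2:
        return False
    header = lines[0].split()
    # Assumption: the header holds exactly three integer fields.
    if len(header) != 3 or not all(field.isdigit() for field in header):
        return False
    return ROADMAP_LINE.fullmatch(lines[1].strip()) is not None
```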

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.assembly.Velvet(**kwd)[source]

Bases: Html

composite_type: str | None = 'auto_primary_file'
file_ext = 'velvet'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
regenerate_primary_file(dataset: DatasetProtocol) None[source]

This cannot be done until metadata is being set.

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'long_reads': <galaxy.model.metadata.MetadataElementSpec object>, 'paired_end_reads': <galaxy.model.metadata.MetadataElementSpec object>, 'short2_reads': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.binary module

Binary classes

class galaxy.datatypes.binary.Binary(**kwd)[source]

Bases: Data

Binary data

edam_format = 'format_2333'
file_ext = 'binary'
static register_sniffable_binary_format(data_type, ext, type_class)[source]

Deprecated method.

static register_unsniffable_binary_ext(ext)[source]

Deprecated method.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

get_mime() str[source]

Returns the mime type of the datatype

get_structured_content(dataset, content_type, **kwargs)[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Ab1(**kwd)[source]

Bases: Binary

Class describing an ab1 binary sequence file

file_ext = 'ab1'
edam_format = 'format_3000'
edam_data = 'data_0924'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Idat(**kwd)[source]

Bases: Binary

Binary data in idat format

file_ext = 'idat'
edam_format = 'format_2058'
edam_data = 'data_2603'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Cel(**kwd)[source]

Bases: Binary

Cel File format described at: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html

is_binary: bool | typing_extensions.Literal[maybe] = 'maybe'
file_ext = 'cel'
edam_format = 'format_1638'
edam_data = 'data_3110'
sniff(filename: str) bool[source]

Try to guess if the file is a Cel file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('affy_v_agcc.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_3.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_4.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Cel().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set metadata for Cel file.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MashSketch(**kwd)[source]

Bases: Binary

Mash Sketch file. Sketches are used by the MinHash algorithm to allow fast distance estimations with low storage and memory requirements. To make a sketch, each k-mer in a sequence is hashed, which creates a pseudo-random identifier. By sorting these identifiers (hashes), a small subset from the top of the sorted list can represent the entire sequence (these are min-hashes). The more similar another sequence is, the more min-hashes it is likely to share.
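The sketching idea described above can be illustrated in a few lines. This is a simplified sketch under stated assumptions, not the Mash implementation and not the `msh` file format: SHA-1 stands in for the real hash function, k-mers are not canonicalised, and the names `kmers`, `min_hash_sketch` and `shared_min_hashes` are hypothetical.

```python
import hashlib
from typing import Iterator, List


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of the sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def min_hash_sketch(sequence: str, k: int = 4, sketch_size: int = 5) -> List[int]:
    """Hash each k-mer, sort the distinct hashes, keep the smallest ones."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode("ascii")).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return sorted(hashes)[:sketch_size]


def shared_min_hashes(sketch_a: List[int], sketch_b: List[int]) -> int:
    """Count the min-hashes that two sketches have in common."""
    return len(set(sketch_a) & set(sketch_b))


reference = min_hash_sketch("ACGTTGCAAGCTTAGC")
print(shared_min_hashes(reference, reference))  # every min-hash is shared
print(shared_min_hashes(reference, min_hash_sketch("GGGGGGGGCCCCCCCC")))
```

Only the short sorted list is stored per sequence, which is why comparisons stay cheap in storage and memory.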

file_ext = 'msh'
is_binary: bool | typing_extensions.Literal[maybe] = 'maybe'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.CompressedArchive(**kwd)[source]

Bases: Binary

Class describing a compressed binary file. This class can be subclassed to implement archive file types that will not be unpacked by upload.py.

file_ext = 'compressed_archive'
compressed = True
is_binary: bool | typing_extensions.Literal[maybe] = 'maybe'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Meryldb(**kwd)[source]

Bases: CompressedArchive

MerylDB is a tar.gz archive with 128 files: 64 data files and 64 index files.

file_ext = 'meryldb'
sniff(filename: str) bool[source]

Try to guess if the file is a Meryldb file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('affy_v_agcc.cel')
>>> Meryldb().sniff(fname)
False
>>> fname = get_test_fname('read-db.meryldb')
>>> Meryldb().sniff(fname)
True
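The archive layout stated for MerylDB (64 data files and 64 index files in a tar.gz) can be checked with the standard `tarfile` module. This is a hedged sketch rather than Galaxy's sniffer: the `.merylData` and `.merylIndex` suffixes are assumptions about how the members are named, and `looks_like_meryldb` is a hypothetical helper.

```python
import tarfile

# Assumed member suffixes; the documentation only gives the file counts.
DATA_SUFFIX = ".merylData"
INDEX_SUFFIX = ".merylIndex"


def looks_like_meryldb(path: str) -> bool:
    """Return True if the tar archive holds 64 data and 64 index files."""
    if not tarfile.is_tarfile(path):
        return False
    with tarfile.open(path, "r") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    data_files = [name for name in names if name.endswith(DATA_SUFFIX)]
    index_files = [name for name in names if name.endswith(INDEX_SUFFIX)]
    return len(data_files) == 64 and len(index_files) == 64
```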
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Visium(**kwd)[source]

Bases: CompressedArchive

Visium is a tar.gz archive with at least a ‘Spatial’ subfolder, a filtered h5 file and a raw h5 file.

file_ext = 'visium.tar.gz'
sniff(filename: str) bool[source]

Check the data structure: the archive contains h5 files and a spatial folder.
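A minimal sketch of such a structure check using the standard `tarfile` module. It is an illustration only: it assumes the h5 files carry an `.h5` suffix, matches the spatial folder case-insensitively by path component, and requires two h5 files to reflect the filtered and raw files without assuming their names. `looks_like_visium` is a hypothetical helper.

```python
import tarfile


def looks_like_visium(path: str) -> bool:
    """Return True if the archive has a spatial folder and two or more h5 files."""
    if not tarfile.is_tarfile(path):
        return False
    with tarfile.open(path, "r") as archive:
        names = archive.getnames()
    # A member counts as "inside spatial" if any path component matches.
    has_spatial = any("spatial" in name.lower().split("/") for name in names)
    h5_files = [name for name in names if name.lower().endswith(".h5")]
    return has_spatial and len(h5_files) >= 2
```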

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Bref3(**kwd)[source]

Bases: Binary

Bref3 format is a binary format for storing phased, non-missing genotypes for a list of samples.

file_ext = 'bref3'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.DynamicCompressedArchive(**kwd)[source]

Bases: CompressedArchive

compressed_format: str
uncompressed_datatype_instance: Data
matches_any(target_datatypes: List[Any]) bool[source]

Treat two aspects of compressed datatypes separately.

metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.GzDynamicCompressedArchive(**kwd)[source]

Bases: DynamicCompressedArchive

compressed_format: str = 'gzip'
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

uncompressed_datatype_instance: Data
class galaxy.datatypes.binary.Bz2DynamicCompressedArchive(**kwd)[source]

Bases: DynamicCompressedArchive

compressed_format: str = 'bz2'
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

uncompressed_datatype_instance: Data
class galaxy.datatypes.binary.CompressedZipArchive(**kwd)[source]

Bases: CompressedArchive

Class describing a compressed binary file. This class can be subclassed to implement archive file types that will not be unpacked by upload.py.

file_ext = 'zip'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.GenericAsn1Binary(**kwd)[source]

Bases: Binary

Class for generic ASN.1 binary format

file_ext = 'asn1-binary'
edam_format = 'format_1966'
edam_data = 'data_0849'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BamNative(**kwd)[source]

Bases: CompressedArchive, _BamOrSam

Class describing a BAM binary file that is not necessarily sorted

edam_format = 'format_2572'
edam_data = 'data_0863'
file_ext = 'unsorted.bam'
sort_flag: str | None = None
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

static merge(split_files: List[str], output_file: str) None[source]

Merges BAM files

Parameters:
  • split_files – List of bam file paths to merge

  • output_file – Write merged bam file to this location

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
sniff(filename: str) bool[source]
classmethod is_bam(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

to_archive(dataset: DatasetProtocol, name: str = '') Iterable[source]

Collect archive paths and file handles that need to be exported when archiving dataset.

Parameters:
  • dataset – HistoryDatasetAssociation

  • name – archive name, in collection context corresponds to collection name(s) and element_identifier, joined by ‘/’, e.g. ‘fastq_collection/sample1/forward’

groom_dataset_content(file_name: str) None[source]

Ensures that the BAM file contents are coordinate-sorted. This function is called on an output dataset after the content is initially generated.

get_chunk(trans, dataset: HasFileName, offset: int = 0, ck_size: int | None = None) str[source]
display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, offset: int | None = None, ck_size: int | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful when overriding this method; this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]
metadata_spec: metadata.MetadataSpecCollection = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Bam(**kwd)[source]

Bases: BamNative

Class describing a BAM binary file

edam_format = 'format_2572'
edam_data = 'data_0863'
file_ext = 'bam'
track_type: str | None = 'ReadTrack'
data_sources: Dict[str, str] = {'data': 'bai', 'index': 'bigwig'}
get_index_flag(file_name: str) str[source]

Return the pysam flag for a bai index (default) or a csi index (used when a contig size is greater than 2**29 - 1).
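The choice between the two index types can be sketched as a pure function over the reference lengths. This is an illustration, not Galaxy's implementation: the helper name `choose_index_flag` is hypothetical, and the flag strings `-b` (bai) and `-c` (csi) are assumed to mirror the `samtools index` options.

```python
from typing import Iterable

# Largest contig size for which a bai index is still chosen, per the docstring.
BAI_MAX_CONTIG_LENGTH = 2**29 - 1


def choose_index_flag(reference_lengths: Iterable[int]) -> str:
    """Return '-b' (bai, the default) or '-c' (csi) for the given contig lengths."""
    longest = max(reference_lengths, default=0)
    return "-c" if longest > BAI_MAX_CONTIG_LENGTH else "-b"


print(choose_index_flag([248_956_422]))  # -b
print(choose_index_flag([2**29]))        # -c
```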

dataset_content_needs_grooming(file_name: str) bool[source]

Check if file_name is a coordinate-sorted BAM file

set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
line_dataprovider(dataset: DatasetProtocol, **settings) FilteredLineDataProvider[source]
regex_line_dataprovider(dataset: DatasetProtocol, **settings) RegexLineDataProvider[source]
column_dataprovider(dataset: DatasetProtocol, **settings) ColumnarDataProvider[source]
dict_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]
header_dataprovider(dataset: DatasetProtocol, **settings) RegexLineDataProvider[source]
id_seq_qual_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) ColumnarDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]
samtools_dataprovider(dataset: DatasetProtocol, **settings) SamtoolsDataProvider[source]

Generic samtools interface - all options available through settings.

dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function Bam.column_dataprovider>, 'dict': <function Bam.dict_dataprovider>, 'genomic-region': <function Bam.genomic_region_dataprovider>, 'genomic-region-dict': <function Bam.genomic_region_dict_dataprovider>, 'header': <function Bam.header_dataprovider>, 'id-seq-qual': <function Bam.id_seq_qual_dataprovider>, 'line': <function Bam.line_dataprovider>, 'regex-line': <function Bam.regex_line_dataprovider>, 'samtools': <function Bam.samtools_dataprovider>}
metadata_spec: metadata.MetadataSpecCollection = {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.ProBam(**kwd)[source]

Bases: Bam

Class describing a BAM binary file - extended for proteomics data

edam_format = 'format_3826'
edam_data = 'data_0863'
file_ext = 'probam'
metadata_spec: metadata.MetadataSpecCollection = {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BamInputSorted(**kwd)[source]

Bases: BamNative

A class for BAM files that can formally be unsorted or queryname sorted. Alignments are either ordered based on the order with which the queries appear when producing the alignment, or ordered by their queryname. This notably keeps alignments produced by paired end sequencing adjacent.

sort_flag: str | None = '-n'
file_ext = 'qname_input_sorted.bam'
sniff(filename: str) bool[source]
dataset_content_needs_grooming(file_name: str) bool[source]

Groom if the file is coordinate sorted

metadata_spec: metadata.MetadataSpecCollection = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BamQuerynameSorted(**kwd)[source]

Bases: BamInputSorted

A class for queryname sorted BAM files.

sort_flag: str | None = '-n'
file_ext = 'qname_sorted.bam'
sniff(filename: str) bool[source]
dataset_content_needs_grooming(file_name: str) bool[source]

Check if file_name is a queryname-sorted BAM file

metadata_spec: metadata.MetadataSpecCollection = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.CRAM(**kwd)[source]

Bases: Binary

file_ext = 'cram'
edam_format = 'format_3462'
edam_data = 'data_0863'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

get_cram_version(filename: str) Tuple[int, int][source]
set_index_file(dataset: HasFileName, index_file) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'cram_index': <galaxy.model.metadata.MetadataElementSpec object>, 'cram_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BaseBcf(**kwd)[source]

Bases: CompressedArchive

edam_format = 'format_3020'
edam_data = 'data_3498'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Bcf(**kwd)[source]

Bases: BaseBcf

Class describing a (BGZF-compressed) BCF file

file_ext = 'bcf'
sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Creates the index for the BCF file.

metadata_spec: MetadataSpecCollection = {'bcf_index': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BcfUncompressed(**kwd)[source]

Bases: BaseBcf

Class describing an uncompressed BCF file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.bcf_uncompressed')
>>> BcfUncompressed().sniff(fname)
True
>>> fname = get_test_fname('1.bcf')
>>> BcfUncompressed().sniff(fname)
False
file_ext = 'bcf_uncompressed'
compressed = False
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.H5(**kwd)[source]

Bases: Binary

Class describing an HDF5 file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mz5')
>>> H5().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> H5().sniff(fname)
False
file_ext = 'h5'
edam_format = 'format_3590'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

get_structured_content(dataset, content_type=None, path='/', dtype='origin', format='json', flatten=False, selection=None, **kwargs)[source]

Implements h5grove protocol (https://silx-kit.github.io/h5grove/). This allows the h5web visualization tool (https://github.com/silx-kit/h5web) to be used directly with Galaxy datasets.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Loom(**kwd)[source]

Bases: H5

Class describing a Loom file: http://loompy.org/

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.loom')
>>> Loom().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Loom().sniff(fname)
False
file_ext = 'loom'
edam_format = 'format_3590'
sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

metadata_spec: MetadataSpecCollection = {'col_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'col_attrs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'col_graphs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'col_graphs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'description': <galaxy.model.metadata.MetadataElementSpec object>, 'doi': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_count': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_names': <galaxy.model.metadata.MetadataElementSpec object>, 'loom_spec_version': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'row_graphs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'row_graphs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'title': <galaxy.model.metadata.MetadataElementSpec object>, 'url': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Anndata(**kwd)[source]

Bases: H5

Class describing an HDF5 anndata file: http://anndata.rtfd.io

>>> from galaxy.datatypes.sniff import get_test_fname
>>> Anndata().sniff(get_test_fname('pbmc3k_tiny.h5ad'))
True
>>> Anndata().sniff(get_test_fname('test.mz5'))
False
>>> Anndata().sniff(get_test_fname('import.loom.krumsiek11.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_6_small2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_6_small.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_7_4_small2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_7_4_small.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_unk2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_unk.h5ad'))
True
file_ext = 'h5ad'
sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'anndata_spec_version': <galaxy.model.metadata.MetadataElementSpec object>, 'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'description': <galaxy.model.metadata.MetadataElementSpec object>, 'doi': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_count': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_names': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_size': <galaxy.model.metadata.MetadataElementSpec object>, 'obsm_count': <galaxy.model.metadata.MetadataElementSpec object>, 'obsm_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_count': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_size': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'title': <galaxy.model.metadata.MetadataElementSpec object>, 'uns_count': <galaxy.model.metadata.MetadataElementSpec object>, 'uns_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'url': <galaxy.model.metadata.MetadataElementSpec object>, 'var_count': <galaxy.model.metadata.MetadataElementSpec object>, 'var_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'var_size': <galaxy.model.metadata.MetadataElementSpec object>, 'varm_count': <galaxy.model.metadata.MetadataElementSpec object>, 'varm_layers': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Grib(**kwd)[source]

Bases: Binary

Class describing a GRIB file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.grib')
>>> Grib().sniff_prefix(FilePrefix(fname))
True
>>> fname = FilePrefix(get_test_fname('interval.interval'))
>>> Grib().sniff_prefix(fname)
False
file_ext = 'grib'
edam_format = 'format_2333'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the GRIB edition.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'grib_edition': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
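The set_meta entry above records the GRIB edition. As a minimal sketch of how that value can be read, assuming the file starts directly with the ASCII indicator GRIB and that the edition number is stored in the eighth byte (the layout of the indicator section in GRIB editions 1 and 2), the standalone helper below returns the edition. The name grib_edition is hypothetical and is not part of Galaxy.

```python
from typing import Optional


def grib_edition(path: str) -> Optional[int]:
    """Return the GRIB edition (1 or 2), or None if the header does not match."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    # The indicator section begins with the four ASCII bytes "GRIB".
    if len(header) < 8 or not header.startswith(b"GRIB"):
        return None
    # Assumption: the eighth byte (index 7) holds the edition number.
    edition = header[7]
    return edition if edition in (1, 2) else None
```

Files that carry a transmission header before the GRIB indicator would need a search for the indicator first; this sketch does not attempt that.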
class galaxy.datatypes.binary.GmxBinary(**kwd)[source]

Bases: Binary

Base class for GROMACS binary files - xtc, trr, cpt

magic_number: int | None = None
file_ext = ''
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.Trr(**kwd)[source]

Bases: GmxBinary

Class describing a trr file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.trr')
>>> Trr().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Trr().sniff(fname)
False
file_ext = 'trr'
magic_number: int | None = 1993
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Cpt(**kwd)[source]

Bases: GmxBinary

Class describing a checkpoint (.cpt) file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.cpt')
>>> Cpt().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Cpt().sniff(fname)
False
file_ext = 'cpt'
magic_number: int | None = 171817
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Xtc(**kwd)[source]

Bases: GmxBinary

Class describing an xtc file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.xtc')
>>> Xtc().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Xtc().sniff(fname)
False
file_ext = 'xtc'
magic_number: int | None = 1995
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Edr(**kwd)[source]

Bases: GmxBinary

Class describing an edr file from the GROMACS suite

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.edr')
>>> Edr().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Edr().sniff(fname)
False
file_ext = 'edr'
magic_number: int | None = -55555
metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
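The GROMACS classes above (Trr, Cpt, Xtc, Edr) differ only in file_ext and magic_number. A minimal sketch of magic-number sniffing follows, assuming the number is stored in the first four bytes as a big-endian signed 32-bit integer (the negative Edr value suggests a signed type). The helper names read_magic and guess_gromacs_ext are hypothetical and do not reproduce Galaxy's sniff_prefix.

```python
import struct
from typing import Optional

# Magic numbers copied from the class attributes listed above.
GROMACS_MAGIC = {"trr": 1993, "cpt": 171817, "xtc": 1995, "edr": -55555}


def read_magic(path: str) -> Optional[int]:
    """Return the first four bytes as a signed int, or None if the file is too short."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        return None
    # ">i" means big-endian signed 32-bit integer (assumed layout).
    return struct.unpack(">i", header)[0]


def guess_gromacs_ext(path: str) -> Optional[str]:
    """Map the leading magic number to one of the extensions above, if any."""
    magic = read_magic(path)
    for ext, number in GROMACS_MAGIC.items():
        if magic == number:
            return ext
    return None
```

Keeping the numbers in one table mirrors the design above, where the base class holds the comparison logic and each subclass only supplies its own magic_number.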

class galaxy.datatypes.binary.Biom2(**kwd)[source]

Bases: H5

Class describing a biom2 file (http://biom-format.org/documentation/biom_format.html)

file_ext = 'biom2'
edam_format = 'format_3746'
sniff(filename: str) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Biom2().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Biom2().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Biom2().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'format_url': <galaxy.model.metadata.MetadataElementSpec object>, 'format_version': <galaxy.model.metadata.MetadataElementSpec object>, 'generated_by': <galaxy.model.metadata.MetadataElementSpec object>, 'id': <galaxy.model.metadata.MetadataElementSpec object>, 'nnz': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'type': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Cool(**kwd)[source]

Bases: H5

Class describing the cool format (https://github.com/mirnylab/cooler)

file_ext = 'cool'
sniff(filename: str) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.cool')
>>> Cool().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Cool().sniff(fname)
False
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MCool(**kwd)[source]

Bases: H5

Class describing the multi-resolution cool format (https://github.com/mirnylab/cooler)

file_ext = 'mcool'
sniff(filename: str) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.mcool')
>>> MCool().sniff(fname)
True
>>> fname = get_test_fname('matrix.cool')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('test.mz5')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> MCool().sniff(fname)
False
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.H5MLM(**kwd)[source]

Bases: H5

Machine learning model generated by Galaxy-ML.

file_ext = 'h5mlm'
TARGET_URL = 'https://github.com/goeckslab/Galaxy-ML'
max_peek_size = 1000
max_preview_size = 1000000
CONFIG = '-model_config-'
HTTP_REPR = '-http_repr-'
HYPERPARAMETER = '-model_hyperparameters-'
REPR = '-repr-'
URL = '-URL-'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
get_attribute(filename: str, attr_key: str) str[source]
get_repr(filename: str) str[source]
get_html_repr(filename: str) str[source]
get_config_string(filename: str) str[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful if overriding this method and this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'hyper_params': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.LudwigModel(**kwd)[source]

Bases: Html

Composite datatype that encloses multiple files for a Ludwig trained model.

composite_type: str | None = 'auto_primary_file'
file_ext = 'ludwig_model'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.HexrdMaterials(**kwd)[source]

Bases: H5

Class describing a Hexrd Materials file: https://github.com/HEXRD/hexrd

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('hexrd.materials.h5')
>>> HexrdMaterials().sniff(fname)
True
>>> fname = get_test_fname('test.loom')
>>> HexrdMaterials().sniff(fname)
False
file_ext = 'hexrd.materials.h5'
edam_format = 'format_3590'
sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'LatticeParameters': <galaxy.model.metadata.MetadataElementSpec object>, 'SpaceGroupNumber': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'materials': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Scf(**kwd)[source]

Bases: Binary

Class describing an scf binary sequence file

edam_format = 'format_1632'
edam_data = 'data_0924'
file_ext = 'scf'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Sff(**kwd)[source]

Bases: Binary

Standard Flowgram Format (SFF)

edam_format = 'format_3284'
edam_data = 'data_0924'
file_ext = 'sff'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.BigWig(**kwd)[source]

Bases: Binary

Accessing binary BigWig files from UCSC. The supplemental info in the paper has the binary details: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq351v1

edam_format = 'format_3006'
edam_data = 'data_3002'
file_ext = 'bigwig'
track_type: str | None = 'LineTrack'
data_sources: Dict[str, str] = {'data_standalone': 'bigwig'}
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.BigBed(**kwd)[source]

Bases: BigWig

BigBed support from UCSC.

edam_format = 'format_3004'
edam_data = 'data_3002'
file_ext = 'bigbed'
data_sources: Dict[str, str] = {'data_standalone': 'bigbed'}
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.TwoBit(**kwd)[source]

Bases: Binary

Class describing a TwoBit format nucleotide file

edam_format = 'format_3009'
edam_data = 'data_0848'
file_ext = 'twobit'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.SQlite(**kwd)[source]

Bases: Binary

Class describing a Sqlite database

file_ext = 'sqlite'
edam_format = 'format_3621'
init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
sniff_table_names(filename: str, table_names: Iterable) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sqlite_dataprovider(dataset: DatasetProtocol, **settings) SQliteDataProvider[source]
sqlite_datatableprovider(dataset: DatasetProtocol, **settings) SQliteDataTableProvider[source]
sqlite_datadictprovider(dataset: DatasetProtocol, **settings) SQliteDataDictProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'sqlite': <function SQlite.sqlite_dataprovider>, 'sqlite-dict': <function SQlite.sqlite_datadictprovider>, 'sqlite-table': <function SQlite.sqlite_datatableprovider>}
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
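The SQlite subclasses that follow each recognise their format by the tables a database contains, which is what sniff_table_names(filename, table_names) suggests. The sketch below shows one way such a check could work using only the standard library. It assumes the check is "every required table is present"; the helper names are hypothetical and this is not Galaxy's implementation. The 16-byte header string is the one defined by the SQLite file format.

```python
import sqlite3
from typing import Iterable, Set

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_HEADER = b"SQLite format 3\x00"


def is_sqlite(path: str) -> bool:
    """Return True when the file begins with the SQLite 3 header."""
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_HEADER)) == SQLITE_HEADER


def table_names(path: str) -> Set[str]:
    """Return the names of all tables recorded in sqlite_master."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return {name for (name,) in rows}


def has_tables(path: str, required: Iterable[str]) -> bool:
    """Return True when the file is SQLite and contains every required table."""
    # The header check runs first so that non-database files are never opened by sqlite3.
    return is_sqlite(path) and set(required) <= table_names(path)
```

For example, has_tables(path, ["entries", "metadata"]) is true only for a SQLite file that defines both tables.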

class galaxy.datatypes.binary.GeminiSQLite(**kwd)[source]

Bases: SQlite

Class describing a Gemini Sqlite database

file_ext = 'gemini.sqlite'
edam_format = 'format_3622'
edam_data = 'data_3498'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'gemini_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.ChiraSQLite(**kwd)[source]

Bases: SQlite

Class describing a ChiRAViz Sqlite database

file_ext = 'chira.sqlite'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.CuffDiffSQlite(**kwd)[source]

Bases: SQlite

Class describing a CuffDiff SQLite database

file_ext = 'cuffdiff.sqlite'
edam_format = 'format_3621'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'cuffdiff_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'genes': <galaxy.model.metadata.MetadataElementSpec object>, 'samples': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MzSQlite(**kwd)[source]

Bases: SQlite

Class describing a Proteomics Sqlite database

file_ext = 'mz.sqlite'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.PQP(**kwd)[source]

Bases: SQlite

Class describing a Peptide query parameters file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.pqp')
>>> PQP().sniff(fname)
True
>>> fname = get_test_fname('test.osw')
>>> PQP().sniff(fname)
False
file_ext = 'pqp'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]

Table definition according to https://github.com/grosenberger/OpenMS/blob/develop/src/openms/source/ANALYSIS/OPENSWATH/TransitionPQPFile.cpp#L264. For now, the VERSION, GENE, and PEPTIDE_GENE_MAPPING tables are excluded because some test data lacks them; see also https://github.com/OpenMS/OpenMS/issues/4365

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OSW(**kwd)[source]

Bases: SQlite

Class describing OpenSwath output

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.osw')
>>> OSW().sniff(fname)
True
>>> fname = get_test_fname('test.sqmass')
>>> OSW().sniff(fname)
False
file_ext = 'osw'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.SQmass(**kwd)[source]

Bases: SQlite

Class describing a Sqmass database

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.sqmass')
>>> SQmass().sniff(fname)
True
>>> fname = get_test_fname('test.pqp')
>>> SQmass().sniff(fname)
False
file_ext = 'sqmass'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.BlibSQlite(**kwd)[source]

Bases: SQlite

Class describing a Proteomics Spectral Library Sqlite database

file_ext = 'blib'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'blib_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.DlibSQlite(**kwd)[source]

Bases: SQlite

Class describing a Proteomics Spectral Library Sqlite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.dlib')
>>> DlibSQlite().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DlibSQlite().sniff(fname)
False
file_ext = 'dlib'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dlib_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.ElibSQlite(**kwd)[source]

Bases: SQlite

Class describing a Proteomics Chromatogram Library Sqlite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.elib')
>>> ElibSQlite().sniff(fname)
True
>>> fname = get_test_fname('test.dlib')
>>> ElibSQlite().sniff(fname)
False
file_ext = 'elib'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
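The DLIB and ELIB descriptions above say the two formats share a schema and differ in which tables hold rows. A rough sketch of that distinction follows, assuming that a non-empty “peptidequants” or “peptidescores” table is enough to call a file an ELIB. The function names are hypothetical and the rule is an illustration of the docstrings, not the logic Galaxy uses.

```python
import sqlite3

# Tables the docstrings name as populated only in ELIB files.
ELIB_ONLY_TABLES = ("peptidequants", "peptidescores")


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Count the rows of one table."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


def classify_library(path: str) -> str:
    """Return 'elib' if any ELIB-only table has rows, otherwise 'dlib'."""
    conn = sqlite3.connect(path)
    try:
        existing = {
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        for table in ELIB_ONLY_TABLES:
            if table in existing and count_rows(conn, table) > 0:
                return "elib"
        return "dlib"
    finally:
        conn.close()
```

This matches the doctest above in spirit: a DLIB test file would be rejected by an ELIB check because its ELIB-only tables are empty.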

class galaxy.datatypes.binary.IdpDB(**kwd)[source]

Bases: SQlite

Class describing an IDPicker 3 idpDB (sqlite) database

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.idpdb')
>>> IdpDB().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> IdpDB().sniff(fname)
False
file_ext = 'idpdb'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.GAFASQLite(**kwd)[source]

Bases: SQlite

Class describing a GAFA SQLite database

file_ext = 'gafa.sqlite'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'gafa_schema_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.NcbiTaxonomySQlite(**kwd)[source]

Bases: SQlite

Class describing the NCBI Taxonomy database stored in SQLite as done by rust-ncbitaxonomy

file_ext = 'ncbitaxonomy.sqlite'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'ncbitaxonomy_schema_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>, 'taxon_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Xlsx(**kwd)[source]

Bases: Binary

Class for Excel 2007 (xlsx) files

file_ext = 'xlsx'
compressed = True
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.ExcelXls(**kwd)[source]

Bases: Binary

Class describing an Excel (xls) file

file_ext = 'excel.xls'
edam_format = 'format_3468'
sniff_prefix(file_prefix: FilePrefix) bool[source]
get_mime() str[source]

Returns the mime type of the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.Sra(**kwd)[source]

Bases: Binary

Sequence Read Archive (SRA) datatype originally from mdshw5/sra-tools-galaxy

file_ext = 'sra'
sniff_prefix(file_prefix: FilePrefix) bool[source]

The first 8 bytes of any NCBI sra file are ‘NCBI.sra’, and the file is binary. For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.RData(**kwd)[source]

Bases: CompressedArchive

Generic R Data file datatype implementation, i.e. files generated with R’s save or save.image function. See https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml and https://cran.r-project.org/doc/manuals/r-patched/R-ints.html#Serialization-Formats

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.rdata')
>>> RData().sniff(fname)
True
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> dataset.get_file_name = lambda : fname
>>> dataset.has_data = lambda: True
>>> RData().set_meta(dataset)
>>> dataset.metadata.version
'3'
VERSION_2_PREFIX = b'RDX2\nX\n'
VERSION_3_PREFIX = b'RDX3\nX\n'
file_ext = 'rdata'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
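
The two prefix constants suggest how the version metadata shown in the doctest could be derived. The sketch below is a standalone illustration: rdata_version is a hypothetical helper, and it assumes gzip is the only compression that needs unwrapping, which is an assumption of this example and not a statement about RData.set_meta.

```python
import gzip
from typing import Optional

VERSION_2_PREFIX = b"RDX2\nX\n"
VERSION_3_PREFIX = b"RDX3\nX\n"


def rdata_version(raw: bytes) -> Optional[str]:
    """Return '2' or '3' when a known prefix is found, otherwise None."""
    # Assumption: only gzip-wrapped or uncompressed input is handled here.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    if raw.startswith(VERSION_2_PREFIX):
        return "2"
    if raw.startswith(VERSION_3_PREFIX):
        return "3"
    return None
```
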
class galaxy.datatypes.binary.RDS(**kwd)[source]

Bases: CompressedArchive

File using a serialized R object generated with R’s saveRDS function. See https://cran.r-project.org/doc/manuals/r-patched/R-ints.html#Serialization-Formats

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('int-r3.rds')
>>> RDS().sniff(fname)
True
>>> fname = get_test_fname('int-r4.rds')
>>> RDS().sniff(fname)
True
>>> fname = get_test_fname('int-r3-version2.rds')
>>> RDS().sniff(fname)
True
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> dataset.get_file_name = lambda : get_test_fname('int-r4.rds')
>>> dataset.has_data = lambda: True
>>> RDS().set_meta(dataset)
>>> dataset.metadata.version
'3'
>>> dataset.metadata.rversion
'4.1.1'
>>> dataset.metadata.minrversion
'3.5.0'
file_ext = 'rds'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'minrversion': <galaxy.model.metadata.MetadataElementSpec object>, 'rversion': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.OxliBinary(**kwd)[source]

Bases: Binary

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliCountGraph(**kwd)[source]

Bases: OxliBinary

OxliCountGraph starts with “OXLI” + one byte version number + 8-bit binary ‘1’. Test file generated via:

load-into-counting.py --n_tables 1 --max-tablesize 1 \
    oxli_countgraph.oxlicg khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliCountGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_countgraph.oxlicg")
>>> OxliCountGraph().sniff(fname)
True
file_ext = 'oxlicg'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliNodeGraph(**kwd)[source]

Bases: OxliBinary

OxliNodeGraph starts with “OXLI” + one byte version number + 8-bit binary ‘2’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliNodeGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_nodegraph.oxling")
>>> OxliNodeGraph().sniff(fname)
True
file_ext = 'oxling'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliTagSet(**kwd)[source]

Bases: OxliBinary

OxliTagSet starts with “OXLI” + one byte version number + 8-bit binary ‘3’. Test file generated via:

load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2;
mv oxli_nodegraph.oxling.tagset oxli_tagset.oxlits

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliTagSet().sniff(fname)
False
>>> fname = get_test_fname("oxli_tagset.oxlits")
>>> OxliTagSet().sniff(fname)
True
file_ext = 'oxlits'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliStopTags(**kwd)[source]

Bases: OxliBinary

OxliStopTags starts with “OXLI” + one byte version number + 8-bit binary ‘4’. Test file adapted from khmer 2.0’s “khmer/tests/test-data/goodversion-k32.stoptags”.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliStopTags().sniff(fname)
False
>>> fname = get_test_fname("oxli_stoptags.oxlist")
>>> OxliStopTags().sniff(fname)
True
file_ext = 'oxlist'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliSubset(**kwd)[source]

Bases: OxliBinary

OxliSubset starts with “OXLI” + one byte version number + 8-bit binary ‘5’. Test file generated via:

load-graph.py -k 20 example tests/test-data/random-20-a.fa;
partition-graph.py example;
mv example.subset.0.pmap oxli_subset.oxliss

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliSubset().sniff(fname)
False
>>> fname = get_test_fname("oxli_subset.oxliss")
>>> OxliSubset().sniff(fname)
True
file_ext = 'oxliss'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.OxliGraphLabels(**kwd)[source]

Bases: OxliBinary

OxliGraphLabels starts with “OXLI” + one byte version number + 8-bit binary ‘6’. Test file generated via:

python -c "from khmer import GraphLabels; \
    gl = GraphLabels(20, 1e7, 4); \
    gl.consume_fasta_and_tag_with_labels('tests/test-data/test-labels.fa'); \
    gl.save_labels_and_tags('oxli_graphlabels.oxligl')"

using khmer 2.0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliGraphLabels().sniff(fname)
False
>>> fname = get_test_fname("oxli_graphlabels.oxligl")
>>> OxliGraphLabels().sniff(fname)
True
file_ext = 'oxligl'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
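
The six Oxli classes share one header layout: the four bytes “OXLI”, one version byte, then one type byte from 1 to 6. A minimal sketch of reading that header follows. The function name oxli_type and the OXLI_TYPES mapping are hypothetical, and the sketch reads the type code as a plain byte value, which is how this example interprets "8-bit binary".

```python
from typing import Optional

# Type byte -> datatype class name, as listed in the class descriptions.
OXLI_TYPES = {
    1: "OxliCountGraph",
    2: "OxliNodeGraph",
    3: "OxliTagSet",
    4: "OxliStopTags",
    5: "OxliSubset",
    6: "OxliGraphLabels",
}


def oxli_type(header: bytes) -> Optional[str]:
    """Map the first six bytes of a file to an Oxli datatype name."""
    if len(header) < 6 or header[:4] != b"OXLI":
        return None
    # header[4] is the version byte; it is skipped in this sketch.
    return OXLI_TYPES.get(header[5])


print(oxli_type(b"OXLI\x04\x01"))  # OxliCountGraph
print(oxli_type(b">read1\nACGT"))  # None
```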

class galaxy.datatypes.binary.PostgresqlArchive(**kwd)[source]

Bases: CompressedArchive

Class describing a PostgreSQL database packed into a tar archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('postgresql_fake.tar.bz2')
>>> PostgresqlArchive().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> PostgresqlArchive().sniff(fname)
False
file_ext = 'postgresql'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MongoDBArchive(**kwd)[source]

Bases: CompressedArchive

Class describing a Mongo database packed into a tar archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('mongodb_fake.tar.bz2')
>>> MongoDBArchive().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> MongoDBArchive().sniff(fname)
False
file_ext = 'mongodb'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.GeneNoteBook(**kwd)[source]

Bases: MongoDBArchive

Class describing a bzip2-compressed GeneNoteBook archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('mongodb_fake.tar.bz2')
>>> GeneNoteBook().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> GeneNoteBook().sniff(fname)
False
file_ext = 'genenotebook'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Fast5Archive(**kwd)[source]

Bases: CompressedArchive

Class describing a FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5Archive().sniff(fname)
True
file_ext = 'fast5.tar'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Fast5ArchiveGz(**kwd)[source]

Bases: Fast5Archive

Class describing a gzip-compressed FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveGz().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.xz')
>>> Fast5ArchiveGz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveGz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveGz().sniff(fname)
False
file_ext = 'fast5.tar.gz'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Fast5ArchiveXz(**kwd)[source]

Bases: Fast5Archive

Class describing an xz-compressed FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveXz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar.xz')
>>> Fast5ArchiveXz().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveXz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveXz().sniff(fname)
False
file_ext = 'fast5.tar.xz'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Fast5ArchiveBz2(**kwd)[source]

Bases: Fast5Archive

Class describing a bzip2-compressed FAST5 archive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveBz2().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.xz')
>>> Fast5ArchiveBz2().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveBz2().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveBz2().sniff(fname)
False
file_ext = 'fast5.tar.bz2'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
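
The doctests for the three compressed FAST5 variants show each class accepting only its own compression. One way to make that distinction is to compare the leading magic bytes of the file. The sketch below is an assumption about the approach, not a copy of the Galaxy sniffers, which may also inspect the archive members; compression_of is a hypothetical helper.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading magic bytes of each compression container.
_MAGIC = (
    (b"\x1f\x8b", "gz"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bz2"),
)


def compression_of(prefix: bytes) -> Optional[str]:
    """Return 'gz', 'xz' or 'bz2' for a recognised prefix, otherwise None."""
    for magic, name in _MAGIC:
        if prefix.startswith(magic):
            return name
    return None


print(compression_of(gzip.compress(b"x")))  # gz
print(compression_of(lzma.compress(b"x")))  # xz
print(compression_of(bz2.compress(b"x")))  # bz2
```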

class galaxy.datatypes.binary.SearchGuiArchive(**kwd)[source]

Bases: CompressedArchive

Class describing a SearchGUI archive

file_ext = 'searchgui_archive'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'searchgui_major_version': <galaxy.model.metadata.MetadataElementSpec object>, 'searchgui_version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.NetCDF(**kwd)[source]

Bases: Binary

Binary data in netCDF format

file_ext = 'netcdf'
edam_format = 'format_3650'
edam_data = 'data_0943'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
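
No sniffing rule is documented for NetCDF. Classic-model netCDF files begin with the ASCII bytes CDF followed by a version byte, so a prefix check along the following lines is one possibility. The helper name and the variant labels are illustrative assumptions, and netCDF-4 files (which are HDF5 containers) are not covered by this sketch.

```python
from typing import Optional

# Version byte that follows the "CDF" marker in classic-model files.
_VARIANTS = {1: "classic", 2: "64-bit offset", 5: "64-bit data (CDF-5)"}


def netcdf_classic_variant(prefix: bytes) -> Optional[str]:
    """Name the classic netCDF variant for a byte prefix, or return None."""
    if len(prefix) < 4 or not prefix.startswith(b"CDF"):
        return None
    return _VARIANTS.get(prefix[3])
```
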
class galaxy.datatypes.binary.Dcd(**kwd)[source]

Bases: Binary

Class describing a dcd file from the CHARMM molecular simulation program

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_glucose_vacuum.dcd')
>>> Dcd().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Dcd().sniff(fname)
False
file_ext = 'dcd'
edam_data = 'data_3842'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Vel(**kwd)[source]

Bases: Binary

Class describing a velocity file from the CHARMM molecular simulation program

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_charmm.vel')
>>> Vel().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Vel().sniff(fname)
False
file_ext = 'vel'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.DAA(**kwd)[source]

Bases: Binary

Class describing a DAA (diamond alignment archive) file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.daa')
>>> DAA().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DAA().sniff(fname)
False
file_ext = 'daa'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.RMA6(**kwd)[source]

Bases: Binary

Class describing an RMA6 (MEGAN6 read-match archive) file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.rma6')
>>> RMA6().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> RMA6().sniff(fname)
False
file_ext = 'rma6'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.DMND(**kwd)[source]

Bases: Binary

Class describing a DMND file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond_db.dmnd')
>>> DMND().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DMND().sniff(fname)
False
file_ext = 'dmnd'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.ICM(**kwd)[source]

Bases: Binary

Class describing an ICM (interpolated context model) file, used by Glimmer

file_ext = 'icm'
edam_data = 'data_0950'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Parquet(**kwd)[source]

Bases: Binary

Class describing Apache Parquet file (https://parquet.apache.org/)

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('example.parquet')
>>> Parquet().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Parquet().sniff(fname)
False
file_ext = 'parquet'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
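
Parquet files carry the four-byte marker PAR1 at both the start and the end of the file, which makes a prefix-based check straightforward. The sketch below illustrates that idea on raw bytes. looks_like_parquet and its check_footer parameter are hypothetical names, and whether Galaxy's sniffer also verifies the footer is not stated in this listing.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes, check_footer: bool = False) -> bool:
    """Check the leading (and optionally trailing) Parquet marker."""
    if not data.startswith(PARQUET_MAGIC):
        return False
    if check_footer:
        # Only meaningful when `data` holds the whole file.
        return len(data) >= 8 and data.endswith(PARQUET_MAGIC)
    return True
```
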
class galaxy.datatypes.binary.BafTar(**kwd)[source]

Bases: CompressedArchive

Base class for common behavior of tar files of directory-based raw file formats

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> BafTar().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> BafTar().sniff(fname)
False
edam_data = 'data_2536'
edam_format = 'format_3712'
file_ext = 'brukerbaf.d.tar'
get_signature_file() str[source]
sniff(filename: str) bool[source]
get_type() str[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
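
BafTar and its subclasses expose get_signature_file(), which suggests that sniffing means looking for one characteristic member inside the tar archive. A standalone sketch of that pattern follows. The helper tar_has_signature is hypothetical, and the signature name analysis.baf used in the usage note is an assumption chosen for illustration, since this listing does not show the value each class returns.

```python
import io
import posixpath
import tarfile


def tar_has_signature(data: bytes, signature: str) -> bool:
    """Return True if an uncompressed tar holds a member named `signature`."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as archive:
            return any(
                posixpath.basename(member.name) == signature
                for member in archive.getmembers()
            )
    except tarfile.TarError:
        # Not a readable tar archive.
        return False
```

With this shape, a subclass only has to supply a different signature name, for example tar_has_signature(data, "analysis.baf") under the assumption above.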

class galaxy.datatypes.binary.YepTar(**kwd)[source]

Bases: BafTar

A tar’d up .d directory containing Agilent/Bruker YEP format data

file_ext = 'agilentbrukeryep.d.tar'
get_signature_file() str[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.TdfTar(**kwd)[source]

Bases: BafTar

A tar’d up .d directory containing Bruker TDF format data

file_ext = 'brukertdf.d.tar'
get_signature_file() str[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MassHunterTar(**kwd)[source]

Bases: BafTar

A tar’d up .d directory containing Agilent MassHunter format data

file_ext = 'agilentmasshunter.d.tar'
get_signature_file() str[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.MassLynxTar(**kwd)[source]

Bases: BafTar

A tar’d up .d directory containing Waters MassLynx format data

file_ext = 'watersmasslynx.raw.tar'
get_signature_file() str[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.WiffTar(**kwd)[source]

Bases: BafTar

A tar’d up .wiff/.scan pair containing Sciex WIFF format data

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('some.wiff.tar')
>>> WiffTar().sniff(fname)
True
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> WiffTar().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> WiffTar().sniff(fname)
False
file_ext = 'wiff.tar'
sniff(filename: str) bool[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Wiff2Tar(**kwd)[source]

Bases: BafTar

A tar’d up .wiff2/.scan pair containing Sciex WIFF format data

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('some.wiff2.tar')
>>> Wiff2Tar().sniff(fname)
True
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> Wiff2Tar().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Wiff2Tar().sniff(fname)
False
file_ext = 'wiff2.tar'
sniff(filename: str) bool[source]
get_type() str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.Pretext(**kwd)[source]

Bases: Binary

PretextMap contact map file. Try to guess if the file is a Pretext file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sample.pretext')
>>> Pretext().sniff(fname)
True
file_ext = 'pretext'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.binary.JP2(**kwd)[source]

Bases: Binary

JPEG 2000 binary image format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.jp2')
>>> JP2().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> JP2().sniff(fname)
False
file_ext = 'jp2'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
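
A JPEG 2000 (JP2) file opens with a fixed 12-byte signature box, so the kind of check JP2.sniff performs can be illustrated with a plain byte comparison. looks_like_jp2 is a hypothetical helper, and the sketch does not claim to mirror the Galaxy implementation line for line.

```python
# The 12-byte JP2 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def looks_like_jp2(prefix: bytes) -> bool:
    """Return True if the prefix begins with the JP2 signature box."""
    return prefix[:12] == JP2_SIGNATURE
```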

class galaxy.datatypes.binary.Npz(**kwd)[source]

Bases: CompressedArchive

Class describing a NumPy NPZ file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('hexrd.images.npz')
>>> Npz().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Npz().sniff(fname)
False
file_ext = 'npz'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'files': <galaxy.model.metadata.MetadataElementSpec object>, 'nfiles': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
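
The files and nfiles metadata fields suggest that an NPZ dataset is described by the arrays it contains. An .npz file is a ZIP archive whose members are .npy files, so the member list can be read with the standard library alone. The sketch below assumes that interpretation of the two fields; npz_members is a hypothetical helper.

```python
import io
import zipfile
from typing import List


def npz_members(data: bytes) -> List[str]:
    """List the .npy members of an NPZ archive, or [] if it is not a ZIP."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return []
    with zipfile.ZipFile(buffer) as archive:
        return [name for name in archive.namelist() if name.endswith(".npy")]
```

Under this reading, nfiles would simply be len(npz_members(data)).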

class galaxy.datatypes.binary.HexrdImagesNpz(**kwd)[source]

Bases: Npz

Class describing an HEXRD Images Numpy NPZ file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('hexrd.images.npz')
>>> HexrdImagesNpz().sniff(fname)
True
>>> fname = get_test_fname('eta_ome.npz')
>>> HexrdImagesNpz().sniff(fname)
False
file_ext = 'hexrd.images.npz'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'files': <galaxy.model.metadata.MetadataElementSpec object>, 'nfiles': <galaxy.model.metadata.MetadataElementSpec object>, 'nframes': <galaxy.model.metadata.MetadataElementSpec object>, 'omegas': <galaxy.model.metadata.MetadataElementSpec object>, 'panel_id': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.HexrdEtaOmeNpz(**kwd)[source]

Bases: Npz

Class describing an HEXRD Eta-Ome Numpy NPZ file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('hexrd.eta_ome.npz')
>>> HexrdEtaOmeNpz().sniff(fname)
True
>>> fname = get_test_fname('hexrd.images.npz')
>>> HexrdEtaOmeNpz().sniff(fname)
False
file_ext = 'hexrd.eta_ome.npz'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'HKLs': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'files': <galaxy.model.metadata.MetadataElementSpec object>, 'nfiles': <galaxy.model.metadata.MetadataElementSpec object>, 'nframes': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.binary.FITS(**kwd)[source]

Bases: Binary

FITS (Flexible Image Transport System) file data format, widely used in astronomy. Represents sky images (in celestial coordinates) and tables. See https://fits.gsfc.nasa.gov/fits_primer.html

file_ext = 'fits'
__init__(**kwd)[source]

Initialize the datatype

sniff(filename: str) bool[source]

Determines whether the file is a FITS file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fits')
>>> FITS().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> FITS().sniff(fname)
False

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'HDUs': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
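
A standard FITS file begins with an 80-character header card whose first 30 characters are the keyword SIMPLE, an equals sign in column 9 and the logical value T in column 30. A prefix test built on that fact is sketched below. looks_like_fits is a hypothetical helper, and the exact comparison Galaxy uses is not shown in this listing.

```python
# First 30 bytes of the mandatory opening card of a standard FITS file.
FITS_FIRST_CARD = b"SIMPLE  =" + b" " * 20 + b"T"


def looks_like_fits(prefix: bytes) -> bool:
    """Return True if the prefix starts with the standard SIMPLE = T card."""
    return prefix[:30] == FITS_FIRST_CARD
```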

class galaxy.datatypes.binary.Numpy(**kwd)[source]

Bases: Binary

Class defining a numpy data file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.npy')
>>> Numpy().sniff(fname)
True
file_ext = 'npy'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version_str': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
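
The version_str metadata field matches the layout of the .npy header: the magic bytes \x93NUMPY are followed by one byte each for the major and minor format version. A minimal sketch of extracting that version from a byte prefix follows. npy_version is a hypothetical helper, and the "major.minor" string format is an assumption about how the field might be rendered.

```python
from typing import Optional

NPY_MAGIC = b"\x93NUMPY"


def npy_version(prefix: bytes) -> Optional[str]:
    """Return the .npy format version as 'major.minor', or None."""
    if len(prefix) < 8 or not prefix.startswith(NPY_MAGIC):
        return None
    return f"{prefix[6]}.{prefix[7]}"


print(npy_version(b"\x93NUMPY\x01\x00v\x00"))  # 1.0
```
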

galaxy.datatypes.blast module

NCBI BLAST datatypes.

Covers the blastxml format and the BLAST databases.

class galaxy.datatypes.blast.BlastXml(**kwd)[source]

Bases: GenericXml

NCBI Blast XML Output data

file_ext = 'blastxml'
edam_format = 'format_3331'
edam_data = 'data_0857'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is blastxml

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('tblastn_four_human_vs_rhodopsin.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> BlastXml().sniff(fname)
False
static merge(split_files: List[str], output_file: str) None[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
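
The doctests show BlastXml accepting BLAST XML output and rejecting an interval file, but the listing does not spell out the rule. One plausible text-based check is to require an XML declaration on the first line and a BlastOutput root element within the next few lines. The sketch below is an assumption made for illustration: looks_like_blastxml and max_lines are hypothetical, and the real sniff_prefix may be stricter.

```python
def looks_like_blastxml(text: str, max_lines: int = 5) -> bool:
    """Heuristically decide whether text looks like NCBI BLAST XML output."""
    lines = [line.strip() for line in text.splitlines()[:max_lines]]
    if not lines or not lines[0].startswith("<?xml"):
        return False
    # The root element is expected shortly after the declaration.
    return "<BlastOutput>" in lines[1:]
```
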
class galaxy.datatypes.blast.BlastNucDb(**kwd)[source]

Bases: _BlastDb

Class for nucleotide BLAST database files.

file_ext = 'blastdbn'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.BlastProtDb(**kwd)[source]

Bases: _BlastDb

Class for protein BLAST database files.

file_ext = 'blastdbp'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.BlastDomainDb(**kwd)[source]

Bases: _BlastDb

Class for domain BLAST database files.

file_ext = 'blastdbd'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.LastDb(**kwd)[source]

Bases: Data

Class for LAST database files.

file_ext = 'lastdb'
composite_type: str | None = 'basic'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.BlastNucDb5(**kwd)[source]

Bases: _BlastDb

Class for nucleotide BLAST database files.

file_ext = 'blastdbn5'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.BlastProtDb5(**kwd)[source]

Bases: _BlastDb

Class for protein BLAST database files.

file_ext = 'blastdbp5'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.blast.BlastDomainDb5(**kwd)[source]

Bases: _BlastDb

Class for domain BLAST database files.

file_ext = 'blastdbd5'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.checkers module

Module proxies galaxy.util.checkers for backward compatibility.

External datatypes may make use of these functions.

galaxy.datatypes.checkers.check_binary(name, file_path: bool = True) bool[source]
galaxy.datatypes.checkers.check_bz2(file_path: str, check_content: bool = True) Tuple[bool, bool][source]
galaxy.datatypes.checkers.check_gzip(file_path: str, check_content: bool = True) Tuple[bool, bool][source]
galaxy.datatypes.checkers.check_html(name, file_path: bool = True) bool[source]

Returns True if the file/string contains HTML code.

galaxy.datatypes.checkers.check_image(file_path: str) bool[source]

Simple wrapper around image_type to yield a True/False verdict

galaxy.datatypes.checkers.check_zip(file_path: str, check_content: bool = True, files=1) Tuple[bool, bool][source]
galaxy.datatypes.checkers.is_gzip(file_path: str) bool[source]
galaxy.datatypes.checkers.is_bz2(file_path: str) bool[source]
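
One common way to implement checks such as is_gzip and is_bz2 is to compare the first bytes of a file against the format's magic number. The sketch below shows that technique with the standard library only; starts_with and demo_magic_checks are hypothetical helpers, and whether galaxy.util.checkers works exactly this way is an assumption.

```python
import bz2
import gzip
import os
import tempfile
from typing import Dict

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
BZ2_MAGIC = b"BZh"  # first three bytes of every bzip2 stream


def starts_with(file_path: str, magic: bytes) -> bool:
    """Return True if the file begins with the given magic bytes."""
    with open(file_path, "rb") as handle:
        return handle.read(len(magic)) == magic


def demo_magic_checks() -> Dict[str, bool]:
    """Write one gzip and one bz2 file, then test both magic numbers on each."""
    with tempfile.TemporaryDirectory() as tmp:
        gz_path = os.path.join(tmp, "sample.gz")
        bz_path = os.path.join(tmp, "sample.bz2")
        with gzip.open(gz_path, "wb") as handle:
            handle.write(b"hello\n")
        with bz2.open(bz_path, "wb") as handle:
            handle.write(b"hello\n")
        return {
            "gz_is_gzip": starts_with(gz_path, GZIP_MAGIC),
            "gz_is_bz2": starts_with(gz_path, BZ2_MAGIC),
            "bz_is_bz2": starts_with(bz_path, BZ2_MAGIC),
            "bz_is_gzip": starts_with(bz_path, GZIP_MAGIC),
        }
```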

galaxy.datatypes.chrominfo module

class galaxy.datatypes.chrominfo.ChromInfo(**kwd)[source]

Bases: Tabular

file_ext = 'len'
metadata_spec: MetadataSpecCollection = {'chrom': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'length': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
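
The 'chrom' and 'length' metadata entries suggest a two-column tabular layout for ChromInfo ('len') files. The sketch below reads such content under the assumption that each data line holds a chromosome name and an integer length separated by a TAB; parse_len_lines is a hypothetical helper, not part of Galaxy.

```python
from typing import Dict


def parse_len_lines(text: str) -> Dict[str, int]:
    """Map chromosome names to lengths, skipping blank and '#' comment lines."""
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: name in column 1, length in column 2, TAB-delimited.
        chrom, length = line.split("\t")[:2]
        sizes[chrom] = int(length)
    return sizes


example = "#chrom\tlength\nchr1\t248956422\nchrM\t16569\n"
```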

galaxy.datatypes.constructive_solid_geometry module

Constructive Solid Geometry file formats.

class galaxy.datatypes.constructive_solid_geometry.Ply(**kwd)[source]

Bases: object

The PLY format describes an object as a collection of vertices, faces and other elements, along with properties such as color and normal direction that can be attached to these elements. A PLY file contains the description of exactly one object.

subtype = ''
abstract __init__(**kwd)[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

The structure of a typical PLY file: Header, Vertex List, Face List, (lists of other elements)
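
To make the header part of that structure concrete, the sketch below parses a PLY header from a text prefix: the magic line "ply", a "format" line, and "element" declarations up to "end_header". parse_ply_header is a hypothetical helper written for this example, and it assumes the format line directly follows the magic line.

```python
from typing import Any, Dict, Optional

PLY_FORMATS = ("ascii", "binary_little_endian", "binary_big_endian")


def parse_ply_header(prefix: str) -> Optional[Dict[str, Any]]:
    """Return the file format and element counts, or None if not a PLY header."""
    lines = prefix.splitlines()
    if len(lines) < 2 or lines[0].strip() != "ply":
        return None
    parts = lines[1].split()
    if len(parts) != 3 or parts[0] != "format" or parts[1] not in PLY_FORMATS:
        return None
    counts: Dict[str, int] = {}
    for line in lines[2:]:
        fields = line.split()
        if fields[:1] == ["end_header"]:
            break
        # Element declarations look like: element <name> <count>
        if len(fields) == 3 and fields[0] == "element":
            counts[fields[1]] = int(fields[2])
    return {"file_format": parts[1], "elements": counts}


header = (
    "ply\n"
    "format ascii 1.0\n"
    "element vertex 8\n"
    "property float x\n"
    "element face 6\n"
    "property list uchar int vertex_index\n"
    "end_header\n"
)
```

Because the header is always ASCII, the same parsing applies to the ascii and binary subtypes; only the body after end_header differs.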

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]
display_peek(dataset: DatasetProtocol) str[source]
sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.PlyAscii(**kwd)[source]

Bases: Ply, Text

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.plyascii')
>>> PlyAscii().sniff(fname)
True
>>> fname = get_test_fname('test.vtkascii')
>>> PlyAscii().sniff(fname)
False
file_ext = 'plyascii'
subtype = 'ascii'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'face': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.PlyBinary(**kwd)[source]

Bases: Ply, Binary

file_ext = 'plybinary'
subtype = 'binary'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'face': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.Vtk(**kwd)[source]

Bases: object

The Visualization Toolkit provides a number of source and writer objects to read and write popular data file formats. The Visualization Toolkit also provides some of its own file formats.

There are two different styles of file formats available in VTK. The simplest are the legacy, serial formats that are easy to read and write either by hand or programmatically. However, these formats are less flexible than the XML based file formats which support random access, parallel I/O, and portable data compression and are preferred to the serial VTK file formats whenever possible.

All keyword phrases are written in ASCII form whether the file is binary or ASCII. The binary section of the file (if in binary form) is the data proper; i.e., the numbers that define points coordinates, scalars, cell indices, and so forth.

Binary data must be placed into the file immediately after the newline (’\n’) character from the previous ASCII keyword and parameter sequence.

TODO: only legacy formats are currently supported and support for XML formats should be added.

subtype = ''
abstract __init__(**kwd)[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

VTK files can be either ASCII or binary, with two different styles of file formats: legacy or XML. We’ll assume if the file contains a valid VTK header, then it is a valid VTK file.
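
For the legacy style, a header check could look like the sketch below: the first line carries the version marker, the second a free-form title, and the third the keyword ASCII or BINARY. parse_vtk_legacy_header is a hypothetical helper; it does not handle the XML formats and is not Galaxy's implementation.

```python
from typing import Dict, Optional

VTK_MARKER = "# vtk DataFile Version "


def parse_vtk_legacy_header(prefix: str) -> Optional[Dict[str, str]]:
    """Return version, title and file format, or None if the header is invalid."""
    lines = prefix.splitlines()
    if len(lines) < 3 or not lines[0].startswith(VTK_MARKER):
        return None
    file_format = lines[2].strip().upper()
    if file_format not in ("ASCII", "BINARY"):
        return None
    return {
        "vtk_version": lines[0][len(VTK_MARKER):].strip(),
        "title": lines[1].strip(),
        "file_format": file_format,
    }


vtk_prefix = "# vtk DataFile Version 4.0\nCube example\nASCII\nDATASET POLYDATA\n"
```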

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]
set_initial_metadata(i: int, line: str, dataset: DatasetProtocol) DatasetProtocol[source]
set_structure_metadata(line: str, dataset: DatasetProtocol, dataset_type: str | None) Tuple[DatasetProtocol, str | None][source]

The fourth part of legacy VTK files is the dataset structure. The geometry part describes the geometry and topology of the dataset. This part begins with a line containing the keyword DATASET followed by a keyword describing the type of dataset. Then, depending upon the type of dataset, other keyword/data combinations define the actual data.
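
A minimal sketch of recognizing that DATASET line is shown here. The helper name parse_dataset_line is an assumption for this example, and the set of accepted types lists the dataset keywords of the legacy VTK format.

```python
from typing import Optional

LEGACY_DATASET_TYPES = {
    "STRUCTURED_POINTS",
    "STRUCTURED_GRID",
    "UNSTRUCTURED_GRID",
    "POLYDATA",
    "RECTILINEAR_GRID",
    "FIELD",
}


def parse_dataset_line(line: str) -> Optional[str]:
    """Return the dataset type from a 'DATASET <type>' line, else None."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == "DATASET" and parts[1] in LEGACY_DATASET_TYPES:
        return parts[1]
    return None
```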

get_blurb(dataset: HasMetadata) str[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]
display_peek(dataset: DatasetProtocol) str[source]
sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.VtkAscii(**kwd)[source]

Bases: Vtk, Text

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.vtkascii')
>>> VtkAscii().sniff(fname)
True
>>> fname = get_test_fname('test.vtkbinary')
>>> VtkAscii().sniff(fname)
False
file_ext = 'vtkascii'
subtype = 'ASCII'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'lines': <galaxy.model.metadata.MetadataElementSpec object>, 'origin': <galaxy.model.metadata.MetadataElementSpec object>, 'points': <galaxy.model.metadata.MetadataElementSpec object>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.VtkBinary(**kwd)[source]

Bases: Vtk, Binary

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.vtkbinary')
>>> VtkBinary().sniff(fname)
True
>>> fname = get_test_fname('test.vtkascii')
>>> VtkBinary().sniff(fname)
False
file_ext = 'vtkbinary'
subtype = 'BINARY'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: metadata.MetadataSpecCollection = {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'lines': <galaxy.model.metadata.MetadataElementSpec object>, 'origin': <galaxy.model.metadata.MetadataElementSpec object>, 'points': <galaxy.model.metadata.MetadataElementSpec object>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.STL(**kwd)[source]

Bases: Data

file_ext = 'stl'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.NeperTess(**kwd)[source]

Bases: Text

Neper Tessellation File

Example:

***tess
**format
    format
**general
    dim type
**cell
    number_of_cells
file_ext = 'neper.tess'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Neper tess format, starts with ***tess

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.neper.tess')
>>> NeperTess().sniff(fname)
True
>>> fname = get_test_fname('test.neper.tesr')
>>> NeperTess().sniff(fname)
False
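
Since the tess and tesr formats differ in their first line, a prefix check can tell them apart cheaply. The sketch below assumes the marker is the entire first line; neper_kind is a hypothetical helper, not a Galaxy function.

```python
from typing import Optional


def neper_kind(prefix: str) -> Optional[str]:
    """Classify a prefix as 'tess', 'tesr', or None based on its first line."""
    first_line = prefix.split("\n", 1)[0].strip()
    if first_line == "***tess":
        return "tess"
    if first_line == "***tesr":
        return "tesr"
    return None
```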
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimension': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.NeperTesr(**kwd)[source]

Bases: Binary

Neper Raster Tessellation File

Example:

***tesr
**format
    format
**general
    dimension
    size_x size_y [size_z]
    voxsize_x voxsize_y [voxsize_z]
[*origin
    origin_x origin_y [origin_z]]
[*hasvoid has_void]
[**cell
    number_of_cells
file_ext = 'neper.tesr'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Neper tesr format, starts with ***tesr

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.neper.tesr')
>>> NeperTesr().sniff(fname)
True
>>> fname = get_test_fname('test.neper.tess')
>>> NeperTesr().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimension': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'origin': <galaxy.model.metadata.MetadataElementSpec object>, 'size': <galaxy.model.metadata.MetadataElementSpec object>, 'voxsize': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.NeperPoints(**kwd)[source]

Bases: Text

Neper Position File. The Neper position format has 1 to 3 floats per line, separated by whitespace.
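
The line format described above can be checked with a few lines of Python. In this sketch, points_dimension is a hypothetical helper; skipping blank lines and requiring the same number of values on every line are assumptions of the example rather than documented Galaxy behavior.

```python
from typing import Optional


def points_dimension(text: str) -> Optional[int]:
    """Return the number of floats per line (1 to 3), or None if invalid."""
    dimension: Optional[int] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if not 1 <= len(fields) <= 3:
            return None
        try:
            for field in fields:
                float(field)
        except ValueError:
            return None
        if dimension is None:
            dimension = len(fields)
        elif dimension != len(fields):
            return None  # assumption: every line has the same dimension
    return dimension
```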

file_ext = 'neper.points'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimension': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.NeperPointsTabular(**kwd)[source]

Bases: NeperPoints, Tabular

Neper Position File. The Neper position format has 1 to 3 floats per line, separated by TABs.

file_ext = 'neper.points.tsv'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset.

A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header.

A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'dimension': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.NeperMultiScaleCell(**kwd)[source]

Bases: Text

Neper Multiscale Cell File

file_ext = 'neper.mscell'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.GmshMsh(**kwd)[source]

Bases: Binary

Gmsh Mesh File

file_ext = 'gmsh.msh'
is_binary: bool | typing_extensions.Literal[maybe] = 'maybe'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Gmsh msh format, starts with $MeshFormat

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gmsh.msh')
>>> GmshMsh().sniff(fname)
True
>>> fname = get_test_fname('test.neper.tesr')
>>> GmshMsh().sniff(fname)
False
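
In the MSH format, the line after $MeshFormat holds three fields: the version, the file type (0 for ASCII, 1 for binary), and the data size. The sketch below reads those fields from a text prefix; parse_msh_header is a hypothetical helper, and the mapping onto the 'version' and 'format' metadata names is an assumption.

```python
from typing import Dict, Optional


def parse_msh_header(prefix: str) -> Optional[Dict[str, str]]:
    """Return version and format from a Gmsh MSH header, or None if absent."""
    lines = prefix.splitlines()
    if len(lines) < 2 or lines[0].strip() != "$MeshFormat":
        return None
    fields = lines[1].split()
    if len(fields) != 3:
        return None
    version, file_type, _data_size = fields
    return {"version": version, "format": "ASCII" if file_type == "0" else "binary"}
```

The header line is plain text in both variants, which is consistent with the datatype declaring is_binary as 'maybe'.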
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.constructive_solid_geometry.GmshGeo(**kwd)[source]

Bases: Text

Gmsh geometry File

file_ext = 'gmsh.geo'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.constructive_solid_geometry.ZsetGeof(**kwd)[source]

Bases: Text

Z-set geof File

file_ext = 'zset.geof'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.constructive_solid_geometry.get_next_line(fh)[source]

galaxy.datatypes.coverage module

Coverage datatypes

class galaxy.datatypes.coverage.LastzCoverage(**kwd)[source]

Bases: Tabular

file_ext = 'coverage'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'forwardCol': <galaxy.model.metadata.MetadataElementSpec object>, 'positionCol': <galaxy.model.metadata.MetadataElementSpec object>, 'reverseCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.data module

exception galaxy.datatypes.data.DatatypeConverterNotFoundException[source]

Bases: Exception

class galaxy.datatypes.data.DatatypeValidation(state: str, message: str)[source]

Bases: object

__init__(state: str, message: str) None[source]
static validated() DatatypeValidation[source]
static invalid(message: str) DatatypeValidation[source]
static unvalidated() DatatypeValidation[source]
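
The three static constructors follow a familiar pattern: a small result object carrying a state and a message, with one factory per outcome. The stand-in class below illustrates that pattern only; the name ValidationResult and the state strings and messages are assumptions, not the values Galaxy uses.

```python
class ValidationResult:
    """Illustrative stand-in for a (state, message) validation outcome."""

    def __init__(self, state: str, message: str) -> None:
        self.state = state
        self.message = message

    @staticmethod
    def validated() -> "ValidationResult":
        # Hypothetical state string for a dataset that passed validation.
        return ValidationResult("ok", "Dataset validated.")

    @staticmethod
    def invalid(message: str) -> "ValidationResult":
        # The caller supplies the reason the dataset failed validation.
        return ValidationResult("invalid", message)

    @staticmethod
    def unvalidated() -> "ValidationResult":
        # Hypothetical state for datatypes that implement no validation.
        return ValidationResult("unknown", "No validation implemented.")
```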
galaxy.datatypes.data.validate(dataset_instance: DatasetProtocol) DatatypeValidation[source]
galaxy.datatypes.data.get_params_and_input_name(converter, deps: Dict | None, target_context: Dict | None = None) Tuple[Dict, str][source]
class galaxy.datatypes.data.DataMeta(name, bases, dict_)[source]

Bases: type

Metaclass for Data class. Sets up metadata spec.

__init__(name, bases, dict_)[source]
class galaxy.datatypes.data.Data(**kwd)[source]

Bases: object

Base class for all datatypes. Implements basic interfaces as well as class methods for metadata.

>>> class DataTest( Data ):
...     MetadataElement( name="test" )
...
>>> DataTest.metadata_spec.test.name
'test'
>>> DataTest.metadata_spec.test.desc
'test'
>>> type( DataTest.metadata_spec.test.param )
<class 'galaxy.model.metadata.MetadataParameter'>
edam_data = 'data_0006'
edam_format = 'format_1915'
file_ext = 'data'
CHUNKABLE = False
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

is_binary: bool | typing_extensions.Literal[maybe] = True
composite_type: str | None = None
primary_file_name = 'index'
allow_datatype_change: bool | None = None
track_type: str | None = None
data_sources: Dict[str, str] = {}
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>}
__init__(**kwd)[source]

Initialize the datatype

supported_display_apps: Dict[str, Any] = {}
composite_files: Dict[str, Any] = {}
classmethod is_datatype_change_allowed() bool[source]

Returns the value of the allow_datatype_change class attribute if set in a subclass, or True iff the datatype is not composite.
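
The rule described above can be sketched with a few stand-in classes. DemoDatatype and its subclasses are hypothetical and exist only to show the fallback from an explicit allow_datatype_change value to the composite_type check.

```python
from typing import Optional


class DemoDatatype:
    composite_type: Optional[str] = None
    allow_datatype_change: Optional[bool] = None

    @classmethod
    def is_datatype_change_allowed(cls) -> bool:
        # An explicit class attribute always wins.
        if cls.allow_datatype_change is not None:
            return cls.allow_datatype_change
        # Otherwise only non-composite datatypes may change type.
        return cls.composite_type is None


class DemoComposite(DemoDatatype):
    composite_type = "basic"


class DemoPinned(DemoDatatype):
    allow_datatype_change = False


class DemoCompositeAllowed(DemoComposite):
    allow_datatype_change = True
```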

dataset_content_needs_grooming(file_name: str) bool[source]

This function is called on an output dataset file after the content is initially generated.

groom_dataset_content(file_name: str) None[source]

This function is called on an output dataset file if dataset_content_needs_grooming returns True.

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, *, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

missing_meta(dataset: HasMetadata, check: List | None = None, skip: List | None = None) bool[source]

Checks for empty metadata values. Returns False if no non-optional metadata is missing, and the missing metadata key otherwise. Specifying a list of ‘check’ values will only check those names provided; when used, optionality is ignored. Specifying a list of ‘skip’ items will return True even when a named metadata value is missing; when used, optionality is ignored.

set_max_optional_metadata_filesize(max_value: int) None[source]
get_max_optional_metadata_filesize() int[source]
property max_optional_metadata_filesize: int
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

to_archive(dataset: DatasetProtocol, name: str = '') Iterable[source]

Collect archive paths and file handles that need to be exported when archiving dataset.

Parameters:
  • dataset – HistoryDatasetAssociation

  • name – archive name; in a collection context it corresponds to the collection name(s) and element_identifier, joined by ‘/’, e.g. ‘fastq_collection/sample1/forward’

display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful when overriding this method, because this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

display_as_markdown(dataset_instance: DatasetProtocol) str[source]

Prepare for embedding dataset into a basic Markdown document.

This is a somewhat experimental interface and should not be implemented on datatypes not tightly tied to a Galaxy version (e.g. datatypes in the Tool Shed).

Speaking very loosely - the datatype should load a bounded amount of data from the supplied dataset instance and prepare for embedding it into Markdown. This should be relatively vanilla Markdown - the result of this is bleached and it should not contain nested Galaxy Markdown directives.

If the data cannot reasonably be displayed, just indicate this and do not throw an exception.

display_name(dataset: HasName) str[source]

Returns formatted html of dataset name

display_info(dataset: HasInfo) str[source]

Returns formatted html of dataset info

get_mime() str[source]

Returns the mime type of the datatype

add_display_app(app_id: str, label: str, file_function: str, links_function: str) None[source]

Adds a display app to the datatype.

Parameters:
  • app_id – a unique id

  • label – the primary display label, e.g., display at ‘UCSC’

  • file_function – a string containing the name of the function that returns a properly formatted display

  • links_function – a string containing the name of the function that returns a list of (link_name, link)

remove_display_app(app_id: str) None[source]

Removes a display app from the datatype

clear_display_apps() None[source]
add_display_application(display_application: DisplayApplication) None[source]

New style display applications

get_display_application(key: str, default: DisplayApplication | None = None) DisplayApplication[source]
get_display_applications_by_dataset(dataset: DatasetProtocol, trans) Dict[str, DisplayApplication][source]
get_display_types() List[str][source]

Returns display types available

get_display_label(type: str) str[source]

Returns primary label for display app

as_display_type(dataset: DatasetProtocol, type: str, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]

Returns modified file contents for a particular display type

Returns a list of tuples of (name, link) for a particular display type. No check on ‘access’ permissions is done here - if you can view the dataset, you can also save it or send it to a destination outside of Galaxy, so Galaxy security restrictions do not apply anyway.

get_converter_types(original_dataset: HasExt, datatypes_registry: Registry) Dict[str, Dict][source]

Returns available converters by type for this dataset

find_conversion_destination(dataset: DatasetProtocol, accepted_formats: List[str], datatypes_registry, **kwd) Tuple[bool, str | None, Any][source]

Returns ( direct_match, converted_ext, existing converted dataset )

convert_dataset(trans, original_dataset: DatasetHasHidProtocol, target_type: str, return_output: bool = False, visible: bool = True, deps: Dict | None = None, target_context: Dict | None = None, history=None)[source]

This function adds a job to the queue to convert a dataset to another type. Returns a message about success/failure.

after_setting_metadata(dataset: HasClearAssociatedFiles) None[source]

This function is called on the dataset after metadata is set.

before_setting_metadata(dataset: HasClearAssociatedFiles) None[source]

This function is called on the dataset before metadata is set.

add_composite_file(name: str, **kwds) None[source]
property writable_files
get_writable_files_for_dataset(dataset: HasMetadata | None) Dict[source]
get_composite_files(dataset: HasMetadata | None = None)[source]
generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
property has_resolution
matches_any(target_datatypes: List[Any]) bool[source]

Check if this datatype is of any of the target_datatypes or is a subtype thereof.
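
A subtype check of this kind maps naturally onto isinstance with a tuple of classes. The sketch below uses hypothetical stand-in classes and assumes that targets may be given either as classes or as instances.

```python
from inspect import isclass
from typing import Any, List


class SketchData:
    def matches_any(self, target_datatypes: List[Any]) -> bool:
        # Normalize each target to a class, then rely on isinstance,
        # which also accepts subclasses of the listed classes.
        target_classes = tuple(t if isclass(t) else type(t) for t in target_datatypes)
        return isinstance(self, target_classes)


class SketchText(SketchData):
    pass


class SketchTabular(SketchText):
    pass


class SketchBinary(SketchData):
    pass
```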

static merge(split_files: List[str], output_file: str) None[source]

Merges files with shutil.copyfileobj(), which avoids the maximum argument limitation of cat. gz and bz2 files are also supported.
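
A minimal sketch of this kind of merge, using shutil.copyfileobj from the standard library, is shown below. merge_files and demo_merge_gzip are hypothetical helpers; the demo relies on the fact that concatenated gzip members form a valid multi-member gzip file.

```python
import gzip
import os
import shutil
import tempfile
from typing import List


def merge_files(split_files: List[str], output_file: str) -> None:
    """Append each input file to output_file as raw bytes, in order."""
    with open(output_file, "wb") as out_handle:
        for path in split_files:
            with open(path, "rb") as in_handle:
                # Streams in chunks, so file size and file count are not a concern.
                shutil.copyfileobj(in_handle, out_handle)


def demo_merge_gzip() -> bytes:
    """Merge two small gzip files and return the decompressed result."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for index, payload in enumerate([b"first\n", b"second\n"]):
            part = os.path.join(tmp, f"part{index}.gz")
            with gzip.open(part, "wb") as handle:
                handle.write(payload)
            parts.append(part)
        merged = os.path.join(tmp, "merged.gz")
        merge_files(parts, merged)
        with gzip.open(merged, "rb") as handle:
            return handle.read()
```

Opening the files in Python rather than passing them to a shell command means the number of inputs is not bounded by the command-line length limit.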

has_dataprovider(data_format: str) bool[source]

Returns True if data_format is available in dataproviders.

dataprovider(dataset: DatasetProtocol, data_format: str, **settings)[source]

Base dataprovider factory for all datatypes that returns the proper provider for the given data_format or raises a NoProviderAvailable.

validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]
base_dataprovider(dataset: DatasetProtocol, **settings) DataProvider[source]
chunk_dataprovider(dataset: DatasetProtocol, **settings) ChunkDataProvider[source]
chunk64_dataprovider(dataset: DatasetProtocol, **settings) Base64ChunkDataProvider[source]
handle_dataset_as_image(hda: DatasetProtocol) str[source]
class galaxy.datatypes.data.Text(**kwd)[source]

Bases: Data

edam_format = 'format_2330'
file_ext = 'txt'
line_class = 'line'
is_binary: bool | typing_extensions.Literal[maybe] = False
get_mime() str[source]

Returns the mime type of the datatype

set_meta(dataset: DatasetProtocol, *, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

estimate_file_lines(dataset: DatasetProtocol) int | None[source]

Perform a rough estimate by extrapolating number of lines from a small read.
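
One way to extrapolate a line count from a small read is to count newlines in the first bytes and scale by the file size. The sketch below does that; estimate_line_count, the 1024-byte sample size, and demo_estimate are assumptions for this example rather than Galaxy's actual parameters.

```python
import os
import tempfile
from typing import Optional


def estimate_line_count(path: str, sample_size: int = 1024) -> Optional[int]:
    """Estimate the number of lines from the newline density of a small sample."""
    file_size = os.path.getsize(path)
    if file_size == 0:
        return 0
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    newlines = sample.count(b"\n")
    if newlines == 0:
        return None  # no basis for an estimate
    return int(newlines * file_size / len(sample))


def demo_estimate() -> Optional[int]:
    """Estimate the line count of a file with 1000 ten-byte lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lines.txt")
        with open(path, "wb") as handle:
            handle.write(b"abcdefghi\n" * 1000)
        return estimate_line_count(path)
```

The estimate is only as good as the assumption that line lengths in the sample are representative of the rest of the file.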

count_data_lines(dataset: HasFileName) int | None[source]

Count the number of lines of data in dataset, skipping all blank lines and comments.
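
The counting rule can be sketched as follows. This version works on a string instead of a dataset and assumes that comment lines start with '#'; both choices are simplifications made for the example.

```python
def count_data_lines(text: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


content = "# header\n\nchr1\t1\n   \nchr2\t2\n"
```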

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by line.

line_dataprovider(dataset: DatasetProtocol, **settings) FilteredLineDataProvider[source]

Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.

regex_line_dataprovider(dataset: DatasetProtocol, **settings) RegexLineDataProvider[source]

Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
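
The include/exclude behavior can be pictured as a generator that yields a line when its match status differs from an invert flag. In the sketch below, regex_lines and its parameter names regex_list and invert are assumptions for illustration and may not match the provider's real settings.

```python
import re
from typing import Iterable, Iterator, List, Optional


def regex_lines(
    lines: Iterable[str],
    regex_list: Optional[List[str]] = None,
    invert: bool = False,
) -> Iterator[str]:
    """Yield lines matching any pattern, or the non-matching ones if invert is set."""
    patterns = [re.compile(regex) for regex in (regex_list or [])]
    for line in lines:
        if not patterns:
            yield line  # no filters configured: pass everything through
            continue
        matched = any(pattern.search(line) for pattern in patterns)
        if matched != invert:
            yield line


rows = ["chr1 a", "chr2 b", "scaffold c"]
```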

dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.data.Directory(**kwd)[source]

Bases: Data

Class representing a directory of files.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.data.GenericAsn1(**kwd)[source]

Bases: Text

Class for generic ASN.1 text format

edam_data = 'data_0849'
edam_format = 'format_1966'
file_ext = 'asn1'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.data.LineCount(**kwd)[source]

Bases: Text

Dataset contains a single line with a single integer that denotes the line count for a related dataset. Used for custom builds.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.data.Newick(**kwd)[source]

Bases: Text

New Hampshire/Newick Format

edam_data = 'data_0872'
edam_format = 'format_1910'
file_ext = 'newick'
sniff(filename: str) bool[source]

Always returns False, because the Newick format is too general to be sniffed.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.data.Nexus(**kwd)[source]

Bases: Text

Nexus format as used by PAUP, MrBayes, etc.

edam_data = 'data_0872'
edam_format = 'format_1912'
file_ext = 'nex'
sniff_prefix(file_prefix: FilePrefix) bool[source]

All Nexus files simply put ‘#NEXUS’ on their first line.
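
A minimal sketch of such a check, assuming the start of the file is already available as a string. The helper name looks_like_nexus is hypothetical and does not use Galaxy's FilePrefix API.

```python
def looks_like_nexus(prefix_text):
    """Return True if the text starts with the '#NEXUS' marker.

    Hypothetical stand-in for a prefix-based sniffer.
    """
    return prefix_text.startswith("#NEXUS")
```

Only the first few characters are needed, which is why a prefix-based sniffer suits this format.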

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
galaxy.datatypes.data.get_test_fname(fname)[source]

Returns test data filename

galaxy.datatypes.data.get_file_peek(file_name, width=256, line_count=5, skipchars=None, line_wrap=True)[source]

Returns the first line_count lines wrapped to width.

>>> def assert_peek_is(file_name, expected, *args, **kwd):
...     path = get_test_fname(file_name)
...     peek = get_file_peek(path, *args, **kwd)
...     assert peek == expected, "%s != %s" % (peek, expected)
>>> assert_peek_is('0_nonewline', u'0')
>>> assert_peek_is('0.txt', u'0\n')
>>> assert_peek_is('4.bed', u'chr22\t30128507\t31828507\tuc003bnx.1_cds_2_0_chr22_29227_f\t0\t+\n', line_count=1)
>>> assert_peek_is('1.bed', u'chr1\t147962192\t147962580\tCCDS989.1_cds_0_0_chr1_147962193_r\t0\t-\nchr1\t147984545\t147984630\tCCDS990.1_cds_0_0_chr1_147984546_f\t0\t+\n', line_count=2)

galaxy.datatypes.flow module

Flow analysis datatypes.

class galaxy.datatypes.flow.FCS(**kwd)[source]

Bases: Binary

Class describing an FCS binary file

file_ext = 'fcs'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checking if the file is in FCS format. Should read FCS2.0, FCS3.0 and FCS3.1

Based on flowcore: https://github.com/RGLab/flowCore/blob/27141b792ad65ae8bd0aeeef26e757c39cdaefe7/R/IO.R#L667
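
As an illustration, an FCS file begins with a six-byte version identifier, so a check for the three supported versions might look like the sketch below. The helper name looks_like_fcs is hypothetical, and this is an assumption about the approach rather than a copy of the sniffer.

```python
# Version identifiers named in the docstring above.
FCS_VERSIONS = (b"FCS2.0", b"FCS3.0", b"FCS3.1")


def looks_like_fcs(prefix):
    """Return True if the first six bytes are a supported FCS version string."""
    return prefix[:6] in FCS_VERSIONS
```

The prefix must be read in binary mode, since FCS is a binary format.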

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.genetics module

rgenetics datatypes. Use at your peril. Ross Lazarus, for the rgenetics and galaxy projects.

Genome graphs datatypes are derived from Interval datatypes. Genome graphs datasets have a header row with appropriate column names. The first column is always the marker, e.g. column name = rs, first row = rs12345 if the rows are SNPs. Subsequent row values are all numeric! Validation will fail on any non-numeric value (e.g. ‘+’ or ‘NA’). Ross Lazarus for rgenetics, August 20 2007.

class galaxy.datatypes.genetics.GenomeGraphs(**kwd)[source]

Bases: Tabular

Tab delimited data containing a marker id and any number of numeric values

file_ext = 'gg'
__init__(**kwd)[source]

Initialize gg datatype, by adding UCSC display apps

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

as_ucsc_display_file(dataset: DatasetProtocol, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]

Returns file

From the ever-helpful Angie Hinrichs (angie@soe.ucsc.edu): a genome graphs call looks like this:

http://genome.ucsc.edu/cgi-bin/hgGenome?clade=mammal&org=Human&db=hg18&hgGenome_dataSetName=dname &hgGenome_dataSetDescription=test&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess &hgGenome_columnLabels=best%20guess&hgGenome_maxVal=&hgGenome_labelVals= &hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http://galaxy.esphealth.org/datasets/333/display/index &hgGenome_doSubmitUpload=submit

Galaxy gives this for an interval file

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr1:1-1000&hgt.customText= http%3A%2F%2Fgalaxy.esphealth.org%2Fdisplay_as%3Fid%3D339%26display_app%3Ducsc

make_html_table(dataset: DatasetProtocol, **kwargs) str[source]

Create HTML table, used for displaying peek

validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]

Validate a gg file - all numeric after header row
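
The rule "all numeric after header row" can be sketched as below. This is not the body of validate(); non_numeric_cells is a hypothetical helper that assumes tab-delimited input with the marker id in the first column.

```python
def non_numeric_cells(lines):
    """Return (row, column, value) for every non-numeric value after the marker column.

    Row 0 is treated as the header and skipped.
    """
    problems = []
    for row_number, line in enumerate(lines):
        if row_number == 0:
            continue  # header row holds column names, not values
        fields = line.rstrip("\n").split("\t")
        # fields[0] is the marker id; every later field must parse as a number.
        for col_number, value in enumerate(fields[1:], start=1):
            try:
                float(value)
            except ValueError:
                problems.append((row_number, col_number, value))
    return problems
```

An empty result means the file passes this check; values such as ‘+’ or ‘NA’ are reported with their position.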

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in gg format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> GenomeGraphs().sniff( fname )
False
>>> fname = get_test_fname( '1.gg' )
>>> GenomeGraphs().sniff( fname )
True
get_mime() str[source]

Returns the mime type of the datatype

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'markerCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.genetics.rgTabList(**kwd)[source]

Bases: Tabular

For sampleid and featureid lists of exclusions or inclusions in the clean tool. featureid subsets on statistical criteria -> specialized display such as gg.

file_ext = 'rgTList'
__init__(**kwd)[source]

Initialize featurelist datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

get_mime() str[source]

Returns the mime type of the datatype

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.rgSampleList(**kwd)[source]

Bases: rgTabList

For sampleid exclusions or inclusions in the clean tool. Output from QC, e.g. excess het, gender error, IBD pair member, eigen outlier, excess Mendel errors, … Since they can be uploaded, they should be flexible, but they are persistent at least. Same infrastructure for expression?

file_ext = 'rgSList'
__init__(**kwd)[source]

Initialize samplelist datatype

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.rgFeatureList(**kwd)[source]

Bases: rgTabList

For featureid lists of exclusions or inclusions in the clean tool. Output from QC, e.g. low MAF, high missingness, bad HWE in controls, excess Mendel errors, … featureid subsets on statistical criteria -> specialized display such as gg. Same infrastructure for expression?

file_ext = 'rgFList'
__init__(**kwd)[source]

Initialize featurelist datatype

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Rgenetics(**kwd)[source]

Bases: Html

base class to use for rgenetics datatypes derived from html - composite datatype elements stored in extra files path

composite_type: str | None = 'auto_primary_file'
file_ext = 'rgenetics'
generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
regenerate_primary_file(dataset: DatasetProtocol) None[source]

cannot do this until we are setting metadata

get_mime() str[source]

Returns the mime type of the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

for lped/pbed eg

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.SNPMatrix(**kwd)[source]

Bases: Rgenetics

BioC SNPMatrix Rgenetics data collections

file_ext = 'snpmatrix'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff(filename: str) bool[source]

need to check the file header hex code

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Lped(**kwd)[source]

Bases: Rgenetics

linkage pedigree (ped,map) Rgenetics data collections

file_ext = 'lped'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Pphe(**kwd)[source]

Bases: Rgenetics

Plink phenotype file - header must have FID IID… Rgenetics data collections

file_ext = 'pphe'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Fphe(**kwd)[source]

Bases: Rgenetics

fbat pedigree file - mad format with ! as first char on header row Rgenetics data collections

file_ext = 'fphe'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Phe(**kwd)[source]

Bases: Rgenetics

Phenotype file

file_ext = 'phe'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Fped(**kwd)[source]

Bases: Rgenetics

FBAT pedigree format - single file, map is header row of rs numbers. Strange. Rgenetics data collections

file_ext = 'fped'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Pbed(**kwd)[source]

Bases: Rgenetics

Plink Binary compressed 2bit/geno Rgenetics data collections

file_ext = 'pbed'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.ldIndep(**kwd)[source]

Bases: Rgenetics

LD (a good measure of redundancy of information) depleted Plink Binary compressed 2bit/geno. This is really a plink binary, but some tools work better with less redundancy, so they are constrained to these files.

file_ext = 'ldreduced'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Eigenstratgeno(**kwd)[source]

Bases: Rgenetics

Eigenstrat format - may be able to get rid of this if we move to shellfish Rgenetics data collections

file_ext = 'eigenstratgeno'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Eigenstratpca(**kwd)[source]

Bases: Rgenetics

Eigenstrat PCA file for case control adjustment Rgenetics data collections

file_ext = 'eigenstratpca'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Snptest(**kwd)[source]

Bases: Rgenetics

BioC snptest Rgenetics data collections

file_ext = 'snptest'
metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.IdeasPre(**kwd)[source]

Bases: Html

This datatype defines the input format required by IDEAS: https://academic.oup.com/nar/article/44/14/6721/2468150 The IDEAS preprocessor tool produces an output using this format. The extra_files_path of the primary input dataset contains the following files and directories.

  - chromosome_windows.txt (optional)

  - chromosomes.bed (optional)

  - IDEAS_input_config.txt

  - compressed archived tmp directory containing a number of compressed bed files.

composite_type: str | None = 'auto_primary_file'
file_ext = 'ideaspre'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
regenerate_primary_file(dataset: DatasetProtocol) None[source]
metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom_bed': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom_windows': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'input_config': <galaxy.model.metadata.MetadataElementSpec object>, 'tmp_archive': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Pheno(**kwd)[source]

Bases: Tabular

base class for pheno files

file_ext = 'pheno'
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.RexpBase(**kwd)[source]

Bases: Html

base class for BioC data structures in Galaxy must be constructed with the pheno data in place since that goes into the metadata for each instance

file_ext = 'rexpbase'
html_table = None
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the html file cannot rename the datasets here - they come with the default unfortunately

get_mime() str[source]

Returns the mime type of the datatype

get_phecols(phenolist: List, maxConc: int = 20) List[source]

Sept 2009: cannot use whitespace to split; make a more complex structure here and adjust the methods that rely on this structure.

Return interesting phenotype column names for an rexpression eset or affybatch to use in array subsetting and so on. Returns a data structure for a dynamic Galaxy select parameter. A column with only 1 value doesn’t change, so is not interesting for analysis. A column with a different value in every row is equivalent to a unique identifier, so is also not interesting for anova or limma analysis. Both of these are removed after the concordance (count of unique terms) is constructed for each column. Then a complication: each remaining pair of columns is tested for redundancy. If two columns are always paired, then only one is needed :)
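
The column-selection logic described above can be sketched roughly as follows. This is not the method's implementation: interesting_columns is a hypothetical helper, it ignores the maxConc parameter, and it returns plain column names rather than the select-parameter structure.

```python
def interesting_columns(header, rows):
    """Return names of phenotype columns worth offering for analysis.

    Drops constant columns, all-unique columns, and columns whose values
    are always paired one-to-one with an already kept column.
    """
    n_rows = len(rows)
    columns = list(zip(*rows))  # transpose rows into per-column tuples
    candidates = []
    for name, values in zip(header, columns):
        n_unique = len(set(values))
        # 1 unique value: constant; n_rows unique values: acts like an identifier.
        if 1 < n_unique < n_rows:
            candidates.append((name, values))
    kept = []
    for name, values in candidates:
        redundant = False
        for _, kept_values in kept:
            pairs = set(zip(values, kept_values))
            # A one-to-one pairing has exactly as many distinct pairs as
            # distinct values on each side.
            if len(pairs) == len(set(values)) == len(set(kept_values)):
                redundant = True
                break
        if not redundant:
            kept.append((name, values))
    return [name for name, _ in kept]
```

With columns id, sex, gender, batch and species, where gender merely relabels sex and species is constant, only sex and batch would survive.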

get_pheno(dataset)[source]

Expects a .pheno file in the extra_files_dir - ugh. Note that R is weird and adds the row.name in the header so the columns are all wrong - unless you tell it not to. A file can be written as write.table(file='foo.pheno',pData(foo),sep=' ',quote=F,row.names=F)

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Expects a .pheno file in the extra_files_dir - ugh. Note that R is weird and does not include the row.name in the header. Why?

get_peek(dataset)[source]

expects a .pheno file in the extra_files_dir - ugh

get_file_peek(filename: str) str[source]

Read and return the first max_lines. (Can’t really peek at a filename - need the extra_files_path and such?)

regenerate_primary_file(dataset: DatasetProtocol) None[source]

cannot do this until we are setting metadata

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

NOTE we apply the tabular machinery to the phenodata extracted from a BioC eSet or affybatch.

make_html_table(pp: str = 'nothing supplied from peek\n') str[source]

Create HTML table, used for displaying peek

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Affybatch(**kwd)[source]

Bases: RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'affybatch'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.Eset(**kwd)[source]

Bases: RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'eset'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.MAlist(**kwd)[source]

Bases: RexpBase

derived class for BioC data structures in Galaxy

file_ext = 'malist'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.LinkageStudies(**kwd)[source]

Bases: Text

superclass for classical linkage analysis suites

test_files = ['linkstudies.allegro_fparam', 'linkstudies.alohomora_gts', 'linkstudies.linkage_datain', 'linkstudies.linkage_map']
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.genetics.GenotypeMatrix(**kwd)[source]

Bases: LinkageStudies

Sample matrix of genotypes - GTs as columns

file_ext = 'alohomora_gts'
__init__(**kwd)[source]

Initialize the datatype

header_check(fio: IO) bool[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> classname = GenotypeMatrix
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.genetics.MarkerMap(**kwd)[source]

Bases: LinkageStudies

Map of genetic markers including physical and genetic distance. Common input format for linkage programs.

chrom, genetic pos, markername, physical pos, Nr

file_ext = 'linkage_map'
header_check(fio: IO) bool[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> classname = MarkerMap
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.genetics.DataIn(**kwd)[source]

Bases: LinkageStudies

Common linkage input file for intermarker distances and recombination rates

file_ext = 'linkage_datain'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> classname = DataIn
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.genetics.AllegroLOD(**kwd)[source]

Bases: LinkageStudies

Allegro output format for LOD scores

file_ext = 'allegro_fparam'
header_check(fio: IO) bool[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> classname = AllegroLOD
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.gis module

GIS classes

class galaxy.datatypes.gis.Shapefile(**kwd)[source]

Bases: Binary

The Shapefile data format: For more information please see http://en.wikipedia.org/wiki/Shapefile

composite_type: str | None = 'auto_primary_file'
file_ext = 'shp'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.goldenpath module

class galaxy.datatypes.goldenpath.GoldenPath(**kwd)[source]

Bases: Tabular

Class describing NCBI’s Golden Path assembly format

edam_format = 'format_3693'
file_ext = 'agp'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checks for and does cursory validation on data that looks like AGP

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('eg1.agp')
>>> GoldenPath().sniff(fname)
True
>>> fname = get_test_fname('eg2.agp')
>>> GoldenPath().sniff(fname)
True
>>> fname = get_test_fname('1.bed')
>>> GoldenPath().sniff(fname)
False
>>> fname = get_test_fname('2.tabular')
>>> GoldenPath().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
exception galaxy.datatypes.goldenpath.AGPError(fname, line_number, message='Error in AGP file.')[source]

Bases: Exception

Exception raised for AGP related errors.

__init__(fname, line_number, message='Error in AGP file.')[source]
class galaxy.datatypes.goldenpath.AGPFile(in_file)[source]

Bases: object

A class storing the contents of an AGP v2.1 file. https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/

The class is able to read new AGP lines in order to sequentially build the complete file.

The class should be capable of checking the validity of the file, as well as writing the AGP contents to a file stream.

Common abbreviations:

“comp”: AGP component
“obj”: AGP object
“pid”: AGP part number

__init__(in_file)[source]
property agp_version
property fname
property num_lines

Calculate the number of lines in the current state of the AGP file.

iterate_objs()[source]

Iterate over the objects of the AGP file.

iterate_lines()[source]

Iterate over the non-comment lines of AGP file.

class galaxy.datatypes.goldenpath.AGPObject(agp_fname, in_agp_line)[source]

Bases: object

Represents an AGP object. Objects will consist of AGP lines, and have to adhere to certain rules. By organizing AGP lines into the objects that they comprise, we can easily calculate stats about the assembly (the collection of objects).

__init__(agp_fname, in_agp_line)[source]
property obj
property obj_len
property num_lines
add_line(agp_line)[source]
iterate_lines()[source]
class galaxy.datatypes.goldenpath.AGPLine(fname, line_number, obj, obj_beg, obj_end, pid, comp_type)[source]

Bases: object

An abstract base class representing a single AGP file line. Inheriting subclasses should override or implement new methods to check the validity of a single AGP line. Validity checks that involve multiple lines should not be considered.

allowed_comp_types: Set[str] = {}
__init__(fname, line_number, obj, obj_beg, obj_end, pid, comp_type)[source]
class galaxy.datatypes.goldenpath.AGPSeqLine(fname, line_number, obj, obj_beg, obj_end, pid, comp_type, comp, comp_beg, comp_end, orientation)[source]

Bases: AGPLine

A subclass of AGPLine specifically for AGP lines that represent sequences.

allowed_comp_types: Set[str] = {'A', 'D', 'F', 'G', 'O', 'P', 'W'}
allowed_orientations = {'+', '-', '0', '?', 'na'}
__init__(fname, line_number, obj, obj_beg, obj_end, pid, comp_type, comp, comp_beg, comp_end, orientation)[source]
class galaxy.datatypes.goldenpath.AGPGapLine(fname, line_number, obj, obj_beg, obj_end, pid, comp_type, gap_len, gap_type, linkage, linkage_evidence)[source]

Bases: AGPLine

A subclass of AGPLine specifically for AGP lines that represent sequence gaps.

allowed_comp_types: Set[str] = {'N', 'U'}
allowed_linkage_types = {'no', 'yes'}
allowed_gap_types = {'centromere', 'contamination', 'contig', 'heterochromatin', 'repeat', 'scaffold', 'short_arm', 'telomere'}
allowed_evidence_types = {'align_genus', 'align_trnscpt', 'align_xgenus', 'clone_contig', 'map', 'na', 'paired-ends', 'pcr', 'proximity_ligation', 'strobe', 'unspecified', 'within_clone'}
__init__(fname, line_number, obj, obj_beg, obj_end, pid, comp_type, gap_len, gap_type, linkage, linkage_evidence)[source]
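
The split between AGPSeqLine and AGPGapLine follows the component type in column 5. The following is a minimal standalone sketch of that dispatch, not Galaxy's implementation: the function name classify_agp_line is hypothetical, and the nine-column layout is assumed from the constructor signatures above (five shared fields plus four type-specific ones).

```python
# Component types copied from the allowed_comp_types sets documented above.
SEQ_COMP_TYPES = {"A", "D", "F", "G", "O", "P", "W"}
GAP_COMP_TYPES = {"N", "U"}


def classify_agp_line(line: str) -> str:
    """Return 'comment', 'seq' or 'gap' for one AGP line (hypothetical helper)."""
    if line.startswith("#"):
        return "comment"
    fields = line.rstrip("\n").split("\t")
    # Assumption: obj, obj_beg, obj_end, pid, comp_type + four type-specific columns.
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(fields)}")
    comp_type = fields[4]
    if comp_type in SEQ_COMP_TYPES:
        return "seq"
    if comp_type in GAP_COMP_TYPES:
        return "gap"
    raise ValueError(f"unknown component type: {comp_type!r}")
```

A caller could use the returned label to decide which line class to construct; cross-line checks (for example, contiguous object coordinates) are deliberately left out of this sketch.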

galaxy.datatypes.graph module

Graph content classes.

class galaxy.datatypes.graph.Xgmml(**kwd)[source]

Bases: GenericXml

XGMML graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

file_ext = 'xgmml'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]

Always returns False; the user must set this datatype manually.

static merge(split_files: List[str], output_file: str) None[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

node_edge_dataprovider(dataset: DatasetProtocol, **settings) XGMMLGraphDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'node-edge': <function Xgmml.node_edge_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'xml': <function GenericXml.xml_dataprovider>}
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.graph.Sif(**kwd)[source]

Bases: Tabular

SIF graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).

First column: node id
Second column: relationship type
Third to Nth column: target ids for link

file_ext = 'sif'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]

Always returns False; the user must set this datatype manually.

static merge(split_files: List[str], output_file: str) None[source]

Merging files with shutil.copyfileobj() avoids the maximum argument limit of cat. gz and bz2 files are also supported.
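
A minimal sketch of this kind of merge, using the standard library's shutil.copyfileobj(). The function name concatenate_files is hypothetical and the code is not taken from Galaxy; it only illustrates why no command-line argument limit applies: the files are opened one at a time in Python rather than passed to cat.

```python
import shutil
from typing import List


def concatenate_files(split_files: List[str], output_file: str) -> None:
    """Append each input file to output_file, byte for byte (hypothetical helper)."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as src:
                # Streams in chunks, so large inputs are not loaded into memory.
                shutil.copyfileobj(src, out)
```

Because the copy is byte-wise, the inputs are not decompressed or otherwise interpreted.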

node_edge_dataprovider(dataset: DatasetProtocol, **settings) SIFGraphDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'node-edge': <function Sif.node_edge_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.graph.XGMMLGraphDataProvider(source, selector=None, max_depth=None, **kwargs)[source]

Bases: XMLDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
settings: Dict[str, str] = {'limit': 'int', 'max_depth': 'int', 'offset': 'int', 'selector': 'str'}
class galaxy.datatypes.graph.SIFGraphDataProvider(source, indeces=None, column_count=None, column_types=None, parsers=None, parse_columns=True, deliminator='\t', filters=None, **kwargs)[source]

Bases: ColumnarDataProvider

Provide two lists: nodes, edges:

'nodes': contains objects of the form:
    { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form:
    { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
settings: Dict[str, str] = {'column_count': 'int', 'column_types': 'list:str', 'comment_char': 'str', 'deliminator': 'str', 'filters': 'list:str', 'indeces': 'list:int', 'invert': 'bool', 'limit': 'int', 'offset': 'int', 'parse_columns': 'bool', 'provide_blank': 'bool', 'regex_list': 'list:escaped', 'strip_lines': 'bool', 'strip_newlines': 'bool'}
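
To make the nodes/edges structure concrete, here is a standalone sketch that converts SIF lines into that shape. It does not use or reproduce SIFGraphDataProvider: the function name sif_to_graph is hypothetical, tab-delimited input is assumed (matching the default deliminator above), and storing the relationship type under 'data' is an illustrative choice.

```python
from typing import Dict, Iterable, List


def sif_to_graph(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Build {'nodes': [...], 'edges': [...]} from SIF lines (hypothetical helper)."""
    nodes: List[dict] = []
    edges: List[dict] = []
    index_of: Dict[str, int] = {}

    def node_index(node_id: str) -> int:
        # Register each node id once; edges refer to nodes by list index.
        if node_id not in index_of:
            index_of[node_id] = len(nodes)
            nodes.append({"id": node_id, "data": {}})
        return index_of[node_id]

    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        source = node_index(fields[0])
        if len(fields) < 3:
            continue  # a node listed without any links
        relation = fields[1]
        for target_id in fields[2:]:
            edges.append(
                {"source": source, "target": node_index(target_id), "data": {"type": relation}}
            )
    return {"nodes": nodes, "edges": edges}
```

For example, the single line "a<TAB>pp<TAB>b<TAB>c" yields three nodes and two edges, both with source index 0.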

galaxy.datatypes.hdf5 module

Composite datatype for the HDF5SummarizedExperiment R data object.

This datatype was created for use with the iSEE interactive tool.

class galaxy.datatypes.hdf5.HDF5SummarizedExperiment(**kwd)[source]

Bases: Data

Composite datatype to represent HDF5SummarizedExperiment objects.

A lightweight shell file se.rds is read into memory by R, and provides an interface to the much larger assays.h5 file, which contains the experiment data.

Within R, the HDF5SummarizedExperiment object is conventionally referenced by the parent directory name of these two files. In Galaxy tool commands, the parent directory can be accessed through param_name.extra_files_path.

file_ext = 'rdata.se'
composite_type: str | None = 'auto_primary_file'
allow_datatype_change: bool | None = False
__init__(**kwd)[source]

Construct object from input files.

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]

Override parent init metadata.

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

Generate primary file to represent dataset.

sniff(filename: str) bool[source]

Always returns False; the user must set this datatype manually.

get_mime() str[source]

Return the mime type of the datatype.

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.images module

Image classes

class galaxy.datatypes.images.Image(**kwd)[source]

Bases: Data

Class describing an image

edam_data = 'data_2968'
edam_format = 'format_3547'
file_ext = ''
__init__(**kwd)[source]

Initialize the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]

Determine if the file is in this format

handle_dataset_as_image(hda: DatasetProtocol) str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Jpg(**kwd)[source]

Bases: Image

edam_format = 'format_3579'
file_ext = 'jpg'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Png(**kwd)[source]

Bases: Image

edam_format = 'format_3603'
file_ext = 'png'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Tiff(**kwd)[source]

Bases: Image

edam_format = 'format_3591'
file_ext = 'tiff'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'offsets': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.OMETiff(**kwd)[source]

Bases: Tiff

file_ext = 'ome.tiff'
sniff(filename: str) bool[source]

Determine if the file is in this format

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'offsets': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Hamamatsu(**kwd)[source]

Bases: Image

file_ext = 'vms'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Mirax(**kwd)[source]

Bases: Image

file_ext = 'mrxs'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Sakura(**kwd)[source]

Bases: Image

file_ext = 'svslide'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Nrrd(**kwd)[source]

Bases: Image

file_ext = 'nrrd'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Bmp(**kwd)[source]

Bases: Image

edam_format = 'format_3592'
file_ext = 'bmp'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Gif(**kwd)[source]

Bases: Image

edam_format = 'format_3467'
file_ext = 'gif'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Im(**kwd)[source]

Bases: Image

edam_format = 'format_3593'
file_ext = 'im'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Pcd(**kwd)[source]

Bases: Image

edam_format = 'format_3594'
file_ext = 'pcd'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Pcx(**kwd)[source]

Bases: Image

edam_format = 'format_3595'
file_ext = 'pcx'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Ppm(**kwd)[source]

Bases: Image

edam_format = 'format_3596'
file_ext = 'ppm'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Psd(**kwd)[source]

Bases: Image

edam_format = 'format_3597'
file_ext = 'psd'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Xbm(**kwd)[source]

Bases: Image

edam_format = 'format_3598'
file_ext = 'xbm'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Xpm(**kwd)[source]

Bases: Image

edam_format = 'format_3599'
file_ext = 'xpm'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Rgb(**kwd)[source]

Bases: Image

edam_format = 'format_3600'
file_ext = 'rgb'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Pbm(**kwd)[source]

Bases: Image

edam_format = 'format_3601'
file_ext = 'pbm'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Pgm(**kwd)[source]

Bases: Image

edam_format = 'format_3602'
file_ext = 'pgm'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Eps(**kwd)[source]

Bases: Image

edam_format = 'format_3466'
file_ext = 'eps'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Rast(**kwd)[source]

Bases: Image

edam_format = 'format_3605'
file_ext = 'rast'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Pdf(**kwd)[source]

Bases: Image

edam_format = 'format_3508'
file_ext = 'pdf'
sniff(filename: str) bool[source]

Determine if the file is in pdf format.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
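
As an illustration of what a content-based PDF check can look like, the sketch below tests for the "%PDF-" marker that PDF files begin with. The function name looks_like_pdf is hypothetical and this is not necessarily how Pdf.sniff() is implemented.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the PDF header marker (hypothetical helper)."""
    with open(path, "rb") as handle:
        # PDF files begin with a header of the form "%PDF-<version>".
        return handle.read(5) == b"%PDF-"
```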

class galaxy.datatypes.images.Tck(**kwd)[source]

Bases: Binary

Tracks file format (.tck) https://mrtrix.readthedocs.io/en/latest/getting_started/image_data.html#tracks-file-format-tck

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('fibers_sparse_top_6_lines.tck')
>>> Tck().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Tck().sniff( fname )
False
file_ext = 'tck'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.images.Trk(**kwd)[source]

Bases: Binary

Track File format (.trk) is the tractography file format. http://trackvis.org/docs/?subsect=fileformat

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('IIT2mean_top_2000bytes.trk')
>>> Trk().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Trk().sniff( fname )
False
file_ext = 'trk'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.images.Mrc2014(**kwd)[source]

Bases: Binary

MRC/CCP4 2014 file format (.mrc). https://www.ccpem.ac.uk/mrc_format/mrc2014.php

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.mrc')
>>> Mrc2014().sniff(fname)
True
>>> fname = get_test_fname('2.txt')
>>> Mrc2014().sniff(fname)
False
file_ext = 'mrc'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Gmaj(**kwd)[source]

Bases: Data

Deprecated class. Exists for limited backwards compatibility.

edam_format = 'format_3547'
file_ext = 'gmaj.zip'
get_mime() str[source]

Returns the mime type of the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Analyze75(**kwd)[source]

Bases: Binary

Mayo Analyze 7.5 files http://www.imzml.org

file_ext = 'analyze75'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Nifti1(**kwd)[source]

Bases: Binary

Nifti1 format https://nifti.nimh.nih.gov/pub/dist/src/niftilib/nifti1.h

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('T1_top_350bytes.nii1')
>>> Nifti1().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Nifti1().sniff( fname )
False
file_ext = 'nii1'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
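
For orientation, a NIfTI-1 header stores a four-byte magic string at byte offset 344: "n+1" followed by a NUL byte for single-file images, or "ni1" followed by NUL for header/image pairs. The sketch below checks for it in a byte prefix. The function name has_nifti1_magic is hypothetical, and whether Galaxy's sniff_prefix() performs exactly this test is an assumption.

```python
def has_nifti1_magic(header: bytes) -> bool:
    """Check the NIfTI-1 magic string at bytes 344-347 (hypothetical helper)."""
    # Slicing past the end of a short prefix yields fewer than four bytes,
    # which simply fails the membership test.
    return header[344:348] in (b"n+1\x00", b"ni1\x00")
```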
class galaxy.datatypes.images.Nifti2(**kwd)[source]

Bases: Binary

Nifti2 format https://brainder.org/2015/04/03/the-nifti-2-file-format/

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('avg152T1_LR_nifti2_top_100bytes.nii2')
>>> Nifti2().sniff( fname )
True
>>> fname = get_test_fname('T1_top_350bytes.nii1')
>>> Nifti2().sniff( fname )
False
file_ext = 'nii2'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.images.Gifti(**kwd)[source]

Bases: GenericXml

Class describing a Gifti format

file_ext = 'gii'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a Gifti file

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Human.colin.R.activations.label.gii')
>>> Gifti().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Gifti().sniff(fname)
False
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> Gifti().sniff(fname)
False
>>> fname = get_test_fname('tblastn_four_human_vs_rhodopsin.blastxml')
>>> Gifti().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.images.Star(**kwd)[source]

Bases: Text

Base format class for Relion STAR (Self-defining Text Archiving and Retrieval) image files. https://relion.readthedocs.io/en/latest/Reference/Conventions.html

file_ext = 'star'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Each file must have one or more data blocks. The start of a data block is defined by the keyword data_ followed by an optional string for identification (e.g., data_images). All text before the first data_ keyword is treated as comments.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.star')
>>> Star().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Star().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
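
The data block rule described for sniff_prefix() can be sketched as a simple scan for a line that starts with the data_ keyword. This is a simplified illustration rather than Galaxy's sniffer, which may apply stricter checks; the function name has_star_data_block is hypothetical.

```python
def has_star_data_block(text: str) -> bool:
    """Return True if any line opens a STAR data block (hypothetical helper)."""
    for line in text.splitlines():
        # Everything before the first data_ keyword is treated as comments.
        if line.strip().startswith("data_"):
            return True
    return False
```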
class galaxy.datatypes.images.Html(**kwd)[source]

Bases: Html

Deprecated class. Use galaxy.datatypes.text:Html instead; this class exists for backwards compatibility only.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.images.Laj(**kwd)[source]

Bases: Text

Deprecated class. Exists for limited backwards compatibility.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.interval module

Interval datatypes

class galaxy.datatypes.interval.Interval(**kwd)[source]

Bases: Tabular

Tab delimited data containing interval information

edam_data = 'data_3002'
edam_format = 'format_3475'
file_ext = 'interval'
line_class = 'region'
track_type: str | None = 'FeatureTrack'
data_sources: Dict[str, str] = {'data': 'tabix', 'index': 'bigwig'}
__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display apps

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, *, overwrite: bool = True, first_line_is_header: bool = False, **kwd) None[source]

Tries to guess from the line the column numbers for the chromosome, region start, region end, and strand

displayable(dataset: DatasetProtocol) bool[source]
get_estimated_display_viewport(dataset: DatasetProtocol, chrom_col: int | None = None, start_col: int | None = None, end_col: int | None = None) Tuple[str | None, str | None, str | None][source]

Return a chrom, start, stop tuple for viewing a file.

as_ucsc_display_file(dataset: DatasetProtocol, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]

Returns file contents with only the bed data

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

Generate links to UCSC genome browser sites based on the dbkey and content of dataset.

validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]

Validate an interval file using the bx GenomicIntervalReader

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checks for ‘intervalness’

This format is mostly used by Galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
interval_dataprovider(dataset: DatasetProtocol, **settings) IntervalDataProvider[source]
interval_dict_dataprovider(dataset: DatasetProtocol, **settings) IntervalDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Interval.genomic_region_dataprovider>, 'genomic-region-dict': <function Interval.genomic_region_dict_dataprovider>, 'interval': <function Interval.interval_dataprovider>, 'interval-dict': <function Interval.interval_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.interval.BedGraph(**kwd)[source]

Bases: Interval

Tab delimited chrom/start/end/datavalue dataset

edam_format = 'format_3583'
file_ext = 'bedgraph'
track_type: str | None = 'LineTrack'
data_sources: Dict[str, str] = {'data': 'bigwig', 'index': 'bigwig'}
as_ucsc_display_file(dataset: DatasetProtocol, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]

Returns file contents as is with no modifications. TODO: this is a functional stub and will need to be enhanced moving forward to provide additional support for bedgraph.

get_estimated_display_viewport(dataset: DatasetProtocol, chrom_col: int | None = 0, start_col: int | None = 1, end_col: int | None = 2) Tuple[str | None, str | None, str | None][source]

Set viewport based on dataset’s first 100 lines.

metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Bed(**kwd)[source]

Bases: Interval

Tab delimited data in BED format

edam_format = 'format_3003'
file_ext = 'bed'
data_sources: Dict[str, str] = {'data': 'tabix', 'feature_search': 'fli', 'index': 'bigwig'}
track_type: str | None = 'FeatureTrack'
check_required_metadata = True
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts']
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Sets the metadata information for datasets previously determined to be in bed format.

as_ucsc_display_file(dataset: DatasetProtocol, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]

Returns file contents with only the bed data. If bed 6+, treat as interval.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checks for ‘bedness’

BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used. The data type of all 12 columns is: 1-str, 2-int, 3-int, 4-str, 5-int, 6-str, 7-int, 8-int, 9-int or list, 10-int, 11-list, 12-list

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_tab.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'interv1.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'complete.bed' )
>>> Bed().sniff( fname )
True
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
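
The column rules listed for Bed.sniff_prefix() (three required fields, up to twelve in total, with fixed types per column) can be illustrated with a small per-line check. This is a simplified sketch, not Galaxy's sniffer: the function name looks_like_bed_line is hypothetical, and only the columns described as plain integers (2, 3, 5, 7, 8 and 10) are type-checked.

```python
# Zero-based indexes of the columns the docstring lists as int:
# 2 (start), 3 (end), 5 (score), 7 (thickStart), 8 (thickEnd), 10 (blockCount).
INT_COLUMNS = (1, 2, 4, 6, 7, 9)


def looks_like_bed_line(line: str) -> bool:
    """Check field count and integer columns of one BED line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        return False
    if not fields[0]:
        return False  # the chromosome name must not be empty
    for idx in INT_COLUMNS:
        if idx < len(fields):
            try:
                int(fields[idx])
            except ValueError:
                return False
    return True
```

A fuller check would also confirm that every line in the dataset has the same number of fields, as the format requires.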

class galaxy.datatypes.interval.ProBed(**kwd)[source]

Bases: Bed

Tab delimited data in proBED format - adaptation of BED for proteomics data.

edam_format = 'format_3827'
file_ext = 'probed'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts', 'ProteinAccession', 'PeptideSequence', 'Uniqueness', 'GenomeReferenceVersion', 'PsmScore', 'Fdr', 'Modifications', 'Charge', 'ExpMassToCharge', 'CalcMassToCharge', 'PsmRank', 'DatasetID', 'Uri']
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.BedStrict(**kwd)[source]

Bases: Bed

Tab delimited data in strict BED format - no non-standard columns allowed

edam_format = 'format_3584'
file_ext = 'bedstrict'
allow_datatype_change: bool | None = False
__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display apps

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Sets the metadata information for datasets previously determined to be in bed format.

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Bed6(**kwd)[source]

Bases: BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6

edam_format = 'format_3585'
file_ext = 'bed6'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Bed12(**kwd)[source]

Bases: BedStrict

Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12

edam_format = 'format_3586'
file_ext = 'bed12'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Gff(**kwd)[source]

Bases: Tabular, _RemoteCallMixin

Tab delimited data in Gff format

edam_data = 'data_1255'
edam_format = 'format_2305'
file_ext = 'gff'
valid_gff_frame = ['.', '0', '1', '2']
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group']
data_sources: Dict[str, str] = {'data': 'interval_index', 'feature_search': 'fli', 'index': 'bigwig'}
track_type: str | None = 'FeatureTrack'
__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_attribute_metadata(dataset: DatasetProtocol) None[source]

Sets metadata elements for dataset’s attributes.

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
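
The column-count and column-type detection described above can be sketched roughly as follows. This is an illustrative approximation rather than the code behind `set_meta`: the function name `guess_column_types`, the `#` comment convention, and the int < float < str widening order are all assumptions.

```python
from typing import Iterable, List

_ORDER = ["int", "float", "str"]  # widening order assumed for this sketch


def _field_type(field: str) -> str:
    """Return the narrowest type name that can represent the field."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(field)
            return name
        except ValueError:
            continue
    return "str"


def guess_column_types(lines: Iterable[str]) -> List[str]:
    """Scan every data line and keep the widest type seen in each column."""
    types: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        for i, field in enumerate(line.split("\t")):
            found = _field_type(field)
            if i >= len(types):
                types.append(found)
            else:
                types[i] = max(types[i], found, key=_ORDER.index)
    # Item 2 above: a file with no data has one column of type 'str'.
    return types or ["str"]
```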

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

get_estimated_display_viewport(dataset: DatasetProtocol) Tuple[str | None, str | None, str | None][source]

Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 formats. This function should correctly handle both…

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in gff format

GFF lines have nine required fields that must be tab-separated.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format3

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('gff.gff3')
>>> Gff().sniff( fname )
False
>>> fname = get_test_fname('test.gff')
>>> Gff().sniff( fname )
True
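The nine-field requirement can be sketched as a per-line check. This is a hedged illustration, not the body of `sniff_prefix`: `looks_like_gff_line` is a hypothetical helper, the frame list mirrors `valid_gff_frame` above, and requiring integer start and end values is an assumption drawn from the GFF format description.

```python
VALID_GFF_FRAME = [".", "0", "1", "2"]  # mirrors Gff.valid_gff_frame


def looks_like_gff_line(line: str) -> bool:
    """Return True if the line has nine tab-separated fields with a valid frame."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return False
    seqname, source, feature, start, end, score, strand, frame, group = fields
    # Assumption: Start and End are non-negative integers.
    if not (start.isdigit() and end.isdigit()):
        return False
    return frame in VALID_GFF_FRAME
```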
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
interval_dataprovider(dataset: DatasetProtocol, **settings)[source]
interval_dict_dataprovider(dataset: DatasetProtocol, **settings)[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Gff.genomic_region_dataprovider>, 'genomic-region-dict': <function Gff.genomic_region_dict_dataprovider>, 'interval': <function Gff.interval_dataprovider>, 'interval-dict': <function Gff.interval_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.interval.Gff3(**kwd)[source]

Bases: Gff

Tab delimited data in Gff3 format

edam_format = 'format_1975'
file_ext = 'gff3'
valid_gff3_strand = ['+', '-', '.', '?']
valid_gff3_phase = ['.', '0', '1', '2']
column_names = ['Seqid', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes']
track_type: str | None = 'FeatureTrack'
__init__(**kwd)[source]

Initialize datatype, by adding GBrowse display app

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in GFF version 3 format

GFF 3 format:

  1. adds a mechanism for representing more than one level of hierarchical grouping of features and subfeatures.

  2. separates the ideas of group membership and feature name/id

  3. constrains the feature type field to be taken from a controlled vocabulary.

  4. allows a single feature, such as an exon, to belong to more than one group at a time.

  5. provides an explicit convention for pairwise alignments

  6. provides an explicit convention for features that occupy disjunct regions

The format consists of 9 columns, separated by tabs (NOT spaces).

Undefined fields are replaced with the “.” character, as described in the original GFF spec.

For complete details see http://song.sourceforge.net/gff3.shtml

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.gff' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname('gff.gff3')
>>> Gff3().sniff( fname )
True
>>> fname = get_test_fname( 'grch37.75.gtf' )
>>> Gff3().sniff( fname )
False
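The nine-column layout and the "." convention for undefined fields can be sketched as a small parser. This is an assumption-laden illustration, not Galaxy's implementation: `parse_gff3_line` is a hypothetical name, and the constants simply copy `column_names`, `valid_gff3_strand`, and `valid_gff3_phase` from the class attributes above.

```python
from typing import Dict, Optional

GFF3_COLUMNS = ["Seqid", "Source", "Type", "Start", "End", "Score", "Strand", "Phase", "Attributes"]
VALID_GFF3_STRAND = ["+", "-", ".", "?"]
VALID_GFF3_PHASE = [".", "0", "1", "2"]


def parse_gff3_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return a column-name mapping, or None if the line is not GFF3-shaped."""
    fields = line.rstrip("\r\n").split("\t")  # tabs only, NOT spaces
    if len(fields) != len(GFF3_COLUMNS):
        return None
    record = dict(zip(GFF3_COLUMNS, fields))
    if record["Strand"] not in VALID_GFF3_STRAND:
        return None
    if record["Phase"] not in VALID_GFF3_PHASE:
        return None
    # "." marks an undefined field; expose it as None.
    return {name: (None if value == "." else value) for name, value in record.items()}
```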
metadata_spec: MetadataSpecCollection = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Gtf(**kwd)[source]

Bases: Gff

Tab delimited data in Gtf format

edam_format = 'format_2306'
file_ext = 'gtf'
column_names = ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes']
track_type: str | None = 'FeatureTrack'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in gtf format

GTF lines have nine required fields that must be tab-separated. The first eight GTF fields are the same as GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair. Attributes must end in a semi-colon, and be separated from any following attribute by exactly one space. The attribute list must begin with the two mandatory attributes:

gene_id value - A globally unique identifier for the genomic source of the sequence.

transcript_id value - A globally unique identifier for the predicted transcript.

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format4

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.bed' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gtf().sniff( fname )
True
>>> fname = get_test_fname( 'grch37.75.gtf' )
>>> Gtf().sniff( fname )
True
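The attribute rules above (type/value pairs, each terminated by a semicolon, with `gene_id` and `transcript_id` first) can be sketched like this. Both helper names are hypothetical, stripping double quotes from values is an assumption about common GTF files, and the naive split on `;` would mishandle a value that itself contains a semicolon.

```python
from typing import List, Tuple


def parse_gtf_attributes(attributes: str) -> List[Tuple[str, str]]:
    """Split a GTF attribute column into ordered (type, value) pairs."""
    pairs = []
    for chunk in attributes.strip().split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # text after the final semicolon is empty
        key, _, value = chunk.partition(" ")
        pairs.append((key, value.strip().strip('"')))
    return pairs


def has_mandatory_gtf_attributes(attributes: str) -> bool:
    """Check that the list begins with gene_id followed by transcript_id."""
    keys = [key for key, _ in parse_gtf_attributes(attributes)]
    return keys[:2] == ["gene_id", "transcript_id"]
```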
metadata_spec: MetadataSpecCollection = {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.Wiggle(**kwd)[source]

Bases: Tabular, _RemoteCallMixin

Tab delimited data in wiggle format

edam_format = 'format_3005'
file_ext = 'wig'
track_type: str | None = 'LineTrack'
data_sources: Dict[str, str] = {'data': 'bigwig', 'index': 'bigwig'}
__init__(**kwd)[source]

Initialize the datatype

get_estimated_display_viewport(dataset: DatasetProtocol) Tuple[str | None, str | None, str | None][source]

Return a chrom, start, stop tuple for viewing a file.

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in wiggle format

The .wig format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line is the track data, which can be entered in several different formats.

The track definition line begins with the word ‘track’ followed by the track type. The track type with version is REQUIRED, and it currently must be wiggle_0. For example, track type=wiggle_0…

For complete details see http://genome.ucsc.edu/goldenPath/help/wiggle.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interv1.bed' )
>>> Wiggle().sniff( fname )
False
>>> fname = get_test_fname( 'wiggle.wig' )
>>> Wiggle().sniff( fname )
True
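The track definition rule can be sketched as a one-line test. This is only an illustration under simple assumptions, not Galaxy's sniffer: `is_wiggle_track_line` is a hypothetical helper, and it splits on whitespace without honouring quotes, so a quoted option value containing `type=wiggle_0` would be a false positive.

```python
def is_wiggle_track_line(line: str) -> bool:
    """Return True for a line starting with 'track' that declares type=wiggle_0."""
    tokens = line.split()
    return bool(tokens) and tokens[0] == "track" and "type=wiggle_0" in tokens[1:]
```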
wiggle_dataprovider(dataset: DatasetProtocol, **settings) WiggleDataProvider[source]
wiggle_dict_dataprovider(dataset: DatasetProtocol, **settings) WiggleDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'wiggle': <function Wiggle.wiggle_dataprovider>, 'wiggle-dict': <function Wiggle.wiggle_dict_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.interval.CustomTrack(**kwd)[source]

Bases: Tabular

UCSC CustomTrack

edam_format = 'format_3588'
file_ext = 'customtrack'
__init__(**kwd)[source]

Initialize interval datatype, by adding UCSC display app

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

get_estimated_display_viewport(dataset: DatasetProtocol, chrom_col: int | None = None, start_col: int | None = None, end_col: int | None = None) Tuple[str | None, str | None, str | None][source]

Return a chrom, start, stop tuple for viewing a file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in customtrack format.

CustomTrack files are built within Galaxy and are basically bed or interval files with the first line looking something like this.

track name="User Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> CustomTrack().sniff( fname )
False
>>> fname = get_test_fname( 'ucsc.customtrack' )
>>> CustomTrack().sniff( fname )
True
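A track line like the one shown above can be split into its options with the standard-library `shlex` module, which keeps quoted values together. This is a sketch under the assumption that options are `key=value` tokens; `parse_track_line` is a hypothetical helper, not part of Galaxy.

```python
import shlex
from typing import Dict


def parse_track_line(line: str) -> Dict[str, str]:
    """Parse 'track key=value ...' into a dict, keeping quoted values whole."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    # Tokens without '=' are ignored in this sketch.
    return dict(token.split("=", 1) for token in tokens[1:] if "=" in token)
```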
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.interval.ENCODEPeak(**kwd)[source]

Bases: Interval

Human ENCODE peak format. There are both broad and narrow peak formats. The two formats are very similar, but narrow peak has one additional column.

Broad peak ( http://genome.ucsc.edu/FAQ/FAQformat#format13 ): This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+3 format.

Narrow peak ( http://genome.ucsc.edu/FAQ/FAQformat#format12 ): This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+4 format.
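
Since broad peak is BED 6+3 (9 columns) and narrow peak is BED 6+4 (10 columns), the two variants can be told apart by column count. The helper below is a hypothetical sketch of that idea and is not how `ENCODEPeak.sniff` is documented to work.

```python
def classify_encode_peak_line(line: str) -> str:
    """Classify a tab-separated peak line by its number of columns."""
    n_columns = len(line.rstrip("\r\n").split("\t"))
    if n_columns == 9:
        return "broad"   # BED 6+3
    if n_columns == 10:
        return "narrow"  # BED 6+4, with the extra Peak column
    return "unknown"
```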

edam_format = 'format_3612'
file_ext = 'encodepeak'
column_names = ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'SignalValue', 'pValue', 'qValue', 'Peak']
data_sources: Dict[str, str] = {'data': 'tabix', 'index': 'bigwig'}
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.ChromatinInteractions(**kwd)[source]

Bases: Interval

Chromatin interactions obtained from 3C/5C/Hi-C experiments.

file_ext = 'chrint'
track_type: str | None = 'DiagonalHeatmapTrack'
data_sources: Dict[str, str] = {'data': 'tabix', 'index': 'bigwig'}
column_names = ['Chrom1', 'Start1', 'End1', 'Chrom2', 'Start2', 'End2', 'Value']
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'chrom1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'end1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'end2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'start1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'start2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'valueCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.ScIdx(**kwd)[source]

Bases: Tabular

ScIdx files are 1-based and consist of strand-specific coordinate counts. They always have 5 columns, and the first row is the column labels: ‘chrom’, ‘index’, ‘forward’, ‘reverse’, ‘value’. Each line following the first consists of data: chromosome name (type str), peak index (type int), Forward strand peak count (type int), Reverse strand peak count (type int) and value (type int). The value of the 5th ‘value’ column is the sum of the forward and reverse peak count values.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('cntrl_hg19.scidx')
>>> ScIdx().sniff(fname)
True
>>> Bed().sniff(fname)
False
>>> fname = get_test_fname('empty.txt')
>>> ScIdx().sniff(fname)
False
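The layout described above (a five-label header, then rows whose fifth column is the sum of the forward and reverse counts) can be sketched as a consistency check. `check_scidx_lines` is a hypothetical helper; it assumes tab-separated columns and a header that matches the labels exactly, and it is not the logic used by `sniff_prefix`.

```python
from typing import Iterable

SCIDX_HEADER = ["chrom", "index", "forward", "reverse", "value"]


def check_scidx_lines(lines: Iterable[str]) -> bool:
    """Validate the header and the value == forward + reverse rule on every row."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows or rows[0] != SCIDX_HEADER:
        return False
    for row in rows[1:]:
        if len(row) != 5:
            return False
        try:
            index, forward, reverse, value = (int(field) for field in row[1:])
        except ValueError:
            return False
        if forward + reverse != value:
            return False
    return True
```

Note that this sketch accepts a header-only input, which a stricter check might reject.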
file_ext = 'scidx'
__init__(**kwd)[source]

Initialize scidx datatype.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checks for ‘scidx-ness.’

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.interval.IntervalTabix(**kwd)[source]

Bases: Interval

Class describing the bgzip format (http://samtools.github.io/hts-specs/SAMv1.pdf). A tabix file is a bgzip file sorted by chr, start, end, with an accompanying index.

file_ext = 'interval_tabix.gz'
edam_format = 'format_3616'
compressed = True
compressed_format = 'gzip'
sniff_prefix(file_prefix: FilePrefix)[source]

Checks for ‘intervalness’

This format is mostly used by Galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
set_meta(dataset: DatasetProtocol, overwrite: bool = True, first_line_is_header: bool = False, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Tries to guess from the line the location number of the column for the chromosome, region start-end and strand

metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.JuicerMediumTabix(**kwd)[source]

Bases: IntervalTabix

Class describing a tabix file built from a juicer medium format: https://github.com/aidenlab/juicer/wiki/Pre#medium-format

<readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <mapq2>

str = strand (0 for forward, anything else for reverse)
chr = chromosome (must be a chromosome in the genome)
pos = position
frag = restriction site fragment
mapq = mapping quality score
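
The eleven fields above can be sketched as a typed record. This is an illustration only: `JuicerMediumRecord`, `parse_juicer_medium_line`, and `is_forward` are hypothetical names, and splitting on any whitespace, as well as treating positions, fragments, and mapping qualities as integers, are assumptions.

```python
from typing import NamedTuple


class JuicerMediumRecord(NamedTuple):
    readname: str
    str1: str
    chr1: str
    pos1: int
    frag1: int
    str2: str
    chr2: str
    pos2: int
    frag2: int
    mapq1: int
    mapq2: int


def parse_juicer_medium_line(line: str) -> JuicerMediumRecord:
    """Split one medium-format line into its eleven named fields."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    readname, str1, chr1, pos1, frag1, str2, chr2, pos2, frag2, mapq1, mapq2 = fields
    return JuicerMediumRecord(
        readname, str1, chr1, int(pos1), int(frag1),
        str2, chr2, int(pos2), int(frag2), int(mapq1), int(mapq2),
    )


def is_forward(strand: str) -> bool:
    """Strand is 0 for forward, anything else for reverse."""
    return strand == "0"
```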

file_ext = 'juicer_medium_tabix.gz'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.BedTabix(**kwd)[source]

Bases: IntervalTabix

Class describing a tabix file built from a bed file

file_ext = 'bed_tabix.gz'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.interval.GffTabix(**kwd)[source]

Bases: IntervalTabix

Class describing a tabix file built from a gff file

file_ext = 'gff_tabix.gz'
metadata_spec: MetadataSpecCollection = {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.isa module

ISA datatype

See https://github.com/ISA-tools

class galaxy.datatypes.isa.IsaTab(**kwd)[source]

Bases: _Isa

file_ext = 'isa-tab'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.isa.IsaJson(**kwd)[source]

Bases: _Isa

file_ext = 'isa-json'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.media module

Audio and video classes

galaxy.datatypes.media.ffprobe(path)[source]
class galaxy.datatypes.media.Audio(**kwd)[source]

Bases: Binary

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Video(**kwd)[source]

Bases: Binary

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Unimplemented method, allows guessing of metadata from contents of file

metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Mkv(**kwd)[source]

Bases: Video

file_ext = 'mkv'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Mp4(**kwd)[source]

Bases: Video

Class that reads MP4 video file.

>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Mp4, 'video_1.mp4')
True
>>> sniff_with_cls(Mp4, 'audio_1.mp4')
False

file_ext = 'mp4'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Flv(**kwd)[source]

Bases: Video

file_ext = 'flv'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Mpg(**kwd)[source]

Bases: Video

file_ext = 'mpg'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Mp3(**kwd)[source]

Bases: Audio

Class that reads MP3 audio file.

>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Mp3, 'audio_2.mp3')
True
>>> sniff_with_cls(Mp3, 'audio_1.wav')
False

file_ext = 'mp3'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.media.Wav(**kwd)[source]

Bases: Audio

Class that reads WAV audio file.

>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Wav, 'hello.wav')
True
>>> sniff_with_cls(Wav, 'audio_2.mp3')
False
>>> sniff_with_cls(Wav, 'drugbank_drugs.cml')
False

file_ext = 'wav'
blurb = 'RIFF WAV Audio file'
is_binary: bool | typing_extensions.Literal[maybe] = True
get_mime() str[source]

Returns the mime type of the datatype.

sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the metadata for this dataset from the file contents.

metadata_spec: MetadataSpecCollection = {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'nchannels': <galaxy.model.metadata.MetadataElementSpec object>, 'nframes': <galaxy.model.metadata.MetadataElementSpec object>, 'rate': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>, 'sampwidth': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
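
The Wav-specific metadata names (nchannels, nframes, rate, sampwidth) match the fields exposed by Python's standard wave module, and the blurb describes the file as RIFF WAV. As a rough illustration only, and not Galaxy's actual sniff or set_meta code, the sketch below checks the RIFF/WAVE header and reads those four fields. The helper names looks_like_wav and read_wav_params are hypothetical.

```python
import wave


def looks_like_wav(path):
    """Return True if the file starts with a RIFF header tagged as WAVE."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    # Bytes 0-3 hold "RIFF"; bytes 8-11 hold the form type "WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def read_wav_params(path):
    """Read basic audio parameters with the standard-library wave module."""
    with wave.open(path, "rb") as wav:
        return {
            "nchannels": wav.getnchannels(),
            "sampwidth": wav.getsampwidth(),
            "rate": wav.getframerate(),
            "nframes": wav.getnframes(),
        }
```

A header check like this is cheap because it reads only the first 12 bytes; reading the parameters requires parsing the file's chunks, which the wave module handles.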

galaxy.datatypes.metacyto module

MetaCyto analysis datatypes.

class galaxy.datatypes.metacyto.mStats(**kwd)[source]

Bases: Tabular

Class describing the table of cluster statistics output from MetaCyto

file_ext = 'metacyto_stats.txt'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Quick test on file headings

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.metacyto.mSummary(**kwd)[source]

Bases: Tabular

Class describing the summary table output by MetaCyto after FCS preprocessing

file_ext = 'metacyto_summary.txt'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.metadata module

Expose the model metadata module as a datatypes module as well. Keeping the implementation in galaxy.model means the model module has no dependency on the datatypes module. This module will need to remain here for datatypes living in the Tool Shed, so we might as well keep and use this interface from the datatypes module.

class galaxy.datatypes.metadata.Statement(target)[source]

Bases: object

This class inserts its target into a list in the surrounding class. The data.Data class has a metaclass which executes these statements. This is how we shove the metadata element spec into the class.

__init__(target)[source]
classmethod process(element)[source]
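
The statement-plus-metaclass pattern described for Statement can be sketched as follows. This is a minimal illustration under stated assumptions, not Galaxy's implementation: the names StatementMeta, _pending_statements, _add_element and element are hypothetical, and the use of sys._getframe to reach the class body namespace is specific to CPython.

```python
import sys


class Statement:
    """Callable that records itself in the class body it is called from."""

    def __init__(self, target):
        self.target = target

    def __call__(self, *args, **kwargs):
        # While a class body executes, the caller's f_locals is the class namespace.
        class_locals = sys._getframe(1).f_locals
        class_locals.setdefault("_pending_statements", []).append((self, args, kwargs))

    @classmethod
    def process(cls, element):
        # Run only the statements recorded in this class's own body.
        for statement, args, kwargs in element.__dict__.get("_pending_statements", []):
            statement.target(element, *args, **kwargs)


class StatementMeta(type):
    """Metaclass that executes recorded statements once the class exists."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.spec = dict(getattr(cls, "spec", {}))  # copy the inherited spec
        Statement.process(cls)


def _add_element(owner, name, default=None):
    owner.spec[name] = default


element = Statement(_add_element)


class Base(metaclass=StatementMeta):
    element("dbkey", default="?")


class Child(Base):
    element("columns", default=0)
```

After these definitions, Base.spec holds only dbkey, while Child.spec holds both dbkey and columns, because each class copies its parent's spec before running its own statements.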
class galaxy.datatypes.metadata.MetadataCollection(parent: DatasetInstance | NoneDataset, session: galaxy_scoped_session | SessionlessContext | None = None)[source]

Bases: Mapping

MetadataCollection is not a collection at all, but rather a proxy to the real metadata which is stored as a Dictionary. This class handles processing the metadata elements when they are set and retrieved, returning default values in cases when metadata is not set.

__init__(parent: DatasetInstance | NoneDataset, session: galaxy_scoped_session | SessionlessContext | None = None) None[source]
get_parent()[source]
set_parent(parent)[source]
property parent
property spec
remove_key(name)[source]
element_is_set(name) bool[source]

Check whether the metadata element with the given name is set, i.e.

  • if such a metadata element actually exists and

  • if its value differs from no_value

Parameters:

name – the name of the metadata element

Returns:

True if the value differs from no_value; False if it is equal or if no metadata with the given name is specified
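
The proxy behaviour and the element_is_set check can be illustrated with a small stand-alone sketch. The classes ElementSpec and MetadataProxy below are hypothetical stand-ins written for this example; they only assume what the descriptions above state, namely that unset elements fall back to a default and that an element counts as set when it exists and differs from no_value.

```python
from collections.abc import Mapping


class ElementSpec:
    """Hypothetical stand-in for a metadata element spec."""

    def __init__(self, default=None, no_value=None):
        self.default = default
        self.no_value = no_value


class MetadataProxy(Mapping):
    """Read-only view over stored values that falls back to spec defaults."""

    def __init__(self, spec, stored):
        self.spec = spec        # element name -> ElementSpec
        self._stored = stored   # element name -> stored value

    def __getitem__(self, name):
        if name in self._stored:
            return self._stored[name]
        return self.spec[name].default  # raises KeyError for unknown names

    def __iter__(self):
        return iter(self.spec)

    def __len__(self):
        return len(self.spec)

    def element_is_set(self, name):
        # Not set when nothing is stored or no spec describes the element.
        if name not in self._stored or name not in self.spec:
            return False
        return self._stored[name] != self.spec[name].no_value
```

For example, with a spec whose dbkey element uses "?" as no_value, a stored dbkey of "?" is reported as not set, while a stored value of "hg38" is reported as set.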

get_metadata_parameter(name, **kwd)[source]
make_dict_copy(to_copy)[source]

Makes a deep copy of input iterable to_copy according to self.spec

property requires_dataset_id
from_JSON_dict(filename=None, path_rewriter=None, json_dict=None)[source]
to_JSON_dict(filename=None)[source]
class galaxy.datatypes.metadata.MetadataSpecCollection(*args, **kwds)[source]

Bases: OrderedDict

A simple extension of OrderedDict which allows cleaner access to items and allows the values to be iterated over directly as if it were a list. append() is also implemented for simplicity and does not “append”.

__init__(*args, **kwds)[source]
append(item)[source]
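
As a hedged sketch of the idea behind MetadataSpecCollection, the hypothetical SpecCollection class below stores each item under its own name when append() is called and allows attribute-style access. It is not Galaxy's code, and it iterates over items through values() rather than changing how the dictionary itself iterates. The Item tuple is likewise an assumption made for the example.

```python
from collections import OrderedDict, namedtuple

# Hypothetical item type; only the "name" attribute matters for append().
Item = namedtuple("Item", ["name", "desc"])


class SpecCollection(OrderedDict):
    """Ordered mapping with attribute access and a name-keyed append()."""

    def append(self, item):
        # Despite the name, the item is stored under item.name, not appended positionally.
        self[item.name] = item

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


specs = SpecCollection()
specs.append(Item("dbkey", "Database/Build"))
specs.append(Item("columns", "Number of columns"))
names_in_order = [item.name for item in specs.values()]
```

Appending an item whose name already exists replaces the earlier entry, which is one reason the description notes that append() does not really "append".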
class galaxy.datatypes.metadata.MetadataParameter(spec)[source]

Bases: object

__init__(spec)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
to_string(value)[source]
to_safe_string(value)[source]
make_copy(value, target_context: MetadataCollection, source_context)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

validate(value)[source]

Throw an exception if the value is invalid.

unwrap(form_value)[source]

Turns a value into its storable form.

wrap(value, session)[source]

Turns a value into its usable form.

from_external_value(value, parent)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

to_external_value(value)[source]

Turns a value read from a metadata into its value to be pushed directly into the external dict.
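
The conversion hooks above form a round trip: an incoming value is coerced and checked before storage, and a stored value is turned back into something usable. The sketch below shows one way such hooks could fit together. IntegerParameter is a hypothetical class that does not exist in Galaxy, and the order in which unwrap calls marshal and validate is an assumption made for the example.

```python
class IntegerParameter:
    """Hypothetical parameter type that stores non-negative integers."""

    @classmethod
    def marshal(cls, value):
        # Convert the incoming value to the type it is supposed to be.
        return int(value)

    def validate(self, value):
        # Throw an exception if the value is invalid.
        if value < 0:
            raise ValueError(f"expected a non-negative integer, got {value!r}")

    def unwrap(self, form_value):
        # Turn an incoming (for example form-submitted) value into its storable form.
        value = self.marshal(form_value)
        self.validate(value)
        return value

    def wrap(self, value, session=None):
        # Turn a stored value into its usable form; integers need no conversion.
        return value

    def to_string(self, value):
        return str(value)
```

With this sketch, unwrap("42") yields the integer 42, while unwrap("-1") raises ValueError from validate.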

class galaxy.datatypes.metadata.MetadataElementSpec(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, optional=False, **kwargs)[source]

Bases: object

Defines a metadata element and adds it to the metadata_spec (which is a MetadataSpecCollection) of datatype.

__init__(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, optional=False, **kwargs)[source]
get(name, default=None)[source]
wrap(value, session)[source]

Turns a stored value into its usable form.

unwrap(value)[source]

Turns an incoming value into its storable form.

class galaxy.datatypes.metadata.SelectParameter(spec)[source]

Bases: MetadataParameter

__init__(spec)[source]
to_string(value)[source]
get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
wrap(value, session)[source]

Turns a value into its usable form.

classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

class galaxy.datatypes.metadata.DBKeyParameter(spec)[source]

Bases: SelectParameter

get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
make_copy(value, target_context: MetadataCollection, source_context)[source]
class galaxy.datatypes.metadata.RangeParameter(spec)[source]

Bases: SelectParameter

__init__(spec)[source]
get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

class galaxy.datatypes.metadata.ColumnParameter(spec)[source]

Bases: RangeParameter

get_field(value=None, context=None, other_values=None, values=None, **kwd)[source]
class galaxy.datatypes.metadata.ColumnTypesParameter(spec)[source]

Bases: MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.ListParameter(spec)[source]

Bases: MetadataParameter

to_string(value)[source]
class galaxy.datatypes.metadata.DictParameter(spec)[source]

Bases: MetadataParameter

to_string(value)[source]
to_safe_string(value)[source]
class galaxy.datatypes.metadata.PythonObjectParameter(spec)[source]

Bases: MetadataParameter

to_string(value)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

class galaxy.datatypes.metadata.FileParameter(spec)[source]

Bases: MetadataParameter

to_string(value)[source]
to_safe_string(value)[source]
get_field(value=None, context=None, other_values=None, **kwd)[source]
wrap(value, session)[source]

Turns a value into its usable form.

make_copy(value, target_context: MetadataCollection, source_context)[source]
classmethod marshal(value)[source]

This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.

from_external_value(value, parent, path_rewriter=None)[source]

Turns a value read from an external dict into its value to be pushed directly into the metadata dict.

to_external_value(value)[source]

Turns a value read from a metadata into its value to be pushed directly into the external dict.

new_file(dataset=None, metadata_tmp_files_dir=None, **kwds)[source]
class galaxy.datatypes.metadata.MetadataTempFile(metadata_tmp_files_dir=None, **kwds)[source]

Bases: object

__init__(metadata_tmp_files_dir=None, **kwds)[source]
tmp_dir = 'database/tmp'
get_file_name()[source]
to_JSON()[source]
classmethod from_JSON(json_dict)[source]
classmethod is_JSONified_value(value)[source]
classmethod cleanup_from_JSON_dict_filename(filename)[source]

galaxy.datatypes.microarrays module

class galaxy.datatypes.microarrays.GenericMicroarrayFile(**kwd)[source]

Bases: Text

Abstract class for most of the microarray files.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

get_mime() str[source]

Returns the mime type of the datatype

metadata_spec: MetadataSpecCollection = {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.microarrays.Gal(**kwd)[source]

Bases: GenericMicroarrayFile

Gal File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gal

edam_format = 'format_3829'
edam_data = 'data_3110'
file_ext = 'gal'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a Gal file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gal')
>>> Gal().sniff(fname)
True
>>> fname = get_test_fname('test.gpr')
>>> Gal().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set metadata for Gal file.

metadata_spec: MetadataSpecCollection = {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.microarrays.Gpr(**kwd)[source]

Bases: GenericMicroarrayFile

Gpr File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gpr

edam_format = 'format_3829'
edam_data = 'data_3110'
file_ext = 'gpr'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a Gpr file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gpr')
>>> Gpr().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Gpr().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set metadata for Gpr file.

metadata_spec: MetadataSpecCollection = {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.molecules module

galaxy.datatypes.molecules.count_lines(filename, non_empty=False)[source]

Count the number of lines in the file 'filename'.
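
A pure-Python function with the same signature might look like the sketch below. It is an illustration only: the name count_lines_py is hypothetical, Galaxy's own implementation may work differently, and treating non_empty=True as "skip lines that contain only whitespace" is an assumption.

```python
def count_lines_py(filename, non_empty=False):
    """Count lines in a file, optionally skipping whitespace-only lines."""
    count = 0
    # Binary mode avoids decoding errors on files with unexpected encodings.
    with open(filename, "rb") as handle:
        for line in handle:
            if non_empty and not line.strip():
                continue
            count += 1
    return count
```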

class galaxy.datatypes.molecules.GenericMolFile(**kwd)[source]

Bases: Text

Abstract class for most of the molecule files.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

get_mime() str[source]

Returns the mime type of the datatype

element_symbols = ['Ac', 'Ag', 'Al', 'Am', 'Ar', 'As', 'At', 'Au', 'B ', 'Ba', 'Be', 'Bh', 'Bi', 'Bk', 'Br', 'C ', 'Ca', 'Cd', 'Ce', 'Cf', 'Cl', 'Cm', 'Co', 'Cr', 'Cs', 'Cu', 'Ds', 'Db', 'Dy', 'Er', 'Es', 'Eu', 'F ', 'Fe', 'Fm', 'Fr', 'Ga', 'Gd', 'Ge', 'H ', 'He', 'Hf', 'Hg', 'Ho', 'Hs', 'I ', 'In', 'Ir', 'K ', 'Kr', 'La', 'Li', 'Lr', 'Lu', 'Md', 'Mg', 'Mn', 'Mo', 'Mt', 'N ', 'Na', 'Nb', 'Nd', 'Ne', 'Ni', 'No', 'Np', 'O ', 'Os', 'P ', 'Pa', 'Pb', 'Pd', 'Pm', 'Po', 'Pr', 'Pt', 'Pu', 'Ra', 'Rb', 'Re', 'Rf', 'Rg', 'Rh', 'Rn', 'Ru', 'S ', 'Sb', 'Sc', 'Se', 'Sg', 'Si', 'Sm', 'Sn', 'Sr', 'Ta', 'Tb', 'Tc', 'Te', 'Th', 'Ti', 'Tl', 'Tm', 'U ', 'V ', 'W ', 'Xe', 'Y ', 'Yb', 'Zn', 'Zr']
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.AtomicStructFile(**kwd)[source]

Bases: GenericMolFile

Abstract class for structure files which use the ASE IO library for metadata.

meta_error = False
file_ext = ''
ase_format = ''
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Find Atom IDs for metadata.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

get_dataset_info(metadata)[source]
metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.MOL(**kwd)[source]

Bases: GenericMolFile

file_ext = 'mol'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of molecules; in the case of MOL it is always one.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.SDF(**kwd)[source]

Bases: GenericMolFile

file_ext = 'sdf'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a SDF2 file.

An SDfile (structure-data file) can contain multiple compounds.

Each compound starts with a block in V2000 or V3000 molfile format, which ends with a line equal to ‘M END’. This is followed by a non-structural data block, which ends with a line equal to ‘$$$$’.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('github88.v3k.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('chebi_57262.v3k.mol')
>>> SDF().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of molecules in dataset.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by molecule records.
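
Because every compound in an SDfile ends with a line equal to '$$$$', records can be separated by scanning for that terminator. The generator below is a minimal sketch of that idea and not the code behind split or set_meta; iter_sdf_records is a hypothetical name, and dropping trailing lines that lack a terminator is a choice made for this example.

```python
def iter_sdf_records(lines):
    """Yield one string per compound from an iterable of SDF lines."""
    record = []
    for line in lines:
        record.append(line)
        # A line equal to '$$$$' closes the current compound.
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []
    # Lines after the last '$$$$' do not form a complete record and are ignored.


def count_sdf_records(lines):
    """Count complete compounds in an iterable of SDF lines."""
    return sum(1 for _ in iter_sdf_records(lines))
```

Both helpers accept any iterable of lines, so an open file handle can be passed directly and large files are processed without loading them fully into memory.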

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.MOL2(**kwd)[source]

Bases: GenericMolFile

file_ext = 'mol2'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a MOL2 file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> MOL2().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> MOL2().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by molecule records.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.FPS(**kwd)[source]

Bases: GenericMolFile

chemfp fingerprint file: http://code.google.com/p/chem-fingerprints/wiki/FPS

file_ext = 'fps'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a FPS file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('q.fps')
>>> FPS().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> FPS().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by fingerprint records.

static merge(split_files: List[str], output_file: str) None[source]

Merging fps files requires merging the header manually. We take the header from the first file.
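
The header handling described above can be sketched as follows, relying on the fact that FPS header lines start with '#'. The function name merge_fps is hypothetical and this is not Galaxy's merge implementation; it simply keeps header lines from the first file and drops them from every later file.

```python
def merge_fps(split_files, output_file):
    """Concatenate FPS files, keeping header lines from the first file only."""
    with open(output_file, "w") as out:
        for index, path in enumerate(split_files):
            with open(path) as handle:
                for line in handle:
                    # Header lines begin with '#'; skip them for all but the first file.
                    if index > 0 and line.startswith("#"):
                        continue
                    out.write(line)
```

This assumes all input files were produced with the same fingerprint settings, since any differing header values in later files are silently discarded.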

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.OBFS(**kwd)[source]

Bases: Binary

OpenBabel Fastsearch format (fs).

file_ext = 'obfs'
composite_type: str | None = 'basic'
__init__(**kwd)[source]

A Fastsearch Index consists of a binary file with the fingerprints and a pointer to the actual molecule file.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

get_mime() str[source]

Returns the mime type of the datatype (pretend it is text for peek)

static merge(split_files: List[str], output_file: str) None[source]

Merging Fastsearch indices is not supported.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Splitting Fastsearch indices is not supported.

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.DRF(**kwd)[source]

Bases: GenericMolFile

file_ext = 'drf'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.PHAR(**kwd)[source]

Bases: GenericMolFile

Pharmacophore database format from silicos-it.

file_ext = 'phar'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.PDB(**kwd)[source]

Bases: GenericMolFile

Protein Databank format. http://www.wwpdb.org/documentation/format33/v3.3.html

file_ext = 'pdb'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a PDB file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pdb')
>>> PDB().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDB().sniff(fname)
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Find Chain_IDs for metadata.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.PDBQT(**kwd)[source]

Bases: GenericMolFile

PDBQT Autodock and Autodock Vina format http://autodock.scripps.edu/faqs-help/faq/what-is-the-format-of-a-pdbqt-file

file_ext = 'pdbqt'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a PDBQT file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('NuBBE_1_obabel_3D.pdbqt')
>>> PDBQT().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDBQT().sniff(fname)
False
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.PQR(**kwd)[source]

Bases: GenericMolFile

PQR format, a PDB-like format that adds per-atom charge and radius fields. https://apbs-pdb2pqr.readthedocs.io/en/latest/formats/pqr.html

file_ext = 'pqr'
get_matcher() Pattern[source]
ATOM and HETATM line fields are space separated. Match groups:
0: Field_name

A string which specifies the type of PQR entry: ATOM or HETATM.

1: Atom_number

An integer which provides the atom index.

2: Atom_name

A string which provides the atom name.

3: Residue_name

A string which provides the residue name.

5: Chain_ID (Optional, group 4 is whole field)

An optional string which provides the chain ID of the atom. Note that chain ID support is a new feature of APBS 0.5.0 and later versions.

6: Residue_number

An integer which provides the residue index.

7: X 8: Y 9: Z

3 floats which provide the atomic coordinates (in angstroms)

10: Charge

A float which provides the atomic charge (in electrons).

11: Radius

A float which provides the atomic radius (in angstroms).
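
The group layout listed above can be illustrated with a small regular expression. This is a minimal sketch, not the pattern that get_matcher() actually returns: PQR_LINE and parse_pqr_line are hypothetical names, and the sub-patterns (for example a single-letter chain ID and floats that always contain a decimal point) are simplifying assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern; the regex Galaxy compiles in get_matcher() may differ.
PQR_LINE = re.compile(
    r"^(ATOM|HETATM)\s+"   # 0: Field_name
    r"(\d+)\s+"            # 1: Atom_number
    r"(\S+)\s+"            # 2: Atom_name
    r"(\S+)\s+"            # 3: Residue_name
    r"(([A-Za-z])\s+)?"    # 4: whole optional field, 5: Chain_ID
    r"(-?\d+)\s+"          # 6: Residue_number
    r"(-?\d+\.\d+)\s+"     # 7: X
    r"(-?\d+\.\d+)\s+"     # 8: Y
    r"(-?\d+\.\d+)\s+"     # 9: Z
    r"(-?\d+\.\d+)\s+"     # 10: Charge
    r"(-?\d+\.\d+)\s*$"    # 11: Radius
)


def parse_pqr_line(line: str) -> Optional[dict]:
    """Return the fields of one ATOM/HETATM line, or None if it does not match."""
    match = PQR_LINE.match(line)
    if match is None:
        return None
    g = match.groups()  # 0-based tuple, numbered like the list above
    return {
        "field_name": g[0],
        "atom_number": int(g[1]),
        "atom_name": g[2],
        "residue_name": g[3],
        "chain_id": g[5],  # None when the optional chain field is absent
        "residue_number": int(g[6]),
        "xyz": (float(g[7]), float(g[8]), float(g[9])),
        "charge": float(g[10]),
        "radius": float(g[11]),
    }
```

Indexing match.groups() keeps the numbering 0-based, which is how the list above counts the groups; match.group(n) would be offset by one because group 0 is the whole match.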

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a PQR file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pqr')
>>> PQR().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PQR().sniff(fname)
False

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Find Optional Chain_IDs for metadata.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.Cell(**kwd)[source]

Bases: AtomicStructFile

CASTEP CELL format.

file_ext = 'cell'
ase_format = 'castep-cell'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a CASTEP CELL file.

A fingerprint for CELL files is the use of %BLOCK and %ENDBLOCK to denote data blocks (not case sensitive).

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si_uppercase.cell')
>>> Cell().sniff(fname)
True
>>> fname = get_test_fname('Si_lowercase.cell')
>>> Cell().sniff(fname)
True
>>> fname = get_test_fname('Si.cif')
>>> Cell().sniff(fname)
False
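
The %BLOCK / %ENDBLOCK fingerprint described for the CELL sniffer can be sketched as a case-insensitive search. This is only an illustration: has_cell_blocks is a hypothetical helper, and requiring at least one block name that is both opened and closed is an assumption, not necessarily the rule Galaxy applies.

```python
import re

# Case-insensitive, per the "not case sensitive" note in the description.
BLOCK_START = re.compile(r"^\s*%BLOCK\s+(\w+)", re.IGNORECASE | re.MULTILINE)
BLOCK_END = re.compile(r"^\s*%ENDBLOCK\s+(\w+)", re.IGNORECASE | re.MULTILINE)


def has_cell_blocks(text: str) -> bool:
    """True if at least one named block is both opened and closed in text."""
    starts = {name.lower() for name in BLOCK_START.findall(text)}
    ends = {name.lower() for name in BLOCK_END.findall(text)}
    return bool(starts & ends)
```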
metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.CIF(**kwd)[source]

Bases: AtomicStructFile

CIF format.

file_ext = 'cif'
ase_format = 'cif'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a CIF file.

The CIF format and the Relion STAR format have a shared origin. Note therefore that STAR files and the STAR sniffer also use data_ blocks. STAR files will not pass the CIF sniffer, but CIF files can pass the STAR sniffer.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.cif')
>>> CIF().sniff(fname)
True
>>> fname = get_test_fname('Si_lowercase.cell')
>>> CIF().sniff(fname)
False
>>> fname = get_test_fname('1.star')
>>> CIF().sniff(fname)
False
>>> fname = get_test_fname('LaMnO3.cif')
>>> CIF().sniff(fname)
True
metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.XYZ(**kwd)[source]

Bases: AtomicStructFile

XYZ format.

file_ext = 'xyz'
ase_format = 'extxyz'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is an XYZ file.

XYZ has no fingerprint phrases, so the whole prefix must be checked for the correct structure. If the prefix passes, assume the whole file passes.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.xyz')
>>> XYZ().sniff(fname)
True
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si_multi.xyz')
>>> XYZ().sniff(fname)
True
>>> fname = get_test_fname('Si.cif')
>>> XYZ().sniff(fname)
False
>>> fname = get_test_fname('not_a_xyz_file.txt')
>>> XYZ().sniff(fname)
False
read_blocks(lines: List) List[source]

Parses and returns a list of dictionaries representing XYZ structure blocks (aka frames).

Raises IndexError, TypeError, ValueError
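
As an illustration of what reading XYZ blocks involves, the sketch below parses plain XYZ frames (atom count, comment line, then one line per atom). It is not the Galaxy implementation: read_xyz_blocks and the dictionary keys are hypothetical, and the frame layout is the common plain XYZ convention, taken here as an assumption.

```python
from typing import Dict, List


def read_xyz_blocks(lines: List[str]) -> List[Dict]:
    """Parse plain XYZ frames into a list of dictionaries (illustrative only)."""
    blocks = []
    i = 0
    while i < len(lines):
        n_atoms = int(lines[i])              # ValueError if not an integer
        comment = lines[i + 1].rstrip("\n")  # IndexError if the frame is truncated
        atoms = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            # Unpacking raises ValueError if a line has fewer than 4 fields.
            symbol, x, y, z = line.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        if len(atoms) != n_atoms:
            raise IndexError("block ends before all atoms were read")
        blocks.append(
            {"number_of_atoms": n_atoms, "comment": comment, "atom_data": atoms}
        )
        i += 2 + n_atoms
    return blocks
```

Letting malformed input raise instead of returning a partial result mirrors the exceptions listed above; a caller such as a sniffer could treat any of them as "not an XYZ file".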

metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.ExtendedXYZ(**kwd)[source]

Bases: XYZ

Extended XYZ format.

Uses specification from https://github.com/libAtoms/extxyz.

file_ext = 'extxyz'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is an Extended XYZ file.

XYZ files will not pass the ExtendedXYZ sniffer, but ExtendedXYZ files can pass the XYZ sniffer.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.extxyz')
>>> ExtendedXYZ().sniff(fname)
True
>>> fname = get_test_fname('Si.xyz')
>>> ExtendedXYZ().sniff(fname)
False
read_blocks(lines: List) List[source]

Parses and returns a list of XYZ structure blocks (aka frames).

Raises IndexError, TypeError, ValueError

metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.grd(**kwd)[source]

Bases: Text

file_ext = 'grd'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.grdtgz(**kwd)[source]

Bases: Binary

file_ext = 'grd.tgz'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.InChI(**kwd)[source]

Bases: Tabular

file_ext = 'inchi'
column_names = ['InChI']
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is an InChI file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> InChI().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> InChI().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.SMILES(**kwd)[source]

Bases: Tabular

file_ext = 'smi'
column_names = ['SMILES', 'TITLE']
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.CML(**kwd)[source]

Bases: GenericXml

Chemical Markup Language http://cml.sourceforge.net/

file_ext = 'cml'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a CML file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('interval.interval')
>>> CML().sniff(fname)
False
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> CML().sniff(fname)
True
classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by molecule records.

static merge(split_files: List[str], output_file: str) None[source]

Merging CML files.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.molecules.GRO(**kwd)[source]

Bases: GenericMolFile

GROMACS structure format. https://manual.gromacs.org/current/reference-manual/file-formats.html#gro

file_ext = 'gro'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess if the file is a GRO file.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.gro')
>>> GRO().sniff_prefix(fname)
True
>>> fname = get_test_fname('5e5z.pdb')
>>> GRO().sniff_prefix(fname)
False
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.molecules.Magres(**kwd)[source]

Bases: AtomicStructFile

Report on a MAGRES calculation

file_ext = 'magres'
ase_format = 'magres'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a MAGRES log

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('ethanol.magres')
>>> Magres().sniff(fname)
True
>>> fname = get_test_fname('Si.cif')
>>> Magres().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'atom_data': <galaxy.model.metadata.MetadataElementSpec object>, 'chemical_formula': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'is_periodic': <galaxy.model.metadata.MetadataElementSpec object>, 'lattice_parameters': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_atoms': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.mothur module

Mothur Metagenomics Datatypes

class galaxy.datatypes.mothur.Otu(**kwd)[source]

Bases: Text

file_ext = 'mothur.otu'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set metadata for Otu files.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> otu = Otu()
>>> dataset.get_file_name = lambda : get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> dataset.has_data = lambda: True
>>> otu.set_meta(dataset)
>>> dataset.metadata.columns
100
>>> len(dataset.metadata.labels) == 37
True
>>> len(dataset.metadata.otulabels) == 98
True
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in otu (operational taxonomic unit) format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> Otu().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.otu' )
>>> Otu().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.Sabund(**kwd)[source]

Bases: Otu

file_ext = 'mothur.sabund'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Sabund_file

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in otu (operational taxonomic unit) format:

label<TAB>count[<TAB>value(1..n)]

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.sabund' )
>>> Sabund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.sabund' )
>>> Sabund().sniff( fname )
False
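
The label<TAB>count[<TAB>value(1..n)] layout given for Sabund files can be checked line by line. The sketch below is hypothetical (is_sabund_line is not a Galaxy function) and assumes that count states how many integer values follow it on the line; the real sniffer may apply different rules.

```python
def is_sabund_line(line: str) -> bool:
    """Check one line against label<TAB>count[<TAB>value(1..n)]."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return False
    try:
        count = int(fields[1])
        values = [int(v) for v in fields[2:]]
    except ValueError:
        return False
    # Assumption: the count field gives the number of trailing values.
    return len(values) == count
```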
metadata_spec: MetadataSpecCollection = {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.GroupAbund(**kwd)[source]

Bases: Otu

file_ext = 'mothur.shared'
__init__(**kwd)[source]

Initialize the datatype

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = 1, **kwd) None[source]

Set metadata for Otu files.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> otu = Otu()
>>> dataset.get_file_name = lambda : get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> dataset.has_data = lambda: True
>>> otu.set_meta(dataset)
>>> dataset.metadata.columns
100
>>> len(dataset.metadata.labels) == 37
True
>>> len(dataset.metadata.otulabels) == 98
True
sniff_prefix(file_prefix: FilePrefix, vals_are_int=False) bool[source]

Determines whether the file is in otu (operational taxonomic unit) Shared format:

label<TAB>group<TAB>count[<TAB>value(1..n)]

The first line is column headings as of Mothur v 1.2

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.shared' )
>>> GroupAbund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.shared' )
>>> GroupAbund().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.SecondaryStructureMap(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.map'
__init__(**kwd)[source]

Initialize secondary structure map datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in secondary structure map format: a single column with an integer value that indicates the row that this row maps to. The check ensures that if structMap[10] = 380 then structMap[380] = 10, and vice versa.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
False
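
The symmetry condition described for secondary structure maps (if structMap[10] = 380 then structMap[380] = 10) can be sketched as follows. is_symmetric_map is a hypothetical helper, and two points are assumptions: rows are numbered from 1, and a value of 0 marks a row that maps to nothing. The sketch also assumes the whole column is available, whereas a sniffer usually sees only a prefix of the file.

```python
from typing import List


def is_symmetric_map(pointers: List[int]) -> bool:
    """pointers[i - 1] is the row that 1-based row i maps to (0 = unpaired)."""
    for row, target in enumerate(pointers, start=1):
        if target == 0:
            continue  # assumption: 0 means the row is unpaired
        if not 1 <= target <= len(pointers):
            return False  # points outside the map
        if pointers[target - 1] != row:
            return False  # the partner row does not point back
    return True
```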
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.AlignCheck(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.align.check'
__init__(**kwd)[source]

Initialize AlignCheck datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset.

A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header.

A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.AlignReport(**kwd)[source]

Bases: Tabular

QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template
AY457915 501 82283 1525 kmer 89.07 needleman 5 501 1 499 499 2 0 0 97.6

file_ext = 'mothur.align.report'
__init__(**kwd)[source]

Initialize AlignReport datatype

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.DistanceMatrix(**kwd)[source]

Bases: Text

file_ext = 'mothur.dist'
init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = 0, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.LowerTriangleDistanceMatrix(**kwd)[source]

Bases: DistanceMatrix

file_ext = 'mothur.lower.dist'
__init__(**kwd)[source]

Initialize lower-triangle distance matrix datatype

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in lower-triangle distance matrix (phylip) format. The first line has the number of sequences in the matrix. The remaining lines have the sequence name followed by a list of distances from all preceding sequences.

    5  # possibly but not always preceded by a tab :/
    U68589
    U68590  0.3371
    U68591  0.3609  0.3782
    U68592  0.4155  0.3197  0.4148
    U68593  0.2872  0.1690  0.3361  0.2842

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
False
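
The lower-triangle layout described above (a sequence count, then rows whose distance lists grow by one entry each) can be verified with a short structural check. This is a sketch under assumptions rather than Galaxy's sniffer: is_lower_triangle is a hypothetical name, fields are split on any whitespace, and the complete file is assumed to be available.

```python
from typing import List


def is_lower_triangle(lines: List[str]) -> bool:
    """Check the count line and that row i carries i distances (0-based i)."""
    try:
        n = int(lines[0].strip())  # strip() tolerates the optional leading tab
    except (IndexError, ValueError):
        return False
    rows = [line.split() for line in lines[1:] if line.strip()]
    if len(rows) != n:
        return False
    for i, row in enumerate(rows):
        if len(row) != i + 1:  # name plus i distances
            return False
        try:
            for distance in row[1:]:
                float(distance)
        except ValueError:
            return False
    return True
```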
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.SquareDistanceMatrix(**kwd)[source]

Bases: DistanceMatrix

file_ext = 'mothur.square.dist'
__init__(**kwd)[source]

Initialize the datatype

init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in square distance matrix (column-formatted distance matrix) format. The first line has the number of sequences in the matrix. The following lines have the sequence name in the first column plus a column for the distance to each sequence, in the row order in which they appear in the matrix.

    3
    U68589  0.0000  0.3371  0.3610
    U68590  0.3371  0.0000  0.3783
    U68590  0.3371  0.0000  0.3783

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
False
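
For the square layout, every row carries a name plus one distance per sequence. The following sketch checks that shape; is_square_matrix is a hypothetical helper written for illustration, with whitespace-separated fields and a complete file as assumptions.

```python
from typing import List


def is_square_matrix(lines: List[str]) -> bool:
    """Check the count line and that each row has a name plus n distances."""
    try:
        n = int(lines[0].strip())
    except (IndexError, ValueError):
        return False
    rows = [line.split() for line in lines[1:] if line.strip()]
    if len(rows) != n:
        return False
    for row in rows:
        if len(row) != n + 1:
            return False
        try:
            for distance in row[1:]:
                float(distance)
        except ValueError:
            return False
    return True
```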
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.PairwiseDistanceMatrix(**kwd)[source]

Bases: DistanceMatrix, Tabular

file_ext = 'mothur.pair.dist'
__init__(**kwd)[source]

Initialize pairwise distance matrix datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset.

A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header.

A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in pairwise distance matrix (column-formatted distance matrix) format. The first and second columns have the sequence names and the third column is the distance between those sequences.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
False
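
The three-column pairwise layout (two sequence names, then their distance) lends itself to a per-line check. The sketch below is illustrative only: is_pairwise_line is a hypothetical name, and splitting on any whitespace is an assumption about the delimiter.

```python
def is_pairwise_line(line: str) -> bool:
    """Check one line for the shape: name, name, numeric distance."""
    fields = line.split()
    if len(fields) != 3:
        return False
    try:
        float(fields[2])  # the third column must parse as a number
    except ValueError:
        return False
    return True
```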
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.Names(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.names'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Name_file A name file shows the relationship between a representative sequence (col 1) and the comma-separated sequences it represents (col 2).
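
To make the two-column layout of a name file concrete, the sketch below reads it into a dictionary. parse_names is a hypothetical helper and not part of Galaxy; a tab between the two columns is assumed because the datatype derives from Tabular.

```python
from typing import Dict, List


def parse_names(lines: List[str]) -> Dict[str, List[str]]:
    """Map each representative sequence to the sequences it represents."""
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: exactly two tab-separated columns per line.
        representative, members = line.split("\t")
        mapping[representative] = members.split(",")
    return mapping
```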

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.Summary(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.summary'
__init__(**kwd)[source]

Summarizes the quality of sequences in an unaligned or aligned fasta-formatted sequence file.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.Group(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.groups'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Groups_file Group file assigns sequence (col 1) to a group (col 2)
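
A group file of this shape can be read into a sequence-to-group mapping, from which the set of distinct groups follows. The sketch is hypothetical: read_groups is not a Galaxy function, tab-separated columns are assumed, and returning the groups sorted is a choice made here for determinism, not a statement about how the groups metadata element is populated.

```python
from typing import Dict, List, Tuple


def read_groups(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (sequence -> group, sorted list of distinct groups)."""
    assignment: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: sequence name and group are separated by one tab.
        sequence, group = line.split("\t")
        assignment[sequence] = group
    return assignment, sorted(set(assignment.values()))
```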

set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset.

A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header.

A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.AccNos(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.accnos'
__init__(**kwd)[source]

A list of names

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.Oligos(**kwd)[source]

Bases: Text

file_ext = 'mothur.oligos'
sniff_prefix(file_prefix: FilePrefix) bool[source]

http://www.mothur.org/wiki/Oligos_File Determines whether the file is in the mothur oligos format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.oligos' )
>>> Oligos().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.oligos' )
>>> Oligos().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.Frequency(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.freq'
__init__(**kwd)[source]

A list of names

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a frequency tabular format for chimera analysis

#1.14.0
0   0.000
1   0.000
...
155 0.975
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.freq' )
>>> Frequency().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.freq' )
>>> Frequency().sniff( fname )
False
>>> # Expression count matrix (EdgeR wrapper)
>>> fname = get_test_fname( 'mothur_datatypetest_false_2.mothur.freq' )
>>> Frequency().sniff( fname )
False
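A minimal sketch of a check for the layout shown in the example above, assuming a `#`-prefixed version line followed by rows of one integer and one float. `looks_like_frequency` is a hypothetical helper; the actual sniffer may apply different or additional rules.

```python
from typing import Iterable


def looks_like_frequency(lines: Iterable[str]) -> bool:
    """Check for a '#<version>' header followed by '<int> <float>' rows."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) < 2 or not rows[0].startswith("#"):
        return False
    for row in rows[1:]:
        fields = row.split()
        if len(fields) != 2:
            return False
        try:
            int(fields[0])
            float(fields[1])
        except ValueError:
            return False
    return True
```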
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.Quantile(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.quan'
__init__(**kwd)[source]

Quantiles for chimera analysis

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a quantiles tabular format for chimera analysis

1   0       0       0       0       0       0
2       0.309198        0.309198        0.37161 0.37161 0.37161 0.37161
3       0.510982        0.563213        0.693529        0.858939        1.07442 1.20608
...
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.quan' )
>>> Quantile().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.quan' )
>>> Quantile().sniff( fname )
False
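Each example row above holds an integer followed by six floating-point values. A rough check for that layout could look like the sketch below; `looks_like_quantile` is a hypothetical name, and the fixed count of six value columns is an assumption taken from the example rather than from the Galaxy source.

```python
from typing import Iterable


def looks_like_quantile(lines: Iterable[str], value_columns: int = 6) -> bool:
    """Check that every row is '<int>' followed by `value_columns` floats."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return False
    for fields in rows:
        if len(fields) != value_columns + 1:
            return False
        try:
            int(fields[0])
            for value in fields[1:]:
                float(value)
        except ValueError:
            return False
    return True
```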
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'filtered': <galaxy.model.metadata.MetadataElementSpec object>, 'masked': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.LaneMask(**kwd)[source]

Bases: Text

file_ext = 'mothur.filter'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a lane mask filter: 1 line consisting of zeros and ones.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.filter' )
>>> LaneMask().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.filter' )
>>> LaneMask().sniff( fname )
False
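The rule stated above (a single line consisting of zeros and ones) can be sketched as follows. `looks_like_lane_mask` is a hypothetical helper that works on text already read into memory; it is not the Galaxy sniffer.

```python
def looks_like_lane_mask(text: str) -> bool:
    """Return True if the text is exactly one non-empty line of '0' and '1'."""
    lines = text.splitlines()
    if len(lines) != 1:
        return False
    mask = lines[0].strip()
    return bool(mask) and set(mask) <= {"0", "1"}
```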
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.CountTable(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.count_table'
__init__(**kwd)[source]

http://www.mothur.org/wiki/Count_File A table whose first column holds names and whose following columns hold integer counts.

# Example 1:
Representative_Sequence total
U68630  1
U68595  1
U68600  1

# Example 2 (with group columns):
Representative_Sequence total   forest  pasture
U68630  1       1       0
U68595  1       1       0
U68600  1       1       0
U68591  1       1       0
U68647  1       0       1
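As a sketch of how such a table could be read, the hypothetical helper below turns the header and rows into nested dictionaries. It assumes whitespace-separated columns and a single header line, and it is not taken from the Galaxy source.

```python
from typing import Dict, List


def parse_count_table(lines: List[str]) -> Dict[str, Dict[str, int]]:
    """Map each sequence name to its counts, keyed by column heading."""
    header = lines[0].split()
    columns = header[1:]  # 'total' plus any group names
    table: Dict[str, Dict[str, int]] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed rows
        table[fields[0]] = dict(zip(columns, map(int, fields[1:])))
    return table
```

In this layout, any headings after `total` are the group names.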

set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = 1, max_data_lines: int | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.RefTaxonomy(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.ref.taxonomy'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a Reference Taxonomy

http://www.mothur.org/wiki/Taxonomy_outline A table with 2 or 3 columns:

  • SequenceName

  • Taxonomy (semicolon-separated taxonomy in descending order)

  • integer ?

Example: 2-column (http://www.mothur.org/wiki/Taxonomy_outline)

X56533.1        Eukaryota;Alveolata;Ciliophora;Intramacronucleata;Oligohymenophorea;Hymenostomatida;Tetrahymenina;Glaucomidae;Glaucoma;
X97975.1        Eukaryota;Parabasalidea;Trichomonada;Trichomonadida;unclassified_Trichomonadida;
AF052717.1      Eukaryota;Parabasalidea;

Example: 3-column (http://vamps.mbl.edu/resources/databases.php)

v3_AA008    Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus      5
v3_AA016    Bacteria        120
v3_AA019    Archaea;Crenarchaeota;Marine_Group_I    1
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
False
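The two- and three-column layouts above can be split into their parts with a small helper. This is an illustrative sketch only: `parse_taxonomy_line` is a hypothetical name, and treating the optional third column as an integer follows the 3-column example rather than any documented rule.

```python
from typing import List, Optional, Tuple


def parse_taxonomy_line(line: str) -> Tuple[str, List[str], Optional[int]]:
    """Split a line into (sequence name, taxonomy ranks, optional integer)."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(fields)}")
    name, taxonomy = fields[0], fields[1]
    # A trailing ';' leaves an empty last element, which is dropped here.
    ranks = [rank for rank in taxonomy.split(";") if rank]
    count = int(fields[2]) if len(fields) == 3 else None
    return name, ranks, count
```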
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.ConsensusTaxonomy(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.cons.taxonomy'
__init__(**kwd)[source]

A list of names

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.TaxonomySummary(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.tax.summary'
__init__(**kwd)[source]

A Summary of taxon classification

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.mothur.Axes(**kwd)[source]

Bases: Tabular

file_ext = 'mothur.axes'
__init__(**kwd)[source]

Initialize axes datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in axes format. The first line may have column headings. The following lines have the name in the first column plus float columns for each axis.

group   axis1   axis2
forest  0.000000        0.145743
pasture 0.145743        0.000000
        axis1   axis2
U68589  0.262608        -0.077498
U68590  0.027118        0.195197
U68591  0.329854        0.014395
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.axes' )
>>> Axes().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.axes' )
>>> Axes().sniff( fname )
False
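A minimal sketch of reading this layout, tolerating the optional heading line. `parse_axes` is a hypothetical helper that assumes whitespace-separated columns; it is not the Galaxy sniffer.

```python
from typing import Dict, List


def parse_axes(lines: List[str]) -> Dict[str, List[float]]:
    """Map each name (column 1) to its list of axis coordinates."""
    rows: Dict[str, List[float]] = {}
    for index, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue
        try:
            rows[fields[0]] = [float(value) for value in fields[1:]]
        except ValueError:
            if index == 0:
                continue  # the first line may be column headings
            raise
    return rows
```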
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.mothur.SffFlow(**kwd)[source]

Bases: Tabular

https://mothur.org/wiki/flow_file/ The first line is the total number of flow values - 800 for Titanium data. For GS FLX it would be 400. Following lines contain:

  • SequenceName

  • the number of useable flows as defined by 454’s software

  • the flow intensity for each base going in the order of TACG.

Example:

800
GQY1XT001CQL4K 85 1.04 0.00 1.00 0.02 0.03 1.02 0.05 ...
GQY1XT001CQIRF 84 1.02 0.06 0.98 0.06 0.09 1.05 0.07 ...
GQY1XT001CF5YW 88 1.02 0.02 1.01 0.04 0.06 1.02 0.03 ...
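To make the layout concrete, the sketch below reads the flow count from the first line and then, for each read, the number of usable flows and the intensities. `parse_flow_file` is a hypothetical helper, not Galaxy code, and it assumes whitespace-separated fields.

```python
from typing import Dict, List, Tuple


def parse_flow_file(lines: List[str]) -> Tuple[int, Dict[str, Tuple[int, List[float]]]]:
    """Return (total flow values, {name: (usable flows, intensities)})."""
    flow_values = int(lines[0].strip())  # e.g. 800 for Titanium data
    reads: Dict[str, Tuple[int, List[float]]] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, usable = fields[0], int(fields[1])
        intensities = [float(value) for value in fields[2:]]
        reads[name] = (usable, intensities)
    return flow_values, reads
```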
file_ext = 'mothur.sff.flow'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = 1, max_data_lines: int | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

make_html_table(dataset: DatasetProtocol, skipchars: List | None = None, **kwargs) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'flow_order': <galaxy.model.metadata.MetadataElementSpec object>, 'flow_values': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.msa module

class galaxy.datatypes.msa.InfernalCM(**kwd)[source]

Bases: Text

file_ext = 'cm'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'infernal_model.cm' )
>>> InfernalCM().sniff( fname )
True
>>> fname = get_test_fname( '2.txt' )
>>> InfernalCM().sniff( fname )
False
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of models and the version of CM file in dataset.

metadata_spec: MetadataSpecCollection = {'cm_version': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.msa.Hmmer(**kwd)[source]

Bases: Text

edam_data = 'data_1364'
edam_format = 'format_1370'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

abstract sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.msa.Hmmer2(**kwd)[source]

Bases: Hmmer

edam_format = 'format_3328'
file_ext = 'hmm2'
sniff_prefix(file_prefix: FilePrefix) bool[source]

HMMER2 files start with HMMER2.0

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.msa.Hmmer3(**kwd)[source]

Bases: Hmmer

edam_format = 'format_3329'
file_ext = 'hmm3'
sniff_prefix(file_prefix: FilePrefix) bool[source]

HMMER3 files start with HMMER3/f
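The two prefix rules (`HMMER2.0` for Hmmer2, `HMMER3/f` for Hmmer3) can be combined into one lookup, sketched below. `guess_hmmer_ext` and `HMMER_PREFIXES` are hypothetical names for illustration; Galaxy implements the checks in the separate `sniff_prefix` methods.

```python
from typing import Optional

# Prefixes as stated in the Hmmer2 and Hmmer3 docstrings, keyed by file_ext.
HMMER_PREFIXES = {"hmm2": "HMMER2.0", "hmm3": "HMMER3/f"}


def guess_hmmer_ext(first_line: str) -> Optional[str]:
    """Return 'hmm2' or 'hmm3' based on the first line, or None."""
    for ext, prefix in HMMER_PREFIXES.items():
        if first_line.startswith(prefix):
            return ext
    return None
```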

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.msa.HmmerPress(**kwd)[source]

Bases: Binary

Class for hmmpress database files.

file_ext = 'hmmpress'
composite_type: str | None = 'basic'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.msa.Stockholm_1_0(**kwd)[source]

Bases: Text

edam_data = 'data_0863'
edam_format = 'format_1961'
file_ext = 'stockholm'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of models in dataset.

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split the input files by model records.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.msa.MauveXmfa(**kwd)[source]

Bases: Text

file_ext = 'xmfa'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.msa.Msf(**kwd)[source]

Bases: Text

Multiple sequence alignment format produced by the Accelrys GCG suite and other programs.

edam_data = 'data_0863'
edam_format = 'format_1947'
file_ext = 'msf'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.neo4j module

Neo4j Composite Dataset

class galaxy.datatypes.neo4j.Neo4j(**kwd)[source]

Bases: Html

Base class for neostore datatypes, derived from Html. Composite datatype elements are stored in the extra files path.

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

get_mime() str[source]

Returns the mime type of the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML content, used for displaying peek.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.neo4j.Neo4jDB(**kwd)[source]

Bases: Neo4j, Data

Class for neo4jDB database files.

file_ext = 'neostore'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.neo4j.Neo4jDBzip(**kwd)[source]

Bases: Neo4j, Data

Class for neo4jDB database files.

file_ext = 'neostore.zip'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'neostore_zip': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_name': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.ngsindex module

NGS indexes

class galaxy.datatypes.ngsindex.BowtieIndex(**kwd)[source]

Bases: Html

Base class for Bowtie indexes; subclassed by BowtieColorIndex and BowtieBaseIndex.

composite_type: str | None = 'auto_primary_file'
generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

regenerate_primary_file(dataset: DatasetProtocol) None[source]

Cannot do this until we are setting metadata.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.ngsindex.BowtieColorIndex(**kwd)[source]

Bases: BowtieIndex

Bowtie color space index

file_ext = 'bowtie_color_index'
metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.ngsindex.BowtieBaseIndex(**kwd)[source]

Bases: BowtieIndex

Bowtie base space index

file_ext = 'bowtie_base_index'
metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.phylip module

Created on January 05, 2018

@authors: Kenzo-Hugo Hillion and Fabien Mareuil, Institut Pasteur, Paris
@contacts: kehillio@pasteur.fr and fabien.mareuil@pasteur.fr
@project: galaxy
@githuborganization: C3BI

Phylip datatype sniffer

class galaxy.datatypes.phylip.Phylip(**kwd)[source]

Bases: Text

Phylip format stores a multiple sequence alignment

edam_data = 'data_0863'
edam_format = 'format_1997'
file_ext = 'phylip'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_strict_interleaved(nb_seq: int, seq_length: int, alignment_prefix: StringIO) bool[source]
sniff_strict_sequential(nb_seq: int, seq_length: int, alignment_prefix: StringIO) bool[source]
sniff_relaxed_interleaved(nb_seq: int, seq_length: int, alignment_prefix: StringIO) bool[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

All Phylip files start with the number of sequences, so we can use this to count the sequences that follow in the first ‘stack’.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_strict_interleaved.phylip')
>>> Phylip().sniff(fname)
True
>>> fname = get_test_fname('test_relaxed_interleaved.phylip')
>>> Phylip().sniff(fname)
True
>>> fname = get_test_fname("not_a_phylip_file.tabular")
>>> Phylip().sniff(fname)
False
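The idea described above can be sketched as follows, assuming the usual Phylip header of two integers (number of sequences, then sequence length), which matches the `nb_seq` and `seq_length` parameters of the sniff helpers. `parse_phylip_header` and `first_stack_matches` are hypothetical names, and the check is far looser than the strict and relaxed sniffers listed here.

```python
from typing import List, Tuple


def parse_phylip_header(first_line: str) -> Tuple[int, int]:
    """Return (number of sequences, sequence length) from the header line."""
    fields = first_line.split()
    if len(fields) != 2:
        raise ValueError("expected '<nb_seq> <seq_length>'")
    return int(fields[0]), int(fields[1])


def first_stack_matches(lines: List[str]) -> bool:
    """True if the header is followed by nb_seq non-empty lines."""
    nb_seq, _ = parse_phylip_header(lines[0])
    stack = lines[1 : 1 + nb_seq]
    return len(stack) == nb_seq and all(line.strip() for line in stack)
```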
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.plant_tribes module

class galaxy.datatypes.plant_tribes.Smat(**kwd)[source]

Bases: Text

file_ext = 'smat'
display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

The use of ESTScan implies the creation of score matrices that reflect the codon preferences of the studied organisms. The ESTScan package includes scripts for generating these files. The output of these scripts consists of the matrices, one for each isochore, which look like this:

FORMAT: hse_4is.conf CODING REGION 6 3 1 s C+G: 0 44
-1 0 2 -2
2 1 -8 0

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_space.txt')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('test_tab.bed')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('1.smat')
>>> Smat().sniff(fname)
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.plant_tribes.PlantTribesKsComponents(**kwd)[source]

Bases: Tabular

file_ext = 'ptkscmp'
display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of significant components in the Ks distribution. The dataset will always be on the order of less than 10 lines.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff(filename: str) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_tab.bed')
>>> PlantTribesKsComponents().sniff(fname)
False
>>> fname = get_test_fname('1.ptkscmp')
>>> PlantTribesKsComponents().sniff(fname)
True
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_comp': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.proteomics module

Proteomics Datatypes

class galaxy.datatypes.proteomics.Wiff(**kwd)[source]

Bases: Binary

Class for wiff files.

edam_data = 'data_2536'
edam_format = 'format_3710'
file_ext = 'wiff'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Wiff2(**kwd)[source]

Bases: Binary

Class for wiff2 files.

edam_data = 'data_2536'
edam_format = 'format_3710'
file_ext = 'wiff2'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzTab(**kwd)[source]

Bases: Text

exchange format for proteomics and metabolomics results

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mztab')
>>> MzTab().sniff(fname)
True
>>> fname = get_test_fname('test.mztab2')
>>> MzTab().sniff(fname)
False
edam_data = 'data_3681'
file_ext = 'mztab'
__init__(**kwd)[source]

Initialize the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is the correct type.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.MzTab2(**kwd)[source]

Bases: MzTab

exchange format for proteomics and metabolomics results

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mztab2')
>>> MzTab2().sniff(fname)
True
>>> fname = get_test_fname('test.mztab')
>>> MzTab2().sniff(fname)
False
file_ext = 'mztab2'
__init__(**kwd)[source]

Initialize the datatype

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Kroenik(**kwd)[source]

Bases: Tabular

Kroenik (HardKloer sibling) files

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.kroenik')
>>> Kroenik().sniff(fname)
True
>>> fname = get_test_fname('test.peplist')
>>> Kroenik().sniff(fname)
False
file_ext = 'kroenik'
__init__(**kwd)[source]

Initialize the datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.PepList(**kwd)[source]

Bases: Tabular

Peplist file as used in OpenMS https://github.com/OpenMS/OpenMS/blob/0fc8765670a0ad625c883f328de60f738f7325a4/src/openms/source/FORMAT/FileHandler.cpp#L432

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.peplist')
>>> PepList().sniff(fname)
True
>>> fname = get_test_fname('test.psms')
>>> PepList().sniff(fname)
False
file_ext = 'peplist'
__init__(**kwd)[source]

Initialize the datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.PSMS(**kwd)[source]

Bases: Tabular

Percolator tab-delimited output (PSM level, .psms) as used in OpenMS https://github.com/OpenMS/OpenMS/blob/0fc8765670a0ad625c883f328de60f738f7325a4/src/openms/source/FORMAT/FileHandler.cpp#L453; see also http://www.kojak-ms.org/docs/percresults.html

Note that the data rows can have more columns than the header line since ProteinIds are listed tab-separated.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.psms')
>>> PSMS().sniff(fname)
True
>>> fname = get_test_fname('test.kroenik')
>>> PSMS().sniff(fname)
False
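The note about data rows being wider than the header can be illustrated with a small parser. This is only a sketch, not Galaxy's or OpenMS's code: it assumes the protein ids belong to the last header column, and the function name `parse_psms` and the column names in the example are hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, List[str]]]


def parse_psms(text: str) -> List[Row]:
    """Parse tab-delimited text whose last column may spill into extra fields."""
    lines = [ln for ln in text.splitlines() if ln]
    header = lines[0].split("\t")
    n_fixed = len(header) - 1  # assumption: every column but the last is single-valued
    rows: List[Row] = []
    for ln in lines[1:]:
        fields = ln.split("\t")
        row: Row = dict(zip(header[:n_fixed], fields[:n_fixed]))
        # All remaining fields are collected under the last header name.
        row[header[-1]] = fields[n_fixed:]
        rows.append(row)
    return rows


# Illustrative column names; a real file has more columns.
example = "PSMId\tscore\tproteinIds\npsm1\t0.9\tP1\tP2\npsm2\t0.5\tP3\n"
rows = parse_psms(example)
```

Folding the overflow into a list keeps every row keyed by the same header names even when the field count varies.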
file_ext = 'psms'
__init__(**kwd)[source]

Initialize the datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.PEFF(**kwd)[source]

Bases: Sequence

PSI Extended FASTA Format https://github.com/HUPO-PSI/PEFF

file_ext = 'peff'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.peff' )
>>> PEFF().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> PEFF().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.PepXmlReport(**kwd)[source]

Bases: Tabular

pepxml converted to tabular report

edam_data = 'data_2536'
file_ext = 'pepxml.tsv'
__init__(**kwd)[source]

Initialize the datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.ProtXmlReport(**kwd)[source]

Bases: Tabular

protxml converted to tabular report

edam_data = 'data_2536'
file_ext = 'protxml.tsv'
comment_lines = 1
__init__(**kwd)[source]

Initialize the datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Dta(**kwd)[source]

Bases: TabularData

dta: The first line contains the singly protonated peptide mass (MH+) and the peptide charge state, separated by a space. Subsequent lines contain space-separated pairs of fragment ion m/z and intensity values.
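The layout described above can be read with a few lines of Python. The sketch below is not part of Galaxy; `parse_dta` is a hypothetical helper that assumes well-formed input and does no validation beyond what `float` and `int` enforce.

```python
from typing import List, Tuple


def parse_dta(text: str) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Return (MH+ mass, charge, [(m/z, intensity), ...]) from dta text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: precursor MH+ mass and charge state.
    mass_str, charge_str = lines[0].split()
    peaks = []
    # Remaining lines: one fragment ion per line as "m/z intensity".
    for ln in lines[1:]:
        mz, intensity = ln.split()
        peaks.append((float(mz), float(intensity)))
    return float(mass_str), int(charge_str), peaks


example = "1234.56 2\n100.1 50.0\n200.2 75.5\n"
mass, charge, peaks = parse_dta(example)
```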

file_ext = 'dta'
comment_lines = 0
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Dta2d(**kwd)[source]

Bases: TabularData

dta2d: files with three tab- or space-separated columns. The default format is: retention time (seconds), m/z, intensity. If the first line starts with ‘#’, a different order is defined by the order of the keywords ‘MIN’ (retention time in minutes) or ‘SEC’ (retention time in seconds), ‘MZ’, and ‘INT’. Example: ‘#MZ MIN INT’. The peaks of one retention time have to be in subsequent lines.

Note: the sniffer only detects (tab- or space-separated) dta2d files with a correct header; without a header the format seems too generic.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.dta2d')
>>> Dta2d().sniff(fname)
True
>>> fname = get_test_fname('test.edta')
>>> Dta2d().sniff(fname)
False
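The header convention described for dta2d can be sketched as a function that maps each keyword to its column index. This is an illustration under stated assumptions, not the logic of `Dta2d.sniff_prefix`: the name `dta2d_column_order` is hypothetical, and a file without a header is simply given the default order.

```python
from typing import Dict, Optional

DEFAULT_ORDER = ("SEC", "MZ", "INT")  # default: retention time in seconds, m/z, intensity


def dta2d_column_order(first_line: str) -> Optional[Dict[str, int]]:
    """Map each column keyword to its index, or return None for an invalid header."""
    if first_line.startswith("#"):
        keys = first_line[1:].split()  # splits on tabs or spaces
    else:
        keys = list(DEFAULT_ORDER)
    # A valid header names MZ, INT and exactly one of MIN or SEC.
    if sorted(keys) not in (["INT", "MIN", "MZ"], ["INT", "MZ", "SEC"]):
        return None
    return {key: idx for idx, key in enumerate(keys)}


order = dta2d_column_order("#MZ MIN INT")
```

Here `order["MIN"]` is the index of the retention-time column, and the key itself says that the values are in minutes.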
file_ext = 'dta2d'
comment_lines = 0
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.Edta(**kwd)[source]

Bases: TabularData

Input text file containing tab, space or comma separated columns. The separator between columns is checked in the first line in this order.

It supports three variants of this format.

  1. Columns are: RT, MZ, Intensity. A header is optional.

  2. Columns are: RT, MZ, Intensity, Charge, <Meta-Data> columns{0,}. A header is mandatory.

  3. Columns are: (RT, MZ, Intensity, Charge){1,}, <Meta-Data> columns{0,}. A header is mandatory. The first quadruplet is the consensus. All following quadruplets describe the sub-features. This variant is discerned from variant #2 by the name of the fifth column, which is required to be RT1 (or rt1). All other column names for sub-features are faithfully ignored.

Note: the sniffer only detects files with a header.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.edta')
>>> Edta().sniff(fname)
True
>>> fname = get_test_fname('test.dta2d')
>>> Edta().sniff(fname)
False
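The three edta variants can be told apart from the header alone. The sketch below is a simplified illustration rather than the Galaxy sniffer. It assumes the header spells the leading columns exactly as listed in the format description (compared case-insensitively), and both function names are hypothetical.

```python
from typing import List, Optional


def split_edta_line(line: str) -> List[str]:
    """Split on the first separator present, checked in the order tab, space, comma."""
    for sep in ("\t", " ", ","):
        if sep in line:
            return [field.strip() for field in line.split(sep) if field.strip()]
    return [line.strip()]


def edta_variant(header: List[str]) -> Optional[int]:
    """Return 1, 2 or 3 for a recognised header, otherwise None."""
    names = [name.lower() for name in header]
    if names[:3] != ["rt", "mz", "intensity"]:
        return None
    if len(names) == 3:
        return 1
    if names[3] != "charge":
        return None
    # A fifth column named RT1 (or rt1) marks the sub-feature variant.
    if len(names) >= 5 and names[4] == "rt1":
        return 3
    return 2


variant = edta_variant(split_edta_line("RT,MZ,Intensity,Charge,note"))
```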
file_ext = 'edta'
comment_lines = 0
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.ProteomicsXml(**kwd)[source]

Bases: GenericXml

An enhanced XML datatype used to reuse code across several proteomic/mass-spec datatypes.

edam_data = 'data_2536'
edam_format = 'format_2032'
root: str
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is the correct XML type.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
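The subclasses of ProteomicsXml differ mainly in their `root` attribute, which suggests that sniffing comes down to checking the root element. The sketch below shows one way such a check could work. It is an assumption, not the actual `sniff_prefix` implementation: `matches_root` is a hypothetical function that treats `root` as a regular-expression alternation and tests it against the first line that is neither blank nor an XML declaration.

```python
import re


def matches_root(xml_text: str, root: str) -> bool:
    """Check whether the first element of xml_text is named by the root pattern."""
    for line in xml_text.splitlines():
        line = line.strip()
        if not line or line.startswith("<?"):
            continue  # skip blank lines and XML declarations
        # Allow an optional namespace prefix such as <ns:root ...>.
        pattern = r"<(\w+:)?(?:%s)\b" % root
        return re.match(pattern, line) is not None
    return False


is_mzml = matches_root('<?xml version="1.0"?>\n<mzML xmlns="x">', "mzML|indexedmzML")
```

XML comments or a DOCTYPE before the root element are not handled in this sketch.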

class galaxy.datatypes.proteomics.ParamXml(**kwd)[source]

Bases: ProteomicsXml

Store parameters in XML format

file_ext = 'paramxml'
blurb = 'parameters in xmls'
root: str = 'parameters|PARAMETERS'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.PepXml(**kwd)[source]

Bases: ProteomicsXml

pepXML data

edam_format = 'format_3655'
file_ext = 'pepxml'
blurb = 'pepXML data'
root: str = 'msms_pipeline_analysis'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MascotXML(**kwd)[source]

Bases: ProteomicsXml

Mascot XML data

file_ext = 'mascotxml'
blurb = 'mascot Mass Spectrometry data'
root: str = 'mascot_search_results'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzML(**kwd)[source]

Bases: ProteomicsXml

mzML data

edam_format = 'format_3244'
file_ext = 'mzml'
blurb = 'mzML Mass Spectrometry data'
root: str = '(mzML|indexedmzML)'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.NmrML(**kwd)[source]

Bases: ProteomicsXml

nmrML data

file_ext = 'nmrml'
blurb = 'nmrML NMR data'
root: str = 'nmrML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.ProtXML(**kwd)[source]

Bases: ProteomicsXml

protXML data

file_ext = 'protxml'
blurb = 'prot XML Search Results'
root: str = 'protein_summary'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzXML(**kwd)[source]

Bases: ProteomicsXml

mzXML data

edam_format = 'format_3654'
file_ext = 'mzxml'
blurb = 'mzXML Mass Spectrometry data'
root: str = 'mzXML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzData(**kwd)[source]

Bases: ProteomicsXml

mzData data

edam_format = 'format_3245'
file_ext = 'mzdata'
blurb = 'mzData Mass Spectrometry data'
root: str = 'mzData'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzIdentML(**kwd)[source]

Bases: ProteomicsXml

edam_format = 'format_3247'
file_ext = 'mzid'
blurb = 'XML identified peptides and proteins.'
root: str = 'MzIdentML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.TraML(**kwd)[source]

Bases: ProteomicsXml

edam_format = 'format_3246'
file_ext = 'traml'
blurb = 'TraML transition list'
root: str = 'TraML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.TrafoXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'trafoxml'
blurb = 'RT alignment tranformation'
root: str = 'TrafoXML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MzQuantML(**kwd)[source]

Bases: ProteomicsXml

edam_format = 'format_3248'
file_ext = 'mzq'
blurb = 'XML quantification data'
root: str = 'MzQuantML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.ConsensusXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'consensusxml'
blurb = 'OpenMS multiple LC-MS map alignment file'
root: str = 'consensusXML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.FeatureXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'featurexml'
blurb = 'OpenMS feature file'
root: str = 'featureMap'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.IdXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'idxml'
blurb = 'OpenMS identification file'
root: str = 'IdXML'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.TandemXML(**kwd)[source]

Bases: ProteomicsXml

edam_format = 'format_3711'
file_ext = 'tandem'
blurb = 'X!Tandem search results file'
root: str = 'bioml'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.UniProtXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'uniprotxml'
blurb = 'UniProt Proteome file'
root: str = 'uniprot'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.XquestXML(**kwd)[source]

Bases: ProteomicsXml

file_ext = 'xquest.xml'
blurb = 'XQuest XML file'
root: str = 'xquest_results'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.XquestSpecXML(**kwd)[source]

Bases: ProteomicsXml

spec.xml

file_ext = 'spec.xml'
blurb = 'xquest_spectra'
root: str = 'xquest_spectra'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.QCML(**kwd)[source]

Bases: ProteomicsXml

qcml https://github.com/OpenMS/OpenMS/blob/113c49d01677f7f03343ce7cd542d83c99b351ee/share/OpenMS/SCHEMAS/mzQCML_0_0_5.xsd https://github.com/OpenMS/OpenMS/blob/3cfc57ad1788e7ab2bd6dd9862818b2855234c3f/share/OpenMS/SCHEMAS/qcML_0.0.7.xsd

file_ext = 'qcml'
blurb = 'QualityAssessments to runs'
root: str = 'qcML|MzQualityML)'
metadata_spec: metadata.MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Mgf(**kwd)[source]

Bases: Text

Mascot Generic Format data

edam_data = 'data_2536'
edam_format = 'format_3651'
file_ext = 'mgf'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.MascotDat(**kwd)[source]

Bases: Text

Mascot search results

edam_data = 'data_2536'
edam_format = 'format_3713'
file_ext = 'mascotdat'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.ThermoRAW(**kwd)[source]

Bases: Binary

Class describing a Thermo Finnigan binary RAW file

edam_data = 'data_2536'
edam_format = 'format_3712'
file_ext = 'thermo.raw'
sniff(filename: str) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Msp(**kwd)[source]

Bases: Text

Output of NIST MS Search Program chemdata.nist.gov/mass-spc/ftp/mass-spc/PepLib.pdf

file_ext = 'msp'
static next_line_starts_with(contents: IO, prefix: str) bool[source]
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a NIST MSP output file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.SPLibNoIndex(**kwd)[source]

Bases: Text

SPlib without index file

file_ext = 'splib_noindex'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.SPLib(**kwd)[source]

Bases: Msp

SpectraST Spectral Library. Closely related to msp format

file_ext = 'splib'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a SpectraST generated file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.Ms2(**kwd)[source]

Bases: Text

file_ext = 'ms2'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a valid ms2 file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.proteomics.XHunterAslFormat(**kwd)[source]

Bases: Binary

Annotated Spectra in the HLF format http://www.thegpm.org/HUNTER/format_2006_09_15.html

file_ext = 'hlf'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.Sf3(**kwd)[source]

Bases: Binary

Class describing a Scaffold SF3 file

file_ext = 'sf3'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.proteomics.ImzML(**kwd)[source]

Bases: Binary

Class for imzML files. http://www.imzml.org

edam_format = 'format_3682'
file_ext = 'imzml'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.qiime2 module

class galaxy.datatypes.qiime2.QIIME2Artifact(**kwd)[source]

Bases: _QIIME2ResultBase

file_ext = 'qza'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'semantic_type': <galaxy.model.metadata.MetadataElementSpec object>, 'semantic_type_simple': <galaxy.model.metadata.MetadataElementSpec object>, 'uuid': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.qiime2.QIIME2Visualization(**kwd)[source]

Bases: _QIIME2ResultBase

file_ext = 'qzv'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'semantic_type': <galaxy.model.metadata.MetadataElementSpec object>, 'semantic_type_simple': <galaxy.model.metadata.MetadataElementSpec object>, 'uuid': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.qiime2.QIIME2Metadata(**kwd)[source]

Bases: Tabular

QIIME 2 supports overriding the type of a column to Categorical when a specific directive #q2:types is present under the ID row.

Galaxy already understands column types quite well, however we sometimes want to override its inferred type.

For Galaxy, we are going to require that if a directive occurs, it happens on the second line (after the header). This is the most typical location and interacts best with the current implementation of Tabular.
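The override described above can be sketched as a post-processing step on already inferred column types. This is not the `set_meta` implementation: `apply_q2_types` is a hypothetical helper, and mapping a categorical column to the type name `"str"` is an assumption made for illustration.

```python
from typing import List


def apply_q2_types(inferred: List[str], lines: List[str]) -> List[str]:
    """Override inferred column types using a '#q2:types' directive on line two."""
    types = list(inferred)
    if len(lines) < 2:
        return types
    fields = lines[1].rstrip("\r\n").split("\t")
    if fields[0] != "#q2:types":
        return types  # no directive directly after the header
    # The first field is the directive itself, so overrides start at column 1.
    for idx, value in enumerate(fields[1:], start=1):
        if idx < len(types) and value.strip().lower() == "categorical":
            types[idx] = "str"
    return types


lines = ["id\tdepth\tsite", "#q2:types\tcategorical\tcategorical", "s1\t10\tgut"]
overridden = apply_q2_types(["str", "int", "str"], lines)
```

In the example, the numeric-looking `depth` column would be inferred as `int`, and the directive forces it to be treated as text.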

file_ext = 'qiime2.tabular'
get_column_names(first_line: str) List[str] | None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Let Galaxy’s Tabular format handle most of this. We will just jump in at the last minute to (potentially) override some column types.

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.qualityscore module

Qualityscore class

class galaxy.datatypes.qualityscore.QualityScore(**kwd)[source]

Bases: Text

until we know more about quality score formats

edam_data = 'data_2048'
edam_format = 'format_3606'
file_ext = 'qual'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.qualityscore.QualityScoreSOLiD(**kwd)[source]

Bases: QualityScore

until we know more about quality score formats

edam_format = 'format_3610'
file_ext = 'qualsolid'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScoreSOLiD().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qualsolid' )
>>> QualityScoreSOLiD().sniff( fname )
True
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.qualityscore.QualityScore454(**kwd)[source]

Bases: QualityScore

until we know more about quality score formats

edam_format = 'format_3611'
file_ext = 'qual454'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScore454().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qual454' )
>>> QualityScore454().sniff( fname )
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.qualityscore.QualityScoreSolexa(**kwd)[source]

Bases: QualityScore

until we know more about quality score formats

edam_format = 'format_3608'
file_ext = 'qualsolexa'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.qualityscore.QualityScoreIllumina(**kwd)[source]

Bases: QualityScore

until we know more about quality score formats

edam_format = 'format_3609'
file_ext = 'qualillumina'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.registry module

Provides mapping between extensions and datatypes, mime-types, etc.

exception galaxy.datatypes.registry.ConfigurationError[source]

Bases: Exception

class galaxy.datatypes.registry.Registry(config=None)[source]

Bases: object

__init__(config=None)[source]
load_datatypes(root_dir=None, config=None, override=True, use_converters=True, use_display_applications=True, use_build_sites=True)[source]

Parse a datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository.

get_legacy_sites_by_build(site_type, build)[source]
get_display_sites(site_type)[source]
load_datatype_sniffers(root, override=False, compressed_sniffers=None)[source]

Process the sniffers element from a parsed datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository.

get_datatype_from_filename(name)[source]
is_extension_unsniffable_binary(ext)[source]
get_datatype_class_by_name(name)[source]

Return the datatype class where the datatype’s type attribute (as defined in the datatypes_conf.xml file) contains name.

get_available_tracks()[source]
get_mimetype_by_extension(ext, default='application/octet-stream')[source]

Returns a mimetype based on an extension

get_datatype_by_extension(ext) Data | None[source]

Returns a datatype object based on an extension

change_datatype(data, ext)[source]
load_datatype_converters(toolbox, use_cached=False)[source]

Add datatype converters from self.converters to the calling app’s toolbox.

load_display_applications(app)[source]

Add display applications from self.display_app_containers to the appropriate datatypes.

reload_display_applications(display_application_ids=None)[source]

Reloads display applications by id, or all of them if no ids are provided. Returns tuple( [reloaded_ids], [failed_ids] ).

load_external_metadata_tool(toolbox)[source]

Adds a tool which is used to set external metadata

set_default_values()[source]
get_converters_by_datatype(ext)[source]

Returns available converters by source type

get_converter_by_target_type(source_ext, target_ext)[source]

Returns a converter based on source and target datatypes

find_conversion_destination_for_dataset_by_extensions(dataset_or_ext: str | DatasetProtocol, accepted_formats: Iterable[str | Data], converter_safe: bool = True) Tuple[bool, str | None, DatasetProtocol | None][source]

Returns (direct_match, converted_ext, converted_dataset). direct_match is True if and only if the dataset already has an accepted format. converted_ext (the target extension) is None if conversion is not possible (or not necessary).
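The return contract can be illustrated with a toy lookup. This is only a sketch under stated assumptions: `find_conversion_by_extension` and the `converters` mapping (source extension to a set of target extensions) are hypothetical stand-ins, formats are plain extension strings, and no dataset objects exist, so the third element is always None here.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def find_conversion_by_extension(
    ext: str,
    accepted_exts: Iterable[str],
    converters: Dict[str, Set[str]],
) -> Tuple[bool, Optional[str], None]:
    """Hypothetical helper mirroring the (direct_match, converted_ext, converted_dataset) shape."""
    accepted = list(accepted_exts)
    if ext in accepted:
        # Already an accepted format: direct match, nothing to convert.
        return True, None, None
    targets = converters.get(ext, set())
    for candidate in accepted:
        if candidate in targets:
            # A converter from ext to candidate exists.
            return False, candidate, None
    # Neither a direct match nor a possible conversion.
    return False, None, None


converters = {"fastq": {"fasta"}}  # invented for illustration
print(find_conversion_by_extension("fastq", ["fasta"], converters))
```

The three possible outcomes are a direct match, a conversion target, or neither.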

get_composite_extensions()[source]
get_upload_metadata_params(context, group, tool)[source]

Returns dict of case value:inputs for metadata conditional for upload tool

property edam_formats
property edam_data
to_xml_file(path)[source]
get_extension(elem)[source]

Function which returns the extension lowercased.

:param elem:
:return extension:

galaxy.datatypes.registry.upload_warning(template: Template | None, auto_compressed_type: str | None = None) str | None[source]
galaxy.datatypes.registry.example_datatype_registry_for_sample(sniff_compressed_dynamic_datatypes_default=True)[source]

galaxy.datatypes.sequence module

Sequence classes

class galaxy.datatypes.sequence.SequenceSplitLocations(**kwd)[source]

Bases: Text

Class storing information about a sequence file composed of multiple gzip files concatenated as one OR an uncompressed file. In the GZIP case, each sub-file’s location is stored in start and end.

The format of the file is JSON:

{ "sections" : [
        { "start" : "x", "end" : "y", "sequences" : "z" },
        ...
]}
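A minimal sketch of reading such a table of contents with the standard library. The sample values are invented for illustration, and `total_sequences` is a hypothetical helper, not part of Galaxy.

```python
import json

# Hypothetical fqtoc content following the layout shown above.
toc_text = """
{ "sections" : [
    { "start" : "0",  "end" : "74",  "sequences" : "10" },
    { "start" : "74", "end" : "148", "sequences" : "10" }
]}
"""


def total_sequences(toc: dict) -> int:
    """Sum the per-section sequence counts (values may be strings or ints)."""
    return sum(int(section["sequences"]) for section in toc["sections"])


toc = json.loads(toc_text)
print(total_sequences(toc))
```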
file_ext = 'fqtoc'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.Sequence(**kwd)[source]

Bases: Text

Class describing a sequence

edam_data = 'data_2044'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

static get_sequences_per_file(total_sequences: int, split_params: Dict) List[source]
classmethod do_slow_split(input_datasets, subdir_generator_function, split_params)[source]
classmethod do_fast_split(input_datasets, toc_file_datasets, subdir_generator_function, split_params)[source]
classmethod write_split_files(input_datasets, toc_file_datasets, subdir_generator_function, sequences_per_file)[source]
classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split a generic sequence file (not sensible or possible, see subclasses).

static get_split_commands_with_toc(input_name: str, output_name: str, toc_file: Any, start_sequence: int, sequence_count: int) List[source]

Uses a Table of Contents dict, parsed from an FQTOC file, to come up with a set of shell commands that will extract the parts necessary.

>>> three_sections=[dict(start=0, end=74, sequences=10), dict(start=74, end=148, sequences=10), dict(start=148, end=148+76, sequences=10)]
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=10)
['dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=1, sequence_count=5)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +5 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=20)
['dd bs=1 skip=0 count=148 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=10)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', '(dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=10, sequence_count=10)
['dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=20)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', 'dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz', '(dd bs=1 skip=148 count=76 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
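The bookkeeping behind these examples, deciding which TOC sections overlap the requested range and how many sequences to skip and take in each, can be sketched as follows. `plan_sections` is a hypothetical helper written for illustration. It does not build shell commands and is not Galaxy's implementation.

```python
def plan_sections(sections, start_sequence, sequence_count):
    """Return (section_index, skip, take) for every section overlapping the range.

    skip is the number of sequences to drop at the start of the section,
    take is the number of sequences to keep from it.
    """
    plan = []
    first = 0  # index of the first sequence held by the current section
    end_sequence = start_sequence + sequence_count
    for index, section in enumerate(sections):
        last = first + int(section["sequences"])
        lo = max(first, start_sequence)
        hi = min(last, end_sequence)
        if lo < hi:  # the section and the requested range overlap
            plan.append((index, lo - first, hi - lo))
        first = last
    return plan


three_sections = [
    dict(start=0, end=74, sequences=10),
    dict(start=74, end=148, sequences=10),
    dict(start=148, end=148 + 76, sequences=10),
]
print(plan_sections(three_sections, start_sequence=5, sequence_count=20))
```

A section whose plan entry has skip 0 and takes all of its sequences can be copied byte for byte, which matches the plain `dd` commands in the examples. Partially used sections need the decompress, trim, and recompress pipeline.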

static get_split_commands_sequential(is_compressed: bool, input_name: str, output_name: str, start_sequence: int, sequence_count: int) List[source]

Does a brain-dead sequential scan & extract of certain sequences.

>>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
>>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
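The line arithmetic visible in these examples can be sketched in a few lines. This assumes four lines per record, which is inferred from the example outputs rather than stated here. `sequential_split_command` is a hypothetical stand-in that returns a single command string.

```python
def sequential_split_command(is_compressed, input_name, output_name,
                             start_sequence, sequence_count,
                             lines_per_record=4):
    """Build a tail/head pipeline that extracts a run of records (sketch only)."""
    # tail -n +K starts printing at line K, which is 1-based.
    start_line = start_sequence * lines_per_record + 1
    line_count = sequence_count * lines_per_record
    if is_compressed:
        return (f'zcat "{input_name}" | ( tail -n +{start_line} 2> /dev/null) '
                f'| head -{line_count} | gzip -c > "{output_name}"')
    return (f'tail -n +{start_line} "{input_name}" 2> /dev/null '
            f'| head -{line_count} > "{output_name}"')


print(sequential_split_command(False, "./input.fastq", "./output.fastq", 10, 10))
```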

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.Alignment(**kwd)[source]

Bases: Text

Class describing an alignment

edam_data = 'data_0863'
classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split a generic alignment file (not sensible or possible, see subclasses).

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'species': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.Fasta(**kwd)[source]

Bases: Sequence

Class representing a FASTA sequence

edam_format = 'format_1929'
file_ext = 'fasta'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in a FASTA dataset.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in fasta format

A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (“>”) symbol in the first column. All lines should be shorter than 80 characters.

For complete details see http://www.ncbi.nlm.nih.gov/blast/fasta.shtml

Rules for sniffing as True:

We don’t care about line length (other than empty lines).

The first non-empty line must start with ‘>’ and the Very Next line.strip() must have sequence data and not be a header.

‘sequence data’ here is loosely defined as non-empty lines which do not start with ‘>’

This will cause Color Space FASTA (csfasta) to be detected as True (they are, after all, still FASTA files - they have a header line followed by sequence data)

Previously this method did some checking to determine if the sequence data had integers (presumably to differentiate between fasta and csfasta)

This should be done through sniff order, where csfasta (which currently has a null sniff function) is checked first (stricter definition), followed sometime after by fasta

We will only check that the first purported sequence is correctly formatted.
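The rules above can be sketched as a standalone check on a text prefix. `looks_like_fasta` is a hypothetical function written for illustration. The real sniffer works on a FilePrefix and may differ in detail.

```python
def looks_like_fasta(text: str) -> bool:
    """Apply the sniffing rules above to the start of a file (sketch only)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue  # leading empty lines are ignored
        if not stripped.startswith(">"):
            return False  # first non-empty line must be a header
        if i + 1 >= len(lines):
            return False  # header with nothing after it
        following = lines[i + 1].strip()
        # The very next line must be non-empty sequence data, not another header.
        return bool(following) and not following.startswith(">")
    return False  # no non-empty line at all


print(looks_like_fasta(">seq1 description\nACGTACGT\n"))
```

Only the first header and the line after it are inspected, in line with the final rule.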

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Fasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Fasta().sniff( fname )
True
classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

Split a FASTA file sequence by sequence.

Note that even if split_mode="number_of_parts", the actual number of sub-files produced may not match that requested by split_size.

If split_mode="to_size" then split_size is treated as the number of FASTA records to put in each sub-file (not size in bytes).
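As a rough illustration of the "to_size" behaviour, the sketch below groups FASTA records into chunks of a fixed record count. `split_fasta_records` is a hypothetical in-memory helper. The real method writes sub-files through `subdir_generator_function`, which is not modelled here.

```python
def split_fasta_records(text: str, records_per_file: int):
    """Group FASTA records into chunks of records_per_file records each."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and current:
            # A new header closes the previous record.
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return [
        "".join(records[i:i + records_per_file])
        for i in range(0, len(records), records_per_file)
    ]


chunks = split_fasta_records(">a\nAC\n>b\nGT\n>c\nTT\n", records_per_file=2)
print(len(chunks))
```

The last chunk may hold fewer records than requested, which is one reason a count of parts and a per-part size cannot always both be honoured exactly.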

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.csFasta(**kwd)[source]

Bases: Sequence

Class representing the SOLID Color-Space sequence ( csfasta )

edam_format = 'format_3589'
file_ext = 'csfasta'
sniff_prefix(file_prefix: FilePrefix) bool[source]
Color-space sequence:

>2_15_85_F3 T213021013012303002332212012112221222112212222

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> csFasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> csFasta().sniff( fname )
True
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.Fastg(**kwd)[source]

Bases: Sequence

Class representing a FASTG sequence http://fastg.sourceforge.net/FASTG_Spec_v1.00.pdf

edam_format = 'format_3823'
file_ext = 'fastg'
sniff_prefix(file_prefix: FilePrefix) bool[source]
FASTG must begin with lines:

#FASTG:begin;
#FASTG:version=*.*;
#FASTG:properties;

Or these can be combined on a line:

#FASTG:begin:version=*.*:properties;

FASTG must end with line:

#FASTG:end;

Example FASTG file:

#FASTG:begin;
#FASTG:version=1.0:assembly_name="tiny example";
>chr1:chr1;
ACGANNNNN[5:gap:size=(5,4..6)]CAGGC[1:alt:allele|C,G]TATACG
>chr2;
ACATACGCATATATATATATATATATAT[20:tandem:size=(10,8..12)|AT]TCAGGCA[1:alt|A,T,TT]GGAC
#FASTG:end;

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Fastg().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.fastg' )
>>> Fastg().sniff( fname )
True
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'properties': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.BaseFastq(**kwd)[source]

Bases: Sequence

Base class for FastQ sequences

edam_format = 'format_1930'
file_ext = 'fastq'
bases_regexp = re.compile('^[NGTAC 0123\\.]*$', re.IGNORECASE)
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset. FIXME: This does not properly handle line wrapping

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml

Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina

These differ in the representation of the quality scores

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('4.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('3.fastq')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> fname = get_test_fname('2.fastq')
>>> Fastq().sniff(fname)
True
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastq')
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastqcssanger')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> FastqCSSanger().sniff(fname)
True
display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful if overriding this method and this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

classmethod split(input_datasets: List, subdir_generator_function: Callable, split_params: Dict | None) None[source]

FASTQ files are split on cluster boundaries, in increments of 4 lines

static process_split_file(data: Dict) bool[source]

This is called in the context of an external process launched by a Task (possibly not on the Galaxy machine) to create the input files for the Task. The parameters: data - a dict containing the contents of the split file

static quality_check(lines: Iterable) bool[source]
classmethod check_first_block(file_prefix: FilePrefix)[source]
classmethod check_block(block: List) bool[source]
validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.Fastq(**kwd)[source]

Bases: BaseFastq

Class representing a generic FASTQ sequence

edam_format = 'format_1930'
file_ext = 'fastq'
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.FastqSanger(**kwd)[source]

Bases: Fastq

Class representing a FASTQ sequence (the Sanger variant)

phred scored quality values 0:50 represented by ASCII 33:83
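Decoding a quality line under this encoding amounts to subtracting the ASCII offset from each character. A minimal sketch, with `decode_qualities` as a hypothetical helper whose offset defaults to 33 for the Sanger variant:

```python
from typing import List


def decode_qualities(quality_line: str, offset: int = 33) -> List[int]:
    """Map each quality character to its score by subtracting the ASCII offset."""
    return [ord(ch) - offset for ch in quality_line.rstrip("\r\n")]


# '!' is ASCII 33, '5' is ASCII 53, 'I' is ASCII 73.
print(decode_qualities("!5I"))
```

Passing `offset=64` gives the decoding for encodings that start at ASCII 64.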

edam_format = 'format_1932'
file_ext = 'fastqsanger'
bases_regexp = re.compile('^[NGTAC]*$', re.IGNORECASE)
static quality_check(lines: Iterable) bool[source]

Presuming lines are lines from a fastq file, return True if the qualities are compatible with sanger encoding

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.FastqSolexa(**kwd)[source]

Bases: Fastq

Class representing a FASTQ sequence ( the Solexa variant )

solexa scored quality values -5:40 represented by ASCII 59:104
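Solexa scores use a different scale from phred scores, so decoding takes two steps: subtract the ASCII offset of 64, then convert. The conversion below is the standard relation between the two scales and is not taken from this page. Both function names are hypothetical.

```python
import math


def solexa_score(ch: str) -> int:
    """Decode one quality character to a Solexa score (ASCII offset 64)."""
    return ord(ch) - 64


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score to the equivalent phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# ';' is ASCII 59, the lowest character in the range given above.
print(solexa_score(";"), round(solexa_to_phred(solexa_score(";")), 2))
```

The two scales nearly coincide for high scores and diverge for low ones, which is why negative Solexa scores still map to positive phred scores.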

edam_format = 'format_1933'
file_ext = 'fastqsolexa'
static quality_check(lines: Iterable) bool[source]

Presuming lines are lines from a fastq file, return True if the qualities are compatible with solexa encoding

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml

Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina

These differ in the representation of the quality scores

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('4.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('3.fastq')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> fname = get_test_fname('2.fastq')
>>> Fastq().sniff(fname)
True
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastq')
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastqcssanger')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> FastqCSSanger().sniff(fname)
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.FastqIllumina(**kwd)[source]

Bases: Fastq

Class representing a FASTQ sequence ( the Illumina 1.3+ variant )

phred scored quality values 0:40 represented by ASCII 64:104

edam_format = 'format_1931'
file_ext = 'fastqillumina'
static quality_check(lines: Iterable) bool[source]

Presuming lines are lines from a fastq file, return True if the qualities are compatible with illumina encoding

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml

Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina

These differ in the representation of the quality scores

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('4.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('3.fastq')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> fname = get_test_fname('2.fastq')
>>> Fastq().sniff(fname)
True
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastq')
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastqcssanger')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> FastqCSSanger().sniff(fname)
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.FastqCSSanger(**kwd)[source]

Bases: Fastq

Class representing a Color Space FASTQ sequence ( e.g. a SOLiD variant )

sequence in color space; phred scored quality values 0:93 represented by ASCII 33:126

file_ext = 'fastqcssanger'
bases_regexp = re.compile('^[NGTAC][0123\\.]*$', re.IGNORECASE)
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.Maf(**kwd)[source]

Bases: Alignment

Class describing a Maf alignment

edam_format = 'format_3008'
file_ext = 'maf'
init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Parses and sets species, chromosomes, index from MAF file.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of peek

make_html_table(dataset: DatasetProtocol, skipchars: List | None = None) str[source]

Create HTML table, used for displaying peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in maf format

The .maf format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.

The first line of a .maf file begins with ##maf. This word is followed by white-space-separated variable=value pairs. There should be no white space surrounding the “=”.
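A minimal sketch of checking and parsing that first line. `parse_maf_header` is a hypothetical helper and the sample header values are invented. The real sniffer only needs to decide True or False.

```python
from typing import Dict, Optional


def parse_maf_header(first_line: str) -> Optional[Dict[str, str]]:
    """Return the variable=value pairs of a ##maf line, or None if it is not one."""
    words = first_line.split()
    if not words or words[0] != "##maf":
        return None
    # Each remaining word is expected to look like variable=value.
    return dict(word.split("=", 1) for word in words[1:] if "=" in word)


print(parse_maf_header("##maf version=1 scoring=tba.v8"))
```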

For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format5

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Maf().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Maf().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'blocks': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'maf_index': <galaxy.model.metadata.MetadataElementSpec object>, 'species': <galaxy.model.metadata.MetadataElementSpec object>, 'species_chromosomes': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.MafCustomTrack(**kwd)[source]

Bases: Text

file_ext = 'mafcustomtrack'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Parses and sets viewport metadata from MAF file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_chromosome': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_end': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_start': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.Axt(**kwd)[source]

Bases: Text

Class describing an axt alignment

edam_data = 'data_0863'
edam_format = 'format_3013'
file_ext = 'axt'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in axt format

axt alignment files are produced from Blastz, an alignment tool available from Webb Miller’s lab at Penn State University.

Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines.

The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.

The sequence lines contain the sequence of the primary assembly (line 2) and aligning assembly (line 3) with inserts. Repeats are indicated by lower-case letters.
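A sketch of validating a summary line against the 9 required fields. The field names follow the UCSC axt description and are not stated on this page, so treat them as assumptions. `AxtSummary` and `parse_axt_summary` are hypothetical names and the sample line is illustrative.

```python
from typing import NamedTuple, Optional


class AxtSummary(NamedTuple):
    alignment_number: int
    primary_chrom: str
    primary_start: int
    primary_end: int
    aligning_chrom: str
    aligning_start: int
    aligning_end: int
    strand: str
    score: int


def parse_axt_summary(line: str) -> Optional[AxtSummary]:
    """Parse a summary line, returning None if it does not have the expected shape."""
    fields = line.split()
    if len(fields) != 9 or fields[7] not in ("+", "-"):
        return None
    try:
        return AxtSummary(
            int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
            fields[4], int(fields[5]), int(fields[6]), fields[7], int(fields[8]),
        )
    except ValueError:
        return None  # a numeric field did not parse as an integer


print(parse_axt_summary("0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500"))
```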

For complete details see http://genome.ucsc.edu/goldenPath/help/axt.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.axt' )
>>> Axt().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.lav' )
>>> Axt().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.Lav(**kwd)[source]

Bases: Text

Class describing a LAV alignment

edam_data = 'data_0863'
edam_format = 'format_3014'
file_ext = 'lav'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in lav format

LAV is an alignment format developed by Webb Miller’s group. It is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.

For complete details see http://www.bioperl.org/wiki/LAV_alignment_format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.lav' )
>>> Lav().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.axt' )
>>> Lav().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.RNADotPlotMatrix(**kwd)[source]

Bases: Data

edam_format = 'format_3466'
file_ext = 'rna_eps'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff(filename: str) bool[source]

Determine if the file is in RNA dot plot format.

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.sequence.DotBracket(**kwd)[source]

Bases: Sequence

edam_data = 'data_0880'
edam_format = 'format_1457'
file_ext = 'dbn'
sequence_regexp = re.compile('^[ACGTURYKMSWBDHVN]+$', re.IGNORECASE)
structure_regexp = re.compile('^[\\(\\)\\.\\[\\]{}]+$')
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of sequences and the number of data lines in dataset.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Galaxy Dbn (Dot-Bracket notation) rules:

  • The first non-empty line is a header line: no comment lines are allowed.

    • A header line starts with a ‘>’ symbol and continues with 0 or multiple symbols until the line ends.

  • The second non-empty line is a sequence line.

  • The third non-empty line is a structure (Dot-Bracket) line and only describes the 2D structure of the sequence above it.

    • A structure line must consist of the following chars: ‘.{}[]()’.

    • A structure line must be of the same length as the sequence line, and each char represents the structure of the nucleotide above it.

    • A structure line has no prefix and no suffix.

    • A nucleotide pairs with only 1 or 0 other nucleotides.

      • In a structure line, the number of ‘(’ symbols equals the number of ‘)’ symbols, the number of ‘[’ symbols equals the number of ‘]’ symbols and the number of ‘{’ symbols equals the number of ‘}’ symbols.

  • The format accepts multiple entries per file, given that each entry is provided as three lines: the header, sequence and structure line.

    • Sniffing is only applied on the first entry.

  • Empty lines are allowed.
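The rules above can be sketched as a check on the first entry of a file. The two regular expressions are the class attributes shown earlier. `is_valid_dbn_entry` is a hypothetical function, and it checks bracket counts only, as the rules state, not nesting.

```python
import re

SEQUENCE_RE = re.compile(r"^[ACGTURYKMSWBDHVN]+$", re.IGNORECASE)
STRUCTURE_RE = re.compile(r"^[\(\)\.\[\]{}]+$")
BRACKET_PAIRS = (("(", ")"), ("[", "]"), ("{", "}"))


def is_valid_dbn_entry(text: str) -> bool:
    """Validate the first header/sequence/structure triple (sketch only)."""
    # Empty lines are allowed, so drop them before looking at the entry.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        return False
    header, sequence, structure = lines[:3]
    if not header.startswith(">"):
        return False
    if not SEQUENCE_RE.match(sequence) or not STRUCTURE_RE.match(structure):
        return False
    if len(sequence) != len(structure):
        return False
    # Every opening symbol must be matched in number by its closing symbol.
    return all(
        structure.count(opening) == structure.count(closing)
        for opening, closing in BRACKET_PAIRS
    )


print(is_valid_dbn_entry(">hairpin\nGGGAAACCC\n(((...)))\n"))
```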

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.Genbank(**kwd)[source]

Bases: Text

Class representing a Genbank sequence

edam_format = 'format_1936'
edam_data = 'data_0849'
file_ext = 'genbank'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determine whether the file is in genbank format. Works for compressed files.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.genbank' )
>>> Genbank().sniff( fname )
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.sequence.MemePsp(**kwd)[source]

Bases: Sequence

Class representing MEME Position Specific Priors

file_ext = 'memepsp'
sniff_prefix(file_prefix: FilePrefix) bool[source]

The format of an entry in a PSP file is:

>ID WIDTH PRIORS

For complete details see http://meme-suite.org/doc/psp-format.html

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.memepsp')
>>> MemePsp().sniff(fname)
True
>>> fname = get_test_fname('sequence.fasta')
>>> MemePsp().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.sniff module

File format detector

galaxy.datatypes.sniff.get_test_fname(fname)[source]

Returns test data filename

galaxy.datatypes.sniff.sniff_with_cls(cls, fname)[source]
galaxy.datatypes.sniff.handle_composite_file(datatype, src_path, extra_files, name, is_binary, tmp_dir, tmp_prefix, upload_opts)[source]
class galaxy.datatypes.sniff.ConvertResult(line_count, converted_path, converted_newlines, converted_regex)[source]

Bases: tuple

line_count: int

Alias for field number 0

converted_path: str | None

Alias for field number 1

converted_newlines: bool

Alias for field number 2

converted_regex: bool

Alias for field number 3
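Because the class is a tuple with named fields, results can be read by name, by index, or by unpacking. The sketch below defines a local stand-in with the same four fields so it runs without Galaxy installed. The values are invented.

```python
from typing import NamedTuple, Optional


class ConvertResult(NamedTuple):
    """Local stand-in mirroring the fields listed above."""
    line_count: int
    converted_path: Optional[str]
    converted_newlines: bool
    converted_regex: bool


result = ConvertResult(3, None, True, False)

# Fields can be read by name, by position, or by tuple unpacking.
line_count, converted_path, converted_newlines, converted_regex = result
print(result.line_count, result[2], converted_path)
```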

class galaxy.datatypes.sniff.ConvertFunction(*args, **kwargs)[source]

Bases: Protocol

__init__(*args, **kwargs)
galaxy.datatypes.sniff.convert_newlines(fname: str, in_place: bool = True, tmp_dir: str | None = None, tmp_prefix: str | None = 'gxupload', block_size: int = 131072, regexp=None) ConvertResult[source]

Converts in place a file from universal line endings to Posix line endings.

galaxy.datatypes.sniff.convert_sep2tabs(fname: str, in_place: bool = True, tmp_dir: str | None = None, tmp_prefix: str | None = 'gxupload', block_size: int = 131072)[source]

Transforms in place a ‘sep’ separated file to a tab separated one

galaxy.datatypes.sniff.convert_newlines_sep2tabs(fname: str, in_place: bool = True, tmp_dir: str | None = None, tmp_prefix: str | None = 'gxupload') ConvertResult[source]

Converts newlines in a file to posix newlines and replaces spaces with tabs.
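The combined transformation can be sketched on an in-memory string. This is an illustration only: `newlines_and_sep_to_tabs` is hypothetical, it collapses runs of spaces into a single tab (an assumption about the separator handling), and it ignores the in-place and temporary-file behaviour of the real function.

```python
import re


def newlines_and_sep_to_tabs(text: str, sep_pattern: str = r"[ ]+") -> str:
    """Normalize line endings to POSIX and replace separator runs with tabs."""
    # Handle Windows (\r\n) endings first, then bare carriage returns.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(re.sub(sep_pattern, "\t", line) for line in lines)


print(repr(newlines_and_sep_to_tabs("a b\r\nc  d\re f\n")))
```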

galaxy.datatypes.sniff.iter_headers(fname_or_file_prefix, sep, count=60, comment_designator=None)[source]
galaxy.datatypes.sniff.validate_tabular(fname_or_file_prefix, validate_row, sep, comment_designator=None)[source]
galaxy.datatypes.sniff.get_headers(fname_or_file_prefix, sep, count=60, comment_designator=None)[source]

Returns a list with the first ‘count’ lines split by ‘sep’, ignoring lines starting with ‘comment_designator’

>>> fname = get_test_fname('complete.bed')
>>> get_headers(fname,'\t') == [['chr7', '127475281', '127491632', 'NM_000230', '0', '+', '127486022', '127488767', '0', '3', '29,172,3225,', '0,10713,13126,'], ['chr7', '127486011', '127488900', 'D49487', '0', '+', '127486022', '127488767', '0', '2', '155,490,', '0,2399']]
True
>>> fname = get_test_fname('test.gff')
>>> get_headers(fname, '\t', count=5, comment_designator='#') == [[''], ['chr7', 'bed2gff', 'AR', '26731313', '26731437', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731491', '26731536', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731541', '26731649', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731659', '26731841', '.', '+', '.', 'score']]
True
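A simplified sketch of the same idea on an in-memory string. `get_headers_from_text` is hypothetical. Here `count` limits the number of returned rows, which may not match how the real function counts lines.

```python
def get_headers_from_text(text, sep, count=60, comment_designator=None):
    """Split the first lines of text on sep, skipping comment lines."""
    headers = []
    for line in text.splitlines():
        if len(headers) >= count:
            break
        if comment_designator is not None and line.startswith(comment_designator):
            continue
        headers.append(line.split(sep))
    return headers


sample = "#comment\nchr7\t100\t200\n\nchr7\t300\t400\n"
print(get_headers_from_text(sample, "\t", comment_designator="#"))
```

An empty line splits into a single empty string, which is why `['']` appears in the second example above.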
galaxy.datatypes.sniff.is_column_based(fname_or_file_prefix, sep='\t', skip=0)[source]

Checks whether the file is column based with respect to a separator (defaults to tab separator).

>>> fname = get_test_fname('test.gff')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab.bed')
>>> is_column_based(fname)
True
>>> is_column_based(fname, sep=' ')
False
>>> fname = get_test_fname('test_space.txt')
>>> is_column_based(fname)
False
>>> is_column_based(fname, sep=' ')
True
>>> fname = get_test_fname('test_ensembl.tabular')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname, sep=' ', skip=0)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname)
True
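One plausible reading of "column based" is that every non-empty line splits into the same number of fields, and that number is at least two. The sketch below encodes that reading; `looks_column_based` is a hypothetical name, and skipping lines that start with `#` is an assumption of this example.

```python
from typing import Iterable


def looks_column_based(lines: Iterable[str], sep: str = "\t", skip: int = 0) -> bool:
    # Hypothetical stand-in for is_column_based operating on lines in memory.
    rows = [
        line.rstrip("\r\n").split(sep)
        for line in lines
        if not line.startswith("#")  # assumed comment marker
    ][skip:]
    width = 0
    for row in rows:
        if row == [""]:
            continue  # ignore blank lines
        if width == 0:
            width = len(row)
            if width < 2:
                return False  # a single column is not "column based"
        elif len(row) != width:
            return False  # inconsistent column count
    return width >= 2
```

With tab-separated input the default separator succeeds and `sep=' '` fails, which mirrors the `test_tab.bed` doctest above.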
galaxy.datatypes.sniff.guess_ext(fname_or_file_prefix: str | FilePrefix, sniff_order, is_binary=None, auto_decompress=True)[source]

Returns an extension that can be used in the datatype factory to create a datatype for the ‘fname’ file

>>> from galaxy.datatypes.registry import example_datatype_registry_for_sample
>>> datatypes_registry = example_datatype_registry_for_sample()
>>> sniff_order = datatypes_registry.sniff_order
>>> fname = get_test_fname('empty.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> guess_ext(fname, sniff_order)
'blastxml'
>>> fname = get_test_fname('1.psl')
>>> guess_ext(fname, sniff_order)
'psl'
>>> fname = get_test_fname('2.psl')
>>> guess_ext(fname, sniff_order)
'psl'
>>> fname = get_test_fname('interval.interval')
>>> guess_ext(fname, sniff_order)
'interval'
>>> fname = get_test_fname('interv1.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('test_tab.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('sequence.maf')
>>> guess_ext(fname, sniff_order)
'maf'
>>> fname = get_test_fname('sequence.fasta')
>>> guess_ext(fname, sniff_order)
'fasta'
>>> fname = get_test_fname('1.genbank')
>>> guess_ext(fname, sniff_order)
'genbank'
>>> fname = get_test_fname('1.genbank.gz')
>>> guess_ext(fname, sniff_order)
'genbank.gz'
>>> fname = get_test_fname('file.html')
>>> guess_ext(fname, sniff_order)
'html'
>>> fname = get_test_fname('test.gtf')
>>> guess_ext(fname, sniff_order)
'gtf'
>>> fname = get_test_fname('test.gff')
>>> guess_ext(fname, sniff_order)
'gff'
>>> fname = get_test_fname('gff.gff3')
>>> guess_ext(fname, sniff_order)
'gff3'
>>> fname = get_test_fname('2.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('test_tab2.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('3.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('test_tab1.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('alignment.lav')
>>> guess_ext(fname, sniff_order)
'lav'
>>> fname = get_test_fname('1.sff')
>>> guess_ext(fname, sniff_order)
'sff'
>>> fname = get_test_fname('1.bam')
>>> guess_ext(fname, sniff_order)
'bam'
>>> fname = get_test_fname('3unsorted.bam')
>>> guess_ext(fname, sniff_order)
'unsorted.bam'
>>> fname = get_test_fname('test.idpdb')
>>> guess_ext(fname, sniff_order)
'idpdb'
>>> fname = get_test_fname('test.mz5')
>>> guess_ext(fname, sniff_order)
'h5'
>>> fname = get_test_fname('issue1818.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> guess_ext(fname, sniff_order)
'cml'
>>> fname = get_test_fname('q.fps')
>>> guess_ext(fname, sniff_order)
'fps'
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> guess_ext(fname, sniff_order)
'inchi'
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> guess_ext(fname, sniff_order)
'mol2'
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> guess_ext(fname, sniff_order)
'sdf'
>>> fname = get_test_fname('5e5z.pdb')
>>> guess_ext(fname, sniff_order)
'pdb'
>>> fname = get_test_fname('Si_uppercase.cell')
>>> guess_ext(fname, sniff_order)
'cell'
>>> fname = get_test_fname('Si_lowercase.cell')
>>> guess_ext(fname, sniff_order)
'cell'
>>> fname = get_test_fname('Si.cif')
>>> guess_ext(fname, sniff_order)
'cif'
>>> fname = get_test_fname('LaMnO3.cif')
>>> guess_ext(fname, sniff_order)
'cif'
>>> fname = get_test_fname('Si.xyz')
>>> guess_ext(fname, sniff_order)
'xyz'
>>> fname = get_test_fname('Si_multi.xyz')
>>> guess_ext(fname, sniff_order)
'xyz'
>>> fname = get_test_fname('Si.extxyz')
>>> guess_ext(fname, sniff_order)
'extxyz'
>>> fname = get_test_fname('Si.castep')
>>> guess_ext(fname, sniff_order)
'castep'
>>> fname = get_test_fname('test.fits')
>>> guess_ext(fname, sniff_order)
'fits'
>>> fname = get_test_fname('Si.param')
>>> guess_ext(fname, sniff_order)
'param'
>>> fname = get_test_fname('Si.den_fmt')
>>> guess_ext(fname, sniff_order)
'den_fmt'
>>> fname = get_test_fname('ethanol.magres')
>>> guess_ext(fname, sniff_order)
'magres'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.otu')
>>> guess_ext(fname, sniff_order)
'mothur.otu'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.lower.dist')
>>> guess_ext(fname, sniff_order)
'mothur.lower.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.square.dist')
>>> guess_ext(fname, sniff_order)
'mothur.square.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.pair.dist')
>>> guess_ext(fname, sniff_order)
'mothur.pair.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.freq')
>>> guess_ext(fname, sniff_order)
'mothur.freq'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.quan')
>>> guess_ext(fname, sniff_order)
'mothur.quan'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.ref.taxonomy')
>>> guess_ext(fname, sniff_order)
'mothur.ref.taxonomy'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.axes')
>>> guess_ext(fname, sniff_order)
'mothur.axes'
>>> guess_ext(get_test_fname('infernal_model.cm'), sniff_order)
'cm'
>>> fname = get_test_fname('1.gg')
>>> guess_ext(fname, sniff_order)
'gg'
>>> fname = get_test_fname('diamond_db.dmnd')
>>> guess_ext(fname, sniff_order)
'dmnd'
>>> fname = get_test_fname('1.excel.xls')
>>> guess_ext(fname, sniff_order, is_binary=True)
'excel.xls'
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> guess_ext(fname, sniff_order)
'biom2'
>>> fname = get_test_fname('454Score.pdf')
>>> guess_ext(fname, sniff_order)
'pdf'
>>> fname = get_test_fname('1.obo')
>>> guess_ext(fname, sniff_order)
'obo'
>>> fname = get_test_fname('1.arff')
>>> guess_ext(fname, sniff_order)
'arff'
>>> fname = get_test_fname('1.afg')
>>> guess_ext(fname, sniff_order)
'afg'
>>> fname = get_test_fname('1.owl')
>>> guess_ext(fname, sniff_order)
'owl'
>>> fname = get_test_fname('Acanium.snaphmm')
>>> guess_ext(fname, sniff_order)
'snaphmm'
>>> fname = get_test_fname('wiggle.wig')
>>> guess_ext(fname, sniff_order)
'wig'
>>> fname = get_test_fname('example.iqtree')
>>> guess_ext(fname, sniff_order)
'iqtree'
>>> fname = get_test_fname('1.stockholm')
>>> guess_ext(fname, sniff_order)
'stockholm'
>>> fname = get_test_fname('1.xmfa')
>>> guess_ext(fname, sniff_order)
'xmfa'
>>> fname = get_test_fname('test.blib')
>>> guess_ext(fname, sniff_order)
'blib'
>>> fname = get_test_fname('test_strict_interleaved.phylip')
>>> guess_ext(fname, sniff_order)
'phylip'
>>> fname = get_test_fname('test_relaxed_interleaved.phylip')
>>> guess_ext(fname, sniff_order)
'phylip'
>>> fname = get_test_fname('1.smat')
>>> guess_ext(fname, sniff_order)
'smat'
>>> fname = get_test_fname('1.ttl')
>>> guess_ext(fname, sniff_order)
'ttl'
>>> fname = get_test_fname('1.hdt')
>>> guess_ext(fname, sniff_order, is_binary=True)
'hdt'
>>> fname = get_test_fname('1.phyloxml')
>>> guess_ext(fname, sniff_order)
'phyloxml'
>>> fname = get_test_fname('1.dzi')
>>> guess_ext(fname, sniff_order)
'dzi'
>>> fname = get_test_fname('1.tiff')
>>> guess_ext(fname, sniff_order)
'tiff'
>>> fname = get_test_fname('1.fastqsanger.gz')
>>> guess_ext(fname, sniff_order)  # See test_datatype_registry for more compressed type tests.
'fastqsanger.gz'
>>> fname = get_test_fname('1.mtx')
>>> guess_ext(fname, sniff_order)
'mtx'
>>> fname = get_test_fname('mc_preprocess_summ.metacyto_summary.txt')
>>> guess_ext(fname, sniff_order)
'metacyto_summary.txt'
>>> fname = get_test_fname('Accuri_C6_A01_H2O.fcs')
>>> guess_ext(fname, sniff_order)
'fcs'
>>> fname = get_test_fname('1imzml')
>>> guess_ext(fname, sniff_order)  # This test case ensures that no exception is thrown; the actual value could change if non-UTF encoding handling improves.
'data'
>>> fname = get_test_fname('too_many_comments_gff3.tabular')
>>> guess_ext(fname, sniff_order)  # It's a VCF but is sniffed as tabular because of the limit on the number of header lines we read
'tabular'
galaxy.datatypes.sniff.guess_ext_from_file_name(fname, registry, requested_ext='auto')[source]
class galaxy.datatypes.sniff.FilePrefix(filename, auto_decompress=True)[source]

Bases: object

__init__(filename, auto_decompress=True)[source]
property binary
property file_size
string_io() StringIO[source]
text_io(*args, **kwargs) TextIOWrapper[source]
startswith(prefix)[source]
line_iterator()[source]
search(pattern)[source]
search_str(query_str)[source]
magic_header(pattern)[source]

Unpack header and get first element

startswith_bytes(test_bytes)[source]
galaxy.datatypes.sniff.run_sniffers_raw(file_prefix: FilePrefix, sniff_order)[source]

Run through sniffers specified by sniff_order, return None if none match.
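The "first match wins, otherwise None" contract can be sketched as follows. Here each sniffer is modelled as an `(extension, predicate)` pair over a text prefix, which is a simplification made for this example; the real function takes a `FilePrefix` and datatype objects, and `first_matching_ext` and `EXAMPLE_ORDER` are hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

Sniffer = Tuple[str, Callable[[str], bool]]


def first_matching_ext(prefix: str, sniff_order: Sequence[Sniffer]) -> Optional[str]:
    # Try each sniffer in order and stop at the first one that accepts the prefix.
    for ext, sniff in sniff_order:
        if sniff(prefix):
            return ext
    # No sniffer matched.
    return None


# Illustrative ordering only; real sniff orders come from the datatypes registry.
EXAMPLE_ORDER: Sequence[Sniffer] = (
    ("fasta", lambda text: text.startswith(">")),
    ("snaphmm", lambda text: text.startswith("zoeHMM")),
)
```

Because the first match is returned, more specific sniffers need to appear before more general ones in the order.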

galaxy.datatypes.sniff.zip_single_fileobj(path: str | PathLike) IO[bytes][source]
galaxy.datatypes.sniff.build_sniff_from_prefix(klass)[source]
galaxy.datatypes.sniff.disable_parent_class_sniffing(klass)[source]
class galaxy.datatypes.sniff.HandleCompressedFileResponse(is_valid, ext, uncompressed_path, compressed_type, is_compressed)[source]

Bases: tuple

is_valid: bool

Alias for field number 0

ext: str

Alias for field number 1

uncompressed_path: str

Alias for field number 2

compressed_type: str | None

Alias for field number 3

is_compressed: bool | None

Alias for field number 4

galaxy.datatypes.sniff.handle_compressed_file(file_prefix: FilePrefix, datatypes_registry, ext: str = 'auto', tmp_prefix: str | None = 'sniff_uncompress_', tmp_dir: str | None = None, in_place: bool = False, check_content: bool = True) HandleCompressedFileResponse[source]

Check uploaded files for compression, check compressed file contents, and uncompress if necessary.

Supports GZip, BZip2, and the first file in a Zip file.

For performance reasons, the temporary file used for uncompression is located in the same directory as the input/output file. This behavior can be changed with the tmp_dir param.

ext as returned will only be changed from the ext input param if the param was an autodetect type (auto) and the file was sniffed as a keep-compressed datatype.

is_valid as returned will only be set if the file is compressed and contains invalid contents (or, in the case of a zip file, its first file does). This allows lengthy decompression to be bypassed if there is invalid content in the first 32KB. Otherwise, the caller should check the content.
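Detecting the three supported container formats from a file prefix can be done with their well-known magic numbers. This is a sketch of that general technique, not of Galaxy's own checks; `detect_compression` and the sample constants are hypothetical names.

```python
import bz2
import gzip
from typing import Optional

# Leading bytes that identify each format.
MAGIC_NUMBERS = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"PK\x03\x04", "zip"),  # local file header of the first zip member
)


def detect_compression(prefix: bytes) -> Optional[str]:
    # Return the name of the compression format, or None for uncompressed data.
    for magic, name in MAGIC_NUMBERS:
        if prefix.startswith(magic):
            return name
    return None


# Small in-memory samples to try the function on.
GZ_SAMPLE = gzip.compress(b"chr1\t10\t20\n")
BZ2_SAMPLE = bz2.compress(b"chr1\t10\t20\n")
```

Only the first few bytes are needed for this check, which is why a short prefix is enough to decide whether decompression should be attempted at all.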

galaxy.datatypes.sniff.handle_uploaded_dataset_file(filename, *args, **kwds) str[source]

Legacy wrapper around handle_uploaded_dataset_file_internal for tools using it.

class galaxy.datatypes.sniff.HandleUploadedDatasetFileInternalResponse(ext, converted_path, compressed_type, converted_newlines, converted_spaces)[source]

Bases: tuple

ext: str

Alias for field number 0

converted_path: str

Alias for field number 1

compressed_type: str | None

Alias for field number 2

converted_newlines: bool

Alias for field number 3

converted_spaces: bool

Alias for field number 4
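The "Alias for field number N" entries mean the response is a named tuple: each field can be read by name or by position. The sketch below rebuilds an equivalent structure with `typing.NamedTuple` purely for illustration; `UploadResult` and the sample values are hypothetical and are not imported from Galaxy.

```python
from typing import NamedTuple, Optional


class UploadResult(NamedTuple):
    # Field order mirrors the five fields documented above.
    ext: str
    converted_path: str
    compressed_type: Optional[str]
    converted_newlines: bool
    converted_spaces: bool


# Made-up values for demonstration.
result = UploadResult("tabular", "/tmp/upload.dat", None, True, False)

# Access by name, by index, or by unpacking all give the same data.
ext_by_name = result.ext
ext_by_index = result[0]
ext, converted_path, *_ = result
```

Access by name is usually preferable in calling code, since it keeps working if fields are ever appended.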

galaxy.datatypes.sniff.convert_function(convert_to_posix_lines, convert_spaces_to_tabs) ConvertFunction[source]
galaxy.datatypes.sniff.handle_uploaded_dataset_file_internal(file_prefix: FilePrefix, datatypes_registry, ext: str = 'auto', tmp_prefix: str | None = 'sniff_upload_', tmp_dir: str | None = None, in_place: bool = False, check_content: bool = True, is_binary: bool | None = None, uploaded_file_ext: str | None = None, convert_to_posix_lines: bool | None = None, convert_spaces_to_tabs: bool | None = None) HandleUploadedDatasetFileInternalResponse[source]
exception galaxy.datatypes.sniff.InappropriateDatasetContentError[source]

Bases: Exception

galaxy.datatypes.spaln module

spaln Composite Dataset

class galaxy.datatypes.spaln.SpalnNuclDb(**kwd)[source]

Bases: _SpalnDb

file_ext = 'spalndbnp'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'spalndb_name': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.spaln.SpalnProtDb(**kwd)[source]

Bases: _SpalnDb

file_ext = 'spalndba'
__init__(**kwd)[source]

Initialize the datatype

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'spalndb_name': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.speech module

class galaxy.datatypes.speech.TextGrid(**kwd)[source]

Bases: Text

Praat Textgrid file for speech annotations

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1_1119_2_22_001.textgrid')
>>> TextGrid().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> TextGrid().sniff(fname)
False
file_ext = 'textgrid'
header = 'File type = "ooTextFile"\nObject class = "TextGrid"\n'
blurb = 'Praat TextGrid file'
sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'annotations': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
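Given the `header` attribute above, a TextGrid file can be recognized by comparing the start of its content with that fixed header. The sketch below assumes this is how the check works and operates on a string instead of a filename; `looks_like_textgrid` is a hypothetical name.

```python
# Header text copied from the TextGrid.header attribute documented above.
TEXTGRID_HEADER = 'File type = "ooTextFile"\nObject class = "TextGrid"\n'


def looks_like_textgrid(text: str) -> bool:
    # Assumed rule: the content must begin with the exact two-line header.
    return text.startswith(TEXTGRID_HEADER)
```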

class galaxy.datatypes.speech.BPF(**kwd)[source]

Bases: Text

Munich BPF annotation format https://www.phonetik.uni-muenchen.de/Bas/BasFormatseng.html#Partitur

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1_1119_2_22_001.par')
>>> BPF().sniff(fname)
True
>>> fname = get_test_fname('1_1119_2_22_001-1.par')
>>> BPF().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> BPF().sniff(fname)
False
file_ext = 'par'
mandatory_headers = ['LHD', 'REP', 'SNB', 'SAM', 'SBF', 'SSB', 'NCH', 'SPN', 'LBD']
optional_headers = ['FIL', 'TYP', 'DBN', 'VOL', 'DIR', 'SRC', 'BEG', 'END', 'RED', 'RET', 'RCC', 'CMT', 'SPI', 'PCF', 'PCN', 'EXP', 'SYS', 'DAT', 'SPA', 'MAO', 'GPO', 'SAO']
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the metadata for this dataset from the file contents

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'annotations': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
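The `mandatory_headers` list suggests a simple validity check: every mandatory key should appear in the header block. The sketch below assumes header lines have the form `KEY: value` and that `LBD` closes the header; both are assumptions of this example, and `has_mandatory_headers` is a hypothetical helper that is not Galaxy's sniffer.

```python
from typing import Iterable

# Copied from BPF.mandatory_headers as documented above.
MANDATORY_HEADERS = ["LHD", "REP", "SNB", "SAM", "SBF", "SSB", "NCH", "SPN", "LBD"]


def has_mandatory_headers(lines: Iterable[str]) -> bool:
    seen = set()
    for line in lines:
        # Assumed layout: the key is everything before the first colon.
        key = line.split(":", 1)[0].strip()
        if key in MANDATORY_HEADERS:
            seen.add(key)
        if key == "LBD":
            break  # assumed end-of-header marker
    return seen == set(MANDATORY_HEADERS)
```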

galaxy.datatypes.tabular module

Tabular datatype

class galaxy.datatypes.tabular.TabularData(**kwd)[source]

Bases: Text

Generic tabular data

edam_format = 'format_3475'
CHUNKABLE = True
data_line_offset = 0
max_peek_columns = 50
abstract set_meta(dataset: DatasetProtocol, *, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

displayable(dataset: DatasetProtocol) bool[source]
get_chunk(trans, dataset: HasFileName, offset: int = 0, ck_size: int | None = None) str[source]
display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, offset: int | None = None, ck_size: int | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful if overriding this method and this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

display_as_markdown(dataset_instance: DatasetProtocol) str[source]

Prepare for embedding dataset into a basic Markdown document.

This is a somewhat experimental interface and should not be implemented on datatypes not tightly tied to a Galaxy version (e.g. datatypes in the Tool Shed).

Speaking very loosely - the datatype should load a bounded amount of data from the supplied dataset instance and prepare for embedding it into Markdown. This should be relatively vanilla Markdown - the result of this is bleached and it should not contain nested Galaxy Markdown directives.

If the data cannot reasonably be displayed, just indicate this and do not throw an exception.

make_html_table(dataset: DatasetProtocol, **kwargs) str[source]

Create HTML table, used for displaying peek

make_html_peek_header(dataset: DatasetProtocol, skipchars: List | None = None, column_names: List | None = None, column_number_format: str = '%s', column_parameter_alias: Dict | None = None, **kwargs) str[source]
make_html_peek_rows(dataset: DatasetProtocol, skipchars: List | None = None, **kwargs) str[source]
display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

is_int(column_text: str) bool[source]
is_float(column_text: str) bool[source]
guess_type(text: str) str[source]
column_dataprovider(dataset: DatasetProtocol, **settings) ColumnarDataProvider[source]

Uses column settings that are passed in

dataset_column_dataprovider(dataset: DatasetProtocol, **settings) DatasetColumnarDataProvider[source]

Attempts to get column settings from dataset.metadata

dict_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]

Uses column settings that are passed in

dataset_dict_dataprovider(dataset: DatasetProtocol, **settings) DatasetDictDataProvider[source]

Attempts to get column settings from dataset.metadata

dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.Tabular(**kwd)[source]

Bases: TabularData

Tab delimited data

file_ext = 'tabular'
get_column_names(first_line: str) List[str] | None[source]
set_meta(dataset: DatasetProtocol, *, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = 100000, max_guess_type_data_lines: int | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
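The column counting and type guessing described above can be sketched as a single pass over the data lines. This is a simplified illustration and not Galaxy's implementation: the names `guess_cell_type` and `summarize_columns` are hypothetical, only `int`, `float` and `str` are distinguished, and treating lines that start with `#` as comments is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

# Wider types win when a column mixes values.
TYPE_RANK = {"int": 0, "float": 1, "str": 2}


def guess_cell_type(text: str) -> str:
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "str"


def summarize_columns(
    lines: Iterable[str], max_data_lines: Optional[int] = None
) -> Tuple[int, List[str]]:
    types: List[str] = []
    data_lines = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumed comment handling
        if max_data_lines is not None and data_lines >= max_data_lines:
            break  # None means: process all data lines
        data_lines += 1
        for index, field in enumerate(line.split("\t")):
            guessed = guess_cell_type(field)
            if index >= len(types):
                types.append(guessed)
            elif TYPE_RANK[guessed] > TYPE_RANK[types[index]]:
                types[index] = guessed
    if not types:
        types = ["str"]  # item 2: no data means one column of type 'str'
    return len(types), types
```

Capping `max_data_lines` trades accuracy for speed: a column that looks numeric in the first lines may still contain text further down.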

as_gbrowse_display_file(dataset: HasFileName, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]
as_ucsc_display_file(dataset: DatasetProtocol, **kwd) IO[str] | TextIOWrapper | GzipFile | BZ2File | LZMAFile | IO[bytes] | str[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.SraManifest(**kwd)[source]

Bases: Tabular

A manifest received from the sra_source tool.

file_ext = 'sra_manifest.tabular'
data_line_offset = 1
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

get_column_names(first_line: str) List[str] | None[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.Taxonomy(**kwd)[source]

Bases: Tabular

file_ext = 'taxonomy'
__init__(**kwd)[source]

Initialize taxonomy datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.Sam(**kwd)[source]

Bases: Tabular, _BamOrSam

edam_format = 'format_2573'
edam_data = 'data_0863'
file_ext = 'sam'
track_type: str | None = 'ReadTrack'
data_sources: Dict[str, str] = {'data': 'bam', 'index': 'bigwig'}
__init__(**kwd)[source]

Initialize sam datatype

display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in SAM format

A file in SAM format consists of lines of tab-separated data. The following header line may be the first line:

@QNAME  FLAG    RNAME   POS     MAPQ    CIGAR   MRNM    MPOS    ISIZE   SEQ     QUAL
or
@QNAME  FLAG    RNAME   POS     MAPQ    CIGAR   MRNM    MPOS    ISIZE   SEQ     QUAL    OPT

Data in the OPT column is optional and can consist of tab-separated data

For complete details see http://samtools.sourceforge.net/SAM1.pdf

Rules for sniffing as True:

There must be 11 or more columns of data on each line
Columns 2 (FLAG), 4 (POS), 5 (MAPQ), 8 (MPOS), and 9 (ISIZE) must be numbers (9 can be negative)
We will only check that up to the first 5 alignments are correctly formatted.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Sam().sniff( fname )
False
>>> fname = get_test_fname( '1.sam' )
>>> Sam().sniff( fname )
True
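The three sniffing rules listed above translate fairly directly into code. The sketch below applies them to an iterable of lines; skipping lines that start with `@` as header lines, and returning False when no alignment line is found, are assumptions of this example. `looks_like_sam` is a hypothetical name.

```python
from typing import Iterable

# Zero-based indexes of columns 2 (FLAG), 4 (POS), 5 (MAPQ), 8 (MPOS), 9 (ISIZE).
NUMERIC_COLUMNS = (1, 3, 4, 7, 8)


def looks_like_sam(lines: Iterable[str], max_alignments: int = 5) -> bool:
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):
            continue  # assumed: header lines are not alignments
        fields = line.split("\t")
        if len(fields) < 11:
            return False  # rule 1: at least 11 columns
        try:
            for index in NUMERIC_COLUMNS:
                int(fields[index])  # rule 2: int() also accepts negative ISIZE
        except ValueError:
            return False
        checked += 1
        if checked >= max_alignments:
            break  # rule 3: only the first few alignments are checked
    return checked > 0
```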
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = 5, **kwd) None[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.datatypes.registry import example_datatype_registry_for_sample
>>> from galaxy.model import Dataset, set_datatypes_registry
>>> from galaxy.model import History, HistoryDatasetAssociation
>>> from galaxy.model.mapping import init
>>> sa_session = init("/tmp", "sqlite:///:memory:", create_tables=True).session
>>> hist = History()
>>> with sa_session.begin():
...     sa_session.add(hist)
>>> set_datatypes_registry(example_datatype_registry_for_sample())
>>> fname = get_test_fname( 'sam_with_header.sam' )
>>> samds = Dataset(external_filename=fname)
>>> hda = hist.add_dataset(HistoryDatasetAssociation(id=1, extension='sam', create_dataset=True, sa_session=sa_session, dataset=samds))
>>> Sam().set_meta(hda)
>>> hda.metadata.comment_lines
2
>>> hda.metadata.reference_names
['ref', 'ref2']
static merge(split_files: List[str], output_file: str) None[source]

Multiple SAM files may each have headers. Since the headers should all be the same, remove the headers from files 1-n, keeping them in the first file only
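The merge rule can be illustrated on lists of lines instead of files. In this sketch header lines are taken to be those starting with `@`; `merge_sam_lines` is a hypothetical helper and works in memory, unlike the documented method, which takes file paths.

```python
from typing import List, Sequence


def merge_sam_lines(parts: Sequence[Sequence[str]]) -> List[str]:
    merged: List[str] = []
    for index, lines in enumerate(parts):
        for line in lines:
            # Keep header lines only from the first part.
            if index > 0 and line.startswith("@"):
                continue
            merged.append(line)
    return merged
```

This relies on the stated premise that all parts carry the same header; if the headers differed, the information from later parts would be lost.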

line_dataprovider(dataset: DatasetProtocol, **settings) FilteredLineDataProvider[source]

Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.

regex_line_dataprovider(dataset: DatasetProtocol, **settings) RegexLineDataProvider[source]

Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.

column_dataprovider(dataset: DatasetProtocol, **settings) ColumnarDataProvider[source]

Uses column settings that are passed in

dataset_column_dataprovider(dataset: DatasetProtocol, **settings) DatasetColumnarDataProvider[source]

Attempts to get column settings from dataset.metadata

dict_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]

Uses column settings that are passed in

dataset_dict_dataprovider(dataset: DatasetProtocol, **settings) DatasetDictDataProvider[source]

Attempts to get column settings from dataset.metadata

header_dataprovider(dataset: DatasetProtocol, **settings) RegexLineDataProvider[source]
id_seq_qual_dataprovider(dataset: DatasetProtocol, **settings) DictDataProvider[source]
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function Sam.column_dataprovider>, 'dataset-column': <function Sam.dataset_column_dataprovider>, 'dataset-dict': <function Sam.dataset_dict_dataprovider>, 'dict': <function Sam.dict_dataprovider>, 'genomic-region': <function Sam.genomic_region_dataprovider>, 'genomic-region-dict': <function Sam.genomic_region_dict_dataprovider>, 'header': <function Sam.header_dataprovider>, 'id-seq-qual': <function Sam.id_seq_qual_dataprovider>, 'line': <function Sam.line_dataprovider>, 'regex-line': <function Sam.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.Pileup(**kwd)[source]

Bases: Tabular

Tab delimited data in pileup (6- or 10-column) format

edam_format = 'format_3015'
file_ext = 'pileup'
line_class = 'genomic coordinate'
data_sources: Dict[str, str] = {'data': 'tabix'}
init_meta(dataset: HasMetadata, copy_from: HasMetadata | None = None) None[source]
display_peek(dataset: DatasetProtocol) str[source]

Returns formatted html of peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checks for ‘pileup-ness’

There are two main types of pileup: 6-column and 10-column. For both, the first three and last two columns are the same. We only check the first three to allow for some personalization of the format.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interval.interval' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '6col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '10col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '1.excel.xls' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '2.txt' )
>>> Pileup().sniff( fname )  # 2.txt
False
>>> fname = get_test_fname( 'test_tab2.tabular' )
>>> Pileup().sniff( fname )
False
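A check limited to the first three columns might look like the sketch below. What exactly is required of each column is an assumption here (a non-empty chromosome name, an integer position, and a single reference base from A, C, G, T or N); the docstring only states that the first three columns are inspected. `looks_like_pileup` is a hypothetical name.

```python
from typing import Iterable

REFERENCE_BASES = set("ACGTNacgtn")  # assumed set of valid reference bases


def looks_like_pileup(lines: Iterable[str], max_lines: int = 10) -> bool:
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumed comment handling
        fields = line.split("\t")
        if len(fields) < 3:
            return False
        chrom, pos, ref = fields[:3]
        # Only the first three columns are examined; the rest may vary.
        if not chrom or not pos.isdigit() or ref not in REFERENCE_BASES:
            return False
        checked += 1
        if checked >= max_lines:
            break
    return checked > 0
```

An interval-style line such as `chr1<TAB>100<TAB>200` is rejected because its third column is not a base, which is consistent with the `interval.interval` doctest above.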
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Pileup.genomic_region_dataprovider>, 'genomic-region-dict': <function Pileup.genomic_region_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'baseCol': <galaxy.model.metadata.MetadataElementSpec object>, 'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.BaseVcf(**kwd)[source]

Bases: Tabular

Variant Call Format for describing SNPs and other simple genome variations.

edam_format = 'format_3016'
track_type: str | None = 'VariantTrack'
data_sources: Dict[str, str] = {'data': 'tabix', 'index': 'bigwig'}
column_names = ['Chrom', 'Pos', 'ID', 'Ref', 'Alt', 'Qual', 'Filter', 'Info', 'Format', 'data']
display_peek(dataset: DatasetProtocol) str[source]

Returns formatted HTML of the peek

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
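
The column detection described above can be illustrated with a small sketch. It is not Galaxy's code: the names `guess_type` and `summarize_columns` are hypothetical, only `int`, `float` and `str` are distinguished, and the header handling mentioned for `skip=None` is left out.

```python
from typing import List, Optional, Tuple


def guess_type(value: str) -> str:
    """Classify a single cell as 'int', 'float' or 'str'."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "str"


def summarize_columns(lines: List[str], skip: Optional[int] = None,
                      max_data_lines: Optional[int] = None) -> Tuple[int, List[str]]:
    """Return (column count, per-column type) for tab-separated lines."""
    rank = {"int": 0, "float": 1, "str": 2}
    column_types: List[str] = []
    data_lines = 0
    for line in lines[skip or 0:]:
        if max_data_lines is not None and data_lines >= max_data_lines:
            break  # None means: process every data line
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        for i, cell in enumerate(line.split("\t")):
            cell_type = guess_type(cell)
            if i >= len(column_types):
                column_types.append(cell_type)
            elif rank[cell_type] > rank[column_types[i]]:
                column_types[i] = cell_type  # widen int -> float -> str
        data_lines += 1
    if not column_types:
        column_types = ["str"]  # a file without data has one 'str' column
    return len(column_types), column_types
```

For example, `summarize_columns(["chr1\t10\t0.5\n", "chr2\t20\t1\n"])` yields three columns typed `str`, `int` and `float`.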

static merge(split_files: List[str], output_file: str) None[source]

Merge files with shutil.copyfileobj(), which avoids the maximum argument limit of cat. gz and bz2 files are also supported.
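
A streaming concatenation of this kind might look like the sketch below. It uses the standard-library `shutil.copyfileobj()`; the function name `merge_files` is hypothetical, and the sketch shows the generic technique only, not any format-specific handling a subclass may add.

```python
import shutil
from typing import List


def merge_files(split_files: List[str], output_file: str) -> None:
    """Concatenate split_files into output_file, streaming each file in chunks."""
    with open(output_file, "wb") as destination:
        for path in split_files:
            with open(path, "rb") as source:
                # Copies in fixed-size chunks, so no file is loaded whole
                # and no long command line is ever built.
                shutil.copyfileobj(source, destination)
```

Because the file names are passed as a Python list rather than as command-line arguments, the number of parts is not bounded by the shell's argument limit. Byte-level concatenation also works for gzip files, since a sequence of gzip members is itself a valid gzip stream.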

validate(dataset: DatasetProtocol, **kwd) DatatypeValidation[source]
genomic_region_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
genomic_region_dict_dataprovider(dataset: DatasetProtocol, **settings) GenomicRegionDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function BaseVcf.genomic_region_dataprovider>, 'genomic-region-dict': <function BaseVcf.genomic_region_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.Vcf(**kwd)[source]

Bases: BaseVcf

file_ext = 'vcf'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.VcfGz(**kwd)[source]

Bases: BaseVcf, Binary

file_ext = 'vcf_bgzip'
file_ext_export_alias = 'vcf.gz'
compressed = True
compressed_format = 'gzip'
sniff(filename: str) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: str | None = None, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.Eland(**kwd)[source]

Bases: Tabular

Support for the export.txt.gz file used by Illumina’s ELANDv2e aligner

compressed = True
compressed_format = 'gzip'
file_ext = '_export.txt.gz'
__init__(**kwd)[source]

Initialize eland datatype

make_html_table(dataset: DatasetProtocol, skipchars: List | None = None, peek: List | None = None, **kwargs) str[source]

Create HTML table, used for displaying peek

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in ELAND export format

A file in ELAND export format consists of lines of tab-separated data. There is no header.

Rules for sniffing as True:

- There must be 22 columns on each line
- LANE, TILE, X, Y, INDEX, READ_NO, SEQ, QUAL, POSITION, *STRAND, FILT must be correct
- We will only check that up to the first 5 alignments are correctly formatted.
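
The structural part of these rules can be sketched as below. Only the column count and the five-line limit come from the rules above; the helper name `has_eland_export_shape` is hypothetical, and the per-field checks (LANE, TILE, and so on) are deliberately omitted.

```python
from typing import Iterable

EXPECTED_COLUMNS = 22  # from the sniffing rules above
MAX_ALIGNMENTS = 5     # only the first few alignments are inspected


def has_eland_export_shape(lines: Iterable[str]) -> bool:
    """Return True if up to the first five non-empty lines have 22 tab-separated columns."""
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if len(line.split("\t")) != EXPECTED_COLUMNS:
            return False
        checked += 1
        if checked >= MAX_ALIGNMENTS:
            break
    return checked > 0
```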
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = 5, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

metadata_spec: MetadataSpecCollection = {'barcodes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'lanes': <galaxy.model.metadata.MetadataElementSpec object>, 'reads': <galaxy.model.metadata.MetadataElementSpec object>, 'tiles': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.ElandMulti(**kwd)[source]

Bases: Tabular

file_ext = 'elandmulti'
sniff_prefix(file_prefix: FilePrefix) bool[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.FeatureLocationIndex(**kwd)[source]

Bases: Tabular

An index that stores feature locations in tabular format.

file_ext = 'fli'
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.BaseCSV(**kwd)[source]

Bases: TabularData

Delimiter-separated table data. This includes CSV, TSV and other dialects understood by the Python ‘csv’ module (https://docs.python.org/2/library/csv.html). Subclasses must define the dialect to use, strict_width and file_ext. See the documentation of the Python csv module for the dialect settings.

property dialect
property strict_width
delimiter = ','
peek_size = 1024
big_peek_size = 10240
sniff(filename: str) bool[source]

Return True if it recognizes the dialect and header.
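
Dialect and header detection of this kind can be tried out with the standard-library `csv.Sniffer`. The snippet is an illustration under assumptions (the sample text and the restricted delimiter set are made up here) and is not a copy of this method's body.

```python
import csv

# Hypothetical sample: a header row followed by three data rows.
sample = "id,name,score\n1,alice,3.5\n2,bob,4.0\n3,carol,2.5\n"

sniffer = csv.Sniffer()
# Restrict the candidate delimiters so the guess is not ambiguous.
dialect = sniffer.sniff(sample, delimiters=",\t;")
print(repr(dialect.delimiter))

# has_header() is a heuristic that compares the first row with the rest.
print(sniffer.has_header(sample))
```

`csv.Sniffer.sniff()` raises `csv.Error` when it cannot determine a delimiter, so a caller that wants a boolean answer would wrap the call in a `try` block.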

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.CSV(**kwd)[source]

Bases: BaseCSV

Comma-separated table data. Only sniffs comma-separated files with at least 2 rows and 2 columns.

file_ext = 'csv'
dialect

alias of excel

strict_width = False
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.TSV(**kwd)[source]

Bases: BaseCSV

Tab-separated table data. Only sniffs tab-separated files with at least 2 rows and 2 columns.

Note: Use of this datatype is optional as the general tabular datatype will handle most tab-separated files. This datatype is only required for datasets with tabs INSIDE double quotes.

This datatype currently does not support TSV files in which the header has one column fewer than the data rows to indicate that the first column holds row names. Files of this kind are handled correctly by the tabular datatype.
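
The difference that tabs inside double quotes make can be seen with the standard-library `csv` module and its `excel_tab` dialect. The sample line is invented for the illustration.

```python
import csv
import io

# Hypothetical record whose second field contains a tab inside double quotes.
line = 'gene1\t"kinase\tdomain"\t42\n'

# Plain splitting treats the quoted tab as a separator and yields four fields.
naive = line.rstrip("\n").split("\t")

# The excel_tab dialect honours the double quotes and yields three fields.
parsed = next(csv.reader(io.StringIO(line), dialect=csv.excel_tab))

print(len(naive), naive)
print(len(parsed), parsed)
```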

file_ext = 'tsv'
dialect

alias of excel_tab

strict_width = True
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tabular.ConnectivityTable(**kwd)[source]

Bases: Tabular

edam_format = 'format_3309'
file_ext = 'ct'
header_regexp = re.compile('^[0-9]+(?:\t|[ ]+).*?(?:ENERGY|energy|dG)[ \t].*?=')
structure_regexp = re.compile('^[0-9]+(?:\t|[ ]+)[ACGTURYKMSWBDHVN]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+')
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.

Items of interest:

  1. We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).

  2. If a tabular file has no data, it will have one column of type ‘str’.

  3. We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.

sniff_prefix(file_prefix: FilePrefix) bool[source]

The ConnectivityTable (CT) is a file format used for describing RNA 2D structures by tools including MFOLD, UNAFOLD and the RNAStructure package. The tabular file format is defined as follows:

5   energy = -12.3  sequence name
1   G       0       2       0       1
2   A       1       3       0       2
3   A       2       4       0       3
4   A       3       5       0       4
5   C       4       6       1       5

The links on the EDAM ontology page do not say which separator is used (space or tab), and implementations differ. The space-separated variant (implemented in RNAStructure) looks like this:

10    ENERGY = -34.8  seqname
1 G       0    2    9    1
2 G       1    3    8    2
3 G       2    4    7    3
4 a       3    5    0    4
5 a       4    6    0    5
6 a       5    7    0    6
7 C       6    8    3    7
8 C       7    9    2    8
9 C       8   10    1    9
10 a       9    0    0   10
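
The two class-level patterns listed above can be exercised against lines from these examples. The patterns are copied as printed on this page; the test strings are rebuilt here with explicit tabs and spaces, which is an assumption about the original whitespace.

```python
import re

# Patterns copied from header_regexp and structure_regexp above (no flags).
header_regexp = re.compile(r"^[0-9]+(?:\t|[ ]+).*?(?:ENERGY|energy|dG)[ \t].*?=")
structure_regexp = re.compile(
    r"^[0-9]+(?:\t|[ ]+)[ACGTURYKMSWBDHVN]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+"
    r"(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+"
)

tab_header = "5\tenergy = -12.3\tsequence name"
space_header = "10    ENERGY = -34.8  seqname"
tab_row = "1\tG\t0\t2\t0\t1"
space_row = "1 G       0    2    9    1"
lowercase_row = "4 a       3    5    0    4"

for text in (tab_header, space_header):
    print(bool(header_regexp.match(text)))
for text in (tab_row, space_row, lowercase_row):
    print(bool(structure_regexp.match(text)))
```

Both separators are accepted by each pattern. Applied on its own and without a case-insensitive flag, the structure pattern as printed does not match rows whose base is lowercase, such as the `a` rows in the second example.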
get_chunk(trans, dataset: HasFileName, offset: int = 0, ck_size: int | None = None) str[source]
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.MatrixMarket(**kwd)[source]

Bases: TabularData

The Matrix Market (MM) exchange formats provide a simple mechanism to facilitate the exchange of matrix data. MM coordinate format is suitable for representing sparse matrices. Only nonzero entries need be encoded, and the coordinates of each are given explicitly.

The tabular file format is defined as follows:

%%MatrixMarket matrix coordinate real general <--- header line
%                                             <--+
% comments                                       |-- 0 or more comment lines
%                                             <--+
    M  N  L                                   <--- rows, columns, entries
    I1  J1  A(I1, J1)                         <--+
    I2  J2  A(I2, J2)                            |
    I3  J3  A(I3, J3)                            |-- L lines
        . . .                                    |
    IL JL  A(IL, JL)                          <--+

Indices are 1-based, i.e. A(1,1) is the first element.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> MatrixMarket().sniff( get_test_fname( 'sequence.maf' ) )
False
>>> MatrixMarket().sniff( get_test_fname( '1.mtx' ) )
True
>>> MatrixMarket().sniff( get_test_fname( '2.mtx' ) )
True
>>> MatrixMarket().sniff( get_test_fname( '3.mtx' ) )
True
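
To make the coordinate layout concrete, the sketch below reads a real-valued coordinate file into a dictionary keyed by 1-based (row, column) pairs. It is an illustration only: the function name `parse_matrix_market` is hypothetical, and other Matrix Market variants (array format, pattern or complex fields, symmetry handling) are not covered.

```python
from typing import Dict, List, Tuple


def parse_matrix_market(
    lines: List[str],
) -> Tuple[Tuple[int, int, int], Dict[Tuple[int, int], float]]:
    """Parse a real coordinate file into its size line and a dict of entries."""
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header line")
    # Drop blank lines and '%' comment lines after the header.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("%")]
    if not body:
        raise ValueError("missing size line")
    rows, cols, count = (int(tok) for tok in body[0].split())
    entries: Dict[Tuple[int, int], float] = {}
    for ln in body[1:1 + count]:
        i, j, value = ln.split()
        entries[(int(i), int(j))] = float(value)  # indices stay 1-based
    return (rows, cols, count), entries
```

Only the nonzero entries are stored, which mirrors the sparse nature of the format.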
file_ext = 'mtx'
__init__(**kwd)[source]

Initialize the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = 5, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.CMAP(**kwd)[source]

Bases: TabularData

# CMAP File Version: 2.0
# Label Channels: 1
# Nickase Recognition Site 1: cttaag;green_01
# Nickase Recognition Site 2: cctcagc;red_01
# Number of Consensus Maps: 459
# Values corresponding to intervals (StdDev, HapDelta) refer to the interval between current site and next site
#h CMapId ContigLength NumSites SiteID LabelChannel Position StdDev Coverage Occurrence ChimQuality SegDupL SegDupR FragileL FragileR OutlierFrac ChimNorm Mask
#f int float int int int float float float float float float float float float float float Hex
182 58474736.7 10235 1 1 58820.9 35.4 13.5 13.5 -1.00 -1.00 -1.00 3.63 0.00 0.00 -1.00 0
182 58474736.7 10235 1 1 58820.9 35.4 13.5 13.5 -1.00 -1.00 -1.00 3.63 0.00 0.00 -1.00 0
182 58474736.7 10235 1 1 58820.9 35.4 13.5 13.5 -1.00 -1.00 -1.00 3.63 0.00 0.00 -1.00 0
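
The `# key: value` comment lines in the example above lend themselves to simple header parsing. The sketch below is an assumption about how such values could be collected and is not taken from this class; the function name `parse_cmap_header` is hypothetical.

```python
from typing import Dict, List


def parse_cmap_header(lines: List[str]) -> Dict[str, str]:
    """Collect 'key: value' pairs from the leading '#' comment lines."""
    header: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the first data row ends the header
        if line.startswith(("#h", "#f")):
            continue  # column-name and column-type lines
        key, sep, value = line[1:].partition(":")
        if sep and value.strip():
            header[key.strip()] = value.strip()
    return header
```

Comment lines without a colon, such as the note about interval values, are skipped.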

file_ext = 'cmap'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, skip: int | None = None, max_data_lines: int | None = 7, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'channel_1_color': <galaxy.model.metadata.MetadataElementSpec object>, 'channel_2_color': <galaxy.model.metadata.MetadataElementSpec object>, 'cmap_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'label_channels': <galaxy.model.metadata.MetadataElementSpec object>, 'nickase_recognition_site_1': <galaxy.model.metadata.MetadataElementSpec object>, 'nickase_recognition_site_2': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_consensus_nanomaps': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.tabular.Psl(**kwd)[source]

Bases: Tabular

Tab delimited data in psl format.

edam_format = 'format_3007'
file_ext = 'psl'
line_class = 'assemblies'
data_sources: Dict[str, str] = {'data': 'tabix'}
__init__(**kwd)[source]

Initialize psl datatype

sniff_prefix(file_prefix: FilePrefix)[source]

PSL lines represent alignments, and are typically generated by BLAT. Each line consists of 21 required fields, and track lines may optionally be used to provide more information.

Fields are tab-separated, and all 21 are required. Although not part of the formal PSL specification, track lines may be used to further configure sets of features. Track lines are placed at the beginning of the list of features they are to affect.

Rules for sniffing as True:

- There must be 21 columns on each feature line
- matches, misMatches, repMatches, nCount, qNumInsert,
  qBaseInsert, tNumInsert, tBaseInsert, strand, qSize, qStart,
  qEnd, tName, tSize, tStart, tEnd, blockCount, blockSizes,
  qStarts, tStarts must be correct
- We will only check that up to the first 10 alignments are
  correctly formatted.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.psl' )
>>> Psl().sniff( fname )
True
>>> fname = get_test_fname( '2.psl' )
>>> Psl().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> Psl().sniff( fname )
False
>>> fname = get_test_fname( '2.txt' )
>>> Psl().sniff( fname )
False
>>> fname = get_test_fname( 'test_tab2.tabular' )
>>> Psl().sniff( fname )
False
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.ref.taxonomy' )
>>> Psl().sniff( fname )
False
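
The sniffing rules above can be approximated with the following sketch. It is not Galaxy's implementation: the helper name `looks_like_psl`, the regular expressions, and the choice to skip lines starting with `track` are assumptions; the column positions follow the standard PSL field order.

```python
import re
from typing import Iterable

# 0-based positions of the integer fields (matches ... tBaseInsert, qSize,
# qStart, qEnd, tSize, tStart, tEnd, blockCount).
INT_COLUMNS = (0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17)
LIST_COLUMNS = (18, 19, 20)  # blockSizes, qStarts, tStarts
STRAND = re.compile(r"^[+-]{1,2}$")
INT_LIST = re.compile(r"^\d+(,\d+)*,?$")  # comma-separated integers


def looks_like_psl(lines: Iterable[str], max_alignments: int = 10) -> bool:
    """Return True if up to max_alignments feature lines look like PSL."""
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("track"):
            continue  # blank lines and track lines are not features
        fields = line.split("\t")
        if len(fields) != 21:
            return False
        if not all(fields[i].isdigit() for i in INT_COLUMNS):
            return False
        if not STRAND.match(fields[8]):
            return False
        if not all(INT_LIST.match(fields[i]) for i in LIST_COLUMNS):
            return False
        checked += 1
        if checked >= max_alignments:
            break
    return checked > 0
```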
metadata_spec: MetadataSpecCollection = {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.text module

Clearing house for generic text datatypes that are not XML or tabular.

class galaxy.datatypes.text.Html(**kwd)[source]

Bases: Text

Class describing an html file

edam_format = 'format_2331'
file_ext = 'html'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

get_mime() str[source]

Returns the mime type of the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in html format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> Html().sniff( fname )
False
>>> fname = get_test_fname( 'file.html' )
>>> Html().sniff( fname )
True
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Json(**kwd)[source]

Bases: Text

edam_format = 'format_3464'
file_ext = 'json'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

get_mime() str[source]

Returns the mime type of the datatype

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to load the string with the json module. If successful it’s a json file.
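
The idea of sniffing by parsing can be sketched with the standard-library `json` module. The helper name `is_json` is hypothetical, and the sketch ignores a complication a real sniffer has to face: a file prefix may be cut off mid-document and fail to parse even though the full file is valid JSON.

```python
import json


def is_json(text: str) -> bool:
    """Return True if text parses as a complete JSON document."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```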

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.DataManagerJson(**kwd)[source]

Bases: Json

file_ext = 'data_manager_json'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd)[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_tables': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.text.ExpressionJson(**kwd)[source]

Bases: Json

Represents the non-data input or output to a tool or workflow.

file_ext = 'json'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'json_type': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.text.Ipynb(**kwd)[source]

Bases: Json

file_ext = 'ipynb'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to load the string with the json module. If successful it’s a json file.

display_data(trans, dataset: DatasetHasHidProtocol, preview: bool = False, filename: str | None = None, to_ext: str | None = None, **kwd)[source]

Displays data in central pane if preview is True, else handles download.

Datatypes should be very careful if overriding this method and this interface between datatypes and Galaxy will likely change.

TODO: Document alternatives to overriding this method (data providers?).

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of models in dataset.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Biom1(**kwd)[source]

Bases: Json

BIOM version 1.0 file format description http://biom-format.org/documentation/format_versions/biom-1.0.html

file_ext = 'biom1'
edam_format = 'format_3746'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to load the string with the json module. If successful it’s a json file.

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Store metadata information from the BIOM file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_column_metadata_headers': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_date': <galaxy.model.metadata.MetadataElementSpec object>, 'table_format': <galaxy.model.metadata.MetadataElementSpec object>, 'table_format_url': <galaxy.model.metadata.MetadataElementSpec object>, 'table_generated_by': <galaxy.model.metadata.MetadataElementSpec object>, 'table_id': <galaxy.model.metadata.MetadataElementSpec object>, 'table_matrix_element_type': <galaxy.model.metadata.MetadataElementSpec object>, 'table_matrix_type': <galaxy.model.metadata.MetadataElementSpec object>, 'table_rows': <galaxy.model.metadata.MetadataElementSpec object>, 'table_shape': <galaxy.model.metadata.MetadataElementSpec object>, 'table_type': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
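
As a rough illustration of how the `table_*` metadata names above relate to a BIOM 1.0 document, the sketch below copies the top-level keys defined by the BIOM 1.0 format into a dictionary with a `table_` prefix. The helper name `biom_metadata` is hypothetical, and the sketch does not cover `table_rows`, `table_columns`, or `table_column_metadata_headers`; Galaxy's own `set_meta` may work differently.

```python
import json

# Top-level scalar keys of a BIOM 1.0 table; each maps to a "table_"-prefixed name.
BIOM_KEYS = (
    "id", "format", "format_url", "type", "generated_by",
    "date", "matrix_type", "matrix_element_type", "shape",
)


def biom_metadata(text: str) -> dict:
    """Collect BIOM 1.0 header fields from JSON text (hypothetical helper)."""
    table = json.loads(text)
    # Keys missing from the document are skipped rather than set to None.
    return {f"table_{key}": table[key] for key in BIOM_KEYS if key in table}
```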

sniff(filename)
class galaxy.datatypes.text.ImgtJson(**kwd)[source]

Bases: Json

https://github.com/repseqio/library-imgt/releases Data coming from IMGT server may be used for academic research only, provided that it is referred to IMGT®, and cited as: “IMGT®, the international ImMunoGeneTics information system® http://www.imgt.org (founder and director: Marie-Paule Lefranc, Montpellier, France).”

file_ext = 'imgt.json'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in json format with imgt elements

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.json' )
>>> ImgtJson().sniff( fname )
False
>>> fname = get_test_fname( 'imgt.json' )
>>> ImgtJson().sniff( fname )
True
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Store metadata information from the imgt file.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'taxon_names': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.GeoJson(**kwd)[source]

Bases: Json

GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON). https://tools.ietf.org/html/rfc7946

file_ext = 'geojson'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in json format with GeoJSON elements

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.json' )
>>> GeoJson().sniff( fname )
False
>>> fname = get_test_fname( 'gis.geojson' )
>>> GeoJson().sniff( fname )
True
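
A GeoJSON check can be sketched as: parse the JSON, then test whether the top-level `type` member is one of the object types defined in RFC 7946. The helper `looks_like_geojson` is hypothetical and only approximates what a sniffer might do; it is not Galaxy's implementation.

```python
import json

# The nine GeoJSON object types named in RFC 7946.
GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
}


def looks_like_geojson(text: str) -> bool:
    """Return True if `text` is a JSON object with a GeoJSON `type` (hypothetical helper)."""
    try:
        obj = json.loads(text)
    except ValueError:
        return False
    if not isinstance(obj, dict):
        return False
    type_name = obj.get("type")
    # Guard against non-string values before the set lookup.
    return isinstance(type_name, str) and type_name in GEOJSON_TYPES
```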
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Obo(**kwd)[source]

Bases: Text

OBO file format description https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_2.html

edam_data = 'data_0582'
edam_format = 'format_2549'
file_ext = 'obo'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess the Obo filetype. It usually starts with a “format-version:” string and has several stanzas which start with “id:”.
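
The heuristic described here can be sketched as follows. `looks_like_obo` is a hypothetical helper, stricter than "usually": it assumes the first non-blank line is the `format-version:` tag and that at least one stanza header such as `[Term]` is immediately followed by an `id:` tag. Galaxy's actual sniffer may apply different rules.

```python
def looks_like_obo(text: str) -> bool:
    """Heuristic OBO check on a chunk of text (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("format-version:"):
        return False
    # Look for a stanza header, e.g. "[Term]", directly followed by an "id:" tag.
    for header, tag in zip(lines, lines[1:]):
        if header.startswith("[") and header.endswith("]") and tag.startswith("id:"):
            return True
    return False
```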

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Arff(**kwd)[source]

Bases: Text

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. http://weka.wikispaces.com/ARFF

edam_format = 'format_3581'
file_ext = 'arff'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to guess the Arff filetype. An ARFF file usually has a header with an “@RELATION” declaration followed by “@ATTRIBUTE” declarations, and then a “@DATA” section.

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Trying to count the comment lines and the number of columns included. A typical ARFF data block looks like this:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
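
One way to count comment lines and columns is sketched below. It relies on two ARFF conventions: comment lines begin with `%`, and each `@ATTRIBUTE` declaration in the header defines one column. The helper `arff_counts` is hypothetical; Galaxy's `set_meta` may derive the column count differently, for example from the data rows.

```python
def arff_counts(text: str) -> tuple:
    """Return (comment_lines, columns) for ARFF text (hypothetical helper)."""
    comment_lines = 0
    columns = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            comment_lines += 1
        elif not in_data:
            # Header keywords are case-insensitive in ARFF.
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                columns += 1
            elif keyword == "@data":
                in_data = True
    return comment_lines, columns
```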

metadata_spec: MetadataSpecCollection = {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.SnpEffDb(**kwd)[source]

Bases: Text

Class describing a SnpEff genome build

edam_format = 'format_3624'
file_ext = 'snpeffdb'
__init__(**kwd)[source]

Initialize the datatype

getSnpeffVersionFromFile(path: str) str | None[source]
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

metadata_spec: MetadataSpecCollection = {'annotation': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'genome_version': <galaxy.model.metadata.MetadataElementSpec object>, 'regulation': <galaxy.model.metadata.MetadataElementSpec object>, 'snpeff_version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.text.SnpSiftDbNSFP(**kwd)[source]

Bases: Text

Class describing a dbNSFP database prepared for use by SnpSift dbnsfp

The dbNSFP file is a tabular file with 1 header line. The first 4 columns are required to be: chrom pos ref alt. These match columns 1, 2, 4, and 5 of the VCF file. SnpSift requires the file to be block-gzipped and then indexed with samtools tabix.

Example:

- Compress using block-gzip algorithm:
  $ bgzip dbNSFP2.3.txt
- Create tabix index:
  $ tabix -s 1 -b 2 -e 2 dbNSFP2.3.txt.gz

file_ext = 'snpsiftdbnsfp'
composite_type: str | None = 'auto_primary_file'
__init__(**kwd)[source]

Initialize the datatype

generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

regenerate_primary_file(dataset: DatasetProtocol) None[source]

Cannot do this until we are setting metadata.

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'annotation': <galaxy.model.metadata.MetadataElementSpec object>, 'bgzip': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'index': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_name': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.text.IQTree(**kwd)[source]

Bases: Text

IQ-TREE format

file_ext = 'iqtree'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Detect the IQTree file

Scattered text file containing various headers and data types.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('example.iqtree')
>>> IQTree().sniff(fname)
True
>>> fname = get_test_fname('temp.txt')
>>> IQTree().sniff(fname)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> IQTree().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Paf(**kwd)[source]

Bases: Text

PAF: a Pairwise mApping Format

https://github.com/lh3/miniasm/blob/master/PAF.md

file_ext = 'paf'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('A-3105.paf')
>>> Paf().sniff(fname)
True
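
For orientation, the PAF specification linked above defines 12 mandatory tab-separated columns, with optional tags after them. A per-line plausibility check could look like the sketch below; `looks_like_paf_line` is a hypothetical helper and is not taken from Galaxy's sniffer.

```python
# Zero-based indexes of the mandatory PAF columns that hold integers:
# query length/start/end, target length/start/end, matches, block length, MAPQ.
PAF_INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)


def looks_like_paf_line(line: str) -> bool:
    """Check one line against the 12 mandatory PAF columns (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return False
    if not all(fields[i].isdigit() for i in PAF_INT_COLUMNS):
        return False
    # Column 5 is the relative strand.
    return fields[4] in ("+", "-")
```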
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Gfa1(**kwd)[source]

Bases: Text

Graphical Fragment Assembly (GFA) 1.0

http://gfa-spec.github.io/GFA-spec/GFA1.html

file_ext = 'gfa1'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('big.gfa1')
>>> Gfa1().sniff(fname)
True
>>> Gfa2().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Gfa2(**kwd)[source]

Bases: Text

Graphical Fragment Assembly (GFA) 2.0

https://github.com/GFA-spec/GFA-spec/blob/master/GFA2.md

file_ext = 'gfa2'
sniff_prefix(file_prefix: FilePrefix) bool[source]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sample.gfa2')
>>> Gfa2().sniff(fname)
True
>>> Gfa1().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Yaml(**kwd)[source]

Bases: Text

Yaml files

file_ext = 'yaml'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to load the string with the yaml module. If successful it’s a yaml file.

get_mime() str[source]

Returns the mime type of the datatype

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.BCSLmodel(**kwd)[source]

Bases: Text

BioChemical Space Language model file

file_ext = 'bcsl.model'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .bcsl.model format

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.BCSLts(**kwd)[source]

Bases: Json

BioChemical Space Language transition system file

file_ext = 'bcsl.ts'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .bcsl.ts format

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.StormSample(**kwd)[source]

Bases: Text

Storm PCTL parameter synthesis result file containing probability function of parameters.

file_ext = 'storm.sample'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .storm.sample format

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.StormCheck(**kwd)[source]

Bases: Text

Storm PCTL model checking result file containing boolean or numerical result.

file_ext = 'storm.check'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .storm.check format

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.CTLresult(**kwd)[source]

Bases: Text

CTL model checking result

file_ext = 'ctl.result'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .ctl.result format

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.PithyaProperty(**kwd)[source]

Bases: Text

Pithya CTL property format

file_ext = 'pithya.property'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .pithya.property format

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.PithyaModel(**kwd)[source]

Bases: Text

Pithya model format

file_ext = 'pithya.model'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .pithya.model format

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.PithyaResult(**kwd)[source]

Bases: Json

Pithya result format

file_ext = 'pithya.result'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is in .pithya.result format

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Castep(**kwd)[source]

Bases: Text

Report on a CASTEP calculation

file_ext = 'castep'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is a CASTEP log

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.castep')
>>> Castep().sniff(fname)
True
>>> fname = get_test_fname('Si.param')
>>> Castep().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.Param(**kwd)[source]

Bases: Yaml

CASTEP parameter input file

file_ext = 'param'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Modified version of the normal Yaml sniff that also checks for a valid CASTEP task key-value pair, which is not case sensitive

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.param')
>>> Param().sniff(fname)
True
>>> fname = get_test_fname('Si.castep')
>>> Param().sniff(fname)
False
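
The extra check for a case-insensitive CASTEP task key-value pair can be sketched as below. Everything here is an assumption made for illustration: `has_castep_task` is a hypothetical helper, `KNOWN_TASKS` is a small, non-exhaustive subset of CASTEP task names, and the sketch skips the YAML parsing step that the real sniffer builds on.

```python
import re

# Illustrative subset of CASTEP task values, lower-cased; NOT the full list.
KNOWN_TASKS = {
    "singlepoint", "bandstructure", "geometryoptimization",
    "moleculardynamics", "phonon",
}

# Matches "task : value" or "task = value", ignoring case and spacing.
TASK_LINE = re.compile(r"^\s*task\s*[:=]\s*(\S+)", re.IGNORECASE)


def has_castep_task(text: str) -> bool:
    """Return True if any line sets `task` to a known value (hypothetical helper)."""
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1).lower() in KNOWN_TASKS:
            return True
    return False
```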
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.text.FormattedDensity(**kwd)[source]

Bases: Text

Final electron density from a CASTEP calculation written to an ASCII file

file_ext = 'den_fmt'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file contains electron densities in the CASTEP den_fmt format

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Si.den_fmt')
>>> FormattedDensity().sniff(fname)
True
>>> fname = get_test_fname('YbCuAs2.den_fmt')
>>> FormattedDensity().sniff(fname)
True
>>> fname = get_test_fname('Si.param')
>>> FormattedDensity().sniff(fname)
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

galaxy.datatypes.tracks module

Datatype classes for tracks/track views within Galaxy.

class galaxy.datatypes.tracks.GeneTrack(**kwd)[source]

Bases: Binary

edam_data = 'data_3002'
edam_format = 'format_2919'
file_ext = 'genetrack'
metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.tracks.UCSCTrackHub(**kwd)[source]

Bases: Html

Datatype for UCSC TrackHub

file_ext = 'trackhub'
composite_type: str | None = 'auto_primary_file'
generate_primary_file(dataset: HasExtraFilesAndMetadata) str[source]

This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek. This method is used by various subclasses of Text.

display_peek(dataset: DatasetProtocol) str[source]

Create HTML table, used for displaying peek

sniff(filename: str) bool[source]
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

galaxy.datatypes.triples module

Triple format classes

class galaxy.datatypes.triples.Triples(**kwd)[source]

Bases: Data

The abstract base class for the file format that can contain triples

edam_data = 'data_0582'
edam_format = 'format_2376'
file_ext = 'triples'
sniff(filename: str) bool[source]

Returns False; the user must set the datatype manually.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.triples.NTriples(**kwd)[source]

Bases: Text, Triples

The N-Triples triple data format

edam_format = 'format_3256'
file_ext = 'nt'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

Returns False; the user must set the datatype manually.

class galaxy.datatypes.triples.N3(**kwd)[source]

Bases: Text, Triples

The N3 triple data format

edam_format = 'format_3257'
file_ext = 'n3'
sniff(filename: str) bool[source]

Returns False; the user must set the datatype manually.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.triples.Turtle(**kwd)[source]

Bases: Text, Triples

The Turtle triple data format

edam_format = 'format_3255'
file_ext = 'ttl'
sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

Returns False; the user must set the datatype manually.

class galaxy.datatypes.triples.Rdf(**kwd)[source]

Bases: GenericXml, Triples

Resource Description Framework format (http://www.w3.org/RDF/).

edam_format = 'format_3261'
file_ext = 'rdf'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is XML or not

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

Returns False; the user must set the datatype manually.

class galaxy.datatypes.triples.Jsonld(**kwd)[source]

Bases: Json, Triples

The JSON-LD data format

edam_format = 'format_3464'
file_ext = 'jsonld'
sniff_prefix(file_prefix: FilePrefix) bool[source]

Try to load the string with the json module. If successful it’s a json file.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

Returns False; the user must set the datatype manually.

class galaxy.datatypes.triples.HDT(**kwd)[source]

Bases: Binary, Triples

The HDT triple data format

edam_format = 'format_2376'
file_ext = 'hdt'
sniff(filename: str) bool[source]

Returns False; the user must set the datatype manually.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.triples.Sbol(**kwd)[source]

Bases: Text, Triples

The SBOL data format (https://sbolstandard.org).

edam_format = 'format_3725'
file_ext = 'sbol'
set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

sniff_prefix(file_prefix: FilePrefix) bool[source]
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)

Returns False; the user must set the datatype manually.

galaxy.datatypes.upload_util module

exception galaxy.datatypes.upload_util.UploadProblemException[source]

Bases: Exception

class galaxy.datatypes.upload_util.HandleUploadResponse(stdout, ext, datatype, is_binary, converted_path, converted_newlines, converted_spaces)[source]

Bases: tuple

stdout: str | None

Alias for field number 0

ext: str

Alias for field number 1

datatype: Data

Alias for field number 2

is_binary: bool

Alias for field number 3

converted_path: str | None

Alias for field number 4

converted_newlines: bool

Alias for field number 5

converted_spaces: bool

Alias for field number 6
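
The "Alias for field number N" entries mean that each named attribute is also reachable by tuple index. The stand-in below mirrors the field names and order listed above; `UploadResponseSketch` is a hypothetical class used only to show the behaviour, with `object` standing in for Galaxy's `Data` type.

```python
from typing import NamedTuple, Optional


class UploadResponseSketch(NamedTuple):
    """Stand-in with the same field order as HandleUploadResponse (hypothetical)."""

    stdout: Optional[str]
    ext: str
    datatype: object  # Galaxy annotates this field as Data
    is_binary: bool
    converted_path: Optional[str]
    converted_newlines: bool
    converted_spaces: bool


response = UploadResponseSketch(None, "txt", object(), False, None, True, False)

# Named access and positional access refer to the same values.
same_ext = response.ext == response[1]

# Because it is a tuple, the result can also be unpacked.
stdout, ext, *rest = response
```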

galaxy.datatypes.upload_util.handle_upload(registry, path: str, requested_ext: str, name: str, tmp_prefix: str | None, tmp_dir: str | None, check_content: bool, link_data_only: bool, in_place: bool, auto_decompress: bool, convert_to_posix_lines: bool, convert_spaces_to_tabs: bool) HandleUploadResponse[source]

galaxy.datatypes.xml module

XML format classes

class galaxy.datatypes.xml.GenericXml(**kwd)[source]

Bases: Text

Base format class for any XML file.

edam_format = 'format_2332'
file_ext = 'xml'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Determines whether the file is XML or not

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
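
A simple XML check in the spirit of the doctest above is to look at the first non-blank line for an XML declaration. The sketch below assumes that a declaration is present and takes plain text instead of a `FilePrefix`; `looks_like_xml` is a hypothetical helper and Galaxy's real sniffer may differ.

```python
def looks_like_xml(prefix: str) -> bool:
    """Return True if the first non-blank line starts an XML declaration (hypothetical helper)."""
    for line in prefix.splitlines():
        line = line.strip()
        if line:
            # Assumption: documents without an XML declaration are rejected.
            return line.startswith("<?xml ")
    return False
```

Looking only at the first non-blank line keeps the check cheap, at the cost of rejecting well-formed XML that omits the declaration.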
static merge(split_files: List[str], output_file: str) None[source]

Merging multiple XML files is non-trivial and must be done in subclasses.

xml_dataprovider(dataset: DatasetProtocol, **settings) XMLDataProvider[source]
dataproviders: Dict[str, Any] = {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'xml': <function GenericXml.xml_dataprovider>}
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
class galaxy.datatypes.xml.MEMEXml(**kwd)[source]

Bases: GenericXml

MEME XML Output data

file_ext = 'memexml'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
sniff_prefix(file_prefix)

Determines whether the file is XML or not

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
class galaxy.datatypes.xml.CisML(**kwd)[source]

Bases: GenericXml

CisML XML data

file_ext = 'cisml'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

sniff(filename)
sniff_prefix(file_prefix)

Determines whether the file is XML or not

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
class galaxy.datatypes.xml.Dzi(**kwd)[source]

Bases: GenericXml

Deep zoom image format, see https://github.com/openseadragon/openseadragon/wiki/The-DZI-File-Format

file_ext = 'dzi'
__init__(**kwd)[source]

Initialize the datatype

set_meta(dataset: DatasetProtocol, overwrite: bool = True, **kwd) None[source]

Set the number of lines of data in dataset.

set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checking for keyword - ‘Collection’ or ‘Image’ in the first 200 lines.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.dzi')
>>> Dzi().sniff(fname)
True
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> Dzi().sniff(fname)
False

metadata_spec: MetadataSpecCollection = {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'height': <galaxy.model.metadata.MetadataElementSpec object>, 'max_level': <galaxy.model.metadata.MetadataElementSpec object>, 'overlap': <galaxy.model.metadata.MetadataElementSpec object>, 'quality': <galaxy.model.metadata.MetadataElementSpec object>, 'tile_size': <galaxy.model.metadata.MetadataElementSpec object>, 'width': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype
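
Several of the metadata fields above correspond to attributes of a DZI descriptor: the `Image` element carries `Format`, `Overlap`, and `TileSize`, and its `Size` child carries `Width` and `Height`. The sketch below reads those five values with the standard library. `dzi_metadata` is a hypothetical helper; it handles only `Image` descriptors and does not compute `base_name`, `max_level`, or `quality`.

```python
import xml.etree.ElementTree as ET


def dzi_metadata(xml_text: str) -> dict:
    """Extract basic attributes from a DZI Image descriptor (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    meta = {
        "format": root.get("Format"),
        "overlap": root.get("Overlap"),
        "tile_size": root.get("TileSize"),
    }
    for child in root:
        # Strip any XML namespace, e.g. "{http://...}Size" -> "Size".
        if child.tag.rsplit("}", 1)[-1] == "Size":
            meta["width"] = child.get("Width")
            meta["height"] = child.get("Height")
    return meta
```

Values are returned as strings exactly as they appear in the file; converting them to integers is left to the caller.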

class galaxy.datatypes.xml.Phyloxml(**kwd)[source]

Bases: GenericXml

Format for defining phyloxml data http://www.phyloxml.org/

edam_data = 'data_0872'
edam_format = 'format_3159'
file_ext = 'phyloxml'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checking for keyword - ‘phyloxml’ always in lowercase in the first few lines.

>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.phyloxml' )
>>> Phyloxml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> Phyloxml().sniff( fname )
False
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> Phyloxml().sniff( fname )
False
metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.xml.Owl(**kwd)[source]

Bases: GenericXml

Web Ontology Language OWL format description http://www.w3.org/TR/owl-ref/

edam_format = 'format_3262'
file_ext = 'owl'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checking for keyword - ‘<owl’ in the first 200 lines.
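
A keyword search limited to the first 200 lines can be sketched as below. `keyword_in_first_lines` is a hypothetical helper working on plain text; the match is case-sensitive, and the 200-line limit is taken from the description above.

```python
from itertools import islice


def keyword_in_first_lines(text: str, keyword: str, max_lines: int = 200) -> bool:
    """Return True if `keyword` occurs within the first `max_lines` lines (hypothetical helper)."""
    # islice stops after max_lines, so later lines are never inspected.
    return any(keyword in line for line in islice(text.splitlines(), max_lines))
```

For example, `keyword_in_first_lines(text, "<owl")` mirrors the check described for this datatype.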

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype

class galaxy.datatypes.xml.Sbml(**kwd)[source]

Bases: GenericXml

Systems Biology Markup Language http://sbml.org

file_ext = 'sbml'
edam_data = 'data_2024'
edam_format = 'format_2585'
set_peek(dataset: DatasetProtocol, **kwd) None[source]

Set the peek and blurb text

sniff_prefix(file_prefix: FilePrefix) bool[source]

Checking for keyword - ‘<sbml’ in the first 200 lines.

metadata_spec: MetadataSpecCollection = {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}

Dictionary of metadata fields for this datatype