Warning
This document is for an old release of Galaxy.
galaxy.datatypes package¶
Subpackages¶
- galaxy.datatypes.converters package
- Submodules
- galaxy.datatypes.converters.bed_to_gff_converter module
- galaxy.datatypes.converters.bgzip module
- galaxy.datatypes.converters.cram_to_bam module
- galaxy.datatypes.converters.fasta_to_len module
- galaxy.datatypes.converters.fasta_to_tabular_converter module
- galaxy.datatypes.converters.fastq_to_fqtoc module
- galaxy.datatypes.converters.fastqsolexa_to_fasta_converter module
- galaxy.datatypes.converters.fastqsolexa_to_qual_converter module
- galaxy.datatypes.converters.gff_to_bed_converter module
- galaxy.datatypes.converters.gff_to_interval_index_converter module
- galaxy.datatypes.converters.interval_to_bed_converter module
- galaxy.datatypes.converters.interval_to_bedstrict_converter module
- galaxy.datatypes.converters.interval_to_fli module
- galaxy.datatypes.converters.interval_to_interval_index_converter module
- galaxy.datatypes.converters.interval_to_tabix_converter module
- galaxy.datatypes.converters.lped_to_fped_converter module
- galaxy.datatypes.converters.lped_to_pbed_converter module
- galaxy.datatypes.converters.maf_to_fasta_converter module
- galaxy.datatypes.converters.maf_to_interval_converter module
- galaxy.datatypes.converters.pbed_ldreduced_converter module
- galaxy.datatypes.converters.pbed_to_lped_converter module
- galaxy.datatypes.converters.picard_interval_list_to_bed6_converter module
- galaxy.datatypes.converters.pileup_to_interval_index_converter module
- galaxy.datatypes.converters.ref_to_seq_taxonomy_converter module
- galaxy.datatypes.converters.tabular_csv module
- galaxy.datatypes.converters.tabular_to_dbnsfp module
- galaxy.datatypes.converters.vcf_to_interval_index_converter module
- galaxy.datatypes.converters.vcf_to_vcf_bgzip module
- galaxy.datatypes.converters.wiggle_to_simple_converter module
- galaxy.datatypes.dataproviders package
- Submodules
- galaxy.datatypes.dataproviders.base module
- galaxy.datatypes.dataproviders.chunk module
- galaxy.datatypes.dataproviders.column module
- galaxy.datatypes.dataproviders.dataset module
- galaxy.datatypes.dataproviders.decorators module
- galaxy.datatypes.dataproviders.exceptions module
- galaxy.datatypes.dataproviders.external module
- galaxy.datatypes.dataproviders.hierarchy module
- galaxy.datatypes.dataproviders.line module
- galaxy.datatypes.display_applications package
- galaxy.datatypes.util package
Submodules¶
galaxy.datatypes.annotation module¶
-
class
galaxy.datatypes.annotation.
SnapHmm
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'snaphmm'¶
-
edam_data
= 'data_1364'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.anvio module¶
Datatypes for Anvi’o https://github.com/merenlab/anvio
-
class
galaxy.datatypes.anvio.
AnvioComposite
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
Base class for Anvi’o composite datatypes. These generally consist of a SQLite database plus optional additional files.
-
file_ext
= 'anvio_composite'¶
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioComposite
Class for AnvioDB database files.
-
file_ext
= 'anvio_db'¶
-
set_meta
(dataset, **kwd)[source]¶ Set the anvio_basename based upon actual extra_files_path contents.
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioStructureDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Structure DB database files.
-
file_ext
= 'anvio_structure_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioGenomesDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Genomes DB database files.
-
file_ext
= 'anvio_genomes_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioContigsDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Contigs DB database files.
-
file_ext
= 'anvio_contigs_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioProfileDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Profile DB database files.
-
file_ext
= 'anvio_profile_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioPanDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Pan DB database files.
-
file_ext
= 'anvio_pan_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.anvio.
AnvioSamplesDB
(*args, **kwd)[source]¶ Bases:
galaxy.datatypes.anvio.AnvioDB
Class for Anvio Samples DB database files.
-
file_ext
= 'anvio_samples_db'¶
-
metadata_spec
= {'anvio_basename': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.assembly module¶
Velvet datatypes for the velvet assembler tool in Galaxy, by James E Johnson, University of Minnesota.
-
class
galaxy.datatypes.assembly.
Amos
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing the AMOS assembly file
-
edam_data
= 'data_0925'¶
-
edam_format
= 'format_3582'¶
-
file_ext
= 'afg'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in the AMOS assembly file format. Example:
{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.assembly.
Sequences
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fasta
Class describing the Sequences file generated by velveth
-
edam_data
= 'data_0925'¶
-
file_ext
= 'sequences'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in the FASTA format produced by velveth. The id line has 3 fields separated by tabs (sequence_name, sequence_index, category):
>SEQUENCE_0_length_35	1	1
GGATATAGGGCCAACCCAACTCAACGGCCTGTCTT
>SEQUENCE_1_length_35	2	1
CGACGAATGACAGGTCACGAATTTGGCGGGGATTA
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
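The three-field id line described for Sequences.sniff_prefix can be checked along the following lines. This is only a minimal sketch, not Galaxy's implementation: the function name is hypothetical, and the requirement that the index and category fields be integers is an assumption drawn from the example above.

```python
def looks_like_velveth_id_line(line):
    """Return True for '>name<TAB>index<TAB>category' id lines.

    Assumption: sequence_index and category are non-negative integers,
    as in the example id lines shown for Sequences.sniff_prefix.
    """
    if not line.startswith(">"):
        return False
    fields = line.rstrip("\n").split("\t")
    return len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit()
```

A plain FASTA header without the two tab-separated trailing fields is rejected, which is what distinguishes this datatype from its Fasta base class.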
-
-
class
galaxy.datatypes.assembly.
Roadmaps
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing the Roadmaps file generated by velveth
-
edam_format
= 'format_2561'¶
-
file_ext
= 'roadmaps'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a velveth-produced RoadMap:
142858 21 1
ROADMAP 1
ROADMAP 2
…
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.assembly.
Velvet
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
-
file_ext
= 'velvet'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'long_reads': <galaxy.model.metadata.MetadataElementSpec object>, 'paired_end_reads': <galaxy.model.metadata.MetadataElementSpec object>, 'short2_reads': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.binary module¶
Binary classes
-
class
galaxy.datatypes.binary.
Binary
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Binary data
-
edam_format
= 'format_2333'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Ab1
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an ab1 binary sequence file
-
file_ext
= 'ab1'¶
-
edam_format
= 'format_3000'¶
-
edam_data
= 'data_0924'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Idat
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Binary data in idat format
-
file_ext
= 'idat'¶
-
edam_format
= 'format_2058'¶
-
edam_data
= 'data_2603'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Cel
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Cel File format described at: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html
-
file_ext
= 'cel'¶
-
edam_format
= 'format_1638'¶
-
edam_data
= 'data_3110'¶
-
sniff
(filename)[source]¶ Try to guess if the file is a Cel file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('affy_v_agcc.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_3.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('affy_v_4.cel')
>>> Cel().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Cel().sniff(fname)
False
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
MashSketch
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Mash Sketch file. Sketches are used by the MinHash algorithm to allow fast distance estimations with low storage and memory requirements. To make a sketch, each k-mer in a sequence is hashed, which creates a pseudo-random identifier. By sorting these identifiers (hashes), a small subset from the top of the sorted list can represent the entire sequence (these are min-hashes). The more similar another sequence is, the more min-hashes it is likely to share.
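The sketching idea described above can be illustrated with a small bottom-k example. This is a rough sketch under stated assumptions, not the Mash implementation: the function names, the default k-mer length and the sketch size are hypothetical, and SHA-1 is used only as a convenient stand-in for whatever hash function Mash actually uses.

```python
import hashlib


def kmers(sequence, k):
    """Yield every overlapping k-mer of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def sketch(sequence, k=4, size=8):
    """Return the `size` smallest distinct k-mer hashes, sorted ascending."""
    hashes = {
        # Assumption: the first 8 bytes of SHA-1 serve as the pseudo-random identifier.
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return sorted(hashes)[:size]


def shared_fraction(sketch_a, sketch_b):
    """Fraction of min-hashes in sketch_a that also occur in sketch_b."""
    if not sketch_a:
        return 0.0
    return len(set(sketch_a) & set(sketch_b)) / len(sketch_a)
```

Comparing two sketches with shared_fraction gives a crude similarity score: identical sequences share every min-hash, while unrelated sequences are expected to share few.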
-
file_ext
= 'msh'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
CompressedArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.
-
file_ext
= 'compressed_archive'¶
-
compressed
= True¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Meryldb
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
MerylDB is a tar.gz archive containing 128 files: 64 data files and 64 index files.
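The layout described above (64 data files plus 64 index files inside a tar.gz archive) could be verified roughly as follows. This is a sketch only: the function names are hypothetical, and the `.merylData` and `.merylIndex` suffixes are assumptions about how the members are named, not something this page states.

```python
import tarfile


def count_meryl_members(path, data_suffix=".merylData", index_suffix=".merylIndex"):
    """Count regular files in a tar archive ending with each suffix.

    Assumption: the suffix defaults are illustrative guesses at the member names.
    """
    with tarfile.open(path, "r:*") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    data = sum(name.endswith(data_suffix) for name in names)
    index = sum(name.endswith(index_suffix) for name in names)
    return data, index


def looks_like_meryldb(path):
    """True if the path is a tar archive holding 64 data and 64 index files."""
    return tarfile.is_tarfile(path) and count_meryl_members(path) == (64, 64)
```

Directory entries inside the archive are ignored by the `isfile()` filter, so only the 128 regular files are counted.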
-
file_ext
= 'meryldb'¶
-
sniff
(filename)[source]¶ Try to guess if the file is a Meryldb file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('affy_v_agcc.cel')
>>> Meryldb().sniff(fname)
False
>>> fname = get_test_fname('read-db.meryldb')
>>> Meryldb().sniff(fname)
True
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
DynamicCompressedArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
GzDynamicCompressedArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.DynamicCompressedArchive
-
compressed_format
= 'gzip'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Bz2DynamicCompressedArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.DynamicCompressedArchive
-
compressed_format
= 'bz2'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
CompressedZipArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.
-
file_ext
= 'zip'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
GenericAsn1Binary
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for generic ASN.1 binary format
-
file_ext
= 'asn1-binary'¶
-
edam_format
= 'format_1966'¶
-
edam_data
= 'data_0849'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BamNative
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Class describing a BAM binary file that is not necessarily sorted
-
edam_format
= 'format_2572'¶
-
edam_data
= 'data_0863'¶
-
file_ext
= 'unsorted.bam'¶
-
static
merge
(split_files, output_file)[source]¶ Merges BAM files
- Parameters
split_files – List of bam file paths to merge
output_file – Write merged bam file to this location
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
to_archive
(dataset, name='')[source]¶ Collect archive paths and file handles that need to be exported when archiving dataset.
- Parameters
dataset – HistoryDatasetAssociation
name – archive name, in collection context corresponds to collection name(s) and element_identifier, joined by ‘/’, e.g ‘fastq_collection/sample1/forward’
-
groom_dataset_content
(file_name)[source]¶ Ensures that the BAM file contents are coordinate-sorted. This function is called on an output dataset after the content is initially generated.
-
display_data
(trans, dataset, preview=False, filename=None, to_ext=None, offset=None, ck_size=None, **kwd)[source]¶ Displays data in central pane if preview is True, else handles download.
Datatypes should be very careful when overriding this method, and this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
metadata_spec
= {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Bam
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BamNative
Class describing a BAM binary file
-
edam_format
= 'format_2572'¶
-
edam_data
= 'data_0863'¶
-
file_ext
= 'bam'¶
-
get_index_flag
(file_name)[source]¶ Return the pysam flag for a bai index (default) or a csi index (used when a contig size exceeds 2**29 - 1).
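The choice between the two index types can be sketched as below. This is an illustration rather than the method's actual code: the function name is hypothetical, it takes a list of reference lengths instead of a file name, and the returned values assume the `-b` (bai) and `-c` (csi) options of `samtools index`.

```python
# Largest contig length addressable by a bai index, per the description above.
BAI_MAX_CONTIG_LENGTH = 2 ** 29 - 1


def choose_index_flag(reference_lengths):
    """Return '-b' (bai) unless some contig is too long, in which case '-c' (csi)."""
    if any(length > BAI_MAX_CONTIG_LENGTH for length in reference_lengths):
        return "-c"
    return "-b"
```

In practice the reference lengths would come from the BAM header; they are passed in directly here so that the sketch runs without a BAM file.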
-
dataset_content_needs_grooming
(file_name)[source]¶ Check if file_name is a coordinate-sorted BAM file
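One way to picture the coordinate-sorted check is to look at the `SO` tag on the `@HD` line of the header. The following is a minimal sketch that works on SAM-style header text; the function names are hypothetical, and it should not be read as how Galaxy performs the check.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def needs_grooming(header_text, required_order="coordinate"):
    """True when the declared sort order differs from the required one."""
    return sort_order_from_header(header_text) != required_order
```

A header with no `@HD` line, or with no `SO` tag, is treated as needing grooming, since its sort order is not declared.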
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
samtools_dataprovider
(dataset, **settings)[source]¶ Generic samtools interface - all options available through settings.
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function Bam.column_dataprovider>, 'dict': <function Bam.dict_dataprovider>, 'genomic-region': <function Bam.genomic_region_dataprovider>, 'genomic-region-dict': <function Bam.genomic_region_dict_dataprovider>, 'header': <function Bam.header_dataprovider>, 'id-seq-qual': <function Bam.id_seq_qual_dataprovider>, 'line': <function Bam.line_dataprovider>, 'regex-line': <function Bam.regex_line_dataprovider>, 'samtools': <function Bam.samtools_dataprovider>}¶
-
metadata_spec
= {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
ProBam
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Bam
Class describing a BAM binary file - extended for proteomics data
-
edam_format
= 'format_3826'¶
-
edam_data
= 'data_0863'¶
-
file_ext
= 'probam'¶
-
metadata_spec
= {'bam_csi_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_index': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BamInputSorted
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BamNative
A class for BAM files that can formally be unsorted or queryname sorted. Alignments are either ordered based on the order in which the queries appear when producing the alignment, or ordered by their queryname. This notably keeps alignments produced by paired-end sequencing adjacent.
-
file_ext
= 'qname_input_sorted.bam'¶
-
metadata_spec
= {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BamQuerynameSorted
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BamInputSorted
A class for queryname sorted BAM files.
-
file_ext
= 'qname_sorted.bam'¶
-
dataset_content_needs_grooming
(file_name)[source]¶ Check if file_name is a queryname-sorted BAM file
-
metadata_spec
= {'bam_header': <galaxy.model.metadata.MetadataElementSpec object>, 'bam_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'read_groups': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_lengths': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_names': <galaxy.model.metadata.MetadataElementSpec object>, 'sort_order': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
CRAM
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
file_ext
= 'cram'¶
-
edam_format
= 'format_3462'¶
-
edam_data
= 'format_0863'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'cram_index': <galaxy.model.metadata.MetadataElementSpec object>, 'cram_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BaseBcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
-
edam_format
= 'format_3020'¶
-
edam_data
= 'data_3498'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Bcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BaseBcf
Class describing a (BGZF-compressed) BCF file
-
file_ext
= 'bcf'¶
-
metadata_spec
= {'bcf_index': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BcfUncompressed
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BaseBcf
Class describing an uncompressed BCF file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.bcf_uncompressed')
>>> BcfUncompressed().sniff(fname)
True
>>> fname = get_test_fname('1.bcf')
>>> BcfUncompressed().sniff(fname)
False
-
file_ext
= 'bcf_uncompressed'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
H5
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an HDF5 file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mz5')
>>> H5().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> H5().sniff(fname)
False
-
file_ext
= 'h5'¶
-
edam_format
= 'format_3590'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
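For the H5 class, a common way to recognise an HDF5 file is to compare its first eight bytes with the HDF5 format signature. The sketch below assumes the signature sits at offset 0 and uses a hypothetical function name; it is not presented as what H5().sniff does.

```python
# The eight-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Subclasses such as Loom, Anndata, Biom2, Cool and MCool share this container format, so telling them apart needs a further look at the file contents.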
-
-
class
galaxy.datatypes.binary.
Loom
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.H5
Class describing a Loom file: http://loompy.org/
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.loom')
>>> Loom().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Loom().sniff(fname)
False
-
file_ext
= 'loom'¶
-
edam_format
= 'format_3590'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'col_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'col_attrs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'col_graphs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'col_graphs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'description': <galaxy.model.metadata.MetadataElementSpec object>, 'doi': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_count': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_names': <galaxy.model.metadata.MetadataElementSpec object>, 'loom_spec_version': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'row_graphs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'row_graphs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'title': <galaxy.model.metadata.MetadataElementSpec object>, 'url': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Anndata
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.H5
Class describing HDF5 anndata files: http://anndata.rtfd.io
>>> from galaxy.datatypes.sniff import get_test_fname
>>> Anndata().sniff(get_test_fname('pbmc3k_tiny.h5ad'))
True
>>> Anndata().sniff(get_test_fname('test.mz5'))
False
>>> Anndata().sniff(get_test_fname('import.loom.krumsiek11.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_6_small2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_6_small.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_7_4_small2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_0_7_4_small.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_unk2.h5ad'))
True
>>> Anndata().sniff(get_test_fname('adata_unk.h5ad'))
True
-
file_ext
= 'h5ad'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'anndata_spec_version': <galaxy.model.metadata.MetadataElementSpec object>, 'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'description': <galaxy.model.metadata.MetadataElementSpec object>, 'doi': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_count': <galaxy.model.metadata.MetadataElementSpec object>, 'layers_names': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_names': <galaxy.model.metadata.MetadataElementSpec object>, 'obs_size': <galaxy.model.metadata.MetadataElementSpec object>, 'obsm_count': <galaxy.model.metadata.MetadataElementSpec object>, 'obsm_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_count': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'raw_var_size': <galaxy.model.metadata.MetadataElementSpec object>, 'row_attrs_count': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'title': <galaxy.model.metadata.MetadataElementSpec object>, 'uns_count': <galaxy.model.metadata.MetadataElementSpec object>, 'uns_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'url': <galaxy.model.metadata.MetadataElementSpec object>, 'var_count': <galaxy.model.metadata.MetadataElementSpec object>, 'var_layers': <galaxy.model.metadata.MetadataElementSpec object>, 'var_size': <galaxy.model.metadata.MetadataElementSpec object>, 'varm_count': <galaxy.model.metadata.MetadataElementSpec object>, 'varm_layers': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
GmxBinary
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Base class for GROMACS binary files - xtc, trr, cpt
-
file_ext
= ''¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
Trr
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.GmxBinary
Class describing a trr file from the GROMACS suite
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.trr')
>>> Trr().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Trr().sniff(fname)
False
-
file_ext
= 'trr'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Cpt
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.GmxBinary
Class describing a checkpoint (.cpt) file from the GROMACS suite
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.cpt')
>>> Cpt().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Cpt().sniff(fname)
False
-
file_ext
= 'cpt'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Xtc
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.GmxBinary
Class describing an xtc file from the GROMACS suite
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.xtc')
>>> Xtc().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Xtc().sniff(fname)
False
-
file_ext
= 'xtc'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Edr
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.GmxBinary
Class describing an edr file from the GROMACS suite
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('md.edr')
>>> Edr().sniff(fname)
True
>>> fname = get_test_fname('md.trr')
>>> Edr().sniff(fname)
False
-
file_ext
= 'edr'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Biom2
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.H5
Class describing a biom2 file (http://biom-format.org/documentation/biom_format.html)
-
file_ext
= 'biom2'¶
-
edam_format
= 'format_3746'¶
-
sniff
(filename)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Biom2().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Biom2().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Biom2().sniff(fname)
False
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'creation_date': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'format_url': <galaxy.model.metadata.MetadataElementSpec object>, 'format_version': <galaxy.model.metadata.MetadataElementSpec object>, 'generated_by': <galaxy.model.metadata.MetadataElementSpec object>, 'id': <galaxy.model.metadata.MetadataElementSpec object>, 'nnz': <galaxy.model.metadata.MetadataElementSpec object>, 'shape': <galaxy.model.metadata.MetadataElementSpec object>, 'type': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Cool
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.H5
Class describing the cool format (https://github.com/mirnylab/cooler)
-
file_ext
= 'cool'¶
-
sniff
(filename)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.cool')
>>> Cool().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> Cool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> Cool().sniff(fname)
False
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
MCool
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.H5
Class describing the multi-resolution cool format (https://github.com/mirnylab/cooler)
-
file_ext
= 'mcool'¶
-
sniff
(filename)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('matrix.mcool')
>>> MCool().sniff(fname)
True
>>> fname = get_test_fname('matrix.cool')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('test.mz5')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('wiggle.wig')
>>> MCool().sniff(fname)
False
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> MCool().sniff(fname)
False
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
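Cool and MCool both derive from H5, so a cheap first check before opening a file with an HDF5 library is the HDF5 format signature. The sketch below is illustrative only and is not Galaxy's implementation: `has_hdf5_signature` is a hypothetical helper, and it inspects offset 0 only, although the HDF5 format also allows the signature at later offsets (512, 1024, 2048, and so on).

```python
# 8-byte signature that opens the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper: only offset 0 is checked.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Telling cool, mcool and biom2 files apart would additionally require reading groups and attributes with an HDF5 reader, which is not shown here.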
-
-
class
galaxy.datatypes.binary.
Scf
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an scf binary sequence file
-
edam_format
= 'format_1632'¶
-
edam_data
= 'data_0924'¶
-
file_ext
= 'scf'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Sff
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Standard Flowgram Format (SFF)
-
edam_format
= 'format_3284'¶
-
edam_data
= 'data_0924'¶
-
file_ext
= 'sff'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
BigWig
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Accessing binary BigWig files from UCSC. The supplemental info in the paper has the binary details: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq351v1
-
edam_format
= 'format_3006'¶
-
edam_data
= 'data_3002'¶
-
file_ext
= 'bigwig'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
BigBed
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BigWig
BigBed support from UCSC.
-
edam_format
= 'format_3004'¶
-
edam_data
= 'data_3002'¶
-
file_ext
= 'bigbed'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
TwoBit
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a TwoBit format nucleotide file
-
edam_format
= 'format_3009'¶
-
edam_data
= 'data_0848'¶
-
file_ext
= 'twobit'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
SQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a SQLite database
-
file_ext
= 'sqlite'¶
-
edam_format
= 'format_3621'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'sqlite': <function SQlite.sqlite_dataprovider>, 'sqlite-dict': <function SQlite.sqlite_datadictprovider>, 'sqlite-table': <function SQlite.sqlite_datatableprovider>}¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
GeminiSQLite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Gemini SQLite database
-
file_ext
= 'gemini.sqlite'¶
-
edam_format
= 'format_3622'¶
-
edam_data
= 'data_3498'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'gemini_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
ChiraSQLite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a ChiRAViz SQLite database
-
file_ext
= 'chira.sqlite'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
CuffDiffSQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a CuffDiff SQLite database
-
file_ext
= 'cuffdiff.sqlite'¶
-
edam_format
= 'format_3621'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'cuffdiff_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'genes': <galaxy.model.metadata.MetadataElementSpec object>, 'samples': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
MzSQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Proteomics SQLite database
-
file_ext
= 'mz.sqlite'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
PQP
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Peptide query parameters file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.pqp')
>>> PQP().sniff(fname)
True
>>> fname = get_test_fname('test.osw')
>>> PQP().sniff(fname)
False
-
file_ext
= 'pqp'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
sniff
(filename)[source]¶ Table definition according to https://github.com/grosenberger/OpenMS/blob/develop/src/openms/source/ANALYSIS/OPENSWATH/TransitionPQPFile.cpp#L264. For now, the VERSION, GENE and PEPTIDE_GENE_MAPPING tables are excluded, since there is test data without these tables; see also https://github.com/OpenMS/OpenMS/issues/4365
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
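The PQP sniffer is described as matching a file against an expected set of tables. A minimal sketch of that idea with the standard-library `sqlite3` module follows. It is not Galaxy's code: `has_tables` is a hypothetical helper, and the table names in the usage line are placeholders rather than the actual PQP table list.

```python
import sqlite3

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def has_tables(path, required_tables):
    """Return True if `path` is an SQLite 3 file containing all required tables.

    Hypothetical helper for illustration; not part of galaxy.datatypes.
    """
    with open(path, "rb") as handle:
        if handle.read(16) != SQLITE_MAGIC:
            return False
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    except sqlite3.DatabaseError:
        # The header matched but the rest of the file is not a usable database.
        return False
    finally:
        conn.close()
    present = {name for (name,) in rows}
    return set(required_tables) <= present
```

For example, `has_tables(fname, ["PEPTIDE", "PROTEIN"])` would accept any SQLite file holding at least those two (placeholder) tables. Leaving optional tables out of the required set is what lets files without them still match.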
-
-
class
galaxy.datatypes.binary.
OSW
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing OpenSwath output
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.osw')
>>> OSW().sniff(fname)
True
>>> fname = get_test_fname('test.sqmass')
>>> OSW().sniff(fname)
False
-
file_ext
= 'osw'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
SQmass
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Sqmass database
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.sqmass')
>>> SQmass().sniff(fname)
True
>>> fname = get_test_fname('test.pqp')
>>> SQmass().sniff(fname)
False
-
file_ext
= 'sqmass'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
BlibSQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Proteomics Spectral Library SQLite database
-
file_ext
= 'blib'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'blib_version': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
DlibSQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Proteomics Spectral Library SQLite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.dlib')
>>> DlibSQlite().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DlibSQlite().sniff(fname)
False
-
file_ext
= 'dlib'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dlib_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
ElibSQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Proteomics Chromatogram Library SQLite database. DLIBs only have the “entries”, “metadata”, and “peptidetoprotein” tables populated. ELIBs have the rest of the tables populated too, such as “peptidequants” or “peptidescores”.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.elib')
>>> ElibSQlite().sniff(fname)
True
>>> fname = get_test_fname('test.dlib')
>>> ElibSQlite().sniff(fname)
False
-
file_ext
= 'elib'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}¶
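The DLIB/ELIB distinction above rests on which tables are populated, not on which tables exist. One way to express that is to compare row counts, as in the sketch below. This is an assumption about how such a check could look, not Galaxy's sniffer: `table_row_count` and `guess_library_kind` are hypothetical names, and only the table names come from the class descriptions.

```python
import sqlite3


def table_row_count(conn, table):
    """Return the number of rows in `table`, or 0 if the table does not exist."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return 0
    # `table` only ever comes from the fixed names below, never from user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def guess_library_kind(path):
    """Return 'elib', 'dlib', or None for an SQLite file (illustrative heuristic)."""
    conn = sqlite3.connect(path)
    try:
        if table_row_count(conn, "entries") == 0:
            return None  # no library entries at all
        quant_rows = sum(
            table_row_count(conn, name)
            for name in ("peptidequants", "peptidescores")
        )
        return "elib" if quant_rows > 0 else "dlib"
    finally:
        conn.close()
```

The sketch assumes `path` is already known to be an SQLite database; a non-SQLite file would raise `sqlite3.DatabaseError` here.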
-
-
class
galaxy.datatypes.binary.
IdpDB
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing an IDPicker 3 idpDB (sqlite) database
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.idpdb')
>>> IdpDB().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> IdpDB().sniff(fname)
False
-
file_ext
= 'idpdb'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
GAFASQLite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a GAFA SQLite database
-
file_ext
= 'gafa.sqlite'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'gafa_schema_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
NcbiTaxonomySQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing the NCBI Taxonomy database stored in SQLite as done by rust-ncbitaxonomy
-
file_ext
= 'ncbitaxonomy.sqlite'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'ncbitaxonomy_schema_version': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_row_count': <galaxy.model.metadata.MetadataElementSpec object>, 'tables': <galaxy.model.metadata.MetadataElementSpec object>, 'taxon_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Xlsx
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for Excel 2007 (xlsx) files
-
file_ext
= 'xlsx'¶
-
compressed
= True¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
ExcelXls
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an Excel (xls) file
-
file_ext
= 'excel.xls'¶
-
edam_format
= 'format_3468'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Sra
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Sequence Read Archive (SRA) datatype originally from mdshw5/sra-tools-galaxy
-
file_ext
= 'sra'¶
-
sniff_prefix
(sniff_prefix)[source]¶ The first 8 bytes of any NCBI sra file are ‘NCBI.sra’, and the file is binary. For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
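The magic-string check described for Sra can be sketched in a few lines. This is a standalone illustration working on a file path, not the `sniff_prefix` implementation itself; `looks_like_sra` is a hypothetical name.

```python
SRA_MAGIC = b"NCBI.sra"  # the 8-byte prefix quoted in the Sra description


def looks_like_sra(path):
    """Return True if the file at `path` begins with the NCBI SRA magic string."""
    with open(path, "rb") as handle:
        return handle.read(len(SRA_MAGIC)) == SRA_MAGIC
```

Opening the file in binary mode matters here: the comparison is made on raw bytes, so no text decoding is attempted on what is otherwise binary content.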
-
-
class
galaxy.datatypes.binary.
RData
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Generic R Data file datatype implementation
-
file_ext
= 'rdata'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliBinary
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliCountGraph
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliCountGraph starts with “OXLI” + one byte version number + 8-bit binary ‘1’.
Test file generated via:
load-into-counting.py --n_tables 1 --max-tablesize 1 \
    oxli_countgraph.oxlicg khmer/tests/test-data/100-reads.fq.bz2
using khmer 2.0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliCountGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_countgraph.oxlicg")
>>> OxliCountGraph().sniff(fname)
True
-
file_ext
= 'oxlicg'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliNodeGraph
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliNodeGraph starts with “OXLI” + one byte version number + 8-bit binary ‘2’.
Test file generated via:
load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2
using khmer 2.0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliNodeGraph().sniff(fname)
False
>>> fname = get_test_fname("oxli_nodegraph.oxling")
>>> OxliNodeGraph().sniff(fname)
True
-
file_ext
= 'oxling'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliTagSet
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliTagSet starts with “OXLI” + one byte version number + 8-bit binary ‘3’.
Test file generated via:
load-graph.py --n_tables 1 --max-tablesize 1 oxli_nodegraph.oxling \
    khmer/tests/test-data/100-reads.fq.bz2;
mv oxli_nodegraph.oxling.tagset oxli_tagset.oxlits
using khmer 2.0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliTagSet().sniff(fname)
False
>>> fname = get_test_fname("oxli_tagset.oxlits")
>>> OxliTagSet().sniff(fname)
True
-
file_ext
= 'oxlits'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliStopTags
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliStopTags starts with “OXLI” + one byte version number + 8-bit binary ‘4’.
Test file adapted from khmer 2.0’s “khmer/tests/test-data/goodversion-k32.stoptags”
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliStopTags().sniff(fname)
False
>>> fname = get_test_fname("oxli_stoptags.oxlist")
>>> OxliStopTags().sniff(fname)
True
-
file_ext
= 'oxlist'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliSubset
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliSubset starts with “OXLI” + one byte version number + 8-bit binary ‘5’.
Test file generated via:
load-graph.py -k 20 example tests/test-data/random-20-a.fa;
partition-graph.py example;
mv example.subset.0.pmap oxli_subset.oxliss
using khmer 2.0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliSubset().sniff(fname)
False
>>> fname = get_test_fname("oxli_subset.oxliss")
>>> OxliSubset().sniff(fname)
True
-
file_ext
= 'oxliss'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
OxliGraphLabels
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.OxliBinary
OxliGraphLabels starts with “OXLI” + one byte version number + 8-bit binary ‘6’.
Test file generated via:
python -c "from khmer import GraphLabels; \
    gl = GraphLabels(20, 1e7, 4); \
    gl.consume_fasta_and_tag_with_labels('tests/test-data/test-labels.fa'); \
    gl.save_labels_and_tags('oxli_graphlabels.oxligl')"
using khmer 2.0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('sequence.csfasta')
>>> OxliGraphLabels().sniff(fname)
False
>>> fname = get_test_fname("oxli_graphlabels.oxligl")
>>> OxliGraphLabels().sniff(fname)
True
-
file_ext
= 'oxligl'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
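The six Oxli classes above share one header layout: the four bytes “OXLI”, one version byte, then one type byte with the value 1 to 6. The sketch below reads that header and maps the type byte to the matching `file_ext`. It assumes the type is stored as a raw byte value (0x01 to 0x06) and ignores the version byte; `oxli_file_ext` is a hypothetical helper, not a function from galaxy.datatypes.

```python
# Type byte -> file_ext, taken from the class descriptions above.
OXLI_TYPES = {
    1: "oxlicg",  # OxliCountGraph
    2: "oxling",  # OxliNodeGraph
    3: "oxlits",  # OxliTagSet
    4: "oxlist",  # OxliStopTags
    5: "oxliss",  # OxliSubset
    6: "oxligl",  # OxliGraphLabels
}


def oxli_file_ext(path):
    """Return the Oxli file_ext for `path`, or None if the header does not match."""
    with open(path, "rb") as handle:
        header = handle.read(6)
    if len(header) < 6 or header[:4] != b"OXLI":
        return None
    # header[4] is the version byte (ignored here); header[5] is the type byte.
    return OXLI_TYPES.get(header[5])
```

Keeping the shared header parsing in one place mirrors the class layout, where every subclass of OxliBinary differs only in the type value it expects.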
-
-
class
galaxy.datatypes.binary.
PostgresqlArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Class describing a PostgreSQL database packed into a tar archive
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('postgresql_fake.tar.bz2')
>>> PostgresqlArchive().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> PostgresqlArchive().sniff(fname)
False
-
file_ext
= 'postgresql'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Fast5Archive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Class describing a FAST5 archive
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5Archive().sniff(fname)
True
-
file_ext
= 'fast5.tar'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Fast5ArchiveGz
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Fast5Archive
Class describing a gzip-compressed FAST5 archive
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveGz().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveGz().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveGz().sniff(fname)
False
-
file_ext
= 'fast5.tar.gz'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Fast5ArchiveBz2
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Fast5Archive
Class describing a bzip2-compressed FAST5 archive
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.fast5.tar.bz2')
>>> Fast5ArchiveBz2().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar.gz')
>>> Fast5ArchiveBz2().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> Fast5ArchiveBz2().sniff(fname)
False
-
file_ext
= 'fast5.tar.bz2'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fast5_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
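The three FAST5 archive classes differ only in the compression wrapped around the tar file, as their doctests show. A rough standalone sketch of that distinction follows, using the standard-library `tarfile` module and the gzip and bzip2 magic bytes. It is an assumption about how the check could be done, not Galaxy's sniffer; `fast5_archive_kind` is a hypothetical name.

```python
import tarfile


def fast5_archive_kind(path):
    """Return the matching file_ext for a FAST5 tar archive, or None.

    Illustrative only: the compression is guessed from magic bytes, then the
    archive is scanned for at least one regular member ending in '.fast5'.
    """
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":      # gzip magic number
        kind = "fast5.tar.gz"
    elif magic == b"BZh":             # bzip2 magic number
        kind = "fast5.tar.bz2"
    else:
        kind = "fast5.tar"
    try:
        # "r:*" lets tarfile detect the compression transparently.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile() and member.name.endswith(".fast5"):
                    return kind
    except tarfile.TarError:
        return None  # not a readable tar archive
    return None
```

Iterating over the archive stops at the first `.fast5` member, so large archives do not have to be read to the end when they match.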
-
-
class
galaxy.datatypes.binary.
SearchGuiArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Class describing a SearchGUI archive
-
file_ext
= 'searchgui_archive'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'searchgui_major_version': <galaxy.model.metadata.MetadataElementSpec object>, 'searchgui_version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
NetCDF
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Binary data in netCDF format
-
file_ext
= 'netcdf'¶
-
edam_format
= 'format_3650'¶
-
edam_data
= 'data_0943'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
Dcd
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a dcd file from the CHARMM molecular simulation program
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_glucose_vacuum.dcd')
>>> Dcd().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Dcd().sniff(fname)
False
-
file_ext
= 'dcd'¶
-
edam_data
= 'data_3842'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Vel
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a velocity file from the CHARMM molecular simulation program
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_charmm.vel')
>>> Vel().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Vel().sniff(fname)
False
-
file_ext
= 'vel'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
DAA
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a DAA (diamond alignment archive) file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.daa')
>>> DAA().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DAA().sniff(fname)
False
-
file_ext
= 'daa'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
RMA6
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an RMA6 (MEGAN6 read-match archive) file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond.rma6')
>>> RMA6().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> RMA6().sniff(fname)
False
-
file_ext
= 'rma6'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
DMND
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a DMND file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('diamond_db.dmnd')
>>> DMND().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> DMND().sniff(fname)
False
-
file_ext
= 'dmnd'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
ICM
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an ICM (interpolated context model) file, used by Glimmer
-
file_ext
= 'icm'¶
-
edam_data
= 'data_0950'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
Parquet
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an Apache Parquet file (https://parquet.apache.org/)
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('example.parquet')
>>> Parquet().sniff(fname)
True
>>> fname = get_test_fname('test.mz5')
>>> Parquet().sniff(fname)
False
-
file_ext
= 'parquet'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.binary.
BafTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.CompressedArchive
Base class for common behavior of tar files of directory-based raw file formats
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> BafTar().sniff(fname)
True
>>> fname = get_test_fname('test.fast5.tar')
>>> BafTar().sniff(fname)
False
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_3712'¶
-
file_ext
= 'brukerbaf.d.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
YepTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BafTar
A tar’d up .d directory containing Agilent/Bruker YEP format data
-
file_ext
= 'agilentbrukeryep.d.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
TdfTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BafTar
A tar’d up .d directory containing Bruker TDF format data
-
file_ext
= 'brukertdf.d.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
MassHunterTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BafTar
A tar’d up .d directory containing Agilent MassHunter format data
-
file_ext
= 'agilentmasshunter.d.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
MassLynxTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BafTar
A tar’d up .raw directory containing Waters MassLynx format data
-
file_ext
= 'watersmasslynx.raw.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.binary.
WiffTar
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BafTar
A tar’d up .wiff/.scan pair containing Sciex WIFF format data
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('some.wiff.tar')
>>> WiffTar().sniff(fname)
True
>>> fname = get_test_fname('brukerbaf.d.tar')
>>> WiffTar().sniff(fname)
False
>>> fname = get_test_fname('test.fast5.tar')
>>> WiffTar().sniff(fname)
False
-
file_ext
= 'wiff.tar'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.blast module¶
NCBI BLAST datatypes.
Covers the blastxml format and the BLAST databases.
-
class
galaxy.datatypes.blast.
BlastXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
NCBI Blast XML Output data
-
file_ext
= 'blastxml'¶
-
edam_format
= 'format_3331'¶
-
edam_data
= 'data_0857'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is blastxml
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('tblastn_four_human_vs_rhodopsin.blastxml')
>>> BlastXml().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> BlastXml().sniff(fname)
False
-
static
merge
(split_files, output_file)[source]¶ Merging multiple XML files is non-trivial and must be done in subclasses.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.blast.
BlastNucDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.blast._BlastDb
Class for nucleotide BLAST database files.
-
file_ext
= 'blastdbn'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.blast.
BlastProtDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.blast._BlastDb
Class for protein BLAST database files.
-
file_ext
= 'blastdbp'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.blast.
BlastDomainDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.blast._BlastDb
Class for domain BLAST database files.
-
file_ext
= 'blastdbd'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.blast.
LastDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class for LAST database files.
-
file_ext
= 'lastdb'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.blast.
BlastNucDb5
(**kwd)[source]¶ Bases:
galaxy.datatypes.blast._BlastDb
Class for nucleotide BLAST database files (BLAST database version 5).
-
file_ext
= 'blastdbn5'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.blast.
BlastProtDb5
(**kwd)[source]¶ Bases:
galaxy.datatypes.blast._BlastDb
Class for protein BLAST database files (BLAST database version 5).
-
file_ext
= 'blastdbp5'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.checkers module¶
This module proxies galaxy.util.checkers for backward compatibility.
External datatypes may make use of these functions.
-
galaxy.datatypes.checkers.
check_html
(name, file_path=True)[source]¶ Returns True if the file/string contains HTML code.
galaxy.datatypes.chrominfo module¶
-
class
galaxy.datatypes.chrominfo.
ChromInfo
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'len'¶
-
metadata_spec
= {'chrom': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'length': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.constructive_solid_geometry module¶
Constructive Solid Geometry file formats.
-
class
galaxy.datatypes.constructive_solid_geometry.
Ply
(**kwd)[source]¶ Bases:
object
The PLY format describes an object as a collection of vertices, faces and other elements, along with properties such as color and normal direction that can be attached to these elements. A PLY file contains the description of exactly one object.
-
subtype
= ''¶
-
sniff_prefix
(file_prefix)[source]¶ The structure of a typical PLY file: Header, Vertex List, Face List, (lists of other elements)
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
PlyAscii
(**kwd)[source]¶ Bases:
galaxy.datatypes.constructive_solid_geometry.Ply
,galaxy.datatypes.data.Text
-
file_ext
= 'plyascii'¶
-
subtype
= 'ascii'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'face': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
PlyBinary
(**kwd)[source]¶ Bases:
galaxy.datatypes.constructive_solid_geometry.Ply
,galaxy.datatypes.binary.Binary
-
file_ext
= 'plybinary'¶
-
subtype
= 'binary'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'face': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'other_elements': <galaxy.model.metadata.MetadataElementSpec object>, 'vertex': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
Vtk
(**kwd)[source]¶ Bases:
object
The Visualization Toolkit provides a number of source and writer objects to read and write popular data file formats. The Visualization Toolkit also provides some of its own file formats.
There are two different styles of file formats available in VTK. The simplest are the legacy, serial formats that are easy to read and write either by hand or programmatically. However, these formats are less flexible than the XML based file formats which support random access, parallel I/O, and portable data compression and are preferred to the serial VTK file formats whenever possible.
All keyword phrases are written in ASCII form whether the file is binary or ASCII. The binary section of the file (if in binary form) is the data proper; i.e., the numbers that define points coordinates, scalars, cell indices, and so forth.
Binary data must be placed into the file immediately after the newline (‘\n’) character from the previous ASCII keyword and parameter sequence.
TODO: only legacy formats are currently supported and support for XML formats should be added.
-
subtype
= ''¶
-
sniff_prefix
(file_prefix)[source]¶ VTK files can be either ASCII or binary, with two different styles of file formats: legacy or XML. We’ll assume if the file contains a valid VTK header, then it is a valid VTK file.
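A minimal sketch of a legacy header check follows, assuming the usual legacy layout: a version line, a free-text title line, then ASCII or BINARY. The function name and the returned dictionary are illustrative only and are not Galaxy's API; XML-style VTK files are not handled.

```python
import re

# Line 1 of a legacy VTK file, e.g. "# vtk DataFile Version 3.0"
_VTK_VERSION = re.compile(r"^# vtk DataFile Version (\d+\.\d+)\s*$")


def parse_legacy_vtk_header(prefix):
    """Return header fields for a legacy VTK prefix, or None if invalid."""
    lines = prefix.splitlines()
    if len(lines) < 3:
        return None
    match = _VTK_VERSION.match(lines[0])
    if match is None:
        return None
    file_format = lines[2].strip().upper()
    if file_format not in ("ASCII", "BINARY"):
        return None
    return {
        "vtk_version": match.group(1),
        "title": lines[1].strip(),
        "file_format": file_format,
    }
```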
-
set_structure_metadata
(line, dataset, dataset_type)[source]¶ The fourth part of legacy VTK files is the dataset structure. The geometry part describes the geometry and topology of the dataset. This part begins with a line containing the keyword DATASET followed by a keyword describing the type of dataset. Then, depending upon the type of dataset, other keyword/ data combinations define the actual data.
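As a rough illustration of the keyword/data layout described above, the sketch below collects a few structure keywords into a dictionary. It is a simplification under stated assumptions: the function name is hypothetical, only DATASET, DIMENSIONS, ORIGIN and SPACING are recognised, and the real method stores values on the dataset's metadata rather than returning them.

```python
def parse_structure_lines(lines):
    """Collect a few legacy VTK structure keywords from text lines."""
    meta = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        keyword = parts[0].upper()
        if keyword == "DATASET" and len(parts) == 2:
            # e.g. "DATASET STRUCTURED_POINTS"
            meta["dataset_type"] = parts[1]
        elif keyword == "DIMENSIONS" and len(parts) == 4:
            meta["dimensions"] = [int(p) for p in parts[1:]]
        elif keyword in ("ORIGIN", "SPACING") and len(parts) == 4:
            meta[keyword.lower()] = [float(p) for p in parts[1:]]
    return meta
```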
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
VtkAscii
(**kwd)[source]¶ Bases:
galaxy.datatypes.constructive_solid_geometry.Vtk
,galaxy.datatypes.data.Text
-
file_ext
= 'vtkascii'¶
-
subtype
= 'ASCII'¶
-
metadata_spec
= {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'lines': <galaxy.model.metadata.MetadataElementSpec object>, 'origin': <galaxy.model.metadata.MetadataElementSpec object>, 'points': <galaxy.model.metadata.MetadataElementSpec object>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
VtkBinary
(**kwd)[source]¶ Bases:
galaxy.datatypes.constructive_solid_geometry.Vtk
,galaxy.datatypes.binary.Binary
-
file_ext
= 'vtkbinary'¶
-
subtype
= 'BINARY'¶
-
metadata_spec
= {'cells': <galaxy.model.metadata.MetadataElementSpec object>, 'dataset_type': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'dimensions': <galaxy.model.metadata.MetadataElementSpec object>, 'field_components': <galaxy.model.metadata.MetadataElementSpec object>, 'field_names': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'lines': <galaxy.model.metadata.MetadataElementSpec object>, 'origin': <galaxy.model.metadata.MetadataElementSpec object>, 'points': <galaxy.model.metadata.MetadataElementSpec object>, 'polygons': <galaxy.model.metadata.MetadataElementSpec object>, 'spacing': <galaxy.model.metadata.MetadataElementSpec object>, 'triangle_strips': <galaxy.model.metadata.MetadataElementSpec object>, 'vertices': <galaxy.model.metadata.MetadataElementSpec object>, 'vtk_version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.constructive_solid_geometry.
STL
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
-
file_ext
= 'stl'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.coverage module¶
Coverage datatypes
-
class
galaxy.datatypes.coverage.
LastzCoverage
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'coverage'¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'forwardCol': <galaxy.model.metadata.MetadataElementSpec object>, 'positionCol': <galaxy.model.metadata.MetadataElementSpec object>, 'reverseCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.data module¶
-
class
galaxy.datatypes.data.
DataMeta
(name, bases, namespace, **kwargs)[source]¶ Bases:
abc.ABCMeta
Metaclass for Data class. Sets up metadata spec.
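The idea of a metaclass that gives every datatype class its own metadata spec, inherited from its bases, can be sketched as follows. This is an assumption-laden illustration, not Galaxy's code: SpecMeta and the declared_metadata attribute are hypothetical stand-ins for DataMeta and the MetadataElement statements used in real datatypes.

```python
class SpecMeta(type):
    """Hypothetical metaclass: builds a per-class metadata_spec dict."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        spec = {}
        # Start from everything the base classes already declare ...
        for base in bases:
            spec.update(getattr(base, "metadata_spec", {}))
        # ... then add the elements declared directly on this class.
        spec.update(namespace.get("declared_metadata", {}))
        cls.metadata_spec = spec


class SketchData(metaclass=SpecMeta):
    declared_metadata = {"dbkey": "Database/Build"}


class SketchText(SketchData):
    declared_metadata = {"data_lines": "Number of data lines"}
```

With these stand-ins, SketchText.metadata_spec contains both dbkey and data_lines, which mirrors how the metadata_spec entries listed for Text extend those listed for Data.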
-
class
galaxy.datatypes.data.
Data
(**kwd)[source]¶ Bases:
object
Base class for all datatypes. Implements basic interfaces as well as class methods for metadata.
>>> class DataTest( Data ):
...     MetadataElement( name="test" )
...
>>> DataTest.metadata_spec.test.name
'test'
>>> DataTest.metadata_spec.test.desc
'test'
>>> type( DataTest.metadata_spec.test.param )
<class 'galaxy.model.metadata.MetadataParameter'>
-
edam_data
= 'data_0006'¶
-
edam_format
= 'format_1915'¶
-
file_ext
= 'data'¶
-
CHUNKABLE
= False¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶ Dictionary of metadata fields for this datatype
-
copy_safe_peek
= True¶
-
is_binary
= True¶
-
primary_file_name
= 'index'¶
-
classmethod
is_datatype_change_allowed
()[source]¶ Returns the value of the allow_datatype_change class attribute if set in a subclass, or True iff the datatype is not composite.
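The rule described here can be sketched with stand-in classes. The composite_type attribute name and the use of None as the "not set" marker are assumptions made for this illustration.

```python
class SketchData:
    # None means "not set in a subclass" in this sketch.
    allow_datatype_change = None
    # None means the datatype is not composite (assumed attribute name).
    composite_type = None

    @classmethod
    def is_datatype_change_allowed(cls):
        if cls.allow_datatype_change is not None:
            return cls.allow_datatype_change
        return cls.composite_type is None


class SketchComposite(SketchData):
    composite_type = "auto_primary_file"


class SketchLocked(SketchData):
    allow_datatype_change = False
```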
-
get_raw_data
(dataset)[source]¶ Returns the full data. To stream it, open the file_name and read/write as needed.
-
dataset_content_needs_grooming
(file_name)[source]¶ This function is called on an output dataset file after the content is initially generated.
-
groom_dataset_content
(file_name)[source]¶ This function is called on an output dataset file if dataset_content_needs_grooming returns True.
-
set_meta
(dataset: Any, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
missing_meta
(dataset, check=None, skip=None)[source]¶ Checks for empty metadata values. Returns True if non-optional metadata is missing. Specifying a list of ‘check’ values will only check those names provided; when used, optionality is ignored. Specifying a list of ‘skip’ items will return True even when a named metadata value is missing.
-
property
max_optional_metadata_filesize
¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek and blurb text
- Parameters
is_multi_byte (bool) – deprecated
-
to_archive
(dataset, name='')[source]¶ Collect archive paths and file handles that need to be exported when archiving dataset.
- Parameters
dataset – HistoryDatasetAssociation
name – archive name; in a collection context this corresponds to the collection name(s) and element_identifier, joined by ‘/’, e.g. ‘fastq_collection/sample1/forward’
-
display_data
(trans, data, preview=False, filename=None, to_ext=None, **kwd)[source]¶ Displays data in central pane if preview is True, else handles download.
Datatypes should be very careful when overriding this method; this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
display_as_markdown
(dataset_instance, markdown_format_helpers)[source]¶ Prepare for embedding dataset into a basic Markdown document.
This is a somewhat experimental interface and should not be implemented on datatypes not tightly tied to a Galaxy version (e.g. datatypes in the Tool Shed).
Speaking very loosely, the datatype should load a bounded amount of data from the supplied dataset instance and prepare it for embedding into Markdown. This should be relatively vanilla Markdown: the result is bleached and it should not contain nested Galaxy Markdown directives.
If the data cannot reasonably be displayed, just indicate this and do not throw an exception.
-
repair_methods
(dataset)[source]¶ Unimplemented method, returns dict with method/option for repairing errors
-
add_display_app
(app_id, label, file_function, links_function)[source]¶ Adds a display app to the datatype. app_id is a unique id. label is the primary display label, e.g., display at ‘UCSC’. file_function is a string containing the name of the function that returns a properly formatted display. links_function is a string containing the name of the function that returns a list of (link_name, link).
-
as_display_type
(dataset, type, **kwd)[source]¶ Returns modified file contents for a particular display type
-
get_display_links
(dataset, type, app, base_url, target_frame='_blank', **kwd)[source]¶ Returns a list of tuples of (name, link) for a particular display type. No check on ‘access’ permissions is done here - if you can view the dataset, you can also save it or send it to a destination outside of Galaxy, so Galaxy security restrictions do not apply anyway.
-
get_converter_types
(original_dataset, datatypes_registry)[source]¶ Returns available converters by type for this dataset
-
find_conversion_destination
(dataset, accepted_formats, datatypes_registry, **kwd)[source]¶ Returns ( direct_match, converted_ext, existing converted dataset )
-
convert_dataset
(trans, original_dataset, target_type, return_output=False, visible=True, deps=None, target_context=None, history=None)[source]¶ This function adds a job to the queue to convert a dataset to another type. Returns a message about success/failure.
-
after_setting_metadata
(dataset)[source]¶ This function is called on the dataset after metadata is set.
-
before_setting_metadata
(dataset)[source]¶ This function is called on the dataset before metadata is set.
-
property
writable_files
¶
-
property
has_resolution
¶
-
matches_any
(target_datatypes)[source]¶ Check if this datatype is of any of the target_datatypes or is a subtype thereof.
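A minimal sketch of such a check, using stand-in classes rather than Galaxy's own, is an isinstance test against the classes of the targets. Accepting both classes and instances as targets is an assumption of this sketch.

```python
import inspect


class SketchData:
    def matches_any(self, target_datatypes):
        # Targets may be classes or instances in this sketch.
        target_classes = tuple(
            t if inspect.isclass(t) else type(t) for t in target_datatypes
        )
        # isinstance is True for the same class and for any subclass.
        return isinstance(self, target_classes)


class SketchText(SketchData):
    pass


class SketchTabular(SketchText):
    pass


class SketchBinary(SketchData):
    pass
```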
-
static
merge
(split_files, output_file)[source]¶ Merges files using copyfileobj() (provided by Python's shutil module), which avoids the maximum-argument limitation of cat. gz and bz2 files are also supported.
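The approach can be sketched with the standard library alone. The function name merge_files is illustrative; the sketch appends each part to the output in binary mode, so no command line with a long argument list is ever built.

```python
import shutil


def merge_files(split_files, output_file):
    """Concatenate split_files, in order, into output_file."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as part:
                # Streams in chunks, so large parts are not loaded into memory.
                shutil.copyfileobj(part, out)
```

Byte-wise concatenation is also why compressed parts can work: a sequence of gzip members placed back to back is itself a valid gzip stream.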
-
dataprovider
(dataset, data_format, **settings)[source]¶ Base dataprovider factory for all datatypes that returns the proper provider for the given data_format or raises a NoProviderAvailable.
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>}¶
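The factory pattern described for dataprovider can be sketched as a lookup in a name-to-function mapping like the one above. Everything here is a stand-in: the exception class, the two provider functions and the module-level registry are hypothetical and only mimic the shape of the real interface.

```python
class NoProviderAvailable(Exception):
    """Stand-in for the exception raised when no provider matches."""


def base_provider(source, **settings):
    return iter(source)


def line_provider(source, **settings):
    # Yield each line with surrounding whitespace removed.
    return (line.strip() for line in source)


PROVIDERS = {"base": base_provider, "line": line_provider}


def dataprovider(source, data_format, **settings):
    """Return the provider registered for data_format, or raise."""
    try:
        factory = PROVIDERS[data_format]
    except KeyError:
        raise NoProviderAvailable(data_format) from None
    return factory(source, **settings)
```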
-
-
class
galaxy.datatypes.data.
Text
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
-
edam_format
= 'format_2330'¶
-
file_ext
= 'txt'¶
-
line_class
= 'line'¶
-
is_binary
= False¶
-
estimate_file_lines
(dataset)[source]¶ Perform a rough estimate by extrapolating number of lines from a small read.
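The extrapolation can be sketched as follows. It is a simplified illustration that takes a path instead of a dataset; the function name and the sample_size parameter are assumptions, not Galaxy's signature.

```python
import os


def estimate_line_count(path, sample_size=1024 * 1024):
    """Estimate the number of lines by scaling up a small sample."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if len(sample) == file_size:
        # The whole file fit in the sample: count newlines directly.
        return sample.count(b"\n")
    # Assume the rest of the file has the same line density as the sample.
    lines_in_sample = max(sample.count(b"\n"), 1)
    return int(lines_in_sample * file_size / len(sample))
```

The estimate is only as good as the assumption that the first bytes are representative of the line lengths in the rest of the file.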
-
count_data_lines
(dataset)[source]¶ Count the number of lines of data in dataset, skipping all blank lines and comments.
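A minimal sketch of this count, operating on an iterable of lines rather than on a dataset, is given below. Treating "#" as the comment marker is an assumption of the sketch.

```python
def count_data_lines(lines, comment_char="#"):
    """Count lines that are neither blank nor comments."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_char):
            count += 1
    return count
```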
-
set_peek
(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=None, line_wrap=True, **kwd)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by line.
-
line_dataprovider
(dataset, **settings)[source]¶ Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.
-
regex_line_dataprovider
(dataset, **settings)[source]¶ Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
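The filtering behaviour can be sketched as a generator over lines. The parameter names regex_list and invert are assumptions chosen for this illustration and may not match the settings accepted by the real provider.

```python
import re


def regex_filtered_lines(lines, regex_list=None, invert=False):
    """Yield lines matching any regex (or, if invert, matching none)."""
    patterns = [re.compile(r) for r in (regex_list or [])]
    for line in lines:
        if not patterns:
            # No filters configured: pass every line through.
            yield line
            continue
        matched = any(p.search(line) for p in patterns)
        if matched != invert:
            yield line
```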
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.data.
Directory
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class representing a directory of files.
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.data.
GenericAsn1
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class for generic ASN.1 text format
-
edam_data
= 'data_0849'¶
-
edam_format
= 'format_1966'¶
-
file_ext
= 'asn1'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.data.
LineCount
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Dataset contains a single line with a single integer that denotes the line count for a related dataset. Used for custom builds.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.data.
Newick
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
New Hampshire/Newick Format
-
edam_data
= 'data_0872'¶
-
edam_format
= 'format_1910'¶
-
file_ext
= 'newick'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.data.
Nexus
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Nexus format as used by PAUP, MrBayes, etc.
-
edam_data
= 'data_0872'¶
-
edam_format
= 'format_1912'¶
-
file_ext
= 'nex'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
galaxy.datatypes.data.
get_file_peek
(file_name, is_multi_byte=False, WIDTH=256, LINE_COUNT=5, skipchars=None, line_wrap=True)[source]¶ Returns the first LINE_COUNT lines wrapped to WIDTH.
- Parameters
is_multi_byte (bool) – deprecated
>>> def assert_peek_is(file_name, expected, *args, **kwd):
...     path = get_test_fname(file_name)
...     peek = get_file_peek(path, *args, **kwd)
...     assert peek == expected, "%s != %s" % (peek, expected)
>>> assert_peek_is('0_nonewline', u'0')
>>> assert_peek_is('0.txt', u'0\n')
>>> assert_peek_is('4.bed', u'chr22\t30128507\t31828507\tuc003bnx.1_cds_2_0_chr22_29227_f\t0\t+\n', LINE_COUNT=1)
>>> assert_peek_is('1.bed', u'chr1\t147962192\t147962580\tCCDS989.1_cds_0_0_chr1_147962193_r\t0\t-\nchr1\t147984545\t147984630\tCCDS990.1_cds_0_0_chr1_147984546_f\t0\t+\n', LINE_COUNT=2)
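The wrapping behaviour can be approximated with a short sketch that works on a string instead of a file name. It is not the Galaxy implementation: peek_lines is a hypothetical name, skipchars is not handled, and the sketch counts source lines, which may differ from how the real function counts wrapped lines.

```python
def peek_lines(text, width=256, line_count=5):
    """Return the first line_count lines of text, wrapped to width."""
    out = []
    for line in text.splitlines(keepends=True)[:line_count]:
        body = line.rstrip("\n")
        newline = line[len(body):]
        # Break long lines into width-sized chunks; keep empty lines as-is.
        chunks = [body[i:i + width] for i in range(0, len(body), width)] or [""]
        out.append("\n".join(chunks) + newline)
    return "".join(out)
```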
galaxy.datatypes.genetics module¶
rgenetics datatypes. Use at your peril. Ross Lazarus, for the rgenetics and galaxy projects.
Genome graphs datatypes are derived from Interval datatypes. Genome graphs datasets have a header row with appropriate column names. The first column is always the marker, e.g. column name = rs, first row = rs12345 if the rows are SNPs. Subsequent row values are all numeric! Will fail if there are any non-numeric (e.g. ‘+’ or ‘NA’) values. Ross Lazarus for rgenetics, August 20 2007.
-
class
galaxy.datatypes.genetics.
GenomeGraphs
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data containing a marker id and any number of numeric values
-
file_ext
= 'gg'¶
-
set_meta
(dataset, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
ucsc_links
(dataset, type, app, base_url)[source]¶ From the ever-helpful Angie Hinrichs (angie@soe.ucsc.edu): a genome graphs call looks like this:
http://genome.ucsc.edu/cgi-bin/hgGenome?clade=mammal&org=Human&db=hg18&hgGenome_dataSetName=dname &hgGenome_dataSetDescription=test&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess &hgGenome_columnLabels=best%20guess&hgGenome_maxVal=&hgGenome_labelVals= &hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http://galaxy.esphealth.org/datasets/333/display/index &hgGenome_doSubmitUpload=submit
Galaxy gives this for an interval file:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr1:1-1000&hgt.customText= http%3A%2F%2Fgalaxy.esphealth.org%2Fdisplay_as%3Fid%3D339%26display_app%3Ducsc
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in gg format
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> GenomeGraphs().sniff( fname )
False
>>> fname = get_test_fname( '1.gg' )
>>> GenomeGraphs().sniff( fname )
True
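Based only on the format description at the top of this module (a header row, a marker id in the first column, numeric values elsewhere), a sniffer could be sketched as below. This is an assumption about the rules, not the actual Galaxy check; the function name and the max_lines limit are hypothetical.

```python
def looks_like_genome_graphs(prefix, max_lines=10):
    """Heuristic check for tab-delimited marker/value data."""
    lines = [l for l in prefix.splitlines() if l.strip()][:max_lines]
    if len(lines) < 2:
        return False
    n_cols = len(lines[0].split("\t"))
    if n_cols < 2:
        return False
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != n_cols:
            return False
        try:
            # Every column after the marker id must parse as a number.
            for value in fields[1:]:
                float(value)
        except ValueError:
            return False
    return True
```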
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'markerCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.genetics.
rgTabList
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
For sampleid and featureid lists of exclusions or inclusions in the clean tool. featureid subsets on statistical criteria -> specialized display such as gg.
-
file_ext
= 'rgTList'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
rgSampleList
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.rgTabList
For sampleid exclusions or inclusions in the clean tool. Output from QC, e.g. excess het, gender error, ibd pair member, eigen outlier, excess mendel errors, … Since they can be uploaded, should be flexible, but they are persistent at least. Same infrastructure for expression?
-
file_ext
= 'rgSList'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
rgFeatureList
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.rgTabList
For featureid lists of exclusions or inclusions in the clean tool. Output from QC, e.g. low maf, high missingness, bad hwe in controls, excess mendel errors, … featureid subsets on statistical criteria -> specialized display such as gg. Same infrastructure for expression?
-
file_ext
= 'rgFList'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Rgenetics
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
base class to use for rgenetics datatypes derived from html - composite datatype elements stored in extra files path
-
file_ext
= 'rgenetics'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
SNPMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
BioC SNPMatrix Rgenetics data collections
-
file_ext
= 'snpmatrix'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Lped
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
linkage pedigree (ped,map) Rgenetics data collections
-
file_ext
= 'lped'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Pphe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Plink phenotype file - header must have FID IID… Rgenetics data collections
-
file_ext
= 'pphe'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Fphe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
fbat pedigree file - mad format with ! as first char on header row Rgenetics data collections
-
file_ext
= 'fphe'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Phe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Phenotype file
-
file_ext
= 'phe'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Fped
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
FBAT pedigree format - single file, map is header row of rs numbers. Strange. Rgenetics data collections
-
file_ext
= 'fped'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Pbed
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Plink Binary compressed 2bit/geno Rgenetics data collections
-
file_ext
= 'pbed'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
ldIndep
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
LD (a good measure of redundancy of information) depleted Plink Binary compressed 2bit/geno. This is really a plink binary, but some tools work better with less redundancy so are constrained to these files.
-
file_ext
= 'ldreduced'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Eigenstratgeno
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Eigenstrat format - may be able to get rid of this if we move to shellfish Rgenetics data collections
-
file_ext
= 'eigenstratgeno'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Eigenstratpca
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Eigenstrat PCA file for case control adjustment Rgenetics data collections
-
file_ext
= 'eigenstratpca'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Snptest
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
BioC snptest Rgenetics data collections
-
file_ext
= 'snptest'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
IdeasPre
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
This datatype defines the input format required by IDEAS: https://academic.oup.com/nar/article/44/14/6721/2468150 The IDEAS preprocessor tool produces an output using this format. The extra_files_path of the primary input dataset contains the following files and directories.
- chromosome_windows.txt (optional)
- chromosomes.bed (optional)
- IDEAS_input_config.txt
- compressed archived tmp directory containing a number of compressed bed files.
-
file_ext
= 'ideaspre'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom_bed': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom_windows': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'input_config': <galaxy.model.metadata.MetadataElementSpec object>, 'tmp_archive': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Pheno
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
base class for pheno files
-
file_ext
= 'pheno'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
RexpBase
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
Base class for BioC data structures in Galaxy. Must be constructed with the pheno data in place, since that goes into the metadata for each instance.
-
file_ext
= 'rexpbase'¶
-
html_table
= None¶
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the html file. Cannot rename the datasets here - they come with the default, unfortunately.
-
get_phecols
(phenolist, maxConc=20)[source]¶ Sept 2009: cannot use whitespace to split - make a more complex structure here and adjust the methods that rely on this structure. Returns interesting phenotype column names for an rexpression eset or affybatch to use in array subsetting and so on. Returns a data structure for a dynamic Galaxy select parameter. A column with only 1 value doesn’t change, so is not interesting for analysis. A column with a different value in every row is equivalent to a unique identifier, so is also not interesting for anova or limma analysis. Both of these are removed after the concordance (count of unique terms) is constructed for each column. Then a complication: each remaining pair of columns is tested for redundancy - if two columns are always paired, then only one is needed :)
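The column selection described here can be sketched as follows. It is a simplified illustration: the function name and the header/rows inputs are assumptions, the maxConc limit is ignored, and "always paired" is interpreted as a one-to-one correspondence between the values of two columns.

```python
from itertools import combinations


def interesting_columns(header, rows):
    """Return column names worth offering for analysis."""
    n_rows = len(rows)
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    # Drop constant columns and columns that are unique per row.
    keep = [name for name in header if 1 < len(set(columns[name])) < n_rows]
    redundant = set()
    for a, b in combinations(keep, 2):
        if a in redundant or b in redundant:
            continue
        pairs = set(zip(columns[a], columns[b]))
        # One-to-one value mapping: the second column adds no information.
        if len(pairs) == len(set(columns[a])) == len(set(columns[b])):
            redundant.add(b)
    return [name for name in keep if name not in redundant]
```

For example, a sex column and a sex_code column that always change together would be reduced to the first of the two.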
-
get_pheno
(dataset)[source]¶ Expects a .pheno file in the extra_files_dir - ugh. Note that R is weird and adds the row.name in the header so the columns are all wrong, unless you tell it not to. A file can be written as write.table(file='foo.pheno',pData(foo),sep=' ',quote=F,row.names=F)
-
set_peek
(dataset, **kwd)[source]¶ expects a .pheno file in the extra_files_dir - ugh note that R is weird and does not include the row.name in the header. why?
-
get_file_peek
(filename)[source]¶ can’t really peek at a filename - need the extra_files_path and such?
-
set_meta
(dataset, **kwd)[source]¶ NOTE: we apply the tabular machinery to the phenodata extracted from a BioC eSet or affybatch.
-
make_html_table
(pp='nothing supplied from peek\n')[source]¶ Create HTML table, used for displaying peek
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Affybatch
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'affybatch'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
Eset
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'eset'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
MAlist
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'malist'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'pheCols': <galaxy.model.metadata.MetadataElementSpec object>, 'pheno_path': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
LinkageStudies
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
superclass for classical linkage analysis suites
-
test_files
= ['linkstudies.allegro_fparam', 'linkstudies.alohomora_gts', 'linkstudies.linkage_datain', 'linkstudies.linkage_map']¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.genetics.
GenotypeMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.LinkageStudies
Sample matrix of genotypes - GTs as columns
-
file_ext
= 'alohomora_gts'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> classname = GenotypeMatrix
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.genetics.
MarkerMap
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.LinkageStudies
Map of genetic markers including physical and genetic distance. Common input format for linkage programs.
chrom, genetic pos, markername, physical pos, Nr
-
file_ext
= 'linkage_map'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> classname = MarkerMap
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.genetics.
DataIn
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.LinkageStudies
Common linkage input file for intermarker distances and recombination rates
-
file_ext
= 'linkage_datain'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> classname = DataIn
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.genetics.
AllegroLOD
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.LinkageStudies
Allegro output format for LOD scores
-
file_ext
= 'allegro_fparam'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> classname = AllegroLOD
>>> from galaxy.datatypes.sniff import get_test_fname
>>> extn_true = classname().file_ext
>>> file_true = get_test_fname("linkstudies." + extn_true)
>>> classname().sniff(file_true)
True
>>> false_files = list(LinkageStudies.test_files)
>>> false_files.remove("linkstudies." + extn_true)
>>> result_true = []
>>> for fname in false_files:
...     file_false = get_test_fname(fname)
...     res = classname().sniff(file_false)
...     if res:
...         result_true.append(fname)
>>>
>>> result_true
[]
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.gis module¶
GIS classes
-
class
galaxy.datatypes.gis.
Shapefile
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
The Shapefile data format: For more information please see http://en.wikipedia.org/wiki/Shapefile
-
file_ext
= 'shp'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.graph module¶
Graph content classes.
-
class
galaxy.datatypes.graph.
Xgmml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
XGMML graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).
-
file_ext
= 'xgmml'¶
-
static
merge
(split_files, output_file)[source]¶ Merging multiple XML files is non-trivial and must be done in subclasses.
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'node-edge': <function Xgmml.node_edge_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'xml': <function GenericXml.xml_dataprovider>}¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.graph.
Sif
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
SIF graph format (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats).
First column: node id
Second column: relationship type
Third to Nth column: target ids for link
-
file_ext
= 'sif'¶
-
static
merge
(split_files, output_file)[source]¶ Merging files with shutil.copyfileobj() does not hit the maximum-argument limitation of cat. gz and bz2 files are also supported.
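A minimal sketch of this kind of merge, assuming the inputs are simply concatenated byte for byte. The helper name `merge_files` is hypothetical and this is not a copy of the Galaxy method; it only shows why streaming through `shutil.copyfileobj()` avoids passing a long file list to `cat`.

```python
import shutil


def merge_files(split_files, output_file):
    """Concatenate split_files, in order, into output_file."""
    with open(output_file, "wb") as dst:
        for path in split_files:
            with open(path, "rb") as src:
                # Copies in chunks, so neither the number of inputs nor
                # their size runs into a shell argument limit.
                shutil.copyfileobj(src, dst)
```

Because the copy is done on raw bytes, gzip and bzip2 parts can be joined the same way: both formats allow several compressed streams to follow one another in a single file.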
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'node-edge': <function Sif.node_edge_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.graph.
XGMMLGraphDataProvider
(source, selector=None, max_depth=None, **kwargs)[source]¶ Bases:
galaxy.datatypes.dataproviders.hierarchy.XMLDataProvider
Provide two lists: nodes, edges:
'nodes': contains objects of the form: { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form: { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
-
class
galaxy.datatypes.graph.
SIFGraphDataProvider
(source, indeces=None, column_count=None, column_types=None, parsers=None, parse_columns=True, deliminator='\t', filters=None, **kwargs)[source]¶ Bases:
galaxy.datatypes.dataproviders.column.ColumnarDataProvider
Provide two lists: nodes, edges:
'nodes': contains objects of the form: { 'id' : <some string id>, 'data': <any extra data> }
'edges': contains objects of the form: { 'source' : <an index into nodes>, 'target': <an index into nodes>, 'data': <any extra data> }
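As an illustration of the nodes/edges structure, the sketch below turns SIF lines (node id, relationship type, target ids) into that shape. It is not the provider's code: the function name `sif_to_graph` and the choice to store the relationship type under `data['type']` are assumptions made for the example.

```python
def sif_to_graph(lines, delimiter="\t"):
    """Build {'nodes': [...], 'edges': [...]} from SIF-formatted lines."""
    nodes = []
    index = {}  # node id -> position in nodes
    edges = []

    def node_index(node_id):
        # Register each node once and return its index into nodes.
        if node_id not in index:
            index[node_id] = len(nodes)
            nodes.append({"id": node_id, "data": {}})
        return index[node_id]

    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if not fields[0]:
            continue  # skip blank lines
        source = node_index(fields[0])
        if len(fields) < 3:
            continue  # a node listed without any links
        relation = fields[1]
        for target_id in fields[2:]:
            edges.append({
                "source": source,
                "target": node_index(target_id),
                "data": {"type": relation},  # assumed layout of the extra data
            })
    return {"nodes": nodes, "edges": edges}
```

Edges refer to nodes by list index rather than by id, which matches the "index into nodes" wording above.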
-
settings
: Dict[str, str] = {'column_count': 'int', 'column_types': 'list:str', 'comment_char': 'str', 'deliminator': 'str', 'filters': 'list:str', 'indeces': 'list:int', 'invert': 'bool', 'limit': 'int', 'offset': 'int', 'parse_columns': 'bool', 'provide_blank': 'bool', 'regex_list': 'list:escaped', 'strip_lines': 'bool', 'strip_newlines': 'bool'}¶
-
galaxy.datatypes.images module¶
Image classes
-
class
galaxy.datatypes.images.
Image
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class describing an image
-
edam_data
= 'data_2968'¶
-
edam_format
= 'format_3547'¶
-
file_ext
= ''¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek and blurb text
- Parameters
is_multi_byte (bool) – deprecated
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Jpg
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3579'¶
-
file_ext
= 'jpg'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Png
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3603'¶
-
file_ext
= 'png'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Tiff
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3591'¶
-
file_ext
= 'tiff'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Hamamatsu
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'vms'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Mirax
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'mrxs'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Sakura
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'svslide'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Nrrd
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'nrrd'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Bmp
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3592'¶
-
file_ext
= 'bmp'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Gif
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3467'¶
-
file_ext
= 'gif'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Im
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3593'¶
-
file_ext
= 'im'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Pcd
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3594'¶
-
file_ext
= 'pcd'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Pcx
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3595'¶
-
file_ext
= 'pcx'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Ppm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3596'¶
-
file_ext
= 'ppm'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Psd
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3597'¶
-
file_ext
= 'psd'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Xbm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3598'¶
-
file_ext
= 'xbm'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Xpm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3599'¶
-
file_ext
= 'xpm'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Rgb
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3600'¶
-
file_ext
= 'rgb'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Pbm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3601'¶
-
file_ext
= 'pbm'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Pgm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3602'¶
-
file_ext
= 'pgm'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Eps
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3466'¶
-
file_ext
= 'eps'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Rast
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3605'¶
-
file_ext
= 'rast'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Pdf
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
edam_format
= 'format_3508'¶
-
file_ext
= 'pdf'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Tck
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Tracks file format (.tck): https://mrtrix.readthedocs.io/en/latest/getting_started/image_data.html#tracks-file-format-tck
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('fibers_sparse_top_6_lines.tck')
>>> Tck().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Tck().sniff( fname )
False
-
file_ext
= 'tck'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.images.
Trk
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Track File format (.trk) is the tractography file format. http://trackvis.org/docs/?subsect=fileformat
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('IIT2mean_top_2000bytes.trk')
>>> Trk().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Trk().sniff( fname )
False
-
file_ext
= 'trk'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.images.
Gmaj
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class describing a GMAJ Applet
-
edam_format
= 'format_3547'¶
-
file_ext
= 'gmaj.zip'¶
-
copy_safe_peek
= False¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek and blurb text
- Parameters
is_multi_byte (bool) – deprecated
-
sniff
(filename)[source]¶ NOTE: the sniff.convert_newlines() call in the upload utility will keep Gmaj data types from being correctly sniffed, but the files can be uploaded (they’ll be sniffed as ‘txt’). This sniff function is here to provide an example of a sniffer for a zip file.
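A sniffer for a zip file generally checks two things: that the file really is a zip archive, and that it contains the members the format requires. The sketch below shows that pattern with the standard `zipfile` module. The function name `sniff_zip` and the `.maf` suffix are placeholders chosen for the example and are not taken from the Gmaj implementation.

```python
import zipfile


def sniff_zip(filename, required_suffixes=(".maf",)):
    """Return True if filename is a zip archive holding every required suffix."""
    if not zipfile.is_zipfile(filename):
        return False
    with zipfile.ZipFile(filename) as archive:
        names = archive.namelist()
    # Every required suffix must be matched by at least one archive member.
    return all(
        any(name.endswith(suffix) for name in names)
        for suffix in required_suffixes
    )
```

Checking `zipfile.is_zipfile()` first means plain text or other binary uploads are rejected before the archive is opened.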
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Analyze75
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Mayo Analyze 7.5 files http://www.imzml.org
-
file_ext
= 'analyze75'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Nifti1
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Nifti1 format https://nifti.nimh.nih.gov/pub/dist/src/niftilib/nifti1.h
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('T1_top_350bytes.nii1')
>>> Nifti1().sniff( fname )
True
>>> fname = get_test_fname('2.txt')
>>> Nifti1().sniff( fname )
False
-
file_ext
= 'nii1'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.images.
Nifti2
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Nifti2 format https://brainder.org/2015/04/03/the-nifti-2-file-format/
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('avg152T1_LR_nifti2_top_100bytes.nii2')
>>> Nifti2().sniff( fname )
True
>>> fname = get_test_fname('T1_top_350bytes.nii1')
>>> Nifti2().sniff( fname )
False
-
file_ext
= 'nii2'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.images.
Gifti
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Class describing a Gifti format
-
file_ext
= 'gii'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a Gifti file
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('Human.colin.R.activations.label.gii')
>>> Gifti().sniff(fname)
True
>>> fname = get_test_fname('interval.interval')
>>> Gifti().sniff(fname)
False
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> Gifti().sniff(fname)
False
>>> fname = get_test_fname('tblastn_four_human_vs_rhodopsin.blastxml')
>>> Gifti().sniff(fname)
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.images.
Html
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
Deprecated class. Use galaxy.datatypes.text.Html instead; this class is kept for backwards compatibility only.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.images.
Laj
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a LAJ Applet
-
file_ext
= 'laj'¶
-
copy_safe_peek
= False¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.interval module¶
Interval datatypes
-
class
galaxy.datatypes.interval.
Interval
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data containing interval information
-
edam_data
= 'data_3002'¶
-
edam_format
= 'format_3475'¶
-
file_ext
= 'interval'¶
-
line_class
= 'region'¶
-
set_meta
(dataset, overwrite=True, first_line_is_header=False, **kwd)[source]¶ Tries to guess from the line the location number of the column for the chromosome, region start-end and strand
-
get_estimated_display_viewport
(dataset, chrom_col=None, start_col=None, end_col=None)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
ucsc_links
(dataset, type, app, base_url)[source]¶ Generate links to UCSC genome browser sites based on the dbkey and content of dataset.
-
sniff_prefix
(file_prefix)[source]¶ Checks for ‘intervalness’
This format is mostly used by galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Interval.genomic_region_dataprovider>, 'genomic-region-dict': <function Interval.genomic_region_dict_dataprovider>, 'interval': <function Interval.interval_dataprovider>, 'interval-dict': <function Interval.interval_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.interval.
BedGraph
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Tab delimited chrom/start/end/datavalue dataset
-
edam_format
= 'format_3583'¶
-
file_ext
= 'bedgraph'¶
-
as_ucsc_display_file
(dataset, **kwd)[source]¶ Returns file contents as is with no modifications. TODO: this is a functional stub and will need to be enhanced moving forward to provide additional support for bedgraph.
-
get_estimated_display_viewport
(dataset, chrom_col=0, start_col=1, end_col=2)[source]¶ Set viewport based on dataset’s first 100 lines.
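One way to derive a viewport from the first 100 lines is sketched below. It is a guess at the general approach, not the method's actual code: the name `estimate_viewport`, the skipping of `track`/`browser` header lines, and the choice to span only the first chromosome encountered are assumptions.

```python
from itertools import islice


def estimate_viewport(lines, chrom_col=0, start_col=1, end_col=2, max_lines=100):
    """Return (chrom, start, end) spanning the first chromosome seen."""
    chrom = start = end = None
    for line in islice(lines, max_lines):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and UCSC header lines
        fields = line.rstrip("\r\n").split("\t")
        try:
            this_chrom = fields[chrom_col]
            this_start = int(fields[start_col])
            this_end = int(fields[end_col])
        except (IndexError, ValueError):
            continue  # ignore lines that do not parse as an interval
        if chrom is None:
            chrom, start, end = this_chrom, this_start, this_end
        elif this_chrom == chrom:
            # Widen the window to cover every interval on this chromosome.
            start = min(start, this_start)
            end = max(end, this_end)
    return chrom, start, end
```

If no parsable interval is found within the inspected lines, the sketch returns `(None, None, None)` so the caller can fall back to a default view.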
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Bed
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Tab delimited data in BED format
-
edam_format
= 'format_3003'¶
-
file_ext
= 'bed'¶
-
column_names
= ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts']¶ Add metadata elements
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Sets the metadata information for datasets previously determined to be in bed format.
-
as_ucsc_display_file
(dataset, **kwd)[source]¶ Returns file contents with only the bed data. If bed 6+, treat as interval.
-
sniff_prefix
(file_prefix)[source]¶ Checks for ‘bedness’
BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used. The data type of all 12 columns is: 1-str, 2-int, 3-int, 4-str, 5-int, 6-str, 7-int, 8-int, 9-int or list, 10-int, 11-list, 12-list
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format1
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test_tab.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'interv1.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'complete.bed' )
>>> Bed().sniff( fname )
True
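The per-column types listed above can be checked line by line. The sketch below is a simplified illustration of such a check and not the sniffer Galaxy uses: the names `looks_like_bed_line` and `BED_CHECKS` are hypothetical, and list columns are assumed to be comma-separated integers with an optional trailing comma.

```python
def _is_int(value):
    try:
        int(value)
    except ValueError:
        return False
    return True


def _is_int_list(value):
    # Accepts "255,0,0" and "10,20," as well as a single integer such as "0".
    return all(_is_int(item) for item in value.rstrip(",").split(","))


# One check per BED column; None means any string is accepted.
BED_CHECKS = [
    None, _is_int, _is_int,        # 1 chrom, 2 start, 3 end
    None, _is_int, None,           # 4 name, 5 score, 6 strand
    _is_int, _is_int,              # 7 thickStart, 8 thickEnd
    _is_int_list,                  # 9 itemRgb: int or list
    _is_int,                       # 10 blockCount
    _is_int_list, _is_int_list,    # 11 blockSizes, 12 blockStarts
]


def looks_like_bed_line(line):
    """Return True if a tab-separated line matches the BED column types."""
    fields = line.rstrip("\r\n").split("\t")
    if not 3 <= len(fields) <= 12:
        return False
    return all(
        check is None or check(field)
        for field, check in zip(fields, BED_CHECKS)
    )
```

A full sniffer would also confirm that every data line has the same number of fields, as the description requires; this sketch looks at one line at a time.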
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
ProBed
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Bed
Tab delimited data in proBED format - adaptation of BED for proteomics data.
-
edam_format
= 'format_3827'¶
-
file_ext
= 'probed'¶
-
column_names
= ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'ThickStart', 'ThickEnd', 'ItemRGB', 'BlockCount', 'BlockSizes', 'BlockStarts', 'ProteinAccession', 'PeptideSequence', 'Uniqueness', 'GenomeReferenceVersion', 'PsmScore', 'Fdr', 'Modifications', 'Charge', 'ExpMassToCharge', 'CalcMassToCharge', 'PsmRank', 'DatasetID', 'Uri']¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
BedStrict
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Bed
Tab delimited data in strict BED format - no non-standard columns allowed
-
edam_format
= 'format_3584'¶
-
file_ext
= 'bedstrict'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Sets the metadata information for datasets previously determined to be in bed format.
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Bed6
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.BedStrict
Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6
-
edam_format
= 'format_3585'¶
-
file_ext
= 'bed6'¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Bed12
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.BedStrict
Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12
-
edam_format
= 'format_3586'¶
-
file_ext
= 'bed12'¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Gff
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular, galaxy.datatypes.interval._RemoteCallMixin
Tab delimited data in Gff format
-
edam_data
= 'data_1255'¶
-
edam_format
= 'format_2305'¶
-
file_ext
= 'gff'¶
-
valid_gff_frame
= ['.', '0', '1', '2']¶
-
column_names
= ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group']¶
-
data_sources
: Dict[str, str] = {'data': 'interval_index', 'feature_search': 'fli', 'index': 'bigwig'}¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
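The column-counting and type-detection behaviour described above can be illustrated with a small standalone sketch. This is not Galaxy's implementation: the function names `cell_type` and `guess_column_types`, the `comment_prefix` parameter, and the three-level type ranking are assumptions made for illustration.

```python
# Hypothetical sketch of tabular type guessing; not taken from Galaxy.
_RANK = {"int": 0, "float": 1, "str": 2}


def cell_type(cell):
    """Classify one cell as 'int', 'float' or 'str'."""
    for name, caster in (("int", int), ("float", float)):
        try:
            caster(cell)
            return name
        except ValueError:
            continue
    return "str"


def guess_column_types(lines, comment_prefix="#"):
    """Return the widest type seen in each tab-separated column."""
    types = []
    for line in lines:
        if not line.strip() or line.startswith(comment_prefix):
            continue  # skip blank lines and comment lines
        for i, cell in enumerate(line.rstrip("\r\n").split("\t")):
            found = cell_type(cell)
            if i >= len(types):
                types.append(found)
            elif _RANK[found] > _RANK[types[i]]:
                types[i] = found  # widen int -> float -> str
    # Mirrors the note above: a file with no data has one 'str' column.
    return types or ["str"]
```

A column is only reported as numeric if every data line agrees, which is one reason for reading the whole file rather than a fixed prefix.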
-
get_estimated_display_viewport
(dataset)[source]¶ Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 formats. This function should correctly handle both…
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in gff format
GFF lines have nine required fields that must be tab-separated.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format3
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('gff.gff3')
>>> Gff().sniff( fname )
False
>>> fname = get_test_fname('test.gff')
>>> Gff().sniff( fname )
True
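As a rough illustration of the nine-field rule, the sketch below checks a single data line. It is a simplified stand-in, not the logic of `Gff.sniff_prefix`: the function name `looks_like_gff_line` and the accepted strand values are assumptions, while the frame values mirror `valid_gff_frame` above.

```python
VALID_GFF_FRAME = [".", "0", "1", "2"]  # mirrors Gff.valid_gff_frame
VALID_STRAND = ["+", "-", "."]  # assumption, not taken from the Galaxy source


def looks_like_gff_line(line):
    """Return True if one data line has the nine-field GFF shape."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return False
    start, end, score, strand, frame = fields[3:8]
    if not (start.isdigit() and end.isdigit()):
        return False
    if score != ".":
        try:
            float(score)
        except ValueError:
            return False
    return strand in VALID_STRAND and frame in VALID_GFF_FRAME
```

A real sniffer would also need to skip comment and track lines and sample several lines before deciding.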
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Gff.genomic_region_dataprovider>, 'genomic-region-dict': <function Gff.genomic_region_dict_dataprovider>, 'interval': <function Gff.interval_dataprovider>, 'interval-dict': <function Gff.interval_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.interval.
Gff3
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Gff
Tab delimited data in Gff3 format
-
edam_format
= 'format_1975'¶
-
file_ext
= 'gff3'¶
-
valid_gff3_strand
= ['+', '-', '.', '?']¶
-
valid_gff3_phase
= ['.', '0', '1', '2']¶
-
column_names
= ['Seqid', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes']¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in GFF version 3 format
GFF 3 format:
adds a mechanism for representing more than one level of hierarchical grouping of features and subfeatures.
separates the ideas of group membership and feature name/id
constrains the feature type field to be taken from a controlled vocabulary.
allows a single feature, such as an exon, to belong to more than one group at a time.
provides an explicit convention for pairwise alignments
provides an explicit convention for features that occupy disjunct regions
The format consists of 9 columns, separated by tabs (NOT spaces).
Undefined fields are replaced with the “.” character, as described in the original GFF spec.
For complete details see http://song.sourceforge.net/gff3.shtml
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.gff' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname('gff.gff3')
>>> Gff3().sniff( fname )
True
>>> fname = get_test_fname( 'grch37.75.gtf' )
>>> Gff3().sniff( fname )
False
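One way to tell GFF3 apart from older GFF is the `##gff-version` pragma defined by the GFF3 specification, combined with the strand and phase vocabularies listed above. The sketch below assumes that approach; whether `Gff3.sniff_prefix` works this way is not stated here, and both function names are hypothetical.

```python
VALID_GFF3_STRAND = ["+", "-", ".", "?"]  # mirrors Gff3.valid_gff3_strand
VALID_GFF3_PHASE = [".", "0", "1", "2"]  # mirrors Gff3.valid_gff3_phase


def gff_version_from_header(lines):
    """Return the major version declared before the first data line, or None."""
    for line in lines:
        line = line.strip()
        if line.startswith("##gff-version"):
            parts = line.split()
            if len(parts) < 2:
                return None
            major = parts[1].split(".")[0]
            return int(major) if major.isdigit() else None
        if line and not line.startswith("#"):
            break  # reached data without seeing the pragma
    return None


def looks_like_gff3_line(line):
    """Check the column count, coordinates, strand and phase of one line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return False
    return (
        fields[3].isdigit()
        and fields[4].isdigit()
        and fields[6] in VALID_GFF3_STRAND
        and fields[7] in VALID_GFF3_PHASE
    )
```

The `?` strand is what distinguishes the GFF3 vocabulary from the plain GFF one in this sketch.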
-
metadata_spec
= {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Gtf
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Gff
Tab delimited data in Gtf format
-
edam_format
= 'format_2306'¶
-
file_ext
= 'gtf'¶
-
column_names
= ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes']¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in gtf format
GTF lines have nine required fields that must be tab-separated. The first eight GTF fields are the same as GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair. Attributes must end in a semi-colon, and be separated from any following attribute by exactly one space. The attribute list must begin with the two mandatory attributes:
gene_id value - A globally unique identifier for the genomic source of the sequence.
transcript_id value - A globally unique identifier for the predicted transcript.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format4
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.bed' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gtf().sniff( fname )
True
>>> fname = get_test_fname( 'grch37.75.gtf' )
>>> Gtf().sniff( fname )
True
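The attribute rules above (type/value pairs, each terminated by a semicolon, starting with `gene_id` and `transcript_id`) can be sketched with a small parser. The regular expression and both function names are illustrative assumptions and are not taken from Galaxy.

```python
import re

# Matches `key "quoted value";` or `key bare_value;`
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_gtf_attributes(column9):
    """Return the attributes of the ninth GTF column in file order."""
    attrs = {}
    for match in _ATTR_RE.finditer(column9):
        key = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        # Keep the first value if a key is repeated.
        attrs.setdefault(key, quoted if quoted is not None else bare)
    return attrs


def has_mandatory_gtf_attributes(column9):
    """True if the list begins with gene_id followed by transcript_id."""
    names = list(parse_gtf_attributes(column9))
    return names[:2] == ["gene_id", "transcript_id"]
```

A plain GFF group field such as `gene1` yields no attributes, which gives a simple way to separate the two formats.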
-
metadata_spec
= {'attribute_types': <galaxy.model.metadata.MetadataElementSpec object>, 'attributes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
Wiggle
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
,galaxy.datatypes.interval._RemoteCallMixin
Tab delimited data in wiggle format
-
edam_format
= 'format_3005'¶
-
file_ext
= 'wig'¶
-
get_estimated_display_viewport
(dataset)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in wiggle format
The .wig format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line is the track data, which can be entered in several different formats.
The track definition line begins with the word ‘track’ followed by the track type. The track type with version is REQUIRED, and it currently must be wiggle_0. For example, track type=wiggle_0…
For complete details see http://genome.ucsc.edu/goldenPath/help/wiggle.html
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interv1.bed' )
>>> Wiggle().sniff( fname )
False
>>> fname = get_test_fname( 'wiggle.wig' )
>>> Wiggle().sniff( fname )
True
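The track definition line is a sequence of `key=value` options, some of them quoted. A minimal sketch of recognising a wiggle track line follows; the helper names `parse_track_line` and `is_wiggle_track_line` are hypothetical, and using `shlex` for the quoting is a choice made here, not necessarily what Galaxy does.

```python
import shlex


def parse_track_line(line):
    """Return the key=value options of a 'track' line, or None otherwise."""
    try:
        tokens = shlex.split(line)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if not tokens or tokens[0] != "track":
        return None
    return dict(token.split("=", 1) for token in tokens[1:] if "=" in token)


def is_wiggle_track_line(line):
    """True only for a track line that declares type=wiggle_0."""
    options = parse_track_line(line)
    return options is not None and options.get("type") == "wiggle_0"
```

A track line without `type=wiggle_0` is still a valid track line, but this check rejects it as wiggle.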
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'wiggle': <function Wiggle.wiggle_dataprovider>, 'wiggle-dict': <function Wiggle.wiggle_dict_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.interval.
CustomTrack
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
UCSC CustomTrack
-
edam_format
= 'format_3588'¶
-
file_ext
= 'customtrack'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
get_estimated_display_viewport
(dataset, chrom_col=None, start_col=None, end_col=None)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in customtrack format.
CustomTrack files are built within Galaxy and are basically bed or interval files with the first line looking something like this.
track name="User Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> CustomTrack().sniff( fname )
False
>>> fname = get_test_fname( 'ucsc.customtrack' )
>>> CustomTrack().sniff( fname )
True
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.interval.
ENCODEPeak
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Human ENCODE peak format. There are both broad and narrow peak formats. Formats are very similar; narrow peak has an additional column, though.
Broad peak ( http://genome.ucsc.edu/FAQ/FAQformat#format13 ): This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+3 format.
Narrow peak ( http://genome.ucsc.edu/FAQ/FAQformat#format12 ): This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format.
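Since BED6+3 has nine columns and BED6+4 has ten, the two variants can be told apart by column count. The sketch below assumes tab-separated input; the function names are hypothetical, and the column labels are copied from this class's `column_names`.

```python
ENCODE_PEAK_COLUMNS = [
    "Chrom", "Start", "End", "Name", "Score", "Strand",
    "SignalValue", "pValue", "qValue", "Peak",
]


def classify_encode_peak_line(line):
    """Return 'broadPeak' (9 columns), 'narrowPeak' (10 columns) or None."""
    count = len(line.rstrip("\r\n").split("\t"))
    if count == 9:
        return "broadPeak"
    if count == 10:
        return "narrowPeak"
    return None


def label_peak_fields(line):
    """Pair each field with its column label; broad peaks have no 'Peak'."""
    fields = line.rstrip("\r\n").split("\t")
    return dict(zip(ENCODE_PEAK_COLUMNS, fields))
```

Because `zip` stops at the shorter sequence, a broad peak line simply yields a dictionary without the `Peak` key.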
-
edam_format
= 'format_3612'¶
-
file_ext
= 'encodepeak'¶
-
column_names
= ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'SignalValue', 'pValue', 'qValue', 'Peak']¶
-
metadata_spec
= {'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
ChromatinInteractions
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Chromatin interactions obtained from 3C/5C/Hi-C experiments.
-
file_ext
= 'chrint'¶
-
column_names
= ['Chrom1', 'Start1', 'End1', 'Chrom2', 'Start2', 'End2', 'Value']¶ Add metadata elements
-
metadata_spec
= {'chrom1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'chrom2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'end1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'end2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'nameCol': <galaxy.model.metadata.MetadataElementSpec object>, 'start1Col': <galaxy.model.metadata.MetadataElementSpec object>, 'start2Col': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>, 'strandCol': <galaxy.model.metadata.MetadataElementSpec object>, 'valueCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.interval.
ScIdx
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
ScIdx files are 1-based and consist of strand-specific coordinate counts. They always have 5 columns, and the first row is the column labels: ‘chrom’, ‘index’, ‘forward’, ‘reverse’, ‘value’. Each line following the first consists of data: chromosome name (type str), peak index (type int), Forward strand peak count (type int), Reverse strand peak count (type int) and value (type int). The value of the 5th ‘value’ column is the sum of the forward and reverse peak count values.
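The row layout described above lends itself to a simple consistency check, sketched below. The function name `parse_scidx_row` is hypothetical; the only rules encoded are the ones stated in the description (five tab-separated columns, integer counts, and value equal to forward plus reverse).

```python
SCIDX_HEADER = ["chrom", "index", "forward", "reverse", "value"]


def parse_scidx_row(line):
    """Parse one ScIdx data line and verify value == forward + reverse.

    Raises ValueError for a wrong column count, a non-integer count,
    or an inconsistent value column.
    """
    chrom, index, forward, reverse, value = line.rstrip("\r\n").split("\t")
    row = (chrom, int(index), int(forward), int(reverse), int(value))
    if row[4] != row[2] + row[3]:
        raise ValueError(f"value {row[4]} is not forward + reverse")
    return row
```

For example, `parse_scidx_row("chr1\t100\t3\t4\t7")` passes the check, while a row ending in `8` would be rejected.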
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
file_ext
= 'scidx'¶
-
galaxy.datatypes.isa module¶
ISA datatype
galaxy.datatypes.media module¶
Video classes
-
class
galaxy.datatypes.media.
Audio
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
set_meta
(dataset, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Video
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
set_meta
(dataset, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Mkv
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Video
-
file_ext
= 'mkv'¶
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Mp4
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Video
Class that reads MP4 video files.
>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Mp4, 'video_1.mp4')
True
>>> sniff_with_cls(Mp4, 'audio_1.mp4')
False
-
file_ext
= 'mp4'¶
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Flv
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Video
-
file_ext
= 'flv'¶
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Mpg
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Video
-
file_ext
= 'mpg'¶
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'fps': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_h': <galaxy.model.metadata.MetadataElementSpec object>, 'resolution_w': <galaxy.model.metadata.MetadataElementSpec object>, 'video_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'video_streams': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Mp3
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Audio
Class that reads MP3 audio files.
>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Mp3, 'audio_2.mp3')
True
>>> sniff_with_cls(Mp3, 'audio_1.wav')
False
-
file_ext
= 'mp3'¶
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.media.
Wav
(**kwd)[source]¶ Bases:
galaxy.datatypes.media.Audio
Class that reads WAV audio files.
>>> from galaxy.datatypes.sniff import sniff_with_cls
>>> sniff_with_cls(Wav, 'hello.wav')
True
>>> sniff_with_cls(Wav, 'audio_2.mp3')
False
>>> sniff_with_cls(Wav, 'drugbank_drugs.cml')
False
-
file_ext
= 'wav'¶
-
blurb
= 'RIFF WAV Audio file'¶
-
is_binary
= True¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Set the metadata for this dataset from the file contents.
-
metadata_spec
= {'audio_codecs': <galaxy.model.metadata.MetadataElementSpec object>, 'audio_streams': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'duration': <galaxy.model.metadata.MetadataElementSpec object>, 'nchannels': <galaxy.model.metadata.MetadataElementSpec object>, 'nframes': <galaxy.model.metadata.MetadataElementSpec object>, 'rate': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_rates': <galaxy.model.metadata.MetadataElementSpec object>, 'sampwidth': <galaxy.model.metadata.MetadataElementSpec object>}¶
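Several of the metadata names above (`nchannels`, `nframes`, `rate`, `sampwidth`) correspond to values that Python's standard `wave` module exposes. The sketch below shows how such values could be read from a WAV file; it is an illustration only, the function name `read_wav_metadata` is hypothetical, and whether Galaxy's `set_meta` uses `wave` is an assumption.

```python
import wave


def read_wav_metadata(path_or_file):
    """Read basic header values from a WAV file path or binary file object."""
    with wave.open(path_or_file, "rb") as wav:
        nframes = wav.getnframes()
        rate = wav.getframerate()
        return {
            "nchannels": wav.getnchannels(),
            "sampwidth": wav.getsampwidth(),  # bytes per sample
            "rate": rate,  # frames per second
            "nframes": nframes,
            # Duration in seconds, derived here rather than read from the file.
            "duration": nframes / rate if rate else 0.0,
        }
```

Only the RIFF header is inspected, so the call stays cheap even for long recordings.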
-
galaxy.datatypes.metadata module¶
Expose the model metadata module as a datatype module as well. Allowing it to live in galaxy.model means the model module doesn’t have any dependencies on the datatypes module. This module will need to remain here for datatypes living in the tool shed, so we might as well keep and use this interface from the datatypes module.
-
class
galaxy.datatypes.metadata.
Statement
(target)[source]¶ Bases:
object
This class inserts its target into a list in the surrounding class. The data.Data class has a metaclass that executes these statements; this is how the metadata element spec is added to the class.
-
class
galaxy.datatypes.metadata.
MetadataCollection
(parent)[source]¶ Bases:
collections.abc.Mapping
MetadataCollection is not a collection at all, but rather a proxy to the real metadata which is stored as a Dictionary. This class handles processing the metadata elements when they are set and retrieved, returning default values in cases when metadata is not set.
-
property
parent
¶
-
property
spec
¶
-
element_is_set
(name)[source]¶ Check whether the metadata element with the given name is set, i.e.
if such a metadata element actually exists and
if its value differs from no_value
- Parameters
name – the name of the metadata element
- Returns
True if the value differs from no_value; False if it is equal or if no metadata element with that name is specified
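The two conditions can be expressed in a few lines. The sketch below uses plain dictionaries in place of Galaxy's metadata objects, so the parameter names `values` and `no_values` are assumptions made for illustration and not the real method signature.

```python
def element_is_set(values, no_values, name):
    """Return True if `name` has a stored value different from its no_value.

    values:    element name -> stored value
    no_values: element name -> the marker that means "not set"
    """
    if name not in values:
        return False  # no such metadata element
    return values[name] != no_values.get(name)
```

With `values = {"dbkey": "hg19", "columns": 0}` and `no_values = {"dbkey": "?", "columns": 0}`, only `dbkey` counts as set.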
-
property
requires_dataset_id
¶
-
-
class
galaxy.datatypes.metadata.
MetadataSpecCollection
(*args, **kwds)[source]¶ Bases:
collections.OrderedDict
A simple extension of OrderedDict which allows cleaner access to items and allows the values to be iterated over directly as if it were a list. append() is also implemented for simplicity and does not “append”.
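One possible reading of this description is sketched below: an `OrderedDict` subclass whose `append()` stores an item under its own name, with attribute-style access to entries. The class and tuple names are hypothetical and the behaviour shown is an interpretation of the docstring, not a copy of Galaxy's code.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for a metadata element spec; only `name` matters here.
SpecSketch = namedtuple("SpecSketch", ["name", "default"])


class SpecCollectionSketch(OrderedDict):
    def append(self, item):
        # Stores the item under its own name, so a repeated name replaces
        # the earlier entry instead of growing the collection.
        self[item.name] = item

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


specs = SpecCollectionSketch()
specs.append(SpecSketch("dbkey", "?"))
specs.append(SpecSketch("columns", 0))
specs.append(SpecSketch("dbkey", "hg19"))  # replaces the first entry
```

After these calls the collection holds two entries, `specs.dbkey.default` is `"hg19"`, and `specs.values()` yields the specs in insertion order.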
-
class
galaxy.datatypes.metadata.
MetadataParameter
(spec)[source]¶ Bases:
object
-
classmethod
marshal
(value)[source]¶ This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.
-
-
class
galaxy.datatypes.metadata.
MetadataElementSpec
(datatype, name=None, desc=None, param=<class 'galaxy.model.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]¶ Bases:
object
Defines a metadata element and adds it to the metadata_spec (which is a MetadataSpecCollection) of datatype.
-
class
galaxy.datatypes.metadata.
FileParameter
(spec)[source]¶ Bases:
galaxy.model.metadata.MetadataParameter
-
classmethod
marshal
(value)[source]¶ This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.
-
from_external_value
(value, parent, path_rewriter=None)[source]¶ Turns a value read from a external dict into its value to be pushed directly into the metadata dict.
-
galaxy.datatypes.microarrays module¶
-
class
galaxy.datatypes.microarrays.
GenericMicroarrayFile
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Abstract class for most of the microarray files.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.microarrays.
Gal
(**kwd)[source]¶ Bases:
galaxy.datatypes.microarrays.GenericMicroarrayFile
Gal File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gal
-
edam_format
= 'format_3829'¶
-
edam_data
= 'data_3110'¶
-
file_ext
= 'gal'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a Gal file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gal')
>>> Gal().sniff(fname)
True
>>> fname = get_test_fname('test.gpr')
>>> Gal().sniff(fname)
False
-
metadata_spec
= {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.microarrays.
Gpr
(**kwd)[source]¶ Bases:
galaxy.datatypes.microarrays.GenericMicroarrayFile
Gpr File format described at: http://mdc.custhelp.com/app/answers/detail/a_id/18883/#gpr
-
edam_format
= 'format_3829'¶
-
edam_data
= 'data_3110'¶
-
file_ext
= 'gpr'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a Gpr file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.gpr')
>>> Gpr().sniff(fname)
True
>>> fname = get_test_fname('test.gal')
>>> Gpr().sniff(fname)
False
-
metadata_spec
= {'block_count': <galaxy.model.metadata.MetadataElementSpec object>, 'block_type': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'file_format': <galaxy.model.metadata.MetadataElementSpec object>, 'file_type': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_data_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_optional_header_records': <galaxy.model.metadata.MetadataElementSpec object>, 'version_number': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.molecules module¶
-
galaxy.datatypes.molecules.
count_lines
(filename, non_empty=False)[source]¶ Count the number of lines in the file ‘filename’.
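A function with this signature could be written as below. This is a sketch under the assumption that `non_empty=True` means lines containing only whitespace are skipped; the name `count_lines_sketch` is used to make clear it is not the Galaxy function itself.

```python
def count_lines_sketch(filename, non_empty=False):
    """Count lines in a text file, optionally ignoring blank lines."""
    with open(filename) as handle:
        if non_empty:
            # Assumption: a line with only whitespace counts as empty.
            return sum(1 for line in handle if line.strip())
        return sum(1 for _ in handle)
```

Iterating over the file handle keeps memory use constant regardless of file size.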
-
class
galaxy.datatypes.molecules.
GenericMolFile
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Abstract class for most of the molecule files.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
MOL
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
-
file_ext
= 'mol'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
SDF
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
-
file_ext
= 'sdf'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a SDF2 file.
An SDfile (structure-data file) can contain multiple compounds.
Each compound starts with a block in V2000 or V3000 molfile format, which ends with a line equal to ‘M  END’ (two spaces between ‘M’ and ‘END’). This is followed by a non-structural data block, which ends with a line equal to ‘$$$$’.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('github88.v3k.sdf')
>>> SDF().sniff(fname)
True
>>> fname = get_test_fname('chebi_57262.v3k.mol')
>>> SDF().sniff(fname)
False
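The record layout described above (a molfile block closed by an 'M  END' line, then a data block closed by '$$$$') can be checked with a few lines of Python. This is a sketch under stated assumptions, not the sniffer Galaxy ships: looks_like_sdf is a hypothetical name, and the check only asks whether some 'M  END' line is later followed by a '$$$$' line.

```python
def looks_like_sdf(lines):
    """Return True if a molfile terminator is later followed by '$$$$'."""
    seen_mol_end = False
    for line in lines:
        # Compare tokens so the amount of whitespace inside 'M  END' does not matter.
        if line.split() == ["M", "END"]:
            seen_mol_end = True
        elif line.strip() == "$$$$":
            return seen_mol_end
    return False


record = [
    "ethanol\n",
    "  illustrative header line\n",
    "\n",
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n",
    "M  END\n",
    "> <ID>\n",
    "1\n",
    "\n",
    "$$$$\n",
]
print(looks_like_sdf(record))      # record has both terminators
print(looks_like_sdf(record[:5]))  # a bare molfile has no '$$$$' line
```

A plain MOL file is rejected because it never reaches a '$$$$' terminator, which matches the intent of the last doctest above.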
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by molecule records.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
MOL2
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
-
file_ext
= 'mol2'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a MOL2 file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> MOL2().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> MOL2().sniff(fname)
False
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by molecule records.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
FPS
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
chemfp fingerprint file: http://code.google.com/p/chem-fingerprints/wiki/FPS
-
file_ext
= 'fps'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is an FPS file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('q.fps')
>>> FPS().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> FPS().sniff(fname)
False
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by fingerprint records.
-
static
merge
(split_files, output_file)[source]¶ Merging fps files requires merging the header manually. We take the header from the first file.
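The idea of taking the header from the first file only can be sketched as follows. This is an illustration, not Galaxy's code: merge_fps_sketch is a hypothetical helper that works on lists of lines instead of file paths, and it assumes that FPS header lines are the ones starting with '#'.

```python
def merge_fps_sketch(chunks):
    """Merge FPS chunks (each a list of lines) into one list of lines.

    Assumption: header lines start with '#'. The header of the first
    chunk is kept; header lines of later chunks are dropped.
    """
    merged = []
    for index, chunk in enumerate(chunks):
        for line in chunk:
            if index > 0 and line.startswith("#"):
                continue
            merged.append(line)
    return merged


part_one = ["#FPS1\n", "#num_bits=8\n", "ff\tmol1\n"]
part_two = ["#FPS1\n", "#num_bits=8\n", "0f\tmol2\n"]
for line in merge_fps_sketch([part_one, part_two]):
    print(line, end="")
```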
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
OBFS
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
OpenBabel Fastsearch format (fs).
-
file_ext
= 'obfs'¶
-
__init__
(**kwd)[source]¶ A Fastsearch index consists of a binary file with the fingerprints and a pointer to the actual molecule file.
-
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Splitting Fastsearch indices is not supported.
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
DRF
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
-
file_ext
= 'drf'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
PHAR
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
Pharmacophore database format from silicos-it.
-
file_ext
= 'phar'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
PDB
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
Protein Databank format. http://www.wwpdb.org/documentation/format33/v3.3.html
-
file_ext
= 'pdb'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a PDB file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pdb')
>>> PDB().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDB().sniff(fname)
False
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
PDBQT
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
PDBQT Autodock and Autodock Vina format http://autodock.scripps.edu/faqs-help/faq/what-is-the-format-of-a-pdbqt-file
-
file_ext
= 'pdbqt'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a PDBQT file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('NuBBE_1_obabel_3D.pdbqt')
>>> PDBQT().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PDBQT().sniff(fname)
False
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
PQR
(**kwd)[source]¶ Bases:
galaxy.datatypes.molecules.GenericMolFile
PQR format, a variant of the Protein Databank format that also stores atomic charge and radius. https://apbs-pdb2pqr.readthedocs.io/en/latest/formats/pqr.html
-
file_ext
= 'pqr'¶
-
get_matcher
()[source]¶ - Atom and HETATM line fields are space separated, match group:
- 0: Field_name
A string which specifies the type of PQR entry: ATOM or HETATM.
- 1: Atom_number
An integer which provides the atom index.
- 2: Atom_name
A string which provides the atom name.
- 3: Residue_name
A string which provides the residue name.
- 5: Chain_ID (Optional, group 4 is whole field)
An optional string which provides the chain ID of the atom. Note that chain ID support is a new feature of APBS 0.5.0 and later versions.
- 6: Residue_number
An integer which provides the residue index.
- 7: X 8: Y 9: Z
3 floats which provide the atomic coordinates (in angstroms)
- 10: Charge
A float which provides the atomic charge (in electrons).
- 11: Radius
A float which provides the atomic radius (in angstroms).
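The group layout listed above can be reproduced with a regular expression along the following lines. This is a sketch, not the pattern Galaxy compiles: PQR_ATOM_LINE and NUMBER are hypothetical names, the chain ID is assumed to be a single letter, and the indices refer to positions in match.groups().

```python
import re

# A number such as "24.500", "-9.9", or "1".
NUMBER = r"(-?\d+(?:\.\d+)?)"

PQR_ATOM_LINE = re.compile(
    r"^(ATOM|HETATM)\s+"   # 0: Field_name
    r"(\d+)\s+"            # 1: Atom_number
    r"(\S+)\s+"            # 2: Atom_name
    r"(\S+)\s+"            # 3: Residue_name
    r"(([A-Za-z])\s+)?"    # 4: whole optional field, 5: Chain_ID
    r"(-?\d+)\s+"          # 6: Residue_number
    + r"\s+".join([NUMBER] * 5)  # 7-11: X, Y, Z, Charge, Radius
    + r"\s*$"
)

with_chain = "ATOM      1  N   MET A   1      24.500   8.300  -9.900 -0.3000 1.8240"
no_chain = "HETATM   12  O   HOH     7       1.000   2.000   3.000  0.0000 1.5000"

for text in (with_chain, no_chain):
    groups = PQR_ATOM_LINE.match(text).groups()
    print(groups[0], groups[5], groups[6], groups[10], groups[11])
```

Wrapping the chain ID in an outer optional group is what produces the "group 4 is whole field" layout: when no chain ID is present, both group 4 and group 5 are None and the remaining indices do not shift.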
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a PQR file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('5e5z.pqr')
>>> PQR().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> PQR().sniff(fname)
False
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'chain_ids': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
grd
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'grd'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
grdtgz
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
file_ext
= 'grd.tgz'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
InChI
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'inchi'¶
-
column_names
= ['InChI']¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is an InChI file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> InChI().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> InChI().sniff(fname)
False
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.molecules.
SMILES
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'smi'¶
-
column_names
= ['SMILES', 'TITLE']¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.molecules.
CML
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Chemical Markup Language http://cml.sourceforge.net/
-
file_ext
= 'cml'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to guess if the file is a CML file.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('interval.interval')
>>> CML().sniff(fname)
False
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> CML().sniff(fname)
True
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by molecule records.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_molecules': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.mothur module¶
Mothur Metagenomics Datatypes
-
class
galaxy.datatypes.mothur.
Otu
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mothur.otu'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Set metadata for Otu files.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> otu = Otu()
>>> dataset.file_name = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> dataset.has_data = lambda: True
>>> otu.set_meta(dataset)
>>> dataset.metadata.columns
100
>>> len(dataset.metadata.labels) == 37
True
>>> len(dataset.metadata.otulabels) == 98
True
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in OTU (operational taxonomic unit) format.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> Otu().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.otu' )
>>> Otu().sniff( fname )
False
-
metadata_spec
= {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
Sabund
(**kwd)[source]¶ Bases:
galaxy.datatypes.mothur.Otu
-
file_ext
= 'mothur.sabund'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in OTU (operational taxonomic unit) format, with each line laid out as:
label<TAB>count[<TAB>value(1..n)]
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.sabund' )
>>> Sabund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.sabund' )
>>> Sabund().sniff( fname )
False
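One way to read the label<TAB>count[<TAB>value(1..n)] layout is that the second field announces how many value fields follow it. The sketch below checks a single line under that reading. The reading itself is an assumption, and sabund_line_ok is a hypothetical name rather than part of Galaxy.

```python
def sabund_line_ok(line):
    """Check one line of the form label<TAB>count[<TAB>value(1..n)].

    Assumption: 'count' is an integer equal to the number of value
    fields that follow it on the same line.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return False
    try:
        count = int(fields[1])
    except ValueError:
        return False
    return len(fields) - 2 == count


print(sabund_line_ok("0.03\t3\t10\t4\t1\n"))  # three values announced, three present
print(sabund_line_ok("0.03\t3\t10\t4\n"))     # three announced, only two present
```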
-
metadata_spec
= {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
GroupAbund
(**kwd)[source]¶ Bases:
galaxy.datatypes.mothur.Otu
-
file_ext
= 'mothur.shared'¶
-
set_meta
(dataset, overwrite=True, skip=1, **kwd)[source]¶ Set metadata for Otu files.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> from galaxy.util.bunch import Bunch
>>> dataset = Bunch()
>>> dataset.metadata = Bunch
>>> otu = Otu()
>>> dataset.file_name = get_test_fname( 'mothur_datatypetest_true.mothur.otu' )
>>> dataset.has_data = lambda: True
>>> otu.set_meta(dataset)
>>> dataset.metadata.columns
100
>>> len(dataset.metadata.labels) == 37
True
>>> len(dataset.metadata.otulabels) == 98
True
-
sniff_prefix
(file_prefix, vals_are_int=False)[source]¶ Determines whether the file is in OTU (operational taxonomic unit) Shared format, with each line laid out as:
label<TAB>group<TAB>count[<TAB>value(1..n)]
The first line is column headings as of Mothur v 1.2
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.shared' )
>>> GroupAbund().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.shared' )
>>> GroupAbund().sniff( fname )
False
-
metadata_spec
= {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>, 'labels': <galaxy.model.metadata.MetadataElementSpec object>, 'otulabels': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
SecondaryStructureMap
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.map'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in secondary structure map format: a single column with an integer value which indicates the row that this row maps to. The check verifies that the mapping is symmetric: if structMap[10] = 380 then structMap[380] = 10, and vice versa.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.map' )
>>> SecondaryStructureMap().sniff( fname )
False
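The symmetry condition on the map can be written as a short check. This sketch is not Galaxy's sniffer: is_symmetric_map is a hypothetical name, rows are assumed to be numbered from 1, and a value of 0 is assumed to mean that the row is unpaired.

```python
def is_symmetric_map(values):
    """Return True if every pairing in the map is mirrored.

    values[i - 1] is the row that row i maps to.
    Assumptions: rows are 1-based and 0 marks an unpaired row.
    """
    size = len(values)
    for row, partner in enumerate(values, start=1):
        if partner == 0:
            continue
        if not 1 <= partner <= size:
            return False
        if values[partner - 1] != row:
            return False
    return True


print(is_symmetric_map([3, 0, 1]))  # rows 1 and 3 point at each other
print(is_symmetric_map([3, 0, 2]))  # row 3 points at row 2, which is unpaired
```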
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
AlignCheck
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.align.check'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
AlignReport
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template AY457915 501 82283 1525 kmer 89.07 needleman 5 501 1 499 499 2 0 0 97.6
-
file_ext
= 'mothur.align.report'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
DistanceMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mothur.dist'¶ Add metadata elements
-
set_meta
(dataset, overwrite=True, skip=0, **kwd)[source]¶ Set the number of lines of data in dataset.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
LowerTriangleDistanceMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.mothur.DistanceMatrix
-
file_ext
= 'mothur.lower.dist'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in lower-triangle distance matrix (phylip) format. The first line has the number of sequences in the matrix. The remaining lines have the sequence name followed by a list of distances from all preceding sequences.
5  # possibly but not always preceded by a tab :/
U68589
U68590 0.3371
U68591 0.3609 0.3782
U68592 0.4155 0.3197 0.4148
U68593 0.2872 0.1690 0.3361 0.2842
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.lower.dist' )
>>> LowerTriangleDistanceMatrix().sniff( fname )
False
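The shape rule for a lower-triangle matrix (row i carries a name plus i - 1 distances) lends itself to a compact check. The function below is a sketch for a complete file held in memory, not the prefix-based sniffer in Galaxy, and looks_like_lower_triangle is a hypothetical name.

```python
def looks_like_lower_triangle(lines):
    """Check the lower-triangle layout on a complete list of lines."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or len(rows[0]) != 1:
        return False
    try:
        count = int(rows[0][0])
    except ValueError:
        return False
    data = rows[1:]
    if count < 1 or len(data) != count:
        return False
    for index, fields in enumerate(data):
        # Row 0 has only a name, row 1 a name plus one distance, and so on.
        if len(fields) != index + 1:
            return False
        try:
            [float(value) for value in fields[1:]]
        except ValueError:
            return False
    return True


matrix = [
    "\t5\n",
    "U68589\n",
    "U68590\t0.3371\n",
    "U68591\t0.3609\t0.3782\n",
    "U68592\t0.4155\t0.3197\t0.4148\n",
    "U68593\t0.2872\t0.1690\t0.3361\t0.2842\n",
]
print(looks_like_lower_triangle(matrix))
```

Splitting on arbitrary whitespace means the optional leading tab on the first line needs no special handling.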
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
SquareDistanceMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.mothur.DistanceMatrix
-
file_ext
= 'mothur.square.dist'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in square distance matrix (column-formatted distance matrix) format. The first line has the number of sequences in the matrix. The following lines have the sequence name in the first column plus a column for the distance to each sequence in the row order in which they appear in the matrix.
3
U68589 0.0000 0.3371 0.3610
U68590 0.3371 0.0000 0.3783
U68590 0.3371 0.0000 0.3783
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.square.dist' )
>>> SquareDistanceMatrix().sniff( fname )
False
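For the square layout, every data row should hold a name plus one distance per sequence. A minimal sketch of that check follows. It assumes the whole file is available as a list of lines, looks_like_square_matrix is a hypothetical name, and the sample values are made up for illustration.

```python
def looks_like_square_matrix(lines):
    """Check the square distance matrix layout on a complete list of lines."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or len(rows[0]) != 1:
        return False
    try:
        count = int(rows[0][0])
    except ValueError:
        return False
    data = rows[1:]
    if count < 1 or len(data) != count:
        return False
    for fields in data:
        # One name column plus one distance per sequence.
        if len(fields) != count + 1:
            return False
        try:
            [float(value) for value in fields[1:]]
        except ValueError:
            return False
    return True


square = [
    "3\n",
    "seqA\t0.0000\t0.3371\t0.3610\n",
    "seqB\t0.3371\t0.0000\t0.3783\n",
    "seqC\t0.3610\t0.3783\t0.0000\n",
]
print(looks_like_square_matrix(square))
```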
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
PairwiseDistanceMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.mothur.DistanceMatrix
,galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.pair.dist'¶
-
set_meta
(dataset, overwrite=True, skip=None, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in pairwise distance matrix (column-formatted distance matrix) format. The first and second columns have the sequence names and the third column is the distance between those sequences.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.pair.dist' )
>>> PairwiseDistanceMatrix().sniff( fname )
False
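The three-column rule (two sequence names, then a distance) can be sketched as below. This is an illustration with a hypothetical function name and made-up sample rows. It is not the sniffer Galaxy uses, which may apply additional checks.

```python
def looks_like_pairwise(lines):
    """Return True if every non-blank line is 'name name distance'."""
    seen_data = False
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 3:
            return False
        try:
            float(fields[2])
        except ValueError:
            return False
        seen_data = True
    return seen_data


pairs = ["seqA\tseqB\t0.3371\n", "seqA\tseqC\t0.3610\n"]
print(looks_like_pairwise(pairs))
```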
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_count': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
Names
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.names'¶
-
__init__
(**kwd)[source]¶ http://www.mothur.org/wiki/Name_file Name file shows the relationship between a representative sequence(col 1) and the sequences(comma-separated) it represents(col 2)
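The two-column layout of a name file can be read into a dictionary in a few lines. The parser below is a sketch that assumes tab-separated columns. parse_names is a hypothetical helper, not a Galaxy function, and the sequence names are illustrative.

```python
def parse_names(lines):
    """Map each representative sequence to the list of sequences it represents.

    Assumption: column 1 and column 2 are separated by a tab, and
    column 2 is a comma-separated list of sequence names.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        representative, members = line.split("\t")
        mapping[representative] = members.split(",")
    return mapping


names = ["seqA\tseqA,seqB,seqC\n", "seqD\tseqD\n"]
for representative, members in parse_names(names).items():
    print(representative, len(members))
```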
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
Summary
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.summary'¶
-
__init__
(**kwd)[source]¶ summarizes the quality of sequences in an unaligned or aligned fasta-formatted sequence file
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
Group
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.groups'¶
-
__init__
(**kwd)[source]¶ http://www.mothur.org/wiki/Groups_file Group file assigns sequence (col 1) to a group (col 2)
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=None, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
AccNos
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.accnos'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
Oligos
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mothur.oligos'¶
-
sniff_prefix
(file_prefix)[source]¶ http://www.mothur.org/wiki/Oligos_File Determines whether the file is in mothur oligos format.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.oligos' )
>>> Oligos().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.oligos' )
>>> Oligos().sniff( fname )
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
Frequency
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.freq'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a frequency tabular format for chimera analysis
#1.14.0
0 0.000
1 0.000
...
155 0.975
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.freq' )
>>> Frequency().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.freq' )
>>> Frequency().sniff( fname )
False
>>> # Expression count matrix (EdgeR wrapper)
>>> fname = get_test_fname( 'mothur_datatypetest_false_2.mothur.freq' )
>>> Frequency().sniff( fname )
False
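Going by the example above, a frequency file opens with a '#'-prefixed version line followed by rows of an integer and a float. The sketch below encodes exactly that reading. It is an assumption drawn from the sample rather than from the mothur specification, and looks_like_frequency is a hypothetical name.

```python
def looks_like_frequency(lines):
    """Check for a '#version' line followed by 'integer float' rows."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) < 2 or not rows[0][0].startswith("#"):
        return False
    for fields in rows[1:]:
        if len(fields) != 2:
            return False
        try:
            int(fields[0])
            float(fields[1])
        except ValueError:
            return False
    return True


freq = ["#1.14.0\n", "0\t0.000\n", "1\t0.000\n", "155\t0.975\n"]
print(looks_like_frequency(freq))
```

A count matrix with a text header and more than two columns fails the check, which is the situation the third doctest above guards against.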
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
Quantile
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.quan'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a quantiles tabular format for chimera analysis
1 0 0 0 0 0 0
2 0.309198 0.309198 0.37161 0.37161 0.37161 0.37161
3 0.510982 0.563213 0.693529 0.858939 1.07442 1.20608
...
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.quan' )
>>> Quantile().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.quan' )
>>> Quantile().sniff( fname )
False
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'filtered': <galaxy.model.metadata.MetadataElementSpec object>, 'masked': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
LaneMask
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mothur.filter'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a lane mask filter: 1 line consisting of zeros and ones.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.filter' )
>>> LaneMask().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.filter' )
>>> LaneMask().sniff( fname )
False
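The rule stated in the docstring (one line consisting of zeros and ones) is simple enough to sketch directly. The helper name `looks_like_lane_mask` is hypothetical, and the sketch works on a string rather than on Galaxy's `file_prefix` object.

```python
def looks_like_lane_mask(text: str) -> bool:
    """Return True if text is exactly one non-empty line of '0'/'1' characters."""
    lines = text.splitlines()
    if len(lines) != 1:
        return False
    line = lines[0].strip()
    # A set comparison rejects any character other than '0' and '1'.
    return bool(line) and set(line) <= {"0", "1"}


print(looks_like_lane_mask("0011101011\n"))
print(looks_like_lane_mask("0011\n1100\n"))
```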
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
CountTable
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.count_table'¶
-
__init__
(**kwd)[source]¶ http://www.mothur.org/wiki/Count_File
A table whose first column holds names and whose following columns hold integer counts.
# Example 1:
Representative_Sequence total
U68630 1
U68595 1
U68600 1
# Example 2 (with group columns):
Representative_Sequence total forest pasture
U68630 1 1 0
U68595 1 1 0
U68600 1 1 0
U68591 1 1 0
U68647 1 0 1
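A count table of this shape can be read with a small parser. The sketch below is illustrative only: `parse_count_table` is a hypothetical helper, and treating every header column after `total` as a group name is an assumption based on Example 2, not a statement about how Galaxy fills the `groups` metadata element.

```python
def parse_count_table(text: str):
    """Return (group_names, {sequence_name: [total, per-group counts...]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    # Assumption: columns after 'Representative_Sequence' and 'total' are groups.
    groups = header[2:]
    counts = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        counts[fields[0]] = [int(v) for v in fields[1:]]
    return groups, counts


table = (
    "Representative_Sequence\ttotal\tforest\tpasture\n"
    "U68630\t1\t1\t0\n"
    "U68647\t1\t0\t1\n"
)
groups, counts = parse_count_table(table)
print(groups, counts["U68647"])
```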
-
set_meta
(dataset, overwrite=True, skip=1, max_data_lines=None, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'groups': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
RefTaxonomy
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.ref.taxonomy'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is a Reference Taxonomy
http://www.mothur.org/wiki/Taxonomy_outline A table with 2 or 3 columns:
SequenceName
Taxonomy (semicolon-separated taxonomy in descending order)
integer ?
Example: 2-column (http://www.mothur.org/wiki/Taxonomy_outline)
X56533.1 Eukaryota;Alveolata;Ciliophora;Intramacronucleata;Oligohymenophorea;Hymenostomatida;Tetrahymenina;Glaucomidae;Glaucoma;
X97975.1 Eukaryota;Parabasalidea;Trichomonada;Trichomonadida;unclassified_Trichomonadida;
AF052717.1 Eukaryota;Parabasalidea;
Example: 3-column (http://vamps.mbl.edu/resources/databases.php)
v3_AA008 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus 5
v3_AA016 Bacteria 120
v3_AA019 Archaea;Crenarchaeota;Marine_Group_I 1
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.ref.taxonomy' )
>>> RefTaxonomy().sniff( fname )
False
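The 2- or 3-column layout described above can be validated row by row. The following is a sketch under stated assumptions rather than the sniffer's actual logic: both helper names are hypothetical, columns are assumed to be tab-separated, and the optional third column is assumed to be a non-negative integer.

```python
def is_ref_taxonomy_row(line: str) -> bool:
    """Hypothetical row check: name, taxonomy, and an optional integer column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (2, 3):
        return False
    name, taxonomy = fields[0], fields[1]
    if not name or not taxonomy:
        return False
    # Assumption: the optional third column is a non-negative integer.
    if len(fields) == 3 and not fields[2].isdigit():
        return False
    return True


def split_taxonomy(taxonomy: str) -> list:
    """Split a semicolon-separated lineage, dropping the empty trailing item."""
    return [rank for rank in taxonomy.split(";") if rank]


print(is_ref_taxonomy_row("v3_AA016\tBacteria\t120"))
print(split_taxonomy("Eukaryota;Parabasalidea;"))
```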
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
ConsensusTaxonomy
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.cons.taxonomy'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
TaxonomySummary
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.tax.summary'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.mothur.
Axes
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.axes'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is an axes format The first line may have column headings. The following lines have the name in the first column plus float columns for each axis.
group axis1 axis2
forest 0.000000 0.145743
pasture 0.145743 0.000000
 axis1 axis2
U68589 0.262608 -0.077498
U68590 0.027118 0.195197
U68591 0.329854 0.014395
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'mothur_datatypetest_true.mothur.axes' )
>>> Axes().sniff( fname )
True
>>> fname = get_test_fname( 'mothur_datatypetest_false.mothur.axes' )
>>> Axes().sniff( fname )
False
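The description (an optional heading line, then a name followed by one float per axis) can be sketched as follows. `looks_like_axes` and `_is_float` are hypothetical names, tab-separated columns are assumed, and the way the optional header is recognised is a guess, not the sniffer's documented behaviour.

```python
def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def looks_like_axes(text: str) -> bool:
    """Hypothetical check: optional header, then <name> <float> [<float> ...] rows."""
    rows = [ln.split("\t") for ln in text.splitlines() if ln.strip()]
    if not rows:
        return False
    # Assumption: a first row with any non-numeric axis cell is a header.
    if len(rows[0]) > 1 and not all(_is_float(v) for v in rows[0][1:]):
        rows = rows[1:]
    if not rows:
        return False
    return all(len(r) >= 2 and all(_is_float(v) for v in r[1:]) for r in rows)


with_header = "group\taxis1\taxis2\nforest\t0.000000\t0.145743\npasture\t0.145743\t0.000000\n"
print(looks_like_axes(with_header))
```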
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.mothur.
SffFlow
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'mothur.sff.flow'¶ https://mothur.org/wiki/flow_file/ The first line is the total number of flow values - 800 for Titanium data. For GS FLX it would be 400. Following lines contain:
SequenceName
the number of useable flows as defined by 454’s software
the flow intensity for each base going in the order of TACG.
Example:
800
GQY1XT001CQL4K 85 1.04 0.00 1.00 0.02 0.03 1.02 0.05 ...
GQY1XT001CQIRF 84 1.02 0.06 0.98 0.06 0.09 1.05 0.07 ...
GQY1XT001CF5YW 88 1.02 0.02 1.01 0.04 0.06 1.02 0.03 ...
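To make the layout concrete, here is a minimal reader for a flow file of this shape. It is a sketch, not part of Galaxy: `parse_flow_file` is a hypothetical helper, whitespace-separated fields are assumed, and the read name and values in the sample are shortened, made-up data.

```python
def parse_flow_file(text: str):
    """Return (total_flow_values, {read_name: (usable_flows, [intensities])})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: total number of flow values (for example 800 or 400).
    flow_values = int(lines[0])
    reads = {}
    for ln in lines[1:]:
        fields = ln.split()
        name, usable = fields[0], int(fields[1])
        reads[name] = (usable, [float(v) for v in fields[2:]])
    return flow_values, reads


# Shortened, made-up sample with 4 flow values instead of 800.
total, reads = parse_flow_file("4\nREAD1 3 1.04 0.00 1.00 0.02\n")
print(total, reads["READ1"])
```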
-
set_meta
(dataset, overwrite=True, skip=1, max_data_lines=None, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'flow_order': <galaxy.model.metadata.MetadataElementSpec object>, 'flow_values': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.msa module¶
-
class
galaxy.datatypes.msa.
InfernalCM
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'cm'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'infernal_model.cm' )
>>> InfernalCM().sniff( fname )
True
>>> fname = get_test_fname( '2.txt' )
>>> InfernalCM().sniff( fname )
False
-
metadata_spec
= {'cm_version': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.msa.
Hmmer
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
edam_data
= 'data_1364'¶
-
edam_format
= 'format_1370'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.msa.
Hmmer2
(**kwd)[source]¶ Bases:
galaxy.datatypes.msa.Hmmer
-
edam_format
= 'format_3328'¶
-
file_ext
= 'hmm2'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.msa.
Hmmer3
(**kwd)[source]¶ Bases:
galaxy.datatypes.msa.Hmmer
-
edam_format
= 'format_3329'¶
-
file_ext
= 'hmm3'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.msa.
HmmerPress
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for hmmpress database files.
-
file_ext
= 'hmmpress'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.msa.
Stockholm_1_0
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
edam_data
= 'data_0863'¶
-
edam_format
= 'format_1961'¶
-
file_ext
= 'stockholm'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split the input files by model records.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.msa.
MauveXmfa
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'xmfa'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_models': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.msa.
Msf
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Multiple sequence alignment format produced by the Accelrys GCG suite and other programs.
-
edam_data
= 'data_0863'¶
-
edam_format
= 'format_1947'¶
-
file_ext
= 'msf'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.neo4j module¶
Neo4j Composite Dataset
-
class
galaxy.datatypes.neo4j.
Neo4j
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Html
Base class for neostore datatypes, derived from Html. Composite datatype elements are stored in the extra files path.
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.neo4j.
Neo4jDB
(**kwd)[source]¶ Bases:
galaxy.datatypes.neo4j.Neo4j
,galaxy.datatypes.data.Data
Class for neo4jDB database files.
-
file_ext
= 'neostore'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.neo4j.
Neo4jDBzip
(**kwd)[source]¶ Bases:
galaxy.datatypes.neo4j.Neo4j
,galaxy.datatypes.data.Data
Class for neo4jDB database files.
-
file_ext
= 'neostore.zip'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'neostore_zip': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_name': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.ngsindex module¶
NGS indexes
-
class
galaxy.datatypes.ngsindex.
BowtieIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
Base class for Bowtie indexes; subclassed by BowtieColorIndex and BowtieBaseIndex.
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the HTML file. The datasets cannot be renamed here; unfortunately, they come with the default names.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.ngsindex.
BowtieColorIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.ngsindex.BowtieIndex
Bowtie color space index
-
file_ext
= 'bowtie_color_index'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.ngsindex.
BowtieBaseIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.ngsindex.BowtieIndex
Bowtie base space index
-
file_ext
= 'bowtie_base_index'¶
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequence_space': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.phylip module¶
Created on January 05, 2018
@authors: Kenzo-Hugo Hillion and Fabien Mareuil, Institut Pasteur, Paris
@contacts: kehillio@pasteur.fr and fabien.mareuil@pasteur.fr
@project: galaxy
@githuborganization: C3BI
Phylip datatype sniffer
-
class
galaxy.datatypes.phylip.
Phylip
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Phylip format stores a multiple sequence alignment
-
edam_data
= 'data_0863'¶
-
edam_format
= 'format_1997'¶
-
file_ext
= 'phylip'¶ Add metadata elements
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
sniff_prefix
(file_prefix)[source]¶ All Phylip files start with the number of sequences, so we can use this to count the following number of sequences in the first ‘stack’.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_strict_interleaved.phylip')
>>> Phylip().sniff(fname)
True
>>> fname = get_test_fname('test_relaxed_interleaved.phylip')
>>> Phylip().sniff(fname)
True
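The idea in the docstring, reading the declared sequence count and comparing it with the number of lines in the first block, can be sketched like this. It is not Galaxy's code: `looks_like_phylip` is a hypothetical helper, and it assumes an interleaved file whose first line holds two integers (sequence count and alignment length) and whose first block ends at a blank line or at end of file.

```python
def looks_like_phylip(text: str) -> bool:
    """Hypothetical check: header count must equal the rows in the first block."""
    lines = text.splitlines()
    if not lines:
        return False
    header = lines[0].split()
    # Assumption: the header is '<number of sequences> <alignment length>'.
    if len(header) != 2 or not all(h.isdigit() for h in header):
        return False
    n_seqs = int(header[0])
    first_block = []
    for ln in lines[1:]:
        if not ln.strip():
            break  # a blank line closes the first 'stack' of sequences
        first_block.append(ln)
    return n_seqs > 0 and len(first_block) == n_seqs


sample = "2 10\nseqA      ACGTACGTAC\nseqB      ACGTACGTAA\n"
print(looks_like_phylip(sample))
```

A sequential (non-interleaved) file whose sequences wrap over several lines would need a different count, so this sketch covers only the simple case.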
-
galaxy.datatypes.plant_tribes module¶
-
class
galaxy.datatypes.plant_tribes.
Smat
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'smat'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ The use of ESTScan implies the creation of score matrices that reflect the codon preferences in the studied organisms. The ESTScan package includes scripts for generating these files. The output of these scripts consists of the matrices, one for each isochore, which look like this:
FORMAT: hse_4is.conf CODING REGION 6 3 1 s C+G: 0 44
-1 0 2 -2
2 1 -8 0
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_space.txt')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('test_tab.bed')
>>> Smat().sniff(fname)
False
>>> fname = get_test_fname('1.smat')
>>> Smat().sniff(fname)
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.plant_tribes.
PlantTribesKsComponents
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'ptkscmp'¶
-
set_meta
(dataset, **kwd)[source]¶ Set the number of significant components in the Ks distribution. The dataset is always small, typically fewer than 10 lines.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff
(filename)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test_tab.bed')
>>> PlantTribesKsComponents().sniff(fname)
False
>>> fname = get_test_fname('1.ptkscmp')
>>> PlantTribesKsComponents().sniff(fname)
True
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'number_comp': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.proteomics module¶
Proteomics Datatypes
-
class
galaxy.datatypes.proteomics.
Wiff
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for wiff files.
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_3710'¶
-
file_ext
= 'wiff'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzTab
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
exchange format for proteomics and metabolomics results
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mztab')
>>> MzTab().sniff(fname)
True
>>> fname = get_test_fname('test.mztab2')
>>> MzTab().sniff(fname)
False
-
edam_data
= 'data_3681'¶
-
file_ext
= 'mztab'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
MzTab2
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.MzTab
exchange format for proteomics and metabolomics results
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.mztab2')
>>> MzTab2().sniff(fname)
True
>>> fname = get_test_fname('test.mztab')
>>> MzTab2().sniff(fname)
False
-
file_ext
= 'mztab2'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Kroenik
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Kroenik (HardKloer sibling) files
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.kroenik')
>>> Kroenik().sniff(fname)
True
>>> fname = get_test_fname('test.peplist')
>>> Kroenik().sniff(fname)
False
-
file_ext
= 'kroenik'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
PepList
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Peplist file as used in OpenMS https://github.com/OpenMS/OpenMS/blob/0fc8765670a0ad625c883f328de60f738f7325a4/src/openms/source/FORMAT/FileHandler.cpp#L432
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.peplist')
>>> PepList().sniff(fname)
True
>>> fname = get_test_fname('test.psms')
>>> PepList().sniff(fname)
False
-
file_ext
= 'peplist'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
PSMS
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Percolator tab-delimited output (PSM level, .psms) as used in OpenMS https://github.com/OpenMS/OpenMS/blob/0fc8765670a0ad625c883f328de60f738f7325a4/src/openms/source/FORMAT/FileHandler.cpp#L453 see also http://www.kojak-ms.org/docs/percresults.html
Note that the data rows can have more columns than the header line since ProteinIds are listed tab-separated.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.psms')
>>> PSMS().sniff(fname)
True
>>> fname = get_test_fname('test.kroenik')
>>> PSMS().sniff(fname)
False
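Because data rows can be wider than the header, a plain "one field per header column" reader would drop or misalign the trailing protein IDs. One way to handle this, sketched below, is to fold every field from the last header column onward into a list. `parse_psms` is a hypothetical helper, and the header names in the sample are an assumption about typical Percolator output, not taken from this page.

```python
def parse_psms(text: str) -> list:
    """Parse PSM rows, collecting all trailing fields under the last header column."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    n_fixed = len(header) - 1  # every column before the protein ID column
    records = []
    for ln in lines[1:]:
        if not ln:
            continue
        fields = ln.split("\t")
        record = dict(zip(header[:n_fixed], fields[:n_fixed]))
        # The remaining tab-separated fields are all protein IDs.
        record[header[-1]] = fields[n_fixed:]
        records.append(record)
    return records


sample = (
    "PSMId\tscore\tq-value\tposterior_error_prob\tpeptide\tproteinIds\n"
    "psm1\t1.5\t0.01\t0.001\tK.PEPTIDE.R\tprotA\tprotB\n"
)
print(parse_psms(sample)[0]["proteinIds"])
```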
-
file_ext
= 'psms'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
PEFF
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
PSI Extended FASTA Format https://github.com/HUPO-PSI/PEFF
-
file_ext
= 'peff'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'test.peff' )
>>> PEFF().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> PEFF().sniff( fname )
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
PepXmlReport
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
pepxml converted to tabular report
-
edam_data
= 'data_2536'¶
-
file_ext
= 'pepxml.tsv'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ProtXmlReport
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
protxml converted to tabular report
-
edam_data
= 'data_2536'¶
-
file_ext
= 'protxml.tsv'¶
-
comment_lines
= 1¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Dta
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
dta The first line contains the singly protonated peptide mass (MH+) and the peptide charge state separated by a space. Subsequent lines contain space separated pairs of fragment ion m/z and intensity values.
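The layout described above maps directly onto a short reader. This is an illustrative sketch only: `parse_dta` is a hypothetical helper and the numbers in the sample are made up.

```python
def parse_dta(text: str):
    """Return (MH+ mass, charge, [(m/z, intensity), ...]) from dta text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: singly protonated peptide mass and charge state.
    mh_plus, charge = lines[0].split()
    # Remaining lines: space-separated fragment ion m/z and intensity pairs.
    peaks = [tuple(float(v) for v in ln.split()) for ln in lines[1:]]
    return float(mh_plus), int(charge), peaks


# Made-up values, for illustration only.
mass, charge, peaks = parse_dta("1479.75 2\n175.119 1204.0\n262.151 850.5\n")
print(mass, charge, len(peaks))
```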
-
file_ext
= 'dta'¶
-
comment_lines
= 0¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Dta2d
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
dta2d: files with three tab- or space-separated columns. The default format is: retention time (seconds), m/z, intensity. If the first line starts with ‘#’, a different order is defined by the order of the keywords ‘MIN’ (retention time in minutes) or ‘SEC’ (retention time in seconds), ‘MZ’, and ‘INT’. Example: ‘#MZ MIN INT’. The peaks of one retention time have to be on consecutive lines.
Note: the sniffer detects (tab- or space-separated) dta2d files with a correct header; files without a header seem too generic to detect.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.dta2d')
>>> Dta2d().sniff(fname)
True
>>> fname = get_test_fname('test.edta')
>>> Dta2d().sniff(fname)
False
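The header rule can be illustrated with a small reader that takes the column order from a `#` line and normalises retention time to seconds. This is a sketch, not the datatype's implementation: the function names and the `DEFAULT_ORDER` constant are hypothetical, and converting `MIN` to `SEC` is a convenience added for the example.

```python
DEFAULT_ORDER = ["SEC", "MZ", "INT"]  # retention time (seconds), m/z, intensity


def dta2d_column_order(first_line: str) -> list:
    """Return the column keywords, read from a '#' header if one is present."""
    if not first_line.startswith("#"):
        return list(DEFAULT_ORDER)
    return [keyword.upper() for keyword in first_line[1:].split()]


def parse_dta2d(text: str) -> list:
    """Parse rows into dicts keyed by SEC, MZ and INT."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    order = dta2d_column_order(lines[0])
    data = lines[1:] if lines[0].startswith("#") else lines
    rows = []
    for ln in data:
        record = dict(zip(order, (float(v) for v in ln.split())))
        if "MIN" in record:
            # Added for the example: express retention time in seconds.
            record["SEC"] = record.pop("MIN") * 60.0
        rows.append(record)
    return rows


print(parse_dta2d("#MZ MIN INT\n500.25 2.0 1000.0\n"))
```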
-
file_ext
= 'dta2d'¶
-
comment_lines
= 0¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
Edta
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
Input text file containing tab, space or comma separated columns. The separator between columns is checked in the first line in this order.
It supports three variants of this format.
1. Columns are: RT, MZ, Intensity. A header is optional.
2. Columns are: RT, MZ, Intensity, Charge, <Meta-Data> columns{0,}. A header is mandatory.
3. Columns are: (RT, MZ, Intensity, Charge){1,}, <Meta-Data> columns{0,}. A header is mandatory. The first quadruplet is the consensus. All following quadruplets describe the sub-features. This variant is discerned from variant #2 by the name of the fifth column, which is required to be RT1 (or rt1). All other column names for sub-features are ignored.
Note the sniffer only detects files with header.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('test.edta')
>>> Edta().sniff(fname)
True
>>> fname = get_test_fname('test.dta2d')
>>> Edta().sniff(fname)
False
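The separator order (tab, then space, then comma) and the rule for telling the variants apart can be sketched as below. Both functions are hypothetical helpers written for illustration; they follow the prose description only, and the real sniffer may apply additional checks.

```python
def detect_separator(first_line: str):
    """Return the first separator found, checking tab, space, comma in that order."""
    for sep in ("\t", " ", ","):
        if sep in first_line:
            return sep
    return None


def edta_variant(header_line: str):
    """Guess the variant (1, 2 or 3) from a header line; None if it fits none."""
    sep = detect_separator(header_line)
    line = header_line.strip()
    cols = line.split(sep) if sep else [line]
    if len(cols) == 3:
        return 1
    # The fifth column name distinguishes variant 3 from variant 2.
    if len(cols) >= 5 and cols[4].lower() == "rt1":
        return 3
    if len(cols) >= 4:
        return 2
    return None


print(edta_variant("RT,MZ,Intensity"))
print(edta_variant("RT\tMZ\tIntensity\tCharge\tRT1\tMZ1\tIntensity1\tCharge1"))
```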
-
file_ext
= 'edta'¶
-
comment_lines
= 0¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
class
galaxy.datatypes.proteomics.
ProteomicsXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
An enhanced XML datatype used to reuse code across several proteomic/mass-spec datatypes.
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_2032'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ParamXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
Stores parameters in XML format
-
file_ext
= 'paramxml'¶
-
blurb
= 'parameters in xmls'¶
-
root
= 'parameters|PARAMETERS'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
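The ProteomicsXml subclasses differ mainly in their `root` value, and values such as 'parameters|PARAMETERS' or '(mzML|indexedmzML)' read like regular-expression alternations. As a rough sketch of how such a value could be matched against the first element of a document (an assumption made for illustration, not a description of Galaxy's sniffer; `root_matches` is a hypothetical name):

```python
import re


def root_matches(root_pattern, text):
    """Return True if the first element tag in `text` matches `root_pattern`.

    Hypothetical helper: `root_pattern` is treated as a regular expression.
    """
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and XML declarations / processing instructions.
        if not line or line.startswith("<?"):
            continue
        # Allow an optional namespace prefix, e.g. <ns:mzML ...>.
        pattern = r"<(\w+:)?(?:%s)\b" % root_pattern
        return re.match(pattern, line) is not None
    return False
```

Wrapping the value in a non-capturing group keeps an alternation such as 'parameters|PARAMETERS' from splitting the surrounding pattern in two.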
-
-
class
galaxy.datatypes.proteomics.
PepXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
pepXML data
-
edam_format
= 'format_3655'¶
-
file_ext
= 'pepxml'¶
-
blurb
= 'pepXML data'¶
-
root
= 'msms_pipeline_analysis'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MascotXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
Mascot XML data
-
file_ext
= 'mascotxml'¶
-
blurb
= 'mascot Mass Spectrometry data'¶
-
root
= 'mascot_search_results'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
mzML data
-
edam_format
= 'format_3244'¶
-
file_ext
= 'mzml'¶
-
blurb
= 'mzML Mass Spectrometry data'¶
-
root
= '(mzML|indexedmzML)'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
NmrML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
nmrML data
-
file_ext
= 'nmrml'¶
-
blurb
= 'nmrML NMR data'¶
-
root
= 'nmrML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ProtXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
protXML data
-
file_ext
= 'protxml'¶
-
blurb
= 'prot XML Search Results'¶
-
root
= 'protein_summary'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
mzXML data
-
edam_format
= 'format_3654'¶
-
file_ext
= 'mzxml'¶
-
blurb
= 'mzXML Mass Spectrometry data'¶
-
root
= 'mzXML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzData
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
mzData data
-
edam_format
= 'format_3245'¶
-
file_ext
= 'mzdata'¶
-
blurb
= 'mzData Mass Spectrometry data'¶
-
root
= 'mzData'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzIdentML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
edam_format
= 'format_3247'¶
-
file_ext
= 'mzid'¶
-
blurb
= 'XML identified peptides and proteins.'¶
-
root
= 'MzIdentML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
TraML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
edam_format
= 'format_3246'¶
-
file_ext
= 'traml'¶
-
blurb
= 'TraML transition list'¶
-
root
= 'TraML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
TrafoXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'trafoxml'¶
-
blurb
= 'RT alignment tranformation'¶
-
root
= 'TrafoXML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MzQuantML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
edam_format
= 'format_3248'¶
-
file_ext
= 'mzq'¶
-
blurb
= 'XML quantification data'¶
-
root
= 'MzQuantML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ConsensusXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'consensusxml'¶
-
blurb
= 'OpenMS multiple LC-MS map alignment file'¶
-
root
= 'consensusXML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
FeatureXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'featurexml'¶
-
blurb
= 'OpenMS feature file'¶
-
root
= 'featureMap'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
IdXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'idxml'¶
-
blurb
= 'OpenMS identification file'¶
-
root
= 'IdXML'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
TandemXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
edam_format
= 'format_3711'¶
-
file_ext
= 'tandem'¶
-
blurb
= 'X!Tandem search results file'¶
-
root
= 'bioml'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
UniProtXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'uniprotxml'¶
-
blurb
= 'UniProt Proteome file'¶
-
root
= 'uniprot'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
XquestXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
-
file_ext
= 'xquest.xml'¶
-
blurb
= 'XQuest XML file'¶
-
root
= 'xquest_results'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
XquestSpecXML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
spec.xml
-
file_ext
= 'spec.xml'¶
-
blurb
= 'xquest_spectra'¶
-
root
= 'xquest_spectra'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
QCML
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.ProteomicsXml
qcml
https://github.com/OpenMS/OpenMS/blob/113c49d01677f7f03343ce7cd542d83c99b351ee/share/OpenMS/SCHEMAS/mzQCML_0_0_5.xsd
https://github.com/OpenMS/OpenMS/blob/3cfc57ad1788e7ab2bd6dd9862818b2855234c3f/share/OpenMS/SCHEMAS/qcML_0.0.7.xsd
-
file_ext
= 'qcml'¶
-
blurb
= 'QualityAssessments to runs'¶
-
root
= 'qcML|MzQualityML)'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Mgf
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Mascot Generic Format data
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_3651'¶
-
file_ext
= 'mgf'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
MascotDat
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Mascot search results
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_3713'¶
-
file_ext
= 'mascotdat'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ThermoRAW
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a Thermo Finnigan binary RAW file
-
edam_data
= 'data_2536'¶
-
edam_format
= 'format_3712'¶
-
file_ext
= 'thermo.raw'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Msp
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Output of NIST MS Search Program chemdata.nist.gov/mass-spc/ftp/mass-spc/PepLib.pdf
-
file_ext
= 'msp'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
SPLibNoIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
SPlib without index file
-
file_ext
= 'splib_noindex'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
SPLib
(**kwd)[source]¶ Bases:
galaxy.datatypes.proteomics.Msp
SpectraST Spectral Library. Closely related to msp format
-
file_ext
= 'splib'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
Ms2
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'ms2'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.proteomics.
XHunterAslFormat
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Annotated Spectra in the HLF format http://www.thegpm.org/HUNTER/format_2006_09_15.html
-
file_ext
= 'hlf'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
Sf3
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a Scaffold SF3 file
-
file_ext
= 'sf3'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.proteomics.
ImzML
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for imzML files. http://www.imzml.org
-
edam_format
= 'format_3682'¶
-
file_ext
= 'imzml'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.qualityscore module¶
Qualityscore class
-
class
galaxy.datatypes.qualityscore.
QualityScore
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
until we know more about quality score formats
-
edam_data
= 'data_2048'¶
-
edam_format
= 'format_3606'¶
-
file_ext
= 'qual'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreSOLiD
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
edam_format
= 'format_3610'¶
-
file_ext
= 'qualsolid'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScoreSOLiD().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qualsolid' )
>>> QualityScoreSOLiD().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScore454
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
edam_format
= 'format_3611'¶
-
file_ext
= 'qual454'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> QualityScore454().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.qual454' )
>>> QualityScore454().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreSolexa
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
edam_format
= 'format_3608'¶
-
file_ext
= 'qualsolexa'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreIllumina
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
edam_format
= 'format_3609'¶
-
file_ext
= 'qualillumina'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.registry module¶
Provides mapping between extensions and datatypes, mime-types, etc.
-
class
galaxy.datatypes.registry.
Registry
(config=None)[source]¶ Bases:
object
-
load_datatypes
(root_dir=None, config=None, deactivate=False, override=True, use_converters=True, use_display_applications=True, use_build_sites=True)[source]¶ Parse a datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom datatypes is being deactivated or uninstalled, so appropriate loaded datatypes will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting data types.
-
load_datatype_sniffers
(root, deactivate=False, handling_proprietary_datatypes=False, override=False, compressed_sniffers=None)[source]¶ Process the sniffers element from a parsed datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom sniffers is being deactivated or uninstalled, so appropriate loaded sniffers will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting sniffers.
-
get_datatype_class_by_name
(name)[source]¶ Return the datatype class where the datatype’s type attribute (as defined in the datatypes_conf.xml file) contains name.
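As a rough illustration of a lookup by the type attribute, the sketch below parses a small datatypes XML fragment and returns the entries whose `type` attribute contains a given name. The sample fragment, its `module:Class` form of the `type` attribute, and the helper name `find_types_containing` are assumptions made for this example; it does not import or reproduce Galaxy's `Registry`.

```python
import xml.etree.ElementTree as ET

# Made-up sample configuration, for illustration only.
SAMPLE = """
<datatypes>
  <registration>
    <datatype extension="fasta" type="galaxy.datatypes.sequence:Fasta"/>
    <datatype extension="fastq" type="galaxy.datatypes.sequence:Fastq"/>
    <datatype extension="mzml" type="galaxy.datatypes.proteomics:MzML"/>
  </registration>
</datatypes>
"""


def find_types_containing(xml_text, name):
    """Map extension -> type for every datatype whose type contains `name`."""
    root = ET.fromstring(xml_text.strip())
    return {
        elem.get("extension"): elem.get("type")
        for elem in root.iter("datatype")
        if name in (elem.get("type") or "")
    }
```

A substring match can return more than one entry (for example, searching for "sequence" above), so a caller would still need a rule for picking one.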
-
get_mimetype_by_extension
(ext, default='application/octet-stream')[source]¶ Returns a mimetype based on an extension
-
load_datatype_converters
(toolbox, installed_repository_dict=None, deactivate=False, use_cached=False)[source]¶ If deactivate is False, add datatype converters from self.converters or self.proprietary_converters to the calling app’s toolbox. If deactivate is True, eliminates relevant converters from the calling app’s toolbox.
-
load_display_applications
(app, installed_repository_dict=None, deactivate=False)[source]¶ If deactivate is False, add display applications from self.display_app_containers or self.proprietary_display_app_containers to appropriate datatypes. If deactivate is True, eliminates relevant display applications from appropriate datatypes.
-
reload_display_applications
(display_application_ids=None)[source]¶ Reloads display applications by id, or all of them if no ids are provided. Returns tuple( [reloaded_ids], [failed_ids] ).
-
get_converter_by_target_type
(source_ext, target_ext)[source]¶ Returns a converter based on source and target datatypes
-
find_conversion_destination_for_dataset_by_extensions
(dataset_or_ext, accepted_formats, converter_safe=True)[source]¶ Returns (direct_match, converted_ext, converted_dataset). direct_match is True iff the dataset already has an accepted format. converted_ext is None if conversion is not possible (or not necessary).
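The shape of that return value can be illustrated with a simplified, extension-only sketch. Everything here is an assumption for illustration: `converters` is a hypothetical mapping from a source extension to the target extensions it can be converted to, formats are compared by plain string equality, and the dataset element of the tuple is left out.

```python
def find_conversion_destination(ext, accepted_formats, converters):
    """Return (direct_match, converted_ext) for a source extension.

    `converters` is a hypothetical dict: {source_ext: [target_ext, ...]}.
    """
    if ext in accepted_formats:
        # Already acceptable: no conversion is necessary.
        return True, None
    for target_ext in converters.get(ext, []):
        if target_ext in accepted_formats:
            return False, target_ext
    # No direct match and no usable converter.
    return False, None
```

For example, with `converters = {"gff": ["bed", "interval_index"]}`, a `gff` input offered to a tool that accepts `["bed"]` gives `(False, "bed")`.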
-
get_upload_metadata_params
(context, group, tool)[source]¶ Returns dict of case value:inputs for metadata conditional for upload tool
-
property
edam_formats
¶
-
property
edam_data
¶
-
galaxy.datatypes.sequence module¶
Sequence classes
-
class
galaxy.datatypes.sequence.
SequenceSplitLocations
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class storing information about a sequence file composed of multiple gzip files concatenated as one OR an uncompressed file. In the GZIP case, each sub-file’s location is stored in start and end.
The format of the file is JSON:
{ "sections" : [ { "start" : "x", "end" : "y", "sequences" : "z" }, ... ]}
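A short sketch of reading such a table of contents follows. The sample text and the helper name `summarize_toc` are made up for illustration; values are passed through `int()` so that both quoted and unquoted numbers are handled.

```python
import json

# Made-up sample content in the JSON layout shown above.
TOC_TEXT = (
    '{"sections": ['
    '{"start": "0", "end": "74", "sequences": "10"}, '
    '{"start": "74", "end": "148", "sequences": "10"}]}'
)


def summarize_toc(text):
    """Return (total number of sequences, list of (start, end) byte ranges)."""
    sections = json.loads(text)["sections"]
    total_sequences = sum(int(s["sequences"]) for s in sections)
    byte_ranges = [(int(s["start"]), int(s["end"])) for s in sections]
    return total_sequences, byte_ranges
```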
-
file_ext
= 'fqtoc'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
Sequence
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a sequence
-
edam_data
= 'data_2044'¶ Add metadata elements
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
classmethod
do_fast_split
(input_datasets, toc_file_datasets, subdir_generator_function, split_params)[source]¶
-
classmethod
write_split_files
(input_datasets, toc_file_datasets, subdir_generator_function, sequences_per_file)[source]¶
-
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split a generic sequence file (not sensible or possible, see subclasses).
-
static
get_split_commands_with_toc
(input_name, output_name, toc_file, start_sequence, sequence_count)[source]¶ Uses a Table of Contents dict, parsed from an FQTOC file, to come up with a set of shell commands that will extract the parts necessary
>>> three_sections=[dict(start=0, end=74, sequences=10), dict(start=74, end=148, sequences=10), dict(start=148, end=148+76, sequences=10)]
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=10)
['dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=1, sequence_count=5)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +5 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=20)
['dd bs=1 skip=0 count=148 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=10)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', '(dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=10, sequence_count=10)
['dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=20)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', 'dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz', '(dd bs=1 skip=148 count=76 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
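The doctests for this method suggest that a section fully covered by the requested range is copied byte for byte with `dd`, while a partially covered section is decompressed and trimmed with `tail` and `head`. The selection step can be sketched as follows. This is an illustrative reading of the doctests, not Galaxy's code: `plan_sections` is a hypothetical name, it builds no shell commands, and it does not merge adjacent whole sections into a single `dd` call as the doctest output does.

```python
def plan_sections(sections, start_sequence, sequence_count):
    """List the TOC sections that overlap the requested sequence range.

    Each entry records the section's byte range, how many sequences to
    skip and take inside it, and whether the section is used whole.
    """
    end_sequence = start_sequence + sequence_count
    plan = []
    first_in_section = 0  # index of the first sequence held by this section
    for section in sections:
        last_in_section = first_in_section + section["sequences"]
        lo = max(start_sequence, first_in_section)
        hi = min(end_sequence, last_in_section)
        if lo < hi:  # the section overlaps the requested range
            plan.append({
                "start": section["start"],
                "end": section["end"],
                "skip": lo - first_in_section,
                "take": hi - lo,
                "whole": hi - lo == section["sequences"],
            })
        first_in_section = last_in_section
    return plan
```

With three sections of 10 sequences each, `start_sequence=5` and `sequence_count=20` yield a partial first section, a whole second section, and a partial third section, matching the mix of commands in the last doctest.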
-
static
get_split_commands_sequential
(is_compressed, input_name, output_name, start_sequence, sequence_count)[source]¶ Does a brain-dead sequential scan & extract of certain sequences
>>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
>>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
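The numbers passed to `tail` and `head` in these doctests follow from four lines per FASTQ record. A minimal sketch of that arithmetic, using the hypothetical helper name `fastq_line_window`:

```python
def fastq_line_window(start_sequence, sequence_count, lines_per_record=4):
    """Return (N for `tail -n +N`, N for `head -N`) for a record range."""
    first_line = start_sequence * lines_per_record + 1  # tail is 1-based
    line_count = sequence_count * lines_per_record
    return first_line, line_count


def take_records(lines, start_sequence, sequence_count):
    """Apply the same window to an in-memory list of lines."""
    first_line, line_count = fastq_line_window(start_sequence, sequence_count)
    return lines[first_line - 1:first_line - 1 + line_count]
```

For `start_sequence=10` and `sequence_count=10` this gives `(41, 40)`, the same values that appear in the second doctest.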
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
Alignment
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an alignment
-
edam_data
= 'data_0863'¶ Add metadata elements
-
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split a generic alignment file (not sensible or possible, see subclasses).
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'species': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
Fasta
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing a FASTA sequence
-
edam_format
= 'format_1929'¶
-
file_ext
= 'fasta'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in fasta format
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (“>”) symbol in the first column. All lines should be shorter than 80 characters.
For complete details see http://www.ncbi.nlm.nih.gov/blast/fasta.shtml
Rules for sniffing as True:
We don’t care about line length (other than empty lines).
The first non-empty line must start with ‘>’ and the Very Next line.strip() must have sequence data and not be a header.
‘sequence data’ here is loosely defined as non-empty lines which do not start with ‘>’
This will cause Color Space FASTA (csfasta) to be detected as True (they are, after all, still FASTA files - they have a header line followed by sequence data)
Previously this method did some checking to determine if the sequence data had integers (presumably to differentiate between fasta and csfasta)
This should be done through sniff order, where csfasta (which currently has a null sniff function) is checked first (stricter definition), followed some time later by fasta
We will only check that the first purported sequence is correctly formatted.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Fasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Fasta().sniff( fname )
True
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split a FASTA file sequence by sequence.
Note that even if split_mode="number_of_parts", the actual number of sub-files produced may not match that requested by split_size.
If split_mode="to_size" then split_size is treated as the number of FASTA records to put in each sub-file (not size in bytes).
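To make the "to_size" behaviour concrete, here is a minimal in-memory sketch that groups FASTA records into chunks of a fixed number of records. It is an illustration only: `split_fasta_records` is a hypothetical name, and the real method works on datasets and sub-directories rather than strings.

```python
def split_fasta_records(text, records_per_file):
    """Split FASTA text into chunks holding `records_per_file` records each."""
    records = []
    for line in text.splitlines(keepends=True):
        # A '>' line starts a new record; so does the very first line.
        if line.startswith(">") or not records:
            records.append([])
        records[-1].append(line)
    chunks = []
    for i in range(0, len(records), records_per_file):
        group = records[i:i + records_per_file]
        chunks.append("".join("".join(record) for record in group))
    return chunks
```

The last chunk may hold fewer records than the others when the record count is not a multiple of `records_per_file`.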
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
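The sniffing rules listed under sniff_prefix for this class can be sketched in a few lines. This is an illustrative reading of those rules, not the Galaxy implementation; `looks_like_fasta` is a hypothetical name and it works on a string rather than a file prefix.

```python
def looks_like_fasta(text):
    """Apply the documented rules to the first purported sequence only."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue  # ignore empty lines before the first header
        if not stripped.startswith(">"):
            return False  # first non-empty line must be a header
        if index + 1 >= len(lines):
            return False  # a header with nothing after it
        # The very next line must hold sequence data, not another header.
        following = lines[index + 1].strip()
        return bool(following) and not following.startswith(">")
    return False
```

As the rules note, a color-space record such as `>2_15_85_F3` followed by `T2130...` also passes this check, which is why sniff order matters.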
-
-
class
galaxy.datatypes.sequence.
csFasta
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing the SOLID Color-Space sequence ( csfasta )
-
edam_format
= 'format_3589'¶
-
file_ext
= 'csfasta'¶
-
sniff_prefix
(file_prefix)[source]¶ Color-space sequence:
>2_15_85_F3
T213021013012303002332212012112221222112212222
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.fasta' )
>>> csFasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.csfasta' )
>>> csFasta().sniff( fname )
True
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
Fastg
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing a FASTG sequence
-
edam_format
= 'format_3823'¶
-
file_ext
= 'fastg'¶
-
sniff_prefix
(file_prefix)[source]¶ FASTG must begin with lines: #FASTG:begin; #FASTG:version=*.*; #FASTG:properties;
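A minimal sketch of checking those three leading lines is shown below. It rests on two assumptions that the text above does not settle: `*.*` is read as digits, a dot, and digits, and the properties line is only required to start with `#FASTG:properties`. The name `has_fastg_header` is hypothetical.

```python
import re

VERSION_LINE = re.compile(r"#FASTG:version=\d+\.\d+;$")


def has_fastg_header(text):
    """Return True if the first three lines look like a FASTG preamble."""
    lines = [line.strip() for line in text.splitlines()[:3]]
    if len(lines) < 3:
        return False
    return (
        lines[0] == "#FASTG:begin;"
        and VERSION_LINE.match(lines[1]) is not None
        and lines[2].startswith("#FASTG:properties")
    )
```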
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'properties': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>, 'version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
BaseFastq
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Base class for FastQ sequences
-
edam_format
= 'format_1930'¶
-
file_ext
= 'fastq'¶
-
bases_regexp
= re.compile('^[NGTAC 0123\\.]*$', re.IGNORECASE)¶
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset. FIXME: This does not properly handle line wrapping
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml
Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina. These differ in the representation of the quality scores.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('4.fastqsanger')
>>> FastqSanger().sniff(fname)
True
>>> fname = get_test_fname('3.fastq')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> fname = get_test_fname('2.fastq')
>>> Fastq().sniff(fname)
True
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastq')
>>> FastqSanger().sniff(fname)
False
>>> fname = get_test_fname('1.fastqcssanger')
>>> FastqSanger().sniff(fname)
False
>>> Fastq().sniff(fname)
True
>>> FastqCSSanger().sniff(fname)
True
-
display_data
(trans, dataset, preview=False, filename=None, to_ext=None, **kwd)[source]¶ Displays data in central pane if preview is True, else handles download.
Datatypes should be very careful if overriding this method, and this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ FASTQ files are split on cluster boundaries, in increments of 4 lines
-
static
process_split_file
(data)[source]¶ This is called in the context of an external process launched by a Task (possibly not on the Galaxy machine) to create the input files for the Task. The parameters: data - a dict containing the contents of the split file
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
Fastq
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.BaseFastq
Class representing a generic FASTQ sequence
-
edam_format
= 'format_1930'¶
-
file_ext
= 'fastq'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
FastqSanger
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Sanger variant )
-
edam_format
= 'format_1932'¶
-
file_ext
= 'fastqsanger'¶
-
bases_regexp
= re.compile('^[NGTAC]*$', re.IGNORECASE)¶
-
static
quality_check
(lines)[source]¶ Presuming lines are lines from a fastq file, return True if the qualities are compatible with sanger encoding
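A check of this kind can be sketched as a range test on the quality line of each four-line record. Sanger FASTQ encodes qualities with an ASCII offset of 33, so valid characters start at '!'. The bounds below are parameters because the exact range Galaxy accepts is not stated here; the defaults and the name `qualities_in_range` are assumptions for illustration.

```python
from itertools import islice


def qualities_in_range(lines, lowest="!", highest="~"):
    """Return True if every quality character lies in [lowest, highest].

    `lines` are the lines of a FASTQ file; the quality string is the
    fourth line of each four-line record.
    """
    for quality in islice(lines, 3, None, 4):
        if not all(lowest <= ch <= highest for ch in quality.rstrip("\r\n")):
            return False
    return True
```

Raising `lowest` to '@' (ASCII 64) turns the same helper into a rough check for offset-64 encodings, under which a quality string containing '!' is rejected.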
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
FastqSolexa
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Solexa variant )
-
edam_format
= 'format_1933'¶
-
file_ext
= 'fastqsolexa'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
FastqIllumina
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Illumina 1.3+ variant )
-
edam_format
= 'format_1931'¶
-
file_ext
= 'fastqillumina'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
FastqCSSanger
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a Color Space FASTQ sequence ( e.g. a SOLiD variant )
-
file_ext
= 'fastqcssanger'¶
-
bases_regexp
= re.compile('^[NGTAC][0123\\.]*$', re.IGNORECASE)¶
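The three bases_regexp values documented for BaseFastq, FastqSanger and FastqCSSanger differ only slightly, so it can help to run them side by side. The snippet below copies the patterns as they are shown on this page; the helper `classes_accepting` is a hypothetical name used only for the comparison.

```python
import re

# Patterns copied from the bases_regexp attributes listed on this page.
PATTERNS = {
    "BaseFastq": re.compile('^[NGTAC 0123\\.]*$', re.IGNORECASE),
    "FastqSanger": re.compile('^[NGTAC]*$', re.IGNORECASE),
    "FastqCSSanger": re.compile('^[NGTAC][0123\\.]*$', re.IGNORECASE),
}


def classes_accepting(bases):
    """Names of the classes whose bases_regexp matches the sequence line."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(bases)]
```

A nucleotide line such as `ACGTN` matches the generic and Sanger patterns, while a color-space line such as `T0123.210` matches the generic and color-space patterns. This is in line with the sniff doctests, where a `fastqcssanger` file is accepted by `Fastq` and `FastqCSSanger` but not by `FastqSanger`.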
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
Maf
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Alignment
Class describing a Maf alignment
-
edam_format
= 'format_3008'¶
-
file_ext
= 'maf'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Parses and sets species, chromosomes, index from MAF file.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in maf format
The .maf format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.
The first line of a .maf file begins with ##maf. This word is followed by white-space-separated variable=value pairs. There should be no white space surrounding the “=”.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format5
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Maf().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Maf().sniff( fname )
False
-
metadata_spec
= {'blocks': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'maf_index': <galaxy.model.metadata.MetadataElementSpec object>, 'species': <galaxy.model.metadata.MetadataElementSpec object>, 'species_chromosomes': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
MafCustomTrack
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mafcustomtrack'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_chromosome': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_end': <galaxy.model.metadata.MetadataElementSpec object>, 'vp_start': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
Axt
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an axt alignment
-
edam_data
= 'data_0863'¶
-
edam_format
= 'format_3013'¶
-
file_ext
= 'axt'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in axt format
axt alignment files are produced from Blastz, an alignment tool available from Webb Miller’s lab at Penn State University.
Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines.
The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.
The sequence lines contain the sequence of the primary assembly (line 2) and aligning assembly (line 3) with inserts. Repeats are indicated by lower-case letters.
For complete details see http://genome.ucsc.edu/goldenPath/help/axt.html
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.axt' )
>>> Axt().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.lav' )
>>> Axt().sniff( fname )
False
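As a rough illustration of the block layout described above (three lines per block, blank-line separators, at least 9 fields on the summary line), the following sketch splits text into blocks and checks their shape. The helper names are hypothetical, and the check is deliberately looser than whatever the real sniffer does: it does not validate the content of individual fields.

```python
def split_axt_blocks(text):
    """Hypothetical helper: group non-blank lines into blocks separated by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def is_valid_axt_block(block):
    """A block is one summary line plus two sequence lines."""
    if len(block) != 3:
        return False
    # Assumption: only the field count of the summary line is checked here.
    return len(block[0].split()) >= 9
```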
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
Lav
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a LAV alignment
-
edam_data
= 'data_0863'¶
-
edam_format
= 'format_3014'¶
-
file_ext
= 'lav'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in lav format
LAV is an alignment format developed by Webb Miller’s group. It is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.
For complete details see http://www.bioperl.org/wiki/LAV_alignment_format
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'alignment.lav' )
>>> Lav().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.axt' )
>>> Lav().sniff( fname )
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
RNADotPlotMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
-
edam_format
= 'format_3466'¶
-
file_ext
= 'rna_eps'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek and blurb text
- Parameters
is_multi_byte (bool) – deprecated
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.sequence.
DotBracket
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
-
edam_data
= 'data_0880'¶
-
edam_format
= 'format_1457'¶
-
file_ext
= 'dbn'¶
-
sequence_regexp
= re.compile('^[ACGTURYKMSWBDHVN]+$', re.IGNORECASE)¶
-
structure_regexp
= re.compile('^[\\(\\)\\.\\[\\]{}]+$')¶
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
sniff_prefix
(file_prefix)[source]¶ Galaxy Dbn (Dot-Bracket notation) rules:
The first non-empty line is a header line: no comment lines are allowed.
A header line starts with a ‘>’ symbol and continues with 0 or multiple symbols until the line ends.
The second non-empty line is a sequence line.
A sequence line may only include chars that match the FASTA format (https://en.wikipedia.org/wiki/FASTA_format#Sequence_representation) symbols for nucleotides: ACGTURYKMSWBDHVN, and may thus not include whitespaces.
A sequence line has no prefix and no suffix.
A sequence line is case insensitive.
The third non-empty line is a structure (Dot-Bracket) line and only describes the 2D structure of the sequence above it.
A structure line must consist of the following chars: ‘.{}[]()’.
A structure line must be of the same length as the sequence line, and each char represents the structure of the nucleotide above it.
A structure line has no prefix and no suffix.
A nucleotide pairs with only 1 or 0 other nucleotides.
In a structure line, the number of ‘(‘ symbols equals the number of ‘)’ symbols, the number of ‘[‘ symbols equals the number of ‘]’ symbols and the number of ‘{‘ symbols equals the number of ‘}’ symbols.
The format accepts multiple entries per file, given that each entry is provided as three lines: the header, sequence and structure line.
Sniffing is only applied on the first entry.
Empty lines are allowed.
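The rules above can be sketched as a small validator for the first entry. This is an illustrative reading of the listed rules, not the actual sniffer: the function name `first_entry_is_dbn` is hypothetical, the two regular expressions mirror `sequence_regexp` and `structure_regexp` shown above, and the bracket check only compares counts (it does not verify nesting).

```python
import re

SEQUENCE_RE = re.compile(r"^[ACGTURYKMSWBDHVN]+$", re.IGNORECASE)
STRUCTURE_RE = re.compile(r"^[\(\)\.\[\]{}]+$")


def first_entry_is_dbn(text):
    """Hypothetical helper: apply the Dbn rules to the first three non-empty lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        return False
    header, sequence, structure = lines[:3]
    if not header.startswith(">"):
        return False
    # No prefix or suffix is allowed, so the raw lines are matched unstripped.
    if not SEQUENCE_RE.match(sequence) or not STRUCTURE_RE.match(structure):
        return False
    if len(sequence) != len(structure):
        return False
    pairs = (("(", ")"), ("[", "]"), ("{", "}"))
    return all(structure.count(o) == structure.count(c) for o, c in pairs)
```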
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
Genbank
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class representing a Genbank sequence
-
edam_format
= 'format_1936'¶
-
edam_data
= 'data_0849'¶
-
file_ext
= 'genbank'¶
-
sniff_prefix
(file_prefix)[source]¶ Determine whether the file is in genbank format. Works for compressed files.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.genbank' )
>>> Genbank().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.sequence.
MemePsp
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing MEME Position Specific Priors
-
file_ext
= 'memepsp'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'sequences': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
sniff_prefix
(file_prefix)[source]¶ The format of an entry in a PSP file is:
>ID WIDTH PRIORS
For complete details see http://meme-suite.org/doc/psp-format.html
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.memepsp')
>>> MemePsp().sniff(fname)
True
>>> fname = get_test_fname('sequence.fasta')
>>> MemePsp().sniff(fname)
False
-
galaxy.datatypes.sniff module¶
File format detector
-
galaxy.datatypes.sniff.
stream_to_open_named_file
(stream, fd, filename, source_encoding=None, source_error='strict', target_encoding=None, target_error='strict')[source]¶ Writes a stream to the provided file descriptor, returns the file name. Closes file descriptor
-
galaxy.datatypes.sniff.
stream_to_file
(stream, suffix='', prefix='', dir=None, text=False, **kwd)[source]¶ Writes a stream to a temporary file, returns the temporary file’s name
-
galaxy.datatypes.sniff.
handle_composite_file
(datatype, src_path, extra_files, name, is_binary, tmp_dir, tmp_prefix, upload_opts)[source]¶
-
galaxy.datatypes.sniff.
convert_newlines
(fname, in_place=True, tmp_dir=None, tmp_prefix='gxupload', block_size=131072, regexp=None)[source]¶ Converts in place a file from universal line endings to Posix line endings.
-
galaxy.datatypes.sniff.
convert_newlines_sep2tabs
(fname, in_place=True, patt=b'[^\\S\\n]+', tmp_dir=None, tmp_prefix='gxupload')[source]¶ Converts newlines in a file to posix newlines and replaces spaces with tabs.
>>> fname = get_test_fname('temp.txt')
>>> with open(fname, 'wt') as fh:
...     _ = fh.write(u"1 2\r3 4")
>>> convert_newlines_sep2tabs(fname, tmp_prefix="gxtest", tmp_dir=tempfile.gettempdir())
(2, None)
>>> open(fname).read()
'1\t2\n3\t4\n'
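To make the effect of the default `patt` value concrete, here is a minimal in-memory sketch of the same transformation. It is not the library function: `newlines_sep2tabs` is a hypothetical name, it operates on a bytes object instead of a file, and details such as temporary-file handling and the returned tuple are left out.

```python
import re


def newlines_sep2tabs(data: bytes, patt: bytes = rb"[^\S\n]+") -> bytes:
    """Hypothetical helper: normalize line endings, then turn whitespace runs into tabs."""
    # Convert Windows (\r\n) and old Mac (\r) line endings to POSIX (\n).
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = data.split(b"\n")
    # A trailing newline yields one empty final element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    regexp = re.compile(patt)
    # Every output line is terminated with a single POSIX newline.
    return b"".join(regexp.sub(b"\t", line) + b"\n" for line in lines)
```

The pattern `[^\S\n]+` matches any run of whitespace that is not a newline, so spaces and existing tabs collapse to one tab while line breaks are preserved.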
-
galaxy.datatypes.sniff.
iter_headers
(fname_or_file_prefix, sep, count=60, comment_designator=None)[source]¶
-
galaxy.datatypes.sniff.
validate_tabular
(fname_or_file_prefix, validate_row, sep, comment_designator=None)[source]¶
-
galaxy.datatypes.sniff.
get_headers
(fname_or_file_prefix, sep, count=60, comment_designator=None)[source]¶ Returns a list with the first ‘count’ lines split by ‘sep’, ignoring lines starting with ‘comment_designator’
>>> fname = get_test_fname('complete.bed')
>>> get_headers(fname,'\t') == [['chr7', '127475281', '127491632', 'NM_000230', '0', '+', '127486022', '127488767', '0', '3', '29,172,3225,', '0,10713,13126,'], ['chr7', '127486011', '127488900', 'D49487', '0', '+', '127486022', '127488767', '0', '2', '155,490,', '0,2399']]
True
>>> fname = get_test_fname('test.gff')
>>> get_headers(fname, '\t', count=5, comment_designator='#') == [[''], ['chr7', 'bed2gff', 'AR', '26731313', '26731437', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731491', '26731536', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731541', '26731649', '.', '+', '.', 'score'], ['chr7', 'bed2gff', 'AR', '26731659', '26731841', '.', '+', '.', 'score']]
True
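The behaviour described above can be approximated on an in-memory list of lines. This sketch is an assumption-laden illustration: `headers_from_lines` is a hypothetical name, and it assumes that `count` limits the number of lines read before comment lines are filtered out, which the docstring does not spell out.

```python
def headers_from_lines(lines, sep, count=60, comment_designator=None):
    """Hypothetical helper: split the first `count` lines on `sep`, skipping comments."""
    headers = []
    for idx, line in enumerate(lines):
        # Assumption: `count` applies to lines read, not to lines kept.
        if idx == count:
            break
        line = line.rstrip("\r\n")
        if comment_designator and line.startswith(comment_designator):
            continue
        headers.append(line.split(sep))
    return headers
```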
-
galaxy.datatypes.sniff.
is_column_based
(fname_or_file_prefix, sep='\t', skip=0)[source]¶ Checks whether the file is column based with respect to a separator (defaults to tab separator).
>>> fname = get_test_fname('test.gff')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab.bed')
>>> is_column_based(fname)
True
>>> is_column_based(fname, sep=' ')
False
>>> fname = get_test_fname('test_space.txt')
>>> is_column_based(fname)
False
>>> is_column_based(fname, sep=' ')
True
>>> fname = get_test_fname('test_ensembl.tabular')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname, sep=' ', skip=0)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname)
True
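One plausible reading of "column based" is that every data line splits into the same number of fields, with at least two fields per line. The sketch below implements that reading on a list of lines; the name `is_column_based_lines` and the exact criteria (skipping blank lines and `#` comments, requiring two or more columns) are assumptions, not a description of the library's implementation.

```python
def is_column_based_lines(lines, sep="\t", skip=0):
    """Hypothetical helper: True if all data lines share the same column count (>= 2)."""
    rows = [
        line.rstrip("\r\n").split(sep)
        for line in lines[skip:]
        # Assumption: blank lines and '#' comment lines are ignored.
        if line.strip() and not line.startswith("#")
    ]
    if not rows:
        return False
    width = len(rows[0])
    if width < 2:
        return False
    return all(len(row) == width for row in rows)
```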
-
galaxy.datatypes.sniff.
guess_ext
(fname, sniff_order, is_binary=False)[source]¶ Returns an extension that can be used in the datatype factory to generate a data for the ‘fname’ file
>>> from galaxy.datatypes.registry import example_datatype_registry_for_sample
>>> datatypes_registry = example_datatype_registry_for_sample()
>>> sniff_order = datatypes_registry.sniff_order
>>> fname = get_test_fname('empty.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> guess_ext(fname, sniff_order)
'blastxml'
>>> fname = get_test_fname('interval.interval')
>>> guess_ext(fname, sniff_order)
'interval'
>>> fname = get_test_fname('interv1.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('test_tab.bed')
>>> guess_ext(fname, sniff_order)
'bed'
>>> fname = get_test_fname('sequence.maf')
>>> guess_ext(fname, sniff_order)
'maf'
>>> fname = get_test_fname('sequence.fasta')
>>> guess_ext(fname, sniff_order)
'fasta'
>>> fname = get_test_fname('1.genbank')
>>> guess_ext(fname, sniff_order)
'genbank'
>>> fname = get_test_fname('1.genbank.gz')
>>> guess_ext(fname, sniff_order)
'genbank.gz'
>>> fname = get_test_fname('file.html')
>>> guess_ext(fname, sniff_order)
'html'
>>> fname = get_test_fname('test.gtf')
>>> guess_ext(fname, sniff_order)
'gtf'
>>> fname = get_test_fname('test.gff')
>>> guess_ext(fname, sniff_order)
'gff'
>>> fname = get_test_fname('gff.gff3')
>>> guess_ext(fname, sniff_order)
'gff3'
>>> fname = get_test_fname('2.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('2.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('3.txt')
>>> guess_ext(fname, sniff_order)
'txt'
>>> fname = get_test_fname('test_tab1.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('alignment.lav')
>>> guess_ext(fname, sniff_order)
'lav'
>>> fname = get_test_fname('1.sff')
>>> guess_ext(fname, sniff_order)
'sff'
>>> fname = get_test_fname('1.bam')
>>> guess_ext(fname, sniff_order)
'bam'
>>> fname = get_test_fname('3unsorted.bam')
>>> guess_ext(fname, sniff_order)
'unsorted.bam'
>>> fname = get_test_fname('test.idpdb')
>>> guess_ext(fname, sniff_order)
'idpdb'
>>> fname = get_test_fname('test.mz5')
>>> guess_ext(fname, sniff_order)
'h5'
>>> fname = get_test_fname('issue1818.tabular')
>>> guess_ext(fname, sniff_order)
'tabular'
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> guess_ext(fname, sniff_order)
'cml'
>>> fname = get_test_fname('q.fps')
>>> guess_ext(fname, sniff_order)
'fps'
>>> fname = get_test_fname('drugbank_drugs.inchi')
>>> guess_ext(fname, sniff_order)
'inchi'
>>> fname = get_test_fname('drugbank_drugs.mol2')
>>> guess_ext(fname, sniff_order)
'mol2'
>>> fname = get_test_fname('drugbank_drugs.sdf')
>>> guess_ext(fname, sniff_order)
'sdf'
>>> fname = get_test_fname('5e5z.pdb')
>>> guess_ext(fname, sniff_order)
'pdb'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.otu')
>>> guess_ext(fname, sniff_order)
'mothur.otu'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.lower.dist')
>>> guess_ext(fname, sniff_order)
'mothur.lower.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.square.dist')
>>> guess_ext(fname, sniff_order)
'mothur.square.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.pair.dist')
>>> guess_ext(fname, sniff_order)
'mothur.pair.dist'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.freq')
>>> guess_ext(fname, sniff_order)
'mothur.freq'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.quan')
>>> guess_ext(fname, sniff_order)
'mothur.quan'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.ref.taxonomy')
>>> guess_ext(fname, sniff_order)
'mothur.ref.taxonomy'
>>> fname = get_test_fname('mothur_datatypetest_true.mothur.axes')
>>> guess_ext(fname, sniff_order)
'mothur.axes'
>>> guess_ext(get_test_fname('infernal_model.cm'), sniff_order)
'cm'
>>> fname = get_test_fname('1.gg')
>>> guess_ext(fname, sniff_order)
'gg'
>>> fname = get_test_fname('diamond_db.dmnd')
>>> guess_ext(fname, sniff_order)
'dmnd'
>>> fname = get_test_fname('1.excel.xls')
>>> guess_ext(fname, sniff_order, is_binary=True)
'excel.xls'
>>> fname = get_test_fname('biom2_sparse_otu_table_hdf5.biom2')
>>> guess_ext(fname, sniff_order)
'biom2'
>>> fname = get_test_fname('454Score.pdf')
>>> guess_ext(fname, sniff_order)
'pdf'
>>> fname = get_test_fname('1.obo')
>>> guess_ext(fname, sniff_order)
'obo'
>>> fname = get_test_fname('1.arff')
>>> guess_ext(fname, sniff_order)
'arff'
>>> fname = get_test_fname('1.afg')
>>> guess_ext(fname, sniff_order)
'afg'
>>> fname = get_test_fname('1.owl')
>>> guess_ext(fname, sniff_order)
'owl'
>>> fname = get_test_fname('Acanium.snaphmm')
>>> guess_ext(fname, sniff_order)
'snaphmm'
>>> fname = get_test_fname('wiggle.wig')
>>> guess_ext(fname, sniff_order)
'wig'
>>> fname = get_test_fname('example.iqtree')
>>> guess_ext(fname, sniff_order)
'iqtree'
>>> fname = get_test_fname('1.stockholm')
>>> guess_ext(fname, sniff_order)
'stockholm'
>>> fname = get_test_fname('1.xmfa')
>>> guess_ext(fname, sniff_order)
'xmfa'
>>> fname = get_test_fname('test.blib')
>>> guess_ext(fname, sniff_order)
'blib'
>>> fname = get_test_fname('test_strict_interleaved.phylip')
>>> guess_ext(fname, sniff_order)
'phylip'
>>> fname = get_test_fname('test_relaxed_interleaved.phylip')
>>> guess_ext(fname, sniff_order)
'phylip'
>>> fname = get_test_fname('1.smat')
>>> guess_ext(fname, sniff_order)
'smat'
>>> fname = get_test_fname('1.ttl')
>>> guess_ext(fname, sniff_order)
'ttl'
>>> fname = get_test_fname('1.hdt')
>>> guess_ext(fname, sniff_order, is_binary=True)
'hdt'
>>> fname = get_test_fname('1.phyloxml')
>>> guess_ext(fname, sniff_order)
'phyloxml'
>>> fname = get_test_fname('1.dzi')
>>> guess_ext(fname, sniff_order)
'dzi'
>>> fname = get_test_fname('1.tiff')
>>> guess_ext(fname, sniff_order)
'tiff'
>>> fname = get_test_fname('1.fastqsanger.gz')
>>> guess_ext(fname, sniff_order)  # See test_datatype_registry for more compressed type tests.
'fastqsanger.gz'
>>> fname = get_test_fname('1.mtx')
>>> guess_ext(fname, sniff_order)
'mtx'
>>> fname = get_test_fname('1imzml')
>>> guess_ext(fname, sniff_order)  # This test case is ensuring doesn't throw exception, actual value could change if non-utf encoding handling improves.
'data'
>>> fname = get_test_fname('too_many_comments_gff3.tabular')
>>> guess_ext(fname, sniff_order)  # It's a VCF but is sniffed as tabular because of the limit on the number of header lines we read
'tabular'
-
galaxy.datatypes.sniff.
run_sniffers_raw
(filename_or_file_prefix, sniff_order, is_binary=False)[source]¶ Run through the sniffers specified by sniff_order; return None if none match.
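The idea of an ordered sniffer list can be illustrated with a toy example: each datatype is tried in turn and the first one whose `sniff` returns a true value wins. Everything here is hypothetical (the `StartsWithSniffer` class, the `run_sniffers` function, and the use of an in-memory string instead of a file name), and the `is_binary` handling of the real function is not modelled.

```python
class StartsWithSniffer:
    """Toy datatype (hypothetical): matches text that starts with a fixed marker."""

    def __init__(self, file_ext, marker):
        self.file_ext = file_ext
        self.marker = marker

    def sniff(self, text):
        return text.startswith(self.marker)


def run_sniffers(text, sniff_order):
    """Return the extension of the first matching sniffer, or None if none match."""
    for datatype in sniff_order:
        if datatype.sniff(text):
            return datatype.file_ext
    return None
```

Because the first match wins, more specific sniffers need to come before more general ones in the order.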
-
galaxy.datatypes.sniff.
handle_compressed_file
(filename, datatypes_registry, ext='auto', tmp_prefix='sniff_uncompress_', tmp_dir=None, in_place=False, check_content=True, auto_decompress=True)[source]¶ Check uploaded files for compression, check compressed file contents, and uncompress if necessary.
Supports GZip, BZip2, and the first file in a Zip file.
For performance reasons, the temporary file used for uncompression is located in the same directory as the input/output file. This behavior can be changed with the tmp_dir param.
ext as returned will only be changed from the ext input param if the param was an autodetect type (auto) and the file was sniffed as a keep-compressed datatype.
is_valid as returned will only be set if the file is compressed and contains invalid contents (or the first file in the case of a zip file). This allows lengthy decompression to be bypassed if there is invalid content in the first 32KB. Otherwise the caller should be checking content.
-
galaxy.datatypes.sniff.
handle_uploaded_dataset_file
(*args, **kwds)[source]¶ Legacy wrapper around handle_uploaded_dataset_file_internal for tools using it.
-
galaxy.datatypes.sniff.
handle_uploaded_dataset_file_internal
(filename, datatypes_registry, ext='auto', tmp_prefix='sniff_upload_', tmp_dir=None, in_place=False, check_content=True, is_binary=None, auto_decompress=True, uploaded_file_ext=None, convert_to_posix_lines=None, convert_spaces_to_tabs=None)[source]¶
galaxy.datatypes.spaln module¶
spaln Composite Dataset
-
class
galaxy.datatypes.spaln.
SpalnNuclDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.spaln._SpalnDb
-
file_ext
= 'spalndbnp'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'spalndb_name': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.speech module¶
-
class
galaxy.datatypes.speech.
TextGrid
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Praat Textgrid file for speech annotations
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1_1119_2_22_001.textgrid')
>>> TextGrid().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> TextGrid().sniff(fname)
False
-
file_ext
= 'textgrid'¶
-
header
= 'File type = "ooTextFile"\nObject class = "TextGrid"\n'¶
-
blurb
= 'Praat TextGrid file'¶
-
metadata_spec
= {'annotations': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.speech.
BPF
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Munich BPF annotation format https://www.phonetik.uni-muenchen.de/Bas/BasFormatseng.html#Partitur
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1_1119_2_22_001.par')
>>> BPF().sniff(fname)
True
>>> fname = get_test_fname('1_1119_2_22_001-1.par')
>>> BPF().sniff(fname)
True
>>> fname = get_test_fname('drugbank_drugs.cml')
>>> BPF().sniff(fname)
False
-
file_ext
= 'par'¶
-
mandatory_headers
= ['LHD', 'REP', 'SNB', 'SAM', 'SBF', 'SSB', 'NCH', 'SPN', 'LBD']¶
-
optional_headers
= ['FIL', 'TYP', 'DBN', 'VOL', 'DIR', 'SRC', 'BEG', 'END', 'RED', 'RET', 'RCC', 'CMT', 'SPI', 'PCF', 'PCN', 'EXP', 'SYS', 'DAT', 'SPA', 'MAO', 'GPO', 'SAO']¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Set the metadata for this dataset from the file contents
-
metadata_spec
= {'annotations': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.tabular module¶
Tabular datatype
-
class
galaxy.datatypes.tabular.
TabularData
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Generic tabular data
-
edam_format
= 'format_3475'¶
-
CHUNKABLE
= True¶
-
data_line_offset
= 0¶ Add metadata elements
-
set_peek
(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=None, line_wrap=False, **kwd)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
display_data
(trans, dataset, preview=False, filename=None, to_ext=None, offset=None, ck_size=None, **kwd)[source]¶ Displays data in central pane if preview is True, else handles download.
Datatypes should be very careful if overriding this method, and this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
display_as_markdown
(dataset_instance, markdown_format_helpers)[source]¶ Prepare for embedding dataset into a basic Markdown document.
This is a somewhat experimental interface and should not be implemented on datatypes not tightly tied to a Galaxy version (e.g. datatypes in the Tool Shed).
Speaking very loosely, the datatype should load a bounded amount of data from the supplied dataset instance and prepare for embedding it into Markdown. This should be relatively vanilla Markdown: the result of this is bleached and it should not contain nested Galaxy Markdown directives.
If the data cannot reasonably be displayed, just indicate this and do not throw an exception.
-
make_html_peek_header
(dataset, skipchars=None, column_names=None, column_number_format='%s', column_parameter_alias=None, **kwargs)[source]¶
-
dataset_column_dataprovider
(dataset, **settings)[source]¶ Attempts to get column settings from dataset.metadata
-
dataset_dict_dataprovider
(dataset, **settings)[source]¶ Attempts to get column settings from dataset.metadata
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
Tabular
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
Tab delimited data
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=100000, max_guess_type_data_lines=None, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
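As a rough picture of what "determine the number of columns as well as those columns that contain numerical values" can mean, the sketch below guesses a type per column from tab-separated lines. It is a simplified stand-in, not the method's implementation: the function name, the three-type scale (`int`, `float`, `str`), the treatment of `#` lines as comments, and the way `max_data_lines` is applied are all assumptions. The one rule taken directly from the text is that a file with no data gets a single `str` column.

```python
def guess_column_types(lines, max_data_lines=None):
    """Hypothetical helper: return (column_count, column_types) for tab-separated lines."""

    def guess(value):
        for caster, name in ((int, "int"), (float, "float")):
            try:
                caster(value)
                return name
            except ValueError:
                pass
        return "str"

    # A column keeps the most general type seen in any of its cells.
    rank = {"int": 0, "float": 1, "str": 2}
    column_types = []
    data_lines = 0
    for line in lines:
        if max_data_lines is not None and data_lines >= max_data_lines:
            break
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no column data
        data_lines += 1
        for index, field in enumerate(line.split("\t")):
            guessed = guess(field)
            if index >= len(column_types):
                column_types.append(guessed)
            elif rank[guessed] > rank[column_types[index]]:
                column_types[index] = guessed
    if not column_types:
        column_types = ["str"]  # a file with no data has one 'str' column
    return len(column_types), column_types
```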
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
SraManifest
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
A manifest received from the sra_source tool.
-
ext
= 'sra_manifest.tabular'¶
-
data_line_offset
= 1¶
-
set_meta
(dataset, **kwds)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
Taxonomy
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
Sam
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
edam_format
= 'format_2573'¶
-
edam_data
= 'data_0863'¶
-
file_ext
= 'sam'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in SAM format
A file in SAM format consists of lines of tab-separated data. The following header line may be the first line:
@QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL
or
@QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT
Data in the OPT column is optional and can consist of tab-separated data.
For complete details see http://samtools.sourceforge.net/SAM1.pdf
Rules for sniffing as True:
There must be 11 or more columns of data on each line
Columns 2 (FLAG), 4 (POS), 5 (MAPQ), 8 (MPOS), and 9 (ISIZE) must be numbers (9 can be negative)
We will only check that up to the first 5 alignments are correctly formatted.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'sequence.maf' )
>>> Sam().sniff( fname )
False
>>> fname = get_test_fname( '1.sam' )
>>> Sam().sniff( fname )
True
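The sniffing rules listed above translate fairly directly into code. The sketch below applies them to an iterable of lines; `looks_like_sam` is a hypothetical name, header lines are assumed to start with `@`, and for brevity `int()` is used for all five numeric columns, which means negative values are tolerated everywhere rather than only in ISIZE.

```python
def looks_like_sam(lines, max_alignments=5):
    """Hypothetical helper: apply the SAM sniffing rules to the first few alignments."""
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):
            continue  # assumption: '@' lines are headers and are skipped
        fields = line.split("\t")
        if len(fields) < 11:
            return False
        try:
            # Zero-based indexes of FLAG, POS, MAPQ, MPOS and ISIZE.
            for index in (1, 3, 4, 7, 8):
                int(fields[index])
        except ValueError:
            return False
        checked += 1
        if checked >= max_alignments:
            break
    # At least one alignment line must have been validated.
    return checked > 0
```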
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=5, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
static
merge
(split_files, output_file)[source]¶ Multiple SAM files may each have headers. Since the headers should all be the same, remove the headers from files 1-n, keeping them in the first file only
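The merge strategy described above (keep the header of the first part, drop it from all later parts) can be sketched on in-memory line lists. This is an illustration only: `merge_sam_lines` is a hypothetical name, it works on lists of lines rather than on file paths, and it assumes header lines are exactly those starting with `@`.

```python
def merge_sam_lines(split_line_groups):
    """Hypothetical helper: concatenate SAM parts, keeping headers from the first part only."""
    merged = []
    for index, lines in enumerate(split_line_groups):
        for line in lines:
            # Header lines ('@...') are dropped from every part after the first.
            if index > 0 and line.startswith("@"):
                continue
            merged.append(line)
    return merged
```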
-
line_dataprovider
(dataset, **settings)[source]¶ Returns an iterator over the dataset’s lines (that have been stripped) optionally excluding blank lines and lines that start with a comment character.
-
regex_line_dataprovider
(dataset, **settings)[source]¶ Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
-
dataset_column_dataprovider
(dataset, **settings)[source]¶ Attempts to get column settings from dataset.metadata
-
dataset_dict_dataprovider
(dataset, **settings)[source]¶ Attempts to get column settings from dataset.metadata
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function Sam.column_dataprovider>, 'dataset-column': <function Sam.dataset_column_dataprovider>, 'dataset-dict': <function Sam.dataset_dict_dataprovider>, 'dict': <function Sam.dict_dataprovider>, 'genomic-region': <function Sam.genomic_region_dataprovider>, 'genomic-region-dict': <function Sam.genomic_region_dict_dataprovider>, 'header': <function Sam.header_dataprovider>, 'id-seq-qual': <function Sam.id_seq_qual_dataprovider>, 'line': <function Sam.line_dataprovider>, 'regex-line': <function Sam.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
Pileup
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data in pileup (6- or 10-column) format
-
edam_format
= 'format_3015'¶
-
file_ext
= 'pileup'¶
-
line_class
= 'genomic coordinate'¶
-
sniff_prefix
(file_prefix)[source]¶ Checks for ‘pileup-ness’
There are two main types of pileup: 6-column and 10-column. For both, the first three and last two columns are the same. We only check the first three to allow for some personalization of the format.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'interval.interval' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '6col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '10col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '1.excel.xls' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '2.txt' )
>>> Pileup().sniff( fname ) # 2.txt
False
>>> fname = get_test_fname( '2.tabular' )
>>> Pileup().sniff( fname )
False
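A check of only the first three columns might look like the sketch below. The exact tests are assumptions made for illustration (a non-comment sequence name, an integer position, and a single reference base from A, C, G, T or N); the function name `looks_like_pileup` is hypothetical and this is not the code behind `sniff_prefix`.

```python
def looks_like_pileup(line):
    """Return True if the first three tab-separated fields resemble pileup columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or not fields[0] or fields[0].startswith("#"):
        return False
    # Column 2: position on the reference, expected to be an integer.
    try:
        int(fields[1])
    except ValueError:
        return False
    # Column 3: reference base, assumed here to be one of A, C, G, T or N.
    return fields[2].upper() in {"A", "C", "G", "T", "N"}
```

Because nothing past the third column is inspected, both 6-column and 10-column lines pass, while an interval-style line such as `chr1\t10\t20\tname` fails on the third field.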
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function Pileup.genomic_region_dataprovider>, 'genomic-region-dict': <function Pileup.genomic_region_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'baseCol': <galaxy.model.metadata.MetadataElementSpec object>, 'chromCol': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'endCol': <galaxy.model.metadata.MetadataElementSpec object>, 'startCol': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
BaseVcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Variant Call Format for describing SNPs and other simple genome variations.
-
edam_format
= 'format_3016'¶
-
column_names
= ['Chrom', 'Pos', 'ID', 'Ref', 'Alt', 'Qual', 'Filter', 'Info', 'Format', 'data']¶
-
set_meta
(dataset, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
static
merge
(split_files, output_file)[source]¶ Merging files with shutil.copyfileobj() does not hit the maximum argument limitation of cat. gz and bz2 files are also supported.
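The standard-library function for this kind of stream copy is `shutil.copyfileobj()`. The sketch below shows the general technique under that assumption; `concatenate_files` and `demo_concatenate` are hypothetical names, not Galaxy functions.

```python
import gzip
import os
import shutil
import tempfile


def concatenate_files(split_files, output_file):
    """Append each input file to output_file using buffered stream copies."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as handle:
                # copyfileobj streams in chunks, so no file is loaded whole
                # and no long command line is ever built.
                shutil.copyfileobj(handle, out)


def demo_concatenate():
    """Concatenate two gzip files byte for byte and read the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i, text in enumerate([b"line 1\n", b"line 2\n"]):
            path = os.path.join(tmp, f"part{i}.gz")
            with gzip.open(path, "wb") as handle:
                handle.write(text)
            parts.append(path)
        merged = os.path.join(tmp, "merged.gz")
        concatenate_files(parts, merged)
        with gzip.open(merged, "rb") as handle:
            return handle.read()
```

Byte-level concatenation works for gzip input because a sequence of gzip members is itself a valid gzip stream, which is one plausible reading of why compressed files are handled by the same code path.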
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'genomic-region': <function BaseVcf.genomic_region_dataprovider>, 'genomic-region-dict': <function BaseVcf.genomic_region_dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
Vcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.BaseVcf
-
file_ext
= 'vcf'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
VcfGz
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.BaseVcf
,galaxy.datatypes.binary.Binary
-
file_ext
= 'vcf_bgzip'¶
-
compressed
= True¶
-
compressed_format
= 'gzip'¶
-
set_meta
(dataset, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'sample_names': <galaxy.model.metadata.MetadataElementSpec object>, 'tabix_index': <galaxy.model.metadata.MetadataElementSpec object>, 'viz_filter_cols': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
Eland
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Support for the export.txt.gz file used by Illumina’s ELANDv2e aligner
-
compressed
= True¶
-
compressed_format
= 'gzip'¶
-
file_ext
= '_export.txt.gz'¶
-
make_html_table
(dataset, skipchars=None, peek=None)[source]¶ Create HTML table, used for displaying peek
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in ELAND export format
A file in ELAND export format consists of lines of tab-separated data. There is no header.
Rules for sniffing as True:
- There must be 22 columns on each line
- LANE, TILE, X, Y, INDEX, READ_NO, SEQ, QUAL, POSITION, *STRAND, FILT must be correct
- We will only check that up to the first 5 alignments are correctly formatted.
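The column-count rule and the five-line limit can be sketched as below. Only those two rules are illustrated; the per-field checks are left out because their details are not given here, and the name `has_eland_export_shape` is hypothetical.

```python
def has_eland_export_shape(lines, max_lines=5, expected_columns=22):
    """Check that up to max_lines non-empty lines each have expected_columns fields."""
    checked = 0
    for line in lines:
        if checked >= max_lines:
            break
        line = line.rstrip("\n")
        if not line:
            continue
        # Every inspected line must split into exactly 22 tab-separated fields.
        if len(line.split("\t")) != expected_columns:
            return False
        checked += 1
    # An input with no data lines at all is not accepted.
    return checked > 0
```

Lines after the fifth are never inspected, so a malformed line late in the file does not change the result.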
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=5, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
metadata_spec
= {'barcodes': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'lanes': <galaxy.model.metadata.MetadataElementSpec object>, 'reads': <galaxy.model.metadata.MetadataElementSpec object>, 'tiles': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
ElandMulti
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'elandmulti'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
FeatureLocationIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
An index that stores feature locations in tabular format.
-
file_ext
= 'fli'¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
BaseCSV
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
Delimiter-separated table data. This includes CSV, TSV and other dialects understood by the Python ‘csv’ module (https://docs.python.org/2/library/csv.html). Subclasses must define the dialect to use, strict_width and file_ext. See the documentation of the Python csv module for the dialect settings.
-
delimiter
= ','¶
-
peek_size
= 1024¶
-
big_peek_size
= 10240¶
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
CSV
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.BaseCSV
Comma-separated table data. Only sniffs comma-separated files with at least 2 rows and 2 columns.
-
file_ext
= 'csv'¶
-
strict_width
= False¶
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
TSV
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.BaseCSV
Tab-separated table data. Only sniffs tab-separated files with at least 2 rows and 2 columns.
Note: Use of this datatype is optional as the general tabular datatype will handle most tab-separated files. This datatype is only required for datasets with tabs INSIDE double quotes.
This datatype currently does not support TSV files where the header has one column fewer to indicate that the first column contains row names. The tabular datatype handles this kind of file correctly.
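The case of tabs inside double quotes can be illustrated with the standard `csv` module and its `csv.excel_tab` dialect. This is a small sketch; `parse_tsv`, `naive_split` and `SAMPLE` are names made up for the example.

```python
import csv
import io

# One header row and one data row whose second field contains a quoted tab.
SAMPLE = 'name\tnote\ngene1\t"contains\ta tab"\n'


def parse_tsv(text):
    """Parse tab-separated text with the csv module's excel_tab dialect."""
    return list(csv.reader(io.StringIO(text), dialect=csv.excel_tab))


def naive_split(text):
    """Split each line on every tab, ignoring quoting."""
    return [line.split("\t") for line in text.splitlines()]
```

`parse_tsv(SAMPLE)` keeps the quoted field whole and yields two columns per row, whereas `naive_split(SAMPLE)` breaks the second row into three fields.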
-
file_ext
= 'tsv'¶
-
dialect
¶ alias of
csv.excel_tab
-
strict_width
= True¶
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'column': <function TabularData.column_dataprovider>, 'dataset-column': <function TabularData.dataset_column_dataprovider>, 'dataset-dict': <function TabularData.dataset_dict_dataprovider>, 'dict': <function TabularData.dict_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>}¶
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tabular.
ConnectivityTable
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
edam_format
= 'format_3309'¶
-
file_ext
= 'ct'¶
-
header_regexp
= re.compile('^[0-9]+(?:\t|[ ]+).*?(?:ENERGY|energy|dG)[ \t].*?=')¶
-
structure_regexp
= re.compile('^[0-9]+(?:\t|[ ]+)[ACGTURYKMSWBDHVN]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+')¶
-
set_meta
(dataset, **kwd)[source]¶ Tries to determine the number of columns as well as those columns that contain numerical values in the dataset. A skip parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many invalid comment lines should be skipped. Using None for skip will cause skip to be zero, but the first line will be processed as a header. A max_data_lines parameter is used because various tabular data types reuse this function, and their data type classes are responsible to determine how many data lines should be processed to ensure that the non-optional metadata parameters are properly set; if used, optional metadata parameters will be set to None, unless the entire file has already been read. Using None for max_data_lines will process all data lines.
Items of interest:
We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
If a tabular file has no data, it will have one column of type ‘str’.
We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
-
sniff_prefix
(file_prefix)[source]¶ The ConnectivityTable (CT) is a file format used for describing RNA 2D structures by tools including MFOLD, UNAFOLD and the RNAStructure package. The tabular file format is defined as follows:
5 energy = -12.3 sequence name
1 G 0 2 0 1
2 A 1 3 0 2
3 A 2 4 0 3
4 A 3 5 0 4
5 C 4 6 1 5
The links given on the EDAM ontology page do not indicate which separator is used (space or tab), and different implementations exist. The implementation that uses spaces as the separator (as in RNAStructure) is as follows:
10 ENERGY = -34.8 seqname
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 7 3
4 a 3 5 0 4
5 a 4 6 0 5
6 a 5 7 0 6
7 C 6 8 3 7
8 C 7 9 2 8
9 C 8 10 1 9
10 a 9 0 0 10
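The two class-level patterns, `header_regexp` and `structure_regexp`, can be tried against the example lines above. The patterns in this sketch are copied from the attribute values listed for this class; the helper `classify_ct_line` is hypothetical and only shows how the patterns behave, not how `sniff_prefix` combines them.

```python
import re

# Patterns copied from the header_regexp and structure_regexp attributes.
HEADER_REGEXP = re.compile(r"^[0-9]+(?:\t|[ ]+).*?(?:ENERGY|energy|dG)[ \t].*?=")
STRUCTURE_REGEXP = re.compile(
    r"^[0-9]+(?:\t|[ ]+)[ACGTURYKMSWBDHVN]+(?:\t|[ ]+)[^\t]+"
    r"(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+(?:\t|[ ]+)[^\t]+"
)


def classify_ct_line(line):
    """Label a line as 'header', 'structure' or 'other' using the two patterns."""
    if HEADER_REGEXP.match(line):
        return "header"
    if STRUCTURE_REGEXP.match(line):
        return "structure"
    return "other"
```

Both separators are accepted: `"1 G 0 2 0 1"` and `"1\tG\t0\t2\t0\t1"` are classified as structure lines. The base column of `structure_regexp` lists uppercase letters only, so lowercase bases such as the `a` rows above would need upper-casing before matching; whether the sniffer does that is not stated here.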
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
MatrixMarket
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
The Matrix Market (MM) exchange formats provide a simple mechanism to facilitate the exchange of matrix data. MM coordinate format is suitable for representing sparse matrices. Only nonzero entries need be encoded, and the coordinates of each are given explicitly.
The tabular file format is defined as follows:
%%MatrixMarket matrix coordinate real general  <--- header line
%                                              <--+
% comments                                        |-- 0 or more comment lines
%                                              <--+
    M  N  L                                    <--- rows, columns, entries
    I1  J1  A(I1, J1)                          <--+
    I2  J2  A(I2, J2)                             |
    I3  J3  A(I3, J3)                             |-- L lines
        . . .                                     |
    IL  JL  A(IL, JL)                          <--+
Indices are 1-based, i.e. A(1,1) is the first element.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> MatrixMarket().sniff( get_test_fname( 'sequence.maf' ) )
False
>>> MatrixMarket().sniff( get_test_fname( '1.mtx' ) )
True
>>> MatrixMarket().sniff( get_test_fname( '2.mtx' ) )
True
>>> MatrixMarket().sniff( get_test_fname( '3.mtx' ) )
True
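The coordinate layout described above can be read with a few lines of Python. This is a minimal sketch for real-valued coordinate files only; `parse_matrix_market` and `SAMPLE_MTX` are hypothetical names and this is not the code Galaxy uses for sniffing or metadata.

```python
SAMPLE_MTX = """%%MatrixMarket matrix coordinate real general
% a comment line
3 3 2
1 1 1.5
3 2 -4.0
"""


def parse_matrix_market(text):
    """Return ((rows, cols), {(i, j): value}) for a coordinate-format file."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header line")
    # Drop the header, comment lines (starting with '%') and blank lines.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("%")]
    if not body:
        raise ValueError("missing size line")
    rows, cols, n_entries = (int(v) for v in body[0].split())
    entries = {}
    for ln in body[1:1 + n_entries]:
        i, j, value = ln.split()
        # Indices are 1-based, so (1, 1) is the first element.
        entries[(int(i), int(j))] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (rows, cols), entries
```

`parse_matrix_market(SAMPLE_MTX)` gives a 3 by 3 shape with two stored entries; every position not listed is implicitly zero.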
-
file_ext
= 'mtx'¶
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=5, **kwd)[source]¶ Set the number of lines of data in dataset.
-
metadata_spec
= {'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.tabular.
CMAP
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.TabularData
-
file_ext
= 'cmap'¶
-
metadata_spec
= {'channel_1_color': <galaxy.model.metadata.MetadataElementSpec object>, 'channel_2_color': <galaxy.model.metadata.MetadataElementSpec object>, 'cmap_version': <galaxy.model.metadata.MetadataElementSpec object>, 'column_names': <galaxy.model.metadata.MetadataElementSpec object>, 'column_types': <galaxy.model.metadata.MetadataElementSpec object>, 'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'delimiter': <galaxy.model.metadata.MetadataElementSpec object>, 'label_channels': <galaxy.model.metadata.MetadataElementSpec object>, 'nickase_recognition_site_1': <galaxy.model.metadata.MetadataElementSpec object>, 'nickase_recognition_site_2': <galaxy.model.metadata.MetadataElementSpec object>, 'number_of_consensus_nanomaps': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.text module¶
Clearing house for generic text datatypes that are not XML or tabular.
-
class
galaxy.datatypes.text.
Html
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an html file
-
edam_format
= 'format_2331'¶
-
file_ext
= 'html'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in html format
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'complete.bed' )
>>> Html().sniff( fname )
False
>>> fname = get_test_fname( 'file.html' )
>>> Html().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Json
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
edam_format
= 'format_3464'¶
-
file_ext
= 'json'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to load the string with the json module. If successful it’s a json file.
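The load-and-see approach can be sketched with the standard `json` module. The function name `looks_like_json` is hypothetical, and the sketch leaves out whatever extra handling the real method applies.

```python
import json


def looks_like_json(text):
    """Return True if the whole string parses as JSON."""
    try:
        json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True
```

Two caveats apply to this simple form: bare scalars such as `1` also parse as JSON, and a truncated prefix of a large file will fail to parse, so a practical sniffer likely needs additional rules for both cases.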
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
ExpressionJson
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
Represents the non-data input or output to a tool or workflow.
-
file_ext
= 'json'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'json_type': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.text.
Ipynb
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
-
file_ext
= 'ipynb'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to load the string with the json module. If successful it’s a json file.
-
display_data
(trans, dataset, preview=False, filename=None, to_ext=None, **kwd)[source]¶ Displays data in central pane if preview is True, else handles download.
Datatypes should be very careful if overriding this method, as this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Biom1
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
BIOM version 1.0 file format description http://biom-format.org/documentation/format_versions/biom-1.0.html
-
file_ext
= 'biom1'¶
-
edam_format
= 'format_3746'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to load the string with the json module. If successful it’s a json file.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'table_column_metadata_headers': <galaxy.model.metadata.MetadataElementSpec object>, 'table_columns': <galaxy.model.metadata.MetadataElementSpec object>, 'table_date': <galaxy.model.metadata.MetadataElementSpec object>, 'table_format': <galaxy.model.metadata.MetadataElementSpec object>, 'table_format_url': <galaxy.model.metadata.MetadataElementSpec object>, 'table_generated_by': <galaxy.model.metadata.MetadataElementSpec object>, 'table_id': <galaxy.model.metadata.MetadataElementSpec object>, 'table_matrix_element_type': <galaxy.model.metadata.MetadataElementSpec object>, 'table_matrix_type': <galaxy.model.metadata.MetadataElementSpec object>, 'table_rows': <galaxy.model.metadata.MetadataElementSpec object>, 'table_shape': <galaxy.model.metadata.MetadataElementSpec object>, 'table_type': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
ImgtJson
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
-
file_ext
= 'imgt.json'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in json format with imgt elements
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.json' )
>>> ImgtJson().sniff( fname )
False
>>> fname = get_test_fname( 'imgt.json' )
>>> ImgtJson().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'taxon_names': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
GeoJson
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON). https://tools.ietf.org/html/rfc7946
-
file_ext
= 'geojson'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is in json format with GeoJSON elements
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.json' )
>>> GeoJson().sniff( fname )
False
>>> fname = get_test_fname( 'gis.geojson' )
>>> GeoJson().sniff( fname )
True
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Obo
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
OBO file format description https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_2.html
-
edam_data
= 'data_0582'¶
-
edam_format
= 'format_2549'¶
-
file_ext
= 'obo'¶
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to guess the Obo filetype. It usually starts with a “format-version:” string and has several stanzas which start with “id:”.
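The heuristic described above can be sketched as follows. The rule used here (first non-blank line starts with `format-version:`, and at least one bracketed stanza header is followed by an `id:` tag) is an assumption made for illustration; `looks_like_obo` is a hypothetical name.

```python
def looks_like_obo(text):
    """Return True if text has an OBO-style header and at least one stanza."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("format-version:"):
        return False
    # A stanza is a bracketed header such as [Term] followed by an id: tag.
    for previous, current in zip(lines, lines[1:]):
        if (previous.startswith("[") and previous.endswith("]")
                and current.startswith("id:")):
            return True
    return False
```

A header with no stanza is rejected by this sketch, which keeps files that merely mention `format-version:` from matching.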
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Arff
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. http://weka.wikispaces.com/ARFF
-
edam_format
= 'format_3581'¶
-
file_ext
= 'arff'¶ Add metadata elements
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
sniff_prefix
(file_prefix)[source]¶ Try to guess the Arff filetype. An ARFF file usually has a header section with an “@RELATION” declaration and one or more “@ATTRIBUTE” declarations, followed by an “@DATA” section.
-
set_meta
(dataset, **kwd)[source]¶ Tries to count the comment lines and the number of columns included. A typical ARFF data block looks like this:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
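Counting comment lines and columns might be sketched as below. The sketch assumes that comment lines start with `%` and that the column count can be read from the first row after `@DATA` by splitting on commas, which ignores quoted commas; `summarize_arff` and `SAMPLE_ARFF` are hypothetical names and this is not the body of `set_meta`.

```python
SAMPLE_ARFF = """% Iris sample
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""


def summarize_arff(text):
    """Return (comment_lines, columns) for an ARFF document."""
    comment_lines = 0
    columns = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            comment_lines += 1
        elif line.upper().startswith("@DATA"):
            in_data = True
        elif in_data and columns == 0:
            # Naive split: the first data row decides the column count.
            columns = len(line.split(","))
    return comment_lines, columns
```

For `SAMPLE_ARFF` this yields one comment line and five columns.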
-
metadata_spec
= {'columns': <galaxy.model.metadata.MetadataElementSpec object>, 'comment_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
SnpEffDb
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a SnpEff genome build
-
edam_format
= 'format_3624'¶
-
file_ext
= 'snpeffdb'¶
-
metadata_spec
= {'annotation': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'genome_version': <galaxy.model.metadata.MetadataElementSpec object>, 'regulation': <galaxy.model.metadata.MetadataElementSpec object>, 'snpeff_version': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.text.
SnpSiftDbNSFP
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a dbNSFP database prepared for use by SnpSift dbnsfp
-
file_ext
= 'snpsiftdbnsfp'¶
-
composite_type
: Optional[str] = 'auto_primary_file'¶
## The dbNSFP file is a tabular file with 1 header line
## The first 4 columns are required to be: chrom pos ref alt
## These match columns 1,2,4,5 of the VCF file
## SnpSift requires the file to be block-gzipped and then indexed with samtools tabix
## Example:
## Compress using block-gzip algorithm
bgzip dbNSFP2.3.txt
## Create tabix index
tabix -s 1 -b 2 -e 2 dbNSFP2.3.txt.gz
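The column correspondence noted above (chrom, pos, ref, alt against VCF columns 1, 2, 4 and 5) can be sketched as a lookup key. The function name `vcf_lookup_key` is hypothetical and the sketch is not part of SnpSift or Galaxy.

```python
def vcf_lookup_key(vcf_line):
    """Return (chrom, pos, ref, alt) taken from VCF columns 1, 2, 4 and 5."""
    fields = vcf_line.rstrip("\n").split("\t")
    # VCF column order is CHROM, POS, ID, REF, ALT, so ID (column 3) is skipped.
    chrom, pos, _variant_id, ref, alt = fields[:5]
    return chrom, int(pos), ref, alt
```

A key of this shape is what would be matched against the first four columns of the dbNSFP table.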
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the HTML file. The datasets cannot be renamed here; they come with the default name, unfortunately.
-
metadata_spec
= {'annotation': <galaxy.model.metadata.MetadataElementSpec object>, 'bgzip': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'index': <galaxy.model.metadata.MetadataElementSpec object>, 'reference_name': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.text.
IQTree
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
IQ-TREE format
-
file_ext
= 'iqtree'¶
-
sniff_prefix
(file_prefix)[source]¶ Detect the IQTree file
Scattered text file containing various headers and data types.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('example.iqtree')
>>> IQTree().sniff(fname)
True
>>> fname = get_test_fname('temp.txt')
>>> IQTree().sniff(fname)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> IQTree().sniff(fname)
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Paf
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
PAF: a Pairwise mApping Format
https://github.com/lh3/miniasm/blob/master/PAF.md
-
file_ext
= 'paf'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('A-3105.paf')
>>> Paf().sniff(fname)
True
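As a rough illustration of what a PAF check might look at, the sketch below tests a single line against the twelve mandatory tab-separated columns described in the PAF specification linked above. The helper `looks_like_paf_line` is hypothetical and is not Galaxy's sniffer; it assumes the strand is in column 5 and that the length, coordinate, match-count, and mapping-quality columns are non-negative integers.

```python
def looks_like_paf_line(line):
    """Return True if a line has the 12 mandatory PAF columns.

    Hypothetical helper for illustration only.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return False  # optional SAM-like tags may follow, but 12 are required
    if fields[4] not in ("+", "-"):
        return False  # column 5 is the relative strand
    # 0-based indexes of the columns expected to hold non-negative integers
    int_columns = (1, 2, 3, 6, 7, 8, 9, 10, 11)
    return all(fields[i].isdigit() for i in int_columns)


record = "q1\t100\t0\t100\t+\tt1\t200\t50\t150\t95\t100\t60"
print(looks_like_paf_line(record))
```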
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Gfa1
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Graphical Fragment Assembly (GFA) 1.0
http://gfa-spec.github.io/GFA-spec/GFA1.html
-
file_ext
= 'gfa1'¶
-
sniff_prefix
(file_prefix)[source]¶
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('big.gfa1')
>>> Gfa1().sniff(fname)
True
>>> Gfa2().sniff(fname)
False
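One way to tell GFA 1.0 from GFA 2.0 is the optional `VN:Z:` version tag on the header (`H`) line. The sketch below is a hypothetical helper, not Galaxy's sniffer, and it returns `None` when there is no header or no version tag, in which case a real sniffer would have to inspect the record types instead.

```python
def gfa_version_from_header(first_line):
    """Return the version string from a GFA header line, or None.

    Hypothetical helper for illustration only.
    """
    fields = first_line.rstrip("\n").split("\t")
    if fields[0] != "H":
        return None  # not a header record
    for tag in fields[1:]:
        if tag.startswith("VN:Z:"):
            return tag[len("VN:Z:"):]
    return None  # header present but no version tag


print(gfa_version_from_header("H\tVN:Z:1.0"))
```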
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.text.
Gfa2
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Graphical Fragment Assembly (GFA) 2.0
https://github.com/GFA-spec/GFA-spec/blob/master/GFA2.md
-
file_ext
= 'gfa2'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
galaxy.datatypes.tracks module¶
Datatype classes for tracks/track views within Galaxy.
-
class
galaxy.datatypes.tracks.
GeneTrack
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
edam_data
= 'data_3002'¶
-
edam_format
= 'format_2919'¶
-
file_ext
= 'genetrack'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.tracks.
UCSCTrackHub
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Html
Datatype for UCSC TrackHub
-
file_ext
= 'trackhub'¶
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the HTML file. The datasets cannot be renamed here; they come with the default name, unfortunately.
-
set_peek
(dataset, is_multi_byte=False)[source]¶ Set the peek. This method is used by various subclasses of Text.
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.triples module¶
Triple format classes
-
class
galaxy.datatypes.triples.
Triples
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
The abstract base class for file formats that can contain triples
-
edam_data
= 'data_0582'¶
-
edam_format
= 'format_2376'¶
-
file_ext
= 'triples'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.triples.
NTriples
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
,galaxy.datatypes.triples.Triples
The N-Triples triple data format
-
edam_format
= 'format_3256'¶
-
file_ext
= 'nt'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶ Returns False; the user must set the datatype manually.
-
-
class
galaxy.datatypes.triples.
N3
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
,galaxy.datatypes.triples.Triples
The N3 triple data format
-
edam_format
= 'format_3257'¶
-
file_ext
= 'n3'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.triples.
Turtle
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
,galaxy.datatypes.triples.Triples
The Turtle triple data format
-
edam_format
= 'format_3255'¶
-
file_ext
= 'ttl'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶ Returns False; the user must set the datatype manually.
-
-
class
galaxy.datatypes.triples.
Rdf
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
,galaxy.datatypes.triples.Triples
Resource Description Framework format (http://www.w3.org/RDF/).
-
edam_format
= 'format_3261'¶
-
file_ext
= 'rdf'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is XML or not
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶ Returns False; the user must set the datatype manually.
-
-
class
galaxy.datatypes.triples.
Jsonld
(**kwd)[source]¶ Bases:
galaxy.datatypes.text.Json
,galaxy.datatypes.triples.Triples
The JSON-LD data format
-
edam_format
= 'format_3464'¶
-
file_ext
= 'jsonld'¶
-
sniff_prefix
(file_prefix)[source]¶ Try to load the string with the json module. If successful it’s a json file.
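The approach described above can be sketched with the standard `json` module. The function name `looks_like_json` is hypothetical, and the sketch assumes the whole document is available as a string; a real sniffer that only sees a file prefix would need extra handling for truncated input.

```python
import json


def looks_like_json(text):
    """Return True if the text parses as JSON.

    Hypothetical helper for illustration only.
    """
    try:
        json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(looks_like_json('{"@context": "http://schema.org", "@id": "x"}'))
```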
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶ Returns False; the user must set the datatype manually.
-
-
class
galaxy.datatypes.triples.
HDT
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
,galaxy.datatypes.triples.Triples
The HDT triple data format
-
edam_format
= 'format_2376'¶
-
file_ext
= 'hdt'¶
-
metadata_spec
= {'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
galaxy.datatypes.upload_util module¶
galaxy.datatypes.xml module¶
XML format classes
-
class
galaxy.datatypes.xml.
GenericXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Base format class for any XML file.
-
edam_format
= 'format_2332'¶
-
file_ext
= 'xml'¶
-
sniff_prefix
(file_prefix)[source]¶ Determines whether the file is XML or not
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
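A simple XML check could look for an XML declaration on the first non-blank line, as in the sketch below. The helper `looks_like_xml` is hypothetical and is not taken from Galaxy's source. It also assumes that the file begins with a `<?xml ` declaration, which XML 1.0 treats as optional, so well-formed documents without one would be rejected.

```python
def looks_like_xml(prefix_text):
    """Return True if the first non-blank line starts an XML declaration.

    Hypothetical helper for illustration only.
    """
    for line in prefix_text.splitlines():
        stripped = line.strip()
        if stripped:
            # Decide on the first line that has any content
            return stripped.startswith("<?xml ")
    return False  # empty or whitespace-only input


print(looks_like_xml('<?xml version="1.0"?>\n<BlastOutput/>'))
```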
-
static
merge
(split_files, output_file)[source]¶ Merging multiple XML files is non-trivial and must be done in subclasses.
-
dataproviders
= {'base': <function Data.base_dataprovider>, 'chunk': <function Data.chunk_dataprovider>, 'chunk64': <function Data.chunk64_dataprovider>, 'line': <function Text.line_dataprovider>, 'regex-line': <function Text.regex_line_dataprovider>, 'xml': <function GenericXml.xml_dataprovider>}¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
-
class
galaxy.datatypes.xml.
MEMEXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
MEME XML Output data
-
file_ext
= 'memexml'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
sniff_prefix
(file_prefix)¶ Determines whether the file is XML or not
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
-
-
class
galaxy.datatypes.xml.
CisML
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
CisML XML data
-
file_ext
= 'cisml'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
sniff
(filename)¶
-
sniff_prefix
(file_prefix)¶ Determines whether the file is XML or not
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> GenericXml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> GenericXml().sniff( fname )
False
-
-
class
galaxy.datatypes.xml.
Dzi
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Deep zoom image format, see https://github.com/openseadragon/openseadragon/wiki/The-DZI-File-Format
-
file_ext
= 'dzi'¶
-
sniff_prefix
(file_prefix)[source]¶ Checks for the keyword 'Collection' or 'Image' in the first 200 lines.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname('1.dzi')
>>> Dzi().sniff(fname)
True
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> Dzi().sniff(fname)
False
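The keyword scan described above can be sketched as a bounded search over the first lines of a file handle. The function `contains_keyword` and its parameters are hypothetical, and the sketch uses plain substring matching; whether the real sniffer matches substrings or element names is not stated here.

```python
import io
from itertools import islice


def contains_keyword(handle, keywords, max_lines=200):
    """Return True if any keyword occurs within the first max_lines lines.

    Hypothetical helper for illustration only.
    """
    for line in islice(handle, max_lines):
        if any(keyword in line for keyword in keywords):
            return True
    return False


dzi = io.StringIO('<?xml version="1.0"?>\n<Image TileSize="254" Overlap="1" Format="png"/>\n')
print(contains_keyword(dzi, ("Collection", "Image")))
```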
-
metadata_spec
= {'base_name': <galaxy.model.metadata.MetadataElementSpec object>, 'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>, 'format': <galaxy.model.metadata.MetadataElementSpec object>, 'height': <galaxy.model.metadata.MetadataElementSpec object>, 'max_level': <galaxy.model.metadata.MetadataElementSpec object>, 'overlap': <galaxy.model.metadata.MetadataElementSpec object>, 'quality': <galaxy.model.metadata.MetadataElementSpec object>, 'tile_size': <galaxy.model.metadata.MetadataElementSpec object>, 'width': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.xml.
Phyloxml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Format for defining phyloxml data http://www.phyloxml.org/
-
edam_data
= 'data_0872'¶
-
edam_format
= 'format_3159'¶
-
file_ext
= 'phyloxml'¶
-
sniff_prefix
(file_prefix)[source]¶ Checks for the keyword 'phyloxml', always in lowercase, in the first few lines.
>>> from galaxy.datatypes.sniff import get_test_fname
>>> fname = get_test_fname( '1.phyloxml' )
>>> Phyloxml().sniff( fname )
True
>>> fname = get_test_fname( 'interval.interval' )
>>> Phyloxml().sniff( fname )
False
>>> fname = get_test_fname( 'megablast_xml_parser_test1.blastxml' )
>>> Phyloxml().sniff( fname )
False
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.xml.
Owl
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Web Ontology Language OWL format description http://www.w3.org/TR/owl-ref/
-
edam_format
= 'format_3262'¶
-
file_ext
= 'owl'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-
-
class
galaxy.datatypes.xml.
Sbml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Systems Biology Markup Language http://sbml.org
-
file_ext
= 'sbml'¶
-
edam_data
= 'data_2024'¶
-
edam_format
= 'format_2585'¶
-
metadata_spec
= {'data_lines': <galaxy.model.metadata.MetadataElementSpec object>, 'dbkey': <galaxy.model.metadata.MetadataElementSpec object>}¶
-