4.1.1.6. datanator.util package

4.1.1.6.1. Submodules

4.1.1.6.2. datanator.util.build_util module

datanator.util.build_util.continuousload(method)
datanator.util.build_util.timeloadcontent(method)
datanator.util.build_util.timemethod(method)

4.1.1.6.3. datanator.util.constants module

4.1.1.6.4. datanator.util.molecule_util module

Utilities for dealing with molecules

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-04-12
Copyright:2017, Karr Lab
License:MIT
class datanator.util.molecule_util.InchiMolecule(structure)

Bases: object

Represents the InChI-encoded structure of a molecule

formula

empirical formula layer

Type:str
connections

atomic connections (c) layer

Type:str
hydrogens

hydrogen (h) layer

Type:str
protons

proton (p) layer

Type:str
charge

charge (q) layer

Type:str
double_bonds

double bonds (b) layer

Type:str
stereochemistry

stereochemistry (t) layer

Type:str
stereochemistry_parity

stereochemistry parity (m) layer

Type:str
stereochemistry_type

stereochemistry type (s) layer

Type:str
isotopes

isotope (i) layer

Type:str
fixed_hydrogens

fixed hydrogens (f) layer

Type:str
reconnected_metals

reconnected metal (r) layer

Type:str
LAYERS = {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type', 't': 'stereochemistry'}

dictionary of layer prefixes and names

Type:dict
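The layer attributes above correspond to the slash-separated layers of an InChI string, and the prefix letters in LAYERS identify them. The following is a minimal sketch of splitting a standard InChI into those layers and reassembling it after a layer is dropped, similar in spirit to remove_layer and __str__. It is not the package's implementation. The helpers split_inchi_layers and join_inchi_layers are hypothetical. The sketch does not model the sublayers nested under the f, i, and r layers, so a repeated prefix simply overwrites the earlier value.

```python
# Prefix-to-name mapping, copied from InchiMolecule.LAYERS
LAYERS = {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens',
          'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons',
          'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type',
          't': 'stereochemistry'}
PREFIXES = {name: prefix for prefix, name in LAYERS.items()}


def split_inchi_layers(inchi):
    """Split an InChI string into its version and a {layer name: value} dict (hypothetical helper)."""
    if not inchi.startswith('InChI='):
        raise ValueError('not an InChI string: {!r}'.format(inchi))
    version, *layers = inchi[len('InChI='):].split('/')
    parsed = {}
    for layer in layers:
        # Every layer except the formula starts with a lowercase prefix letter
        prefix = layer[0] if layer[:1].islower() else ''
        if prefix not in LAYERS:
            raise ValueError('unknown layer prefix: {!r}'.format(prefix))
        parsed[LAYERS[prefix]] = layer[len(prefix):]
    return version, parsed


def join_inchi_layers(version, parsed):
    """Reassemble an InChI string; dicts keep insertion order, so layers keep their order."""
    return '/'.join(['InChI=' + version] + [PREFIXES[name] + value for name, value in parsed.items()])
```

Take ethanol as an example. split_inchi_layers('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3') returns the version '1S' together with the formula, connections, and hydrogens layers. Deleting the 'hydrogens' entry and rejoining gives 'InChI=1S/C2H6O/c1-2-3'.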
__str__()

Generate an InChI string representation of the molecule

Returns:InChI string representation of the molecule
Return type:str
get_formula_and_connectivity()

Get a string representation of the formula and connectivity

Returns:string representation of the formula and connectivity
Return type:str
is_equal(other, check_protonation=True, check_double_bonds=True, check_stereochemistry=True, check_isotopes=True, check_fixed_hydrogens=True, check_reconnected_metals=True)

Determine if two molecules are semantically equal (all of their checked layers are equal).

Parameters:
  • other (InchiMolecule) – other molecule
  • check_protonation (bool, optional) – if True, check that the protonation states (h, p, q) are equal
  • check_double_bonds (bool, optional) – if True, check that the double bond layers (b) are equal
  • check_stereochemistry (bool, optional) – if True, check that the stereochemistry layers (t, m, s) are equal
  • check_isotopes (bool, optional) – if True, check that the isotopic layers (i) are equal
  • check_fixed_hydrogens (bool, optional) – if True, check that the fixed hydrogen layers (f) are equal
  • check_reconnected_metals (bool, optional) – if True, check that the reconnected metal layers (r) are equal
Returns:True if the molecules are semantically equal
Return type:bool
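Each flag above switches off a group of layers. Below is a minimal sketch of that comparison. It assumes each molecule has already been split into a {layer name: value} dict that uses the names from LAYERS. The function layers_equal and the FLAG_LAYERS table are hypothetical: they are built from the parameter list and are not the package's code.

```python
# Layers ignored when each is_equal flag is False (from the parameter list above)
FLAG_LAYERS = {
    'check_protonation': ('hydrogens', 'protons', 'charge'),
    'check_double_bonds': ('double_bonds',),
    'check_stereochemistry': ('stereochemistry', 'stereochemistry_parity', 'stereochemistry_type'),
    'check_isotopes': ('isotopes',),
    'check_fixed_hydrogens': ('fixed_hydrogens',),
    'check_reconnected_metals': ('reconnected_metals',),
}


def layers_equal(a, b, **checks):
    """Compare two {layer name: value} dicts, skipping the layers of any flag set to False."""
    unknown = set(checks) - set(FLAG_LAYERS)
    if unknown:
        raise TypeError('unknown flags: {}'.format(sorted(unknown)))
    skipped = set()
    for flag, layers in FLAG_LAYERS.items():
        if not checks.get(flag, True):
            skipped.update(layers)
    # A layer absent from one molecule matches only if it is absent or empty in the other
    return all(a.get(name, '') == b.get(name, '') for name in (set(a) | set(b)) - skipped)
```

Consider acetic acid (InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)) and acetate, whose InChI adds a /p-1 layer. With all flags at their defaults, the two compare unequal. With check_protonation=False, they compare equal.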

is_protonation_isomer(other)

Determine if two molecules are protonation isomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are protonation isomers
Return type:bool
is_stereoisomer(other)

Determine if two molecules are stereoisomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are stereoisomers
Return type:bool
is_tautomer(other)

Determine if two molecules are tautomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are tautomers
Return type:bool
remove_layer(layer)

Remove a layer from a structure

Parameters:layer (str) – name of the layer
class datanator.util.molecule_util.Molecule(id='', name='', structure='', cross_references=None)

Bases: object

Represents a molecule

id

identifier

Type:str
name

name

Type:str
structure

structure in InChI, MOL, or canonical SMILES format

Type:str
cross_references

list of cross references

Type:list
get_fingerprint(type='fp2')

Calculate a fingerprint

Parameters:type (str, optional) – fingerprint type to calculate
Returns:fingerprint
Return type:pybel.Fingerprint
static get_fingerprint_types()

Get list of fingerprint types

Returns:list of fingerprint types
Return type:list of str
get_format()

Get the format of the structure

Returns:format
Return type:str
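The structure attribute may hold InChI, MOL, or SMILES text. One simple way to tell these apart is sketched below. guess_structure_format is a hypothetical heuristic, and the return values 'inchi', 'mol', and 'smiles' follow the format names listed under to_format. The actual get_format may work differently, for example by delegating to Open Babel.

```python
def guess_structure_format(structure):
    """Guess 'inchi', 'mol', or 'smiles' for a structure string (hypothetical heuristic)."""
    text = structure.strip()
    if not text:
        return None
    if text.startswith('InChI='):
        return 'inchi'
    # MOL blocks span several lines and close the connection table with "M  END"
    if 'M  END' in text or '\n' in text:
        return 'mol'
    # Anything else on a single line is treated as SMILES
    return 'smiles'
```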
get_similarity(other, fingerprint_type='fp2')

Calculate the similarity with another molecule

Parameters:
  • other (Molecule) – a second molecule
  • fingerprint_type (str, optional) – fingerprint type to use to calculate similarity
Returns:the similarity with the other molecule
Return type:float
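A common choice for fingerprint similarity is the Tanimoto coefficient, |A ∩ B| / |A ∪ B| over the sets of on bits. pybel's Fingerprint objects compute it with the `|` operator and list their on bits in `.bits`. Assuming get_similarity uses this measure (the source does not say which one it uses), the calculation reduces to the plain-Python sketch below. The function name tanimoto is hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)
```

With Open Babel installed, tanimoto(fp1.bits, fp2.bits) should agree with fp1 | fp2 for two pybel fingerprints computed with calcfp('fp2').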

to_format(format)

Get the structure in the specified format

Parameters:format (str) – format such as inchi, mol, or smiles

Returns:structure in the specified format
to_inchi()

Get the structure in InChI format

Returns:structure in InChI format
Return type:str
to_mol()

Get the structure in MOL format

Returns:structure in MOL format
Return type:str
to_openbabel()

Create an Open Babel molecule for the molecule

Returns:Open Babel molecule
Return type:openbabel.OBMol
to_pybel()

Create a pybel molecule for the molecule

Returns:pybel molecule
Return type:pybel.Molecule
to_smiles()

Get the structure in SMILES format

Returns:structure in SMILES format
Return type:str

4.1.1.6.5. datanator.util.reaction_util module

Utilities for dealing with reactions

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-04-13
Copyright:2017, Karr Lab
License:MIT
datanator.util.reaction_util.calc_reactant_product_pairs(reaction)

Get a list of pairs of similar reactants and products using a greedy algorithm.

Parameters:reaction (data_model.Reaction) – reaction
Returns:list of pairs of similar reactants and products, each a tuple of (data_model.Specie, data_model.Specie)
Return type:list of tuple
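The greedy strategy can be sketched as follows. First, score every reactant/product combination. Then repeatedly accept the highest-scoring pair whose reactant and product are both still unpaired. The function greedy_pairs and its similarity callable are hypothetical stand-ins. The real function takes a data_model.Reaction, and which similarity measure it uses (for example, Molecule.get_similarity) is an assumption.

```python
from itertools import product


def greedy_pairs(reactants, products, similarity):
    """Pair reactants with products greedily, best similarity first (hypothetical helper)."""
    # Score every (reactant index, product index) combination
    scores = [(similarity(r, p), i, j)
              for (i, r), (j, p) in product(enumerate(reactants), enumerate(products))]
    # Highest score first; the sort is stable, so ties keep their input order
    scores.sort(key=lambda s: s[0], reverse=True)

    paired_r, paired_p, pairs = set(), set(), []
    for _, i, j in scores:
        if i not in paired_r and j not in paired_p:
            paired_r.add(i)
            paired_p.add(j)
            pairs.append((reactants[i], products[j]))
    return pairs
```

Take a toy character-overlap similarity as an example. greedy_pairs(['ATP', 'glucose'], ['ADP', 'glucose-6-phosphate'], similarity) pairs glucose with glucose-6-phosphate first and then ATP with ADP. When the two lists differ in length, the leftover species stay unpaired.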

4.1.1.6.6. datanator.util.rna_seq_util module

Utilities for RNA-seq data

Author:Jonathan Karr <jonrkarr@gmail.com>
Author:Yosef Roth <yosefdroth@gmail.com>
Date:2018-01-15
Copyright:2018, Karr Lab
License:MIT
class datanator.util.rna_seq_util.Kallisto

Bases: object

Python interface to kallisto.

index(fasta_filenames, index_filename=None, kmer_size=31, make_unique=False)

Generate an index from FASTA files

Parameters:
  • fasta_filenames (list of str) – paths to FASTA files
  • index_filename (str, optional) – path to the kallisto index file to be created
  • kmer_size (int, optional) – k-mer length
  • make_unique (bool, optional) – if True, replace repeated target names with unique names
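Assuming the wrapper shells out to the kallisto command-line tool, index() corresponds roughly to the argument list built below. kallisto_index_args is a hypothetical helper, and unlike the wrapper it requires an explicit index path. The -i, -k, and --make-unique options belong to kallisto index. Passing the list to subprocess.run(args, check=True) would execute the command.

```python
def kallisto_index_args(fasta_filenames, index_filename, kmer_size=31, make_unique=False):
    """Build the argument list for `kallisto index` (hypothetical helper; nothing is executed)."""
    args = ['kallisto', 'index', '-i', index_filename, '-k', str(kmer_size)]
    if make_unique:
        # Ask kallisto to rename repeated target names
        args.append('--make-unique')
    return args + list(fasta_filenames)
```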
quant(fastq_filenames, index_filename=None, output_dirname=None, bias=False, bootstrap_samples=0, seed=42, plaintext=False, fusion=False, single_end_reads=False, forward_stranded=False, reverse_stranded=False, fragment_length=None, fragment_length_std=None, threads=1, pseudobam=False)

Process RNA-seq FASTQ files

Parameters:
  • fastq_filenames (list of str) – paths to FASTQ files
  • index_filename (str, optional) – path to the kallisto index file to be used for quantification
  • output_dirname (str, optional) – path to the output directory
  • single_end_reads (bool, optional) – if True, quantify single-end reads
  • fragment_length (float, optional) – estimated average fragment length
  • fragment_length_std (float, optional) – estimated standard deviation of fragment length
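In the same way, quant() maps naturally onto kallisto quant. The hypothetical helper below builds the argument list for a subset of the parameters and omits fusion and pseudobam. The mapping of forward_stranded and reverse_stranded to --fr-stranded and --rf-stranded is an assumption. kallisto requires an estimated average fragment length and its standard deviation when quantifying single-end reads, so the sketch checks that both are given.

```python
def kallisto_quant_args(fastq_filenames, index_filename, output_dirname, bias=False,
                        bootstrap_samples=0, seed=42, plaintext=False, single_end_reads=False,
                        forward_stranded=False, reverse_stranded=False, fragment_length=None,
                        fragment_length_std=None, threads=1):
    """Build the argument list for `kallisto quant` (hypothetical helper; nothing is executed)."""
    if forward_stranded and reverse_stranded:
        raise ValueError('reads cannot be both forward- and reverse-stranded')
    if single_end_reads and (fragment_length is None or fragment_length_std is None):
        raise ValueError('single-end reads require fragment_length and fragment_length_std')

    args = ['kallisto', 'quant', '-i', index_filename, '-o', output_dirname,
            '-b', str(bootstrap_samples), '--seed', str(seed), '-t', str(threads)]
    # Boolean parameters map onto kallisto's on/off flags
    for enabled, flag in [(bias, '--bias'), (plaintext, '--plaintext'),
                          (single_end_reads, '--single'),
                          (forward_stranded, '--fr-stranded'),
                          (reverse_stranded, '--rf-stranded')]:
        if enabled:
            args.append(flag)
    if fragment_length is not None:
        args += ['-l', str(fragment_length)]
    if fragment_length_std is not None:
        args += ['-s', str(fragment_length_std)]
    return args + list(fastq_filenames)
```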

4.1.1.6.7. datanator.util.taxonomy_util module

Utilities for dealing with taxa

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-04-11
Copyright:2017, Karr Lab
License:MIT
class datanator.util.taxonomy_util.Taxon(id='', name='', ncbi_id=None, cross_references=None)

Bases: object

Represents a taxon such as a genus, species, or strain

id

identifier

Type:str
name

name of the taxon

Type:str
id_of_nearest_ncbi_taxon

ID of the nearest parent taxon that is in the NCBI database

Type:int
distance_from_nearest_ncbi_taxon

distance from the taxon to its nearest parent that is in the NCBI database

Type:int
additional_name_beyond_nearest_ncbi_taxon

additional part of the taxon’s name beyond that of its nearest parent in the NCBI database

Type:str
cross_references

list of cross references

Type:list
get_common_ancestor(other)

Get the latest common ancestor of two taxa

Parameters:other (Taxon) – a second taxon
Returns:latest common ancestor
Return type:Taxon
get_distance_to_common_ancestor(other)

Calculate the number of links in the NCBI taxonomic tree between two taxa and their latest common ancestor

Note: This distance depends on the granularity of the taxon’s lineage. For example, there are only 7 links between most bacterial species and the Bacteria superkingdom, but 28 links between the Homo sapiens species and the Eukaryota superkingdom.

Parameters:other (Taxon) – a second taxon
Returns:number of links between self and its latest common ancestor with other in the NCBI taxonomic tree
Return type:int
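Suppose each taxon is represented by its lineage, the list of taxa running from the root of the tree down to the taxon itself. The latest common ancestor is then the last entry of the two lineages' shared prefix, and the distance counts the links below that entry. The sketch uses made-up lineages and hypothetical function names rather than NCBI data. In the same representation, the distance to the root is len(lineage) - 1.

```python
def shared_prefix_length(lineage_a, lineage_b):
    """Number of leading taxa that two root-first lineages have in common."""
    n = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        n += 1
    return n


def latest_common_ancestor(lineage_a, lineage_b):
    """Deepest taxon shared by both lineages."""
    n = shared_prefix_length(lineage_a, lineage_b)
    if n == 0:
        raise ValueError('lineages do not share a root')
    return lineage_a[n - 1]


def distance_to_common_ancestor(lineage_a, lineage_b):
    """Links from the last taxon of lineage_a up to the latest common ancestor."""
    n = shared_prefix_length(lineage_a, lineage_b)
    if n == 0:
        raise ValueError('lineages do not share a root')
    return len(lineage_a) - n
```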
get_distance_to_root()

Get the distance from the taxon to the root of the NCBI taxonomy tree

Returns:distance from the taxon to the root
Return type:int
get_max_distance_to_common_ancestor()

Get the maximum distance from the taxon to a common ancestor with another taxon

Returns:maximum distance from the taxon to a common ancestor with another taxon
Return type:int
get_ncbi_id()

Get the ID of the taxon within the NCBI database

Returns:ID of the taxon within the NCBI database, or None if the taxon isn’t in the NCBI database
Return type:int or None
get_parent_taxa()

Get parent taxa

Returns:list of parent taxa
Return type:list of Taxon
get_rank()

Get the rank of the taxon

Returns:rank of the taxon
Return type:str
datanator.util.taxonomy_util.setup_database(force_update=False)

Set up a local SQLite copy of the NCBI Taxonomy database. If force_update is False, the content is downloaded from NCBI and the SQLite database is built only if a local database doesn’t already exist. If force_update is True, the content is always downloaded from NCBI and the local SQLite copy is rebuilt.

Parameters:force_update (bool, optional) –
  • False: download the content from NCBI and build a local SQLite database only
    if a local copy of the database doesn’t already exist
  • True: always download the content from NCBI and rebuild the local SQLite
    database
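The force_update rule amounts to a single existence check. In this minimal sketch, ensure_local_database, the db_path argument, and the build callable are all hypothetical. The build callable stands in for the step that would download the NCBI content and write the SQLite file. The real function manages its own paths and download.

```python
import os


def ensure_local_database(db_path, build, force_update=False):
    """Run build(db_path) unless the database already exists and force_update is False.

    Returns True if the database was (re)built. `build` is a hypothetical callable that
    downloads the NCBI Taxonomy content and writes the SQLite file at db_path.
    """
    if force_update or not os.path.isfile(db_path):
        build(db_path)
        return True
    return False
```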

4.1.1.6.8. datanator.util.warning_util module

Warning utilities

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-04-13
Copyright:2017, Karr Lab
License:MIT
datanator.util.warning_util.disable_warnings()

Disable warning messages from openbabel and urllib

datanator.util.warning_util.enable_warnings()

Enable warning messages from openbabel and urllib

4.1.1.6.9. Module contents