4.1.1.7. datanator.util package

4.1.1.7.1. Submodules

4.1.1.7.2. datanator.util.build_util module

datanator.util.build_util.continuousload(method)[source]
datanator.util.build_util.timeloadcontent(method)[source]
datanator.util.build_util.timemethod(method)[source]

4.1.1.7.3. datanator.util.calc_tanimoto module

class datanator.util.calc_tanimoto.CalcTanimoto(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

Calculates the Tanimoto similarity matrix for two compound collections, e.g. ECMDB and YMDB.

get_tanimoto(mol1, mol2, str_format='inchi', rounding=3)[source]

Calculates the Tanimoto coefficient between two molecules, mol1 and mol2.

Parameters:
  • mol1 – molecule 1 in some format
  • mol2 – molecule 2 in the same format as molecule 1
  • str_format – format of the molecular representation; supported formats are provided by Pybel
  • rounding – rounding of the final result
Returns:tani – rounded Tanimoto coefficient
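
The Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is a minimal pure-Python illustration, assuming each fingerprint is given as a set of on-bit indices (a pybel.Fingerprint exposes these through its bits attribute, and Pybel's fp1 | fp2 operator returns the same ratio). The helper name tanimoto is hypothetical, it is not the datanator implementation, and returning 0.0 for two empty fingerprints is an assumption.

```python
def tanimoto(bits1: set, bits2: set, rounding: int = 3) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bits."""
    union = bits1 | bits2
    if not union:
        # Assumption: two empty fingerprints are treated as having similarity 0
        return 0.0
    return round(len(bits1 & bits2) / len(union), rounding)


# Hypothetical fingerprints, given as the indices of their set bits
print(tanimoto({1, 2, 3}, {2, 3, 4}))     # 2 shared / 4 total -> 0.5
print(tanimoto({1, 2, 3}, {1, 4, 5, 6}))  # 1 / 6 rounded to 3 places -> 0.167
```
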
many_to_many(collection_str1='metabolites_meta', collection_str2='metabolites_meta', field1='inchi_deprot', field2='inchi_deprot', lookup1='inchi_hashed', lookup2='inchi_hashed', num=100)[source]

Go through collection_str1 and assign each compound its top num most similar compounds.

Parameters:
  • collection_str1 – collection from which compounds are drawn
  • collection_str2 – collection in which comparisons are made
  • field1 – field of interest in collection_str1
  • field2 – field of interest in collection_str2
  • num – number of most similar compounds
  • batch_size – batch size for each server round trip

one_to_many(inchi, collection_str='metabolites_meta', field='inchi_deprot', lookup='inchi_hashed', num=100)[source]

Calculate Tanimoto coefficients between one metabolite and the rest of collection_str.

Parameters:
  • inchi – chosen chemical compound in InChI format
  • collection_str – collection in which comparisons are made
  • field – field that holds the chemical structure
  • lookup – field that has been previously indexed
  • num – maximum number of compounds to return, sorted by Tanimoto coefficient
Returns:
  • sorted_coeff – sorted NumPy array of the top num Tanimoto coefficients
  • sorted_inchi – sorted top num InChI strings
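
one_to_many keeps only the num best matches, ranked by Tanimoto coefficient. A minimal in-memory sketch of that ranking step, assuming the fingerprints of the candidate compounds are already available as sets of on-bits keyed by InChI; the real method reads them from MongoDB and returns NumPy arrays, whereas this sketch uses heapq.nlargest and plain lists. The name top_similar is hypothetical.

```python
import heapq


def top_similar(query: set, candidates: dict, num: int = 100):
    """Return (coefficients, inchis) of the num candidates most similar to query.

    candidates maps an InChI string to its fingerprint, given as a set of on-bits.
    Results are ordered from most to least similar.
    """
    def tanimoto(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    scored = ((tanimoto(query, fp), inchi) for inchi, fp in candidates.items())
    best = heapq.nlargest(num, scored)  # avoids sorting the whole collection
    sorted_coeff = [coeff for coeff, _ in best]
    sorted_inchi = [inchi for _, inchi in best]
    return sorted_coeff, sorted_inchi


coeffs, inchis = top_similar({1, 2, 3}, {"A": {1, 2, 3}, "B": {1, 2}, "C": {7}}, num=2)
print(inchis)  # ['A', 'B']
```

heapq.nlargest runs in O(n log num), which matters when num is much smaller than the collection.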
datanator.util.calc_tanimoto.main()[source]

4.1.1.7.4. datanator.util.chem_util module

class datanator.util.chem_util.ChemUtil[source]

Bases: object

hash_inchi(inchi='InChI = None')[source]

Hash an InChI string using SHA-224.
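
Hashing turns a long InChI string into a fixed-length key suitable for an indexed lookup field. A minimal sketch with the standard library's hashlib, assuming the string is UTF-8 encoded and the hexadecimal digest is what gets stored; whether ChemUtil.hash_inchi returns the hex digest is an assumption.

```python
import hashlib


def hash_inchi(inchi: str) -> str:
    # Assumption: UTF-8 encoding and a hexadecimal digest
    return hashlib.sha224(inchi.encode("utf-8")).hexdigest()


digest = hash_inchi("InChI=1S/H2O/h1H2")
print(len(digest))  # SHA-224 yields 28 bytes, i.e. 56 hex characters
```
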

simplify_inchi(inchi='InChI = None')[source]

Remove a molecule’s protonation state, e.g. “InChI=1S/H2O/h1H2” => “InChI=1S/H2O”
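
One way to strip the protonation state is to drop the InChI layers that InchiMolecule.is_equal groups under protonation: hydrogens (h), protons (p) and charge (q). A minimal sketch, assuming layers are separated by "/" and identified by their leading letter; it is not necessarily how ChemUtil.simplify_inchi works, and because it filters every matching layer, an h sublayer inside a fixed-hydrogen (/f) section is dropped too.

```python
PROTONATION_LAYERS = ("h", "p", "q")


def simplify_inchi(inchi: str) -> str:
    """Drop the hydrogen, proton and charge layers from an InChI string."""
    parts = inchi.split("/")
    # parts[0] is the version prefix ("InChI=1S") and parts[1] the formula layer
    head, layers = parts[:2], parts[2:]
    kept = [layer for layer in layers if layer[:1] not in PROTONATION_LAYERS]
    return "/".join(head + kept)


print(simplify_inchi("InChI=1S/H2O/h1H2"))  # InChI=1S/H2O
```
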

4.1.1.7.5. datanator.util.constants module

4.1.1.7.6. datanator.util.file_util module

class datanator.util.file_util.FileUtil[source]

Bases: object

access_dict_by_index(_dict, count)[source]

Assuming the dictionary is ordered, return its first count items.

Parameters:
  • _dict – dictionary, e.g. { ‘a’:1, ‘b’:2, ‘c’:3, … }
  • count – number of items to return
Returns:result – a dictionary with the first count items from _dict, e.g. {‘a’:1} for count = 1
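
Since Python 3.7, dictionaries preserve insertion order, so "the first count items" is well defined. A minimal sketch with itertools.islice, which avoids copying the whole dictionary; first_items is a hypothetical name, not the FileUtil method itself.

```python
from itertools import islice


def first_items(_dict: dict, count: int) -> dict:
    # Dicts preserve insertion order (Python 3.7+), so "first" means first inserted
    return dict(islice(_dict.items(), count))


print(first_items({"a": 1, "b": 2, "c": 3}, 2))  # {'a': 1, 'b': 2}
```
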
extract_values(obj, key)[source]

Pull all values of the specified key from a nested JSON object.
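
Extracting every value of a key from nested JSON amounts to a recursive walk over dicts and lists. A minimal sketch under that reading; whether FileUtil.extract_values also collects values that are themselves dicts or lists is not stated, and this sketch assumes it does.

```python
def extract_values(obj, key):
    """Collect every value stored under key, at any depth, in dicts and lists."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    found.append(v)
                if isinstance(v, (dict, list)):
                    walk(v)  # keep searching inside nested containers
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(obj)
    return found


print(extract_values({"a": 1, "b": [{"a": 2}, {"c": {"a": 3}}]}, "a"))  # [1, 2, 3]
```
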

flatten_json(nested_json)[source]

Flatten json object with nested keys into a single level. e.g. {a: b, {a: b,

c: [ d: e,
{d: e}, => f: g } {f: g}]}
Parameters:nested_json – A nested json object.
Returns:The flattened json object if successful, None otherwise.
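
The example above hoists every leaf key to the top level and drops keys, such as c, whose values are only containers. A minimal recursive sketch reproducing that example; it assumes leaf keys are kept without any parent-key prefix, so a later duplicate key overwrites an earlier one, and it omits the "None otherwise" error path.

```python
def flatten_json(nested_json):
    """Hoist every leaf (non-dict, non-list) key/value pair to the top level."""
    out = {}

    def walk(node, name=None):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, k)
        elif isinstance(node, list):
            for item in node:
                walk(item, name)
        else:
            # Keys whose values are containers (like c) never reach this branch
            out[name] = node

    walk(nested_json)
    return out


print(flatten_json({"a": "b", "c": [{"d": "e"}, {"f": "g"}]}))
# {'a': 'b', 'd': 'e', 'f': 'g'}
```
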
get_common(list1, list2)[source]

Given two lists, find the closest common ancestor.

Parameters:
  • list1 – e.g. [a, b, c, f, g]
  • list2 – e.g. [a, b, d, e]
Returns:result – the closest common ancestor; in the example above, b
make_dict(keys, values)[source]

Given two lists, make a dictionary.

Parameters:
  • keys – e.g. [a, b, c, d, …]
  • values – e.g. [1, 2, 3, 4]
Returns:{‘a’: 1, ‘b’: 2, ‘c’: 3, …}
Return type:dict
replace_dict_key(_dict, replacements)[source]

Replace the keys of a dictionary, in order, with those in replacements, e.g. {‘a’: 0, ‘b’: 1, ‘c’: 2}, [‘d’, ‘e’, ‘f’] => {‘d’: 0, ‘e’: 1, ‘f’: 2}

Parameters:
  • _dict – dictionary whose keys are to be replaced
  • replacements – list of replacement keys
Returns:result – dictionary with replaced keys
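
Because dictionaries keep insertion order, replacing keys positionally reduces to zipping the replacement keys with the existing values. A minimal sketch of that idea; it is not necessarily the FileUtil code, and zip silently stops at the shorter input, so surplus keys or values are dropped.

```python
def replace_dict_key(_dict: dict, replacements: list) -> dict:
    # Pair the i-th replacement key with the i-th value, relying on dict order;
    # zip stops at the shorter input, so surplus keys or values are dropped
    return dict(zip(replacements, _dict.values()))


print(replace_dict_key({"a": 0, "b": 1, "c": 2}, ["d", "e", "f"]))  # {'d': 0, 'e': 1, 'f': 2}
```
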
unpack_list(_list)[source]

Unpack the sublists in a list.

Parameters:_list – a list containing sublists, e.g. [ […], […], … ]
Returns:result – the unpacked list, e.g. [ … ]
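
Unpacking one level of nesting is what itertools.chain.from_iterable does. A minimal sketch; it assumes every element of _list is itself iterable and flattens exactly one level, so deeper nesting is left intact.

```python
from itertools import chain


def unpack_list(_list: list) -> list:
    # Flattens exactly one level of nesting
    return list(chain.from_iterable(_list))


print(unpack_list([[1, 2], [3], []]))  # [1, 2, 3]
```
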

4.1.1.7.7. datanator.util.index_collection module

Create indexes on collections in MongoDB.

class datanator.util.index_collection.IndexCollection(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

index_corum(collection_str)[source]

Index fields in corum collection

index_intact_complex(collection_str='intact_complex')[source]

Index intact_complex collection

index_metabolites_meta(collection_str='metabolites_meta')[source]

Index metabolites_meta collection

index_pax(collection_str='pax')[source]

Index Pax collection

index_sabio(collection_str='sabio_rk')[source]

Index relevant fields in sabio_rk collection

index_strdb(collection_str='ecmdb')[source]

Index relevant fields in string only collections: ecmdb, ymdb, and intact_interaction

index_uniprot(collection_str='uniprot')[source]

Index uniprot collection

datanator.util.index_collection.main()[source]

4.1.1.7.8. datanator.util.molecule_util module

Utilities for dealing with molecules

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan <jonrkarr@gmail.com>
Date:2017-04-12
Copyright:2017, Karr Lab
License:MIT
class datanator.util.molecule_util.InchiMolecule(structure)[source]

Bases: object

Represents the InChI-encoded structure of a molecule

formula[source]

empirical formula layer

Type:str
connections[source]

atomic connections (c) layer

Type:str
hydrogens[source]

hydrogen (h) layer

Type:str
protons[source]

proton (p) layer

Type:str
charge[source]

charge (q) layer

Type:str
double_bonds[source]

double bonds (b) layer

Type:str
stereochemistry[source]

stereochemistry (t) layer

Type:str
stereochemistry_parity[source]

stereochemistry parity (m) layer

Type:str
stereochemistry_type[source]

stereochemistry type (s) layer

Type:str
isotopes[source]

isotope (i) layer

Type:str
fixed_hydrogens[source]

fixed hydrogens (f) layer

Type:str
reconnected_metals[source]

reconnected metal (r) layer

Type:str
LAYERS[source]

dictionary of layer prefixes and names

Type:dict
LAYERS = {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type', 't': 'stereochemistry'}[source]
__str__()[source]

Generate an InChI string representation of the molecule

Returns:InChI string representation of the molecule
Return type:str
get_formula_and_connectivity()[source]

Get a string representation of the formula and connectivity

Returns:string representation of the formula and connectivity
Return type:str
is_equal(other, check_protonation=True, check_double_bonds=True, check_stereochemistry=True, check_isotopes=True, check_fixed_hydrogens=True, check_reconnected_metals=True)[source]

Determine if two molecules are semantically equal (all of their layers are equal).

Parameters:
  • other (InchiMolecule) – other molecule
  • check_protonation (bool, optional) – if True, check that the protonation states (h, p, q) are equal
  • check_double_bonds (bool, optional) – if True, check that the double-bond layers (b) are equal
  • check_stereochemistry (bool, optional) – if True, check that the stereochemistry layers (t, m, s) are equal
  • check_isotopes (bool, optional) – if True, check that the isotopic layers (i) are equal
  • check_fixed_hydrogens (bool, optional) – if True, check that the fixed hydrogen layers (f) are equal
  • check_reconnected_metals (bool, optional) – if True, check that the reconnected metal layers (r) are equal
Returns:True if the molecules are semantically equal
Return type:bool
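
is_equal compares two structures layer by layer, and each check_* flag removes a group of layers from the comparison. A minimal sketch operating on raw InChI strings rather than InchiMolecule objects, assuming layers are split on "/" and keyed by their prefix letter as in LAYERS. The names layers_of, inchi_layers_equal and CHECK_GROUPS are hypothetical, and only the first occurrence of each layer is kept, so sublayers repeated inside a fixed-hydrogen (/f) section are ignored.

```python
LAYER_NAMES = {"": "formula", "b": "double_bonds", "c": "connections",
               "f": "fixed_hydrogens", "h": "hydrogens", "i": "isotopes",
               "m": "stereochemistry_parity", "p": "protons", "q": "charge",
               "r": "reconnected_metals", "s": "stereochemistry_type",
               "t": "stereochemistry"}

# Layers that drop out of the comparison when a check_* flag is False
CHECK_GROUPS = {
    "check_protonation": {"hydrogens", "protons", "charge"},
    "check_double_bonds": {"double_bonds"},
    "check_stereochemistry": {"stereochemistry", "stereochemistry_parity",
                              "stereochemistry_type"},
    "check_isotopes": {"isotopes"},
    "check_fixed_hydrogens": {"fixed_hydrogens"},
    "check_reconnected_metals": {"reconnected_metals"},
}


def layers_of(inchi):
    """Map layer names to layer contents; parts[0] is the "InChI=1S" prefix."""
    parts = inchi.split("/")
    layers = {"formula": parts[1]}
    for part in parts[2:]:
        name = LAYER_NAMES.get(part[:1])
        if name is not None and name not in layers:  # keep first occurrence only
            layers[name] = part[1:]
    return layers


def inchi_layers_equal(inchi1, inchi2, **checks):
    """Compare two InChI strings layer by layer; every check defaults to True."""
    ignored = set()
    for flag, names in CHECK_GROUPS.items():
        if not checks.get(flag, True):
            ignored |= names
    a = {k: v for k, v in layers_of(inchi1).items() if k not in ignored}
    b = {k: v for k, v in layers_of(inchi2).items() if k not in ignored}
    return a == b


water, hydroxide = "InChI=1S/H2O/h1H2", "InChI=1S/H2O/h1H2/p-1"
print(inchi_layers_equal(water, hydroxide))                          # False: p layer differs
print(inchi_layers_equal(water, hydroxide, check_protonation=False))  # True
```

Under the same assumptions, a protonation-isomer test could be expressed as "equal with check_protonation=False but not equal with all checks", which is how water and hydroxide behave above.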

is_protonation_isomer(other)[source]

Determine if two molecules are protonation isomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are protonation isomers
Return type:bool
is_stereoisomer(other)[source]

Determine if two molecules are stereoisomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are stereoisomers
Return type:bool
is_tautomer(other)[source]

Determine if two molecules are tautomers

Parameters:other (InchiMolecule) – other molecule
Returns:True if the molecules are tautomers
Return type:bool
remove_layer(layer)[source]

Remove a layer from a structure

Parameters:layer (str) – name of the layer
class datanator.util.molecule_util.Molecule(id='', name='', structure='', cross_references=None)[source]

Bases: object

Represents a molecule

id[source]

identifier

Type:str
name[source]

name

Type:str
structure[source]

structure in InChI, MOL, or canonical SMILES format

Type:str
cross_references[source]

list of cross references

Type:list of CrossReference
get_fingerprint(type='fp2')[source]

Calculate a fingerprint

Parameters:type (str, optional) – fingerprint type to calculate
Returns:fingerprint
Return type:pybel.Fingerprint
static get_fingerprint_types()[source]

Get list of fingerprint types

Returns:list of fingerprint types
Return type:list of str
get_format()[source]

Get the format of the structure

Returns:format
Return type:str
get_similarity(other, fingerprint_type='fp2')[source]

Calculate the similarity with another molecule

Parameters:
  • other (Molecule) – a second molecule
  • fingerprint_type (str, optional) – fingerprint type used to calculate the similarity
Returns:the similarity with the other molecule
Return type:float

to_format(format)[source]

Get the structure in a format

Parameters:format (str) – format such as inchi, mol, or smiles
Returns:structure in the requested format
Return type:str
to_inchi()[source]

Get the structure in InChI format

Returns:structure in InChI format
Return type:str
to_mol()[source]

Get the structure in MOL format

Returns:structure in MOL format
Return type:str
to_openbabel()[source]

Create an Open Babel molecule for the molecule

Returns:Open Babel molecule
Return type:openbabel.OBMol
to_pybel()[source]

Create a pybel molecule for the molecule

Returns:pybel molecule
Return type:pybel.Molecule
to_smiles()[source]

Get the structure in SMILES format

Returns:structure in SMILES format
Return type:str

4.1.1.7.9. datanator.util.mongo_util module

class datanator.util.mongo_util.MongoUtil(cache_dirname=None, MongoDB=None, replicaSet=None, db='test', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: object

con_db(collection_str)[source]
fill_db(collection_str, sym_link=False)[source]

Check whether the collection is already in MongoDB. If it is, do nothing; otherwise, load the data into the database from quiltdata (karrlab/datanator_nosql).

Parameters:
  • collection_str (str) – name of the collection (e.g. ‘ecmdb’, ‘pax’, etc.)
  • sym_link (bool, optional) – whether the download should be a symbolic link

flatten_collection(collection_str)[source]

Flatten a collection. Keys whose values are only nested objects, such as c in {a: b, c: [{d: e}, {f: g}]}, are omitted because they have no non-object value associated with them.

list_all_collections()[source]

List all non-system collections within the database

print_schema(collection_str)[source]

Print out the schema of a collection. The ‘_id’ field is left out because of its object type and because every document has one.

4.1.1.7.10. datanator.util.reaction_util module

Utilities for dealing with reactions

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan <jonrkarr@gmail.com>
Date:2017-04-13
Copyright:2017, Karr Lab
License:MIT
datanator.util.reaction_util.calc_reactant_product_pairs(reaction)[source]

Get list of pairs of similar reactants and products using a greedy algorithm.

Parameters:reaction (data_model.Reaction) – reaction
Returns:list of pairs of similar reactants and products
Return type:list of tuple of (data_model.Specie, data_model.Specie)
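
The documentation says only that the pairing is greedy. One common greedy scheme scores every reactant/product pair, then accepts pairs from the highest score down, skipping any species already used. A minimal sketch of that scheme with plain strings; in datanator the items would be data_model.Specie objects and the similarity presumably structural, so greedy_pairs and the toy char_similarity are hypothetical stand-ins, not the library's code.

```python
def greedy_pairs(reactants, products, similarity):
    """Greedily pair reactants with products, most similar pairs first."""
    scored = [(similarity(r, p), i, j)
              for i, r in enumerate(reactants)
              for j, p in enumerate(products)]
    # Stable sort on score only, so ties keep their enumeration order
    scored.sort(key=lambda t: t[0], reverse=True)
    used_r, used_p, pairs = set(), set(), []
    for _, i, j in scored:
        if i in used_r or j in used_p:
            continue  # each species joins at most one pair
        used_r.add(i)
        used_p.add(j)
        pairs.append((reactants[i], products[j]))
    return pairs


def char_similarity(a, b):
    """Toy stand-in for a structural similarity: Jaccard index of characters."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


print(greedy_pairs(["ATP", "H2O"], ["ADP", "Pi"], char_similarity))
# [('ATP', 'ADP'), ('H2O', 'Pi')]
```

Greedy matching is fast but not guaranteed to maximize total similarity; when the two lists differ in length, the surplus species stay unpaired.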

4.1.1.7.11. datanator.util.rna_seq_util module

Utilities for RNA-seq data

Author:Jonathan Karr <jonrkarr@gmail.com>
Author:Yosef Roth <yosefdroth@gmail.com>
Date:2018-01-15
Copyright:2018, Karr Lab
License:MIT
class datanator.util.rna_seq_util.Kallisto[source]

Bases: object

Python interface to kallisto.

index(fasta_filenames, index_filename=None, kmer_size=31, make_unique=False)[source]

Generate index from FASTA files

Parameters:
  • fasta_filenames (list of str) – paths to FASTA files
  • index_filename (str, optional) – path to the kallisto index file to be created
  • kmer_size (int, optional) – k-mer length
  • make_unique (bool, optional) – if True, replace repeated target names with unique names
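
Kallisto.index wraps the kallisto index command. A sketch that only assembles the argument list, assuming the kallisto 0.4x command-line flags -i (index file), -k (k-mer size) and --make-unique; kallisto_index_args is hypothetical and not necessarily how the wrapper builds its command.

```python
def kallisto_index_args(fasta_filenames, index_filename, kmer_size=31, make_unique=False):
    """Assemble (but do not run) a `kallisto index` command line."""
    args = ["kallisto", "index", "-i", index_filename, "-k", str(kmer_size)]
    if make_unique:
        # Replace repeated target names with unique names
        args.append("--make-unique")
    return args + list(fasta_filenames)


print(kallisto_index_args(["transcripts.fa"], "transcripts.idx"))
# ['kallisto', 'index', '-i', 'transcripts.idx', '-k', '31', 'transcripts.fa']
```

The list can be passed to subprocess.run(args, check=True) on a machine where kallisto is installed.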
quant(fastq_filenames, index_filename=None, output_dirname=None, bias=False, bootstrap_samples=0, seed=42, plaintext=False, fusion=False, single_end_reads=False, forward_stranded=False, reverse_stranded=False, fragment_length=None, fragment_length_std=None, threads=1, pseudobam=False)[source]

Process RNA-seq FASTQ files

Parameters:
  • fastq_filenames (list of str) – paths to FASTQ files
  • index_filename (str, optional) – path to the kallisto index file to be used for quantification
  • output_dirname (str, optional) – path to the output directory
  • single_end_reads (bool, optional) – if True, quantify single-end reads
  • fragment_length (float, optional) – estimated average fragment length
  • fragment_length_std (float, optional) – estimated standard deviation of fragment length
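
For single-end reads, kallisto cannot estimate the fragment length distribution, so quant needs both fragment_length and fragment_length_std. A sketch that assembles a kallisto quant command line, assuming the kallisto 0.4x flags -i, -o, -b, --seed, -t, --single, -l and -s; kallisto_quant_args is hypothetical and covers only a subset of the parameters of Kallisto.quant (bias, plaintext, fusion, strandedness and pseudobam options are omitted).

```python
def kallisto_quant_args(fastq_filenames, index_filename, output_dirname,
                        single_end_reads=False, fragment_length=None,
                        fragment_length_std=None, bootstrap_samples=0,
                        seed=42, threads=1):
    """Assemble (but do not run) a `kallisto quant` command line."""
    args = ["kallisto", "quant", "-i", index_filename, "-o", output_dirname,
            "-b", str(bootstrap_samples), "--seed", str(seed), "-t", str(threads)]
    if single_end_reads:
        # Fragment length cannot be inferred from single-end reads,
        # so the average and standard deviation must be supplied
        if fragment_length is None or fragment_length_std is None:
            raise ValueError("single-end reads require fragment_length and fragment_length_std")
        args += ["--single", "-l", str(fragment_length), "-s", str(fragment_length_std)]
    return args + list(fastq_filenames)


print(kallisto_quant_args(["r1.fastq", "r2.fastq"], "transcripts.idx", "out"))
```
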

4.1.1.7.12. datanator.util.server_util module

class datanator.util.server_util.ServerUtil(config_file=None, username=None, password=None, server=None, port=None, verbose=True)[source]

Bases: object

Utility class for reading authentication files used to connect to MongoDB servers on AWS. A configuration file has the form:

[user]
User = some-user
Password = some-password
Server = server-address
Port = port-number
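
A file in the form above is INI-style, so the standard library's configparser can read it. A minimal sketch, assuming each section is named after a user (get_user_config defaults to 'admin') and that Port is an integer; read_server_config is hypothetical. ServerUtil itself takes a config_file path, for which ConfigParser.read(path) would replace read_string.

```python
import configparser


def read_server_config(text: str, username: str = "admin") -> dict:
    """Read one user's connection settings from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[username]  # option names are case-insensitive in configparser
    return {
        "user": section.get("User"),
        "password": section.get("Password"),
        "server": section.get("Server"),
        "port": section.getint("Port"),
    }


sample = "[admin]\nUser = some-user\nPassword = some-password\nServer = example.org\nPort = 27017\n"
print(read_server_config(sample)["port"])  # 27017
```
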

get_user_config(username='admin')[source]

4.1.1.7.13. datanator.util.standardize_util module

Standardize keys to a uniform nomenclature

class datanator.util.standardize_util.StandardizeUtil(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf)[source]

Bases: datanator.util.mongo_util.MongoUtil

standardize_metabolite()[source]
standardize_sabio()[source]

4.1.1.7.14. datanator.util.taxonomy_util module

Utilities for dealing with taxa

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan <jonrkarr@gmail.com>
Date:2017-04-11
Copyright:2017, Karr Lab
License:MIT
class datanator.util.taxonomy_util.Taxon(id='', name='', ncbi_id=None, cross_references=None)[source]

Bases: object

Represents a taxon such as a genus, species, or strain

id[source]

identifier

Type:str
name[source]

name of the taxon

Type:str
id_of_nearest_ncbi_taxon[source]

ID of the nearest parent taxon which is in the NCBI database

Type:int
distance_from_nearest_ncbi_taxon[source]

distance from the taxon to its nearest parent which is in the NCBI database

Type:int
additional_name_beyond_nearest_ncbi_taxon[source]

additional part of the taxon’s name beyond that of its nearest parent in the NCBI database

Type:str
cross_references[source]

list of cross references

Type:list of CrossReference
get_common_ancestor(other)[source]

Get the latest common ancestor of two taxa

Parameters:other (Taxon) – a second taxon
Returns:latest common ancestor
Return type:Taxon
get_distance_to_common_ancestor(other)[source]

Calculate the number of links in the NCBI taxonomic tree between two taxa and their latest common ancestor

Note: This distance depends on the granularity of the lineage of the taxon. For example, there are only 7 links between most bacterial species and the Bacteria superkingdom. However, there are 28 links between the Homo sapiens species and the Eukaryota superkingdom.

Parameters:other (Taxon) – a second taxon
Returns:number of links between self and its latest common ancestor with other in the NCBI taxonomic tree
Return type:int
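
If each taxon's lineage is available as a list running from the root down to the taxon itself, the latest common ancestor is the last entry of the shared prefix, and the distance is the number of steps from the taxon up to that entry. A minimal sketch under that assumption; the NCBI lookup is not shown, common_ancestor_and_distance is hypothetical, and the lineages below are abbreviated for illustration (real NCBI lineages have more ranks, which is exactly why the link counts in the note above vary).

```python
def common_ancestor_and_distance(lineage1, lineage2):
    """Return (latest common ancestor, links from lineage1's taxon up to it).

    Each lineage lists taxa from the root down to the taxon itself.
    """
    lca_index = -1
    for i, (a, b) in enumerate(zip(lineage1, lineage2)):
        if a != b:
            break
        lca_index = i
    if lca_index < 0:
        return None, None  # the lineages do not share a root
    # Each step down the lineage is one link in the taxonomic tree
    return lineage1[lca_index], len(lineage1) - 1 - lca_index


ecoli = ["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"]
bsub = ["root", "Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"]
print(common_ancestor_and_distance(ecoli, bsub))  # ('Bacteria', 3)
```
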
get_distance_to_root()[source]

Get the distance from the taxon to the root of the NCBI taxonomy tree

Returns:distance from the taxon to the root
Return type:int
get_max_distance_to_common_ancestor()[source]

Get the maximum distance from the taxon to a common ancestor with another taxon

Returns:maximum distance from the taxon to a common ancestor with another taxon
Return type:int
get_ncbi_id()[source]

Get the ID of the taxon within the NCBI database

Returns:ID of the taxon within the NCBI database, or None if the taxon isn’t in the NCBI database
Return type:int or None
get_parent_taxa()[source]

Get parent taxa

Returns:list of parent taxa
Return type:list of Taxon
get_rank()[source]

Get the rank of the taxon

Returns:rank of the taxon
Return type:str
datanator.util.taxonomy_util.setup_database(force_update=False)[source]

Set up a local SQLite copy of the NCBI Taxonomy database. If force_update is False, download the content from NCBI and build the SQLite database only if a local database doesn’t already exist. If force_update is True, always download the content from NCBI and rebuild the SQLite copy of the database.

Parameters:force_update (bool, optional) –
  • False: only download the content for the database and build a local SQLite database if a local SQLite copy of the database doesn’t already exist
  • True: always download the content for the database from NCBI and rebuild the local SQLite database

4.1.1.7.15. datanator.util.warning_util module

Warning utilities

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-04-13
Copyright:2017, Karr Lab
License:MIT
datanator.util.warning_util.disable_warnings()[source]

Disable warning messages from openbabel and urllib

datanator.util.warning_util.enable_warnings()[source]

Enable warning messages from openbabel and urllib

4.1.1.7.16. Module contents