4.1.1.7. datanator.util package¶

4.1.1.7.1. Submodules¶

4.1.1.7.2. datanator.util.build_util module¶

datanator.util.build_util.continuousload(method)[source]¶

datanator.util.build_util.timeloadcontent(method)[source]¶

datanator.util.build_util.timemethod(method)[source]¶

4.1.1.7.3. datanator.util.calc_tanimoto module¶

class datanator.util.calc_tanimoto.CalcTanimoto(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

Calculates the Tanimoto similarity matrix for two compound collections, e.g., ECMDB and YMDB

get_tanimoto(mol1, mol2, str_format='inchi', rounding=3)[source]¶

Calculates the Tanimoto coefficient between two molecules, mol1 and mol2.

Parameters:

mol1 – molecule 1 in some format
mol2 – molecule 2 in the same format as molecule 1
str_format – format for the molecular representation; supported formats are provided by Pybel
rounding – rounding of the final result

Returns:	tani – rounded Tanimoto coefficient
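As a rough illustration of the quantity being computed, the Tanimoto coefficient of two fingerprints is the number of shared on-bits divided by the number of distinct on-bits. The sketch below uses plain Python sets as stand-in fingerprints; it does not call Pybel, and the function name `tanimoto` and the handling of two empty fingerprints are assumptions made for this example.

```python
def tanimoto(bits_a, bits_b, rounding=3):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return round(len(a & b) / len(union), rounding)


# 2 shared bits out of 5 distinct bits
print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
```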

many_to_many(collection_str1='metabolites_meta', collection_str2='metabolites_meta', field1='inchi_deprot', field2='inchi_deprot', lookup1='inchi_hashed', lookup2='inchi_hashed', num=100)[source]¶

Go through collection_str1 and assign each compound its top ‘num’ most similar compounds.

Parameters:

collection_str1 – collection from which the compound is drawn
collection_str2 – collection in which the comparison is made
field1 – field of interest in collection_str1
field2 – field of interest in collection_str2
num – number of most similar compounds
batch_size – batch size for each server round trip

one_to_many(inchi, collection_str='metabolites_meta', field='inchi_deprot', lookup='inchi_hashed', num=100)[source]¶

Calculates Tanimoto coefficients between one metabolite and the rest of ‘collection_str’.

Parameters:

inchi – chosen chemical compound in InChI format
collection_str – collection in which comparisons are made
field – field that holds the chemical structure
lookup – field that has previously been indexed
num – maximum number of compounds to return, sorted by Tanimoto coefficient

Returns:	sorted_coeff – sorted numpy array of the top num Tanimoto coefficients; sorted_inchi – sorted top num InChI strings
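The "score everything, keep the top num" pattern described here can be sketched without a database. In the example below the candidates are an in-memory list rather than a MongoDB collection, and `top_similar` and `char_jaccard` are hypothetical helpers introduced only for illustration; the real method compares chemical structures, not characters.

```python
import heapq


def char_jaccard(a, b):
    """Toy similarity: Jaccard index of the character sets of two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def top_similar(query, candidates, similarity, num=100):
    """Return (coefficients, items) for the `num` candidates most similar to `query`."""
    scored = ((similarity(query, c), c) for c in candidates)
    best = heapq.nlargest(num, scored, key=lambda pair: pair[0])
    return [score for score, _ in best], [item for _, item in best]


coeffs, items = top_similar("abc", ["abd", "xyz", "abc"], char_jaccard, num=2)
print(coeffs, items)
```

`heapq.nlargest` avoids sorting the full candidate list when num is much smaller than the collection.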

datanator.util.calc_tanimoto.main()[source]¶

4.1.1.7.4. datanator.util.chem_util module¶

class datanator.util.chem_util.ChemUtil[source]¶

Bases: object

hash_inchi(inchi='InChI = None')[source]¶

Hash an InChI string using SHA-224

simplify_inchi(inchi='InChI = None')[source]¶

Remove a molecule’s protonation state, e.g., “InChI=1S/H2O/h1H2” => “InChI=1S/H2O”
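The two operations can be approximated with the standard library. This is a sketch rather than the package's implementation: it assumes that "protonation state" means the h, p, and q layers, and that the hash is returned as a hexadecimal digest. The function names mirror the methods above but are standalone functions here.

```python
import hashlib

# Assumption: these layer prefixes together describe the protonation state.
PROTONATION_PREFIXES = ("h", "p", "q")


def simplify_inchi(inchi):
    """Drop the protonation-related layers from an InChI string."""
    parts = inchi.split("/")
    head, layers = parts[:2], parts[2:]  # head = version prefix + formula
    kept = [layer for layer in layers if not layer.startswith(PROTONATION_PREFIXES)]
    return "/".join(head + kept)


def hash_inchi(inchi):
    """SHA-224 hex digest of an InChI string."""
    return hashlib.sha224(inchi.encode("utf-8")).hexdigest()


print(simplify_inchi("InChI=1S/H2O/h1H2"))
print(hash_inchi("InChI=1S/H2O"))
```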

4.1.1.7.5. datanator.util.constants module¶

4.1.1.7.6. datanator.util.file_util module¶

class datanator.util.file_util.FileUtil[source]¶

Bases: object

access_dict_by_index(_dict, count)[source]¶

Assuming the dictionary is ordered, return its first count items.

Parameters:

_dict – { ‘a’:1, ‘b’:2, ‘c’:3, … }
count – number of items to return

Returns:	result – a dictionary with the first count items from _dict, e.g., {‘a’:1} for count = 1
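A minimal sketch of this behavior, assuming Python 3.7+ where dictionaries preserve insertion order; `first_items` is a hypothetical standalone name, not the method itself.

```python
from itertools import islice


def first_items(d, count):
    """Return a new dict holding the first `count` items of `d` in insertion order."""
    return dict(islice(d.items(), count))


print(first_items({"a": 1, "b": 2, "c": 3}, 1))
```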

extract_values(obj, key)[source]¶

Pull all values of the specified key from nested JSON.
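One way to pull every value of a key out of nested JSON is a recursive walk over dictionaries and lists. The sketch below is an assumption about the intended behavior (for example, it also descends into a matching value when that value is itself a container), not a copy of the method.

```python
def extract_values(obj, key):
    """Collect every value stored under `key` anywhere in a nested dict/list structure."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    found.append(v)
                walk(v)  # keep descending in case of deeper matches
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(obj)
    return found


doc = {"id": 1, "items": [{"id": 2}, {"x": {"id": 3}}]}
print(extract_values(doc, "id"))
```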

flatten_json(nested_json)[source]¶

Flatten a JSON object with nested keys into a single level, e.g.,

{a: b, c: [{d: e}, {f: g}]} => {a: b, d: e, f: g}

Parameters:	nested_json – A nested json object.
Returns:	The flattened json object if successful, None otherwise.
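The documented example maps {a: b, c: [{d: e}, {f: g}]} to {a: b, d: e, f: g}: only keys with non-container values survive, and they keep their own names. A minimal sketch that reproduces that example is shown below; `flatten_leaves` is a hypothetical name, and behaviors the docstring does not specify (duplicate keys overwrite earlier ones, bare scalars inside lists are dropped) are choices made for this sketch.

```python
def flatten_leaves(nested):
    """Collect all key/value pairs whose value is not a dict or list."""
    flat = {}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(value, (dict, list)):
                    walk(value)  # container: descend, do not keep the key itself
                else:
                    flat[key] = value  # later duplicates overwrite earlier ones
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(nested)
    return flat


print(flatten_leaves({"a": "b", "c": [{"d": "e"}, {"f": "g"}]}))
```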

get_common(list1, list2)[source]¶

Given two lists, find the closest common ancestor.

Parameters:

list1 – [a, b, c, f, g]
list2 – [a, b, d, e]

Returns:	result – the closest common ancestor; in the above example, b
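If the two lists are read as root-first lineages (an assumption consistent with the example, where [a, b, c, f, g] and [a, b, d, e] give b), the closest common ancestor is the last element of their shared prefix. `closest_common` is a hypothetical name for this sketch.

```python
def closest_common(list1, list2):
    """Return the last element of the common prefix of two lists, or None."""
    common = None
    for x, y in zip(list1, list2):
        if x != y:
            break
        common = x
    return common


print(closest_common(["a", "b", "c", "f", "g"], ["a", "b", "d", "e"]))
```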

make_dict(keys, values)[source]¶

Given two lists, make a dictionary.

Parameters:

keys – [a, b, c, d, …]
values – [1, 2, 3, 4]

Returns:	dic – {‘a’: 1, ‘b’: 2, ‘c’: 3, …}
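The documented result matches what the built-in `zip` produces when paired with `dict`. A short sketch, assuming extra keys or values beyond the shorter list are simply ignored:

```python
def make_dict(keys, values):
    """Pair keys with values positionally; zip stops at the shorter list."""
    return dict(zip(keys, values))


print(make_dict(["a", "b", "c"], [1, 2, 3]))
```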

replace_dict_key(_dict, replacements)[source]¶

Replace the keys in a dictionary, in order, with those in replacements, e.g., {‘a’: 0, ‘b’: 1, ‘c’: 2}, [‘d’, ‘e’, ‘f’] => {‘d’: 0, ‘e’: 1, ‘f’: 2}

Parameters:

_dict – dictionary whose keys are to be replaced
replacements – list of replacement keys

Returns:	result – dictionary with replaced keys
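A minimal sketch of positional key replacement, relying on dictionaries preserving insertion order (Python 3.7+). The standalone function below is illustrative and assumes the replacement list has one entry per key.

```python
def replace_dict_key(d, replacements):
    """Build a new dict whose i-th key is replacements[i] and whose values are unchanged."""
    return dict(zip(replacements, d.values()))


print(replace_dict_key({"a": 0, "b": 1, "c": 2}, ["d", "e", "f"]))
```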

unpack_list(_list)[source]¶

Unpack the sublists in a list.

Parameters:	_list – a list containing sublists, e.g., [ […], […], … ]
Returns:	result – unpacked list, e.g., [ …. ]
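Unpacking one level of sublists is commonly done with `itertools.chain`. The sketch assumes every element of the input is itself a list and that only one level of nesting is removed; the docstring does not say how deeper nesting is handled.

```python
from itertools import chain


def unpack_list(nested):
    """Concatenate the sublists of `nested` into a single flat list (one level only)."""
    return list(chain.from_iterable(nested))


print(unpack_list([[1, 2], [3], [4, 5]]))
```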

4.1.1.7.7. datanator.util.index_collection module¶

Index collections in MongoDB accordingly

class datanator.util.index_collection.IndexCollection(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

index_corum(collection_str)[source]¶

Index fields in the corum collection

index_intact_complex(collection_str='intact_complex')[source]¶

Index the intact_complex collection

index_metabolites_meta(collection_str='metabolites_meta')[source]¶

Index the metabolites_meta collection

index_pax(collection_str='pax')[source]¶

Index the Pax collection

index_sabio(collection_str='sabio_rk')[source]¶

Index relevant fields in the sabio_rk collection

index_strdb(collection_str='ecmdb')[source]¶

Index relevant fields in string-only collections: ecmdb, ymdb, and intact_interaction

index_uniprot(collection_str='uniprot')[source]¶

Index the uniprot collection

datanator.util.index_collection.main()[source]¶

4.1.1.7.8. datanator.util.molecule_util module¶

Utilities for dealing with molecules

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-04-12
Copyright:	2017, Karr Lab
License:	MIT

class datanator.util.molecule_util.InchiMolecule(structure)[source]¶

Bases: object

Represents the InChI-encoded structure of a molecule

formula[source]¶

empirical formula layer

Type:	`str`

connections[source]¶

atomic connections (c) layer

Type:	`str`

hydrogens[source]¶

hydrogen (h) layer

Type:	`str`

protons[source]¶

proton (p) layer

Type:	`str`

charge[source]¶

charge (q) layer

Type:	`str`

double_bonds[source]¶

double bonds (b) layer

Type:	`str`

stereochemistry[source]¶

stereochemistry (t) layer

Type:	`str`

stereochemistry_parity[source]¶

stereochemistry parity (m) layer

Type:	`str`

stereochemistry_type[source]¶

stereochemistry type (s) layer

Type:	`str`

isotopes[source]¶

isotope (i) layer

Type:	`str`

fixed_hydrogens[source]¶

fixed hydrogens (f) layer

Type:	`str`

reconnected_metals[source]¶

reconnected metal (r) layer

Type:	`str`

LAYERS[source]¶

dictionary of layer prefixes and names

Type:	`dict`

LAYERS = {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type', 't': 'stereochemistry'}[source]
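To show how a prefix-to-name table like LAYERS can drive parsing, the sketch below splits an InChI string on "/" and files each layer under its name. It is not the class's own parser: the function name `parse_inchi_layers`, the choice to store layers without their prefix letter, and letting a repeated prefix overwrite an earlier layer are all assumptions made for this example.

```python
LAYERS = {
    "": "formula", "b": "double_bonds", "c": "connections",
    "f": "fixed_hydrogens", "h": "hydrogens", "i": "isotopes",
    "m": "stereochemistry_parity", "p": "protons", "q": "charge",
    "r": "reconnected_metals", "s": "stereochemistry_type", "t": "stereochemistry",
}


def parse_inchi_layers(inchi):
    """Map layer names to layer contents for a single InChI string."""
    _, _, body = inchi.partition("/")  # drop the "InChI=1S" version prefix
    parts = body.split("/")
    layers = {LAYERS[""]: parts[0]}  # the formula layer has no prefix letter
    for part in parts[1:]:
        if part and part[0] in LAYERS:
            layers[LAYERS[part[0]]] = part[1:]
    return layers


print(parse_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```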

__str__()[source]¶

Generate an InChI string representation of the molecule

Returns:	InChI string representation of the molecule
Return type:	`str`

get_formula_and_connectivity()[source]¶

Get a string representation of the formula and connectivity

Returns:	string representation of the formula and connectivity
Return type:	`str`

is_equal(other, check_protonation=True, check_double_bonds=True, check_stereochemistry=True, check_isotopes=True, check_fixed_hydrogens=True, check_reconnected_metals=True)[source]¶

Determine if two molecules are semantically equal (all of their layers are equal).

Parameters:

other (`InchiMolecule`) – other molecule
check_protonation (`bool`, optional) – if `True`, check that the protonation states (h, p, q) are equal
check_double_bonds (`bool`, optional) – if `True`, check that the double bond layers (b) are equal
check_stereochemistry (`bool`, optional) – if `True`, check that the stereochemistry layers (t, m, s) are equal
check_isotopes (`bool`, optional) – if `True`, check that the isotopic layers (i) are equal
check_fixed_hydrogens (`bool`, optional) – if `True`, check that the fixed hydrogen layers (f) are equal
check_reconnected_metals (`bool`, optional) – if `True`, check that the reconnected metals layers (r) are equal

Returns:	`True` if the molecules are semantically equal
Return type:	`bool`
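The effect of a flag such as check_protonation can be illustrated by comparing two InChI strings layer by layer and skipping the h, p, and q layers when the flag is off. This is a simplified sketch with hypothetical helper names; it handles only one flag and works on raw strings rather than `InchiMolecule` objects.

```python
PROTONATION_PREFIXES = {"h", "p", "q"}


def split_layers(inchi):
    """Map layer prefix letters ('' for the formula) to layer contents."""
    parts = inchi.split("/")[1:]  # drop the version prefix
    layers = {"": parts[0]}
    for part in parts[1:]:
        if part:
            layers[part[0]] = part[1:]
    return layers


def layers_equal(inchi_a, inchi_b, check_protonation=True):
    """Compare two InChI strings layer by layer, optionally ignoring h, p, q."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    if not check_protonation:
        a = {k: v for k, v in a.items() if k not in PROTONATION_PREFIXES}
        b = {k: v for k, v in b.items() if k not in PROTONATION_PREFIXES}
    return a == b


water = "InChI=1S/H2O/h1H2"
hydroxide = "InChI=1S/H2O/h1H2/p-1"
print(layers_equal(water, hydroxide), layers_equal(water, hydroxide, check_protonation=False))
```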

is_protonation_isomer(other)[source]¶

Determine if two molecules are protonation isomers

Parameters:	other (`InchiMolecule`) – other molecule
Returns:	`True` if the molecules are protonation isomers
Return type:	`bool`

is_stereoisomer(other)[source]¶

Determine if two molecules are stereoisomers

Parameters:	other (`InchiMolecule`) – other molecule
Returns:	`True` if the molecules are stereoisomers
Return type:	`bool`

is_tautomer(other)[source]¶

Determine if two molecules are tautomers

Parameters:	other (`InchiMolecule`) – other molecule
Returns:	`True` if the molecules are tautomers
Return type:	`bool`

remove_layer(layer)[source]¶

Remove a layer from a structure

Parameters:	layer (`str`) – name of the layer

class datanator.util.molecule_util.Molecule(id='', name='', structure='', cross_references=None)[source]¶

Bases: object

Represents a molecule

id[source]¶

identifier

Type:	`str`

name[source]¶

name

Type:	`str`

structure[source]¶

structure in InChI, MOL, or canonical SMILES format

Type:	`str`

cross_references[source]¶

list of cross references

Type:	`list` of `CrossReference`

get_fingerprint(type='fp2')[source]¶

Calculate a fingerprint

Parameters:	type (`str`, optional) – fingerprint type to calculate
Returns:	fingerprint
Return type:	`pybel.Fingerprint`

static get_fingerprint_types()[source]¶

Get list of fingerprint types

Returns:	list of fingerprint types
Return type:	`list` of `str`

get_format()[source]¶

Get the format of the structure

Returns:	format
Return type:	`str`

get_similarity(other, fingerprint_type='fp2')[source]¶

Calculate the similarity with another molecule

Parameters:

other (`Molecule`) – a second molecule
fingerprint_type (`str`, optional) – fingerprint type to use to calculate similarity

Returns:	the similarity with the other molecule
Return type:	`float`

to_format(format)[source]¶

Get the structure in a format

Parameters:	format (`str`) – format such as inchi, mol, smiles

Returns:	structure in a format
Return type:	`str`

to_inchi()[source]¶

Get the structure in InChI format

Returns:	structure in InChI format
Return type:	`str`

to_mol()[source]¶

Get the structure in MOL format

Returns:	structure in MOL format
Return type:	`str`

to_openbabel()[source]¶

Create an Open Babel molecule for the molecule

Returns:	Open Babel molecule
Return type:	`openbabel.OBMol`

to_pybel()[source]¶

Create a pybel molecule for the molecule

Returns:	pybel molecule
Return type:	`pybel.Molecule`

to_smiles()[source]¶

Get the structure in SMILES format

Returns:	structure in SMILES format
Return type:	`str`

4.1.1.7.9. datanator.util.mongo_util module¶

class datanator.util.mongo_util.MongoUtil(cache_dirname=None, MongoDB=None, replicaSet=None, db='test', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: object

con_db(collection_str)[source]¶

fill_db(collection_str, sym_link=False)[source]¶

Check whether the collection is already in MongoDB. If it is, do nothing; otherwise, load the data into the database from quiltdata (karrlab/datanator_nosql).

Parameters:

collection_str – name of the collection (e.g., ‘ecmdb’, ‘pax’, etc.)
sym_link – whether the download should be a symbolic link

flatten_collection(collection_str)[source]¶

Flatten a collection

c is omitted because it does not have a non-object value associated with it

list_all_collections()[source]¶

List all non-system collections within the database

print_schema(collection_str)[source]¶

Print out the schema of a collection. ‘_id’ is removed from the collection due to its object type and universality

4.1.1.7.10. datanator.util.reaction_util module¶

Utilities for dealing with reactions

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-04-13
Copyright:	2017, Karr Lab
License:	MIT

datanator.util.reaction_util.calc_reactant_product_pairs(reaction)[source]¶

Get list of pairs of similar reactants and products using a greedy algorithm.

Parameters:	reaction (`data_model.Reaction`) – reaction
Returns:	list of pairs of similar reactants and products
Return type:	`list` of `tuple` of (`data_model.Specie`, `data_model.Specie`)
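One common reading of "greedy" pairing is: score every reactant/product combination, then repeatedly take the best-scoring pair whose members are both still unpaired. The sketch below follows that reading as an assumption; it works on plain strings with a toy character-overlap similarity instead of `data_model.Specie` objects and structural similarity, and all names in it are hypothetical.

```python
def char_jaccard(a, b):
    """Toy similarity: Jaccard index of the character sets of two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def greedy_pairs(reactants, products, similarity):
    """Pair reactants with products, always taking the most similar free pair next."""
    scored = sorted(
        ((similarity(r, p), i, j)
         for i, r in enumerate(reactants)
         for j, p in enumerate(products)),
        reverse=True,
    )
    used_reactants, used_products, pairs = set(), set(), []
    for _, i, j in scored:
        if i in used_reactants or j in used_products:
            continue  # one member is already paired
        used_reactants.add(i)
        used_products.add(j)
        pairs.append((reactants[i], products[j]))
    return pairs


print(greedy_pairs(["ATP", "H2O"], ["ADP", "PO4"], char_jaccard))
```

A greedy pass is not guaranteed to maximize the total similarity over all pairs; it trades optimality for simplicity.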

4.1.1.7.11. datanator.util.rna_seq_util module¶

Utilities for RNA-seq data

Author:	Jonathan Karr <jonrkarr@gmail.com>
Author:	Yosef Roth <yosefdroth@gmail.com>
Date:	2018-01-15
Copyright:	2018, Karr Lab
License:	MIT

class datanator.util.rna_seq_util.Kallisto[source]¶

Bases: object

Python interface to kallisto.

index(fasta_filenames, index_filename=None, kmer_size=31, make_unique=False)[source]¶

Generate index from FASTA files

Parameters:

fasta_filenames (list of str) – paths to FASTA files
index_filename (str, optional) – path to the kallisto index file to be created
kmer_size (int, optional) – k-mer length
make_unique (bool, optional) – if True, replace repeated target names with unique names

quant(fastq_filenames, index_filename=None, output_dirname=None, bias=False, bootstrap_samples=0, seed=42, plaintext=False, fusion=False, single_end_reads=False, forward_stranded=False, reverse_stranded=False, fragment_length=None, fragment_length_std=None, threads=1, pseudobam=False)[source]¶

Process RNA-seq FASTQ files

Parameters:

fastq_filenames (list of str) – paths to FASTQ files
index_filename (str, optional) – path to the kallisto index file to be used for quantification
output_dirname (str, optional) – path to the output directory
single_end_reads (bool, optional) – if True, quantify single-end reads
fragment_length (float, optional) – estimated average fragment length
fragment_length_std (float, optional) – estimated standard deviation of fragment length
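For orientation, the sketch below shows how a subset of these parameters might translate into a `kallisto quant` command line. The flags used (`-i`, `-o`, `-t`, `--single`, `-l`, `-s`) are kallisto's own, but the mapping from the Python arguments to those flags is an assumption about this wrapper, and the helper only builds the argument list without running anything.

```python
def kallisto_quant_args(fastq_filenames, index_filename, output_dirname,
                        single_end_reads=False, fragment_length=None,
                        fragment_length_std=None, threads=1):
    """Build (but do not run) an argument list for `kallisto quant`."""
    args = ["kallisto", "quant", "-i", index_filename,
            "-o", output_dirname, "-t", str(threads)]
    if single_end_reads:
        # kallisto requires a fragment length and its standard deviation for single-end reads
        if fragment_length is None or fragment_length_std is None:
            raise ValueError("single-end reads need fragment_length and fragment_length_std")
        args += ["--single", "-l", str(fragment_length), "-s", str(fragment_length_std)]
    return args + list(fastq_filenames)


print(kallisto_quant_args(["reads_1.fastq", "reads_2.fastq"], "index.idx", "out"))
```

The resulting list could be handed to `subprocess.run(args, check=True)` on a machine where kallisto is installed.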

4.1.1.7.12. datanator.util.server_util module¶

class datanator.util.server_util.ServerUtil(config_file=None, username=None, password=None, server=None, port=None, verbose=True)[source]¶

Bases: object

Utility for reading authentication files used to connect to MongoDB servers on AWS. The file has the following layout:

[user]
User = some-user
Password = some-password
Server = server-address
Port = port-number
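A file with a bracketed section header followed by `Key = value` lines can be read with the standard-library `configparser`. The sketch below parses an in-memory string instead of a file; whether the class itself uses `configparser` is not stated here, `read_user_config` is a hypothetical helper, and the port value 27017 is a placeholder chosen only so the example runs.

```python
import configparser

EXAMPLE = """\
[user]
User = some-user
Password = some-password
Server = server-address
Port = 27017
"""


def read_user_config(text, section="user"):
    """Return (user, password, server, port) from one section of an INI-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    entry = parser[section]
    # Option names are matched case-insensitively by configparser.
    return entry["User"], entry["Password"], entry["Server"], entry.getint("Port")


print(read_user_config(EXAMPLE))
```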

get_user_config(username='admin')[source]¶

4.1.1.7.13. datanator.util.standardize_util module¶

Standardize keys to a uniform nomenclature

class datanator.util.standardize_util.StandardizeUtil(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

standardize_metabolite()[source]¶

standardize_sabio()[source]¶

4.1.1.7.14. datanator.util.taxonomy_util module¶

Utilities for dealing with taxa

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-04-11
Copyright:	2017, Karr Lab
License:	MIT

class datanator.util.taxonomy_util.Taxon(id='', name='', ncbi_id=None, cross_references=None)[source]¶

Bases: object

Represents a taxon such as a genus, species, or strain

id[source]¶

identifier

Type:	`str`

name[source]¶

name of the taxon

Type:	`str`

id_of_nearest_ncbi_taxon[source]¶

ID of the nearest parent taxon which is in the NCBI database

Type:	`int`

distance_from_nearest_ncbi_taxon[source]¶

distance from the taxon to its nearest parent which is in the NCBI database

Type:	`int`

additional_name_beyond_nearest_ncbi_taxon[source]¶

additional part of the taxon’s name beyond that of its nearest parent in the NCBI database

Type:	`str`

cross_references[source]¶

list of cross references

Type:	`list` of `CrossReference`

get_common_ancestor(other)[source]¶

Get the latest common ancestor of two taxa

Parameters:	other (`Taxon`) – a second taxon
Returns:	latest common ancestor
Return type:	`Taxon`

get_distance_to_common_ancestor(other)[source]¶

Calculate the number of links in the NCBI taxonomic tree between two taxa and their latest common ancestor

Note: This distance depends on the granularity of the lineage of the taxon. For example, there are only 7 links between most bacteria species and the Bacteria superkingdom. However, there are 28 links between the Homo sapiens species and the Eukaryota superkingdom.

Parameters:	other (`Taxon`) – a second taxon
Returns:	number of links between `self` and its latest common ancestor with `other` in the NCBI taxonomic tree
Return type:	`int`
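The link count can be sketched with lineages represented as root-first lists that end in the taxon itself. That representation, the function name, and the shortened lineages below are assumptions for illustration; they are not real NCBI lineages and the sketch does not query the NCBI database.

```python
def distance_to_common_ancestor(lineage_self, lineage_other):
    """Number of links from the last entry of lineage_self up to the latest shared ancestor."""
    shared = 0
    for a, b in zip(lineage_self, lineage_other):
        if a != b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("lineages share no ancestor")
    # The latest common ancestor sits at index shared - 1, the taxon at index len - 1.
    return len(lineage_self) - shared


# Illustrative, heavily abbreviated lineages
taxon_a = ["root", "Bacteria", "Proteobacteria", "Escherichia coli"]
taxon_b = ["root", "Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"]
print(distance_to_common_ancestor(taxon_a, taxon_b))
print(distance_to_common_ancestor(taxon_b, taxon_a))
```

The two calls return different values because each counts links only on its own side of the common ancestor, which is why the result depends on how finely each lineage is resolved.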

get_distance_to_root()[source]¶

Get the distance from the taxon to the root of the NCBI taxonomy tree

Returns:	distance from the taxon to the root
Return type:	`int`

get_max_distance_to_common_ancestor()[source]¶

Get the maximum distance from the taxon to a common ancestor with another taxon

Returns:	maximum distance from the taxon to a common ancestor with another taxon
Return type:	`int`

get_ncbi_id()[source]¶

Get the ID of the taxon within the NCBI database

Returns:	ID of the taxon within the NCBI database or `None` if the taxon isn’t in the NCBI database
Return type:	`int` or `None`

get_parent_taxa()[source]¶

Get parent taxa

Returns:	list of parent taxa
Return type:	`list` of `Taxon`

get_rank()[source]¶

Get the rank of the taxon

Returns:	rank of the taxon
Return type:	`str`

datanator.util.taxonomy_util.setup_database(force_update=False)[source]¶

Set up a local SQLite copy of the NCBI Taxonomy database. If force_update is False, download the content from NCBI and build the SQLite database only if a local database doesn’t already exist. If force_update is True, always download the content from NCBI and rebuild the SQLite copy of the database.

Parameters:	force_update (`bool`, optional) – `False`: download the content and build a local SQLite database only if a local copy doesn’t already exist; `True`: always download the content from NCBI and rebuild the local SQLite database
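The build-only-if-missing behavior controlled by force_update can be sketched with `sqlite3`. Everything specific in the example is hypothetical: the function `ensure_local_database`, the `fetch_rows` callback that stands in for the NCBI download, and the one-table schema.

```python
import os
import sqlite3
import tempfile


def ensure_local_database(path, fetch_rows, force_update=False):
    """Build the SQLite file at `path` if it is missing or a rebuild is forced.

    Returns True if the database was (re)built, False if the existing copy was kept.
    """
    if os.path.exists(path) and not force_update:
        return False
    if os.path.exists(path):
        os.remove(path)  # forced rebuild: start from an empty file
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO taxon VALUES (?, ?)", fetch_rows())
        conn.commit()
    finally:
        conn.close()
    return True


with tempfile.TemporaryDirectory() as dirname:
    db_path = os.path.join(dirname, "taxa.sqlite")
    rows = lambda: [(1, "root")]  # stand-in for downloaded content
    print(ensure_local_database(db_path, rows))                     # built
    print(ensure_local_database(db_path, rows))                     # kept
    print(ensure_local_database(db_path, rows, force_update=True))  # rebuilt
```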

4.1.1.7.15. datanator.util.warning_util module¶

Warning utilities

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-04-13
Copyright:	2017, Karr Lab
License:	MIT

datanator.util.warning_util.disable_warnings()[source]¶

Disable warning messages from openbabel and urllib

datanator.util.warning_util.enable_warnings()[source]¶

Enable warning messages from openbabel and urllib

4.1.1.7.16. Module contents¶