4.1.1.7. datanator.util package¶
4.1.1.7.1. Submodules¶
4.1.1.7.2. datanator.util.build_util module¶
4.1.1.7.3. datanator.util.calc_tanimoto module¶
-
class
datanator.util.calc_tanimoto.
CalcTanimoto
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
Calculates the Tanimoto similarity matrix between two compound collections, e.g., ECMDB and YMDB
-
get_tanimoto
(mol1, mol2, str_format='inchi', rounding=3)[source]¶ Calculates the Tanimoto coefficient between two molecules, mol1 and mol2
Parameters:
- mol1 – molecule 1 in some format
- mol2 – molecule 2 in the same format as molecule 1
- str_format – format of the molecular representation; supported formats are those provided by Pybel
- rounding – rounding of the final result
Returns: tani – rounded Tanimoto coefficient
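For binary fingerprints A and B, the Tanimoto coefficient is |A ∩ B| / |A ∪ B|: the number of bits set in both fingerprints divided by the number set in either. The following is a minimal pure-Python sketch that assumes each fingerprint is given as a set of "on" bit positions; the function name tanimoto is hypothetical, and get_tanimoto itself works from Pybel fingerprints (in Pybel, fp1 | fp2 returns this coefficient).

```python
from typing import AbstractSet


def tanimoto(bits1: AbstractSet[int], bits2: AbstractSet[int], rounding: int = 3) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(bits1 | bits2)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    shared = len(bits1 & bits2)
    return round(shared / union, rounding)


print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 2 shared bits / 5 bits in the union -> 0.4
```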
-
many_to_many
(collection_str1='metabolites_meta', collection_str2='metabolites_meta', field1='inchi_deprot', field2='inchi_deprot', lookup1='inchi_hashed', lookup2='inchi_hashed', num=100)[source]¶ Go through collection_str1 and assign each compound the top ‘num’ most similar compounds
Parameters:
- collection_str1 – collection from which compounds are drawn
- collection_str2 – collection in which comparisons are made
- field1 – field of interest in collection_str1
- field2 – field of interest in collection_str2
- num – number of most similar compounds
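A minimal in-memory sketch of the many-to-many search, assuming each collection is a plain dict that maps a compound key (e.g., a hashed InChI) to its fingerprint bits and that the caller supplies the similarity function (here Tanimoto, equivalently Jaccard, on bit sets). The names many_to_many_sketch and jaccard are hypothetical, MongoDB access and the field/lookup parameters are not modeled, and skipping a compound's comparison with itself is an assumption.

```python
import heapq
from typing import AbstractSet, Callable, Dict, List, Tuple

Fingerprint = AbstractSet[int]  # hypothetical representation: set of "on" bits


def many_to_many_sketch(
    collection1: Dict[str, Fingerprint],
    collection2: Dict[str, Fingerprint],
    similarity: Callable[[Fingerprint, Fingerprint], float],
    num: int = 100,
) -> Dict[str, List[Tuple[float, str]]]:
    """For each compound in collection1, keep the `num` most similar compounds in collection2."""
    result = {}
    for key1, fp1 in collection1.items():
        scored = [
            (similarity(fp1, fp2), key2)
            for key2, fp2 in collection2.items()
            if key2 != key1  # assumption: a compound is not compared with itself
        ]
        # nlargest keeps only the top `num` instead of sorting every candidate
        result[key1] = heapq.nlargest(num, scored)
    return result


def jaccard(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


compounds = {"A": frozenset({1, 2, 3}), "B": frozenset({2, 3}), "C": frozenset({7})}
print(many_to_many_sketch(compounds, compounds, jaccard, num=1)["A"])  # [(0.6666666666666666, 'B')]
```

Using heapq.nlargest keeps the per-compound cost close to linear in the size of collection2 when num is small, rather than sorting the whole candidate list.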
-
one_to_many
(inchi, collection_str='metabolites_meta', field='inchi_deprot', lookup='inchi_hashed', num=100)[source]¶ Calculate Tanimoto coefficients between one metabolite and the rest of the ‘collection_str’
Parameters:
- inchi – chosen chemical compound in InChI format
- collection_str – collection in which comparisons are made
- field – field that has the chemical structure
- lookup – field that has been previously indexed
- num – maximum number of compounds to be returned, sorted by Tanimoto coefficient
Returns: sorted_coeff – sorted NumPy array of the top num Tanimoto coefficients; sorted_inchi – the corresponding top num InChI strings, in the same order
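To illustrate the two return values, here is a hedged pure-Python sketch that scores one query fingerprint against a dict of candidates and returns two parallel lists ordered by decreasing coefficient. The representation (sets of "on" bits keyed by InChI) and the name one_to_many_sketch are assumptions; the documented method queries MongoDB and returns the coefficients as a NumPy array.

```python
from typing import AbstractSet, Dict, List, Tuple


def one_to_many_sketch(
    query: AbstractSet[int],
    candidates: Dict[str, AbstractSet[int]],
    num: int = 100,
) -> Tuple[List[float], List[str]]:
    """Return the top `num` Tanimoto coefficients and their InChIs, highest first."""
    scores = []
    for inchi, fp in candidates.items():
        union = len(query | fp)
        scores.append((len(query & fp) / union if union else 0.0, inchi))
    # sort on the coefficient only, highest first; ties keep insertion order
    scores.sort(key=lambda pair: pair[0], reverse=True)
    top = scores[:num]
    sorted_coeff = [coeff for coeff, _ in top]
    sorted_inchi = [inchi for _, inchi in top]
    return sorted_coeff, sorted_inchi


coeffs, inchis = one_to_many_sketch({1, 2, 3}, {"X": {1, 2}, "Y": {1, 2, 3, 4}, "Z": {9}}, num=2)
print(inchis)  # ['Y', 'X']
```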
-
4.1.1.7.4. datanator.util.chem_util module¶
4.1.1.7.5. datanator.util.constants module¶
4.1.1.7.6. datanator.util.file_util module¶
-
class
datanator.util.file_util.
FileUtil
[source]¶ Bases:
object
-
access_dict_by_index
(_dict, count)[source]¶ Assuming the dictionary preserves insertion order, return its first count items
Parameters:
- _dict – e.g., { ‘a’:1, ‘b’:2, ‘c’:3, … }
- count – number of items to return
Returns: result – a dictionary with the first count items from _dict, e.g., {‘a’:1} when count is 1
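Since Python 3.7, built-in dicts preserve insertion order, which is what makes "the first count items" well defined. A minimal sketch of the documented behavior using itertools.islice (not necessarily how FileUtil implements it):

```python
from itertools import islice


def access_dict_by_index(_dict, count):
    """Return a new dict holding the first `count` items of `_dict` in insertion order."""
    return dict(islice(_dict.items(), count))


print(access_dict_by_index({"a": 1, "b": 2, "c": 3}, 1))  # {'a': 1}
```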
-
flatten_json
(nested_json)[source]¶ Flatten a JSON object with nested keys into a single level, e.g., {a: b, c: [{d: e}, {f: g}]} => {a: b, d: e, f: g}
Parameters: nested_json – a nested JSON object
Returns: the flattened JSON object if successful, None otherwise
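A recursive sketch that reproduces the example above: leaf keys of nested dicts, including dicts inside lists, are lifted to the top level without prefixes. Several behaviors here are assumptions rather than documented: later duplicate keys overwrite earlier ones, scalar items inside lists are dropped, and None is returned when the input is not a dict.

```python
from typing import Any, Dict, Optional


def flatten_json(nested_json: Any) -> Optional[Dict[str, Any]]:
    """Lift all leaf key/value pairs of a nested JSON object into one flat dict."""
    if not isinstance(nested_json, dict):
        return None  # assumption: treat non-object input as a failure

    flat: Dict[str, Any] = {}

    def _walk(obj: Any) -> None:
        if isinstance(obj, dict):
            for key, value in obj.items():
                if isinstance(value, (dict, list)):
                    _walk(value)  # descend instead of storing the container
                else:
                    flat[key] = value  # later duplicates overwrite earlier ones
        elif isinstance(obj, list):
            for item in obj:
                _walk(item)  # scalars in lists match neither branch and are ignored

    _walk(nested_json)
    return flat


print(flatten_json({"a": "b", "c": [{"d": "e"}, {"f": "g"}]}))  # {'a': 'b', 'd': 'e', 'f': 'g'}
```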
-
get_common
(list1, list2)[source]¶ Given two lists, find the closest common ancestor
Parameters:
- list1 – e.g., [a, b, c, f, g]
- list2 – e.g., [a, b, d, e]
Returns: result – the closest common ancestor; in the above example, b
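Reading the example as two lineages ordered from the root (a) downward, the closest common ancestor is the last element of their common prefix. A minimal sketch under that assumption (the actual implementation may differ):

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def get_common(list1: Sequence[T], list2: Sequence[T]) -> Optional[T]:
    """Return the last element of the shared prefix of two root-first lineages."""
    result = None
    for x, y in zip(list1, list2):
        if x != y:
            break  # the lineages diverge here
        result = x
    return result  # None if the lineages do not share a root


print(get_common(["a", "b", "c", "f", "g"], ["a", "b", "d", "e"]))  # b
```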
-
make_dict
(keys, values)[source]¶ Given two lists, make a dictionary that maps each key to the corresponding value
Parameters:
- keys – [a, b, c, d, …]
- values – [1, 2, 3, 4]
Returns: dic – {‘a’: 1, ‘b’: 2, ‘c’: 3, …}
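The documented behavior corresponds to pairing the two lists with zip. Note that zip stops at the shorter list, so surplus keys or values are silently dropped; on Python 3.10+, zip(keys, values, strict=True) raises a ValueError instead. A minimal sketch:

```python
def make_dict(keys, values):
    """Map keys[i] -> values[i]; extra items in the longer list are ignored."""
    return dict(zip(keys, values))


print(make_dict(["a", "b", "c", "d"], [1, 2, 3, 4]))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```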
-
replace_dict_key
(_dict, replacements)[source]¶ Replace the keys of a dictionary, in order, with those in replacements, e.g., {‘a’: 0, ‘b’: 1, ‘c’: 2}, [‘d’, ‘e’, ‘f’] => {‘d’: 0, ‘e’: 1, ‘f’: 2}
Parameters:
- _dict – dictionary whose keys are to be replaced
- replacements – list of replacement keys
Returns: result – dictionary with replaced keys
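Because dicts keep insertion order, replacing keys "in order" amounts to zipping the new keys with the existing values. A minimal sketch that returns a new dict and leaves _dict unchanged (whether the real method mutates its argument is not documented here):

```python
def replace_dict_key(_dict, replacements):
    """Return a new dict whose i-th key is replacements[i], keeping the i-th value."""
    return dict(zip(replacements, _dict.values()))


print(replace_dict_key({"a": 0, "b": 1, "c": 2}, ["d", "e", "f"]))  # {'d': 0, 'e': 1, 'f': 2}
```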
-
4.1.1.7.7. datanator.util.index_collection module¶
Index collections in MongoDB
-
class
datanator.util.index_collection.
IndexCollection
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
index_metabolites_meta
(collection_str='metabolites_meta')[source]¶ Index metabolites_meta collection
-
4.1.1.7.8. datanator.util.molecule_util module¶
Utilities for dealing with molecules
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan <jonrkarr@gmail.com> |
Date: | 2017-04-12 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.util.molecule_util.
InchiMolecule
(structure)[source]¶ Bases:
object
Represents the InChI-encoded structure of a molecule
-
LAYERS
= {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type', 't': 'stereochemistry'}[source]
-
__str__
()[source]¶ Generate an InChI string representation of the molecule
Returns: InChI string representation of the molecule Return type: str
-
get_formula_and_connectivity
()[source]¶ Get a string representation of the formula and connectivity
Returns: string representation of the formula and connectivity Return type: str
-
is_equal
(other, check_protonation=True, check_double_bonds=True, check_stereochemistry=True, check_isotopes=True, check_fixed_hydrogens=True, check_reconnected_metals=True)[source]¶ Determine if two molecules are semantically equal (all of their layers are equal).
Parameters:
- other (InchiMolecule) – other molecule
- check_protonation (bool, optional) – if True, check that the protonation states (h, p, q) are equal
- check_double_bonds (bool, optional) – if True, check that the double-bond layers (b) are equal
- check_stereochemistry (bool, optional) – if True, check that the stereochemistry layers (t, m, s) are equal
- check_isotopes (bool, optional) – if True, check that the isotopic layers (i) are equal
- check_fixed_hydrogens (bool, optional) – if True, check that the fixed-hydrogen layers (f) are equal
- check_reconnected_metals (bool, optional) – if True, check that the reconnected-metal layers (r) are equal
Returns: True if the molecules are semantically equal
Return type: bool
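A standard InChI string is a sequence of /-separated layers following the InChI= prefix and version; apart from the formula, each layer starts with a prefix letter, as listed in LAYERS. The sketch below is an illustration rather than the InchiMolecule implementation: it splits an InChI into layers and compares two molecules while skipping chosen layers, which is the idea behind flags such as check_protonation. The helper names are hypothetical, and repeated prefixes (e.g., within a fixed-hydrogen /f section) are handled only crudely.

```python
from typing import AbstractSet, Dict


def parse_inchi_layers(inchi: str) -> Dict[str, str]:
    """Split a standard InChI string into {layer prefix: layer content}.

    The formula layer has no prefix letter and is stored under '' (as in
    InchiMolecule.LAYERS). Repeated prefixes simply overwrite earlier ones
    in this simplified sketch.
    """
    body = inchi.split("=", 1)[1]  # drop the "InChI=" prefix
    _version, formula, *others = body.split("/")
    layers = {"": formula}
    for layer in others:
        layers[layer[0]] = layer[1:]
    return layers


def layers_equal(inchi1: str, inchi2: str, ignore: AbstractSet[str] = frozenset()) -> bool:
    """Compare two InChIs layer by layer, skipping the prefixes in `ignore`."""
    l1 = {k: v for k, v in parse_inchi_layers(inchi1).items() if k not in ignore}
    l2 = {k: v for k, v in parse_inchi_layers(inchi2).items() if k not in ignore}
    return l1 == l2


neutral = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
deprotonated = neutral + "/p-1"  # same structure with a protonation layer added
print(layers_equal(neutral, deprotonated))                          # False
print(layers_equal(neutral, deprotonated, ignore={"h", "p", "q"}))  # True
```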
-
is_protonation_isomer
(other)[source]¶ Determine if two molecules are protonation isomers
Parameters: other (InchiMolecule) – other molecule
Returns: True if the molecules are protonation isomers
Return type: bool
-
is_stereoisomer
(other)[source]¶ Determine if two molecules are stereoisomers
Parameters: other (InchiMolecule) – other molecule
Returns: True if the molecules are stereoisomers
Return type: bool
-
is_tautomer
(other)[source]¶ Determine if two molecules are tautomers
Parameters: other (InchiMolecule) – other molecule
Returns: True if the molecules are tautomers
Return type: bool
-
-
class
datanator.util.molecule_util.
Molecule
(id='', name='', structure='', cross_references=None)[source]¶ Bases:
object
Represents a molecule
-
get_fingerprint
(type='fp2')[source]¶ Calculate a fingerprint
Parameters: type (str, optional) – fingerprint type to calculate
Returns: fingerprint
Return type: pybel.Fingerprint
-
static
get_fingerprint_types
()[source]¶ Get list of fingerprint types
Returns: list of fingerprint types
Return type: list of str
-
get_similarity
(other, fingerprint_type='fp2')[source]¶ Calculate the similarity with another molecule
Parameters:
- other (Molecule) – a second molecule
- fingerprint_type (str, optional) – fingerprint type to use to calculate similarity
Returns: the similarity with the other molecule
Return type: float
-
to_format
(format)[source]¶ Get the structure in the specified format
Parameters: format (str) – format such as inchi, mol, or smiles
Returns: structure in the specified format
Return type: str
-
to_inchi
()[source]¶ Get the structure in InChI format
Returns: structure in InChI format Return type: str
-
to_openbabel
()[source]¶ Create an Open Babel molecule for the molecule
Returns: Open Babel molecule Return type: openbabel.OBMol
-
4.1.1.7.9. datanator.util.mongo_util module¶
-
class
datanator.util.mongo_util.
MongoUtil
(cache_dirname=None, MongoDB=None, replicaSet=None, db='test', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
object
-
fill_db
(collection_str, sym_link=False)[source]¶ Check whether the collection is already in MongoDB. If it is, do nothing; otherwise, load the data into the database from quiltdata (karrlab/datanator_nosql)
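The check-then-load logic can be sketched independently of Quilt. In this hedged sketch, db is any object with pymongo's Database.list_collection_names() method, load_collection is a hypothetical callback standing in for the download from karrlab/datanator_nosql, FakeDatabase is a stand-in used only for the example, and the sym_link option is not modeled.

```python
from typing import Any, Callable


def fill_db_sketch(db: Any, collection_str: str, load_collection: Callable[[str], None]) -> bool:
    """Load `collection_str` only if `db` does not already contain it.

    Returns True if data was loaded and False if the collection already existed.
    """
    if collection_str in db.list_collection_names():
        return False  # already in MongoDB: do nothing
    load_collection(collection_str)  # hypothetical loader, e.g. from a Quilt package
    return True


class FakeDatabase:
    """Stand-in for pymongo.database.Database in this example."""

    def __init__(self, names):
        self._names = list(names)

    def list_collection_names(self):
        return self._names


loaded = []
db = FakeDatabase(["metabolites_meta"])
print(fill_db_sketch(db, "metabolites_meta", loaded.append))  # False
```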
-
4.1.1.7.10. datanator.util.reaction_util module¶
Utilities for dealing with reactions
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan <jonrkarr@gmail.com> |
Date: | 2017-04-13 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
datanator.util.reaction_util.
calc_reactant_product_pairs
(reaction)[source]¶ Get list of pairs of similar reactants and products using a greedy algorithm.
Parameters: reaction (data_model.Reaction) – reaction
Returns: list of pairs of similar reactants and products
Return type: list of tuple of (data_model.Specie, data_model.Specie)
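A greedy pairing can be sketched as follows: score every reactant/product combination, then repeatedly accept the highest-scoring pair whose reactant and product are both still unpaired. This is a generic illustration, not the datanator code; the function name greedy_pairs, the species labels, and the caller-supplied similarity function (here a Tanimoto score on made-up sets of fingerprint bits) are assumptions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def greedy_pairs(
    reactants: Sequence[T],
    products: Sequence[T],
    similarity: Callable[[T, T], float],
) -> List[Tuple[T, T]]:
    """Greedily pair reactants with products in order of decreasing similarity."""
    scored = [
        (similarity(r, p), i, j)
        for i, r in enumerate(reactants)
        for j, p in enumerate(products)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep input order
    used_reactants, used_products = set(), set()
    pairs = []
    for _, i, j in scored:
        if i in used_reactants or j in used_products:
            continue  # one of the two species is already paired
        used_reactants.add(i)
        used_products.add(j)
        pairs.append((reactants[i], products[j]))
    return pairs


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


fps = {"r1": {1, 2, 3}, "r2": {7, 8}, "p1": {7, 8, 9}, "p2": {1, 2}}
print(greedy_pairs(["r1", "r2"], ["p1", "p2"], lambda a, b: tanimoto(fps[a], fps[b])))
# [('r1', 'p2'), ('r2', 'p1')]
```

If the reaction has more reactants than products (or vice versa), the leftover species simply remain unpaired in this sketch.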
4.1.1.7.11. datanator.util.rna_seq_util module¶
Utilities for RNA-seq data
Author: | Jonathan Karr <jonrkarr@gmail.com> |
---|---|
Author: | Yosef Roth <yosefdroth@gmail.com> |
Date: | 2018-01-15 |
Copyright: | 2018, Karr Lab |
License: | MIT |
-
class
datanator.util.rna_seq_util.
Kallisto
[source]¶ Bases:
object
Python interface to kallisto.
-
index
(fasta_filenames, index_filename=None, kmer_size=31, make_unique=False)[source]¶ Generate index from FASTA files
Parameters:
- fasta_filenames (list of str) – paths to FASTA files
- index_filename (str, optional) – path to the kallisto index file to be created
- kmer_size (int, optional) – k-mer length
- make_unique (bool, optional) – if True, replace repeated target names with unique names
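Kallisto.index wraps the kallisto index command-line tool. As a hedged sketch of how the documented parameters map onto that tool's options (-i/--index, -k/--kmer-size, --make-unique), the hypothetical helper below only assembles the argument list; passing it to subprocess.run(cmd, check=True) would execute it if kallisto is installed. This is not the datanator implementation.

```python
from typing import List, Sequence


def build_kallisto_index_command(
    fasta_filenames: Sequence[str],
    index_filename: str,
    kmer_size: int = 31,
    make_unique: bool = False,
) -> List[str]:
    """Assemble a `kallisto index` command line (hypothetical helper)."""
    cmd = ["kallisto", "index", "-i", index_filename, "-k", str(kmer_size)]
    if make_unique:
        cmd.append("--make-unique")  # rename repeated target names
    cmd.extend(fasta_filenames)  # FASTA inputs come last
    return cmd


print(" ".join(build_kallisto_index_command(["transcripts.fa"], "transcripts.idx")))
# kallisto index -i transcripts.idx -k 31 transcripts.fa
```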
-
quant
(fastq_filenames, index_filename=None, output_dirname=None, bias=False, bootstrap_samples=0, seed=42, plaintext=False, fusion=False, single_end_reads=False, forward_stranded=False, reverse_stranded=False, fragment_length=None, fragment_length_std=None, threads=1, pseudobam=False)[source]¶ Process RNA-seq FASTQ files
Parameters:
- fastq_filenames (list of str) – paths to FASTQ files
- index_filename (str, optional) – path to the kallisto index file to be used for quantification
- output_dirname (str, optional) – path to the output directory
- single_end_reads (bool, optional) – if True, quantify single-end reads
- fragment_length (float, optional) – estimated average fragment length
- fragment_length_std (float, optional) – estimated standard deviation of fragment length
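Similarly, a hedged sketch of how some quant parameters map onto kallisto quant options (-i, -o, -t, --single, -l, -s). In single-end mode kallisto requires the fragment length (-l) and its standard deviation (-s), which is why the sketch checks for them. The helper is hypothetical and covers only the parameters documented above plus threads.

```python
from typing import List, Optional, Sequence


def build_kallisto_quant_command(
    fastq_filenames: Sequence[str],
    index_filename: str,
    output_dirname: str,
    single_end_reads: bool = False,
    fragment_length: Optional[float] = None,
    fragment_length_std: Optional[float] = None,
    threads: int = 1,
) -> List[str]:
    """Assemble a `kallisto quant` command line (hypothetical helper)."""
    cmd = ["kallisto", "quant", "-i", index_filename, "-o", output_dirname, "-t", str(threads)]
    if single_end_reads:
        if fragment_length is None or fragment_length_std is None:
            raise ValueError("single-end quantification needs fragment_length and fragment_length_std")
        cmd += ["--single", "-l", str(fragment_length), "-s", str(fragment_length_std)]
    cmd.extend(fastq_filenames)  # FASTQ inputs come last
    return cmd


print(" ".join(build_kallisto_quant_command(
    ["reads.fastq"], "transcripts.idx", "out",
    single_end_reads=True, fragment_length=200, fragment_length_std=20)))
# kallisto quant -i transcripts.idx -o out -t 1 --single -l 200 -s 20 reads.fastq
```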
-
4.1.1.7.12. datanator.util.server_util module¶
-
class
datanator.util.server_util.
ServerUtil
(config_file=None, username=None, password=None, server=None, port=None, verbose=True)[source]¶ Bases:
object
Utility class for reading authentication files used to connect to MongoDB servers on AWS. Example authentication file:
[user]
User = some-user
Password = some-password
Server = server-address
Port = port-number
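The authentication file shown above is in INI format, so Python's standard configparser can read it. A hedged sketch follows; the function name read_server_config, the returned keys, and the example.org/27017 values are assumptions, since ServerUtil's own parsing is not shown in this reference. configparser matches option names case-insensitively, so User and user are equivalent.

```python
import configparser
from typing import Dict, Union


def read_server_config(text: str) -> Dict[str, Union[str, int]]:
    """Parse the [user] section of an authentication file given as a string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # use parser.read(path) for a file on disk
    section = parser["user"]
    return {
        "username": section["User"],
        "password": section["Password"],
        "server": section["Server"],
        "port": section.getint("Port"),
    }


example = """\
[user]
User = some-user
Password = some-password
Server = example.org
Port = 27017
"""
print(read_server_config(example)["port"])  # 27017
```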
4.1.1.7.13. datanator.util.standardize_util module¶
Standardize keys to a uniform nomenclature
4.1.1.7.14. datanator.util.taxonomy_util module¶
Utilities for dealing with taxa
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan <jonrkarr@gmail.com> |
Date: | 2017-04-11 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.util.taxonomy_util.
Taxon
(id='', name='', ncbi_id=None, cross_references=None)[source]¶ Bases:
object
Represents a taxon such as a genus, species, or strain
-
id_of_nearest_ncbi_taxon
[source]¶ ID of the nearest parent taxon which is in the NCBI database
Type: int
-
distance_from_nearest_ncbi_taxon
[source]¶ distance from the taxon to its nearest parent which is in the NCBI database
Type: int
-
additional_name_beyond_nearest_ncbi_taxon
[source]¶ additional part of the taxon’s name beyond that of its nearest parent in the NCBI database
Type: str
-
get_common_ancestor
(other)[source]¶ Get the latest common ancestor of two taxa
Parameters: other (Taxon) – a second taxon
Returns: latest common ancestor
Return type: Taxon
-
get_distance_to_common_ancestor
(other)[source]¶ Calculate the number of links in the NCBI taxonomic tree between two taxa and their latest common ancestor
Note: This distance depends on the granularity of the lineage of the taxon. For example, there are only 7 links between most bacterial species and the Bacteria superkingdom. However, there are 28 links between the Homo sapiens species and the Eukaryota superkingdom.
Parameters: other (Taxon) – a second taxon
Returns: number of links between self and its latest common ancestor with other in the NCBI taxonomic tree
Return type: int
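If each taxon's lineage is available as a list of NCBI taxon IDs ordered from the root down to the taxon itself (an assumption about the data layout), the number of links from self to the latest common ancestor is the length of self's lineage minus the length of the shared prefix. A minimal sketch with a hypothetical function name and made-up IDs:

```python
from typing import Sequence


def distance_to_common_ancestor(lineage_self: Sequence[int], lineage_other: Sequence[int]) -> int:
    """Count links from the last node of lineage_self up to the deepest shared node."""
    shared = 0
    for a, b in zip(lineage_self, lineage_other):
        if a != b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("lineages do not share a root")
    # self sits at index len(lineage_self) - 1 and the ancestor at index shared - 1
    return len(lineage_self) - shared


print(distance_to_common_ancestor([1, 2, 3, 4], [1, 2, 5]))  # 2
```

Under the same layout, the distance to the root of the tree is len(lineage) - 1.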
-
get_distance_to_root
()[source]¶ Get the distance from the taxon to the root of the NCBI taxonomy tree
Returns: distance from the taxon to the root Return type: int
-
get_max_distance_to_common_ancestor
()[source]¶ Get the maximum distance from the taxon to a common ancestor with another taxon
Returns: maximum distance from the taxon to a common ancestor with another taxon Return type: int
-
-
datanator.util.taxonomy_util.
setup_database
(force_update=False)[source]¶ Set up a local SQLite copy of the NCBI Taxonomy database. If force_update is False, download the content from NCBI and build the SQLite database only if a local database doesn’t already exist. If force_update is True, always download the content from NCBI and rebuild the SQLite copy of the database.
Parameters: force_update (bool, optional) –
- False: only download the content for the database and build a local SQLite database if a local SQLite copy of the database doesn’t already exist
- True: always download the content for the database from NCBI and rebuild the local SQLite database
4.1.1.7.15. datanator.util.warning_util module¶
Warning utilities
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-04-13 |
Copyright: | 2017, Karr Lab |
License: | MIT |