4.1.1.5. datanator.util package¶
4.1.1.5.1. Submodules¶
4.1.1.5.2. datanator.util.base26 module¶
Forked from git@github.com:mnowotka/chembl_ikey.git
4.1.1.5.3. datanator.util.build_util module¶
4.1.1.5.4. datanator.util.calc_tanimoto module¶
-
class
datanator.util.calc_tanimoto.
CalcTanimoto
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=True, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
Calculates the Tanimoto similarity matrix for two compound collections, e.g. ECMDB and YMDB
-
get_tanimoto
(mol1, mol2, str_format='inchi', rounding=3)[source]¶ Calculates the Tanimoto coefficient between two molecules, mol1 and mol2
- Parameters
mol1 – molecule 1 in some format
mol2 – molecule 2 in same format as molecule 1
str_format – format of the molecular representation; supported formats are those provided by Pybel
rounding – rounding of the final result
- Returns
tani – rounded Tanimoto coefficient
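As a rough illustration of the quantity involved: the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below assumes fingerprints are given as plain Python sets of on-bit positions rather than Pybel fingerprints; the function name `tanimoto` and the convention for two empty fingerprints are assumptions made for this example only.

```python
def tanimoto(bits1, bits2, rounding=3):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    bits1, bits2 = set(bits1), set(bits2)
    union = bits1 | bits2
    if not union:
        # Assumption: two empty fingerprints are treated as having similarity 0.
        return 0.0
    return round(len(bits1 & bits2) / len(union), rounding)


# Two made-up fingerprints: 2 shared bits out of 5 distinct bits overall.
print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 0.4
```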
-
many_to_many
(collection_str1='metabolites_meta', collection_str2='metabolites_meta', field1='inchi', field2='inchi', lookup1='InChI_Key', lookup2='InChI_Key', num=100)[source]¶ Go through collection_str1 and assign each compound its top ‘num’ most similar compounds
- Parameters
collection_str1 – collection from which each compound is drawn
collection_str2 – collection in which the comparison is made
field1 – field of interest in collection_str1
field2 – field of interest in collection_str2
num – number of most similar compounds
batch_size – batch size for each server round trip
-
one_to_many
(inchi, collection_str='metabolites_meta', field='inchi', lookup='InChI_Key', num=100)[source]¶ Calculate Tanimoto coefficients between one metabolite and the rest of the collection ‘collection_str’
- Parameters
inchi – chosen chemical compound in InChI format
collection_str – collection in which comparisons are made
field – field that holds the chemical structure
lookup – field that has been previously indexed
num – maximum number of compounds to return, sorted by Tanimoto coefficient
- Returns
sorted_coeff – sorted numpy array of the top num Tanimoto coefficients
sorted_inchi – sorted top num InChI strings
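The ranking step described above (keep the num highest coefficients and return them alongside their identifiers) can be sketched without MongoDB or numpy. The helper name `top_similar` and the input mapping of identifier to coefficient are hypothetical; the real method returns a numpy array, whereas this sketch returns plain lists.

```python
import heapq


def top_similar(coefficients, num=100):
    """Return (sorted_coeff, sorted_ids) for the `num` largest coefficients.

    `coefficients` maps a compound identifier to its Tanimoto coefficient.
    """
    best = heapq.nlargest(num, coefficients.items(), key=lambda item: item[1])
    sorted_ids = [identifier for identifier, _ in best]
    sorted_coeff = [coeff for _, coeff in best]
    return sorted_coeff, sorted_ids


# Made-up scores for three compounds; keep the two most similar.
scores = {'compound_a': 0.2, 'compound_b': 0.9, 'compound_c': 0.5}
coeffs, ids = top_similar(scores, num=2)
```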
-
4.1.1.5.5. datanator.util.chem_util module¶
-
class
datanator.util.chem_util.
ChemUtil
[source]¶ Bases:
object
-
inchi_to_inchikey
(szINCHISource)[source]¶ Forked from git@github.com:mnowotka/chembl_ikey.git
-
4.1.1.5.6. datanator.util.constants module¶
4.1.1.5.7. datanator.util.file_util module¶
-
class
datanator.util.file_util.
FileUtil
[source]¶ Bases:
object
-
access_dict_by_index
(_dict, count)[source]¶ Assuming the dictionary is ordered, return its first count items
- Parameters
_dict – dictionary, e.g. {‘a’: 1, ‘b’: 2, ‘c’: 3, …}
count – number of items to return
- Returns
result – a dictionary with the first count items from _dict, e.g. {‘a’: 1}
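One way to obtain this behaviour, shown here as a sketch rather than the library's own code, is to slice the dictionary's item iterator. It relies on Python 3.7+ dictionaries preserving insertion order; `first_items` is a name chosen for this example.

```python
from itertools import islice


def first_items(_dict, count):
    """Return a new dict holding the first `count` items of `_dict`."""
    return dict(islice(_dict.items(), count))


subset = first_items({'a': 1, 'b': 2, 'c': 3}, 1)
print(subset)  # {'a': 1}
```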
-
exists_key_value_pair
(dictionary, k, v)[source]¶ Test if a key/value pair exists in a dictionary
- Parameters
dictionary (dict) – dictionary to be checked
k (str) – key to be matched
v – value to be matched
- Returns
result (bool) – True or False
-
flatten_json
(nested_json)[source]¶ Flatten a JSON object with nested keys into a single level, e.g. {a: b, c: [{d: e}, {f: g}]} => {a: b, d: e, f: g}
- Parameters
nested_json – A nested json object.
- Returns
The flattened json object if successful, None otherwise.
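The example in the description suggests that only leaf keys survive flattening. A minimal recursive sketch consistent with that example follows; how the real method handles key collisions or scalar values inside lists is not stated, so the choices here (a later value overwrites an earlier one, bare scalars in lists are skipped) are assumptions, as is the name `flatten`.

```python
def flatten(nested):
    """Collect every leaf key/value pair of a nested dict/list structure."""
    flat = {}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(value, (dict, list)):
                    walk(value)
                else:
                    # Assumption: a repeated leaf key overwrites the earlier value.
                    flat[key] = value
        elif isinstance(node, list):
            for item in node:
                walk(item)
        # Assumption: scalars that sit directly inside a list have no key and are skipped.

    walk(nested)
    return flat


flattened = flatten({'a': 'b', 'c': [{'d': 'e'}, {'f': 'g'}]})
```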
-
get_common
(list1, list2)[source]¶ Given two lists, find the closest common ancestor
- Parameters
list1 – e.g. [a, b, c, f, g]
list2 – e.g. [a, b, d, e]
- Returns
result – the closest common ancestor; in the above example this would be b
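Reading the two lists as lineages ordered from the root, the closest common ancestor is the last element of their shared prefix. The sketch below implements that reading only; the ordering assumption and the `None` result for lists with no shared prefix are not confirmed by the documentation, and `closest_common_ancestor` is a hypothetical name.

```python
def closest_common_ancestor(list1, list2):
    """Return the last element of the common prefix of two lineages, or None."""
    result = None
    for left, right in zip(list1, list2):
        if left != right:
            break
        result = left
    return result


ancestor = closest_common_ancestor(['a', 'b', 'c', 'f', 'g'], ['a', 'b', 'd', 'e'])
print(ancestor)  # b
```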
-
get_val_from_dict_list
(dict_list, key)[source]¶ Get the values for key from a list of dictionaries
- Parameters
dict_list (list of dict) – list of dictionaries to query
key (str) – key for which to get the value
- Returns
results (list) – list of values
-
make_dict
(keys, values)[source]¶ Given two lists, make a dictionary that maps each key to the value at the same position
- Parameters
keys – [a, b, c, d, …]
values – [1, 2, 3, 4, …]
- Returns
dic – {‘a’: 1, ‘b’: 2, ‘c’: 3, …}
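The documented result matches what pairing the two lists positionally would produce. A minimal sketch, assuming both lists have the same length (with `zip`, extra items in the longer list are silently dropped); `pair_up` is a name used only here.

```python
def pair_up(keys, values):
    """Build a dict by pairing keys and values position by position."""
    return dict(zip(keys, values))


dic = pair_up(['a', 'b', 'c'], [1, 2, 3])
print(dic)  # {'a': 1, 'b': 2, 'c': 3}
```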
-
merge_dict
(dicts)[source]¶ Merge a list of dictionaries
- Parameters
dicts (list of dict) – list of dictionaries
- Returns
result (dict) – merged dictionary
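A simple way to merge a list of dictionaries is to update an accumulator in list order. The documentation does not say which value wins when a key appears more than once; in this sketch the later dictionary wins, which is an assumption, and `merge` is a hypothetical name.

```python
def merge(dicts):
    """Merge a list of dicts into one; later dicts overwrite earlier keys."""
    merged = {}
    for current in dicts:
        merged.update(current)
    return merged


combined = merge([{'a': 1}, {'b': 2}, {'a': 3}])
print(combined)  # {'a': 3, 'b': 2}
```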
-
replace_dict_key
(_dict, replacements)[source]¶ Replace the keys of a dictionary, in order, with the keys in replacements, e.g. {‘a’: 0, ‘b’: 1, ‘c’: 2}, [‘d’, ‘e’, ‘f’] => {‘d’: 0, ‘e’: 1, ‘f’: 2}
- Parameters
_dict – dictionary whose keys are to be replaced
replacements – list of replacement keys
- Returns
result – dictionary with replaced keys
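The documented example can be reproduced by pairing the replacement keys with the existing values in iteration order. This is a sketch under the assumption that `replacements` has one entry per key; `replace_keys` is a name chosen for illustration.

```python
def replace_keys(_dict, replacements):
    """Return a new dict whose keys are `replacements`, keeping values in order."""
    return dict(zip(replacements, _dict.values()))


renamed = replace_keys({'a': 0, 'b': 1, 'c': 2}, ['d', 'e', 'f'])
print(renamed)  # {'d': 0, 'e': 1, 'f': 2}
```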
-
replace_list_dict_key
(_list, replacements)[source]¶ Replace the keys in a list of dictionaries, in order, with the keys in replacements, e.g. [{‘a’: 0}, {‘b’: 1}, {‘c’: 2}], [‘d’, ‘e’, ‘f’] => [{‘d’: 0}, {‘e’: 1}, {‘f’: 2}]
- Parameters
_list (list of dict) – list of dictionaries whose keys are to be replaced
replacements (list) – list of replacement keys
- Returns
result (list of dict) – list of dictionaries with replaced keys
-
search_dict_list
(dict_list, key, value='')[source]¶ Find the dictionaries with a given key/value pair in a list of dictionaries
- Parameters
dict_list (list of dict) – list of dictionaries
key (str) – key in the dictionary
value – value to be matched; if value==None, then only search for key
- Returns
result (list of dict) – list of dictionaries with the key/value pair
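The search described above amounts to a filter over the list. In this sketch, both `None` and the default empty string are treated as "match on the key only"; the documentation mentions only `None`, so the handling of the empty string is an assumption, and `search` is a hypothetical name.

```python
def search(dict_list, key, value=''):
    """Return the dicts that contain `key`, optionally with a matching value."""
    # Assumption: None and the default '' both mean "do not compare values".
    key_only = value is None or value == ''
    return [
        entry for entry in dict_list
        if key in entry and (key_only or entry[key] == value)
    ]


records = [{'id': 1, 'name': 'x'}, {'id': 2}, {'name': 'y'}]
with_name = search(records, 'name')        # both dicts that have a 'name' key
named_x = search(records, 'name', 'x')     # only the first dict
```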
-
4.1.1.5.8. datanator.util.index_collection module¶
Index collections in MongoDB accordingly
-
class
datanator.util.index_collection.
IndexCollection
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
index_metabolites_meta
(collection_str='metabolites_meta')[source]¶ Index metabolites_meta collection
-
4.1.1.5.9. datanator.util.molecule_util module¶
Utilities for dealing with molecules
- Author
Yosef Roth <yosefdroth@gmail.com>
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Date
2017-04-12
- Copyright
2017, Karr Lab
- License
MIT
-
class
datanator.util.molecule_util.
InchiMolecule
(structure)[source]¶ Bases:
object
Represents the InChI-encoded structure of a molecule
-
LAYERS
= {'': 'formula', 'b': 'double_bonds', 'c': 'connections', 'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes', 'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge', 'r': 'reconnected_metals', 's': 'stereochemistry_type', 't': 'stereochemistry'}[source]
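To show how a prefix-to-name mapping like LAYERS can be used, the sketch below splits an InChI string on `/` and labels each layer by its one-letter prefix. It is not the class's parser: `split_layers` is a hypothetical helper, it treats the first layer after the version as the formula, and it ignores the fact that some prefixes can recur inside the fixed-hydrogen or reconnected sections (a repeated prefix simply overwrites the earlier one).

```python
LAYERS = {
    '': 'formula', 'b': 'double_bonds', 'c': 'connections',
    'f': 'fixed_hydrogens', 'h': 'hydrogens', 'i': 'isotopes',
    'm': 'stereochemistry_parity', 'p': 'protons', 'q': 'charge',
    'r': 'reconnected_metals', 's': 'stereochemistry_type',
    't': 'stereochemistry',
}


def split_layers(inchi):
    """Map layer names to layer contents for a simple InChI string."""
    parts = inchi.split('/')
    layers = {}
    if len(parts) > 1:
        layers[LAYERS['']] = parts[1]  # assumption: first layer is the formula
    for part in parts[2:]:
        if part and part[0] in LAYERS:
            layers[LAYERS[part[0]]] = part[1:]
    return layers


# Ethanol
ethanol_layers = split_layers('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
```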
-
__str__
()[source]¶ Generate an InChI string representation of the molecule
- Returns
InChI string representation of the molecule
- Return type
str
-
get_formula_and_connectivity
()[source]¶ Get a string representation of the formula and connectivity
- Returns
string representation of the formula and connectivity
- Return type
str
-
is_equal
(other, check_protonation=True, check_double_bonds=True, check_stereochemistry=True, check_isotopes=True, check_fixed_hydrogens=True, check_reconnected_metals=True)[source]¶ Determine if two molecules are semantically equal (all of their layers are equal).
- Parameters
other (InchiMolecule) – other molecule
check_protonation (bool, optional) – if True, check that the protonation states (h, p, q) are equal
check_double_bonds (bool, optional) – if True, check that the double bond layers (b) are equal
check_stereochemistry (bool, optional) – if True, check that the stereochemistry layers (t, m, s) are equal
check_isotopes (bool, optional) – if True, check that the isotopic layers (i) are equal
check_fixed_hydrogens (bool, optional) – if True, check that the fixed hydrogen layers (f) are equal
check_reconnected_metals (bool, optional) – if True, check that the reconnected metals layers (r) are equal
- Returns
True if the molecules are semantically equal
- Return type
bool
-
is_protonation_isomer
(other)[source]¶ Determine if two molecules are protonation isomers
- Parameters
other (InchiMolecule) – other molecule
- Returns
True if the molecules are protonation isomers
- Return type
bool
-
is_stereoisomer
(other)[source]¶ Determine if two molecules are stereoisomers
- Parameters
other (InchiMolecule) – other molecule
- Returns
True if the molecules are stereoisomers
- Return type
bool
-
is_tautomer
(other)[source]¶ Determine if two molecules are tautomers
- Parameters
other (InchiMolecule) – other molecule
- Returns
True if the molecules are tautomers
- Return type
bool
-
-
class
datanator.util.molecule_util.
Molecule
(id='', name='', structure='', cross_references=None)[source]¶ Bases:
object
Represents a molecule
-
get_fingerprint
(type='fp2')[source]¶ Calculate a fingerprint
- Parameters
type (str, optional) – fingerprint type to calculate
- Returns
fingerprint
- Return type
pybel.Fingerprint
-
static
get_fingerprint_types
()[source]¶ Get list of fingerprint types
- Returns
list of fingerprint types
- Return type
list of str
-
get_similarity
(other, fingerprint_type='fp2')[source]¶ Calculate the similarity with another molecule
- Parameters
other (Molecule) – a second molecule
fingerprint_type (str, optional) – fingerprint type to use to calculate similarity
- Returns
the similarity with the other molecule
- Return type
float
-
to_format
(format)[source]¶ Get the structure in a format
- Parameters
format (str) – format such as inchi, mol, smiles
- Returns
structure in the given format
- Return type
str
-
to_inchi
()[source]¶ Get the structure in InChI format
- Returns
structure in InChI format
- Return type
str
-
to_openbabel
()[source]¶ Create an Open Babel molecule for the molecule
- Returns
Open Babel molecule
- Return type
openbabel.OBMol
-
4.1.1.5.10. datanator.util.mongo_util module¶
-
class
datanator.util.mongo_util.
MongoUtil
(cache_dirname=None, MongoDB=None, replicaSet=None, db='test', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]¶ Bases:
object
-
fill_db
(collection_str)[source]¶ Check if collection is already in MongoDB
- If already in MongoDB:
Do nothing
- Else:
Load data into db from quiltdata (karrlab/datanator)
- Parameters
collection_str – name of collection (e.g. ‘ecmdb’, ‘pax’, etc)
-
4.1.1.5.11. datanator.util.reaction_util module¶
Utilities for dealing with reactions
- Author
Yosef Roth <yosefdroth@gmail.com>
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Date
2017-04-13
- Copyright
2017, Karr Lab
- License
MIT
-
datanator.util.reaction_util.
calc_reactant_product_pairs
(reaction)[source]¶ Get list of pairs of similar reactants and products using a greedy algorithm.
- Parameters
reaction (data_model.Reaction) – reaction
- Returns
list of pairs of similar reactants and products
- Return type
list of tuple of (data_model.Specie, data_model.Specie)
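A greedy pairing of this kind can be sketched as: score every reactant/product combination, then repeatedly accept the highest-scoring pair whose members are both still unpaired. The sketch below is a generic illustration, not the function's actual code; `greedy_pairs`, the `similarity` callback, and the toy character-overlap score are all assumptions, and it pairs items even when their similarity is zero.

```python
def greedy_pairs(reactants, products, similarity):
    """Pair reactants with products, most similar pair first."""
    scored = sorted(
        ((similarity(r, p), i, j)
         for i, r in enumerate(reactants)
         for j, p in enumerate(products)),
        reverse=True,
    )
    used_reactants, used_products, pairs = set(), set(), []
    for _, i, j in scored:
        if i not in used_reactants and j not in used_products:
            pairs.append((reactants[i], products[j]))
            used_reactants.add(i)
            used_products.add(j)
    return pairs


def char_overlap(a, b):
    """Toy similarity: Jaccard index of the characters in two names."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


pairs = greedy_pairs(['ATP', 'H2O'], ['ADP', 'Pi'], char_overlap)
```

In practice the similarity callback would compare molecular structures, for example via fingerprints, rather than names.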
4.1.1.5.12. datanator.util.rna_halflife_util module¶
-
class
datanator.util.rna_halflife_util.
RnaHLUtil
(server=None, username=None, password=None, src_db=None, des_db=None, protein_col=None, rna_col=None, authDB='admin', readPreference=None, max_entries=inf, verbose=False, cache_dir=None)[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
fill_uniprot_by_embl
(embl, species=None)[source]¶ Fill uniprot collection using EMBL data
- Parameters
embl (str) – sequence EMBL data
species (list) – NCBI Taxonomy ID of the species
-
fill_uniprot_by_gn
(gene_name, species=None)[source]¶ Fill uniprot collection using gene name
- Parameters
gene_name (str) – Ordered locus name
species (list) – NCBI Taxonomy ID of the species
-
fill_uniprot_by_oln
(oln, species=None)[source]¶ Fill uniprot collection using ordered locus name
- Parameters
oln (str) – Ordered locus name
species (list) – NCBI Taxonomy ID of the species
-
fill_uniprot_with_df
(df, identifier, identifier_type='oln', species=None)[source]¶ Fill uniprot collection with ordered_locus_name from excel sheet
- Parameters
df (pandas.DataFrame) – dataframe to be inserted into the uniprot collection; assumed to conform to the schemas required by the load_uniprot function in uniprot.py
identifier (str) – name of the column that stores ordered locus name information
identifier_type (str) – type of identifier, i.e. ‘oln’, ‘gene_name’
species (list) – NCBI Taxonomy ID of the species
-
make_df
(url, sheet_name, header=0, names=None, usecols=None, skiprows=None, nrows=None, na_values=None, file_type='xlsx', file_name=None)[source]¶ Read online excel file as dataframe
- Parameters
url (str) – excel file url
sheet_name (str) – name of sheet in xlsx
header (int) – Row (0-indexed) to use for the column labels of the parsed DataFrame.
names (list) – list of column names to use
usecols (int or list or str) – Return a subset of the columns.
nrows (int) – number of rows to parse. Defaults to None.
file_type (str) – downloaded file type. Defaults to xlsx.
file_name (str) – name of the file of interest.
- Returns
xlsx transformed to pandas.DataFrame
- Return type
pandas.DataFrame
-
4.1.1.5.13. datanator.util.rna_seq_util module¶
Utilities for RNA-seq data
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Author
Yosef Roth <yosefdroth@gmail.com>
- Date
2018-01-15
- Copyright
2018, Karr Lab
- License
MIT
-
class
datanator.util.rna_seq_util.
Kallisto
[source]¶ Bases:
object
Python interface to kallisto.
-
index
(fasta_filenames, index_filename=None, kmer_size=31, make_unique=False)[source]¶ Generate index from FASTA files
- Parameters
fasta_filenames (list of str) – paths to FASTA files
index_filename (str, optional) – path to the kallisto index file to be created
kmer_size (int, optional) – k-mer length
make_unique (bool, optional) – if True, replace repeated target names with unique names
-
quant
(fastq_filenames, index_filename=None, output_dirname=None, bias=False, bootstrap_samples=0, seed=42, plaintext=False, fusion=False, single_end_reads=False, forward_stranded=False, reverse_stranded=False, fragment_length=None, fragment_length_std=None, threads=1, pseudobam=False)[source]¶ Process RNA-seq FASTQ files
- Parameters
fastq_filenames (list of str) – paths to FASTQ files
index_filename (str, optional) – path to the kallisto index file to be used for quantification
output_dirname (str, optional) – path to the output directory
single_end_reads (bool, optional) – if True, quantify single-end reads
fragment_length (float, optional) – estimated average fragment length
fragment_length_std (float, optional) – estimated standard deviation of fragment length
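For orientation, the documented parameters map onto options of the `kallisto quant` command line (`-i`, `-o`, `-t`, `--single`, `-l`, `-s`). The sketch below only assembles such an argument list; it does not run kallisto, and it is not claimed to be how this wrapper builds its command. `build_quant_command` is a hypothetical helper.

```python
def build_quant_command(fastq_filenames, index_filename, output_dirname,
                        single_end_reads=False, fragment_length=None,
                        fragment_length_std=None, threads=1):
    """Assemble an argument list for `kallisto quant` (illustrative only)."""
    command = ['kallisto', 'quant',
               '-i', index_filename,
               '-o', output_dirname,
               '-t', str(threads)]
    if single_end_reads:
        # kallisto needs the fragment length and its standard deviation
        # when quantifying single-end reads.
        if fragment_length is None or fragment_length_std is None:
            raise ValueError('single-end reads need fragment_length and fragment_length_std')
        command += ['--single', '-l', str(fragment_length), '-s', str(fragment_length_std)]
    return command + list(fastq_filenames)


command = build_quant_command(['reads.fastq'], 'transcripts.idx', 'out',
                              single_end_reads=True,
                              fragment_length=200, fragment_length_std=20)
```

The resulting list could be passed to `subprocess.run` on a machine where kallisto is installed.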
-
4.1.1.5.14. datanator.util.taxonomy_util module¶
Utilities for dealing with taxa
- Author
Yosef Roth <yosefdroth@gmail.com>
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Date
2017-04-11
- Copyright
2017, Karr Lab
- License
MIT
-
class
datanator.util.taxonomy_util.
Taxon
(id='', name='', ncbi_id=None, cross_references=None)[source]¶ Bases:
object
Represents a taxon such as a genus, species, or strain
-
id_of_nearest_ncbi_taxon
[source]¶ ID of the nearest parent taxon which is in the NCBI database
- Type
int
-
distance_from_nearest_ncbi_taxon
[source]¶ distance from the taxon to its nearest parent which is in the NCBI database
- Type
int
-
additional_name_beyond_nearest_ncbi_taxon
[source]¶ additional part of the taxon’s name beyond that of its nearest parent in the NCBI database
- Type
str
-
get_distance_to_common_ancestor
(other)[source]¶ Calculate the number of links in the NCBI taxonomic tree between two taxa and their latest common ancestor
Note: This distance depends on the granularity of the lineage of the taxon. For example, there are only 7 links between most bacteria species and the Bacteria superkingdom. However, there are 28 links between the Homo sapiens species and the Eukaryota superkingdom.
- Parameters
other (Taxon) – a second taxon
- Returns
number of links between self and its latest common ancestor with other in the NCBI taxonomic tree
- Return type
int
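The distance described above can be illustrated with two lineages written as lists from the root down to the taxon itself: count the shared leading entries, and the remaining entries of the first lineage are the links up to the latest common ancestor. This sketch uses plain lists with placeholder labels instead of the NCBI Taxonomy database, and `distance_to_common_ancestor` is a hypothetical name.

```python
def distance_to_common_ancestor(lineage_self, lineage_other):
    """Number of links from the last entry of `lineage_self` up to the
    latest ancestor it shares with `lineage_other`."""
    shared = 0
    for ours, theirs in zip(lineage_self, lineage_other):
        if ours != theirs:
            break
        shared += 1
    return len(lineage_self) - shared


# Placeholder lineages that diverge below 'family1'.
lineage_x = ['root', 'clade1', 'family1', 'genus1', 'species1']
lineage_y = ['root', 'clade1', 'family1', 'genus2', 'species2']
print(distance_to_common_ancestor(lineage_x, lineage_y))  # 2
```

As the note above points out, the result depends on how finely each lineage is resolved, so distances for different taxa are not directly comparable.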
-
get_distance_to_root
()[source]¶ Get the distance from the taxon to the root of the NCBI taxonomy tree
- Returns
distance from the taxon to the root
- Return type
int
-
get_max_distance_to_common_ancestor
()[source]¶ Get the maximum distance from the taxon to a common ancestor with another taxon
- Returns
maximum distance from the taxon to a common ancestor with another taxon
- Return type
int
-
-
datanator.util.taxonomy_util.
setup_database
(force_update=False)[source]¶ Set up a local SQLite copy of the NCBI Taxonomy database. If force_update is False, then only download the content from NCBI and build the SQLite database if a local database doesn’t already exist. If force_update is True, then always download the content from NCBI and rebuild the SQLite copy of the database.
- Parameters
force_update (bool, optional) –
False: only download the content for the database and build a local SQLite database if a local SQLite copy of the database doesn’t already exist
True: always download the content for the database from NCBI and rebuild the local SQLite database
4.1.1.5.15. datanator.util.warning_util module¶
Warning utilities
- Author
Yosef Roth <yosefdroth@gmail.com>
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Date
2017-04-13
- Copyright
2017, Karr Lab
- License
MIT