3.1.1.3. datanator_query_python.query package
3.1.1.3.1. Submodules
3.1.1.3.2. datanator_query_python.query.full_text_search module
-
class datanator_query_python.query.full_text_search.FTX(profile_name=None, credential_path=None, config_path=None, elastic_path=None, cache_dir=None, service_name='es', max_entries=inf, verbose=False)
Bases: karr_lab_aws_manager.elasticsearch_kl.query_builder.QueryBuilder
-
bool_query(query_message, index, must=None, should=None, must_not=None, _filter=None, minimum_should_match=0, **kwargs)
Perform a boolean query in Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html).
- Parameters
query_message (str) – simple string for querying.
index (str) – comma-separated string of the indices in which the query will be run.
must (list or dict, optional) – body of the must clause. Defaults to None.
_filter (list or dict, optional) – body of the filter clause. Defaults to None.
should (list or dict, optional) – body of the should clause. Defaults to None.
must_not (list or dict, optional) – body of the must_not clause. Defaults to None.
minimum_should_match (int) – number or percentage of should clauses that returned documents must match. Defaults to 0.
**size (int) – number of hits to return.
**from_ (int) – starting offset (default: 0).
**scroll (str) – how long a consistent view of the index should be maintained for a scrolled search (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll).
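For orientation, the request body behind a boolean query like this can be assembled as a plain Python dict that follows the public Elasticsearch bool-query DSL. This is a minimal sketch: the helper name build_bool_body is hypothetical, and how FTX.bool_query actually merges query_message into these clauses is not documented here, so treat the mapping as an assumption.

```python
def build_bool_body(must=None, should=None, must_not=None, _filter=None,
                    minimum_should_match=0, size=10, from_=0):
    """Hypothetical helper: assemble an Elasticsearch bool-query request body."""
    bool_clause = {}
    # Only add the clauses that were actually supplied
    for key, value in (("must", must), ("should", should),
                       ("must_not", must_not), ("filter", _filter)):
        if value is not None:
            bool_clause[key] = value
    # minimum_should_match only matters when should clauses are present
    if should is not None:
        bool_clause["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_clause}, "size": size, "from": from_}


body = build_bool_body(
    must=[{"simple_query_string": {"query": "glucose"}}],
    should=[{"match": {"name": "hexokinase"}}],  # hypothetical field name
    minimum_should_match=1,
    size=5,
)
```

The size and from keys correspond to the **size and **from_ keyword arguments documented for this method; the resulting dict is what an Elasticsearch client would send as the search body.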
-
get_genes_ko_count(q, num, agg_field='ko_number', **kwargs)
Get protein index hits with distinct ko_number values, up to num hits, provided that at least one of the proteins under each ko_number has abundance information.
- Parameters
q (str) – query message.
num (int) – number of hits needed.
agg_field (str) – field to be aggregated.
**from_ (int) – starting offset (default: 0).
- Returns
object of index hits, e.g. {'index': []}
- Return type
(dict)
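One common way to get "distinct ko_number values, up to num hits" out of Elasticsearch is a terms aggregation with a top_hits sub-aggregation. The sketch below only builds such a body; whether get_genes_ko_count uses this exact technique, and how it applies the abundance condition (omitted here), are assumptions. The helper and the aggregation names distinct_values and top_doc are made up for illustration.

```python
def build_distinct_value_body(q, num, agg_field="ko_number"):
    """Hypothetical helper: one bucket per distinct agg_field value, up to num buckets."""
    return {
        "query": {"simple_query_string": {"query": q}},
        "size": 0,  # skip the plain hit list; only the aggregation buckets are wanted
        "aggs": {
            "distinct_values": {
                "terms": {"field": agg_field, "size": num},
                # keep one representative document per bucket
                "aggs": {"top_doc": {"top_hits": {"size": 1}}},
            }
        },
    }


body = build_distinct_value_body("hexokinase", num=10)
```

Passing agg_field="orthodb_id.keyword" gives the analogous body for the OrthoDB variant; a terms aggregation generally needs a keyword (non-analyzed) field, which is consistent with that default.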
-
get_genes_orthodb_count(q, num, agg_field='orthodb_id.keyword', **kwargs)
Get protein index hits with distinct orthodb_id values, up to num hits, provided that at least one of the proteins under each orthodb_id has abundance information.
- Parameters
q (str) – query message.
num (int) – number of hits needed.
agg_field (str) – field to be aggregated.
**from_ (int) – starting offset (default: 0).
- Returns
object of index hits, e.g. {'index': []}
- Return type
(dict)
-
get_index_in_page(r, index)
Get the hits of each index on the current results page.
- Parameters
r (dict) – full-text search result.
index (list) – list of index names (str).
- Returns
object of index hits, e.g. {'index_0': [], 'index_1': []}
- Return type
(dict)
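A minimal sketch of grouping one page of hits by index, assuming the standard Elasticsearch response shape ({"hits": {"hits": [{"_index": ..., "_id": ..., "_source": ...}]}}). The helper name group_hits_by_index is hypothetical, and whether get_index_in_page also drops hits from unrequested indices is an assumption.

```python
def group_hits_by_index(r, index):
    """Bucket the hits of one Elasticsearch response page by their _index."""
    grouped = {name: [] for name in index}
    for hit in r.get("hits", {}).get("hits", []):
        # Hits from indices the caller did not ask for are ignored
        if hit.get("_index") in grouped:
            grouped[hit["_index"]].append(hit)
    return grouped


page = {"hits": {"hits": [
    {"_index": "ecmdb", "_id": "1", "_source": {"name": "ATP"}},
    {"_index": "ymdb", "_id": "2", "_source": {"name": "ADP"}},
    {"_index": "protein", "_id": "3", "_source": {"name": "hexokinase"}},
]}}
grouped = group_hits_by_index(page, ["ecmdb", "ymdb"])
```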
-
get_index_ko_count(q, num, agg_field='frontend_gene_aggregate', index='protein', **kwargs)
Get protein index hits with distinct ko_number values, up to num hits.
- Parameters
q (str) – query message.
num (int) – number of hits needed.
agg_field (str) – field to be aggregated.
index (str) – name of the index.
**from_ (int) – starting offset (default: 0).
- Returns
object of index hits, e.g. {'index': []}
- Return type
(dict)
-
get_num_source(q, q_index, index, fields=['name', 'synonyms'], count=10, from_=0, batch_size=100)
Extract up to count hits that belong to the source indices of interest (ecmdb, ymdb, metabolite_meta, etc.) from a full-text search result.
- Parameters
q (str) – full-text query message.
q_index (str) – comma-separated string of the indices in which the query will be run.
index (set) – set of indices of interest (source collections).
fields (list, optional) – list of fields to query. Defaults to ['name', 'synonyms'].
count (int, optional) – number of records required. Defaults to 10.
from_ (int, optional) – page start. Defaults to 0.
batch_size (int, optional) – full-text query page size. Defaults to 100.
- Returns
list of hits from the indices of interest
- Return type
(list)
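The count/from_/batch_size parameters suggest a loop that fetches pages of batch_size hits until count hits from the indices of interest have been collected. The sketch below shows that loop under this assumption; search_page is a hypothetical stand-in for a single Elasticsearch search call, not part of the library.

```python
def collect_source_hits(search_page, index, count=10, from_=0, batch_size=100):
    """Page through full-text results until `count` hits from `index` are collected.

    search_page(offset, size) is a hypothetical stand-in for one Elasticsearch
    search call; it must return a (possibly empty) list of hit dicts.
    """
    collected = []
    offset = from_
    while len(collected) < count:
        hits = search_page(offset, batch_size)
        if not hits:
            break  # result set exhausted before enough hits were found
        collected.extend(hit for hit in hits if hit["_index"] in index)
        offset += batch_size
    return collected[:count]
```

Stopping on an empty page keeps the loop finite even when fewer than count matching hits exist.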
-
get_rxn_oi(query_message, minimum_should_match=0, from_=0, size=10)
Get reactions for which at least a Km or kcat value exists.
- Parameters
query_message (str) – query message.
minimum_should_match (int) – number or percentage of should clauses that returned documents must match. Defaults to 0.
from_ (int) – Elasticsearch offset. Defaults to 0.
size (int) – Elasticsearch return size. Defaults to 10.
-
get_single_index_count(q, index, num, excludes=[], includes=[], **kwargs)
Get up to num hits from a single index.
- Parameters
q (str) – query message.
index (str) – index in which the query will be performed.
num (int) – number of hits needed.
includes (list of str) – list of fields to include in the returned data.
excludes (list of str) – list of fields to exclude from the returned data.
- Returns
object of index hits, e.g. {'index': []}
- Return type
(dict)
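Field inclusion and exclusion map naturally onto Elasticsearch source filtering ("_source": {"includes": [...], "excludes": [...]}). A minimal sketch, assuming a simple_query_string query; the helper name is hypothetical and the library may build its body differently.

```python
def build_single_index_body(q, num, includes=None, excludes=None):
    """Hypothetical helper: query one index and trim the returned _source fields."""
    body = {"query": {"simple_query_string": {"query": q}}, "size": num}
    source = {}
    if includes:
        source["includes"] = includes
    if excludes:
        source["excludes"] = excludes
    if source:
        body["_source"] = source  # Elasticsearch source filtering
    return body


body = build_single_index_body("ATP", 5, excludes=["synonyms"])
```

The sketch uses None defaults rather than the [] defaults in the documented signature, which avoids sharing one mutable list across calls.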
-
simple_query_string(query_message, index, **kwargs)
Perform a simple_query_string query in Elasticsearch (https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/full-text/#simple-query-string).
- Parameters
query_message (str) – simple string for querying.
index (str) – comma-separated string of the indices in which the query will be run.
**size (int) – number of hits to return.
**from_ (int) – starting offset (default: 0).
**scroll (str) – how long a consistent view of the index should be maintained for a scrolled search (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll).
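As a rough illustration, a simple_query_string request body and the comma-separated index argument can be built as below. The helper name and the optional fields restriction are assumptions for illustration; the method above does not document a fields parameter.

```python
def build_simple_query_string_body(query_message, fields=None, size=10, from_=0):
    """Hypothetical helper: Elasticsearch simple_query_string request body."""
    inner = {"query": query_message}
    if fields:
        inner["fields"] = fields  # restrict the search to these fields
    return {"query": {"simple_query_string": inner}, "size": size, "from": from_}


indices = ",".join(["ecmdb", "ymdb"])  # comma-separated index argument
body = build_simple_query_string_body("ATP", fields=["name", "synonyms"])
```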
-
3.1.1.3.3. datanator_query_python.query.query_corum module
-
class datanator_query_python.query.query_corum.QueryCorum(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='corum', readPreference='nearest', replicaSet=None)
Bases: object
-
get_complexes_with_ncbi(ncbi_id, projection={'_id': 0})
Find all complexes in the species with the given NCBI taxonomy ID.
- Parameters
ncbi_id (int) – NCBI taxonomy ID.
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0}.
- Returns
list of all objects that meet the constraint
- Return type
(list)
-
get_complexes_with_uniprot(uniprot_id, ncbi_id=9606)
Find complexes in a species that contain the protein with uniprot_id.
- Parameters
uniprot_id (str) – UniProt ID of the protein.
ncbi_id (int, optional) – NCBI taxonomy ID of the species. Defaults to 9606.
- Returns
list of complexes that meet the requirement
- Return type
(list of dict)
-
3.1.1.3.4. datanator_query_python.query.query_intact_complex module
3.1.1.3.5. datanator_query_python.query.query_kegg_organism_code module
-
class datanator_query_python.query.query_kegg_organism_code.QueryKOC(username=None, password=None, server=None, authSource='admin', database='datanator', collection_str=None, readPreference='nearest', replicaSet=None)
Bases: object
3.1.1.3.6. datanator_query_python.query.query_kegg_orthology module
-
class datanator_query_python.query.query_kegg_orthology.QueryKO(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, readPreference='nearest', replicaSet=None)
Bases: datanator_query_python.util.mongo_util.MongoUtil
-
get_def_by_kegg_id(kegg_id)
Get KEGG definitions by KEGG ID.
- Parameters
kegg_id (str) – KEGG orthology ID.
- Returns
list of KEGG orthology definitions
- Return type
(list of str)
-
get_ko_by_name(name)
Get a gene's KO number by its gene name.
- Parameters
name (str) – gene name.
- Returns
KO number of the gene.
- Return type
(str)
-
get_loci_by_id_org(kegg_id, org, gene_id)
Get an ortholog's locus ID given the KEGG ID, organism code and gene ID.
- Parameters
kegg_id (str) – KEGG ortholog ID.
org (str) – KEGG organism code.
gene_id (str) – gene ID.
- Returns
locus ID.
- Return type
(str)
-
get_meta_by_kegg_id(kegg_id)
Get meta information by KEGG ID.
- Parameters
kegg_id (str) – KEGG ID.
- Returns
KEGG meta object.
- Return type
(Obj)
-
get_meta_by_kegg_ids(kegg_ids, projection={'_id': 0, 'gene_ortholog': 0})
Get meta information given a list of KEGG IDs.
- Parameters
kegg_ids (list of str) – list of KEGG IDs.
projection (dict) – MongoDB result projection.
- Returns
pymongo Cursor object and number of documents found.
- Return type
(tuple of pymongo.Cursor and int)
-
get_meta_by_ortho_ids(orthodb_ids, projection={'_id': 0, 'gene_ortholog': 0}, limit=0)
Get meta information given a list of OrthoDB IDs.
- Parameters
orthodb_ids (list of str) – list of OrthoDB IDs.
projection (dict) – MongoDB result projection.
- Returns
pymongo Cursor object and number of documents found.
- Return type
(tuple of pymongo.Cursor and int)
-
3.1.1.3.7. datanator_query_python.query.query_metabolite_concentrations module
-
class datanator_query_python.query.query_metabolite_concentrations.QueryMetaboliteConcentrations(MongoDB=None, db=None, collection_str=None, username=None, password=None, authSource='admin', readPreference='nearest', verbose=True, replicaSet=None)
Bases: datanator_query_python.util.mongo_util.MongoUtil
-
get_conc_by_taxon(_id)
Get concentrations by NCBI Taxonomy ID.
- Parameters
_id (int) – NCBI Taxonomy ID.
- Returns
(pymongo.Cursor)
-
get_similar_concentrations(metabolite, threshold=0.6)
Get the concentrations of compounds whose Tanimoto similarity to the metabolite is at or above threshold.
- Parameters
metabolite (str) – InChIKey of the metabolite.
threshold (float, optional) – threshold value (inclusive). Defaults to 0.6.
- Returns
[{'inchikey': xxxx, 'similarity_score': …, 'concentrations': []}]
- Return type
(list of Obj)
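Tanimoto similarity between two fingerprints is the number of shared "on" bits divided by the number of bits set in either. How the library computes fingerprints and stores similarity scores is not documented here; the set-of-bits representation, the helper names and the candidate data below are assumptions for illustration. The comparison is inclusive, matching the threshold description above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def similar_above(query_fp, candidates, threshold=0.6):
    """Keep candidates whose similarity to query_fp is >= threshold (inclusive)."""
    results = []
    for inchikey, fp in candidates.items():
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            results.append({"inchikey": inchikey, "similarity_score": score})
    return results


query = {1, 2, 3, 4}
cands = {"A": {1, 2, 3, 4, 5}, "B": {1, 2}, "D": {1, 2, 3, 5}}  # made-up fingerprints
res = similar_above(query, cands)
```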
-
3.1.1.3.8. datanator_query_python.query.query_metabolites module
Metabolite Query
- Author
Bilal Shaikh <bilalshaikh42@gmail.com>
Zhouyang Lian <zhouyang.lian@familian.life>
- Date
2019-08-01
- Copyright
2019, Karr Lab
- License
MIT
-
class datanator_query_python.query.query_metabolites.QueryMetabolites(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the metabolite (ECMDB, YMDB) collections.
-
get_conc_from_inchi(inchi, inchi_key=False, consensus=False, projection={'_id': 0})
Given an InChI, find the metabolite's concentration values.
- Parameters
inchi (str) – InChI or InChIKey of the metabolite.
inchi_key (bool) – whether the input is an InChIKey.
consensus (bool) – whether to return consensus values rather than a list of individual values.
- Returns
concentration values separated by collection, e.g. [{'ymdb': }, {'ecmdb': }]
- Return type
(list of dict)
-
get_concentration_count()
Get the number of metabolites with concentration values.
- Returns
number of metabolites with concentrations.
- Return type
(int)
-
get_meta_from_inchis(inchis, species, last_id='000000000000000000000000', page_size=20)
Get all information about metabolites given a list of InChI strings.
- Parameters
inchis (list of str) – list of InChI strings.
species (str) – name of the species in which the metabolite resides.
last_id (str) – hex-encoded ObjectId of the last item on the previous page.
page_size (int) – number of items per page.
- Returns
list of information
- Return type
(list of dict)
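The last_id/page_size pair, with an all-zero default ObjectId, points to keyset pagination: each page returns documents whose _id sorts after the last one already seen. In MongoDB this is typically a filter like {'_id': {'$gt': ObjectId(last_id)}} sorted on _id with limit(page_size), where ObjectId comes from the bson package shipped with pymongo. The in-memory sketch below illustrates the idea; that get_meta_from_inchis works exactly this way is an assumption.

```python
def page_after(docs, last_id="000000000000000000000000", page_size=20):
    """Return the next page of docs whose 24-char hex _id sorts after last_id."""
    # Equal-length lowercase hex strings sort in the same order as the
    # ObjectIds they encode, so plain string comparison is enough here.
    ordered = sorted(docs, key=lambda d: d["_id"])
    return [d for d in ordered if d["_id"] > last_id][:page_size]


docs = [{"_id": f"{i:024x}"} for i in range(1, 46)]  # 45 made-up documents
first = page_after(docs)
second = page_after(docs, last_id=first[-1]["_id"])
```

Unlike skip-based paging, keyset paging does not rescan earlier pages, which keeps later pages cheap on large collections.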
-
3.1.1.3.9. datanator_query_python.query.query_metabolites_meta module
-
class datanator_query_python.query.query_metabolites_meta.QueryMetabolitesMeta(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, collection_str='metabolites_meta', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the metabolites_meta collection.
-
get_doc_by_name(names)
Get a document by a list of the metabolite's possible names.
- Parameters
names (list of str) – list of possible names.
- Returns
(Obj)
-
get_eymeta(inchi_key)
Get meta information from ECMDB or YMDB.
- Parameters
inchi_key (str) – InChIKey or name of the metabolite.
- Returns
meta information.
- Return type
(Obj)
-
get_ids_from_hash(hashed_inchi)
Given a hashed InChI string, find its corresponding m2m_id and/or ymdb_id.
- Parameters
hashed_inchi (str) – hashed InChI string.
- Returns
dictionary of IDs and their keys, e.g. {'m2m_id': …, 'ymdb_id': …}
- Return type
(dict)
-
get_ids_from_hashes(hashed_inchi)
Given a list of hashed InChI strings, find their corresponding m2m_id and/or ymdb_id.
- Parameters
hashed_inchi (list of str) – list of hashed InChI strings.
- Returns
list of dictionaries of IDs and their keys, e.g. [{'m2m_id': …, 'ymdb_id': …, 'InChI_Key': …}, {}, ...]
- Return type
(list of dict)
-
get_metabolite_hashed_inchi(compounds)
Given a list of compound names, return the corresponding hashed InChI strings.
- Parameters
compounds (list of str) – compound names, e.g. ['ATP', '2-Ketobutanoate'].
- Returns
hashed InChI strings, e.g. ['3e23df….', '7666ffa….']
- Return type
hashed_inchi (list of str)
-
get_metabolite_inchi(compounds)
Given a list of compound names, return the corresponding InChI strings.
- Parameters
compounds (list of str) – list of compounds, e.g. ['ATP', '2-Ketobutanoate'].
- Returns
InChI strings, e.g. ['….', 'InChI=1S/C4H6O3/c1-2-3(5)4(6)7/…']
-
get_metabolite_name_by_hash(compounds)
Given a list of hashed InChIs, return one name (one of the synonyms) for each compound.
- Parameters
compounds (list) – list of compounds in InChIKey format.
- Returns
list of names, e.g. [name, name, name]
- Return type
result (list)
-
get_metabolite_synonyms(compounds)
Find the synonyms of one or more compounds.
- Parameters
compounds (list) – name(s) of the compounds, e.g. "ATP", ["ATP", "Oxygen", …]
- Returns
synonyms (dict) – synonyms of each compound, e.g. {'ATP': [], 'Oxygen': [], …}
rxns (dict) – reactions in which each compound is found, e.g. {'ATP': [12345, 45678, …], 'Oxygen': […], …}
-
3.1.1.3.10. datanator_query_python.query.query_pax module
-
class datanator_query_python.query.query_pax.QueryPax(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='pax', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the pax collection.
-
get_abundance_from_uniprot(uniprot_id)
Get all abundance data for uniprot_id.
- Parameters
uniprot_id (str) – UniProt ID of the protein.
- Returns
result containing [{'ncbi_taxonomy_id': , 'species_name': , 'ordered_locus_name': }, {'organ': , 'abundance'}, {'organ': , 'abundance'}].
- Return type
result (list of dict)
-
get_all_species()
Get a list of all species in the pax collection.
- Returns
list of species names, without duplicates
- Return type
results (list of str)
-
get_file_by_name(file_name: list, projection={'_id': 0}, collation=None) → list
Given file names, get the information attached to each file.
- Parameters
file_name (list) – list of file names, e.g. ['9606/9606-iPS_(DF19.11)_iTRAQ-114_Phanstiel_2011_gene.txt']
- Returns
files that meet the requirement
- Return type
list
-
get_file_by_ncbi_id(taxon: list, projection={'_id': 0}, collation=None) → list
Given a list of NCBI taxonomy IDs, get all the files associated with those taxa.
- Parameters
taxon (list) – list of NCBI taxonomy IDs.
- Returns
files that meet the requirement
- Return type
list
-
get_file_by_organ(organ, projection={'_id': 0})
Get documents by organ.
- Parameters
organ (str) – organ type in PaxDb.
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0}.
- Returns
tuple containing:
docs (Iterator): MongoDB document iterator;
count (int): total number of documents that meet the query conditions.
- Return type
(tuple)
-
get_file_by_publication(publication, projection={'_id': 0})
Get documents by publication.
- Parameters
publication (str) – URL of the publication.
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0}.
- Returns
tuple containing:
docs (Iterator): MongoDB document iterator;
count (int): total number of documents that meet the query conditions.
- Return type
(tuple)
-
get_file_by_quality(organ, score=4.0, coverage=20, ncbi_id=None, projection={'_id': 0, 'weight': 0})
Get PaxDb files for an organ, filtered by data quality.
- Parameters
organ (str) – organ type in PaxDb, e.g. WHOLE_ORGANISM, CELL_LINE, etc.
score (float, optional) – PaxDb data quality score. Defaults to 4.0.
coverage (int, optional) – PaxDb data coverage. Defaults to 20.
ncbi_id (int, optional) – NCBI taxonomy ID of the organism. Defaults to None.
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0, 'weight': 0}.
- Returns
tuple containing:
docs (Iterator): MongoDB document iterator;
count (int): total number of documents that meet the query conditions.
- Return type
(tuple)
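A minimal sketch of the kind of MongoDB filter such a quality query might use. The field names (organ, score, coverage, ncbi_id) and the "at or above" semantics for score and coverage are assumptions, not taken from the library. With pymongo, collection.find(query, projection) and collection.count_documents(query) would then yield a (docs, count) pair like the one described above.

```python
def build_quality_filter(organ, score=4.0, coverage=20, ncbi_id=None):
    """Hypothetical MongoDB filter for PaxDb files of an organ with minimum quality."""
    query = {
        "organ": organ,
        "score": {"$gte": score},        # assumed: keep files at or above the score
        "coverage": {"$gte": coverage},  # assumed: keep files at or above the coverage
    }
    if ncbi_id is not None:
        query["ncbi_id"] = ncbi_id  # hypothetical field name
    return query


query = build_quality_filter("WHOLE_ORGANISM")
```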
-
3.1.1.3.11. datanator_query_python.query.query_protein module
-
class datanator_query_python.query.query_protein.QueryProtein(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='uniprot', readPreference='nearest', replicaSet=None)
Bases: datanator_query_python.util.mongo_util.MongoUtil
-
get_abundance_by_id(_id)
Get protein abundance information by UniProt ID.
- Parameters
_id (list) – list of UniProt IDs.
-
get_abundance_by_ko(ko)
Get abundance information for proteins with the same KO number.
- Parameters
ko (str) – KO number.
- Returns
information, e.g. [{'uniprot_id': , 'abundances': }, {}, …, {}].
- Return type
(list of dict)
-
get_abundance_by_taxon(_id)
Get protein abundance information for one species.
- Parameters
_id (str) – taxonomy ID.
- Returns
list of abundance information
- Return type
(list of dict)
-
get_abundance_with_same_ko(_id)
Find abundance information for proteins with the same KO number as the given protein.
- Parameters
_id (str) – UniProt ID.
- Returns
information, e.g. [{'uniprot_id': , 'abundances': }, {}, …, {}].
- Return type
(list of dict)
-
get_all_kegg(ko, anchor, max_distance)
Get replacement abundance values by taxonomic distance for proteins with the same kegg_orthology number.
- Parameters
ko (str) – KEGG orthology ID to query for.
anchor (str) – anchor species' name.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results.
- Returns
list of result proteins and their info, e.g. [{'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …].
- Return type
(list of dict)
-
get_all_ortho(ko, anchor, max_distance)
Get replacement abundance values by taxonomic distance for proteins with the same OrthoDB group number.
- Parameters
ko (str) – OrthoDB group ID to query for.
anchor (str) – anchor species' name.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results.
- Returns
list of result proteins and their info, e.g. [{'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …].
- Return type
(list of dict)
-
get_equivalent_kegg_with_anchor_obsolete(ko, anchor, max_distance, max_depth=inf)
Get replacement abundance values by taxonomic distance for proteins with the same kegg_orthology number.
- Parameters
ko (str) – KEGG orthology ID to query for.
anchor (str) – anchor species' name.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results.
max_depth (int, optional) – Defaults to inf.
- Returns
list of result proteins and their info, e.g. [{'distance': 0, 'documents': [{}]}, {'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …].
- Return type
(list of dict)
-
get_equivalent_protein(_id, max_distance, max_depth=inf)
Get replacement abundance values by taxonomic distance for proteins with the same kegg_orthology number as the given protein.
- Parameters
_id (str) – UniProt ID to query for.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results.
max_depth (int, optional) – Defaults to inf.
- Returns
list of result proteins and their info, e.g. [{'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …].
- Return type
(list of dict)
-
get_equivalent_protein_with_anchor(_id, max_distance, max_depth=inf)
Get replacement abundance values by taxonomic distance for proteins with the same kegg_orthology number as the given protein.
- Parameters
_id (str) – UniProt ID to query for.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results.
max_depth (int, optional) – Defaults to inf.
- Returns
list of result proteins and their info, e.g. [{'distance': 0, 'documents': [{}]}, {'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …].
- Return type
(list of dict)
-
get_id_by_name(name)
Get proteins whose name contains the string name.
- Parameters
name (str) – complete or partial protein name.
- Returns
list of dictionaries containing each protein's uniprot_id and name.
- Return type
(list of dict)
-
get_info_by_ko(ko)
Find all proteins with the same KEGG orthology ID.
- Parameters
ko (str) – KEGG orthology ID.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': []}, {'ko_number': … 'ko_name': … 'uniprot_ids': []}].
- Return type
(list of dict)
-
get_info_by_ko_abundance(ko)
Find all proteins with the same KEGG orthology ID.
- Parameters
ko (str) – KEGG orthology ID.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': {}}, {'ko_number': … 'ko_name': … 'uniprot_ids': {}}].
- Return type
(list of dict)
-
get_info_by_orthodb(orthodb)
Find all proteins with the same OrthoDB group ID.
- Parameters
orthodb (str) – OrthoDB group ID.
- Returns
list of dictionaries containing the proteins' uniprot_id and OrthoDB information, e.g. [{'orthodb_id': … 'orthodb_name': … 'uniprot_ids': []}, {'orthodb_id': … 'orthodb_name': … 'uniprot_ids': []}].
- Return type
(list of dict)
-
get_info_by_taxonid(_id)
Get proteins associated with an NCBI taxonomy ID.
- Parameters
_id (int) – NCBI taxonomy ID.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': []}, {'ko_number': … 'ko_name': … 'uniprot_ids': []}].
- Return type
(list of dict)
-
get_info_by_taxonid_abundance(_id)
Get proteins associated with an NCBI taxonomy ID.
- Parameters
_id (int) – NCBI taxonomy ID.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': {'id0': 0, 'id1': 1, 'id2': 0}}, {'ko_number': … 'ko_name': … 'uniprot_ids': {'id0': 0, 'id1': 1, 'id2': 0}}].
- Return type
result (list of dict)
-
get_info_by_text(name)
Get proteins whose name or KEGG name contains the string name.
- Parameters
name (str) – complete or partial protein name.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': []}, {'ko_number': … 'ko_name': … 'uniprot_ids': []}].
- Return type
(list of dict)
-
get_info_by_text_abundances(name)
Get proteins whose name or KEGG name contains the string name.
- Parameters
name (str) – complete or partial protein name.
- Returns
list of dictionaries containing the proteins' uniprot_id and KEGG information, e.g. [{'ko_number': … 'ko_name': … 'uniprot_ids': {'id0': 0, 'id1': 1, 'id2': 0}}, {'ko_number': … 'ko_name': … 'uniprot_ids': {'id0': 0, 'id1': 1, 'id2': 0}}], where 0 means the protein has abundance information and 1 means it does not.
- Return type
(list of dict)
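The 0/1 flags in uniprot_ids can be split into proteins with and without abundance information, following the flag meaning stated above. The helper name split_by_abundance and the sample entry are hypothetical.

```python
def split_by_abundance(uniprot_ids):
    """Split a {uniprot_id: flag} mapping; flag 0 = has abundance info, 1 = none."""
    with_abundance = [pid for pid, flag in uniprot_ids.items() if flag == 0]
    without_abundance = [pid for pid, flag in uniprot_ids.items() if flag == 1]
    return with_abundance, without_abundance


entry = {"uniprot_ids": {"id0": 0, "id1": 1, "id2": 0}}  # shape taken from Returns above
has, lacks = split_by_abundance(entry["uniprot_ids"])
```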
-
get_kegg_orthology(uniprot_id)
Get a protein's KEGG orthology number given its UniProt ID.
- Parameters
uniprot_id (str) – the protein's UniProt ID.
- Returns
tuple containing:
(str): KEGG orthology ID;
(list of str): list of KEGG orthology descriptions.
- Return type
(tuple)
-
get_kinlaw_by_id(_id)
Get protein kinetic law information by UniProt ID.
- Parameters
_id (list of str) – list of UniProt IDs.
- Returns
list of kinetic law information.
- Return type
(list of dict)
-
get_kinlaw_by_name(name)
Get protein kinetic law information by protein name.
- Parameters
name (str) – protein name.
- Returns
kinetic law information.
- Return type
(list of dict)
-
get_meta_by_id(_id)
Get proteins' metadata given their UniProt IDs.
- Parameters
_id (list of str) – list of UniProt IDs.
- Returns
list of information.
- Return type
(list of dict)
-
get_meta_by_name_name(protein_name, species_name)
Get protein metadata by protein name and the name of the species in which the protein resides.
- Parameters
protein_name (str) – name of the protein.
species_name (str) – complete or partial name of the organism.
- Returns
the protein's metadata
- Return type
(list of dict)
-
get_meta_by_name_taxon(name, taxon_id)
Get a protein's metadata given its name and NCBI taxonomy ID.
- Parameters
name (str) – complete or partial protein name.
taxon_id (int) – the protein's NCBI taxonomy ID.
- Returns
the protein's metadata.
- Return type
(list of dict)
-
get_ortho_by_id(_id)
Get a protein's metadata given its UniProt ID.
- Parameters
_id (str) – UniProt ID.
- Returns
list of information.
- Return type
(list of dict)
-
get_proximity_abundance_taxon(_id, max_distance=3)
Get replacement abundance values by taxonomic distance for proteins with the same kegg_orthology number.
- Parameters
_id (str) – UniProt ID to query for.
max_distance (int) – maximum taxonomic distance from the origin protein allowed for proteins in the results. Defaults to 3.
- Returns
list of result proteins and their info, e.g. [{'distance': 1, 'documents': [{}, {}, {} …]}, {'distance': 2, 'documents': [{}, {}, {} …]}, …]
- Return type
(list of dict)
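The exact definition of taxonomic distance used by these methods is not given here. The sketch below assumes one plausible definition: the number of rank steps from the query organism up to its lowest common ancestor with another organism, with lineages given as root-to-leaf lists of taxon IDs. It then groups documents into the [{'distance': d, 'documents': [...]}] shape shown above, skipping distance 0 as this method's Returns does. All helper names and data are hypothetical.

```python
def taxonomic_distance(anchor_lineage, other_lineage):
    """Steps from the anchor leaf up to the lowest common ancestor.

    Lineages are root-to-leaf lists of taxon IDs (hypothetical representation).
    """
    shared = 0
    for a, b in zip(anchor_lineage, other_lineage):
        if a != b:
            break
        shared += 1
    return len(anchor_lineage) - shared


def group_by_distance(anchor_lineage, docs, max_distance=3):
    """Group docs into [{'distance': d, 'documents': [...]}] for 1 <= d <= max_distance."""
    groups = {}
    for doc in docs:
        d = taxonomic_distance(anchor_lineage, doc["lineage"])
        if 1 <= d <= max_distance:
            groups.setdefault(d, []).append(doc)
    return [{"distance": d, "documents": groups[d]} for d in sorted(groups)]


anchor = [1, 2, 3, 4]
docs = [
    {"id": "A", "lineage": [1, 2, 3, 5]},
    {"id": "B", "lineage": [1, 2, 6, 7]},
    {"id": "C", "lineage": [1, 8, 9, 10]},
    {"id": "D", "lineage": [1, 2, 3, 4]},
    {"id": "E", "lineage": [11, 12, 13, 14]},
]
res = group_by_distance(anchor, docs)
```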
-
get_uniprot_by_ko(ko)
Find all proteins with the same KEGG orthology ID.
- Parameters
ko (str) – KEGG orthology ID.
- Returns
list of UniProt IDs.
- Return type
(list of str)
-
3.1.1.3.12. datanator_query_python.query.query_rna_halflife module
-
class datanator_query_python.query.query_rna_halflife.QueryRNA(server=None, username=None, password=None, verbose=False, db=None, collection_str=None, authDB='admin', readPreference='nearest', replicaSet=None)
Bases: datanator_query_python.util.mongo_util.MongoUtil
-
get_doc_by_ko(ko_number, projection={'_id': 0}, _from=0, size=0)
Get documents by ko_number.
- Parameters
ko_number (str) – KEGG ortholog number.
projection (dict, optional) – MongoDB query result projection. Defaults to {'_id': 0}.
_from (int) – first page (0-indexed).
size (int) – number of items per page.
- Returns
pymongo iterable and number of documents.
- Return type
(tuple of pymongo.Cursor and int)
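A sketch of how a 0-indexed page number and a page size could translate into MongoDB skip/limit values. It assumes _from counts pages rather than documents, which the description above leaves open. In pymongo, limit(0) means "no limit", so size=0 returns everything and skip is left at 0. The resulting pair would be applied as collection.find(query, projection).skip(skip).limit(limit).

```python
def page_to_skip_limit(_from=0, size=0):
    """Translate a 0-indexed page number and page size into (skip, limit).

    Assumes _from counts pages, not documents. size=0 maps to limit 0,
    which pymongo treats as "no limit"; paging is then meaningless, so
    skip is 0 as well.
    """
    if size == 0:
        return 0, 0
    return _from * size, size
```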
-
get_doc_by_names(name, projection={'_id': 0}, _from=0, size=0)
Get documents by protein name.
- Parameters
name (str) – name of the protein.
projection (dict, optional) – MongoDB query result projection. Defaults to {'_id': 0}.
_from (int) – first page (0-indexed).
size (int) – number of items per page.
- Returns
pymongo cursor object and number of documents returned.
- Return type
(tuple of pymongo.Cursor and int)
-
get_doc_by_oln(oln, projection={'_id': 0})
Get documents by ordered locus name.
- Parameters
oln (str) – ordered locus name.
projection (dict) – pymongo query projection.
- Returns
pymongo cursor object and number of documents returned.
- Return type
(tuple of pymongo.Cursor and int)
-
get_doc_by_orthodb(orthodb, projection={'_id': 0}, _from=0, size=0)
Get documents by OrthoDB group ID.
- Parameters
orthodb (str) – OrthoDB group ID.
projection (dict, optional) – MongoDB query result projection. Defaults to {'_id': 0}.
_from (int) – first page (0-indexed).
size (int) – number of items per page.
- Returns
pymongo iterable and number of documents.
- Return type
(tuple of pymongo.Cursor and int)
-
3.1.1.3.13. datanator_query_python.query.query_sabio_compound module
-
class datanator_query_python.query.query_sabio_compound.QuerySabioCompound(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='sabio_compound', readPreference='nearest', replicaSet=None)
3.1.1.3.14. datanator_query_python.query.query_sabio_reaction_entries module
-
class datanator_query_python.query.query_sabio_reaction_entries.QuerySabioRxn(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_reaction_entries', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the sabio_reaction_entries collection.
-
get_ids_by_participant_inchikey(substrates, products, dof=1)
Find the kinlaw_ids defined in sabio_rk using the InChIKeys of the reaction participants.
- Parameters
substrates (list) – list of the substrates' InChIKeys.
products (list) – list of the products' InChIKeys.
dof (int, optional) – degrees of freedom allowed (number of parts of the InChIKey to truncate). Defaults to 1.
- Returns
list of kinlaw_ids that satisfy the condition, e.g. [id0, id1, id2, …]
- Return type
rxns (list)
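An InChIKey has three hyphen-separated blocks: a 14-character connectivity block, a 10-character block (stereo/isotope layer plus flag and version characters), and a 1-character protonation indicator. Truncating dof blocks from the end and matching the remainder as a prefix is one way to loosen matching as described for dof; the sketch assumes that reading, and the helper names are hypothetical. The resulting pattern could be used in a MongoDB {'$regex': ...} condition.

```python
import re


def truncate_inchikey(inchikey, dof=1):
    """Drop the last `dof` hyphen-separated blocks of an InChIKey."""
    blocks = inchikey.split("-")
    if not 0 <= dof < len(blocks):
        raise ValueError("dof must be at least 0 and less than the number of blocks")
    return "-".join(blocks[:len(blocks) - dof])


def inchikey_prefix_pattern(inchikey, dof=1):
    """Anchored regex matching any InChIKey that shares the truncated prefix."""
    return "^" + re.escape(truncate_inchikey(inchikey, dof))


water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"  # InChIKey of water
```

With dof=1, keys that differ only in the final protonation block still match the pattern.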
-
3.1.1.3.15. datanator_query_python.query.query_sabiork module
-
class datanator_query_python.query.query_sabiork.QuerySabio(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_rk', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the sabio_rk collection.
-
find_reaction_participants(kinlaw_id)
Find the reaction participants defined in sabio_rk using kinetic law IDs.
- Parameters
kinlaw_id (list of int) – kinetic law IDs.
- Returns
rxns (list of dict): list of dictionaries containing the names of the reaction participants, e.g. [{'substrates': [], 'products': []}, …, {}]
-
get_kinlaw_by_environment(taxon=None, taxon_wildtype=None, ph_range=None, temp_range=None, name_space=None, observed_type=None, projection={'_id': 0})
Get kinetic law information based on experimental conditions.
- Parameters
taxon (list, optional) – list of NCBI taxonomy IDs.
taxon_wildtype (list of bool, optional) – True indicates wild type and False indicates mutant.
ph_range (list, optional) – range of pH.
temp_range (list, optional) – range of temperature.
name_space (dict, optional) – cross-reference key/value pair, e.g. {'ec-code': '3.4.21.62'}.
observed_type (list, optional) – possible values for parameters.observed_type.
projection (dict, optional) – MongoDB query result projection.
- Returns
list of kinetic laws that meet the constraints
- Return type
(list)
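A minimal sketch of turning these optional conditions into a MongoDB filter, assuming ranges are [low, high] pairs matched inclusively. Apart from parameters.observed_type, which the parameter description mentions, every field name here (taxon_id, ph, temperature) is hypothetical; taxon_wildtype and name_space are left out for brevity.

```python
def build_environment_filter(taxon=None, ph_range=None, temp_range=None, observed_type=None):
    """Hypothetical MongoDB filter built only from the conditions that were supplied."""
    query = {}
    if taxon:
        query["taxon_id"] = {"$in": taxon}  # hypothetical field name
    if ph_range:
        low, high = ph_range
        query["ph"] = {"$gte": low, "$lte": high}  # hypothetical field name
    if temp_range:
        low, high = temp_range
        query["temperature"] = {"$gte": low, "$lte": high}  # hypothetical field name
    if observed_type:
        query["parameters.observed_type"] = {"$in": observed_type}
    return query


query = build_environment_filter(taxon=[562], ph_range=[6, 8])
```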
-
get_kinlawid_by_inchi(hashed_inchi)
Find the kinlaw_ids defined in sabio_rk using the InChI strings of the reaction participants.
- Parameters
hashed_inchi (list of str) – list of (hashed) InChI strings, all in one reaction.
- Returns
list of kinlaw_ids that satisfy the condition, e.g. [id0, id1, id2, …]
- Return type
rxns (list of int)
-
get_kinlawid_by_name(substrates, products)
Get kinlaw_ids from the substrates and products of a single reaction.
- Parameters
substrates (list of str) – list of substrate names.
products (list of str) – list of product names.
- Returns
list of compound names
- Return type
result (list of str)
-
get_kinlawid_by_rxn(substrates, products)
Find the kinlaw_ids defined in sabio_rk using the InChI strings of the reaction participants.
- Parameters
substrates – list of the substrates' InChIs.
products – list of the products' InChIs.
- Returns
list of kinlaw_ids that satisfy the condition, e.g. [id0, id1, id2, …]
- Return type
rxns
-
3.1.1.3.16. datanator_query_python.query.query_sabiork_old module
-
class datanator_query_python.query.query_sabiork_old.QuerySabioOld(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_rk_old', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the sabio_rk_old collection.
-
get_info_by_entryid
(entry_id, target_organism=None, size=10, last_id=0)
Find reactions by SABIO-RK entry ID and return all of their information.
- Parameters
entry_id (int) – SABIO-RK entry ID
target_organism (str) – the organism in which the reaction occurs
size (int) – pagination page size. Defaults to 10.
last_id (int) – defaults to 0
- Returns
list of documents with the given entry ID
- Return type
(list of dict)
-
get_kinlaw_by_entryid
(entry_id)
Find reactions by SABIO-RK entry ID.
- Parameters
entry_id (int) – SABIO-RK entry ID
- Returns
{'kinlaw_id': [], 'substrates': [], 'products': []}
- Return type
(dict)
-
get_kinlaw_by_environment
(taxon=None, taxon_wildtype=None, ph_range=None, temp_range=None, name_space=None, param_type=None, projection={'_id': 0})
Get kinetic law (kinlaw) information based on experimental conditions.
- Parameters
taxon (list, optional) – list of NCBI taxon IDs
taxon_wildtype (list of bool, optional) – True indicates wildtype and False indicates mutant
ph_range (list, optional) – range of pH
temp_range (list, optional) – range of temperature
name_space (dict, optional) – cross_reference key/value pair, e.g. {'ec-code': '3.4.21.62'}
param_type (list, optional) – possible values for parameters.type
projection (dict, optional) – MongoDB query result projection
- Returns
(tuple) consisting of docs (list of dict): list of documents; count (int): number of documents found
-
get_kinlaw_by_rxn
(substrates, products, dof=0, projection={'_id': 0, 'kinlaw_id': 1}, bound='loose', skip=0, limit=0)
Find the kinlaw_id defined in sabio_rk using the InChIKeys of the reaction participants.
- Parameters
substrates (list) – list of substrates' InChIKeys
products (list) – list of products' InChIKeys
dof (int, optional) – degrees of freedom allowed (number of parts of the InChIKey to truncate). Defaults to 0.
projection (dict) – pymongo query projection
bound (str) – limit substrates/products to include only the input values
- Returns
list of kinlaws that satisfy the condition
- Return type
(list of dict)
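A standard InChIKey has three hyphen-separated blocks: a 14-character connectivity hash, a 10-character block encoding the remaining layers (such as stereochemistry and isotopes) plus flags, and a one-character protonation indicator. The sketch below assumes that dof counts how many trailing blocks are dropped before matching; truncate_inchikey is a hypothetical helper, not part of this package. Water's InChIKey serves as the example input.

```python
def truncate_inchikey(inchikey: str, dof: int = 0) -> str:
    """Drop the last `dof` hyphen-separated blocks of an InChIKey.

    This is an assumption about how `dof` is applied; the real method may differ.
    """
    # Blocks: connectivity hash, remaining-layers hash + flags, protonation.
    blocks = inchikey.split("-")
    if not 0 <= dof < len(blocks):
        raise ValueError("dof must be between 0 and the number of blocks - 1")
    return "-".join(blocks[: len(blocks) - dof])


water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
for dof in range(3):
    print(dof, truncate_inchikey(water, dof))
```

A truncated key would then presumably be matched as a prefix, so a larger dof gives a looser match (dof=2 keeps only the connectivity hash).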
-
get_kinlaw_by_rxn_name
(substrates, products, projection={'_id': 0, 'kegg_meta._id': 0, 'kegg_meta.gene_ortholog': 0}, bound='loose', skip=0, limit=0)
Find the kinlaw_id defined in sabio_rk using the names of the reaction participants.
- Parameters
substrates (list) – list of substrates' names
products (list) – list of products' names
projection (dict) – pymongo query projection
bound (str) – limit substrates/products to include only the input values
- Returns
list of kinlaws that satisfy the condition
- Return type
(list of dict)
-
get_kinlaw_by_rxn_ortho
(substrates, products, dof=0, projection={'_id': 0, 'enzymes': 1, 'kinlaw_id': 1}, bound='loose', skip=0, limit=0)
Find the kinlaw_id defined in sabio_rk using the InChIKeys of the reaction participants.
- Parameters
substrates (list) – list of substrates' InChIKeys
products (list) – list of products' InChIKeys
dof (int, optional) – degrees of freedom allowed (number of parts of the InChIKey to truncate). Defaults to 0.
projection (dict) – pymongo query projection
bound (str) – limit substrates/products to include only the input values
- Returns
list of kinlaws that satisfy the condition
- Return type
(list of dict)
-
get_kinlawid_by_rxn
(substrates, products, dof=0)
Find the kinlaw_id defined in sabio_rk using the InChIKeys of the reaction participants.
- Parameters
substrates (list) – list of substrates' InChIKeys
products (list) – list of products' InChIKeys
dof (int, optional) – degrees of freedom allowed (number of parts of the InChIKey to truncate). Defaults to 0.
- Returns
list of kinlaw_ids that satisfy the condition, e.g. [id0, id1, id2, …]
- Return type
rxns
-
get_reaction_by_subunit
(_ids)
Get reactions by the UniProt IDs of enzyme subunits.
- Parameters
_ids (list of str) – list of UniProt IDs
- Returns
list of kinlaw IDs
- Return type
(list of str)
-
get_reaction_doc
(kinlaw_id, projection={'_id': 0})
Find the reaction documents with the given kinlaw_id values.
- Parameters
kinlaw_id (list of int) – list of kinlaw IDs
projection (dict) – MongoDB query result projection
- Returns
(tuple) consisting of docs (list of dict): list of documents; count (int): number of documents found
-
get_rxn_with_prm
(kinlaw_ids, _from=0, size=10)
Given a list of kinlaw IDs, return the documents whose kinlaw has at least one Km or kcat value.
- Parameters
kinlaw_ids (list of int) – list of kinlaw IDs
_from (int) – record offset. Defaults to 0.
size (int) – number of records to be returned. Defaults to 10.
- Returns
list of reaction documents, and the IDs that have a parameter
- Return type
(tuple of list of dict and list of int)
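The filtering and paging described for get_rxn_with_prm can be illustrated in plain Python. This is a sketch only: the real method queries MongoDB, and the document layout used here (a parameters list whose entries carry a name such as 'Km' or 'kcat') is an assumption, as is returning the IDs of the current page only. rxn_with_prm is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def rxn_with_prm(
    docs: List[Dict[str, Any]], _from: int = 0, size: int = 10
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Keep docs with at least one Km or kcat parameter, then page the hits.

    The 'kinlaw_id', 'parameters' and 'name' fields are hypothetical.
    """
    wanted = {"Km", "kcat"}
    hits = [
        doc
        for doc in docs
        if any(p.get("name") in wanted for p in doc.get("parameters", []))
    ]
    # Apply the record offset (_from) and page size (size) to the matches.
    page = hits[_from:_from + size]
    return page, [doc["kinlaw_id"] for doc in page]


docs = [
    {"kinlaw_id": 1, "parameters": [{"name": "Km"}]},
    {"kinlaw_id": 2, "parameters": [{"name": "Ki"}]},
    {"kinlaw_id": 3, "parameters": [{"name": "kcat"}]},
]
page, ids = rxn_with_prm(docs)
print(ids)  # [1, 3]
```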
-
3.1.1.3.17. datanator_query_python.query.query_taxon_tree module
-
class
datanator_query_python.query.query_taxon_tree.
QueryTaxonTree
(cache_dirname=None, collection_str='taxon_tree', verbose=False, max_entries=inf, username=None, MongoDB=None, password=None, db='datanator-test', authSource='admin', readPreference='nearest', replicaSet=None)
Bases: datanator_query_python.util.mongo_util.MongoUtil
Queries specific to the taxon_tree collection.
-
each_under_category
(src_tax_ids, target_tax_id)
Given a list of source organism IDs, check whether each ID is a child of the target organism.
- Parameters
src_tax_ids (list of int) – list of NCBI Taxonomy IDs
target_tax_id (int) – target organism ID
- Returns
booleans indicating whether each source is a child of the target
- Return type
(list of bool)
-
get_all_species
()
Get all organisms in the taxon_tree collection.
- Returns
list of organisms
- Return type
result (list of str)
-
get_anc_by_id
(ids)
Get the organisms' ancestor IDs from the organisms' IDs.
- Parameters
ids – list of organisms' IDs, e.g. [12345, 234456]
- Returns
list of ancestors, ordered from the farthest to the closest
- Return type
(tuple of list)
-
get_anc_by_name
(names)
Get the organisms' ancestor IDs from the organisms' names.
- Parameters
names – list of organisms' names, e.g. Candidatus Diapherotrites
- Returns
result_id: list of ancestor IDs, ordered from the farthest to the closest; result_name: list of ancestor names, ordered from the farthest to the closest
-
get_canon_common_ancestor
(org1, org2, org_format='tax_id')
Get the closest common ancestor of two organisms and their distances to that ancestor.
- Parameters
org1 – organism 1
org2 – organism 2
org_format – the format of the organisms, e.g. tax_id or tax_name
- Returns
(Obj)
-
get_canon_common_ancestor_fast
(org1, org2, org_format='tax_id')
Get the closest common ancestor of two organisms and their distances to that ancestor.
- Parameters
org1 – organism 1
org2 – organism 2
org_format – the format of the organisms, e.g. tax_id or tax_name
- Returns
(Obj)
-
get_canon_rank_distance
(_id, front_end=False)
Given the ncbi_id, return the canonically-ranked ancestors along the lineage and their non-canonical distances.
- Parameters
_id (int) – ncbi_id of the organism
front_end (bool) – whether to format the result to meet front-end requests
- Returns
canonical organisms and distances, e.g. [{'a': 1}, {'b': 3}, …]
- Return type
(list of dict)
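The distance bookkeeping can be sketched as follows, assuming ancestors are ordered from the farthest to the closest (as get_anc_by_id returns them) and that nodes without a rank are marked '+' (as in get_rank). The set CANONICAL_RANKS and the helper canon_rank_distance are hypothetical, since the package may define canonical ranks differently. The toy lineage uses placeholder names.

```python
from typing import Dict, List

# Assumed set of canonical ranks; the real method may use a different list.
CANONICAL_RANKS = {
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
}


def canon_rank_distance(anc_names: List[str], anc_ranks: List[str]) -> List[Dict[str, int]]:
    """Return canonically-ranked ancestors with their distance to the organism.

    anc_names and anc_ranks are ordered from the farthest ancestor to the
    closest (the parent); '+' marks a node with no rank. The distance counts
    every step up the lineage, including steps through non-canonical nodes.
    """
    result = []
    n = len(anc_names)
    # Walk from the parent (distance 1) up to the root (distance n).
    for distance, i in enumerate(range(n - 1, -1, -1), start=1):
        if anc_ranks[i] in CANONICAL_RANKS:
            result.append({anc_names[i]: distance})
    return result


names = ["root", "c", "n1", "b", "n2", "a"]
ranks = ["+", "superkingdom", "+", "phylum", "+", "class"]
print(canon_rank_distance(names, ranks))  # [{'a': 1}, {'b': 3}, {'c': 5}]
```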
-
get_canon_rank_distance_by_name
(name, front_end=False)
Given the name of a species, return the canonically-ranked ancestors along the lineage and their non-canonical distances.
- Parameters
name (str) – name of the organism
front_end (bool) – whether to format the result to meet front-end requests
- Returns
canonical organisms and distances, e.g. [{'a': 1}, {'b': 3}, …]
- Return type
(list of dict)
-
get_common_ancestor
(org1, org2, org_format='name')
Get the closest common ancestor of two organisms and their distances to that ancestor.
- Parameters
org1 – organism 1
org2 – organism 2
org_format – the format of the organisms, e.g. tax_id or tax_name
- Returns
ancestor: the closest common ancestor's name; distance: each organism's distance to the ancestor
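Because every lineage in a tree starts at the same root, the closest common ancestor is the last element of the longest shared prefix of the two ancestor lists. Here is a minimal sketch, assuming both lists are ordered from the farthest to the closest ancestor and measuring distance as the number of steps up from each organism. common_ancestor is a hypothetical helper, not this method's implementation.

```python
from typing import List, Tuple


def common_ancestor(anc1: List[int], anc2: List[int]) -> Tuple[int, List[int]]:
    """Find the closest shared ancestor of two lineages and each organism's distance to it.

    Each lineage lists ancestor IDs from the farthest (root) to the closest
    (parent). The distance is the number of steps from the organism up to
    the shared ancestor.
    """
    shared = None
    # Lineages agree from the root down to the closest common ancestor.
    for a, b in zip(anc1, anc2):
        if a != b:
            break
        shared = a
    if shared is None:
        raise ValueError("lineages share no ancestor")
    distances = [len(anc1) - anc1.index(shared), len(anc2) - anc2.index(shared)]
    return shared, distances


print(common_ancestor([5, 4, 3, 2, 1], [5, 4, 3]))  # (3, [3, 1])
```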
-
get_equivalent_species
(_id, max_distance, max_depth=inf)
Get the equivalent species of the species with tax_id _id, given the maximum taxonomic distance. For instance, given three species
{'tax_id': 8, 'anc_id': [5, 4, 3, 2, 6, 7]}
{'tax_id': 9, 'anc_id': [5, 4, 3]}
{'tax_id': 0, 'anc_id': [5, 4, 3, 2, 1]}
the equivalent species of 0 given a max_distance of 2 is 8, and the equivalent species of 0 given a max_distance of 3 are 8 and 9.
- Parameters
_id (int) – taxonomy ID of the species
max_distance (int) – maximum distance allowed from species _id
max_depth (int) – defaults to inf
- Returns
ids (list of int): list of IDs of the species that met the condition; names (list of str): list of names of the species that met the condition
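The example in the get_equivalent_species description can be reproduced with a short sketch, assuming the equivalent species are those whose anc_id list contains the ancestor exactly max_distance steps above species _id. The helper equivalent_species is hypothetical. It works on an in-memory list instead of MongoDB and ignores max_depth, whose semantics are not documented.

```python
from typing import Dict, List

# Toy collection copied from the get_equivalent_species example.
SPECIES = [
    {"tax_id": 8, "anc_id": [5, 4, 3, 2, 6, 7]},
    {"tax_id": 9, "anc_id": [5, 4, 3]},
    {"tax_id": 0, "anc_id": [5, 4, 3, 2, 1]},
]


def equivalent_species(_id: int, max_distance: int, docs: List[Dict]) -> List[int]:
    """Return tax_ids that share the ancestor max_distance steps above _id.

    anc_id is ordered from the farthest ancestor to the parent, so the
    ancestor max_distance steps up is anc_id[-max_distance].
    """
    by_id = {doc["tax_id"]: doc for doc in docs}
    anchor = by_id[_id]["anc_id"][-max_distance]
    return [
        doc["tax_id"] for doc in docs
        if doc["tax_id"] != _id and anchor in doc["anc_id"]
    ]


print(equivalent_species(0, 2, SPECIES))  # [8]
print(equivalent_species(0, 3, SPECIES))  # [8, 9]
```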
-
get_ids_by_name
(name)
Get all taxon IDs associated with an organism name.
- Parameters
name (str) – species name
- Returns
list of taxon IDs
- Return type
ids (list of int)
-
get_name_by_id
(ids)
Get the organisms' names given their tax_ids.
- Parameters
ids (list) – list of organisms' tax_ids
- Returns
organisms' IDs and names
- Return type
(dict)
-
get_rank
(ids)
Given a list of taxon IDs, return the list of their ranks; a taxon with no rank is represented by '+'.
- Parameters
ids – list of taxon IDs, e.g. [1234, 2453, 431]
- Returns
list of ranks, e.g. ['kingdom', '+', 'phylum']
- Return type
ranks
-
under_category
(src_tax_id, target_tax_id)
Given a source taxonomy ID, check whether it is among the children of the target taxonomy ID.
- Parameters
src_tax_id (int) – source organism taxonomic ID
target_tax_id (int) – target organism taxonomic ID
- Returns
whether the source is under the target organism
- Return type
(bool)
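Using anc_id lists like those in the get_equivalent_species example, both under_category and each_under_category can be sketched as membership tests, assuming "children" means descendants at any depth. The ANCESTORS mapping and both functions below are hypothetical in-memory stand-ins for the MongoDB-backed methods.

```python
from typing import Dict, List

# Hypothetical stand-in for taxon_tree: tax_id -> anc_id
# (ancestors ordered from the farthest to the parent).
ANCESTORS: Dict[int, List[int]] = {
    8: [5, 4, 3, 2, 6, 7],
    9: [5, 4, 3],
    0: [5, 4, 3, 2, 1],
}


def under_category(src_tax_id: int, target_tax_id: int) -> bool:
    """True if target_tax_id appears anywhere in the source's lineage."""
    return target_tax_id in ANCESTORS.get(src_tax_id, [])


def each_under_category(src_tax_ids: List[int], target_tax_id: int) -> List[bool]:
    """Apply under_category to every source ID, preserving input order."""
    return [under_category(tax_id, target_tax_id) for tax_id in src_tax_ids]


print(each_under_category([8, 9, 0], 2))  # [True, False, True]
```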
-
3.1.1.3.18. datanator_query_python.query.query_uniprot module
-
class
datanator_query_python.query.query_uniprot.
QueryUniprot
(username=None, password=None, server=None, authSource='admin', database='datanator', collection_str=None, readPreference='nearest', replicaSet=None)
Bases: object
-
get_doc_by_locus
(locus, projection={'_id': 0})
Get the preferred gene name by locus name.
- Parameters
locus (str) – gene locus name
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0}.
- Returns
pymongo cursor object and number of documents
- Return type
(tuple of Iter and int)
-
get_gene_protein_name_by_embl
(embl, species=None, projection={'_id': 0})
Get documents by EMBL or RefSeq identifier.
- Parameters
embl (list) – EMBL information
species (list) – NCBI taxonomy IDs. Defaults to None.
projection (dict, optional) – pymongo projection. Defaults to {'_id': 0}.
- Returns
gene_name and protein_name
- Return type
(tuple of str)
-
get_gene_protein_name_by_oln
(oln, species=None, projection={'_id': 0})
Get documents by ordered locus name.
- Parameters
oln (str) – ordered locus name
species (list) – NCBI taxonomy IDs. Defaults to None.
projection (dict, optional) – pymongo projection. Defaults to {'_id': 0}.
- Returns
gene_name and protein_name
- Return type
(tuple of str)
-
get_id_by_org_gene
(org_gene)
Convert a KEGG org_gene identifier into a UniProt ID.
- Parameters
org_gene (str) – KEGG org_gene format, e.g. aly:ARALYDRAFT_486312
- Returns
UniProt ID
- Return type
(str)
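KEGG gene identifiers join an organism code and a gene identifier with a colon, as in aly:ARALYDRAFT_486312. Below is a sketch of splitting such an identifier before any lookup. split_org_gene is a hypothetical helper, and this sketch does not show how the real method maps the parts to a UniProt ID.

```python
from typing import Tuple


def split_org_gene(org_gene: str) -> Tuple[str, str]:
    """Split a KEGG 'org:gene' identifier into organism code and gene ID.

    Only the first ':' separates the two parts.
    """
    org, sep, gene = org_gene.partition(":")
    if not sep or not org or not gene:
        raise ValueError(f"expected 'org:gene', got {org_gene!r}")
    return org, gene


print(split_org_gene("aly:ARALYDRAFT_486312"))  # ('aly', 'ARALYDRAFT_486312')
```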
-
get_info_by_entrez_id
(entrez_id)
Get protein information by Entrez Gene ID.
- Parameters
entrez_id (str) – Entrez Gene ID
- Returns
UniProt ID
- Return type
(str)
-
get_names_by_gene_name
(gene_name)
Get the standard gene name from a gene name.
- Parameters
gene_name (list of str) – list of gene names belonging to one protein
- Returns
standard gene_name, protein_name
- Return type
(tuple of str)
-
get_protein_name_by_gn
(gene_name, species=None, projection={'_id': 0})
Get documents by gene name.
- Parameters
gene_name (str) – gene name
species (list) – NCBI taxonomy IDs. Defaults to None.
projection (dict, optional) – pymongo projection. Defaults to {'_id': 0}.
- Returns
gene_name and protein_name
- Return type
(tuple of str)
-
3.1.1.3.19. datanator_query_python.query.query_uniprot_org module
For querying uniprot.org using the UniProt API (https://www.uniprot.org/help/api_queries).
-
class
datanator_query_python.query.query_uniprot_org.
QueryUniprotOrg
(query, api='https://www.uniprot.org/uniprot/?', include='yes', compress='no', limit=1, offset=0)
Bases: object
-
get_kegg_ortholog
()
Get KEGG ortholog information using the query message.
- Returns
KEGG ortholog number
- Return type
(str)
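The QueryUniprotOrg constructor arguments could be combined into a request URL as sketched below, assuming each one maps to a query-string parameter of the same name (the default api value already ends with '?'). build_uniprot_url is hypothetical, the query 'gene:ptsI' is only an example input, and the real class may build its URL differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_uniprot_url(query: str, api: str = "https://www.uniprot.org/uniprot/?",
                      include: str = "yes", compress: str = "no",
                      limit: int = 1, offset: int = 0) -> str:
    """Combine the constructor arguments into one query-string URL.

    Assumes each argument maps to a query-string parameter of the same name.
    """
    params = {"query": query, "include": include, "compress": compress,
              "limit": limit, "offset": offset}
    # api already ends with '?', so the encoded parameters are appended directly.
    return api + urlencode(params)


url = build_uniprot_url("gene:ptsI")
print(url)
print(parse_qs(urlsplit(url).query)["query"])  # ['gene:ptsI']
```

urlencode percent-encodes reserved characters such as ':' in the query value, and parse_qs reverses that encoding.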
-
3.1.1.3.20. datanator_query_python.query.query_xmdb module
-
class
datanator_query_python.query.query_xmdb.
QueryXmdb
(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='ecmdb', readPreference='nearest', replicaSet=None)
Bases: object
-
get_all_concentrations
(projection={'_id': 0, 'inchi': 1, 'inchikey': 1, 'name': 1, 'smiles': 1})
Get all entries that have concentration values.
- Parameters
projection (dict, optional) – MongoDB query projection. Defaults to {'_id': 0, 'inchi': 1, 'inchikey': 1, 'name': 1, 'smiles': 1}.
- Returns
all results that meet the constraint
- Return type
(list)
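The default projection of get_all_concentrations is an inclusion projection that also suppresses _id, so each returned document carries only inchi, inchikey, name and smiles (when present). The sketch below emulates MongoDB's top-level projection rules on a plain dict to make this concrete. apply_projection is a hypothetical helper that ignores nested paths and projection operators and does not reject mixed inclusion/exclusion projections. The toy document is illustrative only.

```python
from typing import Any, Dict


def apply_projection(doc: Dict[str, Any], projection: Dict[str, int]) -> Dict[str, Any]:
    """Emulate MongoDB's top-level inclusion/exclusion projection on one document."""
    if any(projection.values()):
        # Inclusion mode: keep only fields set to 1; _id stays unless set to 0.
        keep = {k for k, v in projection.items() if v}
        if projection.get("_id", 1):
            keep.add("_id")
        return {k: v for k, v in doc.items() if k in keep}
    # Exclusion mode: drop every field set to 0 and keep everything else.
    return {k: v for k, v in doc.items() if projection.get(k, 1)}


default_projection = {"_id": 0, "inchi": 1, "inchikey": 1, "name": 1, "smiles": 1}
doc = {"_id": 42, "name": "compound A", "inchikey": "KEY-A", "smiles": "C", "concentrations": [1.5]}
print(apply_projection(doc, default_projection))
# {'name': 'compound A', 'inchikey': 'KEY-A', 'smiles': 'C'}
```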
-