3.1.1.3. datanator_query_python.query package

3.1.1.3.1. Submodules

3.1.1.3.3. datanator_query_python.query.query_corum module

class datanator_query_python.query.query_corum.QueryCorum(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='corum', readPreference='nearest', replicaSet=None)[source]

Bases: object

get_complexes_with_ncbi(ncbi_id, projection={'_id': 0})[source]

Find all complexes in species with ncbi taxonomy id

Parameters

ncbi_id (int) – ncbi taxonomy id

Returns

list of all objects that meet the constraint

Return type

(list)

get_complexes_with_uniprot(uniprot_id, ncbi_id=9606)[source]

Find complexes in species that have protein with uniprot_id

Parameters
  • uniprot_id (str) – uniprot id of protein

  • ncbi_id (int, optional) – ncbi taxonomy id of species. Defaults to 9606.

Returns

list of complexes that meet the requirement

Return type

(list of dict)

3.1.1.3.4. datanator_query_python.query.query_intact_complex module

class datanator_query_python.query.query_intact_complex.QueryIntactComplex(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='intact_complex', readPreference='nearest', replicaSet=None)[source]

Bases: object

get_complex_with_ncbi(ncbi)[source]

Get complexes that are in species with ncbi id

Parameters

ncbi (int) – ncbi taxonomy id of species

Returns

list of complexes

Return type

(list of dict)

3.1.1.3.5. datanator_query_python.query.query_kegg_organism_code module

class datanator_query_python.query.query_kegg_organism_code.QueryKOC(username=None, password=None, server=None, authSource='admin', database='datanator', collection_str=None, readPreference='nearest', replicaSet=None)[source]

Bases: object

get_ncbi_by_org_code(org_code)[source]

Get NCBI Taxonomy ID by Kegg organism code.

Parameters

org_code (str) – Kegg organism code.

Returns

NCBI Taxonomy ID.

Return type

(int)

get_org_code_by_ncbi(_id)[source]

Get Kegg organism code given NCBI Taxonomy ID.

Parameters

_id (int) – NCBI Taxonomy ID.

Returns

Kegg organism code.

Return type

(str)

3.1.1.3.6. datanator_query_python.query.query_kegg_orthology module

class datanator_query_python.query.query_kegg_orthology.QueryKO(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, readPreference='nearest', replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

get_def_by_kegg_id(kegg_id)[source]

Get kegg definition by kegg id

Parameters

kegg_id (str) – kegg orthology

Returns

list of kegg orthology definitions

Return type

(list of str)

get_ko_by_name(name)[source]

Get a gene’s ko number by its gene name

Parameters

name (str) – gene name

Returns

ko number of the gene

Return type

(str)

get_loci_by_id_org(kegg_id, org, gene_id)[source]

Get ortholog locus id given kegg_id, organism code and gene_id.

Parameters
  • kegg_id (str) – Kegg ortholog id.

  • org (str) – Kegg organism code.

  • gene_id (str) – Gene id.

Returns

locus id.

Return type

(str)

get_meta_by_kegg_id(kegg_id)[source]

Get meta information by kegg_id

Parameters

kegg_id (str) – Kegg ID.

Returns

Kegg meta object.

Return type

(Obj)

get_meta_by_kegg_ids(kegg_ids, projection={'_id': 0, 'gene_ortholog': 0})[source]

Get meta given kegg ids

Parameters
  • kegg_ids (list of str) – List of kegg ids.

  • projection (dict) – MongoDB result projection.

Returns

pymongo Cursor obj and number of documents found.

Return type

(tuple of pymongo.Cursor and int)

get_meta_by_ortho_ids(orthodb_ids, projection={'_id': 0, 'gene_ortholog': 0}, limit=0)[source]

Get meta given orthodb ids

Parameters
  • orthodb_ids (list of str) – List of orthodb ids.

  • projection (dict) – MongoDB result projection.

Returns

pymongo Cursor obj and number of documents found.

Return type

(tuple of pymongo.Cursor and int)
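The projection defaults in the two methods above follow the MongoDB convention in which a value of 0 excludes a field from each returned document. The following is a minimal in-memory sketch of that exclusion behaviour, not the library's code: the helper apply_exclusion_projection and the document fields other than _id and gene_ortholog are hypothetical and used only for illustration.

```python
def apply_exclusion_projection(doc, projection):
    """Return a copy of doc without the top-level fields whose projection value is 0.

    Only top-level exclusion is mimicked here; dotted paths and inclusion
    projections are deliberately out of scope for this sketch.
    """
    excluded = {field for field, flag in projection.items() if flag == 0}
    return {key: value for key, value in doc.items() if key not in excluded}


# Made-up document; 'kegg_orthology_id' and 'definition' are assumed field names.
doc = {
    "_id": "placeholder-object-id",
    "kegg_orthology_id": "K00001",
    "definition": "placeholder definition",
    "gene_ortholog": ["placeholder"],
}
trimmed = apply_exclusion_projection(doc, {"_id": 0, "gene_ortholog": 0})
print(trimmed)
```

Excluding a large field such as gene_ortholog by default presumably keeps responses small; passing a different projection would override that choice.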

3.1.1.3.7. datanator_query_python.query.query_metabolite_concentrations module

class datanator_query_python.query.query_metabolite_concentrations.QueryMetaboliteConcentrations(MongoDB=None, db=None, collection_str=None, username=None, password=None, authSource='admin', readPreference='nearest', verbose=True, replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

get_conc_by_taxon(_id)[source]

Get concentrations by ncbi taxonomy ID.

Parameters

_id (int) – NCBI Taxonomy ID.

Returns

(Pymongo.Cursor)

get_conc_count()[source]

Get total number of concentration data points.

get_similar_concentrations(metabolite, threshold=0.6)[source]

Get metabolite’s similar compounds’ concentrations above threshold tanimoto value.

Parameters
  • metabolite (str) – InChIKey of metabolite.

  • threshold (float, optional) – Threshold value (inclusive).

Returns

[{‘inchikey’: xxxx, ‘similarity_score’: …, ‘concentrations’: []}]

Return type

(list of Obj)
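To make the inclusive threshold concrete, here is a small sketch of Tanimoto filtering over sets of fingerprint bits. It is an illustration under assumptions, not the library's implementation: the functions tanimoto and filter_similar, the bit sets, and the keys KEY-A, KEY-B and KEY-C are all made up, and the real method computes similarity on the server side in a way this document does not describe.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def filter_similar(query_bits, candidates, threshold=0.6):
    """Keep candidates whose score is at or above threshold (inclusive), best first."""
    hits = []
    for inchikey, bits in candidates.items():
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append({"inchikey": inchikey, "similarity_score": score})
    return sorted(hits, key=lambda hit: hit["similarity_score"], reverse=True)


query = {1, 2, 3, 4, 5}
candidates = {
    "KEY-A": {1, 2, 3, 4, 5},  # identical fingerprint, score 1.0
    "KEY-B": {1, 2, 3},        # 3 shared bits out of 5, score 0.6 (kept: inclusive)
    "KEY-C": {1, 9},           # 1 shared bit out of 6, below threshold
}
hits = filter_similar(query, candidates)
print(hits)
```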

3.1.1.3.8. datanator_query_python.query.query_metabolites module

Metabolite Query

Author

Bilal Shaikh <bilalshaikh42@gmail.com>

Date

2019-08-01

Copyright

2019, Karr Lab

License

MIT

class datanator_query_python.query.query_metabolites.QueryMetabolites(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to metabolites (ECMDB, YMDB) collection

get_conc_from_inchi(inchi, inchi_key=False, consensus=False, projection={'_id': 0})[source]

Given inchi, find the metabolite’s concentration values.

Parameters
  • inchi (str) – inchi or inchi key of metabolite.

  • inchi_key (bool) – input is InChI Key or not.

  • consensus (bool) – whether to return consensus values or list of individual values.

Returns

concentration values separated by collections e.g. [{‘ymdb’: }, {‘ecmdb’: }]

Return type

(list of dict)

get_concentration_count()[source]

Get number of metabolites with concentration values.

Returns

Number of metabolites with concentrations.

Return type

(int)

get_meta_from_inchis(inchis, species, last_id='000000000000000000000000', page_size=20)[source]

Get all information about metabolites given a list of inchi strings.

Parameters
  • inchis (list of str) – list of inchi strings

  • species (str) – name of species in which the metabolite resides

  • last_id (str) – hex encoded version of ObjectId, which is the last item of the previous page

  • page_size (int) – number of items per page

Returns

list of information

Return type

(list of dict)
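The last_id and page_size parameters describe cursor-style (keyset) pagination: each call resumes after the last ObjectId of the previous page. The sketch below mimics that idea on an in-memory list, assuming documents are ordered by a fixed-width hex _id. The function get_page and the sample documents are hypothetical; the library's actual query is not shown in this document.

```python
def get_page(docs, last_id="0" * 24, page_size=20):
    """Return up to page_size docs whose hex _id sorts strictly after last_id."""
    ordered = sorted(docs, key=lambda doc: doc["_id"])
    return [doc for doc in ordered if doc["_id"] > last_id][:page_size]


# Five made-up documents with 24-character, zero-padded hex ids.
docs = [{"_id": format(i, "024x"), "name": f"metabolite_{i}"} for i in range(1, 6)]

first = get_page(docs, page_size=2)
# Feed the last id of the previous page back in to fetch the next page.
second = get_page(docs, last_id=first[-1]["_id"], page_size=2)
print([doc["name"] for doc in first], [doc["name"] for doc in second])
```

Compared with skip-based paging, resuming from an id avoids rescanning earlier records, which is presumably why the default last_id is the all-zero ObjectId.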

3.1.1.3.9. datanator_query_python.query.query_metabolites_meta module

class datanator_query_python.query.query_metabolites_meta.QueryMetabolitesMeta(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, collection_str='metabolites_meta', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to metabolites_meta collection

get_doc_by_name(names)[source]

Get document by metabolite’s list of possible names.

Parameters

names (list of str) – List of possible names.

Returns

(Obj)

get_eymeta(inchi_key)[source]

Get meta info from ECMDB or YMDB

Parameters

inchi_key (str) – inchikey / name of metabolite molecule.

Returns

meta information.

Return type

(Obj)

get_ids_from_hash(hashed_inchi)[source]

Given a hashed inchi string, find its corresponding m2m_id and/or ymdb_id.

Parameters

hashed_inchi (str) – string of hashed inchi

Returns

dictionary of ids and their keys {‘m2m_id’: …, ‘ymdb_id’: …}

Return type

(dict)

get_ids_from_hashes(hashed_inchi)[source]

Given a list of hashed inchi strings, find their corresponding m2m_id and/or ymdb_id.

Parameters

hashed_inchi (list of str) – list of hashed inchi

Returns

list of dictionaries of ids and their keys [{‘m2m_id’: …, ‘ymdb_id’: …, ‘InChI_Key’: …}, {}, ..]

Return type

(list of dict)

get_metabolite_hashed_inchi(compounds)[source]

Given a list of compound name(s), return the corresponding hashed inchi strings.

Parameters

compounds (list of str) – list of compound names, e.g. [‘ATP’, ‘2-Ketobutanoate’]

Returns

list of hashed inchi strings, e.g. [‘3e23df….’, ‘7666ffa….’]

Return type

(list of str)

get_metabolite_inchi(compounds)[source]

Given a list of compound name(s), return the corresponding inchi strings.

Parameters

compounds (list of str) – list of compound names, e.g. [‘ATP’, ‘2-Ketobutanoate’]

Returns

list of inchi strings, e.g. [‘….’, ‘InChI=1S/C4H6O3/c1-2-3(5)4(6)7/…’]

get_metabolite_name_by_hash(compounds)[source]

Given a list of hashed inchi, return a list of names (one of the synonyms) for each compound.

Parameters

compounds (list) – list of compounds in inchikey format

Returns

list of names [name, name, name]

Return type

(list)

get_metabolite_synonyms(compounds)[source]

Find synonyms of a compound

Parameters

compounds (list) – name(s) of the compound e.g. “ATP”, [“ATP”, “Oxygen”, …]

Returns

synonyms: dictionary of synonyms of the compounds {‘ATP’: [], ‘Oxygen’: [], …}

rxns: dictionary of rxns in which each compound is found {‘ATP’: [12345,45678,…], ‘Oxygen’: […], …}

get_metabolites_meta(inchi_key)[source]

Get metabolite’s meta information given inchi_key.

Parameters

inchi_key (str) – InChI Key of metabolite.

Returns

meta information object.

Return type

(dict)

get_unique_metabolites()[source]

Get number of unique metabolites.

Returns

number of unique metabolites.

Return type

(int)

3.1.1.3.10. datanator_query_python.query.query_pax module

class datanator_query_python.query.query_pax.QueryPax(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='pax', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to pax collection

get_abundance_from_uniprot(uniprot_id)[source]

Get all abundance data for uniprot_id

Parameters

uniprot_id (str) – protein uniprot_id.

Returns

result containing [{‘ncbi_taxonomy_id’: , ‘species_name’: , ‘ordered_locus_name’: }, {‘organ’: , ‘abundance’}, {‘organ’: , ‘abundance’}].

Return type

(list of dict)

get_all_species()[source]

Get a list of all species in pax collection

Returns

list of species names with no duplicates

Return type

(list of str)

get_file_by_name(file_name: list, projection={'_id': 0}, collation=None) → list[source]

Given file name, get the information attached to the file.

Parameters

file_name (list) – list of file names, e.g. [‘9606/9606-iPS_(DF19.11)_iTRAQ-114_Phanstiel_2011_gene.txt’]

Returns

files that meet the requirement

Return type

list

get_file_by_ncbi_id(taxon: list, projection={'_id': 0}, collation=None) → list[source]

Given the list of taxon ncbi ID, get all the files associated to the taxon.

Parameters

taxon (list) – list of taxon ncbi ID

Returns

files that meet the requirement

Return type

list

get_file_by_organ(organ, projection={'_id': 0})[source]

Get documents by organ

Parameters
  • organ (str) – organ type in paxdb

  • projection (dict, optional) – mongodb query projection. Defaults to {‘_id’: 0}.

Returns

tuple containing:

docs (Iterator): mongodb docs iterator; count (int): total number of documents that meet the query conditions.

Return type

(tuple)

get_file_by_publication(publication, projection={'_id': 0})[source]

Get documents by publication

Parameters
  • publication (str) – URL of publication

  • projection (dict, optional) – mongodb query projection. Defaults to {‘_id’: 0}.

Returns

tuple containing:

docs (Iterator): mongodb docs iterator; count (int): total number of documents that meet the query conditions.

Return type

(tuple)

get_file_by_quality(organ, score=4.0, coverage=20, ncbi_id=None, projection={'_id': 0, 'weight': 0})[source]

Get ‘organ’s’ paxdb file by quality of data

Parameters
  • organ (str) – organ type in paxdb, e.g. WHOLE_ORGANISM, CELL_LINE, etc

  • score (float, optional) – paxdb data quality score. Defaults to 4.0.

  • coverage (int, optional) – paxdb data coverage. Defaults to 20.

  • ncbi_id (int, optional) – ncbi taxonomy id of organism. Defaults to None.

  • projection (dict, optional) – mongodb query projection. Defaults to {‘_id’: 0, ‘weight’: 0}

Returns

tuple containing:

docs (Iterator): mongodb docs iterator; count (int): total number of documents that meet the query conditions.

Return type

(tuple)

3.1.1.3.11. datanator_query_python.query.query_protein module

class datanator_query_python.query.query_protein.QueryProtein(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='uniprot', readPreference='nearest', replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

get_abundance_by_id(_id)[source]

Get protein abundance information by uniprot_id.

Parameters

_id (list) – list of uniprot_id.

get_abundance_by_ko(ko)[source]

Get abundance information of proteins with the same KO.

Parameters

ko (str) – KO number.

Returns

information [{‘uniprot_id’: , ‘abundances’: }, {},…,{}].

Return type

(list of dict)

get_abundance_by_taxon(_id)[source]

Get protein abundance information in one species.

Parameters

_id (str) – taxonomy id.

Returns

list of abundance information

Return type

(list of dict)

get_abundance_with_same_ko(_id)[source]

Find abundance information for protein with the same KO number.

Parameters

_id (str) – uniprot ID.

Returns

information [{‘uniprot_id’: , ‘abundances’: }, {},…,{}].

Return type

(list of dict)

get_all_kegg(ko, anchor, max_distance)[source]

Get replacement abundance value by taxonomic distance with the same kegg_orthology number.

Parameters
  • ko (str) – kegg orthology id to query for.

  • anchor (str) – anchor species’ name.

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

Returns

list of result proteins and their info [ {‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …].

Return type

(list of dict)

get_all_ortho(ko, anchor, max_distance)[source]

Get replacement abundance value by taxonomic distance with the same OrthoDB group number.

Parameters
  • ko (str) – OrthoDB group id to query for.

  • anchor (str) – anchor species’ name.

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

Returns

list of result proteins and their info [ {‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …].

Return type

(list of dict)

get_equivalent_kegg_with_anchor_obsolete(ko, anchor, max_distance, max_depth=inf)[source]

Get replacement abundance value by taxonomic distance with the same kegg_orthology number.

Parameters
  • ko (str) – kegg orthology id to query for.

  • anchor (str) – anchor species’ name.

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

  • max_depth (int) –

Returns

list of result proteins and their info [{‘distance’: 0, ‘documents’: [{}]}, {‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …].

Return type

(list of dict)

get_equivalent_protein(_id, max_distance, max_depth=inf)[source]

Get replacement abundance value by taxonomic distance with the same kegg_orthology number.

Parameters
  • _id (str) – uniprot_id to query for.

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

  • max_depth (int) –

Returns

list of result proteins and their info [{‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …].

Return type

(list of dict)
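The return shape used by the distance-based methods in this class groups result documents by their taxonomic distance from the query protein. As a rough sketch of that grouping only (how the library computes distances is not described here), the hypothetical helper group_by_distance below buckets made-up (distance, document) pairs and drops anything beyond max_distance. The identifiers ID_A to ID_D are placeholders.

```python
from collections import defaultdict


def group_by_distance(hits, max_distance):
    """Group (distance, document) pairs into [{'distance': d, 'documents': [...]}, ...]."""
    buckets = defaultdict(list)
    for distance, document in hits:
        # Assumption: distance 0 (the query species itself) is not reported here.
        if 1 <= distance <= max_distance:
            buckets[distance].append(document)
    return [{"distance": d, "documents": buckets[d]} for d in sorted(buckets)]


hits = [
    (2, {"uniprot_id": "ID_A"}),
    (1, {"uniprot_id": "ID_B"}),
    (4, {"uniprot_id": "ID_C"}),  # beyond max_distance, dropped
    (1, {"uniprot_id": "ID_D"}),
]
grouped = group_by_distance(hits, max_distance=3)
print(grouped)
```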

get_equivalent_protein_with_anchor(_id, max_distance, max_depth=inf)[source]

Get replacement abundance value by taxonomic distance with the same kegg_orthology number.

Parameters
  • _id (str) – uniprot_id to query for.

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

  • max_depth (int) –

Returns

list of result proteins and their info [{‘distance’: 0, ‘documents’: [{}]}, {‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …].

Return type

(list of dict)

get_id_by_name(name)[source]

Get proteins whose name contains string ‘name’.

Parameters

name (str) – complete/incomplete protein name.

Returns

list of dictionary containing protein’s uniprot_id and name.

Return type

(list of dict)

get_info_by_ko(ko)[source]

Find all proteins with the same kegg orthology id.

Parameters

ko (str) – kegg orthology ID.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []},

{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []}].

Return type

(list of dict)

get_info_by_ko_abundance(ko)[source]

Find all proteins with the same kegg orthology id.

Parameters

ko (str) – kegg orthology ID.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {}},

{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {}}].

Return type

(list of dict)

get_info_by_orthodb(orthodb)[source]

Find all proteins with the same OrthoDB group id.

Parameters

orthodb (str) – OrthoDB group ID.

Returns

list of dictionary containing protein’s uniprot_id and orthodb information [{‘orthodb_id’: … ‘orthodb_name’: … ‘uniprot_ids’: []}, {‘orthodb_id’: … ‘orthodb_name’: … ‘uniprot_ids’: []}].

Return type

(list of dict)

get_info_by_taxonid(_id)[source]

Get proteins associated with ncbi taxonomy id.

Parameters

_id (int) – ncbi taxonomy id.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []},

{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []}].

Return type

(list of dict)

get_info_by_taxonid_abundance(_id)[source]

Get proteins associated with ncbi id.

Parameters

_id (int) – ncbi taxonomy id.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {‘id0’: 0, ‘id1’: 1, ‘id2’: 0}},

{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {‘id0’: 0, ‘id1’: 1, ‘id2’: 0}}].

Return type

(list of dict)

get_info_by_text(name)[source]

Get proteins whose name or kegg name contains string ‘name’.

Parameters

name (str) – complete/incomplete protein name.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []},

{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: []}].

Return type

(list of dict)

get_info_by_text_abundances(name)[source]

Get proteins whose name or kegg name contains string ‘name’.

Parameters

name (str) – complete/incomplete protein name.

Returns

list of dictionary containing protein’s uniprot_id and kegg information [{‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {‘id0’: 0, ‘id1’: 1, ‘id2’: 0}}, {‘ko_number’: … ‘ko_name’: … ‘uniprot_ids’: {‘id0’: 0, ‘id1’: 1, ‘id2’: 0}}], where 0 means the protein has abundance info and 1 means it has none.

Return type

(list of dict)

get_kegg_orthology(uniprot_id)[source]

Get protein’s kegg orthology number given uniprot id.

Parameters

uniprot_id (str) – protein’s uniprot id.

Returns

tuple containing:

(str): kegg orthology id; (list of str): list of kegg orthology descriptions.

Return type

(tuple)

get_kinlaw_by_id(_id)[source]

Get protein kinetic law information by uniprot_id.

Parameters

_id (list of str) – list of uniprot IDs.

Returns

list of kinlaw information.

Return type

(list of dict)

get_kinlaw_by_name(name)[source]

Get protein kinetic law information by protein name.

Parameters

name (str) – protein’s name.

Returns

information.

Return type

(list of dict)

get_meta_by_id(_id)[source]

Get protein’s metadata given uniprot id

Parameters

_id (list of str) – list of uniprot id.

Returns

list of information.

Return type

(list of dict)

get_meta_by_name_name(protein_name, species_name)[source]

Get protein metadata by protein name and the name of the species in which the protein resides

Parameters
  • protein_name (str) – name of the protein

  • species_name (str) – complete/partial name of the organism

Returns

protein’s metadata

Return type

(list of dict)

get_meta_by_name_taxon(name, taxon_id)[source]

Get protein’s metadata given protein name and its ncbi taxonomy ID

Parameters
  • name (str) – protein’s complete/partial name.

  • taxon_id (int) – protein’s ncbi taxonomy id.

Returns

protein’s metadata.

Return type

(list of dict)

get_ortho_by_id(_id)[source]

Get protein’s metadata given uniprot id

Parameters

_id (str) – uniprot id.

Returns

list of information.

Return type

(list of dict)

get_proximity_abundance_taxon(_id, max_distance=3)[source]

Get replacement abundance value by taxonomic distance with the same kegg_orthology number.

Parameters
  • _id (str) – uniprot_id to query for

  • max_distance (int) – max taxonomic distance from origin protein allowed for proteins in results.

Returns

list of result proteins and their info [{‘distance’: 1, ‘documents’: [{}, {}, {} …]}, {‘distance’: 2, ‘documents’: [{}, {}, {} …]}, …]

Return type

(list of dict)

get_uniprot_by_ko(ko)[source]

Find all proteins with the same kegg orthology id.

Parameters

ko (str) – kegg orthology ID.

Returns

list of uniprot_id.

Return type

(list of str)

get_unique_organism()[source]

Get number of unique organisms in collection.

Returns

number of unique organisms.

Return type

(int)

get_unique_protein()[source]

Get number of unique proteins in collection

Returns

number of unique proteins.

Return type

(int)

3.1.1.3.12. datanator_query_python.query.query_rna_halflife module

class datanator_query_python.query.query_rna_halflife.QueryRNA(server=None, username=None, password=None, verbose=False, db=None, collection_str=None, authDB='admin', readPreference='nearest', replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

get_doc_by_ko(ko_number, projection={'_id': 0}, _from=0, size=0)[source]

Get documents by ko_number

Parameters
  • ko_number (str) – Kegg ortholog number.

  • projection (dict, optional) – mongodb query result projection. Defaults to {‘_id’: 0}.

  • _from (int) – first page (0-indexed).

  • size (int) – number of items per page.

Returns

pymongo iterable and number of documents.

Return type

(tuple of Pymongo.Cursor and int)

get_doc_by_names(name, projection={'_id': 0}, _from=0, size=0)[source]

Get document by protein name

Parameters
  • name (str) – name of the protein

  • projection (dict, optional) – mongodb query result projection. Defaults to {‘_id’: 0}.

  • _from (int) – first page (0-indexed).

  • size (int) – number of items per page.

Returns

Pymongo cursor object and number of documents returned.

Return type

(tuple of Pymongo.Cursor and int)

get_doc_by_oln(oln, projection={'_id': 0})[source]

Get document by ordered locus name

Parameters
  • oln (str) – ordered locus name.

  • projection (dict) – pymongo query projection.

Returns

Pymongo cursor object and number of documents returned

Return type

(tuple of Pymongo.Cursor and int)

get_doc_by_orthodb(orthodb, projection={'_id': 0}, _from=0, size=0)[source]

Get documents by orthodb group ID.

Parameters
  • orthodb (str) – Orthodb group ID.

  • projection (dict, optional) – mongodb query result projection. Defaults to {‘_id’: 0}.

  • _from (int) – first page (0-indexed).

  • size (int) – number of items per page.

Returns

pymongo iterable and number of documents.

Return type

(tuple of Pymongo.Cursor and int)

3.1.1.3.13. datanator_query_python.query.query_sabio_compound module

class datanator_query_python.query.query_sabio_compound.QuerySabioCompound(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='sabio_compound', readPreference='nearest', replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

get_id_by_name(names)[source]

Get sabio compound id given compound name

Parameters

names (list of str) – names of the compound

Returns

sabio compound ids

Return type

(list of int)

get_inchikey_by_name(names)[source]

Get compound InChIKey using compound names.

Parameters

names (list of str) – Names of compounds.

Returns

List of inchikeys (not in the order of the input list).

Return type

(list of str)

3.1.1.3.14. datanator_query_python.query.query_sabio_reaction_entries module

class datanator_query_python.query.query_sabio_reaction_entries.QuerySabioRxn(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_reaction_entries', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to sabio_reaction_entries collection

get_ids_by_participant_inchikey(substrates, products, dof=1)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchikey

Parameters
  • substrates (list) – list of substrates’ inchikey

  • products (list) – list of products’ inchikey

  • dof (int, optional) – degree of freedom allowed (number of parts of inchikey to truncate). Defaults to 1.

Returns

list of kinlaw_ids that satisfy the condition [id0, id1, id2,…, ]

Return type

rxns
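The dof parameter is described as the number of parts of the InChIKey to truncate. A standard InChIKey consists of three hyphen-separated blocks, so one plausible reading is that dof trailing blocks are dropped before matching. The sketch below illustrates that reading only: truncate_inchikey is a hypothetical helper, the key is a placeholder rather than a real compound, and the library's actual matching logic may differ.

```python
def truncate_inchikey(inchikey, dof=0):
    """Drop the last dof hyphen-separated blocks of an InChIKey."""
    parts = inchikey.split("-")
    if not 0 <= dof < len(parts):
        raise ValueError("dof must leave at least one block of the key")
    return "-".join(parts[: len(parts) - dof])


# Placeholder key with the usual 14-10-1 block layout (not a real compound).
key = "AAAAAAAAAAAAAA-BBBBBBBBBB-N"
print(truncate_inchikey(key, dof=0))  # full key, exact match only
print(truncate_inchikey(key, dof=1))  # ignores the final block
print(truncate_inchikey(key, dof=2))  # keeps only the first block
```

Under this reading, a larger dof makes the search more permissive, since more keys share the shorter prefix.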

3.1.1.3.15. datanator_query_python.query.query_sabiork module

class datanator_query_python.query.query_sabiork.QuerySabio(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_rk', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to sabio_rk collection

find_reaction_participants(kinlaw_id)[source]

Find the reaction participants defined in sabio_rk using kinetic law id

Parameters

kinlaw_id (list of int) –

Returns

rxns (list of dict) list of dictionaries containing names of reaction participants [{‘substrates’: [], ‘products’: [] }, … {} ]

get_kinlaw_by_environment(taxon=None, taxon_wildtype=None, ph_range=None, temp_range=None, name_space=None, observed_type=None, projection={'_id': 0})[source]

get kinlaw info based on experimental conditions

Parameters
  • taxon (list, optional) – list of ncbi taxon id

  • taxon_wildtype (list of bool, optional) – True indicates wildtype and False indicates mutant

  • ph_range (list, optional) – range of pH

  • temp_range (list, optional) – range of temperature

  • name_space (dict, optional) – cross_reference key/value pair, e.g. {‘ec-code’: ‘3.4.21.62’}

  • observed_type (list, optional) – possible values for parameters.observed_type

  • projection (dict, optional) – mongodb query result projection

Returns

list of kinetic laws that meet the constraints

Return type

(list)
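Optional constraints like these are commonly translated into a MongoDB filter document in which only the supplied arguments contribute a clause. The following sketch shows that pattern for three of the parameters. It is an assumption-laden illustration: build_environment_filter is hypothetical, the field names taxon_id, ph and temperature are invented for the example, and treating the ranges as inclusive is a guess. Only the operators $in, $gte and $lte are standard MongoDB.

```python
def build_environment_filter(taxon=None, ph_range=None, temp_range=None):
    """Build a MongoDB-style filter dict from whichever constraints are given."""
    query = {}
    if taxon:
        # Match any of the listed NCBI taxon ids.
        query["taxon_id"] = {"$in": list(taxon)}
    if ph_range:
        # Assumed inclusive [low, high] range.
        query["ph"] = {"$gte": ph_range[0], "$lte": ph_range[1]}
    if temp_range:
        query["temperature"] = {"$gte": temp_range[0], "$lte": temp_range[1]}
    return query


query = build_environment_filter(taxon=[9606], ph_range=[6.5, 8.0])
print(query)
```

Leaving every argument at None yields an empty filter, which in MongoDB matches all documents.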

get_kinlawid_by_inchi(hashed_inchi)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchi string.

Parameters

hashed_inchi (list of str) – list of inchi, all in one rxn

Returns

list of kinlaw_ids that satisfy the condition [id0, id1, id2,…, ]

Return type

(list of int)

get_kinlawid_by_name(substrates, products)[source]

Get kinlaw_id from substrates and products, all in one reaction

Parameters
  • substrates (list of str) – list of substrate names

  • products (list of str) – list of product names

Returns

list of compound names

Return type

(list of str)

get_kinlawid_by_rxn(substrates, products)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchi string

Parameters
  • substrates – list of substrates’ inchi

  • products – list of products’ inchi

Returns

list of kinlaw_ids that satisfy the condition [id0, id1, id2,…, ]

Return type

rxns

get_reaction_doc(kinlaw_id)[source]

Find a document on reaction with the kinlaw_id.

Parameters

kinlaw_id (list of int) –

Returns

list of docs

Return type

(list of dict)

get_subunit_by_id(_id)[source]

Get protein subunit information by kinlaw_id.

Parameters

_id (int) – kinlaw_id.

Returns

uniprot_id.

Return type

(str)

3.1.1.3.16. datanator_query_python.query.query_sabiork_old module

class datanator_query_python.query.query_sabiork_old.QuerySabioOld(cache_dirname=None, MongoDB=None, replicaSet=None, db='datanator', collection_str='sabio_rk_old', verbose=False, max_entries=inf, username=None, password=None, authSource='admin', readPreference='nearest')[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to sabio_rk_old collection

get_info_by_entryid(entry_id, target_organism=None, size=10, last_id=0)[source]

Find reactions by sabio entry id, return all information

Parameters
  • entry_id (int) – entry_id

  • target_organism (str) – the organism in which the reaction occurs

  • size (int) – pagination page size

  • last_id (int) –

Returns

list of documents of entry id

Return type

(list of dict)

get_kinlaw_by_entryid(entry_id)[source]

Find reactions by sabio entry id

Parameters

entry_id (int) – entry_id

Returns

{‘kinlaw_id’: [], ‘substrates’: [], ‘products’: []}

Return type

(dict)

get_kinlaw_by_environment(taxon=None, taxon_wildtype=None, ph_range=None, temp_range=None, name_space=None, param_type=None, projection={'_id': 0})[source]

get kinlaw info based on experimental conditions

Parameters
  • taxon (list, optional) – list of ncbi taxon id

  • taxon_wildtype (list of bool, optional) – True indicates wildtype and False indicates mutant

  • ph_range (list, optional) – range of pH

  • temp_range (list, optional) – range of temperature

  • name_space (dict, optional) – cross_reference key/value pair, e.g. {‘ec-code’: ‘3.4.21.62’}

  • param_type (list, optional) – possible values for parameters.type

  • projection (dict, optional) – mongodb query result projection

Returns

(tuple) consisting of docs (list of dict): list of docs; count (int): number of documents found

get_kinlaw_by_rxn(substrates, products, dof=0, projection={'_id': 0, 'kinlaw_id': 1}, bound='loose', skip=0, limit=0)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchikey

Parameters
  • substrates (list) – list of substrates’ inchikey

  • products (list) – list of products’ inchikey

  • dof (int, optional) – degree of freedom allowed (number of parts of inchikey to truncate); the default is 0

  • projection (dict) – pymongo query projection

  • bound (str) – limit substrates/products to include only input values

Returns

list of kinlaws that satisfy the condition

Return type

(list of dict)
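The bound parameter is only briefly described, so the following is one possible interpretation rather than documented behaviour. It assumes that 'loose' accepts reactions containing at least the given participants and that a stricter setting (labelled 'tight' here, a hypothetical name) accepts only reactions whose participants are exactly the given ones. The helper participants_match and the placeholder keys are made up for illustration.

```python
def participants_match(reaction_participants, wanted, bound="loose"):
    """Check a reaction's participants against the requested ones.

    'loose': every requested participant occurs in the reaction (extras allowed).
    anything else (e.g. the hypothetical 'tight'): the two sets must be identical.
    """
    reaction, requested = set(reaction_participants), set(wanted)
    if bound == "loose":
        return requested <= reaction
    return requested == reaction


reaction_substrates = ["KEY-1", "KEY-2", "KEY-3"]  # placeholder inchikeys
print(participants_match(reaction_substrates, ["KEY-1", "KEY-2"], bound="loose"))
print(participants_match(reaction_substrates, ["KEY-1", "KEY-2"], bound="tight"))
```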

get_kinlaw_by_rxn_name(substrates, products, projection={'_id': 0, 'kegg_meta._id': 0, 'kegg_meta.gene_ortholog': 0}, bound='loose', skip=0, limit=0)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ names

Parameters
  • substrates (list) – list of substrates’ names

  • products (list) – list of products’ names

  • projection (dict) – pymongo query projection

  • bound (str) – limit substrates/products to include only input values

Returns

list of kinlaws that satisfy the condition

Return type

(list of dict)

get_kinlaw_by_rxn_ortho(substrates, products, dof=0, projection={'_id': 0, 'enzymes': 1, 'kinlaw_id': 1}, bound='loose', skip=0, limit=0)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchikey

Parameters
  • substrates (list) – list of substrates’ inchikey

  • products (list) – list of products’ inchikey

  • dof (int, optional) – degree of freedom allowed (number of parts of inchikey to truncate); the default is 0

  • projection (dict) – pymongo query projection

  • bound (str) – limit substrates/products to include only input values

Returns

list of kinlaws that satisfy the condition

Return type

(list of dict)

get_kinlawid_by_rxn(substrates, products, dof=0)[source]

Find the kinlaw_id defined in sabio_rk using rxn participants’ inchikey

Parameters
  • substrates (list) – list of substrates’ inchikey

  • products (list) – list of products’ inchikey

  • dof (int, optional) – degree of freedom allowed (number of parts of inchikey to truncate); the default is 0

Returns

list of kinlaw_ids that satisfy the condition [id0, id1, id2,…, ]

Return type

rxns

get_reaction_by_subunit(_ids)[source]

Get reactions by enzyme subunit uniprot IDs

Parameters

_ids (list of str) – List of uniprot IDs.

Returns

List of kinlaw IDs.

Return type

(list of str)

get_reaction_doc(kinlaw_id, projection={'_id': 0})[source]

Find the reaction documents with the given kinlaw_id

Parameters
  • kinlaw_id (list of int) – list of kinlaw IDs

  • projection (dict) – mongodb query result projection

Returns

(tuple) consisting of docs (list of dict): list of docs; count (int): number of documents found

get_rxn_with_prm(kinlaw_ids, _from=0, size=10)[source]

Given a list of kinlaw ids, return documents where kinlaw has at least one Km or kcat.

Parameters
  • kinlaw_ids (list of int) – List of kinlaw IDs.

  • _from (int) – record offset. Defaults to 0.

  • size (int) – number of records to be returned. Defaults to 10.

Returns

list of rxn documents, and ids that have parameter

Return type

(tuple of list of dict and list of int)
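The filtering and paging described for get_rxn_with_prm can be sketched on in-memory documents. This is an illustration only: the field names "parameters" and "name" and the helper rxn_with_prm are assumptions, and the real method queries the database instead of a Python list.

```python
def rxn_with_prm(docs, kinlaw_ids, _from=0, size=10):
    """Keep docs with at least one Km or kcat, then return one page of them."""
    wanted = set(kinlaw_ids)
    hits = [
        doc for doc in docs
        if doc["kinlaw_id"] in wanted
        # "parameters" and "name" are hypothetical field names.
        and any(p.get("name") in ("Km", "kcat") for p in doc.get("parameters", []))
    ]
    page = hits[_from:_from + size]  # _from is the record offset
    return page, [doc["kinlaw_id"] for doc in page]


docs = [
    {"kinlaw_id": 1, "parameters": [{"name": "Km"}]},
    {"kinlaw_id": 2, "parameters": [{"name": "Vmax"}]},
    {"kinlaw_id": 3, "parameters": [{"name": "kcat"}]},
    {"kinlaw_id": 4, "parameters": []},
]
page, ids = rxn_with_prm(docs, [1, 2, 3, 4])
print(ids)
```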

get_unique_entries()[source]

Get number of unique curated entries.

Returns

Number of unique entries.

Return type

(int)

get_unique_organisms()[source]

Get number of unique organisms.

Returns

Number of unique organisms.

Return type

(int)

3.1.1.3.17. datanator_query_python.query.query_taxon_tree module

class datanator_query_python.query.query_taxon_tree.QueryTaxonTree(cache_dirname=None, collection_str='taxon_tree', verbose=False, max_entries=inf, username=None, MongoDB=None, password=None, db='datanator-test', authSource='admin', readPreference='nearest', replicaSet=None)[source]

Bases: datanator_query_python.util.mongo_util.MongoUtil

Queries specific to taxon_tree collection

each_under_category(src_tax_ids, target_tax_id)[source]

Given a list of source organism IDs, check if each ID is the child of target organism.

Parameters
  • src_tax_ids (list of int) – List of NCBI Taxonomy IDs.

  • target_tax_id (int) – Target organism ID.

Returns

For each source ID, a boolean indicating whether it is a child of the target.

Return type

(list of bool)
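A rough sketch of the membership check, assuming each organism's ancestor IDs are already available as a list. The ANCESTORS table holds abridged, illustrative lineages, and both helpers are stand-ins for the database-backed methods:

```python
# Abridged, illustrative lineages keyed by NCBI Taxonomy ID.
ANCESTORS = {
    9606: [1, 131567, 2759, 33208],  # Homo sapiens
    562: [1, 131567, 2, 1224],       # Escherichia coli
}


def under_category(src_tax_id, target_tax_id, ancestors=ANCESTORS):
    """True if target_tax_id appears among the ancestors of src_tax_id."""
    return target_tax_id in ancestors.get(src_tax_id, [])


def each_under_category(src_tax_ids, target_tax_id, ancestors=ANCESTORS):
    """One boolean per source ID, in the same order as the input."""
    return [under_category(i, target_tax_id, ancestors) for i in src_tax_ids]


print(each_under_category([9606, 562], 2759))  # is each one under Eukaryota?
```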

get_all_species()[source]

Get all organisms in taxon_tree collection

Returns

list of organisms

Return type

(list of str)

get_anc_by_id(ids)[source]

Get organisms’ ancestor ids by using organisms’ ids

Parameters

ids – list of organisms’ ids, e.g. [12345, 234456]

Returns

list of ancestors in order of the farthest to the closest

Return type

(tuple of list)

get_anc_by_name(names)[source]

Get organisms’ ancestor ids by using organisms’ names

Parameters

names – list of organisms’ names, e.g. Candidatus Diapherotrites

Returns

result_id: list of ancestors’ ids in order of the farthest to the closest; result_name: list of ancestors’ names in order of the farthest to the closest

get_canon_common_ancestor(org1, org2, org_format='tax_id')[source]

Get the closest common ancestor between two organisms and their distances to the said ancestor

Parameters
  • org1 – organism 1

  • org2 – organism 2

  • org_format – the format of organism, e.g. tax_id or tax_name

Returns

(Obj)

get_canon_common_ancestor_fast(org1, org2, org_format='tax_id')[source]

Get the closest common ancestor between two organisms and their distances to the said ancestor

Parameters
  • org1 – organism 1

  • org2 – organism 2

  • org_format – the format of organism, e.g. tax_id or tax_name

Returns

(Obj)

get_canon_rank_distance(_id, front_end=False)[source]

Given the ncbi_id, return canonically-ranked ancestors along the lineage and their non-canonical distances

Parameters
  • _id (int) – ncbi_id of the organism.

  • front_end (bool) – meets front_end request

Returns

canonical organisms and distances e.g. [{‘a’:1}, {‘b’: 3}, …]

Return type

(list of dict)
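One possible reading of "canonically-ranked ancestors and their non-canonical distances" is sketched below: keep only ancestors whose rank is canonical, but count the distance over every node in the lineage. The rank set, the toy lineage and the helper name are assumptions made for illustration.

```python
# Assumed set of canonical ranks; the package may use a different one.
CANON_RANKS = {"superkingdom", "kingdom", "phylum", "class",
               "order", "family", "genus", "species"}


def canon_rank_distance(lineage):
    """lineage: (name, rank) pairs ordered from farthest to closest ancestor."""
    result = []
    # Walk from the closest ancestor outwards; every node adds one step,
    # including the nodes without a canonical rank.
    for distance, (name, rank) in enumerate(reversed(lineage), start=1):
        if rank in CANON_RANKS:
            result.append({name: distance})
    return result


toy_lineage = [("b", "phylum"), ("x", "no rank"), ("a", "genus")]
print(canon_rank_distance(toy_lineage))  # [{'a': 1}, {'b': 3}]
```

The unranked node x is skipped in the output but still contributes a step, which is why b sits at distance 3 instead of 2.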

get_canon_rank_distance_by_name(name, front_end=False)[source]

Given the name of species, return canonically-ranked ancestors along the lineage and their non-canonical distances

Parameters
  • name (str) – name of the organism.

  • front_end (bool) – meets front_end request

Returns

canonical organisms and distances e.g. [{‘a’:1}, {‘b’: 3}, …]

Return type

(list of dict)

get_common_ancestor(org1, org2, org_format='name')[source]

Get the closest common ancestor between two organisms and their distances to the said ancestor

Parameters
  • org1 – organism 1

  • org2 – organism 2

  • org_format – the format of organism, e.g. tax_id or tax_name

Returns

ancestor: closest common ancestor’s name; distance: each organism’s distance to the ancestor
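The closest common ancestor and the two distances can be derived from ancestor lists ordered from the farthest to the closest, as returned by get_anc_by_id. The following is a sketch under that assumption, not the package's implementation; it ignores the case where one organism is itself an ancestor of the other.

```python
def common_ancestor(anc1, anc2):
    """Return (closest common ancestor, [distance of org1, distance of org2]).

    anc1 and anc2 list each organism's ancestors from farthest to closest.
    """
    shared = 0
    for a, b in zip(anc1, anc2):
        if a != b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("the two lineages share no ancestor")
    ancestor = anc1[shared - 1]
    # Steps from each organism up to the shared node.
    return ancestor, [len(anc1) - shared + 1, len(anc2) - shared + 1]


print(common_ancestor([1, 2, 3, 4], [1, 2, 5]))  # (2, [3, 2])
```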

get_equivalent_species(_id, max_distance, max_depth=inf)[source]

Get equivalent species of species with tax_id _id, given the max taxonomic distance. For instance, given three species:
  • {‘tax_id’: 8, ‘anc_id’: [5,4,3,2,6,7]}

  • {‘tax_id’: 9, ‘anc_id’: [5,4,3]}

  • {‘tax_id’: 0, ‘anc_id’: [5,4,3,2,1]}

the equivalent species of 0 given max_distance of 2 is 8; the equivalent species of 0 given max_distance of 3 are 8 and 9.

Parameters
  • _id (int) – taxonomy id of the species

  • max_distance (int) – max distance allowed from species _id

  • max_depth (int, optional) – defaults to inf

Returns

ids (list of int): list of ids of the species that met the condition; names (list of str): list of names of the species that met the condition
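The three-species example in the description can be reproduced with a small in-memory sketch. The interpretation here is an assumption: climb max_distance steps up the lineage of _id and collect every other species whose ancestor list contains that node. max_depth is ignored, and only ids are returned.

```python
SPECIES = [
    {"tax_id": 8, "anc_id": [5, 4, 3, 2, 6, 7]},
    {"tax_id": 9, "anc_id": [5, 4, 3]},
    {"tax_id": 0, "anc_id": [5, 4, 3, 2, 1]},
]


def equivalent_species(_id, max_distance, species=SPECIES):
    """Ids of species sharing the ancestor max_distance steps above _id."""
    if max_distance < 1:
        raise ValueError("max_distance must be at least 1")
    by_id = {doc["tax_id"]: doc["anc_id"] for doc in species}
    lineage = by_id[_id]  # ordered from farthest to closest ancestor
    # Clamp at the root when max_distance exceeds the lineage length.
    pivot = lineage[max(len(lineage) - max_distance, 0)]
    return [tax_id for tax_id, anc in by_id.items()
            if tax_id != _id and pivot in anc]


print(equivalent_species(0, 2))  # [8]
print(equivalent_species(0, 3))  # [8, 9]
```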

get_ids_by_name(name)[source]

Get all taxon ids associated with an organism name

Parameters

name (str) – species name

Returns

list of taxon ids

Return type

ids (list of int)

get_name_by_id(ids)[source]

Get organisms’ names given their tax_ids

Parameters

ids (list) – list of organisms’ tax_ids

Returns

organisms’ ids and names

Return type

(dict)

get_rank(ids)[source]

Given a list of taxon ids, return the list of ranks; no rank = ‘+’

Parameters

ids – list of taxon ids, e.g. [1234,2453,431]

Returns

list of ranks [‘kingdom’, ‘+’, ‘phylum’]

Return type

ranks

under_category(src_tax_id, target_tax_id)[source]

Given source taxonomy id, check if it is among the children of target tax id.

Parameters
  • src_tax_id (int) – source organism taxonomic ID.

  • target_tax_id (int) – target organism taxonomic ID.

Returns

whether source is under target organism.

Return type

(bool)

3.1.1.3.18. datanator_query_python.query.query_uniprot module

class datanator_query_python.query.query_uniprot.QueryUniprot(username=None, password=None, server=None, authSource='admin', database='datanator', collection_str=None, readPreference='nearest', replicaSet=None)[source]

Bases: object

get_doc_by_locus(locus, projection={'_id': 0})[source]

Get preferred gene name by locus name

Parameters
  • locus (str) – Gene locus name

  • projection (dict, optional) – MongoDB query projection. Defaults to {‘_id’:0}.

Returns

pymongo cursor object and number of documents.

Return type

(tuple of Iter and int)

get_gene_protein_name_by_embl(embl, species=None, projection={'_id': 0})[source]

Get documents by EMBL or RefSeq.

Parameters
  • embl (list) – EMBL information.

  • species (list) – NCBI taxonomy id. Defaults to None.

  • projection (dict, optional) – Pymongo projection. Defaults to {‘_id’: 0}.

Returns

gene_name and protein_name

Return type

(tuple of str)

get_gene_protein_name_by_oln(oln, species=None, projection={'_id': 0})[source]

Get documents by ordered locus name

Parameters
  • oln (str) – Ordered locus name.

  • species (list) – NCBI taxonomy id. Defaults to None.

  • projection (dict, optional) – Pymongo projection. Defaults to {‘_id’: 0}.

Returns

gene_name and protein_name

Return type

(tuple of str)

get_id_by_org_gene(org_gene)[source]

Convert kegg org_gene into uniprot id.

Parameters

org_gene (str) – Kegg org_gene format, e.g. aly:ARALYDRAFT_486312.

Returns

Uniprot ID.

Return type

(str)
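The org_gene format shown above joins a KEGG organism code and a gene identifier with a colon. A small sketch of splitting such a string before a lookup (split_org_gene is a hypothetical helper, not a method of QueryUniprot):

```python
def split_org_gene(org_gene):
    """Split 'org:gene' into (organism code, gene identifier)."""
    org_code, sep, gene = org_gene.partition(":")
    if not sep or not org_code or not gene:
        raise ValueError("expected '<org_code>:<gene>', got %r" % org_gene)
    return org_code, gene


print(split_org_gene("aly:ARALYDRAFT_486312"))  # ('aly', 'ARALYDRAFT_486312')
```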

get_info_by_entrez_id(entrez_id)[source]

Get protein info by gene entrez information

Parameters

entrez_id (str) – Gene entrez id.

Returns

Uniprot ID.

Return type

(str)

get_names_by_gene_name(gene_name)[source]

Get standard gene name by gene name.

Parameters

gene_name (list of str) – list of gene names belonging to one protein.

Returns

standard gene_name, protein_name

Return type

(tuple of str)

get_protein_name_by_gn(gene_name, species=None, projection={'_id': 0})[source]

Get documents by gene name.

Parameters
  • gene_name (str) – gene name.

  • species (list) – NCBI taxonomy id. Defaults to None.

  • projection (dict, optional) – Pymongo projection. Defaults to {‘_id’: 0}.

Returns

gene_name and protein_name

Return type

(tuple of str)

3.1.1.3.19. datanator_query_python.query.query_uniprot_org module

For querying uniprot.org using uniprot API (https://www.uniprot.org/help/api_queries)

class datanator_query_python.query.query_uniprot_org.QueryUniprotOrg(query, api='https://www.uniprot.org/uniprot/?', include='yes', compress='no', limit=1, offset=0)[source]

Bases: object

get_kegg_ortholog()[source]

Get kegg ortholog information using query message.

Returns

kegg ortholog number

Return type

(str)

get_protein_name()[source]

Get protein name.

Returns

list of protein names.

Return type

(list of str)

get_uniprot_id()[source]

Get uniprot id.

Returns

uniprot id

Return type

(str)
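The constructor arguments of QueryUniprotOrg resemble URL query parameters. How the class actually assembles its request is not shown here, so the sketch below is only an assumption about one way such parameters could be joined onto the api prefix; build_query_url is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode


def build_query_url(query, api="https://www.uniprot.org/uniprot/?",
                    include="yes", compress="no", limit=1, offset=0):
    """Append the URL-encoded parameters to the API prefix."""
    params = {
        "query": query,
        "include": include,
        "compress": compress,
        "limit": limit,
        "offset": offset,
    }
    return api + urlencode(params)


print(build_query_url("recA"))
```

urlencode percent-encodes reserved characters in the query text, so free-form search strings stay valid inside the URL.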

3.1.1.3.20. datanator_query_python.query.query_xmdb module

class datanator_query_python.query.query_xmdb.QueryXmdb(username=None, password=None, server=None, authSource='admin', database='datanator', max_entries=inf, verbose=True, collection_str='ecmdb', readPreference='nearest', replicaSet=None)[source]

Bases: object

get_all_concentrations(projection={'_id': 0, 'inchi': 1, 'inchikey': 1, 'name': 1, 'smiles': 1})[source]

Get all entries that have concentration values

Parameters

projection (dict, optional) – mongodb query projection. Defaults to {‘_id’: 0, ‘inchi’: 1, ‘inchikey’: 1, ‘smiles’: 1, ‘name’: 1}.

Returns

all results that meet the constraint.

Return type

(list)
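To illustrate what the default projection keeps, here is a simplified, top-level-only imitation of MongoDB projection semantics on a plain dict. apply_projection is a hypothetical helper and the document values are placeholders; the real filtering happens on the server.

```python
def apply_projection(doc, projection):
    """Mimic a MongoDB projection on the top-level fields of one document."""
    included = [k for k, v in projection.items() if v and k != "_id"]
    if included:
        # Inclusion mode: keep listed fields; _id stays unless set to 0.
        out = {k: doc[k] for k in included if k in doc}
        if projection.get("_id", 1) and "_id" in doc:
            out["_id"] = doc["_id"]
        return out
    # Exclusion mode: drop only the fields explicitly set to 0.
    return {k: v for k, v in doc.items() if projection.get(k, 1)}


doc = {"_id": "abc", "name": "metabolite", "inchi": "inchi-string",
       "inchikey": "key-string", "smiles": "smiles-string",
       "concentrations": [1.0]}
default = {"_id": 0, "inchi": 1, "inchikey": 1, "name": 1, "smiles": 1}
print(sorted(apply_projection(doc, default)))
```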

get_name_by_inchikey(inchikey)[source]

Get metabolite’s name by its inchikey

Parameters

inchikey (str) – inchi key of metabolite

Returns

name of metabolite

Return type

(str)

get_standard_ids_by_id(_id)[source]

Get chebi_id, pubmed_id, and kegg_id from database specific id.

Parameters

_id (str) – Database specific ID.

Returns

Dictionary containing the information.

Return type

(dict)

3.1.1.3.21. Module contents