4.1.1.3. datanator.data_source package¶
4.1.1.3.1. Subpackages¶
4.1.1.3.2. Submodules¶
4.1.1.3.3. datanator.data_source.corum_nosql module¶
-
class
datanator.data_source.corum_nosql.
CorumNoSQL
(MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', cache_dirname=None)[source]¶
-
datanator.data_source.corum_nosql.
correct_protein_name_list
(lst)[source]¶ Correct a list of protein names with incorrect separators involving ‘[Cleaved into: …]’
- Parameters
lst (
str
) – list of protein names with incorrect separators
- Returns
corrected list of protein names
- Return type
str
4.1.1.3.4. datanator.data_source.ec module¶
-
class
datanator.data_source.ec.
EC
(server=None, db=None, username=None, password=None, authSource='admin', readPreference='nearest', collection_str='ec', verbose=True, max_entries=inf, cache_dir=None)[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
establish_ftp
()[source]¶ Establish an FTP connection (ftp://ftp.expasy.org/databases/enzyme/enzyme.dat).
-
make_doc
(lines)[source]¶ Turn a block of EC info into a dictionary object
- Parameters
lines (
list
of str
) – list of lines containing the information on one EC group.
- Returns
dictionary object.
- Return type
(
dict
)
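To illustrate make_doc, here is a minimal sketch that assumes the ExPASy ENZYME flat-file layout: each line begins with a two-letter code (such as ID, DE, AN or CA), the data starts at column 6, and `//` closes an entry. The function name `make_ec_doc` and the output keys are hypothetical and are not necessarily the keys this package stores.

```python
from collections import defaultdict


def make_ec_doc(lines):
    """Turn the lines of one ENZYME entry into a dictionary (illustrative sketch)."""
    fields = defaultdict(list)
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('//'):  # end-of-entry marker
            break
        code, value = line[:2], line[5:].strip()  # two-letter code, data from column 6
        if value:
            fields[code].append(value)
    return {
        'ec_number': fields['ID'][0] if fields['ID'] else None,
        # DE lines may wrap over several lines, so join them into one name
        'ec_name': ' '.join(fields['DE']).rstrip('.'),
        'alternative_names': [name.rstrip('.') for name in fields['AN']],
        'catalytic_activity': ' '.join(fields['CA']),
    }
```

For a shortened entry such as `['ID   1.1.1.1', 'DE   Alcohol dehydrogenase.', '//']`, this returns a dictionary whose `ec_number` is `'1.1.1.1'` and whose `ec_name` is `'Alcohol dehydrogenase'`.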
-
4.1.1.3.5. datanator.data_source.gene_ortholog module¶
-
class
datanator.data_source.gene_ortholog.
KeggGeneOrtholog
(server, src_db='datanator', des_db='datanator', collection_str='uniprot', username=None, password=None, readPreference='nearest', authSource='admin', verbose=True, max_entries=inf)[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
get_html
(query)[source]¶ Get HTML file based on org:gene_code string, e.g. aly:ARALYDRAFT_486312.
- Parameters
query (
str
) – org:gene_code string.
-
load_data
(skip=0, top_hits=10)[source]¶ Load data.
- Parameters
skip (
int
, optional) – Number of documents to skip. Defaults to 0.
top_hits (
int
, optional) – Number of top hits to iterate through. Defaults to 10.
-
parse_gene_info
(gene)[source]¶ Use mygene.info to get protein information for a gene code string.
- Parameters
gene (
str
) – Gene information.
- Returns
List of protein IDs.
- Return type
(
list
of str
)
-
parse_html
(soup)[source]¶ Parse out gene_orthologs from HTML (https://www.kegg.jp/ssdb-bin/ssdb_best?org_gene=aly:ARALYDRAFT_486312).
- Parameters
soup (
BeautifulSoup
) – BeautifulSoup object
-
4.1.1.3.6. datanator.data_source.intact_nosql module¶
Downloads and parses the IntAct database of protein-protein interactions
-
class
datanator.data_source.intact_nosql.
IntActNoSQL
(cache_dirname=None, MongoDB=None, db=None, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
A local MongoDB copy of the IntAct database
-
find_between
(string, first, last)[source]¶ Get the substring between the first occurrence of the substring
first
and the last occurrence of the substring last
- Parameters
string (
str
) – string
first (
str
) – starting substring
last (
str
) – ending substring
- Returns
substring between the first occurrence of the substring first and the last occurrence of the substring last
- Return type
str
-
find_between_psi_mi_parentheses
(string)[source]¶ Find the text between parentheses in values of psi-mi key-value pairs
- Parameters
string (
str
) – string
- Returns
substring between the parentheses
- Return type
str
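A minimal sketch of the two string helpers, assuming they rely on str.find and str.rfind. The PSI-MI value in the usage note follows the MITAB style `psi-mi:"MI:0018"(two hybrid)`. Returning an empty string when a delimiter is missing is an assumption, not documented behavior.

```python
def find_between(string, first, last):
    """Text between the first occurrence of `first` and the last occurrence of `last`."""
    start = string.find(first)
    end = string.rfind(last)
    if start == -1 or end == -1:
        return ''  # assumption: missing delimiters yield an empty string
    start += len(first)
    return string[start:end] if end >= start else ''


def find_between_psi_mi_parentheses(value):
    """Text inside the parentheses of a PSI-MI value, e.g. 'two hybrid'."""
    return find_between(value, '(', ')')
```

For example, `find_between_psi_mi_parentheses('psi-mi:"MI:0018"(two hybrid)')` yields `'two hybrid'`. Using rfind for the closing delimiter keeps nested parentheses intact, so `find_between('a(b(c)d)e', '(', ')')` yields `'b(c)d'`.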
-
find_protein_gene
(interactor, alias)[source]¶ Parse the protein and gene identifiers from key-value pairs of interactors and their aliases
- Parameters
interactor (
str
) – key-value pairs of the interactor
alias (
str
) – key-value pairs of the alias of the interactor
- Returns
str: protein identifier
str: gene identifier
- Return type
str
-
find_pubmed_id
(string)[source]¶ Parse PubMed identifier from annotated key-value pair of publication type-identifier
- Parameters
string (
str
) – key-value pair of publication type-identifier
- Returns
PubMed identifier
- Return type
str
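A sketch of extracting the PubMed identifier, assuming the MITAB convention in which a publication field holds `|`-separated `database:identifier` pairs. Returning None when no PubMed entry is present is a choice made for this example, and the sample value below is made up.

```python
def find_pubmed_id(value):
    """Return the PubMed ID from a MITAB-style publication field, or None."""
    for entry in value.split('|'):
        database, _, identifier = entry.partition(':')
        if database.strip() == 'pubmed':
            return identifier.strip()
    return None
```

With the illustrative value `'imex:IM-12345|pubmed:12345678'`, the function returns `'12345678'`.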
-
4.1.1.3.7. datanator.data_source.kegg_org_code module¶
-
class
datanator.data_source.kegg_org_code.
KeggOrgCode
(MongoDB, db, cache_dirname=None, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, readPreference=None, authSource='admin', collection_str='kegg_organism_code')[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
bulk_load
(bulk_size=100)[source]¶ Load data into MongoDB in bulk.
- Parameters
bulk_size (
int
) – number of entries per insertion. Defaults to 100.
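One way batched loading with bulk_size could work is sketched below. It assumes the target is a pymongo-style collection that exposes insert_many(); the helper names `chunks` and `bulk_load_docs` are hypothetical.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_load_docs(collection, docs, bulk_size=100):
    """Insert `docs` with one insert_many() call per batch; return the number of batches."""
    n_batches = 0
    for batch in chunks(docs, bulk_size):
        collection.insert_many(batch)
        n_batches += 1
    return n_batches
```

Batching keeps memory bounded when `docs` is a generator, and it reduces round trips compared with one insert per document.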
-
get_ncbi_id
(name)[source]¶ Given the name of a species, look up its ncbi_taxonomy_id in the official NCBI database by parsing the HTML web page.
- Parameters
name (
str
) – name of the organism.
- Returns
NCBI Taxonomy ID.
- Return type
(
int
)
-
get_ncbi_id_rest
(name)[source]¶ Get the NCBI Taxonomy ID of an organism using api.datanator.info.
- Parameters
name (
str
) – Name of the organism.
- Returns
NCBI Taxonomy ID.
- Return type
(
int
)
-
4.1.1.3.8. datanator.data_source.kegg_orthology module¶
-
class
datanator.data_source.kegg_orthology.
KeggOrthology
(cache_dirname=None, MongoDB=None, db=None, replicaSet='', verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
parse_definition
(line)[source]¶ Parse a definition line. A definition line can look like the following:
“fructose-bisphosphate aldolase / 6-deoxy-5-ketofructose 1-phosphate synthase [NADP…] [EC:4.1.2.13 2.2.1.11]”
The EC code is optional.
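A hedged sketch of splitting such a definition line into its name and its optional EC codes with a regular expression. The `(name, ec_codes)` return shape is an assumption; other bracketed annotations stay in the name.

```python
import re

# Matches the trailing "[EC:...]" annotation and captures the space-separated codes.
EC_PATTERN = re.compile(r'\[EC:([^\]]+)\]')


def parse_definition(line):
    """Return (name, list of EC codes) for a KEGG orthology definition line."""
    line = line.strip()
    match = EC_PATTERN.search(line)
    ec_codes = match.group(1).split() if match else []
    name = EC_PATTERN.sub('', line).strip()
    return name, ec_codes
```

A line without an EC annotation, such as `'hexokinase'`, yields `('hexokinase', [])`.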
-
parse_gene
(lines)[source]¶ Parse GENES category (http://rest.kegg.jp/get/ko:K00023)
- Parameters
lines (
readlines()
) – Lines for genes.
- Returns
list of parsed genes.
- Return type
(
list
ofdict
)
-
parse_pathway_disease
(lines, category='pathway')[source]¶ Parse pathway, disease, or module information
- Parameters
lines (
readlines()
) – pathway lines.
category (
str
) – which category to parse. Defaults to 'pathway'.
- Returns
list of pathways [{“kegg_pathway_code”: …, “pathway_description”: …}]
- Return type
(
list
ofdict
)
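The sketch below assumes the KEGG flat-file convention in which the first 12 columns hold the field name (blank on continuation lines) and each pathway line is a code followed by a description. It produces the documented `kegg_pathway_code`/`pathway_description` dictionaries but is not the package's own implementation.

```python
def parse_pathway_lines(lines):
    """Turn PATHWAY lines of a KEGG flat file into a list of dictionaries."""
    entries = []
    for line in lines:
        body = line[12:].strip()  # drop the 12-column field-name area
        if not body:
            continue
        code, _, description = body.partition(' ')
        entries.append({'kegg_pathway_code': code,
                        'pathway_description': description.strip()})
    return entries
```

For example, the lines `'PATHWAY     ko00010  Glycolysis / Gluconeogenesis'` and a continuation line indented by 12 spaces each yield one dictionary.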
-
4.1.1.3.9. datanator.data_source.kegg_reaction_class module¶
-
class
datanator.data_source.kegg_reaction_class.
KeggReaction
(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None)[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
parse_rc_multiline
(lines)[source]¶ - Input:
DEFINITION C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y
N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y
…
O1a-O2x:*-C1z:C1b-C1x
- Output:
[C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y, N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y, …]
-
parse_rc_orthology
(lines)[source]¶ - Input:
ORTHOLOGY K00260 glutamate dehydrogenase [EC:1.4.1.2]
K00261 glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]
K00262 glutamate dehydrogenase (NADP+) [EC:1.4.1.4]
K00263 leucine dehydrogenase [EC:1.4.1.9]
…
K13547 L-glutamine:2-deoxy-scyllo-inosose/3-amino-2,3-dideoxy-scyllo-inosose aminotransferase [EC:2.6.1.100 2.6.1.101]
…
- Output:
[K00260, K00261, …]
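Both reaction class parsers can be sketched under the same assumption that the first 12 columns of a KEGG flat-file line hold the field name. The pattern for KEGG Orthology IDs (K followed by five digits) is inferred from the example above and is an assumption.

```python
import re

KO_PATTERN = re.compile(r'\bK\d{5}\b')  # assumed KEGG Orthology ID format


def parse_rc_multiline(lines):
    """Collect the whitespace-separated definitions, dropping the field-name columns."""
    definitions = []
    for line in lines:
        definitions.extend(line[12:].split())
    return definitions


def parse_rc_orthology(lines):
    """Return the KEGG Orthology IDs listed in an ORTHOLOGY field."""
    return [ko for line in lines for ko in KO_PATTERN.findall(line)]
```

Matching IDs with a pattern, instead of taking the first token of each line, also copes with the field name `ORTHOLOGY` sharing the first line with `K00260`.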
-
4.1.1.3.10. datanator.data_source.metabolite_nosql module¶
- Author
Zhouyang Lian <zhouyang.lian@familian.life>
- Author
Jonathan <jonrkarr@gmail.com>
- Date
2019-04-02
- Copyright
2019, Karr Lab
- License
MIT
-
class
datanator.data_source.metabolite_nosql.
MetaboliteNoSQL
(output_directory, source, MongoDB, db, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
Loads metabolite information into MongoDB and outputs the documents as JSON files, one per metabolite.
Attributes:
source: source database, e.g. ‘ecmdb’ or ‘ymdb’
MongoDB: MongoDB server address, e.g. ‘mongodb://localhost:27017/’
max_entries: maximum number of documents to be processed
output_directory: directory in which the JSON files will be stored
4.1.1.3.11. datanator.data_source.metabolites_meta_collection module¶
-
class
datanator.data_source.metabolites_meta_collection.
MetabolitesMeta
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', meta_loc=None)[source]¶ Bases:
datanator_query_python.query.query_sabiork.QuerySabio
meta_loc: database location to save the meta collection
-
fill_metabolite_fields
(fields=None, collection_src=None, collection_des=None)[source]¶ Fill in values of fields of interest from metabolite collection: ecmdb or ymdb
- Parameters
fields: list of fields of interest
collection_src: collection in which the query will be done
collection_des: collection in which the results will be updated
-
fill_standard_id
(skip=0)[source]¶ Fill meta collection with chebi_id, pubmed_id, and kegg_id.
- Parameters
skip (
int
) – number of records to skip.
-
remove_dups
(_key)[source]¶ Remove entries with the same _key.
- Parameters
_key (
str
) – Name of the field by which duplicates will be identified.
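A sketch of duplicate removal with a MongoDB aggregation, assuming documents are grouped by the value of _key and all but one document per group are deleted. The helper names are hypothetical.

```python
def dedup_pipeline(key):
    """Aggregation pipeline grouping documents that share the same value of `key`."""
    return [
        {'$group': {'_id': f'${key}', 'ids': {'$push': '$_id'}, 'count': {'$sum': 1}}},
        {'$match': {'count': {'$gt': 1}}},  # keep only groups that contain duplicates
    ]


def ids_to_remove(groups):
    """Keep the first _id of each duplicate group and return the remaining ones."""
    return [doc_id for group in groups for doc_id in group['ids'][1:]]
```

With pymongo, this might be used as `groups = collection.aggregate(dedup_pipeline('inchikey'))` followed by `collection.delete_many({'_id': {'$in': ids_to_remove(groups)}})`, where `inchikey` is only an example field. Which copy survives is arbitrary unless a `$sort` stage is added before `$group`.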
-
reset_cellular_locations
(start=0)[source]¶ See GitHub issue https://github.com/KarrLab/datanator_rest_api/issues/69.
-
4.1.1.3.12. datanator.data_source.modomics module¶
Use BpForms to gather rRNA and tRNA modification information from MODOMICS <https://iimcb.genesilico.pl/modomics/>.
- Author
Jonathan Karr <karr@mssm.edu>
- Date
2020-04-23
- Copyright
2019, Karr Lab
- License
MIT
4.1.1.3.13. datanator.data_source.pax_nosql module¶
4.1.1.3.14. datanator.data_source.protein_aggregate module¶
-
class
datanator.data_source.protein_aggregate.
ProteinAggregate
(username=None, password=None, server=None, authSource='admin', src_database='datanator', max_entries=inf, verbose=True, collection='protein', destination_database='datanator', cache_dir=None)[source]¶ Bases:
object
4.1.1.3.15. datanator.data_source.sabio_compound module¶
4.1.1.3.16. datanator.data_source.sabio_reaction module¶
-
class
datanator.data_source.sabio_reaction.
RxnAggregate
(username=None, password=None, server=None, authSource='admin', src_database='datanator', max_entries=inf, verbose=True, collection='sabio_reaction_entries', destination_database='datanator', cache_dir=None)[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
extract_enzyme_names
(doc)[source]¶ Extract enzyme names
- Parameters
doc (
dict
) – sabio_rk_old document
- Returns
list of enzyme names
- Return type
(
list
)
-
extract_reactant_names
(doc)[source]¶ Extract compound information from doc dictionary
- Parameters
doc (
dict
) – sabio_rk_old document
- Returns
substrates and products names [[],[],…,[]], [[],[],…,[]]
- Return type
(
tuple
)
-
hash_null_reactants
(start=0)[source]¶
- Parameters
start (
int
, optional) – Start of document. Defaults to 0.
-
4.1.1.3.17. datanator.data_source.sabio_rk module¶
- Author
Yosef Roth <yosefdroth@gmail.com>
- Author
Jonathan Karr <jonrkarr@gmail.com>
- Date
2017-05-04
- Copyright
2017, Karr Lab
- License
MIT
-
class
datanator.data_source.sabio_rk.
Compartment
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a compartment in the SABIO-RK database
-
kinetic_laws
[source]¶ list of kinetic laws
- Type
list
of KineticLaw
-
-
class
datanator.data_source.sabio_rk.
Compound
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a compound in the SABIO-RK database
-
_is_name_ambiguous
[source]¶ if
True
, the currently stored compound name should not be trusted because multiple names for the same compound have been discovered. The consensus name must be obtained using download_compounds
- Type
bool
-
structures
[source]¶ list of structures
- Type
list
of CompoundStructure
-
reaction_participants
[source]¶ list of reaction participants
- Type
list
of ReactionParticipant
-
get_inchi_structures
()[source]¶ Get InChI-formatted structures
- Returns
list of structures in InChI format
- Return type
list
of str
-
get_smiles_structures
()[source]¶ Get SMILES-formatted structures
- Returns
list of structures in SMILES format
- Return type
list
of str
-
structures
[source]
-
-
class
datanator.data_source.sabio_rk.
CompoundStructure
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents the structure of a compound and its format
-
_value_inchi_formula_connectivity
[source]¶ empirical formula (without hydrogen) and connectivity InChI layers; used to quickly search for compound structures
- Type
str
-
calc_inchi_formula_connectivity
()[source]¶ Calculate a searchable structure
InChI format
Core InChI format
Formula layer (without hydrogen)
Connectivity layer
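The steps listed above can be sketched as follows, assuming a standard InChI string whose second layer is the formula and whose connectivity layer starts with `c`. Joining the two layers with `/` is an assumption about the stored format.

```python
import re


def inchi_formula_connectivity(inchi):
    """Formula layer without hydrogen plus connectivity layer, e.g. 'C2O/c1-2-3'."""
    layers = inchi.split('/')
    if len(layers) < 2:
        return ''
    # Remove hydrogen counts; an H followed by a lowercase letter (Hg, He, Ho, ...)
    # is a different element and is kept.
    formula = re.sub(r'H(?![a-z])\d*', '', layers[1])
    connectivity = next((layer for layer in layers[2:] if layer.startswith('c')), '')
    return f'{formula}/{connectivity}' if connectivity else formula
```

For ethanol, `InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3` becomes `C2O/c1-2-3`. Dropping the hydrogen layer means that structures differing only in protonation map to the same search key.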
-
format
[source]
-
value
[source]
-
-
class
datanator.data_source.sabio_rk.
Entry
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an entry in the SABIO-RK database
-
created
[source]
-
cross_references
[source]
-
id
[source]
-
name
[source]
-
synonyms
[source]
-
-
class
datanator.data_source.sabio_rk.
Enzyme
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents an enzyme in the SABIO-RK database
-
subunits
[source]¶ list of subunits
- Type
list
of EnzymeSubunit
-
kinetic_laws
[source]¶ list of kinetic laws
- Type
list
of KineticLaw
-
molecular_weight
[source]
-
-
class
datanator.data_source.sabio_rk.
EnzymeSubunit
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents an enzyme subunit in the SABIO-RK database
-
coefficient
[source]
-
enzyme
[source]
-
molecular_weight
[source]
-
sequence
[source]
-
-
class
datanator.data_source.sabio_rk.
KineticLaw
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a kinetic law in the SABIO-RK database
-
reactants
[source]¶ list of reactants
- Type
list
of ReactionParticipant
-
products
[source]¶ list of products
- Type
list
of ReactionParticipant
-
modifiers
[source]¶ list of modifiers
- Type
list
of ReactionParticipant
-
enzyme
[source]
-
enzyme_compartment
[source]
-
enzyme_type
[source]
-
equation
[source]
-
mechanism
[source]
-
media
[source]
-
modifiers
[source]
-
parameters
[source]
-
ph
[source]
-
products
[source]
-
reactants
[source]
-
references
[source]
-
taxon
[source]
-
taxon_variant
[source]
-
taxon_wildtype
[source]
-
temperature
[source]
-
tissue
[source]
-
-
class
datanator.data_source.sabio_rk.
Parameter
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a parameter in the SABIO-RK database
-
TYPES (dict of int: str): dictionary of SBO terms and their canonical string symbols
-
UNITS (dict of int: str): dictionary of SBO terms and their canonical units
-
compartment
[source]
-
compound
[source]
-
enzyme
[source]
-
error
[source]
-
observed_error
[source]
-
observed_name
[source]
-
observed_type
[source]
-
observed_units
[source]
-
observed_value
[source]
-
type
[source]
-
units
[source]
-
value
[source]
-
-
class
datanator.data_source.sabio_rk.
ReactionParticipant
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a participant in a SABIO-RK reaction
-
coefficient
[source]
-
compartment
[source]
-
compound
[source]
-
type
[source]
-
-
class
datanator.data_source.sabio_rk.
Resource
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an external resource
-
kinetic_laws
[source]¶ kinetic laws
- Type
list
of KineticLaw
-
id
[source]
-
namespace
[source]
-
-
class
datanator.data_source.sabio_rk.
SabioRk
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=False, verbose=False, clear_requests_cache=False, download_request_backup=False, webservice_batch_size=1, excel_batch_size=100, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the SABIO-RK database
-
webservice_batch_size
[source]¶ default size of batches to download kinetic information from the SABIO webservice. Note: this should be set to one because SABIO exports units incorrectly when multiple kinetic laws are requested
- Type
int
-
excel_batch_size
[source]¶ default size of batches to download kinetic information from the SABIO Excel download service
- Type
int
-
ENDPOINT_KINETIC_LAWS_SEARCH
[source]¶ URL to obtain a list of the IDs of all of the kinetic laws in SABIO-RK
- Type
str
-
SKIP_KINETIC_LAW_IDS
[source]¶ IDs of kinetic laws that should be skipped (because they contain errors and can’t be downloaded from SABIO)
- Type
tuple
of int
-
PUBCHEM_TRY_DELAY
[source]¶ delay in seconds between PubChem queries (to avoid overloading the server)
- Type
float
-
ENDPOINT_COMPOUNDS_PAGE
= 'http://sabiork.h-its.org/compdetails.jsp'[source]
-
ENDPOINT_DOMAINS
= {'sabio_rk': 'http://sabiork.h-its.org', 'uniprot': 'http://www.uniprot.org'}[source]¶
-
ENDPOINT_EXCEL_EXPORT
= 'http://sabiork.h-its.org/entry/exportToExcelCustomizable'[source]
-
ENDPOINT_KINETIC_LAWS_SEARCH
= 'http://sabiork.h-its.org/sabioRestWebServices/searchKineticLaws/entryIDs'[source]
-
ENDPOINT_WEBSERVICE
= 'http://sabiork.h-its.org/sabioRestWebServices/kineticLaws'[source]
-
PUBCHEM_MAX_TRIES
= 10[source]
-
PUBCHEM_TRY_DELAY
= 0.25[source]
-
SKIP_KINETIC_LAW_IDS
= (51286,)[source]
-
calc_enzyme_molecular_weights
(enzymes)[source]¶ Calculate the molecular weight of each enzyme
- Parameters
enzymes (
list
of Enzyme
) – list of enzymes
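A sketch of the underlying arithmetic, assuming each subunit's weight is computed from its amino acid sequence and the enzyme's weight is the coefficient-weighted sum over its subunits (mirroring the coefficient and molecular_weight attributes of EnzymeSubunit). The residue masses are approximate average values as commonly tabulated (for example by ExPASy), not values taken from this package.

```python
# Approximate average residue masses in daltons (one water already removed per residue).
RESIDUE_MASS = {
    'G': 57.0519, 'A': 71.0788, 'S': 87.0782, 'P': 97.1167, 'V': 99.1326,
    'T': 101.1051, 'C': 103.1388, 'L': 113.1594, 'I': 113.1594, 'N': 114.1038,
    'D': 115.0886, 'Q': 128.1307, 'K': 128.1741, 'E': 129.1155, 'M': 131.1926,
    'H': 137.1411, 'F': 147.1766, 'R': 156.1875, 'Y': 163.1760, 'W': 186.2132,
}
WATER_MASS = 18.01528  # average mass of the H2O added back for the chain termini


def sequence_weight(sequence):
    """Average molecular weight (Da) of a protein sequence in one-letter codes."""
    sequence = sequence.upper()
    unknown = set(sequence).difference(RESIDUE_MASS)
    if unknown:
        raise ValueError(f'Unsupported residue codes: {sorted(unknown)}')
    if not sequence:
        return 0.0
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def enzyme_weight(subunits):
    """Coefficient-weighted sum of subunit weights; subunits are (sequence, coefficient) pairs."""
    return sum(coefficient * sequence_weight(sequence) for sequence, coefficient in subunits)
```

A homodimer is then simply one sequence with coefficient 2, for example `enzyme_weight([(sequence, 2)])`.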
-
calc_stats
()[source]¶ Calculate statistics about SABIO-RK
- Returns
list of list of statistics
- Return type
list
of list
of obj
-
create_compartment_from_sbml
(sbml)[source]¶ Add a compartment to the local sqlite database
- Parameters
sbml (
libsbml.Compartment
) – SBML representation of a compartment
- Returns
compartment
- Return type
Compartment
-
create_cross_references_from_sbml
(sbml)[source]¶ Add cross references to the local sqlite database for an SBML object
- Parameters
sbml (
libsbml.SBase
) – object in an SBML document
- Returns
list of resources
- Return type
list
of Resource
-
create_kinetic_law_from_sbml
(id, sbml, specie_properties, functions, units)[source]¶ Add a kinetic law to the local sqlite database
- Parameters
id (
int
) – identifier
sbml (
libsbml.KineticLaw
) – SBML representation of a reaction
specie_properties (
dict
) – additional properties of the compounds/enzymes:
is_wildtype (bool): indicates if the enzyme is wild type or mutant
variant (str): description of the variant of the enzyme
modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
functions (
dict of str: str
) – dictionary of rate law equations (keys = IDs in SBML, values = equations)
units (
dict of str: str
) – dictionary of units (keys = IDs in SBML, values = names)
- Returns
kinetic law
- Return type
KineticLaw
- Raises
ValueError – if the temperature is expressed in an unsupported unit
-
create_kinetic_laws_from_sbml
(ids, sbml)[source]¶ Add kinetic laws defined in an SBML file to the local sqlite database
- Parameters
ids (
list
of int
) – list of kinetic law IDs
sbml (
str
) – SBML representation of one or more kinetic laws
- Returns
list of KineticLaw: list of kinetic laws
list of Compound or Enzyme: list of species (compounds or enzymes)
list of Compartment: list of compartments
- Return type
tuple
-
create_specie_from_sbml
(sbml)[source]¶ Add a species to the local sqlite database
- Parameters
sbml (
libsbml.Species
) – SBML representation of a compound or enzyme
- Returns
- Return type
tuple
- Raises
ValueError – if a species is of an unsupported type (i.e. not a compound or enzyme)
-
export_stats
(stats, filename=None)[source]¶ Export statistics to an Excel workbook
- Parameters
stats (
list
of list
of obj
) – list of list of statistics
filename (
str
, optional) – path to export statistics
-
get_parameter_by_properties
(kinetic_law, parameter_properties)[source]¶ Get the parameter of
kinetic_law
whose attribute values are equal to those of parameter_properties
- Parameters
kinetic_law (
KineticLaw
) – kinetic law in which to find the parameter
parameter_properties (
dict
) – properties of parameter to find
- Returns
parameter with attribute values equal to values of
parameter_properties
- Return type
Parameter
-
get_specie_reference_from_sbml
(specie_id)[source]¶ Get the compound/enzyme associated with an SBML species by its ID
- Parameters
specie_id (
str
) – ID of an SBML species
- Returns
Compound or Enzyme: compound or enzyme
Compartment: compartment
- Return type
tuple
- Raises
ValueError – if the species is not a compound or enzyme, no species with id = specie_id exists, or no compartment with name = compartment_name exists
-
infer_compound_structures_from_names
(compounds)[source]¶ Try to use PubChem to infer the structure of compounds from their names
Note: we don’t try to look up structures from their cross references because SABIO has already gathered all structures from their cross references to ChEBI, KEGG, and PubChem.
- Parameters
compounds (
list
of Compound
) – list of compounds
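The PubChem lookup could be sketched with the PUG REST name endpoint and a retry loop. The retry constants mirror PUBCHEM_MAX_TRIES and PUBCHEM_TRY_DELAY documented below, but requesting only the InChI property and the function names are assumptions. The fetch function is injectable so that the logic can be exercised offline.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

MAX_TRIES = 10   # mirrors SabioRk.PUBCHEM_MAX_TRIES
TRY_DELAY = 0.25  # mirrors SabioRk.PUBCHEM_TRY_DELAY (seconds)


def pubchem_inchi_url(name):
    """PUG REST URL asking PubChem for the InChI of every compound matching `name`."""
    quoted = urllib.parse.quote(name, safe='')
    return ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/'
            f'{quoted}/property/InChI/JSON')


def fetch_json(url):
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def infer_inchi_from_name(name, fetch=fetch_json, max_tries=MAX_TRIES, delay=TRY_DELAY):
    """Return candidate InChI strings for `name`; [] if nothing is found."""
    url = pubchem_inchi_url(name)
    for _ in range(max_tries):
        try:
            data = fetch(url)
        except urllib.error.HTTPError as error:
            if error.code == 404:  # PubChem reports unknown names as "not found"
                return []
            time.sleep(delay)
        except OSError:  # network problems: wait, then retry
            time.sleep(delay)
        else:
            properties = data.get('PropertyTable', {}).get('Properties', [])
            return [p['InChI'] for p in properties if 'InChI' in p]
    return []
```

A name can match several PubChem compounds, so the sketch returns a list and leaves the choice among candidates to the caller.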
-
load_compounds
(compounds=None)[source]¶ Download information from SABIO-RK about all of the compounds stored in the local sqlite copy of SABIO-RK
- Parameters
compounds (
list
of Compound
) – list of compounds to download
- Raises
Error – if an HTTP request fails
-
load_kinetic_law_ids
()[source]¶ Download the IDs of all of the kinetic laws stored in SABIO-RK
- Returns
list of kinetic law IDs
- Return type
list
of int
- Raises
Error – if an HTTP request fails or the expected number of kinetic laws is not returned
-
load_kinetic_laws
(ids)[source]¶ Download kinetic laws from SABIO-RK
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
- Raises
Error – if an HTTP request fails
-
load_missing_enzyme_information_from_html
(ids)[source]¶ Load enzyme subunit information from HTML
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
-
load_missing_kinetic_law_information_from_tsv
(ids)[source]¶ Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
-
load_missing_kinetic_law_information_from_tsv_helper
(tsv)[source]¶ Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.
Note: this method is necessary because neither SABIO’s SBML export nor its Excel export provides all of SABIO’s content.
- Parameters
tsv (
str
) – TSV-formatted table
- Raises
ValueError – if a kinetic law or compartment is not contained in the local sqlite database
-
normalize_kinetic_laws
(ids)[source]¶ Normalize parameter values
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to normalize
-
normalize_parameter_value
(name, type, value, error, units, enzyme_molecular_weight)[source]¶ - Parameters
name (
str
) – parameter name
type (
int
) – parameter type (SBO term ID)
value (
float
) – observed value
error (
float
) – observed error
units (
str
) – observed units
enzyme_molecular_weight (
float
) – enzyme molecular weight
- Returns
normalized name and its type (SBO term), value, error, and units
- Return type
tuple
of str, int, float, float, str
- Raises
ValueError – if
units
is not a supported unit of type
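To make the normalization concrete, here is an illustrative subset. The SBO term numbers (27 for the Michaelis constant, 186 for maximal velocity, 25 for the catalytic rate constant), the unit strings and the output name `k_cat` are assumptions, not the package's actual tables. The Vmax branch shows why the enzyme molecular weight is needed: a Vmax per gram of enzyme multiplied by grams per mole of enzyme gives a turnover number in s^(-1).

```python
# Hypothetical conversion table: factor that turns an observed Km unit into molar (M).
KM_TO_MOLAR = {'M': 1.0, 'mM': 1e-3, 'uM': 1e-6, 'nM': 1e-9}

SBO_KCAT = 25   # catalytic rate constant (assumed term number)
SBO_KM = 27     # Michaelis constant (assumed term number)
SBO_VMAX = 186  # maximal velocity (assumed term number)


def normalize_parameter_value(name, type, value, error, units, enzyme_molecular_weight):
    """Return (name, type, value, error, units) in canonical units (illustrative subset)."""
    if type == SBO_KM:
        if units not in KM_TO_MOLAR:
            raise ValueError(f'Unsupported units {units!r} for SBO term {type}')
        factor = KM_TO_MOLAR[units]
        return name, type, value * factor, None if error is None else error * factor, 'M'
    if type == SBO_VMAX and units == 'mol*s^(-1)*g^(-1)':
        if enzyme_molecular_weight is None:
            raise ValueError('Enzyme molecular weight is needed to convert Vmax to kcat')
        # (mol substrate / s / g enzyme) * (g enzyme / mol enzyme) = 1/s, i.e. kcat
        mw = enzyme_molecular_weight
        return 'k_cat', SBO_KCAT, value * mw, None if error is None else error * mw, 's^(-1)'
    raise ValueError(f'Unsupported units {units!r} for SBO term {type}')
```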
-
parse_complex_subunit_structure
(text)[source]¶ Parse the subunit structure of complex into a dictionary of subunit coefficients
- Parameters
text (
str
) – subunit structure described with nested parentheses- Returns
dictionary of subunit coefficients
- Return type
dict
of str, int
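The documented input is a subunit structure "described with nested parentheses", but no grammar is given. The sketch below assumes a syntax such as `((A)*2(B))*3`, in which `*n` multiplies the preceding subunit or group; both the syntax and the tokenizer are hypothetical.

```python
import re
from collections import Counter

# Tokens: parentheses, multipliers such as "*2", and subunit identifiers.
TOKEN = re.compile(r'\(|\)|\*\d+|[^()*\s]+')


def parse_complex_subunit_structure(text):
    """Return {subunit: coefficient} for the assumed syntax, e.g. '((A)*2(B))*3'."""
    stack = [Counter()]  # one Counter per open parenthesis level
    last = None          # counts contributed by the most recent subunit or group
    for token in TOKEN.findall(text):
        if token == '(':
            stack.append(Counter())
            last = None
        elif token == ')':
            if len(stack) == 1:
                raise ValueError('Unbalanced parentheses in ' + repr(text))
            last = stack.pop()
            stack[-1].update(last)
        elif token.startswith('*'):
            if last is None:
                raise ValueError('Multiplier without a preceding subunit in ' + repr(text))
            extra = int(token[1:]) - 1  # the group has already been counted once
            for subunit, count in last.items():
                stack[-1][subunit] += count * extra
            last = None
        else:
            stack[-1][token] += 1
            last = Counter({token: 1})
    if len(stack) != 1:
        raise ValueError('Unbalanced parentheses in ' + repr(text))
    return dict(stack[0])
```

Under this assumed syntax, `'((A)*2(B))*3'` describes three copies of a unit containing two A and one B, giving `{'A': 6, 'B': 3}`.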
-
parse_enzyme_name
(sbml)[source]¶ Parse the name of an enzyme in SBML for the enzyme name, wild type status, and variant description that it contains.
- Parameters
sbml (
str
) – enzyme name in SBML
- Returns
str: name
bool: if True, the enzyme is wild type
str: variant
- Return type
tuple
- Raises
ValueError – if the enzyme name is formatted in an unsupported format
-
4.1.1.3.18. datanator.data_source.sabio_rk_json_mongo module¶
- Parse SabioRk json files into MongoDB documents
(json files acquired by running sqlite_to_json.py)
- Author
Zhouyang Lian <zhouyang.lian@familian.life>
- Author
Jonathan <jonrkarr@gmail.com>
- Date
2019-04-02
- Copyright
2019, Karr Lab
- License
MIT
-
class
datanator.data_source.sabio_rk_json_mongo.
SabioRkNoSQL
(db=None, MongoDB=None, cache_directory=None, verbose=False, max_entries=inf, replicaSet=None, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
fill_kegg_meta
(start=0)[source]¶ Fill KEGG information for reactions.
- Parameters
start (
int
, optional) – Starting document. Defaults to 0.
-
4.1.1.3.19. datanator.data_source.sabio_rk_nosql module¶
-
class
datanator.data_source.sabio_rk_nosql.
SabioRk
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', webservice_batch_size=50, excel_batch_size=50)[source]¶ Bases:
object
-
calc_enzyme_molecular_weights
(enzymes, length)[source]¶ Calculate the molecular weight of each enzyme
- Parameters
enzymes (
list
of dict
) – list of enzymes
- Returns
list of enzymes
- Return type
list of dict
-
calc_inchi_formula_connectivity
(structure)[source]¶ Calculate a searchable structure
InChI format
Core InChI format
Formula layer (without hydrogen)
Connectivity layer
-
create_cross_references_from_sbml
(sbml)[source]¶ Look up the cross references of an SBML object and return them as dictionaries
- Parameters
sbml (
libsbml.SBase
) – object in an SBML document
- Returns
list of resources
- Return type
list
of dictionary
-
create_kinetic_law_from_sbml
(id, sbml, root_species, specie_properties, functions, units)[source]¶ Make a kinetic law document for MongoDB
- Parameters
id (
int
) – identifier
sbml (
libsbml.KineticLaw
) – SBML representation of a reaction (reaction_sbml)
root_species (
list
) – list of species in the root SBML
specie_properties (
dict
) – additional properties of the compounds/enzymes:
is_wildtype (bool): indicates if the enzyme is wild type or mutant
variant (str): description of the variant of the enzyme
modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
functions (
dict of str: str
) – dictionary of rate law equations (keys = IDs in SBML, values = equations)
units (
dict of str: str
) – dictionary of units (keys = IDs in SBML, values = names)
- Returns
kinetic law
- Return type
dictionary
- Raises
ValueError – if the temperature is expressed in an unsupported unit
-
create_kinetic_laws_from_sbml
(ids, sbml)[source]¶ Add kinetic laws defined in an SBML file to the local MongoDB database
- Parameters
ids (
list
of int
) – list of kinetic law IDs
sbml (
str
) – SBML representation of one or more kinetic laws (root)
- Returns
list of KineticLaw: list of kinetic laws
list of Compound or Enzyme: list of species (compounds or enzymes)
list of Compartment: list of compartments
- Return type
tuple
-
get_compartment_from_sbml
(sbml)[source]¶ Get a compartment from SBML
- Parameters
sbml (
libsbml.Compartment
) – SBML representation of a compartment
- Returns
dict: compartment
-
get_parameter_by_properties
(kinetic_law, parameter_properties)[source]¶ Get the parameter of
kinetic_law
whose attribute values are equal to those of
parameter_properties
- Parameters
kinetic_law (
KineticLaw
) – kinetic law in which to find the parameter
parameter_properties (
dict
) – properties of parameter to find
- Returns
parameter with attribute values equal to values of
parameter_properties
- Return type
Parameter
-
get_specie_from_sbml
(sbml)[source]¶ Get species information from SBML
- Parameters
sbml (
libsbml.Species
) – SBML representation of a compound or enzyme
- Returns
Compound or Enzyme: compound or enzyme
dict: additional properties of the compound/enzyme:
is_wildtype (bool): indicates if the enzyme is wild type or mutant
variant (str): description of the variant of the enzyme
modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
- Return type
tuple
- Raises
ValueError – if a species is of an unsupported type (i.e. not a compound or enzyme)
-
get_specie_reference_from_sbml
(specie_id, species)[source]¶ Get the compound/enzyme associated with an SBML species by its ID
- Parameters
specie_id (
str
) – ID of an SBML species
- Returns
Compound or Enzyme: compound or enzyme
Compartment: compartment
- Return type
tuple
- Raises
ValueError – if the species is not a compound or enzyme, no species with id = specie_id exists, or no compartment with name = compartment_name exists
-
infer_compound_structures_from_names
(compounds)[source]¶ Try to use PubChem to infer the structure of compounds from their names
Note: we don’t try to look up structures from their cross references because SABIO has already gathered all structures from their cross references to ChEBI, KEGG, and PubChem.
- Parameters
compounds (
list
of dict
) – list of compounds
-
load_compounds
(compounds=None)[source]¶ Download information from SABIO-RK about all of the compounds stored in the sabio_compounds collection
- Parameters
compounds (
list
of obj
) – list of compounds to download
- Raises
Error – if an HTTP request fails
-
load_kinetic_law_ids
()[source]¶ Download the IDs of all of the kinetic laws stored in SABIO-RK
- Returns
list of kinetic law IDs
- Return type
list
of int
-
load_kinetic_laws
(ids)[source]¶ Download kinetic laws from SABIO-RK
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
- Raises
Error – if an HTTP request fails
-
load_missing_enzyme_information_from_html
(ids, start=0)[source]¶ Load enzyme subunit information from HTML
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
start (
int
) – starting point for the iterator
-
load_missing_kinetic_law_information_from_tsv
(ids)[source]¶ Update the properties of kinetic laws in MongoDB based on content downloaded from SABIO in TSV format.
- Parameters
ids (
list
of int
) – list of IDs of kinetic laws to download
start (
int
) – starting row
-
load_missing_kinetic_law_information_from_tsv_helper
(tsv, start=0)[source]¶ Update the properties of kinetic laws in MongoDB based on content downloaded from SABIO in TSV format.
Note: this method is necessary because neither SABIO’s SBML export nor its Excel export provides all of SABIO’s content.
- Parameters
tsv (
str
) – TSV-formatted table
start (
int
) – starting row
- Raises
ValueError – if a kinetic law or compartment is not contained in the local sqlite database
-
normalize_kinetic_laws
(new_ids)[source]¶ Normalize parameter values.
- Parameters
new_ids (
list
of int
) – list of IDs of kinetic laws to normalize
-
normalize_parameter_value
(name, type, value, error, units, enzyme_molecular_weight)[source]¶ - Parameters
name (
str
) – parameter name
type (
int
) – parameter type (SBO term ID)
value (
float
) – observed value
error (
float
) – observed error
units (
str
) – observed units
enzyme_molecular_weight (
float
) – enzyme molecular weight
- Returns
normalized name and its type (SBO term), value, error, and units
- Return type
tuple
of str, int, float, float, str
- Raises
ValueError – if
units
is not a supported unit of type
-
parse_complex_subunit_structure
(text)[source]¶ Parse the subunit structure of complex into a dictionary of subunit coefficients
- Parameters
text (
str
) – subunit structure described with nested parentheses- Returns
dictionary of subunit coefficients
- Return type
dict
of str, int
-
parse_enzyme_name
(sbml)[source]¶ Parse the name of an enzyme in SBML for the enzyme name, wild type status, and variant description that it contains.
- Parameters
sbml (
str
) – enzyme name in SBML
- Returns
str: name
bool: if True, the enzyme is wild type
str: variant
- Return type
tuple
- Raises
ValueError – if the enzyme name is formatted in an unsupported format
-
4.1.1.3.20. datanator.data_source.sqlite_to_json module¶
Converts tables in SQLite into JSON files.
- Attributes
database – path to the SQLite database
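A minimal sketch of such a conversion using only the standard library. Writing one `<table>.json` file per table is an assumption about the output layout, and BLOB columns would need extra handling because bytes are not JSON-serializable.

```python
import json
import sqlite3
from pathlib import Path


def sqlite_tables_to_dict(connection):
    """Read every user table into {table_name: [row dicts]}."""
    connection.row_factory = sqlite3.Row
    names = [row['name'] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    tables = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
        tables[name] = [dict(row) for row in connection.execute(f'SELECT * FROM {quoted}')]
    return tables


def write_tables_as_json(database_path, output_dir):
    """Write one <table>.json file per table of the SQLite database at database_path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    connection = sqlite3.connect(database_path)
    try:
        for name, rows in sqlite_tables_to_dict(connection).items():
            with open(output_dir / f'{name}.json', 'w') as handle:
                json.dump(rows, handle, indent=2)
    finally:
        connection.close()
```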
4.1.1.3.21. datanator.data_source.taxon_tree module¶
-
class
datanator.data_source.taxon_tree.
TaxonTree
(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
insert_canon_anc
(start=0)[source]¶ Insert two arrays into each document: canon_anc_id and canon_anc_name.
-
parse_fullname_line
(line)[source]¶ Parse a line of fullnamelineage.dmp and return its elements as a list
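A sketch assuming the NCBI taxdump conventions, where fields are separated by `\t|\t`, each row ends with `\t|`, and the lineage field lists ancestor names separated by `; `. Returning the lineage as a list and the tax ID as an int are choices made for this example.

```python
def parse_fullname_line(line):
    """Split one fullnamelineage.dmp row into [tax_id, tax_name, lineage names]."""
    line = line.rstrip('\n')
    if line.endswith('\t|'):
        line = line[:-2]  # drop the row terminator
    tax_id, tax_name, lineage = line.split('\t|\t')
    ancestors = [name.strip() for name in lineage.split(';') if name.strip()]
    return [int(tax_id), tax_name, ancestors]
```

With a shortened row for Escherichia coli, `'562\t|\tEscherichia coli\t|\tcellular organisms; Bacteria; Escherichia; \t|\n'`, this returns `[562, 'Escherichia coli', ['cellular organisms', 'Bacteria', 'Escherichia']]`.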
-
parse_fullname_taxid
()[source]¶ Parse fullnamelineage.dmp and taxidlineage.dmp and store the results in MongoDB (insert_one). Always run this first, before loading anything else.
-
4.1.1.3.22. datanator.data_source.uniprot_nosql module¶
- Author
Zhouyang Lian <zhouyang.lian@familian.life>
- Author
Jonathan <jonrkarr@gmail.com>
- Date
2019-04-02
- Copyright
2019, Karr Lab
- License
MIT
-
class
datanator.data_source.uniprot_nosql.
UniprotNoSQL
(MongoDB=None, db=None, max_entries=inf, verbose=False, username=None, password=None, authSource='admin', replicaSet=None, collection_str='uniprot')[source]¶ Bases:
datanator_query_python.util.mongo_util.MongoUtil
-
embl_helper
(s)[source]¶ Process EMBL or RefSeq strings into a list in standard format, e.g. “NP_796298.2 [E9PXF8-1];XP_006507989.1 [E9PXF8-2];” -> [‘NP_796298.2’, ‘XP_006507989.1’].
- Parameters
s (
pandas.DataFrame
) – object to be processed.
- Returns
list of processed strings
- Return type
(
list
of str
)
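A sketch of the documented transformation for a single cell value. It assumes each `;`-separated entry starts with the accession and that missing cells arrive as NaN (as pandas represents them); the signature therefore differs from the documented DataFrame parameter.

```python
def embl_helper(s):
    """'NP_796298.2 [E9PXF8-1];XP_006507989.1 [E9PXF8-2];' -> ['NP_796298.2', 'XP_006507989.1']."""
    if not isinstance(s, str):  # e.g. NaN for a missing cell in a pandas column
        return []
    return [entry.split()[0] for entry in s.split(';') if entry.strip()]
```

Skipping empty entries handles the trailing `;` in the documented example.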
-
fill_abundance_publication
(start=0)[source]¶ See GitHub issue https://github.com/KarrLab/datanator/issues/51.
- Parameters
start (
int
, optional) – starting document. Defaults to 0.
-
fill_reactions
(start=0)[source]¶ Fill reactions in which the protein acts as a catalyst.
- Parameters
start (
int
, optional) – Starting document in sabio_rk. Defaults to 0.
-
load_uniprot
(query=False, msg='', species=None)[source]¶ Build a dataframe
- Parameters
query (
bool
, optional) – Whether to download all reviewed entries or perform individual queries. Defaults to False.
msg (
str
, optional) – Query message. Defaults to ‘’.
species (
list
, optional) – species to extract from the dataframe and load into the uniprot collection. Defaults to None.
-