4.1.1.5. datanator.data_source package

4.1.1.5.2. Submodules

4.1.1.5.3. datanator.data_source.array_express module

Downloads and parses the ArrayExpress database

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-08-16
Copyright:2017, Karr Lab
License:MIT

class datanator.data_source.array_express.ArrayExpress(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ArrayExpress database

EXCLUDED_DATASET_IDS

list of IDs of datasets to exclude

Type:list of str
ENDPOINT_DOMAINS = {'array_express': 'https://www.ebi.ac.uk/arrayexpress/json/v3/experiments'}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

get_or_create_object(cls, **kwargs)[source]

Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if no such instance exists

Parameters:
  • cls (class) – type of object to find or create
  • **kwargs – values of the properties of the object
Returns:

instance of cls that has the property-value pairs described by kwargs

Return type:

Base
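
The get-or-create behaviour described above can be sketched as follows. This is only an illustration of the pattern: it uses the standard-library sqlite3 module rather than the SQLAlchemy session that the class works with, and the get_or_create function, the organism table, and its columns are hypothetical names chosen for the example.

```python
import sqlite3


def get_or_create(conn, table, **kwargs):
    """Return the id of the first row of `table` matching kwargs, inserting a row if none matches."""
    columns = sorted(kwargs)
    values = [kwargs[col] for col in columns]

    # Look for an existing row with exactly these property-value pairs
    where = " AND ".join(f"{col} = ?" for col in columns)
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {where} LIMIT 1", values).fetchone()
    if row is not None:
        return row[0]

    # No match: create the row and return its new id
    placeholders = ", ".join("?" for _ in columns)
    cursor = conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})", values)
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT)")  # hypothetical table
first = get_or_create(conn, "organism", name="Mus musculus")
second = get_or_create(conn, "organism", name="Mus musculus")
row_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]
```

Table and column names are interpolated into the SQL text, so the sketch is only suitable for trusted, hard-coded names; values are always passed as bound parameters.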
load_content(test_url='')[source]

Downloads all metadata from ArrayExpress on its samples and experiments. The metadata is saved as text files. Within the text files, the data is stored as a JSON object.

Parameters:
  • start_year (int, optional) – the first year to retrieve experiments for
  • end_year (int, optional) – the last year to retrieve experiments for

load_experiment_metadata(test_url='')[source]

Get a list of accession identifiers for the experiments from the year start_year to the year end_year

Parameters:
  • start_year (int, optional) – the first year to retrieve experiment accession ids for
  • end_year (int, optional) – the last year to retrieve experiment accession ids for
Returns:

list of experiment accession identifiers

Return type:

list of str
load_experiment_protocol(experiment, protocol_json)[source]

Load the protocols for an experiment

Parameters:
  • experiment (Experiment) – experiment
  • protocol_json (dict) – protocol

load_experiment_protocols(experiment)[source]

Load the protocols for an experiment

Parameters:experiment (Experiment) – experiment

load_experiment_sample(experiment, sample_json, index)[source]

Load the samples for an experiment

Parameters:
  • experiment (Experiment) – experiment
  • sample_json (dict) – sample
  • index (int) – index of the sample within the experiment

load_experiment_samples(experiment)[source]

Load the samples for an experiment

Parameters:experiment (Experiment) – experiment

class datanator.data_source.array_express.Characteristic(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental characteristic

_id

unique id

Type:int
category[source]

name of the characteristic (e.g. organism)

Type:str
value[source]

value of characteristic (e.g. Mus musculus)

Type:str
samples[source]

samples

Type:list of Sample
category[source]
samples[source]
value[source]
class datanator.data_source.array_express.DataFormat(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a data format

_id

unique id

Type:int
name[source]

name

Type:str
bio_assay_data_cubes[source]

number of dimensions to the data

Type:int
bio_assay_data_cubes[source]
experiments[source]
name[source]
class datanator.data_source.array_express.EnsemblInfo(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:int
organism_strain[source]

the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)

Type:str
url[source]

the download URL for the cDNA file from Ensembl

Type:str
organism_strain[source]
ref_genome[source]
samples[source]
url[source]
class datanator.data_source.array_express.Experiment(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experiment

_id

unique id

Type:int
id[source]

unique string identifier assigned by ArrayExpress

Type:str
name[source]

name

Type:str
name_2[source]

second name

Type:str
description[source]

description

Type:str
organisms[source]

list of organisms

Type:list of Organism
types[source]

list of experiment types

Type:list of ExperimentType
designs[source]

list of experimental designs

Type:list of ExperimentDesign
submission_date[source]

submission date

Type:datetime.date
release_date[source]

release date

Type:datetime.date
data_formats[source]

list of data formats

Type:list of DataFormat
read_type[source]

type of FASTQ files (in an RNA-Seq experiment)

Type:str
has_fastq_files[source]

whether this experiment has FASTQ files or not

Type:bool
data_formats[source]
description[source]
designs[source]
has_fastq_files[source]
id[source]
name[source]
name_2[source]
organisms[source]
protocols[source]
read_type[source]
release_date[source]
samples[source]
submission_date[source]
types[source]
class datanator.data_source.array_express.ExperimentDesign(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental design

_id

unique id

Type:int
name[source]

name

Type:str
experiments[source]
name[source]
class datanator.data_source.array_express.ExperimentType(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a type of experiment

_id

unique id

Type:int
name[source]

name

Type:str
experiments[source]
name[source]
class datanator.data_source.array_express.Extract(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an extract of a sample

_id

unique id

Type:int
name[source]

name

Type:str
samples[source]

list of samples

Type:list of Sample
name[source]
samples[source]
class datanator.data_source.array_express.Organism(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an organism

_id

unique id

Type:int
name[source]

name

Type:str
experiments[source]
name[source]
ncbi_id[source]
class datanator.data_source.array_express.Protocol(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protocol for an experiment

_id

unique id

Type:int
protocol_accession[source]

ArrayExpress identifier of the protocol

Type:str
protocol_type[source]

the type of experimental protocol (e.g. normalization, extraction, etc.)

Type:str
text[source]

description of the protocol

Type:str
performer[source]

name of the person who did the experiment

Type:str
hardware[source]

hardware (usually detection instruments) used in protocol

Type:str
software[source]

software (usually for analyzing and normalizing the data)

Type:str
experiments[source]

list of experiments that performed this protocol

Type:list of Experiment
experiments[source]
hardware[source]
performer[source]
protocol_accession[source]
protocol_type[source]
software[source]
text[source]
class datanator.data_source.array_express.Sample(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an observed concentration

_id

unique id

Type:int
experiment_id[source]

ID of the experiment that the sample belongs to

Type:int

experiment[source]

experiment that the sample belongs to

Type:Experiment
index[source]

index of the sample within the experiment

Type:int
name[source]

name of the source of the sample (this is used to identify the sample in ArrayExpress)

Type:str
assay[source]

name of the assay

Type:str
ensembl_organism_strain[source]

the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)

Type:str
characteristics

characteristics

Type:list of Characteristic
variables[source]

variables

Type:list of Variable
fastq_urls[source]

URLs of the FASTQ files

Type:list of Url
read_type[source]

the nature of the FASTQ file reads. Either ‘single’, ‘multiple’, or ‘parallel’

Type:str
ensembl_info

information about the Ensembl reference genome

Type:list of Variable
full_strain_specificity[source]

whether or not the Ensembl reference genome matches the full strain specificity recorded in ArrayExpress

Type:bool
assay[source]
characteristics[source]
ensembl_info[source]
ensembl_organism_strain[source]
experiment[source]
experiment_id[source]
extracts[source]
fastq_urls[source]
full_strain_specificity[source]
index[source]
name[source]
read_type[source]
variables[source]
class datanator.data_source.array_express.Url(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:int
url[source]

the text of the url

Type:str
samples[source]

samples

Type:list of Sample
samples[source]
url[source]
class datanator.data_source.array_express.Variable(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental variable

_id

unique id

Type:int
name[source]

name of the variable (e.g. genotype)

Type:str
value[source]

value of variable (e.g. control)

Type:str
unit[source]

units of value (e.g. control). This field is not always filled.

Type:str
samples[source]

samples

Type:list of Sample
name[source]
samples[source]
unit[source]
value[source]

4.1.1.5.4. datanator.data_source.bio_portal module

Downloads ontologies from BioPortal

Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-05-23
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.bio_portal.BioPortal(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, flask=False, quilt_owner=None, quilt_package=None, ontologies=None)[source]

Bases: datanator.core.data_source.CachedDataSource

Loads ontologies from BioPortal

ontologies[source]

list of filenames of ontologies

Type:list
BIOPORTAL_ENDPOINT[source]

URL pattern to download ontologies

Type:str
CCO_DOWNLOAD_URL[source]

URL to download CCO ontology

Type:str
BIOPORTAL_ENDPOINT = 'http://data.bioontology.org'[source]
CCO_DOWNLOAD_URL = 'http://www.bio.ntnu.no/ontology/CCO/cco.obo'[source]
DEFAULT_ONTOLOGIES = ('BTO.obo', 'CCO.obo', 'CL.owl', 'DOID.obo', 'EFO.owl', 'FMA.owl', 'GO.obo', 'PW.obo', 'SBO.obo')[source]
clear_content()[source]

Clear the content of the sqlite database (i.e. drop and recreate all tables).

download_ontologies()[source]

:param list of str: list of ontologies

download_ontology(id)[source]

Download an ontology from BioPortal

Parameters:id (str) – identifier of the ontology in BioPortal
get_api_key()[source]

Get BioPortal API key

Returns:key
Return type:str
get_engine()[source]

Get an engine for the sqlite database. If the database doesn’t exist, initialize its structure.

Returns:database engine
Return type:sqlalchemy.engine.Engine
get_ontologies()[source]

Get list of ontologies

Returns:list of ontologies
Return type:list
get_ontologies_filename()[source]

Get the local filename to store a list of the ontologies

Returns:filename
Return type:str
get_ontology(id)[source]

Load an ontology, downloading it from BioPortal if necessary

Parameters:id (str) – identifier of the ontology in BioPortal
Returns:ontology
Return type:pronto.Ontology
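
The "download if necessary" behaviour can be sketched as a small cache check. This is a minimal sketch under stated assumptions, not the class's implementation: get_cached_file and fake_download are hypothetical names, the downloader is passed in as a callable so the example runs without network access or an API key, and the file content is a placeholder.

```python
import os
import tempfile


def get_cached_file(cache_dirname, filename, download):
    """Return the local path of `filename`, calling `download` only if the file is not cached yet."""
    path = os.path.join(cache_dirname, filename)
    if not os.path.isfile(path):
        content = download(filename)
        with open(path, "wb") as file:
            file.write(content)
    return path


calls = []


def fake_download(filename):
    # Stand-in for an HTTP request; records each call and returns placeholder bytes
    calls.append(filename)
    return b"format-version: 1.2\n"


with tempfile.TemporaryDirectory() as cache_dirname:
    first = get_cached_file(cache_dirname, "SBO.obo", fake_download)
    second = get_cached_file(cache_dirname, "SBO.obo", fake_download)
    download_count = len(calls)
```

The second call finds the file on disk and skips the download, which is the property that makes repeated loads cheap.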
get_ontology_filename(id)[source]

Get the local filename to store a copy of an ontology

Parameters:id (str) – identifier of the ontology in BioPortal
Returns:filename
Return type:str
get_paths_to_backup(download=False)[source]

Get a list of the files to backup/unpack

Parameters:download (bool, optional) – if True, prepare the files for uploading
Returns:list of paths to backup
Return type:list of str
get_session()[source]

Get a session for the sqlite database

Returns:database session
Return type:sqlalchemy.orm.session.Session
load_content()[source]

Load the content of the local copy of the data source

quilt_package = None[source]

load content as necessary

4.1.1.5.5. datanator.data_source.corum module

This module takes the CORUM database of protein complexes and formats it as an SQL database

Author:Balazs Szigeti <balazs.szigeti@mssm.edu>
Author:Saahith Pochiraju <saahith116@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2018-08-13
Copyright:2017-2018, Karr Lab
License:MIT
class datanator.data_source.corum.Base(**kwargs)[source]

Bases: object

The most base type

metadata = MetaData(bind=None)[source]
class datanator.data_source.corum.Complex(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein complex

observation_id

ID of the observation

Type:int
complex_id[source]

ID of the complex

Type:int
complex_name[source]

Complex name

Type:str
go_id[source]

GO functional annotation

Type:str
go_dsc[source]

Description of the annotation

Type:str
funcat_id[source]

FUNCAT functional annotation

Type:str
funcat_dsc[source]

Description of the annotation

Type:str
su_cmt[source]

Subunit comments

Type:str
complex_cmt[source]

Complex comments

Type:str
disease_cmt[source]

Disease comments

Type:str
complex_cmt[source]
complex_id[source]
complex_name[source]
disease_cmt[source]
funcat_dsc[source]
funcat_id[source]
go_dsc[source]
go_id[source]
observation[source]
observation_id[source]
su_cmt[source]
subunits[source]
class datanator.data_source.corum.Corum(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the CORUM database

ENDPOINT_DOMAINS = {'corum': 'https://mips.helmholtz-muenchen.de/corum/download/allComplexes.txt.zip'}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Collect and parse all data from CORUM website and add to SQLite database

class datanator.data_source.corum.Observation(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an observation (entries in the original DB)

id

internal ID for the observation entry

Type:int
cell_line

cell line (in which the measurement was done)

Type:str
pur_method[source]

purification method

Type:str
pubmed_id[source]

Pubmed ID of the associated publication

Type:str
taxon_ncbi_id[source]

NCBI taxonomy id of the organism

Type:str
cell_line[source]
complex[source]
id[source]
pubmed_id[source]
pur_method[source]
taxon[source]
taxon_ncbi_id[source]
class datanator.data_source.corum.Subunit(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents subunits of complexes

id

Internal subunit ID

Type:int
complex_id[source]

ID of the complex to which the subunit belongs

Type:int
su_uniprot[source]

UNIPROT ID

Type:int
su_entrezs[source]

ENTREZS ID

Type:int
protein_name[source]

Name of the protein

Type:str
gene_name[source]

Gene name

Type:str
gene_syn[source]

Synonyms of the gene name

Type:str
complex[source]
complex_id[source]
gene_name[source]
gene_syn[source]
id[source]
protein_name[source]
su_entrezs[source]
su_uniprot[source]
class datanator.data_source.corum.Taxon(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a species

ncbi_id

NCBI id

Type:int
species_name[source]

name and possibly genetic variant

Type:str
ncbi_id[source]
observation[source]
swissprot_id[source]
datanator.data_source.corum.correct_protein_name_list(lst)[source]

Correct a list of protein names with incorrect separators involving ‘[Cleaved into: …]’

Parameters:lst (str) – list of protein names with incorrect separators
Returns:corrected list of protein names
Return type:str
datanator.data_source.corum.parse_list(str_lst)[source]

Parse a semicolon-separated list of strings into a list, ignoring semicolons that are inside square brackets

Parameters:str_lst (str) – semicolon-separated encoding of a list
Returns:list
Return type:list of str
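
A bracket-aware split of this kind can be sketched as follows. This is an illustrative reimplementation, not the module's code: the name parse_semicolon_list is hypothetical, and trimming whitespace around each item is an assumption of the sketch.

```python
def parse_semicolon_list(text):
    """Split `text` on ';', ignoring any ';' that sits inside square brackets."""
    items, current, depth = [], [], 0
    for char in text:
        # Track how deeply we are nested inside [...]
        if char == "[":
            depth += 1
        elif char == "]" and depth > 0:
            depth -= 1

        if char == ";" and depth == 0:
            # Top-level separator: close the current item
            items.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    items.append("".join(current).strip())
    return items


example = "Protein A [Cleaved into: Chain 1; Chain 2]; Protein B"
parts = parse_semicolon_list(example)
```

Here the semicolon inside "[Cleaved into: ...]" stays within the first item, so the example yields two entries rather than three.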

4.1.1.5.6. datanator.data_source.corum_nosql module

class datanator.data_source.corum_nosql.CorumNoSQL(MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', cache_dirname=None)[source]

Bases: datanator.util.mongo_util.MongoUtil

load_content()[source]

Collect and parse all data from CORUM website into JSON files and add to NoSQL database

datanator.data_source.corum_nosql.correct_protein_name_list(lst)[source]

Correct a list of protein names with incorrect separators involving ‘[Cleaved into: …]’

Parameters:lst (str) – list of protein names with incorrect separators
Returns:corrected list of protein names
Return type:str
datanator.data_source.corum_nosql.main()[source]
datanator.data_source.corum_nosql.parse_list(str_lst)[source]

Parse a semicolon-separated list of strings into a list, ignoring semicolons that are inside square brackets

Parameters:str_lst (str) – semicolon-separated encoding of a list
Returns:list
Return type:list of str

4.1.1.5.7. datanator.data_source.cron_aggregate module

datanator.data_source.cron_aggregate.write_sabio_json(self, cache_dirname)[source]

4.1.1.5.8. datanator.data_source.ecmdb module

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-05-04
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.ecmdb.Compartment(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a compartment

name[source]

name

Type:str
compounds[source]

list of compounds

Type:list of Compound
compounds[source]
name[source]
class datanator.data_source.ecmdb.Compound(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an ECMDB entry

id[source]

ECMDB identifier

Type:str
name[source]

name

Type:str
synonyms[source]

synonyms

Type:list of Synonym
description[source]

description

Type:str
structure[source]

structure in InChI format

Type:str
_structure_formula_connectivity[source]

empirical formula and connectivity InChI layers; used to quickly search for compound structures

Type:str
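
As an illustration of what the formula and connectivity layers are, the sketch below trims an InChI string down to its prefix, formula layer, and connectivity (/c) layer. It is a hedged example only: the function name formula_and_connectivity is hypothetical, and whether the class stores the value in exactly this form is an assumption.

```python
def formula_and_connectivity(inchi):
    """Keep only the version prefix, formula layer, and connectivity (/c) layer of an InChI string."""
    prefix, _, rest = inchi.partition("/")
    layers = rest.split("/")
    formula = layers[0]  # assumes the first layer is the formula, which holds for most compounds
    connectivity = next(
        (layer for layer in layers[1:] if layer.startswith("c")), None)

    kept = [prefix, formula]
    if connectivity is not None:
        kept.append(connectivity)
    return "/".join(kept)


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
key = formula_and_connectivity(ethanol)
```

Because hydrogen, charge, and stereochemistry layers are dropped, structures that differ only in those layers map to the same key, which is what makes such a field useful as a coarse search index.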
compartments[source]

compartments

Type:list of Compartment
concentrations[source]

concentrations

Type:list of Concentration
cross_references[source]

cross references

Type:list of Resource
comment[source]

internal ECMDB comments about the entry

Type:str
created[source]

time that the entry was created in ECMDB

Type:datetime.datetime
updated[source]

time that the entry was last updated in ECMDB

Type:datetime.datetime
downloaded[source]

time that the entry was downloaded from ECMDB

Type:datetime.datetime
comment[source]
compartments[source]
concentrations[source]
created[source]
cross_references[source]
description[source]
downloaded[source]
id[source]
name[source]
structure[source]
synonyms[source]
updated[source]
class datanator.data_source.ecmdb.Concentration(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an observed concentration

compound[source]

compound

Type:Compound
value[source]

value in uM

Type:float
error[source]

error in uM

Type:float
strain[source]

observed strain

Type:str
growth_status[source]

observed growth status (e.g. exponential phase, log phase, etc.)

Type:str
media[source]

observed media

Type:str
temperature[source]

temperature in C

Type:float
growth_system[source]

observed growth system (e.g. chemostat, 384 well plate, etc.)

Type:str
references[source]

list of references

Type:list of Resource
compound[source]
compound_id[source]
error[source]
growth_status[source]
growth_system[source]
media[source]
references[source]
strain[source]
temperature[source]
value[source]
class datanator.data_source.ecmdb.Ecmdb(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ECMDB database

DOWNLOAD_INDEX_URL[source]

URL to download an index of ECMDB

Type:str
DOWNLOAD_COMPOUND_URL[source]

URL pattern to download an ECMDB compound entry

Type:str
DOWNLOAD_COMPOUND_STRUCTURE_URL = 'http://ecmdb.ca/structures/compounds/{}.inchi'[source]
DOWNLOAD_COMPOUND_URL = 'http://ecmdb.ca/compounds/{}.xml'[source]
DOWNLOAD_INDEX_URL = 'http://ecmdb.ca/download/ecmdb.json.zip'[source]
ENDPOINT_DOMAINS = {'ecmdb': 'http://ecmdb.ca'}[source]
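
The URL patterns above contain a {} placeholder for the compound identifier. A minimal sketch of how such a pattern is filled in, assuming str.format is used and using a made-up identifier purely for illustration:

```python
# URL patterns copied from the class attributes above
DOWNLOAD_COMPOUND_URL = 'http://ecmdb.ca/compounds/{}.xml'
DOWNLOAD_COMPOUND_STRUCTURE_URL = 'http://ecmdb.ca/structures/compounds/{}.inchi'

compound_id = 'M2MDB000001'  # hypothetical identifier, for illustration only

entry_url = DOWNLOAD_COMPOUND_URL.format(compound_id)
structure_url = DOWNLOAD_COMPOUND_STRUCTURE_URL.format(compound_id)
```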
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

get_node_children(node, children_name)[source]

Get the children of an XML node

Parameters:
  • node (jxmlease.cdatanode.XMLNode) – XML node
  • children_name (str) – tag names of the desired children
Returns:

list of child nodes

Return type:

list of XMLNode

get_node_text(node)[source]

Get the text of an XML node

Parameters:node (jxmlease.cdatanode.XMLCDATANode or str) – XML node or its text
Returns:text of the node
Return type:str
load_content()[source]

Download the content of ECMDB and store it to a local sqlite database.

class datanator.data_source.ecmdb.Resource(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an external resource

namespace[source]

external namespace

Type:str
id[source]

external identifier

Type:str
compounds[source]

compounds

Type:list of Compound
concentrations[source]

concentrations

Type:list of Concentration
compounds[source]
concentrations[source]
id[source]
namespace[source]
class datanator.data_source.ecmdb.Synonym(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a synonym

Parameters:
  • name (str) – name
  • compounds (list of Compound) – list of compounds
compounds[source]
name[source]

4.1.1.5.9. datanator.data_source.ensembl module

Downloads and parses the ArrayExpress database

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-08-16
Copyright:2017, Karr Lab
License:MIT

class datanator.data_source.ensembl.GeneEntry(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

exp_id[source]
identifiers[source]
name[source]
organism[source]
samp_id[source]
class datanator.data_source.ensembl.GeneIdentifier(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:int
category[source]

name of the characteristic (e.g. organism)

Type:str
value[source]

value of characteristic (e.g. Mus musculus)

Type:str
samples[source]

samples

Type:list of Sample
name[source]
samples[source]
class datanator.data_source.ensembl.GetGenes(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ArrayExpress database

EXCLUDED_DATASET_IDS

list of IDs of datasets to exclude

Type:list of str
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Downloads all metadata from ArrayExpress on its samples and experiments. The metadata is saved as text files. Within the text files, the data is stored as a JSON object.

Parameters:
  • start_year (int, optional) – the first year to retrieve experiments for
  • end_year (int, optional) – the last year to retrieve experiments for

4.1.1.5.10. datanator.data_source.ezyme module

Ezyme

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-05-04
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.ezyme.Ezyme[source]

Bases: datanator.core.data_source.WebserviceDataSource

Utilities for using Ezyme to predict EC numbers.

See Ezyme (http://www.genome.jp/tools-bin/predict_reaction) for more information.

REQUEST_URL[source]

URL to request Ezyme EC number prediction

Type:str
RETRIEVAL_URL[source]

URL to retrieve Ezyme results

Type:str
EC_PREDICTION_URL[source]

URL where predicted EC number is encoded

Type:str
EC_PREDICTION_URL = 'http://www.genome.jp/kegg-bin/get_htext?htext=ko01000.keg&query='[source]
ENDPOINT_DOMAINS = {'ezyme': 'http://www.genome.jp'}[source]
REQUEST_URL = 'http://www.genome.jp/tools-bin/predict_view'[source]
RETRIEVAL_URL = 'http://www.genome.jp/tools-bin/e-zyme2/result.cgi'[source]
run(reaction)[source]

Use Ezyme to predict the first three digits of the EC number of a reaction.

Parameters:reaction (data_model.Reaction) – reaction

Returns:
ranked list of predicted EC numbers and their scores
or None if one or more participant doesn’t have a defined structure
Return type:list of EzymeResult or None
class datanator.data_source.ezyme.EzymeResult(ec_number, score)[source]

Bases: object

Represents a predicted EC number

ec_number[source]

EC number

Type:str
score[source]

score

Type:float
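
Since run returns either a ranked list of results or None, callers need to handle both cases. The sketch below shows one way to do so; the EzymeResult class here is a stand-in that only mirrors the two documented attributes, best_prediction is a hypothetical helper, and the EC numbers and scores are made up.

```python
class EzymeResult:
    """Stand-in with the two documented attributes: ec_number and score."""

    def __init__(self, ec_number, score):
        self.ec_number = ec_number
        self.score = score


def best_prediction(results):
    """Return the highest-scoring result, or None if there are no predictions."""
    if not results:  # covers both None and an empty list
        return None
    return max(results, key=lambda result: result.score)


# Made-up values standing in for the output of Ezyme.run(reaction)
results = [EzymeResult("1.1.1", 0.82), EzymeResult("2.7.1", 0.35)]
best = best_prediction(results)
nothing = best_prediction(None)
```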

4.1.1.5.11. datanator.data_source.intact module

Downloads and parses the IntAct database of protein-protein interactions

Author:Saahith Pochiraju <saahith116@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2018-08-13
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.intact.IntAct(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.FtpDataSource

A local SQLite copy of the IntAct database

ENDPOINT_DOMAINS = {'complextab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/complex/current/complextab/', 'psimitab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psimitab/intact_negative.txt'}[source]
add_complexes()[source]

Parse complexes from data and add complexes to SQLite database

add_interactions()[source]

Parse interactions from data and add interactions to SQLite database

base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

download_content()[source]

Download data from FTP server

find_between(string, first, last)[source]

Get the substring between the first occurrence of the substring first and the last occurrence of the substring last

Parameters:
  • string (str) – string
  • first (str) – starting substring
  • last (str) – ending substring
Returns:

substring between the first occurrence of the substring first and the last occurrence of the substring last

Return type:

str
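
The "first occurrence of first, last occurrence of last" rule can be sketched with str.index and str.rindex. This is an illustrative standalone function, not the method's source; raising ValueError when either delimiter is missing is a property of this sketch and may differ from the real method. The sample string is a made-up PSI-MI style value.

```python
def find_between(string, first, last):
    """Return the text between the first `first` and the last `last` in `string`."""
    start = string.index(first) + len(first)  # just past the first opening delimiter
    end = string.rindex(last, start)          # last closing delimiter at or after `start`
    return string[start:end]


value = 'psi-mi:"MI:0915"(physical association)'  # made-up example value
inside = find_between(value, "(", ")")
nested = find_between("a(b(c)d)e", "(", ")")
```

Because the search for the closing delimiter runs from the right, nested delimiters are kept inside the result, as the second call shows.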

find_between_psi_mi_parentheses(string)[source]

Find the text between parentheses in values of psi-mi key-value pairs

Parameters:string (str) – string
Returns:substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type:str
find_protein_gene(interactor, alias)[source]

Parse the protein and gene identifiers from key-value pairs of interactors and their aliases

Parameters:
  • interactor (str) – key-value pairs of interactor
  • alias (str) – key-value pairs of the alias of the interactor
Returns:

protein identifier and gene identifier

Return type:

str, str

find_pubmed_id(string)[source]

Parse PubMed identifier from annotated key-value pair of publication type-identifier

Parameters:string (str) – key-value pair of publication type-identifier
Returns:PubMed identifier
Return type:str
get_paths_to_backup(download=False)[source]

Get a list of the files to backup/unpack

Parameters:download (bool, optional) – if True, prepare the files for uploading
Returns:list of paths to backup
Return type:list of str
load_content()[source]

Load the content of the local copy of the data source

split_colon(string)[source]

Split a string into substrings separated by ‘:’

Parameters:string (str) – string
Returns:substrings separated by ‘:’
Return type:list
split_line(string)[source]

Split a string into substrings separated by ‘|’

Parameters:string (str) – string
Returns:substrings separated by ‘|’
Return type:list
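
Taken together, the two split helpers break a field first into entries separated by ‘|’ and then into key-value parts separated by ‘:’. The sketch below shows this under stated assumptions: the functions are standalone re-creations based only on the descriptions above, and the field value and its identifiers are made up.

```python
def split_line(string):
    """Split a string into substrings separated by '|'."""
    return string.split("|")


def split_colon(string):
    """Split a string into substrings separated by ':'."""
    return string.split(":")


field = "pubmed:10542231|imex:IM-12345"  # made-up publication field
pairs = [split_colon(entry) for entry in split_line(field)]

# Pick out the PubMed identifier, if the field has one
pubmed_id = next((parts[1] for parts in pairs if parts[0] == "pubmed"), None)
```

Values that themselves contain a colon would be split into more than two parts by this approach; `entry.split(":", 1)` is one way to keep such values intact.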
class datanator.data_source.intact.ProteinComplex(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein complexes from the IntAct database

identifier[source]
Type:str
name[source]
Type:str
ncbi[source]
Type:str
subunits[source]
Type:str
evidence[source]
Type:str
go_annot[source]
Type:str
desc[source]
Type:str
source[source]
Type:str
desc[source]
evidence[source]
go_annot[source]
identifier[source]
name[source]
ncbi[source]
source[source]
subunits[source]
class datanator.data_source.intact.ProteinInteraction(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein interactions from the IntAct database

Index[source]

Index of the DB

Type:int
interactor_a[source]

represents participant A

Type:str
interactor_b[source]

represents participant B

Type:str
publications[source]

resource

Type:str
interaction[source]

interaction ID

Type:str
feature_a[source]

binding site of participant A

Type:str
feature_b[source]

binding site of participant B

Type:str
stoich_a[source]

stoichiometry of participant A

Type:str
stoich_b[source]

stoichiometry of participant B

Type:str
confidence[source]
feature_a[source]
feature_b[source]
gene_a[source]
gene_b[source]
index[source]
interaction_id[source]
interaction_type[source]
method[source]
protein_a[source]
protein_b[source]
publication[source]
publication_author[source]
role_a[source]
role_b[source]
stoich_a[source]
stoich_b[source]
type_a[source]
type_b[source]

4.1.1.5.12. datanator.data_source.intact_nosql module

Downloads and parses the IntAct database of protein-protein interactions

class datanator.data_source.intact_nosql.IntActNoSQL(cache_dirname=None, MongoDB=None, db=None, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

A local MongoDB copy of the IntAct database

add_complexes()[source]

Parse complexes from data and add complexes to MongoDB

add_interactions()[source]

Parse interactions from data and add interactions to MongoDB

download_content()[source]

Download data from FTP server

find_between(string, first, last)[source]

Get the substring between the first occurrence of the substring first and the last occurrence of the substring last

Parameters:
  • string (str) – string
  • first (str) – starting substring
  • last (str) – ending substring
Returns:

substring between the first occurrence of the substring first and the last occurrence of the substring last

Return type:

str

find_between_psi_mi_parentheses(string)[source]

Find the text between parentheses in values of psi-mi key-value pairs

Parameters:string (str) – string
Returns:substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type:str
find_protein_gene(interactor, alias)[source]

Parse the protein and gene identifiers from key-value pairs of interactors and their aliases

Parameters:
  • interactor (str) – key-value pairs of interactor
  • alias (str) – key-value pairs of the alias of the interactor
Returns:

protein identifier and gene identifier

Return type:

str, str

find_pubmed_id(string)[source]

Parse PubMed identifier from annotated key-value pair of publication type-identifier

Parameters:string (str) – key-value pair of publication type-identifier
Returns:PubMed identifier
Return type:str
load_content()[source]

Load the content of the local copy of the data source

split_colon(string)[source]

Split a string into substrings separated by ‘:’

Parameters:string (str) – string
Returns:substrings separated by ‘:’
Return type:list
split_line(string)[source]

Split a string into substrings separated by ‘|’

Parameters:string (str) – string
Returns:substrings separated by ‘|’
Return type:list
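The two splitting helpers can be combined to pull one identifier out of a multi-valued field. The sketch below uses standalone functions and a made-up field value in a `type:identifier|type:identifier` layout; both are assumptions for illustration, not the class's actual code.

```python
def split_line(string):
    """Split a string into substrings separated by '|'."""
    return string.split('|')


def split_colon(string):
    """Split a string into substrings separated by ':'."""
    return string.split(':')


# Hypothetical publication field holding two key-value pairs
field = 'pubmed:10542231|imex:IM-11865'

pairs = [split_colon(item) for item in split_line(field)]
# Keep only the identifiers whose key is 'pubmed'
pubmed_ids = [parts[1] for parts in pairs if len(parts) > 1 and parts[0] == 'pubmed']
print(pairs)
print(pubmed_ids)
```

Note that a value which itself contains ‘:’ is split into more than two parts, so callers may need to rejoin the remainder.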

4.1.1.5.13. datanator.data_source.jaspar module

This module downloads the JASPAR database of transcription factor binding motifs (http://jaspar.genereg.net/) via a series of text files, parses them, and stores them in an SQLite database.

Author:Saahith Pochiraju <saahith116@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-08-01
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.jaspar.Annotation(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]
TAG[source]
VAL[source]
class datanator.data_source.jaspar.Data(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]
col[source]
row[source]
val[source]
class datanator.data_source.jaspar.Jaspar(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local SQLite copy of the JASPAR database of transcription factor binding profiles

ENDPOINT_DOMAINS = {'jaspar': 'http://jaspar.genereg.net/download/database/JASPAR2018.sqlite.tar.gz'}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Load the content of the local copy of the data source

class datanator.data_source.jaspar.Matrix(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

BASE_ID[source]
COLLECTION[source]
ID[source]
NAME[source]
VERSION[source]
class datanator.data_source.jaspar.Protein(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ACC[source]
ID[source]
class datanator.data_source.jaspar.Species(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]
TAX_ID[source]
class datanator.data_source.jaspar.Taxon(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

SPECIES[source]
TAX_ID[source]
class datanator.data_source.jaspar.TaxonExtension(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

NAME[source]
TAX_ID[source]
class datanator.data_source.jaspar.Tffm(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

BASE_ID[source]
EXPERIMENT_NAME[source]
ID[source]
LOG_P_1ST_ORDER[source]
LOG_P_DETAILED[source]
MATRIX_BASE_ID[source]
MATRIX_VERSION[source]
NAME[source]
VERSION[source]
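To show how the `ID`, `row`, `col`, and `val` columns of the `Data` class could be read back into a matrix, here is a minimal sketch using the standard-library `sqlite3` module. The table name `matrix_data`, the in-memory database, and the sample counts are assumptions made for illustration; the real schema may differ.

```python
import sqlite3

# In-memory stand-in for the local JASPAR SQLite file (hypothetical table name)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE matrix_data ("ID" INTEGER, "row" TEXT, "col" INTEGER, "val" REAL)')
conn.executemany(
    'INSERT INTO matrix_data VALUES (?, ?, ?, ?)',
    [(1, 'A', 1, 4.0), (1, 'C', 1, 0.0), (1, 'A', 2, 1.0), (1, 'C', 2, 3.0)],
)


def load_matrix(connection, matrix_id):
    """Collect the values of one matrix into {row label: [values by column]}."""
    matrix = {}
    query = 'SELECT "row", "col", "val" FROM matrix_data WHERE "ID" = ? ORDER BY "col"'
    for row, _col, val in connection.execute(query, (matrix_id,)):
        matrix.setdefault(row, []).append(val)
    return matrix


matrix = load_matrix(conn, 1)
print(matrix)
```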

4.1.1.5.14. datanator.data_source.kegg module

class datanator.data_source.kegg.Kegg(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the KEGG Ontology

ENDPOINT_DOMAINS = {'kegg': ''}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Load the content of the local copy of the data source

4.1.1.5.15. datanator.data_source.kegg_orthology module

class datanator.data_source.kegg_orthology.KeggOrthology(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

download_ko(name)[source]
load_content()[source]

Load kegg_orthologs into MongoDB

parse_definition(line)[source]
A definition line could look like the following:
fructose-bisphosphate aldolase / 6-deoxy-5-ketofructose 1-phosphate synthase [NADP…] [EC:4.1.2.13 2.2.1.11]
The EC code is optional
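One way to handle the optional EC block is a regular expression anchored at the end of the line. The function below is a hedged sketch, not the method's source: the return shape (a list of names and a list of EC codes) and the ` / ` name separator are assumptions drawn from the example line.

```python
import re


def parse_definition(line):
    """Split a definition line into enzyme names and optional EC codes.

    The (names, ec_codes) return shape is an assumption of this sketch.
    """
    match = re.search(r'\[EC:([^\]]+)\]\s*$', line)
    if match:
        ec_codes = match.group(1).split()
        names_part = line[:match.start()]
    else:
        # No trailing [EC:...] block: the whole line holds names
        ec_codes = []
        names_part = line
    names = [name.strip() for name in names_part.split(' / ') if name.strip()]
    return names, ec_codes


with_ec = parse_definition(
    'fructose-bisphosphate aldolase / 6-deoxy-5-ketofructose 1-phosphate synthase '
    '[EC:4.1.2.13 2.2.1.11]')
without_ec = parse_definition('leucine dehydrogenase')
print(with_ec)
print(without_ec)
```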
parse_ko_txt(filename)[source]

Parse kegg_ortho txt file into dictionary object

4.1.1.5.16. datanator.data_source.kegg_reaction_class module

class datanator.data_source.kegg_reaction_class.KeggReaction(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None)[source]

Bases: datanator.util.mongo_util.MongoUtil

download_rxn(name)[source]
download_rxn_cls(cls)[source]
load_content()[source]

Load kegg_reactions into MongoDB

parse_rc_multiline(lines)[source]
Input:
DEFINITION C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y
N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y … … O1a-O2x:*-C1z:C1b-C1x
Output:
[C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y, N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y, …]
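The input/output pair above can be reproduced with a short whitespace-based sketch. It assumes the first line starts with a field keyword such as `DEFINITION` and that continuation lines carry only values; the sample lines reuse the strings shown above.

```python
def parse_rc_multiline(lines):
    """Flatten a multi-line field into a list of whitespace-separated tokens."""
    tokens = []
    for index, line in enumerate(lines):
        parts = line.split()
        if index == 0:
            # Drop the leading field keyword (assumed to be the first token)
            parts = parts[1:]
        tokens.extend(parts)
    return tokens


lines = [
    'DEFINITION  C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y',
    '            N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y',
]
tokens = parse_rc_multiline(lines)
print(tokens)
```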
parse_rc_orthology(lines)[source]
Input:
ORTHOLOGY K00260 glutamate dehydrogenase [EC:1.4.1.2] K00261 glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3] K00262 glutamate dehydrogenase (NADP+) [EC:1.4.1.4] K00263 leucine dehydrogenase [EC:1.4.1.9] … K13547 L-glutamine:2-deoxy-scyllo-inosose/3-amino-2,3-dideoxy-scyllo-inosose aminotransferase [EC:2.6.1.100 2.6.1.101] ..
Output
[K00260, K00261, …]
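A sketch of the orthology extraction, assuming one entry per line and K numbers of the form `K` followed by five digits. The function is standalone and illustrative; the real method may parse the block differently.

```python
import re


def parse_rc_orthology(lines):
    """Return the K number that starts each orthology line."""
    ids = []
    for line in lines:
        # The first line may carry the 'ORTHOLOGY' keyword before the K number
        match = re.match(r'\s*(?:ORTHOLOGY\s+)?(K\d{5})\b', line)
        if match:
            ids.append(match.group(1))
    return ids


lines = [
    'ORTHOLOGY   K00260  glutamate dehydrogenase [EC:1.4.1.2]',
    '            K00261  glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]',
    '            K00263  leucine dehydrogenase [EC:1.4.1.9]',
]
k_numbers = parse_rc_orthology(lines)
print(k_numbers)
```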
parse_root_json()[source]

Parse root json file and return reaction classes

parse_rxn_cls_txt(filename)[source]

Parse kegg_ortho txt file into dictionary object

categories = [‘ENTRY’, ‘DEFINITION’, ‘RPAIR’, ‘REACTION’, ‘ENZYME’, ‘PATHWAY’, ‘ORTHOLOGY’]
datanator.data_source.kegg_reaction_class.main()[source]

4.1.1.5.17. datanator.data_source.metabolite_nosql module

Author:Zhouyang Lian <zhouyang.lian@familian.life>
Author:Jonathan <jonrkarr@gmail.com>
Date:2019-04-02
Copyright:2019, Karr Lab
License:MIT
class datanator.data_source.metabolite_nosql.MetaboliteNoSQL(output_directory, source, MongoDB, db, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]

Bases: datanator.util.mongo_util.MongoUtil

Loads metabolite information into MongoDB and outputs documents as JSON files for each metabolite

Attributes:
  • source: source database, e.g. ‘ecmdb’ or ‘ymdb’
  • MongoDB: MongoDB server address, e.g. ‘mongodb://localhost:27017/’
  • max_entries: maximum number of documents to be processed
  • output_directory: directory in which JSON files will be stored
write_to_json()[source]

4.1.1.5.18. datanator.data_source.metabolites_meta_collection module

class datanator.data_source.metabolites_meta_collection.MetabolitesMeta(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', meta_loc=None)[source]

Bases: datanator.core.query_nosql.QuerySabio

meta_loc: database location to save the meta collection

fill_metabolite_fields(fields=None, collection_src=None, collection_des=None)[source]

Fill in values of fields of interest from metabolite collection: ecmdb or ymdb

Parameters:
  • fields – list of fields of interest
  • collection_src – collection in which query will be done
  • collection_des – collection in which result will be updated
load_content()[source]
datanator.data_source.metabolites_meta_collection.main()[source]

4.1.1.5.19. datanator.data_source.pax module

This codebase takes the txt files of the PaxDB protein abundance database and inserts them into an SQL database

define_tables.py - defines the python classes corresponding to the tables in the resulting SQL database

Author:Balazs Szigeti <balazs.szigeti@mssm.edu>
Author:Saahith Pochiraju <saahith116@gmail.com>
Date:2017 June 3
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.pax.Base(**kwargs)[source]

Bases: object

The most base type

metadata = MetaData(bind=None)[source]
class datanator.data_source.pax.Dataset(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a given dataset (typically results from a single paper)

ncbi_id

NCBI id - linked to the ‘taxon’ table

Type:int
publication[source]

URL of the corresponding publication

Type:str
file_name[source]

the name of text file corresponding to the dataset

Type:str
score[source]

PaxDb’s internal quality score

Type:float
weight[source]

TBA

Type:int
coverage[source]

what percentage of the genome is covered by the dataset

Type:int
coverage[source]
file_name[source]
id[source]
observation[source]
publication[source]
score[source]
taxon[source]
taxon_ncbi_id[source]
weight[source]
class datanator.data_source.pax.Observation(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein

protein_id

PaxDB’s internal numerical protein ID

Type:int
dataset_id[source]

ID of the dataset - linked to the ‘dataset’ table

Type:int
abundance[source]

Normalized abundance of the protein

Type:float
abundance[source]
dataset[source]
dataset_id[source]
id[source]
protein[source]
protein_id[source]
class datanator.data_source.pax.Pax(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the Pax database

ENDPOINT_DOMAINS = {'pax': 'https://pax-db.org/downloads/4.1/datasets/paxdb-abundance-files-v4.1.zip', 'pax_protein': 'http://pax-db.org/downloads/latest/paxdb-uniprot-links-v4.1.zip'}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Collects and parses all data from the Pax DB website and adds it to the SQLite DB

Parameters:req (requests object) – Requests session object
parse_paxDB_files()[source]

This function parses pax DB files and adds them to the SQL database

session

SQLAlchemy object
file_id[source]

internal ID of the file

Type:str
data_files[source]

list of the files to be processed

Type:str
data_folder[source]

root folder of the database

Type:str
class datanator.data_source.pax.Protein(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein

protein_id

PaxDB’s internal numerical protein ID

Type:int
string_id[source]

Ensembl ID of protein

Type:str
observation[source]
protein_id[source]
string_id[source]
uniprot_id[source]
class datanator.data_source.pax.Taxon(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a species

ncbi_id

NCBI id

Type:int
species_name[source]

name and possibly genetic variant

Type:str
datasets[source]
ncbi_id[source]
species_name[source]
datanator.data_source.pax.find_files(path)[source]

Scan a directory (and its subdirectories) for files and sort by ncbi_id

Parameters:path (str) – Path containing the data_files
Returns:list of files to add to DB
Return type:list
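A recursive scan with sorting by taxon ID can be sketched with `os.walk`. The sketch assumes every file name begins with an integer NCBI ID followed by a hyphen; that naming pattern and the sample file names are illustrative assumptions, not confirmed details of the PaxDB download.

```python
import os
import tempfile


def find_files(path):
    """Collect all files under `path` and sort them by leading NCBI ID."""
    data_files = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            data_files.append(os.path.join(dirpath, filename))

    def ncbi_id(file_path):
        # Assumes names like '<ncbi_id>-<description>.txt'
        return int(os.path.basename(file_path).split('-')[0])

    return sorted(data_files, key=ncbi_id)


# Build a throwaway directory tree to exercise the function
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'nested'))
    for relative in ('9606-example.txt',
                     os.path.join('nested', '511145-example.txt'),
                     '4932-example.txt'):
        with open(os.path.join(root, relative), 'w') as handle:
            handle.write('placeholder\n')
    names = [os.path.basename(p) for p in find_files(root)]
print(names)
```

Sorting on the integer value rather than the string keeps `9606` ahead of `511145`.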

4.1.1.5.20. datanator.data_source.pax_nosql module

class datanator.data_source.pax_nosql.PaxNoSQL(cache_dirname, MongoDB, db, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]

Bases: datanator.util.mongo_util.MongoUtil

load_content()[source]

Collects and Parses all data from Pax DB website and adds to MongoDB

parse_paxDB_files()[source]

This function parses pax DB files and adds them to the NoSQL database

datanator.data_source.pax_nosql.find_files(path)[source]

Scan a directory (and its subdirectories) for files and sort by ncbi_id

Parameters:path (str) – Path containing the data_files
Returns:list of files to add to DB
Return type:list

4.1.1.5.21. datanator.data_source.refseq module


class datanator.data_source.refseq.EcNumber(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ec_number[source]
gene[source]
class datanator.data_source.refseq.Gene(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

ec_numbers[source]
essentiality[source]
gene_synonyms[source]
id[source]
identifiers[source]
location[source]
locus_tag[source]
name[source]
qualifiers[source]
ref_genome_version[source]
referenceGenome[source]
class datanator.data_source.refseq.GeneSynonym(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]
name[source]
class datanator.data_source.refseq.Identifier(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]
name[source]
namespace[source]
class datanator.data_source.refseq.Location(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

end[source]
gene[source]
nofuzzy_end[source]
nofuzzy_start[source]
ref[source]
ref_db[source]
start[source]
strand[source]
class datanator.data_source.refseq.Qualifier(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]
key[source]
value[source]
class datanator.data_source.refseq.ReferenceGenome(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

accessions[source]
genes[source]
organism[source]
version[source]
class datanator.data_source.refseq.ReferenceGenomeAccession(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

id[source]
referenceGenome[source]
class datanator.data_source.refseq.Refseq(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=False, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

find_nth(haystack, needle, n)[source]
get_json_ends(tree)[source]
get_or_create_object(cls, **kwargs)[source]

Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if there is no instance with the property-value pairs described by kwargs

Parameters:
  • cls (class) – type of object to find or create
  • **kwargs – values of the properties of the object
Returns:instance of cls that has the property-value pairs described by kwargs
Return type:Base
get_paths_to_backup(download=False)[source]

Get a list of the files to backup/unpack

Parameters:download (bool, optional) – if True, prepare the files for uploading
Returns:list of paths to backup
Return type:list of str
get_ref_seq_url(org_symbol)[source]
load_content(list_bio_seqio_objects)[source]

Load the content of the local copy of the data source

upload_data_from_kegg_org_symbol(kegg_org_symbol)[source]
upload_ref_seq_for_all_prokaryotic_kegg_org()[source]
datanator.data_source.refseq.create_orm(class_1, class_2)[source]

4.1.1.5.22. datanator.data_source.sabio_rk module

Author:Yosef Roth <yosefdroth@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2017-05-04
Copyright:2017, Karr Lab
License:MIT
class datanator.data_source.sabio_rk.Compartment(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents a compartment in the SABIO-RK database

kinetic_laws[source]

list of kinetic laws

Type:list of KineticLaw
created[source]
cross_references[source]
id[source]
kinetic_laws[source]
modified[source]
name[source]
parameters[source]
reaction_participants[source]
synonyms[source]
class datanator.data_source.sabio_rk.Compound(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents a compound in the SABIO-RK database

_is_name_ambiguous[source]

if True, the currently stored compound name should not be trusted because multiple names for the same compound have been discovered. The consensus name must be obtained using download_compounds

Type:bool
structures[source]

structures

Type:list of CompoundStructure
reaction_participants[source]

list of reaction participants

Type:list of ReactionParticipant
parameters[source]

list of parameters

Type:list of Parameter
created[source]
cross_references[source]
get_inchi_structures()[source]

Get InChI-formatted structures

Returns:list of structures in InChI format
Return type:list of str
get_smiles_structures()[source]

Get SMILES-formatted structures

Returns:list of structures in SMILES format
Return type:list of str
id[source]
modified[source]
name[source]
parameters[source]
reaction_participants[source]
structures[source]
synonyms[source]
class datanator.data_source.sabio_rk.CompoundStructure(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents the structure of a compound and its format

compounds[source]

list of compounds

Type:list of Compound
value[source]

the structure in InChI, SMILES, etc. format

Type:str
format[source]

format (InChI, SMILES, etc.) of the structure

Type:str
_value_inchi[source]

structure in InChI format

Type:str
_value_inchi_formula_connectivity[source]

empirical formula (without hydrogen) and connectivity InChI layers; used to quickly search for compound structures

Type:str
calc_inchi_formula_connectivity()[source]

Calculate a searchable structure

  • InChI format

  • Core InChI format

    • Formula layer (without hydrogen)
    • Connectivity layer
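The idea of a searchable core structure can be illustrated by pulling the formula and connectivity layers out of an InChI string. This is a rough sketch under stated assumptions: layers are separated by ‘/’, the formula is the second layer, the connectivity layer starts with ‘c’, and hydrogens are stripped with a regular expression. The actual method may rely on a cheminformatics toolkit instead.

```python
import re


def calc_inchi_formula_connectivity(inchi):
    """Return 'formula without hydrogen' + '/' + 'connectivity layer'."""
    layers = inchi.split('/')
    if len(layers) < 2 or not layers[0].startswith('InChI='):
        raise ValueError('not an InChI string: {!r}'.format(inchi))

    # Remove hydrogen counts, leaving elements such as Hg or He untouched
    formula = re.sub(r'H\d*(?![a-z])', '', layers[1])

    connectivity = ''
    for layer in layers[2:]:
        if layer.startswith('c'):
            connectivity = layer
            break
    return formula + ('/' + connectivity if connectivity else '')


ethanol = calc_inchi_formula_connectivity('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
methane = calc_inchi_formula_connectivity('InChI=1S/CH4/h1H4')
print(ethanol, methane)
```

Two structures that differ only in protonation state then map to the same key, which is what makes the value convenient for quick lookups.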
compounds[source]
format[source]
value[source]
class datanator.data_source.sabio_rk.Entry(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an entry in the SABIO-RK database

id[source]

external identifier

Type:int
name[source]

name

Type:str
synonyms[source]

list of synonyms

Type:list of Synonym
cross_references[source]

list of cross references

Type:list of Resource
created[source]

date that the sqlite object was created

Type:datetime.datetime
updated[source]

date that the sqlite object was last updated

Type:datetime.datetime
created[source]
cross_references[source]
id[source]
modified[source]
name[source]
synonyms[source]
class datanator.data_source.sabio_rk.Enzyme(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents an enzyme in the SABIO-RK database

subunits[source]

list of subunits

Type:list of EnzymeSubunit
kinetic_laws[source]

list of kinetic laws

Type:list of KineticLaw
molecular_weight[source]

molecular weight in Daltons

Type:float
parameters[source]

list of parameters

Type:list of Parameter
created[source]
cross_references[source]
id[source]
kinetic_laws[source]
modified[source]
molecular_weight[source]
name[source]
parameters[source]
subunits[source]
synonyms[source]
class datanator.data_source.sabio_rk.EnzymeSubunit(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents an enzyme subunit in the SABIO-RK database

enzyme[source]

enzyme

Type:Enzyme
coefficient[source]

stoichiometry of the subunit in the enzyme

Type:int
sequence[source]

amino acid sequence

Type:str
molecular_weight[source]

molecular weight in Daltons

Type:float
coefficient[source]
created[source]
cross_references[source]
enzyme[source]
enzyme_id[source]
id[source]
modified[source]
molecular_weight[source]
name[source]
sequence[source]
synonyms[source]
class datanator.data_source.sabio_rk.KineticLaw(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents a kinetic law in the SABIO-RK database

reactants[source]

list of reactants

Type:list of ReactionParticipant
products[source]

list of products

Type:list of ReactionParticipant
enzyme[source]

enzyme

Type:Enzyme
enzyme_compartment[source]

compartment

Type:Compartment
enzyme_type[source]

type of the enzyme (e.g. Modifier-Catalyst)

Type:str
tissue[source]

tissue

Type:str
mechanism[source]

mechanism of enzymatic catalysis (e.g. Michaelis-Menten)

Type:str
equation[source]

equation

Type:str
parameters[source]

list of parameters

Type:list of Parameter
modifiers[source]

list of modifiers

Type:list of ReactionParticipant
taxon[source]

taxon

Type:str
taxon_wildtype[source]

if True, the taxon represents the wild type

Type:bool
taxon_variant[source]

variant of the taxon

Type:str
temperature[source]

temperature in C

Type:float
ph[source]

pH

Type:float
media[source]

media

Type:str
references[source]

list of PubMed references

Type:list of Resource
created[source]
cross_references[source]
enzyme[source]
enzyme_compartment[source]
enzyme_compartment_id[source]
enzyme_id[source]
enzyme_type[source]
equation[source]
id[source]
mechanism[source]
media[source]
modified[source]
modifiers[source]
name[source]
parameters[source]
ph[source]
products[source]
reactants[source]
references[source]
synonyms[source]
taxon[source]
taxon_variant[source]
taxon_wildtype[source]
temperature[source]
tissue[source]
class datanator.data_source.sabio_rk.Parameter(**kwargs)[source]

Bases: datanator.data_source.sabio_rk.Entry

Represents a parameter in the SABIO-RK database

kinetic_law[source]

kinetic law

Type:KineticLaw
type[source]

SBO term

Type:int
compound[source]

compound

Type:Compound
enzyme[source]

enzyme

Type:Enzyme
compartment[source]

compartment

Type:Compartment
value[source]

normalized value

Type:float
error[source]

normalized error

Type:float
units[source]

normalized units

Type:str
observed_name[source]

name

Type:str
observed_type[source]

SBO term

Type:int
observed_value[source]

observed value

Type:float
observed_error[source]

observed error

Type:float
observed_units[source]

observed units

Type:str
TYPES

dictionary of SBO terms and their canonical string symbols

Type:dict of int: str
UNITS

dictionary of SBO terms and their canonical units

Type:dict of int: str

TYPES = {25: 'k_cat', 27: 'k_m', 186: 'v_max', 261: 'k_i'}[source]
compartment[source]
compartment_id[source]
compound[source]
compound_id[source]
created[source]
cross_references[source]
enzyme[source]
enzyme_id[source]
error[source]
id[source]
kinetic_law[source]
kinetic_law_id[source]
modified[source]
name[source]
observed_error[source]
observed_name[source]
observed_type[source]
observed_units[source]
observed_value[source]
synonyms[source]
type[source]
units[source]
value[source]
class datanator.data_source.sabio_rk.ReactionParticipant(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a participant in a SABIO-RK reaction

compound[source]

compound

Type:Compound
compartment[source]

compartment

Type:Compartment
coefficient[source]

coefficient

Type:float
type[source]

type

Type:str
reactant_kinetic_law[source]

kinetic law in which the participant appears as a reactant

Type:KineticLaw
product_kinetic_law[source]

kinetic law in which the participant appears as a product

Type:KineticLaw
coefficient[source]
compartment[source]
compartment_id[source]
compound[source]
compound_id[source]
modifier_kinetic_law[source]
modifier_kinetic_law_id[source]
product_kinetic_law[source]
product_kinetic_law_id[source]
reactant_kinetic_law[source]
reactant_kinetic_law_id[source]
type[source]
class datanator.data_source.sabio_rk.Resource(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents an external resource

namespace[source]

external namespace

Type:str
id[source]

external identifier

Type:str
entries[source]

entries

Type:list of Entry
kinetic_laws[source]

kinetic laws

Type:list of KineticLaw
entries[source]
id[source]
kinetic_laws[source]
namespace[source]
class datanator.data_source.sabio_rk.SabioRk(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, webservice_batch_size=1, excel_batch_size=100, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the SABIO-RK database

webservice_batch_size[source]

default size of batches to download kinetic information from the SABIO webservice. Note: this should be set to one because SABIO exports units incorrectly when multiple kinetic laws are requested

Type:int
excel_batch_size[source]

default size of batches to download kinetic information from the SABIO Excel download service

Type:int
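The two batch sizes imply that kinetic law IDs are requested in fixed-size groups. A minimal sketch of such batching follows; the helper name `batches` and the sample IDs are hypothetical, and the real download loop is not shown in this documentation.

```python
def batches(ids, batch_size):
    """Yield consecutive slices of `ids` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


kinetic_law_ids = [1, 2, 3, 4, 5]  # made-up IDs

# webservice_batch_size=1 requests one kinetic law at a time
webservice_batches = list(batches(kinetic_law_ids, 1))
# a larger size, as with excel_batch_size, groups several IDs per request
excel_batches = list(batches(kinetic_law_ids, 2))
print(webservice_batches)
print(excel_batches)
```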

URL to obtain a list of the ids of all of the kinetic laws in SABIO-RK

Type:str
ENDPOINT_WEBSERVICE[source]

URL for the SABIO-RK webservice

Type:str
ENDPOINT_EXCEL_EXPORT[source]

URL to download kinetic data as a table in TSV format

Type:str
ENDPOINT_COMPOUNDS_PAGE[source]

URL to download information about a SABIO-RK compound

Type:str
SKIP_KINETIC_LAW_IDS[source]

IDs of kinetic laws that should be skipped (because they contain errors and can’t be downloaded from SABIO)

Type:tuple of int
PUBCHEM_MAX_TRIES[source]

maximum number of times to try querying PubChem before failing

Type:int
PUBCHEM_TRY_DELAY[source]

delay in seconds between PubChem queries (to avoid overloading the server)

Type:float
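A retry loop governed by a maximum number of tries and a fixed delay might look like the sketch below. The helper `query_with_retries`, the broad exception handling, and the stand-in `flaky_query` function are assumptions for illustration; no real PubChem request is made.

```python
import time

PUBCHEM_MAX_TRIES = 10
PUBCHEM_TRY_DELAY = 0.25


def query_with_retries(query, max_tries=PUBCHEM_MAX_TRIES, delay=PUBCHEM_TRY_DELAY):
    """Call `query()` until it succeeds or `max_tries` attempts have failed."""
    if max_tries < 1:
        raise ValueError('max_tries must be at least 1')
    last_error = None
    for attempt in range(max_tries):
        try:
            return query()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < max_tries - 1:
                time.sleep(delay)  # wait before the next attempt
    raise last_error


# Stand-in for a remote lookup that fails twice before succeeding
calls = {'count': 0}


def flaky_query():
    calls['count'] += 1
    if calls['count'] < 3:
        raise IOError('temporary failure')
    return 'C6H12O6'


result = query_with_retries(flaky_query, delay=0)
print(result, calls['count'])
```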
ENDPOINT_COMPOUNDS_PAGE = 'http://sabiork.h-its.org/compdetails.jsp'[source]
ENDPOINT_DOMAINS = {'sabio_rk': 'http://sabiork.h-its.org', 'uniprot': 'http://www.uniprot.org'}[source]
ENDPOINT_EXCEL_EXPORT = 'http://sabiork.h-its.org/entry/exportToExcelCustomizable'[source]
ENDPOINT_KINETIC_LAWS_PAGE = 'http://sabiork.h-its.org/kindatadirectiframe.jsp'[source]
ENDPOINT_KINETIC_LAWS_SEARCH = 'http://sabiork.h-its.org/sabioRestWebServices/searchKineticLaws/entryIDs'[source]
ENDPOINT_WEBSERVICE = 'http://sabiork.h-its.org/sabioRestWebServices/kineticLaws'[source]
PUBCHEM_MAX_TRIES = 10[source]
PUBCHEM_TRY_DELAY = 0.25[source]
SKIP_KINETIC_LAW_IDS = (51286,)[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

calc_enzyme_molecular_weights(enzymes)[source]

Calculate the molecular weight of each enzyme

Parameters:enzymes (list of Enzyme) – list of enzymes
calc_stats()[source]

Calculate statistics about SABIO-RK

Returns:list of list of statistics
Return type:list of list of obj
create_compartment_from_sbml(sbml)[source]

Add a compartment to the local sqlite database

Parameters:sbml (libsbml.Compartment) – SBML-representation of a compartment
Returns:compartment
Return type:Compartment
create_cross_references_from_sbml(sbml)[source]

Add cross references to the local sqlite database for an SBML object

Parameters:sbml (libsbml.SBase) – object in an SBML document
Returns:list of resources
Return type:list of Resource
create_kinetic_law_from_sbml(id, sbml, specie_properties, functions, units)[source]

Add a kinetic law to the local sqlite database

Parameters:
  • id (int) – identifier
  • sbml (libsbml.KineticLaw) – SBML-representation of a reaction
  • specie_properties (dict) –

    additional properties of the compounds/enzymes

    • is_wildtype (bool): indicates if the enzyme is wildtype or mutant
    • variant (str): description of the variant of the enzyme
    • modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
  • functions (dict of str: str) – dictionary of rate law equations (keys = IDs in SBML, values = equations)
  • units (dict of str: str) – dictionary of units (keys = IDs in SBML, values = names)

Returns:kinetic law
Return type:KineticLaw
Raises:ValueError – if the temperature is expressed in an unsupported unit
create_kinetic_laws_from_sbml(ids, sbml)[source]

Add kinetic laws defined in an SBML file to the local sqlite database

Parameters:
  • ids (list of int) – list of kinetic law IDs
  • sbml (str) – SBML representation of one or more kinetic laws
Returns:

Return type:

tuple

create_specie_from_sbml(sbml)[source]

Add a species to the local sqlite database

Parameters:sbml (libsbml.Species) – SBML-representation of a compound or enzyme
Returns:
  • Compound: or Enzyme: compound or enzyme
  • dict: additional properties of the compound/enzyme
    • is_wildtype (bool): indicates if the enzyme is wildtype or mutant
    • variant (str): description of the variant of the enzyme
    • modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
Return type:tuple
Raises:ValueError – if a species is of an unsupported type (i.e. not a compound or enzyme)
export_stats(stats, filename=None)[source]

Export statistics to an Excel workbook

Parameters:
  • stats (list of list of obj) – list of list of statistics
  • filename (str, optional) – path to export statistics
get_parameter_by_properties(kinetic_law, parameter_properties)[source]

Get the parameter of kinetic_law whose attribute values are equal to that of parameter_properties

Parameters:
  • kinetic_law (KineticLaw) – kinetic law to find parameter of
  • parameter_properties (dict) – properties of parameter to find
Returns:

parameter with attribute values equal to values of parameter_properties

Return type:

Parameter
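The attribute-matching lookup can be sketched with plain objects in place of ORM instances. `SimpleNamespace` stands in for `KineticLaw` and `Parameter` here, and returning `None` when nothing matches is an assumption; the real method may raise an error or handle multiple matches differently.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so that a missing attribute never equals None


def get_parameter_by_properties(kinetic_law, parameter_properties):
    """Return the first parameter whose attributes equal the given values."""
    for parameter in kinetic_law.parameters:
        if all(getattr(parameter, name, _MISSING) == value
               for name, value in parameter_properties.items()):
            return parameter
    return None


# Stand-in objects with made-up values
k_m = SimpleNamespace(observed_name='Km', observed_type=27, observed_value=2.0)
k_cat = SimpleNamespace(observed_name='kcat', observed_type=25, observed_value=10.0)
law = SimpleNamespace(parameters=[k_m, k_cat])

found = get_parameter_by_properties(law, {'observed_type': 25, 'observed_name': 'kcat'})
missing = get_parameter_by_properties(law, {'observed_type': 186})
print(found, missing)
```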

get_specie_reference_from_sbml(specie_id)[source]

Get the compound/enzyme associated with an SBML species by its ID

Parameters:specie_id (str) – ID of an SBML species
Returns:
Return type:tuple
Raises:ValueError – if the species is not a compound or enzyme, no species with id = specie_id exists, or no compartment with name = compartment_name exists
infer_compound_structures_from_names(compounds)[source]

Try to use PubChem to infer the structure of compounds from their names

Notes: we don’t try to look up structures from their cross references because SABIO has already gathered all structures from their cross references to ChEBI, KEGG, and PubChem

Parameters:compounds (list of Compound) – list of compounds
load_compounds(compounds=None)[source]

Download information from SABIO-RK about all of the compounds stored in the local sqlite copy of SABIO-RK

Parameters:compounds (list of Compound) – list of compounds to download
Raises:Error – if an HTTP request fails
load_content()[source]

Download the content of SABIO-RK and store it to a local sqlite database.

load_kinetic_law_ids()[source]

Download the IDs of all of the kinetic laws stored in SABIO-RK

Returns:list of kinetic law IDs
Return type:list of int
Raises:Error – if an HTTP request fails or the expected number of kinetic laws is not returned
load_kinetic_laws(ids)[source]

Download kinetic laws from SABIO-RK

Parameters:ids (list of int) – list of IDs of kinetic laws to download
Raises:Error – if an HTTP request fails
load_missing_enzyme_information_from_html(ids)[source]

Loading enzyme subunit information from html

Parameters:ids (list of int) – list of IDs of kinetic laws to download
load_missing_kinetic_law_information_from_tsv(ids)[source]

Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.

Parameters:ids (list of int) – list of IDs of kinetic laws to download
load_missing_kinetic_law_information_from_tsv_helper(tsv)[source]

Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.

Note: this method is necessary because neither SABIO’s SBML export nor its Excel export provides all of SABIO’s content.

Parameters:tsv (str) – TSV-formatted table
Raises:ValueError – if a kinetic law or compartment is not contained in the local sqlite database
normalize_kinetic_laws(ids)[source]

Normalize parameter values

Parameters:ids (list of int) – list of IDs of kinetic laws to download
normalize_parameter_value(name, type, value, error, units, enzyme_molecular_weight)[source]
Parameters:
  • name (str) – parameter name
  • type (int) – parameter type (SBO term id)
  • value (float) – observed value
  • error (float) – observed error
  • units (str) – observed units
  • enzyme_molecular_weight (float) – enzyme molecular weight
Returns:normalized name and its type (SBO term), value, error, and units
Return type:tuple of str, int, float, float, str
Raises:ValueError – if units is not a supported unit of type
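To make the contract concrete, here is a simplified sketch of unit normalization. The `TYPES` dictionary is taken from this page, but the conversion table `UNIT_FACTORS`, the unit strings, and the omission of `enzyme_molecular_weight` are assumptions; the real method supports more cases and may use different canonical units.

```python
TYPES = {25: 'k_cat', 27: 'k_m', 186: 'v_max', 261: 'k_i'}

# Hypothetical conversion tables: SBO term -> (canonical unit, {unit: factor})
_CONCENTRATION = {'M': 1.0, 'mM': 1e-3, 'uM': 1e-6, 'nM': 1e-9}
UNIT_FACTORS = {
    25: ('s^(-1)', {'s^(-1)': 1.0, 'min^(-1)': 1.0 / 60.0}),
    27: ('M', _CONCENTRATION),
    261: ('M', _CONCENTRATION),
}


def normalize_parameter_value(name, type, value, error, units):
    """Return (name, type, value, error, units) in canonical units."""
    if type not in UNIT_FACTORS:
        raise ValueError('type {} is not covered by this sketch'.format(type))
    canonical_units, factors = UNIT_FACTORS[type]
    if units not in factors:
        raise ValueError('{!r} is not a supported unit of {}'.format(units, TYPES[type]))
    factor = factors[units]
    normalized_error = None if error is None else error * factor
    return TYPES[type], type, value * factor, normalized_error, canonical_units


normalized = normalize_parameter_value('Km', 27, 2.0, 0.5, 'mM')
print(normalized)
```

Passing an unsupported unit, for example `'g'` for a `k_m`, raises `ValueError`, which mirrors the documented failure mode.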

parse_complex_subunit_structure(text)[source]

Parse the subunit structure of complex into a dictionary of subunit coefficients

Parameters:text (str) – subunit structure described with nested parentheses
Returns:dictionary of subunit coefficients
Return type:dict of str, int
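Parsing nested parentheses into coefficients lends itself to a small recursive-descent parser. The grammar assumed here, a group written as `(identifier)` or `(group group ...)` followed by an optional integer multiplier, is a guess at the input format, and the sample strings are invented.

```python
import re
from collections import Counter

_DIGITS = re.compile(r'\d+')


def parse_complex_subunit_structure(text):
    """Return {subunit: coefficient} for a nested-parentheses description."""
    text = text.replace(' ', '')
    counts, position = _parse_groups(text, 0)
    if position != len(text):
        raise ValueError('unexpected character at position {}'.format(position))
    return dict(counts)


def _parse_groups(text, position):
    """Parse consecutive '(...)n' groups starting at `position`."""
    total = Counter()
    while position < len(text) and text[position] == '(':
        inner_start = position + 1
        if inner_start < len(text) and text[inner_start] == '(':
            # Nested groups: recurse, then expect the closing parenthesis
            inner, position = _parse_groups(text, inner_start)
        else:
            end = text.find(')', inner_start)
            if end == -1:
                raise ValueError('unbalanced parentheses')
            inner = Counter({text[inner_start:end]: 1})
            position = end
        if position >= len(text) or text[position] != ')':
            raise ValueError('unbalanced parentheses')
        position += 1
        # Optional integer multiplier after the closing parenthesis
        match = _DIGITS.match(text, position)
        multiplier = 1
        if match:
            multiplier = int(match.group())
            position = match.end()
        for subunit, coefficient in inner.items():
            total[subunit] += coefficient * multiplier
    return total, position


flat = parse_complex_subunit_structure('(A)2(B)')
nested = parse_complex_subunit_structure('((A)2(B))3')
print(flat, nested)
```

Multipliers compose across nesting levels, so the outer `3` in the second example scales both inner coefficients.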
parse_enzyme_name(sbml)[source]

Parse the name of an enzyme in SBML for the enzyme name, wild type status, and variant description that it contains.

Parameters:sbml (str) – enzyme name in SBML
Returns:
  • str: name
  • bool: if True, the enzyme is wild type
  • str: variant
Return type:tuple
Raises:ValueError – if the enzyme name is formatted in an unsupported format
class datanator.data_source.sabio_rk.Synonym(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents a synonym to a SABIO-RK entry

name[source]

name of the synonym

Type:str
entries[source]

list of entries with the synonym

Type:list of Entry
entries[source]
name[source]

4.1.1.5.23. datanator.data_source.sabio_rk_nosql module

Parse SabioRk JSON files into MongoDB documents
(JSON files acquired by running sqlite_to_json.py)
Author:Zhouyang Lian <zhouyang.lian@familian.life>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2019-04-02
Copyright:2019, Karr Lab
License:MIT
class datanator.data_source.sabio_rk_nosql.SabioRkNoSQL(db=None, MongoDB=None, cache_directory=None, quilt_package=None, verbose=False, max_entries=inf, replicaSet=None, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

add_deprot_inchi()[source]
load_json()[source]
make_doc(file_names, file_dict)[source]
datanator.data_source.sabio_rk_nosql.main()[source]

4.1.1.5.24. datanator.data_source.sqlite_to_json module

Converts tables in SQLite into JSON files

database

path to sqlite database
datanator.data_source.sqlite_to_json.query[source]

query execution command in string format

class datanator.data_source.sqlite_to_json.SQLToJSON(query, cache_dirname=None, quilt_package=None, system_path=None)[source]

Bases: object

db()[source]
query_table(table, one=True)[source]
table()[source]
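The general idea of exporting a SQLite table to a JSON file can be sketched with the standard library alone. This is an illustrative sketch rather than the SQLToJSON implementation: the function table_to_json and its arguments are hypothetical, and the output layout (a list with one object per row) is an assumption.

```python
import json
import sqlite3


def table_to_json(db_path, table, json_path):
    """Dump every row of one SQLite table to a JSON file as a list of objects."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    try:
        # Table names cannot be bound as SQL parameters, so check the
        # requested name against the tables that actually exist.
        names = {row['name'] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        if table not in names:
            raise ValueError('unknown table: {!r}'.format(table))
        rows = [dict(row) for row in conn.execute('SELECT * FROM "{}"'.format(table))]
    finally:
        conn.close()
    with open(json_path, 'w') as handle:
        json.dump(rows, handle, indent=2)
    return rows
```

Validating the table name before interpolating it into the query avoids running arbitrary SQL when the name comes from user input.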

4.1.1.5.25. datanator.data_source.taxon_tree module

class datanator.data_source.taxon_tree.TaxonTree(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]

Bases: datanator.util.mongo_util.MongoUtil

count_line(file)[source]

Efficiently count total number of lines in a given file
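One common way to count lines efficiently is to read the file in large binary chunks and count newline bytes, which avoids decoding text or holding the whole file in memory. The sketch below shows that approach; it is an assumption about what "efficiently" means here, not a copy of the count_line implementation, and count_lines and chunk_size are hypothetical names.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count newline characters in a file by scanning fixed-size binary chunks.

    A final line that lacks a trailing newline is not counted.
    """
    count = 0
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps reading until read() returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            count += chunk.count(b'\n')
    return count
```

A line count obtained this way is typically used to size a progress indicator before a large .dmp file is parsed.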

download_dump()[source]
load_content()[source]

Load contents of several .dmp files into MongoDB

parse_division()[source]

Parse division.dmp

parse_fullname_line(line)[source]

Parse a line of fullnamelineage.dmp and return its elements as a list

parse_fullname_taxid()[source]

Parse fullnamelineage.dmp and taxidlineage.dmp and store the results in MongoDB. Always run this first, before loading anything else (insert_one).

parse_gencode()[source]

Parse gencode.dmp

parse_names()[source]

Parse names.dmp. Example records:

  1 | all | | synonym |
  1 | root | | scientific name |
  2 | bacteria | bacteria <blast2> | blast name |
  2 | Bacteria | Bacteria <prokaryotes> | scientific name |
  2 | eubacteria | | genbank common name |
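The records shown above have four fields, which the NCBI taxonomy dump describes as tax_id, name_txt, unique name, and name class. A minimal sketch of grouping such records by taxon follows. The function name group_names_by_taxon and the shape of the result are assumptions made for illustration; the documents that parse_names actually writes to MongoDB are not specified here.

```python
from collections import defaultdict

# Field order of names.dmp in the NCBI taxonomy dump.
NAME_FIELDS = ('tax_id', 'name_txt', 'unique_name', 'name_class')


def group_names_by_taxon(lines):
    """Group names.dmp records as {tax_id: {name_class: [name_txt, ...]}}."""
    grouped = defaultdict(dict)
    for line in lines:
        record = line.rstrip('\n')
        # Drop the trailing "\t|" that terminates every record.
        if record.endswith('\t|'):
            record = record[:-2]
        values = [value.strip() for value in record.split('\t|\t')]
        if len(values) != len(NAME_FIELDS):
            raise ValueError('unexpected number of fields: {!r}'.format(line))
        row = dict(zip(NAME_FIELDS, values))
        grouped[int(row['tax_id'])].setdefault(row['name_class'], []).append(row['name_txt'])
    return dict(grouped)
```

Grouping by taxon ID keeps all synonyms and the scientific name of a taxon together, which suits a one-document-per-taxon layout.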

parse_nodes()[source]

Parse nodes.dmp

parse_nodes_line(line)[source]

Parse a line of nodes.dmp

parse_taxid_line(line)[source]

Parse a line of taxidlineage.dmp and return its elements as a list. Records are delimited by "\t|\n" (tab, vertical bar, and newline) characters. Each record consists of one or more fields delimited by "\t|\t" (tab, vertical bar, and tab) characters.
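The delimiters described above suggest a simple split-based parser. The sketch below is a hedged illustration of that idea and not the parse_taxid_line source; the function name split_dmp_line is hypothetical.

```python
def split_dmp_line(line):
    """Split one taxonomy dump record into a list of stripped field values."""
    record = line.rstrip('\n')
    # Each record ends with tab + vertical bar before the newline.
    if record.endswith('\t|'):
        record = record[:-2]
    # Fields are separated by tab + vertical bar + tab.
    return [field.strip() for field in record.split('\t|\t')]
```

Stripping each field removes the trailing space that lineage fields often carry before the record terminator.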
datanator.data_source.taxon_tree.main()[source]

4.1.1.5.26. datanator.data_source.uniprot module

Downloads and parses the UniProt database for protein-protein interactions

Author:Saahith Pochiraju <saahith116@gmail.com>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2018-08-15
Copyright:2017-2018, Karr Lab
License:MIT
class datanator.data_source.uniprot.Uniprot(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]

Bases: datanator.core.data_source.HttpDataSource

ENDPOINT_DOMAINS = {'uniprot': 'http://www.uniprot.org/uniprot/?fil=reviewed:yes'}[source]
base_model[source]

alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]

Load the content of the local copy of the data source

class datanator.data_source.uniprot.UniprotData(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein interactions from the IntAct database

Index[source]

Index of the DB

Type:int
interactor_a[source]

represents participant A

Type:str
interactor_b[source]

represents participant B

Type:str
publications[source]

resource

Type:str
interaction[source]

interaction ID

Type:str
feature_a[source]

binding site of participant A

Type:str
feature_b[source]

binding site of participant B

Type:str
stoich_a[source]

stoichiometry of participant A

Type:str
stoich_b[source]

stoichiometry of participant B

Type:str
canonical_sequence[source]
ec_number[source]
entrez_id[source]
entry_name[source]
gene_name[source]
index[source]
length[source]
mass[source]
protein_name[source]
status[source]
uniprot_id[source]

4.1.1.5.27. datanator.data_source.uniprot_nosql module

Author:Zhouyang Lian <zhouyang.lian@familian.life>
Author:Jonathan Karr <jonrkarr@gmail.com>
Date:2019-04-02
Copyright:2019, Karr Lab
License:MIT
class datanator.data_source.uniprot_nosql.UniprotNoSQL(MongoDB=None, db=None, max_entries=inf, verbose=False, username=None, password=None, authSource='admin', replicaSet=None)[source]

Bases: datanator.util.mongo_util.MongoUtil

get_uniprot()[source]
load_uniprot()[source]

4.1.1.5.28. Module contents