4.1.1.5. datanator.data_source package¶
4.1.1.5.1. Subpackages¶
4.1.1.5.2. Submodules¶
4.1.1.5.3. datanator.data_source.array_express module¶
Downloads and parses the ArrayExpress database
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-08-16 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.array_express.
ArrayExpress
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the ArrayExpress database
-
EXCLUDED_DATASET_IDS
list of IDs of datasets to exclude
Type: list of str
-
ENDPOINT_DOMAINS
= {'array_express': 'https://www.ebi.ac.uk/arrayexpress/json/v3/experiments'}[source]¶
-
get_or_create_object
(cls, **kwargs)[source]¶ Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if there is no instance with those property-value pairs
Parameters: - cls (class) – type of object to find or create
- **kwargs – values of the properties of the object
Returns: instance of cls that has the property-value pairs described by kwargs
Return type: Base
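The get-or-create behaviour described above can be sketched without a database. This is only an illustration of the pattern: `InMemorySession`, `Organism`, and `get_or_create` are hypothetical names introduced here, and the documented method presumably works against a SQLAlchemy session rather than a Python list.

```python
class InMemorySession:
    """Hypothetical stand-in for a database session; not part of datanator."""

    def __init__(self):
        self.objects = []

    def add(self, obj):
        self.objects.append(obj)


class Organism:
    """Toy record type used only for this illustration."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


_MISSING = object()  # sentinel so that a missing attribute never equals None


def get_or_create(session, cls, **kwargs):
    """Return the first stored instance of cls matching kwargs, else create one."""
    for obj in session.objects:
        if isinstance(obj, cls) and all(
            getattr(obj, key, _MISSING) == value for key, value in kwargs.items()
        ):
            return obj
    # No match: create the object, register it, and return it
    obj = cls(**kwargs)
    session.add(obj)
    return obj


session = InMemorySession()
first = get_or_create(session, Organism, name='Escherichia coli')
second = get_or_create(session, Organism, name='Escherichia coli')
print(first is second, len(session.objects))  # True 1
```

The second call returns the object created by the first, so repeated loads do not produce duplicate rows.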
-
load_content
(test_url='')[source]¶ Downloads all metadata from ArrayExpress on its samples and experiments. The metadata is saved as text files; within each text file, the data is stored as a JSON object.
Parameters: - start_year (int, optional) – the first year to retrieve experiments for
- end_year (int, optional) – the last year to retrieve experiments for
-
load_experiment_metadata
(test_url='')[source]¶ Get a list of accession identifiers for the experiments from the year start_year to the year end_year
Parameters: - start_year (int, optional) – the first year to retrieve experiment accession ids for
- end_year (int, optional) – the last year to retrieve experiment accession ids for
Returns: list of experiment accession identifiers
Return type: list of str
-
load_experiment_protocol
(experiment, protocol_json)[source]¶ Load the protocols for an experiment
Parameters: - experiment (Experiment) – experiment
- protocol_json (dict) – sample
-
load_experiment_protocols
(experiment)[source]¶ Load the protocols for an experiment
Parameters: experiment (Experiment) – experiment
-
load_experiment_sample
(experiment, sample_json, index)[source]¶ Load the samples for an experiment
Parameters: - experiment (Experiment) – experiment
- sample_json (dict) – sample
- index (int) – index of the sample within the experiment
-
load_experiment_samples
(experiment)[source]¶ Load the samples for an experiment
Parameters: experiment (Experiment) – experiment
-
-
class
datanator.data_source.array_express.
Characteristic
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an experimental characteristic
-
_id
unique id
Type: int
-
category
[source]
-
samples
[source]
-
value
[source]
-
-
class
datanator.data_source.array_express.
DataFormat
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a data format
-
_id
unique id
Type: int
-
bio_assay_data_cubes
[source]
-
name
[source]
-
-
class
datanator.data_source.array_express.
EnsemblInfo
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a url
-
_id
unique id
Type: int
-
organism_strain
[source]¶ the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)
Type: str
-
organism_strain
[source]
-
url
[source]
-
-
class
datanator.data_source.array_express.
Experiment
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an experiment
-
_id
unique id
Type: int
-
types
[source]¶ list of experiment types
Type: list of ExperimentType
-
designs
[source]¶ list of experimental designs
Type: list of ExperimentDesign
-
data_formats
[source]¶ list of data formats
Type: list of DataFormat
-
data_formats
[source]
-
description
[source]
-
designs
[source]
-
has_fastq_files
[source]
-
id
[source]
-
name
[source]
-
name_2
[source]
-
organisms
[source]
-
read_type
[source]
-
release_date
[source]
-
submission_date
[source]
-
types
[source]
-
-
class
datanator.data_source.array_express.
ExperimentDesign
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an experimental design
-
_id
unique id
Type: int
-
name
[source]
-
-
class
datanator.data_source.array_express.
ExperimentType
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a type of experiment
-
_id
unique id
Type: int
-
name
[source]
-
-
class
datanator.data_source.array_express.
Extract
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an extract of a sample
-
_id
unique id
Type: int
-
name
[source]
-
samples
[source]
-
-
class
datanator.data_source.array_express.
Organism
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an organism
-
_id
unique id
Type: int
-
name
[source]
-
-
class
datanator.data_source.array_express.
Protocol
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a protocol for an experiment
-
_id
unique id
Type: int
-
protocol_type
[source]¶ the type of experimental protocol (e.g. normalization, extraction, etc.)
Type: list of Sample
-
experiments
[source]¶ list of experiments that performed this protocol
Type: list of Experiment
-
experiments
[source]
-
hardware
[source]
-
performer
[source]
-
protocol_accession
[source]
-
protocol_type
[source]
-
software
[source]
-
text
[source]
-
-
class
datanator.data_source.array_express.
Sample
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an observed concentration
-
_id
unique id
Type: int
-
experiment
[source]¶ experiment that the sample belongs to
Type: Experiment
-
name
[source]¶ name of the source of the sample (this is used to identify the sample in ArrayExpress)
Type: str
-
ensembl_organism_strain
[source]¶ the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)
Type: str
-
characteristics
characteristics
Type: list of Characteristic
-
read_type
[source]¶ the nature of the FASTQ file reads. Either ‘single’, ‘multiple’, or ‘parallel’
Type: str
-
ensembl_info
information about the ensembl reference genome
Type: list of Variable
-
full_strain_specificity
[source]¶ whether or not the ensembl reference genome matches the full strain specificity recorded in ArrayExpress
Type: bool
-
assay
[source]
-
ensembl_organism_strain
[source]
-
experiment
[source]
-
experiment_id
[source]
-
fastq_urls
[source]
-
full_strain_specificity
[source]
-
index
[source]
-
name
[source]
-
read_type
[source]
-
variables
[source]
-
-
class
datanator.data_source.array_express.
Url
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a url
-
_id
unique id
Type: int
-
samples
[source]
-
url
[source]
-
4.1.1.5.4. datanator.data_source.bio_portal module¶
Downloads ontologies from BioPortal
Author: | Jonathan Karr <jonrkarr@gmail.com> |
---|---|
Date: | 2017-05-23 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.bio_portal.
BioPortal
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, flask=False, quilt_owner=None, quilt_package=None, ontologies=None)[source]¶ Bases:
datanator.core.data_source.CachedDataSource
Loads ontologies from BioPortal
-
BIOPORTAL_ENDPOINT
= 'http://data.bioontology.org'[source]
-
CCO_DOWNLOAD_URL
= 'http://www.bio.ntnu.no/ontology/CCO/cco.obo'[source]
-
DEFAULT_ONTOLOGIES
= ('BTO.obo', 'CCO.obo', 'CL.owl', 'DOID.obo', 'EFO.owl', 'FMA.owl', 'GO.obo', 'PW.obo', 'SBO.obo')[source]¶
-
clear_content
()[source]¶ Clear the content of the sqlite database (i.e. drop and recreate all tables).
-
download_ontology
(id)[source]¶ Download an ontology from BioPortal
Parameters: id (str) – identifier of the ontology in BioPortal
-
get_engine
()[source]¶ Get an engine for the sqlite database. If the database doesn’t exist, initialize its structure.
Returns: database engine Return type: sqlalchemy.engine.Engine
-
get_ontologies_filename
()[source]¶ Get the local filename to store a list of the ontologies
Returns: filename Return type: str
-
get_ontology
(id)[source]¶ Load an ontology, downloading it from BioPortal if necessary
Parameters: id (str) – identifier of the ontology in BioPortal
Returns: ontology
Return type: pronto.Ontology
-
get_ontology_filename
(id)[source]¶ Get the local filename to store a copy of an ontology
Parameters: id (str) – identifier of the ontology in BioPortal
Returns: filename
Return type: str
-
get_paths_to_backup
(download=False)[source]¶ Get a list of the files to backup/unpack
Parameters: download (bool, optional) – if True, prepare the files for uploading
Returns: list of paths to backup
Return type: list of str
-
4.1.1.5.5. datanator.data_source.corum module¶
This codebase takes the CORUM protein complexes database and formats it as an SQL database
Author: | Balazs Szigeti <balazs.szigeti@mssm.edu> |
---|---|
Author: | Saahith Pochiraju <saahith116@gmail.com> |
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2018-08-13 |
Copyright: | 2017-2018, Karr Lab |
License: | MIT |
-
class
datanator.data_source.corum.
Complex
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a protein complex
-
observation_id
ID of the observation
Type: int
-
complex_cmt
[source]
-
complex_id
[source]
-
complex_name
[source]
-
disease_cmt
[source]
-
funcat_dsc
[source]
-
funcat_id
[source]
-
go_dsc
[source]
-
go_id
[source]
-
su_cmt
[source]
-
-
class
datanator.data_source.corum.
Corum
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the CORUM database
-
class
datanator.data_source.corum.
Observation
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an observation (entries in the original DB)
-
id
internal ID for the observation entry
Type: int
-
cell line
cell line (in which the measurement was done)
Type: str
-
pubmed_id
[source]
-
pur_method
[source]
-
taxon_ncbi_id
[source]
-
-
class
datanator.data_source.corum.
Subunit
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents subunits of complexes
-
id
Internal subunit ID
Type: int
-
complex_id
[source]
-
gene_name
[source]
-
gene_syn
[source]
-
protein_name
[source]
-
su_entrezs
[source]
-
su_uniprot
[source]
-
-
class
datanator.data_source.corum.
Taxon
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a species
-
ncbi_id
NCBI id
Type: int
4.1.1.5.6. datanator.data_source.corum_nosql module¶
-
class
datanator.data_source.corum_nosql.
CorumNoSQL
(MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', cache_dirname=None)[source]¶
4.1.1.5.7. datanator.data_source.cron_aggregate module¶
4.1.1.5.8. datanator.data_source.ecmdb module¶
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-05-04 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.ecmdb.
Compartment
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a compartment
-
compounds
[source]
-
name
[source]
-
-
class
datanator.data_source.ecmdb.
Compound
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an ECMDB entry
-
_structure_formula_connectivity
[source]¶ empirical formula and connectivity InChI layers; used to quickly search for compound structures
Type: str
-
compartments
[source]¶ compartments
Type: list of Compartment
-
concentrations
[source]¶ concentrations
Type: list of Concentration
-
comment
[source]
-
compartments
[source]
-
concentrations
[source]
-
created
[source]
-
cross_references
[source]
-
description
[source]
-
downloaded
[source]
-
id
[source]
-
name
[source]
-
structure
[source]
-
synonyms
[source]
-
updated
[source]
-
-
class
datanator.data_source.ecmdb.
Concentration
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an observed concentration
-
compound
[source]
-
error
[source]
-
growth_status
[source]
-
growth_system
[source]
-
media
[source]
-
references
[source]
-
strain
[source]
-
value
[source]
-
-
class
datanator.data_source.ecmdb.
Ecmdb
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the ECMDB database
-
DOWNLOAD_COMPOUND_URL
= 'http://ecmdb.ca/compounds/{}.xml'[source]
-
DOWNLOAD_INDEX_URL
= 'http://ecmdb.ca/download/ecmdb.json.zip'[source]
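DOWNLOAD_COMPOUND_URL is a format template with a single `{}` placeholder. A minimal sketch of how such a template is presumably filled in per compound; the helper `compound_url` and the identifier used here are illustrative assumptions, not taken from the source.

```python
# Template copied from the class attribute documented above
DOWNLOAD_COMPOUND_URL = 'http://ecmdb.ca/compounds/{}.xml'


def compound_url(compound_id):
    """Build the XML download URL for one compound (hypothetical helper)."""
    return DOWNLOAD_COMPOUND_URL.format(compound_id)


# 'M2MDB000001' is an illustrative identifier only
url = compound_url('M2MDB000001')
print(url)  # http://ecmdb.ca/compounds/M2MDB000001.xml
```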
-
get_node_children
(node, children_name)[source]¶ Get the children of an XML node
Parameters: - node (jxmlease.cdatanode.XMLNode) – XML node
- children_name (str) – tag names of the desired children
Returns: list of child nodes
Return type: list of XMLNode
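A helper like this is commonly needed because XML-to-dict parsers tend to return a single value for a tag that occurs once and a list for a tag that repeats. The sketch below shows that normalization on plain dictionaries; it assumes this is the purpose of get_node_children and does not use jxmlease, so `get_children` and the sample data are hypothetical.

```python
def get_children(node, children_name):
    """Always return a list of children, whatever shape the parser produced."""
    children = node.get(children_name)
    if children is None:
        return []          # tag absent
    if isinstance(children, list):
        return children    # tag repeated: already a list
    return [children]      # tag present once: wrap it


# Illustrative parsed nodes (plain dicts standing in for XML nodes)
one = {'synonym': 'water'}
many = {'synonym': ['water', 'H2O']}
print(get_children(one, 'synonym'), get_children(many, 'synonym'), get_children(one, 'pathway'))
```

Callers can then iterate over the result without checking its type first.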
-
4.1.1.5.9. datanator.data_source.ensembl module¶
Downloads and parses the ArrayExpress database
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-08-16 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.ensembl.
GeneEntry
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.ensembl.
GeneIdentifier
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a url
-
_id
unique id
Type: int
-
samples
[source]
-
-
class
datanator.data_source.ensembl.
GetGenes
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the ArrayExpress database
-
EXCLUDED_DATASET_IDS
list of IDs of datasets to exclude
Type: list of str
-
load_content
()[source]¶ Downloads all metadata from ArrayExpress on its samples and experiments. The metadata is saved as text files; within each text file, the data is stored as a JSON object.
Parameters: - start_year (int, optional) – the first year to retrieve experiments for
- end_year (int, optional) – the last year to retrieve experiments for
-
4.1.1.5.10. datanator.data_source.ezyme module¶
Ezyme
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan <jonrkarr@gmail.com> |
Date: | 2017-05-04 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.ezyme.
Ezyme
[source]¶ Bases:
datanator.core.data_source.WebserviceDataSource
Utilities for using Ezyme to predict EC numbers.
See Ezyme (http://www.genome.jp/tools-bin/predict_reaction) for more information.
-
EC_PREDICTION_URL
= 'http://www.genome.jp/kegg-bin/get_htext?htext=ko01000.keg&query='[source]
-
REQUEST_URL
= 'http://www.genome.jp/tools-bin/predict_view'[source]
-
RETRIEVAL_URL
= 'http://www.genome.jp/tools-bin/e-zyme2/result.cgi'[source]
-
run
(reaction)[source]¶ Use Ezyme to predict the first three digits of the EC number of a reaction.
Parameters: reaction (data_model.Reaction) – reaction
Returns: ranked list of predicted EC numbers and their scores, or None if one or more participants doesn’t have a defined structure
Return type: list of EzymeResult or None
-
4.1.1.5.11. datanator.data_source.intact module¶
Downloads and parses the IntAct database of protein-protein interactions
Author: | Saahith Pochiraju <saahith116@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2018-08-13 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.intact.
IntAct
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.FtpDataSource
A local SQLite copy of the IntAct database
-
ENDPOINT_DOMAINS
= {'complextab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/complex/current/complextab/', 'psimitab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psimitab/intact_negative.txt'}[source]¶
-
find_between
(string, first, last)[source]¶ Get the substring between the first occurrence of the substring first and the last occurrence of the substring last
Parameters: - string (str) – string
- first (str) – starting substring
- last (str) – ending substring
Returns: substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type: str
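The behaviour described for find_between can be sketched with `str.index` and `str.rindex`. This follows the description above literally (first occurrence of `first`, last occurrence of `last`); returning an empty string when a delimiter is missing is an assumption made here, and the function name `substring_between` is hypothetical.

```python
def substring_between(string, first, last):
    """Return the text after the first `first` and before the last `last`."""
    try:
        start = string.index(first) + len(first)
        end = string.rindex(last, start)  # search only after the opening delimiter
    except ValueError:
        return ''  # assumption: a missing delimiter yields an empty string
    return string[start:end]


print(substring_between('psi-mi:MI:0915(physical association)', '(', ')'))
# physical association
print(substring_between('a(b(c)d)e', '(', ')'))
# b(c)d
```

Because the closing delimiter is matched from the right, nested parentheses are kept inside the result.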
-
find_between_psi_mi_parentheses
(string)[source]¶ Find the text between parentheses in values of psi-mi key-value pairs
Parameters: string (str) – string
Returns: substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type: str
-
find_protein_gene
(interactor, alias)[source]¶ Parse the protein and gene identifiers from key-value pairs of interactors and their aliases
Parameters: - interactor (str) – key-value pairs of interactor
- alias (str) – key-value pairs of the alias of the interactor
Returns: - protein identifier
- gene identifier
Return type: str, str
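To make the key-value parsing concrete, here is a sketch that assumes PSI-MITAB-style fields: `|`-separated `database:value` pairs, with gene names marked by a trailing `(gene name)`. Both the field layout and the function `parse_protein_gene` are assumptions for illustration; the documented method may apply different rules.

```python
GENE_SUFFIX = '(gene name)'


def parse_protein_gene(interactor, alias):
    """Return (protein id, gene name), or None for whichever is not found."""
    protein = None
    for pair in interactor.split('|'):
        database, _, value = pair.partition(':')
        if database == 'uniprotkb':
            protein = value
            break

    gene = None
    for pair in alias.split('|'):
        if pair.endswith(GENE_SUFFIX):
            # Drop the database prefix and the '(gene name)' annotation
            gene = pair.partition(':')[2][:-len(GENE_SUFFIX)]
            break
    return protein, gene


print(parse_protein_gene(
    'uniprotkb:P49418',
    'psi-mi:amph_human(display_long)|uniprotkb:AMPH(gene name)',
))
# ('P49418', 'AMPH')
```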
-
find_pubmed_id
(string)[source]¶ Parse PubMed identifier from annotated key-value pair of publication type-identifier
Parameters: string (str) – key-value pair of publication type-identifier
Returns: PubMed identifier
Return type: str
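As a rough illustration of this kind of parsing, the sketch below pulls the numeric identifier out of a `pubmed:<digits>` pair with a regular expression. The input format and the name `parse_pubmed_id` are assumptions; returning None when no PubMed pair is present is also a choice made for this example.

```python
import re

PUBMED_PATTERN = re.compile(r'pubmed:(\d+)')


def parse_pubmed_id(string):
    """Return the PubMed identifier as a string, or None if absent."""
    match = PUBMED_PATTERN.search(string)
    return match.group(1) if match else None


print(parse_pubmed_id('imex:IM-12345|pubmed:10542231'))  # 10542231
print(parse_pubmed_id('imex:IM-12345'))                  # None
```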
-
get_paths_to_backup
(download=False)[source]¶ Get a list of the files to backup/unpack
Parameters: download (bool, optional) – if True, prepare the files for uploading
Returns: list of paths to backup
Return type: list of str
-
-
class
datanator.data_source.intact.
ProteinComplex
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents protein complexes from the IntAct database
-
desc
[source]
-
evidence
[source]
-
go_annot
[source]
-
identifier
[source]
-
name
[source]
-
ncbi
[source]
-
source
[source]
-
subunits
[source]
-
4.1.1.5.12. datanator.data_source.intact_nosql module¶
Downloads and parses the IntAct database of protein-protein interactions
-
class
datanator.data_source.intact_nosql.
IntActNoSQL
(cache_dirname=None, MongoDB=None, db=None, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
A local MongoDB copy of the IntAct database
-
find_between
(string, first, last)[source]¶ Get the substring between the first occurrence of the substring first and the last occurrence of the substring last
Parameters: - string (str) – string
- first (str) – starting substring
- last (str) – ending substring
Returns: substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type: str
-
find_between_psi_mi_parentheses
(string)[source]¶ Find the text between parentheses in values of psi-mi key-value pairs
Parameters: string (str) – string
Returns: substring between the first occurrence of the substring first and the last occurrence of the substring last
Return type: str
-
find_protein_gene
(interactor, alias)[source]¶ Parse the protein and gene identifiers from key-value pairs of interactors and their aliases
Parameters: - interactor (str) – key-value pairs of interactor
- alias (str) – key-value pairs of the alias of the interactor
Returns: - protein identifier
- gene identifier
Return type: str, str
-
find_pubmed_id
(string)[source]¶ Parse PubMed identifier from annotated key-value pair of publication type-identifier
Parameters: string (str) – key-value pair of publication type-identifier
Returns: PubMed identifier
Return type: str
-
4.1.1.5.13. datanator.data_source.jaspar module¶
This module downloads the JASPAR database of transcription factor binding motifs (http://jaspar.genereg.net/) via a series of text files, parses them, and stores them in an SQLite database.
Author: | Saahith Pochiraju <saahith116@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-08-01 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.jaspar.
Annotation
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.jaspar.
Data
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.jaspar.
Jaspar
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local SQLite copy of the JASPAR database of transcription factor binding profiles
-
class
datanator.data_source.jaspar.
Matrix
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.jaspar.
Protein
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.jaspar.
Species
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.jaspar.
Taxon
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
4.1.1.5.14. datanator.data_source.kegg module¶
-
class
datanator.data_source.kegg.
Kegg
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the KEGG Ontology
4.1.1.5.15. datanator.data_source.kegg_orthology module¶
-
class
datanator.data_source.kegg_orthology.
KeggOrthology
(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶
4.1.1.5.16. datanator.data_source.kegg_reaction_class module¶
-
class
datanator.data_source.kegg_reaction_class.
KeggReaction
(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None)[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
parse_rc_multiline
(lines)[source]¶
Input:
DEFINITION C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y
N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y
… …
O1a-O2x:*-C1z:C1b-C1x
Output:
[C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y, N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y, …]
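The input/output pair above suggests that a multi-line field is flattened into a list after its leading keyword is dropped. A minimal sketch under that assumption, and under the further assumption that individual entries contain no whitespace; `flatten_multiline_field` is a hypothetical name, not the library's implementation.

```python
def flatten_multiline_field(lines):
    """Collect whitespace-separated entries, skipping the keyword on line one."""
    entries = []
    for index, line in enumerate(lines):
        tokens = line.split()
        if index == 0:
            tokens = tokens[1:]  # drop the field keyword, e.g. 'DEFINITION'
        entries.extend(tokens)
    return entries


lines = [
    'DEFINITION  C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y',
    '            N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y',
]
print(flatten_multiline_field(lines))
```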
-
parse_rc_orthology
(lines)[source]¶
Input:
ORTHOLOGY K00260 glutamate dehydrogenase [EC:1.4.1.2]
K00261 glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]
K00262 glutamate dehydrogenase (NADP+) [EC:1.4.1.4]
K00263 leucine dehydrogenase [EC:1.4.1.9]
…
K13547 L-glutamine:2-deoxy-scyllo-inosose/3-amino-2,3-dideoxy-scyllo-inosose aminotransferase [EC:2.6.1.100 2.6.1.101]
..
Output:
[K00260, K00261, …]
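Extracting the orthology identifiers from lines like those shown above can be sketched with a regular expression that matches `K` followed by five digits. Taking only the first such match per line is an assumption made here, as is the function name `extract_orthology_ids`.

```python
import re

KO_PATTERN = re.compile(r'\bK\d{5}\b')


def extract_orthology_ids(lines):
    """Return the first K-number found on each line, in order."""
    ids = []
    for line in lines:
        match = KO_PATTERN.search(line)
        if match:
            ids.append(match.group(0))
    return ids


lines = [
    'ORTHOLOGY   K00260  glutamate dehydrogenase [EC:1.4.1.2]',
    '            K00261  glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]',
]
print(extract_orthology_ids(lines))  # ['K00260', 'K00261']
```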
-
4.1.1.5.17. datanator.data_source.metabolite_nosql module¶
Author: | Zhouyang Lian <zhouyang.lian@familian.life> |
---|---|
Author: | Jonathan <jonrkarr@gmail.com> |
Date: | 2019-04-02 |
Copyright: | 2019, Karr Lab |
License: | MIT |
-
class
datanator.data_source.metabolite_nosql.
MetaboliteNoSQL
(output_directory, source, MongoDB, db, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
Loads metabolite information into MongoDB and outputs documents as JSON files for each metabolite
Attributes:
- source: source database, e.g. ‘ecmdb’ or ‘ymdb’
- MongoDB: MongoDB server address, e.g. ‘mongodb://localhost:27017/’
- max_entries: maximum number of documents to be processed
- output_directory: directory in which JSON files will be stored
4.1.1.5.18. datanator.data_source.metabolites_meta_collection module¶
-
class
datanator.data_source.metabolites_meta_collection.
MetabolitesMeta
(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', meta_loc=None)[source]¶ Bases:
datanator.core.query_nosql.QuerySabio
meta_loc: database location to save the meta collection
-
fill_metabolite_fields
(fields=None, collection_src=None, collection_des=None)[source]¶ Fill in values of fields of interest from a metabolite collection: ecmdb or ymdb
Args:
- fields: list of fields of interest
- collection_src: collection in which the query will be done
- collection_des: collection in which the result will be updated
-
4.1.1.5.19. datanator.data_source.pax module¶
This codebase takes the txt files of the PaxDB protein abundance database and inserts them into an SQL database
define_tables.py - defines the python classes corresponding to the tables in the resulting SQL database
Author: | Balazs Szigeti <balazs.szigeti@mssm.edu> |
---|---|
Author: | Saahith Pochiraju <saahith116@gmail.com> |
Date: | 2017 June 3 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.pax.
Dataset
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a given dataset (typically results from a single paper)
-
ncbi_id
NCBI id - linked to the ‘taxon’ table
Type: int
-
coverage
[source]
-
file_name
[source]
-
publication
[source]
-
score
[source]
-
weight
[source]
-
-
class
datanator.data_source.pax.
Observation
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a protein
-
protein_id
PaxDB’s internal numerical protein ID
Type: int
-
abundance
[source]
-
dataset_id
[source]
-
-
class
datanator.data_source.pax.
Pax
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the Pax database
-
ENDPOINT_DOMAINS
= {'pax': 'https://pax-db.org/downloads/4.1/datasets/paxdb-abundance-files-v4.1.zip', 'pax_protein': 'http://pax-db.org/downloads/latest/paxdb-uniprot-links-v4.1.zip'}[source]¶
-
load_content
()[source]¶ Collects and parses all data from the PaxDB website and adds it to the SQLite DB
Parameters: req (requests object) – Requests session object
-
-
class
datanator.data_source.pax.
Protein
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a protein
-
protein_id
PaxDB’s internal numerical protein ID
Type: int
-
string_id
[source]
-
4.1.1.5.20. datanator.data_source.pax_nosql module¶
4.1.1.5.21. datanator.data_source.refseq module¶
import pprint from Bio import SeqIO import datetime import dateutil.parser import pkg_resources import sqlalchemy import sqlalchemy.ext.declarative import sqlalchemy.orm from datanator.core import data_source
-
class
datanator.data_source.refseq.
EcNumber
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
Gene
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
GeneSynonym
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
Identifier
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
Location
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
Qualifier
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
ReferenceGenome
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
ReferenceGenomeAccession
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
-
class
datanator.data_source.refseq.
Refseq
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=False, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
-
get_or_create_object
(cls, **kwargs)[source]¶ Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if there is no instance with those property-value pairs
Parameters: - cls (class) – type of object to find or create
- **kwargs – values of the properties of the object
Returns: instance of cls that has the property-value pairs described by kwargs
Return type: Base
-
4.1.1.5.22. datanator.data_source.sabio_rk module¶
Author: | Yosef Roth <yosefdroth@gmail.com> |
---|---|
Author: | Jonathan Karr <jonrkarr@gmail.com> |
Date: | 2017-05-04 |
Copyright: | 2017, Karr Lab |
License: | MIT |
-
class
datanator.data_source.sabio_rk.
Compartment
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a compartment in the SABIO-RK database
-
kinetic_laws
[source]¶ list of kinetic laws
Type: list of KineticLaw
-
kinetic_laws
[source]
-
-
class
datanator.data_source.sabio_rk.
Compound
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a compound in the SABIO-RK database
-
_is_name_ambiguous
[source]¶ if True, the currently stored compound name should not be trusted because multiple names for the same compound have been discovered. The consensus name must be obtained using download_compounds
Type: bool
-
structures
[source]¶ structures
Type: list of CompoundStructure
-
reaction_participants
[source]¶ list of reaction participants
Type: list of ReactionParticipant
-
get_inchi_structures
()[source]¶ Get InChI-formatted structures
Returns: list of structures in InChI format
Return type: list of str
-
get_smiles_structures
()[source]¶ Get SMILES-formatted structures
Returns: list of structures in SMILES format
Return type: list of str
-
parameters
[source]
-
reaction_participants
[source]
-
structures
[source]
-
-
class
datanator.data_source.sabio_rk.
CompoundStructure
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents the structure of a compound and its format
-
_value_inchi_formula_connectivity
[source]¶ empirical formula (without hydrogen) and connectivity InChI layers; used to quickly search for compound structures
Type: str
-
calc_inchi_formula_connectivity
()[source]¶ Calculate a searchable structure
InChI format
Core InChI format
- Formula layer (without hydrogen)
- Connectivity layer
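The two layers named above can be pulled out of a standard InChI string with plain string handling. This sketch assumes the usual layout (`InChI=1S/<formula>/c<connectivity>/h…`) and a particular output form (`formula/connectivity`); the function `formula_connectivity` is hypothetical and only approximates what the documented method may compute.

```python
import re


def formula_connectivity(inchi):
    """Return the hydrogen-free formula layer plus the connectivity layer."""
    layers = inchi.split('/')
    if len(layers) < 2:
        return ''
    # Remove hydrogen counts but keep elements such as Hg or He
    formula = re.sub(r'H(?![a-z])\d*', '', layers[1])
    # The connectivity layer, if present, is the one prefixed with 'c'
    connectivity = next((layer for layer in layers[2:] if layer.startswith('c')), None)
    if connectivity is None:
        return formula
    return formula + '/' + connectivity


print(formula_connectivity('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'))  # C2O/c1-2-3
print(formula_connectivity('InChI=1S/H2O/h1H2'))                  # O
```

Dropping hydrogens and the later layers makes structures that differ only in protonation state compare equal, which is what makes the value useful as a quick search key.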
-
compounds
[source]
-
format
[source]
-
value
[source]
-
-
class
datanator.data_source.sabio_rk.
Entry
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a compartment in the SABIO-RK database
-
created
[source]
-
cross_references
[source]
-
id
[source]
-
name
[source]
-
synonyms
[source]
-
-
class
datanator.data_source.sabio_rk.
Enzyme
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents an enzyme in the SABIO-RK database
-
subunits
[source]¶ list of subunits
Type: list of EnzymeSubunit
-
kinetic_laws
[source]¶ list of kinetic laws
Type: list of KineticLaw
-
kinetic_laws
[source]
-
molecular_weight
[source]
-
parameters
[source]
-
subunits
[source]
-
-
class
datanator.data_source.sabio_rk.
EnzymeSubunit
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents an enzyme subunit in the SABIO-RK database
-
coefficient
[source]
-
enzyme
[source]
-
molecular_weight
[source]
-
sequence
[source]
-
-
class
datanator.data_source.sabio_rk.
KineticLaw
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a kinetic law in the SABIO-RK database
-
reactants
[source]¶ list of reactants
Type: list of ReactionParticipant
-
products
[source]¶ list of products
Type: list of ReactionParticipant
-
enzyme_compartment
[source]¶ compartment
Type: Compartment
-
modifiers
[source]¶ list of modifiers
Type: list of ReactionParticipant
-
enzyme
[source]
-
enzyme_compartment
[source]
-
enzyme_type
[source]
-
equation
[source]
-
mechanism
[source]
-
media
[source]
-
modifiers
[source]
-
parameters
[source]
-
ph
[source]
-
products
[source]
-
reactants
[source]
-
references
[source]
-
taxon
[source]
-
taxon_variant
[source]
-
taxon_wildtype
[source]
-
temperature
[source]
-
tissue
[source]
-
-
class
datanator.data_source.sabio_rk.
Parameter
(**kwargs)[source]¶ Bases:
datanator.data_source.sabio_rk.Entry
Represents a parameter in the SABIO-RK database
-
kinetic_law
[source]¶ kinetic law
Type: KineticLaw
-
compartment
[source]¶ compartment
Type: Compartment
-
TYPES
dictionary of SBO terms and their canonical string symbols
Type: dict of int: str
-
UNITS
dictionary of SBO terms and their canonical units
Type: dict of int: str
-
compartment
[source]
-
compound
[source]
-
enzyme
[source]
-
error
[source]
-
kinetic_law
[source]
-
observed_error
[source]
-
observed_name
[source]
-
observed_type
[source]
-
observed_units
[source]
-
observed_value
[source]
-
type
[source]
-
units
[source]
-
value
[source]
-
-
class
datanator.data_source.sabio_rk.
ReactionParticipant
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents a participant in a SABIO-RK reaction
-
compartment
[source]¶ compartment
Type: Compartment
-
reactant_kinetic_law
[source]¶ kinetic law in which the participant appears as a reactant
Type: KineticLaw
-
product_kinetic_law
[source]¶ kinetic law in which the participant appears as a product
Type: KineticLaw
-
coefficient
[source]
-
compartment
[source]
-
compound
[source]
-
product_kinetic_law
[source]
-
reactant_kinetic_law
[source]
-
type
[source]
-
-
class
datanator.data_source.sabio_rk.
Resource
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Represents an external resource
-
kinetic_laws
[source]¶ kinetic laws
Type: list of KineticLaw
-
entries
[source]
-
id
[source]
-
kinetic_laws
[source]
-
namespace
[source]
-
-
class
datanator.data_source.sabio_rk.
SabioRk
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, webservice_batch_size=1, excel_batch_size=100, quilt_owner=None, quilt_package=None)[source]¶ Bases:
datanator.core.data_source.HttpDataSource
A local sqlite copy of the SABIO-RK database
-
webservice_batch_size
[source]¶ default size of batches to download kinetic information from the SABIO webservice. Note: this should be set to one because SABIO exports units incorrectly when multiple kinetic laws are requested
Type: int
-
excel_batch_size
[source]¶ default size of batches to download kinetic information from the SABIO Excel download service
Type: int
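Both batch sizes describe how a list of kinetic law IDs is split into groups before each download request. A minimal sketch of that splitting is shown below; `iter_batches` is a hypothetical helper name, not part of the documented API.

```python
def iter_batches(ids, batch_size):
    """Yield consecutive slices of `ids` with at most `batch_size` items each.

    Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# With a batch size of 1 (the documented webservice default),
# every kinetic law is requested on its own.
single = list(iter_batches([10, 11, 12], 1))
pairs = list(iter_batches([10, 11, 12, 13, 14], 2))
```

With a batch size of one, each request carries a single ID, which matches the note above about units being exported incorrectly when several kinetic laws are requested together.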
-
ENDPOINT_KINETIC_LAWS_SEARCH
[source]¶ URL to obtain a list of the ids of all of the kinetic laws in SABIO-RK
Type: str
-
SKIP_KINETIC_LAW_IDS
[source]¶ IDs of kinetic laws that should be skipped (because they contain errors and can’t be downloaded from SABIO)
Type: tuple of int
-
PUBCHEM_TRY_DELAY
[source]¶ delay in seconds between PubChem queries (to avoid overloading the server)
Type: float
-
ENDPOINT_COMPOUNDS_PAGE
= 'http://sabiork.h-its.org/compdetails.jsp'[source]
-
ENDPOINT_DOMAINS
= {'sabio_rk': 'http://sabiork.h-its.org', 'uniprot': 'http://www.uniprot.org'}[source]¶
-
ENDPOINT_EXCEL_EXPORT
= 'http://sabiork.h-its.org/entry/exportToExcelCustomizable'[source]
-
ENDPOINT_KINETIC_LAWS_SEARCH
= 'http://sabiork.h-its.org/sabioRestWebServices/searchKineticLaws/entryIDs'[source]
-
ENDPOINT_WEBSERVICE
= 'http://sabiork.h-its.org/sabioRestWebServices/kineticLaws'[source]
-
PUBCHEM_MAX_TRIES
= 10[source]
-
PUBCHEM_TRY_DELAY
= 0.25[source]
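Constants such as PUBCHEM_MAX_TRIES and PUBCHEM_TRY_DELAY usually drive a retry loop. The sketch below shows one way such a loop could look; `query_with_retries` and its `query` callable are hypothetical, and the actual behaviour in datanator may differ.

```python
import time

PUBCHEM_MAX_TRIES = 10    # value documented above
PUBCHEM_TRY_DELAY = 0.25  # seconds, value documented above


def query_with_retries(query, max_tries=PUBCHEM_MAX_TRIES, delay=PUBCHEM_TRY_DELAY):
    """Call `query()` until it succeeds or `max_tries` attempts have failed.

    Hypothetical helper: `query` is any zero-argument callable that raises
    an exception on failure.
    """
    if max_tries < 1:
        raise ValueError('max_tries must be at least 1')

    for i_try in range(max_tries):
        try:
            return query()
        except Exception:
            if i_try == max_tries - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)  # pause before the next attempt
```

Sleeping between attempts keeps the request rate low; with the documented values, ten failed attempts would add roughly 2.25 seconds of waiting in total.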
-
SKIP_KINETIC_LAW_IDS
= (51286,)[source]
-
calc_enzyme_molecular_weights
(enzymes)[source]¶ Calculate the molecular weight of each enzyme
Parameters: enzymes (list of Enzyme) – list of enzymes
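One plausible way to compute an enzyme's molecular weight is to sum the weights of its subunits, each multiplied by its coefficient. The sketch below assumes exactly that rule and uses stand-in `Enzyme` and `Subunit` dataclasses rather than the SQLAlchemy models documented on this page; the real method may handle missing data differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subunit:  # stand-in for EnzymeSubunit (assumption)
    coefficient: Optional[int]
    molecular_weight: Optional[float]


@dataclass
class Enzyme:  # stand-in for Enzyme (assumption)
    subunits: List[Subunit] = field(default_factory=list)
    molecular_weight: Optional[float] = None


def calc_molecular_weights(enzymes):
    """Set each enzyme's weight to sum(coefficient * subunit weight).

    The weight is left as None when the enzyme has no subunits or when any
    coefficient or subunit weight is unknown.
    """
    for enzyme in enzymes:
        known = [s for s in enzyme.subunits
                 if s.coefficient is not None and s.molecular_weight is not None]
        if enzyme.subunits and len(known) == len(enzyme.subunits):
            enzyme.molecular_weight = sum(s.coefficient * s.molecular_weight for s in known)
        else:
            enzyme.molecular_weight = None


complete = Enzyme(subunits=[Subunit(2, 50000.0), Subunit(1, 30000.0)])
incomplete = Enzyme(subunits=[Subunit(2, None)])
calc_molecular_weights([complete, incomplete])
```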
-
calc_stats
()[source]¶ Calculate statistics about SABIO-RK
Returns: list of list of statistics
Return type: list of list of obj
-
create_compartment_from_sbml
(sbml)[source]¶ Add a compartment to the local sqlite database
Parameters: sbml (libsbml.Compartment) – SBML-representation of a compartment
Returns: compartment
Return type: Compartment
-
create_cross_references_from_sbml
(sbml)[source]¶ Add cross references to the local sqlite database for an SBML object
Parameters: sbml (libsbml.SBase) – object in an SBML document
Returns: list of resources
Return type: list of Resource
-
create_kinetic_law_from_sbml
(id, sbml, specie_properties, functions, units)[source]¶ Add a kinetic law to the local sqlite database
Parameters:
- id (int) – identifier
- sbml (libsbml.KineticLaw) – SBML-representation of a reaction
- specie_properties (dict) – additional properties of the compounds/enzymes
  - is_wildtype (bool): indicates if the enzyme is wildtype or mutant
  - variant (str): description of the variant of the enzyme
  - modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
- functions (dict of str: str) – dictionary of rate law equations (keys = IDs in SBML, values = equations)
- units (dict of str: str) – dictionary of units (keys = IDs in SBML, values = names)
Returns: kinetic law
Return type: KineticLaw
Raises: ValueError – if the temperature is expressed in an unsupported unit
-
create_kinetic_laws_from_sbml
(ids, sbml)[source]¶ Add kinetic laws defined in an SBML file to the local sqlite database
Parameters:
- ids (list of int) – list of kinetic law IDs
- sbml (str) – SBML representation of one or more kinetic laws
Returns:
- list of KineticLaw: list of kinetic laws
- list of Compound or Enzyme: list of species (compounds or enzymes)
- list of Compartment: list of compartments
Return type: tuple
-
create_specie_from_sbml
(sbml)[source]¶ Add a species to the local sqlite database
Parameters: sbml (libsbml.Species) – SBML-representation of a compound or enzyme
Return type: tuple
Raises: ValueError – if a species is of an unsupported type (i.e. not a compound or enzyme)
-
export_stats
(stats, filename=None)[source]¶ Export statistics to an Excel workbook
Parameters:
- stats (list of list of obj) – list of list of statistics
- filename (str, optional) – path to export statistics
-
get_parameter_by_properties
(kinetic_law, parameter_properties)[source]¶ Get the parameter of kinetic_law whose attribute values are equal to those of parameter_properties
Parameters:
- kinetic_law (KineticLaw) – kinetic law to find parameter of
- parameter_properties (dict) – properties of parameter to find
Returns: parameter with attribute values equal to values of parameter_properties
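Matching an object against a dictionary of property values can be sketched as follows. `find_by_properties` is a hypothetical stand-alone helper that works on any objects with attributes; it is not the method documented above, which operates on a KineticLaw.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so a missing attribute never equals a real value


def find_by_properties(candidates, properties):
    """Return the first candidate whose attributes equal every value in
    `properties`, or None if no candidate matches.

    Hypothetical helper for illustration only.
    """
    for candidate in candidates:
        if all(getattr(candidate, name, _MISSING) == value
               for name, value in properties.items()):
            return candidate
    return None


# Stand-in parameter objects (assumed attribute names and values)
parameters = [
    SimpleNamespace(observed_name='Km', observed_units='mM', observed_value=0.5),
    SimpleNamespace(observed_name='kcat', observed_units='s^(-1)', observed_value=12.0),
]
match = find_by_properties(parameters, {'observed_name': 'kcat', 'observed_value': 12.0})
no_match = find_by_properties(parameters, {'observed_name': 'Ki'})
```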
-
get_specie_reference_from_sbml
(specie_id)[source]¶ Get the compound/enzyme associated with an SBML species by its ID
Parameters: specie_id (str) – ID of an SBML species
Returns:
- Compound or Enzyme: compound or enzyme
- Compartment: compartment
Return type: tuple
Raises: ValueError – if the species is not a compound or enzyme, no species with id = specie_id exists, or no compartment with name = compartment_name exists
-
infer_compound_structures_from_names
(compounds)[source]¶ Try to use PubChem to infer the structure of compounds from their names
Notes: we don’t try to look up structures from their cross references because SABIO has already gathered all structures from their cross references to ChEBI, KEGG, and PubChem
Parameters: compounds (list of Compound) – list of compounds
-
load_compounds
(compounds=None)[source]¶ Download information from SABIO-RK about all of the compounds stored in the local sqlite copy of SABIO-RK
Parameters: compounds (list of Compound) – list of compounds to download
Raises: Error – if an HTTP request fails
-
load_kinetic_law_ids
()[source]¶ Download the IDs of all of the kinetic laws stored in SABIO-RK
Returns: list of kinetic law IDs
Return type: list of int
Raises: Error – if an HTTP request fails or the expected number of kinetic laws is not returned
-
load_kinetic_laws
(ids)[source]¶ Download kinetic laws from SABIO-RK
Parameters: ids (list of int) – list of IDs of kinetic laws to download
Raises: Error – if an HTTP request fails
-
load_missing_enzyme_information_from_html
(ids)[source]¶ Load enzyme subunit information from HTML
Parameters: ids (list of int) – list of IDs of kinetic laws to download
-
load_missing_kinetic_law_information_from_tsv
(ids)[source]¶ Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.
Parameters: ids (list of int) – list of IDs of kinetic laws to download
-
load_missing_kinetic_law_information_from_tsv_helper
(tsv)[source]¶ Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.
Note: this method is necessary because neither SABIO’s SBML export nor its Excel export provides all of SABIO’s content.
Parameters: tsv (str) – TSV-formatted table
Raises: ValueError – if a kinetic law or compartment is not contained in the local sqlite database
-
normalize_kinetic_laws
(ids)[source]¶ Normalize parameter values
Parameters: ids (list of int) – list of IDs of kinetic laws to normalize
-
normalize_parameter_value
(name, type, value, error, units, enzyme_molecular_weight)[source]¶
Parameters:
- name (str) – parameter name
- type (int) – parameter type (SBO term id)
- value (float) – observed value
- error (float) – observed error
- units (str) – observed units
- enzyme_molecular_weight (float) – enzyme molecular weight
Returns: normalized name and its type (SBO term), value, error, and units
Return type: tuple of str, int, float, float, str
Raises: ValueError – if units is not a supported unit of type
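Normalizing an observed value typically means rescaling it, together with its error, into one canonical unit and rejecting units that are not recognized. The sketch below illustrates that idea for concentrations only. The conversion table, the function name `normalize_concentration`, and the choice of molar (M) as the canonical unit are all assumptions for illustration; they do not describe the units that SabioRk actually supports.

```python
import math

# Hypothetical conversion table: observed unit -> factor to molar (M)
CONCENTRATION_FACTORS = {
    'M': 1.0,
    'mM': 1e-3,
    'uM': 1e-6,
    'nM': 1e-9,
}


def normalize_concentration(value, error, units):
    """Convert an observed concentration and its error to molar (M).

    Raises ValueError for units that are not in the table, mirroring the
    documented behaviour of raising on unsupported units.
    """
    try:
        factor = CONCENTRATION_FACTORS[units]
    except KeyError:
        raise ValueError('unsupported units: {!r}'.format(units))

    normalized_error = None if error is None else error * factor
    return value * factor, normalized_error, 'M'


value, error, units = normalize_concentration(2.0, 0.5, 'mM')
```

Scaling the error by the same factor as the value keeps the relative uncertainty unchanged.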
-
parse_complex_subunit_structure
(text)[source]¶ Parse the subunit structure of complex into a dictionary of subunit coefficients
Parameters: text (str) – subunit structure described with nested parentheses
Returns: dictionary of subunit coefficients
Return type: dict of str, int
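A nested-parentheses structure can be turned into subunit coefficients with a small recursive parser. The sketch below assumes a grammar in which each subunit or parenthesized group may be followed by `*N`, for example `((P1)*2(P2))*3`. That grammar and the name `parse_subunit_structure` are assumptions; the exact text format that SABIO-RK uses is not specified here.

```python
import re
from collections import Counter


def parse_subunit_structure(text):
    """Parse text such as "((P1)*2(P2))*3" into {'P1': 6, 'P2': 3}.

    Hypothetical parser for an assumed grammar; coefficients of nested
    groups are multiplied together.
    """
    tokens = re.findall(r'\(|\)|\*\d+|[A-Za-z0-9]+', text)
    pos = 0

    def parse_group():
        nonlocal pos
        counts = Counter()
        while pos < len(tokens) and tokens[pos] != ')':
            token = tokens[pos]
            if token == '(':
                pos += 1
                inner = parse_group()
                if pos >= len(tokens) or tokens[pos] != ')':
                    raise ValueError('unbalanced parentheses')
                pos += 1  # consume ")"
            elif token.startswith('*'):
                raise ValueError('coefficient without a subunit or group')
            else:
                inner = Counter({token: 1})
                pos += 1

            # Optional multiplier that applies to the item just parsed
            factor = 1
            if pos < len(tokens) and tokens[pos].startswith('*'):
                factor = int(tokens[pos][1:])
                pos += 1

            for name, count in inner.items():
                counts[name] += count * factor
        return counts

    result = parse_group()
    if pos != len(tokens):
        raise ValueError('unbalanced parentheses')
    return dict(result)
```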
-
parse_enzyme_name
(sbml)[source]¶ Parse the name of an enzyme in SBML for the enzyme name, wild type status, and variant description that it contains.
Parameters: sbml (str) – enzyme name in SBML
Returns:
- str: name
- bool: if True, the enzyme is wild type
- str: variant
Return type: tuple
Raises: ValueError – if the enzyme name is formatted in an unsupported format
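Splitting an enzyme label into these three parts can be sketched with a regular expression. The sketch assumes labels shaped like `alcohol dehydrogenase(Enzyme) wildtype` or `E1(Enzyme) mutant, K12A`. That label format and the function name `split_enzyme_name` are assumptions made for illustration, not a specification of SABIO-RK's SBML output.

```python
import re

# Assumed label format: "<name>(Enzyme) <wildtype|mutant>[,] <variant>"
ENZYME_NAME_PATTERN = re.compile(r'^(.*?)\(Enzyme\) (wildtype|mutant),?(.*)$', re.IGNORECASE)


def split_enzyme_name(label):
    """Return (name, is_wildtype, variant) for an enzyme label.

    Raises ValueError when the label does not follow the assumed format.
    """
    match = ENZYME_NAME_PATTERN.match(label)
    if match is None:
        raise ValueError('unsupported enzyme name format: {!r}'.format(label))

    name = match.group(1).strip()
    is_wildtype = match.group(2).lower() == 'wildtype'
    variant = match.group(3).strip()
    return name, is_wildtype, variant
```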
-
4.1.1.5.23. datanator.data_source.sabio_rk_nosql module¶
Parse SabioRk JSON files into MongoDB documents (JSON files acquired by running sqlite_to_json.py)
Author: Zhouyang Lian <zhouyang.lian@familian.life>
Author: Jonathan <jonrkarr@gmail.com>
Date: 2019-04-02
Copyright: 2019, Karr Lab
License: MIT
4.1.1.5.24. datanator.data_source.sqlite_to_json module¶
Converts tables in SQLite into JSON files
-
database
path to sqlite database
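Converting a SQLite table to JSON needs only the Python standard library. The sketch below is not the module's code: `table_to_json` is a hypothetical function, and the in-memory database with a `compound` table is made-up example data.

```python
import json
import sqlite3


def table_to_json(connection, table):
    """Return the rows of `table` as a JSON array of objects.

    Hypothetical helper. Table names cannot be bound as SQL parameters,
    so the name is checked against sqlite_master before it is used.
    """
    names = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    if table not in names:
        raise ValueError('unknown table: {!r}'.format(table))

    cursor = connection.execute('SELECT * FROM "{}"'.format(table))
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor]
    return json.dumps(rows)


# Example with an in-memory database (made-up data)
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE compound (id INTEGER, name TEXT)')
connection.execute("INSERT INTO compound VALUES (1, 'ATP')")
exported = table_to_json(connection, 'compound')
```

Each row becomes one JSON object keyed by column name, a shape that document stores such as MongoDB can ingest directly.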
4.1.1.5.25. datanator.data_source.taxon_tree module¶
-
class
datanator.data_source.taxon_tree.
TaxonTree
(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶ Bases:
datanator.util.mongo_util.MongoUtil
-
parse_fullname_line
(line)[source]¶ Parses a line in the file fullnamelineage.dmp and returns its elements in a list
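A line of an NCBI taxonomy dump file can be split into its fields as sketched below. The sketch assumes the usual dump layout, in which fields are separated by a tab, a pipe, and a tab, and each line ends with a tab and a pipe. `split_dmp_line` is a hypothetical name, and the real method may post-process the fields further.

```python
def split_dmp_line(line):
    """Split one line of a taxonomy .dmp file into a list of stripped fields.

    Assumes fields are separated by "\\t|\\t" and the line ends with "\\t|".
    """
    line = line.rstrip('\n')
    if line.endswith('\t|'):
        line = line[:-2]  # drop the row terminator
    return [field.strip() for field in line.split('\t|\t')]


root = split_dmp_line('1\t|\troot\t|\t\t|\n')
bacteria = split_dmp_line('2\t|\tBacteria\t|\tcellular organisms; \t|\n')
```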
-
parse_fullname_taxid
()[source]¶ Parse fullnamelineage.dmp and taxidlineage.dmp and store them in MongoDB. Always run this first, before loading anything else (insert_one)
-
4.1.1.5.26. datanator.data_source.uniprot module¶
Downloads and parses the UniProt database for protein-protein interactions
Author: Saahith Pochiraju <saahith116@gmail.com>
Author: Jonathan Karr <jonrkarr@gmail.com>
Date: 2018-08-15
Copyright: 2017-2018, Karr Lab
License: MIT
-
class
datanator.data_source.uniprot.
Uniprot
(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶
4.1.1.5.27. datanator.data_source.uniprot_nosql module¶
Author: Zhouyang Lian <zhouyang.lian@familian.life>
Author: Jonathan <jonrkarr@gmail.com>
Date: 2019-04-02
Copyright: 2019, Karr Lab
License: MIT