4.1.1.5. datanator.data_source package¶

4.1.1.5.1. Subpackages¶

4.1.1.5.2. Submodules¶

4.1.1.5.3. datanator.data_source.array_express module¶

Downloads and parses the ArrayExpress database

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-08-16
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.array_express.ArrayExpress(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ArrayExpress database

EXCLUDED_DATASET_IDS

list of IDs of datasets to exclude

Type:	`list` of `str`

ENDPOINT_DOMAINS = {'array_express': 'https://www.ebi.ac.uk/arrayexpress/json/v3/experiments'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

get_or_create_object(cls, **kwargs)[source]¶

Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if no instance has those property-value pairs

Parameters:

cls (`class`) – type of object to find or create
**kwargs – values of the properties of the object

Returns:	instance of `cls` that has the property-value pairs described by kwargs
Return type:	`Base`
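The get-or-create behaviour described above can be sketched as follows. This is a minimal illustration, not datanator's implementation: `Organism` and `InMemorySource` are hypothetical stand-ins, and a plain list takes the place of the SQLAlchemy session so that the example runs without a database.

```python
class Organism:
    """Hypothetical stand-in for a mapped model class."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class InMemorySource:
    """Hypothetical stand-in for a data source; a list replaces the database session."""

    _MISSING = object()  # sentinel so a missing attribute never equals a requested value

    def __init__(self):
        self.objects = []

    def get_or_create_object(self, cls, **kwargs):
        # Return the first stored instance of cls whose attributes match every kwarg.
        for obj in self.objects:
            if type(obj) is cls and all(
                getattr(obj, key, self._MISSING) == value
                for key, value in kwargs.items()
            ):
                return obj
        # No match: create, store and return a new instance.
        obj = cls(**kwargs)
        self.objects.append(obj)
        return obj


source = InMemorySource()
mouse = source.get_or_create_object(Organism, name='Mus musculus')
same_mouse = source.get_or_create_object(Organism, name='Mus musculus')
human = source.get_or_create_object(Organism, name='Homo sapiens')
print(mouse is same_mouse, mouse is human, len(source.objects))  # True False 2
```

Repeated calls with the same property-value pairs return the same object, which is what keeps shared records such as organisms from being duplicated while experiments are loaded.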

load_content(test_url='')[source]¶

Downloads all metadata from ArrayExpress on their samples and experiments. The metadata is saved as text files. Within the text files, the data is stored as a JSON object.

Parameters:

start_year (`int`, optional) – the first year to retrieve experiments for
end_year (`int`, optional) – the last year to retrieve experiments for

load_experiment_metadata(test_url='')[source]¶

Get a list of accession identifiers for the experiments from the year start_year to the year end_year

Parameters:

start_year (`int`, optional) – the first year to retrieve experiment accession ids for
end_year (`int`, optional) – the last year to retrieve experiment accession ids for

Returns:	list of experiment accession identifiers
Return type:	`list` of `str`

load_experiment_protocol(experiment, protocol_json)[source]¶

Load a protocol for an experiment

Parameters:

experiment (`Experiment`) – experiment
protocol_json (`dict`) – protocol

load_experiment_protocols(experiment)[source]¶

Load the protocols for an experiment

Parameters:	experiment (`Experiment`) – experiment

load_experiment_sample(experiment, sample_json, index)[source]¶

Load a sample for an experiment

Parameters:

experiment (`Experiment`) – experiment
sample_json (`dict`) – sample
index (`int`) – index of the sample within the experiment

load_experiment_samples(experiment)[source]¶

Load the samples for an experiment

Parameters:	experiment (`Experiment`) – experiment

class datanator.data_source.array_express.Characteristic(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental characteristic

_id

unique id

Type:	`int`

category[source]¶

name of the characteristic (e.g. organism)

Type:	`str`

value[source]¶

value of characteristic (e.g. Mus musculus)

Type:	`str`

samples[source]¶

samples

Type:	`list` of `Sample`

category[source]

samples[source]

value[source]

class datanator.data_source.array_express.DataFormat(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a data format

_id

unique id

Type:	`int`

name[source]¶

name

Type:	`str`

bio_assay_data_cubes[source]¶

number of dimensions to the data

Type:	`int`

bio_assay_data_cubes[source]

experiments[source]¶

name[source]

class datanator.data_source.array_express.EnsemblInfo(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:	`int`

organism_strain[source]¶

the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)

Type:	`str`

url[source]¶

the download url for the CDNA file from ensembl

Type:	`str`

organism_strain[source]

ref_genome[source]¶

samples[source]¶

url[source]

class datanator.data_source.array_express.Experiment(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experiment

_id

unique id

Type:	`int`

id[source]¶

unique string identifier assigned by ArrayExpress

Type:	`str`

name[source]¶

name

Type:	`str`

name_2[source]¶

second name

Type:	`str`

description[source]¶

description

Type:	`str`

organisms[source]¶

list of organisms

Type:	`list` of `Organism`

types[source]¶

list of experiment types

Type:	`list` of `ExperimentType`

designs[source]¶

list of experimental designs

Type:	`list` of `ExperimentDesign`

submission_date[source]¶

submission date

Type:	`datetime.date`

release_date[source]¶

release date

Type:	`datetime.date`

data_formats[source]¶

list of data formats

Type:	`list` of `DataFormat`

read_type[source]¶

type of FASTQ files (in an RNA-Seq experiment)

Type:	`str`

has_fastq_files[source]¶

whether this experiment has FASTQ files or not

Type:	`bool`

data_formats[source]

description[source]

designs[source]

has_fastq_files[source]

id[source]

name[source]

name_2[source]

organisms[source]

protocols[source]¶

read_type[source]

release_date[source]

samples[source]¶

submission_date[source]

types[source]

class datanator.data_source.array_express.ExperimentDesign(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental design

_id

unique id

Type:	`int`

name[source]¶

name

Type:	`str`

experiments[source]¶

name[source]

class datanator.data_source.array_express.ExperimentType(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a type of experiment

_id

unique id

Type:	`int`

name[source]¶

name

Type:	`str`

experiments[source]¶

name[source]

class datanator.data_source.array_express.Extract(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an extract of a sample

_id

unique id

Type:	`int`

name[source]¶

name

Type:	`str`

samples[source]¶

list of samples

Type:	`list` of `Sample`

name[source]

samples[source]

class datanator.data_source.array_express.Organism(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an organism

_id

unique id

Type:	`int`

name[source]¶

name

Type:	`str`

experiments[source]¶

name[source]

ncbi_id[source]¶

class datanator.data_source.array_express.Protocol(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protocol for an experiment

_id

unique id

Type:	`int`

protocol_accession[source]¶

ArrayExpress identifier for protocol

Type:	`str`

protocol_type[source]¶

the type of experimental protocol (e.g. normalization, extraction, etc.)

Type:	`list` of `Sample`

text[source]¶

description of the protocol

Type:	`str`

performer[source]¶

name of the person who did the experiment

Type:	`str`

hardware[source]¶

hardware (usually detection instruments) used in protocol

Type:	`str`

software[source]¶

software (usually for analyzing and normalizing the data)

Type:	`str`

experiments[source]¶

list of experiments that performed this protocol

Type:	`list` of `Experiment`

experiments[source]

hardware[source]

performer[source]

protocol_accession[source]

protocol_type[source]

software[source]

text[source]

class datanator.data_source.array_express.Sample(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a sample

_id

unique id

Type:	`int`

experiment_id[source]¶

the id of the experiment that the sample belongs to

Type:	`int`

experiment[source]¶

experiment that the sample belongs to

Type:	`Experiment`

index[source]¶

index of the sample within the experiment

Type:	`int`

name[source]¶

name of the source of the sample (this is used to identify the sample in ArrayExpress)

Type:	`str`

assay[source]¶

name of the assay

Type:	`str`

ensembl_organism_strain[source]¶

the particular strain that relates to the ensembl reference genome (e.g. escherichia_coli_k12)

Type:	`str`

characteristics

characteristics

Type:	`list` of `Characteristic`

variables[source]¶

variables

Type:	`list` of `Variable`

fastq_urls[source]¶

URLs of the FASTQ files

Type:	`list` of `Url`

read_type[source]¶

the nature of the FASTQ file reads. Either ‘single’, ‘multiple’, or ‘parallel’

Type:	`str`

ensembl_info

information about the Ensembl reference genome

Type:	`list` of `Variable`

full_strain_specificity[source]¶

whether or not the Ensembl reference genome matches the full strain specificity recorded in ArrayExpress

Type:	`bool`

assay[source]

characteristics[source]¶

ensembl_info[source]¶

ensembl_organism_strain[source]

experiment[source]

experiment_id[source]

extracts[source]¶

fastq_urls[source]

full_strain_specificity[source]

index[source]

name[source]

read_type[source]

variables[source]

class datanator.data_source.array_express.Url(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:	`int`

url[source]¶

the text of the url

Type:	`str`

samples[source]¶

samples

Type:	`list` of `Sample`

samples[source]

url[source]

class datanator.data_source.array_express.Variable(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an experimental variable

_id

unique id

Type:	`int`

name[source]¶

name of the variable (e.g. genotype)

Type:	`str`

value[source]¶

value of variable (e.g. control)

Type:	`str`

unit[source]¶

units of value (e.g. control). This field is not always filled.

Type:	`str`

samples[source]¶

samples

Type:	`list` of `Sample`

name[source]

samples[source]

unit[source]

value[source]

4.1.1.5.4. datanator.data_source.bio_portal module¶

Downloads ontologies from BioPortal

Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-05-23
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.bio_portal.BioPortal(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, flask=False, quilt_owner=None, quilt_package=None, ontologies=None)[source]¶

Bases: datanator.core.data_source.CachedDataSource

Loads ontologies from BioPortal

ontologies[source]¶

list of filenames of ontologies

Type:	`list`

BIOPORTAL_ENDPOINT[source]¶

URL pattern to download ontologies

Type:	`str`

CCO_DOWNLOAD_URL[source]¶

URL to download CCO ontology

Type:	`str`

BIOPORTAL_ENDPOINT = 'http://data.bioontology.org'[source]

CCO_DOWNLOAD_URL = 'http://www.bio.ntnu.no/ontology/CCO/cco.obo'[source]

DEFAULT_ONTOLOGIES = ('BTO.obo', 'CCO.obo', 'CL.owl', 'DOID.obo', 'EFO.owl', 'FMA.owl', 'GO.obo', 'PW.obo', 'SBO.obo')[source]¶

clear_content()[source]¶: Clear the content of the sqlite database (i.e. drop and recreate all tables).

download_ontologies()[source]¶

Parameters:	`list` of `str` – list of ontologies

download_ontology(id)[source]¶

Download an ontology from BioPortal

Parameters:	id (`str`) – identifier of the ontology in BioPortal

get_api_key()[source]¶

Get BioPortal API key

Returns:	key
Return type:	`str`

get_engine()[source]¶

Get an engine for the sqlite database. If the database doesn’t exist, initialize its structure.

Returns:	database engine
Return type:	`sqlalchemy.engine.Engine`

get_ontologies()[source]¶

Get list of ontologies

Returns:	list of ontologies
Return type:	`list`

get_ontologies_filename()[source]¶

Get the local filename to store a list of the ontologies

Returns:	filename
Return type:	`str`

get_ontology(id)[source]¶

Load an ontology, downloading it from BioPortal if necessary

Parameters:	id (`str`) – identifier of the ontology in BioPortal
Returns:	ontology
Return type:	`pronto.Ontology`

get_ontology_filename(id)[source]¶

Get the local filename to store a copy of an ontology

Parameters:	id (`str`) – identifier of the ontology in BioPortal
Returns:	filename
Return type:	`str`

get_paths_to_backup(download=False)[source]¶

Get a list of the files to backup/unpack

Parameters:	download (`bool`, optional) – if `True`, prepare the files for uploading
Returns:	list of paths to backup
Return type:	`list` of `str`

get_session()[source]¶

Get a session for the sqlite database

Returns:	database session
Return type:	`sqlalchemy.orm.session.Session`

load_content()[source]¶: Load the content of the local copy of the data source

quilt_package = None[source]¶: load content as necessary

4.1.1.5.5. datanator.data_source.corum module¶

This module takes the CORUM database of protein complexes and formats it as an SQL database

Author:	Balazs Szigeti <balazs.szigeti@mssm.edu>
Author:	Saahith Pochiraju <saahith116@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2018-08-13
Copyright:	2017-2018, Karr Lab
License:	MIT

class datanator.data_source.corum.Base(**kwargs)[source]¶

Bases: object

The most base type

metadata = MetaData(bind=None)[source]¶

class datanator.data_source.corum.Complex(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein complex

observation_id

ID of the observation

Type:	`int`

complex_id[source]¶

ID of the complex

Type:	`int`

complex_name[source]¶

Complex name

Type:	`str`

go_id[source]¶

GO functional annotation

Type:	`str`

go_dsc[source]¶

Description of the annotation

Type:	`str`

funcat_id[source]¶

FUNCAT functional annotation

Type:	`str`

funcat_dsc[source]¶

Description of the annotation

Type:	`str`

su_cmt[source]¶

Subunit comments

Type:	`str`

complex_cmt[source]¶

Complex comments

Type:	`str`

disease_cmt[source]¶

Disease comments

Type:	`str`

complex_cmt[source]

complex_id[source]

complex_name[source]

disease_cmt[source]

funcat_dsc[source]

funcat_id[source]

go_dsc[source]

go_id[source]

observation[source]¶

observation_id[source]¶

su_cmt[source]

subunits[source]¶

class datanator.data_source.corum.Corum(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the CORUM database

ENDPOINT_DOMAINS = {'corum': 'https://mips.helmholtz-muenchen.de/corum/download/allComplexes.txt.zip'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶: Collect and parse all data from CORUM website and add to SQLite database

class datanator.data_source.corum.Observation(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an observation (entries in the original DB)

id

internal ID for the observation entry

Type:	`int`

cell_line

cell line (in which the measurement was done)

Type:	`str`

pur_method[source]¶

purification method

Type:	`str`

pubmed_id[source]¶

Pubmed ID of the associated publication

Type:	`str`

taxon_ncbi_id[source]¶

NCBI taxonomy id of the organism

Type:	`str`

cell_line[source]¶

complex[source]¶

id[source]¶

pubmed_id[source]

pur_method[source]

taxon[source]¶

taxon_ncbi_id[source]

class datanator.data_source.corum.Subunit(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents subunits of complexes

id

Internal subunit ID

Type:	`int`

complex_id[source]¶

ID of the complex to which the subunit belongs

Type:	`int`

su_uniprot[source]¶

UNIPROT ID

Type:	`int`

su_entrezs[source]¶

ENTREZS ID

Type:	`int`

protein_name[source]¶

Name of the protein

Type:	`str`

gene_name[source]¶

Gene name

Type:	`str`

gene_syn[source]¶

Synonyms of the gene name

Type:	`str`

complex[source]¶

complex_id[source]

gene_name[source]

gene_syn[source]

id[source]¶

protein_name[source]

su_entrezs[source]

su_uniprot[source]

class datanator.data_source.corum.Taxon(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a species

ncbi_id

NCBI id

Type:	`int`

species_name[source]¶

name and possibly genetic variant

Type:	`str`

ncbi_id[source]¶

observation[source]¶

swissprot_id[source]¶

datanator.data_source.corum.correct_protein_name_list(lst)[source]¶

Correct a list of protein names with incorrect separators involving ‘[Cleaved into: …]’

Parameters:	lst (`str`) – list of protein names with incorrect separators
Returns:	corrected list of protein names
Return type:	`str`

datanator.data_source.corum.parse_list(str_lst)[source]¶

Parse a semicolon-separated list of strings into a list, ignoring semicolons that are inside square brackets

Parameters:	str_lst (`str`) – semicolon-separated encoding of a list
Returns:	list
Return type:	`list` of `str`
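The bracket-aware splitting described above can be sketched as follows. This is an illustrative re-implementation, not the module's code: the name `parse_semicolon_list` is hypothetical, and stripping whitespace around each item is an assumption.

```python
def parse_semicolon_list(text):
    """Split text on semicolons that are not enclosed in square brackets."""
    items = []
    current = []
    depth = 0  # number of currently open '[' brackets
    for char in text:
        if char == '[':
            depth += 1
        elif char == ']':
            depth = max(depth - 1, 0)
        if char == ';' and depth == 0:
            # A top-level semicolon ends the current item.
            items.append(''.join(current).strip())
            current = []
        else:
            current.append(char)
    items.append(''.join(current).strip())
    return items


names = 'Protein A; Protein B [Cleaved into: Chain 1; Chain 2]; Protein C'
print(parse_semicolon_list(names))
# ['Protein A', 'Protein B [Cleaved into: Chain 1; Chain 2]', 'Protein C']
```

Tracking the bracket depth is what keeps the semicolon inside a ‘[Cleaved into: …]’ annotation from being treated as a separator.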

4.1.1.5.6. datanator.data_source.corum_nosql module¶

class datanator.data_source.corum_nosql.CorumNoSQL(MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', cache_dirname=None)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

load_content()[source]¶: Collect and parse all data from CORUM website into JSON files and add to NoSQL database

datanator.data_source.corum_nosql.correct_protein_name_list(lst)[source]¶

Correct a list of protein names with incorrect separators involving ‘[Cleaved into: …]’

Parameters:	lst (`str`) – list of protein names with incorrect separators
Returns:	corrected list of protein names
Return type:	`str`

datanator.data_source.corum_nosql.main()[source]¶

datanator.data_source.corum_nosql.parse_list(str_lst)[source]¶

Parse a semicolon-separated list of strings into a list, ignoring semicolons that are inside square brackets

Parameters:	str_lst (`str`) – semicolon-separated encoding of a list
Returns:	list
Return type:	`list` of `str`

4.1.1.5.7. datanator.data_source.cron_aggregate module¶

datanator.data_source.cron_aggregate.write_sabio_json(self, cache_dirname)[source]¶

4.1.1.5.8. datanator.data_source.ecmdb module¶

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-05-04
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.ecmdb.Compartment(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a compartment

name[source]¶

name

Type:	`str`

compounds[source]¶

list of compounds

Type:	`list` of `Compound`

compounds[source]

name[source]

class datanator.data_source.ecmdb.Compound(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an ECMDB entry

id[source]¶

ECMDB identifier

Type:	`str`

name[source]¶

name

Type:	`str`

synonyms[source]¶

synonyms

Type:	`list` of `Synonym`

description[source]¶

description

Type:	`str`

structure[source]¶

structure in InChI format

Type:	`str`

_structure_formula_connectivity[source]¶

empirical formula and connectivity InChI layers; used to quickly search for compound structures

Type:	`str`

compartments[source]¶

compartments

Type:	`list` of `Compartment`

concentrations[source]¶

concentrations

Type:	`list` of `Concentration`

cross_references[source]¶

cross references

Type:	`list` of `Resource`

comment[source]¶

internal ECMDB comments about the entry

Type:	`str`

created[source]¶

time that the entry was created in ECMDB

Type:	`datetime.datetime`

updated[source]¶

time that the entry was last updated in ECMDB

Type:	`datetime.datetime`

downloaded[source]¶

time that the entry was downloaded from ECMDB

Type:	`datetime.datetime`
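The idea behind `_structure_formula_connectivity` can be sketched as follows: keep only the formula and connectivity layers of an InChI string, so that structures differing only in later layers (hydrogens, charge, stereochemistry) share one search key. The helper name `formula_connectivity` and the format of the returned key are assumptions made for illustration, not the format ECMDB entries are stored in.

```python
def formula_connectivity(inchi):
    """Return the formula and connectivity layers of an InChI string, joined by '/'."""
    _, _, body = inchi.partition('/')  # drop the version prefix, e.g. 'InChI=1S'
    layers = body.split('/') if body else []

    kept = []
    # The formula layer, when present, comes first and starts with an element symbol.
    if layers and layers[0][:1].isupper():
        kept.append(layers[0])
    # The connectivity layer is the one prefixed with 'c'.
    for layer in layers[1:]:
        if layer.startswith('c'):
            kept.append(layer)
            break
    return '/'.join(kept)


ethanol = 'InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'
print(formula_connectivity(ethanol))  # C2H6O/c1-2-3

# Two InChI strings that differ only in their stereochemistry layers map to the same key.
isomer_1 = 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1'
isomer_2 = 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1'
print(formula_connectivity(isomer_1) == formula_connectivity(isomer_2))  # True
```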

comment[source]

compartments[source]

concentrations[source]

created[source]

cross_references[source]

description[source]

downloaded[source]

id[source]

name[source]

structure[source]

synonyms[source]

updated[source]

class datanator.data_source.ecmdb.Concentration(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an observed concentration

compound[source]¶

compound

Type:	`Compound`

value[source]¶

value in uM

Type:	`float`

error[source]¶

error in uM

Type:	`float`

strain[source]¶

observed strain

Type:	`str`

growth_status[source]¶

observed growth status (e.g. exponential phase, log phase, etc.)

Type:	`str`

media[source]¶

observed media

Type:	`str`

temperature[source]¶

temperature in C

Type:	`float`

growth_system[source]¶

observed growth system (e.g. chemostat, 384 well plate, etc.)

Type:	`str`

references[source]¶

list of references

Type:	`list` of `Resource`

compound[source]

compound_id[source]¶

error[source]

growth_status[source]

growth_system[source]

media[source]

references[source]

strain[source]

temperature[source]¶

value[source]

class datanator.data_source.ecmdb.Ecmdb(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ECMDB database

DOWNLOAD_INDEX_URL[source]¶

URL to download an index of ECMDB

Type:	`str`

DOWNLOAD_COMPOUND_URL[source]¶

URL pattern to download an ECMDB compound entry

Type:	`str`

DOWNLOAD_COMPOUND_STRUCTURE_URL = 'http://ecmdb.ca/structures/compounds/{}.inchi'[source]¶

DOWNLOAD_COMPOUND_URL = 'http://ecmdb.ca/compounds/{}.xml'[source]

DOWNLOAD_INDEX_URL = 'http://ecmdb.ca/download/ecmdb.json.zip'[source]

ENDPOINT_DOMAINS = {'ecmdb': 'http://ecmdb.ca'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

get_node_children(node, children_name)[source]¶

Get the children of an XML node

Parameters:

node (`jxmlease.cdatanode.XMLNode`) – XML node
children_name (`str`) – tag names of the desired children

Returns:	list of child nodes
Return type:	`list` of `XMLNode`

get_node_text(node)[source]¶

Get the text of an XML node

Parameters:	node (`jxmlease.cdatanode.XMLCDATANode` or `str`) – XML node or its text
Returns:	text of the node
Return type:	`str`

load_content()[source]¶: Download the content of ECMDB and store it to a local sqlite database.

class datanator.data_source.ecmdb.Resource(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an external resource

namespace[source]¶

external namespace

Type:	`str`

id[source]¶

external identifier

Type:	`str`

compounds[source]¶

compounds

Type:	`list` of `Compound`

concentrations[source]¶

concentrations

Type:	`list` of `Concentration`

compounds[source]

concentrations[source]

id[source]

namespace[source]

class datanator.data_source.ecmdb.Synonym(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a synonym

Parameters:

name (`str`) – name
compounds (`list` of `Compound`) – list of compounds

compounds[source]¶

name[source]¶

4.1.1.5.9. datanator.data_source.ensembl module¶

Downloads and parses the ArrayExpress database

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-08-16
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.ensembl.GeneEntry(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

exp_id[source]¶

identifiers[source]¶

name[source]¶

organism[source]¶

samp_id[source]¶

class datanator.data_source.ensembl.GeneIdentifier(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a url

_id

unique id

Type:	`int`

category[source]¶

name of the characteristic (e.g. organism)

Type:	`str`

value[source]¶

value of characteristic (e.g. Mus musculus)

Type:	`str`

samples[source]¶

samples

Type:	`list` of `Sample`

name[source]¶

samples[source]

class datanator.data_source.ensembl.GetGenes(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the ArrayExpress database

EXCLUDED_DATASET_IDS

list of IDs of datasets to exclude

Type:	`list` of `str`

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶

Downloads all metadata from ArrayExpress on their samples and experiments. The metadata is saved as text files. Within the text files, the data is stored as a JSON object.

Parameters:

start_year (`int`, optional) – the first year to retrieve experiments for
end_year (`int`, optional) – the last year to retrieve experiments for

4.1.1.5.10. datanator.data_source.ezyme module¶

Ezyme

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-05-04
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.ezyme.Ezyme[source]¶

Bases: datanator.core.data_source.WebserviceDataSource

Utilities for using Ezyme to predict EC numbers.

See Ezyme (http://www.genome.jp/tools-bin/predict_reaction) for more information.

REQUEST_URL[source]¶

URL to request Ezyme EC number prediction

Type:	`str`

RETRIEVAL_URL[source]¶

URL to retrieve Ezyme results

Type:	`str`

EC_PREDICTION_URL[source]¶

URL where predicted EC number is encoded

Type:	`str`

EC_PREDICTION_URL = 'http://www.genome.jp/kegg-bin/get_htext?htext=ko01000.keg&query='[source]

ENDPOINT_DOMAINS = {'ezyme': 'http://www.genome.jp'}[source]¶

REQUEST_URL = 'http://www.genome.jp/tools-bin/predict_view'[source]

RETRIEVAL_URL = 'http://www.genome.jp/tools-bin/e-zyme2/result.cgi'[source]

run(reaction)[source]¶

Use Ezyme to predict the first three digits of the EC number of a reaction.

Parameters:	reaction (`data_model.Reaction`) – reaction
Returns:	ranked list of predicted EC numbers and their scores or `None` if one or more participants don’t have a defined structure
Return type:	`list` of `EzymeResult` or `None`

class datanator.data_source.ezyme.EzymeResult(ec_number, score)[source]¶

Bases: object

Represents a predicted EC number

ec_number[source]¶

EC number

Type:	`str`

score[source]¶

score

Type:	`float`

4.1.1.5.11. datanator.data_source.intact module¶

Downloads and parses the IntAct database of protein-protein interactions

Author:	Saahith Pochiraju <saahith116@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2018-08-13
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.intact.IntAct(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.FtpDataSource

A local SQLite copy of the IntAct database

ENDPOINT_DOMAINS = {'complextab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/complex/current/complextab/', 'psimitab': 'ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psimitab/intact_negative.txt'}[source]¶

add_complexes()[source]¶: Parse complexes from data and add complexes to SQLite database

add_interactions()[source]¶: Parse interactions from data and add interactions to SQLite database

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

download_content()[source]¶: Download data from FTP server

find_between(string, first, last)[source]¶

Get the substring between the first occurrence of the substring `first` and the last occurrence of the substring `last`

Parameters:

string (`str`) – string
first (`str`) – starting substring
last (`str`) – ending substring

Returns:	substring between the first occurrence of the substring `first` and the last occurrence of the substring `last`
Return type:	`str`
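The first-to-last matching described above can be sketched as a standalone function. This is an illustration rather than the method's actual body, and returning an empty string when either marker is missing is an assumption.

```python
def find_between(string, first, last):
    """Return the text after the first `first` and before the last `last`."""
    try:
        start = string.index(first) + len(first)
        end = string.rindex(last)
    except ValueError:
        # Assumption: a missing marker yields an empty string.
        return ''
    return string[start:end]


field = 'psi-mi:"MI:0018"(two hybrid)'
print(find_between(field, '(', ')'))  # two hybrid

# The match runs from the first '(' to the last ')', so inner markers are kept.
print(find_between('a(b)c(d)e', '(', ')'))  # b)c(d
```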

find_between_psi_mi_parentheses(string)[source]¶

Find the text between parentheses in values of psi-mi key-value pairs

Parameters:	string (`str`) – string
Returns:	substring between the first occurrence of the substring `first` and the last occurrence of the substring `last`
Return type:	`str`

find_protein_gene(interactor, alias)[source]¶

Parse the protein and gene identifiers from key-value pairs of interactors and their aliases

Parameters:

interactor (`str`) – key-value pairs of interactor
alias (`str`) – key-value pairs of the alias of the interactor

Returns:	protein identifier and gene identifier
Return type:	`tuple` of `str`

find_pubmed_id(string)[source]¶

Parse PubMed identifier from annotated key-value pair of publication type-identifier

Parameters:	string (`str`) – key-value pair of publication type-identifier
Returns:	PubMed identifier
Return type:	`str`

get_paths_to_backup(download=False)[source]¶

Get a list of the files to backup/unpack

Parameters:	download (`bool`, optional) – if `True`, prepare the files for uploading
Returns:	list of paths to backup
Return type:	`list` of `str`

load_content()[source]¶: Load the content of the local copy of the data source

split_colon(string)[source]¶

Split a string into substrings separated by ‘:’

Parameters:	string (`str`) – string
Returns:	substrings separated by ‘:’
Return type:	`list`

split_line(string)[source]¶

Split a string into substrings separated by ‘|’

Parameters:	string (`str`) – string
Returns:	substrings separated by ‘|’
Return type:	`list`
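The two splitting helpers can be combined to break a pipe-separated field of `key:value` entries into pairs, as sketched below. The functions are standalone stand-ins for the methods above, and the identifiers in `field` are made up for illustration.

```python
def split_line(string):
    """Split a string into the substrings separated by '|'."""
    return string.split('|')


def split_colon(string):
    """Split a string into the substrings separated by ':'."""
    return string.split(':')


field = 'pubmed:10542231|imex:IM-11865'  # made-up identifiers
pairs = [split_colon(entry) for entry in split_line(field)]
print(pairs)  # [['pubmed', '10542231'], ['imex', 'IM-11865']]
```

A value that itself contains a colon is split into more than two parts, so callers that need exactly one key and one value have to rejoin or limit the split.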

class datanator.data_source.intact.ProteinComplex(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein complexes from the IntAct database

identifier[source]¶

Type:	`str`

name[source]¶

Type:	`str`

ncbi[source]¶

Type:	`str`

subunits[source]¶

Type:	`str`

evidence[source]¶

Type:	`str`

go_annot[source]¶

Type:	`str`

desc[source]¶

Type:	`str`

source[source]¶

Type:	`str`

desc[source]

evidence[source]

go_annot[source]

identifier[source]

name[source]

ncbi[source]

source[source]

subunits[source]

class datanator.data_source.intact.ProteinInteraction(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein interactions from the IntAct database

Index[source]¶

Index of the DB

Type:	`int`

interactor_a[source]¶

represents participant A

Type:	`str`

interactor_b[source]¶

represents participant B

Type:	`str`

publications[source]¶

resource

Type:	`str`

interaction[source]¶

interaction ID

Type:	`str`

feature_a[source]¶

binding site of participant A

Type:	`str`

feature_b[source]¶

binding site of participant B

Type:	`str`

stoich_a[source]¶

stoichiometry of participant A

Type:	`str`

stoich_b[source]¶

stoichiometry of participant B

Type:	`str`

confidence[source]¶

feature_a[source]

feature_b[source]

gene_a[source]¶

gene_b[source]¶

index[source]¶

interaction_id[source]¶

interaction_type[source]¶

method[source]¶

protein_a[source]¶

protein_b[source]¶

publication[source]¶

publication_author[source]¶

role_a[source]¶

role_b[source]¶

stoich_a[source]

stoich_b[source]

type_a[source]¶

type_b[source]¶

4.1.1.5.12. datanator.data_source.intact_nosql module¶

Downloads and parses the IntAct database of protein-protein interactions

class datanator.data_source.intact_nosql.IntActNoSQL(cache_dirname=None, MongoDB=None, db=None, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

A local MongoDB copy of the IntAct database

add_complexes()[source]¶: Parse complexes from data and add complexes to MongoDB

add_interactions()[source]¶: Parse interactions from data and add interactions to MongoDB

download_content()[source]¶: Download data from FTP server

find_between(string, first, last)[source]¶

Get the substring between the first occurrence of the substring first and the last occurrence of the substring last

Parameters:

string (str) – string
first (str) – starting substring
last (str) – ending substring

Returns:

substring between the first occurrence of the substring first and the last occurrence of the substring last

Return type:

str
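The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation; in particular, returning an empty string when either delimiter is missing is an assumption.

```python
def find_between(string, first, last):
    """Return the text between the first `first` and the last `last`."""
    start = string.find(first)
    if start == -1:
        return ''  # assumption: empty string when `first` is absent
    start += len(first)
    end = string.rfind(last, start)
    if end == -1:
        return ''  # assumption: empty string when `last` is absent
    return string[start:end]


# Nested delimiters: the outermost pair wins
print(find_between('a(b(c)d)e', '(', ')'))  # b(c)d
```

Because the search for `last` runs from the right, nested delimiters are kept inside the result rather than truncating it.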

find_between_psi_mi_parentheses(string)[source]¶

Find the text between parentheses in values of psi-mi key-value pairs

Parameters:	string (`str`) – string
Returns:	text between the parentheses
Return type:	`str`

find_protein_gene(interactor, alias)[source]¶

Parse the protein and gene identifiers from key-value pairs of interactors and their aliases

Parameters:

interactor (str) – key-value pairs of interactor
alias (str) – key-value pairs of the alias of the interactor

Returns:

str: protein identifier
str: gene identifier

Return type:	`str`

find_pubmed_id(string)[source]¶

Parse PubMed identifier from annotated key-value pair of publication type-identifier

Parameters:	string (`str`) – key-value pair of publication type-identifier
Returns:	PubMed identifier
Return type:	`str`
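As a rough illustration of this kind of parsing, the sketch below pulls the numeric identifier out of a `pubmed:` key-value pair with a regular expression. The input format (PSI-MI TAB style pairs such as `pubmed:10542231`) and the `None` fallback are assumptions; the real method may behave differently.

```python
import re


def find_pubmed_id(string):
    """Return the PubMed identifier in `string`, or None if there is none."""
    match = re.search(r'pubmed:(\d+)', string)
    return match.group(1) if match else None


print(find_pubmed_id('pubmed:10542231|imex:IM-11865'))  # 10542231
```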

load_content()[source]¶: Load the content of the local copy of the data source

split_colon(string)[source]¶

Split a string into substrings separated by ‘:’

Parameters:	string (`str`) – string
Returns:	substrings separated by ‘:’
Return type:	`list`

split_line(string)[source]¶

Split a string into substrings separated by ‘|’

Parameters:	string (`str`) – string
Returns:	substrings separated by ‘|’
Return type:	`list`
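The two splitting helpers are typically combined to break a multi-valued field into key-value pairs. The sketch below assumes both helpers are thin wrappers around `str.split`, and the sample field is illustrative only.

```python
def split_line(string):
    """Split a string into substrings separated by '|'."""
    return string.split('|')


def split_colon(string):
    """Split a string into substrings separated by ':'."""
    return string.split(':')


# Hypothetical field with two '|'-separated key:value pairs
field = 'uniprotkb:P49418|intact:EBI-7121510'
pairs = [split_colon(item) for item in split_line(field)]
print(pairs)  # [['uniprotkb', 'P49418'], ['intact', 'EBI-7121510']]
```

Note that a value which itself contains a colon would be split into more than two parts under this simple approach.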

4.1.1.5.13. datanator.data_source.jaspar module¶

This module downloads the JASPAR database of transcription factor binding motifs (http://jaspar.genereg.net/) via a series of text files, parses them, and stores them in an SQLite database.

Author:	Saahith Pochiraju <saahith116@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-08-01
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.jaspar.Annotation(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]¶

TAG[source]¶

VAL[source]¶

class datanator.data_source.jaspar.Data(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]¶

col[source]¶

row[source]¶

val[source]¶

class datanator.data_source.jaspar.Jaspar(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local SQLite copy of the JASPAR database of transcription factor binding profiles

ENDPOINT_DOMAINS = {'jaspar': 'http://jaspar.genereg.net/download/database/JASPAR2018.sqlite.tar.gz'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶: Load the content of the local copy of the data source

class datanator.data_source.jaspar.Matrix(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

BASE_ID[source]¶

COLLECTION[source]¶

ID[source]¶

NAME[source]¶

VERSION[source]¶

class datanator.data_source.jaspar.Protein(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ACC[source]¶

ID[source]¶

class datanator.data_source.jaspar.Species(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ID[source]¶

TAX_ID[source]¶

class datanator.data_source.jaspar.Taxon(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

SPECIES[source]¶

TAX_ID[source]¶

class datanator.data_source.jaspar.TaxonExtension(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

NAME[source]¶

TAX_ID[source]¶

class datanator.data_source.jaspar.Tffm(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

BASE_ID[source]¶

EXPERIMENT_NAME[source]¶

ID[source]¶

LOG_P_1ST_ORDER[source]¶

LOG_P_DETAILED[source]¶

MATRIX_BASE_ID[source]¶

MATRIX_VERSION[source]¶

NAME[source]¶

VERSION[source]¶

4.1.1.5.14. datanator.data_source.kegg module¶

class datanator.data_source.kegg.Kegg(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the KEGG Ontology

ENDPOINT_DOMAINS = {'kegg': ''}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶: Load the content of the local copy of the data source

4.1.1.5.15. datanator.data_source.kegg_orthology module¶

class datanator.data_source.kegg_orthology.KeggOrthology(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

download_ko(name)[source]¶

load_content()[source]¶: Load kegg_orthologs into MongoDB

parse_definition(line)[source]¶

A definition line could look like the following:

"fructose-bisphosphate aldolase / 6-deoxy-5-ketofructose 1-phosphate synthase [NADP…] [EC:4.1.2.13 2.2.1.11]"

The EC code is optional.
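One way to handle such a line is to peel off the trailing `[EC:…]` block, if present, and split the remainder on ` / `. The sketch below illustrates that idea; the return shape (a list of names and a list of EC codes) is an assumption, not the documented return value of the method.

```python
import re


def parse_definition(line):
    """Split a definition line into enzyme names and optional EC codes."""
    ec_codes = []
    match = re.search(r'\[EC:([^\]]+)\]\s*$', line)
    if match:
        # EC codes are whitespace-separated inside the brackets
        ec_codes = match.group(1).split()
        line = line[:match.start()]
    names = [name.strip() for name in line.split(' / ') if name.strip()]
    return names, ec_codes


names, ec_codes = parse_definition(
    'fructose-bisphosphate aldolase / '
    '6-deoxy-5-ketofructose 1-phosphate synthase [EC:4.1.2.13 2.2.1.11]')
print(names)
print(ec_codes)
```

A line without an EC block simply yields an empty list of codes.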

parse_ko_txt(filename)[source]¶: Parse kegg_ortho txt file into dictionary object

4.1.1.5.16. datanator.data_source.kegg_reaction_class module¶

class datanator.data_source.kegg_reaction_class.KeggReaction(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

download_rxn(name)[source]¶

download_rxn_cls(cls)[source]¶

load_content()[source]¶: Load kegg_reactions into MongoDB

parse_rc_multiline(lines)[source]¶

Input:

DEFINITION C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y
N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y
…
…
O1a-O2x:*-C1z:C1b-C1x

Output:

[C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y, N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y, …]
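The input/output pair above suggests a simple flattening step: join the lines, split on whitespace, and drop the leading field keyword. The following is a sketch under that assumption, not the module's actual code.

```python
def parse_rc_multiline(lines):
    """Flatten a multi-line field into a list of its entries."""
    tokens = ' '.join(lines).split()
    # Drop the leading field keyword (e.g. DEFINITION) if present
    if tokens and tokens[0].isalpha() and tokens[0].isupper():
        tokens = tokens[1:]
    return tokens


lines = [
    'DEFINITION  C1y-C2y:-:C1b+C8y+N1y-C1b+C8y+N2y',
    '            N1y-N2y:-:C1a+C1x+C1y-C1a+C1x+C2y',
]
print(parse_rc_multiline(lines))
```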

parse_rc_orthology(lines)[source]¶

Input:

ORTHOLOGY K00260 glutamate dehydrogenase [EC:1.4.1.2]
K00261 glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]
K00262 glutamate dehydrogenase (NADP+) [EC:1.4.1.4]
K00263 leucine dehydrogenase [EC:1.4.1.9]
…
K13547 L-glutamine:2-deoxy-scyllo-inosose/3-amino-2,3-dideoxy-scyllo-inosose aminotransferase [EC:2.6.1.100 2.6.1.101]
..

Output:

[K00260, K00261, …]
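A minimal sketch of this extraction, assuming each line carries one KO identifier (the letter K followed by five digits) after optional leading whitespace and an optional `ORTHOLOGY` keyword:

```python
import re


def parse_rc_orthology(lines):
    """Collect the KO identifiers that start each ORTHOLOGY line."""
    ids = []
    for line in lines:
        match = re.match(r'\s*(?:ORTHOLOGY\s+)?(K\d{5})\b', line)
        if match:
            ids.append(match.group(1))
    return ids


lines = [
    'ORTHOLOGY   K00260  glutamate dehydrogenase [EC:1.4.1.2]',
    '            K00261  glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3]',
]
print(parse_rc_orthology(lines))  # ['K00260', 'K00261']
```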

parse_root_json()[source]¶: Parse root json file and return reaction classes

parse_rxn_cls_txt(filename)[source]¶: Parse kegg_ortho txt file into dictionary object

categories = [‘ENTRY’, ‘DEFINITION’, ‘RPAIR’, ‘REACTION’, ‘ENZYME’, ‘PATHWAY’, ‘ORTHOLOGY’]

datanator.data_source.kegg_reaction_class.main()[source]¶

4.1.1.5.17. datanator.data_source.metabolite_nosql module¶

Author:	Zhouyang Lian <zhouyang.lian@familian.life>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2019-04-02
Copyright:	2019, Karr Lab
License:	MIT

class datanator.data_source.metabolite_nosql.MetaboliteNoSQL(output_directory, source, MongoDB, db, verbose=True, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

Loads metabolite information into MongoDB and outputs documents as JSON files for each metabolite

Attributes:

source: source database, e.g. ‘ecmdb’ or ‘ymdb’
MongoDB: MongoDB server address, e.g. ‘mongodb://localhost:27017/’
max_entries: maximum number of documents to be processed
output_directory: directory in which JSON files will be stored

write_to_json()[source]¶

4.1.1.5.18. datanator.data_source.metabolites_meta_collection module¶

class datanator.data_source.metabolites_meta_collection.MetabolitesMeta(cache_dirname=None, MongoDB=None, replicaSet=None, db=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', meta_loc=None)[source]¶

Bases: datanator.core.query_nosql.QuerySabio

meta_loc: database location to save the meta collection

fill_metabolite_fields(fields=None, collection_src=None, collection_des=None)[source]¶

Fill in values of fields of interest from metabolite collection: ecmdb or ymdb

Args:

fields: list of fields of interest
collection_src: collection in which the query will be done
collection_des: collection in which the result will be updated

load_content()[source]¶

datanator.data_source.metabolites_meta_collection.main()[source]¶

4.1.1.5.19. datanator.data_source.pax module¶

This codebase takes the txt files of the PaxDB protein abundance database and inserts them into an SQL database

define_tables.py - defines the python classes corresponding to the tables in the resulting SQL database

Author:	Balazs Szigeti <balazs.szigeti@mssm.edu>
Author:	Saahith Pochiraju <saahith116@gmail.com>
Date:	2017 June 3
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.pax.Base(**kwargs)[source]¶

Bases: object

The most base type

metadata = MetaData(bind=None)[source]¶

class datanator.data_source.pax.Dataset(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a given dataset (typically results from a single paper)

ncbi_id

NCBI id - linked to the ‘taxon’ table

Type:	`int`

publication[source]¶

URL of the corresponding publication

Type:	`str`

file_name[source]¶

the name of text file corresponding to the dataset

Type:	`str`

score[source]¶

PaxDb’s internal quality score

Type:	`float`

weight[source]¶

TBA

Type:	`int`

coverage[source]¶

percentage of the genome covered by the dataset

Type:	`int`

coverage[source]

file_name[source]

id[source]¶

observation[source]¶

publication[source]

score[source]

taxon[source]¶

taxon_ncbi_id[source]¶

weight[source]

class datanator.data_source.pax.Observation(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein

protein_id

PaxDB’s internal numerical protein ID

Type:	`int`

dataset_id[source]¶

ID of the dataset - linked to the ‘dataset’ table

Type:	`int`

abundance[source]¶

Normalized abundance of the protein

Type:	`float`

abundance[source]

dataset[source]¶

dataset_id[source]

id[source]¶

protein[source]¶

protein_id[source]¶

class datanator.data_source.pax.Pax(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the Pax database

ENDPOINT_DOMAINS = {'pax': 'https://pax-db.org/downloads/4.1/datasets/paxdb-abundance-files-v4.1.zip', 'pax_protein': 'http://pax-db.org/downloads/latest/paxdb-uniprot-links-v4.1.zip'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶

Collects and parses all data from the Pax DB website and adds it to the SQLite DB

Parameters:	req (`requests object`) – Requests session object

parse_paxDB_files()[source]¶

This function parses Pax DB files and adds them to the SQL database

session

SQLAlchemy object

file_id[source]¶

internal ID of the file

Type:	`str`

data_files[source]¶

list of the files to be processed

Type:	`str`

data_folder[source]¶

root folder of the database

Type:	`str`

class datanator.data_source.pax.Protein(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a protein

protein_id

PaxDB’s internal numerical protein ID

Type:	`int`

string_id[source]¶

Ensembl ID of protein

Type:	`str`

observation[source]¶

protein_id[source]¶

string_id[source]

uniprot_id[source]¶

class datanator.data_source.pax.Taxon(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a species

ncbi_id

NCBI id

Type:	`int`

species_name[source]¶

name and possibly genetic variant

Type:	`str`

datasets[source]¶

ncbi_id[source]¶

species_name[source]

datanator.data_source.pax.find_files(path)[source]¶

Scan a directory (and its subdirectories) for files and sort by ncbi_id

Parameters:	path (`str`) – Path containing the data_files
Returns:	list of files to add to DB
Return type:	`list`
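A recursive scan with a numeric sort can be sketched with `os.walk`. The sketch assumes that each file name begins with the NCBI taxonomy ID followed by a hyphen (for example `9606-….txt`); that naming convention, and placing files without a numeric prefix last, are assumptions rather than documented behaviour.

```python
import os


def find_files(path):
    """Return all files under `path`, sorted by leading NCBI taxonomy ID."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))

    def ncbi_id(file_path):
        # Assumption: file names look like '<ncbi_id>-<description>.txt'
        prefix = os.path.basename(file_path).split('-')[0]
        return int(prefix) if prefix.isdigit() else float('inf')

    # Sort numerically by ID, then by path to keep the order stable
    return sorted(found, key=lambda p: (ncbi_id(p), p))
```

Sorting on the integer value rather than the raw string keeps `4932` ahead of `511145`, which a plain lexicographic sort would not.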

4.1.1.5.20. datanator.data_source.pax_nosql module¶

class datanator.data_source.pax_nosql.PaxNoSQL(cache_dirname, MongoDB, db, verbose=False, max_entries=inf, username=None, password=None, authSource='admin', replicaSet=None)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

load_content()[source]¶: Collects and parses all data from the Pax DB website and adds it to MongoDB

parse_paxDB_files()[source]¶: This function parses pax DB files and adds them to the NoSQL database

datanator.data_source.pax_nosql.find_files(path)[source]¶

Scan a directory (and its subdirectories) for files and sort by ncbi_id

Parameters:	path (`str`) – Path containing the data_files
Returns:	list of files to add to DB
Return type:	`list`

4.1.1.5.21. datanator.data_source.refseq module¶

import pprint
from Bio import SeqIO
import datetime
import dateutil.parser
import pkg_resources
import sqlalchemy
import sqlalchemy.ext.declarative
import sqlalchemy.orm
from datanator.core import data_source

class datanator.data_source.refseq.EcNumber(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ec_number[source]¶

gene[source]¶

class datanator.data_source.refseq.Gene(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

ec_numbers[source]¶

essentiality[source]¶

gene_synonyms[source]¶

id[source]¶

identifiers[source]¶

location[source]¶

locus_tag[source]¶

name[source]¶

qualifiers[source]¶

ref_genome_version[source]¶

referenceGenome[source]¶

class datanator.data_source.refseq.GeneSynonym(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]¶

name[source]¶

class datanator.data_source.refseq.Identifier(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]¶

name[source]¶

namespace[source]¶

class datanator.data_source.refseq.Location(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

end[source]¶

gene[source]¶

nofuzzy_end[source]¶

nofuzzy_start[source]¶

ref[source]¶

ref_db[source]¶

start[source]¶

strand[source]¶

class datanator.data_source.refseq.Qualifier(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

gene[source]¶

key[source]¶

value[source]¶

class datanator.data_source.refseq.ReferenceGenome(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

accessions[source]¶

genes[source]¶

organism[source]¶

version[source]¶

class datanator.data_source.refseq.ReferenceGenomeAccession(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

id[source]¶

referenceGenome[source]¶

class datanator.data_source.refseq.Refseq(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=False, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

find_nth(haystack, needle, n)[source]¶

get_json_ends(tree)[source]¶

get_or_create_object(cls, **kwargs)[source]¶

Get the first instance of cls that has the property-value pairs described by kwargs, or create an instance of cls if there is no instance with the property-value pairs described by kwargs

Parameters:

cls (class) – type of object to find or create
**kwargs – values of the properties of the object

Returns:	instance of `cls` that has the property-value pairs described by kwargs
Return type:	`Base`

get_paths_to_backup(download=False)[source]¶

Get a list of the files to backup/unpack

Parameters:	download (`bool`, optional) – if `True`, prepare the files for uploading
Returns:	list of paths to backup
Return type:	`list` of `str`

get_ref_seq_url(org_symbol)[source]¶

load_content(list_bio_seqio_objects)[source]¶: Load the content of the local copy of the data source

upload_data_from_kegg_org_symbol(kegg_org_symbol)[source]¶

upload_ref_seq_for_all_prokaryotic_kegg_org()[source]¶

datanator.data_source.refseq.create_orm(class_1, class_2)[source]¶

4.1.1.5.22. datanator.data_source.sabio_rk module¶

Author:	Yosef Roth <yosefdroth@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2017-05-04
Copyright:	2017, Karr Lab
License:	MIT

class datanator.data_source.sabio_rk.Compartment(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents a compartment in the SABIO-RK database

kinetic_laws[source]¶

list of kinetic laws

Type:	`list` of `KineticLaw`

created[source]¶

cross_references[source]¶

id[source]¶

kinetic_laws[source]

modified[source]¶

name[source]¶

parameters[source]¶

reaction_participants[source]¶

synonyms[source]¶

class datanator.data_source.sabio_rk.Compound(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents a compound in the SABIO-RK database

_is_name_ambiguous[source]¶

if True, the currently stored compound name should not be trusted because multiple names for the same compound have been discovered. The consensus name must be obtained using download_compounds

Type:	`bool`

structures[source]¶

structures

Type:	`list` of `CompoundStructure`

reaction_participants[source]¶

list of reaction participants

Type:	`list` of `ReactionParticipant`

parameters[source]¶

list of parameters

Type:	`list` of `Parameter`

created[source]¶

cross_references[source]¶

get_inchi_structures()[source]¶

Get InChI-formatted structures

Returns:	list of structures in InChI format
Return type:	`list` of `str`

get_smiles_structures()[source]¶

Get SMILES-formatted structures

Returns:	list of structures in SMILES format
Return type:	`list` of `str`

id[source]¶

modified[source]¶

name[source]¶

parameters[source]

reaction_participants[source]

structures[source]

synonyms[source]¶

class datanator.data_source.sabio_rk.CompoundStructure(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents the structure of a compound and its format

compounds[source]¶

list of compounds

Type:	`list` of `Compound`

value[source]¶

the structure in InChI, SMILES, etc. format

Type:	`str`

format[source]¶

format (InChI, SMILES, etc.) of the structure

Type:	`str`

_value_inchi[source]¶

structure in InChI format

Type:	`str`

_value_inchi_formula_connectivity[source]¶

empirical formula (without hydrogen) and connectivity InChI layers; used to quickly search for compound structures

Type:	`str`

calc_inchi_formula_connectivity()[source]¶

Calculate searchable structures

InChI format
Core InChI format
- Formula layer (without hydrogen)
- Connectivity layer
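To make the two layers concrete, the sketch below derives a hydrogen-free formula layer and the connectivity layer from a standard InChI string using only string operations. The function name and the choice to join the two layers with `/` are assumptions; the method itself may rely on a cheminformatics toolkit and produce a different encoding.

```python
import re


def inchi_formula_connectivity(inchi):
    """Return '<formula without H>/<connectivity layer>' for an InChI string."""
    layers = inchi.split('/')
    if len(layers) < 2:
        return ''
    # Remove hydrogen counts but keep elements such as Hg, He, Ho, Hf
    formula = re.sub(r'H(?![a-z])\d*', '', layers[1])
    # The connectivity layer is the one prefixed with 'c'
    connectivity = next(
        (layer for layer in layers[2:] if layer.startswith('c')), '')
    return formula + ('/' + connectivity if connectivity else '')


# Ethanol
print(inchi_formula_connectivity('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'))  # C2O/c1-2-3
```

Dropping hydrogens and the later layers means that protonation states and stereoisomers of the same skeleton map to the same search key.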

compounds[source]

format[source]

value[source]

class datanator.data_source.sabio_rk.Entry(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an entry in the SABIO-RK database

id[source]¶

external identifier

Type:	`int`

name[source]¶

name

Type:	`str`

synonyms[source]¶

list of synonyms

Type:	`list` of `Synonym`

cross_references[source]¶

list of cross references

Type:	`list` of `Resource`

created[source]¶

date that the sqlite object was created

Type:	`datetime.datetime`

updated[source]¶

date that the sqlite object was last updated

Type:	`datetime.datetime`

created[source]

cross_references[source]

id[source]

modified[source]¶

name[source]

synonyms[source]

class datanator.data_source.sabio_rk.Enzyme(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents an enzyme in the SABIO-RK database

subunits[source]¶

list of subunits

Type:	`list` of `EnzymeSubunit`

kinetic_laws[source]¶

list of kinetic laws

Type:	`list` of `KineticLaw`

molecular_weight[source]¶

molecular weight in Daltons

Type:	`float`

parameters[source]¶

list of parameters

Type:	`list` of `Parameter`

created[source]¶

cross_references[source]¶

id[source]¶

kinetic_laws[source]

modified[source]¶

molecular_weight[source]

name[source]¶

parameters[source]

subunits[source]

synonyms[source]¶

class datanator.data_source.sabio_rk.EnzymeSubunit(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents an enzyme subunit in the SABIO-RK database

enzyme[source]¶

enzyme

Type:	`Enzyme`

coefficient[source]¶

stoichiometry of the subunit in the enzyme

Type:	`int`

sequence[source]¶

amino acid sequence

Type:	`str`

molecular_weight[source]¶

molecular weight in Daltons

Type:	`float`

coefficient[source]

created[source]¶

cross_references[source]¶

enzyme[source]

enzyme_id[source]¶

id[source]¶

modified[source]¶

molecular_weight[source]

name[source]¶

sequence[source]

synonyms[source]¶

class datanator.data_source.sabio_rk.KineticLaw(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents a kinetic law in the SABIO-RK database

reactants[source]¶

list of reactants

Type:	`list` of `ReactionParticipant`

products[source]¶

list of products

Type:	`list` of `ReactionParticipant`

enzyme[source]¶

enzyme

Type:	`Enzyme`

enzyme_compartment[source]¶

compartment

Type:	`Compartment`

enzyme_type[source]¶

type of the enzyme (e.g. Modifier-Catalyst)

Type:	`str`

tissue[source]¶

tissue

Type:	`str`

mechanism[source]¶

mechanism of enzymatic catalysis (e.g. Michaelis-Menten)

Type:	`str`

equation[source]¶

equation

Type:	`str`

parameters[source]¶

list of parameters

Type:	`list` of `Parameter`

modifiers[source]¶

list of modifiers

Type:	`list` of `ReactionParticipant`

taxon[source]¶

taxon

Type:	`str`

taxon_wildtype[source]¶

if True, the taxon represents the wild type

Type:	`bool`

taxon_variant[source]¶

variant of the taxon

Type:	`str`

temperature[source]¶

temperature in C

Type:	`float`

ph[source]¶

pH

Type:	`float`

media[source]¶

media

Type:	`str`

references[source]¶

list of PubMed references

Type:	`list` of `Resource`

created[source]¶

cross_references[source]¶

enzyme[source]

enzyme_compartment[source]

enzyme_compartment_id[source]¶

enzyme_id[source]¶

enzyme_type[source]

equation[source]

id[source]¶

mechanism[source]

media[source]

modified[source]¶

modifiers[source]

name[source]¶

parameters[source]

ph[source]

products[source]

reactants[source]

references[source]

synonyms[source]¶

taxon[source]

taxon_variant[source]

taxon_wildtype[source]

temperature[source]

tissue[source]

class datanator.data_source.sabio_rk.Parameter(**kwargs)[source]¶

Bases: datanator.data_source.sabio_rk.Entry

Represents a parameter in the SABIO-RK database

kinetic_law[source]¶

kinetic law

Type:	`KineticLaw`

type[source]¶

SBO term

Type:	`int`

compound[source]¶

compound

Type:	`Compound`

enzyme[source]¶

enzyme

Type:	`Enzyme`

compartment[source]¶

compartment

Type:	`Compartment`

value[source]¶

normalized value

Type:	`float`

error[source]¶

normalized error

Type:	`float`

units[source]¶

normalized units

Type:	`str`

observed_name[source]¶

name

Type:	`str`

observed_type[source]¶

SBO term

Type:	`int`

observed_value[source]¶

observed value

Type:	`float`

observed_error[source]¶

observed error

Type:	`float`

observed_units[source]¶

observed units

Type:	`str`

TYPES (`dict` of `int`: `str`): dictionary of SBO terms and their canonical string symbols

UNITS (`dict` of `int`: `str`): dictionary of SBO terms and their canonical units

TYPES = {25: 'k_cat', 27: 'k_m', 186: 'v_max', 261: 'k_i'}[source]¶
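As a small illustration of how such a mapping is used, the sketch below copies the `TYPES` dictionary shown above and looks up the symbol for an SBO term. The helper `sbo_symbol` is hypothetical and not part of the documented API.

```python
# Copied from the documented class attribute
TYPES = {25: 'k_cat', 27: 'k_m', 186: 'v_max', 261: 'k_i'}


def sbo_symbol(sbo_term):
    """Return the canonical symbol for an SBO term, or None if unsupported."""
    return TYPES.get(sbo_term)


print(sbo_symbol(27))   # k_m
print(sbo_symbol(999))  # None
```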

compartment[source]

compartment_id[source]¶

compound[source]

compound_id[source]¶

created[source]¶

cross_references[source]¶

enzyme[source]

enzyme_id[source]¶

error[source]

id[source]¶

kinetic_law[source]

kinetic_law_id[source]¶

modified[source]¶

name[source]¶

observed_error[source]

observed_name[source]

observed_type[source]

observed_units[source]

observed_value[source]

synonyms[source]¶

type[source]

units[source]

value[source]

class datanator.data_source.sabio_rk.ReactionParticipant(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a participant in a SABIO-RK reaction

compound[source]¶

compound

Type:	`Compound`

compartment[source]¶

compartment

Type:	`Compartment`

coefficient[source]¶

coefficient

Type:	`float`

type[source]¶

type

Type:	`str`

reactant_kinetic_law[source]¶

kinetic law in which the participant appears as a reactant

Type:	`KineticLaw`

product_kinetic_law[source]¶

kinetic law in which the participant appears as a product

Type:	`KineticLaw`

coefficient[source]

compartment[source]

compartment_id[source]¶

compound[source]

compound_id[source]¶

modifier_kinetic_law[source]¶

modifier_kinetic_law_id[source]¶

product_kinetic_law[source]

product_kinetic_law_id[source]¶

reactant_kinetic_law[source]

reactant_kinetic_law_id[source]¶

type[source]

class datanator.data_source.sabio_rk.Resource(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents an external resource

namespace[source]¶

external namespace

Type:	`str`

id[source]¶

external identifier

Type:	`str`

entries[source]¶

entries

Type:	`list` of `Entry`

kinetic_laws[source]¶

kinetic laws

Type:	`list` of `KineticLaw`

entries[source]

id[source]

kinetic_laws[source]

namespace[source]

class datanator.data_source.sabio_rk.SabioRk(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, webservice_batch_size=1, excel_batch_size=100, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

A local sqlite copy of the SABIO-RK database

webservice_batch_size[source]¶

default size of batches to download kinetic information from the SABIO webservice. Note: this should be set to one because SABIO exports units incorrectly when multiple kinetic laws are requested

Type:	`int`

excel_batch_size[source]¶

default size of batches to download kinetic information from the SABIO Excel download service

Type:	`int`
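Batch sizes like these are usually applied by slicing the list of kinetic law IDs into fixed-size chunks before each request. The generator below is a generic sketch of that pattern; `iter_batches` is a hypothetical helper, not a method of this class.

```python
def iter_batches(ids, batch_size):
    """Yield consecutive slices of `ids` with at most `batch_size` items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


ids = list(range(1, 251))
# With the Excel default of 100, 250 IDs become batches of 100, 100 and 50
print([len(batch) for batch in iter_batches(ids, 100)])  # [100, 100, 50]
```

With `webservice_batch_size=1`, the same helper degenerates to one request per kinetic law.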

ENDPOINT_KINETIC_LAWS_SEARCH[source]¶

URL to obtain a list of the ids of all of the kinetic laws in SABIO-RK

Type:	`str`

ENDPOINT_WEBSERVICE[source]¶

URL for the SABIO-RK webservice

Type:	`str`

ENDPOINT_EXCEL_EXPORT[source]¶

URL to download kinetic data as a table in TSV format

Type:	`str`

ENDPOINT_COMPOUNDS_PAGE[source]¶

URL to download information about a SABIO-RK compound

Type:	`str`

SKIP_KINETIC_LAW_IDS[source]¶

IDs of kinetic laws that should be skipped (because they contain errors and can’t be downloaded from SABIO)

Type:	`tuple` of `int`

PUBCHEM_MAX_TRIES[source]¶

maximum number of times to try querying PubChem before failing

Type:	`int`

PUBCHEM_TRY_DELAY[source]¶

delay in seconds between PubChem queries (to avoid overloading the server)

Type:	`float`
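A retry loop driven by these two settings might look like the sketch below. The values mirror the documented defaults, but `query_with_retries` and its `sleep` parameter are hypothetical names introduced for illustration; the class's own retry logic is not shown in this reference.

```python
import time

PUBCHEM_MAX_TRIES = 10    # documented default
PUBCHEM_TRY_DELAY = 0.25  # documented default, in seconds


def query_with_retries(query, max_tries=PUBCHEM_MAX_TRIES,
                       delay=PUBCHEM_TRY_DELAY, sleep=time.sleep):
    """Call `query()` until it succeeds or `max_tries` attempts have failed."""
    if max_tries < 1:
        raise ValueError('max_tries must be at least 1')
    last_error = None
    for attempt in range(max_tries):
        try:
            return query()
        except Exception as error:
            last_error = error
            # Pause between attempts, but not after the final one
            if attempt < max_tries - 1:
                sleep(delay)
    raise last_error
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without waiting for real delays.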

ENDPOINT_COMPOUNDS_PAGE = 'http://sabiork.h-its.org/compdetails.jsp'[source]

ENDPOINT_DOMAINS = {'sabio_rk': 'http://sabiork.h-its.org', 'uniprot': 'http://www.uniprot.org'}[source]¶

ENDPOINT_EXCEL_EXPORT = 'http://sabiork.h-its.org/entry/exportToExcelCustomizable'[source]

ENDPOINT_KINETIC_LAWS_PAGE = 'http://sabiork.h-its.org/kindatadirectiframe.jsp'[source]¶

ENDPOINT_KINETIC_LAWS_SEARCH = 'http://sabiork.h-its.org/sabioRestWebServices/searchKineticLaws/entryIDs'[source]

ENDPOINT_WEBSERVICE = 'http://sabiork.h-its.org/sabioRestWebServices/kineticLaws'[source]

PUBCHEM_MAX_TRIES = 10[source]

PUBCHEM_TRY_DELAY = 0.25[source]

SKIP_KINETIC_LAW_IDS = (51286,)[source]

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

calc_enzyme_molecular_weights(enzymes)[source]¶

Calculate the molecular weight of each enzyme

Parameters:	enzymes (`list` of `Enzyme`) – list of enzymes

calc_stats()[source]¶

Calculate statistics about SABIO-RK

Returns:	list of list of statistics
Return type:	`list` of `list` of `obj`

create_compartment_from_sbml(sbml)[source]¶

Add a compartment to the local sqlite database

Parameters:	sbml (`libsbml.Compartment`) – SBML-representation of a compartment
Returns:	compartment
Return type:	`Compartment`

create_cross_references_from_sbml(sbml)[source]¶

Add cross references to the local sqlite database for an SBML object

Parameters:	sbml (`libsbml.SBase`) – object in an SBML document
Returns:	list of resources
Return type:	`list` of `Resource`

create_kinetic_law_from_sbml(id, sbml, specie_properties, functions, units)[source]¶

Add a kinetic law to the local sqlite database

Parameters:

id (int) – identifier
sbml (libsbml.KineticLaw) – SBML-representation of a reaction
specie_properties (dict) – additional properties of the compounds/enzymes
    is_wildtype (bool): indicates if the enzyme is wildtype or mutant
    variant (str): description of the variant of the enzyme
    modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)
functions (dict of str: str) – dictionary of rate law equations (keys = IDs in SBML, values = equations)
units (dict of str: str) – dictionary of units (keys = IDs in SBML, values = names)

Returns:	kinetic law
Return type:	`KineticLaw`
Raises:	`ValueError` – if the temperature is expressed in an unsupported unit

create_kinetic_laws_from_sbml(ids, sbml)[source]¶

Add kinetic laws defined in an SBML file to the local sqlite database

Parameters:

ids (list of int) – list of kinetic law IDs
sbml (str) – SBML representation of one or more kinetic laws

Returns:

list of KineticLaw: list of kinetic laws
list of Compound or Enzyme: list of species (compounds or enzymes)
list of Compartment: list of compartments

Return type:

tuple

create_specie_from_sbml(sbml)[source]¶

Add a species to the local sqlite database

Parameters:	sbml (`libsbml.Species`) – SBML-representation of a compound or enzyme
Returns:

Compound or Enzyme: compound or enzyme
dict: additional properties of the compound/enzyme
    is_wildtype (bool): indicates if the enzyme is wildtype or mutant
    variant (str): description of the variant of the enzyme
    modifier_type (str): type of the enzyme (e.g. Modifier-Catalyst)

Return type:	`tuple`
Raises:	`ValueError` – if a species is of an unsupported type (i.e. not a compound or enzyme)

export_stats(stats, filename=None)[source]¶

Export statistics to an Excel workbook

Parameters:

stats (list of list of obj) – list of list of statistics
filename (str, optional) – path to export statistics

get_parameter_by_properties(kinetic_law, parameter_properties)[source]¶

Get the parameter of kinetic_law whose attribute values are equal to that of parameter_properties

Parameters:

kinetic_law (KineticLaw) – kinetic law to find parameter of
parameter_properties (dict) – properties of parameter to find

Returns:	parameter with attribute values equal to values of `parameter_properties`
Return type:	`Parameter`

get_specie_reference_from_sbml(specie_id)[source]¶

Get the compound/enzyme associated with an SBML species by its ID

Parameters:	specie_id (`str`) – ID of an SBML species
Returns:

Compound or Enzyme: compound or enzyme
Compartment: compartment

Return type:	`tuple`
Raises:	`ValueError` – if the species is not a compound or enzyme, no species with id = specie_id exists, or no compartment with name = compartment_name exists

infer_compound_structures_from_names(compounds)[source]¶

Try to use PubChem to infer the structure of compounds from their names

Notes: we don’t try to look up structures from their cross references because SABIO has already gathered all structures from their cross references to ChEBI, KEGG, and PubChem

Parameters:	compounds (`list` of `Compound`) – list of compounds

load_compounds(compounds=None)[source]¶

Download information from SABIO-RK about all of the compounds stored in the local sqlite copy of SABIO-RK

Parameters:	compounds (`list` of `Compound`) – list of compounds to download
Raises:	`Error` – if an HTTP request fails

load_content()[source]¶: Download the content of SABIO-RK and store it to a local sqlite database.

load_kinetic_law_ids()[source]¶

Download the IDs of all of the kinetic laws stored in SABIO-RK

Returns:	list of kinetic law IDs
Return type:	`list` of `int`
Raises:	`Error` – if an HTTP request fails or the expected number of kinetic laws is not returned

load_kinetic_laws(ids)[source]¶

Download kinetic laws from SABIO-RK

Parameters:	ids (`list` of `int`) – list of IDs of kinetic laws to download
Raises:	`Error` – if an HTTP request fails

load_missing_enzyme_information_from_html(ids)[source]¶

Load enzyme subunit information from HTML

Parameters:	ids (`list` of `int`) – list of IDs of kinetic laws to download

load_missing_kinetic_law_information_from_tsv(ids)[source]¶

Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.

Parameters:	ids (`list` of `int`) – list of IDs of kinetic laws to download

load_missing_kinetic_law_information_from_tsv_helper(tsv)[source]¶

Update the properties of kinetic laws in the local sqlite database based on content downloaded from SABIO in TSV format.

Note: this method is necessary because neither SABIO’s SBML export nor its Excel export provides all of SABIO’s content.

Parameters:	tsv (`str`) – TSV-formatted table
Raises:	`ValueError` – if a kinetic law or compartment is not contained in the local sqlite database

normalize_kinetic_laws(ids)[source]¶

Normalize parameter values

Parameters:	ids (`list` of `int`) – list of IDs of kinetic laws to normalize

normalize_parameter_value(name, type, value, error, units, enzyme_molecular_weight)[source]¶

Parameters:	name (`str`) – parameter name type (`int`) – parameter type (SBO term id) value (`float`) – observed value error (`float`) – observed error units (`str`) – observed units enzyme_molecular_weight (`float`) – enzyme molecular weight
Returns:	normalized name and its type (SBO term), value, error, and units
Return type:	`tuple` of `str`, `int`, `float`, `float`, `str`
Raises:	`ValueError` – if `units` is not a supported unit of `type`
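The unit-handling part of this contract can be illustrated with a small sketch. It covers only concentrations, and the conversion table `FACTORS_TO_MOLAR` and the function name `normalize_concentration` are hypothetical; the documented method also takes the parameter name, its SBO type, and the enzyme molecular weight, none of which are modeled here.

```python
# Hypothetical conversion factors to molar (M); not SABIO-RK's or datanator's actual table
FACTORS_TO_MOLAR = {'M': 1.0, 'mM': 1e-3, 'uM': 1e-6, 'nM': 1e-9}


def normalize_concentration(value, error, units):
    """Convert an observed value and error to molar, rejecting unsupported units."""
    if units not in FACTORS_TO_MOLAR:
        raise ValueError('Unsupported units: {}'.format(units))
    factor = FACTORS_TO_MOLAR[units]
    # Missing observations (None) are passed through unchanged
    norm_value = None if value is None else value * factor
    norm_error = None if error is None else error * factor
    return norm_value, norm_error, 'M'


norm_value, norm_error, norm_units = normalize_concentration(2.5, None, 'mM')
```

Scaling the error by the same factor as the value keeps the two in the same units, which is what allows normalized parameters from different entries to be compared.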

parse_complex_subunit_structure(text)[source]¶

Parse the subunit structure of a complex into a dictionary of subunit coefficients

Parameters:	text (`str`) – subunit structure described with nested parentheses
Returns:	dictionary of subunit coefficients
Return type:	`dict` of `str`, `int`
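As a rough illustration of parsing nested parentheses into coefficients, the sketch below uses recursive descent. The input grammar is an assumption made for the example: each subunit or sub-complex is wrapped in parentheses and optionally followed by `*n`, as in `((A)*2(B))*3`. The format that SABIO-RK actually emits may differ, and `parse_subunit_structure` is a hypothetical name.

```python
import re
from collections import Counter

_COEFFICIENT = re.compile(r'\*(\d+)')
_IDENTIFIER = re.compile(r'[^()*]+')


def _parse_groups(text, pos):
    """Parse consecutive parenthesized groups starting at `pos`; return (counts, new position)."""
    totals = Counter()
    while pos < len(text) and text[pos] == '(':
        pos += 1  # consume '('
        if pos < len(text) and text[pos] == '(':
            inner, pos = _parse_groups(text, pos)  # nested sub-complex
        else:
            match = _IDENTIFIER.match(text, pos)
            if not match:
                raise ValueError('Expected a subunit identifier at position {}'.format(pos))
            inner, pos = Counter({match.group(): 1}), match.end()
        if pos >= len(text) or text[pos] != ')':
            raise ValueError('Expected ")" at position {}'.format(pos))
        pos += 1  # consume ')'
        coefficient = 1
        match = _COEFFICIENT.match(text, pos)
        if match:
            coefficient, pos = int(match.group(1)), match.end()
        # Multiply the counts inside the group by the group's coefficient
        for subunit, count in inner.items():
            totals[subunit] += count * coefficient
    return totals, pos


def parse_subunit_structure(text):
    """Return a dict mapping each subunit identifier to its total coefficient."""
    text = text.replace(' ', '')
    totals, pos = _parse_groups(text, 0)
    if pos != len(text) or not totals:
        raise ValueError('Malformed subunit structure: {!r}'.format(text))
    return dict(totals)


coefficients = parse_subunit_structure('((A)*2(B))*3')
```

Coefficients multiply through nesting levels, so in this example `A` appears 2 × 3 = 6 times and `B` appears 1 × 3 = 3 times.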

parse_enzyme_name(sbml)[source]¶

Parse the name of an enzyme in SBML for the enzyme name, wild type status, and variant description that it contains.

Parameters:	sbml (`str`) – enzyme name in SBML
Returns:	`str`: name `bool`: if `True`, the enzyme is wild type `str`: variant
Return type:	`tuple`
Raises:	`ValueError` – if the enzyme name is formatted in an unsupported format

class datanator.data_source.sabio_rk.Synonym(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents a synonym to a SABIO-RK entry

name[source]¶

name of the synonym

Type:	`str`

entries[source]¶

list of entries with the synonym

Type:	`list` of `Entry`

4.1.1.5.23. datanator.data_source.sabio_rk_nosql module¶

Parse SABIO-RK JSON files into MongoDB documents (the JSON files are acquired by running sqlite_to_json.py)

Author:	Zhouyang Lian <zhouyang.lian@familian.life>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2019-04-02
Copyright:	2019, Karr Lab
License:	MIT

class datanator.data_source.sabio_rk_nosql.SabioRkNoSQL(db=None, MongoDB=None, cache_directory=None, quilt_package=None, verbose=False, max_entries=inf, replicaSet=None, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

add_deprot_inchi()[source]¶

load_json()[source]¶

make_doc(file_names, file_dict)[source]¶

datanator.data_source.sabio_rk_nosql.main()[source]¶

4.1.1.5.24. datanator.data_source.sqlite_to_json module¶

Converts tables in SQLite into JSON files

datanator.data_source.sqlite_to_json.database: path to sqlite database

datanator.data_source.sqlite_to_json.query[source]¶: query execution command in string format

class datanator.data_source.sqlite_to_json.SQLToJSON(query, cache_dirname=None, quilt_package=None, system_path=None)[source]¶

Bases: object

db()[source]¶

query_table(table, one=True)[source]¶

table()[source]¶
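The conversion this module performs can be sketched with the standard library alone. The example below is an assumption-laden illustration rather than the `SQLToJSON` implementation: the function `table_to_records`, the in-memory database, and the `compound` table are hypothetical.

```python
import json
import sqlite3


def table_to_records(connection, table):
    """Return every row of `table` as a list of dicts keyed by column name."""
    # Table names cannot be bound as SQL parameters, so check against the schema first
    names = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    if table not in names:
        raise ValueError('Unknown table: {}'.format(table))
    cursor = connection.execute('SELECT * FROM "{}"'.format(table))
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory database with a hypothetical table, for illustration only
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)')
connection.executemany('INSERT INTO compound (id, name) VALUES (?, ?)',
                       [(1, 'ATP'), (2, 'ADP')])
records = table_to_records(connection, 'compound')
json_text = json.dumps(records, indent=2)  # text that could be written to a .json file
connection.close()
```

Each row becomes one JSON object, a shape that maps directly onto one MongoDB document per row.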

4.1.1.5.25. datanator.data_source.taxon_tree module¶

class datanator.data_source.taxon_tree.TaxonTree(cache_dirname, MongoDB, db, replicaSet=None, verbose=False, max_entries=inf, username=None, password=None, authSource='admin')[source]¶

Bases: datanator.util.mongo_util.MongoUtil

count_line(file)[source]¶: Efficiently count total number of lines in a given file
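One common way to count lines without loading a large file into memory is to read it in fixed-size binary chunks and count newline bytes. The sketch below shows that technique; it is not necessarily how `count_line` is implemented, and the name `count_lines`, the chunk size, and the temporary demo file are assumptions.

```python
import os
import tempfile


def count_lines(path, chunk_size=1024 * 1024):
    """Count newline characters in `path` by reading fixed-size binary chunks."""
    total = 0
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b'\n')
    return total


# Small temporary file standing in for a large .dmp file
with tempfile.NamedTemporaryFile('wb', suffix='.dmp', delete=False) as handle:
    handle.write(b'1\t|\n2\t|\n3\t|\n')
    demo_path = handle.name
line_count = count_lines(demo_path)
os.remove(demo_path)
```

Because this counts newline bytes, a final line without a trailing newline is not included in the total.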

download_dump()[source]¶

load_content()[source]¶: Load contents of several .dmp files into MongoDB

parse_division()[source]¶: Parse division.dmp

parse_fullname_line(line)[source]¶: Parses a line in the file fullnamelineage.dmp and returns its elements in a list

parse_fullname_taxid()[source]¶: Parse fullnamelineage.dmp and taxidlineage.dmp and store the results in MongoDB. Always run this first, before loading anything else (insert_one).

parse_gencode()[source]¶: Parse gencode.dmp

parse_names()[source]¶: Parse names.dmp. Example rows:

1 | all | | synonym |
1 | root | | scientific name |
2 | bacteria | bacteria <blast2> | blast name |
2 | Bacteria | Bacteria <prokaryotes> | scientific name |
2 | eubacteria | | genbank common name

parse_nodes()[source]¶: Parse nodes.dmp

parse_nodes_line(line)[source]¶: Parse lines in nodes.dmp

parse_taxid_line(line)[source]¶

Parses a line in the file taxidlineage.dmp and returns its elements in a list. Records are delimited by "\t|\n" (tab, vertical bar, and newline) characters. Each record consists of one or more fields delimited by "\t|\t" (tab, vertical bar, and tab) characters.
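A record in this tab-and-vertical-bar format can be split with plain string operations. The sketch below is a minimal illustration, not the `parse_taxid_line` implementation; the function name `split_dmp_line` and the sample record (a taxon ID followed by a short, made-up lineage) are assumptions.

```python
def split_dmp_line(line):
    """Split one record of a .dmp file into whitespace-stripped fields."""
    if line.endswith('\n'):
        line = line[:-1]   # drop the newline of the record terminator
    if line.endswith('\t|'):
        line = line[:-2]   # drop the tab and vertical bar of the record terminator
    return [field.strip() for field in line.split('\t|\t')]


# Illustrative record: taxon ID, then a space-separated lineage of ancestor IDs
fields = split_dmp_line('9606\t|\t131567 2759 33154 \t|\n')
lineage = [int(tax_id) for tax_id in fields[1].split()]
```

Removing the record terminator before splitting on the field delimiter avoids a spurious trailing field.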

datanator.data_source.taxon_tree.main()[source]¶

4.1.1.5.26. datanator.data_source.uniprot module¶

Downloads and parses the UniProt database for protein-protein interactions

Author:	Saahith Pochiraju <saahith116@gmail.com>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2018-08-15
Copyright:	2017-2018, Karr Lab
License:	MIT

class datanator.data_source.uniprot.Uniprot(name=None, cache_dirname=None, clear_content=False, load_content=False, max_entries=inf, commit_intermediate_results=False, download_backups=True, verbose=False, clear_requests_cache=False, download_request_backup=False, quilt_owner=None, quilt_package=None)[source]¶

Bases: datanator.core.data_source.HttpDataSource

ENDPOINT_DOMAINS = {'uniprot': 'http://www.uniprot.org/uniprot/?fil=reviewed:yes'}[source]¶

base_model[source]¶: alias of sqlalchemy.ext.declarative.api.Base

load_content()[source]¶: Load the content of the local copy of the data source

class datanator.data_source.uniprot.UniprotData(**kwargs)[source]¶

Bases: sqlalchemy.ext.declarative.api.Base

Represents protein interactions from the IntAct Database

Index[source]¶

Index of the DB

Type:	`int`

interactor_a[source]¶

represents participant A

Type:	`str`

interactor_b[source]¶

represents participant B

Type:	`str`

publications[source]¶

resource

Type:	`str`

interaction[source]¶

interaction ID

Type:	`str`

feature_a[source]¶

binding site of participant A

Type:	`str`

feature_b[source]¶

binding site of participant B

Type:	`str`

stoich_a[source]¶

stoichiometry of participant A

Type:	`str`

stoich_b[source]¶

stoichiometry of participant B

Type:	`str`

canonical_sequence[source]¶

ec_number[source]¶

entrez_id[source]¶

entry_name[source]¶

gene_name[source]¶

index[source]¶

length[source]¶

mass[source]¶

protein_name[source]¶

status[source]¶

uniprot_id[source]¶

4.1.1.5.27. datanator.data_source.uniprot_nosql module¶

Author:	Zhouyang Lian <zhouyang.lian@familian.life>
Author:	Jonathan Karr <jonrkarr@gmail.com>
Date:	2019-04-02
Copyright:	2019, Karr Lab
License:	MIT

class datanator.data_source.uniprot_nosql.UniprotNoSQL(MongoDB=None, db=None, max_entries=inf, verbose=False, username=None, password=None, authSource='admin', replicaSet=None)[source]¶

Bases: datanator.util.mongo_util.MongoUtil

get_uniprot()[source]¶

load_uniprot()[source]¶

4.1.1.5.28. Module contents¶