3.2. Input data organization¶
Building large models requires a large amount of input data to inform the structure of the model and the value of each parameter. Consequently, it is helpful to organize this data into a readily understandable and computable database.
Unfortunately, there are few tools specifically designed to organize the input data needed for mechanistic models. However, Pathway/genome databases (PGDBs) or model organism databases (MOD) are conceptually similar and the existing PGDB tools provide much of the functionality needed to organize the input data for mechanistic models. In fact, the Pathway Tools PDB tool includes a module called MetaFlux which can be used to build flux balance analysis models of metabolism. In particular, PGDBs can track detailed molecular information at the genomic-scale for individual organisms. Some of the major limitations of the existing PGDBs are that they provide limited support for non-metabolic pathways and that they provide limited support for quantitative data.
3.2.1. Schema¶
The input data used to build models can be organized with the following schema
Value
Uncertainty
Units
Genetic conditions
Taxon
Variant
Environmental conditions
Temperature
pH
Media
Localization
Intracellular compartment
Tissue
Timepoint
Cell cycle phase
Growth phase
Time post-perturbation
Measurement method
Parameters
Version
Experiment: collection of values observed in the same experiment
Reference
Several ontologies such as the CCO and CL can be used to describe components of the schema.
3.2.2. Software tools¶
Below are some of the best tools for organizing the input data used to build models. Unfortunately, all of these tools have significant limitations. Consequently, we must develop better tools for organizing the input data used to build models.
3.2.3. Exercises¶
3.2.3.1. EcoCyc and Pathway Tools¶
Browse the webpages of BioCyc
Observe the types of data EcoCyc contains and how it is organized
Read the schema documentation
3.2.3.2. WholeCellKB¶
Browse the webpages of WholeCellKB
Observe the types of data WholeCellKB contains and how it is organized
Read the schema documentation