4.2. Using the wc_lang package to define whole-cell models
This tutorial teaches you how to use the wc_lang
package to access and create whole-cell models.
wc_lang
provides a foundation for defining, writing, reading and manipulating biochemical models composed of species,
reactions, compartments and other parts of a biochemical system.
It can be used to define models of entire cells, or models of smaller biochemical systems.
wc_lang
contains methods to read and write models from two types of files –
Excel spreadsheet workbooks and sets of delimited files. It also includes methods that
analyze or transform models – e.g., methods that validate, compare, and normalize them.
wc_lang
depends heavily on the obj_tables
package which defines a generic language for declaring
interrelated Python objects, converting them to and from data records,
transferring the records to and from files, and validating their values.
obj_tables
is essentially an object-relational mapping (ORM) system that stores data in files
instead of databases.
However, users of wc_lang
do not need to use obj_tables
directly.
4.2.1. Semantics of a wc_lang biochemical Model
A wc_lang biochemical model represents a biochemical system as Species (we indicate classes in wc_lang by capitalized names in fixed-width text) that get transformed by reactions.
A SpeciesType describes a biochemical molecule, including its name (following Python convention, attributes of classes are lowercase names), structure, molecular_weight, charge and other properties.
The concentration of a SpeciesType in a compartment is stored by a Species instance that references instances of SpeciesType, Compartment, and Concentration, which provide the Species’ location and concentration.
A compartment may represent an organelle or a conceptual region of a model.
Adjacency relationships among compartments are implied by reactions that transfer
species among them, but physical relationships between compartments or their 3D positions
are not represented.
The data in
a wc_lang
model is organized in a highly-interconnected graph of related Python objects, each of
which is an obj_tables.core.Model
instance.
For example, a Species instance contains reaction_participants, which references each Reaction in which the Species participates.
The graph contains many convenience relationships like this, which make it easy to
follow the relationships between obj_tables.core.Model
instances anywhere in a wc_lang
model.
A wc_lang
model also supports some metadata.
Named Parameter
entities store arbitrary values, such as input parameters.
Published data sources used by a model should be recorded in Reference entities,
or in DatabaseReference objects that identify a biological or chemical database.
wc_lang
models are typically used to describe the initial state of a model – a wc_lang
description lacks any notion of time.
More generally, a comprehensive wc_lang
model should provide a complete description of a model,
including its data sources and comments about model components.
4.2.2. wc_lang Classes Used to Define biochemical Models
This subsection enumerates the obj_tables.core.Model
classes that store data in wc_lang
models.
When using an existing model the attributes of these classes are frequently accessed, although their definitions are not typically imported. However, they must be imported when they are being instantiated programmatically.
Many of these classes implement the methods deserialize()
and serialize()
.
deserialize()
parses an object’s string representation – as would be stored in a text file or spreadsheet
representation of a biochemical model – into one or more obj_tables.core.Model
instances.
serialize()
performs the reverse, converting a wc_lang
class instance into a string representation.
Thus, the deserialize()
methods are used when reading models from files and serialize()
is used when writing a model to disk.
deserialize()
returns an error when a string representation cannot be parsed into a
Python object.
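The round trip between an object and its string representation can be sketched with a toy class. This is only an illustration, not wc_lang's implementation: the ToyParticipant class and its "(coefficient) species_type[compartment]" string format are assumptions made for this example, and the sketch reports a parse failure as an error message returned alongside the result.

```python
import re


class ToyParticipant:
    """Hypothetical stand-in for a model component; not a wc_lang class."""

    # matches strings such as '(-1) atp[c]' (an assumed format)
    PATTERN = re.compile(r'^\((-?\d+)\) (\w+)\[(\w+)\]$')

    def __init__(self, coefficient, species_type, compartment):
        self.coefficient = coefficient
        self.species_type = species_type
        self.compartment = compartment

    def serialize(self):
        """Convert this instance into its string representation."""
        return '({}) {}[{}]'.format(
            self.coefficient, self.species_type, self.compartment)

    @classmethod
    def deserialize(cls, value):
        """Parse a string; return (instance, None) or (None, error message)."""
        match = cls.PATTERN.match(value)
        if match is None:
            return (None, 'cannot parse "{}"'.format(value))
        coefficient, species_type, compartment = match.groups()
        return (cls(int(coefficient), species_type, compartment), None)


# reading: string -> object; writing: object -> string
participant, error = ToyParticipant.deserialize('(-1) atp[c]')
round_trip = participant.serialize()

# a string that does not follow the format yields an error instead of an object
_, bad_error = ToyParticipant.deserialize('atp in c')
```

Collecting errors as return values, rather than stopping at the first one, lets a file reader report every unparsable cell in a single pass.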
4.2.2.1. Static Enumerations
Static attributes of these classes are used as attributes of wc_lang
model components.
TaxonRank
The names of biological taxonomic ranks: domain, kingdom, phylum, etc.
SubmodelAlgorithm
The names of algorithms that can integrate submodels: dfba, ode, and ssa.
SpeciesTypeType
Types of species types: metabolite, protein, dna, rna, and pseudo_species.
RateLawDirection
The direction of a reaction rate law: backward or forward.
ReferenceType
Reference types, such as article, book, online, proceedings, etc.
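The way an enumeration constrains an attribute can be sketched with Python's standard enum module. The names ToyRateLawDirection and ToyRateLaw and the member values below are assumptions for illustration; wc_lang's own definitions may differ.

```python
import enum


class ToyRateLawDirection(enum.Enum):
    """Toy version of a direction enumeration; member values are assumed."""
    backward = -1
    forward = 1


class ToyRateLaw:
    """Hypothetical component whose direction must be an enumeration member."""

    def __init__(self, direction):
        if not isinstance(direction, ToyRateLawDirection):
            raise ValueError('direction must be a ToyRateLawDirection')
        self.direction = direction


# the static attributes of the enumeration are used as attribute values
rate_law = ToyRateLaw(direction=ToyRateLawDirection.forward)
member_names = [member.name for member in ToyRateLawDirection]
```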
4.2.2.2. wc_lang Model Components
These classes are instantiated as components of a wc_lang
model.
When a model is stored on disk all the instances of each class are
usually stored in a separate table, either an Excel workbook’s worksheet or delimiter-separated file.
In the former case, the model is stored in one workbook, while in the latter it is stored in a set of files.
Taxon
The taxonomic rank of a model.
Submodel
A part of a whole-cell model which is to be simulated with a particular algorithm from the enumeration SubmodelAlgorithm. Each Submodel is associated with a Compartment that contains the Species it models, and all the reactions that transform them. A Submodel may also have parameters.
Compartment
A named physical container in the biochemical system being modeled. It could represent an organelle, a cell’s cytoplasm, or another physical or conceptual structure. It includes an initial_volume in liters, and references to the initial concentrations of the Species it contains. A compartment can have a semi-permeable membrane, which is modeled by reactions that transform reactant species in the compartment to product species in another compartment. These are called membrane-transfer reactions. A membrane-transfer reaction that moves species from compartment x to compartment y implies that x and y are adjacent.
SpeciesType
The biochemical type of a species. It contains the type’s name, structure (represented in InChI for metabolites and as sequences for DNA, RNA, and proteins), empirical_formula, molecular_weight, and charge. A species’ type is drawn from the attributes of SpeciesTypeType.
Species
A particular SpeciesType contained in a particular Compartment at a particular concentration.
Concentration
The molar concentration (M) of a species.
Reaction
A biochemical reaction. Each Reaction belongs to one submodel. It consists of a list of the species that participate in the reaction, stored as a list of references to ReactionParticipant instances in participants. A reaction that’s simulated by a dynamic algorithm, such as an ODE system or SSA, must have a forward rate law. A Boolean indicates whether the reaction is thermodynamically reversible. If reversible is True, then the reaction must also have a backward rate law. Rate laws are stored in the rate_laws list, and their directions are drawn from the attributes of RateLawDirection.
ReactionParticipant
A ReactionParticipant combines a Species and its stoichiometric reaction coefficient. Coefficients are negative for reactants and positive for products.
RateLaw
A rate law contains a textual equation which stores the mathematical expression of the rate law. It contains the direction of the rate law, encoded with a RateLawDirection attribute. k_cat and k_m attributes for a Michaelis–Menten kinetics model are provided, but their use isn’t required.
RateLawEquation
A rate law equation’s expression contains a textual, mathematical expression of the rate law. A rate law can be used by more than one Reaction. The expression will be transcoded into a valid Python expression, stored in the transcoded attribute, and evaluated as a Python expression by a simulator. This evaluation must produce a number.
The expression is constructed from species names, compartment names, stoichiometric reaction coefficients, k_cat and k_m, and Python functions and mathematical operators. SpeciesType and Compartment names must be valid Python identifiers, and the entire expression must be a valid Python expression. A species composed of a SpeciesType named species_x located in a Compartment named c is written species_x[c]. When a rate law equation is evaluated during the simulation of a model, the expression species_x[c] is interpreted as the current concentration of species_x in compartment c.
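How a textual rate law might be transcoded and evaluated can be sketched in a few lines. This is not wc_lang's transcoder: the regular expression, the transcode and evaluate functions, and the concentrations dictionary are all assumptions made for this example.

```python
import re


def transcode(expression):
    """Rewrite each 'species_type[compartment]' token as a dictionary lookup."""
    return re.sub(r'\b([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]',
                  r"concentrations['\1[\2]']", expression)


def evaluate(expression, concentrations, parameters):
    """Evaluate a transcoded rate law; the result must be a number."""
    namespace = dict(parameters, concentrations=concentrations)
    # an empty __builtins__ limits what the expression can call
    value = eval(transcode(expression), {'__builtins__': {}}, namespace)
    if not isinstance(value, (int, float)):
        raise TypeError('rate law did not evaluate to a number')
    return value


# a Michaelis-Menten rate law for species type 'species_x' in compartment 'c'
equation = 'k_cat * species_x[c] / (k_m + species_x[c])'
rate = evaluate(equation,
                concentrations={'species_x[c]': 2.0},
                parameters={'k_cat': 10.0, 'k_m': 3.0})
```

Because species_x[c] is looked up at evaluation time, the same transcoded expression can be re-evaluated as concentrations change during a simulation.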
Parameter
A Parameter holds an arbitrary floating point value. It is named, associated with a set of submodels, and should include a modifier indicating the value’s units.
4.2.2.3. wc_lang Model Data Sources
These classes record the sources of a model’s data.
Reference
A Reference holds a reference to a publication that contains data used in the model.
DatabaseReference
A DatabaseReference describes a biological or chemical database that provided data for the model.
4.2.3. Using wc_lang
The following tutorial shows several ways to use wc_lang
, including
reading a model from disk, defining a model programmatically and writing it to disk,
and using these models:
Install the required software for the tutorial:
Python
Pip
Install the tutorial and the whole-cell packages that it uses:
git clone https://github.com/KarrLab/intro_to_wc_modeling.git
pip install --upgrade \
    ipython \
    git+https://github.com/KarrLab/wc_lang.git#egg=wc_lang \
    git+https://github.com/KarrLab/wc_utils.git#egg=wc_utils
Change to the directory for this tutorial:
cd intro_to_wc_modeling/intro_to_wc_modeling/wc_modeling/wc_lang_tutorial
Open an interactive python interpreter:
ipython
Import the os and wc_lang.io modules:
import os
import wc_lang.io
Read and write models in Excel and delimited files
wc_lang can read and write models from specially formatted Excel workbooks in which each worksheet represents one of the model component classes above, each row represents a class instance, each column represents an instance attribute, each cell represents the value of an attribute of an instance, and string identifiers are used to indicate relationships among objects. wc_lang can also read and write models from specially formatted sets of delimiter-separated files.
In addition to defining a model, files that define models should contain all of the annotation needed to understand the biological semantic meaning of the model. Ideally, this should include:
NCBI Taxonomy ID for the taxon
Gene Ontology (GO) annotations for each submodel
The structure of each species: InChI for small molecules; sequences for polymers
Where possible, ChEBI ids for each small molecule
Where possible, ids for each gene, transcript, and protein
Where possible, EC numbers or KEGG ids for each reaction
Cell Component Ontology (CCO) annotations for each compartment
Systems Biology Ontology (SBO) annotations for each parameter
The citations which support each model decision
PubMed id, DOI, ISBN, or URL for each citation
This example illustrates how to read a model from an Excel file:
# 'model_filename' is the path to an Excel workbook that defines a model
model = wc_lang.io.Reader().run(model_filename)[wc_lang.Model][0]
(You may ignore a UserWarning generated by these commands.)
If a model file is invalid (for example, it defines two species types with the same id, or a concentration that refers to a species type that is not defined), this operation will raise an exception which contains a list of all of the errors in the model definition.
To name a model stored in a set of delimiter-separated files, wc_lang uses a filename glob pattern that matches the files in the set. The supported delimiters are commas in .csv files and tabs in .tsv files. These files use the same format as the Excel workbook format, except that each worksheet is stored as a separate file. Excel workbooks are easier to read and edit interactively, but changes to delimiter-separated files can be tracked in code version control systems such as Git.
This example illustrates how to write a model to a set of .tsv files:
# 'examples_dir' is a directory
model_filename_pattern = os.path.join(examples_dir, 'example_model-*.tsv')
wc_lang.io.Writer().run(model_filename_pattern, model, data_repo_metadata=False)
The glob pattern in model_filename_pattern matches these files in examples_dir, each of which contains a component of the model:
example_model-Biomass components.tsv
example_model-Biomass reactions.tsv
example_model-Compartments.tsv
example_model-Concentrations.tsv
example_model-database references.tsv
example_model-Model.tsv
example_model-Parameters.tsv
example_model-Rate laws.tsv
example_model-Reactions.tsv
example_model-References.tsv
example_model-Species types.tsv
example_model-Submodels.tsv
example_model-Taxon.tsv
Continuing the previous example, this command reads this set of .tsv files into a model:
model_from_tsv = wc_lang.io.Reader().run(model_filename_pattern)[wc_lang.Model][0]
.csv files can be used similarly.
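How a glob pattern names a set of files can be checked with Python's standard glob module, independently of wc_lang. In this sketch the temporary directory stands in for examples_dir, and the three empty files and the notes.txt file are made up for the illustration.

```python
import glob
import os
import tempfile

# a temporary stand-in for 'examples_dir' holding a few empty table files
examples_dir = tempfile.mkdtemp()
for table in ['Compartments', 'Reactions', 'Species types']:
    filename = os.path.join(examples_dir, 'example_model-{}.tsv'.format(table))
    open(filename, 'w').close()

# a file that the pattern should not match
open(os.path.join(examples_dir, 'notes.txt'), 'w').close()

# '*' matches the part of each filename that names the table
model_filename_pattern = os.path.join(examples_dir, 'example_model-*.tsv')
matched = sorted(os.path.basename(path)
                 for path in glob.glob(model_filename_pattern))
```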
Access properties of the model
A wc_lang model (an instance of wc_lang.core.Model) has multiple attributes:
model.id             # the model's unique identifier
model.name           # its human readable name
model.version        # its version number
model.taxon          # the taxon of the organism being modeled
model.submodels      # a list of the model's submodels
model.compartments   # " " " the model's compartments
model.species_types  # " " " its species types
model.parameters     # " " " its parameters
model.references     # " " " publication sources for the model instance
model.identifiers    # " " " identifiers in external namespaces for the model instance
These provide access to the parts of a wc_lang model that are directly referenced by a model instance. wc_lang also provides some convenience methods that get all of the elements of a specific type which are part of a model. Each of these methods returns a list of the instances of the requested type.
model.get_compartments()
model.get_species_types()
model.get_submodels()
model.get_species()
model.get_distribution_init_concentrations()
model.get_reactions()
model.get_dfba_obj_reactions()
model.get_rate_laws()
model.get_parameters()
model.get_references()
For example, get_reactions() returns a list of all of the reactions in a model’s submodels. As illustrated below, this can be used to obtain the id of each reaction and the name of its submodel:
reaction_identification = []
for reaction in model.get_reactions():
    reaction_identification.append('submodel name: {}, reaction id: {}'.format(
        reaction.submodel.name, reaction.id))
Programmatically build a new model and edit its model properties
You can also use the classes and methods in wc_lang.core to programmatically build and edit models. While modelers typically will not create models programmatically, creating model components in this way gives you a feeling for how models are built.
The following illustrates how to program a trivial model with one compartment, five species types and one reaction:
# create a model with one submodel and one compartment
prog_model = wc_lang.Model(id='programmatic_model', name='Programmatic model')
submodel = wc_lang.Submodel(id='submodel_1', model=prog_model)
cytosol = wc_lang.Compartment(id='c', name='Cytosol')

# create 5 species types
atp = wc_lang.SpeciesType(id='atp', name='ATP', model=prog_model)
adp = wc_lang.SpeciesType(id='adp', name='ADP', model=prog_model)
pi = wc_lang.SpeciesType(id='pi', name='Pi', model=prog_model)
h2o = wc_lang.SpeciesType(id='h2o', name='H2O', model=prog_model)
h = wc_lang.SpeciesType(id='h', name='H+', model=prog_model)

# create an 'ATP hydrolysis' reaction that uses these species types
atp_hydrolysis = wc_lang.Reaction(id='atp_hydrolysis', name='ATP hydrolysis')

# add two reactants, which have negative stoichiometric coefficients
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='atp[c]', species_type=atp, compartment=cytosol),
    coefficient=-1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='h2o[c]', species_type=h2o, compartment=cytosol),
    coefficient=-1)

# add three products, with positive stoichiometric coefficients
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='adp[c]', species_type=adp, compartment=cytosol),
    coefficient=1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='pi[c]', species_type=pi, compartment=cytosol),
    coefficient=1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='h[c]', species_type=h, compartment=cytosol),
    coefficient=1)
In this example, wc_lang.core.SpeciesType(id='atp', name='ATP', model=prog_model) instantiates a SpeciesType instance with two string attributes and a model attribute that references an existing model. In addition, this expression adds the new SpeciesType to the model’s species types, thereby showing how obj_tables’s underlying functionality automatically creates bi-directional references that make it easy to build and navigate wc_lang models, and making this assertion hold:
assert(atp in prog_model.get_species_types())
The example above illustrates another way to create and connect model components. Consider the expression:
atp_hydrolysis.participants.create(
    species=wc_lang.core.Species(species_type=atp, compartment=cytosol),
    coefficient=-1)
participants is a Reaction instance attribute that stores a list of ReactionParticipant objects. In this expression, create takes keyword arguments for the parameters used to instantiate a ReactionParticipant, instantiates a ReactionParticipant, and appends it to the list in atp_hydrolysis.participants. These assertions hold after the 5 participants are added to the ATP hydrolysis reaction:
# 5 participants were added to the reaction
assert(len(atp_hydrolysis.participants) == 5)
first_reaction_participant = atp_hydrolysis.participants[0]
assert(first_reaction_participant.reactions[0] is atp_hydrolysis)
In general, the create method can be used to add model components to lists of related wc_lang.BaseModel objects. create takes keyword arguments and uses them to initialize the attributes of the component created. Thus, if obj has an attribute attr that stores a list of references to components of type X, this expression will create an instance of X and append it to the list:
obj.attr.create(**kwargs)
This simplifies model construction by avoiding creation of unnecessary identifiers for these components.
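The idea behind create, a list that instantiates its own elements and wires up the reverse reference, can be sketched in plain Python. The RelatedList, ToyParticipant and ToyReaction classes below are hypothetical simplifications written for this example; they are not how obj_tables implements related attributes.

```python
class RelatedList(list):
    """Toy list of related objects; a simplified stand-in, not obj_tables code."""

    def __init__(self, owner, related_class, back_reference):
        super().__init__()
        self.owner = owner                    # the object that holds this list
        self.related_class = related_class    # the class that create() instantiates
        self.back_reference = back_reference  # attribute on the new object that points back

    def create(self, **kwargs):
        """Instantiate the related class, link it back to the owner, and append it."""
        obj = self.related_class(**kwargs)
        getattr(obj, self.back_reference).append(self.owner)
        self.append(obj)
        return obj


class ToyParticipant:
    def __init__(self, species, coefficient):
        self.species = species
        self.coefficient = coefficient
        self.reactions = []


class ToyReaction:
    def __init__(self, id):
        self.id = id
        self.participants = RelatedList(self, ToyParticipant, 'reactions')


reaction = ToyReaction(id='atp_hydrolysis')
participant = reaction.participants.create(species='atp[c]', coefficient=-1)
```

After the call, the reaction lists the participant and the participant lists the reaction, so neither side needs a separately named variable or identifier.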
Similar code can be used to create any part of a model. All wc_lang objects that are subclassed from wc_lang.BaseModel (an alias for obj_tables.core.Model) can be instantiated in the normal fashion, as shown for Model, Submodel, Compartment, SpeciesType and Reaction above. Each subclass of wc_lang.BaseModel contains a Meta attribute that is a class which stores meta information about the subclass. The attributes that can be initialized when a wc_lang.BaseModel class is instantiated can be obtained from the attributes dictionary of the class’ Meta attribute, which maps from attribute name to attribute instance:
wc_lang.Model.Meta.attributes.keys()
wc_lang.Submodel.Meta.attributes.keys()
wc_lang.SpeciesType.Meta.attributes.keys()
wc_lang.Compartment.Meta.attributes.keys()
For example, Reaction has the following attributes in wc_lang.core.Reaction.Meta.attributes.keys():
['comments', 'id', 'max_flux', 'min_flux', 'name', 'participants', 'references', 'reversible', 'submodel']
These attributes can also be set programmatically:
atp_hydrolysis.comments = 'example comments'
atp_hydrolysis.reversible = False
Viewing Models and their attributes
All wc_lang.BaseModel instances can be viewed with pprint(), which outputs an indented representation that shows the attributes of a model, and indents and outputs connected models. To constrain the size of its output, pprint() outputs the graph of interconnected models to a depth of max_depth, which defaults to 3. Model nodes at depth max_depth+1 are represented by <class name>: ..., while deeper models are not traversed. Models re-encountered by pprint() are elided by <attribute name>: --. For example, after creating the reaction atp_hydrolysis above, this expression
atp_hydrolysis.participants[0].pprint(max_depth=1)
creates this output:
ReactionParticipant:
    species:
        Species:
            species_type:
                SpeciesType: ...
            compartment:
                Compartment: ...
            concentration: None
            rate_law_equations:
            reaction_participants:
    coefficient: -1
    reactions:
        Reaction:
            id: atp_hydrolysis
            name: ATP hydrolysis
            submodel: None
            participants:
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
            reversible: False
            min_flux: nan
            max_flux: nan
            comments: example comments
            references:
            database_references:
            objective_functions:
            rate_laws:
This shows that the first ReactionParticipant in atp_hydrolysis has the attributes species, coefficient, and reactions, that the coefficient is -1, and that reactions is a list with one element which is the atp_hydrolysis reaction itself.
Validating a programmatically generated Model
The wc_lang.core.Model.validate method determines whether a model is valid. If the model is invalid, validate returns a list of all of the model’s errors. It performs the following checks:
Check that only one model and taxon are defined
Check that each submodel, compartment, species type, reaction, and reference is defined only once
Check that each species type and compartment referenced in each concentration and reaction exists
Check that values of the correct types are provided for each attribute
wc_lang.core.Compartment.initial_volume: float
wc_lang.core.Concentration.value: float
wc_lang.core.Parameter.value: float
wc_lang.core.RateLaw.k_cat: float
wc_lang.core.RateLaw.k_m: float
wc_lang.core.Reaction.reversible: bool
wc_lang.core.ReactionParticipant.coefficient: float
wc_lang.core.Reference.year: integer
wc_lang.core.SpeciesType.charge: integer
wc_lang.core.SpeciesType.molecular_weight: float
Check that valid values are provided for each enumerated attribute
wc_lang.core.RateLaw.direction
wc_lang.core.Reference.type
wc_lang.core.SpeciesType.type
wc_lang.core.Submodel.algorithm
wc_lang.core.Taxon.rank
This example illustrates how to validate prog_model:
prog_model.validate()
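The style of validation described above, running every check and reporting all errors together, can be sketched for a toy model made of plain lists. The validate function below, its arguments and its error messages are assumptions for this illustration, including the choice to return None for a valid model; it is not wc_lang's validator.

```python
from collections import Counter


def validate(species_type_ids, compartment_ids, concentrations):
    """Return a list of error messages, or None if the toy model is valid."""
    errors = []

    # each species type may be defined only once
    for id, count in sorted(Counter(species_type_ids).items()):
        if count > 1:
            errors.append('species type "{}" is defined {} times'.format(id, count))

    # each concentration must reference a defined species type and compartment,
    # and its value must be a float
    for species_type_id, compartment_id, value in concentrations:
        if species_type_id not in species_type_ids:
            errors.append('undefined species type "{}"'.format(species_type_id))
        if compartment_id not in compartment_ids:
            errors.append('undefined compartment "{}"'.format(compartment_id))
        if not isinstance(value, float):
            errors.append('concentration of "{}[{}]" must be a float'.format(
                species_type_id, compartment_id))

    return errors or None


# a toy model with a duplicate id, an undefined species type and a mistyped value
errors = validate(
    species_type_ids=['atp', 'adp', 'atp'],
    compartment_ids=['c'],
    concentrations=[('atp', 'c', 0.001), ('gtp', 'c', '1 mM')])
```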
Compare and difference Models
wc_lang provides methods that determine if two models are semantically equal and report any semantic differences between two models. The is_equal method determines if two models are semantically equal (the two models recursively have the same attribute values, ignoring the order of the attributes, which has no semantic meaning). The following code compares the semantic equality of model and model_from_tsv. Since model_from_tsv was generated by writing model to tsv files, is_equal should return True:
assert(model.is_equal(model_from_tsv) == True)
The difference method produces a textual description of the differences between two models. The following code excerpt checks the differences between model and model_from_tsv. Since they are equal, the differences should be the empty string:
assert(model.difference(model_from_tsv) == '')
Normalize model into a reproducible order to facilitate reproducible numerical simulations
The attribute order has no semantic meaning in wc_lang. However, numerical simulation results derived from models described in wc_lang can be sensitive to the attribute order. To facilitate reproducible simulation results, wc_lang provides a normalize method to sort models into a reproducible order.
The following code excerpt will normalize model into a reproducible order:
model.normalize()
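One reason order can matter numerically is that floating point addition is not associative. The sketch below uses three made-up flux values and a hypothetical total function; it does not call wc_lang, and it uses sorting only as a simple stand-in for putting data into a canonical order.

```python
# three hypothetical reaction fluxes, stored in two different orders
fluxes_order_1 = [0.1, 0.2, 0.3]
fluxes_order_2 = [0.3, 0.2, 0.1]


def total(values):
    """Accumulate left to right, as a simple simulator loop might."""
    result = 0.0
    for value in values:
        result += value
    return result


# summing the same numbers in a different order can change the last bits
unsorted_totals_match = total(fluxes_order_1) == total(fluxes_order_2)

# putting both lists into one canonical order makes the result reproducible
normalized_totals_match = (total(sorted(fluxes_order_1)) ==
                           total(sorted(fluxes_order_2)))
```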
Please see http://code.karrlab.org for documentation of the entire
wc_lang
API.