4.2. Using the wc_lang package to define whole-cell models

This tutorial teaches you how to use the wc_lang package to access and create whole-cell models. wc_lang provides a foundation for defining, writing, reading and manipulating biochemical models composed of species, reactions, compartments and other parts of a biochemical system. It can be used to define models of entire cells, or models of smaller biochemical systems. wc_lang contains methods to read models from, and write models to, two types of files: Excel spreadsheet workbooks and sets of delimited files. It also includes methods that analyze or transform models, e.g., methods that validate, compare, and normalize them.

wc_lang depends heavily on the obj_tables package, which defines a generic language for declaring interrelated Python objects, converting them to and from data records, transferring the records to and from files, and validating their values. obj_tables is essentially an object-relational mapping (ORM) system that stores data in files instead of databases. However, users of wc_lang do not need to use obj_tables directly.

4.2.1. Semantics of a wc_lang biochemical Model

A wc_lang biochemical model represents a biochemical system as Species (we indicate classes in wc_lang by capitalized names in fixed-width text) that get transformed by reactions.

A SpeciesType describes a biochemical molecule, including its name (following Python convention, attributes of classes are lowercase names), structure, molecular_weight, charge and other properties. The concentration of a SpeciesType in a compartment is stored by a Species instance that references instances of SpeciesType, Compartment, and Concentration, which provide the Species’ location and concentration. A compartment may represent an organelle or a conceptual region of a model. Adjacency relationships among compartments are implied by reactions that transfer species among them, but physical relationships between compartments or their 3D positions are not represented.

The data in a wc_lang model is organized in a highly-interconnected graph of related Python objects, each of which is an obj_tables.core.Model instance. For example, a Species instance contains reaction_participants, which references each ReactionParticipant that links the Species to a Reaction in which it participates. The graph contains many convenience relationships like this, which make it easy to follow the relationships between obj_tables.core.Model instances anywhere in a wc_lang model.
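
The idea of convenience back-references can be sketched in plain Python. The classes below are hypothetical stand-ins written only for illustration; they are not wc_lang's classes, and obj_tables maintains such references automatically rather than through hand-written code like this.

```python
# Toy stand-ins (not wc_lang classes) that keep references in both directions.
class Species:
    def __init__(self, id):
        self.id = id
        self.reaction_participants = []  # back-references, filled in below


class Reaction:
    def __init__(self, id):
        self.id = id
        self.participants = []


class ReactionParticipant:
    def __init__(self, species, coefficient, reaction):
        self.species = species
        self.coefficient = coefficient
        self.reaction = reaction
        # record the relationship on both related objects
        species.reaction_participants.append(self)
        reaction.participants.append(self)


atp = Species('atp[c]')
hydrolysis = Reaction('atp_hydrolysis')
ReactionParticipant(atp, -1, hydrolysis)

# Navigate from a species to the reactions in which it participates
reaction_ids = [part.reaction.id for part in atp.reaction_participants]
print(reaction_ids)
```

Because each relationship is recorded on both ends, the graph can be traversed starting from either the species or the reaction.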

A wc_lang model also supports some metadata. Named Parameter entities store arbitrary values, such as input parameters. Published data sources used by a model should be recorded in Reference entities, or in DatabaseReference objects that identify a biological or chemical database.

wc_lang models are typically used to describe the initial state of a model – a wc_lang description lacks any notion of time. More generally, a comprehensive wc_lang model should provide a complete description of a model, including its data sources and comments about model components.

4.2.2. wc_lang Classes Used to Define biochemical Models

This subsection enumerates the obj_tables.core.Model classes that store data in wc_lang models.

When using an existing model, the attributes of these classes are frequently accessed, although their definitions are not typically imported. However, the classes must be imported when they are instantiated programmatically.

Many of these classes implement the methods deserialize() and serialize(). deserialize() parses an object’s string representation – as would be stored in a text file or spreadsheet representation of a biochemical model – into one or more obj_tables.core.Model instances. serialize() performs the reverse, converting a wc_lang class instance into a string representation. Thus, the deserialize() methods are used when reading models from files and serialize() is used when writing a model to disk. deserialize() returns an error when a string representation cannot be parsed into a Python object.
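
As a rough illustration of this round trip, the sketch below serializes and deserializes a species identifier of the form species_type[compartment]. The function names, the regular expression, and the (value, error) return convention are assumptions made for this example; they are not the wc_lang implementation.

```python
import re

# Hypothetical pattern for strings such as 'atp[c]'
SPECIES_PATTERN = re.compile(r'^([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]$')


def serialize_species(species_type_id, compartment_id):
    """Convert a (species type, compartment) pair into its string form."""
    return '{}[{}]'.format(species_type_id, compartment_id)


def deserialize_species(text):
    """Parse a string into a pair; return (value, error), one of which is None."""
    match = SPECIES_PATTERN.match(text.strip())
    if match is None:
        return None, "'{}' is not of the form species_type[compartment]".format(text)
    return (match.group(1), match.group(2)), None


value, error = deserialize_species('atp[c]')
round_trip = serialize_species(*value)
bad_value, bad_error = deserialize_species('atp in c')
```

Returning the error instead of raising it lets a reader collect every problem in a file before reporting them together.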

4.2.2.1. Static Enumerations

Static attributes of these classes are used as attributes of wc_lang model components.

TaxonRank

The names of biological taxonomic ranks: domain, kingdom, phylum, etc.

SubmodelAlgorithm

The names of algorithms that can integrate submodels: dfba, ode, and ssa.

SpeciesTypeType

Types of species types: metabolite, protein, dna, rna, and pseudo_species.

RateLawDirection

The direction of a reaction rate law: backward or forward.

ReferenceType

Reference types, such as article, book, online, proceedings, etc.

4.2.2.2. wc_lang Model Components

These classes are instantiated as components of a wc_lang model. When a model is stored on disk, all the instances of each class are usually stored in a separate table, either a worksheet in an Excel workbook or a delimiter-separated file. In the former case, the model is stored in one workbook, while in the latter it is stored in a set of files.

Taxon

The taxonomic rank of a model.

Submodel

A part of a whole-cell model which is to be simulated with a particular algorithm from the enumeration SubmodelAlgorithm. Each Submodel is associated with a Compartment that contains the Species it models, and all the reactions that transform them. A Submodel may also have parameters.

Compartment

A named physical container in the biochemical system being modeled. It could represent an organelle, a cell’s cytoplasm, or another physical or conceptual structure. It includes an initial_volume in liters, and references to the initial concentrations of the Species it contains. A compartment can have a semi-permeable membrane, which is modeled by reactions that transform reactant species in the compartment to product species in another compartment. These are called membrane-transfer reactions. A membrane-transfer reaction that moves species from compartment x to compartment y implies that x and y are adjacent.
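
The way adjacency follows from membrane-transfer reactions can be sketched as below. The reaction data, the compartment ids 'e' and 'c', and the function implied_adjacencies are all hypothetical; wc_lang itself does not necessarily expose such a function.

```python
from itertools import product

# Hypothetical reactions: each participant is (species type, compartment, coefficient)
reactions = {
    'glc_import': [('glc', 'e', -1), ('glc', 'c', 1)],
    'atp_hydrolysis': [('atp', 'c', -1), ('h2o', 'c', -1),
                       ('adp', 'c', 1), ('pi', 'c', 1), ('h', 'c', 1)],
}


def implied_adjacencies(reactions):
    """Return the set of compartment pairs linked by membrane-transfer reactions."""
    pairs = set()
    for participants in reactions.values():
        reactant_comps = {comp for _, comp, coeff in participants if coeff < 0}
        product_comps = {comp for _, comp, coeff in participants if coeff > 0}
        for x, y in product(reactant_comps, product_comps):
            if x != y:
                pairs.add(frozenset((x, y)))
    return pairs


adjacent = implied_adjacencies(reactions)
```

Only glc_import spans two compartments, so it is the only reaction that contributes an adjacency.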

SpeciesType

The biochemical type of a species. It contains the type’s name, structure (represented in InChI for metabolites and as sequences for DNA, RNA, and proteins), empirical_formula, molecular_weight, and charge. A species’ type is drawn from the attributes of SpeciesTypeType.

Species

A particular SpeciesType contained in a particular Compartment at a particular concentration.

Concentration

The molar concentration (M) of a species.

Reaction

A biochemical reaction. Each Reaction belongs to one submodel. It consists of a list of the species that participate in the reaction, stored as a list of references to ReactionParticipant instances in participants. A reaction that’s simulated by a dynamic algorithm, such as an ODE system or SSA, must have a forward rate law. The Boolean attribute reversible indicates whether the reaction is thermodynamically reversible. If reversible is True, then the reaction must also have a backward rate law. Rate laws are stored in the rate_laws list, and their directions are drawn from the attributes of RateLawDirection.

ReactionParticipant

ReactionParticipant combines a Species and its stoichiometric reaction coefficient. Coefficients are negative for reactants and positive for products.
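
The sign convention can be illustrated with a short sketch that splits participants into reactants and products and renders a readable equation. The tuple representation and the '==>' arrow are choices made for this example, not wc_lang's serialization format.

```python
# Hypothetical participants of ATP hydrolysis: (species id, coefficient)
participants = [('atp[c]', -1), ('h2o[c]', -1), ('adp[c]', 1), ('pi[c]', 1), ('h[c]', 1)]


def format_side(terms):
    """Render one side of a reaction, omitting coefficients of magnitude 1."""
    parts = []
    for species_id, coefficient in terms:
        magnitude = abs(coefficient)
        parts.append(species_id if magnitude == 1 else '({}) {}'.format(magnitude, species_id))
    return ' + '.join(parts)


# negative coefficients mark reactants, positive coefficients mark products
reactants = [(s, c) for s, c in participants if c < 0]
products = [(s, c) for s, c in participants if c > 0]
equation = '{} ==> {}'.format(format_side(reactants), format_side(products))
print(equation)
```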

RateLaw

A rate law contains a textual equation which stores the mathematical expression of the rate law. It contains the direction of the rate law, encoded with a RateLawDirection attribute. k_cat and k_m attributes for a Michaelis–Menten kinetics model are provided, but their use isn’t required.

RateLawEquation

A rate law equation’s expression contains a textual, mathematical expression of the rate law. A rate law can be used by more than one Reaction. The expression will be transcoded into a valid Python expression, stored in the transcoded attribute, and evaluated as a Python expression by a simulator. This evaluation must produce a number.

The expression is constructed from species names, compartment names, stoichiometric reaction coefficients, k_cat and k_m, and Python functions and mathematical operators. SpeciesType and Compartment names must be valid Python identifiers, and the entire expression must be a valid Python expression. A species composed of a SpeciesType named species_x located in a Compartment named c is written species_x[c]. When a rate law equation is evaluated during the simulation of a model the expression species_x[c] is interpreted as the current concentration of species_x in compartment c.
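
A minimal sketch of this idea follows, assuming that transcoding simply rewrites each species_x[c] reference as a dictionary lookup. The functions transcode and evaluate, and the example rate law, are hypothetical and are not wc_lang's transcoder. Note that disabling builtins in eval is not a secure sandbox.

```python
import re

# Matches species references such as 'atp[c]'
SPECIES_REF = re.compile(r'\b([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]')


def transcode(expression):
    """Rewrite each 'species[compartment]' as a lookup in a concentrations dict."""
    return SPECIES_REF.sub(r"concentrations['\1[\2]']", expression)


def evaluate(expression, concentrations, parameters):
    """Evaluate a transcoded rate law with builtins disabled; return a float."""
    namespace = dict(parameters, concentrations=concentrations)
    return float(eval(transcode(expression), {'__builtins__': {}}, namespace))


# A Michaelis-Menten style expression with illustrative values
rate = evaluate('k_cat * atp[c] / (k_m + atp[c])',
                concentrations={'atp[c]': 1.0},
                parameters={'k_cat': 2.0, 'k_m': 1.0})
```

During a simulation the concentrations dictionary would be refreshed at each step, so the same transcoded expression yields the current rate.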

Parameter

A Parameter holds an arbitrary floating point value. It is named, associated with a set of submodels, and should include a modifier indicating the value’s units.

4.2.2.3. wc_lang Model Data Sources

These classes record the sources of a model’s data.

Reference

A Reference holds a reference to a publication that contains data used in the model.

DatabaseReference

A DatabaseReference describes a biological or chemical database that provided data for the model.

4.2.3. Using wc_lang

The following tutorial shows several ways to use wc_lang, including reading a model from disk, defining a model programmatically and writing it to disk, and using these models:

  1. Install the required software for the tutorial:

    • Python

    • Pip

  2. Install the tutorial and the whole-cell packages that it uses:

    git clone https://github.com/KarrLab/intro_to_wc_modeling.git
    pip install --upgrade \
        ipython \
        git+https://github.com/KarrLab/wc_lang.git#egg=wc_lang \
        git+https://github.com/KarrLab/wc_utils.git#egg=wc_utils
    
  3. Change to the directory for this tutorial:

    cd intro_to_wc_modeling/intro_to_wc_modeling/wc_modeling/wc_lang_tutorial
    
  4. Open an interactive Python interpreter:

    ipython
    
  5. Import the os and wc_lang.io modules:

    import os
    import wc_lang.io
    
  6. Read and write models in Excel and delimited files

    wc_lang can read and write models from specially formatted Excel workbooks in which each worksheet represents one of the model component classes above, each row represents a class instance, each column represents an instance attribute, each cell represents the value of an attribute of an instance, and string identifiers are used to indicate relationships among objects. wc_lang can also read and write models from specially formatted sets of delimiter-separated files.
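
    The tabular layout can be pictured with the sketch below, which parses small tab-separated tables and resolves string identifiers into the rows they name. The column names and values are invented for illustration and do not reproduce wc_lang's actual schema.

```python
import csv
import io

# Hypothetical tables; the column names are illustrative, not wc_lang's schema
compartments_tsv = "Id\tName\nc\tCytosol\n"
species_types_tsv = "Id\tName\natp\tATP\nh2o\tH2O\n"
concentrations_tsv = "Species type\tCompartment\tValue\natp\tc\t0.005\n"


def read_table(text):
    """Parse a tab-separated table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter='\t'))


# Index rows by their string identifiers
compartments = {row['Id']: row for row in read_table(compartments_tsv)}
species_types = {row['Id']: row for row in read_table(species_types_tsv)}

# Resolve the identifiers in each concentration row into the rows they name
concentrations = []
for row in read_table(concentrations_tsv):
    concentrations.append({
        'species_type': species_types[row['Species type']],
        'compartment': compartments[row['Compartment']],
        'value': float(row['Value']),
    })
```

    A lookup that fails here (a KeyError) corresponds to a row that refers to an undefined object.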

    In addition to defining a model, files that define models should contain all of the annotation needed to understand the biological semantic meaning of the model. Ideally, this should include:

    • NCBI Taxonomy ID for the taxon

    • Gene Ontology (GO) annotations for each submodel

    • The structure of each species: InChI for small molecules; sequences for polymers

    • Where possible, ChEBI ids for each small molecule

    • Where possible, ids for each gene, transcript, and protein

    • Where possible, EC numbers or KEGG ids for each reaction

    • Cell Component Ontology (CCO) annotations for each compartment

    • Systems Biology Ontology (SBO) annotations for each parameter

    • The citations which support each model decision

    • PubMed id, DOI, ISBN, or URL for each citation

    This example illustrates how to read a model from an Excel file:

    # 'model_filename' is the path to an Excel workbook that defines a model
    model = wc_lang.io.Reader().run(model_filename)[wc_lang.Model][0]
    

    (You may ignore a UserWarning generated by these commands.)

    If a model file is invalid (for example, it defines two species types with the same id, or a concentration that refers to a species type that is not defined), this operation will raise an exception which contains a list of all of the errors in the model definition.

    To name a model stored in a set of delimiter-separated files, wc_lang uses a filename glob pattern that matches the files in the set. The supported delimiters are commas in .csv files and tabs in .tsv files. These files use the same format as the Excel workbook format, except that each worksheet is stored as a separate file. Excel workbooks are easier to read and edit interactively, but changes to delimiter-separated files can be tracked in code version control systems such as Git.

    This example illustrates how to write a model to a set of .tsv files:

    # 'examples_dir' is a directory
    model_filename_pattern = os.path.join(examples_dir, 'example_model-*.tsv')
    wc_lang.io.Writer().run(model_filename_pattern, model, data_repo_metadata=False)
    

    The glob pattern in model_filename_pattern matches these files:

    example_model-Biomass components.tsv
    example_model-Biomass reactions.tsv
    example_model-Compartments.tsv
    example_model-Concentrations.tsv
    example_model-database references.tsv
    example_model-Model.tsv
    example_model-Parameters.tsv
    example_model-Rate laws.tsv
    example_model-Reactions.tsv
    example_model-References.tsv
    example_model-Species types.tsv
    example_model-Submodels.tsv
    example_model-Taxon.tsv
    

    in examples_dir, each of which contains a component of the model.

    Continuing the previous example, this command reads this set of .tsv files into a model:

    model_from_tsv = wc_lang.io.Reader().run(model_filename_pattern)[wc_lang.Model][0]
    

    .csv files can be used similarly.
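
    How a glob pattern selects the files in a set can be explored with the standard library alone. In this sketch a temporary directory stands in for examples_dir, and the empty files and the extra notes.txt file are made up for the demonstration.

```python
import glob
import os
import tempfile

# Create a few empty files in a temporary directory to stand in for a model
table_names = ['Compartments', 'Reactions', 'Species types']
with tempfile.TemporaryDirectory() as examples_dir:
    for name in table_names:
        path = os.path.join(examples_dir, 'example_model-{}.tsv'.format(name))
        open(path, 'w').close()
    # A file that does not belong to the set
    open(os.path.join(examples_dir, 'notes.txt'), 'w').close()

    # '*' matches the part of each filename that names the table
    pattern = os.path.join(examples_dir, 'example_model-*.tsv')
    matched = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matched)
```

    The file notes.txt is ignored because it does not match the pattern.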

  7. Access properties of the model

    A wc_lang model (an instance of wc_lang.core.Model) has multiple attributes:

    model.id                # the model's unique identifier
    model.name              # its human readable name
    model.version           # its version number
    model.taxon             # the taxon of the organism being modeled
    model.submodels         # a list of the model's submodels
    model.compartments      # "  "   "  the model's compartments
    model.species_types     # "  "   "  its species types
    model.parameters        # "  "   "  its parameters
    model.references        # "  "   "  publication sources for the model instance
    model.identifiers       # "  "   "  identifiers in external namespaces for the model instance
    

    These provide access to the parts of a wc_lang model that are directly referenced by a model instance.

    wc_lang also provides some convenience methods that get all of the elements of a specific type which are part of a model. Each of these methods returns a list of the instances of the requested type.

    model.get_compartments()
    model.get_species_types()
    model.get_submodels()
    model.get_species()
    model.get_distribution_init_concentrations()
    model.get_reactions()
    model.get_dfba_obj_reactions()
    model.get_rate_laws()
    model.get_parameters()
    model.get_references()
    

    For example, get_reactions() returns a list of all of the reactions in a model’s submodels. As illustrated below, this can be used to obtain the id of each reaction and the name of its submodel:

    reaction_identification = []
    for reaction in model.get_reactions():
        reaction_identification.append('submodel name: {}, reaction id: {}'.format(
            reaction.submodel.name, reaction.id))
    
  8. Programmatically build a new model and edit its model properties

    You can also use the classes and methods in wc_lang.core to programmatically build and edit models. While modelers typically will not create models programmatically, creating model components in this way gives you a feeling for how models are built.

    The following illustrates how to program a trivial model with one compartment, five species types, and one reaction:

    # create a model with one submodel and one compartment
    prog_model = wc_lang.Model(id='programmatic_model', name='Programmatic model')
    
    submodel = wc_lang.Submodel(id='submodel_1', model=prog_model)
    
    cytosol = wc_lang.Compartment(id='c', name='Cytosol')
    
    # create 5 species types
    atp = wc_lang.SpeciesType(id='atp', name='ATP', model=prog_model)
    adp = wc_lang.SpeciesType(id='adp', name='ADP', model=prog_model)
    pi = wc_lang.SpeciesType(id='pi', name='Pi', model=prog_model)
    h2o = wc_lang.SpeciesType(id='h2o', name='H2O', model=prog_model)
    h = wc_lang.SpeciesType(id='h', name='H+', model=prog_model)
    
    # create an 'ATP hydrolysis' reaction that uses these species types
    atp_hydrolysis = wc_lang.Reaction(id='atp_hydrolysis', name='ATP hydrolysis')
    
    # add two reactants, which have negative stoichiometric coefficients
    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='atp[c]', species_type=atp, compartment=cytosol), coefficient=-1)
    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='h2o[c]', species_type=h2o, compartment=cytosol), coefficient=-1)
    
    # add three products, with positive stoichiometric coefficients
    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='adp[c]', species_type=adp, compartment=cytosol), coefficient=1)
    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='pi[c]', species_type=pi, compartment=cytosol), coefficient=1)
    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='h[c]', species_type=h, compartment=cytosol), coefficient=1)
    

    In this example, wc_lang.SpeciesType(id='atp', name='ATP', model=prog_model) instantiates a SpeciesType instance with two string attributes and a model attribute that references an existing model. In addition, this expression adds the new SpeciesType to the model’s species types. This shows how obj_tables’s underlying functionality automatically creates bi-directional references that make it easy to build and navigate wc_lang models, and it makes this assertion hold:

    assert(atp in prog_model.get_species_types())
    

    The example above illustrates another way to create and connect model components. Consider the expression:

    atp_hydrolysis.participants.create(
        species=wc_lang.Species(id='atp[c]', species_type=atp, compartment=cytosol), coefficient=-1)
    

    participants is a Reaction instance attribute that stores a list of ReactionParticipant objects. In this expression create takes keyword arguments for the parameters used to instantiate a ReactionParticipant, instantiates a ReactionParticipant, and appends it to the list in atp_hydrolysis.participants. These assertions hold after the 5 participants are added to the ATP hydrolysis reaction:

    # 5 participants were added to the reaction
    assert(len(atp_hydrolysis.participants) == 5)
    first_reaction_participant = atp_hydrolysis.participants[0]
    assert(first_reaction_participant.reactions[0] is atp_hydrolysis)
    

    In general, the create method can be used to add model components to lists of related wc_lang.BaseModel objects. create takes keyword arguments and uses them to initialize the attributes of the component created. Thus, if obj has an attribute attr that stores a list of references to components of type X, this expression will create an instance of X and append it to the list:

    obj.attr.create(**kwargs)
    

    This simplifies model construction by avoiding creation of unnecessary identifiers for these components.
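
    The behavior of create can be imitated in a few lines of plain Python. The classes Participant and RelatedList below are hypothetical, written only to show the pattern; obj_tables provides its own implementation, and returning the new object from create is a choice made for this sketch.

```python
class Participant:
    """Toy component; stands in for a class such as ReactionParticipant."""
    def __init__(self, species=None, coefficient=None):
        self.species = species
        self.coefficient = coefficient


class RelatedList(list):
    """A list that can construct and append instances of one component type."""
    def __init__(self, component_type):
        super().__init__()
        self.component_type = component_type

    def create(self, **kwargs):
        # instantiate the component from the keyword arguments, then append it
        obj = self.component_type(**kwargs)
        self.append(obj)
        return obj


participants = RelatedList(Participant)
first = participants.create(species='atp[c]', coefficient=-1)
participants.create(species='adp[c]', coefficient=1)
```

    The caller never needs a separate variable for a component that is only reachable through the list.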

    Similar code can be used to create any part of a model. All wc_lang objects that are subclassed from wc_lang.BaseModel (an alias for obj_tables.core.Model) can be instantiated in the normal fashion, as shown for Model, Submodel, Compartment, SpeciesType and Reaction above. Each subclass of wc_lang.BaseModel contains a Meta attribute, which is a class that stores meta information about the subclass. The attributes that can be initialized when a wc_lang.BaseModel class is instantiated can be obtained from Meta.attributes, a dictionary that maps each attribute name to its attribute instance:

    wc_lang.Model.Meta.attributes.keys()
    wc_lang.Submodel.Meta.attributes.keys()
    wc_lang.SpeciesType.Meta.attributes.keys()
    wc_lang.Compartment.Meta.attributes.keys()
    

    For example, Reaction has the following attributes in wc_lang.core.Reaction.Meta.attributes.keys():

    ['comments', 'id', 'max_flux', 'min_flux', 'name', 'participants', 'references',
        'reversible', 'submodel']
    

    These attributes can also be set programmatically:

    atp_hydrolysis.comments = 'example comments'
    atp_hydrolysis.reversible = False
    
  9. Viewing Models and their attributes

    All wc_lang.BaseModel instances can be viewed with pprint(), which outputs an indented representation that shows the attributes of a model, and indents and outputs connected models. To constrain the size of its output, pprint() outputs the graph of interconnected models to a depth of max_depth, which defaults to 3. Model nodes at depth max_depth+1 are represented by <class name>: ..., while deeper models are not traversed. Models re-encountered by pprint() are elided as <attribute name>: --. For example, after creating the reaction atp_hydrolysis above, this expression

    atp_hydrolysis.participants[0].pprint(max_depth=1)
    

    creates this output:

    ReactionParticipant:
       species:
          Species:
             species_type:
                SpeciesType: ...
             compartment:
                Compartment: ...
             concentration: None
             rate_law_equations:
             reaction_participants:
       coefficient: -1
       reactions:
          Reaction:
             id: atp_hydrolysis
             name: ATP hydrolysis
             submodel: None
             participants:
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
             reversible: False
             min_flux: nan
             max_flux: nan
             comments: example comments
             references:
             database_references:
             objective_functions:
             rate_laws:
    

    This shows that the first ReactionParticipant in atp_hydrolysis has the attributes species, coefficient, and reactions, that the coefficient is -1, and that reactions is a list with one element which is the atp_hydrolysis reaction itself.
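
    The depth limit and the elision of re-encountered objects can be imitated with a short recursive function. This is a toy written for illustration, using hypothetical classes; it is not the pprint() implementation in obj_tables and does not reproduce its output exactly.

```python
def render(obj, max_depth=3, depth=0, seen=None, indent=0):
    """Return lines describing obj and the objects it references, to a depth limit."""
    seen = set() if seen is None else seen
    pad = '   ' * indent
    name = type(obj).__name__
    if depth > max_depth:
        return ['{}{}: ...'.format(pad, name)]      # too deep: do not traverse
    seen.add(id(obj))
    lines = ['{}{}:'.format(pad, name)]
    for attr, value in vars(obj).items():
        if hasattr(value, '__dict__'):              # a related object
            if id(value) in seen:
                lines.append('{}   {}: --'.format(pad, attr))
            else:
                lines.append('{}   {}:'.format(pad, attr))
                lines.extend(render(value, max_depth, depth + 1, seen, indent + 2))
        else:                                       # a plain value
            lines.append('{}   {}: {}'.format(pad, attr, value))
    return lines


class Compartment:
    def __init__(self, id):
        self.id = id


class Species:
    def __init__(self, id, compartment):
        self.id = id
        self.compartment = compartment


class Participant:
    def __init__(self, species, coefficient):
        self.species = species
        self.coefficient = coefficient


participant = Participant(Species('atp[c]', Compartment('c')), -1)
lines = render(participant, max_depth=1)
print('\n'.join(lines))
```

    With max_depth=1 the Compartment sits one level beyond the limit, so it is shown as a class name followed by an ellipsis.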

  10. Validating a programmatically generated Model

    The wc_lang.core.Model.validate method determines whether a model is valid. If the model is invalid, validate returns a list of all of the model’s errors. It performs the following checks:

    • Check that only one model and taxon are defined

    • Check that each submodel, compartment, species type, reaction, and reference is defined only once

    • Check that the species type and compartment referenced in each concentration and reaction exist

    • Check that values of the correct types are provided for each attribute

      • wc_lang.core.Compartment.initial_volume: float

      • wc_lang.core.Concentration.value: float

      • wc_lang.core.Parameter.value: float

      • wc_lang.core.RateLaw.k_cat: float

      • wc_lang.core.RateLaw.k_m: float

      • wc_lang.core.Reaction.reversible: bool

      • wc_lang.core.ReactionParticipant.coefficient: float

      • wc_lang.core.Reference.year: integer

      • wc_lang.core.SpeciesType.charge: integer

      • wc_lang.core.SpeciesType.molecular_weight: float

    • Check that valid values are provided for each enumerated attribute

      • wc_lang.core.RateLaw.direction

      • wc_lang.core.Reference.type

      • wc_lang.core.SpeciesType.type

      • wc_lang.core.Submodel.algorithm

      • wc_lang.core.Taxon.rank

    This example illustrates how to validate prog_model:

    prog_model.validate()
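
    The style of checking described in this step, which collects every error rather than stopping at the first, can be sketched as follows. The record dictionaries, the EXPECTED_TYPES table, the enumeration values, and this validate function are assumptions made for the example; they are not wc_lang's validation code.

```python
from enum import Enum


class RateLawDirection(Enum):   # toy enumeration mirroring the names used above
    backward = 0
    forward = 1


# Hypothetical expectations: attribute name -> required type(s)
EXPECTED_TYPES = {'reversible': bool, 'coefficient': (int, float), 'year': int}


def validate(records):
    """Return a list of error messages; an empty list means no errors were found."""
    errors = []
    seen_ids = set()
    for record in records:
        rid = record.get('id')
        if rid in seen_ids:
            errors.append('id {!r} is defined more than once'.format(rid))
        seen_ids.add(rid)
        for attr, expected in EXPECTED_TYPES.items():
            if attr in record and not isinstance(record[attr], expected):
                errors.append('{}.{} has the wrong type'.format(rid, attr))
        direction = record.get('direction')
        if direction is not None and direction not in RateLawDirection.__members__:
            errors.append('{}.direction is not a valid RateLawDirection'.format(rid))
    return errors


errors = validate([
    {'id': 'r1', 'reversible': False},
    {'id': 'r1', 'reversible': 'no'},
    {'id': 'law_1', 'direction': 'sideways'},
])
```

    Here the second record triggers both a duplicate-id error and a type error, and the third triggers an enumeration error.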
    
  11. Compare and difference Models

    wc_lang provides methods that determine if two models are semantically equal and report any semantic differences between two models. The is_equal method determines if two models are semantically equal (the two models recursively have the same attribute values, ignoring the order of the attributes, which has no semantic meaning). The following code compares the semantic equality of model and model_from_tsv. Since model_from_tsv was generated by writing model to .tsv files, is_equal should return True:

    assert(model.is_equal(model_from_tsv))
    

    The difference method produces a textual description of the differences between two models. The following code excerpt prints the differences between model and model_from_tsv. Since they are equal, the differences should be the empty string:

    assert(model.difference(model_from_tsv) == '')
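
    Order-insensitive comparison can be sketched with nested dictionaries and lists standing in for models. The functions canonical, is_equal, and difference below are hypothetical simplifications and do not reflect how wc_lang compares obj_tables objects.

```python
def canonical(value):
    """Convert nested dicts and lists into a form that ignores ordering."""
    if isinstance(value, dict):
        return tuple(sorted((key, canonical(val)) for key, val in value.items()))
    if isinstance(value, (list, tuple, set)):
        return tuple(sorted((canonical(item) for item in value), key=repr))
    return value


def is_equal(a, b):
    """Return True if a and b hold the same values, regardless of order."""
    return canonical(a) == canonical(b)


def difference(a, b):
    """Describe top-level attributes whose values differ; '' if none do."""
    lines = []
    for key in sorted(set(a) | set(b)):
        if canonical(a.get(key)) != canonical(b.get(key)):
            lines.append('{}: {!r} != {!r}'.format(key, a.get(key), b.get(key)))
    return '\n'.join(lines)


model_a = {'id': 'm', 'species_types': [{'id': 'atp'}, {'id': 'adp'}]}
model_b = {'species_types': [{'id': 'adp'}, {'id': 'atp'}], 'id': 'm'}
model_c = {'id': 'm', 'species_types': [{'id': 'atp'}]}

same = is_equal(model_a, model_b)
report = difference(model_a, model_c)
```

    model_a and model_b differ only in ordering, so they compare equal, while model_c lacks a species type and produces a non-empty report.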
    
  12. Normalize model into a reproducible order to facilitate reproducible numerical simulations

    The attribute order has no semantic meaning in wc_lang. However, numerical simulation results derived from models described in wc_lang can be sensitive to the attribute order. To facilitate reproducible simulation results, wc_lang provides a normalize method that sorts models into a reproducible order.

    The following code excerpt will normalize model into a reproducible order:

    model.normalize()
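
    One way to picture normalization is as a recursive sort of every list of components by identifier. The sketch below operates on plain dictionaries and returns a new structure, which are both assumptions of this example; the sort keys that wc_lang actually uses may differ.

```python
def normalize(value):
    """Recursively sort dict keys and lists of components so ordering is reproducible."""
    if isinstance(value, dict):
        return {key: normalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [normalize(item) for item in value]
        # sort components by their 'id'; fall back to repr for plain values
        return sorted(items, key=lambda item: item['id'] if isinstance(item, dict) else repr(item))
    return value


# Hypothetical model data in an arbitrary order
model_data = {'species_types': [{'id': 'h2o'}, {'id': 'atp'}], 'id': 'm'}
normalized = normalize(model_data)
species_order = [species_type['id'] for species_type in normalized['species_types']]
```

    Two copies of the same model that were built in different orders normalize to identical structures, which makes downstream numerical results easier to reproduce.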
    
  13. Please see http://code.karrlab.org for documentation of the entire wc_lang API.