# 3. Tutorial - Getting Familiar With datanator’s Command Line Interface

## 3.1. Generate Template Doc

datanator can be run directly from the command line.

Let’s work through an example.

Create a new directory.

In that directory, run the following command:



$ datanator get-taxonomic-lineage 'Escherichia coli'

## 3.3. Entering Reactions Into datanator

Open the template document InputTemplate.xlsx. This document has two worksheets: Reactions and Metabolites.

Reactions worksheet:

| Reaction ID | Stoichiometry |
| --- | --- |
| A Reaction Name #1 (ATP + UMP <==> UDP + ADP) | ATP + UMP <==> UDP + ADP |
| A Reaction Name #2 (GMP + ATP <==> ADP + GDP) | GMP + ATP ==> ADP + GDP |

Look at the Reactions worksheet. There are two columns. The first is an identifier for the reaction. You can name this whatever you like as long as it is unique. A good suggestion is to use the reaction string itself (the template document does not follow this practice, in order to illustrate that the name can be anything). The second column is the stoichiometric string. Each metabolite used in the stoichiometric string needs to be structurally defined in the “Metabolites” worksheet.

Open the “Metabolites” worksheet:

| Compound ID | Structure |
| --- | --- |
| ATP | NC1=C2N=CN(C3OC(COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)C(O)C3O)C2=NC=N1 |
| UMP | OC1C(O)C(OC1COP([O-])([O-])=O)N1C=CC(=O)NC1=O |
| UDP | OC1C(O)C(OC1COP([O-])(=O)OP([O-])([O-])=O)N1C=CC(=O)NC1=O |
| AMP | NC1=C2N=CN(C3OC(COP([O-])([O-])=O)C(O)C3O)C2=NC=N1 |
| GMP | NC1=NC2=C(N=CN2C2OC(COP([O-])([O-])=O)C(O)C2O)C(=O)N1 |
| GDP | NC1=NC2=C(N=CN2C2OC(COP([O-])(=O)OP([O-])([O-])=O)C(O)C2O)C(=O)N1 |

The compound ID corresponds to the ID of the compound in the stoichiometric string. The structure is either an InChI or a SMILES string (SMILES in this example).

Let’s try adding a reaction. Say we want to add the reaction:

Adenosine 3',5'-bisphosphate + H2O ==> phosphate + AMP

The first step is to give each metabolite in the stoichiometric string an ID without spaces. So let’s change it to:

A-3-5-bisphosphate + H2O ==> phosphate + AMP

The second step is to add this reaction string to the second column of the “Reactions” worksheet. In the first column, give the reaction a distinct name.

| Reaction ID | Stoichiometry |
| --- | --- |
| A Reaction Name #1 (ATP + UMP <==> UDP + ADP) | ATP + UMP <==> UDP + ADP |
| A Reaction Name #2 (GMP + ATP <==> ADP + GDP) | GMP + ATP ==> ADP + GDP |
| A distinct name of your choosing | A-3-5-bisphosphate + H2O ==> phosphate + AMP |

The third step is to structurally define each metabolite in the reaction string. AMP already has a structure in the “Metabolites” worksheet, so we only need to define A-3-5-bisphosphate, H2O, and phosphate. The structural information is here:

| Compound ID | Structure |
| --- | --- |
| A-3-5-bisphosphate | NC1=C2N=CN(C3OC(COP([O-])([O-])=O)C(OP([O-])([O-])=O)C3O)C2=NC=N1 |
| H2O | O |
| phosphate | OP([O-])([O-])=O |

Now we need to add this information to the “Metabolites” worksheet. Open the “Metabolites” worksheet. Add the ID of each compound used in the stoichiometric string (for example, phosphate) to the first column, and add its structure to the second.

| Compound ID | Structure |
| --- | --- |
| ATP | NC1=C2N=CN(C3OC(COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)C(O)C3O)C2=NC=N1 |
| UMP | OC1C(O)C(OC1COP([O-])([O-])=O)N1C=CC(=O)NC1=O |
| UDP | OC1C(O)C(OC1COP([O-])(=O)OP([O-])([O-])=O)N1C=CC(=O)NC1=O |
| AMP | NC1=C2N=CN(C3OC(COP([O-])([O-])=O)C(O)C3O)C2=NC=N1 |
| GMP | NC1=NC2=C(N=CN2C2OC(COP([O-])([O-])=O)C(O)C2O)C(=O)N1 |
| GDP | NC1=NC2=C(N=CN2C2OC(COP([O-])(=O)OP([O-])([O-])=O)C(O)C2O)C(=O)N1 |
| A-3-5-bisphosphate | NC1=C2N=CN(C3OC(COP([O-])([O-])=O)C(OP([O-])([O-])=O)C3O)C2=NC=N1 |
| H2O | O |
| phosphate | OP([O-])([O-])=O |

Once again, run “get-kinetics”:

$ datanator get-kinetics InputTemplate.xlsx Results.xlsx 'Escherichia coli'


You should see results for the new reaction you entered.
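Before re-running “get-kinetics”, it can help to confirm that every compound ID used in the Reactions worksheet also has a row in the Metabolites worksheet. The following is a minimal Python sketch of that check. It is not part of datanator: the function names `metabolite_ids` and `undefined_metabolites` are hypothetical, the two dictionaries are in-memory stand-ins for the worksheets, and the parser assumes terms are separated by ` + ` with no stoichiometric coefficients.

```python
import re

# Matches the two arrow forms used in this tutorial: "==>" and "<==>".
ARROW = re.compile(r"\s*<?==>\s*")


def metabolite_ids(stoichiometry):
    """Return the set of compound IDs in a string such as 'A + B ==> C + D'.

    Assumption: IDs contain no spaces and carry no stoichiometric coefficients.
    """
    sides = ARROW.split(stoichiometry.strip())
    if len(sides) != 2:
        raise ValueError(f"expected exactly one arrow in {stoichiometry!r}")
    return {term.strip() for side in sides for term in side.split(" + ")}


def undefined_metabolites(reactions, metabolites):
    """Map each reaction ID to the sorted compound IDs missing from `metabolites`."""
    missing = {}
    for reaction_id, stoichiometry in reactions.items():
        gap = metabolite_ids(stoichiometry) - set(metabolites)
        if gap:
            missing[reaction_id] = sorted(gap)
    return missing


# Hypothetical stand-ins for the two worksheets:
# Reaction ID -> Stoichiometry, and Compound ID -> Structure.
reactions = {
    "A distinct name of your choosing": "A-3-5-bisphosphate + H2O ==> phosphate + AMP",
}
metabolites = {
    "AMP": "NC1=C2N=CN(C3OC(COP([O-])([O-])=O)C(O)C3O)C2=NC=N1",
}

print(undefined_metabolites(reactions, metabolites))
```

With only AMP defined, the check reports A-3-5-bisphosphate, H2O, and phosphate as undefined, which are the three compounds the third step adds to the Metabolites worksheet.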

## 3.4. Set Maximum Proximity Limit

The ideal kinetic data is information about the reaction being studied, collected from an experiment done in the species you are studying. However, you will often have to rely on kinetic data from different species. At a certain taxonomic distance, you might decide it’s better to collect data from similar reactions taken from experiments in more closely related organisms.

There are two dimensions of granularity, reaction variation and species variation, and the user can decide which data is preferred.

The user does this by setting the proximity limit with the --proxim-limit option.

Let’s try an example:

First, get taxonomic information about the organism you are studying. Run:

$ datanator get-taxonomic-lineage 'Escherichia coli'


You should see:

1: Escherichia coli
2: Escherichia
3: Enterobacteriaceae
4: Enterobacterales
5: Gammaproteobacteria
6: Proteobacteria
7: Bacteria
8: cellular organisms
9: root


These are the nodes you pass through as you start from your organism and climb to the top of the taxonomic tree. Each number corresponds to a node.

So let’s say you are studying Escherichia coli. Maybe you think that anything outside the phylum Proteobacteria is too distantly related to be useful. In that case, run the “get-kinetics” command with the optional argument --proxim-limit. The number given after --proxim-limit is the highest node that will be considered useful. Since we have chosen Proteobacteria, that number is 6.

So, run:

$ datanator get-kinetics InputTemplate.xlsx Results.xlsx 'Escherichia coli' --proxim-limit 6
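The node number can also be looked up programmatically from the lineage listing instead of being counted by hand. The sketch below is illustrative only and does not call datanator: `parse_lineage` and `get_kinetics_command` are hypothetical helper names, and `LINEAGE_OUTPUT` is simply the listing shown above pasted in as a string. It assumes each line has the form `number: name`.

```python
import shlex

# The listing printed by get-taxonomic-lineage in this tutorial, pasted as text.
LINEAGE_OUTPUT = """\
1: Escherichia coli
2: Escherichia
3: Enterobacteriaceae
4: Enterobacterales
5: Gammaproteobacteria
6: Proteobacteria
7: Bacteria
8: cellular organisms
9: root
"""


def parse_lineage(output):
    """Parse lines of the form '6: Proteobacteria' into {name: node_number}."""
    nodes = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        number, _, name = line.partition(":")
        nodes[name.strip()] = int(number)
    return nodes


def get_kinetics_command(input_file, output_file, species, proxim_limit):
    """Build the get-kinetics argument list used in this tutorial."""
    return [
        "datanator", "get-kinetics",
        input_file, output_file, species,
        "--proxim-limit", str(proxim_limit),
    ]


# Highest node still considered useful: the phylum Proteobacteria.
limit = parse_lineage(LINEAGE_OUTPUT)["Proteobacteria"]
command = get_kinetics_command(
    "InputTemplate.xlsx", "Results.xlsx", "Escherichia coli", limit
)
print(shlex.join(command))
```

Choosing a different cutoff, such as Gammaproteobacteria, only changes the lookup key; the rest of the command stays the same.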