10.1. bpforms package

10.1.2. Submodules

10.1.3. bpforms.__main__ module

bpforms command line interface

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-01-31
Copyright:2019, Karr Lab
License:MIT
class bpforms.__main__.App(label=None, **kw)[source]

Bases: cement.core.foundation.App

Command line application

class Meta[source]

Bases: object

base_controller = 'base'[source]
handlers = [<class 'bpforms.__main__.BaseController'>, <class 'bpforms.__main__.ValidateController'>, <class 'bpforms.__main__.GetPropertiesController'>, <class 'bpforms.__main__.GetMajorMicroSpeciesController'>, <class 'bpforms.__main__.BuildAlphabetsController'>, <class 'bpforms.__main__.VizAlphabetController'>][source]
label = 'bpforms'[source]
class bpforms.__main__.BaseController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Base controller for command line application

class Meta[source]

Bases: object

arguments = [(['-v', '--version'], {'action': 'version', 'version': '0.0.5'})][source]
description = 'bpforms'[source]
help = 'bpforms'[source]
label = 'base'[source]
class bpforms.__main__.BuildAlphabetsController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database

class Meta[source]

Bases: object

arguments = [(['--ph'], {'type': <class 'float'>, 'default': 7.4, 'help': 'pH at which calculate major protonation state of each monomeric form'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'}), (['--max-monomers'], {'type': <class 'float'>, 'default': inf, 'help': 'Maximum number of monomeric forms to build. Used for testing'}), (['--alphabet'], {'type': <class 'str'>, 'default': None, 'dest': 'alphabets', 'action': 'append', 'help': 'Id of alphabet to build. Defualt: build all alphabets'})][source]
description = 'Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database'[source]
help = 'Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database'[source]
label = 'build-alphabets'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
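
The argument list above can be read as a set of parser declarations. The following is a minimal sketch that reproduces those declarations with the standard-library argparse module instead of cement, to show how the defaults behave. The alphabet ids 'dna' and 'rna' are illustrative assumptions, not values confirmed by this page.

```python
import argparse
import math

# Stand-in parser; the real command is a cement controller, not plain argparse.
parser = argparse.ArgumentParser(prog='build-alphabets')
parser.add_argument('--ph', type=float, default=7.4)
parser.add_argument('--major-tautomer', action='store_true', default=False)
parser.add_argument('--dearomatize', action='store_true', default=False)
# The type is float so that the default can be infinity (no limit).
parser.add_argument('--max-monomers', type=float, default=float('inf'))
# action='append' collects repeated options into a list stored in `alphabets`.
parser.add_argument('--alphabet', type=str, default=None,
                    dest='alphabets', action='append')

# No options: alphabets stays None, which the help text describes as "build all".
defaults = parser.parse_args([])

# Hypothetical alphabet ids, used only to show how repeated options accumulate.
chosen = parser.parse_args(['--alphabet', 'dna', '--alphabet', 'rna',
                            '--max-monomers', '10'])
```

Because --alphabet uses the append action, an unset option yields None rather than an empty list, so code that consumes it has to treat None as "all alphabets".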
class bpforms.__main__.GetMajorMicroSpeciesController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Calculate the major protonation and tautomerization state of a biopolymer form at a specific pH

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'}), (['ph'], {'type': <class 'float'>, 'help': 'pH'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'})][source]
description = 'Calculate the major protonation and tautomerization state of a biopolymer form to a specific pH'[source]
help = 'Calculate the major protonation and tautomerization state of a biopolymer form to a specific pH'[source]
label = 'get-major-micro-species'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.GetPropertiesController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Calculate physical properties such as length, chemical formula, molecular weight, and charge

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'}), (['--ph'], {'default': None, 'type': <class 'float'>, 'help': 'pH at which calculate major protonation state of each monomeric form'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'})][source]
description = 'Calculate physical properties such as length, chemical formula, molecular weight, and charge'[source]
help = 'Calculate physical properties such as length, chemical formula, molecular weight, and charge'[source]
label = 'get-properties'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.ValidateController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Validate a biopolymer form

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'})][source]
description = 'Validate a biopolymer form'[source]
help = 'Validate a biopolymer form'[source]
label = 'validate'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.VizAlphabetController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Visualize an alphabet

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['path'], {'type': <class 'str'>, 'help': 'Path to save visualization of alphabet'})][source]
description = 'Visualize an alphabet'[source]
help = 'Visualize an alphabet'[source]
label = 'viz-alphabet'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
bpforms.__main__.main()[source]
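
Each controller above is stacked on 'base' with stacked_type 'nested', so its label acts as a sub-command of bpforms. The sketch below mirrors two of those sub-commands with standard-library argparse sub-parsers. It is not the cement-based implementation, and the alphabet id 'dna' and the sequence 'ACGT' are illustrative assumptions.

```python
import argparse


def build_parser():
    """Build a stand-in parser with two of the documented sub-commands."""
    parser = argparse.ArgumentParser(prog='bpforms')
    parser.add_argument('-v', '--version', action='version', version='0.0.5')
    commands = parser.add_subparsers(dest='command')

    validate = commands.add_parser('validate', help='Validate a biopolymer form')
    props = commands.add_parser(
        'get-properties',
        help='Calculate physical properties such as length, chemical formula, '
             'molecular weight, and charge')

    # Both sub-commands share the same positional arguments and --circular flag.
    for sub in (validate, props):
        sub.add_argument('alphabet', type=str, help='Biopolymer alphabet')
        sub.add_argument('seq', type=str, help='Sequence of monomeric forms')
        sub.add_argument('--circular', action='store_true', default=False,
                         help='Biopolymer circularity')

    # Options documented only for get-properties.
    props.add_argument('--ph', type=float, default=None)
    props.add_argument('--major-tautomer', action='store_true', default=False)
    props.add_argument('--dearomatize', action='store_true', default=False)
    return parser


# Hypothetical invocation; 'dna' and 'ACGT' are placeholder values.
args = build_parser().parse_args(
    ['get-properties', 'dna', 'ACGT', '--ph', '7.4', '--circular'])
```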

10.1.4. bpforms.core module

Classes to represent modified forms of DNA, RNA, and proteins

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-01-31
Copyright:2019, Karr Lab
License:MIT
class bpforms.core.Alphabet(id=None, name=None, description=None, monomers=None)[source]

Bases: object

Alphabet for monomeric forms

id[source]

id

Type:str
name[source]

name

Type:str
description[source]

description

Type:str
monomers[source]

monomeric forms

Type:dict
from_dict(dict)[source]

Create alphabet from a dictionary representation

Parameters:dict (dict) – dictionary representation of alphabet
Returns:alphabet
Return type:Alphabet
from_yaml(path)[source]

Read alphabet from YAML file

Parameters:path (str) – path to YAML file which defines alphabet
Returns:alphabet
Return type:Alphabet
get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Calculate the major protonation and tautomerization state of each monomeric form

Parameters:
  • ph (float) – pH
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
get_monomer_code(monomer)[source]

Get the code for a monomeric form in the alphabet

Parameters:monomer (Monomer) – monomeric form
Returns:code for monomeric form
Return type:str
Raises:ValueError – if monomeric form is not in alphabet
is_equal(other)[source]

Determine if two alphabets are semantically equal

Parameters:other (Alphabet) – other alphabet
Returns:True, if the alphabets are semantically equal
Return type:bool
monomers[source]

Get the monomeric forms

Returns:monomeric forms
Return type:MonomerDict
to_dict()[source]

Get dictionary representation of alphabet

Returns:dictionary representation of alphabet
Return type:dict
to_yaml(path)[source]

Save alphabet to YAML file

Parameters:path (str) – path to save alphabet in YAML format
class bpforms.core.AlphabetBuilder(_max_monomers=inf)[source]

Bases: abc.ABC

Builder for alphabets

_max_monomers[source]

maximum number of monomeric forms to build; used to limit length of tests

Type:float
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

get_major_micro_species(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Get major microspecies for monomeric forms in alphabet

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
run(ph=None, major_tautomer=False, dearomatize=False, path=None)[source]

Build alphabet and, optionally, save to YAML file

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
  • path (str, optional) – path to save alphabet
Returns:

alphabet

Return type:

Alphabet

save(alphabet, path)[source]

Save alphabet to YAML file

Parameters:
  • alphabet (Alphabet) – alphabet
  • path (str) – path to save alphabet
class bpforms.core.Atom(molecule, element, position=None, charge=0, monomer=None)[source]

Bases: object

An atom in a compound or bond

molecule[source]

type of parent molecule

Type:type
element[source]

code for the element (e.g. ‘H’)

Type:str
position[source]

position of the atom within the molecule, which should use canonical SMILES atom numbers

Type:int
charge[source]

charge of the atom

Type:int
monomer[source]

index of parent monomeric form within sequence

Type:int
charge[source]

Get the charge

Returns:charge
Return type:int
element[source]

Get the element

Returns:element
Return type:str
from_dict(dict)[source]

Load from dictionary representation

Parameters:dict (dict) – dictionary representation
Returns:atom
Return type:Atom
is_equal(other)[source]

Determine if two atoms are semantically equal

Parameters:other (Atom) – other atom
Returns:True if the atoms are semantically equal
Return type:bool
molecule[source]

Get type of parent molecule

Returns:type of parent molecule
Return type:type
monomer[source]

Get the index of the parent monomer within the sequence

Returns:index of the parent monomer within the sequence
Return type:int
position[source]

Get the position

Returns:position
Return type:int
to_dict()[source]

Get dictionary representation

Returns:dictionary representation
Return type:dict
class bpforms.core.AtomList(atoms=None)[source]

Bases: list

List of atoms

__setitem__(slice, atom)[source]

Set atom(s) at slice

Parameters:
  • slice (int or slice) – position(s) to set atom
  • atom (Atom or AtomList) – atom or atoms
append(atom)[source]

Add an atom

Parameters:atom (Atom) – atom
Raises:ValueError – if the atom is not an instance of Atom
extend(atoms)[source]

Add a list of atoms

Parameters:atoms (iterable of Atom) – iterable of atoms
from_list(list)[source]

Load from list representation

Parameters:list (list) – list representation
Returns:atom list
Return type:AtomList
insert(i, atom)[source]

Insert an atom at a position

Parameters:
  • i (int) – position to insert atom
  • atom (Atom) – atom
is_equal(other)[source]

Determine if two lists of atoms are semantically equal

Parameters:other (AtomList) – other list of atoms
Returns:True, if the lists of atoms are semantically equal
Return type:bool
to_list()[source]

Get list representation

Returns:list representation
Return type:list
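
AtomList behaves like a list that only accepts Atom instances. The pattern can be sketched as a list subclass that routes every mutating method through one type check. The classes below are simplified stand-ins written for illustration (TypedAtomList and this reduced Atom are assumptions), not the library's source.

```python
class Atom:
    """Reduced stand-in for bpforms.core.Atom with only a few fields."""

    def __init__(self, element, position=None, charge=0):
        self.element = element
        self.position = position
        self.charge = charge


class TypedAtomList(list):
    """Hypothetical list that rejects anything that is not an Atom."""

    def __init__(self, atoms=None):
        super().__init__()
        if atoms is not None:
            self.extend(atoms)

    @staticmethod
    def _check(atom):
        # Single place where the element type is enforced.
        if not isinstance(atom, Atom):
            raise ValueError('`atom` must be an instance of `Atom`')
        return atom

    def append(self, atom):
        super().append(self._check(atom))

    def extend(self, atoms):
        for atom in atoms:
            self.append(atom)

    def insert(self, i, atom):
        super().insert(i, self._check(atom))

    def __setitem__(self, index, atom):
        if isinstance(index, slice):
            # Slice assignment takes an iterable of atoms.
            super().__setitem__(index, [self._check(a) for a in atom])
        else:
            super().__setitem__(index, self._check(atom))


atoms = TypedAtomList([Atom('H', position=1)])
atoms.append(Atom('O', position=2, charge=-1))
```

Appending a plain string such as 'H' to this list raises ValueError, which matches the behavior documented for append.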
class bpforms.core.Backbone(structure=None, monomer_bond_atoms=None, monomer_displaced_atoms=None)[source]

Bases: object

Backbone of a monomeric form

structure[source]

chemical structure

Type:openbabel.OBMol
monomer_bond_atoms[source]

atoms from backbone that bond to monomeric form

Type:AtomList
monomer_displaced_atoms[source]

atoms from backbone displaced by bond to monomeric form

Type:AtomList
export(format, options=())[source]

Export structure to format

Parameters:
  • format (str) – format
  • options (list of str, optional) – export options
Returns:

format representation of structure

Return type:

str

get_charge()[source]

Get the charge

Returns:charge
Return type:int
get_formula()[source]

Get the formula

Returns:formula
Return type:EmpiricalFormula
get_mol_wt()[source]

Get the molecular weight

Returns:molecular weight
Return type:float
is_equal(other)[source]

Determine if two backbones are semantically equal

Parameters:other (Backbone) – other backbone
Returns:True if the backbones are semantically equal
Return type:bool
monomer_bond_atoms[source]

Get the backbone bond atoms

Returns:backbone bond atoms
Return type:AtomList
monomer_displaced_atoms[source]

Get the backbone displaced atoms

Returns:backbone displaced atoms
Return type:AtomList
structure[source]

Get the structure

Returns:structure
Return type:openbabel.OBMol
class bpforms.core.Bond(l_bond_atoms=None, r_bond_atoms=None, l_displaced_atoms=None, r_displaced_atoms=None)[source]

Bases: object

Bond between monomeric forms

l_bond_atoms[source]

atoms from left monomeric form that bond with right monomeric form

Type:AtomList
r_bond_atoms[source]

atoms from right monomeric form that bond with left monomeric form

Type:AtomList
l_displaced_atoms[source]

atoms from left monomeric form displaced by bond

Type:AtomList
r_displaced_atoms[source]

atoms from right monomeric form displaced by bond

Type:AtomList
__str__()[source]

Generate string representation of bond

Returns:string representation of bond
Return type:str
get_charge(none_position=True)[source]

Get the charge

Parameters:none_position (bool, optional) – include atoms whose position is None
Returns:charge
Return type:int
get_formula(none_position=True)[source]

Get the formula

Parameters:none_position (bool, optional) – include atoms whose position is None
Returns:formula
Return type:EmpiricalFormula
get_mol_wt(none_position=True)[source]

Get the molecular weight

Parameters:none_position (bool, optional) – include atoms whose position is None
Returns:molecular weight
Return type:float
is_equal(other)[source]

Determine if two bonds are semantically equal

Parameters:other (Bond) – other bond
Returns:True if the bonds are semantically equal
Return type:bool
l_bond_atoms[source]

Get the left bond atoms

Returns:left bond atoms
Return type:AtomList
l_displaced_atoms[source]

Get the left displaced atoms

Returns:left displaced atoms
Return type:AtomList
r_bond_atoms[source]

Get the right bond atoms

Returns:right bond atoms
Return type:AtomList
r_displaced_atoms[source]

Get the right displaced atoms

Returns:right displaced atoms
Return type:AtomList
class bpforms.core.BondSet[source]

Bases: set

Set of bonds

add(bond)[source]

Add a bond

Parameters:bond (Bond) – bond
Raises:ValueError – if the bond is not an instance of Bond
is_equal(other)[source]

Check if two sets of bonds are semantically equal

Parameters:other (BondSet) – other set of bonds
Returns:True, if the bond sets are semantically equal
Return type:bool
symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters:other (BondSet) – other set of bonds
update(bonds)[source]

Add a set of bonds

Parameters:bonds (iterable of Bond) – bonds
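
BondSet derives from set, so symmetric_difference_update is described in the same terms as the built-in method. The short sketch below uses a plain set of strings as stand-ins for Bond objects to show which elements survive. The type checking that BondSet adds is not reproduced here.

```python
# Strings stand in for Bond instances; the names are placeholders.
left = {'bond-1', 'bond-2', 'bond-3'}
right = {'bond-3', 'bond-4'}

# Elements in both sets are removed; elements only in `right` are added.
left.symmetric_difference_update(right)
```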
class bpforms.core.BpForm(seq=None, alphabet=None, backbone=None, bond=None, circular=False, crosslinks=None)[source]

Bases: object

Biopolymer form

seq[source]

sequence of monomeric forms of the biopolymer

Type:MonomerSequence
alphabet[source]

alphabet of monomeric forms

Type:Alphabet
backbone[source]

backbone that connects monomeric forms

Type:Backbone
bond[source]

bonds between (backbones of) monomeric forms

Type:Bond
circular[source]

if True, indicates that the biopolymer is circular

Type:bool
crosslinks[source]

crosslinking intrachain bonds

Type:BondSet
features[source]

set of features

Type:BpFormFeatureSet
_parser[source]

parser

Type:lark.Lark
DEFAULT_FASTA_CODE = '?'[source]
__contains__(monomer)[source]

Determine if a monomeric form is in the biopolymer form

Parameters:monomer (Monomer) – monomeric form
Returns:True if the monomeric form is in the sequence
Return type:bool
__delitem__(slice)[source]

Delete monomeric form(s) at slice

Parameters:slice (int or slice) – position(s)
__getitem__(slice)[source]

Get monomeric form(s) at slice

Parameters:slice (int or slice) – position(s)
Returns:monomeric form(s)
Return type:Monomer or Monomers
__iter__()[source]

Get iterator over sequence of monomeric forms

Returns:iterator of monomeric forms
Return type:iterator of Monomer
__len__()[source]

Get the length of the sequence of the form

Returns:length
Return type:int
__reversed__()[source]

Get reverse iterator over sequence of monomeric forms

Returns:iterator of monomeric forms
Return type:iterator of Monomer
__setitem__(slice, monomer)[source]

Set monomeric form(s) at slice

Parameters:
  • slice (int or slice) – position(s)
  • monomer (Monomer or Monomers) – monomeric forms(s)
__str__()[source]

Get a string representation of the biopolymer form

Returns:string representation of the biopolymer form
Return type:str
alphabet[source]

Get the alphabet

Returns:alphabet
Return type:Alphabet
backbone[source]

Get the backbones

Returns:backbones
Return type:Backbone
bond[source]

Get the bonds

Returns:bonds
Return type:Bond
can_monomer_bond_left(monomer)[source]

Check if monomeric form can bond to the left

Parameters:monomer (Monomer) – monomeric form
Returns:True, if the monomeric form can bond to the left
Return type:bool
can_monomer_bond_right(monomer)[source]

Check if monomeric form can bond to the right

Parameters:monomer (Monomer) – monomeric form
Returns:True, if the monomeric form can bond to the right
Return type:bool
circular[source]

Get the circularity

Returns:circularity
Return type:bool
crosslinks[source]

Get the crosslinking intrachain bonds

Returns:crosslinking intrachain bonds
Return type:BondSet
export(format, include_all_hydrogens=False, options=())[source]

Export structure to format

Parameters:
  • format (str) – format
  • include_all_hydrogens (bool, optional) – if True, explicitly include all hydrogens
  • options (list of str, optional) – export options
Returns:

format representation of structure

Return type:

str

features[source]

Get the features

Returns:features
Return type:BpFormFeatureSet
from_str(string)[source]

Create biopolymer form from its string representation

Parameters:string (str) – string representation of the biopolymer
Returns:biopolymer form
Return type:BpForm
get_canonical_seq(monomer_codes=None)[source]

Get IUPAC/IUBMB representation of a polymer with bases represented by the character codes of their parent monomers (e.g. methyl-2-adenosine is represented by ‘A’)

Parameters:monomer_codes (dict, optional) – dictionary that maps monomers to their codes
Returns:IUPAC/IUBMB representation of a polymer
Return type:str
get_charge()[source]

Get the charge

Returns:charge
Return type:int
get_formula()[source]

Get the chemical formula

Returns:chemical formula
Return type:EmpiricalFormula
get_image(monomer_color=0, backbone_color=16711680, left_right_bond_color=65280, crosslink_bond_color=255, include_all_hydrogens=True, show_atom_nums=False, width=200, height=200, image_format='svg', include_xml_header=True)[source]

Get image

Parameters:
  • monomer_color (int, optional) – color to paint atoms involved in monomeric forms
  • backbone_color (int, optional) – color to paint atoms involved in backbones
  • left_right_bond_color (int, optional) – color to paint atoms involved in bonds with the monomeric forms to the left and right
  • crosslink_bond_color (int, optional) – color to paint atoms involved in crosslinks
  • include_all_hydrogens (bool, optional) – if True, show all hydrogens
  • show_atom_nums (bool, optional) – if True, show the numbers of the atoms
  • width (int, optional) – width in pixels
  • height (int, optional) – height in pixels
  • image_format (str, optional) – format of generated image {emf, eps, jpeg, msbmp, pdf, png, or svg}
  • include_xml_header (bool, optional) – if True, include XML header at the beginning of the SVG
Returns:

image

Return type:

object
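
The default color arguments of get_image are plain integers. Their values are consistent with packed 24-bit RGB (0xRRGGBB), although this page does not state the encoding, so treat that reading as an assumption. Under that assumption, the helper below (unpack_rgb is a hypothetical name) decodes the defaults.

```python
def unpack_rgb(color):
    """Split a packed 0xRRGGBB integer into (red, green, blue) components."""
    return ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF)


# Default values copied from the get_image signature.
default_colors = {
    'monomer_color': 0,
    'backbone_color': 16711680,
    'left_right_bond_color': 65280,
    'crosslink_bond_color': 255,
}

decoded = {name: unpack_rgb(value) for name, value in default_colors.items()}
```

Read this way, the defaults correspond to black monomers, a red backbone, green left and right bonds, and blue crosslinks.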

get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Get the major protonation and tautomerization state

Parameters:
  • ph (float) – pH
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

major protonation and tautomerization state

Return type:

openbabel.OBMol

get_mol_wt()[source]

Get the molecular weight

Returns:molecular weight
Return type:float
get_monomer_counts()[source]

Get the frequency of each monomeric form within the biopolymer

Returns:dictionary that maps monomeric forms to their counts
Return type:dict
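
The returned dictionary maps each monomeric form to the number of times it occurs. As a minimal sketch of that shape, the snippet below counts single-letter strings that stand in for Monomer objects. Using collections.Counter is an illustrative choice and may differ from how the library computes the counts.

```python
from collections import Counter

# Strings stand in for Monomer instances in a sequence.
seq = ['A', 'C', 'A', 'G', 'A']

# Counter tallies occurrences; dict() gives the plain-dictionary shape.
counts = dict(Counter(seq))
```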
get_structure(include_all_hydrogens=False)[source]

Get an Open Babel molecule of the structure

Parameters:include_all_hydrogens (bool, optional) – if True, explicitly include all hydrogens
Returns:
  • openbabel.OBMol: Open Babel molecule of the structure
  • dict of dict: dictionary which maps indices (1-based) of monomeric forms to dictionaries which map types of components of monomeric forms (‘monomer’ or ‘backbone’) to dictionaries which map indices (1-based) of atoms to atoms (instances of openbabel.OBAtom)
Return type:tuple
is_equal(other)[source]

Check if two biopolymer forms are semantically equal

Parameters:other (BpForm) – another biopolymer form
Returns:True, if the objects have the same structure
Return type:bool
seq[source]

Get the sequence of monomeric forms

Returns:sequence of monomeric forms
Return type:MonomerSequence
validate()[source]

Check that the biopolymer form is valid and return any errors

  • Check that monomeric forms \(1 \ldots L-1\) can bond to the right (their right bonding attributes are set)
  • Check that monomeric forms \(2 \ldots L\) can bond to the left (their left bonding attributes are set)
  • No atom is involved in multiple bonds
Returns:list of errors, if any
Return type:list of str
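
The first two checks listed above can be sketched with a reduced stand-in for a monomeric form. StubMonomer, its two boolean fields, and check_adjacent_bonds are hypothetical names introduced for illustration. The sketch covers linear forms only and omits the check that no atom is involved in multiple bonds.

```python
from dataclasses import dataclass


@dataclass
class StubMonomer:
    """Stand-in that records only whether bonding attributes are set."""
    can_bond_left: bool = True
    can_bond_right: bool = True


def check_adjacent_bonds(seq):
    """Return error messages for a linear sequence of stub monomers."""
    errors = []
    length = len(seq)
    for position, monomer in enumerate(seq, start=1):
        # Forms 1 ... L-1 must be able to bond to the right.
        if position < length and not monomer.can_bond_right:
            errors.append(
                'Monomeric form {} cannot bond to the right'.format(position))
        # Forms 2 ... L must be able to bond to the left.
        if position > 1 and not monomer.can_bond_left:
            errors.append(
                'Monomeric form {} cannot bond to the left'.format(position))
    return errors


seq = [
    StubMonomer(can_bond_left=False),   # first form: left bond is not needed
    StubMonomer(can_bond_right=False),  # middle form: right bond is needed
    StubMonomer(can_bond_right=False),  # last form: right bond is not needed
]
errors = check_adjacent_bonds(seq)
```

Only the middle form produces an error here, because the first form never bonds to the left and the last form never bonds to the right in a linear biopolymer.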
class bpforms.core.BpFormFeature(form, start_position, end_position)[source]

Bases: object

A region (start and end positions) of a BpForm

form[source]

biopolymer form

Type:BpForm
start_position[source]

start position (1-based)

Type:int
end_position[source]

end position (1-based)

Type:int
end_position[source]

Get the end position

Returns:end position
Return type:int
form[source]

Get the biopolymer form

Returns:biopolymer form
Return type:BpForm
start_position[source]

Get the start position

Returns:start position
Return type:int
class bpforms.core.BpFormFeatureSet(form)[source]

Bases: set

Set of features

form[source]

form

Type:BpForm
add(feature)[source]

Add a feature

Parameters:feature (BpFormFeature) – feature
Raises:ValueError – if the feature is not an instance of BpFormFeature
form[source]

Get the biopolymer form

Returns:biopolymer form
Return type:BpForm
remove(feature)[source]

Remove a feature

Parameters:feature (BpFormFeature) – feature
symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters:other (BpFormFeatureSet) – other set of features
update(features)[source]

Add a set of features

Parameters:features (iterable of BpFormFeature) – features
exception bpforms.core.BpFormsWarning[source]

Bases: UserWarning

BpForms warning

class bpforms.core.Identifier(ns, id)[source]

Bases: object

An identifier in a namespace for an external database

ns[source]

namespace

Type:str
id[source]

id in namespace

Type:str
__eq__(other)[source]

Check if two identifiers are semantically equal

Parameters:other (Identifier) – another identifier
Returns:True, if the identifiers are semantically equal
Return type:bool
__hash__()[source]

Generate a hash

Returns:hash
Return type:int
id[source]

Get the id

Returns:id
Return type:str
ns[source]

Get the namespace

Returns:namespace
Return type:str
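
Defining __eq__ and __hash__ together is what lets identifiers with the same namespace and id collapse to one entry in a set such as IdentifierSet. The class below is a simplified stand-in written to show that interaction, not the library's source, and the namespace and id values are illustrative.

```python
class Identifier:
    """Stand-in identifier compared and hashed by (namespace, id)."""

    def __init__(self, ns, id):
        self.ns = ns
        self.id = id

    def __eq__(self, other):
        return (isinstance(other, Identifier)
                and self.ns == other.ns
                and self.id == other.id)

    def __hash__(self):
        # Equal identifiers must produce equal hashes.
        return hash((self.ns, self.id))


# Illustrative values; two entries are semantically equal.
ids = {
    Identifier('chebi', 'CHEBI:15377'),
    Identifier('chebi', 'CHEBI:15377'),
    Identifier('pubchem.compound', '962'),
}
```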
class bpforms.core.IdentifierSet(identifiers=None)[source]

Bases: set

Set of identifiers

add(identifier)[source]

Add an identifier

Parameters:identifier (Identifier) – identifier
Raises:ValueError – if the identifier is not an instance of Identifier
symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters:other (IdentifierSet) – other set of identifiers
update(identifiers)[source]

Add a set of identifiers

Parameters:identifiers (iterable of Identifier) – identifiers
class bpforms.core.Monomer(id=None, name=None, synonyms=None, identifiers=None, structure=None, delta_mass=None, delta_charge=None, start_position=None, end_position=None, monomers_position=None, base_monomers=None, backbone_bond_atoms=None, backbone_displaced_atoms=None, r_bond_atoms=None, l_bond_atoms=None, r_displaced_atoms=None, l_displaced_atoms=None, comments=None)[source]

Bases: object

A monomeric form in a biopolymer

id[source]

id

Type:str
name[source]

name

Type:str
synonyms[source]

synonyms

Type:set of str
identifiers[source]

identifiers in namespaces for external databases

Type:set of Identifier, optional
structure[source]

chemical structure

Type:openbabel.OBMol
delta_mass[source]

additional mass (Dalton) relative to structure

Type:float
delta_charge[source]

additional charge relative to structure

Type:int
start_position[source]

start position of the range of uncertainty in the location of the monomeric form

Type:int
end_position[source]

end position of the range of uncertainty in the location of the monomeric form

Type:int
monomers_position[source]

originating monomers within start_position to end_position where the monomeric form may be located

Type:set of Monomer
base_monomers[source]

monomers which this monomeric form is derived from

Type:set of Monomer
backbone_bond_atoms[source]

atoms from monomeric form that bond to backbone

Type:AtomList
backbone_displaced_atoms[source]

atoms from monomeric form displaced by bond to backbone

Type:AtomList
r_bond_atoms[source]

atoms that bond with right/succeeding/following/forward monomeric form

Type:AtomList
l_bond_atoms[source]

atoms that bond with left/preceding/previous/backward monomeric form

Type:AtomList
r_displaced_atoms[source]

atoms displaced by bond with right/succeeding/following/forward monomeric form

Type:AtomList
l_displaced_atoms[source]

atoms displaced by bond with left/preceding/previous/backward monomeric form

Type:AtomList
comments[source]

comments

Type:str
IMAGE_URL_PATTERN = 'https://cactus.nci.nih.gov/chemical/structure/{}/image?format=gif&bgcolor=transparent&antialiasing=0'[source]
__str__(alphabet=None)[source]

Get a string representation of the monomeric form

Parameters:alphabet (Alphabet, optional) – alphabet
Returns:string representation of the monomeric form
Return type:str
backbone_bond_atoms[source]

Get the atoms from the monomeric form that bond to backbone

Returns:atoms from the monomeric form that bond to backbone
Return type:AtomList
backbone_displaced_atoms[source]

Get the atoms from the monomeric form displaced by the bond to the backbone

Returns:atoms from the monomeric form displaced by the bond to the backbone
Return type:AtomList
base_monomers[source]

Get base monomeric forms

Returns:base monomeric forms
Return type:set of Monomer
comments[source]

Get comments

Returns:comments
Return type:str
delta_charge[source]

Get extra charge

Returns:extra charge
Return type:int
delta_mass[source]

Get extra mass

Returns:extra mass
Return type:float
end_position[source]

Get end position

Returns:end position
Return type:int
export(format, options=())[source]

Export structure to format

Parameters:
  • format (str) – format
  • options (list of str, optional) – export options
Returns:

format representation of structure

Return type:

str

from_dict(dict, alphabet=None)[source]

Create a monomeric form from a dictionary representation

Parameters:
  • dict (dict) – dictionary representation of the monomeric form
  • alphabet (Alphabet, optional) – alphabet
Returns:

monomeric form

Return type:

Monomer

get_canonical_code(monomer_codes, default_code='?')[source]

Get IUPAC/IUBMB representation of a monomeric form using the character code of its parent monomer (e.g. ‘methyl-2-adenosine’ is represented by ‘A’)

Parameters:
  • monomer_codes (dict) – dictionary that maps monomeric forms to codes
  • default_code (str) – default code
Returns:

IUPAC/IUBMB representation of monomeric form

Return type:

str
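
One way to read get_canonical_code is as a walk from a modified form back to its root parent, followed by a lookup of that parent's code. The sketch below implements that reading with a stand-in class. The rule that a form with several distinct roots falls back to default_code is an assumption made for illustration, and StubMonomer is a hypothetical name.

```python
class StubMonomer:
    """Stand-in monomer that tracks only its name and base monomers."""

    def __init__(self, name, base_monomers=()):
        self.name = name
        self.base_monomers = set(base_monomers)

    def get_root_monomers(self):
        """Follow base monomers until forms with no bases are reached."""
        if not self.base_monomers:
            return {self}
        roots = set()
        for base in self.base_monomers:
            roots |= base.get_root_monomers()
        return roots


def get_canonical_code(monomer, monomer_codes, default_code='?'):
    """Return the code of the single root monomer, else the default code."""
    roots = monomer.get_root_monomers()
    if len(roots) == 1:
        root = next(iter(roots))
        return monomer_codes.get(root, default_code)
    # Assumed behavior: ambiguous ancestry maps to the default code.
    return default_code


adenosine = StubMonomer('adenosine')
m2a = StubMonomer('methyl-2-adenosine', base_monomers=[adenosine])
unknown = StubMonomer('unregistered form')
codes = {adenosine: 'A'}

code_for_m2a = get_canonical_code(m2a, codes)
code_for_unknown = get_canonical_code(unknown, codes)
```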

get_charge()[source]

Get the charge

Returns:charge
Return type:int
get_formula()[source]

Get the chemical formula

Returns:chemical formula
Return type:EmpiricalFormula
get_image(bond_label='', displaced_label='', bond_opacity=255, displaced_opacity=63, backbone_bond_color=16711680, left_bond_color=65280, right_bond_color=255, include_all_hydrogens=True, show_atom_nums=False, width=200, height=200, image_format='svg', include_xml_header=True)[source]

Get image

Parameters:
  • bond_label (str, optional) – label for atoms involved in bonds
  • displaced_label (str, optional) – labels for atoms displaced by bond formation
  • bond_opacity (int, optional) – opacity of atoms involved in bonds
  • displaced_opacity (int, optional) – opacity of atoms displaced by bond formation
  • backbone_bond_color (int, optional) – color to paint atoms involved in bond with backbone
  • left_bond_color (int, optional) – color to paint atoms involved in bond with monomeric form to left
  • right_bond_color (int, optional) – color to paint atoms involved in bond with monomeric form to right
  • include_all_hydrogens (bool, optional) – if True, show all hydrogens
  • show_atom_nums (bool, optional) – if True, show the numbers of the atoms
  • width (int, optional) – width in pixels
  • height (int, optional) – height in pixels
  • image_format (str, optional) – format of generated image {emf, eps, jpeg, msbmp, pdf, png, or svg}
  • include_xml_header (bool, optional) – if True, include XML header at the beginning of the SVG
Returns:

image

Return type:

object

get_image_url()[source]

Get URL for image of structure

Returns:URL for image of structure
Return type:str
get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Update to the major protonation and tautomerization state at the pH

Parameters:
  • ph (float) – pH
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
get_mol_wt()[source]

Get the molecular weight

Returns:molecular weight
Return type:float
get_root_monomers()[source]

Get root monomeric forms

Returns:root monomeric forms
Return type:set of Monomer
id[source]

Get id

Returns:id
Return type:str
identifiers[source]

Get identifiers

Returns:identifiers
Return type:IdentifierSet
is_equal(other)[source]

Check if two monomeric forms are semantically equal

Parameters:other (Monomer) – another monomeric form
Returns:True, if the objects have the same structure
Return type:bool
l_bond_atoms[source]

Get the left bond atoms

Returns:left bond atoms
Return type:AtomList
l_displaced_atoms[source]

Get the left displaced atoms

Returns:left displaced atoms
Return type:AtomList
monomers_position[source]

Get the originating monomers within start_position to end_position where the monomeric form may be located

Returns:originating monomers within start_position to end_position where the monomeric form may be located
Return type:set of Monomer
name[source]

Get name

Returns:name
Return type:str
r_bond_atoms[source]

Get the right bond atoms

Returns:right bond atoms
Return type:AtomList
r_displaced_atoms[source]

Get the right displaced atoms

Returns:right displaced atoms
Return type:AtomList
start_position[source]

Get start position

Returns:start position
Return type:int
structure[source]

Get structure

Returns:structure
Return type:openbabel.OBMol
synonyms[source]

Get synonyms

Returns:synonyms
Return type:SynonymSet
to_dict(alphabet=None)[source]

Get a dictionary representation of the monomeric form

Parameters:alphabet (Alphabet, optional) – alphabet
Returns:dictionary representation of the monomeric form
Return type:dict
class bpforms.core.MonomerDict(*args, **kwargs)[source]

Bases: attrdict.dictionary.AttrDict

Dictionary for monomeric forms

__setitem__(code, monomer)[source]

Set monomeric form with code

Parameters:
  • code (str) – characters for monomeric form
  • monomer (Monomer) – monomeric form
class bpforms.core.MonomerSequence(monomers=None)[source]

Bases: list

Sequence of monomeric forms

__setitem__(slice, monomer)[source]

Set monomeric form(s) at slice

Parameters:
  • slice (int or slice) – position(s) to set monomeric form
  • monomer (Monomer or list of Monomer) – monomeric form(s)
append(monomer)[source]

Add a monomeric form

Parameters:monomer (Monomer) – monomeric form
Raises:ValueError – if the monomer is not an instance of Monomer
extend(monomers)[source]

Add a list of monomeric forms

Parameters:monomers (iterable of Monomer) – iterable of monomeric forms
get_monomer_counts()[source]

Get the frequency of each monomeric form within the sequence

Returns:dictionary that maps monomeric forms to their counts
Return type:dict
insert(i, monomer)[source]

Insert a monomeric form at a position

Parameters:
  • i (int) – position to insert monomeric form
  • monomer (Monomer) – monomeric form
is_equal(other)[source]

Determine if two sequences of monomeric forms are semantically equal

Parameters:other (MonomerSequence) – other sequence
Returns:True, if the sequences are semantically equal
Return type:bool
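
The type checking and counting described for MonomerSequence can be illustrated with a small stand-alone sketch. It does not import bpforms: `Monomer` below is a hypothetical stand-in that only carries an id, and `CheckedSequence` is an illustrative name, not the library's class. It shows one way a list subclass can reject non-Monomer items in append, extend, insert, and __setitem__, and count its items.

```python
from collections import Counter


class Monomer:
    """Hypothetical stand-in for bpforms.core.Monomer (only carries an id)."""

    def __init__(self, id):
        self.id = id


class CheckedSequence(list):
    """Illustrative list that only accepts Monomer instances."""

    def __init__(self, monomers=None):
        super().__init__()
        if monomers is not None:
            self.extend(monomers)

    @staticmethod
    def _check(monomer):
        # Reject anything that is not a Monomer, as append() is documented to do
        if not isinstance(monomer, Monomer):
            raise ValueError('`monomer` must be an instance of `Monomer`')
        return monomer

    def append(self, monomer):
        super().append(self._check(monomer))

    def extend(self, monomers):
        for monomer in monomers:
            self.append(monomer)

    def insert(self, i, monomer):
        super().insert(i, self._check(monomer))

    def __setitem__(self, index, monomer):
        # A slice takes an iterable of monomers; an int takes a single monomer
        if isinstance(index, slice):
            monomer = [self._check(m) for m in monomer]
        else:
            self._check(monomer)
        super().__setitem__(index, monomer)

    def get_monomer_counts(self):
        # Map each distinct Monomer object to its number of occurrences
        return dict(Counter(self))
```

In this sketch the counts are keyed by the Monomer objects themselves, which matches the documented return value of a dictionary that maps monomeric forms to their counts.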
class bpforms.core.SynonymSet(synonyms=None)[source]

Bases: set

Set of synonyms

add(synonym)[source]

Add a synonym

Parameters:synonym (str) – synonym
Raises:ValueError – if the synonym is not an instance of str
symmetric_difference_update(other)[source]

Remove common synonyms with other and add synonyms from other not in self

Parameters:other (SynonymSet) – other set of synonyms
update(synonyms)[source]

Add a set of synonyms

Parameters:synonyms (iterable of str) – synonyms
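
As a rough illustration of the three methods above, the following stand-alone sketch shows a set subclass that validates each item as a string. `CheckedStrSet` is an illustrative name and the code does not come from bpforms; it only assumes that synonyms are plain strings, as the add parameter states.

```python
class CheckedStrSet(set):
    """Illustrative set that only accepts strings."""

    def __init__(self, synonyms=None):
        super().__init__()
        if synonyms is not None:
            self.update(synonyms)

    def add(self, synonym):
        # Reject non-string items before they reach the underlying set
        if not isinstance(synonym, str):
            raise ValueError('`synonym` must be a string')
        super().add(synonym)

    def update(self, synonyms):
        # Route every item through add() so it is validated
        for synonym in synonyms:
            self.add(synonym)

    def symmetric_difference_update(self, other):
        # Drop items present in both sets, then add items only in `other`
        mine, theirs = set(self), set(other)
        for synonym in mine & theirs:
            self.discard(synonym)
        for synonym in theirs - mine:
            self.add(synonym)
```

Routing update and symmetric_difference_update through add keeps the validation in one place, so no code path can insert an unchecked item.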
bpforms.core.get_hydrogen_atom(parent_atom, bonding_hydrogens, i_monomer)[source]

Get a hydrogen atom attached to a parent atom

Parameters:
  • parent_atom (openbabel.OBAtom) – parent atom
  • bonding_hydrogens (list) – hydrogens that have already been retrieved
  • i_monomer (int) – index of parent monomer in sequence
Returns:

hydrogen atom

Return type:

openbabel.OBAtom

bpforms.core.parse_yaml(path)[source]

Read a YAML file

Parameters:path (str) – path to YAML file which defines alphabet
Returns:content of file
Return type:object

10.1.5. bpforms.rest module

REST JSON API

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-02-05
Copyright:2019, Karr Lab
License:MIT
class bpforms.rest.AlpabetResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get alphabets

endpoint = 'alphabet_alpabet_resource'[source]
get(id)[source]

Get an alphabet

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.AlphabetsResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get list of alphabets

endpoint = 'alphabet_alphabets_resource'[source]
get()[source]

Get a list of available alphabets

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.Bpform(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Optionally, calculate the major protonation and tautomerization state of a biopolymer form and calculate its properties

endpoint = 'bpform_bpform'[source]
mediatypes()[source]
methods = {'POST'}[source]
post()[source]

Optionally, calculate the major protonation and tautomerization state of a biopolymer form and calculate its properties

class bpforms.rest.MonomerResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get information about a monomer

endpoint = 'alphabet_monomer_resource_2'[source]
get(alphabet, monomer, format)[source]

Get a monomeric form

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.PrefixMiddleware(app, prefix='')[source]

Bases: object
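
This class has no description here. The sketch below shows the common WSGI pattern that usually goes by this name: serve the wrapped application only under a URL prefix and answer 404 elsewhere. Whether bpforms.rest.PrefixMiddleware behaves this way is an assumption; `PrefixMiddlewareSketch` and the example paths are illustrative only.

```python
class PrefixMiddlewareSketch:
    """Illustrative WSGI middleware that mounts an app under a URL prefix."""

    def __init__(self, app, prefix=''):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        if path.startswith(self.prefix):
            # Strip the prefix so the wrapped app sees its own routes
            environ['PATH_INFO'] = path[len(self.prefix):]
            environ['SCRIPT_NAME'] = self.prefix
            return self.app(environ, start_response)
        # Requests outside the prefix never reach the wrapped app
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'This URL does not belong to the app.']


def echo_app(environ, start_response):
    """Tiny WSGI app used to exercise the middleware: echoes PATH_INFO."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['PATH_INFO'].encode()]
```

With the usual Flask setup, such a wrapper would be assigned to `app.wsgi_app`; that wiring is not shown in this reference and is mentioned only as the typical usage of the pattern.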

bpforms.rest.get_alphabet(id)[source]

Get an alphabet

Parameters:id (str) – id of alphabet
Returns:dictionary representation of an alphabet
Return type:dict
bpforms.rest.get_monomer(alphabet, monomer, format)[source]

Get a monomeric form

Parameters:
  • alphabet (str) – id of the alphabet
  • monomer (str) – code of a monomeric form
  • format (str) – output format (“emf”, “eps”, “jpeg”, “json”, “msbmp”, “pdf”, “png” or “svg”)
Returns:

dictionary representation of a monomer or SVG-encoded image of a monomer

Return type:

object

bpforms.rest.get_monomer_properties(alphabet, monomer)[source]

Get properties of a monomeric form

Parameters:
  • alphabet (str) – id of an alphabet
  • monomer (str) – code of monomeric form
Returns:

properties of monomeric form

Return type:

dict

10.1.6. bpforms.util module

Utilities for BpForms

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-02-05
Copyright:2019, Karr Lab
License:MIT
bpforms.util.build_alphabets(ph=None, major_tautomer=False, dearomatize=False, _max_monomers=inf, alphabets=None)[source]

Build DNA, RNA, and protein alphabets

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
  • _max_monomers (float, optional) – maximum number of monomeric forms to build; used for testing
  • alphabets (list of str or None, optional) – ids of alphabets to build. If None, build all alphabets
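
The parameters above line up with the command line arguments listed for BuildAlphabetsController in bpforms.__main__. The following sketch rebuilds those arguments with argparse and maps them to keyword arguments with the same names as this function's parameters. It is a stand-alone illustration, not the controller's code: `make_parser` and `to_build_kwargs` are hypothetical helpers, and the alphabet ids in the test are placeholders.

```python
import argparse


def make_parser():
    """Recreate the documented build-alphabets options (illustrative only)."""
    parser = argparse.ArgumentParser(description='Build alphabets (sketch)')
    parser.add_argument('--ph', type=float, default=7.4,
                        help='pH at which to calculate the major protonation state')
    parser.add_argument('--major-tautomer', action='store_true', default=False,
                        help='If set, calculate the major tautomer')
    parser.add_argument('--dearomatize', action='store_true', default=False,
                        help='If set, dearomatize molecule')
    parser.add_argument('--max-monomers', type=float, default=float('inf'),
                        help='Maximum number of monomeric forms to build')
    # Repeating --alphabet appends to a list; omitting it leaves None (build all)
    parser.add_argument('--alphabet', type=str, default=None, dest='alphabets',
                        action='append', help='Id of alphabet to build')
    return parser


def to_build_kwargs(argv):
    """Map parsed options to keyword arguments named like build_alphabets's."""
    args = make_parser().parse_args(argv)
    return {
        'ph': args.ph,
        'major_tautomer': args.major_tautomer,
        'dearomatize': args.dearomatize,
        '_max_monomers': args.max_monomers,
        'alphabets': args.alphabets,
    }
```

Note that the command line default for the pH is 7.4, while the function's own default is ph=None.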
bpforms.util.gen_html_viz_alphabet(bpform_type, filename)[source]

Create and save an HTML document with images of the monomeric forms in an alphabet

Parameters:
  • bpform_type (type) – subclass of core.BpForm
  • filename (str) – path to save HTML document with images of monomeric forms
bpforms.util.get_alphabet(alphabet)[source]

Get an alphabet

Parameters:alphabet (str) – alphabet
Returns:alphabet
Return type:core.Alphabet
bpforms.util.get_alphabets()[source]

Get a list of available alphabets

Returns:dictionary which maps the ids of alphabets to alphabets
Return type:dict
bpforms.util.get_form(alphabet)[source]

Get a subclass of BpForm

Parameters:alphabet (str) – alphabet
Returns:subclass of BpForm
Return type:type
bpforms.util.read_from_fasta(filename, alphabet)[source]

Read BpForms from a FASTA-formatted file

Parameters:
  • filename (str) – path to FASTA-formatted file
  • alphabet (str) – alphabet of BpForms in file
Returns:

dictionary which maps the ids of molecules to their BpForms-encoded sequences

Return type:

dict

bpforms.util.validate_bpform_bonds(form_type)[source]

Validate bonds in alphabet

Parameters:form_type (type) – type of BpForm
Raises:ValueError – if any of the bonds are invalid
bpforms.util.write_to_fasta(forms, filename)[source]

Write BpForms to a FASTA-formatted file

Parameters:
  • forms (dict) – dictionary which maps the ids of molecules to their BpForms-encoded sequences
  • filename (str) – path to FASTA-formatted file
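
The round trip between read_from_fasta and write_to_fasta can be pictured with a stand-alone sketch that treats each sequence as a plain string. It does not call bpforms: `write_fasta`, `read_fasta`, and `round_trip` are hypothetical helpers, and the real functions additionally take or infer an alphabet and work with BpForms-encoded sequences.

```python
import os
import tempfile


def write_fasta(forms, filename):
    """Write a dict of id -> sequence string as FASTA records."""
    with open(filename, 'w') as file:
        for id, sequence in forms.items():
            file.write('>{}\n{}\n'.format(id, sequence))


def read_fasta(filename):
    """Read FASTA records into a dict of id -> sequence string."""
    forms = {}
    id = None
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                # A header line starts a new record
                id = line[1:].strip()
                forms[id] = ''
            elif id is not None:
                # Sequence lines are concatenated onto the current record
                forms[id] += line
    return forms


def round_trip(forms):
    """Write the records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as dirname:
        filename = os.path.join(dirname, 'forms.fasta')
        write_fasta(forms, filename)
        return read_fasta(filename)
```

In this sketch, reading a file back yields the same dictionary that was written, which is the property the pair of functions above is meant to provide for BpForms.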

10.1.7. Module contents