13.1. bpforms package

13.1.2. Submodules

13.1.3. bpforms.__main__ module

bpforms command line interface

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-01-31

Copyright

2019, Karr Lab

License

MIT

class bpforms.__main__.App(label=None, **kw)[source]

Bases: cement.core.foundation.App

Command line application

class Meta[source]

Bases: object

base_controller = 'base'[source]
handlers = [<class 'bpforms.__main__.BaseController'>, <class 'bpforms.__main__.ValidateController'>, <class 'bpforms.__main__.GetPropertiesController'>, <class 'bpforms.__main__.GetMajorMicroSpeciesController'>, <class 'bpforms.__main__.BuildAlphabetsController'>, <class 'bpforms.__main__.VizAlphabetController'>, <class 'bpforms.__main__.ExportOntosController'>][source]
label = 'bpforms'[source]
class bpforms.__main__.BaseController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Base controller for command line application

class Meta[source]

Bases: object

arguments = [(['-v', '--version'], {'action': 'version', 'version': '0.0.16'})][source]
description = 'bpforms'[source]
help = 'bpforms'[source]
label = 'base'[source]
class bpforms.__main__.BuildAlphabetsController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database

class Meta[source]

Bases: object

arguments = [(['--ph'], {'type': <class 'float'>, 'default': 7.4, 'help': 'pH at which to calculate the major protonation state of each monomeric form'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'}), (['--max-monomers'], {'type': <class 'float'>, 'default': inf, 'help': 'Maximum number of monomeric forms to build. Used for testing'}), (['--alphabet'], {'type': <class 'str'>, 'default': None, 'dest': 'alphabets', 'action': 'append', 'help': 'Id of alphabet to build. Default: build all alphabets'})][source]
description = 'Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database'[source]
help = 'Build DNA, RNA, and protein alphabets from DNAmod, MODOMICS, the PDB Chemical Component Dictionary, RESID, and the RNA Modification Database'[source]
label = 'build-alphabets'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
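
The option list above maps directly onto argparse arguments. The following is a minimal sketch using only the standard library, not the cement controller that bpforms defines; the function name make_build_alphabets_parser and the alphabet ids 'dna' and 'rna' are placeholders chosen for illustration. It shows how repeating --alphabet accumulates ids under the alphabets destination, and how omitting it leaves None, which the help text describes as building all alphabets.

```python
import argparse
import math


def make_build_alphabets_parser():
    """Stand-alone approximation of the documented build-alphabets options."""
    parser = argparse.ArgumentParser(prog='bpforms build-alphabets')
    parser.add_argument('--ph', type=float, default=7.4)
    parser.add_argument('--major-tautomer', action='store_true', default=False)
    parser.add_argument('--dearomatize', action='store_true', default=False)
    parser.add_argument('--max-monomers', type=float, default=math.inf)
    # action='append' collects every occurrence of --alphabet into one list
    parser.add_argument('--alphabet', type=str, default=None,
                        dest='alphabets', action='append')
    return parser


args = make_build_alphabets_parser().parse_args(
    ['--ph', '7.0', '--alphabet', 'dna', '--alphabet', 'rna'])
print(args.ph, args.alphabets, args.max_monomers)

# With no --alphabet option, the destination keeps its default of None
defaults = make_build_alphabets_parser().parse_args([])
print(defaults.alphabets)
```
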
class bpforms.__main__.ExportOntosController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Export alphabets of residues and ontology of crosslinks to OBO file

class Meta[source]

Bases: object

arguments = [(['filename'], {'type': <class 'str'>, 'help': 'Path to save ontology'}), (['--alphabet'], {'type': <class 'str'>, 'default': None, 'dest': 'alphabets', 'action': 'append', 'help': 'Id of alphabet to export. Default: export all alphabets'}), (['--max-monomers'], {'type': <class 'int'>, 'default': None, 'help': 'Maximum number of monomers to export'}), (['--max-xlinks'], {'type': <class 'int'>, 'default': None, 'help': 'Maximum number of crosslinks to export'})][source]
description = 'Export alphabets of residues and ontology of crosslinks to OBO file'[source]
help = 'Export alphabets of residues and ontology of crosslinks to OBO file'[source]
label = 'export-ontos'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.GetMajorMicroSpeciesController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Calculate the major protonation and tautomerization state of a biopolymer form at a specific pH

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'}), (['ph'], {'type': <class 'float'>, 'help': 'pH'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'})][source]
description = 'Calculate the major protonation and tautomerization state of a biopolymer form at a specific pH'[source]
help = 'Calculate the major protonation and tautomerization state of a biopolymer form at a specific pH'[source]
label = 'get-major-micro-species'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.GetPropertiesController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Calculate physical properties such as length, chemical formula, molecular weight, and charge

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'}), (['--ph'], {'default': None, 'type': <class 'float'>, 'help': 'pH at which to calculate the major protonation state of each monomeric form'}), (['--major-tautomer'], {'action': 'store_true', 'default': False, 'help': 'If set, calculate the major tautomer'}), (['--dearomatize'], {'action': 'store_true', 'default': False, 'help': 'If set, dearomatize molecule'})][source]
description = 'Calculate physical properties such as length, chemical formula, molecular weight, and charge'[source]
help = 'Calculate physical properties such as length, chemical formula, molecular weight, and charge'[source]
label = 'get-properties'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.ValidateController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Validate a biopolymer form

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['seq'], {'type': <class 'str'>, 'help': 'Sequence of monomeric forms'}), (['--circular'], {'action': 'store_true', 'default': False, 'help': 'Biopolymer circularity'})][source]
description = 'Validate a biopolymer form'[source]
help = 'Validate a biopolymer form'[source]
label = 'validate'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
class bpforms.__main__.VizAlphabetController(*args, **kw)[source]

Bases: cement.ext.ext_argparse.ArgparseController

Visualize an alphabet

class Meta[source]

Bases: object

arguments = [(['alphabet'], {'type': <class 'str'>, 'help': 'Biopolymer alphabet'}), (['path'], {'type': <class 'str'>, 'help': 'Path to save visualization of alphabet'})][source]
description = 'Visualize an alphabet'[source]
help = 'Visualize an alphabet'[source]
label = 'viz-alphabet'[source]
stacked_on = 'base'[source]
stacked_type = 'nested'[source]
bpforms.__main__.main()[source]
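
Each controller above is stacked on 'base' with stacked_type 'nested', so its label acts as a subcommand. As a rough stand-in for that layout, the sketch below uses argparse subparsers instead of cement; the function make_parser and the sample values 'dna' and 'ACGT' are assumptions made for illustration. It reproduces the documented arguments of two subcommands, including the positional ph of get-major-micro-species.

```python
import argparse


def make_parser():
    """Approximate two of the documented subcommands with argparse subparsers."""
    parser = argparse.ArgumentParser(prog='bpforms')
    subparsers = parser.add_subparsers(dest='command')

    validate = subparsers.add_parser('validate')
    validate.add_argument('alphabet', type=str)
    validate.add_argument('seq', type=str)
    validate.add_argument('--circular', action='store_true', default=False)

    species = subparsers.add_parser('get-major-micro-species')
    species.add_argument('alphabet', type=str)
    species.add_argument('seq', type=str)
    species.add_argument('--circular', action='store_true', default=False)
    # Unlike get-properties, pH is a required positional argument here
    species.add_argument('ph', type=float)
    return parser


cli_args = make_parser().parse_args(
    ['get-major-micro-species', 'dna', 'ACGT', '7.4', '--circular'])
print(cli_args.command, cli_args.alphabet, cli_args.seq, cli_args.ph, cli_args.circular)
```
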

13.1.4. bpforms._version module

13.1.5. bpforms.core module

Classes to represent modified forms of DNA, RNA, and proteins

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-01-31

Copyright

2019, Karr Lab

License

MIT

class bpforms.core.Alphabet(id=None, name=None, description=None, monomers=None)[source]

Bases: object

Alphabet for monomeric forms

id[source]

id

Type

str

name[source]

name

Type

str

description[source]

description

Type

str

monomers[source]

monomeric forms

Type

dict

from_dict(dict)[source]

Create alphabet from a dictionary representation

Parameters

dict (dict) – dictionary representation of alphabet

Returns

alphabet

Return type

Alphabet

from_yaml(path)[source]

Read alphabet from YAML file

Parameters

path (str) – path to YAML file which defines alphabet

Returns

alphabet

Return type

Alphabet

get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Calculate the major protonation and tautomerization of each monomeric form

Parameters
  • ph (float) – pH

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

get_monomer_code(monomer)[source]

Get the code for a monomeric form in the alphabet

Parameters

monomer (Monomer) – monomeric form

Returns

code for monomeric form

Return type

str

Raises

ValueError – if monomeric form is not in alphabet
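
The lookup described here is a reverse search from a monomeric form to its code. A minimal sketch, assuming the alphabet stores its monomeric forms in a dict keyed by code and that forms are matched by identity (the source states neither detail), could look like this. The free function and the placeholder objects are illustrative only.

```python
def get_monomer_code(monomers, monomer):
    """Return the code that `monomers` (a dict of code -> monomeric form)
    assigns to `monomer`; raise ValueError if the form is not present."""
    for code, candidate in monomers.items():
        if candidate is monomer:  # identity comparison is an assumption
            return code
    raise ValueError('Monomeric form is not in alphabet')


# Plain objects stand in for Monomer instances
adenine, cytosine, unknown = object(), object(), object()
monomers = {'A': adenine, 'C': cytosine}
print(get_monomer_code(monomers, cytosine))
```
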

is_equal(other)[source]

Determine if two alphabets are semantically equal

Parameters

other (Alphabet) – other alphabet

Returns

True, if the alphabets are semantically equal

Return type

bool

property monomers[source]

Get the monomeric forms

Returns

monomeric forms

Return type

MonomerDict

to_dict()[source]

Get dictionary representation of alphabet

Returns

dictionary representation of alphabet

Return type

dict

to_yaml(path)[source]

Save alphabet to YAML file

Parameters

path (str) – path to save alphabet in YAML format

class bpforms.core.AlphabetBuilder(_max_monomers=inf)[source]

Bases: abc.ABC

Builder for alphabets

_max_monomers[source]

maximum number of monomeric forms to build; used to limit length of tests

Type

float

abstract build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

run(ph=None, major_tautomer=False, dearomatize=False, path=None)[source]

Build alphabet and, optionally, save to YAML file

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

  • path (str, optional) – path to save alphabet

Returns

alphabet

Return type

Alphabet

save(alphabet, path)[source]

Save alphabet to YAML file

Parameters
  • alphabet (Alphabet) – alphabet

  • path (str) – path to save alphabet

class bpforms.core.Atom(molecule, element, position=None, charge=0, monomer=None)[source]

Bases: object

An atom in a compound or bond

molecule[source]

type of parent molecule

Type

type

element[source]

code for the element (e.g. ‘H’)

Type

str

position[source]

position of the atom within the molecule, which should use canonical SMILES atom numbers

Type

int

charge[source]

charge of the atom

Type

int

monomer[source]

index of parent monomeric form within sequence

Type

int

property charge[source]

Get the charge

Returns

charge

Return type

int

property element[source]

Get the element

Returns

element

Return type

str

from_dict(dict)[source]

Load from dictionary representation

Parameters

dict (dict) – dictionary representation

Returns

atom

Return type

Atom

is_equal(other)[source]

Determine if two atoms are semantically equal

Parameters

other (Atom) – other atom

Returns

True, if the atoms are semantically equal

Return type

bool

property molecule[source]

Get type of parent molecule

Returns

type of parent molecule

Return type

type

property monomer[source]

Get the index of the parent monomer within the sequence

Returns

index of the parent monomer within the sequence

Return type

int

property position[source]

Get the position

Returns

position

Return type

int

to_dict()[source]

Get dictionary representation

Returns

dictionary representation

Return type

dict

class bpforms.core.AtomList(atoms=None)[source]

Bases: list

List of atoms

__setitem__(slice, atom)[source]

Set atom(s) at slice

Parameters
  • slice (int or slice) – position(s) to set atom

  • atom (Atom or AtomList) – atom or atoms

append(atom)[source]

Add an atom

Parameters

atom (Atom) – atom

Raises

ValueError – if the atom is not an instance of Atom

extend(atoms)[source]

Add a list of atoms

Parameters

atoms (iterable of Atom) – iterable of atoms

from_list(list)[source]

Load from list representation

Parameters

list (list) – list representation

Returns

atom list

Return type

AtomList

insert(i, atom)[source]

Insert an atom at a position

Parameters
  • i (int) – position to insert atom

  • atom (Atom) – atom

is_equal(other)[source]

Determine if two lists of atoms are semantically equal

Parameters

other (AtomList) – other list of atoms

Returns

True, if the lists of atoms are semantically equal

Return type

bool

to_list()[source]

Get list representation

Returns

list representation

Return type

list
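
The pattern documented for AtomList is a list subclass whose mutators reject anything that is not an atom. The sketch below illustrates that pattern with stand-in classes (SketchAtom and CheckedAtomList are hypothetical names, not part of bpforms). It covers append, extend and insert; the slice assignment handled by __setitem__ is left out for brevity.

```python
class SketchAtom:
    """Stand-in for an atom; only the fields needed for the demo."""

    def __init__(self, element, position=None, charge=0):
        self.element = element
        self.position = position
        self.charge = charge


class CheckedAtomList(list):
    """List that only accepts SketchAtom instances."""

    def __init__(self, atoms=None):
        super().__init__()
        if atoms is not None:
            self.extend(atoms)

    @staticmethod
    def _check(atom):
        if not isinstance(atom, SketchAtom):
            raise ValueError('`atom` must be an instance of `SketchAtom`')
        return atom

    def append(self, atom):
        super().append(self._check(atom))

    def extend(self, atoms):
        # Route every element through append so each one is type-checked
        for atom in atoms:
            self.append(atom)

    def insert(self, i, atom):
        super().insert(i, self._check(atom))


atoms = CheckedAtomList([SketchAtom('H', position=1)])
atoms.append(SketchAtom('O', position=2, charge=-1))
try:
    atoms.append('N')  # a bare string is rejected
except ValueError as error:
    print('rejected:', error)
```
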

class bpforms.core.Backbone(structure=None, monomer_bond_atoms=None, monomer_displaced_atoms=None)[source]

Bases: object

Backbone of a monomeric form

structure[source]

chemical structure

Type

openbabel.OBMol

monomer_bond_atoms[source]

atoms from backbone that bond to monomeric form

Type

AtomList

monomer_displaced_atoms[source]

atoms from backbone displaced by bond to monomeric form

Type

AtomList

export(format, options=())[source]

Export structure to format

Parameters
  • format (str) – format

  • options (list of str, optional) – export options

Returns

format representation of structure

Return type

str

get_charge()[source]

Get the charge

Returns

charge

Return type

int

get_formula()[source]

Get the formula

Returns

formula

Return type

EmpiricalFormula

get_mol_wt()[source]

Get the molecular weight

Returns

molecular weight

Return type

float

is_equal(other)[source]

Determine if two backbones are semantically equal

Parameters

other (Backbone) – other backbone

Returns

True if the backbones are semantically equal

Return type

bool

property monomer_bond_atoms[source]

Get the backbone bond atoms

Returns

backbone bond atoms

Return type

AtomList

property monomer_displaced_atoms[source]

Get the backbone displaced atoms

Returns

backbone displaced atoms

Return type

AtomList

property structure[source]

Get the structure

Returns

structure

Return type

openbabel.OBMol

class bpforms.core.Bond(id=None, name=None, synonyms=None, l_monomer=None, r_monomer=None, l_bond_atoms=None, r_bond_atoms=None, l_displaced_atoms=None, r_displaced_atoms=None, order=<BondOrder.single: 1>, stereo=None, comments=None)[source]

Bases: bpforms.core.BondBase

Bond between monomeric forms (inter-residue bond or crosslink)

id[source]

id

Type

str

name[source]

name

Type

str

synonyms[source]

synonyms

Type

SynonymSet

l_monomer[source]

left monomeric form

Type

Monomer

r_monomer[source]

right monomeric form

Type

Monomer

l_bond_atoms[source]

atoms from left monomeric form that bond with right monomeric form

Type

AtomList

r_bond_atoms[source]

atoms from right monomeric form that bond with left monomeric form

Type

AtomList

l_displaced_atoms[source]

atoms from left monomeric form displaced by bond

Type

AtomList

r_displaced_atoms[source]

atoms from right monomeric form displaced by bond

Type

AtomList

order[source]

order

Type

BondOrder

stereo[source]

stereochemistry

Type

BondStereo

comments[source]

comments

Type

str

__str__()[source]

Generate string representation of bond

Returns

string representation of bond

Return type

str

property comments[source]

Get comments

Returns

comments

Return type

str

get_l_bond_atoms()[source]

Get left bond atoms

Returns

left bond atoms

Return type

list of Atom

get_l_displaced_atoms()[source]

Get left displaced atoms

Returns

left displaced atoms

Return type

list of Atom

get_order()[source]

Get the order

Returns

order

Return type

BondOrder

get_r_bond_atoms()[source]

Get right bond atoms

Returns

right bond atoms

Return type

list of Atom

get_r_displaced_atoms()[source]

Get right displaced atoms

Returns

right displaced atoms

Return type

list of Atom

get_stereo()[source]

Get the stereochemistry

Returns

stereochemistry

Return type

BondStereo

property id[source]

Get id

Returns

id

Return type

str

is_equal(other)[source]

Determine if two bonds are semantically equal

Parameters

other (Bond) – other bond

Returns

True, if the bonds are semantically equal

Return type

bool

property l_bond_atoms[source]

Get the left bond atoms

Returns

left bond atoms

Return type

AtomList

property l_displaced_atoms[source]

Get the left displaced atoms

Returns

left displaced atoms

Return type

AtomList

property l_monomer[source]

Get the left monomeric form

Returns

left monomeric form

Return type

Monomer

property name[source]

Get name

Returns

name

Return type

str

property order[source]

Get the order

Returns

order

Return type

BondOrder

property r_bond_atoms[source]

Get the right bond atoms

Returns

right bond atoms

Return type

AtomList

property r_displaced_atoms[source]

Get the right displaced atoms

Returns

right displaced atoms

Return type

AtomList

property r_monomer[source]

Get the right monomeric form

Returns

right monomeric form

Return type

Monomer

property stereo[source]

Get the stereochemistry

Returns

stereochemistry

Return type

BondStereo

property synonyms[source]

Get synonyms

Returns

synonyms

Return type

SynonymSet

class bpforms.core.BondBase[source]

Bases: abc.ABC

get_charge(none_position=True)[source]

Get the charge

Parameters

none_position (bool, optional) – include atoms whose position is None

Returns

charge

Return type

int

get_formula(none_position=True)[source]

Get the formula

Parameters

none_position (bool, optional) – include atoms whose position is None

Returns

formula

Return type

EmpiricalFormula

abstract get_l_bond_atoms()[source]

Get left bond atoms

Returns

left bond atoms

Return type

list of Atom

abstract get_l_displaced_atoms()[source]

Get left displaced atoms

Returns

left displaced atoms

Return type

list of Atom

get_mol_wt(none_position=True)[source]

Get the molecular weight

Parameters

none_position (bool, optional) – include atoms whose position is None

Returns

molecular weight

Return type

float

abstract get_order()[source]

Get the order

Returns

order

Return type

BondOrder

abstract get_r_bond_atoms()[source]

Get right bond atoms

Returns

right bond atoms

Return type

list of Atom

abstract get_r_displaced_atoms()[source]

Get right displaced atoms

Returns

right displaced atoms

Return type

list of Atom

abstract get_stereo()[source]

Get the stereochemistry

Returns

stereochemistry

Return type

BondStereo

class bpforms.core.BondOrder[source]

Bases: int, enum.Enum

Bond order

aromatic = 4[source]
double = 2[source]
single = 1[source]
triple = 3[source]
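
Because BondOrder derives from both int and enum.Enum, its members compare and sort like integers while keeping symbolic names. The sketch below re-declares an enumeration with the member values listed above, purely to illustrate that behavior without importing bpforms.

```python
import enum


class BondOrder(int, enum.Enum):
    """Re-declaration of the documented members for illustration."""
    single = 1
    double = 2
    triple = 3
    aromatic = 4


# Members behave like ints in comparisons
print(BondOrder.double == 2)
print(BondOrder.triple > BondOrder.single)

# Lookup works by value or by name
print(BondOrder(4).name)
print(BondOrder['single'].value)
```
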
class bpforms.core.BondSet[source]

Bases: set

Set of bonds

add(bond)[source]

Add a bond

Parameters

bond (BondBase) – bond

Raises

ValueError – if the bond is not an instance of BondBase

is_equal(other)[source]

Check if two sets of bonds are semantically equal

Parameters

other (BondSet) – other set of bonds

Returns

True, if the bond sets are semantically equal

Return type

bool

symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters

other (BondSet) – other set of bonds

update(bonds)[source]

Add a set of bonds

Parameters

bonds (iterable of BondBase) – bonds

class bpforms.core.BondStereo[source]

Bases: int, enum.Enum

Bond stereochemistry

down = 4[source]
hash = 2[source]
up = 3[source]
wedge = 1[source]
class bpforms.core.BpForm(seq=None, alphabet=None, backbone=None, bond=None, circular=False, crosslinks=None, nicks=None)[source]

Bases: object

Biopolymer form

seq[source]

sequence of monomeric forms of the biopolymer

Type

MonomerSequence

alphabet[source]

alphabet of monomeric forms

Type

Alphabet

backbone[source]

backbone that connects monomeric forms

Type

Backbone

bond[source]

bonds between (backbones of) monomeric forms

Type

Bond

circular[source]

if True, indicates that the biopolymer is circular

Type

bool

crosslinks[source]

crosslinking intrachain bonds

Type

BondSet

nicks[source]

set of nicks

Type

NickSet

features[source]

set of features

Type

BpFormFeatureSet

_parser[source]

parser

Type

lark.Lark

DEFAULT_FASTA_CODE = '?'[source]
__contains__(monomer)[source]

Determine if a monomeric form is in the biopolymer form

Parameters

monomer (Monomer) – monomeric form

Returns

True, if the monomeric form is in the sequence

Return type

bool

__delitem__(slice)[source]

Delete monomeric form(s) at slice

Parameters

slice (int or slice) – position(s)

__getitem__(slice)[source]

Get monomeric form(s) at slice

Parameters

slice (int or slice) – position(s)

Returns

monomeric form(s)

Return type

Monomer or Monomers

__iter__()[source]

Get iterator over sequence of monomeric forms

Returns

iterator of monomeric forms

Return type

iterator of Monomer

__len__()[source]

Get the length of the sequence of the form

Returns

length

Return type

int

__reversed__()[source]

Get reverse iterator over sequence of monomeric forms

Returns

iterator of monomeric forms

Return type

iterator of Monomer

__setitem__(slice, monomer)[source]

Set monomeric form(s) at slice

Parameters
  • slice (int or slice) – position(s)

  • monomer (Monomer or Monomers) – monomeric form(s)

__str__()[source]

Get a string representation of the biopolymer form

Returns

string representation of the biopolymer form

Return type

str

property alphabet[source]

Get the alphabet

Returns

alphabet

Return type

Alphabet

property backbone[source]

Get the backbones

Returns

backbones

Return type

Backbone

property bond[source]

Get the bonds

Returns

bonds

Return type

Bond

can_monomer_bond_left(monomer)[source]

Check if monomeric form can bond to the left

Parameters

monomer (Monomer) – monomeric form

Returns

True, if the monomeric form can bond to the left

Return type

bool

can_monomer_bond_right(monomer)[source]

Check if monomeric form can bond to right

Parameters

monomer (Monomer) – monomeric form

Returns

True, if the monomeric form can bond to the right

Return type

bool

property circular[source]

Get the circularity

Returns

circularity

Return type

bool

property crosslinks[source]

Get the crosslinking intrachain bonds

Returns

crosslinking intrachain bonds

Return type

BondSet

diff(other)[source]

Determine the semantic difference between two biopolymer forms

Parameters

other (BpForm) – another biopolymer form

Returns

description of the semantic difference between two biopolymer forms

Return type

str

export(format, include_all_hydrogens=False, options=())[source]

Export structure to format

Parameters
  • format (str) – format

  • include_all_hydrogens (bool, optional) – if True, explicitly include all hydrogens

  • options (list of str, optional) – export options

Returns

format representation of structure

Return type

str

property features[source]

Get the features

Returns

features

Return type

BpFormFeatureSet

from_str(string)[source]

Create a biopolymer form from its string representation

Parameters

string (str) – string representation of the biopolymer

Returns

biopolymer form

Return type

BpForm

get_canonical_seq(monomer_codes=None)[source]

Get IUPAC/IUBMB representation of a polymer with bases represented by the character codes of their parent monomers (e.g. methyl-2-adenosine is represented by ‘A’)

Parameters

monomer_codes (dict, optional) – dictionary that maps monomers to their codes

Returns

IUPAC/IUBMB representation of a polymer

Return type

str

get_charge()[source]

Get the charge

Returns

charge

Return type

int

get_formula()[source]

Get the chemical formula

Returns

chemical formula

Return type

EmpiricalFormula

get_genomic_image(label=None, seq_features=None, **kwargs)[source]

Get a genomic visualization of the BpForm

Parameters
  • label (str, optional) – title

  • seq_features (list of dict, optional) –

    list of features, each represented by a dictionary with three keys

    • label (str): description of the type of feature

    • color (str): color

    • positions (list of list of int): list of position ranges of the type of feature

The method also accepts the same arguments as bpforms.util.gen_genomic_viz.

Returns

SVG image

Return type

str
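
The seq_features argument is a list of dictionaries with the keys label, color and positions. The sketch below builds one such list and checks its shape. The helper check_seq_features, the sample labels, the hexadecimal color strings, and the reading of each position range as a [start, end] pair are all assumptions made for illustration; the source only names the keys and their types.

```python
REQUIRED_KEYS = {'label', 'color', 'positions'}


def check_seq_features(seq_features):
    """Return a list of problems found in a seq_features-style list."""
    errors = []
    for i, feature in enumerate(seq_features):
        missing = REQUIRED_KEYS - set(feature)
        if missing:
            errors.append('feature {} is missing keys: {}'.format(i, sorted(missing)))
            continue
        for position_range in feature['positions']:
            # Assumed convention: each range is a [start, end] pair
            if len(position_range) != 2 or position_range[0] > position_range[1]:
                errors.append('feature {} has an invalid range: {}'.format(i, position_range))
    return errors


seq_features = [
    {'label': 'Promoter', 'color': '#ff0000', 'positions': [[1, 10]]},
    {'label': 'Exon', 'color': '#0000ff', 'positions': [[20, 45], [60, 90]]},
]
print(check_seq_features(seq_features))
```
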

get_image(monomer_color=0, backbone_color=16711680, left_right_bond_color=65280, crosslink_bond_color=255, include_all_hydrogens=True, show_atom_nums=False, atom_label_font_size=0.6, width=200, height=200, image_format='svg', include_xml_header=True)[source]

Get molecular visualization

Parameters
  • monomer_color (int, optional) – color to paint atoms involved in monomeric forms

  • backbone_color (int, optional) – color to paint atoms involved in backbones

  • left_right_bond_color (int, optional) – color to paint atoms involved in bond with monomeric form to left

  • crosslink_bond_color (int, optional) – color to paint atoms involved in crosslinks

  • include_all_hydrogens (bool, optional) – if True, show all hydrogens

  • show_atom_nums (bool, optional) – if True, show the numbers of the atoms

  • atom_label_font_size (float, optional) – relative atom label font size

  • width (int, optional) – width in pixels

  • height (int, optional) – height in pixels

  • image_format (str, optional) – format of generated image {emf, eps, jpeg, msbmp, pdf, png, or svg}

  • include_xml_header (bool, optional) – if True, include XML header at the beginning of the SVG

Returns

image

Return type

object
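
The default color arguments (0, 16711680, 65280 and 255) read naturally as 24-bit RGB integers. Assuming that interpretation, which the source does not state explicitly, the hypothetical helpers below convert between such integers and the more familiar hexadecimal notation.

```python
def rgb_int_to_hex(color):
    """Format a 24-bit RGB integer as '#rrggbb'."""
    return '#{:06x}'.format(color)


def hex_to_rgb_int(hex_color):
    """Parse '#rrggbb' into a 24-bit RGB integer."""
    return int(hex_color.lstrip('#'), 16)


for name, value in [('monomer_color', 0),
                    ('backbone_color', 16711680),
                    ('left_right_bond_color', 65280),
                    ('crosslink_bond_color', 255)]:
    print(name, rgb_int_to_hex(value))
```

Under this reading the defaults are black for monomeric forms, red for backbones, green for left/right bonds and blue for crosslinks.
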

get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Get the major protonation and tautomerization state

Parameters
  • ph (float) – pH

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

major protonation and tautomerization state

Return type

openbabel.OBMol

get_mol_wt()[source]

Get the molecular weight

Returns

molecular weight

Return type

float

get_monomer_counts()[source]

Get the frequency of each monomeric form within the biopolymer

Returns

dictionary that maps monomeric forms to their counts

Return type

dict

get_structure(include_all_hydrogens=False)[source]

Get an Open Babel molecule of the structure

Parameters

include_all_hydrogens (bool, optional) – if True, explicitly include all hydrogens

Returns

  • openbabel.OBMol: Open Babel molecule of the structure

  • dict of dict: dictionary which maps indices (1-based) of monomeric forms to dictionaries which map types of components of monomeric forms (‘monomer’ or ‘backbone’) to dictionaries which map indices (1-based) of atoms to atoms (instances of openbabel.OBAtom)

Return type

tuple
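
The second element of the returned tuple is a three-level mapping: monomeric form index, then component type, then atom index. The sketch below walks a hand-built dictionary of that shape. Element symbols stand in for the openbabel.OBAtom instances, and the helper count_atoms is hypothetical.

```python
# Stand-in for the atom map: strings replace openbabel.OBAtom instances
atom_map = {
    1: {'monomer': {1: 'C', 2: 'N'}, 'backbone': {1: 'P', 2: 'O'}},
    2: {'monomer': {1: 'C'}, 'backbone': {1: 'P'}},
}


def count_atoms(atom_map):
    """Count atoms per component type across all monomeric forms."""
    counts = {'monomer': 0, 'backbone': 0}
    for components in atom_map.values():
        for component_type, atoms in components.items():
            counts[component_type] += len(atoms)
    return counts


print(count_atoms(atom_map))

# Indices are 1-based at both levels: atom 2 of the monomer part of form 1
print(atom_map[1]['monomer'][2])
```
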

is_equal(other)[source]

Check if two biopolymer forms are semantically equal

Parameters

other (BpForm) – another biopolymer form

Returns

True, if the objects have the same structure

Return type

bool

property nicks[source]

Get the nicks

Returns

nicks

Return type

NickSet

property seq[source]

Get the sequence of monomeric forms

Returns

sequence of monomeric forms

Return type

MonomerSequence

validate()[source]

Check that the biopolymer form is valid and return any errors

  • Check that monomeric forms \(1 \ldots L-1\) can bond to the right (their right bonding attributes are set)

  • Check that monomeric forms \(2 \ldots L\) can bond to the left (their left bonding attributes are set)

  • Check that no atom is involved in multiple bonds

Returns

list of errors, if any

Return type

list of str
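
The first two checks can be sketched for a linear biopolymer as follows. This is an illustration under stated assumptions, not the library's implementation: dictionaries with l_bond_atoms and r_bond_atoms lists stand in for monomeric forms, the atom labels are placeholders, and the third check (no atom involved in multiple bonds) and circular forms are not covered.

```python
def validate_linear(seq):
    """Return error messages for a linear sequence of stand-in monomeric forms."""
    errors = []
    length = len(seq)
    for position, monomer in enumerate(seq, start=1):
        # Forms 1 .. L-1 must be able to bond to the right
        if position < length and not monomer.get('r_bond_atoms'):
            errors.append('Monomeric form {} cannot bond to the right'.format(position))
        # Forms 2 .. L must be able to bond to the left
        if position > 1 and not monomer.get('l_bond_atoms'):
            errors.append('Monomeric form {} cannot bond to the left'.format(position))
    return errors


seq = [
    {'l_bond_atoms': [], 'r_bond_atoms': ['O3']},
    {'l_bond_atoms': ['P1'], 'r_bond_atoms': []},  # cannot bond to the right
    {'l_bond_atoms': ['P1'], 'r_bond_atoms': []},  # last form: no right bond needed
]
print(validate_linear(seq))
```
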

class bpforms.core.BpFormFeature(form, start_position, end_position)[source]

Bases: object

A region (start and end positions) of a BpForm

form[source]

biopolymer form

Type

BpForm

start_position[source]

start position (1-based)

Type

int

end_position[source]

end position (1-based)

Type

int

property end_position[source]

Get the end position

Returns

end position

Return type

int

property form[source]

Get the biopolymer form

Returns

biopolymer form

Return type

BpForm

property start_position[source]

Get the start position

Returns

start position

Return type

int
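
Since feature positions are 1-based while Python sequences are 0-based, converting a feature to a slice needs an offset. The sketch below assumes that end_position is inclusive, which the source does not state; the helper feature_to_slice and the sample sequence are hypothetical.

```python
def feature_to_slice(start_position, end_position):
    """Convert 1-based positions (end assumed inclusive) to a Python slice."""
    return slice(start_position - 1, end_position)


sequence = 'ACGTACGT'
# Positions 2 through 4 of the sequence
print(sequence[feature_to_slice(2, 4)])
```
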

class bpforms.core.BpFormFeatureSet(form)[source]

Bases: set

Set of features

form[source]

form

Type

BpForm

add(feature)[source]

Add a feature

Parameters

feature (BpFormFeature) – feature

Raises

ValueError – if the feature is not an instance of BpFormFeature

property form[source]

Get the biopolymer form

Returns

biopolymer form

Return type

BpForm

remove(feature)[source]

Remove a feature

Parameters

feature (BpFormFeature) – feature

symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters

other (BpFormFeatureSet) – other set of features

update(features)[source]

Add a set of features

Parameters

features (iterable of BpFormFeature) – features

exception bpforms.core.BpFormsWarning[source]

Bases: UserWarning

BpForms warning

class bpforms.core.Identifier(ns, id)[source]

Bases: object

An identifier in a namespace for an external database

ns[source]

namespace

Type

str

id[source]

id in namespace

Type

str

__eq__(other)[source]

Check if two identifiers are semantically equal

Parameters

other (Identifier) – another identifier

Returns

True, if the identifiers are semantically equal

Return type

bool

__hash__()[source]

Generate a hash

Returns

hash

Return type

int

property id[source]

Get the id

Returns

id

Return type

str

property ns[source]

Get the namespace

Returns

namespace

Return type

str
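
Defining __eq__ together with __hash__ is what lets identifiers be stored in sets without duplicates. The stand-in class below (SketchIdentifier, a hypothetical name) sketches one common way to do this, comparing and hashing on the (ns, id) pair. The actual comparison rules in bpforms may differ, and the namespace and id values are placeholders.

```python
class SketchIdentifier:
    """Stand-in for an identifier in a namespace of an external database."""

    def __init__(self, ns, id):
        self.ns = ns
        self.id = id

    def __eq__(self, other):
        return (isinstance(other, SketchIdentifier)
                and self.ns == other.ns
                and self.id == other.id)

    def __hash__(self):
        # Hash on the same fields that __eq__ compares
        return hash((self.ns, self.id))


identifiers = {
    SketchIdentifier('example-ns', '0001'),
    SketchIdentifier('example-ns', '0001'),  # duplicate, collapsed by the set
    SketchIdentifier('other-ns', '0001'),
}
print(len(identifiers))
```
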

class bpforms.core.IdentifierSet(identifiers=None)[source]

Bases: set

Set of identifiers

add(identifier)[source]

Add an identifier

Parameters

identifier (Identifier) – identifier

Raises

ValueError – if the identifier is not an instance of Identifier

symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters

other (IdentifierSet) – other set of identifiers

update(identifiers)[source]

Add a set of identifiers

Parameters

identifiers (iterable of Identifier) – identifiers

class bpforms.core.Monomer(id=None, name=None, synonyms=None, identifiers=None, structure=None, delta_mass=None, delta_charge=None, start_position=None, end_position=None, monomers_position=None, base_monomers=None, backbone_bond_atoms=None, backbone_displaced_atoms=None, r_bond_atoms=None, l_bond_atoms=None, r_displaced_atoms=None, l_displaced_atoms=None, comments=None)[source]

Bases: object

A monomeric form in a biopolymer

id[source]

id

Type

str

name[source]

name

Type

str

synonyms[source]

synonyms

Type

set of str

identifiers[source]

identifiers in namespaces for external databases

Type

set of Identifier, optional

structure[source]

chemical structure

Type

openbabel.OBMol

delta_mass[source]

additional mass (Dalton) relative to structure

Type

float

delta_charge[source]

additional charge relative to structure

Type

int

start_position[source]

start position of the range of uncertainty in the location of the monomeric form

Type

int

end_position[source]

end position of the range of uncertainty in the location of the monomeric form

Type

int

monomers_position[source]

originating monomers within start_position to end_position where the monomeric form may be located

Type

set of Monomer

base_monomers[source]

monomers which this monomeric form is derived from

Type

set of Monomer

backbone_bond_atoms[source]

atoms from monomeric form that bond to backbone

Type

AtomList

backbone_displaced_atoms[source]

atoms from monomeric form displaced by bond to backbone

Type

AtomList

r_bond_atoms[source]

atoms that bond with right/succeeding/following/forward monomeric form

Type

AtomList

l_bond_atoms[source]

atoms that bond with left/preceding/previous/backward monomeric form

Type

AtomList

r_displaced_atoms[source]

atoms displaced by bond with right/succeeding/following/forward monomeric form

Type

AtomList

l_displaced_atoms[source]

atoms displaced by bond with left/preceding/previous/backward monomeric form

Type

AtomList

comments[source]

comments

Type

str

IMAGE_URL_PATTERN = 'https://cactus.nci.nih.gov/chemical/structure/{}/image?format=gif&bgcolor=transparent&antialiasing=0'[source]
__str__(alphabet=None)[source]

Get a string representation of the monomeric form

Parameters

alphabet (Alphabet, optional) – alphabet

Returns

string representation of the monomeric form

Return type

str

property backbone_bond_atoms[source]

Get the atoms from the monomeric form that bond to backbone

Returns

atoms from the monomeric form that bond to backbone

Return type

AtomList

property backbone_displaced_atoms[source]

Get the atoms from the monomeric form displaced by the bond to the backbone

Returns

atoms from the monomeric form displaced by the bond to the backbone

Return type

AtomList

property base_monomers[source]

Get base monomeric forms

Returns

base monomeric forms

Return type

set of Monomer

property comments[source]

Get comments

Returns

comments

Return type

str

property delta_charge[source]

Get extra charge

Returns

extra charge

Return type

int

property delta_mass[source]

Get extra mass

Returns

extra mass

Return type

float

property end_position[source]

Get end position

Returns

end position

Return type

int

export(format, options=())[source]

Export structure to format

Parameters
  • format (str) – format

  • options (list of str, optional) – export options

Returns

format representation of structure

Return type

str

from_dict(dict, alphabet=None)[source]

Set the monomeric form from a dictionary representation

Parameters
  • dict (dict) – dictionary representation of the monomeric form

  • alphabet (Alphabet, optional) – alphabet

Returns

monomeric form

Return type

Monomer

get_canonical_code(monomer_codes, default_code='?')[source]

Get IUPAC/IUBMB representation of a monomeric form using the character code of its parent monomer (e.g. ‘methyl-2-adenosine’ is represented by ‘A’)

Parameters
  • monomer_codes (dict) – dictionary that maps monomeric forms to codes

  • default_code (str) – default code

Returns

IUPAC/IUBMB representation of monomeric form

Return type

str

get_charge()[source]

Get the charge

Returns

charge

Return type

int

get_formula()[source]

Get the chemical formula

Returns

chemical formula

Return type

EmpiricalFormula

get_image(bond_label='', displaced_label='', bond_opacity=255, displaced_opacity=63, backbone_bond_color=16711680, left_bond_color=65280, right_bond_color=255, include_all_hydrogens=True, show_atom_nums=False, atom_label_font_size=0.6, width=200, height=200, image_format='svg', include_xml_header=True)[source]

Get image

Parameters
  • bond_label (str, optional) – label for atoms involved in bonds

  • displaced_label (str, optional) – labels for atoms displaced by bond formation

  • bond_opacity (int, optional) – opacity of atoms involved in bonds

  • displaced_opacity (int, optional) – opacity of atoms displaced by bond formation

  • backbone_bond_color (int, optional) – color to paint atoms involved in bond with backbone

  • left_bond_color (int, optional) – color to paint atoms involved in bond with monomeric form to left

  • right_bond_color (int, optional) – color to paint atoms involved in bond with monomeric form to right

  • include_all_hydrogens (bool, optional) – if True, show all hydrogens

  • show_atom_nums (bool, optional) – if True, show the numbers of the atoms

  • atom_label_font_size (float, optional) – relative atom label font size

  • width (int, optional) – width in pixels

  • height (int, optional) – height in pixels

  • image_format (str, optional) – format of generated image {emf, eps, jpeg, msbmp, pdf, png, or svg}

  • include_xml_header (bool, optional) – if True, include XML header at the beginning of the SVG

Returns

image

Return type

object
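
The default values of backbone_bond_color, left_bond_color, and right_bond_color are plain integers. They read naturally as packed 24-bit RGB values (0xRRGGBB). That interpretation is an assumption made here, not something this reference states. A minimal sketch, with hypothetical helper names, of how such integers decode:

```python
def int_to_rgb(color):
    """Split a 24-bit integer into (red, green, blue), assuming 0xRRGGBB packing."""
    return ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF)


def int_to_hex(color):
    """Format a 24-bit integer as a '#rrggbb' string."""
    return '#{:06x}'.format(color)


# Default values copied from the get_image signature above
defaults = {
    'backbone_bond_color': 16711680,
    'left_bond_color': 65280,
    'right_bond_color': 255,
}

decoded = {name: (int_to_rgb(value), int_to_hex(value))
           for name, value in defaults.items()}
for name, (rgb, hex_code) in decoded.items():
    print(name, rgb, hex_code)
```

Under this assumption the three defaults are pure red, pure green, and pure blue, which would make the backbone, left, and right bond atoms easy to tell apart in a rendered image.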

get_image_url()[source]

Get URL for image of structure

Returns

URL for image of structure

Return type

str
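
IMAGE_URL_PATTERN contains a single {} placeholder for a structure identifier. The sketch below shows one plausible way to fill it. It assumes the identifier is a SMILES string that is percent-encoded before substitution, and build_image_url is a hypothetical helper, not the library's implementation.

```python
from urllib.parse import quote

# Pattern copied from Monomer.IMAGE_URL_PATTERN
IMAGE_URL_PATTERN = ('https://cactus.nci.nih.gov/chemical/structure/{}/image'
                     '?format=gif&bgcolor=transparent&antialiasing=0')


def build_image_url(structure_id):
    """Fill the pattern with a percent-encoded structure identifier.

    Assumption: structure_id is a SMILES string; characters such as
    '[', ']', '@', '(' and '=' must be escaped to be URL-safe.
    """
    return IMAGE_URL_PATTERN.format(quote(structure_id, safe=''))


# Illustrative SMILES string only
url = build_image_url('C[C@H](N)C(=O)O')
print(url)
```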

get_major_micro_species(ph, major_tautomer=False, dearomatize=False)[source]

Update to the major protonation and tautomerization state at the pH

Parameters
  • ph (float) – pH

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

get_mol_wt()[source]

Get the molecular weight

Returns

molecular weight

Return type

float

get_root_monomers()[source]

Get root monomeric forms

Returns

root monomeric forms

Return type

set of Monomer
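
The relationship between base_monomers and get_root_monomers can be pictured as a walk up a derivation graph. The sketch below uses a hypothetical stand-in class rather than bpforms.core.Monomer, and it assumes that a monomeric form with no base monomers counts as its own root.

```python
class SketchMonomer:
    """Hypothetical stand-in for Monomer: only an id and its base monomers."""

    def __init__(self, id, base_monomers=None):
        self.id = id
        self.base_monomers = set(base_monomers or [])


def get_root_monomers(monomer):
    """Collect the ancestors of a monomer that have no base monomers."""
    if not monomer.base_monomers:
        # Assumption: a form that derives from nothing is its own root
        return {monomer}
    roots = set()
    for base in monomer.base_monomers:
        roots |= get_root_monomers(base)
    return roots


# Illustrative derivation chain: 'mmA' derives from 'mA', which derives from 'A'
a = SketchMonomer('A')
m_a = SketchMonomer('mA', base_monomers=[a])
mm_a = SketchMonomer('mmA', base_monomers=[m_a])

root_ids = sorted(m.id for m in get_root_monomers(mm_a))
print(root_ids)
```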

property id[source]

Get id

Returns

id

Return type

str

property identifiers[source]

Get identifiers

Returns

identifiers

Return type

IdentifierSet

is_equal(other)[source]

Check if two monomeric forms are semantically equal

Parameters

other (Monomer) – another monomeric form

Returns

True, if the objects have the same structure

Return type

bool

property l_bond_atoms[source]

Get the left bond atoms

Returns

left bond atoms

Return type

AtomList

property l_displaced_atoms[source]

Get the left displaced atoms

Returns

left displaced atoms

Return type

AtomList

property monomers_position[source]

Get the originating monomers within start_position to end_position where the monomeric form may be located

Returns

originating monomers within start_position to end_position where the monomeric form may be located

Return type

set of Monomer

property name[source]

Get name

Returns

name

Return type

str

property r_bond_atoms[source]

Get the right bond atoms

Returns

right bond atoms

Return type

AtomList

property r_displaced_atoms[source]

Get the right displaced atoms

Returns

right displaced atoms

Return type

AtomList

property start_position[source]

Get start position

Returns

start position

Return type

int

property structure[source]

Get structure

Returns

structure

Return type

openbabel.OBMol

property synonyms[source]

Get synonyms

Returns

synonyms

Return type

SynonymSet

to_dict(alphabet=None)[source]

Get a dictionary representation of the monomeric form

Parameters

alphabet (Alphabet, optional) – alphabet

Returns

dictionary representation of the monomeric form

Return type

dict

class bpforms.core.MonomerDict(*args, **kwargs)[source]

Bases: attrdict.dictionary.AttrDict

Dictionary for monomeric forms

__setitem__(code, monomer)[source]

Set monomeric form with code

Parameters
  • code (str) – characters for monomeric form

  • monomer (Monomer) – monomeric form

class bpforms.core.MonomerSequence(monomers=None)[source]

Bases: list

Sequence of monomeric forms

__setitem__(slice, monomer)[source]

Set monomeric form(s) at slice

Parameters
  • slice (int or slice) – position(s) to set monomeric form

  • monomer (Monomer or list of Monomer) – monomeric form(s)

append(monomer)[source]

Add a monomeric form

Parameters

monomer (Monomer) – monomeric form

Raises

ValueError – if the monomer is not an instance of Monomer

extend(monomers)[source]

Add a list of monomeric forms

Parameters

monomers (iterable of Monomer) – iterable of monomeric forms

get_monomer_counts()[source]

Get the frequency of each monomeric form within the sequence

Returns

dictionary that maps monomeric forms to their counts

Return type

dict
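
The kind of result get_monomer_counts describes, a dictionary that maps each monomeric form to its frequency, can be sketched with the standard library. Plain strings stand in for Monomer objects here; this illustrates the shape of the output, not the library's code.

```python
from collections import Counter

# Strings stand in for Monomer instances in this sketch
sequence = ['A', 'C', 'A', 'G', 'A', 'C']

# Count how often each item occurs and expose the result as a plain dict
counts = dict(Counter(sequence))
print(counts)
```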

insert(i, monomer)[source]

Insert a monomeric form at a position

Parameters
  • i (int) – position to insert monomeric form

  • monomer (Monomer) – monomeric form

is_equal(other)[source]

Determine if two sequences of monomeric forms are semantically equal

Parameters

other (MonomerSequence) – other sequence

Returns

True, if the sequences are semantically equal

Return type

bool

class bpforms.core.Nick(position=None)[source]

Bases: object

Nick between adjacent monomeric forms

position[source]

position of nick (\(1 \ldots L-1\)) where \(1\) indicates a nick between the first and second residues, \(2\) indicates a nick between the second and third residues, etc.

Type

int

is_equal(other)[source]
property position[source]

Get the position

Returns

position

Return type

int
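
The position convention for nicks (1 to L-1, where 1 lies between the first and second residues) can be illustrated by cutting a sequence into segments. split_at_nicks is a hypothetical helper written for this sketch and works on plain lists rather than bpforms objects.

```python
def split_at_nicks(residues, nick_positions):
    """Split a sequence at nick positions in the range 1..L-1.

    Position p marks a break between residue p and residue p + 1
    (1-based), which is the slice boundary p in 0-based indexing.
    """
    length = len(residues)
    for position in nick_positions:
        if not 1 <= position <= length - 1:
            raise ValueError('nick position {} is outside 1..{}'.format(
                position, length - 1))

    segments = []
    start = 0
    for position in sorted(set(nick_positions)):
        segments.append(residues[start:position])
        start = position
    segments.append(residues[start:])
    return segments


segments = split_at_nicks(list('ACGTAC'), {2, 4})
print(segments)
```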

class bpforms.core.NickSet[source]

Bases: set

Set of nicks

add(nick)[source]

Add a nick

Parameters

nick (Nick) – nick

Raises

ValueError – if the nick is not an instance of Nick

is_equal(other)[source]

Check if two sets of nicks are semantically equal

Parameters

other (NickSet) – other set of nicks

Returns

True, if the nick sets are semantically equal

Return type

bool

symmetric_difference_update(other)[source]

Remove common elements with other and add elements from other not in self

Parameters

other (NickSet) – other set of nicks

update(nicks)[source]

Add a set of nicks

Parameters

nicks (iterable of Nick) – nicks
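
NickSet, like IdentifierSet and SynonymSet, is documented as a set subclass whose add raises ValueError for members of the wrong type and which overrides update and symmetric_difference_update. A minimal sketch of that pattern follows. TypedSet is hypothetical, and int stands in for Nick as the member type.

```python
class TypedSet(set):
    """Hypothetical set that only accepts members of one type."""

    member_type = int  # stand-in for Nick, Identifier, or str

    def __init__(self, items=None):
        super().__init__()
        if items is not None:
            self.update(items)

    def add(self, item):
        """Add an item, rejecting anything of the wrong type."""
        if not isinstance(item, self.member_type):
            raise ValueError('{!r} is not an instance of {}'.format(
                item, self.member_type.__name__))
        super().add(item)

    def update(self, items):
        """Add each item through add() so the type check always runs."""
        for item in items:
            self.add(item)

    def symmetric_difference_update(self, other):
        """Remove shared elements and add elements found only in other."""
        for item in set(other):  # copy first so other may be self
            if item in self:
                self.remove(item)
            else:
                self.add(item)


nicks = TypedSet([1, 3])
nicks.symmetric_difference_update(TypedSet([3, 5]))

try:
    nicks.add('7')
    rejected = False
except ValueError:
    rejected = True
```

Routing update and symmetric_difference_update through add keeps the type check in one place, which is one likely reason a set subclass would override all three methods.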

class bpforms.core.OntoBond(type=None, l_monomer=None, r_monomer=None)[source]

Bases: bpforms.core.BondBase

A crosslinking bond whose molecular details are defined in an ontology of crosslinks

type[source]

type of bond

Type

Bond

l_monomer[source]

location of left monomeric form

Type

int

r_monomer[source]

location of right monomeric form

Type

int

__str__()[source]

Generate string representation of bond

Returns

string representation of bond

Return type

str

get_l_bond_atoms()[source]

Get left bond atoms

Returns

left bond atoms

Return type

list of Atom

get_l_displaced_atoms()[source]

Get left displaced atoms

Returns

left displaced atoms

Return type

list of Atom

get_order()[source]

Get the order

Returns

order

Return type

BondOrder

get_r_bond_atoms()[source]

Get right bond atoms

Returns

right bond atoms

Return type

list of Atom

get_r_displaced_atoms()[source]

Get right displaced atoms

Returns

right displaced atoms

Return type

list of Atom

get_stereo()[source]

Get the stereochemistry

Returns

stereochemistry

Return type

BondStereo

is_equal(other)[source]

Determine if two bonds are semantically equal

Parameters

other (Bond) – other bond

Returns

True if the bonds are semantically equal

Return type

bool

property l_monomer[source]

Get location of left monomeric form

Returns

location of left monomeric form

Return type

int

property r_monomer[source]

Get location of right monomeric form

Returns

location of right monomeric form

Return type

int

property type[source]

Get type

Returns

type

Return type

Bond

class bpforms.core.SynonymSet(synonyms=None)[source]

Bases: set

Set of synonyms

add(synonym)[source]

Add a synonym

Parameters

synonym (str) – synonym

Raises

ValueError – if the synonym is not an instance of str

symmetric_difference_update(other)[source]

Remove common synonyms with other and add synonyms from other not in self

Parameters

other (SynonymSet) – other set of synonyms

update(synonyms)[source]

Add a set of synonyms

Parameters

synonyms (iterable of str) – synonyms

bpforms.core.get_hydrogen_atom(parent_atom, bonding_hydrogens, i_monomer)[source]

Get a hydrogen atom attached to a parent atom

Parameters
  • parent_atom (openbabel.OBAtom) – parent atom

  • bonding_hydrogens (list) – hydrogens that have already been gotten

  • i_monomer (int) – index of parent monomer in sequence

Returns

hydrogen atom

Return type

openbabel.OBAtom

bpforms.core.parse_yaml(path)[source]

Read a YAML file

Parameters

path (str) – path to YAML file

Returns

content of file

Return type

object

13.1.6. bpforms.rest module

REST JSON API

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-02-05

Copyright

2019, Karr Lab

License

MIT

class bpforms.rest.AlpabetResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get alphabets

endpoint = 'alphabet_alpabet_resource'[source]
get(id)[source]

Get an alphabet

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.AlphabetsResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get list of alphabets

endpoint = 'alphabet_alphabets_resource'[source]
get()[source]

Get a list of available alphabets

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.Bpform(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Optionally, calculate the major protonation and tautomerization form of a biopolymer form and calculate its properties

endpoint = 'bpform_bpform'[source]
mediatypes()[source]
methods = {'POST'}[source]
post()[source]

Optionally, calculate the major protonation and tautomerization form of a biopolymer form and calculate its properties

class bpforms.rest.MonomerResource(api=None, *args, **kwargs)[source]

Bases: flask_restplus.resource.Resource

Get information about a monomer

endpoint = 'alphabet_monomer_resource_2'[source]
get(alphabet, monomer, format)[source]

Get a monomeric form

mediatypes()[source]
methods = {'GET'}[source]
class bpforms.rest.PrefixMiddleware(app, prefix='')[source]

Bases: object

bpforms.rest.get_alphabet(id)[source]

Get an alphabet

Parameters

id (str) – id of alphabet

Returns

dictionary representation of an alphabet

Return type

dict

bpforms.rest.get_monomer(alphabet, monomer, format)[source]

Get a monomeric form

Parameters
  • alphabet (str) – id of the alphabet

  • monomer (str) – code of a monomeric form

  • format (str) – output format (“emf”, “eps”, “jpeg”, “json”, “msbmp”, “pdf”, “png” or “svg”)

Returns

dictionary representation of a monomer or an image of the monomer in the requested format

Return type

object

bpforms.rest.get_monomer_properties(alphabet, monomer)[source]

Get properties of a monomeric form

Parameters
  • alphabet (str) – id of an alphabet

  • monomer (str) – code of monomeric form

Returns

properties of monomeric form

Return type

dict

13.1.7. bpforms.util module

Utilities for BpForms

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-02-05

Copyright

2019, Karr Lab

License

MIT

bpforms.util.build_alphabets(ph=None, major_tautomer=False, dearomatize=False, _max_monomers=inf, alphabets=None)[source]

Build DNA, RNA, and protein alphabets

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

  • _max_monomers (float, optional) – maximum number of monomeric forms to build; used for testing

  • alphabets (list of str or None, optional) – ids of alphabets to build. If None, build all alphabets

bpforms.util.export_ontos_to_obo(alphabets=None, filename=None, _max_monomers=None, _max_xlinks=None)[source]

Exports alphabets of residues and ontology of crosslinks to OBO format

Parameters
  • alphabets (list of core.Alphabet, optional) – alphabets to export

  • filename (str, optional) – path to export alphabets

  • _max_monomers (int, optional) – maximum number of monomers to export

  • _max_xlinks (int, optional) – maximum number of crosslinks to export

bpforms.util.gen_genomic_viz(polymers, inter_crosslinks=None, polymer_labels=None, seq_features=None, width=800, cols=1, polymer_margin=25, nt_per_track=100, track_sep=10, polymer_label_font_size=15, seq_font_size=13, tick_label_font_size=10, legend_font_size=13, tooltip_font_size=13, x_link_stroke_width=2, x_link_radius=4, nick_stroke_width=2, axis_stroke_width=0.5, seq_color='#000000', non_canonical_color='#e74624', intra_x_link_color='#2daae1', inter_x_link_color='#90e227', nick_color='#dabe2e', axis_color='#000000', polymer_label_color='#000000')[source]

Get a genomic visualization of the BpForm

Parameters
  • polymers (list of core.BpForm) – polymers

  • inter_crosslinks (list) – list of inter-polymer crosslinks

  • polymer_labels (dict, optional) – dictionary that maps polymers to their labels

  • seq_features (list of dict, optional) –

    list of features each represented by a dictionary with three keys

    • label (str): description of the type of feature

    • color (str): color

    • positions (list of dict): dictionary which maps indices of polymers to a list of position ranges of the type of feature

  • width (int, optional) – width

  • cols (int, optional) – number of columns of polymers

  • polymer_margin (int, optional) – horizontal and vertical spacing between polymers

  • nt_per_track (int, optional) – number of nucleotides per track

  • track_sep (int, optional) – vertical separation between tracks in pixels

  • polymer_label_font_size (float, optional) – font size of polymer label

  • seq_font_size (float, optional) – font size of sequence

  • tick_label_font_size (float, optional) – font size of tick labels

  • legend_font_size (float, optional) – font size of legend

  • tooltip_font_size (float, optional) – font size of tooltip

  • x_link_stroke_width (float, optional) – stroke width of crosslinks

  • x_link_radius (float, optional) – radius of crosslinks line

  • nick_stroke_width (float, optional) – stroke width of nicks

  • axis_stroke_width (float, optional) – stroke width of axis

  • seq_color (str, optional) – color of canonical monomers

  • non_canonical_color (str, optional) – color of non-canonical monomers

  • intra_x_link_color (str, optional) – color of intrastrand crosslinks

  • inter_x_link_color (str, optional) – color of interstrand crosslinks

  • nick_color (str, optional) – color of nicks

  • axis_color (str, optional) – color of axis

  • polymer_label_color (str, optional) – color of polymer labels

Returns

SVG image

Return type

str

bpforms.util.gen_html_viz_alphabet(bpform_type, filename)[source]

Create and save an HTML document with images of the monomeric forms in an alphabet

Parameters
  • bpform_type (type) – subclass of core.BpForm

  • filename (str) – path to save HTML document with images of monomeric forms

bpforms.util.get_alphabet(alphabet)[source]

Get an alphabet

Parameters

alphabet (str) – alphabet

Returns

alphabet

Return type

core.Alphabet

bpforms.util.get_alphabets()[source]

Get a list of available alphabets

Returns

dictionary which maps the ids of alphabets to alphabets

Return type

dict

bpforms.util.get_form(alphabet)[source]

Get a subclass of BpForm

Parameters

alphabet (str) – alphabet

Returns

subclass of BpForm

Return type

type

bpforms.util.read_from_fasta(filename, alphabet)[source]

Read BpForms from a FASTA-formatted file

Parameters
  • filename (str) – path to FASTA-formatted file

  • alphabet (str) – alphabet of BpForms in file

Returns

dictionary which maps the ids of molecules to their BpForms-encoded sequences

Return type

dict

bpforms.util.validate_bpform_bonds(form_type)[source]

Validate bonds in alphabet

Parameters

form_type (type) – type of BpForm

Raises

ValueError – if any of the bonds are invalid

bpforms.util.write_to_fasta(forms, filename)[source]

Write BpForms to a FASTA-formatted file

Parameters
  • forms (dict) – dictionary which maps the ids of molecules to their BpForms-encoded sequences

  • filename (str) – path to FASTA-formatted file
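
read_from_fasta and write_to_fasta both work with a dictionary that maps molecule ids to sequences. The round trip can be sketched with the standard library alone. The helpers below are hypothetical, use plain strings in place of BpForms-encoded sequences, and assume the simplest FASTA layout: a '>' header line followed by sequence lines.

```python
import os
import tempfile


def write_fasta(forms, filename):
    """Write {id: sequence} pairs as '>id' header lines followed by sequences."""
    with open(filename, 'w') as file:
        for mol_id, seq in forms.items():
            file.write('>{}\n{}\n'.format(mol_id, seq))


def read_fasta(filename):
    """Read a FASTA file into a {id: sequence} dictionary."""
    forms = {}
    mol_id = None
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                mol_id = line[1:]
                forms[mol_id] = ''
            elif mol_id is not None:
                # Sequences may span several lines; join them
                forms[mol_id] += line
    return forms


# Illustrative ids and sequences only
forms = {'seq-1': 'ACGU', 'seq-2': 'AAC'}
with tempfile.TemporaryDirectory() as dirname:
    path = os.path.join(dirname, 'forms.fasta')
    write_fasta(forms, path)
    round_trip = read_fasta(path)
```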

13.1.8. Module contents