13.1.1.1. bpforms.alphabet package

13.1.1.1.1. Submodules

13.1.1.1.2. bpforms.alphabet.core module

Code to help build alphabets

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-08-14

Copyright

2019, Karr Lab

License

MIT

bpforms.alphabet.core.download_pdb_ccd()[source]

Download PDB CCD

Returns

path to tar.gz file for the PDB CCD

Return type

str

bpforms.alphabet.core.get_can_smiles(mol)[source]

Get the canonical SMILES representation of a molecule without its stereochemistry

Parameters

mol (openbabel.OBMol) – molecule

Returns

SMILES representation of a molecule without its stereochemistry

Return type

str

bpforms.alphabet.core.get_pdb_ccd_open_babel_mol(pdb_mol)[source]

Generate an Open Babel representation of a PDB CCD entry

Parameters

pdb_mol (dict) – structure of an entry

Returns

structure of an entry

Return type

openbabel.OBMol

bpforms.alphabet.core.parse_pdb_ccd(filename, valid_types, max_monomers)[source]

Parse entries out of the PDB CCD

Parameters
  • filename (str) – path to tar.gz file for PDB CCD

  • valid_types (tuple of str) – list of types of entries to retrieve

  • max_monomers (float) – maximum number of entries to process

Returns

list of metadata and structures of the entries

Return type

list of tuple
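
The filtering and capping behavior described for parse_pdb_ccd can be sketched with the standard library alone. This is a minimal illustration, not the bpforms implementation: the one-element XML layout, the helper names make_toy_archive and parse_toy_archive, and the example type strings are all assumptions made for the sketch, and the real PDB CCD schema is considerably richer.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def make_toy_archive(entries):
    """Build an in-memory tar.gz with one toy XML file per (id, type) pair."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for entry_id, entry_type in entries:
            # Hypothetical, simplified layout (not the real CCD schema)
            xml = '<entry id="{}" type="{}"/>'.format(entry_id, entry_type).encode()
            info = tarfile.TarInfo(name=entry_id + ".xml")
            info.size = len(xml)
            archive.addfile(info, io.BytesIO(xml))
    buffer.seek(0)
    return buffer


def parse_toy_archive(fileobj, valid_types, max_monomers=float("inf")):
    """Collect (id, type) for entries of a valid type, up to max_monomers."""
    results = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if len(results) >= max_monomers:
                break
            xml_file = archive.extractfile(member)
            if xml_file is None:  # skip directories and other non-files
                continue
            root = ET.parse(xml_file).getroot()
            if root.get("type") in valid_types:
                results.append((root.get("id"), root.get("type")))
    return results


archive = make_toy_archive([
    ("ALA", "L-peptide linking"),
    ("DA", "DNA linking"),
    ("GLY", "peptide linking"),
    ("A", "RNA linking"),
])
peptide_types = ("L-peptide linking", "peptide linking")
all_peptides = parse_toy_archive(archive, peptide_types)
archive.seek(0)
first_peptide = parse_toy_archive(archive, peptide_types, max_monomers=1)
```

Using float("inf") as the default cap keeps the comparison valid without a special case for "no limit", which may be why max_monomers is documented as a float.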

bpforms.alphabet.core.parse_pdb_ccd_entry(xml_file, valid_types)[source]

Parse an entry of the PDB CCD

Parameters
  • xml_file (io.BufferedReader) – XML file that defines an entry of the PDB CCD

  • valid_types (list of str) – list of types of entries to retrieve

Returns

  • Monomer: metadata about the entry

  • str: id of base monomer

  • str: SMILES-encoded structure of the entry

  • dict: structure of the entry

  • dict: dictionary that maps atom ids to their coordinates

Return type

tuple

13.1.1.1.3. bpforms.alphabet.dna module

Alphabet and BpForm to represent modified DNA

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-02-05

Copyright

2019, Karr Lab

License

MIT

class bpforms.alphabet.dna.CanonicalDnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical DNA form

DEFAULT_FASTA_CODE = 'N'[source]
class bpforms.alphabet.dna.DnaAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build DNA alphabet from DNAmod, REPAIRtoire, and the PDB CCD

class CovMod(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

cmodid[source]
definition[source]
netcharge[source]
symbol[source]
class ExpandedAlphabet(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Abbreviation[source]
Complement[source]
Name[source]
Symbol[source]
nameid[source]
INVALID_IDS = ("2'-deoxyinosine", "2'-deoxyuridine", "2'-deoxy-5-(4,5-dihydroxypentyl)uridine", "5-(2-aminoethoxy)methyl-2'-deoxyuridine", "5,6-dihydroxy-2'-deoxyuridine", "5-(2-aminoethyl)-2'-deoxyuridine", '7-amido-7-deazaguanosine', "(beta-D-glucopyranosyloxymethyl)deoxyuridine 5'-monophosphate", 'N(2),N(2)-dimethylguanosine')[source]
class ModBase(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

baseid[source]
cmodid[source]
formulaid[source]
nameid[source]
position[source]
verifiedstatus[source]
class ModBaseParents(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

nameid[source]
parentid[source]
class Names(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

chebiname[source]
inchi[source]
inchikey[source]
iupacname[source]
nameid[source]
othernames[source]
smiles[source]
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_dnamod(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from DNAmod

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_pdb_ccd(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from PDB CCD

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_repairtoire(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from REPAIRtoire

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

is_nucleobase_valid(monomer)[source]

Determine if monomeric form should be included in alphabet

Parameters

monomer (Monomer) – monomeric form

Returns

True if monomeric form should be included in alphabet

Return type

bool

load_session()[source]

Load an SQLAlchemy session

run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/dna.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

  • path (str, optional) – path to save alphabet

Returns

alphabet

Return type

Alphabet
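
The source-specific builders above all take an alphabet plus the same three options and return the alphabet, which suggests they can be chained. The sketch below shows one way such chaining might look. It is a toy stand-in under stated assumptions: the class ToyDnaAlphabetBuilder, the dict used as the alphabet, the monomer ids, the order of the steps, and the "first source wins" rule are all hypothetical and are not taken from bpforms.

```python
class ToyDnaAlphabetBuilder:
    """Toy stand-in that mimics the documented method signatures only."""

    def build(self, ph=None, major_tautomer=False, dearomatize=False):
        options = {"ph": ph, "major_tautomer": major_tautomer, "dearomatize": dearomatize}
        alphabet = {}  # hypothetical container: monomer id -> details
        # Assumed order; each step receives the alphabet and returns it
        for step in (self.build_dnamod, self.build_repairtoire, self.build_pdb_ccd):
            alphabet = step(alphabet, **options)
        return alphabet

    def build_dnamod(self, alphabet, **options):
        return self._add(alphabet, "toy-monomer-1", "DNAmod", options)

    def build_repairtoire(self, alphabet, **options):
        return self._add(alphabet, "toy-monomer-2", "REPAIRtoire", options)

    def build_pdb_ccd(self, alphabet, **options):
        # Reuses an earlier id to show that, in this sketch, the first source wins
        return self._add(alphabet, "toy-monomer-1", "PDB CCD", options)

    @staticmethod
    def _add(alphabet, monomer_id, source, options):
        alphabet.setdefault(monomer_id, {"source": source, **options})
        return alphabet


alphabet = ToyDnaAlphabetBuilder().build(ph=7.4, major_tautomer=True)
```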

class bpforms.alphabet.dna.DnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

DNA form

DEFAULT_FASTA_CODE = 'N'[source]
bpforms.alphabet.dna.get_dnamod(filename)[source]

13.1.1.1.4. bpforms.alphabet.protein module

Alphabet and BpForm to represent modified proteins

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-02-05

Copyright

2019, Karr Lab

License

MIT

class bpforms.alphabet.protein.CanonicalProteinForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical protein form

DEFAULT_FASTA_CODE = 'X'[source]
class bpforms.alphabet.protein.ProteinAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build protein alphabet from the PDB Chemical Component Dictionary <http://www.wwpdb.org/data/ccd> and RESID

MAX_RETRIES = 5[source]
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_from_mod(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from PSI-MOD ontology

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

build_from_pdb(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from PDB Chemical Component Dictionary

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

build_from_resid(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from RESID

Parameters
  • alphabet (Alphabet) – alphabet

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

get_resid_monomer_details(id, session)[source]

Get the CHEBI ID and synonyms of an amino acid from its RESID webpage

Parameters

id (str) – id of RESID entry

Returns

  • str: code

  • SynonymSet: set of synonyms

  • IdentifierSet: set of identifiers

  • set of str: ids of base monomeric forms

  • str: comments

Return type

tuple

get_resid_monomer_structure(name, pdb_filename, ph=None, major_tautomer=False, dearomatize=False)[source]

Get the structure of an amino acid from a PDB file

Parameters
  • name (str) – name of monomeric form

  • pdb_filename (str) – path to PDB file with structure

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

  • openbabel.OBMol: structure

  • int: index of atom of N terminus

  • int: index of atom of C terminus

Return type

tuple

get_termini(mol, residue=True)[source]

Get indices of atoms of N and C termini

Parameters
  • mol (openbabel.OBMol) – molecule

  • residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)

Returns

  • openbabel.OBMol: structure

  • int: index of atom of N terminus

  • int: index of atom of C terminus

Return type

tuple

is_c_terminus(mol, atom, residue=True, convert_to_aa=False)[source]

Determine if an atom is a C-terminus

Parameters
  • mol (openbabel.OBMol) – molecule

  • atom (openbabel.OBAtom) – atom

  • residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)

  • convert_to_aa (bool, optional) – if True, convert COH to COOH

Returns

True if the atom is a C-terminus

Return type

bool

is_n_terminus(mol, atom)[source]

Determine if an atom is an N-terminus

Parameters
  • mol (openbabel.OBMol) – molecule

  • atom (openbabel.OBAtom) – atom

Returns

True if the atom is an N-terminus

Return type

bool

is_terminus(atom_n, atom_c)[source]

Determine if a pair of atoms are N- and C-termini

Parameters
  • atom_n (openbabel.OBAtom) – potential N-terminus

  • atom_c (openbabel.OBAtom) – potential C-terminus

Returns

True, if the atoms are N- and C-termini

Return type

bool

run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/protein.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

  • path (str, optional) – path to save alphabet

Returns

alphabet

Return type

Alphabet

set_termini(mol, monomer, i_n, i_c)[source]

Set the C and N terminal bond atoms of a monomer

Parameters
  • mol (openbabel.OBMol) – molecule

  • monomer (Monomer) – monomer

  • i_n (int) – index of N terminus

  • i_c (int) – index of C terminus

class bpforms.alphabet.protein.ProteinForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Protein form

DEFAULT_FASTA_CODE = 'X'[source]
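
This section does not describe how DEFAULT_FASTA_CODE is used. A plausible reading is that it stands in for monomeric forms that have no canonical one-letter code when a sequence is reduced to FASTA. The sketch below illustrates that idea only: the function to_fasta, the CANONICAL_CODES lookup, and the monomer codes in it are hypothetical and are not part of bpforms.

```python
DEFAULT_FASTA_CODE = "X"  # value documented above for ProteinForm

# Hypothetical lookup from a monomer code to the code of its canonical parent
CANONICAL_CODES = {"A": "A", "G": "G", "phospho-S": "S"}


def to_fasta(monomer_codes, canonical_codes=CANONICAL_CODES, default=DEFAULT_FASTA_CODE):
    """Map each monomer code to a canonical code, or to the default if none is known."""
    return "".join(canonical_codes.get(code, default) for code in monomer_codes)


fasta = to_fasta(["A", "phospho-S", "unknown-mod", "G"])
```

Here the third monomer has no entry in the lookup, so it is written as X. The DNA and RNA forms document 'N' for the same attribute.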

13.1.1.1.5. bpforms.alphabet.rna module

Alphabet and BpForm to represent modified RNA

Author

Jonathan Karr <karr@mssm.edu>

Date

2019-02-05

Copyright

2019, Karr Lab

License

MIT

class bpforms.alphabet.rna.CanonicalRnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical RNA form

DEFAULT_FASTA_CODE = 'N'[source]
class bpforms.alphabet.rna.RnaAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build RNA alphabet from MODOMICS and the RNA Modification Database

MAX_RETRIES = 5[source]
MODOMICS_ENTRY_ENDPOINT = 'http://modomics.genesilico.pl/modifications/{}/'[source]
MODOMICS_INDEX_ASCII_ENDPOINT = 'http://modomics.genesilico.pl/modifications/?base=all&type=all&display_ascii=Display+as+ASCII'[source]
MODOMICS_INDEX_ENDPOINT = 'http://modomics.genesilico.pl/modifications/'[source]
RNA_MOD_DB_ENTRY_ENDPOINT = 'https://mods.rna.albany.edu/mods/modifications/view/{}'[source]
RNA_MOD_DB_INDEX_ENDPOINT = 'https://mods.rna.albany.edu/mods/modifications/search'[source]
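
The entry endpoints above contain a {} placeholder, so a URL for a single entry can be produced with str.format. The sketch below shows that, together with one way a retry limit such as MAX_RETRIES might be applied. It rests on assumptions: the helper names entry_url and fetch_with_retries, the entry id "m1A", and the reading of MAX_RETRIES as a total number of attempts are illustrative, and the fetch function is a simulated stand-in that makes no network request.

```python
import time

# Values copied from the attributes documented above
MODOMICS_ENTRY_ENDPOINT = 'http://modomics.genesilico.pl/modifications/{}/'
MAX_RETRIES = 5


def entry_url(entry_id):
    """Fill the {} placeholder of the entry endpoint with an entry id."""
    return MODOMICS_ENTRY_ENDPOINT.format(entry_id)


def fetch_with_retries(fetch, url, max_retries=MAX_RETRIES, delay=0.0):
    """Call fetch(url) until it succeeds or max_retries attempts have failed."""
    last_error = None
    for _ in range(max_retries):
        try:
            return fetch(url)
        except ConnectionError as error:
            last_error = error
            time.sleep(delay)
    raise RuntimeError("all {} attempts failed".format(max_retries)) from last_error


calls = []


def flaky_fetch(url):
    """Simulated fetch: fails twice, then returns a fake page."""
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "page for " + url


url = entry_url("m1A")  # illustrative entry id
page = fetch_with_retries(flaky_fetch, url)
```

Passing the fetch function as an argument keeps the retry logic independent of any particular HTTP client or cache session.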
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_modomics(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from MODOMICS

Parameters
  • alphabet (Alphabet) – alphabet

  • session (requests_cache.core.CachedSession) – request cache session

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

build_pdb(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from PDB CCD

Parameters
  • alphabet (Alphabet) – alphabet

  • session (requests_cache.core.CachedSession) – request cache session

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

Returns

alphabet

Return type

Alphabet

build_rna_mod_db(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from the RNA Modification Database

Parameters
  • alphabet (Alphabet) – alphabet

  • session (requests_cache.core.CachedSession) – request cache session

  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

get_nucleoside_details_from_modomics(id, session)[source]

Get the structure of a nucleoside in the MODOMICS database

Parameters

id (str) – id of nucleoside in MODOMICS database

Returns

  • openbabel.OBMol: structure

  • IdentifierSet: identifiers

Return type

tuple

is_backbone_atom(b_atom)[source]

Determine if an atom is a valid backbone bonding site

Parameters

b_atom (openbabel.OBAtom) – potential backbone atom

Returns

True, if the atom is a valid backbone bonding site

Return type

bool

is_l_atom(l_atom)[source]

Determine if an atom is a valid left bonding site

Parameters

l_atom (openbabel.OBAtom) – potential left atom

Returns

True, if the atom is a valid left bonding site

Return type

bool

is_nucleotide_terminus(l_atom, r_atom)[source]

Determine if a pair of atoms is a valid pair of bonding sites

Parameters
  • l_atom (openbabel.OBAtom) – potential left atom

  • r_atom (openbabel.OBAtom) – potential right bond atom

Returns

True, if the atoms are a valid pair of bonding sites

Return type

bool

is_r_bond_atom(r_atom)[source]

Determine if an atom is a valid right bond bonding site

Parameters

r_atom (openbabel.OBAtom) – potential right bond atom

Returns

True, if the atom is a valid right bond bonding site

Return type

bool

is_terminus(b_atom, r_atom)[source]

Determine if a pair of atoms is a valid pair of bonding sites

Parameters
  • b_atom (openbabel.OBAtom) – potential backbone atom

  • r_atom (openbabel.OBAtom) – potential right bond atom

Returns

True, if the atoms are a valid pair of bonding sites

Return type

bool

is_valid_nucleoside(monomer)[source]

Determine if nucleoside should be included in alphabet

Parameters

monomer (Monomer) – monomeric form

Returns

True if the monomeric form is a valid nucleoside

Return type

bool

run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/rna.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form

  • major_tautomer (bool, optional) – if True, calculate the major tautomer

  • dearomatize (bool, optional) – if True, dearomatize molecule

  • path (str, optional) – path to save alphabet

Returns

alphabet

Return type

Alphabet

class bpforms.alphabet.rna.RnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

RNA form

DEFAULT_FASTA_CODE = 'N'[source]

13.1.1.1.6. Module contents