10.1.1.1. bpforms.alphabet package

10.1.1.1.1. Submodules

10.1.1.1.2. bpforms.alphabet.core module

Code to help build alphabets

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-08-14
Copyright:2019, Karr Lab
License:MIT
bpforms.alphabet.core.download_pdb_ccd()[source]

Download PDB CCD

Returns:path to tar.gz file for the PDB CCD
Return type:str
bpforms.alphabet.core.get_can_smiles(mol)[source]

Get the canonical SMILES representation of a molecule without its stereochemistry

Parameters:mol (openbabel.OBMol) – molecule
Returns:SMILES representation of a molecule without its stereochemistry
Return type:str
bpforms.alphabet.core.get_pdb_ccd_open_babel_mol(pdb_mol)[source]

Generate an Open Babel representation of a PDB CCD entry

Parameters:pdb_mol (dict) – structure of an entry
Returns:structure of an entry
Return type:openbabel.OBMol
bpforms.alphabet.core.parse_pdb_ccd(filename, valid_types, max_monomers)[source]

Parse entries out of the PDB CCD

Parameters:
  • filename (str) – path to tar.gz file for PDB CCD
  • valid_types (tuple of str) – list of types of entries to retrieve
  • max_monomers (float) – maximum number of entries to process
Returns:

list of metadata and structures of the entries

Return type:

list of tuple
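
The filtering that parse_pdb_ccd describes (read entries from a tar.gz archive, keep those whose type is in valid_types, stop at a maximum count) can be illustrated with the standard library alone. This is a minimal sketch, not the bpforms implementation: the one-element XML schema, the helper names make_demo_archive and parse_demo_archive, and the choice to cap the number of entries kept (rather than the number examined) are all assumptions made for illustration.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def make_demo_archive(entries):
    """Build an in-memory tar.gz holding one toy XML file per (id, type) pair."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        for comp_id, comp_type in entries:
            # Hypothetical schema: a single element with id and type attributes
            data = '<component id="{}" type="{}"/>'.format(comp_id, comp_type).encode('utf-8')
            info = tarfile.TarInfo(name='{}.xml'.format(comp_id))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def parse_demo_archive(fileobj, valid_types, max_entries=float('inf')):
    """Return (id, type) for entries whose type is in valid_types, keeping at most max_entries."""
    results = []
    with tarfile.open(fileobj=fileobj, mode='r:gz') as tar:
        for member in tar:
            if len(results) >= max_entries:
                break
            if not member.isfile():
                continue
            root = ET.parse(tar.extractfile(member)).getroot()
            if root.get('type') in valid_types:
                results.append((root.get('id'), root.get('type')))
    return results


archive = make_demo_archive([
    ('ALA', 'L-PEPTIDE LINKING'),
    ('DA', 'DNA LINKING'),
    ('A', 'RNA LINKING'),
    ('DC', 'DNA LINKING'),
])
dna_entries = parse_demo_archive(archive, valid_types=('DNA LINKING',), max_entries=1)
print(dna_entries)
```

Using float('inf') as the default cap mirrors the _max_monomers=inf default shown for the alphabet builders, which is presumably why max_monomers is typed as float.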

bpforms.alphabet.core.parse_pdb_ccd_entry(xml_file, valid_types)[source]

Parse an entry of the PDB CCD

Parameters:
  • xml_file (io.BufferedReader) – XML file that defines an entry of the PDB CCD
  • valid_types (list of str) – list of types of entries to retrieve
Returns:

  • Monomer: metadata about the entry
  • str: id of base monomer
  • str: SMILES-encoded structure of the entry
  • dict: structure of the entry
  • dict: dictionary that maps atom ids to their coordinates

Return type:

tuple

10.1.1.1.3. bpforms.alphabet.dna module

Alphabet and BpForm to represent modified DNA

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-02-05
Copyright:2019, Karr Lab
License:MIT
class bpforms.alphabet.dna.CanonicalDnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical DNA form

DEFAULT_FASTA_CODE = 'N'[source]
class bpforms.alphabet.dna.DnaAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build DNA alphabet from DNAmod, the PDB CCD, and REPAIRtoire

class CovMod(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

cmodid[source]
definition[source]
netcharge[source]
symbol[source]
class ExpandedAlphabet(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Abbreviation[source]
Complement[source]
Name[source]
Symbol[source]
nameid[source]
INVALID_IDS = ("2'-deoxyinosine", "2'-deoxyuridine", "2'-deoxy-5-(4,5-dihydroxypentyl)uridine", "5-(2-aminoethoxy)methyl-2'-deoxyuridine", "5,6-dihydroxy-2'-deoxyuridine", "5-(2-aminoethyl)-2'-deoxyuridine", '7-amido-7-deazaguanosine', "(beta-D-glucopyranosyloxymethyl)deoxyuridine 5'-monophosphate", 'N(2),N(2)-dimethylguanosine')[source]
class ModBase(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

baseid[source]
cmodid[source]
formulaid[source]
nameid[source]
position[source]
verifiedstatus[source]
class ModBaseParents(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

nameid[source]
parentid[source]
class Names(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

chebiname[source]
inchi[source]
inchikey[source]
iupacname[source]
nameid[source]
othernames[source]
smiles[source]
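
The mapped classes above share a nameid column, which suggests that rows are linked through it. The sketch below shows that kind of lookup using the standard-library sqlite3 module in place of SQLAlchemy. It is an illustration under assumptions: the table names (names, modbase), the reduced column set, the meaning of verifiedstatus, and every row of data are made up and are not taken from DNAmod.

```python
import sqlite3

# In-memory database with two toy tables that reuse column names from the classes above
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE names (nameid TEXT PRIMARY KEY, chebiname TEXT, smiles TEXT);
    CREATE TABLE modbase (nameid TEXT, baseid TEXT, verifiedstatus INTEGER);
    INSERT INTO names VALUES ('n1', 'demo base A', 'C');
    INSERT INTO names VALUES ('n2', 'demo base B', 'CC');
    INSERT INTO modbase VALUES ('n1', 'a', 1);
    INSERT INTO modbase VALUES ('n2', 'c', 0);
""")

# Join on nameid and keep only rows flagged as verified (assumed semantics)
rows = conn.execute("""
    SELECT names.nameid, names.chebiname, names.smiles, modbase.baseid
    FROM modbase JOIN names ON names.nameid = modbase.nameid
    WHERE modbase.verifiedstatus = 1
    ORDER BY names.nameid
""").fetchall()
print(rows)
conn.close()
```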
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_dnamod(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from DNAmod

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_pdb_ccd(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from PDB CCD

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_repairtoire(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from REPAIRtoire

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

is_nucleobase_valid(monomer)[source]

Determine if monomeric form should be included in alphabet

Parameters:monomer (Monomer) – monomeric form
Returns:True if monomeric form should be included in alphabet
Return type:bool
load_session()[source]

Load an SQLAlchemy session

run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/dna.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
  • path (str, optional) – path to save alphabet
Returns:

alphabet

Return type:

Alphabet
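
A call to run might look like the following sketch. It uses only the constructor and keyword arguments listed above. The wrapper name build_dna_alphabet, the pH of 7.4, and the temporary output directory are illustrative choices, and the sketch assumes bpforms and its dependencies are installed and that the data sources the builder draws on are reachable.

```python
import os
import tempfile


def build_dna_alphabet(ph=7.4, out_dir=None):
    """Build the DNA alphabet at the given pH and save it to a YAML file.

    Returns the alphabet and the path it was saved to.
    """
    # Imported lazily so that this module loads even where bpforms is not installed
    from bpforms.alphabet.dna import DnaAlphabetBuilder

    out_dir = out_dir or tempfile.mkdtemp()
    path = os.path.join(out_dir, 'dna.yml')
    alphabet = DnaAlphabetBuilder().run(
        ph=ph, major_tautomer=True, dearomatize=False, path=path)
    return alphabet, path
```

Passing an explicit path avoids writing to the default location, which points inside the installed package.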

class bpforms.alphabet.dna.DnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

DNA form

DEFAULT_FASTA_CODE = 'N'[source]
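
One plausible use of a default FASTA code is as a fallback when a monomeric form has no canonical one-letter equivalent. The sketch below illustrates that idea only. The function to_fasta, the mapping, and the monomer ids are hypothetical and are not part of bpforms.

```python
DEFAULT_FASTA_CODE = 'N'


def to_fasta(monomer_ids, canonical_code_by_id, default=DEFAULT_FASTA_CODE):
    """Map each monomer id to a one-letter code, falling back to the default code."""
    return ''.join(canonical_code_by_id.get(monomer_id, default) for monomer_id in monomer_ids)


# Hypothetical mapping: 'm5C' resolves to its parent base, 'xyz' has no known parent
codes = {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'm5C': 'C'}
fasta = to_fasta(['A', 'm5C', 'xyz', 'T'], codes)
print(fasta)
```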
bpforms.alphabet.dna.get_dnamod(filename)[source]

10.1.1.1.4. bpforms.alphabet.protein module

Alphabet and BpForm to represent modified proteins

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-02-05
Copyright:2019, Karr Lab
License:MIT
class bpforms.alphabet.protein.CanonicalProteinForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical protein form

DEFAULT_FASTA_CODE = 'X'[source]
class bpforms.alphabet.protein.ProteinAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build protein alphabet from the PDB Chemical Component Dictionary <http://www.wwpdb.org/data/ccd> and RESID

MAX_RETRIES = 5[source]
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_from_mod(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from PSI-MI ontology

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
build_from_pdb(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from PDB Chemical Component Dictionary

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
build_from_resid(alphabet, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from RESID

Parameters:
  • alphabet (Alphabet) – alphabet
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
get_resid_monomer_details(id, session)[source]

Get the CHEBI ID and synonyms of an amino acid from its RESID webpage

Parameters:id (str) – id of RESID entry
Returns:

  • str: code
  • SynonymSet: set of synonyms
  • IdentifierSet: set of identifiers
  • set of str: ids of base monomeric forms
  • str: comments

Return type:

tuple

get_resid_monomer_structure(name, pdb_filename, ph=None, major_tautomer=False, dearomatize=False)[source]

Get the structure of an amino acid from a PDB file

Parameters:
  • name (str) – name of monomeric form
  • pdb_filename (str) – path to PDB file with structure
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

  • openbabel.OBMol: structure
  • int: index of atom of N terminus
  • int: index of atom of C terminus

Return type:

tuple

get_termini(mol, residue=True)[source]

Get indices of atoms of N and C termini

Parameters:
  • mol (openbabel.OBMol) – molecule
  • residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)
Returns:

  • openbabel.OBMol: structure
  • int: index of atom of N terminus
  • int: index of atom of C terminus

Return type:

tuple

is_c_terminus(mol, atom, residue=True, convert_to_aa=False)[source]

Determine if an atom is a C-terminus

Parameters:
  • mol (openbabel.OBMol) – molecule
  • atom (openbabel.OBAtom) – atom
  • residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)
  • convert_to_aa (bool, optional) – if True, convert COH to COOH
Returns:

True if the atom is a C-terminus

Return type:

bool

is_n_terminus(mol, atom)[source]

Determine if an atom is an N-terminus

Parameters:
  • mol (openbabel.OBMol) – molecule
  • atom (openbabel.OBAtom) – atom
Returns:

True if the atom is an N-terminus

Return type:

bool

is_terminus(atom_n, atom_c)[source]

Determine if a pair of atoms are N- and C-termini

Parameters:
  • atom_n (openbabel.OBAtom) – potential N-terminus
  • atom_c (openbabel.OBAtom) – potential C-terminus
Returns:

True, if the atoms are N- and C-termini

Return type:

bool

run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/protein.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
  • path (str, optional) – path to save alphabet
Returns:

alphabet

Return type:

Alphabet

set_termini(mol, monomer, i_n, i_c)[source]

Set the C and N terminal bond atoms of a monomer

Parameters:
  • mol (openbabel.OBMol) – molecule
  • monomer (Monomer) – monomer
  • i_n (int) – index of N terminus
  • i_c (int) – index of C terminus
class bpforms.alphabet.protein.ProteinForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Protein form

DEFAULT_FASTA_CODE = 'X'[source]

10.1.1.1.5. bpforms.alphabet.rna module

Alphabet and BpForm to represent modified RNA

Author:Jonathan Karr <karr@mssm.edu>
Date:2019-02-05
Copyright:2019, Karr Lab
License:MIT
class bpforms.alphabet.rna.CanonicalRnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

Canonical RNA form

DEFAULT_FASTA_CODE = 'N'[source]
class bpforms.alphabet.rna.RnaAlphabetBuilder(_max_monomers=inf)[source]

Bases: bpforms.core.AlphabetBuilder

Build RNA alphabet from MODOMICS and the RNA Modification Database

MAX_RETRIES = 5[source]
MODOMICS_ENTRY_ENDPOINT = 'http://modomics.genesilico.pl/modifications/{}/'[source]
MODOMICS_INDEX_ASCII_ENDPOINT = 'http://modomics.genesilico.pl/modifications/?base=all&type=all&display_ascii=Display+as+ASCII'[source]
MODOMICS_INDEX_ENDPOINT = 'http://modomics.genesilico.pl/modifications/'[source]
RNA_MOD_DB_ENTRY_ENDPOINT = 'https://mods.rna.albany.edu/mods/modifications/view/{}'[source]
RNA_MOD_DB_INDEX_ENDPOINT = 'https://mods.rna.albany.edu/mods/modifications/search'[source]
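
The two *_ENTRY_ENDPOINT constants are str.format templates with a single {} placeholder for an entry id. A minimal sketch of filling them in follows. The helper entry_url and the ids 'm1A' and 118 are illustrative assumptions, and percent-encoding the id is a defensive choice made here, not something the source states.

```python
from urllib.parse import quote

MODOMICS_ENTRY_ENDPOINT = 'http://modomics.genesilico.pl/modifications/{}/'
RNA_MOD_DB_ENTRY_ENDPOINT = 'https://mods.rna.albany.edu/mods/modifications/view/{}'


def entry_url(template, entry_id):
    """Fill the {} placeholder of an endpoint template with a percent-encoded id."""
    return template.format(quote(str(entry_id), safe=''))


modomics_url = entry_url(MODOMICS_ENTRY_ENDPOINT, 'm1A')
rna_mod_db_url = entry_url(RNA_MOD_DB_ENTRY_ENDPOINT, 118)
print(modomics_url)    # http://modomics.genesilico.pl/modifications/m1A/
print(rna_mod_db_url)  # https://mods.rna.albany.edu/mods/modifications/view/118
```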
build(ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_modomics(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from MODOMICS

Parameters:
  • alphabet (Alphabet) – alphabet
  • session (requests_cache.core.CachedSession) – request cache session
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
build_pdb(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build monomeric forms from PDB CCD

Parameters:
  • alphabet (Alphabet) – alphabet
  • session (requests_cache.core.CachedSession) – request cache session
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
Returns:

alphabet

Return type:

Alphabet

build_rna_mod_db(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)[source]

Build alphabet from the RNA Modification Database

Parameters:
  • alphabet (Alphabet) – alphabet
  • session (requests_cache.core.CachedSession) – request cache session
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
get_nucleoside_details_from_modomics(id, session)[source]

Get the structure of a nucleoside in the MODOMICS database

Parameters:id (str) – id of nucleoside in MODOMICS database
Returns:

  • openbabel.OBMol: structure
  • IdentifierSet: identifiers

Return type:

tuple

is_backbone_atom(b_atom)[source]

Determine if an atom is a valid backbone bonding site

Parameters:b_atom (openbabel.OBAtom) – potential backbone atom
Returns:True, if the atom is a valid backbone bonding site
Return type:bool
is_l_atom(l_atom)[source]

Determine if an atom is a valid left bonding site

Parameters:l_atom (openbabel.OBAtom) – potential left atom
Returns:True, if the atom is a valid left bonding site
Return type:bool
is_nucleotide_terminus(l_atom, r_atom)[source]

Determine if a pair of atoms is a valid pair of bonding sites

Parameters:
  • l_atom (openbabel.OBAtom) – potential left atom
  • r_atom (openbabel.OBAtom) – potential right bond atom
Returns:

True, if the atoms are a valid pair of bonding sites

Return type:

bool

is_r_bond_atom(r_atom)[source]

Determine if an atom is a valid right bond bonding site

Parameters:r_atom (openbabel.OBAtom) – potential right bond atom
Returns:True, if the atom is a valid right bond bonding site
Return type:bool
is_terminus(b_atom, r_atom)[source]

Determine if a pair of atoms is a valid pair of bonding sites

Parameters:
  • b_atom (openbabel.OBAtom) – potential backbone atom
  • r_atom (openbabel.OBAtom) – potential right bond atom
Returns:

True, if the atoms are a valid pair of bonding sites

Return type:

bool

is_valid_nucleoside(monomer)[source]

Determine if nucleoside should be included in alphabet

Parameters:monomer (Monomer) – monomeric form
Returns:True if the monomeric form is a valid nucleoside
Return type:bool
run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/rna.yml')[source]

Build alphabet and, optionally, save to YAML file

Parameters:
  • ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
  • major_tautomer (bool, optional) – if True, calculate the major tautomer
  • dearomatize (bool, optional) – if True, dearomatize molecule
  • path (str, optional) – path to save alphabet
Returns:

alphabet

Return type:

Alphabet

class bpforms.alphabet.rna.RnaForm(seq=None, circular=False)[source]

Bases: bpforms.core.BpForm

RNA form

DEFAULT_FASTA_CODE = 'N'[source]

10.1.1.1.6. Module contents