13.1.1.1. bpforms.alphabet package
13.1.1.1.1. Submodules
13.1.1.1.2. bpforms.alphabet.core module
Code to help build alphabets
- Author
Jonathan Karr <karr@mssm.edu>
- Date
2019-08-14
- Copyright
2019, Karr Lab
- License
MIT
-
bpforms.alphabet.core.download_pdb_ccd()
Download the PDB Chemical Component Dictionary (CCD)
- Returns
path to tar.gz file for the PDB CCD
- Return type
str
-
bpforms.alphabet.core.get_can_smiles(mol)
Get the canonical SMILES representation of a molecule without its stereochemistry
- Parameters
mol (openbabel.OBMol) – molecule
- Returns
SMILES representation of a molecule without its stereochemistry
- Return type
str
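To illustrate what "without its stereochemistry" means at the level of the SMILES text, here is a minimal sketch that deletes chirality tags and directional bond symbols from a string. It is not how get_can_smiles works: that function takes an openbabel.OBMol, and this sketch performs no canonicalization. The helper name strip_stereo_marks is hypothetical.

```python
import re

# Chirality tags ('@', '@@') and directional bond symbols ('/', '\').
_STEREO_MARKS = re.compile(r"[@/\\]")


def strip_stereo_marks(smiles):
    """Remove stereochemistry annotations from a SMILES string (text-level only)."""
    return _STEREO_MARKS.sub("", smiles)


print(strip_stereo_marks("N[C@@H](C)C(=O)O"))  # N[CH](C)C(=O)O
print(strip_stereo_marks("F/C=C/F"))           # FC=CF
```

Because the sketch only edits text, two different spellings of the same molecule still produce different strings. Canonicalization by a cheminformatics toolkit is what makes the results comparable.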
-
bpforms.alphabet.core.get_pdb_ccd_open_babel_mol(pdb_mol)
Generate an Open Babel representation of a PDB CCD entry
- Parameters
pdb_mol (dict) – structure of an entry
- Returns
structure of an entry
- Return type
openbabel.OBMol
-
bpforms.alphabet.core.parse_pdb_ccd(filename, valid_types, max_monomers)
Parse entries out of the PDB CCD
- Parameters
filename (str) – path to tar.gz file for PDB CCD
valid_types (tuple of str) – types of entries to retrieve
max_monomers (float) – maximum number of entries to process
- Returns
list of metadata and structures of the entries
- Return type
list of tuple
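The general shape of such a parser (walk a tar.gz archive, parse each XML member, keep entries of the requested types, stop at a limit) can be sketched with the standard library alone. This is a sketch under stated assumptions, not the bpforms implementation: the XML schema, the entry ids, and the names make_demo_archive and parse_archive are all hypothetical, and the limit here counts retained entries. Typing the limit as a float allows math.inf to mean "no limit".

```python
import io
import math
import tarfile
import xml.etree.ElementTree as ET


def make_demo_archive(entries):
    """Build an in-memory tar.gz with one tiny XML file per entry (hypothetical schema)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for entry_id, entry_type in entries:
            xml = f'<entry id="{entry_id}" type="{entry_type}"/>'.encode("utf-8")
            info = tarfile.TarInfo(name=f"{entry_id}.xml")
            info.size = len(xml)
            archive.addfile(info, io.BytesIO(xml))
    buffer.seek(0)
    return buffer


def parse_archive(fileobj, valid_types, max_entries=math.inf):
    """Collect (id, type) pairs for entries of a valid type, up to max_entries."""
    parsed = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if len(parsed) >= max_entries:  # never true when max_entries is inf
                break
            xml_file = archive.extractfile(member)
            if xml_file is None:  # skip directories and other non-file members
                continue
            root = ET.parse(xml_file).getroot()
            if root.get("type") in valid_types:
                parsed.append((root.get("id"), root.get("type")))
    return parsed


demo = make_demo_archive([
    ("AAA", "DNA linking"),
    ("BBB", "non-polymer"),
    ("CCC", "DNA linking"),
    ("DDD", "DNA linking"),
])
print(parse_archive(demo, valid_types=("DNA linking",), max_entries=2))
# [('AAA', 'DNA linking'), ('CCC', 'DNA linking')]
```

tarfile's extractfile returns an io.BufferedReader, which matches the type documented for the xml_file argument of parse_pdb_ccd_entry.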
-
bpforms.alphabet.core.parse_pdb_ccd_entry(xml_file, valid_types)
Parse an entry of the PDB CCD
- Parameters
xml_file (io.BufferedReader) – XML file that defines an entry of the PDB CCD
valid_types (list of str) – list of types of entries to retrieve
- Returns
Monomer: metadata about the entry
str: id of base monomer
str: SMILES-encoded structure of the entry
dict: structure of the entry
dict: dictionary that maps atom ids to their coordinates
- Return type
tuple
13.1.1.1.3. bpforms.alphabet.dna module
Alphabet and BpForm to represent modified DNA
- Author
Jonathan Karr <karr@mssm.edu>
- Date
2019-02-05
- Copyright
2019, Karr Lab
- License
MIT
-
class bpforms.alphabet.dna.CanonicalDnaForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
Canonical DNA form
-
class bpforms.alphabet.dna.DnaAlphabetBuilder(_max_monomers=inf)
Bases: bpforms.core.AlphabetBuilder
Build DNA alphabet from DNAmod, the PDB CCD, and REPAIRtoire
-
INVALID_IDS = ("2'-deoxyinosine", "2'-deoxyuridine", "2'-deoxy-5-(4,5-dihydroxypentyl)uridine", "5-(2-aminoethoxy)methyl-2'-deoxyuridine", "5,6-dihydroxy-2'-deoxyuridine", "5-(2-aminoethyl)-2'-deoxyuridine", '7-amido-7-deazaguanosine', "(beta-D-glucopyranosyloxymethyl)deoxyuridine 5'-monophosphate", 'N(2),N(2)-dimethylguanosine')
-
build(ph=None, major_tautomer=False, dearomatize=False)
Build alphabet
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_dnamod(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build monomeric forms from DNAmod
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_pdb_ccd(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build monomeric forms from PDB CCD
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_repairtoire(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build monomeric forms from REPAIRtoire
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
is_nucleobase_valid(monomer)
Determine if monomeric form should be included in alphabet
- Parameters
monomer (Monomer) – monomeric form
- Returns
True if monomeric form should be included in alphabet
- Return type
bool
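One ingredient of such a check is an exclusion list like INVALID_IDS. The sketch below shows only that ingredient and assumes the list is matched against monomer names. Whether is_nucleobase_valid matches on names, and what other criteria it applies, is not stated here. The dict-based monomers, their ids, and the helper is_included are hypothetical stand-ins.

```python
# Two names copied from INVALID_IDS; the full tuple is longer.
EXCLUDED_NAMES = ("2'-deoxyinosine", "2'-deoxyuridine")


def is_included(monomer, excluded=EXCLUDED_NAMES):
    """Return True unless the monomer's name is on the exclusion list.

    `monomer` is a plain dict with a "name" key (a hypothetical stand-in for Monomer).
    """
    return monomer["name"] not in excluded


candidates = [
    {"id": "dI", "name": "2'-deoxyinosine"},
    {"id": "m5dC", "name": "5-methyl-2'-deoxycytidine"},
]
kept = [monomer["id"] for monomer in candidates if is_included(monomer)]
print(kept)  # ['m5dC']
```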
-
run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/dna.yml')
Build alphabet and, optionally, save to YAML file
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
path (str, optional) – path to save alphabet
- Returns
alphabet
- Return type
Alphabet
-
-
class bpforms.alphabet.dna.DnaForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
DNA form
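The form classes in this module accept a circular flag. Assuming it marks a polymer whose last monomer is bonded back to its first (as in a plasmid), the difference from a linear polymer is one extra backbone bond. The sketch below illustrates that assumption with plain Python. The function adjacent_pairs is hypothetical and is not part of bpforms.

```python
def adjacent_pairs(seq, circular=False):
    """Return (left, right) index pairs of monomers joined by a backbone bond."""
    pairs = [(i, i + 1) for i in range(len(seq) - 1)]
    if circular and len(seq) > 1:
        pairs.append((len(seq) - 1, 0))  # closing bond from last monomer to first
    return pairs


print(adjacent_pairs("ACGT"))                 # [(0, 1), (1, 2), (2, 3)]
print(adjacent_pairs("ACGT", circular=True))  # [(0, 1), (1, 2), (2, 3), (3, 0)]
```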
13.1.1.1.4. bpforms.alphabet.protein module
Alphabet and BpForm to represent modified proteins
- Author
Jonathan Karr <karr@mssm.edu>
- Date
2019-02-05
- Copyright
2019, Karr Lab
- License
MIT
-
class bpforms.alphabet.protein.CanonicalProteinForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
Canonical protein form
-
class bpforms.alphabet.protein.ProteinAlphabetBuilder(_max_monomers=inf)
Bases: bpforms.core.AlphabetBuilder
Build protein alphabet from the PDB Chemical Component Dictionary <http://www.wwpdb.org/data/ccd> and RESID
-
build(ph=None, major_tautomer=False, dearomatize=False)
Build alphabet
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_from_mod(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build alphabet from PSI-MI ontology
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
-
build_from_pdb(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build alphabet from PDB Chemical Component Dictionary
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
-
build_from_resid(alphabet, ph=None, major_tautomer=False, dearomatize=False)
Build alphabet from RESID
- Parameters
alphabet (Alphabet) – alphabet
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
-
get_resid_monomer_details(id, session)
Get the ChEBI ID and synonyms of an amino acid from its RESID webpage
- Parameters
id (str) – id of RESID entry
- Returns
str: code
SynonymSet: set of synonyms
IdentifierSet: set of identifiers
set of str: ids of base monomeric forms
str: comments
- Return type
tuple
-
get_resid_monomer_structure(name, pdb_filename, ph=None, major_tautomer=False, dearomatize=False)
Get the structure of an amino acid from a PDB file
- Parameters
name (str) – name of monomeric form
pdb_filename (str) – path to PDB file with structure
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
openbabel.OBMol: structure
int: index of atom of N terminus
int: index of atom of C terminus
- Return type
tuple
-
get_termini(mol, residue=True)
Get indices of atoms of N and C termini
- Parameters
mol (openbabel.OBMol) – molecule
residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)
- Returns
openbabel.OBMol: structure
int: index of atom of N terminus
int: index of atom of C terminus
- Return type
tuple
-
is_c_terminus(mol, atom, residue=True, convert_to_aa=False)
Determine if an atom is a C-terminus
- Parameters
mol (openbabel.OBMol) – molecule
atom (openbabel.OBAtom) – atom
residue (bool, optional) – if True, search for a residue (H instead of O- at C terminus)
convert_to_aa (bool, optional) – if True, convert COH to COOH
- Returns
True if the atom is a C-terminus
- Return type
bool
-
is_n_terminus(mol, atom)
Determine if an atom is an N-terminus
- Parameters
mol (openbabel.OBMol) – molecule
atom (openbabel.OBAtom) – atom
- Returns
True if the atom is an N-terminus
- Return type
bool
-
is_terminus(atom_n, atom_c)
Determine if a pair of atoms are N- and C-termini
- Parameters
atom_n (openbabel.OBAtom) – potential N-terminus
atom_c (openbabel.OBAtom) – potential C-terminus
- Returns
True, if the atoms are N- and C-termini
- Return type
bool
-
run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/protein.yml')
Build alphabet and, optionally, save to YAML file
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
path (str, optional) – path to save alphabet
- Returns
alphabet
- Return type
Alphabet
-
-
class bpforms.alphabet.protein.ProteinForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
Protein form
13.1.1.1.5. bpforms.alphabet.rna module
Alphabet and BpForm to represent modified RNA
- Author
Jonathan Karr <karr@mssm.edu>
- Date
2019-02-05
- Copyright
2019, Karr Lab
- License
MIT
-
class bpforms.alphabet.rna.CanonicalRnaForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
Canonical RNA form
-
class bpforms.alphabet.rna.RnaAlphabetBuilder(_max_monomers=inf)
Bases: bpforms.core.AlphabetBuilder
Build RNA alphabet from MODOMICS and the RNA Modification Database
-
MODOMICS_INDEX_ASCII_ENDPOINT = 'http://modomics.genesilico.pl/modifications/?base=all&type=all&display_ascii=Display+as+ASCII'
-
build(ph=None, major_tautomer=False, dearomatize=False)
Build alphabet
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_modomics(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)
Build alphabet from MODOMICS
- Parameters
alphabet (Alphabet) – alphabet
session (requests_cache.core.CachedSession) – request cache session
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
-
build_pdb(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)
Build monomeric forms from PDB CCD
- Parameters
alphabet (Alphabet) – alphabet
session (requests_cache.core.CachedSession) – request cache session
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
- Returns
alphabet
- Return type
Alphabet
-
build_rna_mod_db(alphabet, session, ph=None, major_tautomer=False, dearomatize=False)
Build alphabet from the RNA Modification Database
- Parameters
alphabet (Alphabet) – alphabet
session (requests_cache.core.CachedSession) – request cache session
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
-
get_nucleoside_details_from_modomics(id, session)
Get the structure of a nucleoside in the MODOMICS database
- Parameters
id (str) – id of nucleoside in MODOMICS database
- Returns
openbabel.OBMol: structure
IdentifierSet: identifiers
- Return type
tuple
-
is_backbone_atom(b_atom)
Determine if an atom is a valid backbone bonding site
- Parameters
b_atom (openbabel.OBAtom) – potential backbone atom
- Returns
True, if the atom is a valid backbone bonding site
- Return type
bool
-
is_l_atom(l_atom)
Determine if an atom is a valid left bonding site
- Parameters
l_atom (openbabel.OBAtom) – potential left atom
- Returns
True, if the atom is a valid left bonding site
- Return type
bool
-
is_nucleotide_terminus(l_atom, r_atom)
Determine if a pair of atoms is a valid pair of bonding sites
- Parameters
l_atom (openbabel.OBAtom) – potential left atom
r_atom (openbabel.OBAtom) – potential right bond atom
- Returns
True, if the atoms are a valid pair of bonding sites
- Return type
bool
-
is_r_bond_atom(r_atom)
Determine if an atom is a valid right bond bonding site
- Parameters
r_atom (openbabel.OBAtom) – potential right bond atom
- Returns
True, if the atom is a valid right bond bonding site
- Return type
bool
-
is_terminus(b_atom, r_atom)
Determine if a pair of atoms is a valid pair of bonding sites
- Parameters
b_atom (openbabel.OBAtom) – potential backbone atom
r_atom (openbabel.OBAtom) – potential right bond atom
- Returns
True, if the atoms are a valid pair of bonding sites
- Return type
bool
-
is_valid_nucleoside(monomer)
Determine if nucleoside should be included in alphabet
- Parameters
monomer (Monomer) – monomeric form
- Returns
True if the monomeric form is a valid nucleoside
- Return type
bool
-
run(ph=None, major_tautomer=False, dearomatize=False, path='/root/project/bpforms/alphabet/rna.yml')
Build alphabet and, optionally, save to YAML file
- Parameters
ph (float, optional) – pH at which to calculate the major protonation state of each monomeric form
major_tautomer (bool, optional) – if True, calculate the major tautomer
dearomatize (bool, optional) – if True, dearomatize molecule
path (str, optional) – path to save alphabet
- Returns
alphabet
- Return type
Alphabet
-
-
class bpforms.alphabet.rna.RnaForm(seq=None, circular=False)
Bases: bpforms.core.BpForm
RNA form