BpForms is a set of tools for unambiguously representing the structures of modified forms of biopolymers such as DNA, RNA, and protein.
The BpForms notation can unambiguously represent the structure of modified forms of biopolymers. For example, the following represents a modified DNA molecule that contains a deoxyinosine monomer at the fourth position:
ACG[ id: "dI" | structure: "O=C1NC=NC2=C1N=CN2" ]T
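To show how the bracket syntax in this example breaks down, here is a minimal Python sketch of a tokenizer. It is a toy illustration only and is not the BpForms parser or its Python API. The `parse` function and `Monomer` class are hypothetical. The sketch assumes a simplified subset of the syntax: one-letter codes for canonical monomers, plus bracketed monomers whose attributes are `key: "value"` pairs separated by `|`. The real BpForms grammar covers much more than this.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy grammar, NOT the BpForms grammar:
# either a one-letter code, or "[...]" whose body may contain quoted strings
# (quoted SMILES can themselves contain "]" in bracket atoms such as [O-]).
TOKEN = re.compile(r'\[(?P<attrs>(?:[^\]"]|"[^"]*")*)\]|(?P<code>[A-Z])')
# One attribute: key: "value", optionally followed by a "|" separator.
ATTR = re.compile(r'\s*(?P<key>[a-z-]+)\s*:\s*"(?P<val>[^"]*)"\s*(?P<sep>\|)?')


@dataclass
class Monomer:
    code: Optional[str] = None                  # set for canonical monomers
    attrs: dict = field(default_factory=dict)   # set for bracketed monomers


def parse_attrs(body: str) -> dict:
    """Parse 'key: "value" | key: "value"' into a dict."""
    attrs, pos = {}, 0
    while True:
        a = ATTR.match(body, pos)
        if a is None:
            raise ValueError(f"malformed attribute near {body[pos:]!r}")
        attrs[a.group("key")] = a.group("val")
        pos = a.end()
        if a.group("sep") is None:  # no "|" means this was the last attribute
            break
    if pos != len(body):
        raise ValueError(f"trailing text {body[pos:]!r}")
    return attrs


def parse(seq: str) -> list[Monomer]:
    """Split a BpForms-style string into a list of monomers."""
    monomers, pos = [], 0
    while pos < len(seq):
        m = TOKEN.match(seq, pos)
        if m is None:
            raise ValueError(f"unexpected character {seq[pos]!r} at {pos}")
        if m.group("code"):
            monomers.append(Monomer(code=m.group("code")))
        else:
            monomers.append(Monomer(attrs=parse_attrs(m.group("attrs"))))
        pos = m.end()
    return monomers
```

Applied to the example above, `parse('ACG[ id: "dI" | structure: "O=C1NC=NC2=C1N=CN2" ]T')` yields five monomers. The fourth (index 3) carries the `id` and `structure` attributes. Because the grammar is concrete, malformed input such as `'AC?'` is rejected with an error instead of being silently accepted.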
This concrete representation of modified biopolymers enables the BpForms software tools to calculate the chemical formulae, molecular weights, and charges of biopolymers, as well as to automatically calculate the major protonation and tautomerization states of biopolymers at specific pHs.
BpForms encompasses five tools:
- Notation for describing biopolymers: See Section 2.1.
- Web-based graphical interface: See https://bpforms.org and Section 2.2.
- REST JSON API: See Section 2.3.
- Command line interface: See Section 2.4.
- Python API: See Section 2.5.
BpForms was motivated by the need to concretely represent the biochemistry of DNA modification, DNA repair, post-transcriptional processing, and post-translational processing in whole-cell computational models. BpForms is also a valuable tool for experimental proteomics. In particular, we developed BpForms because there were no notations, schemas, data models, or file formats for concretely representing modified forms of biopolymers, despite the existence of several databases and ontologies of DNA, RNA, and protein modifications and the ProForma Proteoform Notation.
The BpForms syntax was inspired by the ProForma Proteoform Notation. BpForms improves upon this syntax in several ways:
- BpForms separates the representation of modified biopolymers from the chemical processes that generate them.
- BpForms clarifies the representation of multiply modified monomers. This is necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
- BpForms can be customized to represent any modification and, therefore, is not limited to previously enumerated modifications. This is also necessary to represent the combinatorial complexity of modified DNA, RNA, and proteins.
- BpForms supports two additional types of uncertainty in the structures of biopolymers: uncertainty in the position of a modified nucleotide/amino acid and uncertainty in its charge.
- BpForms has a concrete grammar. This enables error checking, as well as the calculation of chemical formulae, masses, and charges, which is essential for modeling.
- 1. Installation
- 2. Tutorial
- 2.1. BpForms notation
- 2.2. Graphical web interface
- 2.3. REST API
- 2.4. Command line interface
- 2.5. Python API
- 2.5.1. Importing BpForms
- 2.5.2. Creating biopolymer forms
- 2.5.3. Getting and setting monomers
- 2.5.4. Getting and setting the base of a monomer
- 2.5.5. Protonation and tautomerization
- 2.5.6. Calculation of physical properties
- 2.5.7. Generating FASTA sequences for BpForms
- 2.5.8. Determine if two biopolymers describe the same structure
- 3. Alphabets
- 4. Resources for reconstructing modified DNA, RNA, and proteins
- 5. Limitations, alternatives, and future directions
- 6. Contributing to BpForms
- 7. API documentation
- 8. About