ssnmr
Overview
ssnmr provides functions for predicting nuclear magnetic resonance (NMR) observables from protein sequence data. Unlike SSProtein or SSTrajectory, this module is stateless: it contains standalone functions rather than a class, and its public function takes a sequence string directly rather than a trajectory object.
Random coil chemical shifts. The primary function, compute_random_coil_chemical_shifts, predicts sequence-corrected random coil 1H, 13C, and 15N chemical shifts for a given amino acid sequence. These are useful as a disordered-state reference baseline when interpreting experimental NMR spectra of intrinsically disordered proteins (IDPs) or unfolded proteins.
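A common way to use these predictions as a baseline is to compute per-residue secondary chemical shifts, Δδ = δ_obs − δ_rc. The sketch below assumes the per-residue dict format returned by compute_random_coil_chemical_shifts (called with asFloat=True) and an experimental assignment supplied as a plain dict keyed by the same 'Index' values. `secondary_shifts` is a hypothetical helper for illustration, not part of ssnmr.

```python
from typing import Dict, List, Optional


def secondary_shifts(
    predicted: List[dict],
    observed: Dict[int, float],
    atom: str = "CA",
) -> Dict[int, Optional[float]]:
    """Return observed minus random-coil shift for each residue with an observed value.

    predicted: per-residue dicts with an 'Index' key and atom keys such as 'CA'.
    observed:  hypothetical mapping of residue index -> experimental shift for `atom`.
    Residues whose predicted value is a masked placeholder (not a number) map to None.
    """
    result: Dict[int, Optional[float]] = {}
    for residue in predicted:
        idx = residue["Index"]
        if idx not in observed:
            continue  # no experimental value for this residue
        value = residue[atom]
        if isinstance(value, (int, float)):
            result[idx] = observed[idx] - value
        else:
            result[idx] = None  # e.g. glycine CB or proline HN placeholder
    return result
```

Runs of positive CA secondary shifts are conventionally read as transient helical structure, and runs of negative values as extended or β-like structure.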
Corrections applied include:
- Nearest-neighbour sequence effects: shifts are adjusted for the two residues on either side of each position, using the correction factors of Kjaergaard & Poulsen (2011) and following the methodology of Schwarzinger et al. (2001).
- Temperature: linear corrections (Kjaergaard, Brander & Poulsen, 2011) are applied relative to a 5 °C baseline.
- pH: charged-state populations for Asp, Glu, His, and phosphorylated residues (pSer, pThr, pTyr) are accounted for via fractional deprotonation at the given pH.
- Perdeuteration: optional corrections for fully deuterated protein samples.
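The corrections listed above can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions, not the module's implementation: neighbour corrections are treated as additive offsets from positions i−2 to i+2, the temperature correction as a linear coefficient applied to the difference from 5 °C, and the pH correction as a Henderson–Hasselbalch population weighting between protonated and deprotonated limiting shifts under fast exchange. All tables, coefficients and pKa values are hypothetical caller-supplied inputs, not the published values.

```python
from typing import Dict, Sequence

# Offsets of the neighbouring residues considered for residue i (two on each side).
NEIGHBOUR_OFFSETS = (-2, -1, 1, 2)


def neighbour_corrected_shift(
    residues: Sequence[str],
    i: int,
    reference: Dict[str, float],
    corrections: Dict[int, Dict[str, float]],
) -> float:
    """Reference shift of residue i plus additive neighbour corrections for one atom type.

    reference:   residue code -> reference shift (hypothetical table)
    corrections: offset -> {residue code -> correction} (hypothetical table)
    Simplification: neighbours beyond either terminus are skipped rather than padded.
    """
    shift = reference[residues[i]]
    for offset in NEIGHBOUR_OFFSETS:
        j = i + offset
        if 0 <= j < len(residues):
            shift += corrections.get(offset, {}).get(residues[j], 0.0)
    return shift


def temperature_corrected(shift_at_ref: float, coeff_ppm_per_degree: float,
                          temperature: float, t_ref: float = 5.0) -> float:
    """Linear temperature correction relative to a reference temperature in deg C."""
    return shift_at_ref + coeff_ppm_per_degree * (temperature - t_ref)


def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch deprotonated fraction of a single titratable group."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def ph_weighted_shift(shift_protonated: float, shift_deprotonated: float,
                      pH: float, pKa: float) -> float:
    """Population-weighted average shift, assuming fast exchange between the two states."""
    f = fraction_deprotonated(pH, pKa)
    return f * shift_deprotonated + (1.0 - f) * shift_protonated
```

At a pH equal to the pKa, half of the group is deprotonated, so the weighted shift lies midway between the two limiting values.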
Supported residue types. All 20 canonical amino acids are supported, along with three phosphorylated residues: phosphoserine (pSer / SEP / PS), phosphothreonine (pThr / PTHR / PT), and phosphotyrosine (pTyr / PTYR / PY). Phosphorylated residues cannot be combined with the perdeuteration corrections.
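Sequences that mix one-letter codes with parenthesised multi-letter codes (such as "AS(SEP)GA") have to be split into residue tokens before any per-residue lookup. The sketch below is a hypothetical tokenizer written for illustration; it does not map aliases such as SEP/PS to a canonical residue, and the module's own parser may behave differently (for example with lower-case input).

```python
import re
from typing import List

# Either a run of letters wrapped in parentheses, e.g. "(SEP)", or a single letter.
_TOKEN = re.compile(r"\(([A-Za-z]+)\)|([A-Za-z])")


def tokenize_sequence(sequence: str) -> List[str]:
    """Split a sequence such as "AS(SEP)GA" into residue tokens."""
    tokens: List[str] = []
    pos = 0
    while pos < len(sequence):
        match = _TOKEN.match(sequence, pos)
        if match is None:
            # Unbalanced parenthesis, digit, whitespace, etc.
            raise ValueError(f"unparseable residue at position {pos}: {sequence[pos:]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens
```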
Output format. The function returns a list of per-residue dictionaries, one per position (excluding the two terminal padding residues), each containing keys Res, Index, CA, CB, CO, N, HN, and HA. Glycine lacks a Cβ (CB is "**.***") and proline lacks a backbone amide (N and HN are "*.***"). Shifts are returned as floats or three-decimal-place strings depending on the asFloat flag.
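Because masked placeholders are strings, code that consumes the output may want to normalise every value to either a float or None. The sketch below relies only on the documented keys; `to_float_or_none` and `shifts_by_atom` are hypothetical helpers, and any value that does not parse as a number (including both placeholder forms shown above) is treated as missing.

```python
from typing import Dict, List, Optional

# The six per-residue shift keys listed in the output description.
ATOMS = ("CA", "CB", "CO", "N", "HN", "HA")


def to_float_or_none(value) -> Optional[float]:
    """Return value as a float, or None if it is a masked placeholder (any non-numeric value)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def shifts_by_atom(records: List[dict]) -> Dict[str, List[Optional[float]]]:
    """Reorganise per-residue dicts into one list per atom type, in residue order."""
    return {atom: [to_float_or_none(rec[atom]) for rec in records] for atom in ATOMS}
```

This works whether the output was requested with asFloat=True or asFloat=False, since numeric strings are converted and placeholders become None.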
Example usage:
from soursop.ssnmr import compute_random_coil_chemical_shifts
sequence = "MAEQKLISEEDL"
shifts = compute_random_coil_chemical_shifts(sequence, temperature=25, pH=7.4)
for residue in shifts:
    print(residue['Res'], residue['CA'], residue['N'])
ssnmr - sequence-based random-coil chemical shift prediction.
This module predicts random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA) for an arbitrary amino-acid sequence, corrected for temperature, pH and (optionally) perdeuteration, including support for phosphorylated serine/threonine/tyrosine. The implementation ports the Kjaergaard & Poulsen / Schwarzinger reference-shift and neighbour-correction tables and is the basis for comparing simulated ensembles to experimental NMR data.
The public entry point is compute_random_coil_chemical_shifts; the
remaining functions are private helpers.
Author(s): Alex Keeley (with Alex Holehouse)
- soursop.ssnmr.compute_random_coil_chemical_shifts(protein_sequence, temperature=25, pH=7.4, use_ggxgg=True, use_perdeuteration=False, asFloat=True)
Predict sequence-corrected random-coil chemical shifts.
For a user-provided amino-acid sequence, predicts the random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA) and applies sequence-context (nearest-neighbour), temperature and pH corrections. Reference shifts and general sequence-correction factors are from Kjaergaard & Poulsen (J. Biomol. NMR 2011, 50:157-165); temperature and glycine corrections are from Kjaergaard, Brander & Poulsen (J. Biomol. NMR 2011, 49:139-149); the correction-factor methodology follows Schwarzinger et al. (JACS 2001, 123:2970-2978); and the perdeuteration corrections are from Cavanagh, Fairbrother, Palmer, Rance & Skelton, Protein NMR Spectroscopy, 2nd ed. (Academic Press, 2007). The implementation is a port of the JavaScript tool by Alex Maltsev (NIH); see https://www1.bio.ku.dk/english/research/bms/research/sbinlab/randomchemicalshifts/
The input may be a standard one-letter sequence; phospho-residues can additionally be supplied using parenthesised multi-letter codes (e.g. "AS(SEP)GA" for a phospho-serine). Glycine and proline produce masked placeholder values for atoms they lack (CB for glycine; N/HN for proline).
- Parameters:
protein_sequence (str) – Amino-acid sequence to predict shifts for. One-letter codes, with optional parenthesised multi-letter codes for phospho-residues (SEP/PS, TPO/PT, PTR/PY).
temperature (float or int, optional) – Sample temperature in degrees Celsius, used for the temperature correction. Must be between 0 and 100. Default 25.
pH (float or int, optional) – Sample pH, used for the pH (titratable-residue) correction. Must be between 0 and 14. Default 7.4.
use_ggxgg (bool, optional) – Whether to apply the GGXGG-based neighbour correction for glycines. Default True.
use_perdeuteration (bool, optional) – Whether to apply perdeuterated correction factors. Cannot be combined with phospho-residues. Default False.
asFloat (bool, optional) – If True, the output chemical shifts are floats; if False, they are formatted strings. Default True.
- Returns:
One dictionary per residue in the input sequence, each containing the residue abbreviation ('Res'), its index ('Index') and the six predicted shifts ('CA', 'CB', 'CO', 'N', 'HN', 'HA'). Atoms absent for a residue type (glycine CB, proline N/HN, HA under perdeuteration) carry a masked placeholder.
- Return type:
list of dict
- Raises:
soursop.ssexceptions.SSException – If temperature is outside 0-100 °C, if pH is outside 0-14, or if use_perdeuteration is requested for a sequence containing phosphorylated residues.
Example
>>> shifts = compute_random_coil_chemical_shifts('ASGAS', temperature=25, pH=7.4)
>>> sorted(shifts[0].keys())
['CA', 'CB', 'CO', 'HA', 'HN', 'Index', 'N', 'Res']
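The input checks described under Raises can be approximated as below. This is a hypothetical sketch: `InputError` stands in for soursop.ssexceptions.SSException so the block runs without SOURSOP installed, the range bounds are assumed to be inclusive, `residues` is assumed to be a list of residue tokens, and `PHOSPHO_CODES` is simply the union of the phospho-residue codes mentioned in this documentation, not the module's own alias table.

```python
from typing import Iterable, List

# Hypothetical: union of every phospho-residue code mentioned in this documentation,
# upper-cased. The module's own alias table may differ.
PHOSPHO_CODES = frozenset({"PSER", "SEP", "PS", "PTHR", "TPO", "PT", "PTYR", "PTR", "PY"})


class InputError(ValueError):
    """Stand-in for soursop.ssexceptions.SSException in this self-contained sketch."""


def input_errors(residues: Iterable[str], temperature: float, pH: float,
                 use_perdeuteration: bool) -> List[str]:
    """Collect a message for every documented input constraint that is violated."""
    errors: List[str] = []
    if not 0 <= temperature <= 100:  # assumed inclusive bounds
        errors.append(f"temperature must be between 0 and 100 C, got {temperature}")
    if not 0 <= pH <= 14:  # assumed inclusive bounds
        errors.append(f"pH must be between 0 and 14, got {pH}")
    if use_perdeuteration and any(r.upper() in PHOSPHO_CODES for r in residues):
        errors.append("perdeuteration cannot be combined with phosphorylated residues")
    return errors


def validate_inputs(residues: Iterable[str], temperature: float, pH: float,
                    use_perdeuteration: bool = False) -> None:
    """Raise InputError listing all violations, mirroring the documented Raises clause."""
    errors = input_errors(residues, temperature, pH, use_perdeuteration)
    if errors:
        raise InputError("; ".join(errors))
```

Collecting every violation before raising reports all problems with a call at once, rather than only the first one encountered.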