ssnmr

Overview

ssnmr covers three complementary NMR-observable predictions for IDP/IDR ensembles:

Sequence-based random-coil chemical shifts (¹H, ¹³C, ¹⁵N), via compute_random_coil_chemical_shifts(). Stateless — takes a sequence string, no trajectory needed.
Structure-based scalar (J) couplings: ³J(HN, Hα) computed per frame per residue from the φ dihedral via the Karplus relation, via compute_J3_HN_HA() (and the generic Karplus evaluator karplus() for arbitrary coefficients). Takes an SSProtein.
NOE distances: per-frame inter-atom distances via compute_NOE_distances(), collapsed to an \(\langle r^{-p}\rangle^{-1/p}\) ensemble average by noe_ensemble_average().

Random coil chemical shifts

The primary function, compute_random_coil_chemical_shifts(), predicts sequence-corrected random coil ¹H, ¹³C, and ¹⁵N chemical shifts for a given amino acid sequence. These are useful as a disordered-state reference baseline when interpreting experimental NMR spectra of intrinsically disordered proteins (IDPs) or unfolded proteins.

Corrections applied include:

Nearest-neighbour sequence effects — shifts are adjusted for the two residues on either side of each position, using the correction factors of Kjaergaard & Poulsen (2011) and Schwarzinger et al. (2001).
Temperature — linear corrections are applied relative to a 5 °C baseline.
pH — charged-state populations for Asp, Glu, His, and phosphorylated residues (pSer, pThr, pTyr) are accounted for via fractional deprotonation at the given pH.
Perdeuteration — optional corrections for fully deuterated protein samples.
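The three sequence and condition corrections can be pictured as simple additive and population-weighted terms. The sketch below is a minimal illustration under stated assumptions, not ssnmr's implementation. The helper names are hypothetical. It assumes that the nearest-neighbour corrections add up over offsets −2, −1, +1 and +2, that the temperature correction is linear in (T − 5 °C), and that the pH term involves a single titrating group obeying the Henderson-Hasselbalch equation. You supply all reference shifts, correction factors, coefficients and pKa values; none of the Kjaergaard & Poulsen tables are reproduced here.

```python
from typing import Dict, Tuple

# Hypothetical helpers illustrating the kinds of correction ssnmr applies.
# None of these functions are part of soursop; all tables/coefficients are inputs.


def neighbour_corrected_shift(sequence: str, i: int,
                              reference: Dict[str, float],
                              corrections: Dict[Tuple[int, str], float]) -> float:
    """Reference shift of residue i plus corrections from residues at offsets
    -2, -1, +1, +2 (assumed additive; positions past the chain ends are skipped)."""
    shift = reference[sequence[i]]
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(sequence):
            shift += corrections.get((offset, sequence[j]), 0.0)
    return shift


def temperature_corrected_shift(shift_at_5C: float, coeff_per_degree: float,
                                temperature_C: float) -> float:
    """Linear temperature correction relative to a 5 degC baseline."""
    return shift_at_5C + coeff_per_degree * (temperature_C - 5.0)


def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch fraction of the deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def ph_averaged_shift(shift_protonated: float, shift_deprotonated: float,
                      pH: float, pKa: float) -> float:
    """Population-weighted (fast-exchange) average of the two charge states."""
    f = fraction_deprotonated(pH, pKa)
    return f * shift_deprotonated + (1.0 - f) * shift_protonated
```

At pH = pKa the two charge states are equally populated, so the predicted shift lies halfway between the protonated and deprotonated limits.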

Supported residue types. All 20 canonical amino acids are supported, along with three phosphorylated residues: phosphoserine (pSer; input codes SEP or PS), phosphothreonine (pThr; TPO or PT), and phosphotyrosine (pTyr; PTR or PY). Phosphorylated residues cannot be combined with the perdeuteration corrections.

Output format. The function returns a list of per-residue dictionaries, one per position (excluding the two terminal padding residues), each containing keys Res, Index, CA, CB, CO, N, HN, and HA. Glycine lacks a Cβ (CB is "**.***"), and proline, which has no amide proton, has N and HN masked ("*.***"). Shifts are returned as floats or as strings formatted to three decimal places, depending on the asFloat flag.

Example usage:

from soursop.ssnmr import compute_random_coil_chemical_shifts

sequence = "MAEQKLISEEDL"
shifts = compute_random_coil_chemical_shifts(sequence, temperature=25, pH=7.4)

for residue in shifts:
    print(residue['Res'], residue['CA'], residue['N'])
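A common use of this random-coil baseline is to compute secondary chemical shifts, Δδ = δ_obs − δ_rc. Persistent positive Cα deviations suggest transient helical structure, and negative ones suggest extended structure. The sketch below assumes the list-of-dicts output described above with asFloat=True. It also assumes an experimental dictionary keyed by the same 'Index' values; that input format and the helper name secondary_shifts are hypothetical. Masked placeholders and residues without an experimental value are skipped.

```python
from typing import Dict, List


def secondary_shifts(rc_shifts: List[dict], observed: Dict[int, float],
                     atom: str = "CA") -> Dict[int, float]:
    """Secondary shifts (observed minus random coil, ppm) for one atom type.

    rc_shifts: output of compute_random_coil_chemical_shifts(..., asFloat=True).
    observed:  experimental shifts in ppm keyed by the same 'Index' values
               (a hypothetical input format chosen for this sketch).
    """
    delta = {}
    for residue in rc_shifts:
        idx = residue["Index"]
        rc_value = residue[atom]
        # Skip masked placeholders (e.g. glycine CB) and unassigned residues.
        if not isinstance(rc_value, (int, float)) or idx not in observed:
            continue
        delta[idx] = observed[idx] - rc_value
    return delta
```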

Scalar (J) couplings

For each residue with a defined backbone φ dihedral, the three-bond ³J(HN, Hα) scalar coupling is well approximated by the Karplus relation

\[{}^{3}\!J(\mathrm{HN}, \mathrm{H}\alpha) = A\cos^{2}\!\big(\phi + \phi_{0}\big) + B\cos\!\big(\phi + \phi_{0}\big) + C ,\]

where A, B, C and φ₀ are empirical coefficients fitted to experimental data. ssnmr ships six literature parameterisations of these coefficients, ported from the biceps package (Voelz lab), which itself adapts MDTraj’s mdtraj/nmr/scalar_couplings.py (Beauchamp et al.):

Model	A (Hz)	B (Hz)	C (Hz)	σ (Hz)
`Bax2007` (default)	8.40	-1.36	0.33	0.36
`Bax1997`	7.09	-1.42	1.55	0.39
`Ruterjans1999`	7.90	-1.05	0.65	0.25
`Habeck`	7.13	-1.31	1.56	0.34
`Vuister`	6.51	-1.76	1.60	0.73
`Pardi`	6.40	-1.40	1.90	0.76

All six models share φ₀ = −60°. The ³J(HN, Hα) coupling depends on the H–N–Cα–Hα dihedral θ, which for L-amino acids is related to the backbone φ (C′–N–Cα–C′) by θ ≈ φ − 60°; θ = 0 corresponds to HN and Hα being eclipsed. The per-model σ is the RMSD of the parameterisation against its training experimental dataset and is a sensible forward-model uncertainty to use when feeding J-couplings into the BME or COPER reweighters.
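To make the table and the φ₀ convention concrete, here is a minimal NumPy sketch of the Karplus form using the coefficients listed above. KARPLUS_HN_HA and karplus_sketch are local names written for this example, not imports from soursop. For real analyses, use karplus() and KARPLUS_HN_HA_COEFFICIENTS from ssnmr.

```python
import numpy as np

# Coefficients copied from the table above (A, B, C in Hz; phi0 in degrees).
KARPLUS_HN_HA = {
    "Bax2007":       dict(A=8.40, B=-1.36, C=0.33, phi0=-60.0),
    "Bax1997":       dict(A=7.09, B=-1.42, C=1.55, phi0=-60.0),
    "Ruterjans1999": dict(A=7.90, B=-1.05, C=0.65, phi0=-60.0),
    "Habeck":        dict(A=7.13, B=-1.31, C=1.56, phi0=-60.0),
    "Vuister":       dict(A=6.51, B=-1.76, C=1.60, phi0=-60.0),
    "Pardi":         dict(A=6.40, B=-1.40, C=1.90, phi0=-60.0),
}


def karplus_sketch(phi_deg, A, B, C, phi0=0.0):
    """J = A cos^2(phi + phi0) + B cos(phi + phi0) + C; phi in degrees, J in Hz."""
    cos_theta = np.cos(np.deg2rad(np.asarray(phi_deg, dtype=float) + phi0))
    return A * cos_theta ** 2 + B * cos_theta + C


# Typical alpha-helical and beta-strand phi values (degrees).
phi = np.array([-60.0, -120.0])
for name, coeffs in KARPLUS_HN_HA.items():
    print(name, np.round(karplus_sketch(phi, **coeffs), 2))
```

Across the six models, a helical φ = −60° gives small couplings (about 3.1 to 4.2 Hz), whereas an extended φ = −120° gives large ones (about 9.6 to 10.1 Hz). This contrast is what makes ³J(HN, Hα) a sensitive reporter of backbone conformation.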

Units. ssnmr takes φ in degrees (consistent with SSProtein.get_angles) and returns J in Hz.

Computing ³J(HN, Hα) from an ensemble:

from soursop.sstrajectory import SSTrajectory
from soursop.ssnmr import compute_J3_HN_HA

traj    = SSTrajectory('traj.xtc', 'start.pdb')
protein = traj.proteinTrajectoryList[0]

# Per-frame, per-residue J-couplings (shape: n_frames x n_phi).
atoms, J = compute_J3_HN_HA(protein, model="Bax2007")

# Ensemble mean + the model's forward-model uncertainty in Hz.
atoms, J_mean, sigma = compute_J3_HN_HA(
    protein, model="Bax2007", weights=False, return_uncertainty=True)
J_mean = J.mean(axis=0)

The (n_frames, n_phi) matrix is the natural input for the reweighters, so a typical workflow against an experimental J vector is:

from soursop.ssnmr import compute_J3_HN_HA
from soursop.ssbme import BME, ExperimentalObservable

atoms, J_calc, sigma = compute_J3_HN_HA(protein, return_uncertainty=True)

# J_exp: your experimental 3J(HN, HA) values in Hz, one per phi dihedral,
# in the same residue order as `atoms` (not provided by ssnmr).
obs = [ExperimentalObservable(value=J_exp[k], uncertainty=sigma,
                              name=f"3J_HN_HA_res{k}")
       for k in range(J_calc.shape[1])]

result = BME(obs, J_calc).fit(theta=2.0, auto_theta=False)
weights = result.weights
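A quick way to check whether reweighting improved agreement is to compute the reduced χ² between the (weighted) ensemble mean and experiment, using the same per-observable uncertainty. The sketch below is a generic NumPy helper under that assumption. chi2_reduced is a hypothetical name, not a soursop function. J_exp is the same user-supplied experimental vector assumed in the workflow above.

```python
import numpy as np


def chi2_reduced(calc, exp, sigma, weights=None):
    """Reduced chi^2 between an ensemble average and experiment.

    calc:    (n_frames, n_obs) per-frame calculated observables.
    exp:     (n_obs,) experimental values.
    sigma:   scalar or (n_obs,) uncertainties.
    weights: optional (n_frames,) frame weights; uniform if None.
    """
    calc = np.asarray(calc, dtype=float)
    if weights is None:
        avg = calc.mean(axis=0)
    else:
        w = np.asarray(weights, dtype=float)
        avg = (w / w.sum()) @ calc  # normalise defensively, then weighted mean
    resid = (avg - np.asarray(exp, dtype=float)) / sigma
    return float(np.mean(resid ** 2))
```

Comparing chi2_reduced(J_calc, J_exp, sigma) with chi2_reduced(J_calc, J_exp, sigma, weights) shows how far the reweighted ensemble has moved towards the data.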

Other Karplus parameterisations. For any other Karplus-form coupling (³J(Hα, C′), ³J(HN, Cβ), Bothner-By, Tvaroska, Aydin, etc.), call karplus() directly with the appropriate dihedral angle (in degrees) and your own coefficients.

Citations. The HN-Hα coefficients are due to: Vögeli, B. et al. J. Am. Chem. Soc. 129, 9377-9385 (2007) (Bax2007); Hu, J.-S. & Bax, A. J. Am. Chem. Soc. 119, 6360-6368 (1997) (Bax1997); Schmidt, J. M. et al. J. Biomol. NMR 14, 1-12 (1999) (Ruterjans1999); Habeck, M., Rieping, W. & Nilges, M. J. Magn. Reson. 177, 160-165 (2005) (Habeck); Vuister, G. W. & Bax, A. J. Am. Chem. Soc. 115, 7772-7777 (1993) (Vuister); Pardi, A., Billeter, M. & Wüthrich, K. J. Mol. Biol. 180, 741-751 (1984) (Pardi).

NOE distances

The nuclear Overhauser effect (NOE) cross-peak intensity between two protons is proportional to the ensemble average of the inverse sixth power of the inter-proton distance (not the inverse sixth power of the average distance):

\[I_{ij} \propto \langle r_{ij}^{-6} \rangle .\]

Two helpers cover the typical workflow:

compute_NOE_distances() returns the per-frame inter-atom distance matrix in Angstroms for a list of atom pairs — the natural per-frame structural primitive.
noe_ensemble_average() collapses such a distance array via the NOE convention \((\sum_i w_i\,d_i^{-p})^{-1/p}\) (default \(p = 6\); some studies use \(p = 3\)). It honours the package-wide weights contract: weights=False (default) gives the uniform mean, a per-frame weight vector gives a reweighted NOE distance.
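The r^{-p} average is easy to reproduce with NumPy, and doing so shows why it is dominated by the shortest distances sampled. The sketch below mirrors the formula above with the same weights convention (uniform when no weights are given). noe_average_sketch is a hypothetical name, and it raises ValueError where ssnmr raises SSException.

```python
import numpy as np


def noe_average_sketch(distances, power=6.0, weights=None, axis=0):
    """(sum_i w_i d_i^-p)^(-1/p) along `axis`; uniform w_i = 1/N if weights is None."""
    d = np.asarray(distances, dtype=float)
    if np.any(d <= 0):
        raise ValueError("distances must be positive for r^-p averaging")
    n = d.shape[axis]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    # Move the averaged axis to the front so the weights contract against it.
    d_front = np.moveaxis(d, axis, 0)
    mean_r_minus_p = np.tensordot(w, d_front ** -power, axes=(0, 0))
    return mean_r_minus_p ** (-1.0 / power)


# Two frames for a single pair: one at 2 A and one at 4 A.
d = np.array([[2.0], [4.0]])
print(d.mean(axis=0), noe_average_sketch(d))  # arithmetic mean vs r^-6 average
```

The r^{-6} average here is about 2.24 Å, much closer to the short distance than the arithmetic mean of 3 Å. This is because the 2 Å frame contributes 2⁶ = 64 times more to ⟨r^{-6}⟩ than the 4 Å frame.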

Computing ensemble-averaged NOE distances:

import numpy as np
from soursop.ssnmr import compute_NOE_distances, noe_ensemble_average

pairs = np.array([[0, 10], [0, 20], [5, 15]])   # atom indices
d = compute_NOE_distances(protein, pairs)       # (n_frames, 3) Å
r_noe = noe_ensemble_average(d, power=6)        # (3,) Å

Feeding NOEs to BME / COPER. The linear-additive observable for reweighting is \(r^{-p}\), not \(r\) (because NOE intensity itself is the linear ensemble average of \(r^{-p}\)). So the BME-ready calculated_values matrix is d ** -p and the experimental observable is r_exp ** -p:

from soursop.ssbme import BME, ExperimentalObservable

# r_exp, sigma_r: your experimental NOE distances and their uncertainties (Å),
# one per row of `pairs` (not provided by ssnmr).
calc = d ** -6                                  # (n_frames, n_pairs)
obs = [ExperimentalObservable(r_exp[k] ** -6,
                              # uncertainty propagated from sigma_r: 6 r^-7 sigma(r)
                              uncertainty=6.0 * r_exp[k] ** -7 * sigma_r[k],
                              name=f"NOE_pair{k}")
       for k in range(len(r_exp))]
weights = BME(obs, calc).fit(theta=2.0, auto_theta=False).weights

(The experimental uncertainty on the linear observable \(r^{-6}\) is obtained by first-order error propagation from the uncertainty on \(r\): \(\sigma(r^{-6}) \approx 6\,r^{-7}\,\sigma(r)\).)
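The same first-order propagation works for any exponent p, which is useful if you follow the p = 3 convention instead:

\[\frac{\mathrm{d}\,r^{-p}}{\mathrm{d}r} = -p\,r^{-p-1} \quad\Longrightarrow\quad \sigma\!\left(r^{-p}\right) \approx \left|\frac{\mathrm{d}\,r^{-p}}{\mathrm{d}r}\right|\sigma(r) = p\,r^{-(p+1)}\,\sigma(r),\]

which reduces to \(6\,r^{-7}\,\sigma(r)\) for p = 6. As a linearisation, it is accurate only when σ(r) is small compared with r.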

ssnmr - NMR observables for IDP/IDR ensembles.

This module covers three complementary NMR predictions:

Sequence-based random-coil chemical shifts: predicts random-coil backbone shifts (CA, CB, CO, N, HN, HA) for an arbitrary amino-acid sequence, corrected for temperature, pH and (optionally) perdeuteration, including phosphorylated Ser/Thr/Tyr. The implementation ports the Kjaergaard & Poulsen / Schwarzinger reference-shift and neighbour-correction tables. The public entry point is compute_random_coil_chemical_shifts().
Structure-based scalar (J) couplings: predicts the backbone ³J(HN, Hα) scalar coupling per residue per frame from the φ dihedral via the Karplus relation, using any of the six literature parameterisations stored in KARPLUS_HN_HA_COEFFICIENTS (Bax2007, Bax1997, Ruterjans1999, Habeck, Vuister, Pardi). Public entry points are karplus() (generic Karplus evaluator for arbitrary coefficients) and compute_J3_HN_HA() (operates on an SSProtein). The returned (n_frames, n_phi) array is the natural input for soursop.ssbme.BME / BMECustom and soursop.sscoper.COPER reweighting.
NOE distances: per-frame inter-atom distances via compute_NOE_distances(), collapsed to an ensemble-averaged NOE distance by noe_ensemble_average().

The Karplus coefficient table is adapted from biceps (Voelz lab, https://github.com/vvoelz/biceps), itself ported from MDTraj’s mdtraj/nmr/scalar_couplings.py (Beauchamp / McGibbon / Lane).

Author(s): Alex Keeley (chemical shifts) and Alex Holehouse (J-couplings).

soursop.ssnmr.compute_random_coil_chemical_shifts(protein_sequence, temperature=25, pH=7.4, use_ggxgg=True, use_perdeuteration=False, asFloat=True)

Predict sequence-corrected random-coil chemical shifts.

For a user-provided amino-acid sequence, predicts the random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA) and applies sequence-context (nearest-neighbour), temperature and pH corrections. Reference shifts and general sequence-correction factors are from Kjaergaard & Poulsen (J. Biomol. NMR 2011, 50:157-165); temperature and glycine corrections are from Kjaergaard, Brander & Poulsen (J. Biomol. NMR 2011, 49:139-149); the correction-factor methodology follows Schwarzinger et al. (JACS 2001, 123:2970-2978); and the perdeuteration corrections are from Cavanagh, Fairbrother, Palmer, Rance & Skelton, Protein NMR Spectroscopy, 2nd ed. (Academic Press, 2007). The implementation is a port of the JavaScript tool by Alex Maltsev (NIH); see https://www1.bio.ku.dk/english/research/bms/research/sbinlab/randomchemicalshifts/

The input may be a standard one-letter sequence; phospho-residues can additionally be supplied using parenthesised multi-letter codes (e.g. "AS(SEP)GA" for a phospho-serine). Glycine and proline produce masked placeholder values for atoms without a predicted shift (CB for glycine; N/HN for proline).
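The mixed one-letter and parenthesised input format can be split into per-residue tokens with a single regular expression. tokenize_sequence is a hypothetical helper for illustration, not ssnmr's parser. It only tokenises the string and does not check that a parenthesised code is one of the supported phospho-residue codes.

```python
import re
from typing import List

# One token is either a parenthesised multi-letter code, e.g. "(SEP)",
# or a single upper-case one-letter code.
_TOKEN = re.compile(r"\(([A-Z]+)\)|([A-Z])")


def tokenize_sequence(sequence: str) -> List[str]:
    """Split e.g. "AS(SEP)GA" into ["A", "S", "SEP", "G", "A"]."""
    tokens = []
    pos = 0
    for match in _TOKEN.finditer(sequence):
        # finditer silently skips unmatched characters, so detect gaps explicitly.
        if match.start() != pos:
            raise ValueError(f"unexpected character at position {pos}: {sequence[pos]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    if pos != len(sequence):
        raise ValueError(f"unexpected character at position {pos}: {sequence[pos]!r}")
    return tokens
```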

Parameters:

protein_sequence (str) – Amino-acid sequence to predict shifts for. One-letter codes, with optional parenthesised multi-letter codes for phospho-residues (SEP/PS, TPO/PT, PTR/PY).
temperature (float or int, optional) – Sample temperature in degrees Celsius, used for the temperature correction. Must be between 0 and 100. Default 25.
pH (float or int, optional) – Sample pH, used for the pH (titratable-residue) correction. Must be between 0 and 14. Default 7.4.
use_ggxgg (bool, optional) – Whether to apply the GGXGG-based neighbour correction for glycines. Default True.
use_perdeuteration (bool, optional) – Whether to apply perdeuterated correction factors. Cannot be combined with phospho-residues. Default False.
asFloat (bool, optional) – If True the output chemical shifts are floats; if False they are formatted strings. Default True.

Returns:

One dictionary per residue in the input sequence, each containing the residue abbreviation ('Res'), its index ('Index') and the six predicted shifts ('CA', 'CB', 'CO', 'N', 'HN', 'HA'). Atoms without a predicted shift for a residue type (glycine CB, proline N/HN, HA under perdeuteration) carry a masked placeholder.

Return type:

list of dict

Raises:

soursop.ssexceptions.SSException – If temperature is outside 0 to 100 °C, if pH is outside 0 to 14, or if use_perdeuteration is requested for a sequence containing phosphorylated residues.

Example

>>> shifts = compute_random_coil_chemical_shifts('ASGAS', temperature=25, pH=7.4)
>>> sorted(shifts[0].keys())
['CA', 'CB', 'CO', 'HA', 'HN', 'Index', 'N', 'Res']

soursop.ssnmr.karplus(angle, A, B, C, phi0=0.0)

Generic Karplus relation J = A cos^2(theta) + B cos(theta) + C.

theta = angle + phi0, with both quantities in degrees; the returned scalar coupling is in Hz. Vectorised: accepts a scalar, a 1D array or a higher-dimensional array of angles.

The shape of angle is preserved in the output. Because A, B, C and phi0 are free parameters, the same function evaluates any literature parameterisation: the protein ³J(HN,Hα) sets in KARPLUS_HN_HA_COEFFICIENTS, but also Bothner-By, Tvaroska, Aydin or any other Karplus-type relation with its own A, B, C and phi0.

Parameters:

angle (float or numpy.ndarray) – Dihedral angle(s) in degrees.
A (float) – Karplus coefficient multiplying cos^2(theta), in Hz.
B (float) – Karplus coefficient multiplying cos(theta), in Hz.
C (float) – Constant Karplus term, in Hz.
phi0 (float, optional) – Phase offset (degrees) added to angle before evaluating the Karplus form. Default 0.0.

Returns:

A cos^2(angle + phi0) + B cos(angle + phi0) + C, same shape as angle, in Hz.

Return type:

float or numpy.ndarray

Examples

>>> from soursop.ssnmr import karplus, KARPLUS_HN_HA_COEFFICIENTS
>>> round(karplus(60.0, **KARPLUS_HN_HA_COEFFICIENTS["Bax2007"]), 3)
7.37
>>> import numpy as np
>>> phi = np.array([-60.0, 60.0, 180.0])
>>> np.round(karplus(phi, **KARPLUS_HN_HA_COEFFICIENTS["Bax2007"]), 2)
array([3.11, 7.37, 3.11])

soursop.ssnmr.compute_J3_HN_HA(protein, model='Bax2007', stride=1, weights=False, etol=1e-07, return_uncertainty=False)

Compute 3J(HN, H_alpha) scalar couplings from an SSProtein.

Evaluates the Karplus relation on the per-frame φ dihedral angles (in degrees) returned by get_angles() using the chosen literature parameterisation. The result is a (n_frames, n_phi) matrix (per frame, per residue with a defined φ) ready to be passed as calculated_values to soursop.ssbme.BME or soursop.sscoper.COPER. The first residue has no φ, so n_phi == n_residues - 1 for a single-chain protein.
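For readers who want to see what the φ input is, here is a minimal NumPy sketch of a dihedral-angle calculation for four atom positions (for φ: C_{i-1}, N_i, CA_i, C_i). It returns degrees in the standard IUPAC sign convention used for Ramachandran φ/ψ. It is an illustration only: dihedral_deg is a hypothetical helper, and compute_J3_HN_HA() obtains φ from get_angles() rather than from this code.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral p0-p1-p2-p3 in degrees (IUPAC sign), vectorised over leading axes.

    Each argument has shape (..., 3), e.g. (n_frames, 3) for one phi angle
    (atoms C_{i-1}, N_i, CA_i, C_i) tracked across a trajectory.
    """
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.degrees(np.arctan2(y, x))
```

The resulting angles (in degrees) are exactly the kind of input the Karplus relation expects.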

Parameters:

protein (soursop.ssprotein.SSProtein) – Protein chain whose ensemble we want J-couplings for.
model (str, optional) – Karplus parameterisation. Must be a key of KARPLUS_HN_HA_COEFFICIENTS. Default KARPLUS_HN_HA_DEFAULT_MODEL ("Bax2007").
stride (int, optional) – Subsample the trajectory by taking every stride-th frame before evaluation. Default 1.
weights (numpy.ndarray or False, optional) – Optional per-frame weight vector (length n_frames) used to collapse the frame axis to a single per-residue ensemble mean, validated by soursop.ssutils.validate_weights() (so the usual [0, 1], finite, sum(w) == 1 contract applies). When stride and weights are both given, the weight vector is first subsampled with the same stride and re-normalised, consistent with the package-wide reweighting behaviour. Default False (no weighted collapse; the full (n_frames, n_phi) matrix is returned).
etol (float, optional) – Tolerance for the sum(weights) == 1 check. Default 1e-7.
return_uncertainty (bool, optional) – If True, additionally return the Karplus-model RMSD-vs-experiment from KARPLUS_HN_HA_UNCERTAINTIES (scalar, Hz). This is a useful default forward-model uncertainty for use with BME/COPER. Default False.

Returns:

atom_indices (list of list of mdtraj.Atom) – The four atoms (C_{i-1}, N_i, CA_i, C_i) defining each φ dihedral. len(atom_indices) == n_phi.
J (numpy.ndarray) – ³J(HN, Hα) in Hz. Shape (n_frames, n_phi) by default; shape (n_phi,) when weights is supplied (frame axis collapsed to the weighted mean).
sigma (float) – Karplus-model uncertainty in Hz. Returned only if return_uncertainty=True.

Raises:

SSException – If model is not a key of KARPLUS_HN_HA_COEFFICIENTS, or if weights fails soursop.ssutils.validate_weights().

Examples

>>> # standard per-frame matrix, ready for BME / COPER
>>> atoms, J = compute_J3_HN_HA(protein, model="Bax2007")
>>> J.shape
(n_frames, n_phi)
>>> # ensemble mean (uniform weights) with the model's uncertainty
>>> import numpy as np
>>> w = np.full(protein.n_frames, 1.0 / protein.n_frames)
>>> atoms, J_mean, sigma = compute_J3_HN_HA(
...     protein, weights=w, return_uncertainty=True)
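The stride-then-renormalise rule for weights described above amounts to a few lines of NumPy. This is a minimal sketch, assuming J is the full (n_frames, n_phi) matrix and weights is a valid length-n_frames vector. weighted_frame_mean is a hypothetical helper, not part of soursop.

```python
import numpy as np


def weighted_frame_mean(J, weights, stride=1):
    """Weighted ensemble mean over the frame axis after striding.

    J:       (n_frames, n_obs) per-frame values, e.g. 3J(HN, HA) in Hz.
    weights: (n_frames,) weights for the *unstrided* trajectory.
    """
    J_strided = np.asarray(J, dtype=float)[::stride]
    w = np.asarray(weights, dtype=float)[::stride]
    w = w / w.sum()  # re-normalise the surviving weights so they sum to 1
    return w @ J_strided
```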

soursop.ssnmr.compute_NOE_distances(protein, atom_pairs, stride=1)

Per-frame inter-atom distances for a set of NOE atom pairs.

Thin wrapper around mdtraj.compute_distances() that returns distances in Angstroms (the soursop convention), with shape (n_frames, n_pairs) matching the calculated-value layout that BME / COPER / BMECustom consume. The raw r-values are returned per frame; collapse them to a single NOE ensemble distance with noe_ensemble_average(), or take r**-p yourself to obtain the linear-additive observable to feed to BME against an experimental r_exp**-p.

Parameters:

protein (soursop.ssprotein.SSProtein) – Protein chain whose ensemble the distances are computed from.
atom_pairs (array_like, shape (n_pairs, 2)) – Zero-based atom indices into protein.traj.topology. (Use protein.traj.topology.select(...) or topology.atom(...) to translate residue/atom names into indices.)
stride (int, optional) – Subsample frames before the distance computation. Default 1.

Returns:

Distances in Angstroms, shape (n_frames, n_pairs) (n_frames after striding).

Return type:

numpy.ndarray

Raises:

SSException – If atom_pairs does not have shape (n_pairs, 2).

Examples

>>> import numpy as np
>>> from soursop.ssnmr import compute_NOE_distances, noe_ensemble_average
>>> pairs = np.array([[0, 10], [0, 20], [5, 15]])    # atom indices
>>> d = compute_NOE_distances(protein, pairs)        # (n_frames, 3) Å
>>> r_noe = noe_ensemble_average(d, power=6)         # (3,) Å
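In essence this is a per-frame Euclidean distance for each pair, converted from MDTraj's nanometres to Ångströms. The NumPy-only sketch below reproduces that on a plain coordinate array of shape (n_frames, n_atoms, 3) in nm, which is the layout of MDTraj's Trajectory.xyz. pair_distances_angstrom is a hypothetical helper, and it raises ValueError where ssnmr raises SSException.

```python
import numpy as np


def pair_distances_angstrom(xyz_nm, atom_pairs, stride=1):
    """Per-frame distances (Angstrom) for atom pairs, from coordinates in nm."""
    pairs = np.asarray(atom_pairs, dtype=int)
    if pairs.ndim != 2 or pairs.shape[1] != 2:
        raise ValueError("atom_pairs must have shape (n_pairs, 2)")
    xyz = np.asarray(xyz_nm, dtype=float)[::stride]
    # Difference vectors for every frame and pair: (n_frames, n_pairs, 3).
    diff = xyz[:, pairs[:, 0], :] - xyz[:, pairs[:, 1], :]
    return 10.0 * np.linalg.norm(diff, axis=-1)  # nm -> Angstrom
```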

soursop.ssnmr.noe_ensemble_average(distances, power=6, weights=False, etol=1e-07, axis=0)

NOE-averaged distance across the axis of a distance array.

Implements \(\big( \sum_i w_i\, d_i^{-p} \big)^{-1/p}\) along axis: the standard NOE r^-p ensemble convention (default p=6; some studies use p=3). Honours the package-wide weights= contract (see soursop.ssutils.validate_weights()): weights=False (default) gives the uniform mean, while a vector of per-frame weights (must be length distances.shape[axis], in [0, 1], finite, summing to 1) gives the weighted NOE distance.

Parameters:

distances (numpy.ndarray) – Distance array (any shape) — typically (n_frames, n_pairs) from compute_NOE_distances().
power (float, optional) – NOE exponent. Default 6.
weights (numpy.ndarray or False, optional) – Per-frame weights, validated by soursop.ssutils.validate_weights(). Default False -> uniform.
etol (float, optional) – Tolerance on sum(weights) == 1. Default 1e-7.
axis (int, optional) – Axis to collapse. Default 0 (frame axis).

Returns:

NOE distance(s) (Å) with axis collapsed.

Return type:

numpy.ndarray

Raises:

SSException – If weights fails validation, or if any distance along axis is non-positive (since r^-p is undefined).