ssnmr

Overview

ssnmr provides functions for predicting nuclear magnetic resonance (NMR) observables from protein sequence data. Unlike SSProtein or SSTrajectory, this module is stateless — it contains standalone functions rather than a class, and most functions take a sequence string directly rather than a trajectory object.

Random coil chemical shifts. The primary function, compute_random_coil_chemical_shifts, predicts sequence-corrected random coil 1H, 13C, and 15N chemical shifts for a given amino acid sequence. These are useful as a disordered-state reference baseline when interpreting experimental NMR spectra of intrinsically disordered proteins (IDPs) or unfolded proteins.

Corrections applied include:

  • Nearest-neighbour sequence effects — shifts are adjusted for the two residues on either side of each position, using the correction factors of Kjaergaard & Poulsen (2011) and Schwarzinger et al. (2001).

  • Temperature — linear corrections are applied relative to a 5 °C baseline.

  • pH — charged-state populations for Asp, Glu, His, and phosphorylated residues (pSer, pThr, pTyr) are accounted for via fractional deprotonation at the given pH.

  • Perdeuteration — optional corrections for fully deuterated protein samples.
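The arithmetic behind the temperature and pH corrections above can be sketched in a few lines. This is a minimal illustration, not ssnmr's implementation: the function names, the pKa, the temperature coefficient and the shift values are all hypothetical placeholders. Only the 5 °C baseline and the use of fractional deprotonation come from the description above, and the two-state population weighting is one common way to combine protonated and deprotonated shifts.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch fraction of a titratable group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def temperature_corrected(shift_ref, coeff_ppm_per_degC, temperature, t_ref=5.0):
    """Linear temperature correction relative to a reference temperature (deg C)."""
    return shift_ref + coeff_ppm_per_degC * (temperature - t_ref)


def ph_weighted(shift_protonated, shift_deprotonated, pH, pKa):
    """Population-weighted shift of a titratable residue at the given pH."""
    f = fraction_deprotonated(pH, pKa)
    return (1.0 - f) * shift_protonated + f * shift_deprotonated


# Hypothetical numbers, chosen only to exercise the functions.
print(fraction_deprotonated(pH=4.0, pKa=4.0))      # half deprotonated when pH == pKa
print(temperature_corrected(8.30, -0.007, 25.0))   # 20 degrees above the baseline
print(ph_weighted(54.0, 56.0, pH=7.4, pKa=6.5))    # lies between the two limiting shifts
```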

Supported residue types. All 20 canonical amino acids are supported, along with three phosphorylated residues: phosphoserine (pSer / SEP / PS), phosphothreonine (pThr / PTHR / PT), and phosphotyrosine (pTyr / PTYR / PY). Phosphorylated residues cannot be combined with the perdeuteration corrections.

Output format. The function returns a list of per-residue dictionaries, one per position (excluding the two terminal padding residues), each containing keys Res, Index, CA, CB, CO, N, HN, and HA. Glycine lacks a Cβ (CB is "**.***") and proline lacks a backbone amide (N and HN are "*.***"). Shifts are returned as floats or three-decimal-place strings depending on the asFloat flag.
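Because glycine and proline entries carry masked placeholders, downstream code usually has to skip them. The following is a minimal sketch, assuming the placeholders are strings containing asterisks as described above. How masked atoms are represented when asFloat=True is not stated here, so the helper also passes real numbers through unchanged. The name to_float_or_none and the sample records are hypothetical, and the shift values are made up.

```python
def to_float_or_none(value):
    """Return a float, or None for a masked placeholder such as '*.***'."""
    if isinstance(value, str):
        if "*" in value:
            return None
        return float(value)
    return float(value)


# Hand-written records shaped like the documented output; the numbers are made up.
sample = [
    {"Res": "G", "Index": 1, "CA": "45.100", "CB": "**.***"},
    {"Res": "A", "Index": 2, "CA": 52.5, "CB": 19.1},
]

cb_values = [to_float_or_none(rec["CB"]) for rec in sample]
print(cb_values)  # [None, 19.1]
```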

Example usage:

from soursop.ssnmr import compute_random_coil_chemical_shifts

sequence = "MAEQKLISEEDL"
shifts = compute_random_coil_chemical_shifts(sequence, temperature=25, pH=7.4)

for residue in shifts:
    print(residue['Res'], residue['CA'], residue['N'])
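
One common use of these values as a disordered-state baseline is the secondary chemical shift: the observed shift minus the random-coil prediction. The sketch below assumes float output (asFloat=True) and that the 'Index' values can be matched against the keys of the measured data. It uses hand-written stand-ins for both the prediction and the measurements, so the numbers are illustrative only, and secondary_shifts is a hypothetical helper rather than part of ssnmr.

```python
def secondary_shifts(observed, random_coil, atom="CA"):
    """Observed minus random-coil shift for one atom type, keyed by residue index.

    observed: dict mapping residue index -> measured shift (ppm)
    random_coil: list of per-residue dicts with 'Index' and atom keys
    """
    result = {}
    for rec in random_coil:
        idx = rec["Index"]
        rc = rec.get(atom)
        # Skip residues without a measurement or without a numeric prediction.
        if idx not in observed or not isinstance(rc, (int, float)):
            continue
        result[idx] = observed[idx] - rc
    return result


# Illustrative stand-ins: in practice random_coil would be the list returned by
# compute_random_coil_chemical_shifts and observed would come from an assignment.
random_coil = [
    {"Res": "A", "Index": 2, "CA": 52.5},
    {"Res": "E", "Index": 3, "CA": 56.5},
    {"Res": "G", "Index": 4, "CA": 45.0},
]
observed = {2: 53.0, 3: 58.0}

print(secondary_shifts(observed, random_coil))  # {2: 0.5, 3: 1.5}
```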

ssnmr - sequence-based random-coil chemical shift prediction.

This module predicts random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA) for an arbitrary amino-acid sequence, corrected for temperature, pH and (optionally) perdeuteration, including support for phosphorylated serine/threonine/tyrosine. The implementation ports the Kjaergaard & Poulsen / Schwarzinger reference-shift and neighbour-correction tables and is the basis for comparing simulated ensembles to experimental NMR data.

The public entry point is compute_random_coil_chemical_shifts; the remaining functions are private helpers.

Author(s): Alex Keeley (with Alex Holehouse)

soursop.ssnmr.compute_random_coil_chemical_shifts(protein_sequence, temperature=25, pH=7.4, use_ggxgg=True, use_perdeuteration=False, asFloat=True)

Predict sequence-corrected random-coil chemical shifts.

For a user-provided amino-acid sequence, predicts the random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA) and applies sequence-context (nearest-neighbour), temperature and pH corrections. The underlying data and methods come from:

  • Reference shifts and general sequence-correction factors: Kjaergaard & Poulsen (J. Biomol. NMR 2011, 50:157-165).

  • Temperature and glycine corrections: Kjaergaard, Brander & Poulsen (J. Biomol. NMR 2011, 49:139-149).

  • Correction-factor methodology: Schwarzinger et al. (JACS 2001, 123:2970-2978).

  • Perdeuteration corrections: Cavanagh, Fairbrother, Palmer, Rance & Skelton, Protein NMR Spectroscopy, 2nd ed. (Academic Press, 2007).

The implementation is a port of the JavaScript tool by Alex Maltsev (NIH); see https://www1.bio.ku.dk/english/research/bms/research/sbinlab/randomchemicalshifts/

The input may be a standard one-letter sequence; phospho-residues can additionally be supplied using parenthesised three-letter codes (e.g. "AS(SEP)GA" for a phospho-serine). Glycine and proline produce masked placeholder values for atoms they lack (CB for glycine; N/HN for proline).
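
To illustrate the input format, the sketch below splits such a sequence into residue tokens. It is a hypothetical stand-alone tokenizer (tokenize_sequence is not a soursop function, and the module's own private parser may behave differently). It only separates single letters from parenthesised codes and does not check that the codes are ones ssnmr accepts.

```python
import re

# One alternative per token: a parenthesised multi-letter code, or a single letter.
_TOKEN = re.compile(r"\(([A-Za-z]+)\)|([A-Za-z])")


def tokenize_sequence(sequence):
    """Split a sequence such as 'AS(SEP)GA' into residue tokens."""
    tokens = []
    pos = 0
    for match in _TOKEN.finditer(sequence):
        # A gap between matches means an unsupported character was skipped.
        if match.start() != pos:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.group(1) or match.group(2)).upper())
        pos = match.end()
    if pos != len(sequence):
        raise ValueError(f"unexpected character at position {pos}")
    return tokens


print(tokenize_sequence("AS(SEP)GA"))  # ['A', 'S', 'SEP', 'G', 'A']
```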

Parameters:
  • protein_sequence (str) – Amino-acid sequence to predict shifts for. One-letter codes, with optional parenthesised multi-letter codes for phospho-residues (SEP/PS, TPO/PT, PTR/PY).

  • temperature (float or int, optional) – Sample temperature in degrees Celsius, used for the temperature correction. Must be between 0 and 100. Default 25.

  • pH (float or int, optional) – Sample pH, used for the pH (titratable-residue) correction. Must be between 0 and 14. Default 7.4.

  • use_ggxgg (bool, optional) – Whether to apply the GGXGG-based neighbour correction for glycines. Default True.

  • use_perdeuteration (bool, optional) – Whether to apply perdeuterated correction factors. Cannot be combined with phospho-residues. Default False.

  • asFloat (bool, optional) – If True the output chemical shifts are floats; if False they are formatted strings. Default True.

Returns:

One dictionary per residue in the input sequence, each containing the residue abbreviation ('Res'), its index ('Index') and the six predicted shifts ('CA', 'CB', 'CO', 'N', 'HN', 'HA'). Atoms absent for a residue type (glycine CB, proline N/HN, HA under perdeuteration) carry a masked placeholder.

Return type:

list of dict

Raises:

soursop.ssexceptions.SSException – If temperature is outside 0-100 °C, if pH is outside 0-14, or if use_perdeuteration is requested for a sequence containing phosphorylated residues.
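
Callers who want to fail early, before building a long list of inputs, can mirror these documented rules in their own code. The sketch below is a hypothetical pre-check: check_conditions is not part of soursop, it raises ValueError as a stand-in for SSException, and it assumes the documented ranges are inclusive at both ends.

```python
def check_conditions(temperature, pH, use_perdeuteration, has_phospho):
    """Raise ValueError for argument combinations the documentation rules out."""
    if not 0 <= temperature <= 100:
        raise ValueError("temperature must be between 0 and 100 degrees C")
    if not 0 <= pH <= 14:
        raise ValueError("pH must be between 0 and 14")
    if use_perdeuteration and has_phospho:
        raise ValueError("perdeuteration cannot be combined with phospho-residues")


# A valid combination passes silently.
check_conditions(25, 7.4, use_perdeuteration=False, has_phospho=True)

# An out-of-range temperature is reported before any prediction is attempted.
try:
    check_conditions(120, 7.4, use_perdeuteration=False, has_phospho=False)
except ValueError as err:
    print(err)
```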

Example

>>> from soursop.ssnmr import compute_random_coil_chemical_shifts
>>> shifts = compute_random_coil_chemical_shifts('ASGAS', temperature=25, pH=7.4)
>>> sorted(shifts[0].keys())
['CA', 'CB', 'CO', 'HA', 'HN', 'Index', 'N', 'Res']