sshdx

Overview

sshdx predicts per-residue HDX protection factors (PFs) for backbone amide hydrogen-deuterium exchange from a structural ensemble, using the empirical Best-Vendruscolo relation

\[\ln P_i \;=\; \beta_c\, N_c(i) \;+\; \beta_h\, N_h(i) \;+\; \beta_0 ,\]

where for each residue \(i\) with a backbone amide N-H:

  • \(N_c(i)\) is the number of heavy atoms in the protein within contact_cutoff (default 6.5 Å) of the amide N of residue \(i\), ignoring residues at sequence separation \(|i - j| \le\) exclude_neighbours (default 2);

  • \(N_h(i)\) is the number of backbone H-bonds the amide H of residue \(i\) forms (as donor) to a backbone carbonyl O at sequence separation \(|i - j| >\) exclude_neighbours. Per-frame H-bonds are detected by mdtraj.wernet_nilsson().

Proline (no backbone H) and any residue that lacks a recognisable backbone amide H are dropped from the residue list. The three exported helpers all return (residue_indices, array) so the caller knows the per-residue mapping.

Public entry points

All three accept stride and return (n_frames, n_residues) arrays — the natural input shape for soursop.ssbme.BME / soursop.sscoper.COPER reweighting against experimental protection factors.

Defaults (Best & Vendruscolo, Structure 14, 97-106, 2006):

  • DEFAULT_BETA_C = 0.35

  • DEFAULT_BETA_H = 2.0

  • DEFAULT_BETA_0 = 0.0

  • DEFAULT_CONTACT_CUTOFF_NM = 0.65 (= 6.5 Å)

  • DEFAULT_EXCLUDE_NEIGHBOURS = 2

Computing ln(P) for an ensemble

from soursop.sstrajectory import SSTrajectory
from soursop.sshdx import compute_protection_factors

traj    = SSTrajectory('traj.xtc', 'start.pdb')
protein = traj.proteinTrajectoryList[0]

# Per-frame, per-residue ln(P)
residues, lnP = compute_protection_factors(protein)
# lnP.shape == (n_frames, len(residues))

# Ensemble-averaged ln(P) per residue (uniform weights)
import numpy as np
w = np.full(protein.n_frames, 1.0 / protein.n_frames)
residues, lnP_mean = compute_protection_factors(protein, weights=w)

Tune the empirical coefficients if you are using a different parameterisation:

residues, lnP = compute_protection_factors(
    protein, beta_c=0.5, beta_h=2.5, beta_0=-0.3)

Reweighting against experimental protection factors

The (n_frames, n_residues) ln(P) matrix plugs directly into the BME / COPER reweighters:

from soursop.sshdx import compute_protection_factors
from soursop.ssbme import BME, ExperimentalObservable

residues, lnP_calc = compute_protection_factors(protein)
# experimental ln(P) per residue with its associated uncertainty
obs = [ExperimentalObservable(lnP_exp[k], uncertainty=sigma_lnP[k],
                              name=f"PF_res{residues[k]}")
       for k in range(len(residues))]

result = BME(obs, lnP_calc).fit(theta=2.0, auto_theta=False)
weights = result.weights

Pitfalls and conventions

  • Units. contact_cutoff is in nanometres (matches mdtraj). 6.5 Å = 0.65 nm.

  • H-bond convention. compute_Nh only counts backbone-to-backbone H-bonds with the amide H of residue \(i\) as donor and a backbone carbonyl O at \(|i - j| >\) exclude_neighbours as acceptor — sidechain contributions and short-range \(i \rightarrow i, i+1, i+2\) interactions are dropped, in line with the Best-Vendruscolo definition.

  • N-terminus. The N-terminal amine of most force fields is named H1/H2/H3; sshdx keeps the residue if it can find any of H/HN/H1 on the backbone N. If your N-terminus is parameterised differently, mask out the first residue post hoc.

  • Per-frame statistics. The per-frame ln(P) array is what the reweighters consume; the function only collapses the frame axis when you supply weights. Always inspect lnP.mean(axis=0) (or feed a uniform-weight vector) when comparing to experimental ln(P) directly.

  • Parameterisation. The default \(\beta_c = 0.35\), \(\beta_h = 2.0\) come from Best & Vendruscolo’s original fit on globular proteins. For IDRs / partially-folded states it is reasonable to keep them as a baseline, but a forward-model uncertainty in BME / COPER (or treating \(\beta_c\), \(\beta_h\) as nuisance parameters) is recommended.

Key references

  • Best, R. B. & Vendruscolo, M. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins. Structure 14, 97-106 (2006). doi:10.1016/j.str.2005.09.012.

  • Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. Rare Fluctuations of Native Proteins Sampled by Equilibrium Hydrogen Exchange. J. Am. Chem. Soc. 125, 15686-15687 (2003).

sshdx - HDX protection factors via the Best-Vendruscolo model.

Predicts per-residue ln(protection factor) (ln P) for backbone amide hydrogen-deuterium exchange (HDX) from a structural ensemble, using the empirical Best-Vendruscolo relation

ln P_i = beta_c * N_c(i) + beta_h * N_h(i) + beta_0

where, for each residue i with a backbone amide N-H:

  • N_c(i) is the number of heavy atoms in the protein within contact_cutoff (default 6.5 A) of the amide N of residue i, ignoring residues whose sequence separation from i is at most exclude_neighbours (default 2);

  • N_h(i) is the number of backbone H-bonds the amide H of residue i forms as donor to a backbone carbonyl O at sequence separation > exclude_neighbours (per-frame H-bonds are detected with mdtraj.wernet_nilsson).

Both counts are returned per frame per residue, so the resulting (n_frames, n_residues) arrays plug directly into the SOURSOP BME / COPER reweighters. The optional weights argument collapses the frame axis to a single per-residue ensemble mean (per the package-wide soursop.ssutils.validate_weights() contract).

Proline (no backbone H) and any residue lacking a recognisable backbone amide H are dropped from the residue list; the function returns the residue indices it covered alongside the data array.

Public entry points

References

  • Best, R. B. & Vendruscolo, M. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins. Structure 14, 97-106 (2006). doi:10.1016/j.str.2005.09.012.

  • Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. Rare Fluctuations of Native Proteins Sampled by Equilibrium Hydrogen Exchange. J. Am. Chem. Soc. 125, 15686-15687 (2003).

Author(s): Alex Holehouse

soursop.sshdx.compute_Nc(protein, contact_cutoff=0.65, exclude_neighbours=2, stride=1)[source]

Per-residue per-frame heavy-atom contact count N_c(i).

N_c(i) is the number of protein heavy atoms within contact_cutoff of the backbone amide N of residue i, excluding atoms in residues at sequence separation |i - j| <= exclude_neighbours. The cutoff is in nanometres (matching mdtraj’s convention).

Parameters:
  • protein (soursop.ssprotein.SSProtein)

  • contact_cutoff (float, optional) – Distance cutoff in nm. Default 0.65 (= 6.5 A, Best-Vendruscolo).

  • exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default 2.

  • stride (int, optional) – Frame subsample. Default 1.

Returns:

  • residue_indices (numpy.ndarray, shape (n_res,)) – Zero-based residue indices for which N_c is defined (the backbone-amide residues; proline and any residue lacking a recognisable backbone H are dropped).

  • Nc (numpy.ndarray, shape (n_frames, n_res), int) – Per-frame, per-residue heavy-atom contact counts.

Raises:

SSException – If the protein has no recognisable backbone amide N+H residues.

soursop.sshdx.compute_Nh(protein, exclude_neighbours=2, stride=1)[source]

Per-residue per-frame backbone H-bond count N_h(i).

Counts backbone H-bonds in which the amide H of residue i is the donor and a backbone carbonyl O is the acceptor, at sequence separation |i - j| > exclude_neighbours. Per-frame H-bonds are detected by mdtraj.wernet_nilsson() (cone criterion on the Donor-H…Acceptor geometry).

Parameters:
  • protein (soursop.ssprotein.SSProtein)

  • exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default 2.

  • stride (int, optional) – Frame subsample. Default 1.

Returns:

  • residue_indices (numpy.ndarray, shape (n_res,)) – Same residue list as compute_Nc().

  • Nh (numpy.ndarray, shape (n_frames, n_res), int)

soursop.sshdx.compute_protection_factors(protein, beta_c=0.35, beta_h=2.0, beta_0=0.0, contact_cutoff=0.65, exclude_neighbours=2, stride=1, weights=False, etol=1e-07)[source]

Per-residue ln(protection factor) via the Best-Vendruscolo formula.

ln P_i = beta_c * N_c(i) + beta_h * N_h(i) + beta_0 evaluated per frame, then optionally collapsed to a single per-residue ensemble mean by a per-frame weight vector. The output shape (n_frames, n_res) is the natural input for soursop.ssbme.BME / soursop.sscoper.COPER reweighting against experimental HDX protection factors.

Parameters:
  • protein (soursop.ssprotein.SSProtein)

  • beta_c (float, optional) – Best-Vendruscolo coefficients. Defaults (0.35, 2.0, 0.0).

  • beta_h (float, optional) – Best-Vendruscolo coefficients. Defaults (0.35, 2.0, 0.0).

  • beta_0 (float, optional) – Best-Vendruscolo coefficients. Defaults (0.35, 2.0, 0.0).

  • contact_cutoff (float, optional) – Heavy-atom contact cutoff (nm). Default 0.65.

  • exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default 2.

  • stride (int, optional) – Frame subsample. Default 1.

  • weights (numpy.ndarray or False, optional) – Optional per-frame weights collapsing the frame axis to a per-residue mean (validated by soursop.ssutils.validate_weights()). Default False (per-frame array returned).

  • etol (float, optional) – Tolerance on sum(weights) == 1. Default 1e-7.

Returns:

  • residue_indices (numpy.ndarray, shape (n_res,)) – Residue indices for which ln(P) is defined.

  • lnP (numpy.ndarray) – (n_frames, n_res) per-frame ln(P) by default; (n_res,) ensemble mean when weights is supplied.

Raises:

SSException – If the protein has no backbone amide N+H residues, or if weights fails validation.