sshdx
Overview
sshdx predicts per-residue HDX protection factors (PFs) for backbone amide hydrogen-deuterium exchange from a structural ensemble, using the empirical Best-Vendruscolo relation
where for each residue \(i\) with a backbone amide N-H:
\(N_c(i)\) is the number of heavy atoms in the protein within
contact_cutoff(default 6.5 Å) of the amide N of residue \(i\), ignoring residues at sequence separation \(|i - j| \le\)exclude_neighbours(default 2);\(N_h(i)\) is the number of backbone H-bonds the amide H of residue \(i\) forms (as donor) to a backbone carbonyl O at sequence separation \(|i - j| >\)
exclude_neighbours. Per-frame H-bonds are detected bymdtraj.wernet_nilsson().
Proline (no backbone H) and any residue that lacks a recognisable backbone amide H are dropped from the residue list. The three exported helpers all return (residue_indices, array) so the caller knows the per-residue mapping.
Public entry points
compute_Nc()— per-residue per-frame heavy-atom contacts.compute_Nh()— per-residue per-frame backbone H-bond counts.compute_protection_factors()— per-residue ln(P), optionally collapsed to an ensemble mean via the package-wideweightsargument.
All three accept stride and return (n_frames, n_residues) arrays — the natural input shape for soursop.ssbme.BME / soursop.sscoper.COPER reweighting against experimental protection factors.
Defaults (Best & Vendruscolo, Structure 14, 97-106, 2006):
DEFAULT_BETA_C= 0.35DEFAULT_BETA_H= 2.0DEFAULT_BETA_0= 0.0DEFAULT_CONTACT_CUTOFF_NM= 0.65(= 6.5 Å)DEFAULT_EXCLUDE_NEIGHBOURS= 2
Computing ln(P) for an ensemble
from soursop.sstrajectory import SSTrajectory
from soursop.sshdx import compute_protection_factors
traj = SSTrajectory('traj.xtc', 'start.pdb')
protein = traj.proteinTrajectoryList[0]
# Per-frame, per-residue ln(P)
residues, lnP = compute_protection_factors(protein)
# lnP.shape == (n_frames, len(residues))
# Ensemble-averaged ln(P) per residue (uniform weights)
import numpy as np
w = np.full(protein.n_frames, 1.0 / protein.n_frames)
residues, lnP_mean = compute_protection_factors(protein, weights=w)
Tune the empirical coefficients if you are using a different parameterisation:
residues, lnP = compute_protection_factors(
protein, beta_c=0.5, beta_h=2.5, beta_0=-0.3)
Reweighting against experimental protection factors
The (n_frames, n_residues) ln(P) matrix plugs directly into the BME / COPER reweighters:
from soursop.sshdx import compute_protection_factors
from soursop.ssbme import BME, ExperimentalObservable
residues, lnP_calc = compute_protection_factors(protein)
# experimental ln(P) per residue with its associated uncertainty
obs = [ExperimentalObservable(lnP_exp[k], uncertainty=sigma_lnP[k],
name=f"PF_res{residues[k]}")
for k in range(len(residues))]
result = BME(obs, lnP_calc).fit(theta=2.0, auto_theta=False)
weights = result.weights
Pitfalls and conventions
Units.
contact_cutoffis in nanometres (matches mdtraj). 6.5 Å = 0.65 nm.H-bond convention.
compute_Nhonly counts backbone-to-backbone H-bonds with the amide H of residue \(i\) as donor and a backbone carbonyl O at \(|i - j| >\)exclude_neighboursas acceptor — sidechain contributions and short-range \(i \rightarrow i, i+1, i+2\) interactions are dropped, in line with the Best-Vendruscolo definition.N-terminus. The N-terminal amine of most force fields is named
H1/H2/H3;sshdxkeeps the residue if it can find any ofH/HN/H1on the backbone N. If your N-terminus is parameterised differently, mask out the first residue post hoc.Per-frame statistics. The per-frame ln(P) array is what the reweighters consume; the function only collapses the frame axis when you supply
weights. Always inspectlnP.mean(axis=0)(or feed a uniform-weight vector) when comparing to experimental ln(P) directly.Parameterisation. The default \(\beta_c = 0.35\), \(\beta_h = 2.0\) come from Best & Vendruscolo’s original fit on globular proteins. For IDRs / partially-folded states it is reasonable to keep them as a baseline, but a forward-model uncertainty in BME / COPER (or treating \(\beta_c\), \(\beta_h\) as nuisance parameters) is recommended.
Key references
Best, R. B. & Vendruscolo, M. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins. Structure 14, 97-106 (2006). doi:10.1016/j.str.2005.09.012.
Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. Rare Fluctuations of Native Proteins Sampled by Equilibrium Hydrogen Exchange. J. Am. Chem. Soc. 125, 15686-15687 (2003).
sshdx - HDX protection factors via the Best-Vendruscolo model.
Predicts per-residue ln(protection factor) (ln P) for backbone amide hydrogen-deuterium exchange (HDX) from a structural ensemble, using the empirical Best-Vendruscolo relation
ln P_i = beta_c * N_c(i) + beta_h * N_h(i) + beta_0
where, for each residue i with a backbone amide N-H:
N_c(i)is the number of heavy atoms in the protein withincontact_cutoff(default 6.5 A) of the amide N of residuei, ignoring residues whose sequence separation fromiis at mostexclude_neighbours(default 2);N_h(i)is the number of backbone H-bonds the amide H of residueiforms as donor to a backbone carbonyl O at sequence separation> exclude_neighbours(per-frame H-bonds are detected withmdtraj.wernet_nilsson).
Both counts are returned per frame per residue, so the resulting
(n_frames, n_residues) arrays plug directly into the SOURSOP BME /
COPER reweighters. The optional weights argument collapses the frame
axis to a single per-residue ensemble mean (per the package-wide
soursop.ssutils.validate_weights() contract).
Proline (no backbone H) and any residue lacking a recognisable backbone amide H are dropped from the residue list; the function returns the residue indices it covered alongside the data array.
Public entry points
compute_Nc()- per-residue heavy-atom contacts (per frame).compute_Nh()- per-residue backbone H-bonds (per frame).compute_protection_factors()- per-residue ln(P) (per frame, or optionally collapsed viaweights).
References
Best, R. B. & Vendruscolo, M. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins. Structure 14, 97-106 (2006). doi:10.1016/j.str.2005.09.012.
Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. Rare Fluctuations of Native Proteins Sampled by Equilibrium Hydrogen Exchange. J. Am. Chem. Soc. 125, 15686-15687 (2003).
Author(s): Alex Holehouse
- soursop.sshdx.compute_Nc(protein, contact_cutoff=0.65, exclude_neighbours=2, stride=1)[source]
Per-residue per-frame heavy-atom contact count
N_c(i).N_c(i)is the number of protein heavy atoms withincontact_cutoffof the backbone amide N of residuei, excluding atoms in residues at sequence separation|i - j| <= exclude_neighbours. The cutoff is in nanometres (matching mdtraj’s convention).- Parameters:
protein (soursop.ssprotein.SSProtein)
contact_cutoff (float, optional) – Distance cutoff in nm. Default
0.65(= 6.5 A, Best-Vendruscolo).exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default
2.stride (int, optional) – Frame subsample. Default
1.
- Returns:
residue_indices (numpy.ndarray, shape (n_res,)) – Zero-based residue indices for which N_c is defined (the backbone-amide residues; proline and any residue lacking a recognisable backbone H are dropped).
Nc (numpy.ndarray, shape (n_frames, n_res), int) – Per-frame, per-residue heavy-atom contact counts.
- Raises:
SSException – If the protein has no recognisable backbone amide N+H residues.
- soursop.sshdx.compute_Nh(protein, exclude_neighbours=2, stride=1)[source]
Per-residue per-frame backbone H-bond count
N_h(i).Counts backbone H-bonds in which the amide H of residue i is the donor and a backbone carbonyl O is the acceptor, at sequence separation
|i - j| > exclude_neighbours. Per-frame H-bonds are detected bymdtraj.wernet_nilsson()(cone criterion on the Donor-H…Acceptor geometry).- Parameters:
protein (soursop.ssprotein.SSProtein)
exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default
2.stride (int, optional) – Frame subsample. Default
1.
- Returns:
residue_indices (numpy.ndarray, shape (n_res,)) – Same residue list as
compute_Nc().Nh (numpy.ndarray, shape (n_frames, n_res), int)
- soursop.sshdx.compute_protection_factors(protein, beta_c=0.35, beta_h=2.0, beta_0=0.0, contact_cutoff=0.65, exclude_neighbours=2, stride=1, weights=False, etol=1e-07)[source]
Per-residue ln(protection factor) via the Best-Vendruscolo formula.
ln P_i = beta_c * N_c(i) + beta_h * N_h(i) + beta_0evaluated per frame, then optionally collapsed to a single per-residue ensemble mean by a per-frame weight vector. The output shape(n_frames, n_res)is the natural input forsoursop.ssbme.BME/soursop.sscoper.COPERreweighting against experimental HDX protection factors.- Parameters:
protein (soursop.ssprotein.SSProtein)
beta_c (float, optional) – Best-Vendruscolo coefficients. Defaults
(0.35, 2.0, 0.0).beta_h (float, optional) – Best-Vendruscolo coefficients. Defaults
(0.35, 2.0, 0.0).beta_0 (float, optional) – Best-Vendruscolo coefficients. Defaults
(0.35, 2.0, 0.0).contact_cutoff (float, optional) – Heavy-atom contact cutoff (nm). Default
0.65.exclude_neighbours (int, optional) – Sequence-separation exclusion radius. Default
2.stride (int, optional) – Frame subsample. Default
1.weights (numpy.ndarray or False, optional) – Optional per-frame weights collapsing the frame axis to a per-residue mean (validated by
soursop.ssutils.validate_weights()). DefaultFalse(per-frame array returned).etol (float, optional) – Tolerance on
sum(weights) == 1. Default1e-7.
- Returns:
residue_indices (numpy.ndarray, shape (n_res,)) – Residue indices for which ln(P) is defined.
lnP (numpy.ndarray) –
(n_frames, n_res)per-frame ln(P) by default;(n_res,)ensemble mean whenweightsis supplied.
- Raises:
SSException – If the protein has no backbone amide N+H residues, or if
weightsfails validation.