Ensemble reweighting (frame weights)
Overview
By default every analysis routine in SOURSOP treats all frames of a trajectory as contributing equally to an ensemble average. In many situations — enhanced-sampling simulations, Markov-state-model reweighting, maximum-entropy / experimentally-restrained reweighting, or simple importance reweighting — each frame should instead contribute according to a statistical weight.
Every SOURSOP function that returns an ensemble-average value therefore
accepts an optional weights keyword: a per-frame vector that defines
how much each frame contributes to the average. The default,
weights=False, is an exact no-op and reproduces the original
unweighted behaviour bit-for-bit.
Reweighting in SOURSOP is deterministic. A weighted average is the closed-form expectation under the supplied weights — there is no stochastic resampling, so results are exactly reproducible and do not depend on a random seed.
The weight vector contract
A valid weights vector is a probability vector over frames:
one entry per frame (
len(weights) == n_frames);every entry in the closed interval
[0, 1];all entries finite (no
nan/inf);the entries sum to
1within a small toleranceetol(default1e-7).
These conditions are enforced centrally by
soursop.ssutils.validate_weights(). If any condition fails an
SSException is raised with a descriptive message, rather than
silently producing a meaningless number.
Most reweighting-capable methods also expose an etol keyword so the
sum-to-one tolerance can be tightened or relaxed.
Interaction with stride
When a method is called with both a frame stride and weights,
the weight vector is first subsampled (weights[::stride]) and then
renormalised so it still sums to 1 over the retained frames. A
warning is emitted because per-stride reweighting is rarely what you
want unless the weights were computed for exactly those frames.
The deterministic helpers
All weighting is funnelled through a small set of shared helpers in
soursop.ssutils so that every method applies weights identically:
validate_weights()— validate (and stride-normalise) a weight vector; the single source of truth.weighted_mean()—sum(w * x).weighted_rms()—sqrt(sum(w * x**2))(the polymer-physics RMS order parameter).weighted_std()— the reliability-weighted population standard deviation. Frame weights are probability weights with no associated sample size, so noddofcorrection is well-defined; the population estimator is the unambiguous, reproducible choice.weighted_corr()— the weighted Pearson correlation (used by the correlation observables).
Example
import numpy as np
from soursop.sstrajectory import SSTrajectory
TrajOb = SSTrajectory('traj.xtc', 'start.pdb')
protein = TrajOb.proteinTrajectoryList[0]
n = protein.n_frames
# 1. default: every frame contributes equally
rg_per_frame = protein.get_radius_of_gyration() # array, length n
# 2. uniform weights == the ordinary mean of the per-frame values
w_uniform = np.full(n, 1.0 / n)
rg_mean = protein.get_radius_of_gyration(weights=w_uniform)
assert np.isclose(rg_mean, rg_per_frame.mean())
# 3. a one-hot weight isolates a single frame
w_one = np.zeros(n); w_one[10] = 1.0
rg_frame10 = protein.get_radius_of_gyration(weights=w_one)
assert np.isclose(rg_frame10, rg_per_frame[10])
# 4. a real reweighting vector (e.g. from MSM / MaxEnt)
w = my_reweighting_vector # shape (n,), >= 0, sums to 1
dmap_w, _ = protein.get_distance_map(weights=w)
nu, A0 = protein.get_scaling_exponent(weights=w)[:2]
Behaviour at the extremes
Uniform weights (
1/neverywhere) reproduce the ordinary (unweighted) ensemble mean to within floating-point precision.One-hot weights (a single
1.0) return exactly that frame’s value.Invalid weights (wrong length, an element outside
[0, 1], a non-finite element, or a sum that differs from1by more thanetol) raise anSSException.
What can be reweighted
Any method that collapses the trajectory to an ensemble value accepts
weights. This includes the global dimensions
(get_radius_of_gyration, get_hydrodynamic_radius,
get_asphericity, get_end_to_end_distance,
get_gyration_tensor), the per-frame getters’ ensemble means, the
distance/contact maps, get_Q, the polymer-scaling observables
(get_internal_scaling, get_internal_scaling_RMS,
get_scaling_exponent, get_local_to_global_correlation), the
SASA summaries (get_all_SASA, get_site_accessibility,
get_regional_SASA), get_angle_decay, get_local_collapse,
get_end_to_end_vs_rg_correlation, the dihedral mutual information,
and the SSTrajectory overall/inter-chain observables.
A small number of routines describe a distribution or a pairwise
frame-vs-frame quantity rather than a single ensemble average — for
example get_internal_scaling(mean_vals=False) and
get_local_heterogeneity. A single per-frame probability vector is
not well-defined for those, so they raise an SSException if
weights is supplied (instead of silently returning a questionable
number).
API reference
- soursop.ssutils.validate_weights(weights, n_frames, stride=1, etol=1e-07)[source]
Validate (and stride-normalise) a per-frame re-weighting vector.
This is the single, shared entry point used by every SOURSOP function that accepts a
weightskeyword. It enforces that the weights form a proper per-frame probability vector so that all downstream deterministic weighted averages (weighted_mean(),weighted_std(),weighted_rms(),weighted_corr()) are well defined and consistent.The literal
False(orNone) is the “no weighting” sentinel and is passed straight through, so a default ofweights=Falseis a strict no-op.Validation (each failure raises
SSException):False/None-> returned unchanged (no-op).castable to a 1-D
numpy.float64array.all elements finite (no
nan/inf).exactly one weight per frame (
len == n_frames).every element in the closed interval
[0, 1].if
stride > 1: the vector is subsampled (weights[::stride]) and renormalised to sum to 1 so the strided weighted average is still a proper expectation (this fixes the historical behaviour where strided weights silently no longer summed to 1).|sum(weights) - 1| < etol(checked after any stride/renormalise).
- Parameters:
weights (array_like, False or None) – Per-frame weights, or the
False/Noneno-op sentinel.n_frames (int) – Number of frames the (unstrided) weight vector must match.
stride (int, optional) – Frame stride that will be applied to the trajectory. Default 1.
etol (float, optional) – Tolerance on
|sum(weights) - 1|. Default1e-7.
- Returns:
Falseif the input wasFalse/None; otherwise anumpy.float64array of lengthceil(n_frames / stride)that sums to 1 withinetol.- Return type:
numpy.ndarray or False
- Raises:
SSException – If any of the validation conditions above fail.
Example
>>> import numpy as np >>> from soursop.ssutils import validate_weights >>> validate_weights(False, 10) is False True >>> w = validate_weights(np.full(10, 0.1), 10) >>> float(w.sum()) 1.0
- soursop.ssutils.weighted_mean(x, weights, axis=0)[source]
Deterministic frame-weighted mean (
sum(w * x)alongaxis).Thin wrapper around
numpy.average(). Becauseweightsis a validated probability vector (sum == 1) this is exactly the re-weighted ensemble expectation and is numerically identical to thenumpy.average(x, axis, weights=weights)calls already used elsewhere in SOURSOP.- Parameters:
x (array_like) – Per-frame data; the frame axis is
axis.weights (numpy.ndarray) – Validated per-frame weights (see
validate_weights()).axis (int, optional) – Axis to average over (the frame axis). Default 0.
- Returns:
xaveraged overaxiswith the given weights.- Return type:
numpy.ndarray or float
Example
>>> import numpy as np >>> weighted_mean(np.array([1.0, 3.0]), np.array([0.25, 0.75])) 2.5
- soursop.ssutils.weighted_rms(x, weights, axis=0)[source]
Deterministic frame-weighted root-mean-square (
sqrt(sum(w*x^2))).The polymer-physics order parameter used by the internal-scaling and scaling-exponent routines is an RMS distance, so this is the weighted analogue of
sqrt(mean(x**2)).- Parameters:
x (array_like) – Per-frame data; the frame axis is
axis.weights (numpy.ndarray) – Validated per-frame weights (see
validate_weights()).axis (int, optional) – Axis to reduce over. Default 0.
- Returns:
The weighted RMS of
xoveraxis.- Return type:
numpy.ndarray or float
Example
>>> import numpy as np >>> float(weighted_rms(np.array([3.0, 4.0]), np.array([0.5, 0.5]))) 3.5355339059327378
- soursop.ssutils.weighted_std(x, weights, axis=0)[source]
Deterministic frame-weighted (population) standard deviation.
Uses the reliability-weighted population estimator
sqrt(sum(w * (x - mean)**2)). Frame weights here are probability weights with no associated sample size, so there is no well-definedddof(Bessel-style) correction; the population estimator is the unambiguous, reproducible choice and the de-facto standard for re-weighted molecular-dynamics ensembles.- Parameters:
x (array_like) – Per-frame data; the frame axis is
axis.weights (numpy.ndarray) – Validated per-frame weights (see
validate_weights()).axis (int, optional) – Axis to reduce over. Default 0.
- Returns:
The weighted population standard deviation of
xoveraxis.- Return type:
numpy.ndarray or float
Example
>>> import numpy as np >>> float(weighted_std(np.array([1.0, 1.0]), np.array([0.5, 0.5]))) 0.0
- soursop.ssutils.weighted_corr(a, b, weights)[source]
Deterministic frame-weighted Pearson correlation between two vectors.
Computed from the weighted covariance matrix (
numpy.cov(..., ddof=0, aweights=weights)), matching theddof=0/aweightsconvention already used byget_local_to_global_correlation.- Parameters:
a (array_like) – Equal-length per-frame vectors.
b (array_like) – Equal-length per-frame vectors.
weights (numpy.ndarray) – Validated per-frame weights (see
validate_weights()).
- Returns:
The weighted Pearson correlation coefficient of
aandb.- Return type:
float
Example
>>> import numpy as np >>> w = np.full(4, 0.25) >>> round(float(weighted_corr(np.array([1.,2,3,4]), np.array([2.,4,6,8]), w)), 6) 1.0