Ensemble reweighting (frame weights)

Overview

By default every analysis routine in SOURSOP treats all frames of a trajectory as contributing equally to an ensemble average. In many situations — enhanced-sampling simulations, Markov-state-model reweighting, maximum-entropy / experimentally-restrained reweighting, or simple importance reweighting — each frame should instead contribute according to a statistical weight.

Every SOURSOP function that returns an ensemble-average value therefore accepts an optional weights keyword: a per-frame vector that defines how much each frame contributes to the average. The default, weights=False, is an exact no-op and reproduces the original unweighted behaviour bit-for-bit.

Reweighting in SOURSOP is deterministic. A weighted average is the closed-form expectation under the supplied weights — there is no stochastic resampling, so results are exactly reproducible and do not depend on a random seed.

The weight vector contract

A valid weights vector is a probability vector over frames:

  • one entry per frame (len(weights) == n_frames);

  • every entry in the closed interval [0, 1];

  • all entries finite (no nan / inf);

  • the entries sum to 1 within a small tolerance etol (default 1e-7).

These conditions are enforced centrally by soursop.ssutils.validate_weights(). If any condition fails an SSException is raised with a descriptive message, rather than silently producing a meaningless number.

Most reweighting-capable methods also expose an etol keyword so the sum-to-one tolerance can be tightened or relaxed.

Interaction with stride

When a method is called with both a frame stride and weights, the weight vector is first subsampled (weights[::stride]) and then renormalised so it still sums to 1 over the retained frames. A warning is emitted because per-stride reweighting is rarely what you want unless the weights were computed for exactly those frames.

The deterministic helpers

All weighting is funnelled through a small set of shared helpers in soursop.ssutils so that every method applies weights identically:

  • validate_weights() — validate (and stride-normalise) a weight vector; the single source of truth.

  • weighted_mean()sum(w * x).

  • weighted_rms()sqrt(sum(w * x**2)) (the polymer-physics RMS order parameter).

  • weighted_std() — the reliability-weighted population standard deviation. Frame weights are probability weights with no associated sample size, so no ddof correction is well-defined; the population estimator is the unambiguous, reproducible choice.

  • weighted_corr() — the weighted Pearson correlation (used by the correlation observables).

Example

import numpy as np
from soursop.sstrajectory import SSTrajectory

TrajOb  = SSTrajectory('traj.xtc', 'start.pdb')
protein = TrajOb.proteinTrajectoryList[0]

n = protein.n_frames

# 1. default: every frame contributes equally
rg_per_frame = protein.get_radius_of_gyration()        # array, length n

# 2. uniform weights == the ordinary mean of the per-frame values
w_uniform = np.full(n, 1.0 / n)
rg_mean   = protein.get_radius_of_gyration(weights=w_uniform)
assert np.isclose(rg_mean, rg_per_frame.mean())

# 3. a one-hot weight isolates a single frame
w_one      = np.zeros(n); w_one[10] = 1.0
rg_frame10 = protein.get_radius_of_gyration(weights=w_one)
assert np.isclose(rg_frame10, rg_per_frame[10])

# 4. a real reweighting vector (e.g. from MSM / MaxEnt)
w = my_reweighting_vector            # shape (n,), >= 0, sums to 1
dmap_w, _ = protein.get_distance_map(weights=w)
nu, A0    = protein.get_scaling_exponent(weights=w)[:2]

Behaviour at the extremes

  • Uniform weights (1/n everywhere) reproduce the ordinary (unweighted) ensemble mean to within floating-point precision.

  • One-hot weights (a single 1.0) return exactly that frame’s value.

  • Invalid weights (wrong length, an element outside [0, 1], a non-finite element, or a sum that differs from 1 by more than etol) raise an SSException.

What can be reweighted

Any method that collapses the trajectory to an ensemble value accepts weights. This includes the global dimensions (get_radius_of_gyration, get_hydrodynamic_radius, get_asphericity, get_end_to_end_distance, get_gyration_tensor), the per-frame getters’ ensemble means, the distance/contact maps, get_Q, the polymer-scaling observables (get_internal_scaling, get_internal_scaling_RMS, get_scaling_exponent, get_local_to_global_correlation), the SASA summaries (get_all_SASA, get_site_accessibility, get_regional_SASA), get_angle_decay, get_local_collapse, get_end_to_end_vs_rg_correlation, the dihedral mutual information, and the SSTrajectory overall/inter-chain observables.

A small number of routines describe a distribution or a pairwise frame-vs-frame quantity rather than a single ensemble average — for example get_internal_scaling(mean_vals=False) and get_local_heterogeneity. A single per-frame probability vector is not well-defined for those, so they raise an SSException if weights is supplied (instead of silently returning a questionable number).

API reference

soursop.ssutils.validate_weights(weights, n_frames, stride=1, etol=1e-07)[source]

Validate (and stride-normalise) a per-frame re-weighting vector.

This is the single, shared entry point used by every SOURSOP function that accepts a weights keyword. It enforces that the weights form a proper per-frame probability vector so that all downstream deterministic weighted averages (weighted_mean(), weighted_std(), weighted_rms(), weighted_corr()) are well defined and consistent.

The literal False (or None) is the “no weighting” sentinel and is passed straight through, so a default of weights=False is a strict no-op.

Validation (each failure raises SSException):

  1. False / None -> returned unchanged (no-op).

  2. castable to a 1-D numpy.float64 array.

  3. all elements finite (no nan / inf).

  4. exactly one weight per frame (len == n_frames).

  5. every element in the closed interval [0, 1].

  6. if stride > 1: the vector is subsampled (weights[::stride]) and renormalised to sum to 1 so the strided weighted average is still a proper expectation (this fixes the historical behaviour where strided weights silently no longer summed to 1).

  7. |sum(weights) - 1| < etol (checked after any stride/renormalise).

Parameters:
  • weights (array_like, False or None) – Per-frame weights, or the False/None no-op sentinel.

  • n_frames (int) – Number of frames the (unstrided) weight vector must match.

  • stride (int, optional) – Frame stride that will be applied to the trajectory. Default 1.

  • etol (float, optional) – Tolerance on |sum(weights) - 1|. Default 1e-7.

Returns:

False if the input was False/None; otherwise a numpy.float64 array of length ceil(n_frames / stride) that sums to 1 within etol.

Return type:

numpy.ndarray or False

Raises:

SSException – If any of the validation conditions above fail.

Example

>>> import numpy as np
>>> from soursop.ssutils import validate_weights
>>> validate_weights(False, 10) is False
True
>>> w = validate_weights(np.full(10, 0.1), 10)
>>> float(w.sum())
1.0
soursop.ssutils.weighted_mean(x, weights, axis=0)[source]

Deterministic frame-weighted mean (sum(w * x) along axis).

Thin wrapper around numpy.average(). Because weights is a validated probability vector (sum == 1) this is exactly the re-weighted ensemble expectation and is numerically identical to the numpy.average(x, axis, weights=weights) calls already used elsewhere in SOURSOP.

Parameters:
  • x (array_like) – Per-frame data; the frame axis is axis.

  • weights (numpy.ndarray) – Validated per-frame weights (see validate_weights()).

  • axis (int, optional) – Axis to average over (the frame axis). Default 0.

Returns:

x averaged over axis with the given weights.

Return type:

numpy.ndarray or float

Example

>>> import numpy as np
>>> weighted_mean(np.array([1.0, 3.0]), np.array([0.25, 0.75]))
2.5
soursop.ssutils.weighted_rms(x, weights, axis=0)[source]

Deterministic frame-weighted root-mean-square (sqrt(sum(w*x^2))).

The polymer-physics order parameter used by the internal-scaling and scaling-exponent routines is an RMS distance, so this is the weighted analogue of sqrt(mean(x**2)).

Parameters:
  • x (array_like) – Per-frame data; the frame axis is axis.

  • weights (numpy.ndarray) – Validated per-frame weights (see validate_weights()).

  • axis (int, optional) – Axis to reduce over. Default 0.

Returns:

The weighted RMS of x over axis.

Return type:

numpy.ndarray or float

Example

>>> import numpy as np
>>> float(weighted_rms(np.array([3.0, 4.0]), np.array([0.5, 0.5])))
3.5355339059327378
soursop.ssutils.weighted_std(x, weights, axis=0)[source]

Deterministic frame-weighted (population) standard deviation.

Uses the reliability-weighted population estimator sqrt(sum(w * (x - mean)**2)). Frame weights here are probability weights with no associated sample size, so there is no well-defined ddof (Bessel-style) correction; the population estimator is the unambiguous, reproducible choice and the de-facto standard for re-weighted molecular-dynamics ensembles.

Parameters:
  • x (array_like) – Per-frame data; the frame axis is axis.

  • weights (numpy.ndarray) – Validated per-frame weights (see validate_weights()).

  • axis (int, optional) – Axis to reduce over. Default 0.

Returns:

The weighted population standard deviation of x over axis.

Return type:

numpy.ndarray or float

Example

>>> import numpy as np
>>> float(weighted_std(np.array([1.0, 1.0]), np.array([0.5, 0.5])))
0.0
soursop.ssutils.weighted_corr(a, b, weights)[source]

Deterministic frame-weighted Pearson correlation between two vectors.

Computed from the weighted covariance matrix (numpy.cov(..., ddof=0, aweights=weights)), matching the ddof=0 / aweights convention already used by get_local_to_global_correlation.

Parameters:
  • a (array_like) – Equal-length per-frame vectors.

  • b (array_like) – Equal-length per-frame vectors.

  • weights (numpy.ndarray) – Validated per-frame weights (see validate_weights()).

Returns:

The weighted Pearson correlation coefficient of a and b.

Return type:

float

Example

>>> import numpy as np
>>> w = np.full(4, 0.25)
>>> round(float(weighted_corr(np.array([1.,2,3,4]), np.array([2.,4,6,8]), w)), 6)
1.0