sscoper

Overview

sscoper provides COPER (Convex Optimization for Ensemble Reweighting; Leung et al., 2016) and its iterative variant iCOPER for reweighting conformational ensembles against experimental observables. Like ssbme, it computes a new set of frame weights that brings the ensemble averages into agreement with experiment while perturbing the prior ensemble as little as possible, and the resulting weight vector plugs straight into the consistent SOURSOP reweighting system — every ensemble-average observable accepts a weights= argument (see Ensemble reweighting (frame weights)).

COPER and BME answer the same question but pose it differently. BME minimises a penalty (½χ² + θ·D_KL) with a tunable regularisation \(\theta\); COPER instead solves a hard-constrained problem — maximise the ensemble entropy subject to \(\chi^2 \le 1\) — with no free parameter beyond the χ² limit itself. See COPER vs BME below.

The module is self-contained (NumPy + SciPy only). Its user-facing API is deliberately analogous to ssbme, so the same code patterns work for both:

ExperimentalObservable — a container for one experimental data point (value, uncertainty, constraint type, optional data-type group). This is the same class used by BME (shared via soursop.ssutils).
COPER — standard COPER reweighting.
iCOPER — iterative COPER, which additionally fits an unknown global scale and offset between calculated and experimental data.

Both classes return a COPERResult and share a chi2_limit_scan() helper (the COPER analogue of BME’s theta_scan).

The reweighting problem (COPER)

A simulation produces an ensemble of \(N\) conformations with prior weights \(w_i^{0}\) (uniform unless you supply enhanced-sampling weights). For each conformation we compute \(M\) observables; the ensemble average of observable \(k\) is \(\langle F_k \rangle = \sum_i w_i F_k(x_i)\), and the disagreement with experiment is measured by the reduced chi-squared

\[\chi^2(w) = \frac{1}{M}\sum_{k=1}^{M} \left(\frac{\langle F_k\rangle - F_k^{\text{exp}}}{\sigma_k}\right)^2 .\]

COPER seeks the weights that maximise the entropy (stay as close as possible to the prior) subject to fitting the data within error:

\[\max_{w}\; S(w) = -\sum_i w_i \ln w_i \qquad\text{subject to}\qquad \chi^2(w) \le 1,\;\; w_i \ge 0,\;\; \textstyle\sum_i w_i = 1 .\]

(SOURSOP generalises \(S\) to the relative entropy \(-D_{\mathrm{KL}}(w\|w^0)\) so a non-uniform prior is supported; for the usual uniform prior the two differ only by the constant \(\ln N\).) Because the entropy is concave and the χ² and simplex constraints are convex, this is a convex optimisation problem with a unique global solution, solved here as a primal problem over the \(N\) weights with SciPy’s trust-constr interior-point method, following the two-step recipe of the paper:

χ² minimisation / feasibility. Minimise \(\chi^2(w)\) over the simplex. If the minimum satisfies \(\chi^2 \le 1\), that point is a feasible interior point for step 2; if not, the problem has no solution — no reweighting of the prior ensemble can reproduce the data, a diagnostic in its own right.
Entropy maximisation. Maximise the entropy subject to \(\chi^2 \le 1\). SOURSOP starts this step from the uniform prior — the unconstrained entropy maximum — and tightens toward the constraint, which is better conditioned than starting from the (low-entropy) step-1 point; the problem is convex, so the optimum is the same either way.

The χ² limit need not be exactly 1: scaling it is equivalent to scaling the experimental uncertainties (the paper varies it over ~0.25–4). Tightening it fits the data harder at the cost of ensemble diversity.

Two derived quantities are central to interpreting a COPER fit. The entropy reduction

\[\Delta S = S(w) - S(w^0) = -D_{\mathrm{KL}}(w\|w^0) \le 0\]

measures the information content of the experimental data relative to the prior: a large \(|\Delta S|\) means the data force a big change. It equals (in magnitude) the mean free-energy change \(\langle\Delta G\rangle / kT\) needed to reconcile model and experiment. The per-frame reweighting factor \(r_i = w_i / w_i^{0}\) (with \(\Delta G_i = -kT\ln r_i\)) shows which conformations are up- or down-weighted. As in BME, the fraction of effective frames \(\phi = \exp(-D_{\mathrm{KL}}) \in (0,1]\) summarises how much diversity survived.

How COPER solves this (and why it scales)

In plain terms. Think of the weights as a recipe for mixing the \(N\) conformers. Maximising entropy means keeping the recipe as even as possible (closest to the prior); the χ² constraint is a wall you may not cross (the fit must be at least this good). COPER simply walks “downhill in unevenness” until it just touches that wall — and because the landscape is bowl-shaped (convex), there is one and only one place it can come to rest.

A computer hits two practical snags doing this, and COPER removes both:

The recipe must stay valid. Every weight has to be non-negative and the whole set must sum to one — \(N+1\) fiddly side-conditions that bog the optimiser down. COPER makes them disappear with the softmax trick: instead of tuning the weights directly, it tunes unconstrained numbers \(z_i\) and defines \(w_i = e^{z_i}/\sum_j e^{z_j}\). Any choice of the \(z_i\) automatically gives positive weights that sum to one, so the optimiser is free to roam.
Taking confident steps needs the local curvature. To jump straight to the bottom of a bowl (rather than inching along), the solver wants the Hessian — the table of second derivatives describing how the landscape curves. Written out, that table is \(N\times N\): for 100,000 conformers it would hold \(10^{10}\) numbers (~80 GB) — hopeless to even store. COPER never builds it. The curvature here has a very simple structure that we know exactly: the entropy part is essentially diagonal (each conformer curves on its own), and the data part is “low rank” — it carries only as much complexity as there are data points \(M\), which is tiny next to \(N\). So instead of a giant table we keep a recipe for computing “curvature × direction” on the fly, costing about \(N\times M\) work and only \(O(N)\) memory.

The payoff is that COPER converges in a few dozen steps and fits a hundred-thousand-conformer ensemble in seconds-to-tens-of-seconds, returning the same maximum-entropy solution it always would.

In detail. With \(F_{ik} = F_k(x_i)\) the calculated value of observable \(k\) for frame \(i\), the simplex is removed by optimising over \(z\in\mathbb{R}^N\) with \(w = \mathrm{softmax}(z)\). For any scalar function \(g(w)\) the chain rule through the softmax (whose Jacobian is \(J = \mathrm{diag}(w) - w w^{\mathsf T}\)) gives the gradient

\[(\nabla_z g)_i = w_i\big[(\nabla_w g)_i - w\cdot\nabla_w g\big],\]

and the Hessian-vector product used by the trust-region solver is

\[H_z\,v = u\circ(g_w - a) \;+\; w\circ(H_w u) \;-\; (b + c)\,w,\]

where \(g_w=\nabla_w g\), \(H_w=\nabla_w^2 g\), \(u = J v = w\circ v - w\,(w\cdot v)\), and the scalars \(a = w\cdot g_w\), \(b = u\cdot g_w\), \(c = w\cdot(H_w u)\) (\(\circ\) is the elementwise product). Only the w-space pieces \(g_w\) and \(H_w u\) are problem-specific:

Entropy \(g = D_{\mathrm{KL}}(w\|w^0)\): \(g_w = \ln(w/w^0) + 1\) and \(H_w = \mathrm{diag}(1/w)\). The factor of \(w\) in \(u\) cancels the \(1/w\), so \(H_w u = v - (w\cdot v)\) — diagonal-cheap and free of any division by small weights.
Per-group χ² \(\chi^2_\alpha\) is quadratic in \(w\), so its w-space Hessian is the constant, rank-≤ :math:`M_alpha` matrix

\[H_w^{(\alpha)} = \frac{2}{M_\alpha}\sum_{k\in\alpha} \frac{f_k\, f_k^{\mathsf T}}{\sigma_k^2} = \frac{2}{M_\alpha}\,C_\alpha\,\mathrm{diag}(\sigma_k^{-2})\,C_\alpha^{\mathsf T},\]

where \(f_k\) is the length-\(N\) column \(F_{\cdot k}\) and \(C_\alpha\) stacks the active columns. We never form this \(N\times N\) matrix: the product \(H_w^{(\alpha)} u = \tfrac{2}{M_\alpha} C_\alpha\big(\mathrm{diag}(\sigma_k^{-2})\,(C_\alpha^{\mathsf T}u)\big)\) is evaluated right-to-left in \(O(N M_\alpha)\). (For upper / lower constraints only the currently violated observables are included, exactly as in the gradient.)

Both Hessians are handed to trust-constr as matrix-free scipy.sparse.linalg.LinearOperators, so memory stays \(O(N)\) and the solver uses its iterative trust-region subproblem solver — the reason COPER scales to \(\sim 10^5\) conformers. None of this changes the optimum: it is the same convex problem, just posed so it can be solved cheaply.

COPER

COPER performs the reweighting described above. You construct it with a list of observables and a (n_frames, n_observables) array of calculated values, then call fit():

from soursop.sscoper import COPER, ExperimentalObservable
import numpy as np

calc = np.column_stack([rg_per_frame, ree_per_frame])   # (n_frames, 2)

obs = [
    ExperimentalObservable(value=23.0, uncertainty=1.0, name="Rg"),
    ExperimentalObservable(value=60.0, uncertainty=2.0, name="Ree"),
]

coper = COPER(obs, calc)
result = coper.fit(chi2_limit=1.0)

if result.feasible:
    weights = result.weights        # plug into any SOURSOP observable
result.print_diagnostics()

Constraints ("equality" / "upper" / "lower") behave exactly as in ssbme: an upper / lower bound only penalises the disallowed side, so an already-satisfied bound leaves the ensemble essentially untouched.

Feasibility. Unlike BME, COPER can report that the data are infeasible: if the smallest achievable χ² already exceeds the limit, result.feasible is False, result.success is False, and result.weights hold the closest the prior ensemble can come (the χ²-minimiser). This is a genuine result — it says the data cannot be reproduced by reweighting alone (a sampling or force-field problem) — so always check result.feasible before using the weights.

Per-data-type χ². When you fit several kinds of data at once (e.g. RDCs and J-couplings, as in the paper), a single pooled χ² can let one data type dominate. Assign each observable a group label and COPER imposes a separate \(\chi^2_\alpha \le \text{limit}\) for every group:

obs = [
    ExperimentalObservable(-5.1, 0.4, name="RDC1", group="RDC"),
    ExperimentalObservable( 2.3, 0.4, name="RDC2", group="RDC"),
    ExperimentalObservable( 6.0, 0.5, name="J1",   group="Jcoupling"),
]
result = COPER(obs, calc).fit(chi2_limit=1.0)

Observables without a group are pooled into one default group.

iCOPER — iterative COPER

For data with an unknown global scale and/or offset (the canonical case being SAXS, where the calculated intensity curve has an arbitrary scale and a flat background), iCOPER alternates two steps until χ² stabilises, exactly as soursop.ssbme.iBME does for BME:

a (by default \(1/\sigma^2\)-weighted) linear regression of the ensemble-averaged calculated values against the experimental values, giving a scale \(\alpha\) and offset \(\beta\), after which calc → α·calc + β; then
a standard COPER step on the rescaled data.

The returned COPERResult reports the net scale and offset relative to the original input plus a per-iteration log (icoper_iterations):

from soursop.sscoper import iCOPER, ExperimentalObservable

obs  = [ExperimentalObservable(I_q[k], sigma_q[k], name=f"q{k}")
        for k in range(len(q))]
result = iCOPER(obs, calc_intensities).fit(chi2_limit=1.0, ftol=0.01)
print(result.scale, result.offset)
weights = result.weights

Set fit_offset=False for a scale-only fit, and lr_weights=False for an unweighted regression.

Choosing the χ² limit (the error-scaling scan)

The χ² limit plays the role BME’s \(\theta\) plays, and is chosen the same way — by scanning it and looking for the knee of the trade-off between fit quality (χ²) and ensemble diversity (relative entropy / \(\phi\)). chi2_limit_scan() (also COPER.scan_chi2_limit / iCOPER.scan_chi2_limit) sweeps the limit over the paper’s error-scaling range and selects the knee:

scan = coper.scan_chi2_limit(chi2_limits=(0.25, 4.0), n_points=10)
scan.print_summary()
print("recommended limit:", scan.optimal_chi2_limit)

result = coper.fit(chi2_limit=scan.optimal_chi2_limit)

Interpreting the result and common pitfalls

Every fit returns a COPERResult; result.print_diagnostics() produces a formatted report and result.diagnostics() returns the same as a dict with explicit warnings. Key things to check:

Feasibility first. If result.feasible is False the data cannot be fit by any reweighting at this limit. Loosen the χ² limit / uncertainties, or — more likely — improve sampling or the force field. Reweighting cannot create conformations the ensemble never sampled.
φ / ΔS is not extreme. A very low \(\phi\) (large \(|\Delta S|\)) means a handful of frames dominate the reweighted ensemble; the result is then effectively a few structures, not an ensemble. A large \(\langle\Delta G\rangle/kT\) (well beyond a few \(kT\)) signals a model that disagrees strongly with the data.
Two effective sample sizes. The diagnostics report both the entropy-based \(N_{\text{eff}} = N\phi\) and the Rényi-2 / participation ratio \(1/\sum_i w_i^2\), which is more sensitive to a few dominant weights.
Information content of data types. Because \(\Delta S\) is well defined, comparing the entropy reduction induced by different observable groups quantifies how much each data type actually constrains the ensemble (a central use in the paper).
Realistic uncertainties. \(\sigma_k\) must include experimental and forward-model error; scaling the χ² limit is exactly equivalent to scaling all \(\sigma_k\).
Don’t validate on the fitting data. χ² against the fitted observables only improves. Assess quality by predicting independent observables with predict(), or by checking the reproducibility of \(\Delta S\) across independent subsets of the ensemble (a sampling-density test).

COPER vs BME

Both methods produce the least-biased (maximum-entropy) reweighting consistent with the data; they differ in formulation and in which parameter you tune:

	BME (ssbme)	COPER (this page)
Formulation	Penalty: minimise \(\tfrac12\chi^2 + \theta\,D_{\mathrm{KL}}\)	Hard constraint: maximise \(S\) s.t. \(\chi^2 \le 1\)
Free parameter	\(\theta\) (regularisation)	χ² limit (= error scaling)
Solved over	\(M\) Lagrange multipliers (the dual problem; the weights then follow in closed form, \(w_i \propto w_i^0 e^{-\sum_k \lambda_k F_k(x_i)}\))	\(N\) frame weights directly (the primal problem)
Infeasibility	not signalled	reported explicitly
Choosing the knob	L-curve `theta_scan`	`chi2_limit_scan`

For most ensembles either method gives similar weights. The difference is computational: BME optimises the \(M\)-dimensional dual problem (one Lagrange multiplier per experimental observable) and recovers the weights afterwards, which is cheaper when there are many frames but few observables (\(M \ll N\)); COPER optimises the \(N\) weights directly, but its explicit χ² constraint and feasibility test make the “can these data be fit at all?” question direct.

Scaling note. The faithful primal optimisation here uses trust-constr over the \(N\) weights; it is convex and fast for ensembles up to ~10³–10⁴ frames. The original paper used IPOPT and reached ~10⁵; for very large ensembles, cluster or subsample the trajectory before reweighting.

Key references

The implementation here is adapted from the COPER method; please cite the primary reference if you use it:

Leung, H. T. A., Bignucolo, O., Aregger, R., Dames, S. A., Mazur, A., Bernèche, S. & Grzesiek, S. A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content. J. Chem. Theory Comput. 12, 383–394 (2016). doi:10.1021/acs.jctc.5b00759.

Foundational and closely related work:

Kullback, S. & Leibler, R. A. On Information and Sufficiency. Ann. Math. Stat. 22, 79–86 (1951) — the relative entropy COPER maximises.
Bottaro, S., Bengtsen, T. & Lindorff-Larsen, K. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. Methods Mol. Biol. 2112, 219–240 (2020) — the BME / iBME method in ssbme; COPER differs by using a hard χ² constraint rather than a penalty.
Hummer, G. & Köfinger, J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 143, 243150 (2015).
Różycki, B., Kim, Y. C. & Hummer, G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure 19, 109–116 (2011) — EROS; the SAXS scale/offset problem that motivates iCOPER.

The ExperimentalObservable container is documented under Ensemble reweighting (frame weights) (it is shared with BME).

class soursop.sscoper.COPER(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

Convex Optimization for Ensemble Reweighting (COPER).

Reweights an ensemble by maximising entropy subject to a hard chi-squared constraint, solved as a primal convex optimisation over the N frame weights via SciPy’s trust-constr interior-point method. The two-step procedure of Leung et al. (2016):

minimise the per-group chi-squared over the simplex to test feasibility, then
maximise the (relative) entropy subject to chi^2_alpha <= chi2_limit for every observable group, starting from the step-1 point.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables to fit against. Observables sharing the same group label contribute to a single chi-squared term chi^2_alpha; observables without a group are pooled into one default group.
calculated_values (numpy.ndarray) – Per-frame calculated observable values, shape (n_frames, n_observables).
initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalised to sum to 1.

Raises:

SSException – If inputs are malformed (see soursop.ssutils.validate_reweighting_inputs()).

Notes

The simplex (w_i >= 0, sum_i w_i = 1) is enforced implicitly via a softmax reparameterisation (w = softmax(z)), so the optimiser only has to handle the few chi-squared constraints rather than N bound constraints plus a sum-to-one equality. Both the entropy objective and the chi-squared constraints supply exact analytic Hessians as matrix-free LinearOperator objects (the entropy Hessian is effectively diagonal; each chi-squared Hessian is rank <= m via the calculated-value matrix), so no dense N x N matrix is ever formed and memory stays O(N) rather than O(N^2). trust-constr then converges in tens of iterations and scales to ~10^5 frames (a binding 10^5-conformer fit takes seconds-to-tens- of-seconds; the equivalent dense Hessian would need tens of GB). The problem is convex with a unique global solution, so neither the reparameterisation nor the exact Hessians change the optimum.

__init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

fit(chi2_limit: float = 1.0, max_iterations: int = 2000, optimizer: str = 'trust-constr', verbose: bool = True) → COPERResult[source]

Fit COPER weights at a given chi-squared limit.

Parameters:

chi2_limit (float, optional) – Hard upper bound on each per-group chi-squared. Default 1.0 (the paper’s chi^2 <= 1 convention).
max_iterations (int, optional) – Maximum optimizer iterations (each of the two steps). Default from the module.
optimizer (str, optional) – scipy.optimize.minimize method used for the constrained entropy-maximisation step (must support nonlinear constraints). Default "trust-constr". The feasibility step is always solved with the unconstrained "L-BFGS-B".
verbose (bool, optional) – Print progress. Default True.

Return type:

COPERResult

Raises:

SSException – If chi2_limit is not positive.

Notes

The result is also stored on the instance (self.result, self.chi2_limit). When the problem is infeasible the returned result has feasible=False and success=False; its weights are the step-1 chi-squared minimiser (the closest the prior ensemble can come to satisfying the data).

scan_chi2_limit(chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) → COPERScanResult[source]: Run a chi-squared-limit scan using this instance’s data.

predict(calculated_values: ndarray) → ndarray[source]

Weighted means of arbitrary observables using the fitted weights.

Raises:: SSException – If fit() has not succeeded, or frame count mismatches.

class soursop.sscoper.iCOPER(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

Iterative COPER for data with an unknown global scale and offset.

At each iteration a (optionally inverse-variance-weighted) linear regression of the ensemble-averaged calculated values against the experimental values is used to refit a global scale alpha and offset beta; the calculated values are rescaled (calc -> alpha * calc + beta) and a standard COPER step is run. The procedure repeats until the change in the final chi-squared drops below ftol. This is the COPER analogue of soursop.ssbme.iBME.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape (n_frames, n_observables).
initial_weights (numpy.ndarray, optional) – Prior frame weights. Uniform if None.

Raises:

SSException – If inputs are malformed.

__init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

fit(chi2_limit: float = 1.0, ftol: float = 0.01, max_icoper_iterations: int = 50, fit_offset: bool = True, lr_weights: bool = True, max_iterations: int = 2000, optimizer: str = 'trust-constr', verbose: bool = True) → COPERResult[source]

Run iterative COPER at a fixed chi-squared limit.

Parameters:

chi2_limit (float, optional) – Hard upper bound on each per-group chi-squared (passed to the inner COPER step). Default 1.0.
ftol (float, optional) – Convergence tolerance on |delta chi-squared| between iCOPER iterations. Default 0.01.
max_icoper_iterations (int, optional) – Maximum number of scale/offset + COPER iterations. Default 50.
fit_offset (bool, optional) – If True fit a scale and offset; if False fit scale only. Default True.
lr_weights (bool, optional) – If True weight the linear regression by 1/sigma^2; otherwise use uniform weights. Default True.
max_iterations (int, optional) – Maximum inner-optimizer iterations per COPER step.
optimizer (str, optional) – scipy.optimize.minimize method for each inner COPER step.
verbose (bool, optional) – Print per-iteration progress. Default True.

Returns:

Result with the final weights, scale/offset (net, relative to the original input), the icoper_iterations log, chi_squared_initial from the first iteration’s pre-fit chi-squared, and phi computed against the original prior.

Return type:

COPERResult

Raises:

SSException – If chi2_limit is not positive.

scan_chi2_limit(chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) → COPERScanResult[source]: Run a chi-squared-limit scan using iCOPER.

predict(calculated_values: ndarray) → ndarray[source]

Weighted means of arbitrary observables using the fitted weights.

Raises:: SSException – If fit() has not succeeded, or frame count mismatches.

get_icoper_weights() → ndarray[source]

Final iCOPER weights.

Raises:: SSException – If fit() has not been called.

get_icoper_stats() → List[dict][source]

Per-iteration iCOPER log (iteration, scale, offset, chi_squared, feasible, diff).

Raises:: SSException – If fit() has not been called.

class soursop.sscoper.COPERResult(weights: ~numpy.ndarray, initial_weights: ~numpy.ndarray, chi_squared_initial: float, chi_squared_min: float, chi_squared_final: float, chi2_limit: float, feasible: bool, entropy: float, entropy_initial: float, delta_S: float, mean_delta_G_kT: float, phi: float, n_iterations: int, success: bool, message: str, observables: ~typing.List[~soursop.ssutils.ExperimentalObservable], calculated_values: ~numpy.ndarray, metadata: dict, scale: float | None = None, offset: float | None = None, icoper_iterations: ~typing.List[dict] = <factory>)[source]

Container for a COPER / iCOPER reweighting result.

weights

Optimized (posterior) frame weights, summing to 1.

Type:: numpy.ndarray

initial_weights

Prior frame weights.

Type:: numpy.ndarray

chi_squared_initial

Reduced chi-squared at the prior (max over groups when per-data-type constraints are used).

Type:: float

chi_squared_min

Reduced chi-squared at the end of the step-1 minimisation (max over groups). Determines feasibility: feasible is True iff chi_squared_min <= chi2_limit (up to FEASIBILITY_TOLERANCE).

Type:: float

chi_squared_final

Reduced chi-squared after the step-2 entropy maximisation (max over groups). Equal to chi_squared_min when feasible is False.

Type:: float

chi2_limit

The chi-squared limit used (chi^2_alpha <= chi2_limit per group).

Type:: float

feasible

Whether the data can be satisfied by any reweighting at this limit.

Type:: bool

entropy

Shannon entropy -sum w * ln w at the optimized weights.

Type:: float

entropy_initial

Shannon entropy at the prior.

Type:: float

delta_S

Entropy change S(w) - S(w0) <= 0 (equal to -KL(w || w0)). Measures the information content of the experimental data relative to the prior ensemble.

Type:: float

mean_delta_G_kT

Mean free-energy change in kT units, <delta_G>/kT = -delta_S (Leung et al. eq 10).

Type:: float

phi

Fraction of effective frames, exp(-KL(w || w0)) in (0, 1].

Type:: float

n_iterations

Optimizer iterations of the step-2 (entropy) optimisation, or step-1 when infeasible.

Type:: int

success

Whether the optimisation succeeded.

Type:: bool

message

Optimizer / feasibility status message.

Type:: str

observables

Observables used in the fit.

Type:: list of ExperimentalObservable

calculated_values

Calculated values used in the fit (for iCOPER, the final scaled array).

Type:: numpy.ndarray

metadata

Free-form metadata (optimizer, timestamp, group structure, …).

Type:: dict

scale

Final iCOPER scale factor relative to the original input. None for plain COPER.

Type:: float, optional

offset

Final iCOPER offset relative to the original input. None for plain COPER.

Type:: float, optional

icoper_iterations

Per-iteration iCOPER log (empty for plain COPER). Each entry has iteration, scale, offset, chi_squared, feasible and diff.

Type:: list of dict

predict(calculated_values: ndarray) → ndarray[source]

Weighted average of arbitrary observables using these weights.

Parameters:: calculated_values (numpy.ndarray) – Per-frame observable values, shape (n_frames, n_observables). n_frames must match the fitted ensemble.
Returns:: Weighted means, shape (n_observables,).
Return type:: numpy.ndarray
Raises:: SSException – If the number of frames does not match the fitted ensemble.

diagnostics(warn_threshold: float = 0.5) → dict[source]

Diagnose the reweighting result and flag potential issues.

Parameters:: warn_threshold (float, optional) – Minimum acceptable phi before a low-diversity warning is raised. Default 0.5.
Returns:: Diagnostic metrics and a list of human-readable warnings.
Return type:: dict

Notes

Two effective sample sizes are reported: the entropy-based neff_entropy (N * phi, the COPER <delta_G>/kT measure) and the Renyi-2 / participation ratio neff_renyi2 (1 / sum w^2), which is more sensitive to a few dominant weights.

print_diagnostics(warn_threshold: float = 0.5)[source]

Print a formatted diagnostic report for this result.

Parameters:: warn_threshold (float, optional) – Passed through to diagnostics(). Default 0.5.

class soursop.sscoper.COPERScanResult(chi2_limits: ndarray, chi_squared_values: ndarray, phi_values: ndarray, kl_divergence_values: ndarray, feasible_mask: ndarray, results: List[COPERResult], optimal_chi2_limit: float, optimal_idx: int, method: str)[source]

Container for a chi-squared-limit scan.

chi2_limits

Scanned chi-squared-limit grid.

Type:: numpy.ndarray

chi_squared_values

Final chi-squared (chi_squared_final) per limit.

Type:: numpy.ndarray

phi_values

Effective-fraction (phi) per limit.

Type:: numpy.ndarray

kl_divergence_values

Relative entropy per limit.

Type:: numpy.ndarray

feasible_mask

feasible flag per limit.

Type:: numpy.ndarray of bool

results

Per-limit fit results.

Type:: list of COPERResult

optimal_chi2_limit

Selected chi-squared limit (L-curve knee).

Type:: float

optimal_idx

Index of optimal_chi2_limit in chi2_limits.

Type:: int

method

Human-readable knee-selection method used.

Type:: str

print_summary()[source]: Print a formatted summary of the chi-squared-limit scan.

soursop.sscoper.chi2_limit_scan(observables: List[ExperimentalObservable], calculated_values: ndarray, reweighter: str = 'coper', chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, initial_weights: ndarray | None = None, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) → COPERScanResult[source]

Scan a grid of chi-squared limits for COPER or iCOPER.

The L-curve analogue of soursop.ssbme.theta_scan(): tightening the chi-squared limit (the paper’s error-scaling range of roughly 0.25-4) trades fit quality for ensemble diversity. The knee of the chi-squared / relative-entropy curve gives a principled choice.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape (n_frames, n_observables).
reweighter (str, optional) – "coper" (default) or "icoper".
chi2_limits (tuple or numpy.ndarray, optional) – (min, max) for a generated grid of n_points, or an explicit 1D array of limits. Default (0.25, 4.0) (the paper’s error- scaling range).
n_points (int, optional) – Number of grid points when chi2_limits is a tuple. Default 8.
log_scale (bool, optional) – Sample logarithmically when chi2_limits is a tuple. Default True.
initial_weights (numpy.ndarray, optional) – Prior weights. Uniform if None.
method (str, optional) – Knee-selection method (see soursop.ssutils.find_optimal_theta()).
verbose (bool, optional) – Print per-limit progress. Default False.
fit_kwargs (dict, optional) – Extra keyword arguments forwarded to the reweighter’s fit (e.g. ftol / max_icoper_iterations for iCOPER).

Return type:

COPERScanResult

Raises:

SSException – If reweighter is unknown, n_points < 1, or a log-scale grid has non-positive endpoints.