sscoper
Overview
sscoper provides COPER (Convex Optimization for Ensemble Reweighting; Leung et al., 2016) and its iterative variant iCOPER for reweighting conformational ensembles against experimental observables. Like ssbme, it computes a new set of frame weights that brings the ensemble averages into agreement with experiment while perturbing the prior ensemble as little as possible, and the resulting weight vector plugs straight into the consistent SOURSOP reweighting system — every ensemble-average observable accepts a weights= argument (see Ensemble reweighting (frame weights)).
COPER and BME answer the same question but pose it differently. BME minimises a penalty (½χ² + θ·D_KL) with a tunable regularisation \(\theta\); COPER instead solves a hard-constrained problem — maximise the ensemble entropy subject to \(\chi^2 \le 1\) — with no free parameter beyond the χ² limit itself. See COPER vs BME below.
The module is self-contained (NumPy + SciPy only). Its user-facing API is deliberately analogous to ssbme, so the same code patterns work for both:
ExperimentalObservable— a container for one experimental data point (value, uncertainty, constraint type, optional data-typegroup). This is the same class used by BME (shared viasoursop.ssutils).COPER— standard COPER reweighting.iCOPER— iterative COPER, which additionally fits an unknown global scale and offset between calculated and experimental data.
Both classes return a COPERResult and share a chi2_limit_scan() helper (the COPER analogue of BME’s theta_scan).
The reweighting problem (COPER)
A simulation produces an ensemble of \(N\) conformations with prior weights \(w_i^{0}\) (uniform unless you supply enhanced-sampling weights). For each conformation we compute \(M\) observables; the ensemble average of observable \(k\) is \(\langle F_k \rangle = \sum_i w_i F_k(x_i)\), and the disagreement with experiment is measured by the reduced chi-squared
COPER seeks the weights that maximise the entropy (stay as close as possible to the prior) subject to fitting the data within error:
(SOURSOP generalises \(S\) to the relative entropy \(-D_{\mathrm{KL}}(w\|w^0)\) so a non-uniform prior is supported; for the usual uniform prior the two differ only by the constant \(\ln N\).) Because the entropy is concave and the χ² and simplex constraints are convex, this is a convex optimisation problem with a unique global solution, solved here as a primal problem over the \(N\) weights with SciPy’s trust-constr interior-point method, following the two-step recipe of the paper:
χ² minimisation / feasibility. Minimise \(\chi^2(w)\) over the simplex. If the minimum satisfies \(\chi^2 \le 1\), that point is a feasible interior point for step 2; if not, the problem has no solution — no reweighting of the prior ensemble can reproduce the data, a diagnostic in its own right.
Entropy maximisation. Maximise the entropy subject to \(\chi^2 \le 1\). SOURSOP starts this step from the uniform prior — the unconstrained entropy maximum — and tightens toward the constraint, which is better conditioned than starting from the (low-entropy) step-1 point; the problem is convex, so the optimum is the same either way.
The χ² limit need not be exactly 1: scaling it is equivalent to scaling the experimental uncertainties (the paper varies it over ~0.25–4). Tightening it fits the data harder at the cost of ensemble diversity.
Two derived quantities are central to interpreting a COPER fit. The entropy reduction
measures the information content of the experimental data relative to the prior: a large \(|\Delta S|\) means the data force a big change. It equals (in magnitude) the mean free-energy change \(\langle\Delta G\rangle / kT\) needed to reconcile model and experiment. The per-frame reweighting factor \(r_i = w_i / w_i^{0}\) (with \(\Delta G_i = -kT\ln r_i\)) shows which conformations are up- or down-weighted. As in BME, the fraction of effective frames \(\phi = \exp(-D_{\mathrm{KL}}) \in (0,1]\) summarises how much diversity survived.
How COPER solves this (and why it scales)
In plain terms. Think of the weights as a recipe for mixing the \(N\) conformers. Maximising entropy means keeping the recipe as even as possible (closest to the prior); the χ² constraint is a wall you may not cross (the fit must be at least this good). COPER simply walks “downhill in unevenness” until it just touches that wall — and because the landscape is bowl-shaped (convex), there is one and only one place it can come to rest.
A computer hits two practical snags doing this, and COPER removes both:
The recipe must stay valid. Every weight has to be non-negative and the whole set must sum to one — \(N+1\) fiddly side-conditions that bog the optimiser down. COPER makes them disappear with the softmax trick: instead of tuning the weights directly, it tunes unconstrained numbers \(z_i\) and defines \(w_i = e^{z_i}/\sum_j e^{z_j}\). Any choice of the \(z_i\) automatically gives positive weights that sum to one, so the optimiser is free to roam.
Taking confident steps needs the local curvature. To jump straight to the bottom of a bowl (rather than inching along), the solver wants the Hessian — the table of second derivatives describing how the landscape curves. Written out, that table is \(N\times N\): for 100,000 conformers it would hold \(10^{10}\) numbers (~80 GB) — hopeless to even store. COPER never builds it. The curvature here has a very simple structure that we know exactly: the entropy part is essentially diagonal (each conformer curves on its own), and the data part is “low rank” — it carries only as much complexity as there are data points \(M\), which is tiny next to \(N\). So instead of a giant table we keep a recipe for computing “curvature × direction” on the fly, costing about \(N\times M\) work and only \(O(N)\) memory.
The payoff is that COPER converges in a few dozen steps and fits a hundred-thousand-conformer ensemble in seconds-to-tens-of-seconds, returning the same maximum-entropy solution it always would.
In detail. With \(F_{ik} = F_k(x_i)\) the calculated value of observable \(k\) for frame \(i\), the simplex is removed by optimising over \(z\in\mathbb{R}^N\) with \(w = \mathrm{softmax}(z)\). For any scalar function \(g(w)\) the chain rule through the softmax (whose Jacobian is \(J = \mathrm{diag}(w) - w w^{\mathsf T}\)) gives the gradient
and the Hessian-vector product used by the trust-region solver is
where \(g_w=\nabla_w g\), \(H_w=\nabla_w^2 g\), \(u = J v = w\circ v - w\,(w\cdot v)\), and the scalars \(a = w\cdot g_w\), \(b = u\cdot g_w\), \(c = w\cdot(H_w u)\) (\(\circ\) is the elementwise product). Only the w-space pieces \(g_w\) and \(H_w u\) are problem-specific:
Entropy \(g = D_{\mathrm{KL}}(w\|w^0)\): \(g_w = \ln(w/w^0) + 1\) and \(H_w = \mathrm{diag}(1/w)\). The factor of \(w\) in \(u\) cancels the \(1/w\), so \(H_w u = v - (w\cdot v)\) — diagonal-cheap and free of any division by small weights.
Per-group χ² \(\chi^2_\alpha\) is quadratic in \(w\), so its w-space Hessian is the constant, rank-≤ :math:`M_alpha` matrix
\[H_w^{(\alpha)} = \frac{2}{M_\alpha}\sum_{k\in\alpha} \frac{f_k\, f_k^{\mathsf T}}{\sigma_k^2} = \frac{2}{M_\alpha}\,C_\alpha\,\mathrm{diag}(\sigma_k^{-2})\,C_\alpha^{\mathsf T},\]where \(f_k\) is the length-\(N\) column \(F_{\cdot k}\) and \(C_\alpha\) stacks the active columns. We never form this \(N\times N\) matrix: the product \(H_w^{(\alpha)} u = \tfrac{2}{M_\alpha} C_\alpha\big(\mathrm{diag}(\sigma_k^{-2})\,(C_\alpha^{\mathsf T}u)\big)\) is evaluated right-to-left in \(O(N M_\alpha)\). (For
upper/lowerconstraints only the currently violated observables are included, exactly as in the gradient.)
Both Hessians are handed to trust-constr as matrix-free
scipy.sparse.linalg.LinearOperators, so memory stays \(O(N)\) and the
solver uses its iterative trust-region subproblem solver — the reason COPER scales
to \(\sim 10^5\) conformers. None of this changes the optimum: it is the same
convex problem, just posed so it can be solved cheaply.
COPER
COPER performs the reweighting described above. You construct it with a list of observables and a (n_frames, n_observables) array of calculated values, then call fit():
from soursop.sscoper import COPER, ExperimentalObservable
import numpy as np
calc = np.column_stack([rg_per_frame, ree_per_frame]) # (n_frames, 2)
obs = [
ExperimentalObservable(value=23.0, uncertainty=1.0, name="Rg"),
ExperimentalObservable(value=60.0, uncertainty=2.0, name="Ree"),
]
coper = COPER(obs, calc)
result = coper.fit(chi2_limit=1.0)
if result.feasible:
weights = result.weights # plug into any SOURSOP observable
result.print_diagnostics()
Constraints ("equality" / "upper" / "lower") behave exactly as in ssbme: an upper / lower bound only penalises the disallowed side, so an already-satisfied bound leaves the ensemble essentially untouched.
Feasibility. Unlike BME, COPER can report that the data are infeasible: if the smallest achievable χ² already exceeds the limit, result.feasible is False, result.success is False, and result.weights hold the closest the prior ensemble can come (the χ²-minimiser). This is a genuine result — it says the data cannot be reproduced by reweighting alone (a sampling or force-field problem) — so always check result.feasible before using the weights.
Per-data-type χ². When you fit several kinds of data at once (e.g. RDCs and J-couplings, as in the paper), a single pooled χ² can let one data type dominate. Assign each observable a group label and COPER imposes a separate \(\chi^2_\alpha \le \text{limit}\) for every group:
obs = [
ExperimentalObservable(-5.1, 0.4, name="RDC1", group="RDC"),
ExperimentalObservable( 2.3, 0.4, name="RDC2", group="RDC"),
ExperimentalObservable( 6.0, 0.5, name="J1", group="Jcoupling"),
]
result = COPER(obs, calc).fit(chi2_limit=1.0)
Observables without a group are pooled into one default group.
iCOPER — iterative COPER
For data with an unknown global scale and/or offset (the canonical case being SAXS, where the calculated intensity curve has an arbitrary scale and a flat background), iCOPER alternates two steps until χ² stabilises, exactly as soursop.ssbme.iBME does for BME:
a (by default \(1/\sigma^2\)-weighted) linear regression of the ensemble-averaged calculated values against the experimental values, giving a scale \(\alpha\) and offset \(\beta\), after which
calc → α·calc + β; thena standard COPER step on the rescaled data.
The returned COPERResult reports the net scale and offset relative to the original input plus a per-iteration log (icoper_iterations):
from soursop.sscoper import iCOPER, ExperimentalObservable
obs = [ExperimentalObservable(I_q[k], sigma_q[k], name=f"q{k}")
for k in range(len(q))]
result = iCOPER(obs, calc_intensities).fit(chi2_limit=1.0, ftol=0.01)
print(result.scale, result.offset)
weights = result.weights
Set fit_offset=False for a scale-only fit, and lr_weights=False for an unweighted regression.
Choosing the χ² limit (the error-scaling scan)
The χ² limit plays the role BME’s \(\theta\) plays, and is chosen the same way — by scanning it and looking for the knee of the trade-off between fit quality (χ²) and ensemble diversity (relative entropy / \(\phi\)). chi2_limit_scan() (also COPER.scan_chi2_limit / iCOPER.scan_chi2_limit) sweeps the limit over the paper’s error-scaling range and selects the knee:
scan = coper.scan_chi2_limit(chi2_limits=(0.25, 4.0), n_points=10)
scan.print_summary()
print("recommended limit:", scan.optimal_chi2_limit)
result = coper.fit(chi2_limit=scan.optimal_chi2_limit)
Interpreting the result and common pitfalls
Every fit returns a COPERResult; result.print_diagnostics() produces a formatted report and result.diagnostics() returns the same as a dict with explicit warnings. Key things to check:
Feasibility first. If
result.feasibleisFalsethe data cannot be fit by any reweighting at this limit. Loosen the χ² limit / uncertainties, or — more likely — improve sampling or the force field. Reweighting cannot create conformations the ensemble never sampled.φ / ΔS is not extreme. A very low \(\phi\) (large \(|\Delta S|\)) means a handful of frames dominate the reweighted ensemble; the result is then effectively a few structures, not an ensemble. A large \(\langle\Delta G\rangle/kT\) (well beyond a few \(kT\)) signals a model that disagrees strongly with the data.
Two effective sample sizes. The diagnostics report both the entropy-based \(N_{\text{eff}} = N\phi\) and the Rényi-2 / participation ratio \(1/\sum_i w_i^2\), which is more sensitive to a few dominant weights.
Information content of data types. Because \(\Delta S\) is well defined, comparing the entropy reduction induced by different observable groups quantifies how much each data type actually constrains the ensemble (a central use in the paper).
Realistic uncertainties. \(\sigma_k\) must include experimental and forward-model error; scaling the χ² limit is exactly equivalent to scaling all \(\sigma_k\).
Don’t validate on the fitting data. χ² against the fitted observables only improves. Assess quality by predicting independent observables with
predict(), or by checking the reproducibility of \(\Delta S\) across independent subsets of the ensemble (a sampling-density test).
COPER vs BME
Both methods produce the least-biased (maximum-entropy) reweighting consistent with the data; they differ in formulation and in which parameter you tune:
BME (ssbme) |
COPER (this page) |
|
|---|---|---|
Formulation |
Penalty: minimise \(\tfrac12\chi^2 + \theta\,D_{\mathrm{KL}}\) |
Hard constraint: maximise \(S\) s.t. \(\chi^2 \le 1\) |
Free parameter |
\(\theta\) (regularisation) |
χ² limit (= error scaling) |
Solved over |
\(M\) Lagrange multipliers (the dual problem; the weights then follow in closed form, \(w_i \propto w_i^0 e^{-\sum_k \lambda_k F_k(x_i)}\)) |
\(N\) frame weights directly (the primal problem) |
Infeasibility |
not signalled |
reported explicitly |
Choosing the knob |
L-curve |
|
For most ensembles either method gives similar weights. The difference is computational: BME optimises the \(M\)-dimensional dual problem (one Lagrange multiplier per experimental observable) and recovers the weights afterwards, which is cheaper when there are many frames but few observables (\(M \ll N\)); COPER optimises the \(N\) weights directly, but its explicit χ² constraint and feasibility test make the “can these data be fit at all?” question direct.
Scaling note. The faithful primal optimisation here uses trust-constr over the \(N\) weights; it is convex and fast for ensembles up to ~103–104 frames. The original paper used IPOPT and reached ~105; for very large ensembles, cluster or subsample the trajectory before reweighting.
Key references
The implementation here is adapted from the COPER method; please cite the primary reference if you use it:
Leung, H. T. A., Bignucolo, O., Aregger, R., Dames, S. A., Mazur, A., Bernèche, S. & Grzesiek, S. A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content. J. Chem. Theory Comput. 12, 383–394 (2016). doi:10.1021/acs.jctc.5b00759.
Foundational and closely related work:
Kullback, S. & Leibler, R. A. On Information and Sufficiency. Ann. Math. Stat. 22, 79–86 (1951) — the relative entropy COPER maximises.
Bottaro, S., Bengtsen, T. & Lindorff-Larsen, K. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. Methods Mol. Biol. 2112, 219–240 (2020) — the BME / iBME method in ssbme; COPER differs by using a hard χ² constraint rather than a penalty.
Hummer, G. & Köfinger, J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 143, 243150 (2015).
Różycki, B., Kim, Y. C. & Hummer, G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure 19, 109–116 (2011) — EROS; the SAXS scale/offset problem that motivates iCOPER.
The ExperimentalObservable container is documented under
Ensemble reweighting (frame weights) (it is shared with BME).
- class soursop.sscoper.COPER(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
Convex Optimization for Ensemble Reweighting (COPER).
Reweights an ensemble by maximising entropy subject to a hard chi-squared constraint, solved as a primal convex optimisation over the N frame weights via SciPy’s
trust-constrinterior-point method. The two-step procedure of Leung et al. (2016):minimise the per-group chi-squared over the simplex to test feasibility, then
maximise the (relative) entropy subject to
chi^2_alpha <= chi2_limitfor every observable group, starting from the step-1 point.
- Parameters:
observables (list of ExperimentalObservable) – Experimental observables to fit against. Observables sharing the same
grouplabel contribute to a single chi-squared termchi^2_alpha; observables without a group are pooled into one default group.calculated_values (numpy.ndarray) – Per-frame calculated observable values, shape
(n_frames, n_observables).initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalised to sum to 1.
- Raises:
SSException – If inputs are malformed (see
soursop.ssutils.validate_reweighting_inputs()).
Notes
The simplex (
w_i >= 0,sum_i w_i = 1) is enforced implicitly via a softmax reparameterisation (w = softmax(z)), so the optimiser only has to handle the few chi-squared constraints rather thanNbound constraints plus a sum-to-one equality. Both the entropy objective and the chi-squared constraints supply exact analytic Hessians as matrix-freeLinearOperatorobjects (the entropy Hessian is effectively diagonal; each chi-squared Hessian is rank<= mvia the calculated-value matrix), so no denseN x Nmatrix is ever formed and memory staysO(N)rather thanO(N^2).trust-constrthen converges in tens of iterations and scales to ~10^5 frames (a binding 10^5-conformer fit takes seconds-to-tens- of-seconds; the equivalent dense Hessian would need tens of GB). The problem is convex with a unique global solution, so neither the reparameterisation nor the exact Hessians change the optimum.- __init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
- fit(chi2_limit: float = 1.0, max_iterations: int = 2000, optimizer: str = 'trust-constr', verbose: bool = True) COPERResult[source]
Fit COPER weights at a given chi-squared limit.
- Parameters:
chi2_limit (float, optional) – Hard upper bound on each per-group chi-squared. Default
1.0(the paper’schi^2 <= 1convention).max_iterations (int, optional) – Maximum optimizer iterations (each of the two steps). Default from the module.
optimizer (str, optional) – scipy.optimize.minimize method used for the constrained entropy-maximisation step (must support nonlinear constraints). Default
"trust-constr". The feasibility step is always solved with the unconstrained"L-BFGS-B".verbose (bool, optional) – Print progress. Default True.
- Return type:
- Raises:
SSException – If
chi2_limitis not positive.
Notes
The result is also stored on the instance (
self.result,self.chi2_limit). When the problem is infeasible the returned result hasfeasible=Falseandsuccess=False; itsweightsare the step-1 chi-squared minimiser (the closest the prior ensemble can come to satisfying the data).
- scan_chi2_limit(chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) COPERScanResult[source]
Run a chi-squared-limit scan using this instance’s data.
- class soursop.sscoper.iCOPER(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
Iterative COPER for data with an unknown global scale and offset.
At each iteration a (optionally inverse-variance-weighted) linear regression of the ensemble-averaged calculated values against the experimental values is used to refit a global scale
alphaand offsetbeta; the calculated values are rescaled (calc -> alpha * calc + beta) and a standardCOPERstep is run. The procedure repeats until the change in the final chi-squared drops belowftol. This is the COPER analogue ofsoursop.ssbme.iBME.- Parameters:
observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape
(n_frames, n_observables).initial_weights (numpy.ndarray, optional) – Prior frame weights. Uniform if None.
- Raises:
SSException – If inputs are malformed.
- __init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
- fit(chi2_limit: float = 1.0, ftol: float = 0.01, max_icoper_iterations: int = 50, fit_offset: bool = True, lr_weights: bool = True, max_iterations: int = 2000, optimizer: str = 'trust-constr', verbose: bool = True) COPERResult[source]
Run iterative COPER at a fixed chi-squared limit.
- Parameters:
chi2_limit (float, optional) – Hard upper bound on each per-group chi-squared (passed to the inner
COPERstep). Default 1.0.ftol (float, optional) – Convergence tolerance on
|delta chi-squared|between iCOPER iterations. Default 0.01.max_icoper_iterations (int, optional) – Maximum number of scale/offset + COPER iterations. Default 50.
fit_offset (bool, optional) – If True fit a scale and offset; if False fit scale only. Default True.
lr_weights (bool, optional) – If True weight the linear regression by
1/sigma^2; otherwise use uniform weights. Default True.max_iterations (int, optional) – Maximum inner-optimizer iterations per COPER step.
optimizer (str, optional) – scipy.optimize.minimize method for each inner COPER step.
verbose (bool, optional) – Print per-iteration progress. Default True.
- Returns:
Result with the final weights,
scale/offset(net, relative to the original input), theicoper_iterationslog,chi_squared_initialfrom the first iteration’s pre-fit chi-squared, andphicomputed against the original prior.- Return type:
- Raises:
SSException – If
chi2_limitis not positive.
- scan_chi2_limit(chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) COPERScanResult[source]
Run a chi-squared-limit scan using iCOPER.
- predict(calculated_values: ndarray) ndarray[source]
Weighted means of arbitrary observables using the fitted weights.
- Raises:
SSException – If
fit()has not succeeded, or frame count mismatches.
- class soursop.sscoper.COPERResult(weights: ~numpy.ndarray, initial_weights: ~numpy.ndarray, chi_squared_initial: float, chi_squared_min: float, chi_squared_final: float, chi2_limit: float, feasible: bool, entropy: float, entropy_initial: float, delta_S: float, mean_delta_G_kT: float, phi: float, n_iterations: int, success: bool, message: str, observables: ~typing.List[~soursop.ssutils.ExperimentalObservable], calculated_values: ~numpy.ndarray, metadata: dict, scale: float | None = None, offset: float | None = None, icoper_iterations: ~typing.List[dict] = <factory>)[source]
Container for a COPER / iCOPER reweighting result.
- weights
Optimized (posterior) frame weights, summing to 1.
- Type:
numpy.ndarray
- initial_weights
Prior frame weights.
- Type:
numpy.ndarray
- chi_squared_initial
Reduced chi-squared at the prior (max over groups when per-data-type constraints are used).
- Type:
float
- chi_squared_min
Reduced chi-squared at the end of the step-1 minimisation (max over groups). Determines feasibility:
feasibleis True iffchi_squared_min <= chi2_limit(up toFEASIBILITY_TOLERANCE).- Type:
float
- chi_squared_final
Reduced chi-squared after the step-2 entropy maximisation (max over groups). Equal to
chi_squared_minwhenfeasibleis False.- Type:
float
- chi2_limit
The chi-squared limit used (
chi^2_alpha <= chi2_limitper group).- Type:
float
- feasible
Whether the data can be satisfied by any reweighting at this limit.
- Type:
bool
- entropy
Shannon entropy
-sum w * ln wat the optimized weights.- Type:
float
- entropy_initial
Shannon entropy at the prior.
- Type:
float
- delta_S
Entropy change
S(w) - S(w0) <= 0(equal to-KL(w || w0)). Measures the information content of the experimental data relative to the prior ensemble.- Type:
float
- mean_delta_G_kT
Mean free-energy change in
kTunits,<delta_G>/kT = -delta_S(Leung et al. eq 10).- Type:
float
- phi
Fraction of effective frames,
exp(-KL(w || w0)) in (0, 1].- Type:
float
- n_iterations
Optimizer iterations of the step-2 (entropy) optimisation, or step-1 when infeasible.
- Type:
int
- success
Whether the optimisation succeeded.
- Type:
bool
- message
Optimizer / feasibility status message.
- Type:
str
- observables
Observables used in the fit.
- Type:
list of ExperimentalObservable
- calculated_values
Calculated values used in the fit (for iCOPER, the final scaled array).
- Type:
numpy.ndarray
- metadata
Free-form metadata (optimizer, timestamp, group structure, …).
- Type:
dict
- scale
Final iCOPER scale factor relative to the original input.
Nonefor plain COPER.- Type:
float, optional
- offset
Final iCOPER offset relative to the original input.
Nonefor plain COPER.- Type:
float, optional
- icoper_iterations
Per-iteration iCOPER log (empty for plain COPER). Each entry has
iteration,scale,offset,chi_squared,feasibleanddiff.- Type:
list of dict
- predict(calculated_values: ndarray) ndarray[source]
Weighted average of arbitrary observables using these weights.
- Parameters:
calculated_values (numpy.ndarray) – Per-frame observable values, shape
(n_frames, n_observables).n_framesmust match the fitted ensemble.- Returns:
Weighted means, shape
(n_observables,).- Return type:
numpy.ndarray
- Raises:
SSException – If the number of frames does not match the fitted ensemble.
- diagnostics(warn_threshold: float = 0.5) dict[source]
Diagnose the reweighting result and flag potential issues.
- Parameters:
warn_threshold (float, optional) – Minimum acceptable
phibefore a low-diversity warning is raised. Default 0.5.- Returns:
Diagnostic metrics and a list of human-readable
warnings.- Return type:
dict
Notes
Two effective sample sizes are reported: the entropy-based
neff_entropy(N * phi, the COPER<delta_G>/kTmeasure) and the Renyi-2 / participation rationeff_renyi2(1 / sum w^2), which is more sensitive to a few dominant weights.
- print_diagnostics(warn_threshold: float = 0.5)[source]
Print a formatted diagnostic report for this result.
- Parameters:
warn_threshold (float, optional) – Passed through to
diagnostics(). Default 0.5.
- class soursop.sscoper.COPERScanResult(chi2_limits: ndarray, chi_squared_values: ndarray, phi_values: ndarray, kl_divergence_values: ndarray, feasible_mask: ndarray, results: List[COPERResult], optimal_chi2_limit: float, optimal_idx: int, method: str)[source]
Container for a chi-squared-limit scan.
- chi2_limits
Scanned chi-squared-limit grid.
- Type:
numpy.ndarray
- chi_squared_values
Final chi-squared (
chi_squared_final) per limit.- Type:
numpy.ndarray
- phi_values
Effective-fraction (phi) per limit.
- Type:
numpy.ndarray
- kl_divergence_values
Relative entropy per limit.
- Type:
numpy.ndarray
- feasible_mask
feasibleflag per limit.- Type:
numpy.ndarray of bool
- results
Per-limit fit results.
- Type:
list of COPERResult
- optimal_chi2_limit
Selected chi-squared limit (L-curve knee).
- Type:
float
- optimal_idx
Index of
optimal_chi2_limitinchi2_limits.- Type:
int
- method
Human-readable knee-selection method used.
- Type:
str
- soursop.sscoper.chi2_limit_scan(observables: List[ExperimentalObservable], calculated_values: ndarray, reweighter: str = 'coper', chi2_limits: Tuple[float, float] | ndarray = (0.25, 4.0), n_points: int = 8, log_scale: bool = True, initial_weights: ndarray | None = None, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) COPERScanResult[source]
Scan a grid of chi-squared limits for COPER or iCOPER.
The L-curve analogue of
soursop.ssbme.theta_scan(): tightening the chi-squared limit (the paper’s error-scaling range of roughly 0.25-4) trades fit quality for ensemble diversity. The knee of the chi-squared / relative-entropy curve gives a principled choice.- Parameters:
observables (list of ExperimentalObservable) – Experimental observables.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape
(n_frames, n_observables).reweighter (str, optional) –
"coper"(default) or"icoper".chi2_limits (tuple or numpy.ndarray, optional) –
(min, max)for a generated grid ofn_points, or an explicit 1D array of limits. Default(0.25, 4.0)(the paper’s error- scaling range).n_points (int, optional) – Number of grid points when
chi2_limitsis a tuple. Default 8.log_scale (bool, optional) – Sample logarithmically when
chi2_limitsis a tuple. Default True.initial_weights (numpy.ndarray, optional) – Prior weights. Uniform if None.
method (str, optional) – Knee-selection method (see
soursop.ssutils.find_optimal_theta()).verbose (bool, optional) – Print per-limit progress. Default False.
fit_kwargs (dict, optional) – Extra keyword arguments forwarded to the reweighter’s
fit(e.g.ftol/max_icoper_iterationsfor iCOPER).
- Return type:
- Raises:
SSException – If
reweighteris unknown,n_points< 1, or a log-scale grid has non-positive endpoints.