ssbme
Overview
ssbme provides Bayesian Maximum Entropy (BME) and iterative BME (iBME) reweighting of conformational ensembles against experimental observables. Given a set of per-frame calculated quantities and the corresponding experimental measurements, these methods compute a new set of frame weights that brings the ensemble averages into agreement with experiment while perturbing the prior ensemble as little as possible. The resulting weight vector plugs directly into the consistent SOURSOP reweighting system — every ensemble-average observable accepts a weights= argument (see Ensemble reweighting (frame weights)).
The module is self-contained (NumPy + SciPy only) and is adapted from the reference BME implementation of Bottaro, Bengtsen & Lindorff-Larsen with the clean array-based API used in STARLING.
It exposes three user-facing pieces:
ExperimentalObservable— a container for one experimental data point (value, uncertainty, constraint type). This class lives insoursop.ssutilsand is shared with the COPER reweighter (sscoper); it is re-exported fromsoursop.ssbmesofrom soursop.ssbme import ExperimentalObservablecontinues to work.BME— standard maximum-entropy reweighting.iBME— iterative BME, which additionally fits an unknown global scale and offset between calculated and experimental data.
Both classes return a BMEResult and share an L-curve theta_scan() helper for selecting the regularisation strength.
SOURSOP also provides COPER, an alternative maximum-entropy reweighter that imposes a hard \(\chi^2 \le 1\) constraint instead of BME’s tunable penalty. The two share the same ExperimentalObservable interface; see the “COPER vs BME” comparison on the sscoper page.
The reweighting problem
A simulation produces an ensemble of \(N\) conformations with prior weights \(w_i^{0}\) (uniform unless you supply enhanced-sampling weights). For each conformation we can compute \(M\) observables (e.g. a radius of gyration, a set of NOE distances, a SAXS intensity curve). The ensemble average of observable \(k\) is
In general these averages do not match the experimental values \(F_k^{\text{exp}} \pm \sigma_k\), either because the force field is imperfect or because the ensemble is undersampled. Reweighting asks: what is the minimal modification of the weights \(w_i^{0} \rightarrow w_i\) that reconciles the ensemble with the data?
BME answers this by minimising the functional
where \(\chi^2\) measures the disagreement with experiment and \(D_{\mathrm{KL}}(w\|w^{0}) = \sum_i w_i \ln (w_i / w_i^{0})\) is the relative entropy (information loss) between the new and prior ensembles. The hyperparameter \(\theta\) controls the trade-off: large \(\theta\) keeps the ensemble close to the prior (trusts the simulation), small \(\theta\) fits the data aggressively (trusts the experiment). The maximum-entropy solution has the closed exponential form
so in practice only the \(M\) Lagrange multipliers \(\lambda_k\) are optimised (via scipy.optimize.minimize with L-BFGS-B), not the \(N\) weights directly. This makes BME efficient even for very large ensembles.
A key diagnostic is the fraction of effective frames
reported on every result. \(\phi \approx 1\) means the data were already well reproduced (little reweighting); \(\phi \ll 1\) means a small subset of frames dominates the reweighted ensemble and the result should be treated with caution.
BME
BME performs the standard reweighting described above. Use it when the calculated observables are already on the same scale as the experimental values (e.g. an \(R_g\) in Å compared to a SAXS-derived \(R_g\) in Å, or NOE-derived distances).
You construct it with a list of observables and a (n_frames, n_observables) array of calculated values, then call fit(). fit either uses a manually supplied theta or, with auto_theta=True, runs an L-curve scan to choose one automatically:
from soursop.ssbme import BME, ExperimentalObservable
import numpy as np
# per-frame calculated values: shape (n_frames, n_observables)
calc = np.column_stack([rg_per_frame, ree_per_frame])
obs = [
ExperimentalObservable(value=23.0, uncertainty=1.0, name="Rg"),
ExperimentalObservable(value=60.0, uncertainty=2.0, name="Ree"),
]
bme = BME(obs, calc)
result = bme.fit(theta=2.0, auto_theta=False)
print(result) # chi2 before/after, phi, theta
weights = result.weights # plug into any SOURSOP observable
Constraints. Each observable carries a constraint:
"equality"(default) — the observable should matchvaluewithinuncertainty(a two-sided restraint)."upper"— the observable should not exceedvalue; being below is not penalised. Useful for e.g. an experimentally determined upper bound on a distance."lower"— the observable should not fall belowvalue.
Inequality constraints are enforced through bounds on the corresponding Lagrange multiplier, so an already-satisfied bound leaves the ensemble essentially untouched (\(\phi \approx 1\)).
iBME — iterative BME
For several important experimental techniques the calculated and experimental quantities differ by an unknown global scale and/or offset. The canonical example is small-angle X-ray scattering (SAXS): the shape of the calculated intensity curve is meaningful, but its absolute scale (and a flat background offset) are nuisance parameters that must be fitted simultaneously with the ensemble.
iBME (ported from the reference Reweight.ibme) solves this by alternating two steps until the \(\chi^2\) stabilises:
Scale/offset fit — perform a (by default \(1/\sigma^2\)-weighted) linear regression of the current ensemble-averaged calculated values against the experimental values, obtaining a scale \(\alpha\) and offset \(\beta\), and rescale
calc → α·calc + β.Maxent step — run a standard BME fit on the rescaled data and adopt the new weights.
The scale/offset accumulates across iterations; the returned BMEResult reports the net scale and offset relative to the original input, together with a per-iteration log (ibme_iterations):
from soursop.ssbme import iBME, ExperimentalObservable
import numpy as np
# e.g. SAXS: one observable per scattering-vector (q) point
obs = [ExperimentalObservable(I_q[k], sigma_q[k], name=f"q{k}")
for k in range(len(q))]
calc = calculated_intensities # shape (n_frames, len(q))
ib = iBME(obs, calc)
result = ib.fit(theta=10.0, ftol=0.01, max_ibme_iterations=50)
print(result.scale, result.offset) # fitted nuisance parameters
print(result) # chi2, phi, n_iterations
weights = result.weights
Convergence is declared when the change in \(\chi^2\) between iterations drops below ftol (or max_ibme_iterations is reached). Set fit_offset=False to fit a scale only (no additive background), and lr_weights=False to use an unweighted regression instead of the default \(1/\sigma^2\) weighting.
When to use which. Use BME when calculated and experimental observables are directly comparable. Use iBME when there is an unknown multiplicative scale and/or additive offset linking them (SAXS being the textbook case). If in doubt, iBME with fit_offset=True reduces to BME when the true scale is 1 and offset is 0, at the cost of a few extra iterations.
BMECustom — vector / matrix BME with a pluggable cost
For profile data (a SAXS curve, a PRE intensity profile, an NMR chemical-shift vector — anything where the experiment is a length-\(m\) vector and you can back-calculate the same observable per conformer) and for cases where you want a non-Gaussian goodness-of-fit (log-scale \(\chi^2\), a covariance \(\chi^2\), a Pearson-correlation cost, …), BMECustom reweights the ensemble against:
an experimental vector
experimentof shape(m,),a per-conformer matrix
calculated_valuesof shape(n_frames, m),an optional uncertainty scalar or length-
mvector (used by the default cost), andan optional cost function
cost(experiment, calculated_values, weights) -> float(lower = better fit).
It is the penalty form of Bayesian Ensemble Refinement: rather than BME’s closed-form exponential solution (special to the Gaussian \(\chi^2\)), it optimises the weights directly to minimise
over the simplex (\(w_i \ge 0\), \(\sum_i w_i = 1\)). Internally the simplex is enforced via a softmax reparameterisation (\(w = \mathrm{softmax}(z)\)), which reduces the fit to an unconstrained minimisation in \(z \in \mathbb{R}^{n}\) solved by SciPy’s L-BFGS-B (and avoids boundary singularities in \(\log w\)). With no cost_function it reproduces BME’s reduced \(\chi^2\) exactly; with a custom one it is the natural way to plug in any scalar fit metric.
The cost function must take the weights — reweighting is the act of choosing them, and the cost can only be reweighting-sensitive through the weighted prediction \(\langle F_k \rangle = \sum_i w_i\, \mathrm{calc}[i, k]\):
import numpy as np
from soursop.ssbme import BMECustom
# SAXS-like: m points, n conformers
bme = BMECustom(I_exp, calc_I, uncertainty=sigma_exp)
result = bme.fit(theta=1.0)
weights = result.weights
# custom cost — chi-squared on a log scale
def log_chi2(experiment, calculated_values, weights):
avg = weights @ calculated_values
return float(np.mean((np.log(avg) - np.log(experiment)) ** 2))
result = BMECustom(I_exp, calc_I, cost_function=log_chi2).fit(theta=1.0)
The result is a BMECustomResult carrying cost_initial / cost_final in place of \(\chi^2\), phi, the per-frame reweighting_factors \(r_i = w_i / w_i^{0}\), a predict for new observables, and the usual diagnostics / print_diagnostics. BMECustom.scan_theta runs the same L-curve trade-off using the cost in place of \(\chi^2\).
Performance note. The default \(\chi^2\) path uses an analytic gradient; a custom cost_function triggers a finite-difference gradient (one cost evaluation per weight), so an expensive cost on a very large ensemble can be slow. Convexity (and hence a unique optimum) is guaranteed only when the cost is convex in w; an arbitrary cost is optimised locally.
Choosing theta (the L-curve scan)
\(\theta\) is the single most important choice and should not be left at an arbitrary value. The principled approach is an L-curve scan: fit over a grid of \(\theta\), then plot \(\chi^2\) against the relative entropy (or \(\phi\)). The curve is L-shaped — there is a “knee” beyond which fitting the data better requires a disproportionate loss of ensemble diversity. theta_scan() (also available as BME.scan_theta / iBME.scan_theta) automates this and selects the knee via a perpendicular-distance or Menger-curvature criterion:
scan = bme.scan_theta(theta_range=(0.01, 50.0), n_points=20)
scan.print_summary()
print("recommended theta:", scan.optimal_theta)
# refit at the recommended value
result = bme.fit(theta=scan.optimal_theta, auto_theta=False)
Calling bme.fit(auto_theta=True) (the default when no theta is given) performs the scan and returns the knee result in one step.
Interpreting the result and common pitfalls
Every fit returns a BMEResult. Inspect it before trusting the weights — result.print_diagnostics() produces a formatted report, and result.diagnostics() returns the same information as a dict with explicit warnings. Key things to check:
χ² actually decreased.
chi_squared_initialvs.chi_squared_final. If it barely moved, the ensemble may already agree with the data, or the optimisation failed (checkresult.success).φ is not tiny. A very low \(\phi\) (e.g. \(< 0.1\)) means a handful of frames carry essentially all the weight; the “reweighted ensemble” is then statistically a few structures, not an ensemble. This usually indicates the prior ensemble does not contain conformations consistent with the data — reweighting cannot create structures that were never sampled. Prefer increasing \(\theta\), loosening uncertainties, or improving sampling.
Two effective sample sizes. The diagnostics report both the entropy-based \(N_{\text{eff}} = N\phi\) (the usual BME measure) and the Rényi-2 / participation ratio \(1/\sum_i w_i^2\), which is more sensitive to a few dominant weights.
Realistic uncertainties. \(\sigma_k\) must include experimental error and forward-model error. Uncertainties that are too tight force aggressive, low-\(\phi\) reweighting; too loose and nothing happens.
Don’t validate on the fitting data. \(\chi^2\) against the observables used for reweighting will always improve. Assess quality with cross-validation (the L-curve knee) or by predicting independent observables with
predict().Garbage in, garbage out. Reweighting only redistributes weight among conformations that were already sampled. It is not a substitute for adequate sampling (see PENGUIN in sssampling) or a reasonable force field.
Key references
The implementation here is adapted from the reference BME software and protocol; please cite the primary reference if you use it:
Bottaro, S., Bengtsen, T. & Lindorff-Larsen, K. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. Methods Mol. Biol. 2112, 219–240 (2020). doi:10.1007/978-1-0716-0270-6_15 (preprint: bioRxiv 2018). Reference code: github.com/KULL-Centre/BME.
Foundational and closely related methodology:
Hummer, G. & Köfinger, J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 143, 243150 (2015) — the BioEn reweighting functional to which BME is mathematically equivalent.
Cesari, A., Gil-Ley, A. & Bussi, G. Combining simulations and solution experiments as a paradigm for RNA force field refinement. J. Chem. Theory Comput. 12, 6192–6200 (2016) — the maximum-entropy-with-error objective \(\Gamma(\lambda)\) minimised by this module.
Cesari, A., Reißer, S. & Bussi, G. Using the maximum entropy principle to combine simulations and solution experiments. Computation 6, 15 (2018) — a review of the method and its failure modes.
Różycki, B., Kim, Y. C. & Hummer, G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure 19, 109–116 (2011) — EROS; origin of the global scaling parameter \(\theta\) and the SAXS scale/offset problem that motivates iBME.
Bottaro, S. & Lindorff-Larsen, K. Biophysical experiments and biomolecular simulations: A perfect match? Science 361, 355–360 (2018) — perspective on integrating simulation and experiment.
The ExperimentalObservable container is documented under
Ensemble reweighting (frame weights) (it is shared with COPER).
- class soursop.ssbme.BME(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
Bayesian Maximum Entropy reweighting.
Finds the minimally biased frame weights that bring ensemble averages into agreement with experimental observables, by maximising entropy relative to the prior subject to a Gaussian-error data restraint.
- Parameters:
observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated observable values, shape
(n_frames, n_observables).initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalized to sum to 1.
- Raises:
SSException – If inputs are malformed (see
_validate_inputs()).
- __init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
- fit(max_iterations: int = 50000, optimizer: str = 'L-BFGS-B', verbose: bool = True, *, theta: float | None = None, auto_theta: bool = True, theta_scan_kwargs: dict | None = None) BMEResult[source]
Fit BME weights at a fixed or automatically selected theta.
- Parameters:
max_iterations (int, optional) – Maximum optimizer iterations. Default
DEFAULT_MAX_ITERATIONS.optimizer (str, optional) – scipy.optimize.minimize method. Default
"L-BFGS-B".verbose (bool, optional) – Print progress. Default True.
theta (float, optional) – If given, use this theta (no scan). Must be positive.
auto_theta (bool, optional) – If True and
thetais None, run an L-curve scan to pick theta. Default True.theta_scan_kwargs (dict, optional) – Extra keyword arguments forwarded to
scan_theta().
- Return type:
- Raises:
SSException – If
thetais not positive.
- scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) ThetaScanResult[source]
Run an L-curve theta scan using this instance’s data.
- Parameters:
theta_range – See
theta_scan().n_points – See
theta_scan().log_scale – See
theta_scan().method – See
theta_scan().verbose – See
theta_scan().
- Return type:
- predict(calculated_values: ndarray) ndarray[source]
Weighted means of arbitrary observables using the fitted weights.
- Parameters:
calculated_values (numpy.ndarray) – Per-frame values, shape
(n_frames, n_observables).- Returns:
Weighted means, shape
(n_observables,).- Return type:
numpy.ndarray
- Raises:
SSException – If
fit()has not succeeded, or frame count mismatches.
- class soursop.ssbme.iBME(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
Iterative Bayesian Maximum Entropy reweighting.
At each iteration a (optionally error-weighted) linear regression of the ensemble-averaged calculated values against the experimental values is used to refit a global scale
alphaand offsetbeta; the calculated values are rescaled (calc -> alpha*calc + beta) and a standard BME maxent step is run. The procedure repeats until the change in chi-squared drops belowftol. iBME is appropriate when there is an unknown global scale/offset between calculated and experimental data (e.g. SAXS). Ported from the reference implementation of Bottaro et al.- Parameters:
observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape
(n_frames, n_observables).initial_weights (numpy.ndarray, optional) – Prior frame weights. Uniform if None.
- Raises:
SSException – If inputs are malformed (see
_validate_inputs()).
- __init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]
- fit(theta: float, ftol: float = 0.01, max_ibme_iterations: int = 50, fit_offset: bool = True, lr_weights: bool = True, max_iterations: int = 50000, optimizer: str = 'L-BFGS-B', verbose: bool = True) BMEResult[source]
Run iterative BME at a fixed theta.
- Parameters:
theta (float) – Regularisation strength. Must be positive.
ftol (float, optional) – Convergence tolerance on
|delta chi-squared|between iterations. Default 0.01.max_ibme_iterations (int, optional) – Maximum number of scale/offset+BME iterations. Default 50.
fit_offset (bool, optional) – If True fit a scale and offset; if False fit scale only. Default True.
lr_weights (bool, optional) – If True weight the linear regression by
1/sigma^2; otherwise use uniform weights. Default True.max_iterations (int, optional) – Maximum optimizer iterations for each inner BME step.
optimizer (str, optional) – scipy.optimize.minimize method for each inner BME step.
verbose (bool, optional) – Print per-iteration progress. Default True.
- Returns:
Result with the final weights,
scale/offset(net, relative to the original input), theibme_iterationslog,chi_squared_initialfrom the first iteration’s pre-fit chi-squared, andphicomputed against the original prior.- Return type:
- Raises:
SSException – If
thetais not positive.
- scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) ThetaScanResult[source]
Run an L-curve theta scan using iterative BME.
- Parameters:
theta_range – See
theta_scan().n_points – See
theta_scan().log_scale – See
theta_scan().method – See
theta_scan().verbose – See
theta_scan().fit_kwargs (dict, optional) – Extra keyword arguments forwarded to
fit()for each theta (e.g.ftol,max_ibme_iterations,fit_offset).
- Return type:
- predict(calculated_values: ndarray) ndarray[source]
Weighted means of arbitrary observables using the fitted weights.
- Parameters:
calculated_values (numpy.ndarray) – Per-frame values, shape
(n_frames, n_observables).- Returns:
Weighted means, shape
(n_observables,).- Return type:
numpy.ndarray
- Raises:
SSException – If
fit()has not succeeded, or frame count mismatches.
- class soursop.ssbme.BMECustom(experiment: ndarray, calculated_values: ndarray, uncertainty=None, cost_function=None, initial_weights: ndarray | None = None)[source]
BME reweighting against an experimental vector with a custom cost.
A “BME variant” that reweights an ensemble against a whole experimental profile (e.g. a SAXS or PRE curve), with an optional user-supplied goodness-of-fit function. It is the penalty form of Bayesian Ensemble Refinement: it minimises
\[\mathcal{L}(w) = \text{cost}(w) + \theta\, D_{\mathrm{KL}}(w\,\|\,w^0)\]directly over the
nframe weights (subject to the simplexw_i \ge 0,\sum_i w_i = 1) using SciPy’strust-constr. The entropy penaltythetaplays the same role as inBME: largethetakeeps the ensemble close to the prior, smallthetafits the data harder.Unlike
BME(which exploits a closed-form exponential solution special to the Gaussian chi-squared), an arbitrary cost has no such closed form, so the weights are optimised directly. The default cost reproducesBME’s reduced chi-squared exactly.- Parameters:
experiment (numpy.ndarray) – Experimental vector, shape
(m,)(e.g. one value per SAXSq-point or per PRE residue).calculated_values (numpy.ndarray) – Per-conformer calculated values for the same observable, shape
(n_frames, m).uncertainty (float or numpy.ndarray, optional) – Experimental uncertainty used by the default chi-squared cost: a scalar (broadcast to all
mpoints) or a length-mvector. Ignored whencost_functionis supplied. Defaults to1.0.cost_function (callable, optional) – Custom goodness-of-fit, called as
cost_function(experiment, calculated_values, weights) -> float, whereexperimentis(m,),calculated_valuesis(n_frames, m)andweightsis the current(n_frames,)weight vector. Lower is better. IfNone(default), the reduced chi-squaredmean_k ((<calc>_k - V_k)/sigma_k)^2is used, where<calc>_k = sum_i w_i calc[i, k].initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalised to sum to 1.
- Raises:
SSException – If inputs are malformed.
Notes
With a custom
cost_functionthe optimiser uses a finite-difference gradient, so very large ensembles or expensive cost functions can be slow; the default chi-squared path uses an analytic gradient. The problem is convex (unique optimum) when the cost is convex inw(the default chi-squared is); an arbitrary cost is optimised locally.Examples
>>> import numpy as np >>> from soursop.ssbme import BMECustom >>> # SAXS-like profile: m points, n conformers >>> bme = BMECustom(I_exp, calc_I, uncertainty=sigma_exp) >>> result = bme.fit(theta=1.0) >>> weights = result.weights >>> >>> # custom cost: chi-squared on a log scale >>> def log_chi2(exp, calc, w): ... avg = w @ calc ... return float(np.mean((np.log(avg) - np.log(exp)) ** 2)) >>> result = BMECustom(I_exp, calc_I, cost_function=log_chi2).fit(theta=1.0)
- __init__(experiment: ndarray, calculated_values: ndarray, uncertainty=None, cost_function=None, initial_weights: ndarray | None = None)[source]
- fit(theta: float = 1.0, max_iterations: int = 2000, optimizer: str = 'L-BFGS-B', verbose: bool = True) BMECustomResult[source]
Reweight by minimising
cost(w) + theta * D_KL(w || w0).Internally the simplex
w_i >= 0,sum_i w_i = 1is enforced via a softmax reparameterisation (w = softmax(z)), reducing the problem to an unconstrained minimisation overzsolved by L-BFGS-B. This avoids boundary kinks inlog wand gives clean convergence on both the analytic-cost and custom-cost paths.- Parameters:
theta (float, optional) – Entropy-penalty strength (must be positive). Default 1.0.
max_iterations (int, optional) – Maximum optimizer iterations. Default 2000.
optimizer (str, optional) – scipy.optimize.minimize method for the unconstrained problem in the softmax-reparameterised variable
z. Default"L-BFGS-B".verbose (bool, optional) – Print progress. Default True.
- Return type:
- Raises:
SSException – If
thetais not positive.
- scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 100.0), n_points: int = 12, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) ThetaScanResult[source]
Scan
thetaand pick the cost vs. relative-entropy knee.The L-curve analogue of
BME.scan_theta(). The returnedThetaScanResultstores the cost in itschi_squared_valuesfield (the generic fit metric for this variant).- Parameters:
theta_range – See
theta_scan(). Default range(0.01, 100.0).n_points – See
theta_scan(). Default range(0.01, 100.0).log_scale – See
theta_scan(). Default range(0.01, 100.0).method – See
theta_scan(). Default range(0.01, 100.0).verbose – See
theta_scan(). Default range(0.01, 100.0).
- Return type:
- class soursop.ssbme.BMEResult(weights: ~numpy.ndarray, initial_weights: ~numpy.ndarray, lambdas: ~numpy.ndarray, chi_squared_initial: float, chi_squared_final: float, phi: float, n_iterations: int, success: bool, message: str, theta: float, observables: ~typing.List[~soursop.ssutils.ExperimentalObservable], calculated_values: ~numpy.ndarray, metadata: dict, scale: float | None = None, offset: float | None = None, ibme_iterations: ~typing.List[dict] = <factory>)[source]
Container for a BME / iBME reweighting result.
- weights
Optimized (posterior) frame weights, summing to 1.
- Type:
numpy.ndarray
- initial_weights
Prior frame weights.
- Type:
numpy.ndarray
- lambdas
Optimized Lagrange multipliers (one per observable).
- Type:
numpy.ndarray
- chi_squared_initial
Reduced chi-squared before reweighting.
- Type:
float
- chi_squared_final
Reduced chi-squared after reweighting.
- Type:
float
- phi
Fraction of effective frames,
exp(-D_KL(w || w0)).- Type:
float
- n_iterations
Number of optimizer iterations (maxent) or iBME iterations.
- Type:
int
- success
Whether the optimization succeeded.
- Type:
bool
- message
Optimizer status message.
- Type:
str
- theta
Regularisation strength used.
- Type:
float
- observables
Observables used in the fit.
- Type:
list of ExperimentalObservable
- calculated_values
Calculated values used in the fit (for iBME, the final scaled array).
- Type:
numpy.ndarray
- metadata
Free-form metadata (optimizer, timestamp, …).
- Type:
dict
- scale
Final iBME scale factor (alpha) relative to the original input.
Nonefor plain BME.- Type:
float, optional
- offset
Final iBME offset (beta) relative to the original input.
Nonefor plain BME.- Type:
float, optional
- ibme_iterations
Per-iteration iBME log (empty for plain BME). Each entry has
iteration,scale,offset,chi_squaredanddiff.- Type:
list of dict
- predict(calculated_values: ndarray) ndarray[source]
Weighted average of arbitrary observables using these weights.
- Parameters:
calculated_values (numpy.ndarray) – Per-frame observable values, shape
(n_frames, n_observables).n_framesmust match the fitted ensemble.- Returns:
Weighted means, shape
(n_observables,).- Return type:
numpy.ndarray
- Raises:
SSException – If the number of frames does not match the fitted ensemble.
- diagnostics(warn_threshold: float = 0.5) dict[source]
Diagnose the reweighting result and flag potential issues.
- Parameters:
warn_threshold (float, optional) – Minimum acceptable
phibefore a low-diversity warning is raised. Default 0.5.- Returns:
Diagnostic metrics and a list of human-readable
warnings.- Return type:
dict
Notes
Two effective sample sizes are reported: the entropy-based
neff_entropy(N * phi, the usual BME measure) and the Renyi-2 / participation rationeff_renyi2(1 / sum w^2), which is more sensitive to a few dominant weights.
- print_diagnostics(warn_threshold: float = 0.5)[source]
Print a formatted diagnostic report for this result.
- Parameters:
warn_threshold (float, optional) – Passed through to
diagnostics(). Default 0.5.
- class soursop.ssbme.BMECustomResult(weights: ndarray, initial_weights: ndarray, cost_initial: float, cost_final: float, phi: float, n_iterations: int, success: bool, message: str, theta: float, experiment: ndarray, calculated_values: ndarray, metadata: dict)[source]
Container for a
BMECustomreweighting result.Mirrors
BMEResultbut for the raw vector/matrix + custom-cost variant: the scalar fit metric is a generalcost(the reduced chi-squared by default, or whatever the user’scost_functionreturns) rather than a constraint-aware chi-squared, and the experimental data are held as plain arrays rather than a list ofExperimentalObservable.- weights
Optimized (posterior) frame weights, summing to 1.
- Type:
numpy.ndarray
- initial_weights
Prior frame weights.
- Type:
numpy.ndarray
- cost_initial
Cost (goodness-of-fit) at the prior weights.
- Type:
float
- cost_final
Cost at the optimized weights.
- Type:
float
- phi
Fraction of effective frames,
exp(-D_KL(w || w0)) in (0, 1].- Type:
float
- n_iterations
Optimizer iterations.
- Type:
int
- success
Whether the optimization succeeded.
- Type:
bool
- message
Optimizer status message.
- Type:
str
- theta
Entropy-penalty strength used.
- Type:
float
- experiment
Experimental vector, shape
(m,).- Type:
numpy.ndarray
- calculated_values
Per-conformer calculated matrix used in the fit, shape
(n, m).- Type:
numpy.ndarray
- metadata
Free-form metadata (optimizer, whether a custom cost was used, …).
- Type:
dict
- predict(calculated_values: ndarray) ndarray[source]
Weighted average of arbitrary observables using these weights.
- Parameters:
calculated_values (numpy.ndarray) – Per-frame values, shape
(n_frames, ...);n_framesmust match the fitted ensemble.- Returns:
Weighted average over frames (axis 0).
- Return type:
numpy.ndarray
- Raises:
SSException – If the number of frames does not match the fitted ensemble.
- diagnostics(warn_threshold: float = 0.5) dict[source]
Diagnose the reweighting result and flag potential issues.
- Parameters:
warn_threshold (float, optional) – Minimum acceptable
phibefore a low-diversity warning is raised. Default 0.5.- Returns:
Diagnostic metrics and a list of human-readable
warnings.- Return type:
dict
- class soursop.ssbme.ThetaScanResult(theta_values: ndarray, chi_squared_values: ndarray, phi_values: ndarray, kl_divergence_values: ndarray, results: List[BMEResult], optimal_theta: float, optimal_idx: int, method: str)[source]
Container for an L-curve theta scan.
- theta_values
Scanned theta grid.
- Type:
numpy.ndarray
- chi_squared_values
Final chi-squared at each theta.
- Type:
numpy.ndarray
- phi_values
Effective-fraction (phi) at each theta.
- Type:
numpy.ndarray
- kl_divergence_values
Relative entropy at each theta.
- Type:
numpy.ndarray
- optimal_theta
Selected theta (L-curve knee).
- Type:
float
- optimal_idx
Index of
optimal_thetaintheta_values.- Type:
int
- method
Human-readable knee-selection method used.
- Type:
str
- soursop.ssbme.theta_scan(observables: List[ExperimentalObservable], calculated_values: ndarray, reweighter: str = 'bme', theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, initial_weights: ndarray | None = None, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) ThetaScanResult[source]
Scan a grid of theta values for BME or iBME and pick the L-curve knee.
- Parameters:
observables (list of ExperimentalObservable) – Experimental observables.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape
(n_frames, n_observables).reweighter (str, optional) –
"bme"(default) or"ibme".theta_range (tuple or numpy.ndarray, optional) –
(min, max)for a generated grid ofn_points, or an explicit 1D array of theta values. Default(0.01, 10.0).n_points (int, optional) – Number of grid points when
theta_rangeis a tuple. Default 15.log_scale (bool, optional) – Sample logarithmically when
theta_rangeis a tuple. Default True.initial_weights (numpy.ndarray, optional) – Prior weights. Uniform if None.
method (str, optional) – Knee-selection method (see
find_optimal_theta()).verbose (bool, optional) – Print per-theta progress. Default False.
fit_kwargs (dict, optional) – Extra keyword arguments forwarded to the reweighter’s
fit(e.g.ftol/max_ibme_iterationsfor iBME).
- Return type:
- Raises:
SSException – If
reweighteris unknown,n_points< 1, or a log-scale grid has non-positive endpoints.