ssbme

Overview

ssbme provides Bayesian Maximum Entropy (BME) and iterative BME (iBME) reweighting of conformational ensembles against experimental observables. Given a set of per-frame calculated quantities and the corresponding experimental measurements, these methods compute a new set of frame weights that brings the ensemble averages into agreement with experiment while perturbing the prior ensemble as little as possible. The resulting weight vector plugs directly into the consistent SOURSOP reweighting system — every ensemble-average observable accepts a weights= argument (see Ensemble reweighting (frame weights)).

The module is self-contained (NumPy + SciPy only) and is adapted from the reference BME implementation of Bottaro, Bengtsen & Lindorff-Larsen with the clean array-based API used in STARLING.

It exposes three user-facing pieces:

ExperimentalObservable — a container for one experimental data point (value, uncertainty, constraint type). This class lives in soursop.ssutils and is shared with the COPER reweighter (sscoper); it is re-exported from soursop.ssbme so from soursop.ssbme import ExperimentalObservable continues to work.
BME — standard maximum-entropy reweighting.
iBME — iterative BME, which additionally fits an unknown global scale and offset between calculated and experimental data.

Both classes return a BMEResult and share an L-curve theta_scan() helper for selecting the regularisation strength.

SOURSOP also provides COPER, an alternative maximum-entropy reweighter that imposes a hard \(\chi^2 \le 1\) constraint instead of BME’s tunable penalty. The two share the same ExperimentalObservable interface; see the “COPER vs BME” comparison on the sscoper page.

The reweighting problem

A simulation produces an ensemble of \(N\) conformations with prior weights \(w_i^{0}\) (uniform unless you supply enhanced-sampling weights). For each conformation we can compute \(M\) observables (e.g. a radius of gyration, a set of NOE distances, a SAXS intensity curve). The ensemble average of observable \(k\) is

\[\langle F_k \rangle = \sum_{i=1}^{N} w_i \, F_k(x_i).\]

In general these averages do not match the experimental values \(F_k^{\text{exp}} \pm \sigma_k\), either because the force field is imperfect or because the ensemble is undersampled. Reweighting asks: what is the minimal modification of the weights \(w_i^{0} \rightarrow w_i\) that reconciles the ensemble with the data?

BME answers this by minimising the functional

\[\mathcal{L}(w) = \tfrac{1}{2}\chi^2(w) \;+\; \theta \, D_{\mathrm{KL}}(w \,\|\, w^{0}),\]

where \(\chi^2\) measures the disagreement with experiment and \(D_{\mathrm{KL}}(w\|w^{0}) = \sum_i w_i \ln (w_i / w_i^{0})\) is the relative entropy (information loss) between the new and prior ensembles. The hyperparameter \(\theta\) controls the trade-off: large \(\theta\) keeps the ensemble close to the prior (trusts the simulation), small \(\theta\) fits the data aggressively (trusts the experiment). The maximum-entropy solution has the closed exponential form

\[w_i \;\propto\; w_i^{0}\,\exp\!\Big(\!-\!\sum_{k}\lambda_k F_k(x_i)\Big),\]

so in practice only the \(M\) Lagrange multipliers \(\lambda_k\) are optimised (via scipy.optimize.minimize with L-BFGS-B), not the \(N\) weights directly. This makes BME efficient even for very large ensembles.

A key diagnostic is the fraction of effective frames

\[\phi \;=\; \exp\!\big(\!-\!D_{\mathrm{KL}}(w\,\|\,w^{0})\big) \in (0, 1],\]

reported on every result. \(\phi \approx 1\) means the data were already well reproduced (little reweighting); \(\phi \ll 1\) means a small subset of frames dominates the reweighted ensemble and the result should be treated with caution.

BME

BME performs the standard reweighting described above. Use it when the calculated observables are already on the same scale as the experimental values (e.g. an \(R_g\) in Å compared to a SAXS-derived \(R_g\) in Å, or NOE-derived distances).

You construct it with a list of observables and a (n_frames, n_observables) array of calculated values, then call fit(). fit either uses a manually supplied theta or, with auto_theta=True, runs an L-curve scan to choose one automatically:

from soursop.ssbme import BME, ExperimentalObservable
import numpy as np

# per-frame calculated values: shape (n_frames, n_observables)
calc = np.column_stack([rg_per_frame, ree_per_frame])

obs = [
    ExperimentalObservable(value=23.0, uncertainty=1.0, name="Rg"),
    ExperimentalObservable(value=60.0, uncertainty=2.0, name="Ree"),
]

bme = BME(obs, calc)
result = bme.fit(theta=2.0, auto_theta=False)

print(result)                       # chi2 before/after, phi, theta
weights = result.weights            # plug into any SOURSOP observable

Constraints. Each observable carries a constraint:

"equality" (default) — the observable should match value within uncertainty (a two-sided restraint).
"upper" — the observable should not exceed value; being below is not penalised. Useful for e.g. an experimentally determined upper bound on a distance.
"lower" — the observable should not fall below value.

Inequality constraints are enforced through bounds on the corresponding Lagrange multiplier, so an already-satisfied bound leaves the ensemble essentially untouched (\(\phi \approx 1\)).

iBME — iterative BME

For several important experimental techniques the calculated and experimental quantities differ by an unknown global scale and/or offset. The canonical example is small-angle X-ray scattering (SAXS): the shape of the calculated intensity curve is meaningful, but its absolute scale (and a flat background offset) are nuisance parameters that must be fitted simultaneously with the ensemble.

iBME (ported from the reference Reweight.ibme) solves this by alternating two steps until the \(\chi^2\) stabilises:

Scale/offset fit — perform a (by default \(1/\sigma^2\)-weighted) linear regression of the current ensemble-averaged calculated values against the experimental values, obtaining a scale \(\alpha\) and offset \(\beta\), and rescale calc → α·calc + β.
Maxent step — run a standard BME fit on the rescaled data and adopt the new weights.

The scale/offset accumulates across iterations; the returned BMEResult reports the net scale and offset relative to the original input, together with a per-iteration log (ibme_iterations):

from soursop.ssbme import iBME, ExperimentalObservable
import numpy as np

# e.g. SAXS: one observable per scattering-vector (q) point
obs = [ExperimentalObservable(I_q[k], sigma_q[k], name=f"q{k}")
       for k in range(len(q))]
calc = calculated_intensities          # shape (n_frames, len(q))

ib = iBME(obs, calc)
result = ib.fit(theta=10.0, ftol=0.01, max_ibme_iterations=50)

print(result.scale, result.offset)     # fitted nuisance parameters
print(result)                          # chi2, phi, n_iterations
weights = result.weights

Convergence is declared when the change in \(\chi^2\) between iterations drops below ftol (or max_ibme_iterations is reached). Set fit_offset=False to fit a scale only (no additive background), and lr_weights=False to use an unweighted regression instead of the default \(1/\sigma^2\) weighting.

When to use which. Use BME when calculated and experimental observables are directly comparable. Use iBME when there is an unknown multiplicative scale and/or additive offset linking them (SAXS being the textbook case). If in doubt, iBME with fit_offset=True reduces to BME when the true scale is 1 and offset is 0, at the cost of a few extra iterations.

BMECustom — vector / matrix BME with a pluggable cost

For profile data (a SAXS curve, a PRE intensity profile, an NMR chemical-shift vector — anything where the experiment is a length-\(m\) vector and you can back-calculate the same observable per conformer) and for cases where you want a non-Gaussian goodness-of-fit (log-scale \(\chi^2\), a covariance \(\chi^2\), a Pearson-correlation cost, …), BMECustom reweights the ensemble against:

an experimental vector experiment of shape (m,),
a per-conformer matrix calculated_values of shape (n_frames, m),
an optional uncertainty scalar or length-m vector (used by the default cost), and
an optional cost function cost(experiment, calculated_values, weights) -> float (lower = better fit).

It is the penalty form of Bayesian Ensemble Refinement: rather than BME’s closed-form exponential solution (special to the Gaussian \(\chi^2\)), it optimises the weights directly to minimise

\[\mathcal{L}(w) = \text{cost}(w) \;+\; \theta\, D_{\mathrm{KL}}(w\,\|\,w^{0}),\]

over the simplex (\(w_i \ge 0\), \(\sum_i w_i = 1\)). Internally the simplex is enforced via a softmax reparameterisation (\(w = \mathrm{softmax}(z)\)), which reduces the fit to an unconstrained minimisation in \(z \in \mathbb{R}^{n}\) solved by SciPy’s L-BFGS-B (and avoids boundary singularities in \(\log w\)). With no cost_function it reproduces BME’s reduced \(\chi^2\) exactly; with a custom one it is the natural way to plug in any scalar fit metric.

The cost function must take the weights — reweighting is the act of choosing them, and the cost can only be reweighting-sensitive through the weighted prediction \(\langle F_k \rangle = \sum_i w_i\, \mathrm{calc}[i, k]\):

import numpy as np
from soursop.ssbme import BMECustom

# SAXS-like: m points, n conformers
bme = BMECustom(I_exp, calc_I, uncertainty=sigma_exp)
result = bme.fit(theta=1.0)
weights = result.weights

# custom cost — chi-squared on a log scale
def log_chi2(experiment, calculated_values, weights):
    avg = weights @ calculated_values
    return float(np.mean((np.log(avg) - np.log(experiment)) ** 2))

result = BMECustom(I_exp, calc_I, cost_function=log_chi2).fit(theta=1.0)

The result is a BMECustomResult carrying cost_initial / cost_final in place of \(\chi^2\), phi, the per-frame reweighting_factors \(r_i = w_i / w_i^{0}\), a predict for new observables, and the usual diagnostics / print_diagnostics. BMECustom.scan_theta runs the same L-curve trade-off using the cost in place of \(\chi^2\).

Performance note. The default \(\chi^2\) path uses an analytic gradient; a custom cost_function triggers a finite-difference gradient (one cost evaluation per weight), so an expensive cost on a very large ensemble can be slow. Convexity (and hence a unique optimum) is guaranteed only when the cost is convex in w; an arbitrary cost is optimised locally.

Choosing theta (the L-curve scan)

\(\theta\) is the single most important choice and should not be left at an arbitrary value. The principled approach is an L-curve scan: fit over a grid of \(\theta\), then plot \(\chi^2\) against the relative entropy (or \(\phi\)). The curve is L-shaped — there is a “knee” beyond which fitting the data better requires a disproportionate loss of ensemble diversity. theta_scan() (also available as BME.scan_theta / iBME.scan_theta) automates this and selects the knee via a perpendicular-distance or Menger-curvature criterion:

scan = bme.scan_theta(theta_range=(0.01, 50.0), n_points=20)
scan.print_summary()
print("recommended theta:", scan.optimal_theta)

# refit at the recommended value
result = bme.fit(theta=scan.optimal_theta, auto_theta=False)

Calling bme.fit(auto_theta=True) (the default when no theta is given) performs the scan and returns the knee result in one step.

Interpreting the result and common pitfalls

Every fit returns a BMEResult. Inspect it before trusting the weights — result.print_diagnostics() produces a formatted report, and result.diagnostics() returns the same information as a dict with explicit warnings. Key things to check:

χ² actually decreased. chi_squared_initial vs. chi_squared_final. If it barely moved, the ensemble may already agree with the data, or the optimisation failed (check result.success).
φ is not tiny. A very low \(\phi\) (e.g. \(< 0.1\)) means a handful of frames carry essentially all the weight; the “reweighted ensemble” is then statistically a few structures, not an ensemble. This usually indicates the prior ensemble does not contain conformations consistent with the data — reweighting cannot create structures that were never sampled. Prefer increasing \(\theta\), loosening uncertainties, or improving sampling.
Two effective sample sizes. The diagnostics report both the entropy-based \(N_{\text{eff}} = N\phi\) (the usual BME measure) and the Rényi-2 / participation ratio \(1/\sum_i w_i^2\), which is more sensitive to a few dominant weights.
Realistic uncertainties. \(\sigma_k\) must include experimental error and forward-model error. Uncertainties that are too tight force aggressive, low-\(\phi\) reweighting; too loose and nothing happens.
Don’t validate on the fitting data. \(\chi^2\) against the observables used for reweighting will always improve. Assess quality with cross-validation (the L-curve knee) or by predicting independent observables with predict().
Garbage in, garbage out. Reweighting only redistributes weight among conformations that were already sampled. It is not a substitute for adequate sampling (see PENGUIN in sssampling) or a reasonable force field.

Key references

The implementation here is adapted from the reference BME software and protocol; please cite the primary reference if you use it:

Bottaro, S., Bengtsen, T. & Lindorff-Larsen, K. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. Methods Mol. Biol. 2112, 219–240 (2020). doi:10.1007/978-1-0716-0270-6_15 (preprint: bioRxiv 2018). Reference code: github.com/KULL-Centre/BME.

Foundational and closely related methodology:

Hummer, G. & Köfinger, J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 143, 243150 (2015) — the BioEn reweighting functional to which BME is mathematically equivalent.
Cesari, A., Gil-Ley, A. & Bussi, G. Combining simulations and solution experiments as a paradigm for RNA force field refinement. J. Chem. Theory Comput. 12, 6192–6200 (2016) — the maximum-entropy-with-error objective \(\Gamma(\lambda)\) minimised by this module.
Cesari, A., Reißer, S. & Bussi, G. Using the maximum entropy principle to combine simulations and solution experiments. Computation 6, 15 (2018) — a review of the method and its failure modes.
Różycki, B., Kim, Y. C. & Hummer, G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure 19, 109–116 (2011) — EROS; origin of the global scaling parameter \(\theta\) and the SAXS scale/offset problem that motivates iBME.
Bottaro, S. & Lindorff-Larsen, K. Biophysical experiments and biomolecular simulations: A perfect match? Science 361, 355–360 (2018) — perspective on integrating simulation and experiment.

The ExperimentalObservable container is documented under Ensemble reweighting (frame weights) (it is shared with COPER).

class soursop.ssbme.BME(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

Bayesian Maximum Entropy reweighting.

Finds the minimally biased frame weights that bring ensemble averages into agreement with experimental observables, by maximising entropy relative to the prior subject to a Gaussian-error data restraint.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated observable values, shape (n_frames, n_observables).
initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalized to sum to 1.

Raises:

SSException – If inputs are malformed (see _validate_inputs()).

__init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

fit(max_iterations: int = 50000, optimizer: str = 'L-BFGS-B', verbose: bool = True, *, theta: float | None = None, auto_theta: bool = True, theta_scan_kwargs: dict | None = None) → BMEResult[source]

Fit BME weights at a fixed or automatically selected theta.

Parameters:

max_iterations (int, optional) – Maximum optimizer iterations. Default DEFAULT_MAX_ITERATIONS.
optimizer (str, optional) – scipy.optimize.minimize method. Default "L-BFGS-B".
verbose (bool, optional) – Print progress. Default True.
theta (float, optional) – If given, use this theta (no scan). Must be positive.
auto_theta (bool, optional) – If True and theta is None, run an L-curve scan to pick theta. Default True.
theta_scan_kwargs (dict, optional) – Extra keyword arguments forwarded to scan_theta().

Return type:

BMEResult

Raises:

SSException – If theta is not positive.

scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) → ThetaScanResult[source]

Run an L-curve theta scan using this instance’s data.

Parameters:

theta_range – See theta_scan().
n_points – See theta_scan().
log_scale – See theta_scan().
method – See theta_scan().
verbose – See theta_scan().

Return type:

ThetaScanResult

predict(calculated_values: ndarray) → ndarray[source]

Weighted means of arbitrary observables using the fitted weights.

Parameters:: calculated_values (numpy.ndarray) – Per-frame values, shape (n_frames, n_observables).
Returns:: Weighted means, shape (n_observables,).
Return type:: numpy.ndarray
Raises:: SSException – If fit() has not succeeded, or frame count mismatches.

class soursop.ssbme.iBME(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

Iterative Bayesian Maximum Entropy reweighting.

At each iteration a (optionally error-weighted) linear regression of the ensemble-averaged calculated values against the experimental values is used to refit a global scale alpha and offset beta; the calculated values are rescaled (calc -> alpha*calc + beta) and a standard BME maxent step is run. The procedure repeats until the change in chi-squared drops below ftol. iBME is appropriate when there is an unknown global scale/offset between calculated and experimental data (e.g. SAXS). Ported from the reference implementation of Bottaro et al.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables to fit against.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape (n_frames, n_observables).
initial_weights (numpy.ndarray, optional) – Prior frame weights. Uniform if None.

Raises:

SSException – If inputs are malformed (see _validate_inputs()).

__init__(observables: List[ExperimentalObservable], calculated_values: ndarray, initial_weights: ndarray | None = None)[source]

fit(theta: float, ftol: float = 0.01, max_ibme_iterations: int = 50, fit_offset: bool = True, lr_weights: bool = True, max_iterations: int = 50000, optimizer: str = 'L-BFGS-B', verbose: bool = True) → BMEResult[source]

Run iterative BME at a fixed theta.

Parameters:

theta (float) – Regularisation strength. Must be positive.
ftol (float, optional) – Convergence tolerance on |delta chi-squared| between iterations. Default 0.01.
max_ibme_iterations (int, optional) – Maximum number of scale/offset+BME iterations. Default 50.
fit_offset (bool, optional) – If True fit a scale and offset; if False fit scale only. Default True.
lr_weights (bool, optional) – If True weight the linear regression by 1/sigma^2; otherwise use uniform weights. Default True.
max_iterations (int, optional) – Maximum optimizer iterations for each inner BME step.
optimizer (str, optional) – scipy.optimize.minimize method for each inner BME step.
verbose (bool, optional) – Print per-iteration progress. Default True.

Returns:

Result with the final weights, scale/offset (net, relative to the original input), the ibme_iterations log, chi_squared_initial from the first iteration’s pre-fit chi-squared, and phi computed against the original prior.

Return type:

BMEResult

Raises:

SSException – If theta is not positive.

scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) → ThetaScanResult[source]

Run an L-curve theta scan using iterative BME.

Parameters:

theta_range – See theta_scan().
n_points – See theta_scan().
log_scale – See theta_scan().
method – See theta_scan().
verbose – See theta_scan().
fit_kwargs (dict, optional) – Extra keyword arguments forwarded to fit() for each theta (e.g. ftol, max_ibme_iterations, fit_offset).

Return type:

ThetaScanResult

predict(calculated_values: ndarray) → ndarray[source]

Weighted means of arbitrary observables using the fitted weights.

Parameters:: calculated_values (numpy.ndarray) – Per-frame values, shape (n_frames, n_observables).
Returns:: Weighted means, shape (n_observables,).
Return type:: numpy.ndarray
Raises:: SSException – If fit() has not succeeded, or frame count mismatches.

get_ibme_weights() → ndarray[source]

Final iBME weights.

Return type:: numpy.ndarray
Raises:: SSException – If fit() has not been called.

get_ibme_stats() → List[dict][source]

Per-iteration iBME log (scale, offset, chi-squared, diff).

Return type:: list of dict
Raises:: SSException – If fit() has not been called.

class soursop.ssbme.BMECustom(experiment: ndarray, calculated_values: ndarray, uncertainty=None, cost_function=None, initial_weights: ndarray | None = None)[source]

BME reweighting against an experimental vector with a custom cost.

A “BME variant” that reweights an ensemble against a whole experimental profile (e.g. a SAXS or PRE curve), with an optional user-supplied goodness-of-fit function. It is the penalty form of Bayesian Ensemble Refinement: it minimises

\[\mathcal{L}(w) = \text{cost}(w) + \theta\, D_{\mathrm{KL}}(w\,\|\,w^0)\]

directly over the n frame weights (subject to the simplex w_i \ge 0, \sum_i w_i = 1) using SciPy’s trust-constr. The entropy penalty theta plays the same role as in BME: large theta keeps the ensemble close to the prior, small theta fits the data harder.

Unlike BME (which exploits a closed-form exponential solution special to the Gaussian chi-squared), an arbitrary cost has no such closed form, so the weights are optimised directly. The default cost reproduces BME’s reduced chi-squared exactly.

Parameters:

experiment (numpy.ndarray) – Experimental vector, shape (m,) (e.g. one value per SAXS q-point or per PRE residue).
calculated_values (numpy.ndarray) – Per-conformer calculated values for the same observable, shape (n_frames, m).
uncertainty (float or numpy.ndarray, optional) – Experimental uncertainty used by the default chi-squared cost: a scalar (broadcast to all m points) or a length-m vector. Ignored when cost_function is supplied. Defaults to 1.0.
cost_function (callable, optional) – Custom goodness-of-fit, called as cost_function(experiment, calculated_values, weights) -> float, where experiment is (m,), calculated_values is (n_frames, m) and weights is the current (n_frames,) weight vector. Lower is better. If None (default), the reduced chi-squared mean_k ((<calc>_k - V_k)/sigma_k)^2 is used, where <calc>_k = sum_i w_i calc[i, k].
initial_weights (numpy.ndarray, optional) – Prior frame weights. If None, uniform weights are used. Internally normalised to sum to 1.

Raises:

SSException – If inputs are malformed.

Notes

With a custom cost_function the optimiser uses a finite-difference gradient, so very large ensembles or expensive cost functions can be slow; the default chi-squared path uses an analytic gradient. The problem is convex (unique optimum) when the cost is convex in w (the default chi-squared is); an arbitrary cost is optimised locally.

Examples

>>> import numpy as np
>>> from soursop.ssbme import BMECustom
>>> # SAXS-like profile: m points, n conformers
>>> bme = BMECustom(I_exp, calc_I, uncertainty=sigma_exp)
>>> result = bme.fit(theta=1.0)
>>> weights = result.weights
>>>
>>> # custom cost: chi-squared on a log scale
>>> def log_chi2(exp, calc, w):
...     avg = w @ calc
...     return float(np.mean((np.log(avg) - np.log(exp)) ** 2))
>>> result = BMECustom(I_exp, calc_I, cost_function=log_chi2).fit(theta=1.0)

__init__(experiment: ndarray, calculated_values: ndarray, uncertainty=None, cost_function=None, initial_weights: ndarray | None = None)[source]

fit(theta: float = 1.0, max_iterations: int = 2000, optimizer: str = 'L-BFGS-B', verbose: bool = True) → BMECustomResult[source]

Reweight by minimising cost(w) + theta * D_KL(w || w0).

Internally the simplex w_i >= 0, sum_i w_i = 1 is enforced via a softmax reparameterisation (w = softmax(z)), reducing the problem to an unconstrained minimisation over z solved by L-BFGS-B. This avoids boundary kinks in log w and gives clean convergence on both the analytic-cost and custom-cost paths.

Parameters:

theta (float, optional) – Entropy-penalty strength (must be positive). Default 1.0.
max_iterations (int, optional) – Maximum optimizer iterations. Default 2000.
optimizer (str, optional) – scipy.optimize.minimize method for the unconstrained problem in the softmax-reparameterised variable z. Default "L-BFGS-B".
verbose (bool, optional) – Print progress. Default True.

Return type:

BMECustomResult

Raises:

SSException – If theta is not positive.

scan_theta(theta_range: Tuple[float, float] | ndarray = (0.01, 100.0), n_points: int = 12, log_scale: bool = True, method: str = 'perpendicular', verbose: bool = False) → ThetaScanResult[source]

Scan theta and pick the cost vs. relative-entropy knee.

The L-curve analogue of BME.scan_theta(). The returned ThetaScanResult stores the cost in its chi_squared_values field (the generic fit metric for this variant).

Parameters:

theta_range – See theta_scan(). Default range (0.01, 100.0).
n_points – See theta_scan(). Default range (0.01, 100.0).
log_scale – See theta_scan(). Default range (0.01, 100.0).
method – See theta_scan(). Default range (0.01, 100.0).
verbose – See theta_scan(). Default range (0.01, 100.0).

Return type:

ThetaScanResult

predict(calculated_values: ndarray) → ndarray[source]

Weighted means of arbitrary observables using the fitted weights.

Raises:: SSException – If fit() has not succeeded, or frame count mismatches.

class soursop.ssbme.BMEResult(weights: ~numpy.ndarray, initial_weights: ~numpy.ndarray, lambdas: ~numpy.ndarray, chi_squared_initial: float, chi_squared_final: float, phi: float, n_iterations: int, success: bool, message: str, theta: float, observables: ~typing.List[~soursop.ssutils.ExperimentalObservable], calculated_values: ~numpy.ndarray, metadata: dict, scale: float | None = None, offset: float | None = None, ibme_iterations: ~typing.List[dict] = <factory>)[source]

Container for a BME / iBME reweighting result.

weights

Optimized (posterior) frame weights, summing to 1.

Type:: numpy.ndarray

initial_weights

Prior frame weights.

Type:: numpy.ndarray

lambdas

Optimized Lagrange multipliers (one per observable).

Type:: numpy.ndarray

chi_squared_initial

Reduced chi-squared before reweighting.

Type:: float

chi_squared_final

Reduced chi-squared after reweighting.

Type:: float

phi

Fraction of effective frames, exp(-D_KL(w || w0)).

Type:: float

n_iterations

Number of optimizer iterations (maxent) or iBME iterations.

Type:: int

success

Whether the optimization succeeded.

Type:: bool

message

Optimizer status message.

Type:: str

theta

Regularisation strength used.

Type:: float

observables

Observables used in the fit.

Type:: list of ExperimentalObservable

calculated_values

Calculated values used in the fit (for iBME, the final scaled array).

Type:: numpy.ndarray

metadata

Free-form metadata (optimizer, timestamp, …).

Type:: dict

scale

Final iBME scale factor (alpha) relative to the original input. None for plain BME.

Type:: float, optional

offset

Final iBME offset (beta) relative to the original input. None for plain BME.

Type:: float, optional

ibme_iterations

Per-iteration iBME log (empty for plain BME). Each entry has iteration, scale, offset, chi_squared and diff.

Type:: list of dict

predict(calculated_values: ndarray) → ndarray[source]

Weighted average of arbitrary observables using these weights.

Parameters:: calculated_values (numpy.ndarray) – Per-frame observable values, shape (n_frames, n_observables). n_frames must match the fitted ensemble.
Returns:: Weighted means, shape (n_observables,).
Return type:: numpy.ndarray
Raises:: SSException – If the number of frames does not match the fitted ensemble.

diagnostics(warn_threshold: float = 0.5) → dict[source]

Diagnose the reweighting result and flag potential issues.

Parameters:: warn_threshold (float, optional) – Minimum acceptable phi before a low-diversity warning is raised. Default 0.5.
Returns:: Diagnostic metrics and a list of human-readable warnings.
Return type:: dict

Notes

Two effective sample sizes are reported: the entropy-based neff_entropy (N * phi, the usual BME measure) and the Renyi-2 / participation ratio neff_renyi2 (1 / sum w^2), which is more sensitive to a few dominant weights.

print_diagnostics(warn_threshold: float = 0.5)[source]

Print a formatted diagnostic report for this result.

Parameters:: warn_threshold (float, optional) – Passed through to diagnostics(). Default 0.5.

class soursop.ssbme.BMECustomResult(weights: ndarray, initial_weights: ndarray, cost_initial: float, cost_final: float, phi: float, n_iterations: int, success: bool, message: str, theta: float, experiment: ndarray, calculated_values: ndarray, metadata: dict)[source]

Container for a BMECustom reweighting result.

Mirrors BMEResult but for the raw vector/matrix + custom-cost variant: the scalar fit metric is a general cost (the reduced chi-squared by default, or whatever the user’s cost_function returns) rather than a constraint-aware chi-squared, and the experimental data are held as plain arrays rather than a list of ExperimentalObservable.

weights

Optimized (posterior) frame weights, summing to 1.

Type:: numpy.ndarray

initial_weights

Prior frame weights.

Type:: numpy.ndarray

cost_initial

Cost (goodness-of-fit) at the prior weights.

Type:: float

cost_final

Cost at the optimized weights.

Type:: float

phi

Fraction of effective frames, exp(-D_KL(w || w0)) in (0, 1].

Type:: float

n_iterations

Optimizer iterations.

Type:: int

success

Whether the optimization succeeded.

Type:: bool

message

Optimizer status message.

Type:: str

theta

Entropy-penalty strength used.

Type:: float

experiment

Experimental vector, shape (m,).

Type:: numpy.ndarray

calculated_values

Per-conformer calculated matrix used in the fit, shape (n, m).

Type:: numpy.ndarray

metadata

Free-form metadata (optimizer, whether a custom cost was used, …).

Type:: dict

predict(calculated_values: ndarray) → ndarray[source]

Weighted average of arbitrary observables using these weights.

Parameters:: calculated_values (numpy.ndarray) – Per-frame values, shape (n_frames, ...); n_frames must match the fitted ensemble.
Returns:: Weighted average over frames (axis 0).
Return type:: numpy.ndarray
Raises:: SSException – If the number of frames does not match the fitted ensemble.

diagnostics(warn_threshold: float = 0.5) → dict[source]

Diagnose the reweighting result and flag potential issues.

Parameters:: warn_threshold (float, optional) – Minimum acceptable phi before a low-diversity warning is raised. Default 0.5.
Returns:: Diagnostic metrics and a list of human-readable warnings.
Return type:: dict

print_diagnostics(warn_threshold: float = 0.5)[source]: Print a formatted diagnostic report for this result.

class soursop.ssbme.ThetaScanResult(theta_values: ndarray, chi_squared_values: ndarray, phi_values: ndarray, kl_divergence_values: ndarray, results: List[BMEResult], optimal_theta: float, optimal_idx: int, method: str)[source]

Container for an L-curve theta scan.

theta_values

Scanned theta grid.

Type:: numpy.ndarray

chi_squared_values

Final chi-squared at each theta.

Type:: numpy.ndarray

phi_values

Effective-fraction (phi) at each theta.

Type:: numpy.ndarray

kl_divergence_values

Relative entropy at each theta.

Type:: numpy.ndarray

results

Per-theta fit results.

Type:: list of BMEResult

optimal_theta

Selected theta (L-curve knee).

Type:: float

optimal_idx

Index of optimal_theta in theta_values.

Type:: int

method

Human-readable knee-selection method used.

Type:: str

print_summary()[source]: Print a formatted summary of the theta scan.

soursop.ssbme.theta_scan(observables: List[ExperimentalObservable], calculated_values: ndarray, reweighter: str = 'bme', theta_range: Tuple[float, float] | ndarray = (0.01, 10.0), n_points: int = 15, log_scale: bool = True, initial_weights: ndarray | None = None, method: str = 'perpendicular', verbose: bool = False, fit_kwargs: dict | None = None) → ThetaScanResult[source]

Scan a grid of theta values for BME or iBME and pick the L-curve knee.

Parameters:

observables (list of ExperimentalObservable) – Experimental observables.
calculated_values (numpy.ndarray) – Per-frame calculated values, shape (n_frames, n_observables).
reweighter (str, optional) – "bme" (default) or "ibme".
theta_range (tuple or numpy.ndarray, optional) – (min, max) for a generated grid of n_points, or an explicit 1D array of theta values. Default (0.01, 10.0).
n_points (int, optional) – Number of grid points when theta_range is a tuple. Default 15.
log_scale (bool, optional) – Sample logarithmically when theta_range is a tuple. Default True.
initial_weights (numpy.ndarray, optional) – Prior weights. Uniform if None.
method (str, optional) – Knee-selection method (see find_optimal_theta()).
verbose (bool, optional) – Print per-theta progress. Default False.
fit_kwargs (dict, optional) – Extra keyword arguments forwarded to the reweighter’s fit (e.g. ftol/max_ibme_iterations for iBME).

Return type:

ThetaScanResult

Raises:

SSException – If reweighter is unknown, n_points < 1, or a log-scale grid has non-positive endpoints.