sssampling

Overview

sssampling implements PENGUIN (Pipeline for Evaluating coNformational heteroGeneity in Unstructured proteINs), an algorithmic framework for quantifying per-residue local conformational sampling quality in molecular dynamics or Monte Carlo simulations of intrinsically disordered proteins (IDRs).

The method and its applications are described in full in:

Lotthammer J.M. & Holehouse A.S. “Disentangling Folding from Energetic Traps in Simulations of Disordered Proteins.” J. Chem. Inf. Model. (2025). https://doi.org/10.1021/acs.jcim.4c02005

Why sampling assessment is hard for IDRs

Folded proteins are naturally assessed by convergence of structural metrics around a reference structure. IDRs have no such reference: they occupy a vast, nearly flat energy landscape populated by many distinct low-energy configurations. Two fundamentally different failure modes can therefore afflict IDR simulations:

Energetic trapping / poor sampling — a subregion becomes stuck in one (or several independent but incompatible) local minima, yielding simulation-to-simulation variation that cannot be attributed to sequence-encoded conformational preference.
Local or global folding — a subregion reproducibly collapses to the same specific conformation across all independent replicas, reflecting a genuine (transient or persistent) structural preference rather than simulation artefact.

Global metrics such as \(R_g\) or autocorrelation convergence are insensitive to these local failures: a single poorly-sampled subregion is easily masked by a well-sampled majority. PENGUIN therefore operates at residue-level granularity and is designed to distinguish these two scenarios.

The excluded-volume reference model

PENGUIN leverages the excluded-volume (EV) limiting polymer model as a theoretical upper bound on local conformational heterogeneity. In the EV limit all inter-residue attractive terms are removed from the force field; only steric repulsion, bond geometry, and short-range repulsive potentials are retained. The result is a self-avoiding random coil whose backbone dihedral distributions are maximally broad — effectively a sequence-specific ceiling on what well-sampled disordered behaviour looks like.

Comparing dihedral distributions from a full-Hamiltonian (FH) simulation to the EV reference yields an information-theoretic score for each residue. If the FH and EV distributions are similar, the residue is broadly sampling its available dihedral space. If they are very different, the residue has adopted a specific (or limited) set of dihedrals that deviate from random-coil statistics.

SOURSOP provides two routes to the EV reference:

Run your own EV simulations and supply the resulting trajectory files as reference_list to SamplingQuality.
Use precomputed distributions (the default). SOURSOP ships with tabulated EV dihedral angle distributions for all 8000 possible amino-acid tripeptide contexts (all 20³ combinations of i-1, i, i+1). The PrecomputedDihedralInterface class handles look-up and sampling from these tables, eliminating the need for bespoke EV runs.

The Hellinger distance

PENGUIN uses the Hellinger distance \(H(P, Q)\) to compare dihedral probability distributions. For two discrete probability vectors \(P = (p_1, \ldots, p_k)\) and \(Q = (q_1, \ldots, q_k)\):

\[H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}\]

The distance is bounded in \([0, 1]\): 0 means the distributions are identical; 1 means they have completely disjoint support. The Hellinger distance is symmetric, handles empty bins naturally, and is fast to compute — important for running many replicate comparisons. A 2D variant (simultaneous \(\varphi\)/\(\psi\) comparison) is the default in SOURSOP and is generally preferred because it captures correlated backbone preferences that the marginal 1D distributions can miss.

Two complementary comparisons are performed for each residue:

FH vs. EV — how far are the simulated dihedrals from the maximally heterogeneous reference? Values near 0 indicate EV-like sampling (a necessary and sufficient condition for good sampling); values near 1 indicate restricted dihedrals.
FH vs. FH (all-to-all) — how consistent are the dihedral distributions across independent replicas? Near-zero inter-replica distances indicate reproducible behaviour (either good sampling or consistent folding). Large inter-replica scatter indicates distinct trapped states, i.e., poor sampling.

Together, these two views let you distinguish the three main scenarios:

FH vs. EV	FH vs. FH (all-to-all)	Interpretation
Near 0	Near 0	Well-sampled, disordered
Elevated, consistent	Near 0	Locally folded / transient structure
Elevated, variable	Large	Energetically trapped (poor sampling)

Typical workflow

Most users will interact with SamplingQuality, which accepts a list of replicate trajectory files and (optionally) a list of reference trajectories. If no reference is provided it falls back to the precomputed EV distributions via PrecomputedDihedralInterface:

from soursop.sssampling import SamplingQuality

# 4 independent replicate simulations, all vs. precomputed EV reference
sq = SamplingQuality(
    traj_list=['rep0.xtc', 'rep1.xtc', 'rep2.xtc', 'rep3.xtc'],
    top_file='topology.pdb',
)

# Per-residue Hellinger distances (FH vs. EV and FH vs. FH all-to-all)
fh_vs_ev   = sq.compute_dihedral_hellingers()
fh_vs_fh   = sq.get_all_to_all_2d_trj_comparison()

# 4-panel PENGUIN summary figure
sq.quality_plot()

For cases where you want to supply your own EV trajectories (e.g., from a CAMPARI run in the excluded-volume limit):

sq = SamplingQuality(
    traj_list=['fh_rep0.xtc', 'fh_rep1.xtc'],
    reference_list=['ev_rep0.xtc', 'ev_rep1.xtc'],
    top_file='topology.pdb',
    ref_top='ev_topology.pdb',
)

Low-level distance functions (hellinger_distance(), compute_joint_hellinger_distance(), rel_entropy()) are also available for custom analyses.

SamplingQuality Functions

class soursop.sssampling.SamplingQuality(traj_list: List[str], reference_list: List[str] | None = None, top_file: str = '__START.pdb', ref_top: str | None = None, method: str = '2D angle distributions', bwidth: float = np.float64(0.2617993877991494), proteinID: int = 0, n_cpus: int | None = None, truncate: bool = False, force_sequential: bool = False, **kwargs: dict)[source]

compute_frac_helicity(proteinID: int = 0, recompute: bool = False) → ndarray[source]

Per-residue fractional helicity for every loaded trajectory and reference.

Helicity is taken directly from SSProtein.get_secondary_structure_DSSP() (the helix column of the DSSP summary). If no reference trajectories were supplied, the reference helicity is zeros — the precomputed EV polymer reference is dihedral-only and has no DSSP equivalent.

Results are cached on the instance; pass recompute=True to bypass the cache.

Parameters:

proteinID (int, optional) – Index of the chain in each trajectory’s proteinTrajectoryList. Default 0.
recompute (bool, optional) – If True, ignore any cached result and recompute. Default False.

Returns:

(trj_helicity, ref_helicity) each of shape (n_trajectories, n_residues). When no reference was supplied, ref_helicity is all zeros.

Return type:

tuple of (np.ndarray, np.ndarray)

Example

>>> trj_h, ref_h = sq.compute_frac_helicity()

compute_dihedral_hellingers() → ndarray[source]

Per-residue Hellinger distance between simulated and reference dihedrals.

Behaviour depends on the method set at construction:

'2D angle distributions': histograms the joint (phi, psi) distribution per residue, then computes one joint-Hellinger distance per (trajectory, residue) pair via compute_joint_hellinger_distance(). Returns a shape (n_trajectories, n_residues) array.
'1D angle distributions': histograms phi and psi separately and computes a Hellinger distance for each. Returns a shape (2, n_trajectories, n_residues) array stacked as [phi_hellingers, psi_hellingers].

Returns:: Hellinger distances, shape depending on method (see above).
Return type:: np.ndarray
Raises:: NotImplementedError – If method is not one of the two supported strings.

Example

>>> H = sq.compute_dihedral_hellingers()
>>> H.shape         # for 2D angle distributions
(3, 56)

compute_dihedral_rel_entropy() → ndarray[source]

Per-residue Kullback-Leibler relative entropy between simulated and reference dihedrals.

Histograms phi and psi 1D distributions independently, then computes \(D_{KL}(P || Q)\) per residue via rel_entropy().

Returns:: Array of shape (2, n_trajectories, n_residues) stacked as [phi_rel_entropy, psi_rel_entropy]. Values are in nats.
Return type:: np.ndarray

Example

>>> rel_e = sq.compute_dihedral_rel_entropy()
>>> rel_e.shape
(2, 3, 56)

compute_series_of_histograms_along_axis(data: ndarray, bins: ndarray, axis: int = 0)[source]

2D (phi, psi) PDFs for every (trajectory, residue) pair.

Builds an n_trajectories x n_residues grid of 2D joint histograms (one per residue) and normalises them so each is a probability density. The result is the per-pair PDF stack consumed by compute_dihedral_hellingers() in 2D mode.

Parameters:

data (np.ndarray) – 4D array of shape (2, n_trajectories, n_residues, n_frames) where the leading axis stacks [phi_angles, psi_angles].
bins (np.ndarray) – 1D bin edges shared by both phi and psi axes.
axis (int, optional) – Retained for API compatibility; the function always reduces over the frame axis internally. Default 0.

Returns:

PDFs of shape (n_trajectories, n_residues, len(bins)-1, len(bins)-1). Each [i, j] slice is a normalised joint phi/psi distribution that sums to 1.

Return type:

np.ndarray

Example

>>> pdfs = sq.compute_series_of_histograms_along_axis(
...     np.array([sq.phi_angles, sq.psi_angles]),
...     bins=sq.bins,
... )

compute_pdf(arr: ndarray, bins: ndarray) → ndarray[source]

Per-residue 1D probability density histograms.

Operates on either a 2D (n_residues, n_frames) array or a 3D (n_trajectories, n_residues, n_frames) stack. Each residue-level histogram is normalised by np.histogram(..., density=True) and rescaled by the bin width (in degrees), so each row sums to ~1.

Parameters:

arr (np.ndarray) – 2D or 3D angle array. The last axis is the frame axis.
bins (np.ndarray) – 1D array of bin edges.

Returns:

Input 2D -> output (n_residues, len(bins) - 1).
Input 3D -> output (n_trajectories, n_residues, len(bins) - 1).

Return type:

np.ndarray

Example

>>> phi_pdfs = sq.compute_pdf(sq.phi_angles, bins=sq.bins)

get_all_to_all_2d_trj_comparison(metric: str = 'hellingers', recompute=False) → Tuple[pandas.DataFrame][source]

All-vs-all 2D joint-dihedral Hellinger distances across trajectories.

Histograms the joint (phi, psi) distribution per residue for every loaded trajectory, then forms every pairwise trajectory combination (using itertools.combinations) and computes a per-residue Hellinger distance for each pair. With a single trajectory this degenerates to a 1:1 self-comparison (which is always zero) and is useful only as a sanity check.

Parameters:

metric (str, optional) – Currently only 'hellingers' is implemented. Default 'hellingers'.
recompute (bool, optional) – Currently unused (accepted for API symmetry with get_all_to_all_trj_comparisons()). Default False.

Returns:

Shape (n_combinations, n_residues) of pairwise per-residue Hellinger distances in [0, 1].

Return type:

np.ndarray

Example

>>> mat = sq.get_all_to_all_2d_trj_comparison()
>>> mat.shape   # 3 trajs -> C(3,2) == 3 pairs
(3, 56)

get_all_to_all_trj_comparisons(metric: str = 'hellingers', recompute=False) → Tuple[pandas.DataFrame, pandas.DataFrame][source]

All-vs-all per-residue dihedral comparisons (separate phi and psi).

Builds the per-residue 1D phi and psi PDFs for every trajectory, enumerates every pairwise trajectory combination, and computes a per-residue Hellinger distance or relative entropy for each pair. The two dihedrals are kept separate (unlike get_all_to_all_2d_trj_comparison()).

Parameters:

metric ({'hellingers', 'relative entropy'}, optional) – Which divergence to compute. Default 'hellingers'.
recompute (bool, optional) – If True, ignore cached PDFs from self.trj_pdfs and rebuild them. Default False.

Returns:

(phi_df, psi_df) each of shape (n_combinations, n_residues) containing the chosen metric for every pairwise comparison.

Return type:

tuple of (pd.DataFrame, pd.DataFrame)

Raises:

NotImplementedError – If metric is not one of the two supported strings.

Example

>>> phi_df, psi_df = sq.get_all_to_all_trj_comparisons()

get_degree_bins() → ndarray[source]

Histogram bin edges spanning [-180, 180] degrees.

Constructs the bin edges used by every histogram-based method on this class. Uses self.bwidth (in radians) converted to degrees and rounded to handle floating-point error so the final edge lands cleanly on 180.

Returns:: 1D array of bin edges in degrees, monotonically increasing, starting at -180 and ending at 180.
Return type:: np.ndarray

Example

>>> sq.get_degree_bins()       # bwidth = 15 degrees
array([-180., -165., ...,  165.,  180.])

quality_plot(increment: int = 5, figsize: Tuple[int, int] = (7, 5), dpi: int = 400, panel_labels: bool = False, fontsize: int = 10, save_dir: str | None = None, dihedral: None | str = '2D', figname: str = 'hellingers.pdf')[source]

Plot a four-panel sampling-quality summary figure.

The four panels are:

A - per-residue Hellinger distance vs. the chosen reference (e.g. excluded-volume limit) with per-trajectory points and across-trajectory mean.
B - per-residue all-vs-all trajectory Hellinger distances.
C - fractional helicity (simulated trajectories + reference).
D - paired comparison panel (configurable).

Layout is mosaic "AABB;CCDD". The chosen dihedral selector controls which of phi / psi / joint 2D is shown.

Parameters:

increment (int, optional) – X-axis tick stride (residues). Default 5.
figsize (tuple of (int, int), optional) – Figure dimensions in inches. Default (7, 5).
dpi (int, optional) – Output DPI for savefig. Default 400.
panel_labels (bool, optional) – If True, add A/B/C/D panel labels for manuscript figures. Default False.
fontsize (int, optional) – Font size used for tick labels, titles, and axis labels. Default 10.
save_dir (str or None, optional) – If given, write the figure to <save_dir>/<figname>. Default None (no file written; figure is returned only).
dihedral ({'2D', 'phi', 'psi'} or None, optional) – Which dihedral comparison to plot. '2D' requires method='2D angle distributions'. Default '2D'.
figname (str, optional) – File name (joined with save_dir). Default 'hellingers.pdf'.

Returns:

(fig, axd) — the matplotlib figure and the mosaic Axes dictionary keyed by 'A', 'B', 'C', 'D'.

Return type:

tuple

Raises:

ValueError – If method='1D angle distributions' is paired with dihedral='2D'.
NotImplementedError – If a requested combination of method and dihedral isn’t yet supported.

Example

>>> fig, axd = sq.quality_plot(dihedral='phi', save_dir='./figs')

trj_pdfs(dihedral: str = 'joint', recompute: bool = False)[source]

Per-residue PDFs from the simulated trajectories’ dihedral angles.

Builds (and memoises) the three PDF stacks used by Hellinger / relative-entropy calculations against the trajectories:

'trj_phi_pdfs' — 1D phi histogram per residue.
'trj_psi_pdfs' — 1D psi histogram per residue.
'joint' (default) — 2D (phi, psi) histogram per residue.

On every call all three are populated on the cache; the selector determines which is returned. recompute=True bypasses the cache.

Parameters:

dihedral ({'joint', 'trj_phi_pdfs', 'trj_psi_pdfs'}, optional) – Which PDF stack to return. Default 'joint'.
recompute (bool, optional) – If True, ignore any cached PDFs and rebuild. Default False.

Returns:

For 'trj_phi_pdfs' / 'trj_psi_pdfs': (n_trajectories, n_residues, n_bins).
For 'joint': (n_trajectories, n_residues, n_bins, n_bins).

Return type:

np.ndarray

Raises:

NotImplementedError – If dihedral is not one of the three allowed strings.

Example

>>> phi_pdfs = sq.trj_pdfs(dihedral='trj_phi_pdfs')

ref_pdfs(dihedral='joint', recompute=False)[source]

Per-residue PDFs from the reference trajectories’ dihedral angles.

The reference analogue of trj_pdfs(). The three accepted selectors here are 'ref_phi_pdfs', 'ref_psi_pdfs', and 'joint' (default). When no reference trajectories were supplied to the constructor, the reference angles came from the precomputed excluded-volume polymer model.

Parameters:

dihedral ({'joint', 'ref_phi_pdfs', 'ref_psi_pdfs'}, optional) – Which PDF stack to return. Default 'joint'.
recompute (bool, optional) – If True, ignore any cached PDFs and rebuild. Default False.

Returns:

For 'ref_phi_pdfs' / 'ref_psi_pdfs': (n_trajectories, n_residues, n_bins).
For 'joint': (n_trajectories, n_residues, n_bins, n_bins).

Return type:

np.ndarray

Raises:

NotImplementedError – If dihedral is not one of the three allowed strings.

Example

>>> ref_phi = sq.ref_pdfs(dihedral='ref_phi_pdfs')

hellingers_distances(recompute=False)[source]

Cached accessor for per-residue Hellinger distances.

Thin wrapper around compute_dihedral_hellingers() that memoises the result on first call. Pass recompute=True to invalidate the cache and force a fresh computation.

Parameters:

recompute (bool, optional) – If True, rebuild the Hellinger distances from scratch. Default False.

Returns:

Shape and meaning depend on the SamplingQuality method:

'2D angle distributions': (n_trajectories, n_residues) joint Hellinger distances.
'1D angle distributions': (2, n_trajectories, n_residues) stacked as [phi_hellingers, psi_hellingers].

Return type:

np.ndarray

Example

>>> H = sq.hellingers_distances()

fractional_helicity(recompute=False)[source]

Cached accessor for per-residue fractional helicity.

Thin wrapper around compute_frac_helicity() that returns results from the instance cache if available. The recompute=True flag is forwarded so the underlying computation re-runs.

Parameters:: recompute (bool, optional) – If True, bypass the cache and recompute. Default False.
Returns:: (trj_helicity, ref_helicity) each of shape (n_trajectories, n_residues).
Return type:: tuple of (np.ndarray, np.ndarray)

Example

>>> trj_h, ref_h = sq.fractional_helicity()

PrecomputedDihedralInterface Functions

class soursop.sssampling.PrecomputedDihedralInterface(sequence, bins, num_trajs, nsamples)[source]

Reference dihedral provider backed by the precomputed EV polymer model.

Used by SamplingQuality when no explicit reference_list of reference trajectories is supplied. For each residue the appropriate phi / psi distribution is looked up from the excluded-volume polymer reference tables in ssdata, then inverse-CDF sampled to produce a synthetic per-trajectory angle array of the same shape as the simulated trajectories’ dihedral arrays — so the downstream Hellinger and relative-entropy code treats it identically.

Parameters:

sequence (str) – Single-letter amino-acid sequence of the trajectory chain (caps already stripped).
bins (np.ndarray) – Histogram bin edges (in degrees) used for the inverse-CDF sampling. Should match the SamplingQuality instance’s bins.
num_trajs (int) – Number of simulated-trajectory replicas to mimic. The precomputed angles are tiled across this dimension.
nsamples (int) – Number of synthetic frames to generate per replica.

ref_phi_angles, ref_psi_angles

Inverse-CDF-sampled reference angles of shape (num_trajs, n_residues, nsamples).

Type:: np.ndarray

Example

>>> from soursop.sssampling import PrecomputedDihedralInterface
>>> ev = PrecomputedDihedralInterface(
...     sequence='AAAAAAAA', bins=np.arange(-180, 181, 15),
...     num_trajs=3, nsamples=1000,
... )
>>> ev.ref_phi_angles.shape
(3, 8, 1000)

sample_angles(angle)[source]

Inverse-CDF sample reference dihedrals to match a target sample count.

Builds a per-residue histogram from the precomputed reference angles, normalises it to a CDF, and inverse-CDF samples self.nsamples synthetic angles per residue using numpy.random. Used at construction time to populate ref_phi_angles and ref_psi_angles.

Parameters:: angle ({'phi', 'psi'}) – Which backbone dihedral to sample.
Returns:: Array of shape (n_residues, self.nsamples) of synthetic dihedral angles (in degrees) drawn from the precomputed reference distributions.
Return type:: np.ndarray

Example

>>> ev.sample_angles('phi').shape
(8, 1000)

gather_phi_reference_dihedrals(sequence: str) → ndarray[source]

Look up the excluded-volume reference phi distribution for each residue.

For each position i, the relevant phi distribution depends on the chemical context of residue i-1 (the residue preceding the rotatable phi bond). The lookup maps the preceding residue to its EV-table key via EV_RESIDUE_MAPPER and pulls the distribution from PHI_EV_ANGLES_DICT. For position 0 we substitute alanine as the preceding context.

Parameters:: sequence (str) – One-letter amino-acid sequence (no caps).
Returns:: Array of shape (n_residues, n_reference_samples) where each row is the EV reference phi distribution for that residue.
Return type:: np.ndarray

Example

>>> ev.gather_phi_reference_dihedrals('AAAA').shape
(4, 50000)

gather_psi_reference_dihedrals(sequence: str) → ndarray[source]

Look up the excluded-volume reference psi distribution for each residue.

Mirror of gather_phi_reference_dihedrals() for psi: the distribution at position i depends on the following residue i+1 (because psi rotates the i-(i+1) bond). For the last position we substitute alanine as the following context. Reference distributions come from PSI_EV_ANGLES_DICT.

Parameters:: sequence (str) – One-letter amino-acid sequence (no caps).
Returns:: Array of shape (n_residues, n_reference_samples) where each row is the EV reference psi distribution for that residue.
Return type:: np.ndarray

Example

>>> ev.gather_psi_reference_dihedrals('AAAA').shape
(4, 50000)

Stand-alone sssampling functions

soursop.sssampling.rel_entropy(p: ndarray, q: ndarray) → ndarray[source]

Kullback-Leibler relative entropy \(D_{KL}(P || Q)\).

Computed via scipy.special.rel_entr (which handles p == 0 and q == 0 correctly), summed along the last axis. Asymmetric in p and q; the result is always non-negative and is 0 only when p == q almost everywhere.

Parameters:

p (np.ndarray) – Probability distributions of identical shape. The last axis is the distribution axis; leading axes are broadcast.
q (np.ndarray) – Probability distributions of identical shape. The last axis is the distribution axis; leading axes are broadcast.

Returns:

Relative entropy values with shape p.shape[:-1], in nats.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> from soursop.sssampling import rel_entropy
>>> rel_entropy(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
0.0
>>> rel_entropy(np.array([0.9, 0.1]), np.array([0.5, 0.5]))
0.368

soursop.sssampling.hellinger_distance(p: ndarray, q: ndarray) → ndarray[source]

Hellinger distance(s) between pairs of 1D probability distributions.

For each pair the distance is

\[H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} (\sqrt{p_i} - \sqrt{q_i})^2}\]

which lies in [0, 1]. The reduction is taken along the last axis of p / q, so passing higher-rank arrays computes one distance per leading-axis slice.

Parameters:

p (np.ndarray) – Probability distributions of identical shape. The last axis is treated as the distribution axis; any leading axes are broadcast.
q (np.ndarray) – Probability distributions of identical shape. The last axis is treated as the distribution axis; any leading axes are broadcast.

Returns:

Hellinger distance(s) with shape p.shape[:-1].

Return type:

np.ndarray

Example

>>> import numpy as np
>>> from soursop.sssampling import hellinger_distance
>>> hellinger_distance(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
0.0
>>> # per-residue distances for an (n_residues, n_bins) PDF stack
>>> pdf_a = np.full((10, 20), 1/20)
>>> pdf_b = np.full((10, 20), 1/20)
>>> hellinger_distance(pdf_a, pdf_b).shape
(10,)

soursop.sssampling.compute_joint_hellinger_distance(p, q)[source]

Hellinger distance between two joint probability distributions.

Defined via the Bhattacharyya coefficient \(BC = \sum_i \sqrt{p_i q_i}\) as \(H = \sqrt{1 - BC}\). The normalisation factor of \(1/\sqrt{2}\) used in hellinger_distance() is omitted here because the inputs are already joint (2D) probability surfaces that sum to 1.

Parameters:

p (array_like) – Joint probability distributions of the same shape. Each must sum to 1 (within numerical tolerance) for the result to be a valid Hellinger distance.
q (array_like) – Joint probability distributions of the same shape. Each must sum to 1 (within numerical tolerance) for the result to be a valid Hellinger distance.

Returns:

Hellinger distance in [0, 1]; 0 means identical distributions and 1 means disjoint supports.

Return type:

float

Example

>>> import numpy as np
>>> from soursop.sssampling import compute_joint_hellinger_distance
>>> p = np.eye(2) / 2
>>> compute_joint_hellinger_distance(p, p)
0.0