Welcome to the SOURSOP Documentation!
Last updated: June 2026
SOURSOP (Simulation analysis Of Unfolded RegionS Of Proteins) is a Python package for the analysis of all-atom and coarse-grained simulations of unfolded and disordered proteins. It provides a wide range of functionality that may not be relevant for folded proteins but is essential for extracting polymer-physics insight from simulations of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). SOURSOP was formerly CAMPARITraj, which was formerly CTraj, and includes all the original functionality therein.
The SOURSOP GitHub page can be accessed here: https://github.com/holehouse-lab/soursop.
SOURSOP was built with the CAMPARI simulation engine in mind, but has been successfully tested on a wide range of trajectories generated by different software packages. It uses the mdtraj (https://mdtraj.org) backend for trajectory reading and representation, and focusses on analysis routines for characterizing ensembles of disordered and unfolded proteins through the lens of polymer physics. It works for both all-atom and one-bead-per-residue coarse-grained ensembles.
Why SOURSOP?
Most molecular-simulation analysis tools are oriented towards folded proteins (RMSD to a native state, secondary-structure stability, binding-pocket geometry). Disordered proteins have no single reference structure, so the questions - and the right observables - are different. SOURSOP focusses on the ensemble- and polymer-centric quantities that are meaningful for IDPs/IDRs:
Global dimensions - radius of gyration, hydrodynamic radius, end-to-end distance, asphericity, the gyration tensor, and the dimensionless size parameter \(\langle t \rangle\).
Polymer scaling - internal-scaling profiles, apparent scaling exponents (with frame-level bootstrap confidence intervals and a reduced-\(\chi^2\) fit-quality estimate), and homopolymer-deviation maps.
Distance & contact maps - mean / RMS inter-residue distance maps and fractional contact maps, including fast inter-chain maps for multi-chain systems.
Local structure - DSSP and BBSEG2 secondary structure, dihedral angles and dihedral mutual information, sliding-window local heterogeneity and local collapse.
Solvent exposure - per-residue / per-atom / sidechain / backbone SASA and regional accessibility.
NMR & PRE observables - sequence-corrected random-coil chemical shifts, backbone ³J(HN, Hα) scalar couplings, and per-frame NOE distances (ssnmr), plus synthetic paramagnetic relaxation enhancement profiles (sspre) for direct comparison with experiment.
HDX protection factors - per-residue Best-Vendruscolo ln(P) from heavy-atom contacts and backbone H-bonds (sshdx), ready for reweighting against experimental HDX data.
Sampling quality - assessment of ensemble convergence via the PENGUIN tools in sssampling.
Ensemble reweighting - every ensemble-average observable accepts an optional per-frame weights vector, applied consistently and deterministically, for re-weighted / enhanced-sampling / maximum-entropy ensembles (see Ensemble reweighting (frame weights)). Weights can be derived directly from experimental data by Bayesian Maximum Entropy / iterative BME (see ssbme) or by COPER / iterative COPER (see sscoper) reweighting.
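What a deterministic, frame-weighted ensemble average amounts to can be sketched in a few lines of numpy. This is an illustration only, not SOURSOP's implementation: the function name weighted_ensemble_mean is hypothetical, and normalising the weights to sum to one is an assumption made here for clarity.

```python
import numpy as np


def weighted_ensemble_mean(values, weights=None):
    """Deterministic frame-weighted mean of a per-frame observable.

    values  : array of shape (n_frames,) or (n_frames, ...)
    weights : optional 1D array with one non-negative weight per frame
    """
    values = np.asarray(values, dtype=float)
    if weights is None:
        # no weights: plain arithmetic mean over frames
        return values.mean(axis=0)

    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.shape[0] != values.shape[0]:
        raise ValueError("weights must be 1D with one entry per frame")
    if not np.isfinite(w).all() or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be finite, non-negative and not all zero")

    # assumption of this sketch: weights are normalised to sum to one
    w = w / w.sum()
    # contract the frame axis, so per-residue observables work as well
    return np.tensordot(w, values, axes=(0, 0))


rg = np.array([20.0, 22.0, 30.0])       # toy per-frame Rg values
w = np.array([0.5, 0.25, 0.25])         # toy per-frame weights
print(weighted_ensemble_mean(rg))       # 24.0
print(weighted_ensemble_mean(rg, w))    # 23.0
```

Because the weights enter as a fixed linear combination rather than through random resampling of frames, repeated calls with the same inputs give the same number.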
In addition to the pre-built analyses, SOURSOP gives easy and rapid access to all inter-residue and inter-atomic distances, so custom observables are straightforward to build.
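As a hedged illustration of the kind of custom observable that can be built from inter-residue distances, the sketch below computes an internal-scaling profile and fits an apparent scaling exponent by a log-log least-squares fit. It works on a plain numpy coordinate array rather than on SOURSOP objects; the function names are hypothetical, and fitting over all sequence separations with equal weight is a simplifying assumption that need not match what get_scaling_exponent does.

```python
import numpy as np


def internal_scaling(coords):
    """RMS inter-residue distance as a function of sequence separation |i-j|.

    coords : array of shape (n_frames, n_residues, 3), one position per residue
    Returns (separations, rms_distance).
    """
    coords = np.asarray(coords, dtype=float)
    n_res = coords.shape[1]
    seps = np.arange(1, n_res)
    rms = np.empty(seps.size)
    for k, s in enumerate(seps):
        # displacement vectors for every residue pair separated by s, all frames
        d = coords[:, s:, :] - coords[:, :-s, :]
        rms[k] = np.sqrt(np.mean(np.sum(d * d, axis=-1)))
    return seps, rms


def fit_scaling_exponent(seps, rms):
    """Fit ln(r) = ln(A0) + nu * ln(|i-j|); returns (nu, A0)."""
    nu, ln_a0 = np.polyfit(np.log(seps), np.log(rms), 1)
    return nu, np.exp(ln_a0)


# synthetic ideal (random-walk) chain: 200 frames, 60 beads
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(size=(200, 60, 3)), axis=1)

seps, rms = internal_scaling(coords)
nu, a0 = fit_scaling_exponent(seps, rms)
print(f"apparent nu = {nu:.2f}, prefactor A0 = {a0:.2f}")
```

For an ideal chain the fitted exponent should come out close to 0.5, while a rigid rod gives exactly 1. With a real trajectory, the coordinate array would be replaced by one position per residue (for example the CA atoms) taken from the loaded ensemble.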
Quickstart
pip install soursop # or: uv pip install soursop
from soursop.sstrajectory import SSTrajectory
# read a trajectory (trajectory file + topology/PDB file)
traj = SSTrajectory('traj.xtc', 'start.pdb')
# each protein chain is an SSProtein object
protein = traj.proteinTrajectoryList[0]
# per-frame radius of gyration, and the ensemble mean
rg = protein.get_radius_of_gyration()
print(rg.mean())
# a re-weighted ensemble average (e.g. from an MSM / MaxEnt reweighting);
# my_weights is a user-supplied array with one weight per frame
rg_reweighted = protein.get_radius_of_gyration(weights=my_weights)
# mean inter-residue distance map and a contact map
dmap, dstd = protein.get_distance_map()
cmap, corder = protein.get_contact_map()
See Overview for the core concepts (SSTrajectory vs. SSProtein, multi-chain systems) and Examples for eleven worked, end-to-end IDP analyses.
Documentation map
Overview - core concepts and how the pieces fit together.
Installation - install with pip, uv, or conda (PyPI or GitHub), and how to run the tests.
Examples - worked, copy-pasteable IDP analysis recipes.
Ensemble reweighting (frame weights) - the consistent ensemble-reweighting (frame weights) system and the shared validation helpers.
Development - extending SOURSOP, the plugin system, and contributing.
Module API references - sstrajectory, ssprotein, ssnmr, sspre, sssampling, ssbme, sscoper, sshdx.
Citing SOURSOP
If you use SOURSOP in your work, please cite:
Lalmansingh, J. M., Keeley, A. T., Ruff, K. M., Pappu, R. V. & Holehouse, A. S. SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins. J. Chem. Theory Comput. (2023). doi:10.1021/acs.jctc.3c00190.
Reporting bugs & requesting features
Please report bugs, typos, or unexpected behaviour on the GitHub issue tracker. Contributions are welcome - see Development for the plugin workflow and contribution guidelines.
Changelog
Update: June 2026 (2.0.0) A major release. Highlights:
Bug fixes across ssnmr (phospho-residue glycine corrections), ssprotein (native-contacts NaN/overflow, glycine sidechain contacts, cluster centroid), sstrajectory (interchain-map cap-residue handling), ssutils (thread count on Apple Accelerate) and ssmutualinformation (a latent calc_MI weighted-path bug).
Polymer-scaling error propagation - get_scaling_exponent (and the fit used by get_polymer_scaled_distance_map) now reports a proper, dimensionally consistent reduced \(\chi^2\) and frame-level bootstrap confidence intervals for \(\nu\) / \(A_0\) (replacing the earlier min/max range over disjoint data chunks). This also fixes a broken log-log \(\chi^2\) residual and a latent ragged-array crash on short/few-frame trajectories. API change: subdivision_batch_size is replaced by n_bootstrap and confidence_interval, and return slots 3–6 are now confidence-interval bounds (nu_ci_low/high, A0_ci_low/high).
Performance - large behaviour-preserving (byte-identical) speed-ups: the inter-chain distance/contact maps in sstrajectory and the \(O(n^2)\) CA-mode polymer-scaling loops in ssprotein (get_internal_scaling, get_scaling_exponent, get_local_to_global_correlation); a new stride parameter on get_interchain_contact_map / get_interchain_distance.
Consistent ensemble reweighting - every function returning an ensemble-average value now accepts an optional per-frame weights vector, applied deterministically (no stochastic resampling) and validated by a single shared ssutils.validate_weights; stride + weights now work together correctly. See Ensemble reweighting (frame weights).
Reweighting against experimental data - two new modules derive frame weights from experimental observables: ssbme (Bayesian/MaxEnt BME, the iterative scale/offset iBME, and a vector/matrix variant BMECustom that accepts an arbitrary user goodness-of-fit function; see ssbme) and sscoper (Convex Optimization for Ensemble Reweighting COPER and iCOPER, a hard-χ²-constraint alternative that also reports infeasibility; see sscoper). The two share an identical ExperimentalObservable interface via ssutils.
Scalar (J) couplings in ssnmr - new compute_J3_HN_HA computes the backbone ³J(HN, Hα) scalar coupling per frame per residue from the φ dihedral via the Karplus relation (six literature parameterisations: Bax2007/Bax1997/Ruterjans1999/Habeck/Vuister/Pardi, ported from biceps). A generic karplus(...) evaluator is also exposed for arbitrary Karplus-form coefficients. Output shape (n_frames, n_phi) is the natural input for the BME / COPER reweighters; see ssnmr.
NOE distances in ssnmr - new compute_NOE_distances returns per-frame inter-atom distances (Å) for arbitrary atom pairs, and noe_ensemble_average collapses them via the NOE \(\langle r^{-p}\rangle^{-1/p}\) convention (default \(p = 6\)). The per-frame matrix is BME/COPER-ready against experimental NOE intensities; see ssnmr.
HDX protection factors (new sshdx module) - per-residue ln(P) via the Best-Vendruscolo formula \(\ln P_i = \beta_c N_c(i) + \beta_h N_h(i) + \beta_0\) from per-residue heavy-atom contacts (compute_Nc()) and backbone H-bond counts (compute_Nh(), via mdtraj.wernet_nilsson()). Drop-in input for reweighting against experimental HDX protection factors; see sshdx.
Testing - new test_weights.py (~68 parametrized tests), test_ssbme.py and test_sscoper.py (BME/iBME and COPER/iCOPER), plus the pickle-based regression suite (~600+ observable tests) and extended sstrajectory coverage.
Documentation - full numpy-style docstring rewrite across all major modules, narrative overviews on every page, rewritten worked examples (including BME and COPER reweighting), new bme and coper API pages, a new ensemble-reweighting page, an expanded front page, and a worked SAXS-reweighting demo (four notebooks under demo_examples/).
Packaging - reconciled dependency manifests, refreshed CI (Python 3.9-3.14), removed dead config.
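Two of the closed-form expressions quoted in the 2.0.0 notes, the NOE average \(\langle r^{-p}\rangle^{-1/p}\) and the Best-Vendruscolo \(\ln P_i = \beta_c N_c(i) + \beta_h N_h(i) + \beta_0\), can be written out in plain numpy. The sketch below is illustrative and does not call SOURSOP: the function names are hypothetical, and the default coefficients (\(\beta_c = 0.35\), \(\beta_h = 2.0\), \(\beta_0 = 0\)) are the values commonly quoted from the original Best-Vendruscolo work, assumed here rather than taken from sshdx.

```python
import numpy as np


def noe_average_sketch(distances, p=6, weights=None):
    """Collapse per-frame distances to <r^-p>^(-1/p), one value per atom pair.

    distances : array of shape (n_frames, n_pairs)
    weights   : optional 1D per-frame weights (hypothetical argument)
    """
    r = np.asarray(distances, dtype=float)
    mean_inv = np.average(r ** (-p), axis=0, weights=weights)
    return mean_inv ** (-1.0 / p)


def ln_protection_factor_sketch(n_c, n_h, beta_c=0.35, beta_h=2.0, beta_0=0.0):
    """ln P_i = beta_c * N_c(i) + beta_h * N_h(i) + beta_0 (per residue)."""
    n_c = np.asarray(n_c, dtype=float)
    n_h = np.asarray(n_h, dtype=float)
    return beta_c * n_c + beta_h * n_h + beta_0


# two frames, two atom pairs (toy distances in Angstrom)
r = np.array([[4.0, 5.0],
              [6.0, 5.0]])
print(noe_average_sketch(r))   # first pair is pulled below its arithmetic mean of 5

# ensemble-averaged contact and H-bond counts for two residues (toy numbers)
print(ln_protection_factor_sketch([10.0, 4.0], [1.0, 0.0]))
```

The \(r^{-p}\) average is dominated by the shortest distances, so a pair that is only occasionally close can still yield a short effective NOE distance. The ln(P) expression is linear in the counts, so applying it to ensemble-averaged \(N_c\) and \(N_h\) gives the same result as averaging per-frame values.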
Update: November 2024 (0.2.6) Transition to pyproject.toml for packaging and versioning. Dramatically improved performance of loading coarse-grained ensembles.
Update: July 2024 Official inclusion of the PENGUIN code into sssampling is now complete ahead of preprint!
Update: July 2023 Added explicit_residue_checking option to SSTrajectory constructor to make parsing solvated .gro files or files where non-protein molecules are included in the same chain possible and easy.
Update: Jan. 2023 Finalization of code and documentation ahead of preprint deposition.
Update: Sept. 2022 Additional tests and docs updates ahead of preprint.
Update: April 2022 Moved SOURSOP onto PyPI in anticipation of the final release. Additional tests, code clean up etc.
Update: March 2022 Numerous updates to internal code documentation, removal of soursop_cli, update to pip install over git to use https.
Update: July 2021 The anticipated release of soursop is August 2021! We have nearly finished all testing and are finalizing the associated manuscript.
About
SOURSOP was built by Jared Lalmansingh (Pappu lab) and Alex Holehouse. Its development was supported financially and intellectually by the Molecular Sciences Software Institute (MOLSSI). It was also supported by NSF grant no. 2128068 to Alex, and we thank members of the Water and Life Interface Institute (WALII), supported by NSF DBI grant #2213983, for helpful discussions.