Welcome to the SOURSOP Documentation!

Last updated: May 2026

SOURSOP (Simulation analysis Of Unfolded RegionS Of Proteins) is a Python package for the analysis of all-atom and coarse-grained simulations of unfolded and disordered proteins. It provides a wide range of functionality that may not be relevant for folded proteins but is essential for extracting polymer-physics insight from simulations of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). SOURSOP was formerly named CAMPARITraj (and, before that, CTraj) and retains all of the original functionality.

The SOURSOP GitHub page can be accessed here: https://github.com/holehouse-lab/soursop.

SOURSOP was built with the CAMPARI simulation engine in mind, but has been successfully tested on a wide range of trajectories generated by different software packages. It uses the mdtraj (http://mdtraj.org) backend for trajectory reading and representation, and focusses on analysis routines for characterizing ensembles of disordered and unfolded proteins through the lens of polymer physics. It works for both all-atom and coarse-grained (one-bead-per-residue) ensembles.

Why SOURSOP?

Most molecular-simulation analysis tools are oriented towards folded proteins (RMSD to a native state, secondary-structure stability, binding-pocket geometry). Disordered proteins have no single reference structure, so the questions - and the right observables - are different. SOURSOP focusses on the ensemble- and polymer-centric quantities that are meaningful for IDPs/IDRs:

  • Global dimensions - radius of gyration, hydrodynamic radius, end-to-end distance, asphericity, the gyration tensor, and the dimensionless size parameter \(\langle t \rangle\).

  • Polymer scaling - internal-scaling profiles, apparent scaling exponents (with bootstrap error estimation), and homopolymer-deviation maps.

  • Distance & contact maps - mean / RMS inter-residue distance maps and fractional contact maps, including fast inter-chain maps for multi-chain systems.

  • Local structure - DSSP and BBSEG2 secondary structure, dihedral angles and dihedral mutual information, sliding-window local heterogeneity and local collapse.

  • Solvent exposure - per-residue / per-atom / sidechain / backbone SASA and regional accessibility.

  • NMR & PRE observables - sequence-corrected random-coil chemical shifts (ssnmr) and synthetic paramagnetic relaxation enhancement profiles (sspre) for direct comparison with experiment.

  • Sampling quality - assessment of ensemble convergence via the PENGUIN tools in sssampling.

  • Ensemble reweighting - every ensemble-average observable accepts an optional per-frame weights vector, applied consistently and deterministically, for re-weighted / enhanced-sampling / maximum-entropy ensembles (see Ensemble reweighting (frame weights)).
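As a rough illustration of the global-dimension quantities listed above, the sketch below computes a radius of gyration and an asphericity from the gyration tensor of a single conformation using plain numpy. It is not SOURSOP's implementation: the helper names are hypothetical, the centroid is unweighted (no atomic masses), and the asphericity formula is one common definition that may differ in detail from what the package uses.

```python
import numpy as np

def gyration_tensor(coords):
    """Gyration tensor of one conformation; coords has shape (n_atoms, 3)."""
    centered = coords - coords.mean(axis=0)   # unweighted centroid (assumption)
    return centered.T @ centered / len(coords)

def radius_of_gyration(coords):
    """Rg is the square root of the trace of the gyration tensor."""
    return float(np.sqrt(np.trace(gyration_tensor(coords))))

def asphericity(coords):
    """Close to 0 for a spherically symmetric shape, 1 for a perfect rod."""
    l1, l2, l3 = np.linalg.eigvalsh(gyration_tensor(coords))
    return float(1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / (l1 + l2 + l3) ** 2)

# toy conformation: 10 beads on a straight line, 3.8 length units apart
rod = np.zeros((10, 3))
rod[:, 0] = 3.8 * np.arange(10)

rod_rg = radius_of_gyration(rod)
rod_asphericity = asphericity(rod)   # rod limit, so this is 1 up to rounding
print(rod_rg, rod_asphericity)
```

A straight rod is a convenient sanity check because only one eigenvalue of the tensor is non-zero, which drives the asphericity to its upper limit.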

In addition to the pre-built analyses, SOURSOP gives easy and rapid access to all inter-residue and inter-atomic distances, so custom observables are straightforward to build.
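Custom observables of this kind usually reduce to array operations on per-frame coordinates or distances. As a minimal sketch, assuming one position per residue has already been extracted into a numpy array of shape (n_frames, n_residues, 3), the code below builds an RMS internal-scaling profile and fits an apparent scaling exponent in log-log space. The function names and the synthetic random-walk ensemble are hypothetical stand-ins; SOURSOP's own get_internal_scaling and get_scaling_exponent may differ in fitting range, averaging, and error estimation.

```python
import numpy as np

def internal_scaling_rms(coords):
    """RMS inter-residue distance for each sequence separation |i - j|.

    coords: array of shape (n_frames, n_residues, 3).
    Returns (separations, rms_distances).
    """
    n_res = coords.shape[1]
    seps = np.arange(1, n_res)
    rms = np.empty(len(seps))
    for k, s in enumerate(seps):
        # vectors between every residue pair (i, i + s), in every frame
        d = coords[:, s:, :] - coords[:, :-s, :]
        rms[k] = np.sqrt((d ** 2).sum(axis=-1).mean())
    return seps, rms

def apparent_scaling_exponent(seps, rms):
    """Least-squares fit of log(rms) = nu * log(sep) + log(prefactor)."""
    nu, log_prefactor = np.polyfit(np.log(seps), np.log(rms), 1)
    return float(nu), float(np.exp(log_prefactor))

# synthetic ensemble: freely jointed chains with a fixed bond length
rng = np.random.default_rng(0)
n_frames, n_res, bond = 2000, 40, 3.8
steps = rng.normal(size=(n_frames, n_res - 1, 3))
steps *= bond / np.linalg.norm(steps, axis=-1, keepdims=True)  # random directions
coords = np.concatenate(
    [np.zeros((n_frames, 1, 3)), np.cumsum(steps, axis=1)], axis=1
)

seps, rms = internal_scaling_rms(coords)
nu, prefactor = apparent_scaling_exponent(seps, rms)
print(nu)   # an ideal random walk should give a value close to 0.5
```

Looping over separations rather than materializing the full (n_frames, n_residues, n_residues) distance array keeps memory use modest for long chains.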

Quickstart

Install from PyPI:

pip install soursop          # or:  uv pip install soursop

Then, in Python:

from soursop.sstrajectory import SSTrajectory

# read a trajectory (trajectory file + topology/PDB file)
traj = SSTrajectory('traj.xtc', 'start.pdb')

# each protein chain is an SSProtein object
protein = traj.proteinTrajectoryList[0]

# per-frame radius of gyration, and the ensemble mean
rg = protein.get_radius_of_gyration()
print(rg.mean())

# a re-weighted ensemble average (e.g. from an MSM / MaxEnt reweighting);
# my_weights is a user-supplied vector with one weight per frame
rg_reweighted = protein.get_radius_of_gyration(weights=my_weights)

# mean inter-residue distance map and a contact map
dmap, dstd = protein.get_distance_map()
cmap, corder = protein.get_contact_map()

See Overview for the core concepts (SSTrajectory vs. SSProtein, multi-chain systems) and Examples for nine worked, end-to-end IDP analyses.
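The Quickstart passes a my_weights vector without showing where it comes from. As a hedged sketch of the idea, the code below turns hypothetical per-frame log-weights into a normalized weight vector and computes a deterministic weighted mean of a per-frame observable. The helper names and numbers are illustrative assumptions, and the exact conventions SOURSOP enforces (for example, whether weights must sum to one) are described on the ensemble-reweighting page, not here.

```python
import numpy as np

def normalized_weights(log_weights):
    """Turn per-frame log-weights into non-negative weights that sum to 1."""
    log_weights = np.asarray(log_weights, dtype=float)
    shifted = log_weights - log_weights.max()   # guards against overflow in exp
    w = np.exp(shifted)
    return w / w.sum()

def weighted_mean(values, weights):
    """Deterministic weighted ensemble average (no stochastic resampling)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("need exactly one weight per frame")
    return float(np.sum(weights * values) / np.sum(weights))

# hypothetical per-frame Rg values and log-weights for a 4-frame ensemble
rg_per_frame = np.array([20.0, 25.0, 30.0, 35.0])
log_w = np.array([0.0, 0.0, np.log(2.0), np.log(2.0)])

my_weights = normalized_weights(log_w)            # proportional to 1, 1, 2, 2
rg_reweighted = weighted_mean(rg_per_frame, my_weights)
print(rg_per_frame.mean(), rg_reweighted)
```

With uniform weights the weighted mean reduces to the ordinary ensemble mean, which is a quick consistency check for any reweighting pipeline.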

Citing SOURSOP

If you use SOURSOP in your work, please cite:

Lalmansingh, J. M., Keeley, A. T., Ruff, K. M., Pappu, R. V. & Holehouse, A. S. SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins. J. Chem. Theory Comput. (2023). doi:10.1021/acs.jctc.3c00190.

Reporting bugs & requesting features

Please report bugs, typos, or unexpected behaviour on the GitHub issue tracker. Contributions are welcome - see Development for the plugin workflow and contribution guidelines.

Changelog

Update: May 2026 (0.2.7) A large maintenance, performance, and documentation release. Highlights:

  • Bug fixes across ssnmr (phospho-residue glycine corrections), ssprotein (native-contacts NaN/overflow, glycine sidechain contacts, cluster centroid), sstrajectory (interchain-map cap-residue handling), ssutils (thread count on Apple Accelerate) and ssmutualinformation (a latent calc_MI weighted-path bug).

  • Performance - large behaviour-preserving (byte-identical) speed-ups: the inter-chain distance/contact maps in sstrajectory and the \(O(n^2)\) CA-mode polymer-scaling loops in ssprotein (get_internal_scaling, get_scaling_exponent, get_local_to_global_correlation); a new stride parameter on get_interchain_contact_map / get_interchain_distance.

  • Consistent ensemble reweighting - every function returning an ensemble-average value now accepts an optional per-frame weights vector, applied deterministically (no stochastic resampling) and validated by a single shared ssutils.validate_weights; stride + weights now work together correctly. See Ensemble reweighting (frame weights).

  • Testing - new test_weights.py (~68 parametrized tests) plus the pickle-based regression suite (~600+ observable tests) and extended sstrajectory coverage.

  • Documentation - full numpy-style docstring rewrite across all major modules, narrative overviews on every page, rewritten worked examples, a new ensemble-reweighting page, and an expanded front page.

  • Packaging - reconciled dependency manifests, refreshed CI (Python 3.9-3.12), removed dead config.

Update: November 2024 (0.2.6) Transition to pyproject.toml for packaging and versioning. Dramatically improved performance of loading coarse-grained ensembles.

Update: July 2024 Official inclusion of the PENGUIN code into sssampling is now complete ahead of preprint!

Update: July 2023 Added explicit_residue_checking option to the SSTrajectory constructor to make it possible and easy to parse solvated .gro files or files in which non-protein molecules are included in the same chain.

Update: Jan. 2023 Finalization of code and documentation ahead of preprint deposition.

Update: Sept. 2022 Additional tests and docs updates ahead of preprint.

Update: April 2022 Moved SOURSOP onto PyPI in anticipation of the final release. Additional tests, code clean up etc.

Update: March 2022 Numerous updates to internal code documentation, removal of soursop_cli, update to pip install over git to use https.

Update: July 2021 The anticipated release of soursop is August 2021! We have nearly finished all testing and are finalizing the associated manuscript.

Update: Jan 2021 CAMPARITraj was recently restructured to homogenize a number of functions, as well as ensure future and backwards compatibility with mdtraj version 1.9.5.

Update: May 2019 CAMPARITraj is currently still being finalized, so we do not currently recommend installation directly from the GitHub repository as tests, final code tweaks, and even major changes to the code base are constantly occurring. However, in principle, assuming camparitraj is building, once downloaded from GitHub, it can be installed via

About

SOURSOP was built by Jared Lalmansingh (Pappu lab) and Alex Holehouse. Its development was supported financially and intellectually by the Molecular Sciences Software Institute (MolSSI). It was also supported by NSF grant no. 2128068 to Alex, and we thank members of the Water and Life Interface Institute (WALII), supported by NSF DBI grant #2213983, for helpful discussions.