.. soursop documentation master file, created by
   sphinx-quickstart on Thu Mar 15 13:55:56 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to the SOURSOP Documentation!
=========================================================
*Last updated: May 2026*
**SOURSOP** (**S**\imulation analysis **O**\f **U**\nfolded **R**\egion\ **S** **O**\f **P**\roteins) is a Python package for the analysis of all-atom and coarse-grained simulations of unfolded and disordered proteins. It provides a wide range of functionality that may not be relevant for folded proteins but is essential for extracting polymer-physics insight from simulations of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). **SOURSOP** was formerly *CAMPARITraj*, which was formerly *CTraj*, and includes all the original functionality therein.
The **SOURSOP** GitHub page can be accessed here: `https://github.com/holehouse-lab/soursop <https://github.com/holehouse-lab/soursop>`_.
**SOURSOP** was built with the CAMPARI simulation engine in mind, but has been successfully tested on a wide range of trajectories generated by different software packages. It uses the ``mdtraj`` (`http://mdtraj.org <http://mdtraj.org>`_) backend for trajectory reading and representation, and focusses on analysis routines for characterizing ensembles of disordered and unfolded proteins through the lens of polymer physics. It works for both all-atom and coarse-grained (one-bead-per-residue) ensembles.
Why SOURSOP?
=========================================================
Most molecular-simulation analysis tools are oriented towards folded proteins (RMSD to a native state, secondary-structure stability, binding-pocket geometry). Disordered proteins have no single reference structure, so the questions - and the right observables - are different. SOURSOP focusses on the ensemble- and polymer-centric quantities that *are* meaningful for IDPs/IDRs:
* **Global dimensions** - radius of gyration, hydrodynamic radius, end-to-end distance, asphericity, the gyration tensor, and the dimensionless size parameter :math:`\langle t \rangle`.
* **Polymer scaling** - internal-scaling profiles, apparent scaling exponents (with bootstrap error estimation), and homopolymer-deviation maps.
* **Distance & contact maps** - mean / RMS inter-residue distance maps and fractional contact maps, including fast inter-chain maps for multi-chain systems.
* **Local structure** - DSSP and BBSEG2 secondary structure, dihedral angles and dihedral mutual information, sliding-window local heterogeneity and local collapse.
* **Solvent exposure** - per-residue / per-atom / sidechain / backbone SASA and regional accessibility.
* **NMR & PRE observables** - sequence-corrected random-coil chemical shifts (``ssnmr``) and synthetic paramagnetic relaxation enhancement profiles (``sspre``) for direct comparison with experiment.
* **Sampling quality** - assessment of ensemble convergence via the PENGUIN tools in ``sssampling``.
* **Ensemble reweighting** - *every* ensemble-average observable accepts an optional per-frame ``weights`` vector, applied consistently and deterministically, for re-weighted / enhanced-sampling / maximum-entropy ensembles (see :doc:`usage/weights`).
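
To make the idea of a deterministic, re-weighted ensemble average concrete, the following is a minimal, dependency-free sketch. It is not SOURSOP's implementation: the helper ``weighted_mean``, the example values, and the choice to normalize by the sum of the weights are all assumptions made for illustration.

.. code-block:: python

    import math

    def weighted_mean(values, weights=None):
        """Ensemble average of a per-frame observable (hypothetical helper).

        With weights=None every frame counts equally; otherwise each frame
        contributes in proportion to its non-negative weight.
        """
        values = [float(v) for v in values]
        if weights is None:
            weights = [1.0] * len(values)
        weights = [float(w) for w in weights]
        if len(weights) != len(values):
            raise ValueError("need exactly one weight per frame")
        if any(w < 0 or not math.isfinite(w) for w in weights):
            raise ValueError("weights must be finite and non-negative")
        total = math.fsum(weights)
        if total <= 0:
            raise ValueError("weights must not sum to zero")
        # Deterministic: a plain weighted sum, no stochastic resampling of frames
        return math.fsum(v * w for v, w in zip(values, weights)) / total

    # Made-up per-frame radii of gyration (Angstroms) and made-up frame weights
    rg_per_frame = [20.0, 25.0, 30.0, 35.0]
    frame_weights = [0.1, 0.2, 0.3, 0.4]

    uniform_rg = weighted_mean(rg_per_frame)                    # unweighted average
    reweighted_rg = weighted_mean(rg_per_frame, frame_weights)  # re-weighted average

Because the weights here favour the more expanded frames, the re-weighted average is larger than the unweighted one. The same input always gives the same answer, which is the practical benefit of weighting frames directly rather than resampling them.
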
In addition to the pre-built analyses, SOURSOP gives easy and rapid access to **all** inter-residue and inter-atomic distances, so custom observables are straightforward to build.
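
As an illustration of how a custom observable can be built from inter-residue distances, the sketch below computes an internal-scaling profile and an apparent scaling exponent for a single frame using only the standard library. The function names (``internal_scaling``, ``apparent_scaling_exponent``), the straight-rod test chain, and the simple log-log least-squares fit are assumptions for illustration. They are not the routines or fitting procedure SOURSOP itself uses, and a real analysis would also average over all frames.

.. code-block:: python

    import math

    def internal_scaling(coords):
        """Mean spatial distance for every sequence separation |i - j|.

        coords is a list of (x, y, z) positions, one per residue, for one frame.
        Returns a list of (separation, mean_distance) pairs.
        """
        n = len(coords)
        profile = []
        for sep in range(1, n):
            dists = [math.dist(coords[i], coords[i + sep]) for i in range(n - sep)]
            profile.append((sep, math.fsum(dists) / len(dists)))
        return profile

    def apparent_scaling_exponent(profile):
        """Least-squares fit of log(distance) = log(R0) + nu * log(separation)."""
        xs = [math.log(sep) for sep, _ in profile]
        ys = [math.log(dist) for _, dist in profile]
        x_mean = math.fsum(xs) / len(xs)
        y_mean = math.fsum(ys) / len(ys)
        sxy = math.fsum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        sxx = math.fsum((x - x_mean) ** 2 for x in xs)
        nu = sxy / sxx
        r0 = math.exp(y_mean - nu * x_mean)
        return nu, r0

    # Hypothetical one-bead-per-residue chain: a straight rod, 3.8 Angstrom spacing
    rod = [(3.8 * i, 0.0, 0.0) for i in range(20)]
    nu, r0 = apparent_scaling_exponent(internal_scaling(rod))

For the rigid rod every distance grows linearly with sequence separation, so the fit recovers an exponent of 1 and a prefactor equal to the bead spacing, to within floating-point error. Disordered ensembles give smaller exponents, which is what makes the quantity informative.
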
Quickstart
=========================================================
.. code-block:: bash

    pip install soursop   # or: uv pip install soursop

.. code-block:: python

    import numpy as np
    from soursop.sstrajectory import SSTrajectory

    # read a trajectory (trajectory file + topology/PDB file)
    traj = SSTrajectory('traj.xtc', 'start.pdb')

    # each protein chain is an SSProtein object
    protein = traj.proteinTrajectoryList[0]

    # per-frame radius of gyration, and the ensemble mean
    rg = protein.get_radius_of_gyration()
    print(rg.mean())

    # a re-weighted ensemble average (e.g. from an MSM / MaxEnt reweighting);
    # my_weights holds one weight per frame - uniform placeholder values here,
    # replace them with the weights from your own reweighting procedure
    my_weights = np.full(len(rg), 1.0 / len(rg))
    rg_reweighted = protein.get_radius_of_gyration(weights=my_weights)

    # mean inter-residue distance map and a contact map
    dmap, dstd = protein.get_distance_map()
    cmap, corder = protein.get_contact_map()

See :doc:`usage/overview` for the core concepts (``SSTrajectory`` vs. ``SSProtein``, multi-chain systems) and :doc:`usage/examples` for nine worked, end-to-end IDP analyses.
Documentation map
=========================================================
* :doc:`usage/overview` - core concepts and how the pieces fit together.
* :doc:`usage/installation` - install with pip, uv, or conda (PyPI or GitHub), and how to run the tests.
* :doc:`usage/examples` - worked, copy-pasteable IDP analysis recipes.
* :doc:`usage/weights` - the consistent ensemble-reweighting (frame ``weights``) system and the shared validation helpers.
* :doc:`usage/development` - extending SOURSOP, the plugin system, and contributing.
* **Module API references** - :doc:`modules/sstrajectory`, :doc:`modules/ssprotein`, :doc:`modules/ssnmr`, :doc:`modules/sspre`, :doc:`modules/sssampling`.
.. toctree::
   :maxdepth: 1
   :caption: Contents:

   usage/overview
   usage/installation
   usage/examples
   usage/weights
   modules/sstrajectory
   modules/ssprotein
   modules/ssnmr
   modules/sspre
   modules/sssampling
   usage/development

Citing SOURSOP
=========================================================
If you use SOURSOP in your work, please cite:
Lalmansingh, J. M., Keeley, A. T., Ruff, K. M., Pappu, R. V. & Holehouse, A. S. *SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins.* J. Chem. Theory Comput. (2023). doi:`10.1021/acs.jctc.3c00190 <https://doi.org/10.1021/acs.jctc.3c00190>`_.
Reporting bugs & requesting features
=========================================================
Please report bugs, typos, or unexpected behaviour on the issue tracker of the `SOURSOP GitHub repository <https://github.com/holehouse-lab/soursop>`_. Contributions are welcome - see :doc:`usage/development` for the plugin workflow and contribution guidelines.
Changelog
==========
*Update: May 2026* (0.2.7)
A large maintenance, performance, and documentation release. Highlights:
* **Bug fixes** across ``ssnmr`` (phospho-residue glycine corrections), ``ssprotein`` (native-contacts NaN/overflow, glycine sidechain contacts, cluster centroid), ``sstrajectory`` (interchain-map cap-residue handling), ``ssutils`` (thread count on Apple Accelerate) and ``ssmutualinformation`` (a latent ``calc_MI`` weighted-path bug).
* **Performance** - large behaviour-preserving (byte-identical) speed-ups: the inter-chain distance/contact maps in ``sstrajectory`` and the O(n\ :sup:`2`) CA-mode polymer-scaling loops in ``ssprotein`` (``get_internal_scaling``, ``get_scaling_exponent``, ``get_local_to_global_correlation``); a new ``stride`` parameter on ``get_interchain_contact_map`` / ``get_interchain_distance``.
* **Consistent ensemble reweighting** - every function returning an ensemble-average value now accepts an optional per-frame ``weights`` vector, applied deterministically (no stochastic resampling) and validated by a single shared ``ssutils.validate_weights``; ``stride`` + ``weights`` now work together correctly. See :doc:`usage/weights`.
* **Testing** - new ``test_weights.py`` (~68 parametrized tests) plus the pickle-based regression suite (~600+ observable tests) and extended ``sstrajectory`` coverage.
* **Documentation** - full numpy-style docstring rewrite across all major modules, narrative overviews on every page, rewritten worked examples, a new ensemble-reweighting page, and an expanded front page.
* **Packaging** - reconciled dependency manifests, refreshed CI (Python 3.9-3.12), removed dead config.
*Update: November 2024* (0.2.6)
Transition to pyproject.toml for packaging and versioning. Dramatically improved performance of loading coarse-grained ensembles.
*Update: July 2024*
Official inclusion of the PENGUIN code into sssampling is now complete ahead of preprint!
*Update: July 2023*
Added the ``explicit_residue_checking`` option to the ``SSTrajectory`` constructor, which makes it possible to parse solvated .gro files and files in which non-protein molecules are included in the same chain.
*Update: Jan. 2023*
Finalization of code and documentation ahead of preprint deposition.
*Update: Sept. 2022*
Additional tests and docs updates ahead of preprint.
*Update: April 2022*
Moved SOURSOP onto PyPI in anticipation of the final release. Additional tests, code clean up etc.
*Update: March 2022*
Numerous updates to internal code documentation, removal of ``soursop_cli``, and an update so that pip installs from git use https.
*Update: July 2021*
The anticipated release of ``soursop`` is August 2021! We have nearly finished all testing and are finalizing the associated manuscript.
*Update: Jan 2021*
**CAMPARITraj** was recently restructured to homogenize a number of functions, as well as ensure future and backwards compatibility with ``mdtraj`` version 1.9.5.
*Update: May 2019*
**CAMPARITraj** is still being finalized, so we do not currently recommend installation directly from the GitHub repository, as tests, final code tweaks, and even major changes to the code base are constantly occurring. However, in principle, assuming **camparitraj** is building, it can be installed once downloaded from GitHub.
About
========
**SOURSOP** was built by Jared Lalmansingh (Pappu lab) and Alex Holehouse. Its development was supported financially and intellectually by the Molecular Sciences Software Institute (MolSSI). It was also supported by NSF grant no. 2128068 to Alex, and we thank members of the Water and Life Interface Institute (WALII), supported by NSF DBI grant #2213983, for helpful discussions.