Overview
SOURSOP is a Python package for analysing simulated conformational ensembles of disordered and unfolded proteins. Its goal is to make it as easy as possible to read in an ensemble of an intrinsically disordered region (IDR) and quickly extract polymer-physics-aware observables. In addition to a large library of pre-built analysis routines, SOURSOP provides direct access to all inter-residue and inter-atomic distances, contact maps, global dimensions, secondary-structure content, and more.
SOURSOP is built on top of MDTraj, which handles trajectory I/O and the low-level atomic representation. SOURSOP focuses on the analysis layer, with routines specifically chosen to be useful for characterizing disordered and unfolded states through the lens of polymer physics.
A first example
# import soursop
from soursop.sstrajectory import SSTrajectory
# read in the simulation trajectory (trajectory file + topology file)
TO = SSTrajectory('traj.xtc', 'start.pdb')
# once the trajectory has been read in, individual protein chains can
# be extracted from the proteinTrajectoryList
protein = TO.proteinTrajectoryList[0]
# distance between the centers of mass of residues 10 and 20
d_10_20 = protein.get_inter_residue_COM_distance(10, 20)
# ensemble-average asphericity
asph = protein.get_asphericity()
# ensemble-average radius of gyration
rg = protein.get_radius_of_gyration()
# ensemble-average inter-residue distance map
dm = protein.get_distance_map()
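To make two of the observables above concrete, the following is a minimal NumPy sketch of how a radius of gyration and an asphericity can be computed for a single conformation from its gyration tensor. It does not use SOURSOP and is not taken from its source: the function names are hypothetical, all particles are weighted equally (no masses), and the asphericity formula is the common eigenvalue-based definition, assumed here for illustration.

```python
import numpy as np

def gyration_tensor(coords):
    """3x3 gyration tensor of one conformation (equal weight per particle)."""
    centered = coords - coords.mean(axis=0)
    return centered.T @ centered / len(coords)

def radius_of_gyration(coords):
    # Rg^2 equals the trace of the gyration tensor
    return float(np.sqrt(np.trace(gyration_tensor(coords))))

def asphericity(coords):
    # 0 for a spherically symmetric arrangement, 1 for a perfect rod
    l1, l2, l3 = np.linalg.eigvalsh(gyration_tensor(coords))
    return float(1.0 - 3.0 * (l1 * l2 + l2 * l3 + l1 * l3) / (l1 + l2 + l3) ** 2)

# Toy conformation: a straight rod of 10 beads, 1 unit apart along x
rod = np.column_stack([np.arange(10.0), np.zeros(10), np.zeros(10)])
rg_rod = radius_of_gyration(rod)
asph_rod = asphericity(rod)
print(rg_rod, asph_rod)
```

For a trajectory, the same functions would be applied frame by frame and the results averaged to obtain ensemble values.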
Core concepts
Two objects underpin almost all analysis in SOURSOP:
SSTrajectory
SSTrajectory is the top-level, system-level object. You construct
it from a trajectory file and a topology (PDB) file. It wraps the
underlying MDTraj trajectory, identifies the distinct protein chains in
the system, and exposes system-wide and inter-chain analyses (for
example inter-chain distance and contact maps for multi-chain
simulations of, e.g., phase-separating systems or protein complexes).
SSProtein
Each individual protein chain identified during loading is represented
as an SSProtein object, accessed through
SSTrajectory.proteinTrajectoryList. SSProtein is where the bulk
of single-chain analysis lives: global dimensions (Rg, Rh,
end-to-end distance, asphericity), polymer scaling, distance and contact
maps, secondary structure, solvent accessibility, dihedral angles, and
arbitrary inter-residue / inter-atomic distances. Expensive lookups are
memoised, so repeated queries on the same object are fast.
The typical pattern is therefore:
TO = SSTrajectory('traj.xtc', 'start.pdb') # system-level object
P = TO.proteinTrajectoryList[0] # one protein chain
# ... run analyses on P ...
For multi-chain systems, iterate over proteinTrajectoryList (one
SSProtein per chain) and use the SSTrajectory inter-chain
methods for cross-chain observables.
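The multi-chain pattern can be sketched as a small helper that loops over proteinTrajectoryList and summarises one observable per chain. The helper name mean_rg_per_chain and the stand-in classes are hypothetical and exist only so the sketch runs without a trajectory file; the only SOURSOP names used are proteinTrajectoryList and get_radius_of_gyration, both taken from the example above. The helper wraps the result in np.mean so it makes no assumption about whether the method returns per-frame values or a single number.

```python
import numpy as np

def mean_rg_per_chain(trajectory):
    """Mean radius of gyration for each chain of an SSTrajectory-like object."""
    means = []
    for chain in trajectory.proteinTrajectoryList:
        # np.mean accepts either an array of values or a single scalar
        means.append(float(np.mean(chain.get_radius_of_gyration())))
    return means

# Hypothetical stand-ins that mimic only the attributes used above
class FakeChain:
    def __init__(self, rg_values):
        self._rg = np.asarray(rg_values, dtype=float)

    def get_radius_of_gyration(self):
        return self._rg

class FakeTrajectory:
    def __init__(self, chains):
        self.proteinTrajectoryList = chains

fake = FakeTrajectory([FakeChain([10.0, 12.0]), FakeChain([20.0, 22.0, 24.0])])
per_chain = mean_rg_per_chain(fake)
print(per_chain)
```

With a real system, the object returned by SSTrajectory('traj.xtc', 'start.pdb') would be passed in place of the stand-in.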
The SOURSOP modules
SOURSOP is organised into a small number of focused modules:
sstrajectory - the SSTrajectory class; trajectory loading, chain detection, system-level and inter-chain analysis, and helpers for parallel loading of many trajectories.
ssprotein - the SSProtein class; the main single-chain analysis engine (dimensions, scaling, maps, secondary structure, SASA, angles, and inter-residue/atomic distances).
ssnmr - sequence-based prediction of random-coil backbone chemical shifts (CA, CB, CO, N, HN, HA), with temperature, pH and perdeuteration corrections and support for phospho-residues. Useful for comparing simulated ensembles against NMR data.
sspre - the SSPRE class; fast calculation of synthetic paramagnetic relaxation enhancement (PRE) intensity ratios and gamma profiles for a spin label placed at an arbitrary sequence position.
sssampling - the SSSampling class and PENGUIN support, for assessing the sampling quality / convergence of disordered-protein ensembles.
sstools - miscellaneous numerical helper functions shared across the package (chunking, residue-name normalisation, the polymer power-law model, minimum-image distances, trajectory-file discovery).
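As an illustration of the kind of helper described under sstools, the sketch below fits a polymer power law to distance data. It is not SOURSOP's implementation: the functional form R = A0 * N**nu, the function name fit_power_law, and the log-log linear regression are assumptions chosen for a self-contained example, and the data are synthetic.

```python
import numpy as np

def fit_power_law(separation, distance):
    """Fit distance = A0 * separation**nu by linear regression in log-log space.

    Returns (nu, A0).
    """
    slope, intercept = np.polyfit(np.log(separation), np.log(distance), 1)
    return float(slope), float(np.exp(intercept))

# Synthetic, noise-free data generated from a known power law
sep = np.arange(5, 50)       # sequence separation |i - j|
dist = 5.5 * sep ** 0.59     # assumed prefactor and exponent, for illustration only
nu, a0 = fit_power_law(sep, dist)
print(nu, a0)
```

On real ensembles the input would be the mean inter-residue distance as a function of sequence separation, and short separations are often excluded from the fit because they are dominated by local chain stiffness.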
In addition, SOURSOP is designed to be extended via user-contributed
plugins in soursop/plugins - see the Development page.
Where to go next
Installation - install SOURSOP with pip, uv, or conda.
Examples - worked, end-to-end IDP analysis examples.
Development - extending SOURSOP and contributing plugins.
The per-module API references (sstrajectory, ssprotein, ssnmr, sspre, sssampling) for the full list of available analysis routines.