Yank¶
Interface for automated free energy calculations.
-
class
yank.yank.
Topography
(topology, ligand_atoms=None, solvent_atoms='auto')[source]¶ A class mapping and labelling the different components of a system.
The object holds the topology of a system and offers convenience functions to identify its various parts such as solvent, receptor, ions and ligand atoms.
A molecule should be labeled as a ligand, only if there is also a receptor. If there is only a single molecule its atom indices can be obtained from solute_atoms instead. In ligand-receptor system, solute_atoms provides the atom indices for both molecules.
Parameters: - topology : mdtraj.Topology or simtk.openmm.app.Topology
The topology object specifying the system.
- ligand_atoms : iterable of int or str, optional
The atom indices of the ligand. A string is interpreted as an mdtraj DSL specification of the ligand atoms.
- solvent_atoms : iterable of int or str, optional
The atom indices of the solvent. A string is interpreted as an mdtraj DSL specification of the solvent atoms. If ‘auto’, a list of common solvent residue names will be used to automatically detect solvent atoms (default is ‘auto’).
Attributes: ligand_atoms
The atom indices of the ligand as list
receptor_atoms
The atom indices of the receptor as list (read-only).
solute_atoms
The atom indices of the non-solvent molecule(s) (read-only).
solvent_atoms
The atom indices of the solvent molecules.
ions_atoms
The indices of all ions atoms in the solvent (read-only).
-
topology
¶ mdtraj.Topology: A copy of the topology (read-only).
-
ligand_atoms
¶ The atom indices of the ligand as list
This can be empty if this
Topography
doesn’t represent a receptor-ligand system. Use solute_atoms to obtain the atom indices of the molecule if this is the case.If assigned to a string, it will be interpreted as an mdtraj DSL specification of the atom indices.
-
receptor_atoms
¶ The atom indices of the receptor as list (read-only).
This can be empty if this Topography doesn’t represent a receptor-ligand system. Use solute_atoms to obtain the atom indices of the molecule if this is the case.
-
solute_atoms
¶ The atom indices of the non-solvent molecule(s) (read-only).
Practically, this are all the indices of the atoms that are not considered solvent. In a receptor-ligand system, this includes the atom indices of both the receptor and the ligand.
-
solvent_atoms
¶ The atom indices of the solvent molecules.
This includes eventual ions. If assigned to a string, it will be interpreted as an mdtraj DSL specification of the atom indices. If assigned to ‘auto’, a set of solvent auto indices is automatically built from common solvent residue names.
-
ions_atoms
¶ The indices of all ions atoms in the solvent (read-only).
-
add_region
(region_name, region_selection, subset=None)[source]¶ Add a region to the Topography based on a selection string of atoms. The selection accepts multiple formats such as a DSL string, a SMIRKS selection string, or hard coded atom indices. The selection string is converted to a list of atom indices.
Parameters: - region_name : str
Name of the region. This must be unique and also not the name of an existing method
- region_selection : str or list of ints
Atom selection identifier, either a MDTraj DSL string, a SMARTS string, a compound region selection, or a hard-coded list of ints. The SMARTS string requires the OpenEye OEChem library to correctly select
- subset : str or list of ints, or None
Atom selection sub-region to filter the atom selection through. This is a way to define your new region as a relative selection to the subset. Follows the same conditions as
region_selection
.
-
remove_region
(region_name)[source]¶ Remove a previously added region from this Topography. This only affects regions added through the
add_region()
function.Does nothing if the region was not previously added
Parameters: - region_name : str
Name of the region to remove
-
get_region
(region_name) → typing.List[int][source]¶ Retrieve the atom indices of the given region. This function will also fetch the built-in regions
Parameters: - region_name : str
Name of the region to fetch. Can use both custom regions or built in ones such as “ligand_atoms”
Returns: - region : list of ints
Atom integers which comprise the region
Raises: - KeyError
If region is not part of the Topography
-
select
(selection, as_set=False, sort_by='auto', subset=None) → typing.Union[typing.List[int], typing.Set[int]][source]¶ Select atoms based on selection format which can be a number of formats:
- Single integer (a bit redundant)
- Iterable of integer (also a bit redundant)
- Complex Region selection
- MDTraj String
- SMARTS Selection
This method will never return duplicate atom numbers.
The
sort_by
method controls how the output is sorted before being returned, see the details in the Parameters block for information about each optionThe Complex Region string returns an the set of atoms derived from the arguments using logical operators
and
andor
along with grouping through parenthesis . For example, assume you have two regionsregionA = [0,1,2,3]
andregionB = [2,3,4,5]
. You can do operations such as the following:regionA and regionB
yields[2,3]
, which is the intersection of the regions.regionA or regionB
yields[0,1,2,3,4,5]
, which is the union of the regions.More complex statements with more regions will also work, and statements can be grouped with
()
.The
subset
keyword filters theselection
relative to this subset selection. If not None, the subset is processed first through this same function, then the primary selection is processed relative to it.subset
follows the same conditions asselection
, but sort order for subset is ignored If yourselection
would pick atoms that are NOT part of the subset, then those atoms are NOT RETURNED. If yourselection
is an integer or some sequence of integer, then the indices are relative to thesubset
. Final atom numbers will be absolute to the whole Topography.Parameters: - selection : str, list of ints, or int
String defining the selection
- as_set : bool, Default False
Determines output format. Returns a Set if True, otherwise output is a list.
- sort_by : str or None, Default: ‘auto’
Determine how to sort the output if
as_set
is False.- ‘auto’: Let the selection string determine how to sort it out based on its priorities
- ‘index’: Atoms are sorted index, smallest to largest
- ‘region_order’: Atoms are sorted by which region in the provided
selection_string
occurs first. - So if your expression is
region1 and region2
, then the output will be atoms which appear inregion1
first. Parenthesis are ignored so the expressionregion1 and (region2 or region3)
will prioritizeregion1
first for sorting. This option only works for expressions on Regions, other selections will fall back toNone
- ‘region_order’: Atoms are sorted by which region in the provided
None
: Sorting is left up to however the selection string is processed by their respective drivers,- or by whatever order the set operations returns it. Not guaranteed to be deterministic in all cases.
- subset : None, str, list of ints, or int; Optional; Default: None
Set of atoms to make a relative selection to. Follows the same rules as
selection
for valid inputs. If None,selection
is on whole Topography.
Returns: - selection : list or set
Returns the selected atoms as either a list or set based on
as_set
keyword. Order of the output is determined by thesort_by
keyword. If aas_set
isTrue
, this option has no effect.
-
class
yank.yank.
IMultiStateSampler
[source]¶ A sampler for multiple thermodynamic states.
This is the interface documents the properties and methods that need to be exposed by the sampler object to be compatible with the class
AlchemicalPhase
.Attributes: number_of_iterations
int: the total number of iterations to run.
iteration
int: the current iteration.
metadata
dict: a copy of the metadata dictionary passed on creation.
sampler_states
list of SamplerState: the sampler states at the current iteration.
-
classmethod
read_status
(storage)[source]¶ Read the status of the calculation from the storage file.
Parameters: - storage : str or Reporter
The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.
Returns: - status : namedtuple
The status of the calculation.
-
number_of_iterations
¶ int: the total number of iterations to run.
-
iteration
¶ int: the current iteration.
-
metadata
¶ dict: a copy of the metadata dictionary passed on creation.
-
sampler_states
¶ list of SamplerState: the sampler states at the current iteration.
-
create
(thermodynamic_state, sampler_states, storage, unsampled_thermodynamic_states=None, initial_thermodynamic_states=None, metadata=None)[source]¶ Create new simulation and initialize the storage.
Parameters: - thermodynamic_state : list of openmmtools.states.ThermodynamicState
The thermodynamic states for the simulation.
- sampler_states : openmmtools.states.SamplerState or list
One or more sets of initial sampler states. If a list of SamplerStates, they will be assigned to thermodynamic states in a round-robin fashion.
- storage : str or Reporter
The path to the storage file or a Reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.
- unsampled_thermodynamic_states : list of openmmtools.states.ThermodynamicState, Optional, Default: None
These are ThermodynamicStates that are not propagated, but their reduced potential is computed at each iteration for each replica. These energy can be used as data for reweighting schemes.
- initial_thermodynamic_states : None or list or array-like of int of length len(sampler_states), optional,
default: None. Initial thermodynamic_state index for each sampler_state.
- metadata : dict
Simulation metadata to be stored in the file.
-
minimize
(tolerance, max_iterations)[source]¶ Minimize all states.
Parameters: - tolerance : simtk.unit.Quantity
Minimization tolerance (units of energy/mole/length, default is
1.0 * unit.kilojoules_per_mole / unit.nanometers
).- max_iterations : int
Maximum number of iterations for minimization. If 0, minimization continues until converged.
-
equilibrate
(n_iterations, mcmc_moves=None)[source]¶ Equilibrate all states.
Parameters: - n_iterations : int
Number of equilibration iterations.
- mcmc_moves : MCMCMove or list of MCMCMove, optional
Optionally, the MCMCMoves to use for equilibration can be different from the ones used in production (default is None).
-
run
(n_iterations=None)[source]¶ Run the simulation.
This runs at most
number_of_iterations
iterations. Useextend()
to pass the limit.Parameters: - n_iterations : int, optional
If specified, only at most the specified number of iterations will be run (default is None).
-
class
yank.yank.
AlchemicalPhase
(sampler)[source]¶ A single thermodynamic leg (phase) of an alchemical free energy calculation.
This class wraps around a general MultiStateSampler and handle the creation of an alchemical free energy calculation.
Parameters: - sampler : IMultiStateSampler
The sampler instance implementing the
IMultiStateSampler
interface.
Attributes: iteration
int: the current iteration (read-only).
number_of_iterations
int: the total number of iterations to run.
is_completed
Boolean check if if the sampler has been completed by its own determination or if we have exceeded number of
-
classmethod
from_storage
(storage)[source]¶ Static constructor from an existing storage file.
Parameters: - storage : str or Reporter
The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.
Returns: - alchemical_phase : AlchemicalPhase
A new instance of
AlchemicalPhase
in the same state of the last stored iteration.
-
classmethod
read_status
(storage)[source]¶ Read the status of the calculation from the storage file.
This method can be used to quickly check the status of the simulation before loading the full
ReplicaExchange
object from disk.Parameters: - storage : str or Reporter
The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.
Returns: - status : namedtuple
The status of the calculation.
-
iteration
¶ int: the current iteration (read-only).
-
number_of_iterations
¶ int: the total number of iterations to run.
-
is_completed
¶ Boolean check if if the sampler has been completed by its own determination or if we have exceeded number of iterations
-
create
(thermodynamic_state, sampler_states, topography, protocol, storage, restraint=None, anisotropic_dispersion_cutoff=None, alchemical_regions=None, alchemical_factory=None, metadata=None)[source]¶ Create a new AlchemicalPhase calculation for a specified protocol.
If
anisotropic_dispersion_cutoff
is different thanNone
. The end states of the phase will be reweighted. The fully interacting state accounts for:- The truncation of nonbonded interactions.
2. The reciprocal space which is not modeled in alchemical states if an Ewald method is used for long-range interactions.
Parameters: - thermodynamic_state : openmmtools.states.ThermodynamicState
Thermodynamic state holding the reference system, temperature and pressure.
- sampler_states : openmmtools.states.SamplerState or list
One or more sets of initial sampler states. If a list of SamplerStates, they will be assigned to replicas in a round-robin fashion.
- topography : Topography
The object holding the topology and labelling the different components of the system. This is used to discriminate between ligand-receptor and solvation systems.
- protocol : dict
The dictionary
{parameter_name: list_of_parameter_values}
defining the protocol. All the parameter values list must have the same number of elements.- storage : str or Reporter
The path to the storage file or a Reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.
- restraint : ReceptorLigandRestraint, optional
Restraint to add between protein and ligand. This must be specified for ligand-receptor systems in non-periodic boxes.
- anisotropic_dispersion_cutoff : simtk.openmm.Quantity, ‘auto’, or None, optional, default None
If specified, this is the cutoff at which to reweight long range interactions of the end states to correct for anisotropic dispersions.
If ‘auto’, then the distance is automatically chosen based on the minimum possible size it can be given the box volume, then behaves as if a Quantity was passed in.
If None, the correction won’t be applied (units of length, default is None).
- alchemical_regions : openmmtools.alchemy.AlchemicalRegion or None, optional, default: None
If specified, this is the
AlchemicalRegion
that will be passed to theAbsoluteAlchemicalFactory
, otherwise the ligand will be alchemically modified according to the given protocol.- alchemical_factory : openmmtools.alchemy.AbsoluteAlchemicalFactory, optional
If specified, this
AbsoluteAlchemicalFactory
will be used instead of the one created with default options.- metadata : dict, optional
Simulation metadata to be stored in the file.
-
minimize
(tolerance=Quantity(value=1.0, unit=kilojoule/(nanometer*mole)), max_iterations=0)[source]¶ Minimize all the states.
The minimization is performed in two steps. In the first one, the positions are minimized in the reference thermodynamic state (i.e. non alchemically-modified). Only then, the positions are minimized in their alchemically softened state.
Parameters: - tolerance : simtk.unit.Quantity, optional
Minimization tolerance (units of energy/mole/length, default is
1.0 * unit.kilojoules_per_mole / unit.nanometers
).- max_iterations : int, optional
Maximum number of iterations for minimization. If 0, minimization continues until converged.
-
randomize_ligand
(sigma_multiplier=2.0, close_cutoff=Quantity(value=1.5, unit=angstrom))[source]¶ Randomize the ligand positions in every state.
The position and orientation of the ligand in each state will be randomized. This works only if the system is a ligand-receptor system.
If you call this before minimizing, each positions will be minimized separately in the reference state, so you may want to call it afterwards to speed up minimization.
Parameters: - sigma_multiplier : float, optional
The ligand will be placed close to a random receptor atom at a distance that is normally distributed with standard deviation
sigma_multiplier * receptor_radius_of_gyration
(default is 2.0).- close_cutoff : simtk.unit.Quantity, optional
Each random placement proposal will be rejected if the ligand ends up being closer to the receptor than this cutoff (units of length, default is
1.5*unit.angstrom
).
-
equilibrate
(n_iterations, mcmc_moves=None)[source]¶ Equilibrate all states.
Parameters: - n_iterations : int
Number of equilibration iterations.
- mcmc_moves : MCMCMove or list of MCMCMove, optional
Optionally, the MCMCMoves to use for equilibration can be different from the ones used in production.