Yank

Interface for automated free energy calculations.

class yank.yank.Topography(topology, ligand_atoms=None, solvent_atoms='auto')[source]

A class mapping and labelling the different components of a system.

The object holds the topology of a system and offers convenience functions to identify its various parts such as solvent, receptor, ions and ligand atoms.

A molecule should be labeled as a ligand, only if there is also a receptor. If there is only a single molecule its atom indices can be obtained from solute_atoms instead. In ligand-receptor system, solute_atoms provides the atom indices for both molecules.

Parameters:
topology : mdtraj.Topology or simtk.openmm.app.Topology

The topology object specifying the system.

ligand_atoms : iterable of int or str, optional

The atom indices of the ligand. A string is interpreted as an mdtraj DSL specification of the ligand atoms.

solvent_atoms : iterable of int or str, optional

The atom indices of the solvent. A string is interpreted as an mdtraj DSL specification of the solvent atoms. If ‘auto’, a list of common solvent residue names will be used to automatically detect solvent atoms (default is ‘auto’).

Attributes:
ligand_atoms

The atom indices of the ligand as list

receptor_atoms

The atom indices of the receptor as list (read-only).

solute_atoms

The atom indices of the non-solvent molecule(s) (read-only).

solvent_atoms

The atom indices of the solvent molecules.

ions_atoms

The indices of all ions atoms in the solvent (read-only).

topology

mdtraj.Topology: A copy of the topology (read-only).

ligand_atoms

The atom indices of the ligand as list

This can be empty if this Topography doesn’t represent a receptor-ligand system. Use solute_atoms to obtain the atom indices of the molecule if this is the case.

If assigned to a string, it will be interpreted as an mdtraj DSL specification of the atom indices.

receptor_atoms

The atom indices of the receptor as list (read-only).

This can be empty if this Topography doesn’t represent a receptor-ligand system. Use solute_atoms to obtain the atom indices of the molecule if this is the case.

solute_atoms

The atom indices of the non-solvent molecule(s) (read-only).

Practically, this are all the indices of the atoms that are not considered solvent. In a receptor-ligand system, this includes the atom indices of both the receptor and the ligand.

solvent_atoms

The atom indices of the solvent molecules.

This includes eventual ions. If assigned to a string, it will be interpreted as an mdtraj DSL specification of the atom indices. If assigned to ‘auto’, a set of solvent auto indices is automatically built from common solvent residue names.

ions_atoms

The indices of all ions atoms in the solvent (read-only).

add_region(region_name, region_selection, subset=None)[source]

Add a region to the Topography based on a selection string of atoms. The selection accepts multiple formats such as a DSL string, a SMIRKS selection string, or hard coded atom indices. The selection string is converted to a list of atom indices.

Parameters:
region_name : str

Name of the region. This must be unique and also not the name of an existing method

region_selection : str or list of ints

Atom selection identifier, either a MDTraj DSL string, a SMARTS string, a compound region selection, or a hard-coded list of ints. The SMARTS string requires the OpenEye OEChem library to correctly select

subset : str or list of ints, or None

Atom selection sub-region to filter the atom selection through. This is a way to define your new region as a relative selection to the subset. Follows the same conditions as region_selection.

remove_region(region_name)[source]

Remove a previously added region from this Topography. This only affects regions added through the add_region() function.

Does nothing if the region was not previously added

Parameters:
region_name : str

Name of the region to remove

get_region(region_name) → typing.List[int][source]

Retrieve the atom indices of the given region. This function will also fetch the built-in regions

Parameters:
region_name : str

Name of the region to fetch. Can use both custom regions or built in ones such as “ligand_atoms”

Returns:
region : list of ints

Atom integers which comprise the region

Raises:
KeyError

If region is not part of the Topography

select(selection, as_set=False, sort_by='auto', subset=None) → typing.Union[typing.List[int], typing.Set[int]][source]

Select atoms based on selection format which can be a number of formats:

  • Single integer (a bit redundant)
  • Iterable of integer (also a bit redundant)
  • Complex Region selection
  • MDTraj String
  • SMARTS Selection

This method will never return duplicate atom numbers.

The sort_by method controls how the output is sorted before being returned, see the details in the Parameters block for information about each option

The Complex Region string returns an the set of atoms derived from the arguments using logical operators and and or along with grouping through parenthesis . For example, assume you have two regions regionA = [0,1,2,3] and regionB = [2,3,4,5]. You can do operations such as the following:

regionA and regionB yields [2,3], which is the intersection of the regions.

regionA or regionB yields [0,1,2,3,4,5], which is the union of the regions.

More complex statements with more regions will also work, and statements can be grouped with ().

The subset keyword filters the selection relative to this subset selection. If not None, the subset is processed first through this same function, then the primary selection is processed relative to it. subset follows the same conditions as selection, but sort order for subset is ignored If your selection would pick atoms that are NOT part of the subset, then those atoms are NOT RETURNED. If your selection is an integer or some sequence of integer, then the indices are relative to the subset. Final atom numbers will be absolute to the whole Topography.

Parameters:
selection : str, list of ints, or int

String defining the selection

as_set : bool, Default False

Determines output format. Returns a Set if True, otherwise output is a list.

sort_by : str or None, Default: ‘auto’

Determine how to sort the output if as_set is False.

  • ‘auto’: Let the selection string determine how to sort it out based on its priorities
  • ‘index’: Atoms are sorted index, smallest to largest
  • ‘region_order’: Atoms are sorted by which region in the provided selection_string occurs first.
    So if your expression is region1 and region2, then the output will be atoms which appear in region1 first. Parenthesis are ignored so the expression region1 and (region2 or region3) will prioritize region1 first for sorting. This option only works for expressions on Regions, other selections will fall back to None
  • None: Sorting is left up to however the selection string is processed by their respective drivers,
    or by whatever order the set operations returns it. Not guaranteed to be deterministic in all cases.
subset : None, str, list of ints, or int; Optional; Default: None

Set of atoms to make a relative selection to. Follows the same rules as selection for valid inputs. If None, selection is on whole Topography.

Returns:
selection : list or set

Returns the selected atoms as either a list or set based on as_set keyword. Order of the output is determined by the sort_by keyword. If a as_set is True, this option has no effect.

class yank.yank.IMultiStateSampler[source]

A sampler for multiple thermodynamic states.

This is the interface documents the properties and methods that need to be exposed by the sampler object to be compatible with the class AlchemicalPhase.

Attributes:
number_of_iterations

int: the total number of iterations to run.

iteration

int: the current iteration.

metadata

dict: a copy of the metadata dictionary passed on creation.

sampler_states

list of SamplerState: the sampler states at the current iteration.

classmethod read_status(storage)[source]

Read the status of the calculation from the storage file.

Parameters:
storage : str or Reporter

The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.

Returns:
status : namedtuple

The status of the calculation.

number_of_iterations

int: the total number of iterations to run.

iteration

int: the current iteration.

metadata

dict: a copy of the metadata dictionary passed on creation.

sampler_states

list of SamplerState: the sampler states at the current iteration.

create(thermodynamic_state, sampler_states, storage, unsampled_thermodynamic_states=None, initial_thermodynamic_states=None, metadata=None)[source]

Create new simulation and initialize the storage.

Parameters:
thermodynamic_state : list of openmmtools.states.ThermodynamicState

The thermodynamic states for the simulation.

sampler_states : openmmtools.states.SamplerState or list

One or more sets of initial sampler states. If a list of SamplerStates, they will be assigned to thermodynamic states in a round-robin fashion.

storage : str or Reporter

The path to the storage file or a Reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.

unsampled_thermodynamic_states : list of openmmtools.states.ThermodynamicState, Optional, Default: None

These are ThermodynamicStates that are not propagated, but their reduced potential is computed at each iteration for each replica. These energy can be used as data for reweighting schemes.

initial_thermodynamic_states : None or list or array-like of int of length len(sampler_states), optional,

default: None. Initial thermodynamic_state index for each sampler_state.

metadata : dict

Simulation metadata to be stored in the file.

minimize(tolerance, max_iterations)[source]

Minimize all states.

Parameters:
tolerance : simtk.unit.Quantity

Minimization tolerance (units of energy/mole/length, default is 1.0 * unit.kilojoules_per_mole / unit.nanometers).

max_iterations : int

Maximum number of iterations for minimization. If 0, minimization continues until converged.

equilibrate(n_iterations, mcmc_moves=None)[source]

Equilibrate all states.

Parameters:
n_iterations : int

Number of equilibration iterations.

mcmc_moves : MCMCMove or list of MCMCMove, optional

Optionally, the MCMCMoves to use for equilibration can be different from the ones used in production (default is None).

run(n_iterations=None)[source]

Run the simulation.

This runs at most number_of_iterations iterations. Use extend() to pass the limit.

Parameters:
n_iterations : int, optional

If specified, only at most the specified number of iterations will be run (default is None).

extend(n_iterations)[source]

Extend the simulation by the given number of iterations.

Contrarily to run(), this will extend the number of iterations past number_of_iteration if requested.

Parameters:
n_iterations : int

The number of iterations to run.

class yank.yank.AlchemicalPhase(sampler)[source]

A single thermodynamic leg (phase) of an alchemical free energy calculation.

This class wraps around a general MultiStateSampler and handle the creation of an alchemical free energy calculation.

Parameters:
sampler : IMultiStateSampler

The sampler instance implementing the IMultiStateSampler interface.

Attributes:
iteration

int: the current iteration (read-only).

number_of_iterations

int: the total number of iterations to run.

is_completed

Boolean check if if the sampler has been completed by its own determination or if we have exceeded number of

classmethod from_storage(storage)[source]

Static constructor from an existing storage file.

Parameters:
storage : str or Reporter

The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.

Returns:
alchemical_phase : AlchemicalPhase

A new instance of AlchemicalPhase in the same state of the last stored iteration.

classmethod read_status(storage)[source]

Read the status of the calculation from the storage file.

This method can be used to quickly check the status of the simulation before loading the full ReplicaExchange object from disk.

Parameters:
storage : str or Reporter

The path to the storage file or the reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.

Returns:
status : namedtuple

The status of the calculation.

iteration

int: the current iteration (read-only).

number_of_iterations

int: the total number of iterations to run.

is_completed

Boolean check if if the sampler has been completed by its own determination or if we have exceeded number of iterations

create(thermodynamic_state, sampler_states, topography, protocol, storage, restraint=None, anisotropic_dispersion_cutoff=None, alchemical_regions=None, alchemical_factory=None, metadata=None)[source]

Create a new AlchemicalPhase calculation for a specified protocol.

If anisotropic_dispersion_cutoff is different than None. The end states of the phase will be reweighted. The fully interacting state accounts for:

  1. The truncation of nonbonded interactions.

2. The reciprocal space which is not modeled in alchemical states if an Ewald method is used for long-range interactions.

Parameters:
thermodynamic_state : openmmtools.states.ThermodynamicState

Thermodynamic state holding the reference system, temperature and pressure.

sampler_states : openmmtools.states.SamplerState or list

One or more sets of initial sampler states. If a list of SamplerStates, they will be assigned to replicas in a round-robin fashion.

topography : Topography

The object holding the topology and labelling the different components of the system. This is used to discriminate between ligand-receptor and solvation systems.

protocol : dict

The dictionary {parameter_name: list_of_parameter_values} defining the protocol. All the parameter values list must have the same number of elements.

storage : str or Reporter

The path to the storage file or a Reporter object to forward to the sampler. In the future, this will be able to take a Storage class as well.

restraint : ReceptorLigandRestraint, optional

Restraint to add between protein and ligand. This must be specified for ligand-receptor systems in non-periodic boxes.

anisotropic_dispersion_cutoff : simtk.openmm.Quantity, ‘auto’, or None, optional, default None

If specified, this is the cutoff at which to reweight long range interactions of the end states to correct for anisotropic dispersions.

If ‘auto’, then the distance is automatically chosen based on the minimum possible size it can be given the box volume, then behaves as if a Quantity was passed in.

If None, the correction won’t be applied (units of length, default is None).

alchemical_regions : openmmtools.alchemy.AlchemicalRegion or None, optional, default: None

If specified, this is the AlchemicalRegion that will be passed to the AbsoluteAlchemicalFactory, otherwise the ligand will be alchemically modified according to the given protocol.

alchemical_factory : openmmtools.alchemy.AbsoluteAlchemicalFactory, optional

If specified, this AbsoluteAlchemicalFactory will be used instead of the one created with default options.

metadata : dict, optional

Simulation metadata to be stored in the file.

minimize(tolerance=Quantity(value=1.0, unit=kilojoule/(nanometer*mole)), max_iterations=0)[source]

Minimize all the states.

The minimization is performed in two steps. In the first one, the positions are minimized in the reference thermodynamic state (i.e. non alchemically-modified). Only then, the positions are minimized in their alchemically softened state.

Parameters:
tolerance : simtk.unit.Quantity, optional

Minimization tolerance (units of energy/mole/length, default is 1.0 * unit.kilojoules_per_mole / unit.nanometers).

max_iterations : int, optional

Maximum number of iterations for minimization. If 0, minimization continues until converged.

randomize_ligand(sigma_multiplier=2.0, close_cutoff=Quantity(value=1.5, unit=angstrom))[source]

Randomize the ligand positions in every state.

The position and orientation of the ligand in each state will be randomized. This works only if the system is a ligand-receptor system.

If you call this before minimizing, each positions will be minimized separately in the reference state, so you may want to call it afterwards to speed up minimization.

Parameters:
sigma_multiplier : float, optional

The ligand will be placed close to a random receptor atom at a distance that is normally distributed with standard deviation sigma_multiplier * receptor_radius_of_gyration (default is 2.0).

close_cutoff : simtk.unit.Quantity, optional

Each random placement proposal will be rejected if the ligand ends up being closer to the receptor than this cutoff (units of length, default is 1.5*unit.angstrom).

equilibrate(n_iterations, mcmc_moves=None)[source]

Equilibrate all states.

Parameters:
n_iterations : int

Number of equilibration iterations.

mcmc_moves : MCMCMove or list of MCMCMove, optional

Optionally, the MCMCMoves to use for equilibration can be different from the ones used in production.

run(n_iterations=None)[source]

Run the alchemical phase simulation.

Parameters:
n_iterations : int, optional

If specified, only at most the specified number of iterations will be run (default is None).

extend(n_iterations)[source]

Extend the simulation by the given number of iterations.

Parameters:
n_iterations : int

The number of iterations to run.