Pipeline

Utility functions to help setting up Yank configurations.

yank.pipeline.compute_squared_distances(molecule1_positions, molecule2_positions)[source]

Compute the squared distances between the atoms of two molecules.

All the positions must be expressed in the same unit of measure.

Parameters:
molecule1_positions : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of molecule1.

molecule2_positions : numpy.ndarray

An Mx3 array where, M is the number of atoms, containing the positions of the atoms of the molecule2.

Returns:
squared_distances : numpy.ndarray

An NxM array of squared distances. distances_squared[i][j] is the squared distance between atom i of molecule1 and atom j of molecule 2.

yank.pipeline.compute_min_dist(mol_positions, *args)[source]

Compute the minimum distance between a molecule and a set of other molecules.

All the positions must be expressed in the same unit of measure.

Parameters:
mol_positions : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others

Returns:
min_dist : float

The minimum distance between mol_positions and the other set of positions

Other Parameters:
 
args

A series of numpy.ndarrays containing the positions of the atoms of the other molecules

yank.pipeline.compute_min_max_dist(mol_positions, *args)[source]

Compute minimum and maximum distances between a molecule and a set of other molecules.

All the positions must be expressed in the same unit of measure.

Parameters:
mol_positions : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others

Returns:
min_dist : float

The minimum distance between mol_positions and the atoms of the other positions

max_dist : float

The maximum distance between mol_positions and the atoms of the other positions

Other Parameters:
 
args

A series of numpy.ndarrays containing the positions of the atoms of the other molecules

Examples

>>> mol1_pos = np.array([[-1, -1, -1], [1, 1, 1]], np.float)
>>> mol2_pos = np.array([[2, 2, 2], [2, 4, 5]], np.float)  # determine min dist
>>> mol3_pos = np.array([[3, 3, 3], [3, 4, 5]], np.float)  # determine max dist
>>> min_dist, max_dist = compute_min_max_dist(mol1_pos, mol2_pos, mol3_pos)
>>> min_dist == np.linalg.norm(mol1_pos[1] - mol2_pos[0])
True
>>> max_dist == np.linalg.norm(mol1_pos[1] - mol3_pos[1])
True
yank.pipeline.compute_radius_of_gyration(positions)[source]

Compute the radius of gyration of the specified coordinate set.

Parameters:
positions : simtk.unit.Quantity with units compatible with angstrom

The coordinate set (natoms x 3) for which the radius of gyration is to be computed.

Returns:
radius_of_gyration : simtk.unit.Quantity with units compatible with angstrom

The radius of gyration

yank.pipeline.compute_net_charge(system, atom_indices)[source]

Compute the total net charge of a subset of atoms in the system.

Parameters:
system : simtk.openmm.System

The system object containing the atoms of interest.

atom_indices : list of int

Indices of the atoms of interest.

Returns:
net_charge : int

Total net charge as the sum of the partial charges of the atoms.

yank.pipeline.find_alchemical_counterions(system, topography, region_name)[source]

Return the atom indices of the ligand or solute counter ions.

In periodic systems, the solvation box needs to be neutral, and if the decoupled molecule is charged, it will cause trouble. This can be used to find a set of ions in the system that neutralize the molecule, so that the solvation box will remain neutral all the time.

Parameters:
system : simtk.openmm.System

The system object containing the atoms of interest.

topography : yank.Topography

The topography object holding the indices of the ions and the ligand (for binding free energy) or solute (for transfer free energy).

region_name : str

The region name in the topography (e.g. “ligand_atoms”) for which to find counter ions.

Returns:
counterions_indices : list of int

The list of atom indices in the system of the counter ions neutralizing the region.

Raises:
ValueError

If the topography region has no atoms, or if it impossible to neutralize the region with the ions in the system.

Return the recommended PBradii setting for LeAP.

Parameters:
implicit_solvent : str

The implicit solvent model.

Returns:
pbradii : str or object

The LeAP recommended PBradii for the model.

Raises:
ValueError

If the implicit solvent model is not supported by OpenMM.

Examples

>>> get_leap_recommended_pbradii('OBC2')
'mbondi2'
>>> from simtk.openmm.app import HCT
>>> get_leap_recommended_pbradii(HCT)
'mbondi'
yank.pipeline.create_system(parameters_file, box_vectors, create_system_args, system_options)[source]

Create and return an OpenMM system.

Parameters:
parameters_file : simtk.openmm.app.AmberPrmtopFile or GromacsTopFile

The file used to create they system.

box_vectors : list of Vec3

The default box vectors of the system will be set to this value.

create_system_args : dict of str

The kwargs accepted by the createSystem() function of the parameters_file.

system_options : dict

The kwargs to forward to createSystem().

Returns:
system : simtk.openmm.System

The system created.

yank.pipeline.read_system_files(positions_file_path, parameters_file_path, system_options, gromacs_include_dir=None)[source]

Create a Yank arguments for a phase from system files.

Parameters:
positions_file_path : str

Path to system position file (e.g. ‘complex.inpcrd/.gro/.pdb’).

parameters_file_path : str

Path to system parameters file (e.g. ‘complex.prmtop/.top/.xml’).

system_options : dict

system_options[phase] is a a dictionary containing options to pass to createSystem(). If the parameters file is an OpenMM system in XML format, this will be ignored.

gromacs_include_dir : str, optional

Path to directory in which to look for other files included from the gromacs top file.

Returns:
system : simtk.openmm.System

The OpenMM System built from the given files.

topology : openmm.app.Topology

The OpenMM Topology built from the given files.

sampler_state : openmmtools.states.SamplerState

The sampler state containing the positions of the atoms.

yank.pipeline.remove_overlap(mol_positions, *args, **kwargs)[source]

Remove any eventual overlap between a molecule and a set of others.

The method both randomly shifts and rotates the molecule (when overlapping atoms are detected) until it does not clash with any other given molecule anymore. All the others are kept fixed.

All the positions must be expressed in the same unit of measure.

Parameters:
mol_positions : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that we want to not clash with the others.

min_distance : float

The minimum distance accepted to consider the molecule not clashing with the others. Must be in the same unit of measure of the positions.

sigma : float

The maximum displacement for a single step. Must be in the same unit of measure of the positions.

Returns:
x : numpy.ndarray

Positions of the atoms of the given molecules that do not clash.

Other Parameters:
 
args

A series of numpy.ndarrays containing the positions of the atoms of the molecules that are kept fixed.

yank.pipeline.pack_transformation(mol1_pos, mol2_pos, min_distance, max_distance)[source]

Compute an affine transformation that solve clashes and fit mol2 in the box.

The method randomly shifts and rotates mol2 until all its atoms are within min_distance and max_distance from mol1. The position of mol1 is kept fixed. Every 200 failed iterations, the algorithm increases max_distance by 50%. It raise an exception after 1000 iterations.

All the positions must be expressed in the same unit of measure.

Parameters:
mol1_pos : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be kept fixed.

mol2_pos : numpy.ndarray

An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be eventually moved.

min_distance : float

The minimum distance accepted to consider mol2 not clashing with mol1. It must be in the same unit of measure of the positions.

max_distance : float

The maximum distance from mol1 to consider mol2 within the box. It must be in the same unit of measure of the positions.

Returns:
transformation : numpy.ndarray

A 4x4 ndarray representing the affine transformation that translate and rotate mol2.

yank.pipeline.pull_close(fixed_mol_pos, translated_mol_pos, min_bound, max_bound)[source]

Heuristic algorithm to quickly translate the ligand close to the receptor.

The distance of the ligand from the receptor here is defined as the shortest Euclidean distance between an atom of the ligand and one of the receptor. The molecules positions will not be modified if the ligand is already at a distance in the interval [min_bound, max_bound].

Parameters:
fixed_mol_pos : numpy.array

The positions of the molecule to keep fixed as a Nx3 array.

translated_mol_pos : numpy.array

The positions of the molecule to translate as a Nx3 array.

min_bound : float

Minimum distance from the receptor to the ligand. This should be high enough for the ligand to not overlap the receptor atoms at the beginning of the simulation.

max_bound : float

Maximum distance from the receptor to the ligand. This should be short enough to make the ligand and the receptor interact since the beginning of the simulation.

Returns:
translation : numpy.array

A 1x3 array containing the translation vector to apply to translated_mol_pos to move the molecule at a distance between min_bound and max_bound from fixed_mol_pos.

yank.pipeline.strip_protons(input_file_path, output_file_path)[source]

Remove all hydrogens from PDB file and save the result.

Input and output file cannot be the same file

Parameters:
input_file_path : str

Full file path to the file to read, including extensions

output_file_path : str

Full file path to the file to save, including extensions

yank.pipeline.process_tool_directive(directives, option, dispatch, allowed_values, yields_value=False)[source]

Process a directive.

Parameters:
option : str

The name of the option to be processed. Will remove this option from directives once processed.

dispatch : function

The function to call.

allowed_values : list

If not None, the value of directives[option] will be checked against this list

yields_value : boolean, default False

Tells this function to expect a return from the dispatch function and give it back as needed

yank.pipeline.apply_pdbfixer(input_file_path, output_file_path, directives)[source]

Apply PDBFixer to make changes to the specified molecule.

Single mutants are supported in the form “T315I” Double mutants are supported in the form “L858R/T790M”

The string “WT” still pushes the molecule through PDBFixer, but makes no mutations. This is useful for testing.

Original PDB file numbering scheme is used.

Currently, only PDB files are supported. pdbfixer is used to make the mutations

Parameters:
input_file_path : str

Full file path to the file to read, including extensions

output_file_path : str

Full file path to the file to save, including extensions

directives : dict

Dict containing directives for PDBFixer.

yank.pipeline.apply_modeller(input_file_path, output_file_path, directives)[source]

Apply Salilab Modeller to make changes to the specified molecule.

Single mutants are supported in the form “T315I” Double mutants are not currently supported.

The string “WT” makes no mutations.

Original PDB file numbering scheme is used.

Currently, only PDB files are supported. modeller is used to make the mutations. You must have a license file installed for this to work

Parameters:
input_file_path : str

Full file path to the file to read, including extensions

output_file_path : str

Full file path to the file to save, including extensions

directives : dict

Dict containing directives for modeller.

yank.pipeline.read_csv_lines(file_path, lines)[source]

Return a list of CSV records.

The function takes care of ignoring comments and blank lines.

Parameters:
file_path : str

The path to the CSV file.

lines : ‘all’ or int

The index of the line to read or ‘all’ to return the list of all lines.

Returns:
records : str or list of str

The CSV record if lines is an integer, or a list of CSV records if it is ‘all’.

class yank.pipeline.SetupDatabase(setup_dir, molecules=None, solvents=None, systems=None)[source]

Provide utility functions to set up systems and molecules.

The object allows to access molecules, systems and solvents by ‘id’ and takes care of parametrizing molecules and creating the AMBER prmtop and inpcrd files describing systems.

Parameters:
setup_dir : str

Path to the main setup directory. Changing this means changing the database.

molecules : dict, Optional. Default: None

YAML description of the molecules. Dictionary should be of form {molecule_id : molecule YAML description}

solvents : dict, Optional. Default: None

YAML description of the solvents. Dictionary should be of form {solvent_id : solvent YAML description}

systems : dict, Optional. Default: None

YAML description of the systems. Dictionary should be of form {system_id : system YAML description}

SYSTEMS_DIR = 'systems'

Stock system’s sub-directory name

MOLECULES_DIR = 'molecules'

Stock Molecules sub-directory name

CLASH_THRESHOLD = 1.5

distance in Angstroms to consider two atoms clashing

get_molecule_dir(molecule_id)[source]

Return the directory where the parameter files are stored.

Parameters:
molecule_id : str

The ID of the molecule.

Returns:
str

The path to the molecule directory.

get_system_files_paths(system_id)[source]

Return the paths to the systems files.

Parameters:
system_id : str

The ID of the system.

Returns:
system_files_paths : list of namedtuple

Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).

is_molecule_setup(molecule_id)[source]

Check whether the molecule has been processed previously.

The molecule must be set up if it needs to be parametrize by antechamber (and the gaff.mol2 and frcmod files do not exist), if the molecule must be generated by OpenEye, or if it needs to be extracted by a multi-molecule file.

An example to clarify the difference between the two return values: a protein in a single-frame pdb does not have to be processed (since it does not go through antechamber) thus the function will return is_setup=True and is_processed=False.

Parameters:
molecule_id : str

The id of the molecule.

Returns:
is_setup : bool

True if the molecule’s parameter files have been specified by the user or if they have been generated by SetupDatabase.

is_processed : bool

True if parameter files have been generated previously by SetupDatabase (i.e. if the parameter files were not manually specified by the user).

is_system_setup(system_id)[source]

Check whether the system has been already processed.

Parameters:
system_id : str

The ID of the system.

Returns:
is_setup : bool

True if the system is ready to be used for an experiment. Either because the system has directly provided the system files, or because it already went through the setup pipeline.

is_processed : bool

True if the system has already gone through the setup pipeline.

get_system(system_id)[source]

Make sure that the system files are set up and return the system folder.

If necessary, create the prmtop and inpcrd files from the given components. The system files are generated with tleap. If no molecule specifies a general force field, leaprc.ff14SB is loaded.

Parameters:
system_id : str

The ID of the system.

Returns:
system_files_paths : list of namedtuple

Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).

setup_all_systems()[source]

Setup all molecules and systems in the database.

The method supports parallelization through MPI.

yank.pipeline.trailblaze_alchemical_protocol(thermodynamic_state, sampler_state, mcmc_move, state_parameters, std_energy_threshold=0.5, threshold_tolerance=0.05, n_samples_per_state=100)[source]

Find an alchemical path by placing alchemical states at a fixed distance.

The distance between two states is estimated by collecting n_samples_per_state configurations through the MCMCMove in one of the two alchemical states, and computing the standard deviation of the difference of potential energies between the two states at those configurations.

Two states are chosen for the protocol if their standard deviation is within std_energy_threshold +- threshold_tolerance.

Parameters:
thermodynamic_state : openmmtools.states.CompoundThermodynamicState

The state of the alchemically modified system.

sampler_state : openmmtools.states.SamplerState

The sampler states including initial positions and box vectors.

mcmc_move : openmmtools.mcmc.MCMCMove

The MCMCMove to use for propagation.

state_parameters : list of tuples (str, [float, float])

Each element of this list is a tuple containing first the name of the parameter to be modified (e.g. lambda_electrostatics, lambda_sterics) and a list specifying the initial and final values for the path.

std_energy_threshold : float

The threshold that determines how to separate the states between each others.

threshold_tolerance : float

The tolerance on the found standard deviation.

n_samples_per_state : int

How many samples to collect to estimate the overlap between two states.

Returns:
optimal_protocol : dict {str, list of floats}

The estimated protocol. Each dictionary key is one of the parameters in state_parameters, and its values is the list of values that it takes in each state of the path.