Pipeline¶

Utility functions to help setting up Yank configurations.

yank.pipeline.compute_squared_distances(molecule1_positions, molecule2_positions)[source]¶

Compute the squared distances between the atoms of two molecules.

All the positions must be expressed in the same unit of measure.

Parameters:	molecule1_positions : numpy.ndarray An Nx3 array where, N is the number of atoms, containing the positions of the atoms of molecule1. molecule2_positions : numpy.ndarray An Mx3 array where, M is the number of atoms, containing the positions of the atoms of the molecule2.
Returns:	squared_distances : numpy.ndarray An NxM array of squared distances. distances_squared[i][j] is the squared distance between atom i of molecule1 and atom j of molecule 2.

yank.pipeline.compute_min_dist(mol_positions, *args)[source]¶

Compute the minimum distance between a molecule and a set of other molecules.

All the positions must be expressed in the same unit of measure.

Other Parameters:
Parameters:	mol_positions : numpy.ndarray An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others
Returns:	min_dist : float The minimum distance between `mol_positions` and the other set of positions
	args A series of numpy.ndarrays containing the positions of the atoms of the other molecules

yank.pipeline.compute_min_max_dist(mol_positions, *args)[source]¶

Compute minimum and maximum distances between a molecule and a set of other molecules.

All the positions must be expressed in the same unit of measure.

Other Parameters:
Parameters:	mol_positions : numpy.ndarray An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others
Returns:	min_dist : float The minimum distance between mol_positions and the atoms of the other positions max_dist : float The maximum distance between mol_positions and the atoms of the other positions
	args A series of numpy.ndarrays containing the positions of the atoms of the other molecules

Examples

>>> mol1_pos = np.array([[-1, -1, -1], [1, 1, 1]], np.float)
>>> mol2_pos = np.array([[2, 2, 2], [2, 4, 5]], np.float)  # determine min dist
>>> mol3_pos = np.array([[3, 3, 3], [3, 4, 5]], np.float)  # determine max dist
>>> min_dist, max_dist = compute_min_max_dist(mol1_pos, mol2_pos, mol3_pos)
>>> min_dist == np.linalg.norm(mol1_pos[1] - mol2_pos[0])
True
>>> max_dist == np.linalg.norm(mol1_pos[1] - mol3_pos[1])
True

yank.pipeline.compute_radius_of_gyration(positions)[source]¶

Compute the radius of gyration of the specified coordinate set.

Parameters:	positions : simtk.unit.Quantity with units compatible with angstrom The coordinate set (natoms x 3) for which the radius of gyration is to be computed.
Returns:	radius_of_gyration : simtk.unit.Quantity with units compatible with angstrom The radius of gyration

yank.pipeline.compute_net_charge(system, atom_indices)[source]¶

Compute the total net charge of a subset of atoms in the system.

Parameters:	system : simtk.openmm.System The system object containing the atoms of interest. atom_indices : list of int Indices of the atoms of interest.
Returns:	net_charge : int Total net charge as the sum of the partial charges of the atoms.

yank.pipeline.find_alchemical_counterions(system, topography, region_name)[source]¶

Return the atom indices of the ligand or solute counter ions.

In periodic systems, the solvation box needs to be neutral, and if the decoupled molecule is charged, it will cause trouble. This can be used to find a set of ions in the system that neutralize the molecule, so that the solvation box will remain neutral all the time.

Parameters:	system : simtk.openmm.System The system object containing the atoms of interest. topography : yank.Topography The topography object holding the indices of the ions and the ligand (for binding free energy) or solute (for transfer free energy). region_name : str The region name in the topography (e.g. “ligand_atoms”) for which to find counter ions.
Returns:	counterions_indices : list of int The list of atom indices in the system of the counter ions neutralizing the region.
Raises:	ValueError If the topography region has no atoms, or if it impossible to neutralize the region with the ions in the system.

yank.pipeline.get_leap_recommended_pbradii(implicit_solvent)[source]¶

Return the recommended PBradii setting for LeAP.

Parameters:	implicit_solvent : str The implicit solvent model.
Returns:	pbradii : str or object The LeAP recommended PBradii for the model.
Raises:	ValueError If the implicit solvent model is not supported by OpenMM.

Examples

>>> get_leap_recommended_pbradii('OBC2')
'mbondi2'
>>> from simtk.openmm.app import HCT
>>> get_leap_recommended_pbradii(HCT)
'mbondi'

yank.pipeline.create_system(parameters_file, box_vectors, create_system_args, system_options)[source]¶

Create and return an OpenMM system.

Parameters:	parameters_file : simtk.openmm.app.AmberPrmtopFile or GromacsTopFile The file used to create they system. box_vectors : list of Vec3 The default box vectors of the system will be set to this value. create_system_args : dict of str The kwargs accepted by the `createSystem()` function of the `parameters_file`. system_options : dict The kwargs to forward to `createSystem()`.
Returns:	system : simtk.openmm.System The system created.

yank.pipeline.read_system_files(positions_file_path, parameters_file_path, system_options, gromacs_include_dir=None)[source]¶

Create a Yank arguments for a phase from system files.

Parameters:

positions_file_path : str: Path to system position file (e.g. ‘complex.inpcrd/.gro/.pdb’).
parameters_file_path : str: Path to system parameters file (e.g. ‘complex.prmtop/.top/.xml’).
system_options : dict: system_options[phase] is a a dictionary containing options to pass to createSystem(). If the parameters file is an OpenMM system in XML format, this will be ignored.
gromacs_include_dir : str, optional: Path to directory in which to look for other files included from the gromacs top file.

Returns:

system : simtk.openmm.System: The OpenMM System built from the given files.
topology : openmm.app.Topology: The OpenMM Topology built from the given files.
sampler_state : openmmtools.states.SamplerState: The sampler state containing the positions of the atoms.

yank.pipeline.remove_overlap(mol_positions, *args, **kwargs)[source]¶

Remove any eventual overlap between a molecule and a set of others.

The method both randomly shifts and rotates the molecule (when overlapping atoms are detected) until it does not clash with any other given molecule anymore. All the others are kept fixed.

All the positions must be expressed in the same unit of measure.

Other Parameters:
Parameters:	mol_positions : numpy.ndarray An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that we want to not clash with the others. min_distance : float The minimum distance accepted to consider the molecule not clashing with the others. Must be in the same unit of measure of the positions. sigma : float The maximum displacement for a single step. Must be in the same unit of measure of the positions.
Returns:	x : numpy.ndarray Positions of the atoms of the given molecules that do not clash.
	args A series of numpy.ndarrays containing the positions of the atoms of the molecules that are kept fixed.

yank.pipeline.pack_transformation(mol1_pos, mol2_pos, min_distance, max_distance)[source]¶

Compute an affine transformation that solve clashes and fit mol2 in the box.

The method randomly shifts and rotates mol2 until all its atoms are within min_distance and max_distance from mol1. The position of mol1 is kept fixed. Every 200 failed iterations, the algorithm increases max_distance by 50%. It raise an exception after 1000 iterations.

All the positions must be expressed in the same unit of measure.

Parameters:

mol1_pos : numpy.ndarray: An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be kept fixed.
mol2_pos : numpy.ndarray: An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be eventually moved.
min_distance : float: The minimum distance accepted to consider mol2 not clashing with mol1. It must be in the same unit of measure of the positions.
max_distance : float: The maximum distance from mol1 to consider mol2 within the box. It must be in the same unit of measure of the positions.

Returns:

transformation : numpy.ndarray: A 4x4 ndarray representing the affine transformation that translate and rotate mol2.

yank.pipeline.pull_close(fixed_mol_pos, translated_mol_pos, min_bound, max_bound)[source]¶

Heuristic algorithm to quickly translate the ligand close to the receptor.

The distance of the ligand from the receptor here is defined as the shortest Euclidean distance between an atom of the ligand and one of the receptor. The molecules positions will not be modified if the ligand is already at a distance in the interval [min_bound, max_bound].

Parameters:

fixed_mol_pos : numpy.array: The positions of the molecule to keep fixed as a Nx3 array.
translated_mol_pos : numpy.array: The positions of the molecule to translate as a Nx3 array.
min_bound : float: Minimum distance from the receptor to the ligand. This should be high enough for the ligand to not overlap the receptor atoms at the beginning of the simulation.
max_bound : float: Maximum distance from the receptor to the ligand. This should be short enough to make the ligand and the receptor interact since the beginning of the simulation.

Returns:

translation : numpy.array: A 1x3 array containing the translation vector to apply to translated_mol_pos to move the molecule at a distance between min_bound and max_bound from fixed_mol_pos.

yank.pipeline.strip_protons(input_file_path, output_file_path)[source]¶

Remove all hydrogens from PDB file and save the result.

Input and output file cannot be the same file

Parameters:	input_file_path : str Full file path to the file to read, including extensions output_file_path : str Full file path to the file to save, including extensions

yank.pipeline.process_tool_directive(directives, option, dispatch, allowed_values, yields_value=False)[source]¶

Process a directive.

Parameters:

option : str: The name of the option to be processed. Will remove this option from directives once processed.
dispatch : function: The function to call.
allowed_values : list: If not None, the value of directives[option] will be checked against this list
yields_value : boolean, default False: Tells this function to expect a return from the dispatch function and give it back as needed

yank.pipeline.apply_pdbfixer(input_file_path, output_file_path, directives)[source]¶

Apply PDBFixer to make changes to the specified molecule.

Single mutants are supported in the form “T315I” Double mutants are supported in the form “L858R/T790M”

The string “WT” still pushes the molecule through PDBFixer, but makes no mutations. This is useful for testing.

Original PDB file numbering scheme is used.

Currently, only PDB files are supported. pdbfixer is used to make the mutations

Parameters:	input_file_path : str Full file path to the file to read, including extensions output_file_path : str Full file path to the file to save, including extensions directives : dict Dict containing directives for PDBFixer.

yank.pipeline.apply_modeller(input_file_path, output_file_path, directives)[source]¶

Apply Salilab Modeller to make changes to the specified molecule.

Single mutants are supported in the form “T315I” Double mutants are not currently supported.

The string “WT” makes no mutations.

Original PDB file numbering scheme is used.

Currently, only PDB files are supported. modeller is used to make the mutations. You must have a license file installed for this to work

Parameters:	input_file_path : str Full file path to the file to read, including extensions output_file_path : str Full file path to the file to save, including extensions directives : dict Dict containing directives for modeller.

yank.pipeline.read_csv_lines(file_path, lines)[source]¶

Return a list of CSV records.

The function takes care of ignoring comments and blank lines.

Parameters:	file_path : str The path to the CSV file. lines : ‘all’ or int The index of the line to read or ‘all’ to return the list of all lines.
Returns:	records : str or list of str The CSV record if lines is an integer, or a list of CSV records if it is ‘all’.

class yank.pipeline.SetupDatabase(setup_dir, molecules=None, solvents=None, systems=None)[source]¶

Provide utility functions to set up systems and molecules.

The object allows to access molecules, systems and solvents by ‘id’ and takes care of parametrizing molecules and creating the AMBER prmtop and inpcrd files describing systems.

Parameters:

setup_dir : str: Path to the main setup directory. Changing this means changing the database.
molecules : dict, Optional. Default: None: YAML description of the molecules. Dictionary should be of form {molecule_id : molecule YAML description}
solvents : dict, Optional. Default: None: YAML description of the solvents. Dictionary should be of form {solvent_id : solvent YAML description}
systems : dict, Optional. Default: None: YAML description of the systems. Dictionary should be of form {system_id : system YAML description}

SYSTEMS_DIR = 'systems'¶: Stock system’s sub-directory name

MOLECULES_DIR = 'molecules'¶: Stock Molecules sub-directory name

CLASH_THRESHOLD = 1.5¶: distance in Angstroms to consider two atoms clashing

get_molecule_dir(molecule_id)[source]¶

Return the directory where the parameter files are stored.

Parameters:	molecule_id : str The ID of the molecule.
Returns:	str The path to the molecule directory.

get_system_files_paths(system_id)[source]¶

Return the paths to the systems files.

Parameters:	system_id : str The ID of the system.
Returns:	system_files_paths : list of namedtuple Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).

is_molecule_setup(molecule_id)[source]¶

Check whether the molecule has been processed previously.

The molecule must be set up if it needs to be parametrize by antechamber (and the gaff.mol2 and frcmod files do not exist), if the molecule must be generated by OpenEye, or if it needs to be extracted by a multi-molecule file.

An example to clarify the difference between the two return values: a protein in a single-frame pdb does not have to be processed (since it does not go through antechamber) thus the function will return is_setup=True and is_processed=False.

Parameters:	molecule_id : str The id of the molecule.
Returns:	is_setup : bool True if the molecule’s parameter files have been specified by the user or if they have been generated by SetupDatabase. is_processed : bool True if parameter files have been generated previously by SetupDatabase (i.e. if the parameter files were not manually specified by the user).

is_system_setup(system_id)[source]¶

Check whether the system has been already processed.

Parameters:	system_id : str The ID of the system.
Returns:	is_setup : bool True if the system is ready to be used for an experiment. Either because the system has directly provided the system files, or because it already went through the setup pipeline. is_processed : bool True if the system has already gone through the setup pipeline.

get_system(system_id)[source]¶

Make sure that the system files are set up and return the system folder.

If necessary, create the prmtop and inpcrd files from the given components. The system files are generated with tleap. If no molecule specifies a general force field, leaprc.ff14SB is loaded.

Parameters:	system_id : str The ID of the system.
Returns:	system_files_paths : list of namedtuple Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).

setup_all_systems()[source]¶

Setup all molecules and systems in the database.

The method supports parallelization through MPI.

yank.pipeline.trailblaze_alchemical_protocol(thermodynamic_state, sampler_state, mcmc_move, state_parameters, std_energy_threshold=0.5, threshold_tolerance=0.05, n_samples_per_state=100)[source]¶

Find an alchemical path by placing alchemical states at a fixed distance.

The distance between two states is estimated by collecting n_samples_per_state configurations through the MCMCMove in one of the two alchemical states, and computing the standard deviation of the difference of potential energies between the two states at those configurations.

Two states are chosen for the protocol if their standard deviation is within std_energy_threshold +- threshold_tolerance.

Parameters:

thermodynamic_state : openmmtools.states.CompoundThermodynamicState: The state of the alchemically modified system.
sampler_state : openmmtools.states.SamplerState: The sampler states including initial positions and box vectors.
mcmc_move : openmmtools.mcmc.MCMCMove: The MCMCMove to use for propagation.
state_parameters : list of tuples (str, [float, float]): Each element of this list is a tuple containing first the name of the parameter to be modified (e.g. lambda_electrostatics, lambda_sterics) and a list specifying the initial and final values for the path.
std_energy_threshold : float: The threshold that determines how to separate the states between each others.
threshold_tolerance : float: The tolerance on the found standard deviation.
n_samples_per_state : int: How many samples to collect to estimate the overlap between two states.

Returns:

optimal_protocol : dict {str, list of floats}: The estimated protocol. Each dictionary key is one of the parameters in state_parameters, and its values is the list of values that it takes in each state of the path.