Pipeline¶
Utility functions to help setting up Yank configurations.
-
yank.pipeline.
compute_squared_distances
(molecule1_positions, molecule2_positions)[source]¶ Compute the squared distances between the atoms of two molecules.
All the positions must be expressed in the same unit of measure.
Parameters: - molecule1_positions : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of molecule1.
- molecule2_positions : numpy.ndarray
An Mx3 array where, M is the number of atoms, containing the positions of the atoms of the molecule2.
Returns: - squared_distances : numpy.ndarray
An NxM array of squared distances. distances_squared[i][j] is the squared distance between atom i of molecule1 and atom j of molecule 2.
-
yank.pipeline.
compute_min_dist
(mol_positions, *args)[source]¶ Compute the minimum distance between a molecule and a set of other molecules.
All the positions must be expressed in the same unit of measure.
Parameters: - mol_positions : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others
Returns: - min_dist : float
The minimum distance between
mol_positions
and the other set of positions
Other Parameters: - args
A series of numpy.ndarrays containing the positions of the atoms of the other molecules
-
yank.pipeline.
compute_min_max_dist
(mol_positions, *args)[source]¶ Compute minimum and maximum distances between a molecule and a set of other molecules.
All the positions must be expressed in the same unit of measure.
Parameters: - mol_positions : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule for which we want to compute the minimum distance from the others
Returns: - min_dist : float
The minimum distance between mol_positions and the atoms of the other positions
- max_dist : float
The maximum distance between mol_positions and the atoms of the other positions
Other Parameters: - args
A series of numpy.ndarrays containing the positions of the atoms of the other molecules
Examples
>>> mol1_pos = np.array([[-1, -1, -1], [1, 1, 1]], np.float) >>> mol2_pos = np.array([[2, 2, 2], [2, 4, 5]], np.float) # determine min dist >>> mol3_pos = np.array([[3, 3, 3], [3, 4, 5]], np.float) # determine max dist >>> min_dist, max_dist = compute_min_max_dist(mol1_pos, mol2_pos, mol3_pos) >>> min_dist == np.linalg.norm(mol1_pos[1] - mol2_pos[0]) True >>> max_dist == np.linalg.norm(mol1_pos[1] - mol3_pos[1]) True
-
yank.pipeline.
compute_radius_of_gyration
(positions)[source]¶ Compute the radius of gyration of the specified coordinate set.
Parameters: - positions : simtk.unit.Quantity with units compatible with angstrom
The coordinate set (natoms x 3) for which the radius of gyration is to be computed.
Returns: - radius_of_gyration : simtk.unit.Quantity with units compatible with angstrom
The radius of gyration
-
yank.pipeline.
compute_net_charge
(system, atom_indices)[source]¶ Compute the total net charge of a subset of atoms in the system.
Parameters: - system : simtk.openmm.System
The system object containing the atoms of interest.
- atom_indices : list of int
Indices of the atoms of interest.
Returns: - net_charge : int
Total net charge as the sum of the partial charges of the atoms.
-
yank.pipeline.
find_alchemical_counterions
(system, topography, region_name)[source]¶ Return the atom indices of the ligand or solute counter ions.
In periodic systems, the solvation box needs to be neutral, and if the decoupled molecule is charged, it will cause trouble. This can be used to find a set of ions in the system that neutralize the molecule, so that the solvation box will remain neutral all the time.
Parameters: - system : simtk.openmm.System
The system object containing the atoms of interest.
- topography : yank.Topography
The topography object holding the indices of the ions and the ligand (for binding free energy) or solute (for transfer free energy).
- region_name : str
The region name in the topography (e.g. “ligand_atoms”) for which to find counter ions.
Returns: - counterions_indices : list of int
The list of atom indices in the system of the counter ions neutralizing the region.
Raises: - ValueError
If the topography region has no atoms, or if it impossible to neutralize the region with the ions in the system.
-
yank.pipeline.
get_leap_recommended_pbradii
(implicit_solvent)[source]¶ Return the recommended PBradii setting for LeAP.
Parameters: - implicit_solvent : str
The implicit solvent model.
Returns: - pbradii : str or object
The LeAP recommended PBradii for the model.
Raises: - ValueError
If the implicit solvent model is not supported by OpenMM.
Examples
>>> get_leap_recommended_pbradii('OBC2') 'mbondi2' >>> from simtk.openmm.app import HCT >>> get_leap_recommended_pbradii(HCT) 'mbondi'
-
yank.pipeline.
create_system
(parameters_file, box_vectors, create_system_args, system_options)[source]¶ Create and return an OpenMM system.
Parameters: - parameters_file : simtk.openmm.app.AmberPrmtopFile or GromacsTopFile
The file used to create they system.
- box_vectors : list of Vec3
The default box vectors of the system will be set to this value.
- create_system_args : dict of str
The kwargs accepted by the
createSystem()
function of theparameters_file
.- system_options : dict
The kwargs to forward to
createSystem()
.
Returns: - system : simtk.openmm.System
The system created.
-
yank.pipeline.
read_system_files
(positions_file_path, parameters_file_path, system_options, gromacs_include_dir=None)[source]¶ Create a Yank arguments for a phase from system files.
Parameters: - positions_file_path : str
Path to system position file (e.g. ‘complex.inpcrd/.gro/.pdb’).
- parameters_file_path : str
Path to system parameters file (e.g. ‘complex.prmtop/.top/.xml’).
- system_options : dict
system_options[phase]
is a a dictionary containing options to pass tocreateSystem()
. If the parameters file is an OpenMM system in XML format, this will be ignored.- gromacs_include_dir : str, optional
Path to directory in which to look for other files included from the gromacs top file.
Returns: - system : simtk.openmm.System
The OpenMM System built from the given files.
- topology : openmm.app.Topology
The OpenMM Topology built from the given files.
- sampler_state : openmmtools.states.SamplerState
The sampler state containing the positions of the atoms.
-
yank.pipeline.
remove_overlap
(mol_positions, *args, **kwargs)[source]¶ Remove any eventual overlap between a molecule and a set of others.
The method both randomly shifts and rotates the molecule (when overlapping atoms are detected) until it does not clash with any other given molecule anymore. All the others are kept fixed.
All the positions must be expressed in the same unit of measure.
Parameters: - mol_positions : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that we want to not clash with the others.
- min_distance : float
The minimum distance accepted to consider the molecule not clashing with the others. Must be in the same unit of measure of the positions.
- sigma : float
The maximum displacement for a single step. Must be in the same unit of measure of the positions.
Returns: - x : numpy.ndarray
Positions of the atoms of the given molecules that do not clash.
Other Parameters: - args
A series of numpy.ndarrays containing the positions of the atoms of the molecules that are kept fixed.
-
yank.pipeline.
pack_transformation
(mol1_pos, mol2_pos, min_distance, max_distance)[source]¶ Compute an affine transformation that solve clashes and fit mol2 in the box.
The method randomly shifts and rotates mol2 until all its atoms are within min_distance and max_distance from mol1. The position of mol1 is kept fixed. Every 200 failed iterations, the algorithm increases max_distance by 50%. It raise an exception after 1000 iterations.
All the positions must be expressed in the same unit of measure.
Parameters: - mol1_pos : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be kept fixed.
- mol2_pos : numpy.ndarray
An Nx3 array where, N is the number of atoms, containing the positions of the atoms of the molecule that will be eventually moved.
- min_distance : float
The minimum distance accepted to consider mol2 not clashing with mol1. It must be in the same unit of measure of the positions.
- max_distance : float
The maximum distance from mol1 to consider mol2 within the box. It must be in the same unit of measure of the positions.
Returns: - transformation : numpy.ndarray
A 4x4 ndarray representing the affine transformation that translate and rotate mol2.
-
yank.pipeline.
pull_close
(fixed_mol_pos, translated_mol_pos, min_bound, max_bound)[source]¶ Heuristic algorithm to quickly translate the ligand close to the receptor.
The distance of the ligand from the receptor here is defined as the shortest Euclidean distance between an atom of the ligand and one of the receptor. The molecules positions will not be modified if the ligand is already at a distance in the interval [min_bound, max_bound].
Parameters: - fixed_mol_pos : numpy.array
The positions of the molecule to keep fixed as a Nx3 array.
- translated_mol_pos : numpy.array
The positions of the molecule to translate as a Nx3 array.
- min_bound : float
Minimum distance from the receptor to the ligand. This should be high enough for the ligand to not overlap the receptor atoms at the beginning of the simulation.
- max_bound : float
Maximum distance from the receptor to the ligand. This should be short enough to make the ligand and the receptor interact since the beginning of the simulation.
Returns: - translation : numpy.array
A 1x3 array containing the translation vector to apply to translated_mol_pos to move the molecule at a distance between min_bound and max_bound from fixed_mol_pos.
-
yank.pipeline.
strip_protons
(input_file_path, output_file_path)[source]¶ Remove all hydrogens from PDB file and save the result.
Input and output file cannot be the same file
Parameters: - input_file_path : str
Full file path to the file to read, including extensions
- output_file_path : str
Full file path to the file to save, including extensions
-
yank.pipeline.
apply_pdbfixer
(input_file_path, output_file_path, directives)[source]¶ Apply PDBFixer to make changes to the specified molecule.
Single mutants are supported in the form “T315I” Double mutants are supported in the form “L858R/T790M”
The string “WT” still pushes the molecule through PDBFixer, but makes no mutations. This is useful for testing.
Original PDB file numbering scheme is used.
Currently, only PDB files are supported. pdbfixer is used to make the mutations
Parameters: - input_file_path : str
Full file path to the file to read, including extensions
- output_file_path : str
Full file path to the file to save, including extensions
- directives : dict
Dict containing directives for PDBFixer.
-
yank.pipeline.
read_csv_lines
(file_path, lines)[source]¶ Return a list of CSV records.
The function takes care of ignoring comments and blank lines.
Parameters: - file_path : str
The path to the CSV file.
- lines : ‘all’ or int
The index of the line to read or ‘all’ to return the list of all lines.
Returns: - records : str or list of str
The CSV record if lines is an integer, or a list of CSV records if it is ‘all’.
-
class
yank.pipeline.
SetupDatabase
(setup_dir, molecules=None, solvents=None, systems=None)[source]¶ Provide utility functions to set up systems and molecules.
The object allows to access molecules, systems and solvents by ‘id’ and takes care of parametrizing molecules and creating the AMBER prmtop and inpcrd files describing systems.
Parameters: - setup_dir : str
Path to the main setup directory. Changing this means changing the database.
- molecules : dict, Optional. Default: None
YAML description of the molecules. Dictionary should be of form {molecule_id : molecule YAML description}
- solvents : dict, Optional. Default: None
YAML description of the solvents. Dictionary should be of form {solvent_id : solvent YAML description}
- systems : dict, Optional. Default: None
YAML description of the systems. Dictionary should be of form {system_id : system YAML description}
-
SYSTEMS_DIR
= 'systems'¶ Stock system’s sub-directory name
-
MOLECULES_DIR
= 'molecules'¶ Stock Molecules sub-directory name
-
CLASH_THRESHOLD
= 1.5¶ distance in Angstroms to consider two atoms clashing
-
get_molecule_dir
(molecule_id)[source]¶ Return the directory where the parameter files are stored.
Parameters: - molecule_id : str
The ID of the molecule.
Returns: - str
The path to the molecule directory.
-
get_system_files_paths
(system_id)[source]¶ Return the paths to the systems files.
Parameters: - system_id : str
The ID of the system.
Returns: - system_files_paths : list of namedtuple
Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).
-
is_molecule_setup
(molecule_id)[source]¶ Check whether the molecule has been processed previously.
The molecule must be set up if it needs to be parametrize by antechamber (and the gaff.mol2 and frcmod files do not exist), if the molecule must be generated by OpenEye, or if it needs to be extracted by a multi-molecule file.
An example to clarify the difference between the two return values: a protein in a single-frame pdb does not have to be processed (since it does not go through antechamber) thus the function will return
is_setup=True
andis_processed=False
.Parameters: - molecule_id : str
The id of the molecule.
Returns: - is_setup : bool
True if the molecule’s parameter files have been specified by the user or if they have been generated by SetupDatabase.
- is_processed : bool
True if parameter files have been generated previously by SetupDatabase (i.e. if the parameter files were not manually specified by the user).
-
is_system_setup
(system_id)[source]¶ Check whether the system has been already processed.
Parameters: - system_id : str
The ID of the system.
Returns: - is_setup : bool
True if the system is ready to be used for an experiment. Either because the system has directly provided the system files, or because it already went through the setup pipeline.
- is_processed : bool
True if the system has already gone through the setup pipeline.
-
get_system
(system_id)[source]¶ Make sure that the system files are set up and return the system folder.
If necessary, create the prmtop and inpcrd files from the given components. The system files are generated with tleap. If no molecule specifies a general force field, leaprc.ff14SB is loaded.
Parameters: - system_id : str
The ID of the system.
Returns: - system_files_paths : list of namedtuple
Elements of the list contain the paths to the system files for each phase. Each namedtuple contains the fields position_path (e.g. inpcrd, gro, or pdb) and parameters_path (e.g. prmtop, top, or xml).
-
yank.pipeline.
trailblaze_alchemical_protocol
(thermodynamic_state, sampler_state, mcmc_move, state_parameters, std_energy_threshold=0.5, threshold_tolerance=0.05, n_samples_per_state=100)[source]¶ Find an alchemical path by placing alchemical states at a fixed distance.
The distance between two states is estimated by collecting
n_samples_per_state
configurations through the MCMCMove in one of the two alchemical states, and computing the standard deviation of the difference of potential energies between the two states at those configurations.Two states are chosen for the protocol if their standard deviation is within
std_energy_threshold +- threshold_tolerance
.Parameters: - thermodynamic_state : openmmtools.states.CompoundThermodynamicState
The state of the alchemically modified system.
- sampler_state : openmmtools.states.SamplerState
The sampler states including initial positions and box vectors.
- mcmc_move : openmmtools.mcmc.MCMCMove
The MCMCMove to use for propagation.
- state_parameters : list of tuples (str, [float, float])
Each element of this list is a tuple containing first the name of the parameter to be modified (e.g.
lambda_electrostatics
,lambda_sterics
) and a list specifying the initial and final values for the path.- std_energy_threshold : float
The threshold that determines how to separate the states between each others.
- threshold_tolerance : float
The tolerance on the found standard deviation.
- n_samples_per_state : int
How many samples to collect to estimate the overlap between two states.
Returns: - optimal_protocol : dict {str, list of floats}
The estimated protocol. Each dictionary key is one of the parameters in
state_parameters
, and its values is the list of values that it takes in each state of the path.