Absolute Binding of Binders to T4 lysozyme L99A in Explicit Solvent with Combinatorial Options
This example performs absolute binding free energy calculations for a series of small molecules that are known binders of the T4 lysozyme L99A protein.
We take advantage of three advanced features of YANK in this example (see the basic tutorials on the YAML sections and on binding free energy calculations to learn how to set up basic YANK simulations):
- YANK’s ability to build small molecules from SMILES strings (through the OpenEye Toolkit)
  - Please see our OpenEye Install Instructions for installing these tools
  - Alternatively, see OpenEye’s Instructions for additional help
- YANK’s ability to run multiple ligands through the same commands
- YANK’s ability to run !Combinatorial options
  - See the section below or the full documentation page for more information on !Combinatorial
This example resides in {PYTHON SOURCE DIR}/share/binding/t4-lysozyme.
The ligands were originally collected by the Shoichet Lab.
Disclaimer on this Example
This example runs many simulations because it uses a whole library of binders. To run fewer, you may find it helpful to change the following lines in the YAML file

    binders:
      select: all

to

    binders:
      select: <Integer>

where <Integer> is a single integer selecting one line (molecule) of the corresponding CSV file.
Examining YAML file
Here we look at the all-ligands-explicit.yaml file in this example, highlighting the differences between this file and similar files in other examples.
Options Header
There are no new features introduced in this header.
Molecules Header
    molecules:
      t4-lysozyme:
        filepath: input/receptor.pdbfixer.pdb
      # Define our ligands
      binders:
        filepath: input/L99A-binders.csv
        antechamber:
          charge_method: bcc
        select: all
The molecules header differs from the other examples in several ways. First, there is one receptor, t4-lysozyme, and one “ligand”, binders, which is actually a semicolon- or comma-separated value file (.csv) containing multiple ligands. Second, there is the select option, which can take different arguments. Let’s break down each part of this ligand definition one at a time.
First, we look at the CSV file itself. The file under the binders molecule is formatted as follows: each line is a molecule, and the second column (semicolon separated) is the SMILES string of that molecule. The remaining columns do not matter to YANK, as long as the second column is the SMILES string. YANK also accepts commas as the delimiter in this file.
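To make the expected layout concrete, here is a minimal Python sketch of pulling the SMILES column out of such a file. It is not YANK's own parser: the function name read_smiles, the treatment of lines starting with # as comments, and the per-line choice between ; and , as the delimiter are assumptions made for illustration.

```python
import csv


def read_smiles(csv_text):
    """Return the SMILES string (second column) of every molecule line.

    Assumptions made for this sketch (not taken from YANK's source):
    lines starting with '#' are comments, blank lines are skipped, and
    each line uses ';' as its delimiter if it contains one, else ','.
    """
    smiles = []
    for line in csv_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank and commented lines
        delimiter = ";" if ";" in stripped else ","
        row = next(csv.reader([stripped], delimiter=delimiter))
        if len(row) < 2:
            raise ValueError(f"no second (SMILES) column in line: {line!r}")
        smiles.append(row[1].strip())  # only the second column matters
    return smiles


# Illustrative rows only; these are not the contents of input/L99A-binders.csv.
example = "# name;smiles\nbenzene;c1ccccc1\ntoluene;Cc1ccccc1\n"
print(read_smiles(example))  # ['c1ccccc1', 'Cc1ccccc1']
```

Any extra columns are simply ignored, mirroring the statement that only the second column matters.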
When YANK reads a SMILES string, it passes that string to the OpenEye Toolkit to generate a ligand with all the atom types and coordinates that YANK will use. Because the generated structure is not optimized in any way, we highly recommend setting minimize in the primary options header.
The select argument tells YANK which line(s) (and therefore which molecules) of the multi-line CSV file to read. It defaults to all, which tells YANK to set up a simulation for each molecule in the file and then run them sequentially. It does NOT run a single simulation with every molecule present at the same time. Since the default is all, we did not have to set the option in the binders molecule, but we set it explicitly so you can see how it works in this example.
The select option also accepts an integer to choose a single molecule from your CSV file, with indices starting at 0; e.g., select: 0 chooses the first molecule in the list, after any leading commented lines. Alternatively, we could have set select: !Combinatorial [0,1,2,3] and gotten the same result as select: all for this example.
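The selection rules can be summarized in a small Python function. This is a hypothetical helper (resolve_select is not part of YANK's API); a Python list stands in for a !Combinatorial list, and the indices are assumed to count molecule lines after any leading comments.

```python
def resolve_select(select, n_molecules):
    """Map a 'select' value to the 0-based molecule indices to simulate.

    Hypothetical illustration: 'all' expands to every index, an int picks
    one molecule, and a list plays the role of a !Combinatorial list.
    """
    if select == "all":
        return list(range(n_molecules))  # one simulation per molecule
    indices = [select] if isinstance(select, int) else list(select)
    for i in indices:
        if not 0 <= i < n_molecules:
            raise IndexError(f"select index {i} out of range for {n_molecules} molecules")
    return indices


print(resolve_select("all", 4))         # [0, 1, 2, 3]
print(resolve_select(0, 4))             # [0]
print(resolve_select([0, 1, 2, 3], 4))  # [0, 1, 2, 3]
```

Each returned index corresponds to one separate simulation, never to one simulation containing several ligands.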
Let us now look at one of YANK’s most powerful features: the !Combinatorial option.
!Combinatorial
The !Combinatorial tag tells YANK to set up a separate simulation for every entry in the list that follows it. When several !Combinatorial lists appear in the YAML file, YANK constructs a separate simulation for every combination of their values.
For example, suppose we had

    options:
      temperature: !Combinatorial [200*kelvin, 300*kelvin]
    systems:
      leap:
        parameters: !Combinatorial [leaprc.gaff, leaprc.gaff2]
then 4 simulations would be run (2 temperatures × 2 parameter files), one for every combination of the options. Every option can be given the !Combinatorial flag except for the options in the protocols and solvents headers. Be careful about how many of these flags you set, as each one multiplies the number of simulations that have to be run.
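Because the expansion is a Cartesian product of the lists, the count grows multiplicatively. Below is a minimal Python sketch using itertools.product, assuming the options have been flattened into a single dictionary; the flat key names are illustrative and not how YANK represents options internally.

```python
import itertools


def expand_combinatorial(options):
    """Return one concrete option dict per combination of list-valued options.

    Sketch only: list values play the role of !Combinatorial lists, and
    scalar values are shared by every combination.
    """
    keys = list(options)
    value_lists = [options[k] if isinstance(options[k], list) else [options[k]] for k in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


combos = expand_combinatorial({
    "temperature": ["200*kelvin", "300*kelvin"],
    "parameters": ["leaprc.gaff", "leaprc.gaff2"],
})
print(len(combos))  # 4
```

Adding a third list with three entries would raise the count to 12, which is why each extra flag deserves some thought.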
However, YANK automatically figures out which options should be combined. For instance, if you set a !Combinatorial option in two separate molecules, YANK will not run every combination between the two molecules UNLESS there is a system that uses both of them. It runs a simulation for every entry in each molecule’s !Combinatorial list, but crosses the two lists only when a system combines both molecules.
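One way to picture this rule is to count simulations per system: each system multiplies the variant counts of only the molecules it references, and the per-system totals are summed. The sketch below is a hypothetical Python illustration (count_experiments is not a YANK function), and it deliberately ignores !Combinatorial options outside the molecules section.

```python
import math


def count_experiments(systems, molecule_variants):
    """Count simulations when molecules are crossed only within a system.

    systems: mapping of system name -> list of molecule names it uses.
    molecule_variants: mapping of molecule name -> number of !Combinatorial entries.
    """
    return sum(
        math.prod(molecule_variants[m] for m in molecules)
        for molecules in systems.values()
    )


variants = {"mol_a": 2, "mol_b": 3}
# Two systems that each use one molecule: the lists are not crossed (2 + 3).
print(count_experiments({"sys-a": ["mol_a"], "sys-b": ["mol_b"]}, variants))  # 5
# One system that uses both molecules: every pairing is run (2 * 3).
print(count_experiments({"sys-ab": ["mol_a", "mol_b"]}, variants))  # 6
```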
In this example, select: !Combinatorial <List of Ints> could have been used instead of select: all to select the indices of all molecules in the file. There is no reason to prefer such a list here; we mention it only to show that it works.
Here, select: all is a shortcut for select: !Combinatorial [0, 1, 2, 3, 4, 5, ..., N-1], where N is the number of molecules in the file.
The all-ligands-implicit.yaml file in this same example directory shows select: !Combinatorial [...] in action.
Solvents Header
Nothing is changed in this header.
Systems Header
    systems:
      t4-ligand:
        receptor: t4-lysozyme
        ligand: binders
        solvent: pme
        leap:
          parameters: [oldff/leaprc.ff14SB, leaprc.gaff2, frcmod.ionsjc_tip3p]
        pack: yes
We expect this to produce a separate simulation for each binder in the file. pack: yes pulls the ligand close to the receptor, ensuring that your solvation box won’t be too big. This is highly recommended when ligands are generated from SMILES strings, because the generated ligands are placed at a random position.
Other Headers
The experiments and protocols headers are not changed in this example.
Running the Simulation
Running the simulation works the same way as in the other examples: either run the run-explicit.sh script or run yank script --yaml=explicit.yaml. To run on multiple nodes, use run-torque-explicit.sh and adapt it to your parallel platform.
The output of this run will differ from that of simulations where !Combinatorial is not invoked (or, in this case, select: all). First, YANK figures out all the combinations this run will generate. Next, it pre-constructs all the molecule and system files before it runs any of them. Finally, it runs each simulation one after another.
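The three phases can be sketched in Python as follows. build and run are hypothetical placeholders for YANK's setup and simulation steps, not YANK functions; the sketch only illustrates the ordering, in which every system is prepared before any simulation starts.

```python
def run_all(experiments, build, run):
    """Build every experiment first, then run them one after another.

    `build` and `run` are hypothetical callables standing in for YANK's
    setup (molecule and system files) and simulation steps.
    """
    experiments = list(experiments)                 # 1. enumerate every combination up front
    prepared = [build(exp) for exp in experiments]  # 2. set up all systems before any run
    results = []
    for system in prepared:                         # 3. run the simulations sequentially
        results.append(run(system))
    return results


events = []


def fake_build(name):
    events.append(("build", name))
    return name


def fake_run(system):
    events.append(("run", system))
    return f"{system} done"


print(run_all(["ligand-0", "ligand-1"], fake_build, fake_run))  # ['ligand-0 done', 'ligand-1 done']
```

Front-loading the setup means a failure in building any one system surfaces before compute time is spent on simulations.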
Analyzing the Simulation
YANK automatically generates the instructions that yank analyze will use to compute the free energy difference for every combination of options. Currently, YANK only reports the free energy of each individual simulation; it is up to you to capture this output and match each result to the simulation it came from. Future versions of YANK will generate more helpful output for !Combinatorial simulations.
Other Files in this Example
We also provide inputs for running implicit-solvent simulations of the same problem.
- all-ligands-implicit.yaml - YAML file for running in implicit solvent
- run-all-ligands-implicit.sh - Shell script for serial running of the implicit all-ligands example
- run-torque-all-ligands-implicit.sh - Shell script for parallel/cluster running of the implicit all-ligands example