Absolute Binding of Binders to T4 lysozyme L99A in Explicit Solvent with Combinatorial Options

This example performs absolute binding free energy calculations for a series of small molecules that contain known binders to the T4 Lysozyme L99A protein.

We take advantage of three advanced features of YANK in this example (please see the basic tutorial for the YAML sections and binding free energy calculation for learning how to set up basic YANK simulations):

  1. YANK’s ability to build out small molecules from SMILES strings (through OpenEye Toolkit)
  2. YANK’s ability to run multiple ligands through the same commands
  3. YANK’s ability to run Combinatorial options

This example resides in {PYTHON SOURCE DIR}/share/binding/t4-lysozyme.

Original source ligands collected by the Shoichet Lab.

Disclaimer on this Example

This example runs many simulations due to the library of binders. You may find it helpful to change the following lines in the YAML file

binders:
    select: all

to

binders:
    select: <Integer>

where <Integer> is a single integer representing a line in the corresponding file.

Examining YAML file

Here we look at the all-ligands-explicit.yaml file in this example, highlighting the differences between this file and similar files in other examples.

Options Header

There are no new features introduced in this header.

Molecules Header

molecules:
  t4-lysozyme:
    filepath: input/receptor.pdbfixer.pdb
  # Define our ligands
  binders:
    filepath: input/L99A-binders.csv
    antechamber:
      charge_method: bcc
    select: all

The molecules header has quite a number of differences from the other examples. First, there is one receptor, t4-lysozyme, and one “ligand”, which is actually a (semi)comma separated value file, .csv with multiple ligands per file. Further, there is the select command with different arguments. Let’s break down each part of these ligands one at time

First we look at the CSV file itself. The file under the binders header is formatted as such. Each line is a molecule where the second column (semicolon separated) is the SMILES string of that molecule. The remaining columns do not mater for YANK, so long as the 2nd column is the SMILES string. This file alsos take commas as the delimiter.

When YANK reads a SMILES string, it passes that string off to the OpenEye Toolkit to generate a ligand with all atom types and coordinates that will be used in YANK. Because the structure it generates is in no way optimized, it is highly recommended you set minimize in the primary options header.

The select argument tells YANK which line(s) (and therefore molecules) in the multi-line CSV file to read. It defaults to all which tells YANK to make a simulation for each molecule in the file, and then run them sequentially. It does NOT run a single simulation with every molecule present at the same time. Since the default is all, we did not have to set the option in the binders molecule, but we explicitly set it so you can see how it works in this example.

The select option could also accept an integer to choose a single molecule from your CSV file, where the index starts at 0. e.g. select: 0 chooses the first molecule in the list, after any leading commented lines.

Alternately, we could have select: !Combinatorial [0,1,2,3] and gotten the same result as select: all for this example

Let us now look at one of YANK’s most powerful features the !Combinatorial options.

!Combinatorial

!Combinatorial tells YANK to set up a unique simulation for every entry in the list following the !Combinatorial command. YANK will construct a unique simulation for every combination of every set of parameters across all !Combinatorial lists in the YAML file.

For example, suppose we had

options:
  temperature: !Combinatorial [200*kelvin, 300*kelvin]
systems:
  leap:
    parameters: !Combinatorial [leaprc.gaff, leaprc.gaff2]

then 4 simulations would be run iterating over every combination across the options. EVERY option can be given the !Combinatorial flag except for the options in the protocols and solvents headers. Take care of how many of these flags you set as it will increase the number of simulations that have to be run combinatorially. However YANK will automatically figure out what options should be combined. For instance, if you set a !Combinatorial option in two separate molecules, they will not necessarily run every combination between the two molecules, UNLESS there is a system that uses both molecules. It will run a simulation for every option in a given molecule’s !Combinatorial option, but will not cross them unless there is system which combines both.

In this example, the !Combinatorial <List of Ints> could have been called instead of select: all to select the indices of all molecules in the file. There is no reason for this list other than we can for this example. The select: all is a shortcut in this option for select: !Combinatorial [0, 1, 2, 3, 4, 5, ... N] where N is number of molecules in the file.

The all-ligands-implicit.yaml file in this same example directory shows the select: !Combinatorial [...] in action.

Solvents Header

Nothing is changed in this header.

Systems Header

systems:
  t4-ligand:
    receptor: t4-lysozyme
    ligand: binders
    solvent: pme
    leap:
      parameters: [oldff/leaprc.ff14SB, leaprc.gaff2, frcmod.ionsjc_tip3p]
    pack: yes

The output we would expect from this is a unique simulation with every binder in the file.

pack: yes pulls the ligand close to the receptor ensuring that your solvation box won’t be too big. This is highly recommended when ligands are generated from SMILES they have a random position.

Other Headers

The experiments and protocols headers are not changed in this example.

Running the Simulation

Running the simulation is the same as the other examples where you can either run the run-explicit.sh script, or by running yank script --yaml=explicit.yaml. For running on multiple nodes, use run-torque-explicit.sh and adapt it to your parallel platform.

The output of this run will be different from simulations where !Combinatorial is not invoked (or in this case select: all. First, YANK figures out all the combinations this run will generate. Next it pre-constructs all the molecules and system files before it runs any of them. Finally, each simulation is run one after another.

Analyzing the Simulation

YANK automatically generates the instructions that yank analyze will use to compute the free energy difference for every combination of options. Right now YANK will only tell you the free energy for each individual simulation. It will be up to you to trap this information and split it into each simulation.

Future versions of YANK will generate more helpful output for !Combinatorial simulations.

Other Files in this Example

We also provide inputs for running implicit simulation of the same problem.

  • all-ligands-implicit.yaml - YAML file for running implcit solvent
  • run-all-ligands-implicit.sh - Shell script for serial running of the implicit all-ligands example
  • run-torque-all-ligands-implicit.sh - Shell script for parallel/cluster running of the implicit all-ligands example