.. _all-ligand-explicit:

Absolute Binding of Binders to T4 lysozyme L99A in Explicit Solvent with Combinatorial Options
==============================================================================================

This example performs absolute binding free energy calculations for a series of small molecules that includes known binders to the T4 Lysozyme L99A protein. We take advantage of three advanced features of YANK in this example (please see the :doc:`basic tutorial for the YAML sections and binding free energy calculation ` to learn how to set up basic YANK simulations):

1. YANK's ability to build small molecules from SMILES strings (through the OpenEye Toolkit)

   * Please see our :ref:`OpenEye Install Instructions ` for installing these tools
   * Alternatively, see `OpenEye's Instructions `_ for additional help

2. YANK's ability to run :ref:`multiple ligands ` through the same commands
3. YANK's ability to run :ref:`Combinatorial ` options

   * See :ref:`the section below ` or the :ref:`full documentation page ` for more information on ``!Combinatorial``

This example resides in ``{PYTHON SOURCE DIR}/share/binding/t4-lysozyme``. The original source ligands were collected by the `Shoichet Lab `_.

Disclaimer on this Example
--------------------------

This example runs many simulations because of the size of the library of binders. You may find it helpful to change the following lines in the YAML file

.. code-block:: yaml

    binders:
      select: all

to

.. code-block:: yaml

    binders:
      select:

where ```` is a single integer index (starting at 0) that selects one molecule, i.e. one line, of the corresponding CSV file.

Examining YAML file
-------------------

Here we look at the ``all-ligands-explicit.yaml`` file in this example, highlighting the differences between this file and similar files in other examples.

Options Header
^^^^^^^^^^^^^^

There are no new features introduced in this header.

Molecules Header
^^^^^^^^^^^^^^^^

.. code-block:: yaml

    molecules:
      t4-lysozyme:
        filepath: input/receptor.pdbfixer.pdb
      # Define our ligands
      binders:
        filepath: input/L99A-binders.csv
        antechamber:
          charge_method: bcc
        select: all

The ``molecules`` header differs from the other examples in several ways. First, there is one receptor, ``t4-lysozyme``, and one "ligand", ``binders``, which is actually a semicolon-separated values file (``.csv``) containing multiple ligands. Second, there is the ``select`` option, which can take different arguments. Let's break down each of these parts one at a time.

First, consider the CSV file itself. In the file referenced by the ``binders`` entry, each line is one molecule, and the second column (columns are separated by semicolons) holds that molecule's SMILES string. The remaining columns do not matter to YANK, as long as the second column is the SMILES string. The file may also use commas as the delimiter.

When YANK reads a SMILES string, it passes it to the OpenEye Toolkit, which generates a ligand with the atom types and coordinates that YANK will use. Because the generated structure is not optimized in any way, it is highly recommended that you set ``minimize`` in the primary ``options`` header.

The ``select`` argument tells YANK which line(s), and therefore which molecule(s), of the multi-line CSV file to read. It defaults to ``all``, which tells YANK to create a simulation for each molecule in the file and then run them sequentially. It does NOT run a single simulation with every molecule present at the same time. Since the default is ``all``, we did not have to set the option in the ``binders`` molecule, but we set it explicitly so you can see how it works in this example.

The ``select`` option also accepts an integer that chooses a single molecule from the CSV file; indices start at 0. For example, ``select: 0`` chooses the first molecule in the list, after any leading commented lines.
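
To make the file format and the ``select`` semantics concrete, here is a minimal Python sketch. It is not YANK's parser: ``read_smiles`` and ``apply_select`` are hypothetical helpers, and the sketch assumes that comment lines start with ``#`` and that the delimiter is a semicolon unless the first data line contains none, in which case it falls back to commas.

.. code-block:: python

    import csv


    def read_smiles(csv_text):
        """Return the SMILES strings (second column) from a binders-style CSV.

        Hypothetical sketch, not YANK's parser. Assumptions:
          * lines starting with '#' are comments and are skipped;
          * ';' is the delimiter if the first data line contains one, else ','.
        """
        lines = [line for line in csv_text.splitlines()
                 if line.strip() and not line.lstrip().startswith("#")]
        if not lines:
            return []
        delimiter = ";" if ";" in lines[0] else ","
        # Column index 1 is the second column, which holds the SMILES string.
        return [row[1].strip() for row in csv.reader(lines, delimiter=delimiter)]


    def apply_select(smiles, select="all"):
        """Mimic the meaning of ``select``: 'all' or a 0-based integer index."""
        if select == "all":
            return list(smiles)  # one simulation per molecule, run sequentially
        return [smiles[select]]  # a single molecule; index 0 is the first one


    # Made-up rows for illustration; not taken from input/L99A-binders.csv.
    example = "# name;smiles\nbenzene;c1ccccc1\ntoluene;Cc1ccccc1\n"
    smiles = read_smiles(example)
    print(apply_select(smiles, 0))      # ['c1ccccc1']
    print(apply_select(smiles, "all"))  # ['c1ccccc1', 'Cc1ccccc1']
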

Alternately, we could have used ``select: !Combinatorial [0,1,2,3]`` and gotten the same result as ``select: all`` for this example.

Let us now look at one of YANK's most powerful features: the ``!Combinatorial`` option.

.. _yank_example_combo:

!Combinatorial
++++++++++++++

``!Combinatorial`` tells YANK to set up a unique simulation for every entry in the list that follows the ``!Combinatorial`` tag. When the YAML file contains several ``!Combinatorial`` lists, YANK constructs a unique simulation for every combination of their values. For example, suppose we had

.. code-block:: yaml

    options:
      temperature: !Combinatorial [200*kelvin, 300*kelvin]
    systems:
      leap:
        parameters: !Combinatorial [leaprc.gaff, leaprc.gaff2]

then 4 simulations (2 temperatures times 2 parameter files) would be run, one for every combination of the options.

EVERY option can be given the ``!Combinatorial`` flag except for the options in the ``protocols`` and ``solvents`` headers. Be careful about how many of these flags you set, since the number of simulations multiplies with each one.

However, YANK automatically works out which options should be combined. For instance, if you set a ``!Combinatorial`` option in two separate molecules, YANK will not run every combination between the two molecules UNLESS some system uses both molecules. It runs a simulation for every value of a given molecule's ``!Combinatorial`` option, but only crosses those values with another molecule's when a system combines both.

In this example, a ``!Combinatorial`` list of molecule indices could have been used instead of ``select: all`` to select all molecules in the file; the only reason to write out such a list here would be to show that it can be done. ``select: all`` is shorthand for ``select: !Combinatorial [0, 1, 2, 3, 4, 5, ... N]``, where ``N`` is the index of the last molecule in the file (the number of molecules minus one, since indices start at 0). The ``all-ligands-implicit.yaml`` file in this same example directory shows ``select: !Combinatorial [...]`` in action.
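
The counting rule above amounts to a Cartesian product over the ``!Combinatorial`` lists. The following Python sketch uses the standard library's ``itertools.product`` to illustrate that rule for the temperature and parameter example; ``expand_combinatorial`` is a hypothetical helper, not part of YANK, and it treats every Python list as one flat ``!Combinatorial`` set.

.. code-block:: python

    from itertools import product


    def expand_combinatorial(options):
        """Expand a flat dict whose list values stand in for !Combinatorial.

        Hypothetical illustration of the counting rule, not YANK's code: every
        list is treated as a !Combinatorial set and scalar values stay fixed.
        """
        keys = list(options)
        # Wrap scalars in one-element lists so product() varies only the lists.
        value_lists = [v if isinstance(v, list) else [v] for v in options.values()]
        return [dict(zip(keys, combo)) for combo in product(*value_lists)]


    runs = expand_combinatorial({
        "temperature": ["200*kelvin", "300*kelvin"],
        "parameters": ["leaprc.gaff", "leaprc.gaff2"],
        "solvent": "pme",
    })
    for run in runs:
        print(run)
    print(len(runs))  # 4 = 2 temperatures x 2 parameter files

    # ``select: all`` behaves like ``!Combinatorial [0, 1, ..., N]`` with
    # N = n_molecules - 1, because indices start at 0.
    n_molecules = 5  # hypothetical file size
    select_all = list(range(n_molecules))
    print(select_all)  # [0, 1, 2, 3, 4]

Unlike YANK, this flat product does not know which molecules share a system, so it would cross every list with every other one; it only shows how the number of simulations multiplies.
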

Solvents Header
^^^^^^^^^^^^^^^

Nothing is changed in this header.

Systems Header
^^^^^^^^^^^^^^

.. code-block:: yaml

    systems:
      t4-ligand:
        receptor: t4-lysozyme
        ligand: binders
        solvent: pme
        leap:
          parameters: [oldff/leaprc.ff14SB, leaprc.gaff2, frcmod.ionsjc_tip3p]
        pack: yes

This produces a separate simulation for each binder in the file. ``pack: yes`` moves the ligand close to the receptor, ensuring that the solvation box does not become unnecessarily large. This is highly recommended when ligands are generated from SMILES strings, since they are placed at a random position.

Other Headers
^^^^^^^^^^^^^

The ``experiments`` and ``protocols`` headers are not changed in this example.

Running the Simulation
----------------------

Running the simulation is the same as in the other examples: either run the ``run-explicit.sh`` script, or run ``yank script --yaml=explicit.yaml``. For running on multiple nodes, use ``run-torque-explicit.sh`` and adapt it to your parallel platform.

The output of this run will differ from that of simulations where ``!Combinatorial`` is not invoked (or, in this case, ``select: all``). First, YANK determines all the combinations this run will generate. Next, it builds all the molecules and system files before running any of them. Finally, it runs each simulation one after another.

Analyzing the Simulation
------------------------

YANK automatically generates the instructions that ``yank analyze`` will use to compute the free energy difference for every combination of options. Currently, YANK only reports the free energy of each individual simulation; it is up to you to collect these results and match each one to the combination of options it came from. Future versions of YANK will generate more helpful output for ``!Combinatorial`` simulations.

Other Files in this Example
---------------------------

We also provide inputs for running implicit-solvent simulations of the same problem.

* ``all-ligands-implicit.yaml`` - YAML file for running in implicit solvent
* ``run-all-ligands-implicit.sh`` - Shell script for serial running of the implicit all-ligands example
* ``run-torque-all-ligands-implicit.sh`` - Shell script for parallel/cluster running of the implicit all-ligands example