p-xylene binding to T4 lysozyme L99A in explicit solvent

This example illustrates the computation of the binding free energy of p-xylene to the T4 lysozyme L99A mutant in an NPT system of explicit water. From description, we will craft the settings, molecules, and simulations in YANK. This covers the basics of running YANK and detailed instructions will be given for every step.

This example resides in {PYTHON SOURCE DIR}/share/yank/examples/binding/t4-lysozyme. The rest of the example here assumes you are in this directory.

Examining YAML file

We start this example by looking at the YAML file which controls all of the setting for YANK, p-xylene-explicit.yaml. This file is what YANK uses to define all simulation parameters, and actually run the experiments. The file is broken down into 6 sections: options, molecules, solvents, systems, protocols and experiments. We’ll go through each of those here as they pertain to the example, but please see the detailed YAML documentation: for all possible options and valid values.

This example goes through all the options we choose and why in explicit detail, future examples will not cover every option, but instead just the changes.

Options Heading

The options heading is where the general simulation settings are chosen. These are experiment wide settings that will govern the simulation at runtime, and the the global conditions that we care about. Looking back at our description, it is here that we will enforce the NPT ensemble settings.

options:
  minimize: yes

The minimize option tells YANK to run energy minimization before MD simulation to get the molecules into a more energetically stable starting position. This is helpful as many input files are missing components (such as hydrogens), were taken from a different force field, or do not have the ligand and receptor positions taken from the same crystal structure. Because each of these small perturbations can cause large energy changes, we minimize the energy to reduce the odds of causing a simulation crash.

options:
  verbose: yes

The verbose options tells YANK to be very informative about what it is doing at every step. We turn verbose on to see what the simulation is doing. As someone learning YANK, this is a very helpful to see every action being taken to better understand what is happening.

options:
  temperature: 300*kelvin

This option sets the temperature of our system. YANK simulations use Langivin Dynamics to control temperature. If this option is not set, Velocity Verlet time integration will be used instead.

options:
  pressure: 1*atmosphere

This option sets the pressure of our system. YANK uses OpenMM’s Monte Carlo Barostat to regulate the pressure. By default, pressure is set to 1 atmosphere anyways, we explicitly set it here for this example. If you wanted to instead do NVT simulations, you would need to explicitly set this option to null.

Molecules Heading

The molecules heading is where we will specify our ligand and receptor molecules. Both the receptor and the ligand molecules can have arbitrary names, but we prefer to call them helpful names like t4-lysozyme and p-xylene.

Let us look at the receptor, T4-Lysozyme. We’ll look at the whole code block for defining it.

molecules:
  t4-lysozyme:
    filepath: input/receptor.pdbfixer.pdb
    leap:
      parameters: oldff/leaprc.ff14SB

First we define our molecule name t4-lysozyme, this name will come up again in later sections, hence why we have given it a meaningful name.

Next we tell YANK where the file is, so filename points at the file relative to where the the yaml script.

Lastly we have to tell YANK where to get the molecule’s parameters to put into a simulation. The combination of leap and parameters tells YANK how to pull in parameters. The path of parameters is relative to the the LEaP directory.

Now let us look at the ligand molecule, para-xylene

molecules:
 p-xylene:
   filepath: input/ligand.tripos.mol2
   antechamber:
     charge_method: bcc

First we give our ligand a meaningful name: p-xylene.

Next we tell YANK where the file is with filename.

The antechamber command is probably the most loaded command in the YAML file. Several things happen when this command is invoked. First, specifying antechamber tells YANK to prep the molecule by running it through ANTECHAMBER to get missing torsions, bonds, and other parameters that may not be in the file. This creates an FRCMOD file which is automatically loaded in with the molecule to make the final files, this step will be entirely transparent to you, the user.

Next is the sub-directive charge_method. This sub-directive is required with antechamber and tells YANK how to assign charges to our ligand molecule. The bcc option tells YANK to get this molecule’s charges from AM1-BCC method at run time. The other valid option is null which tells YANK to still use ANTECHAMBER to get missing parameters, but to not attempt to assign charges. This is helpful if you pre-assigned charges in the input file, but still need the missing bonded parameters.

Solvents Heading

The solvents heading is where we specify what type of environment we want the ligand and receptor to be in. Looking at the general description of our system, we know we want explicit solvent (instead of implicit/continuous dielectric). Since there is only one solvent we need to define, we will look at the whole code block at once.

solvents:
  pme:
    nonbonded_method: PME
    nonbonded_cutoff: 9*angstroms
    clearance: 16*angstroms
    positive_ion: Na+
    negative_ion: Cl-

First we define an arbitrary name for our solvent. Here we call it pme since we will be using Particle Mesh Ewald to handle our long-range electrostatics. Again, we could have named this anything we want, but we gave it a meaningful name to go with.

nonbonded_method is where we choose how to handle our nonbonded interactions. Because we have a large, explicit solvent system, it would be very taxing to compute every interaction over all atoms and every periodic copy. Instead, we divide the interactions into short- and long- range interactions with the Particle Mesh Ewald method for computing these interactions efficiently. This option is also the main place where implicit vs explicit solvent is chosen. We do not show how to set up an implicit solvent in our Host-Guest Example. This is also where you can define a vacuum solvent, though that is not covered in this example.

Since we are dividing the system into long and short range interactions, we specify where that cutoff should take place with nonbonded_cutoff.

clearance defines how to fill the simulation with solvent and how the box vectors should be drawn. The box vectors are drawn such that the distance between the box edge and any part of the receptor is at least the distance specified. Next, the space is filled in with TIP3P water. The water molecules are just replicated copies of a unit cell of water, so you absolutely should specify the minimize option in the general options header.

positive_ion and negative_ion tell the simulation what Ions to add in to make the system neutral. If these options are not specified, no counter ions will be added.

Systems Heading

This heading is where we combine the molecules and the solvent to make an actual system that we can simulate. This is also where we specify the parameters we use system wide to account for missing ones from individual molecules. We also use the names we set up in the molecules and solvents section, hence why it was important to have meaningful names.

This block is very strait forward.

systems:
  t4-xylene:
    receptor: t4-lysozyme
    ligand: p-xylene
    solvent: pme
    leap:
      parameters: [oldff/leaprc.ff14SB, leaprc.gaff2, frcmod.ionsjc_tip3p]

We first define a name for our system, t4-xylene is the arbitrary name we chose.

Next we define what the receptor is with the receptor directive. Note that this points at our arbitrary name of “t4-lysozyme” from before.

Then we specify the ligand with ligand. Note that this points to our arbitrary ligand name of “p-xylene”

Then comes the solvent with solvent. Note this points at the arbitrary named “pme” from before.

Lastly, we need to define where to get parameters for the atoms with the leap and subsequent parameters directives. Even if you specify all the atom parameters for every molecule in molecules, you will still need to specify this pair of options to parametrize the explicit water. Note that multiple files can be specified as a comma separated list so long as they are enclosed by brackets, [  ].

Protocols Heading

The protocols heading and its options will be the most foreign to those not familiar with alchemical simulations. Free energy calculations are computationally difficult to compute because in a physical since, the ligand needs to drift in and out of the binding pocket. This action happens on the order of milliseconds to seconds, which are simulation times that are very difficult to achieve with direct simulation. Instead, we use a computationally efficient thermodynamic cycle along efficient thermodynamic paths to mimic the end states (bound ligand in solvent box -> ligand in solvent + receptor in solvent). For more reading, please see the alchemistry.org page on the thermodynamic cycle and on choosing intermediate states for more details.

This header controls how many states you will sample, and which values along the thermodynamic paths to sample in each phase.

protocols:
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
        lambda_restraints:     [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
    solvent:
      alchemical_path:
        lambda_electrostatics: [1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]

We first defined a name for our protocol. The name is arbitrary and we choose absolute-binding.

Next we define what happens in the complex phase. Note that the name complex is semi-arbitrary. It can be called whatever you would like so long as it contains the string “complex” in it. alchemical_path is a required argument as is lambda_electrostatics and lambda_sterics. We will discuss the optional lambda_restraints momentarily, first lets look at the syntax of these arguments.

Each of the lambda_... arguments takes a list of lambda values where each index corresponds with a single state. E.g. index 0 (the first entry) of lambda_electrostatics is the value the electrostatic lambda will take in the first state. At the same time, index 0 of the lambda_sterics is what value the sterics lambda will take in the first state. This also means that both directives must have the same number of values.

The optional lambda_restraints tells the restraints which we specify in experiments how coupled they should be in each state. We will specify a Harmonic restraint which will keep the ligand close to the centroid of the receptor through a weak harmonic biasing potential. However, we only want this restraint on when the ligand is decoupled to prevent it from drifting too far away, so its lambda values actually are coupled, whereas the nonbonded lambdas are decoupled. Also note how only one phase has lambda_restraints specified. This is because the restraint only makes sense in the complex phase as we actually do want the ligand to explore the other phase.

Lastly, we define what happens in the solvent phase. This again is a semi-arbitrary name and can be whatever you want, so long as it contains the string “solvent”. In the thermodynamic cycle for this process, we have to account for the free energy of removing the harmonic restraints and then transferring the ligand to a standard state solvent box. YANK automatically accounts for this free energy for you which is part of the reason lambda_restraints do not appear in the solvent phase.

Experiments Heading

Finally we have the experiments header where we combine a system and a protocol to make the final run.

This is the simplest block

experiments:
  system: t4-xylene
  protocol: absolute-binding

This block requires only a system to target the named system we made, and a protocol to target the named protocol.

Running the Simulation

We are finally ready to run our first simulation, now that everything has been set up in the YAML file. To run the simulation, issue the following command:

$ yank script --yaml=p-xylene-explicit.yaml

and let the simulation take care of the rest. What happens next is YANK will set up the files as we have specified, in this running the ligand through ANTECHAMBER, take the prepped ligand and receptor to make a solvated complex, and run both a complex and solvent simulation to compute the absolute free energy of binding. Several steps will happen over the simulation, we’ll run through them here:

  1. The molecules get processed through LEaP and put into OpenMM
  2. The output file is pre-processed
  3. The different states are all prepared and the output file is updated
  4. Each thermodynamic state is minimized for the complex system
  5. A Hamiltonian Replica Exchange simulation is run
  6. The process is repeated for the solvent simulation

During the simulation you will see that one iteration propagates, then YANK attempts Hamiltonian replica exchange which you see as a large table of energies. The swap statistics are then shown showing how many times two replicas exchanged with one another and finally some timing information (how much longer we expect the simulation to take and so on).

Analyzing Results

Once both phases of the simulation run, we can compute the final binding free energy by running the following command.

$ yank analyze --store=p-xylene-out

This complex and solvent phase will be automatically loaded in, decorrelated, and analyzed to get the free energy. We use the energies from a simulation box with expanded cutoff radius to reduce the impact a cutoff has from anisotropic dispersion correction. Free energy itself is then analyzed through the Multistate Bennet Acceptance Ratio (MBAR) to get the minimally biased free energy estimate across the two phases.

You should make note of how many decorrelated samples are left after analysis, if you feel that there were not enough samples, run for longer to get more. This can be done by modifying the YAML file in the options header and adding the following options:

options:
  number_of_iterations: <Some Integer>
  resume_simulation: yes

where you replace <Some Integer> with a number larger than the number of iterations you just ran.

Other Files in this Example

In this example, we also include an alternate YAML file called implicit.yaml which uses an implicit solvent instead of explicit solvent. The other main difference is that this is effectively NVT ensemble since NPT ensemble makes no sense in implicit solvent. It the execution and analysis of this system are identical, but replace the script target in command line with --yaml=p-xylene-implicit.yaml. We cover more details of the setup of an implicit solvent system in the next example involving a Host Guest System.