p-xylene binding to T4 lysozyme L99A in explicit solvent¶
This example illustrates the computation of the binding free energy of p-xylene to the T4 lysozyme L99A mutant in an NPT system of explicit water. From description, we will craft the settings, molecules, and simulations in YANK. This covers the basics of running YANK and detailed instructions will be given for every step.
This example resides in {PYTHON SOURCE DIR}/share/yank/examples/binding/t4-lysozyme
. The rest of the example here
assumes you are in this directory.
Examining YAML file¶
We start this example by looking at the YAML file which controls all of the setting for YANK, p-xylene-explicit.yaml
. This file
is what YANK uses to define all simulation parameters, and actually run the experiments. The file is broken down into 6
sections: options
, molecules
, solvents
, systems
, protocols
and experiments
. We’ll go through each of those here
as they pertain to the example, but please see the detailed YAML documentation: for all possible options
and valid values.
This example goes through all the options we choose and why in explicit detail, future examples will not cover every option, but instead just the changes.
Options Heading¶
The options
heading is where the general simulation settings are chosen. These are experiment wide settings that will
govern the simulation at runtime, and the the global conditions that we care about. Looking back at our description,
it is here that we will enforce the NPT ensemble settings.
options:
minimize: yes
The minimize
option tells YANK to run energy minimization before MD simulation to get the molecules into a more
energetically stable starting position. This is helpful as many input files are missing components (such as hydrogens),
were taken from a different force field, or do not have the ligand and receptor positions taken from the same crystal
structure. Because each of these small perturbations can cause large energy changes, we minimize the energy to reduce
the odds of causing a simulation crash.
options:
verbose: yes
The verbose
options tells YANK to be very informative about what it is doing at every step. We turn verbose on to
see what the simulation is doing. As someone learning YANK, this is a very helpful to see every action being taken to
better understand what is happening.
options:
temperature: 300*kelvin
This option sets the temperature of our system. YANK simulations use Langivin Dynamics to control temperature. If this option is not set, Velocity Verlet time integration will be used instead.
options:
pressure: 1*atmosphere
This option sets the pressure of our system. YANK uses OpenMM’s Monte Carlo Barostat to regulate the pressure. By default,
pressure is set to 1 atmosphere anyways, we explicitly set it here for this example. If you wanted to instead do NVT
simulations, you would need to explicitly set this option to null
.
Molecules Heading¶
The molecules
heading is where we will specify our ligand and receptor molecules. Both the receptor and the ligand
molecules can have arbitrary names, but we prefer to call them helpful names like t4-lysozyme
and p-xylene
.
Let us look at the receptor, T4-Lysozyme. We’ll look at the whole code block for defining it.
molecules:
t4-lysozyme:
filepath: input/receptor.pdbfixer.pdb
leap:
parameters: oldff/leaprc.ff14SB
First we define our molecule name t4-lysozyme
, this name will come up again in later sections, hence why we have
given it a meaningful name.
Next we tell YANK where the file is, so filepath
points at the file relative to where
the the yaml script.
Lastly we have to tell YANK where to get the molecule’s parameters to put into a simulation. The combination of
leap
and parameters
tells YANK how to pull in parameters. The path of parameters
is relative to the the
LEaP directory.
Now let us look at the ligand molecule, para-xylene
molecules:
p-xylene:
filepath: input/ligand.tripos.mol2
antechamber:
charge_method: bcc
First we give our ligand a meaningful name: p-xylene
.
Next we tell YANK where the file is with filepath
.
The antechamber
command is probably the most loaded command in the YAML file. Several things happen when this command
is invoked. First, specifying antechamber
tells YANK to prep the molecule by running it through ANTECHAMBER to get
missing torsions, bonds, and other parameters that may not be in the file. This creates an FRCMOD file which is automatically
loaded in with the molecule to make the final files, this step will be entirely transparent to you, the user.
Next is
the sub-directive charge_method
. This sub-directive is required with antechamber
and tells YANK how to assign
charges to our ligand molecule. The bcc
option tells YANK to get this molecule’s charges from AM1-BCC method at
run time. The other valid option is null
which tells YANK to still use ANTECHAMBER to get missing parameters, but
to not attempt to assign charges. This is helpful if you pre-assigned charges in the input file, but still need the
missing bonded parameters.
Solvents Heading¶
The solvents
heading is where we specify what type of environment we want the ligand and receptor to be in.
Looking at the general description of our system, we know we want explicit solvent (instead of implicit/continuous
dielectric). Since there is only one solvent we need to define, we will look at the whole code block at once.
solvents:
pme:
nonbonded_method: PME
nonbonded_cutoff: 9*angstroms
clearance: 16*angstroms
positive_ion: Na+
negative_ion: Cl-
First we define an arbitrary name for our solvent. Here we call it pme
since we will be using Particle Mesh Ewald
to handle our long-range electrostatics. Again, we could have named this anything we want, but we gave it a meaningful
name to go with.
nonbonded_method
is where we choose how to handle our nonbonded interactions. Because we have a large, explicit solvent
system, it would be very taxing to compute every interaction over all atoms and every periodic copy. Instead, we
divide the interactions into short- and long- range interactions with the Particle Mesh Ewald method for computing these
interactions efficiently. This option is also the main place where implicit vs explicit solvent is chosen. We do not
show how to set up an implicit solvent in our Host-Guest Example. This is also where you
can define a vacuum solvent, though that is not covered in this example.
Since we are dividing the system into long and short range interactions, we specify where that cutoff should take place
with nonbonded_cutoff
.
clearance
defines how to fill the simulation with solvent and how the box vectors should be drawn.
The box vectors are drawn such that the distance between the box edge and any part of the receptor is at
least the distance specified. Next, the space is filled in with TIP3P water. The water molecules are just replicated
copies of a unit cell of water, so you absolutely should specify the minimize
option in the general options
header.
positive_ion
and negative_ion
tell the simulation what Ions to add in to make the system neutral. If these
options are not specified, no counter ions will be added.
Systems Heading¶
This heading is where we combine the molecules and the solvent to make an actual system that we can simulate. This is
also where we specify the parameters we use system wide to account for missing ones from individual molecules. We also
use the names we set up in the molecules
and solvents
section, hence why it was important to have meaningful names.
This block is very strait forward.
systems:
t4-xylene:
receptor: t4-lysozyme
ligand: p-xylene
solvent: pme
leap:
parameters: [oldff/leaprc.ff14SB, leaprc.gaff2, frcmod.ionsjc_tip3p]
We first define a name for our system, t4-xylene
is the arbitrary name we chose.
Next we define what the receptor is with the receptor
directive. Note that this points at our arbitrary name of
“t4-lysozyme” from before.
Then we specify the ligand with ligand
. Note that this points to our arbitrary ligand name of “p-xylene”
Then comes the solvent with solvent
. Note this points at the arbitrary named “pme” from before.
Lastly, we need to define where to get parameters for the atoms with the leap
and subsequent parameters
directives.
Even if you specify all the atom parameters for every molecule in molecules
, you will still need to specify this
pair of options to parametrize the explicit water. Note that multiple files can be specified as a comma separated list so
long as they are enclosed by brackets, [ ]
.
Protocols Heading¶
The protocols
heading and its options will be the most foreign to those not familiar with alchemical simulations.
Free energy calculations are computationally difficult to compute because in a physical sense, the ligand needs to drift
in and out of the binding pocket. This action happens on the order of milliseconds to seconds, which are simulation
times that are very difficult to achieve with direct simulation. Instead, we use a computationally efficient thermodynamic
cycle along efficient thermodynamic paths to mimic the end states (bound ligand in solvent box -> ligand in solvent
+ receptor in solvent). For more reading, please see the alchemistry.org page on the thermodynamic cycle
and on choosing intermediate states
for more details.
This header controls how many states you will sample, and which values along the thermodynamic paths to sample in each phase.
protocols:
absolute-binding:
complex:
alchemical_path:
lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
lambda_restraints: [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
solvent:
alchemical_path:
lambda_electrostatics: [1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
We first defined a name for our protocol. The name is arbitrary and we choose absolute-binding
.
Next we define what happens in the complex
phase. Note that the name complex
is semi-arbitrary. It can be called
whatever you would like so long as it contains the string “complex” in it. alchemical_path
is a required argument
as is lambda_electrostatics
and lambda_sterics
. We will discuss the optional lambda_restraints
momentarily,
first lets look at the syntax of these arguments.
Each of the lambda_...
arguments takes a list of lambda values where each index corresponds with a single state. E.g.
index 0 (the first entry) of lambda_electrostatics
is the value the electrostatic lambda will take in the first
state. At the same time, index 0 of the lambda_sterics
is what value the sterics lambda will take in the first state.
This also means that both directives must have the same number of values.
The optional lambda_restraints
tells the restraints which we specify in experiments
how coupled they should be
in each state. We will specify a Harmonic
restraint which will keep the ligand close to the centroid of the receptor
through a weak harmonic biasing potential. However, we only want this restraint on when the ligand is decoupled to
prevent it from drifting too far away, so its lambda values actually are coupled, whereas the nonbonded lambdas are
decoupled. Also note how only one phase has lambda_restraints
specified. This is because the restraint only makes
sense in the complex
phase as we actually do want the ligand to explore the other phase.
Lastly, we define what happens in the solvent
phase. This again is a semi-arbitrary name and can be whatever you
want, so long as it contains the string “solvent”. In the thermodynamic cycle
for this process, we have to account for the free energy of removing the harmonic restraints and then transferring the
ligand to a standard state solvent
box. YANK automatically accounts for this free energy for you which is part of
the reason lambda_restraints
do not appear in the solvent
phase.
Experiments Heading¶
Finally we have the experiments
header where we combine a system and a protocol to make the final run.
This is the simplest block
experiments:
system: t4-xylene
protocol: absolute-binding
This block requires only a system
to target the named system we made, and a protocol
to target the named
protocol.
Running the Simulation¶
We are finally ready to run our first simulation, now that everything has been set up in the YAML file. To run the simulation, issue the following command:
$ yank script --yaml=p-xylene-explicit.yaml
and let the simulation take care of the rest. What happens next is YANK will set up the files as we have specified, in this running the ligand through ANTECHAMBER, take the prepped ligand and receptor to make a solvated complex, and run both a complex and solvent simulation to compute the absolute free energy of binding. Several steps will happen over the simulation, we’ll run through them here:
- The molecules get processed through LEaP and put into OpenMM
- The output file is pre-processed
- The different states are all prepared and the output file is updated
- Each thermodynamic state is minimized for the complex system
- A Hamiltonian Replica Exchange simulation is run
- The process is repeated for the solvent simulation
During the simulation you will see that one iteration propagates, then YANK attempts Hamiltonian replica exchange which you see as a large table of energies. The swap statistics are then shown showing how many times two replicas exchanged with one another and finally some timing information (how much longer we expect the simulation to take and so on).
Analyzing Results¶
Once both phases of the simulation run, we can compute the final binding free energy by running the following command.
$ yank analyze --store=p-xylene-out
This complex and solvent phase will be automatically loaded in, decorrelated, and analyzed to get the free energy. We use the energies from a simulation box with expanded cutoff radius to reduce the impact a cutoff has from anisotropic dispersion correction. Free energy itself is then analyzed through the Multistate Bennet Acceptance Ratio (MBAR) to get the minimally biased free energy estimate across the two phases.
You should make note of how many decorrelated samples are left after analysis, if you feel that there were not enough
samples, run for longer to get more. This can be done by modifying the YAML file in the options
header and adding
the following options:
options:
default_number_of_iterations: <Some Integer>
resume_simulation: yes
where you replace <Some Integer>
with a number larger than the number of iterations you just ran.
Other Files in this Example¶
In this example, we also include an alternate YAML file called implicit.yaml
which uses an implicit solvent instead
of explicit solvent. The other main difference is that this is effectively NVT ensemble since NPT ensemble makes no
sense in implicit solvent. It the execution and analysis of this system are identical, but replace the script target
in command line with --yaml=p-xylene-implicit.yaml
. We cover more details of the setup of an implicit solvent system
in the next example involving a Host Guest System.