YAML syntax tutorial¶
This tutorial assumes you’ve installed YANK successfully.
We’ll start by creating a YAML file for a hydration free energy calculation in explicit solvent, and then we’ll add an absolute binding free energy experiment in explicit and implicit solvent.
Contents
A first example: Hydration free energy¶
We’ll go line-by-line through the sections of the following YAML file for YANK that sets up and run a hydration binding
free energy of a small molecule. The three dashes (---
) is the standard way in YAML to indicate the beginning of a
YAML document.
---
options:
verbose: yes
minimize: yes
default_timestep: 2.0*femtoseconds
default_nsteps_per_iteration: 500
temperature: 300*kelvin
pressure: 1*atmosphere
molecules:
benzene:
filepath: benzene.tripos.mol2
antechamber:
charge_method: bcc
leap:
parameters: [leaprc.gaff]
solvents:
water:
nonbonded_method: PME
nonbonded_cutoff: 9*angstroms
clearance: 16*angstroms
solvent_model: tip4pew
leap:
parameters: [leaprc.water.tip4pew]
vacuum:
nonbonded_method: NoCutoff
systems:
hydration-system:
solute: benzene
solvent1: water
solvent2: vacuum
leap:
parameters: [leaprc.protein.ff14SB]
protocols:
hydration-protocol:
solvent1:
alchemical_path: auto
solvent2:
alchemical_path: auto
experiments:
system: hydration-system
protocol: hydration-protocol
Prepare the molecules¶
Let’s skip the options
section for now and look at the molecules
block (docs). This
is the place where you can specify the pipeline that a molecule should go through before eventually being solvated or
combined with another molecule to form a complex.
molecules:
benzene:
filepath: benzene.tripos.mol2
antechamber:
charge_method: bcc
leap:
parameters: [leaprc.gaff]
The benzene
keyword here is a string identifier so it can be any name we want. The nested filepath
keyword points
to a mol2
file specifying the molecule topology and initial coordinates of the atoms. The automatic pipeline in YANK
is based on AmberTools so you can determine charges with antechamber (in this case
AM1-BCC point charges) and assign the parameters to the molecule by loading any parameter file accessible to tleap.
The tleap parameters can be assigned in the systems
section as well with identical effects, but when you have multiple
molecules and you want to assign different parameters to them, it is convenient to be able to specify the parameter file
to load in the molecules
section.
Specify the solvent¶
We’ll turn off benzene’s nonbonded interactions in explicit water and turn them back on in vacuum to close the thermodynamic cycle so let’s create two solvents.
solvents:
water:
nonbonded_method: PME
nonbonded_cutoff: 9*angstroms
clearance: 16*angstroms
solvent_model: tip4pew
leap:
parameters: [leaprc.water.tip4pew]
vacuum:
nonbonded_method: NoCutoff
Again, water
and vacuum` are just two arbitrary string identifiers. The ``vacuum
solvent is quite simple. The
water
solvent we use Particle Mesh Ewald for long-range electrostatic treatment with a cutoff of 9A for both Coulomb
and Lennard-Jones interactions.
The clearance
and solvent_model
keywords instruct YANK to solvate the solute with TIP4P-Ew water molecules so that
the solute will be distant from the box boundaries by 16A. Finally, we ask leap to load the TIP4P-Ew parameters when solvating.
Again, this parameter file can be specified in the in systems
section.
Prepare the systems¶
Finally let’s put everything together to create the systems in solvent and in vacuum that YANK will use to run the hydration free energy calculation.
systems:
hydration-system:
solute: benzene
solvent1: water
solvent2: vacuum
leap:
parameters: [leaprc.protein.ff14SB]
hydration-system
is, as usual, an arbitrary identifier. Here we are instructing YANK to create two systems by
solvating benzene
in solvent1
(i.e. the water
specified above) and in vacuum. YANK will use these two systems
to compute the free energy of transferring benzene from solvent1
to solvent2
. Finally, we load the general amber
parameters that are required by tleap.
If we don’t specify the parameters in the solvents
and molecules
sections, we can do it here.
systems:
hydration-system:
...
leap:
parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew, leaprc.gaff]
Alchemical protocol¶
The protocols
section determines the intermediate states to sample during both phases of the the alchemical calculations.
YANK provides a method to determine this automatically by spacing the intermediate states equally in thermodynamic length.
We’ll see later in the tutorial how you can specify an alchemical path manually instead.
protocols:
hydration-protocol:
solvent1:
alchemical_path: auto
solvent2:
alchemical_path: auto
Experiment¶
This is where we associate a protocol to the system and tell YANK to run the hydration free energy calculation.
experiments:
system: hydration-system
protocol: hydration-protocol
Running options¶
Finally, the options
section contains some several parameters for the simulation (see here
for a complete list).
options:
verbose: yes
minimize: yes
default_timestep: 2.0*femtoseconds
default_nsteps_per_iteration: 500
default_number_of_iterations: 1000
temperature: 300*kelvin
pressure: 1*atmosphere
Here we’re setting the verbosity of the output, and asking YANK to minimize the systems before running 1000 iterations of Hamiltonian replica exchange. Each iteration, by default, perform a Monte Carlo rigid translation and rotation of the ligand followed, in this case, by 1ps of Langevin dynamics (with 2fs time step) keeping the temperature at 300K. The pressure is controlled by a Monte Carlo barostat.
Since it is possible to specify multiple free energy calculations in a single YAML file, this section is used to keep
the options to use as default, but they can be overwritten by other parts of the YAML document as well. For example, the
options
block can be added in the experiment specification to overwrite some or all of the default values.
experiments:
system: hydration-system
protocol: hydration-protocol
options:
temperature: 310*kelvin
default_nsteps_per_iteration: 1000
Add an absolute binding free energy experiment¶
We mentioned that it’s possible to specify multiple experiments in a single YAML file. Let’s add an absolute binding free energy calculation.
Preparing protein receptors¶
We start as before by preparing the receptor.
molecules:
...
t4-lysozyme:
filepath: input/t4.pdb
In this case, we don’t specify leap parameters since we’ll add them to the system.
Note
Since the PDB file goes through tleap, your PDB file should adopt the residue names consistent with their protonation state as expected by tleap (e.g., CYS for protonated Cysteine, CYM for deprotonated, and CYX for cysteines forming a disulphide bridge).
Add buffer and neutralizing ions¶
In this case, we want to simulate a particular buffer ionic strength so we create a new solvent.
solvents:
...
buffer:
nonbonded_method: PME
nonbonded_cutoff: 11*angstroms
clearance: 12*angstroms
positive_ion: Na+
negative_ion: Cl-
ionic_strength: 100*millimolar
solvent_model: tip3p
leap:
parameters: [leaprc.water.tip3p]
By adding the three new keywords positive_ion
, negative_ion
, and ionic_strength
, YANK will add neutralizing
and buffer Na+ and Cl- ions.
Note
If the molecule that is alchemically decoupled has a net charge, YANK will automatically select a random counterion to decouple together with the ligand.
Create receptor ligand system¶
Now we create the system by combining the t4-lysozyme
and benzene
to form a complex.
systems:
...
t4-benzene:
receptor: t4-lysozyme
ligand: benzene
solvent: buffer
leap:
parameters: [leaprc.protein.ff14SB, leaprc.gaff2, leaprc.water.tip4pew]
This time instead of solute
, solvent1
, and solvent2
we have receptor
, ligand
, and solvent
. An
experiment using this system will decouple the ligand
in complex and turn back on the interaction in bulk.
Restraints¶
Absolute binding calculations often require a restraint to keep the ligand in the binding pocket.
protocol:
...
absolute-binding:
complex:
alchemical_path:
lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
lambda_restraints: [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
solvent:
alchemical_path: auto
experiment-t4-benzene:
system: t4-benzene
protocol: absolute-binding
restraint:
type: Harmonic
experiment-hydration:
...
experiments: [experiment-t4-benzene, experiment-hydration]
There are three main changes here. First, notice how the experiments
section now points to a list of experiments.
This is necessary since we are using the same YAML file to run multiple experiments. Secondly, the experiment-t4-benzene
block contains a restraint.type
keyword that we can be used to specify the type of restraint we want to use. Other available
restraints are FlatBottom
, Boresch
, and PeriodicTorsionBoresch
.
Finally, we’ve used a manual protocol instead of auto
to control the intermediate states of the Hamiltonian replica
exchange simulation of the complex
phase. The protocol here first turns on the harmonic restraint (lambda_restraints
goes from 0.0 to 1.0), then turn off charges (lambda_electrostatics
from 1.0 to 0.0), and finally the Lennard-Jones
interactions are decoupled (lambda_sterics
).
YANK uses a heuristic to find the parameters of the restraint, but it is possible to specify them manually.
experiment-t4-benzene:
system: t4-benzene
protocol: absolute-binding
restraint:
type: Harmonic
spring_constant: 0.2*kilocalories_per_mole/(angstrom**2)
restrained_receptor_atoms: (resi 77 or resi 90 or resi 98 or resi 110) and (mass > 1.5)
restrained_ligand_atoms: (resname MOL) and (mass > 1.5)
This example manually set the spring constant, and attach the harmonic restraint to the centroid of the heavy atoms
identified by the MDTraj DSL selection expression. In general, type
is the name of a class, and the keywords that is possible to specify to configure the restraints are its constructor
parameters. Check the Python API documentation
for a complete overview.
The full script after the changes¶
---
options:
verbose: yes
minimize: yes
default_timestep: 2.0*femtoseconds
default_nsteps_per_iteration: 500
temperature: 300*kelvin
pressure: 1*atmosphere
molecules:
benzene:
filepath: benzene.tripos.mol2
antechamber:
charge_method: bcc
leap:
parameters: [leaprc.gaff]
t4-lysozyme:
filepath: input/t4.pdb
solvents:
water:
nonbonded_method: PME
nonbonded_cutoff: 9*angstroms
clearance: 16*angstroms
solvent_model: tip4pew
leap:
parameters: [leaprc.water.tip4pew]
vacuum:
nonbonded_method: NoCutoff
buffer:
nonbonded_method: PME
nonbonded_cutoff: 11*angstroms
clearance: 12*angstroms
positive_ion: Na+
negative_ion: Cl-
ionic_strength: 100*millimolar
solvent_model: tip3p
leap:
parameters: [leaprc.water.tip3p]
systems:
hydration-system:
solute: benzene
solvent1: water
solvent2: vacuum
leap:
parameters: [leaprc.protein.ff14SB]
t4-benzene:
receptor: t4-lysozyme
ligand: benzene
solvent: buffer
leap:
parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]
protocols:
hydration-protocol:
solvent1:
alchemical_path: auto
solvent2:
alchemical_path: auto
absolute-binding:
complex:
alchemical_path:
lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
lambda_restraints: [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
solvent:
alchemical_path: auto
experiment-t4-benzene:
system: t4-benzene
protocol: absolute-binding
restraint:
type: Harmonic
experiment-hydration:
system: hydration-system
protocol: hydration-protocol
experiments: [experiment-t4-benzene, experiment-hydration]
Combinatorial experiments: implicit and explicit solvents¶
We can use the !Combinatorial
keyword in many sections to generate multiple experiments from a single YAML file while
maintaining the representation of our computational experiment compact. For example, let’s assume we want to compare the
effect of multiple restraint types and the accuracy of the binding free energy estimate changes with TIP3P and the GBSA
implicit solvent model.
solvents:
...
implicit:
nonbonded_method: NoCutoff
implicit_solvent: OBC2
implicit_solvent_salt_conc: 100*millimolar
systems:
...
t4-benzene:
receptor: t4-lysozyme
ligand: benzene
solvent: !Combinatorial [buffer, implicit]
leap:
parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]
experiment-t4-benzene:
system: t4-benzene
protocol: absolute-binding
restraint:
type: !Combinatorial [FlatBottom, Harmonic, Boresch]
Here we have defined a new solvent that uses the GBSA/OBC2 model, and we have changed the systems.t4-benzene.solvent
keyword from simply buffer
to !Combinatorial [buffer, implicit]
. We’ve also added !Combinatorial
in the
experiment-t4-benzene
section, this time specifying several restraint types. When YANK parses experiment-t4-benzene
,
it generates 6 different binding free energy calculations using 3 different restraints and 2 different solvent models.
The full script after the changes¶
---
options:
verbose: yes
minimize: yes
default_timestep: 2.0*femtoseconds
default_nsteps_per_iteration: 500
temperature: 300*kelvin
pressure: 1*atmosphere
molecules:
benzene:
filepath: benzene.tripos.mol2
antechamber:
charge_method: bcc
leap:
parameters: [leaprc.gaff]
t4-lysozyme:
filepath: input/t4.pdb
solvents:
water:
nonbonded_method: PME
nonbonded_cutoff: 9*angstroms
clearance: 16*angstroms
solvent_model: tip4pew
leap:
parameters: [leaprc.water.tip4pew]
vacuum:
nonbonded_method: NoCutoff
buffer:
nonbonded_method: PME
nonbonded_cutoff: 11*angstroms
clearance: 12*angstroms
positive_ion: Na+
negative_ion: Cl-
ionic_strength: 100*millimolar
solvent_model: tip3p
leap:
parameters: [leaprc.water.tip3p]
implicit:
nonbonded_method: NoCutoff
implicit_solvent: OBC2
implicit_solvent_salt_conc: 100*millimolar
systems:
hydration-system:
solute: benzene
solvent1: water
solvent2: vacuum
leap:
parameters: [leaprc.protein.ff14SB]
t4-benzene:
receptor: t4-lysozyme
ligand: benzene
solvent: !Combinatorial [buffer, implicit]
leap:
parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]
protocols:
hydration-protocol:
solvent1:
alchemical_path: auto
solvent2:
alchemical_path: auto
absolute-binding:
complex:
alchemical_path:
lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
lambda_restraints: [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
solvent:
alchemical_path: auto
experiment-t4-benzene:
system: t4-benzene
protocol: absolute-binding
restraint:
type: !Combinatorial [FlatBottom, Harmonic, Boresch]
experiment-hydration:
system: hydration-system
protocol: hydration-protocol
experiments: [experiment-t4-benzene, experiment-hydration]
Changing sampling method¶
By default, YANK uses a combination of Monte Carlo rigid displacement/rotation of the ligand and Langevin dynamics to propagate the replicas of the Hamiltonian Replica Exchange algorithm, but both the propagation and the enhanced sampling algorithm can be configured.
mcmc_moves:
langevin:
type: LangevinSplittingDynamicsMove
timestep: 4.0*femtosecond
n_steps: 1250
collision_rate: 1.0/picosecond
reassign_velocities: no
splitting: 'V R O R V'
n_restart_attempts: 4
samplers:
repex:
type: ReplicaExchangeSampler
mcmc_moves: langevin
number_of_iterations: 100000
online_analysis_interval: 10
sams:
type: SAMSSampler
mcmc_moves: langevin
state_update_scheme: global-jump
gamma0: 10.0
flatness_threshold: 10.0
number_of_iterations: 100000
online_analysis_interval: 1000
The mcmc_moves
block allows to specify the constructor of any MCMCMove
object available in the openmmtools.mcmc
module (see API docs there). The type
keyword is used to
indicate the class name of the MCMCMove
, and the other keywords are passed to its constructor. Note that timestep
and n_steps
overwrite the global options default_timestep
and default_nsteps_per_iteration
respectively
defined above.
The samplers
block has a similar structure, but you can invoke the constructor of any sampler in the yank.multistate
package. Similarly, the number_of_iterations
overwrites the global option default_number_of_iterations
. Note also
that we can pass to the samplers.SAMPLER_ID.mcmc_moves
keyword any identifier of the mcmc_moves
section.
Starting from different input files¶
Files with multiple molecules¶
If you have input files describing multiple molecules, you can define molecules
by selecting all or a subset of them.
molecules:
t4-lysozyme:
filepath: input/t4-frames.pdb
select: 0
binders:
filepath: input/binders.mol2
antechamber:
charge_method: bcc
select: all
The example above pick the first structure in the t4-frames.pdb
file to define the t4-lysozyme
molecule, while
it reads all the molecules in the binders.mol2
file to generate the binders. In the latter case, YANK will generate
as many experiments as the number of molecules in binders.mol2
.
Note
Selecting mol2 and sdf files requires an installation of the OpenEye library.
To select a subset of the structures in a single file you can do
molecules:
t4-lysozyme:
filepath: input/t4-frames.pdb
select: !Combinatorial [0, 1, 4, 5, 8]
SMILES¶
If you have OpenEye installed, then you can also generate small molecules from their SMILES representation.
molecules:
...
benzene:
smiles: c1ccccc1
openeye:
quacpac: am1-bcc
antechamber:
charge_method: null
binders:
filepath: input/L99A-binders.csv
openeye:
quacpac: am1-bcc
antechamber:
charge_method: null
select: all
systems:
t4-benzene:
receptor: t4-lysozyme
ligand: benzene
pack: yes
...
Here we define a benzene
molecules passing directly its SMILES representation to the smiles
keyword. It is also
possible to load a CSV file defining multiple SMILES molecule to generate combinatorial experiments. The charges for
these molecules are assigned using the OpenEye’s AM1-BCC charge scheme rather than antechamber.
Finally, note the pack: yes
keyword in the definition of the system. Since no coordinates are specified by the SMILES
representation, the molecule is placed somewhat randomly in the solvation box. Setting the pack
option brings receptor
and ligand close before generating the system with tleap. This is very helpful in explicit solvent since otherwise tleap
may generate a huge solvation box to include both molecules.
Extra directives for tleap¶
In special cases, you may want to add more directives for tleap. For example, you may want to use tleap to rename some
atoms that are not recognized by the force field you want to load. You can do so by creating a leaprc.renameatoms
including the directive to inject into the tleap script
addPdbAtomMap {
{ 1DH6 DH61 }
{ 2DH6 DH62 }
}
and then load it normally together with the leap parameters in the YAML file
molecules:
t4-lysozyme:
filepath: input/t4.pdb
leap:
parameters: [leaprc.renameatoms]
When the automatic pipeline won’t do¶
The easiest way to set up a free energy calculation in YANK is to use the automatic pipeline based on AmberTools, but sometimes that is not flexible enough. In this case, you can use any other program to generate parameter/coordinates files in Amber or Gromacs format or and XML representation of an OpenMM system.
solvents:
pme:
nonbonded_method: PME
nonbonded_cutoff: 11*angstroms
vacuum:
nonbonded_method: NoCutoff
systems:
t4-benzene:
phase1_path: [complex.top, complex.gro]
phase2_path: [solvent.top, solvent.gro]
ligand_dsl: resname MOL
solvent: pme
gromacs_include_dir: include/
hydration-system:
phase1_path: [solvent1.prmtop, solvent1.inpcrd]
phase2_path: [solvent2.prmtop, solvent2.inpcrd]
solvent1: pme
solvent2: vacuum
The t4-benzene
system now loads parameter/coordinates files in Gromacs format for the calculation of the ligand in
complex and in bulk. The ligand is identified by the ligand_dsl
MDTraj DSL selection string. Note that the solvent
must still be specified to indicate the long-range interactions treatment.
When ligand_dsl
is not specified, YANK alchemically modifies the whole solute, which is used in hydration-system
to perform a hydration free energy calculation.
Note
If your ligand/solute has a net charged and you’re using PME, your systems should contain enough ions to neutralizing. YANK will decouple them together with the system to maintain a neutral solvation box.