YAML syntax tutorial

This tutorial assumes you’ve installed YANK successfully.

We’ll start by creating a YAML file for a hydration free energy calculation in explicit solvent, and then we’ll add an absolute binding free energy experiment in explicit and implicit solvent.

A first example: Hydration free energy

We’ll go line-by-line through the sections of the following YAML file for YANK that sets up and run a hydration binding free energy of a small molecule. The three dashes (---) is the standard way in YAML to indicate the beginning of a YAML document.

---

options:
  verbose: yes
  minimize: yes
  default_timestep: 2.0*femtoseconds
  default_nsteps_per_iteration: 500
  temperature: 300*kelvin
  pressure: 1*atmosphere

molecules:
  benzene:
    filepath: benzene.tripos.mol2
    antechamber:
      charge_method: bcc
    leap:
      parameters: [leaprc.gaff]

solvents:
  water:
    nonbonded_method: PME
    nonbonded_cutoff: 9*angstroms
    clearance: 16*angstroms
    solvent_model: tip4pew
    leap:
      parameters: [leaprc.water.tip4pew]
  vacuum:
    nonbonded_method: NoCutoff

systems:
  hydration-system:
    solute: benzene
    solvent1: water
    solvent2: vacuum
    leap:
      parameters: [leaprc.protein.ff14SB]

protocols:
  hydration-protocol:
    solvent1:
      alchemical_path: auto
    solvent2:
      alchemical_path: auto

experiments:
  system: hydration-system
  protocol: hydration-protocol

Prepare the molecules

Let’s skip the options section for now and look at the molecules block (docs). This is the place where you can specify the pipeline that a molecule should go through before eventually being solvated or combined with another molecule to form a complex.

molecules:
  benzene:
    filepath: benzene.tripos.mol2
    antechamber:
      charge_method: bcc
    leap:
      parameters: [leaprc.gaff]

The benzene keyword here is a string identifier so it can be any name we want. The nested filepath keyword points to a mol2 file specifying the molecule topology and initial coordinates of the atoms. The automatic pipeline in YANK is based on AmberTools so you can determine charges with antechamber (in this case AM1-BCC point charges) and assign the parameters to the molecule by loading any parameter file accessible to tleap.

The tleap parameters can be assigned in the systems section as well with identical effects, but when you have multiple molecules and you want to assign different parameters to them, it is convenient to be able to specify the parameter file to load in the molecules section.

Specify the solvent

We’ll turn off benzene’s nonbonded interactions in explicit water and turn them back on in vacuum to close the thermodynamic cycle so let’s create two solvents.

solvents:
  water:
    nonbonded_method: PME
    nonbonded_cutoff: 9*angstroms
    clearance: 16*angstroms
    solvent_model: tip4pew
    leap:
      parameters: [leaprc.water.tip4pew]
  vacuum:
    nonbonded_method: NoCutoff

Again, water and vacuum` are just two arbitrary string identifiers. The ``vacuum solvent is quite simple. The water solvent we use Particle Mesh Ewald for long-range electrostatic treatment with a cutoff of 9A for both Coulomb and Lennard-Jones interactions.

The clearance and solvent_model keywords instruct YANK to solvate the solute with TIP4P-Ew water molecules so that the solute will be distant from the box boundaries by 16A. Finally, we ask leap to load the TIP4P-Ew parameters when solvating. Again, this parameter file can be specified in the in systems section.

Prepare the systems

Finally let’s put everything together to create the systems in solvent and in vacuum that YANK will use to run the hydration free energy calculation.

systems:
  hydration-system:
    solute: benzene
    solvent1: water
    solvent2: vacuum
    leap:
      parameters: [leaprc.protein.ff14SB]

hydration-system is, as usual, an arbitrary identifier. Here we are instructing YANK to create two systems by solvating benzene in solvent1 (i.e. the water specified above) and in vacuum. YANK will use these two systems to compute the free energy of transferring benzene from solvent1 to solvent2. Finally, we load the general amber parameters that are required by tleap.

If we don’t specify the parameters in the solvents and molecules sections, we can do it here.

systems:
  hydration-system:
    ...
    leap:
      parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew, leaprc.gaff]

Alchemical protocol

The protocols section determines the intermediate states to sample during both phases of the the alchemical calculations. YANK provides a method to determine this automatically by spacing the intermediate states equally in thermodynamic length. We’ll see later in the tutorial how you can specify an alchemical path manually instead.

protocols:
  hydration-protocol:
    solvent1:
      alchemical_path: auto
    solvent2:
      alchemical_path: auto

Experiment

This is where we associate a protocol to the system and tell YANK to run the hydration free energy calculation.

experiments:
  system: hydration-system
  protocol: hydration-protocol

Running options

Finally, the options section contains some several parameters for the simulation (see here for a complete list).

options:
  verbose: yes
  minimize: yes
  default_timestep: 2.0*femtoseconds
  default_nsteps_per_iteration: 500
  default_number_of_iterations: 1000
  temperature: 300*kelvin
  pressure: 1*atmosphere

Here we’re setting the verbosity of the output, and asking YANK to minimize the systems before running 1000 iterations of Hamiltonian replica exchange. Each iteration, by default, perform a Monte Carlo rigid translation and rotation of the ligand followed, in this case, by 1ps of Langevin dynamics (with 2fs time step) keeping the temperature at 300K. The pressure is controlled by a Monte Carlo barostat.

Since it is possible to specify multiple free energy calculations in a single YAML file, this section is used to keep the options to use as default, but they can be overwritten by other parts of the YAML document as well. For example, the options block can be added in the experiment specification to overwrite some or all of the default values.

experiments:
  system: hydration-system
  protocol: hydration-protocol
  options:
    temperature: 310*kelvin
    default_nsteps_per_iteration: 1000

Add an absolute binding free energy experiment

We mentioned that it’s possible to specify multiple experiments in a single YAML file. Let’s add an absolute binding free energy calculation.

Preparing protein receptors

We start as before by preparing the receptor.

molecules:
  ...
  t4-lysozyme:
    filepath: input/t4.pdb

In this case, we don’t specify leap parameters since we’ll add them to the system.

Note

Since the PDB file goes through tleap, your PDB file should adopt the residue names consistent with their protonation state as expected by tleap (e.g., CYS for protonated Cysteine, CYM for deprotonated, and CYX for cysteines forming a disulphide bridge).

Add buffer and neutralizing ions

In this case, we want to simulate a particular buffer ionic strength so we create a new solvent.

solvents:
  ...
  buffer:
    nonbonded_method: PME
    nonbonded_cutoff: 11*angstroms
    clearance: 12*angstroms
    positive_ion: Na+
    negative_ion: Cl-
    ionic_strength: 100*millimolar
    solvent_model: tip3p
    leap:
      parameters: [leaprc.water.tip3p]

By adding the three new keywords positive_ion, negative_ion, and ionic_strength, YANK will add neutralizing and buffer Na+ and Cl- ions.

Note

If the molecule that is alchemically decoupled has a net charge, YANK will automatically select a random counterion to decouple together with the ligand.

Create receptor ligand system

Now we create the system by combining the t4-lysozyme and benzene to form a complex.

systems:
  ...
  t4-benzene:
    receptor: t4-lysozyme
    ligand: benzene
    solvent: buffer
    leap:
      parameters: [leaprc.protein.ff14SB, leaprc.gaff2, leaprc.water.tip4pew]

This time instead of solute, solvent1, and solvent2 we have receptor, ligand, and solvent. An experiment using this system will decouple the ligand in complex and turn back on the interaction in bulk.

Restraints

Absolute binding calculations often require a restraint to keep the ligand in the binding pocket.

protocol:
  ...
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
        lambda_restraints:     [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
    solvent:
      alchemical_path: auto

experiment-t4-benzene:
  system: t4-benzene
  protocol: absolute-binding
  restraint:
    type: Harmonic

experiment-hydration:
  ...

experiments: [experiment-t4-benzene, experiment-hydration]

There are three main changes here. First, notice how the experiments section now points to a list of experiments. This is necessary since we are using the same YAML file to run multiple experiments. Secondly, the experiment-t4-benzene block contains a restraint.type keyword that we can be used to specify the type of restraint we want to use. Other available restraints are FlatBottom, Boresch, and PeriodicTorsionBoresch.

Finally, we’ve used a manual protocol instead of auto to control the intermediate states of the Hamiltonian replica exchange simulation of the complex phase. The protocol here first turns on the harmonic restraint (lambda_restraints goes from 0.0 to 1.0), then turn off charges (lambda_electrostatics from 1.0 to 0.0), and finally the Lennard-Jones interactions are decoupled (lambda_sterics).

YANK uses a heuristic to find the parameters of the restraint, but it is possible to specify them manually.

experiment-t4-benzene:
  system: t4-benzene
  protocol: absolute-binding
  restraint:
    type: Harmonic
    spring_constant: 0.2*kilocalories_per_mole/(angstrom**2)
    restrained_receptor_atoms: (resi 77 or resi 90 or resi 98 or resi 110) and (mass > 1.5)
    restrained_ligand_atoms: (resname MOL) and (mass > 1.5)

This example manually set the spring constant, and attach the harmonic restraint to the centroid of the heavy atoms identified by the MDTraj DSL selection expression. In general, type is the name of a class, and the keywords that is possible to specify to configure the restraints are its constructor parameters. Check the Python API documentation for a complete overview.

The full script after the changes

---

options:
  verbose: yes
  minimize: yes
  default_timestep: 2.0*femtoseconds
  default_nsteps_per_iteration: 500
  temperature: 300*kelvin
  pressure: 1*atmosphere

molecules:
  benzene:
    filepath: benzene.tripos.mol2
    antechamber:
      charge_method: bcc
    leap:
      parameters: [leaprc.gaff]
  t4-lysozyme:
    filepath: input/t4.pdb

solvents:
  water:
    nonbonded_method: PME
    nonbonded_cutoff: 9*angstroms
    clearance: 16*angstroms
    solvent_model: tip4pew
    leap:
      parameters: [leaprc.water.tip4pew]
  vacuum:
    nonbonded_method: NoCutoff
  buffer:
    nonbonded_method: PME
    nonbonded_cutoff: 11*angstroms
    clearance: 12*angstroms
    positive_ion: Na+
    negative_ion: Cl-
    ionic_strength: 100*millimolar
    solvent_model: tip3p
    leap:
      parameters: [leaprc.water.tip3p]

systems:
  hydration-system:
    solute: benzene
    solvent1: water
    solvent2: vacuum
    leap:
      parameters: [leaprc.protein.ff14SB]
  t4-benzene:
    receptor: t4-lysozyme
    ligand: benzene
    solvent: buffer
    leap:
      parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]

protocols:
  hydration-protocol:
    solvent1:
      alchemical_path: auto
    solvent2:
      alchemical_path: auto
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
        lambda_restraints:     [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
    solvent:
      alchemical_path: auto

experiment-t4-benzene:
  system: t4-benzene
  protocol: absolute-binding
  restraint:
    type: Harmonic

experiment-hydration:
  system: hydration-system
  protocol: hydration-protocol

experiments: [experiment-t4-benzene, experiment-hydration]

Combinatorial experiments: implicit and explicit solvents

We can use the !Combinatorial keyword in many sections to generate multiple experiments from a single YAML file while maintaining the representation of our computational experiment compact. For example, let’s assume we want to compare the effect of multiple restraint types and the accuracy of the binding free energy estimate changes with TIP3P and the GBSA implicit solvent model.

solvents:
  ...
  implicit:
    nonbonded_method: NoCutoff
    implicit_solvent: OBC2
    implicit_solvent_salt_conc: 100*millimolar

systems:
  ...
  t4-benzene:
    receptor: t4-lysozyme
    ligand: benzene
    solvent: !Combinatorial [buffer, implicit]
    leap:
      parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]

experiment-t4-benzene:
  system: t4-benzene
  protocol: absolute-binding
  restraint:
    type: !Combinatorial [FlatBottom, Harmonic, Boresch]

Here we have defined a new solvent that uses the GBSA/OBC2 model, and we have changed the systems.t4-benzene.solvent keyword from simply buffer to !Combinatorial [buffer, implicit]. We’ve also added !Combinatorial in the experiment-t4-benzene section, this time specifying several restraint types. When YANK parses experiment-t4-benzene, it generates 6 different binding free energy calculations using 3 different restraints and 2 different solvent models.

The full script after the changes

---

options:
  verbose: yes
  minimize: yes
  default_timestep: 2.0*femtoseconds
  default_nsteps_per_iteration: 500
  temperature: 300*kelvin
  pressure: 1*atmosphere

molecules:
  benzene:
    filepath: benzene.tripos.mol2
    antechamber:
      charge_method: bcc
    leap:
      parameters: [leaprc.gaff]
  t4-lysozyme:
    filepath: input/t4.pdb

solvents:
  water:
    nonbonded_method: PME
    nonbonded_cutoff: 9*angstroms
    clearance: 16*angstroms
    solvent_model: tip4pew
    leap:
      parameters: [leaprc.water.tip4pew]
  vacuum:
    nonbonded_method: NoCutoff
  buffer:
    nonbonded_method: PME
    nonbonded_cutoff: 11*angstroms
    clearance: 12*angstroms
    positive_ion: Na+
    negative_ion: Cl-
    ionic_strength: 100*millimolar
    solvent_model: tip3p
    leap:
      parameters: [leaprc.water.tip3p]
  implicit:
    nonbonded_method: NoCutoff
    implicit_solvent: OBC2
    implicit_solvent_salt_conc: 100*millimolar

systems:
  hydration-system:
    solute: benzene
    solvent1: water
    solvent2: vacuum
    leap:
      parameters: [leaprc.protein.ff14SB]
  t4-benzene:
    receptor: t4-lysozyme
    ligand: benzene
    solvent: !Combinatorial [buffer, implicit]
    leap:
      parameters: [leaprc.protein.ff14SB, leaprc.water.tip4pew]

protocols:
  hydration-protocol:
    solvent1:
      alchemical_path: auto
    solvent2:
      alchemical_path: auto
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
        lambda_restraints:     [0.00, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
    solvent:
      alchemical_path: auto

experiment-t4-benzene:
  system: t4-benzene
  protocol: absolute-binding
  restraint:
    type: !Combinatorial [FlatBottom, Harmonic, Boresch]

experiment-hydration:
  system: hydration-system
  protocol: hydration-protocol

experiments: [experiment-t4-benzene, experiment-hydration]

Changing sampling method

By default, YANK uses a combination of Monte Carlo rigid displacement/rotation of the ligand and Langevin dynamics to propagate the replicas of the Hamiltonian Replica Exchange algorithm, but both the propagation and the enhanced sampling algorithm can be configured.

mcmc_moves:
  langevin:
    type: LangevinSplittingDynamicsMove
    timestep: 4.0*femtosecond
    n_steps: 1250
    collision_rate: 1.0/picosecond
    reassign_velocities: no
    splitting: 'V R O R V'
    n_restart_attempts: 4

samplers:
  repex:
    type: ReplicaExchangeSampler
    mcmc_moves: langevin
    number_of_iterations: 100000
    online_analysis_interval: 10
  sams:
    type: SAMSSampler
    mcmc_moves: langevin
    state_update_scheme: global-jump
    gamma0: 10.0
    flatness_threshold: 10.0
    number_of_iterations: 100000
    online_analysis_interval: 1000

The mcmc_moves block allows to specify the constructor of any MCMCMove object available in the openmmtools.mcmc module (see API docs there). The type keyword is used to indicate the class name of the MCMCMove, and the other keywords are passed to its constructor. Note that timestep and n_steps overwrite the global options default_timestep and default_nsteps_per_iteration respectively defined above.

The samplers block has a similar structure, but you can invoke the constructor of any sampler in the yank.multistate package. Similarly, the number_of_iterations overwrites the global option default_number_of_iterations. Note also that we can pass to the samplers.SAMPLER_ID.mcmc_moves keyword any identifier of the mcmc_moves section.

Starting from different input files

Files with multiple molecules

If you have input files describing multiple molecules, you can define molecules by selecting all or a subset of them.

molecules:
  t4-lysozyme:
    filepath: input/t4-frames.pdb
    select: 0
  binders:
    filepath: input/binders.mol2
    antechamber:
      charge_method: bcc
    select: all

The example above pick the first structure in the t4-frames.pdb file to define the t4-lysozyme molecule, while it reads all the molecules in the binders.mol2 file to generate the binders. In the latter case, YANK will generate as many experiments as the number of molecules in binders.mol2.

Note

Selecting mol2 and sdf files requires an installation of the OpenEye library.

To select a subset of the structures in a single file you can do

molecules:
  t4-lysozyme:
    filepath: input/t4-frames.pdb
    select: !Combinatorial [0, 1, 4, 5, 8]

SMILES

If you have OpenEye installed, then you can also generate small molecules from their SMILES representation.

molecules:
  ...
  benzene:
    smiles: c1ccccc1
    openeye:
      quacpac: am1-bcc
    antechamber:
      charge_method: null
  binders:
    filepath: input/L99A-binders.csv
    openeye:
      quacpac: am1-bcc
    antechamber:
      charge_method: null
    select: all

systems:
  t4-benzene:
    receptor: t4-lysozyme
    ligand: benzene
    pack: yes
    ...

Here we define a benzene molecules passing directly its SMILES representation to the smiles keyword. It is also possible to load a CSV file defining multiple SMILES molecule to generate combinatorial experiments. The charges for these molecules are assigned using the OpenEye’s AM1-BCC charge scheme rather than antechamber.

Finally, note the pack: yes keyword in the definition of the system. Since no coordinates are specified by the SMILES representation, the molecule is placed somewhat randomly in the solvation box. Setting the pack option brings receptor and ligand close before generating the system with tleap. This is very helpful in explicit solvent since otherwise tleap may generate a huge solvation box to include both molecules.

Extra directives for tleap

In special cases, you may want to add more directives for tleap. For example, you may want to use tleap to rename some atoms that are not recognized by the force field you want to load. You can do so by creating a leaprc.renameatoms including the directive to inject into the tleap script

addPdbAtomMap {
  { 1DH6 DH61 }
  { 2DH6 DH62 }
}

and then load it normally together with the leap parameters in the YAML file

molecules:
  t4-lysozyme:
    filepath: input/t4.pdb
    leap:
      parameters: [leaprc.renameatoms]

When the automatic pipeline won’t do

The easiest way to set up a free energy calculation in YANK is to use the automatic pipeline based on AmberTools, but sometimes that is not flexible enough. In this case, you can use any other program to generate parameter/coordinates files in Amber or Gromacs format or and XML representation of an OpenMM system.

solvents:
    pme:
      nonbonded_method: PME
      nonbonded_cutoff: 11*angstroms
    vacuum:
      nonbonded_method: NoCutoff

  systems:
    t4-benzene:
      phase1_path: [complex.top, complex.gro]
      phase2_path: [solvent.top, solvent.gro]
      ligand_dsl: resname MOL
      solvent: pme
      gromacs_include_dir: include/
    hydration-system:
      phase1_path: [solvent1.prmtop, solvent1.inpcrd]
      phase2_path: [solvent2.prmtop, solvent2.inpcrd]
      solvent1: pme
      solvent2: vacuum

The t4-benzene system now loads parameter/coordinates files in Gromacs format for the calculation of the ligand in complex and in bulk. The ligand is identified by the ligand_dsl MDTraj DSL selection string. Note that the solvent must still be specified to indicate the long-range interactions treatment.

When ligand_dsl is not specified, YANK alchemically modifies the whole solute, which is used in hydration-system to perform a hydration free energy calculation.

Note

If your ligand/solute has a net charged and you’re using PME, your systems should contain enough ions to neutralizing. YANK will decouple them together with the system to maintain a neutral solvation box.