Cookbook for YAML Options

Having all the options laid out in front of you is good for advanced users, but sometimes practical examples are much more helpful. Below is the YANK YAML Cookbook: A series of examples that may help you understanding how to put together the options found on the other pages.


Best Practices Shown by These Examples

The following examples exemplify some of the best practices <http://www.alchemistry.org/wiki/Best_Practices> laid out on alchemistry.org <http://www.alchemistry.org/>. Specifically:

  • There are no partial atomic charges on an atom while its Lennard-Jones interactions are being removed.
  • The whole free energy setup and run is automated through YANK’s YAML format, meaning its easily transferable and repeatable
  • A statistically efficient alchemical path is selected because the softcore_X parameters were not set, so the default, efficient, soft core parameters were chosen automatically.

Example: Implicit Solvent Binding Simulation:

This example sets up para-xylene binding to T4-Lysozyme in implicit solvent. We recommend these settings as a good baseline simulation setup. Although you do not have to set all these options (e.g. the ligand may be a pdb file, not just a SMILES string), many of these options are good stock settings to remember.

In this example:

  • Targeting folders for input files
  • Configuring Output files
  • Setting good stock options for implicit simulations
  • Setting up simple protein/ligand binding simulation
  • NVT ensemble
options:
  verbose: yes
  setup_dir: setup
  output_dir: output
  experiments_dir: experiments
  randomize_ligand: yes
  minimize: yes
  number_of_iterations: 2000
  temperature: 300*kelvin
  pressure: null

molecules:
  T4_lysozyme:
    filepath: T4.pdb
  p-xylene:
    smiles: CC1=CC=C(C=C1)C
    antechamber:
      charge_method: bcc

solvents:
  GBSA:
    nonbonded_method: NoCutoff
    implicit_solvent: OBC2
    solvent_dielectric: 78.5

systems:
  T4-xylene-complex:
    receptor: T4_lysozyme
    ligand: p-xylene
    solvent: GBSA
    pack: yes
    leap:
      parameters: [leaprc.ff14SB, leaprc.gaff]

protocols:
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.00]
    solvent:
      alchemical_path:
        lambda_electrostatics: [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.00]

experiments:
  system: T4-xylene-complex
  protocol: absolute-binding
  restraint:
    type: FlatBottom

After this is all setup, simply run: yank script --yaml={ThisScriptName.yaml} and YANK will take care of the rest for you.


Example: Absolute Binding free energy in explicit solvent

This example takes the same para-xylene binding to T4-Lysozyme system as before, but now uses an explicit solvent setup, minimal options, and automatic water addition (TIP3P).

This example also shows how to make YANK run with MPI; assumes 4 nodes are available. It should be noted there is nothing you set in the YAML file or with YANK itself to run with MPI. YANK automatically detects if MPI was called to run YANK and interacts with it accordingly.

In this Example:

  • Automatic solvent addition
  • Setting good stock options for explicit simulations
  • Call MPI
  • NPT ensemble
 options:
   minimize: yes
   verbose: yes
   output_dir: .
   number_of_iterations: 2000
   temperature: 300*kelvin
   pressure: 1*atmosphere

  molecules:
    t4-lysozyme:
      filepath: setup/receptor.pdbfixer.pdb
      parameters: leaprc.ff14SB
    p-xylene:
      filepath: setup/ligand.tripos.mol2
      antechamber:
        charge_method: bcc

solvents:
  PME:
    nonbonded_method: PME
    nonbonded_cutoff: 0.9*nanometer
    switch_distance: 0.8*nanometer
    clearance: 12*angstroms
    positive_ion: Na+
    negative_ion: Cl-

systems:
  t4-xylene-explicit:
    receptor: t4-lysozyne
    ligand: p-xylene
    solvent: PME
    leap:
      parameters: [leaprc.ff12, leaprc.gaff]

protocols:
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.00]
    solvent:
      alchemical_path:
        lambda_electrostatics: [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
        lambda_sterics:        [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.00]

experiments:
  system: t4-xylene-explicit
  protocol: absolute-binding
  restraint:
    type: Harmonic

Now run:

build_mpirun_configfile "yank script --yaml=yank.yaml"
mpiexec -configfile configfile

The build_mpirun_configfile is a command available if you have installed YANK through conda, and though the clusterutils repo.

Raw YAML File Examples

# Here are listed all the options that is possible to specify in a
# Yank YAML script. There are no mandatory options. If not specified,
# an option assumes the default value listed below.
---
options:
  # GENERAL OPTIONS
  # ---------------
  verbose: no                       # Turn on/off verbose output.
  resume_setup: no                  # By default, Yank will raise an error when it detects
  resume_simulation: no             # that it will have to overwrite an existing file. Set
                                    # resume_setup and/or resume_simulation if you want Yank
                                    # to resume from an existing setup (molecules and system
                                    # files) and/or simulation (netcdf4 trajectory files)
                                    # respectively instead.
  output_dir: output                # The main output folder. A relative path is interpreted
                                    # as relative w.r.t. the YAML script path.
  setup_dir: setup                  # The main folder where the setup and simulation files
  experiments_dir: experiments      # are saved. Relative paths are interpreted as relative
                                    # w.r.t. output_dir path.
  platform: fastest                 # The OpenMM platform to use between 'Reference', 'CPU',
                                    # 'OpenCL', and 'CUDA'. The default value 'fastest' selects
                                    # automatically the fastest available platform.
  precision: auto                   # Precision mode. For OpenCL and CUDA platforms, this
                                    # can be set to 'single', 'mixed' or 'double'. The default
                                    # value 'auto' selects always 'mixed' when the device
                                    # support this precision, otherwise 'single'.

  # SYSTEM AND SIMULATION PREPARATION
  # ---------------------------------
  randomize_ligand: no                          # Randomize the position of the ligand before starting the
  randomize_ligand_sigma_multiplier: 2.0        # simulation. This works only in implicit solvent. The
  randomize_ligand_close_cutoff: 1.5 * angstrom # ligand will be randomly rotated and displaced by a vector
                                                # with magnitude proportional to randomize_ligand_sigma_multiplier
                                                # with the constraint of being at a distance greater than
                                                # randomize_ligand_close_cutoff from the receptor.
  temperature: 298.0 * kelvin                   # Temperature of the system.
  pressure: 1.0 * atmosphere                    # Pressure of the system. Set to null for NVT ensemble.
  hydrogen_mass: 1.0 * amu                      # Hydrogen mass for HMR simulations.
  constraints: HBonds                           # Constrain bond lengths and angles. Possible values are null,
                                                # HBonds, AllBonds, and HAngles (see Openmm createSystem()).

  # SIMULATION PARAMETERS
  # ---------------------
  online_analysis: no                                           # If set, analysis will occur each iteration.
  online_analysis_min_iterations: 20                            # Minimum number of iterations needed to begin online analysis.
  show_energies: yes                                            # If True, will print energies at each iteration.
  show_mixing_statistics: yes                                   # If True, will show mixing statistics at each iteration.
  minimize: yes                                                 # Minimize configurations before running the simulation.
  minimize_max_iterations: 0                                    # Maximum number of iterations for minimization.
  minimize_tolerance: 1.0 * kilojoules_per_mole / nanometers    # Set minimization tolerance.
  number_of_equilibration_iterations: 1                         # Number of equilibration iterations.
  equilibration_timestep: 1.0 * femtosecond                     # Timestep for use in equilibration.
  number_of_iterations: 1                                       # Number of replica-exchange iterations to simulate.
  nsteps_per_iteration: 500                                     # Number of timesteps per iteration.
  timestep: 2.0 * femtosecond                                   # Timestep for Langevin dyanmics.
  replica_mixing_scheme: swap-all                               # Specify how to mix replicas. Possible values are
                                                                # swap-neighbors and swap-all.
  collision_rate: 5.0 / picosecond                              # The collision rate used for Langevin dynamics.
  constraint_tolerance: 1.0e-6                                  # Relative constraint tolerance.
  mc_displacement_sigma: 10.0 * angstroms                       # Yank will augument Langevin dynamics with MC moves
                                                                # rotating and displacing the ligand. This control the
                                                                # size of the displacement.

  # ALCHEMY PARAMETERS
  # ------------------
  annihilate_electrostatics: yes            # If set, electrostatics is annihilated, rather than decoupled.
  annihilate_sterics: no                    # If set, sterics (Lennard-Jones or Halgren potential) is annihilated,
                                            # rather than decoupled.
  softcore_alpha: 0.5                       # Alchemical softcore parameter for Lennard-Jones.
  softcore_beta: 0.0                        # Alchemical softcore parameter for electrostatics.
                                            # to recover standard electrostatic scaling.
  softcore_a: 1                             # Parameters modifying softcore Lennard-Jones form.
  softcore_b: 1
  softcore_c: 6
  softcore_d: 1
  softcore_e: 1
  softcore_f: 2

# This example show how to setup a combinatorial experiment. We have
# 2 receptors (each in 2 different conformations) and 2 ligands and we
# want to measure the binding affinities for all possible combination
# of these molecules in both implicit and explicit solvent.

# Three dashes start a YAML script
---

# See the file "all-options.yaml" for the meaning of these options.
# -----------------------------------------------------------------
options:
  verbose: true
  output_dir: test_kinase
  temperature: 310*kelvin
  pressure: 1*atmosphere
  constraints: HBonds
  number_of_iterations: 200
  minimize: yes


# In the "molecules" section we specify how we want to our molecules
# to be prepared (i.e. their initial structure and which force
# field parameters to use for them).
# ------------------------------------------------------------------
molecules:
  abl:                                              # Each molecule is identified by an ID, "abl" in this case.
                                                    #
    filepath: !Combinatorial [2HYY.pdb, 3CS9.pdb]   # Lists preceded by "!Combinatorial" here will be expanded
                                                    # combinatorially, so the section "abl" actually specifies
                                                    # two different starting configurations of the kinase Abl.
                                                    #
    strip_protons: yes                              # Let tleap re-add all hydrogen atoms. This is useful if
                                                    # the PDB contains atom names for hydrogen that AMBER does
                                                    # not recognize. The default for this parameter is "no".
  src:
    filepath: !Combinatorial [2OIQ_A.pdb, 3EL7_A.pdb]
    strip_protons: yes

  imatinib:
    filepath: clinical-kinase-inhibitors.csv        # Yank supports pdb, mol2, sdf and csv files. The last one
                                                    # must contain one row for each molecule. The second column
                                                    # must be the SMILES description of the molecule (the first
                                                    # column if the csv file has only one). Yank can use the
                                                    # OpenEye toolkits to generate a molecule from its name or
                                                    # its SMILES description. Instead of using "filepath", you
                                                    # can directly use "smiles" or "name". For example, to
                                                    # specify a benzene molecule you can use either:
                                                    #
                                                    # molecules:
                                                    #   benzene_smiles:
                                                    #     smiles: c1ccccc1
                                                    #     parameters: antechamber
                                                    #   benzene_name:
                                                    #     name: benzene
                                                    #     parameters: antechamber
                                                    #
    select: 0                                       # Select the first row of the csv file which contains the
                                                    # SMILES description of imatinib. Note that the list syntax
                                                    # works everywhere within the "molecules" section, so you
                                                    # could specify both imatinib and bosutinib using
                                                    #
                                                    # molecules:
                                                    #   ligand:
                                                    #     filepath: clinical-kinase-inhibitors.csv
                                                    #     parameters: antechamber
                                                    #     select: [0, 3]
                                                    #
                                                    # The "select" keyword works the same way if you specify a
                                                    # pdb, mol2, or an sdf file containing multiple structures.
                                                    # You can alternatively specify "select: all" which includes
                                                    # all the molecules in the given file. This is also the
                                                    # default value.
                                                    #
    antechamber:                                    # Let antechamber parametrize the molecule using GAFF and
      charge_method: bcc                            # AM1-BCC charges. See antechamber manual for supported
                                                    # charge methods. If OpenEye Toolkits are installed, it is
                                                    # possible to determine the charge with the OpenEye
                                                    # recommended scheme with
                                                    #
                                                    # molecules:
                                                    #   imatinib:
                                                    #     filepath: imatinib.mol2
                                                    #     openeye:
                                                    #       quacpac: am1-bcc
                                                    #     antechamber:
                                                    #       charge_method: null
                                                    #
                                                    # Notice that charge_method is set to null to keep the
                                                    # partial charges determined by openeye.
                                                    #
    epik:                                           # Run Schrodinger's tool Epik with default parameters and
      select: 0                                     # select the most likely protonation state for the molecule
      tautomerize: no                               # (in solution). More control over epik parameters is
      ph: 7.6                                       # possible (see YAML documentation).
      ph_tolerance: 0.7

  bosutinib:
    filepath: molecule_dir/bosutinib.mol2
    leap:                                           # It is possible to optionally include molecule-specific
      parameters: [bosutinib.frcmod, bosutinib.off] # parameters files to import in tleap before creating the
                                                    # system. General parameter files (i.e. leaprc.ff14SB,
                                                    # leaprc.gaff, etc.) are generally specified in the system
                                                    # (see below). To import a single parameter file (i.e. not a
                                                    # list), square brackets are not required.


# Here we specify the parameters of our solvent. The list of
# configuration here is not 100% complete. In general, all parameters
# of simtk.openmm.app.amberprmtopfile.AmberPrmtopFile.createSystem()
# can be specified in this section.
# -----------------------------------------------------------------
solvents:
  RF:                                               # Arbitrary ID for this solvent ("RF" in this case).
    nonbonded_method: CutoffPeriodic                # This specifies an explicit solvent using reaction field.
    nonbonded_cutoff: 1*nanometer                   # Cutoff for interactions is set at 1nm.
    clearance: 10*angstroms                         # The edge of the solvation box will be at a 10 angstroms
                                                    # distance from any atom of the receptor and ligand.
    positive_ion: Na+                               # Neutralizing ions to use.
    negative_ion: Cl-
  vacuum:
    nonbonded_method: NoCutoff

  # We will not use the next two solvents in the experiment
  # but here are two examples of how to define an implicit
  # and a PME solvents with some advanced options supported
  # by OpenMM.
  GBSA:
    nonbonded_method: NoCutoff                      # Implicit solvent using GBSA/OBC2 model.
    implicit_solvent: OBC2
    implicit_solvent_salt_concentration: 1.0*mole/liter
    solute_dielectric: 1.5
    solvent_dielectric: 80.0
  PME:
    nonbonded_method: PME
    nonbonded_cutoff: 1*nanometer
    switch_distance: 0.9*nanometer
    ewald_error_tolerance: 0.0003
    clearance: 10*angstroms
    positive_ion: Na+
    negative_ion: Cl-


# Here we describe the system for the calculation in terms of
# components that form the system. Just as in the section
# "molecules", list preceded by "!Combinatorial" in the section
# "systems" will be expanded combinatorially. Here below we have
# 2x2x2x2=16 calculations to run (2 receptors x 2 initial receptor
# structures x 2 ligands x 2 solvent models).
# -----------------------------------------------------------------
systems:                                                 # If you don't need combinations but you only want to run a
  kinase-inhibitor:                                      # single calculation, don't use combinatorial lists. Example:
    receptor: !Combinatorial [abl, src]                  #   receptor: abl
    ligand: !Combinatorial [imatinib, bosutinib]         #   ligand: imatinib
    solvent: !Combinatorial [RF, vacuum]                 #   solvent: RF
    pack: yes                                            # If the ligand is far away from the receptor or if there
                                                         # are clashing atoms (defined as closer than 1.5 angstroms),
                                                         # Yank will randomly translate and rotate the ligand until
                                                         # this is solved. Set this to "no" if you don't wan't to
                                                         # modify the initial position of the ligand as defined in
                                                         # your input file.
    leap:
      parameters: [oldff/leaprc.ff99SBildn, leaprc.gaff] # All our calculations will use the parameters in AMBER
                                                         # ff99SBildn force field and GAFF.

  imatinib-hydration:                                    # System for hydration free energy of imatinib
    solute: imatinib
    solvent1: RF
    solvent2: vacuum
    leap:
      parameters: [leaprc.ff14SB, leaprc.gaff]           # General parameters files for water and imatinib

  # It is also possible to skip the automatic system setup
  # and provide your own system in AMBER or Gromacs format
  amber-system:
    phase1_path: [complex.prmtop, complex.inpcrd]        # System files for the complex phase.
    phase2_path: [solvent.prmtop, solvent.inpcrd]        # System files for the solvent phase.
    ligand_dsl: resname MOL                              # MDTraj DSL string to select the ligand.
    solvent: RF                                          # Specify how to model the solvent.

  gromacs-system:
    phase1_path: [complex.top, complex.gro]
    phase2_path: [solvent.top, solvent.gro]
    gromacs_include_dir: include/                        # Optional path to the directory containing the files
                                                         # included in .top files.
    ligand_dsl: resname MOL
    solvent: RF


# Here we specify the alchemical path of our simulation.
# -----------------------------------------------------------------
protocols:
  absolute-binding:
    complex:
      alchemical_path:
        lambda_electrostatics: [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0,0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        lambda_sterics: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    solvent:
      alchemical_path:
        lambda_electrostatics: [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0,0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        lambda_sterics: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]


# Here we specify which experiments to set up and run. Again, like
# in sections "molecules" and "systems". Every list preceeded by
# !Combinatorial will generate multiple experiments. The example
# below would generate 4 experiments (assuming that my-system1 and
# my-system2 were not combinatorial systems as well).
#
#   experiments:
#     system: !Combinatorial [my-system1, my-system2]
#     protocol !Combinatorial [protocol1, protocol2]
#
# -----------------------------------------------------------------
experiments:
  system: kinase-inhibitor
  protocol: absolute-binding
  restraint:                                                 # Optionally, apply a restrain to the ligand to keep
    type: !Combinatorial [Harmonic, FlatBottom]              # it close to the receptor. Possible types are null,
                                                             # Harmonic, FlatBottom, and Boresch.
  options:                                                   # All options that can be specified in the "options"
    temperature: !Combinatorial [298.0*kelvin, 307.0*kelvin] # section can be included here.


# It is also possible to specify a sequence of experiments with the syntax
# my-experiment1:
#   system: kinase-inhibitor
#   protocol: absolute-binding
#
# my-experiment2:
#   system: imatinib-hydration
#   protocol: hydration-protocol
#
# experiments: [my-experiment1, my-experiment2]