Molecules Header for YAML Files

Everything under the molecules header defines which molecules are in your systems. You can choose your own molecule names; because of this, user-defined names in the syntax examples are marked as {UserDefinedMolecule}.

You can define as many {UserDefinedMolecule} entries as you like. These molecules can then be referenced in the other headed sections of the YAML file.

Unlike the primary options for YAML files, many of these settings are optional and do nothing if not specified. Whether each setting is mandatory or optional (and under what conditions), as well as its default behavior, is explicitly stated in the setting’s description.

All of the molecules will be built, even if they are not used in a later system, so make sure every molecule is either free of errors or commented out.
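
As a minimal sketch of the point above, an unfinished molecule can be disabled with ordinary YAML comments so that it is not built. The molecule names and the toluene.mol2 file are hypothetical and used only for illustration.

```yaml
molecules:
  # This molecule is complete, so it will be built.
  benzene:
    filepath: benzene.mol2
    antechamber:
      charge_method: bcc

  # Hypothetical unfinished molecule: every line is commented out,
  # so YAML ignores the entry and nothing is built for it.
  # toluene:
  #   filepath: toluene.mol2
  #   antechamber:
  #     charge_method: bcc
```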


Specifying Molecule Names

filepath

molecules:
  {UserDefinedMolecule}:
    filepath: benzene.pdb

Filepath of the molecule. The path is relative to the directory containing the YAML script. The type of molecule you pass in (ligand vs. protein) determines which other arguments are mandatory. For small molecules, you need a way to specify the charges; proteins, however, can rely on built-in force field parameters.

MANDATORY but exclusive with smiles and name

Valid Filetypes: PDB, mol2, sdf, csv

Note: If a CSV file is specified and there is only one molecule, the first column must be a SMILES string. If multiple molecules are to be used (for the Combinatorial ability), then each row is its own molecule and the second column is the SMILES string.

smiles

molecules:
  {UserDefinedMolecule}:
    smiles: c1ccccc1

YANK can also process SMILES strings to build the molecule. This is usually only recommended for small ligands, and it requires that the OpenEye Toolkits are installed.

MANDATORY but exclusive with filepath and name

name

molecules:
  {UserDefinedMolecule}:
    name: benzene

YANK can process a raw molecule name if the OpenEye Toolkits are installed.

MANDATORY but exclusive with filepath and smiles
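
To illustrate the mutual exclusivity of filepath, smiles, and name, the sketch below defines the same compound three ways, with exactly one of the three keys per molecule. The molecule names are hypothetical, and the antechamber block is included only because small molecules need some way to get charges.

```yaml
molecules:
  benzene-from-file:        # hypothetical name
    filepath: benzene.pdb   # exactly one of filepath / smiles / name
    antechamber:
      charge_method: bcc
  benzene-from-smiles:      # hypothetical name; needs the OpenEye Toolkits
    smiles: c1ccccc1
    antechamber:
      charge_method: bcc
  benzene-from-name:        # hypothetical name; needs the OpenEye Toolkits
    name: benzene
    antechamber:
      charge_method: bcc
```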

strip_protons

molecules:
  {UserDefinedMolecule}:
    strip_protons: no

Specifies if LEaP will re-add all hydrogen atoms. This is helpful if the PDB contains atom names for hydrogens that AMBER does not recognize. Primarily for proteins, not small molecules.

OPTIONAL and defaults to no

Valid Options: [no]/yes
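
A minimal sketch of the typical use case, assuming a hypothetical protein file receptor.pdb whose hydrogen atom names AMBER does not recognize:

```yaml
molecules:
  my-receptor:               # hypothetical molecule name
    filepath: receptor.pdb   # hypothetical protein structure
    strip_protons: yes       # the default is no
```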

select

molecules:
  {UserDefinedMolecule}:
    filepath: clinical-kinase-inhibitors.csv
    antechamber:
      charge_method: bcc
    select: !Combinatorial [0, 3]

The “select” keyword works the same way whether you specify a pdb, mol2, sdf, or csv file containing multiple structures. select has 3 modes:

  1. select: all includes all the molecules in the given file.
  2. select: <Integer> picks the molecule in the file with index <Integer>.
  3. select: !Combinatorial <List of Ints> picks specific indices in the file. See Combinatorial options for more information.

Indexing starts at 0.

OPTIONAL with default value of all

Valid Options: [all]/<Integer>/<Combinatorial List of ints>
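
The three modes can be sketched side by side as follows. The molecule names are hypothetical; the file and charge settings are reused from the example above.

```yaml
molecules:
  every-inhibitor:           # hypothetical name
    filepath: clinical-kinase-inhibitors.csv
    antechamber:
      charge_method: bcc
    select: all              # mode 1 (the default): every molecule in the file
  first-inhibitor:           # hypothetical name
    filepath: clinical-kinase-inhibitors.csv
    antechamber:
      charge_method: bcc
    select: 0                # mode 2: a single molecule; indexing starts at 0
  chosen-inhibitors:         # hypothetical name
    filepath: clinical-kinase-inhibitors.csv
    antechamber:
      charge_method: bcc
    select: !Combinatorial [0, 3]   # mode 3: the molecules at indices 0 and 3
```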


Assigning Missing Parameters

antechamber

molecules:
  {UserDefinedMolecule}:
    filepath: benzene.mol2
    antechamber:
      charge_method: bcc

Pass the molecule through AmberTools ANTECHAMBER to assign missing parameters such as torsions and angle terms.

charge_method is a required sub-directive which controls how missing charges are assigned to the molecule. It takes either a charge method known to ANTECHAMBER, or null to skip assigning charges. The latter is helpful when you already have the charges but are missing other parameters.

OPTIONAL

PARTIALLY EXCLUSIVE If you have access to the OpenEye Toolkits and want to use them to assign partial charges to the atoms through the openeye command, then you should set charge_method to null. ANTECHAMBER can still assign the other missing parameters, such as torsions and angles.

OPTIONALLY SUPERSEDED by leap or the leap argument in systems. If the parameter files you feed into either leap argument have the charges and molecular parameters already included (such as standard protein residues in many force fields), then there is no need to invoke this command. If the force fields you give to the leap commands are missing parameters though, you should call this.

openeye

molecules:
  {UserDefinedMolecule}:
    filepath: benzene.mol2
    openeye:
      quacpac: am1-bcc

Use the OpenEye Toolkits, if installed, to determine the molecular charges. Only the option shown above is currently permitted, and it must be specified exactly as shown.

OPTIONAL

PARTIALLY EXCLUSIVE If you want to use antechamber to assign partial charges, do not use this command. However, if you want to use antechamber only to get other missing parameters such as torsions and angles, use this command and set charge_method to null in antechamber.

OPTIONALLY SUPERSEDED by leap or the leap argument in systems. If the parameter files you feed into either leap argument have the charges and molecular parameters already included (such as standard protein residues in many force fields), then there is no need to invoke this command. If the force fields you give to the leap commands are missing partial charges though, you should call this.
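
The combination described above, with OpenEye assigning the partial charges and ANTECHAMBER supplying only the remaining parameters, can be sketched like this (the molecule name is hypothetical):

```yaml
molecules:
  benzene-oe-charges:        # hypothetical molecule name
    filepath: benzene.mol2
    openeye:
      quacpac: am1-bcc       # partial charges come from the OpenEye Toolkits
    antechamber:
      charge_method: null    # skip charges; still assign torsions, angles, etc.
```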


Assigning Extra Information

leap

molecules:
  {UserDefinedMolecule}:
    leap:
      parameters: [mymol.frcmod, mymol.off]

Load molecule-specific force field parameters into the molecule. These can be created from any source so long as leap can parse them. It is possible to assign partial charges with the files read in this way, which would supersede the options of antechamber and openeye.

This command has only one mandatory subargument, parameters, which accepts either a single file as a string or a comma-separated list of files enclosed by [ ]. Filepaths are relative to either the AmberTools default paths or the folder the YAML script is in.

Note: Proteins do not necessarily need this command if the force fields given to the leap argument in systems will fully describe them.

OPTIONAL
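
Both accepted forms of parameters are sketched below. The molecule names and mymol.mol2 are hypothetical; the parameter file names are taken from the example above.

```yaml
molecules:
  one-parameter-file:        # hypothetical name
    filepath: mymol.mol2     # hypothetical structure file
    leap:
      parameters: mymol.frcmod               # a single file given as a string
  several-parameter-files:   # hypothetical name
    filepath: mymol.mol2
    leap:
      parameters: [mymol.frcmod, mymol.off]  # a list of files in [ ]
```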

epik

molecules:
  {UserDefinedMolecule}:
    epik:
      select: 0
      ph: 7.6
      ph_tolerance: 3.0
      tautomerize: no

Run Schrödinger’s tool Epik to select the most likely protonation state for the molecule in solution. The parameters in this call directly mirror those of the function that invokes Epik in OpenMolTools. Every parameter in this list, with the exception of select, is optional.

Note that with ph_tolerance set to 3.0, as it is here, the pH range searched will be pH ± 3.0, which is a 6 log unit range. Enumerating this range may take some time, although likely less than the simulation overall. If this takes too long, consider reducing ph_tolerance.

OPTIONAL
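
As a sketch of a narrower search, the fragment below keeps the mandatory select key and reduces ph_tolerance to 1.0, so the searched range becomes pH 6.6 to 8.6. The molecule name and ligand.mol2 are hypothetical, and the antechamber block is only there to supply charges.

```yaml
molecules:
  my-ligand:                 # hypothetical molecule name
    filepath: ligand.mol2    # hypothetical structure file
    epik:
      select: 0              # the only mandatory epik parameter
      ph: 7.6
      ph_tolerance: 1.0      # search pH 7.6 +/- 1.0 instead of +/- 3.0
    antechamber:
      charge_method: bcc
```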

regions

molecules:
  {UserDefinedMolecule}:
    regions:
      {UserDefinedRegion}: region_string
      ...

Define regions within the molecule. Regions can be used by upcoming features, such as defining restraint regions in more general ways, or to mark specific atom subsets you want to track through the yank.yank.Topography object. That object is stored as part of the simulation’s metadata and is accessible through yank.repex.Reporter.

Any number of user-defined regions can be specified for each molecule, so long as each region name is unique among all molecules that ultimately end up in the same system. For example, if you have 2 ligands you want to bind to a receptor in a combinatorial setup, both ligands can have a region named “my_region”, since they will never be in the same system together. However, the receptor cannot also have a region named “my_region”, because it would be ambiguous whether the name refers to the ligand region or the receptor region.

The regions apply only to the molecule whose regions section they are defined under, so even if the atom indices change in the yank.yank.Topography, the atomic indices defined in the region entry will be converted.

The region definition supports multiple selection formats:

  • DSL String: An MDTraj DSL string which identifies a region.
  • Future Ability SMARTS String: A molecular selection format similar to regular expressions for strings, but for molecules instead. This feature is not yet implemented, but is planned. The regions framework is the precursor to this feature. See Daylight’s website for more information on SMARTS.
  • List of Ints: Select atoms by integer index. This applies only to the final system, so the numbers will probably not align with the atom numbers from the input files.
  • Single Int: Same as the list of ints, but with a single entry, subject to the same rules.

OPTIONAL
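
The currently supported selection formats might be written as in the sketch below. All molecule names, file names, region names, and atom indices are hypothetical, and writing the list of ints as a YAML list is an assumption. The region names are kept unique across the two molecules because both could end up in the same system.

```yaml
molecules:
  my-receptor:                     # hypothetical molecule name
    filepath: receptor.pdb         # hypothetical structure file
    regions:
      alpha_carbons: name CA       # MDTraj DSL string
  my-ligand:                       # hypothetical molecule name
    filepath: ligand.mol2          # hypothetical structure file
    antechamber:
      charge_method: bcc
    regions:
      ring_atoms: [0, 1, 2, 3, 4, 5]   # list of ints (illustrative indices)
      anchor_atom: 0                   # single int
```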