Utils

Utilities for the YANK modules

Provides many helper functions and common operations used by the various YANK suites

yank.utils.is_terminal_verbose()[source]

Check whether the logging on the terminal is configured to be verbose.

This is useful in case one wants to occasionally print something that is not really relevant to yank’s log (e.g. verbose output of external libraries, citations, etc.).

Returns:

is_verbose : bool

True if the terminal is configured to be verbose, False otherwise.

yank.utils.config_root_logger(verbose, log_file_path=None)[source]

Set up the root logger’s configuration.

Log messages are printed in the terminal and, if log_file_path is not None, also saved to that file. Note that logging uses sys.stdout to print logging.INFO messages and stderr for the others. The root logger’s configuration is inherited by the loggers created by logging.getLogger(name).

Different formats are used to display messages on the terminal and in the log file. For example, in the log file every entry has a timestamp, which does not appear in the terminal. Moreover, the log file always shows the module that generates the message, while in the terminal this happens only for messages of level WARNING and higher.

Parameters:

verbose : bool

Control the verbosity of the messages printed in the terminal. The logger displays messages of level logging.INFO and higher when verbose=False. Otherwise those of level logging.DEBUG and higher are printed.

log_file_path : str, optional, default = None

If not None, this is the path where all the logger’s messages of level logging.DEBUG or higher are saved.
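
The routing described above can be approximated with the standard logging module alone. The following is a minimal sketch, not YANK's implementation: sketch_config_logger, TerminalFormatter, and the stdout/stderr stream arguments are hypothetical names introduced for illustration, the exact message formats are assumptions, and the logger name stands in for the module name.

```python
import io
import logging
import sys


class TerminalFormatter(logging.Formatter):
    """Show the logger name only for WARNING and above (hypothetical helper)."""

    def format(self, record):
        message = record.getMessage()
        if record.levelno >= logging.WARNING:
            return '{} - {} - {}'.format(record.levelname, record.name, message)
        return message


def sketch_config_logger(logger, verbose, log_file_path=None, stdout=None, stderr=None):
    """Configure `logger` along the lines described above (illustration only)."""
    terminal_level = logging.DEBUG if verbose else logging.INFO
    logger.setLevel(logging.DEBUG)  # let each handler decide what to keep
    logger.propagate = False        # keep the demo isolated from the root logger
    for handler in list(logger.handlers):
        logger.removeHandler(handler)

    # INFO records go to stdout ...
    out_handler = logging.StreamHandler(sys.stdout if stdout is None else stdout)
    out_handler.addFilter(lambda record: record.levelno == logging.INFO)
    # ... and records of every other level go to stderr.
    err_handler = logging.StreamHandler(sys.stderr if stderr is None else stderr)
    err_handler.addFilter(lambda record: record.levelno != logging.INFO)
    for handler in (out_handler, err_handler):
        handler.setLevel(terminal_level)
        handler.setFormatter(TerminalFormatter())
        logger.addHandler(handler)

    # The file always receives DEBUG and higher, with a timestamp.
    if log_file_path is not None:
        file_handler = logging.FileHandler(log_file_path)
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(levelname)s - %(name)s - %(message)s'))
        logger.addHandler(file_handler)
    return logger


# Usage: capture both streams in memory to see where each record ends up.
out, err = io.StringIO(), io.StringIO()
demo = sketch_config_logger(logging.getLogger('sketch_demo'), verbose=False,
                            stdout=out, stderr=err)
demo.debug('details')    # dropped on the terminal because verbose=False
demo.info('starting')    # written to the stdout stream
demo.warning('careful')  # written to the stderr stream, with the logger name
```

With verbose=True the DEBUG record would also reach the stderr stream, since only INFO is routed to stdout in this sketch.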

class yank.utils.CombinatorialLeaf[source]

List type that can be expanded combinatorially in CombinatorialTree.

append(object) → None -- append object to end
clear() → None -- remove all items from L
copy() → list -- a shallow copy of L
count(value) → integer -- return number of occurrences of value
extend(iterable) → None -- extend list by appending elements from the iterable
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

insert()

L.insert(index, object) – insert object before index

pop([index]) → item -- remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value) → None -- remove first occurrence of value.

Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

sort(key=None, reverse=False) → None -- stable sort *IN PLACE*
class yank.utils.CombinatorialTree(dictionary)[source]

A tree that can be expanded in a combinatorial fashion.

Each tree node with its subnodes is represented as a nested dictionary. Nodes can be accessed through their specific “path” (i.e. the list of the nested dictionary keys that lead to the node value).

Leaf node values that are list-like objects can be expanded combinatorially: iterating over the tree yields every tree obtained by replacing each such list with exactly one of its values (see Examples).

Examples

Set an arbitrary nested path

>>> tree = CombinatorialTree({'a': {'b': 2}})
>>> path = ('a', 'b')
>>> tree[path]
2
>>> tree[path] = 3
>>> tree[path]
3

Paths can be accessed also with the usual dict syntax

>>> tree['a']['b']
3

Deleting a node leaves an empty dict!

>>> del tree[path]
>>> print(tree)
{'a': {}}

Expand all possible combinations of a tree. The iterator returns a dict, not another CombinatorialTree object.

>>> import pprint  # pprint sorts the dictionary by key before printing
>>> tree = CombinatorialTree({'a': 1, 'b': CombinatorialLeaf([1, 2]),
...                           'c': {'d': CombinatorialLeaf([3, 4])}})
>>> for t in tree:
...     pprint.pprint(t)
{'a': 1, 'b': 1, 'c': {'d': 3}}
{'a': 1, 'b': 2, 'c': {'d': 3}}
{'a': 1, 'b': 1, 'c': {'d': 4}}
{'a': 1, 'b': 2, 'c': {'d': 4}}

Expand all possible combinations and assign unique names

>>> for name, t in tree.named_combinations(separator='_', max_name_length=5):
...     print(name)
3_1
3_2
4_1
4_2
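
The idea behind the combinatorial expansion can be sketched without YANK using itertools.product. This is an illustration under stated assumptions, not the class's actual code: Leaf, find_leaves, and expand are hypothetical names, and the order in which combinations are produced is whatever itertools.product gives, which need not match the listing above.

```python
import copy
import itertools


class Leaf(list):
    """Hypothetical marker type playing the role of CombinatorialLeaf."""


def find_leaves(node, path=()):
    """Yield (path, leaf) for every Leaf found in a nested dict."""
    for key, value in node.items():
        if isinstance(value, Leaf):
            yield path + (key,), value
        elif isinstance(value, dict):
            yield from find_leaves(value, path + (key,))


def expand(tree):
    """Yield one plain nested dict per combination of leaf values."""
    leaves = list(find_leaves(tree))
    paths = [path for path, _ in leaves]
    for values in itertools.product(*(leaf for _, leaf in leaves)):
        combination = copy.deepcopy(tree)
        for path, value in zip(paths, values):
            # Walk down to the parent of the leaf, then replace the leaf.
            node = combination
            for key in path[:-1]:
                node = node[key]
            node[path[-1]] = value
        yield combination


tree = {'a': 1, 'b': Leaf([1, 2]), 'c': {'d': Leaf([3, 4])}}
combinations = list(expand(tree))  # four plain dicts, one per (b, d) pair
```
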
named_combinations(separator, max_name_length)[source]

Generator to iterate over all possible combinations of trees and assign them unique names.

The names are generated by gluing together the first letters of the values of the combinatorial leaves only, separated by the given separator. If the values contain special characters, they are ignored. Only letters, numbers and the separator are found in the generated names. Values representing paths to existing files contribute to the name only with their file name without extension.

The iterator yields tuples of (name, dict), not other CombinatorialTree objects. If there is only a single combination, an empty string is returned for the name.

Parameters:

separator : str

The string used to separate the words in the name.

max_name_length : int

The maximum length of the generated names, excluding disambiguation number.

Yields:

name : str

Unique name of the combination. Empty string returned if there is only one combination

combination : dict

Combination of leaves that was used to create the name.

expand_id_nodes(id_nodes_path, update_nodes_paths)[source]

Return a new CombinatorialTree with id-bearing nodes expanded and updated in the rest of the script.

Parameters:

id_nodes_path : tuple of str

The path to the parent node containing ids.

update_nodes_paths : list of tuple of str

A list of all the paths referring to the ids expanded. The string ‘*’ means every node.

Returns:

expanded_tree : CombinatorialTree

The tree with id nodes expanded.

Examples

>>> d = {'molecules':
...          {'mol1': {'mol_value': CombinatorialLeaf([1, 2])}},
...      'systems':
...          {'sys1': {'molecules': 'mol1'},
...           'sys2': {'prmtopfile': 'mysystem.prmtop'}}}
>>> update_nodes_paths = [('systems', '*', 'molecules')]
>>> t = CombinatorialTree(d).expand_id_nodes('molecules', update_nodes_paths)
>>> t['molecules'] == {'mol1_1': {'mol_value': 1}, 'mol1_2': {'mol_value': 2}}
True
>>> t['systems'] == {'sys1': {'molecules': CombinatorialLeaf(['mol1_2', 'mol1_1'])},
...                  'sys2': {'prmtopfile': 'mysystem.prmtop'}}
True
clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D's values
yank.utils.get_data_filename(relative_path)[source]

Get the full path to one of the reference files shipped for testing

In the source distribution, these files are in examples/*/, but on installation, they’re moved to somewhere in the user’s python site-packages directory.

Parameters:

relative_path : str

Name of the file to load, with respect to the yank egg folder which is typically located at something like ~/anaconda/lib/python3.6/site-packages/yank-*.egg/examples/

Returns:

fn : str

Resource Filename

yank.utils.find_phases_in_store_directory(store_directory)[source]

Build a list of phases in the store directory.

Parameters:

store_directory : str

The directory to examine for stored phase NetCDF data files.

Returns:

phases : dict of str

A dictionary phase_name -> file_path that maps each phase name to its NetCDF file path.

yank.utils.update_nested_dict(original, updated)[source]

Return an updated copy of a (possibly) nested dict of arbitrary depth.

Parameters:

original : dict

Original dict which we want to update, can contain nested dicts

updated : dict

Dictionary of updated values to place in original

Returns:

new : dict

Copy of original with values updated from updated
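
A recursive update of this kind can be sketched as follows. This is a minimal illustration, not YANK's code: sketch_update_nested is a hypothetical name, and merging only when both sides hold a dict is an assumption about the intended behaviour.

```python
import copy


def sketch_update_nested(original, updated):
    """Return a deep copy of `original` with values from `updated` applied."""
    new = copy.deepcopy(original)
    for key, value in updated.items():
        if isinstance(value, dict) and isinstance(new.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            new[key] = sketch_update_nested(new[key], value)
        else:
            # Otherwise the updated value simply replaces the old one.
            new[key] = copy.deepcopy(value)
    return new


original = {'a': {'b': 1, 'c': 2}, 'd': 3}
result = sketch_update_nested(original, {'a': {'b': 10}, 'e': 5})
```

Here result keeps 'c' and 'd' from original, takes the new value for 'b', and gains 'e', while original itself is left unmodified.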

yank.utils.merge_dict(dict1, dict2)[source]

Return the union of two dictionaries through Python version-agnostic code.

Since Python 3.5 there is a syntax to do this, {**dict1, **dict2}, but in Python 2 you need to go through update().

TODO: Refactor to no longer need this now that Python 2 is dropped

Parameters:

dict1 : dict

dict2 : dict

Returns:

merged_dict : dict

Union of dict1 and dict2
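
The two approaches mentioned above give the same result in plain Python. The snippet below only illustrates the language behaviour; that merge_dict gives dict2 precedence on duplicate keys is an assumption, not something this page states.

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Python 3.5+ unpacking syntax: on duplicate keys the right-hand dict wins.
merged_unpack = {**dict1, **dict2}

# Copy-then-update form, which also works on Python 2.
merged_update = dict1.copy()
merged_update.update(dict2)
```

Neither form modifies dict1 or dict2.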

yank.utils.underscore_to_camelcase(underscore_str)[source]

Convert the given string from underscore_case to camelCase.

Underscores at the beginning or at the end of the string are left untouched. All underscores in the middle of the string are removed.

Parameters:

underscore_str : str

String in underscore_case to convert to camelCase style.

Returns:

camelcase_str : str

String in camelCase style.

Examples

>>> underscore_to_camelcase('__my___variable_')
'__myVariable_'
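
One way to obtain this behaviour is sketched below. It is not YANK's implementation: sketch_underscore_to_camelcase is a hypothetical name, and the handling of strings made only of underscores is an assumption.

```python
import re


def sketch_underscore_to_camelcase(underscore_str):
    """Convert underscore_case to camelCase, keeping outer underscores."""
    # Separate leading and trailing underscores from the core of the name.
    prefix, core, suffix = re.match(r'^(_*)(.*?)(_*)$', underscore_str).groups()
    words = [word for word in core.split('_') if word]
    if not words:
        return underscore_str  # nothing but underscores: leave unchanged
    # Keep the first word as is and capitalize the first letter of the rest.
    camel = words[0] + ''.join(word[0].upper() + word[1:] for word in words[1:])
    return prefix + camel + suffix


converted = sketch_underscore_to_camelcase('__my___variable_')
```
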
yank.utils.camelcase_to_underscore(camelcase_str)[source]

Convert the given string from camelCase to underscore_case.

Underscores at the beginning and end of the string are preserved. All capital letters are cast to lower case.

Parameters:

camelcase_str : str

String in camelCase to convert to underscore style.

Returns:

underscore_str : str

String in underscore style.

Examples

>>> camelcase_to_underscore('myVariable')
'my_variable'
>>> camelcase_to_underscore('__my_Variable_')
'__my__variable_'
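
A compact way to reproduce both outputs above is a single regular-expression substitution. This is a sketch under the assumption that every capital letter is simply prefixed with an underscore and lowered; sketch_camelcase_to_underscore is a hypothetical name.

```python
import re


def sketch_camelcase_to_underscore(camelcase_str):
    """Prefix each capital letter with an underscore, then lower-case it all."""
    return re.sub(r'([A-Z])', r'_\1', camelcase_str).lower()


simple = sketch_camelcase_to_underscore('myVariable')
mixed = sketch_camelcase_to_underscore('__my_Variable_')
```

Because existing underscores are left alone, a capital letter that already follows an underscore ends up preceded by two of them, as in the second example.
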
yank.utils.quantity_from_string(expression)[source]

Create a Quantity object from a string expression

All the functions in the standard module math are available together with most of the methods inside the simtk.unit module.

Parameters:

expression : str

The mathematical expression to rebuild a Quantity as a string.

Returns:

quantity

The result of the evaluated expression.

Examples

>>> expr = '4 * kilojoules / mole'
>>> quantity_from_string(expr)
Quantity(value=4.000000000000002, unit=kilojoule/mole)
yank.utils.process_unit_bearing_str(quantity_str, compatible_units)[source]

Process a unit-bearing string to produce a Quantity.

Parameters:

quantity_str : str

A string containing a value with a unit of measure.

compatible_units : simtk.unit.Unit

The result will be checked for compatibility with specified units, and an exception raised if not compatible.

Note: The output is not converted to compatible_units; they are only used as a reference unit to validate the input against.

Returns:

quantity : simtk.unit.Quantity

The specified string, returned as a Quantity.

Raises:

TypeError

If quantity_str does not contain units.

ValueError

If the units attached to quantity_str are incompatible with compatible_units

Examples

>>> process_unit_bearing_str('1.0*micrometers', unit.nanometers)
Quantity(value=1.0, unit=micrometer)
yank.utils.to_unit_validator(compatible_units)[source]

Function generator to test unit bearing strings with Cerberus.

yank.utils.generate_signature_schema(func, update_keys=None, exclude_keys=frozenset())[source]

Generate a dictionary to test function signatures with Cerberus’ Schema.

Parameters:

func : function

The function used to build the schema.

update_keys : dict

Keys in here have priority over automatic generation. It can be used to make an argument mandatory, or to use a specific validator.

exclude_keys : list-like

Keys in here are ignored and not included in the schema.

Returns:

func_schema : dict

The dictionary to be used as Cerberus Validator schema. Contains all keyword variables in the function signature as optional arguments with the default type as validator. Unit bearing strings are converted. Arguments with default None are always accepted. Camel case parameters in the function are converted to underscore style.

Examples

>>> from cerberus import Validator
>>> def f(a, b, camelCase=True, none=None, quantity=3.0*unit.angstroms):
...     pass
>>> f_dict = generate_signature_schema(f, exclude_keys=['quantity'])
>>> print(isinstance(f_dict, dict))
True
>>> f_validator = Validator(generate_signature_schema(f))
>>> f_validator.validated({'quantity': '1.0*nanometer'})
{'quantity': Quantity(value=1.0, unit=nanometer)}
yank.utils.get_keyword_args(function)[source]

Inspect function signature and return keyword args with their default values.

Parameters:

function : function

The function to interrogate.

Returns:

kwargs : dict

A dictionary {'keyword argument': 'default value'}. The arguments of the function that do not have a default value will not be included.
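
The same information can be read from a signature with the standard inspect module. This is a minimal sketch rather than YANK's implementation, and sketch_get_keyword_args is a hypothetical name.

```python
import inspect


def sketch_get_keyword_args(function):
    """Return {argument name: default} for arguments that have a default."""
    parameters = inspect.signature(function).parameters
    return {name: parameter.default
            for name, parameter in parameters.items()
            if parameter.default is not inspect.Parameter.empty}


def example(a, b, camelCase=True, none=None):
    pass


defaults = sketch_get_keyword_args(example)
```

The positional arguments a and b have no default and are therefore left out of defaults.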

yank.utils.validate_parameters(parameters, template_parameters, check_unknown=False, process_units_str=False, float_to_int=False, ignore_none=True, special_conversions=None)[source]

Utility function for parameters and options validation.

Use the given template to filter the given parameters and infer their expected types. Perform various automatic conversions when requested. If the template is None, the parameter to validate is not checked for type compatibility.

Parameters:

parameters : dict

The parameters to validate.

template_parameters : dict

The template used to filter the parameters and infer the types.

check_unknown : bool

If True, an exception is raised when parameters contain a key that is not contained in template_parameters.

process_units_str: bool

If True, the function will attempt to convert the strings whose template type is simtk.unit.Quantity.

float_to_int : bool

If True, floats in parameters whose template type is int are truncated.

ignore_none : bool

If True, the function does not process parameters whose value is None.

special_conversions : dict

Contains a converter function with signature convert(arg) that must be applied to the parameters specified by the dictionary key.

Returns:

validate_par : dict

The converted parameters that are contained both in parameters and template_parameters.

Raises:

TypeError

If check_unknown is True and there are parameters not in template_parameters.

ValueError

If a parameter has an incompatible type with its template parameter.

Examples

Create the template parameters

>>> template_pars = dict()
>>> template_pars['bool'] = True
>>> template_pars['int'] = 2
>>> template_pars['unspecified'] = None  # this won't be checked for type compatibility
>>> template_pars['to_be_converted'] = [1, 2, 3]
>>> template_pars['length'] = 2.0 * unit.nanometers

Now the parameters to validate

>>> input_pars = dict()
>>> input_pars['bool'] = None  # this will be skipped with ignore_none=True
>>> input_pars['int'] = 4.3  # this will be truncated to 4 with float_to_int=True
>>> input_pars['unspecified'] = 'input'  # this can be of any type since the template is None
>>> input_pars['to_be_converted'] = {'key': 3}
>>> input_pars['length'] = '1.0*nanometers'
>>> input_pars['unknown'] = 'test'  # this will be silently filtered if check_unknown=False

Validate the parameters

>>> valid = validate_parameters(input_pars, template_pars, process_units_str=True,
...                             float_to_int=True, special_conversions={'to_be_converted': list})
>>> import pprint
>>> pprint.pprint(valid)
{'bool': None,
 'int': 4,
 'length': Quantity(value=1.0, unit=nanometer),
 'to_be_converted': ['key'],
 'unspecified': 'input'}
class yank.utils.Mol2File(file_path)[source]

Wrapper of ParmEd mol2 parser for easy manipulation of mol2 files.

This is not efficient as every operation accesses the file. The purpose of this class is simply to provide a shortcut to read and write the mol2 file with a one-liner. If you need to do multiple operations before saving the file, use ParmEd directly.

This works only for single-structure mol2 files.

Parameters:

file_path : str

Path to the mol2 file.

Attributes

resname The residue name of the first molecule found in the mol2 file.
resnames Iterate over the names of all the molecules in the file (read-only).
net_charge Net charge of the file as a float (read-only).
resname

The residue name of the first molecule found in the mol2 file.

This assumes that each molecule in the mol2 file has a single residue name.

resnames

Iterate over the names of all the molecules in the file (read-only).

This assumes that each molecule in the mol2 file has a single residue name.

net_charge

Net charge of the file as a float (read-only).

round_charge()[source]

Round the net charge to the nearest integer to 6-digit precision.

Raises:

RuntimeError

If the total net charge differs from the nearest integer by more than 0.05.

yank.utils.is_openeye_installed(oetools=('oechem', 'oequacpac', 'oeiupac', 'oeomega'))[source]

Check whether the given OpenEye tools are installed and licensed.

If the OpenEye toolkit is not installed, returns False.

Parameters:

oetools : str or iterable of strings, Optional, Default: (‘oechem’, ‘oequacpac’, ‘oeiupac’, ‘oeomega’)

Set of tools to check by their string name. Defaults to the complete set that YANK could use, depending on feature requested.

Only checks the subset of tools if passed. Also accepts a single tool to check as a string instead of an iterable of length 1.

Returns:

all_installed : bool

True if all tools in oetools are installed and licensed, False otherwise

yank.utils.load_oe_molecules(file_path, molecule_idx=None)[source]

Read one or more molecules from a file.

Requires OpenEye Toolkit. Several formats are supported (including mol2, sdf and pdb).

Parameters:

file_path : str

Complete path to the file on disk.

molecule_idx : None or int, optional, default: None

Index of the molecule on the file. If None, all of them are returned.

Returns:

molecule : openeye.oechem.OEMol or list of openeye.oechem.OEMol

The molecules stored in the file. If molecule_idx is specified only one molecule is returned, otherwise a list (even if the file contains only 1 molecule).

yank.utils.write_oe_molecule(oe_mol, file_path, mol2_resname=None)[source]

Write all conformations to a file, automatically detecting the format.

Requires OpenEye Toolkit

Parameters:

oe_mol : OpenEye Molecule

Molecule to write to file

file_path : str

Complete path to file with filename and extension

mol2_resname : None or str, Optional, Default: None

Name to replace the residue name if the file is a .mol2 file. Requires file_path to match *mol2.

yank.utils.get_oe_mol_positions(molecule, conformer_idx=0)[source]

Get the molecule positions from an OpenEye Molecule

Requires OpenEye Toolkit

Parameters:

molecule : OpenEye Molecule

Molecule to extract coordinates from

conformer_idx : int, Optional, Default: 0

Index of the conformer on the file, leave as 0 to not use

class yank.utils.TLeap[source]

Programmatic interface to write and run AmberTools’ tLEaP scripts.

To avoid problems with special characters in file paths, the class runs the tleap script in a temporary folder with hardcoded names for files and then copies the output files to their respective folders.

Attributes

script Complete and return the finalized script string
script

Complete and return the finalized script string

Adds a quit command to the end of the script.

add_commands(*args)[source]

Append commands to the script

Parameters:

args : iterable of strings

Individual commands to add to the script written in full as strings. Newline characters are added after each command

load_parameters(*args)[source]

Load the LEaP parameters into the working TLEaP script if not already loaded

This adds to the script

Uses loadAmberParams for frcmod.* files

Uses loadOff for *.off and *.lib files

Uses source for other files.

Parameters:

args : iterable of strings

File names for each type of leap file that can be loaded. Method to load them is automatically determined from file extension or base name
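
The dispatch rules listed above can be sketched as a small standalone function. This is an illustration, not the TLeap class itself: leap_load_command and build_load_lines are hypothetical names, and matching on a base name that starts with "frcmod." is an assumption about how frcmod.* files are recognised.

```python
import os


def leap_load_command(file_path):
    """Pick the LEaP command for a parameter file from its name."""
    base_name = os.path.basename(file_path)
    extension = os.path.splitext(base_name)[1]
    if base_name.startswith('frcmod.'):
        return 'loadAmberParams ' + file_path
    if extension in ('.off', '.lib'):
        return 'loadOff ' + file_path
    return 'source ' + file_path


def build_load_lines(*file_paths):
    """Return one command per file, skipping files that were already loaded."""
    loaded = set()
    lines = []
    for file_path in file_paths:
        if file_path in loaded:
            continue
        loaded.add(file_path)
        lines.append(leap_load_command(file_path))
    return lines


lines = build_load_lines('leaprc.gaff', 'frcmod.ionsjc_tip3p',
                         'ligand.lib', 'leaprc.gaff')
```

The repeated 'leaprc.gaff' produces a single source line, mirroring the "if not already loaded" behaviour described above.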

load_unit(unit_name, file_path)[source]

Load a unit into LEaP; this is typically a molecule or small complex.

This adds to the script

Accepts *.mol2 or *.pdb files

Parameters:

unit_name : str

Name of the unit as it should be represented in LEaP

file_path : str

Full file path with extension of the file to read into LEaP as a new unit

combine(unit_name, *args)[source]

Combine units in LEaP

This adds to the script

Parameters:

unit_name : str

Name of LEaP unit to assign the combination to

args : iterable of strings

Names of the LEaP units to combine into a single unit called unit_name

add_ions(unit_name, ion, num_ions=0, replace_solvent=False)[source]

Add ions to a unit in LEaP

This adds to the script

Parameters:

unit_name : str

Name of the existing LEaP unit into which ions will be added

ion : str

LEaP recognized name of ion to add

num_ions : int, optional

Number of ions of type ion to add to unit_name. If 0, the unit is neutralized (default is 0).

replace_solvent : bool, optional

If True, ions will replace solvent molecules rather than being added.

solvate(unit_name, solvent_model, clearance)[source]

Solvate a unit in LEaP isometrically

This adds to the script

Parameters:

unit_name : str

Name of the existing LEaP unit which will be solvated

solvent_model : str

LEaP recognized name of the solvent model to use, e.g. “TIP3PBOX”

clearance : float

Add solvent up to clearance distance away from the unit_name (radial)

save_unit(unit_name, output_path)[source]

Write a LEaP unit to file.

Accepts either *.prmtop, *.inpcrd, or *.pdb files

This adds to the script

Parameters:

unit_name : str

Name of the unit to save

output_path : str

Full file path with extension to save. Outputs with multiple files (e.g. Amber Parameters) have their names derived from this instead

transform(unit_name, transformation)[source]

Transformation is an array-like representing the affine transformation matrix.

new_section(comment)[source]

Adds a comment line to the script

export_script(file_path)[source]

Write script to file

Parameters:

file_path : str

Full file path with extension of the script to save

run()[source]

Run script and return warning messages in leap log file.

exception yank.utils.SimulationNaNError[source]

Error when a simulation goes to NaN

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.