Utils¶

Utilities for the YANK modules

Provides many helper functions and common operations used by the various YANK suites

yank.utils.is_terminal_verbose()[source]¶

Check whether the logging on the terminal is configured to be verbose.

This is useful for occasionally printing something that is not really relevant to YANK’s log (e.g. verbose output from external libraries, citations, etc.).

Returns:	is_verbose : bool True if the terminal is configured to be verbose, False otherwise.

yank.utils.config_root_logger(verbose, log_file_path=None)[source]¶

Set up the root logger’s configuration.

Log messages are printed in the terminal and, if log_file_path is not None, also saved in the file it specifies. Note that logging uses sys.stdout to print logging.INFO messages, and stderr for the others. The root logger’s configuration is inherited by the loggers created by logging.getLogger(name).

Different formats are used to display messages on the terminal and in the log file. For example, in the log file every entry has a timestamp, which does not appear in the terminal. Moreover, the log file always shows the module that generated the message, while in the terminal this happens only for messages of level WARNING and higher.

Parameters:

verbose : bool: Control the verbosity of the messages printed in the terminal. The logger displays messages of level logging.INFO and higher when verbose=False. Otherwise those of level logging.DEBUG and higher are printed.
log_file_path : str, optional, default = None: If not None, this is the path where all the logger’s messages of level logging.DEBUG or higher are saved.

class yank.utils.CombinatorialLeaf[source]¶

List type that can be expanded combinatorially in CombinatorialTree.

append(object) → None -- append object to end¶

clear() → None -- remove all items from L¶

copy() → list -- a shallow copy of L¶

count(value) → integer -- return number of occurrences of value¶

extend(iterable) → None -- extend list by appending elements from the iterable¶

index(value[, start[, stop]]) → integer -- return first index of value.¶: Raises ValueError if the value is not present.

insert()¶: L.insert(index, object) – insert object before index

pop([index]) → item -- remove and return item at index (default last).¶: Raises IndexError if list is empty or index is out of range.

remove(value) → None -- remove first occurrence of value.¶: Raises ValueError if the value is not present.

reverse()¶: L.reverse() – reverse IN PLACE

sort(key=None, reverse=False) → None -- stable sort *IN PLACE*¶

class yank.utils.CombinatorialTree(dictionary)[source]¶

A tree that can be expanded in a combinatorial fashion.

Each tree node with its subnodes is represented as a nested dictionary. Nodes can be accessed through their specific “path” (i.e. the list of the nested dictionary keys that lead to the node value).

Leaf-node values that are list-like objects can be expanded combinatorially: it is possible to iterate over every tree obtained by replacing each such list with exactly one of its values (see Examples).

Examples

Set an arbitrary nested path

>>> tree = CombinatorialTree({'a': {'b': 2}})
>>> path = ('a', 'b')
>>> tree[path]
2
>>> tree[path] = 3
>>> tree[path]
3

Paths can be accessed also with the usual dict syntax

>>> tree['a']['b']
3

Deletion of a node leaves an empty dict!

>>> del tree[path]
>>> print(tree)
{'a': {}}

Expand all possible combinations of a tree. The iterator returns a dict, not another CombinatorialTree object.

>>> import pprint  # pprint sorts the dictionary by key before printing
>>> tree = CombinatorialTree({'a': 1, 'b': CombinatorialLeaf([1, 2]),
...                           'c': {'d': CombinatorialLeaf([3, 4])}})
>>> for t in tree:
...     pprint.pprint(t)
{'a': 1, 'b': 1, 'c': {'d': 3}}
{'a': 1, 'b': 2, 'c': {'d': 3}}
{'a': 1, 'b': 1, 'c': {'d': 4}}
{'a': 1, 'b': 2, 'c': {'d': 4}}

Expand all possible combinations and assign unique names

>>> for name, t in tree.named_combinations(separator='_', max_name_length=5):
...     print(name)
3_1
3_2
4_1
4_2
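The combinatorial expansion itself can be illustrated without YANK. The sketch below is an independent re-implementation of the idea using itertools.product; the names Leaf, find_leaves, set_path and expand are hypothetical, and the order in which combinations are produced may differ from the order shown in the examples above.

```python
import itertools


class Leaf(list):
    """Hypothetical marker type: a list whose items are alternatives."""


def find_leaves(tree, path=()):
    """Yield (path, leaf) for every Leaf found in a nested dict."""
    for key, value in tree.items():
        if isinstance(value, Leaf):
            yield path + (key,), value
        elif isinstance(value, dict):
            yield from find_leaves(value, path + (key,))


def set_path(tree, path, value):
    """Return a copy of tree with the node at path replaced by value."""
    copy = dict(tree)
    if len(path) == 1:
        copy[path[0]] = value
    else:
        copy[path[0]] = set_path(tree[path[0]], path[1:], value)
    return copy


def expand(tree):
    """Yield one plain dict per combination of leaf values."""
    leaves = list(find_leaves(tree))
    paths = [path for path, _ in leaves]
    # The Cartesian product picks exactly one value from each leaf.
    for values in itertools.product(*(leaf for _, leaf in leaves)):
        combination = tree
        for path, value in zip(paths, values):
            combination = set_path(combination, path, value)
        yield combination


tree = {'a': 1, 'b': Leaf([1, 2]), 'c': {'d': Leaf([3, 4])}}
combinations = list(expand(tree))
```

Each combination is built from copies of the dictionaries along the modified path, so the original tree still holds its Leaf objects after the expansion.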

named_combinations(separator, max_name_length)[source]¶

Generator to iterate over all possible combinations of trees and assign them unique names.

The names are generated by gluing together the first letters of the values of the combinatorial leaves only, separated by the given separator. Special characters in the values are ignored, so the generated names contain only letters, numbers and the separator. Values representing paths to existing files contribute to the name only with their file name, without extension.

The iterator yields tuples of (name, dict), not other CombinatorialTree objects. If there is only a single combination, an empty string is returned for the name.

Parameters:

separator : str: The string used to separate the words in the name.
max_name_length : int: The maximum length of the generated names, excluding the disambiguation number.

Yields:

name : str: Unique name of the combination. An empty string is returned if there is only one combination.
combination : dict: Combination of leaves that was used to create the name.

expand_id_nodes(id_nodes_path, update_nodes_paths)[source]¶

Return a new CombinatorialTree with id-bearing nodes expanded and updated in the rest of the script.

Parameters:	id_nodes_path : tuple of str The path to the parent node containing ids. update_nodes_paths : list of tuple of str A list of all the paths referring to the ids expanded. The string ‘*’ means every node.
Returns:	expanded_tree : CombinatorialTree The tree with id nodes expanded.

Examples

>>> d = {'molecules':
...          {'mol1': {'mol_value': CombinatorialLeaf([1, 2])}},
...      'systems':
...          {'sys1': {'molecules': 'mol1'},
...           'sys2': {'prmtopfile': 'mysystem.prmtop'}}}
>>> update_nodes_paths = [('systems', '*', 'molecules')]
>>> t = CombinatorialTree(d).expand_id_nodes('molecules', update_nodes_paths)
>>> t['molecules'] == {'mol1_1': {'mol_value': 1}, 'mol1_2': {'mol_value': 2}}
True
>>> t['systems'] == {'sys1': {'molecules': CombinatorialLeaf(['mol1_2', 'mol1_1'])},
...                  'sys2': {'prmtopfile': 'mysystem.prmtop'}}
True

clear() → None. Remove all items from D.¶

get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶

items() → a set-like object providing a view on D's items¶

keys() → a set-like object providing a view on D's keys¶

pop(k[, d]) → v, remove specified key and return the corresponding value.¶: If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair¶: as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶

update([E, ]**F) → None. Update D from mapping/iterable E and F.¶: If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D's values¶

yank.utils.get_data_filename(relative_path)[source]¶

Get the full path to one of the reference files shipped for testing

In the source distribution, these files are in examples/*/, but on installation, they’re moved to somewhere in the user’s python site-packages directory.

Parameters:	relative_path : str Name of the file to load, with respect to the yank egg folder which is typically located at something like `~/anaconda/lib/python3.6/site-packages/yank-*.egg/examples/`
Returns:	fn : str Resource Filename

yank.utils.find_phases_in_store_directory(store_directory)[source]¶

Build a list of phases in the store directory.

Parameters:	store_directory : str The directory to examine for stored phase NetCDF data files.
Returns:	phases : dict of str A dictionary phase_name -> file_path that maps each phase name to its NetCDF file path.

yank.utils.update_nested_dict(original, updated)[source]¶

Return an updated copy of a (possibly) nested dict of arbitrary depth.

Parameters:	original : dict Original dict which we want to update, can contain nested dicts updated : dict Dictionary of updated values to place in original
Returns:	new : dict Copy of original with values updated from updated
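A recursive update of this kind might look like the following. This is a minimal sketch under the assumption that nested dicts are merged key by key and that any other value is simply overwritten; update_nested_dict_sketch is a hypothetical name, not the YANK function.

```python
import copy


def update_nested_dict_sketch(original, updated):
    """Return a deep copy of original with the values from updated merged in."""
    new = copy.deepcopy(original)
    for key, value in updated.items():
        if isinstance(value, dict) and isinstance(new.get(key), dict):
            # Both sides are dicts: merge them instead of overwriting.
            new[key] = update_nested_dict_sketch(new[key], value)
        else:
            new[key] = copy.deepcopy(value)
    return new


original = {'a': {'b': 1, 'c': 2}, 'd': 3}
updated = {'a': {'b': 10}, 'e': 5}
merged = update_nested_dict_sketch(original, updated)
```

Unlike dict.update, which would replace the whole 'a' entry, this keeps the untouched nested key 'c' and leaves original unmodified.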

yank.utils.underscore_to_camelcase(underscore_str)[source]¶

Convert the given string from underscore_case to camelCase.

Underscores at the beginning or at the end of the string are preserved. All underscores in the middle of the string are removed.

Parameters:	underscore_str : str String in underscore_case to convert to camelCase style.
Returns:	camelcase_str : str String in camelCase style.

Examples

>>> underscore_to_camelcase('__my___variable_')
'__myVariable_'
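One way to obtain this behaviour is sketched below. It is an illustrative re-implementation (the name underscore_to_camelcase_sketch is hypothetical) that reproduces the example above, not necessarily the code used by YANK.

```python
def underscore_to_camelcase_sketch(underscore_str):
    """Convert underscore_case to camelCase, keeping the outer underscores."""
    core = underscore_str.strip('_')
    if not core:
        # Empty string or underscores only: nothing to convert.
        return underscore_str
    n_leading = len(underscore_str) - len(underscore_str.lstrip('_'))
    n_trailing = len(underscore_str) - len(underscore_str.rstrip('_'))
    # Runs of consecutive underscores produce empty strings; skip them.
    words = [word for word in core.split('_') if word]
    camel = words[0] + ''.join(word[0].upper() + word[1:] for word in words[1:])
    return '_' * n_leading + camel + '_' * n_trailing


converted = underscore_to_camelcase_sketch('__my___variable_')
```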

yank.utils.camelcase_to_underscore(camelcase_str)[source]¶

Convert the given string from camelCase to underscore_case.

Underscores at the beginning and end of the string are preserved. All capital letters are cast to lower case.

Parameters:	camelcase_str : str String in camelCase to convert to underscore style.
Returns:	underscore_str : str String in underscore style.

Examples

>>> camelcase_to_underscore('myVariable')
'my_variable'
>>> camelcase_to_underscore('__my_Variable_')
'__my__variable_'
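The reverse conversion can be sketched with a single regular-expression substitution. The function name camelcase_to_underscore_sketch is hypothetical; the sketch assumes that every capital letter is replaced by an underscore followed by its lower-case form, which is consistent with both examples above.

```python
import re


def camelcase_to_underscore_sketch(camelcase_str):
    """Prefix every capital letter with an underscore and lower-case it."""
    return re.sub(r'([A-Z])',
                  lambda match: '_' + match.group(1).lower(),
                  camelcase_str)


simple = camelcase_to_underscore_sketch('myVariable')
mixed = camelcase_to_underscore_sketch('__my_Variable_')
```

Existing underscores are left untouched, which is why an underscore directly before a capital letter ends up doubled in the second example.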

yank.utils.quantity_from_string(expression, compatible_units=None)[source]¶

Create a Quantity object from a string expression.

All the functions in the standard module math are available together with most of the methods inside the simtk.unit module.

Parameters:

expression : str: The mathematical expression to rebuild a Quantity as a string.
compatible_units : simtk.unit.Unit, optional: If given, the result is checked for compatibility against the specified units, and an exception is raised if it is not compatible. Note: The output is not converted to `compatible_units`; they are only used to validate the input.

Returns:

quantity: The result of the evaluated expression.

Raises:

TypeError: If `compatible_units` is given and the quantity in expression is either unit-less or has incompatible units.

Examples

>>> expr = '4 * kilojoules / mole'
>>> quantity_from_string(expr)
Quantity(value=4.000000000000002, unit=kilojoule/mole)

>>> from simtk import unit
>>> expr = '1.0*second'
>>> quantity_from_string(expr, compatible_units=unit.femtosecond)
Quantity(value=1.0, unit=second)

yank.utils.get_keyword_args(function, try_mro_from_class=None)[source]¶

Inspect function signature and return keyword args with their default values.

Parameters:

function : callable: The function to interrogate.
try_mro_from_class : any Class or None: Try to trace the method resolution order (MRO) of the function by inferring a method stack from the supplied class. The signature of the function is checked at every level of the MRO up the stack as long as there is a **kwargs in the method signature. This setting may not yield the expected results in every case, for instance if the method does not call super(), or if the superclass uses a different function name. In the case of conflicting keywords, the lower MRO function is preferred.

Returns:

kwargs : dict: A dictionary {'keyword argument': 'default value'}. The arguments of the function that do not have a default value will not be included.
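The basic case (without try_mro_from_class) can be reproduced with inspect.signature from the standard library. This is a minimal sketch, and keyword_args_sketch is a hypothetical name; it does not attempt the MRO tracing described above.

```python
import inspect


def keyword_args_sketch(function):
    """Return {argument name: default value} for arguments that have a default."""
    signature = inspect.signature(function)
    return {name: parameter.default
            for name, parameter in signature.parameters.items()
            if parameter.default is not inspect.Parameter.empty}


def example(a, b=2, *args, c='x', **kwargs):
    """Toy function used only to demonstrate the sketch."""


defaults = keyword_args_sketch(example)
```

Arguments without a default (a), as well as *args and **kwargs, are left out of the returned dictionary.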

yank.utils.validate_parameters(parameters, template_parameters, check_unknown=False, process_units_str=False, float_to_int=False, ignore_none=True, special_conversions=None)[source]¶

Utility function for parameters and options validation.

Use the given template to filter the given parameters and infer their expected types. Perform various automatic conversions when requested. If the template is None, the parameter to validate is not checked for type compatibility.

Parameters:

parameters : dict: The parameters to validate.
template_parameters : dict: The template used to filter the parameters and infer the types.
check_unknown : bool: If True, an exception is raised when parameters contain a key that is not contained in `template_parameters`.
process_units_str : bool: If True, the function will attempt to convert the strings whose template type is simtk.unit.Quantity.
float_to_int : bool: If True, floats in parameters whose template type is int are truncated.
ignore_none : bool: If True, the function does not process parameters whose value is None.
special_conversions : dict: Contains a converter function with signature convert(arg) that must be applied to the parameters specified by the dictionary key.

Returns:

validate_par : dict: The converted parameters that are contained both in parameters and `template_parameters`.

Raises:

TypeError: If `check_unknown` is True and there are parameters not in `template_parameters`.
ValueError: If a parameter has an incompatible type with its template parameter.

Examples

Create the template parameters

>>> template_pars = dict()
>>> template_pars['bool'] = True
>>> template_pars['int'] = 2
>>> template_pars['unspecified'] = None  # this won't be checked for type compatibility
>>> template_pars['to_be_converted'] = [1, 2, 3]
>>> template_pars['length'] = 2.0 * unit.nanometers

Now the parameters to validate

>>> input_pars = dict()
>>> input_pars['bool'] = None  # this will be skipped with ignore_none=True
>>> input_pars['int'] = 4.3  # this will be truncated to 4 with float_to_int=True
>>> input_pars['unspecified'] = 'input'  # this can be of any type since the template is None
>>> input_pars['to_be_converted'] = {'key': 3}
>>> input_pars['length'] = '1.0*nanometers'
>>> input_pars['unknown'] = 'test'  # this will be silently filtered if check_unknown=False

Validate the parameters

>>> valid = validate_parameters(input_pars, template_pars, process_units_str=True,
...                             float_to_int=True, special_conversions={'to_be_converted': list})
>>> import pprint
>>> pprint.pprint(valid)
{'bool': None,
 'int': 4,
 'length': Quantity(value=1.0, unit=nanometer),
 'to_be_converted': ['key'],
 'unspecified': 'input'}
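The template-based filtering can be illustrated without simtk. The sketch below is a simplified stand-in (validate_sketch is a hypothetical name): it omits unit-string processing and, as an assumption, checks types with isinstance rather than attempting a conversion.

```python
def validate_sketch(parameters, template, check_unknown=False,
                    float_to_int=False, ignore_none=True,
                    special_conversions=None):
    """Filter parameters through a template (simplified, no unit handling)."""
    special_conversions = special_conversions or {}
    unknown = set(parameters) - set(template)
    if check_unknown and unknown:
        raise TypeError('Unknown parameters: {}'.format(sorted(unknown)))

    validated = {}
    for name, value in parameters.items():
        if name not in template:
            continue  # silently filtered when check_unknown is False
        expected = template[name]
        if value is None and ignore_none:
            validated[name] = None
        elif name in special_conversions:
            validated[name] = special_conversions[name](value)
        elif expected is None:
            validated[name] = value  # a None template accepts any type
        elif (float_to_int and isinstance(value, float)
              and isinstance(expected, int) and not isinstance(expected, bool)):
            validated[name] = int(value)  # truncates towards zero
        elif isinstance(value, type(expected)):
            validated[name] = value
        else:
            raise ValueError('Parameter {!r} must be of type {}'.format(
                name, type(expected).__name__))
    return validated


template = {'bool': True, 'int': 2, 'unspecified': None,
            'to_be_converted': [1, 2, 3]}
inputs = {'bool': None, 'int': 4.3, 'unspecified': 'input',
          'to_be_converted': {'key': 3}, 'unknown': 'test'}
valid = validate_sketch(inputs, template, float_to_int=True,
                        special_conversions={'to_be_converted': list})
```

The special conversion takes precedence over the type check, so the dict passed for 'to_be_converted' is turned into the list of its keys before any comparison with the template.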

class yank.utils.Mol2File(file_path)[source]¶

Wrapper of ParmEd mol2 parser for easy manipulation of mol2 files.

This is not efficient, as every operation accesses the file. The purpose of this class is simply to provide a shortcut to read and write the mol2 file with a one-liner. If you need to do multiple operations before saving the file, use ParmEd directly.

This works only for single-structure mol2 files.

Parameters:

file_path : str: Path to the mol2 file.

Attributes:

`resname`: The residue name of the first molecule found in the mol2 file.
`resnames`: Iterate over the names of all the molecules in the file (read-only).
`net_charge`: Net charge of the file as a float (read-only).

resname¶

The residue name of the first molecule found in the mol2 file.

This assumes that each molecule in the mol2 file has a single residue name.

resnames¶

Iterate over the names of all the molecules in the file (read-only).

This assumes that each molecule in the mol2 file has a single residue name.

net_charge¶: Net charge of the file as a float (read-only).

round_charge()[source]¶

Round the net charge to the nearest integer to 6-digit precision.

Raises:	RuntimeError If the total net charge differs from the nearest integer by more than 0.05.
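The check and rounding can be sketched on a plain list of partial charges. This is an illustration only: round_net_charge_sketch is a hypothetical function, and spreading the residual charge evenly over all atoms is an assumption, since the description above does not say how the correction is distributed.

```python
def round_net_charge_sketch(charges, tolerance=0.05, precision=6):
    """Shift partial charges so that their sum is the nearest integer."""
    net_charge = sum(charges)
    target = round(net_charge)
    if abs(net_charge - target) > tolerance:
        raise RuntimeError('Net charge {} is too far from the nearest '
                           'integer {}'.format(net_charge, target))
    # Assumption: distribute the residual evenly over all atoms.
    correction = (target - net_charge) / len(charges)
    return [round(charge + correction, precision) for charge in charges]


rounded = round_net_charge_sketch([0.5, 0.3, 0.18])
```

Because each charge is rounded to the given precision independently, the resulting sum matches the target integer only to within roughly that precision.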

yank.utils.is_modeller_installed()[source]¶

Check if the Salilab Modeller tool is installed and licensed.

If Modeller is not installed and licensed, returns False.

Returns:	installed : bool True if Modeller is installed and licensed, False otherwise.

yank.utils.is_openeye_installed(oetools=('oechem', 'oequacpac', 'oeiupac', 'oeomega'))[source]¶

Check if a given OpenEye tool is installed and licensed.

If the OpenEye toolkit is not installed, returns False.

Parameters:	oetools : str or iterable of strings, Optional, Default: (‘oechem’, ‘oequacpac’, ‘oeiupac’, ‘oeomega’) Set of tools to check by their string name. Defaults to the complete set that YANK could use, depending on feature requested. Only checks the subset of tools if passed. Also accepts a single tool to check as a string instead of an iterable of length 1.
Returns:	all_installed : bool True if all tools in `oetools` are installed and licensed, False otherwise.
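The "single string or iterable of strings" convention can be illustrated with a generic importability check. This sketch is not the YANK function: are_modules_importable is a hypothetical name, it only tests whether top-level modules can be found, and it performs no license check at all.

```python
import importlib.util


def are_modules_importable(module_names):
    """Return True if every named top-level module can be found."""
    if isinstance(module_names, str):
        # Accept a single name as a convenience, like oetools above.
        module_names = [module_names]
    return all(importlib.util.find_spec(name) is not None
               for name in module_names)


single = are_modules_importable('json')
several = are_modules_importable(('json', 'math'))
missing = are_modules_importable(('json', 'no_such_module_xyz_123'))
```

Using find_spec avoids actually importing the modules, which keeps the check cheap and free of import side effects.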

yank.utils.load_oe_molecules(file_path, molecule_idx=None)[source]¶

Read one or more molecules from a file.

Requires OpenEye Toolkit. Several formats are supported (including mol2, sdf and pdb).

Parameters:	file_path : str Complete path to the file on disk. molecule_idx : None or int, optional, default: None Index of the molecule on the file. If None, all of them are returned.
Returns:	molecule : openeye.oechem.OEMol or list of openeye.oechem.OEMol The molecules stored in the file. If molecule_idx is specified, only one molecule is returned; otherwise a list is returned (even if the file contains only 1 molecule).

yank.utils.write_oe_molecule(oe_mol, file_path, mol2_resname=None)[source]¶

Write all conformations to a file, automatically detecting the format.

Requires OpenEye Toolkit

Parameters:	oe_mol : OpenEye Molecule Molecule to write to file file_path : str Complete path to file with filename and extension mol2_resname : None or str, Optional, Default: None Name to replace the residue name if the file is a .mol2 file Requires `file_path` to match `*mol2`

yank.utils.get_oe_mol_positions(molecule, conformer_idx=0)[source]¶

Get the molecule positions from an OpenEye Molecule

Requires OpenEye Toolkit

Parameters:	molecule : OpenEye Molecule Molecule to extract coordinates from conformer_idx : int, Optional, Default: 0 Index of the conformer on the file, leave as 0 to not use

class yank.utils.TLeap[source]¶

Programmatic interface to write and run AmberTools’ tLEaP scripts.

To avoid problems with special characters in file paths, the class runs the tleap script in a temporary folder with hardcoded names for files and then copies the output files to their respective folders.

Attributes:	`script` Complete and return the finalized script string

script¶

Complete and return the finalized script string

Adds a quit command to the end of the script.

add_commands(*args)[source]¶

Append commands to the script

Parameters:	args : iterable of strings Individual commands to add to the script written in full as strings. Newline characters are added after each command

load_parameters(*args)[source]¶

Load the LEaP parameters into the working TLEaP script if not already loaded

This adds to the script

Uses loadAmberParams for frcmod.* files

Uses loadOff for *.off and *.lib files

Uses source for other files.

Parameters:	args : iterable of strings File names for each type of leap file that can be loaded. Method to load them is automatically determined from file extension or base name
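The selection rule listed above can be sketched as a small dispatch on the file name. This is an illustration, not the TLeap class: leap_load_command and build_load_section are hypothetical helpers, the file names are made up, and treating "frcmod.* files" as "base name starts with frcmod." is an assumption.

```python
import os


def leap_load_command(file_path):
    """Pick the LEaP command for a parameter file from its name."""
    base_name = os.path.basename(file_path)
    extension = os.path.splitext(base_name)[1]
    if base_name.startswith('frcmod.'):
        return 'loadAmberParams ' + file_path
    if extension in ('.off', '.lib'):
        return 'loadOff ' + file_path
    return 'source ' + file_path


def build_load_section(file_paths):
    """Return script lines that load each file once, in the given order."""
    loaded = set()
    lines = []
    for file_path in file_paths:
        if file_path in loaded:
            continue  # already loaded, do not add it again
        loaded.add(file_path)
        lines.append(leap_load_command(file_path))
    return '\n'.join(lines) + '\n'


script = build_load_section(['leaprc.example', 'frcmod.example',
                             'ligand.lib', 'ligand.lib'])
```

Keeping track of the files already added mirrors the "if not already loaded" behaviour, so repeated calls with the same file do not duplicate commands in the script.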

load_unit(unit_name, file_path)[source]¶

Load a Unit into LEaP, this is typically a molecule or small complex.

This adds to the script

Accepts *.mol2 or *.pdb files

Parameters:	unit_name : str Name of the unit as it should be represented in LEaP file_path : str Full file path with extension of the file to read into LEaP as a new unit

combine(unit_name, *args)[source]¶

Combine units in LEaP

This adds to the script

Parameters:	unit_name : str Name of LEaP unit to assign the combination to args : iterable of strings Names of the LEaP units to combine into a single unit called unit_name

add_ions(unit_name, ion, num_ions=0, replace_solvent=False)[source]¶

Add ions to a unit in LEaP

This adds to the script

Parameters:

unit_name : str: Name of the existing LEaP unit into which ions will be added.
ion : str: LEaP-recognized name of the ion to add.
num_ions : int, optional: Number of ions of type ion to add to unit_name. If 0, the unit is neutralized (default is 0).
replace_solvent : bool, optional: If True, ions will replace solvent molecules rather than being added.

solvate(unit_name, solvent_model, clearance)[source]¶

Solvate a unit in LEaP isometrically

This adds to the script

Parameters:	unit_name : str Name of the existing LEaP unit which will be solvated solvent_model : str LEaP recognized name of the solvent model to use, e.g. “TIP3PBOX” clearance : float Add solvent up to clearance distance away from the unit_name (radial)

save_unit(unit_name, output_path)[source]¶

Write a LEaP unit to file.

Accepts either *.prmtop, *.inpcrd, or *.pdb files

This adds to the script

Parameters:	unit_name : str Name of the unit to save output_path : str Full file path with extension to save. Outputs with multiple files (e.g. Amber Parameters) have their names derived from this instead

transform(unit_name, transformation)[source]¶: Transformation is an array-like representing the affine transformation matrix.

new_section(comment)[source]¶: Adds a comment line to the script

export_script(file_path)[source]¶

Write script to file

Parameters:	file_path : str Full file path with extension of the script to save

run()[source]¶: Run script and return warning messages in leap log file.

yank.utils.generate_development_feature(feature_dict)[source]¶

Helper function for generating a class which can flag classes, tests, and functions that are developmental.

The output class is not quite a mixin because it has to be the first class in the inheritance list, due to the __init__ flag.

Parameters:

feature_dict : dict: Dictionary of form “test_string : pre-computed test” where “test_string” is just an identifier and “pre-computed test” is a boolean-like object, usually the result of some test. All pre-computed tests will be cast to bool

Returns:

DevelopmentFeature : class

Class which checks against the feature_dict and can be used in several ways:

Class Inherited: When inherited as a class, calling its __init__() will raise an error if features are not met
True/False check function: Calling dev_validate() returns True if all features are true, False otherwise.
True/False decorator: When decorating function with dev_validation, function will only be called if dev_validate() would return True, otherwise simply returns. Helpful for running tests.
Dict of reasons: Property dev_reasons will return the dictionary of failed dependencies
Dict of all: Property dev_features will return the dictionary of features it expects and their tests

With the exception of __init__(), all other functions and properties are class-based and do not require instantiation. Function names are all given the dev_ prefix to avoid clashes with other names, as part of the class's pseudo-mixin design.
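A class factory with these properties could be sketched as follows. This is not YANK's implementation: make_development_feature is a hypothetical name, the RuntimeError raised by __init__ is an assumed error type, and dev_reasons and dev_features are plain class attributes here rather than properties.

```python
import functools


def make_development_feature(feature_dict):
    """Build a class that gates code on a dict of pre-computed tests."""
    features = {name: bool(test) for name, test in feature_dict.items()}
    failed = {name: ok for name, ok in features.items() if not ok}

    class DevelopmentFeature:
        dev_features = features  # every feature and its test result
        dev_reasons = failed     # only the features that are not met

        def __init__(self, *args, **kwargs):
            if not self.dev_validate():
                raise RuntimeError('Unmet development features: {}'.format(
                    sorted(self.dev_reasons)))
            super().__init__(*args, **kwargs)

        @classmethod
        def dev_validate(cls):
            """Return True only if every feature test passed."""
            return all(cls.dev_features.values())

        @classmethod
        def dev_validation(cls, function):
            """Decorator: call function only if dev_validate() is True."""
            @functools.wraps(function)
            def wrapper(*args, **kwargs):
                if cls.dev_validate():
                    return function(*args, **kwargs)
                return None
            return wrapper

    return DevelopmentFeature


Experimental = make_development_feature({'always': True, 'missing_lib': False})


@Experimental.dev_validation
def experimental_test():
    return 'ran'


class ExperimentalThing(Experimental):
    """Instantiating this raises while a feature is unmet."""
```

Since dev_validate and dev_validation are class methods, they can be used for skipping tests without ever instantiating the class, while subclasses that do get instantiated fail early in __init__.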