Utils¶
Utilities for the YANK modules
Provides many helper functions and common operations used by the various YANK suites
-
yank.utils.
is_terminal_verbose
()[source]¶ Check whether the logging on the terminal is configured to be verbose.
This is useful in case one wants to occasionally print something that is not really relevant to yank’s log (e.g. external library verbose, citations, etc.).
Returns: is_verbose : bool
True if the terminal is configured to be verbose, False otherwise.
-
yank.utils.
config_root_logger
(verbose, log_file_path=None)[source]¶ Set up the root logger’s configuration.
Log messages are printed in the terminal and, if log_file_path is not None, also saved in the file it specifies. Note that logging uses sys.stdout to print logging.INFO messages, and stderr for the others. The root logger’s configuration is inherited by the loggers created by logging.getLogger(name).
Different formats are used to display messages on the terminal and in the log file. For example, in the log file every entry has a timestamp, which does not appear in the terminal. Moreover, the log file always shows the module that generated the message, while in the terminal this happens only for messages of level WARNING and higher.
Parameters: verbose : bool
Control the verbosity of the messages printed in the terminal. The logger displays messages of level logging.INFO and higher when verbose=False. Otherwise those of level logging.DEBUG and higher are printed.
log_file_path : str, optional, default = None
If not None, this is the path where all the logger’s messages of level logging.DEBUG or higher are saved.
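The behaviour described above can be approximated with the standard logging module alone. The following is a minimal sketch, not YANK's actual implementation: the function name configure_root_logger_sketch, the helper class and the exact format strings are assumptions made for illustration, and the logger name stands in for "the module that generated the message".

```python
import logging
import sys


class _TerminalFormatter(logging.Formatter):
    """Show the logger name only for WARNING and higher (assumed format)."""

    def format(self, record):
        if record.levelno >= logging.WARNING:
            return '{} - {} - {}'.format(record.levelname, record.name,
                                         record.getMessage())
        return record.getMessage()


def configure_root_logger_sketch(verbose, log_file_path=None):
    """Hypothetical stand-in for config_root_logger, for illustration only."""
    root = logging.getLogger()
    # Drop handlers left by a previous configuration to avoid duplicate output.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()

    terminal_level = logging.DEBUG if verbose else logging.INFO

    # INFO messages go to stdout ...
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(terminal_level)
    stdout_handler.addFilter(lambda record: record.levelno == logging.INFO)
    stdout_handler.setFormatter(_TerminalFormatter())
    root.addHandler(stdout_handler)

    # ... and every other level goes to stderr.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(terminal_level)
    stderr_handler.addFilter(lambda record: record.levelno != logging.INFO)
    stderr_handler.setFormatter(_TerminalFormatter())
    root.addHandler(stderr_handler)

    # The optional log file always records DEBUG and higher, with timestamps.
    if log_file_path is not None:
        file_handler = logging.FileHandler(log_file_path)
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(levelname)s - %(name)s - %(message)s'))
        root.addHandler(file_handler)

    # The root logger itself must let everything through to the handlers.
    root.setLevel(logging.DEBUG)
    return root
```

Because loggers created with logging.getLogger(name) propagate to the root logger, configuring the root once is enough for every module.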
-
class
yank.utils.
CombinatorialLeaf
[source]¶ List type that can be expanded combinatorially in CombinatorialTree.
-
append
(object) → None -- append object to end¶
-
clear
() → None -- remove all items from L¶
-
copy
() → list -- a shallow copy of L¶
-
count
(value) → integer -- return number of occurrences of value¶
-
extend
(iterable) → None -- extend list by appending elements from the iterable¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
insert
()¶ L.insert(index, object) – insert object before index
-
pop
([index]) → item -- remove and return item at index (default last).¶ Raises IndexError if list is empty or index is out of range.
-
remove
(value) → None -- remove first occurrence of value.¶ Raises ValueError if the value is not present.
-
reverse
()¶ L.reverse() – reverse IN PLACE
-
sort
(key=None, reverse=False) → None -- stable sort *IN PLACE*¶
-
-
class
yank.utils.
CombinatorialTree
(dictionary)[source]¶ A tree that can be expanded in a combinatorial fashion.
Each tree node with its subnodes is represented as a nested dictionary. Nodes can be accessed through their specific “path” (i.e. the list of the nested dictionary keys that lead to the node value).
Leaf node values that are list-like objects can be expanded combinatorially: iterating over the tree yields every possible combination, each one a tree in which every such leaf holds exactly one of the values from its list (see Examples).
Examples
Set an arbitrary nested path
>>> tree = CombinatorialTree({'a': {'b': 2}})
>>> path = ('a', 'b')
>>> tree[path]
2
>>> tree[path] = 3
>>> tree[path]
3
Paths can also be accessed with the usual dict syntax
>>> tree['a']['b']
3
Deleting a node leaves an empty dict!
>>> del tree[path]
>>> print(tree)
{'a': {}}
Expand all possible combinations of a tree. The iterator returns a dict, not another CombinatorialTree object.
>>> import pprint  # pprint sorts the dictionary by key before printing
>>> tree = CombinatorialTree({'a': 1, 'b': CombinatorialLeaf([1, 2]),
...                           'c': {'d': CombinatorialLeaf([3, 4])}})
>>> for t in tree:
...     pprint.pprint(t)
{'a': 1, 'b': 1, 'c': {'d': 3}}
{'a': 1, 'b': 2, 'c': {'d': 3}}
{'a': 1, 'b': 1, 'c': {'d': 4}}
{'a': 1, 'b': 2, 'c': {'d': 4}}
Expand all possible combinations and assign unique names
>>> for name, t in tree.named_combinations(separator='_', max_name_length=5):
...     print(name)
3_1
3_2
4_1
4_2
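The combinatorial expansion itself can be illustrated without YANK. The sketch below is an assumption-laden stand-in, not the library's code: Leaf, find_leaves and expand are hypothetical names, and the order in which combinations are produced may differ from the order CombinatorialTree uses.

```python
import copy
import itertools


class Leaf(list):
    """Hypothetical stand-in for CombinatorialLeaf: a list marked as expandable."""


def find_leaves(node, path=()):
    """Yield (path, leaf) for every Leaf found in a nested dict."""
    for key, value in node.items():
        if isinstance(value, Leaf):
            yield path + (key,), value
        elif isinstance(value, dict):
            yield from find_leaves(value, path + (key,))


def expand(tree):
    """Yield one plain nested dict per combination of leaf values."""
    leaves = list(find_leaves(tree))
    paths = [path for path, _ in leaves]
    # itertools.product enumerates one value per leaf for every combination.
    for values in itertools.product(*(leaf for _, leaf in leaves)):
        combination = copy.deepcopy(tree)
        for path, value in zip(paths, values):
            node = combination
            for key in path[:-1]:
                node = node[key]
            node[path[-1]] = value  # replace the list with a single value
        yield combination


tree = {'a': 1, 'b': Leaf([1, 2]), 'c': {'d': Leaf([3, 4])}}
combinations = list(expand(tree))
print(len(combinations))  # 4
```

Each combination is a deep copy, so the original tree and its leaves are left untouched.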
-
named_combinations
(separator, max_name_length)[source]¶ Generator to iterate over all possible combinations of trees and assign them unique names.
The names are generated by gluing together the first letters of the values of the combinatorial leaves only, separated by the given separator. If the values contain special characters, they are ignored. Only letters, numbers and the separator are found in the generated names. Values representing paths to existing files contribute to the name only with their file name without extensions.
The iterator yields tuples of (name, dict), not other CombinatorialTree objects. If there is only a single combination, an empty string is returned for the name.
Parameters: separator : str
The string used to separate the words in the name.
max_name_length : int
The maximum length of the generated names, excluding disambiguation number.
Yields: name : str
Unique name of the combination. Empty string returned if there is only one combination
combination : dict
Combination of leafs that was used to create the name
-
expand_id_nodes
(id_nodes_path, update_nodes_paths)[source]¶ Return a new CombinatorialTree with id-bearing nodes expanded and updated in the rest of the script.
Parameters: id_nodes_path : tuple of str
The path to the parent node containing ids.
update_nodes_paths : list of tuple of str
A list of all the paths referring to the ids expanded. The string ‘*’ means every node.
Returns: expanded_tree : CombinatorialTree
The tree with id nodes expanded.
Examples
>>> d = {'molecules':
...          {'mol1': {'mol_value': CombinatorialLeaf([1, 2])}},
...      'systems':
...          {'sys1': {'molecules': 'mol1'},
...           'sys2': {'prmtopfile': 'mysystem.prmtop'}}}
>>> update_nodes_paths = [('systems', '*', 'molecules')]
>>> t = CombinatorialTree(d).expand_id_nodes('molecules', update_nodes_paths)
>>> t['molecules'] == {'mol1_1': {'mol_value': 1}, 'mol1_2': {'mol_value': 2}}
True
>>> t['systems'] == {'sys1': {'molecules': CombinatorialLeaf(['mol1_2', 'mol1_1'])},
...                  'sys2': {'prmtopfile': 'mysystem.prmtop'}}
True
-
clear
() → None. Remove all items from D.¶
-
get
(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised.
-
popitem
() → (k, v), remove and return some (key, value) pair¶ as a 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
-
update
([E, ]**F) → None. Update D from mapping/iterable E and F.¶ If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
-
values
() → an object providing a view on D's values¶
-
-
yank.utils.
get_data_filename
(relative_path)[source]¶ Get the full path to one of the reference files shipped for testing.
In the source distribution, these files are in examples/*/, but on installation they are moved to somewhere in the user’s Python site-packages directory.
Parameters: relative_path : str
Name of the file to load, relative to the yank egg folder, which is typically located at something like ~/anaconda/lib/python2.7/site-packages/yank-*.egg/examples/
Returns: fn : str
Resource Filename
-
yank.utils.
find_phases_in_store_directory
(store_directory)[source]¶ Build a list of phases in the store directory.
Parameters: store_directory : str
The directory to examine for stored phase NetCDF data files.
Returns: phases : dict of str
A dictionary phase_name -> file_path that maps each phase name to its NetCDF file path.
-
yank.utils.
update_nested_dict
(original, updated)[source]¶ Return a copy of a (possibly) nested dict of arbitrary depth
Parameters: original : dict
Original dict which we want to update, can contain nested dicts
updated : dict
Dictionary of updated values to place in original
Returns: new : dict
Copy of original with values updated from updated
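A recursive update of this kind might look like the following minimal sketch. It is not YANK's source: the name update_nested_dict_sketch is hypothetical, and the choice to merge only when both the old and the new value are mappings is an assumption.

```python
import copy
from collections.abc import Mapping


def update_nested_dict_sketch(original, updated):
    """Return a copy of `original` with values from `updated` merged in."""
    new = copy.deepcopy(original)
    for key, value in updated.items():
        if isinstance(value, Mapping) and isinstance(new.get(key), Mapping):
            # Both sides are dicts: descend instead of overwriting the branch.
            new[key] = update_nested_dict_sketch(new[key], value)
        else:
            new[key] = copy.deepcopy(value)
    return new


original = {'a': {'b': 1, 'c': 2}, 'd': 3}
print(update_nested_dict_sketch(original, {'a': {'b': 10}, 'e': 5}))
# {'a': {'b': 10, 'c': 2}, 'd': 3, 'e': 5}
```

Working on a deep copy keeps the input dictionary unchanged, which matches the "return a copy" contract in the description.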
-
yank.utils.
typename
(atype)[source]¶ Convert a type object into a fully qualified typename string
Parameters: atype : type
The type to convert
Returns: typename : str
The string typename.
Examples
>>> typename(type(1))
'int'
>>> import numpy
>>> x = numpy.array([1,2,3], numpy.float32)
>>> typename(type(x))
'numpy.ndarray'
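One plausible way to build such a name is to join the type's module and name, skipping the module for builtins. This is only a sketch under that assumption; typename_sketch is a hypothetical name and the real function may treat edge cases differently.

```python
import collections


def typename_sketch(atype):
    """Return 'module.Name' for a type, or just 'Name' for builtins."""
    if not isinstance(atype, type):
        raise TypeError('Argument must be a type object')
    module = atype.__module__
    # Builtin types live in 'builtins' (Python 3) or '__builtin__' (Python 2).
    if module in ('builtins', '__builtin__'):
        return atype.__name__
    return module + '.' + atype.__name__


print(typename_sketch(int))                      # int
print(typename_sketch(collections.OrderedDict))  # collections.OrderedDict
```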
-
yank.utils.
merge_dict
(dict1, dict2)[source]¶ Return the union of two dictionaries through Python-version-agnostic code.
In Python 3.5 and later there is a syntax to do this, {**dict1, **dict2}, but in Python 2 you need to go through update().
Parameters: dict1 : dict
dict2 : dict
Returns: merged_dict : dict
Union of dict1 and dict2
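The update()-based approach mentioned above can be sketched as follows. The name merge_dict_sketch is hypothetical; the sketch assumes that, as with {**dict1, **dict2}, values from dict2 win when a key appears in both.

```python
def merge_dict_sketch(dict1, dict2):
    """Return a new dict holding the union of dict1 and dict2."""
    merged = dict1.copy()  # shallow copy, so dict1 is not modified
    merged.update(dict2)   # keys in dict2 override those in dict1
    return merged


print(merge_dict_sketch({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))
# {'a': 1, 'b': 3, 'c': 4}
```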
-
yank.utils.
underscore_to_camelcase
(underscore_str)[source]¶ Convert the given string from underscore_case to camelCase.
Underscores at the beginning or at the end of the string are left untouched. All underscores in the middle of the string are removed.
Parameters: underscore_str : str
String in underscore_case to convert to camelCase style.
Returns: camelcase_str : str
String in camelCase style.
Examples
>>> underscore_to_camelcase('__my___variable_')
'__myVariable_'
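The rule illustrated by the example (outer underscores kept, inner ones removed, following words capitalized) can be sketched with a regular expression. This is an illustrative re-implementation under those assumptions, with the hypothetical name underscore_to_camelcase_sketch, not the library's code.

```python
import re


def underscore_to_camelcase_sketch(underscore_str):
    """Convert underscore_case to camelCase, keeping outer underscores."""
    # Split into leading underscores, the core, and trailing underscores.
    match = re.match(r'^(_*)(.*?)(_*)$', underscore_str, re.DOTALL)
    leading, core, trailing = match.groups()
    words = [word for word in core.split('_') if word]
    if not words:
        return underscore_str  # nothing but underscores (or empty string)
    # The first word is kept as is; later words get an upper-case first letter.
    camel = words[0] + ''.join(w[0].upper() + w[1:] for w in words[1:])
    return leading + camel + trailing


print(underscore_to_camelcase_sketch('__my___variable_'))  # __myVariable_
```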
-
yank.utils.
camelcase_to_underscore
(camelcase_str)[source]¶ Convert the given string from camelCase to underscore_case.
Underscores at the beginning and end of the string are preserved. All capital letters are cast to lower case.
Parameters: camelcase_str : str
String in camelCase to convert to underscore style.
Returns: underscore_str : str
String in underscore style.
Examples
>>> camelcase_to_underscore('myVariable')
'my_variable'
>>> camelcase_to_underscore('__my_Variable_')
'__my__variable_'
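Both documented outputs are consistent with a simple rule: prefix every capital letter with an underscore and lower-case it. The sketch below implements exactly that rule; the name camelcase_to_underscore_sketch is hypothetical, and how the real function handles a leading capital or runs of capitals is not stated here, so those cases are assumptions.

```python
import re


def camelcase_to_underscore_sketch(camelcase_str):
    """Convert camelCase to underscore_case by expanding capital letters."""
    # Every capital letter 'X' becomes '_x'; existing underscores are kept,
    # which is why '_V' turns into '__v' in the second example.
    return re.sub(r'[A-Z]', lambda m: '_' + m.group(0).lower(), camelcase_str)


print(camelcase_to_underscore_sketch('myVariable'))      # my_variable
print(camelcase_to_underscore_sketch('__my_Variable_'))  # __my__variable_
```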
-
yank.utils.
quantity_from_string
(expression)[source]¶ Create a Quantity object from a string expression.
All the functions in the standard module math are available together with most of the methods inside the simtk.unit module.
Parameters: expression : str
The mathematical expression to rebuild a Quantity as a string.
Returns: quantity
The result of the evaluated expression.
Examples
>>> expr = '4 * kilojoules / mole'
>>> quantity_from_string(expr)
Quantity(value=4.000000000000002, unit=kilojoule/mole)
-
yank.utils.
process_unit_bearing_str
(quantity_str, compatible_units)[source]¶ Process a unit-bearing string to produce a Quantity.
Parameters: quantity_str : str
A string containing a value with a unit of measure.
compatible_units : simtk.unit.Unit
The result will be checked for compatibility with specified units, and an exception raised if not compatible.
Note: The output is not converted to compatible_units; they are only used to validate the input.
Returns: quantity : simtk.unit.Quantity
The specified string, returned as a Quantity.
Raises: TypeError
If quantity_str does not contain units.
ValueError
If the units attached to quantity_str are incompatible with compatible_units.
Examples
>>> process_unit_bearing_str('1.0*micrometers', unit.nanometers)
Quantity(value=1.0, unit=micrometer)
-
yank.utils.
to_unit_validator
(compatible_units)[source]¶ Function generator to test unit bearing strings with Schema.
-
yank.utils.
generate_signature_schema
(func, update_keys=None, exclude_keys=frozenset())[source]¶ Generate a dictionary to test function signatures with Schema.
Parameters: func : function
The function used to build the schema.
update_keys : dict
Keys in here have priority over automatic generation. It can be used to make an argument mandatory, or to use a specific validator.
exclude_keys : list-like
Keys in here are ignored and not included in the schema.
Returns: func_schema : dict
The dictionary to be used as Schema type. Contains all keyword variables in the function signature as optional arguments with the default type as validator. Unit-bearing strings are converted. Arguments with default None are always accepted. Camel case parameters in the function are converted to underscore style.
Examples
>>> from schema import Schema
>>> def f(a, b, camelCase=True, none=None, quantity=3.0*unit.angstroms):
...     pass
>>> f_dict = generate_signature_schema(f, exclude_keys=['quantity'])
>>> print(isinstance(f_dict, dict))
True
>>> # Print (key, values) in the correct order
>>> print(sorted(dictiter(f_dict), key=lambda x: x[1]))
[(Optional('camel_case'), <type 'bool'>), (Optional('none'), <type 'object'>)]
>>> f_schema = Schema(generate_signature_schema(f))
>>> f_schema.validate({'quantity': '1.0*nanometer'})
{'quantity': Quantity(value=1.0, unit=nanometer)}
-
yank.utils.
get_keyword_args
(function)[source]¶ Inspect function signature and return keyword args with their default values.
Parameters: function : function
The function to interrogate.
Returns: kwargs : dict
A dictionary {'keyword argument': 'default value'}. The arguments of the function that do not have a default value will not be included.
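With Python 3, the same information can be read from inspect.signature. The following is a minimal sketch under that assumption, not necessarily how YANK does it; get_keyword_args_sketch is a hypothetical name.

```python
import inspect


def get_keyword_args_sketch(function):
    """Return {argument name: default value} for arguments with defaults."""
    signature = inspect.signature(function)
    return {name: parameter.default
            for name, parameter in signature.parameters.items()
            # Parameters without a default carry the sentinel Parameter.empty.
            if parameter.default is not inspect.Parameter.empty}


def example(a, b, camelCase=True, none=None):
    pass


print(get_keyword_args_sketch(example))  # {'camelCase': True, 'none': None}
```

Comparing against inspect.Parameter.empty rather than None matters here, because None is itself a legitimate default value.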
-
yank.utils.
validate_parameters
(parameters, template_parameters, check_unknown=False, process_units_str=False, float_to_int=False, ignore_none=True, special_conversions=None)[source]¶ Utility function for parameters and options validation.
Use the given template to filter the given parameters and infer their expected types. Perform various automatic conversions when requested. If the template is None, the parameter to validate is not checked for type compatibility.
Parameters: parameters : dict
The parameters to validate.
template_parameters : dict
The template used to filter the parameters and infer the types.
check_unknown : bool
If True, an exception is raised when parameters contain a key that is not contained in template_parameters.
process_units_str : bool
If True, the function will attempt to convert the strings whose template type is simtk.unit.Quantity.
float_to_int : bool
If True, floats in parameters whose template type is int are truncated.
ignore_none : bool
If True, the function does not process parameters whose value is None.
special_conversions : dict
Contains a converter function with signature convert(arg) that must be applied to the parameters specified by the dictionary key.
Returns: validate_par : dict
The converted parameters that are contained both in parameters and template_parameters.
Raises: TypeError
If check_unknown is True and there are parameters not in template_parameters.
ValueError
If a parameter has an incompatible type with its template parameter.
Examples
Create the template parameters
>>> template_pars = dict()
>>> template_pars['bool'] = True
>>> template_pars['int'] = 2
>>> template_pars['unspecified'] = None  # this won't be checked for type compatibility
>>> template_pars['to_be_converted'] = [1, 2, 3]
>>> template_pars['length'] = 2.0 * unit.nanometers
Now the parameters to validate
>>> input_pars = dict()
>>> input_pars['bool'] = None  # this will be skipped with ignore_none=True
>>> input_pars['int'] = 4.3  # this will be truncated to 4 with float_to_int=True
>>> input_pars['unspecified'] = 'input'  # this can be of any type since the template is None
>>> input_pars['to_be_converted'] = {'key': 3}
>>> input_pars['length'] = '1.0*nanometers'
>>> input_pars['unknown'] = 'test'  # this will be silently filtered if check_unknown=False
Validate the parameters
>>> valid = validate_parameters(input_pars, template_pars, process_units_str=True,
...                             float_to_int=True, special_conversions={'to_be_converted': list})
>>> import pprint
>>> pprint.pprint(valid)
{'bool': None,
 'int': 4,
 'length': Quantity(value=1.0, unit=nanometer),
 'to_be_converted': ['key'],
 'unspecified': 'input'}
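The filtering and conversion logic can be sketched without any unit handling. The code below is a simplified, hypothetical re-implementation (validate_parameters_sketch is not part of YANK, and process_units_str is deliberately left out); the order in which conversions and type checks are applied is an assumption.

```python
def validate_parameters_sketch(parameters, template_parameters,
                               check_unknown=False, float_to_int=False,
                               ignore_none=True, special_conversions=None):
    """Simplified template-based validation (no unit-bearing strings)."""
    special_conversions = special_conversions or {}
    unknown = [name for name in parameters if name not in template_parameters]
    if check_unknown and unknown:
        raise TypeError('Unknown parameters: {}'.format(unknown))

    validated = {}
    for name, value in parameters.items():
        if name not in template_parameters:
            continue  # silently filtered when check_unknown is False
        template = template_parameters[name]
        if value is None and ignore_none:
            validated[name] = None  # kept, but not processed further
            continue
        if name in special_conversions:
            value = special_conversions[name](value)
        if float_to_int and type(template) is int and isinstance(value, float):
            value = int(value)  # truncation, e.g. 4.3 -> 4
        # A template of None means "any type is accepted".
        if template is not None and not isinstance(value, type(template)):
            raise ValueError('Parameter {!r} must be of type {}'.format(
                name, type(template).__name__))
        validated[name] = value
    return validated


template = {'bool': True, 'int': 2, 'unspecified': None,
            'to_be_converted': [1, 2, 3]}
inputs = {'bool': None, 'int': 4.3, 'unspecified': 'input',
          'to_be_converted': {'key': 3}, 'unknown': 'test'}
valid = validate_parameters_sketch(
    inputs, template, float_to_int=True,
    special_conversions={'to_be_converted': list})
print(valid == {'bool': None, 'int': 4, 'unspecified': 'input',
                'to_be_converted': ['key']})  # True
```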
-
class
yank.utils.
Mol2File
(file_path)[source]¶ Wrapper of ParmEd mol2 parser for easy manipulation of mol2 files.
This is not efficient, as every operation accesses the file. The purpose of this class is simply to provide a shortcut to read and write the mol2 file with a one-liner. If you need to do multiple operations before saving the file, use ParmEd directly.
This works only for single-structure mol2 files.
Parameters: file_path : str
Path to the mol2 file.
Attributes
resname
The residue name of the first molecule found in the mol2 file.
resnames
Iterate over the names of all the molecules in the file (read-only).
net_charge
Net charge of the file as a float
-
resname
¶ The residue name of the first molecule found in the mol2 file.
This assumes that each molecule in the mol2 file has a single residue name.
-
resnames
¶ Iterate over the names of all the molecules in the file (read-only).
This assumes that each molecule in the mol2 file has a single residue name.
-
net_charge
¶ Net charge of the file as a float
-
-
yank.utils.
is_openeye_installed
(oetools=('oechem', 'oequacpac', 'oeiupac', 'oeomega'))[source]¶ Check if the given OpenEye tools are installed and licensed.
If the OpenEye toolkit is not installed, returns False
Parameters: oetools : str or iterable of strings, Optional, Default: (‘oechem’, ‘oequacpac’, ‘oeiupac’, ‘oeomega’)
Set of tools to check by their string name. Defaults to the complete set that YANK could use, depending on feature requested.
Only checks the subset of tools if passed. Also accepts a single tool to check as a string instead of an iterable of length 1.
Returns: all_installed : bool
True if all tools in oetools are installed and licensed, False otherwise
-
yank.utils.
load_oe_molecules
(file_path, molecule_idx=None)[source]¶ Read one or more molecules from a file.
Requires OpenEye Toolkit. Several formats are supported (including mol2, sdf and pdb).
Parameters: file_path : str
Complete path to the file on disk.
molecule_idx : None or int, optional, default: None
Index of the molecule on the file. If None, all of them are returned.
Returns: molecule : openeye.oechem.OEMol or list of openeye.oechem.OEMol
The molecules stored in the file. If molecule_idx is specified, only one molecule is returned; otherwise a list is returned (even if the file contains only 1 molecule).
-
yank.utils.
write_oe_molecule
(oe_mol, file_path, mol2_resname=None)[source]¶ Write all conformations to a file, automatically detecting the format.
Requires OpenEye Toolkit
Parameters: oe_mol : OpenEye Molecule
Molecule to write to file
file_path : str
Complete path to file with filename and extension
mol2_resname : None or str, Optional, Default: None
Name to replace the residue name if the file is a .mol2 file. Requires file_path to match *mol2
-
yank.utils.
get_oe_mol_positions
(molecule, conformer_idx=0)[source]¶ Get the molecule positions from an OpenEye Molecule
Requires OpenEye Toolkit
Parameters: molecule : OpenEye Molecule
Molecule to extract coordinates from
conformer_idx : int, Optional, Default: 0
Index of the conformer on the file, leave as 0 to not use
-
class
yank.utils.
TLeap
[source]¶ Programmatic interface to write and run AmberTools’ tLEaP scripts.
To avoid problems with special characters in file paths, the class runs the tleap script in a temporary folder with hardcoded names for files and then copies the output files to their respective folders.
Attributes
script
Complete and return the finalized script string
-
script
¶ Complete and return the finalized script string
Adds a quit command to the end of the script.
-
add_commands
(*args)[source]¶ Append commands to the script
Parameters: args : iterable of strings
Individual commands to add to the script written in full as strings. Newline characters are added after each command
-
load_parameters
(*args)[source]¶ Load the LEaP parameters into the working tLEaP script if not already loaded
This adds to the script
Uses loadAmberParams for frcmod.* files
Uses loadOff for *.off and *.lib files
Uses source for other files.
Parameters: args : iterable of strings
File names for each type of leap file that can be loaded. Method to load them is automatically determined from file extension or base name
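The dispatch rules listed above can be pictured with a small standalone helper. This is a sketch only: leap_load_command and build_load_script are hypothetical names that are not part of the TLeap class, and the file names are illustrative.

```python
import os


def leap_load_command(file_path):
    """Pick the LEaP command for a parameter file from its name."""
    base_name = os.path.basename(file_path)
    extension = os.path.splitext(base_name)[1]
    if base_name.startswith('frcmod.'):
        return 'loadAmberParams ' + file_path
    if extension in ('.off', '.lib'):
        return 'loadOff ' + file_path
    return 'source ' + file_path  # e.g. leaprc files


def build_load_script(*file_paths):
    """Return script lines, skipping files that were already loaded."""
    commands = []
    for file_path in file_paths:
        command = leap_load_command(file_path)
        if command not in commands:
            commands.append(command)
    return '\n'.join(commands) + '\n'


print(build_load_script('leaprc.ff14SB', 'frcmod.ligand', 'ligand.lib',
                        'leaprc.ff14SB'))
```

The repeated leaprc.ff14SB argument produces a single source line, mirroring the "if not already loaded" behaviour in the description.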
-
load_unit
(name, file_path)[source]¶ Load a unit into LEaP; this is typically a molecule or small complex.
This adds to the script
Accepts *.mol2 or *.pdb files
Parameters: name : str
Name of the unit as it should be represented in LEaP
file_path : str
Full file path with extension of the file to read into LEaP as a new unit
-
combine
(leap_unit, *args)[source]¶ Combine units in LEaP
This adds to the script
Parameters: leap_unit : str
Name of LEaP unit to assign the combination to
args : iterable of strings
Names of the LEaP units to combine into a single unit called leap_unit
-
add_ions
(leap_unit, ion, num_ions=0, replace_solvent=False)[source]¶ Add ions to a unit in LEaP
This adds to the script
Parameters: leap_unit : str
Name of the existing LEaP unit which Ions will be added into
ion : str
LEaP recognized name of ion to add
num_ions : int, optional
Number of ions of type ion to add to leap_unit. If 0, the unit is neutralized (default is 0).
replace_solvent : bool, optional
If True, ions will replace solvent molecules rather than being added.
-
solvate
(leap_unit, solvent_model, clearance)[source]¶ Solvate a unit in LEaP isometrically
This adds to the script
Parameters: leap_unit : str
Name of the existing LEaP unit which will be solvated
solvent_model : str
LEaP recognized name of the solvent model to use, e.g. “TIP3PBOX”
clearance : float
Add solvent up to clearance distance away from the leap_unit (radial)
-
save_unit
(leap_unit, output_path)[source]¶ Write a LEaP unit to file.
Accepts either *.prmtop, *.inpcrd, or *.pdb files
This adds to the script
Parameters: leap_unit : str
Name of the unit to save
output_path : str
Full file path with extension to save. Outputs with multiple files (e.g. Amber Parameters) have their names derived from this instead
-
transform
(leap_unit, transformation)[source]¶ Transformation is an array-like representing the affine transformation matrix.
-