Utils¶
Utilities for the YANK modules
Provides many helper functions and common operations used by the various YANK suites
-
yank.utils.
is_terminal_verbose
()[source]¶ Check whether the logging on the terminal is configured to be verbose.
This is useful in case one wants to occasionally print something that is not really relevant to yank’s log (e.g. external library verbose, citations, etc.).
Returns: - is_verbose : bool
True if the terminal is configured to be verbose, False otherwise.
-
yank.utils.
config_root_logger
(verbose, log_file_path=None)[source]¶ Set up the root logger’s configuration.
Log messages are printed in the terminal and, if log_file_path is not None, also saved in the file it specifies. Note that logging uses sys.stdout to print logging.INFO messages and sys.stderr for all the others. The root logger’s configuration is inherited by the loggers created by logging.getLogger(name).
Different formats are used to display messages on the terminal and in the log file. For example, every entry in the log file has a timestamp, which does not appear in the terminal. Moreover, the log file always shows the module that generates the message, while in the terminal this happens only for messages of level WARNING and higher.
Parameters: - verbose : bool
Control the verbosity of the messages printed in the terminal. The logger displays messages of level logging.INFO and higher when verbose=False. Otherwise those of level logging.DEBUG and higher are printed.
- log_file_path : str, optional, default = None
If not None, this is the path where all the logger’s messages of level logging.DEBUG or higher are saved.
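The behaviour described above can be approximated with the standard logging module alone. The following is a minimal sketch, not YANK's implementation: the names sketch_config_logger and TerminalFormatter, the extra logger argument, and the exact format strings are assumptions made for illustration.

```python
import logging
import sys


class TerminalFormatter(logging.Formatter):
    """Hypothetical formatter: show the module name only for WARNING and higher."""

    def format(self, record):
        message = record.getMessage()
        if record.levelno >= logging.WARNING:
            return '{} - {} - {}'.format(record.levelname, record.name, message)
        return message


def sketch_config_logger(verbose, log_file_path=None, logger=None):
    """Configure a logger roughly as described for config_root_logger.

    The logger argument is an assumption of this sketch; None means the root logger.
    """
    logger = logging.getLogger() if logger is None else logger
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    logger.setLevel(logging.DEBUG)
    terminal_level = logging.DEBUG if verbose else logging.INFO

    # INFO messages go to stdout.
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(terminal_level)
    stdout_handler.addFilter(lambda record: record.levelno == logging.INFO)
    stdout_handler.setFormatter(TerminalFormatter())
    logger.addHandler(stdout_handler)

    # Every other level goes to stderr.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(terminal_level)
    stderr_handler.addFilter(lambda record: record.levelno != logging.INFO)
    stderr_handler.setFormatter(TerminalFormatter())
    logger.addHandler(stderr_handler)

    # The log file always receives DEBUG and higher, with timestamp and module.
    if log_file_path is not None:
        file_handler = logging.FileHandler(log_file_path)
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(levelname)s - %(name)s - %(message)s'))
        logger.addHandler(file_handler)
    return logger
```

Loggers obtained later with logging.getLogger(name) propagate their records to the root logger, which is why configuring the root once is enough.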
-
class
yank.utils.
CombinatorialLeaf
[source]¶ List type that can be expanded combinatorially in CombinatorialTree.
-
append
(object) → None -- append object to end¶
-
clear
() → None -- remove all items from L¶
-
copy
() → list -- a shallow copy of L¶
-
count
(value) → integer -- return number of occurrences of value¶
-
extend
(iterable) → None -- extend list by appending elements from the iterable¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
insert
()¶ L.insert(index, object) – insert object before index
-
pop
([index]) → item -- remove and return item at index (default last).¶ Raises IndexError if list is empty or index is out of range.
-
remove
(value) → None -- remove first occurrence of value.¶ Raises ValueError if the value is not present.
-
reverse
()¶ L.reverse() – reverse IN PLACE
-
sort
(key=None, reverse=False) → None -- stable sort *IN PLACE*¶
-
-
class
yank.utils.
CombinatorialTree
(dictionary)[source]¶ A tree that can be expanded in a combinatorial fashion.
Each tree node with its subnodes is represented as a nested dictionary. Nodes can be accessed through their specific “path” (i.e. the list of the nested dictionary keys that lead to the node value).
Values of leaf nodes that are list-like objects can be expanded combinatorially: it is possible to iterate over all the trees generated by picking, for each such leaf node, a single value from its list (see Examples).
Examples
Set an arbitrary nested path
>>> tree = CombinatorialTree({'a': {'b': 2}})
>>> path = ('a', 'b')
>>> tree[path]
2
>>> tree[path] = 3
>>> tree[path]
3
Paths can also be accessed with the usual dict syntax
>>> tree['a']['b']
3
Deletion of a node leaves an empty dict!
>>> del tree[path]
>>> print(tree)
{'a': {}}
Expand all possible combinations of a tree. The iterator returns a dict, not another CombinatorialTree object.
>>> import pprint  # pprint sorts the dictionary by key before printing
>>> tree = CombinatorialTree({'a': 1, 'b': CombinatorialLeaf([1, 2]),
...                           'c': {'d': CombinatorialLeaf([3, 4])}})
>>> for t in tree:
...     pprint.pprint(t)
{'a': 1, 'b': 1, 'c': {'d': 3}}
{'a': 1, 'b': 2, 'c': {'d': 3}}
{'a': 1, 'b': 1, 'c': {'d': 4}}
{'a': 1, 'b': 2, 'c': {'d': 4}}
Expand all possible combinations and assign unique names
>>> for name, t in tree.named_combinations(separator='_', max_name_length=5):
...     print(name)
3_1
3_2
4_1
4_2
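The combinatorial expansion itself can be illustrated without YANK. This is only a sketch of the idea under stated assumptions: Leaf, find_leaves and expand are hypothetical names, Leaf stands in for CombinatorialLeaf, and the order in which combinations are produced may differ from the one YANK uses.

```python
import copy
import itertools


class Leaf(list):
    """Hypothetical stand-in for CombinatorialLeaf: a list marked for expansion."""


def find_leaves(tree, path=()):
    """Yield (path, leaf) for every Leaf found in a nested dict."""
    for key, value in tree.items():
        if isinstance(value, Leaf):
            yield path + (key,), value
        elif isinstance(value, dict):
            yield from find_leaves(value, path + (key,))


def expand(tree):
    """Yield one plain nested dict per combination of leaf values."""
    leaves = list(find_leaves(tree))
    paths = [leaf_path for leaf_path, _ in leaves]
    for values in itertools.product(*(leaf for _, leaf in leaves)):
        combination = copy.deepcopy(tree)
        for leaf_path, value in zip(paths, values):
            # Walk down to the parent node, then replace the leaf by one value.
            node = combination
            for key in leaf_path[:-1]:
                node = node[key]
            node[leaf_path[-1]] = value
        yield combination


tree = {'a': 1, 'b': Leaf([1, 2]), 'c': {'d': Leaf([3, 4])}}
combinations = list(expand(tree))
```

Two leaves with two values each give four combinations; a tree without leaves yields a single copy of itself.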
-
named_combinations
(separator, max_name_length)[source]¶ Generator to iterate over all possible combinations of trees and assign them unique names.
The names are generated by gluing together the first letters of the values of the combinatorial leaves only, separated by the given separator. Special characters in the values are ignored, so only letters, numbers and the separator are found in the generated names. Values representing paths to existing files contribute to the name only with their file name, without extension.
The iterator yields tuples of (name, dict), not other CombinatorialTree objects. If there is only a single combination, an empty string is returned for the name.
Parameters: - separator : str
The string used to separate the words in the name.
- max_name_length : int
The maximum length of the generated names, excluding disambiguation number.
Yields: - name : str
Unique name of the combination. Empty string returned if there is only one combination
- combination : dict
Combination of leaves that was used to create the name
-
expand_id_nodes
(id_nodes_path, update_nodes_paths)[source]¶ Return a new CombinatorialTree with id-bearing nodes expanded and updated in the rest of the script.
Parameters: - id_nodes_path : tuple of str
The path to the parent node containing ids.
- update_nodes_paths : list of tuple of str
A list of all the paths referring to the ids expanded. The string ‘*’ means every node.
Returns: - expanded_tree : CombinatorialTree
The tree with id nodes expanded.
Examples
>>> d = {'molecules':
...          {'mol1': {'mol_value': CombinatorialLeaf([1, 2])}},
...      'systems':
...          {'sys1': {'molecules': 'mol1'},
...           'sys2': {'prmtopfile': 'mysystem.prmtop'}}}
>>> update_nodes_paths = [('systems', '*', 'molecules')]
>>> t = CombinatorialTree(d).expand_id_nodes('molecules', update_nodes_paths)
>>> t['molecules'] == {'mol1_1': {'mol_value': 1}, 'mol1_2': {'mol_value': 2}}
True
>>> t['systems'] == {'sys1': {'molecules': CombinatorialLeaf(['mol1_2', 'mol1_1'])},
...                  'sys2': {'prmtopfile': 'mysystem.prmtop'}}
True
-
clear
() → None. Remove all items from D.¶
-
get
(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised.
-
popitem
() → (k, v), remove and return some (key, value) pair¶ as a 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
-
update
([E, ]**F) → None. Update D from mapping/iterable E and F.¶ If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
-
values
() → an object providing a view on D's values¶
-
-
yank.utils.
get_data_filename
(relative_path)[source]¶ Get the full path to one of the reference files shipped for testing.
In the source distribution, these files are in examples/*/, but on installation they are moved to somewhere in the user’s Python site-packages directory.
Parameters: - relative_path : str
Name of the file to load, with respect to the yank egg folder which is typically located at something like
~/anaconda/lib/python3.6/site-packages/yank-*.egg/examples/
Returns: - fn : str
Resource Filename
-
yank.utils.
find_phases_in_store_directory
(store_directory)[source]¶ Build a list of phases in the store directory.
Parameters: - store_directory : str
The directory to examine for stored phase NetCDF data files.
Returns: - phases : dict of str
A dictionary phase_name -> file_path that maps each phase name to its NetCDF file path.
-
yank.utils.
update_nested_dict
(original, updated)[source]¶ Return an updated copy of a (possibly) nested dict of arbitrary depth.
Parameters: - original : dict
Original dict which we want to update, can contain nested dicts
- updated : dict
Dictionary of updated values to place in original
Returns: - new : dict
Copy of original with values updated from updated
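A recursive merge is one plausible way to obtain this behaviour. The sketch below is written under that assumption and is not taken from YANK's source; sketch_update_nested_dict is a hypothetical name, and the rule that nested dicts are merged key by key rather than replaced wholesale is an assumption.

```python
import copy
from collections.abc import Mapping


def sketch_update_nested_dict(original, updated):
    """Return a deep copy of original with the values in updated merged in."""
    new = copy.deepcopy(original)
    for key, value in updated.items():
        if isinstance(value, Mapping) and isinstance(new.get(key), Mapping):
            # Both sides are dicts: merge them instead of overwriting.
            new[key] = sketch_update_nested_dict(new[key], value)
        else:
            new[key] = copy.deepcopy(value)
    return new


original = {'a': {'b': 1, 'c': 2}, 'd': 3}
new = sketch_update_nested_dict(original, {'a': {'b': 10}, 'e': 5})
```

Here new holds the merged values while original is left unmodified, matching the "copy" wording above.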
-
yank.utils.
underscore_to_camelcase
(underscore_str)[source]¶ Convert the given string from underscore_case to camelCase.
Underscores at the beginning or at the end of the string are left untouched. All underscores in the middle of the string are removed.
Parameters: - underscore_str : str
String in underscore_case to convert to camelCase style.
Returns: - camelcase_str : str
String in camelCase style.
Examples
>>> underscore_to_camelcase('__my___variable_')
'__myVariable_'
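The conversion rule can be sketched in a few lines of plain Python. This is a hypothetical re-implementation (sketch_underscore_to_camelcase is not a YANK name) written only to reproduce the documented example; YANK's own code may handle edge cases differently.

```python
def sketch_underscore_to_camelcase(underscore_str):
    """Convert underscore_case to camelCase, keeping outer underscores."""
    body = underscore_str.lstrip('_')
    prefix = underscore_str[:len(underscore_str) - len(body)]
    core = body.rstrip('_')
    suffix = body[len(core):]
    # Runs of inner underscores produce empty strings, which are dropped.
    words = [word for word in core.split('_') if word]
    if not words:
        return underscore_str
    camel = words[0] + ''.join(word[0].upper() + word[1:] for word in words[1:])
    return prefix + camel + suffix


result = sketch_underscore_to_camelcase('__my___variable_')
```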
-
yank.utils.
camelcase_to_underscore
(camelcase_str)[source]¶ Convert the given string from camelCase to underscore_case.
Underscores at the beginning and end of the string are preserved. All capital letters are cast to lower case.
Parameters: - camelcase_str : str
String in camelCase to convert to underscore style.
Returns: - underscore_str : str
String in underscore style.
Examples
>>> camelcase_to_underscore('myVariable')
'my_variable'
>>> camelcase_to_underscore('__my_Variable_')
'__my__variable_'
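Both documented outputs are consistent with a rule that prefixes every capital letter with an underscore and lowers it. The sketch below assumes exactly that rule; sketch_camelcase_to_underscore is a hypothetical name and this is not necessarily how YANK implements the function.

```python
import re


def sketch_camelcase_to_underscore(camelcase_str):
    """Replace each capital letter X with '_x'; existing underscores are kept."""
    return re.sub(r'[A-Z]', lambda match: '_' + match.group(0).lower(), camelcase_str)


first = sketch_camelcase_to_underscore('myVariable')
second = sketch_camelcase_to_underscore('__my_Variable_')
```

Because existing underscores are kept, '_V' becomes '__v', which explains the double underscore in the second documented example.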
-
yank.utils.
quantity_from_string
(expression, compatible_units=None)[source]¶ Create a Quantity object from a string expression.
All the functions in the standard module math are available together with most of the methods inside the simtk.unit module.
Parameters: - expression : str
The mathematical expression, given as a string, from which to build the Quantity.
- compatible_units : simtk.unit.Unit, optional
If given, the result is checked for compatibility against the specified units, and an exception raised if not compatible.
Note: The output is not converted to compatible_units; they are only used to validate the units of the input.
Returns: - quantity
The result of the evaluated expression.
Raises: - TypeError
If compatible_units is given and the quantity in expression is either unit-less or has incompatible units.
Examples
>>> expr = '4 * kilojoules / mole'
>>> quantity_from_string(expr)
Quantity(value=4.000000000000002, unit=kilojoule/mole)
>>> expr = '1.0*second'
>>> quantity_from_string(expr, compatible_units=unit.femtosecond)
Quantity(value=1.0, unit=second)
-
yank.utils.
get_keyword_args
(function, try_mro_from_class=None)[source]¶ Inspect function signature and return keyword args with their default values.
Parameters: - function : callable
The function to interrogate.
- try_mro_from_class : any Class or None
Try to trace the method resolution order (MRO) of function by inferring a method stack from the supplied class. The signature of the function is checked in every class up the MRO so long as there exists a **kwargs in the method signature. This setting will not yield the expected results in every case, for instance if the method does not call super(), or if the super class uses a different function name. In the case of conflicting keywords, the lower MRO function is preferred.
Returns: - kwargs : dict
A dictionary {'keyword argument': 'default value'}. The arguments of the function that do not have a default value will not be included.
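The basic case, without MRO tracing, can be sketched with the standard inspect module. sketch_get_keyword_args is a hypothetical name, and the sketch deliberately leaves out the try_mro_from_class behaviour described above.

```python
import inspect


def sketch_get_keyword_args(function):
    """Return {parameter name: default} for parameters that have a default."""
    signature = inspect.signature(function)
    return {name: parameter.default
            for name, parameter in signature.parameters.items()
            if parameter.default is not inspect.Parameter.empty}


def example(a, b=2, *args, c='x', **kwargs):
    """Toy function used only to exercise the sketch."""


defaults = sketch_get_keyword_args(example)
```

Positional parameters without defaults, *args and **kwargs report inspect.Parameter.empty as their default, so they are filtered out.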
-
yank.utils.
validate_parameters
(parameters, template_parameters, check_unknown=False, process_units_str=False, float_to_int=False, ignore_none=True, special_conversions=None)[source]¶ Utility function for parameters and options validation.
Use the given template to filter the given parameters and infer their expected types. Perform various automatic conversions when requested. If the template is None, the parameter to validate is not checked for type compatibility.
Parameters: - parameters : dict
The parameters to validate.
- template_parameters : dict
The template used to filter the parameters and infer the types.
- check_unknown : bool
If True, an exception is raised when parameters contain a key that is not contained in template_parameters.
- process_units_str : bool
If True, the function will attempt to convert the strings whose template type is simtk.unit.Quantity.
- float_to_int : bool
If True, floats in parameters whose template type is int are truncated.
- ignore_none : bool
If True, the function does not process parameters whose value is None.
- special_conversions : dict
Contains a converter function with signature convert(arg) that must be applied to the parameters specified by the dictionary key.
Returns: - validate_par : dict
The converted parameters that are contained both in parameters and template_parameters.
Raises: - TypeError
If check_unknown is True and there are parameters not in template_parameters.
- ValueError
If a parameter has a type incompatible with its template parameter.
Examples
Create the template parameters
>>> template_pars = dict()
>>> template_pars['bool'] = True
>>> template_pars['int'] = 2
>>> template_pars['unspecified'] = None  # this won't be checked for type compatibility
>>> template_pars['to_be_converted'] = [1, 2, 3]
>>> template_pars['length'] = 2.0 * unit.nanometers
Now the parameters to validate
>>> input_pars = dict()
>>> input_pars['bool'] = None  # this will be skipped with ignore_none=True
>>> input_pars['int'] = 4.3  # this will be truncated to 4 with float_to_int=True
>>> input_pars['unspecified'] = 'input'  # this can be of any type since the template is None
>>> input_pars['to_be_converted'] = {'key': 3}
>>> input_pars['length'] = '1.0*nanometers'
>>> input_pars['unknown'] = 'test'  # this will be silently filtered if check_unknown=False
Validate the parameters
>>> valid = validate_parameters(input_pars, template_pars, process_units_str=True,
...                             float_to_int=True, special_conversions={'to_be_converted': list})
>>> import pprint
>>> pprint.pprint(valid)
{'bool': None,
 'int': 4,
 'length': Quantity(value=1.0, unit=nanometer),
 'to_be_converted': ['key'],
 'unspecified': 'input'}
-
class
yank.utils.
Mol2File
(file_path)[source]¶ Wrapper of ParmEd mol2 parser for easy manipulation of mol2 files.
This is not efficient as every operation accesses the file. The purpose of this class is simply to provide a shortcut to read and write the mol2 file with a one-liner. If you need to do multiple operations before saving the file, use ParmEd directly.
This works only for single-structure mol2 files.
Parameters: - file_path : str
Path to the mol2 file.
Attributes: resname
The residue name of the first molecule found in the mol2 file.
resnames
Iterate over the names of all the molecules in the file (read-only).
net_charge
Net charge of the file as a float (read-only).
-
resname
¶ The residue name of the first molecule found in the mol2 file.
This assumes that each molecule in the mol2 file has a single residue name.
-
resnames
¶ Iterate over the names of all the molecules in the file (read-only).
This assumes that each molecule in the mol2 file has a single residue name.
-
net_charge
¶ Net charge of the file as a float (read-only).
-
yank.utils.
is_modeller_installed
()[source]¶ Check if the Salilab Modeller tool is installed and licensed.
If Modeller is not installed and licensed, returns False.
Returns: - installed : bool
True if Modeller is installed and licensed, False otherwise.
-
yank.utils.
is_openeye_installed
(oetools=('oechem', 'oequacpac', 'oeiupac', 'oeomega'))[source]¶ Check if the given OpenEye tools are installed and licensed.
If the OpenEye toolkit is not installed, returns False.
Parameters: - oetools : str or iterable of strings, Optional, Default: (‘oechem’, ‘oequacpac’, ‘oeiupac’, ‘oeomega’)
Set of tools to check by their string name. Defaults to the complete set that YANK could use, depending on feature requested.
Only checks the subset of tools if passed. Also accepts a single tool to check as a string instead of an iterable of length 1.
Returns: - all_installed : bool
True if all tools in oetools are installed and licensed, False otherwise.
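The "installed" half of such a check can be sketched with importlib. This only tests whether submodules can be located; it does not verify licensing, which is specific to the OpenEye toolkit and is not attempted here. The function name, the package argument and the assumption that tools live under an openeye package are all illustrative.

```python
import importlib.util


def sketch_tools_importable(tools=('oechem', 'oequacpac', 'oeiupac', 'oeomega'),
                            package='openeye'):
    """Return True if every package.tool module can be located (no license check)."""
    if isinstance(tools, str):
        # Accept a single tool name as well as an iterable of names.
        tools = (tools,)
    try:
        if importlib.util.find_spec(package) is None:
            return False
        return all(importlib.util.find_spec(package + '.' + tool) is not None
                   for tool in tools)
    except ImportError:
        return False


# Exercised here on a standard-library package so the sketch runs anywhere.
found = sketch_tools_importable(('decoder', 'encoder'), package='json')
```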
-
yank.utils.
load_oe_molecules
(file_path, molecule_idx=None)[source]¶ Read one or more molecules from a file.
Requires OpenEye Toolkit. Several formats are supported (including mol2, sdf and pdb).
Parameters: - file_path : str
Complete path to the file on disk.
- molecule_idx : None or int, optional, default: None
Index of the molecule on the file. If None, all of them are returned.
Returns: - molecule : openeye.oechem.OEMol or list of openeye.oechem.OEMol
The molecules stored in the file. If molecule_idx is specified only one molecule is returned, otherwise a list (even if the file contains only 1 molecule).
-
yank.utils.
write_oe_molecule
(oe_mol, file_path, mol2_resname=None)[source]¶ Write all conformations to a file, automatically detecting the format.
Requires OpenEye Toolkit
Parameters: - oe_mol : OpenEye Molecule
Molecule to write to file
- file_path : str
Complete path to file with filename and extension
- mol2_resname : None or str, Optional, Default: None
Name to replace the residue name if the file is a .mol2 file. Requires file_path to match *mol2
-
yank.utils.
get_oe_mol_positions
(molecule, conformer_idx=0)[source]¶ Get the molecule positions from an OpenEye Molecule
Requires OpenEye Toolkit
Parameters: - molecule : OpenEye Molecule
Molecule to extract coordinates from
- conformer_idx : int, Optional, Default: 0
Index of the conformer on the file, leave as 0 to not use
-
class
yank.utils.
TLeap
[source]¶ Programmatic interface to write and run AmberTools’ tLEaP scripts.
To avoid problems with special characters in file paths, the class runs the tleap script in a temporary folder with hardcoded names for files and then copies the output files to their respective folders.
Attributes: script
Complete and return the finalized script string
-
script
¶ Complete and return the finalized script string
Adds a quit command to the end of the script.
-
add_commands
(*args)[source]¶ Append commands to the script
Parameters: - args : iterable of strings
Individual commands to add to the script written in full as strings. Newline characters are added after each command
-
load_parameters
(*args)[source]¶ Load the LEaP parameters into the working TLEaP script if not already loaded
This adds to the script
- Uses loadAmberParams for frcmod.* files
- Uses loadOff for *.off and *.lib files
- Uses source for other files.
Parameters: - args : iterable of strings
File names for each type of leap file that can be loaded. Method to load them is automatically determined from file extension or base name
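The dispatch rules listed above can be written down as a small function. This sketch follows only the three rules stated here; sketch_leap_load_command is a hypothetical helper, not part of the TLeap class, and it skips the "if not already loaded" bookkeeping.

```python
import os


def sketch_leap_load_command(file_path):
    """Pick the LEaP command for a parameter file from its base name or extension."""
    base_name = os.path.basename(file_path)
    extension = os.path.splitext(base_name)[1]
    if base_name.startswith('frcmod.'):
        return 'loadAmberParams ' + file_path
    if extension in ('.off', '.lib'):
        return 'loadOff ' + file_path
    # Anything else is treated as a LEaP script to source.
    return 'source ' + file_path


commands = [sketch_leap_load_command(name)
            for name in ('frcmod.ionsjc_tip3p', 'ligand.lib', 'leaprc.gaff')]
```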
-
load_unit
(unit_name, file_path)[source]¶ Load a Unit into LEaP, this is typically a molecule or small complex.
This adds to the script
Accepts *.mol2 or *.pdb files
Parameters: - unit_name : str
Name of the unit as it should be represented in LEaP
- file_path : str
Full file path with extension of the file to read into LEaP as a new unit
-
combine
(unit_name, *args)[source]¶ Combine units in LEaP
This adds to the script
Parameters: - unit_name : str
Name of LEaP unit to assign the combination to
- args : iterable of strings
Names of the LEaP units to combine into a single unit called unit_name
-
add_ions
(unit_name, ion, num_ions=0, replace_solvent=False)[source]¶ Add ions to a unit in LEaP
This adds to the script
Parameters: - unit_name : str
Name of the existing LEaP unit which Ions will be added into
- ion : str
LEaP recognized name of ion to add
- num_ions : int, optional
Number of ions of type ion to add to unit_name. If 0, the unit is neutralized (default is 0).
- replace_solvent : bool, optional
If True, ions will replace solvent molecules rather than being added.
-
solvate
(unit_name, solvent_model, clearance)[source]¶ Solvate a unit in LEaP isometrically
This adds to the script
Parameters: - unit_name : str
Name of the existing LEaP unit which will be solvated
- solvent_model : str
LEaP recognized name of the solvent model to use, e.g. “TIP3PBOX”
- clearance : float
Add solvent up to clearance distance away from the unit_name (radial)
-
save_unit
(unit_name, output_path)[source]¶ Write a LEaP unit to file.
Accepts either *.prmtop, *.inpcrd, or *.pdb files
This adds to the script
Parameters: - unit_name : str
Name of the unit to save
- output_path : str
Full file path with extension to save. Outputs with multiple files (e.g. Amber Parameters) have their names derived from this instead
-
transform
(unit_name, transformation)[source]¶ Transformation is an array-like representing the affine transformation matrix.
-
yank.utils.
generate_development_feature
(feature_dict)[source]¶ Helper function for generating a class which can flag classes, tests, and functions that are developmental.
The output class is not quite a mixin because it has to be the first class due to the __init__ flag
Parameters: - feature_dict : dict
Dictionary of form “test_string : pre-computed test” where “test_string” is just an identifier and “pre-computed test” is a boolean-like object, usually the result of some test. All pre-computed tests will be cast to bool
Returns: - DevelopmentFeature : class
Class which checks against the feature_dict and can be used in several ways:
- Class Inherited: When inherited as a class, calling its __init__() will raise an error if features are not met
- True/False check function: Calling dev_validate() will return a bool indicating whether all features are true.
- True/False decorator: When decorating a function with dev_validation, the function will only be called if dev_validate() would return True, otherwise it simply returns. Helpful for running tests.
- Dict of reasons: Property dev_reasons will return the dictionary of failed dependencies
- Dict of all: Property dev_features will return the dictionary of features it expects and their tests
With the exception of __init__(), all other functions and properties are class-based and do not require instantiation. Function names are all given the dev_ prefix to avoid clashes with other names, as part of its pseudo-mixin properties