Analyze

YANK-specific analysis tools for YANK simulations from the yank.yank.AlchemicalPhase classes.

Extends classes from the MultiStateAnalyzer package to include the

class yank.analyze.ExperimentAnalyzer(store_directory, **analyzer_kwargs)

Semi-automated YANK Experiment analysis with serializable data.

This class is designed to replace the older analyze_directory functions by providing a common analysis data interface that other classes and methods can draw on. It is also designed to semi-automate the combination of multi-phase data.

Each of the main methods fetches the data from each phase and returns it to the user as a dictionary. The complete data set can also be dumped to serialized YAML files.

Each function documents its output data structure; entries surrounded by curly braces ({ }) indicate variables that change per experiment, often the data itself.

Output dictionary is of the form:

yank_version: {YANK Version}
phase_names: {Name of each phase, depends on simulation type}
general: {See :func:`get_general_simulation_data`}
equilibration: {See :func:`get_equilibration_data`}
mixing: {See :func:`get_mixing_data`}
free_energy: {See :func:`get_experiment_free_energy_data`}

Parameters:
    store_directory : string
        Location where the analysis.yaml file is and where the NetCDF files are.
    **analyzer_kwargs
        Keyword arguments to pass to the analyzer class. Quantities can be passed as strings.

Examples

Start with an experiment (adapted from the yank.experiment.ExperimentBuilder example):

>>> import os
>>> import tempfile
>>> import textwrap
>>> import yank.utils
>>> from yank.experiment import ExperimentBuilder
>>> setup_dir = yank.utils.get_data_filename(os.path.join('..', 'examples',
...                                          'p-xylene-implicit', 'input'))
>>> pxylene_path = os.path.join(setup_dir, 'p-xylene.mol2')
>>> lysozyme_path = os.path.join(setup_dir, '181L-pdbfixer.pdb')
>>> # Keep the output directory after this block so it can be analyzed below
>>> tmp_dir = tempfile.mkdtemp()
>>> yaml_content = '''
... ---
... options:
...   default_number_of_iterations: 1
...   output_dir: {}
... molecules:
...   T4lysozyme:
...     filepath: {}
...   p-xylene:
...     filepath: {}
...     antechamber:
...       charge_method: bcc
... solvents:
...   vacuum:
...     nonbonded_method: NoCutoff
... systems:
...   my_system:
...     receptor: T4lysozyme
...     ligand: p-xylene
...     solvent: vacuum
...     leap:
...       parameters: [leaprc.gaff, leaprc.ff14SB]
... protocols:
...   absolute-binding:
...     complex:
...       alchemical_path:
...         lambda_electrostatics: [1.0, 0.9, 0.8, 0.6, 0.4, 0.2, 0.0]
...         lambda_sterics: [1.0, 0.9, 0.8, 0.6, 0.4, 0.2, 0.0]
...     solvent:
...       alchemical_path:
...         lambda_electrostatics: [1.0, 0.8, 0.6, 0.3, 0.0]
...         lambda_sterics: [1.0, 0.8, 0.6, 0.3, 0.0]
... experiments:
...   system: my_system
...   protocol: absolute-binding
... '''.format(tmp_dir, lysozyme_path, pxylene_path)
>>> yaml_builder = ExperimentBuilder(textwrap.dedent(yaml_content))
>>> yaml_builder.run_experiments()

Now analyze the experiment

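>>> import os
>>> from yank.analyze import ExperimentAnalyzer
>>> # A single (unnamed) experiment is written under the default experiments_dir
>>> exp_analyzer = ExperimentAnalyzer(os.path.join(tmp_dir, 'experiments'))
>>> analysis_data = exp_analyzer.auto_analyze()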
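The analysis dictionary contains numpy arrays (for example the equilibration and mixing arrays) and unit-bearing quantities, which a plain YAML dumper will generally refuse. The following is a minimal sketch, assuming you want to dump analysis_data yourself rather than through YANK's own serializer; `to_builtin` is a hypothetical helper, not part of YANK. It detects array-like values by duck typing (anything with a `tolist()` method) and stringifies anything else it does not recognize, so no entry is silently dropped.

```python
from collections.abc import Mapping


def to_builtin(obj):
    """Recursively convert nested analysis output into YAML/JSON-friendly builtins.

    Hypothetical helper. numpy arrays and numpy scalars are recognized by duck
    typing (they expose ``tolist()``), so numpy itself is not imported here.
    """
    if isinstance(obj, Mapping):
        # Dictionary keys become strings so any serializer accepts them
        return {str(key): to_builtin(value) for key, value in obj.items()}
    if hasattr(obj, 'tolist'):
        # ndarray -> nested lists, numpy scalar -> plain Python scalar.
        # Checked before float/int because numpy.float64 subclasses float.
        return to_builtin(obj.tolist())
    if isinstance(obj, (list, tuple)):
        return [to_builtin(value) for value in obj]
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Objects that are neither containers, array-like, nor plain scalars
    return str(obj)
```

With PyYAML installed, the converted data can then be written with, for example, `yaml.safe_dump(to_builtin(analysis_data))`.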

Attributes:

use_full_trajectory : bool. Analyze with subsampled or complete trajectory
nphases : int. Number of phases detected
phases_names : list of phase names. Used as keys on all attributes below
signs : dict of str. Sign assigned to each phase
analyzers : dict of YankPhaseAnalyzer
iterations : dict of int. Number of maximum iterations in each phase
u_ns : dict of np.ndarray. Timeseries of each phase
nequils : dict of int. Number of equilibrium iterations in each phase
g_ts : dict of int. Subsample rate past nequils in each phase
Neff_maxs : dict of int. Number of effective samples in each phase

get_general_simulation_data()

General-purpose simulation data on the number of iterations, atoms, replicas, and states in each phase. This simply collects that data in a regular, formatted pattern.

Output is of the form:

{for phase_name in phase_names}
    iterations : {int}
    natoms : {int}
    nreplicas : {int}
    nstates : {int}

Returns:
    general_data : dict
        General simulation data by phase.
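As an illustration of the per-phase structure documented above, the sketch below renders a general_data dictionary as a small text table. The function name and the sample values in the usage note are made up for this example; only the four keys come from the documented output.

```python
def format_general_data(general_data):
    """Render {phase_name: {iterations, natoms, nreplicas, nstates}} as a text table.

    Hypothetical helper; assumes every phase dict carries the four keys
    documented for get_general_simulation_data().
    """
    columns = ['iterations', 'natoms', 'nreplicas', 'nstates']
    header = ['phase'] + columns
    rows = [[phase] + [str(data[col]) for col in columns]
            for phase, data in general_data.items()]
    table = [header] + rows
    # Each column is as wide as its longest entry, header included
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = ['  '.join(cell.ljust(width) for cell, width in zip(row, widths))
             for row in table]
    return '\n'.join(line.rstrip() for line in lines)
```

For example, `format_general_data({'complex': {'iterations': 1, 'natoms': 2000, 'nreplicas': 7, 'nstates': 7}})` (illustrative numbers, not from a real run) returns a header row followed by one row for the complex phase.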

get_equilibration_data(discard_from_start=1)

Collect the data behind the equilibration scatter plots: the trend lines, correlation time, and number of effective samples.

Output is of the form:

{for phase_name in phase_names}
    discarded_from_start : {int}
    effective_samples : {float}
    subsample_rate : {float}
    iterations_considered : {1D np.ndarray of int}
    subsample_rate_by_iterations_considered : {1D np.ndarray of float}
    effective_samples_by_iterations_considered : {1D np.ndarray of float}
    count_total_equilibration_samples : {int}
    count_decorrelated_samples : {int}
    count_correlated_samples : {int}
    percent_total_equilibration_samples : {float}
    percent_decorrelated_samples : {float}
    percent_correlated_samples : {float}

Returns:
    equilibration_data : dict
        Dictionary with the equilibration data.
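The subsample_rate and effective_samples entries rest on the statistical inefficiency g of a timeseries, and the *_by_iterations_considered arrays read like a scan over candidate equilibration cutoffs. The sketch below illustrates that idea with numpy only: g is estimated from the integrated autocorrelation function, and the cutoff t0 is chosen to maximize (N - t0)/g, in the spirit of the automated equilibration detection in pymbar's timeseries module. It is not YANK's code: the function names, the `mintime=3` cutoff, the handling of constant series, and the mapping onto the output keys above are assumptions of this illustration.

```python
import numpy as np


def statistical_inefficiency(a_n, mintime=3):
    """Estimate g from the integrated autocorrelation function of a_n.

    The normalized autocorrelation C(t) is summed until it first drops to
    zero or below after ``mintime`` lags. g = 1 means uncorrelated samples.
    """
    a_n = np.asarray(a_n, dtype=float)
    n = a_n.size
    da = a_n - a_n.mean()
    sigma2 = np.mean(da * da)
    if sigma2 == 0.0:
        return 1.0  # choice made here: a constant series counts as uncorrelated
    g = 1.0
    for t in range(1, n - 1):
        c = np.sum(da[:n - t] * da[t:]) / ((n - t) * sigma2)
        if c <= 0.0 and t > mintime:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def choose_equilibration_cutoff(a_n, discard_from_start=1):
    """Scan candidate cutoffs t0 and keep the one with the most effective samples.

    Returns (t0, g, n_eff) at the optimum plus the per-candidate arrays
    (candidates, g for each candidate, effective samples for each candidate).
    """
    a_n = np.asarray(a_n, dtype=float)
    n = a_n.size
    candidates = np.arange(discard_from_start, n - 1)
    g_t = np.array([statistical_inefficiency(a_n[t0:]) for t0 in candidates])
    # Effective number of uncorrelated samples left after discarding t0 samples
    n_eff_t = (n - candidates) / g_t
    best = int(np.argmax(n_eff_t))
    return int(candidates[best]), float(g_t[best]), float(n_eff_t[best]), candidates, g_t, n_eff_t
```

Given a timeseries with an initial transient, the chosen t0 should land near the end of the transient, and the samples after t0 can then be subsampled every g iterations.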

get_mixing_data()

Get the state-diffusion mixing arrays.

Output is of the form:

{for phase_name in phase_names}
    transitions : {[nstates, nstates] np.ndarray of float}
    eigenvalues : {[nstates] np.ndarray of float}
    stat_inefficiency : {float}

Returns:
    mixing_data : dict
        Dictionary of mixing data.
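To show what the transitions, eigenvalues, and stat_inefficiency entries can be computed from, here is a minimal sketch. It assumes a state-index array where states[n, r] is the thermodynamic state of replica r at iteration n, symmetrizes the transition counts before row normalization, and uses g = (1 + mu_2) / (1 - mu_2) with mu_2 the second-largest eigenvalue. These are assumptions of the sketch; the estimator YANK (or openmmtools underneath it) actually uses may differ in detail, and `mixing_statistics` is a hypothetical name.

```python
import numpy as np


def mixing_statistics(states, n_states, n_equil=0):
    """Estimate the state transition matrix, its eigenvalues, and a statistical inefficiency.

    ``states[n, r]`` is the state index of replica r at iteration n.
    Iterations before ``n_equil`` are ignored.
    """
    states = np.asarray(states, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Count i -> j moves of every replica between consecutive iterations
    for n in range(n_equil, states.shape[0] - 1):
        for i, j in zip(states[n], states[n + 1]):
            counts[i, j] += 1
    # Symmetrized counts give a row-stochastic matrix obeying detailed balance
    symmetric = counts + counts.T
    transitions = np.zeros((n_states, n_states))
    for i in range(n_states):
        total = symmetric[i].sum()
        if total > 0:
            transitions[i] = symmetric[i] / total
        else:
            transitions[i, i] = 1.0  # state never visited: keep it absorbing
    # The matrix is similar to a symmetric one, so eigenvalues are real up to round-off
    eigenvalues = np.sort(np.linalg.eigvals(transitions).real)[::-1]
    mu2 = eigenvalues[1]
    g = float('inf') if mu2 >= 1.0 else (1.0 + mu2) / (1.0 - mu2)
    return transitions, eigenvalues, g
```

A second eigenvalue close to 1 signals slow diffusion through state space; replicas that never change state give mu_2 = 1 and an infinite statistical inefficiency.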

get_experiment_free_energy_data()

Get the YANK experiment free energy, broken down by phase and for the total experiment.

Output is of the form:

{for phase_name in phase_names}
    sign : {str of either '+' or '-'}
    kT : {units.quantity}
    free_energy_diff : {float (has units of kT)}
    free_energy_diff_error : {float (has units of kT)}
    free_energy_diff_standard_state_correction : {float (has units of kT)}
    enthalpy_diff : {float (has units of kT)}
    enthalpy_diff_error : {float (has units of kT)}
free_energy_diff : {float (has units of kT)}
free_energy_diff_error : {float (has units of kT)}
free_energy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
free_energy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
enthalpy_diff : {float (has units of kT)}
enthalpy_diff_error : {float (has units of kT)}
enthalpy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
enthalpy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}

Returns:
    free_energy_data : dict
        Dictionary of free energy data.
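The *_unit totals are described as "corrected for different phase kT", meaning each phase's value in kT has to be scaled by that phase's own kT before phases are combined. The sketch below shows one way to do that with plain floats (kT given directly in energy units such as kcal/mol, instead of a units.quantity). The sign convention, total = -sum(sign * (free_energy_diff + free_energy_diff_standard_state_correction) * kT), and the quadrature sum of independent errors are assumptions of this sketch, not taken from YANK's source. With '+' on a complex phase and '-' on a solvent phase it yields the solvent term minus the complex term.

```python
import math


def combine_phase_free_energies(phases):
    """Combine per-phase free energies (each in its own kT) into one energy and error.

    ``phases`` maps phase names to dicts with the keys 'sign', 'kT',
    'free_energy_diff', 'free_energy_diff_error', and
    'free_energy_diff_standard_state_correction' (kT as a plain float).
    """
    total = 0.0
    variance = 0.0
    for data in phases.values():
        sign = 1.0 if data['sign'] == '+' else -1.0
        kT = data['kT']  # energy per kT for this phase, e.g. kcal/mol
        # Convert this phase's dimensionless value to energy with its own kT
        total -= sign * (data['free_energy_diff']
                         + data['free_energy_diff_standard_state_correction']) * kT
        # Independent phases: errors add in quadrature
        variance += (data['free_energy_diff_error'] * kT) ** 2
    return total, math.sqrt(variance)
```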

auto_analyze()

Run all of the analyses above and collect their output.

Output is of the form:

yank_version: {YANK Version}
phase_names: {Name of each phase, depends on simulation type}
general:
    {for phase_name in phase_names}
        iterations : {int}
        natoms : {int}
        nreplicas : {int}
        nstates : {int}
equilibration:
    {for phase_name in phase_names}
        discarded_from_start : {int}
        effective_samples : {float}
        subsample_rate : {float}
        iterations_considered : {1D np.ndarray of int}
        subsample_rate_by_iterations_considered : {1D np.ndarray of float}
        effective_samples_by_iterations_considered : {1D np.ndarray of float}
        count_total_equilibration_samples : {int}
        count_decorrelated_samples : {int}
        count_correlated_samples : {int}
        percent_total_equilibration_samples : {float}
        percent_decorrelated_samples : {float}
        percent_correlated_samples : {float}
mixing:
    {for phase_name in phase_names}
        transitions : {[nstates, nstates] np.ndarray of float}
        eigenvalues : {[nstates] np.ndarray of float}
        stat_inefficiency : {float}
free_energy:
    {for phase_name in phase_names}
        sign : {str of either '+' or '-'}
        kT : {units.quantity}
        free_energy_diff : {float (has units of kT)}
        free_energy_diff_error : {float (has units of kT)}
        free_energy_diff_standard_state_correction : {float (has units of kT)}
        enthalpy_diff : {float (has units of kT)}
        enthalpy_diff_error : {float (has units of kT)}
    free_energy_diff : {float (has units of kT)}
    free_energy_diff_error : {float (has units of kT)}
    free_energy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
    free_energy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
    enthalpy_diff : {float (has units of kT)}
    enthalpy_diff_error : {float (has units of kT)}
    enthalpy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
    enthalpy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}

Returns:
    serialized_data : dict
        Dictionary of all the auto-analysis calls, organized by section headers. See the individual functions for the structure of each sub-dictionary.

Parameters:
    serialize_data : bool, Default: True
        Choose whether or not to serialize the data.
    serial_data_path : str, Optional
        Name of the serial data file. If not specified, the name will be {YAML file name}_analysis.pkl.
    analyzer_kwargs
        Additional keywords which will be fed into the YankMultiStateSamplerAnalyzer for each phase of each experiment.
Returns:
    serial_output : dict
        Dictionary of each experiment's output, of the format {exp_name: ExperimentAnalyzer.auto_analyze() for exp_name in ExperimentBuilder's Experiments}. The structure of each sub-dictionary is described in the ExperimentAnalyzer.auto_analyze() docstring.
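The serial_data_path default can be mimicked as follows. This is a minimal sketch with hypothetical helper names; it assumes the YAML extension is stripped before appending _analysis.pkl and that the file is written next to the YAML script, neither of which is confirmed here. It pickles the {exp_name: auto_analyze() output} dictionary as-is.

```python
import os
import pickle


def default_serial_data_path(yaml_path):
    """Hypothetical default: '<YAML path without extension>_analysis.pkl'."""
    root, _ = os.path.splitext(yaml_path)
    return root + '_analysis.pkl'


def write_serial_output(serial_output, yaml_path, serial_data_path=None):
    """Pickle the per-experiment analysis dictionary and return the path used."""
    path = serial_data_path or default_serial_data_path(yaml_path)
    with open(path, 'wb') as f:
        pickle.dump(serial_output, f)
    return path
```

Reading the results back is then a matter of `pickle.load` on the returned path.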

Parameters:
    ncfile : netCDF4.Dataset
        Open NetCDF file to analyze.
Returns:
    u_n : numpy array of numpy.float64
        u_n[n] is -log q(X_n)
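For a replica-exchange style run, the joint density of all replicas factorizes into one Boltzmann factor per replica, so -log q(X_n) is, up to an additive constant, the sum of each replica's reduced potential in the state it currently occupies. The sketch below computes that sum with numpy. The array layout is an assumption made for this example: energies[n, r, k] is the reduced potential of replica r's configuration evaluated in state k at iteration n, and states[n, r] is the state of replica r at iteration n. It does not read the NetCDF file; the function name is hypothetical.

```python
import numpy as np


def reduced_potential_timeseries(energies, states):
    """Return u_n[n] = sum_r energies[n, r, states[n, r]] (-log q up to a constant)."""
    energies = np.asarray(energies, dtype=float)  # shape (n_iterations, n_replicas, n_states)
    states = np.asarray(states, dtype=int)        # shape (n_iterations, n_replicas)
    # Select each replica's energy in its current state, then sum over replicas
    current = np.take_along_axis(energies, states[:, :, np.newaxis], axis=2)[:, :, 0]
    return current.sum(axis=1)
```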