Analyze¶
YANK Specific analysis tools for YANK simulations from the yank.yank.AlchemicalPhase
classes
Extends classes from the MultiStateAnalyzer package to include the
-
class
yank.analyze.
ExperimentAnalyzer
(store_directory, **analyzer_kwargs)[source]¶ Semi-automated YANK Experiment analysis with serializable data.
This class is designed to replace the older
analyze_directory
functions by providing a common analysis data interface which other classes and methods can draw on. This is designed to semi-automate the combination of multi-phase dataEach of the main methods fetches the data from each phase and returns them as a dictionary to the user. The total dump of data to serialized YAML files can also be done.
Each function documents what its output data structure and entries surrounded by curly braces (
{ }
) indicate variables which change per experiment, often the data.Output dictionary is of the form:
yank_version: {YANK Version} phase_names: {Name of each phase, depends on simulation type} general: {See :func:`get_general_simulation_data`} equilibration: {See :func:`get_equilibration_data`} mixing: {See :func:`get_mixing_data`} free_energy: {See :func:`get_experiment_free_energy_data`}
Parameters: - store_directory : string
Location where the analysis.yaml file is and where the NetCDF files are
- **analyzer_kwargs
Keyword arguments to pass to the analyzer class. Quantities can be passed as strings.
Examples
Start with an experiment (Running from the
yank.experiment.ExperimentBuilder
example)>>> import textwrap >>> import openmmtools as mmtools >>> import yank.utils >>> import yank.experiment.ExperimentBuilder as ExperimentBuilder >>> setup_dir = yank.utils.get_data_filename(os.path.join('..', 'examples', ... 'p-xylene-implicit', 'input')) >>> pxylene_path = os.path.join(setup_dir, 'p-xylene.mol2') >>> lysozyme_path = os.path.join(setup_dir, '181L-pdbfixer.pdb') >>> with mmtools.utils.temporary_directory() as tmp_dir: ... yaml_content = ''' ... --- ... options: ... default_number_of_iterations: 1 ... output_dir: {} ... molecules: ... T4lysozyme: ... filepath: {} ... p-xylene: ... filepath: {} ... antechamber: ... charge_method: bcc ... solvents: ... vacuum: ... nonbonded_method: NoCutoff ... systems: ... my_system: ... receptor: T4lysozyme ... ligand: p-xylene ... solvent: vacuum ... leap: ... parameters: [leaprc.gaff, leaprc.ff14SB] ... protocols: ... absolute-binding: ... complex: ... alchemical_path: ... lambda_electrostatics: [1.0, 0.9, 0.8, 0.6, 0.4, 0.2, 0.0] ... lambda_sterics: [1.0, 0.9, 0.8, 0.6, 0.4, 0.2, 0.0] ... solvent: ... alchemical_path: ... lambda_electrostatics: [1.0, 0.8, 0.6, 0.3, 0.0] ... lambda_sterics: [1.0, 0.8, 0.6, 0.3, 0.0] ... experiments: ... system: my_system ... protocol: absolute-binding ... '''.format(tmp_dir, lysozyme_path, pxylene_path) >>> yaml_builder = ExperimentBuilder(textwrap.dedent(yaml_content)) >>> yaml_builder.run_experiments()
Now analyze the experiment
>>> import os >>> exp_analyzer = ExperimentAnalyzer(os.path.join(tmp_dir, 'experiment')) >>> analysis_data = exp_analyzer.auto_analyze()
Attributes: - use_full_trajectory : bool. Analyze with subsampled or complete trajectory
- nphases : int. Number of phases detected
- phases_names : list of phase names. Used as keys on all attributes below
- signs : dict of str. Sign assigned to each phase
- analyzers : dict of YankPhaseAnalyzer
- iterations : dict of int. Number of maximum iterations in each phase
- u_ns : dict of np.ndarray. Timeseries of each phase
- nequils : dict of int. Number of equilibrium iterations in each phase
- g_ts : dict of int. Subsample rate past nequils in each phase
- Neff_maxs : dict of int. Number of effective samples in each phase
-
get_general_simulation_data
()[source]¶ General purpose simulation data on number of iterations, number of states, and number of atoms. This just prints out this data in a regular, formatted pattern.
Output is of the form:
{for phase_name in phase_names} iterations : {int} natoms : {int} nreplicas : {int} nstates : {int}
Returns: - general_data : dict
General simulation data by phase.
-
get_equilibration_data
(discard_from_start=1)[source]¶ Create the equilibration scatter plots showing the trend lines, correlation time, and number of effective samples
Output is of the form:
{for phase_name in phase_names} discarded_from_start : {int} effective_samples : {float} subsample_rate : {float} iterations_considered : {1D np.ndarray of int} subsample_rate_by_iterations_considered : {1D np.ndarray of float} effective_samples_by_iterations_considered : {1D np.ndarray of float} count_total_equilibration_samples : {int} count_decorrelated_samples : {int} count_correlated_samples : {int} percent_total_equilibration_samples : {float} percent_decorrelated_samples : {float} percent_correlated_samples : {float}
Returns: - equilibration_data : dict
Dictionary with the equilibration data
-
get_mixing_data
()[source]¶ Get state diffusion mixing arrays
Output is of the form:
{for phase_name in phase_names} transitions : {[nstates, nstates] np.ndarray of float} eigenvalues : {[nstates] np.ndarray of float} stat_inefficiency : {float}
Returns: - mixing_data : dict
Dictionary of mixing data
-
get_experiment_free_energy_data
()[source]¶ Get the free Yank Experiment free energy, broken down by phase and total experiment
Output is of the form:
{for phase_name in phase_names} sign : {str of either '+' or '-'} kT : {units.quantity} free_energy_diff : {float (has units of kT)} free_energy_diff_error : {float (has units of kT)} free_energy_diff_standard_state_correction : {float (has units of kT)} enthalpy_diff : {float (has units of kT)} enthalpy_diff_error : {float (has units of kT)} free_energy_diff : {float (has units of kT)} free_energy_diff_error : {float (has units of kT)} free_energy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} free_energy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} enthalpy_diff : {float (has units of kT)} enthalpy_diff_error : {float (has units of kT)} enthalpy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} enthalpy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
Returns: - free_energy_data : dict
Dictionary of free energy data
-
auto_analyze
()[source]¶ Run the analysis
Output is of the form:
yank_version: {YANK Version} phase_names: {Name of each phase, depends on simulation type} general: {for phase_name in phase_names} iterations : {int} natoms : {int} nreplicas : {int} nstates : {int} equilibration: {for phase_name in phase_names} discarded_from_start : {int} effective_samples : {float} subsample_rate : {float} iterations_considered : {1D np.ndarray of int} subsample_rate_by_iterations_considered : {1D np.ndarray of float} effective_samples_by_iterations_considered : {1D np.ndarray of float} count_total_equilibration_samples : {int} count_decorrelated_samples : {int} count_correlated_samples : {int} percent_total_equilibration_samples : {float} percent_decorrelated_samples : {float} percent_correlated_samples : {float} mixing: {for phase_name in phase_names} transitions : {[nstates, nstates] np.ndarray of float} eigenvalues : {[nstates] np.ndarray of float} stat_inefficiency : {float} free_energy: {for phase_name in phase_names} sign : {str of either '+' or '-'} kT : {units.quantity} free_energy_diff : {float (has units of kT)} free_energy_diff_error : {float (has units of kT)} free_energy_diff_standard_state_correction : {float (has units of kT)} enthalpy_diff : {float (has units of kT)} enthalpy_diff_error : {float (has units of kT)} free_energy_diff : {float (has units of kT)} free_energy_diff_error : {float (has units of kT)} free_energy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} free_energy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} enthalpy_diff : {float (has units of kT)} enthalpy_diff_error : {float (has units of kT)} enthalpy_diff_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT} enthalpy_diff_error_unit : {units.quantity compatible with energy/mole. Corrected for different phase kT}
Returns: - serialized_data : dict
Dictionary of all the auto-analysis calls organized by section headers. See each of the functions to see each of the sub-dictionary structures
-
yank.analyze.
get_analyzer
(file_base_path, **analyzer_kwargs)[source]¶ Utility function to convert storage file to a Reporter and Analyzer by reading the data on file
For now this is mostly placeholder functions since there is only the implemented
ReplicaExchangeAnalyzer
, but creates the API for the user to work with.Parameters: - file_base_path : string
Complete path to the storage file with filename and extension.
- **analyzer_kwargs
Keyword arguments to pass to the analyzer.
Returns: - analyzer : instance of implemented
Yank*Analyzer
Analyzer for the specific phase.
-
yank.analyze.
analyze_directory
(source_directory, **analyzer_kwargs)[source]¶ Analyze contents of store files to compute free energy differences.
This function is needed to preserve the old auto-analysis style of YANK. What it exactly does can be refined when more analyzers and simulations are made available. For now this function exposes the API.
Parameters: - source_directory : string
The location of the simulation storage files.
- **analyzer_kwargs
Keyword arguments to pass to the analyzer.
Returns: - analysis_data : dict
Dictionary containing all the automatic analysis data
-
yank.analyze.
print_analysis_data
(analysis_data, header=None)[source]¶ Helper function of printing the analysis data payload from a
ExperimentAnalyzer.auto_analyze()
callParameters: - analysis_data : dict
Output from
ExperimentAnalyzer.auto_analyze()
- header : str, Optional
Optional header string to print before the formatted helper, useful if you plan to make this call multiple times but want to divide the outputs.
-
class
yank.analyze.
MultiExperimentAnalyzer
(script, **builder_kwargs)[source]¶ Automatic Analysis tool for Multiple YANK Experiments from YAML file
This class takes in a YAML file, infers the experiments from expansion of all combinatorial options, then does the automatic analysis run from
ExperimentAnalyzer.auto_analyze()
yielding a final dictionary output.Parameters: - script : str or dict
Full path to the YAML file which made the YANK experiments. OR The loaded yaml content of said script.
- builder_kwargs
Additional keyword arguments which normally are passed to the
ExperimentBuilder
constructor. The experiments are not setup or built, only the output structure is referenced.
See also
-
run_all_analysis
(serialize_data=True, serial_data_path=None, **analyzer_kwargs)[source]¶ Run all the automatic analysis through the
ExperimentAnalyzer.auto_analyze()
Parameters: - serialize_data : bool, Default: True
Choose whether or not to serialize the data
- serial_data_path: str, Optional
Name of the serial data file. If not specified, name will be {YAML file name}_analysis.pkl`
- analyzer_kwargs
Additional keywords which will be fed into the
YankMultiStateSamplerAnalyzer
for each phase of each experiment.
Returns: - serial_output : dict
Dictionary of each experiment’s output of format {exp_name: ExperimentAnalyzer.auto_analyze() for exp_name in ExperimentBuilder’s Experiments} The sub-dictionary of each key can be seen in
ExperimentAnalyzer.auto_analyze()
docstring
See also
-
yank.analyze.
extract_u_n
(ncfile)[source]¶ Extract timeseries of u_n = - log q(X_n) from store file
where q(X_n) = pi_{k=1}^K u_{s_{nk}}(x_{nk})
with X_n = [x_{n1}, ..., x_{nK}] is the current collection of replica configurations s_{nk} is the current state of replica k at iteration n u_k(x) is the kth reduced potential
TODO: Figure out a way to remove this function
Parameters: - ncfile : netCDF4.Dataset
Open NetCDF file to analyze
Returns: - u_n : numpy array of numpy.float64
u_n[n] is -log q(X_n)
-
yank.analyze.
extract_trajectory
(nc_path, nc_checkpoint_file=None, state_index=None, replica_index=None, start_frame=0, end_frame=-1, skip_frame=1, keep_solvent=True, discard_equilibration=False, image_molecules=False)[source]¶ Extract phase trajectory from the NetCDF4 file.
Parameters: - nc_path : str
Path to the primary nc_file storing the analysis options
- nc_checkpoint_file : str or None, Optional
File name of the checkpoint file housing the main trajectory Used if the checkpoint file is differently named from the default one chosen by the nc_path file. Default: None
- state_index : int, optional
The index of the alchemical state for which to extract the trajectory. One and only one between state_index and replica_index must be not None (default is None).
- replica_index : int, optional
The index of the replica for which to extract the trajectory. One and only one between state_index and replica_index must be not None (default is None).
- start_frame : int, optional
Index of the first frame to include in the trajectory (default is 0).
- end_frame : int, optional
Index of the last frame to include in the trajectory. If negative, will count from the end (default is -1).
- skip_frame : int, optional
Extract one frame every skip_frame (default is 1).
- keep_solvent : bool, optional
If False, solvent molecules are ignored (default is True).
- discard_equilibration : bool, optional
If True, initial equilibration frames are discarded (see the method pymbar.timeseries.detectEquilibration() for details, default is False).
Returns: - trajectory: mdtraj.Trajectory
The trajectory extracted from the netcdf file.