Analyze¶
Analysis tools and module for YANK simulations. Provides programmatic and automatic “best practices” integration to determine free energy and other observables.
Fully extensible to support new samplers and observables.
-
yank.analyze.
generate_phase_name
(current_name, name_list)[source]¶ Provide a regular way to generate unique human-readable names from base names.
Given a base name and a list of existing names, a number will be appended to the base name until a unique string is generated.
Parameters: current_name : string
The base name you wish to ensure is unique. Numbers will be appended to this string until a unique string not in the name_list is provided
name_list : iterable of strings
The current_name, and its modifiers, are compared against this list until a unique string is found
Returns: name : string
Unique string derived from the current_name that is not in name_list. If the parameter current_name is not already in the name_list, then current_name is returned unmodified.
-
yank.analyze.
get_analyzer
(file_base_path)[source]¶ Utility function to convert storage file to a Reporter and Analyzer by reading the data on file
For now this is mostly placeholder functions since there is only the implemented
ReplicaExchangeAnalyzer
, but creates the API for the user to work with.Parameters: file_base_path : string
Complete path to the storage file with filename and extension.
Returns: analyzer : instance of implemented
YankPhaseAnalyzer
Analyzer for the specific phase.
-
yank.analyze.
get_decorrelation_time
(timeseries_to_analyze)[source]¶ Compute the decorrelation times given a timeseries.
See the
pymbar.timeseries.statisticalInefficiency
for full documentation
-
yank.analyze.
get_equilibration_data
(timeseries_to_analyze)[source]¶ Compute equilibration method given a timeseries
See the
pymbar.timeseries.detectEquilibration
function for full documentation
-
yank.analyze.
get_equilibration_data_per_sample
(timeseries_to_analyze, fast=True, nskip=1)[source]¶ Compute the correlation time and n_effective per sample.
This is exactly what
pymbar.timeseries.detectEquilibration
does, but returns the per sample dataSee the
pymbar.timeseries.detectEquilibration
function for full documentation
-
yank.analyze.
remove_unequilibrated_data
(data, number_equilibrated, axis)[source]¶ Remove the number_equilibrated samples from a dataset
Discards number_equilibrated number of indices from given axis
Parameters: data : np.array-like of any dimension length
This is the data which will be paired down
number_equilibrated : int
Number of indices that will be removed from the given axis, i.e. axis will be shorter by number_equilibrated
axis : int
Axis index along which to remove samples from. This supports negative indexing as well
Returns: equilibrated_data : ndarray
Data with the number_equilibrated number of indices removed from the beginning along axis
-
yank.analyze.
subsample_data_along_axis
(data, subsample_rate, axis)[source]¶ Generate a decorrelated version of a given input data and subsample_rate along a single axis.
Parameters: data : np.array-like of any dimension length
subsample_rate : float or int
Rate at which to draw samples. A sample is considered decorrelated after every ceil(subsample_rate) of indices along data and the specified axis
axis : int
axis along which to apply the subsampling
Returns: subsampled_data : ndarray of same number of dimensions as data
Data will be subsampled along the given axis
-
class
yank.analyze.
YankPhaseAnalyzer
(reporter, name=None, reference_states=(0, -1), analysis_kwargs=None)[source]¶ Analyzer for a single phase of a YANK simulation.
Uses the reporter from the simulation to determine the location of all variables.
To compute a specific observable in an implementation of this class, add it to the ObservableRegistry and then implement a
get_X
whereX
is the name of the observable you want to compute. See the ObservablesRegistry for information about formatting the observables.Analyzer works in units of kT unless specifically stated otherwise. To convert back to a unit set, just multiply by the .kT property.
Parameters: reporter : Reporter instance
Reporter from Repex which ties to the simulation data on disk.
name : str, Optional
Unique name you want to assign this phase, this is the name that will appear in
MultiPhaseAnalyzer
‘s. If not set, it will be given the arbitrary name “phase#” where # is an integer, chosen in order that it is assigned to theMultiPhaseAnalyzer
.reference_states : tuple of ints, length 2, Optional, Default: (0,-1)
Integers
i
andj
of the state that is used for reference in observables, “O”. These values are only used when reporting single numbers or combining observables throughMultiPhaseAnalyzer
(since the number of states between phases can be different). Calls to functions such asget_free_energy
in a single Phase results in the O being returned for all states.For O completely defined by the state itself (i.e. no differences between states, e.g. Temperature), only O[i] is used
For O where differences between states are required (e.g. Free Energy): O[i,j] = O[j] - O[i]
For O defined by the phase as a whole, the reference states are not needed.
analysis_kwargs : None or dict, optional
Dictionary of extra keyword arguments to pass into the analysis tool, typically MBAR. For instance, the initial guess of relative free energies to give to MBAR would be something like:
{'initial_f_k':[0,1,2,3]}
Attributes
name
User-readable string name of the phase observables
List of observables that the instanced analyzer can compute/fetch. mbar
MBAR object tied to this phase reference_states
Tuple of reference states i
andj
forMultiPhaseAnalyzer
instanceskT
Quantity of boltzmann constant times temperature of the phase in units of energy per mol reporter
Sampler Reporter tied to this object. -
name
¶ User-readable string name of the phase
-
observables
¶ List of observables that the instanced analyzer can compute/fetch.
This list is automatically compiled upon class initialization based on the functions implemented in the subclass
-
mbar
¶ MBAR object tied to this phase
-
reference_states
¶ Tuple of reference states
i
andj
forMultiPhaseAnalyzer
instances
-
kT
¶ Quantity of boltzmann constant times temperature of the phase in units of energy per mol
Allows conversion between dimensionless energy and unit bearing energy
-
reporter
¶ Sampler Reporter tied to this object.
-
analyze_phase
(*args, **kwargs)[source]¶ Auto-analysis function for the phase
Function which broadly handles “auto-analysis” for those that do not wish to call all the methods on their own. This should be have like the old “analyze” function from versions of YANK pre-1.0.
Returns a dictionary of analysis objects
-
get_states_energies
()[source]¶ Extract the deconvoluted energies from a phase.
Energies from this are NOT decorrelated.
Returns: sampled_energy_matrix : numpy.ndarray of shape K,K,N
Deconvoluted energy of sampled states evaluated at other sampled states.
Has shape (K,K,N) = (number of sampled states, number of sampled states, number of iterations)
Indexed by [k,l,n] where an energy drawn from sampled state [k] is evaluated in sampled state [l] at iteration [n]
unsampled_energy_matrix : numpy.ndarray of shape K,L,N
Has shape (K, L, N) = (number of sampled states, number of UN-sampled states, number of iterations)
Indexed by [k,l,n] where an energy drawn from sampled state [k] is evaluated in un-sampled state [l] at iteration [n]
-
-
class
yank.analyze.
ReplicaExchangeAnalyzer
(reporter, name=None, reference_states=(0, -1), analysis_kwargs=None)[source]¶ The ReplicaExchangeAnalyzer is the analyzer for a simulation generated from a Replica Exchange sampler simulation, implemented as an instance of the
YankPhaseAnalyzer
.See also
-
generate_mixing_statistics
(number_equilibrated: typing.Union[int, NoneType] = None) → typing.NamedTuple[source]¶ Compute and return replica mixing statistics.
Compute the transition state matrix, its eigenvalues sorted from greatest to least, and the state index correlation function.
Parameters: number_equilibrated : int, optional, default=None
If specified, only samples
number_equilibrated:end
will be used in analysis. If not specified, automatically retrieves the number from equilibration data or generates it from the internal energy.Returns: mixing_statistics : namedtuple
A namedtuple containing the following attributes: -
transition_matrix
: (nstates by nstatesnp.array
) -eigenvalues
: (nstates-dimensionalnp.array
) -statistical_inefficiency
: float
-
show_mixing_statistics
(cutoff=0.05, number_equilibrated=None)[source]¶ Print summary of mixing statistics. Passes information off to generate_mixing_statistics then prints it out to the logger
Parameters: cutoff : float, optional, default=0.05
Only transition probabilities above ‘cutoff’ will be printed
number_equilibrated : int, optional, default=None
If specified, only samples number_equilibrated:end will be used in analysis If not specified, it uses the internally held statistics best
-
get_states_energies
()[source]¶ Extract and decorrelate energies from the ncfile to gather energies common data for other functions
Returns: energy_matrix : ndarray of shape (K,K,N)
Potential energy matrix of the sampled states Energy is from each sample n, drawn from state (first k), and evaluated at every sampled state (second k)
unsampled_energy_matrix : ndarray of shape (K,L,N)
Potential energy matrix of the unsampled states Energy from each sample n, drawn from sampled state k, evaluated at unsampled state l If no unsampled states were drawn, this will be shape (K,0,N)
-
static
get_timeseries
(passed_timeseries)[source]¶ Compute the timeseries of a simulation from the Replica Exchange simulation. This is the sum of energies for each sample from the state it was drawn from.
Parameters: passed_timeseries : ndarray of shape (K,L,N), indexed by k,l,n
K is the total number of sampled states
L is the total states we want MBAR to analyze
N is the total number of samples
The kth sample was drawn from state k at iteration n, the nth configuration of kth state is evaluated in thermodynamic state l
Returns: u_n : ndarray of shape (N,)
Timeseries to compute decorrelation and equilibration data from.
-
get_free_energy
()[source]¶ Compute the free energy and error in free energy from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: DeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in free energy from each state relative to each other state
dDeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in free energy from each state relative to each other state
-
get_enthalpy
()[source]¶ Compute the difference in enthalpy and error in that estimate from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in enthalpy from each state relative to each other state
dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in enthalpy from each state relative to each other state
-
get_entropy
()[source]¶ Compute the difference in entropy and error in that estimate from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in enthalpy from each state relative to each other state
dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in enthalpy from each state relative to each other state
-
get_standard_state_correction
()[source]¶ Compute the standard state correction free energy associated with the Phase.
This usually is just a stored variable, but it may need other calculations.
Returns: standard_state_correction : float
Free energy contribution from the standard_state_correction
-
kT
¶ Quantity of boltzmann constant times temperature of the phase in units of energy per mol
Allows conversion between dimensionless energy and unit bearing energy
-
mbar
¶ MBAR object tied to this phase
-
name
¶ User-readable string name of the phase
-
observables
¶ List of observables that the instanced analyzer can compute/fetch.
This list is automatically compiled upon class initialization based on the functions implemented in the subclass
-
reference_states
¶ Tuple of reference states
i
andj
forMultiPhaseAnalyzer
instances
-
reporter
¶ Sampler Reporter tied to this object.
-
-
class
yank.analyze.
MultiPhaseAnalyzer
(phases)[source]¶ Multiple Phase Analyzer creator, not to be directly called itself, but instead called by adding or subtracting different implemented
YankPhaseAnalyzer
or otherMultiPhaseAnalyzers
‘s. The individual Phases of theMultiPhaseAnalyzer
are only references to existing Phase objects, not copies. AllYankPhaseAnalyzer
andMultiPhaseAnalyzer
classes support+
and-
operations.The observables of this phase are determined through inspection of all the passed in phases and only observables which are shared can be computed. For example:
PhaseA
has.get_free_energy
and.get_entropy
PhaseB
has.get_free_energy
and.get_enthalpy
,PhaseAB = PhaseA + PhaseB
will only have a.get_free_energy
methodBecause each Phase may have a different number of states, the
reference_states
property of each phase determines which states from each phase to read the data from.For observables defined by two states, the i’th and j’th reference states are used:
If we define
PhaseAB = PhaseA - PhaseB
Then
PhaseAB.get_free_energy()
is roughly equivalent to doing the following:A_i, A_j = PhaseA.reference_states
B_i, B_j = PhaseB.reference_states
PhaseA.get_free_energy()[A_i, A_j] - PhaseB.get_free_energy()[B_i, B_j]
The above is not exact since get_free_energy returns an error estimate as well
For observables defined by a single state, only the i’th reference state is used
Given
PhaseAB = PhaseA + PhaseB
,PhaseAB.get_temperature()
is equivalent to:A_i = PhaseA.reference_states[0]
B_i = PhaseB.reference_states[0]
PhaseA.get_temperature()[A_i] + PhaseB.get_temperature()[B_i]
For observables defined entirely by the phase, no reference states are needed.
Given
PhaseAB = PhaseA + PhaseB
,PhaseAB.get_standard_state_correction()
gives:PhaseA.get_standard_state_correction() + PhaseB.get_standard_state_correction()
This class is public to see its API.
Parameters: phases : dict
has keys “phases”, “names”, and “signs”
Attributes
observables
List of observables this MultiPhaseAnalyzer
can generatephases
List of implemented YankPhaseAnalyzer
‘s objects thisMultiPhaseAnalyzer
is tied tonames
Unique list of string names identifying this phase. signs
List of signs that are used by the MultiPhaseAnalyzer
to-
observables
¶ List of observables this
MultiPhaseAnalyzer
can generate
-
phases
¶ List of implemented
YankPhaseAnalyzer
‘s objects thisMultiPhaseAnalyzer
is tied to
-
names
¶ Unique list of string names identifying this phase. If this
MultiPhaseAnalyzer
is combined with another, its possible that new names will be generated unique to thatMultiPhaseAnalyzer
, but will still reference the same phase.When in doubt, use
MultiPhaseAnalyzer.phases()
to get the actual phase objects.
-
signs
¶ List of signs that are used by the
MultiPhaseAnalyzer
to
-
-
yank.analyze.
analyze_directory
(source_directory)[source]¶ Analyze contents of store files to compute free energy differences.
This function is needed to preserve the old auto-analysis style of YANK. What it exactly does can be refined when more analyzers and simulations are made available. For now this function exposes the API.
Parameters: source_directory : string
The location of the simulation storage files.
-
yank.analyze.
extract_u_n
(ncfile)[source]¶ Extract timeseries of u_n = - log q(X_n) from store file
where q(X_n) = pi_{k=1}^K u_{s_{nk}}(x_{nk})
with X_n = [x_{n1}, ..., x_{nK}] is the current collection of replica configurations s_{nk} is the current state of replica k at iteration n u_k(x) is the kth reduced potential
TODO: Figure out a way to remove this function
Parameters: ncfile : netCDF4.Dataset
Open NetCDF file to analyze
Returns: u_n : numpy array of numpy.float64
u_n[n] is -log q(X_n)
-
yank.analyze.
extract_trajectory
(nc_path, nc_checkpoint_file=None, state_index=None, replica_index=None, start_frame=0, end_frame=-1, skip_frame=1, keep_solvent=True, discard_equilibration=False, image_molecules=False)[source]¶ Extract phase trajectory from the NetCDF4 file.
Parameters: nc_path : str
Path to the primary nc_file storing the analysis options
nc_checkpoint_file : str or None, Optional
File name of the checkpoint file housing the main trajectory Used if the checkpoint file is differently named from the default one chosen by the nc_path file. Default: None
state_index : int, optional
The index of the alchemical state for which to extract the trajectory. One and only one between state_index and replica_index must be not None (default is None).
replica_index : int, optional
The index of the replica for which to extract the trajectory. One and only one between state_index and replica_index must be not None (default is None).
start_frame : int, optional
Index of the first frame to include in the trajectory (default is 0).
end_frame : int, optional
Index of the last frame to include in the trajectory. If negative, will count from the end (default is -1).
skip_frame : int, optional
Extract one frame every skip_frame (default is 1).
keep_solvent : bool, optional
If False, solvent molecules are ignored (default is True).
discard_equilibration : bool, optional
If True, initial equilibration frames are discarded (see the method pymbar.timeseries.detectEquilibration() for details, default is False).
Returns: trajectory: mdtraj.Trajectory
The trajectory extracted from the netcdf file.