MultiStateAnalyzers¶
Analysis tools and module for MultiStateSampler simulations. Provides programmatic and automatic “best practices” integration to determine free energy and other observables.
Fully extensible to support new samplers and observables.
-
class
yank.multistate.multistateanalyzer.
ObservablesRegistry
[source]¶ Registry of computable observables.
This is a class accessed by the
PhaseAnalyzer
objects to check which observables can be computed, and then provide a regular categorization of them.This registry is a required linked component of any PhaseAnalyzer and especially of the MultiPhaseAnalyzer. This is not an internal class to the PhaseAnalyzer however because it can be instanced, extended, and customized as part of the API for this module.
To define your own methods: 1) Choose a unique observable name. 2) Categorize the observable in one of the following ways by adding to the list in the “observables_X” method:
- 2a) “defined_by_phase”:
- Depends on the Phase as a whole (state independent)
- 2b) “defined_by_single_state”:
- Computed entirely from one state, e.g. Radius of Gyration
- 2c) “defined_by_two_states”:
- Property is relative to some reference state, such as Free Energy Difference
Optionally categorize the error category calculation in the “observables_with_error_adding_Y” methods If not placed in an error category, the observable will be assumed not to carry error Examples: A, B, C are the observable in 3 phases, eA, eB, eC are the error of the observable in each phase
- 3a) “linear”: Error between phases adds linearly.
If C = A + B, eC = eA + eB
- 3b) “quadrature”: Error between phases adds in the square.
If C = A + B, eC = sqrt(eA^2 + eB^2)
Finally, to add this observable to the phase, implement a “get_{method name}” method to the subclass of
YankPhaseAnalyzer
. AnyMultiPhaseAnalyzer
composed of this phase will automatically have the “get_{method name}” if all other phases in theMultiPhaseAnalyzer
have the same method.
-
register_two_state_observable
(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶ Register a new two state observable, or re-register an existing one.
Parameters: - name: str
Name of the observable, will be cast to all lower case and spaces replaced with underscores
- error_class: “quad”, “linear”, or None
How the error of the observable is computed when added with other errors from the same observable.
- “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
- re_register: bool, optional, Default: False
Re-register an existing observable
-
register_one_state_observable
(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶ Register a new one state observable, or re-register an existing one.
Parameters: - name: str
Name of the observable, will be cast to all lower case and spaces replaced with underscores
- error_class: “quad”, “linear”, or None
How the error of the observable is computed when added with other errors from the same observable.
- “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
- re_register: bool, optional, Default: False
Re-register an existing observable
-
register_phase_observable
(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶ Register a new observable defined by phaee, or re-register an existing one.
Parameters: - name: str
Name of the observable, will be cast to all lower case and spaces replaced with underscores
- error_class: ‘quad’, ‘linear’, or None
How the error of the observable is computed when added with other errors from the same observable.
- ‘quad’: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- ‘linear’: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
- re_register: bool, optional, Default: False
Re-register an existing observable
-
observables
¶ Set of observables which are derived from the subsets below
-
observables_defined_by_two_states
¶ Observables that require an i and a j state to define the observable accurately between phases
-
observables_defined_by_single_state
¶ Defined observables which are fully defined by a single state, and not by multiple states such as differences
-
observables_defined_by_phase
¶ Observables which are defined by the phase as a whole, and not defined by any 1 or more states e.g. Standard State Correction
-
observables_with_error
¶ Determine which observables have error by inspecting the the error subsets
-
observables_with_error_adding_quadrature
¶ Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
-
observables_with_error_adding_linear
¶ Observable C = A + B, Error eC = eA + eB
-
class
yank.multistate.multistateanalyzer.
PhaseAnalyzer
(reporter, name=None, reference_states=(0, -1), max_n_iterations=None, analysis_kwargs=None, registry=<yank.multistate.multistateanalyzer.ObservablesRegistry object>, use_online_data=True, use_full_trajectory=False)[source]¶ Analyzer for a single phase of a MultiState simulation.
Uses the reporter from the simulation to determine the location of all variables.
To compute a specific observable in an implementation of this class, add it to the ObservableRegistry and then implement a
get_X
whereX
is the name of the observable you want to compute. See the ObservablesRegistry for information about formatting the observables.Analyzer works in units of kT unless specifically stated otherwise. To convert back to a unit set, just multiply by the .kT property.
A PhaseAnalyzer also needs an ObservablesRegistry to track how to handle each observable given implemented within for things like error and cross-phase analysis.
Parameters: - reporter : MultiStateReporter instance
Reporter from MultiState which ties to the simulation data on disk.
- name : str, Optional
Unique name you want to assign this phase, this is the name that will appear in
MultiPhaseAnalyzer
‘s. If not set, it will be given the arbitrary name “phase#” where # is an integer, chosen in order that it is assigned to theMultiPhaseAnalyzer
.- max_n_iterations : int, optional
The maximum number of iterations to analyze. If not provided, all the iterations will be analyzed.
- reference_states : tuple of ints, length 2, Optional, Default: (0,-1)
Integers
i
andj
of the state that is used for reference in observables, “O”. These values are only used when reporting single numbers or combining observables throughMultiPhaseAnalyzer
(since the number of states between phases can be different). Calls to functions such asget_free_energy
in a single Phase results in the O being returned for all states.For O completely defined by the state itself (i.e. no differences between states, e.g. Temperature), only O[i] is used
For O where differences between states are required (e.g. Free Energy): O[i,j] = O[j] - O[i]
For O defined by the phase as a whole, the reference states are not needed.
- analysis_kwargs : None or dict, optional
Dictionary of extra keyword arguments to pass into the analysis tool, typically MBAR. For instance, the initial guess of relative free energies to give to MBAR would be something like:
{'initial_f_k':[0,1,2,3]}
- use_online_data : bool, optional, Default: True
Attempt to read online analysis data as a way to hot-start the analysis computation. This will attempt to read the data stored by the MultiStateAnalyzer.
If this is set to
False
, the online analysis data is not read.If this is set to
False
after being initialized, theCachedProperty
dependencies are all invalidated and properties will be computed from scratch on next observablesIf no online data is found, this setting has no effect
- use_full_trajectory : bool, optional, Default: False
Force the analysis to use the full trajectory when automatically computing equilibration and decorrelation.
Normal behavior (when this is False) is to discard the initial trajectory due to automatic equilibration detection, and then subsample that data to generate decorelated samples. Setting this to
True
ignores this effect, even if the equilibration data is computed.This can be changed after the
PhaseAnalyzer
has been created to re-compute properties with or without the full trajectory.- registry : ObservablesRegistry instance
Instanced ObservablesRegistry with all observables implemented through a
get_X
function classified and registered. Any cross-phase analysis must use the same instance of an ObservablesRegistry
See also
Attributes: name
User-readable string name of the phase
observables
List of observables that the instanced analyzer can compute/fetch.
- max_n_iterations
reference_states
Tuple of reference states
i
andj
forMultiPhaseAnalyzer
instancesn_iterations
int: The total number of iterations of the phase.
n_replicas
int: Number of replicas.
n_states
int: Number of sampled thermodynamic states.
kT
Quantity of boltzmann constant times temperature of the phase in units of energy per mol
reporter
Sampler Reporter tied to this object.
- registry
use_online_data
Get the online data flag
- use_full_trajectory
-
clear
()[source]¶ Reset all cached objects.
This must to be called if the information in the reporter changes after analysis.
-
name
¶ User-readable string name of the phase
-
observables
¶ List of observables that the instanced analyzer can compute/fetch.
-
reference_states
¶ Tuple of reference states
i
andj
forMultiPhaseAnalyzer
instances
-
n_iterations
¶ int: The total number of iterations of the phase.
-
n_replicas
¶ int: Number of replicas.
-
n_states
¶ int: Number of sampled thermodynamic states.
-
kT
¶ Quantity of boltzmann constant times temperature of the phase in units of energy per mol
Allows conversion between dimensionless energy and unit bearing energy
-
reporter
¶ Sampler Reporter tied to this object.
-
use_online_data
¶ Get the online data flag
-
read_energies
()[source]¶ Extract energies from the ncfile and order them by replica, state, iteration.
Returns: - sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]
Potential energy matrix of the sampled states.
- unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]
Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N).
- neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]
Neighborhood energies were computed at, uses a boolean mask over the energy_matrix.
- replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]
States sampled by the replicas in the energy_matrix
-
has_log_weights
¶ Return True if the storage has log weights, False otherwise
-
read_log_weights
()[source]¶ Extract log weights from the ncfile, if present. Returns ValueError if not present.
Returns: - log_weights : np.ndarray of shape [n_states, n_iterations]
log_weights[l,n] is the log weight applied to state
l
during the collection of samples at iterationn
-
read_logZ
(iteration=None)[source]¶ Extract logZ estimates from the ncfile, if present. Returns ValueError if not present.
Parameters: - iteration : int or slice, optional, default=None
If specified, iteration or slice of iterations to extract
Returns: - logZ : np.ndarray of shape [n_states, n_iterations]
logZ[l,n] is the online logZ estimate for state
l
at iterationn
-
get_effective_energy_timeseries
(energies=None, replica_state_indices=None)[source]¶ Generate the effective energy (negative log deviance) timeseries that is generated for this phase
The effective energy for a series of samples x_n, n = 1..N, is defined as
u_n = - ln pi(x_n) + c
where pi(x) is the probability density being sampled, and c is an arbitrary constant.
Parameters: - energies : ndarray of shape (K,L,N), optional, Default: None
Energies from replicas K, sampled states L, and iterations N If provided, then states input_sampled_states must also be provided
- replica_state_indices : ndarray of shape (K,N), optional, Default: None
Integer indices of each sampled state (matching L dimension in input_energy) that each replica K sampled every iteration N. If provided, then states input_energies must also be provided
Returns: - u_n : ndarray of shape (N,)
u_n[n] is the negative log deviance of the same from iteration
n
Timeseries used to determine equilibration time and statistical inefficiency.
-
static
reformat_energies_for_mbar
(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)[source]¶ Convert [replica, state, iteration] data into [state, total_iteration] data
This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified.
Parameters: - u_kln : np.ndarray of shape (K,L,N’)
K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k
- n_k : np.ndarray of shape K or None
Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion.
If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k
WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state.
Returns: - u_ln : np.ndarray of shape (L, N)
Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’
-
class
yank.multistate.multistateanalyzer.
MultiStateSamplerAnalyzer
(*args, unbias_restraint=True, restraint_energy_cutoff='auto', restraint_distance_cutoff='auto', **kwargs)[source]¶ The MultiStateSamplerAnalyzer is the analyzer for a simulation generated from a MultiStateSampler simulation, implemented as an instance of the
PhaseAnalyzer
.Parameters: - unbias_restraint : bool, optional
If True and a radially-symmetric restraint was used in the simulation, the analyzer will remove the bias introduced by the restraint by reweighting each of the end-points to a state using a square-well potential restraint.
- restraint_energy_cutoff : float or ‘auto’, optional
When the restraint is unbiased, the analyzer discards all the samples for which the restrain potential energy (in kT) is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance such that the restraint potential energy is equal to
restraint_energy_cutoff
.If
'auto'
andrestraint_distance_cutoff
isNone
, this will be set to the 99.9-percentile of the distribution of the restraint energies in the bound state.- restraint_distance_cutoff : simtk.unit.Quantity or ‘auto’, optional
When the restraint is unbiased, the analyzer discards all the samples for which the distance between the restrained atoms is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance
restraint_distance_cutoff
.If
'auto'
andrestraint_energy_cutoff
is not specified, this will be set to the 99.9-percentile of the distribution of the restraint distances in the bound state.
See also
Attributes: - unbias_restraint
- restraint_energy_cutoff
- restraint_distance_cutoff
- mbar
n_equilibration_iterations
int: The number of equilibration interations.
statistical_inefficiency
float: The statistical inefficiency of the sampler.
-
clear
()[source]¶ Reset all cached objects.
This must to be called if the information in the reporter changes after analysis.
-
generate_mixing_statistics
(number_equilibrated: typing.Union[int, NoneType] = None) → typing.NamedTuple[source]¶ Compute and return replica mixing statistics.
Compute the transition state matrix, its eigenvalues sorted from greatest to least, and the state index correlation function.
Parameters: - number_equilibrated : int, optional, default=None
If specified, only samples
number_equilibrated:end
will be used in analysis. If not specified, automatically retrieves the number from equilibration data or generates it from the internal energy.
Returns: - mixing_statistics : namedtuple
A namedtuple containing the following attributes: -
transition_matrix
: (nstates by nstatesnp.array
) -eigenvalues
: (nstates-dimensionalnp.array
) -statistical_inefficiency
: float
-
show_mixing_statistics
(cutoff=0.05, number_equilibrated=None)[source]¶ Print summary of mixing statistics. Passes information off to generate_mixing_statistics then prints it out to the logger
Parameters: - cutoff : float, optional, default=0.05
Only transition probabilities above ‘cutoff’ will be printed
- number_equilibrated : int, optional, default=None
If specified, only samples number_equilibrated:end will be used in analysis If not specified, it uses the internally held statistics best
-
get_effective_energy_timeseries
(energies=None, replica_state_indices=None)[source]¶ Generate the effective energy (negative log deviance) timeseries that is generated for this phase.
The effective energy for a series of samples x_n, n = 1..N, is defined as
u_n = - ln pi(x_n) + c
where pi(x) is the probability density being sampled, and c is an arbitrary constant.
Parameters: - energies : ndarray of shape (K,L,N), optional, Default: None
Energies from replicas K, sampled states L, and iterations N. If provided, then states input_sampled_states must also be provided.
- replica_state_indices : ndarray of shape (K,N), optional, Default: None
Integer indices of each sampled state (matching L dimension in input_energy). that each replica K sampled every iteration N. If provided, then states input_energies must also be provided.
Returns: - u_n : ndarray of shape (N,)
u_n[n] is the negative log deviance of the same from iteration
n
Timeseries used to determine equilibration time and statistical inefficiency.
-
get_free_energy
()[source]¶ Compute the free energy and error in free energy from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: - DeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in free energy from each state relative to each other state
- dDeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in free energy from each state relative to each other state
-
get_enthalpy
()[source]¶ Compute the difference in enthalpy and error in that estimate from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: - DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in enthalpy from each state relative to each other state
- dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in enthalpy from each state relative to each other state
-
get_entropy
()[source]¶ Compute the difference in entropy and error in that estimate from the MBAR object
Output shape changes based on if there are unsampled states detected in the sampler
Returns: - DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Difference in enthalpy from each state relative to each other state
- dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
Error in the difference in enthalpy from each state relative to each other state
-
n_equilibration_iterations
¶ int: The number of equilibration interations.
-
statistical_inefficiency
¶ float: The statistical inefficiency of the sampler.
-
has_log_weights
¶ Return True if the storage has log weights, False otherwise
-
kT
¶ Quantity of boltzmann constant times temperature of the phase in units of energy per mol
Allows conversion between dimensionless energy and unit bearing energy
-
n_iterations
¶ int: The total number of iterations of the phase.
-
n_replicas
¶ int: Number of replicas.
-
n_states
¶ int: Number of sampled thermodynamic states.
-
name
¶ User-readable string name of the phase
-
observables
¶ List of observables that the instanced analyzer can compute/fetch.
-
read_energies
()¶ Extract energies from the ncfile and order them by replica, state, iteration.
Returns: - sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]
Potential energy matrix of the sampled states.
- unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]
Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N).
- neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]
Neighborhood energies were computed at, uses a boolean mask over the energy_matrix.
- replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]
States sampled by the replicas in the energy_matrix
-
read_logZ
(iteration=None)¶ Extract logZ estimates from the ncfile, if present. Returns ValueError if not present.
Parameters: - iteration : int or slice, optional, default=None
If specified, iteration or slice of iterations to extract
Returns: - logZ : np.ndarray of shape [n_states, n_iterations]
logZ[l,n] is the online logZ estimate for state
l
at iterationn
-
read_log_weights
()¶ Extract log weights from the ncfile, if present. Returns ValueError if not present.
Returns: - log_weights : np.ndarray of shape [n_states, n_iterations]
log_weights[l,n] is the log weight applied to state
l
during the collection of samples at iterationn
-
reference_states
¶ Tuple of reference states
i
andj
forMultiPhaseAnalyzer
instances
-
reformat_energies_for_mbar
(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)¶ Convert [replica, state, iteration] data into [state, total_iteration] data
This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified.
Parameters: - u_kln : np.ndarray of shape (K,L,N’)
K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k
- n_k : np.ndarray of shape K or None
Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion.
If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k
WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state.
Returns: - u_ln : np.ndarray of shape (L, N)
Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’
-
reporter
¶ Sampler Reporter tied to this object.
-
use_online_data
¶ Get the online data flag
-
class
yank.multistate.multistateanalyzer.
MultiPhaseAnalyzer
(phases)[source]¶ Multiple Phase Analyzer creator, not to be directly called itself, but instead called by adding or subtracting different implemented
PhaseAnalyzer
or otherMultiPhaseAnalyzers
‘s. The individual Phases of theMultiPhaseAnalyzer
are only references to existing Phase objects, not copies. AllPhaseAnalyzer
andMultiPhaseAnalyzer
classes support+
and-
operations.The observables of this phase are determined through inspection of all the passed in phases and only observables which are shared can be computed. For example:
PhaseA
has.get_free_energy
and.get_entropy
PhaseB
has.get_free_energy
and.get_enthalpy
,PhaseAB = PhaseA + PhaseB
will only have a.get_free_energy
methodBecause each Phase may have a different number of states, the
reference_states
property of each phase determines which states from each phase to read the data from.For observables defined by two states, the i’th and j’th reference states are used:
If we define
PhaseAB = PhaseA - PhaseB
Then
PhaseAB.get_free_energy()
is roughly equivalent to doing the following:A_i, A_j = PhaseA.reference_states
B_i, B_j = PhaseB.reference_states
PhaseA.get_free_energy()[A_i, A_j] - PhaseB.get_free_energy()[B_i, B_j]
The above is not exact since get_free_energy returns an error estimate as well
For observables defined by a single state, only the i’th reference state is used
Given
PhaseAB = PhaseA + PhaseB
,PhaseAB.get_temperature()
is equivalent to:A_i = PhaseA.reference_states[0]
B_i = PhaseB.reference_states[0]
PhaseA.get_temperature()[A_i] + PhaseB.get_temperature()[B_i]
For observables defined entirely by the phase, no reference states are needed.
Given
PhaseAB = PhaseA + PhaseB
,PhaseAB.get_standard_state_correction()
gives:PhaseA.get_standard_state_correction() + PhaseB.get_standard_state_correction()
Each phase MUST use the same ObservablesRegistry, otherwise an error is raised
This class is public to see its API.
Parameters: - phases : dict
has keys “phases”, “names”, and “signs”
See also
Attributes: observables
List of observables this
MultiPhaseAnalyzer
can generatephases
List of implemented
PhaseAnalyzer
‘s objects thisMultiPhaseAnalyzer
is tied tonames
Unique list of string names identifying this phase.
signs
List of signs that are used by the
MultiPhaseAnalyzer
to- registry
-
observables
¶ List of observables this
MultiPhaseAnalyzer
can generate
-
phases
¶ List of implemented
PhaseAnalyzer
‘s objects thisMultiPhaseAnalyzer
is tied to
-
names
¶ Unique list of string names identifying this phase. If this
MultiPhaseAnalyzer
is combined with another, its possible that new names will be generated unique to thatMultiPhaseAnalyzer
, but will still reference the same phase.When in doubt, use
MultiPhaseAnalyzer.phases()
to get the actual phase objects.
-
signs
¶ List of signs that are used by the
MultiPhaseAnalyzer
to