MultiStateAnalyzers¶
Analysis tools and module for MultiStateSampler simulations. Provides programmatic and automatic “best practices” integration to determine free energy and other observables.
Fully extensible to support new samplers and observables.
- 
class yank.multistate.multistateanalyzer.ObservablesRegistry[source]¶
- Registry of computable observables. - This is a class accessed by the - PhaseAnalyzerobjects to check which observables can be computed, and then provide a regular categorization of them.- This registry is a required linked component of any PhaseAnalyzer and especially of the MultiPhaseAnalyzer. This is not an internal class to the PhaseAnalyzer however because it can be instanced, extended, and customized as part of the API for this module. - To define your own methods: 1) Choose a unique observable name. 2) Categorize the observable in one of the following ways by adding to the list in the “observables_X” method: - 2a) “defined_by_phase”:
- Depends on the Phase as a whole (state independent)
- 2b) “defined_by_single_state”:
- Computed entirely from one state, e.g. Radius of Gyration
- 2c) “defined_by_two_states”:
- Property is relative to some reference state, such as Free Energy Difference
 - Optionally categorize the error category calculation in the “observables_with_error_adding_Y” methods If not placed in an error category, the observable will be assumed not to carry error Examples: A, B, C are the observable in 3 phases, eA, eB, eC are the error of the observable in each phase - 3a) “linear”: Error between phases adds linearly.
- If C = A + B, eC = eA + eB 
- 3b) “quadrature”: Error between phases adds in the square.
- If C = A + B, eC = sqrt(eA^2 + eB^2) 
 
- Finally, to add this observable to the phase, implement a “get_{method name}” method to the subclass of - YankPhaseAnalyzer. Any- MultiPhaseAnalyzercomposed of this phase will automatically have the “get_{method name}” if all other phases in the- MultiPhaseAnalyzerhave the same method.
 - 
register_two_state_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶
- Register a new two state observable, or re-register an existing one. - Parameters: - name: str
- Name of the observable, will be cast to all lower case and spaces replaced with underscores 
- error_class: “quad”, “linear”, or None
- How the error of the observable is computed when added with other errors from the same observable. - “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
 
- re_register: bool, optional, Default: False
- Re-register an existing observable 
 
 - 
register_one_state_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶
- Register a new one state observable, or re-register an existing one. - Parameters: - name: str
- Name of the observable, will be cast to all lower case and spaces replaced with underscores 
- error_class: “quad”, “linear”, or None
- How the error of the observable is computed when added with other errors from the same observable. - “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
 
- re_register: bool, optional, Default: False
- Re-register an existing observable 
 
 - 
register_phase_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]¶
- Register a new observable defined by phaee, or re-register an existing one. - Parameters: - name: str
- Name of the observable, will be cast to all lower case and spaces replaced with underscores 
- error_class: ‘quad’, ‘linear’, or None
- How the error of the observable is computed when added with other errors from the same observable. - ‘quad’: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
- ‘linear’: Adds linearly, Observable C = A + B, Error eC = eA + eB
- None: Does not carry error
 
- re_register: bool, optional, Default: False
- Re-register an existing observable 
 
 - 
observables¶
- Set of observables which are derived from the subsets below 
 - 
observables_defined_by_two_states¶
- Observables that require an i and a j state to define the observable accurately between phases 
 - 
observables_defined_by_single_state¶
- Defined observables which are fully defined by a single state, and not by multiple states such as differences 
 - 
observables_defined_by_phase¶
- Observables which are defined by the phase as a whole, and not defined by any 1 or more states e.g. Standard State Correction 
 - 
observables_with_error¶
- Determine which observables have error by inspecting the the error subsets 
 - 
observables_with_error_adding_quadrature¶
- Observable C = A + B, Error eC = sqrt(eA**2 + eB**2) 
 - 
observables_with_error_adding_linear¶
- Observable C = A + B, Error eC = eA + eB 
 
- 
class yank.multistate.multistateanalyzer.PhaseAnalyzer(reporter, name=None, reference_states=(0, -1), max_n_iterations=None, analysis_kwargs=None, registry=<yank.multistate.multistateanalyzer.ObservablesRegistry object>, use_online_data=True)[source]¶
- Analyzer for a single phase of a MultiState simulation. - Uses the reporter from the simulation to determine the location of all variables. - To compute a specific observable in an implementation of this class, add it to the ObservableRegistry and then implement a - get_Xwhere- Xis the name of the observable you want to compute. See the ObservablesRegistry for information about formatting the observables.- Analyzer works in units of kT unless specifically stated otherwise. To convert back to a unit set, just multiply by the .kT property. - A PhaseAnalyzer also needs an ObservablesRegistry to track how to handle each observable given implemented within for things like error and cross-phase analysis. - Parameters: - reporter : MultiStateReporter instance
- Reporter from MultiState which ties to the simulation data on disk. 
- name : str, Optional
- Unique name you want to assign this phase, this is the name that will appear in - MultiPhaseAnalyzer‘s. If not set, it will be given the arbitrary name “phase#” where # is an integer, chosen in order that it is assigned to the- MultiPhaseAnalyzer.
- max_n_iterations : int, optional
- The maximum number of iterations to analyze. If not provided, all the iterations will be analyzed. 
- reference_states : tuple of ints, length 2, Optional, Default: (0,-1)
- Integers - iand- jof the state that is used for reference in observables, “O”. These values are only used when reporting single numbers or combining observables through- MultiPhaseAnalyzer(since the number of states between phases can be different). Calls to functions such as- get_free_energyin a single Phase results in the O being returned for all states.- For O completely defined by the state itself (i.e. no differences between states, e.g. Temperature), only O[i] is used - For O where differences between states are required (e.g. Free Energy): O[i,j] = O[j] - O[i] - For O defined by the phase as a whole, the reference states are not needed. 
- analysis_kwargs : None or dict, optional
- Dictionary of extra keyword arguments to pass into the analysis tool, typically MBAR. For instance, the initial guess of relative free energies to give to MBAR would be something like: - {'initial_f_k':[0,1,2,3]}
- use_online_data : bool, optional, Default: True
- Attempt to read online analysis data as a way to hot-start the analysis computation. This will attempt to read the data stored by the MultiStateAnalyzer. - If this is set to - False, the online analysis data is not read.- If this is set to - Falseafter being initialized, the- CachedPropertydependencies are all invalidated and properties will be computed from scratch on next observables- If no online data is found, this setting has no effect 
- registry : ObservablesRegistry instance
- Instanced ObservablesRegistry with all observables implemented through a - get_Xfunction classified and registered. Any cross-phase analysis must use the same instance of an ObservablesRegistry
 - See also - Attributes: - name
- User-readable string name of the phase 
- observables
- List of observables that the instanced analyzer can compute/fetch. 
- max_n_iterations
- reference_states
- Tuple of reference states - iand- jfor- MultiPhaseAnalyzerinstances
- n_iterations
- int: The total number of iterations of the phase. 
- n_replicas
- int: Number of replicas. 
- n_states
- int: Number of sampled thermodynamic states. 
- kT
- Quantity of boltzmann constant times temperature of the phase in units of energy per mol 
- reporter
- Sampler Reporter tied to this object. 
- registry
- use_online_data
- Get the online data flag 
 - 
clear()[source]¶
- Reset all cached objects. - This must to be called if the information in the reporter changes after analysis. 
 - 
name¶
- User-readable string name of the phase 
 - 
observables¶
- List of observables that the instanced analyzer can compute/fetch. 
 - 
reference_states¶
- Tuple of reference states - iand- jfor- MultiPhaseAnalyzerinstances
 - 
n_iterations¶
- int: The total number of iterations of the phase. 
 - 
n_replicas¶
- int: Number of replicas. 
 - 
n_states¶
- int: Number of sampled thermodynamic states. 
 - 
kT¶
- Quantity of boltzmann constant times temperature of the phase in units of energy per mol - Allows conversion between dimensionless energy and unit bearing energy 
 - 
reporter¶
- Sampler Reporter tied to this object. 
 - 
use_online_data¶
- Get the online data flag 
 - 
read_energies()[source]¶
- Extract energies from the ncfile and order them by replica, state, iteration.Returns: - sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]
- Potential energy matrix of the sampled states. - unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]
- Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N). 
- neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]
- Neighborhood energies were computed at, uses a boolean mask over the energy_matrix. 
- replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]
- States sampled by the replicas in the energy_matrix 
 
 
 - 
has_log_weights¶
- Return True if the storage has log weights, False otherwise 
 - 
read_log_weights()[source]¶
- Extract log weights from the ncfile, if present. Returns ValueError if not present. - Returns: - log_weights : np.ndarray of shape [n_states, n_iterations]
- log_weights[l,n] is the log weight applied to state - lduring the collection of samples at iteration- n
 
 - 
read_logZ(iteration=None)[source]¶
- Extract logZ estimates from the ncfile, if present. Returns ValueError if not present. - Parameters: - iteration : int or slice, optional, default=None
- If specified, iteration or slice of iterations to extract 
 - Returns: - logZ : np.ndarray of shape [n_states, n_iterations]
- logZ[l,n] is the online logZ estimate for state - lat iteration- n
 
 - 
get_effective_energy_timeseries(energies=None, replica_state_indices=None)[source]¶
- Generate the effective energy (negative log deviance) timeseries that is generated for this phase - The effective energy for a series of samples x_n, n = 1..N, is defined as - u_n = - ln pi(x_n) + c - where pi(x) is the probability density being sampled, and c is an arbitrary constant. - Parameters: - energies : ndarray of shape (K,L,N), optional, Default: None
- Energies from replicas K, sampled states L, and iterations N If provided, then states input_sampled_states must also be provided 
- replica_state_indices : ndarray of shape (K,N), optional, Default: None
- Integer indices of each sampled state (matching L dimension in input_energy) that each replica K sampled every iteration N. If provided, then states input_energies must also be provided 
 - Returns: - u_n : ndarray of shape (N,)
- u_n[n] is the negative log deviance of the same from iteration - nTimeseries used to determine equilibration time and statistical inefficiency.
 
 - 
static reformat_energies_for_mbar(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)[source]¶
- Convert [replica, state, iteration] data into [state, total_iteration] data - This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified. - Parameters: - u_kln : np.ndarray of shape (K,L,N’)
- K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k 
- n_k : np.ndarray of shape K or None
- Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion. - If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k - WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state. 
 - Returns: - u_ln : np.ndarray of shape (L, N)
- Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’ 
 
 
- 
class yank.multistate.multistateanalyzer.MultiStateSamplerAnalyzer(*args, unbias_restraint=True, restraint_energy_cutoff='auto', restraint_distance_cutoff='auto', **kwargs)[source]¶
- The MultiStateSamplerAnalyzer is the analyzer for a simulation generated from a MultiStateSampler simulation, implemented as an instance of the - PhaseAnalyzer.- Parameters: - unbias_restraint : bool, optional
- If True and a radially-symmetric restraint was used in the simulation, the analyzer will remove the bias introduced by the restraint by reweighting each of the end-points to a state using a square-well potential restraint. 
- restraint_energy_cutoff : float or ‘auto’, optional
- When the restraint is unbiased, the analyzer discards all the samples for which the restrain potential energy (in kT) is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance such that the restraint potential energy is equal to - restraint_energy_cutoff.- If - 'auto'and- restraint_distance_cutoffis- None, this will be set to the 99.9-percentile of the distribution of the restraint energies in the bound state.
- restraint_distance_cutoff : simtk.unit.Quantity or ‘auto’, optional
- When the restraint is unbiased, the analyzer discards all the samples for which the distance between the restrained atoms is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance - restraint_distance_cutoff.- If - 'auto'and- restraint_energy_cutoffis not specified, this will be set to the 99.9-percentile of the distribution of the restraint distances in the bound state.
 - See also - Attributes: - unbias_restraint
- restraint_energy_cutoff
- restraint_distance_cutoff
- mbar
- n_equilibration_iterations
- int: The number of equilibration interations. 
- statistical_inefficiency
- float: The statistical inefficiency of the sampler. 
 - 
clear()[source]¶
- Reset all cached objects. - This must to be called if the information in the reporter changes after analysis. 
 - 
generate_mixing_statistics(number_equilibrated: typing.Union[int, NoneType] = None) → typing.NamedTuple[source]¶
- Compute and return replica mixing statistics. - Compute the transition state matrix, its eigenvalues sorted from greatest to least, and the state index correlation function. - Parameters: - number_equilibrated : int, optional, default=None
- If specified, only samples - number_equilibrated:endwill be used in analysis. If not specified, automatically retrieves the number from equilibration data or generates it from the internal energy.
 - Returns: - mixing_statistics : namedtuple
- A namedtuple containing the following attributes: - - transition_matrix: (nstates by nstates- np.array) -- eigenvalues: (nstates-dimensional- np.array) -- statistical_inefficiency: float
 
 - 
show_mixing_statistics(cutoff=0.05, number_equilibrated=None)[source]¶
- Print summary of mixing statistics. Passes information off to generate_mixing_statistics then prints it out to the logger - Parameters: - cutoff : float, optional, default=0.05
- Only transition probabilities above ‘cutoff’ will be printed 
- number_equilibrated : int, optional, default=None
- If specified, only samples number_equilibrated:end will be used in analysis If not specified, it uses the internally held statistics best 
 
 - 
get_effective_energy_timeseries(energies=None, replica_state_indices=None)[source]¶
- Generate the effective energy (negative log deviance) timeseries that is generated for this phase. - The effective energy for a series of samples x_n, n = 1..N, is defined as - u_n = - ln pi(x_n) + c - where pi(x) is the probability density being sampled, and c is an arbitrary constant. - Parameters: - energies : ndarray of shape (K,L,N), optional, Default: None
- Energies from replicas K, sampled states L, and iterations N. If provided, then states input_sampled_states must also be provided. 
- replica_state_indices : ndarray of shape (K,N), optional, Default: None
- Integer indices of each sampled state (matching L dimension in input_energy). that each replica K sampled every iteration N. If provided, then states input_energies must also be provided. 
 - Returns: - u_n : ndarray of shape (N,)
- u_n[n] is the negative log deviance of the same from iteration - nTimeseries used to determine equilibration time and statistical inefficiency.
 
 - 
get_free_energy()[source]¶
- Compute the free energy and error in free energy from the MBAR object - Output shape changes based on if there are unsampled states detected in the sampler - Returns: - DeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Difference in free energy from each state relative to each other state 
- dDeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Error in the difference in free energy from each state relative to each other state 
 
 - 
get_enthalpy()[source]¶
- Compute the difference in enthalpy and error in that estimate from the MBAR object - Output shape changes based on if there are unsampled states detected in the sampler - Returns: - DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Difference in enthalpy from each state relative to each other state 
- dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Error in the difference in enthalpy from each state relative to each other state 
 
 - 
get_entropy()[source]¶
- Compute the difference in entropy and error in that estimate from the MBAR object - Output shape changes based on if there are unsampled states detected in the sampler - Returns: - DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Difference in enthalpy from each state relative to each other state 
- dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)
- Error in the difference in enthalpy from each state relative to each other state 
 
 - 
n_equilibration_iterations¶
- int: The number of equilibration interations. 
 - 
statistical_inefficiency¶
- float: The statistical inefficiency of the sampler. 
 - 
has_log_weights¶
- Return True if the storage has log weights, False otherwise 
 - 
kT¶
- Quantity of boltzmann constant times temperature of the phase in units of energy per mol - Allows conversion between dimensionless energy and unit bearing energy 
 - 
n_iterations¶
- int: The total number of iterations of the phase. 
 - 
n_replicas¶
- int: Number of replicas. 
 - 
n_states¶
- int: Number of sampled thermodynamic states. 
 - 
name¶
- User-readable string name of the phase 
 - 
observables¶
- List of observables that the instanced analyzer can compute/fetch. 
 - 
read_energies()¶
- Extract energies from the ncfile and order them by replica, state, iteration.Returns: - sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]
- Potential energy matrix of the sampled states. - unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]
- Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N). 
- neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]
- Neighborhood energies were computed at, uses a boolean mask over the energy_matrix. 
- replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]
- States sampled by the replicas in the energy_matrix 
 
 
 - 
read_logZ(iteration=None)¶
- Extract logZ estimates from the ncfile, if present. Returns ValueError if not present. - Parameters: - iteration : int or slice, optional, default=None
- If specified, iteration or slice of iterations to extract 
 - Returns: - logZ : np.ndarray of shape [n_states, n_iterations]
- logZ[l,n] is the online logZ estimate for state - lat iteration- n
 
 - 
read_log_weights()¶
- Extract log weights from the ncfile, if present. Returns ValueError if not present. - Returns: - log_weights : np.ndarray of shape [n_states, n_iterations]
- log_weights[l,n] is the log weight applied to state - lduring the collection of samples at iteration- n
 
 - 
reference_states¶
- Tuple of reference states - iand- jfor- MultiPhaseAnalyzerinstances
 - 
reformat_energies_for_mbar(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)¶
- Convert [replica, state, iteration] data into [state, total_iteration] data - This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified. - Parameters: - u_kln : np.ndarray of shape (K,L,N’)
- K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k 
- n_k : np.ndarray of shape K or None
- Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion. - If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k - WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state. 
 - Returns: - u_ln : np.ndarray of shape (L, N)
- Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’ 
 
 - 
reporter¶
- Sampler Reporter tied to this object. 
 - 
use_online_data¶
- Get the online data flag 
 
- 
class yank.multistate.multistateanalyzer.MultiPhaseAnalyzer(phases)[source]¶
- Multiple Phase Analyzer creator, not to be directly called itself, but instead called by adding or subtracting different implemented - PhaseAnalyzeror other- MultiPhaseAnalyzers‘s. The individual Phases of the- MultiPhaseAnalyzerare only references to existing Phase objects, not copies. All- PhaseAnalyzerand- MultiPhaseAnalyzerclasses support- +and- -operations.- The observables of this phase are determined through inspection of all the passed in phases and only observables which are shared can be computed. For example: - PhaseAhas- .get_free_energyand- .get_entropy- PhaseBhas- .get_free_energyand- .get_enthalpy,- PhaseAB = PhaseA + PhaseBwill only have a- .get_free_energymethod- Because each Phase may have a different number of states, the - reference_statesproperty of each phase determines which states from each phase to read the data from.- For observables defined by two states, the i’th and j’th reference states are used: - If we define - PhaseAB = PhaseA - PhaseB- Then - PhaseAB.get_free_energy()is roughly equivalent to doing the following:- A_i, A_j = PhaseA.reference_states- B_i, B_j = PhaseB.reference_states- PhaseA.get_free_energy()[A_i, A_j] - PhaseB.get_free_energy()[B_i, B_j]- The above is not exact since get_free_energy returns an error estimate as well - For observables defined by a single state, only the i’th reference state is used - Given - PhaseAB = PhaseA + PhaseB,- PhaseAB.get_temperature()is equivalent to:- A_i = PhaseA.reference_states[0]- B_i = PhaseB.reference_states[0]- PhaseA.get_temperature()[A_i] + PhaseB.get_temperature()[B_i]- For observables defined entirely by the phase, no reference states are needed. - Given - PhaseAB = PhaseA + PhaseB,- PhaseAB.get_standard_state_correction()gives:- PhaseA.get_standard_state_correction() + PhaseB.get_standard_state_correction()- Each phase MUST use the same ObservablesRegistry, otherwise an error is raised - This class is public to see its API. - Parameters: - phases : dict
- has keys “phases”, “names”, and “signs” 
 - See also - Attributes: - observables
- List of observables this - MultiPhaseAnalyzercan generate
- phases
- List of implemented - PhaseAnalyzer‘s objects this- MultiPhaseAnalyzeris tied to
- names
- Unique list of string names identifying this phase. 
- signs
- List of signs that are used by the - MultiPhaseAnalyzerto
- registry
 - 
observables¶
- List of observables this - MultiPhaseAnalyzercan generate
 - 
phases¶
- List of implemented - PhaseAnalyzer‘s objects this- MultiPhaseAnalyzeris tied to
 - 
names¶
- Unique list of string names identifying this phase. If this - MultiPhaseAnalyzeris combined with another, its possible that new names will be generated unique to that- MultiPhaseAnalyzer, but will still reference the same phase.- When in doubt, use - MultiPhaseAnalyzer.phases()to get the actual phase objects.
 - 
signs¶
- List of signs that are used by the - MultiPhaseAnalyzerto