MultiStateAnalyzers

Analysis tools and module for MultiStateSampler simulations. Provides programmatic and automatic “best practices” integration to determine free energy and other observables.

Fully extensible to support new samplers and observables.

class yank.multistate.multistateanalyzer.ObservablesRegistry[source]

Registry of computable observables.

This is a class accessed by the PhaseAnalyzer objects to check which observables can be computed, and then provide a regular categorization of them.

This registry is a required linked component of any PhaseAnalyzer and especially of the MultiPhaseAnalyzer. This is not an internal class to the PhaseAnalyzer however because it can be instanced, extended, and customized as part of the API for this module.

To define your own methods: 1) Choose a unique observable name. 2) Categorize the observable in one of the following ways by adding to the list in the “observables_X” method:

2a) “defined_by_phase”:
Depends on the Phase as a whole (state independent)
2b) “defined_by_single_state”:
Computed entirely from one state, e.g. Radius of Gyration
2c) “defined_by_two_states”:
Property is relative to some reference state, such as Free Energy Difference
  1. Optionally categorize the error category calculation in the “observables_with_error_adding_Y” methods If not placed in an error category, the observable will be assumed not to carry error Examples: A, B, C are the observable in 3 phases, eA, eB, eC are the error of the observable in each phase

    3a) “linear”: Error between phases adds linearly.

    If C = A + B, eC = eA + eB

    3b) “quadrature”: Error between phases adds in the square.

    If C = A + B, eC = sqrt(eA^2 + eB^2)

  2. Finally, to add this observable to the phase, implement a “get_{method name}” method to the subclass of YankPhaseAnalyzer. Any MultiPhaseAnalyzer composed of this phase will automatically have the “get_{method name}” if all other phases in the MultiPhaseAnalyzer have the same method.

register_two_state_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]

Register a new two state observable, or re-register an existing one.

Parameters:
name: str

Name of the observable, will be cast to all lower case and spaces replaced with underscores

error_class: “quad”, “linear”, or None

How the error of the observable is computed when added with other errors from the same observable.

  • “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
  • “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
  • None: Does not carry error
re_register: bool, optional, Default: False

Re-register an existing observable

register_one_state_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]

Register a new one state observable, or re-register an existing one.

Parameters:
name: str

Name of the observable, will be cast to all lower case and spaces replaced with underscores

error_class: “quad”, “linear”, or None

How the error of the observable is computed when added with other errors from the same observable.

  • “quad”: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
  • “linear”: Adds linearly, Observable C = A + B, Error eC = eA + eB
  • None: Does not carry error
re_register: bool, optional, Default: False

Re-register an existing observable

register_phase_observable(name: str, error_class: typing.Union[str, NoneType] = None, re_register: bool = False)[source]

Register a new observable defined by phaee, or re-register an existing one.

Parameters:
name: str

Name of the observable, will be cast to all lower case and spaces replaced with underscores

error_class: ‘quad’, ‘linear’, or None

How the error of the observable is computed when added with other errors from the same observable.

  • ‘quad’: Adds in the quadrature, Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)
  • ‘linear’: Adds linearly, Observable C = A + B, Error eC = eA + eB
  • None: Does not carry error
re_register: bool, optional, Default: False

Re-register an existing observable

observables

Set of observables which are derived from the subsets below

observables_defined_by_two_states

Observables that require an i and a j state to define the observable accurately between phases

observables_defined_by_single_state

Defined observables which are fully defined by a single state, and not by multiple states such as differences

observables_defined_by_phase

Observables which are defined by the phase as a whole, and not defined by any 1 or more states e.g. Standard State Correction

observables_with_error

Determine which observables have error by inspecting the the error subsets

observables_with_error_adding_quadrature

Observable C = A + B, Error eC = sqrt(eA**2 + eB**2)

observables_with_error_adding_linear

Observable C = A + B, Error eC = eA + eB

class yank.multistate.multistateanalyzer.PhaseAnalyzer(reporter, name=None, reference_states=(0, -1), max_n_iterations=None, analysis_kwargs=None, registry=<yank.multistate.multistateanalyzer.ObservablesRegistry object>, use_online_data=True, use_full_trajectory=False)[source]

Analyzer for a single phase of a MultiState simulation.

Uses the reporter from the simulation to determine the location of all variables.

To compute a specific observable in an implementation of this class, add it to the ObservableRegistry and then implement a get_X where X is the name of the observable you want to compute. See the ObservablesRegistry for information about formatting the observables.

Analyzer works in units of kT unless specifically stated otherwise. To convert back to a unit set, just multiply by the .kT property.

A PhaseAnalyzer also needs an ObservablesRegistry to track how to handle each observable given implemented within for things like error and cross-phase analysis.

Parameters:
reporter : MultiStateReporter instance

Reporter from MultiState which ties to the simulation data on disk.

name : str, Optional

Unique name you want to assign this phase, this is the name that will appear in MultiPhaseAnalyzer‘s. If not set, it will be given the arbitrary name “phase#” where # is an integer, chosen in order that it is assigned to the MultiPhaseAnalyzer.

max_n_iterations : int, optional

The maximum number of iterations to analyze. If not provided, all the iterations will be analyzed.

reference_states : tuple of ints, length 2, Optional, Default: (0,-1)

Integers i and j of the state that is used for reference in observables, “O”. These values are only used when reporting single numbers or combining observables through MultiPhaseAnalyzer (since the number of states between phases can be different). Calls to functions such as get_free_energy in a single Phase results in the O being returned for all states.

For O completely defined by the state itself (i.e. no differences between states, e.g. Temperature), only O[i] is used

For O where differences between states are required (e.g. Free Energy): O[i,j] = O[j] - O[i]

For O defined by the phase as a whole, the reference states are not needed.

analysis_kwargs : None or dict, optional

Dictionary of extra keyword arguments to pass into the analysis tool, typically MBAR. For instance, the initial guess of relative free energies to give to MBAR would be something like: {'initial_f_k':[0,1,2,3]}

use_online_data : bool, optional, Default: True

Attempt to read online analysis data as a way to hot-start the analysis computation. This will attempt to read the data stored by the MultiStateAnalyzer.

If this is set to False, the online analysis data is not read.

If this is set to False after being initialized, the CachedProperty dependencies are all invalidated and properties will be computed from scratch on next observables

If no online data is found, this setting has no effect

use_full_trajectory : bool, optional, Default: False

Force the analysis to use the full trajectory when automatically computing equilibration and decorrelation.

Normal behavior (when this is False) is to discard the initial trajectory due to automatic equilibration detection, and then subsample that data to generate decorelated samples. Setting this to True ignores this effect, even if the equilibration data is computed.

This can be changed after the PhaseAnalyzer has been created to re-compute properties with or without the full trajectory.

registry : ObservablesRegistry instance

Instanced ObservablesRegistry with all observables implemented through a get_X function classified and registered. Any cross-phase analysis must use the same instance of an ObservablesRegistry

Attributes:
name

User-readable string name of the phase

observables

List of observables that the instanced analyzer can compute/fetch.

max_n_iterations
reference_states

Tuple of reference states i and j for MultiPhaseAnalyzer instances

n_iterations

int: The total number of iterations of the phase.

n_replicas

int: Number of replicas.

n_states

int: Number of sampled thermodynamic states.

kT

Quantity of boltzmann constant times temperature of the phase in units of energy per mol

reporter

Sampler Reporter tied to this object.

registry
use_online_data

Get the online data flag

use_full_trajectory
clear()[source]

Reset all cached objects.

This must to be called if the information in the reporter changes after analysis.

name

User-readable string name of the phase

observables

List of observables that the instanced analyzer can compute/fetch.

reference_states

Tuple of reference states i and j for MultiPhaseAnalyzer instances

n_iterations

int: The total number of iterations of the phase.

n_replicas

int: Number of replicas.

n_states

int: Number of sampled thermodynamic states.

kT

Quantity of boltzmann constant times temperature of the phase in units of energy per mol

Allows conversion between dimensionless energy and unit bearing energy

reporter

Sampler Reporter tied to this object.

use_online_data

Get the online data flag

read_energies()[source]

Extract energies from the ncfile and order them by replica, state, iteration.

Returns:
sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]

Potential energy matrix of the sampled states.

unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]

Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N).

neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]

Neighborhood energies were computed at, uses a boolean mask over the energy_matrix.

replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]

States sampled by the replicas in the energy_matrix

has_log_weights

Return True if the storage has log weights, False otherwise

read_log_weights()[source]

Extract log weights from the ncfile, if present. Returns ValueError if not present.

Returns:
log_weights : np.ndarray of shape [n_states, n_iterations]

log_weights[l,n] is the log weight applied to state l during the collection of samples at iteration n

read_logZ(iteration=None)[source]

Extract logZ estimates from the ncfile, if present. Returns ValueError if not present.

Parameters:
iteration : int or slice, optional, default=None

If specified, iteration or slice of iterations to extract

Returns:
logZ : np.ndarray of shape [n_states, n_iterations]

logZ[l,n] is the online logZ estimate for state l at iteration n

get_effective_energy_timeseries(energies=None, replica_state_indices=None)[source]

Generate the effective energy (negative log deviance) timeseries that is generated for this phase

The effective energy for a series of samples x_n, n = 1..N, is defined as

u_n = - ln pi(x_n) + c

where pi(x) is the probability density being sampled, and c is an arbitrary constant.

Parameters:
energies : ndarray of shape (K,L,N), optional, Default: None

Energies from replicas K, sampled states L, and iterations N If provided, then states input_sampled_states must also be provided

replica_state_indices : ndarray of shape (K,N), optional, Default: None

Integer indices of each sampled state (matching L dimension in input_energy) that each replica K sampled every iteration N. If provided, then states input_energies must also be provided

Returns:
u_n : ndarray of shape (N,)

u_n[n] is the negative log deviance of the same from iteration n Timeseries used to determine equilibration time and statistical inefficiency.

static reformat_energies_for_mbar(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)[source]

Convert [replica, state, iteration] data into [state, total_iteration] data

This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified.

Parameters:
u_kln : np.ndarray of shape (K,L,N’)

K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k

n_k : np.ndarray of shape K or None

Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion.

If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k

WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state.

Returns:
u_ln : np.ndarray of shape (L, N)

Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’

class yank.multistate.multistateanalyzer.MultiStateSamplerAnalyzer(*args, unbias_restraint=True, restraint_energy_cutoff='auto', restraint_distance_cutoff='auto', **kwargs)[source]

The MultiStateSamplerAnalyzer is the analyzer for a simulation generated from a MultiStateSampler simulation, implemented as an instance of the PhaseAnalyzer.

Parameters:
unbias_restraint : bool, optional

If True and a radially-symmetric restraint was used in the simulation, the analyzer will remove the bias introduced by the restraint by reweighting each of the end-points to a state using a square-well potential restraint.

restraint_energy_cutoff : float or ‘auto’, optional

When the restraint is unbiased, the analyzer discards all the samples for which the restrain potential energy (in kT) is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance such that the restraint potential energy is equal to restraint_energy_cutoff.

If 'auto' and restraint_distance_cutoff is None, this will be set to the 99.9-percentile of the distribution of the restraint energies in the bound state.

restraint_distance_cutoff : simtk.unit.Quantity or ‘auto’, optional

When the restraint is unbiased, the analyzer discards all the samples for which the distance between the restrained atoms is above this cutoff. Effectively, this is equivalent to placing a hard wall potential at a restraint distance restraint_distance_cutoff.

If 'auto' and restraint_energy_cutoff is not specified, this will be set to the 99.9-percentile of the distribution of the restraint distances in the bound state.

See also

PhaseAnalyzer

Attributes:
unbias_restraint
restraint_energy_cutoff
restraint_distance_cutoff
mbar
n_equilibration_iterations

int: The number of equilibration interations.

statistical_inefficiency

float: The statistical inefficiency of the sampler.

clear()[source]

Reset all cached objects.

This must to be called if the information in the reporter changes after analysis.

generate_mixing_statistics(number_equilibrated: typing.Union[int, NoneType] = None) → typing.NamedTuple[source]

Compute and return replica mixing statistics.

Compute the transition state matrix, its eigenvalues sorted from greatest to least, and the state index correlation function.

Parameters:
number_equilibrated : int, optional, default=None

If specified, only samples number_equilibrated:end will be used in analysis. If not specified, automatically retrieves the number from equilibration data or generates it from the internal energy.

Returns:
mixing_statistics : namedtuple

A namedtuple containing the following attributes: - transition_matrix: (nstates by nstates np.array) - eigenvalues: (nstates-dimensional np.array) - statistical_inefficiency: float

show_mixing_statistics(cutoff=0.05, number_equilibrated=None)[source]

Print summary of mixing statistics. Passes information off to generate_mixing_statistics then prints it out to the logger

Parameters:
cutoff : float, optional, default=0.05

Only transition probabilities above ‘cutoff’ will be printed

number_equilibrated : int, optional, default=None

If specified, only samples number_equilibrated:end will be used in analysis If not specified, it uses the internally held statistics best

get_effective_energy_timeseries(energies=None, replica_state_indices=None)[source]

Generate the effective energy (negative log deviance) timeseries that is generated for this phase.

The effective energy for a series of samples x_n, n = 1..N, is defined as

u_n = - ln pi(x_n) + c

where pi(x) is the probability density being sampled, and c is an arbitrary constant.

Parameters:
energies : ndarray of shape (K,L,N), optional, Default: None

Energies from replicas K, sampled states L, and iterations N. If provided, then states input_sampled_states must also be provided.

replica_state_indices : ndarray of shape (K,N), optional, Default: None

Integer indices of each sampled state (matching L dimension in input_energy). that each replica K sampled every iteration N. If provided, then states input_energies must also be provided.

Returns:
u_n : ndarray of shape (N,)

u_n[n] is the negative log deviance of the same from iteration n Timeseries used to determine equilibration time and statistical inefficiency.

get_free_energy()[source]

Compute the free energy and error in free energy from the MBAR object

Output shape changes based on if there are unsampled states detected in the sampler

Returns:
DeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Difference in free energy from each state relative to each other state

dDeltaF_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Error in the difference in free energy from each state relative to each other state

get_enthalpy()[source]

Compute the difference in enthalpy and error in that estimate from the MBAR object

Output shape changes based on if there are unsampled states detected in the sampler

Returns:
DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Difference in enthalpy from each state relative to each other state

dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Error in the difference in enthalpy from each state relative to each other state

get_entropy()[source]

Compute the difference in entropy and error in that estimate from the MBAR object

Output shape changes based on if there are unsampled states detected in the sampler

Returns:
DeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Difference in enthalpy from each state relative to each other state

dDeltaH_ij : ndarray of floats, shape (K,K) or (K+2, K+2)

Error in the difference in enthalpy from each state relative to each other state

n_equilibration_iterations

int: The number of equilibration interations.

statistical_inefficiency

float: The statistical inefficiency of the sampler.

has_log_weights

Return True if the storage has log weights, False otherwise

kT

Quantity of boltzmann constant times temperature of the phase in units of energy per mol

Allows conversion between dimensionless energy and unit bearing energy

n_iterations

int: The total number of iterations of the phase.

n_replicas

int: Number of replicas.

n_states

int: Number of sampled thermodynamic states.

name

User-readable string name of the phase

observables

List of observables that the instanced analyzer can compute/fetch.

read_energies()

Extract energies from the ncfile and order them by replica, state, iteration.

Returns:
sampled_energy_matrix : np.ndarray of shape [n_replicas, n_states, n_iterations]

Potential energy matrix of the sampled states.

unsampled_energy_matrix : np.ndarray of shape [n_replicas, n_unsamped_states, n_iterations]

Potential energy matrix of the unsampled states. Energy from each drawn sample n, evaluated at unsampled state l. If no unsampled states were drawn, this will be shape (0,N).

neighborhoods : np.ndarray of shape [n_replicas, n_states, n_iterations]

Neighborhood energies were computed at, uses a boolean mask over the energy_matrix.

replica_state_indices : np.ndarray of shape [n_replicas, n_iterations]

States sampled by the replicas in the energy_matrix

read_logZ(iteration=None)

Extract logZ estimates from the ncfile, if present. Returns ValueError if not present.

Parameters:
iteration : int or slice, optional, default=None

If specified, iteration or slice of iterations to extract

Returns:
logZ : np.ndarray of shape [n_states, n_iterations]

logZ[l,n] is the online logZ estimate for state l at iteration n

read_log_weights()

Extract log weights from the ncfile, if present. Returns ValueError if not present.

Returns:
log_weights : np.ndarray of shape [n_states, n_iterations]

log_weights[l,n] is the log weight applied to state l during the collection of samples at iteration n

reference_states

Tuple of reference states i and j for MultiPhaseAnalyzer instances

reformat_energies_for_mbar(u_kln: numpy.ndarray, n_k: typing.Union[numpy.ndarray, NoneType] = None)

Convert [replica, state, iteration] data into [state, total_iteration] data

This method assumes that the first dimension are all samplers, the second dimension are all the thermodynamic states energies were evaluated at and an equal number of samples were drawn from each k’th sampler, UNLESS n_k is specified.

Parameters:
u_kln : np.ndarray of shape (K,L,N’)

K = number of replica samplers L = number of thermodynamic states, N’ = number of iterations from state k

n_k : np.ndarray of shape K or None

Number of samples each _SAMPLER_ (k) has drawn This allows you to have trailing entries on a given kth row in the n’th (n prime) index which do not contribute to the conversion.

If this is None, assumes ALL samplers have the same number of samples such that N_k = N’ for all k

WARNING: N_k is number of samples the SAMPLER drew in total, NOT how many samples were drawn from each thermodynamic state L. This method knows nothing of how many samples were drawn from each state.

Returns:
u_ln : np.ndarray of shape (L, N)

Reduced, non-sparse data format L = number of thermodynamic states N = sum_k N_k. note this is not N’

reporter

Sampler Reporter tied to this object.

use_online_data

Get the online data flag

class yank.multistate.multistateanalyzer.MultiPhaseAnalyzer(phases)[source]

Multiple Phase Analyzer creator, not to be directly called itself, but instead called by adding or subtracting different implemented PhaseAnalyzer or other MultiPhaseAnalyzers‘s. The individual Phases of the MultiPhaseAnalyzer are only references to existing Phase objects, not copies. All PhaseAnalyzer and MultiPhaseAnalyzer classes support + and - operations.

The observables of this phase are determined through inspection of all the passed in phases and only observables which are shared can be computed. For example:

PhaseA has .get_free_energy and .get_entropy

PhaseB has .get_free_energy and .get_enthalpy,

PhaseAB = PhaseA + PhaseB will only have a .get_free_energy method

Because each Phase may have a different number of states, the reference_states property of each phase determines which states from each phase to read the data from.

For observables defined by two states, the i’th and j’th reference states are used:

If we define PhaseAB = PhaseA - PhaseB

Then PhaseAB.get_free_energy() is roughly equivalent to doing the following:

A_i, A_j = PhaseA.reference_states

B_i, B_j = PhaseB.reference_states

PhaseA.get_free_energy()[A_i, A_j] - PhaseB.get_free_energy()[B_i, B_j]

The above is not exact since get_free_energy returns an error estimate as well

For observables defined by a single state, only the i’th reference state is used

Given PhaseAB = PhaseA + PhaseB, PhaseAB.get_temperature() is equivalent to:

A_i = PhaseA.reference_states[0]

B_i = PhaseB.reference_states[0]

PhaseA.get_temperature()[A_i] + PhaseB.get_temperature()[B_i]

For observables defined entirely by the phase, no reference states are needed.

Given PhaseAB = PhaseA + PhaseB, PhaseAB.get_standard_state_correction() gives:

PhaseA.get_standard_state_correction() + PhaseB.get_standard_state_correction()

Each phase MUST use the same ObservablesRegistry, otherwise an error is raised

This class is public to see its API.

Parameters:
phases : dict

has keys “phases”, “names”, and “signs”

Attributes:
observables

List of observables this MultiPhaseAnalyzer can generate

phases

List of implemented PhaseAnalyzer‘s objects this MultiPhaseAnalyzer is tied to

names

Unique list of string names identifying this phase.

signs

List of signs that are used by the MultiPhaseAnalyzer to

registry
observables

List of observables this MultiPhaseAnalyzer can generate

phases

List of implemented PhaseAnalyzer‘s objects this MultiPhaseAnalyzer is tied to

names

Unique list of string names identifying this phase. If this MultiPhaseAnalyzer is combined with another, its possible that new names will be generated unique to that MultiPhaseAnalyzer, but will still reference the same phase.

When in doubt, use MultiPhaseAnalyzer.phases() to get the actual phase objects.

signs

List of signs that are used by the MultiPhaseAnalyzer to

clear()[source]

Clear the individual phases of their observables and estimators for re-computing quantities