The Updater class is responsible for updating one or more unknown parts of the model. It will serve as a base class for other classes that each update a particular part of the model. A base class has virtual functions that may be replaced by functions of the same name and parameter signature in classes that are derived from it. The member function update is an example of a virtual function. The version of update defined in Updater will work for most updaters, but it is declared virtual so that derived classes can replace it with something different if the general function does not work for the kind of update they are performing.

Some virtual functions are not provided with a function body in this class. These can be spotted because they end with = 0 in the class declaration. The presence of these pure virtual functions makes this base class an abstract base class: you cannot create an object (variable) of type Updater because some of its functions are not defined. (You have seen abstract base classes before in this tutorial: e.g. QMatrix.) Why create an abstract class? These pure virtual functions are so specific to the update being performed that they must necessarily be different in every class derived from this base class. Making them pure virtual functions ensures that these specific replacements will be written (otherwise the program will not compile). Abstract base classes provide functionality that is common to a collection of derived classes, making the job of writing the derived classes easier because you need to focus only on the differences, not the similarities.
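Here is a minimal toy example (not part of the tutorial code; Shape and Circle are invented purely for illustration) showing how a pure virtual function makes a class abstract and forces derived classes to supply a definition:

// Toy illustration only: Shape is abstract because area() has no body
class Shape {
    public:
        virtual ~Shape() {}
        virtual double area() const = 0;    // pure virtual: no function body supplied
};

class Circle : public Shape {
    public:
        Circle(double r) : _radius(r) {}
        virtual double area() const {       // must be defined, or Circle is abstract too
            return 3.141592653589793*_radius*_radius;
        }
    private:
        double _radius;
};

// Shape s;         // compile error: cannot create an object of an abstract class
// Circle c(1.0);   // fine: Circle defines area()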
Create a new file updater.hpp and replace the default contents with the following class declaration.
#pragma once

#include "tree.hpp"
#include "tree_manip.hpp"
#include "lot.hpp"
#include "xstrom.hpp"
#include "likelihood.hpp"

namespace strom {

    class Chain;

    class Updater {
        friend class Chain;

        public:
            typedef std::shared_ptr<Updater> SharedPtr;

            TreeManip::SharedPtr getTreeManip() const;

            Updater();
            virtual ~Updater();

            void setLikelihood(Likelihood::SharedPtr likelihood);
            void setTreeManip(TreeManip::SharedPtr treemanip);
            void setLot(Lot::SharedPtr lot);
            void setLambda(double lambda);
            void setHeatingPower(double p);
            void setTuning(bool on);
            void setTargetAcceptanceRate(double target);
            void setPriorParameters(const std::vector<double> & c);
            void setWeight(double w);
            void calcProb(double wsum);

            double getLambda() const;
            double getWeight() const;
            double getProb() const;
            double getAcceptPct() const;
            double getNumUpdates() const;
            std::string getUpdaterName() const;

            virtual void clear();

            virtual double calcLogPrior() = 0;
            double calcLogTopologyPrior() const;
            double calcLogEdgeLengthPrior() const;
            double calcLogLikelihood() const;
            virtual double update(double prev_lnL);

            static double getLogZero();

        protected:
            virtual void reset();
            virtual void tune(bool accepted);
            virtual void revert() = 0;
            virtual void proposeNewState() = 0;

            Lot::SharedPtr _lot;
            Likelihood::SharedPtr _likelihood;
            TreeManip::SharedPtr _tree_manipulator;
            std::string _name;
            double _weight;
            double _prob;
            double _lambda;
            double _log_hastings_ratio;
            double _log_jacobian;
            double _target_acceptance;
            unsigned _naccepts;
            unsigned _nattempts;
            bool _tuning;
            std::vector<double> _prior_parameters;
            double _heating_power;

            static const double _log_zero;
    };

    // member function bodies go here

}
The constructor delegates all of its initialization work to the clear function, and the destructor does nothing.
inline Updater::Updater() {
    //std::cout << "Updater constructor called" << std::endl;
    clear();
}

inline Updater::~Updater() {
    //std::cout << "Updater destructor called" << std::endl;
}
This function is used by the constructor (or may be called on its own) to reset an Updater object to the state it had when first constructed. Note that this function is declared virtual in the class declaration so that a derived class can substitute a version that also clears data members it has defined that are out of reach of the base class clear function. It is not a good idea to call this function often, because clear erases potentially important information (e.g. the _prior_parameters vector) that is needed for updating a parameter value. Instead, you will normally call the reset function to get the updater ready for the next parameter update.
inline void Updater::clear() {
    _name = "updater";
    _tuning = true;
    _lambda = 0.0001;
    _weight = 1.0;
    _prob = 0.0;
    _target_acceptance = 0.3;
    _naccepts = 0;
    _nattempts = 0;
    _heating_power = 1.0;
    _prior_parameters.clear();
    reset();
}
The (virtual) reset function provides a way to re-initialize values that need to be zeroed (the log Hastings ratio and log Jacobian) before a new parameter update is attempted.
inline void Updater::reset() {
    _log_hastings_ratio = 0.0;
    _log_jacobian = 0.0;
}
This function provides a way to equip this updater with a Likelihood object that can be used to compute the likelihood for proposed parameter values.
inline void Updater::setLikelihood(Likelihood::SharedPtr likelihood) {
    _likelihood = likelihood;
}
This function allows you to give this updater a TreeManip object that can be used to change the current tree. This is not important for updating many parameters (e.g. base frequencies), but would be needed for an updater whose job is to update the tree or edge lengths.
inline void Updater::setTreeManip(TreeManip::SharedPtr treemanip) {
    _tree_manipulator = treemanip;
}
This accessor function merely returns a pointer to the TreeManip object currently assigned to this object.
inline TreeManip::SharedPtr Updater::getTreeManip() const {
    return _tree_manipulator;
}
Every parameter update will involve choices that are mediated by random numbers, so every updater will need a pseudorandom number generator. This function provides a way to assign a Lot object to this updater. Generally, a single pseudorandom-number-generating Lot object will be used for all updaters to ensure that an analysis is repeatable, so each updater object will receive a pointer to the same Lot object.
inline void Updater::setLot(Lot::SharedPtr lot) {
    _lot = lot;
}
For MCMC analyses involving multiple chains, only one chain explores the actual posterior distribution, while all others explore a posterior density surface that has been heated by raising the density to a power between 0.0 and 1.0. This function allows the heating power used by this updater to be set. Ordinarily, all updaters assigned to a single MCMC Chain object will be assigned the same heating power (i.e. a Chain object will call the setHeatingPower function for every Updater and provide the same value of p for all).
inline void Updater::setHeatingPower(double p) {
    _heating_power = p;
}
Every updater needs a way to determine the boldness of its proposed changes, and this function sets the single tuning parameter _lambda that controls it. Larger values of _lambda cause proposed new values to be, on average, farther from the current value. Proposing steps that are too large leads to low acceptance rates, while proposing steps that are too small leads to very high acceptance rates (which is not necessarily desirable because such “baby steps” do not explore the posterior efficiently). The function tune below automatically adjusts _lambda so that proposals are accepted at the desired rate, but this function may be used to set it explicitly.
inline void Updater::setLambda(double lambda) {
    _lambda = lambda;
}
If true is supplied to this function, then the tune function will be called after each update to modify the boldness of the proposal distribution for this updater. Supplying false causes tune to not be called during an update. Tuning should only be done during burn-in, so this function provides a way to turn tuning on or off.
inline void Updater::setTuning(bool do_tune) {
    _tuning = do_tune;
    _naccepts = 0;
    _nattempts = 0;
}
This is the function that adjusts (tunes) _lambda so that proposals are just bold enough to be accepted the desired proportion of the time. The only parameter is a boolean: supply true if the most recent update was accepted and false otherwise. Note that the data member _nattempts is updated by this function even if tuning is currently turned off. The theory behind this function was provided by Prokaj (2009). (Thanks to Dave Swofford for pointing out this simple method to me.)
inline void Updater::tune(bool accepted) {
    _nattempts++;
    if (_tuning) {
        double gamma_n = 10.0/(100.0 + (double)_nattempts);
        if (accepted)
            _lambda *= 1.0 + gamma_n*(1.0 - _target_acceptance)/(2.0*_target_acceptance);
        else
            _lambda *= 1.0 - gamma_n*0.5;

        // Prevent run-away increases in boldness for low-information marginal densities
        if (_lambda > 1000.0)
            _lambda = 1000.0;
    }
}
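To make the adjustment concrete, suppose the target acceptance rate is the default 0.3. On the very first tuned attempt, gamma_n = 10/101 ≈ 0.099, so an accepted proposal multiplies _lambda by 1 + 0.099(1 - 0.3)/(2 × 0.3) ≈ 1.12, whereas a rejected proposal multiplies it by 1 - 0.099/2 ≈ 0.95. Because gamma_n shrinks as _nattempts grows, the adjustments become progressively gentler and _lambda settles near a value that yields approximately the target acceptance rate.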
This function provides a way for you to tell the updater the desired acceptance rate (the fraction of updates that should result in a successful change to the parameter value). This value is used by the Prokaj algorithm in the tune function.
inline void Updater::setTargetAcceptanceRate(double target) {
    _target_acceptance = target;
}
Every updater is responsible for modifying a single parameter or jointly updating a group of related parameters, and in order to calculate the posterior kernel it needs to know how to calculate the prior for any particular value of its parameter (or parameter vector). This function provides a generic way to supply the prior distribution parameter values used in calculating the prior probability (density). For example, if the supplied vector c had length 2 and the parameter being updated had been assigned a Gamma prior distribution, then c[0] would be the shape parameter and c[1] the scale parameter of that Gamma prior. If, on the other hand, the parameter being updated were a vector of 4 state frequencies, then c might have length 4 and provide the four parameters of a Dirichlet prior distribution for state frequencies. Each Updater interprets what is passed to this function in a potentially different way, depending on the prior distribution assigned to the parameter it is in charge of updating.
inline void Updater::setPriorParameters(const std::vector<double> & c) {
    _prior_parameters.clear();
    _prior_parameters.assign(c.begin(), c.end());
}
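As a hypothetical usage sketch (the updater names and prior values below are invented for illustration and are not part of the tutorial code), an updater in charge of edge lengths might be handed the Gamma shape and scale for the tree length prior plus the symmetric Dirichlet parameter for edge length proportions, while an updater in charge of a single Gamma-distributed parameter would receive just a shape and scale:

// Hypothetical examples: edge_length_updater and shape_updater are invented names
edge_length_updater->setPriorParameters({1.0, 10.0, 1.0}); // TL ~ Gamma(shape 1, scale 10), flat Dirichlet on proportions
shape_updater->setPriorParameters({1.0, 1.0});             // parameter ~ Gamma(shape 1, scale 1)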
This function sets the _weight data member to the supplied value. The _weight represents the unnormalized probability that the updater will be called in any given generation.
inline void Updater::setWeight(double w) {
    _weight = w;
}
This function computes a new value for the data member _prob by dividing _weight by the sum of all updater weights, which is the value passed into the function.
inline void Updater::calcProb(double wsum) {
    assert(wsum > 0.0);
    _prob = _weight/wsum;
}
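For example, if a chain manages three updaters with weights 1, 1, and 3, then wsum equals 5 and calling calcProb(5.0) on each updater leaves them with _prob values of 0.2, 0.2, and 0.6, respectively, so the third updater will be chosen three times as often as either of the other two.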
This function simply returns the current value of _lambda, the data member that determines the boldness of proposals that update the parameter or parameters managed by this Updater object.
inline double Updater::getLambda() const {
    return _lambda;
}
This function returns the current value of _prob.
inline double Updater::getProb() const {
    return _prob;
}
This function returns the current value of _weight.
inline double Updater::getWeight() const {
    return _weight;
}
This function returns the percentage of attempted updates that were accepted. If no updates have been attempted since the last call to the clear or setTuning functions (both of which reset _nattempts and _naccepts to 0), it returns 0.0.
inline double Updater::getAcceptPct() const {
    return (_nattempts == 0 ? 0.0 : (100.0*_naccepts/_nattempts));
}
This function returns the value of the data member _nattempts, which is incremented each time this updater’s update function is called.
inline double Updater::getNumUpdates() const {
    return _nattempts;
}
This accessor function returns the name given to this Updater object, which is stored in the _name data member.
inline std::string Updater::getUpdaterName() const {
    return _name;
}
This function returns the log likelihood of the current model. The log likelihood is needed after proposing a new parameter value as it determines (together with the log joint prior) whether the proposed value can be accepted.
inline double Updater::calcLogLikelihood() const {
    return _likelihood->calcLogLikelihood(_tree_manipulator->getTree());
}
This is the function called when a parameter update is desired. It calls the pure virtual function proposeNewState to propose a new parameter value, decides whether to accept the proposed value, and calls revert if the proposed value is not accepted. The update function returns the current log likelihood, which will differ from prev_lnL only if the proposed value is accepted.

The TreeManip functions deselectAllPartials and deselectAllTMatrices are called before proposeNewState. This cleans the slate, so to speak. During proposeNewState, the selectPartial function will be called for any node whose partials need to be recalculated due to a change in the edge length of one of its descendants (or for all nodes if a global model parameter changes). Likewise, the selectTMatrix function will be called for any node whose edge length changes (or for all nodes if a global model parameter changes).

Before the likelihood is calculated, the TreeManip::flipPartialsAndTMatrices function is called. This causes the partials and transition matrices that are recalculated to use an alternate memory location. If it turns out that the proposed change is not accepted, we need only call flipPartialsAndTMatrices again to return all those expensive-to-calculate partials and transition matrices to their previous state. This requires us to tell BeagleLib to allocate twice as much memory as we actually need, but the speed improvement is well worth the expense (remember that rejection of proposed moves is generally more common than acceptance).

The quantity _log_zero requires some explanation. You can see by looking at the class declaration above that this is a static data member of the Updater class. Static data members exist even if no object of the class has been created, and thus static data members must be initialized outside of the constructor for the class (the constructor is only called when an object of type Updater is being created). The value of _log_zero is initialized in the main.cpp source code file (as you will see when we create main.cpp for this step).

But, you might ask, why do we need _log_zero at all? We will initialize this data member to the most negative value that can be stored in a double. We need it in case a proposed new state is impossible under the model: if log_prior equals _log_zero (standing in for minus infinity), the proposed state has zero prior probability and should be rejected because, presumably, the current state has prior density greater than 0. Ideally, the code in proposeNewState should be written in such a way that only valid states are proposed, so (ideally) this check is not necessary.
inline double Updater::update(double prev_lnL) {
    double prev_log_prior = calcLogPrior();

    // Clear any nodes previously selected so that we can detect those nodes
    // whose partials and/or transition probabilities need to be recalculated
    _tree_manipulator->deselectAllPartials();
    _tree_manipulator->deselectAllTMatrices();

    // Set model to proposed state and calculate _log_hastings_ratio
    proposeNewState();

    // Use alternative partials and transition probability buffer for any selected nodes
    // This allows us to easily revert to the previous values if the move is rejected
    _tree_manipulator->flipPartialsAndTMatrices();

    // Calculate the log-likelihood and log-prior for the proposed state
    double log_likelihood = calcLogLikelihood();
    double log_prior = calcLogPrior();

    // Decide whether to accept or reject the proposed state
    bool accept = true;
    if (log_prior > _log_zero) {
        double log_R = 0.0;
        log_R += _heating_power*(log_likelihood - prev_lnL);
        log_R += _heating_power*(log_prior - prev_log_prior);
        log_R += _log_hastings_ratio;
        log_R += _log_jacobian;

        double logu = _lot->logUniform();
        if (logu > log_R)
            accept = false;
    }
    else
        accept = false;

    if (accept) {
        _naccepts++;
    }
    else {
        revert();
        _tree_manipulator->flipPartialsAndTMatrices();
        log_likelihood = prev_lnL;
    }

    tune(accept);
    reset();

    return log_likelihood;
}
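To see how the pieces fit together, here is a minimal, hypothetical sketch of a derived updater (ExampleUpdater, _value, and _prev_value are invented names, not part of the tutorial code). It manages a single positive-valued parameter, uses a multiplicative proposal whose boldness is governed by _lambda, and assumes an Exponential prior whose rate is supplied as _prior_parameters[0] and a Lot class providing a uniform() member function:

// Hypothetical sketch only (not part of the tutorial code)
class ExampleUpdater : public Updater {
    public:
        typedef std::shared_ptr<ExampleUpdater> SharedPtr;

        ExampleUpdater() {
            _name = "example parameter";
            _value = 1.0;
            _prev_value = 1.0;
        }

        virtual double calcLogPrior() {
            // Assumes an Exponential prior whose rate is _prior_parameters[0]
            assert(_prior_parameters.size() > 0);
            double r = _prior_parameters[0];
            return std::log(r) - r*_value;
        }

    protected:
        virtual void proposeNewState() {
            // Multiplicative ("scaler") proposal: new value = old value * exp(_lambda*(u - 0.5)),
            // where u is a Uniform(0,1) deviate (assumes Lot provides a uniform() function)
            _prev_value = _value;
            double u = _lot->uniform();
            _value = _prev_value*std::exp(_lambda*(u - 0.5));

            // The log Hastings ratio for this proposal equals log(new value) - log(old value)
            _log_hastings_ratio = std::log(_value) - std::log(_prev_value);

            // A real updater would also tell _tree_manipulator which partials and
            // transition matrices must be recalculated before the likelihood is computed
        }

        virtual void revert() {
            // Restore the value saved in proposeNewState
            _value = _prev_value;
        }

        double _value;
        double _prev_value;
};

Because calcLogPrior, proposeNewState, and revert are all defined, ExampleUpdater is a concrete class, and the inherited update function shown above takes care of the accept/reject bookkeeping.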
The number of distinct, labeled, binary tree topologies for n taxa is (2n-5)!/[2^(n-3) (n-3)!]. The inverse of this number is thus the prior probability of any given tree topology under a discrete uniform tree topology prior. This function calculates the log of this prior probability. It is placed in the base class Updater because several yet-to-be-introduced updaters (namely TreeUpdater and PolytomyUpdater) will need to calculate the log of the topology prior. Having this function reside in Updater means that we won’t have to implement it multiple times in different derived classes (which would introduce the danger that the duplicates accidentally evolve independently and diverge, like duplicated genes).
inline double Updater::calcLogTopologyPrior() const {
    Tree::SharedPtr tree = _tree_manipulator->getTree();
    assert(tree);
    unsigned n = tree->numLeaves();
    if (tree->isRooted())
        n++;
    double log_topology_prior = -std::lgamma(2*n-5+1) + (n-3)*std::log(2) + std::lgamma(n-3+1);
    return log_topology_prior;
}
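As a quick check, for n = 4 taxa there are 3!/[2^1 1!] = 3 possible unrooted binary topologies, and the function returns -lgamma(4) + log(2) + lgamma(2) = -log(6) + log(2) = -log(3) ≈ -1.099, the log of 1/3; for n = 5 there are 15 topologies and the function returns -log(15) ≈ -2.708. For a rooted tree, incrementing n by one accounts for the root, because there are exactly as many rooted topologies for n taxa as there are unrooted topologies for n + 1 taxa.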
This program uses the Gamma-Dirichlet prior proposed by Rannala, Zhu, and Yang (2012), and this function calculates that prior. This approach specifies a Gamma prior distribution for the tree length (TL), and the first two values in the _prior_parameters vector are expected to be the shape and scale of that Gamma distribution. The Dirichlet part specifies the prior distribution for the edge length proportions (not the edge lengths themselves). The third element of the _prior_parameters vector specifies the parameter of this symmetric Dirichlet prior distribution (normally this value is 1, so that the prior is flat and edge lengths are allowed to do whatever they please so long as they add up to TL).

We are adding this function to Updater because edge lengths will be proposed by more than one updater. For example, there will be an updater (TreeLengthUpdater) responsible for changing just the TL (rescaling the entire tree) to improve MCMC mixing, and this updater will be separate from an updater (TreeUpdater) that modifies both the topology and some edge length proportions. Having this function reside in Updater means that we won’t have to implement it multiple times in different derived classes.
inline double Updater::calcLogEdgeLengthPrior() const {
    Tree::SharedPtr tree = _tree_manipulator->getTree();
    assert(tree);

    double TL = _tree_manipulator->calcTreeLength();
    double n = tree->numLeaves();
    double num_edges = 2.0*n - (tree->isRooted() ? 2.0 : 3.0);

    assert(_prior_parameters.size() == 3);
    double a = _prior_parameters[0];    // shape of Gamma prior on TL
    double b = _prior_parameters[1];    // scale of Gamma prior on TL
    double c = _prior_parameters[2];    // parameter of Dirichlet prior on edge length proportions

    // Calculate Gamma prior on tree length (TL)
    double log_gamma_prior_on_TL = (a - 1.0)*std::log(TL) - TL/b - a*std::log(b) - std::lgamma(a);

    // Calculate Dirichlet prior on edge length proportions
    //
    // Note that, for n edges, the Dirichlet prior density is
    //
    //   p1^{c-1} p2^{c-1} ... pn^{c-1}
    //   ------------------------------
    //      Gamma(c)^n / Gamma(n*c)
    //
    // where n = num_edges, pk = (edge length k)/TL, and Gamma is the Gamma function.
    // If c == 1, then both numerator and denominator equal 1, so it is pointless
    // to loop over edge lengths.
    double log_edge_length_proportions_prior = std::lgamma(num_edges*c);
    if (c != 1.0) {
        for (auto nd : tree->_preorder) {
            double edge_length_proportion = nd->_edge_length/TL;
            log_edge_length_proportions_prior += (c - 1.0)*std::log(edge_length_proportion);
        }
        log_edge_length_proportions_prior -= std::lgamma(c)*num_edges;
    }

    double log_prior = log_gamma_prior_on_TL + log_edge_length_proportions_prior;
    return log_prior;
}
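For a concrete example, consider an unrooted 4-taxon tree (5 edges) with TL = 1.0, a Gamma prior on TL with shape a = 1 and scale b = 10 (i.e. an Exponential prior with mean 10), and a flat Dirichlet (c = 1). The Gamma term is (1 - 1) log(1.0) - 1.0/10 - 1·log(10) - lgamma(1) ≈ -2.403, the Dirichlet term reduces to lgamma(5·1) = log(24) ≈ 3.178, and the log prior is their sum, approximately 0.776. (A positive log prior is perfectly possible here because this is the log of a probability density, not of a probability.)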
This is a static member function that returns the value stored in the static data member _log_zero. The fact that it is declared static means that this function can be called even if no Updater object exists. The value returned is the most negative floating point value that can be stored in a double. It stands in for the log of zero, which is really negative infinity and cannot be used as an ordinary floating point value. This value is used primarily when a parameter value is proposed outside the support of its prior distribution: the prior probability is in that case zero, and the log prior is negative infinity. Because we cannot use negative infinity directly, we instead use _log_zero, which is guaranteed to be at least as small as the log of any valid prior probability.
inline double Updater::getLogZero() {
    return _log_zero;
}
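Because _log_zero is a static const data member, its one and only definition must appear in a source file rather than in the class declaration. As a sketch of what that definition might look like (the actual line is added when main.cpp is created for this step, and this sketch assumes the <limits> header has been included):

// Sketch only: out-of-class definition of the static data member
const double strom::Updater::_log_zero = -std::numeric_limits<double>::max();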
V Prokaj. 2009. Proposal selection for MCMC simulation. Pp. 61–65 in Applied Stochastic Models and Data Analysis: XIII International Conference on Applied Stochastic Models and Data Analysis. Vilnius, Lithuania.
B Rannala, T Zhu, and Z Yang. 2012. Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Molecular Biology and Evolution 29:325–335. DOI: 10.1093/molbev/msr210