Now all that is needed is to make changes to the Node
and TreeManip
classes to accommodate flipping transition matrices and partials, make Updater
a friend of Tree
and Node
, and update the Strom::run
function to set up Chain
and run it. For this exercise we will use another variation on the rbcL data set used before.
Download rbcl10.nex and rbcl10.tre to your working directory with the other downloaded data and tree files.
Here are maximum likelihood estimates (using PAUP* Version 4.0a, build 159) of the GTR+G model parameters using the data stored in rbcl10.nex and the tree topology stored in rbcl10.tre. You can refer back to this table once your program begins estimating these parameters to see how the values compare. Remember, however, that your program is Bayesian and thus prior distributions will affect estimates, and the maximized log-likelihood computed by PAUP* will (almost surely) be higher than any log-likelihood you will see in your output. I have modified the output produced by PAUP* below by adding a second column (values in parentheses) in which I provide:
a normalized version the GTR exchangeabilities (PAUP* sets GT to 1 and makes all other exchangeabilities relative to GT); and
the rate variance (1/shape) instead of the shape parameter reported by PAUP*
In short, if all is working, we should see our MCMC analysis hovering (after a few iterations) around the rate variance 3.94 and a log-likelihood a little less than -6723.144.
-ln L 6723.144
Base frequencies:
A 0.294667
C 0.189172
G 0.206055
T 0.310106
Rate matrix R:
AC 1.25798 (0.06082)
AG 5.76823 (0.27887)
AT 1.33643 (0.06461)
CG 1.29145 (0.06244)
CT 10.03035 (0.48492)
GT 1.00000 (0.04834)
Shape 0.253596 (3.94322)
Edit node.hpp and uncomment the class Updater;
and friend class Updater;
lines that are currently commented out. In addition, add the methods and data structures highlighted below in bold, blue text:
#pragma once
#include <string>
#include <vector>
#include <iostream>
#include "split.hpp"
namespace strom {
class Tree;
class TreeManip;
class Likelihood;
<span style="color:#0000ff"><strong>class Updater;</strong></span>
class Node {
friend class Tree;
friend class TreeManip;
friend class Likelihood;
<span style="color:#0000ff"><strong>friend class Updater;</strong></span>
public:
Node();
~Node();
Node * getParent() {return _parent;}
Node * getLeftChild() {return _left_child;}
Node * getRightSib() {return _right_sib;}
int getNumber() {return _number;}
std::string getName() {return _name;}
Split getSplit() {return _split;}
<span style="color:#0000ff"><strong>bool isSelected() {return _flags & Flag::Selected;}</strong></span>
<span style="color:#0000ff"><strong>void select() {_flags |= Flag::Selected;}</strong></span>
<span style="color:#0000ff"><strong>void deselect() {_flags &= ~Flag::Selected;}</strong></span>
<span style="color:#0000ff"><strong></strong></span>
<span style="color:#0000ff"><strong>bool isSelPartial() {return _flags & Flag::SelPartial;}</strong></span>
<span style="color:#0000ff"><strong>void selectPartial() {_flags |= Flag::SelPartial;}</strong></span>
<span style="color:#0000ff"><strong>void deselectPartial() {_flags &= ~Flag::SelPartial;}</strong></span>
<span style="color:#0000ff"><strong></strong></span>
<span style="color:#0000ff"><strong>bool isSelTMatrix() {return _flags & Flag::SelTMatrix;}</strong></span>
<span style="color:#0000ff"><strong>void selectTMatrix() {_flags |= Flag::SelTMatrix;}</strong></span>
<span style="color:#0000ff"><strong>void deselectTMatrix() {_flags &= ~Flag::SelTMatrix;}</strong></span>
<span style="color:#0000ff"><strong></strong></span>
<span style="color:#0000ff"><strong>bool isAltPartial() {return _flags & Flag::AltPartial;}</strong></span>
<span style="color:#0000ff"><strong>void setAltPartial() {_flags |= Flag::AltPartial;}</strong></span>
<span style="color:#0000ff"><strong>void clearAltPartial() {_flags &= ~Flag::AltPartial;}</strong></span>
<span style="color:#0000ff"><strong></strong></span>
<span style="color:#0000ff"><strong>bool isAltTMatrix() {return _flags & Flag::AltTMatrix;}</strong></span>
<span style="color:#0000ff"><strong>void setAltTMatrix() {_flags |= Flag::AltTMatrix;}</strong></span>
<span style="color:#0000ff"><strong>void clearAltTMatrix() {_flags &= ~Flag::AltTMatrix;}</strong></span>
<span style="color:#0000ff"><strong></strong></span>
<span style="color:#0000ff"><strong>void flipTMatrix() {isAltTMatrix() ? clearAltTMatrix() : setAltTMatrix();}</strong></span>
<span style="color:#0000ff"><strong>void flipPartial() {isAltPartial() ? clearAltPartial() : setAltPartial();}</strong></span>
double getEdgeLength() {return _edge_length;}
void setEdgeLength(double v);
static const double _smallest_edge_length;
typedef std::vector<Node> Vector;
typedef std::vector<Node *> PtrVector;
private:
<span style="color:#0000ff"><strong>enum Flag {</strong></span>
<span style="color:#0000ff"><strong>Selected = (1 << 0),</strong></span>
<span style="color:#0000ff"><strong>SelPartial = (1 << 1),</strong></span>
<span style="color:#0000ff"><strong>SelTMatrix = (1 << 2),</strong></span>
<span style="color:#0000ff"><strong>AltPartial = (1 << 3),</strong></span>
<span style="color:#0000ff"><strong>AltTMatrix = (1 << 4)</strong></span>
<span style="color:#0000ff"><strong>};</strong></span>
void clear();
Node * _left_child;
Node * _right_sib;
Node * _parent;
int _number;
std::string _name;
double _edge_length;
Split _split;
<span style="color:#0000ff"><strong>int _flags;</strong></span>
};
inline Node::Node() {
//std::cout << "Creating Node object" << std::endl;
clear();
}
inline Node::~Node() {
//std::cout << "Destroying Node object" << std::endl;
}
inline void Node::clear() {
<span style="color:#0000ff"><strong>_flags = 0;</strong></span>
_left_child = 0;
_right_sib = 0;
_parent = 0;
_number = -1;
_name = "";
_edge_length = _smallest_edge_length;
}
inline void Node::setEdgeLength(double v) {
_edge_length = (v < _smallest_edge_length ? _smallest_edge_length : v);
}
}
The Flag
enum provides binary attributes for every node that can be turned on or off. The select()
, selectPartial()
, selectTMatrix()
, setAltPartial()
, and setAltTMatrix()
member functions set (i.e. make the value 1) bits at positions 1, 2, 3, 4, and 5, respectively, while the member functions deselect()
, deselectPartial()
, deselectTMatrix()
, clearAltPartial()
, and clearAltTMatrix()
unset (i.e. make the value 0 at) the same bit positions. As an example, suppose a node has 0 at all bit positions except position 4 (Flag::AltPartial
). Here is what the node’s flag data member looks like in binary representation (showing only the 5 bits relevant to this discussion):
flags = 0 1 0 0 0
| | | | \Selected
| | | \SelPartial
| | \SelTMatrix
| \AltPartial
\AltTMatrix
If selectTMatrix()
is called for this node, the result will be to set the 3rd bit (from the right) in addition to the bit already set in the 4th position (due to the fact that this is an OR operation):
flags = 0 1 0 0 0
| 0 0 1 0 0
-----------------
0 1 1 0 0
| | | | \Selected
| | | \SelPartial
| | \SelTMatrix
| \AltPartial
\AltTMatrix
If clearAltPartial()
is called for this node, the result will be to unset only the 4th bit (note that ~(0 1 0 0 0) = 1 0 1 1 1
:
flags = 0 1 1 0 0
& 1 0 1 1 1
--------------------
0 0 1 0 0
| | | | \Selected
| | | \SelPartial
| | \SelTMatrix
| \AltPartial
\AltTMatrix
If bit operations still seem confusing, take a look at this explanation.
These member functions allow us to:
The added Node functions are normally called by TreeManip
functions. Add the indicated function prototypes to the TreeManip
class declaration:
class TreeManip {
public:
TreeManip();
TreeManip(Tree::SharedPtr t);
~TreeManip();
void setTree(Tree::SharedPtr t);
Tree::SharedPtr getTree();
double calcTreeLength() const;
unsigned countEdges() const;
void scaleAllEdgeLengths(double scaler);
void createTestTree();
std::string makeNewick(unsigned precision, bool use_names = false) const;
void buildFromNewick(const std::string newick, bool rooted, bool allow_polytomies);
void storeSplits(std::set<Split> & splitset);
void rerootAtNodeNumber(int node_number);
<span style="color:#0000ff"><strong>void selectAll();</strong></span>
<span style="color:#0000ff"><strong>void deselectAll();</strong></span>
<span style="color:#0000ff"><strong>void selectAllPartials();</strong></span>
<span style="color:#0000ff"><strong>void deselectAllPartials();</strong></span>
<span style="color:#0000ff"><strong>void selectAllTMatrices();</strong></span>
<span style="color:#0000ff"><strong>void deselectAllTMatrices();</strong></span>
<span style="color:#0000ff"><strong>void selectPartialsHereToRoot(Node * a);</strong></span>
<span style="color:#0000ff"><strong>void flipPartialsAndTMatrices();</strong></span>
void clear();
private:
Node * findNextPreorder(Node * nd);
void refreshPreorder();
void refreshLevelorder();
void renumberInternals();
void rerootAtNode(Node * prospective_root);
void extractNodeNumberFromName(Node * nd, std::set<unsigned> & used);
void extractEdgeLen(Node * nd, std::string edge_length_string);
unsigned countNewickLeaves(const std::string newick);
void stripOutNexusComments(std::string & newick);
bool canHaveSibling(Node * nd, bool rooted, bool allow_polytomies);
Tree::SharedPtr _tree;
public:
typedef std::shared_ptr< TreeManip > SharedPtr;
};
Add the bodies of these functions somewhere below the class declaration and before the curly bracket closing the namespace:
inline void TreeManip::selectAll() {
for (auto & nd : _tree->_nodes) {
nd.select();
}
}
inline void TreeManip::deselectAll() {
for (auto & nd : _tree->_nodes) {
nd.deselect();
}
}
inline void TreeManip::selectAllPartials() {
for (auto & nd : _tree->_nodes)
nd.selectPartial();
}
inline void TreeManip::deselectAllPartials() {
for (auto & nd : _tree->_nodes) {
nd.deselectPartial();
}
}
inline void TreeManip::selectAllTMatrices() {
for (auto & nd : _tree->_nodes)
nd.selectTMatrix();
}
inline void TreeManip::deselectAllTMatrices() {
for (auto & nd : _tree->_nodes) {
nd.deselectTMatrix();
}
}
inline void TreeManip::selectPartialsHereToRoot(Node * a) {
a->selectPartial();
while (a->_parent) {
a = a->_parent;
a->selectPartial();
}
}
inline void TreeManip::flipPartialsAndTMatrices() {
for (auto & nd : _tree->_nodes) {
if (nd.isSelPartial())
nd.flipPartial();
if (nd.isSelTMatrix())
nd.flipTMatrix();
}
}
The ability to set a flag for a node indicating that an alternate partials index (or transition matrix index) is only useful if the Likelihood
class pays attention to these flags. The first step is to reserve enough memory in BeagleLIb to accommodate alternate sets of indices. The highlighted lines below double the amount of memory allocated for partials, transition matrices, and scaler buffers.
inline void Likelihood::newInstance(unsigned nstates, int nrates, std::vector<unsigned> & subset_indices) {
unsigned num_subsets = (unsigned)subset_indices.size();
bool is_invar_model = (nrates < 0 ? true : false);
unsigned ngammacat = (unsigned)(is_invar_model ? -nrates : nrates);
unsigned num_patterns = 0;
for (auto s : subset_indices) {
num_patterns += _data->getNumPatternsInSubset(s);
}
unsigned num_internals = calcNumInternalsInFullyResolvedTree();
// add 1 to num_edges so that subroot node will have a tmatrix, root tip's tmatrix is never used
unsigned num_edges = calcNumEdgesInFullyResolvedTree();
unsigned num_nodes = num_edges + 1;
unsigned num_transition_probs = num_nodes*num_subsets;
long requirementFlags = 0;
long preferenceFlags = BEAGLE_FLAG_PRECISION_SINGLE | BEAGLE_FLAG_THREADING_CPP;
if (_underflow_scaling) {
preferenceFlags |= BEAGLE_FLAG_SCALING_MANUAL;
preferenceFlags |= BEAGLE_FLAG_SCALERS_LOG;
}
if (_prefer_gpu)
preferenceFlags |= BEAGLE_FLAG_PROCESSOR_GPU;
else
preferenceFlags |= BEAGLE_FLAG_PROCESSOR_CPU;
BeagleInstanceDetails instance_details;
unsigned npartials = num_internals + _ntaxa;
unsigned nscalers = num_internals; // one scale buffer for every internal node
unsigned nsequences = 0;
if (_ambiguity_equals_missing) {
npartials -= _ntaxa;
nsequences += _ntaxa;
}
int inst = beagleCreateInstance(
_ntaxa, // tips
<span style="color:#0000ff"><strong>2*npartials, // partials</strong></span>
nsequences, // sequences
nstates, // states
num_patterns, // patterns (total across all subsets that use this instance)
num_subsets, // models (one for each distinct eigen decomposition)
<span style="color:#0000ff"><strong>2*num_subsets*num_transition_probs, // transition matrices (one for each edge in each subset)</strong></span>
ngammacat, // rate categories
<span style="color:#0000ff"><strong>(_underflow_scaling ? 2*nscalers + 1 : 0), // scale buffers (+1 is for the cumulative scaler at index 0)</strong></span>
NULL, // resource restrictions
0, // length of resource list
preferenceFlags, // preferred flags
requirementFlags, // required flags
&instance_details); // pointer for details
if (inst < 0) {
// beagleCreateInstance returns one of the following:
// valid instance (0, 1, 2, ...)
// error code (negative integer)
throw XStrom(boost::str(boost::format("Likelihood init function failed to create BeagleLib instance (BeagleLib error code was %d)") % _beagle_error[inst]));
}
InstanceInfo info;
info.handle = inst;
info.resourcenumber = instance_details.resourceNumber;
info.resourcename = instance_details.resourceName;
info.nstates = nstates;
info.nratecateg = ngammacat;
info.invarmodel = is_invar_model;
info.subsets = subset_indices;
info.npatterns = num_patterns;
info.partial_offset = num_internals;
info.tmatrix_offset = num_nodes;
_instances.push_back(info);
}
Next, modify getScalerIndex
, getTMatrixIndex
, and getPartialIndex
so that they return the alternate indices when the appropriate node flags are set. Replace the bodies of each of these three functions with the versions below:
inline unsigned Likelihood::getScalerIndex(Node * nd, InstanceInfo & info) const {
unsigned sindex = BEAGLE_OP_NONE;
if (_underflow_scaling) {
sindex = nd->_number - _ntaxa + 1; // +1 to skip the cumulative scaler vector
if (nd->isAltPartial())
sindex += info.partial_offset;
}
return sindex;
}
inline unsigned Likelihood::getPartialIndex(Node * nd, InstanceInfo & info) const {
// Note: do not be tempted to subtract _ntaxa from pindex: BeagleLib does this itself
assert(nd->_number >= 0);
unsigned pindex = nd->_number;
if (pindex >= _ntaxa) {
if (nd->isAltPartial())
pindex += info.partial_offset;
}
return pindex;
}
inline unsigned Likelihood::getTMatrixIndex(Node * nd, InstanceInfo & info, unsigned subset_index) const {
unsigned tindex = 2*subset_index*info.tmatrix_offset + nd->_number;
if (nd->isAltTMatrix())
tindex += info.tmatrix_offset;
return tindex;
}
Finally, tell defineOperations
to only recalculate transition matrices or partials if the node is selected. This will be important later when only a few nodes are affected by a proposal (no sense in recalculating the partial likelihoods for a clade if that clade was not involved in the MCMC proposal).
inline void Likelihood::defineOperations(Tree::SharedPtr t) {
assert(_instances.size() > 0);
assert(t);
assert(t->isRooted() == _rooted);
_relrate_normalizing_constant = _model->calcNormalizingConstantForSubsetRelRates();
// Start with a clean slate
for (auto & info : _instances) {
_operations[info.handle].clear();
_pmatrix_index[info.handle].clear();
_edge_lengths[info.handle].clear();
_eigen_indices[info.handle].clear();
_category_rate_indices[info.handle].clear();
}
// Loop through all nodes in reverse level order
for (auto nd : boost::adaptors::reverse(t->_levelorder)) {
assert(nd->_number >= 0);
if (!nd->_left_child) {
// This is a leaf
<span style="color:#0000ff"><strong>if (nd->isSelTMatrix())</strong></span>
queueTMatrixRecalculation(nd);
}
else {
// This is an internal node
<span style="color:#0000ff"><strong>if (nd->isSelTMatrix())</strong></span>
queueTMatrixRecalculation(nd);
// Internal nodes have partials to be calculated, so define
// an operation to compute the partials for this node
<span style="color:#0000ff"><strong>if (nd->isSelPartial()) {</strong></span>
Node * lchild = nd->_left_child;
assert(lchild);
Node * rchild = lchild->_right_sib;
assert(rchild);
queuePartialsRecalculation(nd, lchild, rchild);
<span style="color:#0000ff"><strong>}</strong></span>
}
}
}
Edit tree.hpp and uncomment the class Updater;
and friend class Updater;
lines that are currently commented out.
Make the following blue-highlighted additions to the Model
class.
class Model {
friend class Likelihood;
public:
typedef std::vector<ASRV::SharedPtr> asrv_vect_t;
typedef std::vector<QMatrix::SharedPtr> qmat_vect_t;
typedef std::vector<unsigned> subset_sizes_t;
typedef std::vector<DataType> subset_datatype_t;
typedef std::vector<double> subset_relrate_vect_t;
<span style="color:#0000ff"><strong>typedef std::vector<QMatrix::SharedPtr> state_freq_params_t;</strong></span>
<span style="color:#0000ff"><strong>typedef std::vector<QMatrix::SharedPtr> exchangeability_params_t;</strong></span>
<span style="color:#0000ff"><strong>typedef std::vector<QMatrix::SharedPtr> omega_params_t;</strong></span>
<span style="color:#0000ff"><strong>typedef std::vector<ASRV::SharedPtr> ratevar_params_t;</strong></span>
<span style="color:#0000ff"><strong>typedef std::vector<ASRV::SharedPtr> pinvar_params_t;</strong></span>
typedef boost::shared_ptr<Model> SharedPtr;
Model();
~Model();
void activate();
void inactivate();
std::string describeModel();
void setSubsetDataTypes(const subset_datatype_t & datatype_vect);
void setSubsetRateVar(ASRV::ratevar_ptr_t ratevar, unsigned subset, bool fixed);
void setSubsetPinvar(ASRV::pinvar_ptr_t pinvar, unsigned subset, bool fixed);
void setSubsetExchangeabilities(QMatrix::freq_xchg_ptr_t exchangeabilities, unsigned subset, bool fixed);
void setSubsetStateFreqs(QMatrix::freq_xchg_ptr_t state_frequencies, unsigned subset, bool fixed);
void setSubsetOmega(QMatrix::omega_ptr_t omega, unsigned subset, bool fixed);
void setSubsetRelRates(subset_relrate_vect_t & relrates, bool fixed);
subset_relrate_vect_t & getSubsetRelRates();
bool isFixedSubsetRelRates() const;
double calcNormalizingConstantForSubsetRelRates() const;
void setTreeIndex(unsigned i, bool fixed);
unsigned getTreeIndex() const;
bool isFixedTree() const;
unsigned getNumSubsets() const;
unsigned getNumSites() const;
unsigned getSubsetNumSites(unsigned subset) const;
const QMatrix & getQMatrix(unsigned subset) const;
const ASRV & getASRV(unsigned subset) const;
void setSubsetIsInvarModel(bool is_invar, unsigned subset);
bool getSubsetIsInvarModel(unsigned subset) const;
void setSubsetSizes(const subset_sizes_t nsites_vect);
subset_sizes_t & getSubsetSizes();
void setSubsetNumPatterns(const subset_sizes_t npatterns_vect);
unsigned getSubsetNumPatterns(unsigned subset) const;
void setSubsetNumCateg(unsigned ncateg, unsigned subset);
unsigned getSubsetNumCateg(unsigned subset) const;
<span style="color:#0000ff"><strong>state_freq_params_t & getStateFreqParams();</strong></span>
<span style="color:#0000ff"><strong>exchangeability_params_t & getExchangeabilityParams();</strong></span>
<span style="color:#0000ff"><strong>omega_params_t & getOmegaParams();</strong></span>
<span style="color:#0000ff"><strong>ratevar_params_t & getRateVarParams();</strong></span>
<span style="color:#0000ff"><strong>pinvar_params_t & getPinvarParams();</strong></span>
int setBeagleEigenDecomposition(int beagle_instance, unsigned subset, unsigned instance_subset);
int setBeagleStateFrequencies(int beagle_instance, unsigned subset, unsigned instance_subset);
int setBeagleAmongSiteRateVariationRates(int beagle_instance, unsigned subset, unsigned instance_subset);
int setBeagleAmongSiteRateVariationProbs(int beagle_instance, unsigned subset, unsigned instance_subset);
private:
void clear();
unsigned _num_subsets;
unsigned _num_sites;
subset_sizes_t _subset_sizes;
subset_sizes_t _subset_npatterns;
subset_datatype_t _subset_datatypes;
qmat_vect_t _qmatrix;
asrv_vect_t _asrv;
bool _tree_index;
bool _tree_fixed;
bool _subset_relrates_fixed;
subset_relrate_vect_t _subset_relrates;
<span style="color:#0000ff"><strong>state_freq_params_t _state_freq_params;</strong></span>
<span style="color:#0000ff"><strong>exchangeability_params_t _exchangeability_params;</strong></span>
<span style="color:#0000ff"><strong>omega_params_t _omega_params;</strong></span>
<span style="color:#0000ff"><strong>ratevar_params_t _ratevar_params;</strong></span>
<span style="color:#0000ff"><strong>pinvar_params_t _pinvar_params;</strong></span>
};
Add these 5 function bodies somewhere before the right curly bracket that closes the strom
namespace in model.hpp:
inline Model::state_freq_params_t & Model::getStateFreqParams() {
return _state_freq_params;
}
inline Model::exchangeability_params_t & Model::getExchangeabilityParams() {
return _exchangeability_params;
}
inline Model::omega_params_t & Model::getOmegaParams() {
return _omega_params;
}
inline Model::ratevar_params_t & Model::getRateVarParams() {
return _ratevar_params;
}
inline Model::pinvar_params_t & Model::getPinvarParams() {
return _pinvar_params;
}
Make the following additions to the describeModel
function in model.hpp. These additions specify which parameters are variable in the model and allow the Chain
class to determine what specific parameter updaters are needed.
inline std::string Model::describeModel() {
// Creates summary such as following and returns as a string:
//
// Partition information:
//
// data subset 1 2 3
// -----------------------------------------------------
// num. sites 20 20 20
// num. patterns 7 5 17
// num. states 4 4 4
// rate categories 4 1 4
//
// Parameter linkage:
//
// data subset 1 2 3
// -----------------------------------------------------
// state freqs 1 1 1
// exchangeabilities 1 1 2
// rate variance 1 2 3
// pinvar 1 2 -
<span style="color:#0000ff"><strong>// Start with empty parameter vectors</strong></span>
<span style="color:#0000ff"><strong>_state_freq_params.clear();</strong></span>
<span style="color:#0000ff"><strong>_exchangeability_params.clear();</strong></span>
<span style="color:#0000ff"><strong>_omega_params.clear();</strong></span>
<span style="color:#0000ff"><strong>_ratevar_params.clear();</strong></span>
<span style="color:#0000ff"><strong>_pinvar_params.clear();</strong></span>
// Sets used to determine which parameters are linked across subsets
std::set<double *> freqset;
std::set<double *> xchgset;
std::set<double *> omegaset;
std::set<double *> ratevarset;
std::set<double *> pinvarset;
std::set<double *> relrateset;
// Vectors of pointers to distinct parameters
std::vector<double *> unique_freq;
std::vector<double *> unique_xchg;
std::vector<double *> unique_omega;
std::vector<double *> unique_ratevar;
std::vector<double *> unique_pinvar;
std::vector<double *> unique_relrate;
// Map for storing strings that will contain the information for each row
std::map<std::string, std::string> ss = {
{"subset", ""},
{"dashes", ""},
{"freqs", ""},
{"xchg", ""},
{"omega", ""},
{"ratevar", ""},
{"pinvar", ""},
{"ncateg", ""},
{"nsites", ""},
{"npatterns", ""},
{"nstates", ""}
};
// Ensure that the subset relative rates are fixed if there is only one
// subset; otherwise the subset relative rates will be added to the list
// of free parameters that are updated, which makes no sense in this case
if (_num_subsets == 1)
_subset_relrates_fixed = true;
// Loop through subsets, building up rows as we go
for (unsigned i = 0; i < _num_subsets; i++) {
// Ensure that for subsets in which the number of rate categories is 1 that
// the gamma rate variance is fixed; otherwise the gamma rate variance will
// be added to the list of free parameters that are updated, which makes
// no sense in this case
if (_asrv[i]->getNumCateg() == 1) {
_asrv[i]->fixRateVar(true);
}
unsigned index;
ss["subset"] += boost::str(boost::format("%12d") % (i+1));
ss["dashes"] += "------------";
// Determine whether state freqs are unique for this subset
QMatrix::freq_xchg_ptr_t pfreq = _qmatrix[i]->getStateFreqsSharedPtr();
QMatrix::freq_xchg_t & freq = *pfreq;
double * freq_addr = &freq[0];
auto f = freqset.insert(freq_addr);
if (f.second) {
unique_freq.push_back(freq_addr);
<span style="color:#0000ff"><strong>if (!_qmatrix[i]->isFixedStateFreqs())</strong></span>
<span style="color:#0000ff"><strong>_state_freq_params.push_back(_qmatrix[i]);</strong></span>
index = (unsigned)unique_freq.size();
}
else {
auto iter = std::find(unique_freq.begin(), unique_freq.end(), freq_addr);
index = (unsigned)std::distance(unique_freq.begin(), iter) + 1;
}
ss["freqs"] += boost::str(boost::format("%12d") % index);
// Determine whether exchangeabilities are unique for this subset
if (_subset_datatypes[i].isNucleotide()) {
QMatrix::freq_xchg_ptr_t pxchg = _qmatrix[i]->getExchangeabilitiesSharedPtr();
QMatrix::freq_xchg_t & xchg = *pxchg;
double * xchg_addr = &xchg[0];
auto x = xchgset.insert(xchg_addr);
if (x.second) {
unique_xchg.push_back(xchg_addr);
<span style="color:#0000ff"><strong>if (!_qmatrix[i]->isFixedExchangeabilities())</strong></span>
<span style="color:#0000ff"><strong>_exchangeability_params.push_back(_qmatrix[i]);</strong></span>
index = (unsigned)unique_xchg.size();
}
else {
auto iter = std::find(unique_xchg.begin(), unique_xchg.end(), xchg_addr);
index = (unsigned)std::distance(unique_xchg.begin(), iter) + 1;
}
ss["xchg"] += boost::str(boost::format("%12d") % index);
}
else {
ss["xchg"] += boost::str(boost::format("%12s") % "-");
}
// Determine whether omega is unique for this subset
if (_subset_datatypes[i].isCodon()) {
QMatrix::omega_ptr_t pomega = _qmatrix[i]->getOmegaSharedPtr();
QMatrix::omega_t omegavalue = *pomega;
double * omega_addr = &omegavalue;
auto o = omegaset.insert(omega_addr);
if (o.second) {
unique_omega.push_back(omega_addr);
<span style="color:#0000ff"><strong>if (!_qmatrix[i]->isFixedOmega())</strong></span>
<span style="color:#0000ff"><strong>_omega_params.push_back(_qmatrix[i]);</strong></span>
index = (unsigned)unique_omega.size();
}
else {
auto iter = std::find(unique_omega.begin(), unique_omega.end(), omega_addr);
index = (unsigned)std::distance(unique_omega.begin(), iter) + 1;
}
ss["omega"] += boost::str(boost::format("%12d") % index);
}
else {
ss["omega"] += boost::str(boost::format("%12s") % "-");
}
// Determine whether rate variance is unique for this subset
ASRV::ratevar_ptr_t pratevar = _asrv[i]->getRateVarSharedPtr();
double & ratevar = *pratevar;
double * ratevar_addr = &ratevar;
auto r = ratevarset.insert(ratevar_addr);
if (r.second) {
unique_ratevar.push_back(ratevar_addr);
<span style="color:#0000ff"><strong>if (!_asrv[i]->isFixedRateVar())</strong></span>
<span style="color:#0000ff"><strong>_ratevar_params.push_back(_asrv[i]);</strong></span>
index = (unsigned)unique_ratevar.size();
}
else {
auto iter = std::find(unique_ratevar.begin(), unique_ratevar.end(), ratevar_addr);
index = (unsigned)std::distance(unique_ratevar.begin(), iter) + 1;
}
ss["ratevar"] += boost::str(boost::format("%12d") % index);
// Determine whether pinvar is unique for this subset
if (_asrv[i]->getIsInvarModel()) {
ASRV::pinvar_ptr_t ppinvar = _asrv[i]->getPinvarSharedPtr();
double & pinvar = *ppinvar;
double * pinvar_addr = &pinvar;
auto r = pinvarset.insert(pinvar_addr);
if (r.second) {
unique_pinvar.push_back(pinvar_addr);
<span style="color:#0000ff"><strong>if (!_asrv[i]->isFixedPinvar())</strong></span>
<span style="color:#0000ff"><strong>_pinvar_params.push_back(_asrv[i]);</strong></span>
index = (unsigned)unique_pinvar.size();
}
else {
auto iter = std::find(unique_pinvar.begin(), unique_pinvar.end(), pinvar_addr);
index = (unsigned)std::distance(unique_pinvar.begin(), iter) + 1;
}
ss["pinvar"] += boost::str(boost::format("%12d") % index);
}
else {
ss["pinvar"] += boost::str(boost::format("%12s") % "-");
}
// Save number of rate categories for this subset
ss["ncateg"] += boost::str(boost::format("%12d") % _asrv[i]->getNumCateg());
// Save number of sites for this subset
ss["nsites"] += boost::str(boost::format("%12d") % _subset_sizes[i]);
// Save number of patterns for this subset
ss["npatterns"] += boost::str(boost::format("%12d") % _subset_npatterns[i]);
// Save number of states for this subset
if (_subset_datatypes.size() == _num_subsets)
ss["nstates"] += boost::str(boost::format("%12d") % _subset_datatypes[i].getNumStates());
else
ss["nstates"] += boost::str(boost::format("%12s") % "?");
}
std::string s = "Partition information:\n\n";
s += boost::str(boost::format("%20s%s\n") % "data subset" % ss["subset"]);
s += boost::str(boost::format("%20s%s\n") % "-----------------" % ss["dashes"]);
s += boost::str(boost::format("%20s%s\n") % "num. sites" % ss["nsites"]);
s += boost::str(boost::format("%20s%s\n") % "num. patterns" % ss["npatterns"]);
s += boost::str(boost::format("%20s%s\n") % "num. states" % ss["nstates"]);
s += boost::str(boost::format("%20s%s\n") % "rate categories" % ss["ncateg"]);
s += "\nParameter linkage:\n\n";
s += boost::str(boost::format("%20s%s\n") % "data subset" % ss["subset"]);
s += boost::str(boost::format("%20s%s\n") % "-----------------" % ss["dashes"]);
s += boost::str(boost::format("%20s%s\n") % "state freqs" % ss["freqs"]);
s += boost::str(boost::format("%20s%s\n") % "exchangeabilities" % ss["xchg"]);
s += boost::str(boost::format("%20s%s\n") % "omega" % ss["omega"]);
s += boost::str(boost::format("%20s%s\n") % "rate variance" % ss["ratevar"]);
s += boost::str(boost::format("%20s%s\n") % "pinvar" % ss["pinvar"]);
s += "\nParameter values for each subset:\n";
s += "\n relative rate:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
s += boost::str(boost::format(" %12d: %g\n") % (i+1) % _subset_relrates[i]);
}
s += "\n state freqs:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
QMatrix::freq_xchg_t & freqs = *(_qmatrix[i]->getStateFreqsSharedPtr());
std::vector<std::string> freqs_as_strings(freqs.size());
std::transform(freqs.begin(), freqs.end(), freqs_as_strings.begin(), [](double freq) {return boost::str(boost::format("%g") % freq);});
std::string tmp = boost::algorithm::join(freqs_as_strings, ",");
s += boost::str(boost::format(" %12d: (%s)\n") % (i+1) % tmp);
}
s += "\n exchangeabilities:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
if (_subset_datatypes[i].isNucleotide()) {
QMatrix::freq_xchg_t & xchg = *(_qmatrix[i]->getExchangeabilitiesSharedPtr());
std::vector<std::string> xchg_as_strings(xchg.size());
std::transform(xchg.begin(), xchg.end(), xchg_as_strings.begin(), [](double x) {return boost::str(boost::format("%g") % x);});
std::string tmp = boost::algorithm::join(xchg_as_strings, ",");
s += boost::str(boost::format(" %12d: (%s)\n") % (i+1) % tmp);
}
else {
s += boost::str(boost::format(" %12d: -\n") % (i+1));
}
}
s += "\n omega:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
if (_subset_datatypes[i].isCodon()) {
double omega = *(_qmatrix[i]->getOmegaSharedPtr());
s += boost::str(boost::format(" %12d: %g\n") % (i+1) % omega);
}
else {
s += boost::str(boost::format(" %12d: -\n") % (i+1));
}
}
s += "\n rate variance:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
if (_asrv[i]->getNumCateg() > 1) {
double ratevar = *(_asrv[i]->getRateVarSharedPtr());
s += boost::str(boost::format(" %12d: %g\n") % (i+1) % ratevar);
}
else
s += boost::str(boost::format(" %12d: -\n") % (i+1));
}
s += "\n pinvar:\n";
for (unsigned i = 0; i < _num_subsets; i++) {
double pinvar = *(_asrv[i]->getPinvarSharedPtr());
bool is_invar_model = _asrv[i]->getIsInvarModel();
if (is_invar_model)
s += boost::str(boost::format(" %12d: %g\n") % (i+1) % pinvar);
else
s += boost::str(boost::format(" %12d: -\n") % (i+1));
}
return s;
}
Modify strom.hpp by adding the bold, blue parts below to the Strom
class declaration:
#pragma once
#include <iostream>
#include "data.hpp"
#include "likelihood.hpp"
#include "tree_summary.hpp"
#include "partition.hpp"
<span style="color:#0000ff"><strong>#include "lot.hpp"</strong></span>
<span style="color:#0000ff"><strong>#include "chain.hpp"</strong></span>
#include <boost/program_options.hpp>
#include <boost/filesystem.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
namespace strom {
class Strom {
public:
Strom();
~Strom();
void clear();
void processCommandLineOptions(int argc, const char * argv[]);
void run();
private:
bool processAssignmentString(const std::string & which, const std::string & definition);
void handleAssignmentStrings(const boost::program_options::variables_map & vm, std::string label, const std::vector<std::string> & definitions, std::string default_definition);
bool splitAssignmentString(const std::string & definition, std::vector<std::string> & vector_of_subset_names, std::vector<double> & vector_of_values);
double _expected_log_likelihood;
std::string _data_file_name;
std::string _tree_file_name;
Partition::SharedPtr _partition;
Data::SharedPtr _data;
Model::SharedPtr _model;
Likelihood::SharedPtr _likelihood;
TreeSummary::SharedPtr _tree_summary;
<span style="color:#0000ff"><strong>Lot::SharedPtr _lot;</strong></span>
<span style="color:#0000ff"><strong>unsigned _random_seed;</strong></span>
<span style="color:#0000ff"><strong>unsigned _num_iter;</strong></span>
<span style="color:#0000ff"><strong>unsigned _print_freq;</strong></span>
<span style="color:#0000ff"><strong>unsigned _sample_freq;</strong></span>
bool _use_gpu;
bool _ambig_missing;
bool _use_underflow_scaling;
static std::string _program_name;
static unsigned _major_version;
static unsigned _minor_version;
};
I’ve added several data members: _lot
(the pseudorandom number generator), _random_seed
(the starting seed for the pseudorandom number generator), _num_iter
(the number of parameter updates that our Markov chain simulator will attempt, _sample_freq
(the frequency with which parameter values will be saved), and _print_freq
(the frequency with which a progress report will be issued to the console). The last one (_print_freq
) will be ignored by our program for now; we will begin using it in the next step to control how much output appears on the screen as the program is running (we will not want a report after every iteration if we’ve asked for 10 million iterations!).
Modify strom.hpp by adding the bold, blue parts below to the clear
member function:
inline void Strom::clear() {
_data_file_name = "";
_tree_file_name = "";
_tree_summary = nullptr;
_partition.reset(new Partition());
_use_gpu = true;
_ambig_missing = true;
_model.reset(new Model());
_expected_log_likelihood = 0.0;
_data = nullptr;
_likelihood = nullptr;
_use_underflow_scaling = false;
<span style="color:#0000ff"><strong>_lot = nullptr;</strong></span>
<span style="color:#0000ff"><strong>_random_seed = 1;</strong></span>
<span style="color:#0000ff"><strong>_num_iter = 1000;</strong></span>
<span style="color:#0000ff"><strong>_print_freq = 1;</strong></span>
<span style="color:#0000ff"><strong>_sample_freq = 1;</strong></span>
}
Continue modifying strom.hpp by adding the bold, blue parts below to the Strom::processCommandLineOptions
function:
inline void Strom::processCommandLineOptions(int argc, const char * argv[]) {
std::vector<std::string> partition_statefreq;
std::vector<std::string> partition_rmatrix;
std::vector<std::string> partition_omega;
std::vector<std::string> partition_ratevar;
std::vector<std::string> partition_pinvar;
std::vector<std::string> partition_ncateg;
std::vector<std::string> partition_subsets;
std::vector<std::string> partition_relrates;
std::vector<std::string> partition_tree;
boost::program_options::variables_map vm;
boost::program_options::options_description desc("Allowed options");
desc.add_options()
("help,h", "produce help message")
("version,v", "show program version")
<span style="color:#0000ff"><strong>("seed,z", boost::program_options::value(&_random_seed)->default_value(1), "pseudorandom number seed")</strong></span>
<span style="color:#0000ff"><strong>("niter,n", boost::program_options::value(&_num_iter)->default_value(1000), "number of MCMC iterations")</strong></span>
<span style="color:#0000ff"><strong>("printfreq", boost::program_options::value(&_print_freq)->default_value(1), "skip this many iterations before reporting progress")</strong></span>
<span style="color:#0000ff"><strong>("samplefreq", boost::program_options::value(&_sample_freq)->default_value(1), "skip this many iterations before sampling next")</strong></span>
("datafile,d", boost::program_options::value(&_data_file_name)->required(), "name of a data file in NEXUS format")
("treefile,t", boost::program_options::value(&_tree_file_name)->required(), "name of a tree file in NEXUS format")
("subset", boost::program_options::value(&partition_subsets), "a string defining a partition subset, e.g. 'first:1-1234\3' or 'default[codon:standard]:1-3702'")
("ncateg,c", boost::program_options::value(&partition_ncateg), "number of categories in the discrete Gamma rate heterogeneity model")
("statefreq", boost::program_options::value(&partition_statefreq), "a string defining state frequencies for one or more data subsets, e.g. 'first,second:0.1,0.2,0.3,0.4'")
("omega", boost::program_options::value(&partition_omega), "a string defining the nonsynonymous/synonymous rate ratio omega for one or more data subsets, e.g. 'first,second:0.1'")
("rmatrix", boost::program_options::value(&partition_rmatrix), "a string defining the rmatrix for one or more data subsets, e.g. 'first,second:1,2,1,1,2,1'")
("ratevar", boost::program_options::value(&partition_ratevar), "a string defining the among-site rate variance for one or more data subsets, e.g. 'first,second:2.5'")
("pinvar", boost::program_options::value(&partition_pinvar), "a string defining the proportion of invariable sites for one or more data subsets, e.g. 'first,second:0.2'")
("relrate", boost::program_options::value(&partition_relrates), "a string defining the (unnormalized) relative rates for all data subsets (e.g. 'default:3,1,6').")
("tree", boost::program_options::value(&partition_tree), "the index of the tree in the tree file (first tree has index = 1)")
("expectedLnL", boost::program_options::value(&_expected_log_likelihood)->default_value(0.0), "log likelihood expected")
("gpu", boost::program_options::value(&_use_gpu)->default_value(true), "use GPU if available")
("ambigmissing", boost::program_options::value(&_ambig_missing)->default_value(true), "treat all ambiguities as missing data")
("underflowscaling", boost::program_options::value(&_use_underflow_scaling)->default_value(false), "scale site-likelihoods to prevent underflow (slower but safer)")
;
boost::program_options::store(boost::program_options::parse_command_line(argc, argv, desc), vm);
try {
const boost::program_options::parsed_options & parsed = boost::program_options::parse_config_file< char >("strom.conf", desc, false);
boost::program_options::store(parsed, vm);
}
catch(boost::program_options::reading_file & x) {
std::cout << "Note: configuration file (strom.conf) not found" << std::endl;
}
boost::program_options::notify(vm);
// If user specified --help on command line, output usage summary and quit
if (vm.count("help") > 0) {
std::cout << desc << "\n";
std::exit(1);
}
// If user specified --version on command line, output version and quit
if (vm.count("version") > 0) {
std::cout << boost::str(boost::format("This is %s version %d.%d") % _program_name % _major_version % _minor_version) << std::endl;
std::exit(1);
}
// If user specified --subset on command line, break specified partition subset
// definition into name and character set string and add to _partition
if (vm.count("subset") > 0) {
_partition.reset(new Partition());
for (auto s : partition_subsets) {
_partition->parseSubsetDefinition(s);
}
}
_model->setSubsetDataTypes(_partition->getSubsetDataTypes());
handleAssignmentStrings(vm, "statefreq", partition_statefreq, "default:equal");
handleAssignmentStrings(vm, "rmatrix", partition_rmatrix, "default:equal");
handleAssignmentStrings(vm, "omega", partition_omega, "default:0.1" );
handleAssignmentStrings(vm, "ncateg", partition_ncateg, "default:1" );
handleAssignmentStrings(vm, "ratevar", partition_ratevar, "default:1.0" );
handleAssignmentStrings(vm, "pinvar", partition_pinvar, "default:0.0" );
handleAssignmentStrings(vm, "relrate", partition_relrates, "default:equal");
handleAssignmentStrings(vm, "tree", partition_tree, "default:1" );
}
Our final modifications to strom.hpp will be to add the bold, blue parts below to the Strom::run
function:
inline void Strom::run() {
std::cout << "Starting..." << std::endl;
std::cout << "Current working directory: " << boost::filesystem::current_path() << std::endl;
try {
std::cout << "\n*** Reading and storing the data in the file " << _data_file_name << std::endl;
_data = Data::SharedPtr(new Data());
_data->setPartition(_partition);
_data->getDataFromFile(_data_file_name);
_model->setSubsetNumPatterns(_data->calcNumPatternsVect());
_model->setSubsetSizes(_partition->calcSubsetSizes());
_model->activate();
// Report information about data partition subsets
unsigned nsubsets = _data->getNumSubsets();
std::cout << "\nNumber of taxa: " << _data->getNumTaxa() << std::endl;
std::cout << "Number of partition subsets: " << nsubsets << std::endl;
for (unsigned subset = 0; subset < nsubsets; subset++) {
DataType dt = _partition->getDataTypeForSubset(subset);
std::cout << " Subset " << (subset+1) << " (" << _data->getSubsetName(subset) << ")" << std::endl;
std::cout << " data type: " << dt.getDataTypeAsString() << std::endl;
std::cout << " sites: " << _data->calcSeqLenInSubset(subset) << std::endl;
std::cout << " patterns: " << _data->getNumPatternsInSubset(subset) << std::endl;
std::cout << " ambiguity: " << (_ambig_missing || dt.isCodon() ? "treated as missing data (faster)" : "handled appropriately (slower)") << std::endl;
}
std::cout << "\n*** Resources available to BeagleLib " << _likelihood->beagleLibVersion() << ":\n";
std::cout << _likelihood->availableResources() << std::endl;
std::cout << "\n*** Creating the likelihood calculator" << std::endl;
_likelihood = Likelihood::SharedPtr(new Likelihood());
_likelihood->setPreferGPU(_use_gpu);
_likelihood->setAmbiguityEqualsMissing(_ambig_missing);
_likelihood->setData(_data);
_likelihood->useUnderflowScaling(_use_underflow_scaling);
std::cout << "\n*** Model description" << std::endl;
std::cout << _model->describeModel() << std::endl;
_likelihood->setModel(_model);
_likelihood->initBeagleLib();
std::cout << "\n*** Reading and storing the first tree in the file " << _tree_file_name << std::endl;
_tree_summary = TreeSummary::SharedPtr(new TreeSummary());
_tree_summary->readTreefile(_tree_file_name, 0);
Tree::SharedPtr tree = _tree_summary->getTree(0);
<span style="color:#0000ff"><strong>std::string newick = _tree_summary->getNewick(0);</strong></span>
if (tree->numLeaves() != _data->getNumTaxa())
throw XStrom(boost::format("Number of taxa in tree (%d) does not equal the number of taxa in the data matrix (%d)") % tree->numLeaves() % _data->getNumTaxa());
std::cout << "\n*** Calculating the likelihood of the tree" << std::endl;
<span style="color:#0000ff"><strong>TreeManip tm(tree);</strong></span>
<span style="color:#0000ff"><strong>tm.selectAllPartials();</strong></span>
<span style="color:#0000ff"><strong>tm.selectAllTMatrices();</strong></span>
double lnL = _likelihood->calcLogLikelihood(tree);
std::cout << boost::str(boost::format("log likelihood = %.5f") % lnL) << std::endl;
if (_expected_log_likelihood != 0.0)
std::cout << boost::str(boost::format(" (expecting %.3f)") % _expected_log_likelihood) << std::endl;
<span style="color:#0000ff"><strong>// Create a Lot object that generates (pseudo)random numbers</strong></span>
<span style="color:#0000ff"><strong>_lot = Lot::SharedPtr(new Lot);</strong></span>
<span style="color:#0000ff"><strong>_lot->setSeed(_random_seed);</strong></span>
<span style="color:#0000ff"><strong>// Create a Chain object and take _num_iter steps</strong></span>
<span style="color:#0000ff"><strong>Chain chain;</strong></span>
<span style="color:#0000ff"><strong>unsigned num_free_parameters = chain.createUpdaters(_model, _lot, _likelihood);</strong></span>
<span style="color:#0000ff"><strong>if (num_free_parameters > 0) {</strong></span>
<span style="color:#0000ff"><strong>chain.setTreeFromNewick(newick);</strong></span>
<span style="color:#0000ff"><strong>chain.start();</strong></span>
<span style="color:#0000ff"><strong>// Output column headers and first line of output showing starting state (iteration 0)</strong></span>
<span style="color:#0000ff"><strong>std::cout << boost::str(boost::format("\n%12s %12s %12s %12s %12s %12s\n")</strong></span>
<span style="color:#0000ff"><strong>% "iteration"</strong></span>
<span style="color:#0000ff"><strong>% "lnLike"</strong></span>
<span style="color:#0000ff"><strong>% "lnPrior"</strong></span>
<span style="color:#0000ff"><strong>% "ratevar"</strong></span>
<span style="color:#0000ff"><strong>% "accept"</strong></span>
<span style="color:#0000ff"><strong>% "samples");</strong></span>
<span style="color:#0000ff"><strong>GammaRateVarUpdater::SharedPtr ratevar_updater = std::dynamic_pointer_cast<GammaRateVarUpdater> (chain.findUpdaterByName("Gamma Rate Variance"));</strong></span>
<span style="color:#0000ff"><strong>std::cout << boost::str(boost::format("%12d %12.5f %12.5f %12.5f %12.1f %12d\n")</strong></span>
<span style="color:#0000ff"><strong>% 0</strong></span>
<span style="color:#0000ff"><strong>% chain.getLogLikelihood()</strong></span>
<span style="color:#0000ff"><strong>% chain.calcLogJointPrior()</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getCurrentPoint()</strong></span>
<span style="color:#0000ff"><strong>% 0</strong></span>
<span style="color:#0000ff"><strong>% 0);</strong></span>
<span style="color:#0000ff"><strong>for (unsigned iteration = 1; iteration <= _num_iter; ++iteration) {</strong></span>
<span style="color:#0000ff"><strong>chain.nextStep(iteration);</strong></span>
<span style="color:#0000ff"><strong>if (iteration % _sample_freq == 0) {</strong></span>
<span style="color:#0000ff"><strong>GammaRateVarUpdater::SharedPtr ratevar_updater = std::dynamic_pointer_cast<GammaRateVarUpdater> (chain.findUpdaterByName("Gamma Rate Variance"));</strong></span>
<span style="color:#0000ff"><strong>double log_prior = chain.calcLogJointPrior();</strong></span>
<span style="color:#0000ff"><strong>if (log_prior == Updater::getLogZero())</strong></span>
<span style="color:#0000ff"><strong>std::cout << boost::str(boost::format("%12d %12.5f %12s %12.5f %12.1f %12d\n")</strong></span>
<span style="color:#0000ff"><strong>% iteration</strong></span>
<span style="color:#0000ff"><strong>% chain.getLogLikelihood()</strong></span>
<span style="color:#0000ff"><strong>% "-infinity"</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getCurrentPoint()</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getAcceptPct()</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getNumUpdates());</strong></span>
<span style="color:#0000ff"><strong>else</strong></span>
<span style="color:#0000ff"><strong>std::cout << boost::str(boost::format("%12d %12.5f %12.5f %12.5f %12.1f %12d\n")</strong></span>
<span style="color:#0000ff"><strong>% iteration</strong></span>
<span style="color:#0000ff"><strong>% chain.getLogLikelihood()</strong></span>
<span style="color:#0000ff"><strong>% log_prior</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getCurrentPoint()</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getAcceptPct()</strong></span>
<span style="color:#0000ff"><strong>% ratevar_updater->getNumUpdates());</strong></span>
<span style="color:#0000ff"><strong>}</strong></span>
<span style="color:#0000ff"><strong>}</strong></span>
<span style="color:#0000ff"><strong>chain.stop();</strong></span>
<span style="color:#0000ff"><strong>}</strong></span>
<span style="color:#0000ff"><strong>else {</strong></span>
<span style="color:#0000ff"><strong>std::cout << "\nMCMC skipped because there are no free parameters in the model" << std::endl;</strong></span>
<span style="color:#0000ff"><strong>}</strong></span>
}
catch (XStrom & x) {
std::cerr << "Strom encountered a problem:\n " << x.what() << std::endl;
}
std::cout << "\nFinished!" << std::endl;
}
You will also need to return main.cpp to the way it was before we changed it to test the Lot
class. Replace everything in main.cpp with the following:
#include <limits>
#include <iostream>
#include "strom.hpp"
using namespace strom;
// static data member initializations
std::string Strom::_program_name = "strom";
unsigned Strom::_major_version = 1;
unsigned Strom::_minor_version = 0;
const double Node::_smallest_edge_length = 1.0e-12;
const double Updater::_log_zero = -std::numeric_limits<double>::max();
GeneticCode::genetic_code_definitions_t GeneticCode::_definitions = { // codon order is alphabetical: i.e. AAA, AAC, AAG, AAT, ACA, ..., TTT
{"standard", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},
{"vertmito", "KNKNTTTT*S*SMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"yeastmito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"moldmito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"invertmito", "KNKNTTTTSSSSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"ciliate", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVVQYQYSSSS*CWCLFLF"},
{"echinomito", "NNKNTTTTSSSSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"euplotid", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSCCWCLFLF"},
{"plantplastid", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},
{"altyeast", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLSLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},
{"ascidianmito", "KNKNTTTTGSGSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"altflatwormmito", "NNKNTTTTSSSSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVVYY*YSSSSWCWCLFLF"},
{"blepharismamacro", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YQYSSSS*CWCLFLF"},
{"chlorophyceanmito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YLYSSSS*CWCLFLF"},
{"trematodemito", "NNKNTTTTSSSSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},
{"scenedesmusmito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YLY*SSS*CWCLFLF"},
{"thraustochytriummito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWC*FLF"}
};
int main(int argc, const char * argv[]) {
Strom strom;
try {
strom.processCommandLineOptions(argc, argv);
strom.run();
}
catch(std::exception & x) {
std::cerr << "Exception: " << x.what() << std::endl;
std::cerr << "Aborted." << std::endl;
}
catch(...) {
std::cerr << "Exception of unknown type!\n";
}
return 0;
}
Replace the contents of strom.conf with the following. This sets all of the model parameters to the values PAUP* estimated with the exception of ratevar, which we will start at 1.0 rather than 3.94280 so we can test whether our rate variance updater is working.
We also set the number of rate categories to 4. This is necessary because, otherwise, the model would assume rate homogeneity and our one updatable parameter (gamma rate variance) would not be updated!
datafile = rbcl10.nex
treefile = rbcl10.tre
statefreq = default:[0.294667,0.189172,0.206055,0.310106]
rmatrix = default:[0.06082,0.27887,0.06461,0.06244,0.48492,0.04834]
ratevar = default:1.0 # MLE = 3.94322 = 1./0.25360
pinvar = default:[0.0]
ncateg = default:4
niter = 1000
samplefreq = 1
printfreq = 1
expectedLnL = -6897.492
Add the lines indicated below to your meson.build file. These 2 new lines copy rbcl10.nex and rbcl10.tre to your install directory.
project('strom', 'cpp',
default_options : ['cpp_std=c++11','prefix=/home/paul/Documents/strom/distr'],
version : '1.0')
cpp = meson.get_compiler('cpp')
lib_filesystem = cpp.find_library('boost_filesystem', dirs: ['/home/paul/lib/static'], required: true)
lib_program_options = cpp.find_library('boost_program_options', dirs: ['/home/paul/lib/static'], required: true)
lib_ncl = cpp.find_library('ncl', dirs: ['/home/paul/lib/static'], required: true)
lib_beagle = cpp.find_library('hmsbeagle', dirs: ['/home/paul/lib'], required: true)
incl_beagle = include_directories('/home/paul/include/libhmsbeagle-1')
incl_ncl = include_directories('/home/paul/include')
incl_boost = include_directories('/home/paul/Documents/libraries/boost_1_71_0')
incl_eigen = include_directories('/home/paul/Documents/libraries/eigen-eigen-323c052e1731')
executable('strom', 'main.cpp', install: true, install_dir: '.', dependencies: [lib_beagle,lib_ncl,lib_program_options,lib_filesystem], include_directories: [incl_beagle,incl_ncl,incl_boost,incl_eigen])
install_data('strom.conf', install_dir: '.')
<span style="color:#0000ff"><strong>install_data('rbcl10.nex', install_dir: '.')</strong></span>
<span style="color:#0000ff"><strong>install_data('rbcl10.tre', install_dir: '.')</strong></span>
install_data('rbcl738.nex', install_dir: '.')
install_data('rbcl738nj.tre', install_dir: '.')
install_data('test.tre', install_dir: '.')
install_data('rbcL.nex', install_dir: '.')
install_data('rbcLjc.tre', install_dir: '.')
install_data('go.sh', install_dir: '.')
install_data('datatest.nex', install_dir: '.')
Compile and install strom
, as usual, by navigating into your build directory and typing meson install
.
You should see output from 1000 calls to the nextStep
function of the Chain
object: I’ve shown only the first 10 iterations (plus the very last iteration) of this output below
iteration lnLike lnPrior ratevar accept samples
0 -6897.49179 -1.00000 1.00000 0 0
1 -6817.59032 -1.49718 1.49718 100.0 1
2 -6817.59032 -1.49718 1.49718 50.0 2
3 -6770.15548 -2.02658 2.02658 66.7 3
4 -6770.15548 -2.02658 2.02658 50.0 4
5 -6770.15548 -2.02658 2.02658 40.0 5
6 -6770.15548 -2.02658 2.02658 33.3 6
7 -6750.87905 -2.37988 2.37988 42.9 7
8 -6742.62316 -2.58944 2.58944 50.0 8
9 -6742.62316 -2.58944 2.58944 44.4 9
10 -6742.62316 -2.58944 2.58944 40.0 10
1000 -6723.60154 -3.70240 3.70240 30.9 1000
Note that the acceptance percentage for our gamma rate variance updater quickly converges to 30%, which is the target we specified by default in the Chain
constructor (actually, in Chain::clear
function, which is called by the constructor). Right now, we are tuning throughout the MCMC run, but soon we will change things so that autotuning is done for our updaters only during the burn-in period.
Also note that the likelihood improves as the gamma rate variance converges from the starting value of 1.0 to a value close to the maximum likelihood estimate (MLE) 3.94280, suggesting that our MCMC chain is converging on a region of parameter space having higher posterior probability density. The log-likelihood also approaches (but of course can never exceed) the maximum likleihood of -6723.144 as the chain runs.
Finally, notice that the lnPrior column is always equal to -1 times the gamma rate variance value. This is because the prior is completely determined by the prior on the gamma rate variance parameter (our model currently only has one estimated parameter). This gamma rate variance prior is, by default, Gamma(1,1), which is specified in Chain::clear
. The log of this probability density is -log(1) – ratevar/1
, which just equals -ratevar
.