16.4 The SubsetRelRateUpdater Class

(Linux version)

< 16.3 | 16.4 | 16.5 >

Creating a SubsetRelRateUpdater is slightly more complicated than creating the StateFreqUpdater and ExchangeabilityUpdater classes. That’s because subset relative rates do not themselves have a Dirichlet prior distribution. Dirichlet distributions are defined on a simplex, which means that a Dirichlet random variable is a vector whose elements sum to 1.0. Subset relative rates do not sum to 1; instead, their mean is 1. What does sum to 1.0 are the components of the mean. To see this, consider how the mean relative substitution rate across subsets is computed (in pseudocode):

mean relative rate = r_1 p_1 + r_2 p_2 + ... + r_k p_k

Here, r_i is the relative rate at which subset i is evolving and p_i is the probability that a site is in subset i (and thus has relative rate r_i). The probability p_i is simply the proportion of sites in subset i. The mean relative rate equals 1.0, by design, because subset relative rates affect the heterogeneity in substitution rate across subsets, not the absolute rate. Writing the same equation in this way:

mean relative rate = y_1 + y_2 + ... + y_k = 1.0

Shows that the values y_i = r_i p_i form a simplex and can thus have a Dirichlet prior distribution. Because of the involvement of the p_i, this means that the r_i do not themselves have a Dirichlet prior. The distribution of the subset relative rates can be easily obtained, however, and the prior density for a vector of K subset relative rates is identical to a Dirichlet density multiplied by a constant factor equal to the product of the first K-1 proportions (i.e. p_1 p_2 ... p_(K-1)).

In order to use the DirichletUpdater base class, we must provide it with y_i values rather than r_i values, and then multiply the prior computed by DirichletUpdater by the constant factor described above. We override the virtual function DirichletUpdater::calcLogPrior in order to make the needed modification.

Create a new header file named subset_relrate_updater.hpp and fill it with the following.

#pragma once    

#include "dirichlet_updater.hpp"

namespace strom {
    
    class SubsetRelRateUpdater : public DirichletUpdater {
        public:
            typedef std::shared_ptr&lt; SubsetRelRateUpdater &gt; SharedPtr;

                                            SubsetRelRateUpdater(Model::SharedPtr model);
                                            ~SubsetRelRateUpdater();
        
            virtual double                  calcLogPrior();

        private:
        
            void                            pullFromModel();
            void                            pushToModel();

            Model::SharedPtr                _model;
        };

    // member function bodies go here
    
}   

Constructor and destructor

The constructor first calls the clear function defined by the DirichletUpdater base class. It then sets its _name to “Subset Relative Rates” and the data member _model to the argument supplied as the model parameter. The destructor is merely a placeholder, as usual.

    inline SubsetRelRateUpdater::SubsetRelRateUpdater(Model::SharedPtr model) {   
        //std::cout &lt;&lt; "Creating a SubsetRelRateUpdater" &lt;&lt; std::endl;
        DirichletUpdater::clear();
        _name = "Subset Relative Rates";
        _model = model;
    }

    inline SubsetRelRateUpdater::~SubsetRelRateUpdater() {
        //std::cout &lt;&lt; "Destroying a SubsetRelRateUpdater" &lt;&lt; std::endl;
    }   

The calcLogPrior member function

As explained above, this function first calls the calcLogPrior member function of the base class DirichletUpdater in order to do the majority of the work. The only remaining work is to add the log of the first K-1 subset proportions (where K is the number of subsets) to the log prior value returned by DirichletUpdater::calcLogPrior. Each of these subset proportions equals the number of sites in the subset divided by the total number of sites. On log scale, this turns into the log of the number of sites in the subset minus the log of the total number of sites.

    inline double SubsetRelRateUpdater::calcLogPrior() {  
        Model::subset_sizes_t & subset_sizes = _model-&gt;getSubsetSizes();
        double log_num_sites = std::log(_model-&gt;getNumSites());
        unsigned num_subsets = _model-&gt;getNumSubsets();
        double log_prior = DirichletUpdater::calcLogPrior();
        for (unsigned i = 0; i &lt; num_subsets-1; i++) {
            log_prior += std::log(subset_sizes[i]) - log_num_sites;
        }
        return log_prior;
    }  

The pullFromModel and pushToModel member functions

The pullFromModel function cannot simply copy the subset relative rates from the _subset_relrates data member of model into _curr_point. Instead, each element of _curr_point is the product of a subset relative rate and the proportion of sites in that subset.

Similarly, the reverse process is used in pushToModel. Each element of _curr_point is divided by the proportion of sites in the subset in order to extract just the subset relative rate. These relative rates are stored in a temporary vector tmp and then uploaded to the model via the setSubsetRelRates function. Note that we can assume that the second parameter (fixed) of the setSubsetRelRates function is false because, if this updater is in use, then subset relative rates cannot have been fixed by the user.

    inline void SubsetRelRateUpdater::pullFromModel() {  
        Model::subset_relrate_vect_t & relative_rates = _model-&gt;getSubsetRelRates();
        Model::subset_sizes_t &        subset_sizes   = _model-&gt;getSubsetSizes();
        unsigned num_sites   = _model-&gt;getNumSites();
        unsigned num_subsets = _model-&gt;getNumSubsets();
        assert(subset_sizes.size() == num_subsets);
        assert(relative_rates.size() == num_subsets);
        _curr_point.resize(num_subsets);
        for (unsigned i = 0; i &lt; num_subsets; i++) {
            _curr_point[i] = relative_rates[i]*subset_sizes[i]/num_sites;
        }
    }
    
    inline void SubsetRelRateUpdater::pushToModel() {
        Model::subset_sizes_t & subset_sizes = _model-&gt;getSubsetSizes();
        unsigned num_sites   = _model-&gt;getNumSites();
        unsigned num_subsets = _model-&gt;getNumSubsets();
        point_t tmp(num_subsets);
        for (unsigned i = 0; i &lt; num_subsets; i++) {
            tmp[i] = _curr_point[i]*num_sites/subset_sizes[i];
        }
        _model-&gt;setSubsetRelRates(tmp, false);
    }