9.2 Create the Partition class

(Linux version)


Partition class

A Partition object will serve as the manager for data subsets defined by the user. It will store the name of each data subset, the range(s) of sites included in each subset, and information about the type of data in each subset.

Create a new C++ class named Partition and add it to your project as the header file partition.hpp. Below is the class declaration. The body of each member function will be described separately (you should add each of these member function bodies just above the right curly bracket that terminates the namespace block).

#pragma once    

#include <tuple>
#include <limits>
#include <cmath>
#include <cassert>
#include <algorithm>
#include <iostream>
#include <memory>
#include <regex>
#include <string>
#include <vector>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>             
#include <boost/algorithm/string/classification.hpp>    
#include "genetic_code.hpp"
#include "datatype.hpp"
#include "xstrom.hpp"

namespace strom {

    class Partition {
        public:
            typedef std::match_results<std::string::const_iterator>::const_reference    regex_match_t;
            typedef std::tuple<unsigned, unsigned, unsigned, unsigned>                  subset_range_t;
            typedef std::vector<subset_range_t>                                         partition_t;
            typedef std::vector<DataType>                                               datatype_vect_t;
            typedef std::vector<unsigned>                                               subset_sizes_vect_t;
            typedef std::vector<std::string>                                            subset_names_vect_t;
            typedef std::shared_ptr<Partition>                                          SharedPtr;

                                                        Partition();
                                                        ~Partition();
        
            unsigned                                    getNumSites() const;
            unsigned                                    getNumSubsets() const;
            std::string                                 getSubsetName(unsigned subset) const;
        
            const partition_t &                         getSubsetRangeVect() const;
        
            unsigned                                    findSubsetByName(const std::string & subset_name) const;
            unsigned                                    findSubsetForSite(unsigned site_index) const;
            bool                                        siteInSubset(unsigned site_index, unsigned subset_index) const;
            DataType                                    getDataTypeForSubset(unsigned subset_index) const;
            const datatype_vect_t &                     getSubsetDataTypes() const;
        
            unsigned                                    numSitesInSubset(unsigned subset_index) const;
            subset_sizes_vect_t                         calcSubsetSizes() const;

            void                                        defaultPartition(unsigned nsites = std::numeric_limits<unsigned>::max());
            void                                        parseSubsetDefinition(std::string & s);
            void                                        finalize(unsigned nsites);

            void                                        clear();

        private:

            int                                         extractIntFromRegexMatch(regex_match_t s, unsigned min_value);
            void                                        addSubsetRange(unsigned subset_index, std::string range_definition);
            void                                        addSubset(unsigned subset_index, std::string subset_definition);

            unsigned                                    _num_sites;
            unsigned                                    _num_subsets;
            subset_names_vect_t                         _subset_names;
            partition_t                                 _subset_ranges;
            datatype_vect_t                             _subset_data_types;

            const unsigned                              _infinity;
    };
    
    // member function bodies here
    
}   

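The class declaration above defines several types (regex_match_t, subset_range_t, partition_t, datatype_vect_t, subset_sizes_vect_t, subset_names_vect_t, and SharedPtr) to simplify the member function prototypes and data member definitions that follow. For example, subset_range_t is the 4-tuple that stores one range of sites, and partition_t is a vector of such tuples. These type definitions make it simpler to define variables of these types and to pass such variables into functions.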

Constructor and destructor

Here are the bodies of the constructor and destructor. As usual, the only thing that we have these functions do is to report when an object of the Partition class is created or destroyed, and these lines have been commented out (you can uncomment them at any time for debugging purposes). In addition, the constructor calls the clear member function to perform initializations.

The data member _infinity needs some explanation. This data member has type const unsigned and as a const data member must be initialized before the Partition object has been created. Thus, we cannot initialize it in the body of the constructor, because, by that time, the object already exists. Hence, _infinity is initialized via the initializer list (before the left curly bracket that opens the constructor body). The data member _infinity is set equal to the largest possible unsigned value, which is used as a stand-in for the total number of sites until we actually know the total number of sites. The value itself is obtained using the (static) max function of the class std::numeric_limits<unsigned>. The #include <limits> line at the top of the file provides access to the numeric_limits class.

    inline Partition::Partition() : _infinity(std::numeric_limits<unsigned>::max()) {   
        //std::cout << "Constructing a Partition" << std::endl;
        clear();
    }  

    inline Partition::~Partition() {
        //std::cout << "Destroying a Partition" << std::endl;
    }   
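
The initializer-list requirement described above can be seen in isolation in the following minimal sketch. The class SiteLimit and its infinity accessor are hypothetical stand-ins invented for this illustration; they are not part of the tutorial's code.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for Partition that holds a single const data member.
class SiteLimit {
    public:
        // A const member must be given its value in the initializer list;
        // assigning to _infinity inside the constructor body would not compile.
        SiteLimit() : _infinity(std::numeric_limits<unsigned>::max()) {}

        unsigned infinity() const { return _infinity; }

    private:
        const unsigned _infinity;
};
```

Because unsigned arithmetic wraps around, adding 1 to the value returned by infinity() yields 0, which confirms that it really is the largest unsigned value.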

Accessor functions

The functions getNumSites, getNumSubsets, getSubsetName, and getSubsetRangeVect provide access to the values stored in the private data members _num_sites, _num_subsets, _subset_names, and _subset_ranges, respectively, while getDataTypeForSubset and getSubsetDataTypes both provide access to _subset_data_types. None of these functions allows you to change those variables.

    inline unsigned Partition::getNumSites() const {    
        return _num_sites;
    }
    
    inline unsigned Partition::getNumSubsets() const {
        return _num_subsets;
    }
    
    inline std::string Partition::getSubsetName(unsigned subset) const {
        assert(subset < _num_subsets);
        return _subset_names[subset];
    }
    
    inline const Partition::partition_t & Partition::getSubsetRangeVect() const {
        return _subset_ranges;
    }
    
    inline DataType Partition::getDataTypeForSubset(unsigned subset_index) const {
        assert(subset_index < _subset_data_types.size());
        return _subset_data_types[subset_index];
    }

    inline const std::vector<DataType> & Partition::getSubsetDataTypes() const {
        return _subset_data_types;
    }    

The findSubsetByName member function

This function returns the (0-based) index of the subset in _subset_names corresponding to the subset_name provided. If no subset by that name can be found, an exception is thrown. Here the std::find function is used to search for subset_name in _subset_names. If found, the returned iterator will not equal _subset_names.end(), which represents the position just beyond the last element of the vector. Once the iterator is positioned at the correct element of _subset_names, the index is found using the std::distance algorithm, which computes the distance between the returned iterator and the first element of the vector.

    inline unsigned Partition::findSubsetByName(const std::string & subset_name) const {    
        auto iter = std::find(_subset_names.begin(), _subset_names.end(), subset_name);
        if (iter == _subset_names.end())
            throw XStrom(boost::format("Specified subset name \"%s\" not found in partition") % subset_name);
        return (unsigned)std::distance(_subset_names.begin(),iter);
    }    
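
The std::find and std::distance combination can be tried outside the class with a sketch like the one below. The free function indexOfName is a hypothetical analogue of findSubsetByName, and it throws std::runtime_error only because XStrom is not available in a standalone example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical free-function analogue of Partition::findSubsetByName.
inline unsigned indexOfName(const std::vector<std::string> & names, const std::string & target) {
    // std::find returns names.end() if target is not present
    auto iter = std::find(names.begin(), names.end(), target);
    if (iter == names.end())
        throw std::runtime_error("name not found: " + target);

    // number of steps from the first element to the element found
    return static_cast<unsigned>(std::distance(names.begin(), iter));
}
```

For a vector holding "rbcL", "18S", and "morph", indexOfName returns 0, 1, and 2, respectively, and throws for any other name.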

The findSubsetForSite member function

This function returns the subset index corresponding to a given site. Subset indices start at 0 and are indexed according to the order in which they are specified. Sites are numbered starting from 1, which is the convention used in, for example, NEXUS formatted data files.

The information for each chunk of sites is stored in a 4-tuple. A std::tuple is a structure that contains a fixed number of values (in our case 4) and is a generalization of std::pair, which represents a 2-tuple. The std::get template returns a reference to the kth element of the tuple t, where k is specified in the angle brackets and t in parentheses. Each value in the 4-tuple is copied to a local variable for purposes of clarity.

    inline unsigned Partition::findSubsetForSite(unsigned site_index) const {    
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            bool inside_range = site_index >= begin_site && site_index <= end_site;
            if (inside_range && (site_index - begin_site) % stride == 0)
                return site_subset;
        }
        // site_index is already 1-based (it is compared directly to the 1-based range bounds)
        throw XStrom(boost::format("Site %d not found in any subset of partition") % site_index);
    }    
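
To see how std::get and the stride test work together on a single 4-tuple, consider this small sketch. The names range_t and rangeContainsSite are hypothetical and exist only for this illustration; the tuple layout (begin site, end site, stride, subset index) mirrors subset_range_t.

```cpp
#include <cassert>
#include <tuple>

// (begin site, end site, stride, subset index), mirroring subset_range_t
typedef std::tuple<unsigned, unsigned, unsigned, unsigned> range_t;

// Hypothetical helper: returns true if the 1-based site lies between the
// begin and end sites and falls exactly on the stride.
inline bool rangeContainsSite(const range_t & t, unsigned site) {
    unsigned begin_site = std::get<0>(t);
    unsigned end_site   = std::get<1>(t);
    unsigned stride     = std::get<2>(t);
    assert(stride > 0);

    bool inside_range = site >= begin_site && site <= end_site;
    return inside_range && (site - begin_site) % stride == 0;
}
```

With the tuple (1, 10, 3, 0), sites 1, 4, 7, and 10 are in the range, whereas sites 2 and 11 are not.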

The siteInSubset member function

This function returns true if the specified site (1, 2, …, _num_sites) is in the specified subset (0, 1, …, _num_subsets-1), and returns false otherwise. It uses findSubsetForSite to do the heavy lifting, so an XStrom exception is thrown if the site does not belong to any subset. Note that sites are numbered starting with 1 in NEXUS data files but a 0-based indexing system is used for everything else in our program.

    inline bool Partition::siteInSubset(unsigned site_index, unsigned subset_index) const {    
        unsigned which_subset = findSubsetForSite(site_index);
        return which_subset == subset_index;
    }    

The numSitesInSubset member function

This function calculates the number of sites assigned to the subset having index subset_index. This involves looping through all the subset ranges in _subset_ranges and, for all ranges assigned to subset_index, determining how many sites are included.

This process would be uncomplicated were it not for the third element of each range (the step size or stride), which can be greater than 1 and which necessitates the use of the modulus operator to determine whether the range includes one more site than is suggested by floor(n/stride), where n is the number of sites spanned by the range (end_site - begin_site + 1). For example, suppose begin_site is 1, end_site is 10, and stride is 3. A total of 4 sites are included (sites 1, 4, 7, 10), yet n/stride = 10/3 is only 3 when rounded down.

    inline unsigned Partition::numSitesInSubset(unsigned subset_index) const {    
        unsigned nsites = 0;
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            if (site_subset == subset_index) {
                unsigned n = end_site - begin_site + 1;
                nsites += (unsigned)(floor(n/stride)) + (n % stride == 0 ? 0 : 1);
            }
        }
        return nsites;
    }    
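
The counting rule for a single range can be checked with the following sketch. The helper countSitesInRange is hypothetical; it relies on the fact that dividing one unsigned value by another already rounds down, so no call to floor is needed.

```cpp
#include <cassert>

// Hypothetical helper: number of sites visited when stepping from begin_site
// by stride without passing end_site (all site numbers are 1-based).
inline unsigned countSitesInRange(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);

    unsigned n = end_site - begin_site + 1;         // sites spanned by the range
    return n / stride + (n % stride == 0 ? 0 : 1);  // add 1 if there is a remainder
}
```

For the range 1-10 with stride 3 this gives 10/3 + 1 = 4 (sites 1, 4, 7, 10), and for the range 2-10 with stride 3 it gives 9/3 + 0 = 3 (sites 2, 5, 8).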

The calcSubsetSizes member function

This function returns a vector of subset sizes using the same approach used by numSitesInSubset to count the number of sites falling in each subset. This function is useful for reporting information about the partition to the user, and these subset sizes are needed (as we shall later see) when normalizing the subset relative rates of substitution.

    inline std::vector<unsigned> Partition::calcSubsetSizes() const {    
        assert(_num_sites > 0); // only makes sense to call this function after subsets are defined
        std::vector<unsigned> nsites_vect(_num_subsets, 0);
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            unsigned hull = end_site - begin_site + 1;
            unsigned n = (unsigned)(floor(hull/stride)) + (hull % stride == 0 ? 0 : 1);
            nsites_vect[site_subset] += n;
        }
        return nsites_vect;
    }    

The clear member function

Like other clear functions in this tutorial, this function is called by the constructor (but could be called at other times) and returns the Partition object to the just-constructed state. Note that the clear function creates a default partition consisting of a single subset range with begin site 1, end site _infinity, step size 1, and subset index 0. As soon as the user adds the first subset definition, this default partition will be deleted. Assuming that the user does not define a partition, then as soon as data are read (i.e. when the Partition::finalize member function is called), the _infinity will be replaced by the actual total number of sites.

    inline void Partition::clear() {    
        _num_sites = 0;
        _num_subsets = 1;
        _subset_data_types.clear();
        _subset_data_types.push_back(DataType());
        _subset_names.clear();
        _subset_names.push_back("default");
        _subset_ranges.clear();
        _subset_ranges.push_back(std::make_tuple(1, _infinity, 1, 0));
    }    

The parseSubsetDefinition member function

This function provides the primary route by which partition subsets are added. It takes a string s representing everything after the keyword “subset” in a configuration file and splits s at the colon to yield two strings, before_colon (e.g. “rbcL[codon,plantplastid]”) and subset_definition (e.g. “1-20”).

The use of boost::split to split s at the colon character requires including the header file boost/algorithm/string/split.hpp and the use of the boost::is_any_of predicate requires including the header file boost/algorithm/string/classification.hpp. You will find that we indeed did include these headers at the beginning of the partition.hpp file:

#pragma once    

#include <tuple>
#include <limits>
#include <cmath>
#include <cassert>
#include <algorithm>
#include <iostream>
#include <memory>
#include <regex>
#include <string>
#include <vector>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "genetic_code.hpp"
#include "datatype.hpp"
#include "xstrom.hpp"


The main complication faced by parseSubsetDefinition is that the string before_colon may be either just the subset label chosen by the user (e.g. “rbcL”), or it may be a subset name followed by a subset data type specification embedded in square brackets (e.g. “rbcL[codon,plantplastid]”). The subset name and the subset data type (if it is there) are extracted through the use of regular expressions using the std::regex_match function. The regular expression pattern used is explained below.

R"((.+?)\s*(\[(\S+?)\])*)"
   ^^^^^ non-greedy sequence of 1 or more characters
R"((.+?)\s*(\[(\S+?)\])*)"
        ^^^ 0 or more whitespace characters
R"((.+?)\s*(\[(\S+?)\])*)"
           ^^^^^^^^^^^^^ captures subset data type specification if it is there
R"((.+?)\s*(\[(\S+?)\])*)"
            ^^      ^^ literal left/right square brackets (preceded by backslash
                       because brackets have special meaning in regular expressions)
R"((.+?)\s*(\[(\S+?)\])*)"
              ^^^^^^ 1 or more non-whitespace characters (this should be "nucleotide",
                     "protein", "standard", or "codon", the last optionally followed by
                     a comma and a genetic code name, but we will check this later
                     because we don't want the entire regular expression to fail if
                     the user has specified an invalid data type here)

First of all, this is an example of a raw string literal. The opening R"( and the closing )" form a wrapper that identifies this as a raw string, so these 5 characters are not part of the pattern itself. We use a raw string here because regular expression patterns are full of backslash characters, which, in regular expressions, signal that the following character has special significance. For example, \d in a regular expression pattern means “digit character” rather than the letter d. If a raw string is not used, then each backslash must itself be “escaped” by preceding it with a second backslash, because C++ also treats the backslash as an escape character in ordinary string literals. This leads to lots of double backslash combinations, which make regular expression patterns, already difficult to comprehend, even more difficult to construct correctly.
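
As a quick check of the raw string discussion, the sketch below writes the same pattern both ways. The function names rawPattern and escapedPattern are hypothetical and serve only to hold the two literals.

```cpp
#include <cassert>
#include <string>

// The pattern written as a raw string literal: backslashes are taken literally.
inline std::string rawPattern() {
    return R"((.+?)\s*(\[(\S+?)\])*)";
}

// The same pattern as an ordinary literal: every backslash must be doubled.
inline std::string escapedPattern() {
    return "(.+?)\\s*(\\[(\\S+?)\\])*";
}
```

Both functions return the identical 21-character string; only the source code spelling differs.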

The resulting match_obj behaves like a vector of length 4: element 0 holds the entire match and elements 1 to 3 hold the three capture groups. After a successful match, a std::smatch always has one element per capture group (plus one for the whole match); groups that did not take part in the match are simply empty. Two elements are devoted to the data type because we captured not only the data type itself, but also the entire expression in square brackets (in order to test whether it was there). Thus, the specification rbcL[codon,plantplastid] would result in the following match_obj vector:

match_obj[0] = "rbcL[codon,plantplastid]"
match_obj[1] = "rbcL"
match_obj[2] = "[codon,plantplastid]"
match_obj[3] = "codon,plantplastid"

whereas the specification rbcL (which is effectively the same as rbcL[nucleotide]) would result in:

match_obj[0] = "rbcL"
match_obj[1] = "rbcL"
match_obj[2] = ""
match_obj[3] = ""

This is why the function tests whether match_obj[3] has nonzero length before using it; if it is empty, “nucleotide” is assumed.
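
The behavior of the match object can be explored on its own with a sketch such as this one. The helper splitLabel is hypothetical; it applies the same pattern and returns the subset name together with the data type string, which is empty if no square brackets were supplied.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits "name[type]" into (name, type).
// The type is "" if absent; both strings are "" if the pattern does not match.
inline std::pair<std::string, std::string> splitLabel(const std::string & before_colon) {
    static const std::regex re(R"((.+?)\s*(\[(\S+?)\])*)");
    std::smatch match_obj;
    if (!std::regex_match(before_colon, match_obj, re))
        return std::make_pair(std::string(), std::string());

    // whole match plus 3 capture groups, whether or not brackets were present
    assert(match_obj.size() == 4);
    return std::make_pair(match_obj[1].str(), match_obj[3].str());
}
```

Calling splitLabel with rbcL[codon,plantplastid] yields the pair (rbcL, codon,plantplastid), while calling it with rbcL yields (rbcL, empty string).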

If the data type is “codon”, then the user may have either specified a particular genetic code name (e.g. “plantplastid”) or not, in which case the “standard” (i.e. universal) genetic code is assumed. A further regular expression is used to detect whether the data type is “codon” and a genetic code was specified. Several else clauses handle all other possible data types, including the codon data type where the default genetic code is assumed.

The function then adds subset_name to the _subset_names vector, adds the data type dt to _subset_data_types, updates _num_subsets, then calls addSubset to do all of the work involved in interpreting the subset_definition string. Note that this function deletes the default partition (created in the constructor): if this function is ever called, it is because the user has defined a partition and thus the default partition is not needed.

Here’s the entire function body:

    inline void Partition::parseSubsetDefinition(std::string & s) {    
        std::vector<std::string> v;
        
        // first separate part before colon (stored in v[0]) from part after colon (stored in v[1])
        boost::split(v, s, boost::is_any_of(":"));
        if (v.size() != 2)
            throw XStrom("Expecting exactly one colon in partition subset definition");

        std::string before_colon = v[0];
        std::string subset_definition = v[1];

        // now see if before_colon contains a data type specification in square brackets
        const char * pattern_string = R"((.+?)\s*(\[(\S+?)\])*)";
        std::regex re(pattern_string);
        std::smatch match_obj;
        bool matched = std::regex_match(before_colon, match_obj, re);
        if (!matched) {
            throw XStrom(boost::format("Could not interpret \"%s\" as a subset label with optional data type in square brackets") % before_colon);
        }
        
        // match_obj always yields 4 strings that can be indexed using the operator[] function
        // match_obj[0] equals entire subset label/type string (e.g. "rbcL[codon,standard]")
        // match_obj[1] equals the subset label (e.g. "rbcL")
        
        // The last two elements are empty unless the user has specified a data type for this partition subset
        // match_obj[2] equals "" or data type including square brackets (e.g. "[codon,standard]")
        // match_obj[3] equals "" or data type only (e.g. "codon,standard")
        
        std::string subset_name = match_obj[1].str();
        DataType dt;    // nucleotide by default
        std::string datatype = "nucleotide";
        if (match_obj.size() == 4 && match_obj[3].length() > 0) {
            datatype = match_obj[3].str();
            boost::to_lower(datatype);

            // check for comma plus genetic code in case of codon
            std::regex re_codon(R"(codon\s*,\s*(\S+))");
            std::smatch m;
            if (std::regex_match(datatype, m, re_codon)) {
                dt.setCodon();
                std::string genetic_code_name = m[1].str();
                dt.setGeneticCodeFromName(genetic_code_name);
            }
            else if (datatype == "codon") {
                dt.setCodon();  // assumes standard genetic code
            }
            else if (datatype == "protein") {
                dt.setProtein();
            }
            else if (datatype == "nucleotide") {
                dt.setNucleotide();
            }
            else if (datatype == "standard") {
                dt.setStandard();
            }
            else {
                throw XStrom(boost::format("Datatype \"%s\" specified for subset(s) \"%s\" is invalid: must be either nucleotide, codon, protein, or standard") % datatype % subset_name);
            }
        }

        // Remove default subset if there is one
        unsigned end_site = std::get<1>(_subset_ranges[0]);
        if (_num_subsets == 1 && end_site == _infinity) {
            _subset_names.clear();
            _subset_data_types.clear();
            _subset_ranges.clear();
        }
        else if (subset_name == "default") {
            throw XStrom("Cannot specify \"default\" partition subset after already defining other subsets");
        }
        _subset_names.push_back(subset_name);
        _subset_data_types.push_back(dt);
        _num_subsets = (unsigned)_subset_names.size();
        addSubset(_num_subsets - 1, subset_definition);

        std::cout << boost::str(boost::format("Partition subset %s comprises sites %s and has type %s") % subset_name % subset_definition % datatype) << std::endl;
    }    

The addSubset member function

This is a private member function that does the work of breaking up a subset definition into a vector of component ranges (these component ranges are separated by commas in subset_definition). This is accomplished by the boost::split function, and the resulting components are each submitted to the member function addSubsetRange.

    inline void Partition::addSubset(unsigned subset_index, std::string subset_definition) {    
        std::vector<std::string> parts;
        boost::split(parts, subset_definition, boost::is_any_of(","));
        for (auto subset_component : parts) {
            addSubsetRange(subset_index, subset_component);
        }
    }    

The addSubsetRange member function

This is a private member function, called by the addSubset member function, that receives a subset index (an integer greater than or equal to 0) and a range definition, which may be trivial (just a single integer corresponding to one site), a range (e.g. 1-1000) consisting of a beginning and ending site separated by a hyphen, or a more complex range comprising a beginning and ending site as well as a step size, or stride. Regular expression matching is used to parse the range definition. The regular expression pattern string is

        const char * pattern_string = R"((\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*)";   

Note that the R"(...)" wrapper just tells C++ that this is a raw string (it tells C++ not to treat backslashes as escape characters). After removing the raw literal bracketing characters, the regular expression pattern is shown below with its 5 capture groups marked:

(\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*
|-1-|   |--------------2-----------|
             |---3---||-----4----| 
                            |-5-|

The first construct, (\d+), looks for one or more digits, and the parentheses serve to capture this number as regex group 1. Group 2 occupies most of the remainder of the pattern and the terminating * means that group 2 may not even be present in a match. This would be the case for trivial ranges consisting of only a single number representing the site position.

Assuming group 2 is present in a match, group 3 is required and captures the ending site position, which follows the required hyphen character (-) and zero or more whitespace characters (\s*). Group 4 is not required, but, if present, matches a literal backslash character (written \\ because a single backslash would instead give the following character special meaning), followed by optional whitespace (\s*) and then a series of digit characters (\d+) captured as group 5. (The fact that backslashes are escaped in this regular expression itself explains why I decided to go with the raw literal string approach; otherwise, backslashes would need to be escaped for both C++ and the regular expression interpreter!)

The function std::regex_match tests whether the entire range_definition string matches the pattern defined by re (a partial match is not enough) and stores any captured groups in match_obj.

Note that groups 2 and 4 are both optional. These groups are only defined so that we can make them optional. It is groups 1, 3, and 5 that capture the information we need. The three lines that call extractIntFromRegexMatch do the work of assigning these pieces of information to the variables ibegin, iend, and istep. The second argument to extractIntFromRegexMatch serves both as the minimum allowed value and as the default value that is used if the capture group specified as the first argument is the empty string.

All that remains is to append the tuple containing the four values ibegin, iend, istep, and subset_index to the vector _subset_ranges, as well as update _num_sites if the last site included in the range is larger than the current value of _num_sites. This last task is complicated by the possibility that the last site included in the range may not equal iend! Consider the range 2-10\3, which translates to the set {2, 5, 8}. The last site included is 8, not 10.

Here is the source for the addSubsetRange function:

    inline void Partition::addSubsetRange(unsigned subset_index, std::string range_definition) {    
        // match patterns like these: "1-999\3" "1-1000" "1001"
        const char * pattern_string = R"((\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*)";   
        std::regex re(pattern_string);
        std::smatch match_obj;
        bool matched = std::regex_match(range_definition, match_obj, re);
        if (!matched) {
            throw XStrom(boost::format("Could not interpret \"%s\" as a range of site indices") % range_definition);
        }
        
        // match_obj always yields 6 strings that can be indexed using the operator[] function
        // match_obj[0] equals entire site_range (e.g. "1-999\3")
        // match_obj[1] equals beginning site index (e.g. "1")
        // match_obj[2] equals "" or everything after beginning site index (e.g. "-999\3")
        // match_obj[3] equals "" or ending site index (e.g. "999")
        // match_obj[4] equals "" or everything after ending site index (e.g. "\3")
        // match_obj[5] equals "" or step value (e.g. "3")
        // (the pattern lets "." through as group 3, but extractIntFromRegexMatch
        // as written below cannot convert "." and throws an XStrom exception)
        int ibegin = extractIntFromRegexMatch(match_obj[1], 1);
        int iend   = extractIntFromRegexMatch(match_obj[3], ibegin);
        int istep  = extractIntFromRegexMatch(match_obj[5], 1);
        
        // record the 4-tuple (begin site, end site, stride, subset index)
        _subset_ranges.push_back(std::make_tuple(ibegin, iend, istep, subset_index));
        
        // determine last site in subset
        unsigned last_site_in_subset = iend - ((iend - ibegin) % istep);
        if (last_site_in_subset > _num_sites) {
            _num_sites = last_site_in_subset;
        }
    }    
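
The parsing and the "last site" arithmetic can be tried in a standalone form using the sketch below. The function parseRange is hypothetical, it accepts digits only (no period) for the ending site, and it uses std::stoul rather than extractIntFromRegexMatch so that the example does not depend on the Partition class.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <tuple>

// Hypothetical helper: parses "begin", "begin-end", or "begin-end\step" and
// returns (begin, end, step, last site actually included).
// Returns all zeros if the string is not a valid range.
inline std::tuple<unsigned, unsigned, unsigned, unsigned> parseRange(const std::string & range_definition) {
    static const std::regex re(R"((\d+)\s*(-\s*(\d+)(\\\s*(\d+))*)*)");
    std::smatch m;
    if (!std::regex_match(range_definition, m, re))
        return std::make_tuple(0u, 0u, 0u, 0u);

    // groups 3 and 5 are empty if the end site or the step was omitted
    unsigned ibegin = static_cast<unsigned>(std::stoul(m[1].str()));
    unsigned iend   = m[3].length() > 0 ? static_cast<unsigned>(std::stoul(m[3].str())) : ibegin;
    unsigned istep  = m[5].length() > 0 ? static_cast<unsigned>(std::stoul(m[5].str())) : 1u;
    if (ibegin < 1 || iend < ibegin || istep < 1)
        return std::make_tuple(0u, 0u, 0u, 0u);

    // back up from iend to the last site that falls on the stride
    unsigned last = iend - ((iend - ibegin) % istep);
    return std::make_tuple(ibegin, iend, istep, last);
}
```

For the range 2-10\3 this returns (2, 10, 3, 8), in agreement with the set {2, 5, 8} discussed above, and for the trivial range 7 it returns (7, 7, 1, 7).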

The extractIntFromRegexMatch member function

This function takes a regex_match_t argument and attempts to interpret it as an integer. A regex_match_t is a (const reference to a) sub-match object holding the part of the string captured by one group of a regular expression; it can be converted to a std::string using the object’s str member function.

If the conversion succeeds and the integer is at least as large as the minimum value specified, the function returns the integer. If the integer is smaller than the minimum value, an XStrom exception is thrown. If the supplied match is empty, then no attempt is made to interpret it and the supplied minimum value is returned by default.

    inline int Partition::extractIntFromRegexMatch(regex_match_t s, unsigned min_value) {    
        int int_value = min_value;
        if (s.length() > 0) {
            std::string str_value = s.str();
            try {
                int_value = std::stoi(str_value);
            }
            catch(const std::invalid_argument &) {
                throw XStrom(boost::format("Could not interpret \"%s\" as a number in partition subset definition") % s.str());
            }
            catch(const std::out_of_range &) {
                throw XStrom(boost::format("Number \"%s\" in partition subset definition is too large") % s.str());
            }
            
            // sanity check
            if (int_value < (int)min_value) {
                throw XStrom(boost::format("Value specified in partition subset definition (%d) is lower than minimum value (%d)") % int_value % min_value);
            }
        }
        return int_value;
    }    

The function uses std::stoi to extract an integer value from a string. If std::stoi fails to interpret the supplied string as an integer, it throws a std::invalid_argument exception; if the number is too large to fit in an int, it throws a std::out_of_range exception. We catch both and follow up with our own XStrom exception to explain to the user what went wrong.
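
The way std::stoi reports failure can be seen in this small sketch. The helper toIntOr is hypothetical; instead of throwing an XStrom exception it simply returns a fallback value, which keeps the example independent of the rest of the program.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts s to an int, returning fallback if s is
// empty or cannot be converted.
inline int toIntOr(const std::string & s, int fallback) {
    if (s.empty())
        return fallback;
    try {
        return std::stoi(s);
    }
    catch (const std::invalid_argument &) {
        return fallback;    // no conversion possible, e.g. "." or "abc"
    }
    catch (const std::out_of_range &) {
        return fallback;    // value does not fit in an int
    }
}
```

Note that std::stoi stops at the first character that cannot be part of a number, so "12abc" converts to 12 without any exception. This does not matter in addSubsetRange because the capture groups passed along for the begin site and step contain only digits.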

The finalize member function

The finalize function is called once the actual number of sites is known (after the data have been stored). This function performs three important sanity checks. First, it checks whether the number of sites specified is equal to the number of sites determined by the subset definitions. Second, it checks whether any sites have slipped through the cracks and were not assigned to any subset in the partition. Third, it checks whether any sites have been assigned to more than one subset. If any of these sanity checks fail, an XStrom exception is thrown, forcing the user to fix their partition definition.

    inline void Partition::finalize(unsigned nsites) {    
        if (_num_sites == 0) {
            defaultPartition(nsites);
            return;
        }

        // First sanity check:
        //   nsites is the number of sites read in from a data file;
        //   _num_sites is the maximum site index specified in any partition subset.
        //   These two numbers should be the same.
        if (_num_sites != nsites) {
            throw XStrom(boost::format("Number of sites specified by the partition (%d) does not match actual number of sites (%d)") % _num_sites % nsites);
        }
        
        // Second sanity check: ensure that no sites were left out of all partition subsets
        // Third sanity check: ensure that no sites were included in more than one partition subset
        std::vector<int> tmp(nsites, -1);   // begin with -1 for all sites
        for (auto & t : _subset_ranges) {
            unsigned begin_site  = std::get<0>(t);
            unsigned end_site    = std::get<1>(t);
            unsigned stride  = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            for (unsigned s = begin_site; s <= end_site; s += stride) {
                if (tmp[s-1] != -1)
                    throw XStrom("Some sites were included in more than one partition subset");
                else
                    tmp[s-1] = site_subset;
            }
        }
        if (std::find(tmp.begin(), tmp.end(), -1) != tmp.end()) {
            throw XStrom("Some sites were not included in any partition subset");
        }
        tmp.clear();
    }    
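
The second and third sanity checks can be exercised separately from the class with the sketch below. The names range_t, CoverageStatus, and checkCoverage are hypothetical; the function returns a status code rather than throwing so that each outcome is easy to test.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// (begin site, end site, stride, subset index), mirroring subset_range_t
typedef std::tuple<unsigned, unsigned, unsigned, unsigned> range_t;

enum CoverageStatus { COVERAGE_OK, COVERAGE_OVERLAP, COVERAGE_GAP };

// Hypothetical helper: checks that every site 1..nsites belongs to exactly one range.
inline CoverageStatus checkCoverage(const std::vector<range_t> & ranges, unsigned nsites) {
    std::vector<int> owner(nsites, -1);     // -1 means "not yet assigned"
    for (const auto & t : ranges) {
        unsigned begin_site = std::get<0>(t);
        unsigned end_site   = std::get<1>(t);
        unsigned stride     = std::get<2>(t);
        assert(begin_site >= 1 && stride >= 1);

        // visit begin_site, begin_site + stride, ... without leaving 1..nsites
        for (unsigned s = begin_site; s <= end_site && s <= nsites; s += stride) {
            if (owner[s-1] != -1)
                return COVERAGE_OVERLAP;    // site already claimed by another range
            owner[s-1] = static_cast<int>(std::get<3>(t));
        }
    }
    for (int o : owner) {
        if (o == -1)
            return COVERAGE_GAP;            // some site was never assigned
    }
    return COVERAGE_OK;
}
```

Three ranges that assign first, second, and third codon positions of 9 sites to different subsets (1-9\3, 2-9\3, and 3-9\3) pass the check, whereas the ranges 1-5 and 5-9 overlap at site 5 and the ranges 1-4 and 6-9 leave site 5 unassigned.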

The defaultPartition member function

If data will not be partitioned, it makes sense to relieve the user of the responsibility of creating a subset definition. This function is called by finalize (and may be called directly) if no subset definitions were provided on the command line (or in the strom.conf file). It calls clear, which adds the name default to the _subset_names vector, and then replaces the single range tuple in _subset_ranges with one that extends from the first to the last site.

    inline void Partition::defaultPartition(unsigned nsites) {    
        clear();
        _num_sites = nsites;
        _num_subsets = 1;
        _subset_ranges[0] = std::make_tuple(1, nsites, 1, 0);
    }    
