A Partition object will serve as the manager for data subsets defined by the user. It will store the name of each data subset, the range(s) of sites included in each subset, and information about the type of data in each subset.
Create a new C++ class named Partition and add it to your project as the header file partition.hpp. Below is the class declaration. The body of each member function will be described separately (add each of these member function bodies just above the right curly bracket that terminates the namespace block).
#pragma once
#include <tuple>
#include <limits>
#include <cmath>
#include <algorithm>    // std::find
#include <cassert>      // assert
#include <iostream>     // std::cout
#include <iterator>     // std::distance
#include <memory>       // std::shared_ptr
#include <regex>        // std::regex, std::regex_match, std::smatch
#include <stdexcept>    // std::invalid_argument, std::out_of_range
#include <string>
#include <vector>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "genetic_code.hpp"
#include "datatype.hpp"
#include "xstrom.hpp"
namespace strom {
    class Partition {
        public:
            typedef std::match_results<std::string::const_iterator>::const_reference regex_match_t;
            typedef std::tuple<unsigned, unsigned, unsigned, unsigned> subset_range_t;
            typedef std::vector<subset_range_t> partition_t;
            typedef std::vector<DataType> datatype_vect_t;
            typedef std::vector<unsigned> subset_sizes_vect_t;
            typedef std::vector<std::string> subset_names_vect_t;
            typedef std::shared_ptr<Partition> SharedPtr;
            Partition();
            ~Partition();
            unsigned getNumSites() const;
            unsigned getNumSubsets() const;
            std::string getSubsetName(unsigned subset) const;
            const partition_t & getSubsetRangeVect() const;
            unsigned findSubsetByName(const std::string & subset_name) const;
            unsigned findSubsetForSite(unsigned site_index) const;
            bool siteInSubset(unsigned site_index, unsigned subset_index) const;
            DataType getDataTypeForSubset(unsigned subset_index) const;
            const datatype_vect_t & getSubsetDataTypes() const;
            unsigned numSitesInSubset(unsigned subset_index) const;
            subset_sizes_vect_t calcSubsetSizes() const;
            void defaultPartition(unsigned nsites = std::numeric_limits<unsigned>::max());
            void parseSubsetDefinition(std::string & s);
            void finalize(unsigned nsites);
            void clear();
        private:
            int extractIntFromRegexMatch(regex_match_t s, unsigned min_value);
            void addSubsetRange(unsigned subset_index, std::string range_definition);
            void addSubset(unsigned subset_index, std::string subset_definition);
            unsigned _num_sites;
            unsigned _num_subsets;
            subset_names_vect_t _subset_names;
            partition_t _subset_ranges;
            datatype_vect_t _subset_data_types;
            const unsigned _infinity;
    };
    // member function bodies here
}
The class declaration above defines several types that are introduced to simplify the member function prototypes and data member definitions that follow:
regex_match_t is used for variables that refer to a single captured group (sub-match) of a regular expression match
subset_range_t is a std::tuple comprising four unsigned ints storing the beginning site, ending site, step size, and partition subset index for a chunk of sites
partition_t is used for the vector of subset_range_t objects that place every site into one and only one partition subset
datatype_vect_t is used for the vector of DataType objects that store information about the type of data stored in each subset
subset_sizes_vect_t is a vector of unsigned integers that stores the number of sites assigned to each partition subset
subset_names_vect_t is a vector of strings that stores the label that the user assigned to each partition subset
SharedPtr is the shared pointer used for passing around Partition objects
These type definitions make it simpler to define variables of these types and to pass such variables into functions.
Here are the bodies of the constructor and destructor. As usual, these functions report when an object of the Partition class is created or destroyed, and those reporting lines have been commented out (you can uncomment them at any time for debugging purposes). The constructor also calls the clear member function to perform initializations.
The data member _infinity needs some explanation. It has type const unsigned, and a const data member cannot be assigned a value in the body of the constructor: by the time the body runs, every data member has already been initialized. Hence, _infinity is initialized via the member initializer list (the part between the colon and the left curly bracket that opens the constructor body). The data member _infinity is set equal to the largest possible unsigned value, which is used as a stand-in for the total number of sites until we actually know the total number of sites. The value itself is obtained using the static max member function of the class template std::numeric_limits<unsigned>. The #include <limits> line at the top of the file provides access to numeric_limits.
    inline Partition::Partition() : _infinity(std::numeric_limits<unsigned>::max()) {
        //std::cout << "Constructing a Partition" << std::endl;
        clear();
    }
    inline Partition::~Partition() {
        //std::cout << "Destroying a Partition" << std::endl;
    }
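The rule that a const member must be set in the initializer list is easy to check in isolation. The sketch below uses a hypothetical class SiteCounter (not part of strom) that, like Partition, stores the largest unsigned value as a sentinel meaning "number of sites not yet known".

```cpp
#include <cassert>
#include <limits>

// Hypothetical class (not part of strom) illustrating a const sentinel member.
class SiteCounter {
public:
    // _unknown is const, so it must be set in the member initializer list;
    // assigning to it inside the constructor body would not compile.
    SiteCounter() : _unknown(std::numeric_limits<unsigned>::max()), _num_sites(_unknown) {}

    bool isKnown() const { return _num_sites != _unknown; }
    void setNumSites(unsigned n) { assert(n != _unknown); _num_sites = n; }
    unsigned getNumSites() const { return _num_sites; }
    unsigned unknownValue() const { return _unknown; }

private:
    const unsigned _unknown;  // declared first, so it is initialized before _num_sites
    unsigned _num_sites;      // equals _unknown until the real count is known
};
```

One consequence is worth knowing about. A class with a const data member has its compiler-generated copy assignment operator deleted, so Partition objects can be copy-constructed but not copy-assigned. This is rarely an issue here, because Partition objects are normally passed around via SharedPtr.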
The functions getNumSites, getNumSubsets, getSubsetName, getSubsetRangeVect, getDataTypeForSubset, and getSubsetDataTypes provide access to the values stored in the private data members _num_sites, _num_subsets, _subset_names, _subset_ranges, and _subset_data_types (the last of which is read by both getDataTypeForSubset and getSubsetDataTypes). None of these functions allows you to change those variables.
    inline unsigned Partition::getNumSites() const {
        return _num_sites;
    }
    inline unsigned Partition::getNumSubsets() const {
        return _num_subsets;
    }
    inline std::string Partition::getSubsetName(unsigned subset) const {
        assert(subset < _num_subsets);
        return _subset_names[subset];
    }
    inline const Partition::partition_t & Partition::getSubsetRangeVect() const {
        return _subset_ranges;
    }
    inline DataType Partition::getDataTypeForSubset(unsigned subset_index) const {
        assert(subset_index < _subset_data_types.size());
        return _subset_data_types[subset_index];
    }
    inline const std::vector<DataType> & Partition::getSubsetDataTypes() const {
        return _subset_data_types;
    }
This function returns the (0-based) index of the subset in _subset_names corresponding to the subset_name provided. If no subset by that name can be found, an exception is thrown. Here the std::find algorithm (from <algorithm>) is used to search for subset_name in _subset_names. If the name is found, the returned iterator will not equal _subset_names.end(), which represents the position just beyond the last element of the vector. Once the iterator is positioned at the correct element of _subset_names, the index is obtained using std::distance (from <iterator>), which counts the number of steps from the first element of the vector to the returned iterator.
    inline unsigned Partition::findSubsetByName(const std::string & subset_name) const {
        auto iter = std::find(_subset_names.begin(), _subset_names.end(), subset_name);
        if (iter == _subset_names.end())
            throw XStrom(boost::format("Specified subset name \"%s\" not found in partition") % subset_name);
        return (unsigned)std::distance(_subset_names.begin(), iter);
    }
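The std::find / std::distance idiom works on any vector. Below is a standalone sketch of the same lookup. The function name findIndexByName is hypothetical, and std::out_of_range stands in for XStrom so that the example compiles without the rest of strom.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical standalone analogue of Partition::findSubsetByName.
unsigned findIndexByName(const std::vector<std::string> & names, const std::string & name) {
    auto iter = std::find(names.begin(), names.end(), name);
    if (iter == names.end())
        throw std::out_of_range("subset name \"" + name + "\" not found");
    // Number of steps from the first element to the match = 0-based index
    return static_cast<unsigned>(std::distance(names.begin(), iter));
}
```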
This function returns the subset index corresponding to a given site. Subset indices start at 0 and are assigned in the order in which the subsets are specified. Sites are numbered starting from 1, which is the convention used in, for example, NEXUS-formatted data files.
The information for each chunk of sites is stored in a 4-tuple. A std::tuple is a structure that contains a fixed number of values (in our case 4) and is a generalization of std::pair, which represents a 2-tuple. The expression std::get<k>(t) returns a reference to the kth (0-based) element of the tuple t. Each value in the 4-tuple is copied to a local variable for clarity.
    inline unsigned Partition::findSubsetForSite(unsigned site_index) const {
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            // site_index belongs to this range if it lies between begin_site and end_site
            // and is a whole number of strides beyond begin_site
            bool inside_range = site_index >= begin_site && site_index <= end_site;
            if (inside_range && (site_index - begin_site) % stride == 0)
                return site_subset;
        }
        // site_index is already 1-based, so it is reported as is
        throw XStrom(boost::format("Site %d not found in any subset of partition") % site_index);
    }
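If your compiler supports C++17, structured bindings are an alternative to copying each std::get<k>(t) value into a named local. In the sketch below, subsetForSite is a hypothetical free-function version of findSubsetForSite that uses std::out_of_range in place of XStrom. It unpacks each 4-tuple directly in the loop header.

```cpp
#include <cassert>
#include <stdexcept>
#include <tuple>
#include <vector>

// Same layout as Partition::subset_range_t: (begin site, end site, stride, subset index)
using subset_range_t = std::tuple<unsigned, unsigned, unsigned, unsigned>;

// Hypothetical variant of findSubsetForSite (requires C++17); site_index is 1-based.
unsigned subsetForSite(const std::vector<subset_range_t> & ranges, unsigned site_index) {
    for (const auto & [begin_site, end_site, stride, subset] : ranges) {
        bool inside_range = site_index >= begin_site && site_index <= end_site;
        if (inside_range && (site_index - begin_site) % stride == 0)
            return subset;
    }
    throw std::out_of_range("site not found in any subset");
}
```

The test ranges describe a typical codon-position partition: sites 1, 4, 7, 10 in subset 0, sites 2, 5, 8, 11 in subset 1, and sites 3, 6, 9, 12 in subset 2.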
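This function simply returns true if the specified site (1, 2, …, _num_sites) is in the specified subset (0, 1, …, _num_subsets - 1), and returns false otherwise. It uses findSubsetForSite to do the heavy lifting, so if the site belongs to no subset at all, the exception thrown by findSubsetForSite propagates to the caller instead of false being returned. Note the two different conventions here: sites are numbered starting with 1 (as in NEXUS data files), whereas subsets, like almost everything else in our program, use 0-based indexing.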
    inline bool Partition::siteInSubset(unsigned site_index, unsigned subset_index) const {
        unsigned which_subset = findSubsetForSite(site_index);
        return which_subset == subset_index;
    }
This function calculates the number of sites assigned to the subset having index subset_index. This involves looping through all the subset ranges in _subset_ranges and, for every range assigned to subset_index, determining how many sites it includes.
This process would be uncomplicated were it not for the third element of each range (the step size, or stride), which can be greater than 1. If a range spans n = end_site - begin_site + 1 sites, the integer division n/stride (which already discards the remainder, i.e. computes floor(n/stride)) undercounts by one whenever n is not a multiple of stride. The modulus operator detects that case so that the extra site can be added. For example, suppose begin_site is 1, end_site is 10, and stride is 3. A total of 4 sites are included (sites 1, 4, 7, 10), yet 10/3 is only 3; because 10 % 3 is 1 rather than 0, one more site is added.
    inline unsigned Partition::numSitesInSubset(unsigned subset_index) const {
        unsigned nsites = 0;
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            if (site_subset == subset_index) {
                unsigned n = end_site - begin_site + 1;
                nsites += (unsigned)(floor(n/stride)) + (n % stride == 0 ? 0 : 1);
            }
        }
        return nsites;
    }
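The counting rule can be checked against a brute-force loop. In the sketch below, all three function names are hypothetical. countByRemainder uses the same arithmetic as numSitesInSubset, countByCeiling uses the equivalent ceiling-division idiom (n + stride - 1) / stride, and countByLoop simply steps through the range. Because n and stride are both unsigned, n / stride already truncates, so the sketch omits the floor call.

```cpp
#include <cassert>

// Same arithmetic as numSitesInSubset: integer division plus a remainder correction.
unsigned countByRemainder(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);
    unsigned n = end_site - begin_site + 1;
    return n / stride + (n % stride == 0 ? 0 : 1);
}

// Equivalent ceiling-division form: ceil(n / stride) == (n + stride - 1) / stride.
unsigned countByCeiling(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);
    unsigned n = end_site - begin_site + 1;
    return (n + stride - 1) / stride;
}

// Brute-force reference: step through the range and count the sites visited.
unsigned countByLoop(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);
    unsigned count = 0;
    for (unsigned s = begin_site; s <= end_site; s += stride)
        ++count;
    return count;
}
```

The ceiling form is more compact, but it can overflow when n is within stride of the maximum unsigned value. The remainder form used in the tutorial avoids that edge case.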
This function returns a vector of subset sizes, using the same approach as numSitesInSubset to count the number of sites falling in each subset. This function is useful for reporting information about the partition to the user, and these subset sizes are needed (as we shall later see) when normalizing the subset relative rates of substitution.
    inline std::vector<unsigned> Partition::calcSubsetSizes() const {
        assert(_num_sites > 0); // only makes sense to call this function after subsets are defined
        std::vector<unsigned> nsites_vect(_num_subsets, 0);
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            unsigned hull = end_site - begin_site + 1;
            unsigned n = (unsigned)(floor(hull/stride)) + (hull % stride == 0 ? 0 : 1);
            nsites_vect[site_subset] += n;
        }
        return nsites_vect;
    }
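How the subset sizes enter the rate normalization is covered later. Purely as an illustration, here is one common convention, which is an assumption and not necessarily the one this tutorial adopts: scale the subset relative rates so that their mean, weighted by the number of sites in each subset, equals 1. The helper normalizeRates is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper (assumed convention): rescale rates so that
// sum_i n_i * r_i == sum_i n_i, i.e. the site-weighted mean rate is 1.
std::vector<double> normalizeRates(const std::vector<unsigned> & subset_sizes,
                                   const std::vector<double> & rates) {
    if (subset_sizes.size() != rates.size())
        throw std::invalid_argument("one rate per subset is required");
    // Total number of sites: sum of the subset sizes
    double total_sites = std::accumulate(subset_sizes.begin(), subset_sizes.end(), 0.0);
    // Site-weighted sum of the unnormalized rates
    double weighted_sum = 0.0;
    for (std::size_t i = 0; i < rates.size(); ++i)
        weighted_sum += subset_sizes[i] * rates[i];
    if (weighted_sum <= 0.0)
        throw std::invalid_argument("weighted sum of rates must be positive");
    // Multiply every rate by the same factor so the weighted mean becomes 1
    std::vector<double> normalized(rates.size());
    for (std::size_t i = 0; i < rates.size(); ++i)
        normalized[i] = rates[i] * total_sites / weighted_sum;
    assert(normalized.size() == rates.size());
    return normalized;
}
```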
Like other clear functions in this tutorial, this function is called by the constructor (but could be called at other times) and returns the Partition object to its just-constructed state. Note that the clear function creates a default partition consisting of a single subset range with begin site 1, end site _infinity, step size 1, and subset index 0. As soon as the user adds the first subset definition, this default partition will be deleted. If the user does not define a partition, then as soon as data are read (i.e. when the Partition::finalize member function is called), the placeholder value _infinity will be replaced by the actual total number of sites.
    inline void Partition::clear() {
        _num_sites = 0;
        _num_subsets = 1;
        _subset_data_types.clear();
        _subset_data_types.push_back(DataType());
        _subset_names.clear();
        _subset_names.push_back("default");
        _subset_ranges.clear();
        _subset_ranges.push_back(std::make_tuple(1, _infinity, 1, 0));
    }
This function provides the primary route by which partition subsets are added. It takes a string s representing everything after the keyword “subset” in a configuration file and splits s at the colon to yield two strings, before_colon (e.g. “rbcL[codon,plantplastid]”) and subset_definition (e.g. “1-20”).
Using boost::split to split s at the colon character requires including the header file boost/algorithm/string/split.hpp, and using the boost::is_any_of predicate requires including the header file boost/algorithm/string/classification.hpp. You will find that we did indeed include these headers at the beginning of the partition.hpp file:
#pragma once
#include <tuple>
#include <limits>
#include <cmath>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "genetic_code.hpp"
#include "datatype.hpp"
#include "xstrom.hpp"
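To see exactly what the split step produces without pulling in Boost, here is a std-only sketch. splitOn is a hypothetical helper that mimics boost::split with a single-character boost::is_any_of predicate and Boost's default of keeping empty tokens. It is not how partition.hpp does it; the tutorial code uses Boost, as the includes above show.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical std-only stand-in for boost::split(v, s, boost::is_any_of(":")).
// Every delimiter ends a token, so adjacent delimiters yield empty tokens,
// matching boost::split's default (token_compress_off) behavior.
std::vector<std::string> splitOn(const std::string & s, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));   // final (possibly empty) token
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}
```

The same helper with ',' as the delimiter reproduces the comma split that addSubset performs on a subset definition.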
The main complication faced by parseSubsetDefinition is that the string before_colon may be either just the subset label chosen by the user (e.g. “rbcL”) or a subset label followed by a data type specification embedded in square brackets (e.g. “rbcL[codon,plantplastid]”). The subset name and the subset data type (if present) are extracted with a regular expression using the std::regex_match function from the <regex> header. The regular expression pattern used is explained below.
R"((.+?)\s*(\[(\S+?)\])*)"
   ^^^^^ non-greedy sequence of 1 or more characters (the subset label)
R"((.+?)\s*(\[(\S+?)\])*)"
        ^^^ 0 or more whitespace characters
R"((.+?)\s*(\[(\S+?)\])*)"
           ^^^^^^^^^^^^^ captures subset data type specification if it is there
                         (the trailing * makes the whole bracketed group optional)
R"((.+?)\s*(\[(\S+?)\])*)"
            ^^      ^^ literal left/right square brackets (preceded by backslash
                       because brackets have special meaning in regular expressions)
R"((.+?)\s*(\[(\S+?)\])*)"
              ^^^^^^ non-greedy sequence of 1 or more non-whitespace characters
                     (this should be "nucleotide", "protein", "standard", or "codon",
                     optionally followed by a comma and a genetic code name, but we
                     will check this later because we don't want the entire regular
                     expression to fail if the user has specified an invalid data type)
First of all, this is an example of a raw string literal. The beginning R"( and the ending )" form a wrapper that identifies this as a raw string, so these 5 characters are not actually part of the pattern string. We use a raw string here because regular expression patterns are full of backslash characters, which, in regular expressions, signal that the following character has special significance. For example, if \d appears in a regular expression pattern, the d means “digit character” rather than just the letter d. If a raw string is not used, then each backslash character must be “escaped” by preceding it with a second backslash. This leads to lots of double-backslash combinations, which make regular expression patterns, already difficult to comprehend, even more difficult to write correctly.
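A quick way to convince yourself that the raw and escaped forms denote the same pattern is to compare them directly. The pattern below is a simplified, hypothetical range pattern chosen only for illustration, and isSimpleRange is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative pattern (hypothetical): two integers separated by a hyphen.
// Raw string literal: backslashes are passed to the regex engine untouched.
const std::string raw_pattern = R"((\d+)\s*-\s*(\d+))";
// Ordinary string literal: each backslash must be doubled for the C++ compiler.
const std::string escaped_pattern = "(\\d+)\\s*-\\s*(\\d+)";

// Returns true if the whole of text looks like "<int>-<int>".
bool isSimpleRange(const std::string & text) {
    static const std::regex re(raw_pattern);
    return std::regex_match(text, re);
}
```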
The resulting match_obj holds 4 sub-matches: element 0 is the entire match, and elements 1 through 3 correspond to the three parenthesized capture groups in the pattern. This is true whether or not the user specified a data type. If no data type was given in square brackets (in which case “nucleotide” is assumed), groups 2 and 3 simply did not participate in the match, and elements 2 and 3 are empty strings; this is why the code tests the length of match_obj[3]. Group 2 captures the entire bracketed expression; it exists because the brackets and their contents had to be grouped in order to make them optional with the trailing *. Thus, the specification rbcL[codon,plantplastid] would result in the following match_obj contents:
match_obj[0] = "rbcL[codon,plantplastid]"
match_obj[1] = "rbcL"
match_obj[2] = "[codon,plantplastid]"
match_obj[3] = "codon,plantplastid"
whereas the specification rbcL (which is effectively the same as rbcL[nucleotide]) would result in:
match_obj[0] = "rbcL"
match_obj[1] = "rbcL"
match_obj[2] = ""
match_obj[3] = ""
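The match_obj behavior described above can be checked with a standalone sketch that uses the same pattern as parseSubsetDefinition. SubsetLabel and parseLabel are hypothetical names, and std::invalid_argument replaces XStrom.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch.
struct SubsetLabel {
    std::string name;
    std::string datatype;            // "" if no [...] was given
    std::size_t num_submatches = 0;  // size() of the match results
};

// Splits e.g. "rbcL[codon,plantplastid]" into label and data type.
SubsetLabel parseLabel(const std::string & before_colon) {
    static const std::regex re(R"((.+?)\s*(\[(\S+?)\])*)");
    std::smatch match_obj;
    if (!std::regex_match(before_colon, match_obj, re))
        throw std::invalid_argument("could not parse subset label");
    SubsetLabel result;
    result.name = match_obj[1].str();
    result.datatype = match_obj[3].str();  // empty when group 3 did not participate
    result.num_submatches = match_obj.size();
    return result;
}
```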
If the data type is “codon”, the user may or may not have specified a particular genetic code name (e.g. “plantplastid”); if not, the “standard” (i.e. universal) genetic code is assumed. A second regular expression detects whether the data type is “codon” followed by a comma and a genetic code name. Several else clauses handle all other possible data types, including the codon data type with the default genetic code.
The function then adds subset_name to the _subset_names vector, adds the data type dt to _subset_data_types, updates _num_subsets, and then calls addSubset to do all of the work involved in interpreting the subset_definition string. Note that this function deletes the default partition (created by clear when the object was constructed): if this function is ever called, it is because the user has defined a partition, and thus the default partition is not needed. Once the default partition has been replaced, giving a later subset the name “default” is treated as an error.
Here’s the entire function body:
    inline void Partition::parseSubsetDefinition(std::string & s) {
        std::vector<std::string> v;
        // first separate part before colon (stored in v[0]) from part after colon (stored in v[1])
        boost::split(v, s, boost::is_any_of(":"));
        if (v.size() != 2)
            throw XStrom("Expecting exactly one colon in partition subset definition");
        std::string before_colon = v[0];
        std::string subset_definition = v[1];
        // now see if before_colon contains a data type specification in square brackets
        const char * pattern_string = R"((.+?)\s*(\[(\S+?)\])*)";
        std::regex re(pattern_string);
        std::smatch match_obj;
        bool matched = std::regex_match(before_colon, match_obj, re);
        if (!matched) {
            throw XStrom(boost::format("Could not interpret \"%s\" as a subset label with optional data type in square brackets") % before_colon);
        }
        // after a successful match, match_obj always holds 4 sub-matches indexed using operator[]
        // match_obj[0] equals entire subset label/type string (e.g. "rbcL[codon,standard]")
        // match_obj[1] equals the subset label (e.g. "rbcL")
        // match_obj[2] equals data type inside square brackets (e.g. "[codon,standard]"), or "" if absent
        // match_obj[3] equals data type only (e.g. "codon,standard"), or "" if absent
        std::string subset_name = match_obj[1].str();
        DataType dt;    // nucleotide by default
        std::string datatype = "nucleotide";
        if (match_obj.size() == 4 && match_obj[3].length() > 0) {
            datatype = match_obj[3].str();
            boost::to_lower(datatype);
            // check for comma plus genetic code in case of codon
            std::regex codon_re(R"(codon\s*,\s*(\S+))");
            std::smatch m;
            if (std::regex_match(datatype, m, codon_re)) {
                dt.setCodon();
                std::string genetic_code_name = m[1].str();
                dt.setGeneticCodeFromName(genetic_code_name);
            }
            else if (datatype == "codon") {
                dt.setCodon();  // assumes standard genetic code
            }
            else if (datatype == "protein") {
                dt.setProtein();
            }
            else if (datatype == "nucleotide") {
                dt.setNucleotide();
            }
            else if (datatype == "standard") {
                dt.setStandard();
            }
            else {
                throw XStrom(boost::format("Datatype \"%s\" specified for subset(s) \"%s\" is invalid: must be either nucleotide, codon, protein, or standard") % datatype % subset_name);
            }
        }
        // Remove default subset if there is one
        unsigned end_site = std::get<1>(_subset_ranges[0]);
        if (_num_subsets == 1 && end_site == _infinity) {
            _subset_names.clear();
            _subset_data_types.clear();
            _subset_ranges.clear();
        }
        else if (subset_name == "default") {
            throw XStrom("Cannot specify \"default\" partition subset after already defining other subsets");
        }
        _subset_names.push_back(subset_name);
        _subset_data_types.push_back(dt);
        _num_subsets = (unsigned)_subset_names.size();
        addSubset(_num_subsets - 1, subset_definition);
        std::cout << boost::str(boost::format("Partition subset %s comprises sites %s and has type %s") % subset_name % subset_definition % datatype) << std::endl;
    }
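The data type dispatch at the end of parseSubsetDefinition can be exercised on its own. The sketch below uses a hypothetical enum Kind and struct TypeSpec in place of strom's DataType class, and std::transform with std::tolower in place of boost::to_lower. It records "standard" as the default genetic code name, following the text above; the real DataType class handles that default internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the settings stored in strom's DataType.
enum class Kind { nucleotide, codon, protein, standard };

struct TypeSpec {
    Kind kind = Kind::nucleotide;
    std::string genetic_code = "standard";  // only meaningful for codon data
};

TypeSpec classifyDataType(std::string datatype) {
    // Lower-case the specification (the tutorial uses boost::to_lower)
    std::transform(datatype.begin(), datatype.end(), datatype.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    TypeSpec spec;
    static const std::regex codon_re(R"(codon\s*,\s*(\S+))");
    std::smatch m;
    if (std::regex_match(datatype, m, codon_re)) {
        spec.kind = Kind::codon;             // codon with explicit genetic code
        spec.genetic_code = m[1].str();
    }
    else if (datatype == "codon")
        spec.kind = Kind::codon;             // default genetic code
    else if (datatype == "protein")
        spec.kind = Kind::protein;
    else if (datatype == "nucleotide")
        spec.kind = Kind::nucleotide;
    else if (datatype == "standard")
        spec.kind = Kind::standard;
    else
        throw std::invalid_argument("invalid data type: " + datatype);
    return spec;
}
```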
This is a private member function that does the work of breaking up a subset definition into a vector of component ranges (these component ranges are separated by commas in subset_definition). This is accomplished by the boost::split function, and each resulting component is submitted to the member function addSubsetRange.
    inline void Partition::addSubset(unsigned subset_index, std::string subset_definition) {
        std::vector<std::string> parts;
        boost::split(parts, subset_definition, boost::is_any_of(","));
        for (const auto & subset_component : parts) {
            addSubsetRange(subset_index, subset_component);
        }
    }
This is a private member function, called by the addSubset member function, that receives a subset index (an integer greater than or equal to 0) and a range definition. The range definition may be trivial (just a single integer corresponding to one site), a range (e.g. 1-1000) consisting of a beginning and ending site separated by a hyphen, or a more complex range comprising a beginning and ending site as well as a step size, or stride (e.g. 2-10\3). Regular expression matching is used to parse the range definition. The regular expression pattern string is
const char * pattern_string = R"((\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*)";
Note that the R"(...)" wrapper just tells C++ that this is a raw string (i.e. that backslashes inside it are not C++ escape characters). After removing the raw literal bracketing characters, the regular expression pattern is shown below with its 5 capture groups indicated:
(\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*
|-1-|   |--------------2-----------|
             |---3---||-----4----|
                            |-5-|
The first construct, (\d+), looks for one or more digits, and the parentheses serve to capture this number as regex group 1. Group 2 occupies most of the remainder of the pattern, and the terminating * means that group 2 need not be present in a match. This would be the case for trivial ranges consisting of only a single number representing the site position.
If group 2 is present in a match, group 3 is required and captures the ending site position, which follows the required hyphen character (-) and zero or more whitespace characters (\s*). Note that group 3 uses [0-9.]+, so it accepts periods as well as digits. Group 4 is optional, but, if present, matches a backslash character (\\), which must be doubled (“escaped”) to keep it from giving the following character special meaning. The backslash is followed by optional whitespace (\s*) and then a series of digit characters (\d+) captured as group 5. (The fact that backslashes must be escaped within the regular expression itself explains why I decided to go with the raw string literal approach; otherwise, backslashes would need to be escaped for both C++ and the regular expression interpreter!)
The function std::regex_match tests whether the entire range_definition string matches the pattern defined by re (a match of only part of the string does not count) and stores any captured groups in match_obj.
Note that groups 2 and 4 are both optional; they are defined only so that the parts they enclose can be made optional. It is groups 1, 3, and 5 that capture the information we need. The three lines that call extractIntFromRegexMatch do the work of assigning these pieces of information to the variables ibegin, iend, and istep. The second argument to extractIntFromRegexMatch is returned as the default value if the capture group supplied as the first argument is an empty string; it also serves as the minimum value allowed.
All that remains is to append the tuple containing the four values ibegin, iend, istep, and subset_index to the vector _subset_ranges, and to update _num_sites if the last site included in the range is larger than the current value of _num_sites. This last task is complicated by the possibility that the last site included in the range may not equal iend! Consider the range 2-10\3, which translates to the set {2, 5, 8}. The last site included is 8, not 10. The last site is therefore computed as iend - ((iend - ibegin) % istep), which here gives 10 - (8 % 3) = 10 - 2 = 8.
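The last-site formula can be checked against a loop in the same way. Both function names below are hypothetical.

```cpp
#include <cassert>

// Last site actually included in begin_site, begin_site+stride, ... <= end_site,
// using the same formula as addSubsetRange.
unsigned lastSiteInRange(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);
    return end_site - ((end_site - begin_site) % stride);
}

// Brute-force check: walk the range and remember the last site visited.
unsigned lastSiteByLoop(unsigned begin_site, unsigned end_site, unsigned stride) {
    assert(stride > 0 && begin_site <= end_site);
    unsigned last = begin_site;
    for (unsigned s = begin_site; s <= end_site; s += stride)
        last = s;
    return last;
}
```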
Here is the source for the addSubsetRange function:
    inline void Partition::addSubsetRange(unsigned subset_index, std::string range_definition) {
        // the regex matches patterns like these: "1-.\3" "1-1000" "1001-."
        // (note: "." is accepted by group 3, but extractIntFromRegexMatch cannot convert it and will throw)
        const char * pattern_string = R"((\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*)";
        std::regex re(pattern_string);
        std::smatch match_obj;
        bool matched = std::regex_match(range_definition, match_obj, re);
        if (!matched) {
            throw XStrom(boost::format("Could not interpret \"%s\" as a range of site indices") % range_definition);
        }
        // match_obj always yields 6 strings that can be indexed using the operator[] function
        // match_obj[0] equals entire site_range (e.g. "1-.\3")
        // match_obj[1] equals beginning site index (e.g. "1")
        // match_obj[2] equals everything after beginning site index (e.g. "-.\3")
        // match_obj[3] equals "" or ending site index (e.g. ".")
        // match_obj[4] equals "" or everything after ending site index (e.g. "\3")
        // match_obj[5] equals "" or step value (e.g. "3")
        int ibegin = extractIntFromRegexMatch(match_obj[1], 1);
        int iend   = extractIntFromRegexMatch(match_obj[3], ibegin);
        int istep  = extractIntFromRegexMatch(match_obj[5], 1);
        // record the range as a 4-tuple (begin, end, step, subset)
        _subset_ranges.push_back(std::make_tuple(ibegin, iend, istep, subset_index));
        // determine last site in subset
        unsigned last_site_in_subset = iend - ((iend - ibegin) % istep);
        if (last_site_in_subset > _num_sites) {
            _num_sites = last_site_in_subset;
        }
    }
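Here is a standalone sketch of the range parsing, using the same pattern as addSubsetRange. parseRange and toUnsigned are hypothetical helpers. They use std::stoul and standard exceptions instead of extractIntFromRegexMatch and XStrom, but apply the same defaults: a missing ending site defaults to the beginning site, and a missing step defaults to 1.

```cpp
#include <cassert>
#include <limits>
#include <regex>
#include <stdexcept>
#include <string>
#include <tuple>

// Converts one captured group to unsigned; std::stoul throws std::invalid_argument
// for text such as "." and std::out_of_range for values that are too large.
unsigned toUnsigned(const std::string & text) {
    unsigned long value = std::stoul(text);
    if (value > std::numeric_limits<unsigned>::max())
        throw std::out_of_range("site index too large: " + text);
    return static_cast<unsigned>(value);
}

// Hypothetical helper: parses "5", "1-1000", or "2-10\3" into (begin, end, step).
std::tuple<unsigned, unsigned, unsigned> parseRange(const std::string & range_definition) {
    static const std::regex re(R"((\d+)\s*(-\s*([0-9.]+)(\\\s*(\d+))*)*)");
    std::smatch m;
    if (!std::regex_match(range_definition, m, re))
        throw std::invalid_argument("not a site range: " + range_definition);
    // Groups 3 and 5 are empty when the optional parts are absent
    unsigned begin_site = toUnsigned(m[1].str());
    unsigned end_site = m[3].length() > 0 ? toUnsigned(m[3].str()) : begin_site;
    unsigned step = m[5].length() > 0 ? toUnsigned(m[5].str()) : 1u;
    if (end_site < begin_site || step == 0)
        throw std::invalid_argument("invalid site range: " + range_definition);
    return std::make_tuple(begin_site, end_site, step);
}
```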
This function takes a regex_match_t argument and attempts to interpret it as an integer. A regex_match_t refers to the part of the target string captured by one group of a regular expression match, and it can be converted to a std::string using its str member function.
If the conversion succeeds, the function returns the integer value extracted, provided that the integer is at least as large as the minimum value specified (otherwise an exception is thrown). If the supplied string is empty, no attempt is made to interpret it, and the supplied minimum value is returned by default.
    inline int Partition::extractIntFromRegexMatch(regex_match_t s, unsigned min_value) {
        int int_value = min_value;
        if (s.length() > 0) {
            std::string str_value = s.str();
            try {
                int_value = std::stoi(str_value);
            }
            catch(const std::invalid_argument &) {
                throw XStrom(boost::format("Could not interpret \"%s\" as a number in partition subset definition") % str_value);
            }
            catch(const std::out_of_range &) {
                // std::stoi throws this if the value does not fit in an int
                throw XStrom(boost::format("Number \"%s\" in partition subset definition is too large") % str_value);
            }
            // sanity check
            if (int_value < (int)min_value) {
                throw XStrom(boost::format("Value specified in partition subset definition (%d) is lower than minimum value (%d)") % int_value % min_value);
            }
        }
        return int_value;
    }
The function uses std::stoi to extract an integer value from a string. If std::stoi fails to interpret the supplied string as an integer, it throws a std::invalid_argument exception, which we catch and follow up with our own XStrom exception to explain to the user what went wrong.
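Because group 3 of the range pattern also accepts periods, a value such as "10.5" can reach std::stoi, which converts the leading "10" and silently ignores the rest rather than throwing. If stricter behavior were wanted, the optional pos argument of std::stoi reports how many characters were consumed. parseIntStrict below is a hypothetical helper (not part of the tutorial) that uses it to reject trailing characters.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical stricter variant of std::stoi: rejects trailing characters.
int parseIntStrict(const std::string & text) {
    std::size_t pos = 0;
    // std::stoi throws std::invalid_argument if no conversion can be performed
    // and std::out_of_range if the value does not fit in an int
    int value = std::stoi(text, &pos);
    // pos is the number of characters consumed; anything left over is an error
    if (pos != text.size())
        throw std::invalid_argument("trailing characters in \"" + text + "\"");
    return value;
}
```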
The finalize function is called once the actual number of sites is known (after the data have been stored). If no subsets were defined, it simply sets up the default partition. Otherwise, it performs three important sanity checks. First, it checks whether the number of sites specified is equal to the number of sites determined by the subset definitions. Second, it checks whether any sites have slipped through the cracks and were not assigned to any subset in the partition. Third, it checks whether any sites have been assigned to more than one subset. If any of these sanity checks fail, an XStrom exception is thrown, forcing the user to fix their partition definition.
    inline void Partition::finalize(unsigned nsites) {
        if (_num_sites == 0) {
            defaultPartition(nsites);
            return;
        }
        // First sanity check:
        //   nsites is the number of sites read in from a data file;
        //   _num_sites is the maximum site index specified in any partition subset.
        //   These two numbers should be the same.
        if (_num_sites != nsites) {
            throw XStrom(boost::format("Number of sites specified by the partition (%d) does not match actual number of sites (%d)") % _num_sites % nsites);
        }
        // Second sanity check: ensure that no sites were left out of all partition subsets
        // Third sanity check: ensure that no sites were included in more than one partition subset
        std::vector<int> tmp(nsites, -1);   // begin with -1 for all sites
        for (auto & t : _subset_ranges) {
            unsigned begin_site = std::get<0>(t);
            unsigned end_site = std::get<1>(t);
            unsigned stride = std::get<2>(t);
            unsigned site_subset = std::get<3>(t);
            for (unsigned s = begin_site; s <= end_site; s += stride) {
                if (tmp[s-1] != -1)
                    throw XStrom("Some sites were included in more than one partition subset");
                else
                    tmp[s-1] = site_subset;
            }
        }
        if (std::find(tmp.begin(), tmp.end(), -1) != tmp.end()) {
            throw XStrom("Some sites were not included in any partition subset");
        }
        tmp.clear();
    }
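The second and third sanity checks in finalize amount to marking each site with the subset that claims it. The sketch below performs the same bookkeeping but returns a hypothetical Coverage code instead of throwing XStrom. It also guards against sites outside 1..nsites, a case that finalize has already ruled out through its first sanity check.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Same layout as Partition::subset_range_t: (begin site, end site, stride, subset index)
using subset_range_t = std::tuple<unsigned, unsigned, unsigned, unsigned>;

enum class Coverage { ok, overlap, gap, out_of_bounds };  // hypothetical result codes

// Mirrors the bookkeeping in Partition::finalize; sites are 1-based.
Coverage checkCoverage(const std::vector<subset_range_t> & ranges, unsigned nsites) {
    std::vector<int> owner(nsites, -1);   // -1 means "not yet assigned"
    for (const auto & t : ranges) {
        unsigned begin_site = std::get<0>(t);
        unsigned end_site = std::get<1>(t);
        unsigned stride = std::get<2>(t);
        unsigned subset = std::get<3>(t);
        assert(stride > 0);
        for (unsigned s = begin_site; s <= end_site; s += stride) {
            if (s < 1 || s > nsites)
                return Coverage::out_of_bounds;
            if (owner[s - 1] != -1)
                return Coverage::overlap;     // site already claimed by a subset
            owner[s - 1] = static_cast<int>(subset);
        }
    }
    for (int o : owner)
        if (o == -1)
            return Coverage::gap;             // site claimed by no subset
    return Coverage::ok;
}
```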
If the data will not be partitioned, it makes sense to relieve the user of the responsibility of creating a subset definition. This function is called (by finalize) if no subset definitions were provided on the command line or in the strom.conf file. It calls clear, which restores the single subset named default, and then replaces that subset's placeholder range with a range that extends from the first to the last site.
    inline void Partition::defaultPartition(unsigned nsites) {
        clear();
        _num_sites = nsites;
        _num_subsets = 1;
        _subset_ranges[0] = std::make_tuple(1, nsites, 1, 0);
    }