Before we can do anything interesting with our Split
class, we need to add a Split
object to the Node
class and a storeSplits
member function to the TreeManip
class.
Uncomment the line that includes the split.hpp header, which is located at the top of the node.hpp file. Also uncomment the line that declares a data member of type Split
just below _edge_length
inside the Node
class declaration. Finally, uncomment the getSplit
accessor member function. The changes needed in the Node
class are shown below.
#pragma once
#include <string>
#include <vector>
#include <iostream>
<span style="color:#0000ff"><strong>#include "split.hpp"</strong></span>
namespace strom {
class Tree;
class TreeManip;
//class Likelihood;
//class Updater;
class Node {
friend class Tree;
friend class TreeManip;
//friend class Likelihood;
//friend class Updater;
public:
Node();
~Node();
Node * getParent() {return _parent;}
Node * getLeftChild() {return _left_child;}
Node * getRightSib() {return _right_sib;}
int getNumber() {return _number;}
std::string getName() {return _name;}
<span style="color:#0000ff"><strong>Split getSplit() {return _split;}</strong></span>
double getEdgeLength() {return _edge_length;}
void setEdgeLength(double v);
static const double _smallest_edge_length;
typedef std::vector<Node> Vector;
typedef std::vector<Node *> PtrVector;
private:
void clear();
Node * _left_child;
Node * _right_sib;
Node * _parent;
int _number;
std::string _name;
double _edge_length;
<span style="color:#0000ff"><strong>Split _split;</strong></span>
};
Add the following member function prototype to the TreeManip
class declaration (just below buildFromNewick
member function prototype).
#pragma once
#include <cassert>
#include <memory>
#include <stack>
#include <queue>
#include <set>
#include <regex>
#include <boost/range/adaptor/reversed.hpp>
#include <boost/format.hpp>
#include "tree.hpp"
#include "xstrom.hpp"
namespace strom {
class TreeManip {
public:
TreeManip();
TreeManip(Tree::SharedPtr t);
~TreeManip();
void setTree(Tree::SharedPtr t);
Tree::SharedPtr getTree();
double calcTreeLength() const;
unsigned countEdges() const;
void scaleAllEdgeLengths(double scaler);
void createTestTree();
std::string makeNewick(unsigned precision, bool use_names = false) const;
void buildFromNewick(const std::string newick, bool rooted, bool allow_polytomies);
<span style="color:#0000ff"><strong>void storeSplits(std::set<Split> & splitset);</strong></span>
void rerootAtNodeNumber(int node_number);
void clear();
private:
Node * findNextPreorder(Node * nd);
void refreshPreorder();
void refreshLevelorder();
void renumberInternals();
void rerootAtNode(Node * prospective_root);
void extractNodeNumberFromName(Node * nd, std::set<unsigned> & used);
void extractEdgeLen(Node * nd, std::string edge_length_string);
unsigned countNewickLeaves(const std::string newick);
void stripOutNexusComments(std::string & newick);
bool canHaveSibling(Node * nd, bool rooted, bool allow_polytomies);
Tree::SharedPtr _tree;
public:
typedef std::shared_ptr< TreeManip > SharedPtr;
};
Now add the member function body to the end of the tree_manip.hpp file (but before the right curly bracket that terminates the namespace scope).
inline void TreeManip::storeSplits(std::set<Split> & splitset) {
// Start by clearing and resizing all splits
for (auto & nd : _tree->_nodes) {
nd._split.resize(_tree->_nleaves);
}
// Now do a postorder traversal and add the bit corresponding
// to the current node in its parent node's split
for (auto nd : boost::adaptors::reverse(_tree->_preorder)) {
if (nd->_left_child) {
// add this internal node's split to splitset
splitset.insert(nd->_split);
}
else {
// set bit corresponding to this leaf node's number
nd->_split.setBitAt(nd->_number);
}
if (nd->_parent) {
// parent's bits are the union of the bits set in all its children
nd->_parent->_split.addSplit(nd->_split);
}
}
}
The first for loop visits each node in the tree (i.e. every element of the _nodes
vector) and resizes that node’s _split
to ensure that all splits have just enough elements in their _bits
vector to store any possible split defined on _nleaves
taxa.
The second loop (which involves the use of boost::adaptors::reverse
to reverse the order in which nodes in the _preorder
vector are visited) conducts a postorder traversal (i.e. all descendants are visited before the node representing their most recent common ancestor is visited). Because of the Split::resize
call we just did for every node, it is safe to assume that all splits have zeros in every bit. When visiting a leaf node (i.e. nd->_left_child = 0
), this function sets the bit in its split corresponding to that leaf’s taxon number. If the leaf node has _number = 0
, then the first (i.e. 0th) bit will be set in its _split
.
Note that for any node having a _parent
(i.e. all nodes except the root node), the bits in the node being visited are added to the bits already set in the _parent
node. Thus, by the time all descendants of an internal node have been visited, the _split
for that internal node is already completely specified. When an internal node is visited, all that needs to be done is to add its _split
to the growing splitset
and, of course, add the bits defined in its _split
to its own parent.
The parameter splitset
is a reference to a std::set
of Split
objects. A reference in C++ represents an alias to an existing variable. In this case, the std::set<Split>
variable aliased must be created before this function is called, and this function builds up the set as it walks through the tree. Why use a std::set
and not a std::vector
here? Different trees may have identical splits, but they may not be in the same order depending on how each edge is rotated. If splits are added to a set, then the order in which they are encountered does not matter: trees with the same set of splits will have identical split sets. When the function exits, the external std::set<Split>
variable will contain the same number of splits as there are internal nodes in the tree, and will serve as a tree ID: a unique bar code that identifies the tree topology and allows other trees with the same topology (but probably different edge lengths) to be identified.