7.2 Adding splits to Node and TreeManip

(Mac version)

< 7.1 | 7.2 | 7.3 >

Before we can do anything interesting with our Split class, we need to add a Split object to the Node class and a storeSplits member function to the TreeManip class.

Adding splits to the Node class

Uncomment the line that includes the split.hpp header, which is located at the top of the node.hpp file. Also uncomment the line that declares a data member of type Split just below _edge_length inside the Node class declaration. Finally, uncomment the getSplit accessor member function. The changes needed in the Node class are shown below.

#pragma once    

#include &lt;string&gt;
#include &lt;vector&gt;
#include  &lt;iostream&gt;
<span style="color:#0000ff"><strong>#include "split.hpp"</strong></span>

namespace strom {

    class Tree;
    class TreeManip;
    //class Likelihood;
    //class Updater;

    class Node {
            friend class Tree;
            friend class TreeManip;
            //friend class Likelihood;
            //friend class Updater;

        public:
                                        Node();
                                        ~Node();

                    Node *              getParent()     {return _parent;}
                    Node *              getLeftChild()  {return _left_child;}
                    Node *              getRightSib()   {return _right_sib;}
                    int                 getNumber()     {return _number;}
                    std::string         getName()       {return _name;}
                    <span style="color:#0000ff"><strong>Split               getSplit()      {return _split;}</strong></span>

                    double              getEdgeLength() {return _edge_length;}
                    void                setEdgeLength(double v);

            static const double _smallest_edge_length;

            typedef std::vector&lt;Node&gt;    Vector;
            typedef std::vector&lt;Node *&gt;  PtrVector;

        private:

            void                clear();

            Node *              _left_child;
            Node *              _right_sib;
            Node *              _parent;
            int                 _number;
            std::string         _name;
            double              _edge_length;
            <span style="color:#0000ff"><strong>Split               _split;</strong></span>
        };


Adding the storeSplits member function to the TreeManip class

Add the following member function prototype to the TreeManip class declaration (just below buildFromNewick member function prototype).

#pragma once    

#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;stack&gt;
#include &lt;queue&gt;
#include &lt;set&gt;
#include &lt;regex&gt;
#include &lt;boost/range/adaptor/reversed.hpp&gt;
#include &lt;boost/format.hpp&gt;
#include "tree.hpp"
#include "xstrom.hpp"

namespace strom {

    class TreeManip {
        public:
                                        TreeManip();
                                        TreeManip(Tree::SharedPtr t);
                                        ~TreeManip();

            void                        setTree(Tree::SharedPtr t);
            Tree::SharedPtr             getTree();

            double                      calcTreeLength() const;
            unsigned                    countEdges() const;
            void                        scaleAllEdgeLengths(double scaler);

            void                        createTestTree();
            std::string                 makeNewick(unsigned precision, bool use_names = false) const;

            void                        buildFromNewick(const std::string newick, bool rooted, bool allow_polytomies);
            <span style="color:#0000ff"><strong>void                        storeSplits(std::set&lt;Split&gt; & splitset);</strong></span>
            void                        rerootAtNodeNumber(int node_number);

            void                        clear();

        private:

            Node *                      findNextPreorder(Node * nd);
            void                        refreshPreorder();
            void                        refreshLevelorder();
            void                        renumberInternals();
            void                        rerootAtNode(Node * prospective_root);
            void                        extractNodeNumberFromName(Node * nd, std::set&lt;unsigned&gt; & used);
            void                        extractEdgeLen(Node * nd, std::string edge_length_string);
            unsigned                    countNewickLeaves(const std::string newick);
            void                        stripOutNexusComments(std::string & newick);
            bool                        canHaveSibling(Node * nd, bool rooted, bool allow_polytomies);

            Tree::SharedPtr             _tree;

        public:

            typedef std::shared_ptr&lt; TreeManip &gt; SharedPtr;
    };
    

Now add the member function body to the end of the tree_manip.hpp file (but before the right curly bracket that terminates the namespace scope).

    inline void TreeManip::storeSplits(std::set&lt;Split&gt; & splitset) {    
        // Start by clearing and resizing all splits
        for (auto & nd : _tree-&gt;_nodes) {
            nd._split.resize(_tree-&gt;_nleaves);
        }

        // Now do a postorder traversal and add the bit corresponding
        // to the current node in its parent node's split
        for (auto nd : boost::adaptors::reverse(_tree-&gt;_preorder)) {
            if (nd-&gt;_left_child) {
                // add this internal node's split to splitset
                splitset.insert(nd-&gt;_split);
            }
            else {
                // set bit corresponding to this leaf node's number
                nd-&gt;_split.setBitAt(nd-&gt;_number);
            }

            if (nd-&gt;_parent) {
                // parent's bits are the union of the bits set in all its children
                nd-&gt;_parent-&gt;_split.addSplit(nd-&gt;_split);
            }
        }
    }   

The first for loop visits each node in the tree (i.e. every element of the _nodes vector) and resizes that node’s _split to ensure that all splits have just enough elements in their _bits vector to store any possible split defined on _nleaves taxa.

The second loop (which involves the use of boost::adaptors::reverse to reverse the order in which nodes in the _preorder vector are visited) conducts a postorder traversal (i.e. all descendants are visited before the node representing their most recent common ancestor is visited). Because of the Split::resize call we just did for every node, it is safe to assume that all splits have zeros in every bit. When visiting a leaf node (i.e. nd->_left_child = 0), this function sets the bit in its split corresponding to that leaf’s taxon number. If the leaf node has _number = 0, then the first (i.e. 0th) bit will be set in its _split.

Note that for any node having a _parent (i.e. all nodes except the root node), the bits in the node being visited are added to the bits already set in the _parent node. Thus, by the time all descendants of an internal node have been visited, the _split for that internal node is already completely specified. When an internal node is visited, all that needs to be done is to add its _split to the growing splitset and, of course, add the bits defined in its _split to its own parent.

The parameter splitset is a reference to a std::set of Split objects. A reference in C++ represents an alias to an existing variable. In this case, the std::set<Split> variable aliased must be created before this function is called, and this function builds up the set as it walks through the tree. Why use a std::set and not a std::vector here? Different trees may have identical splits, but they may not be in the same order depending on how each edge is rotated. If splits are added to a set, then the order in which they are encountered does not matter: trees with the same set of splits will have identical split sets. When the function exits, the external std::set<Split> variable will contain the same number of splits as there are internal nodes in the tree, and will serve as a tree ID: a unique bar code that identifies the tree topology and allows other trees with the same topology (but probably different edge lengths) to be identified.

< 7.1 | 7.2 | 7.3 >