9.4 Test Reading and Storing Data

(Linux version)

< 9.3 | 9.4 | 10.0 >

Add the subset command line option

To test the new Partition and Data classes, we’ll need to modify strom.hpp to input partition subset specifications from the user. Begin by including some additional header files in strom.hpp and a shared pointer to a Partition object in the Strom class declaration:

#pragma once    

#include &lt;iostream&gt;
<span style="color:#0000ff"><strong>#include "data.hpp"</strong></span>
#include "tree_summary.hpp"
<span style="color:#0000ff"><strong>#include "partition.hpp"</strong></span>
#include &lt;boost/program_options.hpp&gt;
<span style="color:#0000ff"><strong>#include &lt;boost/filesystem.hpp&gt;</strong></span>

namespace strom {

    class Strom {
        public:
                                        Strom();
                                        ~Strom();

            void                        clear();
            void                        processCommandLineOptions(int argc, const char * argv[]);
            void                        run();
        
        private:

            std::string                 _data_file_name;
            std::string                 _tree_file_name;
            
            <span style="color:#0000ff"><strong>Partition::SharedPtr        _partition;</strong></span>
            <span style="color:#0000ff"><strong>Data::SharedPtr             _data;</strong></span>

            TreeSummary::SharedPtr      _tree_summary;

            static std::string          _program_name;
            static unsigned             _major_version;
            static unsigned             _minor_version;

    };

    // member function bodies go here
    
}   

Next, modify the Strom::processCommandLineOptions function as follows:

    inline void Strom::processCommandLineOptions(int argc, const char * argv[]) {   
        <span style="color:#0000ff"><strong>std::vector&lt;std::string&gt; partition_subsets;</strong></span>
        boost::program_options::variables_map vm;
        boost::program_options::options_description desc("Allowed options");
        desc.add_options()
            ("help,h", "produce help message")
            ("version,v", "show program version")
            <span style="color:#0000ff"><strong>("datafile,d",  boost::program_options::value(&_data_file_name)-&gt;required(), "name of a data file in NEXUS format")</strong></span>
            <span style="color:#0000ff"><strong>("treefile,t",  boost::program_options::value(&_tree_file_name), "name of a tree file in NEXUS format")</strong></span>
            <span style="color:#0000ff"><strong>("subset",  boost::program_options::value(&partition_subsets), "a string defining a partition subset, e.g. 'first:1-1234\3' or 'default[codon:standard]:1-3702'")</strong></span>
        ;
        boost::program_options::store(boost::program_options::parse_command_line(argc, argv, desc), vm);
        try {
            const boost::program_options::parsed_options & parsed = boost::program_options::parse_config_file&lt; char &gt;("strom.conf", desc, false);
            boost::program_options::store(parsed, vm);
        }
        catch(boost::program_options::reading_file & x) {
            std::cout &lt;&lt; "Note: configuration file (strom.conf) not found" &lt;&lt; std::endl;
        }
        boost::program_options::notify(vm);

        // If user specified --help on command line, output usage summary and quit
        if (vm.count("help") &gt; 0) {
            std::cout &lt;&lt; desc &lt;&lt; "\n";
            std::exit(1);
        }

        // If user specified --version on command line, output version and quit
        if (vm.count("version") &gt; 0) {
            std::cout &lt;&lt; boost::str(boost::format("This is %s version %d.%d") % _program_name % _major_version % _minor_version) &lt;&lt; std::endl;
            std::exit(1);
        }
    
        <span style="color:#0000ff"><strong>// If user specified --subset on command line, break specified partition subset</strong></span>
        <span style="color:#0000ff"><strong>// definition into name and character set string and add to _partition</strong></span>
        <span style="color:#0000ff"><strong>if (vm.count("subset") &gt; 0) {</strong></span>
            <span style="color:#0000ff"><strong>_partition.reset(new Partition());</strong></span>
            <span style="color:#0000ff"><strong>for (auto s : partition_subsets) {</strong></span>
                <span style="color:#0000ff"><strong>_partition-&gt;parseSubsetDefinition(s);</strong></span>
            <span style="color:#0000ff"><strong>}</strong></span>
        <span style="color:#0000ff"><strong>}</strong></span>
    }   

Note that I’ve moved the required() call in Strom::processCommandLineOptions from the treefile command line option to the datafile command line option:

            ("datafile,d",  boost::program_options::value(&_data_file_name)-&gt;required(), "name of a data file in NEXUS format") 
            ("treefile,t",  boost::program_options::value(&_tree_file_name), "name of a tree file in NEXUS format") 

You could remove the required() call from all command line options; I only use it here because this means program_options takes care of informing the user if they forget to specify a data file on the command line or in the strom.conf file and, in this particular version, we are testing only whether our program can successfully read in data from a file; we are not testing anything about trees, so the user does not need to specify a tree file to test this version of the program.

The section at the bottom (highlighted in blue) that processes a subset specification provided by the user is worth a little explanation. We start by making sure the data member _partition, which is a shared pointer to a Partition, actually points to something. The shared_ptr reset function creates a new Partition object and sets _partition to point to it. You’ll notice that I’ve highlighted a line at the top of the function that declares a variable named partition_subsets that is a vector of strings. This vector is provided to the options description for the subset option. If a single string variable were provided, only the last subset specified would be saved. However, because we specified a containiner, Boost Program Options will save all subset options specified by the user! We can thus iterate through this string vector, passing each of these strings to the Partition parsSubsetDefinition member function for processing.

Now add the following (highlighted) line to the Strom::clear function:

    inline void Strom::clear() {    
        _data_file_name = "";
        _tree_file_name = "";
        _tree_summary   = nullptr;
        <span style="color:#0000ff"><strong>_partition.reset(new Partition());</strong></span>
        _data = nullptr;
    }   

This line ensures that a newly-constructed Strom object has a default partition in place.

Create a data file

Before running it, however, you will need to create the file rbcL.nex. This file does not need to be in your project, so you can use any text editor to create it (e.g. TextEditor, nano, vi, etc.).

Here is a Nexus-formatted data file containing 60 sites from the large subunit of the gene encoding the enzyme RuBisCO (rbcL). Save this as the contents of rbcL.nex:

#NEXUS

[
Sites 76-135 (counting from the first site after the starting methionine triplet)
from selected taxa in the data used in: K Fučíková, PO Lewis, S Neupane, KG Karol, 
and LA Lewis. 2019. Order, please! Uncertainty in the ordinal-level classification 
of Chlorophyceae. PeerJ:e6899. DOI:10.7717/peerj.6899 (https://peerj.com/articles/6899/)
]

Begin data;
    Dimensions ntax=14 nchar=60;
    Format datatype=dna gap=-;
    Matrix
        Atractomorpha_echinata          CCTGATTATGTTGTAAGAGACACTGATATTCTTGCTGCTTTCCGTATGACTCCTCAACCA
        Bracteacoccus_aerius            CCAGATTACGTAGTTAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCACAACCA
        Bracteacoccus_minor             CCAGATTACGTAGTTAAAGATACTGACATTTTAGCTGCATTCCGTATGACTCCACAACCA
        Chlorotetraedron_incus          CCTGATTACGTTATCAAAGATACTGATATTTTAGCAGCATTCCGTATGACTCCACAACCA
        Chromochloris_zofingiensis      CCTGATTACGTAGTTAAAGATACAGATATTTTAGCAGCTTTCCGTATGACTCCTCAACCA
        Kirchneriella_aperta            CCTGATTACGTAGTAAGAGAGACTGACATCTTAGCTGCATTCCGTATGACTCCACAACCA
        Mychonastes_homosphaera         CCAGATTACGTTGTTAAAGATACTGACATCTTAGCAGCATTCCGTATGACTCCACAACCA
        Neochloris_aquatica             CCAGATTATGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCTCAACCA
        Ourococcus_multisporus          CCTGATTACGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCACAACCA
        Pediastrum_duplex               CCAGATTATGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCTCAACCA
        Pseudomuriella_schumacherensis  CCTGATTACGTAGTAAAAGAAACAGACATTCTAGCTGCATTCCGTATGACTCCTCAACCA
        Rotundella_rotunda              CCAGATTACGTTGTAAAAGAAACTGATATTTTAGCAGCATTCCGTATGACTCCTCAACCA
        Scenedesmus_obliquus            CCAGATTACGTTGTAAAAGATACTGATATTTTAGCAGCATTCCGTATGACTCCACAACCA
        Stigeoclonium_helveticum        CCAGATTATATGGTTAAAGATACTGATATTCTTGCTGCTTTCCGTATGACTCCTCAACCT
    ;
end;

In order to get rbcL.nex installed to the same directory as the strom executable file, we need to add one more install_data line to the very end of our meson.build file:

project('strom', 'cpp', 
    default_options : ['cpp_std=c++11','cpp_link_args=-static','prefix=/home/paul/Documents/strom/distr'],
    version : '1.0')
cpp = meson.get_compiler('cpp')
<span style="color:#0000ff"><strong>lib_filesystem = cpp.find_library('boost_filesystem', dirs: ['/home/paul/lib/static'], required: true)</strong></span>
lib_program_options = cpp.find_library('boost_program_options', dirs: ['/home/paul/lib/static'], required: true)
lib_ncl = cpp.find_library('ncl', dirs: ['/home/paul/lib/static'], required: true)
incl_ncl = include_directories('/home/paul/include')
incl_boost = include_directories('/home/paul/Documents/libraries/boost_1_71_0')
<span style="color:#0000ff"><strong>executable('strom', 'main.cpp', install: true, install_dir: '.', dependencies: [lib_ncl,lib_program_options,lib_filesystem], include_directories: [incl_ncl,incl_boost])</strong></span>
<span style="color:#0000ff"><strong>install_data('strom.conf', install_dir: '.')</strong></span>
install_data('test.tre', install_dir: '.')
<span style="color:#0000ff"><strong>install_data('rbcL.nex', install_dir: '.')</strong></span>

I’ve also highlighted 3 other lines here. The new line that defines the dependency lib_filesystem and the addition of this dependency to the executable command are both needed in order to show the current working directory (see below). Also highlighted is a line that copies the strom.conf to the install directory.

Specifying the data file

You will need to specify the data file and partition information either on the command line (using --datafile rbcL.nex --subset first:1-60\3 --subset second:2-60\3 --subset third:3-60\3) or (better) create a text file named strom.conf containing these lines:

datafile = rbcL.nex 
subset = first:1-60\3
subset = second:2-60\3
subset = third:3-60\3

Finding the current working directory

We’ve already arranged for the data file to be in the working directory when strom is run, but users of your program may not know where strom is expecting data files to reside. You can help your future users by getting the program to tell you its current working directory when it starts up. We accomplish this by placing the following (highlighted) line just inside the Strom::run function:

    inline void Strom::run() {  
        std::cout &lt;&lt; "Starting..." &lt;&lt; std::endl;
        <span style="color:#0000ff"><strong>std::cout &lt;&lt; "Current working directory: " &lt;&lt; boost::filesystem::current_path() &lt;&lt; std::endl;</strong></span>

This line is what required us to compile the Boost file_system and system libraries. To use the current_path function, you’ll note that we included the boost/filesystem.hpp header at the top of the strom.hpp file.

Modifying the Strom::run function

Replace the try block in the Strom::run function with the highlighted lines below (and also add the highlighted line reporting the current working directory):

    inline void Strom::run() {  
        std::cout &lt;&lt; "Starting..." &lt;&lt; std::endl;
        <span style="color:#0000ff"><strong>std::cout &lt;&lt; "Current working directory: " &lt;&lt; boost::filesystem::current_path() &lt;&lt; std::endl;</strong></span>
        
        <span style="color:#0000ff"><strong>try {</strong></span>
            <span style="color:#0000ff"><strong>std::cout &lt;&lt; "\n*** Reading and storing the data in the file " &lt;&lt; _data_file_name &lt;&lt; std::endl;</strong></span>
            <span style="color:#0000ff"><strong>_data = Data::SharedPtr(new Data());</strong></span>
            <span style="color:#0000ff"><strong>_data-&gt;setPartition(_partition);</strong></span>
            <span style="color:#0000ff"><strong>_data-&gt;getDataFromFile(_data_file_name);</strong></span>

<span style="color:#0000ff"><strong></strong></span>
            <span style="color:#0000ff"><strong>// Report information about data partition subsets</strong></span>
            <span style="color:#0000ff"><strong>unsigned nsubsets = _data-&gt;getNumSubsets();</strong></span>
            <span style="color:#0000ff"><strong>std::cout &lt;&lt; "\nNumber of taxa: " &lt;&lt; _data-&gt;getNumTaxa() &lt;&lt; std::endl;</strong></span>
            <span style="color:#0000ff"><strong>std::cout &lt;&lt; "Number of partition subsets: " &lt;&lt; nsubsets &lt;&lt; std::endl;</strong></span>
            <span style="color:#0000ff"><strong>for (unsigned subset = 0; subset &lt; nsubsets; subset++) {</strong></span>
                <span style="color:#0000ff"><strong>DataType dt = _partition-&gt;getDataTypeForSubset(subset);</strong></span>
                <span style="color:#0000ff"><strong>std::cout &lt;&lt; "  Subset " &lt;&lt; (subset+1) &lt;&lt; " (" &lt;&lt; _data-&gt;getSubsetName(subset) &lt;&lt; ")" &lt;&lt; std::endl;</strong></span>
                <span style="color:#0000ff"><strong>std::cout &lt;&lt; "    data type: " &lt;&lt; dt.getDataTypeAsString() &lt;&lt; std::endl;</strong></span>
                <span style="color:#0000ff"><strong>std::cout &lt;&lt; "    sites:     " &lt;&lt; _data-&gt;calcSeqLenInSubset(subset) &lt;&lt; std::endl;</strong></span>
                <span style="color:#0000ff"><strong>std::cout &lt;&lt; "    patterns:  " &lt;&lt; _data-&gt;getNumPatternsInSubset(subset) &lt;&lt; std::endl;</strong></span>
                <span style="color:#0000ff"><strong>}</strong></span>
            <span style="color:#0000ff"><strong>}</strong></span>
        catch (XStrom & x) {
            std::cerr &lt;&lt; "Strom encountered a problem:\n  " &lt;&lt; x.what() &lt;&lt; std::endl;
        }

        std::cout &lt;&lt; "\nFinished!" &lt;&lt; std::endl;
    }   

The new code creates a Data object, informs it of the partition set up by the user in the config file or command line, and then reads the data file whose name was provided by the user via the --datafile command line option or datafile config file option.

Adding genetic code definitions to main.cpp

The _definitions data member of the GeneticCode class was declared static, so we need to initialize it outside of the GeneticCode class in a source code (cpp) file. As main.cpp is the only source code file in our project (all others are header files), we must modify main.cpp to initialize _definitions.

#include &lt;iostream&gt;
#include "strom.hpp"

using namespace strom;

// static data member initializations
std::string  Strom::_program_name        = "strom";
unsigned     Strom::_major_version       = 1;
unsigned     Strom::_minor_version       = 0;
const double Node::_smallest_edge_length = 1.0e-12;
<span style="color:#0000ff"><strong>GeneticCode::genetic_code_definitions_t GeneticCode::_definitions = {</strong></span>
                             <span style="color:#0000ff"><strong>// codon order is alphabetical: i.e. AAA, AAC, AAG, AAT, ACA, ..., TTT</strong></span>
    <span style="color:#0000ff"><strong>{"standard",             "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"vertmito",             "KNKNTTTT*S*SMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"yeastmito",            "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"moldmito",             "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"invertmito",           "KNKNTTTTSSSSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"ciliate",              "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVVQYQYSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"echinomito",           "NNKNTTTTSSSSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"euplotid",             "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSCCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"plantplastid",         "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"altyeast",             "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLSLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"ascidianmito",         "KNKNTTTTGSGSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"altflatwormmito",      "NNKNTTTTSSSSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVVYY*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"blepharismamacro",     "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YQYSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"chlorophyceanmito",    "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YLYSSSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"trematodemito",        "NNKNTTTTSSSSMIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSSWCWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"scenedesmusmito",      "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*YLY*SSS*CWCLFLF"},</strong></span>
    <span style="color:#0000ff"><strong>{"thraustochytriummito", "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWC*FLF"}</strong></span>
<span style="color:#0000ff"><strong>};</strong></span>

int main(int argc, const char * argv[]) {
    Strom strom;
    try {
        strom.processCommandLineOptions(argc, argv);
        strom.run();
    }
    catch(std::exception & x) {
        std::cerr &lt;&lt; "Exception: " &lt;&lt; x.what() &lt;&lt; std::endl;
        std::cerr &lt;&lt; "Aborted." &lt;&lt; std::endl;
    }
    catch(...) {
        std::cerr &lt;&lt; "Exception of unknown type!\n";
    }

    return 0;
}

Expected output

After running the program, the output should contain these lines, indicating that there were 14 taxa, 3 partition subsets, 60 sites divided equally into each subset, and a total of 9 + 5 + 17 = 31 data patterns found in the rbcL.nex file:

Partition subset first comprises sites 1-60\3 and has type nucleotide
Partition subset second comprises sites 2-60\3 and has type nucleotide
Partition subset third comprises sites 3-60\3 and has type nucleotide
Starting...
Current working directory: "/home/pol02003/stromtutorial.github.io/steps/step-09/test"

*** Reading and storing the data in the file rbcL.nex
storing implied block: TAXA
storing read block: DATA

Number of taxa: 14
Number of partition subsets: 3
  Subset 1 (first)
    data type: nucleotide
    sites:     20
    patterns:  7
  Subset 2 (second)
    data type: nucleotide
    sites:     20
    patterns:  5
  Subset 3 (third)
    data type: nucleotide
    sites:     20
    patterns:  17

Finished!

Something to try

To further test the data aspect of your program, create a file datatest.nex containing the DATA block from the rbcL.nex file plus two other blocks: (1) a CHARACTERS block containing the amino acid translation of the rbcL nucleotide data (as a test of reading protein data); and (2) a CHARACTERS block containing 5 (contrived) discrete morphological characters:

#nexus

begin data;
    dimensions ntax=14 nchar=60;
    format datatype=dna gap=-;
    matrix
        Atractomorpha_echinata          CCTGATTATGTTGTAAGAGACACTGATATTCTTGCTGCTTTCCGTATGACTCCTCAACCA
        Bracteacoccus_aerius            CCAGATTACGTAGTTAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCACAACCA
        Bracteacoccus_minor             CCAGATTACCTAGTTAAAGATACTGACATTTTATCTGCATTCCGTATGACTCCACAACCA
        Chlorotetraedron_incus          CCTGATTACGTTATCAAAGATACTGATATTTTAGCAGCATTCCGTATGACTCCACAACCA
        Chromochloris_zofingiensis      CCTGATTACGTAGTTAAAGATACAGATATTTTAGCAGCTTTCCGTATGACTCCTCAACCA
        Kirchneriella_aperta            CCTGATTACGTAGTAAGAGAGACTGACATCTTAGCTGCATTCCGTATGACTCCACAACCA
        Mychonastes_homosphaera         CCAGATTACGTTGTTAAAGATACTGACATCTTAGCAGCATTCCGTATGACTCCACAACCA
        Neochloris_aquatica             CCAGATTATGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCTCAACCA
        Ourococcus_multisporus          CCTGATTACGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCACAACCA
        Pediastrum_duplex               CCAGATTATGTTGTAAAAGATACTGATATTTTAGCTGCATTCCGTATGACTCCTCAACCA
        Pseudomuriella_schumacherensis  CCTGATTACGTAGTAAAAGAAACAGACATTCTAGCTGCATTCCGTATGACTCCTCAACCA
        Rotundella_rotunda              CCAGATTACGTTGTAAAAGAAACTGATATTTTAGCAGCATTCCGTATGACTCCTCAACCA
        Scenedesmus_obliquus            CCA?ATTACGTTGTAAAAGATACTGATATTTTAGCAGCATTCCGTATGACTCCACAACCA
        Stigeoclonium_helveticum        CCAGATTATATGGTTAAAGATACTGATATTCTTGCTGCTTTCCGTATGACTCCTCAACCT
    ;
end;

begin characters;
    dimensions nchar=20;
    format datatype=protein;
    matrix
        Atractomorpha_echinata             PDYVVRDTDILAAFRMTPQP
        Bracteacoccus_aerius               PDYVVKDTDILAAFRMTPQP
        Bracteacoccus_minor                PDYLVKDTDILSAFRMTPQP
        Chlorotetraedron_incus             PDYVIKDTDILAAFRMTPQP
        Chromochloris_zofingiensis         PDYVVKDTDILAAFRMTPQP
        Kirchneriella_aperta               PDYVVRETDILAAFRMTPQP
        Mychonastes_homosphaera            PDYVVKDTDILAAFRMTPQP
        Neochloris_aquatica                PDYVVKDTDILAAFRMTPQP
        Ourococcus_multisporus             PDYVVKDTDILAAFRMTPQP
        Pediastrum_duplex                  PDYVVKDTDILAAFRMTPQP
        Pseudomuriella_schumacherensis     PDYVVKETDILAAFRMTPQP
        Rotundella_rotunda                 PDYVVKETDILAAFRMTPQP
        Scenedesmus_obliquus               P?YVVKDTDILAAFRMTPQP
        Stigeoclonium_helveticum           PDYMVKDTDILAAFRMTPQP
    ;
end;

begin characters;
    dimensions nchar=5;
    format Datatype=standard symbols="0123";
    matrix
        Atractomorpha_echinata             01111
        Bracteacoccus_aerius               11111
        Bracteacoccus_minor                11111
        Chlorotetraedron_incus             01111
        Chromochloris_zofingiensis         01111
        Kirchneriella_aperta               00111
        Mychonastes_homosphaera            00111
        Neochloris_aquatica                00011
        Ourococcus_multisporus             00002
        Pediastrum_duplex                  00002
        Pseudomuriella_schumacherensis     00003
        Rotundella_rotunda                 00003
        Scenedesmus_obliquus               00003
        Stigeoclonium_helveticum           00003
    ;
end;

Now modify your strom.conf file to look like this:

# datafile = rbcL.nex 
# subset = first:1-60\3
# subset = second:2-60\3
# subset = third:3-60\3
datafile = datatest.nex
subset = rbcL[codon,plantplastid]:1-20
subset = rbcL[protein]:21-40
subset = morph[standard]:41-45 

Note that I’ve commented out the commands used for reading rbcL.nex by placing a hash (#) character at the beginning of each line. I’ve added a datafile command to read datatest.nex and subset commands to interpret the nucleotide data as codons, the protein data as type protein, and the morphological data as type standard. Note the site specification 1-20 rather than 1-60 for the first subset statement (storing 20 codons for each taxon rather than 60 nucleotides).

You should also add the following highlighted line to the bottom of your meson.build file if you want datatest.nex to be copied to the installation directory when strom is built.

project('strom', 'cpp', 
    default_options : ['cpp_std=c++11','cpp_link_args=-static','prefix=/home/paul/Documents/strom/distr'],
    version : '1.0')
cpp = meson.get_compiler('cpp')
lib_filesystem = cpp.find_library('boost_filesystem', dirs: ['/home/paul/lib/static'], required: true)  
lib_program_options = cpp.find_library('boost_program_options', dirs: ['/home/paul/lib/static'], required: true)
lib_ncl = cpp.find_library('ncl', dirs: ['/home/paul/lib/static'], required: true)
incl_ncl = include_directories('/home/paul/include')
incl_boost = include_directories('/home/paul/Documents/libraries/boost_1_71_0')
executable('strom', 'main.cpp', install: true, install_dir: '.', dependencies: [lib_ncl,lib_program_options,lib_filesystem], include_directories: [incl_ncl,incl_boost])    
install_data('strom.conf', install_dir: '.')    
install_data('test.tre', install_dir: '.')
install_data('rbcL.nex', install_dir: '.')  
<span style="color:#0000ff"><strong>install_data('datatest.nex', install_dir: '.')</strong></span>

Running the program again should now yield the following output:

Partition subset rbcL comprises sites 1-20 and has type codon,plantplastid
Partition subset rbcL comprises sites 21-40 and has type protein
Partition subset morph comprises sites 41-45 and has type standard
Starting...
Current working directory: "/home/pol02003/stromtutorial.github.io/steps/step-09/test"

*** Reading and storing the data in the file datatest.nex
storing implied block: TAXA
storing read block: DATA
storing read block: CHARACTERS
storing read block: CHARACTERS

Number of taxa: 14
Number of partition subsets: 3
  Subset 1 (rbcL)
    data type: codon,plantplastid
    sites:     20
    patterns:  20
  Subset 2 (rbcL)
    data type: protein
    sites:     20
    patterns:  17
  Subset 3 (morph)
    data type: standard
    sites:     5
    patterns:  5

Finished!

< 9.3 | 9.4 | 10.0 >