Monday, July 16, 2018

A minor mistake from make_split_iterator

Recently, I found out there are some minor mistake lie underneath the following function:
FileNode* FileBot::constructParentPath(boost::filesystem::path inputPath)
{
    ...
    ...

    typedef split_iterator<string::iterator> string_split_iterator;
    for( string_split_iterator it = make_split_iterator(path, first_finder(GENERIC_PATH_SEPARATOR, is_equal()));
         it != string_split_iterator();
         ++it)
    {
       FileNode *node = new FileNode();
       node->setType('d');
       node->setName(copy_range<string>(*it));

       BOOST_LOG_TRIVIAL(info) << "value inserted: [" << copy_range<string>(*it) << "]";

       if( parent == NULL )
           fileList.push_back(*node);
       else {
           parent->sibling.push_back(*node);
           node->setParentNode(parent);
       }

       parent = node;
       ...
       ...
    }
}
This mistake would not cause any major crash, it just doesn't look "pretty" in terms of behavioral design. Before I go into the details, let's consider following 2 possible inputs:
  1. /home/user
  2. C:/home/user 
The output for entry 2 will generate a nice and beautiful link-list as shown in the image below:
But entry 1 would become like this, the first node in the link-list doesn't contain any name.
 
Why this could happen? I though the code will automatically handle the first token when it is empty? And I also realized this only happened to Linux due to Linux file system structure design behave differently from Windows. To prove my assumption, I run a series of test on make_split_iterator(), and the result shows the method does return an empty string. Another test on boost::split() also having this issue. The only one success is boost::tokenizer().

BOOST_AUTO_TEST_CASE(TL_split_words, *boost::unit_test::precondition(skipTest(false)) )
{
    string win = "D:/home/user";
    string lin = "/home/user";

    vector<string> words;
    string str, firstElement;

    cout << "***** version 1 *****" << endl;
    cout << "Windows platform: ";

    str = win;
    boost::split(words, str, boost::is_any_of(GENERIC_PATH_SEPARATOR), boost::token_compress_on);

    for(string w : words) {
        cout << "[" << w << "]";
    }
    cout << endl;

    firstElement = words.at(0);

    BOOST_TEST(words.size() == 4);
    BOOST_TEST(firstElement.compare("D:") == 0);

    words.clear();
    str = lin;
    boost::split(words, str, boost::is_any_of(GENERIC_PATH_SEPARATOR), boost::token_compress_on);

    cout << "Linux platform: ";
    for(string w : words) {
        cout << "[" << w << "]";
    }
    cout << endl;

    firstElement = words.at(0);

    BOOST_TEST(words.size() == 4);
    BOOST_TEST(firstElement.compare("") == 0);

    cout << "***** version 2 *****" << endl;
    cout << "Windows platform: ";

    str = win;

    typedef split_iterator< string::iterator> string_split_iterator;
    for( string_split_iterator it = make_split_iterator(str, first_finder(GENERIC_PATH_SEPARATOR, is_equal()));
         it != string_split_iterator();
         ++it)
    {
        cout << "[" << copy_range<string>(*it) << "]";
    }
    cout << endl;

    cout << "Linux platform: ";
    str = lin;

    typedef split_iterator<string::iterator> string_split_iterator;
    for( string_split_iterator it = make_split_iterator(str, first_finder(GENERIC_PATH_SEPARATOR, is_equal()));
         it != string_split_iterator();
         ++it)
    {
        cout << "[" << copy_range<string>(*it) << "]";
    }
    cout << endl;

    cout << "***** version 3 *****" << endl;
    cout << "Windows platform: ";

    str = win;

    boost::char_separator<char> sep{GENERIC_PATH_SEPARATOR.c_str()};
    boost::tokenizer<boost::char_separator<char> > token{str, sep};

    for( const auto &t : token ) {
        cout << "[" << t << "]";
    }
    cout << endl;

    cout << "Linux platform: ";
    str = lin;
    token = {str, sep};

    for( const auto &t : token ) {
        cout << "[" << t << "]";
    }
    cout << endl;
}
The output of this test would be like this:
***** version 1 *****
Windows platform: [D:][home][user]
Linux platform: [][home][user]
***** version 2 *****
Windows platform: [D:][home][user]
Linux platform: [][home][user]
***** version 3 *****
Windows platform: [D:][home][user]
Linux platform: [home][user]
I admit that I have overlooked the code, it is just too detail which something I have missed. Although the test doesn't fail, but the consequent impact would cause the link-list contain a node without a name.

No comments: