Sunday, May 26, 2019

Rework on loadRemoteFS

I have been hooked on the novel series A Song of Ice and Fire, and I can't concentrate on my code now. One day, while reading through loadRemoteFS() for a few rounds, I saw that something wasn't right. If I read it correctly, the section marked // continue to search the path, which sits under the section // route back to home path to start the search again, was never executed.
    for( ; it != fileList.end(); it++ ) {
        entry = (string)*it;

        LOGGER(lg, info) << "Processing entry: " << entry << " parent path: " << entry.parent_path()
             << " file name: " << entry.filename();

        LOGGER(lg, info) << "Current path: " << constructPathAddress(currentPosition);

        // navigate to new path if the entry wasn't same with the current position
        if( tmpParentMemory.compare(entry.parent_path().string()) != 0 ) {
            // validate local path before moving on to new path

            LOGGER(lg, info) << "Navigate to new path: " << entry.parent_path();

            string key = "";

            // search forward from current position
            if( tmpParentMemory.length() < entry.parent_path().string().length() ) {
                key = entry.parent_path().string().substr(tmpParentMemory.length(), entry.parent_path().string().length() - tmpParentMemory.length());

                LOGGER(lg, info) << "Search forward to: " << key;

                currentPosition = loadRemoteSubPath(currentPosition, key.substr(1, key.length()));
            }
            // route back to home path to start the search again
            else {
                key = entry.parent_path().string();

                // continue to search the path
                if( key.length() > (remoteHome->getName().string().length() + 2) ) {
                    key = key.substr(remoteHome->getName().string().length() + 2, remoteHome->getName().string().length() + 1);
                    currentPosition = loadRemoteSubPath(remoteHome, key);

                    LOGGER(lg, info) << "Searching for " << key << " from home path";
                }
                // we have arrive to home path
                else {
                    LOGGER(lg, info) << "Don't go elsewhere, the file is right under home path";

                    currentPosition = remoteHome;
                }
            }
        }

        // construct them just as in current level
        FileNode *newNode = new FileNode();
        newNode->setName(entry.filename());
        newNode->setType('f');
        newNode->setParentNode(currentPosition);

        currentPosition->sibling.push_back(*newNode);

        LOGGER(lg, info) << "Insert " << newNode->getName() << "[" << newNode << "] into the path: " << currentPosition;

        // update the new entry into memory
        tmpParentMemory = entry.parent_path().string();
    }
I was looking at the ceiling, thinking about something for a while. I was not thinking about how to optimize the code, but about what would come next in GoT instead... and then I went back to reading my novel. A few days later, I made some improvements, and the code now looks much cleaner than before.
    for( ; it != fileList.end(); it++ ) {
        entry = (string)*it;

        FileNode *currentPosition = analyseEntryNode(entry);

        // construct them just as in current level
        FileNode *newNode = new FileNode();
        newNode->setName(entry.filename());
        newNode->setType('f');
        newNode->setParentNode(currentPosition);

        currentPosition->sibling.push_back(*newNode);

        LOGGER(lg, info) << "Insert " << newNode->getName() << "[" << newNode << "] into the path: " << currentPosition;
    }
A damn lot of junk code has been removed. I am trying to make the code much easier to read. Here, the loop only asks for the position where the new file node should be attached. If I want to know how that is done, I can drill down into the details. I put them into a new method as shown below:
FileNode* FileBot::analyseEntryNode(path entry)
{
    string curLocation = Configuration::getInstance()->getRemotePath();
    FileNode *curPos = remoteHome;

    LOGGER(lg, info) << "Processing entry: " << entry << " parent path: " << entry.parent_path()
         << " file name: " << entry.filename();

    LOGGER(lg, info) << "Current path: " << constructPathAddress(curPos);

    // navigate to new path if the entry wasn't same with the current position
    if( curLocation.compare(entry.parent_path().string()) != 0 ) {
        // validate local path before moving on to new path

        LOGGER(lg, info) << "Navigate to new path: " << entry.parent_path();

        string key = "";

        // search forward from current position
        if( curLocation.length() < entry.parent_path().string().length() ) {
            key = entry.parent_path().string().substr(curLocation.length(), entry.parent_path().string().length() - curLocation.length());

            LOGGER(lg, info) << "Search forward to: " << key;

            curPos = lookupPath(curPos, key.substr(1, key.length()));
        }
    }

    return curPos;
}
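The "search forward" step above boils down to cutting the remote root off the front of the entry's parent path. Below is a minimal sketch of that idea using only the standard library. relativeKey() is a hypothetical helper, not part of FileBot, and it assumes '/' as the separator and a root without a trailing separator.

```cpp
#include <string>

// Hypothetical helper (not part of FileBot): returns the part of parentPath
// that lies below root, without the leading separator. Returns an empty
// string when parentPath is the root itself or is not located under root.
std::string relativeKey(const std::string& root, const std::string& parentPath)
{
    // parentPath must be longer than root and must start with root
    if (parentPath.length() <= root.length() ||
        parentPath.compare(0, root.length(), root) != 0)
        return "";

    // the character right after root must be a separator, so that
    // "/home/ab" is not treated as a child of "/home/a"
    if (parentPath[root.length()] != '/')
        return "";

    // skip the separator and keep the rest
    return parentPath.substr(root.length() + 1);
}
```

Compared with comparing lengths only, the prefix check also rejects a parent path that is longer than the root but lives somewhere else.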

Wednesday, February 20, 2019

New implementation on logger

I was supposed to work on the logger, but because some unit test code had not been integrated with PUGI XML yet, I worried that the base code might have some flaws in it. I had to focus on completing that before moving on to the next task. It took me a few weeks to clean up my mess. Now it is time to clear my doubts about the logger. Until now, I have been using a very simple method to show my log in the console output:
...

BOOST_LOG_TRIVIAL(info) << "my log message";

...
I don't have a proper file logger implemented yet. I had been thinking of using log4cxx, but I gave that up since I started with Boost Log. From my experience with log4j, implementing a file logger in Boost Log should be a simple task provided I follow the documentation. Otherwise, it is going to be hard. I found a working sample on StackOverflow that logs the file name and line number in each log entry. This is what I need for my case. I define my logger in FileBot.h:
...

#define LOGGER(logger, sev) \
    BOOST_LOG_STREAM_WITH_PARAMS( \
        (logger), \
        (set_get_attrib("FILE", path_to_filename(__FILE__))) \
        (set_get_attrib("LINE", __LINE__)) \
        (::boost::log::keywords::severity = (boost::log::trivial::sev)) \
    )

...
The bunch of utility functions mentioned in the above #define go in the CPP file.
...

template<typename ValueType>
ValueType set_get_attrib(const char* name, ValueType value)
{
    auto attr = logging::attribute_cast<logging::attributes::mutable_constant<ValueType>>(logging::core::get()->get_thread_attributes()[name]);
    attr.set(value);
    return attr.get();
}

std::string path_to_filename(std::string path)
{
    return path.substr(path.find_last_of("/\\")+1);
}

...
And then the initialization code in the FileBot constructor:
...

FileBot::FileBot()
{
    logging::core::get()->add_thread_attribute("FILE", boost::log::attributes::mutable_constant<string>(""));
    logging::core::get()->add_thread_attribute("LINE", boost::log::attributes::mutable_constant<int>(0));

    logging::add_file_log
    (
       logging::keywords::file_name = "/home/kokhoe/workspaceqt/debug/sample_%N.log",
       logging::keywords::rotation_size = 10 * 1024 * 1024,
       logging::keywords::time_based_rotation = logging::sinks::file::rotation_at_time_point(0, 0, 0),
       logging::keywords::format = (
                expr::stream
                      << expr::format_date_time< boost::posix_time::ptime >("TimeStamp", "%Y-%m-%d %H:%M:%S")
                      << " [" << boost::log::trivial::severity << "] "
                      << '['   << expr::attr<std::string>("FILE")
                               << ':' << expr::attr<int>("LINE") << "] "
                      << expr::smessage
       )
    );

    logging::add_common_attributes();

...
The usage of the code is simply
...

LOGGER(lg, info) << "my log message";
...
Then the output will show this:

2019-02-19 22:03:26 [info] [FileBot.cpp:426] my log message
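
The [FileBot.cpp:426] part of that line comes from stripping the directory from __FILE__ and appending __LINE__. Here is a small sketch of that step alone, without Boost. path_to_filename() follows the helper shown earlier, while locationTag() is a hypothetical function added only for illustration.

```cpp
#include <string>

// Same idea as the helper above: keep whatever follows the last '/' or '\\'.
// When no separator is found, find_last_of returns npos, npos + 1 wraps
// around to 0, and the whole string is returned.
std::string path_to_filename(const std::string& path)
{
    return path.substr(path.find_last_of("/\\") + 1);
}

// Hypothetical helper: builds the "[file:line]" tag seen in the log output.
std::string locationTag(const std::string& file, int line)
{
    return "[" + path_to_filename(file) + ":" + std::to_string(line) + "]";
}
```

Calling locationTag(__FILE__, __LINE__) would produce the same kind of tag that the formatter assembles from the FILE and LINE attributes.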

Monday, January 28, 2019

Integration of Configuration class and FileBot class

Since I have the Configuration class ready, it is time to integrate it into my FileBot class. For now, only loadRemoteFS() requires data from the INDEX file, so the integration should be fairly easy. Before this, I was using a list to temporarily mimic the data available from the INDEX file. This is what I did in my unit test (testSearch.cpp):
/*****   FileBot.cpp   *****/

std::list<string> FileBot::loadRemoteFS(vector<string> fileList, bool showCaptureList) {
   
   for( vector<string>::iterator it = fileList.begin(); it != fileList.end(); it++ ) {
   }

   ...
   ...
}


/*****   testSearch.cpp   *****/

BOOST_AUTO_TEST_CASE(TL_5, *boost::unit_test::precondition(skipTest(false)))
{
    ...
    ...

    vector<string> keyList;
    keyList.push_back("/home/puiyee/workspaceqt/debug/FolderA");
    keyList.push_back("/home/puiyee/workspaceqt/debug/FolderA/subA/subB/file_3.txt");
    vector<string> found;

    fb.loadRemoteFS(keyList);

    ...
}
Now I have replaced this chunk of code with a more elegant piece. I am creating a real XML file to mimic the INDEX file in createDestinationConfig() during the test. And now loadRemoteFS() no longer takes a file list as an input parameter. It digests the INDEX file, knows what data to look for, swallows it, and then produces the output.
/*****   FileBot.cpp   *****/

FileNode* FileBot::loadRemoteFS()
{
   ...

   pugi::xpath_node_set fileList = Configuration::getInstance()->retrieveRemoteFiles("/backup/file");
   pugi::xpath_node_set::const_iterator it = fileList.begin();
   for( ; it != fileList.end(); it++ ) {
   }

   ...
   ...
}


/*****   testSearch.cpp   *****/

BOOST_AUTO_TEST_CASE(TL_1, *boost::unit_test::precondition(skipTest(false)))
{
   ...
   ...

   std::vector<string> keyList;
   keyList.push_back("/home/puiyee/workspaceqt/debug/FolderA/subA/subB/file_3.txt");

   FileBotUnderTest fb;
   fb.createDestinationConfig("/home/puiyee/workspaceqt/debug/FolderA", keyList);
   fb.loadRemoteFS();

   ...
}
Don't be confused: the keyList mentioned in the test case above is required only by createDestinationConfig(). Also, the first item of keyList, which indicated the root of the remote path, is not required anymore. Since loadRemoteFS() produces output, I need to verify the output to ensure consistency. I use this code at the end of the unit test.
BOOST_AUTO_TEST_CASE(TL_1, *boost::unit_test::precondition(skipTest(true)))
{
    ...
    ...

    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file("backup.xml");
    if( result ) {
        pugi::xpath_node_set files = doc.select_nodes("/backup/file");
        BOOST_TEST(files.size() == 0);

        files = doc.select_nodes("/backup/recover/dest");
        for( pugi::xpath_node_set::const_iterator it = files.begin(); it != files.end(); it++ ) {
            pugi::xml_node file = ((pugi::xpath_node)*it).node();
            string value = file.text().get();

            BOOST_TEST(value.compare("/home/puiyee/workspaceqt/debug/FolderA/sub/file_2.txt") == 0);
        }

        files = doc.select_nodes("/backup/recover/src");
        BOOST_TEST(files.size() == 0);
    }
In this unit test, I load backup.xml and then verify that /backup/recover/dest has been created in the following format.
 <backup>
    <recover>
       <dest></dest>
    </recover>
 </backup>
The same goes for /backup/recover/src.

Friday, January 25, 2019

Introducing new member - Configuration

Time flies, it has been almost 2 months since my last update. I was working on a new class to handle the INDEX file. This class is named Configuration, and its sole responsibility is to work with the INDEX file. The INDEX file consists of XML tags containing information about the file structure being scanned.

During those 2 months, I was struggling with handling the XML file in Boost. I had completed roughly 60% of my work when I found out that it is not easy to remove an XML tag that I don't need anymore. After much trial and error I still could not work it out, so I started to look for alternative solutions and found that Pugi XML can remove and update tags easily. With that, I began to switch my code to Pugi. I was lucky that only around 30% of the work needed to be redone.

When I first designed this class, I tried to think of a lazy way to accomplish the task. I wanted to avoid having to call a separate initialize() when the class is first created, since that doesn't look smart, so I do the work in the constructor. It is an old method, but effective. When this class is first born, it looks for the configuration settings for the source and remote paths. Every time this class is loaded into memory, it looks for the remote path. If it's empty, the given path initializes it; otherwise the given path is treated as the source path.
Configuration::Configuration()
{
    defaultPath = filesystem::current_path().string() + GENERIC_PATH_SEPARATOR + INDEX_FILENAME;

    // create new one if the index file doesn't exists
    if( !boost::filesystem::exists(INDEX_FILENAME) ) {
        auto declareNode = doc.append_child(pugi::node_declaration);
        declareNode.append_attribute("version") = "1.0";
        declareNode.append_attribute("encoding") = "UTF-8";

        pathLookup(filesystem::current_path().string());
    }
    // load the index file if it exists
    else {
        readConfigFile();

        destPath = doc.child("backup").child("configuration").child("destination_path").child_value();
        sourcePath = doc.child("backup").child("configuration").child("source_path").child_value();
    }
}

string Configuration::pathLookup(string inputPath)
{
    // assign the new destination path if the field doesn't exists
    if( destPath.size() == 0 && sourcePath.size() == 0 ) {
        destPath = inputPath;

        pugi::xml_node destPathNode = doc.append_child("backup").append_child("configuration").append_child("destination_path");
        destPathNode.append_child(pugi::node_pcdata).set_value(inputPath.c_str());

        updateConfiguration();

        return destPath;
    }
    // assign the new source path if the field doesn't exists
    else if( sourcePath.size() == 0 && destPath.size() != 0 ) {
        sourcePath = inputPath;

        pugi::xml_node configNode = doc.child("backup").child("configuration");
        configNode.append_child("source_path").append_child(pugi::node_pcdata).set_value(inputPath.c_str());

        updateConfiguration();

        return sourcePath;
    }
    else
        return "";
}
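
The branching in pathLookup() can be seen more easily without the XML parts: the first path fills the destination, the second fills the source, and any later call is rejected. The sketch below shows only that decision. PathSlots is a hypothetical stand-in for Configuration and writes nothing to disk.

```cpp
#include <string>

// Hypothetical stand-in for Configuration, holding only the two path fields.
struct PathSlots {
    std::string destPath;
    std::string sourcePath;

    // Mirrors the branching in pathLookup(): the first path becomes the
    // destination, the second becomes the source, anything later returns "".
    std::string assign(const std::string& inputPath)
    {
        if (destPath.empty() && sourcePath.empty()) {
            destPath = inputPath;
            return destPath;
        }
        if (sourcePath.empty() && !destPath.empty()) {
            sourcePath = inputPath;
            return sourcePath;
        }
        return "";
    }
};
```

An empty return value is therefore the signal that both paths are already configured.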
Next is the content construction. This class must be able to construct XML tags from a given input. For example, if I pass in an input like recover.source, then it must construct the tags shown below:
<recover>
   <source></source>
</recover>
And not something like this:
<recover.source></recover.source>
Well, that's the design, but the code behind this logic isn't straightforward. One condition is to validate whether duplicates are allowed, another is to check whether the XML tag exists and, if it doesn't, create it.
pugi::xml_node Configuration::allocateNode(string key, bool duplicateKey)
{
    char_separator<char> sep(".");
    tokenizer<char_separator<char> > token(key, sep);
    pugi::xml_node node;

    BOOST_FOREACH( const string& nodeName, token ) {
        qDebug() << "processing node name: " << nodeName.c_str();

        // retrieve the root node for the first time
        if( node.empty() ) {
            // create a new node if the root node was not found
            if( doc.child(nodeName.c_str()) == nullptr )
                node = doc.append_child(nodeName.c_str());
            // retrieve the root node
            else
                node = doc.child(nodeName.c_str());
        }
        else {
            // test if the child node is there
            if( node.child(nodeName.c_str()) == nullptr )
                node = node.append_child(nodeName.c_str());
            // retrieve the node
            else
                node = node.child(nodeName.c_str());
        }
    }

    return node;
}
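
The splitting step in allocateNode() can be tried without Boost or pugixml. The sketch below splits a dotted key and prints the nesting it implies as a plain string. splitKey() and nestedTags() are hypothetical helpers for illustration only. Empty tokens are skipped, which I assume matches how char_separator behaves by default.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "recover.source" into {"recover", "source"}.
// Empty tokens (e.g. from "a..b") are dropped.
std::vector<std::string> splitKey(const std::string& key)
{
    std::vector<std::string> parts;
    std::stringstream ss(key);
    std::string part;
    while (std::getline(ss, part, '.')) {
        if (!part.empty())
            parts.push_back(part);
    }
    return parts;
}

// Hypothetical helper: renders the nesting as text, one tag per key part.
std::string nestedTags(const std::string& key)
{
    const std::vector<std::string> parts = splitKey(key);
    std::string xml;

    // opening tags from the outermost to the innermost
    for (const std::string& p : parts)
        xml += "<" + p + ">";

    // closing tags in reverse order
    for (auto it = parts.rbegin(); it != parts.rend(); ++it)
        xml += "</" + *it + ">";

    return xml;
}
```

For the key recover.source this yields the nested form, not a single recover.source tag.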


void Configuration::writeValue(string key, bool duplicateKey, string value)
{
    bool allowInsert = true;

    string xpath = key;
    // convert key to XPath
    std::replace(xpath.begin(), xpath.end(), '.', '/');
    xpath = "/" + xpath;
    qDebug() << "XML node path: " << xpath.c_str();

    // duplicate value is not allowed
    if( !duplicateKey ) {
        // overwrite the value without validation
        pugi::xpath_node node = doc.select_node(xpath.c_str());
        if( !node.node().empty() ) {
            qDebug() << node.node().name() << " : " << node.node().text().get();

            node.node().text().set(value.c_str());
        }
        else {
            pugi::xml_node tmp = allocateNode(key, duplicateKey);
            tmp.text().set(value.c_str());
        }
    }
    else {
        // walk through each node to check for any duplicate value
        pugi::xpath_node_set files = doc.select_nodes(xpath.c_str());
        for( pugi::xpath_node_set::const_iterator it = files.begin(); it != files.end(); ++it ) {
            pugi::xpath_node file = *it;
            string val = file.node().text().get();

            qDebug() << file.node().name() << " : " << file.node().text().get();

            if( value.compare(val) == 0 )
                allowInsert = false;
        }

        // no duplicate value, proceed to insert the value
        if( allowInsert ) {
            // bail if this is equal to first node
            if( key.find_last_of(".") == string::npos )
                return;

            string parentNode;
            parentNode = key.substr(0, key.find_last_of("."));

            if( parentNode.compare(key) == 0 )
                return;

            // is the XML node missing? Make a new one if it went missing
            pugi::xml_node node = allocateNode(parentNode, duplicateKey);

            string nodeName = key.substr(key.find_last_of(".") + 1, key.length());
            node = node.append_child(nodeName.c_str());
            node.append_child(pugi::node_pcdata).set_value(value.c_str());
        }
    }

    updateConfiguration();
}
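
Two small string steps carry most of writeValue(): turning the dotted key into an XPath, and splitting the key into its parent and its last segment. Here is a sketch of both, using the standard library only. keyToXPath() and splitParentLeaf() are hypothetical names, not functions from the Configuration class.

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Hypothetical helper: "backup.recover.dest" -> "/backup/recover/dest"
std::string keyToXPath(std::string key)
{
    std::replace(key.begin(), key.end(), '.', '/');
    return "/" + key;
}

// Hypothetical helper: returns {parent, leaf}. The parent is empty when the
// key has no '.', which is the "first node" case that writeValue() bails on.
std::pair<std::string, std::string> splitParentLeaf(const std::string& key)
{
    const std::string::size_type pos = key.find_last_of('.');

    // find_last_of reports "not found" as npos, not as -1
    if (pos == std::string::npos)
        return {"", key};

    return {key.substr(0, pos), key.substr(pos + 1)};
}
```

Comparing the result of find_last_of against std::string::npos avoids a signed/unsigned comparison.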
Last but not least, this class is also able to remove an XML tag. The nodePath tells which XML tag to remove, and a tag is removed only if it contains the value given in nodeValue.
void Configuration::removeNode(string nodePath, string nodeValue)
{
    // convert key to XPath
    std::replace(nodePath.begin(), nodePath.end(), '.', '/');
    nodePath = "/" + nodePath;
    qDebug() << "XML node path: " << nodePath.c_str();

    pugi::xml_node node;
    pugi::xpath_node_set nodes = doc.select_nodes(nodePath.c_str());
    for( pugi::xpath_node_set::const_iterator it = nodes.begin(); it != nodes.end(); ++it ) {
        pugi::xpath_node file = *it;
        string val = file.node().text().get();

        if( nodeValue.compare(val) == 0 ) {
            node = file.node();
            break;
        }
    }

    node.parent().remove_child(node);
    updateConfiguration();
}
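
The removal rule (drop only the first entry whose value matches) can be sketched on a plain list of values. removeFirstMatch() below is a hypothetical helper that works on a std::vector instead of XML nodes, and it additionally reports whether anything was removed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes the first element equal to nodeValue.
// Returns true when an element was removed, false when nothing matched.
bool removeFirstMatch(std::vector<std::string>& values, const std::string& nodeValue)
{
    auto it = std::find(values.begin(), values.end(), nodeValue);
    if (it == values.end())
        return false;   // nothing matched, leave the list untouched

    values.erase(it);
    return true;
}
```

A return value like this would let a caller skip rewriting the INDEX file when no tag matched.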