Tuesday, February 10, 2015

Finding index of Nth occurrence of a string

Recently I was working on a task that require me to find the Nth occurrence of a string. In other words, this function should return the position index of Nth occurrence. For example, the 2nd occurrence of o in a string hello world will locate at 7. Thus, I have following code created for this:
int searchNth(const char* work, const char* find, const int nth) {

 int work_len = strlen(work);
 int find_len = strlen(find);
 int match_count = 0;
 int match_index_pos = -1;
 int occurrence = -1;

 for (int i = 0; i < work_len; i++) {

  if (work[i] == find[match_count]) {
   match_count++;

   if (match_count == 1) {
    match_index_pos = i;
   }
  }
  else if (match_count > 0 && work[i] != find[match_count]) {
   match_count = 0;
   match_index_pos = -1;
  }

  if (find_len == match_count) {
   occurrence++;
  }

  if (nth == occurrence) {
   return match_index_pos;
  }
 }

 return -1;
}
Sample of execution:

searchNth("hello world", "o", 1); // output: 7
searchNth("hello world", "o", 2); // output: -1
searchNth("hello world", "o ", 0); // output: 4

Well, I'm pretty proud of myself because I have created such a nice piece of code that could return the nth occurrence of a string. (Not really!) Someone has better solution than mine. I just found the answer in this thread.
size_t find_Nth(
    const std::string & str ,   // where to work
    unsigned            N ,     // N'th ocurrence
    const std::string & find    // what to 'find'
) {
    if ( 0==N ) { return std::string::npos; }
    size_t pos,from=0;
    unsigned i=0;
    while ( i<N ) {
        pos=str.find(find,from);
        if ( std::string::npos == pos ) { break; }
        from = pos + 1; // from = pos + find.size();
        ++i;
    }
    return pos;
/**
    It would be more efficient to use a variation of KMP to
    benefit from the failure function.
    - Algorithm inspired by James Kanze.
    - http://stackoverflow.com/questions/20406744/
*/
}
What a nice piece. The code is much cleaner than mine, lesser if statement, easier to digest with human brain. (Perfect!) Somehow, here is still better one. Believe or not, Boost library did have such algorithm developed:
    string work = "hello world";
    iterator_range<string::iterator> r = find_nth(work, "o", 1);
    cout << distance(work.begin(), r.begin()) << endl;
Wow!!! This piece was just damn real simple. From the top most till the bottom, although there are producing the same output, but the implementation was totally different. I can see the code is evolving gradually. What is next? What else can be done in the future? Fewer code?

Last but not least, thanks to Boost algorithm, nice work!

No comments: