This section of the archives stores flipcode's complete Developer Toolbox collection, featuring a variety of mini-articles and source code contributions from our readers.

 

  String Matching In Linear-Time
  Submitted by



Hi, I noticed the queue seemed to be empty so here's a string matching implementation that runs in linear time, O(n+m) to be exact, where n is the length of the text to be searched and m is the length of the text to search for. It's based on the Knuth-Morris-Pratt algorithm (or my understanding of it at least :)). Although not an explicit game-programming issue, I hope this can be helpful for some people.

The single file actually compiles into a program that searches a file for a string. However here's an example:
// ex1
string_search my_search;
int pos = my_search.find_first("badfbhfbsdcbaca", "ba");
// ex2
std::vector<int all_pos = my_search.find_all("badfbhfbsdcbaca", "ba"); 

The first example will return 0, and the other 0 and ... eh, something else. You may use it as you like, but if you're going to give credits, give them where they are due.

/Andreas Magnusson
http://www.mdstud.chalmers.se/~md7amag/

Download Associated File: strmatch.cpp (3,340 bytes)

/* String search based on the Knuth-Morris-Pratt algorithm for
   linear running-time, O(m+n) where m=length of pattern and
   n=length of text.

Written by Andreas Magnusson in November 2001

You may do as you please with this code, but don't blame me if anything goes wrong... */
#include <iostream> #include <fstream> #include <string> #include <vector>

typedef std::vector<int> int_vec;

class string_search { int_vec shifts; void compute_shifts(const std::string &pattern); public: int find_first(const std::string &text, const std::string &pattern); int_vec find_all(const std::string &text, const std::string &pattern);

};

// create the shift-lookup-table void string_search::compute_shifts(const std::string &pattern) { int next_shift = 0; shifts.clear(); shifts.push_back(0); // shift to the first character // start with the second character, since the shift to the first is always 0 for(int i = 1; i < pattern.length(); i++) { while(next_shift > 0 && pattern[next_shift] != pattern[i]) next_shift = shifts[next_shift];

if(pattern[next_shift] == pattern[i]) next_shift++;

shifts.push_back(next_shift); } }

// search the string and return when the first occurrence is found int string_search::find_first(const std::string &text, const std::string &pattern) { int next_shift = 0; compute_shifts(pattern); for(int i = 0; i < text.length(); i++) { while(next_shift > 0 && pattern[next_shift] != text[i]) next_shift = shifts[next_shift - 1];

if(pattern[next_shift] == text[i]) next_shift++;

if(next_shift == pattern.length()) return i - (pattern.length() - 1); // found the first so return } return -1; }

// search the string and put every occurence in a vector int_vec string_search::find_all(const std::string &text, const std::string &pattern) { int next_shift = 0; int_vec positions; compute_shifts(pattern); for(int i = 0; i < text.length(); i++) { while(next_shift > 0 && pattern[next_shift] != text[i]) next_shift = shifts[next_shift - 1];

if(pattern[next_shift] == text[i]) next_shift++;

if(next_shift == pattern.length()) { positions.push_back(i - (pattern.length() - 1)); // found one, put in list next_shift = shifts[next_shift - 1]; // restart pattern with last shift } } return positions; }

int main(int argc, char **argv) { if(argc <= 2){ cout << "Usage: " << argv[0] << " filename searchpattern" << endl; return 0; } std::string pattern = argv[2];

// read the file. Since the file is read like this all white-characters // are eaten, so a search including white-characters will fail... fstream fs; std::string text, temp; fs.open(argv[1], ios::in); while(!fs.eof()){ fs >> temp; text += temp; } fs.close();

// search the file string_search search; int_vec pos_list = search.find_all(text, pattern);

// print out result std::vector<int>::iterator it; cout << "Found " << pos_list.size() << " occurrences" << endl; for(it = pos_list.begin(); it != pos_list.end(); it++){ temp = text.substr(*it, pattern.length()); cout << "Pos=" << *it << " == " << temp << endl; } }

The zip file viewer built into the Developer Toolbox made use of the zlib library, as well as the zlibdll source additions.

 

Copyright 1999-2008 (C) FLIPCODE.COM and/or the original content author(s). All rights reserved.
Please read our Terms, Conditions, and Privacy information.