Knuth–Morris–Pratt Algorithm

Computer science students often need to search for particular words, characters or patterns in a body of text. Consider the need to search for a particular letter or word in a text and find out how many times it appears. The Knuth-Morris-Pratt (KMP) algorithm, published in 1977, solves this problem efficiently and has since found applications in many fields. A writer, for example, may need to find every occurrence of a particular keyword in a 1,000-word text, which is difficult and time-consuming to do manually.

A brief description of the algorithm

The Knuth-Morris-Pratt algorithm searches a text for every occurrence of a pattern, such as a word. Before scanning the text, it preprocesses the pattern into a small table that records how the pattern overlaps with itself. When a mismatch occurs, this table tells the algorithm how far it can shift the pattern, so it never has to move backwards in the text.

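This overlap table is often called the failure function (or prefix function). The search itself uses two indices, i and j: i walks through the text, and j records how many characters of the pattern have been matched so far. Each time j reaches the length of the pattern, an occurrence has been found and can be counted.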

How is the KMP algorithm an improvement over naïve string matching?

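The problem with the simple, or naïve, approach is that it is slow. It tries every starting position in the text, compares the pattern character by character from there, and starts over from scratch after each mismatch. In the worst case it therefore makes about a × b comparisons, where ‘a’ is the pattern length and ‘b’ is the length of the text. KMP never re-examines a text character it has already matched, so it runs in time proportional to a + b.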
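A minimal Python sketch of the naïve approach, written for illustration (the function name `naive_search` is hypothetical). It also counts character comparisons, so the roughly a × b worst case becomes visible:

```python
def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return the start positions of pattern in text and the number of
    character comparisons made along the way."""
    positions = []
    comparisons = 0
    a, b = len(pattern), len(text)
    # Try every possible starting position in the text.
    for start in range(b - a + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < a:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == a:
            positions.append(start)
        # On a mismatch, simply move on to the next start and begin again.
    return positions, comparisons


positions, comparisons = naive_search("a" * 20, "aaaab")
print(positions, comparisons)  # prints [] 80
```

With a text of 20 ‘a’ characters and the pattern ‘aaaab’, each of the 16 starting positions costs 5 comparisons before failing, for 80 comparisons and no match at all.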

Some basic concepts that students need to learn

There are some terms that students need to understand before they go deep into the code; remember that coding is only half the work. A few algorithm-specific terms are as follows:

  • Proper subset: a set ‘a’ is a subset of A if every element of a is also in A, and it is a proper subset if, in addition, a is not equal to A. For example, the empty set is a proper subset of every non-empty set.
  • Proper prefix: the word ‘prefix’ keeps its ordinary meaning in the KMP algorithm. A prefix is called a proper prefix if
      • it contains fewer characters than the string itself, and
      • it is therefore followed by at least one more character of the string.
  • Proper suffix: similarly, a proper suffix is a suffix that is shorter than the string, so it is preceded by at least one character of the string.

So for “string”, the proper prefixes are {“strin”, “stri”, “str”, “st”, “s”} and the proper suffixes are {“tring”, “ring”, “ing”, “ng”, “g”}.
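Both sets can be generated directly from the definitions. The helpers below are a small hypothetical sketch; like the sets listed for “string”, they leave out the empty string, which some texts also count as a proper prefix and suffix:

```python
def proper_prefixes(s: str) -> list[str]:
    # Every non-empty prefix shorter than s itself, longest first.
    return [s[:k] for k in range(len(s) - 1, 0, -1)]


def proper_suffixes(s: str) -> list[str]:
    # Every non-empty suffix shorter than s itself, longest first.
    return [s[k:] for k in range(1, len(s))]


print(proper_prefixes("string"))  # prints ['strin', 'stri', 'str', 'st', 's']
print(proper_suffixes("string"))  # prints ['tring', 'ring', 'ing', 'ng', 'g']
```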

Example of KMP algorithm for String matching

If you search for ‘str’ in a text, the KMP algorithm reports each position where ‘str’ begins. A match does not have to follow a space: ‘str’ inside “construct” is found just as ‘str’ at the start of “string” is. If only whole-word matches should count, the characters on either side of each match have to be checked separately.
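Below is one common way to write KMP in Python, offered as a sketch rather than a reference implementation; the names `build_lps` and `kmp_search` are illustrative choices. It uses the i and j indices described earlier and, run on a sample sentence, finds ‘str’ inside “construct” as well as at the start of words:

```python
def build_lps(pattern: str) -> list[int]:
    # lps[k] = length of the longest proper prefix of pattern[:k+1]
    # that is also a suffix of pattern[:k+1].
    lps = [0] * len(pattern)
    length = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]  # try the next shorter overlap
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern starts in text (overlaps included)."""
    if not pattern:
        return list(range(len(text) + 1))
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # i only ever moves forward
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]  # fall back inside the pattern, not the text
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # continue so overlapping matches are found too
    return matches


print(kmp_search("a string is a construct of strokes", "str"))  # prints [2, 17, 27]
```

Counting occurrences is then just `len(kmp_search(text, pattern))`.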

KMP Algorithm as a remedy for the naïve string-matching problem

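After you have mastered the concepts of prefix and suffix, you need to understand how KMP uses them. For each sub-pattern (that is, each prefix of the pattern), the algorithm stores the length of the longest proper prefix that is also a suffix of that sub-pattern. If j characters have been matched and the next character does not match, KMP does not restart from the beginning of the pattern; it sets j to the value stored for the first j characters and carries on, because those characters are already known to match. This is what lets the search proceed without ever stepping backwards in the text.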
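To make the stored values concrete, the sketch below computes them by brute force, straight from the definition (`longest_border` is a hypothetical name). KMP itself builds the same table in linear time, but this slow version is easier to check by hand:

```python
def longest_border(s: str) -> int:
    # Length of the longest proper prefix of s that is also a suffix of s,
    # found by trying every length from longest to shortest.
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


pattern = "ababaca"
for end in range(1, len(pattern) + 1):
    sub = pattern[:end]
    print(f"{sub:<8} {longest_border(sub)}")
```

For ‘ababaca’ this prints 0, 0, 1, 2, 3, 0, 1 for the seven sub-patterns. So if the search has matched ‘ababa’ and then hits a mismatch, it can continue with j = 3, because ‘aba’ is both the start and the end of what was already matched.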

