Skip to main content

Calculating Boyer Moore Bad Character Table with examples

Let’s look at how to calculate the Boyer Moore bad character table with an example. Boyer Moore is a string matching algorithm. So what it basically does is, finding the occurrence of a pattern within a text. There are many string matching algorithms including,

  1. Naive String Matching Algorithm
  2. KMP (Knuth-Morris-Pratt) Algorithm
  3. Rabin Carp Algorithm
  4. Boyer Moore Algorithm
  5. Boyer Moore Horsepool Algorithm

In this article, we are focusing on the 2 Boyer Moore Algorithms. Assuming you already have enough background knowledge about how to use the bad character table, I will describe how to calculate the crucial part in these algorithms, the bad character table because when I learned this, there were not enough resources on the internet on how to calculate them.

You are given two strings. One is the pattern and the other one is the text. You have to find the pattern within the text. Let’s look at an example,


Text => “THIS IS A TEST

Pattern => “TEST


You can clearly see that the Text has the pattern but the computer cannot identify that just like that. This is what string matching algorithms are for. So now let’s see step by step how to calculate the bad character table.

Steps are as follows,

  1. Take the pattern and index each character starting from 0
Image for post
pattern indexing

2. Take the length of the pattern as length like above.

3. Now let’s draw the skeleton of the table.

Image for post
initial bad character table

Notice the Letters in the above picture. We have to get the unique characters in the pattern as letters. And the star (*) is for any other character we may come upon when going through the Text that is not in the pattern. Now we have to fill this table.

4. Now, for every character in the pattern, calculate max(1, length-index-1) and put it in the table.

Here, max(1, length-index-1) means that you have to choose the maximum number between 1 and the calculated length-index-1 value for every letter. Which simply means that the value can’t be less than 1.

length = length of the pattern

index = index of the character

NOTE: If the same character repeats, update the table by the values from the new character (or simply consider the rightmost occurrence of the character in the pattern and calculate the value for it)

The value for the * is the length of the pattern

Letter | T | E | S | * |

Value | 1 | 2 | 1 | 4 |

Value( T ) = max (1, 4–0–1) = max ( 1, 3) = 3

Value( E ) = max (1, 4–1–1) = max ( 1, 2) = 2

Value( S ) = max (1, 4–2–1) = max ( 1, 1) = 1

Value( T ) = max (1, 4–3–1) = max ( 1, 0) = 1 <=

update the T value <= Rightmost occurrence of T

Now let’s look at the Boyer Moore Horsepool Algorithm

In the Boyer Moore Horsepool Algorithm, the bad character table is a little bit different. Let’s take an example,

Text => “TRUSTHARDTOOTHBRUSHES"

Pattern => “TOOTH”


The steps in the Horsepool algorithm are,

1) Take the pattern and index each character starting from 0 ( You may ignore the last character )

2) Take the length of the pattern as length

3) Ignore the last character.

For every other character in the pattern, calculate (length-index-1), and put it in the table.

length = length of the pattern

index = index of the character

4) If the same character repeats, update the table by the values from the new character (or simply consider the rightmost occurrence of the character in the pattern and calculate the value for it)

5) Add another column for any other letter which doesn’t exist in the pattern as *

6) The value for the * is the length of the pattern

7) If the last letter has occurred before & you have calculated its value and updated it in the table, leave it as it is without updating. If not, take the length of the pattern as the value.

Ex:

Pattern => TOOTH

index =>     0123

length = 5

Bad match table

Letter | T | O | H | * |

Value  | 1 | 2  | 5  | 5 |

Value ( T ) = 5–0–1 = 4

Value ( O ) = 5–1–1 = 3

Value ( O ) = 5–2–1 = 2 <= Update O Value <= Rightmost O

Value ( T ) = 5–3–1 = 1 <= Update T Value <= Rigthmost T

Value ( H ) = Not previously found, so Value ( H ) = length

If there is a value for H before, leave it as it is, do NOT update!

Comments