Boyer-Moore string matching

O(n·m) worst, sublinear typical

Boyer-Moore is the matcher most text editors actually use, because it usually examines only a fraction of the text. It aligns the pattern and compares right to left. On a mismatch it applies the bad-character rule: look up the mismatched text character in the pattern and slide the pattern so that character's last occurrence lines up — and if the character isn't in the pattern at all, jump the whole pattern past it. Those jumps can be large, so the more dissimilar the alphabet, the faster it runs (sublinear on typical text, though O(n·m) in the worst case). The full algorithm adds a good-suffix rule too; shown here is the classic bad-character version. The matched suffix is teal, the current comparison orange, mismatches red, and confirmed matches green.

Text & pattern
Edit the input and press Play

How it works

  1. Build the bad-character table: the last index of each character in the pattern.
  2. Align the pattern and compare with the text from right to left.
  3. On a mismatch, slide the pattern so the offending text char meets its last occurrence.
  4. If the character isn't in the pattern, jump the whole pattern past it; on a full match, record and step on.

Pseudocode

1boyerMoore(text, pattern):          # sublinear typical2  last[c] = last index of c in pattern (bad-char table)3  align pattern at s = 04  while s ≤ n - m:5    compare pattern with text RIGHT to LEFT6    on mismatch of text char c at pattern j:7      s += max(1, j − last[c])       # jump past8    on full match: record it, s += 1