Thompson’s construction is a divide-and-conquer algorithm to convert a regular expression into an equivalent NFA. The NFA can then be used to match strings against that regex.

Unlike naive backtracking-based regex matching, which can take exponential time in the worst case, Thompson’s construction produces an NFA that can be simulated in time linear in the length of the input string.

Below is an illustration of the algorithm (with transitions not labeled being -transitions) Thompson's construction.png

See also

  • Regular Expression Matching Can Be Simple And Fast provides a variant that stores “output arrows” in NFA fragments. This version reduces redundant states and -transitions compare to the vanilla Thompson’s construction
  • Glushkov’s construction is another construction method that generates an NFA without -transitions (though the number of transitions can be larger)
  • Xing’s construction presents a regex parsing method that produces a parse tree with fewer nodes, and from it constructs an NFA with fewer states and transitions than both Thompson’s and Glushkov’s methods.