The Cocke–Younger–Kasami algorithm (CYK) is a bottom-up parsing algorithm for context-free grammar. It is often studied in theoretical computer science because it is a systematic parsing algorithm that works for all CFGs.
The algorithm only works in the Chomsky normal form (CNF), and a CFG needs to be converted CNF before feeding into the algorithm.
CYK employs a table-filling method to reduce the otherwise exponential parsing complexity to polynomial time. Despite this improvement, the algorithm still runs in
Steps
- First create a triangular table, where each cell at (i, j) stores the set of non-terminals that can generate the substring of length j starting at position i (i.e. string[i:i+j])
- Fill the bottom (trivial lookup in the CNF)
- Going upward row-by-row, for each column, split the substring into two parts (for every splitting point) and see whether a rule
exists with in the left part and in the right part - If the start symbol is in the upper-most position, the string is in the language