Converting Context Free Grammar to Chomsky Normal Form
Last Updated :
16 Apr, 2025
Chomsky Normal Form (CNF) is a way to simplify context-free grammars (CFGs) so that all production rules follow specific patterns. In CNF, each rule either produces two non-terminal symbols, or a single terminal symbol, or, in some cases, the empty string. Converting a CFG to CNF is an important step in many parsing algorithms, like the CYK algorithm, and helps in understanding the structure of languages.
What is Chomsky Normal Form (CNF)
A context free grammar (CFG) is in Chomsky Normal Form (CNF) if all production rules satisfy the following conditions:
- A non-terminal generating a terminal (e.g.; X→ x)
- A non-terminal generating two non-terminals (e.g.; X→YZ)
- Start symbol generating ε. (e.g.; S→ ε)
Consider the following grammars,
G1 = {S→a, S→AZ, A→a, Z→z}
G2 = {S→a, S→aZ, Z→a}
The grammar G1 is in CNF as production rules satisfy the rules specified for CNF. However, the grammar G2 is not in CNF as the production rule S→aZ contains terminal followed by non-terminal which does not satisfy the rules specified for CNF.
Key Properties of CNF
- A single CFG can be converted into different equivalent CNF forms.
- CNF produces the same language as the original CFG.
- CNF is widely used in parsing algorithms such as:
- Cocke - Younger - Kasami (CYK) algorithm for membership checking.
- Bottom-up parsers in compilers.
- For a string of length n, a CNF derivation requires at most 2n-1 derivation steps.
- Any CFG that does not generate ε has an equivalent CNF.
Steps to Convert CFG to CNF
Step 1: Eliminate the Start Symbol from RHS
If start symbol S is at the RHS of any production in the grammar, create a new production as: S0→S where S0 is the new start symbol.
Step 2: Remove Null, Unit, and Useless Productions
- Null (ε) Productions: If a rule contains ε, remove it by modifying other rules accordingly.
- Unit Productions: If a rule has a single non-terminal on the RHS (e.g., A→B), replace it with B’s productions.
- Useless Productions: Remove non-reachable or non-generating symbols from the grammar.
Step 3: Replace Terminals in Mixed Productions
Eliminate terminals from RHS if they exist with other terminals or non-terminals. e.g. , production rule X→ xY can be decomposed as: X→ZY, Z→x.
Step 4: Reduce Productions with More Than Two Non-Terminals
Eliminate RHS with more than two non-terminals. e.g,; production rule X→XYZ can be decomposed as: X→PZ, P→XY
Example: Converting a CFG to CNF
Let us take an example to convert CFG to CNF. Consider the given grammar G1:
S → ASB
A → aAS | a | ε
B → SbS | A | bb
Step 1.
As start symbol S appears on the RHS, we will create a new production rule S0→S. Therefore, the grammar will become:
S0→S
S → ASB
A → aAS | a | ε
B → SbS | A | bb
Step 2.
As grammar contains null production A→ ε, its removal from the grammar yields:
S0→S
S → ASB | SB
A → a | aAS | aS
B → SbS | A | ε | bb
Now, it creates null production B→ ε, its removal from the grammar yields:
S0→S
S → AS | S | ASB | SB
A → a | aAS | aS
B → SbS | A | bb
Now, it creates unit production B→A, its removal from the grammar yields:
S0→S
S → AS | ASB | SB | S
A → a | aAS | aS
B → SbS | bb | aAS | aS | a
Also, removal of unit production S0→S from grammar yields:
S0→ AS | ASB | SB | S
S → AS | ASB | SB | S
A → aAS | aS | a
B → SbS | bb | aAS | aS | a
Also, removal of unit production S→S and S0→S from grammar yields:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → aAS | aS | a
B → SbS | bb | aAS | aS | a
Step 3.
In production rule A→aAS | aS and B→ SbS | aAS | aS, terminals a and b exist on RHS with non-terminates. Removing them from RHS:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS |a
B → SYS | bb | XAS | XS |a
X →a
Y→b
Also, B→ bb can’t be part of CNF, removing it from grammar yields:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b
Step 4:
In production rule S0→ASB, S→ASB RHS has more than two symbols, removing it from grammar yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b
P → AS
Similarly, A→XAS has more than two symbols, removing it from grammar yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → SYS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA
Similarly, B→SYS has more than two symbols, removing it from grammar yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → TS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA
T → SY
So this is the required CNF for given grammar.
Explore
Automata _ Introduction
Regular Expression and Finite Automata
CFG
PDA (Pushdown Automata)
Introduction of Pushdown Automata
5 min read
Pushdown Automata Acceptance by Final State
4 min read
Construct Pushdown Automata for given languages
4 min read
Construct Pushdown Automata for all length palindrome
6 min read
Detailed Study of PushDown Automata
3 min read
NPDA for accepting the language L = {anbm cn | m,n>=1}
2 min read
NPDA for accepting the language L = {an bn cm | m,n>=1}
2 min read
NPDA for accepting the language L = {anbn | n>=1}
2 min read
NPDA for accepting the language L = {amb2m| m>=1}
2 min read
NPDA for accepting the language L = {am bn cp dq | m+n=p+q ; m,n,p,q>=1}
2 min read
Construct Pushdown automata for L = {0n1m2m3n | m,n ⥠0}
3 min read
Construct Pushdown automata for L = {0n1m2n+m | m, n ⥠0}
2 min read
NPDA for accepting the language L = {ambncm+n | m,n ⥠1}
2 min read
NPDA for accepting the language L = {amb(m+n)cn| m,n ⥠1}
3 min read
NPDA for accepting the language L = {a2mb3m|m>=1}
2 min read
NPDA for accepting the language L = {amb2m+1 | m ⥠1}
2 min read
NPDA for accepting the language L = {aibjckdl | i==k or j==l,i>=1,j>=1}
3 min read
Construct Pushdown automata for L = {a2mc4ndnbm | m,n ⥠0}
3 min read
NPDA for L = {0i1j2k | i==j or j==k ; i , j , k >= 1}
2 min read
NPDA for accepting the language L = {anb2n| n>=1} U {anbn| n>=1}
2 min read
NPDA for the language L ={wÐ{a,b}* | w contains equal no. of a's and b's}
3 min read
Turing Machine
Decidability
TOC Interview preparation
TOC Quiz and PYQ's in TOC