Something About Basic Parsing Technique

Rohit Gupta
Written by Rohit Gupta. Posted in New Technologies on 31 December 2009.
Hot 1319 hits 0 favoured

Basic Parsing Technique


Here we discuss the two most common forms of the parser – operator precedence and recursive decent.operator precedence is especially suitable for parsing expression, since it can use information about the precedence and associativity of the operator to guide the parse. Reecursive descent parser uses a collection of mutually recursive routines to perform the syntax analysis.the great bulk of compilers in existence in the early 1970’s use one or both of these methods. A common situation is for operator precedence to be used for expression and recursive descent for the rest of the language.

The primary advantage of these methods is that they are easy to implement by hand. But there are drawbacks as well. Operator precedence has the curious property that if one is not careful, one can recognize inputs that are not in the language of the underlying grammar. Likewise, recursive descent, particularly when augmented with backtracking, can produce rather unexpected results.

Fortunately, there are two newer methods gaining popularity that are both more general than the older methods and more firmly grounded in grammar theory. The first of these methods, LL parsing, will be mentioned in this article, as it is really a table-based variant of recursive descent.




A  parser for grammar G is a program that taken input as string w and produces as output either a parse tree for w, if w is a sentence of G, or an error message indicating that w is not a sentence of G often the parse tree is produced in only a figurative sense;  in reality, the parse tree exist only as a sequence of actions made by stepping  through the tree construction process.

This article discusses the operation of two basic types of parser for the context-free grammar-Bottom-up and Top-down. As indicated by their names, bottom-up parser build parse tree from bottom to the the root, while top-down parser starts with the root and work down to the leaves. In both cases the input to the parser is being scanned from left to right, one symbol at a time.

The bottom-up parsing method we discuss is called “shift-reduce” parsing because it consist of shifting input symbols onto a stack until right side of the production appears on the top of the stack. The right side may then be replace by (reduced to) the symbol on the left side of the production, and  the process repeated.

Unfortunately, if A->XYZ is a production, then not every time that XYZ is on the top of the stack is it correct to reduce XYZ to A; there may be occasion where it Is necessary to continue to shift input symbols on the top of XYZ. Designing an algorithm from a grammar so that shift-reduce decisions are made properly is the fundamental problem of bottom-up parser construction.


SLR parsing


A problem with LL(1) parsing is that most grammars need extensive rewriting to get them into a form that allows unique choice of production. Even though this rewriting can, to a large extent, be automated, there are still a large number of grammars that can not be automatically transformed into LL(1) grammars.

LR parsers is a class of bottom-up methods for parsing that accept a much larger class of grammars than LL(1) parsing, though still not all grammars. The main advantage of LR parsing is that less rewriting is required to get a grammar in acceptable form for LR parsing than is the case for LL(1) parsing.

Furthermore, LR parsers allow external declaration of operator precedences for resolving ambiguity, instead of requiring the grammars themselves to be unambiguous.

we limit the discussion to SLR for the following reasons:

• It is simpler.

• In practice, LALR(1) handles only a few more grammars than SLR.

• When a grammar is in the SLR class, the parse-table produced by a SLR parser generator will be identical to the table produced by a LALR(1) parser generator.

• Understanding of SLR principles is sufficient to know how to rewrite a grammar rejected by a LALR(1) parser

If the input text does not conform to the grammar, there will at some point during the parsing be no applicable actions and the parser will stop with an error message. Otherwise, the parser will read through all the input and leave a single element (the start symbol of the grammar) on the stack. LR parsers are also called shift-reduce parsers. As with LL(1), our aim is to make the choice of action depend only on the next input symbol and the symbol on top of the stack.

Conflicts in SLR parse-tables

When reduce actions are added to SLR parse-tables, we might add one to a place where there is already a shift action, or we may add reduce actions for several different productions to the same place. When either of this happens, we no longer have a unique choice of action, i.e., we have a conflict. The first

situation is called a shift-reduce conflict and the other case a reduce-reduce conflict. Both may occur in the same place. Conflicts are often caused by ambiguous grammars, but (as is the case for LL-parsers) even some non-ambiguous grammars may generate conflicts. If a conflict is caused by an ambiguous grammar, it is usually (but not always) possible to find an equivalent unambiguous grammar. But even unambiguous grammars may in some cases generate conflicts in SLR-tables.

In some cases, it is still possible to rewrite the grammar to get around the problem, but in a few cases the language simply is not SLR. Rewriting an unambiguous grammar to eliminate conflicts is somewhat of an art. Investigation of the NFA states that form the problematic DFA state will often help identifying the exact nature of the problem, which is the first step towards solving it. Sometimes, changing a production from left-recursive to

  • Add the production S0 !S, where S is the start symbol of the grammar.
  • Make an NFA for the right-hand side of each production.
  • If an NFA state s has an outgoing transition on a nonterminal N, add epsilon-transitions from s to the starting states of the NFAs for the righthand sides of the productions for N.
  • Convert the combined NFA to a DFA. Use the starting state of the NFA for the production added in step 1 as the starting state for the combined NFA.
  • Build a table cross-indexed by the DFA states and grammar symbols (terminals including $ and nonterminals). Add shift actions for transitions on terminals and go actions for transitions on nonterminals.
  • Calculate FOLLOW for each nonterminal. For this purpose, we add one more start production.
  • When a DFA state contains an accepting NFA state marked with production number p, where the nonterminal for p is N, find the symbols in FOLLOW(N) and add a reduce p action in the DFA state at all these symbols. If production p is the production added in step 1, add an accept action instead of a reduce p action.

right-recursive may help, even though left-recursion in general is not a problem for SLR-parsers, as it is for LL(1)-parsers.



Rohit Gupta

Author: Rohit Gupta

71 11562 3
  • No comments found
Powered by CjBlog