Parsing CalculiX
par Stéphane Graham-Lengrand

 Login :  Mot de passe :

The goal of this TD is to write a parser for CalculiX, the language for which we implemented an interpreter in TD12. In that TD, we gave you a parser so that you could test your code. Today, it is your turn to write the parser, reading a sequence of tokens recognized by a lexical analyzer (lexer) and constructing expressions according to a grammar.

Preparation

The setup for this TD is identical to TD12. Create a new project TD14, download td_14.zip and unzip it in the project directory. Add the two libraries in the directory lib to the project ("Java Build Path -> Add JARs").

The package CalculiX contains a number of classes, of which the following are mainly of interest in this TD:

In TD12, we described the CalculiX language (up to Boolean expressions) by using the following grammar:

Integersi ::= ... | -1 | 0 | 1 | ...
Integer Operatorsop ::= + | - | * | /
Integer Comparatorsco ::= < | <= | ==
Booleansb ::= true | false
Boolean Connectivesbop ::= && | ||
Constantsc ::= i | b
Expressionse ::= c | e op e | e co e | e bop e | if e then e else e

In the lecture last week, you saw a generic method to turn a grammar such as the above, into a non-directional and non-deterministic parser. This afternoon, we implement a (more efficient) directional and deterministic parser for the CalculiX language, by adapting the above grammar to our needs. We do this step-by-step.

Implementation

Integers

The first grammar that we shall parse only accepts integer constants:

Integersi ::= ... | -1 | 0 | 1 | ...
Constantsc ::= i
Expressionse ::= c

The class MyParser contains a field lexer, an expression factory factory, a constructor for initializing these fields and the following parse() function:

  Expression parse () throws SyntaxError
  {
    Expression e = consumeExpression();
    Token t = lexer.peek();
    switch (t.symbol()) {
    case EOF:
      return e;
    default:
      throw t.syntaxError ("expected EOF");
    }
  }
It parses an expression e, and then uses the peek function to look at the next available token. If the next token corresponds to the symbol EOF, it returns the parsed expression e, otherwise it throws a syntax error at the token. Given the current implementation of consumeExpression (always returning null), the parse function only accepts empty expressions (at present).

In the class MyParser, write a new method Expression consumeConstant() that

If the token is not an integer, throw a syntaxError (like in parse above).

Now, modify the function consumeExpression to call consumeConstant as follows:

  Expression consumeExpression () throws SyntaxError
  {
    return consumeConstant();
  }
Our parser now implements a grammar that accepts integer constants but nothing else.

Test your code by running the Main class which contains a sequence of tests. The line

int j = 0;
instructs the main method to perform the first j tests on integer expressions (initially, 0). If your code works correctly, the first 2 integer tests should produce
Parsed "0" as 0
Evaluated as 0, expecting 0
Parsed "7" as 7
Evaluated as 7, expecting 7

Submit MyParser.java

Additive Expressions

We now extend the grammar to accept additions and subtractions. The grammar from TD12 suggests to implement in our parser the following grammar
Additive Expressionsae ::= c | ae + ae | ae - ae
Expressionse ::= ae

Unfortunately, this grammar is ambiguous (the word c + c - c can be accepted with two different production trees).

In order to write our deterministic parser, we use a different grammar to define the same language:
Additive Expression Tailaet ::= + c aet | - c aet | ε
Additive Expressionsae ::= c aet
Expressionse ::= ae

An expression is now defined as an additive expression, which is in turn defined as a constant followed by an additive expression tail. The tail may be empty (denoted ε), or may consist of a sequence of expressions of the form +/- c +/- c +/- c ...

To parse this grammar, we define an oracle that tells us which decision to make when we are facing a choice (between several productions). This oracle can be written as the following two-dimensional table, where each column corresponds to a non-terminal symbol that we are trying to parse, and each line corresponds to the first token to be consumed (a terminal symbol):

aeaetc
ic aeti
++ c aet
-- c aet
#ε

In words:

Implement columns (ae) and (aet) of this oracle as two functions Expression consumeAdditiveExpression() and Expression consumeAdditiveExpressionTail(Expression head). Note that consumeAdditiveExpressionTail takes as its argument the expression that was parsed before entering the tail. Then, modify consumeExpression to call consumeAdditiveExpression instead of consumeConstant.

WARNINGS! For those who have time, have a look at this comment.

Test your code by running Main; check which tests now succeed and which fail.

Submit MyParser.java

Parenthetical Expressions

We extend the grammar to include parenthetical expressions ( e ) which allow us to write more precise expressions. For example, 3 - 2 - 1 is different from 3 - (2 - 1).

We introduce a new non-terminal called atomic expressions that include both constants and parenthetical expressions. Additive expressions are now written in terms of atomic expressions, not constants.

Atomic Expressionsa ::= c | ( e )
Additive Expression Tailaet ::= + a aet | - a aet | ε
Additive Expressionsae ::= a aet
Expressionse ::= ae

To parse this grammar, we extend the oracle as follows:

aeaetac
ia aetci
(a aet(e)
)ε
++ a aet
-- a aet
#ε

In particular, when we encounter a LPAREN symbol, we are parsing an atomic expression, and when we encounter a RPAREN symbol we finish parsing an additive expression tail.

Implement Expression consumeAtomicExpression(), and modify Expression consumeAdditiveExpression() and Expression consumeAdditiveExpressionTail(Expression head) to implement the above oracle.

Test your code with Main and submit MyParser.java

Multiplicative Expressions

We extend the grammar to include operators for multiplication and division which have higher precedence than addition and subtraction. To represent this precedence, we create a new non-terminal symbol for multiplicative expressions as follows:

Multiplicative Expression Tailmet ::= * a met | / a met | ε
Multiplicative Expressionsme ::= a met
Additive Expression Tailaet ::= + me aet | - me aet | ε
Additive Expressionsae ::= me aet

Design an oracle for this grammar and implement it as two new functions Expression consumeMultiplicativeExpression() and Expression consumeMultiplicativeExpressionTail(Expression head). Modify the other functions accordingly.

Again, make sure that the multiplicative expressions that you produce are left-associative; i.e. make sure that parsing 12 / 4 / 2 produces the expression corresponding to (12 / 4) / 2;

Test your code with Main and submit MyParser.java

Comparisons and Boolean Expressions

We extend the syntax with the boolean constants true and false, comparison operators (<, <=, ==), and boolean connectives (&&, ||). All comparison operators have the same precedence and they have lower precedence than addition and subtraction. Disjunction (||) has lower precedence than conjunction (&&) which has lower precedence than comparison.

Extend the grammar to accept this extended syntax; design an oracle for it and implement it by adding the corresponding parsing functions to MyParser; adapt the consumeExpression function to start parsing the constructs with lowest precedence.

Again, test your code with Main: you can test some boolean expressions by changing the value of variable k in the main method.

Submit MyParser.java

If Then Else

Finally, we extend the syntax with the if then else construct, which has the lowest precedence of all.

Extend the grammar to accept this construct; design an oracle for it and implement it by adding the corresponding parsing function to MyParser; adapt consumeExpression accordingly.

Test your code with Main and submit MyParser.java