CS4905 Introduction to Compiler Construction

Assignment 3  Feb. 8, 2007 due Feb. 20, 2007

This assignment is worth 5% of your final mark.  References to exercises are in the course text.  Assignments must be your own work.

A printable pdf version of this file is available here.

 

1.  Textbook exercise 3.10, p.84 on identifying LR(1) states and constructing an LR(1) parse table.  

 

2.  Textbook exercise 3.13, p.84 on showing why a grammar is LALR(1) but not SLR.

 

3.  Textbook exercise 3.14, p.85 on showing why a grammar is LL(1) but not LALR(1).

 

4.  Textbook exercise 3.15, p.85 on overcoming LALR(1) conflicts with yacc precedence directives.

 

5.  Write a yacc program that will take a series of Roman numerals as input, parse them, and take semantic actions to compute and print the base 10 number corresponding to the Roman numerals.  Remember that the roman numerals are I(=1), V(=5), X(=10), L(=50), C(=100), D(=500) and M(=1000).  The rules for roman numerals are

            (i) If a letter is immediately followed by one of equal or lesser value, the two values are added (e.g. XV = 15, MCXX = 1120).

            (ii) If a letter is immediately followed by one of greater value, the first is subtracted from the second (e.g. IV = 4, XL = 40, CM = 900).  For this rule, the first letter must represent a power of ten, and the second letter can be at most 10 times the first letter.  This means that I can precede V or X, X can precede L or C, and C can precede D or M.  This follows the "conventional principle" of Roman numerals. 

Other examples are XLVII = 47, MCMXIV = 1914.  Assume a maximum number of MMM.

For example, the Roman numeral MCMXIV, when parsed and interpreted should result in the base 10 number 1914.  Test your program on the following Roman numerals:

CDIX

MMD

MMCMXCIX

CMCCLXIV

MMDCCLII

MMVII

MMDCXL

MCMXXXVII

MDCLVI

XXXIX

XCIX

CMXCIX

MMXII

CXLVII

Hints: Use the desk calculator example from Lab 2 as a starting point, and change the yacc stack type from double to int.  One possible grammar for Roman numerals is as follows:

            <N> ::= G | G <N>

where G is a token (one of the seven possible letter “glyphs”) returned from lex.  Remember that yylval contains the returned value of a lex token.

 

6.  Textbook exercise on writing an interpreter using JavaCC, 4.3 on p.102.  Note that this first requires you to modify the grammar 3.15 to accommodate the negative integer specification and the “print” statement.  Left recursion removal is also required.  You will also need a symbol table that can store the values of integer variables.  For this, you can modify the hash table given in Program 5.2 (p.106) to store the integer value as the variable’s binding (as in Lab 3).  Test that your interpreter works using the following input file:

k = -2;

n = 6;

j = k - n * 3 + n / k;

r = j - ( 2 / k ) * n;

i = r / (j + 2 ) - k;

print j;

print r;

print i;

p = 27 - i * 12 / ( r - j);

n = (n + r) / k;

print p;

print n

Your interpreter should print the values of j, r, i, p and n correctly. 

 

 

Remember that all assignments must be handed in on 8.5 by 11 inch paper, typed, with a table of contents and all pages numbered.  For questions 5 and 6, make sure you hand in a complete listing of your input program, along with a copy of the output resulting from the testing you did with the input test data files.