CS712/CS812
Project Phase 1
Spring 2008
Due Sunday February 24


Build a compiler for a subset of T that supports the main block, integer variable declarations, integer expressions, expression statements, and the out statement.

You should use flex to build a scanner for this subset of T. The scanner should correctly process whitespace, comments, identifiers, the required keywords ("int", "main", "out"), integer literals, the required separators ("(", ")", "{", "}", ";"), and the required operators ("=", "==", "+", ">", "-", "!", "/", "<", "*").

In addition to integrating the scanner with the rest of your compiler, you should build it in a manner that allows it to be used as a stand-alone program, where it will read a T program from stdin and write the tokens to stdout, one token per line (as was done by the Simp scanner). When running integrated with the full compiler, the scanner should still read from stdin.

You should use bison to build a parser for the subset of T. The T specification includes a yacc input for the whole language. Your parser should build an AST for the whole program. The compiler may stop at the first parse error.

Your compiler should have a command-line option ("-before") that allows the user to dump the AST before semantic analysis is done. The AST should be dumped in prefix form, one AST node per line (as was done by the Simp compiler). The dump should be printed to stderr. Print a blank line to stderr before starting the dump.
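As an illustration of a prefix-form dump, here is a minimal C++ sketch. The Node structure, the node labels, and the indentation by depth are all assumptions made for this example; the Simp compiler's actual output format is the one to follow.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a label plus an ordered list of children.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> kids;
    explicit Node(std::string l) : label(std::move(l)) {}
};

// Small helper to allocate a node (illustrative only).
std::unique_ptr<Node> mk(const std::string& label) {
    return std::unique_ptr<Node>(new Node(label));
}

// Prefix (pre-order) dump: print the node itself, then each child in order.
// Indenting by depth is an assumption, added only to make nesting visible.
void dump(const Node& n, std::ostream& out, int depth = 0) {
    out << std::string(depth * 2, ' ') << n.label << '\n';
    for (const auto& k : n.kids) {
        assert(k && "child pointers are expected to be non-null");
        dump(*k, out, depth + 1);
    }
}

// Convenience wrapper that captures the dump in a string (handy for testing).
std::string dumpToString(const Node& root) {
    std::ostringstream os;
    dump(root, os);
    return os.str();
}

// Build the tree for "x = 1 + 2" as a small demonstration input.
std::unique_ptr<Node> sampleTree() {
    std::unique_ptr<Node> assign = mk("Assign");
    assign->kids.push_back(mk("Variable x"));
    std::unique_ptr<Node> add = mk("Add");
    add->kids.push_back(mk("IntegerLiteral 1"));
    add->kids.push_back(mk("IntegerLiteral 2"));
    assign->kids.push_back(std::move(add));
    return assign;
}
```

In the compiler itself, the call would presumably pass std::cerr as the stream, after first writing the required blank line to stderr.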

You should perform semantic analysis on the AST to support the subset of T. The semantic analysis should annotate and transform the AST to fully specify the semantics of the input program. Even though there is only one type in this subset of T, you should still label the type of all expression AST nodes. (We will introduce class types in a later phase.) You should also appropriately introduce Dereference nodes to distinguish the value of a variable from the address of a variable.
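One possible way to introduce Dereference nodes is sketched below in C++. The Node structure, the kind strings ("Assign", "Variable", "Deref", and so on), and the wantValue flag are hypothetical names chosen for this example, not part of the T specification. The idea is that a variable denotes an address, and it is wrapped in a Deref node whenever its value is needed, so the left side of an assignment stays an address while every other use becomes a load.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node for this sketch.
struct Node {
    std::string kind;  // e.g. "Assign", "Add", "Variable", "IntegerLiteral", "Deref"
    std::string type;  // filled in by analysis; "int" is the only type in this subset
    std::vector<std::unique_ptr<Node>> kids;
    explicit Node(std::string k) : kind(std::move(k)) {}
};

using NodePtr = std::unique_ptr<Node>;

// Analyze an expression bottom-up. When wantValue is true and the expression
// is a variable (an address), wrap it in a Deref node.
NodePtr analyze(NodePtr n, bool wantValue) {
    assert(n && "analyze expects a non-null node");
    if (n->kind == "Assign") {
        assert(n->kids.size() == 2);
        n->kids[0] = analyze(std::move(n->kids[0]), false);  // left side: address
        n->kids[1] = analyze(std::move(n->kids[1]), true);   // right side: value
    } else {
        for (auto& k : n->kids) {
            k = analyze(std::move(k), true);  // operands are used as values
        }
    }
    n->type = "int";  // every expression node gets a type label
    if (wantValue && n->kind == "Variable") {
        NodePtr d(new Node("Deref"));
        d->type = "int";
        d->kids.push_back(std::move(n));
        return d;
    }
    return n;
}
```

For the input "x = y + 1", this sketch leaves x as a bare Variable under Assign and rewrites y as Deref(Variable). A real implementation would also check that the left side of an assignment actually denotes a variable and report an error otherwise.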

You should try to identify all semantic errors in the input. When an error is found, report it with the appropriate line number and an appropriate error message. (Because of how flex and bison work, your line number may sometimes unavoidably be off by one. This is okay.) Error messages should be written to stderr. Try to avoid a cascade of error messages caused by a single error.

Your compiler should have a command-line option ("-after") that allows the user to dump the AST after semantic analysis is done. The AST should be dumped in prefix form, one AST node per line (as was done by the Simp compiler). The dump should include the type for all expression AST nodes. The dump should be printed to stderr. Print a blank line to stderr before starting the dump. It should be possible to dump an AST even if the input program contained semantic errors. You may have to make arbitrary decisions about how to set fields of the AST when there is an error, but it should still be possible to do the dump without something bad happening (like a segfault).

If there are no semantic errors, then perform code generation by traversing the AST. Linux assembly code should be generated for the Intel IA-32 architecture. We will discuss in class all Intel instructions that will be required to complete this assignment. The output code should be written to stdout. You should provide a C++ source file that implements the run-time environment for the generated code. This file must be called "RTS.cxx" and it will be compiled and linked with the output of the code generator in order to execute a compiled program.
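To give a feel for code generation by AST traversal, here is a minimal C++ sketch that emits AT&T-syntax IA-32 instructions for integer literals and the "+", "-", and "*" operators using the hardware stack for intermediate results. Everything here is an assumption made for illustration: the Expr structure, the helper names, the stack-based evaluation scheme, and the choice of registers. The instructions and conventions discussed in class take precedence.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node: either an integer literal or a binary operator.
struct Expr {
    char op;                         // 'n' for a literal, or '+', '-', '*'
    int value;                       // used only when op == 'n'
    std::unique_ptr<Expr> lhs, rhs;  // used only for binary operators
};

std::unique_ptr<Expr> lit(int v) {
    return std::unique_ptr<Expr>(new Expr{'n', v, nullptr, nullptr});
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new Expr{op, 0, std::move(l), std::move(r)});
}

// Post-order walk: each subtree leaves its result on top of the stack.
std::string gen(const Expr& e) {
    if (e.op == 'n') {
        return "\tpushl $" + std::to_string(e.value) + "\n";
    }
    assert(e.lhs && e.rhs && "binary operators need two operands");
    std::string code = gen(*e.lhs) + gen(*e.rhs);
    code += "\tpopl %ecx\n";  // right operand
    code += "\tpopl %eax\n";  // left operand
    switch (e.op) {
        case '+': code += "\taddl %ecx, %eax\n"; break;   // eax = left + right
        case '-': code += "\tsubl %ecx, %eax\n"; break;   // eax = left - right
        case '*': code += "\timull %ecx, %eax\n"; break;  // eax = left * right
        default: assert(false && "operator not handled in this sketch");
    }
    code += "\tpushl %eax\n";  // leave the result on the stack
    return code;
}
```

In a compiler following this scheme, the returned text would be written to stdout. Division, comparison, negation, variables, and the out statement are omitted here and would need their own cases.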

Write a Makefile called "Makefile" for building both the stand-alone scanner and the whole compiler. The Make goal for the stand-alone scanner should be "lexdbg", and this should be the executable name for the stand-alone scanner. The Make goal for the whole compiler should be "tc" and this should be the executable name for the whole compiler. You should also have a Makefile goal called "clean" which will remove all files that can be re-created (object files, executable files, flex output and bison output).

Provide a README file (called "README") that explains the current state of your compiler. Describe how well your compiler fulfills the requirements of this assignment.

The two required command-line options ("-before" and "-after") can be given in any order if both are supplied.
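A minimal C++ sketch of order-independent option handling follows. The Options structure and the parseOptions function are hypothetical names, and treating any other argument as an error is an assumption of this example rather than a stated requirement.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical container for the two dump flags.
struct Options {
    bool dumpBefore = false;  // set by "-before"
    bool dumpAfter = false;   // set by "-after"
    bool ok = true;           // false if an unrecognized argument was seen
};

// Scan argv once, so "-before -after" and "-after -before" behave the same.
Options parseOptions(int argc, const char* const* argv) {
    assert(argv != nullptr);
    Options opts;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-before") == 0) {
            opts.dumpBefore = true;
        } else if (std::strcmp(argv[i], "-after") == 0) {
            opts.dumpAfter = true;
        } else {
            opts.ok = false;  // assumption: anything else is rejected
        }
    }
    return opts;
}
```

Because each flag only sets a boolean, the order in which the options appear has no effect on the result.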

Archive all your files in a tar file called "phase1.tar". This tar file should un-tar into a single directory called "phase1", which should contain the Makefile, the README file, the "RTS.cxx" file and all the source files. All submitted files should be placed directly in the top level of this directory. (That is, please do not use subdirectories.) You should submit your tar archive from euler.unh.edu or zeno.unh.edu using my "submit" script. (The submission script will not work from gauss.unh.edu.) Please note: the tar file you submit should not be compressed. Also, please do not include any executable or object files in the tar file.

To turn in this assignment, type:
~cs712/bin/submit phase1 phase1.tar

Submissions can be checked by typing (also on euler.unh.edu or zeno.unh.edu):
~cs712/bin/scheck phase1

Please read this specification carefully and try to follow it exactly. I will use scripts to test your compiler and therefore it is important that you follow my directions for the file names, command-line options, Makefile goals, tar file name, etc. (Points may be deducted if you do not follow the directions.)

Points will be assigned for this assignment in the following manner:

To receive full credit for the assignment, you must turn in your files prior to 8am on Monday February 25. Late submissions will be accepted at a penalty of 2 points for one day late, 5 points for two days late, 10 points for three days late, 20 points for four days late, and 40 points for five days late. No program may be turned in more than 5 days late.

Remember: you are expected to do your own work on this assignment.


Last modified on January 31, 2008.

Comments and questions should be directed to hatcher@unh.edu