CS611
Spring 2003
Programming Assignment 2
Due Sunday February 23


Replace the assemble module of the IAS assembler.

The source code for the IAS assembler is in ~cs611/public/prog2. The assemble module has four entry points, and stubs for these four functions are in assemble.c.

The initializeAssemble function is called once at program start-up to allow the assemble module to initialize any internal data structures.

The assemble function is called once for each non-blank line of the input assembly language source program. The two parameters to this function are a string for a label and an INSTRUCTION object for an instruction. The function should encode its input in sim611 machine format in an internal data structure for later dumping to an output object file.
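One possible internal data structure is a memory image of 1024 40-bit words plus a location counter. The sketch below assumes each word is kept in the low 40 bits of a `uint64_t`. The names `memory`, `locationCounter`, `resetImage` and `emitWord` are hypothetical and do not appear in the provided sources. How opcodes and addresses are packed into a word is defined by the sim611 format on the IAS assembler webpage and is not shown here.

```c
#include <stdint.h>
#include <string.h>

#define MEMORY_WORDS 1024                      /* IAS memory size in words */
#define WORD_MASK ((UINT64_C(1) << 40) - 1)    /* low 40 bits of a word */

/* Hypothetical internal state for the assemble module. */
static uint64_t memory[MEMORY_WORDS];  /* assembled words, low 40 bits used */
static int locationCounter = 0;        /* index of the next free word */

/* Clear the image; initializeAssemble could call this once at start-up. */
static void resetImage(void)
{
    memset(memory, 0, sizeof memory);
    locationCounter = 0;
}

/* Store one 40-bit word at the location counter and advance it.
   Returns 0 on success, or -1 if the program would exceed 1024 words
   (the caller should then report a "program too big" error). */
static int emitWord(uint64_t word)
{
    if (locationCounter >= MEMORY_WORDS)
        return -1;
    memory[locationCounter++] = word & WORD_MASK;
    return 0;
}
```

Keeping the whole image in memory means writeObjectFile only has to walk `memory[0 .. locationCounter-1]` once EOF is reached.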

The INSTRUCTION object is described in defs.h. It contains a string for the opcode and an OPERAND object (also described in defs.h) for the operand if the instruction includes an operand. (If there is no operand, the OPERAND object will contain NULLoperand for the OPERAND_TYPE.) An operand is either a symbol (represented with a string) or an integer constant (represented as a "long long" integer).

Either the label or the instruction may be missing. If the label is not present, its string pointer will be NULL. If the instruction is not present, its opcode string pointer will be NULL.
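The real declarations live in defs.h. The stand-in types below reuse the names `OPERAND_TYPE`, `OPERAND`, `INSTRUCTION` and `NULLoperand` from this handout, but the struct fields, the enumerators `SYMBOLoperand` and `INTEGERoperand`, and the helpers `classifyLine` and `hasOperand` are hypothetical, added only so the sketch compiles. Passing the instruction by pointer is also an assumption. The sketch shows the cases assemble must tell apart when the label or the instruction is missing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the types in defs.h; use the real ones. */
typedef enum { NULLoperand, SYMBOLoperand, INTEGERoperand } OPERAND_TYPE;

typedef struct {
    OPERAND_TYPE type;
    const char *symbol;   /* used when type == SYMBOLoperand */
    long long value;      /* used when type == INTEGERoperand */
} OPERAND;

typedef struct {
    const char *opcode;   /* NULL when the line has no instruction */
    OPERAND operand;
} INSTRUCTION;

typedef enum { LINE_EMPTY, LINE_LABEL_ONLY, LINE_INSTRUCTION_ONLY, LINE_BOTH } LineKind;

/* Decide which parts of a source line are present. An assemble()
   built on this might define the label (if any) at the current
   location counter before encoding the instruction (if any). */
static LineKind classifyLine(const char *label, const INSTRUCTION *inst)
{
    int hasLabel = (label != NULL);
    int hasInstruction = (inst->opcode != NULL);

    if (hasLabel && hasInstruction)
        return LINE_BOTH;
    if (hasLabel)
        return LINE_LABEL_ONLY;
    if (hasInstruction)
        return LINE_INSTRUCTION_ONLY;
    return LINE_EMPTY;    /* defensive; blank lines should not reach assemble */
}

/* True when the instruction carries an operand. */
static int hasOperand(const INSTRUCTION *inst)
{
    return inst->operand.type != NULLoperand;
}
```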

The writeObjectFile function produces an object file from the internal data structure constructed by the series of calls to the assemble function. The structure of the object file is described on the IAS assembler webpage.

The dumpSymbolInfo function iterates through the symbol table and dumps symbol information to stdout. The format of this output is described on the IAS assembler webpage.
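dumpSymbolInfo needs a symbol table it can walk. The sketch below uses a fixed-size array with linear search, which is simple and adequate for programs of at most 1024 words. The names `Symbol`, `MAX_SYMBOLS`, `lookupSymbol`, `internSymbol` and `defineSymbol` and the fields shown are hypothetical; insymbol/outsymbol and GLOBL bookkeeping are left out, and the actual output format must come from the IAS assembler webpage. Treating a second definition of a label as an error is an assumption to confirm against ~cs611/bin/iasm.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 1024   /* hypothetical capacity */

typedef struct {
    char *name;    /* private copy of the identifier */
    int address;   /* location counter value when defined, -1 if not yet */
    int defined;   /* 1 once the symbol has appeared as a label */
} Symbol;

static Symbol symbols[MAX_SYMBOLS];
static int symbolCount = 0;

/* Return the entry for name, or NULL if it is not in the table. */
static Symbol *lookupSymbol(const char *name)
{
    for (int i = 0; i < symbolCount; i++)
        if (strcmp(symbols[i].name, name) == 0)
            return &symbols[i];
    return NULL;
}

/* Return the entry for name, adding an undefined entry if needed.
   Returns NULL if the table is full or malloc fails. */
static Symbol *internSymbol(const char *name)
{
    Symbol *s = lookupSymbol(name);
    if (s != NULL)
        return s;
    if (symbolCount == MAX_SYMBOLS)
        return NULL;
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, name, len);
    s = &symbols[symbolCount++];
    s->name = copy;
    s->address = -1;
    s->defined = 0;
    return s;
}

/* Record a label definition. Returns 0 on success, 1 if the label was
   already defined (first address kept), -1 if the table is full. */
static int defineSymbol(const char *name, int address)
{
    Symbol *s = internSymbol(name);
    if (s == NULL)
        return -1;
    if (s->defined)
        return 1;
    s->address = address;
    s->defined = 1;
    return 0;
}
```

The names are copied because nothing here guarantees that the strings passed to assemble outlive the call. dumpSymbolInfo can then loop over `symbols[0]` through `symbols[symbolCount - 1]`, which visits symbols in the order they were first seen.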

Both writeObjectFile and dumpSymbolInfo are called once, when EOF is reached on the input file.

The strings that are passed into assemble for labels and opcodes will already have been verified to be legal "identifiers" (they start with a letter and consist only of letters and digits). Likewise, for operands that are symbols, the strings will already have been verified to be legal identifiers. For integer operands, the values have been verified to fit in 40 bits. However, whenever the operand represents an address (the common case), you must also check whether the value fits in 12 bits.
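The 12-bit check is a simple range test. This sketch assumes addresses are unsigned, so negative operands are rejected; `fitsUnsigned`, `isLegalAddress` and `ADDRESS_BITS` are hypothetical names. Since memory holds only 1024 words, whether iasm applies a tighter bound is worth confirming with test cases.

```c
#include <stdbool.h>

#define ADDRESS_BITS 12

/* True if 0 <= value <= 2^bits - 1, i.e. value fits in an unsigned
   field of the given width (bits must be less than 63). */
static bool fitsUnsigned(long long value, int bits)
{
    return value >= 0 && value < (1LL << bits);
}

/* Address operands must fit in the 12-bit address field. */
static bool isLegalAddress(long long value)
{
    return fitsUnsigned(value, ADDRESS_BITS);
}
```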

Comments are discarded prior to the call to the assemble function.

As well as the IAS machine's opcodes, the assembler should support the IAS assembler's four directives described on the IAS assembler webpage.

If a line of the input is in error, call the error routine to format an error message. See the message.c file for details. You are only responsible for detecting one error per line, but you must be capable of detecting multiple errors in a file. You may, if you wish, simply ignore the rest of a line once an error is detected on that line. If an input file has errors, then the user of the assembler understands that the output object file is not to be trusted.

Sample IAS assembly language programs are available in ~cs611/public/IAS.

Your goal should be to match exactly the behavior of the implementation available in ~cs611/bin/iasm. Any ambiguities in this specification can be resolved by running test cases through this "official" implementation.

NOTE: Before starting, be sure you understand how to store a 40-bit integer value in five bytes, and be sure you realize that the object code file is not human readable.
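As an illustration of storing a 40-bit value in five bytes, the sketch below splits a value into bytes and rebuilds it. It assumes big-endian order (most significant byte first) and 40-bit two's complement for negative values; both are assumptions, so check the byte order against the IAS assembler webpage and the output of dumpobj. `packWord40`, `unpackWord40` and `writeWord40` are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Split the low 40 bits of value into five bytes, most significant
   byte first (ASSUMED order). Negative values come out in 40-bit
   two's complement because conversion to uint64_t is modulo 2^64. */
static void packWord40(long long value, unsigned char out[5])
{
    uint64_t bits = (uint64_t)value;
    for (int i = 4; i >= 0; i--) {
        out[i] = (unsigned char)(bits & 0xFF);
        bits >>= 8;
    }
}

/* Inverse of packWord40: rebuild the value, sign-extending bit 39. */
static long long unpackWord40(const unsigned char in[5])
{
    long long v = 0;
    for (int i = 0; i < 5; i++)
        v = (v << 8) | in[i];      /* at most 40 bits, so no overflow */
    if (v & (1LL << 39))
        v -= 1LL << 40;            /* negative in 40-bit two's complement */
    return v;
}

/* Write one packed word to an already opened object file.
   Returns 0 on success, -1 on a short write. */
static int writeWord40(FILE *fp, long long value)
{
    unsigned char buf[5];
    packWord40(value, buf);
    return fwrite(buf, 1, 5, fp) == 5 ? 0 : -1;
}
```

Because the object file holds raw bytes rather than text, it should be written with fwrite (opened in "wb" mode) rather than fprintf.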

You can examine the output of your assembler by using the "dump object file" tool available in ~cs611/bin/dumpobj. This tool reads an IAS object file from stdin and displays the file to stdout in human-readable form. You may also want to use the "octal dump" tool available in /usr/bin/od.

Your implementation must be performed using C.

Each instruction will be worth 2 points, for a total of 60 points. The DATA, WORD and ALLOC directives will collectively be worth 5 points. The GLOBL directive will be worth 5 points. The handling of "insymbols" and the dump of symbol information to stdout will be worth 15 points. The handling of "outsymbols" will be worth 15 points. (Outsymbols will be tested using the LD instruction, so you must handle the LD instruction correctly in order to get credit for outsymbols.)

Be sure to include the proper error checking: e.g., illegal opcode, address out of range, program too big to fit in 1024 words, opcode-operand mismatch, etc.
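The illegal-opcode and opcode-operand-mismatch checks can be driven from a table. In this sketch only LD is listed, since it is the only IAS opcode this handout names; treating its operand as a required address is an assumption, and the remaining opcodes (with their machine codes) would come from the IAS assembler webpage. The names `OperandRule`, `OpcodeInfo`, `opcodeTable`, `findOpcode` and `checkOpcode` are hypothetical, and case-sensitive matching via strcmp is assumed.

```c
#include <stddef.h>
#include <string.h>

typedef enum { NO_OPERAND, ADDRESS_OPERAND } OperandRule;

typedef struct {
    const char *mnemonic;
    OperandRule rule;      /* whether an operand is required or forbidden */
} OpcodeInfo;

/* Hypothetical table: add every IAS opcode from the webpage. */
static const OpcodeInfo opcodeTable[] = {
    { "LD", ADDRESS_OPERAND },   /* operand rule assumed */
};

typedef enum {
    OP_OK, OP_ILLEGAL_OPCODE, OP_MISSING_OPERAND, OP_UNEXPECTED_OPERAND
} OpcodeCheck;

/* Linear search of the opcode table; NULL means an illegal opcode. */
static const OpcodeInfo *findOpcode(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof opcodeTable / sizeof opcodeTable[0]; i++)
        if (strcmp(opcodeTable[i].mnemonic, mnemonic) == 0)
            return &opcodeTable[i];
    return NULL;
}

/* Classify one opcode/operand pairing for error reporting. */
static OpcodeCheck checkOpcode(const char *mnemonic, int hasOperand)
{
    const OpcodeInfo *info = findOpcode(mnemonic);
    if (info == NULL)
        return OP_ILLEGAL_OPCODE;
    if (info->rule == ADDRESS_OPERAND && !hasOperand)
        return OP_MISSING_OPERAND;
    if (info->rule == NO_OPERAND && hasOperand)
        return OP_UNEXPECTED_OPERAND;
    return OP_OK;
}
```

Each non-OK result would map to a call of the error routine in message.c; the exact messages should match iasm.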

Your program will be graded primarily by testing it for correct functionality. However, you may lose points if your program is not properly structured or adequately documented.

Your code should be submitted for grading from a CIS Linux machine (e.g. turing.unh.edu). To turn in this assignment, type:
~cs611/bin/submit prog2 assemble.c

Do not turn in any other files!

Submissions can be checked by typing:
~cs611/bin/scheck prog2

To receive full credit for the assignment, you must turn in your files prior to 8am on Monday February 24. Late submissions will be accepted at the penalty of 5 points per day up to one week late.

Your programs will be graded using a CIS Linux machine (e.g. turing.unh.edu) so be sure to test in that environment.

Remember: as always you are expected to do your own work on this assignment.


Last modified on January 24, 2003.

Comments and questions should be directed to hatcher@unh.edu