CS611
Programming Assignment 4
Spring 1999


Replace the assemble module of as611.

The source code for as611 is in ~cs611/public/prog4/as611. There are three entry points for the assembly module. Stubs for these three functions are in assemble.stub.

The init_assemble function is called once at program start-up to allow the assemble module to initialize any internal data structures.

The assemble function is called once for each line of the input assembly language source program. The four parameters to this function are strings and correspond to the four possible (non-comment) components of an input line. (The structure of an input line is described further below.) The function should encode its input in sim611 machine format in an internal data structure for later dumping to an output object file.

The write_obj_file function produces an object file from the internal data structure constructed by the series of calls to the assemble function. (The structure of the object file is described below.)

An input assembly source statement has the following basic format:

label: op code operands ;comments

All four fields are optional, although certain op codes require certain operands and an operand cannot appear without an op code. Statements cannot be continued on other lines.

The input is "caseless" -- the case of characters does not matter: "xy", "XY", "xY" and "Xy" are all the same. Symbols are promoted to uppercase elsewhere in as611 prior to the call to the assemble function. Labels are also truncated to a maximum of six characters prior to the call to the assemble function.

A comment is begun by a semi-colon and ends with the end of the line. Comments are discarded prior to the call to the assemble function.

The only legal address specification is a string which would also be a legal label name (the label without the terminating colon). The string doesn't have to be a label -- it can be undefined and therefore an outsymbol -- but it has to have the same form as a label name.

NOTE: The LDIMM instruction accepts either a decimal constant (possibly signed) or an address specification as its nonregister operand.
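One way to tell the two kinds of LDIMM operand apart is to first try to read the string as a decimal constant and fall back to treating it as an address specification if that fails. The following is only a sketch: the name parse_decimal is hypothetical, accepting a leading `+' as well as `-' is an assumption, and the overflow cap is an arbitrary guard because this handout does not state the legal range of a constant.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of as611.
 * Returns 1 and stores the value in *out if s is an optionally signed
 * decimal constant; returns 0 otherwise (e.g. for a label-like string). */
int parse_decimal(const char *s, long *out)
{
    long value = 0;
    int negative = 0;

    if (s == NULL || out == NULL)
        return 0;
    if (*s == '+' || *s == '-') {       /* ASSUMPTION: both signs allowed */
        negative = (*s == '-');
        s++;
    }
    if (!isdigit((unsigned char)*s))
        return 0;                       /* empty string or a sign alone */
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (*s - '0');
        if (value > 65536L)
            return 0;                   /* arbitrary cap to avoid overflow */
        s++;
    }
    if (*s != '\0')
        return 0;                       /* trailing junk such as "12X" */
    *out = negative ? -value : value;
    return 1;
}
```

A caller would still have to check that the value fits the field it is stored in, and that a string rejected here has the form of a label name.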

The assembler should support the following pseudo-ops (i.e. they can appear in the opcode column of the input source program):

BYTE value
WORD value
ALLOC length

BYTE stores a one-byte value at the current location. WORD stores a 16-bit value starting at the current location. ALLOC allocates `length' bytes starting at the current location.

The value and length operands can only be decimal constants (the value may be signed).
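Each pseudo-op advances the current location by a fixed or operand-dependent number of bytes. A minimal sketch of that bookkeeping follows. The name pseudo_op_size is hypothetical, the sketch assumes the opcode string has already been promoted to uppercase, and treating a negative ALLOC length as an error is an assumption based on only the value being described as possibly signed.

```c
#include <string.h>

/* Hypothetical helper, not part of as611.
 * Returns the number of bytes the pseudo-op occupies in the object code,
 * or -1 if op is not a pseudo-op or the ALLOC length is negative. */
long pseudo_op_size(const char *op, long operand)
{
    if (strcmp(op, "BYTE") == 0)
        return 1;                       /* one-byte value */
    if (strcmp(op, "WORD") == 0)
        return 2;                       /* 16-bit value */
    if (strcmp(op, "ALLOC") == 0)
        return operand >= 0 ? operand : -1;
    return -1;                          /* an ordinary instruction */
}
```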

If a line of the input is in error, call the error routine to format an error message. See the message.c file for details. You are only responsible for detecting one error per line, but you must be capable of detecting multiple errors in a file. You may, if you wish, simply ignore the rest of a line once an error is detected on that line. If an input file has errors, no object code need be output.

If one of the four input parameters to the assemble function does not have a corresponding component on the input line, the NULL pointer will be supplied for that parameter.

See the comments in the ~cs611/public/prog4/sim611/exec.stub file for other details of the sim611 machine and the sim611 assembly language.

An object file is divided into four sections -- insymbol table, outsymbol table, relocation data, and the object code itself. The four sections appear in the order just listed and each section is preceded by a two-byte integer which describes the length in bytes of the section which follows.
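The length-prefixed layout can be sketched as a pair of small helpers that build a section in memory before it is written out. The names put_u16 and put_section are hypothetical. The byte order of the two-byte integer is NOT stated in this handout: the sketch assumes high-order byte first, so check the sim611 sources before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores a 16-bit value as two bytes.
 * ASSUMPTION: high-order byte first; the handout does not say. */
void put_u16(unsigned char *dst, unsigned int value)
{
    dst[0] = (unsigned char)((value >> 8) & 0xFF);
    dst[1] = (unsigned char)(value & 0xFF);
}

/* Hypothetical helper: places one section (two-byte length, then body)
 * at out and returns the number of bytes stored.
 * out must have room for len + 2 bytes; len must not exceed 0xFFFF. */
size_t put_section(unsigned char *out, const unsigned char *body,
                   unsigned int len)
{
    put_u16(out, len);
    if (len > 0)
        memcpy(out + 2, body, len);
    return (size_t)len + 2;
}
```

Calling put_section four times in a row, in the order insymbol table, outsymbol table, relocation data, object code, would produce the whole file image under these assumptions. An empty section is still represented by its two-byte length of zero.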

The relocation data is simply a series of bits. The relocation bits consist of one bit for each byte of the object code. A bit is set if the corresponding byte of the object code contains the low-order byte of a relocatable address, and the bit is clear otherwise. If bit zero (low-order, right-most, bit) of a byte of relocation data refers to the object code byte `n', then bit one of the same byte of relocation data refers to object code byte `n+1', bit two refers to byte `n+2', and so on.
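The bit numbering just described maps naturally onto a divide and a remainder. The sketch below uses hypothetical names (set_reloc_bit, get_reloc_bit) and assumes that the first byte of relocation data covers object code bytes 0 through 7, the second covers bytes 8 through 15, and so on.

```c
#include <stddef.h>

/* Hypothetical helper: marks object-code byte `offset' as holding the
 * low-order byte of a relocatable address.
 * reloc must start out all zero and hold at least (code_len + 7) / 8 bytes.
 * ASSUMPTION: relocation byte k covers object-code bytes 8k .. 8k+7. */
void set_reloc_bit(unsigned char *reloc, size_t offset)
{
    reloc[offset / 8] |= (unsigned char)(1u << (offset % 8));
}

/* Hypothetical helper: returns 1 if the bit for `offset' is set, else 0. */
int get_reloc_bit(const unsigned char *reloc, size_t offset)
{
    return (reloc[offset / 8] >> (offset % 8)) & 1;
}
```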

Insymbols are the label names defined in the source program. Outsymbols are the address references which do not appear as label names somewhere in the source program (i.e. undefined symbols).

An entry in the insymbol table consists of a 6-byte string which is the symbol itself (in all uppercase) and a 2-byte offset of where in the object code that symbol refers (i.e. the address of the symbol).

An entry in the outsymbol table consists of a 6-byte string which is the symbol itself (in all uppercase) and a 2-byte offset into the object code of where that symbol is used (not its address -- it's undefined -- but where its address would go if it were known). This offset is in fact a pointer to the beginning of a linked list of all the uses of this symbol in the object code. The address fields of the references to the outsymbol are used to store the links. An address field containing all ones (in binary) terminates the chain.
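The chain of uses can be built by threading each new use onto the front of the list. This is only a sketch under several assumptions: the names are hypothetical, an address field is taken to be two bytes wide and stored high-order byte first, and the order of the chain (most recent use first) is a choice the handout does not dictate.

```c
#define CHAIN_END 0xFFFFu   /* an address field of all ones ends the chain */

/* Hypothetical helper: records a use of an undefined symbol whose address
 * field starts at code[field_off].  *head is the offset kept for the symbol
 * in the outsymbol table (CHAIN_END while the symbol has no uses).
 * The old head is stored in the new field, and the new field becomes head.
 * ASSUMPTION: two-byte address field, high-order byte first. */
void add_outsymbol_use(unsigned char *code, unsigned int *head,
                       unsigned int field_off)
{
    code[field_off]     = (unsigned char)((*head >> 8) & 0xFF);
    code[field_off + 1] = (unsigned char)(*head & 0xFF);
    *head = field_off;
}

/* Hypothetical helper: walks the chain and returns the number of uses. */
unsigned int count_uses(const unsigned char *code, unsigned int head)
{
    unsigned int n = 0;

    while (head != CHAIN_END) {
        head = ((unsigned int)code[head] << 8) | code[head + 1];
        n++;
    }
    return n;
}
```

With this scheme the first use recorded ends up at the tail of the chain, holding the all-ones terminator.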

Symbols are stored in both the insymbol table and the outsymbol table in all uppercase, left justified and blank filled.
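Left-justified, blank-filled storage means the 6-byte field is not a C string: short names are padded with spaces and there is no terminating NUL. A minimal sketch, with the hypothetical names SYM_LEN and format_symbol, and assuming the name has already been promoted to uppercase:

```c
#include <string.h>

#define SYM_LEN 6   /* width of the symbol field in a table entry */

/* Hypothetical helper: fills dst with name, left justified and padded
 * with blanks.  Names longer than SYM_LEN are cut off; no NUL is stored. */
void format_symbol(unsigned char dst[SYM_LEN], const char *name)
{
    size_t n = strlen(name);

    if (n > SYM_LEN)
        n = SYM_LEN;
    memset(dst, ' ', SYM_LEN);
    memcpy(dst, name, n);
}
```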

Eight public test files (assembly language source files) are available in ~cs611/public/prog4/test. Each file tests a different aspect of the assignment and each will be worth 10 percent. Most of these files are simply meant to be used to test the assembler and are not intended to be executed by sim611. Two hidden test files will be used to test error handling and other items not covered in the public files. These files will also be worth 10 percent each.

NOTE: Before starting, be sure you understand what a two-byte integer is and be sure you realize that the object code file is not human readable.

Your implementation must be written in C.

Your program will be graded primarily by testing it for correct functionality. However, you may lose points if your program is not properly structured or adequately documented.

Your code should be submitted for grading from alberti (or hopper or christa). To turn in this assignment, type:
~cs611/bin/submit prog4 assemble.c

Do not turn in any non-ASCII files (i.e. no object files, no executable files, etc.).

Submissions can be checked by typing:
~cs611/bin/scheck prog4

To receive full credit for the assignment, you must turn in your files prior to 8am on Monday, March 29. Late submissions will be accepted at a penalty of 5% per day, up to one week late.

Remember: as always you are expected to do your own work on this assignment.


Last modified on February 28, 1999.

Comments and questions should be directed to pjh@cs.unh.edu