The maTe assembler encodes class file addresses (byte offsets within the class file), decimal integer literals, ASCII string literals and maTe symbolic instruction opcodes. These entities are all encoded into one or more 32-bit words stored in big-endian format in the class file.
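The word encoding can be illustrated with a minimal Python sketch. It assumes that an integer literal or an address fits in a single unsigned 32-bit word; the function name `encode_word` is hypothetical and is not part of the maTe assembler. The layout of string literals within words is not described here, so the sketch leaves strings out.

```python
import struct


def encode_word(value):
    """Pack an unsigned integer into one 32-bit big-endian word."""
    # Assumption: the value fits in a single unsigned 32-bit word.
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"value {value} does not fit in 32 bits")
    # ">" selects big-endian byte order, "I" an unsigned 32-bit integer.
    return struct.pack(">I", value)
```

For example, `encode_word(258)` yields the four bytes 00 00 01 02, with the most significant byte first.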
The maTe assembler supports symbolic references to class file addresses. A symbol (label) is defined by providing the symbol name followed by a colon. The symbol is defined to be the number of bytes that will be encoded in the class file prior to the point where the symbol is defined. That is, the symbol stands for the address of the next entity in the class file.
References to symbols are made by prefixing the symbol name with a dollar sign. For each reference, the maTe assembler encodes in the class file the address bound to the symbol. The maTe assembler makes two passes over the assembly language input so that symbols can be referenced before they are defined.
Newlines are not treated specially: like all other whitespace, they only separate tokens (symbol definitions, symbol references, integer literals, string literals and opcodes).
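The two-pass approach can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the assembler's actual implementation: tokens are already split into strings, every token other than a symbol definition occupies exactly one 4-byte word, and string literals are excluded because their encoded size is not specified here. The name `resolve_symbols` is hypothetical.

```python
def resolve_symbols(tokens, word_size=4):
    """Replace each "$name" reference with the address bound to "name"."""
    # Pass 1: bind each symbol to the number of bytes emitted before it.
    symbols = {}
    address = 0
    for token in tokens:
        if token.endswith(":"):
            name = token[:-1]
            if name in symbols:
                raise ValueError(f"symbol {name!r} defined twice")
            symbols[name] = address
        else:
            # Assumption: every other token encodes to one word.
            address += word_size

    # Pass 2: definitions emit nothing; references become addresses.
    resolved = []
    for token in tokens:
        if token.endswith(":"):
            continue
        if token.startswith("$"):
            name = token[1:]
            if name not in symbols:
                raise ValueError(f"undefined symbol {name!r}")
            resolved.append(symbols[name])
        else:
            resolved.append(token)
    return resolved
```

Because all definitions are collected before any reference is resolved, a reference such as `$main` may appear before the `main:` definition.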
Comments can be placed in the assembly language file by starting them with a "#". The comment ends with the next newline character.
String literals are enclosed in double quotes and can include any character, even a newline character, except a double quote. There is no support for backslash sequences.
Integer literals are specified in decimal and cannot include a sign.
Symbol names must begin with a letter and are made up of letters, digits, "+", "*", "/", "!", "$", "-", "<", ">", "[" and "]".
Instruction opcodes must be specified in lowercase.
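The lexical rules above can be approximated with a small regular-expression tokenizer in Python. This is a sketch only: the names `NAME`, `TOKEN_RE` and `tokenize` are hypothetical, the opcode pattern (one or more lowercase letters) is an assumption about the opcode spelling, and the sketch does not insist on whitespace between adjacent tokens.

```python
import re

# A symbol name: a letter, then letters, digits or the listed punctuation.
NAME = r"[A-Za-z][A-Za-z0-9+*/!$<>\[\]-]*"

TOKEN_RE = re.compile(
    rf"""
    (?P<ws>\s+)              # whitespace, including newlines
  | (?P<comment>\#[^\n]*)    # comment: from "#" to the end of the line
  | (?P<string>"[^"]*")      # string literal, no backslash sequences
  | (?P<int>\d+)             # unsigned decimal integer literal
  | (?P<def>{NAME}:)         # symbol definition
  | (?P<ref>\${NAME})        # symbol reference
  | (?P<opcode>[a-z]+)       # opcode (assumed: lowercase letters only)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind not in ("ws", "comment"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens
```

The string pattern is tried before the comment can consume a "#" that sits inside a string, because the scanner only looks for a comment at the start of a token.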
The first token in the assembly language file must be "$mainBlock", a reference to the symbol "mainBlock". This symbol must be defined at the beginning of the instructions for the main block, which must be specified prior to the method and constructor bodies for the classes. The second token in the file must be an integer literal specifying the number of local slots used by the main block.
Next in the assembly language file is the class table, which starts with an integer literal defining the number of classes in the table. This is followed by a list of pairs, one for each class. The first member of each pair is a symbol reference, for the address of the descriptor of the class. The second member of the pair is a string, the name of the class.
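The layout of the file header and class table can be illustrated by a short Python helper that generates the corresponding assembly text. The helper name `emit_header`, the descriptor symbol names and the class names in the usage note are hypothetical and chosen only for illustration.

```python
def emit_header(main_locals, classes):
    """Build the header and class table as assembly text.

    classes is a list of (descriptor_symbol, class_name) pairs.
    """
    lines = [
        "$mainBlock",       # first token: reference to the main block
        str(main_locals),   # second token: local slots used by the main block
        str(len(classes)),  # class table starts with the number of classes
    ]
    for symbol, name in classes:
        # Each pair: reference to the class descriptor, then the class name.
        lines.append(f'${symbol} "{name}"')
    return "\n".join(lines)
```

Calling `emit_header(2, [("ObjectDesc", "Object"), ("PointDesc", "Point")])` produces five lines: `$mainBlock`, `2`, `2`, `$ObjectDesc "Object"` and `$PointDesc "Point"`. The line breaks are cosmetic, since newlines are treated like any other whitespace.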
The class descriptors follow in the file. Each class descriptor begins with a symbol definition to define its address and is followed by:
A method table begins with an integer literal specifying the number of methods in the class. This is followed by a list of method descriptors, one for each method in the class. Each method descriptor is a triple that takes one of two forms. For a method implemented in maTe code, the triple is a symbol reference for the beginning of the code for the method, followed by an integer literal for the number of local slots used by the method and by a string containing the name of the method. For a method implemented natively, the triple is the integer literal zero, followed by the native method index and by a string containing the name of the method.
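The two forms of method descriptor can be sketched with a Python helper that generates a method table as assembly text. The helper name `emit_method_table`, the tuple layout and the symbol and method names in the usage note are hypothetical. The method names shown are plain placeholders, not the munged names the maTe compiler would produce.

```python
def emit_method_table(methods):
    """Build a method table as assembly text.

    Each entry is either ("code", symbol, local_slots, name)
    or ("native", native_index, name).
    """
    lines = [str(len(methods))]  # table starts with the number of methods
    for method in methods:
        if method[0] == "code":
            _, symbol, local_slots, name = method
            # Code address, number of local slots, method name.
            lines.append(f'${symbol} {local_slots} "{name}"')
        elif method[0] == "native":
            _, native_index, name = method
            # A leading zero marks a natively implemented method.
            lines.append(f'0 {native_index} "{name}"')
        else:
            raise ValueError(f"unknown method kind {method[0]!r}")
    return "\n".join(lines)
```

For instance, `emit_method_table([("code", "Point-getX", 1, "getX"), ("native", 3, "toString")])` yields the lines `2`, `$Point-getX 1 "getX"` and `0 3 "toString"`.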
Method names produced by the maTe compiler are "munged" names, consisting of the following items separated by dollar signs:
Unmunged method names are created for instances of operator overloading by concatenating the string "operator" with the operator itself (e.g. "operator+"). These names are then munged as described above.
After the class descriptors is the code section, which contains the main block followed by the method and constructor bodies. Each piece of code begins with a symbol definition to define its entry point address. This is followed by a list of instructions.
The maTe assembler verifies that each instruction opcode has the correct number and type of fields. Index or count fields must be integer literals. Address fields must be symbol references. The value field of the newint instruction must be an integer literal. The value field of the newstr instruction must be a string literal.
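The field check can be sketched in Python as a table lookup. This is an illustration, not the assembler's code: the names `FIELD_KINDS`, `classify` and `check_instruction` are hypothetical, and the table lists only the two opcodes whose fields are named above. Opcodes with index, count or address fields would need their own entries using the "int" and "ref" kinds.

```python
# Expected field kinds per opcode (only the opcodes named in the text).
FIELD_KINDS = {
    "newint": ("int",),
    "newstr": ("string",),
}


def classify(token):
    """Classify an operand token as "string", "ref", "int" or "other"."""
    if token.startswith('"'):
        return "string"
    if token.startswith("$"):
        return "ref"
    if token.isascii() and token.isdigit():
        return "int"
    return "other"


def check_instruction(opcode, operands):
    """Return a list of error messages; an empty list means the fields are valid."""
    if opcode not in FIELD_KINDS:
        return [f"unknown opcode {opcode!r}"]
    expected = FIELD_KINDS[opcode]
    if len(operands) != len(expected):
        return [f"{opcode} expects {len(expected)} field(s), got {len(operands)}"]
    errors = []
    for operand, kind in zip(operands, expected):
        if classify(operand) != kind:
            errors.append(f"{opcode} expects a field of kind {kind}, got {operand}")
    return errors
```

With this table, `check_instruction("newint", ["42"])` returns an empty list, while `check_instruction("newint", ['"x"'])` reports a field of the wrong kind.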
Comments and questions should be directed to hatcher@unh.edu