maTe Class File Format


A maTe class file is a sequence of binary-encoded 32-bit integers, referred to as words in the remainder of this document. All values are encoded in big-endian byte order. All addresses in the class file are byte offsets from the start of the class file, so address 0 refers to the first byte of the class file.
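
As an illustration of the word encoding, here is a minimal Python sketch that reads and writes big-endian 32-bit words at byte addresses. The helper names (read_word, write_words) are hypothetical and not part of any maTe tool. This document does not state whether words are signed, so the sketch defaults to unsigned and offers a signed reading as an option.

```python
import struct

WORD_SIZE = 4  # bytes per 32-bit word


def read_word(data: bytes, address: int, signed: bool = False) -> int:
    """Return the big-endian 32-bit word stored at byte offset `address`."""
    # Signedness is an assumption: ">I" is unsigned, ">i" is two's complement.
    fmt = ">i" if signed else ">I"
    return struct.unpack_from(fmt, data, address)[0]


def write_words(values) -> bytes:
    """Encode a list of non-negative integers as consecutive big-endian words."""
    return struct.pack(">%dI" % len(values), *values)
```

For example, write_words([1, 256]) produces eight bytes, and read_word on that result at address 4 returns 256.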

The maTe compiler (mc) outputs a maTe assembly language file, which the maTe assembler (mas) then translates into binary-encoded words in the format described by this document.

The maTe virtual machine instruction encoding is specified in a document describing the maTe virtual machine instruction set.

All symbols ($symbol in assembly language) are encoded as a word whose value is the address of the label of the same name.

Integer literals are encoded as a word of the same value.

String literals are encoded as a series of words containing the ASCII values of the characters in the string, one character per word, terminated by a word containing the ASCII value of the NUL character (0).
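
The string encoding can be sketched in Python as follows, with each character occupying a full word and a zero word closing the string. The function names are hypothetical, chosen only for this example.

```python
import struct


def encode_string(text: str) -> bytes:
    """Encode an ASCII string as one big-endian word per character plus a 0 word."""
    codes = list(text.encode("ascii")) + [0]  # raises on non-ASCII input
    return struct.pack(">%dI" % len(codes), *codes)


def decode_string(data: bytes, address: int):
    """Decode a string starting at `address`.

    Returns the string and the address of the word after the terminator.
    """
    chars = []
    while True:
        (code,) = struct.unpack_from(">I", data, address)
        address += 4
        if code == 0:
            return "".join(chars), address
        chars.append(chr(code))
```

Under this encoding a two-character string such as "Hi" takes three words (12 bytes): one for each character and one for the terminator.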

Instruction names are encoded as a word whose value is the instruction's opcode.

The class file contains the following major sections in this order:

  1. main block descriptor

  2. class table

  3. class descriptors

  4. code section

The main block descriptor contains two words in this order:

  1. address of the code for the main block

  2. number of locals used by the main block

The class table begins with a word containing the number of classes described by the table. This is followed by a list of pairs, one for each class. The first item of a pair is a word containing the address of the descriptor for the class. The second item is a string containing the name of the class.
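
A reader for the main block descriptor and the class table might look like the Python sketch below. It assumes that the class table starts immediately after the two-word main block descriptor (byte address 8) with no padding, which follows from the section order given above but is not stated explicitly. All function names are hypothetical.

```python
import struct


def read_word(data, address):
    """Return the unsigned big-endian 32-bit word at byte offset `address`."""
    return struct.unpack_from(">I", data, address)[0]


def read_string(data, address):
    """Read a zero-terminated, one-character-per-word string.

    Returns the string and the address of the word after the terminator.
    """
    chars = []
    while True:
        code = read_word(data, address)
        address += 4
        if code == 0:
            return "".join(chars), address
        chars.append(chr(code))


def read_header(data):
    """Parse the main block descriptor and the class table.

    Returns (main_address, main_locals, classes, next_address), where
    classes is a list of (class_name, descriptor_address) pairs.
    """
    main_address = read_word(data, 0)  # address of the main block's code
    main_locals = read_word(data, 4)   # number of locals used by the main block
    class_count = read_word(data, 8)   # assumed: class table starts at byte 8
    address = 12
    classes = []
    for _ in range(class_count):
        descriptor_address = read_word(data, address)
        name, address = read_string(data, address + 4)
        classes.append((name, descriptor_address))
    return main_address, main_locals, classes, address
```

Because class names are variable-length strings, the entries of the class table cannot be indexed directly; they have to be read in sequence, as the loop does.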

A class descriptor consists of the following components in this order:

  1. super class

  2. number of fields, including inherited fields

  3. method table

The super class is described by specifying the address of the descriptor for the super class. For Object, zero is placed in the super class component.

The method table begins with a word containing the number of methods described by the table. The method table then consists of a list of triples, with one triple for each method. The first item in each triple is the address of the code for the method. The second item is the number of locals the method uses or, if the method is implemented natively, its native code index. For a natively implemented method, the address (the first item in the triple) is zero. The third item in the triple is a string containing the munged name of the method. (The munged name of a method consists of its class name, the method name, and then the types of any parameters, listed in order.)
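
The class descriptor layout can be sketched in Python as shown below. The function names and the dictionary keys are hypothetical. The sketch treats a zero code address as the marker for a native method, as described above, and interprets the second item of the triple accordingly.

```python
import struct


def read_word(data, address):
    """Return the unsigned big-endian 32-bit word at byte offset `address`."""
    return struct.unpack_from(">I", data, address)[0]


def read_string(data, address):
    """Read a zero-terminated, one-character-per-word string.

    Returns the string and the address of the word after the terminator.
    """
    chars = []
    while True:
        code = read_word(data, address)
        address += 4
        if code == 0:
            return "".join(chars), address
        chars.append(chr(code))


def read_class_descriptor(data, address):
    """Parse one class descriptor starting at `address`.

    Returns the descriptor as a dict and the address following it.
    """
    super_address = read_word(data, address)     # 0 for Object
    field_count = read_word(data, address + 4)   # includes inherited fields
    method_count = read_word(data, address + 8)  # first word of the method table
    address += 12
    methods = []
    for _ in range(method_count):
        code_address = read_word(data, address)
        second = read_word(data, address + 4)
        name, address = read_string(data, address + 8)
        native = code_address == 0
        methods.append({
            "name": name,  # munged method name
            "address": code_address,
            "native": native,
            "locals": None if native else second,
            "native_index": second if native else None,
        })
    descriptor = {"super": super_address, "fields": field_count, "methods": methods}
    return descriptor, address
```

The descriptor address passed in would normally come from the class table; following the "super" address repeatedly until it reaches zero walks the inheritance chain up to Object.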

The code section consists of the code for the main block plus the code for each constructor and method that is not implemented natively. User-defined classes cannot have native methods, so all native implementations of methods are for predefined classes (Object, Integer, String and Table) and are provided directly by the virtual machine implementation.


Last modified on January 15, 2010.

Comments and questions should be directed to hatcher@unh.edu