The CS611 Virtual Machine (VM) is used to define and execute simple 32-bit integer computations.
The VM uses the two's-complement encoding for representing integer values.
The VM contains two memories: one for instructions and one for data. The instruction memory contains 2048 bytes, which are treated as 4096 4-bit "nibbles", numbered from 0 to 4095. The data memory contains 1024 32-bit words, numbered from 0 to 1023.
The data memory contains both statically allocated data and a run-time stack. The statically allocated data is placed in the low-address end of the data memory and the run-time stack is in the high-address end of the data memory. The stack grows from high memory down to low memory.
Internally the VM has a program-counter (PC) register that contains the nibble address of the instruction that will execute next. At VM initialization the PC is set to 0. The VM also has a stack-pointer (SP) register that contains the address of the top integer on the stack. At VM initialization the SP is set to 1024, indicating that the stack is empty. Finally, the VM has a frame-pointer (FP) register that is used by the "call" and "ret" instructions, which will be discussed below. At VM initialization the FP is set to 1024.
The VM has the following fetch-execute cycle:
Instructions are either 4 bits long or 16 bits long. Instructions can be stored in instruction memory starting at any 4-bit boundary.
A 4-bit instruction contains only a 4-bit opcode. These instructions typically pop two integer operands from the stack and push one integer result. For example, the "add" instruction pops two operands, adds them together and pushes the result.
The 16-bit instructions come in two flavors. The branch ("b" and "bt") and call ("call") instructions consist of a 4-bit opcode followed by a 12-bit nibble address of a location in instruction memory. The stack ("push" and "pop") instructions consist of a 4-bit opcode followed by a 2-bit operand type and a 10-bit operand field.
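Because instructions are either one nibble or four nibbles long, a tool that walks instruction memory has to look at each opcode to know how far to advance. The following is a minimal sketch of that idea, not part of the VM itself. It uses the opcode encodings from the instruction set table in this document; the names `LONG_OPCODES`, `instruction_length` and `instruction_starts` are hypothetical, and the walk assumes the nibbles form a straight-line sequence of instructions beginning at address 0.

```python
# Opcodes of the 16-bit instructions (b, bt, call, push, pop), taken from
# the instruction set table. All other opcodes are 4-bit instructions.
LONG_OPCODES = {0x8, 0x9, 0xA, 0xB, 0xC}


def instruction_length(opcode):
    """Return the instruction length in nibbles for a 4-bit opcode."""
    return 4 if opcode in LONG_OPCODES else 1


def instruction_starts(nibbles):
    """Return the nibble address at which each instruction begins.

    Assumes a linear sweep from address 0 with no data mixed into the code.
    """
    starts = []
    address = 0
    while address < len(nibbles):
        starts.append(address)
        address += instruction_length(nibbles[address])
    return starts


# A push (4 nibbles), an add (1 nibble) and a halt (1 nibble).
example_starts = instruction_starts([0xB, 0x1, 0x0, 0x0, 0x0, 0xF])
```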
A 2-bit operand type denotes these 4 cases:
The 12-bit address in the branch and call instructions is stored in Little Endian fashion: the second nibble of the instruction contains the low 4 bits of the address, the third nibble contains the middle 4 bits of the address, and the last nibble contains the high 4 bits of the address.
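The nibble ordering described above can be sketched as a pair of helper functions. The names `decode_address` and `encode_address` are hypothetical and exist only for illustration; the three nibbles are passed in the order they appear in instruction memory after the opcode.

```python
def decode_address(nibbles):
    """Combine the three address nibbles (low, middle, high) into 12 bits."""
    low, middle, high = nibbles
    return low | (middle << 4) | (high << 8)


def encode_address(address):
    """Split a 12-bit address into the nibbles that follow the opcode."""
    return [address & 0xF, (address >> 4) & 0xF, (address >> 8) & 0xF]


# A branch to nibble address 0x123 stores the nibbles 3, 2, 1 after the opcode.
target = decode_address([0x3, 0x2, 0x1])
```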
The 10-bit operand field in the stack instructions is also stored in Little Endian fashion. The second nibble of the instruction contains the 2-bit operand type in bits 2-3 and the low 2 bits of the operand field in bits 0-1. The third nibble contains bits 2-5 of the operand field. The last nibble contains the bits 6-9 of the operand field.
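As an illustration of this bit layout, the sketch below unpacks the three nibbles that follow a "push" or "pop" opcode. The function name `decode_stack_operand` is hypothetical, and the code returns the raw 2-bit type and the raw 10-bit field without interpreting either one.

```python
def decode_stack_operand(second, third, last):
    """Unpack the nibbles following a push/pop opcode.

    Returns (operand_type, operand_field), where operand_type is the raw
    2-bit type and operand_field is the raw, unsigned 10-bit field.
    """
    operand_type = (second >> 2) & 0x3   # bits 2-3 of the second nibble
    operand_field = (
        (second & 0x3)                   # bits 0-1 of the field
        | (third << 2)                   # bits 2-5 of the field
        | (last << 6)                    # bits 6-9 of the field
    )
    return operand_type, operand_field


example = decode_stack_operand(0b1101, 0b0010, 0b1000)
```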
The "call" instruction pushes the return address on the stack, pushes the contents of the FP register, assigns the contents of the SP register to the FP register, and pushes a zero on top of the stack. These stack locations mark the beginning of the "frame" for the function being called. Stack frames are used to store local variables, function arguments and intermediate results. Stack locations may be referenced as local operands by providing offsets relative to the address held in the FP register. In particular, local variables are denoted by negative offsets and arguments (passed to the current function) are denoted by positive offsets. An offset of 0 denotes the old FP contents. Similarly, an offset of 1 denotes the return address itself. The arguments passed to the current function start at an offset of 2. An offset of -1 denotes the zero word pushed by the "call" instruction. This word will be used for function return values (as described below). Local variables can be allocated starting at offset -2. Local variables are allocated by simply using the "push" instruction.
The "ret" instruction pops the top of stack, which is assumed to be the function return value, into a temporary storage location in the VM, then it sets the FP by popping the stack, sets the PC by again popping the stack, and then stores the initially popped value in the first local slot (at offset -1) of the calling function. The net result is to return control to the calling function, leaving the function return value in the caller's first local slot.
The calling conventions for this machine, therefore, are for the calling function to push the arguments on the stack in reverse order and then execute the "call" instruction, which will push the return address on the stack. (The return address is the current contents of the PC register.) The "call" instruction also pushes the old FP contents, establishes a new FP value for the newly called function, and allocates a local variable to hold the return values of any function calls made by the called function. The called function can then issue "push" instructions to allocate local variables. The called function accesses its locals using negative offsets (starting at -2, since -1 will be used for function return values) and accesses the arguments passed to it using positive offsets starting at 2. The called function is responsible for removing its local variables from the stack (by using the "pop" instruction), pushing its return value, and executing the "ret" instruction.
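The calling sequence can be traced with a small model of the registers and the data memory. This is a sketch under stated assumptions, not the vm611 implementation. The class `Machine` and its methods are hypothetical. It assumes the PC already points past the "call" instruction when "call" executes, and it assumes the called function also pops the zero word at offset -1 before pushing its return value, since otherwise "ret" would pop that word as the saved FP. Exceptions are not modeled.

```python
class Machine:
    """Hypothetical model of the state touched by call and ret."""

    def __init__(self):
        self.data = [0] * 1024   # data memory, words 0..1023
        self.pc = 0
        self.sp = 1024           # empty stack
        self.fp = 1024

    def push(self, value):
        self.sp -= 1             # the stack grows toward low addresses
        self.data[self.sp] = value

    def pop(self):
        value = self.data[self.sp]
        self.sp += 1
        return value

    def call(self, target):
        self.push(self.pc)       # return address
        self.pc = target
        self.push(self.fp)       # old FP, found at offset 0 of the new frame
        self.fp = self.sp
        self.push(0)             # slot at offset -1

    def ret(self):
        return_value = self.pop()
        saved_fp = self.pop()
        return_address = self.pop()
        self.fp = saved_fp
        self.pc = return_address & 0xFFF        # keep the low 12 bits
        self.data[self.fp - 1] = return_value   # caller's offset -1 slot


m = Machine()
m.push(0)                     # caller's slot at FP-1 (address 1023)
m.push(7)                     # the single argument
m.pc = 10                     # assumed: PC is already past the call
m.call(100)
argument = m.data[m.fp + 2]   # first argument is at offset 2
m.pop()                       # assumed: callee discards the offset -1 word
m.push(argument * 6)          # return value
m.ret()
```

After the final `ret`, the result sits in word 1023 and the argument is still on the stack, since neither "call" nor "ret" removes arguments.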
The following table describes the VM instruction set.
Opcode | Encoding | Operation | Description |
add | 0x0 | addition | Top two values on the stack are popped, added and the result is pushed. |
sub | 0x1 | subtraction | Pop value2; pop value1; subtract value2 from value1 and push the result. |
mul | 0x2 | multiplication | Top two values on the stack are popped, multiplied and the result is pushed. |
div | 0x3 | division | Pop value2; pop value1; divide value1 by value2 and push the result. |
lt | 0x4 | test for less than | Pop value2; pop value1; if value1 < value2 then 1 is pushed else 0 is pushed. |
gt | 0x5 | test for greater than | Pop value2; pop value1; if value1 > value2 then 1 is pushed else 0 is pushed. |
eq | 0x6 | test for equal to | Pop value2; pop value1; if value1 == value2 then 1 is pushed else 0 is pushed. |
ret | 0x7 | return from a function | Pop returnValue; pop savedFP; pop returnAddress; FP := savedFP; PC := returnAddress; *(FP-1) := returnValue. |
b | 0x8 | branch | The address field of the instruction is assigned to the PC register. |
bt | 0x9 | branch if true | The top value on the stack is popped and if not zero (true) the address field of the instruction is assigned to the PC register. (If zero, then no further action is taken after the pop.) |
call | 0xA | call a function | Push PC; PC := address field of instruction; Push FP; FP := SP; Push 0. |
push | 0xB | push an operand | The value denoted by the operand field is pushed. |
pop | 0xC | pop an operand | The top value on the stack is popped and assigned to the location denoted by the operand field. (If the instruction denotes an immediate operand, the popped value is simply discarded.) |
out | 0xD | output an operand | The top value on the stack is popped and output. |
in | 0xE | input an operand | A value is input and pushed on the stack. |
halt | 0xF | halt | The VM is halted. |
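The operand order matters for "sub", "div", "lt" and "gt": the first value popped is value2, the right-hand operand. The sketch below shows that order using a Python list as a stand-in for the run-time stack. The helper `binary_op` is hypothetical, and the example leaves out "div" and any overflow behavior, which this section does not specify.

```python
import operator


def binary_op(stack, op):
    """Pop value2, then value1, and push op(value1, value2)."""
    value2 = stack.pop()
    value1 = stack.pop()
    stack.append(op(value1, value2))


# push 10; push 3; sub  computes 10 - 3, not 3 - 10.
stack = [10, 3]
binary_op(stack, operator.sub)
difference = stack[-1]

# push 2; push 5; lt  tests 2 < 5 and pushes 1 for true.
stack.extend([2, 5])
binary_op(stack, lambda value1, value2: int(value1 < value2))
```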
When an instruction memory address is pushed on the run-time stack by the "call" instruction, the 12-bit address is zero extended to 32 bits. When an instruction memory address is popped from the stack by the "ret" instruction, the 32-bit value is truncated to 12 bits by discarding the upper 20 bits.
When an immediate mode operand is pushed, the operand is sign extended to 32 bits. When an indirect operand is accessed, the address stored as 32 bits in data memory will be truncated to 10 bits by discarding the upper 22 bits.
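These width conversions can be written as small helper functions. The names below are hypothetical; the sketch only illustrates the arithmetic, with Python integers standing in for 32-bit words.

```python
def sign_extend_10(field):
    """Interpret a 10-bit operand field as a signed two's-complement value."""
    field &= 0x3FF
    return field - 0x400 if field & 0x200 else field


def truncate_data_address(word):
    """Keep the low 10 bits of a word used as a data memory address."""
    return word & 0x3FF


def truncate_code_address(word):
    """Keep the low 12 bits of a word used as an instruction address."""
    return word & 0xFFF


smallest = sign_extend_10(0x200)   # bit 9 set, all other bits clear
largest = sign_extend_10(0x1FF)    # bit 9 clear, all other bits set
```

Under this reading, immediate operands cover the range -512 to 511.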
Code to be executed by the VM is stored in "object" files. The first 12 bits in a VM object file contain the length of the instruction section in units of nibbles. This length is stored in Little Endian fashion. That is, the first byte of the file contains the low two nibbles, with the lowest nibble being in bits 0-3 and the middle nibble being in bits 4-7. The high nibble of the 12-bit length will be in bits 0-3 of the second byte in the file. Bits 4-7 of the second byte contain the first nibble of the instructions.
The 12-bit length is followed by the instructions. The instruction nibbles are stored in the bytes of the input file in the following manner: the nibble in bits 0-3 logically precedes the nibble in bits 4-7. If the instruction section has an even number of nibbles then it will be padded with a nibble of all 1 bits to make the instruction section end on a byte boundary. This padding will not be reflected in the length field that starts the file.
After the instructions comes an optional sequence of 32-bit data words. The end of the sequence is simply indicated by the end of the file. These 32-bit words are stored in the byte-oriented file in Little Endian order (the low-order bytes will come first in the file).
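The object file layout can be sketched as a short parser. This is an illustration built only from the description above, not the loader used by vm611. The function name `parse_object_file` is hypothetical, the data words are decoded as signed values because the VM uses two's-complement integers, and any trailing bytes that do not fill a whole word are ignored, which is an assumption.

```python
import struct


def parse_object_file(blob):
    """Split object file bytes into (instruction nibbles, data words)."""
    # Flatten the bytes into nibbles; bits 0-3 come before bits 4-7.
    nibbles = []
    for byte in blob:
        nibbles.append(byte & 0xF)
        nibbles.append(byte >> 4)

    # The first three nibbles hold the 12-bit length, low nibble first.
    length = nibbles[0] | (nibbles[1] << 4) | (nibbles[2] << 8)
    instructions = nibbles[3:3 + length]

    # Header plus instructions is rounded up to a byte boundary. A padding
    # nibble is present exactly when the instruction count is even.
    used_nibbles = 3 + length
    data_start = (used_nibbles + 1) // 2

    # The remaining bytes are Little Endian 32-bit words.
    data_bytes = blob[data_start:]
    count = len(data_bytes) // 4
    words = list(struct.unpack("<%di" % count, data_bytes[:4 * count]))
    return instructions, words


# add; halt (two nibbles, so one padding nibble), then the data word -1.
example = parse_object_file(bytes([0x02, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF]))
```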
An object file is executed by using the vm611 tool available in ~cs611/bin/vm611 on CIS Linux machines. The object file name should be provided on the command line. At VM initialization the instructions in the object file are loaded into the VM's instruction memory starting at address 0. Similarly, the data words (if any) are loaded into the VM's data memory starting at address 0. Nibbles in the VM's instruction memory that are not loaded from the object file are set to all 1 bits. Words in the VM's data memory that are not loaded from the object file are set to 0.
After the object file is loaded, execution begins with the PC initialized to 0, the SP initialized to 1024 and the FP initialized to 1024.
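The initial machine state described above can be sketched as follows. The function `initial_state` and the dictionary keys are hypothetical; the inputs are assumed to be a list of instruction nibbles and a list of data words that have already been read from an object file.

```python
def initial_state(instructions, words):
    """Build the memories and registers as they stand before execution."""
    instruction_memory = [0xF] * 4096   # unloaded nibbles are all 1 bits
    data_memory = [0] * 1024            # unloaded words are 0
    instruction_memory[:len(instructions)] = instructions
    data_memory[:len(words)] = words
    return {
        "instructions": instruction_memory,
        "data": data_memory,
        "pc": 0,
        "sp": 1024,
        "fp": 1024,
    }


state = initial_state([0x0, 0xF], [5, -1])
```

Since 0xF is the encoding of "halt", one consequence of this fill pattern appears to be that a program running past its last loaded instruction will fetch a "halt".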
Execution continues until a "halt" instruction is executed or until an exception occurs. The following exceptions are possible:
An optional command-line argument, "-trace", can be provided to the vm611 tool before the object file is specified. If the "-trace" argument is provided, the VM will display a trace of each instruction execution to stderr.
Note: the "in" instruction reads from stdin and the "out" instruction writes to stdout, so you will need to redirect stdin and stdout when executing vm611 if you use the "in" and "out" instructions. Remember that the input and output are not ASCII, so you cannot type in input. The input must be a binary file.
Comments and questions should be directed to hatcher@unh.edu