The CS611 Virtual Machine

The CS611 Virtual Machine (VM) is used to define and execute simple 32-bit integer computations.

The VM uses two's-complement encoding to represent integer values.

The VM contains two memories: one for instructions and one for data. The instruction memory contains 2048 bytes, addressed as 4096 4-bit "nibbles" numbered from 0 to 4095. The data memory contains 1024 32-bit words, numbered from 0 to 1023.

The data memory contains both statically allocated data and a run-time stack. The statically allocated data is placed in the low-address end of the data memory and the run-time stack is in the high-address end of the data memory. The stack grows from high memory down to low memory.

Internally the VM has a program-counter (PC) register that contains the nibble address of the instruction that will execute next. At VM initialization the PC is set to 0. The VM also has a stack-pointer (SP) register that contains the address of the top integer on the stack. At VM initialization the SP is set to 1024, indicating that the stack is empty. Finally, the VM has a frame-pointer (FP) register that is used by the "call" and "ret" instructions, which will be discussed below. At VM initialization the FP is set to 1024.
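The register conventions above imply a particular push and pop discipline. The Python sketch below is one reading of it, not the vm611 source. The names `Stack611` and `VMError` are hypothetical, values are kept as plain Python integers without 32-bit wrapping, and the two checks mirror the stack overflow and stack underflow exceptions listed later in this document.

```python
DATA_WORDS = 1024  # data memory size in 32-bit words


class VMError(Exception):
    """Hypothetical exception type for this sketch."""


class Stack611:
    def __init__(self):
        self.data = [0] * DATA_WORDS
        self.sp = DATA_WORDS  # 1024 means the stack is empty
        self.fp = DATA_WORDS

    def push(self, value):
        # The stack grows downward: move SP first, then store.
        if self.sp - 1 < 0:
            raise VMError("stack overflow")
        self.sp -= 1
        self.data[self.sp] = value

    def pop(self):
        # SP addresses the top value; an empty stack has SP == 1024.
        if self.sp > DATA_WORDS - 1:
            raise VMError("stack underflow")
        value = self.data[self.sp]
        self.sp += 1
        return value
```

The first value pushed lands at address 1023, and popping it returns the SP to 1024.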

The VM has the following fetch-execute cycle:

1. Fetch the instruction that begins at the nibble address in instruction memory given by the PC.
2. Update the PC by the length of the instruction.
3. Execute the instruction.
4. Go to step 1.
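The cycle can be sketched in Python as follows. This is an illustration only: it does not execute anything (branches are not followed), it just shows the PC advancing by the instruction length before the instruction is handled. The helper names are hypothetical, instruction memory is modeled as a list of nibble values, and the opcode numbers are taken from the instruction table in this document.

```python
LONG_OPCODES = {0x8, 0x9, 0xA, 0xB, 0xC}  # b, bt, call, push, pop
HALT = 0xF


def instruction_length(opcode):
    """Length in nibbles: 4 for the 16-bit forms, 1 otherwise."""
    return 4 if opcode in LONG_OPCODES else 1


def fetch_sequence(imem):
    """Yield (address, nibbles) for each instruction up to and including halt."""
    pc = 0
    while pc < len(imem):
        opcode = imem[pc]                  # step 1: fetch at the PC
        length = instruction_length(opcode)
        instr = imem[pc:pc + length]
        pc += length                       # step 2: advance the PC first
        yield pc - length, instr           # step 3: "execute" (here: report)
        if opcode == HALT:
            return
```

Because the PC is updated before execution, a "call" that saves the PC at this point saves the address of the instruction that follows it.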

Instructions are either 4 bits long or 16 bits long. Instructions can be stored in instruction memory starting at any 4-bit boundary.

A 4-bit instruction contains only a 4-bit opcode. These instructions typically pop two integer operands from the stack and push one integer result. For example, the "add" instruction pops two operands, adds them together and pushes the result.

The 16-bit instructions come in two flavors. The branch ("b" and "bt") and call ("call") instructions consist of a 4-bit opcode followed by a 12-bit nibble address of a location in instruction memory. The stack ("push" and "pop") instructions consist of a 4-bit opcode followed by a 2-bit operand type and a 10-bit operand field.

A 2-bit operand type denotes these 4 cases:

00: normal operand, the address of the operand is in the following 10 bits.
01: immediate operand, the value itself is in the following 10 bits. (The value is signed and is represented using two's complement.)
10: indirect operand, the address of the address of the operand is in the following 10 bits.
11: local operand, the offset of the local variable slot is in the following 10 bits. (Offsets can be both positive and negative and are represented using two's complement.)

The 12-bit address in the branch and call instructions is stored in Little Endian fashion: the second nibble of the instruction contains the low 4 bits of the address, the third nibble contains the middle 4 bits of the address, and the last nibble contains the high 4 bits of the address.
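As a small illustration of this nibble order, the following Python sketch packs and unpacks the address field. The function names are hypothetical, and an instruction is modeled as a list of four nibble values.

```python
def encode_branch(opcode, address):
    """Return the four nibbles of a b/bt/call instruction."""
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits")
    return [opcode, address & 0xF, (address >> 4) & 0xF, (address >> 8) & 0xF]


def decode_address(nibbles):
    """Recover the 12-bit nibble address from a four-nibble instruction."""
    _, low, mid, high = nibbles
    return low | (mid << 4) | (high << 8)
```

For example, a "b" (opcode 0x8) to nibble address 0x123 is stored as the nibbles 8, 3, 2, 1.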

The 10-bit operand field in the stack instructions is also stored in Little Endian fashion. The second nibble of the instruction contains the 2-bit operand type in bits 2-3 and the low 2 bits of the operand field in bits 0-1. The third nibble contains bits 2-5 of the operand field. The last nibble contains the bits 6-9 of the operand field.
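The layout of the stack instructions can be sketched the same way. The Python functions below are hypothetical helpers that treat an instruction as a list of four nibble values and the operand field as a raw 10-bit pattern (sign handling is left to the caller).

```python
def encode_stack(opcode, operand_type, field):
    """Return the four nibbles of a push/pop instruction.

    `field` is the raw 10-bit operand field (0..1023).
    """
    if not 0 <= operand_type <= 3 or not 0 <= field <= 0x3FF:
        raise ValueError("operand type is 2 bits, operand field is 10 bits")
    second = (operand_type << 2) | (field & 0x3)   # type in bits 2-3
    return [opcode, second, (field >> 2) & 0xF, (field >> 6) & 0xF]


def decode_stack(nibbles):
    """Return (operand_type, field) from a four-nibble push/pop."""
    _, second, third, last = nibbles
    operand_type = second >> 2
    field = (second & 0x3) | (third << 2) | (last << 6)
    return operand_type, field
```

For example, a "push" (opcode 0xB) of the immediate value 1 is stored as the nibbles B, 5, 0, 0, and a "push" of the immediate value -1 (field pattern 0x3FF) as B, 7, F, F.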

The "call" instruction pushes the return address on the stack, pushes the contents of the FP register, assigns the contents of the SP register to the FP register, and pushes a zero on top of the stack. These stack locations mark the beginning of the "frame" for the function being called. Stack frames are used to store local variables, function arguments and intermediate results. Stack locations may be referenced as local operands by providing offsets relative to the FP register, which after a "call" holds the address of the saved FP. Local variables are denoted by negative offsets and arguments (passed to the current function) by positive offsets. An offset of 0 denotes the old FP contents, and an offset of 1 denotes the return address. The arguments passed to the current function start at an offset of 2. An offset of -1 denotes the zero word pushed by the "call" instruction; this word is used for function return values (as described below). Local variables can be allocated starting at offset -2, simply by using the "push" instruction.

The "ret" instruction pops the top of stack, which is assumed to be the function return value, into a temporary storage location in the VM, then it sets the FP by popping the stack, sets the PC by again popping the stack, and then stores the initially popped value in the first local slot (at offset -1) of the calling function. The net result is to return control to the calling function, leaving the function return value in the caller's first local slot.

The calling conventions for this machine, therefore, are for the calling function to push the arguments on the stack in reverse order and then execute the "call" instruction, which will push the return address on the stack. (The return address is the current contents of the PC register.) The "call" instruction also pushes the old FP contents, establishes a new FP value for the newly called function, and allocates a local variable to contain return values for any function calls that will be made by the called function. The called function can then issue "push" instructions to allocate local variables. The called function accesses its locals using negative offsets (starting at -2, since -1 will be used for function return values) and accesses the arguments passed to it using positive offsets starting at 2. The called function is responsible for removing its local variables from the stack (by using the "pop" instruction), pushing its return value, and executing the "ret" instruction.
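The frame layout and the call/return sequence can be traced with a short Python sketch. It is a reading of the description above, not the vm611 implementation: `Frame611` and `demo` are hypothetical names, no bounds checks are made, and the step in which the callee pops the word at offset -1 before pushing its return value is an inference from the fact that "ret" pops the return value and then immediately pops the saved FP.

```python
class Frame611:
    """Only the state that call and ret touch."""

    def __init__(self):
        self.data = [0] * 1024
        self.sp = self.fp = 1024
        self.pc = 0

    def push(self, value):
        self.sp -= 1
        self.data[self.sp] = value

    def pop(self):
        value = self.data[self.sp]
        self.sp += 1
        return value

    def call(self, target):
        self.push(self.pc)       # return address (PC already advanced)
        self.pc = target
        self.push(self.fp)       # saved FP
        self.fp = self.sp        # FP now addresses the saved-FP slot
        self.push(0)             # slot at offset -1 for return values

    def ret(self):
        value = self.pop()
        self.fp = self.pop()
        self.pc = self.pop() & 0xFFF       # keep the low 12 bits
        self.data[self.fp - 1] = value     # caller's first local slot


def demo():
    vm = Frame611()
    vm.push(5)                   # caller pushes one argument
    vm.pc = 10                   # pretend the PC already points past the call
    vm.call(100)
    arg = vm.data[vm.fp + 2]     # first argument is at offset 2
    vm.pop()                     # callee removes the slot at offset -1
    vm.push(arg * 2)             # callee pushes its return value
    vm.ret()
    return vm
```

In this run the caller is the top level, whose FP is still 1024, so the return value is stored at address 1023, which is the slot that held the argument. A caller that was itself entered through "call" would receive the value in its own slot at offset -1.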

The following table describes the VM instruction set.

The CS611 VM Instructions
Opcode	Encoding	Operation	Description
add	0x0	addition	Top two values on the stack are popped, added and the result is pushed.
sub	0x1	subtraction	Pop value2; pop value1; subtract value2 from value1 and push the result.
mul	0x2	multiplication	Top two values on the stack are popped, multiplied and the result is pushed.
div	0x3	division	Pop value2; pop value1; divide value1 by value2 and push the result.
lt	0x4	test for less than	Pop value2; pop value1; if value1 < value2 then 1 is pushed else 0 is pushed.
gt	0x5	test for greater than	Pop value2; pop value1; if value1 > value2 then 1 is pushed else 0 is pushed.
eq	0x6	test for equal to	Pop value2; pop value1; if value1 == value2 then 1 is pushed else 0 is pushed.
ret	0x7	return from a function	Pop returnValue; pop savedFP; pop returnAddress; FP := savedFP; PC := returnAddress; *(FP-1) := returnValue.
b	0x8	branch	The address field of the instruction is assigned to the PC register.
bt	0x9	branch if true	The top value on the stack is popped and if not zero (true) the address field of the instruction is assigned to the PC register. (If zero, then no further action is taken after the pop.)
call	0xA	call a function	Push PC; PC := address field of instruction; Push FP; FP := SP; Push 0.
push	0xB	push an operand	The value denoted by the operand field is pushed.
pop	0xC	pop an operand	The top value on the stack is popped and assigned to the location denoted by the operand field. (If the instruction denotes an immediate operand, the popped value is simply discarded.)
out	0xD	output an operand	The top value on the stack is popped and output.
in	0xE	input an operand	A value is input and pushed on the stack.
halt	0xF	halt	The VM is halted.
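The operand order of the two-operand instructions is easy to get backwards, so here is a minimal Python sketch of it. The names are hypothetical, the stack is modeled as a Python list whose last element is the top of stack, and wrapping results to 32 bits is an assumption (the table does not say what happens on overflow). "div" is left out because the table does not specify how the quotient is rounded.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value (assumed overflow rule)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


BINARY_OPS = {
    0x0: lambda v1, v2: to_int32(v1 + v2),   # add
    0x1: lambda v1, v2: to_int32(v1 - v2),   # sub: value1 - value2
    0x2: lambda v1, v2: to_int32(v1 * v2),   # mul
    0x4: lambda v1, v2: int(v1 < v2),        # lt
    0x5: lambda v1, v2: int(v1 > v2),        # gt
    0x6: lambda v1, v2: int(v1 == v2),       # eq
}


def apply_binary(opcode, stack):
    """Pop value2, then value1, and push the result of the operation."""
    value2 = stack.pop()      # popped first: the most recently pushed operand
    value1 = stack.pop()      # popped second
    stack.append(BINARY_OPS[opcode](value1, value2))
```

So pushing 10, pushing 3, and executing "sub" leaves 7 on the stack, not -7.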

When an instruction memory address is pushed on the run-time stack by the "call" instruction, the 12-bit address is zero extended to 32 bits. When an instruction memory address is popped from the stack by the "ret" instruction, the 32-bit value is truncated to 12 bits by discarding the upper 20 bits.

When an immediate mode operand is pushed, the operand is sign extended to 32 bits. When an indirect operand is accessed, the address stored as 32 bits in data memory will be truncated to 10 bits by discarding the upper 22 bits.
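Putting the four operand types together with these extension and truncation rules, the value that a "push" places on the stack might be computed as in the Python sketch below. The function names are hypothetical, data memory is modeled as a list of Python integers, no address range checks are made, and the local case assumes the address is the FP plus the sign-extended offset, following the frame description earlier in this document.

```python
def sign_extend_10(field):
    """Interpret a 10-bit field as a two's-complement integer."""
    return field - 0x400 if field & 0x200 else field


def operand_value(operand_type, field, data, fp):
    """Value that a push would place on the stack."""
    if operand_type == 0b00:                   # normal: field is an address
        return data[field]
    if operand_type == 0b01:                   # immediate: field is the value
        return sign_extend_10(field)
    if operand_type == 0b10:                   # indirect: address of an address
        return data[data[field] & 0x3FF]       # keep the low 10 bits
    return data[fp + sign_extend_10(field)]    # local: offset from the FP
```

A "pop" would resolve the same address but store to it instead of reading from it, except in the immediate case, where the popped value is discarded.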

Code to be executed by the VM is stored in "object" files. The first 12 bits in a VM object file contain the length of the instruction section in units of nibbles. This length is stored in Little Endian fashion. That is, the first byte of the file contains the low two nibbles, with the lowest nibble being in bits 0-3 and the middle nibble being in bits 4-7. The high nibble of the 12-bit length will be in bits 0-3 of the second byte in the file. Bits 4-7 of the second byte contain the first nibble of the instructions.

The 12-bit length is followed by the instructions. The instruction nibbles are stored in the bytes of the input file in the following manner: the nibble in bits 0-3 logically precedes the nibble in bits 4-7. If the instruction section has an even number of nibbles then, because the length field occupies three nibbles, it will be padded with a nibble of all 1 bits to make the instruction section end on a byte boundary. This padding will not be reflected in the length field that starts the file.

After the instructions comes an optional sequence of 32-bit data words. The end of the sequence is simply indicated by the end of the file. These 32-bit words are stored in the byte-oriented file in Little Endian order (the low-order bytes will come first in the file).
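The file layout described above can be sketched as a small Python writer. This is an illustration under stated assumptions, not a tool that ships with vm611: `build_object_file` is a hypothetical name, instructions are supplied as a list of nibble values, and lengths that do not fit the 12-bit field are simply rejected.

```python
import struct


def build_object_file(nibbles, data_words=()):
    """Return the bytes of an object file for the given instruction nibbles."""
    length = len(nibbles)
    if length > 0xFFF:
        raise ValueError("instruction section length must fit in 12 bits")
    # Length field, lowest nibble first, followed by the instructions.
    stream = [length & 0xF, (length >> 4) & 0xF, (length >> 8) & 0xF]
    stream += list(nibbles)
    if len(stream) % 2:          # three length nibbles + an even section
        stream.append(0xF)       # pad nibble of all 1 bits
    out = bytearray()
    for low, high in zip(stream[0::2], stream[1::2]):
        out.append(low | (high << 4))   # bits 0-3 logically come first
    for word in data_words:
        out += struct.pack("<I", word & 0xFFFFFFFF)   # low-order byte first
    return bytes(out)
```

For the six-nibble program B 5 0 0 D F (push immediate 1, out, halt) this produces the bytes 06 B0 05 D0 FF, where the final F nibble in FF is padding.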

An object file is executed by using the vm611 tool available in ~cs611/bin/vm611 on CIS Linux machines. The object file name should be provided on the command line. At VM initialization the instructions in the object file are loaded into the VM's instruction memory starting at address 0. Similarly, the data words (if any) are loaded into the VM's data memory starting at address 0. Nibbles in the VM's instruction memory that are not loaded from the object file are set to all 1 bits. Words in the VM's data memory that are not loaded from the object file are set to 0.
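The loading step might look like the following Python sketch. `load_object_file` is a hypothetical name, the two memories are modeled as lists, and malformed or oversized files are not handled.

```python
import struct


def load_object_file(blob):
    """Return (imem, dmem) as lists after loading an object file image."""
    stream = []
    for byte in blob:
        stream += [byte & 0xF, byte >> 4]      # low nibble comes first
    length = stream[0] | (stream[1] << 4) | (stream[2] << 8)
    imem = [0xF] * 4096                        # unloaded nibbles are all 1 bits
    for i, nibble in enumerate(stream[3:3 + length]):
        imem[i] = nibble
    # The instruction section, with its pad nibble if any, ends on a byte boundary.
    data_start = (3 + length + 1) // 2
    payload = blob[data_start:]
    dmem = [0] * 1024                          # unloaded words are 0
    for i in range(len(payload) // 4):
        (dmem[i],) = struct.unpack_from("<i", payload, 4 * i)
    return imem, dmem
```

Since unloaded instruction nibbles are all 1 bits, and 0xF is the "halt" opcode, a program that runs past its last loaded instruction reaches a "halt".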

After the object file is loaded, execution begins with the PC initialized to 0, the SP initialized to 1024 and the FP initialized to 1024.

Execution continues until a "halt" instruction is executed or until an exception occurs. The following exceptions are possible:

Stack overflow: pushing a value to an address less than zero.
Stack underflow: popping a value from an address greater than 1023.
Divide by zero.
Illegal instruction: execution goes off the bottom of instruction memory.

The "in" instruction reads from stdin and the "out" instruction writes to stdout. Remember that these instructions simply manipulate 32 bits. There is no ASCII conversion performed. The "out" instruction writes a 32-bit value to stdout in Little Endian fashion (low byte is written first). The "in" instruction reads a 32-bit value from stdin in Little Endian fashion (low byte is read first). If an "in" instruction is executed and EOF is encountered on stdin before a full 32-bit value can be read, then -1 (32 1 bits) will be produced. Note that getting -1 from an "in" instruction is not a guarantee of EOF since -1 could be read normally.
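The byte-level behavior can be sketched with Python's `struct` module. The function names `vm_out` and `vm_in` are hypothetical, and the stream argument is any binary file-like object.

```python
import io
import struct


def vm_out(stream, value):
    """Write one 32-bit value, low byte first."""
    stream.write(struct.pack("<I", value & 0xFFFFFFFF))


def vm_in(stream):
    """Read one 32-bit value, or return -1 if EOF cuts the read short."""
    raw = stream.read(4)
    if len(raw) < 4:
        return -1
    return struct.unpack("<i", raw)[0]
```

The same `vm_out` function, applied to a file opened in "wb" mode, is one way to prepare a binary input file for a program that uses "in"; `vm_in` on a file opened in "rb" mode decodes what "out" wrote.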

An optional command-line argument, "-trace", can be provided to the vm611 tool before the object file is specified. If the "-trace" argument is provided, the VM will display a trace of each instruction execution to stderr.

Note: because the "in" instruction reads from stdin and the "out" instruction writes to stdout, you will need to redirect stdin and stdout when executing vm611 if your program uses these instructions. Remember that the input and output are not ASCII, so you cannot type in input. The input must be a binary file.

Last modified on September 17, 2003.

Comments and questions should be directed to hatcher@unh.edu