CS520
Fall 2016
Program 4
Due Wednesday November 2


Implement a garbage collector for the maTe virtual machine.

This requires modifying the heap module that you implemented for Program 3. In particular, you will need to re-write the GC_alloc function so that it allocates blocks of memory from a single large block of memory that is malloc-ed when the heap is initialized.

Remember that the GC_alloc function allocates memory in units of bytes. Round the passed value up to the nearest multiple of eight before performing the allocation.
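
The rounding rule can be sketched as below. The helper name roundUpToEight is an assumption made for illustration; it is not part of the provided modules.

```c
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple
   of eight. A value that is already a multiple of eight is returned
   unchanged. */
static size_t roundUpToEight(size_t nbytes)
{
    return (nbytes + 7u) & ~(size_t)7u;
}
```

For example, a request for 1 byte becomes 8, and a request for 78 bytes becomes 80.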

I have added a function, initializeHeap, that is called to initialize the heap module. It is passed a single argument, the requested size of the heap, in units of bytes. This value should be rounded up to the nearest multiple of eight.

The initializeHeap function should call malloc to allocate the heap. If malloc fails, call the haltVM function with an appropriate error message.

The initializeHeap function should also malloc the indirection array. The length of the indirection array should be the requested size of the heap, rounded up to the nearest multiple of eight, then divided by eight, plus one. For example, if the requested heap size is 78, it is rounded up to 80, and the indirection array length is 80/8 + 1 = 11.
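
One possible shape for the initialization is sketched below. The HeapSketch struct and the function name are assumptions for illustration only. The sketch returns -1 where the real initializeHeap would call haltVM, and it assumes a free block stores 0 in its first word and its length (in 32-bit words) in its second word.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for the sketch. */
typedef struct HeapSketch {
    uint32_t  *words;     /* the single malloc-ed heap block */
    size_t     wordCount; /* heap size in 32-bit words */
    uint32_t **refs;      /* indirection array; index 0 is never used */
    size_t     refCount;  /* indirection array length */
} HeapSketch;

/* Returns 0 on success and -1 on failure. The real initializeHeap
   would call haltVM with an error message instead of returning -1. */
static int initializeHeapSketch(HeapSketch *h, size_t requestedBytes)
{
    size_t bytes = (requestedBytes + 7u) & ~(size_t)7u; /* round up */
    if (bytes == 0)
        return -1;

    h->wordCount = bytes / 4u;
    h->refCount  = bytes / 8u + 1u;
    h->words = malloc(bytes);
    h->refs  = malloc(h->refCount * sizeof *h->refs);
    if (h->words == NULL || h->refs == NULL) {
        free(h->words);
        free(h->refs);
        return -1;
    }

    /* Every indirection array element starts out available. */
    for (size_t i = 0; i < h->refCount; i++)
        h->refs[i] = NULL;

    /* ASSUMPTION: the whole heap starts as one free block. */
    h->words[0] = 0;
    h->words[1] = (uint32_t)h->wordCount;
    return 0;
}
```

With a requested size of 78 bytes, this gives a heap of 20 words and an indirection array of length 11.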

For debugging and testing, provide a public (non-static) function, dumpHeap, which returns nothing and takes no parameters. This function should print information about the current state of the memory allocator to stderr.

  1. Indirection Array: First, print a line that simply says "Indirection Array". Then print a line for each array element that contains a pointer to an allocated block. Print first the array index in decimal, followed by the relative address of the allocated block in decimal. The relative address of the block is its distance, in units of 32-bit words, from the beginning of the heap. Print the array elements in increasing index order.

  2. Heap: First, print a line that simply says "Heap". Then print a line for each block of the heap. Print the blocks in increasing address order. For each block, first print "address: ", followed by the block's relative address in decimal, followed by "; ". Second, print "length: ", followed by the length of the block (in units of 32-bit words) in decimal, followed by "; ". Third, print "Free; " if the block is not allocated or "Allocated; " if the block is allocated. Fourth, print "Marked; " if the garbage collector is active and the block has been marked by the collector or "Unmarked; " if the garbage collector is inactive or the garbage collector is active and the block has not been marked. The block length should include the one or two header words that are on all blocks, allocated or free. Here is a sample output (for the test GCtest2):
    Indirection Array
    1 254
    2 252
    3 250
    4 248
    Heap
    address: 0; length: 248; Free; Unmarked
    address: 248; length: 2; Allocated; Unmarked
    address: 250; length: 2; Allocated; Unmarked
    address: 252; length: 2; Allocated; Unmarked
    address: 254; length: 2; Allocated; Unmarked
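
The per-block line of the Heap section could be produced along these lines. Both helper names are assumptions for illustration, and the format string follows the sample output above, which has no trailing "; " after the last field.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: distance of a block from the start of the
   heap, in units of 32-bit words. */
static unsigned long relativeAddress(const uint32_t *heapStart,
                                     const uint32_t *block)
{
    return (unsigned long)(block - heapStart);
}

/* Hypothetical helper: format one line of the Heap section into
   out. Returns the number of characters the line needs, as
   snprintf does. */
static int formatHeapLine(char *out, size_t outSize,
                          unsigned long address, unsigned long length,
                          int allocated, int marked)
{
    return snprintf(out, outSize, "address: %lu; length: %lu; %s; %s",
                    address, length,
                    allocated ? "Allocated" : "Free",
                    marked ? "Marked" : "Unmarked");
}
```

A dumpHeap implementation would then write each formatted line to stderr, for example with fprintf(stderr, "%s\n", line).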
    

You must implement a "first-fit" allocation strategy, meaning that you start searching for an available block at the start (low address end) of the heap, and you allocate from the first block that is big enough, splitting the block if it is bigger than required. Allocate the new block from the high-address end of the available block.
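
A minimal sketch of first-fit with splitting from the high-address end follows. It works on word offsets and assumes the requested length is already a whole number of 8-byte units. One loud assumption: the sketch stores an allocated block's length in the upper bits of its header word so that it is self-contained. The real VM derives that length from the class or the string literal instead. The names firstFit and blockLength are hypothetical.

```c
#include <stdint.h>

#define ALLOCATED_BIT 0x1u

/* ASSUMPTION (sketch only): an allocated block keeps its length in
   header bits 2 and up. A free block keeps it in the second word. */
static uint32_t blockLength(const uint32_t *block)
{
    if (block[0] & ALLOCATED_BIT)
        return block[0] >> 2;
    return block[1];
}

/* Search from the low-address end for the first free block of at
   least wantWords words. Carve the new block from the high-address
   end of that block. Returns the new block's word offset, or -1 if
   no free block is big enough. */
static long firstFit(uint32_t *heap, uint32_t heapWords, uint32_t wantWords)
{
    uint32_t at = 0;
    while (at < heapWords) {
        uint32_t *block = heap + at;
        uint32_t len = blockLength(block);
        if (!(block[0] & ALLOCATED_BIT) && len >= wantWords) {
            uint32_t newAt = at + (len - wantWords);
            if (len > wantWords)
                block[1] = len - wantWords; /* shrink the free block */
            heap[newAt] = (wantWords << 2) | ALLOCATED_BIT;
            return (long)newAt;
        }
        at += len;
    }
    return -1;
}
```

Starting from a 256-word heap that is one free block, four 2-word requests land at offsets 254, 252, 250 and 248, which matches the sample dump above. A return value of -1 is the point at which the garbage collector would be invoked.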

You should use the low bit of the first word of a block to indicate whether the block is allocated or free. You should use the second word of a block to store the length of the block, if the block is free. For an allocated block, the length of the block is determined from its class (stored in the first word) if it contains an object. If the allocated block stores a string literal, then you need to look for the end of the string to see how long it is, remembering to round up the length to the nearest multiple of eight.

If there is no available block big enough to satisfy the request, then invoke the garbage collector. After the garbage collector finishes, try again to allocate the block. If there is still no available block big enough to satisfy the request, then halt the VM with an "out of heap space" message.

You must allocate an indirection array element for each object being allocated. As in Program 3, you can keep an index for the next indirection array element to look at when you are allocating indirection array elements. Initialize this index to be one. In Program 4, however, you truly need to search for an available element, one whose pointer is NULL. When you reach the end of the array, then wrap around and continue looking starting with index one. (Remember: we will never use the indirection array element with index 0.)
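
The wrap-around search might look like the sketch below. The function name and the convention of returning 0 for "no element available" are assumptions; returning 0 is safe only because index 0 is never used.

```c
#include <stddef.h>
#include <stdint.h>

/* Look for an indirection array element whose pointer is NULL,
   starting at *next and wrapping from the end of the array back to
   index 1. Returns the index found and advances *next past it, or
   returns 0 if every usable element is taken. */
static size_t findFreeElement(uint32_t **refs, size_t length, size_t *next)
{
    size_t i = *next;
    /* There are length - 1 usable elements (index 0 is skipped). */
    for (size_t tried = 0; tried + 1 < length; tried++) {
        size_t following = (i + 1 < length) ? i + 1 : 1;
        if (refs[i] == NULL) {
            *next = following;
            return i;
        }
        i = following;
    }
    return 0;
}
```

The caller keeps the next index in a variable initialized to one. A result of 0 is the point at which the garbage collector would be invoked before trying the search once more.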

If there is no available indirection array element, then invoke the garbage collector. After the garbage collector finishes, try again to find an available indirection array element. If there is still no available element, then halt the VM with an "out of memory references" message.

The garbage collector will first mark all reachable objects by calling the getRoots function from the frame module (frame.o). You pass this function a pointer to a recursive marking function. The marking function will be called for each local slot in an active frame and each value currently on an operand stack of an active frame. The marking function should use the second lowest bit of the object header as the mark bit.

If the marking function is passed a null reference or a reference to an object that is already marked, then it just returns. Otherwise, if the passed reference denotes an object, other than a String, then mark the object and recurse on any fields. If the passed reference denotes a String, then mark it and mark the string literal associated with it.
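
The recursive structure of the marking function is sketched below on a toy object model. Everything about the layout is an assumption for illustration: the ToyObject struct, the fixed-size field array, the toyRefs table standing in for the indirection array, and the function signature. The real marking function must find the fields through the object's class and must also handle the String case described above, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

#define MARK_BIT 0x2u /* second lowest bit of the header */

/* ASSUMPTION: toy object layout, for the sketch only. */
typedef struct ToyObject {
    uint32_t header;     /* bit 0: allocated, bit 1: mark */
    uint32_t fieldCount; /* number of reference fields in use */
    uint32_t fields[4];  /* references (indirection indexes); 0 is null */
} ToyObject;

/* Stand-in for the indirection array. */
static ToyObject *toyRefs[16];

/* Mark the object denoted by ref and everything reachable from it. */
static void markReference(uint32_t ref)
{
    if (ref == 0)
        return; /* null reference */
    ToyObject *obj = toyRefs[ref];
    if (obj == NULL || (obj->header & MARK_BIT))
        return; /* nothing there, or already marked */
    obj->header |= MARK_BIT; /* mark first so that cycles terminate */
    for (uint32_t i = 0; i < obj->fieldCount; i++)
        markReference(obj->fields[i]);
}
```

Setting the mark bit before recursing is what keeps the recursion from looping forever when two objects refer to each other.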

When marking is complete, traverse the indirection array and free any element that points to an unmarked object. To free the element, simply make its pointer be NULL.

Then scan the heap, starting at the start of the heap, and make a block free if it was allocated but not marked. Clear the mark bit for any marked block too. And, when freeing a block, see if the block in front of it (at the lower address) is also free. If so, combine them into one free block.
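
The heap scan can be sketched as follows. As a loud assumption, the sketch stores an allocated block's length in header bits 2 and up so that it is self-contained, whereas the real VM derives the length from the class or the string literal. It also assumes a free block has 0 in its first word. Note that it merges any run of adjacent free blocks, which is slightly more than the rule above requires.

```c
#include <stdint.h>

#define ALLOCATED_BIT 0x1u
#define MARK_BIT      0x2u

/* ASSUMPTION (sketch only): allocated length lives in the header. */
static uint32_t blockLength(const uint32_t *block)
{
    if (block[0] & ALLOCATED_BIT)
        return block[0] >> 2;
    return block[1];
}

/* Scan from the low-address end. Marked blocks survive and have
   their mark bit cleared. Unmarked allocated blocks become free and
   are merged with a free block directly in front of them. */
static void sweep(uint32_t *heap, uint32_t heapWords)
{
    uint32_t at = 0;
    long prevFree = -1; /* offset of the free block just before at */
    while (at < heapWords) {
        uint32_t *block = heap + at;
        uint32_t len = blockLength(block); /* read before changing */
        if ((block[0] & ALLOCATED_BIT) && (block[0] & MARK_BIT)) {
            block[0] &= ~MARK_BIT; /* survivor */
            prevFree = -1;
        } else if (prevFree >= 0) {
            heap[prevFree + 1] += len; /* grow the free block in front */
        } else {
            block[0] = 0; /* becomes (or stays) a free block */
            block[1] = len;
            prevFree = (long)at;
        }
        at += len;
    }
}
```

The block length is read before the header is changed, because freeing the block overwrites the words the length is computed from.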

If you are writing a compacting collector, then you need to also move all marked objects to the lowest address possible. That is, if a block is allocated and marked, then copy it to the beginning of the free block in front of it, if there is a free block in front of it. In essence this will push all free space to the end of the heap.

Of course, a compacting collector will need to update the pointers in the indirection array for all objects that are moved. It will also have to update the references in String objects that denote string literals. One way to do this is, when you are traversing the indirection array, for all elements that point to an allocated and marked block, store the header word pointed to by the element in the element, and store the index of the element in the header word. Then, when you scan the heap to free blocks and compact, use the stored indices to update the indirection array elements. And, of course, put the header words back in the objects.
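
The header-swapping idea can be sketched on a toy model, as below. The assumptions are substantial and are for illustration only: indirection elements are held as uintptr_t values so that they can temporarily hold a header word, an allocated block's length is stored in header bits 2 and up, and free blocks have 0 in their first word. The sketch does not update the string literal references inside String objects, which a full solution must also do. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ALLOCATED_BIT 0x1u
#define MARK_BIT      0x2u
#define FLAG_BITS     0x3u

/* Pass 1: for each element that points to a marked block, park the
   header word in the element and put the element's index (plus the
   flag bits) in the header. Elements for unmarked blocks are freed. */
static void threadHeaders(uintptr_t *refs, uint32_t refCount)
{
    for (uint32_t i = 1; i < refCount; i++) {
        if (refs[i] == 0)
            continue;
        uint32_t *block = (uint32_t *)refs[i];
        if (block[0] & MARK_BIT) {
            uint32_t saved = block[0];
            block[0] = (i << 2) | (saved & FLAG_BITS);
            refs[i] = saved;
        } else {
            refs[i] = 0;
        }
    }
}

/* Pass 2: slide marked blocks to the low end of the heap, restore
   their headers with the mark bit cleared, repoint their elements,
   and leave one free block at the high end. */
static void slideBlocks(uint32_t *heap, uint32_t heapWords, uintptr_t *refs)
{
    uint32_t at = 0, dest = 0;
    while (at < heapWords) {
        uint32_t *block = heap + at;
        uint32_t len;
        if ((block[0] & FLAG_BITS) == FLAG_BITS) {
            uint32_t index = block[0] >> 2;
            uint32_t saved = (uint32_t)refs[index];
            len = saved >> 2;
            memmove(heap + dest, block, len * sizeof *heap);
            heap[dest] = saved & ~MARK_BIT; /* header goes back */
            refs[index] = (uintptr_t)(heap + dest);
            dest += len;
        } else if (block[0] & ALLOCATED_BIT) {
            len = block[0] >> 2; /* garbage: skipped over */
        } else {
            len = block[1];      /* already free */
        }
        at += len;
    }
    if (dest < heapWords) {
        heap[dest] = 0;
        heap[dest + 1] = heapWords - dest;
    }
}
```

memmove is used rather than memcpy because a block's old and new locations can overlap when it slides down by less than its own length.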

Your program will be graded primarily by testing it for correct functionality:

  1. 60% - allocation of blocks, with the dumpHeap function implemented to show the current contents of the heap.

  2. 20% - collection and re-use of garbage blocks, including combining adjacent free blocks.

  3. 20% - compaction of allocated blocks to the low address end of the heap, allowing all free space to be combined into one block at the high end of the heap.

There is a Makefile and some tests available on agate in ~cs520/public/prog4. I have provided solutions to the other components of the maTe VM (class.o, frame.o, vm.o, native.o and mvm.o). When you build and run mvm, it now takes two command-line arguments: the heapsize followed by the class file.

Remember, you may lose points if your program is not properly structured or adequately documented. Coding guidelines are given on the course overview webpage.

Your programs will be graded using agate.cs.unh.edu so be sure to test in that environment. Your programs will be compiled using these gcc flags: -g -Wall -std=c99.

Your programs should be submitted for grading from agate.cs.unh.edu. To turn in this assignment, type:
~cs520/bin/submit prog4 heap.c

Submissions can be checked by typing:
~cs520/bin/scheck prog4

This assignment is due Wednesday November 2. The standard policy concerning late submissions will be in effect. See the course overview webpage.

Remember: as always you are expected to do your own work on this assignment. Copying code from another student or from sites on the internet is explicitly forbidden!


Last modified on October 17, 2016.

Comments and questions should be directed to pjh@cs.unh.edu