CS520
Spring 2017
Laboratory 5
Friday February 24
The lab must be submitted prior to noon on Sunday February 26.


Implement a facility for creating, using and deleting symbol tables. The symbol tables should be implemented as hash tables that use separate chaining.

Follow these steps to complete the lab:

  1. On agate create a subdirectory for this lab and copy the files in ~cs520/public/lab5 to the new subdirectory.

  2. The file symtab.h describes the seven functions that you should implement for this lab. Copy this file into symtab.c, where you will implement the functions. Edit each function to put a statement block (matching curly brackets) with an appropriate return statement so that each function will compile and do nothing if called.

  3. Put the following #include lines at the top of the file (after a comment describing the purpose of the file):

  4. The file Makefile describes how to build the test program using the files main.c, symtab.h and your symtab.c. To build the test program, simply type "make". Do this now to be sure your symtab.c will compile with your initial "stub" implementations.

  5. Define a struct to be linked together at one level of the hash table. This struct needs to have a char* member for the symbol, a void* member for the data pointer, and a member to point to the next struct in the linked list. I recommend using a typedef to define a name for the struct type.

  6. Define a struct to hold the control information for one symbol table. This struct needs to have a member to hold the length of the hash table and a member to point to the array of that length that contains pointers to the structs that are linked together. This latter member is a bit tricky, as it is a pointer to a pointer to the linked struct. Again I recommend using a typedef to define a name for the struct type.

  7. Define a hash function, such as this one:
    // modified FNV hash (see http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash)
    static unsigned int hash(const char *str)
    {
      const unsigned int p = 16777619;
      unsigned int hash = 2166136261u;
      while (*str)
      {
        hash = (hash ^ *str) * p;
        str += 1;
      }
      hash += hash << 13;
      hash ^= hash >> 7;
      hash += hash << 3;
      hash ^= hash >> 17;
      hash += hash << 5;
      return hash;
    }
    
    Note the u suffix that makes the decimal constant 2166136261 unsigned.

  8. Implement a helper function to do an internal lookup of a symbol in a given table, returning a pointer to the linked struct containing the symbol, if one is found. The function should be passed the control struct for a hash table and a const char* pointer to the symbol. (The const keyword indicates that the function should not change the characters in the string representing the symbol.) The function should use the hash function to "hash" the symbol to an integer value and then compute the modulus of that integer and the table's length. The symbol is then searched for on the linked list at that level in the table. Use the strcmp function to compare symbol strings. (Be careful, strcmp returns zero when the strings are identical.) If the symbol is found, then return a pointer to the struct that contains the symbol. If the symbol is not found, then return NULL.

  9. Use the static keyword to indicate that both the hash function and the lookup helper function should not be called from outside this file. (That is, they are not part of the public interface of this file.)

  10. Implement symtabCreate.
    1. Use malloc to allocate space for a control struct for the new symbol table. Use sizeof with the typedef name for the control struct to calculate the size in bytes of the struct, which is the value that should be passed to malloc.

    2. Check the return value of malloc and, if NULL, return NULL.

    3. Initialize the length member of the new member with the sizeHint value passed in to symtabCreate.

    4. Use malloc to allocate a array of pointers to the linked struct. The size in bytes of this array is computed by multiplying the size of a pointer to a linked struct times the length member. Store the returned pointer in the second member of the control struct.

    5. Check the return value of malloc and, if NULL, call free to deallocate memory for the control struct and then return NULL.

    6. Return a pointer to the control struct.

  11. Implement symtabInstall.
    1. The symtabHandle parameter is actually a pointer to a control struct so assign it to a local variable of this type.

    2. Call the lookup helper function to check if the symbol is already in the table.

    3. If the symbol is found in the table, then simply update the data member of the struct returned by the lookup helper function.

    4. If the symbol is not found, then use malloc to allocate a new linked struct. Also use malloc to allocate space for a copy of the new symbol. Use strlen to compute the length of the string representing the symbol. (When calling malloc don't forget to add one for the NULL byte on the end of C strings. strlen does not count this NULL byte but you need to allocate space for it.) Use strcpy to copy the symbol string to the newly allocated space. Store a pointer to the copy of the symbol in the symbol member of the new linked struct. Store the data value in the data member of the struct. Use the hash function applied to the symbol modulus the table length to compute the level in the table where the new linked struct should be inserted. It is probably easiest to link the new struct on the front of this list.

    5. If a call to malloc fails, then return 0; otherwise return 1.

  12. Implement symtabLookup. This can easily be done using the lookup helper function.

  13. Define a struct to hold an iterator. This struct needs to have a member that points to the control struct for the hash table being iterated over, a unsigned integer member that is the index of the level in the symbol table that contains the next (symbol,data) pair to be returned, and a pointer to the struct containing that (symbol,data) pair. Again I recommend using a typedef to define a name for the struct type.

  14. Implement symtabCreateIterator.
    1. The symtabHandle parameter is actually a pointer to a control struct so assign it to a local variable of this type.

    2. Use malloc to allocate space for an iterator struct. If malloc returns NULL, then return NULL.

    3. Save the symtabHandle parameter in the control struct pointer member of the new iterator struct.

    4. Now search from the beginning of the table to find the first level that contains a symbol. Save the index of this level in the iterator struct. Save the address of the linked struct that contains the first symbol of this level in the iterator struct.

    5. If there are no symbols in the table, then set the linked struct pointer member of the iterator struct to NULL.

    6. Return a pointer to the iterator struct.

  15. Implement symtabNext.
    1. The iteratorHandle parameter is actually a pointer to a iterator struct so assign it to a local variable of this type.

    2. If the linked struct pointer member of the iterator struct is NULL, then return NULL.

    3. Otherwise, initialize a local variable to contain the pointer currently stored in the symbol member of the linked struct pointed to by the iterator struct, return the data member of that linked struct through the returnData parameter, which is an output parameter (meaning it points to where you should store the data value), and then find the next symbol that should be returned by the iterator and update the iterator struct members accordingly. (Note that the index member might not change, if there is another symbol at the current level in the table that has not yet been returned.) If all symbols in the table have now been processed, then set the linked struct pointer member of the iterator struct to NULL. (Note that this would happen when your search for the next symbol to return hits the end of the table without finding a symbol.)

    4. Return the symbol pointer that you stashed in a local variable.

  16. Test your code using the provided main.c. There are some large text files available in ~cs520/public/books. Here are the correct results for two of these files:
  17. Implement symtabDelete. This requires freeing the strings for all symbols in the table, freeing all linked structs in the table, freeing the array of pointers to linked structs, and freeing the control struct. Note: do not call free for the data member of the linked structs.

  18. Implement symtabDeleteIterator. This simply requires a call to free to deallocate the memory for the iterator struct.

  19. Re-test your code to be sure the delete functions are working. Use the valgrind memcheck facility to validate that you properly free everything.

  20. Read over my main.c. It illustrates some useful C features that you might not yet know, such as isalpha, tolower and the printf %s format for printing strings. It also illustrates some things you need to do in this lab, such as using malloc, strlen and strcpy to make copies of strings. It also illustrates the dubious technique of storing an integer value in a void*.

  21. To turn in this laboratory, type:
    % ~cs520/bin/submit lab5 symtab.c
    

    Submissions can be checked by typing:

    % ~cs520/bin/scheck lab5
    

70 points will be awarded for correctly implementing symtabCreate, symtabInstall, symtabLookup, symtabCreateIterator, and symtabNext. 30 points will be awarded for correctly implementing symtabDelete, and symtabDeleteIterator.

Remember that there are no late submissions of labs. Submit what you have prior to noon on Sunday February 26.


Last modified on February 25, 2017.

Comments and questions should be directed to pjh@cs.unh.edu