CS520
Spring 2017
Programming Assignment 4
Due Wednesday April 5


The goal of this program is to use POSIX threads to concurrently analyze the words in a set of text files. To do this, you will use a concurrent hash table facility that you will construct in Lab 8. And, for full credit, you will modify the concurrent hash table facility to support the optional use of thin locks instead of mutexes.

The goal of the text-file analysis program is to find the longest word that is found in every file in a set of ASCII files. That is, to qualify a word must appear in every file, and it must be the longest such word.

A single answer should be printed to stdout, on a line by itself, with no other output. (That is, don't print "The longest word is ...". Just print the longest word on a line by itself.) If there are multiple words of the maximum length, report the one that is lexicographically smallest. If there is no word that is found in every file, then print a blank line.
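The tie-breaking rule above can be captured in one comparison. The following is a minimal sketch, not a required design; the function name is_better_answer is hypothetical, and a NULL best is assumed to mean that no qualifying word has been found yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if `candidate` should replace
 * `best` as the answer. A longer word always wins; between words of
 * equal length, the lexicographically smaller one wins.
 * `best` may be NULL, meaning no answer has been chosen yet. */
int is_better_answer(const char *candidate, const char *best)
{
    assert(candidate != NULL);
    if (best == NULL) {
        return 1;
    }
    size_t clen = strlen(candidate);
    size_t blen = strlen(best);
    if (clen != blen) {
        return clen > blen;
    }
    return strcmp(candidate, best) < 0;
}
```

Because every word is converted to lowercase before it is processed, strcmp order matches lexicographic order here.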

The program should take a list of files on the command line. (You can assume there will be no more than 25 files.) If a bad filename is provided (one that cannot be opened), print an error message to stderr and consider the file to have no words in it.
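Since there are at most 25 files, one possible way to record which files contain a word is a bitmask stored with the word's hash table entry. This is only a sketch of that idea, not a required approach; the names MAX_FILES, mark_seen and seen_in_all are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FILES 25  /* upper bound stated in the assignment */

/* Hypothetical helper: returns `mask` with the bit for `file_index`
 * set. Bit i set means "file i contains this word". */
uint32_t mark_seen(uint32_t mask, int file_index)
{
    assert(file_index >= 0 && file_index < MAX_FILES);
    return mask | (UINT32_C(1) << file_index);
}

/* Hypothetical helper: returns nonzero if the bits for files
 * 0..nfiles-1 are all set in `mask`. */
int seen_in_all(uint32_t mask, int nfiles)
{
    assert(nfiles >= 1 && nfiles <= MAX_FILES);
    uint32_t all = (UINT32_C(1) << nfiles) - 1;
    return (mask & all) == all;
}
```

In a multithreaded program the mask would have to be updated while holding the lock that protects the word's entry. Note also that a file that cannot be opened never sets its bit, so in that case no word can qualify and the output is a blank line.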

If the user does not specify at least one file to be processed, then terminate the program with an appropriate message to stderr.

Each file should be processed by a different thread. A single concurrent hash table should be used to collect information about the words in the files. You can size the table, and specify the number of locks and the type of lock, to be whatever you think is best.
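The one-thread-per-file structure might be sketched as follows. This is an illustration under assumptions, not the required design: the names worker_arg_t, process_file and run_workers are hypothetical, and the worker body is a placeholder that only records that it ran.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument record. */
typedef struct {
    const char *filename;  /* file this thread is responsible for */
    int file_index;        /* position of the file on the command line */
    int done;              /* set by the worker; for illustration only */
} worker_arg_t;

/* Placeholder worker. A real one would open arg->filename and add
 * its words to the shared concurrent hash table. */
void *process_file(void *p)
{
    worker_arg_t *arg = p;
    arg->done = 1;
    return NULL;
}

/* Starts one thread per file, then waits for all of them.
 * `args` must have room for nfiles entries.
 * Returns 0 on success, -1 if allocation or thread creation fails. */
int run_workers(int nfiles, char **filenames, worker_arg_t *args)
{
    assert(nfiles >= 1 && filenames != NULL && args != NULL);
    pthread_t *tids = malloc((size_t)nfiles * sizeof *tids);
    if (tids == NULL) {
        return -1;
    }
    int started = 0;
    int status = 0;
    for (int i = 0; i < nfiles; i++) {
        args[i].filename = filenames[i];
        args[i].file_index = i;
        args[i].done = 0;
        if (pthread_create(&tids[i], NULL, process_file, &args[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join every thread that was actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    free(tids);
    return status;
}
```

Each thread gets its own argument record, so the workers do not share any mutable state other than the hash table itself.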

A word starts with a letter (either uppercase or lowercase) and continues until a non-letter (or EOF) is encountered. Only consider words that are at least six, and no more than 50, letters long. Non-words in the file should simply be ignored. Once a word is identified, convert all uppercase letters to lowercase before you process it.

Therefore, "elephant's" will be two words, "elephant" and "s", and since "s" is less than six letters long, it will be ignored. Likewise, "double-precision" will be two words, "double" and "precision".
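One way to implement the word rules above is a small scanner that reads one character at a time. This is a sketch only; the name next_word and the macros are hypothetical, and it assumes that a run of more than 50 letters is skipped as a whole rather than truncated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MIN_WORD_LEN 6
#define MAX_WORD_LEN 50

/* Hypothetical helper: reads the next qualifying word from `fp` into
 * `buf`, converted to lowercase. `buf` must hold at least
 * MAX_WORD_LEN + 1 characters. Returns 1 if a word was stored, or 0
 * if EOF was reached without finding another qualifying word. */
int next_word(FILE *fp, char *buf)
{
    assert(fp != NULL && buf != NULL);
    int c = fgetc(fp);
    while (c != EOF) {
        if (!isalpha(c)) {       /* non-words are simply ignored */
            c = fgetc(fp);
            continue;
        }
        size_t len = 0;
        while (c != EOF && isalpha(c)) {
            if (len < MAX_WORD_LEN) {
                buf[len] = (char)tolower(c);
            }
            len++;               /* keep counting so long runs are detected */
            c = fgetc(fp);
        }
        if (len >= MIN_WORD_LEN && len <= MAX_WORD_LEN) {
            buf[len] = '\0';
            return 1;
        }
        /* Too short or too long: drop it and keep scanning. */
    }
    return 0;
}
```

For the input "The ELEPHANT's double-precision cat!", successive calls yield "elephant", "double" and "precision", and then return 0.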

If the files do not contain any words, then the program should print a blank line to stdout.

Once you have the multithreaded program working, return to the concurrent hash table facility and add support for the use of thin locks. That is, use the atomic_flag type, the atomic_flag_clear function and the atomic_flag_test_and_set function, which are all available in stdatomic.h, to build your own lock. Both functions take a pointer to the atomic_flag value that is to be manipulated.

You should make sure that all malloc-ed memory is freed before the program exits. We will use valgrind to make sure you do this.

You should use helgrind to debug your program. Please note that helgrind will complain if you use thin locks, so do not run it if you are using thin locks.

If it is helpful, you can re-use code from the main.c files that were distributed with Lab 5 and Lab 8.

Put all the source code for the text-file analysis program in the file prog4.c.

Your program will be graded primarily by testing it for correct functionality:

  1. 60% - the implementation of a concurrent hash table, without thin locks.

  2. 30% - the multithreaded program to analyze text files and find the longest word that is used in all files.

  3. 10% - supporting thin locks to protect the concurrent hash table.

Remember, you may lose points if your program is not properly structured or adequately documented. Coding guidelines are given on the course overview webpage.

Your programs will be graded using agate.cs.unh.edu, so be sure to test in that environment. Your programs will be compiled using these gcc flags: -g -Wall -std=c99 -pthread.

Your programs should be submitted for grading from agate.cs.unh.edu. To turn in this assignment, type:
~cs520/bin/submit prog4 symtab.c prog4.c

Submissions can be checked by typing:
~cs520/bin/scheck prog4

This assignment is due Wednesday April 5. The standard policy concerning late submissions will be in effect. See the course overview webpage.

Remember: as always you are expected to do your own work on this assignment. Copying code from another student or from sites on the internet is explicitly forbidden!


Last modified on April 2, 2017.

Comments and questions should be directed to pjh@cs.unh.edu