CS520
Fall 2016
Program 5
Due Wednesday November 16


The goal of this program is to use POSIX threads to concurrently analyze the words in a set of text files. To do this, you will first construct a queue facility for threads interacting via the producer-consumer pattern.

The queue facility should contain these functions:

  1. void *pcQueueCreate(unsigned int queueLength)

    Create a queue of the given length. The specified queue length must be greater than zero. The queue stores void* pointers. A void* "handle" is returned, to be used to manipulate the queue.

  2. void pcQueuePut(void *queue, void *value)

    Put a value into the queue, which is FIFO. If the queue is full, the calling thread blocks until space in the queue becomes available. It is a fatal error (a message is printed to stderr and the program is terminated) if a put is attempted on a closed queue.

  3. void *pcQueueGet(void *queue)

    Get a value from the queue. If the queue is empty, the calling thread blocks until a value becomes available. If a queue is closed, get calls will retrieve values from the queue until the queue is empty. Once the queue is empty, get calls on a closed queue will return NULL.

  4. void pcQueueClose(void *queue)

    Close the queue. All future put calls are fatal errors. Future get calls will return values until the queue is empty. Once the queue is empty, get calls will return NULL. It is not an error to close a queue more than once.

  5. void pcQueueDelete(void *queue)

    Free all memory associated with the queue. It is a fatal error if the queue is not closed.
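
One way to read the five descriptions above is as a bounded circular buffer guarded by a mutex and two condition variables. The following is only a sketch under that assumption, not the required design: the PcQueue struct, its field names, and the fatal() helper are all hypothetical, and the handle validity checks are left out to keep it short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout: every field name here is an illustrative assumption. */
typedef struct {
    void **slots;                 /* circular buffer of stored pointers */
    unsigned int capacity, count; /* queue length and current occupancy */
    unsigned int head;            /* index of the oldest stored value */
    int closed;                   /* nonzero once pcQueueClose has run */
    pthread_mutex_t lock;         /* protects every field above */
    pthread_cond_t notEmpty, notFull;
} PcQueue;

/* Hypothetical helper: report a fatal error and terminate. */
static void fatal(const char *msg)
{
    fprintf(stderr, "pcQueue: %s\n", msg);
    exit(EXIT_FAILURE);
}

void *pcQueueCreate(unsigned int queueLength)
{
    if (queueLength == 0) fatal("queue length must be greater than zero");
    PcQueue *q = malloc(sizeof *q);
    if (q == NULL) fatal("out of memory");
    q->slots = malloc(queueLength * sizeof *q->slots);
    if (q->slots == NULL) fatal("out of memory");
    q->capacity = queueLength;
    q->count = q->head = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->notEmpty, NULL);
    pthread_cond_init(&q->notFull, NULL);
    return q;
}

void pcQueuePut(void *queue, void *value)
{
    PcQueue *q = queue;
    pthread_mutex_lock(&q->lock);
    while (q->count == q->capacity && !q->closed)   /* block while full */
        pthread_cond_wait(&q->notFull, &q->lock);
    if (q->closed) fatal("put on a closed queue");
    assert(q->count < q->capacity);
    q->slots[(q->head + q->count) % q->capacity] = value;
    q->count++;
    pthread_cond_signal(&q->notEmpty);
    pthread_mutex_unlock(&q->lock);
}

void *pcQueueGet(void *queue)
{
    PcQueue *q = queue;
    void *value = NULL;           /* result for a closed, empty queue */
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)             /* block while empty */
        pthread_cond_wait(&q->notEmpty, &q->lock);
    if (q->count > 0) {
        value = q->slots[q->head];
        q->head = (q->head + 1) % q->capacity;
        q->count--;
        pthread_cond_signal(&q->notFull);
    }
    pthread_mutex_unlock(&q->lock);
    return value;
}

void pcQueueClose(void *queue)
{
    PcQueue *q = queue;
    pthread_mutex_lock(&q->lock);
    q->closed = 1;                         /* closing twice is harmless */
    pthread_cond_broadcast(&q->notEmpty);  /* wake blocked getters */
    pthread_cond_broadcast(&q->notFull);   /* wake blocked putters */
    pthread_mutex_unlock(&q->lock);
}

void pcQueueDelete(void *queue)
{
    PcQueue *q = queue;
    pthread_mutex_lock(&q->lock);
    int closed = q->closed;
    pthread_mutex_unlock(&q->lock);
    if (!closed) fatal("delete on a queue that is not closed");
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->notEmpty);
    pthread_cond_destroy(&q->notFull);
    free(q->slots);
    free(q);
}
```

The waits sit inside while loops because pthread_cond_wait may return spuriously, and close broadcasts on both condition variables so that any thread blocked in a get or put wakes up and sees the closed flag. Note that a stored NULL and the NULL returned by a closed, empty queue look the same to the caller.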

You will need to define a struct to represent the state of a queue. Put a 64-bit word at the beginning of the struct and always initialize it to the same "magic number". Choose a magic number that sets some of the bits in each byte of the 64-bit word. Use this to do a validity check on the queue handle that is passed into the get, put, close and delete functions. Also be sure to check that the handle is not NULL.
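
The magic-number check might look roughly like the sketch below. The constant PCQ_MAGIC, the QueueHeader type, and the handleIsValid() function are all hypothetical names chosen for illustration; the particular magic value is an arbitrary pick whose eight bytes each have some bits set and some bits clear. The sketch assumes the caller turns a failed check into whatever error handling it prefers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value: bytes 5A C3 96 1E 7B 2D 84 F1, none 0x00 or 0xFF. */
#define PCQ_MAGIC UINT64_C(0x5AC3961E7B2D84F1)

/* Hypothetical struct: only the leading field matters for this check. */
typedef struct {
    uint64_t magic;   /* must be the first member of the struct */
    /* ... the rest of the queue state would follow here ... */
} QueueHeader;

/* Returns 1 if the handle is non-NULL and starts with the magic number,
 * otherwise 0. This is a best-effort check: it cannot detect every bad
 * pointer, only ones that are NULL or do not begin with PCQ_MAGIC. */
int handleIsValid(const void *handle)
{
    assert(offsetof(QueueHeader, magic) == 0);
    if (handle == NULL) return 0;
    const QueueHeader *q = handle;
    return q->magic == PCQ_MAGIC;
}
```

Each of get, put, close and delete could call such a check before touching any other field. Overwriting the magic word just before the struct is freed is one optional way to make use of a deleted handle more likely to be caught.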

Put all the code for the queue facility in the file pcQueue.c.

The goal of the text-file analysis program is to find the longest word in a set of ASCII files. A single answer should be reported, on a line by itself, with no other output. (That is, don't print "The longest word is ...". Just print the longest word on a line by itself.) If there are multiple words of the maximum length, report the one that is lexicographically smallest.
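
The selection rule above (longer wins, ties go to the lexicographically smaller word) can be captured in one comparison. This is a minimal sketch; the function name isBetterWord() is hypothetical, and it assumes both words have already been converted to lowercase so that strcmp gives the intended ordering.

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero when candidate should replace best as the answer:
 * either it is strictly longer, or it has the same length and sorts
 * earlier. An empty string for best means "no word seen yet". */
int isBetterWord(const char *candidate, const char *best)
{
    assert(candidate != NULL && best != NULL);
    size_t candLen = strlen(candidate);
    size_t bestLen = strlen(best);
    if (candLen != bestLen)
        return candLen > bestLen;
    return strcmp(candidate, best) < 0;
}
```

The same comparison can serve both the consumer threads, when they track their own longest word, and the main thread, when it picks among the words the consumers report.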

The program should take a list of files on the command line. (You can assume there will be no more than 25 files.) For each file create two threads, one producer and one consumer. Each producer should open one file and read it one line at a time, sending each line individually to a producer-consumer queue. (The length of the queue should be set to ten times the number of files.) Each consumer reads lines from the queue and tracks the longest word that it sees. When a producer reaches EOF, it should send one NULL to the queue. When a consumer retrieves a NULL, it should send the longest word that it saw to a second producer-consumer queue. (The length of the second queue should be set to the number of files.) The consumer for this second queue will be the main thread, which will then select the longest word seen by any of the consumer threads that did the word analysis. The main thread then prints the answer.

If the user does not specify at least one file to be processed, then terminate the program with an appropriate message.

A word starts with a letter (either uppercase or lowercase) and continues until a non-letter (or EOF) is encountered. Only consider words that are at least eight letters long. Non-words in the file should simply be ignored. Once a word is identified, convert all uppercase letters to lowercase before you process it.

Therefore, "elephant's" will be two words, "elephant" and "s", and since "s" is fewer than eight letters long, it will be ignored. Likewise, "double-precision" will be two words, "double" and "precision", and since "double" is fewer than eight letters long, it will be ignored.
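
The word rules above can be sketched as a single pass over one line of text. The function name scanLine() and the two constants are hypothetical. The sketch assumes the caller supplies a buffer of MAX_LINE_LEN bytes for the running answer, initialized to the empty string, and that lines obey the length limit given in this assignment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MIN_WORD_LEN 8      /* words shorter than this are ignored */
#define MAX_LINE_LEN 1000   /* assumed upper bound on line length */

/* Scan one line and update best (a MAX_LINE_LEN-byte buffer) whenever a
 * word is longer than best, or equally long and lexicographically smaller. */
void scanLine(const char *line, char *best)
{
    assert(line != NULL && best != NULL);
    char word[MAX_LINE_LEN];
    size_t i = 0;
    while (line[i] != '\0') {
        if (!isalpha((unsigned char)line[i])) {   /* skip non-letters */
            i++;
            continue;
        }
        size_t len = 0;                           /* collect one word */
        while (isalpha((unsigned char)line[i]) && len < MAX_LINE_LEN - 1) {
            word[len++] = (char)tolower((unsigned char)line[i]);
            i++;
        }
        word[len] = '\0';
        size_t bestLen = strlen(best);
        if (len >= MIN_WORD_LEN &&
            (len > bestLen || (len == bestLen && strcmp(word, best) < 0)))
            strcpy(best, word);
    }
}
```

For the line "The elephant's double-precision value", this sketch considers "elephant" and "precision" and keeps "precision", since "the", "s", "double" and "value" are all too short. The casts to unsigned char avoid undefined behavior in isalpha and tolower when char is signed.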

If the files do not contain any words at least eight letters long, then the correct answer is the empty string, meaning that the program should print a blank line.

You may assume that all lines that are read will be less than 1000 characters long.

You should make sure that all malloc-ed memory is freed before the program exits. We will use valgrind to make sure you do this.

You should use helgrind to debug your program.

Put all the source code for the text-file analysis program in the file prog5.c.

Your program will be graded primarily by testing it for correct functionality:

  1. 50% - a producer-consumer queue that works with a single producer and a single consumer.

  2. 20% - a producer-consumer queue that works with multiple producers and multiple consumers.

  3. 30% - the text-file analysis program.

A Makefile, a header file, stubs for the functions required to implement the producer-consumer queue, and some tests are available on agate in ~cs520/public/prog5. Code from the tests can be adapted and used in your text-file analysis program. There is also a set of large text files in ~cs520/public/prog5/files.

Remember, you may lose points if your program is not properly structured or adequately documented. Coding guidelines are given on the course overview webpage.

Your programs will be graded using agate.cs.unh.edu so be sure to test in that environment. Your programs will be compiled using these gcc flags: -g -Wall -std=c99 -pthread.

Your programs should be submitted for grading from agate.cs.unh.edu. To turn in this assignment, type:
~cs520/bin/submit prog5 pcQueue.c prog5.c

Submissions can be checked by typing:
~cs520/bin/scheck prog5

This assignment is due Wednesday November 16. The standard policy concerning late submissions will be in effect. See the course overview webpage.

Remember: as always you are expected to do your own work on this assignment. Copying code from another student or from sites on the internet is explicitly forbidden!


Last modified on November 2, 2016.

Comments and questions should be directed to pjh@cs.unh.edu