The goal of this program is to use POSIX threads to concurrently analyze the words in a set of text files. To do this, you will use a concurrent hash table facility that you will construct in Lab 8. And, for full credit, you will modify the concurrent hash table facility to support the optional use of thin locks instead of mutexes.
The goal of the text-file analysis program is to find the longest word that appears in every file of a set of ASCII files. That is, to qualify, a word must appear in every file, and it must be the longest such word.
A single answer should be printed to stdout, on a line by itself, with no other output. (That is, don't print "The longest word is ...". Just print the longest word on a line by itself.) If there are multiple words of the maximum length, report the one that is lexicographically smallest. If there is no word that is found in every file, then print a blank line.
The program should take a list of files on the command line. (You can assume there will be no more than 25 files.) If a bad filename is provided (one that cannot be opened), print an error message to stderr and consider the file to have no words in it.
If the user does not specify at least one file to be processed, then terminate the program with an appropriate message to stderr.
Each file should be processed by a different thread. A single concurrent hash table should be used to collect information about the words in the files. You can size the table, and specify the number of locks and the type of lock, to be whatever you think is best.
A word starts with a letter (either uppercase or lowercase) and continues until a non-letter (or EOF) is encountered. Only consider words that are at least six, and no more than 50, letters long. Non-words in the file should simply be ignored. Once a word is identified, convert all uppercase letters to lowercase before you process it.
Therefore, "elephant's" will be two words, "elephant" and "s", and since "s" is less than six letters long, it will be ignored. Likewise, "double-precision" will be two words, "double" and "precision".
If the files do not contain any words, then the program should print a blank line to stdout.
Once you have the multithreaded program working, then return to the concurrent symbol table facility and add support for the use of thin locks. That is, use the atomic_flag type, the atomic_flag_clear function and the atomic_flag_test_and_set function, which are all available in stdatomic.h, to build your own lock. Both functions take a pointer to the atomic_flag value that is to be manipulated.
int pthread_yield(void);
You should use helgrind to debug your program. Note that helgrind does not recognize thin locks as locks and will report spurious data races, so do not run it on a version of your program that uses thin locks.
If it is helpful, you can re-use code from the main.c files that were distributed with Lab 5 and Lab 8.
Put all the source code for the text-file analysis program in the file prog4.c.
Your program will be graded primarily by testing it for correct functionality.
Remember, you may lose points if your program is not properly structured or adequately documented. Coding guidelines are given on the course overview webpage.
Your programs will be graded using agate.cs.unh.edu so be sure to test in that environment. Your programs will be compiled using these gcc flags: -g -Wall -std=c99 -pthread.
Your programs should be submitted for grading from agate.cs.unh.edu.
To turn in this assignment, type:
~cs520/bin/submit prog4 symtab.c prog4.c
Submissions can be checked by typing:
~cs520/bin/scheck prog4
This assignment is due Wednesday April 5. The standard late policy will be in effect. See the course overview webpage.
Remember: as always you are expected to do your own work on this assignment. Copying code from another student or from sites on the internet is explicitly forbidden!
Comments and questions should be directed to pjh@cs.unh.edu