CS611
Programming Assignment 1
Spring 2001


You are to write two programs: one to compress ASCII files and one to un-compress compressed files.

Write a program, ca7, that will read a stream of ASCII characters from stdin and will write a compressed form of those characters to stdout.

The compression program should leverage the fact that ASCII only uses the bottom (least-significant) seven bits of an 8-bit byte. The compression program should simply squeeze out the 8th bit. That is, the program should consider the output to be a stream of bits and the 7 data bits from each input byte should simply be sent to the output bit stream, with the 8th bit being discarded. The 7 data bits should be sent to the output stream in order of increasing significance, bit 0 (least significant) first, bit 1 next, and so on.

Of course, the output is really a sequence of bytes, so the logical output bit stream must be packed into output bytes. The bit stream is packed into bytes by assigning bits to increasing bit positions (increasing significance) within a byte, and moving to the next byte when the upper bit of the current byte has been filled. At end of file, if there are empty bit positions in the current byte, fill them with zeros before outputting the byte.

As an example, a file consisting of the bytes 7F 03 1A 33 should be "compressed" to the file containing FF 81 66 06. (This file is too small to gain anything from the simple compression technique.)
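
The packing rule can be sketched as a small C function that works on an in-memory buffer. This is only an illustration under stated assumptions, not a required design: the names pack7 and packed_size and the buffer-based interface are hypothetical, and an actual ca7 must read from stdin and write to stdout.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold n 7-bit characters: ceil(7n / 8). */
size_t packed_size(size_t n)
{
    return (7 * n + 7) / 8;
}

/*
 * Hypothetical helper: pack the low 7 bits of each of the n bytes in
 * `in` into `out`, least-significant bit first. Returns the number of
 * bytes written. `out_cap` is the capacity of `out` in bytes.
 */
size_t pack7(const unsigned char *in, size_t n,
             unsigned char *out, size_t out_cap)
{
    unsigned int acc = 0;  /* pending bits; bit 0 is the oldest */
    int nbits = 0;         /* number of pending bits in acc */
    size_t i, written = 0;

    assert(out_cap >= packed_size(n));

    for (i = 0; i < n; i++) {
        /* Discard bit 7 and append the 7 data bits above the pending ones. */
        acc |= (unsigned int)(in[i] & 0x7F) << nbits;
        nbits += 7;
        if (nbits >= 8) {  /* a full output byte is ready */
            out[written++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            nbits -= 8;
        }
    }
    if (nbits > 0) {
        /* Final partial byte; its unused upper bits are already zero. */
        out[written++] = (unsigned char)(acc & 0xFF);
    }
    return written;
}
```

Calling pack7 on the four bytes 7F 03 1A 33 writes the four bytes FF 81 66 06, matching the example above.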

If the ca7 program encounters a byte with the upper bit set, it should report an error that includes the byte offset of the offending byte. (The first byte is at offset 0, the next byte is at offset 1, etc.) The bit should still be discarded, however, and the compression should continue.
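
One possible shape for the offset report is sketched below. The function name report_high_bits, the message wording, and the choice to pass the output stream as a parameter are all assumptions made for illustration; the assignment does not prescribe any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan n bytes and print one message to `err` for
 * each byte whose upper bit (0x80) is set. `base` is the file offset of
 * buf[0], so offsets stay correct when the input is processed in chunks.
 * Returns the number of offending bytes found.
 */
size_t report_high_bits(const unsigned char *buf, size_t n,
                        unsigned long base, FILE *err)
{
    size_t i, bad = 0;

    assert(err != NULL);

    for (i = 0; i < n; i++) {
        if (buf[i] & 0x80) {
            /* Message format is an assumption, not part of the handout. */
            fprintf(err, "ca7: byte 0x%02X at offset %lu has the upper bit set\n",
                    (unsigned int)buf[i], base + (unsigned long)i);
            bad++;
        }
    }
    return bad;
}
```

Passing stderr as the last argument keeps the messages out of the compressed data written to stdout.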

Write a program, ua7, that will read a compressed file from stdin (one that was produced by running ca7) and will un-compress it. The program should simply reproduce the ASCII file from which its input was derived (assuming the original file was a valid ASCII file with the 8th bit always 0). For example, if ua7 is given the bytes FF 81 66 06 it should output 7F 03 1A 33.
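
The reverse operation can be sketched the same way, again on an in-memory buffer. The name unpack7 and its interface are hypothetical. The sketch also assumes that every complete group of 7 bits in the compressed data is a character; the text above does not say how to treat a final byte whose padding happens to be a full 7 zero bits, so under this assumption such a file would decode with one extra NUL character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: unpack n compressed bytes from `in` into 7-bit
 * characters in `out`, taking bits least-significant first. Returns the
 * number of characters written. If `leftover` is not NULL, it receives
 * the bits left over after the last complete character (the padding).
 */
size_t unpack7(const unsigned char *in, size_t n,
               unsigned char *out, size_t out_cap,
               unsigned int *leftover)
{
    unsigned int acc = 0;  /* pending bits; bit 0 is the oldest */
    int nbits = 0;         /* number of pending bits in acc */
    size_t i, written = 0;

    assert(out_cap >= (8 * n) / 7);

    for (i = 0; i < n; i++) {
        acc |= (unsigned int)in[i] << nbits;  /* append 8 new bits */
        nbits += 8;
        while (nbits >= 7) {                  /* emit 1 or 2 characters */
            out[written++] = (unsigned char)(acc & 0x7F);
            acc >>= 7;
            nbits -= 7;
        }
    }
    if (leftover != NULL) {
        *leftover = acc;  /* nonzero means the padding bits were not all 0 */
    }
    return written;
}
```

Given the bytes FF 81 66 06, unpack7 writes 7F 03 1A 33 and leaves a leftover value of 0.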

The only possibility for an error in a compressed file is if the last byte contains "extra" bits that are not set to zero. As described above, "extra" bits occur when the number of data bits in the original ASCII file is not evenly divisible by 8. The ca7 program should have set those bits to zero, but if you encounter a 1 in one of those bits, issue an appropriate error message.
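
As a worked example of the arithmetic: a 4-byte compressed file holds 32 bits, which is four 7-bit characters (28 bits) plus 4 "extra" bits in the upper positions of the last byte. The sketch below checks those bits. The function names are hypothetical, and the sketch assumes the number of extra bits is the remainder of 8n divided by 7 for an n-byte compressed file, that is, that every complete group of 7 bits is treated as a character.

```c
#include <assert.h>
#include <stddef.h>

/* Number of "extra" bits at the top of the last byte of an n-byte file. */
int padding_bits(size_t n)
{
    return (int)((8 * n) % 7);
}

/*
 * Hypothetical helper: returns 1 if the extra bits in the last byte of
 * an n-byte compressed file are all zero, and 0 otherwise.
 */
int padding_is_clean(unsigned char last, size_t n)
{
    int pad;
    unsigned int mask;

    assert(n > 0);  /* an empty file has no last byte */

    pad = padding_bits(n);                  /* 0..6 */
    mask = (0xFFu << (8 - pad)) & 0xFFu;    /* the top `pad` bits of a byte */
    return (last & mask) == 0;
}
```

For the file FF 81 66 06, the last byte 06 has its upper four bits clear, so the check passes; a last byte of 16 would fail it.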

The two programs will be worth equal credit: each is worth 50% of the points for the assignment.

Your programs will be graded primarily by testing them for correct functionality. However, you may lose points if your programs are not properly structured or adequately documented.

Before starting, be sure you understand that the output of ca7 is a binary file (not an ASCII file) and is most likely not viewable using the standard editing tools or by sending it to an ASCII display device.

You may find using the od command on alberti helpful for analyzing input and output files. In particular, using the -tx1 flag will display the bytes of a file, one byte at a time, in hexadecimal.

You must write your programs in C. The source for program ca7 should be placed into ca7.c. The source for program ua7 should be placed into ua7.c.

You must submit a Makefile (called "Makefile") so that I can conveniently build your programs. The Makefile goal of "ca7" should build an executable called "ca7" and the goal of "ua7" should build an executable called "ua7". Your programs will be graded using an Alpha machine (e.g. alberti) so be sure to test in that environment.

Your programs should be submitted for grading from alberti (or hypatia or hopper). To turn in this assignment, type:
~cs611/bin/submit prog1 ca7.c ua7.c Makefile

Do not turn in any non-ASCII files (e.g., object files or executable files).

Submissions can be checked by typing:
~cs611/bin/scheck prog1

To receive full credit for the assignment, you must turn in your files prior to 8am on Monday February 5. Late submissions will be accepted at a penalty of 5% per day, up to one week late.

Remember: as always you are expected to do your own work on this assignment.


Last modified on January 25, 2001.

Comments and questions should be directed to pjh@cs.unh.edu