You are to write two programs: one to compress ASCII files and one to un-compress compressed files.
Write a program, ca7, that will read a stream of ASCII characters from stdin and will write a compressed form of those characters to stdout.
The compression program should leverage the fact that ASCII only uses the bottom (least-significant) seven bits of an 8-bit byte. The compression program should simply squeeze out the 8th bit. That is, the program should consider the output to be a stream of bits and the 7 data bits from each input byte should simply be sent to the output bit stream, with the 8th bit being discarded. The 7 data bits should be sent to the output stream in order of increasing significance, bit 0 (least significant) first, bit 1 next, and so on.
Of course, the output is really a sequence of bytes, so the logical output bit stream must be packed into output bytes. The bit stream is packed into bytes by assigning bits to increasing bit positions (increasing significance) within a byte, and moving to the next byte when the upper bit of the current byte has been filled. At end of file, if there are empty bit positions in the current byte, fill them with zeros before outputting the byte.
As an example, a file consisting of the bytes 7F 03 1A 33 should be "compressed" to the file containing FF 81 66 06. (This file is too small to gain anything from the simple compression technique.)
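The packing rule can be sketched on an in-memory buffer as follows. This is only an illustration under stated assumptions: the function name pack7 and its buffer-based interface are hypothetical (the assignment itself requires reading stdin and writing stdout), and the sketch does no error reporting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the assignment: packs the low 7 bits of
 * each of the n input bytes into out, least-significant bit first.
 * Returns the number of output bytes written.
 * out must have room for at least (7*n + 7) / 8 bytes. */
size_t pack7(const unsigned char *in, size_t n,
             unsigned char *out, size_t out_cap)
{
    unsigned int acc = 0;  /* pending bits; the oldest bit sits in bit 0 */
    int nbits = 0;         /* number of pending bits held in acc */
    size_t written = 0;

    assert(out_cap >= (7 * n + 7) / 8);

    for (size_t i = 0; i < n; i++) {
        /* Drop the 8th bit and append the 7 data bits above the pending ones. */
        acc |= (unsigned int)(in[i] & 0x7F) << nbits;
        nbits += 7;
        if (nbits >= 8) {
            /* A full output byte is available: emit the low 8 bits. */
            out[written++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            nbits -= 8;
        }
    }
    if (nbits > 0) {
        /* Final partial byte: the unused upper positions are already zero. */
        out[written++] = (unsigned char)(acc & 0xFF);
    }
    return written;
}
```

Applied to the four bytes 7F 03 1A 33, this sketch produces FF 81 66 06, matching the example above. A streaming version would keep acc and nbits across calls to getchar and putchar instead of working on arrays.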
If the ca7 program encounters a byte with the upper bit set, it should report an error that includes the byte offset of the offending byte. (The first byte is at offset 0, the next byte is at offset 1, etc.) However, the bit should still be discarded and the compression should continue.
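One possible way to detect and report such bytes is sketched below. The function name report_non_ascii, the choice of stream for the message, and the message wording are all assumptions; the assignment only requires that the byte offset be included.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of the assignment: writes one message to err
 * for every byte whose upper (8th) bit is set, and returns how many such
 * bytes were found. Offsets are counted from 0. The message text is an
 * assumed format. */
size_t report_non_ascii(const unsigned char *in, size_t n, FILE *err)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i] & 0x80) {
            fprintf(err, "ca7: byte 0x%02X at offset %zu has its upper bit set\n",
                    (unsigned int)in[i], i);
            count++;
        }
    }
    return count;
}
```

In a program that reads stdin one character at a time, the same check would be made on each byte as it arrives, with a counter incremented per byte serving as the offset. Compression continues either way, since masking with 0x7F discards the bit.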
Write a program, ua7, that will read a compressed file from stdin (one that was produced by running ca7) and will un-compress it. The program should simply reproduce the ASCII file from which its input was derived (assuming the original file was a valid ASCII file with the 8th bit always 0). For example, if ua7 is given the bytes FF 81 66 06, it should output 7F 03 1A 33.
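The reverse direction can be sketched the same way. As before, the name unpack7 and the buffer-based interface are hypothetical and chosen only for illustration; the sketch also ignores any leftover bits rather than validating them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the assignment: splits the bit stream held
 * in the n bytes of in into 7-bit characters, least-significant bit first.
 * Returns the number of characters written to out.
 * out must have room for at least (8*n) / 7 bytes. */
size_t unpack7(const unsigned char *in, size_t n,
               unsigned char *out, size_t out_cap)
{
    unsigned int acc = 0;  /* pending bits; the oldest bit sits in bit 0 */
    int nbits = 0;         /* number of pending bits held in acc */
    size_t written = 0;

    assert(out_cap >= (8 * n) / 7);

    for (size_t i = 0; i < n; i++) {
        /* Append the 8 new bits above the pending ones. */
        acc |= (unsigned int)in[i] << nbits;
        nbits += 8;
        while (nbits >= 7) {
            /* A full character is available: emit the low 7 bits. */
            out[written++] = (unsigned char)(acc & 0x7F);
            acc >>= 7;
            nbits -= 7;
        }
    }
    /* Any bits still pending here (fewer than 7) are treated as padding. */
    return written;
}
```

Applied to FF 81 66 06, this sketch yields 7F 03 1A 33. Note that it emits a character whenever 7 bits are available, so it cannot tell seven zero padding bits apart from a final NUL character; the assignment text does not say how that case should be handled.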
The only possibility for an error in a compressed file is if the last byte contains "extra" bits that are not set to zero. As described above, "extra" bits occur when the number of data bits in the original ASCII file is not evenly divisible by 8. The ca7 program should have set those bits to zero, but if you encounter a 1 in one of those bits, issue an appropriate error message.
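The check for nonzero "extra" bits might look like the sketch below. The function name has_bad_padding is hypothetical, and the sketch assumes the whole compressed input is available in a buffer. It also assumes that fewer than 7 leftover bits are always padding, which is an interpretation rather than something the assignment states.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the assignment: returns 1 if the unused
 * upper bits of the last compressed byte contain a 1, and 0 otherwise. */
int has_bad_padding(const unsigned char *in, size_t n)
{
    if (n == 0) {
        return 0;  /* empty input has no padding to check */
    }

    /* n bytes hold 8*n bits; after removing whole 7-bit characters,
     * (8*n) % 7 bits are left over, which equals n % 7. */
    unsigned int extra = (unsigned int)(n % 7);
    if (extra == 0) {
        return 0;  /* the bits divide evenly into 7-bit characters */
    }

    /* The leftover bits occupy the top `extra` positions of the last byte. */
    return (in[n - 1] >> (8 - extra)) != 0;
}
```

For the four-byte example FF 81 66 06 there are 32 bits, of which 28 form four characters, leaving the top 4 bits of the last byte as padding. Those bits are zero in 06, so the check passes; a last byte of 16 would fail it.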
The two programs will be worth equal credit: each is worth 50% of the points for the assignment.
Your programs will be graded primarily by testing them for correct functionality. However, you may lose points if a program is not properly structured or adequately documented.
Before starting, be sure you understand that the output of ca7 is a binary file (not an ASCII file) and is most likely not viewable using the standard editing tools or by sending it to an ASCII display device.
You may find using the od command on agate helpful for analyzing input and output files. In particular, using the -tx1 flag will display the bytes of a file, one byte at a time, in hexadecimal.
You must write your programs in C. The source for program ca7 should be placed into ca7.c. The source for program ua7 should be placed into ua7.c.
By the end of the lab on Friday September 3, you should have a fairly complete implementation of ca7. Prior to the lab, at least do a design for the program and start the implementation. Use the lab to get any questions answered, to complete the implementation, and to do the initial testing. Your lab submission will be graded using only one test file: ~cs520/public/prog1/test1.
By the beginning of the lab on Friday September 10, you should also have a fairly complete implementation of ua7. This means you should design and implement the program prior to coming to lab. Use the lab to do your final testing.
Your programs will be graded using agate.cs.unh.edu so be sure to test in that environment.
Your programs should be submitted for grading from agate.cs.unh.edu.
To turn in this assignment, type:
~cs520/bin/submit prog1 ca7.c ua7.c
Do not turn in any non-ASCII files (i.e., no object files, no executable files, etc.).
Submissions can be checked by typing on agate.cs.unh.edu:
~cs520/bin/scheck prog1
To receive full credit for the assignment, you must turn in your files prior to 8am on Monday September 13. Programming assignments may be handed in late at a penalty of 2 points for one day late, 5 points for two days late, 10 points for three days late, 20 points for four days late, and 40 points for five days late. No program may be turned in more than 5 days late.
Remember: as always you are expected to do your own work on this assignment.
Comments and questions should be directed to hatcher@unh.edu