(Last revised 9/28 at 2:30pm) This example illustrates the encoded file format used for Project 2. Let's assume a very simple input file, with 12 bytes whose contents are: HelloTWorld! where the "T" is actually represents a tab character. (The tab character is ASCII code 09 in hexidecimal. I will use Hexidecimal values to simplify description of the output file in the end.) We first deduce the following frequency table from the 12 characters in the input file. char freq ASCII val (in hex) T (tab) 1 09 ! 1 21 H 1 48 W 1 57 d 1 64 e 1 65 l 3 6C o 2 6F r 1 72 From this, we build the huffman coding tree. I use the convention that equal-valued characters go onto the list in ascending order of ASCII code value (thus the initial list is in order ) and that created trees go back onto the list at the earliest possible point in the case of ties. The tree I build yields the following code table: char freq ASCII code T (tab) 1 09 1110 ! 1 21 1111 H 1 48 1100 W 1 57 1101 d 1 64 0110 e 1 65 0111 l 3 6C 10 o 2 6F 00 r 1 72 010 The output file consists of the following: * A one byte quantity that specifies the number of codes. * A series of codes of the form . * The codes themselves * A (byte aligned) four byte quantity that specifies the number of encoded characters in the encoded message. * The encoded message. The logical values for these components, with byte boundaries shown with a bar (|), are: number of codes (in decimal): 9 code values and lengths (in hex): 09|4|21|4|48|4|57|4|64|4|65|4|6C|2|6F|2|72|3 codes themselves (in binary): 11101111|11001101|01100111|1000010x number of coded characters (in decimal): 12 the encoded message (in binary): 11000111|10100011|10110100|01010011|01111xxx Note that the codes and the message are padded with 0s to fill out the bytes. To indicate where these pad bits are, I show them with the value "x". The actual file itself will contain the following, with all values specified in Hexidecimal: 09 09 04 21 04 48 04 57 04 64 04 65 04 6C 02 6F 02 72 03 EF CD 67 84 0C 00 00 00 C7 A3 B4 53 78 Don't forget that the number of coded characters is a 4 byte quantity. I am assuming that the low-order byte is the left-most one, which is why the integer (decimal) value 12 is shown as the 4 byte values 0C 00 00 00. The output file will contain a total of 32 bytes, only 5 of which are the actual encoded message itself. The rest is overhead to describe the context.