Programs: hencode and hdecode Assignment 3

$24.99 $18.99

“I’d crawl over an acre of ’Visual This++’ and ’Integrated Development That’ to get to gcc, Emacs, and gdb. Thank you.” (By Vance Petree, Virginia Power) | /usr/games/fortune Programs: hencode and hdecode This assignment is to build a le compression tool, hencode, that will use Hu – man codes to compress a given le, and…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Categorys:

Description

5/5 – (2 votes)

“I’d crawl over an acre of ’Visual This++’ and ’Integrated Development That’ to get to gcc, Emacs, and gdb. Thank you.” (By Vance Petree, Virginia Power)

| /usr/games/fortune

Programs: hencode and hdecode

This assignment is to build a le compression tool, hencode, that will use Hu – man codes to compress a given le, and a le decompression tool, hdecode, that will uncompress a le compressed with hencode.

Usage:

hencode in le [ out le ]

hdecode [ ( in le j – ) [ out le ] ]

hencode must:

take a command line argument that is the name of the le to be com-pressed; and

take a second, optional, command line argument that is the name of the out le. (If this argument is missing, the output should go to stdout.)

properly encode its input into a binary Hu man-coded le according to the algorithm below.

hdecode must:

Take an optional command line argument that is the name of the le to be uncompressed. If this argument is missing, or if the input le name is \-“, hdecode will take its input from standard input.

Take a second, optional, command line argument that is the name of the output le. (If this argument is missing, the output should go to stdout.)

Properly decode its input and restore the original data.

Both must:

use only Unix unbu ered IO (read(2) and write(2)) for reading and writing the les; and

share as much of the code base as is reasonable between the two programs; and

demonstrate robust programming techniques including proper handling of errors; and

adhere to a \reasonableness” standard with respect to performance.

See also the \Pesky Details” section below.

Hu man Codes

In 1952, David A. Hu man proposed a method for compressing les by gener-ating a variable length encoding of characters based on the relative frequency with which each character occurs in a le. This encoding is what is known as pre xless code because no letter encoding is a pre x of any other encoding. In practice, this means that the coding can be stored in a binary tree with the characters at the leaves. Each character’s encoding can be determined by look-ing at the sequence of right or left child node transitions and representing those as either zero bits or one bits.

Generating the Tree

Note: Be sure to follow the tiebreaker conventions outlined in this document. If you do something di erent, you will still get a valid encoding of your le, but will be di erent from the reference one. That means that both programs will have to work perfectly all the way through in order to be tested. If you follow the tiebreakers partial credit becomes a possibility.

To build the tree, it is rst necessary to generate a histogram of all the characters in the le, remembering that any value from 0 to 255 represents a valid byte. We will demonstrate the algorithm through the following encoding example.

If the le you are encoding consists of the string \bmmmcccccttttttaaaaaaa”

the histogram will be:

Byte

Count

0x61

(’a’)

7

0x62

(’b’)

1

0x63

(’c’)

5

0x6d (’m’)

3

0x74

(’t’)

6

Take these counts and generate a linked list of them in ascending order of frequency. If there is a tie in frequency, do a secondary sort and place the characters also in ascending order. For the histogram above, this list will look like:

0x62 (‘b’) 0x6d (‘m’) 0x63 (‘c’) 0x74 (‘t’) 0x61 (‘a’)

1 3 5 6 7

Now, remove the rst two nodes on the list and construct a new node whose frequency is the sum of the two removed nodes. Make the rst removed node the left child of the new node and the second removed node the right subchild:

4

0

1

0x62 (‘b’)

0x6d (‘m’)

1

3

Next, reinsert the new node into the remaining list of nodes in order. If there is a frequency tie, insert the new node before the other node in the list. After reinsertion of the new node, our list will look like:

4

0x63 (‘c’)

0x74 (‘t’)

0x61 (‘a’)

5

6

7

0

1

0x62 (‘b’)

0x6d (‘m’)

1

3

Repeat the process until there is only one node left in the list. The rest of the process for this le is shown in Figure 1.

Now that you have the tree, to encode the le all that is necessary is to do a pass over the tree to extract the encodings into a table, then re-read the input le and translate each input character into the appropriate sequence of bits. If the le ends with a partially lled byte, pad the nal byte with zeros.

The codes extracted from the tree above will be:

0x74 (‘t’)

0x61 (‘a’)

9

6

7

0

1

4

0x63 (‘c’)

5

0

1

0x62 (‘b’)

0x6d (‘m’)

1

3

(a) After a second pass

9

13

0

1

0

1

4

0x63

(‘c’)

0x74 (‘t’)

0x61 (‘a’)

5

6

7

0

1

0x62 (‘b’)

0x6d (‘m’)

1

3

(b) After a third pass

22

0

1

9

13

0

1

0

1

4

0x63 (‘c’)

0x74 (‘t’)

0x61 (‘a’)

5

6

7

0

1

4

0x62 (‘b’)

0x6d (‘m’)

1

3

0x61 (’a’): 11

0x62 (’b’): 000

0x63 (’c’): 01

0x6d (’m’): 001

0x74 (’t’): 10

File Format

The output format for the encoded le consists of two parts. First, comes a header that contains the frequency information necessary to re-create the tree, then the bits of the encoded le.

The rst four bytes of the header consists of an integer containing the number of characters in the frequency table. (Remember, not all les will contain all 256 possible bytes.) Following this comes the frequency table, in alphabetical order1. Each element of the frequency table consists of a single byte that it the byte itself, followed by a four-byte integer that is the frequency.

zero bit, go left, for each one bit, go right. Because this is a pre xless code, you will know you have decoded a character when you reach a leaf. Output this character and start again at the root.

You know how many characters are in the output from the counts encoded in the frequency table.

Tricks and Tools

xxd(1)

Useful for reading binary les to see what’s what.

open(2)

The Unix basic IO functions. Check out the man pages.

close(2)

read(2)

write(2)

lseek(2)

Figure 3: Some potentially useful library functions

Some potentially useful library functions listed in Figure 3. Also, it might be helpful to know that <stdint.h> de nes a set of xed-width data types which are more portable than ints when you really need to know how big something is. These exist in signed and unsigned versions named intXX t and uintXX t, where XX is the number of bits. For example, uint32 t size makes size a 32-bit unsigned integer.

Coding Standards and Make

See the pages on coding standards and make on the cpe 357 class web page.

Pesky Details

Endianness

Because you are reading and writing binary data, the endianness of integers will be apparent. For this assignment you do not need to be concerned with this, but know that les encoded on a big-endian machine will not decode properly on a little-endian one (and vice-versa). Note that the x86 (vogon) is little-endian but if you’re developing on a PowerPC Mac, it’s big-endian.

Bits and their order

If it is necessary to pad the nal byte of the le, pad it with zero bits.

When lling bits, ll from the high order bits. That is, if the nal four bits of an encoded le are 1010, the nal byte of the le will be 0xA0.

What to turn in

Submit via handin on the Unix servers to the asgn3 directory of the ngonella account:

your well-documented source les.

A make le (called Makefile) that will build your programs when given the command \make all”.

A README le that contains: { Your name.

{ Any special instructions for running your program.

{ Any other thing you want me to know while I am grading it.

The README le should be plain text, i.e, not a Word document, and should be named \README”, all capitals with no extension, ‘.md‘, or ‘.txt‘.

Sample runs

Below are some sample runs of hencode and hdecode.

Here is an example of encoding the le used in the example above:

  • htable testfile 0x61: 11

0x62: 000

0x63: 01

0x6d: 001

0x74: 10

  • hencode testfile testfile.huff

  • hdecode testfile.huff testfile.out

  • diff testfile testfile.out

  • ls -l

total 12

-rw——-

1

ngonella ngonella 22

Oct 18

15:00 testfile

-rw——-

1

ngonella ngonella

35

Oct

18

15:02 testfile.huff

-rw——-

1

ngonella ngonella

22

Oct

18

15:02 testfile.out

Note that, because testfile is so small, this made the \compressed” le bigger. Consider this example with a larger le, the class notes for cpe357 so far:

% ls -l class.ps

-rw——- 1 ngonella ngonella 339520 Oct 18 15:04 class.ps

  • hencode class.ps class.ps.huff

  • ls -l class.*

-rw——-

1

ngonella ngonella

339520

Oct

18

15:04 class.ps

-rw——-

1

ngonella ngonella

217162

Oct

18

15:05 class.ps.huff

%

A reduction of 36% isn’t so bad.

8

Programs: hencode and hdecode Assignment 3
$24.99 $18.99