csnotes/370/notes/huffman.md
2019-03-28 14:51:54 -07:00

Huffman codes

Covering: Fixed length encoding & Variable length encoding

Fixed length encoding

Consider ASCII or a fixed-width Unicode encoding such as UCS-2, where every symbol is 8 or 16 bits wide respectively, regardless of how often it occurs.

Huffman Trees

We build a tree from character frequencies, where each node holds a character and that character's frequency.

struct Node {
	uint8_t c;
	size_t frequency;
	...
};
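Before any tree exists we need the frequencies themselves. A minimal sketch of that tally, assuming the input is a byte buffer; `FreqEntry` and `count_frequencies` are hypothetical names that reuse only the two fields shown in `Node` above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Node, keeping only the two fields the notes show. */
struct FreqEntry {
    uint8_t c;
    size_t frequency;
};

/* Fill `out` (one entry per possible byte value) with how often each byte
 * occurs in `data`. Returns the number of distinct bytes seen. */
size_t count_frequencies(const uint8_t *data, size_t len, struct FreqEntry out[256]) {
    size_t distinct = 0;
    for (size_t i = 0; i < 256; i++) {
        out[i].c = (uint8_t)i;
        out[i].frequency = 0;
    }
    for (size_t i = 0; i < len; i++) {
        /* The first time a byte shows up, its count is still 0. */
        if (out[data[i]].frequency++ == 0) {
            distinct++;
        }
    }
    return distinct;
}
```

Entries whose frequency stays 0 would simply be skipped when the nodes are created.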

Rules of thumb:

* More frequent characters are close to the root
* Less frequent characters are found far from the root

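We'll end up with a list of nodes, which we can throw into a min-heap to build our Huffman tree: the two least frequent nodes get merged first, which is what pushes rare characters far from the root.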
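A rough sketch of the heap-based build, using the standard construction that repeatedly merges the two least frequent nodes (so the heap here is a min-heap). The names `HuffNode`, `MinHeap`, `build_tree` and `encoded_bits`, and the `left`/`right` child pointers, are assumptions for illustration and not taken from the notes:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node layout: the notes only show `c` and `frequency`;
 * the child pointers are an assumption for this sketch. */
struct HuffNode {
    uint8_t c;                      /* meaningful only in leaves */
    size_t frequency;               /* internal nodes: sum of the children */
    struct HuffNode *left, *right;  /* both NULL in a leaf */
};

/* Binary min-heap of node pointers keyed on frequency.
 * It never holds more than 256 nodes (one per byte value). */
struct MinHeap {
    struct HuffNode *items[256];
    size_t size;
};

static void heap_push(struct MinHeap *h, struct HuffNode *n) {
    size_t i = h->size++;
    /* Sift up: shift more frequent parents down to make room for n. */
    while (i > 0 && h->items[(i - 1) / 2]->frequency > n->frequency) {
        h->items[i] = h->items[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->items[i] = n;
}

/* Remove and return the least frequent node; the heap must not be empty. */
static struct HuffNode *heap_pop(struct MinHeap *h) {
    struct HuffNode *top = h->items[0];
    struct HuffNode *last = h->items[--h->size];
    size_t i = 0;
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->size) break;
        /* Pick the less frequent of the two children. */
        if (child + 1 < h->size &&
            h->items[child + 1]->frequency < h->items[child]->frequency) child++;
        if (last->frequency <= h->items[child]->frequency) break;
        h->items[i] = h->items[child];
        i = child;
    }
    h->items[i] = last;
    return top;
}

static struct HuffNode *new_node(uint8_t c, size_t frequency,
                                 struct HuffNode *left, struct HuffNode *right) {
    struct HuffNode *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->c = c;
    n->frequency = frequency;
    n->left = left;
    n->right = right;
    return n;
}

/* Build a tree from a 256-entry frequency table.
 * Returns NULL if every frequency is zero. */
struct HuffNode *build_tree(const size_t freq[256]) {
    struct MinHeap heap = { .size = 0 };
    for (size_t i = 0; i < 256; i++) {
        if (freq[i] > 0) heap_push(&heap, new_node((uint8_t)i, freq[i], NULL, NULL));
    }
    if (heap.size == 0) return NULL;
    while (heap.size > 1) {
        /* Merge the two least frequent nodes under a character-less parent. */
        struct HuffNode *a = heap_pop(&heap);
        struct HuffNode *b = heap_pop(&heap);
        heap_push(&heap, new_node(0, a->frequency + b->frequency, a, b));
    }
    return heap_pop(&heap);
}

/* Total encoded size in bits: each leaf costs frequency * depth. */
size_t encoded_bits(const struct HuffNode *n, size_t depth) {
    if (n == NULL) return 0;
    if (n->left == NULL && n->right == NULL) return n->frequency * depth;
    return encoded_bits(n->left, depth + 1) + encoded_bits(n->right, depth + 1);
}
```

For the frequencies of "abracadabra" (a: 5, b: 2, r: 2, c: 1, d: 1), `encoded_bits(root, 0)` comes out to 23 bits, against 88 bits at a fixed 8 bits per symbol. The exact tree shape depends on how ties are broken, but the total does not.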

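The general process goes like this:

1. Get frequencies of all symbols
2. Put those frequencies of symbols into a structure like above
3. Build a min-heap of the node set, then keep merging the two least frequent nodes until a single tree remains
	Keep in mind that the root should not hold a character of its own, so that a bit string can start with either 0 or 1
4. To decode, walk down from the root one bit at a time; when we reach a leaf we drop that char into the result and start again from the root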
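The decoding walk in the last step can be sketched like this. It assumes the bits arrive as a string of '0' and '1' characters, that '0' means "go left" and '1' means "go right" (the notes do not fix a convention), and the same hypothetical `HuffNode` layout with child pointers:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout; only `c` and `frequency` appear in the notes. */
struct HuffNode {
    uint8_t c;
    size_t frequency;
    const struct HuffNode *left, *right;  /* both NULL in a leaf */
};

/* Decode `bits` (a string of '0' and '1' characters) into `out`.
 * Returns the number of characters written, or -1 if the input is
 * malformed, ends in the middle of a code, or `out` is too small. */
long decode(const struct HuffNode *root, const char *bits, char *out, size_t out_cap) {
    const struct HuffNode *n = root;
    size_t written = 0;
    if (out_cap == 0) return -1;
    /* The root must be a character-less internal node. */
    if (root == NULL || (root->left == NULL && root->right == NULL)) return -1;
    for (const char *p = bits; *p != '\0'; p++) {
        if (*p == '0') n = n->left;
        else if (*p == '1') n = n->right;
        else return -1;
        if (n == NULL) return -1;
        if (n->left == NULL && n->right == NULL) {
            /* Reached a leaf: emit its character and restart at the root. */
            if (written + 1 >= out_cap) return -1;
            out[written++] = (char)n->c;
            n = root;
        }
    }
    if (n != root) return -1;
    out[written] = '\0';
    return (long)written;
}
```

With a tree where `a` is the root's left child and `b` and `c` hang under its right child, the bits `010110` split as `0 | 10 | 11 | 0` and decode to `abca`.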