Huffman codes
Covering: Fixed length encoding & Variable length encoding
Fixed length encoding
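Consider ASCII or a fixed-width Unicode encoding such as UCS-2, where every symbol has the same width (8 bits for ASCII as it is usually stored, 16 bits for UCS-2) no matter how often it occurs.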
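To put a number on "fixed length", here is a minimal sketch that computes the smallest width able to distinguish a given number of symbols, and the size of a message at a given width. The function names `fixed_width_bits` and `fixed_length_size` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest fixed code width, in bits, that can tell apart `alphabet_size`
 * distinct symbols. This is ceil(log2(alphabet_size)), which equals the
 * bit length of (alphabet_size - 1); a one-symbol alphabet still gets 1 bit. */
unsigned fixed_width_bits(size_t alphabet_size)
{
    assert(alphabet_size > 0);
    unsigned bits = 0;
    size_t n = alphabet_size - 1;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits ? bits : 1;
}

/* Total size in bits of a message when every symbol has the same width. */
size_t fixed_length_size(size_t message_len, unsigned bits_per_symbol)
{
    return message_len * bits_per_symbol;
}
```

For example, `fixed_width_bits(128)` gives 7 (the ASCII range), and a 7-character message stored at 8 bits per symbol takes `fixed_length_size(7, 8)` = 56 bits, however skewed the character counts are.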
Huffman Trees
We create a tree of character frequencies where each leaf holds a character and that character's frequency; an internal node holds the combined frequency of everything below it.
struct Node {
    uint8_t c;          /* the character (symbol) */
    size_t frequency;   /* how often it occurs in the input */
    ...
};
Rules of thumb:
* The more frequent characters are close to the root (so they get shorter codes)
* Less frequent characters are found far from the root (so they get longer codes)
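The rules of thumb can be checked by measuring each leaf's depth, since a character's depth is the length of its code in bits. A rough sketch follows; `HuffNode` and `code_lengths` are hypothetical names, and the `left`/`right` child pointers are an assumption about what the elided part of the node struct holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: the child pointers are an assumption. */
typedef struct HuffNode {
    uint8_t c;                      /* symbol; meaningful only in leaves */
    size_t frequency;               /* count, or sum of both children */
    struct HuffNode *left, *right;  /* both NULL in a leaf */
} HuffNode;

/* Record the depth of every leaf under `node` in `lengths`, indexed by
 * symbol. The caller zeroes `lengths` first and passes depth 0 for the root. */
void code_lengths(const HuffNode *node, unsigned depth, unsigned lengths[256])
{
    assert(lengths != NULL);
    if (node == NULL)
        return;
    if (node->left == NULL && node->right == NULL) {
        lengths[node->c] = depth;   /* leaf: depth == code length */
        return;
    }
    code_lengths(node->left, depth + 1, lengths);
    code_lengths(node->right, depth + 1, lengths);
}
```

With counts a=5, b=2, c=1, a Huffman tree puts a directly under the root and b and c one level lower, so the lengths come out as 1, 2 and 2. The message "aaaaabbc" then costs 5*1 + 2*2 + 1*2 = 11 bits instead of 64 at 8 bits per symbol.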
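We'll end up with a list of nodes which we can throw into a min-heap to build our Huffman tree: repeatedly pop the two least frequent nodes, join them under a new parent whose frequency is their sum, and push the parent back until only one node (the root) is left.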
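A minimal sketch of that build step, assuming a byte alphabet and the standard approach of always merging the two least frequent nodes. `HuffNode`, `MinHeap`, `heap_push`, `heap_pop`, `new_node` and `build_tree` are all names invented here, and the child pointers are an assumption about the rest of the node struct. Allocation failures just trip an `assert` to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node layout: the child pointers are an assumption. */
typedef struct HuffNode {
    uint8_t c;                      /* symbol; meaningful only in leaves */
    size_t frequency;               /* count, or sum of both children */
    struct HuffNode *left, *right;  /* both NULL in a leaf */
} HuffNode;

/* Array-backed binary min-heap ordered by frequency. */
typedef struct {
    HuffNode *items[256];           /* never holds more than 256 nodes */
    size_t len;
} MinHeap;

static void heap_push(MinHeap *h, HuffNode *n)
{
    assert(h->len < 256);
    size_t i = h->len++;
    /* Sift up: move heavier parents down until n fits. */
    while (i > 0 && h->items[(i - 1) / 2]->frequency > n->frequency) {
        h->items[i] = h->items[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->items[i] = n;
}

static HuffNode *heap_pop(MinHeap *h)
{
    assert(h->len > 0);
    HuffNode *top = h->items[0];
    HuffNode *last = h->items[--h->len];
    size_t i = 0;
    /* Sift down: pull the lighter child up until `last` fits. */
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->len)
            break;
        if (child + 1 < h->len &&
            h->items[child + 1]->frequency < h->items[child]->frequency)
            child++;
        if (h->items[child]->frequency >= last->frequency)
            break;
        h->items[i] = h->items[child];
        i = child;
    }
    h->items[i] = last;
    return top;
}

static HuffNode *new_node(uint8_t c, size_t freq, HuffNode *l, HuffNode *r)
{
    HuffNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->c = c;
    n->frequency = freq;
    n->left = l;
    n->right = r;
    return n;
}

/* Build the tree from a 256-entry frequency table.
 * Returns NULL if every count is zero. */
HuffNode *build_tree(const size_t freq[256])
{
    MinHeap heap = { .len = 0 };
    for (int s = 0; s < 256; s++)
        if (freq[s] > 0)
            heap_push(&heap, new_node((uint8_t)s, freq[s], NULL, NULL));
    if (heap.len == 0)
        return NULL;
    /* Merge the two lightest nodes until a single root remains. */
    while (heap.len > 1) {
        HuffNode *a = heap_pop(&heap);
        HuffNode *b = heap_pop(&heap);
        heap_push(&heap, new_node(0, a->frequency + b->frequency, a, b));
    }
    return heap_pop(&heap);
}
```

Because the lightest nodes are merged first, they end up deepest in the tree, which is what pushes rare characters away from the root. A max-heap would merge the heaviest nodes first and give the opposite shape.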
General decoding process goes like this:
1. Get the frequencies of all symbols
2. Put those symbol frequencies into a structure like the one above
3. Build a min-heap of the node set and merge nodes until only the root is left
   Keep in mind, however, that our root should be agnostic (it holds no character of its own) so that bit strings can start with either 0 or 1
4. Follow the bits down from the root; when we reach a leaf we drop that char into the result and start again from the root
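Steps 1 and 4 above could be sketched as follows. This is only an illustration: `HuffNode`, `count_frequencies` and `huff_decode` are hypothetical names, the child pointers are an assumption about the rest of the node struct, and "0 goes left, 1 goes right" is an assumed convention. The bits are stored one per byte for readability rather than packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: the child pointers are an assumption. */
typedef struct HuffNode {
    uint8_t c;                      /* symbol; meaningful only in leaves */
    size_t frequency;               /* count, or sum of both children */
    struct HuffNode *left, *right;  /* both NULL in a leaf */
} HuffNode;

/* Step 1: count how often each byte value occurs in the input. */
void count_frequencies(const uint8_t *data, size_t len, size_t freq[256])
{
    for (size_t i = 0; i < 256; i++)
        freq[i] = 0;
    for (size_t i = 0; i < len; i++)
        freq[data[i]]++;
}

/* Step 4: walk the tree one bit at a time (0 = left, 1 = right).
 * Each element of `bits` is a single bit stored as 0 or 1.
 * Assumes the root has two children, i.e. at least two distinct symbols.
 * Returns the number of symbols written to `out`. */
size_t huff_decode(const HuffNode *root, const uint8_t *bits, size_t nbits,
                   uint8_t *out, size_t out_cap)
{
    assert(root != NULL && root->left != NULL && root->right != NULL);
    size_t written = 0;
    const HuffNode *cur = root;
    for (size_t i = 0; i < nbits; i++) {
        cur = bits[i] ? cur->right : cur->left;
        assert(cur != NULL);        /* malformed tree */
        if (cur->left == NULL && cur->right == NULL) {
            assert(written < out_cap);
            out[written++] = cur->c;  /* leaf: emit its character */
            cur = root;               /* next code starts at the root */
        }
    }
    return written;
}
```

Since the root carries no character, the very first bit already picks a branch, which is why a code can begin with either 0 or 1. With a tree where a is the root's right child and c, b sit under the left child, the bits 1 01 00 1 decode to "abca".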