csnotes/370/notes/huffman.md
2019-03-28 14:51:54 -07:00

# Huffman codes
Covering: fixed-length encoding and variable-length encoding
# Fixed-length encoding
Consider ASCII or Unicode (UTF-16). Every symbol there is stored in the same number of bits, 8 or 16, no matter how often it occurs.
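
Here is a minimal C sketch that makes the cost concrete. `fixed_length_bits` is a hypothetical helper, and the sketch assumes plain ASCII text, so one `char` is one symbol.

```c
#include <stddef.h>
#include <string.h>

/* Bits needed to store `text` when every symbol takes `bits_per_symbol`
   bits, regardless of how often it occurs (assumes one char = one symbol). */
size_t fixed_length_bits(const char *text, size_t bits_per_symbol) {
    return strlen(text) * bits_per_symbol;
}
```

For example, `fixed_length_bits("abracadabra", 8)` gives 88 bits. Each of the five `a`s costs exactly as much as the single `c`, and a variable-length code can remove that waste.
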
# Huffman Trees
We build a binary tree from the character frequencies. Each leaf holds a character and that character's frequency, and each internal node holds the sum of its children's frequencies.
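```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

struct Node {
    uint8_t c;         /* the character this node represents */
    size_t frequency;  /* how often that character occurs */
    /* ... */
};
```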
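
The `...` in the struct leaves the remaining fields open. A Huffman tree is binary, so one plausible completion adds left/right child pointers. This is an assumption and not necessarily the original design. The helper names `new_leaf`, `new_internal`, and `is_leaf` are hypothetical, and so is the convention that bit 0 goes left and bit 1 goes right.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the elided fields are left/right child pointers. */
struct Node {
    uint8_t c;           /* symbol; meaningful only at leaves */
    size_t frequency;    /* leaf: symbol count; internal: sum of children */
    struct Node *left;   /* followed on bit 0 (assumed convention) */
    struct Node *right;  /* followed on bit 1 (assumed convention) */
};

/* A leaf carries a symbol and its count. Returns NULL if malloc fails. */
struct Node *new_leaf(uint8_t c, size_t frequency) {
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->c = c;
    n->frequency = frequency;
    n->left = n->right = NULL;
    return n;
}

/* An internal node carries no symbol; its weight is the sum of its children. */
struct Node *new_internal(struct Node *left, struct Node *right) {
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->c = 0; /* unused for internal nodes */
    n->frequency = left->frequency + right->frequency;
    n->left = left;
    n->right = right;
    return n;
}

int is_leaf(const struct Node *n) {
    return n->left == NULL && n->right == NULL;
}
```
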
Rules of thumb:
* More frequent characters sit close to the root, so they get short codes.
* Less frequent characters sit far from the root, so they get longer codes.

We'll end up with a list of nodes, which we can throw into a min-heap (keyed on frequency) to build our Huffman tree.
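
The following is a minimal sketch of a min-heap of node pointers keyed on `frequency`. It assumes the struct has left/right child pointers and uses a hypothetical fixed capacity, `HEAP_CAP`. The names `heap_push` and `heap_pop` are illustrative. The heap always keeps the lowest-frequency node at index 0, which is what the tree-building step needs.

```c
#include <stddef.h>
#include <stdint.h>

struct Node {
    uint8_t c;
    size_t frequency;
    struct Node *left, *right; /* assumed child pointers */
};

#define HEAP_CAP 512 /* hypothetical capacity: 256 leaves + internal nodes */

struct MinHeap {
    struct Node *items[HEAP_CAP];
    size_t size;
};

static void swap_nodes(struct Node **a, struct Node **b) {
    struct Node *t = *a;
    *a = *b;
    *b = t;
}

/* Insert, then sift up until no parent has a larger frequency than its
   child. The caller keeps size below HEAP_CAP. */
void heap_push(struct MinHeap *h, struct Node *n) {
    size_t i = h->size++;
    h->items[i] = n;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent]->frequency <= h->items[i]->frequency) break;
        swap_nodes(&h->items[parent], &h->items[i]);
        i = parent;
    }
}

/* Remove and return the lowest-frequency node (heap must be non-empty),
   moving the last item to the top and sifting it down. */
struct Node *heap_pop(struct MinHeap *h) {
    struct Node *top = h->items[0];
    h->items[0] = h->items[--h->size];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, smallest = i;
        if (l < h->size && h->items[l]->frequency < h->items[smallest]->frequency)
            smallest = l;
        if (r < h->size && h->items[r]->frequency < h->items[smallest]->frequency)
            smallest = r;
        if (smallest == i) break;
        swap_nodes(&h->items[i], &h->items[smallest]);
        i = smallest;
    }
    return top;
}
```

One round of tree building pops twice, merges the two results under a new node whose frequency is their sum, and pushes that node back.
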
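
The general process goes like this. Steps 1-3 build the tree, and step 4 decodes:
1. Get the frequencies of all symbols.
2. Put each symbol and its frequency into a node like the struct above.
3. Build a min-heap of the node set. Repeatedly pop the two lowest-frequency nodes, join them under a new internal node whose frequency is their sum, and push that node back, until only the root is left.
   Keep in mind, however, that the root holds no character of its own. That way every bit string starts at the root and branches on 0 or 1.
4. To decode, start at the root and follow one branch per bit. When we reach a leaf, we drop its character into the result and start again from the root.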
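
Here is a compact C sketch of steps 1-4 together. It makes a few assumptions. The input is a byte array. Bits are passed as a `'0'`/`'1'` string for readability. 0 means left and 1 means right. Nodes live in a static pool, so each `build_tree` call overwrites the previous tree. For brevity, the sketch replaces the min-heap with a linear scan for the two smallest nodes. That costs O(n^2) in the number of distinct symbols, which is acceptable for at most 256 of them. The names `build_tree`, `make_codes`, and `decode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct Node {
    uint8_t c;                 /* symbol; only meaningful at leaves */
    size_t frequency;          /* leaf: count; internal: sum of children */
    struct Node *left, *right; /* assumed: bit 0 -> left, bit 1 -> right */
};

/* 256 possible leaves plus at most 255 internal nodes. */
static struct Node pool[511];

/* Steps 1-3: count frequencies, then keep merging the two lowest-frequency
   nodes under a new internal node until one root is left. A linear scan
   stands in for the min-heap. Returns NULL for empty input. */
struct Node *build_tree(const uint8_t *data, size_t len) {
    size_t freq[256] = {0};
    for (size_t i = 0; i < len; i++) freq[data[i]]++;

    struct Node *live[256]; /* roots of the partial trees still to merge */
    size_t n_live = 0, used = 0;
    for (int s = 0; s < 256; s++) {
        if (freq[s] == 0) continue;
        pool[used] = (struct Node){(uint8_t)s, freq[s], NULL, NULL};
        live[n_live++] = &pool[used++];
    }
    if (n_live == 0) return NULL;

    while (n_live > 1) {
        /* Swap the smallest node into the last slot, then the second
           smallest into the slot before it. */
        for (size_t k = 0; k < 2; k++) {
            size_t last = n_live - 1 - k, min = 0;
            for (size_t i = 1; i <= last; i++)
                if (live[i]->frequency < live[min]->frequency) min = i;
            struct Node *tmp = live[min];
            live[min] = live[last];
            live[last] = tmp;
        }
        struct Node *a = live[n_live - 1], *b = live[n_live - 2];
        /* The new internal node holds no symbol; its weight is the sum. */
        pool[used] = (struct Node){0, a->frequency + b->frequency, a, b};
        live[n_live - 2] = &pool[used++];
        n_live--;
    }
    return live[0];
}

/* Each leaf's code is its root-to-leaf path ('0' = left, '1' = right).
   `path` and each `codes` row are sized for the deepest leaf plus '\0'. */
void make_codes(const struct Node *node, char *path, size_t depth,
                char codes[256][257]) {
    if (node->left == NULL && node->right == NULL) {
        path[depth] = '\0';
        strcpy(codes[node->c], path);
        return;
    }
    path[depth] = '0';
    make_codes(node->left, path, depth + 1, codes);
    path[depth] = '1';
    make_codes(node->right, path, depth + 1, codes);
}

/* Step 4: follow one branch per bit; at a leaf, emit its symbol and
   restart at the root. Returns the number of symbols written to `out`. */
size_t decode(const struct Node *root, const char *bits, uint8_t *out) {
    size_t n = 0;
    const struct Node *node = root;
    for (const char *p = bits; *p != '\0'; p++) {
        node = (*p == '0') ? node->left : node->right;
        if (node == NULL) return n; /* walked off the tree, e.g. lone-leaf root */
        if (node->left == NULL && node->right == NULL) {
            out[n++] = node->c;
            node = root;
        }
    }
    return n;
}
```

On `"abracadabra"` this sketch assigns `a` = `0`, `r` = `10`, `b` = `110`, `c` = `1110`, and `d` = `1111`. The whole string then takes 23 bits instead of the 88 that an 8-bit fixed-length encoding needs. Other implementations may break ties between equal frequencies differently. That yields different codes but the same total length.
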