# Huffman codes
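Covering: fixed-length encoding and variable-length encoding.

# Fixed-length encoding

Consider ASCII, where each character takes 8 bits (one byte), or a 16-bit Unicode encoding such as UCS-2, where each takes 16 bits. Every symbol costs the same number of bits, no matter how often it appears in the text.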
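As a minimal sketch of that cost, assuming a NUL-terminated C string with one symbol per `char`, the total size is simply the length times the width. `fixed_length_bits` is a hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bits needed to store `text` when every symbol
 * occupies `bits_per_symbol` bits. How often a symbol occurs plays no role. */
size_t fixed_length_bits(const char *text, size_t bits_per_symbol)
{
    return strlen(text) * bits_per_symbol;
}
```

For `"abracadabra"` at 8 bits per symbol this gives 88 bits, even though `a` appears five times and `c` only once.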
# Huffman Trees
We create a tree of character frequencies, where each node holds a character and that character's frequency.
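```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

/* One node of the Huffman tree. */
struct Node {
    uint8_t c;         /* the character (symbol) */
    size_t frequency;  /* how often that character occurs */
    /* ... */
};
```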
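Steps 1 and 2 of the general process can be sketched for byte-oriented input. This assumes 8-bit symbols and adds `left`/`right` child pointers, which the original struct does not show; `make_leaves` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Node with the original c/frequency fields plus assumed child pointers. */
struct Node {
    uint8_t c;
    size_t frequency;
    struct Node *left;
    struct Node *right;
};

/* Hypothetical helper: count each byte value in `data`, then write one
 * leaf per symbol that actually occurs into `out` (room for 256 nodes),
 * in increasing byte order. Returns the number of leaves written. */
size_t make_leaves(const uint8_t *data, size_t len, struct Node out[256])
{
    size_t counts[256] = {0};
    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;               /* step 1: frequencies */

    size_t n = 0;
    for (int sym = 0; sym < 256; sym++) {
        if (counts[sym] == 0)
            continue;                    /* absent symbols get no node */
        out[n].c = (uint8_t)sym;         /* step 2: one node per symbol */
        out[n].frequency = counts[sym];
        out[n].left = NULL;              /* leaves have no children */
        out[n].right = NULL;
        n++;
    }
    return n;
}
```

For `abracadabra` it produces five leaves, in byte order: `a`:5, `b`:2, `c`:1, `d`:1, `r`:2.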
Rules of thumb:
* The more frequent characters are close to the root, so they get short codes
* Less frequent characters are found far from the root, so they get longer codes

We'll end up with a list of nodes, which we can throw into a min-heap (ordered by frequency) to build our Huffman tree.
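To make the rules of thumb concrete: every edge on the path from the root to a leaf contributes one bit, so the total encoded size is each symbol's frequency $f(s)$ times its depth $d(s)$, summed over all symbols. The symbols $B$, $f$, $d$ and the example string `abracadabra` are chosen here for illustration.

$$
B = \sum_{s} f(s)\, d(s)
$$

For `abracadabra`, the frequencies are $f(a) = 5$, $f(b) = f(r) = 2$ and $f(c) = f(d) = 1$. One valid Huffman tree merges `c` with `d`, then `b` with `r`, then those two subtrees, and finally joins the result with `a`. That puts `a` at depth 1 and the other four symbols at depth 3:

$$
B = 5 \cdot 1 + 2 \cdot 3 + 2 \cdot 3 + 1 \cdot 3 + 1 \cdot 3 = 23 \text{ bits}
$$

compared with $11 \cdot 8 = 88$ bits for a fixed 8-bit encoding.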
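The general process (building the tree, then decoding with it) goes like this:

1. Get the frequencies of all symbols.
2. Put each symbol and its frequency into a node like the `struct Node` above.
3. Build a min-heap of the node set, then repeatedly pop the two least frequent nodes, join them under a new parent whose frequency is their sum, and push that parent back, until a single node (the root) remains.
   Keep in mind, however, that the root should be symbol-agnostic (an internal node holding no character), so that a bit string can start with either 0 or 1.
4. To decode, start at the root and follow one child per bit (e.g. 0 for the left child, 1 for the right). When we reach a leaf, we append its character to the result and jump back to the root.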
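Step 3 can be sketched in C as follows. The C standard library has no priority queue, so this sketch hand-rolls a small binary min-heap of node pointers. It assumes `left`/`right` child pointers in `Node` (not shown in the original struct), and `build_huffman` and `code_length` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Node with the original c/frequency fields plus assumed child pointers. */
struct Node {
    uint8_t c;              /* symbol; meaningful only at leaves */
    size_t frequency;       /* symbol count, or sum of the children's counts */
    struct Node *left;
    struct Node *right;
};

/* Minimal binary min-heap of node pointers, ordered by frequency. */
struct MinHeap {
    struct Node **items;
    size_t size;
};

static void swap_nodes(struct Node **a, struct Node **b)
{
    struct Node *t = *a;
    *a = *b;
    *b = t;
}

static void heap_push(struct MinHeap *h, struct Node *n)
{
    size_t i = h->size++;
    h->items[i] = n;
    /* Sift up while the parent has a larger frequency. */
    while (i > 0 && h->items[(i - 1) / 2]->frequency > h->items[i]->frequency) {
        swap_nodes(&h->items[(i - 1) / 2], &h->items[i]);
        i = (i - 1) / 2;
    }
}

static struct Node *heap_pop(struct MinHeap *h)
{
    struct Node *top = h->items[0];
    h->items[0] = h->items[--h->size];
    /* Sift down toward the child with the smaller frequency. */
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->size && h->items[l]->frequency < h->items[m]->frequency)
            m = l;
        if (r < h->size && h->items[r]->frequency < h->items[m]->frequency)
            m = r;
        if (m == i)
            break;
        swap_nodes(&h->items[i], &h->items[m]);
        i = m;
    }
    return top;
}

/* Build a Huffman tree from n >= 1 leaves: repeatedly pop the two least
 * frequent nodes and push a new internal node whose frequency is their sum.
 * Allocation failures are not checked and internal nodes are never freed. */
struct Node *build_huffman(struct Node *leaves, size_t n)
{
    struct MinHeap h;
    h.items = malloc(n * sizeof *h.items);
    h.size = 0;
    for (size_t i = 0; i < n; i++)
        heap_push(&h, &leaves[i]);

    while (h.size > 1) {
        struct Node *a = heap_pop(&h);
        struct Node *b = heap_pop(&h);
        struct Node *parent = malloc(sizeof *parent);
        parent->c = 0;                      /* internal node: no symbol */
        parent->frequency = a->frequency + b->frequency;
        parent->left = a;
        parent->right = b;
        heap_push(&h, parent);
    }
    struct Node *root = heap_pop(&h);
    free(h.items);
    return root;
}

/* Depth of the leaf holding `sym` (= its code length in bits), or -1. */
int code_length(const struct Node *t, uint8_t sym, int depth)
{
    if (t == NULL)
        return -1;
    if (t->left == NULL && t->right == NULL)
        return t->c == sym ? depth : -1;
    int d = code_length(t->left, sym, depth + 1);
    return d >= 0 ? d : code_length(t->right, sym, depth + 1);
}
```

Ties between equal frequencies may be broken either way: the individual codes change, but the total encoded size does not. For `abracadabra` the leaf `a` ends up at depth 1, i.e. a 1-bit code. With only one distinct symbol the root would itself be a leaf, breaking the symbol-agnostic-root requirement from step 3; the sketch does not special-case that.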
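Step 4 can be sketched like this, assuming the encoded bits arrive as a string of `'0'`/`'1'` characters rather than packed bytes, and that 0 selects the left child and 1 the right. `decode` is a hypothetical name, and the `left`/`right` pointers are an assumption, since the original struct does not show them.

```c
#include <stddef.h>
#include <stdint.h>

/* Node with the original c/frequency fields plus assumed child pointers. */
struct Node {
    uint8_t c;
    size_t frequency;
    struct Node *left;   /* followed on a '0' bit (assumed convention) */
    struct Node *right;  /* followed on a '1' bit */
};

/* Walk from the root, one child per bit; at each leaf, append its symbol
 * to `out` and restart at the root. Returns the number of symbols written,
 * or -1 if `out` (capacity `cap`) fills up or the bits stop mid-code.
 * Assumes a full binary tree whose root is an internal node; any character
 * other than '0' is treated as a 1 bit. */
long decode(const struct Node *root, const char *bits, uint8_t *out, size_t cap)
{
    const struct Node *cur = root;
    size_t n = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        cur = (*p == '0') ? cur->left : cur->right;
        if (cur->left == NULL && cur->right == NULL) {   /* reached a leaf */
            if (n == cap)
                return -1;
            out[n++] = cur->c;
            cur = root;
        }
    }
    return cur == root ? (long)n : -1;   /* leftover bits mean a partial code */
}
```

With the codes `a` = 0, `b` = 10 and `c` = 11, the input `"0100110"` decodes to `abaca`.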
|