# Huffman codes

Covering: fixed-length encoding & variable-length encoding
# Fixed-length encoding

Consider ASCII (a 7-bit code, normally stored one character per 8-bit byte) or a 16-bit Unicode encoding such as UCS-2, where every symbol takes the same number of bits (8 or 16) no matter how often it appears.
# Huffman Trees

We build a tree from the character frequencies: each leaf holds a character and that character's frequency, and each internal node holds the sum of its children's frequencies.
```
#include <stddef.h>
#include <stdint.h>

struct Node {
    uint8_t c;         // the symbol (one byte)
    size_t frequency;  // how many times c occurs in the input
    // ...
};
```
Rules of thumb:
* The more frequent characters are close to the root, so they get short codes
* Less frequent characters are found far from the root, so they get long codes

We end up with a list of nodes, which we throw into a min-heap (a priority queue ordered by lowest frequency) to build our Huffman tree.
The general process goes like this:

1. Get the frequencies of all symbols
2. Put each symbol and its frequency into a node like the struct above
3. Build a min-heap of the node set, then repeatedly pop the two lowest-frequency nodes, join them under a new parent whose frequency is their sum, and push the parent back until only one node, the root, is left.
   Keep in mind that the root holds no character, so every bit string starts with a 0 (go left) or a 1 (go right).
4. To decode, follow the bits down from the root; when we reach a leaf we append that char to the result and start again from the root