diff --git a/370/notes/huffman.md b/370/notes/huffman.md
new file mode 100644
index 0000000..e539e08
--- /dev/null
+++ b/370/notes/huffman.md
@@ -0,0 +1,35 @@
+# Huffman codes
+
+Covering: Fixed length encoding & Variable length encoding
+
+# Fixed length encoding
+
+Consider ASCII (stored one byte per character) or a fixed 16-bit Unicode encoding such as UCS-2: every symbol takes the same width, 8 or 16 bits, no matter how often it occurs.
+
+# Huffman Trees
+
+We build a tree from character frequencies, where each leaf node holds a character and that character's frequency.
+
+```
+struct Node {
+    uint8_t c;          // the symbol
+    size_t frequency;   // how many times the symbol occurs
+    ...
+};
+```
+
+Rules of thumb:
+ * More frequent characters sit close to the root (shorter codes)
+ * Less frequent characters sit far from the root (longer codes)
+
+We'll end up with a list of nodes, which we can throw into a min-heap to build our Huffman tree.
+
+The general process goes like this:
+
+ 1. Get frequencies of all symbols
+ 2. Put those symbol frequencies into a structure like the one above
+ 3. Build a min-heap of the node set and repeatedly merge the two least frequent nodes until a single root remains
+    Keep in mind, however, that the root holds no character of its own, so bit strings can start with either 0 or 1
+ 4. To decode, follow the bits down from the root; when we reach a leaf, we drop that char into the result and start again at the root
+
+
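The steps in these notes could be sketched roughly as follows in C++. This is a minimal illustration and not necessarily how the course implements it. It assumes the fields elided by `...` in `struct Node` are two child indices (named `left` and `right` here), and that the tree is built by repeatedly merging the two least frequent nodes, which is why the heap is ordered smallest-first. The names `Tree`, `build_tree`, `assign_codes`, `encode`, and `decode` are hypothetical, and bits are kept as `'0'`/`'1'` characters purely for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Mirrors the Node in the notes; left/right are assumed fields
// (indices into Tree::nodes, -1 meaning "no child", i.e. a leaf).
struct Node {
    uint8_t c;
    size_t frequency;
    int left;
    int right;
};

struct Tree {
    std::vector<Node> nodes;
    int root = -1;  // -1 means the input was empty
};

// Steps 1-3: count symbols, make one leaf per symbol, merge via a min-heap.
Tree build_tree(const std::string& text) {
    size_t counts[256] = {};
    for (unsigned char ch : text) ++counts[ch];

    Tree t;
    using Item = std::pair<size_t, int>;  // (frequency, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (int s = 0; s < 256; ++s) {
        if (counts[s] == 0) continue;
        t.nodes.push_back({static_cast<uint8_t>(s), counts[s], -1, -1});
        heap.push({counts[s], static_cast<int>(t.nodes.size()) - 1});
    }
    // Pop the two least frequent nodes and join them under a new parent.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        t.nodes.push_back({0, a.first + b.first, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(t.nodes.size()) - 1});
    }
    if (!heap.empty()) t.root = heap.top().second;
    return t;
}

// Walk the tree: going left appends '0', going right appends '1'.
void assign_codes(const Tree& t, int i, const std::string& prefix,
                  std::map<uint8_t, std::string>& codes) {
    const Node& n = t.nodes[i];
    if (n.left < 0) {
        // A lone leaf at the root still needs a one-bit code.
        codes[n.c] = prefix.empty() ? "0" : prefix;
        return;
    }
    assign_codes(t, n.left, prefix + "0", codes);
    assign_codes(t, n.right, prefix + "1", codes);
}

// Throws std::out_of_range if text contains a symbol the tree never saw.
std::string encode(const Tree& t, const std::string& text) {
    std::map<uint8_t, std::string> codes;
    if (t.root >= 0) assign_codes(t, t.root, "", codes);
    std::string bits;
    for (unsigned char ch : text) bits += codes.at(ch);
    return bits;
}

// Step 4: follow bits from the root; at a leaf, emit the char and restart.
std::string decode(const Tree& t, const std::string& bits) {
    std::string out;
    if (t.root < 0) return out;
    int i = t.root;
    for (char b : bits) {
        assert(b == '0' || b == '1');
        if (t.nodes[i].left >= 0)
            i = (b == '0') ? t.nodes[i].left : t.nodes[i].right;
        if (t.nodes[i].left < 0) {
            out.push_back(static_cast<char>(t.nodes[i].c));
            i = t.root;
        }
    }
    return out;
}
```

For the input `"aaaabbc"`, `a` (4 occurrences) ends up one edge from the root while `b` and `c` sit two edges down, so the encoded string is 10 bits long instead of the 56 bits a fixed 8-bit encoding would need, and `decode(t, encode(t, text))` returns the original text.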