From 9b21944cf6647b02bdaeecec7d8f7addb8b2c5d8 Mon Sep 17 00:00:00 2001
From: Medium Fries
Date: Thu, 28 Mar 2019 14:51:54 -0700
Subject: [PATCH] huffman notes

---
 370/notes/huffman.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 370/notes/huffman.md

diff --git a/370/notes/huffman.md b/370/notes/huffman.md
new file mode 100644
index 0000000..e539e08
--- /dev/null
+++ b/370/notes/huffman.md
@@ -0,0 +1,35 @@
+# Huffman codes
+
+Covering: fixed-length encoding & variable-length encoding
+
+# Fixed length encoding
+
+Consider ASCII or a fixed-width Unicode encoding such as UCS-2, where each symbol is 8 or 16 bits wide regardless of how often it appears.
+
+# Huffman Trees
+
+We build a tree from character frequencies. Each leaf holds a character and that character's frequency; internal nodes hold the combined frequency of their children.
+
+```
+struct Node {
+    uint8_t c;          // the symbol
+    size_t frequency;   // how many times the symbol occurs
+    ...
+};
+```
+
+Rules of thumb:
+ * More frequent characters sit closer to the root, so they get shorter codes
+ * Less frequent characters sit farther from the root, so they get longer codes
+
+We'll end up with a list of nodes which we can throw into a min-heap to build our Huffman tree.
+
+The general process goes like this:
+
+ 1. Get frequencies of all symbols
+ 2. Put those symbol frequencies into nodes like the struct above
+ 3. Build a min-heap from the node set, then repeatedly merge the two least frequent nodes until a single root is left
+ Keep in mind that the root itself holds no symbol, so a bit string can start with either 0 or 1
+ 4. To decode, walk down from the root one bit at a time; when we reach a leaf we drop that char into the result and go back to the root
+
+
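The steps in the notes above can be sketched in C roughly as follows. This is a minimal sketch, not a reference implementation: the `left`/`right` pointers, the `Heap` type, and the names `heap_push`, `heap_pop`, `new_node`, `huffman_build`, and `huffman_decode` are all made up for illustration, and it assumes the input has at least two distinct symbols. It uses a min-heap, since the standard construction repeatedly merges the two least frequent nodes. Which child means 0 and which means 1 is just a convention picked here (0 = left, 1 = right).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Node from the notes; left/right are assumed additions for the tree links. */
struct Node {
    uint8_t c;                  /* symbol, only meaningful in leaves */
    size_t frequency;           /* leaf: symbol count; internal: sum of children */
    struct Node *left, *right;  /* both NULL in a leaf */
};

/* Array-backed min-heap keyed on frequency (hypothetical helper type). */
struct Heap {
    struct Node *items[256];
    size_t len;
};

static void heap_push(struct Heap *h, struct Node *n) {
    assert(h->len < 256);
    size_t i = h->len++;
    /* Sift up: pull heavier parents down until n fits. */
    while (i > 0 && h->items[(i - 1) / 2]->frequency > n->frequency) {
        h->items[i] = h->items[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->items[i] = n;
}

static struct Node *heap_pop(struct Heap *h) {
    assert(h->len > 0);
    struct Node *top = h->items[0];
    struct Node *last = h->items[--h->len];
    size_t i = 0;
    /* Sift down: pull the lighter child up until last fits. */
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->len) break;
        if (child + 1 < h->len &&
            h->items[child + 1]->frequency < h->items[child]->frequency)
            child++;
        if (last->frequency <= h->items[child]->frequency) break;
        h->items[i] = h->items[child];
        i = child;
    }
    h->items[i] = last;
    return top;
}

static struct Node *new_node(uint8_t c, size_t f, struct Node *l, struct Node *r) {
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->c = c; n->frequency = f; n->left = l; n->right = r;
    return n;
}

/* Steps 1-3: count symbols, wrap them in leaves, merge the two lightest. */
struct Node *huffman_build(const uint8_t *data, size_t len) {
    size_t freq[256] = {0};
    for (size_t i = 0; i < len; i++) freq[data[i]]++;

    struct Heap h = { .len = 0 };
    for (int s = 0; s < 256; s++)
        if (freq[s] > 0)
            heap_push(&h, new_node((uint8_t)s, freq[s], NULL, NULL));
    if (h.len == 0) return NULL;

    while (h.len > 1) {
        struct Node *a = heap_pop(&h);  /* lightest goes left */
        struct Node *b = heap_pop(&h);  /* second lightest goes right */
        heap_push(&h, new_node(0, a->frequency + b->frequency, a, b));
    }
    return heap_pop(&h);
}

/* Step 4: bits is a string of '0'/'1' chars (anything but '0' counts as 1).
 * Returns the number of symbols written to out (at most cap). */
size_t huffman_decode(const struct Node *root, const char *bits,
                      uint8_t *out, size_t cap) {
    assert(root != NULL && root->left != NULL); /* needs >= 2 distinct symbols */
    size_t n = 0;
    const struct Node *cur = root;
    for (; *bits != '\0' && n < cap; bits++) {
        cur = (*bits == '0') ? cur->left : cur->right;
        if (cur->left == NULL) {  /* reached a leaf: emit and restart */
            out[n++] = cur->c;
            cur = root;
        }
    }
    return n;
}
```

For the input `aaaabbc` the frequencies are a=4, b=2, c=1, so this sketch gives `a` the code `1`, `c` the code `00`, and `b` the code `01`; decoding `100011` walks back out to `acba`. Nodes are never freed here, to keep the example short.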