# lec2
## Binary Bits & Bytes
> Binary Notation 0b...
Typically we see `0b`, but some tools, like many x86 assemblers, use a `...b` suffix to denote a bit string.
When we deal with binary at all, we usually work in nibbles (4-_bit_ chunks), and two nibbles grouped together make up a byte.
Ex: `0101 1100` is an arbitrary byte.
For most sane solutions this is essentially the only way we __ever__ deal with binary.
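
A minimal sketch of pulling the two nibbles out of a byte in C. The helper names `high_nibble` and `low_nibble` are made up for illustration, and the example byte is written in hex (`0x5C` is `0101 1100`) because `0b` literals are not standard C before C23.

```c
#include <stdint.h>

/* Return the high nibble (upper 4 bits) of a byte. */
static uint8_t high_nibble(uint8_t byte) {
    return (uint8_t)(byte >> 4);
}

/* Return the low nibble (lower 4 bits) of a byte. */
static uint8_t low_nibble(uint8_t byte) {
    return (uint8_t)(byte & 0x0F);
}
```

For the byte `0101 1100`, `high_nibble` gives `0101` and `low_nibble` gives `1100`.
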
> Why can't we (((save bits))) and not use nibbles?
In principle you can totally do that, but in practice it rarely saves anything.
To explain, let's look at some higher-level C/C++ code. Say you had this structure:
```
struct Point {
    int x; // assuming a 32-bit int for clarity's sake
    int y;
    unsigned int valid : 1; // 1-bit bit-field
};
```
On a typical x86 system (and many x64 systems) with no compile-time optimizations, this structure might look like:
```
32(int x) + 32(int y) + 1(unsigned int valid) + 7(bits of padding)
```
Why? Because while we can always calculate the address of a particular byte in memory, we can't, or rather don't even try to, do the same for individual bits. (Real compilers usually pad even further, out to the struct's alignment, but the point is the same: the bit never lives on its own.)
The reason is simple: a 32-bit CPU can form any address between `0` and `0xffffffff` (`4294967295`) inclusive. That gives us an address space large enough for one address per byte, but not enough to give every bit its own address as well.
If we use that `valid` _bit-field_ in our code later, like:
```
if(point_ref->valid) {
/* do stuff */
}
```
The generated machine code never addresses the bit itself; it works on the byte that contains the bit we care about, typically testing that byte against a mask and checking for a non-zero result.
If the bit is set, that byte is (for example) `0b0000 0001`, and thus a _true_ value.
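
A small sketch to poke at this yourself. The helpers `point_size` and `point_is_valid` are hypothetical names, not part of the lecture. The exact size and padding of a struct with bit-fields is implementation-defined, so treat whatever `point_size` returns on your machine as an observation rather than a rule.

```c
#include <stddef.h>

struct Point {
    int x;
    int y;
    unsigned int valid : 1; /* 1-bit bit-field */
};

/* Size of the whole struct in bytes, padding included.
   The result is implementation-defined. */
static size_t point_size(void) {
    return sizeof(struct Point);
}

/* Returns 1 if the valid bit is set, 0 otherwise. */
static int point_is_valid(const struct Point *p) {
    return p->valid ? 1 : 0;
}
```

Whatever the compiler does, the size will be more than the 65 bits of actual data, since the single `valid` bit always drags at least a byte along with it.
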
## Two's Complement - aka Negate
To find the negation of any bit string (i.e. `3 * -1 => -3`):
1. Flip all bits in the bit string
2. Add 1 to the bit string
The case for 3:
```
start off: 0011 =>  3
flip bits: 1100 => -4
add one:   1101 => -3
```
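
The two steps above can be sketched in C as follows. `negate4` is a made-up helper that works on a 4-bit value stored in the low nibble of a byte; the final mask is only there to throw away the bits above our 4-bit width.

```c
#include <stdint.h>

/* Negate a 4-bit two's complement value:
   flip all bits, add 1, then keep only the low 4 bits. */
static uint8_t negate4(uint8_t bits) {
    return (uint8_t)((~bits + 1u) & 0x0Fu);
}
```

Feeding in `0011` (3) yields `1101` (-3), and negating `1101` again gets you back to `0011`. Note that `1000` (-8) negates to itself, since +8 does not fit in 4 signed bits.
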
### Signedness
> Why?
Because this matters when dealing with `signed` and `unsigned` values. _No, it doesn't mean positive and negative numbers._
Say we have 4 bits to mess with. This means we have bit patterns from 0000 to 1111. If we wanted purely non-negative numbers we could use the whole range: 0000 to 1111, or 0 to 15.
If we needed negative representation however, we have to sacrifice some of our range.
Our new non-negative range is then `0-7`, _or in binary_: `0000 - 0111`. It stops there because the largest number we can represent without setting the first (sign) bit is `0111` => `7`.
Our negative range is then `-8 -> -1`, which in binary is `0b1000 -> 0b1111`.
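
To make the two readings of the same bits concrete, here is a hedged sketch with two hypothetical helpers, `as_unsigned4` and `as_signed4`. Both look only at the low 4 bits of their argument; the signed version subtracts 16 whenever the top bit of the nibble is set.

```c
/* Interpret the low 4 bits as an unsigned value (0..15). */
static int as_unsigned4(unsigned bits) {
    return (int)(bits & 0xFu);
}

/* Interpret the low 4 bits as a two's complement value (-8..7). */
static int as_signed4(unsigned bits) {
    int value = (int)(bits & 0xFu);
    return (value & 0x8) ? value - 16 : value;
}
```

The pattern `1101` reads as 13 unsigned and -3 signed: same bits, different interpretation.
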
## Intro to hex
> Hex Notation 0x...
x86 assemblers (e.g. masm) will typically accept `...h` as a postfix notation.
More convenient than binary for obvious reasons; namely it doesn't look like spaghetti on the screen.
Our 4-bit range from earlier {0000-1111} now becomes {0-f}.
More pedantically, our new hex range is 0x0 to 0xf; a full byte (two nibbles) spans 0x00 to 0xff.
> Binary mapped
It happens that 1 nibble maps exactly onto one hex digit, 0x0 to 0xF.
So for now just get used to converting {0000-1111} to its respective value in hex and eventually it should be second nature.
Then just move on to using hex(like immediately after these lessons), because writing actual binary is actually awful.
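
If you want to check your conversions, the nibble-to-digit mapping is just a 16-entry table lookup. A minimal sketch, with `nibble_to_hex` being a made-up helper name:

```c
/* Map a 4-bit value (0..15) to its hex digit character.
   Anything above the low 4 bits is ignored. */
static char nibble_to_hex(unsigned nibble) {
    return "0123456789abcdef"[nibble & 0xFu];
}
```

So `0101` maps to `5` and `1100` maps to `c`, which is why the byte `0101 1100` is written `0x5c`.
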
> Dude trust me hex is way better to read than decimal
Decimal may seem more convenient at first, but after a while you'll realize that hex has really easy to understand uses and makes things super clear + concise, especially when dealing with bit masks and bitsets.
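
As a rough illustration of why hex and bit masks go together, here is a sketch with some invented flag names (`FLAG_READ`, `FLAG_WRITE`, `FLAG_EXEC`) and helpers. None of these come from a real API; each hex digit simply covers one nibble, so you can see at a glance which bits a mask touches.

```c
#include <stdint.h>

/* Hypothetical flags, one bit each. */
#define FLAG_READ  0x01u /* 0000 0001 */
#define FLAG_WRITE 0x02u /* 0000 0010 */
#define FLAG_EXEC  0x04u /* 0000 0100 */

/* Turn on the bits in mask. */
static uint8_t set_flag(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags | mask);
}

/* Turn off the bits in mask. */
static uint8_t clear_flag(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags & (uint8_t)~mask);
}

/* Returns 1 if any bit in mask is set in flags, 0 otherwise. */
static int has_flag(uint8_t flags, uint8_t mask) {
    return (flags & mask) != 0;
}
```

Compare reading `0x06` as "write and exec are set" against decoding the decimal `6` in your head, and then imagine doing that with 32 bits.
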
> Ascii in Hex Dumps
Kind of a side note, but printable ASCII text values range from 0x20 (space) to 0x7e (`~`), so if you're looking for text in a binary, look for groupings of bytes in that range.
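
A hedged sketch of that idea: scan a buffer and report the longest run of printable ASCII bytes, taking printable to mean 0x20 (space) through 0x7e (`~`). The helper names `is_printable_ascii` and `longest_text_run` are invented for this example.

```c
#include <stddef.h>

/* Printable ASCII spans 0x20 (space) through 0x7e ('~'). */
static int is_printable_ascii(unsigned char byte) {
    return byte >= 0x20 && byte <= 0x7e;
}

/* Length of the longest run of printable ASCII bytes in buf. */
static size_t longest_text_run(const unsigned char *buf, size_t len) {
    size_t best = 0;
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        run = is_printable_ascii(buf[i]) ? run + 1 : 0;
        if (run > best) {
            best = run;
        }
    }
    return best;
}
```

A long run is a decent hint that you are looking at an embedded string rather than code or raw data.
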
## 32 v 64 bit
In case you come from an x86_64-ish background, know that in MIPS the terminology changes a bit (pun intended).
> x86 byte = mips byte
> x86 word = mips half word
> x86 dword = mips word
> x86/64 qword = mips dword
So just keep those translations in mind...
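
If it helps, you can pin the translations down with fixed-width C types. The `mips_*` typedef names below are made up for illustration; the widths follow the table above.

```c
#include <stdint.h>

typedef uint8_t  mips_byte;  /*  8 bits: x86 byte  */
typedef uint16_t mips_half;  /* 16 bits: x86 word  */
typedef uint32_t mips_word;  /* 32 bits: x86 dword */
typedef uint64_t mips_dword; /* 64 bits: x86 qword */
```
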