lec2

Binary Bits & Bytes

Binary Notation 0b...

Typically we see 0b, but sometimes (as in many x86 assemblers) we'll see ...b to denote a bit string.

Most typically we deal with binary (when we do) in nibbles, or 4-bit chunks, which are then grouped into 2 groups of 4 to build up a byte. Ex: 0101 1100 is a basic random byte. For most sane solutions this is essentially the only way we ever deal with binary.
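To make the nibble grouping concrete, here is a minimal sketch in C that splits a byte into its two nibbles and glues them back together. The function names are made up for illustration, and hex literals are used because 0b literals are not available in every C compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 4 bits of a byte, shifted down into the range 0..15. */
uint8_t high_nibble(uint8_t byte) {
    return (uint8_t)(byte >> 4);
}

/* Lower 4 bits of a byte. */
uint8_t low_nibble(uint8_t byte) {
    return (uint8_t)(byte & 0x0F);
}

/* Glue two nibbles back together into one byte. */
uint8_t make_byte(uint8_t high, uint8_t low) {
    assert(high <= 0x0F && low <= 0x0F); /* each nibble must fit in 4 bits */
    return (uint8_t)((high << 4) | low);
}
```

For the example byte 0101 1100 (0x5C), high_nibble gives 0101 and low_nibble gives 1100.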

Why can't we save bits and not use nibbles?

In truth you can totally do that, but not really. To explain, let's look at some higher level C/C++ code. Say you had this structure:

struct Point {
	int x; // assume int is 32 bits wide here
	int y;
	unsigned int valid : 1;
};

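On a typical x86 system (and many x64 systems) with no compile-time optimizations, this structure might look like:

32(int x) + 32(int y) + 1(unsigned int valid) + 7(bits of padding to fill out the byte)

In practice most compilers then round the whole structure up to a multiple of the int size, so expect a few more bytes of padding after that.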

Why? Because while we can always calculate the address of a particular byte in memory, we can't, or rather don't even try to, do the same for bits. The reason is simple: a 32-bit CPU can calculate any number between 0 and 0xffffffff (4294967295) inclusive. That gives us an address space large enough to have 1 number per byte, but not enough to cover the individual bits as well.
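If you want to see what your own compiler does with that single bit, a rough sketch like the following reports how many bytes the structure spends beyond the two ints. The helper name point_tail_bytes is hypothetical, and the exact number is implementation-defined, so treat whatever you get as an observation about your toolchain, not a rule.

```c
#include <assert.h>
#include <stddef.h>

struct Point {
    int x;
    int y;
    unsigned int valid : 1;
};

/* Bytes the compiler spends on everything after x and y. */
size_t point_tail_bytes(void) {
    size_t used_by_ints = 2 * sizeof(int);
    assert(sizeof(struct Point) > used_by_ints); /* the bit needs storage too */
    return sizeof(struct Point) - used_by_ints;
}
```

The result is at least 1: the 1-bit field can never cost less than a whole addressable byte.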

If we use that valid bit-field in our code later like

if(point_ref->valid) {
	/* do stuff */
}

The generated machine code will really just load the byte which contains the bit we care about, mask off the bits we don't care about, and check if what's left is a non-zero value.

If the bit is set we have (for example) 0b0000 0001 thus a true value.
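Written out by hand, the check amounts to something like the sketch below. It assumes the valid flag lives in bit 0 of its byte, which is a guess for illustration (the real position is up to the compiler), and VALID_MASK and is_valid are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_MASK 0x01u /* assumption: bit 0 holds the valid flag */

/* Returns 1 if the valid bit is set, 0 otherwise; other bits are ignored. */
int is_valid(const uint8_t *flags_byte) {
    assert(flags_byte != NULL);
    return (*flags_byte & VALID_MASK) != 0;
}
```

Note that 0b1111 1110 counts as false here: only the one bit matters, not whatever sits in the other seven.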

Two's Complement - aka Negate

To find the Negation of any bit-string:

i.e. 3 * -1 => -3

1. Flip all bits in the bit-string
2. Add 1 to the bit-string

The case for 3:

start off: 0011 => 3

flip bits: 1100 => -4

add one: 1101 => -3
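The same flip-then-add-one recipe can be sketched in C for a 4-bit value. The name negate4 is made up for this example, and the value is kept in the low nibble of a uint8_t since C has no 4-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation of a 4-bit value: flip the bits, add 1,
 * then keep only the low nibble so the result stays 4 bits wide. */
uint8_t negate4(uint8_t nibble) {
    assert(nibble <= 0x0F);
    uint8_t flipped = (uint8_t)(~nibble & 0x0F);
    return (uint8_t)((flipped + 1) & 0x0F);
}
```

negate4(0x3) gives 0xD (1101), and negating that again gives back 0x3. One oddity worth noticing: negate4(0x8) gives 0x8 again, because -8 has no positive partner in 4 bits.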

Signedness

Why?

Because this matters for dealing with signed and unsigned values. No, it doesn't just mean positive and negative numbers. Say we have 4 bits to mess with. This means we have a range of 0000 to 1111. If we wanted purely non-negative numbers in this range we could have 0000 to 1111, or 0 to 15. If we needed negative representation, however, we have to sacrifice some of our range. Our new non-negative range is then 0 to 7, or in binary 0000 to 0111, because the largest number we can represent without setting the first (sign) bit is 0111 => 7. Our negative range is then -8 to -1, which in binary is 0b1000 to 0b1111.
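One way to see that signedness is about interpretation and not about the bits themselves is to read the same 4-bit pattern both ways. This is a small sketch with hypothetical helper names, assuming two's complement for the signed reading.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 4-bit pattern as an unsigned number: 0..15. */
int as_unsigned4(uint8_t nibble) {
    assert(nibble <= 0x0F);
    return nibble;
}

/* Read the same pattern as a signed two's complement number: -8..7. */
int as_signed4(uint8_t nibble) {
    assert(nibble <= 0x0F);
    return (nibble & 0x08) ? (int)nibble - 16 : (int)nibble;
}
```

The pattern 1111 reads as 15 unsigned and as -1 signed; 0111 reads as 7 either way.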

Intro to hex

Hex Notation 0x...

x86 assemblers (MASM, for example) will typically accept ...h as a postfix notation.

More convenient than binary for obvious reasons; namely it doesn't look like spaghetti on the screen.

Our 4-bit range from earlier {0000-1111} now becomes {0-f}. More pedantically our new hex range is 0x0 to 0xf, and a full byte (two nibbles) runs from 0x00 to 0xff.

Binary mapped

It happens that 1 nibble makes up 0x0 to 0xF. So for now just get used to converting {0000-1111} to its respective value in hex and eventually it should be second nature. Then just move on to using hex (like immediately after these lessons), because writing actual binary is actually awful.
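Because each nibble maps to exactly one hex digit, the conversion is just a table lookup. Here is a minimal sketch, with function names invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Map one nibble (0..15) to its hex digit. */
char nibble_to_hex(uint8_t nibble) {
    assert(nibble <= 0x0F);
    return "0123456789abcdef"[nibble];
}

/* Write a byte as two hex digits plus a terminating NUL into out[3]. */
void byte_to_hex(uint8_t byte, char out[3]) {
    out[0] = nibble_to_hex((uint8_t)(byte >> 4));
    out[1] = nibble_to_hex((uint8_t)(byte & 0x0F));
    out[2] = '\0';
}
```

So the byte 0101 1100 comes out as "5c": 0101 is 5 and 1100 is c.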

Dude trust me hex is way better to read than decimal

It may seem inconvenient at first, but after a while you'll realize that hex has really easy to understand uses and makes things super clear + concise, especially when dealing with bit masks and bitsets.
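As a small illustration of why masks read better in hex, here is a sketch that pulls individual bytes out of a 32-bit value. The helper byte_at is hypothetical. Every two hex digits are exactly one byte, so you can read the answer straight off the literal, which you can't do with the decimal spelling of the same number.

```c
#include <assert.h>
#include <stdint.h>

/* Pull byte number `index` (0 = lowest) out of a 32-bit value. */
uint8_t byte_at(uint32_t value, unsigned index) {
    assert(index < 4);
    return (uint8_t)((value >> (index * 8)) & 0xFFu);
}
```

With 0x12345678 the bytes are visibly 0x12, 0x34, 0x56 and 0x78. The same value written as 305419896 tells you nothing about them.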

Ascii in Hex Dumps

Kind of a side note, but printable ASCII text values range from 0x20 (space) to 0x7e (~), so if you're looking for text in a binary, look for runs of bytes in that range.
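A rough sketch of that idea in C: scan a buffer and report the longest run of printable bytes. It assumes "printable" means 0x20 (space) through 0x7e (~), and both function names are made up for this example. Tools like strings apply a similar idea in a more careful way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Printable ASCII runs from 0x20 (space) through 0x7e ('~'). */
int is_printable_ascii(uint8_t byte) {
    return byte >= 0x20 && byte <= 0x7E;
}

/* Length of the longest run of printable bytes in buf[0..len). */
size_t longest_text_run(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t best = 0, current = 0;
    for (size_t i = 0; i < len; i++) {
        /* Extend the run on a printable byte, otherwise start over. */
        current = is_printable_ascii(buf[i]) ? current + 1 : 0;
        if (current > best) best = current;
    }
    return best;
}
```

A long run is a hint that you're looking at a string, not a guarantee, since ordinary data can land in that range by accident.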

32 v 64 bit

In case you come from an x86_64-ish background, know that in MIPS the terminology changes a bit (pun intended).

x86 byte = mips byte

x86 word = mips half word

x86 dword = mips word

x86/64 qword = mips dword

So just keep those translations in mind...
