more updated content to display on the new site

This commit is contained in:
shockrah
2020-07-05 17:18:01 -07:00
parent 62bcfa79b3
commit ec31274f14
6 changed files with 317 additions and 272 deletions

@@ -1,41 +1,169 @@
# lec11

__ALU:__ Arithmetic Logic Unit

## Building a 1-bit ALU

![fig0](../img/alu.png)

First we'll create an example _ALU_ which implements choosing between an `and`, `or`, `xor`, or `add`.
Whether or not our amazing _ALU_ is useful doesn't matter, so we'll go one function at a time (besides `and`/`or`).

First recognize that we need to choose between `and` or `or` against our two inputs A/B.
This means we have two results, `and` and `or`, and we need to select between them.
_Try to do this on your own first!_

![fig1](../img/fig1lec11.png)

Next we'll add on the `xor`.
Try doing this on your own, but as far as hints go: don't be afraid to make changes to the mux.

![fig2](../img/fig2lec11.png)

Finally we'll add the ability to add and subtract.
You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.

![fig3](../img/fig3lec11.png)

At this point our _ALU_ can `and`, `or`, `xor`, and `add`/`sub`.
The mux chooses which logic block to use; the carry-in line tells the `add` block whether to add or subtract.
Finally, the A-invert and B-invert lines let us decide whether to invert either input (A or B).

## N-bit ALU

For sanity we'll use the following block for our new ALU.

![fig4](../img/fig4lec11.png)

Note that we are chaining the carry-ins to the carry-outs just like a ripple adder.
Also, each ALU works with just one bit of our given 4-bit input.

lec1
====
At this point I'll mention that just reading isn't going to get you anywhere; you have to try things and give it a real, earnest attempt.

> What on earth?

The first lecture has been 50% syllabus, 25% videos, and 25% simple
terminology; expect nothing interesting for this section.

General Performance Improvements in software
--------------------------------------------
In general we have a few options to increase performance in software:
pipelining, parallelism, and prediction.

1. Parallelism

If we have multiple tasks to accomplish or multiple sources of data we
might instead find it better to work on multiple things at once
[e.g. multi-threading, multi-core rendering].

2. Pipelining

Here we are taking *data* and serializing it into a linear form.
We do things like this because it can make sense to process things
linearly [e.g. taking data from a website response and forming it into a
struct/class instance in C++/Java et al.].

3. Prediction

If we can predict an outcome to avoid a bunch of computation, then it
can be worth taking our prediction and proceeding with that instead of
computing the real answer up front. This happens **a lot** in CPUs,
which use what's called
[branch prediction](https://danluu.com/branch-prediction/) to run even
faster.

Cost of Such Improvements
-------------------------
As the saying goes, every decision you make as an engineer ultimately
has a cost, so let's look at the cost of these improvements.

1. Parallelism

If we have a data set which has some form of inter-dependencies between
its members then we can easily run into the issue of waiting on other
things to finish.

Contrived Example:
Premise: output file contents -> search lines for some text -> sort the resulting lines

We have to do the following processes:

1. print my-file.data
2. search the file for the text
3. sort the results of the search

In bash we might do: `cat my-file.data | grep 'Text to search for' | sort`

Parallelism doesn't make sense here for one reason: this series of
processes doesn't benefit from parallelism, because the 2nd and 3rd
tasks *must* wait until the previous ones finish first.
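The dependency chain above can be sketched in code; this is a hypothetical Python stand-in for the shell pipeline, where each stage consumes the *entire* output of the previous one:

```python
# made-up file contents; the three functions are hypothetical
# stand-ins for `cat`, `grep`, and `sort`
lines = ["beta match", "alpha match", "gamma miss"]

def print_file(contents):
    # stands in for `cat my-file.data`
    return contents

def search(contents, needle):
    # stands in for `grep needle`
    return [line for line in contents if needle in line]

def sort_lines(contents):
    # stands in for `sort`
    return sorted(contents)

# each call needs the previous call's full result,
# so nothing here can run in parallel
result = sort_lines(search(print_file(lines), "match"))
print(result)  # ['alpha match', 'beta match']
```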
2. Pipelining

Let's say we want to do the following:

1. Search file1 for some text: [search file1]
2. Feed the results of the search into a sorting program: [sort]
3. Search file2 for some text: [search file2]
4. Feed the results of that search into a reverse-sorting program: [reverse sort]

The resulting Directed Acyclic Graph looks like:

    [search file1] => [sort]
    [search file2] => [reverse sort]

Making the above linear means we effectively have to do:

    [search file1] => [sort] [search file2] => [reverse sort]
    | proc2 waiting........|

Which wastes a lot of time if the previous process is going to take a
long time. Bonus points if process 2 is extremely short.
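Since the two chains share no data, neither has to wait on the other. A small sketch of running them side by side (the file contents are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# made-up stand-ins for the contents of file1 and file2
file1 = ["b hit", "a hit", "miss"]
file2 = ["z hit", "y hit"]

def search(contents, needle):
    # stands in for `grep needle`
    return [line for line in contents if needle in line]

# the two chains are independent, so they can run at the same time
with ThreadPoolExecutor(max_workers=2) as pool:
    chain1 = pool.submit(lambda: sorted(search(file1, "hit")))
    chain2 = pool.submit(lambda: sorted(search(file2, "hit"), reverse=True))

print(chain1.result())  # ['a hit', 'b hit']
print(chain2.result())  # ['z hit', 'y hit']
```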
3. Prediction

OK, two things up front:

- First: prediction's fault is that we could be wrong and end up
  having to do the hard computations anyway.
- Second: *this course never covers branch prediction (something that
  pretty much every CPU from the last 20 years does)*, so I'm gonna
  cover it here; ready, let's go.
For starters, let's say a basic CPU takes instructions sequentially from
memory: `A B C D`. However, this is kinda slow because there is *time*
between fetching an instruction, decoding it to know what instruction it
is, and finally executing it proper. For this reason modern CPUs
actually fetch, decode, and execute (and more!) instructions all at the
same time.
Instead of getting instructions like this:

    0
    AA
    BB
    CC
    DD

We actually do something more like this:

    A
    AB
    BC
    CD
    D0
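A toy cycle count makes the payoff concrete; this assumes a two-stage overlap like the diagram above and ignores stalls entirely:

```python
STAGES = 2  # fetch + execute, matching the toy diagram

def sequential_cycles(n_instructions):
    # without overlap, each instruction occupies the chip by itself
    return n_instructions * STAGES

def pipelined_cycles(n_instructions):
    # the first instruction fills the pipe, then one finishes per cycle
    return STAGES + (n_instructions - 1)

print(sequential_cycles(4))         # 8
print(pipelined_cycles(4))          # 5
print(pipelined_cycles(1_000_000))  # 1000001, vs 2000000 sequential
```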
If it doesn't seem like much, remember this is half an instruction on a
chip that is likely going to process thousands/millions of instructions,
so the savings scale really well.

This scheme is fine if our instructions are all coming one after the
other in memory, but if we need to branch then we likely need to jump to
a new location, like so:
    ABCDEFGHIJKL
    ^^^*    ^
       |----|
Now say we have the following code:

    if (x == 123) {
        main_call();
    }
    else {
        alternate_call();
    }
The (pseudo)assembly might look like:

``` {.asm}
    cmp x, 123
    jne second          ; if x != 123, take the else branch
main_branch:            ; pointless label but nice for reading
    call main_call
    jmp end
second:
    call alternate_call
end:
    ; something to do here
```
Our problem comes when we hit the conditional jump. Once we've loaded
that instruction and can start executing it, we have to make a decision:
load the `call main_call` instruction or the `call alternate_call` one?
Chances are that if we guess, we have a 50% chance of saving time and a
50% chance of tossing out our guess and starting the whole *fetch
instruction => decode etc.* process over again from scratch.
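That trade-off fits in a tiny expected-cost model; the 3-cycle penalty below is made up purely for illustration:

```python
def expected_cycles(p_correct, base=1, penalty=3):
    # a correct guess costs `base` cycles; a wrong one also pays the
    # penalty of refetching and redecoding from scratch
    return p_correct * base + (1 - p_correct) * (base + penalty)

print(expected_cycles(0.5))   # coin-flip guessing
print(expected_cycles(0.95))  # a decent predictor
```

The better the guesses, the closer the average cost gets to the no-branch case.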
Solution 1:

Try to determine which branches are taken prior to running the program
and just always guess the more likely branch. If we find that the
above branch takes `main_branch` more often, then we should always load
that branch, knowing that the loss from being wrong is offset by the
gain from the statistically more often correct guesses.
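In code, Solution 1 amounts to profiling a branch's history and then always guessing the majority outcome; the history below is made up for illustration:

```python
# made-up taken/not-taken history from a hypothetical profiling run
history = [True, True, False, True, True, True, False, True]

# profile step: our fixed guess is whichever outcome was more common
always_guess = history.count(True) >= len(history) / 2

# replay: a static guess is right whenever the real outcome matches it
correct = sum(outcome == always_guess for outcome in history)
print(f"{correct}/{len(history)} guesses correct")  # 6/8 guesses correct
```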
...