diff --git a/337/lec/lec1.md b/337/lec/lec1.md
index 42a8820..75fc202 100644
--- a/337/lec/lec1.md
+++ b/337/lec/lec1.md
@@ -4,13 +4,152 @@ The first lecture has bee 50% syllabus 25% videos, 25% simple terminology; expect nothing interesting for this section
-## Performace Options
+## General Performance Improvements in software
+
+
 In general we have a few options to increase performace in software; pipelining, parallelism, prediction.
-Parallelism/Pipelining
+1. Parallelism
-* I'll just assume you know what this entail; one does many things at once; the other is like queues for processessing.
+If we have multiple tasks to accomplish or multiple sources of data, we might find it better to work on multiple things at once [e.g. multi-threading, multi-core rendering].
-* Prediction
+2. Pipelining
-Yes this means interpreting potential outcomes/inputs/outputs etc. __BRANCHING__. We try to predict potentiality and account for it ahead of time.
+Here we are taking _data_ and serializing it into a linear form.
+We do things like this because it could make sense to process things linearly [e.g. taking data from a website response and forming it into a struct/class instance in C++/Java et al.].
+
+3. Prediction
+
+If we can predict an outcome and so avoid a bunch of computation, then it could be worth taking our prediction and proceeding with that instead of doing the full computation.
+This happens **a lot** in CPUs, where they use what's called [branch prediction](https://danluu.com/branch-prediction/) to run even faster.
+
+## Cost of Such Improvements
+
+As the saying goes: every decision you make as an engineer ultimately has a cost. Let's look at the cost of these improvements.
+
+1. Parallelism
+
+If we have a data set which has some form of inter-dependencies between its members, then we could easily run into the issue of waiting on other things to finish.
+
+Contrived Example:
+
+```
+Premise: output file contents -> search lines for some text -> sort the resulting lines
+
+We have to do the following processes:
+print my-file.data
+search file
+sort results of the search
+
+In bash we might do: cat my-file.data | grep 'Text to search for' | sort
+```
+
+Parallelism doesn't help here for one reason: each process depends on the output of the one before it, so the 2nd and 3rd tasks _must_ wait until the previous ones finish first.
+
+2. Pipelining
+
+Let's say we want to do the following:
+
+```
+Search file1 for some text [search file1]
+Feed the results of the search into a sorting program [sort]
+
+Search file2 for some text [search file2]
+Feed the results of the search into a reverse sorting program [reverse sort]
+
+The resulting Directed Acyclic Graph looks like
+
+[search file1] => [sort]
+
+[search file2] => [reverse sort]
+```
+
+Making the above linear means we effectively have to:
+
+```
+[search file1] => [sort] [search file2] => [reverse sort]
+| proc2 waiting........|
+```
+
+Which wastes a lot of time if the previous process is going to take a long time.
+Bonus points if process 2 is extremely short.
+
+
+3. Prediction
+
+Ok two things up front:
+
+* First: the weakness of prediction is that we could be wrong and end up having to do the hard computation anyway.
+* Second: _this course never covers branch prediction (something that pretty much every CPU from the last 20 years does)_ so I'm gonna cover it here; ready, let's go.
+
+For starters let's say a basic CPU takes instructions sequentially in memory: `A B C D`.
+However this is kinda slow because there is _time_ between fetching an instruction, decoding it to know what instruction it is, and finally executing it.
+For this reason modern CPUs actually fetch, decode, and execute (and more!) instructions all at the same time.
+
+Instead of getting instructions like this:
+
+
+```
+0
+ AA
+ BB
+ CC
+ DD
+```
+
+We actually do something more like this:
+
+```
+A
+ AB
+ BC
+ CD
+ D0
+```
+
+If it doesn't seem like much, remember this is half an instruction on a chip that is likely going to process thousands/millions of instructions, so the savings scale really well.
+
+
+This scheme is fine if our instructions are all coming one after the other in memory, but if we need to branch then we likely need to jump to a new location like so:
+
+```
+ABCDEFGHIJKL
+^^^*     ^
+   |-----|
+```
+
+Now say we have the following code:
+
+```
+if (x == 123) {
+    main_call();
+}
+else {
+    alternate_call();
+}
+```
+
+The (pseudo)assembly might look like
+
+```asm
+    cmp x, 123
+    jne second ; x != 123: skip ahead to the else branch
+main_branch: ; pointless label but nice for reading
+    call main_call
+    jmp end
+second:
+    call alternate_call
+end:
+    ; something to do here
+```
+
+Our problem comes when we hit the `jne`.
+Once we've loaded that instruction and can start executing it, we have to make a decision: load the `call main_call` instruction or the `call alternate_call`?
+Chances are that if we guess we have a 50% chance of saving time and a 50% chance of tossing out our guess and starting the whole _get instruction => decode etc._ process over again from scratch.
+
+Solution 1:
+
+Try to determine what branches are taken prior to running the program and just always guess the more likely branches.
+If we find that the above branch goes to `main_branch` more often then we should always load that branch, knowing that the loss from being wrong is offset by the gain from the statistically more often correct guesses.
+
+...
diff --git a/337/lec/lec10.md b/337/lec/lec10.md
index 5425808..0908e42 100644
--- a/337/lec/lec10.md
+++ b/337/lec/lec10.md
@@ -18,12 +18,12 @@ _Try to do this on your own first!_
 ![fig1](../mg/fig1llec11.png)
 Next we'll add on the `xor`.
-AGAIN: try to do this on your own, the main hint I'll give here is: the current mux needs to be changed.
+Try doing this on your own, but as far as hints go: don't be afraid to make changes to the mux.
 ![fig2](../img/fig2lec11.png)
 Finally we'll add the ability to add and subtract.
-You may have also noted that we can subtract two things to see if they are the same dhowever, we can also `not` the result of the `xor` and get the same result.
+You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.
 ![fig3](../img/fig3lec11.png)
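
As a rough illustration of the last change to `lec10.md`, the sketch below compares the two equality checks mentioned there: subtracting and testing for zero, versus applying `not` to the `xor` and testing for all ones. The 8-bit width and the function names are assumptions made for this example; the notes do not specify either.

```python
WIDTH = 8                  # assumed word size; the notes do not fix one
MASK = (1 << WIDTH) - 1    # 0xFF: keeps values inside WIDTH bits


def equal_by_subtract(a: int, b: int) -> bool:
    """Inputs are equal when their WIDTH-bit difference is zero."""
    difference = ((a & MASK) - (b & MASK)) & MASK
    return difference == 0


def equal_by_xnor(a: int, b: int) -> bool:
    """Inputs are equal when not(a xor b) has every bit set."""
    xnor = ~((a ^ b) & MASK) & MASK
    return xnor == MASK
```

Both functions agree with `a == b` for any pair of values that fit in the assumed width. The raw outputs differ (zero versus all ones), so "the same result" here means the same yes/no answer once the appropriate test is applied.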