The first lecture has been 50% syllabus, 25% videos, 25% simple terminology; expect nothing interesting for this section.
## General Performance Improvements in Software
In general we have a few options to increase performance in software: pipelining, parallelism, and prediction.
1. Parallelism

If we have multiple tasks to accomplish or multiple sources of data, we might find it better to work on multiple things at once [e.g. multi-threading, multi-core rendering].
2. Pipelining

Here we are somehow taking _data_ and serializing it into a linear form.

We do things like this because it can make sense to process things linearly [e.g. taking data from a website response and forming it into a struct/class instance in C++/Java et al.].
3. Prediction

If we can predict an outcome and avoid a bunch of computation, then it could be worth taking our prediction and proceeding with that instead.

This happens **a lot** in CPUs, where they use what's called [branch prediction](https://danluu.com/branch-prediction/) to run even faster.
## Cost of Such Improvements

As the saying goes, every decision you make as an engineer ultimately has a cost; let's look at the cost of these improvements.
1. Parallelism

If we have a data set with some form of inter-dependencies between its members, then we can easily run into the issue of waiting on other things to finish.

Contrived Example:

```
Premise: output file contents -> search lines for some text -> sort the resulting lines

We have to do the following processes:
print my-file.data
search file
sort results of the search

In bash we might do: cat my-file.data | grep 'Text to search for' | sort
```
Parallelism doesn't make sense here for one reason: this series of processes doesn't benefit from it, because the 2nd and 3rd tasks _must_ wait until the previous ones finish first.
2. Pipelining

Let's say we want to do the following:

```
Search file1 for some text: [search file1]
Feed the results of the search into a sorting program: [sort]

Search file2 for some text: [search file2]
Feed the results of the search into a reverse sorting program: [reverse sort]

The resulting Directed Acyclic Graph looks like:

[search file1] => [sort]

[search file2] => [reverse sort]
```
Making the above linear means we effectively have to:

```
[search file1] => [sort] [search file2] => [reverse sort]
| proc2 waiting........|
```
Which wastes a lot of time if the previous process is going to take a long time.
Bonus points if process 2 is extremely short.
3. Prediction

Ok, two things up front:

* First: prediction's fault is that we could be wrong and end up having to do the hard computations anyway.
* Second: _this course never covers branch prediction (something that pretty much every CPU in the last 20 years does)_ so I'm gonna cover it here; ready, let's go.
For starters, let's say a basic CPU takes instructions sequentially in memory: `A B C D`.
However this is kinda slow, because there is _time_ between fetching an instruction, decoding it to know what instruction it is, and finally executing it proper.
For this reason modern CPUs actually fetch, decode, and execute (and more!) instructions all at the same time.
Instead of getting instructions like this:

```
0
AA
BB
CC
DD
```
We actually do something more like this:

```
A
AB
BC
CD
D0
```
If it doesn't seem like much, remember this is half an instruction on a chip that is likely going to process thousands/millions of instructions, so the savings scale really well.
This scheme is fine if our instructions all come one after the other in memory, but if we need to branch then we likely need to jump to a new location, like so:

```
ABCDEFGHIJKL
^^^* ^
|-----|
```
Now say we have the following code:

```
if (x == 123) {
    main_call();
}
else {
    alternate_call();
}
```
The (pseudo)assembly might look like:

```asm
    cmp x, 123
    je second
main_branch: ; pointless label but nice for reading
    call main_call
    jmp end
second:
    call alternate_call
end:
    ; something to do here
```
Our problem comes when we hit the `je`.
Once we've loaded that instruction and can start executing it, we have to make a decision: load the `call main_call` instruction or the `call alternate_call`?
Chances are that if we guess, we have a 50% chance of saving time and a 50% chance of tossing out our guess and starting the whole _get instruction => decode etc._ process over again from scratch.
Solution 1:

Try to determine what branches are taken prior to running the program and just always guess the more likely branch.
If we find that the above branch calls `main_branch` more often, then we should always load that branch, knowing that the loss from being wrong is offset by the gain from the statistically more often correct branches.
...
_Try to do this on your own first!_

Next we'll add on the `xor`.
Try doing this on your own but as far as hints go: don't be afraid to make changes to the mux.

Finally we'll add the ability to add and subtract.
You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.