more updated content to display on the new site

shockrah 2020-07-05 17:18:01 -07:00
parent 62bcfa79b3
commit ec31274f14
6 changed files with 317 additions and 272 deletions

View File

@ -1,81 +1,51 @@
# lec10
## TCP Structure
Sequence numbers:
* the byte-stream _number_ of the first byte in the segment's data
ACKs:
* the seq # of the next byte expected from the other side
Example:
```
host a: user sends 'c'
seq=42, ack=79, data='c'
host b: ACK receipt sent to host a (echoes back 'c')
seq=79, ack=43, data='c' ; data sent back from host b
```
### Round trip time
EstimatedRTT = (1 - \alpha) * EstimatedRTT + \alpha * SampleRTT
> Lots of stuff missing here
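The notes leave \alpha unset, so here's a quick sketch of that EWMA in code, assuming the classic \alpha = 0.125 from RFC 6298 and made-up sample values:
```python
# Exponentially weighted moving average over RTT samples; alpha = 0.125
# is the RFC 6298 default, not a value from these notes.
def estimate_rtt(samples, alpha=0.125):
    estimated = samples[0]                    # seed with the first measurement
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
    return estimated

print(estimate_rtt([100.0, 120.0, 90.0, 110.0]))  # a smoothed RTT in ms
```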
## TCP Reliable data transfer
Implements:
* pipelined segments
* cumulative `ACK`s
  * this just means we assume that the highest-numbered ACK implies all the previous segments have been received properly too
* a single retransmission timer
### Sender Events
1. First create a segment w/ a seq no.
    a. The sequence number refers to the first byte of the segment's data in the byte stream
2. Start the timer if we don't already have one running.
    a. The timer is based off the oldest un-ACKed segment
## Retransmission w/ TCP
__Timeout__: usually it's pretty long, so waiting for a full timeout on a lost packet wastes time.
Instead, when a segment goes missing the receiver keeps ACKing the last well-received segment:
the receiver gets `1 2 3 5` but not `4`, so it ACKs `1 2 3` like normal, then the arrival of `5` produces duplicate ACKs for `3`; once 3 duplicate ACKs reach the sender (before the timeout) it starts re-sending from `4`.
This is what we call _fast retransmit_.
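A minimal sketch of the duplicate-ACK counting on the sender side (the names and the segment-numbered ACKs are mine, not from the notes):
```python
# Count duplicate ACKs on the sender side; three duplicates for the same
# segment trigger a fast retransmit instead of waiting for a timeout.
def sender_loop(acks):
    last_ack, dup_acks = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_acks += 1
            if dup_acks == 3:
                print(f"fast retransmit: resend segment {ack + 1}")
                dup_acks = 0
        else:
            last_ack, dup_acks = ack, 0

# Receiver got 1 2 3 5 ... but not 4: it keeps ACKing 3.
sender_loop([1, 2, 3, 3, 3, 3])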
## Flow Control
_The main thing here is that the receiver controls the sender's "send rate" so that the receiver doesn't get inundated._
The receiver will _advertise_ its free buffer space by including an `rwnd` value in the TCP header.
This just tells the sender how much data the receiver can accept at a time.
Example: transferring a large file from host to host.
\alpha will send a file to \beta.
\alpha sends some file data to \beta, who then ACKs the packet but includes in the header that their buffer is full.
\alpha then responds with a 1-byte packet now and then to keep the connection alive.
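The sender-side rule this implies is simple: never have more un-ACKed bytes in flight than the advertised `rwnd`. A tiny sketch (helper name and numbers invented):
```python
# Sender-side flow control: keep un-ACKed bytes <= receiver's advertised rwnd.
def can_send(bytes_in_flight, segment_len, rwnd):
    return bytes_in_flight + segment_len <= rwnd

print(can_send(bytes_in_flight=3000, segment_len=1460, rwnd=4096))  # False: wait
print(can_send(bytes_in_flight=1000, segment_len=1460, rwnd=4096))  # True: send
```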
## Connection Management
Before sender/receiver start exchanging anything we must perform a `handshake`.
`SYN` is a special packet type under TCP which we can use to synchronize both client and server.
### Closing
There's a `FIN` bit inside the header.
We send this off to the receiver and enter a wait state (`fin_wait`).
We only wait because there might be more data coming.
The receiver enters a `close_wait` state, _but_ still sends any data left over.
Once its last data is ACKed, the receiver sends a `FIN` packet of its own and the connection closes.
lec1
=====
First we'll define some terminology.
> Hosts
End systems - these typically don't bother with routing data through a network.
> Communication Links
Typically the actual systems that connect things together.
Network edges
-------------
Can be subdivided into clients & servers, and sometimes both at the same time.
Access network: cable network
-----------------------------
Typically when we have to share one line we can change the frequency of the
signal as one method to distinguish between different data
which may sometimes come from different sources.
### Home Network
Let's start with the modem. All it does is take some signal and convert
it to the proper IEEE data format (citation needed).
Typically we would then pipe that data to a router which, in
most home scenarios, would forward that input data to whichever
machines requested it.
If you recall back to your discrete mathematics coursework, various graph
topologies were covered, and you likely noted that *star* topologies were
common for businesses since they make it easiest to send data from one
outside node on the star to another. In practice this would just mean
having the router/modem setup be one of the appendages of the star and a
switch be in the middle so that the data only has to make two hops to
get anywhere in the network.
> Doesn't that mean there's one node that could bring the whole network
> down at any time?
Absolutely, which is why with a *very* small network of a
couple devices it's not really a problem, but if you have an office full
of employees all with their own machines plus wireless, printers,
servers, etc. then it's a huge problem. Still, a typical small
business or shop might be more inclined to use such a setup because:
* it's easy to set up
* it's cheap to maintain

View File

@ -1,77 +1,32 @@
# Block Ciphers
The main concept here is twofold:
* we take _blocks_ of data and cipher the _blocks_
* a given key is actually used to generate recursive keys which are further used on the data itself
_bs example ahead_
Say we have a key 7 and some data 123456.
We take the whole data set and chunk it into blocks (for example): 12 34 56.
Let's say our function here is to just add 7 to each block, so we do the first step:
```
12 + 7 = 19
Unlike other ciphers we don't reuse 7; instead we use the new value as both the next key and part of our cipher text

19 + 34 = 53
Cipher so far: 1953..

53 + 56 = 109 <= let's pretend that this rolls over 99 and back to 00
09 <= like this
Final cipher: 195309
```
_It should be noted that in practice these functions usually take in huge keys and blocks_.
> Deciphering
Start from the back of the cipher, not the front; if we used an xor function scheme (xor is a symmetric function) we would simply xor each cipher block with the cipher block before it, performing the same encryption scheme in reverse.
Example::Encryption
```
Key: 110
Function scheme: xor
Data: 101 001 111

101 xor 110 = 011   <= first block xor'd with the key
001 xor 011 = 010   <= next block xor'd with the previous cipher block
111 xor 010 = 101   <= same again

011 010 101 <= encrypted
```
Example::Decryption
```
Ciphered: 011 010 101
Function scheme: xor

101 xor 010 = 111   <= last cipher block xor'd with the one before it
010 xor 011 = 001
011 xor 110 = 101   <= first cipher block xor'd with the original key

101 001 111 <= the original data
```
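A minimal sketch of that chaining scheme in code — my own toy rendering of the notes' example, not a real cipher:
```python
# Toy xor-chaining cipher from the worked example: each block is xor'd
# with the previous cipher block (the first block uses the key itself).
def encrypt(blocks, key):
    out, prev = [], key
    for b in blocks:
        prev = b ^ prev
        out.append(prev)
    return out

def decrypt(cipher, key):
    out, prev = [], key
    for c in cipher:
        out.append(c ^ prev)
        prev = c
    return out

data = [0b101, 0b001, 0b111]
cipher = encrypt(data, 0b110)
print([format(c, '03b') for c in cipher])   # ['011', '010', '101']
assert decrypt(cipher, 0b110) == data       # round-trips back to the data
```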
# Feistel Cipher
Two main components:
1. substitution: each _thing_ in the data to cipher is replaced by a _ciphered thing_
2. permutation: nothing is added, deleted, or replaced; instead the order of _things_ is changed
Basically imagine that every _type of thing_ in our data maps to some other _type of thing_ in the data, and the _things_ get swapped/reordered.
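The notes stop short of the actual round structure, so here's a hedged sketch of a generic two-branch Feistel round with a toy round function (none of the names below come from the lecture):
```python
# One Feistel round: keep the right half, and xor the left half with
# F(right, key); decryption runs the same rounds with the keys reversed.
def feistel_encrypt(left, right, keys, F):
    for k in keys:
        left, right = right, left ^ F(right, k)
    return left, right

def feistel_decrypt(left, right, keys, F):
    for k in reversed(keys):
        right, left = left, right ^ F(left, k)
    return left, right

F = lambda half, key: (half * 31 + key) & 0xFF   # toy round function
l, r = feistel_encrypt(0x12, 0x34, [7, 13, 42], F)
assert feistel_decrypt(l, r, [7, 13, 42], F) == (0x12, 0x34)
```
Note that F never has to be invertible — the xor-and-swap structure is what makes the whole thing reversible.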
# DES - Data Encryption Standard
Widely used until about 2001, when AES surpassed it as the newer(ish (kinda)) standard.
DEA was the actual algorithm though:
* 64-bit blocks
* 56-bit keys
* turns a 64-bit input into a 64-bit output (wew)
* running the steps in reverse also reverses the encryption itself

Active v Passive Attacks
========================
Base Definitions
----------------
Passive: compromising a system but not necessarily doing anything apart
from *watching*
Active: compromising a system while doing something to the system apart
from infiltrating it
Loosely speaking
----------------
*Passive* can be just like listening in on a conversation (eavesdropping)
where *active* is like jumping into the conversation and trying to do
something to it.
When/How would either happen?
-----------------------------
If the result of an attack is to actually trigger some code to run then
usually we need to first gather the information required to understand
how to make that happen. The reasoning is straightforward: if you don't
know how some system works then it's much harder to exploit that system.
Random example: using a keylogger to log keystrokes before sending those
logs to a server for processing could be a passive attack, since you're
still in a *gathering data* sort of mode. Finally, using that data to
try logging into some service would be the active portion of a
full-scale attack.

View File

@ -1,41 +1,169 @@
# lec11
At this point I'll mention that just reading isn't going to get you anywhere; you have to try things and give it a real, earnest attempt.
__ALU:__ Arithmetic Logic Unit
## Building a 1-bit ALU
![fig0](../img/alu.png)
First we'll create an example _ALU_ which implements choosing between an `and`, `or`, `xor`, or `add`.
Whether or not our amazing _ALU_ is useful doesn't matter, so we'll go one function at a time (besides `and`/`or`).
First recognize that we need to compute both `and` and `or` against our two inputs A/B.
This means we have two outputs, and we need a mux to select between them.
_Try to do this on your own first!_
![fig1](../img/fig1lec11.png)
Next we'll add on the `xor`.
Try doing this on your own, but as far as hints go: don't be afraid to make changes to the mux.
![fig2](../img/fig2lec11.png)
Finally we'll add the ability to add and subtract.
You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.
![fig3](../img/fig3lec11.png)
At this point our _ALU_ can `and`, `or`, `xor`, and `add`/`sub`.
The mux will choose which logic block to use; the carry-in line will tell the `add` logic block whether to add or subtract.
Finally, the A-invert and B-invert lines let us determine whether we want to invert either input A or B.
## N-bit ALU
For sanity we'll use the following block for our new ALU.
![fig4](../img/fig4lec11.png)
Note that we are chaining the carry-ins to the carry-outs just like a ripple adder;
also each 1-bit ALU just works with `1` bit from our given 4-bit input.

lec1
====
> What on earth?
The first lecture has been 50% syllabus, 25% videos, 25% simple
terminology; expect nothing interesting for this section.
General Performance Improvements in software
--------------------------------------------
In general we have a few options to increase performance in software:
pipelining, parallelism, prediction.
1. Parallelism
If we have multiple tasks to accomplish or multiple sources of data we
might instead find it better to work on multiple things at
once [e.g. multi-threading, multi-core rendering].
2. Pipelining
Here we are somehow taking *data* and serializing it into a linear form.
We do things like this because it could make sense to do things
linearly [e.g. taking data from a website response and forming it into a
struct/class instance in C++/Java et al.].
3. Prediction
If we can predict an outcome to avoid a bunch of computation then it
could be worth it to take our prediction and proceed with that instead of
the full computation. This happens **a lot** in CPUs, where they use what's called
[branch prediction](https://danluu.com/branch-prediction/) to run even
faster.
Cost of Such Improvements
-------------------------
As the saying goes, every decision you make as an engineer ultimately
has a cost, so let's look at the cost of these improvements.
1. Parallelism
If we have a data set which has some form of inter-dependencies between
its members then we could easily run into the issue of waiting on other
things to finish.
Contrived Example:
Premise: output file contents -> search lines for some text -> sort the resulting lines
We have to do the following processes:
1. print my-file.data
2. search the output
3. sort the results of the search
In bash we might do: `cat my-file.data | grep 'Text to search for' | sort`
Parallelism doesn't make sense here for one reason: this series of
processes doesn't benefit from parallelism because the 2nd and 3rd tasks
*must* wait until the previous ones finish first.
2. Pipelining
Let's say we want to do the following:
* Search file1 for some text: [search file1]
* Feed the results of the search into a sorting program: [sort]
* Search file2 for some text: [search file2]
* Feed the results of the search into a reverse sorting program: [reverse sort]
The resulting Directed Acyclic Graph looks like:
```
[search file1] => [sort]
[search file2] => [reverse sort]
```
Making the above linear means we effectively have to do:
```
[search file1] => [sort] => [search file2] => [reverse sort]
| proc2 waiting.........|
```
Which wastes a lot of time if the previous process is going to take a
long time. Bonus points if process 2 is extremely short.
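As a sketch of what we lose by going linear (timings and names invented; Python threads overlapping the two independent chains instead of running them back to back):
```python
# The two chains above have no dependency on each other, so they can overlap.
import threading, time

def chain(name, seconds):
    time.sleep(seconds)            # stand-in for the search + sort work
    print(f"{name} done")

t1 = threading.Thread(target=chain, args=("file1 chain", 0.2))
t2 = threading.Thread(target=chain, args=("file2 chain", 0.2))
start = time.time()
t1.start(); t2.start(); t1.join(); t2.join()
print(f"overlapped: {time.time() - start:.1f}s (vs ~0.4s run one after the other)")
```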
3. Prediction
OK, two things up front:
- First: prediction's fault is that we could be wrong and end up
  having to do the hard computations anyway.
- Second: *this course never covers branch prediction (something that
  pretty much every CPU in the last 20 years does)*, so I'm
  gonna cover it here; ready, let's go.
For starters let's say a basic CPU takes instructions sequentially in
memory: `A B C D`. However this is kinda slow, because there is *time*
between fetching an instruction, decoding it to know what instruction it
is, and finally executing it proper. For this reason modern CPUs actually
fetch, decode, and execute (and more!) instructions all at the same time.
Instead of getting instructions like this (read each row as one clock
tick, with only one stage busy at a time):
```
0
AA
BB
CC
DD
```
We actually do something more like this, overlapping the stages:
```
A
AB
BC
CD
D0
```
If it doesn't seem like much, remember that this saving applies to every
instruction on a chip that is likely going to process thousands/millions
of instructions, so the savings scale really well.
This scheme is fine if our instructions are all coming one after the
other in memory, but if we need to branch then we likely need to jump to
a new location like so:
```
ABCDEFGHIJKL
^^^* ^
|-----|
```
Now say we have the following code:
``` {.c}
if (x == 123) {
    main_call();
}
else {
    alternate_call();
}
```
The (pseudo)assembly might look like:
``` {.asm}
cmp x, 123
jne second ; jump to the else branch when x != 123
main_branch: ; pointless label but nice for reading
call main_call
jmp end
second:
call alternate_call
end:
; something to do here
```
Our problem comes when we hit the `jne`. Once we've loaded that instruction
and can start executing it, we have to make a decision: load the
`call main_call` instruction or the `call alternate_call`? Chances are
that if we guess we have a 50% chance of saving time and a 50% chance of
tossing out our guess and starting the whole *get instruction => decode
etc.* process over again from scratch.
Solution 1:
Try to determine which branches are taken prior to running the program
and just always guess the more likely branch. If we find that the
above code calls `main_branch` more often then we should always load that
branch, knowing that the loss from being wrong is offset by the
gain from the statistically more-often-correct guesses.
...
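The notes trail off here, so as a sketch of the dynamic flavor — the two-bit saturating counter that intro architecture texts describe, not anything covered in this course:
```python
# Two-bit saturating counter: per-branch state 0..3, predict "taken" when
# state >= 2; one wrong guess can't immediately flip a strong prediction.
def simulate(outcomes, state=0):
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        correct += (predicted == taken)
        # saturate at 0 and 3 while nudging toward the actual outcome
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct

history = [True] * 8 + [False] + [True] * 8   # a loop branch with one exit
print(simulate(history), "of", len(history), "predicted correctly")
```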

View File

@ -1,43 +1,47 @@
# lec10
This lecture has a corresponding lab exercise whose instructions can be found in `triggers-lab.pdf`.
## What is a trigger
Something that executes when _some operation_ is performed.
## Structure
```
create trigger NAME before some_operation
when(condition)
begin
    do_something
end;
```
To explain: first we `create trigger` followed by some trigger name.
Then we denote that this trigger should fire whenever some operation happens.
This trigger then executes everything in the `begin...end;` section _before_ the new operation happens.
> `after`
Likewise, if we want to fire a trigger _after_ some operation we can just replace the `before` keyword with `after`.
> `new.asdf`
Refers to the _new_ value being added to a table.
> `old.asdf`
Refers to the _old_ value being changed in a table.
## Trigger Metadata
If you want to look at which triggers exist you can query the `sqlite_master` table.
```
select * from sqlite_master where type='trigger';
```
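To make this concrete, here's a small end-to-end sketch using Python's built-in `sqlite3` module (the table and trigger names are invented for the example):
```python
# Minimal trigger demo with Python's built-in sqlite3 module.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table accounts(name text, balance integer);
create table audit(msg text);
-- fires before every insert on accounts and records it in audit
create trigger log_insert before insert on accounts
begin
    insert into audit values('adding ' || new.name);
end;
""")
db.execute("insert into accounts values('alice', 100)")
print(db.execute("select msg from audit").fetchall())   # [('adding alice',)]
print(db.execute("select name from sqlite_master where type='trigger'").fetchall())
```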
lec1
====
Databases introduction
----------------------
First off, why do we even need a database and what do they accomplish?
Generally a database will have 3 core elements to it:
1. querying
    - finding things
    - just as well, structured data makes querying easier
2. access control
    - who can access which data segments and what they can do with
      that data
    - reading, writing, sending, etc.
3. corruption prevention
    - mirroring/raid/parity checking/checksums/etc. as some examples
Modeling Data
-------------
Just like other data problems, we can choose what model we use to deal
with data. In the case of sqlite3 the main data model we have is
tables, where we store our pertinent data; later we'll learn that even
data about our data is stored in tables.
Because everything goes into a table, it means we also have to have a
plan for *how* we want to lay out our data in the table. The **schema**
is that design/structure for our database. The **instance** is the
occurrence of that schema with some data inside the fields, i.e. we have
a table sitting somewhere in the database which follows the given
structure of an aforementioned schema.
**Queries** are typically known to be declarative; typically we don't
care about what goes on behind the scenes in practice, since by this
point we are assuming we have tools we trust and know to be somewhat
efficient.
Finally we have **transactions**, which are a set of operations that are
designed to only commit if they all complete successfully.
Transactions are not allowed to partially fail: if *anything* fails then
everything should be undone and the state should revert to the previous
state. This is useful because if we are, for example, transferring money
to another account we want to make sure that the exchange happens
seamlessly; otherwise we should back out of the operation altogether.
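And a sketch of that all-or-nothing behavior, again with `sqlite3` (the account rows and the balance check are invented):
```python
# A transaction either fully commits or fully rolls back.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table accounts(name text, balance integer check(balance >= 0))")
db.executemany("insert into accounts values(?, ?)", [("alice", 50), ("bob", 0)])
try:
    with db:  # one transaction: both updates happen, or neither does
        db.execute("update accounts set balance = balance - 100 where name = 'alice'")
        db.execute("update accounts set balance = balance + 100 where name = 'bob'")
except sqlite3.IntegrityError:
    print("transfer failed, rolled back")
print(db.execute("select * from accounts").fetchall())  # balances unchanged
```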

View File

@ -1,38 +1,35 @@
# Adjacency list
Imagine 8 nodes with no connections.
To store this data in an _adjacency list_ we need __n__ items to store them.
We'll have 0 __e__dges however, so in total our space is (n+e) == (n).
# Adjacency matrix
space: O(n^2)
The convention for notation btw is [x,y], meaning:
* _from x to y_
# Breadth first search
1. add the neighbors of current to a queue
2. go through current's neighbors and add their neighbors to the queue
3. add the neighbors' neighbors
4. keep going until there are no more neighbors to add
5. go through the queue and start popping members out of it
# Depth first search
Here we're going deeper into the neighbors _once we have a starting point_.
_available just means that a node has a non-visited neighbor_
1. if available, go to a neighbor
2. if no neighbors are available, visit
3. goto 1
# Kahn Sort
# Graph Coloring
When figuring out how many colors we need for the graph, we should note the degree of the graph.

A* Pathfinding
===============
There are 3 main values used in reference to A*:
```
f = how promising a new location is
g = distance from origin
h = estimated distance to goal
f = g + h
```
For a grid space our `h` is calculated by two straight shots to the goal
from the current location (ignoring barriers). The grid space `g` value is
basically the number of steps we've taken from the origin. We maintain
a list of potential nodes only, so if one of the seeking nodes gets us
stuck we can freely remove it, because it succs.
Time & Space Complexities
==========================
Best-First Search
-----------------
Time: O(VlogV + E)
Dijkstra's
----------
O(V^2 + E)
A*
---
Worst case is the same as Dijkstra's time: O(V^2 + E)
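A sketch of the A* description above — Manhattan distance for `h`, matching the "two straight shots" idea; the grid and all names are mine:
```python
# A* on a small grid: h = Manhattan distance ("two straight shots"),
# g = steps taken so far, f = g + h; the open heap is the list of
# potential nodes we can freely abandon when they get us stuck.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_heap = [(0, 0, start, [start])]    # (f, g, position, path)
    seen = set()
    while open_heap:
        f, g, (r, c), path = heapq.heappop(open_heap)
        if (r, c) == goal:
            return path
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                h = abs(goal[0] - nr) + abs(goal[1] - nc)
                heapq.heappush(open_heap, (g + 1 + h, g + 1, (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],   # 1 = barrier
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # walks around the barrier
```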

View File

@ -1,69 +1,60 @@
# Hardware deployment Strategies
## Virtual Desktop Infrastructure
aka 0-Clients: a network-hosted OS is what each client would use.
In some cases that network is a pool of servers which are tapped into.
Clients can vary in specs as explained below (context: a university):
> Pool for a Library
Clients retain low hardware specs since most are just using office applications and not much else.
> Pool for an Engineering department
Clients connect to another pool where both the clients and the pool have better hardware specs/resources.
The downside is that there is _1 point of failure_.
The pool goes down and so does everyone else, meaning downtime is going to cost way more than a single machine going down.
# Server Hardware Strategies
> All eggs in one basket
Imagine just one server doing everything.
* important to maintain redundancy in this case
* upgrading is a pain sometimes
> Buy in bulk, allocate fractions
Basically have a server that serves up various virtual machines.
# Live migration
Allows us to move live, running virtual machines onto new servers if their current server is running out of resources.
# Containers
_docker_: Virtualize the service, not the whole operating system
# Server Hardware Features
> Things that servers benefit from
* fast i/o
* low-latency CPUs (Xeons > i-series)
* expansion slots
* lots of network ports available
* ECC memory
* remote control
Patch/version control on servers:
scheduling is usually slower/more lax so that servers don't just randomly break all the time.
# Misc
Uptime: more uptime is _going_ to be more expensive. Depending on what you're doing, figure out how much downtime you can afford.
# Specs
Like before, _ECC memory_ is basically required for servers, along with a good number of network interfaces and solid disk management.
Remember that the main parameters for choosing hardware are going to be budget and necessity; basically, what can you get away with on the budget at hand?

Data storage
============
Spinning Disks
--------------
Cheaper for more storage.
RAID - Redundant Array of Independent Disks
-------------------------------------------
Raid 0: basically cramming multiple drives together and treating them as one.
Data is striped across the drives, but if one fails then you literally
lose a chunk of data.
Raid 1: data is mirrored across the drives so it's completely redundant;
if one fails the other is still alive. It's not a backup however,
since file updates will affect all the drives.
Raid 5: parity. Combining multiple drives allows us to establish the
parity of the data on other drives to recover that data if it goes
missing. (min 3 drives)
Raid 6: same in principle as raid 5 but this time we have an extra drive
for just parity.
Raid 10: 0 and 1 combined, so we have a set of drives in raid 0 and put
those together in raid 1 with another equally sized set of drives.
Network Attached Storage - NAS
------------------------------
Basically space stored on the local network.
Storage Area Network - SAN
--------------------------
Applicable when we virtualise whole OS's for users; we use a storage
device attached to the network to serve different operating systems.
Managing Storage
================
Outsourcing the storage for users to services like Onedrive, because it
becomes their problem and not ours.
Storage as a Service
====================
Ensure that the OS gets its own space/partition on a drive and give the
user their own partition to ruin. That way the OS (Windows) will just
fill its own partition into another dimension.
Backup
======
Other people's data is in your hands, so make sure that you back up data
in some way. Some external services can be nice if you find that you
constantly need to get to your backups. Tape records are good for
archival purposes; keep in mind that they are slow as hell.
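To make the RAID 5 parity idea above concrete, here's a toy sketch (byte strings standing in for whole drives; not how a real controller works):
```python
# RAID 5's recovery trick: parity = XOR of all data blocks, so any single
# lost block is the XOR of the parity with the surviving blocks.
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

drives = [b"AAAA", b"BBBB", b"CCCC"]            # toy "drives"
parity = reduce(xor_blocks, drives)             # stored on a fourth drive

lost = drives.pop(1)                            # drive 1 dies
rebuilt = reduce(xor_blocks, drives + [parity]) # XOR of survivors + parity
assert rebuilt == lost
print(rebuilt)                                  # b'BBBB'
```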