From ec31274f14aa317c45966a1427d952391a096e89 Mon Sep 17 00:00:00 2001
From: shockrah
Date: Sun, 5 Jul 2020 17:18:01 -0700
Subject: [PATCH] more updated content to display on the new site

---
 311/lec/lec10.md       | 102 +++++++++--------------
 312/ciphers.md         |  93 ++++++---------------
 337/lec/lec10.md       | 178 +++++++++++++++++++++++++++++++++++------
 363/lec/lec10.md       |  72 +++++++++--------
 370/notes/adj-list.md  |  51 ++++++------
 412/hardware-strats.md |  93 ++++++++++-----------
 6 files changed, 317 insertions(+), 272 deletions(-)

diff --git a/311/lec/lec10.md b/311/lec/lec10.md
index de79746..cfe2ce8 100644
--- a/311/lec/lec10.md
+++ b/311/lec/lec10.md
@@ -1,81 +1,51 @@
-# lec10
+lec1
+=====

-## TCP Structure

+First we'll define some terminology.

+> Hosts

-Sequence Numbers:
-* byte stream _number_ of first byte in segment's data

+End systems - typically don't bother with routing data through a network

-ACKs:
-* seq # of next byte expected from other side

+> Communication Links

-Example:
-```
-host a: user sends 'c'
-    seq=42, ack=79, data='c'
-host b: ACK receipt sent to host a (echoes back 'c')
-    seq=72, ack=49, data='c' ; data sent back from host b
-```

+Typically the actual systems that connect things together.

-### Round trip time

-EstimatedRTT = (1-\alpha)*EstimatedRTT + \alpha*SampleRTT

+Network edges
+-------------

-> Lots of stuff missing here

+Can be subdivided into clients & servers, and sometimes both at the same time.

-## TCP Reliable data transfer

-Implements:

-* Pipelined segments
-* cumulative `ACK`
-  * This just means that we assume that the highest sequenced ACK also means the previous segments have been received properly too
-* Single retransmission timer

-### Sender Events

-1. First create segment w/ seq no.
-   a. Sequence number refers to the first byte in the segment's data
-2. Start timer if we don't already have one.
-   a. Timer is based off the oldest un-ACKed segment

-## Retransmission w/ TCP

-__Timeout__: usually it's pretty long, so waiting out a timeout on a lost packet is slow.
-When a segment goes missing the receiver responds to the sender with 3 duplicate ACKs for the last well-received segment:

-Receiver gets `1 2 3 5` but not `4`. We respond with the ACK for `3` like normal, then 3 duplicate ACKs for `3` are sent to the sender before the timeout fires, and the sender starts re-sending from `4`.
-This is what we call _fast retransmit_.

-_The main thing here is that the receiver controls the sender's "send rate" so that the receiver doesn't get inundated._
-Receiver will _advertise_ free buffer space by including the `rwnd` value in the TCP header.
-This just tells the sender how much data it can accept at a time.

-Example: Transferring a large file from host to host.

-Alpha will send a file to Beta.
-Alpha sends some file data to Beta, who then ACKs the packet but includes in the header that their buffer is full.
-Alpha responds with a 1-byte packet to keep the connection alive.

-## Connection Management

-Before sender/receiver start exchanging anything we must perform a `handshake`.
-`SYN` is a special packet type under TCP which we can use to synchronize both client and server.

-### Closing

-`FIN` bit inside the header.
-We send this off to a receiver and we enter a `close_wait` state.
-We only wait because there might be more data.
-Receiver enters the `close_wait` state as well, _but_, still sends any data left over.
-Once the last `ACK` is sent we send a `FIN` packet
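As a quick aside, the `EstimatedRTT` formula in the removed notes above is just an exponentially weighted moving average. Here is a minimal Python sketch of it; the `alpha` value of 0.125 and the sample numbers are illustrative assumptions, not something from these notes:

```python
# Minimal sketch of EstimatedRTT = (1-alpha)*EstimatedRTT + alpha*SampleRTT.
# alpha=0.125 is an assumed value for illustration only.
def estimate_rtt(samples, alpha=0.125):
    estimated = samples[0]              # seed with the first measurement
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
    return estimated

print(estimate_rtt([100, 120, 90, 110]))  # smoothed RTT in ms
```

New samples only nudge the estimate rather than replace it, which is why one slow measurement doesn't blow up the retransmission timer.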
+Access network: cable network
+-----------------------------

+Typically when we have to share one line we can change the frequency of the signal as one method to distinguish between different data which may sometimes come from different sources.

+### Home Network

+Let's start with the modem. All it does is take some signal and convert it to the proper IEEE data format (citation needed). Typically we would then pipe that data to a router which, in most home scenarios, would forward that input data to whichever machines requested it.

+If you recall back to your discrete mathematics coursework, various graph topologies were covered and you likely noted that *star* topologies were common for businesses, since they make it easiest to send data from one outside node on the star to another. In practice this would just mean having the router/modem setup be one of the appendages of the star and a switch in the middle, so that data only has to make two hops to get anywhere in the network.

+> Doesn't that mean there's one node that could bring the whole network
+> down at any time?

+Absolutely. If you have a *very* small network with a couple of devices it's not really a problem, but if you have an office full of employees all with their own machines plus wireless, printers, servers, etc. then it's a huge problem. Still, a small business or shop might be more inclined to use such a setup because:

+* It's easy to set up
+* It's cheap to maintain

diff --git a/312/ciphers.md b/312/ciphers.md
index 7b16b8c..9de48cf 100644
--- a/312/ciphers.md
+++ b/312/ciphers.md
@@ -1,77 +1,32 @@
-# Block Ciphers
+Active v Passive Attacks
+========================

-The main concept here is twofold:

+Base Definitions
+----------------

-* we take _blocks_ of data and cipher the _blocks_
-* A given key is actually used to generate recursive keys which are further used on the data itself

+Passive: compromising a system but not necessarily doing anything apart from *watching*

+Active: compromising a system while doing something to the system apart from infiltrating it

-_bs example ahead_

+Loosely speaking
+----------------

-Say we have a key 7 and some data 123456.
-We take the whole data set and chunk it into blocks (for example): 12 34 56.

+*Passive* can be just like listening in on a conversation (eavesdropping) where *active* is like jumping into the conversation and trying to do something to it.

-Let's say our function here is to just add 7 to each block, so we do the first step:

+When/How would either happen?
+-----------------------------

-```
-12 + 7 = 19
-Unlike other ciphers we don't reuse 7; instead we use the new output as both the new key and part of our cipher text
-
-19 + 34 = 53
-Cipher: 1953..
-
-53 + 56 = 109 <= let's pretend that this rolls over 99 and back to 00
-          09 <= like this
-
-Final cipher: 195309
-```

+If the result of an attack is to actually trigger some code to run, then usually we need to first gather the information required to understand how to make that happen. The reasoning is straightforward: if you don't know how some system works then it's much harder to exploit that system.

-_It should be noted that in practice these functions usually take in huge keys and blocks_.
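The removed add-and-chain example above is easy to make concrete. Here is a small Python sketch of it; the block values, the starting key of 7, and the mod-100 rollover are exactly the ones from the example, while the function and variable names are my own:

```python
# Toy chained block cipher from the example above: each block is added
# to the previous output, and sums "roll over 99 and back to 00".
def toy_block_cipher(blocks, key=7):
    out, prev = [], key
    for block in blocks:
        prev = (block + prev) % 100   # previous output acts as the next key
        out.append(prev)
    return out

print(toy_block_cipher([12, 34, 56]))  # [19, 53, 9] -> cipher "195309"
```

Note how each ciphered block depends on everything before it, which is exactly the chaining property the notes highlight.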
-> Deciphering

-Start from the back of the cipher, not the front; if we used an xor function scheme (xor is symmetric) we would simply xor the blocks again, in reverse order, and thus perform the same encryption scheme in reverse.

-Example::Encryption

-```
-Key: 110
-Function scheme: xor
-Data: 101 001 111
-
-101 011 010
-110 001 111
-
-011 010 101 <= encrypted
-```

-Example::Decryption

-```
-Ciphered: 011 010 101
-Function scheme: xor
-
-...
-```

-# Feistel Cipher

-Two main components:

-1. each _thing_ in the data to cipher is replaced by a _ciphered thing_
-2. nothing is added, deleted, or replaced in sequence; instead the order of _things_ is changed.

-Basically imagine that every _type of thing_ in our data maps to some other _type of thing_ in the data and thus becomes swapped/reordered.

-# DES - Data Encryption Standard

-Widely used until about 2001 when AES surpassed it as the newer(ish(kinda)) standard.

-DEA was the actual algorithm though:

-* 64 bit blocks
-* 56 bit keys
-* turns a 64-bit input into a 64-bit output (wew)
-* Steps in reverse also reverse the encryption itself

+Random example: Using a keylogger to log keystrokes before sending those logs to a server for processing could be a passive attack, since you're still in a *gathering data* sort of mode. Then using that data to try logging into some service would be the active portion of a full-scale attack.

diff --git a/337/lec/lec10.md b/337/lec/lec10.md
index 0908e42..7c75724 100644
--- a/337/lec/lec10.md
+++ b/337/lec/lec10.md
@@ -1,41 +1,169 @@
-# lec11
+lec1
+====

-At this point I'll mention that just reading isn't going to get you anywhere; you have to try things and give it a real earnest attempt.

+> What on earth?

-__ALU:__ Arithmetic Logic Unit

+The first lecture has been 50% syllabus, 25% videos, 25% simple terminology; expect nothing interesting for this section.

-## Building a 1-bit ALU

+General Performance Improvements in software
+--------------------------------------------

-![fig0](../img/alu.png)

+In general we have a few options to increase performance in software: pipelining, parallelism, prediction.

-First we'll create an example _ALU_ which implements choosing between an `and`, `or`, `xor`, or `add`.
-Whether or not our amazing _ALU_ is useful doesn't matter, so we'll go one function at a time (besides `and/or`).

+1. Parallelism

-First recognize that we need to choose between `and` or `or` against our two inputs A/B.
-This means we have two results, `and` and `or`, and we need to select between them.
-_Try to do this on your own first!_

+If we have multiple tasks to accomplish or multiple sources of data we might instead find it better to work on multiple things at once \[e.g. multi-threading, multi-core rendering\]

-![fig1](../img/fig1lec11.png)

+2. Pipelining

-Next we'll add on the `xor`.
-Try doing this on your own, but as far as hints go: don't be afraid to make changes to the mux.

+Here we are somehow taking *data* and serializing it into a linear form. We do things like this because it could make sense to process things linearly \[e.g. taking data from a website response and forming it into a struct/class instance in C++/Java et al.\].

-![fig2](../img/fig2lec11.png)

+3. Prediction

-Finally we'll add the ability to add and subtract.
-You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.
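That last removed note (equality via subtraction versus `not`-ing an `xor`) is easy to sanity-check in a few lines. A tiny Python sketch of my own, with made-up values:

```python
# Two ways an ALU can test equality, as the removed note describes.
a, b = 0b1011, 0b1011

print(a - b == 0)    # equal iff the difference is zero
print(a ^ b == 0)    # equal iff xor leaves no bits set
# NOT(xor) marks the matching bit positions (masked to 4 bits here):
print(bin(~(a ^ b) & 0b1111))  # 0b1111 -> all four positions match
```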
+If we can predict an outcome to avoid a bunch of computation then it could be worthwhile to take our prediction and proceed with that instead of doing the full computation. This happens **a lot** in CPUs, where they use what's called [branch prediction](https://danluu.com/branch-prediction/) to run even faster.

-![fig3](../img/fig3lec11.png)

+Cost of Such Improvements
+-------------------------

-At this point our _ALU_ can `and`, `or`, `xor`, and `add`/`sub`.
-The mux will choose which logic block to use; the carry-in line will tell the `add` logic block whether to add or subtract.
-Finally, the A-invert and B-invert lines allow us to determine if we want to invert either input A or B.

+As the saying goes, every decision you make as an engineer ultimately has a cost, so let's look at the cost of these improvements.

-## N-bit ALU

+1. Parallelism

-For sanity we'll use the following block for our new ALU.

+If we have a data set which has some form of inter-dependencies between its members, then we could easily run into the issue of waiting on other things to finish.

-![fig4](../img/fig4lec11.png)

+Contrived Example:

-Note that we are chaining the carry-ins to the carry-outs just like a ripple adder.
-Also, each ALU just works with `1` bit from our given 4-bit input.

    Premise: output file contents -> search lines for some text -> sort the resulting lines

    We have to do the following processes:
    print my-file.data
    search file
    sort results of the search

    In bash we might do: cat my-file.data | grep 'Text to search for' | sort

+Parallelism doesn't make sense here for one reason: this series of processes doesn't benefit from parallelism, because the 2nd and 3rd tasks *must* wait until the previous ones finish first.

+2. Pipelining

+Let's say we want to do the following:

    Search file1 for some text : [search file1]
    Feed the results of the search into a sorting program [sort]

    Search file2 for some text [search file2]
    Feed the results of the search into a reverse sorting program [reverse sort]

    The resulting Directed Acyclic Graph looks like

    [search file1] => [sort]

    [search file2] => [reverse sort]

+Making the above linear means we effectively have to:

    [search file1] => [sort] [search file2] => [reverse sort]
                             | proc2 waiting........|

+Which wastes a lot of time if the previous process is going to take a long time. Bonus points if process 2 is extremely short.

+3. Prediction

+Ok, two things up front:

+- First: prediction's fault is that we could be wrong and end up having to do the hard computations anyway.
+- Second: *this course never covers branch prediction (something that pretty much every CPU from the last 20 years does)*, so I'm gonna cover it here; ready, let's go.

+For starters, let's say a basic CPU takes instructions sequentially in memory: `A B C D`. However this is kinda slow because there is *time* between fetching an instruction, decoding it to know what instruction it is, and finally executing it proper. For this reason modern CPUs actually fetch, decode, and execute (and more!) instructions all at the same time.

+Instead of getting instructions like this:

    0
    AA
    BB
    CC
    DD

+We actually do something more like this:

    A
    AB
    BC
    CD
    D0

+If it doesn't seem like much, remember this is half an instruction on a chip that is likely going to process thousands/millions of instructions, so the savings scale really well.
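To make that overlap concrete, here is a small Python sketch of my own; the one-cycle stage costs and the two-stage fetch/execute split are illustrative assumptions (real pipelines have more stages):

```python
# Time to run n instructions when fetch and execute each cost one cycle.
FETCH, EXECUTE = 1, 1

def sequential(n):
    return n * (FETCH + EXECUTE)   # fetch A, run A, fetch B, run B, ...

def pipelined(n):
    return FETCH + n * EXECUTE     # each execute overlaps the next fetch

for n in (4, 1000):
    print(n, sequential(n), pipelined(n))
# 4 -> 8 vs 5 cycles; 1000 -> 2000 vs 1001. The savings scale with n.
```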
+This scheme is fine if our instructions are all coming one after the other in memory, but if we need to branch then we likely need to jump to a new location like so.

    ABCDEFGHIJKL
    ^^^*    ^
    |-------|

+Now say we have the following code:

    if (x == 123) {
        main_call();
    }
    else {
        alternate_call();
    }

+The (pseudo)assembly might look like

``` {.asm}
    cmp x, 123
    jne second
main_branch:    ; pointless label but nice for reading
    call main_call
    jmp end
second:
    call alternate_call
end:
    ; something to do here
```

+Our problem comes when we hit the `jne`. Once we've loaded that instruction and can start executing it, we have to make a decision: load the `call main_call` instruction or the `call alternate_call`? Chances are that if we guess, we have a 50% chance of saving time and a 50% chance of tossing out our guess and starting the whole *get instruction =\> decode etc.* process over again from scratch.

+Solution 1:

+Try to determine which branches are taken prior to running the program and just always guess the more likely branches. If we find that the above code takes `main_branch` more often, then we should always load that branch, knowing that the loss from being wrong is offset by the gain from the statistically more often correct guesses.

+...

diff --git a/363/lec/lec10.md b/363/lec/lec10.md
index c20d480..d878d00 100644
--- a/363/lec/lec10.md
+++ b/363/lec/lec10.md
@@ -1,43 +1,47 @@
-# lec10
+lec1
+====

-This lecture has a corresponding lab exercise whose instructions can be found in `triggers-lab.pdf`.

+Databases introduction
+----------------------

-## What is a trigger

+First off, why do we even need a database and what does one accomplish?

-Something that executes when _some operation_ is performed

+Generally a database will have 3 core elements to it:

+1. querying
+   - Finding things
+   - Just as well, structured data makes querying easier
+2. access control
+   - who can access which data segments and what they can do with that data
+   - reading, writing, sending, etc.
+3. corruption prevention
+   - mirroring/RAID/parity checking/checksums/etc. as some examples

-## Structure

-```
-create trigger NAME before some_operation
-when(condition)
-begin
-    do_something
-end;
-```

+Modeling Data
+-------------

-To explain: First we `create trigger` followed by some trigger name.
-Then we have to denote that this trigger should fire whenever some operation happens.
-This trigger then executes everything in the `begin...end;` section _before_ the new operation happens.

+Just like other data problems, we can choose what model we use to deal with data. In the case of sqlite3 the main data model we have is tables, where we store our pertinent data; later we'll learn that even data about our data is stored in tables.

-> `after`

+Because everything goes into a table, it means we also have to have a plan for *how* we want to lay out our data in the table. The **schema** is that design/structure for our database. The **instance** is the occurrence of that schema with some data inside the fields, i.e. we have a table sitting somewhere in the database which follows the given structure of an aforementioned schema.

-Likewise, if we want to fire a trigger _after_ some operation we can just replace the `before` keyword with `after`.

-> `new.adsf`

-Refers to the _new_ value being added to a table.

-> `old.adsf`

-Refers to the _old_ value being changed in a table.

-## Trigger Metadata

-If you want to look at what triggers exist you can query the `sqlite_master` table.

-```
-select * from sqlite_master where type='trigger';
-```
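Since the new notes name sqlite3, here is a runnable Python sketch tying the two halves of this diff together: the trigger structure from the removed notes plus the metadata query. The table, trigger, and column names are invented for illustration:

```python
# Create a trigger in sqlite3, then list triggers via sqlite_master.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users(name TEXT);
CREATE TABLE log(entry TEXT);

-- the `after` variant described above; `new` refers to the inserted row
CREATE TRIGGER log_insert AFTER INSERT ON users
BEGIN
    INSERT INTO log VALUES('added ' || new.name);
END;
""")

db.execute("INSERT INTO users VALUES('alice')")
print(db.execute("SELECT entry FROM log").fetchall())  # [('added alice',)]
print(db.execute("SELECT name FROM sqlite_master WHERE type='trigger'").fetchall())
```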
+**Queries** are typically known to be declarative; in practice we typically don't care about what goes on behind the scenes, since by this point we are assuming we have tools we trust and know to be somewhat efficient.

+Finally we have **transactions**, which are sets of operations designed to commit only if they all complete successfully. Transactions are not allowed to partially fail: if *anything* fails then everything should be undone and the state should revert to the previous state. This is useful because if we are, for example, transferring money to another account, we want to make sure that the exchange happens seamlessly; otherwise we should back out of the operation altogether.

diff --git a/370/notes/adj-list.md b/370/notes/adj-list.md
index cbd3ad9..d2603f8 100644
--- a/370/notes/adj-list.md
+++ b/370/notes/adj-list.md
@@ -1,38 +1,35 @@
-# Adjacency list
+A\* Pathfinding
+===============

-Imagine 8 nodes with no connections

+There are 3 main values used in reference to A\*:

-To store this data in an _adjacency list_ we need __n__ items to store them.
-We'll have 0 __e__dges however, so in total our space is (n+e) == (n)

    f = how promising a new location is
    g = distance from origin
    h = estimated distance to goal
    f = g + h

-# Adjacency matrix

+For a grid space our `h` is calculated by two straight shots to the goal from the current location (ignoring barriers). The grid space `g` value is basically the number of steps we've taken from the origin. We maintain a list of potential nodes only, so if one of the seeking nodes gets us stuck we can freely remove it.

-space: O(n^2)
-The convention for notation, btw, is [x,y] meaning:
-  * _from x to y_

+Time & Space Complexities
+=========================

-# Breadth first search

+Best-First Search
+-----------------

-add neighbors of current to queue
-go through current's neighbors and add their neighbors to queue
-add neighbor's neighbors; keep going until there are no more neighbors to add
-go through queue and start popping members out of the queue

+Time: O(V log V + E)

-# Depth first search

+Dijkstra's
+----------

-Here we're going deeper into the neighbors

+O(V\^2 + E)

-_once we have a starting point_
-_available just means that node has a non-visited neighbor_
-1. if available, go to a neighbor
-2. if no neighbors are available, visit (backtrack)
-3. goto 1

+A\*
+---

-# Kahn Sort

+Worst case is the same as Dijkstra's time

-# Graph Coloring

+O(V\^2 + E)

-When figuring out how many colors we need for the graph, we should note the degree of the graph

diff --git a/412/hardware-strats.md b/412/hardware-strats.md
index 5945efa..28403ad 100644
--- a/412/hardware-strats.md
+++ b/412/hardware-strats.md
@@ -1,69 +1,60 @@
-# Hardware deployment Strategies
+Data storage
+============

+Spinning Disks
+--------------

-## Virtual Desktop Interface

+Cheaper for more storage

-aka 0-Clients: a network-hosted OS is what each client would use.

+RAID - Redundant Array of Independent Disks
+-------------------------------------------

-In some cases that network is a pool of servers which are tapped into.
-Clients can vary in specs as explained below (context: university):

+RAID 0: basically cramming multiple drives together and treating them as one. Data is striped across the drives, but if one fails then you literally lose a chunk of data.

-> Pool for a Library

+RAID 1: data is mirrored across the drives so it's completely redundant; if one fails the other is still alive. It's not a backup however, since file updates will affect all the drives.
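A toy Python sketch of the striping-versus-mirroring difference described above; the two-drive setup and byte-level stripes are my own simplification:

```python
# RAID 0 splits data across drives; RAID 1 copies it to every drive.
data = b"ABCDEF"

raid0 = [data[0::2], data[1::2]]  # drive 0: b'ACE', drive 1: b'BDF'
raid1 = [data, data]              # both drives hold the full b'ABCDEF'

print(raid0)  # lose either drive and half the bytes are gone
print(raid1)  # lose one drive and a full copy survives
```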
-Clients retain low hardware specs since most are just using office applications and not much else.

+RAID 5: parity. Combining multiple drives allows us to store the parity of the data on other drives, to recover that data if it goes missing. (min 3 drives)

-> Pool for an Engineering department

+RAID 6: same in principle as RAID 5, but this time we keep an extra drive's worth of parity, so the array can survive two simultaneous drive failures. (min 4 drives)

-Clients connect to another pool where both clients and pool have better hardware specs/resources.

+RAID 10: 1 and 0 combined: drives are grouped into mirrored (RAID 1) pairs, and data is striped (RAID 0) across those pairs.

-The downside is that there is _1 point of failure_.
-The pool goes down and so does everyone else, meaning downtime is going to cost way more than a single machine going down.

+Network Attached Storage - NAS
+------------------------------

+Basically storage served over the local network.

+Storage Area Network - SAN
+--------------------------

+Applicable when we virtualise whole OSes for users: we use storage attached over a dedicated network to back the different operating systems.

-# Server Hardware Strategies

+Managing Storage
+================

-> All eggs in one basket

+Outsource the storage for users to services like OneDrive, because then it becomes their problem and not ours.

-Imagine just one server doing everything

-* Important to maintain redundancy in this case
-* Upgrading is a pain sometimes

+Storage as a Service
+====================

-> Buy in bulk, allocate fractions

+Ensure that the OS gets its own space/partition on a drive and give the user their own partition to ruin. That way the OS (Windows) will just fill its own partition into another dimension.

+Backup
+======

-Basically have a server that serves up various virtual machines.

-# Live migration

-Allows us to move live, running virtual machines onto new servers if the current server is running out of resources.

-# Containers

-_docker_: Virtualize the service, not the whole operating system

-# Server Hardware Features

-> Things that servers benefit from

-* fast i/o
-* low latency CPUs (Xeons > i series)
-* expansion slots
-* lots of network ports available
-* ECC memory
-* Remote control

-Patch/version control on servers:
-Scheduling is usually slower/more lax so that servers don't just randomly break all the time.

-# Misc

-Uptime: more uptime is _going_ to be more expensive. Depending on what you're doing, figure out how much downtime you can afford.

-# Specs

-Like before, _ECC memory_ is basically required for servers, plus a good number of network interfaces and solid disk management.

-Remember that the main parameters for choosing hardware are going to be budget and necessity; basically, what can you get away with on the budget at hand.

+Other people's data is in your hands, so make sure that you back up data in some way. Some external services can be nice if you find that you constantly need to get to your backups. Tape is good for archival purposes; keep in mind that it is slow as hell.
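Circling back to the RAID 5 parity idea above, here is a minimal Python sketch of my own showing how XOR parity can rebuild a lost drive (three toy data "drives" plus one parity block):

```python
# XOR parity, conceptually what RAID 5 spreads across its drives.
drives = [b"AAAA", b"BBBB", b"CCCC"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*drives))

lost = drives.pop(1)  # pretend drive 1 just died
rebuilt = bytes(x ^ y ^ p for x, y, p in zip(drives[0], drives[1], parity))
print(rebuilt == lost)  # True: xor of the survivors recovers the data
```

The same idea is why losing a second drive before a rebuild finishes is fatal for RAID 5, and why RAID 6 keeps a second parity block.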