more updated content to display on the new site

shockrah 2020-07-05 17:18:01 -07:00
parent 62bcfa79b3
commit ec31274f14
6 changed files with 317 additions and 272 deletions

View File

@ -1,81 +1,51 @@
# lec10
## TCP Structure
Sequence numbers:
* the byte-stream _number_ of the first byte in the segment's data
ACKs:
* the seq # of the next byte expected from the other side
Example:
```
host a: user sends 'c'
seq=42, ack=79, data='c'
host b: ACK receipt sent to host a (echoes back 'c')
seq=79, ack=43, data='c' ; data sent back from host b
```
### Round trip time
EstimatedRTT = (1 - \alpha) * EstimatedRTT + \alpha * SampleRTT
> Lots of stuff missing here
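The notes leave \alpha unset, so here's a quick sketch of that EWMA in code, assuming the classic \alpha = 0.125 from RFC 6298 and made-up sample values:
```python
# Exponentially weighted moving average over RTT samples; alpha = 0.125
# is the RFC 6298 default, not a value from these notes.
def estimate_rtt(samples, alpha=0.125):
    estimated = samples[0]                    # seed with the first measurement
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
    return estimated

print(estimate_rtt([100.0, 120.0, 90.0, 110.0]))  # a smoothed RTT in ms
```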
## TCP Reliable data transfer
Implements:
* pipelined segments
* cumulative `ACK`s
  * this just means we assume that the highest-numbered ACK implies all the previous segments have been received properly too
* a single retransmission timer
### Sender Events
1. First create a segment w/ a seq no.
    a. The sequence number refers to the first byte of the segment's data in the byte stream
2. Start the timer if we don't already have one running.
    a. The timer is based off the oldest un-ACKed segment
## Retransmission w/ TCP
__Timeout__: usually it's pretty long, so waiting for a full timeout on a lost packet wastes time.
Instead, when a segment goes missing the receiver keeps ACKing the last well-received segment:
the receiver gets `1 2 3 5` but not `4`, so it ACKs `1 2 3` like normal, then the arrival of `5` produces duplicate ACKs for `3`; once 3 duplicate ACKs reach the sender (before the timeout) it starts re-sending from `4`.
This is what we call _fast retransmit_.
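A minimal sketch of the duplicate-ACK counting on the sender side (the names and the segment-numbered ACKs are mine, not from the notes):
```python
# Count duplicate ACKs on the sender side; three duplicates for the same
# segment trigger a fast retransmit instead of waiting for a timeout.
def sender_loop(acks):
    last_ack, dup_acks = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_acks += 1
            if dup_acks == 3:
                print(f"fast retransmit: resend segment {ack + 1}")
                dup_acks = 0
        else:
            last_ack, dup_acks = ack, 0

# Receiver got 1 2 3 5 ... but not 4: it keeps ACKing 3.
sender_loop([1, 2, 3, 3, 3, 3])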
## Flow Control
_The main thing here is that the receiver controls the sender's "send rate" so that the receiver doesn't get inundated._
The receiver will _advertise_ its free buffer space by including an `rwnd` value in the TCP header.
This just tells the sender how much data the receiver can accept at a time.
Example: transferring a large file from host to host.
\alpha will send a file to \beta.
\alpha sends some file data to \beta, who then ACKs the packet but includes in the header that their buffer is full.
\alpha then responds with a 1-byte packet now and then to keep the connection alive.
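The sender-side rule this implies is simple: never have more un-ACKed bytes in flight than the advertised `rwnd`. A tiny sketch (helper name and numbers invented):
```python
# Sender-side flow control: keep un-ACKed bytes <= receiver's advertised rwnd.
def can_send(bytes_in_flight, segment_len, rwnd):
    return bytes_in_flight + segment_len <= rwnd

print(can_send(bytes_in_flight=3000, segment_len=1460, rwnd=4096))  # False: wait
print(can_send(bytes_in_flight=1000, segment_len=1460, rwnd=4096))  # True: send
```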
## Connection Management
Before sender/receiver start exchanging anything we must perform a `handshake`.
`SYN` is a special packet type under TCP which we can use to synchronize both client and server.
### Closing
There's a `FIN` bit inside the header.
We send this off to the receiver and enter a wait state (`fin_wait`).
We only wait because there might be more data coming.
The receiver enters a `close_wait` state, _but_ still sends any data left over.
Once its last data is ACKed, the receiver sends a `FIN` packet of its own and the connection closes.
lec1
=====
First we'll define some terminology.
> Hosts
End systems - these typically don't bother with routing data through a network.
> Communication Links
Typically the actual systems that connect things together.
Network edges
-------------
Can be subdivided into clients & servers, and sometimes both at the same time.
Access network: cable network
-----------------------------
Typically when we have to share one line we can change the frequency of the
signal as one method to distinguish between different data
which may sometimes come from different sources.
### Home Network
Let's start with the modem. All it does is take some signal and convert
it to the proper IEEE data format (citation needed).
Typically we would then pipe that data to a router which, in
most home scenarios, would forward that input data to whichever
machines requested it.
If you recall back to your discrete mathematics coursework, various graph
topologies were covered, and you likely noted that *star* topologies were
common for businesses since they make it easiest to send data from one
outside node on the star to another. In practice this would just mean
having the router/modem setup be one of the appendages of the star and a
switch be in the middle so that the data only has to make two hops to
get anywhere in the network.
> Doesn't that mean there's one node that could bring the whole network
> down at any time?
Absolutely, which is why with a *very* small network of a
couple devices it's not really a problem, but if you have an office full
of employees all with their own machines plus wireless, printers,
servers, etc. then it's a huge problem. Still, a typical small
business or shop might be more inclined to use such a setup because:
* it's easy to set up
* it's cheap to maintain

View File

@ -1,77 +1,32 @@
# Block Ciphers
The main concept here is twofold:
* we take _blocks_ of data and cipher the _blocks_
* a given key is actually used to generate recursive keys which are further used on the data itself
_bs example ahead_
Say we have a key 7 and some data 123456.
We take the whole data set and chunk it into blocks (for example): 12 34 56.
Let's say our function here is to just add 7 to each block, so we do the first step:
```
12 + 7 = 19
Unlike other ciphers we don't reuse 7; instead we use the new value as both the next key and part of our cipher text

19 + 34 = 53
Cipher so far: 1953..

53 + 56 = 109 <= let's pretend that this rolls over 99 and back to 00
09 <= like this
Final cipher: 195309
```
_It should be noted that in practice these functions usually take in huge keys and blocks_.
> Deciphering
Start from the back of the cipher, not the front; if we used an xor function scheme (xor is a symmetric function) we would simply xor each cipher block with the cipher block before it, performing the same encryption scheme in reverse.
Example::Encryption
```
Key: 110
Function scheme: xor
Data: 101 001 111

101 xor 110 = 011   <= first block xor'd with the key
001 xor 011 = 010   <= next block xor'd with the previous cipher block
111 xor 010 = 101   <= same again

011 010 101 <= encrypted
```
Example::Decryption
```
Ciphered: 011 010 101
Function scheme: xor

101 xor 010 = 111   <= last cipher block xor'd with the one before it
010 xor 011 = 001
011 xor 110 = 101   <= first cipher block xor'd with the original key

101 001 111 <= the original data
```
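A minimal sketch of that chaining scheme in code — my own toy rendering of the notes' example, not a real cipher:
```python
# Toy xor-chaining cipher from the worked example: each block is xor'd
# with the previous cipher block (the first block uses the key itself).
def encrypt(blocks, key):
    out, prev = [], key
    for b in blocks:
        prev = b ^ prev
        out.append(prev)
    return out

def decrypt(cipher, key):
    out, prev = [], key
    for c in cipher:
        out.append(c ^ prev)
        prev = c
    return out

data = [0b101, 0b001, 0b111]
cipher = encrypt(data, 0b110)
print([format(c, '03b') for c in cipher])   # ['011', '010', '101']
assert decrypt(cipher, 0b110) == data       # round-trips back to the data
```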
# Feistel Cipher
Two main components:
1. substitution: each _thing_ in the data to cipher is replaced by a _ciphered thing_
2. permutation: nothing is added, deleted, or replaced; instead the order of _things_ is changed
Basically imagine that every _type of thing_ in our data maps to some other _type of thing_ in the data, and the _things_ get swapped/reordered.
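The notes stop short of the actual round structure, so here's a hedged sketch of a generic two-branch Feistel round with a toy round function (none of the names below come from the lecture):
```python
# One Feistel round: keep the right half, and xor the left half with
# F(right, key); decryption runs the same rounds with the keys reversed.
def feistel_encrypt(left, right, keys, F):
    for k in keys:
        left, right = right, left ^ F(right, k)
    return left, right

def feistel_decrypt(left, right, keys, F):
    for k in reversed(keys):
        right, left = left, right ^ F(left, k)
    return left, right

F = lambda half, key: (half * 31 + key) & 0xFF   # toy round function
l, r = feistel_encrypt(0x12, 0x34, [7, 13, 42], F)
assert feistel_decrypt(l, r, [7, 13, 42], F) == (0x12, 0x34)
```
Note that F never has to be invertible — the xor-and-swap structure is what makes the whole thing reversible.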
# DES - Data Encryption Standard
Widely used until about 2001, when AES surpassed it as the newer(ish (kinda)) standard.
DEA was the actual algorithm though:
* 64-bit blocks
* 56-bit keys
* turns a 64-bit input into a 64-bit output (wew)
* running the steps in reverse also reverses the encryption itself

Active v Passive Attacks
========================
Base Definitions
----------------
Passive: compromising a system but not necessarily doing anything apart
from *watching*
Active: compromising a system while doing something to the system apart
from infiltrating it
Loosely speaking
----------------
*Passive* can be just like listening in on a conversation (eavesdropping)
where *active* is like jumping into the conversation and trying to do
something to it.
When/How would either happen?
-----------------------------
If the result of an attack is to actually trigger some code to run then
usually we need to first gather the information required to understand
how to make that happen. The reasoning is straightforward: if you don't
know how some system works then it's much harder to exploit that system.
Random example: using a keylogger to log keystrokes before sending those
logs to a server for processing could be a passive attack, since you're
still in a *gathering data* sort of mode. Finally, using that data to
try logging into some service would be the active portion of a
full-scale attack.

View File

@ -1,41 +1,169 @@
# lec11
At this point I'll mention that just reading isn't going to get you anywhere; you have to try things and give it a real, earnest attempt.
__ALU:__ Arithmetic Logic Unit
## Building a 1-bit ALU
![fig0](../img/alu.png)
First we'll create an example _ALU_ which implements choosing between an `and`, `or`, `xor`, or `add`.
Whether or not our amazing _ALU_ is useful doesn't matter, so we'll go one function at a time (besides `and`/`or`).
First recognize that we need to compute both `and` and `or` against our two inputs A/B.
This means we have two outputs, and we need a mux to select between them.
_Try to do this on your own first!_
![fig1](../img/fig1lec11.png)
Next we'll add on the `xor`.
Try doing this on your own, but as far as hints go: don't be afraid to make changes to the mux.
![fig2](../img/fig2lec11.png)
Finally we'll add the ability to add and subtract.
You may have also noted that we can subtract two things to see if they are the same; however, we can also `not` the result of the `xor` and get the same result.
![fig3](../img/fig3lec11.png)
At this point our _ALU_ can `and`, `or`, `xor`, and `add`/`sub`.
The mux will choose which logic block to use; the carry-in line will tell the `add` logic block whether to add or subtract.
Finally, the A-invert and B-invert lines let us determine whether we want to invert either input A or B.
## N-bit ALU
For sanity we'll use the following block for our new ALU.
![fig4](../img/fig4lec11.png)
Note that we are chaining the carry-ins to the carry-outs just like a ripple adder;
also each 1-bit ALU just works with `1` bit from our given 4-bit input.

lec1
====
> What on earth?
The first lecture has been 50% syllabus, 25% videos, 25% simple
terminology; expect nothing interesting for this section.
General Performance Improvements in software
--------------------------------------------
In general we have a few options to increase performance in software:
pipelining, parallelism, prediction.
1. Parallelism
If we have multiple tasks to accomplish or multiple sources of data we
might instead find it better to work on multiple things at
once [e.g. multi-threading, multi-core rendering].
2. Pipelining
Here we are somehow taking *data* and serializing it into a linear form.
We do things like this because it could make sense to do things
linearly [e.g. taking data from a website response and forming it into a
struct/class instance in C++/Java et al.].
3. Prediction
If we can predict an outcome to avoid a bunch of computation then it
could be worth it to take our prediction and proceed with that instead of
the full computation. This happens **a lot** in CPUs, where they use what's called
[branch prediction](https://danluu.com/branch-prediction/) to run even
faster.
Cost of Such Improvements
-------------------------
As the saying goes, every decision you make as an engineer ultimately
has a cost, so let's look at the cost of these improvements.
1. Parallelism
If we have a data set which has some form of inter-dependencies between
its members then we could easily run into the issue of waiting on other
things to finish.
Contrived Example:
Premise: output file contents -> search lines for some text -> sort the resulting lines
We have to do the following processes:
1. print my-file.data
2. search the output
3. sort the results of the search
In bash we might do: `cat my-file.data | grep 'Text to search for' | sort`
Parallelism doesn't make sense here for one reason: this series of
processes doesn't benefit from parallelism because the 2nd and 3rd tasks
*must* wait until the previous ones finish first.
2. Pipelining
Let's say we want to do the following:
* Search file1 for some text: [search file1]
* Feed the results of the search into a sorting program: [sort]
* Search file2 for some text: [search file2]
* Feed the results of the search into a reverse sorting program: [reverse sort]
The resulting Directed Acyclic Graph looks like:
```
[search file1] => [sort]
[search file2] => [reverse sort]
```
Making the above linear means we effectively have to do:
```
[search file1] => [sort] => [search file2] => [reverse sort]
| proc2 waiting.........|
```
Which wastes a lot of time if the previous process is going to take a
long time. Bonus points if process 2 is extremely short.
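As a sketch of what we lose by going linear (timings and names invented; Python threads overlapping the two independent chains instead of running them back to back):
```python
# The two chains above have no dependency on each other, so they can overlap.
import threading, time

def chain(name, seconds):
    time.sleep(seconds)            # stand-in for the search + sort work
    print(f"{name} done")

t1 = threading.Thread(target=chain, args=("file1 chain", 0.2))
t2 = threading.Thread(target=chain, args=("file2 chain", 0.2))
start = time.time()
t1.start(); t2.start(); t1.join(); t2.join()
print(f"overlapped: {time.time() - start:.1f}s (vs ~0.4s run one after the other)")
```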
3. Prediction
OK, two things up front:
- First: prediction's fault is that we could be wrong and end up
  having to do the hard computations anyway.
- Second: *this course never covers branch prediction (something that
  pretty much every CPU in the last 20 years does)*, so I'm
  gonna cover it here; ready, let's go.
For starters let's say a basic CPU takes instructions sequentially in
memory: `A B C D`. However this is kinda slow, because there is *time*
between fetching an instruction, decoding it to know what instruction it
is, and finally executing it proper. For this reason modern CPUs actually
fetch, decode, and execute (and more!) instructions all at the same time.
Instead of getting instructions like this (read each row as one clock
tick, with only one stage busy at a time):
```
0
AA
BB
CC
DD
```
We actually do something more like this, overlapping the stages:
```
A
AB
BC
CD
D0
```
If it doesn't seem like much, remember that this saving applies to every
instruction on a chip that is likely going to process thousands/millions
of instructions, so the savings scale really well.
This scheme is fine if our instructions are all coming one after the
other in memory, but if we need to branch then we likely need to jump to
a new location like so:
```
ABCDEFGHIJKL
^^^* ^
|-----|
```
Now say we have the following code:
``` {.c}
if (x == 123) {
    main_call();
}
else {
    alternate_call();
}
```
The (pseudo)assembly might look like:
``` {.asm}
cmp x, 123
jne second ; jump to the else branch when x != 123
main_branch: ; pointless label but nice for reading
call main_call
jmp end
second:
call alternate_call
end:
; something to do here
```
Our problem comes when we hit the `jne`. Once we've loaded that instruction
and can start executing it, we have to make a decision: load the
`call main_call` instruction or the `call alternate_call`? Chances are
that if we guess we have a 50% chance of saving time and a 50% chance of
tossing out our guess and starting the whole *get instruction => decode
etc.* process over again from scratch.
Solution 1:
Try to determine which branches are taken prior to running the program
and just always guess the more likely branch. If we find that the
above code calls `main_branch` more often then we should always load that
branch, knowing that the loss from being wrong is offset by the
gain from the statistically more-often-correct guesses.
...
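The notes trail off here, so as a sketch of the dynamic flavor — the two-bit saturating counter that intro architecture texts describe, not anything covered in this course:
```python
# Two-bit saturating counter: per-branch state 0..3, predict "taken" when
# state >= 2; one wrong guess can't immediately flip a strong prediction.
def simulate(outcomes, state=0):
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        correct += (predicted == taken)
        # saturate at 0 and 3 while nudging toward the actual outcome
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct

history = [True] * 8 + [False] + [True] * 8   # a loop branch with one exit
print(simulate(history), "of", len(history), "predicted correctly")
```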

View File

@ -1,43 +1,47 @@
# lec10
This lecture has a corresponding lab exercise whose instructions can be found in `triggers-lab.pdf`.
## What is a trigger
Something that executes when _some operation_ is performed.
## Structure
```
create trigger NAME before some_operation
when(condition)
begin
    do_something
end;
```
To explain: first we `create trigger` followed by some trigger name.
Then we denote that this trigger should fire whenever some operation happens.
This trigger then executes everything in the `begin...end;` section _before_ the new operation happens.
> `after`
Likewise, if we want to fire a trigger _after_ some operation we can just replace the `before` keyword with `after`.
> `new.asdf`
Refers to the _new_ value being added to a table.
> `old.asdf`
Refers to the _old_ value being changed in a table.
## Trigger Metadata
If you want to look at which triggers exist you can query the `sqlite_master` table.
```
select * from sqlite_master where type='trigger';
```
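To make this concrete, here's a small end-to-end sketch using Python's built-in `sqlite3` module (the table and trigger names are invented for the example):
```python
# Minimal trigger demo with Python's built-in sqlite3 module.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table accounts(name text, balance integer);
create table audit(msg text);
-- fires before every insert on accounts and records it in audit
create trigger log_insert before insert on accounts
begin
    insert into audit values('adding ' || new.name);
end;
""")
db.execute("insert into accounts values('alice', 100)")
print(db.execute("select msg from audit").fetchall())   # [('adding alice',)]
print(db.execute("select name from sqlite_master where type='trigger'").fetchall())
```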
lec1
====
Databases introduction
----------------------
First off, why do we even need a database and what do they accomplish?
Generally a database will have 3 core elements to it:
1. querying
    - finding things
    - just as well, structured data makes querying easier
2. access control
    - who can access which data segments and what they can do with
      that data
    - reading, writing, sending, etc.
3. corruption prevention
    - mirroring/raid/parity checking/checksums/etc. as some examples
Modeling Data
-------------
Just like other data problems, we can choose what model we use to deal
with data. In the case of sqlite3 the main data model we have is
tables, where we store our pertinent data; later we'll learn that even
data about our data is stored in tables.
Because everything goes into a table, it means we also have to have a
plan for *how* we want to lay out our data in the table. The **schema**
is that design/structure for our database. The **instance** is the
occurrence of that schema with some data inside the fields, i.e. we have
a table sitting somewhere in the database which follows the given
structure of an aforementioned schema.
**Queries** are typically known to be declarative; typically we don't
care about what goes on behind the scenes in practice, since by this
point we are assuming we have tools we trust and know to be somewhat
efficient.
Finally we have **transactions**, which are a set of operations that are
designed to only commit if they all complete successfully.
Transactions are not allowed to partially fail: if *anything* fails then
everything should be undone and the state should revert to the previous
state. This is useful because if we are, for example, transferring money
to another account we want to make sure that the exchange happens
seamlessly; otherwise we should back out of the operation altogether.
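And a sketch of that all-or-nothing behavior, again with `sqlite3` (the account rows and the balance check are invented):
```python
# A transaction either fully commits or fully rolls back.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table accounts(name text, balance integer check(balance >= 0))")
db.executemany("insert into accounts values(?, ?)", [("alice", 50), ("bob", 0)])
try:
    with db:  # one transaction: both updates happen, or neither does
        db.execute("update accounts set balance = balance - 100 where name = 'alice'")
        db.execute("update accounts set balance = balance + 100 where name = 'bob'")
except sqlite3.IntegrityError:
    print("transfer failed, rolled back")
print(db.execute("select * from accounts").fetchall())  # balances unchanged
```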

View File

@ -1,38 +1,35 @@
# Adjacency list
Imagine 8 nodes with no connections.
To store this data in an _adjacency list_ we need __n__ items to store them.
We'll have 0 __e__dges however, so in total our space is (n+e) == (n).
# Adjacency matrix
space: O(n^2)
The convention for notation btw is [x,y], meaning:
* _from x to y_
# Breadth first search
1. add the neighbors of current to a queue
2. go through current's neighbors and add their neighbors to the queue
3. add the neighbors' neighbors
4. keep going until there are no more neighbors to add
5. go through the queue and start popping members out of it
# Depth first search
Here we're going deeper into the neighbors _once we have a starting point_.
_available just means that a node has a non-visited neighbor_
1. if available, go to a neighbor
2. if no neighbors are available, visit
3. goto 1
# Kahn Sort
# Graph Coloring
When figuring out how many colors we need for the graph, we should note the degree of the graph.

A* Pathfinding
===============
There are 3 main values used in reference to A*:
```
f = how promising a new location is
g = distance from origin
h = estimated distance to goal
f = g + h
```
For a grid space our `h` is calculated by two straight shots to the goal
from the current location (ignoring barriers). The grid space `g` value is
basically the number of steps we've taken from the origin. We maintain
a list of potential nodes only, so if one of the seeking nodes gets us
stuck we can freely remove it, because it succs.
Time & Space Complexities
==========================
Best-First Search
-----------------
Time: O(VlogV + E)
Dijkstra's
----------
O(V^2 + E)
A*
---
Worst case is the same as Dijkstra's time: O(V^2 + E)
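A sketch of the A* description above — Manhattan distance for `h`, matching the "two straight shots" idea; the grid and all names are mine:
```python
# A* on a small grid: h = Manhattan distance ("two straight shots"),
# g = steps taken so far, f = g + h; the open heap is the list of
# potential nodes we can freely abandon when they get us stuck.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_heap = [(0, 0, start, [start])]    # (f, g, position, path)
    seen = set()
    while open_heap:
        f, g, (r, c), path = heapq.heappop(open_heap)
        if (r, c) == goal:
            return path
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                h = abs(goal[0] - nr) + abs(goal[1] - nc)
                heapq.heappush(open_heap, (g + 1 + h, g + 1, (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],   # 1 = barrier
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # walks around the barrier
```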

View File

@ -1,69 +1,60 @@
# Hardware deployment Strategies
## Virtual Desktop Infrastructure
aka 0-Clients: a network-hosted OS is what each client would use.
In some cases that network is a pool of servers which are tapped into.
Clients can vary in specs as explained below (context: a university):
> Pool for a Library
Clients retain low hardware specs since most are just using office applications and not much else.
> Pool for an Engineering department
Clients connect to another pool where both the clients and the pool have better hardware specs/resources.
The downside is that there is _1 point of failure_.
The pool goes down and so does everyone else, meaning downtime is going to cost way more than a single machine going down.
# Server Hardware Strategies
> All eggs in one basket
Imagine just one server doing everything.
* important to maintain redundancy in this case
* upgrading is a pain sometimes
> Buy in bulk, allocate fractions
Basically have a server that serves up various virtual machines.
# Live migration
Allows us to move live, running virtual machines onto new servers if their current server is running out of resources.
# Containers
_docker_: Virtualize the service, not the whole operating system
# Server Hardware Features
> Things that servers benefit from
* fast i/o
* low-latency CPUs (Xeons > i-series)
* expansion slots
* lots of network ports available
* ECC memory
* remote control
Patch/version control on servers:
scheduling is usually slower/more lax so that servers don't just randomly break all the time.
# Misc
Uptime: more uptime is _going_ to be more expensive. Depending on what you're doing, figure out how much downtime you can afford.
# Specs
Like before, _ECC memory_ is basically required for servers, along with a good number of network interfaces and solid disk management.
Remember that the main parameters for choosing hardware are going to be budget and necessity; basically, what can you get away with on the budget at hand?

Data storage
============
Spinning Disks
--------------
Cheaper for more storage.
RAID - Redundant Array of Independent Disks
-------------------------------------------
Raid 0: basically cramming multiple drives together and treating them as one.
Data is striped across the drives, but if one fails then you literally
lose a chunk of data.
Raid 1: data is mirrored across the drives so it's completely redundant;
if one fails the other is still alive. It's not a backup however,
since file updates will affect all the drives.
Raid 5: parity. Combining multiple drives allows us to establish the
parity of the data on other drives to recover that data if it goes
missing. (min 3 drives)
Raid 6: same in principle as raid 5 but this time we have an extra drive
for just parity.
Raid 10: 0 and 1 combined, so we have a set of drives in raid 0 and put
those together in raid 1 with another equally sized set of drives.
Network Attached Storage - NAS
------------------------------
Basically space stored on the local network.
Storage Area Network - SAN
--------------------------
Applicable when we virtualise whole OS's for users; we use a storage
device attached to the network to serve different operating systems.
Managing Storage
================
Outsourcing the storage for users to services like Onedrive, because it
becomes their problem and not ours.
Storage as a Service
====================
Ensure that the OS gets its own space/partition on a drive and give the
user their own partition to ruin. That way the OS (Windows) will just
fill its own partition into another dimension.
Backup
======
Other people's data is in your hands, so make sure that you back up data
in some way. Some external services can be nice if you find that you
constantly need to get to your backups. Tape records are good for
archival purposes; keep in mind that they are slow as hell.
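To make the RAID 5 parity idea above concrete, here's a toy sketch (byte strings standing in for whole drives; not how a real controller works):
```python
# RAID 5's recovery trick: parity = XOR of all data blocks, so any single
# lost block is the XOR of the parity with the surviving blocks.
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

drives = [b"AAAA", b"BBBB", b"CCCC"]            # toy "drives"
parity = reduce(xor_blocks, drives)             # stored on a fourth drive

lost = drives.pop(1)                            # drive 1 dies
rebuilt = reduce(xor_blocks, drives + [parity]) # XOR of survivors + parity
assert rebuilt == lost
print(rebuilt)                                  # b'BBBB'
```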