csnotes/363/lec/lec12.md
2019-09-24 11:34:35 -07:00

# lec12
## Lab
This section has a lab activity in `lab/` with instructions in `in-memory-searches.pdf` and `on-disk-search.pdf`.
## In-memory Search
_For now we'll deal with trivial queries._
Say we perform this query: `select name from censusData where age<30;`.
If we do a linear search we will nearly always have to go through all `N` records in the table to get the data we want out.
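
As a minimal sketch of that linear search, assuming the table is just a list of dictionaries in memory (the rows in `census_data` and the helper `linear_select_names` are made up for illustration):

```python
# Hypothetical in-memory table; the rows are invented for this example.
census_data = [
    {"name": "Ada", "age": 36},
    {"name": "Ben", "age": 24},
    {"name": "Cy", "age": 29},
    {"name": "Dee", "age": 51},
]


def linear_select_names(records, max_age):
    """Mimic: select name from records where age < max_age.

    Returns the matching names and how many records were inspected.
    """
    names = []
    inspected = 0
    for record in records:  # every record is visited, matching or not
        inspected += 1
        if record["age"] < max_age:
            names.append(record["name"])
    return names, inspected


names, inspected = linear_select_names(census_data, 30)
print(names, inspected)
```

Note that `inspected` always equals the table size here, since an unsorted table gives us no reason to stop early.
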
Binary searches prove to be quicker but our data must be ordered in some fashion.
_Note:_ just recall that we can only sort a table's entries by a single column at any given time.
The other problem we encounter is that our data must _always_ remain sorted, which means inserting, modifying, and deleting data has much larger overhead than other methods.
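
A rough sketch of both points, using Python's standard `bisect` module on a hypothetical sorted list of ages (the values and helper names are assumptions for illustration, not part of the lab):

```python
import bisect

# Hypothetical ages, kept sorted so that binary search is valid.
ages = [18, 24, 29, 36, 51, 67]


def count_younger_than(sorted_ages, limit):
    """Binary search for the first age >= limit; everything before it is < limit."""
    return bisect.bisect_left(sorted_ages, limit)


def insert_sorted(sorted_ages, age):
    """Insert while keeping order: finding the slot is O(log N),
    but shifting the later elements over is still O(N)."""
    bisect.insort(sorted_ages, age)


cut = count_younger_than(ages, 30)
print(ages[:cut])  # the ages under 30

insert_sorted(ages, 27)
print(ages)  # still sorted after the insert
```

The search itself is cheap, but every insert has to shift elements to keep the order, which is the overhead mentioned above.
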
## On-Disk Search
There are two main ways of storing the data on disk: by record or by column.
We also have to deal with variable-length data types like `varchar`, which give an upper bound on size but not necessarily a fixed size.
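
To make the two layouts concrete, here is a small sketch. The length-prefixed encoding for `varchar`, the 2-byte age field, and the sample rows are all assumptions chosen for illustration; real database engines use their own formats.

```python
import struct

# Hypothetical rows: (name, age). name stands in for a varchar(20) column.
rows = [("Ada", 36), ("Ben", 24), ("Cyrus", 29)]


def encode_varchar(text, max_len=20):
    """Assumed encoding: 1 length byte, then only the bytes actually used."""
    data = text.encode("utf-8")
    if len(data) > max_len:
        raise ValueError("value exceeds the varchar upper bound")
    return struct.pack("B", len(data)) + data


def store_by_record(table):
    """Row layout: each record's fields sit next to each other."""
    return b"".join(
        encode_varchar(name) + struct.pack("<H", age) for name, age in table
    )


def store_by_column(table):
    """Column layout: all names together, then all ages together."""
    names = b"".join(encode_varchar(name) for name, _ in table)
    ages = b"".join(struct.pack("<H", age) for _, age in table)
    return names + ages


print(store_by_record(rows))
print(store_by_column(rows))
```

Both layouts hold the same bytes in a different order; the variable-length names mean we cannot jump to record `i` by multiplying by a fixed record size.
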
### Blocks
Blocks contain records or sometimes columns depending on the implementation.
We usually allocate these blocks as 4 KB or 8 KB of space, both multiples of the 512-byte sector size.
These things are taken into account because I/O time sucks; it always has, and until SSD lifetime performance stops sucking it always will.
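
The arithmetic behind those sizes can be sketched as follows, assuming 4k means 4096 bytes, that records are fixed-size, and that a record is never split across two blocks (the 100-byte record size is made up):

```python
SECTOR_BYTES = 512  # sector size from the notes


def sectors_per_block(block_bytes):
    """How many whole sectors make up one block."""
    if block_bytes % SECTOR_BYTES != 0:
        raise ValueError("block size should be a multiple of the sector size")
    return block_bytes // SECTOR_BYTES


def records_per_block(block_bytes, record_bytes):
    """Whole records that fit in a block (assumes records are not split)."""
    return block_bytes // record_bytes


def blocks_needed(n_records, block_bytes, record_bytes):
    """Blocks required to hold n_records, rounding up."""
    per_block = records_per_block(block_bytes, record_bytes)
    return -(-n_records // per_block)  # ceiling division


print(sectors_per_block(4096), sectors_per_block(8192))
print(records_per_block(4096, 100))
print(blocks_needed(1000, 4096, 100))
```
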
The main issue with getting data off the disk isn't the read time, it's the time to find something in the first place. This is because we write to the disk in a fashion that _isn't_ completely linear.
Also keep in mind that our total I/O time to search for something is going to be T~access~ + T~transfer~\*N~records~.
* If we search on a key then, on average, we only have to search half the records.
* Also this is assuming that _all_ the blocks are right next to each other.
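
Plugging numbers into that formula, as a sketch only: the timings below are invented (in microseconds) and not measurements of any real disk, and the function names are hypothetical.

```python
def search_io_time(t_access, t_transfer, n_records):
    """Total I/O time: one access, then one transfer per record read.

    Assumes all the blocks involved sit right next to each other.
    """
    return t_access + t_transfer * n_records


def key_search_io_time(t_access, t_transfer, n_records):
    """Searching on a key: on average we stop after half the records."""
    return search_io_time(t_access, t_transfer, n_records / 2)


# Made-up timings in microseconds, purely for illustration.
T_ACCESS = 10_000
T_TRANSFER = 100

print(search_io_time(T_ACCESS, T_TRANSFER, 1000))
print(key_search_io_time(T_ACCESS, T_TRANSFER, 1000))
```
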
If the blocks we search for happen to be right next to each other then we only need to bother finding the first block, but with a binary search we have to pay the access time for _every single record_ we probe.
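
A back-of-the-envelope comparison of the two cases, again with invented timings in microseconds. It assumes a contiguous scan pays the access time once, while each binary-search probe jumps somewhere new and pays it again; the function names are hypothetical.

```python
def contiguous_scan_time(t_access, t_transfer, n_items):
    """Find the first block once, then just keep transferring."""
    return t_access + t_transfer * n_items


def binary_search_time(t_access, t_transfer, n_items):
    """Worst case: every probe pays the full access time plus one transfer."""
    probes = n_items.bit_length()  # floor(log2(n)) + 1 for n >= 1
    return probes * (t_access + t_transfer)


# Made-up timings in microseconds, purely for illustration.
T_ACCESS = 10_000
T_TRANSFER = 100

for n in (50, 1000):
    print(
        n,
        contiguous_scan_time(T_ACCESS, T_TRANSFER, n),
        binary_search_time(T_ACCESS, T_TRANSFER, n),
    )
```

With these particular numbers the binary search loses badly on 50 items and only barely wins on 1000, because the repeated access time dominates each probe.
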
This is because unlike memory, which is managed by a well-written OS, the disk is dumb... very dumb.
The way the physical disk writes and modifies data is nearly always trivial, meaning there is nothing clever about how it lays data out.
This is half the reason we say that I/O time sucks: hard disks are slow and stupid compared to memory, which is quick and clever.