renaming directories to remain consistent
This commit is contained in:
35
363/lec/lec1.md
Normal file
@@ -0,0 +1,35 @@
# lec1

## Databases introduction

First off, why do we even need a database, and what does it accomplish?

Generally a database will have 3 core elements:

1. querying
    * finding things
    * just as well, structured data makes querying easier
2. access control
    * who can access which data segments and what they can do with that data
    * reading, writing, sending, etc.
3. corruption prevention
    * mirroring/RAID/parity checking/checksums/etc. as some examples

## Modeling Data

Just like other data problems, we can choose what model we use to deal with our data.
In the case of sqlite3 the main data model is the table, where we store our pertinent data; later we'll learn that even data about our data is stored in tables.

Because everything goes into a table, we also have to plan _how_ we want to lay out our data in the table.
The __schema__ is that design/structure for our database.
An __instance__ is an occurrence of that schema with some data inside the fields, i.e. we have a table sitting somewhere in the database which follows the structure of the aforementioned schema.

__Queries__ are typically declarative; in practice we don't care what goes on behind the scenes, since by this point we assume we have tools we trust and know to be reasonably efficient.

Finally we have __transactions__, which are sets of operations designed to commit only if every operation completes successfully.
Transactions are not allowed to partially fail.
If _anything_ fails then everything should be undone and the state should revert to the previous state.
This is useful because if we are, for example, transferring money to another account, we want to make sure the exchange happens seamlessly; otherwise we should back out of the operation altogether.
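The all-or-nothing behavior can be seen directly in sqlite3; here's a minimal sketch using Python's built-in `sqlite3` module (the `accounts` table and names are invented for illustration):

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
con = sqlite3.connect(":memory:")
con.execute("create table accounts (name text primary key, balance integer)")
con.executemany("insert into accounts values (?, ?)",
                [("alice", 100), ("bob", 50)])
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("update accounts set balance = balance - 80 "
                    "where name = 'alice'")
        # A failure mid-transfer undoes the debit above as well.
        raise RuntimeError("simulated crash before the credit")
except RuntimeError:
    pass

# Alice's balance is unchanged: the partial transfer was rolled back.
print(con.execute("select balance from accounts "
                  "where name = 'alice'").fetchone()[0])  # 100
```

The `with con:` block is exactly the revert-to-previous-state behavior described above: nothing inside it becomes visible unless every statement succeeds.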
43
363/lec/lec10.md
Normal file
@@ -0,0 +1,43 @@
# lec10

This lecture has a corresponding lab exercise whose instructions can be found in `triggers-lab.pdf`.

## What is a trigger

Something that executes when _some operation_ is performed.

## Structure

```
create trigger NAME before some_operation
when (condition)
begin
    do_something;
end;
```

To explain: first we `create trigger` followed by some trigger name.
Then we denote that this trigger should fire whenever some operation happens.
This trigger then executes everything in the `begin...end;` section _before_ the new operation happens.

> `after`

Likewise, if we want to fire a trigger _after_ some operation we can just replace the `before` keyword with `after`.

> `new.asdf`

Refers to the _new_ value being added to a table.

> `old.asdf`

Refers to the _old_ value being changed in a table.

## Trigger Metadata

If you want to look at what triggers exist you can query the `sqlite_master` table.

```
select * from sqlite_master where type='trigger';
```
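A small working instance of the structure above, run through Python's `sqlite3` (the `items`/`audit` tables and the trigger name are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table items (name text, price integer);
create table audit (msg text);

-- Fire after every insert into items, logging the new row's name.
create trigger log_insert after insert on items
begin
    insert into audit values ('added ' || new.name);
end;
""")

# Inserting into items fires the trigger, which writes to audit.
con.execute("insert into items values ('pen', 2)")
print(con.execute("select msg from audit").fetchone()[0])  # added pen
```

Note how `new.name` refers to the row being inserted, exactly as described above.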
26
363/lec/lec11.md
Normal file
@@ -0,0 +1,26 @@
# lec11

_this section still needs more info_

## Query processing

Keep in mind we are still concerned with systems like sqlite3.

First we have to parse an input to validate it.
Then we should also validate the semantics of the input, ensuring that the given _tables, objects, etc._ are correct.
Finally we should somehow evaluate the input: usually by converting the given expression to the equivalent relational algebra expression.

If we can optimize this expression we can then create more efficient queries.
To do this we take into account 3 main factors:

1. I/O time
    * if we have to write something to disk over and over again then we pay a heavy latency cost
2. Computational time
3. Required memory/disk space

### Cost function

## Performance of Disk and RAM

## DB Block

## Disk Buffers
43
363/lec/lec12.md
Normal file
@@ -0,0 +1,43 @@
|
# lec12

## Lab

This section has a lab activity in `lab/` with instructions in `in-memory-searches.pdf` and `on-disk-search.pdf`.

## In-memory Search

_For now we'll deal with trivial queries._

Say we perform this query: `select name from censusData where age<30;`.

If we do a linear search we will nearly always have to go through all `N` records in the table to get the data we want out.
Binary searches prove to be quicker, but our data must be ordered in some fashion.
_Note:_ recall that we can only sort a table's entries by a single column at any given time.
The other problem we encounter is that our data must _always_ remain sorted, which means inserting, modifying, and deleting data has much larger overhead than other methods.
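The trade-off is easy to see in code; here's a sketch of that `age<30` query both ways (the record layout is invented for illustration):

```python
import bisect

# Hypothetical censusData records, kept sorted by age so binary search applies.
records = sorted([("ana", 25), ("bo", 31), ("cy", 19), ("dee", 42)],
                 key=lambda r: r[1])
ages = [r[1] for r in records]

# Linear search: inspects every record regardless of order.
linear = [name for name, age in records if age < 30]

# Binary search: jump straight to the age-30 cutoff, take everything before it.
cut = bisect.bisect_left(ages, 30)
binary = [name for name, _ in records[:cut]]

print(linear == binary, linear)
```

Same answer either way; the binary version only works because `records` stays sorted by `age`, which is exactly the maintenance overhead mentioned above.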
## On-Disk Search

There are two main ways of storing data on disk: by record or by column.
Likewise we also have to deal with variable-length data types like `varchar`, which provides an upper bound but not necessarily a fixed size.

### Blocks

Blocks contain records, or sometimes columns, depending on the implementation.
We usually allocate these blocks as 4k or 8k bytes of space since sectors are split into 512-byte chunks.
These things are taken into account because I/O time sucks; it always has, and until SSD lifetime performance stops sucking it always will.

The main issue with getting data off the disk isn't the read time, it's the time to find something in the first place. This is because we write to the disk in a fashion that _isn't_ completely linear.

Also keep in mind that our total I/O time to search for something is going to be T~access~ + T~transfer~\*N~records~.

* If we search on a key then we only have to search half the records on average.
* Also this is assuming that _all_ the blocks are right next to each other.

If we search for some blocks that happen to be right next to each other then we only need to bother finding the first block, but with a binary search we have to pay the access cost for _every single probe_.
This is because unlike memory, which is managed by a well-written OS, the disk is dumb... very dumb.
The way it (the physical disk) writes/modifies data is nearly always trivial, meaning there is no clever way that it is writing data.
This is half the reason we say that I/O time sucks:
hard disks are slow and stupid compared to memory, which is quick and clever.
34
363/lec/lec13.md
Normal file
@@ -0,0 +1,34 @@
# lec13

## Lab Exercises

This lecture has a lab portion in the `lab/` directory.
Directions are in `index-structures-lab.pdf` and `ordered-indexes-lab.pdf`.

## Indexing

To create an index we do:

```
create index indexName on targetTable(attrs);
```

We create an index based on some field, and we sort the entries in this index table.
Each entry then contains a pointer to the corresponding record in the target table.
Sorting the index entries allows us to search them _much faster_ than we could ever do on disk.

> What about collisions?

Then we simply add a pointer to the index entry's list of associated pointers.

The biggest problem we have with indexing is that if we have a large number of entries then we would end up storing a huge number of index entries and pointers.
In order to avoid this, we don't take all of the entries.
Instead we take every other entry into our index, or even every third.
This means that if a search lands us inside one of the gaps we still search in a binary fashion, but once we detect that we should search a _gap_ we linearly search through that gap.
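A sketch of that sparse-index lookup: binary search over the index, then a linear scan of the gap it points into (keys and gap size are invented for illustration):

```python
import bisect

# Sorted data, standing in for record keys laid out on disk.
data = list(range(0, 100, 3))           # 0, 3, 6, ..., 99
# Sparse index: keep every 4th key along with its position.
index = [(data[i], i) for i in range(0, len(data), 4)]
keys = [k for k, _ in index]

def lookup(target):
    # Binary search the index for the last indexed key <= target...
    i = bisect.bisect_right(keys, target) - 1
    if i < 0:
        return None
    _, pos = index[i]
    # ...then scan the gap linearly (at most 4 records here).
    for j in range(pos, min(pos + 4, len(data))):
        if data[j] == target:
            return j
    return None

print(lookup(27), lookup(28))  # 9 None
```

The index holds a quarter of the entries, and each miss inside a gap costs at most a 4-record linear scan.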
## Clustering

First let's recall that ideally our data entries in some table are physically located close to each other on disk _and_ are ordered somehow.

### Dense Clustering

### Sparse Clustering
35
363/lec/lec14.md
Normal file
@@ -0,0 +1,35 @@
# lec14

Let's say we have a massive dense index, so large that we can't fit it into memory.
We can use this dense index to create a sparse index on top of it.
Basically we're indexing the index to reach some data on disk that is very big.

We can of course take this even further: an index of an index which is indexing an index of a table.
This concept leads us to B+ trees.

## B+ Trees

This type of tree is a self-balancing tree.
This means that as we add rows to our indexes the structure will adjust as necessary: the index entries and the pointers in them are updated so that the height of the tree remains balanced.

The leaves of the tree will _always_ have pointers to the target data upon which we've built the tree.

### Overhead

> found on disk or ram?

> what's the cost of balancing the tree every time?

## Hashing

### Direct Indexing

We can create an index where each entry is an index number with a pointer to the respective record.
If we have id numbers that aren't right next to each other then we have a problem: we also have to include every intermediate value in our table.

### Hash Function

Now we'll have some function that takes an input [key] and generates a value [index].
_For now we'll assume we are dealing with collisions via chaining._
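A minimal sketch of hashing with chaining, where colliding keys pile up in the same bucket's list (bucket count and record pointers are invented for illustration):

```python
NBUCKETS = 8

# Each bucket chains (key, record_pointer) pairs that hash together.
buckets = [[] for _ in range(NBUCKETS)]

def put(key, pointer):
    buckets[hash(key) % NBUCKETS].append((key, pointer))

def get(key):
    # Walk the chain linearly; short chains keep lookups near O(1).
    for k, ptr in buckets[hash(key) % NBUCKETS]:
        if k == key:
            return ptr
    return None

put(101, "row@block3")
put(109, "row@block7")   # 101 and 109 land in the same bucket here
print(get(109))          # row@block7
```

Chaining sidesteps the direct-indexing problem above: sparse or colliding keys cost a slightly longer chain, not a slot for every intermediate value.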
44
363/lec/lec15.md
Normal file
@@ -0,0 +1,44 @@
# lec15

This lecture has two corresponding lab exercises: `lab/hashing-lab.pdf` and `lab/other-operations.pdf`.

## Deleted Data on Disk

Let's say we did the following operations on our disk:

```
insert data1
insert data2
delete data1
lookup data2
```

Let's say that when we inserted data2 with a hash function there was a collision with data1.
In sequential storage we would normally try to put data2 right after data1.
If we try to look up data2 through our hash function we would again land at data1, so we would have to search linearly for data2.
Now let's suppose that data1 is deleted.
If we look up data2 again we would still land at data1's location, but this time there's no collision, ergo, there's no linear correction to reach data2.
This is why when something is deleted on disk we don't actually delete it.
Instead we simply _mark_ or _flag_ a block for deletion.
This means we still get a collision so that we can linearly correct for data2.

The other side to this is that if we do another insert that collides with data1's location, we are allowed to overwrite that data because it has been marked for deletion.
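A sketch of that marking scheme in an open-addressed table, using a tombstone value so lookups keep probing past deleted slots (table size and keys are invented for illustration):

```python
SIZE = 8
EMPTY, DEAD = None, "<deleted>"   # DEAD is the deletion marker (tombstone)
table = [EMPTY] * SIZE

def slot(key):
    return hash(key) % SIZE

def insert(key):
    i = slot(key)
    while table[i] not in (EMPTY, DEAD):   # tombstoned slots may be reused
        i = (i + 1) % SIZE
    table[i] = key

def lookup(key):
    i = slot(key)
    while table[i] is not EMPTY:           # keep probing past tombstones
        if table[i] == key:
            return i
        i = (i + 1) % SIZE
    return None

def delete(key):
    i = lookup(key)
    if i is not None:
        table[i] = DEAD                    # mark, don't erase

insert("data1")
insert("data2")      # if this collides, it lands after data1
delete("data1")
print(lookup("data2") is not None)   # True: still reachable past the marker
```

If the slot were truly emptied, the `lookup` loop would stop at `EMPTY` and lose data2 — which is exactly the failure described above.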
## 'where' clause

Let's say we have the following query:

```
... where condition or other_condition;
```

By default the database will try to optimize the query by effectively replacing it with its own version of the same query, tailored specifically for that task.

We can also use `and`s with the `where` clause, which the database must also evaluate to create a more efficient query.

## Pages in Memory

If we have a large table that won't fit into memory we can partition it so that when we push it into memory it fits in our pages.
We can do a first pass where we sort the individual partitions in the memory pages.
This first pass temporarily writes our sorted partitions to the disk, where we can then merge the partitions against each other, writing the result to some output.
The previous temporary files can then be marked for deletion.
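That two-pass scheme is essentially an external merge sort; here's a small in-memory sketch, with the partition size standing in for the page size:

```python
import heapq

PAGE = 4  # pretend only 4 records fit in memory at once

data = [9, 1, 7, 3, 8, 2, 6, 5, 4]

# Pass 1: sort each page-sized partition ("write" each sorted run to disk).
runs = [sorted(data[i:i + PAGE]) for i in range(0, len(data), PAGE)]

# Pass 2: merge the sorted runs into one output stream.
result = list(heapq.merge(*runs))

print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

`heapq.merge` only ever holds one head element per run, which is what lets the merge pass work even when the whole table never fits in memory at once.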
61
363/lec/lec16.md
Normal file
@@ -0,0 +1,61 @@
# lec16

Let's now go into how we build a utility like the _transaction_ in SQLite.

## Problem Statement

> Why we need or care about transactions.

If we have tons of users all trying to access a database to, say, reserve a hotel room, we need to make sure that the operations don't fail or write over each other.
Otherwise we're going to have tons of undefined behavior.

## A.C.I.D Principles

### Atomicity

Mnemonically: _all or nothing_

Either everything in our transaction happens, or none of it happens.
The reason we care about this is that we want to be able to _recover_ from problems, like a power outage for instance, or some error which causes a halt.

To achieve this we have to log everything we're going to do.
Before we do anything in our transactions, we log what we're going to do: what changes are being made and what those changes are.

WAL: _write-ahead logging_
### Consistency

Like the name implies, our transactions should result in a predictable output every time.

### Isolation

Transactions should never be able to peek into another transaction.
As the name implies, the transaction runs alone.

### Durability

Essentially, once we reach the end of a transaction we should commit those changes to the database.
This way if something goes wrong, even if the whole database needs to be shut down, our changes should still be there.
_Basically this means that we dump anything in our transaction buffer onto disk_.

To achieve this we must verify that the changes were actually committed to the disk.

## Serializability

What we ultimately want is to be able to operate on multiple transactions while also getting the same result as if we had done everything in linear order.
We want that result because it maintains isolation for each transaction.

## Transaction Schedule

If we have two complex transactions that need to run then we can schedule them in some manner.
Sometimes that means we do one transaction first then the other, and sometimes it means we do pieces of both in some order.
The latter is known as _interleaving_.

Just like individual transactions, we can serialize schedules.

### More on interleaving

We mentioned interleaving earlier.
Basically this just means that we run part of one transaction, then part of a _different_ transaction.
We only do this if the result of the operation is the same as running them in a serialized fashion.
47
363/lec/lec17.md
Normal file
@@ -0,0 +1,47 @@
# lec17

In the previous lecture we covered methods and principles of transactions.
This time around we'll take care of proper ordering of operations.

## Operation Order

If two transactions work on two different data items then we know they shouldn't collide in their operative results, therefore the order doesn't matter.
The order matters if there is a collision between transactions on the same data.

_Conflict serializability_: the ability to swap an interleaved schedule into a serialized schedule while maintaining the conflict result from start to end.

## Determining Serializability

We can go through a schedule where each transaction is placed into a graph as a node.
We draw an edge from one node to another if, say, we run into a read in transaction A followed later by an opposing write action in another transaction.
The opposite also applies.

Our schedule is not serializable if there is a cycle in the resulting graph.
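A sketch of that test: build the precedence graph from conflicting operations, then check it for a cycle via depth-first search (the example schedule is made up):

```python
# Schedule as (transaction, action, data_item) in execution order.
schedule = [
    ("T1", "read",  "x"),
    ("T2", "write", "x"),   # conflicts with T1's earlier read  -> edge T1->T2
    ("T2", "read",  "y"),
    ("T1", "write", "y"),   # conflicts with T2's earlier read  -> edge T2->T1
]

edges = set()
for i, (ti, ai, di) in enumerate(schedule):
    for tj, aj, dj in schedule[i + 1:]:
        # Two ops conflict if different transactions touch the same item
        # and at least one of them is a write.
        if ti != tj and di == dj and "write" in (ai, aj):
            edges.add((ti, tj))

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def dfs(node, path):
        if node in path:
            return True
        return any(dfs(nxt, path | {node}) for nxt in graph.get(node, ()))
    return any(dfs(n, set()) for n in graph)

print(has_cycle(edges))  # True: T1 -> T2 -> T1, so not conflict-serializable
```

The two edges point in opposite directions, so no serial order of T1 and T2 can preserve both conflicts — exactly the cycle rule stated above.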
## Locks

Exclusive lock: a process locks the data for itself.

Shared lock: allows others to put locks on the data, but not exclusive locks.

There are some drawbacks to using locks, especially if done poorly.
If transaction A locks some data, say exclusively, but doesn't release the lock before another transaction tries to use that data, we may end up in a state where everyone is locked out of certain data.
For this reason we use a special locking protocol to take care of this exact scenario.

The state where everyone is locked out of something is called a deadlock.

### Two-Phase Locking Protocol

The two phases are the _growing_ and _shrinking_ phases.
This means we acquire more and more locks before we finally release locks until there are none left.
We don't mix locks and unlocks, however, so `[lock lock free lock free free]` isn't valid but `[lock lock lock free free free]` is fine.
We get two main advantages from this:

1. Serializability is maintained
2. Deadlocks are easy to find

Keep in mind however, deadlocks can still happen with this protocol.
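The growing/shrinking rule is simple to check mechanically; a sketch using the `lock`/`free` sequence notation from the examples above:

```python
def is_two_phase(ops):
    # Valid iff no "lock" appears after the first "free":
    # all acquisition (growing) must precede all release (shrinking).
    seen_free = False
    for op in ops:
        if op == "free":
            seen_free = True
        elif op == "lock" and seen_free:
            return False
    return True

print(is_two_phase(["lock", "lock", "lock", "free", "free", "free"]))  # True
print(is_two_phase(["lock", "lock", "free", "lock", "free", "free"]))  # False
```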
22
363/lec/lec18.md
Normal file
@@ -0,0 +1,22 @@
# lec18

Using graphs & trees to avoid deadlocks.

## Trees

Say we have a tree filled with data that we want to access.
With our first access into the tree we may lock whichever node we want; however, every subsequent lock after that point may happen _only_ if the parent of the target node is locked.

The main disadvantage of this methodology is that if we want to access the root node and a leaf node, we must do a lot of intermediary locking.

## Snapshot Isolation

For this strategy we're going to scrap the idea of using locks, graphs, or even trees.

Instead, when a transaction is about to run, we take a snapshot of everything we're going to modify, then work from there.
When we commit the first transaction we'll check whether anything else has changed the data we're trying to write to.
If nothing comes up we commit with no issue.
If something does come up we abort and restart the transaction with a new snapshot, _this time with the new data_.
This time around we should be OK to commit.

The overhead bites hard if we constantly have to restart transactions but, if we don't find ourselves doing that too much, it beats graphs and trees since there's barely anything to maintain.
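A toy sketch of that commit-time check, using per-key version numbers to detect that someone else wrote under us (the structure is invented for illustration):

```python
store = {"x": (0, 10)}   # key -> (version, value)

def begin(keys):
    # Snapshot the versions of everything we intend to modify.
    return {k: store[k][0] for k in keys}

def commit(snapshot, writes):
    # Abort if any snapshotted version moved since we started.
    if any(store[k][0] != v for k, v in snapshot.items()):
        return False
    for k, value in writes.items():
        store[k] = (store[k][0] + 1, value)
    return True

snap_a = begin(["x"])
snap_b = begin(["x"])
assert commit(snap_a, {"x": 11})   # first writer commits fine
ok = commit(snap_b, {"x": 12})     # sees x's version changed: aborts
print(ok, store["x"])              # False (1, 11)
```

The aborted transaction would then `begin` again against the new state, exactly the restart-with-a-fresh-snapshot step described above.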
63
363/lec/lec19.md
Normal file
@@ -0,0 +1,63 @@
# lec19

Let's say you are asked to model a business's data.
The first question you may have is, _where do I even begin?_
If you are in charge of this kind of project there are some things to consider:

* Scalability for future changes
* What kind of data you are dealing with

## Design process

1. Understand the problem at hand and the needs of users
2. Create a conceptual design

## Entity-Relationship (ER) Models

Used for conceptual design.
There are 3 building blocks for an ER model:

* Entities
* Relationships
* Attributes

For the rest of this lecture we'll be using a book store as our working example to get through this concept.

## Entity & Entity Set

Often with entities we're really referring to real-world _things_ which have properties of their own.
An entity for a bookstore would be something like a book, an author, or a publisher.
We have to consider these things when modeling the pertinent data for the business or organization, because we have to keep track of them to ensure that the business runs smoothly.

If we keep track of things like the books in our store, then we might avoid accidentally ordering too many of the same book, or running out of one specifically.

## Relationships

A _relationship_ is simply an association between entities.
Furthermore, entities can participate in relationships simply by being related to some other entity.

Coming back to our example: books and publishers are two entities which participate in a relationship together.
Likewise we can have two books which are part of a long-running series, which means they should be related.
This would mean we have a book in a relationship with a book.
Each book, however, takes on a different _role_ in the relationship; perhaps one book is the sequel to the other.

A clearer example might be that a book has an author, which means the two must be related, therefore we may create some kind of relationship between them.
Likewise the same book entity may participate in another relationship if appropriate, like with the publisher.

## Mapping Cardinalities

> One-to-One Mapping

Say we have two entity sets, where each entity in each set is related to one entity in the opposing set [_ex. every office has 1 instructor_].
A is in a relationship with B or, B is in a relationship with A.
Both explanations are fine and valid.

> One-to-Many/Many-to-One

All the entities in one set are related to _at most_ one entity from the other set [a student can have at most one advisor].
From the student side this is many-to-one, while from the advisor side it is one-to-many.

> Many-to-Many

Each entity in a given set may have zero or more relationships with entities in the opposing entity set.
Likewise the inverse is also true.
72
363/lec/lec2.md
Normal file
@@ -0,0 +1,72 @@
# lec2

Covering `tables, tuples, data stuff`

## Problem Statement

We need to be able to manipulate data easily.

For some time previous database systems, like IMS, had been using tree structures, but there was still demand for a system that _anyone_ could use.
This issue is what brings us to the table structure that we have mentioned in previous lectures.

To actually guide _how_ we use tables we'll use the following logic:

* Rows --> contain whole entries or records
    * all the data in a row is meant to go together
* Columns --> individually they are attributes or fields
    * each column is guaranteed to have _only_ 1 type of data in it (e.g. name, title, balance, id\_number)
* Table --> __relation__

Relational instance is another term as well.

* Domain
    * the set of values allowed in a field

## NULL

`NULL` is special, especially in sqlite3, because we aren't allowed to perform operations with it at all.
If we tried to do, for example, `NULL < 3` then we would just get back NULL; that way we avoid non-deterministic behavior and we are able to parse out bad results later on.
There are a few exceptions where NULL will be accounted for, however.

* Count
    * We only count whether a row is there or not; the data inside does not matter in this context.
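We can watch this NULL behavior directly; a quick check via Python's `sqlite3` (the table `t` is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Comparing against NULL yields NULL (surfaced as None in Python)...
print(con.execute("select NULL < 3").fetchone()[0])   # None

# ...but count(*) only cares whether rows exist, not what's in them.
con.execute("create table t (x integer)")
con.execute("insert into t values (NULL)")
print(con.execute("select count(*) from t").fetchone()[0])  # 1
```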
## Key Types

* Super Key
* Candidate Key
* Primary Key

### Problem Statement

The rows are not distinguishable from each other; we still have a mess of data sitting there unlabeled. Some kind of identifier is necessary to be able to access every tuple in the relational set.

### Superkey

A set of attributes is a __superkey__ for a table as long as that combination of fields remains unique for every tuple in the relational set.
In other words, if we have multiple fields, then f1 f3 f5 might be a good combo to use as a key into the table, because it may be able to identify a unique entry in our table.

* What's a valid superkey?

For starters, anything that contains another valid superkey.
Any subset of a full tuple that can uniquely identify any row in the table.

* Can a whole row be a superkey?

As long as it can identify any unique row in a table then it _is_ a superkey for that table.

### Candidate Key

Any superkey that wouldn't be a superkey if one of its attributes were removed. Say we have a superkey that takes columns {1,3,5,6,7}, but removing any one of the columns no longer reliably returns an arbitrary _unique_ row.
To put it simply, it is a minimal superkey; though this doesn't entail that there can't be multiple candidate keys for a given table.

### Primary Key

Any candidate key the database designer has chosen to serve as the unique row identifier.

### Foreign Key

If a table/relation includes among its attributes the primary key of another relation, that key is referred to as a foreign key because it references another relation.
The table being referred to is identified as the referenced relation.
32
363/lec/lec20.md
Normal file
@@ -0,0 +1,32 @@
# lec20

_more on mapping cardinalities_

## Participation Constraints

These are _prescriptive_ constraints.

* total

If _all_ the entities in a set participate, we say that the participation is total for that set.

* partial

If even one entity in the set is not participating then the participation is _partial_ for that set.

## Entity Keys

If we have a table for a relationship, we can identify all the relationships if we can uniquely identify entities from either given set.
Essentially, if we can identify both (all) participants in a given relationship table we can find any relationship in our relation-set.

## Weak Entity Sets

Any set where we cannot uniquely identify all entities in that set.
Let's say we have a tournament.

We'll have players, with some _name_ and _jersey number_.
We also have teams with a _team-name_ and likely some _team-id_.

This means our players entity set is a weak set; but because _all_ players participate on a team by definition of what a player is, we may use the relationship between teams and players to identify players.
56
363/lec/lec21.md
Normal file
@@ -0,0 +1,56 @@
# lec21

## Strong & Weak Entity Sets

Strong: has a primary key in the set

Weak: opposite of strong

## Diagram things

_pretty formatting on diagrams means stuff_

* Arrows

* solid lines
    * many
    * Contributor [solid] contribution [solid] candidate

* dotted lines
    * only one

Let's say we have solid/arrow:

```
student{id/name} --- [advisor] --> instructor{id/name}
```

Students can have at most 1 advisor \
Instructors can advise multiple students

If we want to have some kind of advisor table we can identify each relationship with _just_ the student-id.
We can do this because each student will only ever have 0 or 1 instructors in an advisory relationship.

## Composite Structures

Logically, compositing makes sense, but SQL does not like much aside from primitives, so we can't really do that.

Say then we have:

```
contributor:
    id
    name
    address <-- f to pay respects
        city
        state
        zip
```

We just drop the address part if we want to convert this to something in SQL.
Reasoning here is case-by-case and need-to-know: ergo, stick to da plan.
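A sketch of that flattening in plain SQL, with the composite `address` dropped and its leaves promoted to sibling columns (column types are assumed for illustration):

```
create table contributor (
    id      integer primary key,
    name    text,
    -- address is gone as a node; its leaves become ordinary columns
    city    text,
    state   text,
    zip     text
);
```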
## Normalization

A process used to improve schemas.

Basically a method of setting up rules to stop random bs from happening.
We also typically remove redundancy from our schemas through this process.
34
363/lec/lec22.md
Normal file
@@ -0,0 +1,34 @@
# lec22

## Functional Dependency

If we have an attribute `a` that reliably produces `b,c,d` every time, then we only need to keep track of `a` instead of keeping track of all the repeats, because the dependents depend on `a`.

Example:

```
building -> roomNumber
```

This one makes sense since we would likely be able to find duplicate room numbers in different buildings.
Most places have a unique name for each building, however.
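Whether a dependency actually holds in an instance is easy to check: `a -> b` fails exactly when one `a` value maps to two different `b` values (the sample rows are invented for illustration):

```python
def fd_holds(rows, a, b):
    # a -> b holds iff each value of column a pairs with a single value of b.
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True

rows = [
    {"building": "Science", "roomNumber": 101},
    {"building": "Library", "roomNumber": 101},   # duplicate room number: fine
    {"building": "Science", "roomNumber": 101},
]
print(fd_holds(rows, "building", "roomNumber"))  # True
print(fd_holds(rows, "roomNumber", "building"))  # False: 101 -> two buildings
```

In this sample, room numbers repeat across buildings, so `roomNumber` alone determines nothing — matching the intuition above.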
## BCNF

> Boyce-Codd Normal Form

If the source of our redundancy is a key, we're OK because we can easily find all the redundancies.
If we're not looking at a key then we could be in the middle of a dependency nest.

Say we have a schema like `advisees(inst_id, student_id, student_name)`.
Student name depends on `student_id`, so there's no reason to have it in this schema.

A schema follows BCNF if, for every non-trivial functional dependency

* x -> y on r,
* x is a superkey for r.

## Example

```
instructor_offices(inst_id, name, office)
```
41
363/lec/lec23.md
Normal file
@@ -0,0 +1,41 @@
# lec23

_More on building a usable model for given data; more specifically BCNF (Boyce-Codd Normal Form)_

BCNF: any table where there are no redundancies based on functional dependencies

## Lossless Decomposition

If we split a table that isn't in BCNF so that the new tables are in BCNF, we should be able to natural join them back to the original state.

## Normalization 3: Third Normal Form

Take everything out of the original table which is part of a functional dependency.

Example:

original: `id name office` {id -> name}

Table 1: `id name` [functional dependency participants]

Table 2: `id office` [everything else + root of FD]

This may be more expressive, but the problem is that performance takes a hit, because if we want all the information from the original table we have to do a bunch of joins.
This means going off to disk a bunch of times and back to memory, which is slow.
Let's say we have the following table(text view pls):
|
||||
|
||||
student-id | dept-name | instructor-id
|
||||
------------|---------------|----------------
|
||||
1 | Biology | 10
|
||||
1 | Chemistry | 20
|
||||
2 | Biology | 10
|
||||
|
||||
|
||||
## Lab Excercise
|
||||
|
||||
1. BCNF: the form a table follows if it doesn't have redundancy which comes from functional dependency
|
||||
2. `order(CustID, CustName, ItemNum, Date)`: no because name depends on id?
|
||||
* close: No. we can still create one more table aside with `id name`. we can create `id itemNum date`
|
||||
27
363/lec/lec24.md
Normal file
27
363/lec/lec24.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# lec24
|
||||
|
||||
## NoSQL
|
||||
|
||||
Why NoSQL is a thing
|
||||
|
||||
1. Scaling
|
||||
|
||||
Scaling demands more computing power over time.
|
||||
Relational databases require more vertical scaling where machines must be upgraded constantly.
|
||||
|
||||
Alternatively horizontal scaling allows for clustering computing style upgrades where computational power can be cheaper for upgrades maintaining support for large userbase.
|
||||
|
||||
2. Data Migration
|
||||
|
||||
If you change a schema then you have to painfully move everything over with the new schema.
|
||||
This means downtime in some cases, no matter how small the change.
|
||||
|
||||
3. OOP
|
||||
|
||||
`Rows != Objects`: the key distinction here is that objects can contain other objects while SQL rows can not.
|
||||
|
||||
4. Open Source
|
||||
|
||||
Many of the more popular NoSQL database systems happen to be open source and have permissive licensing.
|
||||
|
||||
Getting started can suck though because you have sometimes have to implement certain features with something like _MySQL_ already have.
|
||||
40
363/lec/lec25.md
Normal file
40
363/lec/lec25.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# lec25
|
||||
|
||||
## Master-Slave
|
||||
|
||||
Here we'll have a master node which handles updating data, but the slaves only deal with reads.
|
||||
|
||||
## Peer to Peer
|
||||
|
||||
In this case we still have a reverse sink where things are input and then sent through some pipes to all the other nodes on that network.
|
||||
|
||||
## Redundancy
|
||||
|
||||
Apart from being redudant with data we can also do the same thing with workloads.
|
||||
|
||||
## CAP Theorem
|
||||
|
||||
* consistency
|
||||
* availability
|
||||
* partition tolerance
|
||||
|
||||
Definition: You can't have all of these without some large tradeoffs.(shit happens all the time)
|
||||
|
||||
If you optimize for accessability then you may not be able to optimize for consistency without sacrificing for one or the other.
|
||||
Say we have the following constraints:
|
||||
|
||||
* n = number of replicas
|
||||
* w = number of nodes that must ack a write
|
||||
* r = number of nodes that must ack a read
|
||||
|
||||
If you optimize for reads: r=1 w=n
|
||||
|
||||
Then reading is quick, meaning your data is more accessible, but it's also less reliable.
|
||||
|
||||
## Pessimistic Replication
|
||||
|
||||
Traditional strategy:
|
||||
|
||||
* block access to data until it is up to date
|
||||
|
||||
|
||||
37
363/lec/lec26.md
Normal file
37
363/lec/lec26.md
Normal file
@@ -0,0 +1,37 @@
|
||||
# lec26
|
||||
|
||||
## Some types of Database Structures
|
||||
|
||||
1. Key-Values
|
||||
|
||||
Just like a standard map, we provide a `key` and get a `value`.
|
||||
Maks things easy for usage but recall that our hash function is really delicate.
|
||||
Because we're using a map or a set type of container.
|
||||
|
||||
2. Document
|
||||
|
||||
Welcome to json/xml world.
|
||||
Now we just look for documents(json/xml) instead of looking for some data with a predefined structure.
|
||||
|
||||
3. Column Family
|
||||
|
||||
Variant of key/value but this time we store things in columns instead of rows.
|
||||
Advantage here is that we can quickly search through columns for analysis type things.
|
||||
|
||||
4. Graph
|
||||
|
||||
Data is a _graph_(wow).
|
||||
|
||||
We still have some key-value system to find a node in the graph but we can create edges between values to create relationships.
|
||||
|
||||
## NoSQL
|
||||
|
||||
### In favor of usage
|
||||
|
||||
* Data is not uniform
|
||||
* Dataset is massive
|
||||
|
||||
### Against
|
||||
|
||||
* You need consistency
|
||||
*
|
||||
77
363/lec/lec3.md
Normal file
77
363/lec/lec3.md
Normal file
@@ -0,0 +1,77 @@
|
||||
# lec3
|
||||
|
||||
## Relational Algebra and its relation to SQL
|
||||
|
||||
### SELECT
|
||||
Used to select columns from some table.
|
||||
|
||||
> Relational symbol: pi
|
||||
|
||||
### Projection
|
||||
Picks out the rows of a (set of) table(s)
|
||||
|
||||
> Relational symbol: sigma
|
||||
|
||||
### Union
|
||||
Adds the rows of two tables into some new table.
|
||||
Removes duplicates from resultant table as well.
|
||||
|
||||
> Relational symbol: U(looks like a U but w i d e)
|
||||
|
||||
### Relational Algebra on paper
|
||||
|
||||
Even though conceptually this maps straight to SQL there is a handwritten way to express this on paper. The following is a _plain_ english way of reading these pape operations.
|
||||
|
||||
> /select_(concatenation of fields)[target table]
|
||||
|
||||
> /project_(_list of fields_)[target table]
|
||||
|
||||
> {resultant table} /union {resultant table}
|
||||
|
||||
## Cartesian Product
|
||||
|
||||
We take the _crossproduct_ of two relations(tables)
|
||||
|
||||
Take the __N'th__ row of the first and combine linearly with the rows from the second table.
|
||||
The result should be a new table with R1_(rowcount) * R2_(rowcount).
|
||||
This result isn't something that we would particularly care about since there is so much data which is now mixed throughout.
|
||||
|
||||
## SQL
|
||||
|
||||
Declarative language which mainly queries databases, and deals with schemas.
|
||||
|
||||
> Data-Definition language (DDL)
|
||||
* Create/modify/delete schemas
|
||||
* define integrity contraints
|
||||
* define views
|
||||
* drop tables
|
||||
|
||||
> Data Manipulation Language (DML)
|
||||
* Queries and whatnot
|
||||
|
||||
### Define a relation Schema
|
||||
|
||||
```
|
||||
create table tableName (
|
||||
fieldName type(argv...),
|
||||
...
|
||||
fieldName1 type(length) not null,
|
||||
primary_key(fieldName,[...]),
|
||||
foreign_key(fieldName,[...]) references otherTable
|
||||
|
||||
);
|
||||
```
|
||||
#### Contraints
|
||||
|
||||
`not null`: Requires that the field not be null when data is being inserted.
|
||||
|
||||
`primary key`: Can be used inline to show that some field is a primary key on its own.
|
||||
|
||||
### Field Types
|
||||
|
||||
`varchar(length)`: _variable_ length string of characters. If we have `6` as a maxiumum then we may only end reserving `4` bytes(words) of space somewhere.
|
||||
|
||||
`char(length)`: _Fixed_ length string of characters. If we have `5` then we must have a string length of `5`.
|
||||
|
||||
`numeric(p,d)`: Fixed-point number of `p` digits with `d` digits to the right of the decimal.
|
||||
|
||||
15
363/lec/lec4.md
Normal file
15
363/lec/lec4.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# lec4
|
||||
|
||||
This section mostly relies on practicing some of the most basic commands for sqlite3, for that reason most of the content is expressed through practice in the lab sub-section.
|
||||
|
||||
## Lab*
|
||||
|
||||
This lecture has some lab questions in the `lab/` directory named `table1.pdf` *and* some example data called `patients.sql`.
|
||||
`table1.pdf` will have some exercises to learn the basic commands of sqlite3 and `patients.sql` should have some example data which _table1_ asks you to query.
|
||||
|
||||
## Serverless
|
||||
|
||||
Instead of having listen server listen for requests to perform actions upon these requests we simply have some database held on our own machine and we perform all of our sql commands on that machine.
|
||||
For now we'll be dealing with small test databases so that we can practice the commands and observe each one's behavior; this will give you a good feeling of what does what in sqlite3.
|
||||
|
||||
|
||||
41
363/lec/lec5.md
Normal file
41
363/lec/lec5.md
Normal file
@@ -0,0 +1,41 @@
|
||||
# lec5
|
||||
|
||||
## Lab
|
||||
|
||||
This lecture will have a lab activity in `cst366/lab/1994-census-summary.sql` with instructions found in `single-table-queries-2-lab.pdf`.
|
||||
|
||||
|
||||
## Distinct Values
|
||||
|
||||
* Mininum - min(field)
|
||||
|
||||
Finds the smallest value in the given filed
|
||||
|
||||
* Maximum - max(field)
|
||||
|
||||
Find the largest value in the given field
|
||||
|
||||
Say we have a column where we know there are duplicate values but we want to konw what the distinct values in the column may be.
|
||||
SQLite3 has a function for that: `select distinct field, ... from table;`
|
||||
|
||||
* select substr(field, startIndex, length) ...
|
||||
|
||||
_Note_: the start index starts counting at `1` so keep in mind we are offset `+1` compared to other language like C.
|
||||
|
||||
## Joins
|
||||
|
||||
Now we want to join to tables together to associate their respective data.
|
||||
To accomplish this we can perform a simple `join` to combine tables.
|
||||
Important to note that a simple join does not necessarily take care of duplicate fields.
|
||||
If we have duplicate fields we must denote them as `target.field`.
|
||||
Here `target` is the table with the desired table and `field` is the desired field.
|
||||
|
||||
## Type Casting
|
||||
|
||||
If we have say `"56"` we can use a cast to turn it into an integer.
|
||||
|
||||
> cast(targetString as integer)
|
||||
|
||||
This will return with an error if a non number character is given as input to the cast function, here in this example we denote it with `targetString`.
|
||||
|
||||
|
||||
19
363/lec/lec6.md
Normal file
19
363/lec/lec6.md
Normal file
@@ -0,0 +1,19 @@
|
||||
# lec6
|
||||
|
||||
## Lab activity
|
||||
|
||||
This lecture features a lab activity in the lab/ directory named: `courses-ddl.sql` with instructions in `simple-joins-lab.pdf`.
|
||||
|
||||
* Note: Just make sure to read in courses-ddl.sql _first_ then courses-small.sql _second_ otherwise there will be random errors.(I'm not taking responsibility for that garbage so don't flame me)
|
||||
|
||||
## Natural Joins
|
||||
|
||||
`Natural Joins`: allows us to join tables while getting rid of duplicate columns automatically.
|
||||
|
||||
Form:
|
||||
|
||||
```
|
||||
select columns_[...] from tableLeft natural join tableRight
|
||||
```
|
||||
While there is no need to write extra `where` statements there is also the issue where there may be accidental matches since attributes are dropped.
|
||||
This implies that if two tables have attributes with the same field name, then only one will be returned in the resulting table.
|
||||
45
363/lec/lec7.md
Normal file
45
363/lec/lec7.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# lec7
|
||||
|
||||
## Lab Activity
|
||||
|
||||
This lecture has two correspondnig lab activities in `lab/` using `1994-census-summary.sql` with instrucctions on `aggregation-lab.pdf` and `nested-subqueries-lab.pdf`.
|
||||
|
||||
## Null Operations
|
||||
|
||||
Take the following table as a trivial example of working data
|
||||
|
||||
| a | b |
|
||||
|---|---|
|
||||
| 1 | 2 |
|
||||
| 3 | N |
|
||||
|
||||
Where `a` and `b` are attributes and N signifiies a NULL value.
|
||||
If we `select a+b from table` we only get back 2 rows like normal but the second row is left empty since we are operating with a NULL value.
|
||||
Even if we use multiplication or some kind of comparison against null we simply ignore that row since NULL in sqlite3 doesn't mean 0.
|
||||
Instead NULL in sqlite3 actually represents something that doesn't exist.
|
||||
|
||||
> count will treat NULL as 0 however
|
||||
|
||||
This is the only exception to the _ignore NULL_ "rule".
|
||||
|
||||
## Aggregation
|
||||
|
||||
This section we'll deal with functions similar to `count average min max`.
|
||||
We call these functions _aggreagate_ functions because they aggregate multiple data points into one.
|
||||
|
||||
> round(integer)
|
||||
|
||||
Rounds off the floating point number to some level of precision.
|
||||
|
||||
> group by _attr_
|
||||
|
||||
This will group attributes to gether in the result of a query
|
||||
|
||||
> having(attribute)
|
||||
|
||||
Similar to `where` however we only care about group scope in this case.
|
||||
|
||||
## Nested Subqueries
|
||||
|
||||
Recall that when we perform a query the result is a table.
|
||||
We can leverage this and perform some query to query a resultant table to further our ability to filter results from a table.
|
||||
60
363/lec/lec8.md
Normal file
60
363/lec/lec8.md
Normal file
@@ -0,0 +1,60 @@
|
||||
# lec8
|
||||
|
||||
## Lab
|
||||
|
||||
The lab exercises for this lecture can found under `lab/` as `db-mods-transactions-lab.pdf`.
|
||||
|
||||
|
||||
DB Modifications, plus transactions
|
||||
|
||||
## Modifyinig Data
|
||||
|
||||
Since we're dealing with data we may need to add, delete or modify entries in some table.
|
||||
|
||||
When we have inserted data before we have done simple insertions like in the previous lab exercises `insert into tableName values(...);`.
|
||||
Where the arguments are listed in the same order as they are listed in the table structure.
|
||||
|
||||
However, we can pass arguments by name, elminating the need to provide them in a rigid order:
|
||||
```
|
||||
insert into tableName(list, of, attributes) values('respective', 'data', 'entries');
|
||||
```
|
||||
|
||||
We can also move things from one table into another table.
|
||||
```
|
||||
insert into targetTable select ... from hostTable;
|
||||
```
|
||||
|
||||
### Deleting
|
||||
|
||||
```
|
||||
delete from tableName where ...;
|
||||
```
|
||||
|
||||
Deletes a _whole row_.
|
||||
Caution: the delete operation also accepts tables as valid arguments so a query that returns multiple rows as a table will be deleted in the `targetTable` mentioned earlier.
|
||||
|
||||
### Updating entries
|
||||
|
||||
```
|
||||
update tableName set attribute=123 where def='abc';
|
||||
```
|
||||
|
||||
The above updates an attribute based on the condiftion `where def='abc'`.
|
||||
|
||||
## Transactions
|
||||
|
||||
Set of instructions which upon failure do not modify any state.
|
||||
|
||||
```
|
||||
begin;
|
||||
// set of commands
|
||||
// wew
|
||||
end;
|
||||
```
|
||||
|
||||
## Inner/Outer Joins
|
||||
|
||||
> left (left outer)
|
||||
|
||||
_the outer part is implied so it's unnecessary to write it in_
|
||||
|
||||
54
363/lec/lec9.md
Normal file
54
363/lec/lec9.md
Normal file
@@ -0,0 +1,54 @@
|
||||
# lec9
|
||||
|
||||
## Lab
|
||||
|
||||
This lecture has a corresponding lab activity in `lab/`, the instructions are named `views-lab.pdf` and the second one is `contraints-lab.pdf`.
|
||||
|
||||
## Views
|
||||
|
||||
```
|
||||
create view newTabelName as select ... from targetTable;
|
||||
```
|
||||
|
||||
This will create a `view` which whenever it is queried will pull data from some base table.
|
||||
Really the `view` is a kind of "_macro_" which is stored in a `catalog` that normal, non-admin users can use to access a database.
|
||||
The catalog is saved in a table somewhere in the database.
|
||||
Think of this catalog like a container(_table_) for the other tables in the database.
|
||||
|
||||
### Pros & Cons
|
||||
|
||||
Problems:
|
||||
|
||||
* Computing the view multiple times can be expensive
|
||||
* Maintainence
|
||||
|
||||
There are two strategies to dealing with the second item: eager and lazy strategies.
|
||||
|
||||
1. Eager
|
||||
* If the target table of some view changes the update the view immediately
|
||||
2. Lazy
|
||||
* Don't update the view until it is needed(_queried_)
|
||||
|
||||
|
||||
## Check Contraint
|
||||
|
||||
Checks values when they are inserted to validate their legitimacy.
|
||||
|
||||
```
|
||||
create table blah(
|
||||
id varchar(8) check (id like "%-%"),
|
||||
);
|
||||
```
|
||||
This is how we can avoid accidently putting in null or downright logically incorrect data into a table.
|
||||
|
||||
We can also require entries be unique as well.
|
||||
|
||||
```
|
||||
create table blah (
|
||||
dept_name varchar(20),
|
||||
...
|
||||
unique(dept_name)
|
||||
);
|
||||
```
|
||||
_KEEP IN MIND HOWEVER_. With `unique()` if we try to check if a new entry is unique it will always fail with NULL since operations with NULL results in false.
|
||||
That means we will be able to insert NULL values into the table even if they are not unique.
|
||||
Reference in New Issue
Block a user