diff --git a/main.typ b/main.typ index 226465c..e6075fb 100644 --- a/main.typ +++ b/main.typ @@ -28,48 +28,179 @@ = Algorithms -== Costs -=== Nested-loop join +== Nested-loop join -=== Block-nested join +=== Overview -=== Merge join +=== Cost -=== Hash-join +== Block-nested join + +=== Overview + +=== Cost + +== Merge join + +=== Overview + +=== Cost + +== Hash-join + +=== Overview + +=== Cost -== Overview = Relational-algebra == Equivalence rules +- Commutativity of Union: $R∪S=S∪R$ +- Commutativity of Intersection: $R∩S=S∩R$ +- Commutativity of Join: $R join S=S join R$ +- Associativity of Union: $(R∪S)∪T=R∪(S∪T)$ +- Associativity of Intersection: $(R∩S)∩T=R∩(S∩T)$ +- Associativity of Join: $(R join S) join T=R join (S join T)$ +- Theta joins are associative in the following manner: $(E_1 join_theta_1 + E_2) join_(theta_2 and theta_3) E_3 ≡E_1 join_(theta_1 or theta_3) (E_2 + join_theta_2 E_3)$ +- Distributivity of Union over IntersectionL $R∪(S∩T)=(R∪S)∩(R∪T)$ +- Distributivity of Intersection over Union: $R∩(S∪T)=(R∩S)∪(R∩T)$ +- Distributivity of Join over Union: $R join (S∪T)=(R join S)∪(R join T)$ +- Selection is Commutative: $ sigma p_1( sigma p_2(R))= sigma p_2( sigma + p_1(R))$ +- Selection Distributes Over Union: $ sigma p(R∪S)= sigma p(R)∪ sigma p(S)$ +- Projection Distributes Over Union: $pi c(R∪S)=pi c(R)∪pi c(S)$ +- Selection and Join Commutativity: $ sigma p(R join S)= sigma p(R) join S$ if + p involves only attributes of R +- Pushing Selections Through Joins: $ sigma p(R join S)=( sigma p(R)) join S$ + when p only involves attributes of R +- Pushing Projections Through Joins: $pi c(R join S)=pi c(pi_(c sect #[attr]) + (R) join pi_(c sect #[attr]) (S))$ + == Operations +- Projection ($pi$). Syntax: $pi_{#[attributes]}(R)$. Purpose: Reduces the + relation to only contain specified attributes. Example: $pi_{#[Name, + Age}]}(#[Employees])$ + +- Selection ($sigma$). Syntax: $sigma_{#[condition]}(R)$. Purpose: Filters rows + that meet the condition. Example: $sigma_{#[Age] > 30}(#[Employees])$ + +- Union ($union$). Syntax: $R union S$. Purpose: Combines tuples from both + relations, removing duplicates. Requirement: Relations must be + union-compatible. + +- Intersection ($sect$). Syntax: $R sect S$. Purpose: Retrieves tuples common + to both relations. Requirement: Relations must be union-compatible. + +- Difference ($-$). Syntax: $R - S$. Purpose: Retrieves tuples in R that are + not in S. Requirement: Relations must be union-compatible. + +- Cartesian Product ($times$). Syntax: $R times S$. Purpose: Combines tuples + from R with every tuple from S. + +- Natural Join ($join$). Syntax: $R join S$. Purpose: Combines tuples from R + and S based on common attribute values. + +- Theta Join ($join_theta$). Syntax: $R join_theta S$. Purpose: Combines tuples + from R and S where the theta condition holds. + +- Outer Join. Full Outer Join: $R join.l.r S$. Left Outer Join: $R join.l S$. + Right Outer Join: $R join.r S$. Purpose: Extends join to include non-matching + tuples from one or both relations, filling with nulls. + + = Concurrency + +=== Conflict + +We say that I and J conflict if they are operations by *different transactions* on the +*same data item*, and at least one of these instructions is a *write* operation. +For example: I = read(Q), J = read(Q) -- Not a conflict; I = read(Q), J = +write(Q) -- Conflict; I = write(Q), J = read(Q) -- Conflict; I = write(Q), J = +write(Q) -- Conflict. + +// + I = read(Q), J = read(Q). The order of I and J *does not matter*, since the same +// value of Q is read by $T_i$ and $T _j$, regardless of the order. +// +// + I = read(Q), J = write(Q). If I comes before J, then Ti does not read the value +// of Q that is written by Tj in instruction J. If J comes before I, then Ti reads the +// value of Q that is written by Tj. Thus, the order of I and J *matters*. +// +// + I = write(Q), J = read(Q). The order of I and J *matters* for reasons similar to +// those of the previous case. +// +// + I = write(Q), J = write(Q). Since both instructions are write operations, the +// order of these instructions does not affect either Ti or Tj. However, the value +// obtained by the next read(Q) instruction of S is affected, since the result of only +// the latter of the two write instructions is preserved in the database. If there is no +// other write(Q) instruction after I and J in S, then the order of I and J *directly +// affects the final value* of Q in the database state that results from schedule S. + == Conflict-serializability -=== Conflict (types) +If a schedule $S$ can be transformed into a schedule $S'$ by a series of swaps +of non- conflicting instructions, we say that $S$ and $S'$ are *conflict +equivalent*. We can swap only _adjacent_ operations. + +The concept of conflict equivalence leads to the concept of conflict +serializability. We say that a schedule $S$ is *conflict serializable* if it is +conflict equivalent to a serial schedule. === Serializability graph -== Standard consistency levels +Simple and efficient method for determining the conflict +seriazability of a schedule. Consider a schedule $S$. We construct a directed +graph, called a precedence graph, from $S$. The set of vertices +consists of all the transactions participating in the schedule. The set of +edges consists of all edges $T_i arrow T_j$ for which one of three conditions holds: + ++ $T_i$ executes `write(Q)` before $T_j$ executes `read(Q)`. ++ $T_i$ executes `read(Q)` before $T_j$ executes `write(Q)`. ++ $T_i$ executes `write(Q)` before $T_j$ executes `write(Q)`. + +If the precedence graph for $S$ has a cycle, then schedule $S$ is not conflict +serializable. If the graph contains no cycles, then the schedule $S$ is +conflict serializable. + +== Standard isolation levels + +- *Serializable* usually ensures serializable execution. However, as we shall explain + shortly, some database systems implement this isolation level in a manner that + may, in certain cases, allow nonserializable executions. +- *Repeatable* read allows only committed data to be read and further requires that, + between two reads of a data item by a transaction, no other transaction is allowed + to update it. However, the transaction may not be serializable with respect to other + transactions. For instance, when it is searching for data satisfying some conditions, + a transaction may find some of the data inserted by a committed transaction, but + may not find other data inserted by the same transaction. +- *Read committed* allows only committed data to be read, but does not require re- + peatable reads. For instance, between two reads of a data item by the transaction, + another transaction may have updated the data item and committed. +- *Read uncommitted* allows uncommitted data to be read. It is the lowest isolation + level allowed by SQL. == Protocols === Lock-based -=== Timestamp +=== Timestamp-based -=== Validation +=== Validation-based === Version isolation -= Logs += Logs == WAL principle +== Write ahead principle + == Recovery algorithm == Log type examples