Databases-II-Cheatsheet/main.typ
2024-05-05 10:22:28 +03:00

#set page(margin: (
top: 1cm,
bottom: 1cm,
right: 1cm,
left: 1cm,
))
#set text(7pt)
#show heading: it => {
if it.level == 1 {
// pagebreak(weak: true)
text(10pt, upper(it))
} else if it.level == 2 {
text(9pt, smallcaps(it))
} else {
text(8pt, smallcaps(it))
}
}
= Indices
== Bitmap
== B+ tree
== Hash-index
= Algorithms
== Nested-loop join
=== Overview
=== Cost
== Block-nested join
=== Overview
=== Cost
== Merge join
=== Overview
=== Cost
== Hash-join
=== Overview
=== Cost
= Relational-algebra
== Equivalence rules
- Commutativity of Union: $R union S = S union R$
- Commutativity of Intersection: $R sect S = S sect R$
- Commutativity of Join: $R join S = S join R$
- Associativity of Union: $(R union S) union T = R union (S union T)$
- Associativity of Intersection: $(R sect S) sect T = R sect (S sect T)$
- Associativity of Join: $(R join S) join T = R join (S join T)$
- Theta joins are associative in the following manner: $(E_1 join_(theta_1)
  E_2) join_(theta_2 and theta_3) E_3 equiv E_1 join_(theta_1 and theta_3) (E_2
  join_(theta_2) E_3)$, where $theta_2$ involves attributes from only $E_2$ and
  $E_3$
- Distributivity of Union over Intersection: $R union (S sect T) = (R union S) sect (R union T)$
- Distributivity of Intersection over Union: $R sect (S union T) = (R sect S) union (R sect T)$
- Distributivity of Join over Union: $R join (S union T) = (R join S) union (R join T)$
- Selection is Commutative: $sigma_(p_1) (sigma_(p_2) (R)) = sigma_(p_2) (sigma_(p_1) (R))$
- Selection Distributes Over Union: $sigma_p (R union S) = sigma_p (R) union sigma_p (S)$
- Projection Distributes Over Union: $pi_c (R union S) = pi_c (R) union pi_c (S)$
- Pushing Selections Through Joins: $sigma_p (R join S) = sigma_p (R) join S$
  when $p$ involves only attributes of $R$
- Pushing Projections Through Joins: $pi_c (R join S) = pi_c (pi_(c_R) (R) join
  pi_(c_S) (S))$, where $c_R$ (resp. $c_S$) contains the attributes of $R$
  (resp. $S$) that appear in $c$ or in the join condition
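The selection-pushdown rule can be checked on a small example. The sketch below is illustrative only: relations are modeled as lists of dicts, `join` is a hypothetical nested-loop equi-join helper, and the sample data is made up. It suggests that both plans return the same tuples, while filtering first compares fewer tuple pairs.

```python
def join(r, s, key):
    """Nested-loop equi-join on `key`; returns (result, pairs_compared)."""
    out, compared = [], 0
    for t in r:
        for u in s:
            compared += 1
            if t[key] == u[key]:
                out.append({**t, **u})
    return out, compared

# Made-up sample relations (assumption of this sketch).
R = [{"id": i, "age": 20 + i} for i in range(10)]        # ages 20..29
S = [{"id": i, "city": f"c{i % 3}"} for i in range(10)]

def p(t):
    """Predicate that involves only attributes of R."""
    return t["age"] > 26

# Plan 1: join first, then select: sigma_p(R join S)
late, cost_late = join(R, S, "id")
late = [t for t in late if p(t)]

# Plan 2: select first, then join: sigma_p(R) join S
early, cost_early = join([t for t in R if p(t)], S, "id")

print(late == early, cost_late, cost_early)
```

Here plan 1 compares 100 tuple pairs and plan 2 compares 30, which is why optimizers usually try to apply selections as early as possible.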
== Operations
- Projection ($pi$). Syntax: $pi_(#[attributes]) (R)$. Purpose: Reduces the
  relation to only contain specified attributes. Example: $pi_(#[Name, Age])
  (#[Employees])$
- Selection ($sigma$). Syntax: $sigma_(#[condition]) (R)$. Purpose: Filters rows
  that meet the condition. Example: $sigma_(#[Age] > 30) (#[Employees])$
- Union ($union$). Syntax: $R union S$. Purpose: Combines tuples from both
relations, removing duplicates. Requirement: Relations must be
union-compatible.
- Intersection ($sect$). Syntax: $R sect S$. Purpose: Retrieves tuples common
to both relations. Requirement: Relations must be union-compatible.
- Difference ($-$). Syntax: $R - S$. Purpose: Retrieves tuples in R that are
not in S. Requirement: Relations must be union-compatible.
- Cartesian Product ($times$). Syntax: $R times S$. Purpose: Combines tuples
from R with every tuple from S.
- Natural Join ($join$). Syntax: $R join S$. Purpose: Combines tuples from R
and S based on common attribute values.
- Theta Join ($join_theta$). Syntax: $R join_theta S$. Purpose: Combines tuples
from R and S where the theta condition holds.
- Outer Join. Full Outer Join: $R join.l.r S$. Left Outer Join: $R join.l S$.
Right Outer Join: $R join.r S$. Purpose: Extends join to include non-matching
tuples from one or both relations, filling with nulls.
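A minimal sketch of three of the operations above, assuming relations are represented as lists of dicts (attribute name to value). The helper names `select`, `project` and `natural_join`, and the sample data, are hypothetical and not tied to any database system.

```python
def select(rel, pred):
    """sigma: keep the tuples for which the condition holds."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only the given attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

# Made-up sample relations (assumption of this sketch).
employees = [
    {"name": "Ann", "age": 34, "dept": 1},
    {"name": "Bob", "age": 28, "dept": 2},
]
depts = [{"dept": 1, "title": "R&D"}, {"dept": 2, "title": "Ops"}]

older = select(employees, lambda t: t["age"] > 30)
names = project(older, ["name"])
joined = natural_join(employees, depts)
print(names)
print(joined)
```

When the two relations share no attributes, `natural_join` degenerates to the Cartesian product, which matches the usual definition.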
= Concurrency
== Conflict
We say that I and J conflict if they are operations by *different transactions* on the
*same data item*, and at least one of these instructions is a *write* operation.
For example: I = read(Q), J = read(Q) -- Not a conflict; I = read(Q), J =
write(Q) -- Conflict; I = write(Q), J = read(Q) -- Conflict; I = write(Q), J =
write(Q) -- Conflict.
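The definition translates directly into a predicate. In this sketch `Op` is a hypothetical record type (transaction id, action, data item) introduced only for illustration.

```python
from collections import namedtuple

# Hypothetical record for one instruction of a schedule.
Op = namedtuple("Op", ["txn", "action", "item"])

def conflicts(a, b):
    """True if a and b belong to different transactions, touch the same
    data item, and at least one of them is a write."""
    return (
        a.txn != b.txn
        and a.item == b.item
        and "write" in (a.action, b.action)
    )

print(conflicts(Op("T1", "read", "Q"), Op("T2", "read", "Q")))
print(conflicts(Op("T1", "read", "Q"), Op("T2", "write", "Q")))
print(conflicts(Op("T1", "write", "Q"), Op("T2", "write", "Q")))
```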
// + I = read(Q), J = read(Q). The order of I and J *does not matter*, since the same
// value of Q is read by $T_i$ and $T _j$, regardless of the order.
//
// + I = read(Q), J = write(Q). If I comes before J, then Ti does not read the value
// of Q that is written by Tj in instruction J. If J comes before I, then Ti reads the
// value of Q that is written by Tj. Thus, the order of I and J *matters*.
//
// + I = write(Q), J = read(Q). The order of I and J *matters* for reasons similar to
// those of the previous case.
//
// + I = write(Q), J = write(Q). Since both instructions are write operations, the
// order of these instructions does not affect either Ti or Tj. However, the value
// obtained by the next read(Q) instruction of S is affected, since the result of only
// the latter of the two write instructions is preserved in the database. If there is no
// other write(Q) instruction after I and J in S, then the order of I and J *directly
// affects the final value* of Q in the database state that results from schedule S.
== Conflict-serializability
If a schedule $S$ can be transformed into a schedule $S'$ by a series of swaps
of non-conflicting instructions, we say that $S$ and $S'$ are *conflict
equivalent*. We can swap only _adjacent_ operations.
The concept of conflict equivalence leads to the concept of conflict
serializability. We say that a schedule $S$ is *conflict serializable* if it is
conflict equivalent to a serial schedule.
=== Serializability graph
The precedence graph gives a simple and efficient method for determining the
conflict serializability of a schedule. Consider a schedule $S$. We construct a
directed graph, called a precedence graph, from $S$. The set of vertices
consists of all the transactions participating in the schedule. The set of
edges consists of all edges $T_i arrow T_j$ for which one of three conditions holds:
+ $T_i$ executes `write(Q)` before $T_j$ executes `read(Q)`.
+ $T_i$ executes `read(Q)` before $T_j$ executes `write(Q)`.
+ $T_i$ executes `write(Q)` before $T_j$ executes `write(Q)`.
If the precedence graph for $S$ has a cycle, then schedule $S$ is not conflict
serializable. If the graph contains no cycles, then the schedule $S$ is
conflict serializable.
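The test above can be sketched as follows, assuming a schedule is given as a list of `(transaction, action, item)` triples in execution order. The function names and the two sample schedules are illustrative assumptions.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            # An earlier operation of Ti conflicts with a later one of Tj.
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting" or (v not in state and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# Made-up schedules (assumption of this sketch).
s1 = [("T1", "read", "A"), ("T1", "write", "A"),
      ("T2", "read", "A"), ("T2", "write", "A")]
s2 = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]

print(is_conflict_serializable(s1))
print(is_conflict_serializable(s2))
```

In `s2` the graph contains both the edge from `T1` to `T2` and the edge from `T2` to `T1`, so it has a cycle and the schedule is not conflict serializable. `s1` is already serial.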
== Standard isolation levels
- *Serializable* usually ensures serializable execution. However, some database
  systems implement this isolation level in a manner that may, in certain
  cases, allow nonserializable executions.
- *Repeatable read* allows only committed data to be read and further requires that,
between two reads of a data item by a transaction, no other transaction is allowed
to update it. However, the transaction may not be serializable with respect to other
transactions. For instance, when it is searching for data satisfying some conditions,
a transaction may find some of the data inserted by a committed transaction, but
may not find other data inserted by the same transaction.
- *Read committed* allows only committed data to be read, but does not require
  repeatable reads. For instance, between two reads of a data item by the transaction,
another transaction may have updated the data item and committed.
- *Read uncommitted* allows uncommitted data to be read. It is the lowest isolation
level allowed by SQL.
== Protocols
=== Lock-based
=== Timestamp-based
=== Validation-based
=== Version isolation
= Logs
== WAL principle
== Write ahead principle
== Recovery algorithm
== Log type examples
== Recovery example