% Notes V.1.0.0 (sections/relational_model.tex)
\chapter{The Relational Model}

The relational model is the dominant framework for managing data in contemporary information systems. Organizing data in tables is a practice that stretches back nearly four millennia, to early clay tablets, but the modern digital form was pioneered by Edgar Codd in 1970. Codd's primary contribution was the principle of data independence, which strictly separates the logical representation of information from its physical implementation on storage devices. Before this shift, programmers often had to understand the physical structure of the data to perform even basic queries. The relational model replaced these complex, system-dependent methods with a high-level abstraction based on tables, which are called relations.

In this model, data is represented as a collection of two-dimensional structures. This approach is simple and versatile, and it can model anything from corporate records to scientific data. Because operations are restricted to a limited set of high-level queries, the database management system can optimize them substantially, often executing them more efficiently than equivalent code written in a general-purpose language. This chapter covers the structure of relations, their mathematical foundations, and the design theory (specifically functional dependencies and normalization) that keeps data consistent and free from unnecessary redundancy.

\section{Core Terminology and Structural Components}

The relational model uses a small set of terms to describe both the structure and the content of the data.

\dfn{Attribute}{
An attribute is the named header of a column in a relation. It describes the meaning of the entries in that column. For example, in a table tracking information about movies, ``title'' and ``year'' would be typical attributes.
}

\dfn{Tuple}{
A tuple is a single row in a relation, excluding the header row. It represents a specific instance of the entity described by the relation. A tuple contains one component for every attribute defined in the relation's schema.
}

\dfn{Relation Schema}{
A relation schema consists of the name of the relation and the set of attributes associated with it. It is typically written $R(A_1, A_2, \dots, A_n)$. A database schema is the collection of all relation schemas in a database.
}

\dfn{Relation Instance}{
A relation instance is the set of tuples present in a relation at a given time. While schemas are relatively static, instances change frequently as data is inserted, updated, or deleted.
}

The components of a tuple must be atomic, meaning they are values of elementary types such as integers or strings. The model explicitly forbids complex structures such as nested lists or sets as individual values. Every attribute is associated with a domain, which defines the set of permissible values (in practice, the data type) for that column.
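
As a small illustration (the relation name, attributes, and rows are placeholders assumed for this sketch, not real data), consider the schema $\mathit{Movies}(\mathit{title}, \mathit{year}, \mathit{length}, \mathit{genre})$ with domains string, integer, integer, and string. One possible instance with three tuples is:

\begin{center}
\begin{tabular}{l r r l}
\hline
title & year & length & genre \\
\hline
Alpha & 1977 & 124 & sciFi \\
Beta & 1999 & 104 & comedy \\
Gamma & 1992 & 95 & comedy \\
\hline
\end{tabular}
\end{center}

Each row is a tuple with one atomic component per attribute. Storing a list such as \{sciFi, adventure\} in the genre column would violate atomicity.
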

\section{Mathematical Foundations of Relations}

Mathematically, a relation is a subset of the Cartesian product of the domains of its attributes. If an attribute $A$ has domain $D$, then every entry in the column for $A$ must be an element of $D$. A record can also be viewed as a partial function, or ``map'', from the set of attribute names to the set of atomic values, defined exactly on the attributes that the record carries.
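
To make the definition concrete, here is a minimal worked example under assumed domains (the symbols $D_1$, $D_2$, $R$, and $t$ are introduced only for this illustration). Take two attributes $\mathit{title}$ and $\mathit{year}$ with
\[
D_1 = \{\mathrm{Alpha}, \mathrm{Beta}\}, \qquad D_2 = \{1977, 1999\}.
\]
The Cartesian product $D_1 \times D_2$ contains four pairs, and any subset of it is a valid relation over these attributes, for instance
\[
R = \{(\mathrm{Alpha}, 1977),\ (\mathrm{Beta}, 1999)\} \subseteq D_1 \times D_2.
\]
Viewed as a map, the first tuple is the function $t$ with $t(\mathit{title}) = \mathrm{Alpha}$ and $t(\mathit{year}) = 1977$.
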

\thm{Relation as a Set}{
In the abstract mathematical model, a relation is a set of tuples. This implies that the order of the rows is irrelevant and that every tuple is unique. Furthermore, because the attributes also form a set, the order of the columns does not change the identity of the relation, provided the components of the tuples are reordered to match.
}

While the theoretical model relies on set semantics, practical implementations often use alternative semantics:
\begin{itemize}
\item \textbf{Bag Semantics}: Used by SQL, this allows duplicate records within a table.
\item \textbf{List Semantics}: In this variation, the order of the records is preserved and carries meaning.
\end{itemize}
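
The difference between the three semantics can be sketched on a one-attribute example (the values $a$ and $b$ are placeholders, and the bracket notation is an assumption of this sketch). Writing $\{\cdot\}$ for sets, $\{\!\{\cdot\}\!\}$ for bags, and $[\cdot]$ for lists:
\[
\begin{array}{ll}
\mbox{set:} & \{a, a, b\} = \{a, b\} = \{b, a\} \\
\mbox{bag:} & \{\!\{a, a, b\}\!\} = \{\!\{a, b, a\}\!\} \neq \{\!\{a, b\}\!\} \\
\mbox{list:} & [a, a, b] \neq [a, b, a]
\end{array}
\]
Sets ignore both duplicates and order, bags keep duplicates but ignore order, and lists keep both.
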

A database is formally defined as a set of such relational tables. To interact with this data, the model employs relational algebra, a system of operators that each take one or more relations as input and produce a new relation as output.
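
As a brief sketch of how such operators compose (the relation $\mathit{Movies}$ and its rows are assumptions made for this example), let
\[
\mathit{Movies} = \{(\mathrm{Alpha}, 1977),\ (\mathrm{Beta}, 1999),\ (\mathrm{Gamma}, 1977)\}
\]
over the attributes $(\mathit{title}, \mathit{year})$. The selection operator $\sigma$ keeps the tuples that satisfy a condition:
\[
\sigma_{\mathit{year} = 1977}(\mathit{Movies}) = \{(\mathrm{Alpha}, 1977),\ (\mathrm{Gamma}, 1977)\}.
\]
The projection operator $\pi$ keeps only the listed attributes, and it can be applied directly to the previous result:
\[
\pi_{\mathit{title}}\bigl(\sigma_{\mathit{year} = 1977}(\mathit{Movies})\bigr) = \{(\mathrm{Alpha}),\ (\mathrm{Gamma})\}.
\]
Because each result is again a relation, operators can be nested freely.
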

\section{Integrity Constraints and Consistency}

To ensure the validity of data, the relational model enforces several categories of integrity.

\thm{Relational Integrity}{
Relational integrity requires that every record within a relation has exactly the same set of attributes. It is broken if attributes are missing from individual rows or if extra attributes appear in them.
}

\thm{Atomic Integrity}{
Also known as First Normal Form (1NF), this rule dictates that every value in a cell must be a single, indivisible unit. Composite values such as lists or nested tables cannot be stored within a single attribute field.
}

\thm{Domain Integrity}{
This constraint requires that every value of an attribute belongs to that attribute's domain, that is, to its predefined set of values or its data type.
}
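
The three constraints can be illustrated on records written as maps from attribute names to values (the attribute names and values are assumptions for this sketch). Suppose a relation has the attributes $\mathit{title}$ (string) and $\mathit{year}$ (integer), and that $\{\mathit{title} \mapsto \mathrm{Alpha},\ \mathit{year} \mapsto 1977\}$ is a valid record. Then:
\begin{itemize}
\item $\{\mathit{title} \mapsto \mathrm{Beta}\}$ breaks relational integrity, because the attribute $\mathit{year}$ is missing.
\item $\{\mathit{title} \mapsto \mathrm{Beta},\ \mathit{year} \mapsto [1999, 2001]\}$ breaks atomic integrity, because the year is a list.
\item $\{\mathit{title} \mapsto \mathrm{Beta},\ \mathit{year} \mapsto \mbox{``unknown''}\}$ breaks domain integrity, because the year is not an integer.
\end{itemize}
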

\section{Defining Relation Schemas in SQL}

SQL (Structured Query Language) is the primary language for implementing the relational model. It is divided into the Data-Definition Language (DDL), for creating and modifying schemas, and the Data-Manipulation Language (DML), for querying and updating data. The most fundamental DDL command is the \texttt{CREATE TABLE} statement, which establishes the table name, its attributes, and their types.

\subsection{SQL Data Types}

Attributes must be assigned a primitive type. Common SQL types include:
\begin{itemize}
\item \textbf{CHAR(n)}: A fixed-length string of $n$ characters.
\item \textbf{VARCHAR(n)}: A variable-length string of up to $n$ characters.
\item \textbf{INT / INTEGER}: Standard whole numbers.
\item \textbf{FLOAT / REAL}: Floating-point numbers.
\item \textbf{BOOLEAN}: Stores TRUE, FALSE, or UNKNOWN.
\item \textbf{DATE / TIME}: Specific formats for calendar dates (e.g., YYYY-MM-DD) and clock times.
\end{itemize}
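
A minimal sketch of a \texttt{CREATE TABLE} declaration using some of these types, assuming a hypothetical \texttt{Movies} relation (the table name, attribute names, and string lengths are illustrative and not tied to any particular system):

\begin{verbatim}
-- Hypothetical schema: Movies(title, year, length, genre)
CREATE TABLE Movies (
    title   VARCHAR(100),  -- variable-length string, at most 100 chars
    year    INT,           -- release year
    length  INT,           -- running time in minutes
    genre   CHAR(10)       -- fixed-length string of 10 characters
);
\end{verbatim}

The statement fixes only the schema. The new table starts with an empty instance, which is then populated through DML statements.
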

\subsection{Keys and Uniqueness}

\dfn{Key}{
A key is a set of one or more attributes such that no two tuples in any possible relation instance can share the same values for all of these attributes. A key must be minimal: no proper subset of its attributes may also have this property.
}

In SQL, keys are declared using the \texttt{PRIMARY KEY} or \texttt{UNIQUE} keywords. Attributes designated as a primary key may not contain \texttt{NULL} values, whereas \texttt{UNIQUE} columns may allow them, depending on the system.
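
A minimal sketch of both kinds of declaration, assuming hypothetical \texttt{Movies} and \texttt{Studios} tables (all names and lengths are illustrative):

\begin{verbatim}
-- A key made of two attributes is declared as a separate clause.
CREATE TABLE Movies (
    title   VARCHAR(100),
    year    INT,
    length  INT,
    genre   CHAR(10),
    PRIMARY KEY (title, year)
);

-- A single-attribute key can be declared inline with the attribute.
CREATE TABLE Studios (
    name    VARCHAR(50) PRIMARY KEY,
    address VARCHAR(255) UNIQUE
);
\end{verbatim}

Under these declarations, no two rows of \texttt{Movies} may agree on both \texttt{title} and \texttt{year}, and neither of those columns may be \texttt{NULL}. Whether several rows of \texttt{Studios} may carry a \texttt{NULL} address depends on the system.
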

\section{Functional Dependencies}

A central concept in database design theory is the functional dependency (FD), which generalizes the idea of a key.

\thm{Functional Dependency}{
A functional dependency on a relation $R$ is an assertion that if two tuples of $R$ agree on all of the attributes $A_1, \dots, A_n$, they must also agree on all of the attributes $B_1, \dots, B_m$. This is written $A_1 \dots A_n \rightarrow B_1 \dots B_m$, or $A \rightarrow B$ for short when $A$ and $B$ denote the two attribute sets.
}

FDs are not merely observations about one particular instance of the data; they are constraints that must hold in every legal instance of the relation. They describe relationships between attributes. For example, a movie's title and year might functionally determine its length and studio, on the assumption that no two movies released in the same year share a title.
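
Stated formally, and writing $t[X]$ for the components of a tuple $t$ on the attribute set $X$ (a notation assumed here), $A \rightarrow B$ holds on $R$ when
\[
\forall\, t, u \in R:\quad t[A] = u[A] \ \Rightarrow\ t[B] = u[B].
\]
For a hypothetical relation $\mathit{Movies}(\mathit{title}, \mathit{year}, \mathit{length}, \mathit{studio})$ with made-up rows, the dependency $\mathit{title}\ \mathit{year} \rightarrow \mathit{length}\ \mathit{studio}$ is violated by the pair of tuples
\[
(\mathrm{Alpha}, 1977, 124, \mathrm{StudioX}) \quad\mbox{and}\quad (\mathrm{Alpha}, 1977, 130, \mathrm{StudioX}),
\]
which agree on title and year but disagree on length. An instance containing both tuples is therefore not legal under this FD.
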

\dfn{Superkey}{
A superkey is a set of attributes that contains a key as a subset. Every superkey therefore functionally determines all attributes of the relation, but it need not be minimal.
}

The closure of a set of attributes under a set of FDs is the set of all attributes that are functionally determined by that set. Computing closures lets designers identify all keys of a relation and test whether a new FD follows from the existing ones.
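
The closure of an attribute set $X$, commonly written $X^+$, can be computed by a fixed-point iteration: start with $X^+ = X$ and, while some FD $Y \rightarrow Z$ has $Y \subseteq X^+$ but $Z \not\subseteq X^+$, add the attributes of $Z$ to $X^+$. As a worked sketch (the relation and FDs below are assumptions chosen for illustration), take $R(A, B, C, D, E, F)$ with the FDs
\[
AB \rightarrow C, \qquad BC \rightarrow AD, \qquad D \rightarrow E, \qquad CF \rightarrow B.
\]
Starting from $\{A, B\}$, the set grows as follows:
\[
\{A, B\}
\ \stackrel{AB \rightarrow C}{\longrightarrow}\ \{A, B, C\}
\ \stackrel{BC \rightarrow AD}{\longrightarrow}\ \{A, B, C, D\}
\ \stackrel{D \rightarrow E}{\longrightarrow}\ \{A, B, C, D, E\}.
\]
The FD $CF \rightarrow B$ never applies because $F$ is not in the set, so $\{A, B\}^+ = \{A, B, C, D, E\}$. It follows that $AB \rightarrow D$ is implied by the given FDs, and that $\{A, B\}$ is not a superkey, since $F$ is missing from its closure.
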

\section{Anomalies and the Need for Decomposition}

Careless schema design leads to ``anomalies'', problems that arise when too much information is crammed into a single table. There are three primary types:
\begin{enumerate}
\item \textbf{Redundancy}: Information is repeated unnecessarily across multiple rows (e.g., repeating a studio's address for every movie it made).
\item \textbf{Update Anomalies}: If a piece of redundant information changes, it must be updated in every row where it appears. Failing to do so leaves the data inconsistent.
\item \textbf{Deletion Anomalies}: Deleting a record might inadvertently destroy the only copy of unrelated information (e.g., deleting the last movie of a studio might remove the studio's address from the database entirely).
\end{enumerate}

To eliminate these issues, designers use decomposition: the process of splitting a relation into two or more smaller relations whose attributes, taken together, include all the original attributes.
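
As a sketch of such a split (the relation and attribute names are assumptions that follow the studio example above), the single relation
\[
\mathit{MovieStudio}(\mathit{title}, \mathit{year}, \mathit{length}, \mathit{studioName}, \mathit{studioAddress})
\]
repeats the address for every movie of a studio. Projecting onto two attribute sets gives
\[
\begin{array}{l}
\mathit{Movies}(\mathit{title}, \mathit{year}, \mathit{length}, \mathit{studioName}), \\
\mathit{Studios}(\mathit{studioName}, \mathit{studioAddress}).
\end{array}
\]
Each address is now stored once, so it can be updated in one place, and it survives the deletion of a studio's last movie. Assuming $\mathit{studioName} \rightarrow \mathit{studioAddress}$ holds, the natural join $\mathit{Movies} \bowtie \mathit{Studios}$ on the shared attribute reassembles exactly the original tuples.
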

\section{Normal Forms}

The goal of decomposition is to reach a normal form that guarantees the absence of certain anomalies.

\thm{Boyce-Codd Normal Form (BCNF)}{
A relation $R$ is in BCNF if and only if, for every nontrivial functional dependency $A \rightarrow B$ that holds on $R$, the set of attributes $A$ is a superkey. In simpler terms, the left side of every nontrivial FD must contain a key.
}

Any relation can be decomposed into a collection of BCNF relations. This process removes the redundancy caused by functional dependencies. However, a BCNF decomposition does not always preserve all of the original dependencies, which motivates a slightly relaxed condition.
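
One common textbook procedure, sketched here with assumed names, repeatedly picks a nontrivial FD $X \rightarrow Y$ that violates BCNF, computes the closure $X^+$ of its left side, and splits $R$ into
\[
R_1 = X^+, \qquad R_2 = X \cup (R \setminus X^+).
\]
For example, take a hypothetical relation $R(A, B, C, D)$ with the FDs $AB \rightarrow C$ and $C \rightarrow D$. The only key is $\{A, B\}$, so $C \rightarrow D$ violates BCNF: $\{C\}^+ = \{C, D\}$ is not all of $R$, hence $\{C\}$ is not a superkey. Splitting on this FD gives
\[
R_1(C, D), \qquad R_2(A, B, C).
\]
In $R_1$ the key is $\{C\}$ and in $R_2$ the key is $\{A, B\}$, so both relations are in BCNF and no further split is needed.
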

\thm{Third Normal Form (3NF)}{
A relation $R$ is in 3NF if and only if, for every nontrivial FD $A \rightarrow B$, either $A$ is a superkey or every attribute in $B$ that is not in $A$ is ``prime'' (a member of some key).
}

3NF is useful because it is always possible to find a 3NF decomposition that is both lossless (the original data can be reconstructed) and dependency-preserving, which is not always possible for BCNF.
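
A classic illustration, written here with assumed attribute names, is a relation $\mathit{Bookings}(\mathit{title}, \mathit{theater}, \mathit{city})$ with the FDs
\[
\mathit{theater} \rightarrow \mathit{city}, \qquad \mathit{title}\ \mathit{city} \rightarrow \mathit{theater}.
\]
Its keys are $\{\mathit{title}, \mathit{city}\}$ and $\{\mathit{theater}, \mathit{title}\}$, so every attribute is prime. The FD $\mathit{theater} \rightarrow \mathit{city}$ violates BCNF, because $\{\mathit{theater}\}$ is not a superkey, but it satisfies the 3NF condition, because $\mathit{city}$ is prime. A BCNF decomposition into $(\mathit{theater}, \mathit{city})$ and $(\mathit{theater}, \mathit{title})$ would no longer allow $\mathit{title}\ \mathit{city} \rightarrow \mathit{theater}$ to be checked within a single relation, which is the loss of a dependency that 3NF avoids.
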

\section{Modifying and Removing Schemas}

Database structures are dynamic. SQL provides the \texttt{DROP TABLE} command to remove a relation and all of its data permanently. Structural changes use the \texttt{ALTER TABLE} command, which adds new attributes via \texttt{ADD} or removes existing ones via \texttt{DROP}. When a column is added, existing tuples typically receive \texttt{NULL} or a specified \texttt{DEFAULT} value in that column.
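
A minimal sketch of these commands, assuming a hypothetical \texttt{Movies} table already exists (the column name \texttt{rating} and the default value are illustrative):

\begin{verbatim}
-- Add a column; existing rows receive the default value 'NR'.
ALTER TABLE Movies ADD rating CHAR(5) DEFAULT 'NR';

-- Remove the column again.
ALTER TABLE Movies DROP COLUMN rating;

-- Remove the whole relation together with all of its tuples.
DROP TABLE Movies;
\end{verbatim}

The exact spelling varies between systems. Some accept or require the keyword \texttt{COLUMN} after \texttt{ADD} or \texttt{DROP}, for example, so the documentation of the target system should be consulted.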