
\chapter{The Relational Model}

The relational model serves as the theoretical cornerstone of modern database systems, providing a structured yet flexible framework for data management. Proposed by Edgar F. Codd in 1970, the model introduced the principle of data independence, which decouples the logical representation of data (how users perceive and interact with it) from its physical storage on hardware. By representing information as two-dimensional tables, the model bridges the gap between mathematical theory and practical business applications. The tabular format itself is not a modern invention: clay tablets that organize data in rows and columns date back to at least 1800 BC. This enduring utility suggests that tables match the way people naturally organize structured facts.

\thm{Data Independence}{The separation of the logical data model from the physical storage implementation, allowing changes to the machine-level storage without affecting user queries or the logical view of the data.}

\section{Core Terminology and Structural Components}

In relational theory, specific terminology is used to describe the components of a database, often with synonyms used across different technical and business contexts. The primary structure is the relation, commonly referred to as a table. A relation consists of a set of attributes, which are the named columns that define the properties of the data stored. The set of these attributes, combined with the name of the relation itself, constitutes the relation schema.

\dfn{Relation Schema}{The formal description of a relation, comprising its name and a set of attributes, typically denoted as $R(A_1, A_2, \dots, A_n)$.}

Each entry within a relation is called a tuple, which corresponds to a row in a table or a record in a file. A tuple contains a specific value for each attribute defined in the schema. These values, often called scalars, represent individual facts or characteristics.

\dfn{Tuple}{A single row or record within a relation, representing a specific instance of the entity or business object described by the schema.}

\nt{While mathematicians usually index the components of a tuple by position, database scientists identify them by attribute name, which gives each component a clear meaning.}
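As a minimal sketch of this terminology, the following Python fragment models a relation schema as a name plus a set of attribute names, and a tuple as a mapping from attribute names to scalar values. The class \texttt{RelationSchema}, the relation \texttt{Movie}, and its attributes and values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationSchema:
    """A relation name together with its set of attribute names."""
    name: str
    attributes: frozenset


# Hypothetical schema Movie(title, year, length).
movie_schema = RelationSchema("Movie", frozenset({"title", "year", "length"}))

# A tuple holds one scalar value for every attribute of the schema.
# Components are addressed by attribute name, not by position.
movie_tuple = {"title": "Metropolis", "year": 1927, "length": 153}

# The attribute names of the tuple coincide with those of the schema.
matches_schema = set(movie_tuple) == movie_schema.attributes
```

Using a dictionary for the tuple mirrors the convention of identifying components by name rather than by index.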

\section{Domains and Atomic Values}

Every attribute in a relation is associated with a domain. A domain is essentially a data type, that is, the set of permissible values that can appear in a specific column. For example, a ``year'' attribute might be restricted to the domain of integers, while a ``name'' attribute is restricted to the domain of character strings. A fundamental requirement of the standard relational model is that these values must be atomic: they cannot be further decomposed into smaller components, such as nested tables, lists, or sets. A relation that satisfies this requirement is said to be in first normal form (1NF).

\dfn{Domain}{A set of values of a specific elementary type from which an attribute draws its components.}

\thm{Atomic Integrity}{The rule that every component of every tuple must be an indivisible, elementary value rather than a structured or repeating group.}
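One common way to restore atomicity is to replace a record that holds a list by several records that each hold one element. The sketch below illustrates this under simple assumptions: the record, the attribute \texttt{phones}, and the helper \texttt{flatten} are all hypothetical.

```python
# Hypothetical record whose "phones" component is a list, hence not atomic.
nested = {"name": "Ada", "phones": ["555-0100", "555-0199"]}


def flatten(record, attribute):
    """Return one record per element of the list stored under `attribute`."""
    rest = {k: v for k, v in record.items() if k != attribute}
    return [{**rest, attribute: value} for value in record[attribute]]


# Each resulting record carries a single string under "phones".
flat = flatten(nested, "phones")
```

After flattening, every component is an elementary value, at the cost of repeating the remaining attributes in each record.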

\section{Mathematical Foundations of Relations}

The relational model is built upon the mathematical concept of the Cartesian product. Given a family of domains $D_1, D_2, \dots, D_n$, a relation is defined as a subset of the Cartesian product $D_1 \times D_2 \times \dots \times D_n$. Each element of this subset is an $n$-tuple. Under this definition, relational integrity and domain integrity hold by construction: every tuple has exactly $n$ components, and the $i$-th component must belong to $D_i$.

\dfn{Cartesian Product}{The set of all possible ordered tuples that can be formed by taking one element from each of the participating sets or domains.}
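The definition can be made concrete with two small finite domains. The following sketch enumerates their Cartesian product with \texttt{itertools.product} and checks that a relation is a subset of it; the domains \texttt{D1} and \texttt{D2} and their contents are hypothetical.

```python
from itertools import product

# Hypothetical finite domains, kept tiny so the product can be enumerated.
D1 = {"Ada", "Bob"}
D2 = {1990, 2000}

# All ordered pairs (d1, d2) with d1 in D1 and d2 in D2.
cartesian = set(product(D1, D2))

# A relation over (D1, D2) is any subset of the Cartesian product.
relation = {("Ada", 1990), ("Bob", 2000)}

is_relation = relation <= cartesian
```

Real domains such as the integers or character strings are far too large to enumerate, so this check is purely illustrative.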

An alternative mathematical representation views a record as a map. In this perspective, a record $t$ is a partial function from a set of attribute names to a global set of values; the set of attribute names on which $t$ is defined is its support. This mapping approach is often preferred because it makes the order of attributes irrelevant, which reflects how databases operate in practice.

\nt{In a relation, the order of both the attributes and the tuples is immaterial; a relation remains the same regardless of how its rows or columns are permuted.}
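The map view translates directly into code. In the sketch below, records are Python dictionaries and a relation is a set of records; the data and the helper \texttt{as\_relation} are hypothetical. It illustrates that permuting attributes or tuples leaves the relation unchanged.

```python
# Two records with the same attributes, written in a different order.
t1 = {"name": "Ada", "year": 1990}
t2 = {"year": 1990, "name": "Ada"}

# Dictionary equality ignores the order in which keys were inserted.
same_record = (t1 == t2)


def as_relation(records):
    """Model a relation as a set of records; frozenset makes each record hashable."""
    return {frozenset(r.items()) for r in records}


# The same two tuples, listed in a different order.
r1 = as_relation([t1, {"name": "Bob", "year": 2000}])
r2 = as_relation([{"year": 2000, "name": "Bob"}, t2])

same_relation = (r1 == r2)
```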

\section{Integrity and Consistency Rules}

For a collection of data to be considered a valid relational table, it must adhere to three primary integrity rules. These rules ensure the consistency and predictability of the data.

\begin{enumerate}
	\item \textbf{Relational Integrity:} This requires that all records within a specific table have the exact same set of attributes. A table cannot have ``holes'' or missing attributes in some rows but not others.
	\item \textbf{Atomic Integrity:} As previously noted, this prohibits the nesting of structures within a cell. A value must be a single fact.
	\item \textbf{Domain Integrity:} This ensures that every value in a column is of the same kind, matching the type specified for that attribute in the schema.
\end{enumerate}

\dfn{Domain Integrity}{The constraint that every value in a specific column must belong to the domain (data type) associated with that attribute.}

\thm{Relational Integrity}{The requirement that every record in a relation must possess the same support, meaning they all share the identical set of attributes defined in the schema.}
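The three rules can be sketched as simple checks over a collection of records. This is only an illustration under stated assumptions: the schema \texttt{SCHEMA} maps hypothetical attribute names to Python types standing in for domains, and the function names are invented for this example.

```python
# Hypothetical schema: attribute name -> domain, modelled as a Python type.
SCHEMA = {"name": str, "year": int}

# Container types treated as non-atomic in this sketch.
NESTED_TYPES = (list, tuple, set, frozenset, dict)


def relational_integrity(records, schema):
    """Every record has exactly the attributes of the schema."""
    return all(set(r) == set(schema) for r in records)


def atomic_integrity(records):
    """No component is a nested collection."""
    return all(not isinstance(v, NESTED_TYPES)
               for r in records for v in r.values())


def domain_integrity(records, schema):
    """Every component belongs to the domain of its attribute."""
    return all(isinstance(r[a], schema[a])
               for r in records for a in schema if a in r)


def is_valid_table(records, schema):
    """All three integrity rules hold."""
    return (relational_integrity(records, schema)
            and atomic_integrity(records)
            and domain_integrity(records, schema))
```

For instance, a record that lacks the \texttt{year} attribute fails the first check, a record storing a list under \texttt{year} fails the second, and a record storing the string \texttt{"1990"} under \texttt{year} fails the third.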

\section{Keys and Uniqueness}

To distinguish between tuples, the relational model relies on the concept of keys. A key is a set of one or more attributes that uniquely identifies a tuple within a relation instance. No two tuples in a valid relation can share the same values for all attributes in the key. Typically, one key is designated as the primary key.

\dfn{Primary Key}{A specific attribute or minimal set of attributes chosen to uniquely identify each tuple in a relation, often indicated in a schema by underlining the attributes.}

\nt{Identifying a primary key is essential for establishing relationships between different tables and maintaining data accuracy.}
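The uniqueness condition can be tested on a given instance by projecting each tuple onto the candidate attributes and looking for repeated projections. The function \texttt{is\_key} and the sample data below are hypothetical, and the check covers only the one instance it is given, whereas a declared key is meant to hold for every valid instance.

```python
def is_key(records, key_attributes):
    """True if no two records agree on all attributes in key_attributes."""
    seen = set()
    for r in records:
        projection = tuple(r[a] for a in key_attributes)
        if projection in seen:
            return False
        seen.add(projection)
    return True


# Hypothetical instance.
people = [
    {"id": 1, "name": "Ada", "year": 1990},
    {"id": 2, "name": "Bob", "year": 1990},
    {"id": 3, "name": "Ada", "year": 2000},
]
```

On this instance, \texttt{id} alone identifies each tuple, while \texttt{name} alone and \texttt{year} alone do not; the pair of \texttt{name} and \texttt{year} together is again unique.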

\section{Relation Instances and Temporal Change}

A relation is not a static object; it changes over time as tuples are inserted, deleted, or updated. The set of tuples present in a relation at any given moment is called an instance. Standard database systems typically maintain only the ``current instance,'' representing the data as it exists right now. Changing a schema (adding or deleting columns) is usually a more significant and expensive operation than changing an instance, because it can require restructuring every tuple currently stored.

\dfn{Relation Instance}{The specific set of tuples contained within a relation at a given point in time.}
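The difference in cost between the two kinds of change can be seen in a small in-memory sketch. The data, the helper names, and the default value are hypothetical, and real database engines may implement schema changes quite differently.

```python
# Hypothetical current instance of a relation with attributes name and year.
instance = [
    {"name": "Ada", "year": 1990},
    {"name": "Bob", "year": 2000},
]


def insert(records, new_record):
    """Instance change: return a new instance with one more tuple."""
    return records + [new_record]


def add_attribute(records, attribute, default):
    """Schema change: every stored tuple receives the new attribute."""
    return [{**r, attribute: default} for r in records]


after_insert = insert(instance, {"name": "Cy", "year": 2010})
after_alter = add_attribute(after_insert, "city", "unknown")
```

The insertion adds a single tuple, whereas adding the attribute rewrites every tuple in the instance.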

\section{Alternative Storage Semantics}

While the classical relational model is based on set semantics, where duplicate tuples are strictly forbidden, practical implementations often utilize different semantics based on the needs of the system.

\begin{enumerate}
	\item \textbf{Set Semantics:} No duplicate records are allowed.
	\item \textbf{Bag Semantics:} Duplicate records are permitted. This is common in SQL query results, because eliminating duplicates typically requires sorting or hashing the whole result, which is computationally expensive.
	\item \textbf{List Semantics:} The specific order of the records is preserved and significant.
\end{enumerate}

\thm{Bag Semantics}{A variation of the relational model where duplicate tuples are allowed to exist within a relation, often used to improve the efficiency of query operations.}

\nt{The choice between set, bag, and list semantics is often a trade-off between mathematical purity and the performance requirements of a real-world database engine.}
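The three semantics correspond closely to three standard Python containers, which gives a quick way to compare them. The rows below are hypothetical; \texttt{collections.Counter} is used here as a stand-in for a bag.

```python
from collections import Counter

# Hypothetical rows; the first two are duplicates of each other.
rows = [("Ada", 1990), ("Ada", 1990), ("Bob", 2000)]

as_list = list(rows)    # list semantics: order and duplicates preserved
as_bag = Counter(rows)  # bag semantics: duplicates counted, order ignored
as_set = set(rows)      # set semantics: duplicates removed, order ignored

reordered = list(reversed(rows))
```

Reordering the rows changes the list but neither the bag nor the set, and only the set discards the duplicate.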

\section{Conclusion}

The relational model's power lies in its simplicity and its firm mathematical grounding. By treating data as a collection of relations and providing a clear set of integrity rules, it allows for the creation of robust, scalable information systems. The use of schemas provides a stable contract for applications, while the principle of data independence ensures that the system can evolve technologically without breaking the logical structures that users depend on.

\nt{The relational model effectively acts as the ``physics'' of data, providing the laws that govern how digital information is structured and transformed.}