\chapter{Database Architecture}
The management of transactions is the core mechanism that ensures a database remains reliable and consistent despite concurrent access and system failures. A transaction is defined as a logical unit of work, consisting of one or more database operations that must be executed as an indivisible whole. This chapter explores the multi-tier architecture that supports these operations, the physical storage layer that provides data independence, and the sophisticated logging and concurrency control protocols used to maintain the ACID properties. We investigate how the system handles crashes through undo and redo logging, how schedulers prevent interference between users through locking and timestamping, and how complex, long-running processes are managed through the use of sagas and compensating transactions.
\section{The Architectural Context of Transactions}
Modern database systems are typically deployed in a three-tier architecture to separate user interaction from business logic and data persistence.
\dfn{Three-Tier Architecture}{A system organization consisting of three distinct layers: the Web Server tier for managing client interactions, the Application Server tier for executing business logic and generating queries, and the Database Server tier for managing data storage and transaction execution.}
The database tier is designed to provide data independence, allowing users to query data without needing to understand the underlying physical storage mechanics. Behind the scenes, the DBMS manages a complex hierarchy of hardware, moving data between volatile main memory (RAM) and nonvolatile storage (Disk).
\nt{Data independence is a fundamental principle established by Edgar Codd. It ensures that the logical representation of data in tables is decoupled from the physical directories and files on disk, such as the \texttt{pg\_data} directory in PostgreSQL.}
\section{Physical Storage and Data Movement}
The unit of interaction between the disk and main memory is not the individual record, but the block or page. In many systems, such as PostgreSQL, these blocks are typically 8 KB in size.
\dfn{Database Element}{A unit of data that can be accessed or modified by a transaction. While elements can be tuples or relations, they are most effectively treated as disk blocks to ensure atomic writes to nonvolatile storage.}
\thm{The I/O Model of Computation}{The primary cost of database operations is measured by the number of disk I/O actions. Because accessing a disk is orders of magnitude slower than CPU cycles, efficiency is achieved by minimizing the transfer of blocks between the disk and memory buffers.}
When a record, such as a large text or binary value, is too large to fit within a standard page, the system employs specialized techniques like TOAST (The Oversized-Attribute Storage Technique), which slices the value into chunks and stores them in a separate TOAST table. Physically, the pages of a table are grouped into larger files on disk; in PostgreSQL these segment files are capped at 1 GB each.
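To make the block-oriented I/O model concrete, the following sketch reads a hypothetical heap file one 8 KB page at a time and counts the pages touched. The file path and the idea of decoding records inside each page are illustrative assumptions, not the actual on-disk format of any particular DBMS.

\begin{verbatim}
PAGE_SIZE = 8 * 1024  # 8 KiB pages, as in PostgreSQL's default layout

def scan_pages(path):
    """Read a file page by page, mimicking block-at-a-time transfer
    between disk and the buffer pool. Returns the number of page I/Os."""
    ios = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(PAGE_SIZE)   # one disk I/O in the cost model
            if not page:
                break
            ios += 1
            # ... records inside `page` would be decoded here ...
    return ios

# Example: scanning a full 1 GiB segment file costs 131072 page reads.
# print(scan_pages("/path/to/segment"))
\end{verbatim}

Counting page reads rather than records is exactly the accounting the I/O model of computation prescribes.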
\section{The ACID Properties of Transactions}
To ensure the integrity of the database, every transaction must satisfy the ACID properties. These guarantees keep the database in a consistent state even if a program is interrupted mid-transaction or multiple users attempt to modify the same record at once.
\thm{ACID Properties}{
\begin{itemize}
    \item \textbf{Atomicity:} The ``all-or-nothing'' execution of transactions. If any part fails, the entire unit is rolled back.
    \item \textbf{Consistency:} Every transaction must move the database from one valid state to another, satisfying all structural and business constraints.
    \item \textbf{Isolation:} Each transaction must appear to execute as if no other transaction were running simultaneously.
    \item \textbf{Durability:} Once a transaction is committed, its effects must persist permanently, surviving any subsequent system crash.
\end{itemize}}
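As a rough illustration of atomicity (not how a real engine implements it), the sketch below applies a batch of balance updates to a private copy of the data and installs the copy only if every update succeeds. The account names and the \texttt{transfer} function are invented for the example.

\begin{verbatim}
def transfer(accounts, moves):
    """Apply all moves or none: work on a copy, install it only on success."""
    working = dict(accounts)            # private workspace
    for src, dst, amount in moves:
        if working[src] < amount:       # constraint check (consistency)
            raise ValueError("insufficient funds, transaction aborted")
        working[src] -= amount
        working[dst] += amount
    accounts.clear()
    accounts.update(working)            # install the whole batch at once

accounts = {"A": 100, "B": 20}
transfer(accounts, [("A", "B", 30), ("B", "A", 5)])
print(accounts)                         # {'A': 75, 'B': 45}
\end{verbatim}

If any move raises an exception, the shared dictionary is never touched, so other readers see either the old state or the complete new one.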
\section{Undo Logging and Recovery}
Logging is the primary method for achieving durability and atomicity. The log is an append-only file that records every important change to the database.
\dfn{Undo Logging}{A logging method where only the old values of modified data elements are recorded. It is designed to allow the recovery manager to cancel the effects of uncommitted transactions by restoring data to its previous state.}
For undo logging to function correctly, two rules must be followed:
\begin{enumerate}
    \item Every update record (containing the old value) must be written to disk before the modified data element itself reaches the disk.
    \item The commit record must be written to disk only after all modified data elements have been successfully flushed to disk.
\end{enumerate}
\nt{In undo logging, the order of writes to disk is: Log record $\to$ Data element $\to$ Commit record. This ensures that if a crash occurs before the commit, we always have the old value available to undo the change.}
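The recovery side of undo logging can be sketched in a few lines: scan the log backwards and restore the old value for every write whose transaction has no commit record. The log is modeled here as a simple list of tuples, which is an assumption made for illustration rather than a real log format.

\begin{verbatim}
def undo_recover(log, db):
    """log: sequence of ('START', T), ('WRITE', T, X, old), ('COMMIT', T)."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                     # newest record first
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, x, old = rec
            db[x] = old                           # undo the uncommitted change

db = {"X": 20, "Y": 99}                           # state on disk after a crash
log = [("START", "T1"), ("WRITE", "T1", "X", 10), ("COMMIT", "T1"),
       ("START", "T2"), ("WRITE", "T2", "Y", 30)]  # T2 never committed
undo_recover(log, db)
print(db)                                         # {'X': 20, 'Y': 30}
\end{verbatim}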
\section{Redo Logging and the Write-Ahead Rule}
While undo logging requires immediate data flushes, redo logging offers more flexibility by recording only the new values of data elements.
\dfn{Redo Logging}{A logging method that records the new values of database elements. On recovery, the system repeats the changes of committed transactions and ignores those that did not commit.}
\thm{Write-Ahead Logging (WAL) Rule}{In redo logging, all log records pertaining to a modification, including the update record and the commit record, must appear on disk before the modified data element itself is written to disk.}
The order of operations for redo logging is: Log record $\to$ Commit record $\to$ Data element. This allows the system to keep changed data in memory buffers longer, potentially reducing disk I/O, as the log provides a way to "redo" the work if memory is lost.
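Redo recovery is the mirror image of the undo sketch above: scan the log forwards and reapply the new value for every write belonging to a committed transaction, ignoring the rest. As before, the tuple-based log is an illustrative assumption.

\begin{verbatim}
def redo_recover(log, db):
    """log: sequence of ('START', T), ('WRITE', T, X, new), ('COMMIT', T)."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                               # oldest record first
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, x, new = rec
            db[x] = new                           # repeat the committed change

db = {"X": 1, "Y": 8}                             # disk may lag behind memory
log = [("START", "T1"), ("WRITE", "T1", "X", 10), ("COMMIT", "T1"),
       ("START", "T2"), ("WRITE", "T2", "Y", 30)]  # T2 is ignored
redo_recover(log, db)
print(db)                                         # {'X': 10, 'Y': 8}
\end{verbatim}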
\section{Undo/Redo Logging and Checkpointing}
A hybrid approach, undo/redo logging, records both the old and new values of a database element in a single record $\langle T, X, v, w\rangle$. This provides the highest level of flexibility, as the commit record can be written either before or after the data elements are flushed to disk.
\nt{Undo/redo logging is the most common method in modern DBMSs because it allows the buffer manager to be more efficient. It only requires that the log record for a change reach the disk before the change itself does.}
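This single requirement can be expressed as a check inside the buffer manager: a dirty page may be flushed only after the log has been forced at least up to the last record describing that page. The LSN bookkeeping below is a simplified, ARIES-style assumption rather than the design of any specific system.

\begin{verbatim}
class Log:
    def __init__(self):
        self.records = []        # in-memory tail of the log
        self.flushed_lsn = -1    # highest LSN known to be on disk

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1          # the record's LSN

    def force(self, lsn):
        # pretend to write the log tail up to `lsn` to stable storage
        self.flushed_lsn = max(self.flushed_lsn, lsn)

class Page:
    def __init__(self):
        self.page_lsn = -1       # LSN of the last log record for this page
        self.dirty = False

def flush_page(log, page):
    """Honor the write-ahead rule: log records first, then the data page."""
    if page.page_lsn > log.flushed_lsn:
        log.force(page.page_lsn)              # force the log up to page_lsn
    # ... the page bytes would be written to disk here ...
    page.dirty = False

log, page = Log(), Page()
page.page_lsn = log.append(("T1", "X", 1, 2))  # <T, X, old, new>
page.dirty = True
flush_page(log, page)                          # log is forced before the page
\end{verbatim}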
To avoid scanning the entire log during recovery, the system uses checkpointing.
\dfn{Nonquiescent Checkpointing}{A technique that allows the system to mark a "safe" point in the log without shutting down the database. It records the set of active transactions and ensures that all data changed by previously committed transactions has reached the disk.}
\section{Concurrency Control and Serializability}
When multiple transactions run at once, their actions form a schedule. The goal of the scheduler is to ensure that this schedule is serializable.
\dfn{Conflict-Serializable Schedule}{A schedule that can be transformed into a serial schedule (where transactions run one after another) by a sequence of swaps of adjacent, non-conflicting actions.}
\thm{The Precedence Graph Test}{A schedule is conflict-serializable if and only if its precedence graph—where nodes are transactions and arcs represent conflicts—is acyclic. A conflict occurs if two transactions access the same element and at least one is a write.}
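A minimal sketch of the precedence-graph test: add an arc $T_i \to T_j$ whenever an action of $T_i$ conflicts with a later action of $T_j$ on the same element, then search the graph for a cycle. The schedule encoding, a list of (transaction, operation, element) triples, is an assumption made for the example.

\begin{verbatim}
def precedence_graph(schedule):
    """schedule: list of (txn, op, elem) with op in {'R', 'W'}."""
    arcs = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                arcs.add((t1, t2))    # t1's action precedes a conflicting one
    return arcs

def has_cycle(arcs):
    graph = {}
    for u, v in arcs:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, ()):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in graph if u not in done)

# r1(X) w2(X) w1(X): arcs T1 -> T2 and T2 -> T1, not conflict-serializable
s = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
print(has_cycle(precedence_graph(s)))          # True
\end{verbatim}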
\section{Lock-Based Schedulers and Two-Phase Locking}
The most common way to enforce serializability is through the use of locks. Before a transaction can access a database element, it must obtain a lock on that element from the scheduler's lock table.
\dfn{Shared and Exclusive Locks}{A Shared (S) lock allows multiple transactions to read an element simultaneously. An Exclusive (X) lock is required for writing and prevents any other transaction from reading or writing that element.}
Simply taking locks is insufficient; the timing of when locks are released is vital for maintaining a consistent state.
\thm{Two-Phase Locking (2PL)}{A protocol requiring that in every transaction, all lock acquisitions must precede all lock releases. This creates a "growing phase" where locks are gathered and a "shrinking phase" where they are surrendered.}
\nt{Strict Two-Phase Locking is a variation where a transaction holds all its exclusive locks until it commits or aborts. This prevents other transactions from reading "dirty data"—values written by uncommitted transactions—and eliminates the need for cascading rollbacks.}
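The heart of a lock-based scheduler is a compatibility check against the lock table. The sketch below grants shared and exclusive locks on single elements and is deliberately simplified: it does not queue waiters, upgrade locks, or enforce the two-phase discipline itself, all of which a real scheduler must add.

\begin{verbatim}
class LockTable:
    def __init__(self):
        self.locks = {}                      # element -> {txn: 'S' or 'X'}

    def request(self, txn, elem, mode):
        """Grant an S or X lock if compatible with the current holders."""
        holders = self.locks.setdefault(elem, {})
        others = {t: m for t, m in holders.items() if t != txn}
        if mode == "S":
            ok = all(m == "S" for m in others.values())
        else:                                # 'X' conflicts with any other holder
            ok = not others
        if ok:
            holders[txn] = mode
        return ok                            # False means the transaction waits

    def release_all(self, txn):
        """Strict 2PL: drop every lock at commit or abort time."""
        for holders in self.locks.values():
            holders.pop(txn, None)

lt = LockTable()
print(lt.request("T1", "A", "S"))            # True
print(lt.request("T2", "A", "S"))            # True, S locks are compatible
print(lt.request("T2", "A", "X"))            # False, T1 still holds an S lock
\end{verbatim}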
\section{Deadlock Management}
Locking systems are inherently prone to deadlocks, where transactions are stuck in a cycle of waiting for one another. Schedulers must implement strategies to detect or prevent these states.
\dfn{Waits-For Graph}{A directed graph used for deadlock detection. Nodes represent transactions, and an arc from $T$ to $U$ indicates that $T$ is waiting for a lock currently held by $U$. A cycle in this graph indicates a deadlock.}
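Deadlock detection is then a cycle search over the waits-for graph, exactly as in the precedence-graph test above. Representing the graph as an adjacency dictionary (an assumption for the example), the following sketch reports whether any set of transactions is mutually waiting.

\begin{verbatim}
def deadlocked(waits_for):
    """waits_for: dict mapping each transaction to the set it waits on."""
    visiting, done = set(), set()
    def dfs(t):
        visiting.add(t)
        for u in waits_for.get(t, ()):
            if u in visiting or (u not in done and dfs(u)):
                return True
        visiting.discard(t)
        done.add(t)
        return False
    return any(dfs(t) for t in waits_for if t not in done)

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock
print(deadlocked({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))   # True
\end{verbatim}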
Prevention strategies often rely on transaction timestamps. Two popular methods are listed below; a small sketch of the decision rule follows the list.
\begin{itemize}
    \item \textbf{Wait-Die:} If an older transaction needs a lock held by a newer one, it waits. If a newer transaction needs a lock held by an older one, it dies (rolls back).
    \item \textbf{Wound-Wait:} An older transaction ``wounds'' (forces a rollback of) a newer transaction that holds a lock it needs, while a newer transaction must wait for an older one.
\end{itemize}
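Under the usual convention that an older transaction carries a smaller timestamp, both policies reduce to a single comparison. The sketch below returns what should happen to the requesting transaction; it is an illustrative reduction of the rules, not a full scheduler.

\begin{verbatim}
def on_conflict(policy, ts_requester, ts_holder):
    """Decide the fate of a transaction requesting a lock held by another."""
    older = ts_requester < ts_holder            # smaller timestamp = older
    if policy == "wait-die":
        return "wait" if older else "die"       # young requesters roll back
    if policy == "wound-wait":
        return "wound holder" if older else "wait"  # old requesters preempt
    raise ValueError("unknown policy")

print(on_conflict("wait-die", 5, 9))    # 'wait'  (older requester waits)
print(on_conflict("wound-wait", 9, 5))  # 'wait'  (younger requester waits)
\end{verbatim}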
\section{Alternative Concurrency Control: Timestamps and Validation}
Beyond locking, systems may use optimistic concurrency control methods, which are particularly effective when conflicts are rare.
\dfn{Timestamp-Based Scheduling}{A method where each transaction is assigned a unique timestamp when it begins. The scheduler maintains read and write times for every database element and rolls back any transaction that attempts to perform a "physically unrealizable" action, such as reading a value written in its future.}
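A bare-bones version of the timestamp rules keeps a read time and a write time for every element and aborts any transaction whose request arrives ``too late''. This sketch omits commit bits and Thomas' write rule, so it is a simplification of the full protocol.

\begin{verbatim}
class TimestampScheduler:
    def __init__(self):
        self.rt = {}                      # element -> largest reader timestamp
        self.wt = {}                      # element -> timestamp of last writer

    def read(self, ts, x):
        if ts < self.wt.get(x, 0):
            return "abort"                # value was written in this txn's future
        self.rt[x] = max(self.rt.get(x, 0), ts)
        return "ok"

    def write(self, ts, x):
        if ts < self.rt.get(x, 0) or ts < self.wt.get(x, 0):
            return "abort"                # a later txn already read or wrote x
        self.wt[x] = ts
        return "ok"

s = TimestampScheduler()
print(s.write(ts=10, x="A"))    # 'ok'
print(s.read(ts=5, x="A"))      # 'abort': T5 would read from its future
\end{verbatim}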
\dfn{Validation-Based Scheduling}{An optimistic approach where transactions execute in a private workspace. Before committing, the transaction enters a validation phase where the scheduler checks its read and write sets against those of other active transactions to ensure no serializability violations occurred.}
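Backward validation can be sketched as a set-intersection test: a transaction passes only if its read set does not overlap the write set of any transaction that validated after it started. The bookkeeping shown here (start times, read and write sets) is a simplified assumption about what the scheduler records.

\begin{verbatim}
def validate(txn, committed):
    """txn: dict with 'start', 'read_set', 'write_set'.
    committed: list of dicts with 'validated' time and 'write_set'."""
    for other in committed:
        if other["validated"] > txn["start"] and \
           other["write_set"] & txn["read_set"]:
            return False                  # txn may have read a stale value
    return True

t = {"start": 3, "read_set": {"X"}, "write_set": {"Y"}}
earlier = [{"validated": 5, "write_set": {"X"}}]   # wrote X after t started
print(validate(t, earlier))               # False: t must restart
\end{verbatim}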
\section{Long-Duration Transactions and Sagas}
In environments like design systems or workflow management, transactions can last for hours or even days. Holding locks for such durations would paralyze the system.
\dfn{Saga}{A long-duration transaction consisting of a sequence of smaller, independent actions. Each action is its own transaction that commits immediately.}
\thm{Compensating Transactions}{For every action $A$ in a saga, there must be a corresponding compensating transaction $A^{-1}$ that logically undoes the effects of $A$. If the saga must abort, the system executes the compensating transactions in reverse order to return the database to a consistent state.}
\nt{A saga does not strictly follow the traditional "Isolation" property of ACID, as the results of its intermediate actions are visible to other transactions. However, through the use of compensation, it maintains the logical consistency of the system.}
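The control flow of a saga is simple enough to sketch directly: run each step, remember its compensating action, and on the first failure run the accumulated compensations in reverse order. The booking-style step names are invented for the example.

\begin{verbatim}
def run_saga(steps):
    """steps: list of (action, compensation) pairs of no-argument callables."""
    done = []
    for action, compensate in steps:
        try:
            action()                          # each step commits on its own
            done.append(compensate)
        except Exception:
            for undo in reversed(done):       # run compensations newest-first
                undo()
            return "aborted (compensated)"
    return "committed"

log = []
def fail_payment():
    raise RuntimeError("payment declined")

steps = [(lambda: log.append("reserve seat"), lambda: log.append("cancel seat")),
         (fail_payment,                       lambda: log.append("refund payment"))]
print(run_saga(steps))                        # 'aborted (compensated)'
print(log)                                    # ['reserve seat', 'cancel seat']
\end{verbatim}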
To visualize a transaction, think of it as the set of instructions for a complicated recipe. If you get halfway through and realize you are missing a vital ingredient, you cannot simply stop and leave the half-mixed dough on the counter: you must either finish the recipe or clean up so that the kitchen is exactly as it was before you started. The database scheduler and log manager act as the head chef, ensuring that every cook has the tools they need and that no one's flour ends up in someone else's soup.

In conclusion, transaction management requires a delicate balance between performance and correctness. By combining robust logging for durability, strict locking for isolation, and structures such as sagas for long-running processes, modern database systems provide a stable foundation for complex information ecosystems. These mechanisms ensure that, even in the event of hardware failure or intense concurrent demand, the integrity of the data remains intact.