\chapter{Database Architecture}
Database architecture serves as the structural foundation that bridges high-level data models with the physical realities of computer hardware. It encompasses a spectrum of components ranging from the multi-layered memory hierarchy used to store bits and bytes to the logical abstractions like virtual views that provide tailored interfaces for users. A central goal of this architecture is to maintain data independence, allowing the underlying storage methods to change without affecting how application programs interact with the data. This summary explores the mechanics of physical storage, the implementation of virtual relations, and the optimization techniques involving index structures to ensure efficient data retrieval and system resilience.
The management of transactions is the core mechanism that ensures a database remains reliable and consistent despite concurrent access and system failures. A transaction is defined as a logical unit of work, consisting of one or more database operations that must be executed as an indivisible whole. This chapter explores the multi-tier architecture that supports these operations, the physical storage layer that provides data independence, and the sophisticated logging and concurrency control protocols used to maintain the ACID properties. We investigate how the system handles crashes through undo and redo logging, how schedulers prevent interference between users through locking and timestamping, and how complex, long-running processes are managed through the use of sagas and compensating transactions.
\section{The Storage and Memory Hierarchy}
Behind the daily operations of a database system lies a sophisticated hardware hierarchy designed to manage the trade-off between access speed and storage capacity. At the fastest and most expensive level is the processor's internal cache (Levels 1 and 2), providing almost instantaneous access to data measured in nanoseconds. Below this is the main memory, or RAM, which acts as the primary workspace for the Database Management System (DBMS). While RAM provides significant capacity, it is volatile, meaning its contents are lost if power is interrupted. This volatility is a critical consideration for the ACID properties of transactions, specifically durability.
\section{The Architectural Context of Transactions}
Secondary storage, primarily consisting of magnetic disks, serves as the non-volatile repository where data persists. Accessing disks involves mechanical movements, introducing latencies in the millisecond range, which is orders of magnitude slower than RAM. For massive data sets that exceed disk capacity, tertiary storage such as tapes or DVDs is utilized, offering enormous capacity (terabyte to petabyte range) at the cost of significantly longer access times.
Modern database systems are typically deployed in a three-tier architecture to separate user interaction from business logic and data persistence.
\dfn{Memory Hierarchy}{A systematic organization of storage devices in a computer system, ranked by speed, capacity, and cost per bit, typically including cache, main memory, and secondary/tertiary storage.}
\dfn{Three-Tier Architecture}{A system organization consisting of three distinct layers: the Web Server tier for managing client interactions, the Application Server tier for executing business logic and generating queries, and the Database Server tier for managing data storage and transaction execution.}
\thm{The Dominance of I/O}{The performance of a database system is largely determined by the number of disk I/O operations performed, as the time required to access a block on disk is significantly greater than the time needed to process data in main memory.}
The database tier is designed to provide data independence, allowing users to query data without needing to understand the underlying physical storage mechanics. Behind the scenes, the DBMS manages a complex hierarchy of hardware, moving data between volatile main memory (RAM) and nonvolatile storage (Disk).
\section{Physical Data Representation and PGDATA}
The physical storage of a database is typically organized into a specific directory structure on the host machine, often referred to as the data directory or PGDATA. This directory contains the actual files representing tables, configuration parameters, and transaction logs. To manage this data effectively, the DBMS divides information into blocks or pages, which are the fundamental units of transfer between disk and memory. In systems like PostgreSQL, these pages are usually 8KB in size.
\nt{Data independence is a fundamental principle established by Edgar Codd. It ensures that the logical representation of data in tables is decoupled from the physical directories and files on disk, such as the PGDATA directory in PostgreSQL.}
Data within these pages is organized into records. Fixed-length records are straightforward to manage, but variable-length fields require more complex structures like offset tables within the block header. When fields become exceptionally large, such as multi-gigabyte video files or documents, techniques like TOAST (The Oversized-Attribute Storage Technique) are employed to store these values in separate chunks, preventing them from bloating the primary data pages.
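To make the offset-table idea concrete, the following sketch models a simplified slotted page in Python. The class name, the header layout, and the 8KB page size are illustrative assumptions rather than the format of any particular DBMS; real block headers also track free space, checksums, and tuple visibility information.

\begin{verbatim}
# A toy slotted page: variable-length records are packed from the end of
# the block while an offset table (the "slots") grows from the front.
PAGE_SIZE = 8192  # bytes, mirroring the 8KB pages described above

class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []            # offset table: (start, length) per record
        self.free_end = PAGE_SIZE  # records are written backwards from here

    def insert(self, record: bytes) -> int:
        """Store a variable-length record; return its slot number."""
        # Crude free-space check: assume each slot entry costs 4 bytes.
        if self.free_end - len(record) < 4 * (len(self.slots) + 1):
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        start, length = self.slots[slot]
        return bytes(self.data[start:start + length])

page = SlottedPage()
rid = page.insert(b"Casablanca|1942|102 min")
print(page.read(rid))   # b'Casablanca|1942|102 min'
\end{verbatim}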
\section{Physical Storage and Data Movement}
\section{Virtual Views and Data Abstraction}
Virtual views are relations that do not exist physically in the database but are instead defined by a stored query over base tables. These views provide a layer of abstraction, allowing different users to see the data in formats that suit their specific needs without duplicating the underlying information. When a user queries a view, the system's query processor substitutes the view name with its corresponding definition, effectively transforming the query into one that operates directly on the stored base tables.
The unit of interaction between the disk and main memory is not the individual record, but the block or page. In many systems, such as PostgreSQL, these blocks are typically 8 KB in size.
\dfn{Virtual View}{A relation that is not stored in the database but is defined by an expression that constructs it from other relations whenever it is needed.}
\dfn{Database Element}{A unit of data that can be accessed or modified by a transaction. While elements can be tuples or relations, they are most effectively treated as disk blocks to ensure atomic writes to nonvolatile storage.}
Attributes within a view can be renamed for clarity using the AS keyword or by listing names in the declaration. This allows the architect to present a clean logical model to the end-user while hiding the complexity of the underlying join operations or attribute names.
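As a small runnable illustration of view definition and attribute renaming, the sketch below uses Python's built-in \texttt{sqlite3} module; the \texttt{Movies} table and its columns are invented for the example, and the exact rewriting performed by the query processor varies between systems.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies(title TEXT, year INTEGER, studio TEXT)")
conn.execute("INSERT INTO Movies VALUES ('Casablanca', 1942, 'WB')")

# A virtual view over the base table, with attributes renamed via AS.
conn.execute("""
    CREATE VIEW RecentFilms AS
    SELECT title AS film, year AS released
    FROM Movies
    WHERE year > 1940
""")

# Querying the view: the engine substitutes the stored definition,
# so this ultimately runs against the Movies base table.
print(conn.execute("SELECT film, released FROM RecentFilms").fetchall())
# [('Casablanca', 1942)]
\end{verbatim}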
\thm{The I/O Model of Computation}{The primary cost of database operations is measured by the number of disk I/O actions. Because accessing a disk is orders of magnitude slower than CPU cycles, efficiency is achieved by minimizing the transfer of blocks between the disk and memory buffers.}
\section{Modification of Virtual Relations}
Modifying a view is inherently more complex than modifying a base table because the system must determine how to map changes to the underlying physical data. SQL allows for "updatable views" under specific conditions: the view must be defined over a single relation, it cannot use aggregation or duplicate elimination, and the selection criteria must be simple enough that the system can unambiguously identify which base tuples are affected.
When a record is too large to fit within a standard page, such as a large text object or binary field, the system employs specialized techniques like TOAST (The Oversized-Attribute Storage Technique), which slices the value into chunks and stores them in a separate table. Physically, the pages of a table are grouped into larger files on disk called segments, which in PostgreSQL are capped at 1 GB each.
\dfn{Updatable View}{A virtual view that is sufficiently simple to allow insertions, deletions, and updates to be translated directly into equivalent modifications on the underlying base relation.}
\section{The ACID Properties of Transactions}
For more complex views that involve multiple tables or aggregations, "instead-of" triggers provide a solution. These triggers intercept a modification attempt on a view and execute a custom piece of logic—written by the database designer—that appropriately updates the base tables.
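A minimal runnable version of this pattern, again with Python's \texttt{sqlite3} module, is sketched below; the \texttt{ParamountMovies} view and the trigger body are hypothetical, and a production trigger would typically update several base tables.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies(title TEXT, year INTEGER, studio TEXT)")
conn.execute("""
    CREATE VIEW ParamountMovies AS
    SELECT title, year FROM Movies WHERE studio = 'Paramount'
""")

# The view hides the studio column, so a plain INSERT on it is ambiguous.
# An instead-of trigger supplies the logic against the base table.
conn.execute("""
    CREATE TRIGGER ParamountInsert
    INSTEAD OF INSERT ON ParamountMovies
    BEGIN
        INSERT INTO Movies(title, year, studio)
        VALUES (NEW.title, NEW.year, 'Paramount');
    END
""")

conn.execute("INSERT INTO ParamountMovies VALUES ('Star Trek', 1979)")
print(conn.execute("SELECT * FROM Movies").fetchall())
# [('Star Trek', 1979, 'Paramount')]
\end{verbatim}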
To ensure the integrity of the database, every transaction must satisfy the ACID test. These properties guarantee that the database remains in a consistent state even if a program is interrupted or multiple users attempt to modify the same record.
\section{Index Structures and Motivation}
As database relations grow, scanning every block to find specific information becomes prohibitively slow. Indexes are specialized data structures designed to accelerate this process. An index takes the value of a specific attribute, known as the search key, and provides pointers directly to the records containing that value.
\thm{ACID Properties}{
\begin{itemize}
\item \textbf{Atomicity:} The "all-or-nothing" execution of transactions. If any part fails, the entire unit is rolled back.
\item \textbf{Consistency:} Every transaction must move the database from one valid state to another, satisfying all structural and business constraints.
\item \textbf{Isolation:} Each transaction must appear to execute as if no other transaction were occurring simultaneously.
\item \textbf{Durability:} Once a transaction is committed, its effects must persist permanently, surviving any subsequent system crash.
\end{itemize}}
\dfn{Index}{A stored data structure that facilitates the efficient retrieval of records in a relation based on the values of one or more attributes.}
\section{Undo Logging and Recovery}
The primary motivation for indexing is the reduction of disk I/O. For example, finding a specific movie in a massive relation is much faster if the system can use an index on the title rather than performing a full table scan. In joins, indexes on the join attributes can allow the system to look up only the relevant matching tuples, avoiding the exhaustive pairing of every row from both relations.
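A quick way to see the effect of an index on the search key is to contrast a full scan with a keyed lookup, as in the hedged sketch below; the in-memory dictionary stands in for a disk-based structure such as a B-tree, so only the access pattern, not the I/O cost, is faithful.

\begin{verbatim}
# Toy relation: a list of (title, year) tuples standing in for disk blocks.
movies = [(f"Movie {i}", 1900 + i % 120) for i in range(1_000_000)]

# Full "table scan": every tuple is examined.
scan_hits = [m for m in movies if m[0] == "Movie 424242"]

# Index on the search key title: a single lookup replaces the scan.
title_index = {title: pos for pos, (title, _) in enumerate(movies)}
indexed_hit = movies[title_index["Movie 424242"]]

print(scan_hits[0] == indexed_hit)  # True, but the scan touched 1,000,000 rows
\end{verbatim}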
Logging is the primary method for achieving durability and atomicity. The log is an append-only file that records every important change to the database.
\section{Strategic Selection of Indexes}
While indexes speed up queries, they impose a cost: every time a record is inserted, deleted, or updated, the associated indexes must also be modified. This creates a strategic trade-off for the database architect. A clustering index, where the physical order of records matches the index order, is exceptionally efficient for range queries as it minimizes the number of blocks that must be read. Non-clustering indexes are useful for locating individual records but may require many disk accesses if many rows match the search key, as the records might be scattered across different blocks.
\dfn{Undo Logging}{A logging method where only the old values of modified data elements are recorded. It is designed to allow the recovery manager to cancel the effects of uncommitted transactions by restoring data to its previous state.}
\thm{Index Cost-Benefit Analysis}{The decision to create an index depends on the ratio of queries to modifications; an index is beneficial if the time saved during data retrieval exceeds the additional time required to maintain the index during updates.}
For undo logging to function correctly, two specific rules must be followed:
\begin{enumerate}
    \item Every update record (containing the old value) must be written to the disk before the modified data element itself reaches the disk.
    \item The commit record must be written to the disk only after all modified data elements have been successfully flushed to the disk.
\end{enumerate}
Architects often use a cost model based on the number of disk I/O's to evaluate the utility of a proposed index. This model considers factors like the number of tuples (T), the number of blocks (B), and the number of distinct values for an attribute (V).
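As a back-of-the-envelope use of this model, suppose a relation has $T = 1{,}000{,}000$ tuples in $B = 100{,}000$ blocks, and an attribute with $V = 10{,}000$ distinct values. The sketch below runs the usual textbook approximation for a selection on that attribute; the numbers are invented for illustration, and the handful of I/Os needed to traverse the index itself is ignored.

\begin{verbatim}
# Illustrative figures only; they are not drawn from the notes above.
T = 1_000_000   # tuples in the relation
B = 100_000     # blocks holding those tuples
V = 10_000      # distinct values of the indexed attribute

full_scan = B                 # read every block of the relation
clustering_index = B / V      # matching tuples are packed into few blocks
nonclustering_index = T / V   # each matching tuple may sit on its own block

print(f"full scan:            {full_scan} block reads")
print(f"clustering index:     {clustering_index:.0f} block reads")
print(f"non-clustering index: {nonclustering_index:.0f} block reads")
\end{verbatim}

The gap between the last two lines of output is exactly the clustering effect discussed above.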
\nt{In undo logging, the order of writes to disk is: Log record $\to$ Data element $\to$ Commit record. This ensures that if a crash occurs before the commit, we always have the old value available to undo the change.}
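One way to internalize the two rules is to simulate recovery from the log alone. The sketch below is a deliberately simplified model: the log is a Python list, a dictionary stands in for the disk, and update records follow the $\langle T, X, v \rangle$ shape, holding the old value $v$.

\begin{verbatim}
# Simplified undo-log recovery: scan the log backwards and restore old
# values for every transaction that has no COMMIT record.
disk = {"A": 5, "B": 20}        # state of the data elements after the crash

log = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 1),   # <T1, A, 1>: old value of A was 1
    ("START", "T2"),
    ("UPDATE", "T2", "B", 10),  # <T2, B, 10>: old value of B was 10
    ("COMMIT", "T1"),
    # crash: T2 never committed, so its change to B must be undone
]

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

for rec in reversed(log):
    if rec[0] == "UPDATE" and rec[1] not in committed:
        _, txn, element, old_value = rec
        disk[element] = old_value           # undo the uncommitted write

print(disk)   # {'A': 5, 'B': 10}: T1's update kept, T2's update undone
\end{verbatim}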
\section{Historical Foundations of Relational Theory}
The modern concept of database architecture and data independence traces back to the seminal work of Edgar Codd in 1970. Codd's introduction of the relational model revolutionized the field by suggesting that data should be viewed as sets of tuples in tables, independent of their physical storage. This shift allowed for the development of high-level query languages like SQL and sophisticated optimization techniques that define the current state of database management systems. Subsequent research into integrity checking and semistructured data models like XML continues to build upon these relational foundations.
\section{Redo Logging and the Write-Ahead Rule}
\dfn{Data Independence}{The principle that the logical structure of data (the schema) should be separated from its physical storage, ensuring that changes to the storage method do not require changes to application programs.}
While undo logging requires immediate data flushes, redo logging offers more flexibility by recording only the new values of data elements.
\dfn{Redo Logging}{A logging method that records the new values of database elements. On recovery, the system repeats the changes of committed transactions and ignores those that did not commit.}
\thm{Write-Ahead Logging (WAL) Rule}{In redo logging, all log records pertaining to a modification, including the update record and the commit record, must appear on disk before the modified data element itself is written to disk.}
The order of operations for redo logging is: Log record $\to$ Commit record $\to$ Data element. This allows the system to keep changed data in memory buffers longer, potentially reducing disk I/O, as the log provides a way to "redo" the work if memory is lost.
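Mirroring the undo sketch given earlier, the toy recovery pass below replays a redo log forwards, applying only the new values written by committed transactions; it makes the same simplifying assumptions (a dictionary as the disk, tuples as log records).

\begin{verbatim}
# Simplified redo-log recovery: scan forwards and reapply the new values
# of committed transactions; uncommitted transactions are ignored.
disk = {"A": 1, "B": 10}        # buffers were lost, so the disk may be stale

log = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 5),   # <T1, A, 5>: new value of A is 5
    ("COMMIT", "T1"),           # commit record hit disk before the data did
    ("START", "T2"),
    ("UPDATE", "T2", "B", 20),  # T2 never committed
    # crash
]

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

for rec in log:
    if rec[0] == "UPDATE" and rec[1] in committed:
        _, txn, element, new_value = rec
        disk[element] = new_value           # redo the committed write

print(disk)   # {'A': 5, 'B': 10}: T1 redone, T2's update never applied
\end{verbatim}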
\section{Undo/Redo Logging and Checkpointing}
A hybrid approach, undo/redo logging, records both the old and new values of a database element ($\langle T, X, v, w \rangle$). This provides the highest level of flexibility, as the commit record can be written either before or after the data elements are flushed to disk.
\nt{Undo/redo logging is the most common method in modern DBMSs because it allows the buffer manager to be more efficient. It only requires that the log record for a change reach the disk before the change itself does.}
To avoid scanning the entire log during recovery, the system uses checkpointing.
\dfn{Nonquiescent Checkpointing}{A technique that allows the system to mark a "safe" point in the log without shutting down the database. It records the set of active transactions and ensures that all data changed by previously committed transactions has reached the disk.}
\section{Concurrency Control and Serializability}
When multiple transactions run at once, their actions form a schedule. The goal of the scheduler is to ensure that this schedule is serializable.
\dfn{Conflict-Serializable Schedule}{A schedule that can be transformed into a serial schedule (where transactions run one after another) by a sequence of swaps of adjacent, non-conflicting actions.}
\thm{The Precedence Graph Test}{A schedule is conflict-serializable if and only if its precedence graph is acyclic. The nodes of the graph are the transactions, and there is an arc from $T$ to $U$ whenever an action of $T$ conflicts with and precedes an action of $U$; two actions conflict if they belong to different transactions, access the same element, and at least one of them is a write.}
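The test is easy to mechanize. The sketch below builds the precedence graph from a schedule encoded as (transaction, action, element) triples and checks for a cycle; the encoding and the tiny example schedules are assumptions made for the illustration.

\begin{verbatim}
# Conflict-serializability check via the precedence graph.
# A schedule is a list of (transaction, op, element) with op in {"r", "w"}.

def conflict_serializable(schedule):
    arcs = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "w" in (op_i, op_j):
                arcs.add((ti, tj))   # Ti's action precedes and conflicts

    def reaches(start, goal, seen):
        # Depth-first search over the arc set.
        for u, v in arcs:
            if u == start and v not in seen:
                if v == goal or reaches(v, goal, seen | {v}):
                    return True
        return False

    # Serializable iff no transaction can reach itself (graph is acyclic).
    return not any(reaches(t, t, set()) for t, _, _ in schedule)

ok  = [("T1","r","A"), ("T1","w","A"), ("T2","r","A"), ("T2","w","A")]
bad = [("T1","r","A"), ("T2","w","A"), ("T1","w","A")]
print(conflict_serializable(ok), conflict_serializable(bad))  # True False
\end{verbatim}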
\section{Lock-Based Schedulers and Two-Phase Locking}
The most common way to enforce serializability is through the use of locks. Before a transaction can access a database element, it must obtain a lock on that element from the scheduler's lock table.
\dfn{Shared and Exclusive Locks}{A Shared (S) lock allows multiple transactions to read an element simultaneously. An Exclusive (X) lock is required for writing and prevents any other transaction from reading or writing that element.}
Simply taking locks is insufficient; the timing of when locks are released is vital for maintaining a consistent state.
\thm{Two-Phase Locking (2PL)}{A protocol requiring that in every transaction, all lock acquisitions must precede all lock releases. This creates a "growing phase" where locks are gathered and a "shrinking phase" where they are surrendered.}
\nt{Strict Two-Phase Locking is a variation where a transaction holds all its exclusive locks until it commits or aborts. This prevents other transactions from reading "dirty data"—values written by uncommitted transactions—and eliminates the need for cascading rollbacks.}
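To make the lock table concrete, here is a small sketch of the scheduler component that grants shared and exclusive locks; the class is an illustrative assumption and omits everything a real scheduler also needs, such as queues of waiting transactions, lock upgrades under contention, and the deadlock handling discussed next.

\begin{verbatim}
# Minimal lock table with shared (S) and exclusive (X) locks.
# Compatibility: S is compatible with S; X is compatible with nothing.

class LockTable:
    def __init__(self):
        self.locks = {}   # element -> (mode, set of holding transactions)

    def request(self, txn, element, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if element not in self.locks:
            self.locks[element] = (mode, {txn})
            return True
        held_mode, holders = self.locks[element]
        if holders == {txn}:               # sole holder: re-grant or upgrade
            self.locks[element] = ("X" if mode == "X" else held_mode, holders)
            return True
        if held_mode == "S" and mode == "S":
            holders.add(txn)               # shared readers coexist
            return True
        return False                       # conflict: the caller must block

    def release_all(self, txn):
        """Strict 2PL style: drop every lock only at commit or abort."""
        for element in list(self.locks):
            _, holders = self.locks[element]
            holders.discard(txn)
            if not holders:
                del self.locks[element]

table = LockTable()
print(table.request("T1", "A", "S"))   # True
print(table.request("T2", "A", "S"))   # True: shared locks are compatible
print(table.request("T2", "A", "X"))   # False: T1 is still reading A
table.release_all("T1")
print(table.request("T2", "A", "X"))   # True once T1 has released
\end{verbatim}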
\section{Deadlock Management}
Locking systems are inherently prone to deadlocks, where transactions are stuck in a cycle of waiting for one another. Schedulers must implement strategies to detect or prevent these states.
\dfn{Waits-For Graph}{A directed graph used for deadlock detection. Nodes represent transactions, and an arc from $T$ to $U$ indicates that $T$ is waiting for a lock currently held by $U$. A cycle in this graph indicates a deadlock.}
Prevention strategies often involve timestamps. Two popular methods, compared in the short sketch that follows the list, are:
\begin{itemize}
\item \textbf{Wait-Die:} If an older transaction needs a lock held by a newer one, it waits. If a newer transaction needs a lock held by an older one, it dies (rolls back).
\item \textbf{Wound-Wait:} An older transaction "wounds" (forces a rollback) a newer transaction that holds a lock it needs. A newer transaction must wait for an older one.
\end{itemize}
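Both schemes reduce to a timestamp comparison at the moment a lock request conflicts with a held lock. The sketch below encodes the two decisions side by side; representing timestamps as plain integers (smaller means older) is an assumption of the example.

\begin{verbatim}
# Deadlock prevention by timestamp comparison (smaller = older).

def wait_die(requester_ts, holder_ts):
    """Older requesters wait; younger requesters die (roll back)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requesters wound (roll back) the holder; younger ones wait."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# T5 (older) requests a lock held by T9 (younger), and the reverse.
print(wait_die(5, 9), "/", wait_die(9, 5))        # wait / die
print(wound_wait(5, 9), "/", wound_wait(9, 5))    # wound holder / wait
\end{verbatim}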
\section{Alternative Concurrency Control: Timestamps and Validation}
Beyond locking, systems may use optimistic concurrency control methods, which are particularly effective when conflicts are rare.
\dfn{Timestamp-Based Scheduling}{A method where each transaction is assigned a unique timestamp when it begins. The scheduler maintains read and write times for every database element and rolls back any transaction that attempts to perform a "physically unrealizable" action, such as reading a value written in its future.}
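The "physically unrealizable" condition boils down to a pair of checks against the stored read and write times. Below is a minimal sketch of those checks, with dictionaries standing in for the scheduler's bookkeeping and with commit handling and refinements such as the Thomas write rule deliberately left out.

\begin{verbatim}
# Timestamp checks; RT/WT hold each element's read time and write time.
RT, WT = {}, {}

def read(txn_ts, element):
    if WT.get(element, 0) > txn_ts:       # value was written in txn's future
        return "rollback"
    RT[element] = max(RT.get(element, 0), txn_ts)
    return "ok"

def write(txn_ts, element):
    if RT.get(element, 0) > txn_ts or WT.get(element, 0) > txn_ts:
        return "rollback"                 # a later transaction got there first
    WT[element] = txn_ts
    return "ok"

print(write(10, "A"))   # ok: T10 writes A, so WT(A) = 10
print(read(5, "A"))     # rollback: T5 would read a value from its future
\end{verbatim}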
\dfn{Validation-Based Scheduling}{An optimistic approach where transactions execute in a private workspace. Before committing, the transaction enters a validation phase where the scheduler checks its read and write sets against those of other active transactions to ensure no serializability violations occurred.}
\section{Long-Duration Transactions and Sagas}
In environments like design systems or workflow management, transactions can last for hours or even days. Holding locks for such durations would paralyze the system.
\dfn{Saga}{A long-duration transaction consisting of a sequence of smaller, independent actions. Each action is its own transaction that commits immediately.}
\thm{Compensating Transactions}{For every action $A$ in a saga, there must be a corresponding compensating transaction $A^{-1}$ that logically undoes the effects of $A$. If the saga must abort, the system executes the compensating transactions in reverse order to return the database to a consistent state.}
\nt{A saga does not strictly follow the traditional "Isolation" property of ACID, as the results of its intermediate actions are visible to other transactions. However, through the use of compensation, it maintains the logical consistency of the system.}
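As a closing sketch, the loop below runs a saga's actions in order and, when one fails, executes the compensations of the already-completed actions in reverse; the travel-booking steps and the failure are invented for the example.

\begin{verbatim}
# A saga as a list of (action, compensation) pairs. Each action commits on
# its own; if one fails, earlier actions are compensated in reverse order.

def run_saga(steps):
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception as exc:
            print(f"step failed ({exc}); compensating")
            for undo in reversed(completed):
                undo()
            return False
    return True

def charge_card():
    raise RuntimeError("card declined")

booking = [
    (lambda: print("reserve flight"), lambda: print("cancel flight")),
    (lambda: print("reserve hotel"),  lambda: print("cancel hotel")),
    (charge_card,                     lambda: print("refund card")),
]

run_saga(booking)
# reserve flight, reserve hotel, step failed (card declined); compensating,
# cancel hotel, cancel flight
\end{verbatim}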
In conclusion, transaction management requires a delicate balance between performance and correctness. By combining robust logging for durability, strict locking for isolation, and innovative structures like sagas for long-term processes, modern database systems provide a stable foundation for complex information ecosystems. These mechanisms ensure that even in the event of hardware failure or intense concurrent demand, the integrity of the data remains unassailable.
To visualize a transaction, think of it as a set of instructions for a complicated recipe. If you get halfway through and realize you are missing a vital ingredient, you cannot just stop and leave the half-mixed dough on the counter. You must either finish the recipe or clean up the mess so the kitchen is exactly as it was before you started. The database scheduler and log manager are like the head chef, ensuring that every cook has the tools they need and that no one's flour ends up in someone else's soup.