
\chapter{Transactions and the Three Tiers}

Modern database systems do not operate in isolation; they are embedded within complex multi-tier architectures designed to handle thousands of concurrent users. At the heart of this ecosystem is the concept of a transaction, a logical unit of work that ensures data integrity despite system failures or overlapping user actions. To maintain this integrity, databases adhere to the ACID properties: Atomicity, Consistency, Isolation, and Durability. This chapter explores the three-tier architecture that connects users to data, the hierarchical structure of the SQL environment, and the rigorous mechanics of transaction management, including isolation levels and locking protocols such as Two-Phase Locking (2PL).

\section{The Three-Tier Architecture}

Large-scale database installations typically utilize a three-tier architecture to separate concerns and improve scalability. This organization allows different components of the system to run on dedicated hardware, optimizing performance for each specific task.

\dfn{Three-Tier Architecture}{A system organization consisting of three distinct layers: the Web Server tier for user interaction, the Application Server tier for processing logic, and the Database Server tier for data management.}

The first tier consists of \textbf{Web Servers}. These processes act as the entry point for clients, usually interacting via a web browser over the Internet. When a user enters a URL or submits a form, the browser sends an HTTP (Hypertext Transfer Protocol) request to the web server. The web server is responsible for returning an HTML page, which may include images and other data to be displayed to the user.

\nt{Common web server software includes the Apache HTTP Server and Apache Tomcat, which are frequently used in both professional and academic environments to bridge the gap between web browsers and database systems.}

The second tier is the \textbf{Application Server}, often referred to as the \textbf{Business Logic} layer. This is where the core functionality of the system resides. When the web server receives a request that requires data, it communicates with the application tier. Programmers use languages such as Java, Python, C++, or PHP to write the logic that decides how to respond to user requests. This layer is responsible for generating SQL queries, sending them to the database, and formatting the returned results into a programmatically built HTML page or another kind of response.
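
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: an in-memory SQLite database stands in for the database tier, and the \texttt{products} table, its rows, and the handler name are hypothetical.

```python
import sqlite3
from html import escape


def render_products_page(conn: sqlite3.Connection, max_price: float) -> str:
    """Application-tier handler: query the database tier, return HTML."""
    # Parameterized query: the driver binds max_price, so user input is
    # never spliced into the SQL text.
    rows = conn.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    # Escape values before placing them in markup.
    items = "".join(
        f"<li>{escape(name)}: {price:.2f}</li>" for name, price in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


# Stand-in for the database tier (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.5), ("laptop", 900.0), ("mug <large>", 7.25)],
)
page = render_products_page(conn, 10.0)
```

In a real deployment the web server would pass the request parameters to such a handler and return the resulting page to the browser; the database connection would point at a separate DBMS process rather than an in-memory file.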

The third tier is the \textbf{Database Server}. These are the processes running the Database Management System (DBMS), such as PostgreSQL or MySQL. This tier executes the queries requested by the application tier, manages data persistence on disk, and ensures that the system remains responsive through buffering and connection management.

\section{The SQL Environment}

Within the database tier, data is organized in a hierarchical framework known as the SQL environment. This structure allows for a clear namespace and organizational scope for all database elements.

\dfn{SQL Environment}{The overall framework under which database elements exist and SQL operations are executed, typically representing a specific installation of a DBMS.}

The hierarchy begins with the \textbf{Cluster}, which represents the maximum scope for a database operation: the set of all catalogs, and hence all data, accessible to a particular user. Within a cluster, data is organized into \textbf{Catalogs}. A catalog is the basic unit of naming, in that schema names must be unique within it, and it contains one or more \textbf{Schemas}.

A schema is a collection of database objects, including tables, views, triggers, and assertions. A fully qualified table name has the form \texttt{CatalogName.SchemaName.TableName}. If the catalog or schema is not explicitly specified, the system falls back to the current session's settings (for example, \texttt{public} is often the default schema).
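
The defaulting rule can be made concrete with a small helper. The function below is a hypothetical sketch, not part of any DBMS API; it assumes plain identifiers without quoted names that contain dots, and the catalog name \texttt{shop} is invented for the example.

```python
def qualify(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a possibly partial table name to Catalog.Schema.Table."""
    parts = name.split(".")
    if len(parts) == 1:        # Table only: fill in catalog and schema
        parts = [current_catalog, current_schema] + parts
    elif len(parts) == 2:      # Schema.Table: fill in the catalog
        parts = [current_catalog] + parts
    elif len(parts) != 3:      # anything else is not a table name
        raise ValueError(f"not a valid table name: {name!r}")
    return ".".join(parts)


# Session defaults are assumptions made for this example.
bare = qualify("Orders", "shop", "public")
partial = qualify("sales.Orders", "shop", "public")
full = qualify("archive.sales.Orders", "shop", "public")
```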

\thm{The Concept of Sessions}{A session is the period during which a connection between a SQL client and a SQL server is active, encompassing a sequence of operations performed under a specific authorization ID.}

\section{Fundamentals of Transactions}

A transaction is a single execution of a program or a batch of queries that must be treated as an indivisible unit. The goal of the transaction manager is to ensure that even if the system crashes or multiple users access the same record, the result remains correct.

\dfn{Transaction}{A collection of one or more database operations, such as reads and writes, that are grouped together to be executed atomically and in isolation from other concurrent actions.}

To be considered reliable, every transaction must satisfy the \textbf{ACID} test. These four properties are the cornerstone of database design theory.

\thm{ACID Properties}{
	\begin{itemize}
		\item \textbf{Atomicity:} Often described as ``all-or-nothing,'' this ensures that a transaction is either fully completed or not executed at all. If a failure occurs halfway through, any partial changes must be undone.
		\item \textbf{Consistency:} A transaction must take the database from one consistent state to another, satisfying all integrity constraints like primary keys and check constraints.
		\item \textbf{Isolation:} Each transaction should run as if it were the only one using the system, regardless of how many other users are active.
		\item \textbf{Durability:} Once a transaction has been committed, its effects must persist in the database even in the event of a power outage or system crash.
	\end{itemize}}

\nt{Atomicity in transactions should not be confused with atomic values in First Normal Form. In this context, it refers to the indivisibility of the execution process itself.}
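
Atomicity and consistency can be observed directly with Python's built-in \texttt{sqlite3} module. The sketch below is illustrative only: the \texttt{accounts} table, its balances, and the \texttt{transfer} helper are assumptions made for the example. The second transfer fails a check constraint after the credit has already been applied, and the rollback undoes that partial change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts").fetchall())


ok_small = transfer(conn, "A", "B", 30)    # enough funds: commits
ok_large = transfer(conn, "A", "B", 500)   # overdraft: rolled back
final = balances(conn)
```

After both calls, account B holds only the credit from the successful transfer; the credit from the failed one never becomes visible.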

\section{Concurrency and Isolation Levels}

When multiple transactions run at the same time, their actions may interleave in a way that leads to inconsistencies. A \textbf{Schedule} is the actual sequence of actions (reads and writes) performed by these transactions. While a \textbf{Serial Schedule} (running one transaction after another) is always safe, it forgoes all concurrency and is therefore inefficient. Schedulers instead aim for \textbf{Serializability}.

\thm{Serializability}{A schedule is serializable if its effect on the database is identical to the effect of some serial execution of the same transactions.}
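
A small simulation shows a schedule that fails this test. In the sketch below, which uses a hypothetical workload, two transactions each read element $A$, add their own increment, and write the result back. Both serial orders produce the same final value, while one interleaving loses an update, so its effect matches no serial execution.

```python
def run(schedule, initial):
    """Execute a schedule of (txn, action) steps against one element A.

    Each transaction reads A into a private variable, adds its own
    increment, and writes the result back.
    """
    increments = {"T1": 10, "T2": 20}   # hypothetical workload
    db = {"A": initial}
    local = {}                          # per-transaction private copies
    for txn, action in schedule:
        if action == "read":
            local[txn] = db["A"]
        elif action == "write":
            db["A"] = local[txn] + increments[txn]
    return db["A"]


serial_t1_t2 = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
serial_t2_t1 = [("T2", "read"), ("T2", "write"), ("T1", "read"), ("T1", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_results = {run(serial_t1_t2, 100), run(serial_t2_t1, 100)}
interleaved_result = run(interleaved, 100)
is_equivalent_to_serial = interleaved_result in serial_results
```

Starting from $A = 100$, both serial orders end with $A = 130$, whereas the interleaved schedule ends with $A = 120$ because the write of T2 overwrites the write of T1. A single initial state with a differing outcome is enough to show the schedule is not serializable.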

If isolation is not properly managed, several types of ``anomalies'' can occur. These phenomena describe undesirable interactions between concurrent processes.

\dfn{Dirty Read}{A situation where one transaction reads data that has been modified by another transaction but has not yet been committed. If the writing transaction subsequently aborts, the reading transaction has based its work on data that ``never existed.''}

\dfn{Non-repeatable Read}{Occurs when a transaction reads the same data element twice but finds different values because another transaction modified and committed that element in the interim.}

\dfn{Phantom Read}{A phenomenon where a transaction runs a query to find a set of rows, but upon repeating the query, finds additional ``phantom'' rows that were inserted and committed by a concurrent transaction.}

SQL provides four \textbf{Isolation Levels} that allow developers to trade off strictness for performance.

\begin{itemize}
	\item \textbf{Read Uncommitted:} The most relaxed level; allows dirty reads.
	\item \textbf{Read Committed:} Forbids dirty reads but allows non-repeatable reads and phantoms.
	\item \textbf{Repeatable Read:} Forbids dirty and non-repeatable reads but may allow phantoms.
	\item \textbf{Serializable:} The strictest level; ensures the result is equivalent to some serial order.
\end{itemize}
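
The list above can be condensed into a lookup table. The sketch below records, for each level, the anomalies it still permits, and uses a hypothetical helper to pick the most relaxed level that rules out a given set of anomalies. It encodes only the minimum guarantees listed here; an individual DBMS may forbid more than its level requires.

```python
# Levels in order from most relaxed to strictest.
ALLOWED_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(*anomalies):
    """Return the most relaxed level that forbids all given anomalies."""
    for level, allowed in ALLOWED_ANOMALIES.items():
        if allowed.isdisjoint(anomalies):
            return level
    raise ValueError("no level prevents the requested anomalies")


for_reports = weakest_level_preventing("dirty read")
for_rereads = weakest_level_preventing("dirty read", "non-repeatable read")
for_range_scans = weakest_level_preventing("phantom read")
```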

\section{Locking and Two-Phase Locking (2PL)}

The most common way for a database to enforce serializability is through the use of \textbf{Locks}. Before a transaction can read or write a piece of data, it must obtain a lock on that element. Locks are recorded in a \textbf{Lock Table} maintained by the scheduler.

\dfn{Shared and Exclusive Locks}{A Shared (S) lock is required for reading and allows multiple transactions to read the same element. An Exclusive (X) lock is required for writing and prevents any other transaction from accessing that element.}
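
The compatibility rule in this definition can be sketched as a toy lock table. The class below is a simplified illustration, not how any particular DBMS implements locking: it has no wait queue, so a request that cannot be granted simply returns \texttt{False}, and the class and method names are hypothetical.

```python
class LockTable:
    """Minimal lock table: element -> (mode, set of holding transactions)."""

    def __init__(self):
        self.locks = {}

    def request(self, txn, element, mode):
        """Try to grant a lock; return True if granted, False if txn must wait."""
        if mode not in ("S", "X"):
            raise ValueError(mode)
        held = self.locks.get(element)
        if held is None:
            self.locks[element] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # Sole holder: keep the lock, upgrading S to X if requested.
            if mode == "X":
                self.locks[element] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # shared locks are compatible
            return True
        return False                  # any combination involving X conflicts

    def release(self, txn, element):
        held = self.locks.get(element)
        if held and txn in held[1]:
            held[1].discard(txn)
            if not held[1]:
                del self.locks[element]


table = LockTable()
first_reader = table.request("T1", "A", "S")
second_reader = table.request("T2", "A", "S")
writer_blocked = table.request("T3", "A", "X")
table.release("T1", "A")
table.release("T2", "A")
writer_granted = table.request("T3", "A", "X")
reader_blocked = table.request("T1", "A", "S")
```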

Simply using locks is not enough to guarantee serializability; the timing of when locks are released is critical. If a transaction releases a lock too early, another transaction might intervene and change the data, leading to a non-serializable schedule. To prevent this, systems use the \textbf{Two-Phase Locking (2PL)} protocol.

\thm{Two-Phase Locking (2PL)}{A protocol requiring that in every transaction, all locking actions must precede all unlocking actions. This creates two distinct phases: a ``growing phase'' where locks are acquired and a ``shrinking phase'' where they are released.}

\nt{Strict Two-Phase Locking is a variation where a transaction does not release any exclusive locks until it has committed or aborted. This prevents other transactions from reading dirty data and avoids the need for cascading rollbacks.}
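
The basic 2PL condition is easy to check mechanically. The function below is a minimal sketch that tests only the ordering rule (no lock request after the first unlock, per transaction); it does not check lock compatibility or the strict variant, and the action encoding is an assumption made for the example.

```python
def obeys_2pl(actions):
    """Check that no transaction locks anything after its first unlock.

    actions is a list of (txn, op, element) with op in {"lock", "unlock"}.
    """
    shrinking = set()   # transactions that have released at least one lock
    for txn, op, _element in actions:
        if op == "unlock":
            shrinking.add(txn)
        elif op == "lock" and txn in shrinking:
            return False   # growing again after shrinking: violates 2PL
    return True


two_phase = [
    ("T1", "lock", "A"), ("T1", "lock", "B"),
    ("T1", "unlock", "A"), ("T1", "unlock", "B"),
]
early_release = [
    ("T1", "lock", "A"), ("T1", "unlock", "A"),
    ("T1", "lock", "B"), ("T1", "unlock", "B"),
]
two_transactions = [
    ("T1", "lock", "A"), ("T1", "unlock", "A"),
    ("T2", "lock", "A"), ("T2", "unlock", "A"),
]
results = [obeys_2pl(s) for s in (two_phase, early_release, two_transactions)]
```

The third sequence passes because the rule applies to each transaction separately: T2 may acquire a lock after T1 has started releasing its own.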

A potential downside of locking is the risk of a \textbf{Deadlock}. This occurs when two or more transactions form a cycle in which each waits for a lock held by the next. Schedulers must be able to detect these cycles, often using a \textbf{Waits-For Graph}, and resolve them by aborting one of the transactions.
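
Deadlock detection on a waits-for graph amounts to finding a directed cycle. The sketch below uses a depth-first search over a dictionary that maps each transaction to the transactions it is waiting on. The graph representation and function name are assumptions for illustration, and choosing which transaction in the cycle to abort is left out.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None.

    waits_for maps each transaction to the set of transactions it waits on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GRAY
        path.append(txn)
        for other in waits_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path from 'other' onward.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


deadlocked = find_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}})
healthy = find_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": set()})
```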

In conclusion, the management of transactions requires a deep integration of architectural tiers, hierarchical environments, and rigorous concurrency control. By utilizing ACID properties, various isolation levels, and the 2PL protocol, database systems provide a robust platform where users can safely interact with data as if they were the sole occupants of the system.

\nt{In practice, many developers use higher-level APIs like JDBC for Java or PHP's PEAR DB library to handle the complexities of database connections and transaction boundaries programmatically.}

To think of it another way, a transaction is like a single entry in a shared diary. Even if twenty people are writing in the same diary simultaneously, the system acts like a careful librarian, ensuring that each person's entry is written cleanly on its own line without anyone's ink smudging another's work.