\chapter{Transactions and the Three Tiers}

Data management has moved from localized, single-machine installations to multi-tiered architectures that serve large user bases across the globe. This chapter explores the foundational structures of modern information systems, focusing on how databases operate within a server environment. We examine the interaction between the layers of processing known as the three-tier architecture, and the logical organization of data into environments, clusters, catalogs, and schemas. We then investigate how general-purpose programming languages such as Java interact with SQL through call-level interfaces like JDBC. Central to this discussion is the management of transactions: adherence to the ACID properties and the choice of isolation level keep data consistent even in highly concurrent and distributed settings.

\dfn{Database Management System}{A specialized software system designed to create, manage, and provide efficient, safe, and persistent access to large volumes of data over long periods of time.}

\section{The Three-Tier Architecture}

Modern large-scale database installations typically utilize a three-tier or three-layer architecture. This structure is designed to separate different functional concerns, which allows for better scalability, security, and maintenance.

\thm{Three-Tier Architecture}{A system organization that distinguishes between three interacting layers: the Web Server tier (user interface), the Application Server tier (business logic), and the Database Server tier (data management).}

The first layer is the Web-Server Tier. This tier manages the primary interaction with the user, often through the Internet. When a customer accesses a service, a web server responds to the initial request and presents the interface, such as an HTML page with forms and menus. The client's browser handles the user's input and transmits it back to the web server, which then communicates with the application tier.

The middle layer is the Application-Server Tier. This is where the ``business logic'' of an organization resides. The responsibility of this tier is to process requests from the web server by determining what data is needed and how it should be presented. In complex systems, this tier might be divided into subtiers, such as one for object-oriented data handling or another for information integration, where data from multiple disparate sources is combined. The application tier performs the heavy lifting of turning raw database information into a meaningful response for the end user.

The final layer is the Database-Server Tier. This layer consists of the processes that run the Database Management System (DBMS). It receives query and modification requests from the application tier and executes them against the stored data. To ensure efficiency, this tier often maintains a pool of open connections that can be shared among various application processes, avoiding the overhead of constantly opening and closing connections.
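
The pooling idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how any particular DBMS or driver implements pooling: the class \texttt{SimplePool} and its methods are hypothetical, and the pooled resource is left generic so that the sketch runs without a database. In a real application tier the pool would hold \texttt{java.sql.Connection} objects.

\begin{verbatim}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: resources are created once and reused.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // pay the creation cost up front
        }
    }

    // Blocks until a resource is free, then hands it to the caller.
    T acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Returns a resource to the pool instead of closing it.
    // Throws IllegalStateException if more are released than were acquired.
    void release(T resource) {
        idle.add(resource);
    }

    // Number of resources currently idle in the pool.
    int available() {
        return idle.size();
    }
}
\end{verbatim}

A caller would wrap each unit of work in an \texttt{acquire()} and \texttt{release()} pair, so that the cost of opening a connection is paid once per pooled resource rather than once per request.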

\section{The SQL Environment and Its Logical Organization}

The SQL environment provides the framework within which data exists and operations are executed. This environment is organized into a specific hierarchy to manage terminology and scope.

\dfn{SQL Environment}{The overall framework, typically an installation of a DBMS at a specific site, under which database elements are defined and SQL operations are performed.}

At the top of this hierarchy is the Cluster. A cluster is a collection of catalogs and represents the maximum scope over which a single database operation can occur. Essentially, it is the entire database as perceived by a specific user.

Below the cluster is the Catalog. Catalogs are used to organize schemas and provide a unique naming space. Each catalog must contain a special schema that holds information about all other schemas within that catalog.

The most basic unit of organization is the Schema. A schema is a collection of database elements such as tables, views, triggers, and assertions. One can create a schema using a specific declaration or modify it over time.

\section{Establishing Connections and Sessions}

For a program or a user to interact with the database server, a link must be established. This is handled through connections and sessions. A connection is the physical or logical link between a SQL client (often the application server) and a SQL server. A user can open multiple connections, but only one can be active at any given moment.

\dfn{Session}{The sequence of SQL operations performed while a specific connection is active. It includes state information such as the current catalog, current schema, and the authorized user.}

When a connection is established, it usually requires an authorization clause, which includes a username and password. This ensures that the current authorization ID has the necessary privileges to perform the requested actions. In this context, a ``module'' refers to the application program code, while a ``SQL agent'' is the actual execution of that code.

\section{Transactions and the ACID Properties}

Transactions are the fundamental units of work in a database system. To ensure that the database remains in a consistent state despite concurrent access or system failures, every transaction must follow a set of requirements known as the ACID properties.

\thm{ACID Properties}{A set of four essential characteristics of a transaction: Atomicity (all-or-nothing execution), Consistency (preserving database invariants), Isolation (executing as if in isolation), and Durability (permanent storage of results).}

Atomicity ensures that if a transaction is interrupted, any partial changes are rolled back, leaving the database as if the transaction never started. Consistency guarantees that a transaction moves the database from one valid state to another, respecting all defined rules. Isolation is managed by a scheduler to ensure that the concurrent execution of multiple transactions results in a state that could have been achieved if they were run one after another. Finally, Durability ensures that once a transaction is committed, its effects will survive even a subsequent system crash.
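
Atomicity and durability can be made concrete with the JDBC interface covered elsewhere in this chapter. The following is a minimal sketch under stated assumptions: the table \texttt{accounts} with columns \texttt{id} and \texttt{balance}, and the class \texttt{TransferSketch}, are hypothetical and serve only as illustration. Running it requires an open connection to a database that contains such a table.

\begin{verbatim}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class TransferSketch {
    // Hypothetical table accounts(id, balance); not taken from the text.
    static final String DEBIT =
        "UPDATE accounts SET balance = balance - ? WHERE id = ?";
    static final String CREDIT =
        "UPDATE accounts SET balance = balance + ? WHERE id = ?";

    // Moves `amount` between two accounts as one unit of work.
    static void transfer(Connection conn, int from, int to, long amount)
            throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);        // group both updates in one transaction
        try (PreparedStatement debit = conn.prepareStatement(DEBIT);
             PreparedStatement credit = conn.prepareStatement(CREDIT)) {
            debit.setLong(1, amount);
            debit.setInt(2, from);
            debit.executeUpdate();

            credit.setLong(1, amount);
            credit.setInt(2, to);
            credit.executeUpdate();

            conn.commit();                // both updates take effect together
        } catch (SQLException e) {
            conn.rollback();              // neither update survives
            throw e;
        } finally {
            conn.setAutoCommit(previous); // restore the caller's setting
        }
    }
}
\end{verbatim}

If the second update fails, the rollback also undoes the first, so no money disappears between the two statements. The sketch does not check that either account exists or that the balance stays non-negative; such rules belong to the consistency requirement and would normally be enforced by constraints in the schema.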

\subsection{Transactional Phenomena and Isolation Levels}

When multiple transactions run simultaneously, several problematic phenomena can occur if isolation is not strictly enforced.

\begin{enumerate}
    \item \textbf{Dirty Read}: One transaction reads data that another transaction has written but not yet committed. If the writing transaction later aborts, the data seen by the reading transaction effectively never existed.
    \item \textbf{Nonrepeatable Read}: A transaction reads the same data twice but finds different values because another transaction modified and committed that data in the meantime.
    \item \textbf{Phantom Read}: A transaction runs the same query more than once, and a later result contains ``phantom'' rows that another transaction inserted and committed in the meantime.
    \item \textbf{Serialization Anomaly}: The result of a group of concurrent transactions is inconsistent with every possible serial ordering of those same transactions.
\end{enumerate}
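
The first of these phenomena can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not how a DBMS stores data: the hypothetical class \texttt{DirtyReadDemo} holds a single data item with one committed value and at most one uncommitted write, and the two transactions are played out by hand in \texttt{main}.

\begin{verbatim}
// Toy model of one data item with at most one uncommitted writer.
final class DirtyReadDemo {
    private int committed;      // last committed value
    private Integer pending;    // uncommitted write, or null if none

    DirtyReadDemo(int initial) {
        committed = initial;
    }

    // Writer stores a new value without committing it yet.
    void write(int value) {
        pending = value;
    }

    // Writer commits: the pending value becomes the committed one.
    void commit() {
        if (pending != null) {
            committed = pending;
            pending = null;
        }
    }

    // Writer aborts: the pending value is discarded.
    void abort() {
        pending = null;
    }

    // Read that may observe uncommitted data (a dirty read is possible).
    int readUncommitted() {
        return pending != null ? pending : committed;
    }

    // Read that only ever observes committed data.
    int readCommitted() {
        return committed;
    }

    public static void main(String[] args) {
        DirtyReadDemo balance = new DirtyReadDemo(100);
        balance.write(500);                     // T1 writes, no commit yet
        int dirty = balance.readUncommitted();  // T2 sees T1's write
        int clean = balance.readCommitted();    // T2 sees the old value
        balance.abort();                        // T1 aborts
        System.out.println(dirty + " " + clean + " "
            + balance.readCommitted());
        // prints "500 100 100"
    }
}
\end{verbatim}

The reader that looked at uncommitted data acted on the value 500, which was never part of any committed database state.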

To manage these risks, SQL defines four isolation levels. The most stringent is Serializable, which prevents all of the phenomena listed above. The weaker levels, Repeatable Read, Read Committed, and Read Uncommitted, allow higher concurrency at the risk of encountering some of them.
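
The relationship between levels and phenomena can be written down as a small lookup table. The enum below is a hypothetical helper, not part of JDBC; it records which of the first three phenomena each level still permits under the SQL standard (including the intermediate Repeatable Read level) and pairs each level with the corresponding constant from \texttt{java.sql.Connection}. Serialization anomalies are possible at every level below Serializable, and a particular DBMS may prevent more than the standard requires.

\begin{verbatim}
import java.sql.Connection;

// Phenomena that each isolation level still permits under the SQL standard.
enum IsolationLevel {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED,
                     true, true, true),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED,
                   false, true, true),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ,
                    false, false, true),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE,
                 false, false, false);

    final int jdbcConstant;           // argument for setTransactionIsolation
    final boolean dirtyRead;          // may a dirty read occur?
    final boolean nonrepeatableRead;  // may a nonrepeatable read occur?
    final boolean phantomRead;        // may a phantom read occur?

    IsolationLevel(int jdbcConstant, boolean dirtyRead,
                   boolean nonrepeatableRead, boolean phantomRead) {
        this.jdbcConstant = jdbcConstant;
        this.dirtyRead = dirtyRead;
        this.nonrepeatableRead = nonrepeatableRead;
        this.phantomRead = phantomRead;
    }

    // Prints the table, one level per line.
    public static void main(String[] args) {
        for (IsolationLevel level : values()) {
            System.out.println(level
                + " dirty=" + level.dirtyRead
                + " nonrepeatable=" + level.nonrepeatableRead
                + " phantom=" + level.phantomRead);
        }
    }
}
\end{verbatim}

Given an open connection \texttt{conn}, a level would be requested with \texttt{conn.setTransactionIsolation(IsolationLevel.SERIALIZABLE.jdbcConstant)}. Whether the request is honored depends on the driver and the DBMS.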

\section{Java Database Connectivity (JDBC)}

One of the most common ways to implement the application tier is through Java, using the JDBC call-level interface. JDBC allows a Java program to interact with any SQL database for which a JDBC driver is available, using a standard set of classes and methods.

\dfn{JDBC}{A Java-based API that provides a standard library of classes for connecting to a database, executing SQL statements, and processing the results.}

The process begins by loading a driver for the specific DBMS, such as MySQL or PostgreSQL. Once the driver is loaded, a connection is established using a URL that identifies the database, along with credentials for authorization.

In JDBC, there are different types of statements used to interact with the data. A simple \texttt{Statement} is used for queries without parameters, while a \texttt{PreparedStatement} is used when a query needs to be executed multiple times with different values. These parameters are denoted by question marks in the SQL string and are bound to specific values before execution.

The result of a query in JDBC is returned as a \texttt{ResultSet} object. This object acts like a cursor, allowing the program to iterate through the resulting tuples one at a time using a \texttt{next()} method. For each tuple, the programmer uses specific getter methods, such as \texttt{getInt()} or \texttt{getString()}, to extract data based on the attribute's position in the result.

\thm{JDBC Interaction Pattern}{The standard flow of database access in Java: Load Driver $\rightarrow$ Establish Connection $\rightarrow$ Create Statement $\rightarrow$ Execute Query/Update $\rightarrow$ Process Results via ResultSet $\rightarrow$ Close Connection.}
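
The pattern can be sketched end to end as follows. This is a minimal example under stated assumptions: the table \texttt{movies} with columns \texttt{title} and \texttt{year}, the class \texttt{JdbcQuerySketch}, and the method \texttt{titlesSince} are hypothetical, and the URL and credentials must be supplied by the caller. With JDBC 4.0 and later, a driver on the classpath is normally registered automatically, so the sketch has no explicit driver-loading step.

\begin{verbatim}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class JdbcQuerySketch {
    // Hypothetical table movies(title, year); the ? is bound at run time.
    static final String QUERY =
        "SELECT title, year FROM movies WHERE year >= ?";

    static List<String> titlesSince(String url, String user,
                                    String password, int year)
            throws SQLException {
        List<String> titles = new ArrayList<>();
        // try-with-resources closes statement and connection automatically.
        try (Connection conn =
                 DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setInt(1, year);                    // bind the parameter
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {                  // advance the cursor
                    String title = rs.getString(1);  // positions start at 1
                    int released = rs.getInt(2);
                    titles.add(title + " (" + released + ")");
                }
            }
        }
        return titles;
    }
}
\end{verbatim}

Closing resources through try-with-resources rather than an explicit \texttt{close()} call is a design choice of this sketch; it ensures that the connection is released even when a query throws an exception.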

This interface addresses the ``impedance mismatch'' between the set-oriented world of SQL and the object-oriented world of Java. By providing a mechanism to fetch rows individually, it allows Java's iterative control structures to process data retrieved from SQL's relational queries. Furthermore, it supports the execution of updates, which encompass all non-query operations like insertions, deletions, and schema modifications. This framework is essential for building the business logic required in the application tier of the three-tier architecture.