Serializability Theory

Consider a database D = (x, y, z), on which we will concurrently perform a series of transactions T1, T2, ..., Tn. We want some way of knowing whether we executed the transactions "correctly." Formally, we state that an execution is correct if and only if it is equivalent to some serial execution of the transactions. In other words, a correct concurrent execution of the transactions produces the same result we would get if we executed them one at a time.

It may not be clear how executing transactions concurrently can cause problems. Date [1] identifies the Lost Update problem as a potential problem for concurrently executing transactions. A lost update occurs when two transactions, A and B, both update the same location R in the database. Suppose A reads R, but before A can update R, B reads R. Then, A updates R, and B updates R based on R's previous contents. B has overwritten A's update; A's update, thus, is lost. Note that this would not be a problem has A and B executed sequentially: A would have updated R before B read R.

In this section, we will consider how to use logs to detect whether a concurrent execution was serializable (ie correct).



Logs


A log is simply a record of all operations which each transaction performed. The log entries are chronological; each entry in the log gives the following information>
As we shall see, logs are central to our study of serializability.


Example: Consider the transactions T1, T2, and T3, defined below:

T1 = r1[x] r1[z] w1[x]

T2 = r2[y] r2[z] w2[y]

T3 = w3[x] r3[y] w3[z]

Two possible logs of their execution are:

L1 = w3[x] r1[x] r3[y] r2[y] w3[x] r2[z] w2[y] w1[x]

L2 = w3[x] r3[y] w3[z] r2[y] r2[z] w2[y] r1[x] r1[z] w1[x].



Serial Logs


A serial log is just a log in which transactions are run one at a time. In a serial log, there is no interleaving of operations from different transactions. L2 above is an example of a serial log, corresponding to the execution {T3, T2, T1}. L1 above is not a serial log, since an operation from transaction 1 (r1) occurs between two operations from transaction 3 (w3 and r3).

Log Equivalence


Intuitively, two logs are equivalent if the executions that produced them leave the database in the same state. The formal definition of equivalence is:

Two logs are equivalent if and only if:

We say a read operation j reads from a write operation i if read operation j reads a value most recently written by write operation i. The final write of a log is the log's last write operation.

Serializable Logs

Since we consider serial logs to be "correct", but we also want to execute transactions concurrently, we want to know whether a concurrent log is serializable; that is, equivalent to a serial log of the same transactions. Knowing this will allow us to execute transactions concurrently, but also verify the correctness of the execution.

The Serializability Theorem

To determine whether a log is serializable, we construct its serialization graph. This graph is constructed as follows: The transactions {T1, T2, ...,Tn} are nodes in the graph. There is a directed edge from Ti to Tj if and only if, for some x, one of the following hold:The Serializability Theorem states that: A log L is serializable if and only if SG(L) (the serialization graph) is acyclic.

Thus, given a transaction log, we can construct its serialization graph and determine whether the log is serializable.


Summary

In this section, we have determined how to tell whether a concurrent execution of transactions on a database was correct. We first decided that correctness meant that the execution was equivalent to a serial execution of the transactions. Then, we found that, by constructing a serialization graph, we can test its serializability, since serializability is equivalent to the graph's being acyclic.

Study Questions

1. Consider the following two transactions, and a log of their execution:

T1 = r1[a] r1[b] w1[c]

T2 = w2[b] w2[a] r2[c]

L = r1[z] r1[y] w2[y] w2[z] r2[x] w1[x]

Construct the serializability graph for this log, and show that this execution is not serializable.

2. Rearrange the log L above so that it is serializable. Try to do this without making L a serial log (ie retain concurrent execution of T1 and T2).

References

[1] Date, C.J. An Introduction to Database Systems, Addison-Wesley, 1990,

[2] Singhal, M and N Shivaratri. Advanced Concepts in Operating Systems, McGraw-Hill, 1994, pp. 488-491.