The Two-Phase Commit Protocol

Mike Duckett

last updated on April 30, 1995


Overview

The two phase commit protocol is a distributed algorithm which lets all sites in a distributed system agree to commit a transaction. The protocol results in either all nodes committing the transaction or aborting, even in the case of site failures and message losses. However, due to the work by Skeen and Stonebraker, the protocol will not handle more than one random site failure at a time. The two phases of the algorithm are broken into the COMMIT-REQUEST phase, where the COORDINATOR attempts to prepare all the COHORTS, and the COMMIT phase, where the COORDINATOR completes the transactions at all COHORTS.

Assumptions

The protocol works in the following manner: One node is designated the coordinator, which is the master site, and the rest of the nodes in the network are called cohorts. Other assumptions of the protocol include stable storage at each site and use of a write ahead log by each node. Also, the protocol assumes that no node crashes forever, and eventually any two nodes can communicate with each other. The latter is not a big deal since network communication can typically be rerouted. The former is a much stronger assumption; suppose the machine blows up!

Basic Algorithm

During phase 1, initially the coordinator sends a query to commit message to all cohorts. Then it waits for all cohorts to report back with the agreement message. The cohorts, if the transaction was successful, write an entry to the undo log and an entry to the redo log. Then the cohorts reply with an agree message, or an abort if the transaction failed at a cohort node. During phase 2, if the coordinator receives an agree message from all cohorts, then it writes a commit record into its log and sends a commit message to all the cohorts. If all agreement messages do not come back the coordinator sends an abort message. Next the coordinator waits for the acknowledgement from the cohorts. When acks are received from all cohorts the coordinator writes a complete record to its log. Note the coordinator will wait forever for all the acknowledgements to come back. If the cohort receives a commit message, it releases all the locks and resources held during the transaction and sends an acknowledgement to the coordinator. If the message is abort, then the cohort undoes the transaction with the undo log and releases the resources and locks held during the transaction. Then it sends an acknowledgement.

Disadvantages

The greatest disadvantage of the two phase commit protocol is the fact that it is a blocking protocol. A node will block while it is waiting for a message. This means that other processes competing for resource locks held by the blocked processes will have to wait for the locks to be released. A single node will continue to wait even if all other sites have failed. If the coordinator fails permanently, some cohorts will never resolve their transactions. This has the effect that resources are tied up forever.

Another disadvantage is the protocol is conservative. It is biased to the abort case rather than the complete case.


The Detailed Commit Protocol[2]

At the COORDINATOR:

  1. The COORDINATOR sends the message to each COHORT. The COORDINATOR is now in the preparing transaction state.
  2. Now the COORDINATOR waits for responses from each of the COHORTS. If any COHORT responds ABORT then the transaction must be aborted, proceed to step 5. If all COHORTS respond AGREED then the transaction may be commited, and proceed to step 3. If after some time period all COHORTS do not respond the COORDINATOR can either transmit ABORT messages to all COHORTS or transmit COMMIT-REQUEST messages to the COHORTS that have not responded. In either case the COORDINATOR will eventually go to state 3 or state 5.
  3. Record in the logs a COMPLETE to indication the transaction is now completing. Send COMMIT message to each of the COHORTS.
  4. Wait for each COHORT to respond. They must reply COMMIT. If after some time period some COHORT has not responded retransmit the COMMIT message. Once all COHORTS have replied erase all associated information from permanent memory ( COHORT list, etc. ). DONE.
  5. Send the ABORT message to each COHORT.

At cohorts:

  1. If a COMMIT-REQUEST message is received for some transaction t which is unknown at the COHORT ( never ran, wiped out by crash, etc ), reply ABORT. Otherwise write the new state of the transaction to the UNDO and REDO log in permanent memory. This allows for the old state to be recovered ( in event of later abort ) or committed on demand regardless of crashes. The read locks of a transaction may be released at this time; however, the write locks are still maintained. Now send AGREED to the COORDINATOR.
  2. If an ABORT message is received then kill the transaction, which involves deleting the new state if the transaction from the REDO and UNDO log the new state of the transaction and restoring any state before the transaction occured.
  3. If a COMMIT message is received then the transaction is either prepared for commital or already committed. If it is prepared, perform all the operations necessary to update the database and release the remaining locks the transaction possesses. If it is already commited, no further action is required. Respond COMMITED to the COORDINATOR.

Figure 1 - Finite State Diagram of Commit Protocol for Coordinator and Cohort [3]

Legend


Correctness

We assert the claim that if one COHORT completes the transaction all COHORTS complete the transaction eventually. The proof for correctness proceeds somewhat informally as follows: If a COHORT is completing a transaction, it is so only because the COORDINATOR sent it a COMMT message. This message is only sent when the COORDINATOR is in the commit phase, in which case all COHORTS have responded to the COORDINATOR AGREED. This means all COHORTS have prepared the transaction, which implies any crash at this point will not harm the transaction data because it is in permanent memory. Once the COORDINATOR is completing, it is insured every COHORT completes before the COORDINATOR's data is erased. Thus crashes of the COORDINATOR do not interfere with the completion.

Therefore if any COHORT completes, then they all do. The abort sequence can be argued in a similar manner. Hence the atomicity of the transaction is guaranteed to fail or complete globally.


Other Sites

Here are some other sites who use the two-phase commit protocol :

Glossary

Commit - an action which indicates that the process or transaction updating the object has successfully completed. Hence the changes done to the database can be made permanent.

Commit Protocol - an algorithm which ensures that all the sites in a distributed system either commit or abort a transaction unanimously, even in the presence of failures. They force the system to behave in a predetermined manner in the presence of failures.

do - an operation which updates and writes a log record of the operation performed.

log - a record of system activity which is recorded in sufficient detail so that a previous state of a process can be restored by reversing the changes made.

redo - given a log record written by a do, this operation redoes the action specified in the log.

stable storage - permanent storage ( i.e. hard drive ) which can write objects in an atomic manner.

transaction - an atomic action which is some computation that changes or reads the state of one or more data objects and appears to take place indivisibly.[2]

undo - given a log record written by a do, this operation undoes the action specified in the log.

write-ahead log protocol - a method in which operations done on a database may be undone after restarting a system due to a failure ( hardware, network, etc...). The protocol is implemented by the following operations :


References

[1] Gray, J.N., "Notes on DataBase Operating Systems," Operating Systems : An Advances Course, Springer-Verlag, 1979, New York, pp.393-481.

[2] Moss, Elliot, Nested Transactions : An Approach to Reliable Distributed Computing, The MIT Press, Cambridge, Massachusetts, 1985, pp.31-38.

[3] Singhal, M. and Shivaratri, N., Advanced Concepts in Operating Systems, McGraw-Hill, 1994, pp. 302-303, pp. 334-335, p. 337


Go to Operating Systems Home Page