Sunday, March 13, 2011

Synchronizations: what are they and why do you need them?

As well as the two-phase commit protocol, traditional transaction processing systems employ an additional protocol, often referred to as the synchronization protocol. If you recall the original ACID properties, then you’ll remember that Durability is important in the case where state changes have to be available despite failures. What this means is that applications interact with a persistence store of some kind (e.g., a database) and this can impose a significant overhead – disk access is orders of magnitude slower than access to main computer memory.

One apparently obvious solution to this problem would be to cache the state in main memory and only operate on that for the duration of a transaction. Unfortunately you’d then need some way of being able to flush the state back to the persistent store before the transaction terminates, or risk losing the full ACID properties. This is what the synchronization protocol does, with Synchronization participants.

Synchronizations are informed that a transaction is about to commit, so they can, for example, flush cached state, which may be being used to improve performance of an application, to a durable representation prior to the transaction committing. They are then informed when the transaction has completed and in what state it completed.

Synchronizations essentially turn the two-phase commit protocol into a four-phase protocol:

• Before the transaction starts the two-phase commit, all registered Synchronizations are informed. Any failure at this point will cause the transaction to roll back.
• The coordinator then conducts the normal two-phase commit protocol.
• Once the transaction has terminated, all registered Synchronizations are informed. However, this is a courtesy invocation because any failures at this stage are ignored: the transaction has terminated so there’s nothing to affect.

Unlike the two-phase commit protocol, the synchronization protocol does not have the same failure requirements. For example, Synchronization participants don’t need to make sure they can recover in the event of failures; this is because any failure before the two-phase commit protocol completes means the transaction will roll back, and failures after it has completed can’t affect the data the Synchronization participants were managing.

So we've now looked at ACID and Synchronizations. What about optimisations to the protocol when we don't need two-phase commit? Stay tuned.

No comments: