Monday, June 18, 2012

faking it

Every now and then you find yourself unable to use XA in a situation where you need full ACID guarantees across updates to multiple systems. Once upon a time this occurred mainly in situations where one of your relational databases did not support XA. These days it's more likely to be some newfangled NoSQL store that's not playing nice in the XA ecosystem. When this happens you will more often than not try to convince yourself you can fake it. You probably can't.

XA is a consensus protocol for guaranteeing ACID on transactions spanning multiple resource managers. The two-phase commit protocol it uses is a means to that end. For everything to work correctly, it requires that the resource managers make some promises: that they keep changes hidden from other users until the transaction commits, and that after the prepare stage they remain able to commit (or roll back) even if they crash before a decision is reached. That's Isolation (usually with a side order of Atomicity) and Durability in a nutshell.
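To make those promises concrete, here is a minimal sketch of a two-phase commit coordinator. It is a toy, not the XA interface: `InMemoryRM`, `Vote` and `two_phase_commit` are names invented for illustration, and both the resource managers and the log live in memory, where real ones would force their state to disk.

```python
from enum import Enum


class Vote(Enum):
    OK = "ok"
    ABORT = "abort"


class InMemoryRM:
    """Toy resource manager (hypothetical): changes stay hidden until commit."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.committed = {}  # state visible to other users
        self.pending = {}    # this transaction's hidden changes

    def write(self, key, value):
        self.pending[key] = value

    def prepare(self):
        # A real RM would force `pending` to stable storage here, so that it
        # can still commit or roll back after a crash.
        return Vote.ABORT if self.fail_prepare else Vote.OK

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def two_phase_commit(resources, log):
    """Return True if the transaction committed, False if it rolled back."""
    # Phase 1: ask every resource manager to prepare and collect the votes.
    votes = [rm.prepare() for rm in resources]
    if all(vote is Vote.OK for vote in votes):
        log.append("COMMIT")  # the decision point; durable in a real TM
        for rm in resources:  # Phase 2: tell everyone the outcome
            rm.commit()
        return True
    log.append("ROLLBACK")
    for rm in resources:
        rm.rollback()
    return False
```

A single abort vote rolls everybody back, and nothing written through `write` shows up in `committed` until phase two.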

The cheap and cheerful fake version of XA is LRCO. Last Resource Commit Optimization, sometimes called Last Resource Gambit, allows for one non-XA RM in an otherwise entirely XA transaction. By ordering the non-XA resource last in the 2PC processing order you can achieve something that behaves like XA for most scenarios. But close inspection reveals the poor quality of the knockoff goods: LRCO breaks horribly if a crash occurs at certain points in the execution, most notably after the last resource has committed but before the transaction manager has logged that outcome. In such cases inconsistencies between the RMs can arise and reconciling them requires human intervention or complex code. So the savings may not be as good as they first appear.
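A rough sketch of the LRCO ordering, with the troublesome window simulated. Everything here is hypothetical scaffolding (`XAStub`, `OnePhaseStub`, `lrco_commit`, the `crash_in_window` flag); it only illustrates the order of operations and is not any transaction manager's actual code.

```python
class SimulatedCrash(Exception):
    """Stands in for the transaction manager dying mid-protocol."""


class XAStub:
    """Hypothetical XA resource that records which state it is in."""

    def __init__(self, vote=True):
        self.vote = vote
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.vote else "active"
        return self.vote

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


class OnePhaseStub:
    """Hypothetical non-XA resource: no prepare, just commit or rollback."""

    def __init__(self, fail=False):
        self.fail = fail
        self.state = "active"

    def commit(self):
        if self.fail:
            self.state = "rolled_back"
            raise RuntimeError("one-phase commit failed")
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def lrco_commit(xa_resources, last_resource, log, crash_in_window=False):
    # Prepare every XA resource first.
    if not all(rm.prepare() for rm in xa_resources):
        last_resource.rollback()
        for rm in xa_resources:
            rm.rollback()
        return False
    # The last resource's one-phase commit *is* the decision.
    try:
        last_resource.commit()
    except RuntimeError:
        for rm in xa_resources:
            rm.rollback()
        return False
    # The window: the last resource has committed, the log says nothing yet.
    if crash_in_window:
        raise SimulatedCrash("died before logging the outcome")
    log.append("COMMIT")
    for rm in xa_resources:
        rm.commit()
    return True
```

Call it with `crash_in_window=True` and you are left with a committed last resource, XA resources stuck in `prepared`, and an empty log. A recovering transaction manager has nothing to tell it which way to resolve them.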

The more convincing version of LRCO is known as LLRO (Logging Last Resource Optimization) or 1.5 phase commit. In this model you write the transaction manager's log entry to the same database that is being used as the last resource. By making the log write atomic with the commit, you close the timing window during which crashes can cause problems. Which is all well and good if your last resource happens to be a database that can take the logging workload. In cases where it is e.g. a mail server, you're still in trouble.
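Here is a small sketch of the idea using SQLite as the last resource. The `accounts` and `tx_log` tables, `XAStub`, `llr_decide` and `resolve` are all made up for the example; a real implementation would differ in schema and in how it drives recovery.

```python
import sqlite3


class XAStub:
    """Hypothetical XA resource that records which state it is in."""

    def __init__(self, vote=True):
        self.vote = vote
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.vote else "active"
        return self.vote

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def open_last_resource():
    """Create the last-resource database with a business table and a log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE tx_log (txid TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def llr_decide(conn, txid, xa_resources):
    """Prepare the XA resources, then commit data and log record together.

    `conn` is expected to hold the transaction's uncommitted business changes.
    """
    if not all(rm.prepare() for rm in xa_resources):
        conn.rollback()  # no log row is written, so the outcome is rollback
        return False
    conn.execute("INSERT INTO tx_log (txid) VALUES (?)", (txid,))
    conn.commit()  # business rows and the commit record land atomically
    return True


def resolve(conn, txid, xa_resources):
    """Phase two, also usable as crash recovery: the log row is the truth."""
    row = conn.execute(
        "SELECT 1 FROM tx_log WHERE txid = ?", (txid,)
    ).fetchone()
    for rm in xa_resources:
        if row:
            rm.commit()
        else:
            rm.rollback()
    return row is not None
```

If the process dies anywhere between `llr_decide` and `resolve`, running `resolve` after restart still reaches the right answer, because the presence or absence of the `tx_log` row always agrees with what happened to the business data.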

For the more general case, providing the ACID properties yourself in a situation where the resource manager doesn't is hard. Like academic research effort hard. Indeed there are several such research projects in progress right now. Some focus on implementing a custom driver for the resource manager that applies changes against a tx-local cache during the transaction, logs them to local disk at prepare, and flushes the modifications to the real RM at commit time. This is hard where the resource manager implements complex features, as you wind up needing to reimplement them in the driver or avoid using them in the application code. It's also vulnerable to write conflicts in highly concurrent environments, especially where other clients are accessing the resource manager directly rather than through the custom driver.
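A toy version of such a driver, assuming the real resource manager is nothing fancier than a key/value store (a plain dict here). `BufferingDriver` and its default log path are inventions for the sketch, not taken from any of those research projects.

```python
import json
import os
import tempfile


class BufferingDriver:
    """Hypothetical driver that defers writes to a non-transactional store."""

    def __init__(self, store, log_path=None):
        self.store = store  # stands in for the real resource manager
        self.log_path = log_path or os.path.join(
            tempfile.gettempdir(), "buffering-driver-tx.log"
        )
        self.cache = {}  # tx-local changes, invisible to other clients

    def put(self, key, value):
        self.cache[key] = value

    def get(self, key):
        # Read your own writes first, then fall through to the real store.
        if key in self.cache:
            return self.cache[key]
        return self.store.get(key)

    def prepare(self):
        # Make the pending changes durable before voting to commit.
        with open(self.log_path, "w") as f:
            json.dump(self.cache, f)
            f.flush()
            os.fsync(f.fileno())
        return True

    def commit(self):
        # Replay from the log rather than the cache, so the same code path
        # would also work after a restart.
        with open(self.log_path) as f:
            changes = json.load(f)
        self.store.update(changes)
        os.remove(self.log_path)
        self.cache.clear()

    def rollback(self):
        self.cache.clear()
        if os.path.exists(self.log_path):
            os.remove(self.log_path)
```

Even this much shows the weak spots. The driver only understands `put` and `get`, so any richer feature of the store has to be reimplemented or avoided. And if another client writes to `store` directly between `prepare` and `commit`, the flush silently overwrites it.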

The inverse of that model is compensation-based transactions. These apply the state changes to the resource manager immediately or at prepare time and then apply additional changes to undo the effects if the transaction needs to roll back. This approach does not offer isolation though: the changes are visible to other users for a time even if the tx is not committed. It's also difficult to generate correct compensation logic for many operations.
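The compensation model is easy to sketch, which is part of its appeal. The class below is a bare-bones illustration with invented names (`CompensatingTransaction`, `adjust`); it assumes every action comes with a hand-written undo that always succeeds, which is exactly the part that is hard in practice.

```python
class CompensatingTransaction:
    """Hypothetical helper: apply changes now, remember how to undo them."""

    def __init__(self):
        self._undo = []

    def do(self, action, compensation):
        action()  # takes effect immediately, visible to everyone
        self._undo.append(compensation)

    def commit(self):
        # Nothing to apply; just forget the compensations.
        self._undo.clear()

    def rollback(self):
        # Undo in reverse order of application.
        while self._undo:
            self._undo.pop()()


def adjust(balances, account, amount):
    """Example operation on a shared dict standing in for the resource."""
    balances[account] += amount
```

For example, with `balances = {"alice": 100, "bob": 0}`, debiting alice via `tx.do(lambda: adjust(balances, "alice", -30), lambda: adjust(balances, "alice", 30))` makes the balance of 70 visible to any other reader straight away. A later `tx.rollback()` restores 100, but anyone who looked in between saw a state that officially never existed.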

All the alternatives to XA have limitations or drawbacks. If your circumstances are such that you can live with those, you may be able to use a different protocol to good effect. But beware the corner cases. You can't fool all the people all the time.

references:  (No I didn't steal the answer, I wrote it.)