Wednesday, May 28, 2014

Bringing Transactional Guarantees to MongoDB: Part 1

In this blog post I'll present some recent work we've been doing to bring stronger transactional guarantees to MongoDB. In part 2 I'll present a code example that shows this in action in WildFly 8.

What requirements are we fulfilling? 

1) Updating multiple MongoDB documents in a single transaction
2) Support for sharded environments, without harming scalability
3) Support for global transactions spanning other datastores and traditional relational databases.
4) A middleware solution that's simple for developers to use.

This post covers the background and explains why a compensating transaction (vs an ACID transaction) could be the best fit to meet the above requirements. Part two in this series is more implementation focused. It presents a code example, showing you how you can use the technology, whilst omitting a lot of the theory (that is covered in this post).

Background

NoSQL datastores were originally built as bespoke, in-house solutions, to meet scalability requirements that it was felt relational databases couldn't meet. The general thinking was that ACID transactions would harm scalability and that it was better to workaround that requirement. However, as NoSQL adoption spread beyond its in-house roots, it became clear that many applications do indeed need a level of reliability that transactions can bring.

Typically a NoSQL datastore offers atomic updates to single items, such as a document or key-value (more generally, an aggregate). Therefore, structuring data into aggregates, can mean that the application never needs to update more than one document at a time, within the same transaction. Mostly, this could be true. However, there are cases in which it's not possible to structure the data in this way. Take the classic example of moving funds from one user's account to another. It doesn't make sense to store all users in the same aggregate as it will create a lot of contention and a very large aggregate! Therefore the only option is to deal with each user's data in separate atomic operations. Without a transaction spanning these operations, the application runs the risk of becoming inconsistent in the event of failure. Another example is when the application needs to make updates to a NoSQL datastore in the same transaction as an RDBMS or JMS interaction. Typically NoSQL datastores don't support this.

MongoDB and other NoSQL datastores scale through a combination of sharding and replica-sets. I won't go into the specifics here on how this works. However, the key point is that the data becomes distributed over several nodes. Updating multiple data items atomically requires a distributed transaction. As well as being complex to implement, under certain workloads, a distributed ACID transaction can limit scalability. I suspect it is for these reasons that very few NoSQL datastores support ACID transactions in a sharded environment.

The blocking nature of an ACID transaction is the key property that limits scalability. For the duration of the transaction, external readers and writers are blocked until the transaction completes. For contended data, this can result in lots of waiting. The longer the transaction takes to run, the worse the problem. As well as delay introduced by applications, distributing data over a cluster or multiple databases can also result in longer running transactions. However, for data with low-contention, it's possible that an ACID transaction doesn't harm your scalability, in which case you should consider using them as they are a lot simpler to deal with.

Compensating transactions offer an alternative to ACID transactions. They remove the blocking property by relaxing Isolation and Consistency. Despite offering fewer guarantees than ACID transactions, they offer significantly more guarantees than forgoing transactions altogether. Furthermore, in many applications these guarantees are enough, and any more are superfluous. ACID vs Compensating transactions are discussed in more detail in my blog series "Compensating Transactions: when ACID is too much". Here I also show a pattern for working around the relaxed properties.

Using Compensating-transactions with MongoDB

Through Narayana and WildFly's compensating transactions feature, we can fullfil the requirements stated at the start of this blog as follows:

1) Multiple document updates. 
A compensation handler is logged with each document update. In the case of failure, or if the application elects to cancel the transaction, any partially completed work is compensated, resulting in an atomic outcome. Narayana can build on the atomic update mechanism provided by MongoDB, by logging a reference to the compensating handler in the updated document, in the same atomic update as the business-logic's update. This ensures that either i) both the business-logic update, and compensating handler are persisted; or ii) neither is persisted. The handler can be removed at the end of the protocol. It is this construct that the rest of the protocol is built on, allowing recovery to be achieved regardless of what stage the protocol is in during failure.


2) Sharded environments. 
This approach builds on the atomic-update primitive offered by MongoDB. As this feature works in a sharded environment, so does the compensating-transaction that is built upon it. Furthermore, scalability is maintained due to i) the units of work, composing the transaction operating relatively-quickly; and ii) external readers and writers not being blocked during the progress of the transaction. This holds true, regardless of the duration of the transaction.

3) Support for global transactions. 
The transaction is coordinated by an external transaction manager which means multiple datastores or databases can be enlisted. As this is a general approach, it should be possible to mix the databases and datastore types. For example, an RDBMS and/or a JMS resource can also be enlisted in the global compensating-transaction as well as multiple NoSQL datastores. Furthermore, not all participants need to enlist as compensating resources. Those, which are more traditionally used with ACID transactions, can enlist as an ACID resource, using traditional XA. Here the ACID (XA) resources would experience the full ACID properties, with the compensating resources experiencing the relaxed-ACID properties of a compensating transaction.

4) A middleware solution that's simple for developers to use. 
Narayana offers an annotation-based API very similar to JTA 1.2, for using compensating transactions in your application. Furthermore, it comes pre-installed in WildFly 8, so you don't need to worry about complex setup. This API is discussed in more detail in the this blog series. Part 2 in this series will show how this API can be used to update two MongoDB documents within a compensating transaction.

Can this be done already with MongoDB?

The MongoDB documentation proposes a pattern for updating multiple documents in a relaxed-ACID transaction. This approach is similar to the Narayana approach in that they are both based on Sagas and result in similar interactions with the datastore. However, where the Narayana approach differs is that it provides a middleware-solution and so doesn't need to be developed within the application. Also, this approach is driven by a transaction manager, making it simpler for the transaction to span multiple resources.


No comments: