Thursday, March 10, 2011

NoSQL != NoTx

The extent of support for transaction concepts in NoSQL systems is almost as diverse as the systems themselves. Some have not only full support for 'native' i.e. local transactions, but also support distributed transactions using XA. Others barely support CAS or durable write, much less any higher level constructs. This is not necessarily a bad thing, but does make life 'interesting' for those of us trying to build consistent abstractions over various systems, or to use them as components in the composition of large, reliable systems.

My current preference in NoSQL is Cassandra, which tends towards the 'no support for transactions' end of the spectrum and therefore presents a more interesting challenge than systems that come with XA already baked in.

Cassandra is architected as a BASE rather than an ACID system. In CAP terms it favours availability and partition tolerance over consistency. That's not to say there is no hope of consistency, just that you have to make some compromises if you want it.

There is durability (but don't forget to switch CommitLogSync from periodic to batch), and atomicity in so far as operations on a single row key are atomic. Batch mutates are non-atomic and there is no locking primitive so you can't do much in the way of consistency over multiple API calls.

Non of which is necessarily a problem if you're aware of it and writing your app from the ground up with those constraints in mind. In which case, lucky you. The rest of us get to endure some degree of pain, particularly when migrating, either mentally or physically, from a SQL environment.

There are several ongoing efforts to put a JPA like abstraction on top of Cassandra, thus making it readily accessible to the hordes of JPA trained Java EE programmers out there. Thus far these focus mainly on the data mapping and indexing considerations for query support, whilst avoiding or deferring the tricky transactional bits. That's not to say there is no appreciation of the looming problem, just that it seems nobody has stepped up and tackled it seriously yet.

In order to build an even semi-credible OGM façade over Cassandra or some other approximately key-value oriented store, we need support for certain transactional characteristics, of which repeatable read is perhaps the most fundamental.

The expectation of repeatable read is that if I retrieve the value for a given key twice in the same transaction, I expect it to have the same value, regardless of other users manipulating the database in the meanwhile. If I perform some form of search or index based query multiple times I expect the result set to be consistent within the transaction. If I update a value and then retrieve it, I expect to see the result of the update within the transaction even before I make a commit.

In an RDBMS it's the database engine's job to deal with this and all of them do, using mechanisms from locking to MVCC. All the ORM layer has to do is delegate the problem to the store. In a NoSQL world where the store has no such support and no plans to add any, the client has to do some additional work.

For simple key lookups the client can cache the result locally and use that copy to serve subsequent reads in the same transaction, effectively doing its own MVCC. Most ORMs do this already for performance, although they'll flush modifications and use the relational db engine to perform queries rather than compute the result over the cached copies, which avoids reimplementing large chunks of a db in the ORM but is not feasible when there is no support for it in the store layer.

The client cache approach is fine for small tx, but becomes a problem when the read size is large. Although they are arguably an anti-pattern, tx on large sets can be supported by using a backing store, perhaps the NoSQL engine itself, to store the cache copies when they overspill the available client RAM. However, the client needs some way of isolating these private copies and for rewriting queries to use them.

Alternatively, it can use an external lock manager to perform read/write locking on keys to prevent them being changed for the duration of the transaction. However, this only works where all access to the stores is going via clients that are lockmanager aware. In siloed deployments where a single app exclusively owns the data that is not unreasonable. In enterprise deployments where a single store is shared by multiple apps, each perhaps using a different client library, potentially in a different language, it gets a bit hairy.

The Cassandra community's best effort in this direction thus far is Cages, written by Dominic Williams. Cages is a Java client solution that uses ZooKeeper as a distributed lock manager. It's not a general purpose solution though, as it assumes the complete list of locks required in known in advance. That's fine for apps written to Cassandra's native usage model, where the assumption is that you've laid out your data based on knowing exactly what queries you're going to be performing.

Where you are trying to use Cassandra to support arbitrary JPA-QL style queries it's going to fall short though. The transaction systems in traditional databases allow for gradual acquisition of locks on demand over the course of multiple operations in the transaction and JPA or any other interface that supports ad-hoc querying is going to assume that support. The locking model used by Cages also assumes relatively low contention and is likely to scale poorly compared to an MVCC based solution. Fair enough - it's written by an end user to scratch their own itch, so it's simpler and less powerful than middleware written for the general case.

Whilst a good locking system allows for the provision of several transactional characteristics, it's not the only piece we need to provide full ACID behaviour. In particular, it's not going to help us with durable atomic updates over multiple keys. Nor does it address the issues of making updates of the NoSQL store atomic with respect to actions in other systems the app may be using, such as a message queue or relational databases. There is an expectation that such components support XA, allowing them to participate in distributed transactions coordinated by a JTA.

We're still a long way from being able to use Cassandra as a full member of the transactional resource ecosystem and it's going to be an interesting journey to get there, even assuming we should try. Sounds like it could be a fun trip for those who like adventures though.

No comments: