Wednesday, May 28, 2014

Bringing Transactional Guarantees to MongoDB: Part 1

In this blog post I'll present some recent work we've been doing to bring stronger transactional guarantees to MongoDB. In part 2 I'll present a code example that shows this in action in WildFly 8.

What requirements are we fulfilling? 

1) Updating multiple MongoDB documents in a single transaction
2) Support for sharded environments, without harming scalability
3) Support for global transactions spanning other datastores and traditional relational databases.
4) A middleware solution that's simple for developers to use.

This post covers the background and explains why a compensating transaction (vs an ACID transaction) could be the best fit to meet the above requirements. Part two in this series is more implementation focused. It presents a code example, showing you how you can use the technology, whilst omitting a lot of the theory (that is covered in this post).

Background

NoSQL datastores were originally built as bespoke, in-house solutions, to meet scalability requirements that it was felt relational databases couldn't meet. The general thinking was that ACID transactions would harm scalability and that it was better to work around that requirement. However, as NoSQL adoption spread beyond its in-house roots, it became clear that many applications do indeed need the level of reliability that transactions can bring.

Typically a NoSQL datastore offers atomic updates to single items, such as a document or key-value pair (more generally, an aggregate). Structuring data into aggregates can therefore mean that the application never needs to update more than one document within the same transaction. In most cases this holds. However, there are cases in which it's not possible to structure the data in this way. Take the classic example of moving funds from one user's account to another. It doesn't make sense to store all users in the same aggregate, as that would create a lot of contention and a very large aggregate! Therefore the only option is to deal with each user's data in separate atomic operations. Without a transaction spanning these operations, the application runs the risk of becoming inconsistent in the event of failure. Another example is when the application needs to make updates to a NoSQL datastore in the same transaction as an RDBMS or JMS interaction. Typically NoSQL datastores don't support this.

MongoDB and other NoSQL datastores scale through a combination of sharding and replica-sets. I won't go into the specifics here on how this works. However, the key point is that the data becomes distributed over several nodes. Updating multiple data items atomically requires a distributed transaction. As well as being complex to implement, under certain workloads, a distributed ACID transaction can limit scalability. I suspect it is for these reasons that very few NoSQL datastores support ACID transactions in a sharded environment.

The blocking nature of an ACID transaction is the key property that limits scalability. For the duration of the transaction, external readers and writers are blocked until the transaction completes. For contended data, this can result in lots of waiting. The longer the transaction takes to run, the worse the problem. As well as delay introduced by applications, distributing data over a cluster or multiple databases can also result in longer-running transactions. However, for data with low contention, it's possible that ACID transactions won't harm your scalability, in which case you should consider using them as they are a lot simpler to deal with.

Compensating transactions offer an alternative to ACID transactions. They remove the blocking property by relaxing Isolation and Consistency. Despite offering fewer guarantees than ACID transactions, they offer significantly more guarantees than forgoing transactions altogether. Furthermore, in many applications these guarantees are enough, and any more are superfluous. ACID vs Compensating transactions are discussed in more detail in my blog series "Compensating Transactions: when ACID is too much". Here I also show a pattern for working around the relaxed properties.

Using Compensating-transactions with MongoDB

Through Narayana and WildFly's compensating transactions feature, we can fulfil the requirements stated at the start of this blog as follows:

1) Multiple document updates. 
A compensation handler is logged with each document update. In the case of failure, or if the application elects to cancel the transaction, any partially completed work is compensated, resulting in an atomic outcome. Narayana can build on the atomic update mechanism provided by MongoDB, by logging a reference to the compensating handler in the updated document, in the same atomic update as the business-logic's update. This ensures that either i) both the business-logic update, and compensating handler are persisted; or ii) neither is persisted. The handler can be removed at the end of the protocol. It is this construct that the rest of the protocol is built on, allowing recovery to be achieved regardless of what stage the protocol is in during failure.


2) Sharded environments. 
This approach builds on the atomic-update primitive offered by MongoDB. As this feature works in a sharded environment, so does the compensating transaction that is built upon it. Furthermore, scalability is maintained because i) the units of work that make up the transaction each complete relatively quickly; and ii) external readers and writers are not blocked while the transaction is in progress. This holds true regardless of the duration of the transaction.

3) Support for global transactions. 
The transaction is coordinated by an external transaction manager, which means multiple datastores or databases can be enlisted. As this is a general approach, it should be possible to mix the databases and datastore types. For example, an RDBMS and/or a JMS resource can be enlisted in the global compensating transaction alongside multiple NoSQL datastores. Furthermore, not all participants need to enlist as compensating resources. Those that are more traditionally used with ACID transactions can enlist as ACID resources, using traditional XA. Here the ACID (XA) resources would experience the full ACID properties, with the compensating resources experiencing the relaxed-ACID properties of a compensating transaction.

4) A middleware solution that's simple for developers to use. 
Narayana offers an annotation-based API, very similar to JTA 1.2, for using compensating transactions in your application. Furthermore, it comes pre-installed in WildFly 8, so you don't need to worry about complex setup. This API is discussed in more detail in this blog series. Part 2 in this series will show how this API can be used to update two MongoDB documents within a compensating transaction.
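
To make point 1 a little more concrete, here is a minimal plain-Java sketch of the funds-transfer example from the background section. It is only an in-memory simulation under stated assumptions, not the Narayana API or the MongoDB driver: `AccountDoc`, `applyWithCompensation`, `compensate`, `confirm` and `transfer` are all hypothetical names, and a synchronized method stands in for MongoDB's atomic single-document update. The point it tries to show is that the business change and the undo information are written together, so a failure part-way through can always be compensated.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class CompensatingTransferSketch {

    /** Hypothetical stand-in for a MongoDB document; each synchronized
     *  method models one atomic single-document update. */
    static final class AccountDoc {
        private long balance;
        // txId -> amount to apply if the transaction has to be compensated
        private final Map<String, Long> pendingCompensations = new HashMap<>();

        AccountDoc(long balance) { this.balance = balance; }

        /** One atomic update: apply the business change AND record how to undo it. */
        synchronized void applyWithCompensation(String txId, long delta) {
            if (balance + delta < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            balance += delta;
            pendingCompensations.merge(txId, -delta, Long::sum);
        }

        /** Undo this transaction's work on the document, if any was recorded. */
        synchronized void compensate(String txId) {
            Long undo = pendingCompensations.remove(txId);
            if (undo != null) {
                balance += undo;
            }
        }

        /** End of protocol: drop the handler reference, keep the business change. */
        synchronized void confirm(String txId) { pendingCompensations.remove(txId); }

        synchronized long balance() { return balance; }

        synchronized boolean hasPending(String txId) {
            return pendingCompensations.containsKey(txId);
        }
    }

    /** Moves funds between two documents; returns true if the transfer completed. */
    static boolean transfer(String txId, AccountDoc from, AccountDoc to,
                            long amount, boolean failAfterDebit) {
        Deque<AccountDoc> touched = new ArrayDeque<>();
        try {
            from.applyWithCompensation(txId, -amount);
            touched.push(from);
            if (failAfterDebit) {
                throw new RuntimeException("simulated failure between updates");
            }
            to.applyWithCompensation(txId, amount);
            touched.push(to);
        } catch (RuntimeException e) {
            // Compensate completed work in reverse order.
            while (!touched.isEmpty()) {
                touched.pop().compensate(txId);
            }
            return false;
        }
        for (AccountDoc doc : touched) {
            doc.confirm(txId);
        }
        return true;
    }

    public static void main(String[] args) {
        AccountDoc alice = new AccountDoc(100);
        AccountDoc bob = new AccountDoc(0);
        System.out.println("tx1 completed: " + transfer("tx1", alice, bob, 30, false));
        System.out.println("tx2 completed: " + transfer("tx2", alice, bob, 50, true));
        System.out.println("alice=" + alice.balance() + ", bob=" + bob.balance());
    }
}
```

Note that between the debit and its compensation, other readers can see the intermediate balance. That is the relaxed isolation discussed earlier: nothing is blocked, but the application has to tolerate (or work around) seeing in-flight state.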

Can this be done already with MongoDB?

The MongoDB documentation proposes a pattern for updating multiple documents in a relaxed-ACID transaction. This approach is similar to the Narayana approach in that they are both based on Sagas and result in similar interactions with the datastore. However, where the Narayana approach differs is that it provides a middleware-solution and so doesn't need to be developed within the application. Also, this approach is driven by a transaction manager, making it simpler for the transaction to span multiple resources.


Tuesday, May 27, 2014

Research Worth Knowing on CAP & ACID

The last couple of blog posts on NoSQL/SOA/large-scale and transactions got me thinking that maybe I hadn't mentioned another interesting research effort that also attempts to show how ACID transactions are possible in environments where some believe they aren't applicable. The work is being done under the banner of HA Transactions (HAT), and there's a more recent paper on the topic from VLDB 2014; they also talk about the use of different transaction models such as Sagas, which were part of the input to WS-TX and our REST transactions work. And of course there's always HPTS over the years!

Monday, May 26, 2014

Transactions and Microservices

I've written elsewhere that I think the term Microservices is really just referring to good SOA principles (why we need another term I really don't quite understand). But for whatever reason, articles and blog entries on Microservices seem to be in vogue at the moment. A recent entry on InfoQ goes into some depth on various aspects of what they are and how to use them. Unfortunately the author talks about transactions in the context of Microservices (aka SOA) and has this to say:

"One solution, of course, is to use distributed transactions. For example, when updating a customer’s credit limit, the CustomerService could use a distributed transaction to update both its credit limit and the corresponding credit limit maintained by the OrderService. Using distributed transactions would ensure that the data is always consistent. The downside of using them is that it reduces system availability since all participants must be available in order for the transaction to commit. Moreover, distributed transactions really have fallen out of favor and are generally not supported by modern software stacks, e.g. REST, NoSQL databases, etc."

Huh? This paragraph is wrong on so many levels that I really don't know where to start! For a start "generally not supported by modern software stacks"? Seriously?! Others have spoken about REST and transactions for years, but we've done our own work for over a decade! You also don't have to look too far on this blog for references to NoSQL and transactions (extended transactions or ACID). And of course there's Google's Spanner! ACID transaction support is a key part of this!

Over the years, during the initial SOA, WS-* and REST debates, kicking transactions out of the picture was a convenient thing for many people to do. Fortunately, sanity and a better understanding of where they can and should be used have seen them and their variants returning to these environments. I had hoped that those days were over, but it seems that with Microservices we're turning back the clock yet again. Oh well, time to dust off those old papers, blog posts etc. as it seems there's life left in them thanks to Microservices!

Friday, April 25, 2014

What's that we've been saying about transactions and NoSQL ..?

Well, we've been saying for years that transactions are important and, whilst not every use case needs them, some pretty important ones really do! Our very own Paul Robinson had something to say about this at DevNation the other week and his presentation will be online very soon.

Friday, January 10, 2014

Narayana Transaction Analyser 1.0.0.Alpha1

In a previous post I introduced the Narayana Transaction Analyser (NTA). In that post I focused on the high-level goals of the project and provided some insight into where we hope to take this tooling. In this post I will focus on what features we have today in the 1.0.0.Alpha1 release and how to get started. I'll also close with an overview of what we hope to add in the coming releases.

Overview of Features in 1.0.0.Alpha1

The main set of features in this release are as follows:
  • A simple getting started experience. Drop the .ear file into WildFly or EAP 6.2.0's deployments directory and click "start" in the Transaction Analyser's console, to begin analysis.
  • Detailed information on JTA transactions. This includes the outcome of the transaction, which resources were enlisted and how they behaved.
  • Demonstrator application. You may want to try out this tooling, but you don't currently have a misbehaving application to try it with. This demonstrator allows you to trigger various successful and unsuccessful transaction scenarios, on-demand.

Getting Started

Currently NTA supports WildFly 8.0.0.CR1 and EAP 6.2.0 onwards. To install and enable NTA do the following:

  1. Download NTA from here.
  2. Copy the nta-full.ear file to $JBOSS_HOME/standalone/deployments/
  3. Start the application server, if not already started.
  4. Visit the console here: http://localhost:8080/nta
  5. Click start in the top-right of the console to enable analysis.

If you want to simulate some failing transactions, simply deploy our demo application and trigger some scenarios:
  1. Download the demo.war from here.
  2. Copy the demo.war file to $JBOSS_HOME/standalone/deployments/
  3. Visit the demo application here: http://localhost:8080/txdemo
  4. You will see a list of four scenarios. Click on invoke for the scenario of interest.
  5. Switch to the NTA console and observe the details of the transaction.
  6. Click on the transaction ID link to view more details of the transaction.

Example detecting a timeout

In this section I'll show how a timed-out transaction can be detected. I'll also use this as an opportunity to highlight some of the additional information that can be viewed with the tool.



Visit the console and click 'start' to begin analysis.


Open the demo application and click on "Invoke" for the "2) Transaction Timeout" scenario. After two seconds, an alert will pop up displaying the (failed) outcome of the transaction.



Now switch back to the NTA console. You will see a single transaction listed. Notice that the status is 'TIMEOUT' and that the duration was a little over 1 second. This is understandable as the scenario set the transaction timeout to 1 second, whilst the business logic took 2 seconds to complete before attempting to commit the transaction.
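
If you're wondering why the duration is roughly the timeout rather than the full two seconds, the following plain-Java simulation may help. It's a sketch under assumptions, not the JTA or Narayana API: `SimulatedTransaction`, `timeOut` and `run` are hypothetical names, and a scheduled task plays the role of the transaction manager's timeout handling. The transaction is ended when the timeout fires, so the later commit attempt simply finds it already timed out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TimeoutScenarioSketch {

    enum Status { ACTIVE, COMMITTED, TIMEOUT }

    /** Hypothetical stand-in for a transaction; not the JTA or Narayana API. */
    static final class SimulatedTransaction {
        private final long startNanos = System.nanoTime();
        private long endNanos;
        private Status status = Status.ACTIVE;

        /** Called when the timeout expires; ends the transaction if still active. */
        synchronized void timeOut() {
            if (status == Status.ACTIVE) {
                status = Status.TIMEOUT;
                endNanos = System.nanoTime();
            }
        }

        /** Commit only succeeds if the transaction is still active. */
        synchronized Status commit() {
            if (status == Status.ACTIVE) {
                status = Status.COMMITTED;
                endNanos = System.nanoTime();
            }
            return status;
        }

        synchronized Status status() { return status; }

        /** Time between begin and the moment the transaction ended. */
        synchronized long durationMillis() {
            return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        }
    }

    /** Begins a transaction, runs "business logic" for workMillis, then tries to commit. */
    static SimulatedTransaction run(long timeoutMillis, long workMillis) {
        ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
        try {
            SimulatedTransaction tx = new SimulatedTransaction();
            Runnable expire = tx::timeOut;
            reaper.schedule(expire, timeoutMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(workMillis); // the slow business logic
            tx.commit();
            return tx;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during business logic", e);
        } finally {
            reaper.shutdownNow(); // cancel the timeout if it has not fired yet
        }
    }

    public static void main(String[] args) {
        // Same shape as the demo scenario: 1 second timeout, 2 seconds of work.
        SimulatedTransaction tx = run(1000, 2000);
        System.out.println(tx.status() + " after " + tx.durationMillis() + " ms");
    }
}
```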



Clicking on the TransactionID of the timed-out transaction takes you to a details page. Here we can see:

1) Transaction Information. This area provides high-level details about the transaction, such as the outcome (timeout, in this case), details of the duration and start/stop time as well as a note on whether it was distributed.

2) Enlisted Participants. This area lists information on all the participants enlisted in the transaction and how they behaved. In this example, the transaction timed out before the two-phase protocol began. Therefore, the participants were never asked to prepare. The console reports the Vote as UNKNOWN. This text is a bit misleading and will be improved (see: NTA-39).

3) Event Timeline. This area provides a list of interesting events that occurred during the transaction. In this example we can see when the transaction was begun and then what participants were enlisted. The ABORT event occurs in response to the transaction timeout. This will be improved to make it clearer that the transaction was aborted here due to timeout (see: NTA-40).

Clicking on the other scenarios and observing the outcome in NTA is left as an exercise for the reader.

Upcoming Features

We are community-driven in prioritising features, so please provide feedback on the Narayana forums after you've tried the tool. The current set of features planned for the next few releases are as follows:

1) Distributed Transaction support. Alex Creasy (the intern who developed the prototype) has implemented support for analysing distributed (JTS) transactions. With the 1.0.0.Alpha1 release, manual setup is required. In the Alpha2 release we hope to polish this process, making it a lot simpler to use.

2) Plugin support. Alex also implemented support for simple plugins that detect common transactional issues. This feature requires a little polish and is currently undocumented. We hope to resolve this in the Alpha2 or Alpha3 release.

3) Closer integration with the WildFly and JBoss ecosystem. Currently NTA is a standalone tool. Over the coming releases we aim to align and integrate this tooling more closely with the existing tooling provided by WildFly and JBoss EAP. We are currently seeking community input on how best to achieve this.

4) Import/export of data. This allows the gathered data to be transferred to a third party who's helping to diagnose an issue. For example, when requesting support via a forum post or support ticket. This is currently targeted for the Alpha2 or Alpha3 release.

We also have lots of other features planned. For a more complete list (and up-to-date roadmap), visit the NTA Jira instance.

Getting Involved



What to do if NTA doesn't display what you expect

NTA is currently Alpha quality, so it's possible that it won't always display the correct information. If you suspect this to be the case it would be great to hear from you, so that we can fix the issue. In this case, you should create a post on the Narayana forum. It would be great if you could provide a screen-shot of what NTA is displaying, as well as the server log file (if it's not too large). NTA parses this log file, so we should be able to compare this to the screen-shot when figuring out the problem. We can also use NTA to parse this log file making it easy to reproduce your issue.

Introducing the "Narayana Transaction Analyser"

In this post I'll interview Paul Robinson, who is the project lead of the new "Narayana Transaction Analyser" project. In the interview we'll aim to provide an introduction to the tool and help you get a feel for how it could help you as an application developer.

Tom: Can you provide some background for the tool?

Paul: Back when I was a JBoss Transactions consultant, my clients and I often found it difficult to discover the cause of failing transactions. Since then I've felt that it would be great to have tooling that tells me everything about all the transactions run in my application server. With this information I would be able to see exactly why my transactions are not behaving as I would like. I could also use this information to understand more about my architecture, by displaying a high-level topology of servers and transactional resources involved in my transaction.

Last summer we hired Alex Creasy as an Intern to work on a prototype of this tool. From his excellent work the "Narayana Transaction Analyser" was born. Since Alex completed his prototype, we have promoted it to a project under the 'Narayana' umbrella and produced our first release. Today, I'd like to focus on providing an overview of what we aim to achieve with this tool. I'll follow up with a subsequent post, focusing on what features we have in the recent 1.0.0.Alpha1 release and how to get started.

Tom: What are the goals of this tool?


Paul: The main requirement of the Transaction Analyser is to make it significantly simpler to diagnose transaction-related problems. As well as providing detailed information on every transaction, the tool can also be loaded with a suite of plugins that diagnose common issues. It should also be possible to export this data. This exported data can then be uploaded alongside a support ticket or forum posting, giving the person providing the assistance more data to work with. 


Tom: When would I use this tool?


Paul: In general, the tool should be enabled when you are experiencing some transaction-related issues and you require more information. You need to be mindful of when this tool is enabled as gathering this data does impose an overhead on the system. Think of this tool as being similar to a performance profiler, like JProfiler. You just enable this tool when you detect an issue that requires more investigation.

Tom: Sounds interesting, can you give me some examples of what I could use this for?


Paul: The following list should give you a feel for what type of issues the tool can investigate.

Many of your transactions are rolling back, and you don't know why.

This often occurs when a timeout is triggered due to business logic taking too long to complete. The tool lists all transactions that were rolled back due to timeout. The tool may also be able to provide details on what the business logic was doing, making it easier to track down the root cause. For example, any JPA queries run within the transaction could be displayed.

You have a distributed transaction crossing many servers, and you're finding it difficult to correlate the many log files. 

The tool is distributed-transaction aware and groups together all the data from a single transaction that spans multiple servers. Currently the focus is on supporting JTS, but gathering data on Web Service and REST transactions is possible.

You have a heuristic transaction, but you don't know which resource misbehaved. 

The tool shows all resources enlisted in the transaction and provides details on how they behaved in the transaction. It is relatively simple to see which resource didn't behave as instructed.

A transaction appears to have 'hung', but you don't know why. 

This often occurs due to deadlock when trying to obtain a lock on some resource. This is the type of issue that often requires expert knowledge to track down and a good understanding of the log output. However, this process can be automated by this tool, which notifies the user when the problem is detected. As well as notifying the user, the tool can also link to some useful documentation explaining how to fix the problem.

Someone is assisting you with an issue and they would like some more details. 

Providing this person with a dump from this tool will provide them with a wealth of information, hopefully making it easier for them to assist you. After fixing the problem, a plugin could be developed so that the issue is automatically detected whenever anyone else experiences that issue in the future.

Tom: How do I try it?


Paul: We recently released NTA 1.0.0.Alpha1. In this blog post I walk through the current feature set and how to get the tool up-and-running.

Thursday, October 17, 2013

The Narayana project visualized by Gource

We just ran the excellent Gource tool over the Narayana repo and generated this rather neat visualization of our contributions over the last few years.
Software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project.



If you want to see your avatar on the video next time we generate one, please do consider adding a contribution over here: http://github.com/jbosstm/narayana