Narayana team blog: Narayana periodic recovery of XA transactions

Let's talk about the transaction recovery with details specific to Narayana.
This blog post is related to JTA transactions. If you configure recovery for JTS, still you can find a relevant information here but then you will need to consult the Narayana documentation.

What is the transaction recovery

The transaction recovery is process needed when an active transaction fails for some reason. It could be a crash of process of the transaction manager (JVM) or connection to the resource (database)could fail or any other reason for failure.
The failure of the transaction could happen at various points of the transaction lifetime and the point define the state which the in-progress transaction was left at. The state could be just an in-memory state which is left behind and transaction manager relies on the resource transaction timeout to release it. But it could the transaction state after prepare was called (by successful prepare call the 2PC transaction confirms that is capable to finish transaction and more of what it promises to finish the transaction with commit). There has to be a process which finishes such transaction remainders. And that process is the transaction recovery.
Let's review three variants of failures which serves three different transaction states. Their results and needs of termination will guide us through the work of the transaction recovery process.

Transaction manager runs a global transaction which includes several transaction branches (when using term from XA specification). In our article about 2PC we used (not precisely) term resource-located transaction instead of the transaction branch.
Let's say we have a global transaction containing data insertion to a database plus sending a message to a message broker queue.
We will examine the crash of transaction manager (the JVM process) where each point represents one of the three example cases. The examples show the timeline of actions and differ in time when the crash happens.

The global transaction was started and insertion to database happened, now JVM crashes. (no message was sent to queue). In this situation, all the transaction metadata is saved only in the memory. After JVM is restarted the transaction manager has no notion of existence of the global transaction in time before.
But the insertion to the database already happened and the database has some work in progress already. But there was no promise by prepare call to end with commit and everything was stored only as an in-memory state thus transaction manager relies on the database to abort the work itself. Normally it happens when transaction timeout expires.
The global transaction was started, data was inserted into the database and message was sent to the queue. The global transaction is asking to commit. The two-phase commit begins – the prepare is called on the database (resource-located transaction). Now the transaction manager (the JVM) crashes. If an interleaving data manipulation would be permitted then the 2PC commit would fail. But calling of prepare means the promise of the successful end. Thus the call of prepare causes locks to be taken to prevent other transactions to interleave.
When transaction manager is restarted but again no notion of the transaction could be found as all the in-memory state was cleared. And there was nothing to be saved in the Narayana transaction log so far.
But the database transaction is in the prepared state and with locks. On top of it, the transaction in the prepared state can't be rolled-back by transaction timeout and needs to wait for some other party to finish it.
The global transaction was started, data inserted into the database and message was sent to the queue. The transaction was asked to commit. The two-phase commit begins – the prepare is called on the database and on the message queue too. A record success of the prepare phase is saved to the Narayana transaction log too. Now the transaction manager (JVM) crashes.
After the transaction manager is restarted there is no in-memory state but we can observe the record in the Narayana transaction log and that database and the JMS queue resource-located transactions are in the prepared state with locks.

In the later cases the transaction state survived the JVM crash - once only at the side of locked records of a database, in other cases, a record is present in transaction log too. In the first case only in memory transaction representation was used where transaction manager is not responsible to finish it.
The work of finishing the unfinished transactions belongs to the recovery manager. The purpose of the recovery manager is to periodically check the state of the Narayana transaction log and resource transaction logs (unfinished resource-located transactions – it runs the JTA API call of XAResource.recover().
If an in-doubt transaction is found the recovery manager either roll it back - for example in the second case, or commit it as the whole prepare phase was originally finished with success, see the third case.

Narayana periodic recovery in details

The periodic recovery is the configurable process. That brings flexibility of the usage but made necessary to use proper settings if you want to run it.
We recommend checking the Narayana documenation, the chapter Failure Recovery.
The recovery runs periodicity (by default each two minutes) - the period could be changed by setting system property RecoveryEnvironmentBean.periodicRecoveryPeriod). When launched it iterates over all registered recovery modules (see Narayana codebase com.arjuna.ats.arjuna.recovery.RecoveryModule) and it runs the following sequence: calling the method periodicWorkFirstPass on all recovery modules, waiting time defined by RecoveryEnvironmentBean.recoveryBackoffPeriod, calling the method RecoveryEnvironmentBean.recoveryBackoffPeriod on all recovery modules.
When you want to run standard JTA XA transactions (JTS differs, you can check the config example in the Narayana code base) then you needs to configure the XARecoveryModule for the usage. The XARecoveryModule then brings to the play need of configuring XAResourceOrphanFilters which manage finishing in-doubt transactions when available only at the resource side (the second case represents such scenario).

Narayana periodic recovery configuration

You may ask how all this is configured, right?
The Narayana configuration is held in "beans". The "beans" contains properties which are retrieved by getter method calls all over the Narayana code. So configuration of the Narayana behaviour means redefining values of the demanded bean properties.
Let's check what are the beans relevant for setting the transaction management and recovery for XA transactions. We will use the jdbc transactional driver quickstart as an example.
The releavant beans are following

To configure the values of the properties you need to define it one of the following ways

via system property – see example in the quickstart pom.xml
. We can see that the property is passed at the JVM argument line.
via use of the descriptor file jbossts-properties.xml.
This is usually the main source of configuration in the standalone applications using Narayana. You can see the example jbossts-properties.xml and observe that as the standalone application is not the exception.
The descriptor has to be at the classpath for the Narayana will be able to access it.
via call of bean setter methods.
This is the programatic approach and is normally used mainly in managed environment as they are application servers as WildFly is (WildFly configures with jboss-cli).

The usage of the system property has precedence over use of the descriptor jbossts-properties.xml.
The usage of the programatic call of the setter method has precedence over use of system properties.

The default settings for the used narayana-idlj-jts.jar artifact can be seen at https://github.com/jbosstm/narayana/blob/master/ArjunaJTS/narayana-jts-idlj/src/main/resources/jbossts-properties.xml. Those are (with combination of settings inside of particular beans) default values used when you don't have any properties file defined.
For more details on configuration check the Narayana.io documentation (section Development Guide -> Configuration options).

If you want to use the programatic approach and call the bean setters you need to gain the bean instance first. That is normally done by calling a static method of PropertyManager. There are various of them depending what you want to configure.
The relevant for us are:

arjPropertyManager for CoreEnvironmentBean etc.
recoveryPropertyManager for RecoveryEnvironmentBean
jdbcPropertyManager for JDBCEnvironementBean

We will examine the programmatic approach at the example of the jdbc transactional driver quickstart inside of the recovery utility class where property controlling values of XAResourceRecovery is reset in the code.
If you search to understand what should be the exact name of the system property or entry in jbossts-properties.xml the rule of thumb is to take the short class name of the bean, add the dot and the name of the property at the end.
For example let's say you want to redefine time period for the periodic recovery cycle. Then you need to visit the RecoveryEnvironmentBean, find the name of the variable – which is periodicRecoveryPeriod. By using the rule of thumb will use the name RecoveryEnvironmentBean.periodicRecoveryPeriod for redefinition of the default 2 minutes value.
Some bean uses annotations @PropertyPrefix which offers other way of naming for the property for settings it up. In case of the periodicRecoveryPeriod we can use system property with name com.arjuna.ats.arjuna.recovery.periodicRecoveryPeriod to reset it in the same way.

Note: an interesting link could be the settings for the integration with Tomcat which useses programatic approach, see NarayanaJtaServletContextListener.

Thinking about XA recovery

I hope you have a better picture of recovery setup and how that works now.
The XARecoveryModule has the responsibility for handling recovery of 2PC XA transactions. The module is responsible for committing unfinished transactions and for handling orphans by running registered XAResourceOrphanFilter.

As you could see we configured two RecoveryModules – XARecoveryModule and AtomicActionRecoveryModule in the jbossts-properties.xml descriptor.
The AtomicActionRecoveryModule is responsible for loading resource from object store and if it is serializable and as the whole saved in the Narayana transaction log then it could be deserialized and used immediately during recovery.
This is not the case often. When the XAResource is not serializable (which is hard to achieve for example for database where we need to have a connection to do any work) the Narayana offers resource initiated recovery. That requires a class (a code and a settings) that could provide XAResources for the recovery purposes. For getting the resource we need a connection (to database, to jms broker...). The XARecoveryModule uses objects of two interfaces to get such information (to get the XAResources for recovery).
Those interfaces are

Both interfaces then contain method to retrieve the resources (XAResourceRecoveryHelper.getXAResources(),XAResourceRecovery.getXAResource()). The XARecoveryModule then ask all the received XAResources to find in-doubt transactions (by calling XAResource.recovery()) (aka. resource located transactions).
The found in-doubt transactions are then paired with transactions in Narayana transaction log store. If the match is found the XAResource.commit() could be called.
Maybe you wonder what both interfaces are mostly the same – which kind of true – but the use differs. The XAResourceRecoveryHelper is designed (and only available) to be used in programatic way. For adding the helper amogst other ones you need to call XARecoveryModule.addXAResourceRecoveryHelper(). You can even deregister the helper by method call XARecoveryModule.removeXAResourceRecoveryHelper.
The XAResourceRecovery is configured not directly in the code but via property com.arjuna.ats.jta.recovery.XAResourceRecovery. This is not viable for dynamic changes as in normal circumstances it's not possible to reset it – even when you try to change the values by call of JTAEnvironmentBean.setXaResourceRecoveryClassNames().

Running the recovery manager

We have explained how to configure the recovery properties but we haven't pointed down one important fact – in the standalone application there is no automatic launch of the recovery manager. You need manually to start it.
A good point is that's quite easy (if you don't use ORB) and it's fine to call just


      RecoveryManager manager = RecoveryManager.manager();
      manager.initialize()

This runs an indirect recovery manager (RecoveryManager.INDIRECT_MANAGEMENT) which spawns a thread which runs periodically the recovery process. If you feel that you need to run the periodic recovery in times you want (periodic timeout value is then not used) you can use direct management and call to run it manually


      RecoveryManager manager = RecoveryManager.manager(RecoveryManager.DIRECT_MANAGEMENT);
      manager.initialize();
      manager.scan();

For stopping the recovery manager to work use the terminate call.


      manager.terminate();

Summary

This blog post tried to introduce process of transaction recovery in Narayana.
The goal was to present settings necessary to be set for the recovery would work in an expected way for XA transactions and shows how to start the recovery manager in your application.

Thursday, January 11, 2018

Narayana periodic recovery of XA transactions