Wednesday, June 26, 2019

Expiry scanners and object store in Narayana

What are the expiry scanners?

The expiry scanner serves for garbage collection of aged transaction records in Narayana.
Before elaborating on that statement, let's first find out why such functionality is needed.

Narayana object store and transaction records

Narayana creates persistent records when it processes transactions. These records are saved to the transaction log called the Narayana object store. The records are used during transaction recovery when a transaction fails. The usual reasons for a transaction failure are a crash of the JVM, a network connection issue, or an internal error on a remote participant. The records are created during the processing of a transaction and are removed immediately after the transaction finishes successfully (regardless of the transaction outcome, commit or rollback). That implies that the Narayana log contains only the records of the currently active transactions and the failed ones. The records of active transactions are expected to be removed when the transaction finishes. The records of failed transactions are stored until they are recovered (finished by periodic recovery) or until they are resolved by human intervention.
...or until they are garbage collected by the expiry scanner.

Narayana stores transaction records in a hierarchical structure. The location in the hierarchy depends on the type of the record. The object store can live on the hard drive, either as a directory structure or in the journal store (the implementation used is the one created by the ActiveMQ Artemis project), or it can be placed in a database via a JDBC connection.

NOTE: Narayana object store saves data about transaction processing, but the same storage is used to persist other runtime data which is expected to survive the crash of the JVM.

Object store records for JTA and JTS

Transaction processing records are stored differently depending on whether JTA or JTS mode is used. JTA runs the transactions inside the same JVM, while JTS is designed to support distributed transactions. When JTS is used, the components of the transaction manager are not coupled inside the same JVM. The components communicate with each other via messages, regardless of whether they run within the same JVM, as different processes, or on different nodes. JTS mode saves more transaction processing data to the object store than the JTA alternative.

For standard transaction processing, JTA starts by enlisting the participants under the global transaction. Then the two-phase commit (2PC) starts and prepare is called on each participant. When the 2PC prepare phase ends, a record stating that the phase succeeded is stored in the object store. After this point, the transaction is predetermined to commit (until that point a failure would lead to rollback, see presumed rollback). The 2PC commit phase is processed by calling commit on each participant. After this phase ends, the record is deleted from the object store.
The prepare "tombstone record" states that the phase succeeded and also contains information about the successfully prepared participants which were part of the transaction.
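The sequence described above can be sketched as a toy model. This is not Narayana code: the class name, the Participant interface and the in-memory objectStore set are hypothetical and only illustrate when the prepare record is written and when it is deleted.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of two-phase commit with a prepare record; not Narayana code. */
final class TwoPhaseCommitSketch {

    /** Hypothetical participant, loosely modelled on the XAResource calls named above. */
    interface Participant {
        boolean prepare();   // true means "vote commit"
        void commit();
        void rollback();
    }

    /** Stands in for the object store: ids of transactions that have a prepare record. */
    final Set<String> objectStore = new LinkedHashSet<>();

    /** Runs 2PC for one transaction; returns true when it committed. */
    boolean run(String txId, List<Participant> participants) {
        for (Participant voter : participants) {
            if (!voter.prepare()) {
                // Presumed rollback: no record exists yet, so roll the others back.
                for (Participant other : participants) {
                    if (other != voter) {
                        other.rollback();
                    }
                }
                return false;
            }
        }
        // Prepare phase succeeded: write the record, the outcome is now commit.
        objectStore.add(txId);
        for (Participant p : participants) {
            p.commit();   // a crash here leaves the record behind for recovery
        }
        // Commit phase finished: the record is no longer needed.
        objectStore.remove(txId);
        return true;
    }

    public static void main(String[] args) {
        Participant wellBehaved = new Participant() {
            public boolean prepare() { return true; }
            public void commit() { }
            public void rollback() { }
        };
        TwoPhaseCommitSketch tm = new TwoPhaseCommitSketch();
        System.out.println("committed: " + tm.run("tx-1", List.of(wellBehaved, wellBehaved)));
        System.out.println("records left: " + tm.objectStore);
    }
}
```

If a participant throws from commit (standing in for a JVM crash), the id stays in objectStore, which is the situation the periodic recovery later has to deal with.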
 
This is what the transaction object store looks like after the prepare phase was successfully processed. The type which represents the JTA tombstone record is StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction.
data/tx-object-store/
ShadowNoFileLockStore
└── defaultStore
   ├── EISNAME
   │   ├── 0_ffff0a000007_6d753eda_5d0f2fd1_34
   │   └── 0_ffff0a000007_6d753eda_5d0f2fd1_3a
   └── StateManager
       └── BasicAction
           └── TwoPhaseCoordinator
               └── AtomicAction
                   └── 0_ffff0a000007_6d753eda_5d0f2fd1_29
In the case of JTS, the processing runs mostly the same way. One difference is that JTS saves more setup data (created once during the initialization of the transaction manager, see FactoryContact and RecoveryCoordinator). The second difference from JTA is that JTS stores the information about each prepared participant separately. For JTS the participants are separate entities and each of them handles persistence on its own. Because of that, a "prepare record" is created for each participant separately (see Mark's clarification below in comments). When XAResource.prepare is called, a record of type CosTransactions/XAResourceRecord is created. When XAResource.commit is called, the record is deleted. After the 2PC prepare phase is successfully finished, the record StateManager/BasicAction/TwoPhaseCoordinator/ArjunaTransactionImple is created, and it is removed when the 2PC commit phase is finished. The ArjunaTransactionImple record is the prepare "tombstone record" for JTS.
Take a look at what the object store looks like with two participants and a finished 2PC prepare phase:
data/tx-object-store/
ShadowNoFileLockStore
└── defaultStore
   ├── CosTransactions
   │   └── XAResourceRecord
   │       ├── 0_ffff0a000007_-55aeb984_5d0f33c3_4b
   │       └── 0_ffff0a000007_-55aeb984_5d0f33c3_50
   ├── Recovery
   │   └── FactoryContact
   │       └── 0_ffff0a000007_-55aeb984_5d0f33c3_15
   ├── RecoveryCoordinator
   │   └── 0_ffff52e38d0c_c91_4140398c_0
   └── StateManager
       └── BasicAction
           └── TwoPhaseCoordinator
               └── ArjunaTransactionImple
                   └── 0_ffff0a000007_-55aeb984_5d0f33c3_41
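
For a file-based object store like the two listings above, the record type is simply the directory path and the record id is the file name. A minimal sketch that groups the files by type could look as follows. It uses only java.nio, not any Narayana API; the class name is hypothetical and the default path is an assumption taken from the listing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists records of a directory-based object store, grouped by record type. */
final class ObjectStoreListing {

    /** The type is the parent directory of a record file, relative to the store root. */
    static Map<String, List<String>> groupByType(Path storeRoot, List<Path> recordFiles) {
        Map<String, List<String>> byType = new TreeMap<>();
        for (Path file : recordFiles) {
            Path relative = storeRoot.relativize(file);
            Path parent = relative.getParent();
            String type = parent == null ? "" : parent.toString().replace('\\', '/');
            byType.computeIfAbsent(type, t -> new ArrayList<>())
                  .add(relative.getFileName().toString());
        }
        return byType;
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real store root as the first argument.
        Path root = Path.of(args.length > 0
                ? args[0]
                : "data/tx-object-store/ShadowNoFileLockStore/defaultStore");
        try (Stream<Path> files = Files.walk(root)) {
            List<Path> records = files.filter(Files::isRegularFile)
                                      .collect(Collectors.toList());
            groupByType(root, records)
                    .forEach((type, ids) -> System.out.println(type + " -> " + ids));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Run against the JTS listing above, the output would contain one line per type, for example one for CosTransactions/XAResourceRecord with two ids.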

Now, what about the failures?

When the JVM crashes, or a network error or another transaction error happens, the transaction manager stops processing the current transaction. Depending on the type of failure, it may abandon the state and pass the responsibility for finishing the transaction to the periodic recovery manager. That's the case, for example, for the "clean" failures: a JVM crash or a network crash. The periodic recovery starts processing when the system is restarted and/or it periodically retries connecting to the participants to finish the transaction.
Continuing with the object store example above: after the JVM crashes and restarts, the periodic recovery observes that the 2PC prepare phase was finished, because there is an AtomicAction (JTA) or ArjunaTransactionImple (JTS) record in the object store. The recovery manager lists the participants (represented by XAResources) which were part of the transaction and tries to commit them.

ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource

Let me make a quick side note on one interesting point in the processing. Interesting at least from the Narayana perspective.
If you have been using the Narayana transaction manager for some time, you are probably familiar with this log message:

[com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord

This warning means: there was a successfully prepared transaction, as we can observe from the record in the object store. But the periodic recovery manager is not able to find out which participant is the counterparty, e.g. which database or JMS broker the record belongs to.
This situation arises when the failure (JVM crash) happens at a specific time: just after XAResource.commit is called. The call makes the participant (the remote side, e.g. the database) remove its knowledge about the transaction from its resource local storage. But at that particular point in time, the transaction record was not yet removed from the Narayana object store.
Because the JVM crashed, the periodic recovery observes a record in the object store after the application restarts. It tries to match the record with the information obtained from the participant's resource local storage (using the XAResource.recover call).

As the participant's resource local storage was cleaned, no information is obtained. Now the periodic recovery does not see any directly matching information for its record in the object store.
To sum up, the periodic recovery complains that there is a participant record which does not contain "connection data", as the XAResource is non-serializable, and that there is no matching record in the participant's resource local storage.

NOTE: One possibility to get rid of the warning in the log would be to serialize all the information about the participant (serializing the XAResource). Such serialized participants give the periodic recovery manager an easy way to call methods (XAResource.recover) directly on the deserialized instance. But that would mean serializing, for example, the JDBC connection, which is hardly possible.

The description above explains the JTA behaviour. In the case of JTS, if the transaction manager finds a record in the object store which does not match any participant's resource local storage info, then the object store record is considered assumed completed. That means changing the type of the record in the object store, and changing the type means moving the record to a different place in the hierarchical structure of the object store. Once the record is moved to a place unknown to the periodic recovery, the recovery stops considering it problematic and stops printing warnings to the application log. The record is then saved under ArjunaTransactionImple/AssumedCompleteServerTransaction in the hierarchical structure.
This conversion of the in-doubt record to the assumed completed one happens by default after 3 cycles of recovery. The number of cycles can be changed with the system property -DJTSEnvironmentBean.commitedTransactionRetryLimit=…

The ARJUNA016037 warning was a topic in various discussions

The warning is shown again and again in the application log. It's shown each time the periodic recovery runs, as it effectively says: there is a record and I don't know what to do with it.

NOTE: The periodic recovery runs by default every 2 minutes.

Now, what can we do about that?


Fortunately, Narayana has offered an enhancement of the recovery processing for some time already. When the participant driver (i.e. a resource manager "deployed" in the same JVM) implements the Narayana SPI XAResourceWrapper, it provides the information about which resource owns the participant record. Narayana periodic recovery is then able to deduce whether the orphaned object store record belongs to the particular participant's resource local storage. Then it can assume that the participant has already committed its work. Narayana can update its own object store and the periodic recovery stops showing the warnings.
An example of the usage of the SPI is in the ActiveMQ Artemis RA.

Transaction processing failures

Back to the transaction processing failures (JVM crash, network failure, internal participant error).
As mentioned, the "clean" failures can be handled automatically by the periodic recovery. But the "clean" failures are not the only ones you can experience. The XA protocol permits heuristic failures. Those are failures which occur when a participant does not follow the XA protocol. Such failures are not automatically recoverable by periodic recovery. Human intervention is needed.
 
Such failures occur mostly because of an internal error at the remote participant. For example, the transaction manager commands the resource to commit with the XAResource.commit call, but the resource manager responds that it has already rolled back the resource transaction on its own. In such a case, Narayana saves this unexpected state into the object store. The transaction is marked as having a heuristic outcome. The periodic recovery observes the heuristic record in the object store and reports it during each cycle.
Now, it's the responsibility of the administrator to understand the transaction state and handle it.
But if the administrator does not process such a transaction for a very long time then...

Expiry scanners

...then we are back on track to the expiry scanners.
What does it mean that a record stays in the object store for a very long time?

The "very long time" is by default 12 hours for Narayana. It's the default time after which the garbage collection process starts. This garbage collection is the responsibility of the expiry scanners. Their purpose is to clean long-staying records out of the object store. When a record is left in the heuristic state in the object store for 12 hours, or when a record without matching participant's resource local storage info is left there for that long, the expiry scanner handles it. The effect of such handling is that the periodic recovery stops observing the in-doubt participant and subsequently stops complaining about the existence of the record.

Handling a record means moving a record to a different place (changing the type of the record and placing the record to a different place in the hierarchical structure) or removing the record completely from the object store.

Available implementations of the expiry scanner

For the JTA transaction types, the following expiry scanners are available in Narayana
  • AtomicActionExpiryScanner : moves records representing the prepared transaction (AtomicAction) to a subordinate place in the hierarchy named /Expired.
  • ExpiredTransactionStatusManagerScanner : removes records about the connection setup for the status manager. This record is not connected with transaction processing and represents Narayana runtime data.

For the JTS transaction types, the following expiry scanners are available in Narayana
  • ExpiredToplevelScanner : removes the ArjunaTransactionImple/AssumedCompleteTransaction record from the object store. The AssumedCompleteTransaction originates from the type ArjunaTransactionImple and is moved to the assumed type by the JTS periodic recovery processing.
  • ExpiredServerScanner : removes the ArjunaTransactionImple/AssumedCompleteServerTransaction record from the object store. The AssumedCompleteServerTransaction originates from the type ArjunaTransactionImple/ServerTransaction/JCA and is moved to the assumed type by the JTS periodic recovery processing.
  • ExpiredContactScanner : removes the records which let the recovery manager know which Narayana instance belongs to which JVM. This record is not connected with transaction processing and represents Narayana runtime data.

Setup of expiry scanners classes

As explained elsewhere, Narayana can be set up either with system properties passed directly to the Java program or with properties defined in the descriptor file jbossts-properties.xml. If you run the WildFly application server, the system properties can be defined on the command line with -D… when starting the application server with the standalone.sh/bat script. Or they can be persistently added to the bin/standalone.conf config file.
The class names of the expiry scanners that will be active after Narayana initialization can be defined by the property com.arjuna.ats.arjuna.common.RecoveryEnvironmentBean.expiryScannerClassNames or RecoveryEnvironmentBean.expiryScannerClassNames (named differently, serving the same purpose). The property contains the fully qualified class names of implementations of the ExpiryScanner interface. The class names are separated with a space or an empty line.
An example of such settings can be seen in the Narayana quickstarts. Defined directly on the command line, it looks like this:
-DRecoveryEnvironmentBean.expiryScannerClassNames="com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner com.arjuna.ats.internal.arjuna.recovery.AtomicActionExpiryScanner"
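
The same value can also be assembled in code, for example in a test harness that starts Narayana in-process. The sketch below only builds and splits the whitespace-separated value. The helper class is hypothetical, it is not Narayana's own parsing code, and it assumes the system property is set before Narayana reads its configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Builds and splits the expiry scanner property value; not Narayana's own parser. */
final class ExpiryScannerProperty {

    // Short property name as given in the text above.
    static final String PROPERTY = "RecoveryEnvironmentBean.expiryScannerClassNames";

    /** Joins fully qualified class names into one space-separated value. */
    static String toPropertyValue(List<String> classNames) {
        return String.join(" ", classNames);
    }

    /** Splits a value on any whitespace, so spaces and line breaks both work. */
    static List<String> parse(String value) {
        return Arrays.stream(value.trim().split("\\s+"))
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String value = toPropertyValue(List.of(
                "com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner",
                "com.arjuna.ats.internal.arjuna.recovery.AtomicActionExpiryScanner"));
        // Assumption: this runs before Narayana initializes its environment beans.
        System.setProperty(PROPERTY, value);
        parse(System.getProperty(PROPERTY)).forEach(System.out::println);
    }
}
```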

NOTE: when you configure the WildFly app server you can use only the shortened property name, -DRecoveryEnvironmentBean.expiryScannerClassNames=…. The longer variant does not work because of the way the issue WFLY-951 was implemented.

NOTE2: when you are running the WildFly app server, you can find the expiry scanners enabled by default by looking into the source code of ArjunaRecoveryManagerService (consider the variants for JTA and JTS modes).

Setup of expiry scanners interval

To configure the time interval after which an "orphaned" record is handled as expired, use the property com.arjuna.ats.arjuna.common.RecoveryEnvironmentBean.expiryScanInterval or RecoveryEnvironmentBean.expiryScanInterval. The value can be a positive whole number, which defines that the records expire after that number of hours. If you define the value as a negative whole number, then the first run of the expiry scanner is skipped. The next runs of the expiry scanner expire the records after that (positive) number of hours. If you define the value to be 0, then records are never handled by the expiry scanners.
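
The three cases (positive, negative, zero) can be summarized in a few lines of code. This is a sketch of the rules as described in the paragraph above, not the Narayana implementation; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Interprets an expiryScanInterval value as described above; a sketch only. */
final class ExpiryIntervalSketch {

    private final int configuredHours;

    ExpiryIntervalSketch(int configuredHours) {
        this.configuredHours = configuredHours;
    }

    /** Zero switches the expiry handling off completely. */
    boolean scanningEnabled() {
        return configuredHours != 0;
    }

    /** A negative value means the first scanner run is skipped. */
    boolean skipFirstRun() {
        return configuredHours < 0;
    }

    /** The age limit is the absolute value of the configured number of hours. */
    Duration expiryAge() {
        return Duration.ofHours(Math.abs((long) configuredHours));
    }

    /** True when a record created at the given time should be handled as expired. */
    boolean isExpired(Instant recordCreated, Instant now) {
        if (!scanningEnabled()) {
            return false;
        }
        return Duration.between(recordCreated, now).compareTo(expiryAge()) >= 0;
    }

    public static void main(String[] args) {
        ExpiryIntervalSketch defaults = new ExpiryIntervalSketch(12);
        Instant now = Instant.now();
        System.out.println(defaults.isExpired(now.minus(Duration.ofHours(13)), now));
        System.out.println(defaults.isExpired(now.minus(Duration.ofHours(1)), now));
    }
}
```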


That's all in terms of this article. Feel free to ask a question here or at our forum at https://developer.jboss.org/en/jbosstm.

Monday, April 29, 2019

JTA and CDI integration

The Narayana release 5.9.5.Final comes with a few nice CDI functionality enhancements. This blog post introduces these changes and places them in the context of the JTA and CDI integration, with a particular focus on Weld.

TL;DR

The fastest way to find out how to use JTA with CDI is to walk through the Narayana CDI quickstart.

JTA and CDI specifications

JTA version 1.2 was published in 2013. That version introduced the integration of JTA with CDI. The specification came with the definition of the annotations javax.transaction.Transactional and javax.transaction.TransactionScoped. Those two provide a way to define transaction boundaries and to handle application data bound to the transaction.

Narayana, as an implementation of the JTA specification, provides those capabilities in the CDI Maven module.
Here are the Maven coordinates:
<groupId>org.jboss.narayana.jta</groupId>
<artifactId>cdi</artifactId>

The module brings the Narayana CDI extension to the user's project. The extension installs interceptors which manage transactional boundaries for invocations of methods annotated with @Transactional. The extension also defines the transaction scope declared with the @TransactionScoped annotation.

On top of the functionality defined in the JTA specification, the CDI specification defines some more transaction-related features. They are the transactional observer methods and the definition of the javax.transaction.UserTransaction built-in bean.

Let's summarize what that all means in practice.

@Transactional

With the @Transactional annotation, transaction boundaries can be controlled declaratively. The use of the annotation is very similar to container-managed transactions in EJB.

When the annotation is used on a bean or a method, the Narayana CDI extension (by means of a CDI interceptor) verifies the existence of the transaction context when the method is called. Based on the value of the value parameter, an appropriate action is taken. The value is taken from the enumeration Transactional.TxType.
For example, when @Transactional(Transactional.TxType.REQUIRES_NEW) is used on a method, a new transaction is started at the start of its execution. If the incoming method call carries an existing transaction, that transaction is suspended during the method execution and resumed after the method finishes. For details about the other Transactional.TxType values, consult the javadoc documentation.
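
The suspend, begin, complete, resume sequence of REQUIRES_NEW can be made visible with a toy model. The class below is hypothetical and uses plain strings instead of real transactions. It mirrors the steps described above, not the code of the Narayana interceptor.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the REQUIRES_NEW steps; transactions are just names here. */
final class RequiresNewSketch {

    private String current;      // transaction bound to the caller, null if none
    private int counter;
    final List<String> log = new ArrayList<>();

    String currentTransaction() {
        return current;
    }

    void begin() {
        current = "tx" + (++counter);
        log.add("begin " + current);
    }

    void commit() {
        log.add("commit " + current);
        current = null;
    }

    /** Suspends the caller's transaction, runs the method in a new one, then resumes. */
    void requiresNew(Runnable businessMethod) {
        String suspended = current;
        if (suspended != null) {
            log.add("suspend " + suspended);
        }
        current = null;
        begin();
        try {
            businessMethod.run();
            commit();
        } catch (RuntimeException e) {
            // Unchecked exceptions roll the new transaction back.
            log.add("rollback " + current);
            current = null;
            throw e;
        } finally {
            current = suspended;
            if (suspended != null) {
                log.add("resume " + suspended);
            }
        }
    }

    public static void main(String[] args) {
        RequiresNewSketch tm = new RequiresNewSketch();
        tm.begin();
        tm.requiresNew(() -> System.out.println("running in " + tm.currentTransaction()));
        tm.commit();
        tm.log.forEach(System.out::println);
    }
}
```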

NOTE: be aware that for the CDI container to intercept the method call, the CDI managed instance has to be used. For example, when you want to use this capability for calling a method of the same bean, you must inject the bean into itself and call the method on the injected instance.

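import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.transaction.Transactional;
import javax.transaction.Transactional.TxType;

@RequestScoped
public class MyCDIBean {
  // self-injection: calls through this reference go via the CDI proxy
  @Inject
  MyCDIBean myBean;

  @Transactional(TxType.REQUIRED)
  public void mainMethod() {
    // plain Java call: the CDI container does not wrap the invocation,
    // no new transaction is started
    innerFunctionality();

    // call on the injected instance: the CDI container intercepts it
    // and, because of TxType.REQUIRES_NEW, starts a new transaction
    myBean.innerFunctionality();
  }

  // must not be private, interceptors are not applied to private methods
  @Transactional(TxType.REQUIRES_NEW)
  public void innerFunctionality() {
    // some business logic
  }
}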
  

@TransactionScoped

@TransactionScoped brings an additional scope type on top of the standard built-in ones. A bean annotated with @TransactionScoped, when injected, lives in the scope of the currently active transaction. The bean remains bound to the transaction even when the transaction is suspended. On resuming the transaction, the scoped data are available again. If a user tries to access the bean outside the scope of an active transaction, javax.enterprise.context.ContextNotActiveException is thrown.

Built-in UserTransaction bean

The CDI specification declares that the Java EE container has to provide a bean for UserTransaction so that it can be @Injected. Notice that a standalone CDI container has no obligation to provide such a bean. The availability is expected only for the Java EE container. In Weld, the integration for the Java EE container is provided through the SPI interface TransactionServices.

If somebody wants to use Weld integrated with the Narayana JTA implementation in a standalone application, they need to implement this SPI interface (see more below).

Transaction observer methods

The transaction observer methods feature allows defining an observer with the during parameter of the @Observes annotation. The during parameter takes a value from the TransactionPhase enumeration. The value defines when the event will be delivered to the observer. The event is fired during transaction processing in the business logic, but its delivery is deferred until the transaction reaches the status defined by the during parameter.
The during parameter can take the values BEFORE_COMPLETION, AFTER_COMPLETION, AFTER_FAILURE and AFTER_SUCCESS. Using the value IN_PROGRESS means the event is delivered to the observer immediately when it's fired. It behaves as if no during parameter were used.

The implementation is based on the registration of a transaction synchronization. When the event is fired, a special new synchronization is registered, and the transaction manager invokes it later. The registered CDI synchronization code then launches the observer method to deliver the event.
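
The deferral mechanism can be illustrated with a small stand-alone model. It is not Weld or Narayana code: the AfterCompletion interface is a hypothetical stand-in for the afterCompletion callback of a transaction synchronization, and only the AFTER_SUCCESS phase is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of deferring event delivery until the transaction completes. */
final class DeferredEventSketch {

    /** Hypothetical stand-in for a synchronization's afterCompletion callback. */
    interface AfterCompletion {
        void afterCompletion(boolean committed);
    }

    private final List<AfterCompletion> synchronizations = new ArrayList<>();

    /** Firing only registers a callback; the observer is not called yet. */
    <T> void fireAfterSuccess(T event, Consumer<T> observer) {
        synchronizations.add(committed -> {
            if (committed) {
                observer.accept(event);
            }
        });
    }

    /** Called by the "transaction manager" when the transaction ends. */
    void complete(boolean committed) {
        List<AfterCompletion> toRun = new ArrayList<>(synchronizations);
        synchronizations.clear();
        toRun.forEach(sync -> sync.afterCompletion(committed));
    }

    public static void main(String[] args) {
        DeferredEventSketch tx = new DeferredEventSketch();
        tx.fireAfterSuccess("product created", e -> System.out.println("observer got: " + e));
        System.out.println("event fired, transaction still running");
        tx.complete(true);
    }
}
```

Without a place to register the callback there is nothing to defer the delivery to, so the event can only be delivered immediately.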

For the during parameter to work and for the events to be deferred, Weld requires integration through the TransactionServices SPI. The interface defines a method which makes it possible for Weld to register the transaction synchronization. If the integration with TransactionServices is not provided, the user can still use the during parameter in their code. But(!) no matter what TransactionPhase value is used, the event is not deferred and is delivered to the observer immediately. The behaviour is the same as when the IN_PROGRESS value is used.

It may be worth clarifying who fires the event. The event is fired by the user code. For example, take a look at the example in the Weld documentation. The user code injects an event and fires it when it considers it necessary.

@Inject @Any Event<Product> productEvent;
...
public void persist(Product product) {
  em.persist(product);
  // select the @Created qualifier and fire the event with the product as payload
  productEvent.select(new AnnotationLiteral<Created>(){}).fire(product);
}
The observer is defined in the standard way and uses during so that the event delivery is deferred until the transaction finishes successfully.
void addProduct(@Observes(during = AFTER_SUCCESS) @Created Product product) {
...
}

A bit more about TransactionServices: Weld and JTA integration

As said, integrating Weld CDI with JTA requires implementing the TransactionServices SPI interface. The interface gives Weld the chance to obtain the UserTransaction so that the built-in bean can provide it when it's @Injected. It provides the way to register a transaction synchronization so that an event can be deferred until a particular transaction status occurs. In addition, it demands the implementation of the method isTransactionActive. The TransactionScoped context is active only when there is an active transaction, and this method is how Weld obtains the transaction activity state.

Regarding the implementation, you can look at how the interface TransactionServices is implemented in WildFly or in the more standalone way for SmallRye Context Propagation.

New Narayana CDI features

Narayana brings two new CDI JTA integration capabilities, on top of those described above.

The first enhancement is the addition of the transactional scope events. Until now, Narayana did not fire the scope events for @TransactionScoped. From now on, Narayana fires the scope events automatically. The user can observe the initialization and the destruction of the transaction scope. The code for the observer could look like this:

void transactionScopeActivated(
  @Observes @Initialized(TransactionScoped.class) final Transaction event,
  final BeanManager beanManager) {
...
}
The event payload for @Initialized is the javax.transaction.Transaction; for @Destroyed it is just a java.lang.Object (when the transaction scope is destroyed there is no active transaction anymore).
As Narayana currently implements CDI in version 1.2, no event is fired for @BeforeDestroyed. That scope event was introduced in CDI version 2.0.

The second enhancement is the addition of two built-in beans which can be @Injected in the user code. Those are the beans TransactionManager and TransactionSynchronizationRegistry.

The implementation gives priority to the JNDI binding. If a TransactionManager/TransactionSynchronizationRegistry is bound in JNDI, then that instance is returned at the injection point.
If the user defines their own CDI bean or a CDI producer which provides an instance of those two classes, then that instance is used for the injection.
As the last resort, the default Narayana implementation of both classes is used. Those are TransactionManagerImple and TransactionSynchronizationRegistryImple.
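
Read as a lookup chain, the order above is: JNDI first, then a user-provided bean, then the Narayana default. A generic sketch of such a chain is shown below. The class and method names are hypothetical and the sources are plain suppliers, not real JNDI or CDI lookups.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Generic "first available source wins" lookup; not the Narayana extension code. */
final class LookupOrderSketch {

    /** Asks each source in order and falls back to the default when all are empty. */
    @SafeVarargs
    static <T> T resolve(Supplier<T> fallback, Supplier<Optional<T>>... sources) {
        for (Supplier<Optional<T>> source : sources) {
            Optional<T> candidate = source.get();
            if (candidate.isPresent()) {
                return candidate.get();
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        // Stand-ins for the real lookups: nothing bound in JNDI, one user bean.
        Supplier<Optional<String>> jndi = Optional::empty;
        Supplier<Optional<String>> userBean = () -> Optional.of("user-defined bean");
        String chosen = resolve(() -> "Narayana default", jndi, userBean);
        System.out.println(chosen);
    }
}
```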

Using the transactional CDI extension

The easiest way to see the integration in action is to run our JTA standalone quickstart. There you can observe the implementation of the Weld SPI interface TransactionServices. You can check the use of the observers, both the transaction observer methods and the transaction scope observers. In addition, you can see the use of the transaction scope and the injection of the TransactionManager.

Acknowledgement

Big thanks to Laird Nelson who contributed the new CDI functionality enhancements to Narayana.
And secondly thanks to Matěj Novotný for his help in understanding the CDI topic.