Tuesday, June 25, 2024

Some experiments in migrating transaction logs

Transaction stores

Some time ago I prototyped a Redis based implementation of the SlotStore backend suitable for installations where nodes hosting the storage can come and go making it well suited for cloud based deployments of the Recovery Manager.

In the context of the CAP theorem of distributed computing, the recovery store needs to behave as a CP system, ie it needs to be able to tolerate network partitions and yet continue to provide Strong Consistency. Redis can provide the strong consistency guarantee if the RedisRaft module is used with Redis running as a cluster. RedisRaft achieves consistency and partition tolerance by ensuring that:

  • acknowledged writes are guaranteed to be committed and never lost,
  • reads will always return the most up-to-date committed write,
  • the cluster is sized correctly: a RedisRaft cluster of 3 nodes can tolerate a single node failure and a cluster of 5 can tolerate 2 node failures, … ie if the cluster is to tolerate losing N nodes then the cluster size must be at least 2*N+1, thus the minimum cluster size is 3 and the reason for having an odd number of nodes in the cluster is to avoid “split brain” scenarios during network partitions; an odd number guarantees that one side of the split will be in the majority.

During network splits the cluster will become unavailable for a while, ie the cluster is designed to survive failures of a few nodes in the cluster, but it is not a suitable solution for applications that require availability in the event of large net splits, however transaction systems favour Consistency over Availability.

A key motivator for this new SlotStore backend is to address a common problem with using the Narayana transaction stores on cloud platforms when scaling down a node that has in doubt transactions which can leave them unmanaged. Most cloud platforms can detect crashed nodes and restart them but this must be carefully managed to ensure that the restarted node is identically configured (same node identifier, same transaction store and same resource adapters). The current cloud solution, when running on Openshift, is to use a ReplicaSet and to veto scale down until all transactions are completed which can take an indeterminate amount of time, but if we can ask another member of the deployment to finish these in doubt transactions then all but the last node can be safely shutdown even with in doubt transactions. The resulting increase in availability in the presence of node or network failures is a significant benefit for transactional applications which, after all, is key reason why businesses are embracing cloud based deployments.

Remark: Redis is offered as a managed service on a majority of cloud platforms which can help customers to get started with this solution. But note that standard Redis excludes the RedisRaft module which is a requirment for use as a transaction store.

A Redis backed store

Redis is a key value store. Keys are stored in hash slots and hash slots are shared evenly amongst the shards (keys -> hash slots -> shards), the redis cluster specication contains the details. Re-sharding involves moving hash slots to other nodes, impacting performance. Thus, if we can control which hash slots the [slot store] keys map onto then we can improve performance under both normal and failure conditions. This periodic rebalancing of the cluster can be optimised if keys belonging to the same recovery manager are stored in the same hash slot, additionally having the keys, for a particular recovery node, colocated on a single cluster node is good for the general performance of the transaction store.

Also noteworthy is that the keys mapped to a particular hash slot can operated upon transactionally, which is not the case for keys in different slots, meaning that no inter-node hand-shaking is required. This feature opens up the possibility, perhaps, of allowing concurrent access to recovery logs by different recovery managers - but that’s something for a future iteration of the design, but if the logs in a store are shared then be aware that some Narayana recovery modules cache records so those implementations would need to re-evaluated, noting in particular that Redis has support for optimistic concurrency using the watch API which clients can use to observe updates to key values by other recovery managers.

Key space design

A recovery manager has a unique node identifier. We’d like to be able to form “recovery groups” such that any recovery manager in the group can manage transactions created by the others, but not at the same time. To this end we assign a “failoverGroupId” to each recovery manager and use that as the Redis key prefix. This will force all keys created by members of the failover group into the same hash slot, a cloud example of this idea is that the pods in a deployment would all share the same failoverGroupId so any pod in the deployement can take over when the deployment is scaled down.

Failover

Failover involves detecting when a member of the “recovery group” is removed from the cluster and to then migrate the keys to another member of the group. I added an example to the LRA recovery coordinator and used the jedis redis API rename command to “migrate” the keys which is an atomic operation.

Issues

The performance of Redis Raft in this implementation of the SlotStore backend is poor (more than 4 times slower than the default store); I have not invested any effort on improving it but may follow up with another post to discuss throughput since it is a general issue that needs to be solved for any cloud based object store, examples of topics to investigate include pipelining redis commands (similar to how we batch writes to the Journal Store), using Virtual Threads, etc.

We are therefore investigating other alternatives, including an Infinispan based slot store backend - Infinispan now supports partition tolerance so it has become a suitable candidate for a transaction store. Such a store will produce many of the benefits of a Redis store although its performance will be a key implementation constraint.

This design for the key space may not be suitable for transactions with subordinates or nested transactions or for ones that require participant logs to be distinct from the transaction log, such as JTS or XTS. I say may since a modification to the design should accomodate these models.

Assumptions

The cloud platform administrator is responsible for:

  • detecting and restarting failed nodes;
  • issuing the migrate command on one of the remaining nodes
  • for detecting when the deployment is scaled down to zero with pending transactions (including orphans) and emitting a warning accordingly

Example of how to migrate logs

The demo presents the use case of migrating logs between LRA coordinators. To run the demo you will need to:

  1. Clone and build a narayana git branch with support for a Redis backed store.
  2. Start a 3-node cluster of Redis nodes running RedisRaft.
  3. Build and start two LRA coordinators with distinct node id’s, node1 and node2.
  4. Start an LRA on the first coordinator and then halt it to simulate a failure.
  5. View the redis keys using the Redis CLI noticing that the keys embed the node id of the owning coordinator.
  6. Ask the second coordinator to migrate the keys from node1 to node2.
  7. Coordinators maintain a cache of LRAs, but since this is just a PoC I haven’t implemented refreshing internal caches so you will need to simulate that by restarting the second coordinator.
  8. When the first periodic recovery cycle runs (the default is every 2 minutes) the migrated LRAs will be detected which you can verify (curl http://localhost:50001/lra-coordinator/|jq).

Please refer to the demonstrator instructions for the full details.

Notes

The JIRA issue and branch is JBTM-3762.

  • the implementation is is in the slot store directory
  • the tests can be ran using mvn test -Predis-store -Dtest=RedisStoreTest#test1 -f ArjunaCore/arjuna/pom.xml and assume that a redis cluster is running on the test machine
  • the demo is work in progress and is strictly a PoC
  • redis raft implementation: git clone https://github.com/RedisLabs/redisraft.git and build it with: cmake and make

Wednesday, September 6, 2023

A Review of Recent Narayana Releases

The last four releases of Narayana have brought some noteworthy changes, closing 86 issues in the process, which I’d like to summarise in this brief post. The contributions have come from both the broader community and the core Narayana team, thank you for that. The changes include bug fixes, dependency upgrades and tasks and features.

Community

Improve inclusiveness by building a community of users

We reviewed our existing guidance, adding clarifying text to the contributing guide and added a SECURITY.md file. The latest snapshot adds an email address for reporting security issues.

Conscious Language

We also reviewed our materials to ensure that we use welcoming language, free from offensive, othering, or otherwise problematic communication styles.

New Additions/Features

All maven modules were migrated from Java EE to Jakarta EE (which included the main narayana repo plus the quickstart, jboss-transaction-spi and performance repos).

There is now a BOM for narayana (JBTM-3735). To depend on the correct versions in your projects just include the following dependency:

      <dependency>
        <groupId>org.jboss.narayana</groupId>
        <artifactId>narayana-bom</artifactId>
        <version>latest version</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>

The new license for Narayana is Apache License 2.0, it replaces LGPL and provides consumers with more flexibility when releasing their own software products that incorporate Narayana (JBTM-3764).

Issue JBTM-3734 was resolved by a community contributor, it introduced support for JEP-444: Virtual Threads. Virtual threads “dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications”. The change replaced many occurrences of the synchronized java keyword with ReentrantLock which in most usages, but not all, should be semantically equivalent. The change is an API breaking change so we released the update in a major version, 7.0.0.Final.

Removal of features

All modules have been migrated to Jakarta EE and Java EE is not supported.

Release 6.0.0.Final removed the transformed Jakarta maven modules (ones that ended in “-jakarta”).

The OSGi module is no longer available, please refer to the issue for the reason why this decision was made.

Quickstarts showing integration of Spring and Tomcat with Narayana have been temporarily disabled because at the time of the Jakarta migration, Tomcat and Spring had not yet added Jakarta support to their offerings. Issue JBTM-3803 was created for them to be re-enabled when Jakarta variants become available.

Long Running Actions for MicroProfile (LRA)

Release 6.0.0.Final was certified against MicroProfile LRA 2.0.

We added a Narayana specific feature to allow LRA participants to store data with the coordinator (3rd section) during the registration phase. The feature is configurable, using the MicroProfile Config approach, because some users may prefer not to entrust their business data with the coordinator.

The bug fix for JBTM-3749 facilitated the integration of LRA into WildFly, LRA support in WildFly was added with issue WFLY-14869 by Martin Stefanko, an active contributor to LRA. JBTM-3749 provided a partial fix for JBTM-3552 (Do not rely on thread locals for propagating LRA context) and it also included a doc update recommending that users explicitly set the LRA context when JAX-RS resource methods perform outgoing JAX-RS invocations.

The latest snapshot of narayana includes documentation about configuring the concurrency of the LRA coordinator start method, the details are in issue JBTM-3753.

Transaction Logging

Transaction managers log data in order to provide the Durability property of a transactions. Narayana supports a variety of persistence stores, including logging to a database which we call the JDBCStore. JBTM-3724 included a quickstart for this store and JBTM-3754 introduced an option to supply the DataSource for connecting to the store at runtime for use with the Quarkus extension for JTA transactions.

Thursday, April 21, 2022

Narayana on the Cloud - Part 1

In the last few months, I have been working on how distributed transactions are recovered in WildFly when this Application Server (AS) is deployed in Kubernetes. This blog post is a reflection on how Narayana performs on the cloud and the features it is still missing for it to evolve into a native cloud transaction suite.

Some (very brief) context

Narayana started its journey more than 30 years ago! ArjunaCore was developed in the late 1980s. Even though the theoretical concept of cloud computing was introduced by John McCarthy in 1961 [1][2], at the time of ArjunaCore’s development it was still considered only as a theoretical possibility. However, in the past two decades, the implementation of cloud computing has increased exponentially, dramatically changing the world of technology. As a consequence, Narayana (and its ArjunaCore) needs to step up its game to become a cloud native transaction suite that can be used in different cloud environments. This is an ongoing conversation the Narayana team has started a long time ago (for a detailed summary of Narayana's Cloud Strategy see [3]).

Narayana was introduced to the cloud through WildFly (note 1) on Kubernetes (K8s). In my recent experience, I worked on WildFly and its K8s operator [4] and I think that the integration between Narayana and WildFly works very smoothly on K8s [5]. On the other hand, when the pod hosting WildFly needs to scale down, the ephemeral nature of K8s does not get along with Narayana very well. In fact, ArjunaCore/Narayana needs to have a stable ground to perform its magic (within or without WildFly). In particular, Narayana needs to have:

  • A stable and durable Object Store where objects’ states are held
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
  • A stable communication channel to allow participants of transactions to communicate with the TM

In all points above, “stable” indicates the ability to survive whatever happens to the host where Narayana is running (e.g., crashes). On the other hand, K8s is an ephemeral environment where pods do not need a stable storage and/or particular configurations that survive over multiple reboots. To overcome this “incompatibility”, K8s provides StatefulSet [6] through which applications can leverage a stable realm. Particularly in relation to Narayana, the employment of StatefulSet and the addition of a transaction recovery module to the WildFly K8s Operator [7] enables this AS to fully support transactions on K8s. Unfortunately, this solution is tailor-made for K8s and it cannot be easily ported in other cloud environments. Our target, though, is to evolve Narayana to become a cloud transaction suite, which means that Narayana should also support other cloud computing infrastructures.

Our take on this

The Narayana team thoroughly discussed the above limitations that prevent Narayana from becoming a native cloud application. A brief summary is presented here:

  • A stable and durable Object Store where objects’ states are held
    Narayana is able to use different kinds of object stores; in particular, it is possible to use a (SQL) database to create the object store [8]. RDBMS databases are widely available on cloud environments: these solutions already cover our stability needs providing a reliable storage solution that supports replications and that is able to scale up on demand. Moreover, using a “centralised” RDBMS database would easen the management of multiple Narayana instances, which can be connected to the same database. This might also become incredibly useful in the future when it comes to evolving Narayana to work with multiple instances behind a load balancer (i.e. in case of replication)
     
  • A stable communication channel to allow participants of transactions to communicate with the TM
    Most cloud providers (and platforms) already offer two options to tackle this problem: a stable IP address and a DNS. Although both methods still need some tweaking for each cloud provider, these solutions should provide a stable endpoint to communicate with Narayana’s TM over multiple reboots
     
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
    This is the actual sticky point this blog post is about. Although it seems straightforward to assign a unique node identifier to the TM, it is indeed the first real logic challenge to solve on the path to turn Narayana in a cloud transaction manager

We discussed different possible solutions to this last point but we are still trying to figure out how to address this issue. The main problem is that Narayana needs stable storage to save the node identifier and reload it after a reboot. As already said, cloud environments do not provide this option very easily as their ephemeral nature is more inclined to a stateless approach. Our first idea to solve this problem was, “why do we not store the node identifier in the object store? Narayana still needs a stable object store (and this constraint cannot be dropped) and RDBMS databases on the cloud already provide a base to start from”. The node identifier is a property of the transaction manager that gets initialised when Narayana/ArjunaCore starts (together with all the other properties). As a consequence, it is not possible to save the node identifier in the object store as the preferences for the object store are also loaded during the same initialisation process! In other words, if the node identifier is stored in the object store, how can Narayana/ArjunaCore know where the object store is without loading all properties? Which came first: the chicken or the egg? Nevertheless, introducing an order when properties are loaded might help in this regard (i.e. we force the egg to exist before the chicken). Nevertheless, there is still a problem: what happens if the object store is shared between different instances of Narayana/ArjunaCore? For example, it might be very likely that a Narayana administrator configures multiple Narayana instances to create their object stores in the same database. In this case, every Narayana instance would need a unique identifier to tell which node identifier in the object store is its own. Recursive problems are fun :-) Even if we solve all these problems, the assignment of the node identifier should not be possible outside of Narayana (e.g. using system properties) and it should become an exclusive (internal) operation of Narayana. Fortunately, this is easier than solving our previous “chicken and egg” problem as there are solutions to generate a (almost) unique distributed identifier locally [9]. As things stand, we should find an alternative solution to port the node identifier to the cloud.

Looking at this problem from a different point of view, I wonder if there are more recent solutions to replace and/or remove the node identifier from Narayana. With this in mind, the first question I ask myself is “Why do we need a node identifier?”. Behind the hood, Narayana uses a recovery manager to try to recover transactions that have not completed their lifecycle. This comes with a caveat though: it is essential that two different recovery managers do not try to recover the same in-doubt transaction at the same time. That is where the node identifier comes in handy! In fact, thanks to the unique node identifier (that gets embedded in every global transaction identifier), the recovery manager can recognise if it is responsible for the recovery of an in-doubt transaction stored in a remote resource (note 2). This concept is best illustrated by an example. Let’s consider two different Narayana instances that initiate two different transactions that enlist the same resource. In this scenario, both transaction managers store a record in the shared resource. Let’s assume that the first Narayana instance starts the transaction before the second instance. While the first transaction gets to the point where it has sent prepare() to its enlisted resources, it is possible that the recovery manager of the second Narayana instance queries the shared resource for in-doubt records. If Narayana’s recovery manager was not forced to recover only transactions initiated by the same Narayana instance’s TM, this hypothetical scenario would have ended with an error: the recovery manager of the second Narayana instance would have rolled back the transaction initiated by the first Narayana instance, assuming that it was one of its own in-doubt transaction!

Cloud environments are encouraging (all of) us to come up with an innovative solution to reduce the footprint of Narayana/ArjunaCore. In particular, the node identifier is the challenge we are currently facing and the first real step to push Narayana onto the cloud. I will share any updates the Narayana team comes up with…and in the meantime, feel free to reach out to the team through our public channels (for example Gitter or our Google group narayana-users) to propose your ideas or discuss with us your take on this fundamental issue.

Note

  1. WildFly supports transactions thanks to the integration with Narayana
  2. It is possible to tell the Recovery Manager that it will be responsible for the recovery of in-doubt transactions initiated by different transaction managers (which are identified with different node identifiers). The only caveat here is that two Recovery Managers should not recover the same in-doubt transaction at the same time. To assign the responsibility of multiple node identifiers to the same Recovery Manager, the property xaRecoveryNodes [10] in Narayana’s JTAEnvironmentBean should be used.

Bibliography

[1] J. Surbiryala and C. Rong, "Cloud Computing: History and Overview," 2019 IEEE Cloud Summit, 2019, pp. 1-7, doi: 10.1109/CloudSummit47114.2019.00007.

[2] Garfinkel, Simson L. and Harold Abelson. “Architects of the Information Society: 35 Years of the Laboratory for Computer Science at Mit.” (1999).

[3] https://jbossts.blogspot.com/2022/03/narayana-community-priorities.html

[4] https://github.com/wildfly/wildfly-operator

[5] https://issues.redhat.com/browse/EAP7-1394

[6] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/

[7] https://github.com/wildfly/wildfly-operator/

[8] https://www.narayana.io/docs/project/index.html#d0e459

[9] https://groups.google.com/g/narayana-users/c/ttSff9HvXdA

[10] https://www.narayana.io//docs/product/index.html#d0e1032

Friday, March 4, 2022

Narayana Community Priorities

Narayana Community Priorities

The following is an outline of our near term priorities for the Narayana open source transaction manager. They have been set based on input from the community, including the narayana-users forum discussion.

It is not necessarily a complete list, so please continue to share your own thoughts on whether you agree they are the right focus for the project, in some respects the list is quite ambitious and we encourage/need and welcome continued contributions and discussion from the community to help achieve these goals.

Community Engagement

  1. Improve inclusiveness by building a community of users:
    • produce clear guidance on how to contribute with different levels of guidance
    • responsive to the community (PRs, queries, issues, rooms etc)
    • issue labels for new contributors and for tasks that we need help with
    • make sure all new features are publicised (blog, articles, docs, etc)
    • regular blog posts with runnable and focused examples
    • acknowledge contributors in release announcements
    • encourage discussions in the community (i.e. minimise private team discussions)

Java Versions

  1. Support native JTA 2.0, EE 10 and Jakarta EE namespace: this work is already well under way. Java SE 11 is now the minimum runtime supported by WildFly and Jakarta EE compatible implementations.
  2. Remove support for Java SE 8 (i.e. SE 11 will the minimum supported version) and add support for Java SE 17

Integrating contemporary services:

  1. Kafka Integration
  2. Quarkus support for REST-AT. This task depends on SRA (aka REST-AT annotations) Tasks

Cloud strategy:

  1. Managed Transaction Service
  2. An improved cloud strategy for JTA around recovery (again we have already started work in this area). Currently we need to create a bespoke solution for every cloud, e.g. the WildFly kubernetes operator. This task includes provision of a more “cloud ready” transaction log store. The task still needs to be pinned down but some relevant input material includes:
    1. Transactional cloud resources (this includes an investigation of whether an Infinispan based store is feasible - note that earlier versions were incompatible with our needs)
    2. Investigate jgroups-raft and whether this can help with creating a cloud-ready object store
    3. Add clustering support
    4. Add an SPI nethod to obtain a unique identifier for a transaction
    5. No easy way to acquire the node name from the JBoss Transaction SPI

    There is also the forum item: reliably generate node identifiers which will help with using Narayana in cloud deployments:

    • the task should also explore the pros and cons of storing it for crash recovery purposes
    • the forum thread also includes some work that we may do on validating our current uid solution for cloud environments
  3. Better integration of LRA in cloud environments:
    1. Ensure that any LRA coordinator instance can control any LRA
    2. Allow different LRA coordinators to share an object store

Transaction Log Stores

  1. Persistent Memory: narayana already provides a pmem based object store which we would like to integrate into WildFly
  2. Journal Store performance improvements
  3. Provide a native file lock alternative to our own Object Store locking (FileLock.java) for managing concurrent access to a transaction log. It should be configurable at runtime or build time (Quarkus is a good use case). If the runtime platform does not provide the capability then a default fallback mechanism will be defined.

Upgrades/Deprecation/Removal/Replacement of existing functionality:

  1. Remove Transactional Driver in favour of using Agroal. We are tracking this work using JBTM-3439.
  2. Remove txframework - it was previously deprecated by the compensations module. The issue tracker is Remove old TXFramework API
  3. Remove support for JacORB which is now EOL
  4. Upgrade to JUnit 5 (from 4) for unit testing: Testing Narayana using JUnit 5

Other

  1. Improved support for asynchronous APIs. Although we continue to be tied to XA and very few resource managers support the asynchronous component of the XA spec section 3.5 Synchronous, Non-blocking and Asynchronous Modes, there are still things we would like to do in this area including Asynchronous JTA

Wednesday, January 5, 2022

Securing LRA endpoints using JWT

Introduction

JWT stands for JSON Web Token, which is a popular way to do user authorization in web application and is also popular in the context of micro-services. So, when we use Long Running Actions (LRA) in any micro-service, the transaction APIs could be authorized using JWT tokens. Open industry standard specification RFC-7519 outlines how JTW is structured and how to use it. JWT works over HTTP protocol. The reason JWT is now a days preferred more is because it makes the authorization mechanism easier for micro-service applications, avoids single point of failure and also helps the application design to be more scalable.

Here is how JWT is structured: [<HEADER>.<PAYLOAD>.<SIGNATURE>]

The JWT token is divided into three parts, as we can see in the above example which are separated by two periods.

    1: HEADER    -> base64UrlEncode(header)
    2: PAYLOAD   -> base64UrlEncode(payload)
    3: SIGNATURE -> encryptionAlgorithm(base64UrlEncode(header) + '.' + base64UrlEncode(payload),256-bit-SECRET)
You can create your own JWT token by visiting website jwt.io. JWT is a value token, which will only contain the user information in PAYLOAD, with the name of type of algorithm used in the HEADER and  the token verification signature in the SIGNATURE part.


The above figure shows the implication of JWT. The server will create JWT token and will give it to the client, so that client can send it back on the subsequent request. Once the JWT token is created and provided to the client, we can do a REST call to  as below:
 curl -H "Authorization:Bearer [<HEADER>.<PAYLOAD>.<SIGNATURE>]" http://127.0.0.1:8080/app/api

Securing LRA endpoints

There are various LRA annotations used, which will internally call the REST APIs that are present in Coordinator and RecoveryCoordinator classes. So, below are the recommendations to, how to define roles for each and every APIs in order to create JWT token for client.

LRA-endPointsAllowed-roles
getAllLRAsclient
getLRAStatusclient

getLRAInfoclient
startLRAclient

renewTimeLimitclient
getNestedLRAStatusclient

closeLRAclient
cancelLRAclient

joinLRAViaBodyclient
leaveLRAclient

completeNestedLRAsystem
compensateNestedLRAsystem

forgetNestedLRAsystem
getCompensatoradmin

replaceCompensatoradmin
getRecoveringLRAsadmin

getFailedLRAsadmin
deleteFailedLRAadmin

One of the popular tool that could be used to generate JWT tokens would be Keycloak. Keycloak is an open source identity and access management solution. For more details about Keycloak you can also visit keycloak.org.

Problems with JWT and their solutions


1. Anyone can read first two parts of JWT tokens, i.e. HEADER and PAYLOAD, which are only base64 encoded. So, the PAYLOAD part must not contain any confidential information. It should contain enough information so that server could know who the user is.

2. If someone steals your JWT token, it will work for anyone. So in order to avoid the theft, we should be careful about how we are transmitting JWT. It has to be HTTPS connection and by using the process of OAuth which comes with its own security and protection to make sure people don't steal JWT tokens.

3. In compare to session based authentication, if someone steals sessionID, we can log off, which ends the session and it doesn't exist anymore. But in case of JWT there is nothing on the server to end. Since the whole information is inside JWT, we only set expiration for JWT by having expiry PAYLOADs, but we cannot log off. This situation can be handled by creating blacklisted JWTs table at server side and when the request comes to server, that JWT token will be validated if not the blacklisted one then the server will authorize the request if the token had valid signature.

4. If we choose to use Expiry JWT token for LRA, then if the transaction did not complete before the token expiration, then transaction will never complete. So avoid using Expiry JWT tokens with LRA and try to follow above three ways in order to avoid the security breaches.

Tuesday, August 17, 2021

LRA annotation checker Maven plugin

With the release of the LRA (Long Running Actions) specification in version 1.0 Narayana team works on integrating the Narayana implementation to various application runtimes. Currently it's Quarkus and WildFly. (Camel is a third platform integrating the Narayana implementation but it does in a way not depending on LRA annotations defined in the specification.).

NOTE: If you want to get introduction what is LRA and what is good for you can read some of the already published articles (1, 2).

At this time when the LRA  can be easily grab and used within the application runtimes it may come some difficulty on precise use of the LRA annotations.

The specification defines different "requirements" the LRA application has to follow to work correctly. Some are basic as "the LRA annotated class must contain at least one of the methods annotated with @Compensate or @AfterLRA" or that the LRA annotated JAX-RS endpoints has predefined HTTP methods to be declared with. For example the @Complete/@Compensate requires the @PUT method while the @Forgot requires the @DELETE and @Status requires the @GET.

When the specific LRA contract rule is violated the developer will find them at the deployment time with the RuntimeException being thrown. But time of the deployment could be a bit late to find just a forgotten annotation required by the LRA specification. With that idea in mind Narayana offers a Maven plugin project Narayana LRA annotation checker.

The developer working with the LRA introduces the dependency to the LRA annotation with artifact org.eclipse.microprofile.lra:microprofile-lra-api:1.0. He codes the application and then he can introduce the Maven plugin of the LRA checker to be run during Maven phase process-classes. The developer needs to point to the plugin goal check to get the verification being run.

The snippets that can be placed to the project pom.xml is following. The plugin Maven artifact coordinates is io.narayana:maven-plugin-lra-annotations_1.0:1.0.0.Beta1

...
<build>
  <plugins>
    <plugin>
      <groupId>io.narayana</groupId>
      <artifactId>maven-plugin-lra-annotations_1.0</artifactId>
      <version>1.0.0.Beta1</version>
      <executions>
        <execution>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
...

When plugin is loaded it searches for classes available at path ${project.build.directory}/classes (i.e., target/classes) and tries to find if the application preserve the rules defined by the LRA specification. When not then the Maven build fails reporting what error happens.
Such an error is in format of [error id]:[description]. Example of such error is

[ERROR] Failed to execute goal io.narayana:maven-plugin-lra-annotations_1.0:1.0.0.Beta1:check (default) on project lra-annotation-checker-maven-plugin-test: LRA annotation errors:
[ERROR] [[4]]->
  1: The class annotated with org.eclipse.microprofile.lra.annotation.ws.rs.LRA missing at least one of the annotations Compensate or AfterLRA Class: io.narayana.LRAParticipantResource;
  2: Multiple annotations of the same type are used. Only one per the class is expected. Multiple annotations 'org.eclipse.microprofile.lra.annotation.Status' in the class 'class io.narayana.LRAParticipantResource' on methods [status, status2].;
  4: Wrong method signature for non JAX-RS resource method. Signature for annotation 'org.eclipse.microprofile.lra.annotation.Forget' in the class 'io.narayana.LRAParticipantResource' on method 'forget'. It should be 'public void/CompletionStage/ParticipantStatus forget(java.net.URI lraId, java.net.URI parentId)';
  5: Wrong complementary annotation of JAX-RS resource method. Method 'complete' of class 'class io.narayana.LRAParticipantResource' annotated with 'org.eclipse.microprofile.lra.annotation.Complete' misses complementary annotation javax.ws.rs.PUT.

The plugin can be configured with two parameters (placed under <configuration> under the <plugin> element).

Attribute Description Default
paths Paths searched for classes to be checked. Point to a directory or jar file. Multiple paths are delimited with a comma. ${project.build.directory}/classes
failWhenPathNotExist When some path defined within argument paths does not exist then Maven build may fail or resume with information that the path is not available. true

All the points described in this article can be seen and tested in an example at github.com/jbosstm/artifacts#jbossts.blogspot/17-08-2021-lra-annotation-checker. The plugin configuration can be seen in the pom.xml of the same project.

Any comments, ideas for enhancement and bug reports are welcomed. The project LRA annotation checker Maven plugin is placed in the jbosstm incubator repository. The issues can be submitted via JBTM issue tracker at https://issues.jboss.org/browse/JBTM.

Hopefully this small plugin provides a better experience for any developer working with the LRA and LRA Narayana implementation in particular.

Tuesday, July 20, 2021

How to use Long Running Actions between microservices

Introduction

In my last post I showed how to run a Long Running Action (LRA) within a single JAX-RS resource method using quarkus features to build and run the application. I showed how to create and start an LRA coordinator and then generated a basic hello application, showing how to modify the application to run with a long running action (by adding dependencies on the org.eclipse.microprofile.lra:microprofile-lra-api and org.jboss.narayana.rts:narayana-lra artifacts, which together provide annotations for controlling the lifecycle of LRAs). That post also includes links to the the LRA specification and to the javadoc for the annotation API.

In this follow up post I will indicate how to include a second resource in the LRA. To keep things interesting I’ll deploy the second resource to another microservice and use quarkus’s MicroProfile Rest Client support to implement the remote service invocations. The main difference between this example and the one I developed in the earlier post, apart from the technicalities of using Rest Client, is that we will set the LRA.end attribute to false in the remote service so that the LRA will remain active when the call returns. In this way the initiating service method has the option of calling other microservices before ending the LRA.

Creating and starting an LRA coordinator

LRA relies on a coordinator to manage the lifecycle of LRAs so you will need one to be running for this demo to work successfully. The previous post showed how to build and run coordinators. Alternatively, download or view some scripts which execute all of the steps required in the current post and it includes a shell script called coordinator.sh which will build a runnable coordinator jar (it’s fairly simple and short so you can just read it and create your own jar or just run it as is).

Generate a project for booking tickets

Since the example will be REST based, include the resteasy and rest-client extensions (on line 6 next):

    1: mvn io.quarkus:quarkus-maven-plugin:2.0.1.Final:create \
    2:     -DprojectGroupId=org.acme \
    3:     -DprojectArtifactId=ticket \
    4:     -DclassName="org.acme.ticket.TicketResource" \
    5:     -Dpath="/tickets" \
    6:     -Dextensions="resteasy,rest-client"
    7: cd ticket

You will need the mvn program to run the plugin (but the generated projects will include the mvnw maven wrapper).

Modify the generated TicketResource.java source file to add Microprofile LRA support. The changes that you will need for LRA are on lines 26 and 27. Line 26 says that the bookTicket method must run with an LRA (if one is not present when the method is invoked then one will be automatically created). Note that we have set the end attribute to false to stop the LRA from being automatically closed when the method finishes. By keeping the LRA active when the ticket is booked, the caller can invoke other services in the context of the same LRA. Most services will require the LRA context for tracking updates which typically will be useful for knowing which actions to compensate for if the LRA is later cancelled: the context is injected as a JAX-RS method parameter on line 27.

You will also need to include callbacks for when the LRA is later closed or cancelled (the methods are defined on lines 37 and line 46, respectively).

    1: package org.acme.ticket;
    2:
    3: import static javax.ws.rs.core.MediaType.APPLICATION_JSON;
    4:
    5: // import annotation definitions
    6: import org.eclipse.microprofile.lra.annotation.ws.rs.LRA;
    7: import org.eclipse.microprofile.lra.annotation.Compensate;
    8: import org.eclipse.microprofile.lra.annotation.Complete;
    9: // import the definition of the LRA context header
   10: import static org.eclipse.microprofile.lra.annotation.ws.rs.LRA.LRA_HTTP_CONTEXT_HEADER;
   11:
   12: // import some JAX-RS types
   13: import javax.ws.rs.GET;
   14: import javax.ws.rs.PUT;
   15: import javax.ws.rs.Path;
   16: import javax.ws.rs.Produces;
   17: import javax.ws.rs.core.Response;
   18: import javax.ws.rs.HeaderParam;
   19:
   20: @Path("/tickets")
   21: @Produces(APPLICATION_JSON)
   22: public class TicketResource {
   23:
   24:     @GET
   25:     @Path("/book")
   26:     @LRA(value = LRA.Type.REQUIRED, end = false) // an LRA will be started before method execution if none exists and will not be ended after method execution
   27:     public Response bookTicket(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   28:         System.out.printf("TicketResource.bookTicket: %s%n", lraId);
   29:         String ticket = "1234"
   30:         return Response.ok(ticket).build();
   31:     }
   32:
   33:     // ask to be notified if the LRA closes:
   34:     @PUT // must be PUT
   35:     @Path("/complete")
   36:     @Complete
   37:     public Response completeWork(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   38:         System.out.printf("TicketResource.completeWork: %s%n", lraId);
   39:         return Response.ok().build();
   40:     }
   41:
   42:     // ask to be notified if the LRA cancels:
   43:     @PUT // must be PUT
   44:     @Path("/compensate")
   45:     @Compensate
   46:     public Response compensateWork(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   47:         System.out.printf("TicketResource.compensateWork: %s%n", lraId);
   48:         return Response.ok().build();
   49:     }
   50: }

Skip the tests:

rm src/test/java/org/acme/ticket/*

Add dependencies on microprofile-lra-api and narayana-lra to the pom to include the MicroProfile LRA annotations and the narayana implementation of them so that the LRA context will be propagated during interservice communications:

    <dependencies>
      <dependency>
        <groupId>org.eclipse.microprofile.lra</groupId>
        <artifactId>microprofile-lra-api</artifactId>
        <version>1.0</version>
      </dependency>
      <dependency>
        <groupId>org.jboss.narayana.rts</groupId>
        <artifactId>narayana-lra<\/artifactId>
        <version>5.12.0.Final</version>
      </dependency>

We are creating ticket and trip microservices so they need to listen on different ports, configure the ticket service to run on port 8081:

    1: quarkus.arc.exclude-types=io.narayana.lra.client.internal.proxy.nonjaxrs.LRAParticipantRegistry,io.narayana.lra.filter.ServerLRAFilter,io.narayana.lra.client.internal.proxy.nonjaxrs.LRAParticipantResource
    2: quarkus.http.port=8081
    3: quarkus.http.test-port=8081

The excludes are pulled in by the org.jboss.narayana.rts:narayana-lra maven dependency. As mentioned in my previous post this step will not be necessary when the pull request for the io.quarkus:quarkus-narayana-lra extension is approved. Now build and test the ticket service, making sure that you have already started a coordinator as described in the previous blog (or you can use the shell scripts linked above):

./mvnw clean package -DskipTests # skip tests
java -jar target/quarkus-app/quarkus-run.jar & # run the application in the background
curl http://localhost:8081/tickets/book
TicketResource.bookTicket: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_2
1234

The bookTicket() method prints the method name and the id of the active LRA followed by the hard-coded booking id 1234.

Generate a project for booking trips

Now create a second microservice which will be used for booking trips. It will invoke other microservices to complete trip bookings. In order to simplify the example there is just the single remote ticket service involved in the booking process.

First generate the project. Like the ticket service, the example will be REST based so include the resteasy and rest-client extensions:

mvn io.quarkus:quarkus-maven-plugin:2.0.1.Final:create \
    -DprojectGroupId=org.acme \
    -DprojectArtifactId=trip \
    -DclassName="org.acme.trip.TripResource" \
    -Dpath="/trips" \
    -Dextensions="resteasy,rest-client"

cd trip

The rest-client extension includes support for MicroProfile REST Client which we shall use to perform the remote REST invocations from the trip to the ticket service. For REST Client we need a TicketService and we need to register it as shown on line 12 of the following listing:

    1: package org.acme.trip;
    2:
    3: import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
    4:
    5: import javax.ws.rs.GET;
    6: import javax.ws.rs.Path;
    7: import javax.ws.rs.Produces;
    8: import javax.ws.rs.core.MediaType;
    9:
   10: @Path("/tickets")
   11: @Produces(MediaType.APPLICATION_JSON)
   12: @RegisterRestClient
   13: public interface TicketService {
   14:
   15:     @GET
   16:     @Path("/book")
   17:     String bookTicket();
   18: }

Let’s also create a TripService and inject an instance of the TicketService into it, marking it with the @RestClient annotation on line 11. The quarkus rest client support will configure this injected instance such that it will perform remote REST calls to the ticket service (the remote endpoint for the ticket service will be configured below in the application.properties file):

    1: package org.acme.trip;
    2:
    3: import org.eclipse.microprofile.rest.client.inject.RestClient;
    4: import javax.enterprise.context.ApplicationScoped;
    5: import javax.inject.Inject;
    6:
    7: @ApplicationScoped
    8: public class TripService {
    9:
   10:     @Inject
   11:     @RestClient
   12:     TicketService ticketService;
   13:
   14:     String bookTrip() {
   15:         return ticketService.bookTicket(); // only one service will be used for the trip booking
   16:
   17:         // if other services need to be part of the trip they would be called here
   18:         // and the TripService would associate each step of the booking with the id of the LRA
   19:         // (although I've not shown it being passed in this example) and that would form the
   20:         // basis of the ability to compensate or clean up depending upon the outcome.
   21:         // We may include a more comprehensive/realistic example in a later blog.
   22:     }
   23: }

And now we can inject an instance of this service into the generated TripResource (src/main/java/org/acme/trip/TripResource.java) on line 26. I have also annotated the bookTrip() method with an LRA annotation so that a new LRA will be started before the method is started (if one wasn’t already present) and I have added @Complete and @Compensate callback methods (these will be called when the LRA closes or cancels, respectively):

    1: package org.acme.trip;
    2:
    3: import javax.inject.Inject;
    4: import javax.ws.rs.GET;
    5: import javax.ws.rs.Path;
    6: import javax.ws.rs.Produces;
    7: import javax.ws.rs.core.Response;
    8:
    9: import static javax.ws.rs.core.MediaType.APPLICATION_JSON;
   10:
   11: // import annotation definitions
   12: import org.eclipse.microprofile.lra.annotation.ws.rs.LRA;
   13: import org.eclipse.microprofile.lra.annotation.Compensate;
   14: import org.eclipse.microprofile.lra.annotation.Complete;
   15: // import the definition of the LRA context header
   16: import static org.eclipse.microprofile.lra.annotation.ws.rs.LRA.LRA_HTTP_CONTEXT_HEADER;
   17:
   18: // import some JAX-RS types
   19: import javax.ws.rs.PUT;
   20: import javax.ws.rs.HeaderParam;
   21:
   22: @Path("/trips")
   23: @Produces(APPLICATION_JSON)
   24: public class TripResource {
   25:
   26:     @Inject
   27:     TripService service;
   28:
   29:     // annotate the hello method so that it will run in an LRA:
   30:     @GET
   31:     @LRA(LRA.Type.REQUIRED) // an LRA will be started before method execution and ended after method execution
   32:     @Path("/book")
   33:     public Response bookTrip(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   34:         System.out.printf("TripResource.bookTrip: %s%n", lraId);
   35:         String ticket = service.bookTrip();
   36:         return Response.ok(ticket).build();
   37:     }
   38:
   39:     // ask to be notified if the LRA closes:
   40:     @PUT // must be PUT
   41:     @Path("/complete")
   42:     @Complete
   43:     public Response completeWork(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   44:         System.out.printf("TripResource.completeWork: %s%n", lraId);
   45:         return Response.ok().build();
   46:     }
   47:
   48:     // ask to be notified if the LRA cancels:
   49:     @PUT // must be PUT
   50:     @Path("/compensate")
   51:     @Compensate
   52:     public Response compensateWork(@HeaderParam(LRA_HTTP_CONTEXT_HEADER) String lraId) {
   53:         System.out.printf("TripResource.compensateWork: %s%n", lraId);
   54:         return Response.ok().build();
   55:     }
   56: }

For the blog we can skip the tests:

rm src/test/java/org/acme/trip/*

Configure the trip service to listen on port 8082 (line 2). Also configure the remote ticket endpoint as required by the MicroProfile REST Client specification (line 5):

    1: quarkus.arc.exclude-types=io.narayana.lra.client.internal.proxy.nonjaxrs.LRAParticipantRegistry,io.narayana.lra.filter.ServerLRAFilter,io.narayana.lra.client.internal.proxy.nonjaxrs.LRAParticipantResource
    2: quarkus.http.port=8082
    3: quarkus.http.test-port=8082
    4:
    5: org.acme.trip.TicketService/mp-rest/url=http://localhost:8081
    6: org.acme.trip.TicketService/mp-rest/scope=javax.inject.Singleton

Add dependencies on microprofile-lra-api and narayana-lra to the pom to include the MicroProfile LRA annotations and the narayana implementation of them so that the application can request that the LRA context be propagated during interservice communications:

      <dependency>
        <groupId>org.eclipse.microprofile.lra</groupId>
        <artifactId>microprofile-lra-api</artifactId>
        <version>1.0</version>
      </dependency>
      <dependency>
        <groupId>org.jboss.narayana.rts</groupId>
        <artifactId>narayana-lra</artifactId>
        <version>5.12.0.Final</version>
      </dependency>

and finally, build and run the microservice:

./mvnw clean package -DskipTests
java -jar target/quarkus-app/quarkus-run.jar &

Use curl to book a trip. The HTTP GET request to the trips/book endpoint is handled by the trip service bookTrip() method and it then invokes the ticket service to book a ticket. When the bookTrip() method finishes the LRA will be closed (since the default value for the LRA.end attribute is true), triggering calls to the service @Complete methods of the two services:

curl http://localhost:8082/trips/book
TripResource.bookTrip: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_52c
TicketResource.bookTrip: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_52c
TripResource.completeWork: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_52c
TicketResource.bookTrip: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_52c
TicketResource.completeWork: http://localhost:8080/lra-coordinator/0_ffffc0a8000e_8b2b_60f6a8d4_52c
1234

Notice the output shows the bookTrip and bookTicket methods being called and also notice that the @Complete methods of both services (completeWork()) were called. The id of the LRA on all calls should be the same value as shown in the example output, this is worthwhile noting since the completion and compensation methods will typically use it in order to determine which actions it should clean up for or compensate for when the LRA closes or cancels.

Not shown here, but if there was a problem booking the ticket then the ticket service should return a JAX-RS status code (4xx and 5xx HTTP codes by default) that triggers the cancellation of the LRA, and this would then cause the @Compensate methods of all services involved in the LRA to be invoked.