Monday, January 6, 2025

Managing the availability of LRA participants

This post continues a series of jbossts blog posts that discuss the MicroProfile LRA specification.

Services manage their workloads by providing endpoints to an LRA coordinator, which in turn uses those endpoints to drive the LRA protocol forward, thereby enabling the construction of reliable services. These endpoints may need to change over the long run, so it ought to be possible to replace them in response to changes in the environment in which the service executes. Although the specification does not discuss how endpoints can be replaced, the Narayana LRA REST API for the coordinator includes MicroProfile OpenAPI documentation for replacing endpoints.

There are various administrative and management reasons why this capability can be useful, such as controlling where termination handling takes place or facilitating service replacement. It may also be desirable for work completion, compensation, status reporting and clean up activities to be handled on different endpoints and at different times; this goal is facilitated via annotations including @Compensate, @Complete, @Status, @Forget and @AfterLRA.

When a participant does work in the context of a long running action, a “recovery URL” is created which services may use to associate their work with various management actions, such as changing the participant endpoints as the action proceeds. After all, a long running action can be of arbitrary duration and the needs of a service may change as the action evolves. The example I created for this post halts the JVM during “complete”, asks the user to send a curl request to the LRA coordinator to provide it with a new participant completion endpoint, restarts the participant on the new endpoint and waits for recovery to resend the completion callback to the new endpoint.

By leveraging this feature, admins can proactively react to changing conditions (connectivity, throughput, functionality updates, etc) and tune and/or reconfigure the environment accordingly, perhaps bringing up a more reliability-aware service that operates more intelligently within the more limited environment.

Build and start a coordinator on port 8080

Use the quarkus-maven-plugin to create a project for the coordinator, adding a dependency on maven artifact org.jboss.narayana.lra:lra-coordinator-jar:0.0.10.Final to the resulting pom. Also specify that the build should produce an uber jar so that the coordinator can run standalone:

    mvn io.quarkus:quarkus-maven-plugin:3.3.1:create -DprojectGroupId=org.acme -DprojectArtifactId=narayana-lra-coordinator -Dextensions="rest-jackson,rest-client"
    cd narayana-lra-coordinator
    rm -rf src/test src/main/java # the sources created by the example aren't required
    echo "quarkus.package.jar.type=uber-jar" > src/main/resources/application.properties
    # don't forget to add a dependency on maven artifact: org.jboss.narayana.lra:lra-coordinator-jar:0.0.10.Final
    ./mvnw clean package

and then start it on port 8080 by running the resulting jar

   java -jar target/narayana-lra-coordinator-1.0.0-SNAPSHOT-runner.jar &

Build and start a participant on port 8081 and run an LRA but halt the JVM before closing it

The service will be quite basic:

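import java.util.concurrent.atomic.AtomicBoolean;

import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.PUT;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;

import org.eclipse.microprofile.lra.annotation.Compensate;
import org.eclipse.microprofile.lra.annotation.Complete;
import org.eclipse.microprofile.lra.annotation.ws.rs.LRA;

import static org.eclipse.microprofile.lra.annotation.ws.rs.LRA.LRA_HTTP_RECOVERY_HEADER;

@Path("/halt")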
public class MigratableResource {
    private static final AtomicBoolean halt = new AtomicBoolean(false);

    @LRA(value = LRA.Type.REQUIRED)
    @PUT
    public void doInTransaction() {
        halt.set(true); // ask the complete callback to halt the JVM when it is called
        // when the business method finishes the LRA is closed and the complete endpoint will be called
    }

    @PUT
    @Path("/compensate")
    @Compensate
    public Response compensate() {
        return Response.ok().build();
    }

    @PUT
    @Path("/complete")
    @Complete
    public Response complete(@HeaderParam(LRA_HTTP_RECOVERY_HEADER) String recoveryUrl) {
        if (halt.get()) {
            int port = 8082;
            String completionUrl = String.format("http://localhost:%d/halt/complete", port);

            System.out.printf("Ask the coordinator to send the completion notification on a new endpoint using:%n");
            System.out.printf("curl -X PUT %s -d '<%s>; rel=complete'%n", recoveryUrl, completionUrl);
            Runtime.getRuntime().halt(1);
        }
        System.out.printf("completed%n");
        return Response.ok().build();
    }
}

The interesting part happens during completion where the JVM is halted. Notice that the curl command for migrating the completion endpoint is printed prior to halting.

Now build and run the participant on port 8081 - the maven project is available from the narayana artifacts maven repository.

cd <participant directory>
mvn clean package
java -Dquarkus.http.port=8081 -jar target/quarkus-app/quarkus-run.jar &

and then call the service method using the curl utility, or otherwise:

curl -X PUT -I http://localhost:8081/halt

The service method is annotated with just @LRA(value = LRA.Type.REQUIRED) so when it finishes the completion callback will be invoked by the coordinator. Make a note of the curl request printed by the completion callback just before it halts the JVM. An example is (the Uids will change on each run):

curl -X PUT http://localhost:8080/lra-coordinator/recovery/0_ffffc0a801c7_9d57_677ad0a4_2/0_ffffc0a801c7_9d57_677ad0a4_5 \
  -d '<http://localhost:8082/halt/complete>; rel=complete'

Notice that the payload of the HTTP PUT request includes the specification of the new completion callback, namely <http://localhost:8082/halt/complete>; rel=complete.
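
For readers who would rather issue the request from Java than from curl, the following is a minimal sketch that uses only the JDK HTTP client. The class and method names (EndpointReplacement, linkPayload, buildReplaceRequest) are made up for illustration and are not part of Narayana or the specification. The sketch sends the same body as the curl command but does not set a Content-Type header, so treat that as an assumption to verify against the coordinator's OpenAPI documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Narayana: mirrors the curl command printed by the participant.
class EndpointReplacement {
    // Build the link-style payload, e.g. <http://localhost:8082/halt/complete>; rel=complete
    static String linkPayload(String endpointUrl, String rel) {
        return String.format("<%s>; rel=%s", endpointUrl, rel);
    }

    // Build (but do not send) the PUT request addressed to the participant's recovery URL.
    static HttpRequest buildReplaceRequest(String recoveryUrl, String endpointUrl, String rel) {
        return HttpRequest.newBuilder(URI.create(recoveryUrl))
                .PUT(HttpRequest.BodyPublishers.ofString(linkPayload(endpointUrl, rel)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: EndpointReplacement <recoveryUrl> <newCompleteUrl>");
            return;
        }
        HttpRequest request = buildReplaceRequest(args[0], args[1], "complete");
        // Send the request to the coordinator and print whatever it answers.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.printf("%d %s%n", response.statusCode(), response.body());
    }
}
```

Run it with the recovery URL printed by the participant as the first argument and the new completion endpoint as the second.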

The new endpoint will be used on the next recovery pass, which runs every two minutes by default.

Finally restart the service on the new endpoint (port 8082):

java -Dquarkus.http.port=8082 -jar target/quarkus-app/quarkus-run.jar &

When the coordinator next runs a recovery scan it should use the new endpoint, and the service will report that it has completed its work by printing the text “completed” when the completion endpoint is called by the coordinator.

Monday, September 30, 2024

Coping with Failures during Long Running Actions

In this brief note I want to draw attention to some of the features in the LRA protocol that can help service writers manage failures. LRA is a transaction protocol that provides certain desirable properties for building reliable systems such as Atomicity, (eventual) Consistency and Durability. Providing this level of assurance is non-trivial but the protocol provides a simple model that can help participants to easily play their part in enabling such systems.

LRA is not just for orchestrating services; it is equally important for managing failures. Apart from the specification I have not seen many posts or articles covering this important topic, and it is this deficit that I’d like to address in some posts. I had wanted to kick off with an article and demonstration of participant failover, but I hit an issue while writing the demo and we need to release the fix before I can showcase that. So instead, in this post I’ll just bring to the reader’s attention one or two, but by no means all, of the main features that service writers can use to create more reliable microservices: a preview, if you like, before going into more depth in a subsequent post.

Some noteworthy items to consider include:

  1. Failing participants must be restarted. There is an option to change the callbacks on restart: any of the endpoints can be changed, even passing over responsibility for, say, the compensation to some other microservice. Likewise, failing coordinators must be restarted if LRAs are to make progress.
  2. There is an @Status annotation on participants that the coordinator can use to monitor participant progress and to enable participants to fully participate in the recovery protocol. In particular there is support for non-idempotent compensate endpoints: if there is an @Status endpoint and the compensate endpoint has previously returned a 202 Accepted HTTP status code, then the coordinator will periodically poll the status endpoint until the participant reports that it has reached an end state. The @Forget annotation is used by the coordinator to inform the participant that it is free to clean up.
  3. There are state transitions which participants use to notify the coordinator of failures (FailedToCompensate and FailedToComplete) and of transitory states (Compensating and Completing).
  4. Managing timeouts: although the actions supported by the protocol are long running, a careful choice of time limits for actions can bound failure windows and reduce the need for complicated recovery procedures.
  5. And of course there is support for nested Long Running Actions which is a jewel in the toolkit for building reliable distributed systems.
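
The status polling described in items 2 and 3 can be sketched as a small decision rule. This is only an illustration under stated assumptions: the enum below is a local stand-in that mirrors the participant states named above (plus Active, Compensated and Completed), and StatusPolling and shouldPollAgain are hypothetical names rather than coordinator code.

```java
import java.util.List;

// Hypothetical illustration of "poll the status endpoint until an end state is reported".
class StatusPolling {
    // Local stand-in for the participant states; the flag marks end states.
    enum ParticipantState {
        Active(false),
        Compensating(false), Compensated(true), FailedToCompensate(true),
        Completing(false), Completed(true), FailedToComplete(true);

        private final boolean endState;

        ParticipantState(boolean endState) {
            this.endState = endState;
        }

        boolean isEndState() {
            return endState;
        }
    }

    // Keep polling only while the participant reports a transitory state.
    static boolean shouldPollAgain(ParticipantState reported) {
        return !reported.isEndState();
    }

    public static void main(String[] args) {
        // Simulated replies from a status endpoint after compensate returned 202 Accepted.
        List<ParticipantState> replies = List.of(
                ParticipantState.Compensating,
                ParticipantState.Compensating,
                ParticipantState.Compensated);
        for (ParticipantState reply : replies) {
            System.out.println("status endpoint reported " + reply);
            if (!shouldPollAgain(reply)) {
                System.out.println("end state reached, stop polling");
                break;
            }
        }
    }
}
```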

That’s all for now - I’ve deliberately kept the ideas brief and high level so that they can be explored in greater depth later.

Tuesday, June 25, 2024

Some experiments in migrating transaction logs

Transaction stores

Some time ago I prototyped a Redis based implementation of the SlotStore backend suitable for installations where nodes hosting the storage can come and go making it well suited for cloud based deployments of the Recovery Manager.

In the context of the CAP theorem of distributed computing, the recovery store needs to behave as a CP system, ie it needs to be able to tolerate network partitions and yet continue to provide Strong Consistency. Redis can provide the strong consistency guarantee if the RedisRaft module is used with Redis running as a cluster. RedisRaft achieves consistency and partition tolerance by ensuring that:

  • acknowledged writes are guaranteed to be committed and never lost,
  • reads will always return the most up-to-date committed write,
  • the cluster is sized correctly: a RedisRaft cluster of 3 nodes can tolerate a single node failure and a cluster of 5 can tolerate 2 node failures, … ie if the cluster is to tolerate losing N nodes then the cluster size must be at least 2*N+1, thus the minimum cluster size is 3 and the reason for having an odd number of nodes in the cluster is to avoid “split brain” scenarios during network partitions; an odd number guarantees that one side of the split will be in the majority.
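
The sizing rule in the last bullet is simple arithmetic, sketched below. The class and method names are illustrative only and have nothing to do with the RedisRaft API.

```java
// Hypothetical helper that restates the 2*N+1 sizing rule for a majority-based cluster.
class RaftSizing {
    // Smallest cluster that can lose `tolerated` nodes and still have a majority.
    static int minClusterSize(int tolerated) {
        if (tolerated < 0) {
            throw new IllegalArgumentException("tolerated failures must not be negative");
        }
        return 2 * tolerated + 1;
    }

    // Number of node failures a cluster of the given size can survive.
    static int toleratedFailures(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("cluster size must be at least 1");
        }
        return (clusterSize - 1) / 2;
    }

    // Number of nodes that must agree for the cluster to make progress.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    public static void main(String[] args) {
        for (int size = 3; size <= 5; size++) {
            System.out.printf("size=%d majority=%d tolerated failures=%d%n",
                    size, majority(size), toleratedFailures(size));
        }
    }
}
```

Note that a cluster of 4 tolerates no more failures than a cluster of 3, which is another reason to prefer odd sizes.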

During network splits the cluster will become unavailable for a while, i.e. the cluster is designed to survive failures of a few nodes, but it is not a suitable solution for applications that require availability in the event of large net splits. Transaction systems, however, favour Consistency over Availability.

A key motivator for this new SlotStore backend is to address a common problem with using the Narayana transaction stores on cloud platforms: scaling down a node that has in-doubt transactions can leave them unmanaged. Most cloud platforms can detect crashed nodes and restart them but this must be carefully managed to ensure that the restarted node is identically configured (same node identifier, same transaction store and same resource adapters). The current cloud solution, when running on Openshift, is to use a ReplicaSet and to veto scale down until all transactions are completed, which can take an indeterminate amount of time. If we can ask another member of the deployment to finish these in-doubt transactions then all but the last node can be safely shut down even with in-doubt transactions. The resulting increase in availability in the presence of node or network failures is a significant benefit for transactional applications which, after all, is a key reason why businesses are embracing cloud based deployments.

Remark: Redis is offered as a managed service on a majority of cloud platforms which can help customers to get started with this solution. But note that standard Redis excludes the RedisRaft module which is a requirement for use as a transaction store.

A Redis backed store

Redis is a key value store. Keys are stored in hash slots and hash slots are shared evenly amongst the shards (keys -> hash slots -> shards); the Redis cluster specification contains the details. Re-sharding involves moving hash slots to other nodes, impacting performance. Thus, if we can control which hash slots the [slot store] keys map onto then we can improve performance under both normal and failure conditions. This periodic rebalancing of the cluster can be optimised if keys belonging to the same recovery manager are stored in the same hash slot. Additionally, having the keys for a particular recovery node colocated on a single cluster node is good for the general performance of the transaction store.

Also noteworthy is that the keys mapped to a particular hash slot can be operated upon transactionally, which is not the case for keys in different slots, meaning that no inter-node hand-shaking is required. This feature opens up the possibility, perhaps, of allowing concurrent access to recovery logs by different recovery managers, but that’s something for a future iteration of the design. If the logs in a store are shared then be aware that some Narayana recovery modules cache records, so those implementations would need to be re-evaluated, noting in particular that Redis has support for optimistic concurrency using the watch API which clients can use to observe updates to key values by other recovery managers.

Key space design

A recovery manager has a unique node identifier. We’d like to be able to form “recovery groups” such that any recovery manager in the group can manage transactions created by the others, but not at the same time. To this end we assign a “failoverGroupId” to each recovery manager and use that as the Redis key prefix. This will force all keys created by members of the failover group into the same hash slot. A cloud example of this idea is that the pods in a deployment would all share the same failoverGroupId, so any pod in the deployment can take over when the deployment is scaled down.
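
To make the key-to-slot mapping concrete, here is a minimal sketch of how Redis Cluster derives a hash slot from a key: CRC16 (XMODEM variant) of the key modulo 16384, where only the text between the first "{" and the following "}" is hashed when a non-empty hash tag is present. The sketch assumes that the failoverGroupId prefix is written as such a hash tag; the key layout {group}:node:uid is hypothetical and chosen only for illustration, it is not taken from the prototype.

```java
import java.nio.charset.StandardCharsets;

// Illustrative re-implementation of the Redis Cluster key-to-slot mapping.
class HashSlots {
    static final int SLOT_COUNT = 16384;

    // CRC16-XMODEM: polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Return the part of the key that is hashed: the hash tag if one exists, else the whole key.
    static String hashTagOf(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // the tag must contain at least one character
                return key.substring(open + 1, close);
            }
        }
        return key;
    }

    static int slotOf(String key) {
        return crc16(hashTagOf(key).getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // Hypothetical keys from two recovery managers in the same failover group.
        String[] keys = {"{group1}:node1:tx-a", "{group1}:node2:tx-b", "node1:tx-a"};
        for (String key : keys) {
            System.out.printf("%s -> slot %d%n", key, slotOf(key));
        }
    }
}
```

The first two keys print the same slot because they share the tag group1, whereas the untagged key is hashed in full.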

Failover

Failover involves detecting when a member of the “recovery group” is removed from the cluster and then migrating its keys to another member of the group. I added an example to the LRA recovery coordinator and used the rename command of the Jedis Redis API to “migrate” the keys; rename is an atomic operation.
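
The idea of migrating by renaming can be sketched without a Redis server. In the sketch below an in-memory map stands in for the store, and moving an entry stands in for a Redis RENAME of one key. The key layout {group}:node:uid and the names KeyMigration, prefix and migrate are assumptions made for illustration; they are not the layout or the code used by the prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: "migrate" logs by renaming keys from one node id to another.
class KeyMigration {
    static String prefix(String group, String nodeId) {
        return "{" + group + "}:" + nodeId + ":";
    }

    // Move every key owned by `from` to `to`; returns the number of keys renamed.
    static int migrate(Map<String, String> store, String group, String from, String to) {
        String oldPrefix = prefix(group, from);
        String newPrefix = prefix(group, to);
        // Collect first so that the map is not modified while it is being iterated.
        List<String> owned = new ArrayList<>();
        for (String key : store.keySet()) {
            if (key.startsWith(oldPrefix)) {
                owned.add(key);
            }
        }
        for (String oldKey : owned) {
            String newKey = newPrefix + oldKey.substring(oldPrefix.length());
            store.put(newKey, store.remove(oldKey)); // stands in for RENAME oldKey newKey
        }
        return owned.size();
    }

    public static void main(String[] args) {
        Map<String, String> store = new TreeMap<>();
        store.put("{group1}:node1:lra-1", "log record 1");
        store.put("{group1}:node2:lra-2", "log record 2");
        int moved = migrate(store, "group1", "node1", "node2");
        System.out.println("renamed " + moved + " key(s): " + store.keySet());
    }
}
```

Because the hash tag is unchanged by the rename, the old and new keys stay in the same hash slot in this layout.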

Issues

The performance of Redis Raft in this implementation of the SlotStore backend is poor (more than 4 times slower than the default store); I have not invested any effort on improving it but may follow up with another post to discuss throughput since it is a general issue that needs to be solved for any cloud based object store, examples of topics to investigate include pipelining redis commands (similar to how we batch writes to the Journal Store), using Virtual Threads, etc.

We are therefore investigating other alternatives, including an Infinispan based slot store backend - Infinispan now supports partition tolerance so it has become a suitable candidate for a transaction store. Such a store will produce many of the benefits of a Redis store although its performance will be a key implementation constraint.

This design for the key space may not be suitable for transactions with subordinates or nested transactions or for ones that require participant logs to be distinct from the transaction log, such as JTS or XTS. I say may since a modification to the design should accommodate these models.

Assumptions

The cloud platform administrator is responsible for:

  • detecting and restarting failed nodes;
  • issuing the migrate command on one of the remaining nodes;
  • detecting when the deployment is scaled down to zero with pending transactions (including orphans) and emitting a warning accordingly.

Example of how to migrate logs

The demo presents the use case of migrating logs between LRA coordinators. To run the demo you will need to:

  1. Clone and build a narayana git branch with support for a Redis backed store.
  2. Start a 3-node cluster of Redis nodes running RedisRaft.
  3. Build and start two LRA coordinators with distinct node id’s, node1 and node2.
  4. Start an LRA on the first coordinator and then halt it to simulate a failure.
  5. View the redis keys using the Redis CLI noticing that the keys embed the node id of the owning coordinator.
  6. Ask the second coordinator to migrate the keys from node1 to node2.
  7. Coordinators maintain a cache of LRAs, but since this is just a PoC I haven’t implemented refreshing internal caches so you will need to simulate that by restarting the second coordinator.
  8. When the first periodic recovery cycle runs (the default is every 2 minutes) the migrated LRAs will be detected which you can verify (curl http://localhost:50001/lra-coordinator/|jq).

Please refer to the demonstrator instructions for the full details.

Notes

The JIRA issue and branch is JBTM-3762.

  • the implementation is in the slot store directory
  • the tests can be run using mvn test -Predis-store -Dtest=RedisStoreTest#test1 -f ArjunaCore/arjuna/pom.xml and assume that a redis cluster is running on the test machine
  • the demo is work in progress and is strictly a PoC
  • redis raft implementation: git clone https://github.com/RedisLabs/redisraft.git and build it with: cmake and make

Wednesday, September 6, 2023

A Review of Recent Narayana Releases

The last four releases of Narayana have brought some noteworthy changes, closing 86 issues in the process, which I’d like to summarise in this brief post. The contributions have come from both the broader community and the core Narayana team; thank you for that. The changes include bug fixes, dependency upgrades, tasks and features.

Community

Improve inclusiveness by building a community of users

We reviewed our existing guidance, adding clarifying text to the contributing guide and added a SECURITY.md file. The latest snapshot adds an email address for reporting security issues.

Conscious Language

We also reviewed our materials to ensure that we use welcoming language, free from offensive, othering, or otherwise problematic communication styles.

New Additions/Features

All maven modules were migrated from Java EE to Jakarta EE (which included the main narayana repo plus the quickstart, jboss-transaction-spi and performance repos).

There is now a BOM for narayana (JBTM-3735). To depend on the correct versions in your projects, import it in the dependencyManagement section of your pom:

      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.jboss.narayana</groupId>
            <artifactId>narayana-bom</artifactId>
            <version>latest version</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>

The new license for Narayana is Apache License 2.0. It replaces LGPL and provides consumers with more flexibility when releasing their own software products that incorporate Narayana (JBTM-3764).

Issue JBTM-3734, resolved by a community contributor, introduced support for JEP-444: Virtual Threads. Virtual threads “dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications”. The change replaced many occurrences of the synchronized Java keyword with ReentrantLock which in most usages, but not all, should be semantically equivalent. The change is an API breaking change so we released the update in a major version, 7.0.0.Final.
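
As an illustration of the kind of rewrite involved, the sketch below guards a counter with a ReentrantLock where a synchronized method might previously have been used. It is a generic example, not code taken from Narayana, and LockedCounter is a made-up name.

```java
import java.util.concurrent.locks.ReentrantLock;

// Generic example of replacing `synchronized` with an explicit ReentrantLock.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    // Before the rewrite this could have been: synchronized long increment() { return ++value; }
    long increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock(); // always release, even if the guarded code throws
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter counter = new LockedCounter();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.increment();
            }
        };
        // Platform threads are used so that the example also runs on older JDKs.
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println("final value: " + counter.get());
    }
}
```

One place where the two forms are not interchangeable is code that relies on wait and notify on the monitor, which has to move to a Condition obtained from the lock.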

Removal of features

All modules have been migrated to Jakarta EE and Java EE is not supported.

Release 6.0.0.Final removed the transformed Jakarta maven modules (ones that ended in “-jakarta”).

The OSGi module is no longer available, please refer to the issue for the reason why this decision was made.

Quickstarts showing integration of Spring and Tomcat with Narayana have been temporarily disabled because at the time of the Jakarta migration, Tomcat and Spring had not yet added Jakarta support to their offerings. Issue JBTM-3803 was created for them to be re-enabled when Jakarta variants become available.

Long Running Actions for MicroProfile (LRA)

Release 6.0.0.Final was certified against MicroProfile LRA 2.0.

We added a Narayana specific feature to allow LRA participants to store data with the coordinator (3rd section) during the registration phase. The feature is configurable, using the MicroProfile Config approach, because some users may prefer not to entrust their business data with the coordinator.

The bug fix for JBTM-3749 facilitated the integration of LRA into WildFly, LRA support in WildFly was added with issue WFLY-14869 by Martin Stefanko, an active contributor to LRA. JBTM-3749 provided a partial fix for JBTM-3552 (Do not rely on thread locals for propagating LRA context) and it also included a doc update recommending that users explicitly set the LRA context when JAX-RS resource methods perform outgoing JAX-RS invocations.

The latest snapshot of narayana includes documentation about configuring the concurrency of the LRA coordinator start method, the details are in issue JBTM-3753.

Transaction Logging

Transaction managers log data in order to provide the Durability property of transactions. Narayana supports a variety of persistence stores, including logging to a database which we call the JDBCStore. JBTM-3724 included a quickstart for this store and JBTM-3754 introduced an option to supply the DataSource for connecting to the store at runtime for use with the Quarkus extension for JTA transactions.

Thursday, April 21, 2022

Narayana on the Cloud - Part 1

In the last few months, I have been working on how distributed transactions are recovered in WildFly when this Application Server (AS) is deployed in Kubernetes. This blog post is a reflection on how Narayana performs on the cloud and the features it is still missing for it to evolve into a native cloud transaction suite.

Some (very brief) context

Narayana started its journey more than 30 years ago! ArjunaCore was developed in the late 1980s. Even though the theoretical concept of cloud computing was introduced by John McCarthy in 1961 [1][2], at the time of ArjunaCore’s development it was still considered only as a theoretical possibility. However, in the past two decades, the implementation of cloud computing has increased exponentially, dramatically changing the world of technology. As a consequence, Narayana (and its ArjunaCore) needs to step up its game to become a cloud native transaction suite that can be used in different cloud environments. This is an ongoing conversation that the Narayana team started a long time ago (for a detailed summary of Narayana's Cloud Strategy see [3]).

Narayana was introduced to the cloud through WildFly (note 1) on Kubernetes (K8s). In my recent experience, I worked on WildFly and its K8s operator [4] and I think that the integration between Narayana and WildFly works very smoothly on K8s [5]. On the other hand, when the pod hosting WildFly needs to scale down, the ephemeral nature of K8s does not get along with Narayana very well. In fact, ArjunaCore/Narayana needs to have a stable ground to perform its magic (within or without WildFly). In particular, Narayana needs to have:

  • A stable and durable Object Store where objects’ states are held
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
  • A stable communication channel to allow participants of transactions to communicate with the TM

In all points above, “stable” indicates the ability to survive whatever happens to the host where Narayana is running (e.g., crashes). On the other hand, K8s is an ephemeral environment where pods do not need stable storage and/or particular configurations that survive over multiple reboots. To overcome this “incompatibility”, K8s provides StatefulSet [6] through which applications can leverage a stable realm. Particularly in relation to Narayana, the employment of StatefulSet and the addition of a transaction recovery module to the WildFly K8s Operator [7] enables this AS to fully support transactions on K8s. Unfortunately, this solution is tailor-made for K8s and it cannot be easily ported to other cloud environments. Our target, though, is to evolve Narayana to become a cloud transaction suite, which means that Narayana should also support other cloud computing infrastructures.

Our take on this

The Narayana team thoroughly discussed the above limitations that prevent Narayana from becoming a native cloud application. A brief summary is presented here:

  • A stable and durable Object Store where objects’ states are held
    Narayana is able to use different kinds of object stores; in particular, it is possible to use a (SQL) database to create the object store [8]. RDBMS databases are widely available in cloud environments: these solutions already cover our stability needs, providing a reliable storage solution that supports replication and that is able to scale up on demand. Moreover, using a “centralised” RDBMS database would ease the management of multiple Narayana instances, which can be connected to the same database. This might also become incredibly useful in the future when it comes to evolving Narayana to work with multiple instances behind a load balancer (i.e. in case of replication)
     
  • A stable communication channel to allow participants of transactions to communicate with the TM
    Most cloud providers (and platforms) already offer two options to tackle this problem: a stable IP address and a DNS. Although both methods still need some tweaking for each cloud provider, these solutions should provide a stable endpoint to communicate with Narayana’s TM over multiple reboots
     
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
    This is the actual sticking point this blog post is about. Although it seems straightforward to assign a unique node identifier to the TM, it is indeed the first real logic challenge to solve on the path to turning Narayana into a cloud transaction manager

We discussed different possible solutions to this last point but we are still trying to figure out how to address this issue. The main problem is that Narayana needs stable storage to save the node identifier and reload it after a reboot. As already said, cloud environments do not provide this option very easily as their ephemeral nature is more inclined to a stateless approach. Our first idea to solve this problem was, “why do we not store the node identifier in the object store? Narayana still needs a stable object store (and this constraint cannot be dropped) and RDBMS databases on the cloud already provide a base to start from”. The node identifier is a property of the transaction manager that gets initialised when Narayana/ArjunaCore starts (together with all the other properties). As a consequence, it is not possible to save the node identifier in the object store as the preferences for the object store are also loaded during the same initialisation process! In other words, if the node identifier is stored in the object store, how can Narayana/ArjunaCore know where the object store is without loading all properties? Which came first: the chicken or the egg? Nevertheless, introducing an order when properties are loaded might help in this regard (i.e. we force the egg to exist before the chicken). Even so, there is still a problem: what happens if the object store is shared between different instances of Narayana/ArjunaCore? For example, it might be very likely that a Narayana administrator configures multiple Narayana instances to create their object stores in the same database. In this case, every Narayana instance would need a unique identifier to tell which node identifier in the object store is its own. Recursive problems are fun :-) Even if we solve all these problems, the assignment of the node identifier should not be possible outside of Narayana (e.g. using system properties) and it should become an exclusive (internal) operation of Narayana. Fortunately, this is easier than solving our previous “chicken and egg” problem as there are solutions to generate an (almost) unique distributed identifier locally [9]. As things stand, we should find an alternative solution to port the node identifier to the cloud.

Looking at this problem from a different point of view, I wonder if there are more recent solutions to replace and/or remove the node identifier from Narayana. With this in mind, the first question I ask myself is “Why do we need a node identifier?”. Under the hood, Narayana uses a recovery manager to try to recover transactions that have not completed their lifecycle. This comes with a caveat though: it is essential that two different recovery managers do not try to recover the same in-doubt transaction at the same time. That is where the node identifier comes in handy! In fact, thanks to the unique node identifier (that gets embedded in every global transaction identifier), the recovery manager can recognise if it is responsible for the recovery of an in-doubt transaction stored in a remote resource (note 2). This concept is best illustrated by an example. Let’s consider two different Narayana instances that initiate two different transactions that enlist the same resource. In this scenario, both transaction managers store a record in the shared resource. Let’s assume that the first Narayana instance starts the transaction before the second instance. While the first transaction gets to the point where it has sent prepare() to its enlisted resources, it is possible that the recovery manager of the second Narayana instance queries the shared resource for in-doubt records. If Narayana’s recovery manager were not forced to recover only transactions initiated by the same Narayana instance’s TM, this hypothetical scenario would have ended with an error: the recovery manager of the second Narayana instance would have rolled back the transaction initiated by the first Narayana instance, assuming that it was one of its own in-doubt transactions!
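
The filtering role of the node identifier in this scenario can be sketched in a few lines. This is a simplified model under stated assumptions, not Narayana's recovery code: a map from transaction id to the node identifier embedded in it stands in for the in-doubt records reported by the shared resource, and RecoveryFilter and recoverable are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model: a recovery manager only touches transactions created by nodes it is responsible for.
class RecoveryFilter {
    // inDoubt maps a transaction id to the node identifier embedded in that id.
    static List<String> recoverable(Map<String, String> inDoubt, Set<String> responsibleFor) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> record : inDoubt.entrySet()) {
            if (responsibleFor.contains(record.getValue())) {
                result.add(record.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Both Narayana instances have a prepared record in the same shared resource.
        Map<String, String> inDoubt = new LinkedHashMap<>();
        inDoubt.put("tx-1", "node1");
        inDoubt.put("tx-2", "node2");

        // The recovery manager of the second instance leaves tx-1 alone.
        System.out.println("node2 recovers " + recoverable(inDoubt, Set.of("node2")));
        // A recovery manager made responsible for both node identifiers sees both records.
        System.out.println("shared recovery manager recovers "
                + recoverable(inDoubt, Set.of("node1", "node2")));
    }
}
```

The second call models the situation described in note 2, where one recovery manager is given responsibility for several node identifiers.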

Cloud environments are encouraging (all of) us to come up with an innovative solution to reduce the footprint of Narayana/ArjunaCore. In particular, the node identifier is the challenge we are currently facing and the first real step to push Narayana onto the cloud. I will share any updates the Narayana team comes up with…and in the meantime, feel free to reach out to the team through our public channels (for example Gitter or our Google group narayana-users) to propose your ideas or discuss with us your take on this fundamental issue.

Note

  1. WildFly supports transactions thanks to the integration with Narayana
  2. It is possible to tell the Recovery Manager that it will be responsible for the recovery of in-doubt transactions initiated by different transaction managers (which are identified with different node identifiers). The only caveat here is that two Recovery Managers should not recover the same in-doubt transaction at the same time. To assign the responsibility of multiple node identifiers to the same Recovery Manager, the property xaRecoveryNodes [10] in Narayana’s JTAEnvironmentBean should be used.

Bibliography

[1] J. Surbiryala and C. Rong, "Cloud Computing: History and Overview," 2019 IEEE Cloud Summit, 2019, pp. 1-7, doi: 10.1109/CloudSummit47114.2019.00007.

[2] Garfinkel, Simson L. and Harold Abelson. “Architects of the Information Society: 35 Years of the Laboratory for Computer Science at Mit.” (1999).

[3] https://jbossts.blogspot.com/2022/03/narayana-community-priorities.html

[4] https://github.com/wildfly/wildfly-operator

[5] https://issues.redhat.com/browse/EAP7-1394

[6] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/

[7] https://github.com/wildfly/wildfly-operator/

[8] https://www.narayana.io/docs/project/index.html#d0e459

[9] https://groups.google.com/g/narayana-users/c/ttSff9HvXdA

[10] https://www.narayana.io//docs/product/index.html#d0e1032

Friday, March 4, 2022

Narayana Community Priorities

The following is an outline of our near term priorities for the Narayana open source transaction manager. They have been set based on input from the community, including the narayana-users forum discussion.

It is not necessarily a complete list, so please continue to share your own thoughts on whether you agree they are the right focus for the project. In some respects the list is quite ambitious, so we need, encourage and welcome continued contributions and discussion from the community to help achieve these goals.

Community Engagement

  1. Improve inclusiveness by building a community of users:
    • produce clear guidance on how to contribute with different levels of guidance
    • be responsive to the community (PRs, queries, issues, rooms etc)
    • issue labels for new contributors and for tasks that we need help with
    • make sure all new features are publicised (blog, articles, docs, etc)
    • regular blog posts with runnable and focused examples
    • acknowledge contributors in release announcements
    • encourage discussions in the community (i.e. minimise private team discussions)

Java Versions

  1. Support native JTA 2.0, EE 10 and Jakarta EE namespace: this work is already well under way. Java SE 11 is now the minimum runtime supported by WildFly and Jakarta EE compatible implementations.
  2. Remove support for Java SE 8 (i.e. SE 11 will be the minimum supported version) and add support for Java SE 17

Integrating contemporary services:

  1. Kafka Integration
  2. Quarkus support for REST-AT. This task depends on SRA (aka REST-AT annotations) Tasks

Cloud strategy:

  1. Managed Transaction Service
  2. An improved cloud strategy for JTA around recovery (again, we have already started work in this area). Currently we need to create a bespoke solution for every cloud, e.g. the WildFly Kubernetes operator. This task includes provision of a more “cloud ready” transaction log store. The task still needs to be pinned down but some relevant input material includes:
    1. Transactional cloud resources (this includes an investigation of whether an Infinispan based store is feasible - note that earlier versions were incompatible with our needs)
    2. Investigate jgroups-raft and whether this can help with creating a cloud-ready object store
    3. Add clustering support
    4. Add an SPI method to obtain a unique identifier for a transaction
    5. No easy way to acquire the node name from the JBoss Transaction SPI

    There is also the forum item: reliably generate node identifiers which will help with using Narayana in cloud deployments:

    • the task should also explore the pros and cons of storing it for crash recovery purposes
    • the forum thread also includes some work that we may do on validating our current uid solution for cloud environments
  3. Better integration of LRA in cloud environments:
    1. Ensure that any LRA coordinator instance can control any LRA
    2. Allow different LRA coordinators to share an object store

Transaction Log Stores

  1. Persistent Memory: Narayana already provides a pmem-based object store which we would like to integrate into WildFly
  2. Journal Store performance improvements
  3. Provide a native file lock alternative to our own Object Store locking (FileLock.java) for managing concurrent access to a transaction log. It should be configurable at runtime or build time (Quarkus is a good use case). If the runtime platform does not provide the capability then a default fallback mechanism will be defined.

Upgrades/Deprecation/Removal/Replacement of existing functionality:

  1. Remove Transactional Driver in favour of using Agroal. We are tracking this work using JBTM-3439.
  2. Remove txframework - it was previously deprecated by the compensations module. The issue tracker is Remove old TXFramework API
  3. Remove support for JacORB which is now EOL
  4. Upgrade to JUnit 5 (from 4) for unit testing: Testing Narayana using JUnit 5

Other

  1. Improved support for asynchronous APIs. We continue to be tied to XA, and very few resource managers support the asynchronous component of the XA spec (section 3.5, Synchronous, Non-blocking and Asynchronous Modes), but there are still things we would like to do in this area, including Asynchronous JTA

Wednesday, January 5, 2022

Securing LRA endpoints using JWT

Introduction

JWT stands for JSON Web Token, which is a popular way to do user authorization in web applications and is also popular in the context of micro-services. So, when we use Long Running Actions (LRA) in any micro-service, the transaction APIs can be authorized using JWT tokens. The open industry standard specification RFC 7519 outlines how a JWT is structured and how to use it. JWT works over the HTTP protocol. JWT is increasingly preferred nowadays because it makes the authorization mechanism easier for micro-service applications, avoids a single point of failure and also helps the application design to be more scalable.

Here is how JWT is structured: [<HEADER>.<PAYLOAD>.<SIGNATURE>]

The JWT token is divided into three parts, separated by two periods, as we can see in the example above:

    1: HEADER    -> base64UrlEncode(header)
    2: PAYLOAD   -> base64UrlEncode(payload)
    3: SIGNATURE -> signingAlgorithm(base64UrlEncode(header) + '.' + base64UrlEncode(payload), 256-bit-SECRET)

You can create your own JWT token by visiting the jwt.io website. JWT is a value token: the PAYLOAD contains only the user information, the HEADER names the type of algorithm used, and the SIGNATURE part holds the signature used to verify the token.
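To make the three parts concrete, here is a minimal sketch that builds and verifies an HS256 token using only the Python standard library. It is an illustration rather than a replacement for a JWT library or an identity provider: the secret and the claim values are made up, and the `groups` claim is an assumption (it is the claim name MicroProfile JWT uses for roles).

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """base64url-encode without '=' padding, as JWT does."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Reverse of b64url: restore the padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(payload: dict, secret: bytes) -> str:
    """Build HEADER.PAYLOAD.SIGNATURE, signing with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header).encode("utf-8")) + "."
                     + b64url(json.dumps(payload).encode("utf-8")))
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature over HEADER.PAYLOAD and compare in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


# Hypothetical secret and claims, for illustration only.
token = make_jwt({"sub": "alice", "groups": ["client"]}, b"demo-secret")
header_part, payload_part, signature_part = token.split(".")

# The payload is only encoded, not encrypted: it can be read without the secret.
claims = json.loads(b64url_decode(payload_part))
```

Only the signature depends on the secret. A party without the secret can still read the header and the payload, but cannot produce a token that passes `verify_jwt`.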


The above figure shows how JWT is used. The server creates the JWT token and gives it to the client, so that the client can send it back on subsequent requests. Once the JWT token has been created and provided to the client, we can make a REST call as below:
 curl -H "Authorization:Bearer [<HEADER>.<PAYLOAD>.<SIGNATURE>]" http://127.0.0.1:8080/app/api

Securing LRA endpoints

The various LRA annotations internally call the REST APIs that are present in the Coordinator and RecoveryCoordinator classes. Below are recommendations on how to define a role for each of these APIs, so that JWT tokens can be created for clients accordingly.

    LRA endpoint           Allowed role
    getAllLRAs             client
    getLRAStatus           client
    getLRAInfo             client
    startLRA               client
    renewTimeLimit         client
    getNestedLRAStatus     client
    closeLRA               client
    cancelLRA              client
    joinLRAViaBody         client
    leaveLRA               client
    completeNestedLRA      system
    compensateNestedLRA    system
    forgetNestedLRA        system
    getCompensator         admin
    replaceCompensator     admin
    getRecoveringLRAs      admin
    getFailedLRAs          admin
    deleteFailedLRA        admin
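As a rough sketch of how the recommendations above could be enforced, the check below maps each endpoint to its role and compares it with the roles carried by a token. The function name and the use of a `groups` claim for roles are assumptions made for illustration; in a real deployment the check would normally be done declaratively by the runtime's security layer rather than by hand.

```python
# Endpoint names and roles taken from the recommendations above.
ROLE_ENDPOINTS = {
    "client": [
        "getAllLRAs", "getLRAStatus", "getLRAInfo", "startLRA", "renewTimeLimit",
        "getNestedLRAStatus", "closeLRA", "cancelLRA", "joinLRAViaBody", "leaveLRA",
    ],
    "system": ["completeNestedLRA", "compensateNestedLRA", "forgetNestedLRA"],
    "admin": [
        "getCompensator", "replaceCompensator", "getRecoveringLRAs",
        "getFailedLRAs", "deleteFailedLRA",
    ],
}

# Invert the mapping: endpoint -> required role.
REQUIRED_ROLE = {
    endpoint: role
    for role, endpoints in ROLE_ENDPOINTS.items()
    for endpoint in endpoints
}


def is_allowed(endpoint: str, claims: dict) -> bool:
    """Allow the call only if the token carries the role required by the endpoint.

    Assumes roles are listed in a "groups" claim; unknown endpoints are rejected.
    """
    required = REQUIRED_ROLE.get(endpoint)
    return required is not None and required in claims.get("groups", [])
```

For example, a token whose payload contains `{"groups": ["client"]}` may call `startLRA` but not `replaceCompensator`.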

One popular tool that can be used to generate JWT tokens is Keycloak. Keycloak is an open source identity and access management solution. For more details about Keycloak you can visit keycloak.org.

Problems with JWT and their solutions


1. Anyone can read the first two parts of a JWT token, i.e. HEADER and PAYLOAD, which are only base64 encoded. So the PAYLOAD part must not contain any confidential information. It should contain just enough information for the server to know who the user is.

2. If someone steals your JWT token, it will work for them as well. So, in order to avoid theft, we should be careful about how we transmit the JWT. It has to be sent over an HTTPS connection, and issued through the OAuth process, which comes with its own security and protection to make sure people don't steal JWT tokens.

3. With session based authentication, if someone steals the sessionID, we can log off, which ends the session so that it no longer exists. But in the case of JWT there is nothing on the server to end. Since all of the information is inside the JWT, we can only set an expiration for it by including an expiry in the PAYLOAD; we cannot log off. This situation can be handled by keeping a table of blacklisted JWTs on the server side: when a request comes in, the server checks that the JWT token is not a blacklisted one and then authorizes the request only if the token also has a valid signature.

4. If we choose to use expiring JWT tokens for LRA and the transaction does not complete before the token expires, then the transaction will never complete. So avoid using expiring JWT tokens with LRA and follow the three points above in order to avoid security breaches.
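The server-side blacklist from point 3 can be sketched as follows. This is a simplified illustration under stated assumptions: the in-memory set stands in for a database table, tokens are identified by the `jti` claim and expire through the `exp` claim (both are registered claims in RFC 7519), and `signature_ok` is assumed to be the result of a signature check done elsewhere.

```python
import time

# Hypothetical server-side store; a database table in a real deployment.
blacklisted_token_ids = set()


def blacklist(claims: dict) -> None:
    """Record the token id so that later requests using this token are rejected."""
    blacklisted_token_ids.add(claims["jti"])


def is_acceptable(claims: dict, signature_ok: bool, now=None) -> bool:
    """Accept a token only if it is correctly signed, not blacklisted and not expired."""
    now = time.time() if now is None else now
    if not signature_ok:
        return False
    if claims.get("jti") in blacklisted_token_ids:
        return False
    # Tokens without an "exp" claim never expire (see point 4 for why that
    # may be preferable for long running actions).
    expiry = claims.get("exp")
    return expiry is None or now < expiry
```

A token without an `exp` claim stays valid until it is explicitly blacklisted, which is the trade-off that point 4 recommends for LRA.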