Thursday, April 21, 2022

Narayana on the Cloud - Part 1

In the last few months, I have been working on how distributed transactions are recovered in WildFly when this Application Server (AS) is deployed in Kubernetes. This blog post is a reflection on how Narayana performs on the cloud and the features it is still missing for it to evolve into a native cloud transaction suite.

Some (very brief) context

Narayana started its journey more than 30 years ago! ArjunaCore was developed in the late 1980s. Even though the theoretical concept of cloud computing was introduced by John McCarthy in 1961 [1][2], at the time of ArjunaCore’s development it was still considered only a theoretical possibility. However, in the past two decades, the adoption of cloud computing has grown enormously, dramatically changing the world of technology. As a consequence, Narayana (and its ArjunaCore) needs to step up its game to become a cloud native transaction suite that can be used in different cloud environments. This is an ongoing conversation that the Narayana team started a long time ago (for a detailed summary of Narayana's Cloud Strategy see [3]).

Narayana was introduced to the cloud through WildFly (note 1) on Kubernetes (K8s). In my recent experience, I worked on WildFly and its K8s operator [4] and I think that the integration between Narayana and WildFly works very smoothly on K8s [5]. On the other hand, when the pod hosting WildFly needs to scale down, the ephemeral nature of K8s does not get along with Narayana very well. In fact, ArjunaCore/Narayana needs to have a stable ground to perform its magic (with or without WildFly). In particular, Narayana needs to have:

  • A stable and durable Object Store where objects’ states are held
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
  • A stable communication channel to allow participants of transactions to communicate with the TM

In all points above, “stable” indicates the ability to survive whatever happens to the host where Narayana is running (e.g., crashes). On the other hand, K8s is an ephemeral environment where pods do not need stable storage and/or particular configurations that survive over multiple reboots. To overcome this “incompatibility”, K8s provides StatefulSet [6] through which applications can leverage a stable realm. Particularly in relation to Narayana, the employment of StatefulSet and the addition of a transaction recovery module to the WildFly K8s Operator [7] enable this AS to fully support transactions on K8s. Unfortunately, this solution is tailor-made for K8s and it cannot be easily ported to other cloud environments. Our target, though, is to evolve Narayana to become a cloud transaction suite, which means that Narayana should also support other cloud computing infrastructures.

Our take on this

The Narayana team thoroughly discussed the above limitations that prevent Narayana from becoming a native cloud application. A brief summary is presented here:

  • A stable and durable Object Store where objects’ states are held
    Narayana is able to use different kinds of object stores; in particular, it is possible to use a (SQL) database to create the object store [8]. Relational databases are widely available in cloud environments: these solutions already cover our stability needs, providing a reliable storage solution that supports replication and that is able to scale up on demand. Moreover, using a “centralised” relational database would ease the management of multiple Narayana instances, which can be connected to the same database. This might also become incredibly useful in the future when it comes to evolving Narayana to work with multiple instances behind a load balancer (i.e. in case of replication)
     
  • A stable communication channel to allow participants of transactions to communicate with the TM
    Most cloud providers (and platforms) already offer two options to tackle this problem: a stable IP address and a DNS. Although both methods still need some tweaking for each cloud provider, these solutions should provide a stable endpoint to communicate with Narayana’s TM over multiple reboots
     
  • A stable node identifier to uniquely mark transactions (which are initialised by the Transaction Manager (TM) with the same node identifier) and ensure that the Recovery Manager will only recover those transactions
    This is the actual sticking point this blog post is about. Although it seems straightforward to assign a unique node identifier to the TM, it is in fact the first real logic challenge to solve on the path to turning Narayana into a cloud transaction manager

We discussed different possible solutions to this last point, but we are still trying to figure out how to address this issue. The main problem is that Narayana needs stable storage to save the node identifier and reload it after a reboot. As already said, cloud environments do not provide this option very easily, as their ephemeral nature is more inclined to a stateless approach. Our first idea to solve this problem was, “why do we not store the node identifier in the object store? Narayana still needs a stable object store (and this constraint cannot be dropped) and relational databases on the cloud already provide a base to start from”. However, the node identifier is a property of the transaction manager that gets initialised when Narayana/ArjunaCore starts (together with all the other properties). As a consequence, it is not possible to save the node identifier in the object store, as the preferences for the object store are also loaded during the same initialisation process! In other words, if the node identifier is stored in the object store, how can Narayana/ArjunaCore know where the object store is without loading all properties? Which came first: the chicken or the egg?

Introducing an order in which properties are loaded might help in this regard (i.e. we force the egg to exist before the chicken). Even so, there is still a problem: what happens if the object store is shared between different instances of Narayana/ArjunaCore? For example, it is quite likely that a Narayana administrator configures multiple Narayana instances to create their object stores in the same database. In this case, every Narayana instance would need a unique identifier to tell which node identifier in the object store is its own. Recursive problems are fun :-)

Even if we solve all these problems, the assignment of the node identifier should not be possible outside of Narayana (e.g. using system properties) and it should become an exclusive (internal) operation of Narayana. Fortunately, this is easier than solving our previous “chicken and egg” problem, as there are solutions to generate an (almost) unique distributed identifier locally [9]. As things stand, we should find an alternative solution to port the node identifier to the cloud.
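To make the "generate an identifier locally" idea more concrete, here is a minimal sketch of one possible approach. It is not how Narayana assigns node identifiers: the class name, the method name and the 28-byte length limit are assumptions made for illustration, and the sketch deliberately leaves open the harder question of where the identifier is persisted so that it survives a reboot.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

// Hypothetical sketch: not part of Narayana.
public class NodeIdSketch {
    // Assumed upper bound on the identifier length in bytes;
    // check the limit enforced by the Narayana version in use.
    static final int MAX_NODE_ID_BYTES = 28;

    /** Encodes a random 128-bit UUID as 22 URL-safe characters. */
    static String newNodeIdentifier() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer raw = ByteBuffer.allocate(16);
        raw.putLong(uuid.getMostSignificantBits());
        raw.putLong(uuid.getLeastSignificantBits());
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(raw.array());
        if (id.getBytes(StandardCharsets.UTF_8).length > MAX_NODE_ID_BYTES) {
            throw new IllegalStateException("node identifier too long: " + id);
        }
        return id;
    }

    public static void main(String[] args) {
        // Prints a different identifier on every run.
        System.out.println(newNodeIdentifier());
    }
}
```

A random identifier like this is only "almost" unique, and it is regenerated on every start unless it is stored somewhere stable, which is exactly the open problem described above.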

Looking at this problem from a different point of view, I wonder if there are more recent solutions to replace and/or remove the node identifier from Narayana. With this in mind, the first question I ask myself is “Why do we need a node identifier?”. Under the hood, Narayana uses a recovery manager to try to recover transactions that have not completed their lifecycle. This comes with a caveat though: it is essential that two different recovery managers do not try to recover the same in-doubt transaction at the same time. That is where the node identifier comes in handy! In fact, thanks to the unique node identifier (that gets embedded in every global transaction identifier), the recovery manager can recognise if it is responsible for the recovery of an in-doubt transaction stored in a remote resource (note 2).

This concept is best illustrated by an example. Let’s consider two different Narayana instances that initiate two different transactions that enlist the same resource. In this scenario, both transaction managers store a record in the shared resource. Let’s assume that the first Narayana instance starts its transaction before the second instance. When the first transaction gets to the point where it has sent prepare() to its enlisted resources, it is possible that the recovery manager of the second Narayana instance queries the shared resource for in-doubt records. If Narayana’s recovery manager were not restricted to recovering only transactions initiated by the same Narayana instance’s TM, this hypothetical scenario would end with an error: the recovery manager of the second Narayana instance would roll back the transaction initiated by the first Narayana instance, assuming that it was one of its own in-doubt transactions!
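The scenario above can be sketched in a few lines of Java. This is a simplified illustration rather than Narayana's recovery code: the InDoubtTx type, the ownedBy method and the node names are hypothetical, and a real recovery manager works on XA transaction identifiers returned by the resource.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: not Narayana's recovery implementation.
public class RecoveryFilterSketch {
    /** In-doubt record as seen in the shared resource; the node identifier
     *  is assumed to be extractable from the global transaction id. */
    static final class InDoubtTx {
        final String nodeId;
        final String txId;

        InDoubtTx(String nodeId, String txId) {
            this.nodeId = nodeId;
            this.txId = txId;
        }
    }

    /** Keeps only the records this recovery manager is responsible for. */
    static List<InDoubtTx> ownedBy(Set<String> recoveryNodes, List<InDoubtTx> inDoubt) {
        return inDoubt.stream()
                .filter(tx -> recoveryNodes.contains(tx.nodeId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Both instances left a record in the same resource.
        List<InDoubtTx> shared = List.of(
                new InDoubtTx("node-1", "tx-a"),
                new InDoubtTx("node-2", "tx-b"));

        // The recovery manager of the second instance must ignore tx-a.
        for (InDoubtTx tx : ownedBy(Set.of("node-2"), shared)) {
            System.out.println(tx.nodeId + " " + tx.txId);
        }
    }
}
```

Passing a set rather than a single identifier mirrors the case in which one recovery manager is made responsible for several node identifiers.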

Cloud environments are encouraging (all of) us to come up with an innovative solution to reduce the footprint of Narayana/ArjunaCore. In particular, the node identifier is the challenge we are currently facing and the first real step to push Narayana onto the cloud. I will share any updates the Narayana team comes up with…and in the meantime, feel free to reach out to the team through our public channels (for example Gitter or our Google group narayana-users) to propose your ideas or discuss with us your take on this fundamental issue.

Notes

  1. WildFly supports transactions thanks to the integration with Narayana
  2. It is possible to tell the Recovery Manager that it will be responsible for the recovery of in-doubt transactions initiated by different transaction managers (which are identified with different node identifiers). The only caveat here is that two Recovery Managers should not recover the same in-doubt transaction at the same time. To assign the responsibility of multiple node identifiers to the same Recovery Manager, the property xaRecoveryNodes [10] in Narayana’s JTAEnvironmentBean should be used.

Bibliography

[1] J. Surbiryala and C. Rong, "Cloud Computing: History and Overview," 2019 IEEE Cloud Summit, 2019, pp. 1-7, doi: 10.1109/CloudSummit47114.2019.00007.

[2] S. L. Garfinkel and H. Abelson, "Architects of the Information Society: 35 Years of the Laboratory for Computer Science at MIT," 1999.

[3] https://jbossts.blogspot.com/2022/03/narayana-community-priorities.html

[4] https://github.com/wildfly/wildfly-operator

[5] https://issues.redhat.com/browse/EAP7-1394

[6] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/

[7] https://github.com/wildfly/wildfly-operator/

[8] https://www.narayana.io/docs/project/index.html#d0e459

[9] https://groups.google.com/g/narayana-users/c/ttSff9HvXdA

[10] https://www.narayana.io//docs/product/index.html#d0e1032

Friday, March 4, 2022

Narayana Community Priorities

The following is an outline of our near-term priorities for the Narayana open source transaction manager. They have been set based on input from the community, including the narayana-users forum discussion.

It is not necessarily a complete list, so please continue to share your own thoughts on whether you agree they are the right focus for the project. In some respects the list is quite ambitious, and we need, encourage and welcome continued contributions and discussion from the community to help achieve these goals.

Community Engagement

  1. Improve inclusiveness by building a community of users:
    • produce clear guidance on how to contribute with different levels of guidance
    • be responsive to the community (PRs, queries, issues, rooms etc)
    • provide issue labels for new contributors and for tasks that we need help with
    • make sure all new features are publicised (blog, articles, docs, etc)
    • regular blog posts with runnable and focused examples
    • acknowledge contributors in release announcements
    • encourage discussions in the community (i.e. minimise private team discussions)

Java Versions

  1. Support native JTA 2.0, EE 10 and Jakarta EE namespace: this work is already well under way. Java SE 11 is now the minimum runtime supported by WildFly and Jakarta EE compatible implementations.
  2. Remove support for Java SE 8 (i.e. SE 11 will be the minimum supported version) and add support for Java SE 17

Integrating contemporary services:

  1. Kafka Integration
  2. Quarkus support for REST-AT. This task depends on SRA (aka REST-AT annotations) Tasks

Cloud strategy:

  1. Managed Transaction Service
  2. An improved cloud strategy for JTA around recovery (again we have already started work in this area). Currently we need to create a bespoke solution for every cloud, e.g. the WildFly Kubernetes operator. This task includes provision of a more “cloud ready” transaction log store. The task still needs to be pinned down but some relevant input material includes:
    1. Transactional cloud resources (this includes an investigation of whether an Infinispan based store is feasible - note that earlier versions were incompatible with our needs)
    2. Investigate jgroups-raft and whether this can help with creating a cloud-ready object store
    3. Add clustering support
    4. Add an SPI method to obtain a unique identifier for a transaction
    5. No easy way to acquire the node name from the JBoss Transaction SPI

    There is also the forum item: reliably generate node identifiers which will help with using Narayana in cloud deployments:

    • the task should also explore the pros and cons of storing it for crash recovery purposes
    • the forum thread also includes some work that we may do on validating our current uid solution for cloud environments
  3. Better integration of LRA in cloud environments:
    1. Ensure that any LRA coordinator instance can control any LRA
    2. Allow different LRA coordinators to share an object store

Transaction Log Stores

  1. Persistent Memory: Narayana already provides a pmem based object store which we would like to integrate into WildFly
  2. Journal Store performance improvements
  3. Provide a native file lock alternative to our own Object Store locking (FileLock.java) for managing concurrent access to a transaction log. It should be configurable at runtime or build time (Quarkus is a good use case). If the runtime platform does not provide the capability then a default fallback mechanism will be defined.

Upgrades/Deprecation/Removal/Replacement of existing functionality:

  1. Remove Transactional Driver in favour of using Agroal. We are tracking this work using JBTM-3439.
  2. Remove txframework - it was previously deprecated by the compensations module. The issue tracker is Remove old TXFramework API
  3. Remove support for JacORB which is now EOL
  4. Upgrade to JUnit 5 (from 4) for unit testing: Testing Narayana using JUnit 5

Other

  1. Improved support for asynchronous APIs. Although we continue to be tied to XA and very few resource managers support the asynchronous component of the XA spec section 3.5 Synchronous, Non-blocking and Asynchronous Modes, there are still things we would like to do in this area including Asynchronous JTA

Wednesday, January 5, 2022

Securing LRA endpoints using JWT

Introduction

JWT stands for JSON Web Token, which is a popular way to do user authorization in web applications and is also popular in the context of micro-services. So, when we use Long Running Actions (LRA) in any micro-service, the transaction APIs could be authorized using JWT tokens. The open industry standard specification RFC 7519 outlines how JWT is structured and how to use it. JWT works over the HTTP protocol. JWT is often preferred nowadays because it makes the authorization mechanism easier for micro-service applications, avoids a single point of failure and also helps the application design to be more scalable.

Here is how JWT is structured: [<HEADER>.<PAYLOAD>.<SIGNATURE>]

The JWT token is divided into three parts, separated by two periods, as we can see in the above example:

    1: HEADER    -> base64UrlEncode(header)
    2: PAYLOAD   -> base64UrlEncode(payload)
    3: SIGNATURE -> signingAlgorithm(base64UrlEncode(header) + '.' + base64UrlEncode(payload), 256-bit-SECRET)

You can create your own JWT token by visiting the website jwt.io. JWT is a value token: the PAYLOAD contains only the user information, the HEADER names the type of algorithm used, and the SIGNATURE part holds the signature used to verify the token.
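The three-part structure can also be reproduced with nothing but the JDK. The following is a minimal sketch, assuming HMAC-SHA256 (the "HS256" algorithm) as the signing algorithm; the class name, the example claims and the secret are made up for illustration, and a real application would normally rely on a JWT library or an identity provider instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of <HEADER>.<PAYLOAD>.<SIGNATURE> with HS256.
public class JwtSketch {
    static String b64(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** HMAC-SHA256 over "<header>.<payload>", base64url encoded. */
    static String sign(String signingInput, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return b64(mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String createToken(String headerJson, String payloadJson, byte[] secret) {
        String signingInput = b64(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + b64(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + sign(signingInput, secret);
    }

    /** Recomputes the signature and compares it in constant time. */
    static boolean verify(String token, byte[] secret) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        String expected = sign(token.substring(0, lastDot), secret);
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                token.substring(lastDot + 1).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Example 256-bit secret and claims; both are placeholders.
        byte[] secret = "0123456789abcdef0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        String token = createToken(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}",
                "{\"sub\":\"alice\",\"groups\":[\"client\"]}",
                secret);
        System.out.println(token);
        System.out.println(verify(token, secret));
    }
}
```

Note that the header and payload are only encoded, not encrypted: anyone holding the token can decode them, and only the signature requires the secret.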


The figure shows how JWT is used. The server creates a JWT token and gives it to the client, so that the client can send it back on subsequent requests. Once the JWT token has been created and provided to the client, we can make a REST call as below:
 curl -H "Authorization:Bearer [<HEADER>.<PAYLOAD>.<SIGNATURE>]" http://127.0.0.1:8080/app/api
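The same call can be prepared from Java with the JDK's built-in HTTP client API (Java 11 or later). This is only a sketch: the class and method names are hypothetical, the token is a placeholder, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: builds the equivalent of the curl call.
public class BearerRequestSketch {
    /** Attaches the JWT as a bearer token in the Authorization header. */
    static HttpRequest withBearer(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder token; a real one has the <HEADER>.<PAYLOAD>.<SIGNATURE> form.
        HttpRequest request = withBearer("http://127.0.0.1:8080/app/api", "header.payload.signature");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
        // To actually send it, pass the request to java.net.http.HttpClient#send.
    }
}
```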

Securing LRA endpoints

The various LRA annotations internally call the REST APIs that are present in the Coordinator and RecoveryCoordinator classes. Below are our recommendations on how to define roles for each of these APIs, so that JWT tokens with the appropriate roles can be created for clients.

LRA endpoints            Allowed roles
getAllLRAs               client
getLRAStatus             client
getLRAInfo               client
startLRA                 client
renewTimeLimit           client
getNestedLRAStatus       client
closeLRA                 client
cancelLRA                client
joinLRAViaBody           client
leaveLRA                 client
completeNestedLRA        system
compensateNestedLRA      system
forgetNestedLRA          system
getCompensator           admin
replaceCompensator       admin
getRecoveringLRAs        admin
getFailedLRAs            admin
deleteFailedLRA          admin
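The role recommendations above boil down to a lookup from endpoint to required role. The sketch below shows that check in plain Java; it is an illustration only, with hypothetical class and method names, and it does not reflect how the LRA coordinator enforces roles (in a Jakarta REST deployment this would more typically be expressed declaratively, for example with role annotations on the resource methods).

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the endpoint-to-role recommendations.
public class LraRoleSketch {
    static final Map<String, String> REQUIRED_ROLE = Map.ofEntries(
            Map.entry("getAllLRAs", "client"),
            Map.entry("getLRAStatus", "client"),
            Map.entry("getLRAInfo", "client"),
            Map.entry("startLRA", "client"),
            Map.entry("renewTimeLimit", "client"),
            Map.entry("getNestedLRAStatus", "client"),
            Map.entry("closeLRA", "client"),
            Map.entry("cancelLRA", "client"),
            Map.entry("joinLRAViaBody", "client"),
            Map.entry("leaveLRA", "client"),
            Map.entry("completeNestedLRA", "system"),
            Map.entry("compensateNestedLRA", "system"),
            Map.entry("forgetNestedLRA", "system"),
            Map.entry("getCompensator", "admin"),
            Map.entry("replaceCompensator", "admin"),
            Map.entry("getRecoveringLRAs", "admin"),
            Map.entry("getFailedLRAs", "admin"),
            Map.entry("deleteFailedLRA", "admin"));

    /** Unknown endpoints are denied; known ones need the listed role. */
    static boolean isAllowed(String endpoint, Set<String> callerRoles) {
        String required = REQUIRED_ROLE.get(endpoint);
        return required != null && callerRoles.contains(required);
    }

    public static void main(String[] args) {
        // Roles would come from the verified JWT payload.
        Set<String> roles = Set.of("client");
        System.out.println(isAllowed("startLRA", roles));
        System.out.println(isAllowed("deleteFailedLRA", roles));
    }
}
```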

One popular tool that can be used to generate JWT tokens is Keycloak. Keycloak is an open source identity and access management solution. For more details about Keycloak you can also visit keycloak.org.

Problems with JWT and their solutions


1. Anyone can read the first two parts of a JWT token, i.e. HEADER and PAYLOAD, which are only base64 encoded. So, the PAYLOAD part must not contain any confidential information. It should contain just enough information for the server to know who the user is.

2. If someone steals your JWT token, it will work for them as well. So, in order to avoid theft, we should be careful about how we transmit JWT tokens. They should only be sent over an HTTPS connection, and they should be obtained through OAuth, which comes with its own security and protection measures to make sure people don't steal JWT tokens.

3. With session-based authentication, if someone steals the sessionID, we can log off, which ends the session so that it no longer exists. In the case of JWT, however, there is nothing on the server to end. Since all the information is inside the JWT, we can only limit its lifetime through an expiry value in the PAYLOAD; we cannot log off. This situation can be handled by keeping a table of blacklisted JWTs on the server side: when a request arrives, the server checks that the JWT token is not blacklisted and authorizes the request only if the token also has a valid signature.

4. If we choose to use expiring JWT tokens for LRA and the transaction does not complete before the token expires, then the transaction will never complete. So avoid using expiring JWT tokens with LRA and follow the three recommendations above in order to avoid security breaches.
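The server-side blacklist mentioned in point 3 could be sketched as follows. This is a minimal in-memory illustration with hypothetical names: it assumes each token carries a unique identifier (RFC 7519 defines the optional "jti" claim for this purpose), and a real deployment would need a store shared by all server instances.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a server-side table of blacklisted JWTs.
public class TokenDenylistSketch {
    // Identifiers of tokens that must no longer be accepted.
    private final Set<String> revoked = ConcurrentHashMap.newKeySet();

    /** Called when a token is reported stolen or a user "logs off". */
    void revoke(String tokenId) {
        revoked.add(tokenId);
    }

    /** A request is authorized only if the signature is valid
     *  and the token has not been blacklisted. */
    boolean accept(String tokenId, boolean signatureValid) {
        return signatureValid && !revoked.contains(tokenId);
    }

    public static void main(String[] args) {
        TokenDenylistSketch denylist = new TokenDenylistSketch();
        System.out.println(denylist.accept("token-1", true));
        denylist.revoke("token-1");
        System.out.println(denylist.accept("token-1", true));
    }
}
```

Because tokens used with LRA are recommended above to be non-expiring, entries in such a table cannot simply be dropped after a timeout.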