Sunday, September 9, 2018

Tips on how to evaluate STM implementations

Software Transactional Memory (STM) is a way of providing transactional behaviour for threads operating on shared memory. The transaction is an atomic and isolated set of changes to memory such that prior to commit no other thread sees the memory updates and after commit the changes appear to take effect instantaneously so other threads never see partial updates but on abort all of the updates are discarded.

Unlike other models such as XA, OTS, JTA, WS-AT etc, with STM there is no accepted standard for developers to program against. Consequently the various implementations of STM differ in important respects which have consequences for how application developers build their software. I recently came upon an excellent book on Transactional Memory where the authors James Larus and Ravi Rajwar presented a taxonomy of features and characteristics that can be used to differentiate the various STM implementations from each other. In this and subsequent blogs I will explain the taxonomy and identify where the Narayana STM solution (which was introduced in Mark Little's initial blog on the topic) fits into it. Towards the end of the series I will include some tips, best practices and advice on how you can get the most out of the Narayana implementation of STM.

In this first article I will cover isolation, nesting and exception handling. In later articles I will discuss topics such as conflict detection and resolution, transaction granularity, concurrency control etc.

By way of motivation, why would one want to use STM in favour of other transaction models and concurrency control mechanisms:
  • The STM approach of mutating data inside of a transaction has some nice features:
    • It is less error prone since the demarcation of an atomic block of code is primitive but other synchronisation approaches are many and varied. Techniques such as locks, semaphores, signals etc are tricky to get right, for example the programmer must ensure that accesses are protected with the correct locks and in the correct order. With conventional concurrency control, imagine trying reverse all the changes made during a computation if a problem such as deadlock or data race is detected, whereas code changes that are protected by STM can be aborted in a single statement.
    • Transactional updates make it easier for the programmer to reason about his code (it is clear how different threads affect each other) and data (because it simplifies the sharing of state between threads).
  • The declarative approach (where the programmer simply marks which code blocks are transactional) means concurrent programming is more intuitive with no explicit locks or synchronisation to worry about.
  • Can perform much better than fine grained locking (which can lead to deadlock) and coarse grained locking (which inhibits concurrency):
    • If a thread takes a lock and is context switched or incurs cache misses or page faults etc then other threads that need the lock are stalled until the thread is rescheduled or until the needed data is retrieved.
    • With STM, updates can be batched up and speculatively committed together.
    • The runtime manages lock acquisition and release and resolves conflicts (using approaches such as timeouts and retries).
  • It is easier to compose operations using a technique called nesting (traditionally composing two operations can produce concurrency problems unless one analyses in detail the locking approach used by those operations).

Properties of a STM system

In the following I will describe the design choices available to STM systems in general and in particular I will illustrate the choices made by the Narayana STM implementation using code examples. The examples will be made available in the Narayana STM test suite so that you can also experiment with the particular properties of the implementation. Each of the examples will be using the same transactional object which is defined as follows:

    @Transactional
    public interface AtomicInt {
        int get() throws Exception;
        void set(int value) throws Exception;
    }

    public class AtomicIntImpl implements AtomicInt {
        private int state;

        @ReadLock
        public int get() throws Exception {
            return state;
        }

        @WriteLock
        public void set(int value) throws Exception {
            state = value;
        }
    }

The @Transactional annotation on the AtomicInt interface tells the system that instances of the interface are candidates to be managed by the STM system. The implementation of the interface defines a pair of methods for reading and writing the the shared state (by default all state is tracked but this default can be overridden via the @NotState annotation).

Property 1: Interaction with non transactional code

If uncommitted transactional memory updates are visible to non-transactional code and vice-versa (i.e. updates made by non-transactional code are visible to running transactions) then the isolation model is said to be weak. On the other hand if non-transactional accesses are upgraded to a transactional access then the model is said to be strong.

The weak access model, although common, can lead to data races. A data race occurs if two threads T1 and T2 access memory, T1 for writing, say, and the other for reading then the value of the memory read is indeterminate. If, for example T1 writes data inside a transaction and T2 reads that data, then if T1 aborts but T2 has made a decision based on the value it read then we have an incorrect program since aborted transactions must not have side effects (recall the "all or nothing" characteristic of atomicity).

Narayana STM follows the weak isolation model. The following test updates shared memory inside a transaction and then triggers a thread to perform non-transactional reads and writes on it while the transaction is still running. The test shows that the two threads interfere with each other producing indeterminate results:
    public void testWeakIsolation() throws Exception {
        AtomicIntImpl aiImple = new AtomicIntImpl();
        // STM is managed by Containers. Enlisting the above implementation
        // with the container returns a proxy which will enforce STM semantics
        AtomicInt ai = new RecoverableContainer().enlist(aiImple);
        AtomicAction tx = new AtomicAction();

        // set up the code that will access the memory outside of a transaction
        Thread ot = new Thread(() -> {
            try {
                synchronized (tx) {
                    tx.wait(); // for the other thread to start a transaction

                    // weak isolation implies that this thread (which is running
                    // outside of a transaction) can observe transactional updates
                    assertEquals(2, aiImple.get()); // the other thread set it to 2
                    aiImple.set(10); // this update is visible to transactional code
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        ot.start();

        ai.set(1); // initialise the shared memory
        tx.begin(); // start a transaction
        {
            ai.set(2); // conditionally set the value to 2

            synchronized (tx) {
                tx.notify(); // trigger non-transactional code to update the memory
            }

            // weak isolation means that this transactional code may see the
            // changes made by the non transactional code
            assertEquals(10, ai.get()); // the other thread set it to 10
            tx.commit(); // commit the changes made to the shared memory
        }

        // changes made by non transactional code are still visible after commit
        assertEquals(10, ai.get());
        assertEquals(aiImple.get(), ai.get());
    }

As an aside, notice in this example that the code first had to declare the shared data using the @Transactional annotation and then had to access it via a proxy returned from a RecoverableContainer. Some systems introduce new keywords into the language that demarcate the atomic blocks and in such systems any memory updates made by the atomic block would be managed by the STM implementation. That type of system takes some of the burden of ensuring correctness away from the programmer but are harder to implement (for example a common technique requires compiler extensions).

Property 2: Nested transactions

A nested transaction (the child) is one that is started in the context of an outer one (the parent). The child sees the changes made by the parent. Aborting the parent will abort each child. A parent that does not have any parents is called top level.

The effects of committing/aborting either transaction (the child or parent) and the visibility of changes depend upon which model is being used:

Flattened:

  • The parent and child transactions see each others updates.
  • If the child aborts the parent aborts too.
  • Changes made by the child only become visible to other threads when the parent commits
Pros - easy to implement
Cons - breaks composition (if the child aborts it causes all work done by the parent transaction to abort)

Closed Nested

  • Changes are hidden from the parent transaction (and from other transactions) until the child commits, at which time any changes made by the child become part of the parent transactions' set of updates (therefore, in contrast to open nested transactions, other transactions will not see the updates until the parent commits);
  • aborting the child does not abort the parent;
Pros - Is arguably the most natural model for application designers

Open Nested

  • When the child transaction commits, all other transactions see the updates even if the parent aborts which is useful if we want unrelated code to make permanent changes during the transaction even if the parent aborts.
Pros - enables work to be made permanent even if the parent aborts (for example logging code made by the child)

Narayana STM follows the closed model as is demonstrated by the following test case:
    public void testIsClosedNestedCommit() throws Exception {
        AtomicInt ai = new RecoverableContainer().enlist(new AtomicIntImpl());
        AtomicAction parent = new AtomicAction();
        AtomicAction child = new AtomicAction();

        ai.set(1); // initialise the shared memory
        parent.begin(); // start a top level transaction
        {
            ai.set(2); // update the memory in the context of the parent transaction
            child.begin(); // start a child transaction
            {
                ai.set(3); // update the memory in a child transaction
                // NB the parent would still see the value as 2
                // (not shown in this test)
                child.commit();
            }
            // since the child committed the parent should see the value as 3
            assertEquals(3, ai.get());
            // NB other transactions would not see the value 3 however until
            // the parent commits (not demonstrated in this test)
        }
        parent.commit();

        assertEquals(3, ai.get());
    }

Isolation amongst child transactions

The concept of isolation applies to nested transactions as well as to top level transactions. It seems most natural for siblings to use the same model as is used for isolation with respect to other transactions (ie transactions that are not in ancestor hierarchy of a particular child). For example the CORBA Object Transaction Service (OTS) supports the closed model and children do not see each others updates until the parent commits.

Property 3: Exception Handling

On exception the options are to either terminate or ignore the exception or to use a mixture of both where the programmer tells the system which exceptions should abort and which ones should commit the transaction which is similar to what the JTA 1.2 spec provides with its rollbackOn and dontRollbackOn annotation attributes.

The Narayana STM implementation takes the view that the programmer is best placed to make decisions about what to do under exceptional circumstances. The following test demonstrates this behaviour:
    public void testExceptionDoesNotAbort() throws Exception {
        AtomicInt ai = new RecoverableContainer().enlist(new AtomicIntImpl());
        AtomicAction tx = new AtomicAction();

        ai.set(1);
        tx.begin();
        {
            try {
                ai.set(2);
                throw new Exception();
            } catch (Exception e) {
                assertEquals(2, ai.get());
                // the transaction should still be active
                ai.set(3);
                tx.commit();
            }
        }

        assertEquals(3, ai.get());
    }

What's Next

That's all for this week. In the next instalment I will cover conflict detection and resolution, transaction granularity and concurrency control.