Saturday, March 19, 2011

Concurrency control

We managed to gloss over a few important points during the discussion on Isolation. Let's dig into them here.

How can you ensure isolation? Typically this is covered by what's referred to as concurrency control for the resources within the transaction. A very simple and widely used approach is to regard all operations on resources (objects) as being of type ‘read’ or ‘write’, and to follow the synchronization rule permitting concurrent reads but exclusive writes. This rule is imposed by requiring that any computation intending to perform a read (write) operation on an object first acquire the ‘read lock’ (‘write lock’) associated with that object. A read lock on an object can be held concurrently by many computations, provided no computation is holding a write lock on that object. A write lock on an object, on the other hand, can be held by only one computation, and only if no other computation is holding a read or write lock on it. (Note that although we'll talk in terms of locks, this should not be taken to imply a specific implementation: timestamp-based concurrency control could just as easily be used, for example.)
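The concurrent-reads/exclusive-writes rule can be sketched as follows. This is purely illustrative (the class and method names are hypothetical, not from JBossTS or any other system), and it deliberately ignores blocking and fairness to keep the compatibility rule visible:

```python
# A minimal sketch of the read/write lock compatibility rule described
# above: many readers may hold the lock at once, but a writer excludes
# everyone else. Illustrative only; names are hypothetical.

class RWLock:
    def __init__(self):
        self.readers = set()   # computations currently holding a read lock
        self.writer = None     # computation holding the write lock, if any

    def try_read_lock(self, who):
        # A read lock may be granted as long as no one holds the write lock.
        if self.writer is None:
            self.readers.add(who)
            return True
        return False

    def try_write_lock(self, who):
        # A write lock requires that no *other* computation holds any lock.
        if self.writer is None and self.readers <= {who}:
            self.readers.discard(who)  # upgrade: drop our own read lock
            self.writer = who
            return True
        return False

lock = RWLock()
assert lock.try_read_lock("T1")       # concurrent reads are permitted
assert lock.try_read_lock("T2")
assert not lock.try_write_lock("T1")  # T2 still reads, so no write lock
```

A real lock manager would block (or queue) the failed writer rather than simply refusing, but the grant/deny decision is the same.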

In order to ensure serializability, all computations must follow a ‘two-phase’ locking policy. During the first phase, termed the growing phase, a computation can acquire locks but not release them. The tail end of the computation constitutes the shrinking phase, during which held locks can be released but no new locks can be acquired. Now suppose that a computation in its shrinking phase is to be rolled back, and that some write-locked objects have already been released. If some of these objects have since been locked by other computations, then aborting the computation will require those computations to be aborted as well. To avoid this cascading rollback problem, it is necessary to make the shrinking phase ‘instantaneous’: all locks are released only when the computation commits or rolls back (often referred to as strict two-phase locking).
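The two-phase rule itself is easy to enforce mechanically. Here is a sketch (hypothetical names, not a real transaction manager) in which the first release flips the transaction into its shrinking phase, after which any further acquisition is an error:

```python
# Sketch of the two-phase locking rule: locks may be acquired only
# before the first release. Strict 2PL releases everything at commit,
# making the shrinking phase 'instantaneous'. Illustrative only.

class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False  # flips to True on the first release

    def acquire(self, resource):
        if self.shrinking:
            raise RuntimeError("two-phase rule violated: acquire after release")
        self.held.add(resource)

    def release(self, resource):
        self.shrinking = True
        self.held.discard(resource)

    def commit(self):
        # Strict 2PL: release all locks in one go, only at the end,
        # so no other transaction ever sees uncommitted state.
        self.shrinking = True
        self.held.clear()

tx = TwoPhaseTransaction()
tx.acquire("X")
tx.acquire("Y")
tx.commit()
assert not tx.held

tx2 = TwoPhaseTransaction()
tx2.acquire("X")
tx2.release("X")
try:
    tx2.acquire("Y")   # growing after shrinking: not allowed
    assert False
except RuntimeError:
    pass
```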

Most transaction systems utilize what is commonly referred to as pessimistic concurrency control: in essence, whenever a data structure or other transactional resource is accessed, a lock is obtained on it as described earlier. This lock remains held on that resource for the duration of the transaction, and the benefit is that other users will not be able to modify (and possibly not even observe) the resource until the holding transaction has terminated. There are a number of disadvantages to this style: (i) the overhead of acquiring and maintaining concurrency control information in an environment where conflict or data sharing is not high, and (ii) deadlocks may occur, where one user waits for another to release a lock while that user is, in turn, waiting for the release of a lock held by the first.

In contrast, optimistic concurrency control assumes that conflicts are rare and tries to ensure locks are held only for brief periods of time: essentially, locks are only acquired at the end of the transaction, when it is about to terminate. This kind of concurrency control requires a means to detect whether an update to a resource conflicts with any updates that may have occurred in the interim, and a way to recover from such conflicts. Typically detection happens using timestamps, whereby the system takes a snapshot of the timestamps associated with resources it is about to use or modify and compares them with the timestamps available when the transaction commits.

Resolution of conflicts is a different problem entirely, since doing so requires semantic information about the resources concerned. Therefore, most transaction systems that offer optimistic schemes will typically cause the detecting transaction to roll back, and the application must retry, this time with new data. Obviously this may result in a lot of work being lost, especially if the transaction that rolls back has been running for some time.
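The detect-at-commit-and-retry pattern can be sketched like this. The versioned store and its API are hypothetical, standing in for whatever timestamp mechanism a real system uses; the point is that validation happens only at commit time, and a failed validation pushes the retry back onto the application:

```python
# Sketch of optimistic concurrency control: snapshot a version
# (timestamp) at read time, validate it at commit time, and let the
# application retry on conflict. Hypothetical API, for illustration.

class VersionedStore:
    def __init__(self):
        self.data = {}      # name -> value
        self.version = {}   # name -> monotonically increasing version

    def read(self, name):
        return self.data.get(name), self.version.get(name, 0)

    def commit(self, name, value, seen_version):
        # Validation: fail if another transaction committed in the interim.
        if self.version.get(name, 0) != seen_version:
            return False
        self.data[name] = value
        self.version[name] = seen_version + 1
        return True

def update_with_retry(store, name, update, attempts=3):
    for _ in range(attempts):
        value, seen = store.read(name)
        if store.commit(name, update(value), seen):
            return True
    return False  # give up: the work done in each failed attempt is lost

store = VersionedStore()
store.commit("X", 1, 0)
assert update_with_retry(store, "X", lambda v: v + 1)
assert store.read("X")[0] == 2
```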

Assuming both optimistic and pessimistic concurrency control are available to you (and they may not be), then which one to use is up to you. A close examination of the environment in which the application and transactional resources reside is necessary to determine whether a) shared access to resources occurs and b) the relative probability that sharing will cause a transaction to roll back. This might very well not be a black or white choice and may change over the lifetime of your objects or application. Certainly the use of different concurrency control schemes can be important when trying to improve the throughput of user requests and committed transactions, so it’s well worth considering and understanding the issues involved.
Type-specific concurrency control

Another possible enhancement is to introduce type-specific concurrency control, which is a particularly attractive means of increasing the concurrency in a system (and yes, it's supported in JBossTS). Concurrent read/write or write/write operations are permitted on an object from different transactions provided these operations can be shown to be non-interfering (for example, for a directory object, reading and deleting different entries can be permitted to take place simultaneously). Object-oriented systems are well suited to this approach, since semantic knowledge about the operations of objects can be exploited to control permissible concurrency within objects. Additional work may be needed when working with procedural systems.
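The directory example can be sketched with per-entry locks instead of one lock on the whole object: operations on different entries never conflict, so they can run concurrently, while operations on the same entry still serialize. (The names here are hypothetical; JBossTS exposes this capability through its own lock manager interfaces.)

```python
# Sketch of type-specific concurrency control for a directory object:
# semantic knowledge (different entries don't interfere) lets us lock
# per entry rather than per object. Illustrative, hypothetical API.

class Directory:
    def __init__(self):
        self.entries = {}
        self.entry_locks = {}   # per-entry locks: entry -> holding transaction

    def _try_lock(self, entry, tx):
        holder = self.entry_locks.get(entry)
        if holder in (None, tx):
            self.entry_locks[entry] = tx
            return True
        return False   # conflicting access to the *same* entry

    def insert(self, entry, value, tx):
        if not self._try_lock(entry, tx):
            return False
        self.entries[entry] = value
        return True

    def delete(self, entry, tx):
        if not self._try_lock(entry, tx):
            return False
        self.entries.pop(entry, None)
        return True

d = Directory()
assert d.insert("a", 1, "T1")
assert d.delete("b", "T2")         # different entry: allowed concurrently
assert not d.insert("a", 2, "T2")  # same entry: conflicts with T1's lock
```

An object-wide lock would have forced T2 to wait for T1 even though the two transactions touch different entries.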

Finally, what about deadlocks? When multiple transactions compete for the same resources in conflicting modes (locks), it is likely that some of them will fail to acquire those resources. If a transaction that cannot acquire a lock on a resource waits for it to be released, then that transaction is blocked – no forward progress can be made until the lock has been acquired. In some environments, it is possible for some transactions to be waiting for each other, where each of them is blocked and is also blocking another transaction. In this situation, none of the transactions can proceed and the system is deadlocked.

For example, let’s consider two transactions T1 and T2 that operate on two resources X and Y. Let’s assume that the execution of the operations involved in these transactions is:

T1: read(X); write(Y)
T2: read(Y); write(X)

Suppose the interleaved execution of these transactions results in:

readT1(X); readT2(Y); writeT2(X); writeT1(Y)

(Note, readT1 means the read operation performed by T1, and so on.)

Assume that T1 obtains a read lock on X and then T2 gets a read lock on Y – possible because these operations don't conflict and can thus occur in parallel. However, when T2 comes to write to X, its attempt to get a write lock on X will block because T1 still holds its read lock. Likewise, T1's attempt to get a write lock on Y will block because of the read lock that T2 holds. Each transaction is blocked waiting for the release of the other's read lock before it can progress: they are deadlocked.
The only way for the deadlock to be resolved is for at least one of the transactions to release its locks that are blocking another transaction. Obviously such a transaction cannot commit (it has not been able to perform all of its work since it was blocked); therefore, it must roll back.

Deadlock detection and prevention is complicated enough in a non-distributed environment without then including the extra complexity of distribution. In general, most transaction systems allow deadlocks to occur, simply because to do otherwise can be too restrictive for applications. There are several techniques for deadlock detection, but the two most popular are:

  • Timeout-based: if a transaction has been waiting longer than a specified period of time, the transaction system will automatically roll back the transaction on the assumption it is deadlocked. The main advantage of this approach is that it is easy to implement in a distributed environment; the main disadvantage is that some transactions may execute for longer than expected and be rolled back when they are not in fact deadlocked.

  • Graph-based: this explicitly tracks waiting transaction dependencies by constructing a waits-for graph: nodes are waiting transactions and edges represent waiting situations. The main advantage of this approach is that it is guaranteed to detect all deadlocks, whereas the main disadvantage is that in a distributed environment it can be costly to execute.
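Graph-based detection boils down to cycle detection in the waits-for graph. A minimal sketch (depth-first search; the graph representation is hypothetical):

```python
# Sketch of graph-based deadlock detection: nodes are transactions and
# an edge T -> U means "T is waiting for a lock held by U". A cycle in
# this waits-for graph means deadlock.

def find_deadlock(waits_for):
    """Return True if the waits-for graph contains a cycle (DFS)."""
    visiting, done = set(), set()

    def dfs(tx):
        if tx in visiting:
            return True          # back edge: cycle found
        if tx in done:
            return False
        visiting.add(tx)
        for blocker in waits_for.get(tx, ()):
            if dfs(blocker):
                return True
        visiting.discard(tx)
        done.add(tx)
        return False

    return any(dfs(tx) for tx in waits_for)

# The T1/T2 example from earlier: each waits on the other's read lock.
assert find_deadlock({"T1": ["T2"], "T2": ["T1"]})
assert not find_deadlock({"T1": ["T2"], "T2": []})
```

In a distributed setting the expensive part is not this search but assembling a consistent global graph from the waits-for edges held at each node.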

A slight variation on the timeout-based approach exists in some transaction systems, where timeouts can be associated with lock acquisition, such that the system will only block for the specified period of time. If the lock has not been acquired by the time this period elapses, control returns to the application with an indication that the lock was not obtained. It is then up to the application to determine what to do; for example, it may be possible to acquire the required data elsewhere or to ask for a lock in a different mode. The advantage of this approach is that a transaction is not automatically rolled back if it cannot acquire a lock, possibly saving the application lots of valuable time; the disadvantage is that it requires additional effort on the part of the application to resolve lock acquisition failures. However, for many objects and applications, this is precisely the best place to resolve lock conflicts, because this is the only place where the semantic information exists to know what (if anything) can be done to resolve them.
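This timed-acquisition pattern can be sketched with an ordinary lock that supports a timeout (Python's `threading.Lock.acquire` does; the helper function and callback names here are hypothetical):

```python
# Sketch of timed lock acquisition: instead of blocking indefinitely,
# block for at most `timeout` seconds and hand control back to the
# application on failure, letting it decide how to resolve the conflict.
import threading

def with_timed_lock(lock, timeout, action, on_conflict):
    if lock.acquire(timeout=timeout):
        try:
            return action()
        finally:
            lock.release()
    # Lock not obtained within the timeout: the application decides
    # what to do next (retry, go elsewhere, ask for a weaker mode...).
    return on_conflict()

lock = threading.Lock()
assert with_timed_lock(lock, 0.1, lambda: "done", lambda: "conflict") == "done"

lock.acquire()  # simulate another transaction holding the lock
assert with_timed_lock(lock, 0.05, lambda: "done", lambda: "conflict") == "conflict"
```

Note that the failed attempt does not roll anything back; it merely reports the conflict, which is exactly the behaviour described above.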
