DSF Concurrency Model
Version 1.0
Pawel Piech
© 2006, Wind River Systems. Released under EPL version 1.0.
Introduction
Providing a solution to concurrency problems is the primary design goal of DSF. To that end, DSF imposes a rather draconian restriction on services that use it: 1) all service interface methods must be called using a single designated dispatch thread, unless explicitly stated otherwise, and 2) the dispatch thread should never be used to make a blocking call (a call that waits on I/O or performs a long-running computation). The first restriction effectively means that the dispatch thread becomes a global "lock" that all DSF services in a given session share with each other, and which controls access to most of the services' shared data. It's important to note that multi-threading is still allowed within individual service implementations, but when crossing service interface boundaries, only the dispatch thread can be used. The second restriction simply ensures that the performance of the whole system is not killed by one service that needs to read a huge file over the network. Another way of looking at it is that the service implementations practice co-operative multi-threading using the single dispatch thread.
There are a couple of obvious side effects that result from this rule:
- When executing within the dispatch thread, the state of the services is guaranteed not to change. This means that thread-defensive programming techniques, such as making duplicates of lists before iterating over them, are not necessary. It is also possible to implement much more complicated logic which polls the state of many objects, without worrying about deadlocks.
- Whenever a blocking operation needs to be performed, it must be done using an asynchronous method. By the time the operation is completed, and the caller regains the dispatch thread, this caller may need to retest the relevant state of the system, because it could change completely while the asynchronous operation was executing.
The Mechanics
java.util.concurrent.ExecutorService
DSF builds on the vast array of tools added in Java 5.0's
java.util.concurrent package (see the Java 5 concurrency package API
for details), the most important of which is the ExecutorService
interface. ExecutorService is a formal interface for submitting
Runnable objects that will be executed according to the executor's
rules, which could be to execute the Runnable immediately, within a
thread pool, using a display thread, etc. For DSF, the main rule for
executors is that they have to use a single thread to execute the
runnables, and that the runnables be executed in the order in which
they were submitted. To give DSF clients and services a way to check
whether they are being called on the dispatch thread, we extended the
ExecutorService interface as follows:
public interface DsfExecutor extends ScheduledExecutorService {
    /**
     * Checks if the thread that this method is called in is the same as the
     * executor's dispatch thread.
     * @return true if in DSF executor's dispatch thread
     */
    public boolean isInExecutorThread();
}
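The idea behind the DsfExecutor interface above can be sketched with plain java.util.concurrent tools: a single-threaded scheduled executor whose one worker thread is recorded, so that isInExecutorThread() can compare the current thread against it. This is a minimal sketch, not the actual DSF implementation; the class and field names below are illustrative.

```java
import java.util.concurrent.*;

// Sketch: a single-threaded ScheduledExecutorService that records its one
// worker thread, so clients can check whether they are running in the
// dispatch thread (equivalent in spirit to DsfExecutor.isInExecutorThread()).
public class SingleThreadDispatch {
    static volatile Thread dispatchThread;

    static final ScheduledExecutorService EXECUTOR =
        Executors.newSingleThreadScheduledExecutor(r -> {
            dispatchThread = new Thread(r, "DSF dispatch (sketch)");
            return dispatchThread;
        });

    static boolean isInExecutorThread() {
        return Thread.currentThread() == dispatchThread;
    }

    public static void main(String[] args) throws Exception {
        // Outside the executor we are not in the dispatch thread...
        System.out.println(isInExecutorThread());            // false
        // ...but runnables submitted to the executor always are.
        Future<Boolean> inThread =
            EXECUTOR.submit(SingleThreadDispatch::isInExecutorThread);
        System.out.println(inThread.get());                  // true
        EXECUTOR.shutdown();
    }
}
```

A check like this is typically used in assertions at the top of service methods to catch callers that violate the dispatch-thread rule early.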
java.util.concurrent.Future vs org.eclipse.dd.dsf.concurrent.Done
The Done object encapsulates the return value of an asynchronous call
in DSF. It is actually merely a Runnable with an attached
org.eclipse.core.runtime.IStatus object, but it can be extended by
services or clients to hold whatever additional data is needed. The
typical pattern for using the Done object is as follows:
Service:

    public class Service {
        void asyncMethod(Done done) {
            new Job() {
                public void run() {
                    // perform calculation
                    ...
                    done.setStatus(new Status(IStatus.ERROR, ...));
                    fExecutor.execute(done);
                }
            }.schedule();
        }
    }

Client:

    ...
    Service service = new Service();
    final String clientData = "xyz";
    ...
    service.asyncMethod(new Done() {
        public void run() {
            if (getStatus().isOK()) {
                // Handle return data
                ...
            } else {
                // Handle error
                ...
            }
        }
    });
The service performs the asynchronous operation on a background thread,
but it can still submit the Done runnable with the executor. In other
words, the Done and other runnables can be submitted from any thread,
but will always execute in the single dispatch thread. Also, if the
implementation of asyncMethod() is non-blocking, it does not need to
start a job; it could just perform the operation in the dispatch
thread. On the client side, care has to be taken to save appropriate
state before the asynchronous method is called, because by the time
the Done is executed, the client state may have changed.
The java.util.concurrent package doesn't have a Done, because the
generic concurrent package is geared more towards large thread pools,
where clients submit tasks to be run in a style similar to Eclipse's
Jobs, rather than the single-dispatch-thread model of DSF. The
concurrent package does, however, have a roughly equivalent object,
Future. Future allows the client to call its get() method and block
while waiting for a result, and for this reason it cannot be used from
the dispatch thread. But it can be used, in a limited way, by clients
running on a background thread that still need to retrieve data from
synchronous DSF methods. In this case the code might look like the
following:
Service:

    public class Service {
        int syncMethod() {
            // perform calculation
            ...
            return result;
        }
    }

Client:

    ...
    DsfExecutor executor = new DefaultDsfExecutor();
    final Service service = new Service(executor);
    Future<Integer> future = executor.submit(new Callable<Integer>() {
        public Integer call() {
            return service.syncMethod();
        }
    });
    int result = future.get();
The biggest drawback to using Future with DSF services is that it does
not work with asynchronous methods. This is because the
Callable.call() implementation has to return a value within a single
dispatch cycle. To get around this, DSF has an additional object
called DsfQuery, which works like a Future combined with a Callable,
but allows the implementation to make multiple dispatches before
setting the return value for the client. The DsfQuery object works as
follows:
- Client creates the query object with its own implementation of DsfQuery.execute().
- Client calls the DsfQuery.get() method on non-dispatch thread, and blocks.
- The query is queued with the executor, and eventually the DsfQuery.execute() method is called on the dispatch thread.
- The query DsfQuery.execute() calls synchronous and asynchronous methods that are needed to do its job.
- The query code calls DsfQuery.done() method with the result.
- The DsfQuery.get() method unblocks and returns the result to the client.
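The steps above can be sketched using only java.util.concurrent types. This is a simplified illustration of the pattern, not the actual DSF query class; the names Query, execute(), done(), and get() mirror the prose, and the blocking is delegated to a CompletableFuture.

```java
import java.util.concurrent.*;

// Sketch of the DsfQuery pattern: the client blocks in get() while the
// query body runs in the dispatch thread, possibly across several dispatch
// cycles, before posting its result.
public class QuerySketch {
    static final ExecutorService DISPATCH = Executors.newSingleThreadExecutor();

    static abstract class Query<V> {
        private final CompletableFuture<V> fResult = new CompletableFuture<>();

        // Runs in the dispatch thread; may schedule further dispatches.
        protected abstract void execute();

        // Called by the query body (possibly cycles later) with the result.
        protected void done(V result) { fResult.complete(result); }

        // Called on a non-dispatch thread; queues execute() and blocks.
        V get() throws Exception {
            DISPATCH.execute(this::execute);
            return fResult.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Query<Integer> query = new Query<Integer>() {
            protected void execute() {
                // A second dispatch cycle happens before the result is set.
                DISPATCH.execute(() -> done(42));
            }
        };
        System.out.println(query.get());   // prints 42
        DISPATCH.shutdown();
    }
}
```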
[http://dsdp.eclipse.org/help/latest/topic/org.eclipse.dd.dsf.doc/reference/api/org/eclipse/dd/dsf/examples/concurrent/package-summary.html Slow Data Provider Example]
The point of DSF concurrency can be most easily explained through a
practical example. Suppose there is a viewer which needs to show data
that originates from a remote "provider". There is a considerable
delay in transmitting the data to and from the provider, and some
delay in processing the data. The viewer is a lazy-loading table,
which means that it requests information only about items that are
visible on the screen, and as the table is scrolled, new requests for
data are generated. The diagram below illustrates the logical
relationship between the components:
(Diagram: Slow Data Provider, showing the logical relationship between components; image dsf_concurrency_model-1.png)
In detail, these components look like this:
Table Viewer
The table viewer is the standard org.eclipse.jface.viewers.TableViewer, created with the SWT.VIRTUAL flag. It has an associated content provider (SlowDataProviderContentProvider), which handles all interactions with the data provider. The lazy content provider operates in a very simple cycle:
- Table viewer tells content provider that the input has changed by calling IContentProvider.inputChanged(). This means that the content provider has to query initial state of the data.
- Next the content provider tells the viewer how many elements there are, by calling TableViewer.setItemCount().
- At this point, the table resizes, and it requests data values for items that are visible. So for each visible item it calls: ILazyContentProvider.updateElement().
- After calculating the value, the content provider tells the table what the value is, by calling TableViewer.replace().
- If the data ever changes, the content provider tells the table to re-request the data, by calling TableViewer.clear().
The table viewer operates in the SWT display thread, which means that
the content provider must switch from the display thread to the DSF
dispatch thread whenever it is called by the table viewer, as in the
example below:
public void updateElement(final int index) {
    assert fTableViewer != null;
    if (fDataProvider == null) return;

    fDataProvider.getExecutor().execute(
        new Runnable() { public void run() {
            // Must check again, in case disposed while redispatching.
            if (fDataProvider == null) return;
            queryItemData(index);
        }});
}
Likewise, when the content provider calls the table viewer, it has to
switch back into the display thread, as in the following example,
where the content provider receives an event from the data provider
that an item value has changed.
public void dataChanged(final Set<Integer> indexes) {
    // Check for dispose.
    if (fDataProvider == null) return;

    // Clear changed items in table viewer.
    if (fTableViewer != null) {
        final TableViewer tableViewer = fTableViewer;
        tableViewer.getTable().getDisplay().asyncExec(
            new Runnable() { public void run() {
                // Check again if table wasn't disposed when
                // switching to the display thread.
                if (tableViewer.getTable().isDisposed()) return; // disposed
                for (Integer index : indexes) {
                    tableViewer.clear(index);
                }
            }});
    }
}
All of this switching back and forth between threads makes the code
look more complicated than it really is, and it takes some getting
used to, but this is the price to be paid for multi-threading.
Whether the participants use semaphores or the dispatch thread, the
logic is equally complicated, and we believe that using a single
dispatch thread makes the synchronization very explicit and thus less
error-prone.
Data Provider Service
The data provider service interface, DataProvider, is very similar to that of the lazy content provider. It has methods to:
- get item count
- get a value for given item
- register as listener for changes in data count and data values
But this is a DSF interface, and all methods must be called on the
service's dispatch thread. For this reason, the DataProvider interface returns
an instance of DsfExecutor,
which must be used with the interface.
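Based on the description above, the DataProvider interface might look roughly like the sketch below. The names and signatures here are illustrative assumptions, not the exact ones in the example plugin; in particular the callback types stand in for the DSF Done classes.

```java
import java.util.Set;
import java.util.concurrent.Executor;

// Hypothetical sketch of the DataProvider service interface: item count,
// item value retrieval, and change listeners, all dispatch-thread-only.
public interface DataProviderSketch {
    interface Listener {
        void countChanged(int newCount);
        void dataChanged(Set<Integer> indexes);
    }
    interface IntCallback { void done(int value); }
    interface StringCallback { void done(String value); }

    // All other methods must be called in this executor's dispatch thread.
    Executor getExecutor();

    // Asynchronous: results are delivered through the callbacks.
    void getItemCount(IntCallback done);
    void getItem(int index, StringCallback done);

    void addListener(Listener listener);
    void removeListener(Listener listener);
}
```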
Slow Data Provider
The data provider is actually implemented as a thread which is an inner class of the SlowDataProvider service. The provider thread communicates with the service by reading Request objects from a shared queue, and by posting Runnable objects directly to the DsfExecutor, but with a simulated transmission delay. Separately, an additional flag is used to control the shutdown of the provider thread.
To simulate a real back end, the data provider randomly invalidates a
set of items and notifies the listeners to update themselves. It
also periodically invalidates the whole table and forces the clients
to re-query all items.
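The queue-and-post pattern described above can be sketched with plain java.util.concurrent types. All names here are illustrative rather than the example plugin's actual code, and a CompletableFuture stands in for the DSF Done callback.

```java
import java.util.concurrent.*;

// Sketch: a background provider thread takes Request objects from a shared
// queue, simulates transmission delay, and posts results back through the
// single dispatch executor.
public class ProviderThreadSketch {
    static class Request {
        final int fIndex;
        final CompletableFuture<String> fDone = new CompletableFuture<>();
        Request(int index) { fIndex = index; }
    }

    static final BlockingQueue<Request> QUEUE = new LinkedBlockingQueue<>();
    static final ExecutorService DISPATCH = Executors.newSingleThreadExecutor();

    // Start the provider thread (daemon so the VM can still exit).
    static void startProvider() {
        Thread provider = new Thread(() -> {
            try {
                while (true) {
                    Request r = QUEUE.take();
                    Thread.sleep(10); // simulated transmission delay
                    // Post the result back in the dispatch thread.
                    DISPATCH.execute(() -> r.fDone.complete("value " + r.fIndex));
                }
            } catch (InterruptedException e) { /* shutdown */ }
        });
        provider.setDaemon(true);
        provider.start();
    }

    // Client side: file a request and block for the result.
    static String fetch(int index) throws Exception {
        Request r = new Request(index);
        QUEUE.add(r);
        return r.fDone.get();
    }

    public static void main(String[] args) throws Exception {
        startProvider();
        System.out.println(fetch(7)); // prints "value 7"
    }
}
```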
Data and Control Flow
This can be described in the following steps:
- The table viewer requests data for an item at a given index (SlowDataProviderContentProvider.updateElement).
- The table viewer's content provider executes a Runnable in the DSF dispatch thread and calls the data provider interface (SlowDataProviderContentProvider.queryItemData).
- Data provider service creates a Request object, and files it in a queue (SlowDataProvider.getItem).
- Data provider thread de-queues the Request object and acts on it, calculating the value (ProviderThread.processItemRequest).
- Data provider thread schedules the calculation result to be posted with DSF executor (SlowDataProvider.java:185).
- The Done callback sets the result data in the table viewer (SlowDataProviderContentProvider.java:167).
Running the example and full sources
This example is implemented in the org.eclipse.dd.dsf.examples
plugin, in the org.eclipse.dd.dsf.examples.concurrent
package.
To run the example:
- Build the test plugin (along with the org.eclipse.dsdp.DSF plugin)
and launch the PDE.
- Make sure to add the DSF Tests action set to your current perspective.
- From the main menu, select DSF Tests -> Slow Data Provider.
- A dialog will open and after a delay it will populate with data.
- Scroll and resize dialog and observe the update behavior.
Initial Notes
This example is supposed to be representative of a typical embedded
debugger design problem. Embedded debuggers are often slow in
retrieving and processing data, and can sometimes be accessed through
a relatively slow data channel, such as a serial port or JTAG
connection. As such, this basic example presents a couple of major
usability problems:
- The data provider service interface mirrors the table's content provider interface, in that it has a method to retrieve a single piece of data at a time. The result of this is visible to the user as lines of data are filled in one-by-one in the table. However, most debugger back ends are in fact capable of retrieving data in batches and are much more efficient at it than retrieving data items one-by-one.
- When scrolling quickly through the table, the requests are
generated by the table viewer for items which are quickly scrolled out
of view, but the service still queues them up and calculates them in
the order they were received. As a result, it takes a very long
time for the table to be populated with data at the location where the
user is looking.
These two problems are very common in creating UI for embedded
debugging, and there are common patterns which can be used to solve
these problems in DSF services.
Coalescing
Coalescing many single-item requests into fewer multi-item requests is
the surest way to improve performance in communication with a remote
debugger, although it's not necessarily the simplest. There are
two basic patterns in which coalescing is achieved:
- The back end provides an interface for retrieving data in large chunks. So when the service implementation receives a request for a single item, it retrieves a whole chunk of data, returns the single item, and stores the rest of the data in a local cache.
- The back end provides an interface for retrieving data in variable-size chunks. When the service implementation receives a request for a single item, it buffers the request and waits for other requests to come in. After a delay, the service clears the buffer and submits a request for the combined items to the data provider.
In practice, a combination of the two patterns is needed, but for the purpose of this example, we implemented the second pattern in the "Input-Coalescing Slow Data Provider" (InputCoalescingSlowDataProvider.java).
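The first pattern, fetching a whole chunk and caching the remainder, is not shown in the example plugin, but a minimal sketch of it might look like the following. The back-end call, chunk size, and instrumentation counter are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of coalescing pattern 1: a single-item request triggers a bulk
// fetch from the back end; the extra items are kept in a local cache.
public class ChunkCacheSketch {
    static final int CHUNK_SIZE = 16;
    private final Map<Integer, String> fCache = new HashMap<>();
    private int fBackEndFetches; // instrumentation for the sketch

    // Stand-in for a back-end call that is efficient at bulk retrieval.
    private String[] fetchChunkFromBackEnd(int startIndex) {
        fBackEndFetches++;
        String[] chunk = new String[CHUNK_SIZE];
        for (int i = 0; i < CHUNK_SIZE; i++) chunk[i] = "item " + (startIndex + i);
        return chunk;
    }

    public String getItem(int index) {
        if (!fCache.containsKey(index)) {
            // Fetch the whole aligned chunk containing this index.
            int start = (index / CHUNK_SIZE) * CHUNK_SIZE;
            String[] chunk = fetchChunkFromBackEnd(start);
            for (int i = 0; i < CHUNK_SIZE; i++) fCache.put(start + i, chunk[i]);
        }
        return fCache.get(index);
    }

    public int getBackEndFetches() { return fBackEndFetches; }
}
```

With this sketch, requesting items 3 and 4 in sequence results in a single back-end fetch, which is the point of the pattern.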
Input Buffer
The main feature of this pattern is a buffer for holding the requests before sending them to the data provider. In this example the user requests are buffered in two arrays: fGetItemIndexesBuffer and fGetItemDonesBuffer. The DataProvider.getItem() implementation is changed as follows:
public void getItem(final int index, final GetDataDone<String> done) {
    // Schedule a buffer-servicing call, if one is needed.
    if (fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.schedule(
            new Runnable() { public void run() {
                fileBufferedRequests();
            }},
            COALESCING_DELAY_TIME,
            TimeUnit.MILLISECONDS);
    }

    // Add the call data to the buffer.
    // Note: it doesn't matter that the items were added to the buffer
    // after the buffer-servicing request was scheduled.  This is because
    // the buffers are guaranteed not to be modified until this dispatch
    // cycle is over.
    fGetItemIndexesBuffer.add(index);
    fGetItemDonesBuffer.add(done);
}
And the method that services the buffer looks like this:
public void fileBufferedRequests() {
    // Remove a number of getItem() calls from the buffer, and combine them
    // into a request.
    int numToCoalesce = Math.min(fGetItemIndexesBuffer.size(), COALESCING_COUNT_LIMIT);
    final ItemRequest request = new ItemRequest(new Integer[numToCoalesce], new GetDataDone[numToCoalesce]);
    for (int i = 0; i < numToCoalesce; i++) {
        request.fIndexes[i] = fGetItemIndexesBuffer.remove(0);
        request.fDones[i] = fGetItemDonesBuffer.remove(0);
    }

    // Queue the coalesced request, with the appropriate transmission delay.
    fQueue.add(request);

    // If there are still calls left in the buffer, execute another
    // buffer-servicing call, but without any delay.
    if (!fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.execute(new Runnable() { public void run() {
            fileBufferedRequests();
        }});
    }
}
The most interesting feature of this implementation is that there are
no semaphores anywhere to control access to the input buffers. Even
though the buffers are serviced with a delay, and multiple clients can
call the getItem() method, the use of a single dispatch thread
prevents any race conditions that could corrupt the buffer data. In
real-world implementations, the buffers and caches that need to be
used are far more sophisticated, with much more complicated logic, and
this is where managing access to them using the dispatch thread is
even more important.
Cancellability
Table Viewer
Unlike coalescing, which can be implemented entirely within the service, cancellability requires that the client be modified as well to take advantage of this capability. For the table viewer content provider, this means that additional features have to be added. In CancellingSlowDataProviderContentProvider.java,
ILazyContentProvider.updateElement()
was changed as follows:
public void updateElement(final int index) {
    assert fTableViewer != null;
    if (fDataProvider == null) return;

    // Calculate the visible index range.
    final int topIdx = fTableViewer.getTable().getTopIndex();
    final int botIdx = topIdx + getVisibleItemCount(topIdx);

    fCancelCallsPending.incrementAndGet();
    fDataProvider.getExecutor().execute(
        new Runnable() { public void run() {
            // Must check again, in case disposed while redispatching.
            if (fDataProvider == null || fTableViewer.getTable().isDisposed()) return;
            if (index >= topIdx && index <= botIdx) {
                queryItemData(index);
            }
            cancelStaleRequests(topIdx, botIdx);
        }});
}
Now the client keeps track of the requests it has made to the service
in fItemDataDones, and above, cancelStaleRequests() iterates through
all the outstanding requests and cancels the ones that are no longer
in the visible range.
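The bookkeeping described above can be sketched as follows. The ItemDone type and the method shapes are assumptions based on the prose, not the example plugin's actual code; in the plugin, the done objects are DSF callbacks rather than this plain class.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: track outstanding requests by table index and cancel the ones
// that have scrolled out of the visible range.
public class CancelSketch {
    static class ItemDone {
        boolean fCanceled;
        void cancel() { fCanceled = true; }
    }

    // Outstanding requests, keyed by table index (fItemDataDones in the prose).
    private final Map<Integer, ItemDone> fItemDataDones = new HashMap<>();

    void request(int index) { fItemDataDones.put(index, new ItemDone()); }

    // Cancel every outstanding request outside the visible [topIdx, botIdx] range.
    void cancelStaleRequests(int topIdx, int botIdx) {
        for (Iterator<Map.Entry<Integer, ItemDone>> it =
                 fItemDataDones.entrySet().iterator(); it.hasNext();) {
            Map.Entry<Integer, ItemDone> entry = it.next();
            if (entry.getKey() < topIdx || entry.getKey() > botIdx) {
                entry.getValue().cancel();
                it.remove();
            }
        }
    }

    int outstanding() { return fItemDataDones.size(); }
}
```

Because this runs entirely in the dispatch thread, no locking is needed around fItemDataDones, mirroring the input-buffer discussion above.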
Data Provider Service
The data provider implementation (CancellableInputCoalescingSlowDataProvider.java) builds on top of the coalescing data provider. To make the canceling feature useful, the data provider service has to limit the size of the request queue. This is because this example simulates communication with a target: once requests are filed into the request queue, they cannot be canceled, just as a client can't cancel requests once it has sent them over a socket. So instead, if a flood of getItem() calls comes in, the service has to hold most of them in the coalescing buffer in case the client decides to cancel them. Therefore, the fileBufferedRequests() method includes a simple check before servicing the buffer, and if the request queue is full, the buffer-servicing call is delayed.
if (fQueue.size() >= REQUEST_QUEUE_SIZE_LIMIT) {
    if (fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.schedule(
            new Runnable() { public void run() {
                fileBufferedRequests();
            }},
            REQUEST_BUFFER_FULL_RETRY_DELAY,
            TimeUnit.MILLISECONDS);
    }
    return;
}
Beyond this, the only other significant change is that before the
requests are queued, they are checked for cancellation.
Final Notes
The example given here is fairly simplistic, and chances are that the
same example could be implemented using semaphores and free threading
with perhaps fewer lines of code. But what we have found is that as
the problem gets bigger, as the number of features in the data provider
increases, as the state of the communication protocol gets more
complicated, and as the number of modules needed in the service layer
grows, using free threading and semaphores does not safely scale.
Using a dispatch thread for synchronization certainly doesn't make the
inherent problems of the system less complicated, but it does help
eliminate race conditions and deadlocks from the overall system.
Coalescing and Cancellability are both optimizations. Neither of these optimizations affected the original interface of the service, and one of them only needed a service-side modification. But as with all optimizations, it is often better to first make sure that the whole system is working correctly and then add optimizations where they can make the biggest difference in user experience.
The above examples of optimizations can take many forms, and as
mentioned with coalescing, caching data that is retrieved from the data
provider is the most common form of data coalescing. For
cancellation, many services in DSF build on top of other services,
which means that even a low-level service can cause a higher
level service to retrieve data, while another event might cause it to
cancel those requests. The perfect example of this is a Variables
service, which is responsible for calculating the value of expressions
shown in the Variables view. The Variables service reacts to the
Run Control service, which issues a suspended event and then requests a
set of variables to be evaluated by the debugger back end. But as
soon as a resumed event is issued by Run Control, the Variables service
needs to cancel the pending evaluation requests.