DSF Concurrency Model
Version 1.0
Pawel Piech
© 2006, Wind River Systems. Released under EPL version 1.0.
Introduction
Providing a solution to concurrency problems is the primary design goal of DSF. To that end, DSF imposes a rather draconian restriction on services that use it: 1) all service interface methods must be called using a single designated dispatch thread, unless explicitly stated otherwise, and 2) the dispatch thread should never be used to make a blocking call (a call that waits on I/O or performs a long-running computation). The first restriction effectively means that the dispatch thread becomes a global "lock" that all DSF services in a given session share with each other, and which controls access to most of the services' shared data. It's important to note that multi-threading is still allowed within individual service implementations, but when crossing service interface boundaries, only the dispatch thread can be used. The second restriction simply ensures that the performance of the whole system is not killed by one service that needs to read a huge file over the network. Another way of looking at it is that the service implementations practice co-operative multi-threading using the single dispatch thread.
There are a couple of obvious side effects that result from this rule:
- When executing within the dispatch thread, the state of the services is guaranteed not to change. This means that thread-defensive programming techniques, such as making duplicates of lists before iterating over them, are not necessary. It is also possible to implement much more complicated logic which polls the state of many objects, without worrying about deadlocks.
- Whenever a blocking operation needs to be performed, it must be done using an asynchronous method. By the time the operation is completed, and the caller regains the dispatch thread, this caller may need to retest the relevant state of the system, because it could change completely while the asynchronous operation was executing.
The Mechanics
java.util.concurrent.ExecutorService
DSF builds on the vast array of tools added in Java 5.0's
java.util.concurrent package (see the Java 5 concurrency package API
for details), the most important of which is the ExecutorService
interface. ExecutorService is a formal interface for submitting
Runnable objects that will be executed according to the executor's
rules, which could be to execute the Runnable immediately, within a
thread pool, using a display thread, etc. For DSF, the main rule for
executors is that they have to use a single thread to execute the
runnables, and that the runnables be executed in the order in which
they were submitted. To give DSF clients and services a way to check
whether they are being called on the dispatch thread, we extended the
ExecutorService interface as follows:
public interface DsfExecutor extends ScheduledExecutorService {
    /**
     * Checks if the thread that this method is called in is the same as the
     * executor's dispatch thread.
     * @return true if in DSF executor's dispatch thread
     */
    public boolean isInExecutorThread();
}
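The idea behind the DsfExecutor interface above can be sketched with plain java.util.concurrent tools: a single-threaded scheduled executor whose one worker thread is recorded, so that isInExecutorThread() can compare the current thread against it. This is a minimal sketch, not the actual DSF implementation; the class and field names below are illustrative.

```java
import java.util.concurrent.*;

// Sketch: a single-threaded ScheduledExecutorService that records its one
// worker thread, so clients can check whether they are running in the
// dispatch thread (equivalent in spirit to DsfExecutor.isInExecutorThread()).
public class SingleThreadDispatch {
    static volatile Thread dispatchThread;

    static final ScheduledExecutorService EXECUTOR =
        Executors.newSingleThreadScheduledExecutor(r -> {
            dispatchThread = new Thread(r, "DSF dispatch (sketch)");
            return dispatchThread;
        });

    static boolean isInExecutorThread() {
        return Thread.currentThread() == dispatchThread;
    }

    public static void main(String[] args) throws Exception {
        // Outside the executor we are not in the dispatch thread...
        System.out.println(isInExecutorThread());            // false
        // ...but runnables submitted to the executor always are.
        Future<Boolean> inThread =
            EXECUTOR.submit(SingleThreadDispatch::isInExecutorThread);
        System.out.println(inThread.get());                  // true
        EXECUTOR.shutdown();
    }
}
```

A check like this is typically used in assertions at the top of service methods to catch callers that violate the dispatch-thread rule early.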
java.util.concurrent.Future vs org.eclipse.dd.dsf.concurrent.Done
The Done object encapsulates the return value of an asynchronous call
in DSF. It is actually merely a Runnable with an attached
org.eclipse.core.runtime.IStatus object, but it can be extended by
services or clients to hold whatever additional data is needed. The
typical pattern for using the Done object is as follows:
Service:

    public class Service {
        void asyncMethod(Done done) {
            new Job() {
                public void run() {
                    // perform calculation
                    ...
                    done.setStatus(new Status(IStatus.ERROR, ...));
                    fExecutor.execute(done);
                }
            }.schedule();
        }
    }

Client:

    ...
    Service service = new Service();
    final String clientData = "xyz";
    ...
    service.asyncMethod(new Done() {
        public void run() {
            if (getStatus().isOK()) {
                // Handle return data
                ...
            } else {
                // Handle error
                ...
            }
        }
    });
The service performs the asynchronous operation on a background thread,
but it can still submit the Done runnable with the executor. In other
words, the Done and other runnables can be submitted from any thread,
but will always execute in the single dispatch thread. Also, if the
implementation of asyncMethod() is non-blocking, it does not need to
start a job; it could just perform the operation in the dispatch
thread. On the client side, care has to be taken to save appropriate
state before the asynchronous method is called, because by the time
the Done is executed, the client state may have changed.
The java.util.concurrent package doesn't have a Done, because the
generic concurrent package is geared more towards large thread pools,
where clients submit tasks to be run in a style similar to Eclipse's
Jobs, rather than the single-dispatch-thread model of DSF. The
concurrent package does, however, have a roughly equivalent object,
Future. Future allows the client to call its get() method and block
while waiting for a result, and for this reason it cannot be used from
the dispatch thread. But it can be used, in a limited way, by clients
running on a background thread that still need to retrieve data from
synchronous DSF methods. In this case the code might look like the
following:
Service:

    public class Service {
        int syncMethod() {
            // perform calculation
            ...
            return result;
        }
    }

Client:

    ...
    DsfExecutor executor = new DefaultDsfExecutor();
    final Service service = new Service(executor);
    Future<Integer> future = executor.submit(new Callable<Integer>() {
        public Integer call() {
            return service.syncMethod();
        }
    });
    int result = future.get();
The biggest drawback to using Future with DSF services is that it does
not work with asynchronous methods. This is because the
Callable.call() implementation has to return a value within a single
dispatch cycle. To get around this, DSF has an additional object
called DsfQuery, which works like a Future combined with a Callable,
but allows the implementation to make multiple dispatches before
setting the return value for the client. The DsfQuery object works as
follows:
- Client creates the query object with its own implementation of DsfQuery.execute().
- Client calls the DsfQuery.get() method on non-dispatch thread, and blocks.
- The query is queued with the executor, and eventually the DsfQuery.execute() method is called on the dispatch thread.
- The query DsfQuery.execute() calls synchronous and asynchronous methods that are needed to do its job.
- The query code calls DsfQuery.done() method with the result.
- The DsfQuery.get() method unblocks and returns the result to the client.
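The steps above can be sketched using only java.util.concurrent types. This is a simplified illustration of the pattern, not the actual DSF query class; the names Query, execute(), done(), and get() mirror the prose, and the blocking is delegated to a CompletableFuture.

```java
import java.util.concurrent.*;

// Sketch of the DsfQuery pattern: the client blocks in get() while the
// query body runs in the dispatch thread, possibly across several dispatch
// cycles, before posting its result.
public class QuerySketch {
    static final ExecutorService DISPATCH = Executors.newSingleThreadExecutor();

    static abstract class Query<V> {
        private final CompletableFuture<V> fResult = new CompletableFuture<>();

        // Runs in the dispatch thread; may schedule further dispatches.
        protected abstract void execute();

        // Called by the query body (possibly cycles later) with the result.
        protected void done(V result) { fResult.complete(result); }

        // Called on a non-dispatch thread; queues execute() and blocks.
        V get() throws Exception {
            DISPATCH.execute(this::execute);
            return fResult.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Query<Integer> query = new Query<Integer>() {
            protected void execute() {
                // A second dispatch cycle happens before the result is set.
                DISPATCH.execute(() -> done(42));
            }
        };
        System.out.println(query.get());   // prints 42
        DISPATCH.shutdown();
    }
}
```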
[http://dsdp.eclipse.org/help/latest/topic/org.eclipse.dd.dsf.doc/reference/api/org/eclipse/dd/dsf/examples/concurrent/package-summary.html Slow Data Provider Example]
The point of DSF concurrency can be most easily explained through a
practical example. Suppose there is a viewer which needs to show data
that originates from a remote "provider". There is a considerable
delay in transmitting the data to and from the provider, and some
delay in processing the data. The viewer is a lazy-loading table,
which means that it requests information only about items that are
visible on the screen, and as the table is scrolled, new requests for
data are generated. The diagram below illustrates the logical
relationship between the components:
(Diagram: Slow Data Provider, showing the logical relationship between components; image dsf_concurrency_model-1.png)
In detail, these components look like this:
Table Viewer
The table viewer is the standard org.eclipse.jface.viewers.TableViewer, created with the SWT.VIRTUAL flag. It has an associated content provider (SlowDataProviderContentProvider), which handles all interactions with the data provider. The lazy content provider operates in a very simple cycle:
- Table viewer tells content provider that the input has changed by calling IContentProvider.inputChanged(). This means that the content provider has to query initial state of the data.
- Next the content provider tells the viewer how many elements there are, by calling TableViewer.setItemCount().
- At this point, the table resizes, and it requests data values for items that are visible. So for each visible item it calls: ILazyContentProvider.updateElement().
- After calculating the value, the content provider tells the table what the value is, by calling TableViewer.replace().
- If the data ever changes, the content provider tells the table to re-request the data, by calling TableViewer.clear().
The table viewer operates in the SWT display thread, which means that
the content provider must switch from the display thread to the DSF
dispatch thread whenever it is called by the table viewer, as in the
example below:
public void updateElement(final int index) {
    assert fTableViewer != null;
    if (fDataProvider == null) return;

    fDataProvider.getExecutor().execute(
        new Runnable() { public void run() {
            // Must check again, in case disposed while redispatching.
            if (fDataProvider == null) return;
            queryItemData(index);
        }});
}
Likewise, when the content provider calls the table viewer, it has to
switch back into the display thread, as in the following example,
where the content provider receives an event from the data provider
that an item value has changed.
public void dataChanged(final Set<Integer> indexes) {
    // Check for dispose.
    if (fDataProvider == null) return;

    // Clear changed items in table viewer.
    if (fTableViewer != null) {
        final TableViewer tableViewer = fTableViewer;
        tableViewer.getTable().getDisplay().asyncExec(
            new Runnable() { public void run() {
                // Check again if table wasn't disposed when
                // switching to the display thread.
                if (tableViewer.getTable().isDisposed()) return; // disposed
                for (Integer index : indexes) {
                    tableViewer.clear(index);
                }
            }});
    }
}
All of this switching back and forth between threads makes the code
look more complicated than it really is, and it takes some getting
used to, but this is the price to be paid for multi-threading.
Whether the participants use semaphores or the dispatch thread, the
logic is equally complicated, and we believe that using a single
dispatch thread makes the synchronization very explicit and thus less
error-prone.
Data Provider Service
The data provider service interface, DataProvider, is very similar to that of the lazy content provider. It has methods to:
- get item count
- get a value for given item
- register as listener for changes in data count and data values
But this is a DSF interface, and all methods must be called on the
service's dispatch thread. For this reason, the DataProvider interface returns
an instance of DsfExecutor,
which must be used with the interface.
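Based on the description above, the DataProvider interface might look roughly like the sketch below. The names and signatures here are illustrative assumptions, not the exact ones in the example plugin; in particular the callback types stand in for the DSF Done classes.

```java
import java.util.Set;
import java.util.concurrent.Executor;

// Hypothetical sketch of the DataProvider service interface: item count,
// item value retrieval, and change listeners, all dispatch-thread-only.
public interface DataProviderSketch {
    interface Listener {
        void countChanged(int newCount);
        void dataChanged(Set<Integer> indexes);
    }
    interface IntCallback { void done(int value); }
    interface StringCallback { void done(String value); }

    // All other methods must be called in this executor's dispatch thread.
    Executor getExecutor();

    // Asynchronous: results are delivered through the callbacks.
    void getItemCount(IntCallback done);
    void getItem(int index, StringCallback done);

    void addListener(Listener listener);
    void removeListener(Listener listener);
}
```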
Slow Data Provider
The data provider is actually implemented as a thread which is an inner class of the SlowDataProvider service. The provider thread communicates with the service by reading Request objects from a shared queue, and by posting Runnable objects directly to the DsfExecutor, but with a simulated transmission delay. Separately, an additional flag is used to control the shutdown of the provider thread.
To simulate a real back end, the data provider randomly invalidates a
set of items and notifies the listeners to update themselves. It
also periodically invalidates the whole table and forces the clients
to re-query all items.
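The queue-and-post pattern described above can be sketched with plain java.util.concurrent types. All names here are illustrative rather than the example plugin's actual code, and a CompletableFuture stands in for the DSF Done callback.

```java
import java.util.concurrent.*;

// Sketch: a background provider thread takes Request objects from a shared
// queue, simulates transmission delay, and posts results back through the
// single dispatch executor.
public class ProviderThreadSketch {
    static class Request {
        final int fIndex;
        final CompletableFuture<String> fDone = new CompletableFuture<>();
        Request(int index) { fIndex = index; }
    }

    static final BlockingQueue<Request> QUEUE = new LinkedBlockingQueue<>();
    static final ExecutorService DISPATCH = Executors.newSingleThreadExecutor();

    // Start the provider thread (daemon so the VM can still exit).
    static void startProvider() {
        Thread provider = new Thread(() -> {
            try {
                while (true) {
                    Request r = QUEUE.take();
                    Thread.sleep(10); // simulated transmission delay
                    // Post the result back in the dispatch thread.
                    DISPATCH.execute(() -> r.fDone.complete("value " + r.fIndex));
                }
            } catch (InterruptedException e) { /* shutdown */ }
        });
        provider.setDaemon(true);
        provider.start();
    }

    // Client side: file a request and block for the result.
    static String fetch(int index) throws Exception {
        Request r = new Request(index);
        QUEUE.add(r);
        return r.fDone.get();
    }

    public static void main(String[] args) throws Exception {
        startProvider();
        System.out.println(fetch(7)); // prints "value 7"
    }
}
```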
Data and Control Flow
This can be described in the following steps:
- The table viewer requests data for an item at a given index (SlowDataProviderContentProvider.updateElement).
- The table viewer's content provider executes a Runnable in the DSF dispatch thread and calls the data provider interface (SlowDataProviderContentProvider.queryItemData).
- Data provider service creates a Request object, and files it in a queue (SlowDataProvider.getItem).
- Data provider thread de-queues the Request object and acts on it, calculating the value (ProviderThread.processItemRequest).
- Data provider thread schedules the calculation result to be posted with DSF executor (SlowDataProvider.java:185).
- The Done callback sets the result data in the table viewer (SlowDataProviderContentProvider.java:167).
Running the example and full sources
This example is implemented in the org.eclipse.dd.dsf.examples
plugin, in the org.eclipse.dd.dsf.examples.concurrent
package.
To run the example:
- Build the test plugin (along with the org.eclipse.dsdp.DSF plugin)
and launch the PDE.
- Make sure to add the DSF Tests action set to your current perspective.
- From the main menu, select DSF Tests -> Slow Data Provider.
- A dialog will open and after a delay it will populate with data.
- Scroll and resize dialog and observe the update behavior.
Initial Notes
This example is supposed to be representative of a typical embedded
debugger design problem. Embedded debuggers are often slow in
retrieving and processing data, and can sometimes be accessed through
a relatively slow data channel, such as a serial port or JTAG
connection. As such, this basic example presents a couple of major
usability problems:
- The data provider service interface mirrors the table's content provider interface, in that it has a method to retrieve a single piece of data at a time. The result of this is visible to the user as lines of data are filled in one-by-one in the table. However, most debugger back ends are in fact capable of retrieving data in batches and are much more efficient at it than retrieving data items one-by-one.
- When scrolling quickly through the table, the requests are
generated by the table viewer for items which are quickly scrolled out
of view, but the service still queues them up and calculates them in
the order they were received. As a result, it takes a very long
time for the table to be populated with data at the location where the
user is looking.
These two problems are very common in creating UI for embedded
debugging, and there are common patterns which can be used to solve
these problems in DSF services.
Coalescing
Coalescing many single-item requests into fewer multi-item requests is
the surest way to improve performance in communication with a remote
debugger, although it's not necessarily the simplest. There are
two basic patterns in which coalescing is achieved:
- The back end provides an interface for retrieving data in large chunks. So when the service implementation receives a request for a single item, it retrieves a whole chunk of data, returns the single item, and stores the rest of the data in a local cache.
- The back end provides an interface for retrieving data in variable-size chunks. When the service implementation receives a request for a single item, it buffers the request and waits for other requests to come in. After a delay, the service clears the buffer and submits a request for the combined items to the data provider.
In practice, a combination of the two patterns is needed, but for the purpose of this example, we implemented the second pattern in the "Input-Coalescing Slow Data Provider" (InputCoalescingSlowDataProvider.java).
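The first pattern, fetching a whole chunk and caching the remainder, is not shown in the example plugin, but a minimal sketch of it might look like the following. The back-end call, chunk size, and instrumentation counter are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of coalescing pattern 1: a single-item request triggers a bulk
// fetch from the back end; the extra items are kept in a local cache.
public class ChunkCacheSketch {
    static final int CHUNK_SIZE = 16;
    private final Map<Integer, String> fCache = new HashMap<>();
    private int fBackEndFetches; // instrumentation for the sketch

    // Stand-in for a back-end call that is efficient at bulk retrieval.
    private String[] fetchChunkFromBackEnd(int startIndex) {
        fBackEndFetches++;
        String[] chunk = new String[CHUNK_SIZE];
        for (int i = 0; i < CHUNK_SIZE; i++) chunk[i] = "item " + (startIndex + i);
        return chunk;
    }

    public String getItem(int index) {
        if (!fCache.containsKey(index)) {
            // Fetch the whole aligned chunk containing this index.
            int start = (index / CHUNK_SIZE) * CHUNK_SIZE;
            String[] chunk = fetchChunkFromBackEnd(start);
            for (int i = 0; i < CHUNK_SIZE; i++) fCache.put(start + i, chunk[i]);
        }
        return fCache.get(index);
    }

    public int getBackEndFetches() { return fBackEndFetches; }
}
```

With this sketch, requesting items 3 and 4 in sequence results in a single back-end fetch, which is the point of the pattern.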
Input Buffer
The main feature of this pattern is a buffer for holding the requests before sending them to the data provider. In this example the user requests are buffered in two arrays: fGetItemIndexesBuffer and fGetItemDonesBuffer. The DataProvider.getItem() implementation is changed as follows:
public void getItem(final int index, final GetDataDone<String> done) {
    // Schedule a buffer-servicing call, if one is needed.
    if (fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.schedule(
            new Runnable() { public void run() {
                fileBufferedRequests();
            }},
            COALESCING_DELAY_TIME,
            TimeUnit.MILLISECONDS);
    }

    // Add the call data to the buffer.
    // Note: it doesn't matter that the items were added to the buffer
    // after the buffer-servicing request was scheduled.  This is because
    // the buffers are guaranteed not to be modified until this dispatch
    // cycle is over.
    fGetItemIndexesBuffer.add(index);
    fGetItemDonesBuffer.add(done);
}
And the method that services the buffer looks like this:
public void fileBufferedRequests() {
    // Remove a number of getItem() calls from the buffer, and combine them
    // into a request.
    int numToCoalesce = Math.min(fGetItemIndexesBuffer.size(), COALESCING_COUNT_LIMIT);
    final ItemRequest request = new ItemRequest(new Integer[numToCoalesce], new GetDataDone[numToCoalesce]);
    for (int i = 0; i < numToCoalesce; i++) {
        request.fIndexes[i] = fGetItemIndexesBuffer.remove(0);
        request.fDones[i] = fGetItemDonesBuffer.remove(0);
    }

    // Queue the coalesced request, with the appropriate transmission delay.
    fQueue.add(request);

    // If there are still calls left in the buffer, execute another
    // buffer-servicing call, but without any delay.
    if (!fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.execute(new Runnable() { public void run() {
            fileBufferedRequests();
        }});
    }
}
The most interesting feature of this implementation is that there are
no semaphores anywhere to control access to the input buffers. Even
though the buffers are serviced with a delay, and multiple clients can
call the getItem() method, the use of a single dispatch thread
prevents any race conditions that could corrupt the buffer data. In
real-world implementations, the buffers and caches that need to be
used are far more sophisticated, with much more complicated logic, and
this is where managing access to them using the dispatch thread is
even more important.
Cancellability
Table Viewer
Unlike coalescing, which can be implemented entirely within the service, cancellability requires that the client be modified as well to take advantage of this capability. For the table viewer content provider, this means that additional features have to be added. In CancellingSlowDataProviderContentProvider.java,
ILazyContentProvider.updateElement()
was changed as follows:
public void updateElement(final int index) {
    assert fTableViewer != null;
    if (fDataProvider == null) return;

    // Calculate the visible index range.
    final int topIdx = fTableViewer.getTable().getTopIndex();
    final int botIdx = topIdx + getVisibleItemCount(topIdx);

    fCancelCallsPending.incrementAndGet();
    fDataProvider.getExecutor().execute(
        new Runnable() { public void run() {
            // Must check again, in case disposed while redispatching.
            if (fDataProvider == null || fTableViewer.getTable().isDisposed()) return;
            if (index >= topIdx && index <= botIdx) {
                queryItemData(index);
            }
            cancelStaleRequests(topIdx, botIdx);
        }});
}
Now the client keeps track of the requests it has made to the service
in fItemDataDones, and above, cancelStaleRequests() iterates through
all the outstanding requests and cancels the ones that are no longer
in the visible range.
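The bookkeeping described above can be sketched as follows. The ItemDone type and the method shapes are assumptions based on the prose, not the example plugin's actual code; in the plugin, the done objects are DSF callbacks rather than this plain class.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: track outstanding requests by table index and cancel the ones
// that have scrolled out of the visible range.
public class CancelSketch {
    static class ItemDone {
        boolean fCanceled;
        void cancel() { fCanceled = true; }
    }

    // Outstanding requests, keyed by table index (fItemDataDones in the prose).
    private final Map<Integer, ItemDone> fItemDataDones = new HashMap<>();

    void request(int index) { fItemDataDones.put(index, new ItemDone()); }

    // Cancel every outstanding request outside the visible [topIdx, botIdx] range.
    void cancelStaleRequests(int topIdx, int botIdx) {
        for (Iterator<Map.Entry<Integer, ItemDone>> it =
                 fItemDataDones.entrySet().iterator(); it.hasNext();) {
            Map.Entry<Integer, ItemDone> entry = it.next();
            if (entry.getKey() < topIdx || entry.getKey() > botIdx) {
                entry.getValue().cancel();
                it.remove();
            }
        }
    }

    int outstanding() { return fItemDataDones.size(); }
}
```

Because this runs entirely in the dispatch thread, no locking is needed around fItemDataDones, mirroring the input-buffer discussion above.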
Data Provider Service
The data provider implementation (CancellableInputCoalescingSlowDataProvider.java) builds on top of the coalescing data provider. To make the canceling feature useful, the data provider service has to limit the size of the request queue. This is because this example simulates communication with a target: once requests are filed into the request queue, they cannot be canceled, just as a client can't cancel requests once it has sent them over a socket. So instead, if a flood of getItem() calls comes in, the service has to hold most of them in the coalescing buffer in case the client decides to cancel them. Therefore, the fileBufferedRequests() method includes a simple check before servicing the buffer, and if the request queue is full, the buffer-servicing call is delayed.
if (fQueue.size() >= REQUEST_QUEUE_SIZE_LIMIT) {
    if (fGetItemIndexesBuffer.isEmpty()) {
        fExecutor.schedule(
            new Runnable() { public void run() {
                fileBufferedRequests();
            }},
            REQUEST_BUFFER_FULL_RETRY_DELAY,
            TimeUnit.MILLISECONDS);
    }
    return;
}
Beyond this, the only other significant change is that before the
requests are queued, they are checked for cancellation.
Final Notes
The example given here is fairly simplistic, and chances are that the
same example could be implemented using semaphores and free threading
with perhaps fewer lines of code. But what we have found is that as
the problem gets bigger, as the number of features in the data provider
increases, as the state of the communication protocol gets more
complicated, and as the number of modules needed in the service layer
grows, using free threading and semaphores does not safely scale.
Using a dispatch thread for synchronization certainly doesn't make the
inherent problems of the system less complicated, but it does help
eliminate race conditions and deadlocks from the overall system.
Coalescing and Cancellability are both optimizations. Neither of these optimizations affected the original interface of the service, and one of them only needed a service-side modification. But as with all optimizations, it is often better to first make sure that the whole system is working correctly and then add optimizations where they can make the biggest difference in user experience.
The above examples of optimizations can take many forms, and as
mentioned with coalescing, caching data that is retrieved from the data
provider is the most common form of data coalescing. For
cancellation, many services in DSF build on top of other services,
which means that even a low-level service can cause a higher
level service to retrieve data, while another event might cause it to
cancel those requests. The perfect example of this is a Variables
service, which is responsible for calculating the value of expressions
shown in the Variables view. The Variables service reacts to the
Run Control service, which issues a suspended event and then requests a
set of variables to be evaluated by the debugger back end. But as
soon as a resumed event is issued by Run Control, the Variables service
needs to cancel the pending evaluation requests.