RT Shared Editing

From Eclipsepedia

Jump to: navigation, search

Project Lead: Mustafa K. Isik

Mentor(s): Scott Lewis, Ken Gilmer

Contents

Motivation

The RT Shared Editor, which I'll dub Cola (collaborate) for now, is supposed to be a tool enabling developers to reap the benefits of pair programming within the Eclipse IDE.

The term pair programming describes an activity in which two developers simultaneously work on a single development machine.

Even though not new to the development community, pair programming has witnessed a significant rise in adoption over the last years. One of the main reasons being the inclusion to the set of eXtreme programming (aka XP) practices. Thus pair programming is especially, but certainly not only, popular among developers utilizing the values, principles and practices propagated by the XP methodology.

The reasoning behind pair programming, as articulated by software developer, author and XP figurehead Kent Beck (2005 p.42) is manifold and pair programmer's activities are described as

  • keep each other focused and on task
  • brainstorm refinements to the system
  • clarify ideas
  • take initiative when their partner is stuck, lowering frustration
  • hold each other accountable to the team's practices

From my own experience I can add, that programming in pairs has proved to be beneficial in

  • widening developers' horizons and perspectives on perceiving and tackling problems
  • increasing trust in one's own skills and generating awareness for areas requiring improvement
  • learning from more advanced developers
    • improving through working with less experienced devs, motivating to reflect on the essence of one's own practices
  • refining one's own communication skills

Current Situation

Tool-support for pair programming in Eclipse is pretty much non-existent. Pair programming sessions, as spontaneous as they might be arranged in some teams, usually are set up for longer periods of time, ranging anywhere from an hour to some four or five. Pairs consist of locally available developers.

Limitations & Problems

Geographical limitations do not permit for simple pair programming. Since individuals are required to sit in front of the same machine, they apparently have to be located at offices close to each other. Thus software development, not being a regionally bound activity, as proven by the sustained success and advances of open-source software and the nature of many open-source teams, has to be carried out without utilizing effective pair programming in many cases.

Even when developers are located at the same work site, the effort to get two people set up together in front of a computer (drinks, resources, etc.) proves not to be worth for short programming or reviewing tasks, which sometimes are all that is required. This does not mean that coders do not occasionally sit down for such quick tasks, but from personal experience not as often as it might be beneficial for oneself and/or the code.

Traditional instant messaging software does not lend itself for code communication among developers. Such software generally proves to be fairly feature-limited and catering to different target audiences. Theoretically it would lend itself for sending back and forth short code snippets at best, practically large-scale adoption of such behaviour has yet to be seen.

Open-plan offices or group cube farm layouts do not support developers in utilizing pair programming benefits either, quickly degrading the workspace noise-level to that of an airport lobby (by the way, the true problem here would be of course the office layout, but the scope of changing that is unfortunately usually beyond the individual developer).

Furthermore developers, who usually keep book resources in their personal work environment, end up being stripped of those for half of the time they are pair programming.

Resolution

Cola is supposed to support collaborative work on code from disparate locations, minimizing organizational impact and improving on flexibility issues concerning pair programming.

Developers are intended to be able to work simultaneously on a single source file, viewing changes made by other contributors in real-time and editing the most up-to-date version of the file at all times.

The initial incarnation of Cola is to materialize in the form of an Eclipse IDE plug-in.

Consistency Maintenance

Maintaining consistency among shared data in a distributed real-time editing environment is of prime importance.

Over the last decade a lot of thought has been dedicated to the research of domain-specific issues in real-time collaborative groupware systems. Chengzheng Sun and Clarence Ellis have authored a paper (Sun & Ellis 1998) providing a thorough overview of such. The subsequent discussion of challenges specific to real-time collaborative editing is based on the referenced paper and intended to be self-sustaining in that you will not have to dig into academic research material to understand the challenges at hand. Figure 1 shows a somewhat idealistic scenario where communication lag between several editing sites remains without undesired consequences because operations are generated and executed at all sites in an orderly fashion. In a realistic distributed editing scenario, as depicted in Figure 2, the need for consistency maintenance becomes apparent.

Challenges

Divergence

Due to dependencies between operations originating from different editors on a shared document and suffering from propagation lag in a distributed environment, shared data state at different sites can divert from each other. This holds especially true for operations that are not commutative in execution order.

divergence in Figure 3
site 1 state site 2 state site 3 state
A:insert('a') a A:insert('a') a A:insert('a') a
C:insert('c') ac B:insert('b') ab B:insert('b') ab
B:insert('b') acb C:insert('c') abc C:insert('c') abc

Causality-violation

Each editing user's changes need to be communicated to the other editing sites. Independent from message generation times, notification of changes may arrive out-of-order. The resulting artifact for dependent operations is referred to as causality-violation. The order in which operations A and B in Figure 2 arrive at site 3 visualizes a typical case of causality-violation. Operation B is generated at site 2 after the arrival and execution of operation A. The reversed arrival order of both operations at site 3 may result in an undefined operation B at site 3 (e.g. B is supposed to delete an insertion commited by A). Depending on the underlying communications protocol even the in-order arrival of editing operations originating from the same site might not be guaranteed.

Intention-violation

Intention-violation is different from causality-violation in that it refers to the problems that arise when executing an operation on a document state altered by operations not having been executed at the operation's generation site.

Operation C in Figure 2 is defined/generated at site 3 on a document state neither affected by operation A nor B. Upon arrival and execution at sites 2 and 3 the respective document states have already been changed by operations A and B and can cause operation C to commit an unintended and undesired change.

In contrast to the divergence-problem, intention-violation cannot be resolved by a simple serialization protocol.

Resolution Approach

Look at the Sky & Spot JUPITER

A closer inspection of a paper titled "High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System" (Nichols et al 1995) reveals a very interesting approach to solving distributed-state synchronization problems in systems with more than two participating sites.

Even though functionality-wise the more advanced approaches described in (Sun & Ellis 1998) also represent solutions to the challenges in realizing Cola, (Nichols et al 1995) makes assumptions perfectly feasible for Cola's case that (in relative terms) greatly simplify the required concurrency-control algorithm.

Client-Server Communications

Much of the complexity of for instance the GROVE and REDUCE collaboration models stem from the fact, that they are designed for handling arbitrary communication-paths between each editing site. The JUPITER collaboration system on the other hand is designed around a central server. Utilizing the resulting communications topology, conflicting messages are resolved on the level of 2-way messaging between specific client - central server as opposed to n-way communications.

Concerning causality-violation, communications being effectively limited to 2-way / routed through the server document state, out-of-order arrival of messages adhering to a causal order, such as deletions referring to prior insertions, is prohibited. The central ordering instance, the server editing site, ensures, that all sites are updated in the right causal order.

Network Protocol

Considering operations originating from the same site, causality-violation of this specific type or more general reversed messaging, can be prohibited by using a network protocol not permitting for out-of-order communication of operations to the receiving site (either client or server). Cola's network protocol will be chosen with respect to satisfaction of this property.

Optimistic Change Application

Responsiveness and immediate user feedback are important and expected features in an editor, therefore user operations are applied immediately to the local document state without awaiting server approval or undergoing any modifications.

Conflict Resolution via Operational Transformations

Even with the client-server topology in place and a 2-way messaging protocol ordering communications, preventing the out-of-order arrival of messages causing causality-violation, intention-violation can still occur.

This becomes apparent when considering, that causality-violation is due to remote messages either from different sites or originating from a single site crossing on the wire on their way to the unaltered, i.e. consistent in document state concerning all prior operations, recipient. Ensuring the ordered arrival and execution of operations at the receiver site suffices to maintain consistency in document state.

Intention-violation on the other hand describes problems that occur due to operations being intended for execution on a certain document state, which is not available upon their arrival at the receiver. Locally generated and executed operations have altered the receiving site's document state during the transmission of remote operations. In this scenario, best described as mutually directed messages crossing on the wire, the same problem applies the other way around when exchanging sender and receiver labels on the sites.

Application of operations on a document state different from the one they'd been intended for, bears the danger of intention violation, ranging from insertions at wrong places to deletion of wrong sections. Dropping and rolling back operations, in such a highly interactive application domain, is not an option, especially since local operations are executed immediately.

Therefore intention-preserving transformations of such operations, i.e. operational transformations, are key to Cola's resolution approach.

basic 2-way communication is of the form

  • client generates operation
  • client executes operation locally
  • client notifies server of operation
  • server receives operation
  • in case of conflict: server transforms operation
  • server executes operation locally
  • server notifies all other clients of operation

for every receiving client

  • client receives operation
  • in case of conflict: client transforms operation
  • client executes operation locally


Presenting operational transformations as a means for conflict resolution, several immediate concerns arise

  1. what are cola's operations on which to perform transformations
  2. how do conflicting scenarios look like
  3. what do operational transformations look like and
  4. how are conflicts in document state to be detected in order to resolve them via transformations on operations?

Cola models all editing operations on documents as atomic deletions and insertions of single characters. Editing operations on more than a single char are broken down to atomic operations, as listed below.

→ del(position)
→ del(from_position, to_position):=
        for(int i = from_position; i <= to_position; i++){
         del(i); 
        }
→ ins(position, char)
→ ins(from_position, string):=
        for(int i = 0; i < string.length; i++){
         ins(from_position + i, string[i]);
        }


An example serves best to illustrate the features demanded of operational transformations.

conflicting insertions
remote op. local op. state @ client 1 state @ client 2 local op. remote op.
ins(2,O) CRON CRON ins(5,A)
ins(5,A) CORON CRONA ins(2,O)
COROAN CORONA

The user at site 2 intends to insert an A after the last N. When the corresponding operation he issued and which was successfully applied to his local document, arrives at site 1 the document state has been altered by the insertion of another character at a lower index, thus shifting all subsequent characters' indeces by one. The untransformed execution of the insertion from site 2 on site 1's document results in intention-violation. The obvious solution in this case would be to increment the insertion op's index by one and executing ins(6,O)

Building on the information provided by the knowledge of the document states being one operation apart and knowing which two operations are intended for execution, abstraction and generalization lead to the specific operational transformation for two conflicting insertions.

With regard to the conflicting insertions example, and the type of relation defined by transformations on operations, it becomes apparent that the transformation can be precisely described in terms of a function taking two conflicting operations at a time as input parameters and delivering two output operations modified for intention-preserving applicability at their respective destination sites.

For brevity reasons, I will refer to cola's operational transformation function as coopt function (for cola operational transformation).

coopt(ins(x, char_1), ins(y, char_2)):=
 {(ins(x, char_1), ins(y + 1, char_2)) , for x < y
  (ins(x + 1, char_1), ins(y, char_2)) , for x > y
  (noop, noop)                         , for x = y && char_1 == char_2
  serverside insertion comes first     , for x = y && char_1 != char_2
 }

Using coopt for resolving the conflict between the insertion operations in our example, yields the same and intention-preserved document state at both sites.

conflicting insertions resolved via coopt
remote op. local op. state @ client 1 state @ client 2 local op. remote op.
ins(2,O) CRON CRON ins(5,A)
coopt(ins(5,A), ins(2,O))
ins(6,A)
CORON CRONA coopt(ins(2,O), ins(5,A))
ins(2,O)
CORONA CORONA

Conflicts can also arise for deletions at both sites and deletion and insertion operations executed on the same document state at two different sites.

conflicting deletions
remote op. local op. state @ client 1 state @ client 2 local op. remote op.
del(6) MARIPOSA MARIPOSA del(5)
del(5) MARIPSA MARIOSA del(6)
MARISA MARIOA

The specific conflict in the conflicting deletions table can be resolved by decrementing the index of the del(6) operation originating from site 1 by one upon arrival at and prior to execution at site 2.

As for insertions, coopt handles conflicting deletions in a general way.

coopt(del(x), del(y):=
 {(del(x), del(y - 1) , for x < y
  (del(x - 1), del(y) , for x > y
  (noop, noop)        , for x = y 
 }

Transforming remote delete operations defined on document states one operation apart from the local state, results in conflict free convergence.

conflicting deletions resolved via coopt
remote op. local op. state @ client 1 state @ client 2 local op. remote op.
del(6) MARIPOSA MARIPOSA del(5)
coopt(del(5), del(6))
del(5)
MARIPSA MARIOSA coopt(del(6), del(5))
del(5)
MARISA MARISA

Cola provides two more operations, which also need to be handled by coopt. These last operations are intended to improve on the collaborative experience and are of non-editing nature. Both operations are user-specific in that every user's cursor, possibly each in a different color, is replicated among editing sites. This is to make remote user operations more comprehensible for the respective local user. The same applies to the user-specific highlighting of editor contents.

→ cursor(position, user)
→ highlight(position, user)
→ highlight(from_position, to_position, user):=
        for(int i = from_position; i <= to_position; i++){
         highlight(i, user); 
        }

Concurrency Algorithm

Summarizing the previous section and giving a motivation for the one at hand, operational transformations are applicable on operations, defined on the same document state at different sites. When reaching their respective destination sites in such scenarios, the remote sites' document state diverges by one operation. The transformation modifies the incoming operation with respect to the operation that has been executed at the receiving site, so that intention-preservation for the incoming operation is preserved.

In cases where the local, receiving document state diverges by more than one (locally executed) operation compared to the state at a remote site sending an operation, the mechanism of operational transformations cannot be utilized directly. The previously iterated preconditions for the application of such are not met.

A concurrency algorithm takes care of detecting conflicting situations and meeting requirements for resolution via operational transformations.

Figure 4 features an exemplary graph of the state-space a client and the server editing site roam during a session.

Protocol Review

Integration into Eclipse Communication Framework

Resources

Meeting Notes

Papers & Books