ALF/SCM Vocabulary

Revision as of 00:19, 15 June 2006 by Asimantel.serena.com (Talk | contribs) (Create a workspace(sandbox) (view))

Introduction

ALF is a web-services based framework for tools integration. In order for ALF to realize its full potential, standardized vocabularies will be needed for each class of tool. With such vocabularies, tools within a given class will be interoperable in the ALF framework. The purpose of this document is to define a standard ALF SCM vocabulary.

SCM (Software Configuration Management) refers to a class of tools providing the following services: version control of artifacts (i.e. tracking the series of revisions to each artifact), shared development of a set of artifacts (i.e. locking, branching, and merging of changes when different users attempt to make changes in parallel), and general services to support development of software by teams (e.g. user/group management, access control, support of specific development processes, etc). Some well-known examples of SCM tools are: CVS, Subversion, ClearCase, Perforce, and AccuRev, among others.

Usually the artifacts managed by SCM tools are source files, though the tools are applicable to any set of file-based artifacts (e.g., XML docs, test scripts in some testing language, HTML pages, Word documents, object files, executables, etc). At this time we are restricting ourselves to artifacts that are representable as files. Extending this vocabulary to deal with configuration management of other sorts of entities is left for the future.

The services defined by this document are both subsettable and extendable. The intent of this standard is not to force any particular SCM model on all SCM vendors, but rather, to ensure that vendors which provide a particular SCM service all do so in the same way.

The SCM ALF interface being defined here could be used as the basis of the client/server protocol for the GUI of an SCM tool itself, though that is not its primary goal. The main goal is to facilitate interaction between different classes of tools.

The rest of this document is organized as follows. First comes a section on the general architecture that we have in mind, intended to set the stage for the detail-level concepts that follow. Next is a section defining the objects in the SCM object model, for example files, elements, versions, and branches. One goal of that section is to agree on a standard term for each sort of object; for example, we may agree to all use “change-set” even though some tools call this object a “change-package” and some call it an “activity”. More significantly, we will define what each object is operationally, and list some of the attributes of each object. Following the objects section is a use cases section, listing the possible operations on these objects. Then comes a schema section, defining the XML schema for the web services we are defining. Finally, a WSDL section gives the WSDL definitions for the ALF SCM web services.

Architecture

Before getting into the detail-level concepts, we need an architectural framework.

The fundamental idea of ALF is for tools to provide callable interfaces and use a common callback mechanism, allowing them to be coordinated by an external agent. The purpose is to diminish the need for tools to know about each other. To participate in ALF, a tool should provide a set of services exposed as SOAP-based web services, and may raise events to an ALF event manager to signal significant changes that may be of interest outside the tool itself in the context of an application lifecycle. In ALF, the direct client of such a tool is a BPEL service flow, which runs when an ALF event is raised and calls one or more tools via their service interfaces to take some action and/or update their state.

(Picture to be inserted: Example.jpg)

The picture above shows 3 ALF-enabled tools. For example, one might be an issue-tracking tool, one might be a testing tool, and one might be a build tool. These may be running on 3 different machines shown by the 3 boxes at the top.

The primary intent of ALF Vocabularies is to make the creation of BPEL service flows easier by creating common definitions. This should make data mapping more straightforward. At their most granular, ALF Vocabularies may define simple datatypes in the form of names and value ranges that express particular concepts. At their boldest, ALF Vocabularies may define a system of interfaces that are intended to be implemented by all tools providing the described function.

Broadly, an SCM system is a repository of versioned items, generally files, which can be classified and related in a specialized way. Thus one aspect of the SCM vocabulary is defining the core SCM "classification" concepts so that SCM relationships and queries can have a common base. The other key concept in SCM is the notion of a workspace, which is an area on the filesystem of the SCM client where the SCM system operates. For instance, if one of the clients issues an update workspace request to the SCM system, it expects the files to be materialized locally in the filesystem. In this regard, SCM systems differ from many of the other ALF tools. An issue tracking system is concerned mainly with accessing and updating server-side databases, and thus does not have the problem of transferring large numbers of files to/from the requesting machine.

The fact that SCM systems store files, and that those files are the fundamental thing that other ALF-enabled tools need to operate on, presents some challenges. A typical SCM client uses a client-side API, provided by the SCM system but running on the client, to "pull" files to the client's local file system. The client's security context is used to access the local file system. An ALF service flow cannot act as a normal SCM client: since it can only call web services, it has no way to run a client-side API. Thus if it were to "pull" files directly, the file content would have to be streamed through the BPEL engine, which is undesirable from both a performance and a scalability standpoint. Further, a service flow logically has no interest in operating directly on the files materialized from the SCM repository. Instead, other tools will operate on those files under the instruction of the ALF service flow. Since these tools may reside on different servers and have very different implementations, there is no guarantee that they can or will run in the same security context or share the same file space. Thus, contrary to the main thrust of ALF, there is a need for tools to interact directly with a workspace provider of some sort. This may be a shared resource, where the period of access is managed by one or more ALF service flows and passed between tools, or a private resource, where the period of access may be initiated by an ALF service flow but is managed exclusively by the tool.

The implication is that the ALF SCM vocabulary should provide a common definition of a Workspace provider that tools can target, rather than having to integrate with every SCM system individually. ALF may provide an example implementation of such a Workspace provider. This could be used by SCM tools that are not immediately able to provide their own implementation.
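
To make the idea of a common Workspace provider more concrete, here is a minimal Python sketch. It is not an ALF definition: the names WorkspaceProvider, materialize, release, and InMemoryProvider are hypothetical, and the in-memory class only stands in for a real SCM-backed implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class WorkspaceProvider(ABC):
    """Hypothetical interface that tools would target instead of each SCM system."""

    @abstractmethod
    def materialize(self, workspace_name: str, root_path: str) -> list[str]:
        """Populate root_path and return the workspace-relative paths provided."""

    @abstractmethod
    def release(self, workspace_name: str) -> None:
        """End the period of access to the named workspace."""


class InMemoryProvider(WorkspaceProvider):
    """Toy stand-in for an SCM-backed provider; it writes no files."""

    def __init__(self, files: dict[str, bytes]):
        self._files = files
        self.active: set[str] = set()  # workspaces currently in use

    def materialize(self, workspace_name: str, root_path: str) -> list[str]:
        self.active.add(workspace_name)
        return sorted(self._files)

    def release(self, workspace_name: str) -> None:
        self.active.discard(workspace_name)
```

A build tool written against WorkspaceProvider would not need to know which SCM system sits behind it, which is the point of the common definition.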

(Picture to be inserted: Architecture2.jpg)

Objects

Workspace

(CVS: Sandbox, ClearCase: View, AccuRev: Workspace, Version Manager: Local Workspace, Dimensions: Work Area)

A workspace corresponds to an area on the filesystem of a client machine where the SCM user works on the files that are under version control. Users can create workspaces and load them with files that are copies of the shared artifacts (elements) that are under source control. The workspace's configuration determines which version of each element is obtained. Users can edit the files in the workspace, create new versions, and promote those versions back to the underlying repository so that other users of the SCM system can access them.

Workspaces have the following attributes:

  • a client machine hosting the workspace
  • a workspace root directory
  • a unique name within the SCM system
  • a configuration (rules saying what version of each element is to be loaded into the workspace. This might be as simple as a “parent stream”, or might be a more complex “config spec” style of rules)
  • an owner (typically whoever created the workspace)
  • possibly, permissions (who gets to read or write the workspace)
  • possibly, a set of load rules characterizing what files are to be loaded into the workspace
  • possibly other attributes
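
The attribute list above could be captured in a simple record type. The following Python sketch is illustrative only; the field names and types are assumptions, not part of the vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workspace:
    name: str            # unique within the SCM system
    client_host: str     # client machine hosting the workspace
    root_directory: str  # workspace root directory on that machine
    configuration: str   # e.g. a parent stream name, or config-spec text
    owner: str           # typically whoever created the workspace
    permissions: Optional[str] = None                    # optional attribute
    load_rules: List[str] = field(default_factory=list)  # optional attribute
```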

Files, Directory

These are operating system concepts, but since an SCM system operates on them, they need to be part of the SCM vocabulary also.

A file has the following attributes:

  • a base filename
  • a parent directory
  • contents (a stream of bytes)
  • a modification time-stamp
  • an owner
  • possibly, some permissions
  • possibly other attributes

A directory can be thought of as a special kind of file with no contents. A root directory (in the filesystem) is a special kind of directory with no parent. We speak of the children of a directory as the set of all files and directories having the given directory as a parent. We define the pathname of a file as the series of basenames from the root to the given file, separated by ‘/’ or ‘\’, e.g.: “/home/fred/workspace/dir/some_file”.

Only files that live under the root directory of a workspace are subject to version control. Within a workspace, any given file may or may not be under version control. If a file is under version control, it is associated with an element (see next section).

For a file in a workspace, we speak of its workspace-relative path as the relative path from the root of the workspace to the file. E.g. in a workspace rooted at “/home/fred/workspace”, the file “/home/fred/workspace/dir/some_file” has a workspace-relative path of “dir/some_file”.
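
The workspace-relative path can be computed mechanically. The Python sketch below assumes POSIX-style paths ('/' separators); the function name workspace_relative_path is hypothetical.

```python
from pathlib import PurePosixPath


def workspace_relative_path(workspace_root: str, file_path: str) -> str:
    """Return file_path relative to workspace_root.

    Raises ValueError if the file does not live under the workspace root,
    in which case it is not subject to version control in that workspace.
    """
    return str(PurePosixPath(file_path).relative_to(PurePosixPath(workspace_root)))
```

Called with the root “/home/fred/workspace” and the file “/home/fred/workspace/dir/some_file”, it returns “dir/some_file”, matching the example above.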

Element

(CVS: File [CVS fails to distinguish between "file" and "element"], ClearCase: Element, AccuRev: Element, Version Manager: Versioned File and Archive, Dimensions: Item)

An element is an SCM object that is created when a file or directory in a workspace is put under source control.

Elements have the following attributes:

  • an element ID (some SCM-system-specific internal identifier of the element, typically a number)
  • an element-type (e.g., directory, text file, binary file, possibly other values)
  • a set of versions (see next item)
  • an owner
  • a creation-time
  • possibly, a creation comment
  • possibly, permission bits

Version

(CVS: Version, ClearCase: Version, AccuRev: Version, Version Manager: Revision, Dimensions: Item Revision)

A version is an SCM object representing the contents of some version-controlled file at a particular moment. The initial version of a given element is created at the time the element is created, and captures the contents of the file when it was put under version control. Subsequent versions are created by “checkin” (or “keep”) operations on version-controlled files. This operation creates a new version and puts it into the version-set of the element corresponding to the checked-in file.

Versions have the following attributes:

  • contents (a series of bytes)
  • an element they are associated with
  • some sort of version-id (name or number)
  • a set of predecessors/successor versions
  • an owner
  • a creation-time
  • possibly, a creation comment
  • possibly other attributes

Note that the contents of any given version are immutable. New versions of the element can be created, but the contents of past versions do not change.

If we ignore branching for the moment (it is covered in the Branch/Stream section below), each version has at most one predecessor and at most one successor, and thus the versions form a linear graph:

(picture to be inserted)

Each “checkin”/“keep” creates a new one of these version objects and links it into the graph.
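
The linear case can be sketched in a few lines of Python. This is an illustration under assumptions: versions are numbered from 0 in creation order, and the class and method names (Version, Element, checkin) are chosen for the example rather than taken from any tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a version's contents cannot change once created
class Version:
    version_id: int
    contents: bytes
    predecessor: Optional[int]  # None for the initial version


@dataclass
class Element:
    element_id: int
    versions: List[Version] = field(default_factory=list)

    def checkin(self, contents: bytes) -> Version:
        """Create a new version and link it to the previous latest version."""
        predecessor = self.versions[-1].version_id if self.versions else None
        version = Version(len(self.versions), contents, predecessor)
        self.versions.append(version)
        return version
```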

Change-Set

(CVS: [none], UCM ClearCase: Activity, AccuRev: Change-Package, Version Manager: [none], Dimensions: Request)

A change-set is a set of versions (i.e. changes from the previous version) that logically go together. Possible uses of change-sets are: (a) Associate them with an issue in an issue-tracking system. (b) Treat them as a unit, e.g. commit them together, revert them together, or merge them as a unit (e.g. for merging the same fix into multiple branches).

The versions in a change-set need not have been created at the same time, though some SCM systems might limit the notion of change-set to versions checked-in/committed in one transaction. The versions in a change-set logically need not be contiguous (e.g. versions 3 and 5 of a given element might be part of a single bug-fix, but not version 4), although many systems restrict change-sets to contiguous versions for implementation reasons.

Change-sets have the following attributes:

  • an id (possibly a name or a number)
  • a set of versions in the change-set
  • an owner
  • possibly, a comment
  • possibly, an identification of an issue in some issue-tracking system
  • possibly other attributes
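
As an illustration of the contiguity question discussed above, the following Python sketch models a change-set and checks whether it has gaps. It assumes versions of an element are numbered with consecutive integers along a single line of descent; the names ChangeSet, add, and is_contiguous are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ChangeSet:
    change_set_id: str
    owner: str
    # element name -> version numbers of that element in the change-set
    versions: Dict[str, Set[int]] = field(default_factory=dict)
    issue_id: Optional[str] = None  # link to an issue-tracking system, if any

    def add(self, element: str, version: int) -> None:
        self.versions.setdefault(element, set()).add(version)

    def is_contiguous(self) -> bool:
        """True if no element has a gap in its version numbers."""
        return all(max(v) - min(v) + 1 == len(v) for v in self.versions.values())
```

A change-set holding versions 3 and 5 of an element but not version 4 is valid in the logical sense, yet is_contiguous() reports False for it, which is the case some systems disallow.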

Branch/Stream

(CVS: Branch, Base ClearCase: Branch, UCM ClearCase: Stream, AccuRev: Stream, Version Manager: Workspace and Promotion, Dimensions: Branch)

In the context of a single element’s version graph, a branch is a named linear sub-graph. For example, here we have versions 0, 1, 2, and 3 in branch MAIN, and versions 0 and 1 in branch B (which sprouted off of MAIN/1).

(picture to be inserted)

Normally, we speak of a “branch” in the collective sense. For example, if a set of elements all have version graphs of the same general shape as the above, we can speak of “latest on branch B” as meaning the latest version of each element along the “B” branch of the graph. A user who is “working on branch B” is creating new versions on the branch B subgraph.
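
The example graph (MAIN with versions 0 to 3, and B with versions 0 and 1 branching off MAIN/1) can be written down as data. This Python sketch is only one possible representation; the (branch, number) keys and the function latest_on_branch are assumptions made for the example.

```python
# Each version is keyed by (branch, number) and maps to its predecessor,
# or to None for the very first version of the element.
GRAPH = {
    ("MAIN", 0): None,
    ("MAIN", 1): ("MAIN", 0),
    ("MAIN", 2): ("MAIN", 1),
    ("MAIN", 3): ("MAIN", 2),
    ("B", 0): ("MAIN", 1),  # branch B starts from MAIN/1
    ("B", 1): ("B", 0),
}


def latest_on_branch(graph, branch):
    """Return the highest-numbered version on the branch, or None if there is none."""
    numbers = [number for (name, number) in graph if name == branch]
    return (branch, max(numbers)) if numbers else None
```

Applying latest_on_branch to every element's graph gives the collective “latest on branch B” described above.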

Merging is the act of combining the changes that were made on a branch into a single version on another branch, for example as indicated by the red merge arrow in the following:

(picture to be inserted)

A stream is conceptually similar to a branch, but the streaming model adds some structure and applies some constraints as follows:

  • Streams form a hierarchy (every stream has a parent)
  • Merges go up the stream hierarchy, and are generally referred to as “deliver” or “promote” operations. The general idea is that you promote your work to the parent stream.
  • Workspaces live at the leaves of the stream hierarchy, and the contents of the workspace are determined by its position in the stream hierarchy.
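
A minimal Python sketch of the stream constraints listed above follows. It assumes that the root stream is the one stream without a parent, and that a stream can be summarized as a mapping from element name to version number; the names Stream and promote are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stream:
    name: str
    parent: Optional[Stream] = None  # None only for the root of the hierarchy
    versions: Dict[str, int] = field(default_factory=dict)  # element -> version

    def promote(self, element: str) -> None:
        """Deliver this stream's version of an element to the parent stream."""
        if self.parent is None:
            raise ValueError("the root stream has no parent to promote to")
        self.parent.versions[element] = self.versions[element]
```

A real system would also merge when the parent has changed independently; that step is omitted here.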

The following whitepaper explains the difference between the stream model and the branching model in more detail: [1]

Attributes of branches/streams:

  • A name
  • Parent branch or stream
  • Owner
  • Creation time
  • Other attributes tbd

Configuration

(Base ClearCase: Config Spec, Dimensions: Baseline Template)

Loosely speaking, a configuration refers to the rules specifying what versions of what elements should be loaded into a workspace. The rules may be explicit, as in a ClearCase config spec. Or they may be implicit, e.g. determined by where the workspace lives in the stream hierarchy. Typically the configuration is expressed in terms of branches or streams, e.g. "load the latest version of each element on branch B". Alternatively one can use a baseline as a configuration (see next item). Note a configuration is merely a set of rules, and therefore does not refer to any fixed set of versions.

Baseline

(CVS: Tag, Base ClearCase: Label, UCM ClearCase: Baseline, AccuRev: Snapshot, Version Manager: Label, Dimensions: Baseline)

Despite the variety of names for this concept, it is fundamental to SCM. The motivation is to be able to re-create a fixed set of files, e.g. corresponding to the release of a product. A baseline is therefore a fixed set of versions, one version per element in a component. A baseline can be thought of as a "slice" through the element set capturing their states at a moment in time. The baseline can then be loaded into a workspace when you want to re-create the state it captured.

Although a baseline has a definition similar to that of a change-set (both are sets of versions), the two have different intents. A change-set is intended merely to capture one logical change, and thus typically has a small number of versions in it (e.g. "foo.c version 5 and fum.c version 7" may be my change-set for a given fix). A baseline, by contrast, is intended as a slice through all the elements in a component, and thus has a large number of versions in it.
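
The difference between a configuration (a rule) and a baseline (a fixed set of versions) can be shown with a small Python sketch. The data layout and the names latest_rule and capture_baseline are assumptions made for illustration.

```python
def latest_rule(history):
    """A configuration-style rule: select the latest version of each element.

    history maps element name -> list of version numbers on some branch.
    The rule is re-evaluated on every call, so its result changes over time.
    """
    return {element: max(versions) for element, versions in history.items()}


def capture_baseline(history):
    """A baseline: the rule's result at this moment, copied so it stays fixed."""
    return dict(latest_rule(history))


history = {"foo.c": [1, 2, 3], "fum.c": [1, 2]}
baseline = capture_baseline(history)  # foo.c at version 3, fum.c at version 2
history["foo.c"].append(4)            # a later checkin on the same branch
```

After the later checkin, latest_rule(history) selects version 4 of foo.c, while the baseline still records version 3.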

Attributes of baselines:

  • A name
  • A creation time-stamp
  • An owner
  • A list of versions
  • Possibly other attributes

Component

(CVS: [none], UCM ClearCase: Component, Dimensions: Design Part)

Conceptually, a component is a set of elements that logically go together. For example, an SCM user developing a client/server app might have a “client” component with the source files for the client, and a “server” component with the source files for the server. Possible things people might want to do with components: (a) Load only the sources that go with a particular component into a workspace. (b) Use different configuration rules for different components.

Note that the set of elements in a component might vary over time.

One possible way to define components is through the directory structure. For example, "everything under directory gui/..." might be the definition of a component. One could then refine that with include/exclude rules (e.g. "everything under directory gui/... except gui/doc/...").
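
A directory-based component definition with include/exclude rules might be evaluated as in the following Python sketch. The function name in_component and the convention that rules are workspace-relative directory paths are assumptions.

```python
def in_component(path, include, exclude=()):
    """Return True if a workspace-relative path belongs to the component.

    include and exclude are collections of directory paths such as "gui"
    or "gui/doc"; a path matches a directory if it lies at or below it.
    """
    def under(candidate, directory):
        directory = directory.rstrip("/")
        return candidate == directory or candidate.startswith(directory + "/")

    included = any(under(path, d) for d in include)
    excluded = any(under(path, d) for d in exclude)
    return included and not excluded
```

With include ["gui"] and exclude ["gui/doc"], the file gui/main.c is in the component and gui/doc/readme.txt is not.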

Attributes of components are:

  • A name
  • Its definition (e.g., a set of include/exclude rules)
  • A creation time
  • An owner
  • Possibly other attributes

Repository

(CVS: Repository, ClearCase: VOB [stands for "versioned object base"], AccuRev: Depot, Version Manager: PDB [stands for "Project Database"], Dimensions: Base Database, Product)

The persistent store housing the SCM data and metadata. The SCM system may divide its data into multiple repositories, or store it all in one repository.

Attributes of repositories:

  • A host machine
  • A storage area, or specification of where it lives in a database
  • An owner
  • Permissions
  • Possibly other attributes

Use Cases

Build Server Use Cases

Build Service Flow Example

Create a workspace(sandbox) (view)

A workspace is characterized by a place on your local filesystem where the files correspond to objects under SCM control. Workspaces are a sharing mechanism: different users have different workspaces (views) into the same set of underlying objects.

   "mkws" – AccuRev
   "co" – CVS, SVN (creates and populates the sandbox)
   "mkview" – ClearCase
   Possible methods, datatypes, etc.
   CreateWorkspace(
       String		workspaceName,  // the name of the new workspace 
       Repository 	repository, 	// 
       String 		clientHostName,	// 
       String 		workspaceRootPath
   )
   Events fired and who might consume:
       “New Workspace” – unsure
   Questions/thoughts:
       Is repository tool specific or ALF abstraction?  Perhaps both are in Repository.
       Unsure how this interacts with, for example, UpdateWorkspace().

Remove a workspace(sandbox) (view)

   "rmws" – AccuRev
   nothing – CVS, SVN (just delete the files)
   "rmview" – ClearCase

Put a set of files under source control.

This creates an element corresponding to each file, and creates an initial version of each element whose contents come from the file.

   "add" - AccuRev, CVS, SVN
   "mkelem" - ClearCase
   "addfiles" - Version Manager
   Possible methods, datatypes, etc.
   AddAssets(
       Repository  	repository,	//  
       Workspace 	workspace, 	// the workspace from which the files/folders are being added
       Stream		stream,		// the stream/branch to which the assets will be added
       Configuration	configuration,	// filtering
       OtherOptions	otherOptions	// tool specific options
   )
   Events fired and who might consume:
       “New Change Set” – build service provider may want to initiate continuous integration builds
                          when “enough” change sets have occurred

   Questions/thoughts:
       Does 'Configuration' provide filtering concepts for operations such as this or is 
       something more specific needed (SelectionSet?).

Remove elements from source control

Could be implemented by just hiding the elements, if the SCM system doesn’t want to destroy information.

  "defunct" - AccuRev
  "remove" - CVS
  "delete" - SVN
  "rmelem" - ClearCase
  "delete" - Version Manager

Mark an element as being worked on, either exclusively or non-exclusively

“Exclusive” means you are preventing others from working on it at the same time. Non-exclusive may be a no-op in some systems, or it may have the effect of making the corresponding file writeable in some other systems.

  "co" – AccuRev (only needed for exclusive locking)
  "edit"  - CVS (only needed for exclusive locking)
  "lock" - SVN (only needed for exclusive locking)
  "checkout" – ClearCase (makes the file writeable, there are different switches for exclusive vs non-exclusive)
  "get -l" - Version Manager (only needed for exclusive locking; Rich clients auto-merge on check-in)
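
The exclusive versus non-exclusive distinction can be sketched as a small bookkeeping table in Python. The policy shown (an exclusive request is refused while another user holds any checkout of the element) is an assumption for the example; real tools differ. The names CheckoutTable and checkout are hypothetical.

```python
class CheckoutTable:
    """Tracks which users are working on which elements."""

    def __init__(self):
        self._exclusive = {}  # element -> user holding the exclusive checkout
        self._shared = {}     # element -> set of users with non-exclusive checkouts

    def checkout(self, element, user, exclusive=False):
        """Return True if the checkout is granted, False if it is refused."""
        holder = self._exclusive.get(element)
        if holder is not None and holder != user:
            return False  # someone else has locked the element
        if exclusive:
            others = self._shared.get(element, set()) - {user}
            if others:
                return False  # assumed policy: cannot lock while others work on it
            self._exclusive[element] = user
        else:
            self._shared.setdefault(element, set()).add(user)
        return True
```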

Create a new version of an element

The operation by which new versions come into existence. Takes the contents of a source file, creates a new version based on those contents, and puts the new version into the corresponding element’s version-history. Versions of an element are linked into a graph by predecessor/successor relationships.

  "keep" - AccuRev
  "ci" - CVS, SVN
  "checkin" - ClearCase
  "put" - Version Manager

Create a new branch (stream) to support parallel development

The general idea is that new versions are created on that branch. So for instance, a set of elements might have a maintenance branch/stream and a future-development branch/stream, each with its own sequence of versions.

   "mkstream" – AccuRev
   "tag -b" – CVS
   "copy" - SVN
   "mkbrtype"  - base ClearCase
   "mkstream" – ClearCase with UCM

Capture a baseline (snapshot)

The general idea is to capture a fixed set of versions that can be re-materialized at a later time.

    "mksnap" – AccuRev
    "tag" – CVS
    "copy" - SVN
    "mklabel" - base ClearCase
    "mkbaseline" - UCM ClearCase
    "label" - Version Manager
   Possible methods, datatypes, etc.
   CreateBaseline(
       String		baselineName,	// the name of the new baseline
       Repository 	repository, 
       Stream		stream,		// stream/branch or baseline+changesets? or arbitrary selection?
       Configuration 	configuration,	// appropriate filtering
       OtherOptions	otherOptions	// tool specific options
   )
   Events fired and who might consume:
       “New Baseline” – unsure

   Questions/thoughts:
       Consider how to request creation of a baseline from another baseline plus a list of change sets.

       Consider how to request creation of a baseline from an arbitrary list of assets.  (Is this 
       an important ALF case?)
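
One possible reading of "a baseline from another baseline plus a list of change sets" is sketched below in Python. It rests on assumptions not settled by this document: a baseline is a mapping from element to version number, each change-set is reduced to the highest version it contains per element, and a higher version number supersedes a lower one. The function name baseline_plus_change_sets is hypothetical.

```python
def baseline_plus_change_sets(baseline, change_sets):
    """Return a new baseline; the input baseline is left unchanged.

    baseline: dict mapping element name -> version number.
    change_sets: iterable of dicts with the same shape.
    """
    result = dict(baseline)
    for change_set in change_sets:
        for element, version in change_set.items():
            # Keep whichever version is newer under the numbering assumption.
            result[element] = max(version, result.get(element, version))
    return result
```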

Materialize a set of files in a workspace

The files correspond to underlying elements in the SCM system.

   "update" – AccuRev, CVS, SVN, ClearCase (for snapshot/web views, implicit for dynamic views)
   "get -u" - Version Manager
   Possible methods, datatypes, etc.
   UpdateWorkset(
       Repository      repository,     // 
       Workspace       workspace,      // 
       Baseline        baseline,       // if materializing from an existing baseline
       Stream          stream,         // if materializing from a stream/branch
       Configuration   configuration,  // provides filtering
       Options         options,        // standard options such as refresh, replace, etc.
       OtherOptions    otherOptions    // tool specific options
   )
   Events fired and who might consume:
       “Workspace Update Started” - unsure
       “Workspace Update Completed” - unsure

   Questions/thoughts:
       We may want to "normalize" the Baseline / Stream concepts into a SelectionSet.
 
       Is configuration the right filtering mechanism?

       Should there be an option to create the workspace and return it?  The thinking here would
       be that it's a SCM Service Provider managed workset, created by the versioning tool, handed
       to the build tool, etc.

Get history of an element or set of elements

Time-ordered listing of the history of changes.

   "hist"  - AccuRev
   "history" - CVS
   "log" - SVN
   "hist" - ClearCase
   "pcli vlog" - Version Manager

Compare versions of element(s)

  "diff" - AccuRev, CVS, SVN, ClearCase 
  "vdiff" - Version Manager

Perform merges of versions, from one branch/stream to another

  "merge" – AccuRev, CVS, SVN, ClearCase

Promote versions up the stream hierarchy

Stream-based SCM systems define a process by which changes propagate up a hierarchy of streams. This use case supports such a hierarchical model.

  "promote" – AccuRev
  "deliver" – ClearCase with UCM
  "PromoteGroup" - Version Manager

Define a change-set (change-package, activity)

This is typically to support associating a set of changes with a defect. A change-set is typically defined as a set of versions without gaps (though other definitions are possible).

Schema

This section is not yet written.

WSDL

This section is not yet written.

References

JSR 147: Workspace Versioning and Configuration Management: [2]

WebDAV DeltaV: [3]