This plan lays out the work areas required to re-architect the current Orion Java server for very large scale deployment. Not all of this work can be scoped within a single release; the plan is intended as a roadmap of what is required to make this happen. Work items are roughly sorted by order of priority.
- 1 Assessment of current state
- 2 Goal architecture
- 3 Work areas for scalability
- 3.1 Split shared content/metadata from Orion server metadata
- 3.2 Metadata access by different Orion versions
- 3.3 Metadata locking
- 3.4 Orderly shutdown
- 3.5 Scalable search engine and indexer
- 3.6 Search index versioning
- 3.7 Move long running Git operations out of process
- 3.8 Log management
- 3.9 Search for implicit assumptions about system tools (exec calls)
- 4 Links
Assessment of current state
Much of the current Orion Java server is already designed with scalability in mind. There are a number of problem areas, detailed under the work areas below, but the current server has some existing strengths:
- Nearly stateless. With the exception of Java session state for authentication, and some metadata caching for performance, the current server is stateless. The server can be killed and restarted at any time with no data loss. All data is persisted to disk at the end of each HTTP request, with the exception of asynchronous long running operations that occur in a background thread and span multiple HTTP requests. Authentication can be off-loaded to a front end proxy to avoid Java session state.
- Fast startup. The current server starts within 2-3 seconds, most of which is Java and OSGi startup. No data is loaded until client requests start to arrive. The exception is the background search indexer, which starts in a background thread immediately and begins crawling. If the search index has been deleted (common practice on server upgrade), it can take several minutes to complete indexing. This in no way blocks client requests; search results will just be potentially incomplete until the index is fully populated.
- Externalized configuration. Most server configuration is centrally managed in the orion.conf file. There are a small number of properties specified in orion.ini as well. Generally server configuration can be managed without code changes.
- Scalable core technologies. All of the core technologies we are using on the server are known to be capable of significant scaling: Equinox, Jetty, JGit, and Apache Solr. In some cases we will need to change how these services are configured, but there is nothing we need to throw away and rebuild from scratch. Solr doesn't handle search crawling, but we could use complementary packages like Apache Nutch for crawling if needed.
- Pluggable credential store. The Orion server currently supports a completely pluggable credential store for Orion account information and passwords. We have a default implementation that uses Equinox secure storage, but we have also done implementations that support OpenID, Persona, and LDAP. While the default backing store is not suitable for enterprise-scale use, we are confident that a scalable off-the-shelf credential solution can be plugged in with no code change.
Goal architecture
Our goal is to support an architecture that enables very large scale Orion deployments. What would be required if we had a million accounts or 100,000 simultaneous users? A vanilla Orion download would not need to support this out of the box without further configuration, but we want it to be possible to support such scenarios.
The target is the classic three-tier web server architecture:
- Front end server used for load balancing, SSL encryption, URL mapping and possibly caching of static content. Examples include the Apache front-end used at orionhub.org, or light front-ends such as nginx or Squid.
- Process servers of varying types, scaled independently. Current examples include Orion file server, git server, and search server.
- Persistence layer. Our current persistence layer is the file system, but it should be possible to interchange with other persistence implementations (e.g., database) without code changes in the Orion servers.
Work areas for scalability
Split shared content/metadata from Orion server metadata
Right now we use the OSGi instance area as the root of the user content/user workspaces, Orion metadata, search indices, and server plugin metadata. We should allow some of the most valuable shared content to be moved outside the OSGi instance area: the user content and the search indices. That way the server's instance area is reserved for data local to that server and its OSGi bundles.
The user content can be located at orion.file.user.content.location, or in the instance area if that config property is not set.
The search indices can be located at orion.search.index.location, or in the instance area if that config property is not set.
- bug 439716 - Allow user content outside of OSGi instance area
- bug 439717 - Store the Solr search indices outside of the OSGi instance area
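The fallback rule for the two location properties could be sketched as follows. This is a minimal illustration, not the server's actual code: the class and method names are hypothetical, and only the two property names come from the text above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; only the property names are taken from this plan.
class ContentLocations {
    static final String USER_CONTENT = "orion.file.user.content.location";
    static final String SEARCH_INDEX = "orion.search.index.location";

    /** Returns the configured location, or the instance area when the property is unset or blank. */
    static Path resolve(Properties conf, String key, Path instanceArea) {
        String value = conf.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            return instanceArea;
        }
        return Paths.get(value.trim());
    }
}
```

With this shape, every caller that needs the user content or index root goes through one lookup, so the instance-area default lives in a single place.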
Metadata access by different Orion versions
There will always be some scenarios where upgrading metadata requires Orion to be shut down while the system upgrades, like the upgrade from simple to simpleV2 (where the metadata location moved to the user base directory).
But most metadata upgrades should be as forward and backward compatible as possible.
Strategy 1: migrations should add new properties
When a metadata upgrade runs it should only add new properties to the metadata objects (currently Metastore, User, Workspace, and Project). We already read and write JSON files, so the current version would just ignore the new properties, and they would be available to the new version.
With this strategy, to change the meaning of a property you would create a new property rather than reuse an existing one.
To clean up properties that are no longer used, we would provide a post-upgrade command that can be used after the upgrade has completed successfully.
The schema version should be written into the metadata by the migration step so that we know the latest schema version applied to it, but it should be ignored by most other reads and writes.
For rollback we have two options. The first is to leave the metadata alone: since the older version of Orion ignores the new properties, nothing needs to be undone. The downside is that if the rollback was caused by a metadata bug, we will have to increase the metadata version a second time when we deliver a fix.
The second option is to back out the changes to the metadata and downgrade the version. If we have only added a few new properties, this might be doable.
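The additive strategy can be illustrated with a small sketch that treats a metadata object as a map of JSON properties. Everything here is an assumption made for illustration: the class name, the property names "UserName" and "Version", and the use of a plain Map instead of Orion's JSON handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; property names are illustrative, not Orion's real schema.
class AdditiveMigration {
    /** Adds new properties only; refuses to overwrite a property that already exists. */
    static Map<String, Object> migrate(Map<String, Object> metadata, int newVersion,
            Map<String, Object> additions) {
        Map<String, Object> result = new LinkedHashMap<>(metadata);
        for (Map.Entry<String, Object> e : additions.entrySet()) {
            if (result.containsKey(e.getKey())) {
                throw new IllegalArgumentException("Refusing to reuse property: " + e.getKey());
            }
            result.put(e.getKey(), e.getValue());
        }
        result.put("Version", newVersion); // record the schema version with the migration step
        return result;
    }

    /** An older reader looks up only the keys it knows; extra keys are simply ignored. */
    static String readUserName(Map<String, Object> metadata) {
        return (String) metadata.get("UserName");
    }
}
```

Because the older reader never enumerates keys it does not know, the first rollback option needs no extra code: the added properties stay in the file and are skipped.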
Strategy 2: incompatible metadata
For the case where we do have incompatibilities, we can add the version number to the new metadata filename. For example, user.json will become user.8.json. That allows the older version to continue to work while we upgrade to the newer version. A post-upgrade cleanup command can be run to remove old files once the upgrade is considered successful. A rollback just involves switching back to the older version of Orion and deleting the newly created files.
- bug 439715 - Metadata file upgrades should include the version in the filename
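The filename convention from Strategy 2 could be derived like this. The helper name is hypothetical; the only behavior taken from the text is that user.json at version 8 becomes user.8.json.

```java
// Hypothetical helper illustrating the versioned-filename convention.
class VersionedNames {
    /** Inserts the version before the extension: "user.json" at version 8 gives "user.8.json". */
    static String versioned(String fileName, int version) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName + "." + version; // no extension: append the version
        }
        return fileName.substring(0, dot) + "." + version + fileName.substring(dot);
    }
}
```

Cleanup and rollback then reduce to deleting one family of files: the unversioned ones after a successful upgrade, or the versioned ones when rolling back.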
Metadata locking
Orion metadata is stored in simple JSON files, on a per-user basis. As such, contention for access to these files only occurs when multiple requests/processes are being carried out for the same user concurrently. To avoid corruption or invalid behavior we must use file-level locking to ensure integrity of the metadata when concurrent requests do occur. Action: sweep through all metadata reads and writes and ensure file locking is performed.
- bug 439622 - Lock metadata on read/writes
- bug 339602 - [server] Thread safety of workspace metadata back end
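One way to do file-level locking from Java is java.nio's FileLock. The sketch below is an assumption about how such a sweep might wrap reads and writes, with hypothetical class and method names. Note that a FileLock is held on behalf of the whole JVM, so threads inside one Orion process would still need an in-process lock; overlapping attempts within the same JVM fail with OverlappingFileLockException rather than blocking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper; guards against other processes, not other threads in this JVM.
class LockedMetadataFile {
    /** Replaces the file content while holding an exclusive lock on the file. */
    static void write(Path file, String json) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // blocks until the exclusive lock is granted
            channel.truncate(0);
            ByteBuffer buffer = ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true); // flush content and file metadata to disk
        }
    }

    /** Reads the whole file while holding a shared lock. */
    static String read(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
             FileLock lock = channel.lock(0, Long.MAX_VALUE, true)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file
            }
            return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
        }
    }
}
```

File lock semantics are platform-dependent, so behavior on the shared file system used by a multi-node deployment would need to be verified.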
Orderly shutdown
Orion instances are generally designed to be disposable. Killing the Orion process is generally safe, but may result in a user operation being left incomplete. We need to ensure Orion responds promptly to a stop request and is able to do an orderly shutdown. Action: ensure all long running tasks that span requests respond to cancellation promptly.
- bug 438569 - Cancel search indexer on shutdown
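Responding promptly to cancellation usually means checking for interruption between small units of work. The following is a minimal sketch using a plain ExecutorService; the crawler class is a stand-in, not the real search indexer, and Orion's own job framework may expose cancellation differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a long running background task such as the indexer.
class CancellableCrawler implements Runnable {
    final AtomicInteger processed = new AtomicInteger();

    @Override
    public void run() {
        // Check the interrupt flag between small units of work so a stop request is seen quickly.
        while (!Thread.currentThread().isInterrupted()) {
            processed.incrementAndGet(); // stand-in for indexing one file
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and fall out of the loop
            }
        }
    }

    /** Requests cancellation, then waits up to timeoutMillis; true if everything stopped in time. */
    static boolean shutdown(ExecutorService pool, long timeoutMillis) {
        pool.shutdownNow(); // interrupts running tasks and discards queued ones
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The timeout gives the stop request an upper bound: if a task ignores interruption, shutdown reports false instead of hanging.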
Scalable search engine and indexer
Our interaction with the search engine is entirely over HTTP to an in-process Solr server. The indexer is currently run as a pair of background threads in the Orion server process (one for updating, one for handling deletion). Action: ensure there is only one indexer running when there are multiple Orion instances in play.
- bug 439062 - Ensure only one search indexer is running
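One possible way to ensure a single indexer is a lock file in the shared index location: each Orion instance tries to take the lock at startup and only the winner starts its indexer threads. This is an assumed design, not what bug 439062 prescribes, and the names below are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical lease object: holding it means "this process runs the indexer".
class IndexerLease implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private IndexerLease(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a lease if the lock was won, or null if it is already held elsewhere. */
    static IndexerLease tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            // the lock is already held inside this JVM
        } finally {
            if (lock == null) {
                channel.close();
            }
        }
        return lock == null ? null : new IndexerLease(channel, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

The operating system releases the lock when the holding process dies, so another instance can take over indexing on its next attempt. As with any file lock, behavior on network file systems is platform-dependent and would need testing.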
Our current index schema hits scalability limits when the workspace grows to the 600+ GB range. Query times are rising sharply. We need to investigate how to shard or otherwise scale the index so that query performance scales linearly with the size of the workspace.
Longer term, the Solr server should be forked out of process so it can be scaled out independently. We will likely need to eventually move to a more scalable search indexer such as Apache Nutch or Elasticsearch.
Search index versioning
We can allow multiple Orion search engines to share the same indices as long as we provide some locking on the process. But during an upgrade that affects the search schema, we need to upgrade the indices while allowing the older version to continue to run.
In the short term we can use the org.eclipse.orion.internal.server.search.SearchActivator.CURRENT_INDEX_GENERATION in the path we give to the Solr server to provide separation of the current version and new version during an upgrade.
Post-upgrade cleanup would involve just deleting the old version's search index directory. Rollback would involve deleting the new version's directory.
For example, the indices should be in INSTANCE_AREA/.metadata/.plugins/org.eclipse.orion.server.core.search/vCURRENT_INDEX_GENERATION/ or SEARCH_ROOT/vCURRENT_INDEX_GENERATION/.
- bug 439719 - Allow different versions of the Solr index to co-exist
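The generation-based layout might be computed as in the sketch below. The class and method names are hypothetical, and the generation numbers used in any call are illustrative; in the server the value would come from CURRENT_INDEX_GENERATION.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the SEARCH_ROOT/v<generation>/ layout described above.
class IndexPaths {
    /** Returns searchRoot/v<generation>. */
    static Path forGeneration(Path searchRoot, int generation) {
        return searchRoot.resolve("v" + generation);
    }

    /** Lists v* directories other than the current generation (candidates for cleanup). */
    static List<Path> staleGenerations(Path searchRoot, int current) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(searchRoot, "v*")) {
            for (Path dir : dirs) {
                boolean isCurrent = dir.getFileName().toString().equals("v" + current);
                if (Files.isDirectory(dir) && !isCurrent) {
                    stale.add(dir);
                }
            }
        }
        return stale;
    }
}
```

Post-upgrade cleanup would delete what staleGenerations returns for the new generation; rollback would delete the directory returned by forGeneration for the new generation instead.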
Move long running Git operations out of process
Long running Git operations are currently run in background threads within the Orion server process. These should be moved to separate Git processes so that they can be scaled out independently. This would also enable Git operations to continue running if the Orion server node falls over or gets switched out. This can be implemented with a basic job queue: the Orion server writes a description of the long running Git operation to a persistent queue, and a separate Git server polls that queue and performs the Git operations.
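A very small persistent queue can be built on a shared directory, given that the current persistence layer is the file system. The sketch below is one assumed shape for it, not a committed design: the directory names, the .job suffix, and the plain-text job description are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical directory-backed queue shared by Orion servers and Git workers.
class GitJobQueue {
    private final Path pending;
    private final Path running;

    GitJobQueue(Path root) throws IOException {
        pending = Files.createDirectories(root.resolve("pending"));
        running = Files.createDirectories(root.resolve("running"));
    }

    /** Called by the Orion server: persists a job description and returns its id. */
    String enqueue(String description) throws IOException {
        String id = UUID.randomUUID().toString();
        Path tmp = pending.resolve(id + ".tmp");
        Files.write(tmp, description.getBytes(StandardCharsets.UTF_8));
        // Rename only after writing so a poller never sees a half-written job.
        Files.move(tmp, pending.resolve(id + ".job"), StandardCopyOption.ATOMIC_MOVE);
        return id;
    }

    /** Called by a Git worker: claims one pending job, or returns null if none can be claimed. */
    String poll() throws IOException {
        try (DirectoryStream<Path> jobs = Files.newDirectoryStream(pending, "*.job")) {
            for (Path job : jobs) {
                Path claimed = running.resolve(job.getFileName());
                try {
                    Files.move(job, claimed, StandardCopyOption.ATOMIC_MOVE);
                } catch (NoSuchFileException e) {
                    continue; // another worker claimed this job first
                }
                return new String(Files.readAllBytes(claimed), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Claiming by atomic rename means two workers polling at once cannot both take the same job. Jobs left in the running directory by a crashed worker would need a separate recovery pass, which is not shown.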
The background jobs currently appear to be executed using an unlimited pool of worker threads. This can be exploited in a DoS attack to bring down the server (found in load tests running large numbers of git commands). It should be fixed by capping the number of worker threads/processes at a level the available hardware can handle.
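With plain java.util.concurrent, a cap on workers and backlog can be expressed as a ThreadPoolExecutor with a bounded queue. This is a sketch of the general technique with hypothetical names; Orion's background jobs may run on a different scheduler, where the equivalent limit would be configured differently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for a pool that refuses work instead of growing without bound.
class BoundedGitPool {
    /** Fixed number of workers plus a bounded backlog; jobs beyond that are rejected. */
    static ThreadPoolExecutor create(int workers, int maxQueued) {
        return new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(maxQueued),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false instead of throwing when the pool is saturated. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable job) {
        try {
            pool.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // the caller can report the server as busy
        }
    }
}
```

Rejecting excess work keeps memory and thread usage flat under load; the worker and queue sizes would be tuned to the hardware.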
In addition, we should investigate optimizing the configuration of JGit and any JVM running JGit. JGit makes a number of speed/space trade-offs that can be adjusted by configuring it differently. For details see bug 344143.
We need to run gc on all Git repositories to ensure proper performance of Git commands. This could be achieved by:
- exposing gc to users (add a button to explicitly start gc, which is what EGit does)
- introducing a background job that periodically runs gc on all repositories, or on those which have many loose objects (which is what is typically done on Gerrit servers)
- teaching JGit to auto-gc when there are many loose objects in a repository and a Git command is requested (which is what the native git command line does)
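For the background job and auto-gc options, the trigger could be a loose-object count. The sketch below counts files in the two-character fan-out directories under objects/ using only the JDK. The class name and the threshold parameter are hypothetical, and the actual collection step (for example through JGit or native git gc) is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical check a periodic job could run before deciding to gc a repository.
class LooseObjectCheck {
    /** Counts files under objects/<two hex chars>/ in the given git directory. */
    static int countLooseObjects(Path gitDir) throws IOException {
        Path objects = gitDir.resolve("objects");
        if (!Files.isDirectory(objects)) {
            return 0;
        }
        int count = 0;
        try (DirectoryStream<Path> fanout = Files.newDirectoryStream(objects)) {
            for (Path dir : fanout) {
                String name = dir.getFileName().toString();
                if (!name.matches("[0-9a-f]{2}") || !Files.isDirectory(dir)) {
                    continue; // skips pack/, info/ and anything else that is not a fan-out dir
                }
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                    for (Path ignored : files) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    /** True when the loose object count exceeds the caller-chosen threshold. */
    static boolean needsGc(Path gitDir, int threshold) throws IOException {
        return countLooseObjects(gitDir) > threshold;
    }
}
```

A full count touches every fan-out directory, so on a large server the job might sample a single directory and extrapolate instead.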
Log management
Orion primarily uses SLF4J for logging, written to stdout/stderr of the Orion server process. The log configuration is stored in a bundle of the Orion server. It needs to be possible to configure logging externally to the Orion build itself; that is, the Orion server configuration should control logging rather than settings embedded with the code. We should continue writing log data to stdout so that external applications can be used to persist logs for remote monitoring and debugging.
Search for implicit assumptions about system tools (exec calls)
The Orion server is pure Java code, so it should be able to run anywhere that can run a modern JVM (generally Java 7 or higher). We need to validate that we don't make assumptions about external system tools (i.e., Runtime.exec or ProcessBuilder calls). We should bundle the Java 7 back end of the Eclipse file system API to enable running any Orion build on arbitrary hardware.
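A first pass at the validation could be a textual scan of the server sources for process-launching calls. The sketch below is a heuristic only, with a hypothetical class name: it matches Runtime.getRuntime().exec(...) and new ProcessBuilder(...) written on a single line and would miss indirect uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical scanner; a textual heuristic, not a substitute for code review.
class ExecCallScanner {
    private static final Pattern EXEC = Pattern.compile(
            "getRuntime\\(\\)\\s*\\.\\s*exec\\s*\\(|new\\s+ProcessBuilder\\s*\\(");

    /** Returns the 1-based numbers of lines that appear to launch an external process. */
    static List<Integer> suspiciousLines(List<String> sourceLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (EXEC.matcher(sourceLines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

Each hit would then be reviewed by hand to decide whether the external tool is a hard requirement or can be replaced with a pure Java implementation.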
Links
Main bug report: bug 440005
Main bug report for 3.0: bug 404935
Reading on scalability:
Assorted cloud platform docs: