This plan lays out the work areas required to re-architect the current Orion Java server for very large-scale deployment. This is not all work we will be able to scope within a single release; rather, it is intended as a roadmap of the work required to make this happen. Work items are roughly sorted by order of priority.

Assessment of current state

Much of the current Orion Java server is designed to enable scalability. There are a number of problem areas detailed in the next section, but the current server has some existing strengths:

  1. Nearly stateless. With the exception of Java session state for authentication, and some metadata caching for performance, the current server is stateless. The server can be killed and restarted at any time with no data loss. All data is persisted to disk at the end of each HTTP request, with the exception of asynchronous long running operations that occur in a background thread and span multiple HTTP requests. Authentication can be off-loaded to a front end proxy to avoid Java session state.
  2. Fast startup. The current server starts within 2-3 seconds, most of which is Java and OSGi startup. No data is loaded until client requests start to arrive. The exception is the background search indexer, which starts crawling in a background thread immediately. If the search index has been deleted (common practice on server upgrades) it can take several minutes to complete indexing. This in no way blocks client requests - search results will just be potentially incomplete until the index is fully populated.
  3. Externalized configuration. Most server configuration is centrally managed in the orion.conf file. There are a small number of properties specified in orion.ini as well. Generally server configuration can be managed without code changes.
  4. Scalable core technologies. All of the core technologies we are using on the server are known to be capable of significant scaling: Equinox, Jetty, JGit, and Apache Solr. In some cases we will need to change how these services are configured, but there is nothing we need to throw away and rebuild from scratch. Solr doesn't handle search crawling, but we could use complementary packages like Apache Nutch for crawling if needed.
  5. Pluggable credential store. The Orion server currently supports a completely pluggable credential store for Orion account information and passwords. We have a default implementation that uses Equinox secure storage, but we have also done implementations that support OpenID, Persona, and LDAP. While the default backing store is not suitable for enterprise scale use, we are confident that a scalable off the shelf credential solution can be plugged in with no code change.

Goal architecture

Our goal is to support an architecture that enables very large scale Orion deployments. What would be required if we had a million accounts or 100,000 simultaneous users? A vanilla Orion download need not support this out of the box, but we want it to be possible to configure Orion for such scenarios.

Our goal is to support the classic three tier web server architecture:

  1. Front-end server used for load balancing, SSL encryption, URL mapping, and possibly caching of static content. Examples include the Apache front-end used at orionhub.org, or light front-ends such as nginx (http://nginx.org/) or Squid (http://www.squid-cache.org/).
  2. Process servers of varying types, scaled independently. Current examples include Orion file server, git server, and search server.
  3. Persistence layer. Our current persistence layer is the file system, but it should be possible to interchange with other persistence implementations (e.g., database) without code changes in the Orion servers.

Work areas for scalability

Split shared content/metadata from Orion server metadata

Right now we use the OSGi instance area as the root of the user content/user workspaces, Orion metadata, search indices, and server plugin metadata. We should allow some of the most valuable shared content to be moved outside the OSGi instance area: the user content and the search indices. That way the server's instance area is reserved for data local to that server and its OSGi bundles.

The user content can be located at orion.file.user.content.location, or in the instance area if that config property is not set.

The search indices can be located at orion.search.index.location, or in the instance area if that config property is not set.

  • bug 439716 - Allow user content outside of OSGi instance area
  • bug 439717 - Store the Solr search indices outside of the OSGi instance area
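
For illustration, a minimal sketch of how these two settings might appear in orion.conf; the paths are placeholders, not recommendations:

  # Shared user content served by every Orion instance (placeholder path)
  orion.file.user.content.location=/mnt/shared/orion/userContent
  # Shared Solr search indices (placeholder path)
  orion.search.index.location=/mnt/shared/orion/searchIndices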

Metadata access by different Orion versions

There will always be some scenarios where upgrading metadata requires Orion to be shut down while the system upgrades, like the upgrade from simple to simpleV2 (where the metadata location moved to the user base directory).

But most metadata upgrades should be as forward and backward compatible as possible.

Strategy 1: migrations should add new properties

When a metadata upgrade runs it should only add new properties to the metadata objects (currently Metastore, User, Workspace, and Project). We already read and write JSON files, so the current version would just ignore the new properties, and they would be available to the new version.

With this strategy, changing the meaning of a property means creating a new property rather than reusing an existing one.

To clean up properties that are no longer used, we would provide a post-upgrade command that can be used after the upgrade has completed successfully.

The version should be written into the metadata with the migration step so we know the latest version of the schema used with the metadata, but should be ignored by most other read/writes.

For rollback we have two options. The first is to do nothing: since the older version of Orion simply ignores the new properties, the metadata can be left as-is. The downside is that if the rollback was caused by a metadata bug, we'll have to increase the metadata version a second time when we deliver a fix.

The second rollback option is to try to back out any changes to the metadata and downgrade the version. If we've only added a few new properties, this might be doable.
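
A minimal sketch of what an additive migration step could look like, assuming the per-user metadata is a JSON file manipulated with org.json's JSONObject; the property names FullNameLowerCase and OrionVersion are purely illustrative, not real Orion metadata properties:

  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import org.json.JSONObject;

  /** Additive migration: only new properties are written, existing ones are untouched. */
  public class AdditiveMigration {
      // Hypothetical schema version this migration produces.
      private static final int TARGET_VERSION = 8;

      public static void migrateUser(Path userJson) throws Exception {
          String content = new String(Files.readAllBytes(userJson), StandardCharsets.UTF_8);
          JSONObject user = new JSONObject(content);

          // Add a new property instead of changing the meaning of an existing one.
          // "FullNameLowerCase" is an illustrative name only.
          if (user.has("FullName") && !user.has("FullNameLowerCase")) {
              user.put("FullNameLowerCase", user.getString("FullName").toLowerCase());
          }

          // Record the schema version written by the migration; older servers just ignore it.
          user.put("OrionVersion", TARGET_VERSION);

          Files.write(userJson, user.toString(2).getBytes(StandardCharsets.UTF_8));
      }
  }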


Strategy 2: incompatible metadata

For the case where we do have incompatibilities, we can add the version number to the new metadata filename; e.g., user.json becomes user.8.json. That allows the older version to continue to work while we upgrade to the newer version. A post-upgrade cleanup command can be run to remove the old files once the upgrade is considered successful. A rollback just involves switching back to the older version of Orion and deleting the newly created files.

  • bug 439715 - Metadata file upgrades should include the version in the filename
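
A sketch of how the server could resolve the versioned filename, falling back to the unversioned file when the newer one does not exist; the version constant is an assumption for illustration:

  import java.io.File;

  /** Resolves user.<version>.json, falling back to the unversioned user.json. */
  public class MetadataFileResolver {
      // Illustrative schema version; the real value would come from the server.
      private static final int METADATA_VERSION = 8;

      public static File resolveUserFile(File userDir) {
          File versioned = new File(userDir, "user." + METADATA_VERSION + ".json");
          if (versioned.exists()) {
              return versioned;
          }
          // Older layout: the pre-upgrade server keeps reading and writing this file.
          return new File(userDir, "user.json");
      }
  }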

Metadata locking

Orion metadata is stored in simple JSON files, on a per-user basis. As such, contention for access to these files only occurs when multiple requests/processes are being carried out for the same user concurrently. To avoid corruption or invalid behavior we must use file-level locking to ensure integrity of the metadata when concurrent requests do occur. Action: sweep through all metadata read/write and ensure file locking is performed.

  • bug 439622 - Lock metadata on read/writes
  • bug 339602 - [server] Thread safety of workspace metadata back end
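
One way to implement the file-level locking is java.nio's FileLock, which excludes other processes that honor the lock; a minimal sketch (retry and error handling omitted):

  import java.nio.channels.FileChannel;
  import java.nio.channels.FileLock;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  /** Holds an exclusive lock on a metadata file while it is read and rewritten. */
  public class MetadataLocking {
      public static void updateWithLock(Path metadataFile, MetadataUpdate update) throws Exception {
          try (FileChannel channel = FileChannel.open(metadataFile,
                  StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
              // Blocks until no other process holds a lock on this file.
              // Note: FileLock is per-process; threads within the same JVM still need
              // their own synchronization (e.g. a per-user lock object).
              try (FileLock lock = channel.lock()) {
                  update.run(channel);
              }
          }
      }

      /** Callback performing the actual read-modify-write against the locked channel. */
      public interface MetadataUpdate {
          void run(FileChannel lockedChannel) throws Exception;
      }
  }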

Orderly shutdown

Orion instances are generally designed to be disposable (see http://12factor.net/disposability). Killing the Orion process is generally safe, but may result in a user operation being left incomplete. We need to ensure Orion responds promptly to a stop request and is able to do an orderly shutdown. Action: ensure all long running tasks that span requests respond to cancellation promptly.

  • bug 438569 - Cancel search indexer on shutdown
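
A sketch of the cancellation pattern such tasks could follow, assuming they run as Eclipse Jobs; the job family used here is made up for illustration:

  import org.eclipse.core.runtime.IProgressMonitor;
  import org.eclipse.core.runtime.IStatus;
  import org.eclipse.core.runtime.Status;
  import org.eclipse.core.runtime.jobs.Job;

  /** Long running task that checks for cancellation between units of work. */
  public class IndexingJob extends Job {
      // Illustrative job family so shutdown code can cancel all of these at once.
      public static final Object FAMILY = new Object();

      public IndexingJob() {
          super("Background indexing");
      }

      @Override
      public boolean belongsTo(Object family) {
          return family == FAMILY;
      }

      @Override
      protected IStatus run(IProgressMonitor monitor) {
          while (hasMoreWork()) {
              if (monitor.isCanceled())
                  return Status.CANCEL_STATUS; // respond promptly to a stop request
              doOneUnitOfWork();
          }
          return Status.OK_STATUS;
      }

      private boolean hasMoreWork() { return false; } // placeholder
      private void doOneUnitOfWork() { }              // placeholder

      /** Called from the server's shutdown path. */
      public static void cancelAll() throws InterruptedException {
          Job.getJobManager().cancel(FAMILY);
          Job.getJobManager().join(FAMILY, null); // wait for jobs to acknowledge cancellation
      }
  }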

Scalable search engine and indexer

Our interaction with the search engine is entirely over HTTP to an in-process Solr server. The indexer is currently run as a pair of background threads in the Orion server process (one for updating, one for handling deletion). Action: ensure there is only one indexer running when there are multiple Orion instances in play.

  • bug 439062 - Ensure only one search indexer is running
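
One simple way to guarantee a single indexer across several Orion processes sharing a file system is an exclusive lock file next to the index. This is just one possible approach, sketched here with an illustrative lock file name; advisory file locks would need to be verified on the target (possibly networked) file system:

  import java.io.IOException;
  import java.nio.channels.FileChannel;
  import java.nio.channels.FileLock;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  /** Only the Orion instance that wins the lock runs the search indexer. */
  public class IndexerLeaderGuard {
      private FileChannel channel;
      private FileLock lock;

      /** @return true if this process should run the indexer. */
      public boolean tryBecomeIndexer(Path searchIndexDir) throws IOException {
          Path lockFile = searchIndexDir.resolve("indexer.lock"); // illustrative file name
          channel = FileChannel.open(lockFile,
                  StandardOpenOption.CREATE, StandardOpenOption.WRITE);
          lock = channel.tryLock(); // null if another instance already holds the lock
          if (lock == null) {
              channel.close();
              return false;
          }
          return true; // held until release() or process death
      }

      public void release() throws IOException {
          if (lock != null) lock.release();
          if (channel != null) channel.close();
      }
  }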

Our current index schema hits scalability limits when the workspace grows to the 600+ GB range. Query times are rising sharply. We need to investigate how to shard or otherwise scale the index to achieve linear query performance vs size of workspace.

Longer term, the Solr server should be forked out of process so it can be scaled out independently. We will likely need to eventually move to a more scalable search indexer such as Apache Nutch or Elasticsearch.

Search index versioning

We can allow multiple Orion search engines to share the same indices as long as we provide some locking on the process. But during an upgrade that affects the search schema, we need to upgrade the indices while allowing the older version to continue to run.

In the short term we can use the org.eclipse.orion.internal.server.search.SearchActivator.CURRENT_INDEX_GENERATION in the path we give to the Solr server to provide separation of the current version and new version during an upgrade.

Post-upgrade cleanup would just involve deleting the old search index directory. Rollback would involve deleting the new directory.

For example, the indices should be in INSTANCE_AREA/.metadata/.plugins/org.eclipse.orion.server.core.search/vCURRENT_INDEX_GENERATION/ or SEARCH_ROOT/vCURRENT_INDEX_GENERATION/.

  • bug 439719 - Allow different versions of the Solr index to co-exist

Move long running Git operations out of process

Long running Git operations are currently run in background threads within the Orion server process. They should be moved to separate Git processes so that Git work can be scaled out independently. This would also enable Git operations to continue running if the Orion server node falls over or gets switched out. This can be implemented with a basic job queue: the Orion server writes a description of the long running Git operation to a persistent queue, and a separate Git server polls that queue and performs the git operations.
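
A minimal sketch of the queue half of this idea, using a shared directory as the persistent queue; all names here are illustrative, and a real deployment might use a proper message broker instead:

  import java.nio.charset.StandardCharsets;
  import java.nio.file.*;

  /** File-backed job queue shared between the Orion server and a separate Git worker. */
  public class GitJobQueue {
      private final Path queueDir;

      public GitJobQueue(Path queueDir) throws Exception {
          this.queueDir = Files.createDirectories(queueDir);
      }

      /** Orion server side: persist a description of the long running Git operation. */
      public void enqueue(String jsonJobDescription) throws Exception {
          Path job = queueDir.resolve("git-job-" + System.nanoTime() + ".json");
          // Write to a temp file first so the worker never sees a half-written job.
          Path tmp = queueDir.resolve(job.getFileName() + ".tmp");
          Files.write(tmp, jsonJobDescription.getBytes(StandardCharsets.UTF_8));
          Files.move(tmp, job, StandardCopyOption.ATOMIC_MOVE);
      }

      /** Git worker side: poll for the next job, returning null if the queue is empty. */
      public String poll() throws Exception {
          try (DirectoryStream<Path> jobs = Files.newDirectoryStream(queueDir, "git-job-*.json")) {
              for (Path job : jobs) {
                  String description = new String(Files.readAllBytes(job), StandardCharsets.UTF_8);
                  Files.delete(job); // a real worker would delete only after the operation succeeds
                  return description;
              }
          }
          return null;
      }
  }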

The background jobs currently appear to be executed using an unbounded pool of worker threads; this can be exploited in a DoS attack to bring down the server (we found this in load tests running large numbers of git commands). This should be fixed by limiting the number of worker threads/processes to a level the available hardware can handle.
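
A sketch of a bounded worker pool that rejects excess work instead of growing without limit; the pool and queue sizes are arbitrary placeholders that would need tuning to the hardware:

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.ThreadPoolExecutor;
  import java.util.concurrent.TimeUnit;

  public class BoundedGitWorkerPool {
      /** Creates a pool whose size and backlog cannot grow past fixed limits. */
      public static ExecutorService create() {
          int workers = Runtime.getRuntime().availableProcessors(); // placeholder sizing
          int maxQueuedJobs = 100;                                  // placeholder sizing
          return new ThreadPoolExecutor(
                  workers, workers,
                  60L, TimeUnit.SECONDS,
                  new ArrayBlockingQueue<Runnable>(maxQueuedJobs),
                  // Excess requests fail fast (e.g. an error response to the client)
                  // instead of exhausting server memory and threads under load.
                  new ThreadPoolExecutor.AbortPolicy());
      }
  }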

In addition, we should investigate optimizing the configuration of JGit and any JVM running JGit. JGit makes a number of speed/space trade-offs that can be adjusted by configuring it differently. For details see bug 344143.
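
For example, JGit's pack cache can be tuned through WindowCacheConfig; the values below are arbitrary examples of the speed/space trade-offs, not recommendations:

  import org.eclipse.jgit.storage.file.WindowCacheConfig;

  public class JGitTuning {
      /** Applies process-wide JGit pack cache settings; the values are illustrative only. */
      public static void configure() {
          WindowCacheConfig cfg = new WindowCacheConfig();
          cfg.setPackedGitLimit(256 * 1024 * 1024);     // total memory for cached pack data
          cfg.setPackedGitWindowSize(8 * 1024);         // size of each cached window
          cfg.setPackedGitMMAP(false);                  // plain reads instead of memory mapping
          cfg.setDeltaBaseCacheLimit(16 * 1024 * 1024); // cache for delta base objects
          cfg.install();
      }
  }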

We need to run gc on all git repositories to ensure proper performance of git commands. This could be achieved by any of the following (a sketch of the second option appears after this list):

  • exposing gc to users (add a button to explicitly start gc, which is what EGit does)
  • introducing a background job which periodically runs gc on all repositories, or on those which have many loose objects (which is what is typically done on Gerrit servers)
  • teaching JGit to auto-gc when there are many loose objects in a repository and a git command is requested (this is what the native git command line does)
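
A sketch of the second option, assuming JGit's porcelain API (Git.gc()); the scan interval and repository layout are placeholders:

  import java.io.File;
  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.TimeUnit;
  import org.eclipse.jgit.api.Git;

  /** Periodically runs gc over every child directory of a root that is a git repository. */
  public class RepositoryGcJob {
      public static void schedule(final File repositoryRoot) {
          ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
          scheduler.scheduleWithFixedDelay(new Runnable() {
              public void run() {
                  File[] candidates = repositoryRoot.listFiles();
                  if (candidates == null)
                      return;
                  for (File dir : candidates) {
                      try (Git git = Git.open(dir)) {
                          git.gc().call(); // JGit's equivalent of "git gc"
                      } catch (Exception e) {
                          // Not a repository, or gc failed; skip and continue with the next one.
                      }
                  }
              }
          }, 1, 24, TimeUnit.HOURS); // placeholder interval: roughly once a day
      }
  }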

Log management

Orion primarily uses SLF4J for logging, written to the stdout/stderr of the Orion server process. The log configuration is currently stored in a bundle of the Orion server. It needs to be possible to configure logging externally to the Orion build itself; i.e., the Orion server configuration controls the logging rather than settings embedded in the code. We should continue writing log data to stdout so that external applications can be used to persist logs for remote monitoring and debugging.
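
For example, if Logback is the SLF4J backend in use (an assumption here), the configuration could be supplied from a file outside the build via Logback's standard logback.configurationFile system property, keeping a console appender so output still goes to stdout. A minimal illustrative configuration:

  <configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
      <encoder>
        <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
      </encoder>
    </appender>
    <root level="INFO">
      <appender-ref ref="STDOUT"/>
    </root>
  </configuration>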

Search for implicit assumptions about system tools (exec calls)

The Orion server is pure Java code, so it should be able to run anywhere that can run a modern JVM (generally Java 7 or higher). We need to validate that we don't make assumptions about external system tools (i.e., Runtime.exec calls). We should bundle the Java 7 back end of the Eclipse file system API to enable running any Orion build on arbitrary hardware.

Links

Main bug report: bug 440005

Main bug report for 3.0: bug 404935

Reading on scalability:

  • 12 Factor - http://www.12factor.net

Assorted cloud platform docs:

  • Google compute disk APIs - https://developers.google.com/compute/docs/disks
  • Micro Cloud Foundry - http://docs.cloudfoundry.com/infrastructure/micro/using-mcf.html
  • OpenStack storage - http://www.openstack.org/software/openstack-storage/
  • Amazon S3 API - http://aws.amazon.com/documentation/s3/