IT SLA
Revision as of 13:59, 19 December 2008
The Eclipse Foundation's IT team (the Webmasters) provides computer and network services and support that enable the Eclipse community, committers, members and EMO staff to access information and networked applications in a timely manner.
- 1 Webmaster Support
- 2 Computer Systems
- 2.1 Service Hours
- 2.2 Maintenance
- 2.3 Services Covered
- 2.4 Service Availability
- 2.5 SLA strategies
Eclipse Webmasters are available full-time from Monday to Friday, from 8:00am to 8:00pm Eastern Time, and on call outside those hours.
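As a rough illustration of the support hours above, the sketch below checks whether a given moment falls inside full-time webmaster hours. The function name `is_webmaster_hours` is hypothetical, and the sketch assumes the timestamp is already expressed in Eastern Time; holidays are not considered.

```python
from datetime import datetime, time

# Assumption: `when` is a naive datetime already expressed in Eastern Time.
SUPPORT_START = time(8, 0)   # 8:00am ET
SUPPORT_END = time(20, 0)    # 8:00pm ET


def is_webmaster_hours(when: datetime) -> bool:
    """Return True if `when` falls within Monday-Friday, 8:00am-8:00pm."""
    is_weekday = when.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and SUPPORT_START <= when.time() < SUPPORT_END
```

Outside these hours Webmasters are on call, so a `False` result means the "outside webmaster hours" columns of the support table apply.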
Webmasters will attempt to provide support and resolve issues in a timely manner according to the severity of the issue and prevailing conditions. Due to the varying nature of requests and the fluctuating demands on the Webmasters, resolution times may vary. For service definitions, please see Services Covered below.
|Severity||Request process (webmaster hours)||Request process (outside webmaster hours)||Response time (webmaster hours)||Response time (outside webmaster hours)|
|(Tier 1 service down; blocking entire team/project)||IM (if available), Email to Webmaster||Strategic Members: see Support Policy; Others: IM (if available), Email to Webmaster|| || |
|(Tier 2 service down; password reset; permissions preventing commit & unable to commit; other issues blocking an individual committer)||IM (if available), Email to Webmaster||Strategic Members: see Support Policy; Others: Email to Webmaster||Immediate||Strategic Members: Immediate; Others: next business day|
|(Tier 3 service down; regular, non-blocking requests; signing)||Open Bug||Open Bug||Within 4 hours||Within next business day|
|(Account; Project; vserver; code restructuring)||Open Bug||Open Bug||Within next 5 business days||Within next 5 business days|
|(Requesting new software; site improvements; etc.)||Open Bug||Open Bug||Best Effort||Best Effort|
Response time is the typical time to respond to a request. The time to complete a request varies with its complexity and with the time required to gather all the information needed to complete it.
All services are expected to be available 24 hours a day, 365 days per year, except during scheduled maintenance periods.
Occasionally, services must be shut down for maintenance. The maintenance window is Sunday, from 6:00am to 8:00am ET.
At least three (3) days' notice will be given for scheduled maintenance on Tier 1 and Tier 2 services affecting all users. In cases where the maintenance affects specific projects (such as CVS refactoring, or CVS/SVN migrations), notification and scheduling will be coordinated with the affected projects via Bugzilla or a public mailing list.
Emergency maintenance may occur at any time, and service notices will be made on a "Best Effort" basis.
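The maintenance rules above can be sketched as two small checks. This is only an illustration: the function names are hypothetical, timestamps are assumed to be in Eastern Time, and "three days" is read here as 72 hours, which the policy does not spell out.

```python
from datetime import datetime, time, timedelta

# Assumption: all datetimes are naive values expressed in Eastern Time.
WINDOW_START = time(6, 0)  # Sunday 6:00am ET
WINDOW_END = time(8, 0)    # Sunday 8:00am ET
NOTICE_PERIOD = timedelta(days=3)  # assumption: three days read as 72 hours


def in_maintenance_window(when: datetime) -> bool:
    """Return True if `when` falls inside the Sunday maintenance window."""
    is_sunday = when.weekday() == 6  # Sunday is 6
    return is_sunday and WINDOW_START <= when.time() < WINDOW_END


def latest_notice_time(maintenance_start: datetime) -> datetime:
    """Latest moment at which notice can be sent for scheduled maintenance."""
    return maintenance_start - NOTICE_PERIOD
```

Emergency maintenance is outside the scope of this sketch, since its notices are made on a "Best Effort" basis.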
Tier 1 - Critical
These services are the backbone of the Eclipse.org community and must be available at all times.
- CVS (pserver and SSH) / Subversion (svn and svn+ssh, excluding http and https)
- Website: www.eclipse.org
Tier 2 - Best Effort
These services offer support for important Eclipse-related activities, and their availability is based on "best effort"; Webmasters may be contacted (by authorized persons) on mobile devices for problem resolution, and will make a reasonable effort to restore service outside of support hours.
- Build server
- Mailing lists
- SVN over http/https
- Websites: dev, download, wiki, EclipseCON
Tier 3 - Next Business Day
These services are supported during webmaster hours. Webmasters may tend to issues during off-hours if they happen to be observed at that time.
- Project vservers
- Websites: help, EPIC, EclipseLive, PlanetEclipse, Blogs
- Other services not listed in Tier 1 and Tier 2
A service is considered unavailable if it fails to respond to user requests after five attempts within three minutes. A service that is merely degraded or slow is not considered unavailable, although the IT team treats degraded performance as a high-priority issue.
|Tier 3||Best Effort|
Please note: scheduled maintenance does not constitute downtime.
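The availability definition above (no response after five attempts in three minutes) could be checked along the following lines. This is a minimal sketch, not the Webmasters' actual monitoring: `is_unavailable` and the `probe` callable are hypothetical, and spacing the attempts evenly across the window is an assumption.

```python
import time
from typing import Callable


def is_unavailable(
    probe: Callable[[], bool],
    attempts: int = 5,
    window_seconds: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if every probe within the window gets no response.

    `probe` should return True when the service responds at all, even
    slowly; a degraded service therefore does not count as unavailable.
    """
    interval = window_seconds / attempts  # assumption: evenly spaced attempts
    for attempt in range(attempts):
        if probe():
            return False  # any response means the service is available
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next attempt
    return True
```

Passing `sleep` as a parameter keeps the check easy to exercise without waiting the full three minutes.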
As a rule, the IT team abides by the following guidelines to ensure server uptime, responsiveness and stability:
- Eclipse.org production servers are not used as test machines.
- Beta, Alpha, or test code on production servers is prohibited.
- Anything that poses a threat to the availability, the data integrity or the performance of Tier 1 and Tier 2 services can and must be terminated.
- Committers and EMO staff are not permitted to run code on any server or hardware hosting a Tier 1 service.
- Eclipse.org IT uses F/OSS software only.
Software installation policies and procedures
- Clusters are used for Tier 1 and Tier 2 services where fault tolerance, scalability and performance are required.
- Installed software must be production quality - no Alpha or Beta code.
- Only required software is to be installed and used on Tier 1 and Tier 2 clusters. Software that is not required for the basic operation of the service increases the risk of memory leaks and security vulnerabilities, and may negatively affect performance.
- Server-side services, such as CVS and Apache, must be bundled with the Enterprise OS we use. Web-based services, such as Bugzilla, can be compiled from source, as they use an underlying OS service to manage ports, access and privilege separation.
- Installed software must be tested on an isolated node to ensure it doesn't impact the other services.
Software upgrade policies and procedures
- Release-quality software is used. No Release Candidates or Milestones.
- A period of at least 10 working days must pass before software is upgraded, to allow the maintainers to detect and fix any defects with the shipped product.
- Software upgrades must be tested on an isolated node to minimize impact on other services.
- If software is to be compiled from source (avoid!), follow the Software Compiling policies.
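The 10-working-day waiting period in the upgrade policy above can be computed with a short helper. This is a sketch under stated assumptions: `earliest_upgrade_date` is a hypothetical name, working days are taken to be Monday to Friday, and public holidays are ignored.

```python
from datetime import date, timedelta


def earliest_upgrade_date(release_date: date, working_days: int = 10) -> date:
    """Return the first date on which `working_days` working days have passed.

    Assumption: working days are Monday-Friday; holidays are not excluded.
    """
    current = release_date
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # count only Monday (0) through Friday (4)
            remaining -= 1
    return current
```

For example, a release published on Monday, 1 December 2008 would become eligible for upgrade on Monday, 15 December 2008 under these assumptions.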
Software Compiling policies
- As much as possible, avoid compiling software from source, as maintenance is tedious. Use a vendor OS package instead.
- Read and apply the Software Upgrade policies - no betas, etc.
- One cluster node is usually set up with make/gcc etc. We don't usually leave the make tools on all nodes.
- Only download/compile software from a reputable source. Verify each download against its published MD5 sum.
- If software must be compiled from source, it must be compiled as a non-root user. This is non-negotiable, as there is no reason to compile as root. Document any compilation and/or installation process so we can upgrade later.
- If software is to be installed on each cluster node, such as SVN, create an RPM package and/or use a 'make install' procedure so that we can repeat the installation on other nodes.
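The MD5 check mentioned in the compiling policies above might look like the following. It is an illustrative sketch only: the function names are hypothetical, and the expected sum is assumed to come from the software's download page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 sum with the published value (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch means the download should be discarded and fetched again from the reputable source before any compilation is attempted.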
Operating System upgrade policies and procedures
- Only upgrade to Release-quality software. No Release Candidates or Milestones.
- Kernel upgrades must be tested on an isolated node, and tested in a production environment before being deployed to the entire cluster.
- OS upgrades must be tested on an isolated node, and tested in a production environment before being deployed to the entire cluster.
- Backend servers (storage, database, authentication) are *not* upgraded unless a problem arises that upgrading may solve (e.g., MySQL) or there is a security issue that poses a risk to Tier 1 services.