Difference between revisions of "IT SLA"

From Eclipsepedia

(Requesting Support)
(7 intermediate revisions by one user not shown)

Revision as of 13:10, 12 June 2012

The Eclipse Foundation's IT team (the Webmasters) provides computer and network services and support that enable the Eclipse community, committers, members and EMO staff to access information and networked applications in a timely manner.

Webmaster Support

Webmaster Hours

Eclipse Webmasters are available full-time from Monday to Friday, from 8:00am to 5:00pm Eastern Time, and on call outside those hours.

Requesting Support

Webmasters will attempt to provide support and resolve issues in a timely manner according to the severity of the issue and prevailing conditions. Due to the varying nature of requests and the fluctuating demands on the Webmasters, resolution times may vary. For service definitions, please see Services Covered below.

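Webmaster Support Request

Blocker (Tier 1 service down; blocking entire team/project)

  • Request process (webmaster hours): IM/SMS text (if available), Email to Webmaster
  • Request process (outside webmaster hours): Enterprise/Strategic Members: see Support Policy. Others: IM/SMS text (if available), Email to Webmaster
  • Response time [1] (webmaster hours): Immediate
  • Response time [1] (outside webmaster hours): Upon notification

Major (Tier 2 service down; password reset; permissions preventing commit & unable to commit; other issues blocking an individual committer)

  • Request process (webmaster hours): IM/SMS text (if available), Email to Webmaster
  • Request process (outside webmaster hours): Enterprise/Strategic Members: see Support Policy. Others: Email to Webmaster
  • Response time [1] (webmaster hours): Within 2 hours
  • Response time [1] (outside webmaster hours): Strategic Members: Upon notification. Others: next business day

Normal (Tier 3 service down; regular, non-blocking requests; signing)

  • Request process (webmaster hours): Open Bug
  • Request process (outside webmaster hours): Open Bug
  • Response time [1] (webmaster hours): Within 4 hours
  • Response time [1] (outside webmaster hours): Within next business day

Provisioning (Account; Project; vserver; code restructuring)

  • Request process (webmaster hours): Open Bug
  • Request process (outside webmaster hours): Open Bug
  • Response time [1] (webmaster hours): Within next 5 business days
  • Response time [1] (outside webmaster hours): Within next 5 business days

Enhancement (Requesting new software; site improvements; etc)

  • Request process (webmaster hours): Open Bug
  • Request process (outside webmaster hours): Open Bug
  • Response time [1] (webmaster hours): Best Effort
  • Response time [1] (outside webmaster hours): Best Effort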

[1] Typical time to respond to a request. Time to complete a request will vary according to the complexity of the request and the time required to gather all the information needed to complete the request.
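
As a rough illustration of how the response targets depend on severity and on webmaster hours, the lookup can be sketched in Python. This is only a sketch: the names `RESPONSE_TARGETS`, `in_webmaster_hours` and `response_target` are hypothetical, timestamps are assumed to be naive datetimes already expressed in Eastern Time, and public holidays are ignored.

```python
from datetime import datetime

# Response targets transcribed from the Webmaster Support Request table.
# The dictionary layout and all names here are illustrative assumptions.
RESPONSE_TARGETS = {
    "blocker": ("Immediate", "Upon notification"),
    "major": ("Within 2 hours",
              "Strategic Members: upon notification; others: next business day"),
    "normal": ("Within 4 hours", "Within next business day"),
    "provisioning": ("Within next 5 business days", "Within next 5 business days"),
    "enhancement": ("Best Effort", "Best Effort"),
}


def in_webmaster_hours(when: datetime) -> bool:
    """Monday to Friday, 8:00am to 5:00pm; `when` is assumed to be Eastern Time."""
    return when.weekday() < 5 and 8 <= when.hour < 17


def response_target(severity: str, when: datetime) -> str:
    """Return the typical response time for a request filed at `when`."""
    during_hours, outside_hours = RESPONSE_TARGETS[severity.lower()]
    return during_hours if in_webmaster_hours(when) else outside_hours
```

For example, a Major request filed on a Tuesday morning maps to "Within 2 hours", while the same request filed on a Sunday falls under the outside-hours column.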

Computer Systems

Service Hours

All services are expected to be available 24 hours a day, 365 days per year, except during scheduled maintenance periods.

Maintenance

Occasionally, services must be shut down for maintenance. The maintenance window is Sunday, from 6:00am to 8:00am ET.

At least three (3) days' notice will be given for scheduled maintenance on Tier 1 and Tier 2 services affecting all users. In cases where the maintenance affects specific projects (such as SCM refactoring or SCM migrations), notification and scheduling will be co-ordinated with the affected projects via Bugzilla or a public mailing list.

Emergency maintenance may occur at any time, and service notices will be made on a "Best Effort" basis.
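
The scheduled-maintenance rules above (Sunday 6:00am to 8:00am ET, at least three days' notice) can be expressed as two small checks. This is a minimal sketch, not Eclipse tooling: the function names are hypothetical, and datetimes are assumed to be naive values already in Eastern Time.

```python
from datetime import datetime, time, timedelta

WINDOW_WEEKDAY = 6            # datetime.weekday(): Monday is 0, Sunday is 6
WINDOW_START = time(6, 0)     # 6:00am ET
WINDOW_END = time(8, 0)       # 8:00am ET
MIN_NOTICE = timedelta(days=3)


def in_maintenance_window(when: datetime) -> bool:
    """True if `when` (assumed Eastern Time) falls inside the Sunday window."""
    return (when.weekday() == WINDOW_WEEKDAY
            and WINDOW_START <= when.time() < WINDOW_END)


def notice_is_sufficient(announced: datetime, starts: datetime) -> bool:
    """True if the announcement precedes the maintenance by at least three days."""
    return starts - announced >= MIN_NOTICE
```

Emergency maintenance is outside the scope of these checks, since it may occur at any time.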


Services Covered

Tier 1 - Critical

These services are the backbone of the Eclipse.org community and must be available at all times.

  • Bugzilla
  • SCM: CVS (pserver and SSH) / Subversion (svn and svn+ssh) / Git (git and SSH)
  • Website: www.eclipse.org

Tier 2 - Best Effort

These services offer support for important Eclipse-related activities, and their availability is based on "best effort"; Webmasters may be contacted (by authorized persons) on mobile devices for problem resolution, and will make a reasonable effort to restore service outside of support hours.

  • Build server, Hudson infra
  • Mailing lists
  • Websites: dev, download, wiki, EclipseCON

Tier 3 - Next Business Day

These services are supported during webmaster hours. Webmasters may tend to issues during off-hours if they happen to be observed at that time.

  • Project vservers
  • Websites: help, EPIC, EclipseLive, PlanetEclipse, Blogs
  • Other services not listed in Tier 1 and Tier 2

Service Availability

A service is considered unavailable if it fails to respond to user requests after five attempts within three minutes. A service that is merely degraded or slow is not considered unavailable, although the IT team treats degraded performance as a high-priority issue.
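
The availability definition can be sketched as a simple probe loop. The sketch below is an assumption-laden illustration: the policy does not say how the five attempts are spaced, so spreading them evenly over the three minutes is a choice made here, and `probe` stands for any hypothetical callable that returns True when the service answers.

```python
import time
from typing import Callable

ATTEMPTS = 5
WINDOW_SECONDS = 180  # three minutes


def is_unavailable(probe: Callable[[], bool],
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if every one of the five attempts fails."""
    pause = WINDOW_SECONDS / (ATTEMPTS - 1)  # even spacing is an assumption
    for attempt in range(ATTEMPTS):
        if probe():
            # Any response, however slow, means the service is not "unavailable".
            return False
        if attempt < ATTEMPTS - 1:
            sleep(pause)
    return True
```

The `sleep` parameter exists only so the loop can be exercised without waiting three minutes; a slow but successful response still counts as available, in line with the definition.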

Service Availability

  Tier      Availability
  Tier 1    >99.98%
  Tier 2    Best Effort (>99%)
  Tier 3    Next Business Day (>95%)

Please note: scheduled maintenance does not count as downtime.
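
To make the percentages concrete, they can be converted into a yearly budget of unscheduled downtime. The calculation below is a sketch that assumes a 365-day year, as in the Service Hours statement, and excludes scheduled maintenance; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Upper bound on unscheduled downtime per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for tier, percent in [("Tier 1", 99.98), ("Tier 2", 99.0), ("Tier 3", 95.0)]:
    hours = allowed_downtime_minutes(percent) / 60
    print(f"{tier}: under {hours:.1f} hours of unscheduled downtime per year")
```

Under these assumptions, Tier 1 allows roughly 105 minutes per year, Tier 2 about 87.6 hours, and Tier 3 about 438 hours.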

SLA strategies

As a rule, the IT team abides by the following guidelines to ensure server uptime, responsiveness and stability:

  • Eclipse.org production servers are not used as test machines.
  • Beta, Alpha, or test code on production servers is prohibited.
  • Anything that poses a threat to the availability, data integrity or performance of Tier 1 and Tier 2 services can and must be terminated.
  • Committers and EMO staff are not permitted to run code on any server or hardware hosting a Tier 1 service.
  • Eclipse.org IT uses F/OSS software only.


Software installation policies and procedures

  • Clusters are used for Tier 1 and Tier 2 services where fault tolerance, scalability and performance are required.
  • Installed software must be production quality - no Alpha or Beta code.
  • Only required software is to be installed and used on Tier 1 and Tier 2 clusters. Software that is not required for the basic operation of the service increases the risk of memory leaks and security vulnerabilities, and may negatively affect performance.
  • Server-side services, such as SCM systems and Apache, must be bundled with the Enterprise OS we use. Web-based services, such as Bugzilla, can be compiled from source, as they use an underlying OS service to manage ports, access and privilege separation.
  • Installed software must be tested on an isolated node to ensure it doesn't impact the other services.


Software upgrade policies and procedures

  • Release-quality software is used. No Release Candidates or Milestones.
  • A period of at least 10 working days must pass before software is upgraded, to allow the maintainers to detect and fix any defects with the shipped product.
  • Software upgrades must be tested on an isolated node to minimize impact on other services.
  • If software is to be compiled from source (avoid!), follow the Software Compiling policies.
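
The ten-working-day waiting period can be computed from a release date. This sketch assumes that working days are Monday to Friday, ignores public holidays, and reads "at least 10 working days must pass" as allowing the upgrade on the tenth working day after release; the function name is hypothetical.

```python
from datetime import date, timedelta

WAIT_WORKING_DAYS = 10


def earliest_upgrade_date(released: date,
                          working_days: int = WAIT_WORKING_DAYS) -> date:
    """Date of the Nth working day after `released` (holidays not considered)."""
    current = released
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday (0) through Friday (4)
            remaining -= 1
    return current
```

For instance, software released on Tuesday 12 June 2012 would become eligible on Tuesday 26 June 2012 under these assumptions.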


Software Compiling policies

  • As much as possible, avoid compiling software from source, as maintenance is tedious. Use a vendor OS package instead.
  • Read and apply the Software Upgrade policies - no betas, etc.
  • One cluster node is usually set up with make/gcc etc. We don't usually leave the make tools on all nodes.
  • Only download/compile software from a reputable source. Verify the MD5/SHA1 checksums of every download.
  • If software must be compiled from source, software must be compiled as a non-root user. This is non-negotiable, as there is no reason to compile as root. Document any compilation and/or installation process so we can upgrade later.
  • If software is to be installed on each cluster node, such as SVN, create an RPM package and/or use a 'make install' procedure so that we can repeat the installation on other nodes.
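
The checksum step in the list above can be sketched with Python's standard `hashlib` module. The helper names and the idea of comparing against a hex string published by the upstream project are illustrative assumptions, not a description of the Webmasters' actual procedure.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1") -> str:
    """Hash a file in chunks so large source tarballs need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str,
                      algorithm: str = "sha1") -> bool:
    """Compare the local digest with the value published by the upstream source."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

Passing `algorithm="md5"` covers the MD5 case; a mismatch should be treated as a reason not to compile or install the download.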


Operating System upgrade policies and procedures

  • Only upgrade to Release-quality software. No Release Candidates or Milestones.
  • Kernel upgrades must be tested on an isolated node, and tested in a production environment before being deployed to the entire cluster.
  • OS upgrades must be tested on an isolated node, and tested in a production environment before being deployed to the entire cluster.
  • Backend servers (storage, database, authentication) are *not* upgraded unless a problem arises that an upgrade may solve (e.g., MySQL) or there is a security issue that poses a risk to Tier 1 services.