Jetty/Tutorial/Apache

From Eclipsepedia

Revision as of 17:26, 10 January 2011 by Lajoie.itumi.biz (Talk | contribs)

Introduction

Apache httpd is an HTTP server written in C that is often used to front other web services. Jetty is a fully functional and optimized HTTP server and has no need of an Apache httpd instance between it and the internet. However, deployers often want to place an instance of Apache between Jetty and the internet for some of the following "reasons":

  • Performance. Apache httpd does have slightly superior performance to Jetty for pure HTTP request handling. However, for dynamic response generation, Apache must pass the request to another process, and the resulting double handling reduces the total throughput to less than that of direct requests to Jetty. Moreover, with the advent of comet-style web applications, long-held requests are common. The Apache thread model assigns a thread per outstanding request, so Apache does not scale to large numbers of comet connections.
  • Static content. Apache httpd is very good at serving static content fast. However, Jetty is no slouch either, as it can use direct memory-mapped buffers for static content, so that only kernel space is used for the data transfer. Besides, if your application has a lot of static content, you will get much better results by either ensuring good client caching or serving the content from a CDN edge cache.
  • Security. Some believe that Apache gives them a more secure solution as there are no client TCP/IP connections terminating on Jetty. However, since Jetty is written in Java, it is not vulnerable to the class of memory-corruption exploit (such as buffer overflows) that can affect a server written in C. Jetty has a good security record; the issues it has had in the past were mostly of a nature that would not have been helped by a fronting instance of Apache.
  • Load Balancing. Apache has several options for load balancing between multiple servlet servers. These solutions are reasonable, but there are better software and appliance load balancers available. The main limitation of Apache as a load balancer is that its threading model is not asynchronous, so scaling is limited (especially for comet traffic).
  • Administration. Often an enterprise has staff who are very familiar with Apache and thus have a strong preference to deploy everything behind Apache. This can be a good reason, as it avoids chaos in a deployment environment, so long as the performance and scalability limitations do not affect your web application.

So if we have not yet convinced you not to use Apache, read on for the best way to do it. This tutorial can be followed step by step to build up more and more capabilities in your Apache configuration.

Details

Which Module?

Apache provides two mechanisms by which a request that it receives can be forwarded to a servlet container like Jetty: mod_jk and mod_proxy.

Mod_jk is a module written specifically for communicating with the Apache Tomcat server via the AJP protocol. It includes a load balancer and some management interfaces. Jetty supports this protocol via its AJP connector, but we do not recommend using mod_jk since:

  • While the binary AJP protocol is more compact than HTTP, there is little benefit from this as the link between apache and the servlet container is often either local or over a fast LAN. Jetty is highly optimized for handling HTTP and HTTP semantics are well known and documented. Using AJP can change those semantics and reduce some key optimizations.
  • The mod_jk module is maintained with the Tomcat project rather than with the httpd project, so it is not documented to the same standard as other Apache modules, and there are frequent issues over which version of mod_jk should go with which version of Apache.
  • The AJP protocol has been at version 13 for some time, yet there have been changes in the protocol without a change of version number. Incompatibilities can frequently result.

The mod_proxy modules are superior in features, are maintained with Apache httpd, support both HTTP and AJP, and have a rich load balancer. We highly recommend using mod_proxy when using Jetty with Apache.

Configuring Apache

Distributions of Apache differ greatly in their approach to Apache configuration files. The main difference is whether the entire configuration is placed in a single file (apache2.conf or httpd.conf) or split up into multiple files and directories (conf.d, ports.conf, mods-available, mods-enabled) with the use of symlinks to activate modules. Configuration may also be done at the server level, or embedded within a VirtualHost configuration of the server.

This tutorial does not recommend or discuss in detail either approach and simply outlines the configuration directives needed. Where these directives are placed will depend greatly on your distribution and existing configuration.
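
As a rough illustration of the VirtualHost option, the directives could be wrapped in a virtual host as sketched below. The host name www.example.com is a placeholder, not something from this tutorial, and the proxy directives themselves are explained in the sections that follow.

```apache
# Hypothetical virtual host; www.example.com is a placeholder name.
# The same directives could instead be placed at the server level.
<VirtualHost *:80>
  ServerName www.example.com
  ProxyRequests Off
  ProxyPreserveHost On
  ProxyPass /test http://localhost:8080/context
</VirtualHost>
```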

Before any of the modules described below can be used, they must be loaded into the httpd server. The following directives load all the modules discussed (the module paths vary between distributions):

LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_balancer_module /usr/lib/apache2/modules/mod_proxy_balancer.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
LoadModule proxy_ajp_module /usr/lib/apache2/modules/mod_proxy_ajp.so
LoadModule jk_module /usr/lib/apache2/modules/mod_jk.so

In some distributions, these load directives can be enabled with symlinks:

cd $APACHE_HOME/mods-enabled
ln -s ../mods-available/proxy.load proxy.load
ln -s ../mods-available/proxy_http.load proxy_http.load

Configuring mod_proxy

The full documentation for configuring mod_proxy is available for apache 1.3, 2.0, 2.1, 2.2.

The following directives form a good base configuration for mod_proxy:

# Turn off forward proxy behaviour as we are acting as
# a reverse proxy
ProxyRequests Off
 
# Turn off VIA header as we know where the requests are proxied
ProxyVia Off
 
# Turn on Host header preservation so that the servlet container
# can write links with the correct host and rewriting can be avoided.
ProxyPreserveHost On
 
 
# Set the permissions for the proxy
<Proxy *>
  AddDefaultCharset off
  Order deny,allow
  Allow from all
</Proxy>
 
# Turn on Proxy status reporting at /status
# This should be better protected than: Allow from all
ProxyStatus On
<Location /status>
  SetHandler server-status
  Order Deny,Allow
  Allow from all
</Location>
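
As the comment in the configuration above notes, /status should be better protected than "Allow from all". A minimal sketch of a tighter rule, assuming Apache 2.2 style access control, that mod_status is loaded, and that the status page only needs to be read from the server itself:

```apache
# Assumption: only requests from the local machine may view the status page
<Location /status>
  SetHandler server-status
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
</Location>
```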

Configuring mod_proxy_http

To connect to a servlet container over HTTP, the ProxyPass directive can be used to send requests received on a particular URL to a Jetty instance. The following example will proxy all requests received by Apache on /test/* to the /context web application running on the local Jetty instance on port 8080:

ProxyPass /test http://localhost:8080/context

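Alternatively, the Location directive can be used to group multiple directives for the same URL. Inside a Location block, ProxyPass takes only the target URL, because the path is taken from the Location itself:

<Location /test/>
  ProxyPass http://localhost:8080/context/
  SetEnv proxy-nokeepalive 1
</Location>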

mod_proxy_http sets some additional headers on the requests that it proxies:

  • X-Forwarded-For - The IP address of the client
  • X-Forwarded-Host - The original host requested by the client in the Host HTTP request header
  • X-Forwarded-Server - The hostname of the proxy server

While not supported directly by mod_proxy_http, Jetty also understands the following experimental request header:

  • X-Forwarded-Proto - The URL protocol scheme of the original request

One option for setting this, if the protocol scheme is static, is to use the RequestHeader directive of mod_headers.
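
A minimal sketch of that option, assuming every request reaching this server or virtual host arrived over HTTPS. The module path is an assumption that follows the LoadModule paths used earlier:

```apache
# Requires mod_headers; the path is an assumption matching the earlier LoadModule lines
LoadModule headers_module /usr/lib/apache2/modules/mod_headers.so

# Assumption: all requests handled here were received over HTTPS
RequestHeader set X-Forwarded-Proto "https"
```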

If the values of these headers are meaningful to your web application, then Jetty can be configured to interpret them and make their values available via the servlet API. The setForwarded(true) method should be called on the connector. This can be done in jetty.xml like:

<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
       <Set name="host"><SystemProperty name="jetty.host" /></Set>
       <Set name="port"><SystemProperty name="jetty.port" default="8080"/></Set>
       <Set name="forwarded">true</Set>
     </New>
   </Arg>
</Call>

Proxying SSL on Apache to HTTP on Jetty

The situation here is:

  https                 http
--------->   Apache   -------> Jetty

In other words, you have offloaded your SSL onto Apache and you want to use plain HTTP to proxy to Jetty. You want Jetty to return all redirected pages using https:// to your Apache server. You can do that by setting the X-Forwarded-Proto header as described above.

If you need access on Jetty to some of the SSL information available on Apache, then you need some configuration tricks on Apache to insert the SSL information as headers on outgoing requests. Follow the Apache configuration suggestions at http://www.zeitoun.net/articles/client-certificate-x509-authentication-behind-reverse-proxy/start, which show how to use mod_headers to insert the appropriate request headers. Of course, you will also need to code your application to look for the corresponding custom request headers bearing the SSL information.
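
The general shape of such a configuration might look like the sketch below. The header names X-SSL-Cipher and X-SSL-Client-S-DN are hypothetical and must match whatever your application looks for. SSL_CLIENT_S_DN only has a value when client certificates are being verified.

```apache
# Place inside the SSL-enabled server or virtual host (requires mod_ssl and mod_headers).
# "set" replaces any value a client may have sent for the same header name.
# Header names are hypothetical examples.
RequestHeader set X-SSL-Cipher "%{SSL_CIPHER}s"
RequestHeader set X-SSL-Client-S-DN "%{SSL_CLIENT_S_DN}s"
```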


Configuring mod_proxy_ajp

To connect to a servlet container over AJP, the ProxyPass directive can be used to send requests received on a particular URL to a Jetty instance, using "ajp" as the protocol in the URL. The following example will proxy all requests received by Apache on /test/* to the /context web application running on the local Jetty instance accepting AJP on port 8009:

ProxyPass /test ajp://localhost:8009/context

In order to accept AJP, the jetty instance must be started with an AJP connector configured. This can normally be done with the command line like:

java -jar start.jar OPTIONS=Server,ajp etc/jetty.xml etc/jetty-ajp.xml

The jetty-ajp.xml file simply adds an AJP connector with the following:

<Call name="addConnector">
  <Arg>
     <New class="org.eclipse.jetty.ajp.Ajp13SocketConnector">
       <Set name="port">8009</Set>
     </New>
  </Arg>
</Call>

We recommend NOT using the AJP protocol, as superior performance and clearer semantics will be achieved using HTTP.

Configuring mod_proxy_balancer

The full documentation for configuring mod_proxy_balancer is available for apache 2.1 and 2.2.

The balancer allows a received request to be proxied to one of several Jetty instances using either HTTP or AJP as the protocol. The following example shows how all requests to /test can be proxied to a two node cluster:

ProxyPass /test balancer://mycluster/context
<Proxy balancer://mycluster>
    BalancerMember http://myhost1.org:8080
    BalancerMember http://myhost2.org:8080
</Proxy>

If your web application uses sessions, then it is highly desirable to ensure that all requests for the same session are sent to the same node in the cluster. This can be achieved by appending a worker name to the session ID used by Jetty. In Jetty, the session IDs are managed by a session ID manager that can be shared between multiple contexts or set for the entire server. The following example shows how the Jetty contexts/test.xml file may be used to set an ID manager and a worker name of "node1" on the test context:

<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <Set name="contextPath">/test</Set>
  <Set name="war"><SystemProperty name="jetty.home" default="."/>/webapps/test.war</Set>
  <Get name="sessionHandler">
     <Get name="sessionManager">
       <Set name="idManager">
         <New class="org.eclipse.jetty.server.session.HashSessionIdManager">
           <Set name="workerName">node1</Set>
         </New>
       </Set>
     </Get>
  </Get>
</Configure>

Once your jetty instances have been configured with worker names, then the following configuration will set up mod_proxy_balancer to look for those worker names in the JSESSIONID cookie and jsessionid URL parameter:

ProxyPass /test balancer://mycluster/context stickysession=JSESSIONID|jsessionid nofailover=On
<Proxy balancer://mycluster>
    BalancerMember http://myhost1.org:8080 route=node1
    BalancerMember http://myhost2.org:8080 route=node2
</Proxy>

If your cluster supports distributed sessions (via a database, Wadi, Terracotta, GigaSpaces, etc.), then you can set nofailover=Off, so that if a node fails the balancer will reroute the request to another node in the cluster. Jetty will automatically rewrite the worker ID of a cookie for a rerouted request. With nofailover=On, a 503 Service Unavailable response will be sent if a worker node fails.
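
For a cluster with distributed sessions, the balancer configuration might be sketched as follows. It reuses the host names and routes from the example above and assumes the Jetty nodes have already been set up to share session state:

```apache
# Assumption: sessions are replicated or stored centrally, so any node can serve any session
ProxyPass /test balancer://mycluster/context stickysession=JSESSIONID|jsessionid nofailover=Off
<Proxy balancer://mycluster>
    BalancerMember http://myhost1.org:8080 route=node1
    BalancerMember http://myhost2.org:8080 route=node2
</Proxy>
```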

Proxy Rewriting

When a request has been proxied to another server, the response can often be generated with incorrect links, cookie domains and redirection headers. However, a well-written web application will use relative links and/or the Host header to generate absolute addresses. So if the ProxyPreserveHost directive is on, then often no rewriting is required.

However, not all web applications are well written with regard to the Host header, and some hard-code domain names. If this is the case with your webapp, then you may need to rewrite some headers and links. The following example shows how the ProxyPassReverse directives can be used to rewrite headers and cookies:

ProxyPass /mirror/foo/ http://backend.example.com/
ProxyPassReverse /mirror/foo/ http://backend.example.com/
ProxyPassReverseCookieDomain backend.example.com public.example.com
ProxyPassReverseCookiePath / /mirror/foo/

If there are links within the body of the response that need to be rewritten, then mod_proxy_html, a third-party module that is not part of Apache httpd, may be used.
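
A rough sketch of how this could look for the /mirror/foo/ mapping above, assuming mod_proxy_html 3.x is installed and loaded alongside mod_headers. Check the directive names against the documentation for the module version you install.

```apache
# Assumes the third-party mod_proxy_html 3.x module and mod_headers are loaded
<Location /mirror/foo/>
  # Run HTML responses through the link-rewriting filter
  SetOutputFilter proxy-html
  # Rewrite absolute backend links in the page body to the public path
  ProxyHTMLURLMap http://backend.example.com/ /mirror/foo/
  # Ask the backend for uncompressed bodies, which the filter can parse
  RequestHeader unset Accept-Encoding
</Location>
```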