# Air Travel Model

## Introduction

THIS DOC IS IN PREPARATION

STEM contains a model of global air travel that covers 100% of commercial airports in the U.S. and about 80% of commercial airports worldwide. The model was calibrated using data on individual tickets within the United States for all of 2007 from the U.S. Department of Transportation, Research and Innovative Technology Administration, Bureau of Transportation Statistics (RITA-BTS). Tickets give the origin and destination of full trips, rather than individual flights. The RITA-BTS ticket data (DB1BTicket from the Airline Origin and Destination Survey) are a 10% sample of U.S. tickets from reporting carriers in that year. A complete description of the model is given in the following paper: [The Cost of Simplifying Air Travel When Modeling Disease Spread](http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004403).

The paper describes how we used U.S. ticket data from 2007 to compare a simplified **“pipe”** model, in which individuals flow in and out of the air transport system based on the number of arrivals and departures from a given airport, to a fully saturated model where all routes are modeled individually. We also compared the pipe model to a “gravity” model where the probability of travel is scaled by physical distance; the gravity model did not differ significantly from the pipe model.

The pipe model roughly approximates actual air travel, but tends to overestimate the number of trips between small airports and underestimate travel between major east and west coast airports. For most routes, the maximum number of false (or missed) introductions of disease is small (< 1 per day) but for a few routes this rate is greatly underestimated by the pipe model.

## Methodology

We obtained data on individual tickets within the United States for all of 2007 from the U.S. Department of Transportation, Research and Innovative Technology Administration, Bureau of Transportation Statistics (RITA-BTS). Tickets give the origin and destination of full trips, rather than individual flights. The RITA-BTS ticket data (DB1BTicket from the Airline Origin and Destination Survey) are a 10% sample of U.S. tickets from reporting carriers. Using these data we calculated the probability that a trip originating at any airport A terminates at any other airport B as

p_{A,B} = T_{A,B} / T_{A}

where T_{A,B} is the number of trips from A to B, and T_{A} is the total number of trips originating at A. This defines the saturated, point-to-point model.
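As an illustration of the definition above, here is a minimal Python sketch of the saturated, point-to-point probabilities. The airport codes and trip counts are hypothetical, and the function name `saturated_probabilities` is an assumption for this example. The original analysis was done in R.

```python
from collections import defaultdict


def saturated_probabilities(trips):
    """Return p[(A, B)] = T[(A, B)] / T[A] for every observed route."""
    # T[A]: total trips originating at each airport
    origin_totals = defaultdict(float)
    for (origin, _dest), count in trips.items():
        origin_totals[origin] += count
    return {
        (origin, dest): count / origin_totals[origin]
        for (origin, dest), count in trips.items()
        if origin_totals[origin] > 0
    }


# Hypothetical trip counts keyed by (origin, destination); not RITA-BTS data.
example_trips = {
    ("AAA", "BBB"): 30,
    ("AAA", "CCC"): 10,
    ("BBB", "AAA"): 25,
    ("BBB", "CCC"): 25,
    ("CCC", "AAA"): 5,
}
p = saturated_probabilities(example_trips)
print(p[("AAA", "BBB")])  # 30 / (30 + 10)
```

By construction, the probabilities for each origin sum to 1 over its observed destinations.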

To account for the possibility of flights on unseen routes, we assigned 0.1 trips per year to every possible route not seen in the RITA-BTS data. These added trips account for 0.01% of the trips considered in this analysis.
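The adjustment for unseen routes could be sketched as follows, assuming trips are stored in a dictionary keyed by ordered (origin, destination) pairs. The function name `add_unseen_routes`, the airport codes, and the counts are hypothetical.

```python
from itertools import permutations


def add_unseen_routes(trips, airports, pseudo_trips=0.1):
    """Copy `trips`, giving every unobserved ordered airport pair `pseudo_trips` trips."""
    filled = dict(trips)
    for route in permutations(airports, 2):  # all ordered pairs with origin != destination
        filled.setdefault(route, pseudo_trips)
    return filled


# Hypothetical yearly trip counts; not RITA-BTS data.
observed = {("AAA", "BBB"): 3000, ("BBB", "AAA"): 2900, ("AAA", "CCC"): 400}
filled = add_unseen_routes(observed, ["AAA", "BBB", "CCC"])

# Share of all trips contributed by the added pseudo-trips
added = sum(n for route, n in filled.items() if route not in observed)
share = added / sum(filled.values())
print(f"{share:.4%}")
```

Observed routes keep their counts, so the adjustment only guarantees a small nonzero probability on every route.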

The simplified model we used is a “pipe” model, in which individuals flow in and out of the air transport system based on the number of arrivals and departures from a given airport (i.e., there is no explicit modeling of individual routes). Under this model the probability of a trip from origin A terminating at B, p*_{A,B}, is the proportion of all trips, from any origin, that end at B:
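A minimal Python sketch of the pipe model as described in the sentence above. It assumes the probability is the number of trips ending at B divided by the total number of trips. The text does not state whether trips ending at the origin itself are removed from the denominator, so the `exclude_origin` flag is a hypothetical option, as are the function name and the example counts.

```python
from collections import defaultdict


def pipe_probabilities(trips, exclude_origin=False):
    """Return p_star[(A, B)]: share of all trips that end at B, for every A != B."""
    arrivals = defaultdict(float)
    for (_origin, dest), count in trips.items():
        arrivals[dest] += count
    arrivals = dict(arrivals)
    total = sum(arrivals.values())
    airports = {a for route in trips for a in route}

    p_star = {}
    for origin in airports:
        # Hypothetical option: drop trips ending at the origin from the denominator
        denom = total - arrivals.get(origin, 0.0) if exclude_origin else total
        for dest, n in arrivals.items():
            if dest != origin:
                p_star[(origin, dest)] = n / denom
    return p_star


# Hypothetical trip counts keyed by (origin, destination); not RITA-BTS data.
example_trips = {
    ("AAA", "BBB"): 30,
    ("AAA", "CCC"): 10,
    ("BBB", "AAA"): 25,
    ("BBB", "CCC"): 25,
    ("CCC", "AAA"): 5,
}
p_star = pipe_probabilities(example_trips)
p_star_renorm = pipe_probabilities(example_trips, exclude_origin=True)
```

With `exclude_origin=False` the probability of ending at B is the same for every origin, which is the sense in which no individual routes are modeled. With `exclude_origin=True` the probabilities for each origin sum to 1 over the other airports.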

To determine whether differences between p_{A,B} and p*_{A,B} could best be explained by the distance between the two locations, we considered a third “gravity” model of transport. Gravity models have proven useful in general (i.e., non-mode specific) models of transportation [5], and assume that the probability of an individual going from point A to point B is inversely proportional to some power of the distance between those locations. Under this model the probability that a trip from origin A terminates at B is:
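One way such a gravity model could look is sketched below in Python. The functional form is an assumption: each destination B is weighted by its arrivals divided by the distance from A raised to the power β, then normalized over all destinations other than A. This is consistent with the description (inverse proportionality to a power of distance), but it is not taken from the paper. The airport codes, arrival counts, and distances are hypothetical.

```python
def gravity_probabilities(arrivals, distance, beta):
    """Return p_g[(A, B)] proportional to arrivals[B] / distance[A, B] ** beta."""
    p_g = {}
    for origin in arrivals:
        # Assumed weight for each candidate destination of this origin
        weights = {
            dest: n / distance[frozenset((origin, dest))] ** beta
            for dest, n in arrivals.items()
            if dest != origin
        }
        norm = sum(weights.values())
        for dest, w in weights.items():
            p_g[(origin, dest)] = w / norm
    return p_g


# Hypothetical arrival counts and distances (km); not RITA-BTS data.
arrivals = {"AAA": 30, "BBB": 30, "CCC": 35}
distance = {
    frozenset(("AAA", "BBB")): 500.0,
    frozenset(("AAA", "CCC")): 2000.0,
    frozenset(("BBB", "CCC")): 1500.0,
}

p_beta0 = gravity_probabilities(arrivals, distance, beta=0.0)  # distance has no effect
p_beta1 = gravity_probabilities(arrivals, distance, beta=1.0)  # nearer airports favored
```

Under this assumed form, β = 0 removes the distance term entirely, so the probabilities depend only on arrivals at the destination.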

We determined the appropriate β for this model by finding the value that maximized the likelihood of the data using a Newton-type algorithm (as implemented in the nlm function in the R statistical language) [8]. Note that for β = 0 this model reduces to the pipe model.
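The likelihood maximization could be sketched in Python as follows. This is not the original procedure: a golden-section search stands in for R's Newton-type `nlm`, and the gravity form (arrivals at the destination divided by distance to the power β, normalized over other airports) is an assumption. The data are synthetic, generated from the assumed model with a known β so that the estimate can be checked.

```python
import math


def gravity_probabilities(arrivals, distance, beta):
    """Assumed gravity form: p_g[(A, B)] proportional to arrivals[B] / d(A, B) ** beta."""
    p_g = {}
    for origin in arrivals:
        weights = {
            dest: n / distance[frozenset((origin, dest))] ** beta
            for dest, n in arrivals.items()
            if dest != origin
        }
        norm = sum(weights.values())
        for dest, w in weights.items():
            p_g[(origin, dest)] = w / norm
    return p_g


def log_likelihood(beta, trips, arrivals, distance):
    """Multinomial log-likelihood of observed trip counts, up to a constant."""
    p_g = gravity_probabilities(arrivals, distance, beta)
    return sum(count * math.log(p_g[route]) for route, count in trips.items())


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical destination weights and distances (km); not RITA-BTS data.
arrivals = {"AAA": 30, "BBB": 30, "CCC": 35}
distance = {
    frozenset(("AAA", "BBB")): 500.0,
    frozenset(("AAA", "CCC")): 2000.0,
    frozenset(("BBB", "CCC")): 1500.0,
}

# Synthetic expected trip counts from the assumed model with a known beta
true_beta = 1.5
trips = {
    route: 1000.0 * prob
    for route, prob in gravity_probabilities(arrivals, distance, true_beta).items()
}

beta_hat = golden_section_max(
    lambda b: log_likelihood(b, trips, arrivals, distance), 0.0, 5.0
)
print(round(beta_hat, 3))
```

Under the assumed form the log-likelihood is concave in β, so a bracketing search on a single interval is sufficient. The search bounds of 0 and 5 are arbitrary choices for this example.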

In infectious disease modeling we are interested in the rate of introductions from A to B, λ_{A,B}, and the overall rate of introductions into a given area, θ_{B}. Differences in these can be characterized in terms of their ratio, or their absolute difference. The latter is of more interest for the infectious disease modeler, because it can be used to quantify the expected rate of false introductions (or missed introductions) over the course of the epidemic. Table 1 shows these relations. We do not calculate θ_{B} over the course of the epidemic as this quantity does not have a closed form solution.
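The two comparisons can be illustrated with a short Python sketch. It rests on assumptions that are not in the text: λ_{A,B} is taken to be the number of infectious travelers departing A per day times the trip probability from A to B, and θ_{B} is taken to be the sum of λ_{A,B} over all origins A. All names and numbers are hypothetical, and the actual relations are those given in Table 1.

```python
def introduction_rates(departing_infectious, probs):
    """Assumed form: lam[(A, B)] = infectious departures from A per day * p[(A, B)]."""
    return {
        (a, b): departing_infectious.get(a, 0.0) * p for (a, b), p in probs.items()
    }


def compare_rates(lam_simple, lam_full):
    """Ratio and absolute difference of simplified vs. full-model rates per route."""
    out = {}
    for route, full in lam_full.items():
        simple = lam_simple.get(route, 0.0)
        out[route] = {
            # Infinite when the full model gives a zero rate
            "ratio": simple / full if full > 0 else float("inf"),
            # Positive: false introductions per day; negative: missed introductions
            "difference": simple - full,
        }
    return out


def total_into(lam, dest):
    """Assumed form: theta[B] = sum of lam[(A, B)] over all origins A."""
    return sum(rate for (_a, b), rate in lam.items() if b == dest)


# Hypothetical inputs: infectious travelers departing per day and trip probabilities.
departing = {"AAA": 2.0, "BBB": 1.0}
p_full = {("AAA", "CCC"): 0.25, ("BBB", "CCC"): 0.5}
p_pipe = {("AAA", "CCC"): 0.5, ("BBB", "CCC"): 0.5}

lam_full = introduction_rates(departing, p_full)
lam_pipe = introduction_rates(departing, p_pipe)
comparison = compare_rates(lam_pipe, lam_full)
theta_full = total_into(lam_full, "CCC")
theta_pipe = total_into(lam_pipe, "CCC")
```

In this example the pipe model doubles the rate on the route from AAA to CCC, which corresponds to 0.5 false introductions per day under the stated assumptions.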

Table 1 goes here.

All analysis was done using the R statistical package [7].