Development of a Risk Analysis Model for Producing High-Speed Rail Ridership and Revenue Forecasts
presented to The 5th Transportation Research Board Conference on Innovations in Travel Modeling
presented by Cambridge Systematics, Inc.
Rachel Copperman
Co-authors and contributors: Jeff Buxbaum, Weimin Huang, Kimon Proussaloglou, David Kurth, Moby Khan, George Mazur, Jason Lemp, Roberto Alvarado
April 29, 2014

There Is Inherent Uncertainty at All Stages of the Forecasting Process
[Diagram: Inputs, Model, Forecasts]

We Sought to Capture Uncertainty Related to California High-Speed Rail (HSR) Forecasts
[Diagram: Range of Inputs, Version 2 R&R Model, Range of Forecasts]
Risk analysis models provide a quick, systematic methodology for producing a range of forecasts

We Followed Five Steps to Develop and Run the Risk Analysis Model
1. Identify
2. Develop Range of Risk Factors and Distributions
3. Run R&R Model
4. Estimate Regression Models
5. Monte Carlo Simulation
Repeat Steps 2-5 for Each Analysis Year and Each Phase of HSR System

We Compiled a Comprehensive List of Factors That May Affect HSR Ridership
State growth and fiscal changes
» Overall growth
» Income level
» Household size
» Spatial distribution
» Job types
» Changes in large attractions
Transportation system
» Fuel cost
» Highway capacity
» Security/screening
» Fares
» Frequency of service
» Autonomous vehicles
Model-related risks
» Amount of total travel
» Travel by trip purpose
» Induced travel
» HSR share of travel

We Narrowed the List to Six Factors
State growth and fiscal changes
» Overall growth
» Regional spatial distribution
Transportation system
» Fuel cost
» Air fares
Model-related risks
» Amount of total travel
» HSR share of travel

State Growth
Risk factor: Overall Population and Employment Growth
» Quantitative value: Ratio of future year households to observed year 2010 households
» Rationale: Analyzed historical county-level socioeconomic estimates and forecasts
Risk factor: Regional Spatial Distribution
» Quantitative value: Ratio of San Joaquin Valley population to rest of California
» Rationale: Overall socioeconomic growth is dependent on the fortunes of the San Joaquin Valley
Distribution (both risk factors): Triangular and correlated

Transportation System
Risk factor: Auto Operating Cost
» Quantitative value: $/mile (2005$)
» Distribution: Triangular
» Rationale: Developed from U.S. EIA projections for gasoline prices and fuel efficiency forecasts
Risk factor: Airline Fares
» Quantitative value: Air fare skim factor
» Distribution: Triangular
» Rationale: Based on airline competitive response analysis

Model Related
Risk factor: High-Speed Rail Main Mode Choice Model Constants
» Quantitative value: Change in HSR constant units from Base
» Distribution: Normal
» Rationale: Lowest point on the distribution corresponds to the conventional rail constant
Risk factor: Trip Frequency Model Constants
» Quantitative value: Annual average roundtrips per capita
» Distribution: Normal
» Rationale: Based on analysis of long-distance trip rates from various surveys

R&R Model Was Run to Obtain Data Points for the Regression Equations
Low and high values were selected for each risk factor
Two-level fractional factorial design was pursued
» Two-level full factorial = 2 levels ^ 6 risk factors = 64 runs
» Fractional factorial = ½ of full factorial = 32 runs
Added 15 runs to capture data points closer to the median of each distribution
Total of 47 model runs for each forecast year

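The run counts on this slide can be reproduced with a short sketch. It builds a half-fraction of a two-level, six-factor design by enumerating every low/high combination of five factors and setting the sixth to the product of the others. That generator is an assumption made for illustration, since the slides do not say which half of the full factorial was run, and the factor labels are hypothetical names for the six risk factors.

```python
from itertools import product

# Hypothetical labels for the six risk factors named in the slides.
FACTORS = [
    "overall_growth",
    "regional_distribution",
    "auto_operating_cost",
    "airline_fares",
    "hsr_constant",
    "trip_frequency",
]


def half_fraction(n_factors):
    """Build a 2^(n-1) two-level design with levels coded -1 (low), +1 (high).

    The first n-1 factors take every low/high combination; the last factor is
    set to the product of the others (an assumed generator, not the authors').
    """
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


full_factorial_runs = 2 ** len(FACTORS)
design = half_fraction(len(FACTORS))
extra_runs = 15  # additional runs near the medians, per the slide
total_runs = len(design) + extra_runs
print(full_factorial_runs, len(design), total_runs)  # 64 32 47
```

Each row of `design` is one model run: -1 means the low value of that risk factor and +1 the high value. Every factor appears at each level in exactly half of the 32 runs.
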
An Exponential Relationship Between Revenue and the Risk Factors Resulted in the Best Model Fit
Revenue and ridership were highly correlated
We analyzed both linear and nonlinear transformations of model variables
Revenue = exp(Intercept + a * Overall growth + b * Regional spatial distribution + c * Auto operating cost + d * Airline fares + e * HSR mode choice constant + f * Trip frequency constant)
Estimated revenue was within 5% of R&R model revenue

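Taking the natural log of both sides turns the exponential form into a linear model, ln(Revenue) = Intercept + a * x1 + ... + f * x6, which ordinary least squares can estimate. The sketch below fits that form to synthetic data standing in for the 47 model runs. The coefficients, the data, and the choice of ordinary least squares are all assumptions for illustration; the slides do not state how the regression was estimated.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_log_linear(rows, revenue):
    """Least-squares fit of ln(revenue) on [1, factors].

    Returns [intercept, a, b, c, d, e, f].
    """
    design = [[1.0] + list(r) for r in rows]
    y = [math.log(v) for v in revenue]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Revenue = exp(intercept + sum of coefficient * risk factor value)."""
    return math.exp(coefs[0] + sum(c * x for c, x in zip(coefs[1:], row)))


# Synthetic stand-in for 47 model runs; every number here is made up.
random.seed(0)
TRUE_COEFS = [2.0, 0.40, 0.10, 0.20, 0.15, 0.30, 0.25]
rows = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(47)]
revenue = [predict(TRUE_COEFS, r) * math.exp(random.gauss(0, 0.01)) for r in rows]

coefs = fit_log_linear(rows, revenue)
worst_error = max(abs(predict(coefs, r) / v - 1) for r, v in zip(rows, revenue))
print([round(c, 2) for c in coefs], round(worst_error, 3))
```

`worst_error` is the largest relative gap between fitted and observed revenue, the same kind of check as the "within 5%" comparison on the slide.
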
Monte Carlo Simulation Was Applied to the Risk Factor Distributions and Regression Equations
We conducted the Monte Carlo simulation using the Crystal Ball add-on to Excel
The simulation drew from the six risk factor distributions to construct 5,000 unique combinations of risk factor values
Revenue was forecast by inputting these risk factor values into the regression equations

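The simulation loop can be sketched in plain Python rather than Crystal Ball. Every distribution parameter and regression coefficient below is a placeholder chosen for illustration, not a value from the study. The sketch also draws the six factors independently, whereas the slides describe the two socioeconomic factors as correlated.

```python
import math
import random

N_DRAWS = 5000  # number of risk factor combinations, per the slide

# Placeholder regression coefficients (illustrative only).
INTERCEPT = 1.0
COEFS = {
    "overall_growth": 0.8,
    "regional_distribution": 0.3,
    "auto_operating_cost": 1.5,
    "airline_fares": 0.4,
    "hsr_constant": 0.25,
    "trip_frequency": 0.1,
}


def draw_risk_factors(rng):
    """Draw one combination of risk factor values.

    Triangular draws use random.triangular(low, high, mode); normal draws use
    random.gauss(mean, standard deviation). All parameters are made up.
    """
    return {
        "overall_growth": rng.triangular(1.2, 1.8, 1.5),
        "regional_distribution": rng.triangular(0.10, 0.16, 0.12),
        "auto_operating_cost": rng.triangular(0.15, 0.35, 0.22),
        "airline_fares": rng.triangular(0.8, 1.2, 1.0),
        "hsr_constant": rng.gauss(0.0, 0.5),
        "trip_frequency": rng.gauss(7.0, 0.9),
    }


def revenue(factors):
    """Apply the exponential regression form to one set of factor values."""
    linear = INTERCEPT + sum(COEFS[name] * value for name, value in factors.items())
    return math.exp(linear)


rng = random.Random(2014)
draws = [draw_risk_factors(rng) for _ in range(N_DRAWS)]
forecasts = [revenue(factors) for factors in draws]
print(len(forecasts), min(forecasts) > 0)
```

Because each forecast is one evaluation of a closed-form equation, 5,000 draws take a fraction of a second.
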
The Result of the Process Was 5,000 Forecasts of Ridership and Revenue for Each Analysis Scenario
[Figure: cumulative probability (0 to 100) versus revenue normalized to the 50th percentile (0.0 to 6.0)]
» 95th percentile = 1.95x
» 50th percentile = 1x
» 5th percentile = 0.5x

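Percentiles expressed as multiples of the median, as in the chart annotations, can be computed from any list of simulated forecasts. The sketch below uses a synthetic lognormal sample as a stand-in, so its numbers will not match the 0.5x and 1.95x values reported on the slide.

```python
import random
import statistics


def normalized_percentiles(forecasts, levels=(5, 50, 95)):
    """Return selected percentiles divided by the 50th percentile."""
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(forecasts, n=100)
    median = cuts[49]
    return {level: cuts[level - 1] / median for level in levels}


# Synthetic stand-in for the 5,000 simulated revenue forecasts.
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 0.4) for _ in range(5000)]
summary = normalized_percentiles(sample)
print({level: round(value, 2) for level, value in summary.items()})
```
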
Selected the 15th and 85th Percentiles as Low and High Forecasts
Reran the R&R model for five scenarios surrounding the 15th percentile and five scenarios surrounding the 85th percentile
Averaged the output of the five runs
The methodology results in Low and High trip totals, boarding counts, segment volumes, etc. that represent the range of input variables

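One way to pick "five scenarios surrounding" a percentile is to rank the simulated draws by revenue and take the draw at the target rank plus its two neighbors on each side. That reading of "surrounding" is an assumption, as are the function names below. The second helper averages numeric outputs across runs, standing in for averaging the five R&R model outputs.

```python
def scenarios_around_percentile(draws, percentile, count=5):
    """Return the `count` draws whose revenue ranks bracket the given percentile.

    `draws` is a list of (revenue, risk_factor_values) tuples.
    """
    ranked = sorted(draws, key=lambda draw: draw[0])
    center = round(percentile / 100 * (len(ranked) - 1))
    # Keep the window inside the list when the target rank is near an end.
    start = max(0, min(center - count // 2, len(ranked) - count))
    return ranked[start:start + count]


def average_outputs(run_outputs):
    """Average each numeric output (for example boardings) across model runs."""
    keys = run_outputs[0].keys()
    return {key: sum(out[key] for out in run_outputs) / len(run_outputs) for key in keys}


# Toy data: 101 draws whose revenue equals their rank.
toy_draws = [(float(i), {"draw_id": i}) for i in range(101)]
low_set = scenarios_around_percentile(toy_draws, 15)
high_set = scenarios_around_percentile(toy_draws, 85)
print([rev for rev, _ in low_set], [rev for rev, _ in high_set])
```

The risk factor values attached to the selected draws would then be fed back into the full model, and `average_outputs` applied to the five resulting sets of outputs.
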
Conclusion
Risk analysis models measure the uncertainty that exists in the forecasting process
» Model estimation
» Assumptions that underlie the forecasts
Risk analysis models can provide a systematic methodology for producing Low and High scenario forecasts
Risk analysis models are a useful alternative to varying a number of factors directly within a sophisticated travel demand model, which can take hours or days to run

Questions?

The Range and Distribution for the Socioeconomic Risk Factors Were Developed Together
Risk factor quantitative values
» Overall growth: ratio of future year households to observed year 2010 households
» Statewide spatial distribution: ratio of San Joaquin Valley population to rest of California
Distribution rationale
» Based on an analysis of historical county-level socioeconomic estimates and forecasts from many sources
» Correlation between risk factors is based on the finding that any departure from average statewide socioeconomic growth will depend on the fortunes of the San Joaquin Valley

The Range and Distribution for the Socioeconomic Risk Factors Were Developed Together (continued)
Percentage of Statewide Growth in San Joaquin Valley Counties

                      | Distribution 3 (low Valley share) | Distribution 2 | Distribution 1 (high Valley share)
High Statewide Growth | ~1%                               | ~3%            | ~13%
Mid Statewide Growth  | ~3%                               | ~60%           | ~3%
Low Statewide Growth  | ~13%                              | ~3%            | ~1%

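The nine cell values in this table sum to roughly 100%. One possible reading, and it is only an assumption, is that each cell is the approximate probability of drawing that pairing of statewide growth level and Valley share distribution. Under that reading, correlated draws can be sketched by sampling a cell first and then sampling within it. The labels below are hypothetical.

```python
import random
from collections import Counter

# Cell values from the table, read here (as an assumption) as joint weights.
JOINT_WEIGHTS = {
    ("high_growth", "distribution_3"): 1,
    ("high_growth", "distribution_2"): 3,
    ("high_growth", "distribution_1"): 13,
    ("mid_growth", "distribution_3"): 3,
    ("mid_growth", "distribution_2"): 60,
    ("mid_growth", "distribution_1"): 3,
    ("low_growth", "distribution_3"): 13,
    ("low_growth", "distribution_2"): 3,
    ("low_growth", "distribution_1"): 1,
}


def draw_pairs(n, rng):
    """Draw n (growth level, Valley share distribution) pairs by weight."""
    cells = list(JOINT_WEIGHTS)
    weights = [JOINT_WEIGHTS[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n)


rng = random.Random(0)
pairs = draw_pairs(5000, rng)
counts = Counter(pairs)
print(sum(JOINT_WEIGHTS.values()), counts.most_common(1)[0][0])
```
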
Auto Operating Cost Distributions Are Based on U.S. Energy Information Administration (EIA) Projections
Risk factor quantitative value
» Auto operating cost in dollars/mile (2005 dollars)
Distribution rationale
» Low, mid, and high forecasts were based on analysis of EIA projections of gasoline prices and fuel efficiency
Low value set at the 15th percentile
Mid value set at highest probability
High value set at the 85th percentile

Air Fare Distributions Are Based on Competitive Response Scenarios Developed in Partnership with Aviation System Consulting
Risk factor quantitative value
» Air fare skim factor
Distribution rationale
» Low and high forecasts were based on potential airline competitive response to the introduction of HSR
Low value set at the 15th percentile
Mid value set at highest probability
High value set at the 85th percentile

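For the auto operating cost and air fare factors, the low and high values are percentiles rather than the endpoints of the triangular distribution, so the endpoints have to be solved for. The sketch below does this with a simple fixed-point iteration, assuming the mid value is the mode and that 15% of the probability lies below the low value and 15% above the high value. The method and the example inputs are illustrative; the slides do not describe how the distribution was parameterized in Crystal Ball.

```python
import random


def fit_triangular(low, mode, high, tail=0.15, iterations=200):
    """Find bounds (a, b) of a triangular distribution with the given mode so
    that P(X <= low) = tail and P(X >= high) = tail.

    With u = mode - a and v = b - mode, the two tail conditions are
    (u - d_low)^2 = tail * (u + v) * u and (v - d_high)^2 = tail * (u + v) * v.
    They are iterated here as a fixed point starting from the percentile values.
    """
    if not low < mode < high:
        raise ValueError("need low < mode < high")
    d_low, d_high = mode - low, high - mode
    u, v = d_low, d_high
    for _ in range(iterations):
        width = u + v
        u, v = (
            d_low + (tail * width * u) ** 0.5,
            d_high + (tail * width * v) ** 0.5,
        )
    return mode - u, mode + v


def triangular_cdf(x, a, mode, b):
    """Cumulative probability of a triangular distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= mode:
        return (x - a) ** 2 / ((b - a) * (mode - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - mode))


# Hypothetical auto operating cost values in $/mile (not from the study).
LOW, MID, HIGH = 0.15, 0.20, 0.30
a, b = fit_triangular(LOW, MID, HIGH)
check_low = triangular_cdf(LOW, a, MID, b)
check_high = triangular_cdf(HIGH, a, MID, b)
sample = random.Random(1).triangular(a, b, MID)  # arguments are (low, high, mode)
print(round(a, 4), round(b, 4), round(check_low, 3), round(check_high, 3))
```

The fitted endpoints fall outside the low and high values, which is what leaves 15% of the draws in each tail.
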
Uncertainty in the HSR Constants Comes from the Distributional Assumptions of the Model Itself and the Data Used to Estimate the Model
Risk factor quantitative value
» Change in HSR constant units from calibrated model
» The same change is applied to each trip purpose
Distribution rationale
» Uncertainty in the HSR constant comes from the following sources:
  Mode choice model itself and the methodology used to calculate the HSR constant
  Stated-preference survey: how data was collected, unknown bias in the survey instrument, respondents' perceptions based on public opinion
  Introduction of a new mode that cannot be calibrated to today's conditions
  Uncertainty exists in the HSR system itself
» Conventional rail (CVR) constant should represent the minimum value for the distribution

Uncertainty in the HSR Constants Comes from the Distributional Assumptions of the Model Itself and the Data Used to Estimate the Model (continued)
Normal distribution
» Mid value set at calibrated HSR constant
» Distribution assumed to be symmetrical and clustered around the mean
» Recreation/other CVR constant set at the 0.5th percentile

Uncertainty in the Trip Frequency Constants Comes Primarily from the Data Used to Estimate the Model and How That Reflects Forecast Year Behavior
Risk factor quantitative value
» Annual average roundtrips per capita
» The same change is applied to each trip purpose
Distribution rationale
» Estimation and calibration data came from the long-distance travel portion of the 2012-2013 California Household Travel Survey (CHTS), expanded to match the 2010 California population
» Estimated average annual trips per capita was close to the midpoint of national data collected in the 1995 American Travel Survey and the 2001 National Household Travel Survey
» An additional long-distance survey of California (the Harris survey) predicted 2.2 fewer annual trips per person than the CHTS, an estimate we are confident is low

Uncertainty in the Trip Frequency Constants Comes Primarily from the Data Used to Estimate the Model and How That Reflects Forecast Year Behavior (continued)
Normal distribution
» Mid value set at calibrated average annual roundtrips per capita based on the CHTS
» Distribution assumed to be symmetrical and clustered around the mean
» Harris survey average annual roundtrips per capita set at the 0.5th percentile

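Both normal distributions, for the HSR constant and for trip frequency, are pinned down by two points: the calibrated value as the mean and a lower anchor at the 0.5th percentile. The standard deviation follows from the standard normal quantile at 0.005, which is about -2.576. In the sketch below the mean of 7.0 roundtrips is a made-up placeholder. The 2.2 gap is taken from the Harris survey comparison, treating "trips" and "roundtrips" as the same unit, which is an assumption.

```python
from statistics import NormalDist


def normal_from_anchor(mean, anchor_value, anchor_percentile):
    """Return the normal distribution with the given mean whose
    `anchor_percentile` (in percent) equals `anchor_value`."""
    z = NormalDist().inv_cdf(anchor_percentile / 100.0)
    sigma = (anchor_value - mean) / z
    if sigma <= 0:
        raise ValueError("anchor must lie on the side of the mean implied by its percentile")
    return NormalDist(mean, sigma)


CALIBRATED_MEAN = 7.0  # hypothetical roundtrips per capita, not from the slides
HARRIS_GAP = 2.2       # Harris survey result relative to the CHTS, per the slides
dist = normal_from_anchor(CALIBRATED_MEAN, CALIBRATED_MEAN - HARRIS_GAP, 0.5)
print(round(dist.stdev, 3), round(dist.cdf(CALIBRATED_MEAN - HARRIS_GAP), 4))
```

Only the gap between the mean and the anchor affects the standard deviation, so the placeholder mean does not change the result of about 0.854.
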