Ram M. Pendyala and Karthik C. Konduri, School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe

Using Census Data for Transportation Applications Conference, Irvine, Oct 25-27, 2011

Outline
- Motivation for population synthesis
- What is population synthesis?
- Standard IPF procedure
- Motivation for enhanced population synthesis
- New Iterative Proportional Updating (IPU) algorithm
  - Explanation of procedure
  - Geometric interpretation
- Census databases for population synthesis
- Case studies
- Population evolution model
  - Model components
- Conclusions

Microsimulation Models of Travel
- There is increasing interest in microsimulation models for travel demand forecasting
- Microsimulation models simulate travel at the level of the individual decision-maker while recognizing interdependencies among activities, trips, persons, time, and space
- Microsimulation models of travel are increasingly based on the activity-based paradigm of travel behavior
  - Explicit recognition of the derived nature of travel demand
  - Enhanced representation of time-space interactions and constraints

Microsimulation Models of Travel (continued)
- Activity-based microsimulation modeling approaches offer the ability to address emerging policy questions of interest
- By simulating activities and travel at the level of the individual traveler, these models are able to address the impacts of:
  - Greenhouse gas emissions reduction targets
  - Flexible working arrangements
  - Information and communication technology (ICT)
  - Interactions between micro-scale land use changes and travel
  - Pricing-based policies
  - Non-motorized transportation mode enhancements

Why Population Synthesis?
- We need disaggregate household and person sociodemographic data for the entire population of the model region
- Such data for the entire population are generally not available
- This leads to the need to synthesize a regional population from known statistical distributions of the population
- We have:
  - Disaggregate data for a sample of the population (PUMS, travel surveys)
  - Marginal distributions for the entire region (census summary files, agency forecasts)

What is Population Synthesis?
Population synthesis involves generating a synthetic population by expanding the disaggregate sample data to mirror known aggregate distributions of household and person variables of interest.

Standard IPF-Based Procedure
- Standard IPF (iterative proportional fitting)-based procedure based on Beckman et al. (1996)
- Procedure:
  - Choose household-level control variables
  - Obtain the marginal distributions of these variables from census summary files (SF)
  - Generate a seed matrix of the joint distribution from a microdata sample (PUMS, travel survey)
  - Expand the seed matrix using an IPF procedure to match the given marginal control totals while maintaining the joint distribution implied by the seed matrix
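The expansion step above can be sketched for a two-variable seed matrix. This is a minimal illustration, assuming a dense 2-D table whose row and column targets share the same grand total; the function name `ipf_2d`, the toy seed, and the tolerance are hypothetical and are not taken from Beckman et al. (1996) or PopGen.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Scale a 2-D seed matrix until its row and column sums match the targets.

    Each step multiplies whole rows or whole columns, so zero cells stay zero
    and the cross-product (odds) ratios of the seed are preserved.
    Row and column targets are assumed to share the same grand total.
    """
    table = [[float(v) for v in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Row step: scale each row to its marginal target.
        for r in range(n_rows):
            s = sum(table[r])
            if s > 0:
                factor = row_targets[r] / s
                table[r] = [v * factor for v in table[r]]
        # Column step: scale each column to its marginal target.
        for c in range(n_cols):
            s = sum(table[r][c] for r in range(n_rows))
            if s > 0:
                factor = col_targets[c] / s
                for r in range(n_rows):
                    table[r][c] *= factor
        # Columns now match exactly; stop once the rows match as well.
        if max(abs(sum(table[r]) - row_targets[r]) for r in range(n_rows)) < tol:
            break
    return table


# Hypothetical seed: household size (rows) x income (columns) from a sample.
expanded = ipf_2d([[3, 1], [2, 4]], row_targets=[40, 60], col_targets=[55, 45])
```

The odds ratio of the expanded table stays at 3 x 4 / (1 x 2) = 6, which is the sense in which IPF "maintains the joint distribution implied by the seed matrix".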

Standard IPF-Based Procedure (continued)
- Selection probabilities are estimated for households in the microdata sample
- Households are drawn using the selection probabilities to match the expanded cell frequencies
- The resulting synthetic population is checked for goodness-of-fit, and households are redrawn if necessary
- The synthetic population comprises all individuals within the synthesized (drawn) households

Motivation for Enhancement
- Key limitations of the standard IPF-based procedure:
  - It controls only for household attributes and not person attributes
  - Synthetic populations fail to match distributions of person characteristics of interest
  - The method ignores differences in household composition among households within a cell
- Hence the need to reassign weights to sample households based on household composition

Recent Literature
- The issue has been recognized by researchers, and a number of solutions have been proposed:
  - Guo and Bhat (2007)
  - Arentze and Timmermans (2007)
  - Pritchard and Miller (2009)
  - Srinivasan et al. (2009)
  - Ye et al. (2009)
  - Lee and Yingfei (2011)
  - Muller and Axhausen (2011)

PopGen: A New Population Synthesizer
- Incorporates a new Iterative Proportional Updating (IPU) algorithm for estimating household weights
- The algorithm estimates sample household weights such that BOTH household and person distributions are matched
- Simple, practical, and computationally tractable algorithm with an intuitive interpretation
- Basic idea behind the IPU algorithm in PopGen: reallocate weights among sample households of a type to account for differences in household composition

PopGen Methodology
Step 1: Estimate Household and Person Type Constraints
- Inputs: household and person sample data; household- and person-level marginal distributions
- Adjust priors to account for the zero-cell problem
- Adjust marginals to account for the zero-marginal problem
- Run the Iterative Proportional Fitting (IPF) procedure to estimate household and person type constraints

PopGen Methodology (continued)
Step 2: Estimate Household Weights
- Inputs: household and person sample data; household and person type constraints from Step 1
- Run the Iterative Proportional Updating (IPU) algorithm to estimate sample household weights that satisfy both household and person type constraints

PopGen Methodology (continued)
Step 3: Generate the Synthetic Population
- Inputs: household and person sample data; household weights from Step 2
- Apply rounding procedures to get the frequency of different household types in the synthetic population
- Estimate household selection probabilities using the computed weights
- Draw sample households based on the selection probabilities for each household type to match cell frequencies
- Repeat the process until a synthetic population with the best fit is obtained

PopGen Terminology
- Household type
  - Not to be confused with the household attribute "household type"
  - Refers to a combination of household-level variables of interest
  - Represents a cell in the joint distribution of a set of household-level variables
- Person type
  - Similar to the above; formed by a combination of multiple person-level variables of interest

PopGen Terminology (continued)
- A measure of fit (δ value)
  - Measures the absolute relative deviation between the IPU-adjusted cell frequency and the IPF-estimated household/person type constraint
  - The average δ value across all constraints is used as a goodness-of-fit measure
  - The average δ value is also used to monitor convergence and set the convergence criterion for the IPU algorithm

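PopGen Terminology (continued)
A measure of fit (δ value) for constraint j:

$$\delta_j = \frac{\left| \sum_i d_{i,j} w_i - c_j \right|}{c_j}$$

where d_{i,j} = frequency of household/person type j in household i; w_i = weight of household i; Σ_i d_{i,j} w_i = adjusted cell frequency; c_j = the j-th IPF-estimated constraint.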
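The δ formula can be checked directly against the frequency matrix used in the IPU illustration that follows. This is a sketch; the names `FREQ`, `CONSTRAINTS`, and `delta_values` are hypothetical, and the data are the eight sample households with two household types and three person types.

```python
# Frequency matrix d[i][j] from the IPU illustration: rows are households 1-8,
# columns are household types 1-2 followed by person types 1-3.
FREQ = [
    [1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 2, 1, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 0, 2, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 2],
    [0, 1, 1, 1, 0],
]
CONSTRAINTS = [35.0, 65.0, 91.0, 65.0, 104.0]


def delta_values(freq, weights, constraints):
    """Return delta_j = |sum_i d[i][j] * w[i] - c[j]| / c[j] for every constraint j."""
    deltas = []
    for j, c in enumerate(constraints):
        adjusted = sum(row[j] * w for row, w in zip(freq, weights))  # adjusted cell frequency
        deltas.append(abs(adjusted - c) / c)
    return deltas


initial = delta_values(FREQ, [1.0] * len(FREQ), CONSTRAINTS)
average_delta = sum(initial) / len(initial)
```

With all initial weights equal to 1, `initial` evaluates to roughly 0.9143, 0.9231, 0.9011, 0.8923, and 0.9327, the δ_0 row in the illustration.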

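Illustration of IPU Algorithm

| Household ID | Initial Weights | Household Type 1 | Household Type 2 | Person Type 1 | Person Type 2 | Person Type 3 |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 |
| 3 | 1 | 1 | 0 | 2 | 1 | 0 |
| 4 | 1 | 0 | 1 | 1 | 0 | 2 |
| 5 | 1 | 0 | 1 | 0 | 2 | 1 |
| 6 | 1 | 0 | 1 | 1 | 1 | 0 |
| 7 | 1 | 0 | 1 | 2 | 1 | 2 |
| 8 | 1 | 0 | 1 | 1 | 1 | 0 |
| Weighted Sum | | 3.00 | 5.00 | 9.00 | 7.00 | 7.00 |
| Constraints | | 35.00 | 65.00 | 91.00 | 65.00 | 104.00 |
| δ_0 | | 0.9143 | 0.9231 | 0.9011 | 0.8923 | 0.9327 |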

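Illustration of IPU Algorithm (continued): Adjustment with respect to household type constraints

| Household ID | Initial Weights | Household Type 1 | Household Type 2 | Person Type 1 | Person Type 2 | Person Type 3 | Weights 1 | Weights 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 11.67 | 11.67 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 | 11.67 | 11.67 |
| 3 | 1 | 1 | 0 | 2 | 1 | 0 | 11.67 | 11.67 |
| 4 | 1 | 0 | 1 | 1 | 0 | 2 | 1.00 | 13.00 |
| 5 | 1 | 0 | 1 | 0 | 2 | 1 | 1.00 | 13.00 |
| 6 | 1 | 0 | 1 | 1 | 1 | 0 | 1.00 | 13.00 |
| 7 | 1 | 0 | 1 | 2 | 1 | 2 | 1.00 | 13.00 |
| 8 | 1 | 0 | 1 | 1 | 1 | 0 | 1.00 | 13.00 |
| Weighted Sum | | 3.00 | 5.00 | 9.00 | 7.00 | 7.00 | | |
| Constraints | | 35.00 | 65.00 | 91.00 | 65.00 | 104.00 | | |
| δ_0 | | 0.9143 | 0.9231 | 0.9011 | 0.8923 | 0.9327 | | |
| Weighted Sum 1 | | 35.00 | 5.00 | 51.67 | 28.33 | 28.33 | | |
| Weighted Sum 2 | | 35.00 | 65.00 | 111.67 | 88.33 | 88.33 | | |

Adjustment factors: 35/3 = 11.67 (household type 1); 65/5 = 13.00 (household type 2).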

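Illustration of IPU Algorithm (continued): Adjustment with respect to person type constraints

| Household ID | Initial Weights | Household Type 1 | Household Type 2 | Person Type 1 | Person Type 2 | Person Type 3 | Weights 1 | Weights 2 | Weights 3 | Weights 4 | Weights 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 11.67 | 11.67 | 9.51 | 8.05 | 12.37 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 | 11.67 | 11.67 | 9.51 | 9.51 | 14.61 |
| 3 | 1 | 1 | 0 | 2 | 1 | 0 | 11.67 | 11.67 | 9.51 | 8.05 | 8.05 |
| 4 | 1 | 0 | 1 | 1 | 0 | 2 | 1.00 | 13.00 | 10.59 | 10.59 | 16.28 |
| 5 | 1 | 0 | 1 | 0 | 2 | 1 | 1.00 | 13.00 | 13.00 | 11.00 | 16.91 |
| 6 | 1 | 0 | 1 | 1 | 1 | 0 | 1.00 | 13.00 | 10.59 | 8.97 | 8.97 |
| 7 | 1 | 0 | 1 | 2 | 1 | 2 | 1.00 | 13.00 | 10.59 | 8.97 | 13.78 |
| 8 | 1 | 0 | 1 | 1 | 1 | 0 | 1.00 | 13.00 | 10.59 | 8.97 | 8.97 |
| Weighted Sum | | 3 | 5 | 9 | 7 | 7 | | | | | |
| Constraints | | 35 | 65 | 91 | 65 | 104 | | | | | |
| δ_0 | | 0.9143 | 0.9231 | 0.9011 | 0.8923 | 0.9327 | | | | | |
| Weighted Sum 1 | | 35.00 | 5.00 | 51.67 | 28.33 | 28.33 | | | | | |
| Weighted Sum 2 | | 35.00 | 65.00 | 111.67 | 88.33 | 88.33 | | | | | |
| Weighted Sum 3 | | 28.52 | 55.38 | 91.00 | 76.80 | 74.39 | | | | | |
| Weighted Sum 4 | | 25.60 | 48.50 | 80.11 | 65.00 | 67.68 | | | | | |
| Weighted Sum 5 | | 35.02 | 64.90 | 104.84 | 85.94 | 104.00 | | | | | |
| δ_1 | | 0.0006 | 0.0015 | 0.1521 | 0.3222 | 0 | | | | | |

Adjustment factors: 35/3 = 11.67; 65/5 = 13.00; 91/111.67 = 0.81; 65/76.80 = 0.85; 104/67.68 = 1.54.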

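Illustration of IPU Algorithm (continued): Final Results

| Household ID | Initial Weights | Household Type 1 | Household Type 2 | Person Type 1 | Person Type 2 | Person Type 3 | IPU Weights | Weights Without Reallocation |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1.36 | 11.67 |
| 2 | 1 | 1 | 0 | 1 | 0 | 1 | 25.66 | 11.67 |
| 3 | 1 | 1 | 0 | 2 | 1 | 0 | 7.98 | 11.67 |
| 4 | 1 | 0 | 1 | 1 | 0 | 2 | 27.79 | 13.00 |
| 5 | 1 | 0 | 1 | 0 | 2 | 1 | 18.45 | 13.00 |
| 6 | 1 | 0 | 1 | 1 | 1 | 0 | 8.64 | 13.00 |
| 7 | 1 | 0 | 1 | 2 | 1 | 2 | 1.47 | 13.00 |
| 8 | 1 | 0 | 1 | 1 | 1 | 0 | 8.64 | 13.00 |
| Constraints | | 35.00 | 65.00 | 91.00 | 65.00 | 104.00 | | |
| δ_0 | | 0.9143 | 0.9231 | 0.9011 | 0.8923 | 0.9327 | | |
| δ_IPU | | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | |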
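The weight reallocation walked through in the tables above can be sketched as a loop over constraints. This is a minimal sketch, assuming initial weights of 1, constraints visited in the order shown (household types, then person types), and a stopping rule on the average δ value; the function name `ipu`, the tolerance, and the iteration cap are illustrative assumptions rather than PopGen settings.

```python
FREQ = [
    [1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 2, 1, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 0, 2, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 2],
    [0, 1, 1, 1, 0],
]
CONSTRAINTS = [35.0, 65.0, 91.0, 65.0, 104.0]


def ipu(freq, constraints, tol=1e-6, max_iter=20000):
    """Iterative proportional updating (sketch).

    One iteration visits every constraint j in turn and multiplies the weights
    of all households that contain type j by c_j / (current weighted sum for j).
    Stops when the average delta value falls below tol.
    """
    n_hh = len(freq)
    weights = [1.0] * n_hh  # initial weights
    history = []  # average delta value after each full iteration
    for _ in range(max_iter):
        for j, c in enumerate(constraints):
            weighted_sum = sum(freq[i][j] * weights[i] for i in range(n_hh))
            if weighted_sum > 0:
                ratio = c / weighted_sum
                for i in range(n_hh):
                    if freq[i][j] > 0:
                        weights[i] *= ratio
        deltas = [
            abs(sum(freq[i][j] * weights[i] for i in range(n_hh)) - c) / c
            for j, c in enumerate(constraints)
        ]
        history.append(sum(deltas) / len(deltas))
        if history[-1] < tol:
            break
    return weights, history


first_pass, _ = ipu(FREQ, CONSTRAINTS, max_iter=1)
weights, history = ipu(FREQ, CONSTRAINTS)
```

`first_pass` should reproduce the Weights 5 column (12.37, 14.61, 8.05, 16.28, 16.91, 8.97, 13.78, 8.97). Because there are eight unknown weights and only five constraints, many weight vectors satisfy the constraints; which one the loop reaches depends on the starting weights and the order in which constraints are visited.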

Illustration of IPU Algorithm (continued): Improvement in Average δ Value
[Figure: average δ value (log scale, 1E+00 to 1E-06) versus number of iterations (0 to 700)]

IPU: Geometric Interpretation
Consider the following household structure and population constraints:

| Household ID | Household Type 1 | Person Type 1 | Weights |
|---|---|---|---|
| 1 | 1 | 0 | w_1 |
| 2 | 1 | 1 | w_2 |
| Constraints | 4 | 3 | |

Weights can be estimated by solving the following system of linear equations:

$$w_1 + w_2 = 4, \qquad w_2 = 3$$

IPU: Geometric Interpretation (continued): When the solution is within the feasible region
[Figure: IPU path in the (w_1, w_2) plane toward the solution of w_1 + w_2 = 4 and w_2 = 3. Legend: S = starting point; B = adjustment for household constraint; C = adjustment for person constraint; D = adjustment for household constraint; E = adjustment for person constraint; continue to convergence; I = solution]

IPU: Geometric Interpretation (continued): When the solution is outside the feasible region (person constraint w_2 = 5)
[Figure: IPU path in the (w_1, w_2) plane. Legend: S = starting point; B = adjustment for household constraint; C = adjustment for person constraint; D = adjustment for household constraint; E = adjustment for person constraint; continue to convergence; I = solution outside the feasible region; I_1 = corner solution where the household constraint is satisfied; I_2 = corner solution where the person constraint is satisfied]
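The two geometric cases can be traced numerically. This sketch applies the IPU adjustments to the two-household example starting from the point S = (1, 1); the function name `ipu_two_households` and the fixed cycle count are illustrative assumptions.

```python
def ipu_two_households(hh_target=4.0, person_target=3.0, cycles=200):
    """Trace IPU on the two-household example.

    Both households are of household type 1; only household 2 contains a
    person of person type 1. Constraints: w1 + w2 = hh_target and
    w2 = person_target. Returns the (w1, w2) points visited, starting at (1, 1).
    """
    w1, w2 = 1.0, 1.0
    path = [(w1, w2)]
    for _ in range(cycles):
        # Household constraint: scale both weights by hh_target / (w1 + w2).
        factor = hh_target / (w1 + w2)
        w1, w2 = w1 * factor, w2 * factor
        path.append((w1, w2))
        # Person constraint: only household 2 is adjusted.
        w2 *= person_target / w2
        path.append((w1, w2))
    return path


feasible = ipu_two_households(person_target=3.0)    # converges to (1, 3)
infeasible = ipu_two_households(person_target=5.0)  # w1 shrinks toward 0
```

In the feasible case the path visits (2, 2), (2, 3), (1.6, 2.4), (1.6, 3), ... and approaches the solution (1, 3). In the infeasible case the exact solution would require w_1 = -1, so w_1 decays toward zero and the path alternates between the corner points near (0, 4), where the household constraint holds, and (0, 5), where the person constraint holds.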

Synthetic Population
The synthetic population generation process can be divided into three steps:
- Estimating whole frequencies
- Calculating selection probabilities
- Drawing households

Estimating Frequencies
- IPF-estimated household type constraints provide target frequencies
- Rounding procedures are employed to convert decimal values to whole frequencies
- Rounding procedures implemented in PopGen:
  - Arithmetic rounding (default)
  - Bucket rounding
  - Stochastic rounding
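The three rounding options can be sketched as follows. These are common textbook versions of each procedure, assumed for illustration; PopGen's exact implementations may differ, and the function names are hypothetical.

```python
import math
import random


def arithmetic_round(values):
    """Round half up (2.5 -> 3); Python's built-in round() would give 2."""
    return [math.floor(v + 0.5) for v in values]


def bucket_round(values):
    """Carry each rounding residual to the next cell.

    The carried residual stays in [-0.5, 0.5), so the rounded total stays
    within 0.5 of the decimal total.
    """
    rounded, carry = [], 0.0
    for v in values:
        r = math.floor(v + carry + 0.5)
        carry += v - r
        rounded.append(r)
    return rounded


def stochastic_round(values, rng):
    """Round up with probability equal to the fractional part."""
    rounded = []
    for v in values:
        base = math.floor(v)
        rounded.append(base + (1 if rng.random() < v - base else 0))
    return rounded


frequencies = stochastic_round([1.2, 3.7, 5.0], random.Random(0))
```

For example, arithmetic rounding of three cells of 0.4 gives 0, 0, 0 (a total of 0 against 1.2), whereas bucket rounding gives 0, 1, 0 and keeps the total close to the decimal sum.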

Selection Probabilities
- Synthetic households are drawn probabilistically based on IPU-estimated weights
- Selection probabilities are estimated for each household type that needs to be synthesized
- No additional adjustments to match person constraints are needed
- The individuals from the synthetic households comprise the synthetic population

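Illustration of Estimating Selection Probabilities

| Household ID | Household Type 1 | Household Type 2 | Person Type 1 | Person Type 2 | Person Type 3 | Final Weights | Type 1 Cumulative Sum | Type 1 Cumulative Probability | Type 2 Cumulative Sum | Type 2 Cumulative Probability |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1.36 | 1.36 | 0.0389 | - | - |
| 2 | 1 | 0 | 1 | 0 | 1 | 25.66 | 27.02 | 0.7720 | - | - |
| 3 | 1 | 0 | 2 | 1 | 0 | 7.98 | 35.00 | 1.0000 | - | - |
| 4 | 0 | 1 | 1 | 0 | 2 | 27.79 | - | - | 27.79 | 0.4276 |
| 5 | 0 | 1 | 0 | 2 | 1 | 18.45 | - | - | 46.24 | 0.7115 |
| 6 | 0 | 1 | 1 | 1 | 0 | 8.64 | - | - | 54.88 | 0.8444 |
| 7 | 0 | 1 | 2 | 1 | 2 | 1.47 | - | - | 56.35 | 0.8671 |
| 8 | 0 | 1 | 1 | 1 | 0 | 8.64 | - | - | 64.99 | 1.0000 |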
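The cumulative columns above are running sums of the final weights within each household type, divided by the type total. A small sketch, using a hypothetical helper name:

```python
from itertools import accumulate


def selection_probabilities(weights):
    """Cumulative sums and cumulative selection probabilities for one household type."""
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    return cumulative, [c / total for c in cumulative]


# Final IPU weights for households 1-3 (type 1) and 4-8 (type 2) from the table.
cum1, prob1 = selection_probabilities([1.36, 25.66, 7.98])
cum2, prob2 = selection_probabilities([27.79, 18.45, 8.64, 1.47, 8.64])
```

Each household's own selection probability is its weight divided by the type total (for household 2, 25.66 / 35.00 ≈ 0.733); the cumulative form is what the drawing step uses.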

Drawing Households
- Rounded frequencies and the selection probabilities from earlier steps are used to generate a synthetic population
- For each household type, the corresponding selection probabilities are used to draw households
- The persons in the drawn households comprise the synthetic population for the target year
- As the drawing procedure is probabilistic, the fit of the synthetic population is checked
- The drawing procedure is repeated until a synthetic population with the best fit is obtained

Illustration of Drawing Households

| Household ID | Type 1 Cumulative Sum | Type 1 Cumulative Probability | Type 2 Cumulative Sum | Type 2 Cumulative Probability |
|---|---|---|---|---|
| 1 | 1.36 | 0.0389 | - | - |
| 2 | 27.02 | 0.7720 | - | - |
| 3 | 35.00 | 1.0000 | - | - |
| 4 | - | - | 27.79 | 0.4276 |
| 5 | - | - | 46.24 | 0.7115 |
| 6 | - | - | 54.88 | 0.8444 |
| 7 | - | - | 56.35 | 0.8671 |
| 8 | - | - | 64.99 | 1.0000 |
| Frequency | 35 | | 65 | |

1. Consider Household Type 1.
2. Generate a random number between 0 and 1, e.g., 0.23.
3. 0.0389 < 0.23 < 0.7720.
4. Household ID 2 is added to the synthetic population.
5. The process is repeated until 35 households of Household Type 1 are included.
6. The process is repeated for Household Type 2.
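Steps 2 to 4 are an inverse-CDF lookup, which can be sketched with binary search over the cumulative probabilities. The helper names are hypothetical, and the sketch assumes draws with replacement.

```python
import bisect
import random


def draw_household(household_ids, cumulative_probs, u):
    """Return the first household whose cumulative probability is >= u."""
    return household_ids[bisect.bisect_left(cumulative_probs, u)]


def draw_households(household_ids, cumulative_probs, frequency, rng):
    """Draw `frequency` households (with replacement) for one household type."""
    return [draw_household(household_ids, cumulative_probs, rng.random())
            for _ in range(frequency)]


TYPE1_IDS, TYPE1_CUM = [1, 2, 3], [0.0389, 0.7720, 1.0000]
TYPE2_IDS, TYPE2_CUM = [4, 5, 6, 7, 8], [0.4276, 0.7115, 0.8444, 0.8671, 1.0000]

rng = random.Random(42)
synthetic = (draw_households(TYPE1_IDS, TYPE1_CUM, 35, rng)
             + draw_households(TYPE2_IDS, TYPE2_CUM, 65, rng))
```

`draw_household(TYPE1_IDS, TYPE1_CUM, 0.23)` returns household 2, matching the worked example.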

Synthetic Population: Performance
- χ² goodness-of-fit statistic
  - A goodness-of-fit measure to check the match against person-level distributions:
    $$\chi^2 = \sum_j \frac{(n_j - c_j)^2}{c_j}$$
    where n_j = frequency of synthetic persons of the j-th person type and c_j = the j-th IPF-estimated person-type constraint
  - The corresponding p-value represents the level of confidence at which the synthetic population matches the given constraints
- A synthetic population is drawn repeatedly until a desired p-value is achieved or a maximum number of draws is reached
- The maximum number of draws is user-specified and depends on the geographic context
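The χ² statistic and the redraw loop can be sketched as below. Because the p-value falls as χ² rises for fixed degrees of freedom, a target p-value can be expressed as a critical χ² value (for example via `scipy.stats.chi2.isf(target_p, df)`; the choice of `df` is an assumption left to the user). `draw_population` is a hypothetical caller-supplied function, and the sketch keeps the best draw seen.

```python
def chi_square(synthetic_counts, constraints):
    """chi^2 = sum_j (n_j - c_j)^2 / c_j over person types j."""
    return sum((n - c) ** 2 / c for n, c in zip(synthetic_counts, constraints))


def redraw_until_fit(draw_population, constraints, max_draws, critical_value):
    """Repeat draws until chi^2 <= critical_value or max_draws is reached.

    draw_population(k) must return (population, person_type_counts) for draw k.
    The draw with the smallest chi^2 seen so far is returned.
    """
    best_population, best_stat = None, float("inf")
    for k in range(max_draws):
        population, counts = draw_population(k)
        stat = chi_square(counts, constraints)
        if stat < best_stat:
            best_population, best_stat = population, stat
        if stat <= critical_value:
            break
    return best_population, best_stat


# Hypothetical person-type counts from three successive draws.
draws = [[80, 70, 100], [90, 66, 105], [91, 65, 104]]
best, stat = redraw_until_fit(lambda k: (f"draw {k}", draws[k]), [91, 65, 104], 3, 0.05)
```

Here the second draw (χ² ≈ 0.036) already meets the critical value, so the loop stops before the third draw.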

Case Studies

| Case Study | Synthesis Year | Sample Data Source | Marginal Distributions |
|---|---|---|---|
| Southern California (SCAG) | 2008 | Census 2000 5-percent PUMS | SCAG TAZ data |
| Southern California (SCAG) | 2008 | ACS 2005-2007 3-percent PUMS* | SCAG TAZ data |
| Baltimore Metropolitan (BMC) | 2000 | Census 2000 5-percent PUMS | BMC TAZ data |

Case Studies: Control Variables

| Case Study | Household-Level Control Variables (number of categories) | Household Type Constraints |
|---|---|---|
| Southern California (SCAG) | Presence of children (2), household type (5), household size (7), age of householder (2), family type (2), income (4) | 1120 |
| Baltimore Metropolitan (BMC) | Household size (5), worker count (4), income (4) | 80 |

Case Studies: Control Variables (continued)

| Case Study | Person-Level Control Variables (number of categories) | Person Type Constraints |
|---|---|---|
| Southern California (SCAG) | Age (10), gender (2), race (7) | 140 |
| Baltimore Metropolitan (BMC) | Person total (1) | 1 |

Population Synthesis Summary

| Case Study | Sample | Households: Actual | Households: Synthesized | Persons: Actual | Persons: Synthesized |
|---|---|---|---|---|---|
| Southern California (SCAG) | Census PUMS | 5,925,576 | 5,925,576 | 18,904,466 | 18,451,705 |
| Southern California (SCAG) | 3-yr ACS PUMS | 5,925,576 | 5,925,576 | 18,904,466 | 18,083,857 |
| Baltimore Metropolitan (BMC) | Census PUMS | 1,642,882 | 1,642,882 | 4,391,673 | 4,404,711 |

Performance: Aggregate Comparisons: Household Size (Controlled)
[Figure: actual vs. synthesized households by household size; panels: SCAG using Census PUMS (1 to 7+), SCAG using 3-yr ACS PUMS (1 to 7+), BMC using Census PUMS (1 to 5+)]

Performance: Aggregate Comparisons: Household Income (Controlled)
[Figure: actual vs. synthesized households by income; panels: SCAG using Census PUMS and SCAG using 3-yr ACS PUMS (< $25K, $25K-$50K, $50K-$100K, >= $100K), BMC using Census PUMS (< $11.8K, $11.8K-$26K, $26K-$44.2K, >= $44.2K)]

Performance: Aggregate Comparisons: Age (SCAG Controlled, BMC Uncontrolled)
[Figure: actual vs. synthesized persons by age group (< 5 to >= 85); panels: SCAG using Census PUMS, SCAG using 3-yr ACS PUMS, BMC using Census PUMS (compared against Census 2000 SF)]

Performance: Aggregate Comparisons: Race (SCAG Controlled, BMC Uncontrolled)
[Figure: actual vs. synthesized persons by race; panels: SCAG using Census PUMS, SCAG using 3-yr ACS PUMS, BMC using Census PUMS (compared against Census 2000 SF)]

Performance: Disaggregate Comparisons: BMC Case Study, Household Size Category
[Figure: scatter plots of actual vs. synthesized household counts for 1-person, 2-person, 3-person, and 4-person households]

Performance: Disaggregate Comparisons: BMC Case Study, Person Total
[Figure: scatter plot of actual vs. synthesized person totals]

Performance: Synthesized vs. Marginals vs. CTPP
The synthesized population matches the given marginals almost perfectly.

| Variable | Category | Synthesized | CTPP | Marginal |
|---|---|---|---|---|
| Number of workers in the household | No workers | 407,421 | 422,680 | 407,448 |
| | 1 worker | 704,144 | 715,523 | 702,987 |
| | 2 workers | 631,557 | 587,365 | 631,257 |
| | 3 or more workers | 148,114 | 138,569 | 149,505 |
| Household size | 1 person | 535,271 | 517,923 | 535,158 |
| | 2 persons | 560,442 | 571,080 | 560,022 |
| | 3 persons | 331,710 | 316,874 | 331,526 |
| | 4 or more persons | 463,813 | 459,006 | 464,530 |

Population Evolution Model
- Marginals are often generated for individual geographies for specific forecast years
- A population is synthesized for each forecast year, and the activity-based travel demand model is applied to estimate the demand
- Issues:
  - Estimation of travel demand for intermediate years
  - Reflection of underlying socio-economic and demographic processes across forecast years

Population Evolution Model (continued)
- Instead of generating a synthetic population for every forecast year, evolve the base-year synthetic population (annually) to obtain the population for any future year
- Synthetic households and persons are subjected to a host of socio-economic and demographic evolutionary processes:
  - Immigration and emigration
  - Person-level life cycle events
  - Household-level changes over time
- Population evolution allows intermediate-year simulations while reflecting dynamics across forecast years

Population Evolution Prototype
Developed and implemented for the Baltimore Metropolitan Council (BMC) region. Components: emigration, immigration, aging, mortality, fertility, income, occupation, labor participation, child leaving model, education, marriage/divorce, household formation, household dissolution.

Household Migration: Emigration
- Description: households moving out of the study region
- Simulation: select households from the sample to match given household- and person-level distributions of emigrating households, and locate them
- Model: procedure similar to population synthesis; IPF is used to estimate the frequency of households, and IPU is employed for selection probabilities
- Data: sample from Census PUMS; control distributions from the Maryland Department of Planning

Emigration: Control totals of person-level attributes of emigrants

| Person Attributes | Anne Arundel County | Baltimore City | Baltimore County | Caroline County | Carroll County | Frederick County |
|---|---|---|---|---|---|---|
| Male | 47659 | 83094 | 51864 | 2332 | 11169 | 15772 |
| Female | 45684 | 86180 | 55470 | 2503 | 11822 | 15753 |
| Person Total | 93343 | 169274 | 107334 | 4835 | 22991 | 31525 |
| White alone | 75039 | 96375 | 81033 | 3705 | 21583 | 27660 |
| Black or African American alone | 11963 | 60886 | 19546 | 985 | 694 | 2041 |
| American Indian & Alaska Native alone | 439 | 861 | 318 | 18 | 67 | 96 |
| Asian alone | 2335 | 6062 | 3692 | 18 | 276 | 641 |
| Native Hawaiian & other Pacific Islander alone | 116 | 105 | 0 | 0 | 13 | 0 |
| Some other race alone | 958 | 1769 | 860 | 26 | 70 | 414 |
| Two or more races | 2493 | 3216 | 1885 | 83 | 288 | 673 |
| Person Total | 93343 | 169274 | 107334 | 4835 | 22991 | 31525 |
| 5 to 9 years old | 8511 | 14143 | 8166 | 387 | 1737 | 2762 |
| 10 to 14 years old | 6636 | 12781 | 6653 | 377 | 1424 | 2167 |
| 15 to 19 years old | 6808 | 11534 | 6595 | 455 | 1874 | 2702 |
| 20 to 24 years old | 10606 | 15129 | 10099 | 697 | 3761 | 3956 |
| 25 to 29 years old | 11203 | 22101 | 12982 | 461 | 2695 | 3750 |
| 30 to 34 years old | 11213 | 22357 | 13868 | 467 | 2235 | 3191 |
| 35 to 39 years old | 10632 | 18743 | 12298 | 492 | 2154 | 3504 |

Household Migration: Immigration
- Description: households moving into the study region
- Simulation: select households from the sample to match given household- and person-level distributions of immigrating households, and locate them
- Model: procedure similar to population synthesis; IPF is used to estimate the frequency of households, and IPU is employed for selection probabilities
- Data: sample from Census PUMS; control distributions from the Maryland Department of Planning

Immigration: Control totals of household-level attributes of immigrants

| Household Attributes | Anne Arundel County | Baltimore City | Baltimore County | Caroline County | Carroll County | Frederick County |
|---|---|---|---|---|---|---|
| Married-couple family household with children <18 | 42730 | 14210 | 51160 | 2170 | 14400 | 20190 |
| Married-couple family household without children <18 | 20430 | 10240 | 26650 | 1140 | 5970 | 8550 |
| Other family household with children <18 | 12910 | 15260 | 30410 | 1030 | 3070 | 4380 |
| Other family household without children <18 | 4850 | 5590 | 9350 | 270 | 740 | 1180 |
| Non-family household | 18980 | 32930 | 40150 | 850 | 3130 | 6450 |
| Household Total | 99900 | 78230 | 157720 | 5460 | 27310 | 40750 |
| Under $25,000 | 9290 | 26440 | 27560 | 920 | 2270 | 3960 |
| $25,000 to $49,999 | 22730 | 22860 | 48020 | 2230 | 4970 | 8990 |
| $50,000 to $74,999 | 25150 | 14000 | 39230 | 1280 | 7750 | 10060 |
| $75,000 to $99,999 | 17750 | 7190 | 20040 | 670 | 5580 | 8080 |
| $100,000 to $199,999 | 21190 | 6350 | 18710 | 360 | 6070 | 8710 |
| $200,000 and over | 3780 | 1400 | 4170 | 10 | 680 | 950 |
| Household Total | 99890 | 78240 | 157730 | 5470 | 27320 | 40750 |

Person-level Evolution: Aging
- Description: reflect the aging of individuals from year to year
- Simulation: increase the age of individuals by 1; update household attributes
- Model: NA
- Data: NA

Person-level Evolution: Mortality
- Description: reflect the mortality of individuals from year to year
- Simulation: remove the person from the household; update household and person attributes
- Model: a rate-based model is employed to calculate the probability of mortality
- Data: mortality rates by person characteristics, Centers for Disease Control and Prevention

Mortality Rates by Race, Gender, and Age
(AIAN = American Indian or Alaska Native; API = Asian or Pacific Islander; Black = Black or African American)

| Age | AIAN Female | AIAN Male | API Female | API Male | Black Female | Black Male | White Female | White Male |
|---|---|---|---|---|---|---|---|---|
| < 1 year | 0.007211 | 0.009714 | 0.003757 | 0.004671 | 0.011872 | 0.01422 | 0.005159 | 0.006364 |
| 1-4 years | 0.000481 | 0.000651 | 0.000193 | 0.000194 | 0.00038 | 0.000469 | 0.000232 | 0.000292 |
| 5-9 years | 0.000149 | 0.000178 | 0.000113 | 0.000117 | 0.000182 | 0.000228 | 0.000118 | 0.000142 |
| 10-14 years | 0.000183 | 0.000219 | 0.000109 | 0.000135 | 0.000186 | 0.000288 | 0.000129 | 0.000192 |
| 15-19 years | 0.000664 | 0.001218 | 0.000225 | 0.000481 | 0.000381 | 0.00129 | 0.000371 | 0.000853 |
| 20-24 years | 0.00065 | 0.001801 | 0.000287 | 0.000696 | 0.000655 | 0.00218 | 0.000462 | 0.001366 |
| 25-34 years | 0.000914 | 0.002001 | 0.000285 | 0.000549 | 0.001082 | 0.002542 | 0.000585 | 0.001331 |
| 35-44 years | 0.001994 | 0.003376 | 0.000574 | 0.00091 | 0.002475 | 0.003939 | 0.001297 | 0.002265 |
| 45-54 years | 0.003542 | 0.005904 | 0.00144 | 0.002373 | 0.005582 | 0.009351 | 0.002914 | 0.005072 |
| 55-64 years | 0.006928 | 0.010756 | 0.003426 | 0.005481 | 0.010896 | 0.019222 | 0.006591 | 0.010592 |
| 65-74 years | 0.017176 | 0.021991 | 0.009016 | 0.013654 | 0.022901 | 0.037079 | 0.016731 | 0.025039 |
| 75-84 years | 0.036757 | 0.042834 | 0.025276 | 0.03681 | 0.051458 | 0.075295 | 0.044574 | 0.062626 |
| 85+ years | 0.068411 | 0.079638 | 0.076708 | 0.096749 | 0.124895 | 0.135019 | 0.132278 | 0.148593 |
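The aging and mortality steps can be combined into one annual update. This is a sketch under stated assumptions: only the White female and male columns of the table are included, the mortality draw happens before aging, ages are integers, and household attribute updates are omitted. All names are hypothetical.

```python
import random

# Excerpt of the rates above: White females and males only. Index k follows the
# age bands <1, 1-4, 5-9, 10-14, 15-19, 20-24, 25-34, 35-44, 45-54, 55-64,
# 65-74, 75-84, 85+.
WHITE_RATES = {
    "F": [0.005159, 0.000232, 0.000118, 0.000129, 0.000371, 0.000462,
          0.000585, 0.001297, 0.002914, 0.006591, 0.016731, 0.044574, 0.132278],
    "M": [0.006364, 0.000292, 0.000142, 0.000192, 0.000853, 0.001366,
          0.001331, 0.002265, 0.005072, 0.010592, 0.025039, 0.062626, 0.148593],
}
BAND_UPPER_AGES = [0, 4, 9, 14, 19, 24, 34, 44, 54, 64, 74, 84]


def band_index(age):
    """Map an integer age in years to the index of its age band."""
    for k, upper in enumerate(BAND_UPPER_AGES):
        if age <= upper:
            return k
    return len(BAND_UPPER_AGES)  # 85+


def evolve_one_year(persons, uniform):
    """Apply the mortality draw, then age the survivors by one year.

    persons: list of dicts with integer 'age' and 'sex' ('F' or 'M').
    uniform: zero-argument callable returning a float in [0, 1).
    """
    survivors = []
    for person in persons:
        rate = WHITE_RATES[person["sex"]][band_index(person["age"])]
        if uniform() < rate:
            continue  # death: the person is removed
        survivors.append({**person, "age": person["age"] + 1})
    return survivors


population = [{"id": 1, "age": 70, "sex": "M"}, {"id": 2, "age": 30, "sex": "F"}]
next_year = evolve_one_year(population, random.Random(2011).random)
```

Passing the random source as a callable keeps the update reproducible for a fixed seed, which helps when comparing evolved populations across model runs.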

Person-level Evolution: Fertility
- Description: mimics the addition of individuals through the birth event
- Simulation: add an individual; update household and person attributes
- Model: a binary logit model is applied to calculate the probability of a female giving birth
- Data: model estimated using National Survey of Family Growth data

Fertility: Binary logit model

| Variable | Coefficient | t-statistic |
|---|---|---|
| Constant | -9.3006 | -15.597 |
| Age (years) | 0.6092 | 13.361 |
| Age squared (years²) | -0.0119 | -14.381 |
| Female is married | 0.9853 | 11.439 |
| Female is full-time employed | -0.6465 | -7.595 |
| Female is part-time employed | -0.5058 | -4.718 |
| Female is Black | 0.4806 | 5.172 |
| Female is Hispanic White | 0.3878 | 3.074 |

Model statistics: sample size = 7356; LL(β) = -2392.94; LL(0) = -5098.79; ρ² = 0.5307; adjusted ρ² = 0.5291
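Applying the fertility model amounts to evaluating the logit probability and drawing against it. This sketch assumes a utility that is linear in the listed variables, with dummies coded 0/1 and age in years; the function and key names are hypothetical.

```python
import math
import random

# Coefficients from the fertility table above.
COEF = {
    "constant": -9.3006,
    "age": 0.6092,
    "age_squared": -0.0119,
    "married": 0.9853,
    "full_time": -0.6465,
    "part_time": -0.5058,
    "black": 0.4806,
    "hispanic_white": 0.3878,
}


def birth_probability(age, married=False, full_time=False, part_time=False,
                      black=False, hispanic_white=False):
    """P(birth) = 1 / (1 + exp(-V)), with V linear in the listed variables."""
    v = (COEF["constant"]
         + COEF["age"] * age
         + COEF["age_squared"] * age ** 2
         + COEF["married"] * married          # booleans act as 0/1 dummies
         + COEF["full_time"] * full_time
         + COEF["part_time"] * part_time
         + COEF["black"] * black
         + COEF["hispanic_white"] * hispanic_white)
    return 1.0 / (1.0 + math.exp(-v))


p = birth_probability(30, married=True, full_time=True)
gives_birth = random.Random(7).random() < p  # simulated birth event
```

For a married, full-time-employed 30-year-old the sketch gives p ≈ 0.198. The age terms imply the probability peaks at about 0.6092 / (2 × 0.0119) ≈ 25.6 years, other variables held fixed.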

Person-level Evolution: Education
- Description: reflects the enrollment and educational attainment processes
- Simulation: update the education status if the person continues enrollment
- Model: a rate-based model is employed to estimate the probability of discontinuing
- Data: education continuation rates by current education level and person characteristics, Census PUMS

Education: Enrollment discontinuation rates

| Current Education Level | Non-Hispanic White | Hispanic White | Black or African American | Asian or Native Hawaiian | Two or More Major Race Groups | Other |
|---|---|---|---|---|---|---|
| No schooling completed | 0.0062 | 0.0479 | 0.0148 | 0.0195 | 0.025 | 0.0939 |
| 1st grade to 4th grade | 0.0028 | 0.033 | 0.0095 | 0.0058 | 0.0128 | 0.0403 |
| 5th grade or 6th grade | 0.0107 | 0.0826 | 0.02 | 0.02 | 0.034 | 0.108 |
| 7th grade or 8th grade | 0.0385 | 0.0548 | 0.0496 | 0.0216 | 0.0554 | 0.0426 |
| 9th grade | 0.0265 | 0.0311 | 0.04 | 0.0153 | 0.0231 | 0.0726 |
| 10th grade | 0.0337 | 0.0256 | 0.0589 | 0.0186 | 0.0436 | 0.048 |
| 11th grade | 0.0244 | 0.0285 | 0.0646 | 0.0082 | 0.019 | 0.0477 |
| 12th grade, no diploma | 0.0265 | 0.0722 | 0.0768 | 0.0453 | 0.0678 | 0.0724 |
| High school graduate | 0.2946 | 0.2019 | 0.4027 | 0.1585 | 0.2391 | 0.3423 |
| Some college, but less than 1 year | 0.1017 | 0.0793 | 0.1299 | 0.0366 | 0.1148 | 0.1553 |
| One or more years of college, no degree | 0.2185 | 0.2119 | 0.3876 | 0.1221 | 0.2809 | 0.4162 |
| Associate degree | 0.1003 | 0.1261 | 0.1708 | 0.0648 | 0.1202 | 0.1389 |
| Bachelor's degree | 0.4849 | 0.3269 | 0.5484 | 0.4251 | 0.4927 | 0.4624 |
| Master's degree | 0.5362 | 0.4857 | 0.6705 | 0.4108 | 0.4808 | 0.62 |

Person-level Evolution: Individuals Moving Out
- Description: mimics children moving out of family households for college
- Simulation: remove the person from the household; update household and person attributes; create a new household and locate it
- Model: a multinomial logit model is applied to estimate the probability of living on campus, living off campus, or living with parents
- Data: model estimated using Census PUMS

Individuals Moving Out: Model of residence location choice for college students

| Alternative | Variable | Coefficient | t-statistic |
|---|---|---|---|
| Living on campus | Constant | 4.2041 | 14.602 |
| | Age (years) | -0.0636 | -6.522 |
| | Female | -0.3579 | -1.952 |
| | Annual household income ($1,000) | -4.6307 | -8.583 |
| | Non-Hispanic White | 1.4138 | 7.682 |
| Living off campus | Constant | 0.2991 | 2.504 |
| | Female | -0.2936 | -5.873 |
| | Annual household income ($1,000) | -0.0254 | -31.94 |
| | Non-Hispanic White | 0.3807 | 3.338 |
| | Hispanic White | -0.3557 | -1.673 |
| | Black | -0.358 | -3.017 |
| | Asian | -0.4777 | -3.099 |
| Living with parents | (utility is fixed at 0) | | |

Model statistics: sample size = 12756; LL(β) = -5512.46; LL(0) = -14013.9; ρ² = 0.6066; adjusted ρ² = 0.6058

Person-level Evolution: Employment
- Description: reflects participation in the labor force, occupation type, and income
- Simulation: update the labor participation decision, occupation, and income; update household and person attributes
- Model: binary logit model of labor participation, multinomial logit model of occupation, ordered logit model of income
- Data: models of labor participation, occupation, and income estimated using Census PUMS (but with no information on dynamics or history dependency; each year is simulated afresh)

Employment: Model of labor force participation

| Variable | Coefficient | t-statistic |
|---|---|---|
| Constant | -2.9993 | -31.884 |
| Age (years) | 0.1891 | 47.392 |
| Age squared (years²) | -0.0027 | -63.19 |
| Education (years) | 0.1746 | 41.508 |
| Female, married | -0.316 | -9.537 |
| Female, number of kids | -0.2154 | -13.963 |
| Male, married | 0.7411 | 19.633 |
| Non-Hispanic White | 0.1703 | 5.747 |
| Hispanic White | 0.3791 | 3.695 |
| Asian/Native Hawaiian/Pacific Islander | -0.3242 | -4.849 |

Model statistics: sample size = 61299; LL(β) = -20182.82; LL(0) = -42489.23; ρ² = 0.525; adjusted ρ² = 0.5248

Person-level Evolution: Household Formation
- Description: mimics the decision of individuals to get married and form new households
- Simulation: simulate the decision to marry, and match spouses from the pool of eligible males and females
- Model: binary logit models of marriage for males and females
- Data: models estimated using the National Survey of Family Growth

Household Formation: Model of marriage decision for males and females

Marriage model for males:

| Variable | Coefficient | t-statistic |
|---|---|---|
| Constant | -11.662 | -10.792 |
| Age (years) | 0.6037 | 7.762 |
| Age squared (years²) | -0.0097 | -7.34 |
| Black | -0.3843 | -2.003 |
| Hispanic White | -0.8084 | -2.654 |
| Full-time employed | 0.4009 | 2.591 |

Model statistics: sample size = 4638; LL(β) = -891.015; LL(0) = -3214.82; ρ² = 0.7228; adjusted ρ² = 0.721

Marriage model for females:

| Variable | Coefficient | t-statistic |
|---|---|---|
| Constant | -7.9976 | -9.783 |
| Age (years) | 0.3842 | 6.078 |
| Age squared (years²) | -0.0072 | -6.258 |
| White | 0.701 | 5.374 |
| Full-time employed | 0.3736 | 2.86 |

Model statistics: sample size = 4877; LL(β) = -1099.42; LL(0) = -3380.48; ρ² = 0.6748; adjusted ρ² = 0.6733

Person-level Evolution: Household Dissolution
- Description: mimics the decision of individuals to get divorced
- Simulation: simulate the decision to divorce, dissolve the household, and locate the new household; update household and person attributes
- Model: binary logit model of getting divorced
- Data: model estimated using the National Survey of Family Growth

Household Dissolution: Model of divorce decision

| Variable | Coefficient | t-statistic |
|---|---|---|
| Constant | -3.198 | -19.137 |
| Age (20-25 years) | 0.3923 | 1.466 |
| Age (25-30 years) | 0.3303 | 1.555 |
| Age (older than 40 years) | 0.4213 | 1.878 |
| Hispanic White | 0.4033 | 1.575 |
| Full-time employed | 0.294 | 1.731 |

Model statistics: sample size = 2621; LL(β) = -575.14; LL(0) = -1816.74; ρ² = 0.6834; adjusted ρ² = 0.6801

Population Evolution Prototype Implementation
- The prototype was developed using the software infrastructure that supports OpenAMOS, an open-source activity-travel demand model system
- PopGen was used to generate the base-year (2008) synthetic population for Harford County in the BMC model region, and the population was evolved for the following 10 years, i.e., from 2009 to 2018
- All the models identified earlier were implemented except for the household formation model (spouse matching component)

Preliminary Results: Race Distribution
[Figure: share of population by race (0.00 to 1.00), 2009 to 2018; categories: White alone; Black or African American alone; Asian alone; Some other race alone; Two or more major race groups; American Indian, Alaska Native and other Pacific Islander]

Preliminary Results: Gender Distribution
[Figure: male and female shares of population (0.48 to 0.52), 2009 to 2018]

Preliminary Results: Household Size Distribution
[Figure: share of households by household size (1 to 7+), 2009 to 2018]

Preliminary Results: Worker Count Distribution
[Figure: share of households by number of workers (0, 1, 2, 3+), 2009 to 2018]

Population Evolution Challenges
- The prototype considers some basic dimensions of household- and person-level attributes of interest
- There are a host of other household- and person-level socioeconomic and demographic processes that are of interest:
  - Formation of non-family households, e.g., roommates
  - Vehicle fleet composition and evolution, bicycle ownership
  - Mobility options, e.g., driver's license and transit pass holding status
  - ICT availability
- We need a better understanding of evolutionary processes to enhance their representation

Population Evolution Challenges (continued)
- Availability of data
  - Data from a single source that can be used uniformly to estimate and model the choice dimensions of interest, avoiding the introduction of sample biases
  - Richer data are needed to estimate and apply advanced models:
    - Modeling simultaneous choices, e.g., education and occupation choices
    - Endogeneity of choices, e.g., auto ownership and residential/workplace location choices (typically in the land use model)
    - Accounting for inter-person dependencies in the population evolution choice dimensions
    - Sequencing/hierarchy of choice dimensions

Summary and Conclusions
- The state of practice is moving towards disaggregate microsimulation modeling of travel demand
- A synthetic population must be generated for the base year and evolved for subsequent years to apply microsimulation models of travel demand
- The current Census information is adequate for base-year synthetic population generation
- However, the applicability of Census data to estimating models of various socio-economic and demographic evolutionary processes is limited

Summary and Conclusions (continued)
- Researchers often turn to alternative survey resources to model these processes; this may introduce survey biases
- ACS data offer a valuable opportunity to capture the dynamics of households on an annual basis:
  - Create a panel PUMS sample that can be traced over long periods of time, with households rotating in and out of the panel sample
  - Estimate models of evolution (changes in demographics) from a single consistent data source and introduce history dependency (e.g., labor force participation at time t+1 is highly dependent on labor force participation at time t; the choice should not be simulated as a fresh start each year)

Questions?