A Method for Calculating Cost Correlation among Construction Projects in a Portfolio

International Journal of Architecture, Engineering and Construction, Vol 1, No 3, September 2012, 134-141

Payam Bakhshi 1,*, Ali Touran 2
1 Department of Construction Management, Wentworth Institute of Technology, Boston, MA 02115, United States
2 Department of Civil and Environmental Engineering, Northeastern University, Boston, MA 02115, United States
*Corresponding author. Email: bakhship@wit.edu

Abstract: One of the important steps in a probabilistic risk assessment is the recognition of the statistical correlation among cost components. Ignoring the correlation results in an underestimation of total cost variance. This effect becomes even more significant for a portfolio of projects and may lead to underestimating the budget required for the desired confidence level. While several methods have been proposed to calculate the correlation between components of a project cost, methods to calculate the correlation coefficient between total costs of projects have been neglected. In this paper a new method is proposed to mathematically calculate the Pearson Correlation Coefficient between costs of any two projects in a portfolio of projects. The Proposed Mathematical Model (PMM) is an analytical approach based on the premise of breaking down the total project cost into a base cost (deterministic) and a risks cost (probabilistic). The PMM can help determine correlation coefficients between total project costs in a portfolio of projects, which is a necessary step in probabilistic cost estimation techniques.

Keywords: Correlation coefficient, construction costs, base cost, risks, portfolio of projects
DOI: 10.7492/IJAEC.2012.015

1 INTRODUCTION

When two or more random variables do not vary independently of each other, their dependence is measured by correlation coefficients. Several correlation coefficients are available, among which the Pearson Coefficient and Spearman's Rank Correlation Coefficient are the most commonly used in construction research and practice. It should be noted that the Pearson Coefficient is a measure of linear relationship between variables while Spearman's Rank Correlation Coefficient is a measure of monotonicity (Iman and Conover 1982). Spearman's Rank Coefficient is a non-parametric measure of statistical dependence between two variables and indicates the correlation between the ranks of the values of the random variables instead of the correlation between the values themselves (Kurowicka and Cooke 2006). This is very useful in most modeling situations (Iman and Davenport 1982).

Several researchers have shown that the effect of excluding correlation between variables in cost or schedule estimation is significant (Ince and Buongiono 1991; Touran and Wiser 1992; Wall 1997; Touran and Suphot 1997; Ranasinghe 2000; Yang 2006). Touran and Wiser (1992) stated that correlations among project cost components are neglected, partly because they are difficult to measure. In their study, using information provided by R. S. Means, Inc., they collected unit costs of 1,014 low-rise office buildings in the US. Each project was broken down into 15 cost items in accordance with Construction Specifications Institute (CSI) divisions. They performed a goodness-of-fit test on each cost item and concluded that the lognormal distribution was the best fit for each one. This dataset was used to conduct a Monte Carlo simulation and obtain the cumulative distribution function (CDF) of the total cost. They first assumed independence among all 15 cost items and then repeated the simulation with the correlations recognized. Even though the total cost means in both scenarios were very close to the real data's mean, the total cost variance in the independent case was significantly lower than in the correlated case, which in turn was slightly less than the real data's variance. This was expected because the model in the independent case sampled the different distributions independently, which resulted in underestimating the total cost variance.

Wall (1997) showed the importance of establishing correlation between the costs of sub-components of construction cost estimates in Monte Carlo simulation and the error that ignoring it can produce in the output. He stated that this would lead to inaccurate risk assessment. In his study, he created a dataset consisting of the cost per square meter of 216 new-build office buildings in the UK. After goodness-of-fit testing, the beta and lognormal distributions were selected as the two best fits to the cost data. It was then concluded that the effect of ignoring correlation is more severe than the effect of the choice between lognormal and beta distributions. This reveals the importance of correlation in cost estimation and the adverse impact that ignoring it can have on the final outcome. Ranasinghe (2000) stated that treatment of correlation between variables is necessary to compute a theoretical distribution of a project cost. This requires correlation estimates whether a Monte Carlo simulation or an analytical approach is taken.

2 SUBJECTIVE ESTIMATE OF CORRELATION

When enough data is available, the correlation can simply be calculated using the regular formula of the Pearson Coefficient or Spearman's Rank Correlation Coefficient (Kurowicka and Cooke 2006). The problem is that usually there is not sufficient historical data available to calculate the correlation coefficients. Most of the time in construction cases, we do not have access to detailed data about cost items or activity durations to find their relationships.
In such a case, estimating correlation coefficients among the components of a project's total cost, or between projects' total costs in a program/portfolio, is indispensable. Most researchers concentrate on subjective estimates of correlation elicited from expert judgment (Ranasinghe and Russel 1992; Touran 1993; Chau 1995; Wang and Demsetz 2000; Cho 2006). As an example, Touran (1993) suggested a convenient system to quantify subjective correlations. He recommended that experts estimate the correlation at one of three levels: weak, moderate, or strong. These qualitative correlations would be based on previous experience and could vary from project to project, depending on the circumstances. The proposed correlation coefficients for the three levels are: (1) Weak: 0.15, which is the midpoint of 0 to 0.3; (2) Moderate: 0.45, which is the midpoint of 0.3 to 0.6; (3) Strong: 0.80, which is the midpoint of 0.6 to 1.0. Touran (1993) applied both calculated correlation coefficients and the suggested subjective coefficients in numerous construction cost examples to compare the resulting total cost CDFs. The CDFs obtained with the suggested subjective correlations were shown to be very close to the actual CDFs. However, it should be noted that for a correlation matrix to be mathematically valid and usable, it must be positive semidefinite. The use of qualitative or subjective correlation coefficients (or even correlation coefficients calculated from relatively small samples) may lead to a correlation matrix that is not positive semidefinite. Chau (1995) used a similar qualitative assessment method for estimating the degree of dependence. Cho (2006) employed concordance probability in conjunction with a three-step questionnaire to estimate correlation coefficients between activity durations. In this method, for two dependent random variables, a bivariate normal density is assumed and a conditional probability, called the concordance probability, is required.
For variables X and Y having two independently observed pairs \((X_1, Y_1)\) and \((X_2, Y_2)\), the concordance probability is:

\[ C_{Pr} = \Pr(Y_2 > Y_1 \mid X_2 > X_1) \]

The concordance probability is a monotonically increasing function of the correlation coefficient, which can be graphed for correlations from -1 to +1 against probabilities from 0 to 1. Cho suggested a three-step method to elicit the correlation coefficient of the durations of two activities A and B, as follows:
(1) Asking the experts to determine the mean duration and the standard deviation for each activity;
(2) Asking the experts whether the pair of activities is influenced by common environmental risks or shares human resources. If the answer is No, the correlation is 0; otherwise, if the experts sense a dependency between the two activities, the elicitation proceeds to step 3;
(3) Asking the experts in what fraction of the cases they would expect the duration of activity B to be longer than its expected duration, given that the duration of activity A is longer than its expected duration. Taking this fraction as the concordance probability and using the graph, the correlation coefficient is found.

The method suggested by Cho (2006) for estimating the correlation between activity durations cannot be easily applied to estimate correlation between cost components. First, it assumes a normal distribution for each variable, which is not always the case. Moreover, asking the experts to estimate the fraction in step 3 is neither easy nor likely to be accurate. Therefore, a more robust method is needed to estimate correlation as accurately as possible. The issue becomes more complex when there is a need to estimate the correlation coefficients between the total costs of different projects. This may happen if the objective is to develop a contingency budget for a program or a portfolio of projects. Underestimating the variance of the total portfolio/program cost can lead to a contingency budget that is significantly too low.
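Two of the elicitation ideas reviewed in this section can be sketched in a few lines of Python. The first part maps Touran's (1993) weak/moderate/strong levels to the midpoint coefficients quoted above and checks whether the resulting matrix is positive semidefinite. The second part inverts the concordance probability under the bivariate normal assumption, using the standard orthant result \(C_{Pr} = 1/2 + \arcsin(\rho)/\pi\); this closed form is an assumption introduced here for illustration and may differ in detail from the graph used by Cho (2006). All function names are hypothetical.

```python
import math
import numpy as np

# Midpoint coefficients suggested by Touran (1993), as quoted in the text.
LEVELS = {"weak": 0.15, "moderate": 0.45, "strong": 0.80}

def subjective_matrix(n, judgments):
    """Build an n x n correlation matrix from {(i, j): level} judgments.

    Pairs that are not listed are treated as uncorrelated (coefficient 0).
    """
    corr = np.eye(n)
    for (i, j), level in judgments.items():
        corr[i, j] = corr[j, i] = LEVELS[level]
    return corr

def is_positive_semidefinite(corr, tol=1e-10):
    """A symmetric matrix is PSD when its smallest eigenvalue is >= 0."""
    return bool(np.linalg.eigvalsh(corr).min() >= -tol)

def concordance_from_rho(rho):
    """Concordance probability for a bivariate normal pair (assumed form)."""
    return 0.5 + math.asin(rho) / math.pi

def rho_from_concordance(c):
    """Invert c = 1/2 + arcsin(rho)/pi to recover the correlation."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("concordance probability must lie in [0, 1]")
    return math.sin(math.pi * (c - 0.5))

if __name__ == "__main__":
    # Item 0 strongly tied to items 1 and 2, which are judged unrelated:
    # a plausible-looking set of judgments that is not a valid matrix.
    bad = subjective_matrix(3, {(0, 1): "strong", (0, 2): "strong"})
    print("PSD:", is_positive_semidefinite(bad))
    # An expert answer of 75% in step 3 of the questionnaire.
    print("rho:", round(rho_from_concordance(0.75), 3))
```

The first example illustrates the caveat raised above: three individually reasonable judgments can combine into a matrix that no set of random variables can have, so a check of this kind is worth running before the matrix is used in a simulation.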
It is of course possible to subjectively estimate the correlation coefficient between each pair of projects using terms such as low, moderate, and high, and then use a sensible system to convert these measures into numerical values. Methods such as polling the experts or the Delphi approach may be used to improve the accuracy of the results. However, these approaches fall short of a rigorous analytical method and, furthermore, it would be difficult to verify the reasonableness of the estimates. In the following section, we introduce an analytical method for calculating the Pearson Correlation Coefficient between the costs of two projects.

3 PROPOSED MATHEMATICAL MODEL (PMM)

Finding the correlation between project costs becomes necessary when the owner is using probabilistic techniques to estimate the budget for a portfolio of projects. The total costs of two projects can be correlated when the projects are concurrent. If two projects are constructed in two completely different time frames, then their total costs, as random variables, vary independently of each other. As described earlier, the most common approach for estimating a correlation coefficient is to provide a subjective estimate of it. This, while better than ignoring correlation, may be subject to inaccuracy and estimator's bias. No analytical approach for calculating correlations between project costs was found after an exhaustive search of the civil engineering, construction, and general management literature. The analytical approaches that were found address correlation among cost items within a single project. For instance, Ranasinghe (2000) suggested an analytical approach to estimate the correlation between bill item costs when calculating the standard deviation of a project cost. He presented a bill of quantities broken down into three levels: (1) usage of resources and unit market price, (2) bill item cost, and (3) project cost. The correlations between bill item costs (derived variables), called induced correlation, were estimated based on the correlation between historical market prices of resources (primary variables).
This is a new correlation coefficient defined as the ratio between the variance-covariance induced in the two derived variables by the common primary variables in their functional relationships and the total variance-covariance of the two derived variables. Also, Wang (2002) developed a factor-based computer simulation model (COSTCOR) for cost analysis of a project considering correlations between cost items. In his model, the cost items are treated as random variables which are represented by total cost distributions. Then the uncertainty in each grandparent distribution is transferred to several factor cost distributions. The correlations between cost items are estimated by drawing cost samples from related portions of the cost distributions for cost items that are sensitive to a given factor. The two abovementioned models help estimate the correlation between cost items within a project.

In this section, we propose a mathematical model, named the Proposed Mathematical Model (PMM), which can be used to calculate the correlation coefficient between the costs of any two projects. Using the definition of the Pearson Correlation Coefficient, the PMM helps the analyst systematically calculate the correlation coefficient between the costs of any two projects under consideration in the absence of historical data. The idea for this approach came from the authors' research in the cost estimating and risk analysis of transportation projects. In the past few years, federal highway and transit agencies have encouraged (and sometimes required) the use of probabilistic risk assessments for major transportation projects. In general, in order to verify the adequacy of the project contingency budget, the project's budget is divided into two components: (1) base cost, and (2) risks cost. Base cost is the cost of the project with contingencies removed (Touran 2006). These are costs for items that have a high degree of certainty and are necessary for delivering the project.
Risk costs, on the other hand, are costs that are uncertain in nature and may or may not affect the project. The cost of risk factors is usually allowed for by budgeting a contingency, set aside to cope with uncertainties and risks during project design and construction. Using this definition, let us define the total cost of a project as:

\[ X_i = B_i + \sum_{j=1}^{n_i} R_{ij} \tag{1} \]

where \(X_i\) denotes the total cost of project \(i\), \(B_i\) denotes the base cost of project \(i\), \(R_{ij}\) represents the monetary impact of risk factor \(j\) \((j = 1, 2, \ldots, n_i)\) in project \(i\), and \(n_i\) denotes the number of identified risk factors in project \(i\). The sum of the \(R_{ij}\) is the required contingency budget for project \(i\). Usually the \(B_i\) are deterministic values while the \(R_{ij}\) are modeled as random variables, although some elements can be deterministic. To estimate the correlation coefficient between the costs of two projects, let us assume two projects with the following total costs:

\[ X_1 = B_1 + \sum_{j=1}^{n_1} R_{1j} \tag{2} \]

\[ X_2 = B_2 + \sum_{j=1}^{n_2} R_{2j} \tag{3} \]

Risk factors in both projects can be divided into two parts: (1) common risk factors (CR) and (2) project-specific risk factors (PR). CR risk factors are those that, if they occur in project 1, will potentially happen in project 2. PR risk factors are those that are not likely to happen in both projects. Therefore the costs can be rewritten as:

\[ X_1 = B_1 + \sum_{k=1}^{m_1} CR_{1k} + \sum_{l=1}^{p_1} PR_{1l} \tag{4} \]

\[ X_2 = B_2 + \sum_{k=1}^{m_2} CR_{2k} + \sum_{l=1}^{p_2} PR_{2l} \tag{5} \]

where \(m_1 = m_2 = m\) is the number of common risk factors between projects 1 and 2, and \(p_1 = n_1 - m_1\) and \(p_2 = n_2 - m_2\) are the numbers of project-specific risk factors in projects 1 and 2, respectively. Furthermore, \(CR_{1k}\) is the \(k\)th risk factor in project 1 that is a common risk factor between the two projects under consideration, and \(PR_{1l}\) is the \(l\)th risk factor in project 1 that is a project-specific risk factor. Similarly, \(CR_{2k}\) and \(PR_{2l}\) represent the risk factors in project 2. To estimate the correlation coefficient, we need to calculate the covariance between \(X_1\) and \(X_2\):

\[ \mathrm{COV}(X_1, X_2) = \mathrm{COV}\Big(B_1 + \sum_{k=1}^{m_1} CR_{1k} + \sum_{l=1}^{p_1} PR_{1l},\; B_2 + \sum_{k=1}^{m_2} CR_{2k} + \sum_{l=1}^{p_2} PR_{2l}\Big) \tag{6} \]

Expanding the above and eliminating the terms that include the covariance between two constants or between a constant and a variable (which are equal to zero), we have:

\[ \mathrm{COV}(X_1, X_2) = \mathrm{COV}\Big(\sum_{k=1}^{m_1} CR_{1k}, \sum_{k=1}^{m_2} CR_{2k}\Big) + \mathrm{COV}\Big(\sum_{k=1}^{m_1} CR_{1k}, \sum_{l=1}^{p_2} PR_{2l}\Big) + \mathrm{COV}\Big(\sum_{l=1}^{p_1} PR_{1l}, \sum_{k=1}^{m_2} CR_{2k}\Big) + \mathrm{COV}\Big(\sum_{l=1}^{p_1} PR_{1l}, \sum_{l=1}^{p_2} PR_{2l}\Big) \tag{7} \]

To calculate the above covariances, we need to make some assumptions. We recognize the correlation between analogous common risk factors, such as \((CR_{11}, CR_{21})\) and \((CR_{12}, CR_{22})\), in the two projects. All other combinations of common risk factors, such as \((CR_{11}, CR_{22})\) or \((CR_{12}, CR_{23})\), are assumed to be independent, meaning that their covariance is zero. We also assume that there is no correlation between any combination of project-specific risk factors in the two projects \((PR_{1l}, PR_{2l})\), and that there is no correlation between common risk factors in project 1 and project-specific risk factors in project 2, and vice versa.

These assumptions of independence are justified because no explicit relationship exists between these combinations of risk factors. In other words, if one occurs in project 1, it does not give us any new information on the occurrence of the other one in project 2. Therefore, the assumption of independence is rational and adequate.
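The effect of these assumptions on Eq. (7) can be made explicit with one intermediate step. The following is a sketch of that step, using only the bilinearity of covariance and the symbols already defined; the primed index \(k'\) is introduced here purely for bookkeeping.

```latex
\mathrm{COV}\Big(\sum_{k=1}^{m} CR_{1k}, \sum_{k'=1}^{m} CR_{2k'}\Big)
  = \sum_{k=1}^{m} \sum_{k'=1}^{m} \mathrm{COV}(CR_{1k}, CR_{2k'})
  = \sum_{k=1}^{m} \mathrm{COV}(CR_{1k}, CR_{2k}),
\qquad \text{since } \mathrm{COV}(CR_{1k}, CR_{2k'}) = 0 \text{ for } k \neq k'.
```

The remaining three terms of Eq. (7) expand into double sums in the same way, and every covariance in them is zero under the stated assumptions, so only the sum over analogous common risk factors survives.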
For instance, Table 1 depicts the risk factors identified in two projects. The first two risk factors, labeled CR, are common risk factors in the two projects. The other risk factors, denoted by PR, are project-specific risk factors. The assumptions made in developing the PMM simply mean that only the correlation coefficient between environmental regulation in projects 1 and 2 \((CR_{11}, CR_{21})\) and the correlation coefficient between exchange rate in projects 1 and 2 \((CR_{12}, CR_{22})\) are non-zero. The correlation coefficients between any other combinations of risk factors, such as environmental regulation in project 1 and exchange rate in project 2 \((CR_{11}, CR_{22})\), exchange rate in project 1 and domestic fiber optics purchase & install in project 2 \((CR_{12}, PR_{22})\), or permanent barriers in project 1 and archaeology finds in project 2 \((PR_{12}, PR_{21})\), are zero. It should be noted that Table 1 is presented as a general example to illustrate the logic used to develop the model. For actual projects, these relationships between any two projects must be carefully evaluated and identified.

Table 1. An example of risk factors identified in two projects

| Project 1 Risk ID | Project 1 Risk Event | Project 2 Risk ID | Project 2 Risk Event |
|---|---|---|---|
| CR 11 | Environmental Regulation | CR 21 | Environmental Regulation |
| CR 12 | Exchange Rate | CR 22 | Exchange Rate |
| PR 11 | Utility Relocation Variation | PR 21 | Archaeology Finds |
| PR 12 | Permanent Barriers | PR 22 | Domestic Fiber Optics Purchase & Install |
| PR 13 | Parking Space Construction | | |

Knowing that:

\[ \mathrm{COV}(x, y) = \rho_{x,y}\, \sigma_x\, \sigma_y \tag{8} \]

where \(\rho_{x,y}\) is the correlation coefficient between \(x\) and \(y\), we have:

\[ \mathrm{COV}(X_1, X_2) = \mathrm{COV}\Big(\sum_{k=1}^{m_1} CR_{1k}, \sum_{k=1}^{m_2} CR_{2k}\Big) = \mathrm{COV}(CR_{11}, CR_{21}) + \ldots + \mathrm{COV}(CR_{1m}, CR_{2m}) = \rho_{CR_{11},CR_{21}}\, \sigma_{CR_{11}}\, \sigma_{CR_{21}} + \ldots + \rho_{CR_{1m},CR_{2m}}\, \sigma_{CR_{1m}}\, \sigma_{CR_{2m}} = \sum_{k=1}^{m} \rho_{CR_{1k},CR_{2k}}\, \sigma_{CR_{1k}}\, \sigma_{CR_{2k}} \tag{9} \]

To find the total cost variance of project 1, we know that:

\[ \sigma_{X_1}^2 = \sum_{j=1}^{n_1} \sum_{t=1}^{n_1} \mathrm{COV}(R_{1j}, R_{1t}) = \sum_{j=1}^{n_1} \sigma_{R_{1j}}^2 + 2 \sum_{j=1}^{n_1 - 1} \sum_{t=j+1}^{n_1} \rho_{R_{1j},R_{1t}}\, \sigma_{R_{1j}}\, \sigma_{R_{1t}} \tag{10} \]

where \(\sigma_{X_1}^2\) is the total cost variance of project 1, \(\sigma_{R_{1j}}^2\) and \(\sigma_{R_{1j}}\) are respectively the variance and standard deviation of the \(j\)th risk factor in project 1, and \(\rho_{R_{1j},R_{1t}}\) is the correlation coefficient between the \(j\)th and \(t\)th risk factors in project 1. Similarly, in project 2:

\[ \sigma_{X_2}^2 = \sum_{j=1}^{n_2} \sum_{t=1}^{n_2} \mathrm{COV}(R_{2j}, R_{2t}) = \sum_{j=1}^{n_2} \sigma_{R_{2j}}^2 + 2 \sum_{j=1}^{n_2 - 1} \sum_{t=j+1}^{n_2} \rho_{R_{2j},R_{2t}}\, \sigma_{R_{2j}}\, \sigma_{R_{2t}} \tag{11} \]

It should be noted that Eqs. (10) and (11) calculate the total cost variances of projects 1 and 2, respectively, considering the possible correlation between any pair of risk factors in each project. However, if it is believed that there is no correlation between risk factors within each project, then the total variance equations reduce to the sum of the risk factor variances. Unlike cost components in a project, where pairwise correlation usually exists among some of them due to common resources, construction methods, and management practices, the risk factors identified during the risk assessment procedure are not necessarily correlated (Ranasinghe 2000). Now, by substituting the total cost variances of projects 1 and 2, Eqs. (10) and (11), and the covariance between projects 1 and 2, Eq. (9), into the Pearson Correlation Coefficient formula, we have:

\[ \rho_{X_1,X_2} = \frac{\mathrm{COV}(X_1, X_2)}{\sigma_{X_1}\, \sigma_{X_2}} = \frac{\sum_{k=1}^{m} \rho_{CR_{1k},CR_{2k}}\, \sigma_{CR_{1k}}\, \sigma_{CR_{2k}}}{\sqrt{\sum_{j=1}^{n_1} \sum_{t=1}^{n_1} \mathrm{COV}(R_{1j}, R_{1t})}\; \sqrt{\sum_{j=1}^{n_2} \sum_{t=1}^{n_2} \mathrm{COV}(R_{2j}, R_{2t})}} \tag{12} \]

Using Eq. (12), one can calculate the correlation coefficient between the costs of any pair of projects with an acceptable degree of accuracy. It should be noted that if two projects are in the same geographical area, they may have more common risk factors. In Eq. (12), this translates into a larger numerator and thus a higher correlation coefficient. In other words, the model indirectly considers the location of the projects under consideration by capturing the factors contributing to their cost dependency.
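Eqs. (10) through (12) translate almost directly into code. The following is a minimal sketch, assuming NumPy is available; the function names and the small three-risk and two-risk registers in the usage example are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import numpy as np

def total_variance(sigmas, corr=None):
    """Eq. (10)/(11): variance of the sum of one project's risk costs.

    sigmas: standard deviations of the project's risk factors.
    corr:   optional within-project correlation matrix; when omitted the
            risk factors are treated as uncorrelated.
    """
    s = np.asarray(sigmas, dtype=float)
    if corr is None:
        return float(np.sum(s ** 2))
    c = np.asarray(corr, dtype=float)
    # Double sum over j and t of rho_jt * sigma_j * sigma_t.
    return float(s @ c @ s)

def pmm_correlation(common_sd_1, common_sd_2, rho_common, var_x1, var_x2):
    """Eq. (12): Pearson correlation between two total project costs.

    common_sd_1, common_sd_2: std devs of the analogous common risk
        factors, listed in the same order for both projects.
    rho_common: correlation of each analogous pair (scalar or sequence).
    var_x1, var_x2: total cost variances from total_variance().
    """
    a = np.asarray(common_sd_1, dtype=float)
    b = np.asarray(common_sd_2, dtype=float)
    r = np.broadcast_to(np.asarray(rho_common, dtype=float), a.shape)
    covariance = float(np.sum(r * a * b))        # Eq. (9)
    return covariance / float(np.sqrt(var_x1 * var_x2))

if __name__ == "__main__":
    # Hypothetical registers: the first two risks of project 1 are common
    # with the two risks of project 2.
    var_1 = total_variance([3.0, 4.0, 12.0])     # 169, std dev 13
    var_2 = total_variance([6.0, 8.0])           # 100, std dev 10
    rho = pmm_correlation([3.0, 4.0], [6.0, 8.0], 1.0, var_1, var_2)
    print(round(rho, 4))                         # 50 / 130
```

Passing the variances in, rather than recomputing them inside `pmm_correlation`, keeps the within-project correlation choice of Eqs. (10) and (11) separate from the between-project covariance of Eq. (9).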
This method is simple to apply to large projects, for which a risk register is usually available. For instance, the Federal Transit Administration (FTA) currently requires each New Starts transit project to go through a complete risk analysis, and hence a risk register should be prepared for each new project. The analyst should be careful to select the common risk factors correctly. This is the most important step in the application of the PMM. Since the correlation estimate is usually required between costs of similar projects in a portfolio, the agency can publish a template or a risk catalogue. As a result of this practice, the recognition of common risk factors becomes more accurate and straightforward. The proposed method is mainly applicable to establishing the required program budget for a group or portfolio of projects.

4 NUMERICAL EXAMPLE

To illustrate the application of the approach, two hypothetical transit projects along with their identified risks are presented. Then, using the PMM, the correlation between the costs of the two projects is estimated. Tables 2 and 3 depict the risk registers for the two hypothetical transit projects. The risk register is a listing of all major risk factors that might affect the project cost (or schedule) along with their impact on budget (or schedule). Developing a risk register is an established step in the current risk assessment practice encouraged by the Federal Highway Administration (FHWA) and the FTA. In Project 1, 26 risks/opportunities with a total monetary impact of $26,100,971 and a standard deviation of $4,212,318 are identified. Project 2 has 18 identified risks/opportunities with a total impact of $31,726,377 and a standard deviation of $5,033,338. Both risk assessments were conducted after Final Design (100% design complete) in 2004, with the construction phase expected to start in 2005.
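The template or risk catalogue idea mentioned at the end of Section 3 can be sketched as a small data structure. This is only an illustration under stated assumptions: the `catalogue_key` field and its values are hypothetical (the paper does not define such a key), the registers are abbreviated to a few entries, and the standard deviations are those listed for these risk IDs in the risk registers of Tables 2 and 3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Risk:
    risk_id: str
    event: str
    std_dev: float
    # Hypothetical agency-wide catalogue key; None marks a risk that the
    # analyst treats as project-specific.
    catalogue_key: Optional[str] = None

def common_pairs(register_1: List[Risk], register_2: List[Risk]) -> List[Tuple[Risk, Risk]]:
    """Pair up risks from two registers that share a catalogue key."""
    by_key = {r.catalogue_key: r for r in register_2 if r.catalogue_key}
    return [(r, by_key[r.catalogue_key])
            for r in register_1 if r.catalogue_key in by_key]

# Abbreviated registers for the two hypothetical transit projects.
PROJECT_1 = [
    Risk("P1.R02", "Utility relocation variation", 2_543_370.0),
    Risk("P1.R10", "Estimate deviation (pessimistic estimate)", 1_496_243.0, "ESTIMATE_DEVIATION"),
    Risk("P1.R15", "Locomotives uncertainty due to exchange rate", 1_051_005.0, "EXCHANGE_RATE"),
    Risk("P1.R23", "Escalation from Sep 30,04 to NTP of Mar 05", 1_418_143.0, "ESCALATION"),
]
PROJECT_2 = [
    Risk("P2.R04", "Archaeology finds", 112_839.0),
    Risk("P2.R05", "Deviation from estimate (pessimistic estimate)", 1_436_225.0, "ESTIMATE_DEVIATION"),
    Risk("P2.R13", "Locomotives uncertainty due to exchange rate", 1_569_664.0, "EXCHANGE_RATE"),
    Risk("P2.R18", "Escalation", 1_645_986.0, "ESCALATION"),
]

if __name__ == "__main__":
    for r1, r2 in common_pairs(PROJECT_1, PROJECT_2):
        print(r1.risk_id, "<->", r2.risk_id)
```

A shared key does not replace expert review; it only narrows the list of candidate pairs that the analyst then confirms as affecting both projects for the same reason.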
Note that the potential cost of each risk factor is estimated probabilistically by a group of experts using an appropriate statistical distribution. In other words, the data is readily available for use in the PMM. The goal is to estimate the correlation between the costs of these two projects using the proposed mathematical model. First, the two risk registers shown in Tables 2 and 3 are compared to recognize the common risk factors. As mentioned earlier, the common risk factors are factors that, if they occur in project 1, can potentially happen in project 2. An expert needs to go over the factors in the risk registers of both projects and select the factors that will impact both projects for the same reason. This step becomes much easier when an agency dealing with a portfolio of projects creates a template for preparing risk registers. The common risk factors are highlighted in the two tables. These are the risks with IDs P1.R10, P1.R15, and P1.R23 in Project 1, corresponding to P2.R05, P2.R13, and P2.R18 in Project 2. The standard deviations of all risks can be found in the last column of the risk registers.

Table 2. The risk register for the hypothetical transit project 1

Project Name: Hypothetical transit project 1
Construction start date: 3/7/2005
Location: Las Vegas, NV
Risk analysis at phase: Final design
Project BC: $432,027,078
Risk analysis date: 7/19/2004

| Risk ID | Risk/opportunity event | 5% ($) | Most likely ($) | 95% ($) | Mean ($) | Std. Dev. ($) |
|---|---|---|---|---|---|---|
| P1.R01 | Owner directed change | 0 | 2,400,000 | 4,800,000 | 2,400,002 | 1,433,054 |
| P1.R02 | Utility relocation variation | -3,500,000 | 0 | 5,000,000 | 594,207 | 2,543,370 |
| P1.R03 | Remaining property acquisitions | -250,000 | 2,500,000 | 4,000,000 | 2,004,382 | 1,276,693 |
| P1.R04 | Environmental risks | 500,000 | 1,250,000 | 2,500,000 | 1,448,159 | 599,756 |
| P1.R05 | Proximity to existing structures | 100,000 | 250,000 | 500,000 | 289,632 | 119,952 |
| P1.R06 | City restrictions | 0 | 1,093,580 | 2,187,159 | 1,093,581 | 652,972 |
| P1.R07 | Design change for column location | 25,000 | 50,000 | 100,000 | 59,916 | 22,568 |
| P1.R08 | Daily lane closures and their frequency | 0 | 250,000 | 500,000 | 250,001 | 149,279 |
| P1.R09 | Design changes/city requirements | 0 | 243,201 | 486,402 | 243,201 | 145,218 |
| P1.R10* | Estimate deviation (pessimistic estimate) | -1,000,000 | 1,950,000 | 4,000,000 | 1,593,470 | 1,496,243 |
| P1.R11 | Permanent barriers | 0 | 1,500,000 | 2,000,000 | 1,102,346 | 607,462 |
| P1.R12 | Parking space construction | 0 | 250,000 | 300,000 | 170,134 | 92,236 |
| P1.R13 | Traffic signal modifications | 0 | 1,642,000 | 1,970,400 | 1,117,433 | 605,808 |
| P1.R14 | Site conditions (geotech), environmental risk | 100,000 | 250,000 | 500,000 | 289,633 | 119,953 |
| P1.R15* | Locomotives uncertainty due to exchange rate | 1,500,000 | 2,750,000 | 5,000,000 | 3,146,457 | 1,051,005 |
| P1.R16 | Additional surveying required | 25,000 | 75,000 | 200,000 | 104,787 | 52,920 |
| P1.R17 | Potential RTC caused project delay | 601,865 | 1,203,730 | 2,407,460 | 1,442,462 | 543,303 |
| P1.R18 | Fire Protection - NFPA 130 | 0 | 180,000 | 300,000 | 156,230 | 89,823 |
| P1.R19 | Credit for Station Connector | 0 | 0 | 2,400,000 | 983,540 | 754,247 |
| P1.R20 | Potential increase in insurance cost | 0 | 1,687,500 | 3,375,000 | 1,687,486 | 1,007,603 |
| P1.R21 | Emergency walkway lighting | 0 | 1,000,000 | 2,400,000 | 1,158,444 | 717,949 |
| P1.R22 | Additional fare collection equipment | 0 | 200,000 | 300,000 | 160,334 | 90,272 |
| P1.R23* | Escalation from Sep 30,04 to NTP of Mar 05 | 0 | 2,375,000 | 4,750,000 | 2,374,993 | 1,418,143 |
| P1.R24 | Effect of potential delay | 742,761 | 1,485,523 | 2,971,046 | 1,780,140 | 670,512 |
| P1.R25 | Scope change for additional oversight and Before & After study | 200,000 | 350,000 | 500,000 | 350,000 | 89,566 |
| P1.R26 | V/E Study | 50,000 | 100,000 | 150,000 | 100,000 | 29,856 |
| Total | | | | | 26,100,971 | 4,212,318 |

* Common risk factor (highlighted in the original table).

Using Eqs. (10) and (11), the total cost variances of both projects 1 and 2 are estimated. To do that, the pairwise correlations between risk factors in each project must be identified. A thorough examination of all risk/opportunity events in each risk register does not reveal any dependency between them. For instance, the occurrence of risk P1.R04 (Environmental risks) in project 1 has nothing to do with the occurrence of risk P1.R14 (Site conditions (geotech)). P1.R04 predicts the cost impact caused by NEPA (National Environmental Policy Act) requirements (e.g. encountering hazardous materials during excavation) while P1.R14 estimates the cost impact due to changes in geotechnical site conditions (e.g. variation in soil bearing capacity or encountering rock during excavation). As another example, risk P1.R06 (City restrictions), which considers the possible costs imposed by traffic control or permissible construction hours, is independent of risk P1.R09 (Design changes/city requirements), which takes into consideration the probable costs of design modifications if city requirements are changed. It should be emphasized that the numerical example given here is just to illustrate the application of the model. In practice, the project team establishing the risk registers should define the dependency between the identified risk factors in each project during risk workshops.
Therefore, total cost variances are:

\[ \sigma_{X_1}^2 = \sum_{j=1}^{26} \sigma_{R_{1j}}^2 + 2 \sum_{j=1}^{25} \sum_{t=j+1}^{26} \rho_{R_{1j},R_{1t}}\, \sigma_{R_{1j}}\, \sigma_{R_{1t}} = \sum_{j=1}^{26} \sigma_{R_{1j}}^2 + 0 = (\$4{,}212{,}318)^2 \tag{13} \]

\[ \sigma_{X_2}^2 = \sum_{j=1}^{18} \sigma_{R_{2j}}^2 + 2 \sum_{j=1}^{17} \sum_{t=j+1}^{18} \rho_{R_{2j},R_{2t}}\, \sigma_{R_{2j}}\, \sigma_{R_{2t}} = \sum_{j=1}^{18} \sigma_{R_{2j}}^2 + 0 = (\$5{,}033{,}338)^2 \tag{14} \]

Hence, the standard deviations of the total costs of projects 1 and 2, presented in the last row of the risk registers, are calculated to be $4,212,318 and $5,033,338, respectively.

Table 3. The risk register for the hypothetical transit project 2

Project Name: Hypothetical transit project 2
Construction start date: 1/3/2005
Location: Maryland, MD
Risk analysis at phase: Final design
Project BC: $381,358,049
Risk analysis date: 2/23/2004

| Risk ID | Risk/opportunity event | 5% ($) | Most likely ($) | 95% ($) | Mean ($) | Std. Dev. ($) |
|---|---|---|---|---|---|---|
| P2.R01 | Design uncertainty | 2,650,000 | 4,650,000 | 6,650,000 | 4,649,999 | 1,194,222 |
| P2.R02 | ADA Compliance | 297,000 | 330,000 | 396,000 | 343,090 | 29,790 |
| P2.R03 | Opportunity (only half of platform built) | 3,095,000 | 3,439,000 | 4,127,000 | -3,575,447 | 310,536 |
| P2.R04 | Archaeology finds | 125,000 | 250,000 | 500,000 | 299,581 | 112,839 |
| P2.R05* | Deviation from estimate (pessimistic estimate) | -55,000 | 2,650,000 | 4,750,000 | 2,410,402 | 1,436,225 |
| P2.R06 | Fiber optics purchase and install | 480,000 | 500,000 | 900,000 | 653,671 | 131,578 |
| P2.R07 | Potential cost overrun on track costs | 100,000 | 5,566,100 | 9,276,833 | 4,870,683 | 2,746,920 |
| P2.R08 | Opportunity that less than 100% of line is borne by MTA | 100,000 | 1,000,000 | 1,500,000 | -841,419 | 420,407 |
| P2.R09 | Risk of property price needed to create wetlands | -1,500,000 | 0 | 1,000,000 | -198,093 | 748,518 |
| P2.R10 | Support and setup facility | 0 | 150,000 | 250,000 | 130,190 | 74,854 |
| P2.R11 | Appraisal services ranged | 225,000 | 600,000 | 900,000 | 570,299 | 201,703 |
| P2.R12 | Property acquisition | 5,390,000 | 6,200,000 | 9,440,000 | 7,168,490 | 1,238,817 |
| P2.R13* | Locomotives uncertainty due to exchange rate | 3,500,000 | 6,500,000 | 8,750,000 | 6,202,966 | 1,569,664 |
| P2.R14 | Bid uncertainty | -1,440,000 | 0 | 2,880,000 | 571,169 | 1,299,908 |
| P2.R15 | Overrun on the rehab cars and uncertainty on car condition | 0 | 2,750,000 | 5,400,000 | 2,710,412 | 1,612,231 |
| P2.R16 | Spare parts | 961,200 | 2,352,000 | 5,140,800 | 2,906,516 | 1,257,752 |
| P2.R17 | Variability of engineering services | -1,507,800 | 0 | 1,507,800 | 1 | 900,318 |
| P2.R18* | Escalation | 0 | 3,250,000 | 5,500,000 | 2,853,867 | 1,645,986 |
| Total | | | | | 31,726,377 | 5,033,338 |

* Common risk factor (highlighted in the original table).
The analogous common risks in the two projects are considered to be fully correlated (\(\rho = 1.0\)). Then, using Eq. (12), the correlation coefficient between the costs of the two projects is estimated:

\[ \rho_{X_1,X_2} = \frac{1}{4212318 \times 5033338} \left( 1496243 \times 1436225 + 1051005 \times 1569664 + 1418143 \times 1645986 \right) = 0.289 \tag{15} \]

The result of Eq. (15) indicates that the Pearson Correlation Coefficient between the costs of projects 1 and 2 is 0.289, which is classified as a weak correlation. While the magnitude of correlation should be studied in the context of the application area, correlation coefficients of less than 0.50 are usually considered weak in similar engineering applications (Devore 2012). Pairwise correlation coefficients among project costs are necessary pieces of information that agencies should use when estimating portfolio contingency with probabilistic methods. For instance, Bakhshi (2011) proposed a probabilistic model for calculating contingency in a portfolio of construction projects. In order to reach an accurate contingency budget, the correlation coefficients between project costs are needed in this model. Ignoring the correlation coefficients between project costs, or using incorrect ones, can lead to underestimating or overestimating the portfolio contingency. Therefore, it is indispensable to calculate pairwise project cost correlations in probabilistic portfolio budget estimating techniques. If there are more than two projects in a portfolio, the aforementioned steps are followed and the PMM is employed to calculate the correlation coefficient between the costs of any two projects in the portfolio.

In order to verify the correlation estimated in Eq. (15) and the correctness of the model, we employed Monte Carlo simulation using @Risk (Palisade Corporation 2008) software. The simulation is employed here only to verify the outcome of the model. To this end, the risk registers of the two hypothetical transit projects were modeled and full correlation was defined between the three common risk factors in the two projects.
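The arithmetic of Eq. (15) and the idea of a simulation check can be reproduced with a short script. The following is a simplified sketch, not the @Risk model: it assumes normal marginals for every risk (rather than the distributions elicited in the risk registers), drives each fully correlated pair of common risks with one shared standard normal draw, and lumps all project-specific risks of a project into a single residual term whose variance is the reported total variance minus the common-risk variances. It also shows the two-project portfolio standard deviation with and without the correlation. All names are hypothetical.

```python
import numpy as np

# Total cost standard deviations from the last rows of Tables 2 and 3.
SD_X1, SD_X2 = 4_212_318.0, 5_033_338.0
# Standard deviations of the common risks: P1.R10, P1.R15, P1.R23 ...
COMMON_SD_1 = np.array([1_496_243.0, 1_051_005.0, 1_418_143.0])
# ... and their counterparts P2.R05, P2.R13, P2.R18.
COMMON_SD_2 = np.array([1_436_225.0, 1_569_664.0, 1_645_986.0])

def analytic_rho():
    """Eq. (15): PMM correlation with fully correlated common risks."""
    return float(COMMON_SD_1 @ COMMON_SD_2) / (SD_X1 * SD_X2)

def simulated_rho(n_iter=50_000, seed=0):
    """Monte Carlo estimate under the normal-marginal assumption."""
    rng = np.random.default_rng(seed)
    # One shared driver per analogous pair gives rho = 1 within the pair.
    z = rng.standard_normal((n_iter, COMMON_SD_1.size))
    # Residual std dev of all project-specific risks in each project.
    rest_1 = np.sqrt(SD_X1 ** 2 - np.sum(COMMON_SD_1 ** 2))
    rest_2 = np.sqrt(SD_X2 ** 2 - np.sum(COMMON_SD_2 ** 2))
    # Means are omitted because they do not affect the correlation.
    x1 = z @ COMMON_SD_1 + rest_1 * rng.standard_normal(n_iter)
    x2 = z @ COMMON_SD_2 + rest_2 * rng.standard_normal(n_iter)
    return float(np.corrcoef(x1, x2)[0, 1])

def portfolio_sd(rho):
    """Std dev of X1 + X2 for a given between-project correlation."""
    return float(np.sqrt(SD_X1 ** 2 + SD_X2 ** 2 + 2.0 * rho * SD_X1 * SD_X2))

if __name__ == "__main__":
    rho = analytic_rho()
    print("analytic rho :", round(rho, 3))
    print("simulated rho:", round(simulated_rho(), 3))
    print("portfolio sd, independent:", round(portfolio_sd(0.0)))
    print("portfolio sd, correlated :", round(portfolio_sd(rho)))
```

With these inputs the portfolio standard deviation rises from roughly $6.6 million under independence to roughly $7.4 million once the PMM correlation is recognized, which is the underestimation effect the paper warns about.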
As indicated by the risk registers, the risks were modeled using triangular distributions defined by three given points (5th percentile, most likely, 95th percentile), and the developed model was run for 50,000 iterations. The simulation results indicated a Pearson correlation coefficient of 0.287 between the total costs of the two projects, which is very close to the analytical result.

5 CONCLUSION

One problem facing the modeler in using probabilistic approaches for cost estimating and budget development for a project or a portfolio of projects is estimating the correlation coefficient between cost components (i.e., cost items or project costs). In order to reach a reasonable probabilistic cost estimate, the recognition of pairwise correlation between cost components is vital. Ignoring the dependency among cost components will result in underestimation of the total cost variance. As described, the most common approach is to provide subjective estimates of correlation coefficients. To the best of our knowledge, no method has been suggested in the literature for eliciting the correlation between costs of projects in a portfolio. In this paper, a new method, the PMM, was proposed to help analysts systematically calculate the correlation coefficient between the costs of two projects when no historical data is available. It should be noted that the objective of the method is to help an agency estimate the pairwise correlation between the costs of any two projects in its portfolio. This is a necessary piece of information for calculating portfolio contingency using probabilistic models. The PMM breaks down the cost of projects into a base cost, which is deterministic, and risk costs, which can be either deterministic or probabilistic. It is the risk costs that introduce randomness into the total cost and make it possible to mathematically estimate the correlation between the costs of two projects. Then, employing the risk registers of the projects, an expert identifies the common risk factors between any two projects. Nowadays, for most large projects the risk register is developed in the early stages of the project's life. In agencies that have a template or risk catalogue, identification of common risk factors becomes easier and more accurate. Ultimately, a simple equation is developed to estimate the correlation between project costs using the standard deviations of the identified common risk factors.
The proposed method can be an effective tool for agencies that utilize probabilistic cost estimating techniques for their portfolio of projects where the recognition of pairwise correlation among project costs results in more precise budget estimates.

REFERENCES

Bakhshi, P. (2011). A Bayesian Model for Controlling Cost Overrun in a Portfolio of Construction Projects. PhD Dissertation, Northeastern University, Boston, Massachusetts, United States.

Chau, K. (1995). Monte Carlo simulation of construction costs using subjective data. Construction Management and Economics, 13(5), 369-383.

Cho, S. (2006). An exploratory project expert system for eliciting correlation coefficient and sequential updating of duration estimation. Expert Systems with Applications, 30(4), 553-560.

Devore, J. L. (2012). Probability and Statistics for Engineering and the Sciences. Cengage Learning, Boston, Massachusetts, United States.

Iman, R. and Conover, W. (1982). A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics-Simulation and Computation, 11(3), 311-334.

Iman, R. and Davenport, J. (1982). Rank correlation plots for use with correlated input variables. Communications in Statistics-Simulation and Computation, 11(3), 335-360.

Ince, P. and Buongiono, J. (1991). Multivariate stochastic simulation with subjective multivariate normal distribution. Symposium on System Analysis in Forest Resources, Charleston, South Carolina, United States.

Kurowicka, D. and Cooke, R. (2006). Uncertainty Analysis with High Dimensional Dependence Modeling. Wiley, Hoboken, New Jersey, United States.

Palisade Corporation (2008). @Risk: Risk Analysis Add-in for Microsoft Excel. Ithaca, New York, United States.

Ranasinghe, M. (2000). Impact of correlation and induced correlation on the estimation of project cost of buildings. Construction Management and Economics, 18(4), 395-406.

Ranasinghe, M. and Russel, A. (1992). Treatment of correlation for risk analysis of engineering projects. Civil Engineering Systems, 9(1), 17-39.

Touran, A. (1993). Probabilistic cost estimating with subjective correlations. Journal of Construction Engineering and Management, 119(1), 58-71.

Touran, A. (2006). Owners risk reduction techniques using a CM. Construction Management Association of America, Jones Branch Drive, McLean, Virginia, United States.

Touran, A. and Suphot, L. (1997). Rank correlation in simulating construction costs. Journal of Construction Engineering and Management, 123(3), 297-301.

Touran, A. and Wiser, E. (1992). Monte Carlo technique with correlated random variables. Journal of Construction Engineering and Management, 118(2), 258-272.

Wall, M. D. (1997). Distributions and correlations in Monte Carlo simulation. Construction Management and Economics, 15(3), 241-258.

Wang, W. (2002). Simulation-facilitated model for assessing cost correlations. Computer-Aided Civil and Infrastructure Engineering, 17(5), 368-380.

Wang, W. and Demsetz, L. (2000). Model for evaluating networks under correlated uncertainty-netco. Journal of Construction Engineering and Management, 126(6), 458-466.

Yang, I. (2006). Using Gaussian copula to simulate repetitive projects. Construction Management and Economics, 24(9), 901-909.