Environmental samples below the limits of detection: comparing regression methods to predict environmental concentrations
Daniel Smith, Elana Silver, Martha Harnly
Environmental Health Investigations Branch, California Department of Health Services, Richmond, CA

ABSTRACT

In exposure assessment studies of trace environmental contaminants, investigators often face the difficulty of dealing statistically with chemical values below the laboratory's limits of detection. For example, in a study of pesticides measured in the dust of homes near agricultural fields, the goal is to use characteristics of the household to predict pesticide concentrations, even though many of the measurements are below the detection limit. We illustrate two regression techniques that can be useful in this situation: quantile regression and tobit regression. Both are available in SAS, as PROC QUANTREG and PROC QLIM respectively, yet have not been used extensively by epidemiologists and environmental health scientists. Quantile regression predicts a given quantile, such as the 50th or 90th percentile, rather than the mean as in standard regression, thereby sidestepping uncertainties in the dependent variable at the lower end of the distribution. Tobit regression incorporates this uncertainty, analogous to censoring in survival analysis. Both techniques follow the familiar multiple regression paradigm and can be used to model the effect of several explanatory variables.

INTRODUCTION

Studies of trace environmental contaminants often challenge both the laboratory and the data analyst by having many values near the laboratory's limits of quantitation, and some below these limits. Consider the hypothetical example in the figure below. One hundred random numbers were drawn from a normal distribution with a mean of 10. The entire distribution is shown in the left panel.
If these were concentrations of an environmental compound, and the laboratory's limit of reporting were 7, then 21 of the 100 values in this case would fall below the limit (middle panel). The investigator would know the distribution of observations above 7, but only the number of observations below 7.

One way of dealing with this problem is to substitute a constant for the unknown values (right panel), such as half the detection limit or the detection limit divided by the square root of 2 (Hornung and Reed, 1990). This method works well when there are relatively few below-detect observations, but if the proportion of such values is sizable the distribution will be skewed and no longer normal. Further, the standard deviation may be over- or under-estimated, depending on where the unknown values are heaped. These problems can violate assumptions and bias results when the chemical concentration is to be used as a dependent variable in statistical analysis, such as multiple regression.

Other more complicated methods involve simulating values for the below-detects by sampling from hypothetical distributions, or imputing their values from the known covariates (Lubin et al., 2004). Such procedures take several steps and may not be familiar to the researcher. Two simpler procedures can be used: quantile regression and tobit regression. These analyses can be accomplished in a single analytical step that follows the familiar multiple regression paradigm, and programs are available in popular statistical packages such as SAS, Stata, and R.

METHODS

To illustrate these procedures, we use data from the CHAMACOS study (Center for the Health Assessment of Mothers and Children of Salinas). House dust was sampled from 239 homes in the Salinas Valley of California, an area of intense agricultural activity, and analyzed for pesticides used in nearby fields. Residents of the homes were surveyed about the household members, including whether they were agricultural workers themselves.
The dataset also includes the amounts of pesticides applied on fields near the home, distance to the fields, and meteorological data. The goal is to try to predict pesticide concentrations in the house dust from these data.
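The substitution approach described in the introduction can be sketched in a few lines. The following is a Python illustration rather than SAS, and it is only a sketch: the text gives the mean (10) and the reporting limit (7) of the hypothetical example, while the standard deviation (3.5), the random seed, and the helper name substitute are assumptions made here for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Assumed: mean 10 and limit 7 come from the text; the standard
# deviation of 3.5 is an illustrative guess, not a value from the paper.
values = [random.gauss(10, 3.5) for _ in range(100)]
LIMIT = 7.0


def substitute(xs, limit, fill):
    """Replace every value below the limit with a single constant."""
    return [x if x >= limit else fill for x in xs]


half = substitute(values, LIMIT, LIMIT / 2)          # limit / 2
root2 = substitute(values, LIMIT, LIMIT / 2 ** 0.5)  # limit / sqrt(2)
n_below = sum(x < LIMIT for x in values)

print("values below the limit:", n_below)
for label, xs in [("true", values), ("limit/2", half), ("limit/sqrt(2)", root2)]:
    # Compare the mean and standard deviation under each treatment.
    print(label, round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2))
```

Comparing the printed means and standard deviations shows how heaping the unknown values at one constant changes the summary statistics, and the change grows with the share of values below the limit.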
As expected for a study of this type, many of the pesticide concentrations are below the laboratory's limits of quantitation. For example, diazinon had 34 of 239 samples (14%) below the limit of quantitation (which was 2 ng/g of dust), chlorpyrifos had 13% below, and chlorthal-dimethyl 6% below. DDT (which has not been used in the United States since 1972) was still detectable in some samples, but had 57% of values below the limits.

QUANTILE REGRESSION

Standard linear regression predicts the mean of the dependent variable Y for a particular value of the predictor variable X. In contrast, quantile regression predicts a given quantile of Y, such as the 50th or 90th percentile, for a given X. The 50th percentile of Y is of course the median, and is perhaps of most interest as an analog to the mean of Y. However, there may be other quantiles of interest, depending on the application.

Quantile regression is considered a robust regression procedure, because a quantile such as the median depends on the ranks of the Y values, and not on specific values in the tails of the distribution. In the example above, even though 21% of the values are known only to be less than 7, the median of the entire distribution will be the same no matter what the unknown values really are.

Quantile regression in SAS is accomplished with PROC QUANTREG. QUANTREG is still experimental as of this writing, but is available for SAS 9.1 for Windows on a test basis.

We illustrate quantile regression by modeling the pesticide diazinon. The values below the detection limit can be represented in the data set by zeros or by half the detection limit; the value given to them doesn't matter for computing quantiles, as long as they are less than the detectable values.
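As a small illustration of why the fill value does not matter for the median, the Python sketch below (not SAS) computes the median and the mean of a made-up set of concentrations under two different fill values. The concentrations and the helper names are hypothetical; only the 2 ng/g limit comes from the text.

```python
import statistics

LOD = 2.0  # ng/g, the diazinon limit quoted in the text

# Hypothetical concentrations; None marks a result below the limit.
measured = [None, 5.1, 3.4, None, 12.0, 7.7, 2.6, 4.9, None, 9.3, 6.2]


def filled(xs, fill):
    """Return the data with below-limit results replaced by `fill`."""
    return [fill if x is None else x for x in xs]


def median_with_fill(xs, fill):
    return statistics.median(filled(xs, fill))


def mean_with_fill(xs, fill):
    return statistics.mean(filled(xs, fill))


for fill in (0.0, LOD / 2):
    # The median is identical for both fills; the mean is not.
    print(fill, median_with_fill(measured, fill), round(mean_with_fill(measured, fill), 3))
```

The median stays fixed only while the quantile of interest lies above the censored portion of the data; with 3 of 11 values below the limit here, the median qualifies, but a low quantile such as the 10th percentile would not.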
We use seven predictors: X3_00198 (the amount of pesticide applied in the last two weeks), tempmax_mean and precip_sum (measures of temperature and rainfall), dist (categories for the distance to the nearest field), agwk_nc (number of agricultural workers in the home), wkcloth4 (whether agricultural work clothes are worn or stored in the home), and ipov200 (whether the household income is within 200% of the poverty limit). These analyses are for illustrative purposes only, and do not show the final models used in the CHAMACOS study.

    proc quantreg;
      class dist;
      model Log10diazinon = x3_00198 tempmax_mean precip_sum dist agwk_nc wkcloth4 ipov200
            / quantile = 0.5;
    run;

Most of the QUANTREG syntax is familiar to users of SAS's regression programs. The option quantile = 0.5 specifies that we want to predict the median of Log10diazinon. We use the default procedure for calculating confidence intervals here (inverse rank); there is an option to calculate the intervals by resampling, shown later. Note that it is not necessary to take a log transformation of the diazinon concentration to induce normality, as normality is not an assumption of quantile regression. We do it here to compare results to other models.

    The QUANTREG Procedure

    Model Information
      Data Set                                      MYSAS.PEST04
      Dependent Variable                            Log10_diazinon_1  Log10 diazinon 1
      Number of Independent Variables               7
      Number of Continuous Independent Variables    6
      Number of Class Independent Variables         1
      Number of Observations                        239
      Optimization Algorithm                        Simplex
      Method for Confidence Limits                  Inv_Rank

      Number of Observations Read                   239
      Number of Observations Used                   239

    Class Level Information
      Name    Levels    Values
      DIST
QUANTREG prints out summary statistics. MAD is median absolute deviation, a robust measure of scale.

    Summary Statistics
      Variable        Q1    Median    Q3    Mean    Standard Deviation    MAD
      Log10_diazin

    Quantile and Objective Function
      Quantile                   0.5
      Objective Function
      Predicted Value at Mean

    Parameter Estimates
      Parameter       DF    Estimate    95% Confidence Limits
      Intercept
      x3_
      tempmax_mean
      precip_sum
      DIST
      DIST
      DIST
      agwk_nc
      WKCLOTH
      ipov

The interpretation of the regression coefficients is similar to other regression analyses. For example, the increase in the median Log10diazinon concentration associated with having work clothes in the home is estimated as 0.22 (95% confidence interval ). Transforming back to the original diazinon scale, this corresponds to multiplying the median diazinon concentration by 1.66 (95% confidence interval ).

The Stata program for quantile regression (called qreg) provides a goodness-of-fit measure for the model, referred to as pseudo-R². This is simply the proportional reduction in the sum of the absolute values of the residuals. The same statistic can be calculated from PROC QUANTREG by running the regression twice, once with the intercept only and once with the full model, and comparing the sums of the absolute residuals.

    proc quantreg data = mydata;
      model log10diazinon = / quantile = 0.5;   *Intercept only model;
      output out = intmodel res = resint;       *Save the data with residuals;
    run;

    proc quantreg data = intmodel;              *Bring in saved data to run full model;
      class dist;
      model Log10diazinon = x3_00198 tempmax_mean precip_sum dist agwk_nc wkcloth4 ipov200
            / quantile = 0.5;                   *Full model;
      output out = fullmodel res = resfull;     *Save these residuals too;
    run;

    data _null_;
      set fullmodel end = lastrow;              *Bring in the results of the previous run;
      sresfull + abs(resfull);                  *Sum up the absolute values of both residuals;
      sresint + abs(resint);
      if lastrow then do;
        pseudor2 = (sresint - sresfull)/sresint;
        put pseudor2 = ;
      end;
    run;
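The arithmetic behind this pseudo-R² can be sketched outside SAS as well. The Python function below is a minimal illustration for the median case only, under the assumption that the intercept-only median regression fits the sample median; the function name and the toy numbers are hypothetical.

```python
import statistics


def pseudo_r2(y, fitted):
    """Proportional reduction in the sum of absolute residuals.

    The intercept-only median regression is taken to fit the sample
    median, so its residuals are y - median(y).
    """
    med = statistics.median(y)
    null_dev = sum(abs(v - med) for v in y)                 # intercept-only model
    full_dev = sum(abs(v - f) for v, f in zip(y, fitted))   # full model
    return (null_dev - full_dev) / null_dev


# Toy data: observed values and made-up fitted medians from some model.
y = [1.0, 2.0, 3.0, 4.0, 10.0]
fitted = [1.5, 2.0, 3.0, 3.5, 8.0]
print(round(pseudo_r2(y, fitted), 4))
```

For quantiles other than the median, the absolute residuals would need to be replaced by the asymmetrically weighted residuals that define that quantile's objective function.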
Another common goodness-of-fit statistic is the squared correlation coefficient between the observed and the predicted values. This is easily accomplished by adding predicted = pred to the output statement of the full model above, then invoking PROC CORR:

    proc corr data = fullmodel;
      var Log10diazinon pred;
    run;

Finally, ODS graphics can provide a display of the relationship of the predictor variables to all the quantiles.

    ods rtf;
    ods graphics on;
    proc quantreg ci=resampling;
      class dist;
      model Log10_diazinon_1 = tempmax_mean precip_sum dist agwk_nc wkcloth4 ipov200
            / quantile=all plot=quantplot (wkcloth4) / unpackpanel;
    run;
    ods graphics off;
    ods rtf close;

We specify that we want the coefficient for wearing work clothes in the home plotted for all quantiles, with confidence intervals created by resampling. The plot provides the regression coefficient for WKCLOTH4 across all quantiles of Log10diazinon. Through most of the range, the coefficient is around 0.2, as we saw above in the regression on the median. The null part of the curve for quantiles less than the 15th percentile corresponds to the range where all the values were below the detection limit. Between the 15th and 20th percentiles, the regression coefficient is high, indicating that wearing or storing work clothes in the home has a greater impact on diazinon for homes with relatively lower concentrations.

A discussion of quantile regression with examples from ecology is provided in the review by Cade and Noon (2003).
TOBIT REGRESSION

Tobit regression (named for Nobel prize-winning economist James Tobin) treats observations below the limit of detection as censored values lying somewhere between zero and the detection limit, and adjusts the variance accordingly (Lubin et al., 2004). This can be viewed as survival analysis with left-censored data, and indeed tobit regression can be accomplished with a survival analysis program such as PROC LIFEREG:

    proc lifereg;
      class distnew;
      model (logdiazlower, logdiazupper) = x3_00198 tempmax_mean precip_sum distnew agwk_nc wkcloth4 ipov200
            / distribution = normal;
    run;

Here, the program requires us to create two variables for our pesticide concentration, logdiazlower and logdiazupper. The two variables have identical values for each observation above the detection limit. But for observations below the detection limit, logdiazlower is set to missing (.) and logdiazupper is set to the detection limit. In other words, we tell the program that for these observations, the unknown value is bracketed between nothingness and the detection limit.(1) Tobit regression assumes that the dependent variable is normally distributed (unlike quantile regression, which doesn't care about the distribution), so we specify distribution = normal, and have log transformed our dependent variable in this case.

The output informs us that we have 34 left-censored observations (the below-detects).

    The LIFEREG Procedure

    Model Information
      Data Set                       WORK.FORTOBIT
      Dependent Variable             log10lower
      Dependent Variable             log10upper
      Number of Observations         239
      Noncensored Values             205
      Right Censored Values          0
      Left Censored Values           34
      Interval Censored Values       0
      Name of Distribution           Normal
      Log Likelihood

      Number of Observations Read    239
      Number of Observations Used    239

    Class Level Information
      Name       Levels    Values
      distnew

    Algorithm converged.

(1) Tobin's original 1958 paper begins with an ode to nothingness.
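The lower/upper coding and the way a censored observation enters the likelihood can be sketched as follows. This is a Python illustration, not the LIFEREG implementation: it assumes an intercept-only normal model, the concentrations are made up, and the function names are hypothetical. Only the 2 ng/g limit comes from the text.

```python
import math

LOG_LIMIT = math.log10(2.0)  # log10 of the 2 ng/g limit from the text


def interval(conc):
    """Code one result as (lower, upper); None plays the role of SAS missing."""
    if conc is None:                 # below the limit: left censored
        return (None, LOG_LIMIT)
    x = math.log10(conc)
    return (x, x)                    # detected: both bounds are identical


def norm_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def norm_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def tobit_loglik(pairs, mu, sigma):
    """Log-likelihood of an intercept-only normal model with left censoring."""
    total = 0.0
    for lower, upper in pairs:
        if lower is None:
            # Censored: all we know is that the value lies below `upper`.
            total += math.log(norm_cdf(upper, mu, sigma))
        else:
            # Observed: the usual normal density contribution.
            total += norm_logpdf(lower, mu, sigma)
    return total


# Hypothetical concentrations in ng/g; None marks a below-limit result.
concs = [None, 3.5, 8.0, None, 21.0, 5.2, 14.0, 2.4]
pairs = [interval(c) for c in concs]
print(round(tobit_loglik(pairs, mu=0.7, sigma=0.4), 3))
```

A full tobit fit would maximize this function over the regression coefficients and sigma; the sketch only evaluates it at one arbitrary pair of parameter values.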
The output again follows the familiar multiple regression format.

    Analysis of Parameter Estimates
      Parameter       DF    Estimate    Standard Error    95% Confidence Limits    Chi-Square    Pr > ChiSq
      Intercept
      x3_
      tempmax_mean
      precip_sum
      distnew
      distnew
      distnew
      agwk_nc
      WKCLOTH
      ipov

Comparing the estimate for the work clothes variable (WKCLOTH4), we see a somewhat larger regression coefficient than with quantile regression. Bear in mind, however, that quantile regression predicts the median and tobit regression predicts the mean, so the results are not strictly comparable.

SAS also provides a tobit regression program in SAS/ETS, called QLIM. It uses different terminology but gives the same results.

    proc qlim;
      class distnew;
      model Log10_diazinon = x3_00198 tempmax_mean precip_sum distnew agwk_nc wkcloth4 ipov200;
      endogenous Log10_diazinon ~ censored(lb = 0.30);
    run;

In the endogenous statement, we tell QLIM that the value of Log10diazinon is censored below the lower bound of 0.30 (which is log10 of the detection limit, 2 ng/g).

    The QLIM Procedure

    Summary Statistics of Continuous Responses
      Variable           Mean    Standard Error    Type        Lower Bound    Upper Bound    N Obs Lower Bound    N Obs Upper Bound
      Log10_diazinon_                              Censored

    Class Level Information
      Class      Levels    Values
      distnew

    Model Fit Summary
      Number of Endogenous Variables    1
      Endogenous Variable               Log10_diazinon_1
      Number of Observations            239
      Log Likelihood
      Maximum Absolute Gradient
      Number of Iterations              15
      AIC
      Schwarz Criterion
    Parameter Estimates
      Parameter       Estimate    Standard Error    t Value    Approx Pr > |t|
      Intercept
      x3_
      tempmax_mean
      precip_sum
      distnew
      distnew
      distnew
      agwk_nc
      WKCLOTH
      ipov

The same goodness-of-fit measures described above for quantile regression can be calculated for tobit regression, although the syntax for the output statement of QLIM differs from that of QUANTREG.

DISCUSSION

To compare results between the methods, we focus again on wearing or storing work clothes in the home, and summarize various regressions for three of the pesticides. These pesticides have different numbers of observations below detection limits. The regression coefficients predicting log pesticide concentration have been exponentiated to provide a multiplier of the mean or median pesticide concentration.

We compare quantile and tobit regressions to a linear regression where values below the detection limit were replaced with half the detection limit. For both diazinon and chlorthal-dimethyl, quantile regression on the medians provided lower point estimates than linear regression on the mean, despite the fact that a univariate examination of these variables showed medians close to the means. Quantile regression on the median of DDT is not possible because more than half the observations were below the detection limit.

We note that the confidence intervals for quantile regression vary depending on the choice of the method used to calculate them. Little guidance is provided in the SAS documentation. Furthermore, we have observed (results not shown) that SAS's QUANTREG and the quantile regression programs in Stata and R all produce somewhat different confidence intervals, indicating that the programs are using different algorithms.

Generally, tobit regression provided point estimates closer to linear regression, but with varying confidence intervals depending on the proportion of below-detect values (wider as the proportion of nondetects increases).
For DDT, tobit regression gives a much higher point estimate than linear regression, where 57% of the DDT observations are heaped at one low value (half the detection limit). In addition, the confidence interval for DDT is much wider from tobit than from linear regression (the standard error is doubled). This reflects the true uncertainty from having so many of the observations censored, a fact that is masked in linear regression when substituting half the detection limit. The tobit regression confidence intervals from SAS's LIFEREG and QLIM also agree with those from Stata and R.

Table. Estimated increase in pesticide house dust concentration from wearing work clothes in the home versus not (1)

                                                    Diazinon         Chlorthal-dimethyl    DDT
                                                    14% < Limit      6% < Limit            57% < Limit
                                                    of Detection     of Detection          of Detection
    Linear regression
      Values below detection limit
      replaced with 1/2 limit                       2.09 ( )              ( )              2.14 ( )
    Quantile regression on median
      Inverse rank confidence interval (default)    1.66 ( )         1.45 ( )
      Bootstrap confidence interval (n=500)         1.66 ( )         1.45 ( )
    Tobit                                           2.15 ( )         1.55 ( )              4.79 ( )

(1) Ninety-five percent confidence limits given in parentheses.
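The multipliers reported in the table come from back-transforming coefficients estimated on the log10 scale. As a minimal Python sketch (the function name is hypothetical), the median-regression coefficient of 0.22 quoted earlier for work clothes converts to the multiplier of 1.66:

```python
def multiplier(log10_coef):
    """Convert a coefficient on the log10 scale to a multiplicative effect."""
    return 10 ** log10_coef


# Coefficient for work clothes from the median regression in the text.
print(round(multiplier(0.22), 2))  # 1.66
```

The same transformation applied to each confidence limit gives the limits of the multiplier, since raising 10 to a power preserves order.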
CONCLUSION

Both quantile and tobit regression can be useful for situations where some values of the dependent variable are censored or below detection limits. Quantile regression is a robust procedure that can be used to predict any quantile of interest, although the lack of a standard procedure for calculating the variance of the estimates can produce inconsistent results across different programs. Tobit regression has a longer history of application and produces consistent results across various implementations. It is limited, however, to estimating the mean. It also requires the assumption of normality of the dependent variable, as with linear regression.

REFERENCES

Cade BS, Noon BR. 2003. A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment 1.

Hornung RW, Reed LD. 1990. Estimation of average concentration in the presence of nondetectable values. Applied Occupational and Environmental Hygiene 5.

Lubin JH, Colt JS, Camann D, et al. 2004. Epidemiologic evaluation of measurement data in the presence of detection limits. Environmental Health Perspectives 112.

StataCorp. Stata Statistical Software: Release 9. College Station, TX: StataCorp LP.

Tobin J. 1958. Estimation of relationships for limited dependent variables. Econometrica 26.

ACKNOWLEDGMENTS

We thank the CHAMACOS investigators Brenda Eskenazi and Asa Bradman for use of a portion of their data for illustrating these procedures. The CHAMACOS study was jointly funded by the U.S. Environmental Protection Agency grant RD and National Institute of Environmental Health Sciences grant PO1 ES. Its contents do not necessarily represent the official views of the funding agencies.

CONTACT INFORMATION

Contact the author at:
Daniel Smith
Environmental Health Investigations Branch
California Department of Health Services
850 Marina Bay Parkway, Building P, 3rd Floor
Richmond, CA
DSmith2@dhs.ca.gov

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
More informationR. Kerry 1, M. A. Oliver 2. Telephone: +1 (801) Fax: +1 (801)
The Effects of Underlying Asymmetry and Outliers in data on the Residual Maximum Likelihood Variogram: A Comparison with the Method of Moments Variogram R. Kerry 1, M. A. Oliver 2 1 Department of Geography,
More informationStat 328, Summer 2005
Stat 328, Summer 2005 Exam #2, 6/18/05 Name (print) UnivID I have neither given nor received any unauthorized aid in completing this exam. Signed Answer each question completely showing your work where
More informationLloyds TSB. Derek Hull, John Adam & Alastair Jones
Forecasting Bad Debt by ARIMA Models with Multiple Transfer Functions using a Selection Process for many Candidate Variables Lloyds TSB Derek Hull, John Adam & Alastair Jones INTRODUCTION: No statistical
More information574 Flanders Drive North Woodmere, NY ~ fax
DM STAT-1 CONSULTING BRUCE RATNER, PhD 574 Flanders Drive North Woodmere, NY 11581 br@dmstat1.com 516.791.3544 ~ fax 516.791.5075 www.dmstat1.com The Missing Statistic in the Decile Table: The Confidence
More informationQuantile regression and surroundings using SAS
Appendix B Quantile regression and surroundings using SAS Introduction This appendix is devoted to the presentation of the main commands available in SAS for carrying out a complete data analysis, that
More informationVARIANCE ESTIMATION FROM CALIBRATED SAMPLES
VARIANCE ESTIMATION FROM CALIBRATED SAMPLES Douglas Willson, Paul Kirnos, Jim Gallagher, Anka Wagner National Analysts Inc. 1835 Market Street, Philadelphia, PA, 19103 Key Words: Calibration; Raking; Variance
More informationtm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6}
PS 4 Monday August 16 01:00:42 2010 Page 1 tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6} log: C:\web\PS4log.smcl log type: smcl opened on:
More information**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:
**BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,
More informationModel Construction & Forecast Based Portfolio Allocation:
QBUS6830 Financial Time Series and Forecasting Model Construction & Forecast Based Portfolio Allocation: Is Quantitative Method Worth It? Members: Bowei Li (303083) Wenjian Xu (308077237) Xiaoyun Lu (3295347)
More informationThe Baumol-Tobin and the Tobin Mean-Variance Models of the Demand
Appendix 1 to chapter 19 A p p e n d i x t o c h a p t e r An Overview of the Financial System 1 The Baumol-Tobin and the Tobin Mean-Variance Models of the Demand for Money The Baumol-Tobin Model of Transactions
More informationyuimagui: A graphical user interface for the yuima package. User Guide yuimagui v1.0
yuimagui: A graphical user interface for the yuima package. User Guide yuimagui v1.0 Emanuele Guidotti, Stefano M. Iacus and Lorenzo Mercuri February 21, 2017 Contents 1 yuimagui: Home 3 2 yuimagui: Data
More informationMaximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017
Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 0, 207 [This handout draws very heavily from Regression Models for Categorical
More informationLecture 21: Logit Models for Multinomial Responses Continued
Lecture 21: Logit Models for Multinomial Responses Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University
More informationInternet Appendix. The survey data relies on a sample of Italian clients of a large Italian bank. The survey,
Internet Appendix A1. The 2007 survey The survey data relies on a sample of Italian clients of a large Italian bank. The survey, conducted between June and September 2007, provides detailed financial and
More informationThe Great Moderation Flattens Fat Tails: Disappearing Leptokurtosis
The Great Moderation Flattens Fat Tails: Disappearing Leptokurtosis WenShwo Fang Department of Economics Feng Chia University 100 WenHwa Road, Taichung, TAIWAN Stephen M. Miller* College of Business University
More informationThe Fundamentals of Reserve Variability: From Methods to Models Central States Actuarial Forum August 26-27, 2010
The Fundamentals of Reserve Variability: From Methods to Models Definitions of Terms Overview Ranges vs. Distributions Methods vs. Models Mark R. Shapland, FCAS, ASA, MAAA Types of Methods/Models Allied
More information1) 3 points Which of the following is NOT a measure of central tendency? a) Median b) Mode c) Mean d) Range
February 19, 2004 EXAM 1 : Page 1 All sections : Geaghan Read Carefully. Give an answer in the form of a number or numeric expression where possible. Show all calculations. Use a value of 0.05 for any
More informationCHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
More informationData screening, transformations: MRC05
Dale Berger Data screening, transformations: MRC05 This is a demonstration of data screening and transformations for a regression analysis. Our interest is in predicting current salary from education level
More informationbook 2014/5/6 15:21 page 261 #285
book 2014/5/6 15:21 page 261 #285 Chapter 10 Simulation Simulations provide a powerful way to answer questions and explore properties of statistical estimators and procedures. In this chapter, we will
More informationRescaling results of nonlinear probability models to compare regression coefficients or variance components across hierarchically nested models
Rescaling results of nonlinear probability models to compare regression coefficients or variance components across hierarchically nested models Dirk Enzmann & Ulrich Kohler University of Hamburg, dirk.enzmann@uni-hamburg.de
More informationChapter 6 Part 3 October 21, Bootstrapping
Chapter 6 Part 3 October 21, 2008 Bootstrapping From the internet: The bootstrap involves repeated re-estimation of a parameter using random samples with replacement from the original data. Because the
More informationPresented at the 2012 SCEA/ISPA Joint Annual Conference and Training Workshop -
Applying the Pareto Principle to Distribution Assignment in Cost Risk and Uncertainty Analysis James Glenn, Computer Sciences Corporation Christian Smart, Missile Defense Agency Hetal Patel, Missile Defense
More informationis the bandwidth and controls the level of smoothing of the estimator, n is the sample size and
Paper PH100 Relationship between Total charges and Reimbursements in Outpatient Visits Using SAS GLIMMIX Chakib Battioui, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is
More informationBloomberg. Portfolio Value-at-Risk. Sridhar Gollamudi & Bryan Weber. September 22, Version 1.0
Portfolio Value-at-Risk Sridhar Gollamudi & Bryan Weber September 22, 2011 Version 1.0 Table of Contents 1 Portfolio Value-at-Risk 2 2 Fundamental Factor Models 3 3 Valuation methodology 5 3.1 Linear factor
More informationChapter IV. Forecasting Daily and Weekly Stock Returns
Forecasting Daily and Weekly Stock Returns An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts -for support rather than for illumination.0 Introduction In the previous chapter,
More informationSMALL AREA ESTIMATES OF INCOME: MEANS, MEDIANS
SMALL AREA ESTIMATES OF INCOME: MEANS, MEDIANS AND PERCENTILES Alison Whitworth (alison.whitworth@ons.gsi.gov.uk) (1), Kieran Martin (2), Cruddas, Christine Sexton, Alan Taylor Nikos Tzavidis (3), Marie
More informationProbability & Statistics Modular Learning Exercises
Probability & Statistics Modular Learning Exercises About The Actuarial Foundation The Actuarial Foundation, a 501(c)(3) nonprofit organization, develops, funds and executes education, scholarship and
More informationFinancial Econometrics
Financial Econometrics Introduction to Financial Econometrics Gerald P. Dwyer Trinity College, Dublin January 2016 Outline 1 Set Notation Notation for returns 2 Summary statistics for distribution of data
More informationLong Run Stock Returns after Corporate Events Revisited. Hendrik Bessembinder. W.P. Carey School of Business. Arizona State University.
Long Run Stock Returns after Corporate Events Revisited Hendrik Bessembinder W.P. Carey School of Business Arizona State University Feng Zhang David Eccles School of Business University of Utah May 2017
More informationChapter 8 Statistical Intervals for a Single Sample
Chapter 8 Statistical Intervals for a Single Sample Part 1: Confidence intervals (CI) for population mean µ Section 8-1: CI for µ when σ 2 known & drawing from normal distribution Section 8-1.2: Sample
More informationCalculating the Probabilities of Member Engagement
Calculating the Probabilities of Member Engagement by Larry J. Seibert, Ph.D. Binary logistic regression is a regression technique that is used to calculate the probability of an outcome when there are
More information1. Overall approach to the tool development
Poverty Assessment Tool Submission USAID/IRIS Tool for Serbia Submitted: June 27, 2008 Updated: February 15, 2013 (text clarification; added decimal values to coefficients) The following report is divided
More informationJaime Frade Dr. Niu Interest rate modeling
Interest rate modeling Abstract In this paper, three models were used to forecast short term interest rates for the 3 month LIBOR. Each of the models, regression time series, GARCH, and Cox, Ingersoll,
More informationQuantile Regression due to Skewness. and Outliers
Applied Mathematical Sciences, Vol. 5, 2011, no. 39, 1947-1951 Quantile Regression due to Skewness and Outliers Neda Jalali and Manoochehr Babanezhad Department of Statistics Faculty of Sciences Golestan
More informationStatistics and Finance
David Ruppert Statistics and Finance An Introduction Springer Notation... xxi 1 Introduction... 1 1.1 References... 5 2 Probability and Statistical Models... 7 2.1 Introduction... 7 2.2 Axioms of Probability...
More informationWestfield Boulevard Alternative
Westfield Boulevard Alternative Supplemental Concept-Level Economic Analysis 1 - Introduction and Alternative Description This document presents results of a concept-level 1 incremental analysis of the
More informationStat 401XV Exam 3 Spring 2017
Stat 40XV Exam Spring 07 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More informationAssessing the reliability of regression-based estimates of risk
Assessing the reliability of regression-based estimates of risk 17 June 2013 Stephen Gray and Jason Hall, SFG Consulting Contents 1. PREPARATION OF THIS REPORT... 1 2. EXECUTIVE SUMMARY... 2 3. INTRODUCTION...
More informationSTATISTICAL DISTRIBUTIONS AND THE CALCULATOR
STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either
More informationThis homework assignment uses the material on pages ( A moving average ).
Module 2: Time series concepts HW Homework assignment: equally weighted moving average This homework assignment uses the material on pages 14-15 ( A moving average ). 2 Let Y t = 1/5 ( t + t-1 + t-2 +
More informationComparing Odds Ratios and Marginal Effects from Logistic Regression and Linear Probability Models
Western Kentucky University From the SelectedWorks of Matt Bogard Spring March 11, 2016 Comparing Odds Ratios and Marginal Effects from Logistic Regression and Linear Probability Models Matt Bogard Available
More informationDuration Models: Parametric Models
Duration Models: Parametric Models Brad 1 1 Department of Political Science University of California, Davis January 28, 2011 Parametric Models Some Motivation for Parametrics Consider the hazard rate:
More informationConfidence Intervals for an Exponential Lifetime Percentile
Chapter 407 Confidence Intervals for an Exponential Lifetime Percentile Introduction This routine calculates the number of events needed to obtain a specified width of a confidence interval for a percentile
More informationECON Introductory Econometrics. Lecture 1: Introduction and Review of Statistics
ECON4150 - Introductory Econometrics Lecture 1: Introduction and Review of Statistics Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 1-2 Lecture outline 2 What is econometrics? Course
More information1. Overall approach to the tool development
Poverty Assessment Tool Submission USAID/IRIS Tool for Ethiopia Submitted: September 24, 2008 Revised (correction to 2005 PPP): December 17, 2009 The following report is divided into six sections. Section
More informationWeb Science & Technologies University of Koblenz Landau, Germany. Lecture Data Science. Statistics and Probabilities JProf. Dr.
Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics and Probabilities JProf. Dr. Claudia Wagner Data Science Open Position @GESIS Student Assistant Job in Data
More informationWage Determinants Analysis by Quantile Regression Tree
Communications of the Korean Statistical Society 2012, Vol. 19, No. 2, 293 301 DOI: http://dx.doi.org/10.5351/ckss.2012.19.2.293 Wage Determinants Analysis by Quantile Regression Tree Youngjae Chang 1,a
More informationEstimating the Current Value of Time-Varying Beta
Estimating the Current Value of Time-Varying Beta Joseph Cheng Ithaca College Elia Kacapyr Ithaca College This paper proposes a special type of discounted least squares technique and applies it to the
More information