NCC5010: Data Analytics and Modeling Spring 2015 Exemption Exam

Do not look at other pages until instructed to do so. The time limit is two hours. This exam consists of 6 problems. Do all of your work in the space provided on this exam. The exam is closed book. You may not use any materials, including internet resources. You may use a calculator, but not a personal computer.

Name, printed:
Cornell ID:
Briefly explain below your previous experience in data analytics and modeling:

Your signature below signifies that you understand and will abide by the Cornell Code of Academic Integrity.
Sign Here:

1. A local bank reviewed its credit card policy with the intention of recalling some of its credit cards. In the past, 5% of cardholders defaulted, leaving the bank unable to collect the outstanding balance. The bank found that the probability of missing a monthly payment is 0.10 for customers who do not default (those customers make the payment eventually). Of course, the probability of missing a monthly payment for those who default is 1.
a) Are defaulting and missing a monthly payment two independent events? Show calculations which justify your answer.
b) Are defaulting and missing a monthly payment two mutually exclusive events? Explain.
c) Given that a customer missed a monthly payment, compute the probability that the customer will default.

a. They are dependent. For example, P(Miss | Def) = 1.0, whereas P(Miss) < 1.0; independence would require these two probabilities to be equal. P(Miss) requires a calculation (law of total probability):
P(Miss) = P(Miss | Def)*P(Def) + P(Miss | No Def)*P(No Def) = 1.0*0.05 + 0.1*0.95 = 0.145
b. No; they can both happen at the same time (every defaulting customer misses a payment). Default and no default, by contrast, are mutually exclusive.
c. P(Def | Miss) = P(Def and Miss)/P(Miss) = 0.05/0.145 = 0.345, using the joint probability P(Def and Miss) = 1.0*0.05 = 0.05 from part a.
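As a check on the arithmetic in parts a and c, here is a minimal Python sketch that applies the law of total probability and Bayes' rule to the probabilities stated in the problem. The variable names are illustrative only.

```python
# Probabilities stated in Problem 1
p_default = 0.05                # P(Def)
p_miss_given_default = 1.0      # P(Miss | Def)
p_miss_given_no_default = 0.10  # P(Miss | No Def)

# Law of total probability: P(Miss)
p_miss = (p_miss_given_default * p_default
          + p_miss_given_no_default * (1 - p_default))

# Joint probability P(Def and Miss), then Bayes' rule for P(Def | Miss)
p_default_and_miss = p_miss_given_default * p_default
p_default_given_miss = p_default_and_miss / p_miss

# Independence would require P(Miss | Def) to equal P(Miss)
independent = abs(p_miss_given_default - p_miss) < 1e-12

print(f"P(Miss)       = {p_miss:.3f}")
print(f"P(Def | Miss) = {p_default_given_miss:.3f}")
print(f"Independent?  {independent}")
```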

2. A taxi driver is considering renting snow tires in preparation for a big snow storm. The rental cost is $150 in total, and the tires can be rented for this storm only. If the storm materializes, the driver will make $200 in addition to the $300 he would normally make on regular days. He cannot drive (or make any money) in the storm without snow tires. If he decides to rent the snow tires, he must do so several days before the storm hits. What should the probability of stormy weather be to justify renting? Show the calculation.

Rent: He makes either $500 − $150 = $350 (storm) or $300 − $150 = $150 (no storm).
Expected payoff = P(storm)*350 + (1 − P(storm))*150
Not rent: He makes either $0 (storm) or $300 (no storm).
Expected payoff = P(storm)*0 + (1 − P(storm))*300
To break even: P(storm)*350 + (1 − P(storm))*150 = (1 − P(storm))*300
Solving: 200*P(storm) + 150 = 300 − 300*P(storm), so P(storm) = 150/500 = 0.3.
P(storm) must be > 0.3 to justify renting.
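A minimal Python sketch of the expected-payoff comparison, assuming the dollar amounts in the problem and a driver who simply maximizes expected payoff (the same assumption the break-even calculation makes). The function name `expected_payoffs` is hypothetical.

```python
def expected_payoffs(p_storm: float) -> tuple[float, float]:
    """Return (rent, not_rent) expected payoffs in dollars for a storm probability."""
    rent_cost = 150
    normal_income = 300
    storm_bonus = 200
    # Renting: storm -> 500 - 150 = 350, no storm -> 300 - 150 = 150
    rent = (p_storm * (normal_income + storm_bonus - rent_cost)
            + (1 - p_storm) * (normal_income - rent_cost))
    # Not renting: storm -> 0 (cannot drive), no storm -> 300
    not_rent = p_storm * 0 + (1 - p_storm) * normal_income
    return rent, not_rent


# Break-even: 350p + 150(1 - p) = 300(1 - p)  =>  500p = 150
break_even = 150 / 500

for p in (0.2, break_even, 0.4):
    rent, not_rent = expected_payoffs(p)
    print(f"P(storm)={p:.2f}: rent={rent:.2f}, not rent={not_rent:.2f}")
```

Below 0.3 not renting has the higher expected payoff, at 0.3 the two are equal ($210 each), and above 0.3 renting wins.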

3. The average time that an employee works at a call center, before leaving the company, is being studied. Suppose that you have been given the results as a confidence interval. You have been asked to explain those results to your CEO.
a) Explain what a confidence interval means (make up numbers as necessary).
b) After you present the results, the CEO is not satisfied, saying that the results are not precise enough. She wants the width of the confidence interval to be cut in half. What would you do to achieve the CEO's required level of precision?

a. The true value of a parameter (such as the long-run average time spent at a call center) cannot be known from a sample. Suppose the sample average was 100 days. We might be able to say that we are 95% sure that the range from 82 to 118 days contains the true long-run average. This is actually a loose statement. Officially, we can say that 95% of the intervals constructed the way this one was would contain the true parameter value.
b. The best answer is to quadruple the sample size: the standard error of the sample mean is proportional to 1/√n, so with four times as many observations we expect the standard error, and hence the width of the confidence interval, to become half as large. Alternatively, if we had constructed a 99.5% (or other very high) confidence interval, we could instead construct an 80% confidence interval, which is about half as wide (z-values of 1.28 versus 2.81, a ratio of about 0.46), at the cost of lower confidence.
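The two options in part b can be compared numerically. The sketch below assumes a normal-theory interval with half-width z·σ/√n and uses a hypothetical σ = 92 days with n = 100, chosen only so that the 95% interval is about 100 ± 18 days, matching the made-up numbers in part a.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sigma: float, n: int, confidence: float) -> float:
    """Half-width z * sigma / sqrt(n) of a two-sided normal-theory CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / sqrt(n)


sigma = 92.0  # hypothetical standard deviation of tenure, in days
n = 100       # hypothetical sample size

base = ci_half_width(sigma, n, 0.95)
print(f"95% CI: 100 ± {base:.1f} days")

# Option 1: quadruple the sample size at the same confidence level
print(ci_half_width(sigma, 4 * n, 0.95) / base)

# Option 2: drop from 99.5% to 80% confidence with the same sample size
print(ci_half_width(sigma, n, 0.80) / ci_half_width(sigma, n, 0.995))
```

Quadrupling n halves the width at any confidence level; lowering the confidence level narrows the interval without new data, but the interval is then less likely to cover the true mean.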

4. A manufactured part is designed to have two holes in it, and the distance between the holes is supposed to be exactly 10 mm. However, the machine varies in accuracy, and the standard deviation of the distance between holes is known to be 0.04 mm. Assume that the distance between holes does NOT have the normal probability distribution. A sample of 100 parts will be used to test whether the machine is correctly adjusted for a 10 mm separation of holes.
a) Briefly explain the Type I and Type II errors that might occur in testing this hypothesis. Your explanation should be in terms of the question above. No numbers or formulas are necessary.
b) Why is it acceptable to use the normal probability distribution to describe the distribution of the sample average in this case?
c) Based on the numbers given above, give a numerical value for the standard error of the sample average distance between holes.

a. The null hypothesis would be that the mean distance = 10 mm. A Type I error would be believing that the null hypothesis is false when it is true: concluding that the machine is out of adjustment when it is actually correctly set for 10 mm. A Type II error would be believing the null hypothesis is true when it is false: concluding that the machine is correctly adjusted when its mean distance is actually not 10 mm. Believing the null hypothesis is true is more often called accepting (more precisely, failing to reject) the hypothesis.
b. Because of the central limit theorem. For most distributions of the data, it is reasonable to assume that sample means are approximately normally distributed once the sample is reasonably large. With n = 100, the sample mean will be approximately normal for virtually any distribution of the data with finite variance, unless it is extremely skewed or heavy-tailed.
c. Since we know that σ = 0.04 mm, the standard error of the sample mean is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.04}{\sqrt{100}} = 0.004 \text{ mm}$$
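The following Python sketch computes the standard error in part c and, to illustrate part b, simulates sample means of n = 100 from a non-normal distribution. The uniform distribution, the seed, and the 2,000 replications are assumptions chosen for illustration; the exam does not specify the shape of the distribution.

```python
import random
from math import sqrt
from statistics import mean, stdev

SIGMA = 0.04  # known standard deviation of hole spacing (mm)
N = 100       # parts per sample

# Part c: standard error of the sample mean
standard_error = SIGMA / sqrt(N)
print(f"standard error = {standard_error:.4f} mm")

# Part b illustration (assumed distribution, not given in the exam):
# spacings uniform on [10 - a, 10 + a], whose standard deviation is a / sqrt(3),
# so a = SIGMA * sqrt(3) gives the required 0.04 mm.
half_range = SIGMA * sqrt(3)
rng = random.Random(2015)  # fixed seed so the run is reproducible

sample_means = [
    sum(rng.uniform(10 - half_range, 10 + half_range) for _ in range(N)) / N
    for _ in range(2000)
]
print(f"mean of sample means  = {mean(sample_means):.4f} mm")
print(f"stdev of sample means = {stdev(sample_means):.4f} mm")
```

The spread of the simulated sample means should come out close to the theoretical 0.004 mm, and a histogram of them would look approximately normal even though each individual spacing is uniform.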

5. A real estate investor has estimated the following relationship for properties in a city of 100,000 people in the Midwest region of the United States (p-values given in parentheses):
$$y = \underset{(0.000)}{2890} + \underset{(0.121)}{1.1}\,x_1 + \underset{(0.002)}{0.9}\,x_2 + \underset{(0.031)}{24.0}\,x_3$$
where y is the sale price ($), x₁ is the appraised land value ($), x₂ is the appraised house value ($), and x₃ is the area of living space (square feet). The R-square was 0.78, based on a sample of 55 properties.
a) Is there a statistically significant relationship between y and x₁ at the 5% significance level? Explain.
b) Give an interpretation of the R-square value given above.
c) The investor is interested in properties with appraised land value of $100,000, appraised house value of $200,000, and 3,000 square feet of living space. What sales price does the model predict for such a property?
d) In order for the statistical test in part a) to be valid, a series of regression assumptions must be satisfied. State two of those assumptions.

a. No; the p-value for x₁ is 0.121, which is greater than 0.05, so we cannot reject the hypothesis that its coefficient is zero.
b. 78% of the variation in y (sale price) is explained by the relationship with x₁, x₂, and x₃. (It is not really "explained" in a causal sense; the correlation could be spurious.) R-square is also called the multiple coefficient of determination.
c. y = 2890 + 1.1*100,000 + 0.9*200,000 + 24.0*3,000 = 2890 + 110,000 + 180,000 + 72,000 = $364,890
d. The regression assumptions have to do with the residuals (errors). It is assumed that:
(i) the expected value of the residuals is constant at zero;
(ii) the variance of the residuals is constant;
(iii) residuals are not correlated with each other; and
(iv) residuals are normally distributed.
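A short Python sketch reproduces parts a and c from the reported coefficients and p-values. It only re-evaluates the published equation; it does not refit the model, which would need the raw data on the 55 properties. The names `coefficients`, `p_values`, and `predict` are illustrative.

```python
# Coefficients and p-values as reported in Problem 5
coefficients = {"intercept": 2890.0, "x1": 1.1, "x2": 0.9, "x3": 24.0}
p_values = {"intercept": 0.000, "x1": 0.121, "x2": 0.002, "x3": 0.031}
ALPHA = 0.05


def predict(x1: float, x2: float, x3: float) -> float:
    """Predicted sale price ($) from the fitted linear model."""
    return (coefficients["intercept"]
            + coefficients["x1"] * x1
            + coefficients["x2"] * x2
            + coefficients["x3"] * x3)


# Part a: a slope is significant at the 5% level when its p-value is below alpha
significant = {name: p < ALPHA for name, p in p_values.items() if name != "intercept"}
print(significant)

# Part c: land $100,000, house $200,000, 3,000 sq ft of living space
print(f"predicted price = ${predict(100_000, 200_000, 3_000):,.0f}")
```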

6. DMG, Inc. is considering two different marketing strategies for its latest software product, conveniently denoted by Strategy A and Strategy B. In order to assess the two strategies, Paula Cooke built and ran a simulation model. Pertinent results are shown in the simulation results table that follows. Using these simulation results, answer the following questions.
(a) What is the probability that Strategy A will be more profitable than Strategy B?
(b) What is a 95% confidence interval for the unknown true mean of the difference in earnings between Strategy A and Strategy B?
(c) Paula Cooke has staked her position that the new software product will earn at least $400,000. What is the probability of achieving this goal under each of the two different marketing strategies?
(d) Notice that the sample mean in the last column of the table is the difference between the sample means of the two different strategies. However, the sample standard deviation of the last column is not the difference between the sample standard deviations of the two strategies. Why is this so?
(e) Which of the two marketing strategies would you recommend? Defend your decision with relevant analysis.

a. Between 65% and 70%. In the A − B column, the .30 quantile is negative (−$11,045) and the .35 quantile is positive ($22,614), so the probability that A − B is at most zero lies between 30% and 35%.
b. Using the sample mean and standard deviation of the A − B column and n = 20,000 trials:
$$210{,}923 \pm 1.96 \times \frac{379{,}443}{\sqrt{20{,}000}} \approx 210{,}923 \pm 5{,}259$$
that is, roughly $205,664 to $216,182.
c. Strategy A: more than 90% (its .05 quantile, $355,766, is below $400,000, but its .10 quantile, $465,761, is above). Strategy B: more than 95% (its .05 quantile, $423,101, is already above $400,000).
d. The last column is the standard deviation of the per-trial difference between the A and B outcomes. The sample standard deviation,
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}},$$
is nonlinear in the value of each outcome, so unlike the mean it does not carry over to differences. In particular, Var(A − B) = Var(A) + Var(B) − 2 Cov(A, B), so the standard deviation of the differences depends on how A and B co-vary and cannot be determined by taking the difference between the two sample standard deviations.
e. This is somewhat subjective. Acceptable arguments include:
(i) Strategy A: it has a higher mean return than B overall, and it will produce a higher return than B with a 65% to 70% probability.
(ii) Strategy B: although it has a lower mean return than A, its standard deviation is also lower than A's. There is a 0% to 5% chance that the earnings are less than $400,000 with B, but this chance increases to 5% to 10% with A.
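Parts b and d can be reproduced from the summary statistics in the simulation table. The sketch below uses the normal critical value from Python's `statistics.NormalDist` (about 1.96) and the identity Var(A − B) = Var(A) + Var(B) − 2 Cov(A, B); the implied correlation it prints is derived from the rounded figures in the table, not reported by the simulation itself.

```python
from math import sqrt
from statistics import NormalDist

n_trials = 20_000
mean_diff = 210_923            # sample mean of A - B
sd_diff = 379_443              # sample standard deviation of A - B
sd_a, sd_b = 638_578, 439_774  # sample standard deviations of A and B

# Part b: 95% confidence interval for the mean difference (normal approximation)
z = NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sd_diff / sqrt(n_trials)
print(f"95% CI: {mean_diff - margin:,.0f} to {mean_diff + margin:,.0f}")

# Part d: Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B), so the reported standard
# deviations imply a covariance (and correlation) between the two strategies.
implied_cov = (sd_a**2 + sd_b**2 - sd_diff**2) / 2
implied_corr = implied_cov / (sd_a * sd_b)
print(f"implied correlation of A and B: {implied_corr:.3f}")
print(f"difference of SDs: {sd_a - sd_b:,} vs SD of A - B: {sd_diff:,}")
```

The strong positive implied correlation (about 0.81) is why the standard deviation of A − B is so much smaller than the sum of the two standard deviations, yet still larger than their difference.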

DMG Simulation Model
Simulation Results for 20,000 Trials (all dollar values are contribution to earnings; negative values in parentheses)

                               Strategy A    Strategy B         A − B
Sample Mean                    $1,152,853      $941,930      $210,923
Sample Standard Deviation        $638,578      $439,774      $379,443

Cumulative Probability
.0001                             $47,780      $290,000    ($511,723)
.05                              $355,766      $423,101    ($259,280)
.10                              $465,761      $485,737    ($175,133)
.15                              $531,758      $524,885    ($135,163)
.20                              $608,755      $551,729     ($93,089)
.25                              $674,752      $579,691     ($48,912)
.30                              $762,748      $663,206     ($11,045)
.35                              $825,078      $704,963       $22,614
.40                              $889,242      $751,940       $64,687
.45                              $971,738      $832,845       $85,724
.50                            $1,056,068      $888,957      $136,213
.55                            $1,114,732      $924,189      $169,872
.60                            $1,246,726      $981,606      $214,750
.65                            $1,334,722    $1,018,143      $275,056
.70                            $1,417,218    $1,080,779      $338,167
.75                            $1,466,716    $1,166,904      $380,240
.80                            $1,598,710    $1,237,369      $481,217
.85                            $1,697,705    $1,354,812      $590,609
.90                            $1,917,695    $1,566,208      $733,660
.95                            $2,500,669    $1,754,116      $927,199
1.00                           $3,347,631    $2,638,850    $2,012,701
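Parts a and c of Problem 6 amount to locating a threshold between two adjacent rows of the percentile table above. A minimal Python sketch of that lookup, with the table values typed in by hand; the helper name `bracket_below` is hypothetical, and the result is only a bracket, since the table gives quantiles in 5% steps.

```python
# Cumulative probability levels and quantiles from the simulation table (dollars)
LEVELS = [0.0001, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50,
          0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
QUANTILES = {
    "A": [47_780, 355_766, 465_761, 531_758, 608_755, 674_752, 762_748,
          825_078, 889_242, 971_738, 1_056_068, 1_114_732, 1_246_726,
          1_334_722, 1_417_218, 1_466_716, 1_598_710, 1_697_705, 1_917_695,
          2_500_669, 3_347_631],
    "B": [290_000, 423_101, 485_737, 524_885, 551_729, 579_691, 663_206,
          704_963, 751_940, 832_845, 888_957, 924_189, 981_606, 1_018_143,
          1_080_779, 1_166_904, 1_237_369, 1_354_812, 1_566_208, 1_754_116,
          2_638_850],
    "A-B": [-511_723, -259_280, -175_133, -135_163, -93_089, -48_912, -11_045,
            22_614, 64_687, 85_724, 136_213, 169_872, 214_750, 275_056,
            338_167, 380_240, 481_217, 590_609, 733_660, 927_199, 2_012_701],
}


def bracket_below(values, threshold):
    """Return (lo, hi) with lo <= P(X < threshold) <= hi, read off the table."""
    lo = 0.0
    for level, q in zip(LEVELS, values):
        if q >= threshold:
            return lo, level
        lo = level
    return lo, 1.0


for name, threshold in (("A-B", 0), ("A", 400_000), ("B", 400_000)):
    lo, hi = bracket_below(QUANTILES[name], threshold)
    print(f"P({name} >= {threshold:,}) is between {1 - hi:.4f} and {1 - lo:.4f}")
```

The printed brackets (0.65 to 0.70 for A beating B, 0.90 to 0.95 for A reaching $400,000, and above 0.95 for B) agree with the answers to parts a and c.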