Experimental Design and Statistics - AGA47A
Czech University of Life Sciences in Prague, Department of Genetics and Breeding
Fall/Winter 2014/2015
Matúš Maciak (@ A 211)
Office Hours: M 14:00-15:30, W 15:30-17:00 (or by appointment)
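Brief Overview
Some Useful Formulas
- for a random sample $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2 > 0$ known:
  $$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \sim N(0, 1)$$
- for a random sample $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2 > 0$ unknown:
  $$\sqrt{n}\,\frac{\bar{X}_n - \mu}{s_n} \sim t_{n-1}$$
- for a random sample $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ it holds:
  $$\frac{(n-1)\,s_n^2}{\sigma^2} \sim \chi^2_{n-1}$$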
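The three statistics above can be computed directly from data. The following is only a minimal sketch: the function names and the measurements in `data` are hypothetical, chosen for illustration, and only the Python standard library is used.

```python
import math
import statistics


def standardized_mean(sample, mu0, sigma=None):
    """Return sqrt(n) * (mean - mu0) / scale.

    If sigma is given (known variance), the statistic is N(0, 1) under the
    model. If sigma is None, the sample standard deviation s_n is used and
    the statistic follows t with n - 1 degrees of freedom.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    scale = sigma if sigma is not None else statistics.stdev(sample)
    return math.sqrt(n) * (mean - mu0) / scale


def variance_pivot(sample, sigma2):
    """Return (n - 1) * s_n^2 / sigma^2, chi-square with n - 1 df under the model."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma2


data = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
print(standardized_mean(data, mu0=5.0, sigma=0.25))  # known sigma
print(standardized_mean(data, mu0=5.0))              # unknown sigma, uses s_n
print(variance_pivot(data, sigma2=0.0625))
```

Note that `statistics.variance` and `statistics.stdev` divide by $n - 1$, which matches the definition of $s_n^2$ used in these slides.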
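Confidence intervals
- for the sample mean $\bar{X}_n$ with known variance $\sigma^2$ it holds:
  $$P\left[\mu \in \left(\bar{X}_n - u_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{X}_n + u_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)\right] = 1 - \alpha;$$
- for the sample mean $\bar{X}_n$ with unknown variance $\sigma^2$ it holds:
  $$P\left[\mu \in \left(\bar{X}_n - t_{n-1}(\alpha/2)\,\frac{s_n}{\sqrt{n}},\; \bar{X}_n + t_{n-1}(\alpha/2)\,\frac{s_n}{\sqrt{n}}\right)\right] = 1 - \alpha;$$
- for the sample variance $s_n^2$ it holds:
  $$P\left[\sigma^2 \in \left(\frac{(n-1)\,s_n^2}{\chi^2_{n-1}(1 - \alpha/2)},\; \frac{(n-1)\,s_n^2}{\chi^2_{n-1}(\alpha/2)}\right)\right] = 1 - \alpha;$$
In an analogous way one can also construct one-sided confidence intervals for the parameters $\mu$ and $\sigma^2$.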
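As an illustration, the three two-sided intervals can be sketched in Python. This is a sketch under stated assumptions, not a definitive implementation: $u_{\alpha/2}$ and $t_{n-1}(\alpha/2)$ are read here as upper critical values, the function names and the data are hypothetical, and the $t$ and $\chi^2$ values are passed in by hand from standard tables because the standard library provides only normal quantiles.

```python
import math
from statistics import NormalDist, fmean, variance


def mean_ci_known_sigma(sample, sigma, alpha=0.05):
    """Two-sided interval for mu when sigma is known."""
    u = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal critical value
    half = u * sigma / math.sqrt(len(sample))
    centre = fmean(sample)
    return centre - half, centre + half


def mean_ci_unknown_sigma(sample, t_crit):
    """Two-sided interval for mu when sigma is unknown.

    t_crit is the upper alpha/2 critical value of t with n - 1 degrees of
    freedom, taken from a table (an assumption of this sketch).
    """
    n = len(sample)
    half = t_crit * math.sqrt(variance(sample) / n)
    centre = fmean(sample)
    return centre - half, centre + half


def variance_ci(sample, chi2_low, chi2_high):
    """Two-sided interval for sigma^2.

    chi2_low and chi2_high are the alpha/2 and 1 - alpha/2 quantiles of the
    chi-square distribution with n - 1 degrees of freedom, from a table.
    """
    scaled = (len(sample) - 1) * variance(sample)
    return scaled / chi2_high, scaled / chi2_low


data = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements, n = 5
print(mean_ci_known_sigma(data, sigma=0.25))
# Table values for 4 degrees of freedom and alpha = 0.05:
print(mean_ci_unknown_sigma(data, t_crit=2.776))
print(variance_ci(data, chi2_low=0.484, chi2_high=11.143))
```

The larger $\chi^2$ quantile gives the lower end of the variance interval, because it appears in the denominator.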
William Sealy Gosset - The Student
One sample problems in statistics
Common one sample problems
For one random sample $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, with $\sigma^2 > 0$, the most common statistical problems one usually encounters include:
- estimating unknown parameters (e.g. $\mu \in \mathbb{R}$ and $\sigma^2 > 0$);
  (various methods have been proposed; the most common is the method of moments)
- constructing confidence intervals for $\mu \in \mathbb{R}$ and $\sigma^2 > 0$;
  (with a given confidence level $(1 - \alpha)$ for some $\alpha \in (0, \tfrac{1}{2})$)
- testing a pair of hypotheses about the true values of $\mu \in \mathbb{R}$ or $\sigma^2 > 0$;
  (for a given significance level $\alpha \in (0, \tfrac{1}{2})$)
Two sample inference - motivation
- in statistics we also focus on problems that involve more than one sample (e.g. a comparison of two samples);
- the simplest scenario is to consider two different random samples and to ask how these samples differ;
- there are many characteristics that can be used to judge the difference between two populations:
  - two random samples $X_1, \dots, X_{n_1} \sim F_1(x)$ and $Y_1, \dots, Y_{n_2} \sim F_2(x)$;
  - two distribution functions $F_1$ and $F_2$; how can they be different?
    ... different distributions $F_1$ and $F_2$;
    ... different functional shapes of $F_1$ and $F_2$;
    ... different range of values for $F_1$ and $F_2$;
    ... different locations (mean parameters $\mu_1$ and $\mu_2$);
    ... different scales (variance parameters $\sigma_1^2$ and $\sigma_2^2$);
    ...
- inference is again based on confidence intervals and hypothesis tests;
Two sample problem - motivation
[Figure: density plots of two samples; vertical axis "Density", from 0.0 to 0.4]
- we need some statistical approaches to reveal the difference;
- we need some decision criteria to judge the difference;
Two sample problem (Gaussian)
For two random samples $X_1, \dots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, \dots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$, with $\sigma_1^2, \sigma_2^2 > 0$, the most common statistical problems (questions we are interested in) are:
- parameter estimates for $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1^2, \sigma_2^2 > 0$;
- parameter estimate for the difference $\mu_1 - \mu_2 \in \mathbb{R}$;
- confidence intervals for $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1^2, \sigma_2^2 > 0$;
  (with a given confidence level $(1 - \alpha)$ for some $\alpha \in (0, \tfrac{1}{2})$)
- confidence interval for the difference $\mu_1 - \mu_2 \in \mathbb{R}$;
  (with a given confidence level $(1 - \alpha)$ for some $\alpha \in (0, \tfrac{1}{2})$)
- hypothesis tests about the true values of $\mu \in \mathbb{R}$ and $\sigma^2 > 0$;
  (for a given significance level $\alpha \in (0, \tfrac{1}{2})$)
- hypothesis tests about the true value of $\mu_1 - \mu_2 \in \mathbb{R}$;
  (for a given significance level $\alpha \in (0, \tfrac{1}{2})$)
- comparing two variances: $\sigma_1^2$ vs. $\sigma_2^2$;
Paired vs. Independent Samples
Let us assume an experiment producing two random samples:
1. $X_1, \dots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$;
2. $Y_1, \dots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$;
Various options are possible and one needs to distinguish among them:
- random samples are balanced ($n_1 = n_2$) or they are not ($n_1 \neq n_2$);
- random samples are only shifted ($\mu_1 \neq \mu_2$) but have the same variance $\sigma_1^2 = \sigma_2^2$ - homoscedastic samples;
- random samples are shifted and scaled ($\mu_1 \neq \mu_2$ and $\sigma_1^2 \neq \sigma_2^2$) - heteroscedastic samples;
- for balanced samples, observations $X_i$ and $Y_i$ can always be measured on the same subject, for every $i = 1, \dots, n_1 = n_2$ (paired samples);
- observations $X_i$ and $Y_j$ are always measured independently on two different subjects, for $i = 1, \dots, n_1$ and $j = 1, \dots, n_2$ (independent samples);
- ...
Design of Experiments
How it all goes: question of interest → design of experiment → collecting data → statistical evaluation → results interpretation → answering the question of interest

"To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of."
Ronald Fisher (1890-1962)
- the question of interest behind the whole experiment is what matters most;
- given the question of interest, the statistician designs an experiment;
- there are many ways to design an experiment:
  - randomized experiment;
  - factorial design (fully crossed design);
  - block design experiment;
  - blind vs. double-blind experiment;
  - independent random samples;
  - paired samples;
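Estimating unknown parameters for two sample problem
Parameter estimates
- sample means:
  $$\bar{X}_{n_1} = \frac{1}{n_1} \sum_{i=1}^{n_1} X_i \quad \text{and} \quad \bar{Y}_{n_2} = \frac{1}{n_2} \sum_{i=1}^{n_2} Y_i;$$
- sample variances:
  $$s_{n_1}^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left(X_i - \bar{X}_{n_1}\right)^2 \quad \text{and} \quad s_{n_2}^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} \left(Y_i - \bar{Y}_{n_2}\right)^2;$$
How to estimate the difference?
- for paired samples: $\frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)$;
- for independent samples: $\bar{X}_{n_1} - \bar{Y}_{n_2}$;
What are the corresponding distributions?
- $\frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i) \sim N(\mu_1 - \mu_2, \sigma^2)$
- $\bar{X}_{n_1} - \bar{Y}_{n_2} \sim N(\mu_1 - \mu_2, \sigma^2)$
What are the corresponding variance parameters $\sigma^2 > 0$? Either $\sigma_1^2 = \sigma_2^2$ or $\sigma_1^2 \neq \sigma_2^2$.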
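The two estimators of the difference can be sketched as follows. This is an illustrative sketch only: the function names and the `before`/`after` measurements are hypothetical, and the data are assumed to be paired in the order given.

```python
from statistics import fmean, variance


def paired_difference(x, y):
    """Mean of the within-pair differences X_i - Y_i (paired samples)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    return fmean([xi - yi for xi, yi in zip(x, y)])


def independent_difference(x, y):
    """Difference of the two sample means (independent samples)."""
    return fmean(x) - fmean(y)


# Hypothetical measurements on the same four subjects.
before = [12.1, 11.4, 13.0, 12.6]
after = [11.5, 11.0, 12.2, 12.3]

print(paired_difference(before, after))
print(independent_difference(before, after))
# Sample variance of the within-pair differences, used in the paired case.
print(variance([b - a for b, a in zip(before, after)]))
```

For balanced data the two point estimates coincide numerically; what differs between the paired and the independent design is the variance $\sigma^2$ of the estimator, and therefore the resulting confidence intervals and tests.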
Variance parameters estimation
What are the corresponding estimates for $\sigma^2 > 0$ under the different scenarios?
- for equal or unequal sample sizes $n_1$ and $n_2$ and equal variances $\sigma_1^2 = \sigma_2^2$:
  $$\hat{\sigma}^2 = \hat{\sigma}_{XY}^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right), \quad \text{for} \quad \hat{\sigma}_{XY}^2 = \frac{(n_1 - 1)\,s_{n_1}^2 + (n_2 - 1)\,s_{n_2}^2}{n_1 + n_2 - 2},$$
  where $\hat{\sigma}_{XY}^2$ is called the pooled variance estimate;
- for equal or unequal sample sizes $n_1$, $n_2$ and unequal variances $\sigma_1^2 \neq \sigma_2^2$:
  $$\hat{\sigma}^2 = \frac{s_{n_1}^2}{n_1} + \frac{s_{n_2}^2}{n_2}$$
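Both variance estimates for the difference of means can be written in a few lines. The following is a minimal sketch with hypothetical function names and hypothetical data, using only the standard library.

```python
from statistics import variance


def pooled_variance(x, y):
    """Pooled variance estimate, assuming equal population variances."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


def diff_variance_equal(x, y):
    """Estimated variance of the mean difference under equal variances."""
    return pooled_variance(x, y) * (1 / len(x) + 1 / len(y))


def diff_variance_unequal(x, y):
    """Estimated variance of the mean difference under unequal variances."""
    return variance(x) / len(x) + variance(y) / len(y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample, s^2 = 2.5
y = [2.0, 4.0, 6.0]            # hypothetical sample, s^2 = 4.0

print(pooled_variance(x, y))
print(diff_variance_equal(x, y))
print(diff_variance_unequal(x, y))
```

With these numbers the pooled estimate is $(4 \cdot 2.5 + 2 \cdot 4)/6 = 3$, so the equal-variance estimate is $3 \cdot (1/5 + 1/3) = 1.6$, while the unequal-variance estimate is $2.5/5 + 4/3 \approx 1.83$.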
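Degrees of Freedom Calculations
- For paired samples ($n_1 = n_2 = n$):
  degrees of freedom $= n - 1$
- For independent samples (sample sizes $n_1$, $n_2$ and $\sigma_1^2 = \sigma_2^2$):
  degrees of freedom $= n_1 + n_2 - 2$
- General procedure ($n_1 \neq n_2$ and $\sigma_1^2 \neq \sigma_2^2$):
  $$\text{degrees of freedom} = \frac{\left(\dfrac{\sigma_A^2}{n} + \dfrac{\sigma_B^2}{m}\right)^2}{\dfrac{\sigma_A^4}{n^2 (n - 1)} + \dfrac{\sigma_B^4}{m^2 (m - 1)}}$$
  where $n$ and $m$ are the two sample sizes and $\sigma_A^2$, $\sigma_B^2$ are the variances of the two samples;
- Conservative approach (independent samples, $n \neq m$):
  degrees of freedom $= \min\{n, m\} - 1$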
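The degrees-of-freedom rules above can be compared numerically. This is a sketch under stated assumptions: the function names are hypothetical, and the variances passed to the general formula are assumed to be the two sample variances plugged in for $\sigma_A^2$ and $\sigma_B^2$.

```python
def paired_df(n):
    """Paired samples with n pairs."""
    return n - 1


def pooled_df(n1, n2):
    """Independent samples with equal variances."""
    return n1 + n2 - 2


def general_df(var_a, n, var_b, m):
    """General formula for independent samples with unequal variances.

    var_a and var_b are the (sample) variances, n and m the sample sizes.
    The result is usually not an integer.
    """
    numerator = (var_a / n + var_b / m) ** 2
    denominator = var_a**2 / (n**2 * (n - 1)) + var_b**2 / (m**2 * (m - 1))
    return numerator / denominator


def conservative_df(n, m):
    """Conservative choice for independent samples."""
    return min(n, m) - 1


# Hypothetical inputs: sample variances 2.5 and 4.0, sample sizes 5 and 3.
print(general_df(2.5, 5, 4.0, 3))
print(conservative_df(5, 3), pooled_df(5, 3))
```

For these inputs the general formula gives $484/137 \approx 3.53$, which lies between the conservative value $2$ and the pooled value $6$. With equal sample sizes and equal variances the general formula reduces to $n + m - 2$.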
To be continued...
- comparing two sample variances;
- inference on population proportion;
- some other statistical tests;
- ...