Outline
Population Characteristics
Types of Samples
Sample Characteristics
Sample Analogue Estimation

Populations
Defs: A (finite) population is a (finite) set $P$ of elements $e$. A variable is a function $v : P \to \mathbb{R}$.
Examples
a). $P$ consists of all Michigan households, and $v$ is household income.
b). $P$ consists of all GRE examinees, and $v$ is verbal score.

Population Characteristics
The Population Size. Let $N$ denote the population size $\#P$.
The Population Mean and Variance. The population mean and variance are
\[ \mu = \frac{1}{N} \sum_{e \in P} v(e) \quad \text{and} \quad \sigma^2 = \frac{1}{N} \sum_{e \in P} [v(e) - \mu]^2 . \]

Population Characteristics Continued
The Population Distribution Function: The function
\[ F(x) = \frac{\#\{e : v(e) \le x\}}{N} \]
is called the population distribution function (for $v$). Characteristics of $F$, like percentiles, are called population characteristics.
Note: If an element $e$ is selected at random (equally likely), then $X = v(e)$ is a random variable with distribution function $F$, that is, $X \sim F$, $E(X) = \mu$, and $\sigma_X^2 = \sigma^2$.
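The definitions of the population mean, variance, and distribution function can be checked on a small finite population. The sketch below is only an illustration: the five-element population and the helper names (`population_mean`, `population_variance`, `population_df`) are made up for this example and are not part of the original notes. Exact fractions are used so that the divide-by-$N$ formulas are reproduced without rounding.

```python
from fractions import Fraction

# Hypothetical toy population: element labels e mapped to values v(e).
population = {"e1": 2, "e2": 4, "e3": 4, "e4": 6, "e5": 9}


def population_mean(values):
    """mu = (1/N) * sum of v(e) over the population."""
    return Fraction(sum(values), len(values))


def population_variance(values):
    """sigma^2 = (1/N) * sum of [v(e) - mu]^2 (note: divides by N, not N - 1)."""
    mu = population_mean(values)
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)


def population_df(values, x):
    """F(x) = #{e : v(e) <= x} / N."""
    return Fraction(sum(1 for v in values if v <= x), len(values))


values = list(population.values())
mu = population_mean(values)
sigma2 = population_variance(values)

print("N       =", len(values))
print("mu      =", mu)
print("sigma^2 =", sigma2)
print("F(4)    =", population_df(values, 4))
```

Drawing one element at random from `values` gives a random variable whose mean and variance are exactly `mu` and `sigma2`, which is the content of the note on equally likely selection.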
Infinite Populations: A Convenient Fiction
For large populations, it may be useful to pretend that the population distribution function has a given form, usually normal.
Example: GRE Scores: Approximately 1.2 million students took the GRE between 2001 and 2004. Consider verbal scores. The population mean and variance are $\mu = 469$ and $\sigma^2 = 120^2$. Let $F$ be the population DF. Then
\[ F(x) \approx \Phi\!\left(\frac{x - \mu}{\sigma}\right) = \Phi\!\left(\frac{x - 469}{120}\right). \]
[Figure: GRE Scores. $F(x)$, on a scale from 0 to 1, plotted against $x$ from 200 to 800.]

Sampling
Def: A sample is a subset of the population, usually selected at random. (More on this later.) Denote the sample size by $n$ and the sample values by $X_1, \dots, X_n$.
Sample Characteristics: Computed by the same (or slightly modified) formulas used for population characteristics. For example,
\[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} \]
is called the sample mean.

Sample Characteristics AKA Sample Analogue Estimation
The Sample Mean
\[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} \]
The Sample Variance
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
The Sample Distribution Function
\[ F^{\#}(z) = \frac{\#\{i : X_i \le z\}}{n} . \]
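The sample analogues and the normal approximation can be sketched in a few lines. Everything below is illustrative: the six scores are invented, and the function names are assumptions introduced for this example. The normal distribution function $\Phi$ is computed from the standard library's `math.erf`.

```python
import math


def sample_mean(xs):
    """X-bar = (X_1 + ... + X_n) / n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S^2 = sum of (X_i - X-bar)^2 divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_df(xs, z):
    """F#(z) = #{i : X_i <= z} / n."""
    return sum(1 for x in xs if x <= z) / len(xs)


def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical sample of verbal scores (not real data).
scores = [410, 520, 469, 600, 350, 480]

print("sample mean       :", sample_mean(scores))
print("sample variance   :", sample_variance(scores))
print("sample DF at 469  :", sample_df(scores, 469))
print("normal DF at 469  :", normal_cdf(469, mu=469, sigma=120))
```

Comparing `sample_df(scores, z)` with `normal_cdf(z, 469, 120)` over a range of `z` is one way to see how closely a sample follows the assumed normal form.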
Types of Samples
Simple Random Samples: Select $n$ elements from the population at random (equally likely). Then
\[ P[X_i \le z] = \frac{\#\{e \in P : v(e) \le z\}}{N} = F(z), \]
the population distribution function.
With replacement: $X_1, \dots, X_n \sim$ iid $F$.
Without replacement: $X_1, \dots, X_n$ are dependent. However, if the population size $N$ is large, then there is not much dependence, and this is frequently ignored.

Other Types of Samples
Stratified Samples: Suppose $P$ has some structure, say
\[ P = P_1 \cup \cdots \cup P_M, \]
where the $P_i$ are mutually exclusive. Then, we could take simple random samples $X_{i,1}, \dots, X_{i,n_i}$ from each of the $P_i$. Example: Geographical regions.
Two Stage Samples: With $P = P_1 \cup \cdots \cup P_M$, as above:
(i) Select $\{i_1, \dots, i_m\}$ from $\{1, \dots, M\}$.
(ii) Take a simple random sample from each selected $P_{i_j}$.
Systematic Samples
Sampling Proportional to Size

More on Simple Random Samples
If $X_1, \dots, X_n$ is a simple random sample, then
$X_i \sim F$, the population distribution function,
$E(X_i) = \mu$ (the population mean),
$\sigma_{X_i}^2 = \sigma^2$ (the population variance),
since these depend only on the distribution of $X_i$. So $E(\bar{X}) = \mu$.
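The sampling schemes listed above can be sketched with Python's standard `random` module. This is a minimal illustration under assumptions: the population is a list of 100 integer labels, the three strata are an arbitrary split of it, and the function names are hypothetical. Systematic sampling and sampling proportional to size are not covered here.

```python
import random


def srs_with_replacement(population, n, rng):
    """n independent, equally likely draws; elements may repeat."""
    return [rng.choice(population) for _ in range(n)]


def srs_without_replacement(population, n, rng):
    """n distinct elements, every subset of size n equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, sizes, rng):
    """One simple random sample (without replacement) from each stratum."""
    return [rng.sample(stratum, n_i) for stratum, n_i in zip(strata, sizes)]


def two_stage_sample(strata, m, n_per_stratum, rng):
    """Stage one picks m strata at random; stage two samples within each."""
    chosen = rng.sample(range(len(strata)), m)
    return {i: rng.sample(strata[i], n_per_stratum) for i in chosen}


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100 labelled elements, split into 3 strata.
population = list(range(100))
strata = [population[:40], population[40:70], population[70:]]

with_repl = srs_with_replacement(population, 10, rng)
without_repl = srs_without_replacement(population, 10, rng)
strat = stratified_sample(strata, [4, 3, 3], rng)
two_stage = two_stage_sample(strata, m=2, n_per_stratum=5, rng=rng)

print(with_repl)
print(without_repl)
print(strat)
print(two_stage)
```

Passing an explicit `random.Random` instance keeps the draws reproducible and avoids touching the module-level generator.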
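Properties of $\bar{X}$ and $S^2$
For a Simple Random Sample With Replacement
\[ E(\bar{X}) = \mu, \quad (1) \]
\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}, \quad (2) \]
\[ E(S^2) = \sigma^2, \quad (3) \]
where $\mu$ and $\sigma^2$ are the population mean and variance.
Notes
1: (1) and (3) are called unbiasedness; right answer in repeated samples.
2: Know (1) and (2); (3) soon.

Example: GRE Scores
10,000 Samples of Size 100
[Figure: Histogram of $\bar{x}$. Frequency plotted against $\bar{x}$ from 420 to 500.]
Note: Average $\bar{X} = 469.0234$ and Average $S^2 = (119.9971)^2$.

An Identity
\[ \sum_{i=1}^{n} (X_i - \mu)^2 = (n-1)S^2 + n(\bar{X} - \mu)^2, \]
since
\[ \text{LHS} = \sum_{i=1}^{n} [(X_i - \bar{X}) + (\bar{X} - \mu)]^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + 2(\bar{X} - \mu) \sum_{i=1}^{n} (X_i - \bar{X}) + n(\bar{X} - \mu)^2 = (n-1)S^2 + 0 + n(\bar{X} - \mu)^2 . \]

Derivation of (3): $E(S^2) = \sigma^2$.
Recall:
\[ \sum_{i=1}^{n} (X_i - \mu)^2 = (n-1)S^2 + n(\bar{X} - \mu)^2 . \]
So
\[ (n-1)E(S^2) + nE[(\bar{X} - \mu)^2] = \sum_{i=1}^{n} E[(X_i - \mu)^2] = n\sigma^2 . \]
Also
\[ E[(\bar{X} - \mu)^2] = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} . \]
So
\[ (n-1)E(S^2) + \sigma^2 = n\sigma^2, \quad \text{or} \quad E(S^2) = \sigma^2 . \]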
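Properties (1) to (3) can be verified exactly, rather than by simulation, by listing every equally likely ordered sample drawn with replacement from a small population. The sketch below assumes a made-up five-element population and a sample size of 3; these choices and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population values v(e); repeats are allowed.
values = [2, 4, 4, 6, 9]
N = len(values)
n = 3  # sample size

mu = Fraction(sum(values), N)
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / N


def xbar(xs):
    """Sample mean as an exact fraction."""
    return Fraction(sum(xs), len(xs))


def s2(xs):
    """Sample variance with the n - 1 divisor."""
    m = xbar(xs)
    return sum((Fraction(x) - m) ** 2 for x in xs) / (len(xs) - 1)


# All N**n ordered samples with replacement; each has probability 1 / N**n.
samples = list(product(values, repeat=n))

mean_of_xbar = sum(xbar(s) for s in samples) / len(samples)
var_of_xbar = sum((xbar(s) - mu) ** 2 for s in samples) / len(samples)
mean_of_s2 = sum(s2(s) for s in samples) / len(samples)

print("E(X-bar)   =", mean_of_xbar, " vs mu        =", mu)
print("Var(X-bar) =", var_of_xbar, " vs sigma^2/n =", sigma2 / n)
print("E(S^2)     =", mean_of_s2, " vs sigma^2   =", sigma2)
```

Replacing the `n - 1` divisor in `s2` by `n` makes the last comparison fail, which shows why the sample variance uses the slightly modified formula.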
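Sampling Without Replacement
For sampling without replacement,
\[ \sigma_{\bar{X}}^2 = \left(\frac{N-n}{N-1}\right) \frac{\sigma^2}{n} . \]
Correction Factor:
\[ \frac{N-n}{N-1} \]
Note: The ratio of the w.o.r. standard deviation to the w.r. is $\sqrt{(N-n)/(N-1)}$. Has little effect if $n < N/10$.
Note: Accuracy is primarily determined by $n$, not $n/N$.

The Derivation
For sampling without replacement,
\[ E(X_1 X_2) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1, j \ne i}^{N} v(e_i)v(e_j) = \frac{1}{N(N-1)} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} v(e_i)v(e_j) - \sum_{i=1}^{N} v(e_i)^2 \right] = \frac{1}{N(N-1)} \left[ N^2\mu^2 - N E(X_i^2) \right] . \]
Here
\[ \sum_{i=1}^{N} \sum_{j=1}^{N} v(e_i)v(e_j) = \left[ \sum_{i=1}^{N} v(e_i) \right]^2 = N^2\mu^2 \quad \text{and} \quad \sum_{i=1}^{N} v(e_i)^2 = N E(X_i^2) . \]
So
\[ \sigma_{X_1,X_2} = E(X_1 X_2) - \mu^2 = \frac{1}{N(N-1)} \left[ N^2\mu^2 - N E(X_i^2) \right] - \mu^2 = \left(\frac{1}{N-1}\right) \left[ \mu^2 - E(X_i^2) \right] = -\frac{\sigma^2}{N-1} . \]
Then, for the sum $X_1 + \cdots + X_n$,
\[ \sigma_{\text{Sum}}^2 = \sum_{i=1}^{n} \sigma_{X_i}^2 + \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \sigma_{X_i,X_j} = n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} = n\left(1 - \frac{n-1}{N-1}\right)\sigma^2 . \]
Recall:
\[ \sigma_{\text{Sum}}^2 = n\left(1 - \frac{n-1}{N-1}\right)\sigma^2 = n\left(\frac{N-n}{N-1}\right)\sigma^2 . \]
So
\[ \sigma_{\bar{X}}^2 = \frac{\sigma_{\text{Sum}}^2}{n^2} = \left(\frac{N-n}{N-1}\right) \frac{\sigma^2}{n} . \]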
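The correction factor and the covariance between two draws can be confirmed exactly by enumerating every ordered sample without replacement from a small population. As before, the five-element population, the sample size, and the variable names are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy population values v(e).
values = [2, 4, 4, 6, 9]
N = len(values)
n = 3  # sample size, drawn without replacement

mu = Fraction(sum(values), N)
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / N

# All N * (N-1) * ... * (N-n+1) ordered samples; each is equally likely.
# permutations() works by position, so repeated values count as distinct elements.
samples = list(permutations(values, n))

xbars = [Fraction(sum(s), n) for s in samples]
mean_of_xbar = sum(xbars) / len(samples)
var_of_xbar = sum((xb - mu) ** 2 for xb in xbars) / len(samples)

# Covariance of the first two draws: E(X1 * X2) - mu^2.
cov_12 = sum(Fraction(s[0] * s[1]) for s in samples) / len(samples) - mu ** 2

fpc = Fraction(N - n, N - 1)  # correction factor (N - n) / (N - 1)

print("E(X-bar)     =", mean_of_xbar, " vs mu =", mu)
print("Var(X-bar)   =", var_of_xbar, " vs fpc * sigma^2 / n =", fpc * sigma2 / n)
print("Cov(X1, X2)  =", cov_12, " vs -sigma^2 / (N-1) =", -sigma2 / (N - 1))
```

Setting `n = N` drives `fpc` to zero, matching the intuition that a sample containing the whole population has no sampling error.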
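More on Stratified Populations
For a stratified population $P = P_1 \cup \cdots \cup P_m$, let $N_i$, $\mu_i$, and $\sigma_i^2$ denote the size, mean, and variance of the $i$th sub-population $P_i$. Further, let $N$, $\mu$, and $\sigma^2$ denote the size, mean, and variance of the entire population $P$. Then
\[ N = N_1 + \cdots + N_m, \quad (1) \]
\[ \mu = \sum_{i=1}^{m} p_i \mu_i, \quad (2) \]
\[ \sigma^2 = \sum_{i=1}^{m} p_i \sigma_i^2 + \sum_{i=1}^{m} p_i (\mu_i - \mu)^2, \quad (3) \]
where
\[ p_i = \frac{N_i}{N} . \]
Note: $\sigma^2$ can be bigger than any of the $\sigma_i^2$.

Derivations
That $N = N_1 + \cdots + N_m$ is clear. For (2), let $e_{ij}$ be the $j$th element of $P_i$. Then
\[ \mu = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{N_i} v(e_{ij}) = \frac{1}{N} \sum_{i=1}^{m} N_i \mu_i = \sum_{i=1}^{m} p_i \mu_i . \]
For (3), write
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{N_i} [v(e_{ij}) - \mu]^2 \]
and
\[ [v(e_{ij}) - \mu]^2 = [v(e_{ij}) - \mu_i]^2 + 2[\mu_i - \mu][v(e_{ij}) - \mu_i] + (\mu_i - \mu)^2, \]
and proceed as in the identity: within each $P_i$ the cross term sums to zero.

More on Stratified Sampling
Now suppose that independent simple random samples of sizes $n_1, \dots, n_m$ are drawn from the sub-populations $P_1, \dots, P_m$. Let
\[ n = n_1 + \cdots + n_m, \quad r_i = \frac{n_i}{n}, \]
and suppose that
\[ r_i = p_i = \frac{N_i}{N} . \]
Then as above,
\[ \bar{X} = r_1 \bar{X}_1 + \cdots + r_m \bar{X}_m = p_1 \bar{X}_1 + \cdots + p_m \bar{X}_m, \]
\[ E(\bar{X}) = p_1 \mu_1 + \cdots + p_m \mu_m = \mu, \]
so that $\bar{X}$ is unbiased.

More on Stratified Sampling
The variance of $\bar{X} = p_1 \bar{X}_1 + \cdots + p_m \bar{X}_m$ for stratified sampling is
\[ \sum_{i=1}^{m} p_i^2 \frac{\sigma_i^2}{n_i} = \frac{1}{n} \sum_{i=1}^{m} p_i \sigma_i^2 . \]
For simple random sampling, the variance would have been
\[ \frac{\sigma^2}{n} = \frac{1}{n} \left[ \sum_{i=1}^{m} p_i \sigma_i^2 + \sum_{i=1}^{m} p_i (\mu_i - \mu)^2 \right], \]
which could be substantially larger.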
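The decomposition of $\sigma^2$ into within-stratum and between-stratum parts, and the resulting variance comparison, can be checked numerically. The sketch below uses three invented strata with sizes 4, 2, and 6 and a total sample size of 6 so that proportional allocation gives whole numbers; all names and values are assumptions for illustration. The variance formulas are the with-replacement ones, without a correction factor.

```python
from fractions import Fraction

# Hypothetical strata P_1, P_2, P_3 (values v(e) within each sub-population).
strata = [[2, 4, 4, 6], [9, 11], [10, 12, 14, 16, 18, 20]]


def mean(vals):
    return Fraction(sum(vals), len(vals))


def var(vals):
    """Population-style variance: divides by the number of elements."""
    m = mean(vals)
    return sum((Fraction(v) - m) ** 2 for v in vals) / len(vals)


everyone = [v for stratum in strata for v in stratum]
N = len(everyone)
p = [Fraction(len(s), N) for s in strata]  # p_i = N_i / N

mu = mean(everyone)
sigma2 = var(everyone)

weighted_mean = sum(p_i * mean(s) for p_i, s in zip(p, strata))
within = sum(p_i * var(s) for p_i, s in zip(p, strata))
between = sum(p_i * (mean(s) - mu) ** 2 for p_i, s in zip(p, strata))

# Proportional allocation: n_i / n = N_i / N.
n = 6
n_i = [n * len(s) // N for s in strata]

var_stratified = sum(p_k ** 2 * var(s) / k for p_k, s, k in zip(p, strata, n_i))
var_srs = sigma2 / n

print("sigma^2          =", sigma2)
print("within + between =", within + between)
print("Var stratified   =", var_stratified)
print("Var simple r.s.  =", var_srs)
```

The gap between the two variances is `between / n`, so stratification helps most when the stratum means differ widely.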