Measures of Broad Sense Heritability from multi-location and multi-year trials.


Toby McCoy

ANOVA

The Analysis of Variance (ANOVA) is a statistical tool that we rely on for measuring variation associated with a trait, ranging from yield in a field trial to gene expression on a microarray. ANOVA techniques are used extensively in the analysis of molecular marker associations with traits. The purpose of this exercise is to review simple ANOVA models and to demonstrate how variance partitioning can be used to estimate heritability. ANOVA assigns total variation to known causes, leaving a residual portion allocated to uncontrolled or unexplained variation (experimental error). The ability of ANOVA to partition variance is inextricably linked to experimental design. An excellent review of ANOVA can be found in McIntosh (1983).

Table 1. Degrees of freedom and mean squares for randomized complete block experiments combined over locations.

Sources of Var.      df             Mean Squares
Locations            L-1            M1
Blocks(Locations)    L(r-1)         M2
Treatment            T-1            M3
L x T                (L-1)(T-1)     M4
Error                L(r-1)(T-1)    M5

L = Locations, r = Blocks (replicates), T = Treatments

Table 2. Expected mean squares for experiments with Random (R) and Fixed (F) effects (RL = random locations, RT = random treatments, FL = fixed locations, FT = fixed treatments).

Sources of Var.   RL-RT                                     RL-FT                            FL-FT
Locations         σ²(e) + rσ²(TL) + Tσ²(r(L)) + rTσ²(L)     σ²(e) + Tσ²(r(L)) + rTσ²(L)      σ²(e) + Tσ²(r(L)) + rTσ²(L)
Blocks(L)         σ²(e) + Tσ²(r(L))                         σ²(e) + Tσ²(r(L))                σ²(e) + Tσ²(r(L))
Treatment         σ²(e) + rσ²(TL) + rLσ²(T)                 σ²(e) + rσ²(TL) + rLσ²(T)        σ²(e) + rLσ²(T)
L x T             σ²(e) + rσ²(TL)                           σ²(e) + rσ²(TL)                  σ²(e) + rσ²(TL)
Error             σ²(e)                                     σ²(e)                            σ²(e)

Table 3. F-ratios used to test effects for randomized complete block experiments combined over locations.

                                     F-Test
Sources of Var.      Mean Squares    RL-RT              RL-FT    FL-FT
Locations            M1              (M1+M5)/(M2+M4)    M1/M2    M1/M2
Blocks(Locations)    M2
Treatment            M3              M3/M4              M3/M4    M3/M5
L x T                M4              M4/M5              M4/M5    M4/M5
Error                M5

In a genetic/breeding experiment, Treatment would likely be genotype or variety. In an experiment designed to test for associations between a marker and a trait, Treatment would be the marker.
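The choice of F-ratio in Table 3 can be sketched in a few lines of Python. This is only an illustration: the function name `f_ratios` and the mean squares in the usage lines are hypothetical and do not come from any trial in this handout.

```python
def f_ratios(ms, model):
    """Return the F-ratios of Table 3 for one of the three models.

    ms    : dict with keys "M1".."M5" (mean squares labeled as in Table 1)
    model : "RL-RT", "RL-FT" or "FL-FT"
    """
    m1, m2, m3, m4, m5 = (ms[k] for k in ("M1", "M2", "M3", "M4", "M5"))
    if model == "RL-RT":
        locations = (m1 + m5) / (m2 + m4)  # composite ratio for random locations and treatments
        treatment = m3 / m4
    elif model == "RL-FT":
        locations = m1 / m2
        treatment = m3 / m4
    elif model == "FL-FT":
        locations = m1 / m2
        treatment = m3 / m5  # no random interaction to test against
    else:
        raise ValueError("model must be 'RL-RT', 'RL-FT' or 'FL-FT'")
    # L x T is tested against Error in all three models
    return {"Locations": locations, "Treatment": treatment, "L x T": m4 / m5}


# Hypothetical mean squares, for illustration only
example_ms = {"M1": 50.0, "M2": 5.0, "M3": 30.0, "M4": 6.0, "M5": 2.0}
for model in ("RL-RT", "RL-FT", "FL-FT"):
    print(model, f_ratios(example_ms, model))
```

With the same mean squares, the Treatment F-ratio changes depending on whether it is tested against L x T or against Error, which is the second take-home point of McIntosh (1983).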

Take home messages: McIntosh (1983) serves as a set of reference tables for use during experimental design and analysis. The paper reinforces two important points: (1) unaccounted sources of variation will be pooled into the error term, resulting in an inflated error; (2) the appropriate F-tests differ depending on whether the effects are fixed or random.

Estimation of heritability from ANOVA

Broad sense heritability can be estimated from variance components obtained by replicating a population in time and space. Experiments designed to estimate variance components for use in broad sense heritability must be grown in an adequate sample of environments.

An example from a 2-year (97 and 98) replicated trial (replicates 1 and 2).

Data columns: var rep year hplc uv

ANOVA

The genotype sums (SUM) and the overall average for hplc are used to compute the sum of squares for genotype. Each genotype sum is the total of 4 observations (2 reps x 2 years); 272.92 is the grand total and 9.75 is the overall average.

Sum Squares for Genotype: [(32.74)² + (33.44)² + (41.84)² + (31.62)² + (47.59)² + (49.57)² + (36.12)²]/4 - [272.92*9.75]

Sums of squares are calculated for main effects, interactions, and error (total sum of squares minus the main-effect and interaction sums of squares).

Source      DF   Sum Squares   Mean Square   F Value   Pr > F
gen
rep
year
year*gen
rep*gen
year*rep
Error

Note: Sum Squares / df = Mean Square

Source          Expected MS
gen             Var(Error) + 2 Var(rep*gen) + 2 Var(year*gen) + 4 Var(gen)
rep             Var(Error) + 7 Var(year*rep) + 2 Var(rep*gen) + 14 Var(rep)
year            Var(Error) + 7 Var(year*rep) + 2 Var(year*gen) + 14 Var(year)
year*gen        Var(Error) + 2 Var(year*gen)
rep*gen         Var(Error) + 2 Var(rep*gen)
year*rep        Var(Error) + 7 Var(year*rep)
gen*year*rep    Var(Error)

To estimate Var(gen), the expected mean squares are subtracted step by step:

  MS(gen)                     = Var(Error) + 2 Var(rep*gen) + 2 Var(year*gen) + 4 Var(gen)
- MS(year*gen)                = Var(Error) + 2 Var(year*gen)
  difference                  = 2 Var(rep*gen) + 4 Var(gen)
- [MS(rep*gen) - MS(Error)]   = 2 Var(rep*gen)
  remainder                   = 4 Var(gen)

Dividing the remainder by 4 gives Var(gen) = 2.55.
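The two hand calculations above (the genotype sum of squares and the solution of the expected-mean-square equations for Var(gen)) can be sketched in Python. The genotype totals are the seven values in the sum-of-squares expression; the function names are hypothetical, and the mean squares in the final usage line are made-up numbers, since the ANOVA table values are not reproduced here. The correction term is computed as (grand total)²/N, so the result differs slightly from one obtained with the rounded average 9.75.

```python
# Genotype totals for hplc, taken from the sum-of-squares expression above.
GENOTYPE_TOTALS = [32.74, 33.44, 41.84, 31.62, 47.59, 49.57, 36.12]
OBS_PER_GENOTYPE = 4  # 2 reps x 2 years


def genotype_sum_of_squares(totals, obs_per_total):
    """Sum of squares among genotypes: sum(T_i^2)/n - (grand total)^2 / N."""
    grand_total = sum(totals)
    n_obs = len(totals) * obs_per_total
    correction = grand_total ** 2 / n_obs  # same as grand total x overall mean
    return sum(t ** 2 for t in totals) / obs_per_total - correction


def var_gen_from_mean_squares(ms_gen, ms_year_gen, ms_rep_gen, ms_error,
                              obs_per_genotype=4):
    """Solve the expected-mean-square equations for Var(gen).

    MS(gen) - MS(year*gen) - [MS(rep*gen) - MS(Error)] = 4 Var(gen)
    """
    return (ms_gen - ms_year_gen - ms_rep_gen + ms_error) / obs_per_genotype


ss_gen = genotype_sum_of_squares(GENOTYPE_TOTALS, OBS_PER_GENOTYPE)
ms_gen = ss_gen / (len(GENOTYPE_TOTALS) - 1)  # Sum Squares / df
print("SS(gen) =", round(ss_gen, 3), " MS(gen) =", round(ms_gen, 3))

# Hypothetical mean squares, only to show the call
print(var_gen_from_mean_squares(20.0, 4.0, 3.0, 1.0))
```

Because the estimate is a difference of mean squares, it can turn out negative when the genotype mean square is small relative to the interaction mean squares.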

Compare to the estimate obtained from MIVQUE0 in SAS:

Component        Estimate
Var(gen)
Var(rep)
Var(year)
Var(year*gen)
Var(rep*gen)
Var(year*rep)
Var(Error)

    /* SAS code (2yearlyc.txt) */
    data cdata;
      infile 'a:2yrlyc.csv' delimiter = ',' firstobs = 2;
      input gen rep year hplc uv;
    run;

    /* ANOVA with expected mean squares and F-tests for random effects */
    proc glm;
      class year rep gen;
      model hplc uv = gen year rep gen*year gen*rep rep*year / ss3;
      random gen year rep gen*year gen*rep rep*year / test;
      title 'ANOVA for lycopene data with expected MS';
    run;

    /* Variance components; gen and year are listed in the model so that
       Var(gen) and Var(year) are estimated along with the other components */
    proc varcomp;
      class year rep gen;
      model hplc uv = gen year rep gen*year gen*rep rep*year;
      title 'Variance Components for lycopene data';
    run;

Notes on the SAS code

The program above uses a new procedure (to us), PROC VARCOMP. PROC VARCOMP is one of two procedures we can use to estimate variance components (the other, PROC MIXED, will be introduced later). You can specify effects as fixed by putting them first in the MODEL statement and indicating the number of fixed effects with the FIXED= option. Variance components are estimated for RANDOM effects. Four methods of estimation can be specified in the PROC VARCOMP statement by using the METHOD= option:

Method     Description
TYPE1      Based on computation of the Type 1 sums of squares
MIVQUE0    The default. Similar to TYPE1, but computationally faster
ML         Maximum likelihood
REML       Restricted maximum likelihood (favored in breeding work)

Estimating Heritability from Variance Components of lines.

When using variance components from ANOVA to estimate broad sense heritability, the practical application will determine the appropriate denominator. For example, it is common to exclude variance components due to blocks, years, and locations because it is assumed that selection will occur on means across replicates, locations, and years. This practice is based on the assumption that means will be corrected for differences between locations, blocks, and years (i.e. expressed as deviations from the mean). The rationale for this approach is that heritability should be defined based on the variance associated with the selection unit.

BSH = σ²(G) / σ²(x), and σ²(x) = σ²(P) when the selection unit is an individual.

When the selection unit is not an individual, but a family, inbred line, or clone for which replicated phenotypic data have been collected, the expression of phenotypic variation is adjusted to represent the expected phenotypic variation among family (or clone, or inbred line) means.

Rules of thumb:
First, define the selection unit.
Second, correct phenotypic measurements for the appropriate effects (1).
Third, main effects of year, location, and rep are dropped from the denominator.
Fourth, variance estimates are adjusted for the selection unit.

(1) Least-squares, best linear unbiased prediction (BLUP), or simple corrections, for example:
Deviation(JK) = (phenotypic measurement for individual J in block K) - (mean of block K)
d(JK) = Y(JK) - Ybar(K) (Equation 4 from Cotterill, 1987)

For a one-year, one-location, randomized complete block design with r replications and n individuals measured per plot, and assuming we are selecting the best family or line, the heritability is:

H(family) = σ²(G) / [σ²(G) + σ²(error)/r + σ²(within family)/(rn)]

When the plot is measured as a group (rather than n measurements per plot):

H(family) = σ²(G) / [σ²(G) + σ²(error)/r + σ²(within family)/r]

However, if the goal is to select the best individual from each line:

H = σ²(G) / [σ²(G) + σ²(error) + σ²(within family)]
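The three expressions above differ only in how the error and within-family variances are scaled for the selection unit. A minimal Python sketch makes the comparison concrete; the function names and the variance components in the usage lines are hypothetical values chosen for illustration.

```python
def family_mean_heritability(var_g, var_error, var_within, r, n=None):
    """Heritability on a family-mean basis (one year, one location, RCBD).

    r : number of replications
    n : individuals measured per plot; None when the plot is measured as a
        group, in which case the within-family variance is divided by r only
    """
    within_divisor = r * n if n is not None else r
    return var_g / (var_g + var_error / r + var_within / within_divisor)


def individual_heritability(var_g, var_error, var_within):
    """Heritability when the best individual is selected from each line."""
    return var_g / (var_g + var_error + var_within)


# Hypothetical variance components, for illustration only
var_g, var_error, var_within = 2.0, 4.0, 8.0
print(family_mean_heritability(var_g, var_error, var_within, r=2, n=4))
print(family_mean_heritability(var_g, var_error, var_within, r=2))
print(individual_heritability(var_g, var_error, var_within))
```

With the same components, heritability is highest on a family-mean basis with several individuals per plot and lowest for individual selection, because replication shrinks the non-genetic terms in the denominator.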

Justification for ignoring main effects (variation due to location, year, or blocks) is based on the fundamental assumption that corrections will be made for these effects before phenotypic measurements are used for selection. This is an important point if significant main effects exist. If main effects for location, year, or blocks are small, they will contribute little to the denominator. The equation can be generalized as:

H = σ²(G) / [σ²(G) + σ²(error)/(rep*year*location) + σ²(GYL)/(year*location) + σ²(GL)/location + σ²(GY)/year]

Hallauer and Miranda further generalize the equation as:

H = σ²(G) / [σ²(G) + σ²(ge)/e + σ²(error)/(r*e)]

where r = number of reps and e = number of environments.

The standard error for H is:

SE(H) = SE[σ²(G)] / [σ²(G) + σ²(ge)/e + σ²(error)/(r*e)]

For the Lycopene data:

Component        Estimate
Var(gen)
Var(rep)         (omit)
Var(year)        (omit)
Var(year*gen)
Var(rep*gen)
Var(year*rep)
Var(Error)

H = Var(gen) / [Var(gen) + Var(year*gen)/2 + Var(rep*gen)/2 + Var(year*rep)/4 + Var(Error)/4]

The calculation is then repeated with negative estimates set to 0.

Notes on Negative estimates of variance components.

For the exercise using the 2yrlyc.xls data set, we noted that some variance components were negative. This possibility is noted in Chapter 18 of Lynch and Walsh, and is basically a problem associated with small n and variance components that are close to zero. The estimates were derived using the PROC VARCOMP default, MIVQUE0, which works by solving the set of expected mean square equations. The solution therefore relies on subtraction, and may produce negative estimates when variances are very low (i.e. when a larger small number is subtracted from a smaller one). I noted earlier that there are multiple ways to estimate variance components in PROC VARCOMP by specifying the METHOD= option. These methods are described again below.

PROC VARCOMP can use the following four estimation procedures for variance components:

Type1 computes the Type 1 sum of squares for each effect, equates each MS involving only random effects to its expected value, and solves the set of equations.

MIVQUE0 is similar to Type1, but is computationally simpler (and therefore is the default).

Maximum likelihood (ML) estimation uses a W-transformation of the expected mean squares equation and computes initial estimates using MIVQUE0. The program iterates until convergence.

Restricted maximum likelihood (REML) is similar to ML, but separates the likelihood into two parts (one with fixed effects, one without). Initial estimates are obtained using MIVQUE0, then iteration is performed until convergence for the part that does not contain fixed effects. REML is the method of choice for genetic studies. The syntax would be proc varcomp method = reml.

The SAS procedure PROC MIXED can also be used for REML estimation. An advantage of using PROC MIXED over PROC VARCOMP is that the output will return estimates of error associated with the variance components. These estimates of error can then be used to place a standard error on our estimate of heritability. PROC MIXED uses the following syntax:

    proc mixed data=cdata covtest;
      class year rep gen;
      /* blank after "model hplc =": no fixed effects, all effects are random */
      model hplc = / ddfm = satterth;
      random gen year rep(year) gen*year gen*rep(year);
      title 'Variance components using Proc mixed';
    run;

cdata is the data set name following the data statement.
The covtest option calculates standard errors.
A blank after "model var =" means that all effects are random. Fixed effects should be added here.
Degrees of freedom are estimated by Satterthwaite's procedure.

Using ANOVA to test for an association between a marker and a trait

The application of simple F-tests or simple linear regression provides an intuitive example of the statistical approaches that are used to establish linkage between a marker and a quantitative trait. For these approaches, the qualitative trait (or marker) is used to classify the progeny. The question is then whether the groups defined by marker classification have significantly different means.

[Figure: quantitative trait values plotted against genotypic classes (backcross).]
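The comparison of marker-class means can be sketched as a one-way ANOVA in Python. This is a minimal illustration assuming one observation per progeny and two marker classes, as in a backcross; the function name `one_way_f` and the trait values are hypothetical.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA: MS(between classes) / MS(within classes)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    class_means = [sum(g) / len(g) for g in groups]

    # Variation among marker-class means, weighted by class size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, class_means))
    # Variation of progeny around their own class mean
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, class_means) for y in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical trait values for progeny in the two backcross marker classes
class_a = [1.0, 2.0, 3.0]
class_b = [4.0, 5.0, 6.0]
print(one_way_f([class_a, class_b]))
```

The resulting F value would be compared with an F distribution on (number of classes - 1) and (number of progeny - number of classes) degrees of freedom to judge significance.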

Source         DF        Expected MS
Genotypes      N-1       σ² + bσ²(g)
Marker         1         σ² + b[σ²(gQTL) + 4r(1-r)g²] + bc(1-2r)²g²
Gen(marker)    N-2       σ² + b[σ²(gQTL) + 4r(1-r)g²]
Error          N(b-1)    σ²

Where
b is the number of replicates
r is the recombination fraction separating the marker from the QTL
c is a coefficient related to the population size, c = N - (n1² + n2²)/N (n1 + n2 = N; n1 and n2 represent the number in each marker class)
g is the genetic effect (in BC populations, additive and dominance effects are confounded)
σ²(gQTL) is the part of the error variance that cannot be explained by the QTL

When b = 1, Gen(marker) becomes the error term. If there are repeated measures on each genotype (b > 1), the proper error term must be specified. The F-test for significance is MS(Marker)/MS(Gen(marker)); the two expected mean squares differ by bc(1-2r)²g². Thus significance of a marker depends on population size, recombination, the strength of the genetic effect relative to the error variance, and the part of the error variance that cannot be explained by the QTL.

References

McIntosh, M.S. (1983). Analysis of combined experiments. Agronomy Journal 75:
Hallauer, A.R. & J.B. Miranda. Quantitative Genetics in Maize Breeding (second edition). Iowa State University Press, Ames, Iowa. (pp )
Cotterill, P.P. (1987). On estimating heritability according to practical applications. Silvae Genetica 36:46-48.
