Comparisons of Gene Expression Indexes for Oligonucleotide Arrays

Journal of Data Science 5(2007), 425-439

Mounir Aout
Laboratoire Génétique des Maladies Multi-factorielles-CNRS UMR8090

Abstract: High density oligonucleotide arrays have become a standard research tool to monitor the expression of thousands of genes simultaneously. Affymetrix GeneChip arrays are the most popular. They use short oligonucleotides to probe for genes in an RNA sample. However, important challenges remain in estimating expression levels from the raw hybridization intensities on the array. In this paper, we deal with the problem of estimating gene expression based on a statistical model. The present method is similar to the Li and Wong model (2001a), but is more general. More precisely, we show how the model introduced by Li and Wong can be generalized to provide a new measure of gene expression. Moreover, we provide a comparison between these two models.

Key words: Gene expression, model-based estimation, oligonucleotide arrays.

1. Introduction

High density oligonucleotide expression arrays are now widely used in many areas of biomedical research for measurements of gene expression. In the Affymetrix system, an array contains several thousands of genes and ESTs. To probe genes, oligonucleotides of length 25 bp are used. Typically, an mRNA molecule of interest (usually related to a gene) is represented by a probe set. Every probe set consists of 10-20 probe pairs. Every probe pair is composed of a perfect match PM, a section of the mRNA molecule of interest, and a mismatch MM, which is identical to the perfect match probe except for the base in the middle (13th) position. After RNA samples are prepared, labeled and hybridized with arrays, these are scanned, and images are produced and processed to obtain an intensity value for each probe. These intensities, $PM_{ij}$ and $MM_{ij}$, represent the amount of hybridization for arrays $i = 1, \ldots, I$ and probe pairs $j = 1, \ldots, J$ for any given probe set.

There has been considerable discussion over the appropriate algorithm for constructing single expression estimates based on multiple-probe hybridization data. At present, there are several analytical methods to measure such intensities. However, we will only discuss the Affymetrix Microarray Suite MAS4.0 and MAS5.0 (1999 and 2001) and the method of Li and Wong, LW (2001a). MAS 4.0 uses an average over probe pairs of $PM_{ij} - MM_{ij}$, $j = 1, \ldots, J$, for each array $i = 1, \ldots, I$. This average difference (AD) is motivated by the underlying statistical model:

\[ PM_{ij} - MM_{ij} = \theta_i + \epsilon_{ij}, \quad j = 1, \ldots, J. \]

The expression index on array $i$ is represented by $\theta_i$. AD is an appropriate estimate of $\theta_i$ if the error term $\epsilon_{ij}$ has equal variance for $j = 1, \ldots, J$. However, the equal variance assumption does not hold for GeneChip probe level data, since probes with larger mean intensities have larger variances, see Irizarry et al. (2003c). The latest version of this software, MAS5.0, computes the anti-log of a robust average of $\log(PM_{ij} - CT_{ij})$. A corresponding statistical model is

\[ \log(PM_{ij} - CT_{ij}) = \log(\theta_i) + \epsilon_{ij}, \quad j = 1, \ldots, J. \]

The basic disadvantage of this method is that there is no learning about probe characteristics based on the performance of each probe across chips. To account for the probe affinity effect, the LW method suggests that

\[ PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}, \quad i = 1, \ldots, I, \; j = 1, \ldots, J, \quad \epsilon_{ij} \sim N(0, \sigma^2). \]

The probe affinity effect is represented by $\phi_j$. The main object of this paper is to generalize this model by considering separate models for PM and MM and making general assumptions on the errors.

This paper is organized as follows. The next section deals with a general model based on Li and Wong's model. We make general assumptions on the empirical variances of PM and MM and on the correlation between them, and estimate the parameters using maximum likelihood. Based on our analysis, we will show that our model gives an unbiased estimate of the expression index with low variance. Section 3 is concerned with a special case using PM only with non-constant variance. In addition, we compare how well these methods perform using the spike-in experiment HGU95A, described in more detail in the same section.

2. The Full Li and Wong Model

2.1 The full model: A simple case

Following Li and Wong, the PM and MM intensities are modeled as:

\[ PM_{ij} = \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon^{P}_{ij} \tag{2.1} \]

\[ MM_{ij} = \nu_j + \theta_i \alpha_j + \epsilon^{M}_{ij} \tag{2.2} \]

where $i = 1, \ldots, I$ and $j = 1, \ldots, J$, $I$ denotes the number of samples and $J$ denotes the number of probe pairs in a probe set. $\theta_i$ is the expression index, $\nu_j$ is a non-specific cross-hybridization term, $\alpha_j$ is the rate of increase of the MM intensity and $\phi_j$ is the additional rate of increase of the PM intensity.

[Figure 1: Correlation between PM and MM. Histogram; x-axis: Cor(PM,MM), y-axis: Frequency.]

[Figure 2: Standard deviation of PM. Histogram; x-axis: Stdv(PM), y-axis: Frequency.]

Although this model was introduced by Li and Wong, they treated only the reduced case, which we will call RLW:

\[ PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}, \quad \epsilon_{ij} \sim N(0, \sigma^2). \]

Lemon et al. (2002) use the above equations, but assume that the PM and MM values are independent, so their model describes the marginal distributions. Recently, Taib (2004) introduced a model in which it is assumed that the errors are correlated but with common variance and a constant correlation across samples. In general, these assumptions do not fit the observations, as we will see later. We therefore propose to augment the latter model so that the empirically observed correlation between PM and MM and the variances of PM and MM are allowed to change across the arrays, as shown in Figures 1-3. More precisely, we assume that the error terms follow a bivariate normal distribution according to

\[ \begin{pmatrix} \epsilon^{P}_{ij} \\ \epsilon^{M}_{ij} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_i^2 & \rho_i \sigma_i^2 \\ \rho_i \sigma_i^2 & \sigma_i^2 \end{pmatrix} \right) \]

where $\sigma_i^2$ is the variance and $\rho_i$ is the correlation coefficient. In the following, this model will be called FLW1.

[Figure 3: Standard deviation of MM. Histogram; x-axis: Stdv(MM), y-axis: Frequency.]
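To make the FLW1 assumptions concrete, the following minimal sketch draws synthetic PM and MM intensities from equations (2.1) and (2.2) with bivariate normal errors sharing a per-array variance and correlation. It is an illustration only: the function name `simulate_flw1`, its argument layout, and all parameter values are assumptions made here and do not come from the paper.

```python
import math
import random


def simulate_flw1(theta, nu, alpha, phi, sigma, rho, seed=0):
    """Draw PM and MM intensities from (2.1)-(2.2) with FLW1 errors.

    theta, sigma, rho: per-array lists of length I.
    nu, alpha, phi: per-probe lists of length J.
    Returns two I x J nested lists (pm, mm).
    """
    rng = random.Random(seed)
    pm, mm = [], []
    for i in range(len(theta)):
        pm_row, mm_row = [], []
        for j in range(len(phi)):
            # Two independent standard normals, mixed so that the pair
            # (eps_p, eps_m) has variance sigma_i^2 and correlation rho_i.
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            eps_p = sigma[i] * z1
            eps_m = sigma[i] * (rho[i] * z1 + math.sqrt(1.0 - rho[i] ** 2) * z2)
            shared = nu[j] + theta[i] * alpha[j]  # part common to PM and MM
            pm_row.append(shared + theta[i] * phi[j] + eps_p)
            mm_row.append(shared + eps_m)
        pm.append(pm_row)
        mm.append(mm_row)
    return pm, mm


# Illustrative values only (not taken from the paper): 2 arrays, 3 probe pairs.
pm, mm = simulate_flw1(
    theta=[100.0, 400.0], nu=[50.0, 60.0, 55.0],
    alpha=[0.5, 0.4, 0.6], phi=[1.2, 0.8, 1.0],
    sigma=[20.0, 60.0], rho=[0.6, 0.8],
)
```

Setting every entry of `sigma` to zero returns the noise-free means of (2.1) and (2.2), which is a convenient sanity check before adding noise.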

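2.2 The estimates

Given data $(PM_{ij}, MM_{ij})$, we can estimate the parameters of our model by maximum likelihood. It is known that the likelihood function of the bivariate normal distribution can be expressed as:

\[ L = \prod_{i,j} L(PM_{ij}, MM_{ij}, \theta_i, \alpha_j, \phi_j, \nu_j, \sigma_i, \rho_i) = \prod_{i,j} K_i \exp\left\{ -\frac{1}{2\sigma_i^2 (1 - \rho_i^2)} \left[ X_1^2 - 2\rho_i X_1 X_2 + X_2^2 \right] \right\} \]

where $X_1 = PM_{ij} - \nu_j - \theta_i \alpha_j - \theta_i \phi_j$ and $X_2 = MM_{ij} - \nu_j - \theta_i \alpha_j$. The corresponding log-likelihood function is

\[ l = \sum_{i,j} \log(K_i) - \sum_{i,j} \frac{1}{2\sigma_i^2 (1 - \rho_i^2)} \left[ X_1^2 - 2\rho_i X_1 X_2 + X_2^2 \right]. \]

To get the estimates of the parameters, we take the partial derivatives with respect to the corresponding parameters and set the resulting expressions equal to zero. Hence, we obtain:

\[ \hat{\phi}_j = \frac{\sum_i \frac{\theta_i}{\sigma_i^2 (1 - \rho_i^2)} \left[ (PM_{ij} - \rho_i MM_{ij}) - (1 - \rho_i)(\nu_j + \theta_i \alpha_j) \right]}{\sum_i \frac{\theta_i^2}{\sigma_i^2 (1 - \rho_i^2)}} \]

\[ \hat{\alpha}_j = \frac{\sum_i \frac{\theta_i}{\sigma_i^2 (1 + \rho_i)} \left[ PM_{ij} + MM_{ij} - 2\nu_j - \theta_i \phi_j \right]}{2 \sum_i \frac{\theta_i^2}{\sigma_i^2 (1 + \rho_i)}} \]

\[ \hat{\nu}_j = \frac{\sum_i \frac{1}{\sigma_i^2 (1 + \rho_i)} \left[ (PM_{ij} - \theta_i \alpha_j - \theta_i \phi_j) + (MM_{ij} - \theta_i \alpha_j) \right]}{2 \sum_i \frac{1}{\sigma_i^2 (1 + \rho_i)}} \]

\[ \hat{\theta}_i = \frac{A_i + B_i}{\sum_j \left[ \phi_j^2 + 2(1 - \rho_i)\alpha_j^2 + 2(1 - \rho_i)\alpha_j \phi_j \right]} \]

\[ \hat{\sigma}_i^2 = \frac{\sum_j \left( X_1^2 - 2\rho_i X_1 X_2 + X_2^2 \right)}{2J(1 - \rho_i^2)} \]

\[ \hat{\rho}_i = \frac{\sum_j X_1 X_2}{J \sigma_i^2}, \]

where

\[ A_i = \sum_j \phi_j \left[ PM_{ij} - \rho_i MM_{ij} - (1 - \rho_i)\nu_j \right], \qquad B_i = (1 - \rho_i) \sum_j \alpha_j \left[ PM_{ij} + MM_{ij} - 2\nu_j \right]. \]

The last two equations can be written as:

\[ \hat{\sigma}_i^2 = \frac{\sum_j (X_1^2 + X_2^2)}{2J}, \qquad \hat{\rho}_i = \frac{2 \sum_j X_1 X_2}{\sum_j (X_1^2 + X_2^2)}. \]

These formulas have to be understood as steps in an iterative procedure that leads to the final estimates. We will not be concerned here with solving these equations. However, they are useful when it comes to deriving various properties. If we assume the other parameters to be known, it is easy to see that $\hat{\theta}_i$ is an unbiased estimate of $\theta_i$, since $E[\hat{\theta}_i] = \theta_i$. For the variance, we get:

\[ Var(\hat{\theta}_i) = \frac{\sigma_i^2 (1 - \rho_i^2)}{\sum_j \left[ \phi_j^2 + 2(1 - \rho_i)\alpha_j^2 + 2(1 - \rho_i)\alpha_j \phi_j \right]} \tag{2.3} \]

2.3 Comparisons between FLW1 and RLW

In this section, we give a brief description of the reduced Li and Wong model and compare the estimates obtained in each model in terms of accuracy (bias) and precision (variance). For the RLW model, we recall that:

\[ Y_{ij} := PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}, \qquad \sum_j \phi_j^2 = J, \qquad \epsilon_{ij} \sim N(0, \sigma^2). \]

The estimated expression index $\hat{\theta}_i$ can be obtained using maximum likelihood or least squares. Hence

\[ \hat{\theta}_i = \frac{\sum_j Y_{ij} \phi_j}{\sum_j \phi_j^2}. \]

The variance of the estimate, based on the assumptions of the RLW model, is

\[ Var(\hat{\theta}_i) = \frac{\sigma^2}{J}. \]

But, based on the FLW1 assumptions, one can easily show that

\[ Var(\hat{\theta}_i) = \frac{2\sigma_i^2 (1 - \rho_i)}{\sum_j \phi_j^2} \tag{2.4} \]

and it is easy to see that (2.3) $\le$ (2.4), since $1 + \rho_i \le 2$ and, for nonnegative $\alpha_j$ and $\phi_j$, the denominator of (2.3) is at least $\sum_j \phi_j^2$. Given the Li and Wong model, one could choose a suitable model based on the distribution of the errors. Another important point in selecting an estimate is unbiasedness and low variance. Since we have shown that the corresponding $\hat{\theta}_i$ for our model is an unbiased estimate with low variance, and according to the comparison above, we see that the full model should be a good choice.

2.4 The full model: A general case

In this section, we propose to augment the last model to take into account the difference between the empirically observed variances of PM and MM, as shown in Figure 4.

[Figure 4: Difference between standard deviation of PM and MM. Histogram; x-axis: Stdv(PM) - Stdv(MM), y-axis: Frequency.]

We will then assume that the error terms in (2.1) and (2.2) are distributed according to

\[ \begin{pmatrix} \epsilon^{P}_{ij} \\ \epsilon^{M}_{ij} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{1,i}^2 & \rho_i \sigma_{1,i} \sigma_{2,i} \\ \rho_i \sigma_{1,i} \sigma_{2,i} & \sigma_{2,i}^2 \end{pmatrix} \right) \]

where $\sigma_{1,i}^2$ and $\sigma_{2,i}^2$ are the variances and $\rho_i$ is the corresponding correlation coefficient. From now on, we will call this model the FLW2 model.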
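The variance comparison of Section 2.3 can be checked numerically. The sketch below evaluates the RLW least-squares index together with the two variance expressions (2.3) and (2.4), treating all other parameters as known. The function names and the numeric values are illustrative assumptions, not part of the original analysis.

```python
def rlw_theta_hat(pm_row, mm_row, phi):
    """Least-squares RLW index for one array: sum_j Y_ij phi_j / sum_j phi_j^2."""
    num = sum((p - m) * f for p, m, f in zip(pm_row, mm_row, phi))
    return num / sum(f * f for f in phi)


def var_flw1(sigma2, rho, alpha, phi):
    """Variance (2.3) of the FLW1 estimate, other parameters treated as known."""
    denom = sum(f * f + 2.0 * (1.0 - rho) * (a * a + a * f)
                for a, f in zip(alpha, phi))
    return sigma2 * (1.0 - rho ** 2) / denom


def var_rlw_under_flw1(sigma2, rho, phi):
    """Variance (2.4) of the RLW estimate when the errors follow FLW1."""
    return 2.0 * sigma2 * (1.0 - rho) / sum(f * f for f in phi)


# Illustrative values only: 4 probe pairs with nonnegative alpha_j and phi_j.
alpha = [0.5, 0.4, 0.6, 0.3]
phi = [1.2, 0.8, 1.0, 0.9]
for rho in (-0.5, 0.0, 0.5, 0.9):
    v_full = var_flw1(400.0, rho, alpha, phi)
    v_reduced = var_rlw_under_flw1(400.0, rho, phi)
    print(f"rho={rho:+.1f}  (2.3)={v_full:.2f}  (2.4)={v_reduced:.2f}")
```

With all $\alpha_j = 0$ the ratio of (2.3) to (2.4) reduces to $(1 + \rho_i)/2$, so the two variances coincide only in the limit $\rho_i \to 1$.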

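In this case, the likelihood function has the form

\[ L = \prod_{i,j} L(PM_{ij}, MM_{ij}, \theta_i, \alpha_j, \phi_j, \nu_j, \sigma_{1,i}, \sigma_{2,i}, \rho_i) = \prod_{i,j} K_i \exp\left\{ -\frac{1}{2(1 - \rho_i^2)} \left[ \frac{X_1^2}{\sigma_{1,i}^2} - 2\rho_i \frac{X_1 X_2}{\sigma_{1,i} \sigma_{2,i}} + \frac{X_2^2}{\sigma_{2,i}^2} \right] \right\}. \]

The same computations as above lead to the maximum likelihood estimates of the parameters:

\[ \hat{\phi}_j = \frac{\sum_i \frac{\theta_i}{\sigma_{1,i}^2 (1 - \rho_i^2)} \left[ \left( PM_{ij} - \rho_i \frac{\sigma_{1,i}}{\sigma_{2,i}} MM_{ij} \right) - \left( 1 - \rho_i \frac{\sigma_{1,i}}{\sigma_{2,i}} \right)(\nu_j + \theta_i \alpha_j) \right]}{\sum_i \frac{\theta_i^2}{\sigma_{1,i}^2 (1 - \rho_i^2)}} \]

\[ \hat{\alpha}_j = \frac{\sum_i \theta_i \left[ a_i PM_{ij} + b_i MM_{ij} - \nu_j (a_i + b_i) - a_i \theta_i \phi_j \right]}{\sum_i \theta_i^2 (a_i + b_i)} \]

\[ \hat{\nu}_j = \frac{\sum_i \left[ a_i (PM_{ij} - \theta_i \alpha_j - \theta_i \phi_j) + b_i (MM_{ij} - \theta_i \alpha_j) \right]}{\sum_i (a_i + b_i)} \]

\[ \hat{\theta}_i = \frac{A_i + B_i}{\sum_j \left[ \frac{\phi_j^2}{\sigma_{1,i}^2 (1 - \rho_i^2)} + (a_i + b_i)\alpha_j^2 + 2 a_i \alpha_j \phi_j \right]} \]

\[ \hat{\sigma}_{1,i}^2 = \frac{\sum_j X_1^2}{J}, \qquad \hat{\sigma}_{2,i}^2 = \frac{\sum_j X_2^2}{J}, \qquad \hat{\rho}_i = \frac{\sum_j X_1 X_2}{\sqrt{\left( \sum_j X_1^2 \right) \left( \sum_j X_2^2 \right)}} \]

where

\[ A_i = \sum_j \phi_j \left[ \frac{1}{1 - \rho_i^2} \left( \frac{PM_{ij}}{\sigma_{1,i}^2} - \frac{\rho_i MM_{ij}}{\sigma_{1,i} \sigma_{2,i}} \right) - a_i \nu_j \right], \qquad B_i = \sum_j \alpha_j \left[ a_i PM_{ij} + b_i MM_{ij} - (a_i + b_i)\nu_j \right], \]

and

\[ a_i = \frac{1 - \rho_i \frac{\sigma_{1,i}}{\sigma_{2,i}}}{\sigma_{1,i}^2 (1 - \rho_i^2)}, \qquad b_i = \frac{1 - \rho_i \frac{\sigma_{2,i}}{\sigma_{1,i}}}{\sigma_{2,i}^2 (1 - \rho_i^2)}. \]

Given the other parameters, it is thus easy to see that the estimate $\hat{\theta}_i$ of the expression index is unbiased. For the variance we get

\[ Var(\hat{\theta}_i) = \frac{1}{\sum_j \left[ \frac{\phi_j^2}{\sigma_{1,i}^2 (1 - \rho_i^2)} + (a_i + b_i)\alpha_j^2 + 2 a_i \alpha_j \phi_j \right]} \tag{2.5} \]

On the other hand, the variance of $\hat{\theta}_i$ based on the RLW is

\[ Var(\hat{\theta}_i) = \frac{\sigma_{1,i}^2 + \sigma_{2,i}^2 - 2\rho_i \sigma_{1,i} \sigma_{2,i}}{\sum_j \phi_j^2} \tag{2.6} \]

and it is not easy to compare these variances in general. For example, when $a_i \ge 0$ we have (2.5) $\le$ (2.6). In general, we use data from the spike-in studies HGU95A and HGU133 to make this comparison (see Figures 5-6), and we see that (2.5) $\le$ (2.6) for almost all the data (99 per cent).

[Figure 5: Ratio of log-variance between FLW2 and RLW, HGU133. Histogram of log(VFLW2/VRLW); y-axis: Frequency.]

[Figure 6: Ratio of log-variance between FLW2 and RLW, HGU95A. Histogram of log(VFLW2/VRLW); y-axis: Frequency.]

3. Numerical Results and Conclusions

3.1 The model based on PM only

It has been observed that some MM probes may respond poorly to changes in the expression level of the target gene, as discussed in Li and Wong (2001b). This phenomenon raised questions about the efficiency of using MM probes, and led some investigators to calculate fold changes using only PM probes. To investigate the relative performance of PM-only using RLW and FLW2, we modified the FLW2 model to estimate gene expression levels using only PM probes, and compared it to RLW. The modified FLW2 model becomes

\[ PM_{ij} = \nu_j + \theta_i \phi_j + \epsilon_{ij}, \quad \text{where } \epsilon_{ij} \sim N(0, \sigma_i^2). \]

The same procedure as above gives:

\[ \hat{\phi}_j = \frac{\sum_i \frac{\theta_i}{\sigma_i^2} (PM_{ij} - \nu_j)}{\sum_i \frac{\theta_i^2}{\sigma_i^2}}, \qquad \hat{\nu}_j = \frac{\sum_i \frac{1}{\sigma_i^2} (PM_{ij} - \theta_i \phi_j)}{\sum_i \frac{1}{\sigma_i^2}}, \]

\[ \hat{\theta}_i = \frac{\sum_j \phi_j (PM_{ij} - \nu_j)}{\sum_j \phi_j^2}, \qquad \hat{\sigma}_i^2 = \frac{\sum_j (PM_{ij} - \theta_i \phi_j - \nu_j)^2}{J}. \]

To evaluate how this model performs, we use a spike-in study, HGU95A, designed by Affymetrix.

3.2 Data

The HGU95A GeneChip data set is a subset of the data used to develop and validate the MAS5.0 algorithm. Human cRNA fragments matching 16 probe-sets on the HGU95A GeneChip were added to the hybridization mixture of the arrays at concentrations ranging from 0 to 1024 picomolar. The same hybridization mixture, obtained from a common tissue source, was used for all arrays. The cRNAs were spiked-in at a different concentration on each array (apart from replicates), arranged in a cyclic Latin square design with each concentration appearing once in each row and column. Within each experiment, only the spike-in concentrations are varied; the background is the same for all arrays. Fold change calculations are always made within an experiment to ensure that only spiked-in genes will be differentially expressed. For more details see (http://www.affymetrix.com/analysis/downloadcenter.affx).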
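The four PM-only estimating equations of Section 3.1 can each be read as one update step given the current values of the other parameters. The sketch below writes them as separate functions. The names are hypothetical, and the sketch deliberately omits an outer iteration loop, starting values, a stopping rule, and any identifiability constraint (such as a normalization of $\phi_j$), none of which are specified in the text.

```python
def update_theta(pm_row, phi, nu):
    """theta_i update: sum_j phi_j (PM_ij - nu_j) / sum_j phi_j^2."""
    num = sum(f * (y - n) for y, f, n in zip(pm_row, phi, nu))
    return num / sum(f * f for f in phi)


def update_sigma2(pm_row, theta_i, phi, nu):
    """sigma_i^2 update: mean squared residual over the J probes of array i."""
    resid = [y - theta_i * f - n for y, f, n in zip(pm_row, phi, nu)]
    return sum(r * r for r in resid) / len(pm_row)


def update_phi(pm_col, theta, sigma2, nu_j):
    """phi_j update: weighted regression across arrays, weights 1/sigma_i^2."""
    num = sum(t / s * (y - nu_j) for y, t, s in zip(pm_col, theta, sigma2))
    return num / sum(t * t / s for t, s in zip(theta, sigma2))


def update_nu(pm_col, theta, sigma2, phi_j):
    """nu_j update: weighted mean of PM_ij - theta_i phi_j across arrays."""
    num = sum((y - t * phi_j) / s for y, t, s in zip(pm_col, theta, sigma2))
    return num / sum(1.0 / s for s in sigma2)


# Noise-free check with made-up values: one array, two probes.
phi, nu = [1.0, 2.0], [5.0, 7.0]
pm_row = [n + 3.0 * f for n, f in zip(nu, phi)]  # theta_i = 3
theta_i = update_theta(pm_row, phi, nu)
```

On noise-free data each update returns the parameter that generated the data, which is a quick way to confirm that the signs and weights match the equations above.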

3.3 Numerical results

This section is concerned with evaluating how the FLW2 model based on PM only performs. We present a numerical comparison between FLW2 and RLW using the spike-in study HGU95A GeneChip. We computed our estimates using the R environment, see Ihaka and Gentleman (1996), which can be freely obtained from (http://cran.r-project.org), and the Methods for Affymetrix Oligonucleotide Arrays R package described in Irizarry et al. (2003a), which is freely available as part of the Bioconductor project, http://www.bioconductor.org. We then use a benchmark for Affymetrix GeneChip expression measures developed by Cope et al. (2003), which aims to evaluate and compare summaries of Affymetrix probe level data. We submitted our data to the corresponding webtool, which is available at (http://affycomp.biostat.jhsph.edu). The results obtained are summarized in Tables 1-2. We got results for RLW from (http://affycomp.biostat.jhsph.edu/affy/rafajhu.edu/030519.1451/completeassessment.pdf), and results corresponding to FLW2 are given in the Affycomp-webtool report.

The score components for Table 1 are as follows:

1. Signal detect slope: Slope obtained from regressing expression values on nominal concentrations in the spike-in data.
2. Signal detect R2: R-squared obtained from regressing expression values on nominal concentrations in the spike-in data.
3. AUC (FP < 100): Area under the ROC curve up to 100 false positives.
4. AFP, call if fc > 2: Average false positives if we use fold-change > 2 as a cut-off.
5. ATP, call if fc > 2: Average true positives if we use fold-change > 2 as a cut-off.
6. IQR: Interquartile range of log ratios among genes not differentially expressed.
7. Obs-intended-fc slope: Slope obtained from regressing observed log-fold-changes against nominal log-fold-changes.
8. Obs-(low)int-fc slope: Slope obtained from regressing observed log-fold-changes against nominal log-fold-changes for genes with nominal concentrations less than or equal to 2.
9. FC = 2, AUC (FP < 100): Area under the ROC curve up to 100 false positives when comparing arrays with nominal fold changes of 2.
10. FC = 2, AFP, call if fc > 2: Average false positives if we use fold-change > 2 as a cut-off when comparing arrays where nominal fold-changes are 2.
11. FC = 2, ATP, call if fc > 2: Average true positives if we use fold-change > 2 as a cut-off when comparing arrays where nominal fold-changes are 2.

and for Table 2:

1. Median SD: Median SD across replicates.
2. null log-fc IQR: Inter-quartile range of the log-fold-changes from genes that should not change.
3. null log-fc 99.9%: 99.9% percentile of the log-fold-changes from the genes that should not change.
4. Signal detect R2: R-squared obtained from regressing expression values on nominal concentrations in the spike-in data.
5. Signal detect slope: Slope obtained from regressing expression values on nominal concentrations in the spike-in data.
6. low.slope: Slope from regression of observed log concentration versus nominal log concentration for genes with low intensities.
7. med.slope: As above but for genes with medium intensities.
8. high.slope: As above but for genes with high intensities.
9. Obs-intended-fc slope: Slope obtained from regressing observed log-fold-changes against nominal log-fold-changes.
10. Obs-(low)int-fc slope: Slope obtained from regressing observed log-fold-changes against nominal log-fold-changes for genes with nominal concentrations less than or equal to 2.
11. low AUC: Area under the ROC curve (up to 100 false positives) for genes with low intensity, standardized so that the optimum is 1.
12. med AUC: As above but for genes with medium intensities.
13. high AUC: As above but for genes with high intensities.
14. weighted avg AUC: A weighted average of the previous 3 ROC curves with weights related to the amount of data in each class (low, medium, high).

For more details we refer to Irizarry et al. (2003c).
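Two of the simpler score components listed above can be sketched directly from their verbal definitions: the signal-detect slope and R-squared come from an ordinary least-squares regression, and the IQR is taken over log ratios of genes that should not change. This is an illustration under assumptions made here (function names, the quartile convention, and the toy numbers), not a reproduction of the affycomp webtool.

```python
import statistics


def signal_detect(nominal, observed):
    """Slope and R-squared from regressing observed values on nominal ones."""
    n = len(nominal)
    mx = sum(nominal) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in nominal)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nominal, observed))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def null_iqr(log_ratios):
    """Interquartile range of log ratios for genes that should not change."""
    # Assumption: quartiles by linear interpolation including both endpoints.
    q1, _, q3 = statistics.quantiles(log_ratios, n=4, method="inclusive")
    return q3 - q1


# Made-up numbers for illustration only.
nominal_log_conc = [0.0, 1.0, 2.0, 3.0]
observed_log_expr = [1.0, 3.0, 5.0, 7.0]
slope, r2 = signal_detect(nominal_log_conc, observed_log_expr)
spread = null_iqr([0.0, 1.0, 2.0, 3.0, 4.0])
```

A slope of 1 and an R-squared of 1 correspond to the "Perfection" column of the tables, while a perfect IQR is 0.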

Table 1: Comparison results 1

Score component                  FLW-PMonly   RLW-PMonly   Perfection
Signal detect slope              0.480        0.533        1
Signal detect R2                 0.85         0.846        1
AUC (FP < 100)                   0.783        0.674        1
AFP, call if fc > 2              7.331        36.907       0
ATP, call if fc > 2              10.78        11.47        16
IQR                              0.11         0.446        0
Obs-intended-fc slope            0.471        0.53         1
Obs-(low)int-fc slope            0.04         0.317        1
FC = 2, AUC (FP < 100)           0.460        0.167        1
FC = 2, AFP, call if fc > 2      6.81         8.64         0
FC = 2, ATP, call if fc > 2      1.000        1.50         16

Table 2: Comparison results 1

Score component                  FLW-PMonly   RLW-PMonly   Perfection
Median SD                        0.066        0.13         0
null log-fc IQR                  0.105        0.04         0
null log-fc 99.9%                0.656        1.437        0
Signal detect R2                 0.85         0.846        1
Signal detect slope              0.480        0.533        1
low.slope                        0.138        0.49         1
med.slope                        0.547        0.641        1
high.slope                       0.404        0.390        1
Obs-intended-fc slope            0.471        0.53         1
Obs-(low)int-fc slope            0.04         0.317        1
low AUC                          0.95         0.041        1
med AUC                          0.831        0.0          1
high AUC                         0.61         0.011        1
weighted avg AUC                 0.47         0.079        1

4. Conclusions

We have presented a comparison between the reduced and full forms of the Li and Wong models using either the full bivariate or PM-only models. To understand the difference in the performance of calls generated by these two models, we used both theoretical and numerical criteria. To choose between models, one can compare them in terms of accuracy (unbiased or low bias) and precision (low variance). We have shown that FLW1 has less variance than RLW. Furthermore, using the spike-in study, it seems clear that FLW2 has considerably less variance than RLW. We also see that the PM-only model provides important improvements in various aspects compared to the same model based on RLW.

References

Affycomp-webtool (2005). Bioconductor expression assessment tool for Affymetrix oligonucleotide arrays (affycomp). Report.

Affymetrix (1999). Microarray Suite User Guide, Version 4.

Affymetrix (2001). Microarray Suite User Guide, Version 5.

Cope, L. M., Irizarry, R. A., Jaffee, H., Wu, Z. and Speed, T. P. (2003). A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 20, 323-331.

Ihaka, R. and Gentleman, R. (1996). R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299-314.

Irizarry, R., Gautier, L. and Cope, L. (2003a). An R package for analyses of Affymetrix oligonucleotide arrays. In The Analysis of Gene Expression Data: Methods and Software (Edited by Parmigiani, G., Garrett, E. S., Irizarry, R. A. and Zeger, S. L.), 313-341. Springer.

Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y., Antonellis, K., Scherf, U. and Speed, T. (2003c). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264.

Lemon, W. J., Palatini, J. J. T., Krahe, R. and Wright, F. A. (2002). Theoretical and experimental comparisons of gene expression indexes for oligonucleotide arrays. Bioinformatics 18, 1470-6.

Li, C. and Wong, W. H. (2001a). Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. National Academy of Science 98, 31-36.

Li, C. and Wong, W. H. (2001b). Model-based analysis of oligonucleotide arrays: Model validation, design issues and standard error application. Genome Biology 2, research0032.1-0032.11.

Lockhart, D., Dong, H., Byrne, M., Follettie, M., Gallo, M., Chee, M., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. and Brown, E. L. (1996). Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675-1680.

Srivastava, M. S. (2002). Methods of Multivariate Statistics. John Wiley.

Taib, Z. (2004). Statistical analysis of oligonucleotide microarray data. Comptes Rendus de l'Académie des Sciences 327, 175-180.

Received January 3, 2006; accepted April 3, 2006.

Mounir Aout
Department of Statistics and Data Processing
IUT de Caen (Lisieux)
11 Bd Jules Ferry
14100 Lisieux
France
m.aout@lisieux.iutcaen.unicaen.fr