RESEARCH ARTICLE. The Penalized Biclustering Model And Related Algorithms Supplemental Online Material


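Journal of Applied Statistics, Vol. 00, No. 00, Month 200x, 1–8

Thierry Chekouo and Alejandro Murua
Département de mathématiques et de statistique, Université de Montréal, CP 6128, succ. centre-ville, Montréal, Québec H3C 3J7, Canada
(Received 00 Month 200x; in final form 00 Month 200x)
Corresponding author: A. Murua. E-mail: murua@dms.umontreal.ca
ISSN: 0266-4763 print / ISSN 1360-0532 online. © 200x Taylor & Francis. DOI: 10.1080/0266476YYxxxxxxxx. http://www.tandfonline.com

A. Introduction

In these sections we provide further details on the EM and Bayesian algorithms described in the main body of the paper. Section B provides the EM updating equations for the plaid model of the main paper. Section C displays the full conditional distributions of the labels and parameters associated with the penalized plaid model of Section 4. The procedure to initialize the parameters and labels for the Markov chain Monte Carlo implementation of the penalized plaid model is described in Section D. Section E gives the URL addresses of the biclustering packages used in this study, including our own package implementing the penalized plaid model.

B. The EM updating equations

Note that the bicluster and combination bicluster probabilities $p(\rho_i \kappa_j)$ are constants depending only on the combination bicluster. We will denote them as $\pi_k$, $k = 0, 1, \ldots, K$. Observe that

$$p(\rho, \kappa \mid Y, \theta) = \prod_{i,j} p(\rho_i, \kappa_j \mid y_{ij}, \theta).$$

It is straightforward to verify that

$$p(\rho_i, \kappa_j \mid y_{ij}, \theta) = p(\rho_i \kappa_j \mid y_{ij}, \theta) = \frac{\pi_{k(cb)}\, \sigma(\rho_i \kappa_j)^{-1}\, \phi\big((y_{ij} - \mu(\rho_i \kappa_j, \theta))/\sigma(\rho_i \kappa_j)\big)}{\sum_{\rho'_i \kappa'_j} \pi_{k(cb')}\, \sigma(\rho'_i \kappa'_j)^{-1}\, \phi\big((y_{ij} - \mu(\rho'_i \kappa'_j, \theta))/\sigma(\rho'_i \kappa'_j)\big)},$$

where $k(cb)$ and $k(cb')$ denote the combination biclusters associated to $(\rho_i, \kappa_j)$ and $(\rho'_i, \kappa'_j)$, respectively. The maximizer of $Q(\cdot \mid \theta)$ for the plaid model is obtained by taking the derivatives with respect to the parameters. These yield:

$$\mu_k = \frac{1}{\sum_{i,j} E_\theta(\rho_{ik}\kappa_{jk})} \sum_{i,j} \Big\{ E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij} - \sum_{k' \neq k} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}) \Big\},$$

$$\alpha_{ik} = \frac{1}{\sum_{j} E_\theta(\rho_{ik}\kappa_{jk})} \sum_{j} \Big\{ E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij} - \sum_{k' \neq k} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}) \Big\} - \mu_k,$$

$$\beta_{jk} = \frac{1}{\sum_{i} E_\theta(\rho_{ik}\kappa_{jk})} \sum_{i} \Big\{ E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij} - \sum_{k' \neq k} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}) \Big\} - \mu_k,$$

$$\pi_{k(cb)} \propto \sum_{i,j} p(\rho_i \kappa_j = \rho_{k(cb)} \kappa_{k(cb)} \mid y_{ij}, \theta),$$

where $\rho_{k(cb)} \kappa_{k(cb)}$ denotes the corresponding $k$-th combination bicluster, and

$$\sigma^2 = \frac{1}{qp} \sum_{i,j} E_\theta\Big[\Big(y_{ij} - \sum_{k} \rho_{ik}\kappa_{jk}(\mu_k + \alpha_{ik} + \beta_{jk})\Big)^2\Big]$$

$$\phantom{\sigma^2} = \frac{1}{qp} \sum_{i,j} \Big\{ y_{ij}^2 - 2 \sum_{k} E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij} (\mu_k + \alpha_{ik} + \beta_{jk}) + \sum_{k, k'} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})(\mu_k + \alpha_{ik} + \beta_{jk})(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}) \Big\}.$$

Note that the updating equations are recursive. The parameters can be estimated using a Gauss-Seidel relaxation scheme over $k = 1, \ldots, K$. For example, let the superscript $(t+1)$ denote the coefficients recently updated, and the superscript $(t)$ the coefficients not yet updated. Then, in order to solve the system, say for the $\alpha_{ik}$'s, we iterate for $k$ within the EM iterations

$$\alpha_{ik}^{(t+1)} = \frac{1}{\sum_{j} E_\theta(\rho_{ik}\kappa_{jk})} \sum_{j} \Big\{ E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij} - \sum_{k' < k} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})\big(\mu_{k'}^{(t+1)} + \alpha_{ik'}^{(t+1)} + \beta_{jk'}^{(t+1)}\big) - \sum_{k' > k} E_\theta(\rho_{ik}\kappa_{jk}\rho_{ik'}\kappa_{jk'})\big(\mu_{k'}^{(t)} + \alpha_{ik'}^{(t)} + \beta_{jk'}^{(t)}\big) \Big\} - \mu_k^{(t+1)}.$$

Also note that the expectation is intractable if the number of biclusters $K$ is large. For example, one needs to compute

$$E_\theta(\rho_{ik}\kappa_{jk}) = \sum_{\rho_i \kappa_j} \rho_{ik}\kappa_{jk}\, p(\rho_i \kappa_j \mid y_{ij}, \theta),$$

which is a sum involving $2^K$ terms. In the non-overlapping bicluster model, the sum reduces to one term,

$$E_\theta(\rho_{ik}\kappa_{jk}) = p\Big(\rho_{ik}\kappa_{jk} = 1,\ \prod_{k' \neq k} (1 - \rho_{ik'}\kappa_{jk'}) = 1 \,\Big|\, y_{ij}, \theta\Big).$$

In this latter case, the updating equations simplify to

$$\mu_k = \frac{\sum_{i,j} E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij}}{\sum_{i,j} E_\theta(\rho_{ik}\kappa_{jk})},$$

$$\alpha_{ik} = \frac{\sum_{j} E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij}}{\sum_{j} E_\theta(\rho_{ik}\kappa_{jk})} - \mu_k, \qquad \beta_{jk} = \frac{\sum_{i} E_\theta(\rho_{ik}\kappa_{jk})\, y_{ij}}{\sum_{i} E_\theta(\rho_{ik}\kappa_{jk})} - \mu_k,$$

$$\sigma_k^2 = \frac{1}{\sum_{i,j} E_\theta(\rho_{ik}\kappa_{jk})} \sum_{j} \sum_{i} E_\theta(\rho_{ik}\kappa_{jk}) (y_{ij} - \mu_k - \alpha_{ik} - \beta_{jk})^2,$$

$$\pi_k \propto \sum_{i,j} p(\rho_{ik}\kappa_{jk} = 1 \mid y_{ij}, \theta).$$

C. The penalized plaid model: the full conditionals

The labels

Note that the likelihood may be written as

$$\prod_{i,j} \exp\Big\{ -(1 - \gamma_{ij}) \Big[ \frac{\big(y_{ij} - \sum_{k} \rho_{ik}\kappa_{jk}(\mu_k + \alpha_{ik} + \beta_{jk})\big)^2}{2\sigma^2(\rho_i, \kappa_j)} + \log \sigma(\rho_i, \kappa_j) \Big] - \gamma_{ij} \Big[ \frac{(y_{ij} - \mu_0)^2}{2\sigma_0^2} + \log \sigma_0 \Big] \Big\}.$$

Let the bicluster $k$ be fixed. Define the variables

$$z_{ijk} = y_{ij} - \sum_{k' \neq k} \rho_{ik'}\kappa_{jk'}(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}), \qquad \alpha_k = (\alpha_{ik})_{i \in I_k} \in \mathbb{R}^{r_k}, \qquad \beta_k = (\beta_{jk})_{j \in J_k} \in \mathbb{R}^{c_k}.$$

To find the full conditional of the labels, say $\rho_{ik}$, we use the fact that

$$y_{ij} - \sum_{k'=1}^{K} \rho_{ik'}\kappa_{jk'}(\mu_{k'} + \alpha_{ik'} + \beta_{jk'}) = z_{ijk} - \rho_{ik}\kappa_{jk}(\mu_k + \alpha_{ik} + \beta_{jk})$$

$$= \rho_{ik}\kappa_{jk}(z_{ijk} - \mu_k - \alpha_{ik} - \beta_{jk}) + (1 - \rho_{ik}) z_{ijk} + \rho_{ik}(1 - \kappa_{jk}) z_{ijk}.$$

Note that $\gamma_{ij} = \prod_{k=1}^{K} (1 - \rho_{ik}\kappa_{jk})$. For a given $k$, we will write $\gamma_{ij}^{-k} = \prod_{k' \neq k} (1 - \rho_{ik'}\kappa_{jk'})$. Then

$$\rho_{ik}\kappa_{jk}(1 - \gamma_{ij}) = \rho_{ik}\kappa_{jk} - \gamma_{ij}^{-k}(1 - \rho_{ik}\kappa_{jk})\rho_{ik}\kappa_{jk} = \rho_{ik}\kappa_{jk}.$$

We have

$$\sum_{j} (1 - \gamma_{ij}) \frac{\big(z_{ijk} - \rho_{ik}\kappa_{jk}(\mu_k + \alpha_{ik} + \beta_{jk})\big)^2}{2\sigma^2(\rho_i \kappa_j)} = \rho_{ik} \sum_{j \in J_k} \frac{(z_{ijk} - \mu_k - \alpha_{ik} - \beta_{jk})^2}{2\sigma^2(\rho_i \kappa_j)} + (1 - \rho_{ik}) \sum_{j \in J_k} (1 - \gamma_{ij}^{-k}) \frac{z_{ijk}^2}{2\sigma^2(\rho_i \kappa_j)} + \sum_{j \notin J_k} (1 - \gamma_{ij}^{-k}) \frac{z_{ijk}^2}{2\sigma^2(\rho_i \kappa_j)},$$

where the last sum collects the two terms with coefficients $\rho_{ik}$ and $1 - \rho_{ik}$ over $j \notin J_k$.

As before, let $\theta$ denote the set of parameters of the model. Define

$$A_{ik} = \exp\Big\{ -\sum_{j \in J_k} \frac{(z_{ijk} - \mu_k - \alpha_{ik} - \beta_{jk})^2}{2\sigma^2(\rho_i \kappa_j)} \Big\} \Big/ \prod_{j \in J_k} \sigma(\rho_i \kappa_j),$$

$$B_{ik} = \exp\Big\{ \frac{1}{2\sigma_0^2} \sum_{j \in J_k} \gamma_{ij}^{-k} (y_{ij} - \mu_0)^2 \Big\}\, (\sigma_0^2)^{\sum_{j \in J_k} \gamma_{ij}^{-k}/2},$$

$$C_{ik} = \exp\Big\{ \sum_{j \in J_k} (1 - \gamma_{ij}^{-k}) \Big( \frac{z_{ijk}^2}{2\sigma^2(\rho_i \kappa_j)} + \log \sigma(\rho_i \kappa_j) \Big) \Big\},$$

$$D_{ik,\rho_{ik}} = \exp\Big\{ -\sum_{j \notin J_k} (1 - \gamma_{ij}^{-k}) \Big( \frac{z_{ijk}^2}{2\sigma^2(\rho_i \kappa_j)} + \log \sigma(\rho_i \kappa_j) \Big) \Big\}.$$

Also let $\rho_{-(ik)}$ denote the set of all row labels except $\rho_{ik}$. From the above equation it is straightforward to verify that the full conditionals of $\rho_{ik}$ satisfy

$$p(\rho_{ik} \mid \{y_{ij}\}, \rho_{-(ik)}, \kappa, \theta) \propto A_{ik}^{\rho_{ik}}\, B_{ik}^{\rho_{ik}}\, C_{ik}^{\rho_{ik}}\, D_{ik,\rho_{ik}}\, \pi(\rho_{ik}),$$

where

$$\pi(\rho_{ik}) = \exp\Big\{ -\lambda \sum_{j} \Big[ \sum_{k'=1,\, k' \neq k}^{K} \rho_{ik'}\kappa_{jk'} - 1 + \gamma_{ij}^{-k} + (1 - \gamma_{ij}^{-k})\kappa_{jk}\rho_{ik} \Big] \Big\}.$$

In particular, the ratio $p(\rho_{ik} = 1 \mid \{y_{ij}\}, \rho_{-(ik)}, \kappa, \theta) / p(\rho_{ik} = 0 \mid \{y_{ij}\}, \rho_{-(ik)}, \kappa, \theta)$ is given by

$$A_{ik}\, B_{ik}\, C_{ik}\, \frac{D_{ik,1}}{D_{ik,0}}\, \exp\Big\{ -\lambda \sum_{j} (1 - \gamma_{ij}^{-k})\kappa_{jk} \Big\}.$$

The term $D_{ik,\rho_{ik}}$ may be ignored for models whose variances do not depend on $(i, j)$. In particular, for the plaid model, the logarithm of this ratio is

$$\frac{1}{2\sigma^2} \sum_{j \in J_k} \Big\{ -(z_{ijk} - \mu_k - \alpha_{ik} - \beta_{jk})^2 + (1 - \gamma_{ij}^{-k}) z_{ijk}^2 + \gamma_{ij}^{-k}(y_{ij} - \mu_0)^2 \Big\} - \lambda \sum_{j} (1 - \gamma_{ij}^{-k})\kappa_{jk}.$$

The full conditionals for the $\kappa_{jk}$'s are found in a similar way, by symmetry.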
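As an illustration of the plaid-model log-ratio for a row label, the following Python sketch evaluates that expression for one label and then draws the label from the corresponding Bernoulli distribution. The function names, the argument order and the list-of-lists layout (`rho[i][k]`, `kappa[j][k]`, `alpha[k][i]`, `beta[k][j]`) are assumptions made for this example only; they are not taken from the authors' Java package.

```python
import math
import random


def plaid_row_label_log_odds(i, k, y, rho, kappa, mu, alpha, beta, mu0, sigma2, lam):
    """Log of p(rho_ik = 1 | rest) / p(rho_ik = 0 | rest) for the plaid model.

    Hypothetical layout: rho[i][k] and kappa[j][k] are 0/1 labels,
    alpha[k][i] and beta[k][j] are the row and column effects of bicluster k.
    """
    n_biclusters = len(mu)
    log_odds = 0.0
    for j in range(len(kappa)):
        if kappa[j][k] == 0:
            continue  # only columns in J_k enter the plaid log-ratio
        # Biclusters other than k that currently cover cell (i, j).
        others = [kp for kp in range(n_biclusters)
                  if kp != k and rho[i][kp] * kappa[j][kp] == 1]
        # gamma_ij^{-k} = 1 when no other bicluster covers the cell.
        gamma_minus_k = 0 if others else 1
        # z_ijk: observation minus the contribution of the other biclusters.
        z = y[i][j] - sum(mu[kp] + alpha[kp][i] + beta[kp][j] for kp in others)
        resid = z - mu[k] - alpha[k][i] - beta[k][j]
        log_odds += (-(resid ** 2)
                     + (1 - gamma_minus_k) * z ** 2
                     + gamma_minus_k * (y[i][j] - mu0) ** 2) / (2.0 * sigma2)
        # Overlap penalty: -lambda * (1 - gamma_ij^{-k}) * kappa_jk.
        log_odds -= lam * (1 - gamma_minus_k)
    return log_odds


def sample_row_label(log_odds, rng=random):
    """Draw rho_ik from Bernoulli(sigmoid(log_odds)) without overflow."""
    if log_odds >= 0:
        p_one = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        p_one = e / (1.0 + e)
    return 1 if rng.random() < p_one else 0
```

A Gibbs sweep over the row labels would call these two functions for each pair (i, k) in turn, updating `rho[i][k]` in place before moving to the next label; the column labels would be handled by the symmetric computation.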

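The row and column effects

Define the matrices

$$R_k = \mathrm{diag}\Big( \sum_{j \in J_k} \sigma^{-2}(\rho_i, \kappa_j) \Big)_{i \in I_k}, \qquad C_k = \mathrm{diag}\Big( \sum_{i \in I_k} \sigma^{-2}(\rho_i, \kappa_j) \Big)_{j \in J_k}.$$

Let $1_m$ denote the vector of all 1's in $\mathbb{R}^m$. Since the variance of $\alpha_k$ is given by $\sigma_\alpha^2 V_k = \sigma_\alpha^2 \big(I_{r_k} - \tfrac{1}{r_k} 1_{r_k} 1_{r_k}'\big)$, we may write $\alpha_k = V_k a_k$ for a random vector $a_k \sim N(0, \sigma_\alpha^2 I_{r_k})$. It is easy to verify that the full conditional of $a_k$ is a multivariate normal with mean $\mu_{a,k}$ and variance $\Sigma_{a,k}$ given by

$$\mu_{a,k} = (V_k R_k V_k + \sigma_\alpha^{-2} I_{r_k})^{-1} V_k z_{\alpha,k}, \qquad \Sigma_{a,k} = (V_k R_k V_k + \sigma_\alpha^{-2} I_{r_k})^{-1},$$

where $z_{\alpha,k} = \big( \sum_{j \in J_k} (z_{ijk} - \mu_k - \beta_{jk})/\sigma^2(\rho_i, \kappa_j) \big)_{i \in I_k}$. Similarly, let $U_k = I_{c_k} - \tfrac{1}{c_k} 1_{c_k} 1_{c_k}'$. We may write $\beta_k = U_k b_k$ for a random vector $b_k \sim N(0, \sigma_\beta^2 I_{c_k})$. It is easy to verify that the full conditional of $b_k$ is a multivariate normal with mean $\mu_{b,k}$ and variance $\Sigma_{b,k}$ given by

$$\mu_{b,k} = (U_k C_k U_k + \sigma_\beta^{-2} I_{c_k})^{-1} U_k z_{\beta,k}, \qquad \Sigma_{b,k} = (U_k C_k U_k + \sigma_\beta^{-2} I_{c_k})^{-1},$$

where $z_{\beta,k} = \big( \sum_{i \in I_k} (z_{ijk} - \mu_k - \alpha_{ik})/\sigma^2(\rho_i, \kappa_j) \big)_{j \in J_k}$.

For the plaid model $\sigma(\rho_i, \kappa_j) = \sigma$, and for the model of Cheng and Church [1], $\sigma(\rho_i, \kappa_j) = \sigma_k$. In both cases the variance is constant on each bicluster. Therefore, for these models, $R_k = \sigma_k^{-2} c_k I_{r_k}$ and $C_k = \sigma_k^{-2} r_k I_{c_k}$. Hence the conditional means and variances for $a_k$ and $b_k$ become

$$\mu_{a,k} = \frac{c_k}{\sigma_k^2} \Big( \frac{c_k}{\sigma_k^2} + \frac{1}{\sigma_\alpha^2} \Big)^{-1} (\bar z_{i \cdot k} - \bar z_k)_{i \in I_k}, \qquad \Sigma_{a,k} = \Big( \frac{c_k}{\sigma_k^2} + \frac{1}{\sigma_\alpha^2} \Big)^{-1} \Big( I_{r_k} + \frac{c_k \sigma_\alpha^2}{r_k \sigma_k^2} 1_{r_k} 1_{r_k}' \Big),$$

$$\mu_{b,k} = \frac{r_k}{\sigma_k^2} \Big( \frac{r_k}{\sigma_k^2} + \frac{1}{\sigma_\beta^2} \Big)^{-1} (\bar z_{\cdot jk} - \bar z_k)_{j \in J_k}, \qquad \Sigma_{b,k} = \Big( \frac{r_k}{\sigma_k^2} + \frac{1}{\sigma_\beta^2} \Big)^{-1} \Big( I_{c_k} + \frac{r_k \sigma_\beta^2}{c_k \sigma_k^2} 1_{c_k} 1_{c_k}' \Big),$$

where $\bar z_k$ denotes the mean of the values of $z_{ijk}$ in the bicluster $k$, $\bar z_{i \cdot k} = \sum_{j \in J_k} z_{ijk}/c_k$, and $\bar z_{\cdot jk} = \sum_{i \in I_k} z_{ijk}/r_k$.

It can be easily shown that the full conditionals of the means $\mu_k$, $k = 0, 1, \ldots, K$, are also normal distributions, with means and variances given by

$$\mu_{\mu,k} = \Big( \sigma_\mu^{-2} + \sum_{(i,j) \in B_k} \frac{1}{\sigma^2(\rho_i, \kappa_j)} \Big)^{-1} \sum_{(i,j) \in B_k} \frac{z_{ijk} - \alpha_{ik} - \beta_{jk}}{\sigma^2(\rho_i, \kappa_j)}, \qquad \Sigma_{\mu,k} = \Big( \sigma_\mu^{-2} + \sum_{(i,j) \in B_k} \frac{1}{\sigma^2(\rho_i, \kappa_j)} \Big)^{-1}.$$

Again, for the plaid and Cheng and Church models, the means and variances simplify to

$$\mu_{\mu,k} = \Big( \sigma_\mu^{-2} + \frac{n_k}{\sigma_k^2} \Big)^{-1} \frac{n_k}{\sigma_k^2}\, \bar z_k, \qquad \Sigma_{\mu,k} = \Big( \sigma_\mu^{-2} + \frac{n_k}{\sigma_k^2} \Big)^{-1}.$$

Note that when $\sigma_\mu^2$, $\sigma_\alpha^2$, and $\sigma_\beta^2$ tend to infinity we obtain the hard-EM (or ICM) estimators.

The variances

Let $z_{ij} = y_{ij} - \sum_{k} \rho_{ik}\kappa_{jk}(\mu_k + \alpha_{ik} + \beta_{jk})$. The full conditionals of the variances $\sigma_k^2$ are proportional to

$$\exp\Big\{ -\frac{1}{2\sigma_k^2} \Big( \sum_{(i,j) \in B_k} z_{ij}^2 + \nu s^2 \Big) - \frac{n_k}{2} \log(\sigma_k^2) - \Big( \frac{\nu}{2} + 1 \Big) \log \sigma_k^2 \Big\}.$$

If we suppose that there is no overlapping among the biclusters, we obtain the Cheng and Church model [1]. The corresponding full conditional of $\sigma_k^2$ is an inverse-$\chi^2$ distribution with scale $(\nu s^2 + \sum_{(i,j) \in B_k} z_{ij}^2)/(\nu + n_k)$ and $\nu + n_k$ degrees of freedom. If instead we suppose that $\sigma(\rho_i, \kappa_j) = \sigma$ independently of the cell $(i, j)$ (i.e., $\sigma_k = \sigma$ for all $k = 0, 1, \ldots, K$), then we obtain the full conditional of $\sigma^2$ for the plaid model. This is also an inverse-$\chi^2$ distribution, but this time with scale $(\nu s^2 + \sum_{i,j} z_{ij}^2)/(\nu + pq)$ and $\nu + pq$ degrees of freedom.

D. The penalized plaid model: initial values

Finding the initial membership labels $(\rho, \kappa)$ is a difficult task. Several procedures have been suggested in the literature. We have adopted a technique similar to that of Turner et al. [3]. We run two independent k-means algorithms [4] with $k = 2$: once for the rows and once for the columns. Using the Cartesian product of the resulting k-means row and column labels, we divide the data matrix into four groups. A single initial bicluster is chosen among these four groups according to a variance criterion explained a few lines below. The procedure is repeated as many times as the number of initial biclusters needed, so that a single initial bicluster is chosen after each application of the independent row and column k-means algorithms. The elements of the biclusters already chosen are masked by replacing their original values $y_{ij}$ by random values. This is done so that a different group may be chosen in the next iteration. The masking procedure is not new: it has been used before by Sheng, Moreau and De Moor [2] to find multiple biclusters.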
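One round of this initialization can be sketched in Python as follows. The sketch takes the two sets of 2-means labels as given (any k-means routine could supply them), scores each of the four groups with the ratio of random-effects ANOVA moment estimates that serves as the selection criterion in this section, and masks the chosen group. All function names are hypothetical, and drawing the masking values uniformly over the range of the data is an assumption of this example: the text only says "random values".

```python
import random


def anova_variance_ratio(block):
    """(sigma2_alpha + sigma2_beta) / sigma2_e from two-way ANOVA mean squares."""
    r, c = len(block), len(block[0])
    grand = sum(map(sum, block)) / (r * c)
    row_means = [sum(row) / c for row in block]
    col_means = [sum(block[i][j] for i in range(r)) / r for j in range(c)]
    mss_rows = c * sum((m - grand) ** 2 for m in row_means) / (r - 1)
    mss_cols = r * sum((m - grand) ** 2 for m in col_means) / (c - 1)
    sse = sum((block[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(r) for j in range(c))
    mss_err = sse / ((r - 1) * (c - 1))
    if mss_err <= 0:
        raise ValueError("residual mean square is zero; criterion undefined")
    var_alpha = (mss_rows - mss_err) / c  # moment estimate for the row effects
    var_beta = (mss_cols - mss_err) / r   # moment estimate for the column effects
    return (var_alpha + var_beta) / mss_err


def choose_initial_bicluster(y, row_labels, col_labels):
    """Return (score, rows, cols) of the best of the four 2 x 2 groups."""
    best = None
    for a in (0, 1):
        for b in (0, 1):
            rows = [i for i, lab in enumerate(row_labels) if lab == a]
            cols = [j for j, lab in enumerate(col_labels) if lab == b]
            if len(rows) < 2 or len(cols) < 2:
                continue  # the mean squares need at least two rows and columns
            block = [[y[i][j] for j in cols] for i in rows]
            score = anova_variance_ratio(block)
            if best is None or score > best[0]:
                best = (score, rows, cols)
    return best


def mask_bicluster(y, rows, cols, rng=random):
    """Overwrite the chosen cells with random values (assumed uniform here)."""
    lo = min(map(min, y))
    hi = max(map(max, y))
    for i in rows:
        for j in cols:
            y[i][j] = rng.uniform(lo, hi)
```

Repeating the three steps (2-means on rows and columns, `choose_initial_bicluster`, `mask_bicluster`) once per required bicluster mirrors the loop described in the text.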
The criterion to choose an initial bicluster among the four groups yielded by each iteration is the following. Suppose that the cells of each group follow a random effect additive ANOVA model. That is, on each group $g$,

$$y_{ij} = \mu_g + \alpha_{ig} + \beta_{jg} + \epsilon_{ij}, \qquad g = 1, 2, 3, 4.$$

The standard moment estimates of the variances are

$$\hat\sigma^2_{g\alpha} = \frac{MSS_g(\alpha) - MSS_g(e)}{c_g}, \qquad \hat\sigma^2_{g\beta} = \frac{MSS_g(\beta) - MSS_g(e)}{r_g}, \qquad \hat\sigma^2_{ge} = MSS_g(e),$$

where $r_g$ is the number of rows in the $g$-th group, $c_g$ is the number of columns, and $MSS_g(e)$, $MSS_g(\alpha)$ and $MSS_g(\beta)$ are the corresponding mean sums of squares for error, rows and columns, respectively. We select as an initial bicluster the group $g$ that maximizes $(\hat\sigma^2_{g\alpha} + \hat\sigma^2_{g\beta})/\hat\sigma^2_{ge}$.

For each initial bicluster, the parameters $\mu_g$, $\alpha_{ig}$ and $\beta_{jg}$ are initialized as $\bar y_{\cdot\cdot}$, $\bar y_{i\cdot} - \bar y_{\cdot\cdot}$ and $\bar y_{\cdot j} - \bar y_{\cdot\cdot}$, respectively, where $\bar y_{\cdot\cdot}$, $\bar y_{i\cdot}$ and $\bar y_{\cdot j}$ stand for the overall bicluster mean, the bicluster $i$-th row mean, and the bicluster $j$-th column mean, respectively. The parameter $\mu_0$ is estimated as the arithmetic mean of the zero-bicluster. The variance $\sigma^2$ is initialized as the mean sum of squares of all the residuals.

E. Biclustering algorithm packages

The following table gives the URLs associated with the biclustering packages used in the comparison study presented in Section 5 of the main body of this paper. The package names are given in parentheses after the associated algorithm names.

Algorithm (package): Description. Address
FABIA (fabia): R package, Factor Analysis for Bicluster Acquisition. www.bioconductor.org/packages/release/bioc/html/fabia.html
SAMBA (EXPANDER): EXpression Analyzer and DisplayER, a Java-based tool for analysis of gene expression and NGS (Next Generation Sequencing) data. acgt.cs.tau.ac.il/expander/overview.html
TUR (biclust): R package, the plaid model implementation of Turner et al. cran.r-project.org/web/packages/biclust/
CC (biclust): R package, original method suggested by Cheng and Church to fit a mixture model. cran.r-project.org/web/packages/biclust/
Spectral (biclust): R package, uses a singular value decomposition. cran.r-project.org/web/packages/biclust/
Penalized plaid (penalizedplaid): Java package. The models are: (1) the non-overlapping bicluster model; (2) the plaid model; and (3) the penalized plaid model. The estimation methods are: (a) the Gibbs sampler and (b) the Metropolis-Hastings algorithm. www.dms.umontreal.ca/~murua/

References

[1] Y. Cheng and G. Church, Biclustering of expression data, Int. Conf. Intelligent Systems for Molecular Biology (2000), pp. 93–103.
[2] Q. Sheng, Y. Moreau, and B. De Moor, Biclustering microarray data by Gibbs sampling, Bioinformatics 19 (2003), pp. ii196–ii205.
[3] H. Turner, T. Bailey, and W. Krzanowski, Improved biclustering of microarray data demonstrated through systematic performance tests, Computational Statistics & Data Analysis 48 (2005), pp. 235–254.
[4] J.H. Ward, Hierarchical groupings to optimize an objective function, J. American Statistical Association 58 (1963), pp. 236–244.