Computational Statistics Handbook with MATLAB

Computer Science and Data Analysis Series
Computational Statistics Handbook with MATLAB, Second Edition
Wendy L. Martinez, The Office of Naval Research, Arlington, Virginia, U.S.A.
Angel R. Martinez, Naval Surface Warfare Center, Dahlgren, Virginia, U.S.A.
Chapman & Hall/CRC, Taylor & Francis Group. Boca Raton, London, New York.
Chapman & Hall/CRC is an imprint of the Taylor & Francis Group, an informa business.

Table of Contents

Preface to the Second Edition
Preface to the First Edition

Chapter 1 Introduction
1.1 What Is Computational Statistics?
1.2 An Overview of the Book
    Philosophy
    What Is Covered
    A Word About Notation
1.3 MATLAB Code
    Computational Statistics Toolbox
    Internet Resources
1.4 Further Reading

Chapter 2 Probability Concepts
2.1 Introduction
2.2 Probability
    Background
    Probability
    Axioms of Probability
2.3 Conditional Probability and Independence
    Conditional Probability
    Independence
    Bayes' Theorem
2.4 Expectation
    Mean and Variance
    Skewness
    Kurtosis
2.5 Common Distributions
    Binomial
    Poisson
    Uniform
    Normal
    Exponential
    Gamma
    Chi-Square
    Weibull
    Beta
    Student's t Distribution
    Multivariate Normal
    Multivariate t Distribution
2.6 MATLAB Code
2.7 Further Reading
Exercises

Chapter 3 Sampling Concepts
3.1 Introduction
3.2 Sampling Terminology and Concepts
    Sample Mean and Sample Variance
    Sample Moments
    Covariance
3.3 Sampling Distributions
3.4 Parameter Estimation
    Bias
    Mean Squared Error
    Relative Efficiency
    Standard Error
    Maximum Likelihood Estimation
    Method of Moments
3.5 Empirical Distribution Function
    Quantiles
3.6 MATLAB Code
3.7 Further Reading
Exercises

Chapter 4 Generating Random Variables
4.1 Introduction
4.2 General Techniques for Generating Random Variables
    Uniform Random Numbers
    Inverse Transform Method
    Acceptance-Rejection Method
4.3 Generating Continuous Random Variables
    Normal Distribution
    Exponential Distribution
    Gamma
    Chi-Square
    Beta
    Multivariate Normal
    Multivariate Student's t Distribution
    Generating Variates on a Sphere
4.4 Generating Discrete Random Variables
    Binomial
    Poisson
    Discrete Uniform
4.5 MATLAB Code
4.6 Further Reading
Exercises

Chapter 5 Exploratory Data Analysis
5.1 Introduction
5.2 Exploring Univariate Data
    Histograms
    Stem-and-Leaf
    Quantile-Based Plots - Continuous Distributions
    Quantile Plots - Discrete Distributions
    Box Plots
5.3 Exploring Bivariate and Trivariate Data
    Scatterplots
    Surface Plots
    Contour Plots
    Bivariate Histogram
    3-D Scatterplot
5.4 Exploring Multi-Dimensional Data
    Scatterplot Matrix
    Slices and Isosurfaces
    Glyphs
    Andrews Curves
    Parallel Coordinates
5.5 MATLAB Code
5.6 Further Reading
Exercises

Chapter 6 Finding Structure
6.1 Introduction
6.2 Projecting Data
6.3 Principal Component Analysis
6.4 Projection Pursuit EDA
    Projection Pursuit Index
    Finding the Structure
    Structure Removal
6.5 Independent Component Analysis
6.6 Grand Tour
6.7 Nonlinear Dimensionality Reduction
    Multidimensional Scaling
    Isometric Feature Mapping - ISOMAP
6.8 MATLAB Code
6.9 Further Reading
Exercises

Chapter 7 Monte Carlo Methods for Inferential Statistics
7.1 Introduction
7.2 Classical Inferential Statistics
    Hypothesis Testing
    Confidence Intervals
7.3 Monte Carlo Methods for Inferential Statistics
    Basic Monte Carlo Procedure
    Monte Carlo Hypothesis Testing
    Monte Carlo Assessment of Hypothesis Testing
7.4 Bootstrap Methods
    General Bootstrap Methodology
    Bootstrap Estimate of Standard Error
    Bootstrap Estimate of Bias
    Bootstrap Confidence Intervals
7.5 MATLAB Code
7.6 Further Reading
Exercises

Chapter 8 Data Partitioning
8.1 Introduction
8.2 Cross-Validation
8.3 Jackknife
8.4 Better Bootstrap Confidence Intervals
8.5 Jackknife-After-Bootstrap
8.6 MATLAB Code
8.7 Further Reading
Exercises

Chapter 9 Probability Density Estimation
9.1 Introduction
9.2 Histograms
    1-D Histograms
    Multivariate Histograms
    Frequency Polygons
    Averaged Shifted Histograms
9.3 Kernel Density Estimation
    Univariate Kernel Estimators
    Multivariate Kernel Estimators
9.4 Finite Mixtures
    Univariate Finite Mixtures
    Visualizing Finite Mixtures
    Multivariate Finite Mixtures
    EM Algorithm for Estimating the Parameters
    Adaptive Mixtures
9.5 Generating Random Variables
9.6 MATLAB Code
9.7 Further Reading
Exercises

Chapter 10 Supervised Learning
10.1 Introduction
10.2 Bayes Decision Theory
    Estimating Class-Conditional Probabilities: Parametric Method
    Estimating Class-Conditional Probabilities: Nonparametric
    Bayes Decision Rule
    Likelihood Ratio Approach
10.3 Evaluating the Classifier
    Independent Test Sample
    Cross-Validation
    Receiver Operating Characteristic (ROC) Curve
10.4 Classification Trees
    Growing the Tree
    Pruning the Tree
    Choosing the Best Tree
    Other Tree Methods
10.5 Combining Classifiers
    Bagging
    Boosting
    Arcing Classifiers
    Random Forests
10.6 MATLAB Code
10.7 Further Reading
Exercises

Chapter 11 Unsupervised Learning
11.1 Introduction
11.2 Measures of Distance
11.3 Hierarchical Clustering
11.4 K-Means Clustering
11.5 Model-Based Clustering
    Finite Mixture Models and the EM Algorithm
    Model-Based Agglomerative Clustering
    Bayesian Information Criterion
    Model-Based Clustering Procedure
11.6 Assessing Cluster Results
    Mojena - Upper Tail Rule
    Silhouette Statistic
    Other Methods for Evaluating Clusters
11.7 MATLAB Code
11.8 Further Reading
Exercises

Chapter 12 Parametric Models
12.1 Introduction
12.2 Spline Regression Models
12.3 Logistic Regression
    Creating the Model
    Interpreting the Model Parameters
12.4 Generalized Linear Models
    Exponential Family Form
    Generalized Linear Model
    Model Checking
12.5 MATLAB Code
12.6 Further Reading
Exercises

Chapter 13 Nonparametric Models
13.1 Introduction
13.2 Some Smoothing Methods
    Bin Smoothing
    Running Mean
    Running Line
    Local Polynomial Regression - Loess
    Robust Loess
13.3 Kernel Methods
    Nadaraya-Watson Estimator
    Local Linear Kernel Estimator
13.4 Smoothing Splines
    Natural Cubic Splines
    Reinsch Method for Finding Smoothing Splines
    Values for a Cubic Smoothing Spline
    Weighted Smoothing Spline
13.5 Nonparametric Regression - Other Details
    Choosing the Smoothing Parameter
    Estimation of the Residual Variance
    Variability of Smooths
13.6 Regression Trees
    Growing a Regression Tree
    Pruning a Regression Tree
    Selecting a Tree
13.7 Additive Models
13.8 MATLAB Code
13.9 Further Reading
Exercises

Chapter 14 Markov Chain Monte Carlo Methods
14.1 Introduction
14.2 Background
    Bayesian Inference
    Monte Carlo Integration
    Markov Chains
    Analyzing the Output
14.3 Metropolis-Hastings Algorithms
    Metropolis-Hastings Sampler
    Metropolis Sampler
    Independence Sampler
    Autoregressive Generating Density
14.4 The Gibbs Sampler
14.5 Convergence Monitoring
    Gelman and Rubin Method
    Raftery and Lewis Method
14.6 MATLAB Code
14.7 Further Reading
Exercises

Chapter 15 Spatial Statistics
15.1 Introduction
    What Is Spatial Statistics?
    Types of Spatial Data
    Spatial Point Patterns
    Complete Spatial Randomness
15.2 Visualizing Spatial Point Processes
15.3 Exploring First-order and Second-order Properties
    Estimating the Intensity
    Estimating the Spatial Dependence
15.4 Modeling Spatial Point Processes
    Nearest Neighbor Distances
    K-Function
15.5 Simulating Spatial Point Processes
    Homogeneous Poisson Process
    Binomial Process
    Poisson Cluster Process
    Inhibition Process
    Strauss Process
15.6 MATLAB Code
15.7 Further Reading
Exercises

Appendix A Introduction to MATLAB
A.1 What Is MATLAB?
A.2 Getting Help in MATLAB
A.3 File and Workspace Management
A.4 Punctuation in MATLAB
A.5 Arithmetic Operators
A.6 Data Constructs in MATLAB
    Basic Data Constructs
    Building Arrays
    Cell Arrays
A.7 Script Files and Functions
A.8 Control Flow
    For Loop
    While Loop
    If-Else Statements
    Switch Statement
A.9 Simple Plotting
A.10 Contact Information

Appendix B Projection Pursuit Indexes
B.1 Indexes
    Friedman-Tukey Index
    Entropy Index
    Moment Index
    L2 Distances
B.2 MATLAB Source Code

Appendix C MATLAB Statistics Toolbox
    File I/O
    Dataset Arrays
    Grouped Data
    Descriptive Statistics
    Statistical Visualization
    Probability Density Functions
    Cumulative Distribution Functions
    Inverse Cumulative Distribution Functions
    Distribution Statistics Functions
    Distribution Fitting Functions
    Negative Log-Likelihood Functions
    Random Number Generators
    Hypothesis Tests
    Analysis of Variance
    Regression Analysis
    Multivariate Methods
    Cluster Analysis
    Classification
    Markov Models
    Design of Experiments
    Statistical Process Control
    Graphical User Interfaces

Appendix D Computational Statistics Toolbox
    Probability Distributions
    Statistics
    Random Number Generation
    Exploratory Data Analysis
    Bootstrap and Jackknife
    Probability Density Estimation
    Supervised Learning
    Unsupervised Learning
    Parametric and Nonparametric Models
    Markov Chain Monte Carlo
    Spatial Statistics

Appendix E Exploratory Data Analysis Toolboxes
E.1 Introduction
E.2 Exploratory Data Analysis Toolbox
E.3 EDA GUI Toolbox

Appendix F Data Sets
    Introduction

Appendix G Notation
    Overview
    Observed Data
    Greek Letters
    Functions and Distributions
    Matrix Notation
    Statistics

References
Author Index
Subject Index