A test for balanced coverage across cases and controls as a qualifying criterion in collapsing analysis.

Background and Motivation: Collapsing analyses test the association of qualifying rare variants in defined genomic regions (e.g., all consensus coding sequence (CCDS) boundaries) with disease phenotype. Qualifying criteria for variants in these analyses include variant quality (e.g., read depth, genotype quality), predicted variant function (e.g., protein-coding change) and population frequency (minor allele frequency in ExAC or control datasets). However, before these analyses can be carried out, it is essential to control and minimize signal artifacts arising from differences in sequencing coverage between cases and controls.

In the current IGM collapsing analysis framework, the penultimate step before the collapsing analysis of cases and controls is site coverage harmonization (SCH) [1]. For each genomic site interrogated in the collapsing run (e.g., CCDS sites), we calculate the fraction of cases and the fraction of controls covered at a predetermined threshold (e.g., 10X) and then take the absolute difference in fractional coverage between cases and controls. We then calculate the mean absolute difference across all sites and subtract it from each site's absolute difference to obtain that site's deviation from the mean difference; squaring this deviation defines the variation value for the site. The variation values across the millions of CCDS sites are sorted from largest to smallest and plotted as a cumulative sum. The curve is then tilted by 45° to find its peak; in other words, (y - x) is plotted against x, and the x value at which (y - x) is maximized gives the suggested cutoff index. Any site whose absolute fractional difference is above the corresponding threshold is then excluded from the subsequent collapsing analysis.
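The SCH cutoff search described above can be sketched in Python as follows. This is a minimal, hypothetical reconstruction rather than the ATAV implementation: the name `sch_prune` is illustrative, both axes of the cumulative-variation curve are assumed to be scaled to [0, 1] before taking y - x, and the threshold is assumed to be the absolute difference of the site at the cutoff index, with sites at or above it pruned.

```python
def sch_prune(case_frac, ctrl_frac):
    """Sketch of site coverage harmonization (SCH) pruning (hypothetical helper).

    case_frac[i], ctrl_frac[i]: fraction of cases / controls covered at the
    chosen depth (e.g. 10X) at site i. Returns (threshold, pruned_site_indices).
    """
    n = len(case_frac)
    diffs = [abs(a - b) for a, b in zip(case_frac, ctrl_frac)]
    mean_diff = sum(diffs) / n
    # Variation value: squared deviation of each site's difference from the mean difference.
    variation = [(d - mean_diff) ** 2 for d in diffs]
    total = sum(variation)
    if total == 0.0:
        return max(diffs), set()  # every site deviates equally, nothing stands out
    # Rank sites from largest to smallest variation.
    order = sorted(range(n), key=lambda i: variation[i], reverse=True)
    # Walk the cumulative-sum curve with both axes scaled to [0, 1] (assumption)
    # and keep the rank at which (y - x) peaks.
    running, best_gap, cutoff = 0.0, float("-inf"), 1
    for rank, i in enumerate(order, start=1):
        running += variation[i]
        gap = running / total - rank / n
        if gap > best_gap:
            best_gap, cutoff = gap, rank
    # Assumption: the threshold is the |difference| of the site at the cutoff index.
    threshold = diffs[order[cutoff - 1]]
    pruned = {i for i, d in enumerate(diffs) if d >= threshold}
    return threshold, pruned
```

For example, with nine sites whose case and control coverage fractions differ by 0.01 and one site that differs by 0.5, the peak of (y - x) falls at the first ranked site, so only the outlier is pruned. Note that every site must be scored before the cutoff is known, which is the source of the computational cost discussed below.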
This method is effective at pruning sites where the fractional coverage difference between cases and controls is large enough to bias collapsing studies. However, because the absolute fractional coverage difference is not normalized to the cohort mean, SCH prunes well-covered sites that may have a marginally larger coverage difference than poorly covered sites with a smaller difference. For instance, at an absolute difference threshold of 0.05, a site with fractional coverage 0.89 in cases and 0.95 in controls (difference 0.06) will be pruned, while a site with fractional coverage 0.12 in cases and 0.17 in controls (difference 0.05) will be retained. Additionally, computing the fractional coverage difference across all sites adds substantial computational effort to each collapsing run. In a typical rare-variant collapsing run, only ~300K sites in the CCDS regions carry a qualifying variant. Because the CCDS comprises ~33M bases, pre-computing coverage balance at every site represents roughly 100-fold more computation than is needed for each analysis.

Methods: To avoid retaining poorly covered sites at the expense of well-covered ones, we impose a statistical test of independence between case/control status and coverage. At a given site, let x be the number of cases covered at 10X, y the number of controls covered at 10X, s the total number of cases and t the total number of controls. We model the number of covered cases X as a binomial random variable:

$X \sim \mathrm{Bin}\left(n = x + y,\; p = P(\text{case} \mid \text{covered})\right)$

If case/control status and coverage status are independent, then:

$P(\text{case} \mid \text{covered}) = P(\text{case}) = \frac{s}{s + t}$

This allows us to perform a two-sided binomial test on the observed number of covered cases, x:

$\mathrm{BinomTest}\left(k = x,\; n = x + y,\; p = \frac{s}{s + t}\right)$

The binomial test described above can be executed independently at each site, which allows the computation to be parallelized. It also removes the need to pre-compute the fractional coverage difference at all CCDS sites in order to identify a threshold difference, as the SCH method requires. Instead, we can apply the binomial test of coverage bias as an additional qualifying criterion only at sites where an otherwise qualifying variant is identified in a sample, reducing the computational burden roughly 100-fold.

Results: We implemented the binomial test of independence between coverage and case/control status as an additional qualifying criterion in ATAV. We used two IGM cohorts to compare the binomial test method with the SCH method: (1) a chronic kidney disease (CKD) cohort with ~10,000 controls and ~1,700 cases, and (2) an idiopathic pulmonary fibrosis (IPF) cohort with ~4,000 controls and ~200 cases. For each cohort, we analyzed CCDS sites (S) using the SCH method and compiled the list of sites ($S_{SCH}$) that would be pruned before the collapsing analysis. Independently, we performed the binomial coverage test described above at every CCDS site in the same cohort and identified the sites ($S_{binom}$) with a nominally significant p-value (p < 0.05), which would be pruned before the collapsing analysis.
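The per-site binomial test can be sketched with the Python standard library alone. This is an illustrative sketch, not ATAV code: `coverage_binom_pvalue` and `site_passes_coverage_test` are hypothetical names, x, y, s and t follow the notation above, and the two-sided p-value uses the common convention (as in R's `binom.test`) of summing the probabilities of all outcomes no more likely than the observed one. Probabilities are computed in log space because terms such as $p^{1700}$ underflow ordinary floating point at cohort sizes like the CKD cohort.

```python
import math


def _binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Bin(n, p), computed in log space to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def coverage_binom_pvalue(x: int, y: int, s: int, t: int) -> float:
    """Two-sided binomial test of independence between coverage and case/control status.

    x: cases covered at the depth threshold (e.g. 10X) at this site
    y: controls covered at the same threshold
    s, t: total numbers of cases and controls in the cohort (both > 0)
    """
    n = x + y
    if n == 0:
        return 1.0  # no covered samples, so there is nothing to test
    p = s / (s + t)  # P(case | covered) under independence
    log_pmf = [_binom_log_pmf(i, n, p) for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one,
    # with a small relative tolerance (1e-7) so ties are not lost to rounding.
    limit = log_pmf[x] + math.log1p(1e-7)
    pvalue = sum(math.exp(lp) for lp in log_pmf if lp <= limit)
    return min(1.0, pvalue)


def site_passes_coverage_test(x: int, y: int, s: int, t: int, alpha: float = 0.05) -> bool:
    """Keep the site unless coverage is nominally associated with case/control status."""
    return coverage_binom_pvalue(x, y, s, t) >= alpha
```

For example, in a cohort of 10 cases and 10 controls, a site covered in all 10 cases and no controls gives p = 2/1024 ≈ 0.002 and would be pruned, while a site covered in 5 cases and 5 controls gives p = 1 and would be kept. Because each call depends only on its own site's counts, the test can be run only at sites carrying an otherwise qualifying variant.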
Finally, we ran a collapsing analysis on each cohort across all CCDS sites without any coverage filtering method, yielding the set of qualifying sites ($S_{QV}$).

$S_{QV}$ represents the set of sites where a qualifying variant, satisfying the usual qualifying criteria for variant quality, function and minor allele frequency, is present in at least one sample. Among these qualifying sites, we then calculated:

Qualifying sites pruned by both methods = $S_{SCH} \cap S_{binom}$
Qualifying sites uniquely pruned by the SCH method = $S_{SCH} \setminus S_{binom}$
Qualifying sites uniquely pruned by the binomial test method = $S_{binom} \setminus S_{SCH}$

Table 1. Sites pruned by coverage analysis methods. Rows: sites pruned by both methods; sites pruned by SCH only; sites pruned by the binomial test only. Columns: CKD cohort (MAF ), CKD cohort (MAF ), IPF cohort.

For all analyses, we found that the SCH method pruned vastly more sites than the binomial test method ($|S_{SCH} \setminus S_{binom}| \gg |S_{binom} \setminus S_{SCH}|$). We then examined the mean coverage of the pruned sites to evaluate the overall coverage of sites pruned by each method. We are typically interested in sites with high coverage across the cohort, where a sample has a higher probability of carrying a variant that satisfies the qualifying criteria. At each site, we compared the fractional coverage difference used by the SCH method against the binomial test p-value (Figure 1A). Sites pruned by the SCH method but retained by the binomial test had high overall coverage across the cohort (mean cohort fractional coverage across these sites = 0.86, Figure 1B), while sites pruned by the binomial test but retained by SCH had low coverage (mean cohort fractional coverage across these sites = 0.13, Figure 1C), suggesting that the binomial test can rescue well-covered sites that the SCH method would otherwise prune.
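The three Table 1 categories are plain set operations. A minimal sketch, assuming each set holds site identifiers (the `chrom:pos` strings and the function name `compare_pruned_sites` are hypothetical) and that both pruned sets are first restricted to the qualifying sites in $S_{QV}$:

```python
def compare_pruned_sites(s_qv, s_sch, s_binom):
    """Partition qualifying sites pruned by either coverage method (Table 1 categories).

    s_qv:    sites carrying at least one qualifying variant
    s_sch:   sites pruned by site coverage harmonization
    s_binom: sites pruned by the binomial coverage test
    """
    sch = s_sch & s_qv      # restrict each pruned set to qualifying sites
    binom = s_binom & s_qv
    return {
        "both": sch & binom,        # S_SCH intersect S_binom
        "sch_only": sch - binom,    # S_SCH minus S_binom
        "binom_only": binom - sch,  # S_binom minus S_SCH
    }


qv = {"chr1:100", "chr1:200", "chr2:300", "chr3:400"}
sch = {"chr1:100", "chr1:200", "chrX:999"}
binom = {"chr1:100", "chr3:400"}
counts = {k: len(v) for k, v in compare_pruned_sites(qv, sch, binom).items()}
print(counts)
```

Here `chrX:999` is pruned by SCH but carries no qualifying variant, so it does not count toward any category; the sizes of the three returned sets are the quantities tabulated in Table 1.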

Figure 1. (A) Scatter plot of the absolute difference in coverage fraction against the binomial test p-value for 100,000 CCDS sites. The lower left quadrant represents sites that are pruned because of a nominally significant binomial test p-value (p < 0.05) but retained by the SCH method. The upper right quadrant represents sites that are retained by the binomial test but pruned by the SCH method. (B) Frequency histogram of cohort fractional coverage for sites retained by the SCH method and pruned by the binomial test. (C) Frequency histogram of cohort fractional coverage for sites retained by the binomial test method and pruned by SCH.

Inflation: We also measured the inflation of collapsing results using lambda (the ratio of observed to expected p-value at the 50th percentile of gene p-values after collapsing) to check for any unforeseen biases introduced by the binomial test. In the two cohorts we evaluated, the inflation factor did not differ significantly between the two methods, with the binomial test method performing nominally better.

Table 2: Lambda from collapsing analysis using SCH or the binomial test to control for coverage imbalance. Columns: Lambda SCH, Lambda Binom-test. Rows: IPF cohort, CKD cohort (MAF ), CKD cohort (MAF ).

Qualifying variants in top collapsing genes:

We counted the number of qualifying variants (QVs) pruned uniquely by either the SCH or the binomial test method within the ten most significant genes from each collapsing analysis. The binomial test method rescued several QVs in top collapsing genes in every analysis, while the SCH method did not rescue any top-gene QVs in any analysis.

Table 3: Number of rescued qualifying variants in the top 10 most significant collapsing analysis genes

Cohort               # Binom. test rescued QVs   # SCH rescued QVs
IPF cohort           42                          0
CKD cohort (MAF )    5                           0
CKD cohort (MAF )    2                           0

ATAV runtime: Eliminating the SCH step from ATAV substantially reduces the overall time needed to complete a full collapsing analysis. For the ~11,700-sample CKD cohort, replacing the SCH step with the binomial test method reduced ATAV runtime by ~26 hours, and for the IPF cohort by ~13 hours. Each reduction corresponds to roughly half of the total runtime. Although runtime measurements depend on overall ATAV load at the time of analysis and are therefore subject to variation, the binomial test method clearly has the potential to greatly speed up collapsing analysis.

Conclusions: We implemented a test of independence of coverage and case/control status as a qualifying criterion in collapsing analysis. Our test of coverage independence rescued sites with reasonably balanced coverage that were pruned by the SCH method. In general, the sites pruned for coverage imbalance by the two methods overlapped substantially. However, compared with SCH, the binomial test uniquely retained far more sites than it uniquely pruned, so it can evaluate several thousand additional variant sites in the CCDS region that SCH would prune. The inflation factor, measured by lambda, did not differ significantly between the two methods.
Because typical collapsing runs already require coverage data for the entire cohort to establish a variant's minor allele frequency, adding a coverage comparison test on otherwise qualifying variants increased compute time only marginally. Implementing the coverage test as part of the collapsing run reduced ATAV compute load and collapsing analysis time by roughly 50% by eliminating the previously required coverage harmonization step. The binomial test for independence of coverage and case/control status is thus a computationally efficient and robust method to control for coverage imbalance in collapsing analysis.

REFERENCES:
1. Petrovski, S., et al. An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis. Am J Respir Crit Care Med, (1): p


More information

Considerations for Sampling from a Skewed Population: Establishment Surveys

Considerations for Sampling from a Skewed Population: Establishment Surveys Considerations for Sampling from a Skewed Population: Establishment Surveys Marcus E. Berzofsky and Stephanie Zimmer 1 Abstract Establishment surveys often have the challenge of highly-skewed target populations

More information

Rubric TESTING FRAMEWORK FOR EARLY WARNING INDICATORS CONTENTS

Rubric TESTING FRAMEWORK FOR EARLY WARNING INDICATORS CONTENTS TESTING FRAMEWORK FOR EARLY WARNING INDICATORS Joint project by: Ģirts Maslinarskis (Latvijas Banka), Jussi Leinonen (ECB) & Matti Hellqvist (ECB) 12th Payment and Settlement System Simulation Seminar

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

THE EUROSYSTEM S EXPERIENCE WITH FORECASTING AUTONOMOUS FACTORS AND EXCESS RESERVES

THE EUROSYSTEM S EXPERIENCE WITH FORECASTING AUTONOMOUS FACTORS AND EXCESS RESERVES THE EUROSYSTEM S EXPERIENCE WITH FORECASTING AUTONOMOUS FACTORS AND EXCESS RESERVES reserve requirements, together with its forecasts of autonomous excess reserves, form the basis for the calibration of

More information

Probability and distributions

Probability and distributions 2 Probability and distributions The concepts of randomness and probability are central to statistics. It is an empirical fact that most experiments and investigations are not perfectly reproducible. The

More information

Specific Objectives. Be able to: Apply graphical frequency analysis for data that fit the Log- Pearson Type 3 Distribution

Specific Objectives. Be able to: Apply graphical frequency analysis for data that fit the Log- Pearson Type 3 Distribution CVEEN 4410: Engineering Hydrology (continued) : Topic and Goal: Use frequency analysis of historical data to forecast hydrologic events Specific Be able to: Apply graphical frequency analysis for data

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering

More information

Lecture Week 4 Inspecting Data: Distributions

Lecture Week 4 Inspecting Data: Distributions Lecture Week 4 Inspecting Data: Distributions Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit So next week No lecture & workgroups But Practice Test on-line (BB) Enter data for your

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Let s make our own sampling! If we use a random sample (a survey) or if we randomly assign treatments to subjects (an experiment) we can come up with proper, unbiased conclusions

More information

Monte Carlo Simulation (General Simulation Models)

Monte Carlo Simulation (General Simulation Models) Monte Carlo Simulation (General Simulation Models) Revised: 10/11/2017 Summary... 1 Example #1... 1 Example #2... 10 Summary Monte Carlo simulation is used to estimate the distribution of variables when

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics Graphical and Tabular Methods in Descriptive Statistics MATH 3342 Section 1.2 Descriptive Statistics n Graphs and Tables n Numerical Summaries Sections 1.3 and 1.4 1 Why graph data? n The amount of data

More information

Data Distributions and Normality

Data Distributions and Normality Data Distributions and Normality Definition (Non)Parametric Parametric statistics assume that data come from a normal distribution, and make inferences about parameters of that distribution. These statistical

More information

Chapter 4. The Normal Distribution

Chapter 4. The Normal Distribution Chapter 4 The Normal Distribution 1 Chapter 4 Overview Introduction 4-1 Normal Distributions 4-2 Applications of the Normal Distribution 4-3 The Central Limit Theorem 4-4 The Normal Approximation to the

More information

Chapter 5. Forecasting. Learning Objectives

Chapter 5. Forecasting. Learning Objectives Chapter 5 Forecasting To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Binomial distribution

Binomial distribution Binomial distribution Jon Michael Gran Department of Biostatistics, UiO MF9130 Introductory course in statistics Tuesday 24.05.2010 1 / 28 Overview Binomial distribution (Aalen chapter 4, Kirkwood and

More information

Mortality of Beneficiaries of Charitable Gift Annuities 1 Donald F. Behan and Bryan K. Clontz

Mortality of Beneficiaries of Charitable Gift Annuities 1 Donald F. Behan and Bryan K. Clontz Mortality of Beneficiaries of Charitable Gift Annuities 1 Donald F. Behan and Bryan K. Clontz Abstract: This paper is an analysis of the mortality rates of beneficiaries of charitable gift annuities. Observed

More information

LAB 2 INSTRUCTIONS PROBABILITY DISTRIBUTIONS IN EXCEL

LAB 2 INSTRUCTIONS PROBABILITY DISTRIBUTIONS IN EXCEL LAB 2 INSTRUCTIONS PROBABILITY DISTRIBUTIONS IN EXCEL There is a wide range of probability distributions (both discrete and continuous) available in Excel. They can be accessed through the Insert Function

More information

Alternate Specifications

Alternate Specifications A Alternate Specifications As described in the text, roughly twenty percent of the sample was dropped because of a discrepancy between eligibility as determined by the AHRQ, and eligibility according to

More information

Tutorial Handout Statistics, CM-0128M Descriptive Statistics

Tutorial Handout Statistics, CM-0128M Descriptive Statistics Tutorial Handout Statistics, CM-0128M January 18, 2013 Exercise 1. The following figures show the annual salaries in of 20 workers in a small firm. Calculate the arithmetic mean, median and mode salaries.

More information

Accolade: The Effect of Personalized Advocacy on Claims Cost

Accolade: The Effect of Personalized Advocacy on Claims Cost Aon U.S. Health & Benefits Accolade: The Effect of Personalized Advocacy on Claims Cost A Case Study of Two Employer Groups October, 2018 Risk. Reinsurance. Human Resources. Preparation of This Report

More information

23.1 Probability Distributions

23.1 Probability Distributions 3.1 Probability Distributions Essential Question: What is a probability distribution for a discrete random variable, and how can it be displayed? Explore Using Simulation to Obtain an Empirical Probability

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution Patrick Breheny February 16 Patrick Breheny STA 580: Biostatistics I 1/38 Random variables The Binomial Distribution Random variables The binomial coefficients The binomial distribution

More information

EENG473 Mobile Communications Module 3 : Week # (11) Mobile Radio Propagation: Large-Scale Path Loss

EENG473 Mobile Communications Module 3 : Week # (11) Mobile Radio Propagation: Large-Scale Path Loss EENG473 Mobile Communications Module 3 : Week # (11) Mobile Radio Propagation: Large-Scale Path Loss Practical Link Budget Design using Path Loss Models Most radio propagation models are derived using

More information

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii)

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii) Contents (ix) Contents Preface... (vii) CHAPTER 1 An Overview of Statistical Applications 1.1 Introduction... 1 1. Probability Functions and Statistics... 1..1 Discrete versus Continuous Functions... 1..

More information

Introduction to QTL (Quantitative Trait Loci) & LOD analysis Steven M. Carr / Biol 4241 / Winter Study Design of Hamer et al.

Introduction to QTL (Quantitative Trait Loci) & LOD analysis Steven M. Carr / Biol 4241 / Winter Study Design of Hamer et al. Introduction to QTL (Quantitative Trait Loci) & LOD analysis Steven M. Carr / Biol 4241 / Winter 2016 Quantitative Trait Loci: contribution of multiple genes to a single trait Linkage between phenotypic

More information

Statistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

Statistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage 7 Statistical Intervals Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to

More information

And The Winner Is? How to Pick a Better Model

And The Winner Is? How to Pick a Better Model And The Winner Is? How to Pick a Better Model Part 2 Goodness-of-Fit and Internal Stability Dan Tevet, FCAS, MAAA Goodness-of-Fit Trying to answer question: How well does our model fit the data? Can be

More information

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either

More information

MEMORANDUM. From: Division of Risk, Strategy, and Financial Innovation 1

MEMORANDUM. From: Division of Risk, Strategy, and Financial Innovation 1 MEMORANDUM To: File From: Division of Risk, Strategy, and Financial Innovation 1 Re: Information regarding activities and positions of participants in the singlename credit default swap market Date: 3/15/2012

More information

Chapter 5 Discrete Probability Distributions. Random Variables Discrete Probability Distributions Expected Value and Variance

Chapter 5 Discrete Probability Distributions. Random Variables Discrete Probability Distributions Expected Value and Variance Chapter 5 Discrete Probability Distributions Random Variables Discrete Probability Distributions Expected Value and Variance.40.30.20.10 0 1 2 3 4 Random Variables A random variable is a numerical description

More information

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data Summarising Data Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Today we will consider Different types of data Appropriate ways to summarise these data 17/10/2017

More information

Genetic testing anti-selection risk and

Genetic testing anti-selection risk and Genetic testing anti-selection risk and implications for insurers Florian Rechfeld Senior Research Analyst, Life & Health R&D, Swiss Re CRO Assembly, 31 th May 2018 Trends and prospects in genetic testing

More information

CFA Level I - LOS Changes

CFA Level I - LOS Changes CFA Level I - LOS Changes 2018-2019 Topic LOS Level I - 2018 (529 LOS) LOS Level I - 2019 (525 LOS) Compared Ethics 1.1.a explain ethics 1.1.a explain ethics Ethics Ethics 1.1.b 1.1.c describe the role

More information

The Central Limit Theorem

The Central Limit Theorem The Central Limit Theorem Patrick Breheny March 1 Patrick Breheny University of Iowa Introduction to Biostatistics (BIOS 4120) 1 / 29 Kerrich s experiment Introduction The law of averages Mean and SD of

More information

CFA Level I - LOS Changes

CFA Level I - LOS Changes CFA Level I - LOS Changes 2017-2018 Topic LOS Level I - 2017 (534 LOS) LOS Level I - 2018 (529 LOS) Compared Ethics 1.1.a explain ethics 1.1.a explain ethics Ethics 1.1.b describe the role of a code of

More information