Descriptive Statistics Bios 662
1 Descriptive Statistics Bios 662 Michael G. Hudgens, Ph.D. mhudgens :51 BIOS Descriptive Statistics
2 Descriptive Statistics: Types of variables; Measures of location; Measures of spread, shape; Data displays
3 Types of Variables. A variable is a quantity that may vary from object to object. A sample or data set is a collection of values of one or more variables. Types of variables: quantitative (intrinsically numerical, e.g., age, height, counts) and qualitative or categorical (intrinsically nonnumerical, e.g., gender, province, country).
4 Types of Variables. Qualitative (categorical) variables are intrinsically nonnumerical. Binary (dichotomous): e.g., alive/dead, female/male. Ordinal (natural ordering): e.g., diagnosis (certain, probable, unlikely, ...), attitude (strongly agree, agree, neutral, ...). Nominal (no natural ordering): e.g., religion, race. In recording qualitative data, numerical values may be assigned.
5 Descriptive Statistics: Types of variables; Measures of location; Measures of spread, shape; Data displays
6 Measures of Location: (Arithmetic) mean; Percentiles; Median; Mode; Geometric mean
7 Arithmetic mean. Data: $x_1, x_2, \ldots, x_n$. Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$
8 Example. Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$. Mean: $\bar{x} = \frac{1}{4}(5 + 10 + 6 + 11) = \frac{32}{4} = 8$
9 Reporting of decimals. Report the mean with one more significant digit than the observations. Example: if $x$ is measured in whole numbers and $\bar{x} = 6.345$, report $\bar{x} = 6.3$.
10 Properties of Mean. Let $c$ be any constant. If $y_i = x_i + c$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = \bar{x} + c$. If $y_i = c x_i$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = c \bar{x}$.
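The two properties of the mean can be checked numerically. The following is a minimal R sketch using the hospital-stay data from the earlier example; the constant `c0` is an arbitrary value chosen here for illustration and does not come from the slides.

```r
# Hospital-stay data from the earlier example
x <- c(5, 10, 6, 11)
c0 <- 3                 # arbitrary constant (assumed for illustration)

mean(x)                 # 8
mean(x + c0)            # adding a constant shifts the mean: 8 + 3 = 11
mean(c0 * x)            # multiplying by a constant scales the mean: 3 * 8 = 24
```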
11 Properties of Mean - Example A sample of birth weights in a hospital found 1 oz = g ȳ = grams Therefore the mean in ozs. is x = ȳ = BIOS Descriptive Statistics
12 Order statistics. Data: $x_1, x_2, \ldots, x_n$. Order the data from smallest to largest: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Then $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the order statistics. Note $x_{(1)} = \min\{x_1, x_2, \ldots, x_n\}$ and $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$.
13 Example. Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$. Order statistics: $x_{(1)} = 5$, $x_{(2)} = 6$, $x_{(3)} = 10$, $x_{(4)} = 11$
14 Percentiles. Intuitive definition: the $x$th percentile is the value such that $x\%$ of the observations are less than that value. Also known as a sample quantile.
15 Percentiles: Text definition. The $(p \times 100)$th percentile of a sample, for $0 < p < 1$, is $\hat{\zeta}_p = \begin{cases} y_{(np+p)} & \text{if } np + p \text{ is an integer} \\ \{y_{(\lfloor np+p \rfloor)} + y_{(\lceil np+p \rceil)}\}/2 & \text{otherwise} \end{cases}$ Note: $\lfloor y \rfloor$ is the greatest integer $\le y$, i.e., the floor function; $\lceil y \rceil$ is the smallest integer $\ge y$, i.e., the ceiling function. Cf. Def 3.11 of text.
16 Percentiles: General form. General form (Hyndman and Fan, Am Stat 1996): $\hat{\zeta}_p = (1 - \gamma)\, y_{(j)} + \gamma\, y_{(j+1)}$, where $j = \lfloor pn + m \rfloor$ for some $m \in \mathbb{R}$ and $0 \le \gamma \le 1$. Let $g = pn + m - j$. If $m = p$ and $\gamma = \begin{cases} 0 & \text{if } g = 0 \\ 1/2 & \text{if } g > 0 \end{cases}$ then $j = \lfloor pn + p \rfloor$ and we recover the text definition.
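As a rough illustration of the text definition (the case $m = p$ with $\gamma \in \{0, 1/2\}$), here is a minimal R sketch. The function name `text_percentile` is hypothetical, and the rounding and index clamping are safeguards added here, not part of the definition.

```r
# Text definition of the (p * 100)th percentile; the helper name is hypothetical
text_percentile <- function(y, p) {
  stopifnot(p > 0, p < 1)
  ys <- sort(y)                   # order statistics y_(1), ..., y_(n)
  n <- length(ys)
  h <- round((n + 1) * p, 8)      # np + p; rounding guards against floating-point noise
  lo <- max(1, min(n, floor(h)))  # clamp to 1..n (added safeguard for extreme p)
  hi <- max(1, min(n, ceiling(h)))
  (ys[lo] + ys[hi]) / 2           # equals ys[h] when h is an integer
}

text_percentile(c(5, 10, 6, 11), 0.5)   # np + p = 2.5, so (6 + 10) / 2 = 8
text_percentile(1:9, 0.5)               # np + p = 5, so y_(5) = 5
```

Averaging the floor and ceiling positions handles both branches at once, since the two indices coincide when $np + p$ is an integer.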
17 Percentiles: Software. SAS Proc Univariate: 5 definitions of percentile. R: 9 definitions. Claim: none of these match the book definition.
18 R quantile() function
> ?quantile
quantile    package:stats    R Documentation
Sample Quantiles
Description: The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.
Usage:
quantile(x, ...)
## Default S3 method:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7, ...)
Arguments:
19 x: numeric vectors whose sample quantiles are wanted.
probs: numeric vector of probabilities with values in [0,1].
na.rm: logical; if true, any NA and NaN's are removed from x before the quantiles are computed.
names: logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs.
type: an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
...: further arguments passed to or from other methods.
Types: quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.
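20 Percentiles: Class Definition. The $(p \times 100)$th percentile of a sample, for $0 < p < 1$: $\hat{\zeta}_p = \begin{cases} y_{(\lfloor np \rfloor + 1)} & \text{if } np \text{ is not an integer} \\ \{y_{(np)} + y_{(np+1)}\}/2 & \text{if } np \text{ is an integer} \end{cases}$ This is Definition 2 of R/Hyndman and Fan: $m = 0$ and $\gamma = \begin{cases} 1 & \text{if } g > 0 \\ 1/2 & \text{if } g = 0 \end{cases}$ It is also Definition 5 of SAS.
21 Example. Suppose $n = 278$ and we want the 75th percentile. In R:
> x <- 1:278
> quantile(x,.75,type=2)
75%
209
Here $np = 278 \times 0.75 = 208.5$ is not an integer, so $\hat{\zeta}_{.75} = x_{(\lfloor 208.5 \rfloor + 1)} = x_{(209)}$.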
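The class definition is short enough to code directly and compare with `quantile(..., type = 2)`. This is a sketch only: `class_percentile` is a hypothetical helper name, and the rounding of `n * p` is a safeguard added here.

```r
# Class definition of the (p * 100)th percentile; the helper name is hypothetical
class_percentile <- function(y, p) {
  stopifnot(p > 0, p < 1)
  ys <- sort(y)
  n <- length(ys)
  np <- round(n * p, 8)          # rounding guards against floating-point noise
  if (np == floor(np)) {
    (ys[np] + ys[np + 1]) / 2    # np is an integer: average two order statistics
  } else {
    ys[floor(np) + 1]            # np is not an integer: take the next order statistic
  }
}

x <- 1:278
class_percentile(x, 0.75)               # np = 208.5, so x_(209) = 209
unname(quantile(x, 0.75, type = 2))     # 209, matching the slide
class_percentile(x, 0.5)                # np = 139, so (139 + 140) / 2 = 139.5
```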
22 Example: SAS
data;
  infile "H:/WWW/bios/662/2007fall/percentile.txt";
  input x;
proc univariate;
  var x;
run;
The UNIVARIATE Procedure. Variable: x. Quantiles (Definition 5), listed as Quantile and Estimate: 75% Q3, 50% Median, 25% Q1, 10%, 5%, 1% (3.0), 0% Min (1.0)
23 Median. The sample median is the 50th percentile: $\hat{\zeta}_{.5} = \begin{cases} y_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \{y_{(n/2)} + y_{(n/2+1)}\}/2 & \text{if } n \text{ is even} \end{cases}$
24 Example. Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$. Median: $\hat{\zeta}_{.5} = \{x_{(2)} + x_{(3)}\}/2 = (6 + 10)/2 = 8$
25 Mode. The mode is the most frequently occurring value in the data set. E.g., if $x_1 = 5$, $x_2 = 11$, $x_3 = 6$, $x_4 = 11$, then the mode is 11.
26 Geometric Mean. Data: $x_1, x_2, \ldots, x_n$. The geometric mean of $x$ is $\bar{x}_g = (x_1 x_2 \cdots x_n)^{1/n}$. Let $y_i = \log(x_i)$ for $i = 1, 2, \ldots, n$. Then $\bar{x}_g = \exp(\bar{y})$. $\bar{x}_g$ is used when data are of the form $c^k$. E.g., suppose $x_1 = 10$ and $x_2 = 0.1$. Then $\bar{x}_g = 1$.
27 Comments. The mean is the most often used measure. The median is better if there are influential observations (it is more robust to extreme values). The mode is rarely used (exception: nominal data).
28 Example. Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$. Then $\hat{\zeta}_{.5} = \bar{x} = 8$ and $\bar{x}_g = 7.6$. Alter the last observation: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 50$. Then $\hat{\zeta}_{.5} = 8$, $\bar{x} = 17.75$, and $\bar{x}_g = 11.1$.
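The comparison of the three location measures can be reproduced in a few lines of R. This is a sketch in which `geo_mean` is a hypothetical helper built from the log-scale identity $\bar{x}_g = \exp(\bar{y})$.

```r
x     <- c(5, 10, 6, 11)     # original hospital stays
x_alt <- c(5, 10, 6, 50)     # last observation altered

# Geometric mean via the mean of the logs (hypothetical helper name)
geo_mean <- function(v) exp(mean(log(v)))

c(median = median(x), mean = mean(x), geo = geo_mean(x))
# median 8, mean 8, geometric mean about 7.58

c(median = median(x_alt), mean = mean(x_alt), geo = geo_mean(x_alt))
# median 8, mean 17.75, geometric mean about 11.07
```

The median does not move when the largest value changes, while the arithmetic mean more than doubles; the geometric mean sits in between.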
29 Descriptive Statistics: Types of variables; Measures of location; Measures of spread, shape; Data displays
30 Measures of Spread, Shape: Range; Variance and standard deviation; Interquartile range; Skewness, kurtosis
31 Range. Range: $r_a = x_{(n)} - x_{(1)}$. Easy to calculate. Sensitive to unusual observations (outliers). Usually, the larger $n$ is, the larger $r_a$.
32 Sample Variance and Standard Deviation. Want to measure deviation from the mean. Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right)$. Sample standard deviation: $s = \sqrt{s^2}$
33 Sample Variance and Standard Deviation. An alternative form of the sample variance is $s_1^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$. Can show $s^2$ is unbiased for the population variance $\sigma^2$; however, $E(s_1^2) = \sigma^2 - \frac{\sigma^2}{n}$. van Belle et al. argue for $s^2$ based on d.f. (Note 3.5).
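A small R sketch, using the hospital-stay data from the earlier examples, shows the two divisors side by side and confirms that the computational shortcut agrees with the defining formula. The variable names are chosen here for illustration.

```r
x <- c(5, 10, 6, 11)
n <- length(x)

s2       <- sum((x - mean(x))^2) / (n - 1)           # defining formula, divisor n - 1
s2_short <- (sum(x^2) - n * mean(x)^2) / (n - 1)     # computational shortcut
s2_1     <- sum((x - mean(x))^2) / n                 # alternative form, divisor n

c(s2 = s2, s2_short = s2_short, var = var(x), s2_1 = s2_1)
# s2, s2_short and var(x) all equal 26/3 (about 8.67); s2_1 equals 6.5
sqrt(s2)                                             # sample standard deviation, same as sd(x)
```

R's built-in `var()` and `sd()` use the $n - 1$ divisor.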
34 Sample Standard Deviation. The units of $s$ are the same as the units of $x_i$. If $s$ is large, the data are spread over a wide range. Report the standard deviation with two more significant digits than the original observations.
35 Properties of the Standard Deviation. If $c$ is a constant and $y_i = x_i + c$, then $s_y = s_x$. If $y_i = c x_i$, then $s_y = |c| s_x$.
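These properties are easy to confirm numerically. In the sketch below the shift of 100 and the scale factor of -2 are arbitrary illustrative choices; a negative factor is used to show that the standard deviation scales by the magnitude of the constant, since a standard deviation cannot be negative.

```r
x <- c(5, 10, 6, 11)

sd(x)            # about 2.94
sd(x + 100)      # unchanged by adding a constant
sd(-2 * x)       # 2 * sd(x): the factor enters through its absolute value
```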
36 Some approximations (for approximately normal data). The interval $\bar{x} \pm s$ will contain approx 68% of the observations. The interval $\bar{x} \pm 2s$ will contain approx 95% of the observations. Approximate $s$ by $s \approx \frac{\hat{\zeta}_{.75} - \hat{\zeta}_{.25}}{1.35}$. Note $\hat{\zeta}_{.75} - \hat{\zeta}_{.25}$ is called the interquartile range.
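The constants in these approximations match what the normal distribution gives, which can be checked with R's `pnorm` and `qnorm`. This sketch assumes the approximations are meant for roughly normal data; it uses the standard normal (standard deviation 1) as the reference.

```r
# Probability within 1 and 2 standard deviations of the mean, standard normal
pnorm(1) - pnorm(-1)            # about 0.683
pnorm(2) - pnorm(-2)            # about 0.954

# Interquartile range of a standard normal, in units of the standard deviation
iqr_norm <- qnorm(0.75) - qnorm(0.25)
iqr_norm                        # about 1.349, hence s is roughly IQR / 1.35
```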
37 Symmetry and Skewness. Informally, define symmetry to indicate having a uniform or even distribution about the mean. If a distribution is symmetric, mean = median. Data sets that are not symmetric are said to be skewed. Skewness is a measurement of the degree to which a data set is skewed.
38 Skewness. Define the $r$th sample moment about the mean: $m_r = \frac{\sum_i (y_i - \bar{y})^r}{n}$ for $r = 1, 2, 3, \ldots$ Text definition of sample skewness: $a_3 = \frac{m_3}{(m_2)^{3/2}} = \frac{\sum_i (y_i - \bar{y})^3 / n}{\{\sum_i (y_i - \bar{y})^2 / n\}^{3/2}} = \frac{\sqrt{n} \sum_i (y_i - \bar{y})^3}{\{\sum_i (y_i - \bar{y})^2\}^{3/2}}$. Typo in text page 51. SAS Proc Univariate VARDEF=N.
39 Interpretation? Text: skewed to the right if mean is greater than mode. Values of $a_3 > 0$ indicate ... skewness to the right. However, for $\{0, 2, 2, 3, 4\}$, $\bar{x} = 2.2$, the mode equals 2, and the skewness is negative ($a_3 \approx -0.37$).
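The counterexample can be worked through in R with the moment-based definition $a_3 = m_3 / (m_2)^{3/2}$. The helper `m` is a name introduced here for the $r$th sample moment about the mean.

```r
y <- c(0, 2, 2, 3, 4)

# r-th sample moment about the mean (divisor n); helper name is hypothetical
m <- function(r) mean((y - mean(y))^r)

mean(y)                    # 2.2, which is greater than the mode (2)
a3 <- m(3) / m(2)^(3/2)
a3                         # about -0.37: negative, despite mean > mode
```

So the two rules of thumb for "skewed to the right" disagree on this data set.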
40 Alternative Definitions. Another definition of skewness: $b_3 = \frac{n \sqrt{n-1}}{n-2} \cdot \frac{\sum_i (y_i - \bar{y})^3}{\{\sum_i (y_i - \bar{y})^2\}^{3/2}}$. Default in SAS. Many more definitions; cf. Joanes and Gill (JRSS D 1998).
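To see how far the two skewness definitions can differ in a small sample, the sketch below computes both for the data set $\{0, 2, 2, 3, 4\}$ used on the previous slide. The last line writes $b_3$ in the equivalent standardized form $\frac{n}{(n-1)(n-2)} \sum_i \{(y_i - \bar{y})/s\}^3$, with $s$ the usual $n - 1$ standard deviation; that algebraic rewriting is supplied here and is not shown on the slide.

```r
y <- c(0, 2, 2, 3, 4)
n <- length(y)
d <- y - mean(y)                       # deviations from the mean

ratio <- sum(d^3) / sum(d^2)^(3/2)
a3 <- sqrt(n) * ratio                  # moment-based definition (divisor n)
b3 <- n * sqrt(n - 1) / (n - 2) * ratio

c(a3 = a3, b3 = b3)                    # about -0.37 and -0.55

# Same b3 in standardized form, using the n - 1 standard deviation
n / ((n - 1) * (n - 2)) * sum((d / sd(y))^3)
```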
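41 Kurtosis. Kurtosis is a measure of the flatness or peakedness of a distribution; degree of archedness; thickness of tails. Text definition of sample kurtosis: $a_4 = \frac{m_4}{(m_2)^2} = \frac{\sum_i (y_i - \bar{y})^4 / n}{\{\sum_i (y_i - \bar{y})^2 / n\}^2} = \frac{n \sum_i (y_i - \bar{y})^4}{\{\sum_i (y_i - \bar{y})^2\}^2}$. Typo in text page 51.
42 Kurtosis: SAS Proc Univariate VARDEF=N. $a_4 = \frac{1}{n} \sum_i \left( \frac{y_i - \bar{y}}{s} \right)^4 - 3$, i.e., $a_4 = \frac{\sum_i (y_i - \bar{y})^4 / n}{s^4} - 3$, i.e., $a_4 = \frac{m_4}{(m_2)^2} - 3$, where $s^2 = m_2$ under VARDEF=N. Why minus 3? The kurtosis of a normal distribution is 3, so subtracting 3 gives a measure that is zero for normal data.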
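Both versions of the kurtosis follow from the second and fourth sample moments. A minimal R sketch, with `kurt` as a hypothetical helper name and the small data set from the skewness example reused for illustration:

```r
# Moment-based sample kurtosis m4 / m2^2 (divisor n); helper name is hypothetical
kurt <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2
}

y <- c(0, 2, 2, 3, 4)
kurt(y)        # text definition, about 2.22
kurt(y) - 3    # version with 3 subtracted, about -0.78
```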
43 Descriptive Statistics: Types of variables; Measures of location; Measures of spread, shape; Data displays
44 Data display. The simplest form is a line listing. A frequency table gives the frequency of observations within a set of ordered intervals. Intervals should be mutually exclusive and exhaustive. 8 to 10 intervals is usually sufficient. With the exception of the end intervals, the length of the intervals should be constant.
45 Frequency Table - Example: Table 3.6 (table of counts by blood pressure interval, from an open-ended lowest interval to an open-ended highest interval plus a Total row, for the groups Native, 1st, 2nd)
46 Frequency Tables. The table on the previous slide is an example of an empirical frequency distribution. It is difficult to compare the blood pressure distributions because the sample sizes differ. Divide by sample size to get the empirical relative frequency distribution (ERFD).
47 ERFD - Example: Table 3.7 (table of relative frequencies by blood pressure interval plus a Total row, for the groups Native, 1st, 2nd)
48 Empirical distribution function. Def 3.9: The empirical cumulative distribution (ECD) of a variable is a listing of the values of the variable with the proportion of observations less than or equal to each value (cumulative proportion). Aka empirical distribution function (EDF). Does not necessarily entail binning.
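In R the empirical distribution function is available through `ecdf()`, which works on the raw values without any binning. A short sketch with the hospital-stay data from the earlier examples (the evaluation points are arbitrary):

```r
x <- c(5, 10, 6, 11)

Fn <- ecdf(x)          # step function: proportion of observations <= t
Fn(c(4, 5, 8, 11))     # 0.00 0.25 0.50 1.00
```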
49 ECD - Example (table of cumulative proportions by blood pressure for the groups Native, 1st, 2nd, with a Total row)
50 ECD - Example (figure: ECD versus BP for the Native, First, and Second groups)
51 Graphs: ECD/EDF; Histogram; Stem and leaf plot; Box plot; Trellis/conditional plots
52 Histogram. Data are divided into intervals as in a frequency table. A histogram is a bar graph with the area of each bar equal to the relative frequency in the interval. Can compare histograms from samples of different size. Intervals need not be the same width. Beware the effect of the choice of interval width (Fig 3.1 text).
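The area property can be checked without drawing anything by calling `hist()` with `plot = FALSE`. In this sketch the data values and the unequal-width break points are made up for illustration.

```r
x <- c(5, 10, 6, 11, 7, 8, 12, 30)            # made-up data
h <- hist(x, breaks = c(0, 5, 10, 20, 40),    # unequal interval widths
          plot = FALSE)

widths   <- diff(h$breaks)
rel_freq <- h$counts / length(x)              # relative frequency per interval

h$density * widths                            # bar areas; these equal rel_freq
sum(h$density * widths)                       # total area is 1
```

Because bar height is relative frequency divided by width, wide intervals do not look artificially heavy.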
53 Histogram: Example (Fig 3.1 text)
> par(mfcol=c(1,2))
> hist(liver$albumin,col="gray",xlab="Albumin (mg/dl)",breaks=7,freq=FALSE,main="")
> hist(liver$albumin,col="gray",xlab="Albumin (mg/dl)",breaks=30,freq=FALSE,main="")
(Figure: two density histograms of albumin (mg/dl), one with breaks=7 and one with breaks=30.)
54 Sample Kurtosis (figure: four density histograms of x, with kurtosis = 1.79, 2.98, 4.09, and 7.96)
55 Stem and Leaf Plot. The stem consists of the leading digits; the leaves consist of the last digit. Example: x=496, stem=49, leaf=6. Make a column of stems from smallest to largest. To the right of each stem, list the leaves in a row, in ascending order. Note: there will be one leaf for each observation.
56 Stem and Leaf Plot: Example. > stem(liver$albumin) The decimal point is 1 digit(s) to the left of the | (plot not preserved)
57 Stem and Leaf Plot: Example. > stem(liver$albumin,width=100) The decimal point is 1 digit(s) to the left of the | (plot not preserved)
58 > stem(liver$albumin,scale=2) The decimal point is 1 digit(s) to the left of the | (plot not preserved)
59 Stem and Leaf. Advantage: visualize all (or almost all) of the data. Disadvantage: loss of ordering of data set.
60 Box plot. The top of the box is the 75th percentile ($\hat{\zeta}_{.75}$); the bottom is the 25th percentile ($\hat{\zeta}_{.25}$). A line through the box is drawn at the median.
61 Box plot. The lines extending out of the box (whiskers) may extend to: the 90th and 10th percentiles; or the largest and smallest values; or the largest observation $\le \hat{\zeta}_{.75} + 1.5 \times \mathrm{IQR}$ and the smallest observation $\ge \hat{\zeta}_{.25} - 1.5 \times \mathrm{IQR}$ (text is wrong! cf. Tukey 1977, Chambers et al. 1983). Data beyond the whiskers may be plotted individually.
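The third whisker rule can be sketched directly in R. The data values are made up for illustration, and quartiles are taken with `type = 2` as an assumption, to match the class percentile definition; R's own `boxplot()` works from Tukey's hinges, which can differ slightly from these quartiles on other data.

```r
x <- c(5, 10, 6, 11, 7, 8, 12, 30)              # made-up data with one large value

q   <- unname(quantile(x, c(0.25, 0.75), type = 2))
iqr <- q[2] - q[1]                              # 11.5 - 6.5 = 5
upper_fence <- q[2] + 1.5 * iqr                 # 19
lower_fence <- q[1] - 1.5 * iqr                 # -1

upper_whisker <- max(x[x <= upper_fence])       # 12: largest observation within the fence
lower_whisker <- min(x[x >= lower_fence])       # 5: smallest observation within the fence
outliers <- x[x < lower_fence | x > upper_fence]
outliers                                        # 30, plotted individually

boxplot.stats(x)$out                            # 30 as well, from R's hinge-based rule
```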
62 Box plot: Example. > boxplot(liver$albumin) (figure not preserved)
63 Box plot. What proportion of the data should we expect to be between the whiskers? If the data are normal, 95-98% for $6 \le n \le 20$, 99% for $n > 20$. Ref: Hoaglin et al. (JASA 1986). Note $1.5\,\mathrm{IQR} \approx 1.5(1.35)s \approx 2s$, so the whiskers cover $\hat{\zeta}_{.5} \pm 2.68s$.
64 Box plot and Histogram Example (figure: box plots shown with density histograms of x)
65 Multivariate plots. Describe relationships/associations among two or more variables. Scatterplots: simple for two variables; add color, symbols for > 2 variables. Trellis/conditional plots.
66 Scatterplot Example I (figure: y versus x)
67 Scatterplot Example II (figure: y versus x, with points labeled by z)
68 Trellis plots (figure: Solar.R and Ozone in panels conditioned on Temperature)
69 Table or graph? Tables are best suited for looking up specific information. Graphs are better for perceiving trends and for making comparisons and predictions. Ref: Gelman et al. (Amer Stat 2002).
2 Exploring Univariate Data
2 Exploring Univariate Data A good picture is worth more than a thousand words! Having the data collected we examine them to get a feel for they main messages and any surprising features, before attempting
More informationMath 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment
Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html Math 2311 Class
More informationSummarising Data. Summarising Data. Examples of Types of Data. Types of Data
Summarising Data Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Today we will consider Different types of data Appropriate ways to summarise these data 17/10/2017
More informationFrequency Distribution and Summary Statistics
Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary
More informationDATA SUMMARIZATION AND VISUALIZATION
APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296
More informationDescriptive Statistics
Petra Petrovics Descriptive Statistics 2 nd seminar DESCRIPTIVE STATISTICS Definition: Descriptive statistics is concerned only with collecting and describing data Methods: - statistical tables and graphs
More informationIntroduction to Computational Finance and Financial Econometrics Descriptive Statistics
You can t see this text! Introduction to Computational Finance and Financial Econometrics Descriptive Statistics Eric Zivot Summer 2015 Eric Zivot (Copyright 2015) Descriptive Statistics 1 / 28 Outline
More informationLecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Ani Manichaikul amanicha@jhsph.edu 16 April 2007 1 / 40 Course Information I Office hours For questions and help When? I ll announce this tomorrow
More information9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives
Basic Statistics for the Healthcare Professional 1 F R A N K C O H E N, M B B, M P A D I R E C T O R O F A N A L Y T I C S D O C T O R S M A N A G E M E N T, LLC Purpose of Statistic 2 Provide a numerical
More informationMeasures of Center. Mean. 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) Measure of Center. Notation. Mean
Measure of Center Measures of Center The value at the center or middle of a data set 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) 1 2 Mean Notation The measure of center obtained by adding the values
More informationChapter 3. Numerical Descriptive Measures. Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1
Chapter 3 Numerical Descriptive Measures Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1 Objectives In this chapter, you learn to: Describe the properties of central tendency, variation, and
More informationNumerical Descriptions of Data
Numerical Descriptions of Data Measures of Center Mean x = x i n Excel: = average ( ) Weighted mean x = (x i w i ) w i x = data values x i = i th data value w i = weight of the i th data value Median =
More informationLecture 2 Describing Data
Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms
More informationDescription of Data I
Description of Data I (Summary and Variability measures) Objectives: Able to understand how to summarize the data Able to understand how to measure the variability of the data Able to use and interpret
More informationExploratory Data Analysis
Exploratory Data Analysis Stemplots (or Stem-and-leaf plots) Stemplot and Boxplot T -- leading digits are called stems T -- final digits are called leaves STAT 74 Descriptive Statistics 2 Example: (number
More informationSTAT 113 Variability
STAT 113 Variability Colin Reimer Dawson Oberlin College September 14, 2017 1 / 48 Outline Last Time: Shape and Center Variability Boxplots and the IQR Variance and Standard Deviaton Transformations 2
More informationFundamentals of Statistics
CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct
More informationLecture Week 4 Inspecting Data: Distributions
Lecture Week 4 Inspecting Data: Distributions Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit So next week No lecture & workgroups But Practice Test on-line (BB) Enter data for your
More information3.1 Measures of Central Tendency
3.1 Measures of Central Tendency n Summation Notation x i or x Sum observation on the variable that appears to the right of the summation symbol. Example 1 Suppose the variable x i is used to represent
More informationExploring Data and Graphics
Exploring Data and Graphics Rick White Department of Statistics, UBC Graduate Pathways to Success Graduate & Postdoctoral Studies November 13, 2013 Outline Summarizing Data Types of Data Visualizing Data
More informationEmpirical Rule (P148)
Interpreting the Standard Deviation Numerical Descriptive Measures for Quantitative data III Dr. Tom Ilvento FREC 408 We can use the standard deviation to express the proportion of cases that might fall
More informationMeasures of Central Tendency Lecture 5 22 February 2006 R. Ryznar
Measures of Central Tendency 11.220 Lecture 5 22 February 2006 R. Ryznar Today s Content Wrap-up from yesterday Frequency Distributions The Mean, Median and Mode Levels of Measurement and Measures of Central
More informationBasic Procedure for Histograms
Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that
More informationUnit 2 Statistics of One Variable
Unit 2 Statistics of One Variable Day 6 Summarizing Quantitative Data Summarizing Quantitative Data We have discussed how to display quantitative data in a histogram It is useful to be able to describe
More informationSampling and Descriptive Statistics
Sampling and Descriptive Statistics Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University Reference: 1. W. Navidi. Statistics for Engineering and Scientists.
More informationSTATISTICAL DISTRIBUTIONS AND THE CALCULATOR
STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either
More informationStat 101 Exam 1 - Embers Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2.
More informationCategorical. A general name for non-numerical data; the data is separated into categories of some kind.
Chapter 5 Categorical A general name for non-numerical data; the data is separated into categories of some kind. Nominal data Categorical data with no implied order. Eg. Eye colours, favourite TV show,
More informationMonte Carlo Simulation (Random Number Generation)
Monte Carlo Simulation (Random Number Generation) Revised: 10/11/2017 Summary... 1 Data Input... 1 Analysis Options... 6 Summary Statistics... 6 Box-and-Whisker Plots... 7 Percentiles... 9 Quantile Plots...
More informationPopulations and Samples Bios 662
Populations and Samples Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-08-22 16:29 BIOS 662 1 Populations and Samples Random Variables Random sample: result
More informationChapter 3. Descriptive Measures. Copyright 2016, 2012, 2008 Pearson Education, Inc. Chapter 3, Slide 1
Chapter 3 Descriptive Measures Copyright 2016, 2012, 2008 Pearson Education, Inc. Chapter 3, Slide 1 Chapter 3 Descriptive Measures Mean, Median and Mode Copyright 2016, 2012, 2008 Pearson Education, Inc.
More informationHandout 4 numerical descriptive measures part 2. Example 1. Variance and Standard Deviation for Grouped Data. mf N 535 = = 25
Handout 4 numerical descriptive measures part Calculating Mean for Grouped Data mf Mean for population data: µ mf Mean for sample data: x n where m is the midpoint and f is the frequency of a class. Example
More informationAP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE
AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,
More information1 Describing Distributions with numbers
1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write
More informationappstats5.notebook September 07, 2016 Chapter 5
Chapter 5 Describing Distributions Numerically Chapter 5 Objective: Students will be able to use statistics appropriate to the shape of the data distribution to compare of two or more different data sets.
More informationDescriptive Statistics
Chapter 3 Descriptive Statistics Chapter 2 presented graphical techniques for organizing and displaying data. Even though such graphical techniques allow the researcher to make some general observations
More informationOverview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution
PSY 464 Advanced Experimental Design Describing and Exploring Data The Normal Distribution 1 Overview/Outline Questions-problems? Exploring/Describing data Organizing/summarizing data Graphical presentations
More informationGraphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics
Graphical and Tabular Methods in Descriptive Statistics MATH 3342 Section 1.2 Descriptive Statistics n Graphs and Tables n Numerical Summaries Sections 1.3 and 1.4 1 Why graph data? n The amount of data
More informationSection 6-1 : Numerical Summaries
MAT 2377 (Winter 2012) Section 6-1 : Numerical Summaries With a random experiment comes data. In these notes, we learn techniques to describe the data. Data : We will denote the n observations of the random
More informationDescribing Data: One Quantitative Variable
STAT 250 Dr. Kari Lock Morgan The Big Picture Describing Data: One Quantitative Variable Population Sampling SECTIONS 2.2, 2.3 One quantitative variable (2.2, 2.3) Statistical Inference Sample Descriptive
More informationPercentiles, STATA, Box Plots, Standardizing, and Other Transformations
Percentiles, STATA, Box Plots, Standardizing, and Other Transformations Lecture 3 Reading: Sections 5.7 54 Remember, when you finish a chapter make sure not to miss the last couple of boxes: What Can Go
More informationChapter 3 Descriptive Statistics: Numerical Measures Part A
Slides Prepared by JOHN S. LOUCKS St. Edward s University Slide 1 Chapter 3 Descriptive Statistics: Numerical Measures Part A Measures of Location Measures of Variability Slide Measures of Location Mean
More informationGetting to know a data-set (how to approach data) Overview: Descriptives & Graphing
Overview: Descriptives & Graphing 1. Getting to know a data set 2. LOM & types of statistics 3. Descriptive statistics 4. Normal distribution 5. Non-normal distributions 6. Effect of skew on central tendency
More informationSimple Descriptive Statistics
Simple Descriptive Statistics These are ways to summarize a data set quickly and accurately The most common way of describing a variable distribution is in terms of two of its properties: Central tendency
More informationChapter 3 Statistical Quality Control, 7th Edition by Douglas C. Montgomery. Copyright (c) 2013 John Wiley & Sons, Inc.
1 3.1 Describing Variation Stem-and-Leaf Display Easy to find percentiles of the data; see page 69 2 Plot of Data in Time Order Marginal plot produced by MINITAB Also called a run chart 3 Histograms Useful
More informationSome Characteristics of Data
Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 4 Random Variables & Probability Distributions Content 1. Two Types of Random Variables 2. Probability Distributions for Discrete Random Variables 3. The Binomial
More informationIOP 201-Q (Industrial Psychological Research) Tutorial 5
IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,
More informationIntroduction to Descriptive Statistics
Introduction to Descriptive Statistics 17.871 Types of Variables ~Nominal (Quantitative) Nominal (Qualitative) categorical Ordinal Interval or ratio Describing data Moment Non-mean based measure Center
More informationKING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA. Name: ID# Section
KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 11: BUSINESS STATISTICS I Semester 04 Major Exam #1 Sunday March 7, 005 Please circle your instructor
More informationData that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc.
Chapter 8 Measures of Center Data that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc. Data that can only be integer
More informationWeek 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.
Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.
More informationCHAPTER 2 Describing Data: Numerical
CHAPTER Multiple-Choice Questions 1. A scatter plot can illustrate all of the following except: A) the median of each of the two variables B) the range of each of the two variables C) an indication of
More informationRandom Variables and Probability Distributions
Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering
More informationDATA HANDLING Five-Number Summary
DATA HANDLING Five-Number Summary The five-number summary consists of the minimum and maximum values, the median, and the upper and lower quartiles. The minimum and the maximum are the smallest and greatest
More informationNOTES TO CONSIDER BEFORE ATTEMPTING EX 2C BOX PLOTS
NOTES TO CONSIDER BEFORE ATTEMPTING EX 2C BOX PLOTS A box plot is a pictorial representation of the data and can be used to get a good idea and a clear picture about the distribution of the data. It shows
More informationTopic 8: Model Diagnostics
Topic 8: Model Diagnostics Outline Diagnostics to check model assumptions Diagnostics concerning X Diagnostics using the residuals Diagnostics and remedial measures Diagnostics: look at the data to diagnose
More informationStatistics (This summary is for chapters 18, 29 and section H of chapter 19)
Statistics (This summary is for chapters 18, 29 and section H of chapter 19) Mean, Median, Mode Mode: most common value Median: middle value (when the values are in order) Mean = total how many = x n =
More informationPSYCHOLOGICAL STATISTICS
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc COUNSELLING PSYCHOLOGY (2011 Admission Onwards) II Semester Complementary Course PSYCHOLOGICAL STATISTICS QUESTION BANK 1. The process of grouping
More informationSection3-2: Measures of Center
Chapter 3 Section3-: Measures of Center Notation Suppose we are making a series of observations, n of them, to be exact. Then we write x 1, x, x 3,K, x n as the values we observe. Thus n is the total number
More informationMATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1)
LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) Descriptive statistics are ways of summarizing large sets of quantitative (numerical) information. The best way to reduce a set of
More informationThe Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012
The Normal Distribution & Descriptive Statistics Kin 304W Week 2: Jan 15, 2012 1 Questionnaire Results I received 71 completed questionnaires. Thank you! Are you nervous about scientific writing? You re
More informationMath 2200 Fall 2014, Exam 1 You may use any calculator. You may not use any cheat sheet.
1 Math 2200 Fall 2014, Exam 1 You may use any calculator. You may not use any cheat sheet. Warning to the Reader! If you are a student for whom this document is a historical artifact, be aware that the
Monte Carlo Simulation (General Simulation Models)
Monte Carlo Simulation (General Simulation Models) Revised: 10/11/2017 Summary... 1 Example #1... 1 Example #2... 10 Summary Monte Carlo simulation is used to estimate the distribution of variables when
Some estimates of the height of the podium
Some estimates of the height of the podium: 24 36 40 40 40 41 42 44 46 48 50 53 65 98 1. 5-number summary, interquartile range (IQR), range = max - min 2. 1.5 × IQR outlier rule 3. Make a boxplot: 24 36 40 40 40
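The podium estimates listed above are enough to work through the five-number summary and the 1.5 × IQR rule. The following is a minimal sketch in Python. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one of several common conventions and not necessarily the one used in the original slides. The function names are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # values above it; skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


podium = [24, 36, 40, 40, 40, 41, 42, 44, 46, 48, 50, 53, 65, 98]
print(five_number_summary(podium))
print(iqr_outliers(podium))
```

Under this convention the quartiles are 40 and 50, so the fences sit at 25 and 65. The values 24 and 98 fall outside them, while 65 lies exactly on the upper fence and is not flagged.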
Math146 - Chapter 3 Handouts. The Greek Alphabet. Source: Page 1 of 39
Source: www.mathwords.com The Greek Alphabet Page 1 of 39 Some Miscellaneous Tips on Calculations Examples: Round to the nearest thousandth 0.92431 0.75693 CAUTION! Do not truncate numbers! Example: 1
Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19)
Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19) Mean, Median, Mode Mode: most common value Median: middle value (when the values are in order) Mean = total ÷ how many = Σx
SOLUTIONS TO THE LAB 1 ASSIGNMENT
SOLUTIONS TO THE LAB 1 ASSIGNMENT Question 1 Excel produces the following histogram of pull strengths for the 100 resistors: [Figure: Histogram of Pull Strengths (lb); y-axis: Frequency]
Chapter ! Bell Shaped
Chapter 6 6-1 Business Statistics: A First Course 5th Edition Chapter 7 Continuous Probability Distributions Learning Objectives In this chapter, you learn: To compute probabilities from the normal distribution
Engineering Mathematics III. Moments
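Moments Mean and median Mean value (centre of gravity): $\bar{x} = \int x\, f(x)\, dx$ Median value (50th percentile): $F(x_{med}) = \tfrac{1}{2}$, i.e. $P(x \le x_{med}) = P(x \ge x_{med}) = \tfrac{1}{2}$ [Figure: $F(x)$ against $x$, with $x_{med}$ marked at $F = 1/2$] Variance and standard deviation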
Numerical Measurements
El-Shorouk Academy Acad. Year : 2013 / 2014 Higher Institute for Computer & Information Technology Term : Second Year : Second Department of Computer Science Statistics & Probabilities Section # 3 Numerical
22.2 Shape, Center, and Spread
Name Class Date 22.2 Shape, Center, and Spread Essential Question: Which measures of center and spread are appropriate for a normal distribution, and which are appropriate for a skewed distribution? Explore
A LEVEL MATHEMATICS ANSWERS AND MARKSCHEMES SUMMARY STATISTICS AND DIAGRAMS. 1. a) 45 B1 [1] b) 7 th value 37 M1 A1 [2]
1. a) 45 [1] b) 7th value, 37 [2] c) LQ: n/4 = 3.25, so 4th value, LQ = 25; UQ: 3n/4 = 9.75, so 10th value, UQ = 45; IQR = 20 (f.t.) d) Median is closer to upper quartile, hence negative skew [2] Page 1 2. a) Orders
David Tenenbaum GEOG 090 UNC-CH Spring 2005
Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,
Steps with data (how to approach data)
Descriptives & Graphing Lecture 3 Survey Research & Design in Psychology James Neill, 2016 Creative Commons Attribution 4.0 Overview: Descriptives & Graphing 1. Steps with data 2. Level of measurement &
Statistics I Chapter 2: Analysis of univariate data
Statistics I Chapter 2: Analysis of univariate data Numerical summary: Central tendency (mean, median); Location (quartiles, percentiles); Spread (range, interquartile range); Form (coeff. of asymmetry, coeff. of kurtosis)
Measures of Dispersion (Range, standard deviation, standard error) Introduction
Measures of Dispersion (Range, standard deviation, standard error) Introduction We have already learnt that frequency distribution table gives a rough idea of the distribution of the variables in a sample
STAT 157 HW1 Solutions
STAT 157 HW1 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/10/spring/stats157.dir/ Problem 1. 1.a: (6 points) Determine the Relative Frequency and the Cumulative Relative Frequency (fill
ECON 214 Elements of Statistics for Economists
ECON 214 Elements of Statistics for Economists Session 3 Presentation of Data: Numerical Summary Measures Part 2 Lecturer: Dr. Bernardin Senadza, Dept. of Economics Contact Information: bsenadza@ug.edu.gh
y axis: Frequency or Density x axis: binned variable bins defined by: lower & upper limits midpoint bin width = upper-lower Histogram Frequency
Part 3 Displaying Data Histogram y axis: Frequency or Density; x axis: binned variable; bins defined by: lower & upper limits, midpoint, bin width = upper - lower [Figure: histograms with Frequency and Density on the y axis]
STATS DOESN'T SUCK! ~ CHAPTER 4
CHAPTER 4 QUESTION 1 The Geometric Mean Suppose you make a 2-year investment of $5,000 and it grows by 100% to $10,000 during the first year. During the second year, however, the investment suffers a 50%
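To illustrate why the geometric mean suits this kind of question, here is a minimal Python sketch. The snippet above is cut off after "suffers a 50%", so the second-year growth factor of 0.5 (a 50% loss) is an assumption made purely for illustration, as is the function name.

```python
import math


def geometric_mean_growth(factors):
    """Geometric mean of yearly growth factors (e.g. 2.0 means +100%)."""
    return math.prod(factors) ** (1 / len(factors))


# Year 1: +100% (factor 2.0). Year 2: ASSUMED 50% loss (factor 0.5).
factors = [2.0, 0.5]
average_factor = geometric_mean_growth(factors)
arithmetic_return = sum(f - 1 for f in factors) / len(factors)

print(average_factor)     # average yearly growth factor
print(arithmetic_return)  # arithmetic mean of the yearly returns
```

Under that assumption the average growth factor is 1.0, meaning no net growth over the two years, whereas the arithmetic mean of the two yearly returns is 25% and overstates the result.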
Math 227 Elementary Statistics. Bluman 5 th edition
Math 227 Elementary Statistics Bluman 5 th edition CHAPTER 6 The Normal Distribution 2 Objectives Identify distributions as symmetrical or skewed. Identify the properties of the normal distribution. Find
Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract
Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, 2013 Abstract Review summary statistics and measures of location. Discuss the placement exam as an exercise
1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form:
1 Exercise One Note that the data is not grouped! 1.1 Calculate the mean ROI Below you find the raw data in tabular form: Obs Data 1 18.5 2 18.6 3 17.4 4 12.2 5 19.7 6 5.6 7 7.7 8 9.8 9 19.9 10 9.9 11
Data Analysis and Statistical Methods Statistics 651
Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Review of previous
Numerical summary of data
Numerical summary of data Introduction to Statistics Measures of location: mode, median, mean, Measures of spread: range, interquartile range, standard deviation, Measures of form: skewness, kurtosis,
MBEJ 1023 Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment
MBEJ 1023 Planning Analytical Methods Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment Contents What is statistics? Population and Sample Descriptive Statistics Inferential
Getting to know data. Play with data get to know it. Image source: Descriptives & Graphing
Descriptives & Graphing Getting to know data (how to approach data) Lecture 3 Image source: http://commons.wikimedia.org/wiki/file:3d_bar_graph_meeting.jpg Survey Research & Design in Psychology James
Master of Science in Strategic Management Degree Master of Science in Strategic Supply Chain Management Degree
CHINHOYI UNIVERSITY OF TECHNOLOGY SCHOOL OF BUSINESS SCIENCES AND MANAGEMENT POST GRADUATE PROGRAMME Master of Science in Strategic Management Degree Master of Science in Strategic Supply Chain Management
Putting Things Together Part 2
Putting Things Together Part 2 These exercises blend ideas from various graphs (histograms and boxplots), differing shapes of distributions, and values summarizing the data. Data for, and are in
MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE. Dr. Bijaya Bhusan Nanda,
MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE Dr. Bijaya Bhusan Nanda, CONTENTS What are measures of dispersion? Why measures of dispersion? How are measures of dispersion calculated? Range Quartile
Descriptive Statistics (Devore Chapter One)
Descriptive Statistics (Devore Chapter One) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 0 Perspective 1 1 Pictorial and Tabular Descriptions of Data 2 1.1 Stem-and-Leaf
CSC Advanced Scientific Programming, Spring Descriptive Statistics
CSC 223 - Advanced Scientific Programming, Spring 2018 Descriptive Statistics Overview Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Data Analysis and Statistical Methods Statistics 651
Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Checking for
STAB22 section 1.3 and Chapter 1 exercises
STAB22 section 1.3 and Chapter 1 exercises 1.101 Go up and down two times the standard deviation from the mean. So 95% of scores will be between 572 - (2)(51) = 470 and 572 + (2)(51) = 674. 1.102 Same idea
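The arithmetic in 1.101 can be checked with a few lines of Python. This is a small sketch with a hypothetical function name. The "95%" figure relies on the 68-95-99.7 rule, which applies to approximately normal data.

```python
def two_sd_interval(mean, sd):
    """Return the (lower, upper) range lying within two standard deviations of the mean."""
    return mean - 2 * sd, mean + 2 * sd


# Mean 572 and standard deviation 51, as in exercise 1.101.
lower, upper = two_sd_interval(572, 51)
print(lower, upper)  # 470 674
```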
Chapter 3. Populations and Statistics. 3.1 Statistical populations
Chapter 3 Populations and Statistics This chapter covers two topics that are fundamental in statistics. The first is the concept of a statistical population, which is the basic unit on which statistics
Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 6 Normal Probability Distributions 6-1 Overview 6-2 The Standard Normal Distribution
STAT Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model
STAT 203 - Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model In Chapter 5, we introduced a few measures of center and spread, and discussed how the mean and standard deviation are good
Descriptive Analysis
Descriptive Analysis HERTANTO WAHYU SUBAGIO Univariate Analysis Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable