MBEJ 1023 Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment


MBEJ 1023 Planning Analytical Methods

Contents
- What is statistics?
- Population and sample
- Descriptive statistics
- Inferential statistics
- Statistical data analysis
- Scales of measurement
- Skewness
- Measures of dispersion
- Using Excel

What is statistics?

"Statistics consists of a body of methods for collecting and analyzing data" (Agresti & Finlay, 1997). Quantitative techniques turn raw data into meaningful information by answering four questions:
- What kind of data, and how much, need to be collected?
- How should we organize and summarize the data?
- How can we analyse the data and draw conclusions from it?
- How can we assess the strength of the conclusions and evaluate their uncertainty?

Population and Sample

What kind of data, and how much, need to be collected?
- Population: the collection of all individuals or items under consideration in a statistical study (Weiss, 1999).
- Sample: that part of the population from which information is collected (Weiss, 1999).

Population and Sample (continued)
- Ideal survey: the sampled population is the same as the target population. In practice this is impossible.
- A perfect sample: a scaled-down version of the target population that mirrors every characteristic of it. This is also impossible.
- A good sample: reproduces the characteristics of the target population as closely as possible.

Descriptive Statistics

How should we organize and summarize the data? Descriptive statistics consist of methods for organizing and summarizing information (Weiss, 1999). They include:
- Central tendency: mean, median, mode, sum
- Dispersion: standard deviation, variance, range, minimum, maximum
- Distribution: normal, chi-square, binomial, Poisson, geometric
- Percentiles: quartiles, percentiles

Inferential Statistics

How can we assess the strength of the conclusions and evaluate their uncertainty? Inferential statistics consist of methods for drawing conclusions about a population, and measuring the reliability of those conclusions, based on information obtained from a sample of the population (Weiss, 1999). The main topics are:
- Point estimation
- Interval estimation (confidence level, margin of error)
- Hypothesis testing

Statistical data analysis

How can we analyse the data and draw conclusions from it? The choice of method depends on:
- the scale of measurement
- the number of groups
- the nature of the relationship between groups
- the number of variables
- the assumptions of the statistical tests

Statistical data analysis: the process

Begin → Formulate the research problem → Define population and sample → Collect the data → Do descriptive data analysis → Use appropriate statistical methods to solve the research problem → Report the results → End

Basic mathematical notations: variable, number of observations (n), counter

Variable
A variable is a characteristic or phenomenon of a population or sample. Variables are either:
- Quantitative (e.g., height, income)
- Qualitative (e.g., gender, religion)

Example: students' weights (in kg) = {56, 45, 65, 47, 50}, written w = {56, 45, 65, 47, 50}.

By convention, an uppercase variable denotes a population's characteristic, and a lowercase variable denotes a sample's characteristic.

Number of observations, n

x = {44, 71, 55, 32, 27}, y = {3.5, 2.7, 3.0, 4.5, 5.2}, z = {-8, 6, -4}
For x and y, n = 5. For z, n = 3.

Counter

b = {90, 85, 76, 92, 85, 53, 74, 85, 90, 66}, n = 10

Different observations can share the same value. For example, 85 appears three times in b. To avoid confusion, mathematicians use a counter to refer to each individual value in a set of observations. A counter is normally represented by the letter i, j or k and written as a subscript, $b_i$. For i = 2, $b_2 = 85$. For i = 3, $b_3 = 76$.

Example: c = {91, 86, 77, 93, 86, 54, 75, 86, 95, 67, 80}
a) i = 5: $c_5 = 86$
b) i = n: $c_n = 80$
c) i = 1, 2, ..., n: 91, 86, 77, 93, 86, 54, 75, 86, 95, 67, 80
d) $c_9 = 95$
e) a = {$c_3$, $c_8$} = {77, 86}

When your data are described correctly and adequately, readers gain insight into their features. Descriptive statistics help us simplify large amounts of data in a sensible way. For instance, the Grade Point Average (GPA) describes a student's general performance across a potentially wide range of course experiences.
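The slides count observations from 1, but Python lists are indexed from 0. The minimal sketch below shows how the counter notation maps onto a list, using Python only for illustration. The helper `c_at` is a hypothetical name introduced for this example.

```python
# Data set c from the counter example above.
c = [91, 86, 77, 93, 86, 54, 75, 86, 95, 67, 80]
n = len(c)  # number of observations, n = 11


def c_at(i):
    """Return c_i using the slides' 1-based counter (c_1 is the first value).

    Hypothetical helper for illustration only.
    """
    if not 1 <= i <= n:
        raise IndexError(f"counter i must be between 1 and {n}")
    return c[i - 1]  # shift the 1-based counter to Python's 0-based index


a = [c_at(3), c_at(8)]  # a = {c_3, c_8}
print(n, c_at(5), c_at(n), c_at(9), a)  # 11 86 80 95 [77, 86]
```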

Scales of Measurement

Scales of measurement describe the ways numbers are assigned to observations. "Measurement is basically the process of assigning numbers to observations according to certain rules" (Sprinthall, 2000).
- Nominal: establishes identity (apartment has a pool = 1, apartment does not have a pool = 0).
- Ordinal: places observations in order and ranks them (apartments ranked according to their prices).
- Interval: a position along a continuous scale that has no absolute zero (e.g., temperature: a reading of zero does not mean "no temperature").
- Ratio: a measurement on a ratio scale, where zero has meaning and denotes the absence of something (e.g., floor area).

Properties of the scales:
1. Magnitude: the ability to be counted
2. Order: the ability to be ranked
3. Interval: equal distances between values
4. Rational zero: a zero on the scale that is meaningful

Central tendency

The location of a distribution is assessed by its central tendency. By definition, central tendency is a typical or representative score. The three common measures are the mode, the median and the mean.

Mode

The mode is the most frequently occurring score value.

Distance between home and workplace, d (in km): 9, 11, 12, 12, 13, 14, 14, 14, 15, 16, 17
$M_o = 14$

On a frequency distribution, the mode is the score value that corresponds to the highest point. A distribution may have more than one mode:

Age when started working, a (in years): 19, 20, 20, 21, 22, 23, 23, 24
$M_o = 20$ and $23$. A distribution with two modes is called bimodal.

Median

The median is the score value that cuts the distribution in half, so that half the scores fall above it and half fall below it.
(1) Order the scores from lowest to highest.
(2) If there is an odd number of scores: $M_d = X_i$, where $i = (N + 1)/2$.
(3) If there is an even number of scores: $M_d = (X_{i_1} + X_{i_2})/2$, where $i_1 = N/2$ and $i_2 = (N + 2)/2$.

For d there is an odd number of scores (11): $i = (11 + 1)/2 = 6$, so $M_d = d_6 = 14$.
For a there is an even number of scores (8): $i_1 = 8/2 = 4$ and $i_2 = (8 + 2)/2 = 5$, so $M_d = (a_4 + a_5)/2 = (21 + 22)/2 = 21.5$.
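Below is a short Python sketch of the mode and median rules above. `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it reports both modes of the bimodal age data. `median_by_formula` is a hypothetical helper that follows the slide's position formulas step by step. You can cross-check it against `statistics.median`.

```python
import statistics

d = [9, 11, 12, 12, 13, 14, 14, 14, 15, 16, 17]  # distance to work (km)
a = [19, 20, 20, 21, 22, 23, 23, 24]             # age when started working (years)


def median_by_formula(scores):
    """Median via the slide's rules (hypothetical helper for illustration)."""
    x = sorted(scores)                   # (1) order from lowest to highest
    N = len(x)
    if N % 2 == 1:                       # (2) odd number of scores
        i = (N + 1) // 2                 # 1-based position of the middle score
        return x[i - 1]
    i1, i2 = N // 2, (N + 2) // 2        # (3) even: the two middle positions
    return (x[i1 - 1] + x[i2 - 1]) / 2


print(statistics.multimode(d), statistics.multimode(a))  # [14] [20, 23]
print(median_by_formula(d), median_by_formula(a))        # 14 21.5
```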

Calculating the median for class-interval data

Monthly household income (in RM):

| Household income (X) | Exact limits | No. of households (f) | Cumulative frequency (cf) |
|---|---|---|---|
| < RM1,000 | 0 to 1000.5 | 10 | 10 |
| RM1,001 to RM2,000 | 1000.5 to 2000.5 | 18 | 28 |
| RM2,001 to RM3,000 | 2000.5 to 3000.5 | 45 | 73 |
| RM3,001 to RM4,000 | 3000.5 to 4000.5 | 22 | 95 |
| > RM4,000 | 4000.5 and above | 5 | 100 |

Class boundaries (exact limits) are the midpoints between the upper class limit of one class and the lower class limit of the next class in the sequence.

$$M_d = ll + \left(\frac{n(0.50) - cf}{f_i}\right) w$$

where
- $M_d$ = the median
- $ll$ = lower exact limit of the interval containing the $n(0.50)$ score
- $n$ = total number of scores
- $cf$ = cumulative frequency of the scores below the interval containing the $n(0.50)$ score
- $f_i$ = frequency of scores in the interval containing the $n(0.50)$ score
- $w$ = width of the class interval

Here $n = 100$, so $n(0.50) = 50$. The 50th score lies in the RM2,001 to RM3,000 class, the first class whose cumulative frequency (73) reaches 50. So $ll = 2000.5$, $cf = 28$, $f_i = 45$ and $w = 1000$:

$$M_d = 2000.5 + \left(\frac{50 - 28}{45}\right) \times 1000 \approx 2489.4$$

Mean

The population mean is the sum of the observations divided by the population size:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where $\mu$ is the population mean and $N$ is the population size. The sample mean is defined in the same way:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\bar{x}$ is the sample mean and $n$ is the sample size. For the age data a:

$\bar{x} = (19 + 20 + 20 + 21 + 22 + 23 + 23 + 24)/8 = 172/8 = 21.5$
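The interpolation formula for class-interval data can be sketched in Python as below. Two choices are assumptions of this sketch rather than part of the original method: each class is stored as a (lower exact limit, upper exact limit, frequency) tuple, and the function rejects a median that falls in an open-ended class.

```python
# Household-income classes: (lower exact limit, upper exact limit, frequency).
# The open-ended top class has no upper limit, represented here by None.
classes = [
    (0.0, 1000.5, 10),
    (1000.5, 2000.5, 18),
    (2000.5, 3000.5, 45),
    (3000.5, 4000.5, 22),
    (4000.5, None, 5),
]


def grouped_median(classes):
    """Md = ll + ((n * 0.50 - cf) / f_i) * w for class-interval data."""
    n = sum(f for _, _, f in classes)
    target = n * 0.50                  # position of the median score
    cf = 0                             # cumulative frequency below the current class
    for ll, ul, f in classes:
        if cf + f >= target:           # this class contains the n(0.50) score
            if ul is None:
                raise ValueError("median falls in an open-ended class")
            w = ul - ll                # class width
            return ll + ((target - cf) / f) * w
        cf += f
    raise ValueError("empty frequency table")


print(round(grouped_median(classes), 1))  # 2489.4
```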

Mean from a frequency table

Sometimes the data are arranged in a frequency table:

| Age (x) | No. of respondents (f) |
|---|---|
| 19 | 5 |
| 20 | 4 |
| 21 | 5 |
| 22 | 3 |
| 23 | 1 |
| 24 | 2 |

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $\bar{x}$ = the sample mean, $x_i$ = the individual observation (value), $f_i$ = the class frequency and $n$ = the number of classes.

$\bar{x} = \dfrac{19(5) + 20(4) + 21(5) + 22(3) + 23(1) + 24(2)}{5 + 4 + 5 + 3 + 1 + 2} = \dfrac{417}{20} = 20.85$

Mean from class intervals

Sometimes the data are further summarised using class intervals. Each interval is then represented by its midpoint:

| Age, x (class interval) | Midpoint, $x_i$ | Frequency, $f_i$ | $x_i f_i$ |
|---|---|---|---|
| 18 to 20 | 19 | 9 | 171 |
| 21 to 23 | 22 | 9 | 198 |
| 24 to 26 | 25 | 2 | 50 |
| Sum | | 20 | 419 |

The same formula applies, with $x_i$ = the midpoint of the class interval, $f_i$ = the class frequency and $n$ = the number of class intervals:

$\bar{x} = 419/20 = 20.95$

Choosing an appropriate measure of central tendency
- Mode: the value that appears most often in a distribution.
- Median: the value that divides the distribution of responses into two equal-size groups (the 50th percentile).
- Mean: the arithmetic average of a distribution.

| Measurement scale | Measure of central tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Mode and median |
| Interval/Ratio | Mean |
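Both versions of the mean above are the same weighted average. Only the values being weighted change, from actual ages to class midpoints. Here is a minimal Python sketch with a hypothetical helper, `frequency_mean`:

```python
ages = [19, 20, 21, 22, 23, 24]
freqs = [5, 4, 5, 3, 1, 2]


def frequency_mean(values, freqs):
    """Weighted mean: sum(f_i * x_i) / sum(f_i). Hypothetical helper name."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


# Class intervals are summarised by their midpoints before weighting.
intervals = [(18, 20), (21, 23), (24, 26)]
interval_freqs = [9, 9, 2]
midpoints = [(lo + hi) / 2 for lo, hi in intervals]  # 19.0, 22.0, 25.0

print(frequency_mean(ages, freqs))                # 20.85
print(frequency_mean(midpoints, interval_freqs))  # 20.95
```

Grouping loses some detail, which is why the two results differ slightly (20.85 versus 20.95).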

Skewness

The three measures of location, the mode ($M_o$), the median ($M_d$) and the mean ($\bar{x}$), can be used together to describe the central tendency of a distribution. Their relative positions also indicate its skewness:
- Symmetrical distribution: has zero skewness. The normal distribution, which is used in many statistical analyses and tests, is a symmetrical distribution.
- Positively skewed distribution: the mode and the median are both located to the left of the mean.
- Negatively skewed distribution: the mode and the median are both located to the right of the mean.

Pearson's index of skewness:

$$I = \frac{3(\bar{x} - M_d)}{s}$$

where $\bar{x}$ = the sample mean, $M_d$ = the median and $s$ = the sample standard deviation.

Skewness example

The sample mean of a set of data is 3.45, the median is 4.00 and the sample standard deviation is 1.22. Compute Pearson's index of skewness, and determine whether the data are symmetrically distributed.

$\bar{x} = 3.45$, $M_d = 4.00$, $s = 1.22$

$I = 3(3.45 - 4.00)/1.22 \approx -1.35$

Because $I$ is not zero, the distribution is not symmetric around the mean. Because $I$ is negative, the distribution is negatively skewed.

Measures of Dispersion

Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. The measures covered here are the range, the variance and the standard deviation.

The range

Range = largest value - smallest value

Distribution 1: 32 35 36 37 38 40 42 42 43 43 45. Range = 45 - 32 = 13
Distribution 2: 32 32 32 32 34 34 34 34 34 35 45. Range = 45 - 32 = 13

The two distributions have the same range even though their values are spread very differently, because the range depends only on the two extreme scores. The range is therefore greatly affected by extreme scores and is not the most informative measure of variability.
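The following Python sketch checks the worked skewness example and the range comparison. `pearson_skewness` is a hypothetical helper name for the index defined above.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's index of skewness, I = 3 * (mean - median) / s."""
    return 3 * (mean - median) / sd


I = pearson_skewness(3.45, 4.00, 1.22)
print(round(I, 2))  # -1.35: negative, so the data are negatively skewed

dist1 = [32, 35, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 32, 32, 34, 34, 34, 34, 34, 35, 45]
# The range uses only the two extreme scores, so both distributions get 13.
print(max(dist1) - min(dist1), max(dist2) - min(dist2))  # 13 13
```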

The variance and the standard deviation

The population variance ($\sigma^2$) is a measure of variability between observations within the population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where $X_i$ = the individual observation in the population, $\mu$ = the population mean and $N$ = the size of the population. The population standard deviation ($\sigma$) is the positive square root of the variance.

A small variance indicates that the data tend to be very close to the mean and hence to each other. A high variance indicates that the data are very spread out around the mean and from each other.

The sample variance and the sample standard deviation are:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$

where $\bar{x}$ = the sample mean, $x_i$ = the individual observation and $n$ = the sample size.

Steps to compute the variance:
Step 1 - Find the mean of the scores.
Step 2 - Subtract the mean from every score.
Step 3 - Square the results of Step 2.
Step 4 - Sum the results of Step 3.
Step 5 - Divide the result of Step 4 by n - 1.
Step 6 - Take the square root of Step 5.
The result of Step 5 is the sample variance. The sample standard deviation is obtained in Step 6.

Example:

| i | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|
| 1 | 3 | -4 | 16 |
| 2 | 6 | -1 | 1 |
| 3 | 8 | 1 | 1 |
| 4 | 8 | 1 | 1 |
| 5 | 10 | 3 | 9 |
| Total | 35 | 0 | 28 |

Step 1: $\bar{x} = 35/5 = 7$
Steps 2 and 3: see the last two columns of the table.
Step 4: $\sum (x_i - \bar{x})^2 = 28$
Step 5: $s^2 = 28/(5 - 1) = 7$
Step 6: $s = \sqrt{7} \approx 2.65$
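The six steps translate almost line for line into Python, as the sketch below shows using the five scores from the example. The last computation is the population version, which divides by N rather than n - 1. It applies only if these five scores made up the entire population.

```python
import math

x = [3, 6, 8, 8, 10]
n = len(x)

mean = sum(x) / n                              # Step 1: 35 / 5 = 7.0
deviations = [xi - mean for xi in x]           # Step 2: -4, -1, 1, 1, 3
squared = [dev ** 2 for dev in deviations]     # Step 3: 16, 1, 1, 1, 9
ss = sum(squared)                              # Step 4: 28.0
s2 = ss / (n - 1)                              # Step 5: sample variance = 7.0
s = math.sqrt(s2)                              # Step 6: sample standard deviation

population_variance = ss / n                   # divide by N instead: 5.6
print(mean, ss, s2, round(s, 4))               # 7.0 28.0 7.0 2.6458
```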

Variance versus standard deviation

The standard deviation measures variability in the units of measurement, while the variance does so in units of measurement squared. For example, if height were measured in inches, the standard deviation would be in inches, while the variance would be in square inches. For this reason, the standard deviation is usually the preferred measure when describing the variability of distributions. The variance, however, has some unique properties that make it very useful later in the course.

Exercise: Use the following data to calculate the variance and the standard deviation.

| Age (x) | No. of respondents (f) |
|---|---|
| 19 | 5 |
| 20 | 4 |
| 21 | 5 |
| 22 | 3 |
| 23 | 1 |
| 24 | 2 |

Once the amount of data exceeds a certain limit, even the simplest analysis becomes troublesome by hand. Various applications solve this problem. They range from general spreadsheet applications such as MS Excel and Lotus 1-2-3 to more advanced statistical applications such as SPSS, Minitab, S-Plus and SAS. Spreadsheet applications are easier to learn but lack advanced statistical functions. Statistical applications such as SPSS have more data-analysis capabilities but require more advanced mathematical knowledge. The various applications share some common steps in performing statistical analysis.
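One way to check a hand calculation for the exercise is the Python sketch below. It extends the six-step procedure to frequency data by weighting each squared deviation by its frequency. Here n is the total number of respondents (the sum of f, 20), not the number of classes. Treating the respondents as a sample, and so dividing by n - 1, is an assumption.

```python
import math

ages = [19, 20, 21, 22, 23, 24]
freqs = [5, 4, 5, 3, 1, 2]

n = sum(freqs)                                              # 20 respondents
mean = sum(x * f for x, f in zip(ages, freqs)) / n          # 20.85
ss = sum(f * (x - mean) ** 2 for x, f in zip(ages, freqs))  # sum of f_i * (x_i - mean)^2
s2 = ss / (n - 1)                                           # sample variance (Step 5)
s = math.sqrt(s2)                                           # sample standard deviation (Step 6)

print(round(s2, 4), round(s, 4))
```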

Perform data entry (SPSS)

To enter a fresh set of data, select the "Type in data" radio button and click OK. An easier approach, however, is simply to click Cancel. The Data Editor window shows the title bar, the menu bar, the toolbar and the column headings. Once you are in the Data Editor, you can enter your data.

To name a column, go to the Variable View. Each variable has the following properties:
- Type: a variable can take many forms, such as numeric, string or date.
- Width: the number of characters SPSS will allow to be entered for the variable.
- Decimals: the number of decimal places that SPSS will display.
- Label: a string of text that identifies in more detail what a variable represents.
- Values: which numbers represent which categories (for discrete data at both the nominal and ordinal levels of measurement). For example, you could assign the labels 'Male' and 'Female' to the numeric values 1 and 2.

- Missing: tells SPSS what to do when it encounters missing values in the data file.
- Columns: how wide the column should be for each variable. Don't confuse this with Width, which indicates how many characters can be entered. The column size indicates how much space is allocated rather than the degree to which it is filled.
- Align: whether the information in the Data View is left-justified, right-justified or centered.
- Measure: whether the data are scale, ordinal or nominal.

It is always good computing practice to save your data frequently.

In SPSS, almost all the statistical analyses you might want to perform are located in the Analyze menu, and the same goes for descriptive statistics. To proceed with the descriptive analysis, select Analyze > Descriptive Statistics > Descriptives.

A dialog box will appear prompting you to select the variable(s) that you want to describe. Click the variable x, then click the arrow to move it into the Variable(s) box on the right. To customize the analysis, click Options.

For this exercise, select the statistics covered so far: Mean, Std. deviation, Variance, Range, Minimum and Maximum. Among these, only the mean measures central tendency. The other statistics describe the degree of dispersion.

Using Excel

Excel and SPSS both use the cell paradigm. Excel also uses the workbook paradigm, in which a single workbook can contain many data sheets. Excel is a spreadsheet application, and it is most useful when you have a lot of data to manipulate.
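For readers without SPSS, the Python sketch below roughly reproduces the statistics selected in the Options dialog. The helper `descriptives` and the values of x are hypothetical, and SPSS formats its own output table differently. The standard deviation and variance here are the sample (n - 1) versions from the six-step procedure.

```python
import statistics


def descriptives(values):
    """Hypothetical helper mirroring the Descriptives options chosen above."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),  # sample version (n - 1)
        "Variance": statistics.variance(values),     # sample version (n - 1)
    }


x = [3, 6, 8, 8, 10]  # hypothetical contents of variable x
for name, value in descriptives(x).items():
    print(f"{name:>15}: {value}")
```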

Using Excel: data entry

Simply enter your data. The data sheet in this example has no space left at the top for a column header, so we insert a new row at the top. To insert a row, right-click the number of the row above which you want the new row. Choose Insert from the pop-up menu, and a new row is inserted above the selected row. Double-clicking a column border widens the column to fit the widest text in that column. You can save your data by selecting File > Save or by clicking the Save button.

Using Excel: formulas

In Excel, data manipulation is achieved by entering formulas. A formula in a cell must start with the equal sign (=). Without the equal sign, Excel treats the entry as text and performs no computation.

Functions in Excel for computing descriptive statistics:

| Statistic | Formula | Example |
|---|---|---|
| Mean | =AVERAGE(cells) | =AVERAGE(A2:A6) |
| Mode | =MODE(cells) | =MODE(A2:A6) |
| Median | =MEDIAN(cells) | =MEDIAN(A2:A6) |
| Minimum | =MIN(cells) | =MIN(A2:A6) |
| Maximum | =MAX(cells) | =MAX(A2:A6) |
| Variance | =VAR(cells) | =VAR(A2:A6) |
| Standard deviation | =STDEV(cells) | =STDEV(A2:A6) |

To perform a computation, first select the cell where you want the result to appear. Then type the formula, starting with the equal sign (=), in the cell or in the formula bar. Press Enter, and the result appears in the cell you selected.
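Excel's =VAR and =STDEV treat the selected cells as a sample and divide by n - 1. The population versions are =VARP and =STDEVP, or =VAR.P and =STDEV.P in Excel 2010 and later. The Python sketch below mirrors that distinction with the `statistics` module. The cell values are hypothetical stand-ins for A2:A6.

```python
import statistics

cells = [3, 6, 8, 8, 10]  # hypothetical contents of A2:A6

# Sample versions divide by n - 1 (like Excel's =VAR and =STDEV).
sample_var = statistics.variance(cells)   # 7 for these cells
sample_sd = statistics.stdev(cells)       # square root of 7
# Population versions divide by N (like Excel's =VAR.P and =STDEV.P).
pop_var = statistics.pvariance(cells)     # 5.6 for these cells
pop_sd = statistics.pstdev(cells)         # square root of 5.6

print(sample_var, pop_var)
```

Use the sample versions when the cells hold a sample drawn from a larger population, as in the variance steps above. Use the population versions only when the cells hold the whole population.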

Using Excel: the formula bar

To refer to a cell, you need not type its reference. Instead, click the cell you want to use, and its reference is inserted into the formula. This way, you avoid referring to the wrong cell. You can also write your own formulas to compute statistics that Excel does not provide as built-in functions.

Thank you

Dr. Mehdi Moeinaddini, mehdi@utm.my