Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Convergent validity: the degree to which results or evidence from different tests or sources converge on the same conclusion. It is a way to get some confirmation that you are on the right track.

4 steps:
1. Recognise the problem.
2. Gather data to help understand and solve the problem.
3. Analyse and present the data.
4. Act on the analysis.

Parameter: a numerical measure that describes a characteristic of a population.
Statistic: a numerical measure that describes a characteristic of a sample.

Descriptive Statistics: collecting, summarising and presenting data.
1. Collect data, eg. a survey.
2. Summarise/characterise data.
3. Present data.

Inferential Statistics: drawing conclusions about a population based on sample results.
1. Estimation
2. Hypothesis testing

Data Types
Categorical (non-numerical/qualitative):
Nominal: labels that do not imply order, eg. yes/no.
Ordinal: values that are still labels but have an order, eg. HD/D/C/P/N.
Categorical data CAN be coded numerically, eg. options 1, 2, 3, 4, 5. The codes are still labels, not quantities.

Numerical (quantitative):
Discrete: comes from a counting process, eg. how many children.
Continuous: measured, eg. time.

Data can be grouped or ungrouped.
Grouped: observations are grouped into classes, eg. $30k - $50k.
Ungrouped: each observation is recorded individually.

Week 2 Numerical Data
An example of a numeric variable is salary. How are salaries distributed across different people? To answer this, ask these 5 questions:
1. What is the average salary?
2. How spread out are the salaries? (variance)
3. What are the extreme salaries at either end? (outliers)
4. Is the distribution of salary symmetric or skewed?
5. Does the distribution of salary have any other important features?

Measures of Central Tendency:
Mean = average. The most used measure of central tendency. Strongly affected by extreme values. The sum of squared distances of the data values from a 'typical' value is smallest when that typical value is the mean.
Median = midpoint of the ranked values. You have to rank the data first. It is the middle value, or the mean of the middle 2 values when the number of data values is even. Position of median = (n + 1)/2.
Mode = most frequently observed value. You can have more than 1 mode. The mode can be used for nominal data.

Quartiles: positions in the ranked data:
Q1 = (n + 1)/4
Q2 = (n + 1)/2
Q3 = 3(n + 1)/4
where n = number of data values.
Percentiles: partition a set of ranked data into 100 equal parts.

Week 3 Measures of Variation
Range: a simple measure of variation. The range is the difference between the largest and the smallest value: Largest - Smallest. It ignores how the data are distributed between those two values and is sensitive to outliers.
Interquartile Range (IQR): 3rd quartile - 1st quartile. Resistant to outliers. It is the range of the middle 50% of the data.
Using boxplots is a good way of describing numerical data.
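The position formulas for the median and quartiles, together with the range and IQR, can be tried on a small data set. The sketch below is only an illustration: the salary figures are made up, the helper name value_at_position is hypothetical, and fractional positions are resolved by linear interpolation between neighbouring ranks, which is an assumption because the notes do not state a rule for that case.

```python
from statistics import mean, median, multimode


def value_at_position(ranked, position):
    """Return the value at a 1-based position in ranked data.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring ranks (an assumption; the notes do not give a rule).
    """
    lower = int(position)          # whole-number part of the position
    fraction = position - lower    # how far to move towards the next rank
    if lower < 1:
        return ranked[0]
    if lower >= len(ranked):
        return ranked[-1]
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)


# Hypothetical salaries in $000s, with one extreme value at the top end.
salaries = sorted([52, 45, 150, 60, 48, 70, 42, 55, 65, 50, 58])
n = len(salaries)

q1 = value_at_position(salaries, (n + 1) / 4)
q2 = value_at_position(salaries, (n + 1) / 2)
q3 = value_at_position(salaries, 3 * (n + 1) / 4)

data_range = salaries[-1] - salaries[0]   # largest - smallest
iqr = q3 - q1                             # spread of the middle 50%

print("mean:", round(mean(salaries), 2), "median:", median(salaries))
print("Q1, Q2, Q3:", q1, q2, q3)
print("range:", data_range, "IQR:", iqr)

# The mode also works for nominal data such as yes/no answers.
answers = ["yes", "no", "yes", "yes", "no"]
print("mode(s):", multimode(answers))
```

With these made-up figures the single extreme salary pulls the mean above the median and inflates the range, while the IQR is unaffected by it.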

To summarise a set of data:
1. Measure of average (mean, median, mode)
2. Measure of variation (range, IQR, variance, standard deviation)

Measures of Variation (continued):
Variance: the standard deviation squared. Each value in the data contributes to it. It is sensitive to outliers.
Standard deviation: the square root of the variance. Easier to interpret because it is in the same units as the data.

Shape of Distribution:
2 good ways to examine the distribution of a numerical variable:
1. Histogram
2. Boxplot

Shapes of distribution:
Symmetrical: has a single peak and looks approximately the same to the left and right of it.
Positively skewed/right skewed: the tail is toward the right.
Negatively skewed/left skewed: the tail is toward the left.
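As a rough illustration of how variance and standard deviation relate, and of their sensitivity to outliers, the sketch below computes both with and without one extreme value. The salary figures are made up, the helper name sample_variance is hypothetical, and the divisor n - 1 (the sample version) is an assumption, since the notes do not say whether the sample or population formula is intended.

```python
from statistics import stdev, variance

# Hypothetical salaries in $000s; the last value is an extreme one.
salaries = [42, 45, 48, 50, 52, 55, 58, 60, 65, 70, 150]
without_extreme = salaries[:-1]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1.

    The n - 1 divisor is an assumption (sample variance).
    """
    n = len(values)
    centre = sum(values) / n
    return sum((v - centre) ** 2 for v in values) / (n - 1)


var_all = sample_variance(salaries)
sd_all = var_all ** 0.5                 # standard deviation = sqrt(variance)

var_trimmed = sample_variance(without_extreme)
sd_trimmed = var_trimmed ** 0.5

print("with extreme value:    variance", round(var_all, 1), "sd", round(sd_all, 1))
print("without extreme value: variance", round(var_trimmed, 1), "sd", round(sd_trimmed, 1))

# Cross-check against the standard library's sample versions.
print("library agrees:", abs(var_all - variance(salaries)) < 1e-9)
```

Every value enters the sum of squared deviations, which is why removing the single extreme salary shrinks both measures noticeably.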

Multimodal: eg. a bimodal distribution has two peaks, not necessarily of equal height. In this case, split the data into 2 sets and analyse them separately.

Probability
The link between descriptive and inferential statistics.
Probability: a numerical value that represents the chance, likelihood or possibility that an event will occur.
Event: each possible outcome of a variable.
The probability that random variable X is equal to a particular value x is denoted P(X = x).
Probabilities p(x) are estimated from relative frequencies, eg. a value observed 3 times in 60 observations has estimated probability 3/60 = 0.05.
All probabilities must lie between 0 and 1, and the sum of all probabilities must equal 1.

The Binomial Distribution:
1. The experiment consists of n trials.
2. There are two possible outcomes of each trial: success/failure.
3. The probability of success is identical at each trial.
4. Trials are independent.

Eg. Experiment: toss a coin 3 times.
1. A trial is 1 toss of a coin, so n = 3.
2. We are interested in the number of heads. Head = success.
3. P(success) = 0.5. P(failure) = 0.5.
4. Trials are independent because the outcome of one toss does not affect the outcome of another.
Random variable X is the number of heads. X is binomially distributed.

See tables 4a and 4b in the formula and statistical tables on Moodle. 4a gives point probabilities. 4b gives cumulative probabilities. Or use Excel's statistical function BINOM.DIST.
Eg. where X is binomially distributed with n = 10, find P(3 < X < 8). Because X takes whole-number values, 3 < X < 8 means X = 4, 5, 6 or 7, so P(3 < X < 8) = P(X ≤ 7) - P(X ≤ 3).
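The point and cumulative probabilities that the tables (or BINOM.DIST) provide can also be computed directly from the binomial formula. The sketch below is a minimal illustration: the function names binomial_pmf and binomial_cdf are hypothetical, and p = 0.5 in the n = 10 example is an assumption, because the notes give n but not the probability of success.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Point probability P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Cumulative probability P(X <= x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Coin example from the notes: 3 tosses, P(head) = 0.5, probability of 3 heads.
p_three_heads = binomial_pmf(3, 3, 0.5)
print("P(X = 3) for 3 tosses:", p_three_heads)

# n = 10 example. p = 0.5 is an assumed value, not taken from the notes.
p = 0.5
p_between = binomial_cdf(7, 10, p) - binomial_cdf(3, 10, p)

# The same probability, adding the point probabilities for X = 4, 5, 6, 7.
p_direct = sum(binomial_pmf(k, 10, p) for k in (4, 5, 6, 7))

print("P(3 < X < 8) via cumulative probabilities:", p_between)
print("P(3 < X < 8) via point probabilities:     ", p_direct)

# All point probabilities together must add up to 1.
total = sum(binomial_pmf(k, 10, p) for k in range(11))
print("sum of all probabilities:", total)
```

The two routes to P(3 < X < 8) agree, which mirrors the choice between a point-probability table (4a) and a cumulative table (4b).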

Week 4 The Continuous Distribution
(Recall discrete random variables.) Toss a coin 3 times and look at the number of heads, X. X = 0, 1, 2 or 3. We can calculate P(X = a particular value), eg. P(X = 3) = 0.125.

A continuous random variable:
Has an uncountably infinite number of possible values.
Can assume any value in an interval (between 2 points).
Eg. a survey of women's heights. The height X of a randomly selected woman is a continuous random variable. X may take any value. It is not useful to consider the probability that X equals an exact number (for a continuous variable that probability is 0). However, it is sensible to consider the probability that X lies within a range, eg. P(161.5 < X < 162.5).

The Probability Density Function:
Organise the data into class intervals of 5cm and plot the corresponding relative frequency for each interval. Reducing the class interval (making it smaller) makes the graph smoother.
A continuous distribution has a continuum of possible values, eg. X = all values between 0 and 100, or X = all values greater than 0. The total probability of 1 is spread over this continuum.
f(x) measures probability density. Intervals of X values that are more likely to occur lie in the regions of the graph where the probability density is larger. The larger the density, the more probable it is that X falls there.
The total area between the graph of f(x) and the horizontal axis represents the total probability = 1.

The Normal Distribution
The most important probability distribution in statistics. This is because many data sets have a histogram that is well described by the normal distribution.
Properties:
Symmetrical, unimodal.
Mean = median (approx.)
Modal class is in the region of the mean (median).
The curve extends to infinity in both directions.
The distribution is completely defined by two parameters: the mean and the variance.
Expressed as X ~ N(mean, variance).
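For a continuous variable, a probability such as P(161.5 < X < 162.5) is an area under the density curve. The sketch below assumes, purely for illustration, that heights follow X ~ N(162, 42.25), ie. a mean of 162cm and a standard deviation of 6.5cm; these parameter values are hypothetical and not from the notes. Note that Python's statistics.NormalDist takes the standard deviation rather than the variance, so the square root is taken first.

```python
from statistics import NormalDist

# Hypothetical parameters (assumptions, not from the notes).
mean_height = 162.0
variance_height = 42.25          # so the standard deviation is 6.5

# NormalDist expects the standard deviation, not the variance.
heights = NormalDist(mu=mean_height, sigma=variance_height ** 0.5)

# Area under the density curve between 161.5 and 162.5.
p_interval = heights.cdf(162.5) - heights.cdf(161.5)
print("P(161.5 < X < 162.5):", round(p_interval, 4))

# Symmetry: half of the total probability lies below the mean.
p_below_mean = heights.cdf(mean_height)
print("P(X < mean):", p_below_mean)

# The density is larger near the mean than further out in the tail.
print("f(162):", round(heights.pdf(162.0), 4), "f(175):", round(heights.pdf(175.0), 4))
```

The probability for the 1cm interval is small even though it sits at the peak of the curve, which shows why ranges rather than exact values are the useful question for continuous variables.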