AP STATISTICS FALL SEMESTER FINAL EXAM STUDY GUIDE

Name:                    Period:

*Go over Vocabulary Notecards!
*This is not a comprehensive review. You should still look over your past notes, homework/practice, quizzes, test reviews, and tests!

CHAPTER 1: Exploring Data

Vocabulary: Distribution, Range, Spread, Frequency, Outlier, Center, Shape, Skewed Left, Skewed Right, Symmetric, Dot Plot, Histogram, Stemplot, Split Stems, Back-to-back Stemplot, Time Plot, Mean $\bar{x}$, Nonresistant, Median, Resistant, Interquartile Range, Quartiles $Q_1$, $Q_3$

A. Graphing one variable (univariate) data
ALWAYS PLOT DATA AND LABEL AXES CAREFULLY
1. Categorical data: nominal scale, divided into groups, qualitative properties
   o Bar graphs (bars do not touch)
   o Pie charts (percentages must sum to 100%)
2. Quantitative data: ratio scale (can +, -, ×, ÷ with numbers describing data)
   o Dot plots: can resemble probability curves
   o Stem (& leaf) plots: remember to put in the key (ex: 8 | 2 means 82)
     When is it advantageous to split stems on a stemplot?
     When is a back-to-back stemplot useful?
   o Histogram: put // for breaks in axis, use no fewer than 5 classes (bars), check to see if scale is misleading, look for symmetry & skewness
   o Ogive: cumulative frequency plot
   o Time plot: used for seasonal variation, where the x-axis is time
   o Box plot: the modified box plot shows outliers; side-by-side box plots are good for comparing quartiles, medians, and spread

B. Summary statistics for one variable data (use calculator with 1-variable stats)
1. Measures of central tendency (center)
   o mean: $\mu$ or $\bar{x}$
   o median (middle)
   o mode (most)
2. Measures of spread
   o range (max - min)
   o quartiles (25% = $Q_1$, 75% = $Q_3$)
   o IQR: interquartile range ($Q_3 - Q_1$)
   o variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
   o standard deviation: square root of variance, $\sigma$ or $s$
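The summary statistics above can be checked by hand with a short Python sketch. The data set `scores` is hypothetical, the helper `five_number_summary` is an illustrative name, and the quartiles use the median-of-halves rule (other software may use a slightly different quartile rule). The 1.5 × IQR cutoff is the common convention for flagging outliers, assumed here rather than taken from this guide.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

scores = [62, 70, 71, 74, 75, 78, 80, 82, 85, 99]  # hypothetical data

mean = statistics.mean(scores)
s2 = statistics.variance(scores)       # sample variance: divides by n - 1
s = statistics.stdev(scores)           # sample standard deviation
sigma2 = statistics.pvariance(scores)  # population variance: divides by n

low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1

# Assumed convention: flag values more than 1.5 * IQR beyond the quartiles.
outliers = [x for x in scores if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(mean, med, q1, q3, iqr, outliers)
```

Note that the standard deviation is the square root of the variance, and that the single high score pulls the mean above the median.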

What does standard deviation measure?
When does standard deviation equal zero?
What is the relationship between variance and standard deviation?
Mean, range, variance, and standard deviation are non-resistant measures (strongly influenced by outliers).
How does adding a constant to data affect the mean, median, IQR, and range of data?
Use variance and standard deviation with approximately normal distributions only.
Remember: the mean is pulled toward the tail of a skewed distribution.
When examining a distribution you can describe the overall pattern by its: SOCS or CUSS
When does an observation become an outlier?

CHAPTER 2: The Normal Distributions

Vocabulary: Density Curve, μ (mu), σ (sigma), Outcomes, Simulation, Normal Curve, Normal Distribution, Inflection Point, 68-95-99.7 Rule, Percentile, N(μ, σ), Standardized Value, Z-scores, Standard Normal Distribution, Normal Probability Plot

1. If the mean = 0 and the standard deviation = 1, this is the standard normal curve.
2. Explain how to standardize a variable.
3. Use with z-scores (standard scores): $z = \frac{x - \mu}{\sigma}$. Positive z-scores are above the mean and negative z-scores are below the mean.
4. To compare two observations from different circumstances, find the z-score of each, then compare.
5. Is there a difference between the 80th percentile and the top 80%? Explain.
6. Know how to use Table A, and how to use z-scores to find the p value: the probability (or proportion or percent) of the data that lies under a portion of the bell curve. P values represent area under the curve.
7. ALWAYS DRAW THE CURVE to show work and shade to display your area.
8. Where are the mean and median of a density curve located?
9. Empirical Rule: the 68%-95%-99.7% rule for area under the curve
10. What is a uniform distribution?
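As a rough check on Table A work, the z-score formula and the Empirical Rule can be sketched in Python with the standard library's `NormalDist`. The two test scores and their N(μ, σ) parameters below are made-up values for illustration only.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

# Hypothetical comparison of two observations from different distributions.
z_a = z_score(680, 500, 100)  # score of 680 from N(500, 100)
z_b = z_score(27, 18, 6)      # score of 27 from N(18, 6)

std_normal = NormalDist(0, 1)  # standard normal: mean 0, standard deviation 1

# Area under the curve to the left of z_a (what Table A reports).
area_below_a = std_normal.cdf(z_a)

# Empirical Rule: area within 1, 2, and 3 standard deviations of the mean.
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
within_3 = std_normal.cdf(3) - std_normal.cdf(-3)

# z-score that marks the 80th percentile (80% of the area lies below it).
z_80th = std_normal.inv_cdf(0.80)

print(z_a, z_b, round(area_below_a, 4))
print(round(within_1, 4), round(within_2, 4), round(within_3, 4), round(z_80th, 2))
```

The observation with the larger z-score is the more unusual one relative to its own distribution, which is the point of item 4.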
CHAPTER 3: Examining Relationships

Vocabulary: Response Variable, Explanatory Variable, Independent Variable, Dependent Variable, Scatterplot, Positive Association, Negative Association, Linear Correlation, r-value, Regression Line, Mathematical Model, Least-Squares, $\hat{y}$ (y-hat), SSM, SSE, Regression, $r^2$, Coefficient of Determination, Residuals/Residual Plot, Influential Observation

A. Graphing two variable (bivariate) data
DATA MUST BE QUANTITATIVE. Graph the explanatory variable (independent) on the x-axis and the response variable (dependent) on the y-axis.
1. Scatterplots: look for relationships between the variables.
2. Look for clusters of points and gaps.

B. Analyzing two variable quantitative data when a linear relationship is suggested
1. Linear correlation coefficient (r): measures the strength of the linear relationship, $-1 \le r \le 1$
   o r = 0 indicates no linear relationship (the ellipse is a perfect circle)
   o negative r indicates an inverse relationship
   o r is a non-resistant measure (outliers strongly affect r)
   o $r = \frac{1}{n - 1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ or Use Calculator!
   o What does it mean if two variables have High/Weak/No correlation?
2. Least squares regression line (LSRL): used for prediction; minimizes the sum of the squared vertical distances from each data point to the line drawn. (LinReg(a+bx))
   o y varies with respect to x, so choose explanatory & response axes carefully (y is dependent on x)
   o $\hat{y}$ = predicted y value
   o $\hat{y} = a + bx$ is the equation of the LSRL, where b = slope = $r\left(\frac{s_y}{s_x}\right)$ and the point $(\bar{x}, \bar{y})$ is always on the line.
   o What is extrapolation and why is this dangerous?
3. Coefficient of determination ($r^2$): gives the percent of variation in the values of y that can be explained by the regression line. The better the line fits, the higher the value of $r^2$. To judge "fit of the line" look at r and $r^2$. If r = 0.7, then $r^2$ = 0.49, so about half the variation in y is accounted for by the least-squares regression line.
4. Residual ($y - \hat{y}$) = vertical distance from the actual data point to the regression line.
   o $y - \hat{y}$ = observed y value - predicted y value; the residuals sum to zero
   o Residual plot: scatterplot of (observed x values, residuals), that is, $(x, y - \hat{y})$. Use calculator to plot residuals on the y-axis, original x values on the x-axis.
   o Check: no pattern means a good linear relationship; a curved pattern means a non-linear relationship; a plot that widens means larger x values do not predict y values well
   o Outliers: y values far from the regression line (have large residuals)
   o Influential points: points whose x values are far from the rest of the data, so that removing them would markedly change the regression line
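The formulas for r, the LSRL slope, and the residuals can be sketched in Python as follows. The data are hypothetical and the function names `correlation` and `lsrl` are illustrative; on the exam the calculator's LinReg(a+bx) does this work.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * (s_y / s_x)."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)  # forces the line through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]               # hypothetical explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical response values

r = correlation(xs, ys)
a, b = lsrl(xs, ys)
r_squared = r ** 2

# Residual = observed y - predicted y; these pair with xs in a residual plot.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(r, 4), round(a, 2), round(b, 2), round(sum(residuals), 10))
```

Plotting `residuals` against `xs` gives the residual plot described in item 4; for a good linear fit it should show no pattern.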

CHAPTER 4: More on Two-Variable Data

Vocabulary: Exponential Function, Power Function, Linear Growth, Exponential Growth, Extrapolation, Lurking Variables, Causation, Common Response, Confounding, Marginal Distributions, Conditional Distributions

A. Analyzing two variable quantitative data when the data is in a curved pattern
   Explain the difference between linear growth and exponential growth.
   How does the power model differ from the exponential model?
   If the data appears curved in the shape of a power function or an exponential function, use the calculator to fit an appropriate function to the data.

B. Check for exponential regression
1. Take the log of the original y data (in your list)
2. Replot the data (x, log y); it will look linear
3. Calculate a least squares regression line as if the transformed data were original data. Remember that the regression equation is now in the form log y = a + bx
4. Verify with residual plot (no pattern)
5. Remember to undo the transformation before making a prediction using the regression line.

C. Check for power regression
1. Take the log of both the original x and original y data (in your lists)
2. Replot the data (log x, log y); it will look linear
3. Calculate a least squares regression line as if the transformed data were original data. Remember that the regression equation is now in the form log y = a + b log x
4. Verify with residual plot (no pattern)
5. Remember to undo the transformation before making a prediction using the regression line.

D. Cautions in analyzing data
1. Correlation does not imply causation. Only a well-designed, controlled experiment may establish causation.
2. Define lurking variable.
3. Define causation. Give an example.
4. Define common response. Give an example.
5. Define confounding. Give an example.

E. Relations in categorical data
1. From a two-way table of counts, find marginal and conditional distributions (in %)
2. Describe the relationship between two categorical variables by comparing percents.
3. What is the marginal distribution of a two-way table?
4. How are conditional distributions calculated?
5. Recognize Simpson's paradox and be able to explain it.
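The log-transformation steps in parts B and C can be sketched in Python. The data below are hypothetical and were built to follow y = 3·2^x and y = 2·x^3 exactly, so the fits recover those models; `fit_line` is an illustrative helper, not a calculator command.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares line for the given points; returns (a, b) for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Exponential check: regress log y on x, so log y = a + b*x.
exp_xs = [0, 1, 2, 3, 4]
exp_ys = [3, 6, 12, 24, 48]  # hypothetical data following y = 3 * 2^x
a_exp, b_exp = fit_line(exp_xs, [math.log10(y) for y in exp_ys])

def predict_exponential(x):
    # Undo the transformation: y-hat = 10^(a + b*x).
    return 10 ** (a_exp + b_exp * x)

# Power check: regress log y on log x, so log y = a + b*log x.
pow_xs = [1, 2, 3, 4]
pow_ys = [2, 16, 54, 128]    # hypothetical data following y = 2 * x^3
a_pow, b_pow = fit_line([math.log10(x) for x in pow_xs],
                        [math.log10(y) for y in pow_ys])

def predict_power(x):
    # Undo the transformation: y-hat = 10^a * x^b.
    return 10 ** a_pow * x ** b_pow

print(round(predict_exponential(5), 4), round(predict_power(5), 4))
```

With real data the transformed points will not fall exactly on a line, which is why step 4 (checking the residual plot) matters.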

CHAPTER 5: Producing Data

Vocabulary: Confounded, Population, Sample, Convenience Sampling, Systematic Random Sample, Simple Random Sample, Stratified Random Sample / Strata, Voluntary Response Sample, Nonresponse, Response Bias, Undercoverage, Observational Study, Experimental Study, Experimental Units, Subjects, Factor / Treatment, Placebo Effect, Control Group, Replication, Statistically Significant, Completely Randomized Experiment, Block Design, Matched Pairs Design, Double-blind Experiment

Census: contacts every individual in the population to obtain data. Symbols μ and σ are parameters and are used only with population data.
What is the difference between sampling and census?
Sample survey: collects data from a part of a population in order to learn about the entire population. Symbols $\bar{x}$ and $s_x$ are statistics and are used with sample data.

1. Bad sampling designs result in bias in different forms
   o voluntary response sample: participants choose themselves; usually those with strong opinions choose to respond (e.g. online surveys, call-in opinion questions)
   o convenience sample: investigators choose to sample those people who are easy to reach (e.g. marketing surveys done in a mall)
   o bias: the design systematically favors certain outcomes or responses (e.g. surveying pacifist church members about attitudes toward war)
2. Good sampling designs
   o simple random sample (SRS): a group of n individuals chosen from a population in such a way that every set of n individuals has an equal chance of being the sample actually chosen; use a random number table or randInt on the calculator
   o stratified random sample: divide the population into groups (strata) of similar individuals (by some chosen category), then choose a simple random sample from each of the groups
   o systematic random sampling: choosing every nth individual after choosing the first randomly
3. Cautions (even when the design is good) include:
   o undercoverage: when some groups of the population are left out, often because a complete list of the population from which the sample was chosen was not available
   o nonresponse: when an individual appropriately chosen for the sample cannot or does not respond
   o response bias: when an individual does not answer a question truthfully
   o wording of questions: questions are worded to elicit a particular response
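The three good sampling designs can be sketched in Python as below. The roster, the stratum names, and the seed are all made up for illustration; on the calculator the random choices would come from randInt or a random number table.

```python
import random

rng = random.Random(2024)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: a roster of 100 labeled individuals.
population = [f"student_{i:03d}" for i in range(1, 101)]

# Simple random sample: every set of 10 individuals is equally likely.
srs = rng.sample(population, 10)

# Stratified random sample: split into strata, then take an SRS from each.
strata = {
    "juniors": population[:60],   # hypothetical stratum
    "seniors": population[60:],   # hypothetical stratum
}
stratified = {name: rng.sample(group, 5) for name, group in strata.items()}

# Systematic random sample: random starting point, then every 10th individual.
step = 10
start = rng.randrange(step)
systematic = population[start::step]

print(srs)
print(stratified)
print(systematic)
```

`rng.sample` draws without replacement, so no individual can appear twice in the same sample.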

Observational study: observes individuals in a population or sample and measures variables of interest, but does not in any way assign treatments or influence responses.
How does an experiment differ from an observational study?
Experiment: deliberately imposes some treatment on individuals (experimental units or subjects) in order to observe the response. Can give evidence for causation if well designed with a control group.
3 necessities:
   o Control: control for lurking variables by assigning units to groups that do not get the treatment
   o Randomize: use simple random sampling to assign units to treatments/control groups
   o Replicate: use the same treatment on many units to reduce the variation due to chance
The "best" experiments are double-blind: neither the investigators nor the subjects know which treatments are being used on which subjects. Placebos are often used.
Block designs: subjects are grouped before the experiment based on a particular characteristic or set of characteristics, then simple random samples are taken within each block.
Matched pairs is one type of block design where two treatments are assigned. State the two most common ways in which matched pairs experiments are designed.
Remember how to sketch a design (p. 302)

CHAPTER 6: Probability: The Study of Randomness

Vocabulary: Trial, Random, Probability, Independence, Random Phenomenon, Sample Space, Simulation, Tree Diagram, Replacement, Disjoint, P(A), Complement $A^c$, Venn Diagram, Union (or), Intersection (and), Conditional Probability

A. Basic definitions
Probability only refers to the "long run"; never the short term.
Independent: one event does not change (have an effect on) another event.
Mutually exclusive (disjoint): events cannot occur at the same time, so there can be no intersection of events in a Venn diagram. Mutually exclusive events ALWAYS have an effect on each other, so they can never be independent.
What is simulation?
What does the calculator command randInt(0, 99, 10) perform?
What is a sample space?
What is an event?
What is meant by joint probability?
What is meant by conditional probability?
State the formula for finding conditional probability.
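To see what "long run" means, here is a small simulation sketch in Python. It assumes a fair coin and an arbitrary fixed seed; the first lines mimic the calculator's randInt(0, 99, 10), which returns ten random integers from 0 to 99 inclusive.

```python
import random

rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable

# Analogue of randInt(0, 99, 10): ten random integers from 0 to 99 inclusive.
draws = [rng.randint(0, 99) for _ in range(10)]

def relative_frequency(n_flips):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

short_run = relative_frequency(10)       # can be far from 0.5
long_run = relative_frequency(100_000)   # settles close to 0.5

print(draws)
print(short_run, long_run)
```

The short-run proportion can land well away from 0.5, while the long-run proportion stays close to it, which is why probability describes only long-run behavior.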

B. General rules
The probabilities of all outcomes in a sample space must sum to 1.
Notation:
   o $P(A^c)$ = probability of the complement of A
   o $P(A \cup B)$ = Union = P(A or B)
   o $P(A \cap B)$ = Intersection = P(A and B)
   o $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ = B given A = conditional probability
Rules:
   o $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of A
   o $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
   o $P(A \cap B) = P(A) \cdot P(B \mid A)$, that is, P(A and B) = P(A) · P(B | A)
   o If $P(A \cap B) = 0$, then A and B are mutually exclusive
   o If $P(B \mid A) = P(B)$, then A and B are independent
   o If $P(A \cap B) = P(A) \cdot P(B)$, then A and B are independent

CHAPTER 7: Random Variables

Vocabulary: Random Variable, Discrete Random Variable, Probability Distribution, Probability Histogram, Density Curve, Probability Density Curve, Continuous Random Variable, Uniform Distribution, Normal Distribution, $\mu_X$, $\mu_Y$, Expected Value, Law of Large Numbers, Variance, Standard Deviation

1. Graphs, whether of continuous or discrete variables, must have total area = 1.
   What is a discrete random variable? What is a continuous random variable?
   Histograms: discrete. Smooth curves: continuous. The graphs do not need to be symmetric.
2. To get the expected value or mean of a discrete random variable, multiply each value by the probability assigned to it (usually given in a probability distribution table), then sum those products: $\mu = \sum x_i p_i$
   Explain the difference between the notations $\bar{x}$ and $\mu_X$.
   Explain the Law of Large Numbers.
3. To get the variance of a discrete random variable, use $\sigma^2 = \sum (x_i - \mu)^2 p_i$, where $p_i$ is the probability assigned to each value $x_i$.
Not on formula sheet:
4. To find the sum or difference (±) using two random variables, add or subtract the means to get the mean of the sum or difference of the variables: $\mu_{X \pm Y} = \mu_X \pm \mu_Y$
5. To get the standard deviation of the sum or difference of two independent random variables, add the variances, then take the square root of the sum: $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$ or $\sigma_{X \pm Y} = \sqrt{\sigma^2_X + \sigma^2_Y}$
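The Chapter 7 formulas can be checked with a short Python sketch. The two probability distributions X and Y are hypothetical and are assumed to be independent; `expected_value` and `variance` are illustrative helper names.

```python
import math

def expected_value(dist):
    """mu = sum of x_i * p_i over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma^2 = sum of (x_i - mu)^2 * p_i."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

X = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical distribution; probabilities sum to 1
Y = {10: 0.5, 20: 0.5}        # hypothetical distribution, assumed independent of X

mu_x, var_x = expected_value(X), variance(X)
mu_y, var_y = expected_value(Y), variance(Y)

# Rules for a difference: subtract the means, but ADD the variances.
mu_diff = mu_x - mu_y
sd_diff = math.sqrt(var_x + var_y)

# Cross-check by building the full distribution of X - Y under independence.
diff_dist = {}
for x, px in X.items():
    for y, py in Y.items():
        diff_dist[x - y] = diff_dist.get(x - y, 0.0) + px * py

print(round(mu_x, 4), round(var_x, 4), round(mu_diff, 4), round(sd_diff, 4))
print(round(expected_value(diff_dist), 4), round(variance(diff_dist), 4))
```

The cross-check shows why variances add even for a difference: subtracting Y still adds its variability to the result.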

NOT ON FINAL EXAM:
CHAPTER 8: The Binomial and Geometric Distributions

Vocabulary: Binomial Distribution, Binomial Random Variable, Geometric Distribution, Probability Distribution Function (pdf), Cumulative Distribution Function (cdf)

Binomial: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, $\mu = np$, $\sigma^2 = np(1 - p)$
Geometric: $P(X = n) = (1 - p)^{n - 1} p$, $P(X > n) = (1 - p)^n$, $\mu = \frac{1}{p}$, $\sigma^2 = \frac{1 - p}{p^2}$

What is the difference between a probability distribution function (pdf) and a cumulative distribution function (cdf)?

1. The binomial distribution
What is meant by B(n, p)?
What are the four conditions for the binomial setting?
   1. Only two options (success or failure).
   2. All observations are independent.
   3. The probability of success, p, is constant.
   4. There is a fixed number of trials/observations.
The mean of a binomial distribution is $\mu = np$, where p is the probability of success and n is the number of observations/trials in the sample.
The standard deviation of the binomial distribution is $\sigma = \sqrt{np(1 - p)}$
The graph of a binomial distribution is skewed depending on the probability p, unless the number of trials is very large, in which case the distribution becomes approximately normal.
Binomial probability (pdf): $\binom{n}{k} p^k (1 - p)^{n - k}$ or use calculator: binompdf(n,p,x)
To find the probability of a number less than or equal to x: use binomcdf(n,p,x).
Under what conditions can we approximate the binomial distribution with a normal distribution?

2. The geometric distribution
Conditions are the same as for the binomial, except there is not a fixed number of observations, because the task is to find out how many trials it takes until the first success occurs.
The mean of the geometric distribution is $\mu = \frac{1}{p}$
The standard deviation of the geometric distribution is $\sigma = \sqrt{\frac{1 - p}{p^2}}$
The graph of the geometric distribution is always strongly right skewed.
Geometric probability: $(1 - p)^{n - 1} p$ or use geometpdf(p,x)
To find the probability of a number less than or equal to x: use geometcdf(p,x)

Modified from: StatsMonkey/AP_Review
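For the Chapter 8 formulas, here is a Python sketch of what the binomial and geometric calculator commands compute. The function names only mirror the calculator commands for readability; they are illustrative stand-ins written from the formulas in this guide, and the values n = 10, p = 0.5, and p = 0.25 are made-up examples.

```python
from math import comb, sqrt

def binom_pdf(n, p, k):
    """P(X = k) for a binomial: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k): add up the pdf from 0 through k."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

def geomet_pdf(p, n):
    """P(first success on trial n) = (1 - p)^(n - 1) * p."""
    return (1 - p) ** (n - 1) * p

def geomet_cdf(p, n):
    """P(first success on or before trial n) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

n, p = 10, 0.5                       # hypothetical binomial setting B(10, 0.5)
binom_mean = n * p                   # mu = np
binom_sd = sqrt(n * p * (1 - p))     # sigma = sqrt(np(1 - p))

p_geo = 0.25                         # hypothetical success probability
geo_mean = 1 / p_geo                 # mu = 1/p
geo_sd = sqrt((1 - p_geo) / p_geo ** 2)

print(binom_pdf(n, p, 5), binom_cdf(n, p, 5), binom_mean, binom_sd)
print(geomet_pdf(p_geo, 3), geomet_cdf(p_geo, 3), geo_mean, geo_sd)
```

The cdf versions answer "less than or equal to" questions; for "greater than" questions, subtract the cdf from 1.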