Descriptive Statistics-II. Mahmoud Alhussami, MPH, DSc., PhD.


1 Descriptive Statistics-II Mahmoud Alhussami, MPH, DSc., PhD.

2 Shapes of Distribution A third important property of data, after location and dispersion, is its shape. Distributions of quantitative variables can be described in terms of a number of features, many of which are related to the distribution's physical appearance or shape when presented graphically: modality; symmetry and skewness; degree of skewness; kurtosis.

3 Modality The modality of a distribution concerns how many peaks or high points there are. A distribution with a single peak, that is, one value with a high frequency, is a unimodal distribution.

4 Modality A distribution with two or more peaks is called a multimodal distribution.

5 Symmetry and Skewness A distribution is symmetric if it could be split down the middle to form two halves that are mirror images of one another. In asymmetric distributions, the peaks are off center, with a bulk of scores clustering at one end and a tail trailing off at the other end. Such distributions are often described as skewed. When the longer tail trails off to the right, this is a positively skewed distribution, e.g. annual income. When the longer tail trails off to the left, this is called a negatively skewed distribution, e.g. age at death.

6 Symmetry and Skewness Shape can be described by degree of asymmetry (i.e., skewness). mean > median: positive or right skewness. mean = median: symmetric or zero skewness. mean < median: negative or left skewness. Positive skewness can arise when the mean is increased by some unusually high values. Negative skewness can arise when the mean is decreased by some unusually low values.

7 [Figure: example curves for left-skewed, right-skewed, and symmetric distributions]

8 Shapes of the Distribution Three common shapes of frequency distributions: (A) symmetrical and bell shaped; (B) positively skewed, or skewed to the right; (C) negatively skewed, or skewed to the left. 9 March

9 Shapes of the Distribution Three less common shapes of frequency distributions: (A) bimodal; (B) reverse J-shaped; (C) uniform.

10 This guy took a VERY long time!

11 Degree of Skewness A skewness index can readily be calculated by most statistical computer programs in conjunction with frequency distributions. The index has a value of 0 for a perfectly symmetric distribution, a positive value if there is a positive skew, and a negative value if there is a negative skew. A skewness index that is more than twice the value of its standard error can be interpreted as a departure from symmetry.

12 Measures of Skewness or Symmetry Pearson's skewness coefficient: it is nonalgebraic and easily calculated, and it is useful for quick estimates of symmetry. It is defined as: skewness = (mean - median) / SD. Fisher's measure of skewness: it is based on deviations from the mean to the third power.
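The coefficient defined on this slide can be sketched in a few lines. This is a minimal illustration, assuming the sample standard deviation is the intended SD; the function name and both data sets are hypothetical and are not taken from the slides.

```python
import statistics


def pearson_skewness(values):
    """Pearson's skewness coefficient: (mean - median) / SD.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return (mean - median) / sd


# Hypothetical data: one unusually high value pulls the mean above the median.
incomes = [20, 22, 25, 27, 30, 95]
# Hypothetical symmetric data: mean and median coincide.
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(incomes))    # positive, so right-skewed
print(pearson_skewness(symmetric))  # zero, so symmetric
```

A positive result corresponds to mean > median and a negative result to mean < median, matching the sign convention on the earlier skewness slide.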

13 Pearson's skewness coefficient For a perfectly symmetrical distribution, the mean will equal the median, and the skewness coefficient will be zero. If the distribution is positively skewed, the mean will be more than the median and the coefficient will be positive. If the coefficient is negative, the distribution is negatively skewed and the mean is less than the median. Skewness values will fall between -1 and +1 SD units. Values falling outside this range indicate a substantially skewed distribution. Hildebrand (1986) states that skewness values above 0.2 or below -0.2 indicate severe skewness.

14 Fisher's Measure of Skewness The formula for Fisher's skewness statistic is based on deviations from the mean to the third power. The measure of skewness can be interpreted in terms of the normal curve. A symmetrical curve will result in a value of 0. If the skewness value is positive, then the curve is skewed to the right, and vice versa for a distribution skewed to the left. A z-score is calculated by dividing the measure of skewness by the standard error for skewness. Values above +1.96 or below -1.96 are significant at the 0.05 level because 95% of the scores in a normal distribution fall between +1.96 and -1.96 standard deviations from the mean. E.g. if Fisher's skewness = 0.195 and st. err. = 0.197, the z-score = 0.195/0.197 = 0.99.
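The z-score check described on this slide can be sketched as follows. The function names are hypothetical, and the 1.96 cutoff assumes a two-sided test at the 0.05 level; the inputs are the 0.195 and 0.197 from the slide's worked division.

```python
def skewness_z(skewness, standard_error):
    """z-score for a skewness statistic: the statistic divided by its standard error."""
    return skewness / standard_error


def departs_from_symmetry(skewness, standard_error, critical=1.96):
    """True when |z| exceeds the critical value (1.96 assumes alpha = 0.05, two-sided)."""
    return abs(skewness_z(skewness, standard_error)) > critical


z = skewness_z(0.195, 0.197)
print(round(z, 2))                          # 0.99
print(departs_from_symmetry(0.195, 0.197))  # False
```

Because 0.99 lies well inside the range from -1.96 to +1.96, this example gives no evidence of a departure from symmetry.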

15 Kurtosis A distribution's kurtosis concerns how pointed or flat its peak is. Two types: leptokurtic distribution (meaning thin) and platykurtic distribution (meaning flat).

16 Kurtosis There is a statistical index of kurtosis that can be computed when computer programs are instructed to produce a frequency distribution. For the kurtosis index, a value of zero indicates a shape that is neither flat nor pointed. Positive values on the kurtosis statistic indicate greater peakedness, and negative values indicate greater flatness.

17 Fisher's Measure of Kurtosis Fisher's measure is based on deviations from the mean to the fourth power. A z-score is calculated by dividing the measure of kurtosis by the standard error for kurtosis.
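The third-power and fourth-power indices can be illustrated with plain moment formulas. This is a simplified sketch: it uses population moments with no small-sample correction, so statistical packages may report slightly different values. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def moment_skewness(values):
    """Third central moment scaled by SD cubed; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by variance squared, minus 3.

    Subtracting 3 makes a normal curve score 0, so positive values
    indicate a more peaked shape and negative values a flatter one.
    """
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


flat = [1, 2, 3, 4, 5]                       # hypothetical, evenly spread
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # hypothetical, tall center with long tails

print(moment_skewness(flat))    # 0.0
print(excess_kurtosis(flat))    # negative: flatter than normal
print(excess_kurtosis(peaked))  # positive: more peaked than normal
```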

18 Normal Distribution Also called the bell-shaped curve, normal curve, or Gaussian distribution. A normal distribution is one that is unimodal, symmetric, and not too peaked or flat. Given its name by the Belgian mathematician Quetelet who, in the early 19th century, noted that many human attributes, e.g. height, weight, and intelligence, appeared to be distributed normally.

19 Normal Distribution The normal curve is unimodal and symmetric about its mean (μ). In this distribution the mean, median, and mode are all identical. The standard deviation (σ) specifies the amount of dispersion around the mean. The two parameters μ and σ completely define a normal curve. Also called a probability density function. The probability is interpreted as "area under the curve." The area under the whole curve = 1.

20 Sampling Distribution A sample statistic is often unequal to the value of the corresponding population parameter because of sampling error. Sampling error reflects the tendency for statistics to fluctuate from one sample to another. The amount of sampling error is the difference between the obtained sample value and the population parameter. Inferential statistics allow researchers to estimate how close to the population value the calculated statistic is likely to be. The concept of sampling distributions, which are actually probability distributions, is central to estimates of sampling error.

21 Characteristics of Sampling Distribution Sampling error = sample mean - population mean. Every sample size has a different sampling distribution of the mean. Sampling distributions are theoretical because, in practice, no one draws an infinite number of samples from a population. Their characteristics can be modeled mathematically and are determined by a formulation known as the central limit theorem. This theorem stipulates that the mean of the sampling distribution is identical to the population mean. A consequence of the central limit theorem is that if we average measurements of a particular quantity, the distribution of our average tends toward a normal one. The average sampling error, that is, the mean of the (sample mean - μ) values, would always equal zero.

22 Standard Error of the Mean The standard deviation of a sampling distribution of the mean has a special name: the standard error of the mean (SEM). The smaller the SEM, the more accurate are the sample means as estimates of the population value.

23 Central Limit Theorem describes the characteristics of the "population of the means" which has been created from the means of an infinite number of random population samples of size (N), all of them drawn from a given "parent population". It predicts that regardless of the distribution of the parent population: The mean of the population of means is always equal to the mean of the parent population from which the population samples were drawn. The standard deviation of the population of means is always equal to the standard deviation of the parent population divided by the square root of the sample size (N). The distribution of means will increasingly approximate a normal distribution as the size N of samples increases.
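The three predictions above can be checked with a small simulation. This is a sketch under stated assumptions: the parent population is taken to be uniform on [0, 1] (mean 0.5, standard deviation sqrt(1/12)), the number of samples is finite rather than infinite, and all names are hypothetical.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples random samples of the given size from a uniform(0, 1)
    parent population and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


parent_mean = 0.5                # mean of uniform(0, 1)
parent_sd = math.sqrt(1 / 12)    # SD of uniform(0, 1)
sample_size = 25

means = simulate_sample_means(sample_size, n_samples=5000)

print(statistics.mean(means))              # close to parent_mean
print(statistics.stdev(means))             # close to the value on the next line
print(parent_sd / math.sqrt(sample_size))  # standard error of the mean
```

Even though the parent population here is flat rather than bell shaped, the simulated means cluster around 0.5 with a spread close to the parent SD divided by the square root of the sample size.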

24 Standard Normal Variable It is customary to call a standard normal random variable Z. The outcomes of the random variable Z are denoted by z. The table in the coming slide gives the area under the curve (probabilities) between the mean and z. The probabilities in the table refer to the likelihood that a randomly selected value Z is equal to or less than a given value of z and greater than 0 (the mean of the standard normal).

25 Source: Levine et al, Business Statistics, Pearson. 25

26 The 68-95-99.7 Rule for the Normal Distribution 68% of the observations fall within one standard deviation of the mean. 95% of the observations fall within two standard deviations of the mean. 99.7% of the observations fall within three standard deviations of the mean. When applied to real data, these estimates are considered approximate!

27 Remember these probabilities (percentages), given as number of standard deviations from the mean and approximate area under the normal curve: ±1, .68; ±1.645, .90; ±1.96, .95; ±2, .955; ±2.575, .99; ±3, .997. Practice: Find these values yourself using the Z table.
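As a cross-check on the Z table, the area within k standard deviations of the mean can be computed from the error function in Python's standard library. This is a short sketch; the function name is hypothetical.

```python
import math


def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k), 4))
```

The results for 1, 2, and 3 standard deviations agree with the rule for the normal distribution (about 68%, 95%, and 99.7%), and 1.96 standard deviations gives almost exactly 95%.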

28 Standard Normal Curve

29 Standard Normal Distribution [Figure: standard normal curve; 50% of the probability (0.5) lies on each side of the mean]

30 Standard Normal Distribution [Figure: standard normal curve with the central 95% of the area marked and 2.5% of the probability in each tail]

31 Calculating Probabilities Probability calculations are always concerned with finding the probability that the variable assumes any value in an interval between two specific points a and b. The probability that a continuous variable assumes a value between a and b is the area under the graph of the density between a and b.

32 If the weight of males is N.D. with μ=150 and σ=10, what is the probability that a randomly selected male will weigh between 140 lbs and 155 lbs?

33 Solution: Z = (140 - 150)/10 = -1.00 s.d. from mean. Area under the curve = .3413 (from Z table). Z = (155 - 150)/10 = +.50 s.d. from mean. Area under the curve = .1915 (from Z table). Answer: .3413 + .1915 = .5328
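The same probability can be checked without a Z table by using `statistics.NormalDist` from Python's standard library (available from Python 3.8). This is a sketch for verification, not part of the original solution; the variable names are hypothetical.

```python
from statistics import NormalDist

# Weight of males: normal with mean 150 lbs and standard deviation 10 lbs.
weight = NormalDist(mu=150, sigma=10)

# Convert both limits to Z-scores, as in the table-based solution.
z_low = (140 - weight.mean) / weight.stdev    # -1.0
z_high = (155 - weight.mean) / weight.stdev   # +0.5

# Area under the curve between 140 and 155 lbs.
probability = weight.cdf(155) - weight.cdf(140)

print(z_low, z_high)
print(round(probability, 4))  # 0.5328
```

The cumulative distribution function gives the area to the left of a value, so subtracting the two cumulative areas yields the area between 140 and 155 lbs.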

34 If IQ is N.D. with a mean of 100 and a S.D. of 10, what percentage of the population will have (a) IQs ranging from 90 to 110? (b) IQs ranging from 80 to 120? Solution: Z = (90 - 100)/10 = -1.00. Z = (110 - 100)/10 = +1.00. Area between 0 and 1.00 in the Z-table is .3413; area between 0 and -1.00 is also .3413 (Z-distribution is symmetric). Answer to part (a) is .3413 + .3413 = .6826

35 (b) IQs ranging from 80 to 120? Solution: Z = (80 - 100)/10 = -2.00. Z = (120 - 100)/10 = +2.00. Area between 0 and 2.00 in the Z-table is .4772; area between 0 and -2.00 is also .4772 (Z-distribution is symmetric). Answer is .4772 + .4772 = .9544

36 Suppose that the salary of college graduates is N.D. with μ=$40,000 and σ=$10,000. (a) What proportion of college graduates will earn $24,800 or less? (b) What proportion of college graduates will earn $53,500 or more? (c) What proportion of college graduates will earn between $45,000 and $57,000? (d) Calculate the 80th percentile. (e) Calculate the 27th percentile.

37 (a) What proportion of college graduates will earn $24,800 or less? Solution: Convert the $24,800 to a Z-score: Z = ($24,800 - $40,000)/$10,000 = -1.52. Always DRAW a picture of the distribution to help you solve these problems.

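38 [Figure: normal curve with X = $24,800 and the mean $40,000 marked; the area between them is .4357] First find the area between 0 and -1.52 in the Z-table. From the Z table, that area is .4357. Then, the area from -1.52 to -∞ is .5000 - .4357 = .0643. Answer: 6.43% of college graduates will earn less than $24,800.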

39 (b) What proportion of college graduates will earn $53,500 or more? [Figure: normal curve with the mean $40,000 and X = $53,500 marked; area .4115 between them and .0885 in the upper tail] Solution: Convert the $53,500 to a Z-score. Z = ($53,500 - $40,000)/$10,000 = +1.35. Find the area between 0 and +1.35 in the Z-table: .4115 is the table value. When you DRAW A PICTURE you see that you need the area in the tail: .5000 - .4115 = .0885. Answer: Thus, 8.85% of college graduates will earn $53,500 or more.

40 (c) What proportion of college graduates will earn between $45,000 and $57,000? Z = ($45,000 - $40,000)/$10,000 = .50. Z = ($57,000 - $40,000)/$10,000 = 1.70. [Figure: normal curve with $40k, $45k, and $57k marked] From the table, we can get the area under the curve between the mean (0) and .50, which is .1915; we can also get the area between 0 and 1.70, which is .4554. From the picture we see that neither one is what we need. What do we do here? Subtract the small piece from the big piece to get exactly what we need. Answer: .4554 - .1915 = .2639

41 Parts (d) and (e) of this example ask you to compute percentiles. Every Z-score is associated with a percentile. A Z-score of 0 is the 50th percentile. This means that if you take any test that is normally distributed (e.g., the SAT exam), and your Z-score on the test is 0, you scored at the 50th percentile. In fact, your score is the mean, median, and mode.

42 (d) Calculate the 80th percentile. Solution: First, what Z-score is associated with the 80th percentile? A Z-score of approximately +.84 will give you about .3000 of the area under the curve. Also, the area under the curve between -∞ and 0 is .5000. Therefore, a Z-score of +.84 is associated with the 80th percentile. Now to find the salary (X) at the 80th percentile, just solve for X: +.84 = (X - $40,000)/$10,000, so X = $40,000 + $8,400 = $48,400.

43 (e) Calculate the 27th percentile. Solution: First, what Z-score is associated with the 27th percentile? A Z-score of approximately -.61 will give you about .2300 of the area under the curve, with .2700 in the tail. (The area under the curve between 0 and -.61 is .2291, which we are rounding to .2300.) Also, the area under the curve between 0 and ∞ is .5000. Therefore, a Z-score of -.61 is associated with the 27th percentile. Now to find the salary (X) at the 27th percentile, just solve for X: -.61 = (X - $40,000)/$10,000, so X = $40,000 - $6,100 = $33,900.
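Parts (a) through (e) of the salary example can be cross-checked with `statistics.NormalDist` (Python 3.8 or later). This is a verification sketch with hypothetical variable names. Because it does not round Z to two decimals as a printed table does, the percentile answers differ from the table-based ones by a few dollars.

```python
from statistics import NormalDist

# Salary of college graduates: normal with mean $40,000 and SD $10,000.
salary = NormalDist(mu=40_000, sigma=10_000)

p_low = salary.cdf(24_800)                           # (a) earn $24,800 or less
p_high = 1 - salary.cdf(53_500)                      # (b) earn $53,500 or more
p_between = salary.cdf(57_000) - salary.cdf(45_000)  # (c) between $45,000 and $57,000
x80 = salary.inv_cdf(0.80)                           # (d) 80th percentile
x27 = salary.inv_cdf(0.27)                           # (e) 27th percentile

print(round(p_low, 4), round(p_high, 4), round(p_between, 4))
print(round(x80), round(x27))
```

`cdf` returns the area to the left of a value and `inv_cdf` reverses it, returning the value below which a given proportion of the distribution falls, which is exactly what a percentile is.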

44 Graphical Methods Frequency distribution; histogram; frequency polygon; cumulative frequency graph; pie chart.

45 Presenting Data Table Condenses data into a form that can make them easier to understand; Shows many details in summary fashion; BUT Since table shows only numbers, it may not be readily understood without comparing it to other values.

46 Principles of Table Construction Don't try to do too much in a table. Use white space effectively to make the table layout pleasing to the eye. Make sure tables & text refer to each other. Use some aspect of the table to order & group rows & columns.

47 Principles of Table Construction If appropriate, frame table with summary statistics in rows & columns to provide a standard of comparison. Round numbers in table to one or two decimal places to make them easily understood. When creating tables for publication in a manuscript, double-space them unless contraindicated by journal.

48 Frequency Distributions A useful way to present data when you have a large data set is the formation of a frequency table or frequency distribution. Frequency: the number of observations that fall within a certain range of the data.

49 Frequency Table [Table: number of deaths by age group; total 34,524]

50 Presenting Data Chart Visual representation of a frequency distribution that helps to gain insight about what the data mean. Built with lines, area & text: bar charts, pie chart

51 Bar Chart Simplest form of chart. Used to display nominal or ordinal data. [Figure: bar chart of percent by response (Never, Seldom, Sometimes, Frequently) for the ethical issues scale item "Acting against your own personal/religious views"]

52 Cluster Bar Chart [Figure: clustered bar chart of percent by highest education (Diploma, Associate Degree, Bachelor Degree, Post Bac), grouped by employment (full-time RN, part-time RN, self-employed RN)]

53 Pie Chart Alternative to bar chart. Circle partitioned into percentage distributions of qualitative variables with total area of 100%. [Figure: pie chart with slices for Doctorate NonNursing, Doctorate Nursing, MS NonNursing, MS Nursing, Juris Doctor, BS NonNursing, BS Nursing, Diploma-Nursing, AD Nursing, and Missing]

54 Histogram Appropriate for interval, ratio, and sometimes ordinal data. Similar to bar charts, but bars are placed side by side. Often used to represent both frequencies and percentages. Most histograms have from 5 to 20 bars.

55 Histogram [Figure: histogram of SF-36 vitality scores, with frequency on the vertical axis; mean = 61.6]

56 Pictures of Data: Histograms Blood pressure data on a sample of 113 men. [Figure: histogram with systolic BP (mmHg) on the horizontal axis and number of men on the vertical axis] Histogram of the systolic blood pressure for 113 men. Each bar spans a width of 5 mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.

57 Frequency Polygon [Figure: frequency polygon of children's weights] First place a dot at the midpoint of the upper base of each rectangular bar. The points are connected with straight lines. At the ends, the points are connected to the midpoints of the previous and succeeding intervals (these intervals have zero frequency).

58 Hallmarks of a Good Chart Simple & easy to read Placed correctly within text Use color only when it has a purpose, not solely for decoration Make sure others can understand chart; try it out on somebody first Remember: A poor chart is worse than no chart at all.

59 Coefficient of Correlation Measure of linear association between 2 continuous variables. Setting: two measurements are made for each observation. Sample consists of pairs of values and you want to determine the association between the variables.

60 Association Examples Example 1: Association between a mother's weight and the birth weight of her child. 2 measurements: mother's weight and baby's weight. Both continuous measures. Example 2: Association between a risk factor and a disease. 2 measurements: disease status and risk factor status. Both dichotomous measurements.

61 Correlation Analysis When you have 2 continuous measurements you use correlation analysis to determine the relationship between the variables. Through correlation analysis you can calculate a number that relates to the strength of the linear association.

62 Scatter Plots and Association You can plot the 2 variables in a scatter plot (one of the types of charts in SPSS/Excel). The pattern of the dots in the plot indicates the statistical relationship between the variables (the strength and the direction). In a positive relationship the pattern goes from lower left to upper right. In a negative relationship the pattern goes from upper left to lower right. The more the dots cluster around a straight line, the stronger the linear relationship.

63 Birth Weight Data [Table: paired values of x (oz) and y (%)] x = birth weight in ounces; y = increase in weight between the 70th and 100th days of life, expressed as a percentage of birth weight.

64 Pearson Correlation Coefficient Birth Weight Data [Figure: scatter plot of increase in birth weight (%) against birth weight (in ounces)]

65 Calculations of Correlation Coefficient In Excel: Go to the TOOLS menu and select DATA ANALYSIS. Highlight CORRELATION and click OK. Enter the INPUT RANGE (2 columns of data that contain x and y). Specify the cells where you want the answer to be placed, then click OK.

66 Pearson Correlation Results [Table: correlation matrix for x (oz) and y (%), with 1 on the diagonal and the Pearson correlation coefficient off the diagonal] Interpretation: values near +1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; values near 0 indicate a weak linear association.

67 CAUTION!!!! Interpreting the correlation coefficient should be done cautiously! A result of 0 does not mean there is NO relationship. It means there is no linear association. There may be a perfect non-linear association.
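The caution about a coefficient of 0 can be made concrete with a small sketch. The function below computes the Pearson correlation coefficient from its definition; the function name and all three data sets are hypothetical and unrelated to the birth weight data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]    # perfect positive linear relationship
falling = [-3 * v for v in x]      # perfect negative linear relationship
curved = [v ** 2 for v in x]       # perfect non-linear (quadratic) relationship

print(pearson_r(x, rising))   # 1.0
print(pearson_r(x, falling))  # -1.0
print(pearson_r(x, curved))   # 0.0
```

In the third case y is completely determined by x, yet the coefficient is 0 because the relationship is not linear.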

68 The Uses of Frequency Distributions Becoming familiar with the dataset. Cleaning the data: finding outliers, which are values that lie outside the normal range of values for other cases. Inspecting the data for missing values. Testing assumptions for statistical tests. An assumption is a condition that is presumed to be true and that, when ignored or violated, can lead to misleading or invalid results. When the dependent variable (DV) is not normally distributed, researchers have to choose between three options: select a statistical test that does not assume a normal distribution; ignore the violation of the assumption; or transform the variable to better approximate a distribution that is normal (consult the various data transformations).

69 The Uses of Frequency Distributions Obtaining information about sample characteristics. Directly answering research questions.

70 Outliers Are values that are extreme relative to the bulk of scores in the distribution. They appear to be inconsistent with the rest of the data. Advantages: They may indicate characteristics of the population that would not be known in the normal course of analysis. Disadvantages: They do not represent the population. They run counter to the objectives of the analysis. They can distort statistical tests.

71 Sources of Outliers An error in the recording of the data. A failure of data collection, such as not following sample criteria (e.g. inadvertently admitting a disoriented patient into a study), a subject not following instructions on a questionnaire, or equipment failure. An actual extreme value from an unusual subject.

72 Missing Data Any systematic event external to the respondent (such as data entry errors or data collection problems) or action on the part of the respondent (such as refusal to answer) that leads to missing data. It means that analyses are based on fewer study participants than were in the full study sample. This, in turn, means less statistical power, which can undermine statistical conclusion validity, the degree to which the statistical results are accurate. Missing data can also affect internal validity, the degree to which inferences about the causal effect of the independent variable on the dependent variable are warranted, and external validity, or generalizability.

73 Strategies to avoid Missing Data Persistent follow-up Flexibility in scheduling appointments Paying incentives. Using well-proven methods to track people who have moved. Performing a thorough review of completed data forms prior to excusing participants.

74 Techniques for Handling Missing Data Deletion techniques. Involve excluding subjects with missing data from statistical calculation. Imputation techniques. Involve calculating an estimate of each missing value and replacing, or imputing, each value by its respective estimate. Note: techniques for handling missing data often vary in the degree to which they affect the amount of dispersion around true scores, and the degree of bias in the final results. Therefore, the selection of a data handling technique should be carefully considered.

75 Deletion Techniques Deletion methods involve removal of cases or variables with missing data. Listwise deletion. Also called complete case analysis. It is simply the analysis of those cases for which there are no missing data. It eliminates an entire case when any of its items/variables has a missing data point, whether or not that data point is part of the analysis. It is the default in SPSS. Pairwise deletion. Also called available case analysis (unwise deletion). Involves omitting cases from the analysis on a variable-by-variable basis. It eliminates a case only when that case has missing data for variables or items under analysis.
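The difference between the two deletion methods can be sketched on a tiny data set. Everything here is hypothetical: the records, the variable names, and the use of `None` to mark a missing value.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "weight": 70, "bp": 120},
    {"age": 51, "weight": None, "bp": 135},
    {"age": None, "weight": 82, "bp": 128},
    {"age": 45, "weight": 77, "bp": None},
    {"age": 29, "weight": 64, "bp": 118},
]


def listwise(records):
    """Keep only cases with no missing value on any variable."""
    return [r for r in records if all(v is not None for v in r.values())]


def pairwise(records, *variables):
    """Keep cases that are complete on the variables under analysis only."""
    return [r for r in records if all(r[v] is not None for v in variables)]


print(len(listwise(rows)))               # 2 complete cases
print(len(pairwise(rows, "age", "bp")))  # 3 cases usable for an age-by-bp analysis
```

Pairwise deletion retains more cases for each individual analysis, but different analyses on the same data set then rest on different subsets of participants.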

76 Imputation Techniques Imputation is the process of estimating missing data based on valid values of other variables or cases in the sample. The goal of imputation is to use known relationships that can be identified in the valid values of the sample to help estimate the missing data.

77 Types of Imputation Techniques Using prior knowledge. Inserting mean values. Using regression

78 Prior Knowledge Involves replacing a missing value with a value based on an educated guess. It is a reasonable method if the researcher has a good working knowledge of the research domain, the sample is large, and the number of missing values is small.

79 Mean Replacement For skewed distributions, median replacement is used instead. Involves calculating mean values from the available data on that variable and using them to replace missing values before analysis. It is a conservative procedure because the distribution mean as a whole does not change and the researcher does not have to guess at missing values.

80 Mean Replacement Advantages: Easily implemented and provides all cases with complete data. A compromise procedure is to insert a group mean for the missing values. Disadvantages: It invalidates the variance estimates derived from the standard variance formulas by understating the data's true variance. It distorts the actual distribution of values. It depresses the observed correlation that this variable will have with other variables because all missing data have a single constant value, thus reducing the variance.
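The effect on the mean and the variance can be seen in a short sketch. The scores are hypothetical, `None` marks a missing value, and the function name is illustrative only.

```python
import statistics


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


scores = [2, 4, None, 8, None, 6]   # hypothetical scores with two missing values
observed = [v for v in scores if v is not None]
imputed = impute_mean(scores)

print(statistics.mean(observed), statistics.mean(imputed))          # mean is unchanged
print(statistics.variance(observed), statistics.variance(imputed))  # variance shrinks
```

The mean stays at 5, but the sample variance drops because the imputed cases sit exactly at the mean and add nothing to the sum of squared deviations while still increasing the case count.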

81 Using Regression Involves using other variables in the dataset as independent variables to develop a regression equation for the variable with missing data serving as the dependent variable. Cases with complete data are used to generate the regression equation. The equation is then used to predict missing values for incomplete cases. More regressions are computed, using the predicted values from the previous regression to develop the next equation, until the predicted values from one step to the next are comparable. Predictions from the last regression are the ones used to replace missing values.

82 Using Regression Advantages: It is more objective than the researcher's guess but not as blind as simply using the overall mean. Disadvantages: It reinforces the relationships already in the data, resulting in less generalizability. The variance of the distribution is reduced because the estimate is probably too close to the mean. It assumes that the variable with missing data is correlated substantially with the other variables in the dataset. The regression procedure is not constrained in the estimates it makes.

83 Categorical Data Data that can be classified as belonging to a distinct number of categories. Binary: data that can be classified into one of 2 possible categories (yes/no, positive/negative). Ordinal: data that can be classified into categories that have a natural ordering (e.g., levels of pain: none, moderate, intense). Nominal: data that can be classified into more than 2 categories (e.g., race: Arab, African, and other).

84 Proportions Numbers by themselves may be misleading: they are on different scales and need to be reduced to a standard basis in order to compare them. We most frequently use proportions: that is, the fraction of items that satisfy some property, such as having a disease or being exposed to a dangerous chemical. "Proportions" are the same thing as fractions or percentages. In every case you need to know what you are taking a proportion of: that is, what is the DENOMINATOR in the proportion. p = x/n; percent = (x/n) x 100.

85 Proportions and Probabilities We often interpret proportions as probabilities. If the proportion with a disease is 1/10 then we also say that the probability of getting the disease is 1/10, or 1 in 10. Proportions are usually quoted for samples. Probabilities are almost always quoted for populations.

86 Workers Example [Table: cases and controls classified by smoking status and exposure] For the cases: proportion of exposure = 84/397 = 0.212, or 21.2%. For the controls: proportion of exposure = 45/315 = 0.143, or 14.3%.

87 Prevalence Disease prevalence = the proportion of people with a given disease at a given time. Disease prevalence = (number of diseased persons at a given time) / (total number of persons examined at that time). Prevalence is usually quoted as per 100,000 people, so the above proportion should be multiplied by 100,000.

88 Interpretation At time t: prevalence = cases (old + new) / total. Problem of exposure; consequently, not a comparable measurement. Old = duration of the disease. New = speed of the disease.

89 Screening Tests Through screening tests people are classified as healthy or as falling into one or more disease categories. These tests are not 100% accurate and therefore misclassification is unavoidable. There are 2 proportions that are used to evaluate these types of diagnostic procedures.

90 Sensitivity and Specificity Sensitivity and specificity are terms used to describe the effectiveness of screening tests. They describe how good a test is in two ways: avoiding false negatives and avoiding false positives. Sensitivity is the proportion of diseased who screen positive for the disease. Specificity is the proportion of healthy who screen healthy.

91 Sensitivity and Specificity Test positive: True Positive (TP) if the condition is present, False Positive (FP) if the condition is absent. Test negative: False Negative (FN) if the condition is present, True Negative (TN) if the condition is absent. Test sensitivity (Sn) is defined as the probability that the test is positive when given to a group of patients who have the disease. Sn = (TP/(TP+FN)) x 100. It can be viewed as 1 - the false negative rate. The specificity (Sp) of a screening test is defined as the probability that the test will be negative among patients who do not have the disease. Sp = (TN/(TN+FP)) x 100. It can be understood as 1 - the false positive rate.

92 Positive & Negative Predictive Values The positive predictive value (PPV) of a test is the probability that a patient who tested positive for the disease actually has the disease. PPV = (TP/(TP+FP)) x 100. The negative predictive value (NPV) of a test is the probability that a patient who tested negative for a disease will not have the disease. NPV = (TN/(TN+FN)) x 100.

93 The Efficiency The efficiency (EFF) of a test is the probability that the test result and the diagnosis agree. It is calculated as: EFF = ((TP+TN)/(TP+TN+FP+FN)) X 100
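The five screening measures defined on these slides (Sn, Sp, PPV, NPV, EFF) can be computed together from the four cell counts. This is a minimal sketch; the function name and the counts are hypothetical and are not taken from the cervical cancer example.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return the five screening measures as percentages.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "efficiency": 100 * (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical screening study of 1,000 people, 100 of whom have the disease.
metrics = screening_metrics(tp=90, fp=30, fn=10, tn=870)

for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

In this hypothetical study the test is 90% sensitive, yet only 75% of positive results are true positives, because the disease is uncommon and false positives from the large healthy group dilute the positives.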

94 Example A cytological test was undertaken to screen women for cervical cancer. Columns: Test Positive, Test Negative, Total. Actually Positive: 154 (TP), 225 (FP), 379. Actually Negative: 362 (FN), 23,362 (TN), 23,724. Column totals: 516 (TP+FN), 23,587 (FP+TN). Sensitivity = ? Specificity = ?

95 Relative Risk Relative risks are the ratio of risks for two different populations (ratio = a/b). Relative risk = (disease incidence in group 1) / (disease incidence in group 2). If the risk (or proportion) of having the outcome is 1/10 in one population and 2/10 in a second population, then the relative risk is: (2/10) / (1/10) = 2. A relative risk >1 indicates increased risk for the group in the numerator and a relative risk <1 indicates decreased risk for the group in the numerator.

96 Relative Risk Relative risk: the chance that a member of a group receiving some exposure will develop a disease relative to the chance that a member of an unexposed group will develop the same disease. RR = P(disease | exposed) / P(disease | unexposed). Recall: an RR of 1.0 indicates that the probabilities of disease in the exposed and unexposed groups are identical; an association between exposure and disease does not exist.

97 Odds Ratio The odds ratio (OR) quantifies how strongly the presence or absence of property A is associated with the presence or absence of property B in a given population. It is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. The odds ratio is the ratio of the odds of the outcome in the two groups. OR = 1: exposure does not affect odds of outcome. OR > 1: exposure associated with higher odds of outcome. OR < 1: exposure associated with lower odds of outcome.

98 When is it used? Odds ratios are used to compare the relative odds of the occurrence of the outcome of interest (disease or disorder), given exposure to the variable of interest (health characteristic). The odds ratio can also be used to determine whether a particular exposure is a risk factor for a particular outcome, and to compare the magnitude of various risk factors for that outcome. Odds ratio = (A/B) / (C/D) = AD/BC, where A and B are the numbers of exposed subjects with and without the outcome, and C and D are the numbers of unexposed subjects with and without the outcome.

99 Odds Ratio and Relative Risk Odds ratios are better to use in case-control studies (cases and controls are selected and level of exposure is determined retrospectively). Relative risks are better for cohort studies (exposed and unexposed subjects are chosen and are followed to determine disease status; prospective).

100 Odds Ratio and Relative Risk When we have a two-way classification of exposure and disease, we can approximate the relative risk by the odds ratio (the approximation is close when the disease is rare).

                 Disease Yes   Disease No   Total
Exposure Yes     A             B            A+B
Exposure No      C             D            C+D

Relative Risk = [A/(A+B)] / [C/(C+D)]
Odds Ratio = (A/B) / (C/D) = AD/BC
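The two formulas for the two-way table can be sketched as small Python functions. The function names and the counts below are hypothetical, chosen only to illustrate a rare outcome; they do not come from the slides.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [A/(A+B)] / [C/(C+D)] for the 2x2 exposure-by-disease table."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A/B) / (C/D) = AD/BC for the same table."""
    return (a * d) / (b * c)


# Hypothetical counts with a rare disease (1% of exposed, 0.5% of unexposed):
#                Disease Yes   Disease No
# Exposure Yes   A = 10        B = 990
# Exposure No    C = 5         D = 995
a, b, c, d = 10, 990, 5, 995

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"RR = {rr:.3f}, OR = {odds:.3f}")
```

With these made-up counts the relative risk is 2 and the odds ratio is about 2.01, so the two measures nearly coincide. When the outcome is common, B and D are no longer close to A+B and C+D, and the odds ratio drifts away from the relative risk.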

101 Case Control Study Example Disease: Pancreatic Cancer. Exposure: Cigarette Smoking.

                 Disease Yes   Disease No   Total
Exposure Yes     38            81           119
Exposure No      2             56           58

102 Example Continued Relative risk for exposed vs. non-exposed. Numerator: proportion of exposed people who have the disease. Denominator: proportion of non-exposed people who have the disease. Relative Risk = (38/119) / (2/58) ≈ 9.26

103 Example Continued Odds ratio for exposed vs. non-exposed. Numerator: ratio of diseased to non-diseased in the exposed group. Denominator: ratio of diseased to non-diseased in the non-exposed group. Odds Ratio = (38/81) / (2/56) = (38*56) / (2*81) ≈ 13.14
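The arithmetic in the pancreatic cancer example can be checked with exact fractions. The sketch below takes the counts from the expressions on the two preceding slides (38 and 81 in the exposed group, 2 and 56 in the non-exposed group); the variable names are illustrative only.

```python
from fractions import Fraction

# Counts read off the expressions (38/119)/(2/58) and (38/81)/(2/56).
exposed_diseased, exposed_healthy = 38, 81
unexposed_diseased, unexposed_healthy = 2, 56

exposed_total = exposed_diseased + exposed_healthy        # 119
unexposed_total = unexposed_diseased + unexposed_healthy  # 58

# Relative risk: proportion diseased among exposed / among non-exposed.
rr = Fraction(exposed_diseased, exposed_total) / Fraction(unexposed_diseased, unexposed_total)

# Odds ratio: odds of disease among exposed / among non-exposed.
odds_ratio = Fraction(exposed_diseased, exposed_healthy) / Fraction(unexposed_diseased, unexposed_healthy)

print(f"RR = {rr} ~ {float(rr):.2f}")
print(f"OR = {odds_ratio} ~ {float(odds_ratio):.2f}")
```

Using `Fraction` keeps the intermediate values exact, so the rounded figures (about 9.26 and 13.14) are not affected by floating-point error. The gap between the two values reflects that the outcome is not rare in this sample.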

104

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives Basic Statistics for the Healthcare Professional 1 F R A N K C O H E N, M B B, M P A D I R E C T O R O F A N A L Y T I C S D O C T O R S M A N A G E M E N T, LLC Purpose of Statistic 2 Provide a numerical

More information

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

More information

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.

More information

Establishing a framework for statistical analysis via the Generalized Linear Model

Establishing a framework for statistical analysis via the Generalized Linear Model PSY349: Lecture 1: INTRO & CORRELATION Establishing a framework for statistical analysis via the Generalized Linear Model GLM provides a unified framework that incorporates a number of statistical methods

More information

Basic Procedure for Histograms

Basic Procedure for Histograms Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that

More information

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2.

More information

Terms & Characteristics

Terms & Characteristics NORMAL CURVE Knowledge that a variable is distributed normally can be helpful in drawing inferences as to how frequently certain observations are likely to occur. NORMAL CURVE A Normal distribution: Distribution

More information

UNIT 4 NORMAL DISTRIBUTION: DEFINITION, CHARACTERISTICS AND PROPERTIES

UNIT 4 NORMAL DISTRIBUTION: DEFINITION, CHARACTERISTICS AND PROPERTIES f UNIT 4 NORMAL DISTRIBUTION: DEFINITION, CHARACTERISTICS AND PROPERTIES Normal Distribution: Definition, Characteristics and Properties Structure 4.1 Introduction 4.2 Objectives 4.3 Definitions of Probability

More information

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

More information

CHAPTER 2 Describing Data: Numerical

CHAPTER 2 Describing Data: Numerical CHAPTER Multiple-Choice Questions 1. A scatter plot can illustrate all of the following except: A) the median of each of the two variables B) the range of each of the two variables C) an indication of

More information

Simple Descriptive Statistics

Simple Descriptive Statistics Simple Descriptive Statistics These are ways to summarize a data set quickly and accurately The most common way of describing a variable distribution is in terms of two of its properties: Central tendency

More information

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION Subject Paper No and Title Module No and Title Paper No.2: QUANTITATIVE METHODS Module No.7: NORMAL DISTRIBUTION Module Tag PSY_P2_M 7 TABLE OF CONTENTS 1. Learning Outcomes 2. Introduction 3. Properties

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

Chapter 3. Numerical Descriptive Measures. Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1

Chapter 3. Numerical Descriptive Measures. Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1 Chapter 3 Numerical Descriptive Measures Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1 Objectives In this chapter, you learn to: Describe the properties of central tendency, variation, and

More information

2 Exploring Univariate Data

2 Exploring Univariate Data 2 Exploring Univariate Data A good picture is worth more than a thousand words! Having the data collected we examine them to get a feel for they main messages and any surprising features, before attempting

More information

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution PSY 464 Advanced Experimental Design Describing and Exploring Data The Normal Distribution 1 Overview/Outline Questions-problems? Exploring/Describing data Organizing/summarizing data Graphical presentations

More information

The Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012

The Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012 The Normal Distribution & Descriptive Statistics Kin 304W Week 2: Jan 15, 2012 1 Questionnaire Results I received 71 completed questionnaires. Thank you! Are you nervous about scientific writing? You re

More information

David Tenenbaum GEOG 090 UNC-CH Spring 2005

David Tenenbaum GEOG 090 UNC-CH Spring 2005 Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,

More information

Moments and Measures of Skewness and Kurtosis

Moments and Measures of Skewness and Kurtosis Moments and Measures of Skewness and Kurtosis Moments The term moment has been taken from physics. The term moment in statistical use is analogous to moments of forces in physics. In statistics the values

More information

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

More information

Frequency Distribution and Summary Statistics

Frequency Distribution and Summary Statistics Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Ani Manichaikul amanicha@jhsph.edu 16 April 2007 1 / 40 Course Information I Office hours For questions and help When? I ll announce this tomorrow

More information

Numerical Descriptions of Data

Numerical Descriptions of Data Numerical Descriptions of Data Measures of Center Mean x = x i n Excel: = average ( ) Weighted mean x = (x i w i ) w i x = data values x i = i th data value w i = weight of the i th data value Median =

More information

Categorical. A general name for non-numerical data; the data is separated into categories of some kind.

Categorical. A general name for non-numerical data; the data is separated into categories of some kind. Chapter 5 Categorical A general name for non-numerical data; the data is separated into categories of some kind. Nominal data Categorical data with no implied order. Eg. Eye colours, favourite TV show,

More information

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1)

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) Descriptive statistics are ways of summarizing large sets of quantitative (numerical) information. The best way to reduce a set of

More information

Measures of Central tendency

Measures of Central tendency Elementary Statistics Measures of Central tendency By Prof. Mirza Manzoor Ahmad In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central or typical value for a

More information

The Normal Distribution

The Normal Distribution Stat 6 Introduction to Business Statistics I Spring 009 Professor: Dr. Petrutza Caragea Section A Tuesdays and Thursdays 9:300:50 a.m. Chapter, Section.3 The Normal Distribution Density Curves So far we

More information

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 6 Normal Probability Distributions 6-1 Overview 6-2 The Standard Normal Distribution

More information

STAB22 section 1.3 and Chapter 1 exercises

STAB22 section 1.3 and Chapter 1 exercises STAB22 section 1.3 and Chapter 1 exercises 1.101 Go up and down two times the standard deviation from the mean. So 95% of scores will be between 572 (2)(51) = 470 and 572 + (2)(51) = 674. 1.102 Same idea

More information

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

More information

Lecture 9. Probability Distributions. Outline. Outline

Lecture 9. Probability Distributions. Outline. Outline Outline Lecture 9 Probability Distributions 6-1 Introduction 6- Probability Distributions 6-3 Mean, Variance, and Expectation 6-4 The Binomial Distribution Outline 7- Properties of the Normal Distribution

More information

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data Summarising Data Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Today we will consider Different types of data Appropriate ways to summarise these data 17/10/2017

More information

Lecture 9. Probability Distributions

Lecture 9. Probability Distributions Lecture 9 Probability Distributions Outline 6-1 Introduction 6-2 Probability Distributions 6-3 Mean, Variance, and Expectation 6-4 The Binomial Distribution Outline 7-2 Properties of the Normal Distribution

More information

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html Math 2311 Class

More information

Lecture Week 4 Inspecting Data: Distributions

Lecture Week 4 Inspecting Data: Distributions Lecture Week 4 Inspecting Data: Distributions Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit So next week No lecture & workgroups But Practice Test on-line (BB) Enter data for your

More information

MBEJ 1023 Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment

MBEJ 1023 Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment MBEJ 1023 Planning Analytical Methods Dr. Mehdi Moeinaddini Dept. of Urban & Regional Planning Faculty of Built Environment Contents What is statistics? Population and Sample Descriptive Statistics Inferential

More information

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics Graphical and Tabular Methods in Descriptive Statistics MATH 3342 Section 1.2 Descriptive Statistics n Graphs and Tables n Numerical Summaries Sections 1.3 and 1.4 1 Why graph data? n The amount of data

More information

Math 227 Elementary Statistics. Bluman 5 th edition

Math 227 Elementary Statistics. Bluman 5 th edition Math 227 Elementary Statistics Bluman 5 th edition CHAPTER 6 The Normal Distribution 2 Objectives Identify distributions as symmetrical or skewed. Identify the properties of the normal distribution. Find

More information

Descriptive Statistics

Descriptive Statistics Chapter 3 Descriptive Statistics Chapter 2 presented graphical techniques for organizing and displaying data. Even though such graphical techniques allow the researcher to make some general observations

More information

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line.

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line. Introduction We continue our study of descriptive statistics with measures of dispersion, such as dot plots, stem and leaf displays, quartiles, percentiles, and box plots. Dot plots, a stem-and-leaf display,

More information

M249 Diagnostic Quiz

M249 Diagnostic Quiz THE OPEN UNIVERSITY Faculty of Mathematics and Computing M249 Diagnostic Quiz Prepared by the Course Team [Press to begin] c 2005, 2006 The Open University Last Revision Date: May 19, 2006 Version 4.2

More information

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Quantitative Methods for Economics, Finance and Management (A86050 F86050) Quantitative Methods for Economics, Finance and Management (A86050 F86050) Matteo Manera matteo.manera@unimib.it Marzio Galeotti marzio.galeotti@unimi.it 1 This material is taken and adapted from Guy Judge

More information

chapter 2-3 Normal Positive Skewness Negative Skewness

chapter 2-3 Normal Positive Skewness Negative Skewness chapter 2-3 Testing Normality Introduction In the previous chapters we discussed a variety of descriptive statistics which assume that the data are normally distributed. This chapter focuses upon testing

More information

ECON 214 Elements of Statistics for Economists

ECON 214 Elements of Statistics for Economists ECON 214 Elements of Statistics for Economists Session 7 The Normal Distribution Part 1 Lecturer: Dr. Bernardin Senadza, Dept. of Economics Contact Information: bsenadza@ug.edu.gh College of Education

More information

Chapter 4. The Normal Distribution

Chapter 4. The Normal Distribution Chapter 4 The Normal Distribution 1 Chapter 4 Overview Introduction 4-1 Normal Distributions 4-2 Applications of the Normal Distribution 4-3 The Central Limit Theorem 4-4 The Normal Approximation to the

More information

starting on 5/1/1953 up until 2/1/2017.

starting on 5/1/1953 up until 2/1/2017. An Actuary s Guide to Financial Applications: Examples with EViews By William Bourgeois An actuary is a business professional who uses statistics to determine and analyze risks for companies. In this guide,

More information

Examples of continuous probability distributions: The normal and standard normal

Examples of continuous probability distributions: The normal and standard normal Examples of continuous probability distributions: The normal and standard normal The Normal Distribution f(x) Changing μ shifts the distribution left or right. Changing σ increases or decreases the spread.

More information

ECON 214 Elements of Statistics for Economists 2016/2017

ECON 214 Elements of Statistics for Economists 2016/2017 ECON 214 Elements of Statistics for Economists 2016/2017 Topic The Normal Distribution Lecturer: Dr. Bernardin Senadza, Dept. of Economics bsenadza@ug.edu.gh College of Education School of Continuing and

More information

What s Normal? Chapter 8. Hitting the Curve. In This Chapter

What s Normal? Chapter 8. Hitting the Curve. In This Chapter Chapter 8 What s Normal? In This Chapter Meet the normal distribution Standard deviations and the normal distribution Excel s normal distribution-related functions A main job of statisticians is to estimate

More information

Data Distributions and Normality

Data Distributions and Normality Data Distributions and Normality Definition (Non)Parametric Parametric statistics assume that data come from a normal distribution, and make inferences about parameters of that distribution. These statistical

More information

MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE. Dr. Bijaya Bhusan Nanda,

MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE. Dr. Bijaya Bhusan Nanda, MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE Dr. Bijaya Bhusan Nanda, CONTENTS What is measures of dispersion? Why measures of dispersion? How measures of dispersions are calculated? Range Quartile

More information

Lecture 6: Chapter 6

Lecture 6: Chapter 6 Lecture 6: Chapter 6 C C Moxley UAB Mathematics 3 October 16 6.1 Continuous Probability Distributions Last week, we discussed the binomial probability distribution, which was discrete. 6.1 Continuous Probability

More information

Expected Value of a Random Variable

Expected Value of a Random Variable Knowledge Article: Probability and Statistics Expected Value of a Random Variable Expected Value of a Discrete Random Variable You're familiar with a simple mean, or average, of a set. The mean value of

More information

STAT 113 Variability

STAT 113 Variability STAT 113 Variability Colin Reimer Dawson Oberlin College September 14, 2017 1 / 48 Outline Last Time: Shape and Center Variability Boxplots and the IQR Variance and Standard Deviaton Transformations 2

More information

Measures of Center. Mean. 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) Measure of Center. Notation. Mean

Measures of Center. Mean. 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) Measure of Center. Notation. Mean Measure of Center Measures of Center The value at the center or middle of a data set 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) 1 2 Mean Notation The measure of center obtained by adding the values

More information

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative STAT:10 Statistical Methods and Computing Normal Distributions Lecture 4 Feb. 6, 17 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowa.edu 1 2 Using density curves to describe the distribution of values of

More information

Summary of Statistical Analysis Tools EDAD 5630

Summary of Statistical Analysis Tools EDAD 5630 Summary of Statistical Analysis Tools EDAD 5630 Test Name Program Used Purpose Steps Main Uses/Applications in Schools Principal Component Analysis SPSS Measure Underlying Constructs Reliability SPSS Measure

More information

DESCRIPTIVE STATISTICS

DESCRIPTIVE STATISTICS DESCRIPTIVE STATISTICS INTRODUCTION Numbers and quantification offer us a very special language which enables us to express ourselves in exact terms. This language is called Mathematics. We will now learn

More information

Description of Data I

Description of Data I Description of Data I (Summary and Variability measures) Objectives: Able to understand how to summarize the data Able to understand how to measure the variability of the data Able to use and interpret

More information

3.1 Measures of Central Tendency

3.1 Measures of Central Tendency 3.1 Measures of Central Tendency n Summation Notation x i or x Sum observation on the variable that appears to the right of the summation symbol. Example 1 Suppose the variable x i is used to represent

More information

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need.

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. For exams (MD1, MD2, and Final): You may bring one 8.5 by 11 sheet of

More information

Data Analysis. BCF106 Fundamentals of Cost Analysis

Data Analysis. BCF106 Fundamentals of Cost Analysis Data Analysis BCF106 Fundamentals of Cost Analysis June 009 Chapter 5 Data Analysis 5.0 Introduction... 3 5.1 Terminology... 3 5. Measures of Central Tendency... 5 5.3 Measures of Dispersion... 7 5.4 Frequency

More information

The normal distribution is a theoretical model derived mathematically and not empirically.

The normal distribution is a theoretical model derived mathematically and not empirically. Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.

More information

Descriptive Statistics Bios 662

Descriptive Statistics Bios 662 Descriptive Statistics Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-08-19 08:51 BIOS 662 1 Descriptive Statistics Descriptive Statistics Types of variables

More information

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.)

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.) Starter Ch. 6: A z-score Analysis Starter Ch. 6 Your Statistics teacher has announced that the lower of your two tests will be dropped. You got a 90 on test 1 and an 85 on test 2. You re all set to drop

More information

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley.

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley. Appendix: Statistics in Action Part I Financial Time Series 1. These data show the effects of stock splits. If you investigate further, you ll find that most of these splits (such as in May 1970) are 3-for-1

More information

DATA HANDLING Five-Number Summary

DATA HANDLING Five-Number Summary DATA HANDLING Five-Number Summary The five-number summary consists of the minimum and maximum values, the median, and the upper and lower quartiles. The minimum and the maximum are the smallest and greatest

More information

2 DESCRIPTIVE STATISTICS

2 DESCRIPTIVE STATISTICS Chapter 2 Descriptive Statistics 47 2 DESCRIPTIVE STATISTICS Figure 2.1 When you have large amounts of data, you will need to organize it in a way that makes sense. These ballots from an election are rolled

More information

CH 5 Normal Probability Distributions Properties of the Normal Distribution

CH 5 Normal Probability Distributions Properties of the Normal Distribution Properties of the Normal Distribution Example A friend that is always late. Let X represent the amount of minutes that pass from the moment you are suppose to meet your friend until the moment your friend

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, 2013 Abstract Review summary statistics and measures of location. Discuss the placement exam as an exercise

More information

Data screening, transformations: MRC05

Data screening, transformations: MRC05 Dale Berger Data screening, transformations: MRC05 This is a demonstration of data screening and transformations for a regression analysis. Our interest is in predicting current salary from education level

More information

AP Statistics Chapter 6 - Random Variables

AP Statistics Chapter 6 - Random Variables AP Statistics Chapter 6 - Random 6.1 Discrete and Continuous Random Objective: Recognize and define discrete random variables, and construct a probability distribution table and a probability histogram

More information

DESCRIPTIVE STATISTICS II. Sorana D. Bolboacă

DESCRIPTIVE STATISTICS II. Sorana D. Bolboacă DESCRIPTIVE STATISTICS II Sorana D. Bolboacă OUTLINE Measures of centrality Measures of spread Measures of symmetry Measures of localization Mainly applied on quantitative variables 2 DESCRIPTIVE STATISTICS

More information

Edexcel past paper questions

Edexcel past paper questions Edexcel past paper questions Statistics 1 Chapters 2-4 (Discrete) Statistics 1 Chapters 2-4 (Discrete) Page 1 Stem and leaf diagram Stem-and-leaf diagrams are used to represent data in its original form.

More information

Continuous Probability Distributions

Continuous Probability Distributions 8.1 Continuous Probability Distributions Distributions like the binomial probability distribution and the hypergeometric distribution deal with discrete data. The possible values of the random variable

More information

MEASURES OF CENTRAL TENDENCY & VARIABILITY + NORMAL DISTRIBUTION

MEASURES OF CENTRAL TENDENCY & VARIABILITY + NORMAL DISTRIBUTION MEASURES OF CENTRAL TENDENCY & VARIABILITY + NORMAL DISTRIBUTION 1 Day 3 Summer 2017.07.31 DISTRIBUTION Symmetry Modality 单峰, 双峰 Skewness 正偏或负偏 Kurtosis 2 3 CHAPTER 4 Measures of Central Tendency 集中趋势

More information

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis Descriptive Statistics (Part 2) 4 Chapter Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. Chebyshev s Theorem

More information

Section Introduction to Normal Distributions

Section Introduction to Normal Distributions Section 6.1-6.2 Introduction to Normal Distributions 2012 Pearson Education, Inc. All rights reserved. 1 of 105 Section 6.1-6.2 Objectives Interpret graphs of normal probability distributions Find areas

More information

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form:

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form: 1 Exercise One Note that the data is not grouped! 1.1 Calculate the mean ROI Below you find the raw data in tabular form: Obs Data 1 18.5 2 18.6 3 17.4 4 12.2 5 19.7 6 5.6 7 7.7 8 9.8 9 19.9 10 9.9 11

More information

Chapter 6: The Normal Distribution

Chapter 6: The Normal Distribution Chapter 6: The Normal Distribution Diana Pell Section 6.1: Normal Distributions Note: Recall that a continuous variable can assume all values between any two given values of the variables. Many continuous

More information

A LEVEL MATHEMATICS ANSWERS AND MARKSCHEMES SUMMARY STATISTICS AND DIAGRAMS. 1. a) 45 B1 [1] b) 7 th value 37 M1 A1 [2]

A LEVEL MATHEMATICS ANSWERS AND MARKSCHEMES SUMMARY STATISTICS AND DIAGRAMS. 1. a) 45 B1 [1] b) 7 th value 37 M1 A1 [2] 1. a) 45 [1] b) 7 th value 37 [] n c) LQ : 4 = 3.5 4 th value so LQ = 5 3 n UQ : 4 = 9.75 10 th value so UQ = 45 IQR = 0 f.t. d) Median is closer to upper quartile Hence negative skew [] Page 1 . a) Orders

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam Name The bar graph shows the number of tickets sold each week by the garden club for their annual flower show. ) During which week was the most number of tickets sold? ) A) Week B) Week C) Week 5

More information

STAT Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model

STAT Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model STAT 203 - Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model In Chapter 5, we introduced a few measures of center and spread, and discussed how the mean and standard deviation are good

More information

Chapter 6: The Normal Distribution

Chapter 6: The Normal Distribution Chapter 6: The Normal Distribution Diana Pell Section 6.1: Normal Distributions Note: Recall that a continuous variable can assume all values between any two given values of the variables. Many continuous

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc COUNSELLING PSYCHOLOGY (2011 Admission Onwards) II Semester Complementary Course PSYCHOLOGICAL STATISTICS QUESTION BANK 1. The process of grouping

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION In Inferential Statistic, ESTIMATION (i) (ii) is called the True Population Mean and is called the True Population Proportion. You must also remember that are not the only population parameters. There

More information

NOTES: Chapter 4 Describing Data

NOTES: Chapter 4 Describing Data NOTES: Chapter 4 Describing Data Intro to Statistics COLYER Spring 2017 Student Name: Page 2 Section 4.1 ~ What is Average? Objective: In this section you will understand the difference between the three

More information
