Ordinal Predicted Variable


1 Ordinal Predicted Variable Tim Frasier Copyright Tim Frasier. This work is licensed under the Creative Commons Attribution 4.0 International license.

2 Goals and General Idea

3 Goals When would we use this type of analysis? When the predicted variable is ordinal! Places in a race (1st, 2nd, 3rd, etc.) Surveys on a Likert scale (5 = strongly agree, 4 = agree, 3 = neutral, 2 = disagree, 1 = strongly disagree) Scaled responses (good, mediocre, bad) etc.

4 Characteristics Ordinal data are kind of a pain to deal with: we know the order of the levels, but they are not necessarily equally spaced. How much do you like fish (1 = hate to 5 = love)? It may be harder to go from 1 to 2 than from 4 to 5. As predictor variables increase, responses should step sequentially through the predicted values. How can we ensure this happens?

5 Characteristics Suppose ordinal data with 7 levels. There will be cut-off points (thresholds) between levels, indicating where it switches from one level to another (indicated here as θs). If there are k levels, there will be k - 1 of these thresholds. From Kruschke (2015) p. 673

6 Characteristics How do we get probabilities for each level? Cumulative distribution From Kruschke (2015) p. 673

7 Characteristics Now values range from 0 to 1. The probability for each level is the cumulative area up to the threshold just above that level, minus the cumulative area up to the threshold just below that level. Call each threshold point an α value. [Figure: cumulative proportion vs. response, with thresholds α1 through α6]

8 Characteristics For the first category, the probability is the cumulative probability up to its threshold, minus zero, considering the mean and sd of the underlying distribution. The cumulative normal distribution in JAGS is pnorm.

9 Characteristics For the second category, the probability is the cumulative probability up to its threshold, minus that for the first category.

10 Characteristics For the third category, the probability is the cumulative probability up to its threshold, minus that for the second category.

11 Characteristics For the fourth category, the probability is the cumulative probability up to its threshold, minus that for the third category.

12 Characteristics For the fifth category, the probability is the cumulative probability up to its threshold, minus that for the fourth category.

13 Characteristics For the sixth category, the probability is the cumulative probability up to its threshold, minus that for the fifth category.

14 Characteristics For the seventh category, the probability is one minus the cumulative probability up to the sixth threshold.
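
To make this concrete, here is a minimal R sketch (mine, not the slides'; the threshold, mean, and sd values are made up) that computes the seven category probabilities from six thresholds using pnorm:

alpha <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5)     # six thresholds for seven levels (assumed values)
mu    <- 4                                   # mean of the underlying normal (assumed)
sigma <- 1.5                                 # sd of the underlying normal (assumed)
cum   <- pnorm(alpha, mean = mu, sd = sigma) # cumulative area up to each threshold
p     <- diff(c(0, cum, 1))                  # successive differences give the 7 probabilities
sum(p)                                       # the seven probabilities sum to 1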

15 Characteristics One problem: our α values are only relative to one another, and have no absolute position as is. We could add any constant to the raw (i.e., non-cumulative) values and recover the same estimates. Like sliding our distribution up and down the x-axis, our α estimates would remain the same. This causes real problems for the MCMC process (any value is reasonable!)
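
A quick R check of this problem (my own illustration, with made-up numbers): shifting the thresholds and the mean by the same constant leaves every category probability unchanged, so the data alone cannot pin down their absolute position.

alpha <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5)   # assumed thresholds
mu <- 4; sigma <- 1.5; shift <- 100
p1 <- diff(c(0, pnorm(alpha, mu, sigma), 1))
p2 <- diff(c(0, pnorm(alpha + shift, mu + shift, sigma), 1))
all.equal(p1, p2)   # TRUE: the likelihood is identical after the shift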

16 Characteristics One solution: pin down the distribution by specifying the two extreme α values, and estimate the rest relative to these (which is all that matters). We will specify this in the data list.

17 Characteristics The mean (μ) of this distribution is the result of the additive effect of our predictor variables. Our standard equation for the effects of the predictor variables goes into this μ.

18 Characteristics The observed distribution does not have to look normal for the underlying normal distribution to be appropriate. All of the following histograms were generated by a normal distribution. From Kruschke (2015) p. 673
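
As an illustration (a sketch of my own, not the slides' code): thresholding a latent normal variable can produce ordinal histograms that look strongly skewed, simply because of where the mean sits relative to the thresholds.

set.seed(1)
alpha  <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5)   # assumed thresholds
latent <- rnorm(1000, mean = 6, sd = 1.5)   # latent normal scores
y.sim  <- findInterval(latent, alpha) + 1   # ordinal levels 1 through 7
barplot(table(factor(y.sim, levels = 1:7)), xlab = "response", ylab = "Frequency")
# The mean sits near the top thresholds, so responses pile up in the
# highest categories and the histogram looks nothing like a bell curve.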

19 Characteristics What we are estimating: 1. The α values for all but the first and last thresholds 2. The mean (μ) of the underlying distribution (based on the additive effect of the predictor variables) 3. The standard deviation (σ) of the underlying distribution 4. Other appropriate distribution parameters if not using the normal distribution

20 The Data

21 Data Fake data generated from code in Kruschke (2011)
ord <- read.table("ordinaldata.csv", header = TRUE, sep = ",")

22 Data The y column is the ordinal predicted variable.

23 Data The x1 and x2 columns are two metric predictor variables.

24 Data Can use the pairs function to plot the data, and get some idea of potential patterns (keeping in mind the issue of interactions).
pairs(ord, pch = 16, col = rgb(0, 0, 1, 0.3))

25 Data Exploration [Figure: pairs plot of Y, X1, and X2]

26 Data Exploration Looks like a negative relationship between X1 & Y, and a positive relationship between X2 & Y.

27 Data Exploration Use the table function to get frequencies for each ordinal response.
ytable <- table(ord$y)
ytable

28 Data Exploration Make it into a data frame and format it properly.
ytable.df <- as.data.frame(ytable)
ytable.df[, 1] <- as.numeric(as.character(ytable.df[, 1]))

29 Data Exploration Plot the data.
plot(ytable.df[, 1], ytable.df[, 2], type = "h", ylab = "Frequency",
     xlab = "response", lwd = 4, col = rgb(0, 0, 1, 0.5))
[Figure: frequency of each response level]

30 Data Exploration Can also convert this to the cumulative distribution of your data, if you want to.
# Get proportions
pr_y <- ytable / nrow(ord)
# Get cumulative proportions
cum_pr_y <- cumsum(pr_y)

31 Data Exploration cumsum is an R function that calculates the cumulative sums of a vector.

32 Data Exploration
# Plot
plot(ytable.df[, 1], cum_pr_y, type = "b", lwd = 2,
     ylab = "cumulative proportion", xlab = "response",
     ylim = c(0, 1), col = "blue")
[Figure: cumulative proportion vs. response]

33 Frequentist Approach

34 Frequentist Approach I don't know. The polr function from the MASS package seems to be an option, but I couldn't get it to work (with limited time).
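
For reference, a minimal attempt with polr might look like the sketch below (my own, untested on these data). One common stumbling block is that polr requires the response to be a factor (ideally an ordered one):

library(MASS)
ord$y.f <- factor(ord$y, ordered = TRUE)  # polr needs a factor response
fit <- polr(y.f ~ x1 + x2, data = ord, method = "probit", Hess = TRUE)
summary(fit)   # coefficients for x1 and x2
fit$zeta       # the estimated thresholds (cut-points)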

35 Bayesian Approach

36 Load Libraries & Functions
library(runjags)
library(coda)
source("plotpost.r")

37 Organize the Data
y <- ord$y
N <- length(y)
nLevels <- length(unique(y))
x1 <- ord$x1
x2 <- ord$x2

38 Organize the Data Making a variable with the number of response levels will make your code more generic.

39 Standardize the Metric Variables
# x1
x1mean <- mean(x1)
x1sd <- sd(x1)
zx1 <- (x1 - x1mean) / x1sd

# x2
x2mean <- mean(x2)
x2sd <- sd(x2)
zx2 <- (x2 - x2mean) / x2sd

40 Create a List For Alpha Values
#--- Create a list for anchored alpha values ---#
# with beginning and ending values, but the     #
# rest will be filled in by the MCMC process    #
# (have them as "NA" for now).                  #
alpha <- rep(NA, nLevels - 1)
alpha[1] <- 1                   # Set first value (anchor; 1 assumed here)
alpha[nLevels - 1] <- nLevels   # Set last value

41 Make Data List For JAGS
datalist = list(
  y = y,
  nLevels = nLevels,
  N = N,
  x1 = zx1,
  x2 = zx2,
  alpha = alpha
)

42 Define the Model Doesn't lend itself well to a diagram. We'll just walk through the code.

43 Define the Model Note that I have rearranged things from how I did it before, to try to add clarity. We'll walk through it.
modelstring = "
model {

  #--- The likelihood ---#
  for (i in 1:N) {

    # The standard part of our equation
    mu[i] <- b0 + (b1 * x1[i]) + (b2 * x2[i])

    # Probability of each value being in the first category
    # (pnorm in JAGS takes a precision, hence 1 / sigma^2)
    p[i, 1] <- pnorm(alpha[1], mu[i], 1 / sigma^2)

    # Probability of each value being in the categories between the lowest and the highest
    for (j in 2:(nLevels - 1)) {
      p[i, j] <- max(0, pnorm(alpha[j], mu[i], 1 / sigma^2) - pnorm(alpha[j - 1], mu[i], 1 / sigma^2))
    }

    # Probability of each value being in the highest category
    p[i, nLevels] <- 1 - pnorm(alpha[nLevels - 1], mu[i], 1 / sigma^2)

    # Now, fit the y data to a categorical distribution
    # with the characteristics we just calculated
    y[i] ~ dcat(p[i, 1:nLevels])
  }
...

44 Define the Model The likelihood loop is the black box into which we can put any equations that we have dealt with before (or more)...

45 Define the Model ...which here describe the mean of the normal distribution describing the data.

46 Define the Model The probability of each data point being in the first category is the cumulative probability up to the first threshold, based on the mean and sd of the underlying distribution.

47 Define the Model The probability of each data point being in each of the middle categories is the cumulative probability up to the top threshold for the given category, minus the cumulative probability up to the bottom threshold for the given category, based on the mean and sd of the underlying distribution.

48 Define the Model The max(0, ...) is just a safety net: if the difference calculated to its right is less than zero, zero will be used. It is included because probabilities can't be less than zero.

49 Define the Model The probability of each data point being in the last category is one minus the cumulative probability up to the lower threshold for that category, based on the mean and sd of the underlying distribution.

50 Define the Model What we've created in the last few lines is a probability matrix for each data point being in each of our response categories...

51 Define the Model ...and these are used to describe the categorical distribution that is ultimately fit to the observed response variables.

52 Define the Model
...
  #--- The Priors ---#

  # intercept and effect coefficients
  b0 ~ dnorm((1 + nLevels) / 2, 1 / nLevels^2)
  b1 ~ dnorm(0, 1 / nLevels^2)
  b2 ~ dnorm(0, 1 / nLevels^2)

  # Sigma
  sigma ~ dunif(nLevels / 1000, nLevels * 10)

  # Intermediate alpha values (we set the min and max
  # values in our initial data list)
  for (j in 2:(nLevels - 2)) {
    alpha[j] ~ dnorm(j + 0.5, 1 / 2^2)
  }
}
"
writeLines(modelstring, con = "model.txt")

53 Define the Model Our mean value for b0 should be about in the centre of our categories, and the sd is the number of categories (it can't be beyond this).

54 Define the Model The same logic applies to the sds in the b1 and b2 priors.

55 Define the Model Our prior for sigma comes from a uniform distribution with a minimum value of our number of levels divided by 1000, and a maximum value of the number of levels times 10.

56 Define the Model We only need alpha priors for the middle values because the outer ones were specified. These come from a normal distribution with a mean of that level's value plus 0.5, and an sd of 2. Note that what you choose here should be based on what values you used to specify the outer alphas.

57 Specify Initial Values
initslist <- function() {
  list(
    b0 = rnorm(n = 1, mean = (1 + nLevels) / 2, sd = nLevels),
    b1 = rnorm(n = 1, mean = 0, sd = nLevels),
    b2 = rnorm(n = 1, mean = 0, sd = nLevels),
    sigma = runif(n = 1, min = nLevels / 1000, max = nLevels * 10)
  )
}

58 Specify MCMC Parameters and Run
runjagsout <- run.jags(
  method = "simple",
  model = "model.txt",
  monitor = c("b0", "b1", "b2", "sigma", "alpha"),
  data = datalist,
  inits = initslist,
  n.chains = 3,
  adapt = 500,
  burnin = 1000,
  sample = 20000,
  thin = 1,
  summarise = TRUE,
  plots = FALSE)

59 Specify MCMC Parameters and Run Note that there is a lot going on in this model. As a result, it takes substantially longer than our other ones. This one takes about 10 min on my computer, and my computer is fairly fast.

60 Next Steps (On Your Own) Retrieve the data and take a peek at the structure Test model performance Extract & parse results Convert back to original scale
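
As a starting point for those steps, here is a sketch (my own code) of how the pieces used in the following slides might be pulled out of the runjags object; the variable names are chosen to match the later plotting and posterior-predictive code:

# Combine the chains into one matrix (rows = steps, columns = parameters)
mcmcChain <- as.matrix(as.mcmc.list(runjagsout))
b0 <- mcmcChain[, "b0"]
b1 <- mcmcChain[, "b1"]
b2 <- mcmcChain[, "b2"]
sigma <- mcmcChain[, "sigma"]
alpha <- mcmcChain[, grep("^alpha", colnames(mcmcChain))]  # one column per threshold
chainLength <- nrow(mcmcChain)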

61 View Posteriors

62 Plotting Posterior Distributions: β0
par(mfrow = c(1, 1))
histinfo = plotpost(b0, xlab = bquote(beta[0]))
[Figure: posterior of β0, with mean and 95% HDI]

63 Plotting Posterior Distributions: β1 & β2
par(mfrow = c(1, 2))
histinfo = plotpost(b1, xlab = bquote(beta[1]), main = "x1")
histinfo = plotpost(b2, xlab = bquote(beta[2]), main = "x2")
[Figure: posteriors of β1 and β2, with means and 95% HDIs]

64 Plotting Posterior Distributions: β1 & β2 x1 has a credible negative effect, and x2 has a credible positive effect. These are on a real scale. No need to transform them for interpretation.
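
Slide 60 left "convert back to original scale" as an exercise; since x1 and x2 were standardized before fitting, the algebra for undoing that (a sketch of my own) is:

# mu = b0 + b1 * (x1 - x1mean) / x1sd + b2 * (x2 - x2mean) / x2sd, so:
b1.orig <- b1 / x1sd
b2.orig <- b2 / x2sd
b0.orig <- b0 - (b1 * x1mean / x1sd) - (b2 * x2mean / x2sd)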

65 Posterior Predictive Check

66 Posterior Predictive Check Code is clunky and slow, but should make sense (and work!). Takes about 10 minutes on my computer, so be patient! Code predicts for the entire data set (N = 200), but could use a subset. For each step in the chain, count the number of individuals assigned to each level (given the predictor variables). Compare the mean and HDIs of these predictions relative to the true values.

67 Posterior Predictive Check
source("hdiofmcmc.r")

# Create a matrix to hold results
ypostpred <- matrix(0, nrow = chainLength, ncol = nLevels)

# For each step in the chain...
for (i in 1:chainLength) {

  # Initialize holders (counters for each level)
  counter1 <- 0
  counter2 <- 0
  counter3 <- 0
  counter4 <- 0
  counter5 <- 0
  counter6 <- 0
  counter7 <- 0
...

68 Posterior Predictive Check Start a loop that will go through every step in the chain.

69 Posterior Predictive Check For each response level, initialize a counter that will keep track of how many results were assigned to that level (re-zeroed for each step in the chain).

70 Posterior Predictive Check Then (for each step in the chain), for each individual (data point), calculate the mean, and then the probability of it being in each response level, using equations we have seen before.
...
  # For each individual...
  for (j in 1:N) {

    # Calculate the mean (using the standardized predictors, to match the model)
    mu <- b0[i] + (b1[i] * zx1[j]) + (b2[i] * zx2[j])

    # Calculate the probability of being in each level
    # (pnorm in R takes an sd, unlike in JAGS)
    levelprobs <- rep(0, times = nLevels)
    levelprobs[1] <- pnorm(alpha[i, 1], mu, sigma[i])
    levelprobs[2] <- pnorm(alpha[i, 2], mu, sigma[i]) - pnorm(alpha[i, 1], mu, sigma[i])
    levelprobs[3] <- pnorm(alpha[i, 3], mu, sigma[i]) - pnorm(alpha[i, 2], mu, sigma[i])
    levelprobs[4] <- pnorm(alpha[i, 4], mu, sigma[i]) - pnorm(alpha[i, 3], mu, sigma[i])
    levelprobs[5] <- pnorm(alpha[i, 5], mu, sigma[i]) - pnorm(alpha[i, 4], mu, sigma[i])
    levelprobs[6] <- pnorm(alpha[i, 6], mu, sigma[i]) - pnorm(alpha[i, 5], mu, sigma[i])
    levelprobs[7] <- 1 - pnorm(alpha[i, 6], mu, sigma[i])
...

71
    # Find item number for highest value
    levelid <- which.max(levelprobs)

    # Increase counter for appropriate group
    if (levelid == 1) {
      counter1 <- counter1 + 1
    } else if (levelid == 2) {
      counter2 <- counter2 + 1
    } else if (levelid == 3) {
      counter3 <- counter3 + 1
    } else if (levelid == 4) {
      counter4 <- counter4 + 1
    } else if (levelid == 5) {
      counter5 <- counter5 + 1
    } else if (levelid == 6) {
      counter6 <- counter6 + 1
    } else {
      counter7 <- counter7 + 1
    }
  }

72 Posterior Predictive Check Identify for which response level the individual has the highest probability.

73 Posterior Predictive Check Increment the appropriate counter. For each step in the chain, these counters will be the number of individuals predicted to be in each response level.

74 Posterior Predictive Check
...
  # Place results in results matrix
  ypostpred[i, 1] <- counter1
  ypostpred[i, 2] <- counter2
  ypostpred[i, 3] <- counter3
  ypostpred[i, 4] <- counter4
  ypostpred[i, 5] <- counter5
  ypostpred[i, 6] <- counter6
  ypostpred[i, 7] <- counter7
}

ypredmeans = apply(ypostpred, 2, median, na.rm = TRUE)
ypredhdi = apply(ypostpred, 2, HDIofMCMC)

75 Posterior Predictive Check Fill the appropriate row (step in the chain) of the ypostpred matrix with the counts for each response level. ypostpred will have one row for each step in the chain, indicating how many individuals were assigned to each response level (columns).

76 Posterior Predictive Check Calculate the median and HDI for each response level, across all steps in the chain.

77 Posterior Predictive Check
# Plot original data
hist(y, breaks = c(0.5, (1:nLevels + 0.5)), main = "",
     col = "skyblue", border = "white")

# Add predicted means
points(x = 1:nLevels, y = ypredmeans, pch = 16)

# Add HDI bars
segments(x0 = 1:nLevels, y0 = ypredhdi[1, ], x1 = 1:nLevels, y1 = ypredhdi[2, ], lwd = 2)
[Figure: histogram of observed y with predicted medians and HDI bars overlaid]

78 Questions?

79 Creative Commons License Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.
