Final Examination Project
Biostatistics 581, Winter 2009
William Meurer, M.D.

Introduction: The NINDS tPA stroke study was published in 1995. tPA remains the only FDA approved medication for the treatment of acute stroke. The use of this drug has remained controversial despite proven benefit, as outcomes (with respect to level of disability) were improved at 90 days. Benefit (in terms of improvement based on neurological exam) was not established in the short term, as the primary outcome of part I of the study was a 4 point or more improvement in the National Institutes of Health Stroke Scale (NIHSS) at 24 hours. As there is significant biological plausibility that improvement at 24 hours is predictive of ultimate outcome, it would be useful to develop a model that could predict with confidence the final degree of improvement based on changes within the first 24 hours, as this might allow future acute stroke trials to be expedited.

The Dataset: A completely de-identified data set is available from the federal government with the data from all 624 patients enrolled in the trial. Of interest for this evaluation are the serial measurements of the NIHSS score (measured at baseline or prior to treatment, 2 hours, 24 hours and 90 days). Data on level of disability at 90 days is also described using the modified Rankin Scale (mRS), which ranges from 0 (normal), 1 (no significant disability), 2 (some disability) to 6 (death).

The proposed data analysis: I proposed the construction of 2 separate models:

1. Outcome: NIHSS at 90 days
   Predictors: NIHSS at baseline, 2hr, 24 hours (slope), treatment (yes/no), age, systolic blood pressure, glucose, slope*treatment

2. Outcome: mRS at 90 days
   Predictors: NIHSS at baseline, 2hr, 24 hours (slope), treatment (yes/no), age, systolic blood pressure, glucose, slope*treatment

Appropriate exploratory analysis, including profile plots and transformation of variables (e.g. log(nihss)) as appropriate, will occur.

Raghu's comments: This looks good. You might also want to study whether the longitudinal pattern of NIHSS differs by mRS, by using NIHSS as the outcome and mRS as a between-subject factor, as it is measured only once at 90 days.

Results: Please see the appendix for SAS code.

Based on the above objectives, I generated exploratory profile plots. First, in Figure 1, I plotted all subjects by treatment group (tPA is blue dots connected by blue lines and placebo is open red boxes connected by red lines). Overall trend lines for each group (tPA versus placebo) were plotted. Given that the time points were 0, 2, 24, 168 and 2160 hours, I felt that plotting time on the log base 10 scale would represent the data well. (Each of the time points was shifted ahead 1 hour, since log(0) is undefined whereas log(1) = 0.) Visually comparing the trend lines suggests that the majority of the separation between the tPA and control groups occurs between baseline and two hours. This makes biological sense as well, since re-canalization of an occluded artery would be likely to lead to such rapid improvement.

In Figure 2, the trend lines were plotted to examine trends in the change in NIHSS based on treatment group assignment and ultimate outcome (which was defined as good if the mRS was 0 or 1). From this, one observes that in subjects who ultimately do not have a good outcome (mRS 2-6) there is very little difference between treatment groups in the trajectory of NIHSS over time. There is separation between the groups who ultimately have a good outcome. The tPA treated group has more rapid improvement, and again this separation appears to occur in the first two hours.

In Figure 3, I examined the overall trends in NIHSS over time depending on the ultimate outcome, considered over the entire mRS range. (SAS would not add a legend. The fitted line closest to the bottom of the graph represents a 3 month mRS of 0 and the top line represents a 3 month mRS of 6.) The best ultimate outcomes appear to be associated with the sharpest improvements.

Based on these findings, I assessed the ability of the slope of NIHSS change (representing decrease in NIHSS points per hour) to predict outcome, using the slope at 2 hours and at 24 hours.
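The slope definitions can be read off the appendix code (hr2, hr22 and slope). The following is a minimal Python sketch of the same arithmetic, not the author's SAS; the function name and the example patient are hypothetical.

```python
def nihss_slopes(baseline, nih_2h, nih_24h):
    """Return (hr2, hr22, slope) in NIHSS points per hour.

    Mirrors the appendix definitions: a positive value means the score
    dropped, i.e. the patient improved.
    """
    hr2 = (baseline - nih_2h) / 2    # change per hour over hours 0 to 2
    hr22 = (nih_2h - nih_24h) / 22   # change per hour over hours 2 to 24
    slope = hr2 + hr22               # the appendix sums the two rates
    return hr2, hr22, slope


# Hypothetical patient: NIHSS 14 at baseline, 10 at 2 hours, 8 at 24 hours.
hr2, hr22, slope = nihss_slopes(14, 10, 8)
print(hr2, round(hr22, 3), round(slope, 3))
```

Because the first change is divided by 2 hours and the second by 22 hours, a given point change counts for much more in the first interval than in the second.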
Interestingly, the majority of the slope was contributed by the change in the first 2 hours. I examined whether the mean slope (2 hr) varied between treatment and placebo groups. The mean slope for tPA treated patients was 1.24 (95% CI 0.91-1.56) and for placebo was 0.74 (95% CI 0.51-0.96). There was a significant difference (t = 2.47, p = 0.014).

Since 2hr slope appeared to represent a potential variable of interest, I built a logistic regression model with the outcome of dichotomous mRS (0-1 = good outcome, 2-6 = bad outcome).

Odds Ratio Estimates

Effect          Point Estimate   95% Wald Confidence Limits
2hr Slope       1.252            1.151   1.361
AGE             0.978            0.963   0.993
Serum Glucose   0.999            0.996   1.001
SBP             0.997            0.988   1.005
tpa treated     1.988            1.387   2.848

This demonstrated that an increase in the 2 hr slope (and thus a larger decrease in NIHSS at 2 hours from pre-treatment) was highly predictive of 3 month outcome. The results were quite similar when ordinal regression was used (and the full 0 to 6 range of mRS was employed as the outcome).
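To help read the odds ratio table, here is a small Python sketch (not part of the original analysis) that recovers the underlying logistic coefficient and its approximate standard error from a reported odds ratio and its 95% Wald limits, and scales an odds ratio to a larger change in the predictor. The function names are hypothetical, and the standard error is only approximate because the published values are rounded.

```python
import math


def log_odds_from_or(point, lower, upper, z=1.959964):
    """Recover the coefficient and its standard error from an odds ratio
    and its 95% Wald limits, which are symmetric on the log scale."""
    beta = math.log(point)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def odds_ratio_for_change(point, units):
    """Odds ratio implied for a change of `units` in the predictor."""
    return point ** units


# 2hr Slope row of the table: 1.252 (1.151, 1.361).
beta, se = log_odds_from_or(1.252, 1.151, 1.361)
print(round(beta, 3), round(se, 3))

# Odds ratio for a 2 hr slope that is 2 points per hour larger.
print(round(odds_ratio_for_change(1.252, 2), 3))
```

Under the fitted model, an odds ratio for a k-unit change is the per-unit odds ratio raised to the power k, which is why the second function is a simple exponentiation.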

The distribution of 2 hr slope within each treatment group was approximately normal and is depicted in the included histogram. To further describe the distribution of NIHSS at each time point by treatment group, boxplots were constructed (Figure 4). The distributions in the tPA group appear to be shifted downward (meaning less neurological deficit) for all time points. Figure 5 demonstrates that the overall mRS outcome is better in the tPA group compared to placebo.

I examined whether quartile of 2hr slope was predictive of 3 month outcome. (The median of 2 hr slope was 0.5 with an IQR of 0 to 2. The lowest quartile therefore consisted of negative values of 2 hr slope and thus represented neurological deterioration from baseline.) tPA treatment was considered as a covariate in this logistic regression model of the dichotomous mRS outcome. (The 4th quartile represents the most improvement.)

Odds Ratio Estimates

Effect                          Point Estimate   95% Wald Confidence Limits
2hr slope quartile 2 vs 1       1.149            0.625   2.110
2hr slope quartile 3 vs 1       1.647            0.996   2.724
2hr slope quartile 4 vs 1       3.829            2.326   6.301
tpa treated 1 vs 0              1.856            1.309   2.632

Not surprisingly, the subjects in the 4th quartile had significantly higher odds of a good outcome than those in the lowest quartile. The raw number distributions of 3 month outcome on mRS by 2hr slope quartile are given in Figure 6.

I built a random effects model to predict NIHSS using PROC MIXED, including potentially important covariates.

Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
hour     1        2487     0.70      0.4045
hr2      1        617      52.26     <.0001
tpa      1        617      6.56      0.0106
AGE      1        617      4.57      0.0329
LGLU     1        617      5.97      0.0148

This appears to indicate that the effect of time on NIHSS was not significant, although the 2 hour slope was a significant predictor. As per your suggestion, I added 3 month mRS to the model in an attempt to account for differences in the trajectory of NIHSS between ultimate mRS levels.
Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
hour     1        2487     0.70      0.4045
hr2      1        616      21.60     <.0001
tpa      1        616      0.25      0.6153
AGE      1        616      0.98      0.3229
LGLU     1        616      1.90      0.1682
rank3m   1        616      1544.79   <.0001

It is well known (and would be expected) that the NIHSS is closely related to the mRS at 90 days, and this covariate has a stronger association with NIHSS than the others. However, the appeal of using the 2 hour slope to predict 3 month outcome lies in the design of future stroke trials: at the time of treatment, one would not know with any certainty what the mRS will be 3 months later.
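The quartile coding of the 2 hr slope used in the earlier logistic model follows the cut points in the appendix (0, 0.5 and 2). A minimal Python sketch of that coding is given below; the function name is hypothetical, and returning None for a missing slope is an assumption of this sketch rather than something the report states.

```python
import math


def hr2_quartile(hr2):
    """Assign a 2 hr slope to quartile 1-4 using the appendix cut points.

    Missing values (None or NaN) are returned as None; this handling is
    an assumption of the sketch, not taken from the report.
    """
    if hr2 is None or math.isnan(hr2):
        return None
    if hr2 < 0:
        return 1   # deterioration from baseline
    if hr2 < 0.5:
        return 2
    if hr2 < 2:
        return 3
    return 4       # most improvement


for value in (-0.5, 0, 0.5, 2):
    print(value, hr2_quartile(value))
```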

I constructed a similar model in PROC MIXED, but restricted it to the first two time points (0 and 2 hours). I did this based on what I observed in the profile plots.

Type 3 Tests of Fixed Effects

Effect     Num DF   Den DF   F Value   Pr > F
hour       1        604      25.47     <.0001
hr2        1        600      27.03     <.0001
tpa        1        600      0.90      0.3429
AGE        1        600      6.79      0.0094
LGLU       1        600      3.11      0.0783
CSYS       1        600      4.27      0.0391
hour*tpa   1        604      5.21      0.0228

Substantiating what was observed in the plots, there is a significant impact of change over time. The treatment by time interaction is significant, but the tPA treatment main effect is not (likely owing to a lack of difference between the tPA and placebo groups at baseline, implying that randomization worked adequately).

Appendix: SAS Code

libname nind '.';
Options FMTSEARCH=(nind.formats);

data nind;
  set nind.ninds_updated_all;
run;

proc format;
  value newrank_f
    0='0-No symptoms'
    1='1-No significant disability'
    2='2-Slight disability'
    3='3-Moderate disability'
    4='4-Moderately severe disability'
    5='5-Severe disability'
    6='6-Dead';
run;

data nind;
  set nind;
  format rank3m newrank_f.;
run;

proc contents;
run;

data nind_shorter;
  set nind;
  keep record baseline nihhr2 nihhr24 nihday710 nihm3;
run;

data nind; *calculate slope of treatment response;
  set nind;
  hr2 = (baseline - nihhr2) / 2; *slope at 2 hours;
  hr22 = (nihhr2 - nihhr24) / 22; *slope from 2-24 hours;
  slope = hr2 + hr22; *total slope for first 24 hours - units NIHSS point per hour;
run;

ods pdf file = '.\histogram.pdf';
proc univariate data=nind noprint;
  class treatcd;
  histogram hr2 / midpoints =-6 to 12 by 1 cfill=red normal;
  title "Distribution of Delta NIHSS per hour based on 2 hr";
run; *run the step before clearing the title and closing the PDF;
title;
ods pdf close;

proc ttest data=nind;
  class treatcd;
  var hr2 hr22 slope;
run;

data nind;
  set nind;
  if treatcd = 1 then tpa = 1;
  else if treatcd = 2 then tpa = 0;
  slope_treat = slope*tpa;
  hr2_treat = hr2*tpa;
run;

proc reg data=nind;
  model nihm3 = hr2 age lglu /*csys*/ tpa /*hr2_treat*/;
run;

proc logistic data=nind;
  class tpa(param=ref ref=first);
  model rank3m = hr2 age lglu /*csys*/ tpa / link=clogit;
run;

proc logistic data=nind;
  class tpa(param=ref ref=first);
  model rankin1(ref=first) = hr2 age lglu csys tpa;
run;

proc means data=nind median p25 p75;
  var slope hr2 hr22;
run;

*make a category for quartile of slope;
*note: SAS sorts missing values below any number, so a missing slope or hr2 is coded as quartile 1;
data nind;
  set nind;
  if slope < -0.1818182 then qslope = 1;
  if -0.1818182 <= slope < 0.7727273 then qslope = 2;
  if 0.7727273 <= slope < 2.1363636 then qslope = 3;
  if slope >= 2.1363636 then qslope = 4;
run;

data nind;
  set nind;
  if hr2 < 0 then qhr2 = 1;
  if 0 <= hr2 < 0.5 then qhr2 = 2;
  if 0.5 <= hr2 < 2 then qhr2 = 3;
  if hr2 >= 2 then qhr2 = 4;
run;

proc sort data=nind;
  by tpa;
run;

/*
ods rtf file='.\bars_new.rtf';
proc freq data=nind;
  by tpa;
  tables qhr2*rank3m qslope*rank3m qhr2*rankin1 qslope*rankin1 / nocol norow nopct;
ods rtf close;
*/

proc logistic data=nind; *gives estimates of odds ratios for quartile;
  class qhr2(param=ref ref=first) tpa(param=ref ref=first);
  model rankin1 (event='yes') = qhr2 tpa;
run;

proc sort data=nind_shorter;
  by record;
run;

proc sort data=nind;
  by record;
run;

proc transpose data=nind_shorter out=wide;
  by record;
run;

proc print data=wide;
run;

proc contents data=wide;
run;

data nind_wide;
  merge wide nind;
  by record;
  rename COL1 = nih;
run;

data nind_wide;
  set nind_wide;
  if _name_ = 'BASELINE' then hour = 0;
  else if _name_ = 'nihhr2' then hour = 2;
  else if _name_ = 'nihhr24' then hour = 24;
  else if _name_ = 'nihday710' then hour = 168;
  else if _name_ = 'nihm3' then hour = 2160;
run;

proc sort data=nind_wide;
  by treatcd hour;
run;

ods pdf file = '.\boxplots.pdf';
proc boxplot data=nind_wide;
  by treatcd;
  plot nih*hour / boxstyle=schematic boxwidth=5;
  title 'Figure 4: Distribution of NIHSS at each time point by treatment';
run;

proc sort data=nind;
  by treatcd;
run;

proc boxplot data=nind;
  plot rank3m*treatcd / boxstyle=schematic boxwidth=5;
  title 'Figure 5: Distribution of mrs at 3 months by treatment';
run; *run the step before closing the PDF;
ods pdf close;

data nind_wide;
  set nind_wide;
  if hour ne 0 then log_hour = log(hour);
  else if hour = 0 then log_hour = 0;
run;

proc format;
  value outgroup_f
    1='tPA treated, mrs 0-1 at 3 months'
    2='tPA treated, mrs 2-6 at 3 months'
    3='placebo treated, mrs 0-1 at 3 months'
    4='placebo treated, mrs 2-6 at 3 months';
run;

data nind_wide;
  set nind_wide;
  hour1 = hour+1; *This sets the baseline time to 1 from zero so that the plots of hours on log scale look right;
  if treatcd = 1 and rankin1 = 1 then outgroup = 1;
  if treatcd = 1 and rankin1 = 0 then outgroup = 2;
  if treatcd = 2 and rankin1 = 1 then outgroup = 3;
  if treatcd = 2 and rankin1 = 0 then outgroup = 4;
  format outgroup outgroup_f.;
run;

/* set graphic options for spaghetti plots */
goptions reset = all;
goptions colors=() ftext=simplex htext=10;
axis1 width=3 major=(h=1 w=3) minor=none label=( angle=90 'NIH Stroke Scale') offset=(2);
axis2 width=3 major=(h=1 w=3) minor=none label=( angle=0 'Hour of study (log base 10 scale) Baseline = Hour 1') offset=(2) logbase=10 logstyle=expand;

ods pdf file ='.\profile.pdf' dpi=1200;
symbol1 interpol=join color=blue value=dot repeat=312 line=1; *treatment;
symbol2 interpol=join color=red value=square repeat=312 line=1; *placebo;

proc gplot data=nind_wide; *response based on treatment group;
  *where hour in (0 2 24 168);
  plot nih * hour1=record/ nolegend vaxis=axis1 haxis=axis2;
  plot2 nih * hour1=treatcd /;
  goptions htext=1;
  symbol3 v=none i=sm50sm color=black width=6 line=1;
  symbol4 v=none i=sm50sm color=black width=6 line=3;
  title "Figure 1: Change in NIHSS based on treatment group";
quit;

proc gplot data=nind_wide; *response based on ultimate outcome;
  *where hour in (0 2 24 168);
  plot nih * hour1=record/ nolegend vaxis=axis1 haxis=axis2;
  plot2 nih * hour1=outgroup /;
  goptions htext=1;
  symbol3 v=none i=sm50sm color=black width=6 line=1;
  symbol4 v=none i=sm50sm color=black width=6 line=3;
  symbol5 v=none i=sm50sm color=black width=6 line=5;
  symbol6 v=none i=sm50sm color=black width=6 line=7;
  title "Figure 2: Trajectory of NIHSS over time based on treatment group and 3 month outcome";
quit;

proc gplot data=nind_wide; *response based on mrs at 3 months;
  plot nih * hour1=record/ nolegend vaxis=axis1 haxis=axis2;
  plot2 nih * hour1=rank3m /;
  legend position=bottom mode=protect;
  goptions htext=1;
  symbol3 v=none i=sm50sm color=black width=6 line=1;
  symbol4 v=none i=sm50sm color=black width=6 line=2;
  symbol5 v=none i=sm50sm color=black width=6 line=3;
  symbol6 v=none i=sm50sm color=black width=6 line=4;
  symbol7 v=none i=sm50sm color=black width=6 line=5;
  symbol8 v=none i=sm50sm color=black width=6 line=6;
  symbol9 v=none i=sm50sm color=black width=6 line=7;
  title "Figure 3: Trajectory of NIHSS over time based on mrs at 3 months";
quit;
ods pdf close;

proc sort data=nind_wide;
  by tpa record;
run;

* model at 2 hours;
proc mixed method=reml noitprint dfbw maxiter=400 data=nind_wide;
  where hour < 3;
  model nih=hour tpa age lglu csys tpa*hour / solution;
  random intercept hour/subject=record type=un;
run;

* model including all times;
proc mixed method=reml noitprint dfbw maxiter=400 data=nind_wide;
  model nih=hour hr2 tpa age lglu / solution;
  random intercept hour/subject=record type=un;
run;

*above models repeated to account for differences in trajectory based on ultimate outcome;
* model at 2 hours;
proc mixed method=reml noitprint dfbw maxiter=400 data=nind_wide;
  where hour < 3;
  model nih=hour hr2 tpa age lglu csys tpa*hour / solution;
  random intercept hour/subject=record type=un;
run;

* model including all times - model fit improves if mrs at 3 months included;
proc mixed method=reml noitprint dfbw maxiter=400 data=nind_wide;
  model nih=hour hr2 tpa age lglu rank3m / solution;
  random intercept hour/subject=record type=un;
run;

proc freq data=nind_wide;
  tables rank3m;
run;
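The transpose-and-merge step in the appendix turns one row per patient into one row per patient per time point and maps each NIHSS column to its hour of study. A minimal Python sketch of that reshaping follows, using plain dictionaries. The function name and the example patient are hypothetical, and skipping visits with no recorded score is a choice made for this sketch only.

```python
import math

# Column name -> hour of study, as in the appendix data step.
HOUR_BY_COLUMN = {
    "baseline": 0,
    "nihhr2": 2,
    "nihhr24": 24,
    "nihday710": 168,
    "nihm3": 2160,
}


def to_long(wide_rows):
    """Turn one dict per patient into one dict per patient per time point."""
    long_rows = []
    for row in wide_rows:
        for column, hour in HOUR_BY_COLUMN.items():
            value = row.get(column)
            if value is None:
                continue  # skip visits with no recorded score (sketch's choice)
            long_rows.append({
                "record": row["record"],
                "hour": hour,
                "hour1": hour + 1,                  # baseline moved to hour 1
                "log10_hour1": math.log10(hour + 1),  # position on the log axis
                "nih": value,
            })
    return long_rows


# Hypothetical patient, not taken from the trial data.
rows = to_long([{"record": 1, "baseline": 14, "nihhr2": 10, "nihhr24": 8,
                 "nihday710": 6, "nihm3": None}])
for r in rows:
    print(r)
```

Shifting every time point by one hour keeps baseline on the plot, since log10(1) = 0 while log10(0) is undefined.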

Figure 6: Distribution of outcomes based on 2 hr delta NIHSS. [Two stacked bar charts, one for tPA treated patients and one for placebo treated patients. Vertical axis: quartiles of 2 hr decrease in NIHSS (1 to 4). Horizontal axis: proportion (number in box) of patients based on 90 day mRS, 0% to 100%. Legend: No symptoms, No significant disability, Slight disability, Moderate disability, Moderate severe disability, Severe disability, Death.]