* Solution Lab 2 - Logistic
use http://www.imm.ki.se/biostatistics/data/marathon.dta, clear
* Q1)
tabulate nas135 urinat3p, col
* Risk = P(Y=1|X=1) = 8/27
* Odds = P(Y=1|X=1)/(1-P(Y=1|X=1)) = 8/19
di 8/27
di 8/19
/*
For those who reported urine output 3 times of more, the risk of
hyponatremia was 30%
*/
* Risk = P(Y=1|X=0) = 51/453
* Odds = P(Y=1|X=0)/(1-P(Y=1|X=0)) = 51/402
di 51/453
di 51/402
* For those who reported urine output less than 3 times , the risk of
* hyponatremia was 11%
* Odds1 / Odds0
display ( 8/19) / ( 51/402)
* Risk1/ Risk0
display (8/27) / (51/453)
/* Those runners who reported urine output 3 times of more had
a 3.3 fold increase odds of hyponatremia
*/
* H0: OR = 1
* Can I use intercheangebly risk and odds to describe the
* occurrence of a binary outcome? It depends
* if p < .1
scalar p = .01
scalar odds = p/(1-p)
di p
di odds
logistic nas135 urinat3p
* Hand calculations for the SE of log(OR)
scalar se = sqrt(1/402+1/51+1/8+1/19)
scalar logor = log(( 8/19) / ( 51/402))
di exp(logor-1.96*se)
di exp(logor+1.96*se)
* log(odds|urinat3p) = b0 + b1*urinat3p
* log(odds|urinat3p=0) = b0
* log(odds|urinat3p=1) = b0 + b1
* (odds|urinat3p=1) / (odds|urinat3p=0) = exp(b1)
logistic nas135 urinat3p
* Pr(Y=1|x=0) = invlogit(b0)
di invlogit(_b[_cons])
* Pr(Y=1|x=1) = invlogit(b0 + b1)
di invlogit(_b[_cons] + _b[urinat3p]*1)
* q2
* Can you describe how the outcome risk is changing
* according to the running time?
su runtime
kdensity runtime
* (risk|runtime) = invlogit(b0 + b1*runtime)
* (odds|runtime) = exp(b0 + b1*runtime)
* log(odds|runtime) = b0 + b1*runtime
* b0 = the log odds of hyponatremia among those
* runners who completed the marathon in 0 minutes.
* Data extrapolation
* b1 = the log odds ratio of hyponatremia associated
* with every 1 minute increase in running time
* exp(b1) = the odds ratio of hyponatremia associated
* with every 1 minute increase in running time
logistic nas135 runtime
* OR = 1.02
/*
Every 1 minute increase in running time is associated
with a 2% higher odds of hyponatremia.
*/
* What is the odds ratio of hyponatremia associated with 30 minutes increase?
* exp(b1*30)
display exp(_b[runtime]*30)
lincom _b[runtime]*30, eform
/*
Every 30 minutes increase in running time is associated
with a 59% higher odds of hyponatremia.
*/
* What is the risk of hyponatremia for those who completed the marathon in 140 minutes?
di invlogit(_b[_cons]+ _b[runtime]*140)
di invlogit(_b[_cons]+ _b[runtime]*210)
di invlogit(_b[_cons]+ _b[runtime]*500)
tw (function risk = invlogit(_b[_cons]+ _b[runtime]*x), range(140 400))
* exp(b1*30)
display exp(.0155019*30)
/*
Every 30 minutes increase in running time is associated with
a 60% higher odds (95% CI = 1.3 to 1.9) of hyponatremia.
*/
lincom _b[runtime]*30, eform
* Odds Ratio comparing x1 vs x2 of the continuous predictor
* based on the specified linear-response model.
* exp(b1*(x1-x2))
lincom _b[runtime]*(180-210), eform
/*
Compared to those running 3:30 hr:mm, the odds of
hyponatremia among those who run 3:00 hr:mm was 37% lower
(95% CI = 0.5 to 0.75).
*/
lincom _b[runtime]*(300-210), eform
/*
Compared to those running 3:30 hr:mm, the odds of
hyponatremia among those who run 5:00 hr:mm was 4 times
larger (95% CI = 2.3 to 7.0).
*/
* Plot the linear-response model using 3:30 hr:mm as referent.
* Store the OR and confidence limits as function of running time
* get log ORs and 95% CIs
predictnl logor = _b[runtime]*(runtime-210), ci(lo hi)
* exponentiate to get ORs and 95% CIs
gen or = exp(logor)
gen lb = exp(lo)
gen ub = exp(hi)
* plot ORs and 95% CIs on the log scale
tw (line or lb ub runtime, sort lp(l - -)) ///
if inrange(runtime,180,330) , ///
yscale(log) ///
ytitle("Odds Ratio") ///
ylabel(.5 1 2 4 8, angle(horiz) format(%3.2fc) ) ///
xtitle("Race duration, minutes") ///
legend(off)
* Q3. Categorical exposure.
gen runtimeh = runtime/60
gen runtimehc = 1 if runtimeh < 3.5
replace runtimehc = 2 if runtimeh >= 3.5 & runtimeh <= 4
replace runtimehc = 3 if runtimeh > 4 & runtimeh < .
label define runtimehc 1 "<3:30" 2 "3:30-4:00" 3 ">4:00"
label values runtimehc runtimehc
tabulate runtimehc
tabulate nas135 runtimehc, col
tabstat nas135, by(runtimehc) format(%3.2fc)
/*
The risk of hyponatremia is increasing with race duration,
ranging from 4% among those running <3:30 hours up to
23% among those running more than 4:00 hours.
*/
/*
We specify a logistic regression with indicator variables
for the categorized version of running time.
The prefix command xi: together with the shortcut i.runtimehc
will create (0/1) indicator variables.
_Iruntimehc_1 (1 if runtimehc==1 and 0 otherwise)
_Iruntimehc_2 (1 if runtimehc==2 and 0 otherwise)
_Iruntimehc_3 (1 if runtimehc==3 and 0 otherwise)
We omit the indicator variable for the lowest category (default).
log(odds) = b0 + b1*_Iruntimehc_2 + b2*_Iruntimehc_3
*/
tabodds nas135 runtimehc , or base(1)
xi: logistic nas135 i.runtimehc
* The odds of hyponatremia among those running <3:30 was 4% (4 cases every 100 non-cases).
* Compared to those running less than 3:30, those who run 3:30-4:00 had 3.7 fold
* increase odds of hyponatremia (95% CI = 1.6-8.6).
* Compared to those running less than 3:30, those who run more than 4:00 had 6.7 fold
* increase odds of hyponatremia (95% CI = 3-15).
* An overall p-value for the association of the categorized covariate and the
* odds of hyponatremia can be obtained by testing simultaneously the
* regression coefficients related to the indicator variables equal to zero.
testparm _Iruntimehc_2 _Iruntimehc_3
* The small p-value suggests an overall statistically significant association
* between race duration categorized and the odds of hyponatremia.
* Q4. Continuous exposure. Quadratic assumption for the relation between
* race duration and odds of hyponatremia.
gen runtimehsq = runtimeh^2
logistic nas135 runtimeh runtimehsq
testparm runtimeh runtimehsq
testparm runtimehsq
/*
We found a significant association between race duration
and odds of hyponatremia (p<.001) with no evidence of non-linearity (p=0.389).
*/
* Based on the quadratic model.
lincom _b[runtimeh]*(3-3.5) + _b[runtimehsq]*(9-12.25), eform
/*
Compared to those running 3:30 hr:mm, the odds of
hyponatremia among those who run 3:00 hr:mm was 48% lower
(95% CI = 0.3 to 0.8).
*/
lincom _b[runtimeh]*(5-3.5) + _b[runtimehsq]*(25-12.25), eform
/*
Compared to those running 3:30 hr:mm, the odds of
hyponatremia among those who run 5:00 hr:mm was 4.2 times
larger (95% CI = 2.4 to 7.3).
*/
* Store the log OR and 95% CI based on the quadratic model using 3.5 as referent.
predictnl logorsq = _b[runtimeh]*(runtimeh-3.5) + _b[runtimehsq]*(runtimehsq-12.25)
* Get ORs=exp(logORs)
gen orsq = exp(logorsq)
* Overlay the linear trend and quadratic trend for the ORs
tw (line ors or lb ub runtimeh, sort lp(longdash l dot dot)) ///
if inrange(runtimeh,3,5.5) , ///
yscale(log) ///
ytitle("Odds Ratio") ///
ylabel(.5 1 2 4 8, angle(horiz) format(%3.2fc) ) ///
xtitle("Race Duration (hr:mim)") ///
legend(off) plotregion(style(none)) ///
xlabel(3 "3:00" 3.5 "3:30" 4 "4:00" 4.5 "4:30" 5 "5:00" 5.5 "5:30")