-------------------------------------------------------------------------------------------------------------------------------------------
help for rocss
-------------------------------------------------------------------------------------------------------------------------------------------
ROC curve and other statistics for any classification method
rocss dep_var prob_var [if exp] [in range] [, ncut(#) savedata(filename) graph replace]
Description
rocss calculates sensitivity, specificity, cumulative area under the ROC curve and percentage of subjects correctly classified at
a user-specified number of equally spaced probability cutoffs.
dep_var is the binary outcome variable coded 0, 1.
prob_var contains the estimated probabilities that dep_var==1.
An example with four observations:
    id   dep_var   prob_var
     1      0        0.2
     2      1        0.8
     3      1        0.9
     4      0        0.3
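To make these quantities concrete, the following Python sketch works through the four-observation example at a single cutoff. It is an illustration only, not the rocss source: the function name classify_at is hypothetical, and the rule "classified positive when prob_var >= cutoff" is an assumption borrowed from the rule that lstat reports.

```python
# Illustrative sketch, not the rocss implementation.
# Assumption: an observation is classified positive when prob >= cutoff.

def classify_at(dep, prob, cutoff):
    """Return (sensitivity, specificity, percent correctly classified)."""
    pairs = list(zip(dep, prob))
    tp = sum(1 for d, p in pairs if d == 1 and p >= cutoff)  # true positives
    fn = sum(1 for d, p in pairs if d == 1 and p < cutoff)   # false negatives
    tn = sum(1 for d, p in pairs if d == 0 and p < cutoff)   # true negatives
    fp = sum(1 for d, p in pairs if d == 0 and p >= cutoff)  # false positives
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    cclass = 100 * (tp + tn) / len(pairs)
    return sens, spec, cclass

# The four observations listed above.
dep_var = [0, 1, 1, 0]
prob_var = [0.2, 0.8, 0.9, 0.3]

result = classify_at(dep_var, prob_var, 0.5)
print(result)
```

At a cutoff of 0.5 both outcomes with dep_var==1 have probabilities above the cutoff and both with dep_var==0 fall below it, so every subject is correctly classified.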
Remarks
Unlike lsens, rocss is not a post-estimation command: it works on any variable that holds estimated probabilities, and the user chooses
how many equally spaced probability cutoffs to evaluate. Applied to predicted probabilities from logistic, logit or probit, rocss is a
flexible alternative to lsens.
Options
ncut(#) specifies the number of equally spaced probability intervals in the range [0, 1]. The number of corresponding probability cutoffs
will be (# + 1), at values 0, 1/#, 2/#, ..., 1. The default is 10 equally spaced intervals.
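The cutoff grid described for ncut(#) can be sketched in a few lines of Python. The helper name cutoffs is hypothetical and the code only mirrors the description above; it is not taken from the rocss source.

```python
# Illustrative sketch; `cutoffs` is a hypothetical helper, not part of rocss.

def cutoffs(ncut=10):
    """Return the ncut + 1 equally spaced cutoffs 0, 1/ncut, 2/ncut, ..., 1."""
    return [k / ncut for k in range(ncut + 1)]

grid = cutoffs(10)          # default: 10 intervals, hence 11 cutoffs
print(len(grid), grid[0], grid[-1])
```

With ncut(20) the grid has 21 points spaced 0.05 apart, and with ncut(80) it has 81 points spaced 0.0125 apart.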
savedata(filename) specifies the name of a new dataset created to contain the probability cutoffs and corresponding sensitivity,
specificity, cumulative area under the ROC curve and the percentage of subjects correctly classified. The dataset is saved in the
current directory.
graph plots sensitivity against 1 - specificity, calculated at each probability cutoff (see help for lroc).
replace requests that if the dataset specified in savedata(filename) already exists, it should be overwritten.
Examples
. webuse lbw, clear
. logistic low age lwt smoke ptl ht ui
. lstat
. lroc, nograph
. lstat, cutoff(0.30)
. predict p
. rocss low p // compare the results
. rocss low p, ncut(20) gr
. rocss low p, saved(allsens)
. rocss low p, ncut(80) gr saved(allsens) rep
Authors
Nicola Orsini, Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden and Institute of Information Science and
Technology, National Research Council of Italy, Pisa, Italy.
Matteo Bottai, Arnold School of Public Health, University of South Carolina, Columbia, USA and Institute of Information Science and
Technology, National Research Council of Italy, Pisa, Italy.
Also see
[R] logistic
On-line: help for lroc, lstat, lsens, roc
The following worked examples assume an up-to-date version of rocss. To install it, type
. net install http://nicolaorsini.altervista.org/stata/rocss
. which rocss
c:\ado\plus\r\rocss.ado
*! Version 1.0 - March 11, 2004 - N.Orsini
. webuse lbw, clear
(Hosmer & Lemeshow data)
. logistic low age lwt smoke ptl ht ui
Logistic regression                               Number of obs   =        189
                                                  LR chi2(6)      =      25.88
                                                  Prob > chi2     =     0.0002
Log likelihood = -104.39591                       Pseudo R2       =     0.1103
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9586258   .0331527    -1.22   0.222      .895801    1.025857
         lwt |   .9858131   .0065579    -2.15   0.032     .9730433    .9987505
       smoke |   1.734347   .5959725     1.60   0.109     .8843789    3.401213
         ptl |    1.80987    .630593     1.70   0.089     .9142657    3.582796
          ht |   6.439757   4.419149     2.71   0.007     1.677839     24.7166
          ui |   2.089219   .9537039     1.61   0.107     .8539267    5.111489
------------------------------------------------------------------------------
. lstat
Logistic model for low
              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        16            13  |         29
     -     |        43           117  |        160
-----------+--------------------------+-----------
   Total   |        59           130  |        189
Classified + if predicted Pr(D) >= .5
True D defined as low != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   27.12%
Specificity                     Pr( -|~D)   90.00%
Positive predictive value       Pr( D| +)   55.17%
Negative predictive value       Pr(~D| -)   73.13%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   10.00%
False - rate for true D         Pr( -| D)   72.88%
False + rate for classified +   Pr(~D| +)   44.83%
False - rate for classified -   Pr( D| -)   26.88%
--------------------------------------------------
Correctly classified                        70.37%
--------------------------------------------------
. lroc, nograph
Logistic model for low
number of observations = 189
area under ROC curve = 0.7373
. lstat, cutoff(0.30)
Logistic model for low
              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        38            42  |         80
     -     |        21            88  |        109
-----------+--------------------------+-----------
   Total   |        59           130  |        189
Classified + if predicted Pr(D) >= .3
True D defined as low != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   64.41%
Specificity                     Pr( -|~D)   67.69%
Positive predictive value       Pr( D| +)   47.50%
Negative predictive value       Pr(~D| -)   80.73%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   32.31%
False - rate for true D         Pr( -| D)   35.59%
False + rate for classified +   Pr(~D| +)   52.50%
False - rate for classified -   Pr( D| -)   19.27%
--------------------------------------------------
Correctly classified                        66.67%
--------------------------------------------------
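As a cross-check, the summary measures can be recomputed by hand from the four cell counts of the lstat, cutoff(0.30) table. The Python sketch below does only that arithmetic; the variable names are illustrative, and nothing here is taken from the lstat or rocss source.

```python
# Cell counts copied from the lstat, cutoff(0.30) classification table.
tp, fp = 38, 42   # classified +: true D, true ~D
fn, tn = 21, 88   # classified -: true D, true ~D

sens = tp / (tp + fn)                            # Pr( +| D)
spec = tn / (tn + fp)                            # Pr( -|~D)
cclass = 100 * (tp + tn) / (tp + fp + fn + tn)   # percent correctly classified

print(f"{sens:.4f} {spec:.4f} {1 - spec:.4f} {cclass:.4f}")
```

Rounded to two decimals as percentages, these agree with the 64.41%, 67.69% and 66.67% that lstat reports, and they are the quantities rocss lists as sens, spec and cclass at the cutoff 0.300.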
. predict p
(option p assumed; Pr(low))
. rocss low p // compare the results
+------------------------------------------------------+
| cutoff sens spec omspec cclass carea |
|------------------------------------------------------|
1. | 0.000 1.0000 0.0000 1.0000 31.2169 0.0000 |
2. | 0.100 1.0000 0.0769 0.9231 36.5079 0.0769 |
3. | 0.200 0.8983 0.3692 0.6308 53.4392 0.3544 |
4. | 0.300 0.6441 0.6769 0.3231 66.6667 0.5917 |
5. | 0.400 0.4407 0.8615 0.1385 73.0159 0.6918 |
6. | 0.500 0.2712 0.9000 0.1000 70.3704 0.7055 |
7. | 0.600 0.1864 0.9462 0.0538 70.8995 0.7160 |
8. | 0.700 0.0508 0.9846 0.0154 69.3122 0.7206 |
9. | 0.800 0.0169 0.9923 0.0077 68.7831 0.7209 |
10. | 0.900 0.0000 1.0000 0.0000 68.7831 0.7209 |
11. | 1.000 0.0000 1.0000 0.0000 68.7831 0.7209 |
+------------------------------------------------------+
Number of observations = 189
Number of probability cutoffs (10+1) = 11
Area under ROC curve = 0.7209
Highest value of correctly classified = 73.0159
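The carea column in this output is consistent with a running trapezoidal sum over the (omspec, sens) points, taken in cutoff order. The Python sketch below recomputes it from the rounded values printed in the table. This is an inference from the displayed figures, not a statement about how rocss is coded, and the function name cumulative_area is hypothetical.

```python
# (1 - specificity, sensitivity) pairs copied from the rocss table,
# in cutoff order 0.0, 0.1, ..., 1.0. Values are rounded to 4 decimals there.
omspec = [1.0000, 0.9231, 0.6308, 0.3231, 0.1385, 0.1000,
          0.0538, 0.0154, 0.0077, 0.0000, 0.0000]
sens = [1.0000, 1.0000, 0.8983, 0.6441, 0.4407, 0.2712,
        0.1864, 0.0508, 0.0169, 0.0000, 0.0000]

def cumulative_area(x, y):
    """Running trapezoidal area under y as x decreases from 1 towards 0."""
    total, running = 0.0, [0.0]
    for i in range(1, len(x)):
        width = x[i - 1] - x[i]              # step along 1 - specificity
        height = (y[i - 1] + y[i]) / 2       # mean sensitivity over the step
        total += width * height
        running.append(total)
    return running

carea = cumulative_area(omspec, sens)
print([round(a, 4) for a in carea])
```

Because the inputs are already rounded, the recomputed values can differ from the printed carea column in the fourth decimal place; the final value still rounds to the reported area of 0.7209.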
. rocss low p, ncut(20) gr
+------------------------------------------------------+
| cutoff sens spec omspec cclass carea |
|------------------------------------------------------|
1. | 0.000 1.0000 0.0000 1.0000 31.2169 0.0000 |
2. | 0.050 1.0000 0.0077 0.9923 31.7460 0.0077 |
3. | 0.100 1.0000 0.0769 0.9231 36.5079 0.0769 |
4. | 0.150 0.9661 0.1846 0.8154 42.8571 0.1828 |
5. | 0.200 0.8983 0.3692 0.6308 53.4392 0.3549 |
6. | 0.250 0.8475 0.5385 0.4615 63.4921 0.5026 |
7. | 0.300 0.6441 0.6769 0.3231 66.6667 0.6059 |
8. | 0.350 0.5424 0.8000 0.2000 71.9577 0.6789 |
9. | 0.400 0.4407 0.8615 0.1385 73.0159 0.7091 |
10. | 0.450 0.3220 0.8923 0.1077 71.4286 0.7209 |
11. | 0.500 0.2712 0.9000 0.1000 70.3704 0.7231 |
12. | 0.550 0.2542 0.9077 0.0923 70.3704 0.7252 |
13. | 0.600 0.1864 0.9462 0.0538 70.8995 0.7336 |
14. | 0.650 0.1017 0.9692 0.0308 69.8413 0.7370 |
15. | 0.700 0.0508 0.9846 0.0154 69.3122 0.7381 |
16. | 0.750 0.0339 0.9923 0.0077 69.3122 0.7385 |
17. | 0.800 0.0169 0.9923 0.0077 68.7831 0.7385 |
18. | 0.850 0.0169 0.9923 0.0077 68.7831 0.7385 |
19. | 0.900 0.0000 1.0000 0.0000 68.7831 0.7385 |
20. | 0.950 0.0000 1.0000 0.0000 68.7831 0.7385 |
21. | 1.000 0.0000 1.0000 0.0000 68.7831 0.7385 |
+------------------------------------------------------+
Number of observations = 189
Number of probability cutoffs (20+1) = 21
Area under ROC curve = 0.7385
Highest value of correctly classified = 73.0159
. rocss low p, saved(allsens)
+------------------------------------------------------+
| cutoff sens spec omspec cclass carea |
|------------------------------------------------------|
1. | 0.000 1.0000 0.0000 1.0000 31.2169 0.0000 |
2. | 0.100 1.0000 0.0769 0.9231 36.5079 0.0769 |
3. | 0.200 0.8983 0.3692 0.6308 53.4392 0.3544 |
4. | 0.300 0.6441 0.6769 0.3231 66.6667 0.5917 |
5. | 0.400 0.4407 0.8615 0.1385 73.0159 0.6918 |
6. | 0.500 0.2712 0.9000 0.1000 70.3704 0.7055 |
7. | 0.600 0.1864 0.9462 0.0538 70.8995 0.7160 |
8. | 0.700 0.0508 0.9846 0.0154 69.3122 0.7206 |
9. | 0.800 0.0169 0.9923 0.0077 68.7831 0.7209 |
10. | 0.900 0.0000 1.0000 0.0000 68.7831 0.7209 |
11. | 1.000 0.0000 1.0000 0.0000 68.7831 0.7209 |
+------------------------------------------------------+
Number of observations = 189
Number of probability cutoffs (10+1) = 11
Area under ROC curve = 0.7209
Highest value of correctly classified = 73.0159
. rocss low p, ncut(80) gr saved(allsens) rep
+------------------------------------------------------+
| cutoff sens spec omspec cclass carea |
|------------------------------------------------------|
1. | 0.000 1.0000 0.0000 1.0000 31.2169 0.0000 |
2. | 0.013 1.0000 0.0000 1.0000 31.2169 0.0000 |
3. | 0.025 1.0000 0.0000 1.0000 31.2169 0.0000 |
4. | 0.038 1.0000 0.0000 1.0000 31.2169 0.0000 |
5. | 0.050 1.0000 0.0077 0.9923 31.7460 0.0077 |
6. | 0.063 1.0000 0.0154 0.9846 32.2751 0.0154 |
7. | 0.075 1.0000 0.0385 0.9615 33.8624 0.0385 |
8. | 0.087 1.0000 0.0538 0.9462 34.9206 0.0538 |
9. | 0.100 1.0000 0.0769 0.9231 36.5079 0.0769 |
10. | 0.112 1.0000 0.1000 0.9000 38.0952 0.1000 |
11. | 0.125 1.0000 0.1308 0.8692 40.2116 0.1308 |
12. | 0.138 0.9661 0.1615 0.8385 41.2698 0.1610 |
13. | 0.150 0.9661 0.1846 0.8154 42.8571 0.1833 |
14. | 0.162 0.9492 0.2385 0.7615 46.0317 0.2349 |
15. | 0.175 0.9153 0.2615 0.7385 46.5608 0.2564 |
16. | 0.188 0.8983 0.3000 0.7000 48.6772 0.2913 |
17. | 0.200 0.8983 0.3692 0.6308 53.4392 0.3535 |
18. | 0.213 0.8814 0.4000 0.6000 55.0265 0.3808 |
19. | 0.225 0.8644 0.4692 0.5308 59.2593 0.4413 |
20. | 0.237 0.8644 0.5077 0.4923 61.9048 0.4745 |
21. | 0.250 0.8475 0.5385 0.4615 63.4921 0.5008 |
22. | 0.262 0.7966 0.5769 0.4231 64.5503 0.5325 |
23. | 0.275 0.7458 0.6077 0.3923 65.0794 0.5562 |
24. | 0.287 0.7119 0.6538 0.3462 67.1958 0.5898 |
25. | 0.300 0.6441 0.6769 0.3231 66.6667 0.6055 |
26. | 0.313 0.6271 0.7000 0.3000 67.7249 0.6201 |
27. | 0.325 0.6102 0.7231 0.2769 68.7831 0.6344 |
28. | 0.338 0.5932 0.7615 0.2385 70.8995 0.6576 |
29. | 0.350 0.5424 0.8000 0.2000 71.9577 0.6794 |
30. | 0.363 0.4915 0.8154 0.1846 71.4286 0.6874 |
31. | 0.375 0.4746 0.8385 0.1615 72.4868 0.6985 |
32. | 0.387 0.4576 0.8462 0.1538 72.4868 0.7021 |
33. | 0.400 0.4407 0.8615 0.1385 73.0159 0.7090 |
34. | 0.412 0.4068 0.8692 0.1308 72.4868 0.7123 |
35. | 0.425 0.3729 0.8692 0.1308 71.4286 0.7123 |
36. | 0.438 0.3729 0.8846 0.1154 72.4868 0.7180 |
37. | 0.450 0.3220 0.8923 0.1077 71.4286 0.7207 |
38. | 0.463 0.2881 0.8923 0.1077 70.3704 0.7207 |
39. | 0.475 0.2881 0.8923 0.1077 70.3704 0.7207 |
40. | 0.488 0.2881 0.9000 0.1000 70.8995 0.7229 |
41. | 0.500 0.2712 0.9000 0.1000 70.3704 0.7229 |
42. | 0.512 0.2712 0.9000 0.1000 70.3704 0.7229 |
43. | 0.525 0.2712 0.9000 0.1000 70.3704 0.7229 |
44. | 0.538 0.2712 0.9077 0.0923 70.8995 0.7250 |
45. | 0.550 0.2542 0.9077 0.0923 70.3704 0.7250 |
46. | 0.563 0.2373 0.9308 0.0692 71.4286 0.7306 |
47. | 0.575 0.2203 0.9308 0.0692 70.8995 0.7306 |
48. | 0.587 0.2034 0.9308 0.0692 70.3704 0.7306 |
49. | 0.600 0.1864 0.9462 0.0538 70.8995 0.7336 |
50. | 0.613 0.1864 0.9538 0.0462 71.4286 0.7351 |
51. | 0.625 0.1356 0.9538 0.0462 69.8413 0.7351 |
52. | 0.637 0.1017 0.9538 0.0462 68.7831 0.7351 |
53. | 0.650 0.1017 0.9692 0.0308 69.8413 0.7366 |
54. | 0.663 0.1017 0.9846 0.0154 70.8995 0.7382 |
55. | 0.675 0.0508 0.9846 0.0154 69.3122 0.7382 |
56. | 0.688 0.0508 0.9846 0.0154 69.3122 0.7382 |
57. | 0.700 0.0508 0.9846 0.0154 69.3122 0.7382 |
58. | 0.712 0.0508 0.9846 0.0154 69.3122 0.7382 |
59. | 0.725 0.0508 0.9923 0.0077 69.8413 0.7386 |
60. | 0.738 0.0508 0.9923 0.0077 69.8413 0.7386 |
61. | 0.750 0.0339 0.9923 0.0077 69.3122 0.7386 |
62. | 0.762 0.0339 0.9923 0.0077 69.3122 0.7386 |
63. | 0.775 0.0339 0.9923 0.0077 69.3122 0.7386 |
64. | 0.788 0.0169 0.9923 0.0077 68.7831 0.7386 |
65. | 0.800 0.0169 0.9923 0.0077 68.7831 0.7386 |
66. | 0.813 0.0169 0.9923 0.0077 68.7831 0.7386 |
67. | 0.825 0.0169 0.9923 0.0077 68.7831 0.7386 |
68. | 0.837 0.0169 0.9923 0.0077 68.7831 0.7386 |
69. | 0.850 0.0169 0.9923 0.0077 68.7831 0.7386 |
70. | 0.863 0.0000 0.9923 0.0077 68.2540 0.7386 |
71. | 0.875 0.0000 0.9923 0.0077 68.2540 0.7386 |
72. | 0.887 0.0000 1.0000 0.0000 68.7831 0.7386 |
73. | 0.900 0.0000 1.0000 0.0000 68.7831 0.7386 |
74. | 0.913 0.0000 1.0000 0.0000 68.7831 0.7386 |
75. | 0.925 0.0000 1.0000 0.0000 68.7831 0.7386 |
76. | 0.938 0.0000 1.0000 0.0000 68.7831 0.7386 |
77. | 0.950 0.0000 1.0000 0.0000 68.7831 0.7386 |
78. | 0.962 0.0000 1.0000 0.0000 68.7831 0.7386 |
79. | 0.975 0.0000 1.0000 0.0000 68.7831 0.7386 |
80. | 0.988 0.0000 1.0000 0.0000 68.7831 0.7386 |
81. | 1.000 0.0000 1.0000 0.0000 68.7831 0.7386 |
+------------------------------------------------------+
Number of observations = 189
Number of probability cutoffs (80+1) = 81
Area under ROC curve = 0.7386
Highest value of correctly classified = 73.0159
Draw your own graph using the new dataset.