Ex. 10.6
McNemar test (An Introduction to Categorical Data Analysis). We report the test error rates on the spam data to be 5.5% for a generalized additive model (GAM), and 4.5% for gradient boosting (GBM), with a test sample of size 1536.
(a) Show that the standard error of these estimates is about 0.6%. Since the same test data are used for both methods, the error rates are correlated, and we cannot perform a two-sample t-test. We can instead compare the methods directly on each test observation, leading to the summary in the table below.
GAM (rows) / GBM (columns) | GBM: Correct | GBM: Error |
---|---|---|
GAM: Correct | 1434 | 18 |
GAM: Error | 33 | 51 |
The McNemar test focuses on the discordant errors, 33 vs. 18.
(b) Conduct a test to show that GAM makes significantly more errors than gradient boosting, with a two-sided \(p\)-value of 0.036.
Soln. 10.6
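(a) The standard error for a binomial estimate is \(\sqrt{p(1-p)/n}\). With \(n = 1536\), it's straightforward to verify that
\[
\sqrt{\frac{0.055 \times 0.945}{1536}} \approx 0.0058, \qquad \sqrt{\frac{0.045 \times 0.955}{1536}} \approx 0.0053,
\]
that is, 0.58% for GAM and 0.53% for GBM, in line with the stated figure of about 0.6%.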
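The arithmetic in part (a) can be checked numerically. The snippet below is a minimal sketch using only the Python standard library; the function name `binomial_se` is illustrative and not taken from the original solution.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of an error rate p estimated from n independent test cases."""
    return math.sqrt(p * (1.0 - p) / n)


n_test = 1536  # size of the spam test sample

se_gam = binomial_se(0.055, n_test)  # GAM test error rate of 5.5%
se_gbm = binomial_se(0.045, n_test)  # GBM test error rate of 4.5%

print(f"GAM: {se_gam:.4f}, GBM: {se_gbm:.4f}")
# prints "GAM: 0.0058, GBM: 0.0053"
```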
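(b) The McNemar test statistic is
\[
\chi^2 = \frac{(33 - 18)^2}{33 + 18} = \frac{225}{51} \approx 4.41,
\]
which, under the null hypothesis that the two methods have equal error rates, approximately follows a \(\chi^2\) distribution with 1 degree of freedom; thus the two-sided \(p\)-value is 0.036.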
Code
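The page's own listing is not reproduced here, so the following is a minimal sketch of the computation in part (b) rather than that code. It assumes the uncorrected McNemar statistic (no continuity correction), which is the variant that reproduces the stated \(p\)-value of 0.036. The function name `mcnemar_test` and the variable names are illustrative. For 1 degree of freedom, the chi-squared tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math


def mcnemar_test(b: int, c: int):
    """Return the uncorrected McNemar statistic and its p-value.

    b and c are the two discordant counts of the paired 2x2 table.
    """
    stat = (b - c) ** 2 / (b + c)
    # A chi-squared variable with 1 degree of freedom is a squared standard
    # normal, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


gam_only_errors = 33  # GAM wrong, GBM correct
gbm_only_errors = 18  # GAM correct, GBM wrong

stat, p_value = mcnemar_test(gam_only_errors, gbm_only_errors)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
# prints "statistic = 4.412, p-value = 0.036"
```

Only the discordant cells enter the statistic; the 1434 cases both methods classify correctly and the 51 cases both misclassify carry no information about which method is better.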