require(leaps)
## Loading required package: leaps
data <- read.csv('case1102.csv')
# Response: log of the Brain/Liver ratio
data$logY <- log(data$Brain / data$Liver)
Y <- data$logY
# Candidate covariates for subset selection
X <- data[, c('Days', 'Sex', 'Weight', 'Loss', 'Tumor')]
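We use the regsubsets() function from the leaps package to perform subset selection. By default it searches exhaustively over all subsets of the five covariates and keeps the best subset of each size.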
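To make the exhaustive search concrete, here is a minimal base-R sketch of what "best subset of each size" means: for every size k, fit all models with exactly k predictors and keep the one with the smallest residual sum of squares (RSS). The helper name best_subsets() is hypothetical, and the built-in mtcars data with five of its columns stand in for the case1102 data purely for illustration; regsubsets() itself uses a much more efficient algorithm.

```r
# Hypothetical helper: brute-force best-subset search by RSS.
# Assumption: mtcars and the five predictors below are illustrative stand-ins.
best_subsets <- function(response, predictors, data) {
  out <- list()
  for (k in seq_along(predictors)) {
    # All subsets of exactly k predictors
    combos <- combn(predictors, k, simplify = FALSE)
    # RSS of each candidate model
    rss <- vapply(combos, function(vars) {
      f <- reformulate(vars, response = response)
      sum(residuals(lm(f, data = data))^2)
    }, numeric(1))
    best <- which.min(rss)
    out[[k]] <- data.frame(size = k,
                           vars = paste(combos[[best]], collapse = "+"),
                           rss  = rss[best])
  }
  do.call(rbind, out)
}

preds <- c("wt", "hp", "qsec", "drat", "disp")
bs <- best_subsets("mpg", preds, mtcars)
print(bs)
```

Because adding a predictor to the best size-k model can never increase the RSS, the best RSS is non-increasing in k; this is why a penalized criterion such as \(C_p\) is needed to choose among sizes.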
rsubsets <- regsubsets(logY ~ Days+Sex+Weight+Loss+Tumor, data=data)
s <- summary(rsubsets, matrix.logical=TRUE)
s$cp
## [1] 9.457598 1.430200 2.006538 4.000835 6.000000
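The \(C_p\) values above can be reproduced from first principles. The sketch below uses Mallows' \(C_p = \mathrm{RSS}_k/\hat\sigma^2 + 2k - n\), where \(k\) counts the intercept and \(\hat\sigma^2 = \mathrm{RSS}_{\text{full}}/(n - p_{\text{full}})\) comes from the full model; the full model's value of 6 in the output above is consistent with this convention. The helper mallows_cp(), the mtcars data, and the candidate models are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: Mallows' Cp of `fit` relative to the full model.
# k = number of coefficients in `fit`, intercept included.
mallows_cp <- function(fit, full_fit) {
  n <- nobs(full_fit)
  p_full <- length(coef(full_fit))
  sigma2 <- sum(residuals(full_fit)^2) / (n - p_full)  # error variance from full model
  k <- length(coef(fit))
  sum(residuals(fit)^2) / sigma2 + 2 * k - n
}

# Assumption: mtcars and these formulas are illustrative only.
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
candidates <- list(
  wt      = lm(mpg ~ wt, data = mtcars),
  wt_hp   = lm(mpg ~ wt + hp, data = mtcars),
  wt_qsec = lm(mpg ~ wt + qsec, data = mtcars),
  full    = full
)
cps <- vapply(candidates, mallows_cp, numeric(1), full_fit = full)
cps                    # the full model's Cp is 6 (5 predictors + intercept)
names(which.min(cps))  # candidate with the smallest Cp
```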
plot(rsubsets, scale='Cp')
To read this plot, start with the row that has the smallest \(C_p\) value (here about 1.43) and look at which columns contain black (filled) cells; in this case they are Days and SexMale.
So if we were to choose the covariates based only on \(C_p\), we would select Days and Sex.
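Here the full model has five covariates plus an intercept, i.e. \(p = 6\) parameters, and its \(C_p\) equals 6 by construction: with \(\hat\sigma^2 = \mathrm{RSS}_{\text{full}}/(n-6)\), we get \(C_p = \mathrm{RSS}_{\text{full}}/\hat\sigma^2 + 2 \cdot 6 - n = (n-6) + 12 - n = 6\). Any submodel with \(C_p < 6\) therefore has a smaller estimated prediction error than the full model, so besides Days and Sex (\(C_p \approx 1.43\)), the best three- and four-covariate models (\(C_p \approx 2.01\) and \(4.00\)) are also reasonable choices. To see whether stepwise searches agree with the exhaustive search, we repeat the selection, starting with forward selection: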
rsubsets <- regsubsets(logY ~ Days+Sex+Weight+Loss+Tumor,
data=data,
method='forward')
sforward <- summary(rsubsets, matrix.logical=TRUE)
sforward
## Subset selection object
## Call: regsubsets.formula(logY ~ Days + Sex + Weight + Loss + Tumor,
## data = data, method = "forward")
## 5 Variables (and intercept)
## Forced in Forced out
## Days FALSE FALSE
## SexMale FALSE FALSE
## Weight FALSE FALSE
## Loss FALSE FALSE
## Tumor FALSE FALSE
## 1 subsets of each size up to 5
## Selection Algorithm: forward
## Days SexMale Weight Loss Tumor
## 1 ( 1 ) FALSE TRUE FALSE FALSE FALSE
## 2 ( 1 ) TRUE TRUE FALSE FALSE FALSE
## 3 ( 1 ) TRUE TRUE TRUE FALSE FALSE
## 4 ( 1 ) TRUE TRUE TRUE FALSE TRUE
## 5 ( 1 ) TRUE TRUE TRUE TRUE TRUE
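Forward selection is greedy: starting from the intercept-only model, it adds at each step the predictor that lowers the RSS the most, so the subsets are nested (each row of the table above contains the previous one). The base-R sketch below illustrates this idea; forward_path() and rss_of() are hypothetical helpers, and mtcars is used only as stand-in data, so this is not leaps' own implementation.

```r
# Hypothetical helper: RSS of the linear model response ~ vars.
rss_of <- function(vars, response, data) {
  sum(residuals(lm(reformulate(vars, response = response), data = data))^2)
}

# Greedy forward selection by RSS; returns the chosen subset at each size.
forward_path <- function(response, predictors, data) {
  chosen <- character(0)
  remaining <- predictors
  path <- list()
  while (length(remaining) > 0) {
    # RSS after adding each remaining predictor in turn
    rss <- vapply(remaining, function(v) rss_of(c(chosen, v), response, data),
                  numeric(1))
    add <- remaining[which.min(rss)]
    chosen <- c(chosen, add)
    remaining <- setdiff(remaining, add)
    path[[length(chosen)]] <- chosen
  }
  path
}

# Assumption: illustrative data and predictors.
preds <- c("wt", "hp", "qsec", "drat", "disp")
fp <- forward_path("mpg", preds, mtcars)
fp
```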
rsubsets <- regsubsets(logY ~ Days+Sex+Weight+Loss+Tumor,
data=data,
method='backward')
sbackward <- summary(rsubsets, matrix.logical=TRUE)
sbackward
## Subset selection object
## Call: regsubsets.formula(logY ~ Days + Sex + Weight + Loss + Tumor,
## data = data, method = "backward")
## 5 Variables (and intercept)
## Forced in Forced out
## Days FALSE FALSE
## SexMale FALSE FALSE
## Weight FALSE FALSE
## Loss FALSE FALSE
## Tumor FALSE FALSE
## 1 subsets of each size up to 5
## Selection Algorithm: backward
## Days SexMale Weight Loss Tumor
## 1 ( 1 ) FALSE TRUE FALSE FALSE FALSE
## 2 ( 1 ) TRUE TRUE FALSE FALSE FALSE
## 3 ( 1 ) TRUE TRUE TRUE FALSE FALSE
## 4 ( 1 ) TRUE TRUE TRUE FALSE TRUE
## 5 ( 1 ) TRUE TRUE TRUE TRUE TRUE
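Backward elimination runs the greedy search in the opposite direction: start from the full model and, at each step, drop the predictor whose removal increases the RSS the least. A minimal base-R sketch follows; backward_path() and rss_of() are hypothetical helpers and mtcars is stand-in data, not the case1102 data or the leaps code.

```r
# Hypothetical helper: RSS of the linear model response ~ vars.
rss_of <- function(vars, response, data) {
  sum(residuals(lm(reformulate(vars, response = response), data = data))^2)
}

# Greedy backward elimination by RSS; returns the retained subset at each size.
backward_path <- function(response, predictors, data) {
  current <- predictors
  path <- list()
  path[[length(current)]] <- current
  while (length(current) > 1) {
    # RSS after dropping each current predictor in turn
    rss <- vapply(current, function(v) rss_of(setdiff(current, v), response, data),
                  numeric(1))
    drop <- current[which.min(rss)]
    current <- setdiff(current, drop)
    path[[length(current)]] <- current
  }
  path
}

# Assumption: illustrative data and predictors.
preds <- c("wt", "hp", "qsec", "drat", "disp")
bp <- backward_path("mpg", preds, mtcars)
bp
```

When the forward and backward paths coincide, as they do for the case1102 data above, that is some reassurance that the greedy searches have not missed an obviously better subset, though it is not a guarantee.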
rsubsets <- regsubsets(logY ~ Days+Sex+Weight+Loss+Tumor,
data=data,
method="seqrep")
sboth <- summary(rsubsets, matrix.logical=TRUE)
sboth
## Subset selection object
## Call: regsubsets.formula(logY ~ Days + Sex + Weight + Loss + Tumor,
## data = data, method = "seqrep")
## 5 Variables (and intercept)
## Forced in Forced out
## Days FALSE FALSE
## SexMale FALSE FALSE
## Weight FALSE FALSE
## Loss FALSE FALSE
## Tumor FALSE FALSE
## 1 subsets of each size up to 5
## Selection Algorithm: 'sequential replacement'
## Days SexMale Weight Loss Tumor
## 1 ( 1 ) FALSE TRUE FALSE FALSE FALSE
## 2 ( 1 ) TRUE TRUE FALSE FALSE FALSE
## 3 ( 1 ) TRUE TRUE TRUE FALSE FALSE
## 4 ( 1 ) TRUE TRUE TRUE FALSE TRUE
## 5 ( 1 ) TRUE TRUE TRUE TRUE TRUE
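Sequential replacement refines a subset of fixed size by swapping one predictor in the model for one outside it whenever that lowers the RSS. The sketch below is a simplified swap-based local search in that spirit, not a reproduction of leaps' "seqrep" routine, whose details (for example how the starting subset is chosen) may differ. The helpers seqrep_subset() and rss_of(), the starting subset, and the mtcars data are all illustrative assumptions.

```r
# Hypothetical helper: RSS of the linear model response ~ vars.
rss_of <- function(vars, response, data) {
  sum(residuals(lm(reformulate(vars, response = response), data = data))^2)
}

# Improve a fixed-size subset by single swaps until no swap lowers the RSS.
seqrep_subset <- function(start, response, predictors, data) {
  current <- start
  best_rss <- rss_of(current, response, data)
  repeat {
    improved <- FALSE
    # Try every (out, in) swap and remember the best one
    for (out_var in current) {
      for (in_var in setdiff(predictors, current)) {
        cand <- c(setdiff(current, out_var), in_var)
        r <- rss_of(cand, response, data)
        if (r < best_rss - 1e-12) {
          best_rss <- r
          best_cand <- cand
          improved <- TRUE
        }
      }
    }
    if (!improved) break
    current <- best_cand
  }
  list(vars = current, rss = best_rss)
}

# Assumption: illustrative data, predictors, and starting subset.
preds <- c("wt", "hp", "qsec", "drat", "disp")
res <- seqrep_subset(c("drat", "disp"), "mpg", preds, mtcars)
res
```

The result is only locally optimal (no single swap improves it), which is why sequential replacement, like forward and backward selection, can in principle disagree with an exhaustive search.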
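From the above, we conclude that in this case all four methods (exhaustive search, forward selection, backward elimination, and sequential replacement) give the same subsets: Sex alone; then Days and Sex; then Days, Sex and Weight; then Tumor added; and finally all five covariates. In particular, the \(C_p\)-optimal model, Days + Sex, does not depend on the search method. Such agreement is not guaranteed in general, because the stepwise methods are greedy and can miss the best subset of a given size.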