library(ggplot2)
data <- read.csv('data_ch9_16.csv', header=T)
data$PollenRemovedlogit = log(data$PollenRemoved/(1-data$PollenRemoved))
data$DurationOfVisitlog = log(data$DurationOfVisit)
ggplot(data, aes(x=DurationOfVisitlog, y=PollenRemovedlogit, color=BeeType)) +
geom_point(shape=1) +
scale_colour_hue(l=50) +
geom_smooth(method=lm,
se=FALSE)
lmfit <- lm(PollenRemovedlogit ~ BeeType + DurationOfVisitlog
+ BeeType*DurationOfVisitlog, data=data)
summary(lmfit)
##
## Call:
## lm(formula = PollenRemovedlogit ~ BeeType + DurationOfVisitlog +
## BeeType * DurationOfVisitlog, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3803 -0.3699 0.0307 0.4552 1.1611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.0390 0.5115 -5.941 4.45e-07 ***
## BeeTypeWorker 1.3770 0.8722 1.579 0.122
## DurationOfVisitlog 1.0121 0.1902 5.321 3.52e-06 ***
## BeeTypeWorker:DurationOfVisitlog -0.2709 0.2817 -0.962 0.342
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6525 on 43 degrees of freedom
## Multiple R-squared: 0.6151, Adjusted R-squared: 0.5882
## F-statistic: 22.9 on 3 and 43 DF, p-value: 5.151e-09
r <- residuals(lmfit)
yh <- predict(lmfit)
p1<-ggplot(lmfit, aes(.fitted, .resid))+geom_point()
p1 <- p1 +geom_hline(yintercept=0)+geom_smooth() +
geom_text(aes(label=ifelse((.resid>4*IQR(.resid)|.fitted>4*IQR(.fitted)),paste('', "\n", .fitted, ",", .resid),"")), hjust=1.1)
p1
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
From the residual plot, there seem to be no outliers(see outlier detection part in the last code chunk wheere a outlier is defined if it is greater than 4*IQR(x)).
Also the p-value of cross interaction term \(BeeTypeWorker:DurationofVisitlog\) is 0.342 and hence at a significance level of 0.05 can be safely neglected.
\[ SS(\beta_0,\beta_1\dots \beta_n) = \sum_{i=1}^Nw_i(Y_i-\beta_0-\beta_1X_{1i}-\beta_2X_{2i}-\cdots-\beta_pX_{pi})^2 \]
Similarly,
To prove that this is indeed the minimum, we need to show that \[\frac{\partial^2 SS}{\partial \beta_i^2}\] is convex:
Similarly for any \(1 \leq j \leq p\):And for \[ k \neq j \]: