Lecture 09: Estriol and birthweight linear models
biostatistics, healthcare, statistics, R, statistical testing, power analysis, GLM, regression
Overview
This notebook demonstrates linear regression analysis using:
- Estriol and birthweight: Simple linear regression
- Age, Birthweight, and Systolic Blood Pressure: Multiple linear regression
Setup
library(tidyverse)
library(ggpubr)
theme_set(theme_pubr())
Example 1: Estriol and Birthweight
Data
Examining the relationship between maternal estriol levels and infant birthweight.
# Estriol (mg/24 hr)
estriol <- c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24)
# Birthweight (g/100)
birthweight <- c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)
# Create a data frame
data <- data.frame(estriol = estriol, birthweight = birthweight)
# View the data
print(data)
   estriol birthweight
1 7 25
2 9 25
3 9 25
4 12 27
5 14 27
6 16 27
7 16 24
8 14 30
9 16 30
10 16 31
11 17 30
12 19 31
13 21 30
14 24 28
15 15 32
16 16 32
17 17 32
18 25 32
19 27 34
20 15 34
21 15 34
22 15 35
23 16 35
24 19 34
25 18 35
26 17 36
27 18 37
28 20 38
29 22 40
30 25 39
31 24 43
Linear Model
Fitting a simple linear regression model:
\[\text{birthweight} = \beta_0 + \beta_1 \times \text{estriol} + \epsilon\]
fit <- lm(birthweight ~ estriol, data = data)
summary(fit)
Call:
lm(formula = birthweight ~ estriol, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.1200 -2.0381 -0.0381 3.3537 6.8800
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.5234 2.6204 8.214 4.68e-09 ***
estriol 0.6082 0.1468 4.143 0.000271 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.821 on 29 degrees of freedom
Multiple R-squared: 0.3718, Adjusted R-squared: 0.3501
F-statistic: 17.16 on 1 and 29 DF, p-value: 0.0002712
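The slope and intercept reported by lm() can be reproduced from the closed-form least-squares formulas, b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). The following is a minimal sketch using the same vectors as above; the names sxx, sxy, b0, b1, and fit_check are illustrative and are not part of the original notebook.

```r
estriol <- c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
             17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24)
birthweight <- c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
                 32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)

# Corrected sum of squares for x and corrected sum of cross-products
sxx <- sum((estriol - mean(estriol))^2)
sxy <- sum((estriol - mean(estriol)) * (birthweight - mean(birthweight)))

# Least-squares estimates
b1 <- sxy / sxx
b0 <- mean(birthweight) - b1 * mean(estriol)
c(intercept = b0, slope = b1)

# Compare against lm(); all.equal() returns TRUE when the two agree
fit_check <- lm(birthweight ~ estriol)
all.equal(unname(coef(fit_check)), c(b0, b1))
```

The hand calculation should match the Estimate column of the summary output up to rounding.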
Interpretation:
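- Each 1 mg/24 hr increase in estriol is associated with an estimated 0.61-unit increase in birthweight, about 61 g, since birthweight is recorded in units of 100 g
- The slope differs significantly from zero (t = 4.143, p = 0.00027)
- R² = 0.37: estriol explains about 37% of the variance in birthweight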
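The quantities used in this interpretation can also be pulled out of the fitted object directly rather than read off the printed summary. This sketch assumes the same data as above; est_data and fit_est are illustrative names, and the estriol level of 15 mg/24 hr used for the prediction is a hypothetical value chosen only for demonstration.

```r
est_data <- data.frame(
  estriol = c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
              17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24),
  birthweight = c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
                  32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)
)
fit_est <- lm(birthweight ~ estriol, data = est_data)

# 95% confidence intervals for the intercept and slope
confint(fit_est, level = 0.95)

# Proportion of variance explained
summary(fit_est)$r.squared

# Estimated mean birthweight (in units of 100 g) at a hypothetical estriol level,
# with a 95% confidence interval for that mean
predict(fit_est, newdata = data.frame(estriol = 15), interval = "confidence")
```

A confidence interval for the slope that excludes zero leads to the same conclusion as the p-value in the summary table.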
Example 2: Multiple Linear Regression
Data
Analyzing systolic blood pressure (SBP) as a function of age and birthweight.
# Age (days)
age <- c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3)
# Birthweight (oz)
birthweight <- c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                 120, 95, 120, 150, 160, 125)
# Systolic Blood Pressure in mm Hg (y)
sbp <- c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)
data <- data.frame(
age = age,
birthweight = birthweight,
sbp = sbp
)
print(data)
   age birthweight sbp
1 3 135 89
2 4 120 90
3 3 100 83
4 2 105 77
5 4 130 92
6 5 125 98
7 2 125 82
8 3 105 85
9 5 120 96
10 4 90 95
11 2 120 80
12 3 95 79
13 3 120 86
14 4 150 97
15 3 160 92
16 3 125 88
Multiple Linear Regression Model
Fitting a model with both predictors:
\[\text{SBP} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{birthweight} + \epsilon\]
fit <- lm(sbp ~ age + birthweight, data = data)
summary(fit)
Call:
lm(formula = sbp ~ age + birthweight, data = data)
Residuals:
Min 1Q Median 3Q Max
-4.0438 -1.3481 -0.2395 0.9688 6.6964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.45019 4.53189 11.794 2.57e-08 ***
age 5.88772 0.68021 8.656 9.34e-07 ***
birthweight 0.12558 0.03434 3.657 0.0029 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.479 on 13 degrees of freedom
Multiple R-squared: 0.8809, Adjusted R-squared: 0.8626
F-statistic: 48.08 on 2 and 13 DF, p-value: 9.844e-07
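With more than one predictor, the least-squares estimates solve the normal equations (X'X) b = X'y, where X is the design matrix with a leading column of ones. The sketch below reproduces the coefficient estimates that way; X and beta_hat are illustrative names not used in the original notebook.

```r
age <- c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3)
birthweight <- c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                 120, 95, 120, 150, 160, 125)
sbp <- c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)

# Design matrix: intercept column plus the two predictors
X <- cbind(intercept = 1, age = age, birthweight = birthweight)

# Solve the normal equations (X'X) b = X'y without forming an explicit inverse
beta_hat <- solve(crossprod(X), crossprod(X, sbp))
beta_hat
```

lm() uses a QR decomposition internally rather than the normal equations, but for a small, well-conditioned problem like this one the two approaches should agree to many decimal places.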
Interpretation:
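- Age coefficient: each one-unit increase in age is associated with an estimated 5.89 mm Hg increase in SBP, holding birthweight constant (p < 0.001)
- Birthweight coefficient: each one-unit increase in birthweight is associated with an estimated 0.126 mm Hg increase in SBP, holding age constant (p = 0.0029)
- Adjusted R² = 0.86: R² penalized for the number of predictors; together the two predictors explain most of the variation in SBP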
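One way to make the "holding constant" interpretation concrete is to predict SBP for a specific infant. This sketch assumes the same data as above; sbp_data, fit_full, and new_infant are illustrative names, and the values age = 3 and birthweight = 128 (in the same units as the data) are hypothetical.

```r
sbp_data <- data.frame(
  age = c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3),
  birthweight = c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                  120, 95, 120, 150, 160, 125),
  sbp = c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)
)
fit_full <- lm(sbp ~ age + birthweight, data = sbp_data)

# 95% confidence intervals for each coefficient
confint(fit_full)

# Hypothetical infant: predicted SBP with a 95% prediction interval
new_infant <- data.frame(age = 3, birthweight = 128)
pred <- predict(fit_full, newdata = new_infant, interval = "prediction")
pred
```

The prediction interval describes a single new infant and is therefore wider than a confidence interval for the mean SBP at the same predictor values (interval = "confidence").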
Individual Predictor Models
For comparison, fitting models with single predictors:
Age Only
fit <- lm(sbp ~ age, data = data)
summary(fit)
Call:
lm(formula = sbp ~ age, data = data)
Residuals:
Min 1Q Median 3Q Max
-7.1395 -2.3314 -0.2163 2.1872 5.8605
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 67.6791 3.1906 21.212 4.84e-12 ***
age 6.1535 0.9283 6.629 1.13e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.403 on 14 degrees of freedom
Multiple R-squared: 0.7584, Adjusted R-squared: 0.7411
F-statistic: 43.94 on 1 and 14 DF, p-value: 1.135e-05
Birthweight Only
fit <- lm(sbp ~ birthweight, data = data)
summary(fit)
Call:
lm(formula = sbp ~ birthweight, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.653 -3.000 -1.087 2.877 11.707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.13333 10.40990 6.641 1.11e-05 ***
birthweight 0.15733 0.08556 1.839 0.0872 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.213 on 14 degrees of freedom
Multiple R-squared: 0.1946, Adjusted R-squared: 0.137
F-statistic: 3.382 on 1 and 14 DF, p-value: 0.08722
Comparison
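Compare the coefficients and R² values across models:
- Full model: age 5.89 (p < 0.001), birthweight 0.126 (p = 0.0029), R² = 0.88
- Age only: age 6.15 (p < 0.001), R² = 0.76
- Birthweight only: birthweight 0.157 (p = 0.087), R² = 0.19
Notice how coefficients and significance can change when predictors are added or removed. The coefficients themselves change only modestly here, but birthweight, which is not significant on its own, becomes significant once age is in the model. Age accounts for much of the variation in SBP, so the residual standard error falls from 6.21 to 2.48 and the standard error of the birthweight coefficient falls from 0.086 to 0.034. Larger shifts in the coefficients would be expected when predictors are strongly correlated with each other or when one confounds the effect of the other.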
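Because each chunk above overwrites fit, a side-by-side comparison is easier if the three models are stored under separate names. The following is a sketch under that assumption; sbp_data, fit_full, fit_age, fit_bw, and comparison are illustrative names not used in the original notebook.

```r
sbp_data <- data.frame(
  age = c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3),
  birthweight = c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                  120, 95, 120, 150, 160, 125),
  sbp = c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)
)

fit_full <- lm(sbp ~ age + birthweight, data = sbp_data)
fit_age  <- lm(sbp ~ age, data = sbp_data)
fit_bw   <- lm(sbp ~ birthweight, data = sbp_data)
models <- list(full = fit_full, age_only = fit_age, birthweight_only = fit_bw)

# One row per model: fit statistics taken from summary()
comparison <- data.frame(
  model         = names(models),
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  residual_se   = sapply(models, function(m) summary(m)$sigma),
  row.names     = NULL
)
comparison

# Correlation between the two predictors
predictor_cor <- cor(sbp_data$age, sbp_data$birthweight)
predictor_cor

# Partial F-test: does adding birthweight improve on the age-only model?
anova(fit_age, fit_full)
```

With a single added term, the partial F-test from anova() is equivalent to the t-test for birthweight in the full model, so the two p-values should coincide.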