Lecture 09: Estriol and birthweight linear models

Keywords

biostatistics, healthcare, statistics, R, statistical testing, power analysis, GLM, regression

Overview

This notebook demonstrates linear regression analysis in R using two worked examples:

  1. Estriol and birthweight: Simple linear regression
  2. Age, Birthweight, and Systolic Blood Pressure: Multiple linear regression

Setup

library(tidyverse)
library(ggpubr)
theme_set(theme_pubr())

Example 1: Estriol and Birthweight

Data

This example examines the relationship between maternal estriol levels and infant birthweight in 31 mother-infant pairs.

# Estriol (mg/24 hr)
estriol <- c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
             17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24)

# Birthweight (g/100)
birthweight <- c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
                 32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)

# Create a data frame
data <- data.frame(estriol = estriol, birthweight = birthweight)

# View the data
print(data)
   estriol birthweight
1        7          25
2        9          25
3        9          25
4       12          27
5       14          27
6       16          27
7       16          24
8       14          30
9       16          30
10      16          31
11      17          30
12      19          31
13      21          30
14      24          28
15      15          32
16      16          32
17      17          32
18      25          32
19      27          34
20      15          34
21      15          34
22      15          35
23      16          35
24      19          34
25      18          35
26      17          36
27      18          37
28      20          38
29      22          40
30      25          39
31      24          43

Linear Model

Fitting a simple linear regression model:

\[\text{birthweight} = \beta_0 + \beta_1 \times \text{estriol} + \epsilon\]

fit <- lm(birthweight ~ estriol, data = data)
summary(fit)

Call:
lm(formula = birthweight ~ estriol, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.1200 -2.0381 -0.0381  3.3537  6.8800 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  21.5234     2.6204   8.214 4.68e-09 ***
estriol       0.6082     0.1468   4.143 0.000271 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.821 on 29 degrees of freedom
Multiple R-squared:  0.3718,    Adjusted R-squared:  0.3501 
F-statistic: 17.16 on 1 and 29 DF,  p-value: 0.0002712
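
The coefficient estimates above can be reproduced from the closed-form least-squares formulas, which may help show what lm() computes. This is an illustrative sketch only: the names ex1, fit1, b0_hand, and b1_hand are introduced here and are not part of the original notebook.

```r
# Example 1 data, repeated so the block runs on its own
estriol <- c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
             17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24)
birthweight <- c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
                 32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)
ex1 <- data.frame(estriol = estriol, birthweight = birthweight)

# Slope = sample covariance of (x, y) divided by sample variance of x
b1_hand <- cov(ex1$estriol, ex1$birthweight) / var(ex1$estriol)

# Intercept forces the fitted line through the point of means
b0_hand <- mean(ex1$birthweight) - b1_hand * mean(ex1$estriol)

# Compare with the estimates returned by lm()
fit1 <- lm(birthweight ~ estriol, data = ex1)
rbind(by_hand = c(b0_hand, b1_hand), lm = coef(fit1))
```

Both rows should agree with the Estimate column of the summary (intercept about 21.52, slope about 0.608).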

Interpretation:

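  • Slope: each 1 mg/24 hr increase in estriol is associated with an estimated 0.61-unit increase in birthweight. Because birthweight is recorded in g/100, that is about 61 g
  • Significance: the slope's p-value (0.00027) is well below 0.05, so the data give strong evidence of a linear association
  • R²: estriol explains about 37% of the variance in birthweight (R² = 0.372), so most of the variation remains unexplained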
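
To put numbers on these points, the following sketch extracts the slope, a 95% confidence interval, and a predicted mean birthweight. It assumes the same data as above; the names ex1, fit1, slope_grams, and new_obs are hypothetical, and the estriol value of 15 is chosen purely for illustration.

```r
# Example 1 data, repeated so the block runs on its own
estriol <- c(7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16,
             17, 25, 27, 15, 15, 15, 16, 19, 18, 17, 18, 20, 22, 25, 24)
birthweight <- c(25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32,
                 32, 32, 34, 34, 34, 35, 35, 34, 35, 36, 37, 38, 40, 39, 43)
ex1 <- data.frame(estriol = estriol, birthweight = birthweight)
fit1 <- lm(birthweight ~ estriol, data = ex1)

# Slope in model units (g/100 per mg/24 hr), then converted to grams
slope <- coef(fit1)[["estriol"]]
slope_grams <- slope * 100

# 95% confidence interval for the slope
ci <- confint(fit1, "estriol", level = 0.95)

# In simple linear regression, R-squared is the squared Pearson correlation
r2 <- summary(fit1)$r.squared
r <- cor(ex1$estriol, ex1$birthweight)

# Estimated mean birthweight (g/100) at estriol = 15, with a 95% CI for the mean
new_obs <- data.frame(estriol = 15)  # illustrative value, not from the source
pred <- predict(fit1, newdata = new_obs, interval = "confidence")

slope_grams
ci
c(r_squared = r2, r_squared_from_cor = r^2)
pred
```

Using interval = "prediction" instead would give the wider interval for a single new infant rather than for the mean.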

Example 2: Multiple Linear Regression

Data

This example models infant systolic blood pressure (SBP) as a function of age and birthweight in 16 infants.

# Age (x1)
age <- c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3)

# Birthweight (x2); different scale from Example 1, and this overwrites that vector
birthweight <- c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                 120, 95, 120, 150, 160, 125)

# Systolic Blood Pressure in mm Hg (y)
sbp <- c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)

data <- data.frame(
  age = age,
  birthweight = birthweight,
  sbp = sbp
)

print(data)
   age birthweight sbp
1    3         135  89
2    4         120  90
3    3         100  83
4    2         105  77
5    4         130  92
6    5         125  98
7    2         125  82
8    3         105  85
9    5         120  96
10   4          90  95
11   2         120  80
12   3          95  79
13   3         120  86
14   4         150  97
15   3         160  92
16   3         125  88

Multiple Linear Regression Model

Fitting a model with both predictors:

\[\text{SBP} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{birthweight} + \epsilon\]

fit <- lm(sbp ~ age + birthweight, data = data)
summary(fit)

Call:
lm(formula = sbp ~ age + birthweight, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0438 -1.3481 -0.2395  0.9688  6.6964 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 53.45019    4.53189  11.794 2.57e-08 ***
age          5.88772    0.68021   8.656 9.34e-07 ***
birthweight  0.12558    0.03434   3.657   0.0029 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.479 on 13 degrees of freedom
Multiple R-squared:  0.8809,    Adjusted R-squared:  0.8626 
F-statistic: 48.08 on 2 and 13 DF,  p-value: 9.844e-07

Interpretation:

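  • Age coefficient: SBP is an estimated 5.89 mm Hg higher per one-unit increase in age, holding birthweight constant (p = 9.3e-07)
  • Birthweight coefficient: SBP is an estimated 0.126 mm Hg higher per one-unit increase in birthweight, holding age constant (p = 0.0029)
  • Adjusted R²: 0.863, slightly below the multiple R² of 0.881 because it penalizes the number of predictors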
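
As a rough illustration of "holding the other predictor constant", the sketch below refits the model, reports 95% confidence intervals, and predicts SBP for one hypothetical infant. The names ex2, fit_full, and new_infant, and the chosen values (age 3, birthweight 120), are assumptions made for this example.

```r
# Example 2 data, repeated so the block runs on its own
age <- c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3)
birthweight <- c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                 120, 95, 120, 150, 160, 125)
sbp <- c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)
ex2 <- data.frame(age = age, birthweight = birthweight, sbp = sbp)

fit_full <- lm(sbp ~ age + birthweight, data = ex2)

# 95% confidence intervals for the intercept and both partial slopes
ci_full <- confint(fit_full)

# Predicted SBP for a hypothetical infant (values chosen for illustration)
new_infant <- data.frame(age = 3, birthweight = 120)
pred_sbp <- predict(fit_full, newdata = new_infant)

# Same prediction written out from the fitted equation
b <- coef(fit_full)
pred_by_hand <- b[["(Intercept)"]] + b[["age"]] * 3 + b[["birthweight"]] * 120

ci_full
c(predict = unname(pred_sbp), by_hand = pred_by_hand)
```

The two predictions should match at roughly 86 mm Hg, which is the fitted equation evaluated at the chosen values.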

Individual Predictor Models

For comparison, fit a separate model for each predictor on its own:

Age Only

fit <- lm(sbp ~ age, data = data)
summary(fit)

Call:
lm(formula = sbp ~ age, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.1395 -2.3314 -0.2163  2.1872  5.8605 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  67.6791     3.1906  21.212 4.84e-12 ***
age           6.1535     0.9283   6.629 1.13e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.403 on 14 degrees of freedom
Multiple R-squared:  0.7584,    Adjusted R-squared:  0.7411 
F-statistic: 43.94 on 1 and 14 DF,  p-value: 1.135e-05

Birthweight Only

fit <- lm(sbp ~ birthweight, data = data)
summary(fit)

Call:
lm(formula = sbp ~ birthweight, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-8.653 -3.000 -1.087  2.877 11.707 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 69.13333   10.40990   6.641 1.11e-05 ***
birthweight  0.15733    0.08556   1.839   0.0872 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.213 on 14 degrees of freedom
Multiple R-squared:  0.1946,    Adjusted R-squared:  0.137 
F-statistic: 3.382 on 1 and 14 DF,  p-value: 0.08722

Comparison

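Compare the coefficients and R² values across the three models:

  • Full model: age 5.89 (p = 9.3e-07), birthweight 0.126 (p = 0.0029), R² = 0.881
  • Age only: age 6.15 (p = 1.1e-05), R² = 0.758
  • Birthweight only: birthweight 0.157 (p = 0.087), R² = 0.195

Coefficients and significance can change when predictors are added or removed. When predictors are correlated with each other, a single-predictor coefficient absorbs part of the other variable's effect (confounding). Residual variance matters too: the birthweight estimate falls slightly between the single-predictor and full models (0.157 to 0.126), but its standard error falls much more (0.086 to 0.034) because age accounts for much of the remaining variation in SBP (residual standard error 6.21 versus 2.48). As a result, birthweight is significant in the full model but not on its own.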
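
One way to line the three fits up side by side is sketched below. It stores each model under its own name rather than reusing fit, checks how strongly the two predictors are correlated, and runs a nested-model F-test for adding birthweight to the age-only model. The names ex2, models, comparison, r_predictors, and nested_test are introduced here for illustration.

```r
# Example 2 data, repeated so the block runs on its own
age <- c(3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3)
birthweight <- c(135, 120, 100, 105, 130, 125, 125, 105, 120, 90,
                 120, 95, 120, 150, 160, 125)
sbp <- c(89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88)
ex2 <- data.frame(age = age, birthweight = birthweight, sbp = sbp)

# Keep the three fits in a named list instead of overwriting one object
models <- list(
  "age + birthweight" = lm(sbp ~ age + birthweight, data = ex2),
  "age only"          = lm(sbp ~ age, data = ex2),
  "birthweight only"  = lm(sbp ~ birthweight, data = ex2)
)

# One row per model: fit statistics taken from summary() and sigma()
comparison <- data.frame(
  model         = names(models),
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  residual_se   = sapply(models, sigma),
  row.names     = NULL
)
print(comparison)

# Correlation between the two predictors
r_predictors <- cor(ex2$age, ex2$birthweight)
r_predictors

# Nested F-test: does birthweight improve on the age-only model?
nested_test <- anova(models[["age only"]], models[["age + birthweight"]])
nested_test
```

Because the nested test adds a single term, its p-value should equal the birthweight t-test p-value in the full model summary.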