7.4 Logistic regression

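With a binary outcome measure, logistic regression is generally more appropriate than linear (OLS) regression because it models the probability of the outcome directly and keeps predicted probabilities between 0 and 1. Use the glm() function to estimate a generalized linear model, and set the family argument to binomial (whose default link is the logit).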
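In a logistic model, the predictors enter linearly on the log-odds scale rather than on the probability scale. For the model fitted below, with \(p_i\) denoting the probability that school \(i\) is above average in math, the specification is:

\[
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1\,\text{ProfLang}_i + \beta_2\,\text{NumTested}_i,
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{ProfLang}_i + \beta_2\,\text{NumTested}_i)}}
\]

Because the coefficients are on the log-odds scale, \(e^{\beta_1}\) is the multiplicative change in the odds for a one-unit increase in ProfLang, holding NumTested fixed.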

# create binary measure of "above average math proficiency"
  library(dplyr)  # provides %>%, mutate(), and if_else()
  dcps =
    dcps %>%
    mutate(AboveAvgMath = if_else(ProfMath > mean(ProfMath), 1, 0))

  Model3 = 
    glm(
      AboveAvgMath ~ ProfLang + NumTested,  # specify model
      family = 'binomial',  # logistic estimation
      data = dcps
    )
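Since the model is linear on the log-odds scale, predict() can return either the linear predictor (`type = "link"`) or fitted probabilities (`type = "response"`), and the two are related by the inverse logit. The following is a minimal, self-contained sketch on a simulated, hypothetical data frame (`sim`, with an assumed data-generating process) that only reuses the column names above; it is not the DC Public Schools data, so its estimates will differ from Model3.

```r
# Hypothetical simulated data: NOT the dcps data, only the same column names
set.seed(42)
n <- 200
sim <- data.frame(
  ProfLang  = runif(n, 10, 90),          # assumed percent proficient in language
  NumTested = round(runif(n, 20, 300))   # assumed number of students tested
)
# Draw a 0/1 outcome from an assumed logistic relationship
true_eta <- -3 + 0.08 * sim$ProfLang - 0.002 * sim$NumTested
sim$AboveAvgMath <- rbinom(n, size = 1, prob = plogis(true_eta))

fit <- glm(AboveAvgMath ~ ProfLang + NumTested,
           family = binomial, data = sim)

eta  <- predict(fit, type = "link")      # fitted log-odds for each school
prob <- predict(fit, type = "response")  # fitted probabilities

# The inverse logit maps log-odds to probabilities: p = 1 / (1 + exp(-eta))
manual_prob <- 1 / (1 + exp(-eta))
all.equal(unname(prob), unname(plogis(eta)))
all.equal(unname(prob), unname(manual_prob))
```

Applied to the real model, `predict(Model3, type = "response")` would give each school's estimated probability of being above average in math.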

To view the coefficient estimates and evaluate hypotheses, again apply the summary() function to the model object.

# View estimates
  summary(Model3)  
## 
## Call:
## glm(formula = AboveAvgMath ~ ProfLang + NumTested, family = "binomial", 
##     data = dcps)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.938  -0.547  -0.351   0.213   2.115  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.22427    0.61253   -5.26  1.4e-07 ***
## ProfLang     0.11366    0.02412    4.71  2.4e-06 ***
## NumTested   -0.00239    0.00229   -1.04      0.3    
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 144.342  on 107  degrees of freedom
## Residual deviance:  77.127  on 105  degrees of freedom
## AIC: 83.13
## 
## Number of Fisher Scoring iterations: 6
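The deviances at the bottom of the summary() output support an overall likelihood-ratio test of the model against an intercept-only model: the drop in deviance is compared with a chi-squared distribution whose degrees of freedom equal the number of added predictors. A minimal sketch using the values printed above (with `Model3` in memory, `anova(Model3, test = "Chisq")` reports sequential tests directly from the fitted object):

```r
# Values copied from the summary(Model3) output above
null_dev  <- 144.342    # intercept-only model, 107 df
resid_dev <- 77.127     # model with ProfLang and NumTested, 105 df
df_diff   <- 107 - 105  # two predictors added

lr_stat <- null_dev - resid_dev                                # about 67.215
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)   # far below 0.001
lr_stat
p_value

# For a 0/1 outcome, AIC = residual deviance + 2 * (number of coefficients)
aic <- resid_dev + 2 * 3  # 83.127, which rounds to the reported AIC of 83.13
aic
```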
# Odds ratios: exponentiate the log-odds coefficients
  exp(coef(Model3))
## (Intercept)    ProfLang   NumTested 
##     0.03978     1.12037     0.99761

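The results indicate that each one-percentage-point increase in a school’s language proficiency is associated with an expected 12% increase in the odds of being above average in math (odds ratio 1.12), holding the number of students tested constant. Again, the effect is significant (\(p < 0.001\)). The number of students tested, by contrast, is not a significant predictor (\(p = 0.3\)).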
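The 12% figure comes from converting the log-odds coefficient into a percentage change in the odds. Because effects multiply on the odds scale, a larger change compounds rather than adding up: a 10-point increase multiplies the odds by \(e^{10\beta_1}\), not by \(1 + 10 \times 0.12\). A short sketch of the arithmetic using the ProfLang estimate reported above:

```r
# ProfLang coefficient (log-odds scale) copied from summary(Model3)
b_lang <- 0.11366

or_1  <- exp(b_lang)       # odds ratio for a 1-point increase, about 1.12
pct_1 <- (or_1 - 1) * 100  # percent change in the odds, about 12%
or_10 <- exp(10 * b_lang)  # odds ratio for a 10-point increase, equal to or_1^10

round(c(or_1 = or_1, pct_1 = pct_1, or_10 = or_10), 3)
```

Here a 10-point gain in language proficiency roughly triples the odds (about 3.12 times), which is a larger effect than the 120% a simple additive reading of "12% per point" would suggest.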