7.2 OLS regression

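Estimating a regression model by OLS is straightforward in R. The lm() function estimates a linear model with one or more independent variables. Specify the model as a formula, with the dependent variable to the left of the tilde and the independent variables to the right, separated by plus signs: Y ~ X1 + X2. The data argument names the data frame that contains the variables.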

# Bivariate (unconditional) estimate
  Model1 <- lm(ProfMath ~ ProfLang, data = dcps)

# Multiple regression (conditional) estimate
  Model2 <- lm(ProfMath ~ ProfLang + NumTested, data = dcps)
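
To experiment with the formula syntax without the dcps data at hand, the following is a minimal sketch on simulated data. The data frame sim, its values, and the object names fit1 and fit2 are assumptions made up for illustration. Only the variable names mirror the models above.

```r
# Simulated stand-in for the dcps data (hypothetical values, not the real data)
set.seed(42)
n <- 100
sim <- data.frame(
  ProfLang  = runif(n, 0, 100),
  NumTested = round(runif(n, 50, 500))
)
sim$ProfMath <- 1 + 0.9 * sim$ProfLang - 0.01 * sim$NumTested + rnorm(n, sd = 10)

# Same formula syntax as above: one predictor, then two predictors
fit1 <- lm(ProfMath ~ ProfLang, data = sim)
fit2 <- lm(ProfMath ~ ProfLang + NumTested, data = sim)

# coef() returns the named vector of estimates.
# lm() adds the intercept automatically, so it does not appear in the formula.
coef(fit1)
coef(fit2)
```

Each additional term in the formula adds one element to the coefficient vector, named after the variable.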

To view the coefficient estimates and evaluate hypotheses about the relationship, apply the summary() function to the model object.

  summary(Model1)

## 
## Call:
## lm(formula = ProfMath ~ ProfLang, data = dcps)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -38.23  -5.15  -0.91   7.17  26.92 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.9096     1.5049     0.6     0.55    
## ProfLang      0.8761     0.0391    22.4   <2e-16 ***
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.94 on 106 degrees of freedom
## Multiple R-squared:  0.826,  Adjusted R-squared:  0.824 
## F-statistic:  503 on 1 and 106 DF,  p-value: <2e-16

  summary(Model2)

## 
## Call:
## lm(formula = ProfMath ~ ProfLang + NumTested, data = dcps)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -39.33  -5.41  -0.80   6.98  26.43 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2109     1.7146    1.29     0.20    
## ProfLang      0.8943     0.0405   22.06   <2e-16 ***
## NumTested    -0.0102     0.0066   -1.55     0.12    
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.88 on 105 degrees of freedom
## Multiple R-squared:  0.83,   Adjusted R-squared:  0.827 
## F-statistic:  256 on 2 and 105 DF,  p-value: <2e-16
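
The printed summary is convenient for reading, but the same quantities can also be pulled out of the summary object for use in later code. The sketch below assumes the built-in mtcars data as a stand-in for dcps, so the variables mpg, wt, and hp and the object names are illustrative only.

```r
# mtcars ships with R; it stands in for dcps here (an assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

# The coefficient table is a numeric matrix with one row per model term
coef_table <- coef(s)
colnames(coef_table)

# Individual cells can be indexed by row and column name
slope_wt <- coef_table["wt", "Estimate"]
p_wt     <- coef_table["wt", "Pr(>|t|)"]

# Fit statistics reported at the bottom of the printed summary
r2        <- s$r.squared       # Multiple R-squared
adj_r2    <- s$adj.r.squared   # Adjusted R-squared
sigma_hat <- s$sigma           # Residual standard error
df_resid  <- df.residual(fit)  # Residual degrees of freedom

# 95% confidence intervals for each coefficient
confint(fit, level = 0.95)
```

Indexing by name rather than by position keeps the code correct if terms are later added to or removed from the formula.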

Notice that in each coefficient table, the rows correspond to the intercept and the independent variables. In Model2, the estimated slope coefficient for ProfLang is 0.89 with a p-value less than 0.001. This means that on average and net of the number of students tested, a 1-percentage-point increase in language proficiency is associated with a 0.89-percentage-point increase in math proficiency. The association is statistically significant (\(p<0.001\)). We might also note that the variables in the model account for about 83% of observed variation in math proficiency across DC Public Schools (\(Adj~R^2=0.83\)).
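
One way to check the "net of" reading of a slope is to compare predictions for two observations that differ by one unit on one variable and are identical on the other. The sketch below again assumes the built-in mtcars data in place of dcps. The two rows in newdata are hypothetical cases invented for the comparison.

```r
# mtcars stands in for dcps (an assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cases: wt differs by one unit, hp is held constant
newdata <- data.frame(wt = c(3, 4), hp = c(110, 110))
pred <- predict(fit, newdata = newdata)

# The difference between the two predictions equals the wt coefficient
diff(pred)
coef(fit)["wt"]
```

Because the model is linear and additive, the result does not depend on the value at which hp is held, only on it being the same in both rows.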