7.2 OLS regression
Estimating a regression model using OLS is simple in R. The lm() function estimates a model with one or more independent variables. Specify the model as a formula, with the dependent variable to the left of the tilde and the independent variables to the right: Y ~ X1 + X2.
# Bivariate (unconditional) estimate
Model1 <- lm(ProfMath ~ ProfLang, data = dcps)
# Multivariate (conditional) estimate
Model2 <- lm(ProfMath ~ ProfLang + NumTested, data = dcps)
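The formula syntax above can be tried without the dcps data. The following is a minimal sketch that assumes the mtcars dataset, which ships with base R, as a stand-in; the object names fit_one and fit_two and the choice of variables are illustrative only and are not part of the original analysis.

```r
# mtcars ships with base R; it stands in for dcps here (an assumption
# made only so that the example runs on any R installation).

# One independent variable: mpg regressed on wt
fit_one <- lm(mpg ~ wt, data = mtcars)

# Two independent variables: add hp to the right-hand side with +
fit_two <- lm(mpg ~ wt + hp, data = mtcars)

# Printing a model object shows the call and the coefficient estimates only
print(fit_one)
print(fit_two)

# formula() recovers the formula stored in the fitted object
formula(fit_two)
```

Each term added to the right-hand side of the formula adds one coefficient to the fitted model, alongside the intercept that lm() includes by default.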
To view the coefficient estimates and evaluate hypotheses about the relationship, apply the summary() function to the model object.
summary(Model1)
##
## Call:
## lm(formula = ProfMath ~ ProfLang, data = dcps)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -38.23  -5.15  -0.91   7.17  26.92
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.9096     1.5049     0.6     0.55
## ProfLang      0.8761     0.0391    22.4   <2e-16 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.94 on 106 degrees of freedom
## Multiple R-squared: 0.826, Adjusted R-squared: 0.824
## F-statistic: 503 on 1 and 106 DF, p-value: <2e-16
summary(Model2)
##
## Call:
## lm(formula = ProfMath ~ ProfLang + NumTested, data = dcps)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -39.33  -5.41  -0.80   6.98  26.43
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2.2109     1.7146    1.29     0.20
## ProfLang      0.8943     0.0405   22.06   <2e-16 ***
## NumTested    -0.0102     0.0066   -1.55     0.12
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.88 on 105 degrees of freedom
## Multiple R-squared: 0.83, Adjusted R-squared: 0.827
## F-statistic: 256 on 2 and 105 DF, p-value: <2e-16
Notice that in each summary the rows of the coefficient table correspond to the intercept and the independent variables. In Model2, the estimated slope coefficient for ProfLang is 0.89 with a p-value less than 0.001. This means that, on average and net of the number of students tested, a 1-percentage-point increase in language proficiency is associated with a 0.89-percentage-point increase in math proficiency. The association is statistically significant (\(p<0.001\)). We might also note that the variables in the model account for about 83% of the observed variation in math proficiency across DC Public Schools (\(\text{Adjusted } R^2=0.83\)).
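The quantities read off the printed summary can also be pulled out of the summary object directly, and the "net of" interpretation of a slope can be checked with predict(). The sketch below again assumes the built-in mtcars data as a stand-in for dcps; the names fit, fit_summary, coef_table, and new_cars are hypothetical and chosen only for illustration.

```r
# Stand-in model on mtcars (an assumption; dcps is not bundled with R)
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit_summary <- summary(fit)

# The coefficient table is a matrix: one row per term, with columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)"
coef_table <- coef(fit_summary)
coef_table["wt", "Estimate"]   # slope estimate for wt
coef_table["wt", "Pr(>|t|)"]   # its p-value

# Adjusted R-squared is stored as a component of the summary object
fit_summary$adj.r.squared

# Interpretation check: two hypothetical cars that differ by one unit
# of wt and share the same hp
new_cars <- data.frame(wt = c(3, 4), hp = c(110, 110))
predicted <- predict(fit, newdata = new_cars)

# The difference in predictions equals the wt slope, because hp is held fixed
diff(predicted)
```

Applied to Model2, the same steps would return the ProfLang estimate and the adjusted R-squared shown in the output above, provided the dcps data frame is loaded.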