4.3 Referencing variables of a data frame
A data frame (or tibble) is a two-dimensional structure in which each variable forms a column and each observation forms a row. Each element is the value that a given observation takes on for a particular variable.
How do you reference or identify the variables in a data frame (e.g. to calculate the mean number of students tested, NumTested
from the dcps
schools data)? The $
and [[
extraction operators both pull variables from a dataframe or items from a list. Note that $
requires the variable name, whereas [[
allows you to use either the variable’s name (in quotes) or column number in the data frame. Do what you want. I prefer [[
for anyone in more advanced programming, but I’ll typically use $
in this course.
# Extract the variable (all observations)
$NumTested dcps
## [1] 72 147 67 180 213 224 158 112 157 375
## [11] 222 149 167 67 83 87 96 121 289 109
## [21] 495 62 1423 146 117 156 109 189 212 159
## [31] 112 77 137 112 362 310 110 160 94 91
## [41] 178 334 290 235 354 114 156 137 115 181
## [51] 306 111 96 193 246 132 12 129 93 153
## [61] 143 140 212 143 105 148 258 129 66 144
## [71] 399 133 117 79 144 213 120 288 96 170
## [81] 20 61 121 261 135 102 113 134 121 73
## [91] 217 211 175 409 239 126 102 374 199 180
## [101] 173 190 23 253 172 154 152 421
# Mean on a variable (3 ways)
mean(dcps$NumTested) # object$variable
## [1] 180.1
mean(dcps[['NumTested']]) # object[['variable']]
## [1] 180.1
mean(dcps[[4]]) # object[[column#]]
## [1] 180.1