4.3 Referencing variables of a data frame

A data frame (or tibble) is a two-dimensional structure in which each variable forms a column and each observation forms a row. Each element is the value that a given observation takes on for a particular variable.

How do you reference or identify the variables in a data frame (e.g. to calculate the mean number of students tested, NumTested from the dcps schools data)? The $ and [[ extraction operators both pull variables from a dataframe or items from a list. Note that $ requires the variable name, whereas [[ allows you to use either the variable’s name (in quotes) or column number in the data frame. Do what you want. I prefer [[ for anyone in more advanced programming, but I’ll typically use $ in this course.

# Extract the variable (all observations)
  dcps$NumTested
##   [1]   72  147   67  180  213  224  158  112  157  375
##  [11]  222  149  167   67   83   87   96  121  289  109
##  [21]  495   62 1423  146  117  156  109  189  212  159
##  [31]  112   77  137  112  362  310  110  160   94   91
##  [41]  178  334  290  235  354  114  156  137  115  181
##  [51]  306  111   96  193  246  132   12  129   93  153
##  [61]  143  140  212  143  105  148  258  129   66  144
##  [71]  399  133  117   79  144  213  120  288   96  170
##  [81]   20   61  121  261  135  102  113  134  121   73
##  [91]  217  211  175  409  239  126  102  374  199  180
## [101]  173  190   23  253  172  154  152  421
# Mean on a variable (3 ways)
  mean(dcps$NumTested) # object$variable
## [1] 180.1
  mean(dcps[['NumTested']]) # object[['variable']]
## [1] 180.1
  mean(dcps[[4]]) # object[[column#]]
## [1] 180.1