8.4 Appending and merging data

It is often useful to combine data from different sources. This may take the form of appending (adding additional cases with information on the same variables) or merging (adding additional variables that describe the same cases).

8.4.1 Appending new cases

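To append new cases to your data frame, use bind_rows(OldData,NewData). Columns are matched by name, so the variable names need to match exactly across data frames; a variable that appears in only one data frame is kept as a separate column and filled with NA for the rows that lack it. The variable order does not matter.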

  # old data
    myData = tribble(
      ~District, ~Students,
      115, 985,
      116, 1132
    )

  # new data to add
    new = tribble(
      ~District, ~Students,
      117, 419,
      118, 633
    )
    
  # Append new to old
    myData = bind_rows(myData,new)
    
    myData
## # A tibble: 4 x 2
##   District Students
##      <dbl>    <dbl>
## 1      115      985
## 2      116     1132
## 3      117      419
## 4      118      633
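
The name-matching behavior described above can be checked with a short sketch. The data frames `old`, `reordered`, and `misnamed` are made up for illustration and are not part of the running example; the code assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# existing data
old = tribble(
  ~District, ~Students,
  115, 985,
  116, 1132
)

# same variables in a different order: still matched by name
reordered = tribble(
  ~Students, ~District,
  419, 117
)

# misspelled variable name ("Student"): treated as a separate column
misnamed = tribble(
  ~District, ~Student,
  118, 633
)

combined = bind_rows(old, reordered, misnamed)
combined
```

In `combined`, district 117 lands in the correct columns despite the different order, while district 118 has NA for Students because its count was stored under the misspelled name. Checking names() on each data frame before appending is a cheap way to catch this.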

8.4.2 Merging

To merge data frames (add new variables for existing cases), use left_join(OldData,NewData). To link rows in one data frame to rows in another, both data sets must contain a common identifier with the same variable name and the same values for matching cases. left_join() keeps every row of OldData; cases with no match in NewData receive NA for the new variables. Building on the example above:

  # new variables
    newvars = tribble(
      ~District,~Teachers,
      115, 43,
      116, 71,
      118, 55
    )

  # join new to old
    myData = left_join(myData,newvars)
## Joining, by = "District"
    myData
## # A tibble: 4 x 3
##   District Students Teachers
##      <dbl>    <dbl>    <dbl>
## 1      115      985       43
## 2      116     1132       71
## 3      117      419       NA
## 4      118      633       55
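
As a possible follow-up, the sketch below names the identifier explicitly with the `by` argument (which also suppresses the "Joining, by" message) and uses anti_join() to list cases that found no match. The data frame names `students`, `teachers`, and `teachers2`, and the column name `DistrictID`, are hypothetical stand-ins chosen for this illustration.

```r
library(dplyr)
library(tibble)

students = tribble(
  ~District, ~Students,
  115, 985,
  116, 1132,
  117, 419,
  118, 633
)

teachers = tribble(
  ~District, ~Teachers,
  115, 43,
  116, 71,
  118, 55
)

# state the identifier explicitly rather than relying on the default
merged = left_join(students, teachers, by = "District")

# cases in students that have no match in teachers
unmatched = anti_join(students, teachers, by = "District")

# if the identifier has a different name in the second data frame,
# map one name to the other in `by`
teachers2 = rename(teachers, DistrictID = District)
merged2 = left_join(students, teachers2, by = c("District" = "DistrictID"))
```

Here `unmatched` contains only district 117, the same case that shows NA for Teachers in the merged result. Inspecting the unmatched cases before merging helps distinguish data that is truly missing from identifiers that were coded inconsistently.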