8.1 Filter/subset data

It is often necessary to limit your analysis to a subset of cases. Use the filter() function from dplyr to specify the criteria for selecting cases.

  # Overwrites film with the filtered rows
  film =
    film %>%
    filter(SubjectSex == 'Female') # keep only rows where SubjectSex is 'Female'

  head(film)
## # A tibble: 6 x 10
##   Title Release NumSubjects SubjectName SubjectType
##   <chr>   <dbl>       <dbl> <chr>       <chr>      
## 1 Big ~    2014           1 Margaret K~ Artist     
## 2 Test~    2014           1 Vera Britt~ Other      
## 3 The ~    2014           1 Brittany M~ Actress    
## 4 Wild     2014           1 Cheryl Str~ Other      
## 5 Diana    2013           1 Princess D~ Other      
## 6 Love~    2013           1 Linda Love~ Actress    
## # ... with 5 more variables: SubjectRace <chr>,
## #   PersonOfColor <dbl>, SubjectSex <chr>,
## #   LeadActor <chr>, Period <chr>

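The conditions inside filter() identify the cases, or rows, to keep (i.e. you’re selecting only those rows that satisfy the given conditions). You can supply any number of conditions: conditions separated by commas must all be true for a row to be kept, and | can be used when either of two conditions is enough. Note that a double equals sign == is used to test for equality; a single = does not perform a comparison.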
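As a minimal sketch of combining conditions, the example below builds a small made-up tibble (the object toy and its values are hypothetical; only the column names echo the film data) and filters it three ways. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the film data, values are made up
toy <- tibble(
  Title      = c("A", "B", "C", "D"),
  Release    = c(2012, 2013, 2014, 2014),
  SubjectSex = c("Female", "Male", "Female", "Male")
)

# Comma-separated conditions must all hold (logical AND)
recent_female <- toy %>%
  filter(SubjectSex == "Female", Release >= 2014)

# Use | when either condition is enough (logical OR)
female_or_2013 <- toy %>%
  filter(SubjectSex == "Female" | Release == 2013)

# %in% keeps rows whose value matches any element of a set
selected_years <- toy %>%
  filter(Release %in% c(2012, 2014))
```

Each result is stored under a new name, so the original toy data stays available for further subsetting.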