6.2 Cross-tabulation

There is a three-step process for presenting and evaluating the association between two nominal variables. The first step is to create a basic cross-tabulation of the joint frequency using the count() function. Here we consider the possibility that the representation of female subjects in biopics (SubjectSex) has changed over time (Period).

# Raw cross-tabulation
  xtab =
    film %>% 
    count(SubjectSex,Period) %>%  # (OutcomeVar,ExposureVar)
    na.omit() %>% # drop NA categories
  # now organize results into a 2-way table
    pivot_wider(
      names_from = Period, # MUST be the ExposureVar
      values_from = n, 
      values_fill = 0
    )

  xtab
## # A tibble: 2 x 4
##   SubjectSex `1915--1965` `1965--1999` `2000--2014`
##   <chr>             <int>        <int>        <int>
## 1 Female               44           59           74
## 2 Male                132          203          249
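To see what each step of this pipeline contributes, here is a minimal sketch on a small made-up data frame. The object `toy` and its columns `sex` and `period` are hypothetical and are not part of the film data; the example assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (NOT the film dataset): one row per subject
toy <- tibble(
  sex    = c("F", "M", "M", "M", "F", NA),
  period = c("early", "early", "late", "late", "early", "late")
)

toy_xtab <- toy %>%
  count(sex, period) %>%    # long format: one row per observed combination
  na.omit() %>%             # drops the row where sex is NA
  pivot_wider(
    names_from  = period,   # categories of period become the columns
    values_from = n,
    values_fill = 0         # F/late never occurs, so fill with 0 instead of NA
  )

toy_xtab
```

Because no female subject appears in the "late" period, `count()` returns no row for that combination. Without `values_fill = 0` the corresponding cell would be `NA`, which would break the later calculations.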

Second, use chisq.test() to conduct a \(\chi^2\) test of independence. Specify the contingency table created above and add [-1] to exclude the first column (category names) from the calculation.

  chisq.test(xtab[-1])
## 
##  Pearson's Chi-squared test
## 
## data:  xtab[-1]
## X-squared = 0.4, df = 2, p-value = 0.8
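The printed summary rounds the statistic and p-value heavily, so it can help to store the test result and pull out its components directly. The sketch below retypes the counts from the table above into a matrix (the object names `counts` and `res` are chosen here for illustration) and uses only base R. It also checks the expected counts, since a common rule of thumb is that they should all be at least 5 for the chi-squared approximation to be reasonable.

```r
# Counts retyped from the raw cross-tabulation printed above
counts <- matrix(
  c( 44,  59,  74,
    132, 203, 249),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    SubjectSex = c("Female", "Male"),
    Period     = c("1915--1965", "1965--1999", "2000--2014")
  )
)

res <- chisq.test(counts)        # stored result is a list of components

round(res$expected, 1)           # expected counts under independence
all(res$expected >= 5)           # rule-of-thumb check on expected counts
sum(counts)                      # N for the write-up
res$parameter                    # degrees of freedom
round(unname(res$statistic), 2)  # test statistic to two decimals
round(res$p.value, 2)            # p-value to two decimals
```

With the real data, the same components are available from `chisq.test(xtab[-1])` once the result is assigned to an object.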

Based on these results, there is no evidence that the sex of biopic subjects differs systematically across time periods, so we cannot reject the hypothesis that SubjectSex is independent of Period. The relationship is not statistically significant (\(\chi^2(2, N=761)=0.40\), \(p=0.82\)).
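For reference, the statistic can be traced back to the counts in the table. The notation here is introduced only for this illustration: \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, \(N\) is the overall total, and \(r\) and \(c\) are the numbers of rows and columns.

\[
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r-1)(c-1)
\]

For female subjects in 1915--1965, the row total is \(44+59+74=177\), the column total is \(44+132=176\), and \(N=761\), so

\[
E_{11} = \frac{177 \times 176}{761} \approx 40.9, \qquad
\frac{(44 - 40.9)^2}{40.9} \approx 0.23 .
\]

Summing the corresponding terms over all six cells gives \(\chi^2 \approx 0.40\) with \(df = (2-1)(3-1) = 2\).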

Finally, to present the results of a cross-tabulation, convert the raw frequencies in your table to percentages (within categories of the exposure variable). Start by calling the raw tabulation from above.

# Relative freq for presentations  
  xtab %>%
  # add a row total
    mutate(Total = rowSums(.[-1])) %>%
  # convert to percentage
    mutate_at(-1, ~ round(100 * ./sum(.), digits=1))
## # A tibble: 2 x 5
##   SubjectSex `1915--1965` `1965--1999` `2000--2014`
##   <chr>             <dbl>        <dbl>        <dbl>
## 1 Female               25         22.5         22.9
## 2 Male                 75         77.5         77.1
## # ... with 1 more variable: Total <dbl>
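The printed tibble hides the Total column because it does not fit the console width. A minimal sketch of one way to display it is shown below, using counts retyped from the raw table above (the object names `counts` and `pct` are chosen here for illustration). It also uses across(), which supersedes mutate_at() in dplyr 1.0 and later; mutate_at() still works, so this is an alternative rather than a required change.

```r
library(dplyr)

# Counts retyped from the raw cross-tabulation printed above
counts <- tribble(
  ~SubjectSex, ~`1915--1965`, ~`1965--1999`, ~`2000--2014`,
  "Female",     44,            59,            74,
  "Male",      132,           203,           249
)

pct <- counts %>%
  # add a row total before converting, so Total is also converted
  mutate(Total = rowSums(across(-SubjectSex))) %>%
  # column percentages: each cell divided by its column sum
  mutate(across(-SubjectSex, ~ round(100 * .x / sum(.x), digits = 1)))

print(pct, width = Inf)  # width = Inf prints every column, including Total
```

The Total column gives the overall share of each category across all periods, which is a useful baseline when comparing the period-specific percentages.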

Copy and paste the table into your document. Then format it appropriately (e.g., relabel the categories) for final presentation.