6.2 Cross-tabulation
Presenting and evaluating the association between two nominal variables is a three-step process. The first step is to create a basic cross-tabulation of the joint frequencies using the count() function. Here we consider the possibility that the representation of female subjects in biopics (SubjectSex) has changed over time (Period).
# Raw cross-tabulation (requires dplyr and tidyr)
xtab <- film %>%
  count(SubjectSex, Period) %>% # (OutcomeVar, ExposureVar)
  na.omit() %>% # drop NA categories
  # now organize results into a 2-way table
  pivot_wider(
    names_from = Period, # MUST be the ExposureVar
    values_from = n,
    values_fill = 0
  )
xtab
## # A tibble: 2 x 4
## SubjectSex `1915--1965` `1965--1999` `2000--2014`
## <chr> <int> <int> <int>
## 1 Female 44 59 74
## 2 Male 132 203 249
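Readers without the film data at hand can try the same pipeline on a stand-in data set. The following is a minimal sketch, assuming dplyr and tidyr are installed. The names cell_counts, film_demo and xtab_demo are hypothetical, and the rows are rebuilt from the joint frequencies printed above rather than read from the original data file.

```r
library(dplyr)
library(tidyr)

# Joint frequencies copied from the printed cross-tabulation above
cell_counts <- data.frame(
  SubjectSex = rep(c("Female", "Male"), each = 3),
  Period     = rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
  n          = c(44L, 59L, 74L, 132L, 203L, 249L)
)

# Hypothetical stand-in for `film`: expand to one row per biopic
film_demo <- uncount(cell_counts, weights = n)

# Same steps as above: count each pair, then spread periods into columns
xtab_demo <- film_demo %>%
  count(SubjectSex, Period) %>%
  pivot_wider(names_from = Period, values_from = n, values_fill = 0)

xtab_demo
```

Because the stand-in has no missing values, the na.omit() step is left out here; with real data it should stay in the pipeline.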
Second, use chisq.test() to conduct a χ2 test of independence. Pass it the contingency table created above, indexed with [-1] to exclude the first column (category names) from the calculation.
# Chi-squared test of independence
chisq.test(xtab[-1])
## Pearson's Chi-squared test
## data: xtab[-1]
## X-squared = 0.4, df = 2, p-value = 0.8
Based on these results, there is no evidence that the sex of biopic subjects differs systematically across time periods; in other words, we cannot reject the hypothesis that the two variables are independent. The relationship is not statistically significant (χ2(2, N = 761) = 0.40, p = 0.82).
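To see where the test statistic comes from, it can be recomputed by hand. The sketch below uses base R only and assumes the counts printed in the cross-tabulation above; observed, expected, chi_sq, df and p_value are illustrative names. Under independence, the expected count in each cell is (row total × column total) / N, and the statistic sums (observed − expected)^2 / expected over all cells.

```r
# Observed counts copied from the cross-tabulation above
observed <- matrix(
  c( 44,  59,  74,
    132, 203, 249),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"),
                  c("1915--1965", "1965--1999", "2000--2014"))
)

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic, degrees of freedom, and p-value
chi_sq  <- sum((observed - expected)^2 / expected)
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(chi_sq = chi_sq, df = df, p_value = p_value), 2)
```

The values should agree with chisq.test(observed), since R applies the continuity correction only to 2 x 2 tables.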
Finally, to present the results of a cross-tabulation, convert the raw frequencies in your table to percentages within categories of the exposure variable (here, column percentages within each Period). Start by calling the raw tabulation from above.
# Relative frequencies for presentation
xtab %>%
  # add a row total
  mutate(Total = rowSums(.[-1])) %>%
  # convert to percentage (each column sums to 100)
  mutate_at(-1, ~ round(100 * . / sum(.), digits = 1))
## # A tibble: 2 x 5
## SubjectSex `1915--1965` `1965--1999` `2000--2014`
## <chr> <dbl> <dbl> <dbl>
## 1 Female 25 22.5 22.9
## 2 Male 75 77.5 77.1
## # ... with 1 more variable: Total <dbl>
Copy and paste the table into your document. Then format appropriately (e.g. category labels) for final presentation.
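Base R offers another route to the same column percentages. This is a minimal sketch, assuming the counts are held in a matrix (the hypothetical observed, rebuilt from the printed table) rather than in the tibble xtab. Here prop.table() with margin = 2 divides each cell by its column total.

```r
# Observed counts copied from the raw cross-tabulation
observed <- matrix(
  c( 44,  59,  74,
    132, 203, 249),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"),
                  c("1915--1965", "1965--1999", "2000--2014"))
)

# Add a Total column holding the row sums
with_total <- cbind(observed, Total = rowSums(observed))

# Column percentages: each column sums to 100 (up to rounding)
col_pct <- round(100 * prop.table(with_total, margin = 2), digits = 1)
col_pct
```

A matrix prints every column at once, so the Total column is not hidden the way it is in the truncated tibble printout.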