Chapter 7 Statistical Analysis
7.1 t-test
7.1.1 example: Independent t-test
Let’s start by analyzing the difference in strength between superheroes from different publishers.
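A sketch of the kind of call that produces the summary below (the exact code is not shown in the text; it assumes heroes2 has Publisher and Strength columns):
# mean strength per publisher
heroes2 |>
  group_by(Publisher) |>
  summarize(mean_strength = mean(Strength, na.rm = TRUE))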
## # A tibble: 4 × 2
## Publisher mean_strength
## <chr> <dbl>
## 1 DC Comics 64.6
## 2 George Lucas 33.3
## 3 Image Comics 60
## 4 Marvel Comics 43.1
From this summary, the mean strength of superheroes seems to vary across publishers. But is this difference statistically significant?
Let’s conduct a t-test to compare heroes from DC Comics and Marvel Comics.
heroes2 |>
filter(Publisher %in% c("Marvel Comics", "DC Comics")) |>
group_by(Publisher) |>
summarize(Strengths = list(Strength)) |>
with(t.test(Strengths[[1]], Strengths[[2]], paired = FALSE))
##
## Welch Two Sample t-test
##
## data: Strengths[[1]] and Strengths[[2]]
## t = 2.8807, df = 32.441, p-value = 0.00698
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.297946 36.652054
## sample estimates:
## mean of x mean of y
## 64.600 43.125
That output looks complicated, and I do not normally write the test this way. Let’s break the process into two simpler steps.
7.1.1.1 Step 1: Subset the data
First, we need to subset the data into two groups: one for DC Comics and one for Marvel Comics.
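A minimal sketch of this step (the original code is hidden; the object names marvel and DC are inferred from the output in Step 2):
# subset heroes2 by publisher
marvel <- subset(heroes2, Publisher == "Marvel Comics")
DC <- subset(heroes2, Publisher == "DC Comics")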
7.1.1.2 Step 2: Conduct the t-test
Next, we use the t.test() function to calculate the difference in strength between the two groups.
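Assuming the subsets from Step 1 are named marvel and DC, the call looks roughly like this:
# independent (unpaired) t-test comparing the two groups
t.test(marvel$Strength, DC$Strength, paired = FALSE)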
##
## Welch Two Sample t-test
##
## data: marvel$Strength and DC$Strength
## t = -2.8807, df = 32.441, p-value = 0.00698
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -36.652054 -6.297946
## sample estimates:
## mean of x mean of y
## 43.125 64.600
From the results, we see that there is a significant difference in strength between superheroes from DC Comics and Marvel Comics (p = 0.007).
Important Note: Independent vs Paired t-test
In this case, we are comparing superheroes from two different groups (DC vs Marvel), which makes this an independent t-test. A paired t-test would be inappropriate here because the two groups consist of entirely different individuals.
However, when comparing repeated measurements on the same group of people (e.g., before and after a task), a paired t-test would be more suitable.
7.1.2 example: Paired t-test
Remember the performance data from earlier? Let’s say we want to analyze if there’s any difference in exam scores between Chapter 1 and Chapter 3 for the same group of students.
We can perform a paired t-test in this case:
# subset the scores for each chapter first
eng_c1 <- subset(performance, Subject_em == "Eng" & Chapter == "C1")
eng_c3 <- subset(performance, Subject_em == "Eng" & Chapter == "C3")
# paired t-test: the same students measured at two chapters
t.test(eng_c1$Score, eng_c3$Score, paired = TRUE)
##
## Paired t-test
##
## data: eng_c1$Score and eng_c3$Score
## t = -0.44114, df = 5, p-value = 0.6775
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -17.06789 12.06789
## sample estimates:
## mean difference
## -2.5
This test checks whether the students’ performance significantly changed between Chapter 1 and Chapter 3. The paired argument ensures that the test accounts for the fact that these are the same individuals in both conditions.
7.2 ANOVA
ANOVA is a powerful statistical tool used to compare the means of multiple groups. It helps determine whether the observed differences between group means are statistically significant. Here’s how we can use the anova_test() function (from the rstatix package) to perform various types of ANOVA.
heroes2 |>
  anova_test(
    dv = ,                  # dependent variable
    wid = ,                 # subject name/id
    type = NULL,            # default: type II ANOVA
    between = ,             # between-subject independent variable(s)
    within = ,              # within-subject independent variable(s)
    effect.size = "ges",
    error = NULL,
    white.adjust = FALSE,
    observed = NULL,
    detailed = TRUE)
- detailed = TRUE: requests a more detailed statistical output. FALSE is the default.
- type = NULL: type 2 ANOVA is the default. You can change this to 1 or 3 based on your needs.
- effect.size = : computes the effect size, using either generalized eta squared ("ges", the default) or partial eta squared ("pes"). The Help section in RStudio says I could choose both, but I haven’t figured out how to do so yet.
- For ANCOVA, you can add covariate = to include continuous variables as covariates. Insert it below within = , above type = .
7.2.1 one-way ANOVA
One-way ANOVA is used when we have one independent variable (IV) and one dependent variable (DV).
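For illustration, a minimal sketch (not run in the text) of a one-way ANOVA on the heroes2 data, comparing Strength (DV) across Publisher (IV):
heroes2 |>
  anova_test(
    dv = Strength,
    wid = Name,
    between = Publisher)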
7.2.2 two-way ANOVA
Two-way ANOVA is used when we have two independent variables and one dependent variable. Here, we look at how these two IVs interact with each other.
7.2.2.1 example 1: Strength ~ BMI_status * Gender
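The table below comes from a call along these lines (a sketch; it matches the Solution 1 code shown later in this section):
heroes2 |>
  anova_test(
    dv = Strength,
    wid = Name,
    between = c(Gender, BMI_status),
    detailed = TRUE)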
## ANOVA Table (type III tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 (Intercept) 68473.044 109803.3 1 133 82.938 1.11e-15 * 0.384000
## 2 Gender 51.278 109803.3 1 133 0.062 8.04e-01 0.000467
## 3 BMI_status 8413.707 109803.3 3 133 3.397 2.00e-02 * 0.071000
## 4 Gender:BMI_status 6885.839 109803.3 3 133 2.780 4.40e-02 * 0.059000
Note: Be mindful of the ANOVA type. Even if you don’t specify a type, anova_test() may default to a type III ANOVA. If you prefer a type II ANOVA, specify type = 2.
7.2.2.2 example 2: Strength ~ Gender * BMI
However, we would encounter an error here. Why is that?
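The failing call looks roughly like this (a sketch; the exact code is not shown in the text):
# this errors: BMI is numeric, but between = expects categorical variables
heroes2 |>
  anova_test(
    dv = Strength,
    wid = Name,
    between = c(Gender, BMI),
    detailed = TRUE)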
Step 1: Check for mismatched variable types, for example, putting both a categorical variable (e.g., Gender) and a numerical variable (e.g., BMI) in the between = section.
Solution 1: convert numerical variables into categorical variables.
For example, you can classify BMI into levels like obese, overweight, normal, and underweight. (We did this in the previous section.)
# Solution 1
heroes2 |>
anova_test(
dv = Strength,
wid = Name,
between = c(Gender, BMI_status),
detailed = T)
## ANOVA Table (type III tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 (Intercept) 68473.044 109803.3 1 133 82.938 1.11e-15 * 0.384000
## 2 Gender 51.278 109803.3 1 133 0.062 8.04e-01 0.000467
## 3 BMI_status 8413.707 109803.3 3 133 3.397 2.00e-02 * 0.071000
## 4 Gender:BMI_status 6885.839 109803.3 3 133 2.780 4.40e-02 * 0.059000
Solution 2: use ANCOVA (Analysis of Covariance), keeping the continuous variable in the model as a covariate (continuous DV ~ categorical IV + continuous covariate).
ANCOVA is suitable when you have a continuous variable that you want to adjust for or control while analyzing the main effects of categorical independent variables. In ANCOVA, you include the continuous variable as a covariate.
# Solution 2 (ANCOVA)
heroes2 |>
anova_test(
dv = Strength,
wid = Name,
between = Gender,
covariate = BMI,
detailed = T)
## ANOVA Table (type II tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 BMI 23336.275 110912.1 1 138 29.036 2.99e-07 * 1.74e-01
## 2 Gender 10.606 110912.1 1 138 0.013 9.09e-01 9.56e-05
In Solution 2, the BMI variable is used as a covariate. This allows you to account for the continuous variable (BMI) while examining the effect of gender on strength.
7.2.2.3 example 3: Strength ~ Gender * Intelligence
Let us take a look at another example.
In the heroes2 dataset, Intelligence is a categorical variable. The intelligence levels of superheroes are labelled as high, good, average, moderate, and low.
## # A tibble: 5 × 1
## Intelligence
## <chr>
## 1 moderate
## 2 good
## 3 high
## 4 average
## 5 low
Therefore, we should not encounter the same problem as example 2.
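The two-way call we are attempting looks roughly like this (a sketch; the exact code is not shown in the text):
heroes2 |>
  anova_test(
    dv = Strength,
    wid = Name,
    between = c(Gender, Intelligence),
    detailed = TRUE)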
However, we still see an error here. After checking Step 1, we can conclude that mismatched variable types are not the issue this time. We can try to debug with the following steps:
Step 2
We examine each independent variable separately to make sure it works on its own, dropping one variable at a time from the model. (Although this is usually not the problem, it’s still good practice to check.)
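A sketch of the Gender-only check (the Intelligence check simply swaps the variable):
heroes2 |>
  anova_test(
    dv = Strength,
    wid = Name,
    between = Gender,
    detailed = TRUE)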
- Checking the variable Gender
## ANOVA Table (type II tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 Gender 3772.182 134248.4 1 139 3.906 0.05 0.027
- Checking the variable Intelligence
## ANOVA Table (type II tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 Intelligence 8758.544 129262 4 136 2.304 0.062 0.063
Both variables look fine, and the code works smoothly when tested individually.
Step 3
We examine whether any category combination has zero observations. (I often forget about this step and spend hours confused about why my ANOVA isn’t ANOVAing…)
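One quick way to check (a sketch) is a cross-tabulation of the two factors:
# cross-tabulate Gender by Intelligence to spot empty cells
with(heroes2, table(Gender, Intelligence))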
##
## average good high low moderate
## F 14 20 4 0 2
## M 24 41 27 2 7
Here, we see that the female superheroes have zero observations at the low intelligence level. This empty cell prevents the ANOVA from proceeding. Now that we have identified the problem, we can remove Intelligence == "low" from our analysis.
heroes2 |>
filter(Intelligence != "low") |>
anova_test(
dv = Strength,
wid = Name,
between = c(Gender, Intelligence),
type = 2,
detailed = T)
## ANOVA Table (type II tests)
##
## Effect SSn SSd DFn DFd F p p<.05 ges
## 1 Gender 1982.076 126823.2 1 131 2.047 0.155 0.015
## 2 Intelligence 3719.933 126823.2 3 131 1.281 0.284 0.028
## 3 Gender:Intelligence 406.757 126823.2 3 131 0.140 0.936 0.003
You might think: “I’ll just add group_by() before the anova_test().” However, while this approach would not give you an error message, it would not yield the two-way ANOVA we intended. Instead, it would just split the data into two groups and perform a one-way ANOVA on each. See the code below:
heroes2 |>
filter(Intelligence != "low") |>
group_by(Gender) |>
anova_test(
dv = Strength,
wid = Name,
between = Intelligence,
detailed = T)
## # A tibble: 2 × 10
## Gender Effect SSn SSd DFn DFd F p `p<.05` ges
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 F Intelligence 1564. 33535. 3 36 0.56 0.645 "" 0.045
## 2 M Intelligence 2563. 93288. 3 95 0.87 0.46 "" 0.027
We can see that this result table differs from the two-way ANOVA table above, and the numbers are also different (compare DFn and DFd in both results).
7.2.3 three-way ANOVA
7.2.3.1 example 1: Strength ~ Gender * Intelligence * BMI_status
If we want to add more independent variables, we can simply include additional column names in the between = section.
heroes2 |>
filter(Intelligence != "low") |>
anova_test(
dv = Strength,
wid = Name,
between = c(Gender, Intelligence, BMI_status),
type = 2,
detailed = T)
## ANOVA Table (type II tests)
##
## Effect SSn SSd DFn DFd F p p<.05
## 1 Gender 1083.194 96626.99 1 118 1.323 0.252000
## 2 Intelligence 3465.586 96626.99 3 118 1.411 0.243000
## 3 BMI_status 16524.941 96626.99 3 118 6.727 0.000314 *
## 4 Gender:Intelligence 2703.193 96626.99 2 118 1.651 0.196000
## 5 Gender:BMI_status 8295.111 96626.99 3 118 3.377 0.021000 *
## 6 Intelligence:BMI_status 7715.048 96626.99 7 118 1.346 0.235000
## 7 Gender:Intelligence:BMI_status NA 96626.99 0 118 NA NA <NA>
## ges
## 1 0.011
## 2 0.035
## 3 0.146
## 4 0.027
## 5 0.079
## 6 0.074
## 7 NA
Note that if an independent variable is within-subject (as in a mixed analysis), we place it in the within = section instead. This is commonly used when analyzing experimental results, such as comparing test scores before and after learning. See below for more details on mixed ANOVA.
7.2.4 mixed ANOVA
As mentioned, mixed ANOVA is often used for analyzing experimental results, as it allows for tracking individual changes over time, which is difficult to achieve with other methods.
Let’s assume a high school wants to investigate whether different teaching styles affect students’ exam scores. They conducted an experiment, dividing students into two groups: one was taught in a fun way, while the other was taught in a more rigid manner. The school also tested the students before and after they learned the subject.
Independent variables: teaching style (fun/rigid), exam timing (before/after)
Dependent variable: exam score
teaching_style <- data.frame(
student = c("1", "2", "3", "4", "5", "6",
"1", "2", "3", "4", "5", "6"),
style = as.factor(c("fun", "rigid", "fun", "rigid", "fun", "rigid",
"fun", "rigid", "fun", "rigid", "fun", "rigid")),
timing = as.factor(c("before", "before", "before", "before", "before", "before",
"after", "after", "after", "after", "after", "after")),
score = c(50, 13, 10, 70, 23, 70, 70, 90, 66, 83, 80, 80))
The data frame contains 6 students: 3 experienced the fun teaching style and 3 the rigid style. All of their before and after exam scores were recorded.
Here, we want to see if the teaching style influenced their exam scores.
teaching_style |>
anova_test(
dv = score,
wid = student,
between = style,
within = timing,
detailed = T)
## ANOVA Table (type II tests)
##
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 (Intercept) 1 4 41418.750 1278.667 129.569 0.00034 * 0.929
## 2 style 1 4 954.083 1278.667 2.985 0.15900 0.232
## 3 timing 1 4 4524.083 1876.667 9.643 0.03600 * 0.589
## 4 style:timing 1 4 90.750 1876.667 0.193 0.68300 0.028
From the analysis, we can observe that teaching style does not have a significant impact on scores. However, there is a main effect of the timing variable.
By examining the mean scores across different times, we can see that students performed better after learning, regardless of the teaching style applied.
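A sketch of the call behind the summary below:
# mean exam score before vs. after, pooled across teaching styles
teaching_style |>
  group_by(timing) |>
  summarize(mean_score = mean(score))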
## # A tibble: 2 × 2
## timing mean_score
## <fct> <dbl>
## 1 after 78.2
## 2 before 39.3
7.2.5 Additional notes
- Use filter() to choose specific groups for analysis. For instance, filter(Intelligence %in% c("high", "good")) or filter(Hair_color != "No Hair").
- group_by() is useful if you want to examine ANOVA results for several conditions simultaneously.
- I recently learned that the anova_test() function does not support more than one within-subject variable, so you might encounter error messages when trying something like within = c(Var1, Var2). One way to avoid this is to temporarily collapse the dataset down to just the variables you need.
method 1: merge data temporarily
heroes2 |>
filter(Publisher %in% c("Marvel Comics", "DC Comics")) |>
filter(Gender %in% c("F", "M")) |>
group_by(Publisher, Gender, Name) |>
summarize(Height = mean(Height, na.rm = TRUE), .groups = "drop") -> temporary_hero_data_for_anova
temporary_hero_data_for_anova
## # A tibble: 137 × 4
## Publisher Gender Name Height
## <chr> <chr> <chr> <dbl>
## 1 DC Comics F Miss Martian 178.
## 2 DC Comics F Raven 165.
## 3 DC Comics F Starfire 194.
## 4 DC Comics F Supergirl 165.
## 5 DC Comics F Vixen 176.
## 6 DC Comics F Wonder Woman 183.
## 7 DC Comics M Adam Strange 185.
## 8 DC Comics M Alan Scott 181.
## 9 DC Comics M Aqualad 179.
## 10 DC Comics M Aquaman 186.
## # ℹ 127 more rows
Here, as you can see from the temporary dataset, I only need two main variables: Publisher and Gender. Therefore, I tell R that none of the other variables matter to me right now; I only need these two variables plus Name to identify each unique hero. I use the summarize() function to calculate each hero’s mean Height across different records, using na.rm = TRUE to ignore missing values.
The .groups = "drop" argument tells R to drop all grouping levels after the summarize() operation is complete. This means the resulting dataset no longer retains any of the grouping structure (Publisher, Gender, or Name), making the output a regular data frame without group attributes. This is useful when we don’t need to continue further grouped calculations on this summarized data.
Then you could use this data for further ANOVA analysis.
temporary_hero_data_for_anova |>
filter(Publisher %in% c("Marvel Comics", "DC Comics")) |>
filter(Gender %in% c("F", "M")) |>
group_by(Publisher) |>
anova_test(
dv = Height,
wid = Name,
between = Gender,
effect.size = "ges",
detailed = FALSE
)
## # A tibble: 2 × 8
## Publisher Effect DFn DFd F p `p<.05` ges
## * <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 DC Comics Gender 1 23 1.93 0.178 "" 0.078
## 2 Marvel Comics Gender 1 110 3.62 0.06 "" 0.032
method 2: ez ANOVA
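Another option is ezANOVA() from the ez package, which does accept more than one within-subject factor. A minimal sketch, assuming a hypothetical long-format data frame dat with a subject id column, two within-subject factors cond1 and cond2, and a numeric score:
library(ez)
# dat, id, cond1, cond2, and score are hypothetical placeholders
ezANOVA(
  data = dat,
  dv = score,
  wid = id,
  within = .(cond1, cond2),
  type = 2,
  detailed = TRUE)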
7.3 Post hoc analysis
After conducting an ANOVA, if we want to explore the data further (assuming the ANOVA results are significant), we can perform a post hoc analysis.
I used to be confused about the difference between post hoc analysis and pairwise t-test. It’s important to note that post hoc analysis is done after obtaining a significant result from ANOVA, while pairwise t-test can be applied to any two groups of data.
teaching_style |>
group_by(style) |>
pairwise_t_test(
score ~ timing,
paired = TRUE,
p.adjust.method = "bonferroni")
## # A tibble: 2 × 11
## style .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 fun score after before 3 3 3.64 2 0.068 0.068 ns
## 2 rigid score after before 3 3 1.53 2 0.267 0.267 ns
In this example, we use post hoc analysis:
- score ~ timing: we put the dependent variable on the left and the independent variable on the right. Here, we are comparing score across the different exam timing conditions.
- paired = : set this to TRUE for a repeated-measures design, and FALSE for a between-subject design (i.e., different participants in each group).
- p.adjust.method: this argument lets us choose a method for adjusting p values (here, "bonferroni").