An (in)complete guide to reaction time analysis

By Minho Shin in Data Analysis

August 17, 2021

Recently, one of my colleagues asked me what would be the best way to analyze reaction time (or response time; RT) data. I have summarized some of my analysis pipelines and thoughts that might help him. I hope this article is also helpful to others who struggle to analyze RT data.

*This article is not a comprehensive study of how RT analysis should be done. Rather, I will keep it minimal and link to excellent sources for further information.

Example dataset

We will use the reaction time data of Rosenbaum, Mama, and Algom (2017), who have generously shared their dataset on the Open Science Framework (OSF). I found this dataset via this article.

The task that participants performed was a Stroop task. Rosenbaum and colleagues investigated whether posture (either standing up or sitting) affects selective attention during the Stroop task (either congruent or incongruent trials). In this tutorial, we will simply explore whether different postures or task conditions affect reaction times.

Preprocessing

First, we will start the analysis by loading two libraries.

library("tidyverse") # for data wrangling
library("afex") # for ANOVA, mixed-models

The raw data uploaded on OSF is an .xlsx file, which contains different kinds of data across its sheets. For simplicity, I will only use data from experiment 3, which is located in the sheet called Exeriment 3 raw (the typo is in the original file).

df <- readxl::read_xlsx("raw data Stroop.xlsx", sheet = "Exeriment 3 raw")

The raw data is quite messy. Different subjects’ data are concatenated, including repeated header rows (!!!). Therefore, we first need to filter out those headers smuggled into the data frame, convert the affected columns to numbers, and then convert the factor columns to factors.

df <- df %>%
  filter(Subj != "Subj") %>%
  drop_na() %>%
  rename(posture = congruence,
         congruency = word) %>%
  mutate(across(c(Subj, Order, condition, RT), as.integer),
         across(c(Subj, posture, congruency), factor))
str(df)
## tibble [7,200 × 16] (S3: tbl_df/tbl/data.frame)
##  $ Subj        : Factor w/ 50 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Cond        : chr [1:7200] "1" "1" "1" "1" ...
##  $ Block       : chr [1:7200] "1" "1" "1" "1" ...
##  $ Trial       : chr [1:7200] "0" "0" "0" "0" ...
##  $ Order       : int [1:7200] 23 21 39 77 69 35 78 27 64 71 ...
##  $ color       : chr [1:7200] "red" "red" "red" "blue" ...
##  $ congruency  : Factor w/ 2 levels "c","ic": 1 1 1 2 2 2 1 1 1 2 ...
##  $ posture     : Factor w/ 2 levels "SIT","STAND": 1 1 1 1 1 1 1 1 1 1 ...
##  $ condition   : int [1:7200] 7 8 9 10 11 12 13 14 15 16 ...
##  $ Resp        : chr [1:7200] "-1" "-1" "-1" "-1" ...
##  $ Name        : chr [1:7200] "7.wav" "8.wav" "9.wav" "10.wav" ...
##  $ Stim        : chr [1:7200] "אדום" "אדום" "אדום" "כחול" ...
##  $ RT          : int [1:7200] 5282 3161 944 812 1067 1096 1005 949 944 1380 ...
##  $ Correct     : logi [1:7200] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ raw outliers: logi [1:7200] FALSE FALSE TRUE TRUE TRUE TRUE ...
##  $ 2.5SD       : logi [1:7200] FALSE FALSE TRUE TRUE TRUE TRUE ...

Filtering RTs (Outlier removal)

When doing RT analysis, we often filter out incorrect responses because they are considered errors. Participants correctly performed the task in 7023 out of 7200 trials, which is about 97.5 % accuracy.

sum(df$Correct)
## [1] 7023
df <- filter(df, Correct)

Let’s look at the distribution of correct responses.

hist(df$RT)

c(mean = mean(df$RT), quantile(df$RT))
##      mean        0%       25%       50%       75%      100% 
##  957.9694  101.0000  720.0000  853.0000 1036.0000 9658.0000

The first obvious observation is that the RTs are positively skewed. The mean RT is around 958 ms; the fastest response was 101 ms and the slowest almost 10 s.

A common next step is to exclude outlier RTs that are more than 2 or 3 standard deviations (SDs) away from the mean. I do not advocate this approach because the responses are right-skewed (i.e., not normally distributed), and it increases researcher degrees of freedom (Simmons, Nelson, and Simonsohn 2011): one may set the criterion per participant, per condition within each participant, and before or after excluding incorrect trials.
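For illustration only, here is a sketch of such a relative criterion, a per-participant 2.5 SD cutoff; we will not use it below. Note how many choices are hidden in these few lines: the number of SDs, the grouping level, and the fact that incorrect trials were already excluded.

```r
# For comparison only (not used in this analysis): keep RTs within
# 2.5 SDs of each participant's own mean
df_sd <- df %>%
  group_by(Subj) %>%
  filter(abs(RT - mean(RT)) < 2.5 * sd(RT)) %>%
  ungroup()
```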

I believe we should minimize the impact of the data exclusion process, for example by not excluding data at all or by using absolute cut-offs. Here, I will set an absolute criterion and exclude responses that are too fast (less than 200 ms) or too slow (more than 3000 ms), although this criterion is also somewhat arbitrary and subject to the specific experimental paradigm.

sum(df$RT < 200)
## [1] 8

Responses that were too fast were around 0.11 % (8 / 7023) of the correct trials.

sum(df$RT > 3000)
## [1] 125

Responses that were too slow were around 1.78 % (125 / 7023) of the correct trials. In total, we remove around 1.89 % of the responses.

df <- df %>%
  filter(RT > 200, RT < 3000)

The RTs are still positively skewed.

hist(df$RT)

c(mean = mean(df$RT), quantile(df$RT))
##      mean        0%       25%       50%       75%      100% 
##  906.9106  224.0000  694.0000  853.0000 1036.0000 2946.0000

Setting contrasts

Our last preprocessing step is setting sum contrasts (sum-to-zero) for the two categorical predictors (posture, congruency). The default scheme for factors in R is treatment contrasts (e.g., control coded as 0 and treatment as 1). A detailed discussion of why it is a good idea to use sum contrasts instead of treatment contrasts, especially when interactions are involved, can be found in a great piece by Nicenboim, Schad, and Vasishth (2021).

df <- df %>%
  mutate(across(c(posture, congruency), ~ `contrasts<-`(.x, , contr.sum)))

(Actually, the afex package defaults to sum contrasts and automatically converts the contrasts of all factors when performing an analysis with its functions, so this step is unnecessary here. However, I added this section to explicitly discuss the differences between contrast schemes and the default behavior of R.)
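To see the difference concretely, we can compare the contrast matrices R generates for a two-level factor:

```r
contr.treatment(2) # default: first level coded 0, second coded 1
##   2
## 1 0
## 2 1
contr.sum(2) # sum-to-zero: levels coded 1 and -1
##   [,1]
## 1    1
## 2   -1
```

With sum contrasts, the intercept corresponds to the grand mean rather than to one reference cell, which is what makes main effects interpretable in the presence of an interaction.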

Data Analysis

Now we have preprocessed RT data. In this classic example of a 2 × 2 design, you may be tempted to use ANOVA right away. However, I suggest checking out mixed models and using them instead: An Introduction to Mixed Models for Experimental Psychology (Singmann and Kellen 2019). In the sections below, I will first show an example analysis with ANOVA, then one with mixed models. Lastly, I will demonstrate a further analysis using the logarithm of the RTs.

ANOVA

First off, an analysis of variance (ANOVA) incorporating the repeated measures design.

anova_res <- afex::aov_ez(
  data = df,
  dv = "RT",
  id = "Subj",
  within = c("posture", "congruency")
)
## Warning: More than one observation per cell, aggregating the data using mean
## (i.e, fun_aggregate = mean)!

As the warning message shows, one advantage of the afex package is that it automatically aggregates the data if they have not already been aggregated.
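The aggregation roughly corresponds to the following manual step (a sketch; afex handles this internally):

```r
# One mean RT per subject and design cell, as required by a repeated
# measures ANOVA with a single observation per cell
df_agg <- df %>%
  group_by(Subj, posture, congruency) %>%
  summarise(RT = mean(RT), .groups = "drop")
```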

anova_res
## Anova Table (Type 3 tests)
## 
## Response: RT
##               Effect    df      MSE          F  ges p.value
## 1            posture 1, 49 11283.79       0.69 .002    .410
## 2         congruency 1, 49  2476.50 257.47 *** .138   <.001
## 3 posture:congruency 1, 49  1951.28   11.47 ** .006    .001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Here, there is a significant main effect of congruency and a significant interaction between posture and congruency, but no significant effect of posture.

Linear Mixed Model (LMM)

LMMs provide much wider flexibility than ANOVA, for example by incorporating additional elements such as by-item effects; here, however, we test the minimal structure for simplicity. For more information about LMMs, please refer to the work of Barr et al. (2013) or Brauer and Curtin (2018).
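For reference, a model with a more maximal random-effects structure in the spirit of Barr et al. (2013) would add by-subject random slopes. This is a sketch only; on real data, such models may fail to converge and require simplification.

```r
# By-subject random intercept plus random slopes for both factors and
# their interaction (not run in this tutorial)
lmm_max <- afex::mixed(
  RT ~ posture * congruency + (posture * congruency | Subj),
  data = df
)
```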

lmm_res <- afex::mixed(
  RT ~ posture*congruency + (1|Subj),
  data = df
)
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
lmm_res
## Mixed Model Anova Table (Type 3 tests, S-method)
## 
## Model: RT ~ posture * congruency + (1 | Subj)
## Data: df
##               Effect         df          F p.value
## 1            posture 1, 6837.43     3.69 +    .055
## 2         congruency 1, 6837.11 269.57 ***   <.001
## 3 posture:congruency 1, 6837.17    9.52 **    .002
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

In this simple example, the ANOVA and the LMM yield similar results, although the posture effect is closer to significance in the LMM.

One can conveniently use the extract_eq function from the equatiomatic package to display the underlying equations of mixed-effects models. (Unfortunately, equatiomatic does not yet support mixed objects, so we refit the model with the lmer function for visualization purposes.)

equatiomatic::extract_eq(
  lme4::lmer(
    RT ~ posture * congruency + (1 | Subj),
    data = df
  )
)

$$ \begin{aligned} \operatorname{RT}_{i} &\sim N \left(\mu, \sigma^2 \right) \\ \mu &=\alpha_{j[i]} + \beta_{1}(\operatorname{posture}_{\operatorname{1}}) + \beta_{2}(\operatorname{congruency}_{\operatorname{1}}) + \beta_{3}(\operatorname{congruency}_{\operatorname{1}} \times \operatorname{posture}_{\operatorname{1}}) \\ \alpha_{j} &\sim N \left(\mu_{\alpha_{j}}, \sigma^2_{\alpha_{j}} \right) \text{, for Subj j = 1,} \dots \text{,J} \end{aligned} $$

LMM with log RTs

As we saw in the Filtering RTs (Outlier removal) section, the reaction time distribution is positively skewed. In this case, rather than modeling the RT distribution as normal, we can fit the LMM on log RTs, which not only reflects the RTs more realistically but also has further advantages (see Lo and Andrews (2015)). In the article You should (usually) log transform your positive data, Andrew Gelman concludes:

Log transform, kids. And don’t listen to people who tell you otherwise.

So let’s do as Gelman says.

df <- mutate(df, log_rt = log(RT))
hist(df$log_rt)

The log RT distribution looks much closer to a normal distribution now.
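We can also quantify this impression with the sample skewness (the third standardized moment). The helper function below is my own sketch, not from a package:

```r
# Sample skewness: positive values indicate a right-skewed distribution,
# values near 0 indicate approximate symmetry
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skewness(df$RT)     # clearly positive for raw RTs
skewness(df$log_rt) # much closer to 0 after the log transform
```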

lmm_logrt_res <- afex::mixed(
  log_rt ~ posture*congruency + (1|Subj),
  data = df
)
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
lmm_logrt_res
## Mixed Model Anova Table (Type 3 tests, S-method)
## 
## Model: log_rt ~ posture * congruency + (1 | Subj)
## Data: df
##               Effect         df          F p.value
## 1            posture 1, 6837.33    9.70 **    .002
## 2         congruency 1, 6837.10 419.92 ***   <.001
## 3 posture:congruency 1, 6837.14    7.69 **    .006
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Let’s plot the results.

afex::afex_plot(lmm_logrt_res, "posture", "congruency")

Interestingly, we now find an additional main effect of posture, which we did not observe in the previous models using RTs on their original scale.
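One caveat when reporting log-RT models: the fixed effects now live on the log scale, so a coefficient b corresponds to multiplying RT by exp(b). For example (with a hypothetical coefficient, not one estimated above):

```r
# A hypothetical coefficient of b = 0.05 on the log scale corresponds to
# roughly a 5% increase in RT, since exp(0.05) is about 1.05
b <- 0.05
exp(b)
```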

Conclusion

In this article, we have explored several approaches to RT analysis, along with a minimal preprocessing pipeline.

References

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Brauer, Markus, and John J. Curtin. 2018. “Linear Mixed-Effects Models and the Analysis of Nonindependent Data: A Unified Framework to Analyze Categorical and Continuous Independent Variables That Vary Within-Subjects and/or Within-Items.” Psychological Methods 23 (3): 389–411. https://doi.org/10.1037/met0000159.

Lo, Steson, and Sally Andrews. 2015. “To Transform or Not to Transform: Using Generalized Linear Mixed Models to Analyse Reaction Time Data.” Frontiers in Psychology 6. https://doi.org/10.3389/fpsyg.2015.01171.

Nicenboim, Bruno, Daniel Schad, and Shravan Vasishth. 2021. An Introduction to Bayesian Data Analysis for Cognitive Science. https://vasishth.github.io/bayescogsci/book/sec-MR-ANOVA.html#interactions-between-contrasts.

Rosenbaum, David, Yaniv Mama, and Daniel Algom. 2017. “Stand by Your Stroop: Standing up Enhances Selective Attention and Cognitive Control.” Psychological Science 28 (12): 1864–67. https://doi.org/10.1177/0956797617721270.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–66. https://doi.org/10.1177/0956797611417632.

Singmann, Henrik, and David Kellen. 2019. “An Introduction to Mixed Models for Experimental Psychology.” In New Methods in Cognitive Psychology, edited by Daniel Spieler and Eric Schumacher, 1st ed., 4–31. New York: Routledge.

Computing Environment

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] afex_1.0-1      lme4_1.1-27.1   Matrix_1.3-4    forcats_0.5.1  
##  [5] stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_2.0.0    
##  [9] tidyr_1.1.3     tibble_3.1.3    tidyverse_1.3.1 ggplot2_3.3.5  
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-152            fs_1.5.0                lubridate_1.7.10       
##  [4] httr_1.4.2              numDeriv_2016.8-1.1     tools_4.0.4            
##  [7] backports_1.2.1         bslib_0.2.5.1           utf8_1.2.2             
## [10] R6_2.5.0                DBI_1.1.1               colorspace_2.0-2       
## [13] withr_2.4.2             tidyselect_1.1.1        emmeans_1.6.2-1        
## [16] curl_4.3.2              compiler_4.0.4          cli_3.0.1              
## [19] rvest_1.0.1             xml2_1.3.2              labeling_0.4.2         
## [22] bookdown_0.22           sass_0.4.0              equatiomatic_0.2.0.9000
## [25] scales_1.1.1            mvtnorm_1.1-2           digest_0.6.27          
## [28] foreign_0.8-81          minqa_1.2.4             rmarkdown_2.9          
## [31] rio_0.5.27              pkgconfig_2.0.3         htmltools_0.5.1.1      
## [34] highr_0.9               dbplyr_2.1.1            rlang_0.4.11           
## [37] readxl_1.3.1            rstudioapi_0.13         farver_2.1.0           
## [40] jquerylib_0.1.4         generics_0.1.0          jsonlite_1.7.2         
## [43] broom.mixed_0.2.7       zip_2.2.0               car_3.0-11             
## [46] magrittr_2.0.1          Rcpp_1.0.7              munsell_0.5.0          
## [49] fansi_0.5.0             abind_1.4-5             lifecycle_1.0.0        
## [52] stringi_1.7.3           yaml_2.2.1              carData_3.0-4          
## [55] MASS_7.3-54             plyr_1.8.6              grid_4.0.4             
## [58] parallel_4.0.4          crayon_1.4.1            lattice_0.20-44        
## [61] haven_2.4.2             splines_4.0.4           hms_1.1.0              
## [64] knitr_1.33              pillar_1.6.2            boot_1.3-28            
## [67] estimability_1.3        codetools_0.2-18        reshape2_1.4.4         
## [70] reprex_2.0.0            glue_1.4.2              evaluate_0.14          
## [73] blogdown_1.4            data.table_1.14.0       modelr_0.1.8           
## [76] vctrs_0.3.8             nloptr_1.2.2.2          tzdb_0.1.2             
## [79] cellranger_1.1.0        gtable_0.3.0            assertthat_0.2.1       
## [82] xfun_0.24               openxlsx_4.2.4          xtable_1.8-4           
## [85] broom_0.7.9             coda_0.19-4             lmerTest_3.1-3         
## [88] ellipsis_0.3.2