The Effect of Stimulus-Response Compatibility on the Association of Fluid Intelligence and Working Memory with Choice Reaction Times

It is a well-replicated finding that reaction time is correlated with performance in intelligence tests. According to the binding hypothesis of working memory capacity, the ability to establish bindings between elements and to integrate them into new structural representations is the source of the common variance between different cognitive tasks, including fluid intelligence and working memory. The goal of this study was to examine the effects of stimulus-response compatibility on the association between reaction time, fluid intelligence, and working memory. Based on the binding hypothesis, we expected that correlations between reaction time and fluid intelligence would be larger for arbitrary than for compatible stimulus-response mappings. We report data from two studies (Study 1: n = 135, mean age = 18 years; Study 2: n = 153, mean age = 17 years). We used choice reaction time tasks with compatible and arbitrary mappings as well as indicators of fluid intelligence and working memory (Study 1) and fluid and crystallized intelligence (Study 2). In both studies, we established a measurement model that included a factor reflecting general reaction time, and a nested factor reflecting the cost of establishing and maintaining arbitrary stimulus-response bindings. The results of Study 1 supported the hypothesis that the ability to uphold arbitrary bindings is correlated with working memory, but it was not correlated with fluid intelligence. In Study 2, the correlations between the binding factor and fluid and crystallized intelligence were again not significantly different from 0. We discuss possible reasons for these findings.


Results
Experimental effects. We conducted ANOVAs with two within-subject variables: (1) instruction (speed vs. accuracy) and (2) SR mapping (compatible vs. arbitrary). Analyses were run separately by type of stimuli (arrow, shape, word). Results are summarized in Table 2. Central to our research questions, the effect of SR-compatibility on RT was significant for the arrow and shape tasks, but not for the word task. SR-compatibility had the expected effect on accuracy for all three tasks.
Measurement models for choice RT tasks. Because our main research questions concerned the effect of SR-compatibility on covariances of RT with gf and WM, we considered data from the word task not useful for answering these questions. Subsequent analyses were performed on RTs from arrow and shape tasks. We first tested a general factor model (Model A, see Figure 3) reflecting common variance among all choice RT tasks. Standardized model parameters and fit indicators are provided in Figure 3. Model A showed poor fit to data. Next, we established a latent factor reflecting shared variance due to instruction (speed vs. accuracy). In this model (Model B, see Figure 3), the nested factor "instruction" reflected shared variance due to accuracy instruction. The introduction of the nested instruction factor led to a significant increase in model fit (Δχ² = 104.58, Δdf = 4, p < .01), and to an improvement in fit indices. In a third step, we added a latent "binding" factor reflecting shared variance due to arbitrary SR mappings (Model C, see Figure 3). This led to a further significant increase in model fit (Δχ² = 27.25, Δdf = 4, p < .01) and to an improvement in fit indices. All four loadings of the binding factor were significant (weakest loading: p = .02; all others p < .01).

[Figure 3 caption, fragment] ..., p = .07, CFI = .99, RMSEA = .07, SRMR = .04. Models only included indicators of reaction time in the arrow and shape tasks because no consistent stimulus-response compatibility effects were found in the word task. Model B fit the data better than Model A and Model C fit the data better than Model B. N = 135, *p < .05.

Structural model. We extended Model C to test our hypotheses regarding correlations of RT with gf and WM. gf and WM were modeled as correlated factors. The structural model and standardized model parameters are shown in Figure 4. In line with the prediction of the binding hypothesis, the correlation of the binding factor with WM was significant (ρ = -.34, p = .01, CI = -.59 to -.08). The correlation between the binding factor and gf had the expected sign, but the 95% confidence interval included zero (ρ = -.24, p = .06, CI = -.48 to .01). General RT was correlated with gf (ρ = -.23, p = .03, CI = -.44 to -.02) and WM (ρ = -.26, p = .02, CI = -.49 to -.04). Correlations of the instruction factor with gf (ρ = -.09, p = .39, CI = -.31 to .12) and WM (ρ = -.04, p = .71, CI = -.27 to .18) were not significant. In sum, these findings are partially in line with those reported by Wilhelm & Oberauer (2006); however, correlations were smaller. Descriptive statistics for variables included in this model are provided in Table 3.

Study 2 Method
Participants and testing procedure. Data were collected in 2007 and 2008, in two 90-minute sessions, from 171 students attending 11th and 12th grades at six schools in Berlin and Brandenburg, Germany. We excluded data from 14 participants who did not complete both sessions and four participants who performed below chance level on one or more RT tasks. The final sample included 153 participants (65 male, 88 female, age: mean = 17.2 years, SD = 0.9, range = 16-20).
Measures. An overview of measures in Study 2 can be found on the OSF website (https://osf.io/cx52g/). Two- and four-choice RT tasks were used with varying SR-compatibility (compatible vs. arbitrary) and SR sets (arrows vs. words) and administered on a customized keyboard (picture available here: https://osf.io/tf4mc/). Participants were asked to respond as fast and as accurately as possible. The shape task was not used in Study 2 because of time constraints. SR mappings are shown in Figure 5. The tasks started with an instruction on the relevant SR mapping followed by practice and test blocks containing trials of the same SR mapping. Stimuli remained on the screen until participants responded. Compatible and arbitrary conditions of two-choice tasks were practiced in two and four blocks consisting of 8 trials each, respectively. Compatible and arbitrary conditions of four-choice tasks were practiced in three and six blocks consisting of 16 trials each, respectively. For each condition of two-choice and four-choice tasks, we used 6 test blocks consisting of 21 and 41 trials each, respectively. After each practice and test block, feedback on mean accuracy and RT was provided. Data for the RT tasks including latency and accuracy for each trial are provided on the OSF website (https://osf.io/x3vkc/). The practice trials and the first trial in each block were discarded in further analyses. Data treatment followed the same procedure as in Study 1. Table 4 provides an overview of descriptive statistics for latencies and accuracies in choice RT tasks.

We used four tests to measure gf, including three paper-pencil tests (solving equations, propositional reasoning, and matrices; Wilhelm, 2005) and a computerized version of 15 items from Set II of the Advanced Progressive Matrices (Raven, 1958). gc was measured with 43 items of the general knowledge test from the IST-2000-R (Amthauer, Brocke, Liepmann, & Beauducel, 2001). Data for the gf and gc tasks including raw responses to each item are provided on the OSF website (https://osf.io/rb7qu/). Three composites were built for verbal, numeric, and figural content. WM was not measured because of time constraints.
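The trial selection described for the RT tasks (discarding practice trials and the first trial of each block) can be illustrated with a minimal sketch. The `Trial` record, its field names, and the helper functions are hypothetical and do not reflect the layout of the OSF data files; the RT values are placeholders, and only the block counts and block lengths are taken from the description of the compatible conditions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    block: int         # running block number within one condition
    position: int      # 1-based position of the trial within its block
    is_practice: bool  # True for practice blocks, False for test blocks
    rt_ms: float       # response latency in milliseconds (placeholder here)
    correct: bool      # whether the response was correct (placeholder here)


def keep_for_analysis(trials: List[Trial]) -> List[Trial]:
    """Drop practice trials and the first trial of every block."""
    return [t for t in trials if not t.is_practice and t.position > 1]


def make_condition(practice_blocks: int, practice_len: int,
                   test_blocks: int, test_len: int) -> List[Trial]:
    """Build placeholder trials for one condition (hypothetical helper)."""
    trials: List[Trial] = []
    block = 0
    for n_blocks, length, practice in ((practice_blocks, practice_len, True),
                                       (test_blocks, test_len, False)):
        for _ in range(n_blocks):
            block += 1
            for position in range(1, length + 1):
                trials.append(Trial(block, position, practice, 500.0, True))
    return trials


# Compatible conditions: 2 x 8 practice + 6 x 21 test trials (two-choice),
# 3 x 16 practice + 6 x 41 test trials (four-choice).
two_choice = make_condition(2, 8, 6, 21)
four_choice = make_condition(3, 16, 6, 41)
print(len(keep_for_analysis(two_choice)))   # 6 * (21 - 1) = 120
print(len(keep_for_analysis(four_choice)))  # 6 * (41 - 1) = 240
```

Under this reading, each two-choice condition contributes 120 analyzed trials and each four-choice condition 240. Any further data treatment carried over from Study 1 is not modeled here.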

Results
Experimental effects. We conducted ANOVAs with two within-subject variables: (1) number of response alternatives (two-choice vs. four-choice) and (2) SR mapping (compatible vs. arbitrary). Analyses were run separately for both types of the word and arrow task (response keys: left/right and up/down and corresponding arrow keys, respectively). Results are summarized in Table 5. Central to our research question, SR-compatibility had an effect on average latency. Effect sizes were stronger for arrow than word tasks. For both arrow and word tasks, SR-compatibility effects differed by the number of response alternatives, as indicated by the interaction effect. An examination of descriptive statistics (Table 4) suggested that for the arrow task, SR-compatibility had effects in both the two-choice and four-choice task, but that the effect was stronger for the four-choice task. For the word task, no effect of SR-compatibility on latencies was found for the two-choice tasks, while SR-compatibility had an effect on latencies in the four-choice task. Effects on accuracies were less consistent: SR-compatibility had at best minor effects on accuracy in the word task.

Measurement models for choice RT tasks. We only included RTs from the arrow task in the measurement model, because (a) SR-compatibility effects on RT were weaker in the word task, (b) SR-compatibility had no consistent effect on accuracies in the word task, and (c) this allowed keeping models more consistent across Studies 1 and 2. We first tested a general factor model reflecting common variance among all choice RT tasks with correlated residuals between compatible and arbitrary conditions of the four-choice task (Model A, see Figure 6). Standardized model parameters are provided in Figure 6. Model A showed acceptable fit by some fit criteria, but RMSEA indicated poor fit. Next, we established a latent factor reflecting additional costs of binding (Model B, see Figure 6). The nested factor "binding" reflected shared variance due to arbitrary mappings. The introduction of the binding factor led to a significant increase in model fit (Δχ² = 39.66, Δdf = 3, p < .01) and to an improvement in fit indices. All loadings of the binding factor were significant (p < .01).
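The model comparisons reported in both studies rest on chi-square difference tests between nested models. As a rough check, the p-values implied by the reported Δχ² and Δdf values can be recomputed with a minimal sketch that uses only the Python standard library. The function name `chi2_sf` is our own, and the sketch assumes a plain (unscaled) chi-square difference test, which the text does not specify.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Delta chi-square and delta df values as reported in the text.
reported = {
    "Study 1, Model A vs. Model B": (104.58, 4),
    "Study 1, Model B vs. Model C": (27.25, 4),
    "Study 2, Model A vs. Model B": (39.66, 3),
}
for label, (delta_chi2, delta_df) in reported.items():
    print(f"{label}: p = {chi2_sf(delta_chi2, delta_df):.2e}")
```

All three comparisons fall below the p < .01 threshold stated in the text. If the original analyses used a scaled test statistic, the exact p-values would differ from this sketch.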
Structural model. We extended Model B to test our hypotheses regarding the correlation between RT and gf. The structural model and standardized model parameters are shown in Figure 7. The correlation between general RT and gf was moderately negative (ρ = -.23, p = .02, CI = -.42 to -.03). Contrary to the predictions of the binding hypothesis, the correlation between gf and binding was not significant (ρ = -.08, p = .54, CI = -.31 to .17). gc was unrelated to general RT (ρ = -.12, p = .25, CI = -.31 to .08) and binding (ρ = -.02, p = .88, CI = -.26 to .22). Descriptive statistics for variables included in this model are provided in Table 6.
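One way to write down the nested-factor structure underlying the measurement models in both studies is sketched below. The symbols are illustrative notation, not taken from the original model specification: G is the general RT factor, I the nested instruction factor (Study 1 only), and B the nested binding factor. The orthogonality of the nested factors to G is the usual convention for such models and is an assumption here, because the text does not state it.

```latex
% Sketch of the nested-factor measurement model (notation assumed).
% i indexes participants, j indexes choice RT indicators.
\begin{align}
  \mathrm{RT}_{ij} &= \nu_j + \lambda^{G}_{j}\, G_i
                     + a_j\, \lambda^{I}_{j}\, I_i
                     + b_j\, \lambda^{B}_{j}\, B_i
                     + \varepsilon_{ij}, \\
  a_j &= \begin{cases}
           1 & \text{accuracy instruction (Study 1 only)} \\
           0 & \text{otherwise}
         \end{cases}
  \qquad
  b_j = \begin{cases}
           1 & \text{arbitrary SR mapping} \\
           0 & \text{compatible SR mapping}
         \end{cases} \\
  \operatorname{Cov}(G, I) &= \operatorname{Cov}(G, B) = 0
  \quad \text{(assumed)}
\end{align}
```

In this notation, the general factor model sets all a_j and b_j to zero, and each nested factor is introduced by freeing the corresponding loadings. The structural models then estimate the correlations of G and B (and, in Study 1, I) with the ability factors.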

Discussion
We examined effects of SR-compatibility on correlations of RT with gf and WM. Based on the binding hypothesis, we expected that gf and WM would be related to the ability to establish and uphold arbitrary bindings between independent elements. This ability is especially important with arbitrary SR mappings because there are no preexisting associations between stimuli and the corresponding responses. In Study 1, partially in line with the predictions of the binding hypothesis, WM was correlated with the latent binding factor (ρ = -.34, p = .01, CI = -.59 to -.08). The correlation between the binding factor and gf had the expected direction but was not significant (ρ = -.24, p = .06, CI = -.48 to .01). In Study 2, there was no evidence for a correlation of gf with the binding factor (ρ = -.08, p = .54, CI = -.31 to .17).
The present studies were conceptual replications of the study by Wilhelm and Oberauer (2006) who found higher correlations of WM and gf with a latent binding factor than with a general RT factor. Study 1 partially replicated these findings; however, the correlations with binding were substantially weaker (ρ = -.24 with gf, and ρ = -.34 with WM) than in the previous study (ρ = -.55 with gf, and ρ = -.89 with WM; Wilhelm & Oberauer, 2006). Study 2 did not replicate their findings. In sum, we found mixed evidence for the binding hypothesis and we could not replicate the strong correlation found by Wilhelm and Oberauer (2006). The study by Wilhelm and Oberauer (2006) was based on a smaller sample, and their correlations of binding with gf and WM were probably overestimated.
The inconsistency of the present results with Wilhelm and Oberauer (2006) is arguably due to a confound between response conflict and binding in the earlier study. One difference between the studies is that there was dimensional overlap between the SR sets both in compatible and arbitrary conditions in the study by Wilhelm and Oberauer (2006). In arbitrary conditions, the corresponding response key for each stimulus was present among the response alternatives, but not the correct response. This possibly resulted in response conflicts, which may have led to an increased need for inhibitory control (Kane & Engle, 2003). Therefore, individual differences in the ability to control response conflict (for instance through inhibition of erroneous response tendencies) offer an alternative explanation of the correlation of arbitrary SR mappings with WM and gf. Obviously, both explanations might also work in concert. In the current study, response conflicts were avoided by generating SR sets without dimensional overlap for the arbitrary-mapping conditions.

One inevitable downside of the present task design is that the tasks with compatible vs. arbitrary mappings used different stimulus categories (e.g., arrows vs. colors). The compatibility effect is therefore potentially confounded with differences in how easily these stimulus categories can be perceptually identified and discriminated. Moreover, if the differences between stimuli vary across individuals, then individual differences in compatibility effects could in part reflect individual differences in perceptual identification and discrimination. This contamination of the variance reflected in the "binding" factors could be responsible for their weaker associations with intelligence and WM capacity. Against this assumption, Meier & Kane (2015) did not find any evidence that RT costs due to SR incompatibility were related to individual differences in WM. However, their study involved compatible vs. incompatible, but no arbitrary SR mappings.

Another important difference between the present studies and the study by Wilhelm & Oberauer (2006) is that the previous study involved a more heterogeneous sample. The participants in the present studies were 11th and 12th grade students, whereas the participants in Wilhelm & Oberauer ranged between 18 and 36 years in age. They were also more heterogeneous in terms of educational background: 82% of participants had completed the German academic high school track, whereas all participants in the present studies were students attending the academic track. This may have resulted in higher correlations between all cognitive variables in Wilhelm & Oberauer. Specifically, we found a correlation of ρ = .45 between gf and gc in Study 2 whereas Wilhelm and Oberauer reported a correlation of ρ = .75. Furthermore, the correlations between Arrow Series, Number Series, Propositions, and Raven's matrices in Study 1 ranged from r = .07 to .37. The same correlations ranged from r = .32 to .50 in Wilhelm and Oberauer (2006). The correlations between rotation span, counting span, and memory updating in Study 1 ranged from r = .31 to .42; the same correlations ranged from r = .47 to .57 in Wilhelm & Oberauer (2006). In contrast, the correlation between gf and WM was somewhat higher in Study 1 (ρ = .92) than in the study by Wilhelm & Oberauer (2006) (ρ = .81).
In Studies 1 and 2, we observed that the SR-compatibility effect depended on the SR set: With words as stimuli, no or weaker effects of SR-compatibility were found. More research is needed to understand why effects of SR-compatibility were non-existent or weaker in the word task.