Explicit and Implicit Devaluation Effects of Food-Specific Response Inhibition Training

The overvaluation of reward-associated stimuli such as energy-dense foods can drive compulsive eating behaviours, including overeating. Previous research has shown that training individuals to inhibit their responses towards appetitive stimuli can lead to their devaluation, providing a potential avenue for behaviour change. Across two preregistered experiments, we investigated whether training participants to inhibit their responses to specific foods reduces their evaluations of those foods when assessed using both explicit and implicit measures. Participants completed an online session of go/no-go training in which energy-dense foods were consistently associated with either responding (go) or inhibiting a response (no-go). An ‘explicit’ devaluation effect was expected as a reduction in self-reported liking from pre- to post-training for no-go items compared to both go items and foods that were not presented during training (untrained items). An ‘implicit’ devaluation effect was then measured using the affective priming paradigm, by comparing differences in reaction times between congruent and incongruent trials (i.e., priming effects) across food primes. Experiment 1 revealed conclusive evidence for small-to-medium devaluation effects in terms of both explicit ratings and priming effects. We also observed that the priming effect for no-go items was close to zero. Experiment 2 successfully replicated most of the preregistered and exploratory outcomes from Experiment 1, except for the priming effect for untrained items. Potential explanations for this discrepancy are discussed, but overall these findings provide further support for a devaluation effect of response inhibition training. To our knowledge, our study provides the first evidence that training-induced devaluation can potentially be captured by affective priming measures, but more research is needed to further assess their sensitivity before they can be used to elucidate the mechanisms of action underlying devaluation effects.


Supplementary Information (SI) S1. Recruitment and data exclusions
Figure S1. Recruitment and data exclusion diagrams. At the recruitment stage we do not take into account the number of participants who dropped out of the study before it was initiated, and only consider the total number of individuals who started the study and answered screening questions to assess their eligibility. In this figure we report the number of participants who had incomplete data (voluntary or software-related drop-outs), were not eligible to participate, or were excluded after study completion based on preregistered criteria for data analysis. A. In Experiment 1, the majority of participants were recruited via Prolific, and a number of individuals from the general population completed the study online after receiving the link through personal communication. A total of 120 participants were eligible to participate and completed the study. There were minimal exclusions for accuracy in the affective priming paradigm (APP; food or non-food blocks) and the proportion of correct responses on no-go trials in the go/no-go training (GNG) task. After seven exclusions, the final dataset consisted of 113 participants. B. In Experiment 2, recruitment was extended to include participants from Cardiff University who used the Experimental Management System (EMS) for course credits. A total of 290 participants completed eligibility screening and 72 were excluded, although inclusion/exclusion criteria were advertised with the study. Of the 218 participants who took part in the study, ten datasets were incomplete (GNG and/or APP data). Of the 208 participants who completed the study, 18 were excluded for their accuracy in the APP and GNG as shown in the diagram. The final dataset for analysis consisted of 190 participants. There were no data exclusions due to APP timing delays in either Experiment 1 or Experiment 2. ER: Error rate(s); PCnogo: Proportion of correct responses on no-go trials

S2. Bayes Factor Design Analyses
For the Bayes Factor Design Analysis (BFDA) we used the BFDA R package (Schönbrodt & Stefan, 2018, 2019) and the code is available at https://osf.io/evcng/. For the BFDA simulations the design priors were the same as the planned analysis priors for both Experiments 1 and 2. For the planned directional Bayesian paired-samples t-tests in Experiment 1 (H1-H3), we examined the probabilities of simulated studies (10,000) terminating at either the boundary for the alternative hypothesis (H1) or the null (H0) when the specified evidential threshold was reached. Simulations were run for nmin (N=50) and nmax (N=130) as well as the stopping points for data inspection (i.e., every 20 participants after nmin was reached; N=70, N=90 and N=110). These simulations were repeated for three different effect sizes as given by Cohen's dz (0.2, 0.5, 0.8) and a true effect of zero (i.e., H0).
Analyses for Experiment 1 showed that 62.62% of all simulated studies correctly terminated at the H0 boundary when nmax was reached, and the probability of obtaining false positive evidence was low, with only 0.95% of studies incorrectly terminating at the H1 boundary. At nmax, assuming a 'small' true effect size (dz = 0.2), only 35.65% of studies correctly terminated at the H1 boundary, 4.6% of studies incorrectly terminated at H1, and no studies reached the H0 boundary at the specified evidential threshold (0%). For an assumed 'large' true effect size (dz = 0.8), 99.99% of studies correctly terminated at H1 with 70 participants.
The BFDA procedure in Experiment 2 was adjusted to account for the direct replication of findings, and simulations were only run for the expected effect size of 0.35 under H1 (for justification and details on how this effect size was obtained, please see Analyses for Experiment 2). The maximum sample size was increased to 200 and, consistent with Experiment 1, we used symmetric boundaries for our stopping rule (BF10 ≥ 10 or BF01 ≥ 10). The BFDA was performed for the Bayesian paired-samples t-tests that correspond to H1, H2, H3 and H4 in Experiment 2. The R code is available at https://osf.io/esy8x/.
Results showed that 80% of all studies stopped at N = 90 and 95% of all studies stopped at N = 130 if H1 was true. The false negative rate was 0.6% and 98.4% of studies correctly terminated at the H1 boundary. Only 1% of studies terminated at nmax, out of which only 0.4% were inconclusive (anecdotal evidence for H1 or H0). Under H0, 80% of all studies stopped at N = 130 and 95% of all studies stopped at N = 190. The false positive rate was 2.9% and 88.6% of studies correctly terminated at the H0 boundary. A total of 8.6% of studies terminated at nmax, with only 0.4% showing evidence for the alternative hypothesis compared to the null (BF10 > 3) and 4.7% showing evidence for the null hypothesis compared to the alternative (BF01 > 3).
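The stopping-rule logic of these simulations can be sketched in a few lines. The example below is a simplified Python illustration rather than the BFDA R package used for the reported analyses: it approximates the Bayes factor with the BIC method for a one-sample/paired t-test (Wagenmakers, 2007) instead of the JZS priors used in BFDA, so the termination probabilities it produces will not match the reported values exactly.

```python
import numpy as np
from scipy import stats

def bic_bf10(t, n):
    """BIC approximation to BF10 for a one-sample (paired) t-test:
    BF01 ~= sqrt(n) * (1 + t^2 / (n - 1))**(-n / 2) (Wagenmakers, 2007)."""
    bf01 = np.sqrt(n) * (1.0 + t ** 2 / (n - 1)) ** (-n / 2.0)
    return 1.0 / bf01

def simulate_sequential(dz, n_min=50, n_max=130, step=20, threshold=10.0,
                        n_sims=2000, seed=None):
    """Proportion of simulated studies stopping at the H1 and H0 boundaries
    under a true standardised effect dz, checking the BF at n_min and then
    every `step` participants up to n_max (as in the reported design)."""
    rng = np.random.default_rng(seed)
    stop_h1 = stop_h0 = 0
    for _ in range(n_sims):
        diffs = rng.normal(dz, 1.0, size=n_max)  # per-participant difference scores
        for n in range(n_min, n_max + 1, step):
            t = stats.ttest_1samp(diffs[:n], 0.0).statistic
            bf10 = bic_bf10(t, n)
            if bf10 >= threshold:        # evidence boundary for H1
                stop_h1 += 1
                break
            if bf10 <= 1.0 / threshold:  # evidence boundary for H0
                stop_h0 += 1
                break
    return stop_h1 / n_sims, stop_h0 / n_sims

p_h1, p_h0 = simulate_sequential(dz=0.5, seed=1)
print(f"dz = 0.5: P(stop at H1) = {p_h1:.2f}, P(stop at H0) = {p_h0:.2f}")
```

Under a medium true effect (dz = 0.5), most simulated studies hit the H1 boundary early, mirroring the qualitative pattern of the reported BFDA.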

S3. Food and non-food stimuli characteristics
The set of food and non-food stimuli used for selection was adopted from previous work that employed the affective priming paradigm as an indirect measure of food liking (see Tzavella et al., 2020 and corresponding Supplementary Material). For food categories that were added in this study we used the food-pics database (Blechert, 2019; Blechert et al., 2014) and Pixabay (https://pixabay.com/). All stimuli that could be shared openly in the public domain are available at https://osf.io/u36nd/ and IDs from the food-pics database are also provided for copyright-protected stimuli (https://osf.io/qbjcg/). The nutritional information of the foods was not recorded in detail for this study, as the task design only included food categories that were considered high in fat, sugar and/or salt according to the NHS colour-labelling system (NHS, 2018) and food choice behaviour was not investigated. Where appropriate, stimuli were edited to have a white background and matched dimensions.

S4. Targets in the affective priming paradigm
At the time of study preregistration, targets were selected based on ratings from 84 individuals recruited as part of ongoing data collection for a previous study (Tzavella et al., 2020). Valence and arousal ratings were recorded, and participants were also asked to indicate which targets they considered ambivalent; any such words were excluded (e.g., "alone"). The descriptive statistics for positive and negative target word ratings can be found in Tables S1 and S2, respectively. The sets of positive and negative targets in our previous study were originally selected from the EMOTE database (Grühn, 2016) and were matched as much as possible on imagery, concreteness, familiarity, and emotionality.

S6.1 Normality violations in Experiment 1
Shapiro-Wilk tests revealed that all contrasts (i.e., difference scores of the sample means) under H2 did not follow a normal distribution and, as planned, the median RTs were log-transformed and RT priming effects were re-computed for supplementary analyses. Consistent with the results from confirmatory analyses, there was very strong evidence for both H2a and H2b. The overall logRT priming effect for no-go foods was lower relative to that observed for go foods [H2a; BF10 = 67.44; t(112) = −3.53, p < .001, dav = −0.37, 95% CI for dav = −0.58, −0.16]. The logRT priming effect for no-go foods was also lower compared to the effect for untrained foods [H2b; BF10 = 33.00; t(112) = −3.30, p = 0.001, dav = −0.31, 95% CI for dav = −0.50, −0.12]. The difference scores for changes in liking ratings from pre- to post-training could not be log-transformed, but supplementary analyses were conducted in order to examine whether the devaluation effect was robust after the removal of extreme values in the data. Participants were excluded from these analyses if, based on the Interquartile Range Rule (IQR Rule), their difference scores were above or below the acceptable maximum and minimum values, respectively. The sample size after the removal of outliers was 103 and there was still extreme evidence that participants rated no-go foods more negatively after training relative to both go foods [H1a; BF10 = 1126.94; t(102) = −4.37, p < .001, dav = −0.50, 95% CI for dav = −0.73, −0.26] and untrained foods [H1b; BF10 = 162.36; t(102) = −3.81, p < .001, dav = −0.36, 95% CI for dav = −0.56, −0.17].
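A minimal sketch of this re-computation in Python/pandas (the published analyses were run with the preregistered R/JASP pipeline; the column names below are illustrative, not those of the shared dataset): per-condition median RTs are log-transformed before taking the incongruent minus congruent difference.

```python
import numpy as np
import pandas as pd

def log_priming_effects(trials: pd.DataFrame) -> pd.DataFrame:
    """Per participant and training condition, compute the logRT priming
    effect: log(median RT, incongruent) - log(median RT, congruent).
    Expects columns: participant, condition, congruency, rt (ms)."""
    med = (trials
           .groupby(['participant', 'condition', 'congruency'])['rt']
           .median()
           .unstack('congruency'))
    logmed = np.log(med)  # log-transform the median RTs
    logmed['log_priming'] = logmed['incongruent'] - logmed['congruent']
    return logmed['log_priming'].unstack('condition')

# Minimal example with two participants and one training condition
demo = pd.DataFrame({
    'participant': [1] * 4 + [2] * 4,
    'condition': ['no-go'] * 8,
    'congruency': ['congruent', 'congruent', 'incongruent', 'incongruent'] * 2,
    'rt': [500, 520, 560, 580, 480, 500, 530, 550],
})
print(log_priming_effects(demo))
```

The resulting per-participant logRT priming effects per condition would then feed the Bayesian paired-samples t-tests reported above.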

S6.2 Normality violations in Experiment 2
RT priming effects under H2 did not violate the normality assumption, as defined in our preregistered exclusion criteria, and therefore supplementary analyses were not conducted. On the contrary, Shapiro-Wilk tests suggested deviations from normality for all contrasts under H4. The results from the supplementary analyses with logRTs were consistent with the results from the preregistered analyses. After removal of outliers based on the IQR Rule for changes in explicit liking from pre- to post-training, there were 173 participants in the sample. The supplementary analyses were not consistent with the preregistered analyses without outlier exclusion with regards to untrained foods. There was strong evidence that participants rated no-go foods more negatively after training compared to untrained foods [H1b; BF10 = 10.36; t(172) = −2.69, p = 0.004, dav = −0.19, 95% CI for dav = −0.34, −0.05]. After exclusions for outliers, there was still extreme evidence that the change in explicit liking for no-go foods was more negative relative to the change for go foods [H1a; BF10 = 888.80; t(172) = −3.94, p < .001, dav = −0.30, 95% CI for dav = −0.45, −0.15].

S7. Inspection for speed-accuracy trade-offs
Error rates on congruent and incongruent trials across training conditions were inspected for the potential occurrence of speed-accuracy trade-offs (SATs), where participants strategically trade accuracy against speed (e.g., slowing down to be more accurate, or responding consistently fast and ignoring any errors). For the priming effects to be interpretable, error rates would need to be either greater on incongruent compared to congruent trials, or no statistical difference should be observed. Bayesian and frequentist paired-samples t-tests were non-directional (two-tailed). We used the same analysis priors employed for the preregistered analyses in the two experiments.
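As an illustration, the comparison described above can be run as follows (Python with SciPy; the reported analyses used Bayesian and frequentist paired-samples t-tests in JASP, and the error-rate data here are simulated purely for the sketch):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated per-participant error rates (proportions) on congruent and
# incongruent APP trials; real values would come from the task data.
err_congruent = rng.beta(2, 30, size=113)
err_incongruent = np.clip(err_congruent + rng.normal(0.01, 0.02, size=113), 0, 1)

# Two-tailed paired t-test on the error rates. A speed-accuracy trade-off
# is NOT indicated when errors are greater on incongruent trials, or when
# no statistical difference is observed.
result = stats.ttest_rel(err_incongruent, err_congruent)
mean_diff = float(np.mean(err_incongruent - err_congruent))
print(f"mean difference = {mean_diff:.4f}, t(112) = {result.statistic:.2f}, "
      f"p = {result.pvalue:.3f}")
```

With real data, the same paired contrast would be run within each training condition, alongside its Bayesian counterpart with the preregistered priors.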

S8. Robustness checks for analysis prior in Experiment 1
To assess how robust our results were to the choice of a default Cauchy prior in Experiment 1, we first conducted checks using JASP to examine whether the evidence for our hypotheses, as indicated by the computed BFs, changed when a wide or ultrawide prior was used (see Figure S2). None of the directional t-tests showed any meaningful differences in the strength of evidence for the alternative compared to the null hypothesis. For our manipulation check (H3) with an ultrawide prior, the BF was 93.87 (i.e., just under the threshold for extreme evidence). After the checks were completed, we recomputed all BFs for the preregistered hypotheses using the updated informed prior from Experiment 2 and additional location parameters (see Table S3) to account for the smallest effect size of interest (d = 0.163; smaller effect reported in Study 2 of Chen, Veling, Dijksterhuis et al., 2018 for low-value food items) and the average expected effect size that has informed the sampling plan in similar studies (d = 0.537; e.g., Chen, Veling, de Vries et al., 2018; Chen et al., 2019).

S9. Priming effects for low and high liking ratings
To what extent might the washing out of the no-go priming effect reflect the elimination of response tendencies after training rather than attenuated evaluations of the food stimuli? To answer this question, we undertook an exploratory analysis to compare the training-induced reduction of the priming effect between foods that received lower vs. higher liking ratings. We reasoned that if training eliminated response tendencies (independently of food devaluation), then the reduction of the no-go priming effect should be insensitive to liking ratings. On the other hand, if training caused devaluation, then the reduction should be greater for lower-rated foods (see Figure S3 below). As the experiments did not differ in their task designs or procedure, for this analysis we pooled the data across Experiments 1 and 2 to maximise sensitivity (N=303). In each training condition there were only 8 food items, represented by two exemplars. This meant that there were not enough available food categories to extract a 'moderate' and 'high' liking spectrum of scores, so instead we applied a median split to assign all food items in each condition into low and high categories. This was undertaken for both pre-training and post-training ratings.
As a first step in this analysis, we examined the distribution of mean liking ratings from the pre-training phase and excluded participants who had a negative average (<0) for no-go or untrained foods, as this would indicate an overall disliking of the items (N=243). Similarly, we excluded extremely positive values (>80) as they could not be considered moderate (N=230). Next, we calculated the average RT priming effect (ΔRT) for lower-rated and higher-rated foods in the no-go and untrained conditions (see Figure S3) based on the ratings acquired post-training. Our first assumption for this analysis was that no-go foods that were rated more negatively would be easier to devalue and would therefore show a greater devaluation effect compared to lower-rated untrained foods. As shown in Figure S3, we expected that this devaluation effect for lower-rated foods (no-go < untrained) would be greater than that for higher-rated foods. Therefore, our next step in this analysis was to calculate the two devaluation effects directly (untrained − no-go for lower-rated foods and untrained − no-go for higher-rated foods). If the no-go priming effect was completely diminished due to items not eliciting any response tendencies, we would expect no difference between the two devaluation effects (RT priming differences). However, if the devaluation effects reported as part of our preregistered analyses were driven by a change in the strength of liking for no-go stimuli, then we would expect the devaluation effect for lower-rated items to be greater than that for higher-rated items. In the final step of the analysis we further removed outliers for the calculated difference scores based on the IQR rule, as previously explained in S6.1, which resulted in a sample of 215 participants. We conducted Bayesian paired-samples t-tests with the informed prior parameters from Experiment 2, assuming a small effect (0.20) for the expected difference (devaluation effect for lower-rated foods > devaluation effect for higher-rated foods). Supplementary frequentist statistics are provided for completeness. Consistent with the devaluation hypothesis, there was moderate evidence that the priming devaluation effect for lower-rated foods post-training (M = 15.67, SE = 3.29) was greater than that for higher-rated foods (M = 5.28, SE = 3.47) [BF10 = 8.30; t(214) = 2.37, p = 0.009, d = 0.16; also see Figure S4]. Looking at each devaluation effect separately, there was extreme evidence for a small-to-medium effect for lower-rated items, as the ΔRT was smaller for no-go foods (M = −7.45, SD = 45.04) compared to untrained foods.
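The exclusion and splitting steps described above can be sketched as follows (Python/pandas; the thresholds match those reported, but the data and variable names are hypothetical, and the actual analysis used the preregistered R/JASP pipeline):

```python
import numpy as np
import pandas as pd

def iqr_bounds(x: pd.Series, k: float = 1.5):
    """Acceptable range under the interquartile range rule:
    [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def median_split(ratings: pd.Series) -> pd.Series:
    """Label items at or below the median liking rating 'low', others 'high'."""
    return pd.Series(np.where(ratings <= ratings.median(), 'low', 'high'),
                     index=ratings.index)

# Hypothetical per-participant mean pre-training ratings and difference
# scores (devaluation effect for lower- minus higher-rated foods).
rng = np.random.default_rng(0)
pre_ratings = pd.Series(rng.normal(40, 30, size=303))
diff_scores = pd.Series(rng.normal(10, 30, size=303))

# Step 1: drop overall disliked (<0) and extremely positive (>80) averages.
keep = (pre_ratings >= 0) & (pre_ratings <= 80)

# Step 2: remove outlying difference scores with the IQR rule.
lo, hi = iqr_bounds(diff_scores[keep])
final = diff_scores[keep & diff_scores.between(lo, hi)]
print(f"retained {final.size} of {diff_scores.size} participants")
```

The retained difference scores would then enter the Bayesian paired-samples t-test with the informed prior described above.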


Figure S2. Robustness check plots for Cauchy priors in Experiment 1.

Figure S3. Predicted differences between devaluation effects for lower- and higher-rated foods if the training influenced valuation of the foods rather than a generalised reduction of response tendencies.

Figure S4. Plotted devaluation effects for lower- and higher-rated foods. Vertical bars in the boxplots indicate the range, excluding outliers based on the Interquartile Range.

Table S2. Descriptive statistics for negative target word valence and arousal ratings

S5. Debriefing questions
Please answer the following questions about the main word task you completed. Please try to respond honestly. Research shows that people, when answering questions, prefer not to pay attention and minimise their effort as much as possible. If you are reading this, please select "none of the above" on the next question.
Q3. How frequently did you see the content of the picture that was presented before the word? 1=Never; 2=Very infrequently; 3=Somewhat infrequently; 4=Occasionally; 5=Somewhat frequently; 6=Very frequently; 7=Always
Q4. Please indicate whether you believe that the picture content influenced your responses in any way by selecting all statements below that apply to your performance in the word task. Please select all that apply.
- Faster to categorise positive words when the picture was positive (i.e., picture you liked the most)
- Faster to categorise negative words when the picture was negative (i.e., picture you liked the least)
- Slower to categorise positive words when the picture was negative
- Slower to categorise negative words when the picture was positive
- Responses were not influenced by the content of the pictures
Q5. Did you find all the words in the task clearly positive or negative? Certain words may be considered unclear or ambivalent. These may be words that have both positive and negative meaning for you depending on the context. If not, please type in any words in the text box.
Q7. Did you purposefully use any kind of strategy to make your responses faster and/or more accurate?
- No
- Yes [open-ended response]
Please select all that apply.
- Slowed down to be more accurate
- Responded fast most of the time and ignored any errors
- No strategy used
- Other [open-ended response]
Were you aware of the study hypotheses/aims prior to completion? If yes, please explain. For example, did you know that the attention task is a form of training and/or what is measured by the word task?
Q11. During the attention task, did you learn that on occasions where you shouldn't respond there were specific food images being shown?
- No
- Yes
All questionnaires added for student analyses were completed at the end of the study in random order; these included the short form of the Barratt Impulsiveness Scale (BIS-15; Spinella, 2007), the Food Cravings Questionnaire – Trait – reduced (FCQ-T-r; Meule et al., 2014) and the Perceived Stress Scale (PSS).