A Database of Dutch–English Cognates, Interlingual Homographs and Translation Equivalents

To investigate the structure of the bilingual mental lexicon, researchers in the field of bilingualism often use words that exist in multiple languages: cognates (words that share both form and meaning across languages) and interlingual homographs (words that share their form but not their meaning). A high proportion of these studies have investigated language processing in Dutch–English bilinguals. Despite the abundance of research using such materials, few studies have attempted to validate them. We conducted two rating experiments in which Dutch–English bilinguals rated the meaning, spelling and pronunciation similarity of pairs of Dutch and English words. On the basis of these results, we present a new database of Dutch–English identical cognates (e.g. “wolf”–“wolf”; n = 58), non-identical cognates (e.g. “kat”–“cat”; n = 74), interlingual homographs (e.g. “angel”–“angel”; n = 72) and translation equivalents (e.g. “wortel”–“carrot”; n = 78). The database can be accessed at http://osf.io/tcdxb/.

Despite the abundance of research using cognates and interlingual homographs, and the high proportion of those studies that have investigated language processing in Dutch-English bilinguals specifically, few studies have extensively validated or pre-tested such materials. Indeed, the largest study to do so for Dutch and English that we are aware of was conducted by Dijkstra et al. (2010). As part of the stimulus development for a series of experiments examining the impact of cross-linguistic similarity, they asked Dutch-English bilinguals to rate pairs of Dutch and English words in terms of their semantic, orthographic and phonological similarity. This rating experiment yielded a set of 360 words, all of which had a semantic similarity rating greater than 6 on their 7-point scale. Half of the items had an orthographic similarity rating of less than 2 (and were considered the control items in their subsequent experiments), while the other half had ratings greater than 2 (and were considered the 'cognates'). Notably, however, only 31 items were identical cognates.
Another large study that collected similarity ratings from Dutch-English bilinguals was conducted by Tokowicz, Kroll, De Groot, and Van Hell (2002). Their aim was to collect number-of-translation norms for 562 Dutch-English translation pairs. As in the rating study conducted by Dijkstra et al. (2010), they asked Dutch-English bilinguals to rate the semantic similarity of these pairs of words but, in contrast to Dijkstra et al. (2010), their participants were asked to give a single 'form' similarity rating, taking both the pairs' orthographic and phonological similarity into account. The authors state that approximately 40% of these pairs could be considered cognates, but only 35 pairs were identical in form. In short, although resources exist that have validated the cross-linguistic similarity of pairs of Dutch and English words, these resources contain very few identical pairs and, where they do, most of these pairs are cognates. No one has, as yet, attempted to validate a set of Dutch-English interlingual homographs.
The aim of the experiments presented here was to fill that gap. We conducted two rating experiments to develop a database of Dutch-English identical and non-identical cognates and identical interlingual homographs,¹ as well as Dutch-English translation equivalents. The identical cognates, non-identical cognates and translation equivalents were rated in Experiment 1; the interlingual homographs were rated in Experiment 2.² As in Dijkstra et al. (2010) and Tokowicz et al. (2002), we asked Dutch-English bilinguals to rate the items' similarity in Dutch and English in terms of their meaning, spelling and pronunciation. Ratings were obtained for meaning, spelling and pronunciation similarity as these variables critically affect word processing in bilinguals (e.g. Dijkstra et al., 1999). Phonological similarity is not usually considered a core feature of the definitions of the word types, however, so the pronunciation similarity ratings were obtained for the sake of completeness only and were not used to discard any items from the database. Furthermore, as we intended to use these items in a cross-lingual priming paradigm in which participants would first read Dutch sentences that contained one of the stimuli (see Poort & Rodd, 2017, May 30, and Experiment 2 of Poort & Rodd, in press), we decided to provide these sentences in the rating experiments as well.
In these experiments, we adopted the following definitions of the critical word types:
• Identical cognates were defined as words that had an identical written form in both Dutch and English and highly similar meanings in both languages (e.g. "wolf"-"wolf").
• Non-identical cognates were defined as having very similar but not identical forms in Dutch and English and highly similar meanings in both languages (e.g. "kat"-"cat").
• Interlingual homographs were defined as having identical forms in Dutch and English, but different and unrelated meanings (e.g. "angel"-"angel", where "angel" means "insect's sting" in Dutch).
• Translation equivalents were defined as pairs of Dutch and English words that were translations of each other but whose written forms were not at all or only minimally similar (e.g. "wortel"-"carrot").
¹ Non-identical interlingual homographs technically also exist (e.g. the Dutch word "prei" means "leek" while the English word "prey" refers to something that is hunted), but these are much harder to operationalise. This is most likely because few bilinguals would consider the 'conflict' for a non-identical interlingual homograph to be as strong as for an identical interlingual homograph. Consequently, these items are not often used in research and we did not set out to validate a set of non-identical interlingual homographs.
² The two experiments were conducted separately for the simple reason that we initially set out only to create a database of identical and non-identical cognates and translation equivalents. However, as the Additional analyses reported for the second experiment show, the participants in the two experiments did not use the scales in a meaningfully different manner. This indicates that the fact that the ratings were obtained in separate experiments does not affect their validity.

Materials
We first collected an initial set of 103 identical cognates, all nouns and/or adjectives between 3 and 8 letters long. (Note that some of the items could also be used as verbs in English, such as "plan"-"plan".) Sixty-one of these items were taken from Dijkstra et al. (2010) and Tokowicz et al. (2002). The rest of the identical cognates were selected from other published research articles (see Table 1). Of the 61 identical cognates selected from Dijkstra et al. (2010) and Tokowicz et al. (2002), we discarded all items with a meaning similarity rating of less than 6 on their 7-point scales. We also discarded any items with a frequency in Dutch or English of less than 2 occurrences per million according to the SUBTLEX-NL (Keuleers, Brysbaert, & New, 2010) and SUBTLEX-US (Brysbaert & New, 2009)³ databases and one item that had a mean lexical decision accuracy of less than 85% in the English Lexicon Project (Balota et al., 2007). Finally, we discarded any items that were only identical in form when inflected (e.g. "pure"-"pure", where "pure" in Dutch is the inflected form of the adjective "puur"). Next, we collected an initial set of 134 non-identical cognates and 444 translation equivalents, again all nouns and/or adjectives between 3 and 8 letters long and with frequencies greater than 2 occurrences per million in both Dutch and English. We again selected only items that had received a meaning similarity rating greater than 6 on the 7-point scales used by Dijkstra et al. (2010) and Tokowicz et al. (2002).
Furthermore, for the set of non-identical cognates, we selected only items with a score greater than 0.5 but less than 1 on an objective measure of orthographic overlap, which we calculated using the formula proposed by Schepens, Dijkstra, and Grootjens (2012): we divided the Levenshtein distance (Levenshtein, 1966) between the Dutch and English written forms of the word by the number of letters of the longest form of the word and subtracted this from 1. We also required that their form similarity rating (Tokowicz et al., 2002) or average orthography-phonology similarity rating (Dijkstra et al., 2010) was above 5. (Because Tokowicz et al. (2002) had asked their participants to take both spelling and pronunciation into account for a single 'form similarity' rating, we calculated an average of the orthographic and phonological similarity ratings items had received in the Dijkstra et al. (2010) study, to be more comparable to Tokowicz et al.'s (2002) form similarity rating.) The 444 translation equivalents had objective orthographic overlap scores of less than 0.5 and form similarity ratings (Tokowicz et al., 2002) or average orthography-phonology similarity ratings (Dijkstra et al., 2010) of less than 3. All English forms of the items had a mean lexical decision accuracy in the English Lexicon Project (Balota et al., 2007) greater than 85%.
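The Schepens et al. (2012) measure is straightforward to compute. The sketch below is an illustration of that formula, not the authors' actual script, and the function name is ours; it returns 1 minus the Levenshtein distance divided by the length of the longer word form:

```python
def orthographic_overlap(dutch: str, english: str) -> float:
    """Objective orthographic similarity (Schepens, Dijkstra, & Grootjens, 2012):
    1 - Levenshtein distance / length of the longer word form."""
    a, b = dutch.lower(), english.lower()
    # Dynamic-programming Levenshtein distance (Levenshtein, 1966)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b))
```

On this measure an identical pair such as "wolf"-"wolf" scores 1, a non-identical cognate such as "kat"-"cat" scores roughly 0.67 (one substitution over three letters), and a dissimilar pair such as "wortel"-"carrot" falls well below the 0.5 cut-off.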
Because the identical cognates generally had lower frequency-of-use values than the non-identical cognates and translation equivalents, items with high frequency-of-use values were discarded. Similarly, the identical cognates were less orthographically complex than the non-identical cognates and translation equivalents, so items with a high OLD20 in either Dutch or English were excluded. A word's OLD20 value is calculated as its mean orthographic Levenshtein distance to its 20 closest neighbours (Yarkoni, Balota, & Yap, 2008). Finally, offensive words and items that could belong to more than one word type were excluded (e.g. the Dutch word "brood" is a non-identical cognate of the English word "bread", but also an identical interlingual homograph of the English word "brood"). After this second step in the selection procedure, a total of 65 identical cognates, 102 non-identical cognates and 315 translation equivalents remained. To determine the final set of stimuli to be rated, we let the software package Match (Van Casteren & Davis, 2007) select the 80 non-identical cognates and 80 translation equivalents that best matched the 65 identical cognates. Matching was based on log-transformed word frequency, word length and OLD20 in both Dutch and English. Note that, because Tokowicz et al.'s (2002) aim was to collect translation norms, many of the translation equivalents and some of the non-identical cognates appeared more than once in Tokowicz et al.'s (2002) materials with different translations (e.g. "afval"-"trash" and "afval"-"waste"). We manually made sure no word form was selected by Match twice. Table 2 lists means, minimums, maximums and standard deviations per word type for each of the matching measures (and raw word frequency) for both English and Dutch.

Table 1 (caption). Published articles from which we selected many of the identical cognates and interlingual homographs that were rated in the two experiments. The first column lists the sources of identical cognates for the first experiment. The second column lists the sources of identical interlingual homographs for the second experiment.
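OLD20 can likewise be computed directly from a word list. The sketch below is illustrative only; the `lexicon` argument stands in for a full word list such as the SUBTLEX-NL or SUBTLEX-US entries (an assumption on our part, not a description of the authors' pipeline):

```python
import heapq

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def old20(word: str, lexicon: list[str]) -> float:
    """Mean orthographic Levenshtein distance from `word` to its
    20 closest neighbours in `lexicon` (Yarkoni, Balota, & Yap, 2008)."""
    distances = (levenshtein(word, w) for w in lexicon if w != word)
    closest = heapq.nsmallest(20, distances)
    return sum(closest) / len(closest)
```

Lower OLD20 values indicate denser orthographic neighbourhoods (i.e. less orthographically complex words), which is why items with a high OLD20 in either language were excluded here.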
As mentioned in the Introduction, we intended to use these stimuli in a cross-lingual long-term priming experiment. In this experiment, the participants would first read Dutch sentences that contained one of the items. Therefore, the next step involved writing the Dutch sentences for the selected items (see Table 3 for example sentences). The sentences were between 6 and 12 words long and were written so that the target word was placed as far towards the end of the sentence as possible, as this minimises ambiguity. Each target word appeared only in its own sentence and not in any other sentence and only in its uninflected form (e.g. nouns were not pluralised). For nine of the non-identical cognates and 21 of the translation equivalents that Match selected, it was difficult to write a clear and concise sentence that complied with all of these criteria. These items were manually replaced with more suitable items of a similar frequency, length and OLD20.
Finally, to ensure that the participants would make full use of the rating scale for all three aspects (meaning, spelling and pronunciation similarity) across all items, 40 identical interlingual homographs and 21 non-identical interlingual homographs were selected from Poort et al. (2016) and a list on Wikipedia (Wikipedia, 2014). We selected only words that were between 3 and 8 letters long. Any items for which either the Dutch or English frequency was less than 2 occurrences per million or more than 700 were discarded, as well as all items that belonged to more than one word type⁴ and items for which it was difficult to write a clear and concise Dutch sentence. This left 31 identical and 14 non-identical interlingual homograph pairs to serve as fillers in the first experiment. The sentences for these items were written according to the same criteria as for the identical and non-identical cognates and the translation equivalents. A native speaker of Dutch then proofread all 270 sentences for both the targets and fillers and suggested corrections and clarifications where necessary.

Design and Procedure
The experiment was set up in Qualtrics (Qualtrics, 2015). Participants saw the English word (in bold) on the left and the Dutch sentence with the Dutch target word in bold on the right and were asked to rate, on a scale from 1 to 7, how similar the two words in bold were in terms of their meaning, spelling and pronunciation. As there were 225 target items, to reduce the total length of the experiment and minimise any effects of fatigue, we created five versions of the experiment, each containing 45 target items plus the 45 identical and non-identical interlingual homograph fillers. To allow us to check whether the participants had carefully read the sentences, each version also included an additional five catch trials for which the Dutch and English words could be translations of each other (varying in their degree of orthographic similarity), but in the context of the sentence the Dutch word required a different English translation. For example, the word "vorst" in Dutch can be translated as "frost" in English, but also as "monarch". The word "vorst" was then used in a Dutch sentence to mean "monarch", but the participants were asked to rate the similarity in meaning (and spelling and pronunciation) between "vorst" and "frost". Participants were randomly assigned to one of the five versions of the experiment and the order of items was randomised individually for each participant. Only five items were presented per screen, for a total of 19 screens. At the start of the experiment, the participants were shown six examples (including an example of a catch item) with suggested ratings. They filled in a language background questionnaire at the end.

Participants
Our aim was to recruit between 10 and 15 participants for each of the five versions of the experiment. Participants were eligible to participate in the experiment if they were a native speaker of Dutch and fluent speaker of English and had not been diagnosed with a language disorder. They also had to be between the ages of 18 and 50 and of Dutch or Belgian nationality. A total of 77 participants was recruited through personal contacts resident in the Netherlands, social media and word-of-mouth. The participants gave informed consent (by means of ticking a box on the online consent form) and participated for a chance to win an electronic gift card worth €100 (then roughly £75). The UCL Experimental Psychology Ethics Committee provided approval of our study protocol (Project ID: fMRI/2013/016). The data from one participant were excluded because this participant regularly rated the spelling and pronunciation similarity of the identical and non-identical cognates a 1 or 2. The data from an additional nine participants were excluded because these participants made more than three mistakes on the five catch trials.
The remaining 67 participants (14 males; M age = 23.5 years, SD age = 5.4 years) had started learning English from an average age of 7.7 (SD = 3.3 years) and so had an average of 15.8 years of experience with English (SD = 5.8 years). The participants rated their proficiency as 9.7 out of 10 in Dutch (SD = 0.6) and 9.2 in English (SD = 0.7). A two-sided paired t-test showed this difference to be significant [t(66) = 4.729, p < .001]. The five versions were completed by 13, 14, 12, 15 and 13 participants respectively. There were no differences between the versions with respect to the demographic variables reported here (as shown by ANOVAs and chi-square tests where appropriate; all ps > .125).

Findings
Mean ratings for the three word types (identical cognates, non-identical cognates and translation equivalents) for all three aspects (meaning, spelling and pronunciation similarity) can be found in Table 4. Overall, most items had received high (or low) ratings for the three aspects as expected for their word type.
All translation equivalents received meaning similarity ratings of 6 or greater. Seven identical and three non-identical cognates with meaning similarity ratings below 6 on the 7-point scale were discarded from the database. Unexpectedly, two identical cognates ("crisis"-"crisis" and "lens"-"lens") received spelling similarity ratings of less than 7. Since these two items were truly identical, they were not discarded. Two translation equivalents with spelling similarity ratings higher than 3 were discarded. Our intention was also to discard all non-identical cognates with spelling similarity ratings of less than 5, in line with the initial selection criteria, but 21 non-identical cognates met this criterion. In order not to reduce the number of stimuli too much, only the one non-identical cognate with a spelling similarity rating of less than 4 was discarded. In conclusion, the first experiment produced a database of stimuli that included 58 identical cognates, 76 non-identical cognates and 78 translation equivalents.

Experiment 2
A second experiment was conducted to produce a database of identical interlingual homographs. This second experiment was designed in the same manner as the first.

Materials
Seventy additional identical interlingual homographs between 3 and 8 letters long were selected from the research articles listed in Table 1 or from a list of identical entries in the SUBTLEX-US and SUBTLEX-NL databases (Brysbaert & New, 2009; Keuleers et al., 2010, respectively). In the latter case, all noun, verb and adjective entries between 3 and 8 letters long were extracted from the SUBTLEX-US and SUBTLEX-NL databases and those with identical forms but dissimilar meanings in Dutch and English (as judged by the first author) were manually selected. As in Experiment 1, any items from this initial selection that had a mean lexical decision accuracy in the English Lexicon Project (Balota et al., 2007) of less than 85% were discarded. Since it was more difficult to find identical interlingual homographs, items with frequencies of less than 2 occurrences per million were retained if they were nevertheless considered well-known words, as were words with a very high frequency or a high OLD20. Similarly, we also included items that could be a (non-)identical cognate (e.g. "lever"-"lever", where "lever" in Dutch is also a non-identical cognate of the English word "liver"). Lastly, items for which it was difficult to write a clear and concise sentence in Dutch were excluded, as were items that were only identical when inflected. A total of 56 items met these criteria. Table 2 lists means, minimums, maximums and standard deviations for each of these measures (and raw word frequency) for both English and Dutch. The sentences for these items were written according to the same criteria as in the first experiment and were proofread by the same native speaker of Dutch.
Finally, to ensure again that the participants would make full use of the entire rating scale across all items for all three aspects they were asked to judge, seven identical cognates, seven non-identical cognates, seven non-identical interlingual homographs and 14 translation equivalents were selected from the materials for the first experiment to serve as fillers in the second experiment.

Design and Procedure
The experimental design and procedure of the second experiment was the same as that of the first, except that participants were now also able to indicate if they were not familiar with a word, as not all words met the desired frequency criteria. Two versions of the experiment were created, each containing 28 targets plus the 35 identical and non-identical cognate and translation equivalent fillers and the five catch trials from the first experiment.

Participants
Again, our aim was to recruit between 10 and 15 participants for each of the two versions of the experiment. A total of 24 participants was recruited using the same eligibility criteria and recruitment procedure as for the first experiment. The participants again gave informed consent (by means of ticking a box on the online consent form) and participated for a chance to win an electronic gift card worth €75 (then roughly £55). The UCL Experimental Psychology Ethics Committee provided approval of our study protocol (Project ID: fMRI/2013/016).
The data from one participant were excluded because this participant regularly rated the spelling and pronunciation similarity of the identical and non-identical cognates a 1 or 2. No participants made more than three mistakes on the five catch trials.
The remaining 23 participants (8 males; M age = 24.5 years, SD age = 5.9 years) had started learning English from an average age of 6.3 (SD = 4.0 years) and so had an average of 18.2 years of experience with English (SD = 5.0 years). The participants rated their proficiency as 9.5 out of 10 in Dutch (SD = 0.7) and 9.2 in English (SD = 0.7). A two-sided paired t-test showed this difference to be non-significant [t(22) = 1.628, p = .118]. Eleven participants completed version 1 and 12 completed version 2. A two-sided independent-samples Welch's t-test showed that there was a significant difference in age between the two versions [version 1: M = 22.4 years, SD = 1.9 years; version 2: M = 26.5 years, SD = 6.0 years; t(13.4) = 2.264, p = .041]. There were no significant differences between the versions with respect to the other demographic variables reported here (as shown by additional independent-samples Welch's t-tests and chi-square tests where appropriate; all ps > .1).

Findings
Mean ratings for the identical interlingual homographs for all three aspects (meaning, spelling and pronunciation similarity) can be found in Table 4. Of the 87 interlingual homographs that had been rated in total across both experiments, most had received high (or low) ratings for the three aspects as expected. Again, five items ("angel"-"angel", "fee"-"fee", "peer"-"peer", "steel"-"steel" and "wand"-"wand") had unexpectedly received spelling similarity ratings of less than 7, but these were retained as they were truly identical.
A total of 15 identical interlingual homographs was discarded from the database. In retrospect, one item should not have been included in the pre-test because it had a mean accuracy in the English Lexicon Project (Balota et al., 2007) of less than 85%. Twelve other items were discarded because they had received an average meaning similarity rating greater than 2. Three further items ("honk"-"honk", "lever"-"lever" and "stadium"-"stadium") had also received an average meaning similarity rating greater than 2, but this was due to one or two participants giving them a high rating of 7 while all other participants had given them a rating of 1 or 2. As the majority of participants agreed that these items' meanings were highly dissimilar, they were retained. Finally, two of the items that had been included in the second experiment were discarded because they had received ratings from fewer than ten participants, as some participants had indicated that they did not know those items. In total, the first and second experiments combined yielded a set of 72 identical interlingual homographs to add to the database.

Additional analyses

Between-experiment comparisons
To determine whether participants in the second experiment used the rating scales in a consistently different manner, we compared the ratings from the two experiments for the 28 identical cognates, non-identical cognates and translation equivalents that had been included in the first experiment as targets and in the second experiment as fillers. Overall, the differences between the ratings from the two experiments for the three properties were small. (Positive differences indicate higher ratings were given in Experiment 2.) For meaning similarity, the average difference was 0.04 (SD = 0.16, range = -0.43 to 0.36). For spelling similarity, it was -0.04 (SD = 0.17, range = -0.34 to 0.61). Finally, for pronunciation similarity, the average difference was -0.01 (SD = 0.16, range = -0.25 to 0.50). Two-tailed one-sample t-tests indicated that these differences between the two experiments were not significant for any of the three properties [for meaning similarity: t(27) = 1.495, p = .147; for spelling similarity: t(27) = 1.379, p = .179; for pronunciation similarity: t(27) = 0.489, p = .629].
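The comparisons above are one-sample t-tests on the 28 item-level rating differences. Computing the t statistic itself is straightforward, as in the generic sketch below (not the authors' analysis code; the function name is ours):

```python
import math

def one_sample_t(diffs, mu0=0.0):
    """Two-tailed one-sample t-test statistic for the mean of `diffs`
    against a hypothesised value `mu0` (here, 0 = no rating difference)."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = (mean - mu0) / math.sqrt(var / n)
    return t, n - 1  # t statistic and degrees of freedom
```

In practice one would use, for example, `scipy.stats.ttest_1samp` to obtain the corresponding p-value as well.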

Correlation analyses
We computed correlations to assess the relationship between the objective orthographic similarity scores calculated using Schepens et al.'s (2012) formula, which we used to select our items, and the subjective spelling similarity ratings the items received in the experiments. We only included the non-identical cognates and the translation equivalents in these analyses, as the identical cognates and interlingual homographs by design all had a score of 1 on the objective orthographic similarity measure and (nearly all) had received a mean subjective spelling similarity rating of 7. We computed separate correlations for the non-identical cognates and the translation equivalents, as the non-identical cognates had been chosen because they had high objective orthographic similarity scores and the translation equivalents had been chosen because they had low scores. The scatterplots in panel A of Figure 1 demonstrate this discontinuity. For the non-identical cognates, the correlation between the objective orthographic similarity scores and the subjective spelling similarity ratings was strong and significant [r(74) = .657; 95% CI: .506-.768, p < .001]. For the translation equivalents, the correlation was somewhat less strong but still significant [r(76) = .417; 95% CI: .214-.585, p < .001].
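Correlations of this kind, with the confidence intervals reported above, can be reproduced with a Pearson correlation and a Fisher-z interval. The sketch below is a generic illustration; we do not know which software the reported analyses actually used, and the function name is ours:

```python
import math

def pearson_r_with_ci(x, y):
    """Pearson's r with a two-sided 95% CI via the Fisher z-transformation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)          # Fisher transformation
    se = 1 / math.sqrt(n - 3)  # standard error of z
    zc = 1.96                  # 95% normal critical value
    return r, (math.tanh(z - zc * se), math.tanh(z + zc * se))
```

Note that the interval is asymmetric around r on the original scale, as in the intervals reported here (e.g. r = .657, 95% CI: .506-.768).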
We also computed correlations to determine whether our ratings were similar to those obtained by Dijkstra et al. (2010) and Tokowicz et al. (2002). Note that these analyses did not include the identical interlingual homographs, as these items had not been included in either of these two studies. First, we computed correlations for the relationship between Dijkstra et al.'s (2010) orthographic similarity ratings and the spelling similarity ratings we had obtained. These analyses also did not include the identical cognates, as they had nearly all received mean ratings of 7 in both our experiments and Dijkstra et al.'s (2010) experiment. As for the correlation between the objective orthographic similarity scores and subjective spelling similarity ratings, the scatterplot in panel B of Figure 1 showed a discontinuity in the data, so we computed separate correlations for the non-identical cognates and translation equivalents. For the translation equivalents, the correlation was near-perfect and significant [r(42) = .938; 95% CI: .889-.966, p < .001]. For the non-identical cognates, the correlation was similarly highly significant but slightly less strong [r(41) = .810; 95% CI: .674-.893, p < .001].
Second, we computed correlations for the relationship between Dijkstra et al.'s (2010) phonological similarity ratings and the pronunciation similarity ratings we had obtained. These analyses did include the identical cognates. As there was no discontinuity between the word types, nor did the scatterplot in panel C of Figure 1 suggest that the strength of the relationship between these two variables differed between the word types, we computed this correlation only across all items. The correlation was near-perfect and highly significant [r(111) = .985; 95% CI: .979-.990, p < .001].
Finally, we computed correlations for the relationship between Tokowicz et al.'s (2002) semantic similarity ratings and the meaning similarity ratings we had obtained in these experiments. Again, we computed this correlation only across all items, as the scatterplot in panel D of Figure 1 did not show a discontinuity between the word types, nor did it suggest that the strength of the relationship between these two variables differed between the word types. This correlation was of medium size but significant [r(134) = .365; 95% CI: .209-.502, p < .001].

Figure 1 (caption, excerpt). C Dijkstra et al.'s (2010) phonological similarity rating (P-rating; x-axis) plotted against the pronunciation similarity ratings obtained in the current experiments (y-axis). D Tokowicz et al.'s (2002) semantic similarity rating (x-axis) plotted against the meaning similarity ratings obtained in the current experiments (y-axis). Panels A and B display two regression lines fitted separately for each word type, while panels C and D display a single regression line fitted across all items. Word types are distinguished by colours and shapes (identical cognates, squares in green; non-identical cognates, circles in purple; translation equivalents, triangles in blue).

Discussion
The two experiments presented in this paper have produced a database of experimentally validated Dutch-English identical cognates (n = 58), non-identical cognates (n = 76), identical interlingual homographs (n = 72) and translation equivalents (n = 78). While all of the non-identical cognates and translation equivalents had previously been validated in similar experiments (Dijkstra et al., 2010; Tokowicz et al., 2002), in contrast to this previous work, we also validated a large set of identical cognates and interlingual homographs. Our items were rated in two rating experiments in which we asked participants to rate the items' spelling, pronunciation and meaning similarity in Dutch and English. One-sample t-tests showed that the ratings did not differ significantly between the two experiments, indicating that there was no significant, consistent shift in how participants used the three scales across the two experiments. Furthermore, the spelling and pronunciation similarity ratings we obtained for the subset of items that had been included in Dijkstra et al.'s (2010) study correlated near-perfectly with the ratings obtained by Dijkstra et al. (2010) themselves. This provides further evidence of the validity of our ratings.
The correlation between our meaning similarity ratings and Tokowicz et al.'s (2002) semantic similarity ratings was considerably less strong (.365). As has been noted frequently, however, when either or both of the two variables involved in a bivariate correlation is restricted in range, this often leads to an underestimate of the correlation in the sample compared to the true correlation in the population (e.g. Alexander, Barrett, Alliger, & Carson, 1986; Bobko, 1983; Sackett & Yang, 2000; Thorndike, 1949). In our case, we had specifically selected items from Tokowicz et al.'s (2002) materials that had semantic similarity ratings greater than 6, effectively restricting both the range of their ratings and our own. Had we obtained similarity ratings for the full set of items included by Tokowicz et al. (2002), it is likely that our meaning similarity ratings would have correlated more strongly with their semantic similarity ratings.
The correlations between the objective orthographic similarity scores calculated using the formula proposed by Schepens et al. (2012) and the subjective spelling similarity ratings for the non-identical cognates and translation equivalents were moderate to strong. This suggests that the spelling similarity ratings were influenced by factors other than the orthographic similarity of the items alone, such as their pronunciation similarity or cross-lingual spelling regularities. Notably, Schepens et al. (2012) themselves report a correlation of .96 with Dijkstra et al.'s (2010) orthographic similarity ratings. Most likely, our correlations were lower because the non-identical cognates and translation equivalents had been dichotomised with respect to both the objective orthographic similarity scores and the subjective spelling similarity ratings. Consequently, relatively few items had scores around the 0.50 mark and/or ratings in the 2-5 range. In contrast, Schepens et al. (2012) had computed their correlations across the full range of objective orthographic similarity scores and subjective spelling similarity ratings.
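For readers who wish to compute comparable objective scores, a common approximation of this kind of orthographic similarity measure (not necessarily the exact formula of Schepens et al., 2012, though theirs is likewise based on Levenshtein alignment) is a length-normalized Levenshtein similarity, sketched here with hypothetical example pairs:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def orth_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical spellings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(orth_similarity("wolf", "wolf"))      # identical cognate -> 1.0
print(orth_similarity("kat", "cat"))        # non-identical cognate
print(orth_similarity("wortel", "carrot"))  # translation equivalent
```

On this measure, identical cognates score 1.0 by definition, non-identical cognates score in an intermediate range, and most translation equivalents score near the bottom, which is the dichotomisation discussed above.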
Researchers interested in using these stimuli should note that we provided the participants with a sentence in Dutch for each item, as we intended to use these items in a series of cross-lingual long-term priming experiments. This may also have contributed to the low correlation we observed between our meaning similarity ratings and Tokowicz et al.'s (2002) semantic similarity ratings. Many of the non-identical cognates and translation equivalents have multiple meanings or senses in Dutch or English (or even multiple translations), and our sentences were, of course, confined to only one of those. In contrast, the participants in Tokowicz et al.'s (2002) study were free to consider any meaning(s) or sense(s) of these items, which likely led their participants to rate the meaning similarity of those items differently from ours. However, the aforementioned effect of restricting the range of the ratings makes it difficult to say whether the observed correlation was low because we provided a sentence context for the Dutch word forms. Lastly, researchers should also note that the items in our database (especially the interlingual homographs) often have a different grammatical class in Dutch and English. Because many of these items are also syntactically ambiguous within Dutch and/or English, we have not matched or labelled the items with respect to grammatical class.
To conclude, these stimuli will be useful for future research into the structure of the bilingual mental lexicon and bilingual language processing in general. While we intended to use these stimuli in experiments focusing on visual word processing using lexical decision tasks, we encourage researchers to use a variety of tasks and paradigms to further explore the differences we have observed between a lexical decision task and a semantic relatedness judgement task (see Poort & Rodd, in press). Furthermore, we do not believe that providing a sentence context in the rating experiments affects whether these items can be used in isolation in future experiments, as we have used a subset of these items to successfully replicate both the cognate facilitation effect and the interlingual homograph inhibition effect (see ). Nevertheless, researchers should keep in mind that many of the items we have included in