Stressing what is important: Orthographic cues and lexical stress assignment

https://doi.org/10.1016/j.jneuroling.2008.09.002

Abstract

Computational models of reading have typically focused on monosyllabic words. However, extending those models to polysyllabic word reading can uncover critical points of distinction between competing models. We present a connectionist model of stress assignment that learned to map orthography onto stress position for English disyllabic words. We compared the performance of the connectionist model with that of Rastle and Coltheart's (2000) rule-based model of stress assignment for words and nonwords. The connectionist model predicted human performance well in reading nonwords that both did and did not contain affixes, whereas the Rastle and Coltheart model performed well only on nonwords with affixes. The connectionist model provides an important first step toward simulating all aspects of polysyllabic word reading, and indicates that a probabilistic approach to stress assignment can reflect human performance on stress assignment for both words and nonwords.

Introduction

Computational modelling has enabled links to be forged between neural structure and cognitive processes (see, e.g., Monaghan and Shillcock, in press, Rogers and McClelland, 2004). Computational models have also provided insight into the cognitive categories involved in particular tasks. Particularly insightful in this respect have been models of single-word reading, in which proposals for the precise mechanisms that map written words onto spoken forms have been tested (Coltheart et al., 2001, Seidenberg and McClelland, 1989). Yet these computational models of reading have concentrated on determining the mapping from letters, or sets of letters, onto phonemes, or sets of phonemes. In this paper we review the implications of this restriction to phonology for comparing computational models of reading, and show that stress assignment in reading is an important point of distinction between alternative cognitive accounts of word processing.

There are two recent traditions for modelling the cognitive processes involved in mapping letters onto phonemes: the dual-route model and the connectionist triangle model. The dual-route framework incorporates two systems for mapping letters onto phonemes. The Dual-Route Cascaded (DRC) model (Coltheart, 2000, Coltheart et al., 2001) implemented these two routes in a model of reading: a lexical route comprising a stored lexicon that contains phonological information for all the words known to the reader, and a sub-lexical route that applies grapheme-phoneme correspondence (GPC) rules to convert the orthographic input serially into phonemes. Although the two routes operate simultaneously and in parallel, in word reading the lexical route is configured to process the written input faster than the sub-lexical route, so irregular words are named correctly. For nonwords, there are no entries in the stored lexicon, and output from the sub-lexical route determines the pronunciation. A recent development in the dual-route framework is the CDP+ model, which provides an impressive fit to item-level naming data (Perry, Ziegler, & Zorzi, 2007). The model is an adaptation of the DRC, except that the grapheme-phoneme correspondence route is implemented as an associative network that is trained on the lexicon to discover the correspondences. In the DRC model, by contrast, these correspondences are rule-based and provided to the model.

A contrasting tradition in modelling reading is the connectionist triangle model, in which the mapping between orthography and phonology is mediated both by direct links between these representations and by connections to and from a semantic representation of words. The triangle model has been implemented, to varying degrees of completeness (Harm and Seidenberg, 1999, Harm and Seidenberg, 2004, Plaut et al., 1996, Seidenberg and McClelland, 1989), in connectionist models in which all connections between representations are learned. The model therefore encodes statistical associations between the representations, and these representations interact in the process of mapping written words onto pronunciations. The two modelling frameworks have converged on many aspects of their architectures, as exemplified by the incremental, nested modelling approach of CDP+, which combines trained, associative networks characteristic of the triangle model tradition with hard-wired localist lexical units inherited from the DRC. However, a key distinction is the nature of nonword reading in these models. In the dual-route model, pronunciation rules are applied to the graphemes of the nonword. In the triangle model, nonwords are read by analogy to similar words and parts of words to which the model has previously been exposed. This distinction proves to be critical for conceptions of how stress is assigned in nonword naming, and we return to it below. Imaging studies of reading and reading impairment support these notions of direct and indirect pathways involved in reading (e.g., Shaywitz, Lyon, & Shaywitz, 2006). Monaghan and Shillcock (in press) have demonstrated how anatomical distinctions between left and right hemisphere processing, and impairments to the pathways between left and right hemisphere processing, can result in dyslexic behaviour in the reading task. Such links between reading and anatomy are driven by the theoretical advances that result from implementing computational models of reading.

All the previously described models have focused only on monosyllabic reading. Such a limitation is a reasonable constraint for determining the lower bound of performance for a particular architecture of the reading system: if a model fails to simulate human performance in reading monosyllables, then it is insufficient as a model of the reading process. However, extending models to polysyllabic reading can reveal limitations or over-specifications of models of reading. Take, for instance, the CDP+ model. It requires a pre-mapping of letters into graphemes in the rule-correspondence route of the model. For a word like “graph”, this is a relatively trivial task, as the system can recognise that “g”, “r”, “a”, and “ph” are the graphemes to be extracted from the individual letters, and these can then be mapped with little effort onto phonemes. But for a word like “hothouse”, the mapping becomes more problematic, because the model has to decide whether “th” is one or two graphemes, and to which syllable in the output each grapheme belongs. Thus, extending the CDP+ model to polysyllabic words, though possible, highlights a heavy pre-processing requirement for determining the graphemes and assigning them to the different syllables, processes that were irrelevant to monosyllabic reading systems.
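To make the “hothouse” problem concrete, here is a minimal sketch assuming a greedy left-to-right grapheme parser with a small, hypothetical inventory of multi-letter graphemes. It is not the parsing procedure actually used in CDP+; it only illustrates why grapheme segmentation becomes ambiguous once words span more than one syllable.

```python
# Hypothetical, tiny inventory of multi-letter graphemes (illustration only;
# a real system would need a much larger, empirically derived inventory).
MULTI_LETTER_GRAPHEMES = {"ph", "th", "sh", "ch", "ou", "ee"}


def parse_graphemes(word: str) -> list[str]:
    """Greedy left-to-right parse: take a two-letter grapheme whenever one
    matches, otherwise fall back to a single letter."""
    graphemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in MULTI_LETTER_GRAPHEMES:
            graphemes.append(pair)
            i += 2
        else:
            graphemes.append(word[i])
            i += 1
    return graphemes


print(parse_graphemes("graph"))     # ['g', 'r', 'a', 'ph']
print(parse_graphemes("hothouse"))  # ['h', 'o', 'th', 'ou', 's', 'e']
```

The greedy parse treats “th” in “hothouse” as a single grapheme, although the word divides as hot|house; getting this right requires knowing where the syllable (or morpheme) boundary falls before the graphemes are mapped onto phonemes.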

So far the only model of polysyllabic reading from a cognitive perspective has been developed for French (Ans, Carbonnel, & Valdois, 1998). Although the basic assumptions of the model extend previous connectionist models of reading, it additionally assumes that reading of polysyllabic words is supported by two procedures that operate serially, in contrast to previous connectionist models of reading: first, a global procedure that uses knowledge about whole words, and second, an analytic procedure applying to syllable-sized segments of the word, which is activated only if the global procedure fails. The model successfully accounted for basic effects in human reading, such as the frequency-by-consistency interaction and the position-of-irregularity effect, as well as some effects associated with different subtypes of dyslexia.

An additional challenge for models of polysyllabic reading is the assignment of lexical stress. For some languages, such as French, this is not an issue because stress is always fixed on the last syllable. In languages such as English or Dutch, by contrast, there is a tendency for words to be stressed on the first syllable (78% and 75% of disyllabic words, respectively, in the CELEX database; Baayen, Piepenbrock, & van Rijn, 1993). However, even if first-syllable stress is treated as the default or regular case, the reading system must still determine stress position for the remaining 22–25% of disyllabic words. How does the reading system accomplish this task? How, for instance, does the reading system know that “giraffe” is stressed on the second syllable (an iambic pattern), but that “zebra” is stressed on the first syllable (a trochaic pattern) in English?

Within the dual-route framework, an obvious solution is to include position of stress in the lexical route. So, we know “giraffe” is second syllable stress because our encoding of the lexical item includes information about both the phonemes within the word and stress position. But lexical-based storage of stress is problematic for nonword reading. Nonwords can be, and generally are, pronounced with stress by readers. The dual-route model simulates nonword reading by using only the GPC rules, as, by definition, there is no lexical entry. Consequently, in current formulations of the DRC model there is no mechanism for stress assignment of nonwords.
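The gap described above can be shown with a minimal sketch of two routes, assuming a toy lexicon that stores a pronunciation plus a stressed-syllable index, and a tiny hypothetical grapheme-phoneme correspondence (GPC) table in which unlisted letters map to themselves. In the DRC both routes run in parallel; here a simple dictionary lookup stands in for the faster lexical route. Everything below is illustrative, not the DRC implementation.

```python
# Hypothetical toy lexicon: each entry stores a pronunciation (IPA) and the
# stressed syllable (1 = first, 2 = second).
LEXICON = {
    "giraffe": ("dʒɪrɑːf", 2),
    "zebra": ("zɛbrə", 1),
}

# Hypothetical toy GPC table; letters not listed map to themselves.
GPC = {"c": "k", "j": "dʒ", "y": "j"}


def read_aloud(letters):
    """Return (pronunciation, stress). Known words are retrieved by the
    lexical route; anything else falls through to the sub-lexical GPC route,
    which yields phonemes but no stress position."""
    if letters in LEXICON:
        return LEXICON[letters]
    phonemes = "".join(GPC.get(ch, ch) for ch in letters)
    return phonemes, None  # GPC rules carry no stress information


print(read_aloud("giraffe"))  # ('dʒɪrɑːf', 2)
print(read_aloud("cimbol"))   # ('kimbol', None)
```

Words inherit stress from their lexical entries, but a nonword such as the made-up “cimbol” leaves the sub-lexical route with no stress value at all, which is the gap that a separate stress-assignment mechanism has to fill.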

This difficulty was addressed by Rastle and Coltheart (2000), who developed a rule-based system for stress assignment that used both orthographic and phonological information and that extended to stress assignment in nonword reading. Rastle and Coltheart (2000) based their model on linguistic analyses of English stress patterns by Fudge (1984) and Garde (1968), which suggested that 54 word beginnings and 101 word endings (most of which were morphemes in English) could influence the placement of stress. In particular, certain morphemes were either reliably stressed or reliably unstressed (such as the suffix –ing, which rarely carries stress). The Rastle and Coltheart model (R&C model) exploited this correspondence between certain morphemes and stress position in an algorithm with several steps: 1) identification first of word beginnings and then of word endings; 2) translation of the remaining parts of the word into a phonological representation using GPC rules, plus a set of additional rules for correcting illegal phoneme combinations; 3) stress assignment based on the stored stress position of the identified affixes and on the quality of the vowels (presence of schwa); and 4) if no prefix or suffix was identified, application of first-syllable stress as the default.
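The steps above can be sketched, loosely and under heavy simplification, as follows. The affix tables are hypothetical two- and three-entry stand-ins (based only on the examples –ing, be- and –een mentioned in this paper), not the actual 54 beginnings and 101 endings of the R&C model; step 2 and the vowel-quality part of step 3 are omitted; and giving a listed beginning priority over a listed ending is an assumption of this sketch.

```python
# Hypothetical affix tables for illustration only; NOT the 54 word beginnings
# and 101 word endings used by Rastle and Coltheart (2000).
# True = the affix carries stress, False = it is unstressed.
BEGINNINGS = {"be": False}
ENDINGS = {"ing": False, "een": True}


def assign_stress(word):
    """Return 1 (first-syllable) or 2 (second-syllable) stress for a
    disyllabic letter string, loosely following the R&C steps."""
    # Step 1: identify a listed word beginning, then a listed word ending.
    beginning = next((b for b in BEGINNINGS if word.startswith(b)), None)
    ending = next((e for e in ENDINGS if word.endswith(e)), None)
    # Step 2 (GPC translation) and the vowel-quality check of step 3 are
    # omitted: this sketch uses orthography only.
    # Step 3: stress follows from the stored stress of the affix.
    # (Checking the beginning first is an assumption of this sketch.)
    if beginning is not None:
        # Stressed beginning -> syllable 1; unstressed beginning -> syllable 2.
        return 1 if BEGINNINGS[beginning] else 2
    if ending is not None:
        # Stressed ending -> syllable 2; unstressed ending -> syllable 1.
        return 2 if ENDINGS[ending] else 1
    # Step 4: no affix identified, so apply the first-syllable default.
    return 1


for item in ["betalp", "plonting", "tambeen", "zebra", "giraffe"]:
    print(item, assign_stress(item))
```

With no listed affix, “giraffe” receives the default first-syllable stress, which is wrong; in the R&C model known words can be handled lexically, but an unaffixed nonword falls straight through to the default.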

The R&C model assigned stress correctly to 89.7% of English disyllabic words from the CELEX database (Baayen et al., 1993). Its extension to nonwords was tested against the stress placements that participants produced for a set of disyllabic nonwords: the R&C model agreed with the majority of participants' decisions for 84.8% of these stimuli. However, the set of nonwords was somewhat biased toward good performance by the model, because most of the nonwords contained affixes listed in the R&C morpheme list. In particular, the model assigns second-syllable stress if the nonword contains a prefix that is habitually unstressed, and 40% of the second-syllable-stress nonwords in the stimulus set contained such prefixes. The model also used phonological information, namely whether vowels were reduced, as an additional cue to stress position that is not available from the orthography alone. Thus, the relative roles of orthographic and phonological cues in stress assignment remain unclear from this model.
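The agreement figure above is the percentage of nonwords on which the model's stress position matches the position chosen by most participants. A minimal sketch of that computation follows; the nonwords and responses are made up for illustration, and excluding items with a tied majority is an assumption, since the paper does not say how ties were handled.

```python
from collections import Counter


def majority_agreement(model, responses):
    """Percentage of items on which the model's stress position (1 or 2)
    matches the position chosen by the majority of participants.
    ASSUMPTION: items with a tied majority are excluded."""
    agree = total = 0
    for item, prediction in model.items():
        counts = Counter(responses[item]).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            continue  # tie: no majority response for this item
        total += 1
        agree += prediction == counts[0][0]
    if total == 0:
        raise ValueError("no items with a majority response")
    return 100.0 * agree / total


# Hypothetical nonwords, model outputs, and participant responses.
model = {"bedorn": 2, "plenting": 1, "tazzle": 1, "mibo": 1}
responses = {
    "bedorn": [2, 2, 1],
    "plenting": [1, 1, 1],
    "tazzle": [2, 2, 1],
    "mibo": [1, 2],  # tied, so excluded
}
print(majority_agreement(model, responses))  # 2 of 3 items agree
```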

An alternative approach to stress assignment in reading comes from the connectionist tradition, which proposes that statistical regularities in stress assignment are learned in the same way as regularities in the orthography-to-phonology mapping (Harm and Seidenberg, 1999, Harm and Seidenberg, 2004, Plaut et al., 1996, Seidenberg and McClelland, 1989). Previous work in connectionist and other data-driven frameworks has shown that lexical stress assignment can generalise without explicit linguistic rules (Arciuli and Thompson, 2006, Daelemans et al., 1994, Gupta and Touretzky, 1994, Zevin and Joanisse, 2000). We extend these previous studies with a model that learns the mapping between orthography and stress position for a large-scale, realistic lexicon of English disyllabic words. The model provides an empirical test of the extent to which probabilistic information in orthography can indicate stress position in a representative lexicon of English. It also provides a substantial first step in the connectionist modelling of polysyllabic reading, determining the extent to which stress assignment can be accomplished using the same principles that connectionist models of reading apply to the mapping between orthography and phonology.

Corpus analyses of English polysyllabic words have indicated that there are numerous probabilistic cues to stress position in both phonology and orthography. In phonology, cues to stress are present both in the rime (reduced vowels are unstressed, and syllables with consonant clusters in the coda are stressed; Chomsky & Halle, 1968) and in the onset (syllables with onset consonant clusters tend to be stressed; Kelly, 2004). In orthography, the length and complexity of the coda and onset, as well as the identity of particular letters (both consonants and vowels), predict stress position, albeit probabilistically (Arciuli and Cupples, 2006, Arciuli and Cupples, 2007, Kelly et al., 1998). For instance, some word beginnings, such as cu- (which tends to be stressed) or be- (which tends to be unstressed), and endings, such as –um (typically unstressed) or –een (typically stressed), are not always morphological but still carry reliable information about stress position. Experimental studies have demonstrated that readers are sensitive to these phonological and orthographic cues (Arciuli and Cupples, 2006, Arciuli and Cupples, 2007, Kelly and Bock, 1988, Kelly et al., 1998). The R&C model of stress assignment incorporates only part of the orthographic information potentially available for stress assignment, focusing on frequent, regular letter patterns that predict particular stress patterns. Such cues are part of the orthographic information, but in the connectionist tradition they have no special status beyond other, equally reliable cues that may be present in the letter string. This provides a substantial advantage: no separate list of morphemes has to be incorporated into the connectionist model of stress assignment, and decisions about what does and does not count as a morpheme, and about how a morpheme is identified and isolated within the letter string, need not be made a priori.
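The probabilistic character of such orthographic cues can be illustrated by measuring, for a given word beginning or ending, the proportion of words that carry second-syllable stress. The sketch below uses a hypothetical toy lexicon of a dozen real English words with their standard stress positions; a genuine analysis would use a full corpus such as the CELEX disyllables. Note that “better” counts toward the be- cue even though it contains no prefix, which is exactly the kind of non-morphological cue discussed above.

```python
# Hypothetical toy lexicon of (word, stressed syllable) pairs.
LEXICON = [
    ("begin", 2), ("behind", 2), ("below", 2), ("better", 1),
    ("cuckoo", 1), ("culture", 1), ("cushion", 1),
    ("album", 1), ("forum", 1), ("quorum", 1),
    ("canteen", 2), ("tureen", 2),
]


def p_second_stress(lexicon, cue, at="start"):
    """Proportion of words beginning (at="start") or ending (at="end")
    with `cue` that carry second-syllable stress."""
    match = str.startswith if at == "start" else str.endswith
    stresses = [stress for word, stress in lexicon if match(word, cue)]
    if not stresses:
        raise ValueError(f"no words with cue {cue!r}")
    return sum(stress == 2 for stress in stresses) / len(stresses)


print(p_second_stress(LEXICON, "be"))            # 0.75: reliable, not categorical
print(p_second_stress(LEXICON, "cu"))            # 0.0
print(p_second_stress(LEXICON, "um", at="end"))  # 0.0
print(p_second_stress(LEXICON, "een", at="end")) # 1.0
```

A connectionist model is never given such a table; the point of the sketch is only that graded statistics of this kind are present in the letter strings and can, in principle, be absorbed by a learner without a predefined morpheme list.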

We report the results of a connectionist model of English stress assignment that tested the extent to which orthographic regularities alone can predict stress assignment in English words and in a variety of nonword stimulus sets. The first set of simulations investigated the extent to which a connectionist model can succeed in assigning stress to English disyllabic words based only on orthographic representations. The model was then tested on two sets of nonwords, varying in the extent to which they contained affixes, and compared with readers' responses to the stimuli. The same test stimuli were also presented to the R&C model. We hypothesised that the connectionist model would perform well on nonwords that did and did not contain affixes, whereas the R&C model was predicted to perform well only on nonwords with affixes.


Architecture

We constructed a simple feedforward network that learned to map the orthography of words onto stress position (see Fig. 1). The orthographic input layer was composed of 14 letter slots, which was sufficient to encode all the words in the word and nonword corpora. If a letter appeared in the slot then one of 26 units was active to represent the letter. The input layer was fully connected to a layer of 100 hidden units, which in turn was fully connected to one output stress unit. Words in the
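The stated dimensions (14 letter slots × 26 letter units = 364 input units, 100 hidden units, one output stress unit, full connectivity between layers) can be sketched as below. Several details are assumptions not given in this excerpt and are labelled in the code: left-aligned placement of letters in the slots, logistic (sigmoid) activations, a bias on each unit, and the small random initial weights. Training (for example by backpropagation) is not shown, so the output of this untrained network is meaningless until weights are learned.

```python
import math
import random
import string

N_SLOTS, N_LETTERS, N_HIDDEN = 14, 26, 100
N_INPUT = N_SLOTS * N_LETTERS  # 364 input units


def encode(word):
    """One-hot letter-in-slot input vector.
    ASSUMPTION: letters are placed left-aligned starting at slot 0."""
    if len(word) > N_SLOTS:
        raise ValueError("word longer than 14 letters")
    vec = [0.0] * N_INPUT
    for slot, ch in enumerate(word.lower()):
        vec[slot * N_LETTERS + string.ascii_lowercase.index(ch)] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(seed=0):
    """Small random (untrained) weights; the extra column is a bias weight.
    ASSUMPTION: initial range and use of biases are illustrative."""
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT + 1)]
            for _ in range(N_HIDDEN)]
    w_ho = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return w_ih, w_ho


def forward(vec, w_ih, w_ho):
    """Fully connected input -> hidden -> single stress output unit."""
    x = vec + [1.0]  # constant bias input
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
    h = hidden + [1.0]  # constant bias input for the output unit
    return sigmoid(sum(w * hi for w, hi in zip(w_ho, h)))


w_ih, w_ho = init_weights()
print(forward(encode("zebra"), w_ih, w_ho))  # activation in (0, 1)
```

Reading the single output activation as a choice between first- and second-syllable stress (for example, thresholding at 0.5) would be one natural convention, but it is an assumption here rather than a detail reported in this excerpt.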

Results

Fig. 2 shows the mean percentage of correct stress assignments by the connectionist model for the 90% of words used for training and the 10% reserved for testing, along with the R&C model's performance on the CELEX disyllabic words. Fig. 2 also presents the mean d′ for the model's discrimination between first- and second-syllable stress assignment¹
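The d′ measure mentioned above is the standard signal-detection index d′ = z(H) − z(F), where H is the hit rate, F is the false-alarm rate, and z is the inverse of the standard normal cumulative distribution. The sketch below assumes that second-syllable stress is treated as the “signal”, and it keeps rates away from 0 and 1 (where z is infinite) with a commonly used 1/(2N) adjustment; neither choice is stated in this excerpt.

```python
from statistics import NormalDist


def d_prime(hits, n_second, false_alarms, n_first):
    """d' = z(H) - z(F).
    ASSUMPTION: a hit is a second-syllable-stress word assigned
    second-syllable stress; a false alarm is a first-syllable-stress
    word assigned second-syllable stress."""
    z = NormalDist().inv_cdf

    def rate(k, n):
        # Clip to [1/(2N), 1 - 1/(2N)] so z() stays finite (assumed correction).
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))

    return z(rate(hits, n_second)) - z(rate(false_alarms, n_first))


print(d_prime(50, 100, 50, 100))  # chance-level discrimination: 0.0
print(d_prime(84, 100, 16, 100))  # roughly 2
```

Unlike percent correct, d′ is not inflated by a bias toward the more frequent first-syllable pattern, which matters when roughly three-quarters of English disyllables are stressed on the first syllable.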

Discussion

The present study demonstrated that stress assignment for words and nonwords can be accomplished accurately by a connectionist model that learns to map orthography onto stress position for English disyllabic words. This implies that stress assignment need not be governed by a set of predefined linguistic rules, but can instead emerge, to a large extent, from a combination of different cues present in the orthographic input. The sources of information implemented in

Acknowledgements

This work was supported by the ESRC/ARC Bilateral Research Awards Grant, RES 000-22-1975/LX0775703. We would like to thank Kathy Rastle and an anonymous reviewer for helpful comments on an earlier version of this paper.

References (32)

  • R.H. Baayen et al. (1993). The CELEX lexical database (CD-ROM).
  • N. Chomsky et al. (1968). The sound pattern of English.
  • M. Coltheart. Dual routes from print to speech and dual routes from print to meaning: some theoretical issues.
  • M. Coltheart et al. (2001). DRC: a dual route cascaded model of visual word recognition and reading aloud. Psychological Review.
  • W. Daelemans et al. (1994). The acquisition of stress: a data-oriented approach. Computational Linguistics.
  • E. Fudge (1984). English word-stress.