Quantitative analyses of orthography to phonology mapping in English and French

Ronald Peereman* & Alain Content**

* Université de Bourgogne, L.E.A.D.- C.N.R.S.

** Université libre de Bruxelles, LAPSE


Current psycholinguistic models assume that phonological processes constitute an essential component of skilled reading and reading acquisition, and various authors have hypothesized that the complexity of phonological computation from print should be linked to the transparency of the relations between orthography and phonology. In this regard, English orthography is often claimed to provide a more ambiguous instance of mapping than French orthography. The general aim of the present study is to document this assumption through statistical analyses of large lexical databases while considering specific word units. Although most available analyses have focused on grapheme-phoneme correspondences (e.g., Berndt, Reggia, & Mitchum, 1987, for English; Véronis, 1986, for French), the consistency of the mappings between orthography and phonology could, in principle, be described at various levels of word structure. In particular, various empirical observations suggest that the final vowel-consonant cluster (the "body" unit) of monosyllabic words may have a special status in reading. A first purpose of the analyses was to examine whether, relative to other subsyllabic word units, the body and the corresponding phonological rime are more consistent units in relating orthography and phonology, for French and English.

Reading performance seems to be a function of the orthographic and phonological word knowledge that is activated during the process of converting print to sound. During the last few years, much evidence has shown that reading performance is influenced by the orthographic and phonological similarities between the letter string being processed and other words. Letter strings are named faster when they are both orthographically and phonologically similar to numerous words (the "phonographic neighbors") than when they have few phonographic neighbors (Peereman & Content, 1995, 1997). Thus, a comprehensive psycholinguistic approach requires determining consistency as a function of the particular pool of words that contribute to the print-to-sound conversion of the letter string.

Finally, independently of consistency, some word units might also acquire particular importance when they are formed by highly contingent elements. In fact, some authors have insisted on the early sensitivity to rhyme in prereading children. Hence, a third analysis explored whether bodies and rimes correspond to units whose constituents (the vowel and the coda) occur more frequently together than constituents of other word units.

All analyses were restricted to the monosyllabic words occurring in the Celex database for English (Baayen, Piepenbrock & Gulikers, 1995), and in the Brulex database for French (Content, Mousty & Radeau, 1990).

1. Overall consistency

The aim of the first analysis was to compare print-to-sound consistency for orthographic units of various sizes occurring in English and French words. A secondary objective was to compare orthography-to-phonology consistency and phonology-to-orthography consistency. This last measure corresponds to the complexity of retrieving the correct spelling from the sounds of the word. It has been shown that spelling is less accurate when words include sounds with multiple orthographic renderings (e.g., Kreiner & Gough, 1990). In addition, Stone, Vanhoy and Van Orden (in press) have recently claimed that phonology-to-orthography consistency influences reading performance. Although the evidence they adduced suffers from important limitations caused by confounded variables, their suggestion invites a closer look at sound-to-spelling consistency.

All words including an initial and a final consonant (or consonant cluster) were involved in the present analysis (N = 3,564 for English; N = 1,787 for French). Consistency measures from orthography to phonology (O-P) and from phonology to orthography (P-O) were computed for the initial consonant or consonant cluster (C1), the vowel (V), the final consonant or consonant cluster (C2), as well as for C1V (the lead), and VC2 (the body or rime). O-P consistency corresponds to the proportion of words containing a given orthographic unit with the same pronunciation as in the target word, relative to the total number of words containing that particular orthographic unit. The same logic applies to P-O consistency. The results appear in Table 1. Confirming previous observations (Treiman et al., 1995), English is characterized by a low O-P consistency for vowels, which drastically increases when the vowel is considered within the body. In contrast, vowel consistency in French was already high when considered in isolation. Interestingly, body consistency was nearly equivalent for French and English. Finally, P-O consistency was low both for English and French, and the rime unit did not show a specific advantage relative to the lead.

Table 1. Consistency ratios (in percent) averaged over all monosyllabic words, for Orthographic and Phonological units in English and French.

        Orthography to Phonology    Phonology to Orthography
Unit    English    French           English    French
C1      95         95               90         99
V       48         94               67         68
C2      96         97               50         58
C1V     57         96               74         73
VC2     91         98               67         58
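
The consistency measure defined in this section can be illustrated with a minimal sketch. It assumes that consistency is computed over word types (not weighted by frequency) and that the target word itself counts among the words containing the unit; the text does not specify either point. The toy lexicon, its ASCII phonological labels, and the function names are hypothetical and do not come from Celex or Brulex.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (orthographic body, phonological rime).
# The phonological labels are made up for illustration only.
TOY_WORDS = {
    "mint": ("int", "Int"),
    "hint": ("int", "Int"),
    "lint": ("int", "Int"),
    "pint": ("int", "aInt"),
    "cave": ("ave", "eIv"),
    "have": ("ave", "av"),
}


def consistency(words, reverse=False):
    """Return the consistency of one unit for every word.

    words maps each word to an (orthographic unit, phonological unit) pair.
    reverse=False gives O-P consistency, reverse=True gives P-O consistency.
    Assumption: type counts, target word included in both counts.
    """
    # Orient each pair as (source code, target code) for the chosen direction.
    pairs = {w: ((p, o) if reverse else (o, p)) for w, (o, p) in words.items()}
    source_counts = Counter(src for src, _ in pairs.values())
    pair_counts = Counter(pairs.values())
    return {w: pair_counts[pair] / source_counts[pair[0]]
            for w, pair in pairs.items()}


def mean_consistency(words, reverse=False):
    """Average the per-word consistency values over the whole word list."""
    scores = consistency(words, reverse)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(consistency(TOY_WORDS))
    print(mean_consistency(TOY_WORDS))
    print(mean_consistency(TOY_WORDS, reverse=True))
```

In this toy set, three of the four words with the body "int" share one pronunciation, so each of them has an O-P body consistency of .75, whereas the exception word has .25. Averaging such per-word values over the corpus would yield ratios of the kind reported in Table 1.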

2. Neighborhood consistency

In the case of a consistent bi-directional mapping between orthography and phonology, words that look alike should also sound alike, and phonologically similar words should have similar spellings. Hence, the words that are orthographically similar to a target letter string (the orthographic neighbors) should also be phonologically similar to the target pronunciation (phonological neighbors). Words that are both orthographically and phonologically similar to the target are called phonographic neighbors. The consistency of the phonological codes activated by a target letter string can be estimated by the percentage of phonographic neighbors within the orthographic neighborhood. In the same vein, the consistency of the orthographic codes activated from the sounds of the target word is given by the percentage of phonographic neighbors within the phonological neighborhood. Neighborhood characteristics were computed on the same word corpus as that used in the preceding analysis. Orthographic neighbors were operationally defined as the words that can be generated by a single letter substitution. Similarly, phonological neighbors were the words obtained by a single phoneme substitution. The results indicate that 74% of the orthographic neighbors of English words were phonographic neighbors. The corresponding value for French was 85%. The slight difference between English and French mirrors the overall difference observed above in the O-P consistency analysis. Percentages of phonographic neighbors within the phonological neighborhood were more similar for English and French (35% and 30%, respectively), reflecting the fact that both orthographies are equally inconsistent from phonology to orthography.

In additional analyses, we computed neighborhood as a function of the unit shared by the target and the neighbors. Given the neighbor definition, neighbors are of three types: (1) body neighbors (identical VC2); (2) lead neighbors (identical C1V); or (3) consonant neighbors (identical C1 and C2). Table 2 describes the proportion of each of the three types of orthographic neighbors for English and French. In both cases, the large majority of neighbors consisted of body neighbors. The larger number of body neighbors than lead neighbors is interesting to consider with regard to vowel consistency. The consistency of vowel pronunciation within the neighborhood can be estimated by weighting the lead and body consistencies by the number of neighbors sharing these units. The results indicate a mean vowel neighborhood consistency of .78 for English vowels and of .97 for French vowels. Hence, English vowel consistency strongly increases (from .48 to .78) when considered within the neighborhood.

Table 2. Proportion of the orthographic neighborhood corresponding to each type of neighbor.
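
The neighborhood consistency measures described in this section can be sketched as follows. This is only an illustration under stated assumptions: words are represented as letter strings and as tuples of phoneme labels, neighbors are restricted to items of the same length (single substitution), and the toy lexicon with its made-up phoneme labels is hypothetical. The function names are ours and do not reflect the authors' actual procedure.

```python
# Hypothetical toy lexicon: spelling -> tuple of made-up phoneme labels.
TOY_LEXICON = {
    "mint": ("m", "I", "n", "t"),
    "pint": ("p", "aI", "n", "t"),
    "hint": ("h", "I", "n", "t"),
    "mind": ("m", "aI", "n", "d"),
    "mist": ("m", "I", "s", "t"),
    "pine": ("p", "aI", "n"),
}


def differs_by_one(a, b):
    """True when two equal-length sequences differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighborhood_consistency(target, lexicon, from_phonology=False):
    """Proportion of phonographic neighbors within a neighborhood.

    from_phonology=False: share of orthographic neighbors that are also
    phonological neighbors (print-to-sound direction).
    from_phonology=True: share of phonological neighbors that are also
    orthographic neighbors (sound-to-print direction).
    Returns None when the relevant neighborhood is empty.
    """
    others = [w for w in lexicon if w != target]
    ortho = {w for w in others if differs_by_one(w, target)}
    phono = {w for w in others
             if differs_by_one(lexicon[w], lexicon[target])}
    base = phono if from_phonology else ortho
    if not base:
        return None
    # Phonographic neighbors belong to both neighborhoods.
    return len(ortho & phono) / len(base)


if __name__ == "__main__":
    print(neighborhood_consistency("mint", TOY_LEXICON))
    print(neighborhood_consistency("mint", TOY_LEXICON, from_phonology=True))
```

For the toy target "mint", four words are orthographic neighbors but only two of them are also phonological neighbors, which gives a print-to-sound neighborhood consistency of .50. Averaging such values across all targets of a corpus would give percentages comparable in kind to those reported above.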

3. Phonological and orthographic cohesiveness

The cohesiveness of the C1V and VC2 units was estimated by dividing the number of existing C1V (or VC2) units by the number of possible C1V (or VC2) units in the absence of co-occurrence constraints. The number of existing units corresponds to the number of different units found in the word corpus. The number of possible units is simply the product of the numbers of different constituents (C, V). Hence, for example, the number of possible rimes is the product of the number of different vowels by the number of different codas (C2). The proportion of existing units relative to possible units defines the Space Occupation Ratio (SOR). In the absence of co-occurrence constraints, SORs should be 1. All monosyllabic words were used in the analysis (N = 4,015 for English; N = 2,449 for French). The numbers of existing C1V and VC2 units are shown in Table 3, together with their corresponding SORs. First, for both English and French, the number of different bodies was slightly larger than the number of different leads, whereas the opposite pattern appears for rimes and leads. Second, co-occurrence constraints seem higher for French than for English leads. As a result, although in both cases bodies and rimes have smaller SORs than leads, the difference is more marked for English.

Table 3. Number of units and Space Occupation Ratios for Orthographic and Phonological C1V and VC2 units.

                        Number of units       Space Occupation Ratios
Units                   English    French     English    French
Orthographic  Lead      838        860        .23        .10
              Body      981        883        .09        .06
Phonological  Lead      756        572        .46        .27
              Rime      610        469        .28        .20
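
The Space Occupation Ratio defined in this section can be illustrated with a short sketch. It assumes that the sets of different constituents are derived from the units themselves; the toy list of (vowel, coda) pairs and the function name are hypothetical and only serve the illustration.

```python
def space_occupation_ratio(units):
    """Return the ratio of existing units to possible units.

    units is an iterable of (first constituent, second constituent) pairs,
    one per word, e.g. (vowel, coda) for bodies or rimes.
    Possible units = number of different first constituents
                     x number of different second constituents.
    """
    existing = set(units)  # different units actually found in the corpus
    firsts = {a for a, _ in existing}
    seconds = {b for _, b in existing}
    return len(existing) / (len(firsts) * len(seconds))


# Hypothetical toy data: one (vowel, coda) pair per word; repeats are allowed.
TOY_BODIES = [
    ("i", "nt"), ("i", "nt"), ("i", "nd"),
    ("a", "nd"), ("a", "ve"), ("o", "ve"),
]

if __name__ == "__main__":
    print(space_occupation_ratio(TOY_BODIES))
```

In this toy set, three different vowels and three different codas allow nine possible bodies, of which five exist. A ratio of 1 would be obtained only if every vowel combined with every coda, so smaller values indicate stronger co-occurrence constraints between the constituents.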

4. Conclusions

The analyses suggest that the particular status of body-rime units might be related to their high cohesiveness and to the fact that print-to-sound associations are less ambiguous when based on the body. There were, however, some differences between English and French. Relative to the lead unit, the bodies and rimes were only slightly more cohesive in French, whereas a larger difference occurred for English. Furthermore, print-to-sound consistency increased for the body unit only in English. An additional finding is that in both orthographies, most of the orthographic neighbors share the body. Hence, given that body-rime consistencies are very close for English and French, print-to-sound conversion processes are expected to be nearly as efficient for both orthographies. Finally, English and French are characterized by a weak phonology-to-orthography consistency that did not vary as a function of the units considered in the analyses.

5. References

Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX Lexical Database (CD-ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.

Berndt, R. S., Reggia, J. A., & Mitchum, C. C. (1987). Empirically derived probabilities for grapheme-to-phoneme correspondences in English. Behavior Research Methods, Instruments, & Computers, 19, 1-9.

Content, A., Mousty, P., & Radeau, M. (1990). Brulex. Une base de données lexicales informatisée pour le français écrit et parlé. Année Psychologique, 90, 551-566.

Kreiner, D. S., & Gough, P. B. (1990). Two ideas about spelling: rules and word-specific memory. Journal of Memory and Language, 29, 103-118.

Peereman, R., & Content, A. (1995). The neighborhood size effect in naming: Lexical activation or sublexical correspondences? Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 409-421.

Peereman, R., & Content, A. (1997). Orthographic and Phonological Neighborhoods in Naming: Not all neighbors are equally influential in orthographic space. Manuscript under review, LEAD-CNRS, Université de Bourgogne à Dijon.

Stone, G. O., Vanhoy, M., & Van Orden, G. C. (in press). Perception is a two-way street: feedforward and feedback phonology in visual word recognition. Journal of Memory and Language.

Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107-136.

Véronis, J. (1986). Etude quantitative sur le système graphique et phono-graphique du français. Cahiers de Psychologie Cognitive, 6, 501-531.