Phonetics and Phonology


The Stanford Department of Linguistics has a strong focus on phonetics and phonology, with a special emphasis on variation. 

Our research integrates phonetic and phonological theory with other aspects of language structure (syntax, morphology) and language use (sociolinguistics, psycholinguistics, pragmatics) and pursues its implications for closely related fields (metrics, historical linguistics, language change). 

Our work is theoretically driven and empirically rich.  A variety of behavioral and computational methods are employed to understand, through the investigation of sound patterns across languages, the processes and representations that underlie language and cognition.

Members of the phonetics and phonology community gather weekly at an informal Phonetics and Phonology Workshop (‘P-interest’), featuring presentations of ongoing research by those at Stanford as well as by visitors, especially from nearby universities. We focus on issues related to descriptive, theoretical, experimental, and computational research in phonetics and phonology. More informally, we meet once a month for a P-int night.

Please see the Upcoming Events list for details of upcoming meetings. For the foreseeable future, any meetings will be held online.

Want to receive email updates about our activities?

Join our P-interest mailing list!

People in this subfield



Phonetics is the study of speech sounds as physical entities (their articulation, acoustic properties, and how they are perceived), and phonology is the study of the organization and function of speech sounds as part of the grammar of a language. The perspectives of these two closely related subfields are combined in laboratory phonology, which seeks to understand the relationship between cognitive and physical aspects of human speech.

Phonetics, Phonology, Romance languages

Sociolinguistics, Language Variation and Change, Southern US Englishes, African American Language, Phonetics, Phonology

Phonetics, Korean

Language variation, Corpus linguistics, American English

Slavic prosody and the phonology/morphology interface; historical Slavic linguistics and accentology; and sociolinguistics, with a focus on questions of language and identity and language contact in the former Yugoslavia.

Phonetics, Phonology, Laboratory Phonology, Speech Acoustics, Romance Languages, English




Phonetics is the science of speech. It studies the articulation, acoustics, and perception of speech sounds.

The phonetics group at Penn emphasizes the interdisciplinary and experimental nature of phonetics in both teaching and research. The group is engaged in a wide range of research topics, including laboratory studies of speech production and perception, prosody modeling, phonetic patterns in large speech corpora, integration of phonetic knowledge in speech synthesis/recognition, etc.

Mark Liberman's recent research areas include the phonology and phonetics of lexical tone, and its relationship to intonation; gestural, prosodic, morphological and syntactic ways of marking focus, and their use in discourse; formal models for linguistic annotation; information retrieval and information extraction from text.

Jianjing Kuang's recent research areas include the multidimensionality of tonal contrasts, phonation (production, perception and phonological representation), laryngeal articulations across languages, experimental fieldwork (Tibeto-Burman, Mayan, Hmong-Mien languages), computational modeling (mapping between production and perception), and prosody (intonation patterns and prosody in sentence processing).


Phonetics, prosody, natural language processing, speech communication

Multidimensionality of tonal contrasts, phonation, laryngeal articulations across languages, computational modeling, and prosody.

Phonetics research themes

Current active research areas

  • Acoustic modelling of sound change
  • Contact phonetics and prosody: Venetan dialect and regional Italian
  • Contact phonetics and prosody: Indian English in India and the diaspora
  • Contact phonetics and prosody: mapping Cypriot prosody
  • Contact phonetics and prosody: Modern Greek dialects in contact
  • Prosody in multimodal communication

Earlier projects

  • Acquisition of Prosody in L1 (APriL)
  • Acquisition of Consonant Timing (ACT)
  • Long-range coarticulation
  • Magnetic resonance imaging in the moving vocal tract
  • ICT tools for searching, annotation and analysis of audiovisual media
  • Exemplar models of speech production
  • Previous research on Intonation
  • Speech rhythm
  • Mining a year of speech
  • Phonetic changes in Foreign/Altered Accent Syndrome

Projects

Our research in these areas is largely supported by externally-funded research grants. Past and present awards include:

  • The acquisition of rhythm in Catalan, Castilian and English. €7,100 from Batista i Roca Foundation, to P. Prieto, E. Payne, B. Post, L. Astruc and M. Vanrell. 2007-2008.
  • Using neologisms to test theories of speech production. £356,593 from ESRC to G. Kochanski and J. Coleman. 1/9/08-31/3/11.
  • Comparing dialects using statistical measures of rhythm. £284,099 from ESRC to G. Kochanski and E. Keane. 1/8/08-31/10/10.
  • A cross-linguistic study of intonational development in young infants and children. £7,365.50 from British Academy, to B. Post, E. Payne, L. Astruc and P. Prieto. 2009-2010.
  • AMENPRO: Automated metrics for the evaluation of non-native prosody. British Council France, to A. Loukina. 2010.
  • Mining a Year of Speech. £100,000 from JISC to J. Coleman (matched by an award from NSF to M. Liberman). 1/1/10-30/6/11.
  • Word joins in real-life speech: a large corpus-based study. £543,700 from ESRC to J. Coleman, G. Kochanski, R. Temple and J. Yuan. 1/11/10-31/10/13.
  • Project Bamboo: An international collaboration to advance Humanities research through developing shared technological services. US$131,941 from the Andrew W. Mellon Foundation to J. Coleman.
  • The acquisition of consonant timing. £9,700 from British Academy, to E. Payne, B. Post, H. Simonsen and N. Garmann. 2013-2015.
  • Ancient Sounds: mixing acoustic phonetics, statistics and comparative philology to bring speech back from the past. £63,974 from AHRC to J. Coleman. 2015.
  • Indian English in the diaspora: A study investigating linguistic modification among new migrants in Australia and the UK. AUD $17,097 from the Australian Research Council Centre of Excellence for the Dynamics of Language, Transdisciplinary Innovation Grant, to J. Fletcher, O. Maxwell and E. Payne. 2019-2021.
  • The Prosody and Phonetics of Venetian dialect and regional Italian. £4,000 from the Gladys Krieble Delmas Foundation, to E. Payne. 2021-2022.
  • Effects of dialect and setting on word stress perception in Indian English. AUD $18,189.7 from the Australian Research Council Centre of Excellence for the Dynamics of Language, Transdisciplinary Innovation Grant, to R. Fuchs, O. Maxwell, E. Payne, G. Wigglesworth and P. Escudero. 2020-2021.
  • Indian English on the move: language contact and change in new urban diasporas. £34,309 from the Leverhulme Trust, to E. Payne. 2021-2022.
  • Eastern Origins of English. £174,862 from the Leverhulme Trust, to J. Coleman. 10/2021-9/2024.
  • Mapping prosodic convergence in Cyprus: a geo-historical acoustic investigation of the effects of insularity at a linguistic crossroads. £76,976.17 from John Fell Fund, to E. Payne, 0011309.

Original research article: Toward “English” Phonetics: Variability in the Pre-Consonantal Voicing Effect Across English Dialects and Speakers


  • 1 Department of Linguistics, McGill University, Montreal, QC, Canada
  • 2 Glasgow University Laboratory of Phonetics, University of Glasgow, Glasgow, United Kingdom
  • 3 Department of Linguistics, University of Kentucky, Lexington, KY, United States

Recent advances in access to spoken-language corpora and the development of speech processing tools have made “large-scale” phonetic and sociolinguistic research possible. This study illustrates the usefulness of such a large-scale approach—using data from multiple corpora across a range of English dialects, collected and analyzed within the SPADE project—to examine how the pre-consonantal Voicing Effect (longer vowels before voiced than voiceless obstruents, in e.g., bead vs. beat) is realized in spontaneous speech, and varies across dialects and individual speakers. Compared with previous reports of controlled laboratory speech, the Voicing Effect was found to be substantially smaller in spontaneous speech, but still influenced by the expected range of phonetic factors. Dialects of English differed substantially from each other in the size of the Voicing Effect, whilst individual speakers varied little relative to their particular dialect. This study demonstrates the value of large-scale phonetic research as a means of developing our understanding of the structure of speech variability, and illustrates how large-scale studies, such as those carried out within SPADE, can be applied to other questions in phonetic and sociolinguistic research.

1. Introduction

There exist a large number of well-studied properties of speech that are known to vary across languages and communities of speakers, which have long been of interest to sociolinguists and phoneticians. One dimension of this variability, which is the focus of this study, is that of variation within languages: across dialects and their speakers. For example, the deletion of word-final /t/ and /d/ segments (in e.g., mist, missed ) has been shown to vary across a wide range of dialects and speech communities (e.g., Labov et al., 1968 ; Guy, 1980 ; Tagliamonte and Temple, 2005 ), as have the dialect-specific realization of English vowels (e.g., Thomas, 2001 ; Clopper et al., 2005 ; Labov et al., 2006 ), and variation in the degree of aspiration in English voiced and voiceless stops (e.g., Docherty, 1992 ; Stuart-Smith et al., 2015 ; Sonderegger et al., 2017 ). The study of this kind of variation provides a means of understanding the sources and structures of variability within languages: both in how particular dialects may systematically differ from each other, and how the variable realization of speech sounds maps to speakers' cognitive representation of language and speech ( Liberman et al., 1967 ; Lisker, 1985 ; Kleinschmidt, 2018 ). Despite decades of research, however, there is much we do not know about the scope, extent, and structure of this kind of language-internal variability. Within the phonetic literature, most research has focused on highly-controlled speech styles in ‘laboratory settings’, generally focusing on a single dialect in each study; much of the work focusing on phonetic variability in spontaneous speech is on single dialects (e.g., Ernestus et al., 2015 ). The sociolinguistic and dialectological literatures have often examined spontaneous speech, with some notable cross-dialectal studies (e.g., Clopper et al., 2005 ; Labov et al., 2006 ; Jacewicz and Fox, 2013 ), but nonetheless primarily focus on variation in vowel quality.
Increasingly, however, research within phonetics and sociophonetics is being performed at a larger scale across speech communities ( Labov et al., 2006 , 2013 ; Yuan et al., 2006 , 2007 ; Yuan and Liberman, 2014 ; Coleman et al., 2016 ; Liberman, 2018 ), driven by the development of new speech processing tools and data sharing agreements. This “large-scale” approach is applied here to one such well-studied variable, the pre-consonantal voicing effect, as a means of characterizing its degree and structure of variability in a single phonetic effect across English dialects and speakers.

The pre-consonantal voicing effect (henceforth Voicing Effect, VE) refers to vowels preceding voiced obstruents being consistently longer than their voiceless counterparts, such as the differences in beat-bead and mace-maze ( House and Fairbanks, 1953 ; House, 1961 ). The VE has been reported—to greater or lesser extent—in a range of languages ( Zimmerman and Sapon, 1958 ; Chen, 1970 ), though it varies in size based on properties of the phonetic environment, such as whether the obstruent is a stop or fricative, the height of the vowel, and many others ( Klatt, 1973 ; Crystal and House, 1982 ; Port and Dalby, 1982 ). The evidence for the English VE to date is sourced predominantly from laboratory studies of highly-controlled speech, often in citation form, recorded from small numbers of often standard General American English speakers (e.g., Rositzke, 1939 ; House and Fairbanks, 1953 ; Peterson and Lehiste, 1960 ; House, 1961 ; Crystal and House, 1982 ; Luce and Charles-Luce, 1985 ). On the basis of this evidence, the VE has been noted for being particularly large in English relative to other languages ( Zimmerman and Sapon, 1958 ; Chen, 1970 ), and has long been suggested as a prominent cue to consonant voicing in English ( Denes, 1955 ; Klatt, 1973 ). This in turn has motivated claims that the VE is learned in English, as opposed to being a low-level phonetic property in other languages ( Fromkin, 1977 ; Keating, 2006 ; Solé, 2007 ). At the same time, numerous questions about the nature and extent of the VE in English remain unexplored. In this study, we will examine the variability in the VE across a range of English dialects, focusing on the following two research questions: (1) how large is the VE as realized in spontaneous English speech? and (2) how much does the VE vary across dialects and speakers?
In addressing these questions, we hope to gain insight into a number of open issues, including the extent to which there is a single “English” VE or whether dialects differ in the magnitude of the effect, as well as the range of VE sizes across individual speakers of a given dialect.

This paper answers these questions by taking a “large-scale” approach to the study of the VE. Concretely, this refers to the use of a large amount of acoustic data, collected from a large number of speakers across a range of English dialects. This analysis falls within the framework of the SPeech Across Dialects of English (SPADE) project ( Sonderegger et al., 2019 , https://spade.glasgow.ac.uk/ ), which aims to consider phonetic and phonological variation in British and North American English across time and space through the use of automated acoustic analysis of features across English dialects occurring in many corpora. The methodological and research goals of the SPADE project are exemplified through this study of the English VE, specifically by the use of multiple corpora of diverse sources and structures, and the use of linguistic and acoustic analysis via the Integrated Speech Corpus ANalysis (ISCAN) tool ( McAuliffe et al., 2019 ), developed as part of the broader SPADE project. Both the volume and complexity of the resulting data and the goals of the study motivate the need for appropriately-flexible approaches to the statistical analysis: specifically, the data is statistically analyzed using Bayesian regression models ( Carpenter et al., 2017 ), which enable us to accurately estimate the size of the VE across dialects and speakers directly, whilst controlling for the complex nature of the spontaneous speech data.
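The Bayesian models used in the paper are too involved for a short example, but the core quantity they estimate, the VE as a multiplicative effect on vowel duration, can be sketched in plain Python. Working on log durations treats the effect as a ratio, as a regression on log duration would; the durations below are invented for illustration and are not data from the study:

```python
import math

# Invented (duration in ms, final-obstruent voicing) tokens for one speaker.
tokens = [
    (142, "voiced"), (155, "voiced"), (131, "voiced"),
    (104, "voiceless"), (98, "voiceless"), (111, "voiceless"),
]

def ve_ratio(tokens):
    """Voicing Effect as exp(mean log-duration difference).

    Equivalent to the ratio of geometric-mean durations for vowels
    before voiced vs. voiceless obstruents; a value above 1 means
    vowels are longer before voiced obstruents.
    """
    logs = {"voiced": [], "voiceless": []}
    for dur, voicing in tokens:
        logs[voicing].append(math.log(dur))
    mean = lambda xs: sum(xs) / len(xs)
    return math.exp(mean(logs["voiced"]) - mean(logs["voiceless"]))

print(round(ve_ratio(tokens), 2))
```

A hierarchical regression adds dialect- and speaker-level terms to this quantity and pools information across speakers, which is what allows per-speaker VE sizes to be estimated from noisy spontaneous-speech data.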

The structure of this paper is as follows. Section 2 outlines previous work on the VE, and some of the outstanding questions related to our current understanding of its variability. Section 3 describes the data: the corpora of different dialects from SPADE. Sections 4, 5 describe the methodological approach: the process of acoustic and statistical analysis of the data. The results of this analysis are reported in section 6, and then discussed with respect to our specific research questions in section 7; section 8 concludes.

2. The Voicing Effect (VE)

The observation that vowels preceding voiced obstruents are consistently longer than before voiceless obstruents was first noted in early phonetics textbooks (e.g., Sweet, 1880 ; Kenyon, 1940 ; Thomas, 1947 ; Jones, 1948 ) and in preliminary experimental work from the first half of the twentieth century ( Heffner, 1937 ; Rositzke, 1939 ; Hibbitt, 1948 ). Studies explicitly manipulating the VE in English observed an effect of around 1.45—that is, vowels before voiced consonants were longer than before voiceless consonants by a ratio of roughly 3:2 ( House and Fairbanks, 1953 ; House, 1961 ), and this effect was a cue to the voicing of the obstruent ( Denes, 1955 ; Lisker, 1957 ; Raphael, 1972 ).

In these studies, VE was shown to be affected by consonant manner: namely, that fricatives showed a smaller or minimal VE compared to stops ( Peterson and Lehiste, 1960 ), and less-robustly cued the voicing of the final consonant ( Raphael, 1972 ). Initial studies of connected speech suggested that the size of the VE in this type of speech is more variable: VEs in carrier sentences are similar to those in isolated words ( Luce and Charles-Luce, 1985 ) 1 whilst vowels in read or spontaneous speech exhibit smaller VE sizes of around 1.2, and a negligible VE for fricatives ( Crystal and House, 1982 ; Tauberer and Evanini, 2009 ). VE size is also modulated by the overall length of the vowel, which is hypothesized to be due to an intrinsic incompressibility of the vowel, limited by the minimal time required to perform the articulatory motor commands necessary for vowel production ( Klatt, 1976 ). This general suggestion has been supported by observations that VE is smaller for unstressed and phrase-medial vowels ( Umeda, 1975 ; Klatt, 1976 ), and vowels produced at a faster speech rate ( Crystal and House, 1982 ; Cuartero, 2002 ). The VE is thus modulated by a range of phonetic factors, which largely predict a reduction of VE size in instances where vowels are generally shorter; vowels that undergo “temporal compression” have a reduced capacity to maintain a large VE size, and so VE is minimized. As these effects have only been investigated in laboratory speech, it is not clear whether the size and direction of these effects are maintained in less-controlled spontaneous speech styles.

Examining the VE across languages, Zimmerman and Sapon (1958) first observed that whilst English speakers produced a robust VE, Spanish speakers did not modulate vowel length in the same way, though this study did not control for the syllabic structure of test items. Comparing across English, French, Russian, and Korean, Chen (1970) observed that all four languages produced a VE size of at least 1.1, though all languages had different VE sizes (English = 1.63, French = 1.15, Russian = 1.22, Korean = 1.31). This was interpreted as evidence that VE is a phonetically-driven effect with additional language-specific phonological specification ( Fromkin, 1977 ). Mack (1982) , comparing English and French monolinguals with bilinguals, observed that English monolinguals maintained a substantially larger VE than French monolinguals, whilst the French-English bilinguals also produced the shorter French-style pattern instead of adapting to the larger English VE pattern. Keating (1985) suggested that VE is “phonetically-preferred,” though ultimately controlled by the grammar of the particular language. English, then, is expected to have a larger VE than other languages, though it is not known if the English VE is of a comparable size in spontaneous speech.

The work discussed above has not differentiated between varieties of English, and cross-linguistic comparisons of VE have presumed that a single “English” VE size exists. Little work has focused on variation in VE across English dialects beyond a small number of studies on specific dialects. One dialect group of interest has been Scottish Englishes and the application of the Scottish Vowel Length Rule (SVLR), where vowels preceding voiced fricatives and morpheme boundaries are lengthened, whilst all other contexts have short vowels ( Aitken, 1981 ), and hence do not show the VE. In studies of the SVLR, some East Coast Scotland speakers show some evidence of the VE in production ( Hewlett et al., 1999 ), whilst VE-like patterns were not observed in spontaneous Glaswegian ( Rathcke and Stuart-Smith, 2016 ). On the other hand, studies of African American English (AAE) have claimed that voiced stops undergo categorical devoicing in this variety, which has resulted in additional vowel lengthening before voiced stops to maintain the pre-consonantal voicing contrast ( Holt et al., 2016 ; Farrington, 2018 ). Only one study has previously compared the VE across English dialects in spontaneous speech. Tauberer and Evanini (2009) , using interview data from the Atlas of North American English ( Labov et al., 2006 ), observe that North American English dialects vary in their VE values, ranging from 1.02 to 1.33, and that dialects with shorter vowels on average (New York City) also show a smaller-than-average VE size (1.13). Moreover, despite recognition that individual speakers may exhibit variability in their VE sizes ( Rositzke, 1939 ; Summers, 1987 ), no study has formally examined the extent of variability across speakers, nor how dialects may differ in the degree of VE variability amongst their speakers.
The two patterns observed for Scottish and African American English suggest that English dialects can maintain relatively “small” (or no) and “large” VEs, respectively; we know little about the degree of VE variability beyond these dialects without a controlled study across multiple English varieties, which is one of the goals of this study.

Whilst a large number of studies on the VE have provided useful information about its realization in English and other languages, there are still a range of outstanding questions that can be addressed through a large-scale cross-dialectal approach. To what extent is the VE a learned property of a given language, compared with an automatic consequence of low-level phonetic structure? Much of the discussion with respect to variation in VE has revolved around differences across languages ( Chen, 1970 ; Keating, 1985 ), which may differ both in their phonetic realization of segments and in the phonological representation of those segments. In this sense, examining VE variability internal to a language (i.e., across dialects ) potentially avoids this problem; the specification of phonological categories—here, the voicing status of final obstruents—is expected to be largely consistent within a language, meaning that language-internal variability may be driven only by differences in phonetic implementation.

Little is known about how English dialects may vary in their implementation of the VE, and so a range of possibilities exist for how dialects might compare. One possibility is that, with the exception of varieties with specific phonological rules interacting with the VE, dialects might cluster around a single “English” VE value, potentially of the size reported in the previous literature. Such a finding would support the previous approach in the literature, in terms of English compared to other languages, and suggest that dialects do not differ in how the final voicing contrast is phonetically implemented. Alternatively, dialects may differ gradiently from each other, and so may show a continuum of possible dialect-specific VE sizes. If dialects do differ in their VE size in this way, this would suggest that the previous literature on the VE in “English” accounts for just a fraction of the possible VE realizations across English, and would provide evidence that individual English dialects differ in their phonetic implementation of an otherwise “phonological” contrast ( Keating, 1984 , 1985 ).

Similarly, little is known about how individual speakers vary in the VE, and what the overall distribution of speaker VE sizes is. Synchronic variability across speakers is one of the key inputs to sound change ( Ohala, 1989 ; Baker et al., 2011 ), and also defines the limits of a speech community, i.e., speakers who share sociolinguistic norms in terms of production and social evaluation (e.g., Labov, 1972 ). Whilst dialects may differ in the realization of segments or the application of phonological processes, dialect-internal variability is potentially more limited if a phonetic alternation such as the VE is critical to speech community membership.

3. Data for This Study

The varieties of English included in this study are from North America, Great Britain, and Ireland. For the purposes of this study, North American dialects refer to the regions of the United States and Canada outlined in The Atlas of North American English , which is based around phonetic, not lexical, differences between geographic regions ( Labov et al., 2006 ; Boberg, 2018 ). For Canadian data specifically, the primary distinction was made between “urban” and “rural” speakers, given the relative importance of this distinction in comparison to much weaker geographic distinctions, at least for the corpus which makes up most Canadian data in this study ( Rosen and Skriver, 2015 ). Within the British and Irish groups, dialects from England are defined in terms of Trudgill's dialectal groupings ( Trudgill, 1999 ), which group regions in terms of both phonological and lexical similarity. Due to the lack of geographical metadata for speakers from Ireland and Wales, these dialects were simply coded as “Ireland” and “Wales”. Scottish Englishes are grouped based on information from The Scottish National Dictionary 2 . The data used in this study comes from the SPADE project, which aims to bring together and analyze over 40 speech corpora covering English speech across North America, the United Kingdom, and Ireland. In this study, we analyze data from 15 of these corpora, which together cover 30 different English dialects from these regions, comprising speech from interviews, conversations, and reading passages. A basic description of each of these corpora is given below, outlining the type of speech and phonetic alignment tools used.

• Audio British National Corpus (AudioBNC, Coleman et al., 2012 ): The spoken sections of the British National Corpus, originally containing speech from over 1,000 speakers. However, due to a range of recording issues (e.g., overlapping speech, background noise, microphone interference), a large portion of the corpus is inaccurately aligned. In order to define a subset of the AudioBNC which maximizes the accuracy of the alignment, utterances were kept only if they met a number of criteria: the utterance was longer than one second, contained at least two words, had a mean harmonics-to-noise ratio of at least 5.6, and had a mean difference in segmental boundaries between the original alignment and a re-alignment with the Montreal Forced Aligner (MFA, McAuliffe et al., 2017a ) of at most 30 ms 3 . Fifty TextGrids from the remaining data were manually checked and deemed approximately as accurate as normal forced alignment.

• Brains in Dialogue ( Solanki, 2017 ): recordings of 24 female Glaswegian speakers producing spontaneous speech in a laboratory setting. There are 12 recordings for each speaker, which were aligned with LaBB-CAT ( Fromont and Hay, 2012 ).

• Buckeye ( Pitt et al., 2007 ): spontaneous interview speech of 40 speakers from Columbus, Ohio, recorded in the 1990s–2000s. The Buckeye corpus is hand-corrected with phonetic transcription labels: these were converted back to phonological transcriptions in order to be comparable with data from the other corpora.

• Corpus of Regional African American Language (CORAAL, Kendall and Farrington, 2018 ): spontaneous sociolinguistic interviews with 100 AAE speakers from Washington DC, Rochester NY, and Princeville NC, recorded between 1968 and 2016, and aligned with the MFA.

• Doubletalk ( Geng et al., 2013 ): recordings of paired speakers carrying out a variety of tasks in order to elicit a range of styles/registers in a discourse/interactive situation. Ten speakers make up five pairs where one member is a speaker of Southern Standard British English and the other member is a speaker of Scottish English.

• Hastings ( Holmes-Elliott, 2015 ): recordings of sociolinguistic interviews with 46 speakers from Hastings in the south east of England, male and female, aged from 8 to 90, aligned using FAVE ( Rosenfelder et al., 2014 ).

• International Corpus of English—Canada (ICE-Canada, Greenbaum and Nelson, 1996 ): interview and broadcast speech of Canadian English, recorded in the 1990s across Canada, and aligned using the MFA. Speaker dialect was defined in terms of their city or town of origin. In this study, we coded a speaker as “urban” if their birthplace was a large Canadian city.

• Canadian Prairies ( Rosen and Skriver, 2015 ): spontaneous sociolinguistic interviews, recorded between 2010 and 2016, with speakers of varying ethnic backgrounds from the provinces of Alberta and Manitoba, conducted as part of the Language in the Prairies project, and aligned using the MFA.

• Modern RP ( Fabricius, 2000 ): reading passages by Cambridge University students recorded in the 1990s and 2000s. The speakers were chosen for having upper middle-class backgrounds, as defined by at least one parent having a professional occupation along with the speaker having attended private schooling. The data used in this study come from a reading passage aligned with FAVE.

• Philadelphia Neighborhood Corpus (PNC, Labov and Rosenfelder, 2011 ): sociolinguistic interviews with 419 speakers from Philadelphia, recorded between 1973 and 2013, and aligned with FAVE.

• Raleigh ( Dodsworth and Kohn, 2012 ): semi-structured sociolinguistic interviews of 59 White English speakers in Raleigh, North Carolina, born between 1955 and 1989, and aligned with the MFA.

• Santa Barbara ( Du Bois et al., 2000 ): spontaneous US English speech, recorded in the 1990s and 2000s, from a range of speakers of different regions, genders, ages, and social backgrounds.

• The Scottish Corpus of Texts and Speech (SCOTS, Anderson et al., 2007 ): approximately 1,300 written and spoken texts (23% spoken), including informal conversations, interviews, etc. Most spoken texts were recorded since 2000.

• Sounds of the City (SOTC, Stuart-Smith et al., 2017 ): vernacular and standard Glaswegian from 142 speakers over 4 decades (1970s–2000s), collected from historical archives and sociolinguistic surveys, aligned using LaBB-CAT.

• Switchboard ( Godfrey et al., 1992 ): 2,400 spontaneous telephone conversations between random participants from the multiple dialect regions in the United States on a variety of topics, containing data from around 500 speakers.

The goals of this study are to examine the size and variability of the English VE in spontaneous speech, and variation in the VE across dialects and individual speakers. Specifically, the kind of dialectal variability addressed in this study is regional variability: variability by race or ethnicity is not directly considered, with the exception of three African American English varieties, given the particular observations about AAE with respect to the VE ( Holt et al., 2016 ; Farrington, 2018 ). This study also does not focus on differences according to age, either age-grading or apparent/real-time change in the VE over time; only speech data recorded from 1990 onwards was included, and data recorded prior to 1990 was excluded from further analysis. Analysis of the role of age and time in the VE in these English dialects remains a subject for future study.

4. Data Analysis

Having collected and organized the speech data into dialects, it is then possible to extract and acoustically analyze the data in the study: that is, to go from raw data (audio and transcription files) to datasets which can be statistically analyzed. As the corpora differ in their formats—the phone labels used, the organization of speaker data, etc.—modifying the acoustic analysis procedure for each corpus format would be both labor- and time-intensive, and would increase the risk that the analysis itself differed across corpora. In order to standardize the acoustic analysis across corpora, the Integrated Speech Corpus Analysis (ISCAN) tool was developed for use in this kind of cross-dialectal study in the context of the SPADE project. This section provides a brief overview of the ISCAN system: see McAuliffe et al. (2017b , 2019) and the ISCAN documentation page for details of the implementation 4 .

The process of deriving a dataset from raw corpus files consists of three major steps. In the first step, individual speech corpora (in the form of sets of audio-transcription pairs) are imported into a graph database format, where each transcription file minimally contains word and phone boundaries (e.g., word-level and phone-level tiers in a TextGrid), and these word-phone relationships are structurally defined in the database (i.e., each phone belongs to a word). Importers have been developed for a range of standard automatic aligners, including all formats of the corpora described in section 3. Corpora, represented in database format, can then be further enriched with additional structure, measurements, and linguistic information. For example, utterances can be defined as groups of words separated by silence of a specified length (e.g., 150 ms), and syllables can be defined over groups of adjacent phones. Once the database has been enriched with utterance and syllable information, speech rate (often defined as syllables per second within an utterance) can be calculated and included in the database. Similarly, information about words (such as frequency) or speakers (such as gender, age, dialect, etc.) can be added to the corpus from metadata files. Once a corpus has been sufficiently enriched with linguistic and acoustic information, it is then possible to perform a query on the corpus at a given level of analysis: the level of the hierarchy that serves as the main unit of observation in the resulting datafile, for example individual phones, syllables, or utterances. Filters can be applied to a query to restrict it to the particular contexts of interest, for example, including only syllables occurring at the right edge of an utterance, or vowels followed by a specific subset of phone types (e.g., obstruents).
Finally, the resulting query can then be exported into a data format (currently CSV only) for further analysis.
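The utterance-enrichment step described above (grouping words into utterances separated by at least 150 ms of silence) can be sketched in plain Python. The `(label, start, end)` layout is a hypothetical illustration, not the actual ISCAN/PolyglotDB interface:

```python
# Sketch: group force-aligned words into utterances, where a pause of at
# least 150 ms between consecutive words starts a new utterance.
# The (label, start, end) tuples are a hypothetical layout, not the
# actual ISCAN/PolyglotDB data structures.

PAUSE_THRESHOLD = 0.150  # seconds of silence that separates utterances

def group_utterances(words, pause=PAUSE_THRESHOLD):
    """words: list of (label, start, end) tuples sorted by start time."""
    utterances, current, prev_end = [], [], None
    for label, start, end in words:
        if prev_end is not None and start - prev_end >= pause:
            utterances.append(current)  # pause long enough: close utterance
            current = []
        current.append((label, start, end))
        prev_end = end
    if current:
        utterances.append(current)
    return utterances

words = [("the", 0.00, 0.12), ("cat", 0.13, 0.40),
         ("sat", 0.60, 0.85), ("down", 0.86, 1.20)]  # 200 ms pause after "cat"
print([[w[0] for w in u] for u in group_utterances(words)])
```

Once utterances are defined this way, per-utterance speech rate follows directly as the number of syllables divided by the utterance's duration.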

Each corpus was processed using the ISCAN software pipeline, and then combined into a single “master” dataset, containing all phonetic, dialect, and speaker information from all of the analyzed corpora necessary to carry out the analysis of the VE below. As the vowel duration annotations from the corpora (except for Buckeye) were created via forced alignment with a time-step of 10 ms and a minimum duration of 30 ms, any token with a vowel duration below 50 ms was excluded from further study, following the common practice in acoustic studies of vowel formants of excluding heavily reduced vowels ( Dodsworth, 2013 ; Fruehwald, 2013 ). To reduce additional prosodic and stress effects on vowel duration, the study only included vowels from monosyllabic words occurring phrase-finally, where a phrase is defined as a chunk of speech separated by 150 ms of silence. Raw speech rate was calculated as syllables per second within a phrase, from which two separate speech rates were derived. First, a mean speech rate for each speaker was calculated, which reflects whether a speaker is a “fast” or “slow” speaker overall. Second, a local speech rate was calculated as the raw rate for the utterance subtracted from the given speaker's mean. This local speech rate can be interpreted as how fast or slow that speaker produced the vowel within that particular phrase relative to their average speech rate ( Sonderegger et al., 2017 ; Cohen Priva and Gleason, 2018 ). Word frequency was defined using the SUBTLEX-US dataset ( Brysbaert and New, 2009 ). The final dataset contained 229,406 vowel tokens (1,485 word types) from 1,964 speakers from 30 English dialects. Table 1 shows the number of speakers and tokens for each dialect, and how many speakers/tokens were derived from each speech corpus.
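The token filtering and speech-rate derivation described above can be sketched as follows; the flat token table and its field names are illustrative assumptions, not the actual SPADE/ISCAN export format:

```python
# Sketch of the token filtering and speech-rate measures described above,
# over a hypothetical flat table of phrase-final vowel tokens; field names
# are illustrative, not the actual SPADE/ISCAN export format.
from statistics import mean

tokens = [
    {"speaker": "s1", "vowel_dur": 0.040, "raw_rate": 5.0},  # < 50 ms: excluded
    {"speaker": "s1", "vowel_dur": 0.120, "raw_rate": 5.0},
    {"speaker": "s1", "vowel_dur": 0.180, "raw_rate": 3.0},
]

# Exclude tokens with vowel durations below 50 ms (heavily reduced vowels).
kept = [t for t in tokens if t["vowel_dur"] >= 0.050]

# Mean speech rate per speaker: is this a "fast" or "slow" speaker overall?
by_speaker = {}
for t in kept:
    by_speaker.setdefault(t["speaker"], []).append(t["raw_rate"])
speaker_mean = {s: mean(rates) for s, rates in by_speaker.items()}

# Local speech rate: the utterance's raw rate subtracted from the
# speaker's mean, as described in the text.
for t in kept:
    t["local_rate"] = speaker_mean[t["speaker"]] - t["raw_rate"]
```

In this toy table, one under-50 ms token is dropped, the speaker's mean rate is computed over the remaining tokens, and each token's local rate is its deviation from that mean.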

www.frontiersin.org

Table 1 . Number of speakers and tokens per dialect (left), and by corpora from which each dialect was derived.

5. Statistical Analysis

The research goals of this study focus on the size and variability of the VE in English spontaneous speech, and how the VE varies across dialects and speakers. These goals motivate an approach of estimating the size of the VE in these contexts, rather than testing whether the VE “exists” or not. Whilst controlled laboratory experiments are explicitly designed to balance across these contexts (by including matching numbers of tokens with stops vs. fricatives, using words with similar frequency, etc.), spontaneous speech taken from corpora is rarely balanced in this sense: some speakers speak more than others, have different conversations leading to some combinations of segments occurring infrequently relative to others, speakers manage properties of their speech (such as speech rate) for communicative purposes which are generally absent in laboratory studies. In trying to obtain an accurate estimate of the VE (or indeed any other linguistic property), the unbalanced nature of spontaneous speech motivates the need for a statistical approach where individual factors of interest (e.g., obstruent manner of articulation, dialects, etc.) can be explored whilst controlling for the influence of other effects. This approach—the use of multiple regression to model corpus data—is now common in phonetics and sociolinguistic research (e.g., Tagliamonte and Baayen, 2012 ; Roettger et al., 2019 ), but has not, to our knowledge, been used to analyze multiple levels of variability in the VE.

In this study, this approach to estimation is performed using Bayesian regression modeling. Whilst other multifactorial statistical models would also be valid, Bayesian models provide us with some advantages that make the goal of estimating the size of the VE easier. Mixed models are ideal for use in this study, as these capture variability at multiple levels (the VE overall, across dialects, across speakers) and this variability is of direct interest for our research questions. Bayesian mixed models resemble the more traditional linear mixed-effects (LME) modeling approaches commonly used in linguistic and phonetic research, such as those performed with the lme4 package ( Bates et al., 2015 ), though differ in a few key respects. First, Bayesian models make it easy to calculate the range of possible VE sizes in each context, as opposed to a single value that would be output in LME models: whilst LME models provide ranges for “fixed” effects (across all dialects/speakers), Bayesian models provide a range of possible sizes for each level (i.e., an individual dialect). In a Bayesian model, all parameters (coefficients) in the model are assumed to have a prior distribution of possible values, reflecting which effect sizes are believed to be more or less likely, before examining the data itself. The output of a Bayesian model is a set of posterior distributions, which result from combining the priors and the likelihood of observing the data. Each model parameter has its own posterior distribution, representing the range of values for that parameter consistent with the modeled data, prior expectations about likely values, and the structure of the model itself.
Bayesian models are well-suited to the task in this study, as they allow for flexible fitting of model parameters, and allow the complex random-effects structures which are often recommended for fitting statistically-conservative models ( Barr et al., 2013 ), but which often fail to converge in LME models ( Nicenboim and Vasishth, 2016 ). See Vasishth et al. (2018) for an introduction to Bayesian modeling applied to phonetic research.
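The prior-likelihood-posterior logic described above can be illustrated with a toy one-dimensional grid approximation. The data values and known-noise assumption are invented for illustration, and this is not the machinery brms/Stan actually uses (they sample the posterior with Markov chain Monte Carlo):

```python
# Toy one-dimensional illustration of "posterior = prior x likelihood":
# a grid approximation for a single mean parameter with known noise.
# The data values and noise SD are invented; brms/Stan instead sample
# the posterior with Markov chain Monte Carlo.
import math

data = [0.12, 0.15, 0.10, 0.14]              # hypothetical log-duration effects
grid = [i / 1000 for i in range(-500, 501)]  # candidate effect sizes

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

posterior = []
for theta in grid:
    prior = normal_pdf(theta, 0.0, 0.5)      # a weakly informative Normal(0, 0.5)
    likelihood = math.prod(normal_pdf(y, theta, 0.05) for y in data)
    posterior.append(prior * likelihood)

total = sum(posterior)
posterior = [p / total for p in posterior]   # normalize over the grid
post_mean = sum(t * p for t, p in zip(grid, posterior))
# post_mean sits near the data mean (0.1275), pulled slightly toward the
# prior mean of 0.
```

With informative data, the posterior concentrates near the sample mean; with sparse data, the prior pulls the estimate more strongly toward its center, which is exactly the regularizing behavior exploited in the model below.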

A Bayesian mixed model of log-transformed vowel duration was fit using brms ( Bürkner, 2018 ): an R-based front-end for the Stan programming language ( Carpenter et al., 2017 ), containing the following population-level (“fixed effects”) predictors: the voicing and manner of the following obstruent, vowel height (high vs. non-high), the lexical class of the word (lexical vs. functional), both mean and local speech rates, and lexical frequency. To observe how compression of the vowel influences VE size, interactions between each of these factors and obstruent voicing were also included. The continuous predictors (both speech rates and frequency) were centered and divided by two standard deviations ( Gelman and Hill, 2007 ). The two-level factors (obstruent voicing, manner, vowel height, lexical class) were converted into binary (0,1) values and then centered.

The group-level (“random effects”) structure of the model contained the complete set of model predictors for both dialects and speakers, nested within dialects. These terms capture two kinds of variability in the VE size: for each individual dialect, as well as the degree of variability across speakers—the nesting of the speaker term inside dialects can be interpreted as capturing the variability in the size of the VE across speakers within a given dialect. Given the expectation that both the overall vowel duration (represented by the intercept) and the manner of the obstruent would affect the size of the VE, correlation terms between the intercept and both the consonant voicing and manner predictors, as well as for the interaction between the voicing and manner predictors, were included for both dialects and speakers. Random intercepts were included for words and phoneme labels, also nested within dialects. The model was fit using 8,000 samples across 4 Markov chains (2,000/2,000 warmup/sampling split per chain) with weakly informative “regularizing” priors ( Nicenboim and Vasishth, 2016 ; Vasishth et al., 2018 ): the intercept prior used a normal distribution with a mean of 0 and a standard deviation of 1 [written as Normal (0, 1)]; the other fixed effects parameters used Normal (0, 0.5) priors, with the exception of the obstruent voicing parameter which used a Normal (0.1, 0.2) prior 5 . The group-level (for dialects, speakers) parameters used the brms default prior of a half Student's t -distribution with 3 degrees of freedom and a scale parameter of 10. The correlations between group-level effects used the LKJ prior ( Lewandowski et al., 2009 ) with η = 2, which gives lower prior probability to perfect (−1/1) correlations, as recommended by Vasishth et al. (2018) .

The results in this study will be reported in the context of the two main research questions concerning VE variability (1) in spontaneous speech, and (2) across English dialects and individual speakers. The results are reported for each effect in terms of the median value with 95% credible intervals (CrIs), and the probability of that effect's direction. These values enable us to understand the size of the effect (i.e., the change in vowel duration) and the confidence in the effect's predicted direction. The strength of evidence for an effect is distinct from the strength of the effect itself: to evaluate the strength of evidence for an effect, we follow the recommendations of Nicenboim and Vasishth (2016) and consider there to be strong evidence of an effect if the 95% credible interval does not include 0, and weak evidence for an effect if 0 is within the 95% CrI but the probability of the effect's direction is at least 95% (i.e., that there is <5% probability that the effect changes direction). Evaluating the strength of an effect is determined with respect to effect sizes previously reported for laboratory (e.g., House and Fairbanks, 1953 ; House, 1961 ) and connected speech ( Crystal and House, 1982 ; Tauberer and Evanini, 2009 ). The degree of variability across dialects can be compared with the findings of Tauberer and Evanini (2009) ; as there is no known comparison for speaker variability, this will be compared to variability across dialects as an initial benchmark.
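These evidence criteria can be expressed as a small decision function over posterior samples for a single effect; this is an illustrative sketch, using a simple empirical quantile for the credible interval:

```python
# Sketch of the evidence criteria described above, applied to posterior
# samples for a single effect: "strong" if the 95% credible interval
# excludes 0, "weak" if it includes 0 but the probability of the effect's
# direction is at least 95%, and "none" otherwise. The empirical-quantile
# interval here is an illustrative simplification.
def classify_evidence(samples):
    xs = sorted(samples)
    n = len(xs)
    lo, hi = xs[int(0.025 * n)], xs[int(0.975 * n) - 1]  # 95% CrI endpoints
    p_pos = sum(x > 0 for x in xs) / n
    p_dir = max(p_pos, 1 - p_pos)  # probability of the effect's direction
    if lo > 0 or hi < 0:
        return "strong"
    if p_dir >= 0.95:
        return "weak"
    return "none"
```

For example, a posterior whose samples lie entirely above zero classifies as "strong", while one with 97% of its mass above zero but an interval straddling zero classifies as "weak".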

6. Results

6.1. The Voicing Effect in Spontaneous Speech

Table 2 reports the population-level (“fixed”) effects for each parameter in the fitted model. The “overall” VE size averaging across dialects, which is between 1.09 and 1.2, is estimated to be smaller than reported in previous laboratory studies ( β ^ = 0.14, CrI = [0.09, 0.19], Pr( β ^ > 0 ) = 1) 6 and more consistent with VE sizes reported in studies of connected and spontaneous speech ( Crystal and House, 1982 ; Tauberer and Evanini, 2009 ).
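Since the model is fit on log-transformed durations, a coefficient maps onto a voiced/voiceless duration ratio via exponentiation (natural logs assumed); a quick check of how the reported interval yields the quoted ratio range:

```python
# A log-duration coefficient b corresponds to a voiced/voiceless duration
# ratio of exp(b), assuming natural logs. Converting the reported interval:
import math

b_hat, cri_lo, cri_hi = 0.14, 0.09, 0.19  # values reported in the text
print(f"VE ratio: {math.exp(b_hat):.2f}, "
      f"CrI [{math.exp(cri_lo):.2f}, {math.exp(cri_hi):.2f}]")
```

This recovers the quoted 1.09-1.2 ratio range (up to rounding), with a central estimate of about 1.15, i.e., vowels roughly 15% longer before voiced obstruents.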

www.frontiersin.org

Table 2 . Posterior mean ( β ^ ), estimated error, upper & lower credible intervals, and posterior probability of the direction of each population-level parameter included in the model of log-transformed vowel duration.

Looking at how the overall VE size for all dialects is modulated by phonetic context, there is weak evidence that the manner of the following obstruent modulates VE size ( β ^ = −0.04, CrI = [−0.10, 0.02], Pr( β ^ < 0 ) = 0.91): whilst stops appear to have a larger VE size ( Figure 1 , top left), the uncertainty in VE size for each obstruent manner (represented by the spread of the credible intervals) suggests that it is possible there is no difference in VE size between both obstruent manners. Whilst high vowels are shown to be shorter than non-high vowels overall ( β ^ = −0.22, CrI = [−0.25, −0.18], Pr( β ^ < 0 ) = 1), there is strong evidence that high vowels have a larger VE than non-high vowels ( β ^ = 0.07, CrI = [0.02, 0.11], Pr( β ^ > 0 ) = 1). There is a similarly strong effect for lexical class ( β ^ = −0.07, CrI = [−0.13, 0.00], Pr( β ^ < 0 ) = 0.97), where functional words have smaller VEs than open-class lexical items ( Figure 1 , top right). Lexical frequency also has a strong, well-evidenced effect on VE size ( β ^ = −0.07, CrI = [−0.11, −0.03], Pr( β ^ < 0 ) = 1), where higher-frequency words have smaller VEs than their lower-frequency counterparts ( Figure 1 , bottom left), whilst local speech rate also reduces VE size ( β ^ = −0.06, CrI = [−0.08, −0.03], Pr( β ^ < 0 ) = 1; Figure 1 , bottom middle). For mean speaking rate, however, the effect on VE is both small with weak evidence ( β ^ = −0.01, CrI = [−0.03, 0.01], Pr( β ^ < 0 ) = 0.77): this is reflected in Figure 1 (bottom right), where the difference between faster and slower speakers has a negligible effect on VE size. These results generally suggest that shorter vowels (within-speaker) tend to have smaller VE sizes, consistent with the temporal compression account ( Klatt, 1973 ): the apparent exception to this is the relationship between VE size and vowel height, which is addressed in section 7.

www.frontiersin.org

Figure 1 . Modulation of VE size in different phonetic contexts: obstruent manner (Top Left) , vowel height (Top Middle) , lexical class (Top Right) , frequency (Bottom Left) , local (Bottom Middle) , and mean (Bottom Right) speech rates. Points and error bars indicate the posterior mean value with 95% credible intervals, whilst holding all other predictors at their average values. Dashed line indicates no difference between vowels preceding voiced or voiceless consonants. For continuous predictors (frequency, speech rates), the estimate VE size is shown at three values for clarity.

6.2. Voicing Effect Across Dialects and Speakers

Turning to dialectal variability in VE, we observe that the dialect variation in VE (the dialect-level standard deviation, σ ^ d i a l e c t ) is between 0.07 and 0.12: this can be interpreted as meaning that the difference in VE size between a “low” and “high” VE dialect is between 32 and 61% 7 ( Table 3 ). This is comparable with the range of possible values for the overall VE (between 0.09 and 0.19, Table 2 , row 2). To understand whether this constitutes a “large” degree of variability, one metric is to assess whether a “low VE” dialect would actually have a reversed effect direction (voiceless > voiced), which is tested by subtracting 2 × σ ^ d i a l e c t from the overall VE size and comparing to 0. There is little evidence that dialects differ enough to change direction ( β ^ = −0.05, CrI = [−0.09, 0], Pr( β ^ > 0 ) = 0.06), which suggests that whilst individual dialects differ in the size of the VE, no dialect fully differs in the direction of the effect (i.e., no dialect's credible interval is fully negative).
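The percentage spread quoted above can be recovered (up to rounding) by comparing a dialect 2 SD below the overall effect with one 2 SD above it, a ratio of exp(4σ) on the duration-ratio scale, again assuming natural-log durations:

```python
# A "low" VE dialect (2 SD below the overall effect) and a "high" one
# (2 SD above) differ by a factor of exp(4 * sigma) on the duration-ratio
# scale, natural logs assumed. Applying this to the reported interval
# endpoints roughly recovers the quoted 32-61% spread, up to rounding.
import math

for sigma in (0.07, 0.12):  # CrI endpoints for the dialect-level SD
    spread = math.exp(4 * sigma) - 1  # proportional low-vs.-high difference
    print(f"sigma = {sigma:.2f}: dialects differ by about {spread:.0%}")
```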

www.frontiersin.org

Table 3 . Posterior mean ( σ ^ ), estimated error, and 95% credible intervals for dialect and speaker-level parameters related to obstruent voicing included in the model of log-transformed vowel duration.

Another way of understanding the degree of dialectal variability in VE is to examine the predicted VE for individual dialects. As shown in Figure 2 , dialects appear to differ gradiently from each other, ranging from dialects with effectively-null VE to those with strong evidence for large VEs. The Scottish dialects of Central Scotland and Edinburgh have VEs of at most 1.06 and 1.09, respectively, based on their upper credible interval value, whilst their median values (indicated by the points in Figure 2 ) indicate that the most likely VE is effectively null (ratios near 1; Central Scotland: β ^ = 0.99, CrI = [0.93, 1.06]; Edinburgh: β ^ = 1.01, CrI = [0.93, 1.09]): indeed, all Scottish dialects have a predicted VE size of 1.16 at the highest, with most of these having median values <1.1 ( Table 4 ). North American dialects, in contrast, all have robustly positive VE values (no credible interval crosses the 0 line) and are generally larger than the British and Irish variants, shown by the position of red (North American) and blue (United Kingdom and Ireland) points respectively in Figure 2 . In particular, the AAE dialects have the largest VEs in the sample, which are all robustly larger than the average “English” VE size (Rochester NY: β ^ = 1.35, CrI = [1.27, 1.44]; Princeville NC: β ^ = 1.39, CrI = [1.31, 1.48]; Washington DC: β ^ = 1.49, CrI = [1.42, 1.56]): this is consistent with previous studies of AAE, which posit that final devoicing of word-final voiced obstruents results in compensatory vowel lengthening ( Holt et al., 2016 ; Farrington, 2018 ).

www.frontiersin.org

Table 4 . Estimated VE sizes (mean, estimated error, and upper and lower credible intervals) for each dialect used in this study.

www.frontiersin.org

Figure 2 . Estimated VE size for each dialect analyzed in this study (red = North American, blue = United Kingdom and Ireland). Points and errorbars indicate the posterior mean value with 95% credible intervals, whilst holding all other predictors at their average values. Dashed line indicates no difference between vowels preceding voiced or voiceless consonants.

Turning to variability in VE across individual speakers, we observe that speakers are estimated to vary within-dialect by between 0.07 and 0.08 ( σ ^ s p e a k e r = 0.08, CrI = [0.07, 0.08]), meaning that speakers differ in their VE ratios by between 32 and 37% ( Table 3 ). To put this value in context and get an impression of the size of variability across speakers, this value is compared with the degree of variability across dialects. Figure 3 illustrates how likely the model deems different degrees of by-speaker and by-dialect variability: highest probability (darker shading) lies where by-dialect variability is greater than by-speaker variability. By the metric of between-dialect variability, Figure 3 illustrates that whilst dialects differ in VE size, individual speakers vary little from their dialect-specific baseline value.

www.frontiersin.org

Figure 3 . Heatmap of posterior samples of by-dialect ( σ ^ d i a l e c t ) and by-speaker ( σ ^ s p e a k e r ) voicing effect standard deviations. Equal variability is indicated by the dashed line, with darker shades indicating a greater density of samples.

7. Discussion

The findings from this study will be discussed with respect to the two research questions: (1) how the VE is realized in spontaneous speech, and (2) how the VE varies across dialects and speakers. The VE in English is often considered to be substantially larger than in other languages ( Chen, 1970 ) and claimed to play a significant perceptual role in cueing consonant voicing ( Denes, 1955 ). Taken together, these observations have formed the basis for claims that the VE in English is phonologically specified beyond an otherwise phonetically-consistent acoustic property across languages ( Fromkin, 1977 ; Keating, 1985 ). Previous work has focused on controlled laboratory speech, leaving open the question of how the VE is realized in spontaneous English speech.

In this study, the overall VE in spontaneous speech was observed to have a maximum size of around 1.2—substantially smaller than the 1.5 commonly reported in laboratory studies (e.g., House and Fairbanks, 1953 ; Peterson and Lehiste, 1960 ; House, 1961 ; Chen, 1970 ), and more consistent with previous research on VE in connected speech ( Crystal and House, 1982 ; Tauberer and Evanini, 2009 ). Spontaneous VE size was also shown to be affected by a range of phonetic factors, such as consonant manner, vowel height, frequency, and speech rate, though the evidence for each of these effects varies substantially (section 6.1). What the effects of these phonetic factors suggest is that contexts where vowels are often shorter also have shorter VE sizes, supporting the argument of “temporal compression”: that vowels which have already undergone shortening cannot be subsequently shortened further ( Harris and Umeda, 1974 ; Klatt, 1976 ). An interesting exception to this finding is that the VE size was found to be larger for high vowels than non-high vowels in this study ( Figure 1 )—the direction of this effect may be counter to that predicted by temporal compression, and opens a question as to whether this and other predictions of temporal compression are straightforwardly replicable in spontaneous speech environments. The overall smaller size of the VE in spontaneous speech, and its modulation by phonetic factors, indicate a possible fragility of the VE in spontaneous speech, in apparent contrast to the supposed perceptual importance of the VE as a cue to consonant voicing ( Denes, 1955 ; Lisker, 1957 ; Raphael, 1972 ). This apparent conflict between the perceptual importance of the VE and its subtlety in production provides an interesting area for future work.

The fact that VE size in English differs so widely between laboratory and connected speech not only demonstrates the importance of speech style and context on phonetic realization ( Labov, 1972 ; Lindblom, 1990 ), but also raises the question of “how big” the VE in English really is, or could be. If larger overall VE size is only observable in laboratory speech, it would be interesting to empirically re-evaluate the question of whether English VE is in fact larger than in other languages. For languages that exhibit smaller VEs than English in laboratory speech ( Chen, 1970 ), it is not clear how such languages may realize the VE in more naturalistic speech. One possibility is that the VE across languages is comparatively small in spontaneous speech and similarly affected by phonetic factors; alternatively, the VE in spontaneous speech across other languages may still be smaller than in English and retain cross-linguistic differences akin to those reported by Chen (1970) , and thus English would still retain its status as a language with a distinct realization of the VE.

The first research question (section 6.1) considered how the VE was modulated in spontaneous speech, averaging across dialects. To what extent dialects themselves differ in VE was the focus of the second research question. As shown in section 6.2, English was shown to exhibit a range of different VE sizes across individual dialects. The dialects with the smallest and largest VEs—Scottish Englishes and AAE, respectively—were expected to show these values given evidence of additional phonological rules governing vowel duration in these varieties ( Aitken, 1981 ; Holt et al., 2016 ; Rathcke and Stuart-Smith, 2016 ; Farrington, 2018 ). Beyond these varieties, dialects appear to differ gradiently from each other, ranging in VE values from around 1.05 in South West England to 1.24 in the Northern Cities region ( Figure 2 ). As opposed to there being a single “English” VE value, there appears to be a range of VE sizes within the language. Such a finding further complicates the notion that English has a particular and large VE relative to other languages. Imagining these different dialects as “languages” with minimally different phonological structures, this finding demonstrates that such similar “languages” can have very different phonetic effects ( Keating, 1985 ). This in turn motivates a more nuanced approach to the question of whether English truly differs from other languages in its VE size: not only may English have varieties with greater or lesser VE sizes, but other languages may also exhibit similar dialectal VE ranges.

Individual speakers are also shown to vary in the realization of the VE, though the extent of this variability is rather limited when compared to variability across dialects ( Figure 3 ): that is, whilst dialects appear to demonstrate a range of possible VE patterns, individual speakers vary little from their dialect-specific baseline values. Such a finding supports an interpretation where the VE has a dialect-specific value which speakers learn as part of becoming a member of that speech community. The limited extent of speaker variability could predict that the VE will be stable within individual English dialects, given the key role of synchronic speaker variability as the basis for sound change ( Ohala, 1989 ; Baker et al., 2011 ). This would need checking on a dialect-by-dialect basis, however, given recent evidence of Glaswegian undergoing weakening in its vowel duration patterns ( Rathcke and Stuart-Smith, 2016 ). It also highlights the need for studies addressing both synchronic and diachronic variability across dialects, which we hope to address in future work. One important caveat to the finding is that it assumes that all the dialects analyzed in this study contain only speakers who are speakers of that dialect: if a given dialect had a particularly large degree of by-speaker variability, this could reflect the existence of multiple speakers of different dialects (and thus different VE patterns) within that particular dialect coding. This is unlikely to be a particular problem in this study, however, as a separate model that allows for by-speaker variability to vary on a per-dialect basis showed that no dialect with a sufficiently large number of tokens exhibited overly large by-speaker variability (section 6.2).

By using speech data from multiple sources and multiple dialects, it has been possible to investigate variability of a phonological feature across “English” overall, examine variability at the level of individual dialects and speakers, and reveal the extent of English-wide phonetic variability that was not previously apparent in studies of individual dialects and communities. In this sense, our “large-scale” approach, using consistent measures and controlling factors, enables us to understand the nature of dialectal variability in the English VE directly within the context of both other dialects and English as a whole.

Whilst this kind of study extends the scope of analysis for (socio)phonetic research, there are of course a number of limitations that should be kept in mind in studies of this kind. This study of the English VE predominantly uses data from automatic acoustic measurements, in turn calculated from datasets segmented via forced alignment. All forced-alignment tools have a minimum time resolution (often 10 ms), a minimum segment duration (often 30 ms), and there always exists the possibility of poor or inaccurate alignment. This is a necessary consequence of the volume of data used in this study: there is simply too much data to manually check and correct all durations, and so the best means of limiting these effects is through sensible filtering and modeling of the data. For example, segments with aligned durations of less than 50 ms were excluded, since accurately capturing the duration of a vowel this short could be difficult given the time resolution of the aligner. This decision could exaggerate the size of the VE estimation, as only the most reduced vowels have been removed from the data. Another property of forced alignment which impacts our study of VE is that aligners assign only the phonological segment label to each segment, meaning that the VE can only be examined in terms of phonological voicing specification (i.e., whether a segment is underlyingly voiced or not), as opposed to whether the segment itself was realized with phonetic voicing. For example, the realization of the stop as devoiced ( Farrington, 2018 ) or as a glottal stop ( Smith and Holmes-Elliott, 2018 ), or the relative duration of the closure preceding the vowel ( Lehiste, 1970 ; Port and Dalby, 1982 ; Coretta, 2019 ), could affect VE size in ways that cannot be controlled for using phonological segment labels alone.
How this kind of phonetic variation, and the more general relationship between a “phonological” and a “phonetic” VE, should be understood would certainly be an interesting project for future work. Finally, given the diversity of formats and structures of the corpora available for this study, it has only been possible to categorize and study dialects in a rather broad “regional” fashion. Similarly, we were unable to investigate the effect of speaker age due to the heterogeneous coding of age across the corpora: this is an important dimension that we have attempted to account for in our approach to statistical modeling, and one that certainly must be examined in future work. Whilst these limitations may make this approach less suitable for other questions in phonetics and sociolinguistics which are concerned with variability at a more detailed level, the approach taken in this study points to a promising first step toward exposing the structures underlying fine-grained phonetic variability at a larger level across multiple speakers and dialects of a language.

8. Conclusion

The recent increase in the availability of spoken-language corpora and the development of speech and data processing tools have made it easier to perform phonetic research at a “large scale,” incorporating data from multiple corpora, dialects, and speakers. This study applies this large-scale approach to investigate how the English Voicing Effect (VE) is realized in spontaneous speech, and the extent of its variability across individual dialects and speakers. Little is known about how the VE varies across dialects, bar a handful of studies of specific varieties (Aitken, 1981; Tauberer and Evanini, 2009; Holt et al., 2016). English provides an interesting opportunity to examine directly how phonetic implementation may differ across language varieties with minimally different phonological structures (Keating, 1985). By applying tools for automatic acoustic analysis (McAuliffe et al., 2019) and statistical modeling (Carpenter et al., 2017), it was found that the English VE is substantially smaller in spontaneous speech than in controlled laboratory speech, and is modulated by a range of phonetic factors. English dialects demonstrate a wide range of variability in VE size beyond that expected from specific dialect patterns such as the SVLR, whilst individual speakers are relatively uniform with respect to their dialect-specific baseline values. In this way, this study provides an example of how large-scale studies can offer new insights into the structure of phonetic variability in English and in language more generally.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Author Contributions

JT extracted the data, performed the statistical analysis, and wrote the first draft of the manuscript. All authors contributed to the conception and design of the study. All authors contributed to manuscript revision, and read and approved the submitted version.

Funding

The research reported here is part of SPeech Across Dialects of English (SPADE): Large-scale digital analysis of a spoken language across space and time (2017–2020); ESRC Grant ES/R003963/1, NSERC/CRSNG Grant RGPDD 501771-16, SSHRC/CRSH Grant 869-2016-0006, NSF Grant SMA-1730479 (Digging into Data/Trans-Atlantic Platform), and was also supported by SSHRC #435-2017-0925 awarded to MS and a Fonds de Recherche du Québec Société et Culture International Internship award granted to JT.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We acknowledge the crucial contribution of The SPADE Consortium ( https://spade.glasgow.ac.uk/the-spade-consortium/ ), which comprises the Data Guardians who generously shared the datasets reported in this paper and/or whose datasets were used in the development of the ISCAN tool. No SPADE research would have been possible without The SPADE Consortium. The research reported in this paper is an extended version of a preliminary report in Tanner et al. (2019). We would like to thank the audiences of the 2019 Montreal-Ottawa-Toronto Phonology Workshop and UK Language Variation & Change 12 for their feedback on previous versions of this research. We also thank SPADE team members, especially Michael Goodale, Rachel Macdonald, and Michael McAuliffe, for assistance with data management.

1. ^ Harris and Umeda (1974) , in their study of overall vowel duration, attribute this difference to a “mechanical” prosody as a consequence of numerous repetitions.

2. ^ Part of The Dictionary of the Scots Language ( https://dsl.ac.uk/ ).

3. ^ We are grateful to Michael Goodale for designing and performing this filtering protocol.

4. ^ https://iscan.readthedocs.io/

5. ^ The values chosen for the obstruent voicing parameter reflect the decision to allow a wide range of possible VE sizes, including values both above and below those reported in the previous literature. A sensitivity analysis was performed using an additional model fit with a “uniform” flat prior for the obstruent voicing parameter, which returned VE values differing on the order of 10⁻³, suggesting that the choice of a weakly-informative prior did not adversely affect the reported results.
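The logic behind this sensitivity check can be illustrated with a toy conjugate normal model (this is not the paper's actual Stan model, and all numbers below are hypothetical): with a large amount of data, a weakly-informative prior and a flat prior yield posterior means that differ only negligibly.

```python
# Toy illustration of prior sensitivity (not the paper's actual Stan model).
# Conjugate normal-normal update: posterior mean of an effect given n
# observations with sample mean ybar and known observation noise sigma.

def posterior_mean(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior mean under a Normal(prior_mean, prior_sd) prior.
    prior_sd = float('inf') corresponds to a flat (uniform) prior."""
    data_prec = n / sigma**2
    prior_prec = 0.0 if prior_sd == float("inf") else 1.0 / prior_sd**2
    return (prior_prec * prior_mean + data_prec * ybar) / (prior_prec + data_prec)

ybar, n, sigma = 0.19, 50_000, 0.5  # hypothetical effect estimate and data size
weak = posterior_mean(ybar, n, sigma, prior_mean=0.0, prior_sd=0.5)
flat = posterior_mean(ybar, n, sigma, prior_mean=0.0, prior_sd=float("inf"))
print(abs(weak - flat))  # negligible: far smaller than the effect itself
```

With this much data the two posterior means agree to several decimal places, which is the same qualitative conclusion the footnote reports for the full model.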

6. ^ As vowel duration was log-transformed prior to fitting, effects are interpreted by taking the exponent of the model parameter's value, e.g., e^0.19 ≈ 1.2, which corresponds to a vowel duration increase of 20%.

7. ^ The value is multiplied by 4 to obtain the 95% range of values (2σ̂_dialect on each side of the distribution) = 0.28, which is then back-transformed from the log scale via the exponential function: e^0.28 ≈ 1.32.
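The back-transformation arithmetic in footnotes 6 and 7 can be verified directly:

```python
import math

# Footnote 6: a coefficient of 0.19 on log(duration) corresponds to a
# multiplicative duration increase of about 20%.
coef = 0.19
assert round(math.exp(coef), 1) == 1.2  # e^0.19 ≈ 1.21, i.e. ~20% longer

# Footnote 7: dialect-level standard deviation of 0.07 on the log scale.
# The central 95% of dialects spans roughly +/-2 SD, i.e. a range 4 SD wide.
sd_dialect = 0.07
log_range = 4 * sd_dialect   # = 0.28 on the log scale
ratio = math.exp(log_range)  # back-transform to the duration scale
print(round(ratio, 2))  # 1.32: largest-VE dialects ~32% above the smallest
```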

Aitken, A. J. (1981). The Scottish Vowel Length Rule. The Middle English Dialect Project . Edinburgh.

Anderson, J., Beavan, D., and Kay, C. (2007). “The Scottish corpus of texts and speech,” in Creating and Digitizing Language Corpora , eds J. C. Beal, K. P. Corrigan, and H. L. Moisl (New York, NY: Palgrave), 17–34. doi: 10.1057/9780230223936_2

Baker, A., Archangeli, D., and Mielke, J. (2011). Variability in American English s-retraction suggests a solution to the actuation problem. Lang. Variat. Change 23, 347–374. doi: 10.1017/S0954394511000135

Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang . 68, 255–278. doi: 10.1016/j.jml.2012.11.001

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw . 67, 1–48. doi: 10.18637/jss.v067.i01

Boberg, C. (2018). “Dialects of North American English,” in Handbook of Dialectology , eds C. Boberg, J. Nerbonne, and D. Watt (Oxford: John Wiley and Sons), 450–461. doi: 10.1002/9781118827628.ch26

Du Bois, J. W., Chafe, W. L., Meyer, S. A., Thompson, S. A., and Martey, N. (2000). Santa Barbara Corpus of Spoken American English . Technical report, Linguistic Data Consortium, Philadelphia, PA.

Brysbaert, M., and New, B. (2009). Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behav. Res. Methods 41, 977–990. doi: 10.3758/BRM.41.4.977

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. R J . 10, 395–411. doi: 10.32614/RJ-2018-017

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., et al. (2017). Stan: A probabilistic programming language. J. Stat. Softw . 76, 1–32. doi: 10.18637/jss.v076.i01

Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica 22, 129–159. doi: 10.1159/000259312

Clopper, C. G., Pisoni, D. B., and de Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. J. Acoust. Soc. Am . 118, 1661–1676. doi: 10.1121/1.2000774

Cohen Priva, U., and Gleason, E. (2018). “The role of fast speech in sound change,” in Proceedings of the 40th Annual Conference of the Cognitive Science Society (Austin, TX: Cognitive Science Society), 1512–1517.

Coleman, J., Baghai-Ravary, L., Pybus, J., and Grau, S. (2012). Audio BNC: The Audio Edition of the Spoken British National Corpus. Technical report, Oxford . Available online at: http://www.phon.ox.ac.uk/AudioBNC

Coleman, J., Renwick, M. E. L., and Temple, R. A. M. (2016). Probabilistic underspecification in nasal place assimilation. Phonology 33, 425–458. doi: 10.1017/S0952675716000208

Coretta, S. (2019). An exploratory study of voicing-related differences in vowel duration as compensatory temporal adjustment in Italian and Polish. Glossa 4, 1–25. doi: 10.5334/gjgl.869

Crystal, T. H., and House, A. S. (1982). Segmental durations in connected speech signals: preliminary results. J. Acoust. Soc. Am . 72, 705–716. doi: 10.1121/1.388251

Cuartero, N. (2002). Voicing assimilation in Catalan and English (Ph.D. thesis). Universitat Autónoma de Barcelona, Barcelona, Spain.

Denes, P. (1955). Effect of duration on the perception of voicing. J. Acoust. Soc. Am . 27, 761–764. doi: 10.1121/1.1908020

Docherty, G. (1992). The Timing of Voicing in British English Obstruents . Berlin; New York, NY: Foris. doi: 10.1515/9783110872637

Dodsworth, R. (2013). Retreat from the Southern Vowel Shift in Raleigh, NC: social factors. Univ. Pennsylvania Work. Pap. Linguist . 19, 31–40. Available online at: https://repository.upenn.edu/pwpl/vol19/iss2/5/

Dodsworth, R., and Kohn, M. (2012). Urban rejection of the vernacular: the SVS undone. Lang. Variat. Change 24, 221–245. doi: 10.1017/S0954394512000105

Ernestus, M., Hanique, I., and Verboom, E. (2015). The effect of speech situation on the occurrence of reduced word pronunciation variants. J. Phonet . 38, 60–75. doi: 10.1016/j.wocn.2014.08.001

Fabricius, A. H. (2000). T-glottalling between stigma and prestige: a sociolinguistic study of Modern RP (Ph.D. thesis). Copenhagen Business School, Copenhagen, Denmark.

Farrington, C. (2018). Incomplete neutralization in African American English: the cast of final consonant devoicing. Lang. Variat. Change 30, 361–383. doi: 10.1017/S0954394518000145

Fromkin, V. A. (1977). “Some questions regarding universal phonetics and phonetic representations,” in Linguistic Studies Offered to Joseph Greenberg on the Occasion of His Sixtieth Birthday , ed A. Juilland (Saratoga, NY: Anma Libri), 365–380.

Fromont, R., and Hay, J. (2012). “LaBB-CAT: an annotation store,” in Australasian Language Technology Workshop 2012, Vol. 113 , 113–117.

Fruehwald, J. (2013). The phonological influence on phonetic change (Ph.D. thesis). University of Pennsylvania, Pennsylvania, PA, United States.

Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models . Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511790942

Geng, C., Turk, A., Scobbie, J. M., Macmartin, C., Hoole, P., Richmond, K., et al. (2013). Recording speech articulation in dialogue: evaluating a synchronized double electromagnetic articulography setup. J. Phonet . 41, 421–431. doi: 10.1016/j.wocn.2013.07.002

Godfrey, J. J., Holliman, E. C., and McDaniel, J. (1992). “SWITCHBOARD: telephone speech corpus for research and development,” in Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing - Vol. 1 (San Francisco, CA), 517–520. doi: 10.1109/ICASSP.1992.225858

Greenbaum, S., and Nelson, G. (1996). The International Corpus of English (ICE project). World Englishes 15, 3–15. doi: 10.1111/j.1467-971X.1996.tb00088.x

Guy, G. (1980). “Variation in the group and the individual: the case of final stop deletion,” in Locating Language in Time and Space , ed W. Labov (New York, NY: Academic Press), 1–36.

Harris, M., and Umeda, N. (1974). Effect of speaking mode on temporal factors in speech: vowel duration. J. Acoust. Soc. Am . 56, 1016–1018. doi: 10.1121/1.1903366

Heffner, R.-M. S. (1937). Notes on the lengths of vowels. Am. Speech 12, 128–134. doi: 10.2307/452621

Hewlett, N., Matthews, B., and Scobbie, J. M. (1999). “Vowel duration in Scottish English speaking children,” in Proceedings of the 14th International Congress of Phonetic Sciences (San Francisco, CA).

Hibbitt, G. W. (1948). Diphthongs in American speech: a study of the duration of diphthongs in the contextual speech of two hundred and ten male undergraduates (Ph.D. thesis). Columbia University, New York, NY, United States.

Holmes-Elliott, S. (2015). London calling: assessing the spread of metropolitan features in the southeast (Ph.D. thesis). University of Glasgow, Glasgow, Scotland.

Holt, Y. F., Jacewicz, E., and Fox, R. A. (2016). Temporal variation in African American English: the distinctive use of vowel duration. J. Phonet. Audiol . 2. doi: 10.4172/2471-9455.1000121

House, A. S. (1961). On vowel duration in English. J. Acoust. Soc. Am . 33, 1174–1178. doi: 10.1121/1.1908941

House, A. S., and Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. J. Acoust. Soc. Am . 25, 105–113. doi: 10.1121/1.1906982

Jacewicz, E., and Fox, R. A. (2013). “Cross-dialectal differences in dynamic formant patterns in American English vowels,” in Vowel Inherent Spectral Change , eds G. S. Morrison and P. F. Assmann (Berlin: Springer), 177–198. doi: 10.1007/978-3-642-14209-3_8

Jones, D. (1948). An Outline of English Phonetics . New York, NY: E. P. Dutton & Company.

Keating, P. (1984). Phonetic and phonological representation of stop consonant voicing. Language 60, 189–218. doi: 10.2307/413642

Keating, P. (2006). “Phonetic encoding of prosodic structure,” in Speech Production: Models, Phonetic Processes, and Techniques , eds J. Harrington and M. Tabain (New York, NY: Psychology Press), 167–186.

Keating, P. A. (1985). “Universal phonetics and the organization of grammars,” in Phonetic Linguistics: Essays in Honor of Peter Ladefoged , ed V. A. Fromkin (New York, NY: Academic Press), 115–132.

Kendall, T., and Farrington, C. (2018). The Corpus of Regional African American Language. Version 2018.10.06 , Eugene, OR.

Kenyon, J. S. (1940). American Pronunciation . Ann Arbor, MI: George Wahr.

Klatt, D. H. (1973). Interaction between two factors that influence vowel duration. J. Acoust. Soc. Am . 54, 1102–1104. doi: 10.1121/1.1914322

Klatt, D. H. (1976). Linguistic uses of segmental duration in English: acoustic and perceptual evidence. J. Acoust. Soc. Am . 59, 1208–1221. doi: 10.1121/1.380986

Kleinschmidt, D. F. (2018). Structure in talker variability: how much is there and how much can it help? Lang. Cogn. Neurosci . 34, 1–26. doi: 10.1080/23273798.2018.1500698

Labov, W. (1972). Sociolinguistic Patterns . Philadelphia, PA: University of Pennsylvania Press.

Labov, W., Ash, S., and Boberg, C. (2006). The Atlas of North American English: Phonetics, Phonology, and Sound Change . Berlin: Mouton de Gruyter. doi: 10.1515/9783110167467

Labov, W., Cohen, P., Robins, C., and Lewis, J. (1968). A Study of the Non-Standard English of Negro and Puerto Rican Speakers in New York City . Technical Report 1 & 2, Linguistics Laboratory, University of Pennsylvania.

Labov, W., and Rosenfelder, I. (2011). “New tools and methods for very large scale measurements of very large corpora,” in New Tools and Methods for Very-Large-Scale Phonetics Research Workshop , Pennsylvania, PA.

Labov, W., Rosenfelder, I., and Fruehwald, J. (2013). One hundred years of sound change in Philadelphia: linear incrementation, reversal, and reanalysis. Language 89, 30–65. doi: 10.1353/lan.2013.0015

Lehiste, I. (1970). Temporal organization of higher-level linguistic units. J. Acoust. Soc. Am . 48:111. doi: 10.1121/1.1974906

Lewandowski, D., Kurowicka, D., and Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. J. Multivar. Anal . 100, 1989–2001. doi: 10.1016/j.jmva.2009.04.008

Liberman, M. (2018). Corpus phonetics. Annu. Rev. Linguist . 5, 91–107. doi: 10.1146/annurev-linguistics-011516-033830

Liberman, M. A., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychol. Rev . 74, 431–461. doi: 10.1037/h0020279

Lindblom, B. (1990). “Explaining phonetic variation: a sketch of the h&h theory,” in Speech Production and Speech Modelling , Vol. 4 of NATO ASI Series, eds W. J. Hardcastle and A. Marchal (Dordrecht: Kluwer Academic Publishers), 403–439. doi: 10.1007/978-94-009-2037-8_16

Lisker, L. (1957). Linguistic segments, acoustic segments and synthetic speech. Language 33, 370–374. doi: 10.2307/411159

Lisker, L. (1985). The pursuit of invariance in speech signals. J. Acoust. Soc. Am . 77, 1199-1202. doi: 10.1121/1.392185

Luce, P. A., and Charles-Luce, J. (1985). Contextual effects on vowel duration, closure duration, and the consonant/vowel ratio in speech production. J. Acoust. Soc. Am . 78, 1949–1957. doi: 10.1121/1.392651

Mack, M. (1982). Voicing-dependent vowel duration in English and French: monolingual and bilingual production. J. Acoust. Soc. Am . 71, 173–178. doi: 10.1121/1.387344

McAuliffe, M., Coles, A., Goodale, M., Mihuc, S., Wagner, M., Stuart-Smith, J., et al. (2019). “ISCAN: A system for integrated phonetic analyses across speech corpora,” in Proceedings of the 19th International Congress of Phonetic Sciences (Melbourne, VIC).

McAuliffe, M., Scolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017a). Montreal Forced Aligner [computer program] . Available online at: https://montrealcorpustools.github.io/Montreal-Forced-Aligner/

McAuliffe, M., Stengel-Eskin, E., Socolof, M., and Sonderegger, M. (2017b). “Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora,” in Proceedings of Interspeech 2017 (Stockholm). doi: 10.21437/Interspeech.2017-1390

Nicenboim, B., and Vasishth, S. (2016). Statistical methods for linguistic research: foundational ideas - part II. Lang. Linguist. Compass 10, 591–613. doi: 10.1111/lnc3.12207

Ohala, J. (1989). “Sound change is drawn from a pool of synchronic variation,” in Language Change: Contributions to the Study of Its Causes , eds L. E. Breivik and E. H. Jahr (Berlin: Mouton de Gruyter), 173–198.

Peterson, G. E., and Lehiste, I. (1960). Duration of syllable nuclei in English. J. Acoust. Soc. Am . 32, 693–703. doi: 10.1121/1.1908183

Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., et al. (2007). Buckeye Corpus of Spontaneous Speech, 2nd Edn . Columbus, OH: Ohio State University.

Port, R. F., and Dalby, J. (1982). Consonant/vowel ratio as a cue for voicing in English. Percept. Psychophys . 32, 141–152. doi: 10.3758/BF03204273

Raphael, L. J. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English. J. Acoust. Soc. Am . 51, 1296–1303. doi: 10.1121/1.1912974

Rathcke, T., and Stuart-Smith, J. (2016). On the tail of the Scottish Vowel Length Rule in Glasgow. Lang. Speech 59, 404–430. doi: 10.1177/0023830915611428

Roettger, T. B., Winter, B., and Baayen, R. H. (2019). Emergent data analysis in phonetic sciences: towards pluralism and reproducibility. J. Phonet . 73, 1–7. doi: 10.1016/j.wocn.2018.12.001

Rosen, N., and Skriver, C. (2015). Vowel patterning of Mormons in Southern Alberta, Canada. Lang. Commun . 42, 104–115. doi: 10.1016/j.langcom.2014.12.007

Rosenfelder, I., Fruehwald, J., Evanini, K., Seyfarth, S., Gorman, K., Prichard, H., et al. (2014). FAVE (Forced Alignment and Vowel Extraction) Program Suite v1.2.2. doi: 10.5281/zenodo.22281

Rositzke, H. A. (1939). Vowel-length in General American speech. Am. Speech 15, 99–109. doi: 10.2307/408728

Smith, J., and Holmes-Elliott, S. (2018). The unstoppable glottal: tracking rapid change in an iconic British variable. English Lang. Linguist . 22, 323–355. doi: 10.1017/S1360674316000459

Solanki, V. J. (2017). Brains in dialogue: investigating accommodation in live conversational speech for both speech and EEG data (Ph.D. thesis). University of Glasgow, Glasgow, Scotland.

Solé, M.-J. (2007). “Controlled and mechanical properties in speech,” in Experimental Approaches to Phonology , eds P. Beddor and M. Ohala (Oxford: Oxford University Press), 302–321.

Sonderegger, M., Bane, M., and Graff, P. (2017). The medium-term dynamics of accents on reality television. Language 93, 598–640. doi: 10.1353/lan.2017.0038

Sonderegger, M., Stuart-Smith, J., McAuliffe, M., Macdonald, R., and Kendall, T. (2019). “Managing data for integrated speech corpus analysis in SPeech Across Dialects of English (SPADE),” in Open Handbook of Linguistic Data Management , eds A. Berez-Kroeker, B. McDonnell, E. Koller, and L. Collister (Cambridge: MIT Press).

Stuart-Smith, J., Jose, B., Rathcke, T., MacDonald, R., and Lawson, E. (2017). “Changing sounds in a changing city: an acoustic phonetic investigation of real-time change over a century of Glaswegian,” in Language and a Sense of Place: Studies in Language and Region , eds C. Montgomery and E. Moore (Cambridge: Cambridge University Press), 38–65. doi: 10.1017/9781316162477.004

Stuart-Smith, J., Sonderegger, M., Rathcke, T., and Macdonald, R. (2015). The private life of stops: VOT in a real-time corpus of spontaneous Glaswegian. Lab. Phonol . 6, 505–549. doi: 10.1515/lp-2015-0015

Summers, W. V. (1987). Effects of stress and final consonant voicing on vowel production: articulatory and acoustic analyses. J. Acoust. Soc. Am . 82, 847–863. doi: 10.1121/1.395284

Sweet, H. (1880). A Handbook of Phonetics . London: MacMillan & Co.

Tagliamonte, S., and Temple, R. (2005). New perspectives on an ol' variable: (t, d) in British English. Lang. Variat. Change 17, 281–302. doi: 10.1017/S0954394505050118

Tagliamonte, S. A., and Baayen, R. H. (2012). Models, forests, and trees of York English: was/were variation as a case study for statistical practice. Lang. Variat. Change 24, 135–178. doi: 10.1017/S0954394512000129

Tanner, J., Sonderegger, M., Stuart-Smith, J., and SPADE-Consortium (2019). Vowel duration and the voicing effect across English dialects. Univers. Toronto Work. Pap. Linguist . 41, 1–13. doi: 10.33137/twpl.v41i1.32769

Tauberer, J., and Evanini, K. (2009). “Intrinsic vowel duration and the post-vocalic voicing effect: some evidence from dialects of North American English,” in Proceedings of Interspeech .

Thomas, C. K. (1947). An Introduction to the Phonetics of American English . New York, NY: Ronald Press Company.

Thomas, E. R. (2001). An Acoustic Analysis of Vowel Variation in New World English . American Dialect Society.

Trudgill, P. (1999). The Dialects of England . Oxford: Blackwell.

Umeda, N. (1975). Vowel duration in American English. J. Acoust. Soc. Am . 58, 434–445. doi: 10.1121/1.380688

Vasishth, S., Nicenboim, B., Beckman, M., Li, F., and Kong, E. J. (2018). Bayesian data analysis in the phonetic sciences: a tutorial introduction. J. Phonet . 71, 147–161. doi: 10.1016/j.wocn.2018.07.008

Yuan, J., and Liberman, M. (2014). F0 declination in English and Mandarin broadcast news speech. Speech Commun . 65, 67–74. doi: 10.1016/j.specom.2014.06.001

Yuan, J., Liberman, M., and Cieri, C. (2006). “Towards an integrated understanding of speaking rate in conversation,” in Proceedings of Interspeech 2006 , Pittsburgh, PA.

Yuan, J., Liberman, M., and Cieri, C. (2007). “Towards an integrated understanding of speech overlaps in conversation,” in Proceedings of the International Congress of Phonetic Sciences XVI (Saarbrücken), 1337–1340.

Zimmerman, S. A., and Sapon, S. M. (1958). Note on vowel duration seen cross-linguistically. J. Acoust. Soc. Am . 30, 152–153. doi: 10.1121/1.1909521

Keywords: voicing effect, English, phonetic variability, Bayesian modeling, dialectal variation, speaker variability

Citation: Tanner J, Sonderegger M, Stuart-Smith J and Fruehwald J (2020) Toward “English” Phonetics: Variability in the Pre-consonantal Voicing Effect Across English Dialects and Speakers. Front. Artif. Intell. 3:38. doi: 10.3389/frai.2020.00038

Received: 24 December 2019; Accepted: 01 May 2020; Published: 29 May 2020.

Copyright © 2020 Tanner, Sonderegger, Stuart-Smith and Fruehwald. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: James Tanner, james.tanner@mail.mcgill.ca

This article is part of the Research Topic Computational Sociolinguistics.


Warner, N. (2023). Advancements of phonetics in the 21st century: Theoretical and empirical issues of spoken word recognition in phonetic research. J. Phonet. doi: 10.1016/j.wocn.2023.101275

The Phonetics of Prosody

Amalia Arvaniti, Professor of English Language and Linguistics, Faculty of Arts, Radboud University
https://doi.org/10.1093/acrefore/9780199384655.013.411
Published online: 30 July 2020

Prosody is an umbrella term used to cover a variety of interconnected and interacting phenomena, namely stress, rhythm, phrasing, and intonation. The phonetic expression of prosody relies on a number of parameters, including duration, amplitude, and fundamental frequency (F0). The same parameters are also used to encode lexical contrasts (such as tone), as well as paralinguistic phenomena (such as anger, boredom, and excitement). Further, the exact function and organization of the phonetic parameters used for prosody differ across languages. These considerations make it imperative to distinguish the linguistic phenomena that make up prosody from their phonetic exponents, and similarly to distinguish between the linguistic and paralinguistic uses of the latter. A comprehensive understanding of prosody relies on the idea that speech is prosodically organized into phrasal constituents, the edges of which are phonetically marked in a number of ways, for example, by articulatory strengthening in the beginning and lengthening at the end. Phrases are also internally organized either by stress, that is around syllables that are more salient relative to others (as in English and Spanish), or by the repetition of a relatively stable tonal pattern over short phrases (as in Korean, Japanese, and French). Both types of organization give rise to rhythm, the perception of speech as consisting of groups of a similar and repetitive pattern. Tonal specification over phrases is also used for intonation purposes, that is, to mark phrasal boundaries, and express information structure and pragmatic meaning. Taken together, the components of prosody help with the organization and planning of speech, while prosodic cues are used by listeners during both language acquisition and speech processing. 
Importantly, prosody does not operate independently of segments; rather, it profoundly affects segment realization, making the incorporation of an understanding of prosody into experimental design essential for most phonetic research.

Keywords: suprasegmentals

1. Introduction

Prosody is an umbrella term used to cover a variety of interconnected and interacting phenomena, namely stress, rhythm, phrasing, and intonation. A term that was extensively used in the past and remains popular today is the term suprasegmentals ; it is the title of Lehiste’s classic monograph on the topic ( Suprasegmentals , 1970 ), and also used in Ladd ( 2008 , chap. 1). The term suprasegmentals will be avoided here as it alludes to a two-layered view of speech, whereby consonants and vowels constitute one layer and prosody is seen as the icing on a cake, a decorative and optional component that does not interfere with the integrity of the main segmental layer. This metaphor is evident in descriptions of speech as being produced “without prosody” (e.g., Conderman & Strobel, 2010 ; Wingfield, Lahar, & Stine, 1989 ; Witteman, van Heuven, & Schiller, 2012 ). Although for the purposes of analysis the principled distinction between segments and various components of prosody is desirable, the idea that segments are independent of prosody does not hold at the phonetic level: not only is it impossible to produce speech without prosody, but segments are strongly influenced by all aspects of prosodic structure, as discussed in some detail in this article.

What is often meant by speech without prosody is the absence of certain marked patterns associated with emotion and affect (also known as affective , emotive , or emotional prosody ). This use of the term prosody is not covered here, as it refers to paralinguistic functions of the phonetic parameters that also encode (linguistic) prosody. Though paralinguistic phenomena are beyond the scope of the present article, it is worth considering why this confusion has arisen and how one can make a principled distinction between (linguistic) prosody and paralinguistics. In the words of Ladd ( 2008 , p. 34) “paralinguistic messages deal primarily with basic aspects of interpersonal interaction—such as aggression, appeasement, solidarity, condescension—and with the speaker’s current emotional state—such as fear, surprise, anger, joy, boredom.” As such, paralinguistic information can often be conveyed even in the absence of a linguistic signal; for example, anger can be detected even when listening to low-pass filtered speech or a language unknown to the listener (Ladd, 2008 , chap. 1; but see Chen, Gussenhoven, & Rietveld, 2004 , on language-specific aspects of such interpretations). Arvaniti ( 2007 ) argues that a possible diagnostic criterion for paralinguistic phenomena is the gradience of both the acoustic parameters used to express them and of what they signify; for example, greater pitch range expansion indicating a greater degree of surprise. This definition is very close to what Bolinger ( 1961 ) has referred to as gradience (for a discussion, see Ladd, 2014 , chap. 4).

One reason for the conflation between prosody and paralinguistics is that the acoustic parameters used to encode linguistic prosodic distinctions are also used to convey paralinguistic information: for example, pitch is the main exponent of intonation but is also used paralinguistically to express excitement, boredom, and anger (see also section 5 ). In addition, however, the conflation of paralinguistics and prosody is related to the ubiquitous confounding of the linguistic phenomena that are part of prosody with their phonetic exponents; as an example, it is often the case that the term intonation , one of the components of prosody, is used as a synonym for fundamental frequency (F0), the main phonetic exponent of intonation. In order to avoid this confusion, here the linguistic components of prosody will be kept distinct from the phonetic parameters used for their realization.

A final issue to consider is that many of the phonetic exponents of prosody are also used to encode lexical contrast. Thus, F0 is the prime exponent of both intonation and lexical meaning in languages with tone, such as Cantonese, Thai, or Igbo. This in itself is not an issue, but it should be borne in mind that speakers unfamiliar with a given language are prone to interpreting the use of prosodic parameters according to how they are organized and used in their own linguistic system. This is amply demonstrated in studies of L2 prosody, and the development of creoles (Gooden, Drayton, & Beckman, 2009 ; Ortega-Llebaria, Hong, & Fan, 2013 ; Ortega-Llebaria, Nemogá, & Presson, 2017 ; Qin, Chien, & Tremblay, 2017 ; Skoruppa, Cristià, Peperkamp, & Seidl, 2011 ; Tremblay, Broersma, & Coughlin, 2018 ). For instance, a native speaker of English may interpret as stress the high falling pitch of a lexically accented syllable in Japanese or the pitch rise on Korean phrase-initial syllables because in English stressed syllables are often associated with high or rising pitch (Beckman, 1986 ; de Jong, 1994 ). In Japanese, however, pitch accent does not cue prominence (Beckman, 1986 ), while in Korean the pitch rise is a phrasal, not a stress-related, phenomenon (Jun, 2005a ). Taking into consideration the possibility of such cross-linguistic differences is essential when studying prosody.

2. Stress

Stress is a phenomenon that straddles the divide between lexical and postlexical levels: while word stress is a lexical property, stress applies to entire utterances as well (and is sometimes referred to as sentence stress; for a discussion, see Ladd, 2008 , chap. 6). Here, stress is included both because it operates at the phrasal level, and because many of its phonetic exponents are traditionally seen as part of prosody. Stress is not a phonetic property as such; rather, the term refers to the fact that in many languages one or more syllables in a word stand out relative to the rest, with the differences leading to alternations in prominence at the phrasal level as well. For instance, native speakers of English are likely to agree that in subject (n.) the first syllable is more prominent than the last, that is, stressed, while the reverse is the case with subject (v.). Such differences in relative salience can be phonetically achieved in a number of ways that vary substantially across languages that have stress.

The primary function of stress is culminative , that is, stress makes syllables stand out relative to others, a function that has repercussions for rhythm as detailed in section 3 . In addition, in languages in which the location of stress can vary, stress may also have contrastive function , that is, lead to a change in meaning, as in subject (n.) versus subject (v.). In some languages, such as Spanish and Greek, the functional load of stress is significant, while in others, such as English, it is limited to a small set of lexical items. Finally, stress has delimitative function in languages in which its position is fixed, as in Hungarian and Finnish in which stress always falls on the first syllable of a word. Something similar applies in English, in which 85% of content words start with a stressed syllable (Cutler & Carter, 1987 ). Statistical probabilities of this sort aid speech segmentation, processing, and acquisition (among many, Cutler, 2015 ; Skoruppa et al., 2011 ).

The fact that from a linguistic perspective stress expresses relations of relative salience (e.g., Hayes, 1995 ; Ladd, 2008 , chap. 6; Liberman & Prince, 1977 ) has led to assumptions that stressed syllables must necessarily be acoustically prominent as well. Phonetically, however, stress is not a uniform phenomenon as the view of stress as acoustic prominence implies. This is evident if one considers the findings on the connection between stress and duration. On the one hand, many studies show that stressed vowels are longer than unstressed ones (e.g., Beckman, 1986 , on English; Sluijter & van Heuven, 1996 , on Dutch; Arvaniti, 2000 , on Greek; Ortega-Llebaria & Prieto, 2011 , on Catalan and Spanish; Farnetani & Kori, 1990 , and D’Imperio & Rosenthal, 1999 , on Italian; Garellek & White, 2015 , on Tongan; Yakup & Sereno, 2016 , on Uyghur). On the other hand, stressed vowel duration is also affected by a number of additional parameters, such as the position of the stressed syllable in the word (D’Imperio & Rosenthal, 1999 , on Italian), the presence of pitch accent and focus (e.g., Botinis, 1989 , on Greek; Sluijter & van Heuven, 1996 , on Dutch), the interaction of stress, accent, and boundary lengthening (Katsika, 2016 , on Greek), and the level of stress involved (e.g., Farnetani & Kori, 1990 , Arvaniti, 1992 , 1994 , and Garellek & White, 2015 , found no evidence for durational effects of secondary stress in Italian, Greek, and Tongan, respectively). Finally, there are languages like Welsh in which stressed syllables are shorter than unstressed ones (Williams, 1985 ). In short, although durational differences associated with stress are present in many languages, stress alone cannot explain all durational distinctions.

In addition to duration, stress is associated with greater amplitude, a view that harks back to Stetson ( 1951 ) and the connection between stress and chest pulses. This view is not strongly supported by studies measuring average intensity, in that consistent differences are found in some languages (e.g., Garellek & White, 2015 , on Tongan) but not others (e.g., Arvaniti, 2000 , on Greek). However, when intensity differences are combined with duration, they often lead to consistently greater amplitude integral (Beckman, 1986 , on English; Arvaniti, 2000 , on Greek; Ortega-Llebaria & Prieto, 2007 , on Spanish). Amplitude integral combines the average intensity of a signal with its duration to give a measure of loudness that integrates the effect of duration on loudness (Beckman, 1986 ; Lieberman, 1960 ). This measurement is based on the fact that a longer sound will sound louder than a shorter sound with the same average intensity (Moore, 2012 ).
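The amplitude integral just described can be sketched in a few lines. This is a minimal illustration of the idea (intensity integrated over time), not an implementation from any of the cited studies; the frame-based representation and function name are assumptions for exposition.

```python
def amplitude_integral(frame_intensities, frame_dur_s):
    """Integrate per-frame intensity over time. Because duration enters the
    computation, a longer sound with the same average intensity yields a
    larger integral, matching its greater perceived loudness."""
    return sum(frame_intensities) * frame_dur_s

# Two vowels with the same average intensity (70, in arbitrary linear units)
# but different durations: only the integral separates them.
short_vowel = amplitude_integral([70.0] * 8, 0.010)   # 80 ms vowel
long_vowel = amplitude_integral([70.0] * 12, 0.010)   # 120 ms vowel
```

Average intensity alone would treat the two vowels as identical; the integral ranks the longer one as louder.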

An alternative measure of loudness was proposed by Sluijter and van Heuven ( 1996 ) in their study of Dutch stress: they found that stressed vowels have greater spectral balance, that is, they show a smaller reduction in the amplitude of higher frequencies or less spectral tilt. Sluijter and van Heuven ( 1996 ) associated this difference with greater vocal effort. Their findings about Dutch stress were replicated for some languages (e.g., Polish, Macedonian, and Bulgarian; Crosswhite, 2003 ), but not consistently so: Campbell and Beckman ( 1997 ) found spectral tilt differences only between accented and unstressed vowels in American English, while Garellek and White ( 2015 ) found no spectral tilt effect in Tongan.
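Spectral tilt can be roughly illustrated as a comparison of energy below and above some frequency split. Note the simplifications: Sluijter and van Heuven ( 1996 ) measured intensity in several frequency bands, whereas the single 500 Hz split, the function name, and the toy sinusoidal "vowels" below are all illustrative assumptions.

```python
import numpy as np

def spectral_tilt_db(signal, sr, split_hz=500.0):
    """Illustrative tilt measure: dB difference between spectral energy
    below and above split_hz. A smaller value means relatively more
    high-frequency energy, i.e., less tilt, as reported for stressed
    vowels in Dutch."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    return 10 * np.log10(low / high)

# Toy signals: same low-frequency component, different high-frequency energy.
sr = 8000
t = np.arange(int(0.1 * sr)) / sr
less_tilted = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
more_tilted = np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 1000 * t)
```

On this construction, `more_tilted` (weaker high frequencies) yields the larger tilt value.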

Stress-related differences also pertain to vowel quality: in English, for example, the vowel of the second syllable of subject (n.) is reduced to a schwa, [ˈsʌbdʒəkt], while the reverse obtains in subject (v.), [səbˈdʒɛkt]. Vowel quality differences are essential in determining stress in English, as indicated by vowel alternations like those found in photograph [ˈfəʊtəgrɑːf] versus photography [fəˈtɒgrəfi] (e.g., Beckman & Edwards, 1994 ; see Cutler, 2015 , for a review). Similarly to English, the distinction between stressed and unstressed vowels is phonologized in Italian, albeit to a much smaller extent: Italian has a seven vowel system, [i e ɛ a ɔ o u], but the distinction between open-mid and close-mid vowels is neutralized in unstressed position (Rogers & d’Arcangeli, 2004 ). In other languages, however, changes in quality, though evident, are not substantial and not always consistent across speakers and vowels (Sluijter & van Heuven, 1996 , on Dutch; Fourakis, Botinis, & Katsaiti, 1999 , on Greek; Ortega-Llebaria & Prieto, 2011 , on Catalan and Spanish; Adamou & Arvaniti, 2014 , on Romani). On the other hand, Garellek and White ( 2015 , p. 23), who also found no significant changes in vowel quality between stressed and unstressed vowels in Tongan, report differences in voicing quality, which indicate that stressed vowels are clearer (i.e., less noisy or breathy) than unstressed vowels.

Although the investigation of stress has often focused on vowels, there is evidence that stress affects consonants as well. Suomi and Ylitalo ( 2004 ) report that stress leads to longer consonant durations in Finnish. In German, VOT for voiceless stops is longer in stressed syllables (Haag, 1979 ). In English, lenited forms of some consonants appear in unstressed syllables: simplifying somewhat, in American English, /t/ and /d/ are flapped intervocalically when they are onsets of unstressed syllables, as in city > [ˈsɪɾi], while in British English, intervocalic /t/ is realized as a glottal stop in the same context; for example, city > [ˈsɪʔi].
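The American English flapping pattern just described (itself a simplification, as the text notes) can be approximated as a toy rewrite rule over IPA strings. The vowel inventory and the regex formulation are illustrative assumptions; real flapping is sensitive to fuller prosodic structure than a string-adjacency rule can capture.

```python
import re

VOWELS = 'iɪeɛæaʌəʊuoɔ'

def flap_american(ipa):
    """Toy rule: /t d/ surface as the flap [ɾ] between vowels when they are
    onsets of unstressed syllables. The IPA stress mark ˈ blocks the vowel
    lookahead, so stressed onsets (e.g., the /t/ of attack) escape the rule."""
    return re.sub(rf'(?<=[{VOWELS}])[td](?=[{VOWELS}])', 'ɾ', ipa)

print(flap_american('ˈsɪti'))   # city -> ˈsɪɾi
print(flap_american('əˈtæk'))   # attack -> əˈtæk (unchanged)
```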

A way to unify these observations from a number of different languages is offered by de Jong’s analysis of stress as localized hyperarticulation (de Jong, 1995 ). De Jong borrows the term hyperarticulation from Lindblom’s H&H theory (Lindblom, 1990 ), according to which variation in speech can be accounted for by positing a continuum from hypo- to hyperarticulation. The ends of the continuum reflect two competing forces on articulation, economy of effort (which leads to hypoarticulation), and the need to be understood (which leads to hyperarticulation). De Jong ( 1995 , p. 491) posits that “stress involves a localized shift toward hyperarticulate speech.” Although de Jong’s data come from English, hyperarticulation can unify the results of cross-linguistic studies like those discussed in this section, in that in all languages stressed syllables are hyperarticulated in some way. Hyperarticulation may be manifested as increases in duration, amplitude, or both, changes in phonation that may lead to changes in spectral characteristics, and changes in vowel quality. Viewing stress as localized hyperarticulation is also consistent with the results of articulatory studies (among many, Beckman, Edwards, & Fletcher, 1992 ; Cho & Keating, 2009 ; Harrington, Fletcher, & Roberts, 1995 ), and offers an explanation for various types of phonologized vowel reduction in unstressed syllables, as in English and Italian.

The cross-linguistic variation in the realization of stress is illustrated in Figure 1 , which shows the English word banana as pronounced by a speaker of American English, namely, [bəˈnænə], and Figure 2 , which shows its Greek cognate [bɐˈnɐnɐ]. The differences in duration and spectral changes are significant between the two renditions and extend to the function words as well: a in English is produced as a schwa, while [mɲɐ] ‘a/one’ in Greek shows no such reduction. Nevertheless, to native speakers of each language, the middle syllable of banana stands out and is considered stressed. To speakers of English, the stressed syllable of the Greek rendition may not sound very prominent, but it is so to native speakers of Greek (cf. Arvaniti & Rathcke, 2015 ; Protopapas, Panagaki, Andrikopoulou, Gutiérrez Palma, & Arvaniti, 2016 , among many). It is equally important to recognize that to speakers of Greek the change of vowel quality in the American English version does not make the middle syllable more prominent, because they do not associate stress with changes in vowel quality.

Figure 1. Spectrogram of a banana , as pronounced by a speaker of American English: [ə bəˈnænə].

Figure 2. Spectrogram of [mɲɐ bɐˈnɐnɐ ] ‘a banana’ , as pronounced by a speaker of Standard Greek.

A common misconception about the phonetic correlates of stress relates to F0. Specifically, it is often said that stressed syllables have high or rising pitch, and many studies of stress include an investigation of F0 along these lines (e.g., Ortega-Llebaria & Prieto, 2011 , on Spanish and Catalan; Gordon & Applebaum, 2010 , on Turkish Kabardian; Garellek & White, 2015 , on Tongan). These claims can be traced back to Fry ( 1958 ). Fry manipulated a number of acoustic parameters in pairs of English words like subject (n.) and subject (v.) and showed that changes in F0 outweighed those of duration and intensity in inducing a change in the perceived location of stress. Fry ( 1958 ) interpreted this result as evidence that F0 is the most important correlate of stress.

The problem with Fry’s experiments is that they confounded stress with intonation. Fry ( 1958 ) assumed that he was testing word stress, but his stimuli were one-word utterances; thus, his tests conflated word stress with accentuation (or sentence stress ), which is expressed primarily by means of F0 (see section 5 ). Specifically, in a language like English, certain pitch movements, known as pitch accents , are expected to co-occur with stressed syllables: when listeners hear a word like subject accented on the first syllable, they assume the accent is there because that syllable is stressed, even if its vowel quality, duration, or intensity are not ideal. In the words of Francis Nolan (cited in Ladd, 2008 , p. 54), pitch is prominence cueing not prominence lending . In short, the relationship between stress and F0 is an indirect one: stressed syllables are docking sites for pitch movements, but whether these pitch movements will occur at all and of what type they will be is not determined by stress but by intonation (see also section 5 , and Gordon, 2014 , for a review).

The following examples further illustrate this point. Figures 3 and 4 show the same phrase, uttered in two different ways: in Figure 3 , both the stressed syllable of Isabel and that of Dunham show high pitch, while in Figure 4 the first (and stressed) syllable of Isabel has low pitch instead. The difference in pitch between the two utterances is one of intonation. The utterance in Figure 3 is a typical response to a run-of-the-mill question, such as who was on the phone? The utterance in Figure 4 queries the interlocutor’s contribution to the common ground; it would be used, for instance, if the speaker had just been told that clumsy Isabel Dunham, and not her gifted sister Mary, won gold in gymnastics; it could be followed by are you sure you don’t mean MARY Dunham? As these two examples illustrate, there is no direct connection between stress and high pitch, and syllables remain stressed even when they have low pitch (as their duration, amplitude, and spectral characteristics in Figures 3 and 4 indicate). This last point is further illustrated in Figures 5 and 6 with words that form a minimal pair based on the location of stress: in both Figure 5 , MARY’s the óbject of inquiry , and Figure 6 , MARY will objéct to the inquiry , F0 is flat on the word object ; however, differences in the duration and quality of the vowels are evident and sufficient to indicate the difference in stress between the noun in Figure 5 and the verb in Figure 6 (cf. Huss, 1978 ).

Figure 3. Spectrogram and F0 contour of Isabel Dunham , uttered as a statement (e.g., as an answer to the question who was on the phone? ).

Figure 4. Spectrogram and F0 contour of Isabel Dunham , uttered as a question (e.g., as a response to a piece of news about Isabel Dunham that the speaker considers to more likely apply to her sister, Mary).

In conclusion, there is no direct connection between stress and objective acoustic measures of prominence, in that different languages may indicate stress in a number of ways (though F0 is unlikely to be a direct correlate of stress). These cross-linguistic findings have implications for phonetic research: they suggest that it is not possible to determine whether a language has stress by simply measuring the duration, intensity, or spectral characteristics of segments. This is both because parameters that encode stress differ across languages, and also because not all languages have stress, so acoustic prominence may be the outcome of phrasal processes instead. For instance, syllables initial to the accentual phrase in Korean are articulatorily strengthened and show resistance to coarticulation (Cho & Keating, 2009 ). Neither of these phenomena is related to stress, however, as Korean does not have stress (Jun, 2005a ). Avoiding the temptation to interpret such acoustic effects as exponents of stress is important. Further, in order to understand the contribution of F0 and disentangle intonation from stress, it is essential to consider data from utterances with different tunes, rather than base conclusions on declarative utterances only (in which the connection between stress and high F0 is most likely to be manifested). The exploration of whether a system has stress should start with phonological observations, taking into consideration morphophonological alternations that may lead to alternations in vowel quality, processes like blending and hypocoristic formation, and the potential role of stress in acquisition and processing (for criteria, see Gordon, 2011 ). The answer may be that, like Ambonese Malay, a language does not have stress or other phenomena with primarily culminative function (Maskikit-Essed & Gussenhoven, 2016 ).

Figure 5. Spectrogram and F0 contour of MARY is the object of inquiry (with focus on Mary ).

Figure 6. Spectrogram and F0 contour of MARY will object to the inquiry (with focus on Mary ).

3. Rhythm

There is no good and generally accepted definition of speech rhythm. A definition based on the psychology of rhythm is adopted here, namely that rhythm is an abstraction that relies on perceiving constituents in speech as groups of a similar and repetitive pattern. This definition, however, is not generally accepted. In much of the phonetic literature rhythm has been confounded with timing, that is, with duration patterns, and specifically with the idea that languages fall into distinct categories based on keeping some constituent constant in duration. This idea can be traced back to impressionistic work on English from the early 20th century, which eventually gave rise to the notion of isochrony and rhythm classes: languages are said to be stress-, syllable-, or mora-timed depending on whether the unit that is supposed to show stable (i.e., isochronous) duration is the stress foot, the syllable, or the mora, respectively. Experimental research starting with Classe ( 1939 ) and continuing to at least the 1980s has failed time and again to find evidence of isochrony in production, leading some authors to advocate that rhythm classes are a perceptual illusion of speakers of various Germanic languages (Roach, 1982 ; for reviews see Arvaniti, 2009 , 2012a , in press-a ).

Along similar lines to Roach ( 1982 ), Dauer ( 1983 ) argued that syllable-timing (and by extension mora-timing) is not a plausible basis for rhythm and proposed instead that languages form a rhythm continuum from least to most stress-based. According to Dauer ( 1983 ), a language’s placement on the continuum is determined by the prominence of its stress exponents. Although Dauer’s equating of acoustic prominence with stress is problematic, as discussed in section 2 , her conceptualization of rhythm is closer to the understanding of rhythm in psychology and musicology.

3.1. Rhythm Classes and Rhythm Metrics

Dauer’s main point of a stress-based rhythm continuum was largely ignored in subsequent research, but a small subset of her criteria for determining stress salience formed the basis of rhythm quantification in Ramus, Nespor, and Mehler ( 1999 ). The aim of Ramus et al. ( 1999 ) was to quantify timing differences related to rhythm class, a concept they considered uncontroversial in linguistics. They argued that so-called stress-timed languages like English have more varied syllable structures and greater vowel reduction than syllable- and mora-timed languages, and that this is reflected in differences in the duration of vocalic and consonantal stretches of speech. In addition, Ramus et al. argued that these differences can be used during acquisition to resolve the bootstrapping problem by allowing infants to pay selective attention to one of the timing units (for arguments against this view see later in this section). Ramus et al. ( 1999 ) tested a number of measures and concluded that %V, the percentage of vocalic intervals, and ΔC, the standard deviation of consonantal intervals, best capture the differences they argued exist between rhythm classes.

Since Ramus et al. ( 1999 ), a number of additional metrics have been proposed, for example the pairwise variability indices or PVIs (Grabe & Low, 2002 ), and Varcos, standard deviations divided by the mean (Dellwo, 2006 ). Variations on these metrics and several additional measures have also been proposed. Frota and Vigário ( 2001 ) proposed the use of standard deviations of normalized percentages. Wagner and Dellwo ( 2004 ) proposed a measure similar to the PVIs but based on z-transformed syllable durations. Nolan and Asu ( 2009 ) used PVIs on syllable and foot durations. Many of these metrics have been widely applied in fields ranging from forensic work and L2 phonetics to the study of acquisition and atypical speech (e.g., Hannon, Lévêque, Nave, & Trehub, 2016 , on acquisition; White & Mattys, 2007 , on L2; Liss et al., 2009 , on atypical speech; Harris, Gries, & Miglio, 2014 , on forensic phonetics).
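For concreteness, the most common interval-based metrics can be computed from a sequence of vocalic and consonantal interval durations along the following lines. This is a sketch: interval segmentation conventions and the choice of population versus sample standard deviation differ across studies, and the input format is an assumption for illustration.

```python
import statistics

def rhythm_metrics(intervals):
    """Compute %V and ΔC (Ramus et al., 1999), VarcoC (Dellwo, 2006), and
    nPVI over vocalic intervals (Grabe & Low, 2002) from a list of
    ('V'|'C', duration_in_seconds) pairs."""
    v = [d for kind, d in intervals if kind == 'V']
    c = [d for kind, d in intervals if kind == 'C']
    pct_v = 100 * sum(v) / (sum(v) + sum(c))        # %V: vocalic proportion
    delta_c = statistics.pstdev(c)                  # ΔC: SD of C intervals
    varco_c = 100 * delta_c / statistics.mean(c)    # VarcoC: rate-normalized ΔC
    npvi_v = 100 * statistics.mean(                 # nPVI-V: pairwise variability
        abs(d1 - d2) / ((d1 + d2) / 2) for d1, d2 in zip(v, v[1:]))
    return pct_v, delta_c, varco_c, npvi_v

# Toy utterance: C V C V with durations in seconds.
pct_v, delta_c, varco_c, npvi_v = rhythm_metrics(
    [('C', 0.08), ('V', 0.10), ('C', 0.12), ('V', 0.10)])
```

Note that the nPVI is local (it compares successive intervals) while ΔC and VarcoC are global; this difference is one reason, discussed below, why metrics said to capture the same phenomenon need not correlate.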

Despite their popularity, metrics are fraught with problems, both theoretical and methodological. First, metrics are implausible as measures of rhythm during acquisition: they require that infants retain in short-term memory chunks of speech in order to compute global statistical trends (which cannot be computed on the fly) with the aim of recognizing the rhythm class of the language they are learning. Second, while infants learning languages categorized as stress-timed can focus on stressed syllables, it is unclear what infants learning syllable- or mora-timed languages could focus on as, by definition, all syllables (or moras) should be equally plausible word onsets. Empirical evidence does not support the inevitable conclusion stemming from this view, namely that infants learning syllable- or mora-timed languages face greater challenges (among many, Tzakosta, 2004 , on Greek; Pons & Bosch, 2010 , on French and Spanish). Third, metrics are circular: as there is no independent evidence for rhythm classes, metrics are used both to determine class affiliation and to support the notion that rhythm classes exist (for a detailed critique, see Arvaniti, 2009 ).

In addition, metrics are problematic as measures, even if circularity and implausibility are ignored. First, they are volatile and affected by a large number of factors. Renwick ( 2013 ) and Horton and Arvaniti ( 2013 ) have independently shown that %V, the measure often said to be the most stable and accurate predictor of rhythm class, strongly correlates with the number of closed syllables in the speech sample, independently of the language tested. Additionally, metric scores show significant interspeaker variation, and are also affected by the overall segmental composition of the speech sample used (Wiget et al., 2010 ; Arvaniti, 2012a ), and the method of eliciting it (Arvaniti, 2012a ). The effect size of these factors is larger than that of language, indicating larger variability within than across languages (Arvaniti, 2012a ). Second, the exact effects of such factors on metrics are unpredictable; this is reflected in the fact that metrics said to capture the same phenomenon (e.g., consonantal variability), do not correlate with one another (Loukina, Kochanski, Rosner, Keane, & Shih, 2011 ; Arvaniti, 2012a ; Horton & Arvaniti, 2013 ). This is because metrics are strongly influenced by local effects, such as phrase final lengthening or the irregularities present in atypical speech (e.g., Arvaniti, 2009 ; Lowit, 2014 ). This effectively means that two speech samples can yield similar scores in some metric but for entirely different reasons (Arvaniti, 2009 ). Consequently, metric scores are uninterpretable on their own; they can be interpreted only with close scrutiny of timing relationships between segments in a given speech sample, but such scrutiny is not aided by metric scores (Arvaniti, 2009 ). 
Finally, metrics can be problematic from a statistical perspective because they are often used in bundles, with researchers selecting results that turn out to be statistically significant according to some factor relevant to the study, a practice that increases type I error (among others, Li & Post, 2014 ; Kaminskaïa, Tennant, & Russell, 2016 ). For all these reasons, metrics provide no strong evidence in favor of rhythm classes and are not reliable or informative measures of timing in general, as has recently been argued, for example, by Post & Payne ( 2018 ).

3.2. Rhythm Classes and Perception

Perception experiments have not provided greater support for rhythm classes than production research. Many experiments are based on discrimination among either infants or adults (among many, Nazzi, Bertoncini, & Mehler, 1998 ; Nazzi, Jusczyk, & Johnson, 2000 ; Nazzi & Ramus, 2003 ). Another set of studies is based on processing (in the form of spotting or monitoring), which, it is argued, relies on syllables, morae, or feet depending on the rhythmic class of the listeners’ native language (Cutler, Mehler, Norris, & Seguí, 1986 , 1992 ; Otake, Hatano, Cutler, & Mehler, 1993 ; Cutler & Otake, 1994 ; Murty, Otake, & Cutler, 2007 ).

The argument behind the discrimination experiments is that if two languages can be discriminated from each other then they must belong to different rhythm classes. However, the premise behind these experiments is questionable, as several studies have shown that varieties of the same language can be discriminated from each other both by infants and adults (Nazzi, Jusczyk, & Johnson, 2000 , and White, Mattys, & Wiget, 2012 , on infants and adults, respectively). In addition, some discrimination experiments have led to counterintuitive results, such as Moon-Hwan ( 2004 ), who concluded that Korean is mora-timed because it was discriminated from Italian and English but not Japanese; there is no evidence, however, that the mora plays any role in Korean timing, phonology, or processing. Finally, some languages can be discriminated from both stress- and syllable-timed prototypes, a result that also sits uneasily with the premise that languages belong to distinct rhythm classes (see e.g., Ramus, Dupoux, & Mehler, 2003 , who found that Polish can be discriminated from both English and Spanish). Such results indicate that putative rhythm class is not a good explanation for discrimination.

Another problem with the discrimination experiments is that in order to force listeners to focus on timing (seen as the only exponent of rhythm in this work), studies have usually relied on flat sasasa . This is a type of modified speech in which F0 is “flat” (slightly falling throughout an utterance), while all vocalic intervals are replaced by [a] and all consonantal intervals by [s]; for example, it’s raining again would be rendered as asasasasas with the intervals corresponding to [ɪ] [tsɹ] [eɪ] [n] [ɪ] [ŋ] [ə] [g] [ɛ] [n]. There is evidence, however, that flat sasasa is not ecologically valid; for example, in Arvaniti ( 2012b ) utterances rendered into flat sasasa yielded different responses from low-pass filtered versions of the same utterances. Since both modifications retain timing characteristics but low-pass filtering is closer to actual speech, the differences in responses indicate that the percept resulting from sasasa is not close to what listeners obtain from speech.
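Setting aside the F0 flattening, the segmental side of the sasasa transformation amounts to the following mapping over a segment string: runs of vowels collapse into a single vocalic interval rendered as [a], and runs of consonants into a single consonantal interval rendered as [s]. The vowel set below is an illustrative stand-in for a proper phonological classification.

```python
def to_sasasa(segments, vowels=frozenset('aeiouɪɛəæʊɔ')):
    """Map a sequence of IPA segments to a sasasa string: adjacent segments
    of the same type (vocalic or consonantal) merge into one interval."""
    out = []
    for seg in segments:
        symbol = 'a' if seg[0] in vowels else 's'
        if not out or out[-1] != symbol:   # merge within-interval segments
            out.append(symbol)
    return ''.join(out)

# "it's raining again", segmented as in the example above:
print(to_sasasa(['ɪ', 't', 's', 'ɹ', 'eɪ', 'n', 'ɪ', 'ŋ', 'ə', 'g', 'ɛ', 'n']))
# prints: asasasasas
```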

To explore the issues with sasasa and the discrimination paradigm, Arvaniti and Rodriquez ( 2013 ) ran a series of AAX experiments with English as the standard (AA) and Danish, Spanish, Greek, Korean, and Polish as comparisons (X). Arvaniti and Rodriquez ( 2013 ) used two sasasa versions, flat sasasa , and sasasa that retained the original F0 of the utterances, and additionally manipulated the speaking rate of the stimuli so as to retain or eliminate differences in speaking rate between standards and comparison. The results showed that both speaking rate and F0 play a substantial role in driving discrimination, with effects depending on the language pair but not on putative rhythm class. When F0 and speaking rate differences were eliminated, discrimination was much weaker independently of the putative rhythm class of the languages involved. Overall, the results indicate that discrimination experiments are difficult for participants, who end up latching onto any differences they can find in the signal in order to complete the task. Critically, the results of Arvaniti and Rodriquez ( 2013 ) confirm that sasasa is not ecologically valid, in that it does not reflect the perception of speech rhythm in natural stimuli: if that were the case, changes in F0 or speaking rate would have had no effect on responses. The fact that they do indicates that listeners do not process the timing of segments independently of the other prosodic parameters present in the speech signal but, rather, they integrate prosodic information. This conclusion is supported by experiments on distal prosody (Dilley & McAuley, 2008 ; Dilley, Mattys, & Vinke, 2010 inter alia ). In short, experiments may lead to discrimination between languages for any number of reasons, while lack of discrimination does not necessarily mean that the languages involved are rhythmically related.

Similar arguments apply to studies in processing. Such studies rely mostly on variations of the spotting paradigm, whereby listeners are asked to spot (or monitor for) fragments (such as a syllable) in a continuous speech stream. Many of these experiments do show that a particular phonological constituent is salient in each language and useful to native listeners during processing (e.g., Cutler et al., 1986, on the role of the syllable in English and French; Otake et al., 1993, on the mora in Japanese; Murty, Otake, & Cutler, 2007, on the mora in Tamil). More generally, studies have also confirmed the significance of stressed syllables for the acquisition and processing of so-called stress-timed languages like English and German (e.g., Schmidt-Kassow & Kotz, 2008; Rothermich, Schmidt-Kassow, Schwartze, & Kotz, 2010; Skoruppa et al., 2011). However, stress has also been shown to be crucial for processing in so-called syllable-timed languages, including Spanish and Greek (e.g., Soto-Faraco, Sebastián Gallés, & Cutler, 2001; Magne et al., 2007; Skoruppa et al., 2009; Arvaniti & Rathcke, 2015; Protopapas et al., 2016). In other words, the expected compartmentalization of stress versus syllable is not supported by experimental evidence. This should not be surprising. As Mattys and Melhorn (2005) point out, this body of literature often refers to “stress” when discussing speech processing of English; however, recognizing stress during processing requires that listeners can recognize syllables as well. In short, these findings neither prove membership in a rhythm class nor preclude the usefulness of other prosodic units during speech planning and processing.

The traditional idea of rhythm classes is not problematic only because it is unsupported by studies in production or perception. It is important to recognize that the conceptualization of rhythm as timing is problematic on cognitive grounds as well (see Arvaniti, 2009, 2012a; see Arvaniti, in press-a, for detailed arguments). A major problem is the implausibility of syllable- and mora-timing as rhythm mechanisms. Both represent a cadence, the simplest form of rhythm “produced by the simple repetition of the same stimulus at a constant frequency” (Fraisse, 1982, p. 151). For a cadence to be perceived as such, however, stimuli must be sufficiently separated in time to be experienced as distinct, that is, for fusion to be avoided. According to Fraisse (1982), this temporal spacing is at least 200 ms. 1 However, the typical speaking rate of many languages classified as syllable- or mora-timed is much faster than that, with reported rates ranging from 128 to 143 ms per syllable (Dauer, 1983, on Spanish, Greek and Italian; Pellegrino, Coupé, & Marsico, 2011, on Italian, French, Spanish, and Japanese). At these rates, it would be extremely difficult if not impossible for each syllable or mora to be reliably perceived as a distinct beat (see, e.g., London, 2012, chap. 2).

In addition, listeners exhibit subjective rhythmization, that is, they tend to impose a rhythmic pattern on cadences, typically grouping stimuli into trochees or iambs (Bolton, 1894; Woodrow, 1951; Fraisse, 1963, 1982). This perceptual tendency is difficult to reconcile with the idea that all syllables or moras are equally prominent: even if they were all acoustically equal (and all evidence suggests they are not), they would not be perceived as such. This also raises the question: how would a child acquire a syllable- or mora-timed language if their perceptual system predisposes them not to perceive the language as such?

Further, research on rhythm perception shows that listeners can impose or maintain a rhythm without constant overt cues, particularly once a pattern is established (London, 2012, chap. 1). In part, this is so because of dynamic attending, the fact that listeners pay selective attention to auditory events, focusing on those that occur periodically (e.g., Jones, 1981). Dynamic attending rests on the idea that humans cannot attend to all events (James, 1890, cited in London, 2012, chap. 1). Again, syllable- and mora-timing cannot be reconciled with this tendency, as they would require that speakers of so-called syllable- and mora-timed languages make no selection and be capable of attending to all events in a rapidly paced series (cf. the issue with acquisition previously discussed). This idea is implausible, and unsupported by native speaker intuitions (e.g., Vaissière, 1991, on French), processing (e.g., Jeon & Arvaniti, 2017, on Korean), speech production (e.g., Chung & Arvaniti, 2013, on Korean; Arvaniti & Rathcke, 2015, on Greek), and acquisition (e.g., Tzakosta, 2004, on Greek; Pons & Bosch, 2010, on French and Spanish). In short, the idea of syllable- and mora-timing is psychologically implausible, while research on so-called syllable- and mora-timed languages shows that speakers of such languages focus either on phrasal boundaries (French, Korean) or stresses (Greek, Spanish), and rely on rhythm groups larger than the syllable or the mora.

3.3. Alternative Views on Rhythm

If rhythm is not based exclusively on the regular timing of some unit, then how is it created? As mentioned, a possibility is to see rhythm as a perceptual phenomenon, specifically the perception of speech as a series of groups of a similar and repetitive pattern (Arvaniti, 2009 ). This definition is not new; it is based on the psychological understanding of rhythm (e.g., Woodrow, 1951 , Fraisse, 1963 , 1982 ; London, 2012 ). It is also closer to the conception of rhythm used in phonology (e.g., Hayes, 1995 ), in which rhythm is seen as relying on the relative salience of constituents at several levels of the prosodic hierarchy (see section 4 ). If such a definition is adopted, then research on rhythm should focus on what phenomena could lead to listeners perceiving speech as consisting of groups of similar and repetitive pattern. Following Dauer ( 1983 ), a plausible organizational principle would be stress and the creation of stress feet, in languages that have stress. This, however, leaves open the question of how rhythm is created in languages that do not. Some suggestions are offered here.

First, the regularity that leads to the perception of rhythm may be related to alternations in duration, but this is neither necessary nor sufficient (since, as mentioned, listeners do not process timing as a dimension of the speech signal that is distinct from other prosodic parameters). In short, duration is not the only exponent of rhythm and should not be seen as such. Thus, although segmental timing is an essential component of a language’s phonetics, it deserves to be studied independently of the connection to rhythm (see, e.g., Turk & Shattuck-Hufnagel, 2000 , 2014 , and references therein).

A phonetic parameter beyond duration that may contribute to rhythm is amplitude. Tilsen and Arvaniti (2013) used empirical mode decomposition (EMD; Huang et al., 1998) to extract regularities from the amplitude envelope of filtered speech waveforms. This envelope displays quasi-periodic fluctuations in energy that tend to arise from (but do not completely coincide with) the alternation of vowels and consonants. Thus, for Tilsen and Arvaniti (2013) “rhythm is conceptualized as periodicity in the envelope, and greater stability of that periodicity corresponds to greater rhythmicity” (Tilsen & Arvaniti, 2013, p. 629). Simplifying considerably, EMD extracts a number of basis functions from the signal, termed intrinsic mode functions (IMFs). Each IMF captures oscillations on a different time-scale and can be analyzed using a Hilbert transform to obtain an instantaneous phase; the instantaneous frequency (ω) of an IMF is the time derivative of phase. Tilsen and Arvaniti (2013) argue that in speech the instantaneous frequencies of the first two IMFs correspond to periodicities at the syllable level (ω1) and foot level (ω2), respectively. Their results show that the average ω2 in their corpus is 2.5 Hz, a frequency that corresponds (assuming the interpretation of Tilsen and Arvaniti, 2013, is correct) to recurrent beats every 400 ms. This is in line with the average foot duration reported in Dauer (1983). Further, the variance of ω2 is comparable across the languages they examined: English and German, which are classed as stress-timed; Italian and Spanish, which are classed as syllable-timed; and Greek and Korean, which remain unclassified. The fact that ω2 variance is similar across these languages suggests similarities in rhythmicity in languages that are traditionally considered to belong to distinct rhythm classes.
In particular, it would suggest the presence of a louder element every approximately half a second and comparable levels of fluctuation from this standard in all the languages examined. The fact that this pattern applies even in Korean, a language without stress, indicates that stress is not required for grouping purposes.
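The Hilbert-transform step described above can be given a minimal numeric sketch. This is an illustrative reconstruction, not Tilsen and Arvaniti's actual pipeline (which first applies EMD to the amplitude envelope of real speech); a synthetic 2.5 Hz cosine stands in for a foot-level IMF, and all names are invented for illustration.

```python
# Illustrative sketch: instantaneous frequency of an oscillation via
# the Hilbert transform, the step applied to each intrinsic mode
# function (IMF). A synthetic 2.5 Hz cosine stands in for a
# foot-level IMF (all names here are illustrative).
import numpy as np
from scipy.signal import hilbert

fs = 1000                          # sample rate (Hz)
t = np.arange(0, 2.0, 1 / fs)      # 2 s of signal
imf = np.cos(2 * np.pi * 2.5 * t)  # stand-in for the foot-level IMF

analytic = hilbert(imf)                     # analytic signal
phase = np.unwrap(np.angle(analytic))       # instantaneous phase (radians)
omega = np.diff(phase) * fs / (2 * np.pi)   # time derivative of phase -> Hz

# Away from the signal edges the instantaneous frequency hovers around
# 2.5 Hz, i.e., a recurrent beat roughly every 400 ms (1 / 2.5 s).
print(np.mean(omega[100:-100]))
```

The key point the sketch makes concrete is that a stable ω2 near 2.5 Hz corresponds to a regularly recurring envelope peak about every 400 ms, independently of any segmental isochrony.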

In addition, research on Korean indicates that F0 may also play a part in creating rhythmic groupings. Jeon and Arvaniti (2017) found that the regular F0 pattern spanning the accentual phrase in Korean (a prosodic constituent typically 3–4 syllables long) is more important during processing than having accentual phrases of equal length (in number of syllables). This result agrees with previous literature on the processing of Korean (see Jeon & Arvaniti, 2017, and references therein). A way to interpret this result is to recognize that in Korean, and possibly languages typologically similar to it such as French, rhythm may rely on the presence of a repetitive F0 pattern over short phrases, rather than on segmental timing. It is possible that changes associated with this F0 pattern give rise to the amplitude alternations reported by Tilsen and Arvaniti (2013) for Korean.

Although much more research is needed on these alternatives to the traditional view of rhythm as timing, it is important to reiterate that no acoustic parameter can be solely responsible for rhythm, due to perceptual integration, as previously mentioned. The importance of perceptual integration has been further demonstrated by a number of perceptual studies: Dilley and McAuley (2008), Kohler (2009), and Dilley et al. (2010), among others, have shown that the perception of grouping and relative prominence is influenced by changes in F0 patterns. In conclusion, moving away from the traditional rhythm class typology and considering how components of prosody may contribute to the creation of rhythm in languages with typologically distinct prosodic systems may yield the insights that have not been forthcoming from the study of speech rhythm as timing and from adherence to the rhythm class typology.

4. Phrasing

Phrasing refers to the fact that in speech words are chunked together rather than being produced as distinct and independent elements in a string. Phrasing is critical for organizing and planning speech production, and influences perception as well (among many, Krivokapić & Byrd, 2012; Katsika, Shattuck-Hufnagel, Mooshammer, Tiede, & Goldstein, 2014; see Turk & Shattuck-Hufnagel, 2014, for a review). Phrasing is also necessary to understand intonation (see section 5). Phrasing has been investigated from both a phonological and a phonetic perspective, though the two do not always agree. In order to understand the phonetic results, it is necessary to understand the basic tenets of phonological accounts of phrasing.

According to phonological accounts of phrasing, words are grouped into a hierarchical prosodic structure that does not allow recursion (but see Ladd, 1988, on phonetic evidence for limited prosodic recursion). A model implicitly adopted in much work on prosody, particularly intonation, is that proposed by Pierrehumbert and Beckman (1988; see also D’Imperio, Elordieta, Frota, Prieto, & Vigário, 2005, for a review). This model does not assume a direct mapping from syntax (as do the models of Selkirk, 1984, and Nespor & Vogel, 1986). Rather, phrasing is empirically determined, as it is affected by speaking rate, speech clarity, and the length of constituents. For instance, my girlfriend’s mother’s sister is a heavy smoker is more likely to be produced with a phrasal break after sister than is she’s a smoker. Similarly, clear speech is likely to result in shorter phrases than otherwise (Smiljanic & Bradlow, 2008). Further, Pierrehumbert and Beckman (1988) posit different levels depending on the language. For instance, they argue that the English prosodic hierarchy has three main levels, the prosodic word (ω), intermediate phrase (ip), and intonational phrase (IP), an analysis based on Beckman and Pierrehumbert (1986). In contrast, their analysis of Japanese prosody requires an additional level, that of the accentual phrase (AP). The AP features prominently in the prosodic analysis of French and Korean (Fougeron & Jun, 2002, and Jun, 2005a, respectively). An illustration of prosodic structure after Pierrehumbert and Beckman (1988) is given in (1) with a phrase from Polish (based on data from Arvaniti, Żygis, & Jaskuła, 2017).
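The strict layering of such a hierarchy can be sketched informally as nested containers. The sketch below is not from the source; it simply illustrates a non-recursive, three-level structure (prosodic word, ip, IP), using the break after sister discussed above.

```python
# Informal sketch (not from the source) of a strictly layered prosodic
# hierarchy: prosodic words grouped into intermediate phrases (ip),
# grouped in turn into an intonational phrase (IP), with no recursion.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:                 # prosodic word (omega)
    text: str

@dataclass
class IntermediatePhrase:   # ip: contains only prosodic words
    words: List[Word]

@dataclass
class IntonationalPhrase:   # IP: contains only ips
    phrases: List[IntermediatePhrase]

ip1 = IntermediatePhrase([Word(w) for w in
                          "my girlfriend's mother's sister".split()])
ip2 = IntermediatePhrase([Word(w) for w in
                          "is a heavy smoker".split()])
utterance = IntonationalPhrase([ip1, ip2])

# Two ips of four prosodic words each; an IP never contains another IP.
print(len(utterance.phrases),
      [len(p.words) for p in utterance.phrases])  # 2 [4, 4]
```

The non-recursive design mirrors the phonological claim: each level contains only constituents of the next level down, so an IP cannot embed another IP.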

An issue of phonetic interest relates to empirical evidence for prosodic structure. In phonological models, prosodic structure is said to regulate many connected speech phenomena. Following Selkirk ( 1980 ) these phenomena can be classed into the following categories:

Domain limit rules : rules apply at the edge of some prosodic domain, for example, Nespor and Vogel ( 1986 ) analyze voiceless stop aspiration in English as a domain limit rule that applies at the left edge of the foot.

Domain span rules : these apply within a specific prosodic domain; for example, the rule of s-voicing in Italian is said to apply to intervocalic /s/ within the prosodic word domain; similarly, flapping in American English can be analyzed as a domain span rule applying within the foot (Nespor & Vogel, 1986 ).

Domain juncture rules: these rules apply at the juncture between two constituents of a specific type, provided the boundary occurs within some higher constituent; for example, Dutch has an optional s-voicing rule that applies if /s/ occurs ω-finally and the next ω begins with a vowel, provided both ωs are part of the same intonational phrase (Gussenhoven & Jacobs, 2017, chap. 12).

Phonetic research on rules like those discussed immediately prior, however, has shown that very often they are not categorical, as phonological models predict, but gradient, coarticulatory phenomena. Vowel deletion due to hiatus across a word boundary in Greek is a case in point. Arvaniti (1991) and Baltazani (2006) have shown that vowel deletion does not apply within the clitic group (as argued by Nespor & Vogel, 1986) or the small phrase (as argued by Condoravdi, 1990), and is not based on vowel sonority (as argued by Malikouti-Drachman & Drachman, 1992). Rather, the reason for the divergence among these studies (and the fact that they do not agree on which vowel is deleted and in what contexts) has to do with the fact that in Greek most instances of vowel hiatus across a word boundary lead to vowel coalescence, not deletion (Baltazani, 2006; for a detailed review, see Arvaniti, 2007).

Seminal work on the gradient nature of connected speech phenomena phonologically described as categorical was presented by Nolan ( 1992 ), who reported EPG data on English coronal assimilation and degemination (Chomsky & Halle, 1968 ). Phonologically, this rule can be analyzed as a domain juncture rule whereby a coronal stop at the right edge of a prosodic word assimilates to the stop at the onset of the following word provided they are both in the same intermediate phrase. Nolan ( 1992 ) compared sequences such as make calls and late calls (both embedded in longer utterances). In make calls , degemination should lead to the sequence being pronounced [meɪkɔːlz]; in late calls , complete assimilation followed by degemination should lead to an identical sequence after the initial [l], that is, [leɪkɔːlz]. Nolan ( 1992 ) found, however, that sequences like late calls rarely show complete deletion of the coronal gesture; this gesture may be undershot (in that it does not result in a complete alveolar closure) and may overlap substantially in time with the velar closure, but it is rarely entirely absent. In other words, this is a pattern of gradient assimilation, resulting in traces of [t] being present in the signal. These traces affect transitions from the preceding vowel and research shows they are recoverable, that is, available to listeners during processing (Gow Jr., 2002 ). Similar results are reported by Zsiga ( 1995 ) on palatalization in American English (e.g., the palatalization of /s/ in miss you ). Zsiga ( 1997 ) also considered vowel harmony and assimilation in Igbo and concluded that while some connected speech processes are categorical others are gradient; she further argued that only the former should be formalized in phonology. In short, whether a particular pattern is absolute or gradient is a matter of empirical investigation.

Although phonetic studies have shown that some connected speech phenomena analyzed as phonological rules are in fact gradient, there are many other ways in which speakers demarcate prosodic structure. Initial boundaries, particularly those higher in the prosodic hierarchy, show articulatory strengthening (among many, Fougeron & Keating, 1997, and Byrd, 2000, on American English; Cho & Keating, 2001, and Cho, Son, & Kim, 2016, on Korean; Recasens & Espinosa, 2005, on Catalan; Fougeron, 2001, and Georgeton, Antolík, & Fougeron, 2016, on French). Such strengthening can be manifested as a longer or more robust constriction of the initial consonant, but can also take other forms; for example, Dilley, Shattuck-Hufnagel, and Ostendorf (1996) showed that word-initial vowels in English are produced with glottalization, particularly if they are also initial to the intonational phrase. In addition, prosodic boundaries are associated with durational changes, particularly at the right edge, which is often associated with lengthening (Cambier-Langeveld & Turk, 1999, on English and Dutch; Byrd & Saltzman, 2003, on English; Nakai, Kunnari, Turk, Suomi, & Ylitalo, 2009, on Finnish; Katsika, 2016, and Loutrari, Tselekidou, & Proios, 2018, on Greek; see Turk & Shattuck-Hufnagel, 2014, for a review). Further, prosodic boundaries may be tonally specified, particularly in languages that do not have stress. This is found with respect to the accentual phrase in French (Fougeron & Jun, 2002), Korean (Jun, 2005a), Japanese (Pierrehumbert & Beckman, 1988), and Ambonese Malay (Maskikit-Essed & Gussenhoven, 2016), to mention but a few. The presence of tonal marking does not imply that segmental effects are lacking: Korean, for instance, is well known for its segmental changes at phrasal boundaries (Jun, 2005a).
Finally, listeners rely on these cues about prosodic phrasing during speech processing and utterance disambiguation (Hirschberg & Avesani, 2000 , Krivokapić & Byrd, 2012 , Jeon & Arvaniti, 2017 , Loutrari et al., 2018 , inter alia ).

5. Intonation

Intonation refers to the language-specific and systematic modulations of fundamental frequency (F0) that span entire utterances and have grammatical function(s), such as encoding pragmatic information and marking phrasal boundaries. As noted briefly in section 1, the terms F0, pitch, and intonation are often used interchangeably in the literature, a practice that has led to the confusion of linguistics and paralanguage, on the one hand, and of phonological phenomena with their phonetic exponents, on the other. To avoid this confusion, I discuss each term in some detail in section 5.1.

5.1. Intonation, F0, and Pitch

F0, measured in Hz, is a property of the speech signal directly related to the rate of vibration of the vocal folds. F0 changes throughout an utterance in ways that relate to a number of factors. These include biological factors, such as a speaker’s age and gender: children have overall higher pitched voices than adults, and women have higher pitched voices than men (e.g., Daly & Warren, 2002 ; Warren, 2005 ; Clopper & Smiljanic, 2011 ; Graham, 2014 ; see Titze, 1994 , for an overview). These biological differences relate to the size of the larynx and the thickness and length of the vocal folds, but they are also exploited for indexical sociolinguistic purposes, so that people of similar build and biological sex may use different pitch range and have different average pitch (e.g., van Bezooijen, 1995 , on Japanese and Dutch; Yuasa, 2008 , on Japanese and American English). F0 is also used to index paralinguistic information, such as boredom, anger, or excitement (Ladd, 2008 , chap. 1; see also section 1 ).

In addition to socioindexical and paralinguistic functions, F0 serves two main linguistic purposes. First, at the lexical level, it is the prime exponent of lexical tonal contrasts, for languages that have them, such as Cantonese, Japanese, or Igbo; in these languages changes in F0 lead to changes in lexical meaning. 2 Second, at the postlexical (i.e., phrasal) level, F0 is used to mark prosodic boundaries and convey pragmatic meaning and information structure distinctions. It is these specific uses that will be referred to here as intonation, as they are part of a language’s prosodic system. Intonation is specified at the phrasal level by means of a complex interplay between metrical structure (informally, the representation of patterns of prominence), prosodic phrasing, syntax, and pragmatics; these factors determine where F0 movements will occur and of what type they will be. As discussed in section 2, for instance, some changes in pitch synchronize with stressed syllables. It is important to note that intonation is used in all languages, whether they have lexical tone or not. Disentangling the contribution of lexical tone from that of intonation on F0 contours is not a trivial task, and is a topic on which more research is needed (for the analysis of systems combining lexical tone and intonation, see, among others, Pierrehumbert & Beckman, 1988, and Venditti, 2005, on Japanese; Bruce, 1977, 2005, on Swedish; Peng et al., 2005, on Mandarin; Wong, Chan, & Beckman, 2005, on Cantonese; and Downing & Rialland, 2017, on a number of African tone languages).

F0 gives rise to the percept of pitch. There are several scales for measuring pitch but no strong consensus on which is best for the investigation of tone and intonation. Some studies use Hz (e.g., Rietveld & Gussenhoven, 1985; Arvaniti, Ladd, & Mennen, 1998), a practice that, although occasionally frowned upon, is not aberrant in that the relationship between F0 and pitch is almost linear up to approximately 1,000 Hz, a threshold significantly above that of F0 in human speech (Stevens & Volkmann, 1940). The two pitch scales used most frequently in intonation studies are ERB (Equivalent Rectangular Bandwidth) and semitones. ERB reflects a semi-logarithmic relation between pitch and F0 in the frequencies used for intonation and has been shown to accurately reflect the relation between F0 and perceived pitch (Glasberg & Moore, 1990; Hermes & van Gestel, 1991). Semitones are a logarithmic transformation of the Hertz scale originally related to Western music. Although semitones are now increasingly used in intonation research, the evidence in favor of semitone use over ERB is sparse (Henton, 1989), and largely refuted (see Daly & Warren, 2001, contra Henton, 1989; see also Stevens & Volkmann, 1940). To the author’s knowledge, the only research directly comparing various scales of pitch in intonation is Nolan (2003). Nolan asked 18 speakers to imitate the intonation of utterances produced by one male and one female talker and compared the imitated versions to the original intonation with both sets of pitch contours expressed in semitones, ERB, Hz, Mel, or Bark. He found that the differences between the two versions were smaller for semitones and ERB compared to Hz, Mel, and Bark, with differences in semitones being marginally smaller than those for ERB. This led Nolan (2003) to conclude that semitones best reflect how intonation is perceived. However, this conclusion rests on the assumption that speakers were accurate in their imitations.
This assumption cannot be ascertained based on Nolan ( 2003 ). Further, semitones have the dubious advantage of minimizing differences between male and female speakers; this can be convenient for statistical analysis but it may well hide systematic differences related to sex and gender (as in Henton, 1989 ), which could come to light with other scales and separate by-gender analyses of data (cf. Daly & Warren, 2001 ). Since sex- and gender-related differences are valid and perceptible, eliminating them from analysis does not seem advisable. The same applies to other types of scaling differences as well; for example, Fujisaki and Hirose ( 1984 ) argue that by using a logarithmic scale of pitch they eliminated differences in the scaling of components of their model in different positions in an utterance. However, doing so may, once again, mask differences that are important for phonetic modeling and relevant for perception (e.g., Yuen, 2007 ).
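Both scales can be computed directly from F0 in Hz. The sketch below uses the standard formulas (12 semitones per doubling of frequency, and the ERB-rate equation of Glasberg and Moore, 1990); the 100 Hz reference for semitones is an arbitrary illustrative choice, and the function names are not from any existing toolkit.

```python
# Sketch of the two pitch scales discussed: semitones relative to a
# reference frequency (12 per doubling) and the ERB-rate scale of
# Glasberg & Moore (1990). The 100 Hz reference is an arbitrary
# illustrative choice.
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Semitones above (below, if negative) a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)

def hz_to_erb_rate(f_hz):
    """ERB-rate (in Cams), after Glasberg & Moore (1990)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

print(hz_to_semitones(200.0))  # 12.0: 200 Hz is an octave above 100 Hz
print(hz_to_erb_rate(100.0))   # roughly 3.4 Cams
```

Note that the semitone value depends entirely on the reference chosen, which is one reason by-speaker or by-gender normalization decisions matter for the analyses discussed above.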

Independently of the scale used, an issue faced by all researchers relates to what measurements are best for intonation research. F0 presents as a curve (with discontinuities due to voicelessness) and one of the biggest challenges is determining what elements of this curve need to be measured and accounted for. There is no consensus on this issue. Many researchers focus on straightforward measures such as measuring average F0 over specific stretches of speech that range from a syllable to entire utterances and beyond (see, e.g., many studies on stress, such as Ortega-Llebaria & Prieto, 2011 ; Gordon & Applebaum, 2010 ; Garellek & White, 2015 ). From a linguistic perspective, such measures are not particularly meaningful. In addition, they are unlikely to be representative of perception: listeners (of non-tonal languages at least) tend to perceive pitch movements as level pitch (e.g., Dilley & Brown, 2007 ; Haung & Johnson, 2010 ), and to equate this level pitch to a point between the mean and end frequency of a pitch movement (Nábělek, Nábělek, & Hirsch, 1970 ; ’t Hart, Collier, & Cohen, 1990 ). This means that listeners are likely to perceive rising pitch as high, and falling pitch as low. Given that rising pitch movements often tend to show overshoot (or peak delay ), that is, to extend beyond the syllable with which they are expected to co-occur, estimates based on averages of within syllable excursions are likely to under-estimate perceived pitch. This problem with F0 averaging may be the reason why research on pitch in relation to gender and sexual orientation has not always yielded results that matched known stereotypes (e.g., Gaudio, 1994 ; Waksler, 2001 ; Munson, McDonald, DeBoe, & White, 2006 ).

Similar comments can be made about measuring pitch dynamism. The term pitch dynamism refers to the frequency and extent of pitch excursions in a given stretch of speech. No single method of measuring dynamism is available. Gaudio (1994, p. 46), following Eady (1982), measured “(1) the average extent of changes in F0, using the absolute value of every pitch change (i.e., |f2 − f1|, |f3 − f2|, etc.); (2) the total number of ‘fluctuations,’ defined as changes in the pitch track from a positive to a negative slope, or vice versa; (3) the number of ‘upward’ and ‘downward’ fluctuations, defined as changes in pitch at least as great as some predetermined minimum value; and (4) the average number of fluctuations per second.” Daly and Warren (2001) used instead the first differential of the pitch curves to develop a measure of dynamism expressed in ERB/s and semitones/s. Similar measures are available in ProsodyPro (Xu, 2013). Although such measures can be informative, a possible issue is that they give equal weight to pitch movements that are deliberate (e.g., part of an accent) and others that may be incidental (e.g., transitions between accents; see section 5.2). It is unclear whether listeners attend equally to both types.
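The first two of the measures Gaudio lists can be sketched as follows; the toy pitch track and function names are invented for illustration, not taken from the source or from any toolkit.

```python
# Hedged sketch of two dynamism measures of the kind Gaudio (1994)
# describes: (1) mean absolute F0 change between successive samples
# and (2) the number of fluctuations (slope sign changes). The pitch
# track below is a toy example.
import numpy as np

def mean_abs_change(f0):
    """Average extent of F0 change: mean of |f2 - f1|, |f3 - f2|, ..."""
    return np.mean(np.abs(np.diff(f0)))

def count_fluctuations(f0):
    """Changes of the track from positive to negative slope or back."""
    d = np.diff(f0)
    signs = np.sign(d[d != 0])  # ignore flat stretches
    return int(np.sum(signs[1:] != signs[:-1]))

f0 = np.array([120., 130., 125., 135., 140., 132.])  # toy pitch track (Hz)
print(mean_abs_change(f0))     # 7.6
print(count_fluctuations(f0))  # 3
```

As the text notes, such measures weight every excursion equally, whether it belongs to an accent or to a mere transition between accents; nothing in the computation distinguishes the two.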

Much of the work that resorts to general descriptive measures of F0, such as average F0, is not conceived with some specific model of intonation in mind. Rather, in this work, F0 is treated as the main object of inquiry (e.g., Cooper & Sorensen, 1981 ). Other research on intonation has been couched in terms of a number of different models. Some of these models—such as INTSINT (International Transcription System for Intonation; Hirst & Di Cristo, 1998 ) and the frameworks collectively known as the British school (e.g., Crystal, 1969 ; O’Connor & Arnold, 1973 )—aim to present idealizations of F0 curves. For example, INTSINT includes the categories T (Top), H (Higher), U (Upstepped), S (Same), M (mid), D (Downstepped), L (Lower), B (Bottom), which can be used to reconstruct pitch tracks in a way that abstracts away from phonetic detail (e.g., by replacing curves with straight lines). Such models cannot easily capture useful generalizations about intonation, or describe phonetic detail. Other models, like PENTA (Parallel Encoding and Target Approximation; Xu, 2005 ) and Fujisaki ( 1983 ), focus directly on modeling F0 detail instead. For example, in PENTA, modeling success is measured based on how close the approximations remain to the original F0 curves. Such models capture the phonetics of F0 but have difficulty with intonation generalizations and some types of phonetic detail (see Arvaniti & Ladd, 2009 , and Arvaniti, 2019 , for discussions and illustrations). As argued by Arvaniti ( 2019 ), all these models, whether they focus on phonetic detail or rely on idealizations, essentially model F0 rather than intonation per se .

5.2. The Autosegmental-Metrical Model

A model that provides a principled separation between F0 curves and intonation is the autosegmental-metrical model of intonational phonology (henceforth AM). 3 By providing this separation, AM can both account for phonetic detail and allow for phonological generalizations (Arvaniti & Ladd, 2009; Arvaniti, 2019). The essential tenets of the model are largely based on Pierrehumbert’s dissertation (1980), with additional refinements built on experimental research and formal analysis involving a large number of languages (see also Bruce, 1977, for an early understanding of tonal alignment and the decomposition of tunes into lexical and phrasal elements; see Ladd, 2008, for a theoretical account; see Gussenhoven, 2004, and Jun, 2005b, 2014, for language surveys; see Arvaniti, in press-b, for an overview of AM).

The term autosegmental-metrical was coined by Ladd (1996) and reflects the connection between two sub-systems of phonology required to adequately account for intonation structure: an autosegmental tier representing intonation’s melodic part, and metrical structure representing phrasing and relative prominence. In AM, tunes are phonologically represented as a string of Low (L) and High (H) tones and combinations thereof (Pierrehumbert, 1980; Beckman & Pierrehumbert, 1986; Ladd, 2008). Tones are autosegments, abstract symbolic primitives that are independent of vowels and consonants. Their identity as Hs and Ls is determined by phonetic observation and defined in relative terms: H is used to represent tones deemed to be high in a melody with respect to the speaker’s range and other tones in the same contour; L is used to represent tones deemed to be low by the same criteria (cf. Pierrehumbert, 1980, pp. 68–75). Tonal events (which may be composed of more than one tone) are considered morphemes with pragmatic meaning. All events in a melody contribute compositionally to the pragmatic interpretation of an utterance in tandem with propositional meaning and other pragmatic context (Pierrehumbert & Hirschberg, 1990).

The relationship between tonal autosegments and the segmental string (often referred to as tune-text association) is mediated by metrical structure. Specifically, in AM, tones associate either with constituent heads (informally, stressed syllables), or with phrasal boundaries. The former are referred to as pitch accents, for example, H*. The star notation reflects the fact that this tone is meant to be phonologically associated to a stressed syllable. Tones that associate with phrasal boundaries are collectively known as edge tones. All AM analyses recognize boundary tones as a type of edge tone, for example, H%. Many analyses also recognize a second type of edge tone, the phrase accent, for example, H-. Following Beckman and Pierrehumbert (1986), it is by and large understood that when both types of edge tones are posited, phrase accents associate with intermediate phrase boundaries, and boundary tones with intonational phrase boundaries. The representation and F0 contour in (2) provide an illustration of AM, using the same example as in (1).

The abstract phonological primitives of intonation are phonetically realized as tonal targets, that is, as points in the F0 contour. Tonal targets are usually turning points, such as peaks, troughs, and elbows in the contour; they are defined by their scaling and alignment. Scaling refers to the value of targets on an F0 or pitch scale; alignment refers to their position relative to segments, such as the onset of a stressed vowel or a phrase-final syllable. The representation and F0 contour in (3) illustrate the connection between phonological representation and phonetic realization, using the same example as in (1) and (2), and including the F0 track of the utterance with the tonal targets corresponding to the tune’s four tones marked as circles.

In AM, scaling is said to take place on the fly, with every tone’s scaling being calculated as a fraction of the scaling of the preceding tone (Liberman & Pierrehumbert, 1984). There are three main influences on tonal scaling: declination, tonal context, and tonal identity. Following Pierrehumbert (1980), it is generally understood that the scaling of tones can be modeled with reference to a declining baseline that is invariant for each speaker (at a given time). The baseline is defined by its slope and a minimum value that is assumed to represent the bottom of the speaker’s range, a value that is considered stable for each speaker (Maeda, 1976; Menn & Boyce, 1982; Pierrehumbert & Beckman, 1988). The effect of declination is a systematic lowering of targets, though declination can be suspended (e.g., in questions), and is reset across phrasal boundaries (Ladd, 1988; see also Truckenbrodt, 2002). Listeners anticipate declination effects and adjust their processing of tonal targets accordingly (e.g., Yuen, 2007). L and H tones (apart from terminal L%s) are scaled above the baseline and with reference to it (cf. Liberman & Pierrehumbert, 1984). An exception is final peaks, which exhibit what Liberman and Pierrehumbert (1984) have called final lowering, because they are scaled lower than predicted. Final lowering has been reported in several languages with very different prosodic systems, including Japanese (Pierrehumbert & Beckman, 1988), Dutch (Gussenhoven & Rietveld, 1988), Yoruba (Connell & Ladd, 1990, and Laniran & Clements, 2003), Kipare (Herman, 1996), Spanish (Prieto, Shih, & Nibert, 1996), and Greek (Arvaniti & Godjevac, 2003).
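The downstep component of this scaling model lends itself to a simple illustration. The sketch below is not an implementation from the literature: it assumes a flat baseline and a constant scaling ratio (the model’s actual baseline declines over time), and the function name and values are purely illustrative.

```python
def downstep_series(first_peak, n_peaks, ratio=0.6, baseline=80.0):
    """Scale a sequence of n_peaks H targets (in Hz): each peak is a
    fixed fraction (ratio) of the preceding peak's excursion above the
    baseline, so successive peaks approach but never reach the baseline."""
    peaks = [first_peak]
    for _ in range(n_peaks - 1):
        excursion = peaks[-1] - baseline
        peaks.append(baseline + ratio * excursion)
    return peaks

# Four downstepped peaks over a 100 Hz baseline:
downstep_series(200.0, 4, ratio=0.5, baseline=100.0)
# → [200.0, 150.0, 125.0, 112.5]
```

Final lowering, on this view, amounts to the last peak being scaled lower than the value such a series predicts.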

As mentioned, tonal alignment is defined as the position of the tonal target relative to the segmental string. Alignment is closely related to phonological association: for example, pitch accents are expected to co-occur with the metrically prominent syllable with which they are associated; boundary tones are associated with phrasal boundaries and typically realized on the phrase-final syllable (but see Gussenhoven, 2000, for an alternative synchronization of boundary tones). The strict phonetic alignment observed by Arvaniti, Ladd, and Mennen (1998) for Greek pitch accents of the form L*+H gave rise to the notion of segmental anchoring, the idea that tonal targets anchor onto particular segments in phonetic realization. Specifically, in Greek, Arvaniti et al. found that the L* tone is synchronized with the onset of the accented syllable, while the H appears roughly 10 ms after the onset of the first post-accentual vowel. Segmental anchoring was explored in subsequent work by Ladd and colleagues (e.g., Ladd & Schepman, 2003; Atterer & Ladd, 2004; Ladd, Schepman, White, Quarmby, & Stackhouse, 2009). The idea of segmental anchoring also spurred a great deal of research in a variety of languages that largely supported it (among many, D’Imperio, 2001, for Neapolitan Italian; Myers, 2003, for Kinyarwanda; Elordieta & Calleja, 2005, for Basque Spanish; Arvaniti & Garding, 2007, for American English; Dalton & Ní Chasaide, 2007, for Irish; Gordon, 2008, for Chickasaw; Prieto, 2009, for Catalan). However, it is not the case that such anchoring is equally strict in all languages, as demonstrated, for example, by Smiljanic (2006) for Serbian and Croatian, and by Welby and Lœvenbruck (2006) for French.
Alignment variability may be related to a lack of sufficient vocalic material (e.g., Baltazani & Kainada, 2015, on Epirus Greek; Grice, Ridouane, & Roettger, 2015, on Berber), may pertain to a specific tonal event (Frota, 2002, on Portuguese), or may simply be a feature of a given intonational system (e.g., Arvaniti, 2016, on Romani). One consistent finding regarding alignment is that many rising accents show peak delay, a term that refers to the fact that accentual pitch peaks appear after the syllable with which the accent is phonologically associated. This was first documented by Silverman and Pierrehumbert (1990), who examined the phonetic realization of prenuclear H* accents in American English, and has since been reported for South American Spanish (Prieto, van Santen, & Hirschberg, 1995), Greek (Arvaniti et al., 1998), Kinyarwanda (Myers, 2003), Catalan (Prieto, 2005), Irish (Dalton & Ní Chasaide, 2007), Chickasaw (Gordon, 2008), Bininj Gun-wok (Bishop & Fletcher, 2005), and Romani (Arvaniti, 2016), inter alia.
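Peak delay itself is a simple measurement over sampled F0 data. The sketch below is illustrative only (the function name, sampling format, and values are not taken from the studies cited):

```python
def peak_delay(times, f0s, accented_syllable_end):
    """Return the delay (in seconds) of the F0 maximum relative to the
    end of the accented syllable; a positive value means the peak is
    realized after the syllable the accent is associated with."""
    # max over (f0, time) pairs picks the sample with the highest F0
    peak_f0, peak_time = max(zip(f0s, times))
    return peak_time - accented_syllable_end

# Peak at 0.2 s with the accented syllable ending at 0.15 s: a 50 ms delay.
```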

In AM, tonal targets are considered to be the sole exponents of the underlying phonological representation of intonation (but see later in this section for recent developments on this point). The rest of the contour is derived by interpolation between targets, which is considered to be linear, with the exception of the sagging interpolation between H* pitch accents in English which, according to Pierrehumbert (1981), gives rise to an F0 dip between the two accentual peaks (for an alternative analysis that posits that the sag is the reflex of a low tone, see Ladd & Schepman, 2003).

The fact that the phonetic implementation of an AM phonological representation relies solely on the realization of its phonological tones as tonal targets means that at the phonetic level the parts of the F0 contour that are not targets do not need to be specified in order to be realized. In other words, there is no requirement for each syllable in an utterance to have some tonal specification, and in fact most syllables are not assigned a specific F0 value during production. This is referred to as underspecification in AM. Underspecification was first illustrated by Pierrehumbert and Beckman (1988, pp. 13ff.) for Tokyo Japanese accentual phrases (APs). They showed that the F0 contours of APs without an accented word could be successfully modeled by positing only one H target, associated with the AP’s second mora, and one L target realized at the beginning of the following AP; the F0 slope from the H to the L target depends on the number of moras between the two. This change in F0 slope is difficult, if not impossible, to model if every mora is specified for F0, as such specifications would need to differ by AP length.
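The Japanese argument can be made concrete with plain linear interpolation: the same two targets yield different slopes depending on how much material separates them, with no per-mora specification. The function and values below are an illustrative sketch, not a published implementation:

```python
def f0_at(targets, t):
    """F0 at time t, given a time-sorted list of (time, f0) tonal
    targets. Only the targets are specified; everything between them
    follows by linear interpolation, and the contour is flat before
    the first target and after the last."""
    t0, f0 = targets[0]
    if t <= t0:
        return f0
    for t1, f1 in targets[1:]:
        if t <= t1:
            # linear interpolation between the two flanking targets
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
        t0, f0 = t1, f1
    return f0

# An H target at 0.1 s falling to an L at 0.5 s (a short AP) drops
# faster than the same H falling to an L at 1.0 s (a longer AP).
```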

While data like those of Japanese show sparse tonal specification, AM predicts that it is also possible for an utterance to involve more tones than tone bearing units, a phenomenon known as tonal crowding. The Greek contour shown in Figure 7, [zi] ‘is s/he alive?’, is such an instance: the phonological representation of this contour is L* (L+)H- L% (Arvaniti, Ladd, & Mennen, 2006a), and all tones are associated with the single vowel in the utterance. In order for the tones to be realized, this vowel is significantly lengthened: in Figure 7, it is 350 ms long. In contrast, in [epiˈzisane] ‘did they survive?’, shown in Figure 8, there is sufficient segmental material for each tone to be realized on a different syllable, so the stressed [i] is just 120 ms long. Tonal crowding is extremely frequent, yet AM is the only model of intonation that can successfully handle it and predict its outcomes (see Arvaniti & Ladd, 2009, and Arvaniti & Ladd, 2015, for a comparison of the treatment of tonal crowding in AM and PENTA).

Tonal crowding is phonetically resolved in a number of ways: (a) truncation, the elision of part of the contour (Bruce, 1977, on Swedish; Grice, 1995, on British English; Arvaniti, 1998, on Cypriot Greek; Grabe, 1998, on English and German; Grabe, Post, Nolan, & Farrar, 2000, on British English; Arvaniti & Ladd, 2009, on Standard Greek); (b) undershoot, the realization of all tones without them reaching their targets (Bruce, 1977, on Swedish; Arvaniti, Ladd, & Mennen, 1998, 2000, 2006a, 2006b, on Standard Greek; Prieto, 2005, on Catalan; Arvaniti & Ladd, 2009, on Standard Greek); (c) temporal realignment of tones (Silverman & Pierrehumbert, 1990, on American English); (d) segmental lengthening, as in the example in Figure 8, the aim of which is to accommodate the realization of all tones with as little undershoot as possible (Arvaniti & Ladd, 2009, on Standard Greek; Grice, Savino, & Roettger, 2019, on Bari Italian). Undershoot and temporal realignment often work synergistically, giving rise to compression (e.g., Arvaniti, Żygis, & Jaskuła, 2017, on Polish). Empirical evidence indicates that the mechanism used is specific to elements in a tune (Ladd, 2008; Arvaniti & Ladd, 2009; Arvaniti, 2016). Arvaniti (2016) and Arvaniti et al. (2017) in particular have argued that such different responses to tonal crowding can be used as a diagnostic to determine which parts of a tonal event are optional (those that are truncated in tonal crowding) and which are required (those that are compressed under the same conditions). Arvaniti et al. (2017) further posit that phonological representations should only include required elements.

Figure 7. Spectrogram and F0 of Greek utterance [zi] ‘is s/he alive?’, uttered as a question with a tune represented in AM as L* (L+)H- L%.

Figure 8. Spectrogram and F0 of Greek utterance [epiˈzisane] ‘did they survive?’, uttered as a question with a tune represented in AM as L* (L+)H- L%.

Despite the success of AM, several studies indicate that seeing tonal targets as points connected by linear interpolation may not provide a sufficiently accurate phonetic model of intonation, in the sense that such a model could be missing information that is critical for perception and the encoding of contrasts within an intonation system. Barnes and colleagues have shown that the pitch accents of English represented as L*+H and L+H* differ in terms of shape, the former being concave and the latter convex (Barnes, Veilleux, Brugos, & Shattuck-Hufnagel, 2012; Barnes, Brugos, Shattuck-Hufnagel, & Veilleux, 2013). This difference is not captured by the autosegmental representations of these accents, nor anticipated by linear interpolation between the L and H tones. In order to account for this difference, Barnes et al. (2012, 2013) proposed Tonal Centre of Gravity (TCoG), a measurement that aims to capture the difference between accents perceived as predominantly low in pitch and accents perceived as predominantly high. The formula for TCoG is shown in (4).
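One common formulation computes TCoG along the time axis as the F0-weighted mean of the sample times over the accent region; the sketch below assumes that formulation rather than reproducing Barnes et al.’s formula, and its values are illustrative:

```python
def tcog_time(times, f0s):
    """Tonal Center of Gravity on the time axis: the mean of the sample
    times weighted by their F0 values, so contours with identical L and
    H targets but different shapes yield different TCoG values."""
    return sum(t * f for t, f in zip(times, f0s)) / sum(f0s)

times = [0.0, 0.1, 0.2, 0.3, 0.4]
concave = [100.0, 105.0, 115.0, 140.0, 200.0]  # stays low, rises late
convex = [100.0, 160.0, 185.0, 195.0, 200.0]   # rises early, stays high
# The concave rise concentrates its F0 mass later, so its TCoG is later:
tcog_time(times, concave) > tcog_time(times, convex)  # True
```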

An alternative way to think about the difference between English L*+H and L+H* would be to conceive of the L* tone of L*+H as having duration, that is, as being a stretch of low F0 rather than a point in the F0 contour. Similarly, H tones may be realized as plateaux. In some languages plateaux are used interchangeably with peaks (e.g., Arvaniti, 2016, on Romani); in others the two are distinct, so that the use of peaks or plateaux affects the interpretation of the tune (e.g., D’Imperio, 2000, and D’Imperio, Terken, & Piterman, 2000, on Neapolitan Italian), the scaling of the tones involved (e.g., Knight & Nolan, 2006, and Knight, 2008, on British English), or both (Barnes et al., 2013, on American English).

Data like those from plateaux, low F0 stretches, and different types of interpolation indicate that a phonetic model involving only targets as turning points and linear interpolation between them may be too simple to fully account for all phonetic detail pertaining to F0 curves. Although the perceptual relevance of this additional detail is at present far from clear, and there is some evidence that it may not be relevant for processing tunes (’t Hart et al., 1990), recent research has focused on capturing it. Alternatives to measuring tonal targets include the use of functional Principal Component Analysis, which captures tune differences that are difficult to model in terms of targets (Gubian, Torreira, & Boves, 2015; Lohfink, Katsika, & Arvaniti, 2019), and the Synchrony approach of Cangemi, Albert, and Grice (2019), which integrates phonetic prominence and F0 information. Finally, recent studies have questioned the assumption that F0 is the sole exponent of intonation, and suggest instead that intonational categories may also be cued by additional parameters, such as changes in the duration and amplitude of segments synchronized with particular tonal events (see, e.g., Niebuhr, 2012, on German; Arvaniti et al., 2017, on Polish; Gryllia, Baltazani, & Arvaniti, 2018, and Lohfink et al., 2019, on Greek).

6. Conclusion

Prosody is an important component of each language’s phonology and plays a critical role in speech production, language acquisition, and speech processing and perception. For these reasons it is important to expand research on the numerous facets of prosody. Progress, however, will be hindered if prosody, its phonetic exponents, and their role in expressing paralinguistic information are confounded in research.

Further Reading

  • Arvaniti, A. (2009). Rhythm, timing and the timing of rhythm. Phonetica , 66 , 46–63.
  • Arvaniti, A. (2016). Analytical decisions in intonation research and the role of representations: Lessons from Romani. Laboratory Phonology: Journal of the Association for Laboratory Phonology , 7 (1), 6.
  • Arvaniti, A. (2019). Crosslinguistic variation, phonetic variability, and the formation of categories in intonation. In S. Calhoun , P. Escudero , M. Tabain , & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 . Canberra, Australia: Australasian Speech Science and Technology Association.
  • Arvaniti, A. , & Ladd, D. R. (2009). Greek wh-questions and the phonology of intonation. Phonology , 26 (1), 43–74.
  • Beckman, M. E. , & Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating (Ed.), Phonological structure and phonetic form: Papers in laboratory phonology (Vol. 3, pp. 7–33). Cambridge, UK: Cambridge University Press.
  • Beckman, M. E. , & Venditti, J. J. (2011). Intonation. In J. Goldsmith , J. Riggle , & A. C. L. Yu (Eds.), The handbook of phonological theory (pp. 485–532). Malden, MA: Wiley-Blackwell.
  • Brugos, A. , Shattuck-Hufnagel, S. , & Veilleux, N. (2006). Transcribing prosodic structure of spoken utterances with ToBI (MIT OpenCourseWare).
  • Chen, A. , Gussenhoven, C. , & Rietveld, T. (2004). Language-specificity in the perception of paralinguistic intonational meaning. Language and Speech , 47 , 311–349.
  • Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics , 11 , 51–62.
  • de Jong, K. J. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America , 97 (1), 491–504.
  • Dilley, L. , Mattys, S. L. , & Vinke, L. (2010). Potent prosody: Comparing the effects of distal prosody, proximal prosody, and semantic context on word segmentation. Journal of Memory and Language , 63 (3), 274–294.
  • Downing, L. J. , & Rialland, A. (Eds). (2017). Intonation in African tone languages . Berlin, Germany: De Gruyter Mouton.
  • Gordon, M. (2011). Stress: Phonotactic and phonetic evidence. In M. van Oostendorp , C. Ewen , E. Hume , & K. Rice (Eds.), The Blackwell companion to phonology (pp. 924–948). Malden, MA: Wiley-Blackwell.
  • Gordon, M. (2014). Disentangling stress and pitch accent: Toward a typology of prominence at different prosodic levels. In H. van der Hulst (Ed.), Word stress: Theoretical and typological issues (pp. 83–118). Cambridge, UK: Cambridge University Press.
  • Gussenhoven, C. (2004). The phonology of tone and intonation . Cambridge, UK: Cambridge University Press.
  • Jun, S. , & Fletcher, J. (2014). Methodology of studying intonation: From data collection to data analysis. In S. Jun (Ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 493–519). Oxford, UK: Oxford University Press.
  • Krivokapić, J. , & Byrd, D. (2012). Prosodic boundary strength: An articulatory and perceptual study. Journal of Phonetics , 40 , 430–442.
  • Ladd, D. R. (2008). Intonational phonology (2nd ed.). Cambridge, UK: Cambridge University Press.
  • Ladd, D. R. (2014). Simultaneous structure in phonology . Oxford, UK: Oxford University Press.
  • London, J. (2012). Hearing in time: Psychological aspects of musical meter . Oxford, UK: Oxford University Press.
  • Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation (Dissertation, MIT). Bloomington, IN: IULC. Published 1988.
  • Pierrehumbert, J. , & Beckman, M. E. (1988). Japanese tone structure . Cambridge, MA: MIT Press.
  • Pierrehumbert, J. , & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen , J. L. Morgan , & M. E. Pollack (Eds.), Intentions in communication (pp. 271–311). Cambridge MA: MIT Press.
  • Tremblay, A. , Broersma, M. , & Coughlin, C. E. (2018). The functional weight of a prosodic cue in the native language predicts speech segmentation in a second language. Bilingualism: Language and Cognition , 21 , 640–652.
  • Turk, A. , & Shattuck-Hufnagel, S. (2014). Timing in talking: What is it used for, and how is it controlled? Philosophical Transactions of the Royal Society B: Biological Sciences , 369 (1658), 20130395.

References
  • Adamou, E. , & Arvaniti, A. (2014). Illustrations of the IPA: Greek Thrace Xoraxane Romane. Journal of the International Phonetic Association , 44 (2), 223–231.
  • Arvaniti, A. (1991). The phonetics of Greek rhythm and its phonological implications (Unpublished doctoral dissertation). University of Cambridge.
  • Arvaniti, A. (1992). Secondary stress: Evidence from Modern Greek. In G. J. Docherty & D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 398–423). Cambridge, UK: Cambridge University Press.
  • Arvaniti, A. (1994). Acoustic features of Greek rhythmic structure. Journal of Phonetics , 22 , 239–268.
  • Arvaniti, A. (1998). Phrase accents revisited: Comparative evidence from Standard and Cypriot Greek. In R. H. Mannell & J. Robert-Ribes (Eds.), Proceedings of the 5th International Conference on Spoken Language Processing (Vol. 7, pp. 2883–2886). Sydney: Australian Speech Science and Technology Association, Incorporated (ASSTA).
  • Arvaniti, A. (2000). The phonetics of stress in Greek. Journal of Greek Linguistics , 1 , 9–38.
  • Arvaniti, A. (2007). On the relationship between phonology and phonetics (or why phonetics is not phonology). In J. Trouvain & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (pp. 19–24). Saarbrücken, Germany: Universität des Saarlandes.
  • Arvaniti, A. (2012a). The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics , 40 , 351–373.
  • Arvaniti, A. (2012b). Rhythm classes and speech perception. In O. Niebuhr (Ed.), Prosodies: Context, function and communication (pp. 75–92). Berlin, Germany: Walter de Gruyter.
  • Arvaniti, A. (in press-a). Measuring rhythm. In R. A. Knight & J. Setter (Eds.), The Cambridge handbook of phonetics . Cambridge, UK: Cambridge University Press.
  • Arvaniti, A. (in press-b). The autosegmental-metrical model of intonational phonology. In S. Shattuck-Hufnagel & J. Barnes (Eds.), Prosodic theory and practice . Cambridge, MA: MIT Press.
  • Arvaniti, A. , & Garding, G. (2007). Dialectal variation in the rising accents of American English. In J. Cole & J. I. Hualde (Eds.), Papers in laboratory phonology (Vol. 9, pp. 547–576). Berlin, Germany: Mouton de Gruyter.
  • Arvaniti, A. , & Godjevac, S. (2003). The origins and scope of final lowering in English and Greek. In M.-J. Solé , D. Recasens , & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona 3–9 August 2003 (pp. 1077–1080). Barcelona, Spain: ICPhS Organizing Committee.
  • Arvaniti, A. , & Ladd, D. R. (2015). Underspecification in intonation revisited: A reply to Xu, Li, Prom-on and Liu. Phonology , 32 (3), 537–541.
  • Arvaniti, A. , Ladd, D. R. , & Mennen, I. (1998). Stability of tonal alignment: The case of Greek prenuclear accents. Journal of Phonetics , 26 , 3–25.
  • Arvaniti, A. , Ladd, D. R. , & Mennen, I. (2000). What is a Starred Tone? Evidence from Greek. In M. Broe & J. Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition and the lexicon (pp. 119–131). Cambridge, UK: Cambridge University Press.
  • Arvaniti, A. , Ladd, D. R. , & Mennen, I. (2006a). Phonetic effects of focus and “tonal crowding” in intonation: Evidence from Greek polar questions. Speech Communication , 48 , 667–696.
  • Arvaniti, A. , Ladd, D. R. , & Mennen, I. (2006b). Tonal association and tonal alignment: Evidence from Greek polar questions and contrastive statements. Language and Speech , 49 , 421–450.
  • Arvaniti, A. , & Rathcke, T. (2015). The role of stress in syllable monitoring. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences . Glasgow, UK: The University of Glasgow.
  • Arvaniti, A. , & Rodriquez, T. (2013). The role of rhythm class, speaking rate, and F0 in language discrimination. Laboratory Phonology , 4 (1), 7–38.
  • Arvaniti, A. , Żygis, M. , & Jaskuła, M. (2017). The phonetics and phonology of the Polish calling melodies. Phonetica , 73 (3–4), 338–361.
  • Atterer, M. , & Ladd, D. R. (2004). On the phonetics and phonology of “segmental anchoring” of F0: Evidence from German. Journal of Phonetics , 32 , 177–197.
  • Baltazani, M. (2006). Focusing, prosodic phrasing, and hiatus resolution in Greek. In L. Goldstein , D. Whalen , & C. Best (Eds.), Laboratory phonology (Vol. 8, pp. 473–494). Berlin, Germany: Mouton de Gruyter.
  • Baltazani, M. , & Kainada, E. (2015). Drifting without an anchor: How pitch accents withstand vowel loss. Language and Speech , 58 (1), 84–115.
  • Barnes, J. , Brugos, A. , Shattuck-Hufnagel, S. , & Veilleux, N. (2013). On the nature of perceptual differences between accentual peaks and plateaux. In O. Niebuhr (Ed.), Understanding prosody: The role of context, function and communication (pp. 93–118). Berlin, Germany: De Gruyter.
  • Barnes, J. , Veilleux, N. , Brugos, A. , & Shattuck-Hufnagel, S. (2012). Tonal center of gravity: A global approach to tonal implementation in a level-based intonational phonology. Laboratory Phonology , 3 (2), 337–383.
  • Beckman, M. E. (1986). Stress and non-stress accent . Dordrecht, The Netherlands: Foris.
  • Beckman, M. E. , & Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating (Ed.), Phonological structure and phonetic form: Papers in laboratory phonology (Vol 3, pp. 7–33). Cambridge, UK: Cambridge University Press.
  • Beckman, M. E. , Edwards, J. , & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. J. Docherty & D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 68–86). Cambridge, UK: Cambridge University Press.
  • Beckman, M. , Hirschberg, J. , & Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 9–54). Oxford, UK: Oxford University Press.
  • Beckman, M. E. , & Pierrehumbert, J. (1986). Intonational structure in English and Japanese. Phonology Yearbook , 3 , 255–310.
  • Bishop, J. , & Fletcher, J. (2005). Intonation in six dialects of Bininj Gun-wok. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 331–361). Oxford, UK: Oxford University Press.
  • Bolinger, D. L. (1961). Generality, gradience, and the all-or-none . The Hague, The Netherlands: Mouton.
  • Bolton, T. L. (1894). Rhythm. The American Journal of Psychology , 6 (2), 145–238.
  • Botinis, A. (1989). Stress and prosodic structure in Greek: A phonological, acoustic, physiological and perceptual study . Lund, Sweden: Lund University Press.
  • Bruce, G. (1977). Swedish word accents in sentence perspective . Lund, Sweden: Gleerup.
  • Bruce, G. (2005). Intonational prominence in varieties of Swedish revisited. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 410–429). Oxford, UK: Oxford University Press.
  • Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica , 57 (1), 3–16.
  • Byrd, D. , & Saltzman, E. (2003). The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics , 31 (2), 149–180.
  • Cambier-Langeveld, T. , & Turk, A. (1999). A cross-linguistic study of accentual lengthening: Dutch vs. English. Journal of Phonetics , 27 , 171–206.
  • Campbell, N. , & Beckman, M. E. (1997). Stress, prominence, and spectral tilt. In A. Botinis , G. Kouroupetroglou , & G. Carayannis (Eds.), Intonation: Theory, models and applications (Proceedings of the ESCA Workshop on Intonation) (pp. 67–70). Athens, Greece: ESCA and University of Athens Department of Informatics.
  • Cangemi, F. , Albert, A. , & Grice, M. (2019). Modelling intonation: Beyond segments and tonal targets. In S. Calhoun , P. Escudero , M. Tabain , & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia, 2019 . Canberra, Australia: Australasian Speech Science and Technology Association.
  • Cho, T. , & Keating, P. A. (2001). Articulatory and acoustic studies on domain-initial strengthening in Korean. Journal of Phonetics , 29 (2), 155–190.
  • Cho, T. , & Keating, P. A. (2009). Effects of initial position versus prominence in English. Journal of Phonetics , 37 (4), 466–485.
  • Cho, T. , Son, M. , & Kim, S. (2016). Articulatory reflexes of the three-way contrast in labial stops and kinematic evidence for domain-initial strengthening in Korean. Journal of the International Phonetic Association , 46 (2), 129–155.
  • Chomsky, N. , & Halle, M. (1968). The sound pattern of English . New York, NY: Harper & Row.
  • Chung, Y. , & Arvaniti, A. (2013). Speech rhythm in Korean: Experiments in speech cycling . In Proceedings of Meetings on Acoustics (POMA): Proceedings of 21st International Congress of Acoustics , Montreal, 2–7 June 2013, 060216.
  • Classe, A. (1939). The rhythm of English prose . Oxford, UK: Basil Blackwell.
  • Clopper, C. G. , & Smiljanic, R. (2011). Effects of gender and regional dialect on prosodic patterns in American English. Journal of Phonetics , 39 (2), 237–245.
  • Conderman, G. , & Strobel, D. (2010). Fluency Flyers Club: An oral reading fluency intervention program. Preventing School Failure: Alternative Education for Children and Youth , 53 (1), 15–20.
  • Condoravdi, C. (1990). Sandhi rules of Greek and prosodic theory. In S. Inkelas & D. Zec (Eds.), Phonology-syntax connection (pp. 63–84). Chicago, IL: University of Chicago Press.
  • Connell, B. , & Ladd, D. R. (1990). Aspects of pitch realization in Yoruba. Phonology , 7 , 1–30.
  • Cooper, W. , & Sorensen, J. (1981). Fundamental frequency in sentence production . Heidelberg, Germany: Springer.
  • Crosswhite, K. (2003). Spectral tilt as a cue to word stress in Polish, Macedonian, and Bulgarian. In M.-J. Solé , D. Recasens , & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona 3–9 August 2003 (pp. 767–770). Barcelona, Spain: ICPhS Organizing Committee.
  • Crystal, D. (1969). Prosodic systems and intonation in English . Cambridge, UK: Cambridge University Press.
  • Cutler, A. (2015). Lexical stress in English pronunciation. In M. Reed & J. M. Levis (Eds.), The handbook of English pronunciation (pp. 106–124). New York, NY: John Wiley & Sons.
  • Cutler, A. , & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech & Language , 2 (3–4), 133–142.
  • Cutler, A. , Mehler, J. , Norris, D. , & Seguí, J. (1986). The syllable’s differing role in the segmentation of French and English. Journal of Memory and Language , 25 , 385–400.
  • Cutler, A. , Mehler, J. , Norris, D. , & Seguí, J. (1992). The monolingual nature of speech segmentation by bilinguals. Cognitive Psychology , 24 , 381–410.
  • Cutler, A. , & Otake, T. (1994). Mora or phoneme? Further evidence for language-specific listening. Journal of Memory and Language , 33 , 824–844.
  • Dalton, M. , & Ní Chasaide, A. (2007). Melodic alignment and micro-dialect variation in Connemara Irish. In C. Gussenhoven & T. Riad (Eds.), Tones and tunes (pp. 293–315). Berlin, Germany: Mouton de Gruyter.
  • Daly, N. , & Warren, P. (2001). Pitching it differently in New Zealand English: Speaker sex and intonation patterns. Journal of Sociolinguistics , 5 (1), 85–96.
  • Dauer, R. M. (1987). Phonetic and phonological components of language rhythm. In Proceedings of the 11th International Congress of Phonetic Sciences (pp. 447–450). Academy of Sciences of the Estonian S.S.R.
  • de Jong, K. J. (1994). Initial tones and prominence in Seoul Korean. OSU Working Papers in Linguistics , 43 , 1–14.
  • Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for deltaC. In P. Karnowski & I. Szigeti (Eds.), Language and language-processing: Proceedings of the 38th Linguistic Colloquium (pp. 231–241). Frankfurt, Germany: Peter Lang.
  • Dilley, L. C. , & Brown, M. (2007). Effects of pitch range variation on f 0 extrema in an imitation task. Journal of Phonetics , 35 (4), 523–551.
  • Dilley, L. C. , & McAuley, J. D. (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language , 59 , 294–311.
  • Dilley, L. , Shattuck-Hufnagel, S. , & Ostendorf, M. (1996). Glottalization of word-initial vowels as a function of prosodic structure. Journal of Phonetics , 24 , 423–444.
  • D’Imperio, M. (2000). The role of perception in defining tonal targets and their alignment (Unpublished doctoral dissertation). The Ohio State University.
  • D’Imperio, M. (2001). Focus and tonal structure in Neapolitan Italian. Speech Communication , 33 (4), 339–356.
  • D’Imperio, M. , Elordieta, G. , Frota, S. , Prieto, P. , & Vigário, M. (2005). Intonational phrasing in Romance: The role of syntactic and prosodic structure. In S. Frota , M. Vigário , & M. J. Freitas (Eds.), Prosodies (pp. 59–98). The Hague, The Netherlands: Mouton de Gruyter.
  • D’Imperio, M. , & Rosenthal, S. (1999). Phonetics and phonology of main stress in Italian. Phonology , 16 , 1–28.
  • D’Imperio, M. , Terken, J. , & Piterman, M. (2000). Perceived tone “targets” and pitch accent identification in Italian. In M. Barlow (Ed.), Proceedings of Australian International Conference on Speech Science and Technology (SST) (Vol. 8, pp. 201–211). Canberra: Australian Speech Science and Technology Association.
  • Eady, S. J. (1982). Differences in the F0 patterns of speech: Tone language versus stress language. Language and Speech , 25 , 29–42.
  • Elordieta, G. , & Calleja, N. (2005). Microvariation in accentual alignment in Basque Spanish. Language and Speech , 48 , 397–439.
  • Farnetani, E. , & Kori, S. (1990). Rhythmic structure in Italian noun phrases: A study of vowel durations. Phonetica , 47 , 50–65.
  • Fletcher, J. , Grabe, E. , & Warren, P. (2004). Intonational variation in four dialects of English: The high rising tune. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 390–409). Oxford, UK: Oxford University Press.
  • Fougeron, C. (2001). Articulatory properties of initial segments in several prosodic constituents in French. Journal of Phonetics , 29 , 109–135.
  • Fougeron, C. , & Jun, S. (2002). Realizations of accentual phrase in French intonation. Probus , 14 (1), 147–172.
  • Fougeron, C. , & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. The Journal of the Acoustical Society of America , 101 (6), 3728–3740.
  • Fourakis, M. , Botinis, A. , & Katsaiti, M. (1999). Acoustic characteristics of Greek vowels. Phonetica , 56 , 28–43.
  • Fraisse, P. (1963). The psychology of time . New York, NY: Harper and Row.
  • Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–180). New York, NY: Academic Press.
  • Frota, S. (2002). Tonal association and target alignment in European Portuguese nuclear falls. In C. Gussenhoven & N. Warner (Eds.), Laboratory phonology (Vol. 7, pp. 387–418). Berlin, Germany: Mouton de Gruyter.
  • Frota, S. , & Vigário, M. (2001). On the correlates of rhythmic distinctions: The European/Brazilian Portuguese case. Probus , 13 , 247–275.
  • Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech , 1 (2), 126–152.
  • Fujisaki, H. (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In P. F. MacNeilage (Ed.), The production of speech (pp. 39–55). Heidelberg, Germany: Springer-Verlag.
  • Fujisaki, H. , & Hirose, K. (1984). Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan (E) , 5 (4), 233–242.
  • Garellek, M. , & White, J. (2015). Phonetics of Tongan stress. Journal of the International Phonetic Association , 45 (1), 13–34.
  • Gaudio, R. P. (1994). Sounding gay: Pitch properties in the speech of gay and straight men. American Speech , 69 (1), 30–57.
  • Georgeton, L. , Antolík, T. K. , & Fougeron, C. (2016). Effect of domain initial strengthening on vowel height and backness contrasts in French: Acoustic and ultrasound data. Journal of Speech, Language and Hearing Research , 59 (6), S1575–S1586.
  • Glasberg, B. R. , & Moore, B. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research , 47 , 103–138.
  • Gooden, S. , Drayton, K. , & Beckman, M. (2009). Tone inventories and tune-text alignments: Prosodic variation in “hybrid” prosodic systems. Studies in Language , 33 (2), 354–394.
  • Gordon, M. (2008). Pitch accent timing and scaling in Chickasaw. Journal of Phonetics , 36 (3), 521–535.
  • Gordon, M. , & Applebaum, A. (2010). Acoustic correlates of stress in Turkish Kabardian. Journal of the International Phonetic Association , 40 , 35–58.
  • Gow, D. W., Jr. (2002). Does English coronal place assimilation create lexical ambiguity? Journal of Experimental Psychology: Human Perception and Performance , 28 (1), 163–179.
  • Grabe, E. (1998). Comparative intonational phonology: English and German . MPI Series in Psycholinguistics 7. Wageningen, The Netherlands: Ponsen & Looijen.
  • Grabe, E. , & Low, E. L. (2002). Acoustic correlates of rhythm class. In C. Gussenhoven & N. Warner (Eds.), Laboratory phonology (Vol. 7, pp. 515–546). Berlin, Germany: Mouton de Gruyter.
  • Grabe, E. , Post, B. , Nolan, F. , & Farrar, K. (2000). Pitch accent realization in four varieties of British English. Journal of Phonetics , 28 , 161–185.
  • Graham, C. R. (2014). Fundamental frequency range in Japanese and English: The case of simultaneous bilinguals. Phonetica , 71 , 271–295.
  • Grice, M. (1995). Leading tones and downstep in English. Phonology , 12 (2), 183–233.
  • Grice, M. , Ridouane, R. , & Roettger, T. (2015). Tonal association in Tashlhiyt Berber: Evidence from polar questions and contrastive statements. Phonology , 32 (2), 241–266.
  • Grice, M. , Savino, M. , & Roettger, T. B. (2019). Tune-text negotiation: The effect of intonation on vowel duration. In S. Calhoun , P. Escudero , M. Tabain , & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 . Canberra, Australia: Australasian Speech Science and Technology Association.
  • Gryllia, S. , Baltazani, M. , & Arvaniti, A. (2018). The role of pragmatics and politeness in explaining prosodic variability . In K. Klessa , J. Bachan , A. Wagner , M. Karpiński , & D. Śledziński (Eds.), Proceedings of the 9th International Conference on Speech Prosody 2018 (pp. 158–162). Poznań.
  • Gubian, M. , Torreira, F. , & Boves, L. (2015). Using functional data analysis for investigating multidimensional dynamic phonetic contrasts. Journal of Phonetics , 49 , 16–40.
  • Gussenhoven, C. (2000). The boundary tones are coming: On the non-peripheral realization of boundary tones. In M. B. Broe & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition and the lexicon (pp. 132–151). Cambridge, UK: Cambridge University Press.
  • Gussenhoven, C. , & Jacobs, H. (2017). Understanding phonology . New York, NY: Routledge.
  • Gussenhoven, C. , & Rietveld, A. C. M. (1988). Fundamental frequency declination in Dutch: Testing three hypotheses. Journal of Phonetics , 16 , 355–369.
  • Haag, W. K. (1979). An articulatory experiment on Voice Onset Time in German stop consonants. Phonetica , 36 (3), 169–181.
  • Hannon, E. E. , Lévêque, Y. , Nave, K. M. , & Trehub, S. E. (2016). Exaggeration of language-specific rhythms in English and French children’s songs. Frontiers in Psychology , 7 , 939.
  • Harrington, J. , Fletcher, J. , & Roberts, C. (1995). Coarticulation and the accented/unaccented distinction: Evidence from jaw movement data. Journal of Phonetics , 23 (3), 305–322.
  • Harris, M. J. , Gries, S. T. , & Miglio, V. G. (2014). Prosody and its applications to forensic linguistics . Linguistic Evidence in Security, Law and Intelligence , 2 , 11–29.
  • Hart, J. ’t , Collier, R. , & Cohen, A. (1990). A perceptual study of intonation: An experimental-phonetic approach to speech melody . Cambridge, UK: Cambridge University Press.
  • Hayes, B. (1995). Metrical stress theory: Principles and case studies . Chicago, IL: University of Chicago Press.
  • Huang, T. , & Johnson, K. (2010). Language specificity in speech perception: Perception of Mandarin tones by native and nonnative listeners. Phonetica , 67 , 243–267.
  • Henton, C. G. (1989). Fact and fiction in the description of female and male pitch. Language & Communication , 9 (4), 299–311.
  • Herman, R. (1996). Final lowering in Kipare. Phonology , 13 , 171–196.
  • Hermes, D. J. , & van Gestel, J. C. (1991). The frequency scale of speech intonation. Journal of the Acoustical Society of America , 90 , 97–102.
  • Hirschberg, J. , & Avesani, C. (2000). Prosodic disambiguation in English and Italian. In A. Botinis (Ed.), Intonation: Analysis, modeling and technology (pp. 87–96). Dordrecht, The Netherlands: Springer.
  • Hirst, D. , & Di Cristo, A. (Eds.). (1998). Intonation systems: A survey of twenty languages . Cambridge, UK: Cambridge University Press.
  • Hirst, D. , Di Cristo, A. , & Espesser, R. (2000). Levels of representation and levels of analysis for the description of intonation systems. In M. Horne (Ed.), Prosody: Theory and experiment, studies presented to Gösta Bruce (pp. 51–87). Dordrecht, The Netherlands: Kluwer Academic.
  • Horton, R. , & Arvaniti, A. (2013). Cluster and classes in the rhythm metrics . San Diego Linguistic Papers , 4 , 28–52.
  • Huang, N. E. , Shen, Z. , Long, S. R. , Wu, M. C. , Shih, H. H. , Zheng, Q. , . . . Liu, H. H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences , 454 , 903–995.
  • Huss, V. (1978). English word stress in the post-nuclear position. Phonetica , 35 , 86–105.
  • Hyman, L. M. (2006). Word-prosodic typology. Phonology , 23 , 225–257.
  • James, W. (1890). The principles of psychology . New York, NY: Dover Reprint.
  • Jeon, H. , & Arvaniti, A. (2017). The effects of prosodic context on word segmentation: Rhythmic irregularity and localised lengthening in Korean. The Journal of the Acoustical Society of America , 141 , 4251–4263.
  • Jones, M. R. (1981). Only time can tell: On the topology of mental space and time. Critical Inquiry , 7 , 557–576.
  • Jun, S. (2005a). Korean intonational phonology and prosodic transcription. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 201–229). Oxford, UK: Oxford University Press.
  • Jun, S. (Ed). (2005b). Prosodic typology: The phonology of intonation and phrasing . Oxford, UK: Oxford University Press.
  • Jun, S. (Ed.). (2014). Prosodic typology II: The phonology of intonation and phrasing . Oxford, UK: Oxford University Press.
  • Jun, S. , & Fletcher, J. (2014). Methodology of studying intonation: from data collection to data analysis. In S. Jun (Ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 493–519). Oxford, UK: Oxford University Press.
  • Kaminskaïa, S. , Tennant, J. , & Russell, A. (2016). Prosodic rhythm in Ontario French. Journal of French Language Studies , 26 (2), 183–208.
  • Katsika, A. (2016). The role of prominence in determining the scope of boundary lengthening in Greek. Journal of Phonetics , 55 , 149–181.
  • Katsika, A. , Shattuck-Hufnagel, S. , Mooshammer, C. , Tiede, M. , & Goldstein, L. (2014). Compatible vs. competing rhythmic grouping and errors. Language and Speech , 57 , 544–562.
  • Knight, R. A. (2008). The shape of nuclear falls and their effect on the perception of pitch and prominence: Peaks vs. plateaux. Language and Speech , 51 (3), 223–244.
  • Knight, R. A. , & Nolan, F. (2006). The effect of pitch span on intonational plateaux. Journal of the International Phonetic Association , 36 (1), 21–38.
  • Kohler, K. (2009). Rhythm in speech and language: A new research paradigm. Phonetica , 66 , 29–45.
  • Ladd, D. R. (1988). Declination “reset” and the hierarchical organization of utterances. Journal of the Acoustical Society of America , 84 , 530–544.
  • Ladd, D. R. (1996). Intonational phonology . Cambridge, UK: Cambridge University Press.
  • Ladd, D. R. , & Schepman, A. (2003). “Sagging transitions” between high pitch accents in English: Experimental evidence. Journal of Phonetics , 31 , 81–112.
  • Ladd, D. R. , Schepman, A. , White, L. , Quarmby, L. M. , & Stackhouse, R. (2009). Structural and dialectal effects on pitch peak alignment in two varieties of British English. Journal of Phonetics , 37 (2), 145–161.
  • Laniran, Y. O. , & Clements, G. N. (2003). Downstep and high raising: Interacting factors in Yoruba tone production. Journal of Phonetics , 31 , 203–250.
  • Lehiste, I. (1970). Suprasegmentals . Cambridge, MA: MIT Press.
  • Li, A. , & Post, B. (2014). L2 acquisition of prosodic properties of speech rhythm. Studies in Second Language Acquisition , 36 (2), 223–255.
  • Liberman, M. Y. , & Pierrehumbert, J. B. (1984). Intonational invariance under changes in pitch range and length. In M. Aronoff & R. T. Oehrle (Eds.), Language sound structure: Studies in phonology presented to Morris Halle (pp. 157–233). Cambridge, MA: MIT Press.
  • Liberman, M. , & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry , 8 , 249–336.
  • Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America , 32 , 451–454.
  • Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H Theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modelling (Vol. 55, pp. 403–439). NATO ASI Series (Series D: Behavioural and Social Sciences). Dordrecht, The Netherlands: Springer.
  • Liss, J. M. , White, L. , Mattys, S. L. , Lansford, K. , Spitzer, S. , Lotto, A. J. , & Caviness, J. N. (2009). Quantifying speech rhythm deficits in the dysarthrias. Journal of Speech, Language, and Hearing Research , 52 (5), 1334–1352.
  • Lloyd James, A. (1940). Speech signals in telephony . London, UK: Pitman & Sons.
  • Lohfink, G. , Katsika, A. , & Arvaniti, A. (2019). Variability and category overlap in the realization of intonation . In S. Calhoun , P. Escudero , M. Tabain , & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 . Canberra, Australia: Australasian Speech Science and Technology Association.
  • Loukina, A. , Kochanski, G. , Rosner, B. , Keane, E. , & Shih, C. (2011). Rhythm measures and dimensions of durational variation in speech. The Journal of the Acoustical Society of America , 129 (5), 3258–3270.
  • Loutrari, A. , Tselekidou, F. , & Proios, H. (2018). Phrase-final words in Greek storytelling speech: A study on the effect of a culturally-specific prosodic feature on short-term memory. Journal of Psycholinguistic Research , 47 (4), 947–957.
  • Lowit, A. (2014). Quantification of rhythm problems in disordered speech: A re-evaluation. Philosophical Transactions of the Royal Society B , 369 (1658).
  • Maeda, S. (1976). A characterization of American English intonation (Unpublished doctoral dissertation). MIT.
  • Magne, C. , Astésano, C. , Aramaki, M. , Ystad, S. , Kronland-Martinet, R. , & Besson, M. (2007). Influence of syllabic lengthening on semantic processing in spoken French: Behavioral and electrophysiological evidence. Cerebral Cortex , 17 , 2659–2668.
  • Malikouti-Drachman, A. , & Drachman, G. (1992). Greek clitics and lexical phonology. In W. U. Dressler , H. C. Luschützky , O. E. Pfeiffer , & J. R. Rennison (Eds.), Phonologica 1988 (pp. 197–206). Cambridge, UK: Cambridge University Press.
  • Maskikit-Essed, R. , & Gussenhoven, C. (2016). No stress, no pitch accent, no prosodic focus: The case of Ambonese Malay. Phonology , 33 (2), 353–389.
  • Mattys, S. L. , & Melhorn, J. F. (2005). How do syllables contribute to the perception of spoken English? Insight from the migration paradigm. Language and Speech , 48 (2), 223–253.
  • Menn, L. , & Boyce, S. (1982). Fundamental frequency and discourse structure. Language and Speech , 25 , 341–383.
  • Moon-Hwan, C. (2004). Rhythm typology of Korean speech. Cognitive Processing , 5 , 249–253.
  • Moore, B. C. J. (2012). An introduction to the psychology of hearing (6th ed.). Bingley, UK: Emerald Group.
  • Munson, B. , McDonald, E. C. , DeBoe, N. L. , & White, A. R. (2006). The acoustic and perceptual bases of judgments of women and men’s sexual orientation from read speech. Journal of Phonetics , 34 , 202–240.
  • Murty, L. , Otake, T. , & Cutler, A. (2007). Perceptual tests of rhythmic similarity: I. Mora rhythm. Language and Speech , 50 , 77–99.
  • Myers, S. (2003). F0 timing in Kinyarwanda. Phonetica , 60 , 71–97.
  • Nábělek, I. V. , Nábělek, A. K. , & Hirsh, I. J. (1970). Pitch of tone bursts of changing frequency. Journal of the Acoustical Society of America , 48 , 536–553.
  • Nakai, S. , Kunnari, S. , Turk, A. , Suomi, K. , & Ylitalo, R. (2009). Utterance-final lengthening and quantity in Northern Finnish. Journal of Phonetics , 37 , 29–45.
  • Nazzi, T. , Bertoncini, J. , & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception Performance , 24 , 756–766.
  • Nazzi, T. , Jusczyk, P. W. , & Johnson, E. K. (2000). Language discrimination by English-learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language , 43 , 1–19.
  • Nazzi, T. , & Ramus, F. (2003). Perception and acquisition of linguistic rhythm by infants. Speech Communication , 41 , 233–243.
  • Nespor, M. , & Vogel, I. (1986). Prosodic phonology . Dordrecht, The Netherlands: Foris.
  • Niebuhr, O. (2012). At the edge of intonation: The interplay of utterance-final F0 movements and voiceless fricative sounds in German. Phonetica , 69 , 7–21.
  • Nolan, F. (1992). The descriptive role of segments: Evidence from assimilation. In G. J. Doherty & D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 261–280). Cambridge, UK: Cambridge University Press.
  • Nolan, F. (2003). Intonational equivalence: An experimental evaluation of pitch scales. In M. J. Solé , D. Recasens , & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences , Barcelona 3–9 August 2003 (pp. 771–774). Barcelona, Spain: ICPhS Organizing Committee.
  • Nolan, F. , & Asu, E. L. (2009). The Pairwise Variability Index and coexisting rhythms in language. Phonetica , 66 , 64–77.
  • O’Connor, J. D. , & Arnold, G. F. (1973). Intonation of colloquial English . London, UK: Longman.
  • Ortega-Llebaria, M. , Hong, G. , & Fan, Y. (2013). English speakers’ perception of Spanish lexical stress: Context-driven L2 stress perception. Journal of Phonetics , 41 (3–4), 186–197.
  • Ortega-Llebaria, M. , Nemogá, M. , & Presson, N. (2017). Long-term experience with a tonal language shapes the perception of intonation in English words: How Chinese-English bilinguals perceive “Rose?” vs. “Rose”. Bilingualism: Language and Cognition , 20 (2), 367–383.
  • Ortega-Llebaria, M. , & Prieto, P. (2007). Disentangling stress from accent in Spanish: Production patterns of the stress contrast in deaccented syllables. In P. Prieto , J. Mascaró , & M. J. Solé (Eds.), Segmental and prosodic issues in Romance phonology (pp. 155–176). Amsterdam, The Netherlands: John Benjamins.
  • Ortega-Llebaria, M. , & Prieto, P. (2011). Acoustic correlates of stress in Central Catalan and Castilian Spanish. Language and Speech , 54 (1), 1–25.
  • Otake, T. , Hatano, G. , Cutler, A. , & Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language , 32 , 358–378.
  • Pellegrino, F. , Coupé, C. , & Marsico, E. (2011). A cross-language perspective on speech information rate. Language , 87 , 539–558.
  • Peng, S. , Chan, M. K. M. , Tseng, C. , Huang, T. , Lee, O. K. , & Beckman, M. E. (2005). Towards a pan-Mandarin system for prosodic transcription. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 230–270). Oxford, UK: Oxford University Press.
  • Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation (Dissertation). MIT. Published 1988, Bloomington, IN: IULC.
  • Pierrehumbert, J. B. (1981). Synthesizing intonation. Journal of the Acoustical Society of America , 70 , 985–995.
  • Pierrehumbert, J. , & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen , J. L. Morgan , & M. E. Pollack (Eds.), Intentions in communication (pp. 271–311). Cambridge, MA: MIT Press.
  • Pons, F. , & Bosch, L. (2010). Stress pattern preference in Spanish-learning infants: The role of syllable weight. Infancy , 15 (3), 223–245.
  • Post, B. , & Payne, E. (2018). Speech rhythm in development: What is the child acquiring? In P. Prieto & N. Esteve-Gibert (Eds.), The development of prosody in first language acquisition (pp. 125–144). Amsterdam, The Netherlands: John Benjamins.
  • Prieto, P. (2005). Stability effects in tonal clash contexts in Catalan. Journal of Phonetics , 33 (2), 215–242.
  • Prieto, P. (2009). Tonal alignment patterns in Catalan nuclear falls. Lingua , 119 , 865–880.
  • Prieto, P. , Shih, C. , & Nibert, H. (1996). Pitch downtrend in Spanish. Journal of Phonetics , 24 , 445–473.
  • Prieto, P. , van Santen, J. , & Hirschberg, J. (1995). Tonal alignment patterns in Spanish. Journal of Phonetics , 23 , 429–451.
  • Protopapas, A. , Panagaki, E. , Andrikopoulou, A. , Gutiérrez Palma, N. , & Arvaniti, A. (2016). Priming stress patterns in word recognition. Journal of Experimental Psychology: Human Perception and Performance , 42 (11), 1739–1760.
  • Qin, Z. , Chien, Y.-F. , & Tremblay, A. (2017). Processing of word-level stress by Mandarin-speaking second-language learners of English. Applied Psycholinguistics , 38 , 541–570.
  • Ramus, F. , Dupoux, E. , & Mehler, J. (2003). The psychological reality of rhythm class: Perceptual studies. In M. J. Solé , D. Recasens , & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences , Barcelona 3–9 August 2003 (pp. 337–340). Barcelona, Spain: ICPhS Organizing Committee.
  • Ramus, F. , Nespor, M. , & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition , 73 , 265–292.
  • Recasens, D. , & Espinosa, A. (2005). Articulatory, positional and coarticulatory characteristics for clear /l/ and dark /l/: Evidence from two Catalan dialects. Journal of the International Phonetic Association , 35 (1), 1–25.
  • Renwick, M. E. L. (2013). Quantifying rhythm: Interspeaker variation in %V. Proceedings of Meetings on Acoustics (POMA) , 14 , 060011.
  • Rietveld, A. C. M. , & Gussenhoven, C. (1985). On the relation between pitch excursion size and prominence. Journal of Phonetics , 13 , 299–308.
  • Roach, P. (1982). On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In D. Crystal (Ed.), Linguistic controversies: Essays in linguistic theory and practice in honour of F. R. Palmer (pp. 73–79). London, UK: Edward Arnold.
  • Rogers, D. , & D’Arcangeli, L. (2004). Italian. Journal of the International Phonetic Association , 34 (1), 117–121.
  • Rothermich, K. , Schmidt-Kassow, M. , Schwartze M. , & Kotz, S. A. (2010). Event-related potential responses to metric violations: Rules versus meaning. NeuroReport , 21 (8), 580–584.
  • Schmidt-Kassow, M. , & Kotz, S. A. (2008). Event-related brain potentials suggest a late interaction of meter and syntax in the P600. Journal of Cognitive Neuroscience , 21 (9), 1693–1708.
  • Selkirk, E. (1980). Prosodic domains in phonology: Sanskrit revisited. In M. Aronoff (Ed.), Juncture (pp. 107–129). Saratoga, CA: Anma Libri.
  • Selkirk, E. O. (1984). Phonology and syntax: The relationship between sound and structure . Cambridge, MA: MIT Press.
  • Silverman, K. , Beckman, M. , Pitrelli, J. , Ostendorf, M. , Wightman, C. , Price, P. , . . . Hirschberg, J. (1992). ToBI: A standard for labelling English prosody. In Proceedings of the 1992 International Conference on Spoken Language Processing, 12–16 October, Banff . Banff, AB: ISCA.
  • Silverman, K. , & Pierrehumbert, J. (1990). The timing of prenuclear high accents in English. In J. Kingston & M. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and physics of speech (pp. 72–106). Cambridge, UK: Cambridge University Press.
  • Skoruppa, K. , Cristià, A. , Peperkamp, S. , & Seidl, A. (2011). English-learning infants’ perception of word stress patterns. Journal of the Acoustical Society of America , 130 (1), EL50–EL55.
  • Skoruppa, K. , Pons, F. , Christophe, A. , Bosch, L. , Dupoux, E. , Sebastián-Gallés, N. , . . . Peperkamp, S. (2009). Language-specific stress perception by 9-month-old French and Spanish infants. Developmental Science , 12 , 914–919.
  • Sluijter, A. M. C. , & van Heuven, V. J. (1996). Spectral balance as an acoustic correlate of linguistic stress. The Journal of the Acoustical Society of America , 100 , 2471–2485.
  • Smiljanic, R. (2006). Early vs. late focus: Pitch-peak alignment in two dialects of Serbian and Croatian. In L. Goldstein , D. H. Whalen , & C. T. Best (Eds.), Laboratory phonology (Vol. 8, pp. 495–518). Berlin, Germany: Mouton de Gruyter.
  • Smiljanic, R. , & Bradlow, A. (2008). Temporal organization of English clear and conversational speech. Journal of the Acoustical Society of America , 124 (5), 3171–3182.
  • Soto-Faraco, S. , Sebastián-Gallés, N. , & Cutler, A. (2001). Segmental and suprasegmental mismatch in lexical access. Journal of Memory and Language , 45 , 412–432.
  • Stetson, R. H. (1951). Motor phonetics: A study of speech movements in action . Amsterdam, The Netherlands: North-Holland Publishing Company.
  • Stevens, S. S. , & Volkmann, J. (1940). The relation of pitch to frequency: A revised scale. The American Journal of Psychology , 53 (3), 329–353.
  • Suomi, K. , & Ylitalo, R. (2004). On durational correlates of word stress in Finnish. Journal of Phonetics , 32 (1), 35–63.
  • Tilsen, S. , & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. The Journal of the Acoustical Society of America , 134 (1), 628–639.
  • Titze, I. R. (1994). Principles of voice production . Englewood Cliffs, NJ: Prentice-Hall.
  • Truckenbrodt, H. (2002). Upstep and embedded register levels. Phonology , 19 , 77–120.
  • Turk, A. E. , & Shattuck-Hufnagel, S. (2000). Word-boundary-related duration patterns in English. Journal of Phonetics , 28 , 397–440.
  • Tzakosta, M. (2004). Acquiring variable stress in Greek: An Optimality-Theoretic approach. Journal of Greek Linguistics , 5 , 97–125.
  • Vaissière, J. (1991). Rhythm, accentuation and final lengthening in French. In J. Sundberg , L. Nord , & R. Carlson (Eds.), Music, language, speech and brain: Proceedings of an international symposium at the Wenner-Gren Center, Stockholm, 5–8 September 1990 . London, UK: Palgrave.
  • Van Bezooijen, R. (1995). Sociocultural aspects of pitch differences between Japanese and Dutch women. Language and Speech , 38 (3), 253–265.
  • Venditti, J. J. (2005). The J_ToBI model of Japanese intonation. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 172–200). Oxford, UK: Oxford University Press.
  • Wagner, P. S. , & Dellwo, V. (2004). Introducing YARD (Yet Another Rhythm Determination) and re-introducing isochrony to rhythm research. In B. Bel & I. Marlien (Eds.), Proceedings of Speech Prosody 2004 , Nara, Japan , March 23–26, 2004.
  • Waksler, S. (2001). Pitch range and women’s sexual orientation. Word , 52 , 69–77.
  • Warren, P. (2005). Patterns of late rising in New Zealand English: Intonational variation or intonational change? Language Variation and Change , 17 (2), 209–230.
  • Welby, P. , & Loevenbruck, H. (2006). Anchored down in Anchorage: Syllable structure and segmental anchoring in French. Italian Journal of Linguistics , 18 , 74–124.
  • White, L. , & Mattys, S. L. (2007). Calibrating rhythm: First language and second language studies. Journal of Phonetics , 35 , 501–522.
  • White, L. , Mattys, S. L. , & Wiget, L. (2012). Language categorization by adults is based on sensitivity to durational cues, not rhythm class. Journal of Memory and Language , 66 , 665–679.
  • Wiget, L. , White, L. , Schuppler, B. , Grenon, I. , Rauch, O. , & Mattys, S. L. (2010). How stable are acoustic metrics of contrastive speech rhythm? The Journal of the Acoustical Society of America , 127 , 1559–1569.
  • Williams, B. (1985). Pitch and duration in Welsh stress perception: The implications for intonation. Journal of Phonetics , 13 (4), 381–406.
  • Wingfield, A. , Lahar, C. J. , & Stine, E. A. L. (1989). Age and decision strategies in running memory for speech: Effects of prosody and linguistic structure. Journal of Gerontology , 44 (4), P106–P113.
  • Witteman, J. , van Heuven, V. J. , & Schiller, N. O. (2012). Hearing feelings: A quantitative meta-analysis on the neuroimaging literature of emotional prosody perception. Neuropsychologia , 50 (2), 2752–2763.
  • Wong, W. Y. P. , Chan, M. K. M. , & Beckman, M. E. (2005). An autosegmental-metrical analysis and prosodic annotation conventions for Cantonese. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 271–300). Oxford, UK: Oxford University Press.
  • Woodrow, H. (1951). Time perception. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1224–1236). New York, NY: Wiley.
  • Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the Acoustical Society of America , 95 , 2240–2253.
  • Xu, Y. (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication , 46 (3–4), 220–251.
  • Xu, Y. (2013). ProsodyPro—A tool for large-scale systematic prosody analysis. In B. Bigi & D. Hirst (Eds.), Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), Aix-en-Provence, France (pp. 7–10). Aix-en-Provence, France: Laboratoire Parole et Langage.
  • Yakup, M. , & Sereno, J. A. (2016). Acoustic correlates of lexical stress in Uyghur. Journal of the International Phonetic Association , 46 (1), 61–77.
  • Yuasa, I. P. (2008). Culture and gender of voice pitch: A sociophonetic comparison of the Japanese and Americans . London, UK: Equinox.
  • Yuen, I. (2007). Declination and tone perception in Cantonese. In C. Gussenhoven & T. Riad (Eds.), Tones and tunes (pp. 63–77). Berlin, Germany: Mouton de Gruyter.
  • Zsiga, E. C. (1995). An acoustic and electropalatographic study of lexical and postlexical palatalization in American English. In B. Connell & A. Arvaniti (Eds.), Phonology and phonetic evidence (pp. 282–302). Cambridge, UK: Cambridge University Press.
  • Zsiga, E. C. (1997). Features, gestures, and Igbo vowels: An approach to the phonology-phonetics interface. Language , 73 (2), 227–274.

1. Fusion is evident in the impression that the noise of machine guns is a cadence (as implied by the oft-quoted metaphor of French machine-gun rhythm, first mentioned in Lloyd James, 1940 ). If slowed down by a factor of four, however, each machine-gun beat is in fact a complex sequence of sounds, closer to that of a beating heart, that is, an iamb.

2. Analyses differ regarding the typology of languages with tonal phenomena at the lexical level. For some, such languages form one category, in that they all use F0 to encode lexical meaning. They differ primarily in terms of the frequency of tonal specifications: at one end of the continuum we find languages like Cantonese, in which every syllable is specified for tone, while at the other end we find languages like Japanese, in which only one syllable per word in a subset of the lexicon is tonally specified (e.g., Beckman & Venditti, 2011 ). Others make a typological distinction between tonal languages in which more than one syllable is specified for tone and languages with pitch accent, such as Japanese, in which only one syllable is thus specified (e.g., Hyman, 2006 ). A discussion of this topic is beyond the scope of this article.

3. AM is often confused with ToBI (Tones and Break Indices), a framework based on AM and developed for the prosodic annotation of spoken corpora. ToBI was first developed for the annotation of American English (Silverman et al., 1992 ). It has since been revised and renamed to clarify that it is designed for Mainstream American English (Brugos, Shattuck-Hufnagel, & Veilleux, 2006 ). Additional versions adapted to the needs of other languages have been developed in the past 25 years as well (see Jun, 2005b , and Jun, 2014 ). ToBI is not the prosodic equivalent of the International Phonetic Alphabet (Beckman, Hirschberg, & Shattuck-Hufnagel, 2005 ): it does not provide universal off-the-shelf categories and requires an AM analysis before it can be implemented for a new language. Jun and Fletcher ( 2014 ) and Arvaniti ( 2016 ) provide guidance on how to develop such an analysis.

Related Articles

  • Articulatory Phonetics
  • Syntax–Phonology Interface

Printed from Oxford Research Encyclopedias, Linguistics. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).


Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

  • Yusuyin, Saierdaer
  • Zhao, Wenbo
  • Ou, Zhijian

There are three main approaches to multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pre-training with phonetic transcription, supervised pre-training with graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. This paper explores the approach of pre-training with weakly phonetic supervision towards data-efficient MCL-ASR, which is called Whistle. We relax the requirement of gold-standard human-validated phonetic transcripts, and obtain International Phonetic Alphabet (IPA) based transcription by leveraging the LanguageNet grapheme-to-phoneme (G2P) models. We construct a common experimental setup based on the CommonVoice dataset, called CV-Lang10, with 10 seen languages and 2 unseen languages. A set of experiments is conducted on CV-Lang10 to compare, as fairly as possible, the three approaches under the common setup for MCL-ASR. Experiments demonstrate the advantages of phoneme-based models (Whistle) for MCL-ASR, in terms of speech recognition for seen languages, crosslingual performance for unseen languages with different amounts of few-shot data, overcoming catastrophic forgetting, and training efficiency. It is found that when training data is more limited, phoneme supervision can achieve better results compared to subword supervision and self-supervision, thereby providing higher data efficiency. To support reproducibility and promote future research along this direction, we will release the code, models and data for the whole pipeline of Whistle at https://github.com/thu-spmi/CAT upon publication.
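To illustrate the G2P step the abstract describes, here is a minimal dictionary-based sketch. This is a toy lookup for illustration only, not the trained LanguageNet G2P models the paper uses; the lexicon entries are invented.

```python
# Minimal dictionary-based grapheme-to-phoneme (G2P) sketch.
# Real G2P systems are trained models covering whole languages;
# this toy lookup only illustrates mapping orthography to IPA phonemes.
TOY_LEXICON = {
    "cat": ["k", "æ", "t"],
    "dog": ["d", "ɔ", "g"],
    "speech": ["s", "p", "i", "tʃ"],
}

def g2p(sentence):
    """Map each word to its IPA phoneme sequence; unknown words get <unk>."""
    phones = []
    for word in sentence.lower().split():
        phones.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phones

print(g2p("cat speech"))  # ['k', 'æ', 't', 's', 'p', 'i', 'tʃ']
```

Because the phoneme inventory (IPA) is shared across languages, transcripts produced this way let acoustically similar sounds in different languages map to the same training targets, which is the information-sharing advantage the paper argues for.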

  • Computer Science - Sound;
  • Computer Science - Computation and Language;
  • Electrical Engineering and Systems Science - Audio and Speech Processing

What Is the Science of Reading?

Evidence-based research on what really works for kids.

When it comes down to it, reading might be the most important skill kids learn in school. Being a fluent reader opens up endless opportunities for lifelong learning. That’s why schools and teachers everywhere are constantly trying to improve the way they teach this fundamental skill. One phrase that’s emerged in recent years is the “science of reading.” But what is the science of reading? How does it help teachers and students? Here’s an overview.

What is the science of reading?

Diagram of a human brain showing the parts involved in various reading skills

Throughout the last 40 years or so, there have been tens of thousands of studies into teaching and learning reading in multiple languages and countries. The science of reading compiles evidence from those studies to help us truly understand the best ways to teach and learn reading. The NWEA website describes it this way:

The science of reading is the converging evidence of what matters and what works in literacy instruction, organized around models that describe how and why.

Rather than guessing and experimenting with what might work, teachers use a structured learning approach that has been proven to be successful. Students get research-backed methods of helping them master this vital skill. Most importantly, the methods work well with all types of students, including (perhaps even especially) those who struggle.

The ultimate goal for students is reading comprehension—being able to identify the words individually AND understand what they mean altogether, fluently and efficiently.

Is there a video about the Science of Reading?

We created a video on this topic, featuring teacher and reading expert Hilary Statum. She is an ESL teacher and regularly speaks on this topic (learn more about her here). Her video is perfect for sharing with families and community members because it sums up the science of reading in just two minutes.

What are the key elements of the science of reading?

Colorful image of the five pillars of literacy: phonemic awareness, phonics, fluency, vocabulary, comprehension

After analyzing all the research, the National Reading Panel identified these five elements as critical to reading comprehension:

Phonics

Phonics is about recognizing letters and letter blends and the sounds they make. Think of a student sounding out letters individually or practicing sounds like “ch” or “st.”

Phonemic Awareness

Phonemic awareness is recognizing that letter sounds and blends put together make up words. When you speak the word “cat,” you don’t say “cuh-a-tuh.” But if you need to figure out how a word is spelled or pronounced, you slow down and sound out each letter or letter blend. That’s phonemic awareness.

Vocabulary

While phonics and phonemic awareness are about being able to say or spell a word, vocabulary is about knowing what a word means. It’s one part of language comprehension. The bigger our vocabularies, the easier and more fluent our reading becomes.

Comprehension

Overall comprehension means understanding words individually, as well as sentences, paragraphs, and texts as a whole. Being able to sound out words is one thing, but without comprehension, reading is meaningless. The science of reading reminds us that comprehension is actually one of the earliest skills kids learn. They practice this skill even when someone else is reading aloud to them!

Fluency

Fluency is putting it all together at the same time. Fluent readers sound out words effortlessly and focus on comprehension and meaning as second nature. They can read with expression and explain what they read without parroting the text.

Which models demonstrate the science of reading?

Diagram showing the simple view of reading: RC = D x LC

Several models help break this all down. One popular option is the simple view of reading: Decoding (D) x Language Comprehension (LC) = Reading Comprehension (RC).

  • Decoding is the process of translating written words into speech, and it incorporates phonics, phoneme awareness, spelling, and sight words.
  • Language comprehension incorporates vocabulary, language structure, background knowledge, and fluency.
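The multiplicative form of the simple view can be made concrete in a few lines of code. This is an illustrative toy, with made-up 0-to-1 skill scores, not a real assessment instrument; its point is that the product captures the model's key claim: if either factor is zero, reading comprehension is zero, no matter how strong the other factor is.

```python
def reading_comprehension(decoding, language_comprehension):
    """Simple view of reading: RC = D x LC.

    Both inputs are toy skill scores between 0 and 1. Because the model
    multiplies rather than adds, a zero on either skill zeroes out RC.
    """
    return decoding * language_comprehension

print(reading_comprehension(0.5, 0.5))  # 0.25
print(reading_comprehension(0.0, 1.0))  # 0.0 -- perfect LC cannot rescue absent decoding
```

Contrast this with an additive model, where strong language comprehension could mask an inability to decode; the multiplicative form is why the simple view treats both skill strands as non-negotiable.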

A diagram of Scarborough's Rope, a model that explains skilled literacy

Another well-known model is Scarborough’s Rope, which shows how many strands weave together to form skilled reading. One weak strand can affect the overall rope, so all the skills are equally important. Learn more about Scarborough’s Rope here.

What does it look like in the classroom?

Student using a pointer to practice lists of words using phonics, while a teacher watches (What is the science of reading?)

A science of reading classroom usually follows a structured, sequential curriculum, heavy on phonics. Kids spend a great deal of time learning sounds, blends, phonemes, and more. This enables them to quickly decode any word they come across.

Hands-on practice and repetition are key. Kids see fluent reading modeled for them, then try it on their own. They read one text multiple times, focusing on different elements. For instance, a first read-through might be about decoding: saying the words out loud. The next might focus on vocabulary. And a final read could tackle overall comprehension of the meaning of the text.

Some argue that a science of reading classroom drops the focus on leveled reading, instead striving to give kids the skills that enable them to tackle whatever interests them.

How does balanced literacy compare with the science of reading?

Colorful chart showing The Ladder of Reading (What is the science of reading?)

Balanced literacy isn’t easy to define, but it often includes a focus on “reading cues.” Sometimes you’ll hear the phrase MSV, which stands for meaning, sentence structure, and visual information.

In other words, when readers come across an unfamiliar word, they don’t study the word itself but instead look at words or cues around it (like pictures) to understand it. The idea is that kids should quickly be able to figure out a word and move on, keeping their interest in the text. Leveled reading is another key part of balanced literacy, often along with teaching reading and writing as separate subjects.

If you’ve been teaching reading for a while, you might be thinking, “But I like a balanced literacy approach. I teach some phonics, but I want kids to learn to love reading first! It’s no fun when they have to focus on sounds and letters over and over again.” Maybe. But here’s the thing about balanced literacy practices—the scientific evidence just isn’t there to back them up. Study after study has found that focusing on phonics and vocabulary builds reading comprehension much faster and more effectively than the MSV method.

Of course we want kids to love reading. But they’re more likely to enjoy it when they can learn it with less of a struggle. And advocates of the science of reading approach say their structured methods are more successful. It’s possible to ground kids in phonics and teach them to love books at the same time.

Where can I learn more?

This is just an overview of a very comprehensive topic. Anyone who teaches reading should spend more time learning about the recommended science of reading methods. Here are some places to start:

  • 10 Helpful Science of Reading PD Books for Teachers
  • Florida Center for Reading Research
  • At a Loss for Words (Podcast and Transcript)
  • Carnegie Reading Webinar: Connecting the Sciences of Reading & Learning
  • Science of Reading: The Podcast

Want to talk about the science of reading with other teachers? Join the discussion in the WeAreTeachers HELPLINE group on Facebook.

Plus, check out what makes a good decodable text.



Educators Technology

Innovative EdTech for teachers, educators, parents, and students

Phonemic Awareness Games and Activities

By Med Kharbach, PhD | Last Update: June 5, 2024

In today’s post, I am sharing with you this collection of handy phonemic awareness games and activities inspired by Ehri et al.’s (2001) research paper. My purpose is to offer educators, parents, and guardians a toolkit bursting with engaging, effective strategies to bolster phonemic awareness in young learners.

As we all know, phonemic awareness is a critical stepping stone in the journey toward reading fluency and comprehension. Recognizing this, I’ve curated a selection that aims to ignite curiosity, foster joy, and cultivate a deep understanding of the sounds that form the foundation of our language.

If you want to learn more about phonemic awareness, make sure to check our posts:

  • 5 Practical Tips for Teaching Phonemic Awareness
  • What Is Phonemic Awareness According to Scholars?
  • Phonological Awareness Versus Phonemic Awareness
  • 10 Reasons Why Phonemic Awareness Is Important for Early Literacy Development

Here are our favourite phonemic awareness games to try with your students and kids. The games are arranged into the areas targeted by phonemic awareness instruction: phoneme isolation, phoneme identity, phoneme categorization, phoneme blending, phoneme segmentation, and phoneme deletion.

Phonemic Awareness Games

1. Phoneme Isolation

Phoneme isolation is a foundational skill in the realm of phonemic awareness, guiding children to discern individual sounds within words. This ability lays the groundwork for more complex phonemic skills, such as blending and segmenting, which are pivotal for reading and spelling.

To foster this skill, start with sounds that are easier to identify, typically beginning sounds, as they are more distinct and less likely to be influenced by the sounds that follow. Here are two engaging games to develop this skill:

  • “Phoneme Hopscotch”: This is a dynamic, physically engaging game that combines phoneme isolation with outdoor fun. Create a hopscotch grid using chalk and label each square with a letter or picture representing a word. Children take turns tossing a beanbag or a small stone onto the grid. When a player lands on a square, they must isolate and say the first, middle, or last sound of the word or letter in that square, depending on the round’s focus. For example, if the square has a picture of a “cat” and the focus is on the first sound, the child would say /k/. This game helps with phoneme isolation and encourages physical activity and can be easily adapted to focus on different phoneme positions as children’s skills develop.
  • “Sound Fishing”: “Sound Fishing” turns phoneme isolation into an exciting fishing expedition. Create a “pond” using a blue cloth or paper on the floor, and fill it with fish cutouts that have pictures or words on them. Attach a paperclip to each fish and create a fishing rod with a magnet tied to a string. As children “fish” for a word, they must identify and say the initial, medial, or final sound of the word on their catch. For instance, if a child catches a “fish” with the word “dog” on it, they might be asked to isolate and pronounce the final sound /g/. This game not only sharpens phoneme isolation skills but also adds an element of suspense and discovery to the learning process, making phonemic awareness practice an adventure.

2. Phoneme Identity

Phoneme identity focuses on recognizing the same sounds in different words, an essential skill for developing phonological awareness and an understanding that different words can share common sounds. This recognition is key for learning to decode and spell words by patterns rather than memorizing them as wholes.

To develop phoneme identity, begin with activities that highlight the initial sounds of words, as these are typically the easiest for children to identify. Use visual aids, such as pictures or objects, to support the auditory aspect of the activities. Here are two engaging games to develop this skill:

  • “Sound Bingo”: “Sound Bingo” offers a fun, interactive twist on the classic Bingo game, focusing on phoneme identity. Create Bingo cards with pictures or words that have common initial, medial, or final sounds. Instead of calling out numbers, the leader calls out a phoneme (/m/, /t/, /s/, etc.), and players must identify a picture or word on their card that contains that sound. When they find a match, they cover the picture or word with a marker. The first player to cover a predetermined pattern (a line, X, or full house) shouts “Bingo!” and wins. This game effectively reinforces phoneme identity by encouraging children to listen for and recognize the same sounds in different words, enhancing their phonological awareness in a communal, competitive setting.
  • “Sound Storytime”: “Sound Storytime” turns the act of reading into an interactive phoneme identity game. Choose a book with repetitive sounds or words and read it aloud to the children. Assign a specific sound for the children to listen for (/s/, /r/, /m/, etc.), and each time they hear a word with that sound, they perform a predetermined action (clap, stand up, make a gesture related to the sound, etc.). For example, if the sound is /s/ and the word “sun” comes up in the story, the children would perform the action. This game sharpens listening skills and integrates phoneme identity practice with literacy, making storytime even more engaging and educational.

Related: Phonics Games and Activities

3. Phoneme Categorization

Phoneme categorization sharpens a child’s auditory discrimination skills by asking them to identify the word that sounds different in a small group. This strategy is excellent for developing critical listening skills and an understanding that words are made up of sounds that can be manipulated and analyzed.

To develop phoneme categorization, start with words where the differing sound is at the beginning, as these are typically more straightforward for children to identify. Gradually progress to differences in the middle or end of words. Here are two engaging games to develop this skill:

  • “Musical Sounds”: “Musical Sounds” is an auditory twist on the classic game of musical chairs, designed to bolster phoneme categorization skills. In this version, instead of music stopping, you play recordings or say words out loud, and children walk or dance around a circle of chairs. Each round focuses on a set of three or four words where one word has a different initial, medial, or final sound (e.g., “mat, cat, rat, dog”). When the words stop, children must quickly sit down and identify the word that doesn’t belong with the others. To make it more inclusive and focus on the learning aspect, you can opt to not remove chairs, ensuring every child remains engaged throughout the game. This activity enhances listening and phonemic awareness and keeps children physically active and engaged.
  • “Sound Sorting”: “Sound Sorting” is a hands-on activity that reinforces phoneme categorization through a tactile and visual approach. Create cards with pictures or words that mostly have a common sound but include one or two that diverge. Divide the children into small groups and give each group a set of cards. Challenge them to sort the cards into two piles: those that share the same sound and the one(s) that don’t belong. For example, you might provide a set like “fan, van, pan, dog,” where “dog” is the outlier. After sorting, each group explains their reasoning, allowing for discussion and reinforcing the concept. This game helps children practice phoneme categorization and encourages teamwork and verbal expression, as they must articulate why certain words do not fit with the others.

4. Phoneme Blending

Phoneme blending is the ability to hear individual sounds (phonemes) and blend them together to form a word. This skill is crucial for reading, as it allows children to decode unfamiliar words by sounding them out.

To develop phoneme blending, begin with two-sound words (like “at” or “up”), gradually moving to words with three sounds, then four. Use a slow, clear voice to pronounce each sound distinctly, pausing slightly between sounds to give children time to process. Here are two engaging games to develop this skill:

  • “Sound Train”: “Sound Train” is an engaging game that brings the concept of phoneme blending to life through the imagery of a train. Create train carriages from blocks, paper, or even digital slides, each bearing a single phoneme. Start with simple words by connecting two or three carriages (blocks or slides) together, each displaying a phoneme (/c/ /a/ /t/). Children act as the train’s engineer, moving the train along the track as they blend the sounds together to form a word. As they become more proficient, add more carriages to build longer words. This visual and kinesthetic approach helps children understand that when individual sounds (carriages) are linked (blended) together, they form words. The action of moving the train along reinforces the concept of blending sounds in sequence, making it a memorable and effective learning experience.
  • “Blend & Reveal”: “Blend & Reveal” is a captivating game that uses mystery and suspense to practice phoneme blending. Prepare a series of cards or digital slides with individual phonemes on them (/s/, /p/, /i/, /d/, /e/, /r/). Present the sounds to the children one at a time, encouraging them to blend the sounds together mentally. After all sounds have been presented, reveal a picture or the written word that corresponds to the blended sounds. This moment of revelation provides a rewarding sense of accomplishment. To increase engagement, use themes or characters that interest the children, turning each round into a mini-adventure where blending sounds unlocks the mystery word. This game improves phoneme blending skills and also enhances listening, attention, and memory, making it a holistic educational activity.
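For readers who like to tinker, the blending task these two games practice can be mimicked in a few lines of Python. This is a toy sketch: the word list and the letter-style sound spellings are invented for illustration, echoing the /c/ /a/ /t/ notation used in the games above.

```python
# Toy phoneme blending: a sequence of sounds either blends into a known
# word or it does not, mirroring the "carriages form a word" idea.
WORDS = {
    ("c", "a", "t"): "cat",
    ("s", "p", "i", "d", "e", "r"): "spider",
}

def blend(phonemes):
    """Blend a sequence of sounds into a word, if one matches; else None."""
    return WORDS.get(tuple(phonemes))

print(blend(["c", "a", "t"]))  # cat
```

The dictionary lookup makes the core idea explicit: blending is an ordered operation, so the same sounds in a different sequence do not form the same word.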

Related: Phonological Awareness Vs Phonics

5. Phoneme Segmentation

Phoneme segmentation involves breaking a word down into its individual sounds. This skill is crucial for spelling, as it requires the learner to identify and sequence the sounds in words.

To develop phoneme segmentation, start with simple, short words that have clear, distinct sounds. Encourage children to move from segmenting two-sound words to three-sound words, and then to more complex words as their confidence grows. Here are two engaging games to develop this skill:

  • “Word Builders”: “Word Builders” is an interactive game that reinforces phoneme segmentation by allowing children to become architects of words. Use letter tiles, magnets, or cards with letters on them for this activity. Present a word aloud, such as “mat,” and ask the children to select the correct letters and place them in order to construct the word. As they pick each letter, encourage them to say the sound it represents, effectively segmenting the word into its phonemes (/m/ /a/ /t/). This hands-on approach not only helps with understanding phoneme segmentation but also bridges the gap to spelling and reading. To increase the challenge, include words of increasing length and complexity, or introduce a timer for added excitement.
  • “Phoneme Jump”: “Phoneme Jump” turns phoneme segmentation into a physical activity, making it especially appealing for energetic learners. Create a series of stepping stones or spots on the floor using paper, mats, or tape. Each spot represents a sound in a word. Say a word aloud, like “frog,” and have the children jump on the correct number of spots as they segment the word into its phonemes (/f/ /r/ /o/ /g/). With each jump, they say the sound that corresponds to that part of the word. This game not only helps with auditory processing and phoneme segmentation but also integrates gross motor skills, making learning more dynamic and memorable. For an added challenge, vary the words’ complexity or speed up the pace, encouraging quick thinking and active engagement.

6. Phoneme Deletion

Phoneme deletion tests a child’s ability to mentally manipulate phonemes within words by removing specific sounds. This skill is a bit more advanced and challenges students to think about words in a flexible way.

To develop phoneme deletion, begin with deleting initial sounds, as these are usually the easiest for children to identify and remove. Progress to deleting final and then medial sounds. Use familiar words to make this task more accessible. Here are two engaging games to develop this skill:

  • “Sound Snip”: “Sound Snip” is a creative and interactive game designed to practice phoneme deletion in a way that feels like a craft project. Provide children with word cards and scissors (real or imaginary for safety). Each card has a picture and the word to match. The teacher says a word and then instructs which sound to “snip” away. For example, with the word “clamp,” you might say, “Let’s snip away the /c/ sound. What word do we have left?” Children pretend to cut the initial sound from the word, or they can fold the card to hide the initial letter, revealing the new word “lamp.” This physical action, paired with the visual of the word changing, reinforces the concept of phoneme deletion in a tangible and memorable way.
  • “Sound Stealer”: “Sound Stealer” turns phoneme deletion into a playful, narrative-driven activity. Introduce a character, the “Sound Stealer,” who loves to sneak up and steal sounds from words. Present a series of words, either spoken or with visuals, and then announce which sound the Sound Stealer is going to take. For instance, if the word is “brick,” and the Sound Stealer takes the /b/ sound, the children have to figure out the remaining word is “rick.” To add to the fun, use a toy or a puppet as the Sound Stealer, and have it “take” the sound away, either by moving a piece of the word or covering it up. This game practices phoneme deletion and also engages children’s imagination and storytelling skills, making the learning process more engaging and effective.
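The deletion task in “Sound Snip” and “Sound Stealer” can also be sketched in code. Again this is a toy for illustration: it assumes words are already split into sound lists (as in the games), and the tiny lexicon is invented to mirror the clamp-to-lamp example above.

```python
# Toy "snip the first sound" task: delete the initial phoneme and look up
# whatever word the remaining sounds spell, as in "clamp" -> "lamp".
def snip_first(phonemes, lexicon):
    """Delete the initial phoneme; return the leftover word, if any."""
    rest = tuple(phonemes[1:])
    return lexicon.get(rest)

LEXICON = {
    ("l", "a", "m", "p"): "lamp",
    ("r", "i", "ck"): "rick",
}

print(snip_first(["c", "l", "a", "m", "p"], LEXICON))  # lamp
```

Extending the function to delete final or medial sounds follows the same pattern as the games themselves: the harder positions are just different slices of the phoneme list.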

Final thoughts

Through the playful yet purposeful activities and games we’ve discussed, from “Phoneme Hopscotch” to “Sound Stealer,” we’re not just teaching children to isolate, blend, segment, or delete sounds in words. We are equipping them with tools to help them navigate the complex web of language. Phonemic awareness is the cornerstone of literacy, and by engaging children in these dynamic and interactive games, we’re building their confidence, nurturing their curiosity, and laying a solid foundation for their reading and writing skills.


Meet Med Kharbach, PhD

Dr. Med Kharbach is an influential voice in the global educational technology landscape, with an extensive background in educational studies and a decade-long experience as a K-12 teacher. Holding a Ph.D. from Mount Saint Vincent University in Halifax, Canada, he brings a unique perspective to the educational world by integrating his profound academic knowledge with his hands-on teaching experience. Dr. Kharbach's academic pursuits encompass curriculum studies, discourse analysis, language learning/teaching, language and identity, emerging literacies, educational technology, and research methodologies. His work has been presented at numerous national and international conferences and published in various esteemed academic journals.


May 20, 2024


Rigid approach to teaching phonics is 'joyless' and is failing children in England, experts warn

by Taylor & Francis

Experts have released robust research to show that phonics should be taught hand-in-hand with reading and writing to encourage true literacy and a love of reading, not through narrow synthetic phonics.

There is widespread disagreement globally across academic and educational spheres about the best way to teach children to learn to read and write. Despite a growing international trend towards a narrow approach to synthetic phonics, experts suggest there is a 'better way' to teach reading and writing.

In England, the system is among the most prescriptive in the world, with 'synthetic phonics' being the method required by government. Yet in 2023, the national tests at the end of primary education showed that 27% of children did not meet the standard in reading and 29% did not meet the standard for writing.

The latest research, including a new paper published in Literacy and featured in the upcoming book The Balancing Act, shows that the evidence does not support the efficacy of this approach—and experts are calling for an overhaul of the system to bring 'joy' back into reading.

"We know that being literate not only sets the foundation for better academic and socio-economic outcomes, but also that reading can support personal, social, and emotional development, enabling better mental health and greater capacity for empathy and critical thinking—we must stop letting children down," authors and education experts Dominic Wyse and Charlotte Hacking explain.

Synthetic phonics

Schools in England are required by the national curriculum to teach reading and writing using a system called "synthetic phonics," which involves a narrow emphasis on learning about sounds (phonemes) and the letters that stand for those sounds.

But Wyse and Hacking say these policies are in danger of harming children's education and are "a product of political ideology not in children's best interests."

Although synthetic phonics schemes can now be found in many classrooms across the world, the latest trend initially started in England. Wyse and Hacking explain how in England, teachers face a unique set of pressures to adopt synthetic phonics as the only approach.

They outline how schools are very strongly encouraged by government Department for Education (DfE) policies to buy often expensive commercial synthetic phonics schemes, and the synthetic phonics policy is also enforced by the inspectorate Ofsted. Children aged 6 are tested in phonics and the results are entered into the national pupil database, with those data used to hold the schools and teachers to account.

Wyse explains, "When children in England are about age six (Year 1) they must all sit a national test to decode a list of individual words that includes nonsense words. In 2023 21% of children did not achieve the expected standard—this is despite more than a decade of this synthetic phonics approach. Clearly, it isn't working. Our research shows a more effective way to teach reading and writing."

Previous research, laid out in the book, shows that the potential consequences for children not progressing sufficiently well in reading and writing are profound, and include being less able to access vital services in society, a higher probability of poorer mental health, lower wages, and even ending up in prison.

"Despite a wealth of research and scientific evidence to show there are multiple effective ways to teach reading and writing, the only permitted way is synthetic phonics," Hacking explains.

A balanced approach

In 2022 a landmark research paper by Wyse and Alice Bradbury concluded that a balanced approach to teaching reading was more effective than a narrow synthetic phonics approach.

The paper provided some clear evidence-based recommendations about changes to policy and practice, which were not taken up. The Balancing Act goes further, not least by giving a vivid and detailed account of a new approach to teaching.

Hacking and Wyse, who have extensively examined the evidence, are now promoting a "balanced approach," focusing on using beloved children's texts to systematically teach the key elements that are vital to learn to read and write, including phonics.

This approach to teaching is explicitly built on new analyses of the most robust research studies undertaken to determine the most effective ways to teach phonics, reading, and writing.

They explain: "With this approach, the importance of comprehending and composing the meaning of written language is carefully balanced with the acquisition of a range of skills and knowledge. This enables pupils to see the real purposes for reading and writing.

"Instead of focusing narrowly on the sounds that letters represent, this approach also prioritizes the comprehension of text, the grammar of sentences, and teaching writing to help reading. The balanced approach is about understanding the structure of words and language as a whole." Contrary to a myth that some people have promoted, a balanced approach is not 'whole language' teaching in disguise.

This approach is summarized in a new theory and model, called "The Double Helix of Teaching Reading and Writing." The research underpinning the model has also been published in Literacy.

Using 'real' books

Underpinning this balanced approach is the use of "real books," which are of "outstanding quality, inclusive, and diverse in their representations of people and places."

Under synthetic phonics schemes, children are usually given formulaic "decodable" texts designed to repeat a certain sound to encourage familiarization with the sound and a limited number of simple words. In some stages of the synthetic phonics program, the reading of whole texts may even be discouraged, but Wyse and Hacking believe the emphasis should be much more strongly on comprehending and enjoying real whole texts.

"Delighting in real books brings learning to life. This engages children and sustains their motivation to read and write for real purposes and for pleasure," they explain.

Wyse continues, "Instead of the drive to support money-making from synthetic phonics schemes, our approach puts the work of authors of books for children center stage. Otherwise, children miss out on the art of outstanding illustrator authors, puns, wordplay, imagination, curiosity, creativity and so much more. Our approach is a far cry from narrow synthetic phonics lessons, which even when taught expertly simply haven't the same appeal for children."

"Meaning drives our approach to teaching reading and writing. It is the essence of human language, hence it should be the essence of teaching," Hacking continues. "Teaching about sounds is meaningless unless it is contextualized in words, sentences and whole texts."

The experts argue that motivating children to read and write is foundational to the balanced approach, and it begins with engaging children through high-quality books.

Dominic Wyse et al, Decoding, reading and writing: the double helix theory of teaching, Literacy (2024). DOI: 10.1111/lit.12367

Provided by Taylor & Francis



