Identification of Minimal Pairs of Japanese Pitch Accent in Noise-Vocoded Speech

Research output: Contribution to journal › Article › peer-review


The perception of lexical pitch accent in Japanese was assessed using noise-excited vocoder speech, which contains neither the fundamental frequency (fo) nor its harmonics. While prosodic information such as lexical stress in English and lexical tone in Mandarin Chinese is known to be encoded in multiple acoustic dimensions, such multidimensionality is less well understood for lexical pitch accent in Japanese. In the present study, listeners were tested under four conditions to investigate the contribution of non-fo properties to the perception of Japanese pitch accent: noise-vocoded speech stimuli consisting of either 10 3-ERBN-wide bands or 15 2-ERBN-wide bands, each created from recordings of a male and a female speaker. Listeners identified minimal pairs of final-accented and unaccented words at a rate better than chance in all conditions, indicating the presence of secondary cues to Japanese pitch accent. Subsequent analyses investigated whether the listeners' ability to distinguish minimal pairs was correlated with duration, intensity, or formant information. These analyses revealed no strong or consistent correlation, suggesting that listeners may have used different cues depending on the information available in the stimuli. Furthermore, a comparison of the current results with equivalent studies in English and Mandarin Chinese suggests that, although lexical prosodic information exists in multiple acoustic dimensions in Japanese, the primary cue is more salient than in other languages.
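The two band configurations above can be illustrated with a minimal sketch. It is not the authors' processing chain: it only computes band edges that are equally wide on the ERBN-number scale, using the widely cited Glasberg and Moore (1990) conversion. The lower edge frequency (`LOW_EDGE_HZ`) and the helper names are hypothetical, since the abstract does not state the frequency range that was used.

```python
import math

def hz_to_erb_number(f_hz):
    """Convert frequency in Hz to ERBN-number (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) * 1000.0 / 4.37

def band_edges(low_hz, n_bands, width_erb):
    """Return n_bands + 1 edge frequencies (Hz) for contiguous bands,
    each width_erb wide on the ERBN-number scale, starting at low_hz."""
    start = hz_to_erb_number(low_hz)
    return [erb_number_to_hz(start + i * width_erb) for i in range(n_bands + 1)]

# Hypothetical lower edge; the abstract does not give the actual value.
LOW_EDGE_HZ = 100.0

edges_10 = band_edges(LOW_EDGE_HZ, n_bands=10, width_erb=3.0)  # 10 bands, 3 ERBN each
edges_15 = band_edges(LOW_EDGE_HZ, n_bands=15, width_erb=2.0)  # 15 bands, 2 ERBN each

for name, edges in (("10 x 3 ERBN", edges_10), ("15 x 2 ERBN", edges_15)):
    print(name, [round(e) for e in edges])
```

Because 10 × 3 and 15 × 2 both equal 30 ERBN, the two configurations cover the same overall frequency range when they share a lower edge; they differ only in how finely that range is divided.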

Original language: English
Article number: 887761
Journal: Frontiers in Psychology
Publication status: Published - 2022 May 31


  • Japanese
  • lexical pitch accent
  • noise-vocoded speech
  • secondary cues
  • speech perception

ASJC Scopus subject areas

  • Psychology(all)

