Playing with Praat, part 2: Monophthongs

In the previous article, I discussed vowel space and cardinal vowels, trying to establish a foundation for discussing more practical details of actual languages, such as the pronunciation of vowels. In this article, I will use the data from the previous article to compare vowels in Chinese and Swedish. I originally meant to include English, but I skipped that for reasons that will become apparent below.

The main focus is of course Chinese. Vowels are known to be fairly easy to get roughly right (communication), but very hard to get native-like (accent). This is most likely because they exist in a space rather than on a spectrum (actually, a volume might be a better representation if we include lip-rounding). It’s simply very hard to create an abstract representation of a sound in another language when there are already partly overlapping phonemic categories from your native language (most research suggests that languages interfere with each other in all kinds of ways, thus making it impossible to become like a monolingual native speaker).

So, my main question today is whether or not my Chinese vowels closely resemble my Swedish vowels and whether or not they resemble some kind of model in Chinese. Intuitively, I would say that they don’t resemble Swedish much and that they are roughly correct. Still, at the time of writing this little prelude, I don’t really know, so let’s check it out!

My vowel space

Except for the below graph, I won’t repeat what I wrote in the previous article. This is what my attempt at producing the eight cardinal vowels looked like:

[Figure: cardinal vowel chart]

Although not perfect, this roughly represents the range of sounds I can comfortably produce. Of course, I could possibly refine the blue line a bit by trying to produce vowels between e.g. [a] and [ɑ], but this would probably not change the overall picture by much. Still, as we shall see, some of my Swedish vowels go quite far beyond the confines of the blue line.

Chinese monophthongs

The most common way of counting Chinese monophthongs gives us ten different vowels. There are actually more allophones, but I have to draw the line somewhere, so I’ll stick with the textbook examples. Keeping the blue line from the cardinal vowel chart above, I have plotted F1 and F2 of my monophthongs in Chinese. These were produced in real CV syllables when available (otherwise just V or VC).

[Figure: Chinese monophthongs]

To make comparisons easier, let’s include the F1 and F2 values as well:

Vowel F1/F2
[i] 264/2135
[y] 260/2064
[ɛ] 438/1901
[a] 635/1100
[ǝ] 486/1316
[ɤ] 483/1064
[o] 496/718
[u] 430/683
[͡ɯ] 317/1420
[͡ɨ] 374/1625
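
A chart like the one above could also be redrawn from the table with a short script. The following is only a minimal sketch in Python, assuming matplotlib is installed; it is not how the figures in this article were made, and the names `CHINESE` and `plot_vowels` as well as the output file name are made up for illustration. The two apical vowels are keyed without their diacritics.

```python
# Chinese monophthong formants in Hz as (F1, F2), copied from the table above.
CHINESE = {
    "i": (264, 2135),
    "y": (260, 2064),
    "ɛ": (438, 1901),
    "a": (635, 1100),
    "ǝ": (486, 1316),
    "ɤ": (483, 1064),
    "o": (496, 718),
    "u": (430, 683),
    "ɯ": (317, 1420),  # apical vowel, diacritic omitted
    "ɨ": (374, 1625),  # apical vowel, diacritic omitted
}


def plot_vowels(vowels, path="vowels.png"):
    """Draw an F1/F2 vowel chart and save it to `path` (hypothetical file name)."""
    # Imported here so the data above can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for symbol, (f1, f2) in vowels.items():
        ax.scatter(f2, f1, color="black")
        ax.annotate(f"[{symbol}]", (f2, f1),
                    xytext=(5, 5), textcoords="offset points")
    # Invert both axes: close vowels at the top, back vowels to the right.
    ax.invert_xaxis()
    ax.invert_yaxis()
    ax.set_xlabel("F2 (Hz)")
    ax.set_ylabel("F1 (Hz)")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_vowels(CHINESE)` should write the chart to `vowels.png`. Both axes are inverted so that [u] lands towards the top-right corner, which matches the orientation of the charts in this article.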

Before we start comparing this with other languages, it makes sense to know roughly if these values are in the ballpark to start with. The following diagram is scanned from 朱川’s 2013 book 外國學生漢語語音學習對策, but is originally from 吳宗濟’s 漢語普通話單音節語圖冊 (1986). Based on the frequencies, I would guess this includes both male and female speakers, but I don’t know exactly how these values were elicited. If you happen to know of a better source, please let me know!

[Figure: vowel distribution]

Not all sounds in this diagram are present in my sample and not all sounds present in my sample are in this diagram, but the general picture looks quite promising. All my vowels seem more closed than the samples here, but I think that’s either due to the way I normally speak (in any language) or because of the shape of my oral cavity. I have tried to produce [a] with a higher F1 than 700, but I simply can’t do it. The graphs below compare my vowels (left) to those from 吳宗濟 (right).

[Figure: shape comparison]

I note the following:

  1. My [y] is roughly as closed as my [i], but it’s meant to be slightly more open (I have seen this in numerous references, not just the above graph). This difference ought to be very slight and perhaps not even noticeable.
  2. My null final [͡ɯ] might be a bit too closed. Perhaps this is a result of trying to pronounce this sound as clearly as possible. In any case, I’m pretty sure this isn’t a big problem, since the difference is pretty small. Also, these vowels are apical and therefore might behave differently than normal vowels.
  3. My [u] is pretty open compared to the model. This should actually be immediately obvious when you look at my monophthongs: the [u] is very far from the top-right corner. Still, I think my pronunciation of the cardinal [u] is very exaggerated and quite far from the correct sound in Mandarin. I’m not sure if my [u] is too open or not.

I will add these observations to a list that I will later check with native speakers to see if the difference is significant or not. Still, the most striking observation is that apart from the mentioned oddities above, the shapes are very similar indeed. Now, let’s see how this compares to my Swedish vowels.

A cross-linguistic comparison

The main question I want to answer is whether I rely on my native Swedish vowels when pronouncing Chinese. I don’t think this is the case, but let’s find out! If you don’t know anything about Swedish vowels, I suggest you check out this article on Wikipedia. In short, it’s a lot more complicated than Chinese. Mastering the Swedish vowel system must be a nightmare for native speakers of Chinese! To collect the Swedish data, I simply recorded the same words used in the Wikipedia article. The syllabic environment isn’t identical, but anything I write here contains similar levels of error anyway, so it’ll have to do. Here’s the formant data for my Swedish vowels:

Vowel F1/F2
[iː] 250/2189
[ɪ] 287/2321
[eː] 251/2480
[e] 449/1977
[ɛː] 798/1774
[ɛ] 521/2084
[ɑː] 590/982
[a] 733/1194
[oː] 490/1038
[ɔ] 532/667
[uː] 495/727
[ʊ] 273/657
[ʉː] 275/1727
[ɵ] 385/1178
[yː] 273/2051
[ʏ] 317/1913
[øː] 387/1747
[œ] 441/1561
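
One crude way to put numbers on the comparison is to look up, for each Chinese vowel, the Swedish vowel that lies closest in the F1/F2 plane. The sketch below assumes plain Euclidean distance in Hz, which is a simplification (it ignores lip rounding, vowel length and how the ear weighs F1 against F2), and the names `CHINESE`, `SWEDISH` and `nearest` are just illustrative.

```python
import math

# Formants in Hz as (F1, F2), copied from the two tables in this article.
CHINESE = {
    "i": (264, 2135), "y": (260, 2064), "ɛ": (438, 1901), "a": (635, 1100),
    "ǝ": (486, 1316), "ɤ": (483, 1064), "o": (496, 718), "u": (430, 683),
    "ɯ": (317, 1420), "ɨ": (374, 1625),  # apical vowels, diacritics omitted
}
SWEDISH = {
    "iː": (250, 2189), "ɪ": (287, 2321), "eː": (251, 2480), "e": (449, 1977),
    "ɛː": (798, 1774), "ɛ": (521, 2084), "ɑː": (590, 982), "a": (733, 1194),
    "oː": (490, 1038), "ɔ": (532, 667), "uː": (495, 727), "ʊ": (273, 657),
    "ʉː": (275, 1727), "ɵ": (385, 1178), "yː": (273, 2051), "ʏ": (317, 1913),
    "øː": (387, 1747), "œ": (441, 1561),
}


def nearest(target, candidates):
    """Return (symbol, distance in Hz) of the candidate closest to `target`."""
    return min(
        ((symbol, math.dist(target, position))
         for symbol, position in candidates.items()),
        key=lambda pair: pair[1],
    )


for symbol, position in CHINESE.items():
    match, distance = nearest(position, SWEDISH)
    print(f"Chinese [{symbol}] -> Swedish [{match}] ({distance:.0f} Hz away)")
```

A small distance only says that two measurements happen to be close in this particular sample; it says nothing about why, and it inherits every problem of the measurements themselves.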

I plotted all that into the same graph, but I can’t be bothered to label them all. The most interesting finding is that it seems my vowel space drawn earlier is much too narrow. I have quite a few vowels outside what I thought were my extremes!

[Figure: Swedish and Chinese vowels]

However, when I started comparing Chinese with Swedish, I soon realised that this approach is deeply flawed, and doesn’t work very well for the long vowels. I didn’t realise this before, but the long vowels in Swedish undergo quite a lot of change. For instance, this is a spectrogram of me saying the word “hel” (“whole” in Swedish; the light red area is the vowel):

[Figure: spectrogram of “hel”]

I have no training in Swedish phonetics and I don’t know how to measure this. The first part is quite similar to [i] and the vowel then gradually shifts to [e]. The results will obviously be very different depending on which part of the vowel I choose to measure.
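
One way to make that dependence visible is to read the formants at several fixed proportions of the vowel instead of at a single point. The helper below is only a sketch of the bookkeeping: the name `measurement_times`, the default proportions and the vowel boundaries in the example are all assumptions, and the formant values at each time would still have to be read off in Praat.

```python
def measurement_times(start, end, proportions=(0.2, 0.5, 0.8)):
    """Return time points (in seconds) at the given proportions of a vowel.

    `start` and `end` are the vowel boundaries in seconds; 0.0 means the
    onset of the vowel and 1.0 its offset.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    duration = end - start
    return [start + p * duration for p in proportions]


# Hypothetical boundaries, not measured from the recording of "hel".
for t in measurement_times(0.10, 0.30):
    print(f"measure F1 and F2 at {t:.3f} s")
```

If the values near the start look like [i] and the values near the end look like [e], then a single F1/F2 pair cannot represent the vowel, which is the problem described above.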

There are other oddities as well. How come [ɛː] is considerably more open than my [a] produced earlier, but still roughly in the middle in terms of front-back? There are obviously too many things going on here to make further analysis worthwhile. I think most of this comes from different modes of pronunciation or different linguistic contexts.

Conclusion

Even though this didn’t really turn out as expected, I still learnt a few things about my own voice. First, it seems like I’m definitely capable of producing sounds that are considerably more open than what I produced for the cardinal vowels. Perhaps I should use these sounds as a reference point and try to redraw the vowel space from the previous article?

Second, my Chinese vowels look pretty good when compared with the expected values. I noted three differences that ought to be investigated further: 1) my [i] and [y] differ only in lip rounding in Mandarin (almost exactly the same tongue position), which isn’t the case for the model I used; 2) my [ɯ] is too closed and should be at roughly the same height as [ɨ], although I don’t think this is an issue; 3) my [u] is very open compared with both the model and my Swedish pronunciation, which is very interesting and should be checked, although I’m pretty sure my [u] in Chinese is pretty good.

Third, the method I’m using here isn’t very good. There are several reasons for this, but the most obvious one is that the phonetic environment differs quite a lot between the recordings. In Chinese, I read mostly open syllables (CV), whereas all the Swedish examples on Wikipedia are closed syllables (CVC). The initial consonants are also different, as is vowel length. I simply don’t know enough about Swedish phonetics or acoustic phonetics in general to be able to say how much of an impact this has, but my guess is that it’s pretty large.

Thus, I will abandon my attempts at comparing between different languages for now and just stick with Chinese. In my next article, I will look at diphthongs and triphthongs (or glides followed by diphthongs if you prefer). Stay tuned!


  1. Pawel

    Hi Olle,

    Interesting articles. I just skimmed through this one, but I like the idea of using Praat to analyze your voice. I’ve done that too, to check that my tones are all right, and even trained using Praat to get the intonation right.

    As I said, I only quickly scanned your article, but you mentioned some unexpected results in your Swedish vowels and that you also have vowels outside of your vowel space. If you recorded the samples on different days, that could be one reason. You should usually take mean values from many recordings and I don’t know how many you had.
    Also: What words did you use to produce them? Usually you should use minimal pairs or very similarly structured words to get your samples. E.g. if you have one sample of a vowel in a context like CVC, your Cs should be the same or at least similar. If you use a lateral, i.e. /l/, in one recording and a plosive, e.g. /p/, in another, this can lead to unexpected results. Voiced sounds especially influence the target phoneme.
    I actually drew my own vowel space diagram some years ago when I studied computational linguistics. Unfortunately, I have forgotten a lot of the linguistic terms needed to sound more professional, but the best way to collect your samples, in addition to the above, is to embed your words in a sentence and record that sentence. This way you ensure that the articulation is more natural than if you only recorded one word. Of course, the sample should always be in the stressed syllable.

    cheers
    Pawel


    1. Olle Linge

      Hi Pawel! Thank you for your feedback. I’m pretty sure the odd results were due to differences in linguistic context, but the number of samples might also have played a role (I basically only used two and recorded more only if I found them to deviate a lot). The problem with my discussion of vowel space is that I don’t think my attempts to produce the cardinal vowels represent my most extreme vowels; I might produce more open vowels in specific syllables and so on. One problem when choosing linguistic contexts in different languages is of course that some features don’t exist in the other language, so it’s hard to find suitable minimal pairs that work within the language as well as cross-linguistically. I’m pretty sure I can do better than I did here, though, but on the other hand, I think I might be better off moving on and remembering this lesson for later. In short, linguistic context obviously matters a lot and the recording environment, including date, also makes a difference (although all these samples were in fact recorded on one single day). My main problem is with the vowel space part rather than the Swedish/Chinese vowels, I think!


    2. Pawel

      Hi Olle,
      you’ll always have allophones that are more open, closed or whatever depending on the context, so with your simple method I wouldn’t bother too much if your estimated vowel space is maybe smaller than what you produced later.
      After all, what you did is just an estimate of what you do in real life. Even with a lot of recordings it would be only that, an estimate. It’s good to know what is actually going on, though, and I bet you’ll find the insights very helpful in the future. At least I did.


      1. Olle Linge

        Yes, you are perfectly right, I think. I have had very little time to continue this project, but I will at least write some more about tones and using Praat, just not sure when. Thanks for your feedback!


      2. Joe Pickett

        If your range is not extending far enough down in F1, you should consider that the Hz values depend on the size of your vocal tract. Males will have “smaller” Hz values because of the larger size of their body -> deeper voice. Peterson and Barney’s values from 1952 show this with averages for American English vowels from men, women, and children. Though I read the article, I was not clear why you chose not to include English.
