
I have played around with Praat a bit this semester and have previously published two articles about my adventures in the land of phonetics, one about basic vowel space and one about monophthongs in different languages. I originally intended to write several articles, gradually building up to a guide for how to identify Mandarin syllables in Praat, but since I ran out of time, I’m jumping ahead in the series and publishing this article now. If you want to try any of this yourself, you can download Praat here.

Identifying Mandarin syllables in Praat

To learn more about Chinese phonetics, I have been playing a little game with myself. I have a large number (1000+) of Chinese syllables recorded by a female speaker. I load one of the syllables into Praat at random without looking, and the goal is to figure out which syllable it is only by looking at the spectrogram and waveform.

This is quite possible, although 100% accuracy is probably not achievable because some sounds are too hard to tell apart. I haven’t kept a detailed record of my score, but I think I get it completely right slightly more than 50% of the time, and when I’m wrong, it’s usually only by a little, such as mistaking “tán” for “pán” or similar.

The goal of writing this guide is primarily to help myself understand what I’m doing. It’s of course possible that someone else finds it useful, but probably not very many people. This guide is basically a long discussion of what I do when I (rather successfully) identify Mandarin syllables just by looking at the spectrogram and waveform.

If you have suggestions for how to improve the guide or references that can help me improve my accuracy, let me know! Also note that I’m no expert, so please report any errors you find. I have taken a few courses in Chinese phonetics, but that’s about it; the rest I’ve learnt on my own, mostly in Chinese, so sometimes I might use inaccurate vocabulary in English, but it should be okay. Let’s get started!

Table of contents

  1. Step 1 – Tone
  2. Step 2 – Syllable structure
  3. Step 3 – Identify sounds
  4. Spectrogram challenge
  5. Conclusion
  6. References and further reading

Step 1 – Tone

I usually start with the tone because it’s the easiest part. Basic knowledge of the contours of Chinese tones should be enough for almost all cases; exact F0 (pitch) values seldom need to be considered because the contour is almost always enough. The only potential trap for beginners is failing to recognise that both T2 and T3 fall before they rise. The main difference is that the turning point comes later and is lower for T3. Compare:


Turning point of T2 and T3, both in “ma”.

For the sake of completeness, here are typical cases of T1 and T4 as well:


T1 and T4, both in “ma”.
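The contour rules above can be written down as a crude classifier, if only to make the reasoning explicit. This is a minimal sketch, assuming an F0 track in Hz sampled at evenly spaced points over the voiced part of the syllable (for instance copied from a pitch listing in Praat). The function names and the thresholds `flat_st` and `late_turn` are hypothetical choices, not values from any reference, and would need calibrating for the speaker.

```python
import math


def semitones(f0_hz, ref_hz):
    """Distance in semitones from ref_hz to f0_hz."""
    return 12 * math.log2(f0_hz / ref_hz)


def classify_tone(f0_track, flat_st=1.5, late_turn=0.5):
    """Guess the Mandarin citation tone (1-4) from an F0 contour.

    f0_track: F0 values in Hz, evenly spaced in time over the voiced part.
    flat_st: maximum range (in semitones) still treated as a level tone.
    late_turn: relative position from which a turning point counts as late (T3).
    Both thresholds are illustrative guesses, not measured norms.
    """
    if len(f0_track) < 3:
        raise ValueError("need at least three F0 points")
    st = [semitones(f, f0_track[0]) for f in f0_track]  # relative to the onset
    lowest = min(range(len(st)), key=st.__getitem__)   # index of first minimum
    if max(st) - min(st) < flat_st:
        return 1  # hardly any movement: level T1
    if lowest == len(st) - 1:
        return 4  # lowest point at the very end: falling T4
    # Otherwise the contour rises, possibly after an initial dip: T2 or T3.
    # T3 turns later; the depth of the dip is ignored in this sketch.
    turn_pos = lowest / (len(st) - 1)  # 0.0 = start, 1.0 = end
    return 3 if turn_pos >= late_turn else 2
```

Working in semitones relative to the onset keeps the thresholds independent of whether the speaker has a high or low voice. Only the position of the turning point is used here, not how low it goes.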

Step 2 – Syllable structure

When trying to determine which syllable we’re dealing with, it’s useful to first get a general idea of what kind of syllable structure we’re talking about. The following section isn’t meant to determine exactly what the parts are, but rather to pinpoint the number of sounds and the general syllable structure. This is a productive first step because Mandarin only has slightly more than 400 syllables (tone has already been dealt with in step one) and the structure is very rigid: a full syllable is CGVN (Consonant Glide Vowel Nasal), where all parts are optional except the main vowel. It should of course be noted that most of the possible combinations don’t exist, or don’t exist for certain tones.

Initial consonant: If voiced, e.g. [m n l], it looks like a vowel, but is generally weaker. Stops are usually visible through their releases and fricatives are easy to spot because of the noise-like turbulence. Affricates are combinations of stops and fricatives.

Glides and vowels: There is some controversy in phonology over whether G belongs with the preceding consonant, with the following vowel, or fills a slot of its own, but for our purposes, it’s probably best to consider it a vowel in addition to the main vowel.

Final consonants: Final consonants in Mandarin can only be [n ŋ ɻ]. If there seems to be something significant going on after the vowel has ended, it’s one of these finals. None of the syllables I’ve been playing with contain any [ɻ] finals (known as Erhua), so they are only mentioned briefly in this guide.


[an]: Note the rise in both F2 and F3 towards the end and the cancellation of F2 as the final begins (anti-resonance, see Identifying voiced consonants).
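The CGVN template is rigid enough to enumerate. Since only the main vowel is obligatory, there are 2 × 2 × 2 = 8 possible shapes, and deciding whether you can see an initial, a glide and a final picks out one of them. A minimal sketch (the function names are hypothetical); note that it only encodes the template, not which combinations actually occur in Mandarin:

```python
from itertools import product

SLOTS = ("C", "G", "V", "N")  # Consonant, Glide, Vowel, Nasal (final)


def syllable_templates():
    """Return every CGVN shape in which only the main vowel is obligatory."""
    templates = []
    for has_c, has_g, has_n in product((False, True), repeat=3):
        present = {"C": has_c, "G": has_g, "V": True, "N": has_n}
        templates.append("".join(s for s in SLOTS if present[s]))
    return templates


def matching_templates(has_initial, has_glide, has_final):
    """Keep the templates consistent with what you see in the spectrogram.

    Pass None for a slot you are unsure about; it then matches both ways.
    """
    observed = {"C": has_initial, "G": has_glide, "N": has_final}
    return [
        t for t in syllable_templates()
        if all(seen is None or (slot in t) == seen for slot, seen in observed.items())
    ]


# An initial and a final are visible, but the glide is unclear.
print(matching_templates(True, None, True))
```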

Step 3 – Identify sounds

Use this flowchart to figure out what to do next. Coloured steps in the flowchart have detailed discussions below. If you have problems with other steps, you probably need to read the basic definitions of the relevant speech sounds; please refer to the relevant entries on Wikipedia.

Identifying vowels

Identifying vowels from a single sample can be tricky, but it’s still pretty easy to get the right idea by comparing F1 and F2 values. It also helps a lot to be familiar with the syllable structure in Mandarin, because some monophthongs or diphthongs simply don’t occur in certain environments and can therefore be excluded.

For instance, if you think the syllable ends with [ŋ] and the only vowel looks like [i] or [y], you don’t need to worry about the subtle difference between them: it has to be [i], because [y] is never followed directly by [ŋ] (there is [yn], as in “yun”, but no [yŋ]). Similarly, if you can identify one of the sibilants [ɕ ʂ s] accurately, you don’t need to differentiate the allophones of /ɨ/ because these are in complementary distribution.

So if you can’t identify the vowels exactly, narrow it down to a range of possible answers based on the general syllable structure. You will probably be able to guess which vowel it is later, once you know more about the preceding and following sounds. However, vowels are usually the easiest sounds to guess, so it pays to extract as much information as possible in this step, leaving fewer possibilities later.
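Comparing F1 and F2 against known vowels can be sketched as a nearest-neighbour ranking: measure F1 and F2 in the vowel, then sort candidate vowels by distance in the F1/F2 plane. This is an illustrative sketch only. The reference values are an assumption, borrowed from my own (male) Mandarin monophthong measurements in the monophthong article on this page, whereas the speaker in this game is female, so in practice the references should come from her known samples. Plain Euclidean distance in Hz also lets F2 dominate. The names `REFERENCE` and `rank_vowels` are hypothetical.

```python
import math

# Illustrative reference formants (F1, F2) in Hz, taken from my own Mandarin
# monophthongs. A female speaker's values will be higher: replace these with
# measurements of known syllables from the same speaker.
REFERENCE = {
    "i": (264, 2135),
    "y": (260, 2064),
    "ɛ": (438, 1901),
    "a": (635, 1100),
    "ɤ": (483, 1064),
    "o": (496, 718),
    "u": (430, 683),
}


def rank_vowels(f1, f2, reference=REFERENCE):
    """Return vowel labels sorted from nearest to farthest in the F1/F2 plane."""
    def distance(label):
        r1, r2 = reference[label]
        return math.hypot(f1 - r1, f2 - r2)
    return sorted(reference, key=distance)


print(rank_vowels(480, 700)[:3])  # the three most likely candidates
```

Returning a ranking rather than a single answer fits the advice above: keep the top few candidates and let the surrounding sounds decide.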

Identifying finals

There are three finals: [n ŋ ɻ]. All of them influence the preceding vowel to different extents (a lot in the case of [ɻ]), so identifying the final involves looking at the preceding vowel as well, not just the final itself. In the case of [ɻ], there are some general signs (such as a drop in F3, which approaches F2), but more detailed knowledge of how Erhua influences the preceding vowel(s) is probably necessary (see 朱川, 2013, or the article about Erhua on Wikipedia). In general, the spectrum should start approaching that of [ɻ] during the pronunciation of the vowel.

For [n ŋ] the situation is similar in that two things are happening. First, they influence the quality of the preceding vowel, and second, the final itself is different. The easiest part to spot is that F2 and F3 are higher for [n] compared with [ŋ]. Let’s look at the spectrograms for [an] and [ɑŋ]:


[an]: Note the rise in both F2 and F3 towards the end and the cancellation of F2 as the final begins (anti-resonance, see Identifying voiced consonants).


[ɑŋ]: Both F2 and F3 are dropping rather than rising.

Identifying these finals only by looking at the finals themselves is hard, but as noted, [n] is more likely to have F2 cancelled out. This is far from completely reliable, but it is a clue.
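The direction of the F2/F3 movement near the end of the vowel (rising towards [n], falling towards [ŋ]) can be checked numerically by fitting a straight line to each formant track. A minimal sketch, assuming F2 and F3 tracks of equal length in Hz at equal time steps (e.g. from a formant listing in Praat); the function names and the 40% tail window are hypothetical choices, and, as said above, this gives a clue rather than a reliable test:

```python
def slope(values):
    """Least-squares slope of values against their index (Hz per frame)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def guess_nasal_final(f2_track, f3_track, tail=0.4):
    """Guess 'n' or 'ŋ' from the F2/F3 trend at the end of the vowel.

    Both tracks must cover the same frames. tail is the fraction of the
    vowel, counted from the end, that is inspected (an arbitrary choice).
    Returns None when F2 and F3 move in different directions.
    """
    start = int(len(f2_track) * (1 - tail))
    s2 = slope(f2_track[start:])
    s3 = slope(f3_track[start:])
    if s2 > 0 and s3 > 0:
        return "n"  # both rising
    if s2 < 0 and s3 < 0:
        return "ŋ"  # both falling
    return None
```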

Identifying fricatives

Fricatives all have noise-like turbulence and can be told apart by looking at the energy of the turbulence at different frequency ranges. In Mandarin, there are six fricatives [f ɕ ʂ s x ʐ]. Let’s first deal with some of the easier ones.

  • [ʐ] can be easily identified because it’s voiced (see Identifying voiced consonants below). Remember to combine the information about the fricative with the following vowel, since many of the fricatives are in complementary distribution.
  • [f] is a non-sibilant and generally a lot weaker than the other fricatives (including [x]), so it shouldn’t be too hard to identify. Its energy also increases fairly uniformly with frequency (see picture below).
  • [x] has a less evenly distributed energy (several discernible concentrations at different frequency levels). Compare the pictures of “heng” and “feng” below:


Now let’s have a look at the three remaining fricatives [ɕ ʂ s]. The first thing you need to do when identifying these fricatives is to make sure you’re displaying the whole spectrogram: Praat shows 0-5000 Hz by default, which is not enough, so set the upper limit to at least 10000 Hz, possibly even 15000 Hz (the recording’s sampling rate must be at least twice the upper limit for anything to show up there).

If you don’t know anything about the speaker, this will be difficult, because all of these things are individual, but if you see a few samples, you can still calibrate your guesses. The easiest way to deal with [ɕ] is to look at the following vowel (which is usually relatively easy to identify). Since [ɕ] is in complementary distribution with [ʂ s] ([ɕ] only occurs before [i] and [y], which [ʂ] and [s] never precede), we will only look at how to tell the latter two apart here.

In general, the main difference between the retroflex fricative [ʂ] and its non-retroflex friend [s] is that the noise of [ʂ] starts at a much, much lower frequency; see the spectrograms of “sa” and “sha” below. The exact frequency ranges might be different depending on the environment, so [ʂa] might not be identical to [tʂa], but the general trend is still there (and the difference is usually very large).


Non-retroflex (sa) vs. retroflex (sha).
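One way to put a number on the difference between [s] and [ʂ] is the spectral centre of gravity of the frication noise, which should come out lower for [ʂ]. Praat has a “Get centre of gravity” query on Spectrum objects; the sketch below computes the same idea (a power-weighted mean frequency) with NumPy on raw samples. The 5000 Hz boundary is a placeholder, not a published value, and the function names are hypothetical; as noted above, calibrate on a few known samples from the same speaker.

```python
import numpy as np


def spectral_centroid(samples, sample_rate):
    """Power-weighted mean frequency (Hz) of a stretch of frication noise."""
    samples = np.asarray(samples, dtype=float)
    windowed = samples * np.hanning(len(samples))  # taper edges to reduce leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(np.sum(freqs * power) / np.sum(power))


def guess_sibilant(samples, sample_rate, boundary_hz=5000.0):
    """Label frication as 's' or 'ʂ' by comparing its centroid to a boundary.

    boundary_hz is a placeholder to be calibrated per speaker.
    """
    if sample_rate < 20000:
        # Below 20 kHz sampling, nothing above 10 kHz is recorded at all.
        raise ValueError("sample rate too low to see energy up to 10000 Hz")
    return "s" if spectral_centroid(samples, sample_rate) > boundary_hz else "ʂ"
```

Pass in only the samples of the fricative itself (e.g. the selected stretch before the vowel), otherwise the vowel’s low-frequency energy pulls the centroid down.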

Identifying plosives

This is by far the hardest part, and I don’t think it’s theoretically possible to reach very high accuracy. The reason is that stops are too brief to identify properly and aren’t in complementary distribution, so looking at the following vowel seldom helps. The only clue is often the formant transitions.

According to locus theory, every consonant has a target frequency (locus) for each formant, even though this might be influenced by adjacent sounds. This means that the formant transitions (F2 and F3) into the following vowel can help us identify the plosive itself. This picture is taken from Kevin Russell’s phonetics site (University of Manitoba).


In general, we can see a pattern that looks as follows:

  • Bilabial locus frequency: Low F2, low F3
  • Alveolar locus frequency: Mid F2, high F3
  • Velar locus frequency: High F2, mid F3

Read more here, here and here. This is all very good in theory, but I find it very hard to actually use this to determine the plosive in question. Sometimes the transitions are hard to see or they simply don’t fit the patterns described above.
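The locus pattern in the list above is essentially a lookup table from formant levels at the start of the vowel to place of articulation. A minimal sketch; the Hz cut-offs separating “low”, “mid” and “high” are invented for illustration and are certainly speaker-dependent, the names are hypothetical, and the function returns None when the transitions don’t fit the pattern, which, as noted above, happens a lot:

```python
def level(freq_hz, low_max, high_min):
    """Bin a frequency into 'low', 'mid' or 'high' using two cut-offs."""
    if freq_hz < low_max:
        return "low"
    if freq_hz >= high_min:
        return "high"
    return "mid"


# Pattern from the list above: (F2 level, F3 level) -> place of articulation.
LOCUS_PATTERN = {
    ("low", "low"): "bilabial",
    ("mid", "high"): "alveolar",
    ("high", "mid"): "velar",
}

# Hypothetical cut-offs in Hz (low below the first, high from the second).
F2_CUTS = (1400, 2000)
F3_CUTS = (2600, 3000)


def guess_place(f2_onset, f3_onset):
    """Return a place guess from F2/F3 at the vowel onset, or None if the
    combination doesn't match the textbook pattern."""
    key = (level(f2_onset, *F2_CUTS), level(f3_onset, *F3_CUTS))
    return LOCUS_PATTERN.get(key)
```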

Identifying aspiration

Identifying aspiration is usually not very difficult, but it can be somewhat complicated by affricates (which look a little bit like aspirated stops) and by aspirated affricates such as [t͡ɕʰ t͡sʰ ʈ͡ʂʰ]. Let’s start with the main difference between the unaspirated stops [p t k] and their aspirated counterparts.

The main difference is the interval between the stop release and the onset of voicing, the voice onset time (VoT). Unaspirated stops have a very short VoT, usually 10-35 ms, whereas aspirated stops have a much longer VoT, usually 70-100 ms (Chao & Chen, 2008). Let’s look at the [t tʰ] pair as an example:


VoT of aspirated and non-aspirated [t] in “da” and “ta” respectively.
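Measuring VoT amounts to subtracting the time of the release burst from the time voicing starts, both read off the waveform or spectrogram. The sketch below does that and compares the result with the typical ranges from Chao & Chen (2008) cited above; splitting the gap between 35 and 70 ms at its midpoint is an arbitrary choice rather than a published boundary, and the function names are hypothetical:

```python
def vot_ms(release_s, voice_onset_s):
    """Voice onset time in milliseconds from two times given in seconds."""
    return (voice_onset_s - release_s) * 1000.0


def classify_aspiration(vot, unaspirated=(10, 35), aspirated=(70, 100)):
    """Compare a VoT (ms) with the typical unaspirated/aspirated ranges.

    Values between the two ranges are split at the midpoint of the gap
    (52.5 ms with the defaults), which is an arbitrary choice.
    """
    boundary = (unaspirated[1] + aspirated[0]) / 2
    return "aspirated" if vot >= boundary else "unaspirated"


# Release burst at 0.512 s, voicing from 0.530 s: an 18 ms VoT.
print(classify_aspiration(vot_ms(0.512, 0.530)))
```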

The next problem is to separate affricates from aspirated stops. This is relatively easy if we know what fricatives look like (and we do, see Identifying fricatives above). The aspirated part looks very much like breathing out sharply, [h], which is the frictionless counterpart of Pinyin “h” [x]. The following spectrogram shows such a (relatively) frictionless [h] in “ha”:



As we know from our comparisons of fricatives, they don’t have such a uniform frequency distribution, so if we compare the pair [tʰ t͡sʰ], it should be relatively easy to see both the friction and the aspiration, although the two certainly overlap to some extent:


The aspirated affricate [t͡sʰ]. Note the similarity to [s] and [h].

Finally, we need to look at aspirated versus unaspirated affricates, e.g. Pinyin “z” [t͡s] and “c” [t͡sʰ]. As expected, the fricative part similar to [s] is there for both affricates, but the aspirated [h] part is missing for [t͡s], which therefore has a substantially shorter VoT:


Both fricative [s] and aspiration [h] clearly visible in [t͡sʰ], “ca”.


Only fricative [s] visible in [t͡s], with only a minor gap before the start of the vowel.

If you can’t see the fricative, you probably need to adjust the spectrogram settings. The above diagram stops at 6000 Hz, which isn’t really enough to analyse fricatives, see Identifying fricatives above.
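The reasoning in this section boils down to two visual checks after a stop release: is there sibilant-like frication, and is there a flatter, [h]-like aspiration stretch before the vowel? A sketch of that decision as a function (the name is hypothetical; spotting the two features is still done by eye), using the alveolar series from the examples above:

```python
def classify_onset(has_frication, has_aspiration):
    """Map two visual observations after a stop release to a manner category.

    has_frication: sibilant-like noise right after the release.
    has_aspiration: a flatter, [h]-like noise stretch before voicing starts.
    """
    if has_frication and has_aspiration:
        return "aspirated affricate"    # e.g. Pinyin c [t͡sʰ]
    if has_frication:
        return "unaspirated affricate"  # e.g. Pinyin z [t͡s]
    if has_aspiration:
        return "aspirated stop"         # e.g. Pinyin t [tʰ]
    return "unaspirated stop"           # e.g. Pinyin d [t]
```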

Identifying voiced consonants

This is one of the trickier parts. There are four voiced (initial) consonants [m n l ʐ]. First, [ʐ] is a fricative and should be quite easy to identify. If you look at the picture below, you can clearly see the fricative turbulence and the voicing:


[ʐ] is easy to identify because it’s the only voiced fricative, “ran”.

The remaining three are much, much harder and are often indistinguishable just by looking at the spectrogram because they have similar F1 and F2. I have found no way of reliably telling them apart this way, but there are clues in the waveform.

Let’s start with [l], which has a glottal perturbation (creak) in each cycle that is fairly easy to spot (the “craggy” looking bits; compare this with the waveforms of [m n] below):


Five cycles of [l m n]

I have found no reliable way of separating [m n], but F2 seems more likely to be cancelled out by anti-resonance in [n] compared to [m].

Formant transitions for [m] are similar to those for [p pʰ], while those for [n] are similar to [t tʰ s t͡s], but this can be very hard to see. Read more about this here.

Spectrogram challenge

I’d be really surprised if anyone actually reads this far, but if you do and think this is interesting and/or fun, feel free to have a go at the following spectrograms. Which Mandarin syllables do they represent? Post a comment with your answers!

Spectrogram #1


Spectrogram #2


Spectrogram #3


Spectrogram #4


Spectrogram #5


Conclusion

It’s been both entertaining and educational to write this guide. There’s obviously more to spectrogram analysis than I have written here. My goal was simply to use what I have learnt in the past year or so to see what I could do with Mandarin syllables (which are a lot easier to analyse than, say, English or Swedish). This article probably contains some errors, so if you find anything that looks weird, let me know! If you want more challenges, you can head over to Robert Hagiwara’s Monthly Mystery Spectrogram page. It hasn’t been updated for a long time, but it still contains a lot of useful information!

References and further reading


Here is a list of books, articles and websites that I’ve found useful. I also want to thank professors 朱川 and 曾金金 whose courses in phonetics I have attended. It’s so much easier to learn these things in collaborative discussions in class compared with on one’s own!

鄭靜宜. (2011). 語音聲學:說話聲音的科學. 心理出版社.

王理嘉、林燾. (2013). 語音學教程. 五南出版社.

曾金金. (2008). 華語語音資料庫及數位學習應用. 新學林出版社.

朱川. (2013). 外國學生漢語語音學習對策(增訂本). 新學林出版社.

Boersma, P., & Weenink, D. (2005). Praat: doing phonetics by computer (Version 4.3.01).

Chao, K. Y., & Chen, L. M. (2008). A cross-linguistic study of voice onset time in stop consonant productions. Computational Linguistics and Chinese Language Processing, 13(2), 215-232.

Duanmu, S. (2007). The phonology of Standard Chinese. Oxford University Press.

Macquarie University. (2008). Speech Acoustics Topics.

Wikipedia. Mandarin Phonology.

Wikipedia. Acoustic phonetics (and related topics).


In the previous article, I discussed vowel space and cardinal vowels, trying to establish a foundation for discussing more practical details of actual languages, such as the pronunciation of vowels. In this article, I will use the data from the previous article to compare vowels in Chinese and Swedish. I originally meant to include English, but I skipped that for reasons that will become apparent below.

The main focus is of course Chinese. Vowels are known to be fairly easy to get roughly right (communication), but very hard to get native-like (accent). This is most likely because they exist in a space rather than on a spectrum (actually, a volume might be a better representation if we include lip rounding). It’s simply very hard to create an abstract representation of a sound in another language when there are already partly overlapping phonemic categories from your native language (most research suggests that languages interfere with each other in all kinds of ways, thus making it impossible to become exactly like a monolingual native speaker).

So, my main question today is whether my pronunciation of Chinese vowels closely resembles my pronunciation of Swedish, and whether it resembles some kind of model in Chinese. Intuitively, I would say that they don’t resemble Swedish much and that they are roughly correct. Still, at the time of writing this little prelude, I don’t really know, so let’s check it out!

My vowel space

Apart from the graph below, I won’t repeat what I wrote in the previous article. This is what my attempt at producing the eight cardinal vowels looked like:

Although not perfect, this roughly represents the range of sounds I can comfortably produce. Of course, I could possibly refine the blue line a bit by trying to produce vowels between e.g. [a] and [ɑ], but this would probably not change the overall picture by much. Still, as we shall see, some of my Swedish vowels go quite far beyond the confines of the blue line.

Chinese monophthongs

The most common way of counting Chinese monophthongs gives us ten different vowels. There are actually more allophones, but I have to draw the limit somewhere, so I’ll stick with the textbook examples. Keeping the blue line from the cardinal vowel chart above, I have plotted F1 and F2 of my monophthongs in Chinese. These were produced in real CV syllables when available (otherwise just V or VC).

To make comparisons easier, let’s include the F1 and F2 values as well:

Vowel F1/F2 (Hz)
[i] 264/2135
[y] 260/2064
[ɛ] 438/1901
[a] 635/1100
[ə] 486/1316
[ɤ] 483/1064
[o] 496/718
[u] 430/683
[ɯ] 317/1420
[ɨ] 374/1625

Before we start comparing this with other languages, it makes sense to know roughly whether these values are in the ballpark to start with. The following diagram is scanned from 朱川’s 2013 book 外國學生漢語語音學習對策, but is originally from 吳宗濟’s 漢語普通話單音節語圖冊 (1986). Based on the frequencies, I would guess this includes both male and female speakers, but I don’t know exactly how these values were elicited. If you happen to know of a better source, please let me know!

Not all sounds in this diagram are present in my sample and not all sounds present in my sample are in this diagram, but the general picture looks quite promising. All my vowels seem more closed than the samples here, but I think that’s either due to the way I normally speak (in any language) or because of the shape of my oral cavity. I have tried to produce [a] with an F1 higher than 700 Hz, but I simply can’t do it. The graphs below compare my vowels (left) to those from 吳宗濟 (right).

shape comparison

I note the following:

  1. My [y] is roughly as closed as my [i], but it’s meant to be slightly more open (I have seen this in numerous references, not just the above graph). This difference ought to be very slight and perhaps not even noticeable.
  2. My null final [ɯ] might be a bit too closed. Perhaps this is a result of trying to pronounce this sound as clearly as possible. In any case, I’m pretty sure this isn’t a big problem; the difference is pretty small. Also, these vowels are apical and therefore might behave differently from normal vowels.
  3. My [u] is pretty open compared to the model. This should be immediately obvious when you look at my monophthongs: the [u] is very far from the top-right corner. Still, I think my pronunciation of the cardinal [u] is very exaggerated and quite far from the correct sound in Mandarin, so I’m not sure whether my Mandarin [u] is actually too open.

I will add these observations to a list that I will later check with native speakers to see if the difference is significant or not. Still, the most striking observation is that apart from the mentioned oddities above, the shapes are very similar indeed. Now, let’s see how this compares to my Swedish vowels.

A cross-linguistic comparison

The main question I want to answer is whether I rely on my native Swedish vowels when pronouncing Chinese. I don’t think this is the case, but let’s find out! If you don’t know anything about Swedish vowels, I suggest you check out this article on Wikipedia. In short, the system is a lot more complicated than in Chinese; mastering it must be a nightmare for native speakers of Chinese! For the comparison, I simply recorded the same words used in the Wikipedia article. The syllabic environment isn’t identical, but anything I write here contains similar levels of error anyway, so it’ll have to do. Here’s the formant data for my Swedish vowels:

Vowel F1/F2 (Hz)
[iː] 250/2189
[ɪ] 287/2321
[eː] 251/2480
[e] 449/1977
[ɛː] 798/1774
[ɛ] 521/2084
[ɑː] 590/982
[a] 733/1194
[oː] 490/1038
[ɔ] 532/667
[uː] 495/727
[ʊ] 273/657
[ʉː] 275/1727
[ɵ] 385/1178
[yː] 273/2051
[ʏ] 317/1913
[øː] 387/1747
[œ] 441/1561

I plotted all that into the same graph, but I can’t be bothered to label them all. The most interesting finding is that it seems my vowel space drawn earlier is much too narrow. I have quite a few vowels outside what I thought were my extremes!

However, when I started comparing Chinese with Swedish, I soon realised that this approach is deeply flawed and doesn’t work very well for the long vowels. I didn’t realise this before, but the long vowels in Swedish undergo quite a lot of change. For instance, this is a spectrogram of me saying the word “hel” (“whole” in Swedish; the light red area is the vowel):

I have no training in Swedish phonetics and I don’t know how to measure this. The first part is quite similar to [i] and the vowel then gradually shifts to [e]. The results will obviously be very different depending on which part of the vowel I choose to measure.

There are other oddities as well. How come [ɛː] is considerably more open than the [a] I produced earlier, yet still roughly in the middle in terms of front-back? There are obviously too many things going on here to make further analysis worthwhile. I think most of this comes from different modes of pronunciation or different linguistic contexts.


Even though this didn’t really turn out as expected, I still learnt a few things about my own voice. First, it seems like I’m definitely capable of producing sounds that are considerably more open than what I produced for the cardinal vowels. Perhaps I should use these sounds as a reference point and try to redraw the vowel space from the previous article?

Second, my Chinese vowels look pretty good when compared with the expected values. I noted three differences that ought to be investigated further: 1) my [i] and [y] in Mandarin differ only in lip rounding (almost exactly the same tongue position), which isn’t the case for the model I used; 2) my [ɯ] is too closed and should be at roughly the same height as [ɨ], although I don’t think this is an issue; 3) my [u] is very open compared with both the model and my Swedish pronunciation, which is very interesting and should be checked, although I’m pretty sure my [u] in Chinese is pretty good.

Third, the method I’m using here isn’t very good. There are several reasons for this, but the most obvious one is that the phonetic environment differs quite a lot between the recordings. In Chinese, I read mostly open syllables (CV), whereas all the Swedish examples on Wikipedia are closed syllables (CVC). The initial consonants are also different, as is vowel length. I simply don’t know enough about Swedish phonetics or acoustic phonetics in general to be able to say how much of an impact this has, but my guess is that it’s pretty large.

Thus, I will abandon my attempts at comparing between different languages for now and just stick with Chinese. In my next article, I will look at diphthongs and triphthongs (or glides followed by diphthongs if you prefer). Stay tuned!


Phonetics is great fun. In this article series, I will share some self-experimentation in Chinese phonetics that I simply think is too nerdy to share on Hacking Chinese (perhaps I will find some way of publishing something about this there later, but this is the unedited director’s cut). This is the first of several articles where I discuss Chinese phonetics and some related experiments I’ve done with my own pronunciation. Before we get to the actual Chinese, we need some basic knowledge of phonetics, so I will talk mostly about vowels in general first and will start talking about Chinese vowels next time.

About this article

A word of warning: don’t expect this to be helpful or useful, but expect it to be interesting. I do think that a deep understanding of phonetics can really help you learn to pronounce a foreign language, but it’s certainly not the most efficient method of learning. If you’re interested in what I spend my spare time doing at the moment, read on. If you want quick fixes for your own pronunciation, go mimic a native speaker instead or read articles about pronunciation on Hacking Chinese.

If you find anything wrong or dubious in this article, just leave a comment. I don’t pretend that I actually know these things well, so there might be errors here and there. Since part of the goal is to learn more about phonetics, pointing out a mistake I’ve made is equivalent to doing me a big favour!

This article is going to contain some jargon and it will require you to already understand some basic theory. Rather than spending hours explaining these things, I will simply link to Wikipedia articles whenever necessary. I will try to make the narrative understandable even if you haven’t taken several courses in phonetics, though, but I might be a bit blind to what uninitiated people find hard.

If you want to, you can do everything I’ve done here yourself: you just need a microphone and Praat, a free program developed for speech analysis. I’m not going to go into any details about how to use Praat now, but it’s fairly easy to find tutorials online.

Vowels and vowel space

Basically, vowels can be described in a two-dimensional space determined by how the tongue divides the oral cavity into two compartments, which results in a signal with different formant frequencies. This means that if you look at the spectrogram of a vowel, you can actually see these formant frequencies and thereby roughly determine the place of articulation of this vowel. The picture below is from my pronunciation of [i]; the lower line of red dots represents the first formant frequency, F1, and the second line of red dots represents the second formant frequency, F2.


The value of F1 is related to the openness of the vowel, i.e. how much you open your mouth and lower your tongue when pronouncing it. Try pronouncing “bin” and “ban” in English and you should feel a big difference in openness. A low F1 means that the vowel is closed, so the [i] above is a closed vowel because F1 is very low. The opposite would be [a], which is an open vowel and has a relatively high F1. See the spectrogram for [a] below.


The value of F2 is related to the front-back dimension of the vowel, i.e. how far forward or backward your tongue is positioned. Try pronouncing “beat” and then “boot” in English and you will feel the difference between a front vowel in “beat” and a back vowel in “boot”. F2 decreases as the tongue retracts, so the [i] in “beat” has a very high F2, whereas the [u] in “boot” has a lower one (although not as low as the cardinal [u] described below). Compare the formant frequencies of [a] above and [u] below. Note that F1 and F2 overlap in this diagram; the formant at around 2100 Hz is F3, not F2.


Cardinal vowels and my personal vowel space

What the above means is that there is a range of possible vowels and that vowel quality can be defined in terms of location in this space. In phonetics, there are eight cardinal vowels that occupy the corners and edges of this space, and they can be represented in what’s called a vowel chart. You can check the IPA vowel chart on Wikipedia, which also has audio recordings, or York University’s site, which also contains a neat chart with audio. The eight cardinal vowels are divided into four front and four back vowels, each set spanning different degrees of openness.

(Actually, there is a third dimension I have mostly ignored and will continue to ignore, and that is lip rounding. As you can see in the Wikipedia article above, there is a second set of cardinal vowels that matches the first eight, but are opposite in terms of lip-rounding. This is too complicated for this article and I will ignore anything else beyond the basics for now.)

One problem with these charts is that they are schematic rather than accurate representations of the oral cavity, the produced sound or the perceived sound. For instance, since the shape and size of the oral cavity and other resonance cavities vary between individuals, you can’t just compare someone’s formant frequencies for one vowel with those of someone else and conclude that A’s vowels are farther back than B’s.

One way of approaching the issue is to draw your own vowel space and see what the cardinal vowels look like when you pronounce them. This is very simple to do in theory:

  1. Record the eight cardinal vowels
  2. Measure F1 and F2 for these vowels
  3. Plot them on a formant diagram (F1 against F2)

None of these steps is as easy as it looks, though, but more about that in a moment; I’ll show you my results first. This diagram shows F1 plotted against F2. Note that actual frequency is not the same as perceived frequency, so the scales aren’t linear.
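A common way to get a perceptually motivated, non-linear axis is to convert Hz to Bark. It isn’t stated exactly which scale the chart below uses, so take this as one possible choice: a sketch using Traunmüller’s (1990) approximation z = 26.81 f / (1960 + f) − 0.53, without the corrections usually applied at the very low and high ends. The function names are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Traunmüller (1990) Hz-to-Bark approximation (no end corrections)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def to_bark_points(formants):
    """Convert {vowel: (F1, F2)} in Hz to {vowel: (F1, F2)} in Bark."""
    return {
        vowel: (round(hz_to_bark(f1), 2), round(hz_to_bark(f2), 2))
        for vowel, (f1, f2) in formants.items()
    }


# A 200 Hz step low in the spectrum is worth far more Bark than one high up.
print(hz_to_bark(400) - hz_to_bark(200), hz_to_bark(2200) - hz_to_bark(2000))
```

On such a scale, a 200 Hz difference in F1 counts for much more than a 200 Hz difference in F2, which is roughly how we hear it.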

These are the eight cardinal vowels and their F1 and F2 frequencies. Here are the relevant numbers:

Vowel F1/F2 (Hz)
[i] 253/2309
[e] 335/2094
[ɛ] 461/1702
[a] 636/1404
[ɑ] 580/1007
[ɔ] 424/672
[o] 361/609
[u] 245/446

You can also plot the frequency of F1 and F2 for each vowel, which in my case gives something like this, fairly close to what it’s supposed to look like. Remember, the order of the cardinal vowels is from closed front via open front and open back to closed back. Thus, we expect F1 values to first increase and then decrease. We also expect F2 values to fall throughout the cardinal vowel sequence. This is also what we find.

cardinal vowel formant graph
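The two expectations (F1 rising then falling, F2 falling throughout) can also be checked directly on the table of numbers. A small sketch, assuming the rows are in cardinal order starting with the closed front vowel [i]; the helper names are hypothetical:

```python
# F1/F2 (Hz) of my cardinal vowels, in cardinal order (from the table above).
CARDINAL = [
    ("i", 253, 2309), ("e", 335, 2094), ("ɛ", 461, 1702), ("a", 636, 1404),
    ("ɑ", 580, 1007), ("ɔ", 424, 672), ("o", 361, 609), ("u", 245, 446),
]


def rises_then_falls(values):
    """True if values strictly increase up to their maximum, then strictly decrease."""
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


def strictly_falling(values):
    """True if every value is lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


f1 = [f for _, f, _ in CARDINAL]
f2 = [f for _, _, f in CARDINAL]
print("F1 rises then falls:", rises_then_falls(f1))
print("F2 falls throughout:", strictly_falling(f2))
```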

I don’t think much more can be said about this, even though my own rendering of the cardinal vowels isn’t perfect. It would be interesting to see what the model talker on York University’s site would look like plotted in a similar way to what I have done above. Still, I think the blue polygon in the first graph shows pretty well the limits of my articulation. I have tried to produce even more extreme vowels in each direction without succeeding. Brief checks show that my vowels in actual languages (Swedish, English, Chinese) fall within this range, but more about this later (especially Chinese, of course).

I want to be as cool as you, what should I do?

As promised, I won’t go into detail about how to use Praat, but I will briefly describe the general process based on the three steps above. The first thing you need to do is record the cardinal vowels. This can be quite hard if you have no experience with trying to pronounce sounds other than those in your native language. Note that even though the same letters are used, the cardinal vowels typically don’t match the vowels of English, if that’s your native language. For instance, “i” in English can represent two sounds, /i/ and /ɪ/, but neither of them is as close and front as the cardinal vowel [i]. Therefore, some practice is required. Start by mimicking the audio charts I linked to above.

Second, you need to measure F1 and F2 in Praat. You’ll have to figure out how to install and use the program on your own, but I’ll give some suggestions for measuring F1 and F2 for the vowels. The main problem is deciding where in the vowel to measure, and there are several ways of doing this. The key is to be consistent. You can either choose the point where the intensity is highest or the point where the vowel looks most stable (i.e. where F1 and F2 aren’t fluctuating). I don’t think it matters much which method you choose in this case, but I usually go with the highest intensity, since that’s much more objective than the idea of stability.
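To make the choice of measurement point explicit, and therefore consistent, the two approaches can be written down as rules. This is a hypothetical Python sketch, not something Praat itself provides: it assumes you have already copied time, intensity, F1 and F2 values for a series of frames out of Praat (for instance by reading them off the intensity and formant tracks), and the sample frames are made up purely for illustration. "Most stable" is defined here, as an assumption, as the frame whose F1 and F2 differ least from both neighbours.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One analysis frame with values read off Praat's tracks (hypothetical)."""
    time: float       # seconds
    intensity: float  # dB
    f1: float         # Hz
    f2: float         # Hz


def measure_at_peak_intensity(frames):
    """Rule 1: take (time, F1, F2) from the frame with the highest intensity."""
    best = max(frames, key=lambda fr: fr.intensity)
    return best.time, best.f1, best.f2


def measure_at_most_stable(frames):
    """Rule 2: take (time, F1, F2) from the frame where F1 and F2 change least.

    Change is the summed absolute difference in F1 and F2 to the previous and
    the next frame, so the first and last frames are never chosen.
    """
    if len(frames) < 3:
        raise ValueError("need at least three frames")

    def change(i):
        prev, here, nxt = frames[i - 1], frames[i], frames[i + 1]
        return (abs(here.f1 - prev.f1) + abs(nxt.f1 - here.f1)
                + abs(here.f2 - prev.f2) + abs(nxt.f2 - here.f2))

    best = frames[min(range(1, len(frames) - 1), key=change)]
    return best.time, best.f1, best.f2


# Made-up frames for one vowel, for illustration only.
SAMPLE_FRAMES = [
    Frame(0.10, 68.0, 300, 2250),
    Frame(0.15, 74.5, 262, 2300),
    Frame(0.20, 73.0, 255, 2310),
    Frame(0.25, 70.0, 254, 2312),
    Frame(0.30, 62.0, 280, 2200),
]

print(measure_at_peak_intensity(SAMPLE_FRAMES))
print(measure_at_most_stable(SAMPLE_FRAMES))
```

With these made-up frames the two rules pick different frames (0.15 s versus 0.20 s), which is why it pays to decide on one rule and stick to it for every vowel.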

Third, plot F1 against F2 in a graph. The easiest way is probably to do what I did: take a picture of a vowel chart and then manually plot your vowels in any decent image editing program. Creating a graph like my cardinal vowel formant graph is easy with any spreadsheet software.
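If you’d rather script the chart than place points by hand, here is a minimal sketch using only the Python standard library that turns the table values into an SVG image. It follows the usual vowel-chart orientation, with F2 decreasing from left to right and F1 increasing from top to bottom, so close front [i] ends up top left and close back [u] top right. The axis limits and image size are assumptions; adjust them to your own data.

```python
# Cardinal vowels: (symbol, F1 in Hz, F2 in Hz), copied from the table above.
VOWELS = [
    ("i", 253, 2309), ("e", 335, 2094), ("ɛ", 461, 1702), ("a", 636, 1404),
    ("ɑ", 580, 1007), ("ɔ", 424, 672), ("o", 361, 609), ("u", 245, 446),
]

# Chart layout (assumed values; widen the ranges if your vowels fall outside).
F2_LEFT, F2_RIGHT = 2500, 300   # F2 decreases from left to right
F1_TOP, F1_BOTTOM = 200, 800    # F1 increases from top to bottom
WIDTH, HEIGHT, MARGIN = 500, 350, 40


def to_chart(f1, f2):
    """Map an (F1, F2) pair in Hz to SVG pixel coordinates (x, y)."""
    x = MARGIN + (F2_LEFT - f2) / (F2_LEFT - F2_RIGHT) * (WIDTH - 2 * MARGIN)
    y = MARGIN + (f1 - F1_TOP) / (F1_BOTTOM - F1_TOP) * (HEIGHT - 2 * MARGIN)
    return x, y


def vowel_chart_svg(vowels):
    """Return an SVG document with the vowels joined into a polygon and labelled."""
    points = [to_chart(f1, f2) for _, f1, f2 in vowels]
    polygon = " ".join(f"{x:.1f},{y:.1f}" for x, y in points)
    labels = "\n".join(
        f'  <text x="{x + 6:.1f}" y="{y - 6:.1f}" font-size="14">{name}</text>'
        for (name, _, _), (x, y) in zip(vowels, points)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{WIDTH}" height="{HEIGHT}">\n'
        f'  <polygon points="{polygon}" fill="none" stroke="blue" stroke-width="2"/>\n'
        f"{labels}\n"
        "</svg>\n"
    )
```

Saving the returned string to a file such as `cardinal_vowels.svg` (for example with `pathlib.Path.write_text(..., encoding="utf-8")`) gives an image that any web browser can open. Adding your vowels from real languages as extra labelled points works the same way via `to_chart`.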


The main reason I wrote this article is that I enjoy it. There are also secondary reasons, like sharing what I have done with others and the fact that I learn a lot simply by being forced to write about this rather than just doing it. This is just the first article in this series. Next time, I’ll look at monophthongs in Mandarin Chinese and how these relate to the vowel space I drew in this little experiment. I will then move on to diphthongs and triphthongs before leaving vowels altogether and starting to look at tones, consonants and so on. Stay tuned!


Visit Hacking Chinese instead: This post about studying Chinese is partly or completely obsolete. A revised version, along with much more related to language learning can be found at Hacking Chinese. This post is kept here for the sake of consistency.

This is the third article about pronunciation, and I will continue writing about this subject as long as I think I have something worthwhile to share with others. So far, this small series consists of these articles:

Part 1 – Introduction
Part 2 – Attitude
Part 3 – Identification (this article)
Part 4 – Tones
Part 5 – Analysis

Part 3 – Identification

In this article, I assume that you already have the right attitude (i.e. you realise that improving pronunciation is your own responsibility; see article one) and that you understand the importance of actually knowing in theory how Chinese is supposed to be pronounced (see article two). Obviously, having the right attitude and the right knowledge will not by themselves enable you to pronounce a language perfectly. You still need to do two things: identify errors and find ways of removing them. This article is about finding your mistakes.

Passive learning won’t take you very far

It might sound easy or obvious to identify mistakes, but nothing could be further from the truth. For instance, teachers are not as much help as you might think, simply because they have too many students, their standards are too low, they are complacent, they think it’s embarrassing to correct foreigners too much or, in extreme cases, they aren’t very sure of the theory themselves. I’ve written more about how to handle this in the second article.

As I’ve stated earlier, being a native speaker does not mean you know everything, so you can’t rely on friendly native speakers either (if teachers are not enough, ordinary people are even less so). Most people are happy if they can understand what you say and are thus very unlikely to correct you, even if they say they will.

How to identify problems with pronunciation

There are of course a huge number of ways to do this, but below I will discuss the ones I’ve found useful and/or interesting. A combination of many methods is more likely to do the trick than relying solely on a single strategy.

  1. Listening for pronunciation – Listening actively to native speakers is sometimes very helpful. This might be obvious, but I think most people listen for meaning and not for actual pronunciation. In Chinese, you can ignore what someone is actually saying (unless they’re talking to you, that is) and still learn something about pronunciation. Listen to the tones and the intonation of the various parts of the sentence.
  2. Reading easy textbooks – Find a text you can handle quite easily (i.e. with very few or no new words); a textbook you have already studied or something similar will work well. Read it with your teacher, a friend or whoever is kind enough to help you, and make sure they point out mistakes. Read the same paragraph or sentence more than once if it’s hard. The text has to be easy because otherwise you will spend too much energy just understanding the sentences, and your pronunciation will suffer.
  3. Theoretical studies – Reading more or less theoretical descriptions of the language (phonetics) is helpful. There are also lots of other people out there who have had the same problems as you. I’m only one person, and there are many others who can help you shed light on pronunciation. As an example, take a look at this discussion of the third tone in Mandarin.
  4. Reading along with native speakers – Find a text which is reasonably easy and read it together with someone. Let them read a sentence, or even half a sentence, and mimic their way of speaking. Listen for tones, emphasis and other things which are almost impossible to learn in any other way.
  5. Record yourself – If you have never recorded yourself speaking the target language, I think you will be surprised at how many mistakes you can easily hear yourself. Reading aloud from a textbook is of course the easiest way to start, but I would also suggest that you record natural conversation to see how you fare when you’re speaking entirely on your own. Recording might make you nervous at first, but this should go away quickly.
  6. Guessing games with native speakers – This is a brilliant and very effective method to analyse and identify problems with tones in Mandarin. It also works for other parts of learning Chinese and the principles involved can be used for other languages as well. Since this is such a wonderful idea, I have written a special article about it.


There are numerous ways of identifying problems with pronunciation; you simply need to find the ones that suit you as a person and your way of thinking. I suggest using as many different methods as possible, because they are likely to catch different kinds of problems.

If anyone has suggestions for further tactics that can be employed to spot errors, please let me know, both so that I can make this article more complete and so that I can improve my Chinese more easily. The important thing is to keep finding new ways to improve, because relying on the same methods all the time is unlikely to illuminate all aspects of pronunciation.

The next article will be an expansion of point six in the list above, i.e. it will introduce an ingenious way to identify errors with pronunciation. It’s most effective for tones, but can easily be adapted to other areas. Stay tuned and good luck!
