Chinese proficiency report 20

Even though only a little more than a month has passed since the previous report, enough has happened to motivate a new article. The previous post dealt with my Chinese ability in general terms and, more specifically, with listening ability. I concluded that my ability in general is probably good enough to survive the master’s degree program I’m currently enrolled in. I also lamented the fact that spoken Chinese is much more context-based than any other language I’ve studied, which sometimes makes listening something of a problem. This time I will talk about pronunciation.

Attitude and pronunciation

I regularly ask people I speak with to suggest how I can improve my pronunciation, and I have done so since I first started learning. Often, these questions are met with surprise: “Why do you want to improve your pronunciation? It’s already quite good!” The answer is complex and something I plan to write more about later on Hacking Chinese, but it’s precisely because I have this attitude that I have managed to acquire the pronunciation I have now. If I were content with being better than the average foreigner, I could have stopped focusing on pronunciation years ago. The road to good pronunciation is very long and being complacent certainly doesn’t help. Thus, I still think I need to improve and that’s what this article is about.

A question of tones

In general, I think my pronunciation of initials, medials and finals is pretty good. I don’t want to say that it’s perfect, but it’s definitely good enough for teaching Chinese to students. Even though I’ve tried, I haven’t found any serious problems in this area for a couple of years, so I don’t think this is something I need to improve much.

Tones are different, though. I don’t think I have many problems when pronouncing them individually or in isolated words, but as soon as words are connected into longer sentences and intonation starts being important, I make several small but significant mistakes. During this semester, I’ve asked a number of people to pay attention to my pronunciation (thanks to those who have helped, you know who you are). I have also recorded my own speaking to analyse myself. The native speakers may have used different words to describe these problems, but in essence, they mostly agree on the analysis below. I do, too.

Second tone, fourth tone and pitch range

In general, my tones are correct insofar as the direction is correct (second tone rises, fourth tone falls), but the distinction isn’t clear enough. Occasionally, both these tones are incomplete and usually too low. There are some other, minor problems, but I would say the main problem is pitch range. Since I seem to speak Chinese with a deeper voice than I speak English or Swedish, the pitch range becomes too narrow.

The suggested solution looks as follows:

  1. Prolong second tones. Since the direction is already correct, making the tone longer will ensure that it goes high enough as well. Once I’m used to that, it should be easier to keep the tone height while decreasing the time it takes to get there. Right now it feels a bit uncomfortable, but I’ve checked with a few people and they think the new, higher version sounds better.
  2. Modify the onset of the fourth tone. Right now, the general contour is correct (falling), but the fall is a bit abrupt. If I want to speak clearly to allow others to master tones as well, I should mark the beginning of the fourth tone better. See the graphs below.
  3. Pay more attention to the third tone. In theory and most of the time when I speak, I pronounce it correctly, but I sometimes slip. This is probably best remedied through more practice. My best performance here is good enough, so it’s mostly a matter of making sure my normal performance approaches my best performance.

The T4 to the left is from a tone reference chart and is a model T4 (female speaker, which explains the higher pitch in general). The T4 to the right is my own normal T4. Both pitch contours were drawn using Praat. Note that I lack the plateau in the onset (red ellipse).
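The two things discussed above, pitch range and the plateau at the onset of T4, can be put into rough numbers once the pitch values have been exported from a pitch tracker. The following is only a minimal sketch: the function names, the 1-semitone tolerance and the two contours are made up for illustration and are not measurements from my recordings or anything built into Praat. It assumes a list of F0 samples in Hz taken at evenly spaced times, with 0 marking unvoiced frames.

```python
import math

def semitones(f_hz, ref_hz):
    """Distance from ref_hz to f_hz in semitones (12 per octave)."""
    return 12 * math.log2(f_hz / ref_hz)

def pitch_range_semitones(f0_hz):
    """Span between the lowest and highest voiced F0 sample."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: 0 marks unvoiced frames
    return semitones(max(voiced), min(voiced))

def onset_plateau_fraction(f0_hz, tolerance_st=1.0):
    """Fraction of the contour that stays within tolerance_st semitones
    of the starting pitch before the fall begins."""
    voiced = [f for f in f0_hz if f > 0]
    start = voiced[0]
    count = 0
    for f in voiced:
        if abs(semitones(f, start)) > tolerance_st:
            break
        count += 1
    return count / len(voiced)

# Hypothetical contours in Hz, invented for this example.
model_t4 = [300, 300, 298, 295, 270, 240, 210, 180, 160, 150]
abrupt_t4 = [200, 185, 170, 155, 145, 135, 128, 122, 118, 115]

print(pitch_range_semitones(model_t4), onset_plateau_fraction(model_t4))
print(pitch_range_semitones(abrupt_t4), onset_plateau_fraction(abrupt_t4))
```

Semitones are used rather than raw Hz so that a higher and a lower voice can be compared on the same scale. With these invented numbers, the model contour holds its starting pitch for a larger share of the syllable and covers a wider range than the abrupt one, which is the difference the graphs are meant to show.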

Proposed line of attack

Changing pronunciation in general is hard; changing pitch range when you speak is several orders of magnitude harder. People tend to identify themselves with their own voices, so changing something as basic as pitch range requires time and courage. I’ll simply have to accept that this will take some work and time. I don’t doubt that it can be done, though. My plan is roughly as follows, although it isn’t really serial (I won’t wait until step one is entirely completed before attempting step two and so on).

  1. Individual syllables
  2. Disyllabic words
  3. Sentences
  4. Natural speech

The goal is to practice each step until it feels natural. Changing natural speech will be very hard indeed, so I’ll try to build up to it slowly, making sure I’ve implemented the above changes in a controlled environment. I’ll have several native speakers helping me, so I’m sure I will be just fine.


Before I started analysing the problem(s) in detail, I made a recording of myself reading an article. It’s roughly three minutes long and even if it isn’t the best I can do (I speak much better than I read), I still think it’s fairly representative of my pronunciation in general. I’ll leave that article alone for at least a month or two until I feel that I have actually achieved something. Then I’ll read it again and see whether I have really improved. I used Praat to check some passages of that article, but tones in connected speech are so complex that I think that seeing the pitch contour doesn’t help much. In this case, I’ll rely on competent native speakers instead.

Please help

Changing pronunciation like this is difficult in many ways, but one of the hardest parts is staying focused. Naturally, I can’t spend all my energy thinking about tones, because that will make communication awkward. However, it would be very nice indeed if people who speak with me regularly reminded me of this. As you well know, I don’t mind being corrected; indeed, it’s an essential part of learning. With your help, I should be able to correct these problems!


  1. nanpyn


    To prolong the T2 is a good solution, but be careful with the length; otherwise, it will turn into a T1. To be precise, it’s better to prolong the T2 and connect it with the following T1 or T4, so that the listener hears a natural peak or plateau, esp. for T2+T1 and T2+T4. However, if the T2 is at the end of a sentence, it’s alright and normal to speak it with a lower ending pitch.


    I’d like to thank you. You’re not only improving your pronunciation but also helping me notice what I hadn’t noticed before. I didn’t realize that it’s a matter of onset until I read your analysis. I simply thought there was a gap between the preceding tone and the T4. Now I know that it’s not a gap but that the ear is unable to hear an abrupt falling contour unless there is a reference at the beginning. In sum, each syllable requires a certain length of time.

    In contrast, it somehow explains why some people only hear the onset of T4 without hearing the falling part. In their ears, they may perceive the T4 as a loosened variant of the T1, e.g. speakers of Cantonese (in which the high falling tone is a variant of the high level tone) and Vietnamese (in which the high falling tone is absent).


    Compare these confusing combinations: T3+T2, T2+T3, T3+T3.

    You’re on the last mile. Jiayou!


    1. Olle Linge

      “To prolong the T2 is a good solution, but be careful with the length; otherwise, it will turn into a T1.”

      I think this will come naturally. I don’t think T2 turning into T1 is a big risk, actually, but I’ll be careful!

      “However, if the T2 is at the end of a sentence, it’s alright and normal to speak it with a lower ending pitch.”

      Thanks for reminding me! It sounds a bit exaggerated when journalists do this sometimes. It might be good for clarity, but doesn’t sound very natural (to me, at least).

      “In contrast, it somehow explains why some people only hear the onset of T4 without hearing the falling part. In their ears, they may perceive the T4 as a loosened variant of the T1.”

      This is very interesting. Now that you mention it, it looks obvious, but I actually haven’t thought about this before. Of course this is why some people have this problem! I’ve always been perplexed by this, because in my ears, T1 and T4 are as different as they can possibly be.

      “Compare these confusing combinations: T3+T2, T2+T3, T3+T3.”

      Will record and analyse later. :)


    2. nanpyn

      1. Ah, I should have stated that “if the T2 is at the end of a declarative sentence.” The overall intonation will also affect the pitch contour as we know.

      2. For learners whose native languages don’t have tones, once they hear the differences, they just hear them (as easy as filling in the blanks). For learners whose native languages do have tones, their ears are deceived by the sounds they have been accustomed to hearing (as vague as listening to music with a headset in a very noisy place). Maybe that’s why the former group always speak the tones more accurately than the latter group do. Lucky you, ha.

      3. Looking forward to them. :)