Article Review – “Voice Emotion Recognition by Cochlear-Implanted Children and Their Normally-Hearing Peers”

This week, I’m going to talk about a new study (PDF available for free through the link) by Chatterjee et al. (2015) that looked at how well children and adults with cochlear implants can identify vocal emotion, and how each group compares to its normally-hearing peers. (Chatterjee, M., Zion, D.J., Deroche, M.L., Burianek, B.A., Limb, C.J., Goren, A.P., Kulkarni, A.M., and Christensen, J.A. “Voice Emotion Recognition by Cochlear-Implanted Children and Their Normally-Hearing Peers.” Hearing Research, 322, 2015, 151-162.)


Detecting and identifying emotions in speech is really important for communication and social interaction. For example, if you’re talking with someone and they mention that they just bought new pants, it’s important to be able to pick up on any subtext underlying their statement. Are they excited that they finally had time to go shopping? Are they angry that they spilled coffee all over their old pants? Are they sad to admit a favorite pair will no longer button? Identifying the emotion behind the statement is crucial to knowing how to respond appropriately! And identifying emotion isn’t just important for following up; one study has even found that children’s ability to identify vocal emotion is correlated with their own assessment of their quality of life [1].

In a face-to-face conversation, facial expressions can aid in identifying vocal emotions. However, it’s harder in conversations that aren’t face-to-face, such as on the phone. In those situations, we rely entirely on acoustic cues to distinguish different emotions from each other. These acoustic cues can include how fast we talk, our pitch, how our pitch changes over the course of a sentence, and how loudly we speak.

Cochlear implants (CIs) convey some of these cues better than others. For example, CIs tend to convey speaking rate very well, but they are pretty bad at accurately conveying pitch and changes in pitch. (This is a fairly complex topic, and I don’t want to get too far into the weeds here, so for now I’ll leave it at that.)

Since identifying vocal emotion could potentially rely on many different acoustic cues, some of which are not accurately conveyed by CIs, Chatterjee et al. wanted to measure how well CI users could identify vocal emotion in speech. They looked at both children (who were pre-lingually deafened) and adults (who were, for the most part, post-lingually deafened, and had therefore acquired language as children before receiving a CI).

The Study

The researchers studied 4 groups of people: normally-hearing children, children with CIs, normally-hearing adults, and adults with CIs. All of the participants listened to a set of sentences and, for each sentence, identified whether the emotion underlying it was happy, sad, scared, angry, or neutral. The sentences were neutral in content (for example, “her coat is on the chair”), but each one was spoken by one of two talkers (one male, one female), who were instructed to speak it using one of the five emotions and to really exaggerate that emotion.

This article has a mountain of interesting results, but I’m going to focus on a few results that I found particularly interesting – I definitely encourage you to check out the article and look at the rest of the results yourself!

CI users (children and adults) had more trouble identifying vocal emotions than their normally-hearing peers


FIG. 5 of Chatterjee et al. – vocal emotion recognition scores for all test subject groups

The above figure (FIG. 5 from the article) shows the performance of each group (adults with normal hearing [aNH]; adults with cochlear implants [aCI]; children with normal hearing [cNH]; and children with cochlear implants [cCI]). Since there were 5 choices of emotion for each sentence, if a participant had guessed randomly, they would have scored 20% correct (this is marked in the figure by the black horizontal line). As you can see, on average, all of the groups did well above chance. However, while the normally-hearing participants, both adults and children, got almost 100% correct, the CI users had more trouble. The researchers found that the children with cochlear implants performed worse than both adults and children with normal hearing and, in general, similarly to adults with cochlear implants.
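To make the 20% chance level concrete, here is a small sketch. With 5 equally likely options, a listener who guesses is expected to get 1/5 of the trials right, and the binomial distribution tells you how unlikely a higher score would be from guessing alone. The 60-trial count assumes the 12 sentences per emotion (mentioned later in the post) times 5 emotions for one talker, and the score of 30 is a made-up example; the binomial calculation is my own illustration, not an analysis from the article.

```python
from math import comb


def chance_level(n_choices):
    """Expected proportion correct when guessing uniformly among n_choices."""
    return 1.0 / n_choices


def p_at_least(k, n, p):
    """Probability of k or more correct out of n trials by pure guessing
    (upper tail of a binomial distribution)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_trials = 12 * 5  # assumed: 12 sentences per emotion x 5 emotions, one talker
p_guess = chance_level(5)
print(f"chance = {p_guess:.0%}")  # chance = 20%
print(f"expected correct by guessing: {n_trials * p_guess:.0f} of {n_trials}")  # 12 of 60

# Hypothetical listener who scores 30/60 (50%): how likely is that from guessing?
print(f"P(>= 30 correct by guessing) = {p_at_least(30, n_trials, p_guess):.1e}")
```

A score well above 20% is very unlikely to come from guessing, which is the sense in which all four groups were "well above chance."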

Another interesting thing you can see in the figure is the effect of the talker’s gender – in particular, CI users did worse at identifying emotion for the male talker than for the female talker. This is especially true for the adult CI users. One note of caution on this result, though – the study only used sentences spoken by 1 male and 1 female talker, so these data aren’t enough to generalize about CI users’ ability to recognize emotion from male talkers vs. female talkers in general.

Emotions that were easily confused & corresponding acoustic cues

The graph above (FIG. 5 from the article) shows that CI users did worse at identifying emotions than the normally-hearing participants, but that’s for all emotions lumped together. The researchers also looked at which emotions the participants were likely to confuse with each other – for example, is happy often mistaken for scared?

One way to look at which emotions are confused with each other is to construct a confusion matrix from the responses. Here are the confusion matrices for the male talker for adults (top matrix) and children (bottom matrix) with CIs, adapted from FIG. 10 of Chatterjee et al.:


Adapted from FIG. 10 of Chatterjee, et al. – confusion matrices for adult (top) and children (bottom) CI users for the male talker.

Each block in the confusion matrix indicates the number of times the emotion indicated in the column header was identified as the emotion indicated in the row header (averaged over all participants in each group). There were 12 sentences spoken with each emotion, so if a particular group (for example, adults with CIs) were to get a perfect score, the diagonal entries would all say “12.” Instead, in the two confusion matrices shown above, you can see that the diagonal values are higher than the off-diagonal values, but none of the entries are 12, indicating that none of the emotions were correctly identified by CI users 100% of the time.
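Here is a minimal sketch of how a confusion matrix like this could be tallied from individual responses, using the orientation described above (rows = emotion responded, columns = emotion presented). The function names and the trial data are hypothetical, made up for illustration; they are not the authors’ analysis code or data. A group matrix like the ones in the figure would be the element-wise average of each listener’s matrix.

```python
EMOTIONS = ["happy", "sad", "scared", "angry", "neutral"]


def confusion_matrix(trials, labels=EMOTIONS):
    """Count responses for one listener.

    trials: (presented_emotion, responded_emotion) pairs.
    Rows = emotion responded, columns = emotion presented.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for presented, responded in trials:
        matrix[index[responded]][index[presented]] += 1
    return matrix


def average_matrices(matrices):
    """Element-wise mean across listeners, giving a group-averaged matrix."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / len(matrices) for c in range(n_cols)]
            for r in range(n_rows)]


def proportion_correct(matrix, labels=EMOTIONS):
    """Diagonal entry divided by its column total (all trials of that emotion)."""
    scores = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        scores[label] = matrix[j][j] / column_total if column_total else float("nan")
    return scores


# Made-up responses from one hypothetical listener (happy and scared trials only)
trials = ([("happy", "happy")] * 8 + [("happy", "scared")] * 4
          + [("scared", "scared")] * 9 + [("scared", "happy")] * 3)
m = confusion_matrix(trials)
print(m[0])  # "happy" row: [8, 0, 3, 0, 0]
print(proportion_correct(m)["scared"])  # 0.75
```

A perfect listener would put all 12 trials of each emotion on the diagonal, which is why a perfect score shows up as 12s along the diagonal.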

If we look at off-diagonal entries with relatively high values, we can see which emotions were often confused with one another. I highlighted one example in red – “happy” and “scared.” (“Angry” and “neutral” is another pair that tended to be confused by CI users for the male talker). Note that these are only the responses for the male talker – FIG. 10 in the article shows confusion matrices for both male and female talkers and for both CI users and normally-hearing participants.
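Spotting "off-diagonal entries with relatively high values" can also be done mechanically. This sketch adds up the confusions in both directions for every pair of emotions (happy heard as scared plus scared heard as happy) and ranks the pairs. The matrix below is invented for illustration and is NOT the data from FIG. 10; I just built it so that each column sums to 12 and the pattern loosely resembles the one described above.

```python
from itertools import combinations

LABELS = ["happy", "sad", "scared", "angry", "neutral"]

# Made-up group-averaged counts (NOT the values in FIG. 10).
# Rows = emotion responded, columns = emotion presented; each column sums to 12.
EXAMPLE = [
    [7, 0, 3, 0, 0],  # responded "happy"
    [0, 9, 1, 0, 1],  # responded "sad"
    [4, 1, 7, 0, 0],  # responded "scared"
    [1, 0, 0, 9, 3],  # responded "angry"
    [0, 2, 1, 3, 8],  # responded "neutral"
]


def most_confused_pairs(matrix, labels, top=2):
    """Rank emotion pairs by off-diagonal confusions in both directions
    (A heard as B plus B heard as A)."""
    scored = [(matrix[i][j] + matrix[j][i], labels[i], labels[j])
              for i, j in combinations(range(len(labels)), 2)]
    scored.sort(reverse=True)
    return [(a, b, total) for total, a, b in scored[:top]]


print(most_confused_pairs(EXAMPLE, LABELS))
# [('happy', 'scared', 7), ('angry', 'neutral', 6)]
```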

After looking at which emotions tended to be confused with each other, I think it’s interesting to see which acoustic cues tend to differentiate the easily confused emotions to see if it makes sense that CI users would confuse them. In this study, the authors looked at how 5 different acoustic cues vary for different emotions. Before I talk about those results, I’ll quickly explain the cues that the study analyzed:

  1. Mean F0 Height – F0 stands for “fundamental frequency.” Mean F0 height is basically the average pitch of the talker’s voice. So, a bass’s mean F0 height is lower than a soprano’s, and male talkers’ mean F0 height tends to be lower than female talkers’.
  2. F0 Range – This indicates how much the pitch of a talker’s voice varies over a sentence. If, over the course of the sentence, the speaker’s voice goes up and down a lot, they’d have a relatively high F0 range. Conversely, if they speak in a monotone, they’d have a lower F0 range.
  3. Duration – This is pretty simple – more quickly spoken sentences will have a shorter duration.
  4. Intensity Range – This indicates how much the speaker’s voice varies in loudness over the sentence.
  5. Mean dB SPL – This indicates the average loudness over the course of the sentence.
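The five cues above are all simple summaries of a sentence’s pitch and loudness over time. Here is a minimal sketch, assuming you already have a per-frame F0 track (in Hz, with None for unvoiced frames, where there is no pitch) and a per-frame intensity track (in dB SPL) from a pitch-analysis tool. The function and its inputs are hypothetical; the article may define or measure these cues differently (for example, F0 range in semitones rather than Hz, or a power-averaged mean level), so I include both F0-range versions and flag the averaging choice in a comment.

```python
import math
from statistics import mean


def acoustic_cues(f0_hz, intensity_db, frame_step_s):
    """Summarize one sentence with the five cues listed above.

    f0_hz: per-frame F0 estimates in Hz, with None for unvoiced frames.
    intensity_db: per-frame level in dB SPL (one value per frame).
    frame_step_s: time between frames, in seconds.
    """
    voiced = [f for f in f0_hz if f is not None]  # pitch only exists in voiced frames
    return {
        "mean_f0_hz": mean(voiced),
        "f0_range_hz": max(voiced) - min(voiced),
        # same range on a log scale, which compares low and high voices more fairly
        "f0_range_semitones": 12 * math.log2(max(voiced) / min(voiced)),
        "duration_s": len(intensity_db) * frame_step_s,
        "intensity_range_db": max(intensity_db) - min(intensity_db),
        # plain average of the dB values (averaging power first is another convention)
        "mean_db_spl": mean(intensity_db),
    }


# Hypothetical 5-frame "sentence" with 10 ms frames
cues = acoustic_cues([None, 100.0, 120.0, 150.0, None],
                     [50.0, 60.0, 70.0, 65.0, 55.0], 0.01)
print(cues["f0_range_hz"])  # 50.0
```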

And here are graphs (adapted from FIG. 1 of Chatterjee, et al.) showing how the acoustic cues vary for the different emotions. Although there’s a lot of interesting information in here, I’m just going to focus on the male talker’s duration and F0 range for the “happy” and “scared” sentences, since those two tended to be confused, as discussed above.


Acoustic cues for different emotions – adapted from FIG. 1 of Chatterjee, et al.

As you can see from the red boxes in the figure above, the male talker tended to speak “happy” and “scared” sentences with similar durations (see the red boxes in the panel in the middle row, left column). However, he varied his pitch a lot more for “happy” sentences than for “scared” sentences (see the red boxes in the top right panel, labeled “F0 range”). Recall that duration tends to be conveyed well through the CI, whereas variations in a speaker’s pitch (how much their voice goes up and down) tend not to be. So, for the male talker, “happy” and “scared” were very similar in a cue that is easy for CI users to use (duration), but differed a lot in a cue that is hard for CI users to use (F0 range).

This suggests that CI users tend to confuse emotions that differ primarily in acoustic cues that are not well-conveyed by the CI. (I want to be careful not to overstate this: I’m only looking at one pair of easily confused emotions for one of the two talkers. Also, the data in the article come from just one male talker and one female talker, so it’s possible that other talkers vary acoustic cues differently for different emotions – the authors have since collected data from many more talkers, so hopefully we will know more about the acoustic cues underlying different emotions soon!)

Comparison of CI users to their peers using a CI-simulator

Chatterjee et al. tested normally-hearing adults and children using a CI simulator to compare performance in the CI simulation to the actual performance of the CI users. This might sound sort of strange – why simulate the CI users when they collected actual data from the CI users?! One reason is that this particular type of CI simulation, the vocoder, lets us isolate one specific limitation faced by CI users: poor spectral resolution. Here’s one way to think about spectral resolution – imagine banging on a piano with a ball – using a smaller ball corresponds to having better spectral resolution (because the smaller ball hits fewer keys), and using a larger ball corresponds to having worse spectral resolution (because the larger ball hits more keys). Using the vocoder, we can see how having better or worse spectral resolution affects performance on a particular task, in this case, identifying vocal emotion. This lets us see whether spectral resolution matters at all for the task, as well as how much improving spectral resolution might improve performance.
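The piano analogy can be made a little more literal. A vocoder splits the frequency range into channels, and everything inside one channel gets blurred together. This sketch assumes a 200 to 7000 Hz range split into channels evenly spaced on a log-frequency axis (an assumption for illustration, not the study’s actual filter settings), and reports each channel’s width in semitones, which is the spacing between adjacent piano keys. So "semitones per channel" is roughly how many keys one ball covers.

```python
import math


def log_spaced_edges(n_channels, low_hz=200.0, high_hz=7000.0):
    """Band edges for n_channels filters spaced evenly on a log-frequency axis
    (assumed spacing and range, chosen for illustration)."""
    ratio = (high_hz / low_hz) ** (1.0 / n_channels)
    return [low_hz * ratio**k for k in range(n_channels + 1)]


def semitones_per_channel(n_channels, low_hz=200.0, high_hz=7000.0):
    """Width of each channel in semitones (adjacent piano keys are 1 semitone apart)."""
    return 12 * math.log2(high_hz / low_hz) / n_channels


for n in (4, 8, 16):
    print(n, "channels:", round(semitones_per_channel(n), 1), "semitones each")
# 4 channels: 15.4 semitones each
# 8 channels: 7.7 semitones each
# 16 channels: 3.8 semitones each
```

Under these assumptions, a 4-channel vocoder lumps together more than an octave’s worth of "keys" in each channel, while 16 channels narrows that to about 4 keys.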

One of the main parameters we can vary in the vocoder is the “number of channels.” Let’s go back to the ball example – 4 channels in the vocoder might correspond to banging on the piano with a basketball (worse spectral resolution), whereas 16 channels might correspond to using a golf ball (better spectral resolution). Although neither ball sounds great, you can imagine that the golf ball is better. This link has examples of what vocoded speech sounds like for different numbers of channels (scroll down to section 2) – if you listen to the sentences there, you’ll notice that it’s pretty easy to understand the sentence with 15 channels, but it’s really hard with 1 or 5 channels.
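For anyone curious how a vocoder with a given "number of channels" works, here is a minimal sketch of a generic noise-band vocoder using NumPy. In each channel it keeps only the slow loudness envelope of the speech and uses it to modulate a band of noise, so the fine pitch detail inside each channel is thrown away. Everything here is an assumption for illustration: the crude FFT "brick-wall" filters, the rectify-and-average envelope, the 200 to 7000 Hz range, and the function names are mine, and this is not the specific processing used by Chatterjee et al.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Crude 'brick-wall' band-pass filter: zero every FFT bin outside [lo, hi) Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spectrum, n=len(x))


def envelope(x, fs, smoothing_hz=50.0):
    """Slow amplitude envelope: full-wave rectify, then a moving-average low-pass."""
    width = max(1, int(fs / smoothing_hz))
    kernel = np.ones(width) / width
    return np.convolve(np.abs(x), kernel, mode="same")


def noise_vocoder(x, fs, n_channels, low_hz=200.0, high_hz=7000.0, seed=0):
    """Replace each channel's content with envelope-modulated noise of the same band.
    Requires fs > 2 * high_hz."""
    rng = np.random.default_rng(seed)
    edges = low_hz * (high_hz / low_hz) ** (np.arange(n_channels + 1) / n_channels)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi), fs)
        carrier = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        carrier /= np.sqrt(np.mean(carrier**2)) + 1e-12  # unit-RMS noise band
        out += env * carrier
    # match the overall RMS level of the input
    out *= np.sqrt(np.mean(x**2)) / (np.sqrt(np.mean(out**2)) + 1e-12)
    return out


# Usage: vocode 1 second of a 1 kHz tone (a stand-in for recorded speech) with 8 channels
fs = 16000
t = np.arange(fs) / fs
vocoded = noise_vocoder(np.sin(2 * np.pi * 1000 * t), fs, n_channels=8)
```

With fewer channels, each noise band is wider, so more spectral detail (including the pitch cues discussed above) gets smeared together.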

Ok, so back to the study – Chatterjee et al. tested normally-hearing adults and children using the vocoder with different numbers of channels – adults listened to 4 (worst spectral resolution), 8, and 16 (best spectral resolution) channels, and children only listened to 8 channels. Here’s a figure (adapted from FIG. 6 of Chatterjee, et al.) showing the results:


Performance with a CI simulation – adapted from FIG. 6 of Chatterjee, et al.

If you look at the red and blue boxes in the figure above, you can see that both adults and children with CIs performed similarly to normally-hearing adults listening to a simulator with 8 channels (a medium amount of spectral resolution), and that normally-hearing adults did better with 16 channels (better spectral resolution), at least for the female talker. This suggests that improving spectral resolution could improve CI users’ performance.

I think the most interesting thing about this figure is how poorly normally-hearing children listening to the CI-simulator did! Notice that their scores (highlighted by the green box) are much worse than the adults listening to the 8-channel simulator, AND, interestingly, much worse than the children with CIs! This indicates the huge benefit that children with CIs are receiving – they are performing, at least with respect to vocal emotion identification, like adults with CIs, and much better than normally-hearing children listening to a CI-simulator (probably because the children with CIs hear everything in daily life through the CI, whereas it probably takes time for children listening to a simulator to adapt to the sound of the simulations).

My Takeaways

If you’ve read this far – thank you! (Or maybe you’re my husband reading this under duress? Hi, G!)

I think this study has interesting implications for speech therapy for children with CIs – it’s clear from these data that at least some children have trouble identifying different vocal emotions, and focusing on this in some way might go a long way towards overcoming this deficit.

This study only looked at children with CIs, so it’s not clear from this whether children with milder hearing loss who wear hearing aids face the same problems. From interacting with T (9 months, with a mild hearing loss), I think he definitely notices different vocal emotions – for example, he will look up very attentively if I start talking in an angry or frustrated way (umm, not that that happens a lot!), and he’ll stare at me with huge eyes. Also, if my husband and I start talking in an excited way, he’ll sometimes “join in” by smiling and squealing. Although he of course can’t yet label different emotions, I think he’s definitely picking up on some of the acoustic cues underlying them (although, in all of these examples, he’s also certainly picking up on our facial expressions and body language as well).


[1] Schorr, E.A., Roth, F.P., Fox, N.A. “Quality of Life for Children with Cochlear Implants: Perceived Benefits and Problems and the Perception of Single Words and Emotional Sounds.” Journal of Speech, Language, and Hearing Research, 52, 2009, 141-152.



Hearing Aids and Daycare

We had selected a daycare for T while I was pregnant, but once we found out T had a hearing loss and would need to wear hearing aids, I started second-guessing our decision to send him to daycare. I was worried that it would be hard to teach caregivers to put in the hearing aids. I also worried that the daycare might have high turnover, and it would be hard to teach all of the caregivers how to do this properly. At that point, we briefly considered either a nanny or a nanny-share.

I want to acknowledge here how lucky we are that we had a choice at all! Also, my mom watched T from when he was 3 months old (which was when I went back to work and when he first got hearing aids) until he started daycare, which we are so grateful for – this gave us time to get comfortable with everything and make a childcare choice we were happy with.

Anyway, while we were thinking about whether daycare would still work for us, I searched online for information about daycare for babies who wear hearing aids, and I didn’t really find much. We did end up choosing to send T to daycare (when he was 5 months old), so I wanted to write about our experience with that here.

There were 3 main things I was worried about daycare handling properly:

  1. Making sure T wore his hearing aids whenever he was awake
  2. Taking his hearing aids off at nap time (I didn’t want him pulling them off and putting them in his mouth while he was in his crib)
  3. Not losing the hearing aids

All of these things turned out totally fine, and honestly, they were fine from the very beginning! When T first started, my husband went in on the first day and showed the two main caregivers how to put T’s hearing aids on. They got the hang of it right away! He also gave instructions for taking them off at nap time, for replacing the Stick and Stays on the processors whenever they lost their stickiness (we use Stick and Stays to stick the processors to T’s head behind his ears so that they don’t fall off), for storing the hearing aids in a box in T’s cubby when he wasn’t wearing them (so that they weren’t accessible to any babies who could accidentally swallow the batteries), etc. I also sent in a document with written instructions for all of this. For the first few days, the caregivers had a bit of trouble getting the ear molds in all the way, but we showed them how to tell visually when the molds weren’t fully inserted, and also how you can hear feedback when they aren’t fully inserted. After that, the ear molds were always in all the way when we picked T up.

In the spirit of being totally honest, I should note that, although I’m so happy and relieved with how well T’s daycare has handled his hearing aids, it took me a little while to get there (not because they were doing anything wrong, just because I worried so much!). For the first few weeks, my husband and I weren’t sure if they were taking his hearing aids off when he was napping – as I mentioned above, this was really important to me, because I didn’t want T pulling off his hearing aids while in his crib and putting them in his mouth. Our doubts came from the Stick and Stays: they still seemed very sticky when we picked T up at the end of the day, and we didn’t know whether the caregivers were replacing them during the day. So, I devised a plan where we marked the Stick and Stays with a Sharpie in the morning so we could tell if they had been changed during the day – my thinking was that if the same morning Stick and Stays came back at the end of the day and were still sticky, we’d know that the caregivers hadn’t removed T’s hearing aids during nap time. (Yep. I’m basically the bumbling Hercule Poirot of baby hearing aids.) As a result of my detective work, we were relieved and happy to find out that they were changing the Stick and Stays (just as we’d instructed!). I also happened to pop in at nap time one day and saw T napping without his hearing aids on, and at that point I started to relax (about the hearing aids, anyway. I’m still pretty tightly wound about lots of other stuff).

I realize that this post doesn’t really have any advice if you are thinking about whether or not to put your hearing-aid wearing baby in daycare or about how to ease that transition, and I’m not sure what advice I have on that front. We really liked the daycare when we visited (before we knew about T’s hearing loss), and we felt really comfortable with the idea of him being there. I wish I had more advice to give, but I hope our positive experience is helpful to read about!

Playing With Bubbles

One of my favorite things about T’s speech therapy sessions is that we learn new ways to interact and play with T, including playing with toys we already have at home. At our very first speech therapy session (when T was only 4 months old!), T’s speech therapist showed us ways to play with T while playing with bubbles.

T’s speech therapist’s eventual goal was to get T to produce “ba” and “pa” sounds. However, when we first started, our initial goal was just to reward T for producing any sounds at all. So, we’d count to 3, wait for T to say something (anything!), and reward him by blowing bubbles. To be honest, it took about a month to a month and a half (until he was about 6 months old) for T to become interested in the bubbles – I think he had trouble visually focusing on them at first. After that, he started to really love watching the bubbles float around, and especially loved it when we caught one on the bubble wand and brought it to him to pop. He soon got the hang of “asking” for bubbles (he’d mostly say “da!” or “ga!”).

After that, we started trying to make the game a little harder for T. First, a little background – one of the things T’s speech therapist has talked to us about is pairing consonants that are similar but differ in one key respect, and using them together in the same context so that T can hear the two consonants side by side and try to hear the difference. An example of this (which we use in our bubble games) is the pair “ba” and “pa.” “Ba” and “pa” are really similar in that both are bilabial consonants – meaning they are formed with the lips pressed together (if you look in a mirror while making “ba” and “pa” sounds, you’ll notice that they look really similar!). However, “ba” and “pa” differ from each other in a parameter called “voice-onset time.” In the case of “ba” and “pa,” voice-onset time refers to the amount of time between when the lips open and when the vocal folds begin to vibrate. The time is much shorter for “ba” than for “pa,” and you can hear it in how “pa” has a more explosive attack than “ba.”

Anyway, we’ve been trying to get T to hear and produce “ba” and “pa” sounds using bubbles. Now that T is a bit older, we count to 3 and try to wait for him to make a “ba” sound (if he is reluctant, we’ll say “ba-ba-ba-BUBBLES!”). Then, we’ll show him how to pop the bubbles while saying “Pa-Pa-Pa-POP!” T has become somewhat consistent in saying “ba” to get us to blow bubbles, and he is working on the “pa” sound – we’ve heard him say it a few times in the context of popping bubbles, which is exciting!

Article Review – “Statistical Learning by 8-Month-Old Infants”

Between last week’s article review and this week’s article review, I seem to be on a bit of a kick talking about infant language development. Don’t worry, I have something totally different in mind for next week!

Like last week, the article I’ll talk about here is a classic, and it describes an elegant body of research that changed how scientists think about how infants acquire language. The article is “Statistical Learning by 8-Month-Old Infants” and is available for free as a PDF through the link. (Saffran, J.R., Aslin, R.N., Newport, E.L. “Statistical Learning by 8-Month-Old Infants.” Science, Vol. 274, No. 5294, 1926-1928, 1996.) Note that the infants studied had normal hearing, and I’m not sure how the results would change for infants with hearing loss.


The study described in the article looked at how infants learn to segment a stream of speech into words – that is, to identify which chunks in a stream of speech constitute a word (rather than a syllable, a phrase, a sentence, etc.). When I first heard about the question underlying this study, my initial reaction was that it was silly: aren’t words marked by pauses on either side (like how words are marked by spaces in written text)? It turns out this isn’t true at all!

For example – here’s the sound signal from me speaking the sentence “I really like Mississippi.” (I chose this sentence because of the variability in the number of syllables per word, not out of any particular fondness for Mississippi; I’ve actually never been to Mississippi!).


As you can see there, sometimes the word boundaries line up with the pauses in the signal, such as between the words “I” and “really.” Other times, there’s pretty much no pause between words, such as between the words “really” and “like.” And other times, there are large pauses within a word, such as in the word “Mississippi.” So, pauses or gaps are really not a good indicator of word boundaries!

Before I keep going, I can’t resist sharing a little anecdote – I had never considered how difficult the problem of identifying word boundaries in speech is until I watched my husband try to learn Tamil, the language that my family speaks. He would hear someone say something like “I ate an ice cream cone last Tuesday” (but in Tamil), and he would ask me questions that were the equivalent of “What’s an eamco? What does asttue mean?” I would get so frustrated, because I had no idea what he was asking! (I’m a little embarrassed to admit that more than one visit with my family included me yelling “THAT’S NOT A WORD!” at my husband.)

All of this to say – segmenting a stream of speech into word chunks is actually really hard, even though we seem to learn to do this effortlessly in our native languages! Saffran et al. studied how infants learn to segment a stream of speech into words.

The Study

One potentially powerful cue for identifying word boundaries is the statistical regularity in how frequently one sound tends to follow another in a stream of speech. Saffran et al. give the example of the phrase “pretty baby” – over a huge corpus of speech (like what you might hear over many days), the sound “ty” is more likely to follow the sound “pre” than the sound “ba” is to follow “ty.” If you were to keep track of how likely different sounds are to follow other sounds, over time you might figure out that “pretty” is one word chunk and “baby” is another (of course, just knowing how likely one sound is to follow another doesn’t give you any idea of the meaning of the word; that’s a different problem to solve!).
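Saffran et al. call this statistic a transitional probability. As I understand their definition, for two syllables X and Y it is simply how often X is followed by Y, divided by how often X occurs at all. In the made-up language described next, this works out to 1 for syllable pairs inside a word and about 1/3 for pairs that span a word boundary.

```latex
P(Y \mid X) = \frac{\text{frequency of } XY}{\text{frequency of } X}
```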

Saffran et al. had previously shown that adults can use these probabilities to learn to segment speech into words, so they wanted to extend this work to see if infants can also use this information.

In Experiment 1, the researchers created a made-up language that had 4 nonsense words, each with 3 syllables. The words were: tupiro, golabu, bidaku, and padoti. (These were the words for one of two conditions; the researchers tested two groups of infants with two different sets of words, to make sure that the infants didn’t have a bias for particular nonsense words.) They then played a continuous stream of speech that consisted of these 4 words repeated in random order for 2 minutes. So, the speech might have sounded like “tupirogolabubidakupadotigolabubidaku…” The words were spoken in a monotone, with no pauses or stresses on particular syllables that might have indicated where word boundaries were. Since there were no pauses between words and no other acoustic cues (tone, stress, etc.), the only difference between words and non-words was how frequently one syllable followed another. So, for example, “ku” always followed “da” (in bidaku), with 100% probability, but “pa” followed “ku” only when the word padoti followed bidaku (with 33% probability) – this would indicate that “da” and “ku” go together, whereas “ku” and “pa” generally don’t.
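Here is a minimal sketch of that statistic on a stream built from the four words above. Two details are my assumptions, not taken from the article: the same word never appears twice in a row (which is what makes each boundary transition come out near 1/3, matching the 33% figure above), and the stream is 180 words long, an arbitrary choice. The helper names are hypothetical.

```python
import random
from collections import Counter

WORDS = ["tupiro", "golabu", "bidaku", "padoti"]


def syllables(word):
    """Split a CV nonsense word into 2-letter syllables ('tupiro' -> tu, pi, ro)."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def make_stream(n_words, seed=0):
    """Random word order with no immediate repeats (assumed), concatenated with no pauses."""
    rng = random.Random(seed)
    order, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        order.append(word)
        prev = word
    return [syl for word in order for syl in syllables(word)]


def transitional_probabilities(stream):
    """P(Y | X) = count(X followed by Y) / count(X), over adjacent syllables."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


tps = transitional_probabilities(make_stream(180))
print(tps[("da", "ku")])  # within a word: 1.0
print(round(tps.get(("ku", "pa"), 0.0), 2))  # across a word boundary: near 1/3
```

The same dictionary has no entry at all for ("da", "pi") or ("pi", "ku"), the syllable pairs inside the non-word dapiku, because those transitions never occur in the stream.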

They then tested whether the infants (8 months old) could distinguish the nonsense words in this made-up language from non-words (that is, 3-syllable groups of sounds that weren’t any of the 4 words in the made-up language). The researchers created a test set that consisted of two of the nonsense words (tupiro and golabu) and two similar non-words that the infants had never heard in the stream of speech (dapiku and tilado). Note that all of the syllables in the non-words were present in the words in the stream of speech, but not in the same order. For example, “da” and “pi” were both syllables that were heard in the stream, but “pi” never followed “da.”

The infants were then tested to see whether they could discriminate between the words and non-words. They did this by seeing whether infants paid attention for longer after hearing the non-words (that weren’t present in the stream of speech they had listened to earlier) compared to the words that they had heard – the idea here is that infants pay attention longer to stimuli (sounds, visual objects, etc.) that they haven’t heard or seen before compared to stimuli that they’re familiar with. And, the researchers found that the infants did in fact pay attention for longer after hearing the non-words (on average, almost a full second longer). This indicates that they had learned what syllables should follow each other (in the example above, that “ku” should follow “da”), even after listening to the stream of speech for only 2 minutes!

But merely knowing the order in which syllables should go isn’t enough to segment a stream of speech into words. For example, with the stream “tupirogolabubidaku…,” knowing the syllable order doesn’t tell you whether a word is golabu or bubida. The researchers conducted a second experiment, in which the test set consisted of 2 words and 2 “part-words.” Each part-word was also three syllables long, and was created by joining the final syllable of one word with the first two syllables of a different word (e.g., bubida – a combination of golabu and bidaku). In this case, the infants might have heard the part-words in the stream of speech – for example, there is some chance that bidaku could follow golabu, and in that case, the infant would hear bubida. However, they would hear bubida far less frequently than they would hear either golabu or bidaku, because “bi” is relatively unlikely to follow “bu,” since that transition spans a word boundary.
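One simple way to see why transitional probabilities are enough to find word boundaries is to cut the stream wherever the probability of the next syllable drops. This sketch does that on a stream built the same way as in the earlier example (random order, no immediate repeats, 180 words; all assumptions of mine), using an arbitrary cutoff of 0.75. It is an illustration of the statistical idea, not a model of what the infants actually did. It also counts how often the word golabu and the part-word bubida occur in the stream.

```python
import random
from collections import Counter

WORDS = ["tupiro", "golabu", "bidaku", "padoti"]


def make_stream(n_words, seed=1):
    """Random word order with no immediate repeats (assumed), split into 2-letter syllables."""
    rng = random.Random(seed)
    order, prev = [], None
    for _ in range(n_words):
        prev = rng.choice([w for w in WORDS if w != prev])
        order.append(prev)
    return [w[i:i + 2] for w in order for i in (0, 2, 4)]


def segment(stream, threshold=0.75):
    """Insert a word boundary wherever P(next syllable | current syllable) < threshold."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    words, current = [], [stream[0]]
    for x, y in zip(stream, stream[1:]):
        if pairs[(x, y)] / firsts[x] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


stream = make_stream(180)
print(sorted(Counter(segment(stream))))  # ['bidaku', 'golabu', 'padoti', 'tupiro']

text = "".join(stream)
print(text.count("golabu"), text.count("bubida"))  # word occurs about 3x as often as part-word
```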

Experiment 2 was trickier than Experiment 1, since the part-words were combinations that the infants would have heard in the 2-minute stream, albeit less often than the words. Even with the increased difficulty of the task, the infants were still able to distinguish the words from the part-words!

These two experiments show that infants use statistics underlying speech sounds (how frequently one sound tends to follow another) to build up a mental representation of language, and they can use that to help them learn where word boundaries occur in speech. What’s more, they can do this VERY rapidly (in this case, after listening to just 2 minutes of a stream of nonsense words).

My Takeaways

Reading this study, it’s totally amazing to me that such young children (8-month-old infants) were able to glean so much information from listening, for such a short time, to a stream of speech made up of words they had never heard before.

One of the things T’s speech therapist first said to us when T was only 2 months old is that we should talk to him A LOT – we should narrate what we’re doing, have “conversations” with him even though he wasn’t really responding, etc. We tried hard to do this, but honestly, it gets a little tiresome to basically just be talking to yourself – and I kind of wondered, what’s the point? Is T getting anything out of this? After reading this study, I think all of that talking must be really important! After all, to be able to build up a statistical model of language, T needs to hear lots and lots of examples of lots of different words. This also highlights the importance of T wearing his hearing aids – if it’s hard for T to hear the difference between two sounds (e.g., “sa” and “fa”), it will be hard for him to build up a mental model of how often one of those sounds follows another.






When your baby keeps pulling his hearing aids off

In this post, I wrote about using the Phonak Stick and Stays to keep the processors stuck on behind T’s ears. That worked really well from when T was 3 months old (when he first got hearing aids) up until now (at almost 9 months old). However, in the past week or so, T seems to have discovered that he has ears, and frequently reaches up to touch them, pull them, pull off the hearing aids and look at them, pull off the hearing aids and put them in his mouth, etc. I don’t think his ears are bothering him, and I don’t think the ear molds or processors are bothering him (no fluid in his ears, no other signs of an ear infection, there’s no obvious gapping between the ear mold and T’s ear, and when T is distracted playing, he doesn’t tend to yank at the hearing aids) – it seems like he’s just discovered that the hearing aids are there. Now, the Stick and Stays are not nearly sticky enough to withstand T’s grasp and pulling, so he can very easily get the hearing aids off. And, boy is he FAST!

So, we decided the Stick and Stays were no longer quite enough. We first tried using toupee tape instead. The toupee tape is a bit stickier, but it still wasn’t sticky enough to foil T. Also, the toupee tape doesn’t come pre-cut into the shape of a hearing aid processor, so it’s a little more annoying to work with.

Next, we tried putting on a hat that covers T’s ears. We ordered these – the style is called a pilot’s cap, and other companies sell them as well. So far, the hat seems to be working pretty well! It’s made of a soft, lightweight cotton that’s not too hot for indoor-wear, and although T will still reach up for his ears, he can’t reach the processor to pull it off. And, it comes in lots of colors and looks adorable (picking out baby clothes has become something of a new hobby for me).

A few other notes about using a pilot’s cap: the fit is pretty important. You want the hat to be snug, since if it’s too loose, it won’t fully cover the ears. At 9 months, T wears an XS, and it fits perfectly, although I do have to make sure to tie the chin straps pretty tightly. Also, if you try the pilot’s cap and find that it works well, I highly recommend getting backups (this is just common sense when it comes to baby clothes), since they get pretty gross during mealtime.

Speech Therapy Session – February 12, 2016

T (almost 9 months!) had a blast at speech therapy today! We played a game where we rolled a ball back and forth: I sat with T, and we rolled the ball to his speech therapist, who rolled it back to us. After we received the ball, we’d tap on it (well, I tapped; T pounded!) while saying “ba-ba-BALL!” and then we’d roll it to the speech therapist, emphasizing the word “ROLL!” T got lots of practice hearing the contrast between the vowels “ahhh” and “ohhh,” and he also got to practice turn-taking by having to wait for the ball to come back to him. T is usually pretty patient about turn-taking, but he was SO EXCITED about the big red ball that he kept trying to race across the room to get it back. So, this was good practice for him!

And, something exciting happened! We’ve been working hard for the past few weeks on drawing T’s attention to the sound “shhhh.” For example, today at speech therapy, we were playing with a toy piano that plays a song when you push a button; when the song stopped, we’d say “SHHHHH!!! It’s quiet!! The music stopped!” and we would really emphasize the “SHHH” sound with gestures. We’ve been doing this sort of thing for a few weeks now, both with music and by pairing the “shhhh” sound with hiding during peekaboo. And today, for the first time, T produced the “shhh” sound! When the music on the toy stopped, T said “shhh.” We were all really excited: “shhh” is one of the more difficult Ling sounds to produce, since it requires careful mouth positioning, and it’s a bit difficult to hear, since it’s a relatively high-frequency sound that is also fairly soft. T’s speech therapist said that all of our peekaboo practice was paying off. I didn’t mention that I’ve also been privately working on “shhhh” sounds with T at home by singing and dancing to Taylor Swift’s “Shake It Off” – sometimes, T will start grinning at me when I sing “SHHHHHHHake it off” to him 🙂 (Go ahead and judge my taste in music – in the words of T-Swift, “haters gonna hate hate hate hate hate!”)

Article Review – “Cross-Language Speech Perception”

I wanted to talk about a really cool article that I first read in grad school about infant language development. Although this article was published in 1984, it’s from a classic series of experiments, and is very relevant to T at his current age (8.5 months). Note that the infants studied in the article all had normal hearing, and I’m not sure how the results would change for infants with hearing loss!

The article is “Cross-Language Speech Perception: Evidence for Perceptual Reorganization During the First Year of Life” and is available for free as a PDF through the link. (Werker and Tees. “Cross-Language Speech Perception: Evidence for Perceptual Reorganization During the First Year of Life.” Infant Behavior and Development, Vol. 7, Pages 49-63, 1984).


All languages have consonants – consonants are speech sounds that are articulated with a partial or full closure of the vocal tract. One defining feature of a consonant is its place of articulation, or where in the vocal tract the obstruction occurs. For example, the consonant sounds “p” and “b” are called “bilabial” because both lips close to form the obstruction. Another example is “alveolar” consonants, where the tongue presses against the gum ridge just behind the upper teeth – examples of alveolar consonants are “d” and “t”.

Place of articulation can be a defining feature for distinguishing two consonants – for example, a key difference between “ba” and “da” is that “ba” is made with the lips pressed together and “da” is made with the tongue pressed up against the alveolar ridge (just behind the upper teeth). “Ba” and “da” are also considered “contrastive” in English – this means that if you substitute one for the other in a word, the meaning changes (for example, “bang” and “dang”). So, in English, a bilabial articulation (“ba”) is contrastive with an alveolar articulation (“da”), but there are other pairs of places of articulation that are not contrastive in English.

For example, Hindi has a “retroflex” place of articulation, which is produced by curling the tongue backward toward the hard palate; it occurs with sounds that are similar to the English consonants “t” and “d.” In Hindi, the retroflex articulation is contrastive with a “dental” articulation, where the tongue presses against the back of the upper teeth. So, in Hindi, a “t” sound can be made with the tongue just behind the upper teeth (“dental”) or with the tongue curled way back (“retroflex”). These two kinds of “t” are written as different letters, and substituting one for the other in a word creates a different word.

English doesn’t have a retroflex consonant (only 11% of languages have retroflex consonants!) – in fact, native adult English speakers can’t really even hear the difference between a retroflex “t” and a dental “t” (which are different sounds and different letters in Hindi) – they tend to label both the retroflex “t” and the dental “t” as an alveolar articulation, which corresponds to the “t” sound in English.

What’s really interesting is that young infants (up to about 6-8 months old) in English-speaking families can hear the difference between a retroflex “t” and a dental “t,” and sometime between early infancy and adulthood, they lose this ability. The general idea is that all babies, regardless of the language spoken at home, are born with the ability to hear the difference between all these different consonant contrasts, and, based on the language they hear around them, the brain “prunes” away the ability to hear the contrasts that aren’t needed for their native language. Werker and Tees studied when infants lose this ability.

The Experiment

In this post, I’ll focus on Experiment 2 described in the article. The authors had previously found that 6-8 month old infants could discriminate (that is, hear the difference between) the Hindi retroflex “t” (which I’ll label here as tr) and the Hindi dental “t” (which I’ll label here as td). In a pilot study, they found that English-speaking 4-year-old children performed similarly to English-speaking adults; that is, they couldn’t hear the difference. So, in Experiment 2, the authors looked at whether infants being raised in English-speaking homes could hear the difference between tr and td at 8-10 months and at 10-12 months of age. They also compared the results to those of babies being raised in Hindi-speaking homes.

To test whether the babies could hear the difference between tr and td, the authors used a conditioned head-turn procedure. A string of one of the consonants was played in a loop and then suddenly changed to the other consonant (for example, “tr tr tr tr td”). The babies were conditioned to turn their heads to look at a toy animal when they detected a change in the consonant being played. (This is actually kind of similar to the procedure for Visually Reinforced Audiometry used to measure audiograms in babies!)

The authors found that, of the babies being raised in English-speaking homes, most of the 6-8 month old infants could discriminate tr and td, some of the 8-10 month old infants could, and only a few of the 10-12 month old infants could. The 10-12 month old infants were significantly worse at discriminating the two consonants than either the 6-8 month old infants or the 8-10 month old infants. Additionally, the authors found that all of the 10-12 month old babies being raised in Hindi-speaking homes could discriminate tr and td. The figure below shows the proportions of infants in the different age groups who could discriminate these two consonants (it’s FIG. 4 from the article).


(Note that, in the above figure from the article, the graphs show both results using the Hindi consonants, as well as two consonants from a different language called Salish. Additionally, the top row of graphs (labeled “cross-sectional data”) shows results from different babies, and the bottom row of graphs (labeled “longitudinal data”) shows results from the same group of babies followed over time from 6-8 months through 10-12 months).

The results of this study show that infants up until 6-8 months of age can hear the difference between consonants that aren’t contrastive in their native language, but that they lose this ability somewhere between 8-12 months of age. A lot of important language development happens in the first year of life!

Testing Myself, My Husband, and My Baby

I picked this article to revisit because I was interested in T’s ability to discriminate retroflex and dental consonants, since he’s in the interesting 8-10 month old age range where he might or might not be able to hear the difference.

The Adults – Me and My Husband

Before testing T, I decided to see whether my husband and I could hear the difference. I found synthesized audio files of retroflex and dental consonants here (although the synthesized consonants were the retroflex and dental “d” consonants rather than “t” consonants as used in the article; the retroflex/dental “d” consonants are also present in Hindi). If you click on that link, you can hear the consonants that I used (they’re labeled CV1 and CV7) – can you hear the difference?

To test myself and my husband, I randomly played either the same consonant (both retroflex or both dental) or different consonants and asked whether they were the same or different. If we were just guessing randomly, we’d get about 50% correct.
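If you want to try this at home, here’s a minimal Python sketch of how the trial list and scoring could be set up. It’s an illustration, not the exact procedure I used: it assumes each trial picks the first and second sound independently at random (so about half the trials come out “same”), and the function names are made up for this example. Actually playing the audio files isn’t included.

```python
import random


def make_trials(n_trials, sounds=("retroflex", "dental"), seed=None):
    """Build a random list of (first, second, correct_answer) same/different trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        first = rng.choice(sounds)
        second = rng.choice(sounds)
        answer = "same" if first == second else "different"
        trials.append((first, second, answer))
    return trials


def percent_correct(trials, responses):
    """Score a listener's "same"/"different" responses against the answer key."""
    correct = sum(
        response == answer for (_, _, answer), response in zip(trials, responses)
    )
    return 100 * correct / len(trials)


trials = make_trials(20, seed=1)
for first, second, _ in trials:
    print(first, "then", second)  # play these two sounds, then ask "same or different?"
```

Picking each sound independently keeps the listener from predicting the answer, which is why a pure guesser lands near 50% correct.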

Let’s start with me: after testing myself on two sets of 20 comparisons, I got 47% correct, just as if I’d guessed randomly. It turns out I can’t hear the difference between retroflex and dental consonants at all! And now for my husband: over two sets of 20 comparisons, he got 90% correct! He said the difference was in the beginning part of the consonant (the two sounds seemed to have different “attacks”) and that it was very clear to him.
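To put “just as if I’d guessed randomly” into numbers, here’s a quick Python sketch of the chance of scoring at least that well by pure guessing, using a binomial distribution with a 50% chance of being right on each comparison. The counts are my approximations from the percentages above (90% of 40 comparisons is 36 correct; 47% is roughly 19 of 40), and treating every comparison as an independent coin flip is a simplifying assumption.

```python
from math import comb


def chance_of_at_least(correct, trials, p_guess=0.5):
    """Probability of getting `correct` or more right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


husband = chance_of_at_least(36, 40)  # 90% of 40 comparisons
me = chance_of_at_least(19, 40)       # roughly 47% of 40 comparisons

print(f"Husband: {husband:.1e}")  # a tiny probability
print(f"Me:      {me:.2f}")       # easily explained by guessing
```

In other words, a pure guesser would score 36 or more out of 40 less than once in ten million tries, while a score like mine turns up more often than not.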

It kind of irritated me that my husband was so much better than me at hearing the difference! So, I did a little digging: this [1] study by Pruitt et al. found that adult native Japanese speakers are better than adult native English speakers at hearing the difference between Hindi retroflex and dental consonants. They hypothesized that this is because Japanese contains a consonant contrast that is similar to the retroflex/dental contrast in Hindi (in Japanese, it’s /d/ vs. a flapped /r/, which is sometimes produced as a retroflex consonant). My husband went to a Japanese immersion school for several years as a child, so my current hypothesis is that the reason he could discriminate the Hindi consonants so much better than me is his early exposure to Japanese!

The Baby

To test T, I first tried to see if he could tell the difference between the English “ba” and “da.” These two consonants also differ in place of articulation, but since T hears this contrast every day, he should be able to tell them apart regardless of his age. I produced a stream of “ba”s and then switched to “da” (or vice versa), and he quickly looked up, indicating that he’d noticed the difference. (This is different from the way the babies were tested in the article, since I don’t have the same equipment that they had!)

I then repeated this with the Hindi retroflex and dental consonants. A few times, his attention seemed to shift right when the consonant changed, suggesting that he might have heard the difference. However, there were also several instances where his attention didn’t shift at all. It’s hard to say whether this is because he didn’t hear the difference or because his attention was drawn elsewhere (for example, to the laptop producing the sounds!).

Overall, I can’t say whether or not T can discriminate between Hindi retroflex and dental consonants. It might be that he can, but that the way I tested him wasn’t thorough enough to detect it. Alternatively, he might not be able to discriminate the two consonants, but then we don’t know whether that’s because he previously had this ability and has since lost it with age (as with the babies studied in the article), or because he was never able to discriminate them (since I never tested him on this when he was younger). I wish I had done this earlier, when he was 6 months old, to see how he would have reacted then!


[1] Pruitt J.S., Jenkins J.J., and Strange W. “Training the perception of Hindi dental and retroflex stops by native speakers of American English and Japanese.” J. Acoust. Soc. Am., 119, 1684, 2006.