Article Review – “Statistical Learning by 8-Month-Old Infants”

Between last week’s article review and this week’s article review, I seem to be on a bit of a kick talking about infant language development. Don’t worry, I have something totally different in mind for next week!

Like last week, the article I’ll talk about here is a classic, and it describes an elegant body of research that changed how scientists think about how infants acquire language. The article is “Statistical Learning by 8-Month-Old Infants,” and it is freely available as a PDF. (Saffran, J.R., Aslin, R.N., Newport, E.L. “Statistical Learning by 8-Month-Old Infants.” Science, Vol. 274, No. 5294, 1926-1928, 1996). Note that the infants in the study had normal hearing; I’m not sure how the results would differ for infants with hearing loss.


The study described in the article looked at how infants learn to segment a stream of speech into words – that is, how they identify which chunks in a stream of speech constitute a word (rather than a syllable, a phrase, a sentence, etc.). When I first heard about the question underlying this study, my initial reaction was that it was silly – aren’t words marked by pauses on either side (like how words are marked by spaces in written text)? It turns out this isn’t true at all!

For example – here’s the sound signal from me speaking the sentence “I really like Mississippi.” (I chose this sentence because of the variability in the number of syllables per word, not out of any particular fondness for Mississippi; I’ve actually never been to Mississippi!).


As you can see there, sometimes the word boundaries line up with the pauses in the signal, such as between the words “I” and “really.” Other times, there’s pretty much no pause between words, such as between the words “really” and “like.” And other times, there are large pauses within a word, such as in the word “Mississippi.” So, pauses or gaps are really not a good indicator of word boundaries!

Before I keep going, I can’t resist sharing a little anecdote – I had never considered how difficult it is to identify word boundaries in speech until I watched my husband try to learn Tamil, the language that my family speaks. He would hear someone say something like “I ate an ice cream cone last Tuesday” (but in Tamil), and he would ask me questions that were the equivalent of “what’s an eamco? what does asttue mean?” I would get so frustrated, because I had no idea what he was asking! (I’m a little embarrassed to admit that more than one visit with my family included me yelling “THAT’S NOT A WORD!” at my husband.)

All of this to say – segmenting a stream of speech into word chunks is actually really hard, even though we seem to learn to do this effortlessly in our native languages! Saffran et al. studied how infants learn to segment a stream of speech into words.

The Study

One potentially powerful cue for identifying word boundaries is the statistical regularity in how often one sound tends to follow another in a stream of speech. Saffran et al. give the example of the phrase “pretty baby” – over a huge corpus of speech (like what you might hear over many days), the sound “ty” is more likely to follow the sound “pre” than the sound “ba” is to follow “ty.” If you kept track of how likely different sounds are to follow other sounds, over time you might figure out that “pretty” is one word chunk and “baby” is another (of course, just knowing how likely one sound is to follow another doesn’t give you any idea of what the word means; that’s a different problem to solve!).
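Saffran et al. call this statistic a transitional probability: for two syllables X and Y, the probability that Y follows X is the frequency of the pair XY divided by the frequency of X. Here’s a minimal Python sketch of that calculation. It assumes the speech has already been broken into syllables (itself a big simplification), the function name is my own, and the “corpus” is a made-up handful of syllables, not real data.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Return {(x, y): probability that y follows x} for a syllable sequence.

    TP(y | x) = count of the pair (x, y) / count of x as the first item of a pair.
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where something follows it, so the
    # probabilities out of any given syllable sum to 1.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy corpus, invented for illustration: "pretty baby", "pretty doggy", "pretty bunny"
corpus = ["pre", "ty", "ba", "by", "pre", "ty", "dog", "gy", "pre", "ty", "bun", "ny"]
tps = transitional_probabilities(corpus)
print(tps[("pre", "ty")])  # within a word → 1.0
print(tps[("ty", "ba")])   # across a word boundary → 0.3333333333333333
```

In this tiny example, “ty” always follows “pre,” but “ba” follows “ty” only one time out of three, which is the kind of dip that hints at a word boundary.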

Saffran et al. had previously shown that adults can use these probabilities to learn to segment speech into words, so they wanted to extend this work to see whether infants can also use this information.

In Experiment 1, the researchers created a made-up language with 4 nonsense words, each 3 syllables long: tupiro, golabu, bidaku, and padoti. (These were the words for one of two conditions; the researchers tested two groups of children with two different sets of words, to make sure that the children didn’t have a bias toward particular nonsense words.) They then played a continuous stream of speech consisting of these 4 words repeated in random order for 2 minutes. So, the speech might have sounded like “tupirogolabubidakupadotigolabubidaku…” The words were spoken in a monotone, with no pauses or stress on particular syllables that might have signaled where the word boundaries were. Since there were no pauses between words and no other acoustic cues (tone, stress, etc.), the only thing that distinguished words from non-words was how often one syllable followed another. For example, “ku” always followed “da” (in bidaku), with 100% probability, but “pa” followed “ku” only when padoti came right after bidaku (33% of the time). This would indicate that “da” and “ku” go together, whereas “ku” and “pa” generally don’t.
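To make the setup concrete, here’s a rough Python simulation of the Experiment 1 stream. A few assumptions here are mine, not the paper’s: I generate 900 word tokens (an arbitrary number, not the actual length of the 2-minute recording), I assume the same word never appears twice in a row (which is what makes the “pa after ku” figure come out to one in three), and the function names are hypothetical.

```python
import random
from collections import Counter

WORDS = ["tupiro", "golabu", "bidaku", "padoti"]

def syllabify(word):
    # Every syllable in this made-up language is two letters (consonant + vowel).
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(n_words, seed=0):
    """Concatenate n_words words in random order, with no pauses between them.

    Assumption: the same word never appears twice in a row, so each word is
    followed by each of the other three words with probability 1/3.
    """
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(syllabify(word))
        previous = word
    return stream

def transitional_probabilities(syllables):
    # TP(y | x) = count of the pair (x, y) / count of x as the first item of a pair.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(n_words=900)
print("".join(stream[:18]))        # the first six words, run together with no breaks
tps = transitional_probabilities(stream)
print(tps[("da", "ku")])           # inside "bidaku": always 1.0
print(round(tps[("ku", "pa")], 2)) # across "bidaku|padoti": roughly 0.33
```

Because every one of the 12 syllables belongs to exactly one word, every within-word transition comes out at exactly 1.0, while the across-word transitions hover around one third.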

They then tested whether the infants (8 months old) could distinguish the nonsense words in this made-up language from non-words (that is, 3-syllable groups of sounds that weren’t any of the 4 words in the made-up language). The researchers created a test set that consisted of two of the nonsense words (tupiro and golabu) and two similar non-words that the infants had never heard in the stream of speech (dapiku and tilado). Note that all of the syllables in the non-words were present in the words in the stream of speech, but not in the same order. For example, “da” and “pi” were both syllables that were heard in the stream, but “pi” never followed “da.”
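Here’s a small Python sketch of why the non-words stand out in principle. It builds the transitional probabilities an idealized listener would end up with after hearing the stream (assuming, as an illustration, that words come in random order and never repeat back to back), then scores each test item by the average probability of its two internal transitions. The scoring rule and function names are my own illustration, not a model from the paper, and they say nothing about how infants actually compute anything.

```python
WORDS = ["tupiro", "golabu", "bidaku", "padoti"]

def syllabify(item):
    # Every syllable in the made-up language is two letters (consonant + vowel).
    return [item[i:i + 2] for i in range(0, len(item), 2)]

def ideal_tps(words):
    """Transitional probabilities a perfect listener would estimate from the stream.

    Assumption: within-word transitions always happen (TP 1.0), and a word-final
    syllable is followed by the first syllable of each *other* word with TP 1/3.
    """
    tps = {}
    for w in words:
        syl = syllabify(w)
        for x, y in zip(syl, syl[1:]):
            tps[(x, y)] = 1.0
        for other in words:
            if other != w:
                tps[(syl[-1], syllabify(other)[0])] = 1 / (len(words) - 1)
    return tps

def score(item, tps):
    """Average TP over the item's internal transitions (0.0 for a pair never heard)."""
    syl = syllabify(item)
    transitions = list(zip(syl, syl[1:]))
    return sum(tps.get(t, 0.0) for t in transitions) / len(transitions)

tps = ideal_tps(WORDS)
for item in ["tupiro", "golabu", "dapiku", "tilado"]:
    print(item, score(item, tps))
# tupiro 1.0
# golabu 1.0
# dapiku 0.0
# tilado 0.0
```

Every transition inside dapiku and tilado (like “da” to “pi”) never occurs in the stream, so the non-words score zero even though each of their syllables is familiar.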

The infants were then tested to see whether they could discriminate between the words and non-words. They did this by seeing whether infants paid attention for longer after hearing the non-words (that weren’t present in the stream of speech they had listened to earlier) compared to the words that they had heard – the idea here is that infants pay attention longer to stimuli (sounds, visual objects, etc.) that they haven’t heard or seen before compared to stimuli that they’re familiar with. And, the researchers found that the infants did in fact pay attention for longer after hearing the non-words (on average, almost a full second longer). This indicates that they had learned what syllables should follow each other (in the example above, that “ku” should follow “da”), even after listening to the stream of speech for only 2 minutes!

But merely knowing the order in which syllables should go isn’t enough to segment a stream of speech into words. For example, with the stream “tupirogolabubidaku…,” knowing the syllable order doesn’t tell you whether the word is golabu or bubida. So the researchers conducted a second experiment, where the test set consisted of two words and two “part-words.” Each part-word was also three syllables long, and was created by joining the final syllable of one word with the first two syllables of a different word (e.g., bubida, a combination of golabu and bidaku). In this case, the infants might actually have heard the part-words in the stream of speech: there is some chance that bidaku followed golabu, and in that case, the infant would hear bubida. However, they would hear bubida far less often than either golabu or bidaku, because “bi” is relatively unlikely to follow “bu,” since that transition spans a word boundary.
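A quick simulation shows how much rarer a part-word is than a word. This Python sketch counts every run of three consecutive syllables in a simulated stream. As assumptions for illustration (not details from the paper): the stream has 900 word tokens in random order, the same word never appears twice in a row, and the function names are hypothetical.

```python
import random
from collections import Counter

WORDS = ["tupiro", "golabu", "bidaku", "padoti"]

def syllabify(word):
    # Every syllable in the made-up language is two letters (consonant + vowel).
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(n_words, seed=0):
    # Assumption: random word order, never the same word twice in a row.
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(syllabify(word))
        previous = word
    return stream

def trigram_counts(stream):
    """Count every run of three consecutive syllables, joined into one string."""
    return Counter("".join(stream[i:i + 3]) for i in range(len(stream) - 2))

counts = trigram_counts(make_stream(n_words=900))
print("golabu (word):     ", counts["golabu"])
print("bubida (part-word):", counts["bubida"])
```

Since bubida only occurs when golabu happens to be followed by bidaku, its count should come out at roughly a third of golabu’s under these assumptions.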

Experiment 2 was trickier than Experiment 1, since the part-words were sequences that the infants had actually heard in the 2-minute stream, just less often than the words. Even with this harder task, the infants were still able to distinguish the words from the part-words!

These two experiments show that infants use statistics underlying speech sounds (how frequently one sound tends to follow another) to build up a mental representation of language, and they can use that to help them learn where word boundaries occur in speech. What’s more, they can do this VERY rapidly (in this case, after listening to just 2 minutes of a stream of nonsense words).
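One simple way a listener could turn these statistics into word boundaries is to start a new word wherever the transitional probability drops. The Python sketch below does that with a threshold of 0.5. This is an illustration of the idea, not a claim about what infants actually compute: the threshold is an arbitrary value I picked, and the stream generator (900 words in random order, no word twice in a row) and all function names are hypothetical.

```python
import random
from collections import Counter

WORDS = ["tupiro", "golabu", "bidaku", "padoti"]

def syllabify(word):
    # Every syllable in the made-up language is two letters (consonant + vowel).
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(n_words, seed=1):
    # Assumption: random word order, never the same word twice in a row.
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(syllabify(word))
        previous = word
    return stream

def transitional_probabilities(syllables):
    # TP(y | x) = count of the pair (x, y) / count of x as the first item of a pair.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tps, threshold=0.5):
    """Start a new chunk wherever the TP into the next syllable falls below threshold."""
    chunks, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            chunks.append("".join(current))
            current = []
        current.append(y)
    chunks.append("".join(current))
    return chunks

stream = make_stream(n_words=900)
chunks = segment(stream, transitional_probabilities(stream))
print(chunks[:5])
print(set(chunks) <= set(WORDS))  # checks whether every chunk is one of the four words
```

With only two possible values in play (1.0 inside words, about one third across them), any threshold between them recovers the four words; real speech is much messier, which is part of what makes the infants’ performance so impressive.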

My Takeaways

Reading this study, it’s totally amazing to me that such young children (8-month-old infants) were able to glean so much information from such a short stream of speech made up of words they had never heard before.

One of the things T’s speech therapist first said to us when T was only 2 months old is that we should talk to him A LOT – we should narrate what we’re doing, have “conversations” with him even though he wasn’t really responding, etc. We tried hard to do this, but honestly, it gets a little tiresome to basically just be talking to yourself – and I kind of wondered, what’s the point? Is T getting anything out of this? After reading this study, I think all of that talking must be really important! After all, to be able to build up a statistical model of language, T needs to hear lots and lots of examples of lots of different words. This also highlights the importance of T wearing his hearing aids – if it’s hard for T to hear the difference between two sounds (e.g., “sa” and “fa”), it will be hard for him to build up a mental model of how often one of those sounds follows another.






