Article Review – “Musician Advantage for Speech-on-Speech Perception”

Today, I want to talk about a recently published article (full text here) that isn’t directly related to babies or hearing loss, but that I found really interesting and wanted to share! The article is “Musician Advantage for Speech-on-Speech Perception.” (Baskent, D. and Gaudrain, E. “Musician Advantage for Speech-on-Speech Perception.” J. Acoust. Soc. Am. 139, EL51. 2016).

Also, this paper got some great publicity in Scientific American!

Background

Anyone who’s tried to have a conversation in a crowded bar or restaurant knows that understanding what one person is saying over a background of other people talking is one of the hardest listening tasks (and one that people with hearing loss struggle with the most!). One of the challenges of understanding speech in the presence of other, competing speech is segregating the different talkers so that you can focus on the one person you want to hear (I talked a bit about differences between babies and adults in this type of task here). This is often called the “cocktail party problem”: in a noisy, crowded environment full of other people talking, how do you manage to understand the one person you’re having a conversation with?

The authors of this study hypothesized that musicians would be better than non-musicians at understanding speech in the presence of other, competing speech. If musicians ARE better at understanding speech-on-speech, this might be for a few different reasons. First, musicians are better at identifying subtle changes in pitch (something they do all the time to know if they are playing something correctly and in tune!), and this might be really helpful for separating multiple speech streams. For example, they might be able to use pitch differences to group the words they hear as belonging to different voices. Second, over many years of practice, musicians hone their “listening skills” – so it might be that they are just better than non-musicians at shifting their auditory focus to what they want to hear.

So, the researchers first wanted to see whether the musicians had an advantage at all. If the musicians did have an advantage, the researchers also wanted to know whether it seemed to be related to their better ability to detect pitch changes, or whether it seemed to be more generally related to an increased ability to shift focus between different speech streams.

The Study

The researchers tested 18 musicians and 20 non-musicians on their ability to understand a sentence (the target) in the presence of one competing talker (the masker) – so the subjects had to understand one person talking while a second person was talking at the same time. In order to qualify as a musician for this study, participants had to have had 10+ years of training, to have begun musical training before they were 7 years old, and to have received musical training within the past 3 years.

To probe whether musicians were more able to take advantage of subtle pitch changes than non-musicians, the researchers manipulated how different the target sentence was from the masker sentence in 2 ways:

  1. The fundamental frequency (F0) – F0 indicates the voice pitch of a person’s speech. So, men generally have lower F0s than women, adults have lower F0s than children, etc.
  2. An estimated Vocal Tract Length (VTL) – The vocal tract is a cavity that filters the sounds that you produce – in a very simplified view, it’s kind of like a tube that goes from the vibrating vocal folds at one end to your mouth at the other end, and it helps shape the sounds that you produce to make them sound like different vowels or consonants. The length of the vocal tract varies across people – children have shorter vocal tracts than adults, and men generally have longer vocal tracts than women. Unlike F0, VTL doesn’t directly affect voice pitch, but it changes other frequencies in speech sounds (the formants – definitely getting a bit technical, but really interesting!). If you have two recordings of people talking and they have the same F0 but different VTLs, the pitch (how high or low their voice is) will be the same, but the quality and characteristics of their voice will sound different – that’s the VTL at work!
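To make the F0 vs. VTL distinction a bit more concrete, here is a minimal sketch using the same "very simplified tube" view described above. It treats the vocal tract as a uniform tube closed at one end, whose resonances (rough stand-ins for formants) depend only on the tube's length, and it treats F0 as a separate number that can be shifted in semitones. The function names, the speed-of-sound value, and the example lengths (17.5 cm and 14 cm) are my own illustrative assumptions. This is not the software or the procedure the researchers used.

```python
# Approximate speed of sound in warm, humid air, in cm/s (assumed value).
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_formants(vtl_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end.

    Quarter-wavelength model: F_k = (2k - 1) * c / (4 * L).
    Real vocal tracts are not uniform tubes, so this is only a rough guide.
    """
    return [
        (2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * vtl_cm)
        for k in range(1, n_formants + 1)
    ]


def shift_semitones(f0_hz, semitones):
    """Shift a fundamental frequency by a number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


# Two hypothetical talkers with the SAME F0 but different tract lengths.
f0_hz = 100.0
longer_tract = tube_formants(17.5)   # lower resonances
shorter_tract = tube_formants(14.0)  # higher resonances

print("F0 for both talkers:", f0_hz)
print("Resonances, 17.5 cm tract:", longer_tract)
print("Resonances, 14.0 cm tract:", shorter_tract)
print("F0 raised by 12 semitones:", shift_semitones(f0_hz, 12))
```

In this toy model, shortening the tube raises every resonance by the same ratio while F0 stays put, which is the sense in which two voices can share a pitch and still sound like different people.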

The researchers used some fancy software to manipulate the F0 and VTL of the target sentences and the masker sentences so that, in each trial the subjects listened to, the target and masker sentences were more alike or less alike. They measured how well musicians and non-musicians were able to understand the target sentences based on how similar the target sentence was to the masker sentence in terms of these two parameters.

And here are the results!

FIG. 1A (reproduced below) shows the average percent of the sentence the subjects correctly repeated back with various differences in VTL and F0 between the target and masker sentence. The leftmost panel shows the smallest difference in VTL between the target and masker sentences (in the leftmost panel, there was no difference in VTL), and the rightmost panel shows the largest difference in VTL between the target and masker. Within a panel, going left to right increases the F0 difference between the target and masker sentences (so, within a panel, the leftmost points are where the target and masker sentences had the same average voice pitch as each other).

The data from the musicians is shown in purple and the data from the non-musicians is shown in green.

FIG. 1A from Baskent and Gaudrain

As you can see, both musicians and non-musicians were better able to understand the target sentence when the target sentence was “more different” from the masker sentence. If you look at the leftmost points in the leftmost panel (the hardest condition, where there was no difference in F0 or VTL between the target and masker sentences), musicians had about 70% intelligibility and non-musicians had about 55% intelligibility. However, looking at the rightmost points in the rightmost panel (the easiest condition, where there was the largest difference in both F0 and VTL between the target and masker sentences), both musicians and non-musicians did really well – better than 90% intelligibility. This makes a lot of sense – it’s easier to understand what a (high-pitched) child is saying when their speech is competing with a deep-voiced man than when it’s competing with another child.

And, regardless of how different the target and masker sentences were, musicians performed better than non-musicians, and by a fairly substantial margin – you can see that the purple points are generally ~15-20 percentage points higher than the green points.

Recall that the researchers wanted to know if a musician advantage was due to the musicians’ ability to detect very subtle pitch differences. Based on this data, it seems like the musician advantage might not primarily be due to musicians’ better pitch perception – in FIG. 1A above, the purple (musician) and green (non-musician) lines are parallel to each other, indicating that both groups were deriving equal benefit from larger pitch differences (larger differences in F0). So, it might be that the musicians are better than the non-musicians at focusing their auditory attention – after all, musicians do this all the time when they practice; for example, a musician in an orchestra has to both listen to what their section is playing as well as what the other sections are playing.
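The "parallel lines" reasoning can be spelled out with a tiny sketch. The scores below are made-up numbers chosen only to illustrate the logic. They are NOT the paper's data, and the function names are my own. The idea: if both groups gain the same amount as the F0 difference grows, the difference between their gains is about zero, even though one group scores higher overall.

```python
# Made-up percent-correct scores at three increasing F0 differences.
# These are NOT the paper's data; they only illustrate the "parallel lines" logic.
hypothetical_musicians = [60.0, 70.0, 80.0]
hypothetical_non_musicians = [45.0, 55.0, 65.0]


def f0_benefit(scores):
    """Gain in score from the smallest to the largest F0 difference."""
    return scores[-1] - scores[0]


def mean_group_gap(group_a, group_b):
    """Average score difference between two groups across matching conditions."""
    return sum(a - b for a, b in zip(group_a, group_b)) / len(group_a)


musician_benefit = f0_benefit(hypothetical_musicians)
non_musician_benefit = f0_benefit(hypothetical_non_musicians)

# Difference between the two benefits: near zero means parallel lines,
# i.e. both groups get about the same help from a larger F0 difference.
benefit_difference = musician_benefit - non_musician_benefit

# Overall vertical offset between the two lines.
group_gap = mean_group_gap(hypothetical_musicians, hypothetical_non_musicians)

print("Musician F0 benefit:", musician_benefit)
print("Non-musician F0 benefit:", non_musician_benefit)
print("Difference in benefit:", benefit_difference)
print("Average group gap:", group_gap)
```

With these toy numbers, the lines are offset from each other but have the same slope. That is the pattern that points away from pitch perception as the main source of the advantage. A real analysis would of course use a proper statistical test rather than simple subtraction.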

My Reflections

I couldn’t help relating the results of this study to my personal experiences! I started playing the violin and the piano when I was little (~6 years old), and played through college, although I haven’t played regularly since I finished college (many years ago).

I’ve long suspected that I’m much better at understanding speech in noise than my husband, G. (This is just a gut feeling; we haven’t thoroughly confirmed it.) For example, when G and I go out to eat, I’m usually much better at simultaneously listening to him while eavesdropping on conversations next to us. If G wants to eavesdrop, he has to stop talking to me and stop eating to focus his attention on what the people next to us are saying (while trying hard to look like he’s NOT paying attention to what they’re saying!). So, maybe it’s my childhood musical training that’s given me an edge here!