
The Apple and Biscuit Show, Episode 5 – What did he just say? – Mike Thornton

In this episode, Neil and Jason talk to ‘Mr. Loudness’, Mike Thornton, about the issues of television programme loudness and dialogue intelligibility. Why is the background music too loud? Why can’t the viewers hear what the actors are saying? And why are films so loud in the cinema?

In a comprehensive and compelling journey starting with radio transmission concerns in the 1930s, to the present-day woes of broadcasters and streaming platforms, Mike’s accessible and understandable explanations demystify the raft of complex sound challenges that filmmakers continue to face in delivering effective soundtracks. 

Announcer Rosie

You’re listening to the Apple and Biscuit Show with Jason Nicholas and Dr. Neil Hillman.

Neil Hillman

Hello, and a very warm welcome to this edition of the Apple and Biscuit Show. I’m Dr. Neil Hillman.

Jason Nicholas

And I’m Jason Nicholas. We’re two seasoned professionals working in the film and television sound industry. The purpose of our podcast is to discuss the many ways sound is used in moving picture productions to entertain, inform, educate and engage audiences.

Neil Hillman

It’s also a great way for us to meet up with colleagues we admire, and to join with them in conversation, because how they utilize sound in their work interests, intrigues, and inspires us. And in turn, we like to think that we’re quite different in the way that we as a team approach sound and sound design, not least of all thanks to our backgrounds. Whilst we’re both long-standing and current practitioners who’ve spent years carrying out location sound recording, dialogue editing and being re-recording mixers, we also have strong academic links to the topic of moving picture sound – Jason, with a scholarly interest in human psychology, and me with a research background in sound design and its effects on human emotions. And so in this respect, our call to action is that we hope you’ll feel a connection to us after listening to the programme.

Jason Nicholas

Hopefully the content of our podcast and the guests who join us will prove enlightening to anyone with an interest in the medium, and in the part that sound plays in filmmaking, regardless of their level of experience. Maybe you’re a student just embarking on your studies, an industry newcomer, or you’re already an experienced professional. Maybe you just love movies. Wherever you join us from, you’re welcome along. And as always, we look forward to learning more than a thing or two from our guests.

Neil Hillman

Today on the show, we’re going to be asking deceptively simple questions like, why are the adverts so loud? And why can’t I hear what the actors are saying on television? And whilst we’re at it, we might even ask questions such as, how come I can hear every word in the reruns of 1970s programs like Kojak, Columbo, The Sweeney and Minder, but not modern television drama? Because these are very pertinent questions for television viewers all over the world, and the answers, it seems, are not that straightforward.

Jason Nicholas

Which is all rather demoralizing, because as audio professionals we now have the most remarkable tools to work with, particularly as location sound recordists, dialogue editors and re-recording mixers. We can capture, process, restore and distribute sound in ways not previously imaginable. However, broadcasters and streamers face ongoing criticism from their audiences. There are many times when the music is too loud or mismatched with other content in the mix, and many times they just can’t hear what actors are saying. And it’s likely those same audiences will send angry emails to broadcasters to make this very point.

Neil Hillman

So our guest today is Mike Thornton, a man who should be able to shed some light on these things for us. In the industry, he’s known as Mr. Loudness for his work in helping to train creatives and to advise broadcasters on the importance of smoothing out the difference in volume between very loud TV commercials and very quiet drama scenes… the very thing that had us reaching for the volume button on the remote control every time an ad break started. And hopefully we can also explore the phenomena that surround dialogue intelligibility, or the conspicuous lack of it at times. Mike has worked in the broadcast audio industry for more than 40 years, firstly as a broadcast engineer at Marconi, and then at the independent local radio stations Piccadilly Radio and Key 103 in Manchester, England. He’s been an outside broadcast engineer for Piccadilly Radio and then with his own sound truck, Omnibus Mobile, and he had the honour of mixing the first live Dolby Surround broadcast in the UK – which is a great factoid to know should the question of who mixed the first Dolby Surround broadcast in the UK ever come up in the pub quiz. Now you know. He’s worked with the digital audio workstation software Pro Tools since the mid-1990s – when it was still only a four-track system, and that, he says, was on a good day – using the system to record, edit and mix documentaries, comedy, drama and music for both radio and television. And now he’s the company chairman of the enormously successful website and blog Production Expert, which he started as Pro Tools Expert with fellow blogger and audio professional Russ Hughes. Mike, welcome. Are you well? And where do we find you today?

Mike Thornton

Well, you find me sat at home in my studio, which has been my place of work now since – well, this particular one since 2001, but prior to that, in 1990 I went self-employed and left Piccadilly Radio. And so I’ve been working at home for a very long time. So when we were all forced to work from home a little while ago, it was no real big deal for me.

Neil Hillman

Splendid. If we may, can we start by talking about loudness, and then we’ll move on to the other hot potato, dialogue intelligibility, a little later. So can I ask a very basic loudness 101 question? Well, it’s going to be two, actually. When we hear people talking about LUFS and loudness in a television programme, what does that actually mean for us as professional sound people?

And what does it mean for the viewers at home? Is this something related to the loudness wars that radio stations went all out on in the 1980s to make sure that they were the loudest music station on the dial? And what’s the difference between the terms level and loudness to an engineer?

Mike Thornton

Okay, so we’ll start with what it means for viewers at home. Now, a little while ago – loudness really started to happen in the broadcast industry around 2014 or so, so we’ve been doing it for about 10 years now – but prior to that, for example, BBC television received about 100 complaints relating to loudness issues over a 40-day period. 61 were related to the background sound being too high, which is an area of intelligibility, which I’m sure we’ll come on to. 20 of those 100, so 20% of those comments and complaints, were about volume jumps between content – things like, in the case of the BBC, programme trailers and announcements being louder than the content around them, or in the case of commercial television, the adverts and the trailers being much louder than the programmes that they’re sat in. And then 19% of the complaints were related to the volume range within the programmes being too wide. And this often relates to the dialogue being too quiet and indistinct, and then music and some sound effects being too loud – and, as you mentioned in your introduction, causing us to reach for the volume control. So that’s that one. In terms of the loudness wars that you referred to, it was actually less about the radio station loudness wars and much more about television and the loudness jumps between adverts and programmes. That was the biggest driver in the move to loudness within a broadcast environment. And in fact, in America they were so concerned about it that they made it legislation. So you have the CALM Act in America, which is basically a piece of legislation that says the adverts cannot be louder than the programmes they sit in. Pretty well everywhere else it’s been done by standards and codes of practice rather than legislation. Then in terms of what’s the difference between level and loudness – I mean, this is really key. In essence, loudness is the perceived strength of a piece of audio, how loud we perceive it to be, and that will depend on things like level, frequency content and duration. Whereas level is, in a sense, an electrical measurement of the audio signal. And then you’ve got the word volume to mix in with these terms, because it’s not as simple as saying that if you turn up the level, the volume and therefore the loudness will get louder. Loudness is much more than just level: it’s also frequency content and duration.

Neil Hillman

Okey-dokey. Can you tell us something about the history of loudness – I mean, does it relate back to the early radio days? Presumably there was some awareness of it then. Was it something that was an issue back in the 40s, 50s or 60s? Now, I know that radio stations tended to favour VU meters, for instance, compared to the peak programme meters we all grew up with in television studios. How do they differ? And how come there are so many types of loudness meter now, yet they all work to the same BS.1770 standard?

Mike Thornton

In the late 1930s, as radio broadcasting was being established around the world, there was a need for a reliable meter to display the signal being sent to the transmitter. In the UK, the BBC developed what we lovingly know as the PPM, the peak programme meter. But interestingly, around the same time the German broadcasters – remember this is the late 1930s, when we weren’t perhaps talking to the Germans as much as we might normally do, in the run-up to the Second World War – achieved a very similar result to the BBC in terms of developing a peak-reading meter. And so, in essence, most of Europe and the territories in the world that were European-facing tended to go down the PPM route. In the US, radio broadcasters obviously faced the same issue, but for them PPMs would have been too expensive to roll out to all the broadcasters. So they looked around for a passive, low-cost solution and settled on what we now know today as the VU meter, because essentially you could literally take a meter with a resistor and put it across the line, the output of the mixer. So that’s the sort of history. When we moved to a hybrid analogue-digital world, as digital started to come into play, we would set our reference point in analogue terms – in BBC terms at PPM6, which is essentially plus 8 dBu – and we would set our reference point in the digital world to minus 10 dB full scale, in other words 10 dB below digital clipping. Because of course, unlike analogue, where running out of headroom is a bit of a soft crunch, with digital it’s very hard: 0.1 of a dB below full scale, it’s perfect; 0.1 of a dB above, it’s horrible. And so it was decided to give us a bit of breathing space by having that sort of 10 dB of space. But interestingly, we were using that headroom – we just didn’t know about it, because our beloved BBC PPMs don’t actually read all the peaks. They are what now have to be called quasi-PPMs, because they do what we need them to do, especially in an analogue world, but in a digital world they don’t respond fast enough. Most broadcasting was working to peak level, because the reason the BBC and the Germans and so on used PPMs is that in the days of AM transmitters it was critical that the signal level didn’t go beyond a certain voltage, otherwise you would damage the transmitter. And so we needed to know, more importantly than anything else, where the peaks were. That’s why the PPM became the rule. Now, the challenge is that the PPM does not tell us how loud something is. So we developed conventions: speech would be peaking at six, music at four to five, and so on and so forth. And in essence what we were doing there was making allowance for the fact that music is a much denser sound, and therefore for the same peak level it’s going to be louder. And so we had, in essence, an almost unwritten fudge factor, because if you tried to have music peaking at peak six and speech peaking at peak six, the music would be perceived as much louder. And that’s, in essence, what ended up becoming the loudness war, because in broadcasting – let’s just focus on adverts – the advertiser wanted their advert to be louder than anybody else’s advert, unsurprisingly.
And so we all started doing tricks which would maintain the maximum peak level the spec required, but by compressing and doing all sorts of other things we would increase the perceived loudness without exceeding that maximum peak level. And that’s what tipped us into the loudness wars, and that’s why all the ads ended up sounding alike. Loudness normalization takes a different reference point: whether it’s a 30-second advert or a one-hour drama, you take the average loudness of the whole piece of content, that’s your reference point, and that’s how you normalize things. But what it doesn’t mean is that you squash all the dynamics out of it.
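To make the peak-versus-loudness point concrete, here is a minimal sketch (in Python with NumPy; the test signals are invented for illustration and are not from the episode) showing how two signals with exactly the same peak level can have very different average energy, which is a large part of why dense music reads louder than sparse speech at the same PPM reading.

```python
import numpy as np

rate = 48000
t = np.arange(rate) / rate

# Sparse, speech-like signal: short 1 kHz bursts separated by silence.
sparse = np.sin(2 * np.pi * 1000 * t) * (np.sin(2 * np.pi * 2 * t) > 0.9)

# Dense, music-like signal: continuous broadband noise.
dense = np.random.default_rng(0).uniform(-1, 1, rate)

for name, x in (("sparse", sparse), ("dense", dense)):
    x = x / np.max(np.abs(x))                      # identical peak level: 0 dBFS
    peak_db = 20 * np.log10(np.max(np.abs(x)))
    rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)))
    print(f"{name}: peak {peak_db:+.1f} dBFS, RMS {rms_db:+.1f} dBFS")
```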

Jason Nicholas

So can you explain a bit? I mean, now for delivery, most professionals rely on a plugin to deliver the optimum levels that are specified by a broadcast authority or a distributor. What’s going on under the hood with loudness normalization hardware or software? And where does the confusion sometimes arise now between the older peak-level workflow and loudness? It’s not just turning stuff up and down. How do those workflows differ?

Mike Thornton

The workflows do differ. But in fact, I would suggest that loudness normalization hardware and software in the transmission chain should, in my book, be banned – completely banned – because it undoes our work as audio professionals. We carefully craft a programme and we determine the most appropriate dynamic range for that programme, for the content, and for the audience and where the audience is going to be consuming said content. The example I often give is that, even before loudness workflows, when I was mixing a programme I was imagining in my head somebody listening to it driving down the motorway – and again, this shows my age – in a Mini, and I don’t mean the more modern Minis, I mean the old-style, proper Minis, driving down the motorway at 70 miles an hour. The ambient sound level in the space where they’re consuming that content will be very high. And so I need to determine, in the narrative, what are the most important things that the audience needs to be able to hear to follow the narrative and not lose track. Now, that’s whether it’s a documentary or drama – it doesn’t matter. It’s the narrative of whatever it is that you’re portraying. And so I would make sure that, for instance, dialogue was very clearly audible, wasn’t masked by background sounds or sounding dull or whatever, because most of the time the narrative is being driven by the dialogue. Not always – there are sometimes cases where a sound effect is absolutely key to the narrative. From that point of view, we need to be able to mix our programmes to the best of our ability and then know that downstream, the dynamics are not going to be changed by software or hardware in the transmission chain.

Jason Nicholas

So you’d advocate for very careful metering rather than relying on an output normalization. Is that what you’re saying?

Mike Thornton

Yes. I mean, there is going to be output normalization in the process, but instead of normalizing to peak level, we’re normalizing to loudness. So when you come to do that normalization, you are, in essence, just turning things up and down. Now, one LUFS – one loudness unit relative to full scale – is the same as one LKFS, loudness, K-weighted, relative to full scale. Those two phrases have come into use and they’re identical. But the key thing I’m trying to get across here is that one LU is the same as turning the gain up by 1 dB. So if you want it louder by 1 LU, you turn the gain up by 1 dB and it will happen. And so we’ve got that normalization, but we’re loudness-normalizing. The key thing here is that we’re not squashing the dynamic range out. It’s not as though the loudness has to sit at minus 23 LUFS all the way through the programme. There will be light and shade; there will be loud moments and quiet moments. And that’s great – in fact, that’s more than great. We want dynamic range. That said, we want dynamic range which is appropriate for the audience and for the environment in which the audience is consuming the content. A cinema theatre is, in essence, a very well-controlled, much more easily controllable environment, but most consumers’ environments are not that controllable. You may well have people listening off-axis; you may well have the dishwasher on if you’re in the kitchen, or a hundred and one other things, none of which I know anything about when I’m mixing the programme. And that’s one of the things that really comes into play when we’re talking about intelligibility: content mixed for cinema theatrical release cannot simply be taken and put into the broadcast environment – and I would include streaming platforms in that category, i.e. content that’s going to be consumed at home. You can’t take content that was mixed for the theatre and expect it to work when it’s being played in a domestic environment.
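As a minimal sketch of that "1 LU equals 1 dB" point (in Python; the function name and figures are illustrative, not from the episode), loudness-normalising a finished mix is a single static gain offset rather than any dynamics processing:

```python
def normalisation_gain_db(measured_integrated_lufs: float,
                          target_lufs: float = -23.0) -> float:
    """Static gain (in dB) needed to bring a mix onto the target loudness."""
    return target_lufs - measured_integrated_lufs

# Example: a mix measuring -20.5 LUFS integrated needs -2.5 dB of gain to hit
# the EBU R128 target of -23 LUFS; its internal light and shade are untouched.
print(normalisation_gain_db(-20.5))  # -2.5
```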

Jason Nicholas

Just a sort of aside question here. I mean, all of what we’re talking about is really important at the mixing stage, but is there anything that people in production or other post-production professionals – sound recordists or dialogue editors, especially in our case – should have in mind concerning loudness?

Mike Thornton

At the acquisition level, do what you’re always doing: you want to acquire clean dialogue. There are obviously all the challenging issues of unsuitable environments where film and TV are now shot, but those problems are no different because we’re now working in a loudness workflow. So giving us the best, cleanest audio possible remains priority one for acquisition. When it comes to post-production, whereas I would previously have used a conventional level meter – a PPM or VU meter or whatever – to guide me, I would now suggest that having a loudness meter might be helpful. Because the key thing here, at its very simplest, is that if we’re looking at the EBU spec, we’re looking at a target loudness of minus 23 LUFS, so I would start looking at having my loudness sitting around the minus 23 mark. On a loudness meter, you’ve effectively got a number of parameters which are being displayed. The BS.1770 spec has specified what those parameters are, but what it chose not to do was specify how they should be displayed – it allowed loudness meter manufacturers to develop their own ways of displaying these parameters. So the three parameters are: what’s called momentary loudness, which is the loudness averaged over the last 400 milliseconds – which, interestingly, is about the same integration time, the same averaging process, that a VU meter has, around that same sort of number. Then you’ve got the short-term loudness, which is averaged over the last three seconds, and that’s the one that I tend to use when I’m level-setting – adjusting, getting things in the right sort of ballpark. And then we’ve got what’s called the integrated loudness, and that’s the average loudness for the whole programme, whether it’s a 30-second trail or a one-hour drama. You obviously don’t get the integrated loudness until the mix is finished. So that’s why you have these different types of measurement – averaging over 400 milliseconds or averaging over three seconds – to help steer us into the right zone. And so from that point of view, in the post-production process, the meter that’s most interesting to me is the short-term loudness, averaging over three seconds.
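A much-simplified sketch of those three averaging windows (Python with NumPy; this is not a compliant BS.1770 meter – a real one applies K-weighting first and gates the integrated measurement – it just shows the windowing idea described above):

```python
import numpy as np

def mean_power_db(x: np.ndarray) -> float:
    """Average power of a block in dB; stands in for a proper LUFS figure."""
    return 10 * np.log10(np.mean(x ** 2) + 1e-12)

def loudness_readings(samples: np.ndarray, rate: int = 48000):
    momentary = mean_power_db(samples[-int(0.4 * rate):])    # last 400 ms
    short_term = mean_power_db(samples[-int(3.0 * rate):])   # last 3 seconds
    integrated = mean_power_db(samples)                       # whole programme
    return momentary, short_term, integrated

# Example: ten seconds of low-level noise standing in for programme material.
mix = 0.1 * np.random.default_rng(1).standard_normal(10 * 48000)
print(["%.1f dB" % v for v in loudness_readings(mix)])
```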

Neil Hillman

How do the loudness standards compare between music radio, speech radio, television and movies? And do you think the consumer is getting an appreciable benefit from this extra care that should be being taken? I mean, it’s still a confusing landscape, because you’ve got R128 here, minus 27 LUFS there, and minus 16 somewhere else. It’s an enormous difference.

Mike Thornton

There are differences, yes – although there are and there aren’t. In terms of broadcasting – and I use that in its old-fashioned sense, so radio and television, transmitters, that sort of thing – we’re looking in essence at minus 23 or minus 24, depending on which spec you’re looking at, and a maximum true peak of either minus 1 or minus 2. And those are the only two parameters in the BS.1770 spec which become pass-fail criteria in a delivery. There are other elements. There’s the loudness range, which is the loudness equivalent of a dynamic range measurement: it looks at the difference between the very loud content and the quieter content and gives us a number. And that’s advisory. So in the context of a domestic environment, my advice is nothing larger than an LRA – a loudness range number – of 10, and possibly even less. At that point you’re considering how the content is being consumed. A theatrical cinema film might have an LRA of 16, 18, 20 – a much wider dynamic range – whereas for content for the domestic environment, what you’ll often see in delivery specs is advice that the LRA should be between 6 and 10. But they’re all using the same BS.1770 measurement. BS.1770 was, in effect, a model developed with a significant amount of listening tests to try to put an objective measurement on how we perceive loudness, because the reality is that we all perceive loudness differently. So I’ll tell you what we’ll do at this point: we’re going to play a small clip which has two pieces of audio, and what I’ll ask the listeners to do is have a listen to these two clips and then think about which of the two they think is louder.

SOUND EXCERPTS PLAY: horses’ hooves and music

Mike Thornton

So having listened to those two pieces of content, I would expect that some of you will have said that the first one, the horses’ hooves, is louder than the second one, the music. And some of you may have thought it the other way round. And that’s one of the challenges. It’s not just about the type of content, but also about the frequency content. I sometimes play two pieces of pink noise, a low-frequency one and a high-frequency one, and ask the same question, because we also perceive loudness differently depending on the frequency content. And this is something that we’ve known about for a long time: you have the Fletcher-Munson curves that came out in the 1930s, which show that the frequency response of our hearing is different depending on whether the sound is loud or quiet. So there are all these different parameters at play. And some very clever research people looked at all of this and came up with a weighting curve called the K-weighting curve, and that was also driven by a significant amount of listening tests, because we wanted to make sure we came up with a measurement system that related to most people’s hearing – how we hear and therefore how we perceive loudness.
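For reference, this is how a K-weighted BS.1770 measurement is typically taken in practice in software. The sketch below uses the open-source pyloudnorm and soundfile Python libraries (assumed to be installed; the WAV file name is purely illustrative) – one possible tool among many, not necessarily the one Mike uses:

```python
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("programme_mix.wav")     # hypothetical finished mix
meter = pyln.Meter(rate)                      # K-weighted, gated BS.1770 meter
integrated = meter.integrated_loudness(data)  # result in LUFS

# Loudness-normalise to the EBU R128 target of -23 LUFS (a static gain change).
normalised = pyln.normalize.loudness(data, integrated, -23.0)
print(f"Integrated loudness before normalisation: {integrated:.1f} LUFS")
```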

Neil Hillman

There’s been a heated debate, hasn’t there, about the playback levels of films in movie theatres for years? And I’ve certainly been to screenings where the playback level is painfully high. We’re told that the optimum level for a theatre to calibrate and set its replay system is at something called Dolby 7, literally at number 7 on the rotary volume dial of the theatre’s replay amplifiers. And then this should be a perfect match for the level that the film was originally mixed at. But this rarely happens at our local multi-screen picture house, does it?

Mike Thornton

It doesn’t. And it is the cinema’s equivalent of the loudness wars. The way the cinema works is that loudness currently isn’t really part of the theatrical cinema workflow yet. The concept is that the mixing theatre is usually a fair size, because the whole point is you’re mixing in a space not dissimilar to the theatre in which the audience will be consuming the content. You shouldn’t ideally mix a cinema release in a small bedroom studio like this – it’s not appropriate – so you need a large space. And so they say that you calibrate your monitors at 85 dBA and then you mix your content to suit. So you are effectively determining the loudness – well, not completely determining it, but if you’re working to a calibrated monitoring system, inherently there is a degree of calibration in there. And what they say is that when you’re mixing and encoding the cinema release, the volume control on your encoding Dolby box is set at 7, and then when it comes to the theatre, their decoding box should also be set to 7. However, we’ve had a loudness war in the cinema workflow. Each director wants their film louder than anybody else’s film, so we end up going through similar processes to get a perceived loudness increase without hitting digital clipping. And so what happens is the director insists that their film is made louder, and then it’s delivered, and then in the multiplex the consumers complain to the management, saying, ‘I can’t listen to this film, it’s too loud, it’s uncomfortable’. And the only recourse the projectionist – and I use that phrase advisedly; I’d say the operator of the equipment – has is to turn the knob down on the Dolby decoder. And so most films these days are now being played back at about Dolby 4.5.

Neil Hillman

Let me guess, in the mixing theatre to compensate for that, they’ve been cranking it up at the other end?

Mike Thornton

Well, yeah, that’s another way of fixing the problem. Yes, exactly. And so you end up with this vicious circle. We’re effectively still in the loudness wars in the cinema world, whereas in the broadcast world we’ve implemented a loudness normalization process. And it’s my view that a loudness normalization process will be implemented in the cinema workflow at some point.

Jason Nicholas

Yeah, a couple of months ago here in Sydney I was at a fully specced-out Dolby Atmos mixing theatre, and I was viewing – I won’t say what it is, but it’s a current large blockbuster Hollywood film – and it was at 85 decibels, but the sonic content of the film was just an onslaught of sound. So it was perceptibly very loud. And I guess an aside to this is that it could be an occupational hazard for mixers in those types of environments, working over weeks and weeks on a mix.

Mike Thornton

Absolutely.

Neil Hillman

Mike, how important is the way in which we monitor the programmes we’re mixing? I don’t mix on headphones, but I do sometimes dialogue edit wearing them. The speakers in my studio are calibrated and matched, but the suggested monitoring reference level of 79 dB for my small room, which is a similar size to yours, is way too loud for me to work with. And in the mixing theatre, calibrated at 85 dB, I’ve in the past asked my mixing partner if they mind if we mix at a slightly lower level, even though the material was for theatrical release. I’ve even heard of some movie mixers working with in-ear attenuators because the monitors are running so loudly on the mixing stage for extended periods of time.

Mike Thornton

Yeah, no, absolutely. Now, we’ve talked about the cinema release theatre being calibrated at 85 dB, and again, the calibration only works if we all play by the rules: if we start bending the rules, it stops working. So for me, the 85, as long as it’s a decent-sized room, is a valid benchmark. But when it comes to loudness workflows, I would say that monitoring calibration is, if anything, more important than ever. What you want to get to is a point where, in your space, however big or small it is, when the mix seems too loud to you, it is too loud – as in, on the meters – and equally, when it seems too quiet, it is too quiet relative to the meters. When I talk about calibrating your monitors, there are these guidelines – you talked about 79 dB for a smaller room, and I would agree that quite often I don’t mix at 79; I mix at something nearer 76. But actually, the point I really want to get across is that the exact number doesn’t matter. It’s part of the calibration process when you’re setting up your monitoring, and it’s absolutely key that once you’ve set that point, you don’t keep changing it.

So what you need to do is adjust your monitor volume control: if your finished mixes keep coming out, say, 2 LU above the target, increase the monitor level by 2 dB, and vice versa. It might be a bit hit and miss to start with, but you will get to a point where you and the room are effectively calibrated – you have a calibrated system – so that when you mix, you will naturally mix to the correct loudness spec, because our own ears are actually an incredibly good loudness measuring device. And this is how live mixers do it. They work in a calibrated environment because they can’t go back and put the mix through a process to correct the gain structure and get the loudness bang on spec; they have to mix it right first time. And the way they do that is by working in a calibrated environment, so it happens naturally.
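A minimal sketch of that trim-and-repeat idea (Python; the function name and figures are illustrative, and the direction assumes that mixes which measure hot need the monitors turned up, as described above):

```python
def monitor_trim_db(measured_integrated_lufs: float,
                    target_lufs: float = -23.0) -> float:
    """How much to adjust the monitor gain: positive = turn the monitors up,
    negative = turn them down, so 'sounds right' lines up with 'measures right'."""
    return measured_integrated_lufs - target_lufs

# Example: mixes keep landing at -21 LUFS against a -23 LUFS target (2 LU hot),
# so raise the monitor level by 2 dB and you will naturally mix a little lower.
print(monitor_trim_db(-21.0))  # +2.0
```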

Jason Nicholas

So, a couple of transitional questions here… Where are we going with object-based sound and loudness, and what kind of opportunities are there for very personalized mixes that maintain the intent of filmmakers and mixers but are optimized for the individual listener? And what’s the role of machine learning? I suppose if your amp or your headphones know where you personally have hearing deficiencies or sensitivities, then – rather like room correction – a personalized mix balance could be generated dynamically for an individual. And I’ve noticed recently that some music producers are mixing and mastering entirely on Apple’s AirPods, with the thought that their listeners will also have that specific device and listen in spatial audio. Is there a potential within some of these technologies to embed metadata, or something like that, within a mix that allows listeners to more accurately experience the intent of the sound mixer?

Mike Thornton

I think object-based audio is, in many respects, the way forward, and it really is one of the answers to intelligibility. Because obviously, if you’ve got someone whose hearing is impaired, they’re going to find it harder to pick out the key elements we want them to hear – the key parts of the narrative – from the rest of the mix. As our hearing deteriorates, that ability to hone in and pick up the sounds we need to listen to deteriorates, not least because we’re losing our top end. Most of the time with hearing deficiency you’re losing your high frequencies, and that’s where all the consonant edges are. The things that actually enable us to determine what’s being said are all in the consonants – the Bs and Cs and Ds – and they’re all high frequency. If you muffle those by losing the high frequencies, it’s much harder to tell a B from a C from a D; you can’t hear where the edges are. And so what we need, and there’s been quite a bit of work done on this, is to be able to deliver stems, for want of a better word – but actually they’ve taken it one stage further in some of the research work. There’s quite a lot of research happening here in Manchester, and it really focused the minds of the production departments; with the help of Salford University they did some work in which they scored the key narrative elements. So, for instance, all dialogue usually had a score of, let’s say, one – one is the most important, five is the least important – but certain other elements, usually things like background sound effects or, to a degree, music, would be given a much lower priority. And then the stems would be sent out to the consumer, and the consumer would just have a single control that ran from ‘normal’ to ‘I want it much more intelligible’. They were using those scoring factors, so rather than having dialogue, music and effects stems, which is the sort of thing you would normally think of, they would have a number-one stem, everything that was scored number two would be in a different stem, everything scored number three would be in a third stem, and so on and so forth. And what that meant is that the consumer didn’t have to know about music and effects and didn’t have to learn how to be a mixer; they could just say, ‘well, I need it more intelligible’, and move the slider. But in essence it was using object-based audio. Object-based audio is something we often think of in the Dolby Atmos workflow – individual items that we can move around the sound field – but in essence it’s an overarching term for delivering certain elements of the mix separately from one another. And in this case, for improved intelligibility, especially for hearing-impaired people, that ability to send stems – whether it’s traditional stems of dialogue, music and effects, or a scored set of stems, which is my preference because it comes back to the key narrative elements – is, for me, the way forward.
Yes, machine learning and so on could play a part, and we’ve sort of started to get that with personalized HRTFs. Essentially that’s what happens when you take an immersive mix and deliver it binaurally into earbuds – that’s the mechanism that’s used. At the moment most of us are using a generic HRTF, but if you can use a personalized one – one that’s personal to you and takes account of things like the shape of your ears as well as how good or otherwise your hearing is – then yes, there are possibilities there. But those are perhaps not going to be as effective as being able to actually tweak the level of intelligibility. And there is some work being done by some of the research organisations to try and somehow gain intelligibility back at the consumer’s end. Again, I’m less keen on that idea: I would much rather we delivered content that was intelligible in the first place.
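An illustrative sketch of the scored-stems idea described above (Python; the stem names, scores and the 6 dB-per-step gain law are invented for illustration and are not taken from the Manchester/Salford research):

```python
from dataclasses import dataclass

@dataclass
class Stem:
    name: str
    score: int  # 1 = most important to the narrative, 5 = least important

def stem_gain_db(stem: Stem, intelligibility: float) -> float:
    """intelligibility runs from 0.0 (normal mix) to 1.0 (maximum clarity).
    Score-1 stems are never attenuated; lower-priority stems are pulled down
    by up to 6 dB per score step at the maximum setting (an assumed gain law)."""
    return -6.0 * (stem.score - 1) * intelligibility

stems = [Stem("dialogue", 1), Stem("spot effects", 3), Stem("music & atmos", 4)]
for s in stems:
    print(s.name, f"{stem_gain_db(s, intelligibility=0.5):+.1f} dB")
```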

Neil Hillman

Well, that does bring us well and truly onto intelligibility, because viewers’ complaints about loudness are a topic that has caused some consternation, but the most spirited feedback from audiences has been that dialogue intelligibility is not what it should be. At its simplest, research has shown us that around 80% of viewers, even with a modern television set, are now watching their programmes with the subtitles on. Now, that’s not because it’s a foreign-language film, but because they can’t make out what’s being said easily enough in their own language. And everyone I know who works in sound thinks that’s something we should be ashamed of as professionals. There have even been a few instances of viewers’ complaints making the national news. In Britain, for example, SS-GB and the police series Happy Valley came under fire for dialogue that was difficult to hear; the lavish costume drama Jamaica Inn was given the title ‘Mumblegate’ by the mainstream media; and the last version of Charles Dickens’ A Christmas Carol, an otherwise brilliant adaptation by Steven Knight, was taken down off iPlayer and remixed overnight. Now, all four of those dramas were… Well, who or what is responsible, and how do we fix it, Mike?

Mike Thornton

Oh, well, yeah. When it comes to intelligibility, there’s a raft of factors at play, some of which you’ve already alluded to, but there is no one single reason. You highlighted the television – the speaker in the television – and that is a factor, no doubt about it. But there’s a wide range of factors at play, and often they combine, so you get not just one factor but a combination of them. So, for example, mumbling actors, as we’ve referred to with ‘Mumblegate’: there’s a desire for realism, and that shows itself in two ways. One is in the way actors now whisper. I mean, go back to the good old-fashioned days of theatre: you had the stage whisper. They were whispering, but the person on the back row of the theatre could still hear it, and that was something that was taught at drama school – actors were taught how to do it. You had to get across the perception that they were whispering, but it needed to be clearly audible in the cheap seats at the back. We now have a situation where that desire for realism is outweighing the need for the content to be clearly audible and clearly visible. You talked about SS-GB in your introduction, and the second problem that presents itself in terms of realism is that if it’s dark, you can’t see the lips moving. There are people with hearing deficiencies who depend on lip reading, but actually all of us are getting cues from lip reading; we may not be able to lip read perfectly, but it’s part of the information we receive. Then there’s pre-existing knowledge. Everyone on the production knows what’s being said because they’re reading it off the script, and by the time it comes to the mix, the director will almost certainly know every line off by heart because they’ve heard it so many times in the process. So they will say to the audio mixer, ‘yeah, of course I can hear that, that’s fine’. Yes, you can hear it – but because you know what’s being said, not because it’s intelligible. And that is a real key point about pre-existing knowledge: it’s got nothing to do with the technology, but when the director is making a call about how clear the dialogue is, or whether it’s being masked by music or sound effects, that’s a factor – they can still hear what’s being said because they know what’s being said. The next one is changes in production techniques. We’ve got a lot more multi-camera shooting. Back in the old days, when you’re talking about the telly with the small speaker and all the rest of it, if it was a drama it was all single-camera shooting, and it was being recorded with a boom mic just out of shot. We could get very good quality audio with a boom mic, which rejected the sounds we didn’t want because it’s a very directional microphone, and we could deliver really clear, much more natural dialogue. Now, with multi-camera shooting, where they’re shooting the same scene from several cameras in several directions, we can’t get the boom mic in close enough to get a good recording. So now all the actors are wearing hidden radio mics. Who ever listens to somebody talking by putting their ear against their chest? And even so, they’re also hidden – they’re buried under clothes – and that’s going to take some of the top end off, which has an impact on intelligibility. Then we’ve got the whole area of the loudness range being too high: effectively taking a cinematic mix and, in the case of streaming services, just transmitting it on the streaming service.
It’s not being remixed for broadcast. There was a time when budget was set aside to remix that content to better suit the environment in which it was going to be consumed. That happens less and less; certainly, to my knowledge, it’s not happening on the main streaming services. They are simply taking the cinematic mix and delivering it via streaming. And that’s why we have this slightly different delivery spec for most streaming services of minus 27 LUFS. Well, it isn’t really minus 27 LUFS – it’s minus 27 dialnorm – because they’ve actually gone back to the old approach, where effectively you measured the loudness of the dialogue only. There was the Dolby dialnorm algorithm, which was able to extract the dialogue from the full mix and say how loud the dialogue was, because that’s what they were concerned with. In the early days of loudness – before BS.1770, or rather before R128 and ATSC A/85 and OP-59 and all these started coming out – that was a kind of interim loudness measurement where we were more interested in the dialogue, and the streaming services have re-adopted that philosophy. Then we come to the point that you raised at the beginning, Neil, which is the speakers. In the good old days of CRT tubes, the cabinet was quite a reasonable size, and on the front you had enough space to put a decent-sized, forward-facing speaker – elliptical, admittedly, because a round speaker would have made the box slightly too big – in a cabinet of a reasonable size, so we were in with a fighting chance. Now, with flat-screen televisions, there are no forward-facing loudspeakers: most of the loudspeakers are rear-facing or downward-facing, because you don’t want a big bezel round the edge of the frame of the flat-screen television. So we’re already fighting a losing battle before we even start – never mind the fact that the speakers are minute, because again, that’s being dictated by the size of the box that the flat-screen television is in. So yes, that is a factor, but my key point is that it’s not the only factor. So we start moving into soundbars and all the rest of it to try and improve the sound, because although the picture quality of a flat-screen television is vastly superior to the old CRTs, the sound reproduction has gone the other way. And then there are factors – I’ll just highlight them – to do with downmixing. If you mix in, say, 5.1, we as audio professionals have no control over how that is downmixed to a stereo-compatible mix for people who don’t have a 5.1 system, and in reality most people’s domestic environments are not going to have one. The downmix is a set of predetermined parameters, and it can equally have an impact on intelligibility depending on the content that’s being mixed. There’s also the question of whether you’ve got a phantom centre: the great thing about 5.1 is that all the dialogue always comes from the screen, but of course the reality is, as we know, that all these clever ideas – 5.1 and this, that and the other, and Atmos – don’t translate to most people’s domestic environments. So the downmixing can have an impact. And finally, it’s something we’ve talked about before, but in my mind it becomes really important, and I’ve done quite a bit of research on this: the mix must be done in a correctly sized room.
As we talked about, if you mix in a big space – and a lot of TV drama is now being mixed in theatrical dubbing suites, because it’s effectively a film production and that’s the workflow – that creates a mix which, on playback, is going to be less intelligible for the domestic consumer. So mixing in an appropriately sized space is another key factor, and it’s one where the right thing hasn’t been happening, because more and more TV content has been mixed in large dubbing theatres rather than small dubbing theatres, which are more akin to a domestic environment.
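As a minimal sketch of the predetermined stereo downmix Mike mentions above (Python with NumPy; the -3 dB coefficients are the commonly used ITU-style defaults and are assumed here for illustration – real decoders may apply different values carried in metadata):

```python
import numpy as np

def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs):
    """Fold a 5.1 mix down to two channels; all inputs are arrays of samples.
    The LFE channel is commonly discarded in a stereo fold-down."""
    k = 10 ** (-3 / 20)          # about 0.707, i.e. a fixed -3 dB contribution
    left = L + k * C + k * Ls    # centre-channel dialogue is shared equally...
    right = R + k * C + k * Rs   # ...between left and right, competing with
    return left, right           # everything else, with no mixer input.
```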

Neil Hillman

Mike, thank you so much for being our guest today. It really has been a pleasure to hear your expert views on such important and topical technical issues.

Mike Thornton

Oh, it’s been great to join you, and to talk about subjects that are close to my heart.

Neil Hillman

The links at Production Expert to Mike Thornton’s fantastic tutorial videos on loudness are on today’s show notes page, along with Jason’s and my contact details as enthusiastic and experienced work-for-hire dialogue editors; details of my work as a sound producer, overseeing films from pre-production right through to post-production; as well as the live online coaching sessions we run, which we like to think are relevant and useful for all craft creatives, such as directors and picture editors, and not just sound professionals.

Jason Nicholas

It’s our goal with our podcast to educate listeners from all backgrounds about the underappreciated wider roles sound plays in life as well as in the film and television industry and to also develop a sound language and style of dialogue that filmmakers can use to more easily communicate with each other about sound for moving pictures. If you like what you’ve heard today, please be sure to subscribe to the podcast and leave comments about what would be helpful to you and your work and who you’d like to hear from on the show. And lastly, thanks for listening.

Announcer Rosie

The Apple and Biscuit Show is written, produced, and presented by Jason Nicholas and Dr. Neil Hillman. It is edited and mixed by Jason Nicholas in our Sydney studios.

About the presenters:

Mike Thornton – 

Web: https://www.production-expert.com/author-bios/2019/11/26/mike-thornton 

Loudness tutorials: 

Start here for Part 1 of Mike’s series of six videos: https://youtu.be/jvcfwtrL0uw?si=QVhc9ee-wgWb-cO8

Neil and Jason –

Details about Neil and Jason’s work as dialogue editors and mixers, and how to contact them, are here: https://www.theaudiosuite.com

Details of Neil’s 1-to-1 and Coaching Programmes for ambitious media professionals are available at 

https://www.drneilhillman.com and  https://soundproducer.com.au/coaching

Technical notes:

Written, produced and presented by Jason Nicholas and Dr Neil Hillman

Recorded using the Squadcast remote recording system

Programme edited by Jason Nicholas

 

