Mixing Square Waves

Traditionally, loud music is made loud mostly in the final stage of production: mastering. Recording engineers capture performances with wide dynamic range, mix engineers use compression strategically to make the various tracks work together, and a mastering engineer reduces the dynamic range of the mixed track as a whole to prepare it for duplication and broadcast.

This multi-stage approach, moving from wider to narrower dynamic range, is one of the reasons that loud music still sounds “good” – distortion levels are relatively low, mixes are coherent, and the average listener doesn’t seem to perceive a huge reduction in audio quality as a result of loud mastering.

Audio production in games is fundamentally different from music production. Instead of recording, mixing, and mastering, we have

  • asset creation (which can include recording, sound design, and music and voice-over preparation)

  • and mixing (which for this discussion will include implementation)

and an insidious thing happens between the two: asset approval. Before mixing a game can even begin, every asset is individually scrutinized, sometimes by a single audio engineer or game designer, sometimes by a board of people. Because there’s no control over the context in which these sounds are presented, it’s likely they’ll be heard at monitoring levels that are either far too low or wildly inconsistent, which inevitably results in demands for louder, more impressive, mastered sounds. This reversal of the proper audio processing path – mastering then mixing instead of vice versa – is at the root of a lot of bad sound in games.

It’s almost impossible to mix using mastered sounds. A gunshot that sits at -6 dB RMS for 500 milliseconds is going to mask every other sound playing at the same time, and that gun is going to be fired a thousand times in a play session. When a second heavily-compressed sound (like, say, another gun) is played simultaneously, the two sounds get smashed into the game’s limiter and become even more grossly distorted – the result is probably just going to sound like noise. The only way to manage the mix at all is with complex ducking schemes, and even then voice-overs are constantly getting buried in score and SFX because there’s never enough time in development to ensure that these schemes work across an entire 10+ hour game.
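
To make the ducking problem concrete, here’s a minimal sketch of the gain curve a simple sidechain ducking scheme computes: the SFX bus dips while a voice-over plays, then recovers. This is illustrative Python/NumPy, not how any particular engine or middleware implements it, and the duck depth, attack, and release values are made-up placeholders:

```python
import numpy as np

SR = 48000  # sample rate for the sketch; purely illustrative

def duck_gain(vo_active, duck_db=-9.0, attack_s=0.05, release_s=0.4, sr=SR):
    """Per-sample gain for an SFX bus that dips while a voice-over plays.
    vo_active is a boolean array marking where the VO is audible; all
    parameter values here are placeholders, not tuned settings."""
    target = np.where(vo_active, 10.0 ** (duck_db / 20.0), 1.0)
    attack = np.exp(-1.0 / (attack_s * sr))    # fast smoothing into the dip
    release = np.exp(-1.0 / (release_s * sr))  # slower recovery back to unity
    gain = np.empty(len(target))
    g = 1.0
    for i, tgt in enumerate(target):
        coef = attack if tgt < g else release  # duck quickly, release slowly
        g = coef * g + (1.0 - coef) * tgt
        gain[i] = g
    return gain

# sfx_ducked = sfx * duck_gain(vo_active)  # apply to the SFX bus signal
```

Even this toy version hints at the tuning burden: every pairing of voice against score and SFX needs its own depth and timing, multiplied across the whole game.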

And we’re putting up with all these mixing challenges in service of assets that sound awful: the misguided pursuit of exaggerated, over-the-top sounds ensures that they’re as distorted, harsh, and uncomfortable to listen to as possible. I praised Dead Space 2’s modest startup sequence last week, but it has this problem in almost every one of its violent SFX – the saw gun and screaming baby monsters are basically just full-scale noise in the 2-4 kHz region, the resonant frequency band of our ear canal, literally making them the loudest, most annoying sounds possible. And they’re played back constantly.

These kinds of sounds could be reined in and still achieve the desired aesthetic goal. The following is a recording of the Scout’s guns from Team Fortress 2 – a game I love, but one I can’t listen to for a long time without needing to reach for either earplugs or aspirin:

[Audio: Scout gunshots, original]

The second is that same recording with some simple volume curve manipulation to give the gunshots a more reasonable dynamic envelope:

[Audio: Scout gunshots, reshaped envelope]

Both of these fit in with TF2’s aesthetic: the edited version is still beefy and cartoonish, and it retains the original’s punch. But, because it spends so much less time at its peak loudness, it’s less fatiguing to listen to repeatedly, and it would be much easier to mix with.
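
For anyone curious what “simple volume curve manipulation” means in practice, it amounts to something like the sketch below: leave the initial transient alone, then pull the body down along a faster exponential decay so the shot spends less time near peak level. The hold time and decay rate are illustrative guesses, not the settings used on the recording above:

```python
import numpy as np

def reshape_tail(x, sr=48000, hold_s=0.02, decay_db_per_s=-60.0):
    """Keep the first ~20 ms (the transient) at full level, then fade the
    rest of the shot along an exponential decay. Values are illustrative."""
    t = np.arange(len(x)) / sr
    env = np.ones(len(x))
    tail = t > hold_s
    env[tail] = 10.0 ** (decay_db_per_s * (t[tail] - hold_s) / 20.0)
    return x * env  # the gunshot sample with its tail pulled down
```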

(I rendered those two with their overall level reduced by 3 dB so that I could show this last one – this is what the guns could sound like if the game were mixed just 3 dB quieter and the shots were given more dramatic dynamic treatment:

[Audio: Scout gunshots, reshaped envelope, 3 dB quieter]

Note that while we perceive this version as being about as loud as version 2, the transients here are punchier and more exciting. This is one of the benefits of making quieter games. Grab all three versions here if you want to compare them in your DAW.)
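
That punchier-transient claim can be checked numerically with crest factor, the gap between a clip’s peak and its RMS level: mixing 3 dB quieter leaves headroom for the transient to stand taller above the body of the sound. A tiny helper, assuming the clips are loaded as NumPy sample arrays:

```python
import numpy as np

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; higher means punchier, more dynamic audio."""
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(peak / rms)
```

Run on versions 2 and 3, the third should come out with a similar RMS but a noticeably higher crest factor, which is the “punchier and more exciting” difference expressed in numbers.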

Sounds as loud as the TF2 gunshots tend to cause another problem: volume creep. A single very loud sound in a game is like a cancer. It forces us to make other sounds louder so they compare favorably to it, and a plague of distortion spreads out to every audio asset in the game, eventually making them all sound like garbage. It’s basically an internal Loudness War (a Loudness Civil War?). In order to be heard at all next to the saws and screaming babies, Dead Space 2’s “stomp” sounds like it was run through a damn Tube Screamer.

To prevent volume creep, we need tighter control over a game’s audio at every stage of development. There are emerging monitor-calibration standards for video game mixing that would help bring game mixes more in line with other home entertainment media (you can read a little about them here, here, and here), but these standards need to extend beyond mixing. Asset creation and evaluation need to take place at or above this 79 dB SPL calibration level. Everyone contributing to the audio design needs to be on the same page.
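
For reference, one common recipe for this kind of calibration (borrowed from film-style practice; treat the specifics as my assumption, since the standards linked above differ in their details) is to play pink noise at -20 dBFS RMS through each speaker and raise the monitor level until an SPL meter at the mix position reads about 79 dB, C-weighted. Generating the test signal takes only a few lines:

```python
import numpy as np

def pink_noise(n, sr=48000):
    """Pink (1/f) noise made by spectrally shaping white noise."""
    spectrum = np.fft.rfft(np.random.randn(n))
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    freqs[0] = freqs[1]                     # avoid dividing by zero at DC
    x = np.fft.irfft(spectrum / np.sqrt(freqs), n)
    return x / np.sqrt(np.mean(x ** 2))     # normalize to 0 dBFS RMS

sig = pink_noise(10 * 48000) * 10.0 ** (-20.0 / 20.0)  # 10 s at -20 dBFS RMS
# Play sig through each speaker in turn and adjust the monitor trim until
# an SPL meter at the mix position reads roughly 79 dB SPL (C-weighted).
```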

I realize that in some cases it is perhaps unreasonable to expect every far-flung group working on a game to be able to conform to this standard, but I have an idea for a quick and dirty method of “calibration” that can at least ensure that no one on a team is monitoring at too low a level. In my next post I’ll test this idea in a video on making gun sounds!