But success hasn't sent Minto to rest on his laurels. With every new release, he and the team at DICE are looking for ways to push the boundaries of sound and take the next step towards delivering an exemplary sound experience to players. Here, Minto shares his thoughts on what's happening in the game sound industry, what opportunities lie ahead, and what can help future game sound pros get into the industry.
Interview by Jennifer Walden
What’s one advancement you’ve seen in game sound in the past year that you’re excited about?
Ben Minto (BM): For us at DICE, being able to render our final audio output in either object or traditional channel-based audio has been the most recent advancement that has had the largest impact across the audio team and in the products we deliver.
As far as content creation, design, and implementation in the Frostbite engine are concerned, little has changed to date. One notable exception was removing an old "shortcut" or assumption we used to make: when rendering only to a flat horizontal plane of channels (e.g. 7.1 surround or stereo), we would ignore the real-world height component of most audio-only emitters, i.e. those without a visual reference. For example, the "invisible" birds in trees would be spawned X meters away from the listener but in the same horizontal plane, since in the end we would be 'compressing' our 3D soundscape down into a 2D plane with no height information. Now, those 'added' height details go a long way when trying to sell the overall audio experience.
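To make that old shortcut concrete, here is a minimal sketch of the idea, not Frostbite code, with hypothetical names and types: for a channel-based bed the emitter's height is collapsed onto the listener's plane while roughly preserving distance, whereas for object-based output the real height is kept so an Atmos renderer can place the sound overhead.

```cpp
#include <cmath>

// Hypothetical illustration only -- not Frostbite code.
struct Vec3 { float x, y, z; };   // z = height above the listener plane

enum class RenderMode { ChannelBased, ObjectBased };

// Position an audio-only emitter (e.g. "invisible" birds) relative to the listener.
Vec3 placeAmbientEmitter(const Vec3& offsetFromListener, RenderMode mode)
{
    Vec3 pos = offsetFromListener;
    if (mode == RenderMode::ChannelBased) {
        // Old shortcut: the 7.1/stereo bed has no height, so collapse the
        // emitter onto the horizontal plane while keeping its distance intact.
        float full   = std::sqrt(pos.x * pos.x + pos.y * pos.y + pos.z * pos.z);
        float planar = std::sqrt(pos.x * pos.x + pos.y * pos.y);
        if (planar > 0.0f) {
            float scale = full / planar;     // preserve overall distance
            pos = { pos.x * scale, pos.y * scale, 0.0f };
        } else {
            pos = { full, 0.0f, 0.0f };      // directly overhead: pick any planar direction
        }
    }
    // Object-based rendering: keep the real height for the Atmos renderer.
    return pos;
}
```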
Global DSP effects (e.g. low-pass filtering most sounds for low health), reverb returns, mixing and mastering, and other topics in the domain of object audio have all presented their own challenges and will continue to provide rich pickings for R&D and improvements for the foreseeable future.
Object audio is a format that many working with game audio will soon be working with, if they haven’t already embraced it
Moving to object audio early enabled us to become familiar with the format, to revisit and challenge old channel-based paradigms, and to have our first-pass solutions (and in most cases a few iterations more) to the previously mentioned challenges ready in time to embrace Dolby Atmos and ship the first two titles, Star Wars Battlefront and Battlefield 1, with full Dolby Atmos over HDMI on the PC.
Being able to provide the Star Wars Battlefront codebase, already running object audio, to our colleagues at Criterion enabled them to build upon our work and develop the extensions they needed to deliver the audio experience for their PlayStation VR title, Star Wars Battlefront – Rogue One: X-Wing VR Mission.
With Dolby Atmos coming to Xbox One and Windows 10, as part of their Windows Sonic platform, object audio is a format that many working with game audio will soon be working with, if they haven’t already embraced it.
The trailer for Battlefield 1, one of the games Ben Minto has worked on
What’s the biggest challenge for game audio at the moment? How do you see that resolved in the future? Tech wise, what would you want to see for game sound?
BM: In the general case, the complexity and scope of core assets is expanding exponentially with each iteration. The number of cars, guns, blasters, etc. in most titles is increasing with every console generation. This growth is being further compounded by DLC (downloadable content), service models and customization options.
The underlying audio models which handle the sound for these assets are also constantly increasing in complexity. For example, ten years ago a single weapon may have had its firing audio covered by a handful of variants, or maybe even a single lone 'gunfire.wav.' Today a weapon patch may combine many different layers at runtime, with each layer able to play different content depending on various runtime game parameters: for example, different reflections based on environment type or different tails depending on distance from the listener.
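As a rough sketch of that kind of layered weapon event, with hypothetical layer and sample names rather than the actual Frostbite data model, a single shot might assemble its content from runtime parameters like this:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of a layered weapon-fire event -- not the Frostbite data model.
enum class Environment { Open, Urban, Interior };

struct FireContext {
    Environment environment;       // where the shot happens
    float listenerDistanceMeters;  // how far away the listener is
};

// Ten years ago this might have been a single "gunfire.wav";
// today each shot combines several layers chosen from runtime parameters.
std::vector<std::string> buildFireLayers(const FireContext& ctx)
{
    std::vector<std::string> layers = { "rifle_mech", "rifle_body" };

    // Reflection layer picked from the environment type.
    switch (ctx.environment) {
        case Environment::Urban:    layers.push_back("rifle_reflection_urban");    break;
        case Environment::Interior: layers.push_back("rifle_reflection_interior"); break;
        default:                    layers.push_back("rifle_reflection_open");     break;
    }

    // Tail layer picked from listener distance.
    if (ctx.listenerDistanceMeters < 50.0f)       layers.push_back("rifle_tail_close");
    else if (ctx.listenerDistanceMeters < 250.0f) layers.push_back("rifle_tail_mid");
    else                                          layers.push_back("rifle_tail_far");

    return layers;
}
```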
If you go down the path of having a unique patch and unique sample data for each core asset, then as the total number and complexity of each asset type increases, the number of patches and the amount of sample data increase too, maybe even exponentially! Handling the content creation, data management, consistency, debugging, etc. will become more and more time consuming and expensive.
Content sharing is the first step. For example, with weapons we could maybe share the layer “pistol tails” across all pistol models, whilst keeping some components unique for each individual gun asset. This will save on the total amount of unique sample data needed and give consistency across the pistol family.
Building hierarchical families of sound patches can be an optimal way of reducing the overhead of maintaining a large number of patches. For example, we may define a master weapon patch, and then from this define a child patch, e.g. pistol, which references the parent patch but only stores where it deviates from it. Then from this child we can have a further child (a grandchild of the original master patch). For example, a Walther PPK pistol would only store information about how it deviates from its parent 'pistol' patch.
In the example case of having to add a silencer to all weapons, then we just add that to the master weapon patch and in doing so it will propagate down through all generations
Now whilst each weapon will have its own patch, these are not entirely unique and will have varying degrees of shared behavior and content. The power of a system like this comes when we discover a bug, when we need to add new functionality such as a silencer to all weapons, or when we add more assets to the system, say through DLC. In the case of having to add a silencer to all weapons, we just add that to the master weapon patch and in doing so it will propagate down through all generations (as needed). Compare this approach to having to manually add and maintain silencer behavior and content for each individual weapon patch, where the possibility of human (copy/paste) error and the time it takes are going to be a function of the total number of assets.
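Here is a minimal sketch of that kind of data inheritance, using hypothetical structures rather than the Frostbite implementation: each child patch stores only its overrides and falls back to its parent, so a change to the master weapon patch propagates to every descendant automatically.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical sketch of data inheritance for sound patches -- not Frostbite code.
// A patch stores only the properties where it deviates from its parent.
struct SoundPatch {
    std::string name;
    const SoundPatch* parent;
    std::map<std::string, std::string> overrides;

    // Walk up the hierarchy until some ancestor defines the property.
    std::string resolve(const std::string& key) const {
        auto it = overrides.find(key);
        if (it != overrides.end()) return it->second;
        if (parent) return parent->resolve(key);
        return "<undefined>";
    }
};

int main()
{
    SoundPatch weapon { "weapon", nullptr,
        { { "fire_layer", "generic_fire" }, { "tail_layer", "generic_tail" } } };

    SoundPatch pistol { "pistol", &weapon,
        { { "tail_layer", "pistol_tails" } } };      // shared across all pistols

    SoundPatch ppk { "walther_ppk", &pistol,
        { { "fire_layer", "ppk_fire" } } };          // only where it deviates from "pistol"

    // Adding silencer behavior to the master patch propagates to all descendants.
    weapon.overrides["silencer_layer"] = "generic_silenced_fire";

    std::cout << ppk.resolve("fire_layer")     << "\n"   // ppk_fire (own override)
              << ppk.resolve("tail_layer")     << "\n"   // pistol_tails (from parent)
              << ppk.resolve("silencer_layer") << "\n";  // generic_silenced_fire (from master)
}
```

Fixing a bug or adding DLC assets then means touching only the level of the hierarchy where the behavior actually lives, rather than every individual weapon patch.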
These are areas that we have faced at DICE over the years and we have successfully implemented processes and pipelines, together with the Frostbite development team, that can handle this continual growth in the complexity and total number of assets within a given project without the ‘costs’ growing exponentially. (For further reading see GDC 2015: Martin Loxton – “Smart Sound Design Using Modularity and Data Inheritance”)
BM: Carrying on from this, a more DICE-specific challenge is the ability we now have to create 'monsters.' Because we cull sounds even before they are created, based on their perceived inaudibility as determined by our HDR (mixing) solution, we can play fewer, more 'expensive' sounds at runtime than if we didn't pre-cull. A DICE 64-player MP game may want to play hundreds of sounds at any given moment (e.g. Foley for each player, weapons fire, vehicles, voice over, destruction, etc.), and yet we are usually only rendering around 20 or so sound patches at any given time.
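A rough sketch of audibility-based pre-culling, loosely inspired by the HDR mixing idea rather than DICE's actual implementation, might look like this: of the hundreds of candidate sounds in a frame, only the loudest few at the listener, within a loudness window and voice budget, are ever instantiated.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-culling sketch -- not DICE's actual HDR mixer.
struct SoundRequest {
    std::string patch;
    float perceivedLoudness;   // estimated loudness at the listener, in dB
};

// Keep only the requests loud enough to matter, up to a fixed voice budget,
// so the voices we do spend can be more "expensive".
std::vector<SoundRequest> preCull(std::vector<SoundRequest> requests,
                                  std::size_t voiceBudget,
                                  float windowDb)
{
    if (requests.empty()) return requests;

    std::sort(requests.begin(), requests.end(),
              [](const SoundRequest& a, const SoundRequest& b) {
                  return a.perceivedLoudness > b.perceivedLoudness;
              });

    // Anything more than windowDb below the loudest request is treated as inaudible.
    const float threshold = requests.front().perceivedLoudness - windowDb;

    std::vector<SoundRequest> kept;
    for (const SoundRequest& r : requests) {
        if (kept.size() >= voiceBudget) break;
        if (r.perceivedLoudness < threshold) break;
        kept.push_back(r);
    }
    return kept;   // e.g. hundreds of requests in, ~20 patches out
}
```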
As our CPU and memory budgets grow, so do these patches. As of today, these patches are manageable, readable and workable, but it feels like we are quickly approaching the human limitations of complexity. Whilst we have migrated to shared and hierarchically driven systems, somewhere at the heart of it all the 'monster' master patches dwell — behemoths constructed from conditional logic, events, sample players, DSP, mixing, output routing, etc., in all their glory. Most day-to-day work is carried out at the child level (or even great-great-great grandchild level!) in the hierarchy, but some days you just need to get in there and wrestle with these master patches.
Somewhere at the heart of it all the ‘monster’ master patches dwell — behemoths constructed from conditional logic, events, sample players, DSP, mixing, output routing, etc., in all their glory
In our Frostbite Editor all patches are represented graphically, somewhat similar to a modular synth patch or something from Max/MSP. Currently, in a sound patch all the base constituent nodes are always shown, e.g. ADSR, Sampler, Scale Clamp, Flanger, etc., and you navigate across the entire patch. In the future, being able to bundle groups of nodes into a prefab will be a simple way to reduce the immediate visual complexity and improve the overall readability of our patches. These prefabs can in turn be used as nodes, which can then be used inside larger prefabs, and so on. The complexity will still exist, but not as a barrier. And you should never have to face the 'monster' head on unless you really want or need to.
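Bundling nodes into reusable prefabs is essentially the composite pattern. A minimal sketch, with hypothetical node types rather than the Frostbite Editor's data model, might treat a prefab as a node itself so it can nest inside larger prefabs:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of prefab node bundling -- not the Frostbite Editor's data model.
// A prefab is itself a Node, so prefabs can nest inside larger prefabs.
struct Node {
    virtual ~Node() = default;
    virtual std::string describe() const = 0;
};

struct SamplerNode : Node {
    std::string sample;
    explicit SamplerNode(std::string s) : sample(std::move(s)) {}
    std::string describe() const override { return "Sampler(" + sample + ")"; }
};

struct FlangerNode : Node {
    std::string describe() const override { return "Flanger"; }
};

// A prefab hides a bundle of nodes behind a single name, reducing visual complexity.
struct PrefabNode : Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
    explicit PrefabNode(std::string n) : name(std::move(n)) {}

    std::string describe() const override { return name; }   // collapsed view

    std::string expand() const {                              // only when you need the detail
        std::string out = name + " { ";
        for (const auto& child : children) out += child->describe() + " ";
        return out + "}";
    }
};

int main()
{
    auto tail = std::make_unique<PrefabNode>("PistolTail");
    tail->children.push_back(std::make_unique<SamplerNode>("pistol_tail_far"));
    tail->children.push_back(std::make_unique<FlangerNode>());

    PrefabNode weapon("WeaponFire");                 // a prefab nested inside a larger prefab
    weapon.children.push_back(std::move(tail));
    weapon.children.push_back(std::make_unique<SamplerNode>("pistol_body"));
    // weapon.describe() shows one collapsed node; weapon.expand() reveals the bundled graph.
}
```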
One of the trailers for Star Wars Battlefront, another game Ben Minto worked on
Creatively, what would you like to see in the future for game sound?
BM: I think that once the boundaries and constraints on how games can be delivered to their audience disappeared, moving away from a 'single price point packaged hardware goods' model to the plethora of delivery mechanisms we have today (e.g. boxed, download, free to play, browser, streaming, multi-price point, etc.), game creators gained a multitude of stages and a greater opportunity to be heard.
Any creative style and direction now has the chance to be represented and delivered through the game medium, so that the only barrier is finding a title that fits/accepts a certain style and tone (or vice versa). And if you can’t find one, then learn Unity and make your own game to match your audio style!
At least once a week something new will prick your ears – the Kickstarter for Narita Boy launched not long ago with a heavily stylized trailer (audio too!). Here's hoping that tone carries through to the finished product.
Any creative style and direction now has the chance to be represented and delivered through the game medium
There are great pillars of audio creativity that have mainstream appeal and approval, like Inside, and many titles on the horizon like Cuphead that have the possibility of delivering very creative and unique soundscapes.
What would I like to hear in the future? More like this please! Lots of variety, uniqueness, being bold, being edgy, being different, etc.
In terms of your own titles, any exciting projects on the horizon you can talk about?
BM: For the current project I am working on, being part of a true sequel team, where all members of the audio department that were on the previous title and its expansions are also part of the team we have today on the sequel, is a great experience. I think it's the first time I've ever been part of such a team. The team's existing bonds, relationships and muscle memory make everything flow so much more smoothly and elegantly. This, combined with working with two sister studios, EA Motive and Criterion, makes for an even larger and even happier audio family on this project.
That’s all exciting for me. As an individual you can achieve a certain result, but being part of a well-oiled and functioning team working towards a unified and consistent goal brings its own rewards, where it always feels like the whole is greater than the sum of the individual talents.
Want to know more about the sound for Battlefield 1, the Star Wars Battlefront trailer sound design – or the future of game audio? Check out these A Sound Effect stories below:
• Behind the sound of Battlefield 1
• Behind the Star Wars Battlefront trailer sound design
• The Future of Game Audio – a Q&A with Matthew Smith
What opportunities do you hope VR will offer game sound pros?
BM: VR, by its very nature, relies on a more intimate relationship between the listener and the (audio) experience. The player is shut off visually and sonically from real ‘reality’. They are in a situation where they are supposed to believe that they are really there. The player is not an avatar or another playable character, but is ‘really’ being exposed to the stimuli that he or she perceives. That places a great importance on, amongst other things, having a convincing soundscape.
Many working within the AAA space and beyond already benefit from this acceptance of the value of audio as part of the whole package
A good VR experience demands ‘good audio.’ Hopefully with this understanding a greater universal importance will be placed on our audio craft, both on the content and on those who work within the field. Many working within the AAA space and beyond already benefit from this acceptance of the value of audio as part of the whole package. I feel that VR and its demands will only help push this mantra even further.
Once we bring such experiences closer to the player, we must get more things “righter.” Simple everyday sounds, which we are very familiar with in the real world, need to be carefully modelled inside VR, especially those, for example, dealing with detailed and very subtle, delicate, personal Foley (e.g. simply moving our arm or turning in a seat) and interactions with familiar objects (e.g. handling a cup or flicking a switch). Getting these ‘right’ can make the ‘fantastic’ that we usually deal with in games look relatively easy! There are plenty of opportunities here for improvement.

Recording Tatooine ambiences in Dubai for Star Wars
Beyond VR and AR, any other major trends you’re seeing?
BM: Rightly so, there is still a great drive towards getting sounds to sit and propagate "properly," or as expected, through our virtual worlds. As a note, this is another trend seeing further drive from the push for compelling VR experiences. We still haven't nailed it. There are a variety of different solutions being followed by different groups, from the purely DSP/run-time approach to fully content-driven implementations and many hybrids in between.
This is a great example of something we at DICE term “Awesome vs. Authentic.” Do we always want real world behavior? Does real always sound right? We have managed to improve the level of detail, consistency and information that we can encode, and that can be readily decoded by our listeners in our sounds: What was that? How far away was it? Where did it happen? Is it a threat? Was it important? There it is again! So, how do we want to go about encoding spatial information into sounds? Do we want to be more “correct” or more decodable (if a conflict exists)?
Do we always want real world behavior? Does real always sound right?
Working in a built-up city I’m still surprised by how often physics gets it “wrong” when a helicopter flies overhead or an ambulance approaches from a distance. All the “conflicting” reflections from the buildings make it really hard for my brain to pinpoint where the sound is coming from, its path and also its direction of travel. Is this something we want to replicate in our title or do we want to bend the rules to make the scenarios more readable?
At DICE we embrace a hybridized solution based on specific content, runtime calculations and DSP, one that with every iteration adds more functionality, depth and "awesomeness" (awesomeness being guided by what is correct, but regularly bent so it sounds "right").
For game audio pros, how do you see the landscape changing in terms of job opportunities, must-have skills, and platforms to focus on? Do you have any advice for the next generation of game sound professionals?
BM: In the short term, I don’t see very much changing from where we are today. We have quite a high demand on individuals to be able to touch all aspects of the audio development process. We don’t typically divide roles into creative or technical, although we know that everyone has their own place on the spectrum — some are better or happier working in one area or specialization. While that can be essential on a per project basis, being able to see the bigger picture and being more flexible with fit is going to be beneficial when working in either a large team, a smaller one, or even as a lone wolf.
The quality and experience we are seeing from applicants these days is exceedingly high. Even when looking for interns, where the internship is part of an ongoing course, we receive applications from people who have learned the fundamentals of FMOD, Wwise, Unity, etc. They already have a game or two under their belts. They know every DAW and its associated plug-ins. They regularly go out field recording and have their own custom libraries.
The quality and experience we are seeing from applicants these days is exceedingly high
They are very socially visible (blog, Twitter, etc.) and have been working with audio for quite a few years already and are pretty damn good at it! These were the criteria we used to use to filter sound design applicants a few years ago, whereas now the average experience bar has moved up significantly. Personal qualities like team fit, self-motivation and the desire to learn and push boundaries are highly prized, and can often be more valuable and essential than experience or technical knowledge in demanding team-based setups.