Written by Ira Bolden, photos courtesy of Ira Bolden
Spatial effects are a key element in creating clarity and dimension in any audio mix. In the world of stereo music, mix engineers often create a sense of space and depth with techniques including panning, stereo widening, and the manipulation of time-based effects like reverb and delay. By shaping the space in which the music dwells, they allow individual elements to breathe and interact in a way that creates a more engaging and polished listening experience.
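To make the first of those techniques concrete: a constant-power pan law keeps a source at roughly equal loudness as it moves across the stereo field. Here’s a minimal sketch in Python with NumPy (the sine/cosine law shown is one common choice among several):

```python
import numpy as np

def constant_power_pan(mono: np.ndarray, pan: float) -> np.ndarray:
    """Pan a mono signal across the stereo field.

    pan ranges from -1.0 (hard left) to +1.0 (hard right). The
    sine/cosine law keeps total power constant as the source moves.
    """
    theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] -> [0, pi/2]
    return np.stack([np.cos(theta) * mono, np.sin(theta) * mono], axis=-1)

# Example: a 1 kHz tone placed halfway to the right.
sr = 48_000
t = np.arange(sr) / sr
stereo = constant_power_pan(0.5 * np.sin(2 * np.pi * 1000 * t), pan=0.5)
```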
The concept of spatialization in music has recently expanded thanks to the integration of binaural rendering engines into streaming services like Apple Music. Immersive audio formats such as Dolby Atmos, once the domain of the film industry and of hi-fi aficionados with ample disposable income, are beginning to supplant traditional stereo as advancing technology enables their consumption over headphones. No longer limited to a stereo sound field or to an audience with access to surround sound speaker arrays, mix engineers now have the freedom and incentive to explore a sense of space that expands around, above, and behind the listener.
The rise of Spatial Audio in popular music signals a shift in the public consciousness that game designers would do well to take note of. As awareness of and demand for immersive sound increases, so too will quality expectations, applying pressure on game developers to come up with innovative new ways of improving the depth, accuracy, and engagement of their spatial sound design. Notable, too, is the fact that headphones are dominating as the primary medium by which listeners are consuming spatial audio. This means that as sound designers create in immersive mediums, they need to pay very close attention to how their work is going to translate within a binaural sound field.
BINAURAL AUDIO IN GAMES: A BRIEF HISTORY
Binaural audio is a stereo format meant to emulate the way human ears experience sound in real-world environments. It can be produced by recording an audio source with a head-shaped microphone array or through digital rendering such as that found in Apple Music, Dolby Atmos for Headphones, and Immerse Gaming software. Because it translates an enveloping sense of sonic space with just two channels and assumes a listening position at the center of the sound field, binaural audio is ideally suited to headphone playback. While approaches to binaural rendering vary widely from case to case, all of them rely on a Head-Related Transfer Function (HRTF) at their core.
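In signal-processing terms, rendering a single static source binaurally reduces to a pair of convolutions with head-related impulse responses (HRIRs) – the time-domain form of the HRTF. A minimal sketch in Python, assuming you’ve already obtained an HRIR pair for the desired direction (e.g. from a SOFA file):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Convolve a mono source with one direction's HRIR pair.

    The HRIRs encode the interaural delays, head shadowing, and pinna
    filtering for that source angle; the brain decodes those cues as
    a position in space.
    """
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)
```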
Game sound designers have been contending with 3D sound fields for a long time. Examples of binaural 3D audio implementation can be found as far back as 1998. In addition to being the year Britney Spears secured her immortality as a global pop icon with the release of her single, “…Baby One More Time”, 1998 also saw the release of a slightly less lusty but no less influential piece of entertainment media: Half-Life.
Half-Life remains widely regarded as one of the most influential FPS titles of all time, lauded by critics and fans alike for its innovative contributions to the genre. Perhaps it was that same spirit of innovation that inspired the developers to integrate the Aureal A3D 2.0 audio engine into the game. Although the accuracy of its spatial definition pales in comparison to more modern examples, Half-Life’s use of binaural audio was groundbreaking for its time. To hear Aureal A3D 2.0 in action and to learn more about its tragic history, check out this YouTube video.
The untimely demise of Aureal A3D would not be the end of binaural audio in games. Many other companies entered the fray with their own 3D audio engines. From the practically ancient DirectSound3D, RealSpace3D, and Phonon 3D to more recent examples like Windows Sonic, Steam Audio, and Resonance, immersive audio technology has seen continuous innovation and improvement throughout its storied yet unrelenting history. Yet, with nearly every iteration and permutation, one common thread has persisted: the use of generic HRTFs.
YOU’RE NO DUMMY: THE NECESSITY OF HRTF PERSONALIZATION
HRTF is the scientific designation given to the set of data that explains how you hear stuff. HRTFs are completely unique to every individual and encode a mind-boggling array of anatomical idiosyncrasies, all of which our wrinkled human brains somehow manage to take into consideration when localizing sound. They’ve been the subject of extensive research and the linchpin of every “3D audio for headphones” solution you’ve ever heard. In the context of video games, an HRTF functions as the mathematical representation of “You” within a 3D sound field.
It goes without saying that you are not a B&K Head and Torso Simulator. You are also not the anatomical equivalent of your nearest neighbor unless perhaps you live in an underground facility populated with clones of yourself. And yet, sound designers have been forced time and time again to rely on generic HRTF modeling techniques, including the popular Nearest Neighbor approach featured in the Sony Tempest engine, to enable their players to experience immersive audio. This is tantamount to forcing you to listen with someone else’s ears; they may be ears, sure, but they’re not yours, and your brain knows it.
Have you ever listened to spatial audio on headphones and thought, “This doesn’t sound right”? Maybe you’re having a hard time distinguishing between sounds in front of you and behind you. Maybe everything sounds like it’s underwater, or maybe everything sounds like it’s above you for some reason. Although HRTF issues are not the sole contributing factor to negative spatial audio experiences like these, they rank very high on the list. This is a significant problem for both the design phase and the end user experience, affecting the designer’s ability to create immersive audio that translates accurately to headphones and the player’s ability to… well, enjoy it.
So, if they’re so problematic, why use generic HRTFs at all?
AVOIDING YOUR PROBLEMS: PERSONALIZATION BY HALF-MEASURES
HRTF personalization is not as easy as it might seem. In fact, it requires such a specialized base of knowledge that most companies choose to circumvent it entirely. Even those solutions that tout some degree of personalization are not actually delivering unique HRTFs to every user. Instead, they’re personalizing by half-measures.
Take the aforementioned Nearest Neighbor method, for example. With this method, a user selects or is assigned an HRTF from a finite pool of premade HRTFs. The number of available HRTFs can vary, as can the sophistication of the matching methodology. For example, users may be asked to perform a sort of spatial imaging listening quiz that funnels them to one of three possible HRTF options based on their responses. Other solutions could require a full face scan from which certain identifiers are extracted to match the closest available HRTF, but these solutions are not usually very transparent with respect to what those identifiers are and how accurate the whole process really is across demographics.
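Stripped of the branding, the matching step itself is simple. A toy sketch of the idea in Python – the feature names and values here are hypothetical stand-ins for whatever identifiers a given solution actually extracts:

```python
import numpy as np

# Hypothetical pool of premade HRTFs, each tagged with a few
# normalized anthropometric features (e.g. head width, pinna height,
# pinna rotation). Real solutions may use more, or different, features.
hrtf_pool = {
    "subject_003": np.array([0.42, 0.61, 0.18]),
    "subject_021": np.array([0.55, 0.58, 0.22]),
    "subject_047": np.array([0.48, 0.70, 0.15]),
}

def nearest_neighbor_hrtf(user_features: np.ndarray) -> str:
    """Return the premade HRTF whose features sit closest to the user's."""
    return min(hrtf_pool,
               key=lambda sid: np.linalg.norm(hrtf_pool[sid] - user_features))

print(nearest_neighbor_hrtf(np.array([0.50, 0.66, 0.17])))  # -> subject_047
```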
Another half-measure is to start with an artificial HRTF, usually based on one of those HATS dummy heads, and then allow the user to modify a limited set of parameters based on their personal measurements. Waves NX is one example of such an approach: users could enter their head circumference and interaural arc in place of the generic defaults. These kinds of approaches, while better than no personalization at all, still fall short of true HRTF personalization because they operate on a very limited number of parameters – far too few to cover the natural variety that occurs between individual HRTFs.
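To see why a couple of numbers can only go so far, consider what head circumference actually buys you: a rescaled interaural time difference (ITD). A sketch using the classic Woodworth spherical-head approximation – an illustration of the principle, not Waves’ actual implementation:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def woodworth_itd(head_circumference_m: float, azimuth_rad: float) -> float:
    """Interaural time difference for a spherical head (Woodworth model).

    Personalizing circumference rescales this one timing cue; the pinna-
    driven spectral cues that dominate elevation and front/back hearing
    remain generic.
    """
    radius = head_circumference_m / (2.0 * np.pi)
    return (radius / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

# A 57 cm head vs. a 52 cm head, source at 90 degrees:
print(woodworth_itd(0.57, np.pi / 2))  # ~0.68 ms
print(woodworth_itd(0.52, np.pi / 2))  # ~0.62 ms
```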
Granted, the traditional methods for actual personalization are quite cumbersome and in no way scalable for mass market distribution. Binaural microphones and anechoic chambers are cool and everything, but nobody wants to sit in a chair for hours while acoustic researchers measure impulse responses. Well, maybe not nobody – but you get the point.
ARTIFICIAL INTELLIGENCE: UNLOCKING SPATIAL AUDIO’S TRUE POTENTIAL
In addition to plagiarizing artists on a scale hitherto unimaginable, Artificial Intelligence is enabling us to achieve some pretty miraculous things. Among AI’s accomplishments is the ability to generate an entirely unique, personalized HRTF for anyone within 30 seconds, using just a smartphone and cloud computing. This relatively recent development has been deployed for the mass market in FINAL FANTASY XIV.
With the release of Immerse Gamepack, FINAL FANTASY XIV became the first MMORPG ever to integrate personalized, binaurally rendered Higher Order Ambisonics (HOA) audio. Though limited to PC (for now), this immersive experience is hardware agnostic – meaning players can hear it using any headphones. This was accomplished with the help of the Immerse™ AI Engine, developed by Embody – that’s us!
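For readers new to the format: ambisonics stores the sound field as spherical-harmonic channels, and a common route to headphones is to decode those channels to a set of virtual loudspeakers, then convolve each feed with that direction’s HRIR pair. A simplified first-order, horizontal-only sketch – the decode weights and normalization conventions vary in practice, and this is in no way FFXIV’s actual pipeline:

```python
import numpy as np
from scipy.signal import fftconvolve

def foa_to_binaural(W, X, Y, hrirs):
    """Render horizontal first-order ambisonics binaurally.

    W, X, Y: 1-D ambisonic channel arrays.
    hrirs: dict mapping virtual-speaker azimuth in degrees to an
    (hrir_left, hrir_right) pair, e.g. speakers at 45/135/225/315.
    Uses a basic "sampling" decode for illustration only.
    """
    n = len(W) + max(len(pair[0]) for pair in hrirs.values()) - 1
    out = np.zeros((n, 2))
    for az_deg, (h_l, h_r) in hrirs.items():
        az = np.radians(az_deg)
        feed = (W + X * np.cos(az) + Y * np.sin(az)) / len(hrirs)
        left, right = fftconvolve(feed, h_l), fftconvolve(feed, h_r)
        out[:len(left), 0] += left
        out[:len(right), 1] += right
    return out
```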
“With standard stereo audio technology, it can be difficult to fully grasp where the audio is coming from. Since each person has a unique head shape and ear positioning, audio coming from behind or above us is perceived differently by everyone. To address this, the Immerse Gamepack uses AI to analyze a photo of a person’s ear to tailor sounds specifically for that individual.” – Go “Kinugo” Kinuya, FFXIV Sound Team
Immerse employs a novel 3D reconstruction algorithm modeled around the geometry of the human ear. A 2D image or video capture is fed into the algorithm, from which a 3D model is extrapolated. The 3D output is then run through an Acoustic Scattering Neural Network (ASNN) designed on the principles of the Boundary Element Method (BEM), which replicates how sound reflects and scatters off your ear from any direction and then outputs your HRTF. This algorithm deviates from other state-of-the-art 3D reconstruction techniques in that it’s specifically trained to analyze complex, abstract ear structures as opposed to more common shape estimation problems.
If that information caused your eyes to glaze over, you’re not alone. If you’re interested in learning more about this machine learning-based approach to creating personalized HRTFs, grab a cup of coffee, settle in, and dig into this research paper, which covers the topic in extensive technical detail.
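For a bird’s-eye view, though, the flow is easy to follow. A toy, end-to-end sketch of the pipeline described above – every function body here is a placeholder, since the actual models are proprietary:

```python
import numpy as np

def reconstruct_ear_3d(ear_image: np.ndarray) -> np.ndarray:
    """Placeholder for the geometry-aware 2D-to-3D reconstruction step."""
    return np.zeros((1024, 3))  # stand-in mesh vertices

def acoustic_scattering_network(ear_mesh: np.ndarray) -> np.ndarray:
    """Placeholder for the BEM-inspired ASNN: ear geometry in, HRTF out.

    Output shape: (directions, ears, frequency bins) of filter data.
    """
    return np.ones((360, 2, 256))  # stand-in flat response

def generate_personalized_hrtf(ear_image: np.ndarray) -> np.ndarray:
    mesh = reconstruct_ear_3d(ear_image)      # smartphone capture -> 3D model
    return acoustic_scattering_network(mesh)  # 3D model -> personal HRTF
```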
It may not seem like it at first, but this development is actually a pretty big deal. With the HRTF personalization problem finally solved, sound designers now have access to a powerful tool that will help them create a new generation of binaural audio experiences that are far more accurate in their articulation of spatial detail – particularly in the height dimension.
PERCEPTUAL TUNING: EXPANDING THE CREATIVE ROLE OF HRTF TECHNOLOGY
Scalable HRTF personalization infrastructure isn’t the only advancement required to truly elevate quality standards for immersive soundscapes. HRTFs have traditionally been used as a generic variable for achieving a baseline binaural spatialization effect. However, new techniques and technologies are being developed that dramatically expand the creative possibilities for HRTF implementation.
Much like HRTFs themselves, no two games’ mixes or sound field requirements are exactly the same. As such, spatial sound fields and HRTF tunings should be designed to account for factors including environment, fluctuations in object density, dynamic changes in camera perspective, front vs. rear vs. height imaging, gameplay benefits, and the overall audio mix. In most scenarios, HRTFs are treated as a sort of one-size-fits-all tool and are not customized in a way that specifically reinforces and complements these critical gameplay considerations.
Imagine that your sound team included a dedicated spatial audio mastering engineer whose primary responsibility was to tune the HRTF and spatial sound field to your exact requirements. The audiovisual field for on-screen vs. off-screen action could be reinforced through adjustments to transitional angles and curves, applying differing amounts of spatialization to front, side, rear, and height channels – or to any available angle in the case of Ambisonic and object-based implementations. The amount and quality of early reflections in the HRTF could be customized to significantly aid localization accuracy across the board. Finally, more traditional mastering options like EQ and gain adjustment could be applied per angle or spatial region, helping to blend the immersive sound design more effectively with the game’s overall mix.
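A sketch of what the last two of those adjustments might look like in code – the curve breakpoints and the blend strategy are made-up tuning values for illustration, not a recommendation:

```python
import numpy as np

# Illustrative tuning curve: how much spatialization to apply as a
# function of source azimuth (degrees, 0 = front center).
curve_azimuths = np.array([0.0, 90.0, 180.0])  # front, side, rear
curve_amounts = np.array([0.6, 0.9, 1.0])      # keep the front drier

def spatialization_for(azimuth_deg: float) -> float:
    """Interpolate the tuning curve for an arbitrary source angle."""
    return float(np.interp(abs(azimuth_deg), curve_azimuths, curve_amounts))

def master_blend(dry: np.ndarray, spatialized: np.ndarray,
                 azimuth_deg: float, region_gain_db: float = 0.0) -> np.ndarray:
    """Crossfade dry vs. spatialized renders, then apply per-region gain."""
    w = spatialization_for(azimuth_deg)
    gain = 10.0 ** (region_gain_db / 20.0)
    return gain * ((1.0 - w) * dry + w * spatialized)
```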
This is the process we undertook together with the Square Enix sound team when designing Immerse Gamepack. Our Immerse HRTF generation pipeline includes hundreds of customizable variables, allowing for this kind of granular control over spatial rendering, and we couple it with our patented Clearfield™ technology to control the amount of spatialization applied to different parts of the sound field. By introducing sound designers to this technology and assisting them in its use and implementation, we hope to expand the industry’s understanding of the true creative potential of immersive sound.
INTEGRATION: ADDING PERSONALIZED SPATIAL AUDIO TO YOUR GAME
We currently have two options available for integrating Immerse personalized spatial audio technology into your game: plugins for Wwise and game engines, and custom API integration. These tools let developers monitor channel- or object-based audio on headphones with their own personalized HRTFs, and implement that same support on the front end for their players. This allows you to accurately design and monitor immersive audio on headphones for a wide range of output formats, within a binaural virtual environment that’s consistent with the one in which the majority of your players will hear it. If you’d like to learn more about this technology, you can visit our website at https://embody.co/gamedev.
Our mission is to empower sound designers to create better, more immersive spatial audio experiences. We hope you’ve found this article informative and that you’re left feeling inspired to further explore personalized, immersive sound design. If you’re interested in exploring how our technology can be integrated into your next project, we encourage you to contact us at dev@embodyvr.co.
A big thanks to Ira Bolden for sharing insights on Creating Better Spatial Audio Experiences