You probably know most of this already.
Presence frequencies. Spatial cues. Prediction. Emotional beats.
What you may not have seen is how they all fit together into a single, practical model for how the brain decides what to listen to, in any context from the real world to film to games.
We only have one pair of ears.
No matter how many sound sources are around us, all the vibrations, from the deepest bass you feel in your chest to the hiss of a sibilant, ultimately pass through just two physical channels into the cochleae.
From there, the brain does something remarkable: it builds an artificial model of what it thinks is happening in the world. And here is the twist: every sound you "hear" is already in the past. By the time your brain makes sense of it, the actual vibration has finished.
This reconstruction is surprisingly light on cognitive load. Our auditory system is so well adapted that it can stitch together an accurate, useful world model almost effortlessly, combining just enough sensory data with memory and expectation to guide action.
And expectations matter more than most people realise. If you hear the first part of a familiar sequence, your brain predicts what should come next, and often "hears" it even if it is not there. The brain fills in missing events so smoothly that, unless the absence breaks the pattern, you may never notice.
That reconstruction is shaped by two processes running at the same time:
Bottom-up processing: driven by the properties of the sound itself
- The ear canal naturally amplifies the 1–4 kHz "presence band" where speech cues live, making those sounds stand out.
- Sudden changes in loudness, pitch, or timbre demand attention.
- Sounds from in front are prioritised because they align with our visual focus.
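To make the presence-band idea concrete, here is a minimal sketch of a peaking EQ using the well-known RBJ Audio EQ Cookbook coefficients, boosting a band around 2.5 kHz; the centre frequency, Q, and +6 dB gain are illustrative choices, not prescriptions from the text:

```python
import math

def peaking_eq_coeffs(fs, f0, q, gain_db):
    """Peaking EQ biquad coefficients (RBJ Audio EQ Cookbook)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
    return [c / a0 for c in (b0, b1, b2)], [1.0, a1 / a0, a2 / a0]

def biquad(x, b, a):
    """Direct Form I biquad filter over a list of samples."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b[0]*s + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1, y2, y1 = x1, s, y1, out
        y.append(out)
    return y

def rms(sig):
    return math.sqrt(sum(s * s for s in sig) / len(sig))

fs = 48000
b, a = peaking_eq_coeffs(fs, f0=2500, q=1.0, gain_db=6.0)  # +6 dB presence lift

def tone(freq, n=fs):
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

# A tone inside the presence band is boosted; a low tone is barely touched.
gain_mid = 20 * math.log10(rms(biquad(tone(2500), b, a)) / rms(tone(2500)))
gain_low = 20 * math.log10(rms(biquad(tone(200), b, a)) / rms(tone(200)))
```

Run against test tones, `gain_mid` comes out near +6 dB while `gain_low` stays near 0 dB, which is the same kind of selective lift the ear canal gives speech cues for free.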
Top-down processing: driven by context, memory, and goals
- The brain pattern-matches against familiar voices, languages, and situations.
- If you recognise the voice or understand the words, they become more important.
- Predictions about what should happen next mean you notice when reality changes, or you simply fill in the expected sound without it ever being played.
- Expectations are shaped by the medium you are in. In the real world, you expect physical accuracy. In film, you expect selective realism and dramatic exaggeration. In games, you expect functional cues that guide play.
- Expectations are also shaped by the desired emotional trajectory. People listen to content to help them feel a certain way, whether that is fear and suspense, comfort and calm, adrenaline and triumph, or joy and laughter.
- Attention is dynamic. Even within the same media experience, minds wander once predictions are satisfied or when the scene becomes predictable. This can be a natural rest period or a moment to re-engage with subtle hooks.
- Once meaning is established, it is hard to shift. First impressions "lock in" the source and meaning of a sound, and later cues rarely override them.
- Perceptual resets are possible but rare. To truly change meaning, you must break the listenerโs predictive model, for example by revealing a sound source visually or introducing a radically incompatible cue, forcing the brain to rebuild its interpretation.
Both processes work together.
A sudden clap behind you might demand attention (bottom-up) while, at the same moment, you are following a conversation because the topic matters (top-down).
Why this matters for sound design
Because the brain reconstructs reality, we do not always need the "real" sound. We can use analogues that evoke the right place, texture, or emotional tone. We can even leave sounds out when the expectation is strong enough for the brain to fill them in, and those expectations will differ depending on whether the listener is in a real-world, film, or game context, and on the emotional journey they came for.
By giving the brain the right cues at the right moment, we can prompt it to complete the scene:
- A handful of well-chosen elements can feel more real than a cluttered, literal soundscape.
- Adjusting timing, spatial cues, and presence-band content can guide attention exactly where it needs to go.
- Strategic gaps (silence, partial sequences, or reduced detail) can be as powerful as adding more layers.
- Playing to (or against) medium-specific and emotional-trajectory expectations can make a moment instantly believable, or deliberately surprising.
- Recognising that attention shifts and minds wander means you can choose when to let the audience drift and when to bring them sharply back.
- Get the first cue right. Meaning locks in fast. If you want to change it later, be prepared to design for a deliberate perceptual reset.
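The spatial-cue point in the list above can be sketched with a constant-power stereo pan, a standard approximation of the level differences the brain uses to place a source; the function name and -1..1 pan range below are illustrative conventions, not taken from any particular engine:

```python
import math

def constant_power_pan(sample, pan):
    """Pan a mono sample to stereo.

    pan runs from -1.0 (hard left) to 1.0 (hard right); the sin/cos
    gain law keeps total power constant across the stereo image, so
    perceived loudness stays steady as the source moves.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map pan to 0..pi/2
    return sample * math.cos(angle), sample * math.sin(angle)

# Centre: equal level in both channels, total power preserved
left, right = constant_power_pan(1.0, 0.0)

# Hard right: essentially all energy in the right channel
_, hard_right = constant_power_pan(1.0, 1.0)
```

Sweeping `pan` over time is one of the cheapest ways to pull attention toward a point in the scene before anything is shown there.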
In the end, good sound design works with both the physics of the ear and the psychology of the listener. It is not about reproducing the entire world; it is about shaping the world the brain believes is there.
About Dr Iain McGregor:
Dr Iain McGregor is Associate Professor at Edinburgh Napier University, specialising in interactive media design and auditory perception. With over 30 years of experience in film, games, theatre, radio, television, and robotics, his work explores soundscapes, sonification, and human interaction. His research spans auditory displays, immersive audio, healthcare alerts, and human-robot interaction, and he holds a patent on the evaluation of auditory capabilities. Alongside his research, he mentors MSc and PhD students and collaborates with industry partners in mixed reality and robotics, helping to shape advanced auditory interfaces.




