โฐ Last Chance To Save! Check out the deals ๐Ÿ‘‰ here

Feb 10, 2026

How to hear AI-generated or heavily edited audio

By Asbjoern Andersen
20 TIPS FOR DETECTING AI-GENERATED OR HEAVILY EDITED AUDIO
How do you determine if audio is AI-generated or heavily edited? There are some tell-tale signs, and here, Dr Iain McGregor shares 20 tips that can help you figure it out:
Written by Dr Iain McGregor

AI-generated audio keeps improving, but our ears are still remarkably good at telling when something does not behave like real sound. You do not need special tools or training. If you know what to listen for, many clues reveal themselves automatically.

Here are twenty simple listening tricks. They will not catch everything, but they can help you develop a more confident ear, whether you work in media or just enjoy paying attention to how sound behaves.

  1. Too clean to be real: Real rooms have texture: HVAC, small movements, distant noise. If a voice floats in emptiness with no sense of space, be sceptical.
  2. Backgrounds that never behave like real space: True ambience drifts. Cars pass, wind rises and falls, someone coughs once. If a background feels frozen or looped, treat it with caution (a quick loop check is sketched after this list).
  3. Over-compression used as camouflage: Sometimes fakes flatten everything to hide seams. Consonants punch, but breaths and texture vanish. Natural dynamics are missing (a crest factor measurement, sketched after this list, puts a number on this).
  4. Plosives and sibilance that behave unnaturally: “P”, “T”, “K”, “S”, and “SH” normally change with head movement and mouth position. Identical versions every time suggest heavy editing or synthesis.
  5. No microphone drift: Real talkers shift, turn, laugh or swallow. These movements change tonal balance slightly. A voice that stays nailed to one spot for long stretches is unusual.
  6. Nothing ever gets tired: Real voices sag and recover. Pitch slips, breaths deepen, lip noise appears. Many synthetic voices stay perfectly steady, as if spoken in a single moment.
  7. Emotion and pacing that do not match the body: A voice may sound angry or sad, but with no breath effort, no timing change and no recovery. Laughter that arrives fully formed with no inhale and no tail-off is a common tell. Real emotion leaves marks, while fake emotion is decoration.
  8. Silences that feel switched off: Natural silence still carries room tone. If silence feels like a sudden vacuum, something has been removed (a noise-floor check is sketched after this list).
  9. Timing that is too neat: Real speech has hesitations and small recoveries. When every phrase is perfectly fluent and evenly spaced, it may be assembled or generated.
  10. Phase and summing that do not behave physically: Natural layers interfere, creating tiny cancellations and colour shifts. If multiple sources never interact, or collapsing to mono changes nothing, the sound may not have been physically captured (a mid/side energy check is sketched after this list).
  11. Mouth and breath noises that never change: Human voices produce tiny, irregular mouth sounds. Models often avoid them entirely, or repeat the same pattern.
  12. Edits that leave no transition: Every real edit leaves a seam, such as a tone change, a breath jump or a mismatch in room tone. If a sentence blends with no acoustic transition at all, treat it with caution.
  13. Volume behaviour that stays sterile: Turn natural audio up and hidden detail appears. Turn it down and the voice sinks into the room. Many fakes stay unnaturally perfect at all levels.
  14. The yawn test: Yawning changes the shape of your ear canal. Real recordings reveal reflections and low-level detail when this happens. Many fakes have nothing underneath, so they stay flat.
  15. Talk quietly over it: With real audio, your brain separates the two sources and you can still hear the room beneath the voice. Many fakes have no spatial bed to separate from.
  16. On speakers, move your head left and right: Real recordings live in space. Even small movements change timing and reflections. With many fakes, nothing shifts.
  17. On speakers, lean forward and back: Real voices have distance. Close sounds feel more direct, and farther sounds reveal more room. Many fakes sound identical at every distance.
  18. On speakers, scratch your head or rub your fingers near your ear: A small sound near your ear competes naturally with a real recording. Your brain hears depth. With many fakes, nothing separates and the voice feels stuck inside your head.
  19. Device and format fingerprints that do not match: Phone microphones roll off lows, add light automatic gain control, and often collapse to mono. Studio microphones carry fuller bandwidth and stable noise. If a clip said to be recorded on a phone has deep bass and sparkling highs, or a studio clip sounds telephone-bandlimited, the signal path does not match the claim (a spectral rolloff sketch after this list gives a rough bandwidth figure).
  20. Power and environment cues that do not belong: Recordings often carry faint electrical and environmental clues: HVAC cycling, outdoor broadband noise, mild 50 or 60 Hz hum, and the general drift of a space that is alive. When these are present where they should not be, or absent where they should, it is worth being curious (a simple hum check is sketched after this list).
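If you want to go beyond ear-only checks, a few short Python sketches can put numbers on some of the cues above. Each assumes NumPy and a mono float signal `x` in the range -1 to 1 with sample rate `sr` already loaded; the function names and thresholds are illustrative assumptions, not established forensic tools. First, for tip 2, autocorrelating the loudness envelope is one rough way to spot a background that repeats itself exactly.

```python
import numpy as np

def loop_suspicion(x, sr, frame=2048, hop=1024):
    """Autocorrelate the short-time loudness envelope of a mono signal.

    A strong peak at a fixed non-zero lag suggests the background may
    be a copy-pasted loop rather than live, drifting ambience.
    Returns (lag in seconds, similarity from 0 to 1).
    """
    # Short-time RMS envelope
    starts = range(0, len(x) - frame, hop)
    env = np.array([np.sqrt(np.mean(x[i:i + frame] ** 2)) for i in starts])
    env -= env.mean()
    # Normalised autocorrelation, non-negative lags only
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    ac /= ac[0] + 1e-12
    # Ignore lags shorter than about one second
    min_lag = max(1, int(sr / hop))
    peak = min_lag + int(np.argmax(ac[min_lag:]))
    return peak * hop / sr, float(ac[peak])
```

A similarity close to 1.0 at some repeating lag is the numeric version of the "frozen or looped" feeling described in tip 2.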
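For tip 3, a crest factor (peak-to-RMS ratio) gives a crude handle on flattened dynamics; treat the 12 dB figure in the comment as a loose rule of thumb rather than a hard threshold.

```python
import numpy as np

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB for a float signal in [-1, 1].

    Natural, lightly processed speech tends to sit well above roughly
    12 dB; heavily compressed or limited audio sits much lower, with
    breaths and transients squashed toward a single level.
    """
    peak = np.max(np.abs(x)) + 1e-12
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    return 20 * np.log10(peak / rms)
```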
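For tip 8, measuring the quietest stretch of a clip shows whether its "silence" still carries room tone or drops toward digital zero. Actual noise floors depend on gain and gear, so the useful signal is the pattern, not any single number.

```python
import numpy as np

def quietest_frame_dbfs(x, sr, frame_ms=100):
    """RMS level of the quietest frame, in dBFS.

    Real room tone usually leaves a measurable floor; a value plunging
    toward the numerical floor (effectively digital zero) hints that
    the silence was gated out or never existed in the first place.
    """
    n = max(1, int(sr * frame_ms / 1000))
    levels = [
        20 * np.log10(np.sqrt(np.mean(x[i:i + n] ** 2)) + 1e-12)
        for i in range(0, len(x) - n, n)
    ]
    return min(levels)
```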
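For tip 10, a mid/side energy split tells you whether a stereo file carries any inter-channel difference at all; here `left` and `right` are assumed to be the two channels as float arrays of equal length.

```python
import numpy as np

def side_energy_ratio(left, right):
    """Fraction of total energy carried by the side (L-R) signal.

    Physically captured stereo almost always has some channel
    difference; a ratio of exactly 0 means the channels are identical,
    so collapsing to mono changes nothing at all.
    """
    mid = (left + right) / 2
    side = (left - right) / 2
    total = np.sum(mid ** 2) + np.sum(side ** 2) + 1e-12
    return float(np.sum(side ** 2) / total)
```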
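For tip 19, a spectral rolloff estimate gives a rough bandwidth figure to compare against the claimed signal path; the 99% energy fraction below is an arbitrary illustrative choice.

```python
import numpy as np

def spectral_rolloff_hz(x, sr, fraction=0.99):
    """Frequency below which `fraction` of the spectral energy sits.

    A clip that supposedly came down a phone line but extends to
    20 kHz, or a "studio" recording capped near 4 kHz, contradicts its
    own story about how it was captured.
    """
    mag = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    energy = np.cumsum(mag)
    idx = int(np.searchsorted(energy, fraction * energy[-1]))
    return float(freqs[min(idx, len(freqs) - 1)])
```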
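Finally, for tip 20, peeking at the spectrum around 50 and 60 Hz shows whether mains hum is present where the clip's story says it should, or should not, be. The 1 Hz window is an assumption, and bear in mind that deliberate filtering in post can remove genuine hum.

```python
import numpy as np

def mains_hum_db(x, sr):
    """Level of 50 Hz and 60 Hz spectral peaks relative to the median
    spectrum, in dB. Faint hum is normal near mains power; its total
    absence, or its presence on a clip supposedly recorded outdoors on
    batteries, is worth a second look.
    """
    # Needs at least ~1 s of audio so FFT bins resolve a 1 Hz window
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    floor = np.median(mag) + 1e-12
    levels = {}
    for f0 in (50.0, 60.0):
        band = (freqs > f0 - 1.0) & (freqs < f0 + 1.0)
        levels[f0] = 20 * np.log10(mag[band].max() / floor)
    return levels
```

None of these numbers prove anything on their own; they just put figures on cues your ears already pick up for free.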


Why these tests work

Real audio is messy by accident. Models remove variation to stay stable. When you change how you listen by talking, yawning, moving your head or adjusting volume, you force the sound to react like physics. Real recordings respond. Synthetic ones ignore you.

The futzing problem

As listeners learn these cues, deepfake makers will start adding artificial messiness in an attempt to pass. They might add fake room tone, fake microphone drift, synthetic mouth noise or artificial phase smear. Once a recording performs imperfection on purpose, it stops feeling natural. Even if the distortion is analogue, the presence of technology becomes obvious, and trust quietly falls.

  • Real sound is messy without trying.
  • Fake sound is messy on purpose.
  • Most people can hear the difference.

These checks are not proof of anything on their own, and they will not catch every example. They are a way to listen with a bit more curiosity about how recordings are made.

About Dr Iain McGregor:

Dr Iain McGregor is Associate Professor at Edinburgh Napier University, specialising in interactive media design and auditory perception. With over 30 years of experience in film, games, theatre, radio, television, and robotics, his work explores soundscapes, sonification, and human interaction. His research spans auditory displays, immersive audio, healthcare alerts, and human-robot interaction, and he holds a patent on the evaluation of auditory capabilities. Alongside his research, he mentors MSc and PhD students and collaborates with industry partners in mixed reality and robotics, helping to shape advanced auditory interfaces. Learn more about him here.