Dr Iain McGregor - programme leader for the online MSc in Sound design at Edinburgh Napier University - has some great insights, thoughts and ideas on how this can (and probably will) be done:
Written by Dr Iain McGregor, and republished here with his kind permission
Embedding computer power and connectivity into everyday devices to create an Internet of Things (IoT) can be both an attractive and worrying development for many consumers. It is enticing as the functionality of devices can be extended dramatically, so that not only can many more objects communicate with each other, but that they can also potentially evolve according to end users’ requirements.
It is concerning for both consumers and manufacturers due to hacks, and unintended eavesdropping through the essential requirement to constantly monitor a device’s activity. Any product that plugs into a mains socket could potentially connect to the internet through powerline networking, without the owner even being aware of the additional functionality.
Battery powered devices might connect wirelessly as easily as a smart phone, or even more simply using Wi-Fi pairing.
One of the most straightforward, and cost-effective methods to communicate with end users is through sound. Both Piezo and MEMS loudspeakers and microphones are relatively inexpensive. They only take up a small amount of real-estate on a device and require little power.
Sounds from a visibly hidden world
Sounds are usually interpreted about 40 milliseconds faster than simple visual cues, which is one of the main reasons for starters’ pistols still being a mainstay in athletic competitions. This additional delay of visible signals is mostly due to the cognitive workload required to process imagery. Sound also provides a much broader range of communication for the physical space available on a device than a screen of a similar size. A microphone and a loudspeaker can occupy less surface area than a screen capable of displaying only a single legible word. Speech, music and sound effects can all be seamlessly integrated, without the need for glasses or even to look at the device being operated.
This has been coupled with a trend for increasing default volume, and thereby annoyance with alert sounds that generally compete to be heard, or more often, ignored. Extensive customisability has had little effect on allowing artificially generated sounds to integrate naturally into pre-existing auditory environments, and yet still remain informative.
Designing effective sounds for the internet of things
However, the direct correlation between sounds in the physical world and the virtual world is predominantly hit and miss, although the underlying language is considerably less so. Abstract synthesised sounds, termed earcons, are often adopted when there is no natural correlation, but they have to be memorised individually.
Extensive listener testing can easily identify which sounds are most suitable for inclusion, whether they are captured from the physical world or synthesised. Two of the most useful sound design tips for improving an auditory cue’s effectiveness involve the Doppler effect and presence frequencies. The Doppler effect communicates movement: a rise in pitch indicates that something is coming towards the listener and is therefore potentially important, whereas a drop in pitch suggests that the danger has passed and that it is safe to ignore.
When pitch moves up, listeners think a sound is more important and needs to be attended to; conversely, a falling pitch suggests that the cue does not need to be attended to. An increase in the presence frequencies of around 1–4 kHz replicates what happens when a sound is directed towards a listener, whereas if these frequencies are missing, or are heard to drop off, the perception is that the sound is not intended to be heard, and that we are overhearing something that is either private or irrelevant. These frequencies are often found in alert sounds, and by pulsing them attention can be drawn more easily, especially when it comes to spatially locating the sound source.
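The pitch change behind the Doppler effect can be estimated with the standard formula for a source moving relative to a stationary listener. The following is a minimal sketch in Python; the function name, the sign convention and the 343 m/s figure for the speed of sound in air are assumptions made for illustration, not part of the original article.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: approximate speed of sound in air at about 20 °C


def doppler_shifted_frequency(source_hz: float, approach_speed_m_s: float) -> float:
    """Return the frequency heard by a stationary listener.

    A positive approach speed means the source is moving towards the
    listener (pitch rises); a negative value means it is moving away
    (pitch falls).
    """
    if abs(approach_speed_m_s) >= SPEED_OF_SOUND_M_S:
        raise ValueError("speed must be below the speed of sound")
    return source_hz * SPEED_OF_SOUND_M_S / (SPEED_OF_SOUND_M_S - approach_speed_m_s)
```

For example, a 1 kHz cue from a source approaching at 34.3 m/s would be heard at roughly 1111 Hz, and at roughly 909 Hz when receding at the same speed. A designer could use ratios of this order to shape a rising or falling pitch glide on a synthesised cue.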
Another issue to contend with is all of the other auditory cues that are often present in a shared environment. Audio watermarking is a well-established technique for tracking music plays across media, as well as identifying where leaked media content originated. This technique can easily be applied to new interface sounds so that when devices are first switched on they can listen to what already exists within an auditory environment and alter their sounds accordingly. Firstly, this can ensure that there are sufficient differences between any sounds so that perceptual confusion does not occur. More importantly, from an aesthetic approach, this means that if a user has specifically chosen a set of sounds for other devices then a complementary set can automatically be chosen for the new device, without the user even being aware of a change. As the device is connected to the internet, the sounds do not even have to be preloaded; they can be downloaded as necessary and, if desired, evolve as new devices are introduced into the environment.
If a watermark is not present then it is possible for a sample of a sound to be sent to a cloud service so that it can be identified, and its significance established. This technique is already popular for music, and has been utilised for wildlife, as well as for other applications. If it is an emergency sound, like a smoke alarm, then it could be used to turn off a device, or silence it entirely. If it is an alert sound that has been ignored then it could be reproduced by a second device to draw attention to it, or again the device could go silent. These sounds could be time aligned so that they are in phase to make them louder, or time delayed so that the sound from the closer device is slightly later giving the illusion that it is an echo. This would assist identification of the spatial location of the sound, which if there are multiple identical devices in an environment, could help the user understand where a sound was generated.
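The timing involved in making a second device sound like an echo can be sketched with simple arithmetic on travel times. The example below is an illustrative Python sketch only: it assumes the distances from the listener to each device are somehow known, and the function name, the 343 m/s speed of sound and the 20 ms default gap are hypothetical choices rather than values from the article.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: approximate speed of sound in air


def relay_playback_delay_s(
    origin_distance_m: float,
    relay_distance_m: float,
    echo_gap_s: float = 0.02,  # assumed illustrative gap between the two arrivals
) -> float:
    """Delay to apply at the relaying device so that its copy of the sound
    reaches the listener echo_gap_s after the sound from the originating device.
    """
    origin_travel_s = origin_distance_m / SPEED_OF_SOUND_M_S
    relay_travel_s = relay_distance_m / SPEED_OF_SOUND_M_S
    # If the relay is already far enough away, no extra delay is needed.
    return max(0.0, origin_travel_s + echo_gap_s - relay_travel_s)
```

Setting `echo_gap_s` to zero gives the time-aligned case, where both copies arrive together and reinforce each other instead of being heard as an echo.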
Monitoring the background level can also be used to identify when no one is in a room, and it can either ensure that the sound does not play, or that it plays through a device in a room that is occupied.
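One simple way to monitor background level is to measure the RMS level of short blocks of microphone samples and check how often it rises above a quiet-room threshold. The sketch below is a hedged Python illustration: the function names, the -50 dBFS threshold and the 20% activity fraction are assumptions chosen for the example, and a real product would need calibration and far more robust occupancy detection.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples in the range -1.0..1.0, in dB relative to full scale."""
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def room_seems_occupied(
    block_levels_dbfs: Sequence[float],
    threshold_dbfs: float = -50.0,      # assumed quiet-room threshold
    min_active_fraction: float = 0.2,   # assumed share of blocks that must be active
) -> bool:
    """Guess occupancy from how many recent blocks exceed the threshold."""
    if not block_levels_dbfs:
        return False
    active = sum(1 for level in block_levels_dbfs if level > threshold_dbfs)
    return active / len(block_levels_dbfs) >= min_active_fraction
```

A device could run `rms_dbfs` on each incoming block, keep the last few minutes of levels, and only play a cue locally when `room_seems_occupied` returns `True`, otherwise handing the cue to a device in another room.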
Use of a repeat button can provide information about the clarity of the sound, or about the meaning not being sufficiently interpreted. This could indicate that an alternative sound needs to be sourced automatically, or that the current sound needs to be either spectrally altered to make comprehension easier, or simply made louder. The skip function could be considered a silent function, implying that the user possibly finds the sound unnecessary. If this function were selected for a specific auditory cue a predetermined number of times, then the sound could be automatically replaced or disabled.
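The repeat and skip logic described above amounts to counting user actions per cue and acting once a limit is reached. The following Python sketch shows one possible shape for this; the class name, the returned action labels and the default limits of three are all hypothetical and only serve to illustrate the idea.

```python
from collections import Counter


class CueFeedbackTracker:
    """Counts repeat and skip presses per auditory cue (illustrative only)."""

    def __init__(self, skip_limit: int = 3, repeat_limit: int = 3) -> None:
        self.skip_limit = skip_limit      # assumed "predetermined number of times"
        self.repeat_limit = repeat_limit  # assumed limit before a cue is revised
        self.skips: Counter = Counter()
        self.repeats: Counter = Counter()

    def record_skip(self, cue_id: str) -> str:
        """Register a skip; suggest replacing or disabling the cue at the limit."""
        self.skips[cue_id] += 1
        if self.skips[cue_id] >= self.skip_limit:
            return "replace_or_disable"
        return "keep"

    def record_repeat(self, cue_id: str) -> str:
        """Register a repeat; suggest a clearer or louder cue at the limit."""
        self.repeats[cue_id] += 1
        if self.repeats[cue_id] >= self.repeat_limit:
            return "revise"
        return "keep"
```

Counts are kept per cue, so a user who keeps skipping one alert does not affect how other sounds on the same device are treated.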
Speech input is definitely a practical and, to a certain extent, popular form of interfacing with devices. This can be linked to any sound that is currently playing so that a form of ducking, or complete silencing, occurs in real time. This can allow natural communication between rooms, either by making everything quieter, or by relaying the speech through the device like an intercom that is time-synchronised to reinforce the speech coming directly from the source. This advanced form of baby monitor can also work in reverse if the desire is to isolate a listener from the speech of others in the shared environment, by turning up frequencies of any audible device that will mask the pre-existing speech, rendering it difficult to comprehend.
Functions which are only partly audible because they are inside a case can have their frequency range extended by the built-in loudspeaker, so that they are more acoustically transparent, and thereby much easier to interpret. Alternatively, a form of active noise cancellation can be applied so that the vibrations caused by the loudspeakers are out of phase with the sound occurring naturally, decreasing any sound being generated by the device. Some vehicles already make regular use of this technique, and it could be extended into domestic and commercial environments.
When considering what sounds to include, a sensible approach is to consider the function that each sound performs. The designer’s intention obviously has to match the listeners’ perceptions, whether it is to warn, assist, incite, monitor, reassure, guide, forgive or protect the end user. Warnings are the most obvious, and are essential for safety purposes; the common approach is to use loud, high-pitched sounds with intermittent timing. Assisting sounds provide some form of auditory reminder, incitement encourages actions, and monitoring allows status communication. Reassurance confirms that appropriate actions were performed, while forgiveness communicates that an error was made, but that it can be rectified if a renewed attempt is made. Guidance assists end users with details about what activities are possible, such as when it is safe to open a door. Protecting sounds provide auditory cues as a form of safeguard.
In a kitchen environment, the simple scanning of a QR code on a ready meal could turn on the oven and, when it has heated up to the correct temperature, provide a subtle alert that it is time to insert the meal. Once the cooking has completed, a conventional alarm could inform the intended diner, in whichever room they are located, that it is ready to consume. If the sound is ignored, a verbal offer of keeping it warm could be made, which would only require a simple ‘yes’ or ‘no’ response to effect the required temperature change in the oven. This could equally apply to any product, such as clothes in a washing machine, or even setting the correct temperature on an iron in order to prevent damage to a delicate piece of clothing.
Ultrasound could also be used for end users’ benefit, to communicate that a device is still on when it should not be, like an iron that is stationary for a pre-specified duration.
Introducing more sounds through the internet of things does not need to make the world noisier. Unlike manual devices, where the sound is often a by-product and difficult to remove, in the virtual world devices can be set to only make sounds when absolutely necessary. As an individual becomes familiar or adept with a device and operates it correctly, fewer sounds could be generated over time. Eventually only errors could be communicated audibly, such as tapping a button too often, due to a delay in functioning and a finger being in the way to prevent visual confirmation, helping to make the device ‘fat finger friendly’.
The device could also listen out for maintenance cues to know when something has gone wrong. If a photocopier is jammed then the instructions could be read out in the end user’s chosen language, in sync with the actions being performed by the person trying to fix it, as an ultrasound cue is generated as a hatch opens, or even by interpreting a naturally occurring sonic cue via an unrelated device such as a mobile phone. This could help avoid the need to constantly bob up and down while trying to interpret the animations on a small screen, or blindly try to investigate the problem. The smart phone could also drastically expand the number of user maintainable devices in an environment, as verbal instructions could be augmented by the interpretation of appropriate acoustic cues.
Sounds for safety
Just like cookies allow advertisers to target content, and personal data stored on Facebook and other forms of social media can provide recommendations, these mechanisms could be used to customise the auditory content of the internet of things.
New devices might be registered by purchasers using existing online profiles and then the audio settings utilised by users with similar profiles would be set automatically on the new device. Whenever a device is connected it could send user settings to the cloud to track long term usage, further refining the settings for future users. This could allow a much more holistic design approach where sounds are introduced as needed, based on similar users’ experiences, rather than having everything switched on to start off with.
Sound design for the internet of things can transform the acoustic space in any environment. It can be used as much to switch sounds off as to add them. Safety can be enhanced by ensuring that pre-existing alarms are clearly audible, and individuals in shared environments can either be more or less acoustically aware of each other, depending upon their preferences. Systems that actively listen out for pre-existing auditory cues can avoid masking or perceptual conflicts, and users with similar social media profiles can provide suitable initial settings. Sound events can be shared across devices with appropriate timing cues to guide listeners on where to attend, and choices can be updated according to usage patterns. Sounds might be introduced at subtler levels, and then turned up if they go unnoticed, rather than the more traditional approach of needing to be turned down or off entirely. The palette of sounds can also be much broader, as new cues can be downloaded automatically rather than stored on the device at the point of manufacture, and designers can monitor each sound’s popularity and usage in order to maximise effectiveness and influence future designs. Privacy will be the main issue to overcome, with the understandable fear of snooping, but most of these functions can be confined to a local private network with only intermittent connection to the internet.
Dr Iain McGregor is the programme leader for the online MSc in Sound design at Edinburgh Napier University. He runs the Centre for Interaction Design’s Auralisation suite, which is a dedicated 24.4 channel surround sound facility for conducting listening tests. He is currently working on a diverse range of projects, ranging from listeners’ experiences of linear and interactive media to products and environments. Find him on LinkedIn here.