The idea is to adjust audio objects (I like to think of audio objects as different audio layers – ambient noise, dialog, music, etc. – that can be adjusted in both volume and speaker location) to give each listener the best possible experience, given their circumstances. As Amazon says on the website for its new phone:
“Fire phone uses the power of Dolby Digital Plus to create an immersive audio experience. Dolby Digital Plus auto-adjusts volume, creates virtual surround sound, and delivers easier-to-understand dialogue in movies and TV shows. Fire phone is designed to automatically optimize the audio profile based on what you’re doing, such as watching a movie or listening to music.”
The sensors on the new phone let Amazon understand the environment the user is in (e.g. whether it is noisy, quiet, light, or dark). The multiple cameras allow the creation of a 3D profile of a person's head and can detect where their ears are relative to the speakers. The system also optimizes for the type and brand of speakers (headphones, speakers from a TV or other external device, the phone's own speakers, etc.).
A practical reason for improving relative audio quality is to reduce the trouble calls Amazon receives from customers about its streaming content. By dynamically raising dialog relative to background effects and music, they aim to solve problems before they occur.
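The dialog-boosting idea can be sketched in a few lines. This is a hypothetical illustration, not Amazon's or Dolby's implementation: each object (dialog, music, effects) carries its own samples and a per-object gain in dB, so the device can remix them at playback time.

```python
# Hypothetical sketch of object-based mixing. Object names and gains are
# illustrative assumptions, not from any Dolby or Amazon specification.

def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def mix_objects(objects, gains_db):
    """Sum audio objects sample by sample after applying per-object gain.

    objects:  dict of name -> list of float samples (all the same length)
    gains_db: dict of name -> gain in dB (missing names default to 0 dB)
    """
    length = len(next(iter(objects.values())))
    mixed = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_linear(gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed

# Boost dialog by 6 dB relative to the music bed, with no production remix:
objects = {
    "dialog": [0.1, 0.2, 0.1],
    "music": [0.05, 0.05, 0.05],
}
louder_dialog = mix_objects(objects, {"dialog": 6.0})
```

Because the gains are applied at the consumer device, the same transmitted objects can yield a dialog-forward mix for a noisy room and a flat mix for a quiet one.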
Hilmes described a system that also adjusts to the user's hearing ability; essentially a sophisticated equalizer that compensates for hearing loss. He explained how this audio signature could live in the cloud, so that a person's hearing profile could follow them from device to device. Although he didn't mention any commercial designs for the metadata inherent in such a signature, it is not hard to imagine the value of hearing-loss metadata to Amazon. This approach also seems valuable for preventing hearing loss (e.g. a profile that parents could set to limit music volume for their kids).
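One way to picture such a cloud-portable hearing profile is as a small JSON document of per-band gains that any device can fetch and turn into equalizer settings. The field names, band edges, and gain values below are my own assumptions for illustration, not any published format.

```python
import json

# Hypothetical hearing profile: per-frequency-band gains in dB, plus an
# output cap a parent might set. All names and values are illustrative.
profile_json = json.dumps({
    "user": "example",
    "max_output_db": 85,  # e.g. a parental volume limit
    "band_gains_db": {"250": 0, "1000": 3, "4000": 9, "8000": 12},
})

def band_multipliers(profile_json):
    """Turn a stored hearing profile into linear per-band multipliers."""
    profile = json.loads(profile_json)
    return {
        int(band): 10 ** (gain_db / 20)
        for band, gain_db in profile["band_gains_db"].items()
    }

multipliers = band_multipliers(profile_json)
# An equalizer stage would then scale each band by its multiplier before
# playback, compensating here for high-frequency hearing loss.
```

Because the profile is just data, any device that can parse it could apply the same correction, which is what makes the follow-you-from-device-to-device idea plausible.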
From a content-provider perspective, transmitting audio objects effectively means mixing the audio at the consumer device instead of at production time. Roger Charlesworth of Charlesworth Media, Executive Director of the DTV Audio Group, suggested that, long term, an audio-object approach promises to streamline production for the content producer. He cited metadata as critical, and the biggest impediment to adoption as cultural, since it means changing the live production workflow. He suggested the transition to IP for the audio workflow will be a big reason for the eventual success of audio objects.
From a camera-to-consumer infrastructure perspective, Sripal Mehta of Dolby Labs described an approach that uses the existing Dolby 5.1 or 7.1 infrastructure with what he termed “bolt-on additions.” Dolby’s market research indicates that viewers don’t want to be sound-mixing technicians; they want pre-selected presentations that they can set and forget. As an example, he showed a video of a Red Bull-sponsored hockey game that offered viewers different perspectives and let them decide which presentation they wanted to see and hear. These perspectives included:
- A German announcer
- An American announcer
- A biased (fan) announcer – step aside Hawk Harrelson
- One with a British comedian who didn’t know the game, or care to, in an almost Mystery Science Theater way (quite funny)
To this last point, audio objects offer the potential to create both better-quality content and content that is compelling to audiences who would otherwise not be interested (e.g. the British comedian made the game entertaining to non-hockey fans).
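The hockey example boils down to selecting among tagged objects at playback. As a minimal sketch, assuming each object carries role and language metadata (the tag names below are my invention, not a broadcast standard), the player simply keeps the common bed and the one chosen commentary track:

```python
# Hypothetical presentation selection: keep all non-commentary objects and
# the single commentary track the viewer picked. Tags are illustrative.

OBJECTS = [
    {"name": "crowd",            "role": "ambience",   "language": None},
    {"name": "german_announcer", "role": "commentary", "language": "de"},
    {"name": "us_announcer",     "role": "commentary", "language": "en"},
    {"name": "fan_announcer",    "role": "commentary", "language": "en"},
    {"name": "comedian",         "role": "commentary", "language": "en"},
]

def select_presentation(objects, commentary_name):
    """Return the object names to mix for the viewer's chosen presentation."""
    return [
        obj["name"] for obj in objects
        if obj["role"] != "commentary" or obj["name"] == commentary_name
    ]

# A non-hockey fan might pick the comedian's track:
chosen = select_presentation(OBJECTS, "comedian")  # ['crowd', 'comedian']
```

The point is that all four commentaries ride in one stream; the "set and forget" presentation is just a filter over object metadata, not a separate broadcast.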