Why Aren’t We Listening to Everything in 3D Already? An Overview of Current Applications of Immersive Audio / by Xiao Quan

            Immersive audio (also called 3D sound or spatial audio) has been something of a buzzword in music technology for quite some time. However, it is only in recent years, alongside surging interest in Virtual, Augmented, and Mixed Reality technologies, that the general public has gained more exposure to it. Brand names and technical terms such as ‘Dolby Atmos’, ‘360 Reality Audio’, and ‘HRTF Binaural’ are starting to appear in movie theaters, smartphone specifications, and some streaming services. Despite this increased public exposure, commercial implementations of immersive audio are still at an experimental, trial-and-error stage, and no one is certain whether it will become the new industry standard for audio. This article provides an overview of current applications of immersive audio in different fields and offers an outlook on its development potential.

            In essence, immersive audio stems from a continued attempt at reproducing spatialized sound. In contrast to stereo or multi-channel surround systems, immersive audio adds height information and improved spatial positioning of sound sources (Gerzon, 1973; Olivieri, Peters, & Sen, 2019). Currently popular immersive audio distribution formats, such as Dolby Atmos, DTS:X, Auro-3D, and MPEG-H, draw at least in part on the idea of ‘Object-Based Audio.’ An ‘object’ can roughly be understood as audio information encoded with spatialization metadata, which is decoded later, at the reproduction stage (Susal, Krauss, Tsingos, & Altman, 2016; Flanagan, 2019). This means that, in contrast to channel-based approaches, the spatial arrangement is not baked into the final product but reconstructed at the consumer’s end, depending on the reproduction configuration available. In other words, Object-Based Audio gives audio producers and engineers the opportunity to make a single ‘one-size-fits-all’ version of immersive audio for all reproduction systems, from 22.2, to stereo, to binaural for headphones. A receiver that supports the format decodes the stream and decides how to distribute the audio to the speakers based on how they are set up (Sexton, 2017). With this in mind, let’s look at some current applications of immersive audio.
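
To make the idea of an ‘object’ more concrete, here is a minimal Python sketch of one audio object being rendered to two different speaker layouts. Everything in it is an illustrative assumption rather than how Dolby Atmos, MPEG-H, or any other real renderer works: the `AudioObject` class, the `speaker_gains` function, the azimuth convention (0 degrees is straight ahead, positive angles are to the listener's left), and the simple constant-power pan between the two nearest speakers are all hypothetical simplifications, and height is ignored entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: a name plus spatial metadata."""
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, positive = left
    gain: float = 1.0


def speaker_gains(obj, speaker_azimuths_deg):
    """Return one gain per speaker for the given layout.

    Simplified rendering: find the pair of neighbouring speakers whose arc
    contains the object, then apply a constant-power pan between them.
    """
    n = len(speaker_azimuths_deg)
    if n == 1:
        return [obj.gain]
    # Visit speakers in order of increasing azimuth around the circle.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    src = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360.0
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360.0
        if span == 0.0:
            continue  # two speakers at the same angle: skip this pair
        offset = (src - start) % 360.0
        if offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = obj.gain * math.cos(frac * math.pi / 2)
            gains[b] = obj.gain * math.sin(frac * math.pi / 2)
            return gains
    return gains


# The same object, rendered for two different playback setups.
vocal = AudioObject("vocal", azimuth_deg=0.0)
print([round(g, 3) for g in speaker_gains(vocal, [30.0, -30.0])])
# prints [0.707, 0.707]  (phantom centre between left and right)
print([round(g, 3) for g in speaker_gains(vocal, [30.0, -30.0, 0.0, 110.0, -110.0])])
# prints [0.0, 0.0, 1.0, 0.0, 0.0]  (sent to the centre speaker)
```

The point of the sketch is that the producer only ever specifies "vocal, straight ahead"; the decision about which speakers carry it is deferred until the layout is known.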

            Cinema

            Perhaps the most prominent application of immersive audio is in film, where establishing a sense of place contributes greatly to the audience’s engagement (Lynch, 2016). This is also a domain where immersive sound systems can be used to their fullest potential, since cinemas drive more speakers, in acoustically treated spaces and at higher volumes, than other reproduction environments. Since the introduction of Dolby Atmos in 2012, we have seen a steady increase in the adoption of such systems in cinemas (Dolby Laboratories, 2013).

            VR

            Immersive audio is also closely tied to the development of VR, AR, and MR technologies. These fields require highly precise spatialized sound reproduction, and the advantages offered by object-based audio rendering are paramount to the success of a VR application. These include flexible manipulation of an audio object’s spatial parameters with head-tracking, increased user personalization in streamed VR content, and better control over diegetic and non-diegetic sounds in post-production with binaural processing (Susal, Krauss, Tsingos, & Altman, 2016).

            Music Streaming

            At the time of writing, music streaming services that offer immersive content are limited. The two main distribution formats are Dolby Atmos Music and Sony’s ‘360 Reality Audio’, the latter based on MPEG-H 3D Audio (Fraunhofer, 2019). The platforms that distribute this content are also few: Dolby Atmos Music is available only on Tidal on Android devices, and 360 Reality Audio on Tidal, Deezer, and nugs.net. Both services were launched in the final quarter of 2019, so it remains to be seen whether consumers will prefer the immersive formats over stereo.

            Live Broadcasting

            This is an area that has received much academic attention, particularly in conjunction with the MPEG-H 3D Audio format. Similar to how VR takes advantage of audio object metadata for head-tracking and other creative decisions, live broadcasting in MPEG-H can use this metadata to give the audience more control over their experience (Stenzel & Scuda, 2014). For example, when viewing a sports event at home, MPEG-H allows “consumers to have personalized playback options ranging from simple adjustments (such as increasing or decreasing the level of announcer’s commentary or actor’s dialogue relative to the other audio elements) to conceivable future broadcasts where several audio elements may be adjusted in level or position to tailor the audio playback experience to the user’s liking” (Herre, Hilpert, Kuntz, & Plogsties, 2015, p. 823). However, commercial implementations of such systems are currently few.
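
The ‘simple adjustments’ in that quotation can be illustrated with a short Python sketch. It is not the MPEG-H decoder API: `personalized_mix`, the element names, and the idea of clamping the listener's request to a fixed limit (standing in for whatever range a broadcaster might permit) are assumptions made here for illustration only.

```python
def db_to_linear(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def personalized_mix(elements, user_offsets_db, limit_db=6.0):
    """Mix separately delivered audio elements with per-listener level offsets.

    elements:        dict mapping element name -> list of samples (equal lengths)
    user_offsets_db: dict mapping element name -> requested level change in dB
    limit_db:        hypothetical cap on how far the listener may move a level
    """
    length = len(next(iter(elements.values())))
    mix = [0.0] * length
    for name, samples in elements.items():
        requested = user_offsets_db.get(name, 0.0)
        # Clamp the request to the permitted range before applying it.
        allowed = max(-limit_db, min(limit_db, requested))
        gain = db_to_linear(allowed)
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


# A viewer turns the commentary down; the crowd ambience is left untouched.
broadcast = {
    "commentary": [0.5, 0.5, 0.0, 0.0],
    "crowd": [0.1, 0.2, 0.2, 0.1],
}
quiet_commentary = personalized_mix(broadcast, {"commentary": -6.0})
```

Because the elements arrive as separate objects rather than a finished mix, this adjustment can happen in the living room instead of the broadcast truck.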

            Live Sound

            Immersive live sound is, at present, a niche market. The major players include d&b audiotechnik and L-Acoustics, with its L-ISA system (FOH Magazine, 2019). As in cinema, immersive audio systems in this domain take advantage of object-based audio rendering to create a more precise distribution of spatialized sound. Although the market is small, we are witnessing a steady increase in venues adopting new object-based live sound systems for more immersive concert experiences (FOH Magazine, 2019).

            Automotive

            Lastly, let’s look at cars. Commercial attempts at integrating immersive audio systems in automobiles are few. However, a study published in May 2019 experimented with running object-based audio on automotive processors (Kovačević, Kaprocki, & Popović, 2019) and concluded that no significant increase in processing power is required to run an object-based audio system in an automotive context. This suggests potential for future implementations of immersive audio systems in the automotive domain.

            Conclusion

            From the overview above, we can see that the reasons behind the current attention on immersive audio are twofold. The first is an improved listening experience for the consumer: with the additional height information and increased precision in spatial positioning, immersive audio is a more accurate representation of spatial sound. The second is the versatility it enables, for both consumers and producers, to create and consume the same content in different ways. As the range of entertainment distribution platforms grows with new technological advancements, the scalability of a distribution format becomes very important. Object-Based Immersive Audio technology provides such scalability, making it a ‘future-proof’ way to produce content. For the average consumer, though we are not listening to everything in immersive formats right now, I suspect that we will be forced to adapt to them, not so much for the listening experience as for the versatility they provide. Thus, I believe the phrase ‘Immersive Audio’ is somewhat of a misnomer. Perhaps a more accurate description for these formats would be something like ‘Ubiquitous Spatial Audio’ instead.

Citations

Dolby Laboratories. (2013, October 22). Dolby Atmos Reaches 85-Title Milestone with New Films Announced at ShowEast 2013. Retrieved from http://investor.dolby.com/news-releases/news-release-details/dolby-atmos-reaches-85-title-milestone-new-films-announced

Herre, J., Hilpert, J., Kuntz, A., & Plogsties, J. (2015). MPEG-H audio—the new standard for universal spatial/3D audio coding. Journal of the Audio Engineering Society, 62(12), 821-830.

Kovačević, J., Kaprocki, N., & Popović, A. (2019, May). Review of automotive audio technologies: immersive audio case study. In 2019 Zooming Innovation in Consumer Technologies Conference (ZINC) (pp. 98-99). IEEE.

Lynch, D. (2016). Catching the big fish. Penguin.

Multi-Channel Arrays. (2019, October 17). Retrieved from https://fohonline.com/articles/techfeature/multi-channel-arrays/

Multi-channel Arrays: "Immersive" is the New Surround – Part 2. (2019, November 13). Retrieved from https://fohonline.com/articles/tech-feature/multi-channel-arraysimmersive-is-the-newsurround-part-2/

Gerzon, M. A. (1973). Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1), 2-10.

Olivieri, F., Peters, N., & Sen, D. (2019). Scene-Based Audio and Higher Order Ambisonics: A technology overview and application to Next-Generation Audio, VR and 360 Video.

Roginska, A., & Geluso, P. (Eds.). (2017). Immersive sound: The art and science of binaural and multi-channel audio. Taylor & Francis.

Flanagan, P. (2019, June 20). 5G and MPEG-H for Ultra-Immersive Gaming and Entertainment. Retrieved March 2, 2020, from https://www.youtube.com/watch?v=Jl8zBR9YgXE

Sexton, C. (2017, October). Immersive Audio: Optimizing Creative Impact without Increasing Production Costs. In Audio Engineering Society Convention 143. Audio Engineering Society.

Sony Introduces All New "360 Reality Audio" Based on MPEG-H. (2019, January 10). Retrieved from https://www.audioblog.iis.fraunhofer.com/sony-360-reality-audio-mpegh

Stenzel, H., & Scuda, U. (2014, October). Producing interactive immersive sound for MPEG-H: A field test for sports broadcasting. In Audio Engineering Society Convention 137. Audio Engineering Society.

Susal, J., Krauss, K., Tsingos, N., & Altman, M. (2016, September). Immersive audio for VR. In Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality. Audio Engineering Society.
