Class Papers

Why Aren’t We Listening to Everything in 3D Already? An Overview of Current Applications of Immersive Audio by Xiao Quan

Immersive audio (also called 3D sound or spatial audio) has been something of a buzzword in music technology for quite some time. However, it was not until recent years, alongside the exponential growth of interest in Virtual, Augmented, and Mixed Reality technologies, that the general public gained more exposure to immersive audio. Branding terms such as ‘Dolby Atmos’, ‘360 Reality Audio’, or ‘HRTF Binaural’ are starting to appear in movie theaters, smartphone specifications, and some streaming services. Despite this increased public exposure, however, commercial implementations of immersive audio are still at an experimental, trial-and-error stage, and no one is certain whether it will become the new industry standard for audio. This article provides an overview of current applications of immersive audio in different fields and offers an outlook on its development potential.

In essence, immersive audio stems from a continued attempt to reproduce spatialized sound. In contrast to stereo or multi-channel surround systems, immersive audio adds height information and improved spatial positioning of sound sources (Gerzon, 1973; Olivieri, Peters, & Sen, 2019). Currently, popular immersive audio distribution formats, such as Dolby Atmos, DTS:X, Auro-3D, and MPEG-H, are based on the idea of ‘Object-Based Audio.’ The word ‘object’ can roughly be understood as audio information encoded with spatialization metadata, which can later be decoded at the reproduction stage (Susal, Krauss, Tsingos, & Altman, 2016; Flanagan, 2019). This implies that, in contrast to channel-based approaches, spatialization decisions are not baked into the final product but are instead reconstructed at the consumer’s end, depending on the reproduction configuration available. In other words, Object-Based Audio gives audio producers and engineers the opportunity to make a ‘one-size-fits-all’ version of immersive audio for all reproduction systems, from 22.2, to stereo, to binaural for headphones. A supported receiver decodes the stream and makes output decisions based on how the speakers are set up (Sexton, 2017). With this in mind, let’s look at some current applications of immersive audio.
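To make the contrast with channel-based delivery concrete, here is a minimal sketch in Python. All names (AudioObject, LAYOUTS, render_gains) are hypothetical, and the toy panner stands in for the far more sophisticated panning laws of real Atmos or MPEG-H renderers; the point is only that the spatial decision happens at playback time, against whatever layout is present.

```python
from dataclasses import dataclass
import math

@dataclass
class AudioObject:
    signal: list      # mono audio samples
    azimuth: float    # intended direction in radians, 0 = front, positive = left
    elevation: float  # height information in radians, positive = up

# Hypothetical speaker layouts: name -> list of (azimuth, elevation) positions.
LAYOUTS = {
    "stereo": [(math.pi / 6, 0.0), (-math.pi / 6, 0.0)],           # L, R
    "5.1": [(math.pi / 6, 0.0), (-math.pi / 6, 0.0), (0.0, 0.0),
            (2.0, 0.0), (-2.0, 0.0)],  # L, R, C, Ls, Rs (LFE omitted)
}

def render_gains(obj: AudioObject, layout: str) -> list:
    """Toy panner: weight each speaker by its angular proximity to the
    object's metadata position, then normalize. The same object renders
    to any layout without re-authoring the content."""
    gains = []
    for spk_az, spk_el in LAYOUTS[layout]:
        proximity = math.cos(obj.azimuth - spk_az) * math.cos(obj.elevation - spk_el)
        gains.append(max(proximity, 0.0))  # ignore speakers facing away
    total = sum(gains) or 1.0
    return [g / total for g in gains]
```

The same AudioObject can thus be rendered to stereo, 5.1, or any other configuration the receiver reports; only the decode step changes.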

            Cinema

Perhaps the most prominent application of immersive audio is in film, where establishing a sense of place contributes greatly to the audience’s engagement (Lynch, 2016). This is also the domain where immersive sound systems can be used to their fullest potential, since more speakers in acoustically treated spaces are driven at fuller volumes than in other reproduction environments. Since the inception of Dolby Atmos in 2012, we have seen a steady increase in the implementation of such systems in the cinematic landscape (Dolby Laboratories, 2013).

            VR

Immersive audio is also closely tied to the development of VR, AR, and MR technologies, which require highly precise spatialized sound reproduction. In addition, the advantages offered by object-based audio rendering are paramount to the success of a VR application. These include flexible manipulation of an audio object’s spatial parameters with head-tracking, increased user personalization in streamed VR content, and better control over diegetic and non-diegetic sounds in post-production with binaural processing (Susal, Krauss, Tsingos, & Altman, 2016).
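As a minimal illustration of the head-tracking point, the sketch below (hypothetical names, not tied to any particular VR SDK) counter-rotates a source position by the tracked head yaw before spatialization, which is what keeps a virtual sound anchored in the world as the listener turns.

```python
import numpy as np

def world_to_head(source_pos, head_yaw):
    """Rotate a world-space source position (x = right, y = front, meters)
    into head-relative coordinates by applying the inverse (negative) of
    the tracked head yaw. The head-relative position is what a binaural
    renderer would then spatialize with HRTFs."""
    c, s = np.cos(-head_yaw), np.sin(-head_yaw)
    rotation = np.array([[c, -s],
                         [s,  c]])
    return rotation @ np.asarray(source_pos)

# A source 2 m straight ahead; the listener turns 90 degrees to the left,
# so the source should now sit 90 degrees to the listener's right.
print(world_to_head([0.0, 2.0], np.pi / 2))  # -> approx. [2.0, 0.0]
```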

            Music Streaming

At the time of writing, music streaming services that offer immersive content are limited. The two main distribution formats are Dolby Atmos Music and Sony’s ‘360 Reality Audio’, the latter based on MPEG-H 3D (Fraunhofer, 2019). The streaming platforms that distribute this content are also limited: Dolby Atmos Music is available only on Tidal on Android devices, and 360 Reality Audio only on Tidal, Deezer, and nugs.net. Both services launched in the final quarter of 2019, so it remains to be seen whether consumers will prefer the immersive formats over stereo.

            Live Broadcasting

This is an area that has received much academic attention, particularly in conjunction with presentations of the MPEG-H 3D audio format. Similar to how VR takes advantage of audio object metadata for head-tracking and other creative decisions, live broadcasting in MPEG-H can utilize this metadata to give the audience more control over their experience (Stenzel & Scuda, 2014). For example, when viewing a sports event at home, MPEG-H allows “consumers to have personalized playback options ranging from simple adjustments (such as increasing or decreasing the level of announcer’s commentary or actor’s dialogue relative to the other audio elements) to conceivable future broadcasts where several audio elements may be adjusted in level or position to tailor the audio playback experience to the user’s liking” (Herre, Hilpert, Kuntz, & Plogsties, 2015, p. 823). However, commercial implementations of such systems are currently few.
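This ‘personalized playback’ idea reduces to applying a user-chosen gain to individual objects before the final mix. The sketch below is illustrative only, with hypothetical names; it does not represent the actual MPEG-H decoder API, and a real broadcast stream would also carry metadata bounding the allowed adjustment range.

```python
import numpy as np

def personalized_mix(objects, user_gains_db):
    """Sum object signals, applying a per-object user gain offset in dB.
    `objects` maps a label (e.g., "commentary", "crowd") to a signal array;
    `user_gains_db` maps labels to the listener's chosen offset in dB."""
    mix = None
    for label, signal in objects.items():
        gain = 10.0 ** (user_gains_db.get(label, 0.0) / 20.0)
        mix = signal * gain if mix is None else mix + signal * gain
    return mix

# Example: the listener turns the announcer down 6 dB and leaves the rest alone.
t = np.linspace(0.0, 1.0, 48000)
objects = {"commentary": 0.5 * np.sin(2 * np.pi * 440 * t),
           "crowd": 0.1 * np.random.randn(48000)}
mix = personalized_mix(objects, {"commentary": -6.0})
```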

            Live Sound

Immersive live sound is, at present, a niche market. The major players include d&b audiotechnik and L-Acoustics’ L-ISA system (FOH Magazine, 2019). As in the cinematic realm, immersive audio systems in this domain take advantage of Object-Based Audio rendering to create a more precise distribution of spatialized sound. Nevertheless, we are witnessing a steady increase in venues adopting object-based live sound systems for more immersive concert experiences (FOH Magazine, 2019).

            Automotive

Lastly, let’s look at cars. Commercial attempts at integrating immersive audio systems into automobiles are few. However, a May 2019 study experimented with running object-based audio on automotive processors (Kovačević, Kaprocki, & Popović, 2019) and concluded that no significant increase in processing power is required to run an object-based audio system in an automotive context. This suggests potential for future implementations of immersive audio systems in the automotive domain.

            Conclusion

From the overview above, we can see that the reason behind the current attention on immersive audio is two-fold. The first is an improved listening experience for the consumer: with the additional height information and increased precision in spatial positioning, immersive audio is a more accurate representation of spatial sound. The second is the versatility it enables, for both consumers and producers, to create and consume the same content in different ways. As entertainment distribution platforms multiply with new technological advancements, the scalability of a distribution format becomes very important. Object-Based Immersive Audio provides such scalability, making it a ‘future-proof’ way to produce content. For the average consumer, though we are not listening to everything in immersive formats right now, I suspect that we will be pushed to adapt to them, not so much for the listening experience as for the versatility they provide. Thus, I believe the phrase ‘Immersive Audio’ is somewhat of a misnomer; perhaps a more accurate description for these formats would be something like ‘Ubiquitous Spatial Audio’ instead.

Citations

Dolby Laboratories. (2013, October 22). Dolby Atmos reaches 85-title milestone with new films announced at ShowEast 2013. Retrieved from http://investor.dolby.com/news-releases/news-release-details/dolby-atmos-reaches-85-title-milestone-new-films-announced

Herre, J., Hilpert, J., Kuntz, A., & Plogsties, J. (2015). MPEG-H audio—the new standard for universal spatial/3D audio coding. Journal of the Audio Engineering Society, 62(12), 821-830.

Kovačević, J., Kaprocki, N., & Popović, A. (2019, May). Review of automotive audio technologies: immersive audio case study. In 2019 Zooming Innovation in Consumer Technologies Conference (ZINC) (pp. 98-99). IEEE.

Lynch, D. (2016). Catching the big fish. Penguin.

Multi-channel arrays. (2019, October 17). FOH Magazine. Retrieved from https://fohonline.com/articles/techfeature/multi-channel-arrays/

Multi-channel arrays: "Immersive" is the new surround – Part 2. (2019, November 13). FOH Magazine. Retrieved from https://fohonline.com/articles/tech-feature/multi-channel-arraysimmersive-is-the-newsurround-part-2/

Gerzon, M. A. (1973). Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1), 2-10.

Olivieri, F., Peters, N., & Sen, D. (2019). Scene-Based Audio and Higher Order Ambisonics: A technology overview and application to Next-Generation Audio, VR and 360 Video.

Roginska, A., & Geluso, P. (Eds.). (2017). Immersive sound: The art and science of binaural and multi-channel audio. Taylor & Francis.

Flanagan, P. (2019, June 20). 5G and MPEG-H for Ultra-Immersive Gaming and Entertainment. Retrieved March 2, 2020, from https://www.youtube.com/watch?v=Jl8zBR9YgXE

Sexton, C. (2017, October). Immersive Audio: Optimizing Creative Impact without Increasing Production Costs. In Audio Engineering Society Convention 143. Audio Engineering Society.

Sony Introduces All New "360 Reality Audio" Based on MPEG-H. (2019, January 10). Retrieved from https://www.audioblog.iis.fraunhofer.com/sony-360-reality-audio-mpegh

Stenzel, H., & Scuda, U. (2014, October). Producing interactive immersive sound for MPEG-H: A field test for sports broadcasting. In Audio Engineering Society Convention 137. Audio Engineering Society.

Susal, J., Krauss, K., Tsingos, N., & Altman, M. (2016, September). Immersive audio for VR. In Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality. Audio Engineering Society.

The Emergence of The Maker Community and Its Impact on Music Technology Education by Xiao Quan

“The Maker Movement” is a recent social phenomenon that has garnered widespread attention from students, institutions, and businesses. It first emerged as something of an underground counterculture movement. According to Dale Dougherty, the founder of Make magazine, “the maker movement has come about in part because of people’s need to engage passionately with objects in ways that make them more than just consumers” (Dougherty, 2012, p. 12). Indeed, the community typically consists of enthusiasts in robotics and electronics who, usually self-motivated, design and manufacture innovative technologies and objects with the help of increasingly available electronic components, digital fabrication tools, and open-source code-sharing platforms, including microcontrollers, printed circuit boards, 3D printers, CNC machines, and GitHub. Physical communal spaces that provide access to more comprehensive tools are often referred to as ‘makerspaces’ or ‘hackerspaces’, where people come together to make unconventional physical products and share ideas. ‘Makers’ can also connect via the internet, which offers a wealth of educational information and shared resources for potential projects.

The movement’s intellectual roots can be traced to Seymour Papert’s theory of constructionism, which asserts that learning happens “by constructing knowledge through the act of making something shareable” (Martinez & Stager, 2013, p. 21, as cited in Halverson & Sheridan, 2014). This is significant because it aligns the movement with the pedagogies of many progressive educational institutions. MIT, for example, created its Fab Labs in 2005 as “pedagogical environments that allow everyday people to solve their own problems by producing (rather than purchasing or outsourcing) the tools they need” (Halverson & Sheridan, 2014, p. 499). Nowadays, many prominent higher-education institutions offer a makerspace or digital fabrication facilities for students interested in rapidly prototyping ideas, typically in disciplines that involve engineering and electronics. NYU, for example, provides a makerspace at its Tandon School of Engineering, a fabrication lab at Tisch’s ITP, and a Fab Lab in the Steinhardt School of Culture, Education, and Human Development.

In music technology, especially in the field of electronic music, the idea of independently making new, innovative technologies has always been prominent. Don Buchla and Robert Moog are two prime examples: in the 1960s, driven by two distinct visions of what electronic music is and should be, they invented the Buchla music box and the keyboard-controlled synthesizer, respectively (Pinch, 2001). In the same philosophical vein as constructionist theory, the term ‘Maker Education’ was first introduced by Dougherty, who asserts that “by harnessing the power of making, Maker Education allows us to create engaging and motivating learning experiences” (Dougherty, 2012, as cited in Hughes, 2018, p. 292). Under this belief, current ‘maker’ projects within music technology include a) using microcontrollers, such as Arduino, to rapidly prototype new MIDI controllers, musical instruments, audio effect processors, and other audio-visual performance devices; b) creating wearable technologies; and c) designing for music production in virtual reality and games, among others (Hughes, 2018).

In higher education, however, many music technology programs surveyed in the US, UK, and Europe remain centered on recording and mixing, with a few exceptions, such as NYU, Berklee, and Carnegie Mellon University, which offer a more interdisciplinary approach (Hughes, 2018). Alayna Hughes is a strong proponent of integrating the maker education philosophy into higher education. She argues that such curricula could provide students with a more diverse set of skills that broadens their employment opportunities. In addition to various engineering positions in recording studios and live sound, the maker education framework enables students to be employable in areas such as digital fabrication, wearable creation, game audio design, controller design, app design, and C programming careers (Hughes, 2018). The validity of this argument, however, remains to be examined.

In my view, maker education in music technology is a powerful learning paradigm that encourages innovation at the forefront of the field, where ideas from other disciplines start to converge. However, it can also backfire on learners who lack a strong foundation in the discipline they are innovating in. I am a firm believer in knowing the rules before breaking them. In my opinion, one drawback of the maker movement, embedded in its very nature, is that makers determine the value and direction of their own creations. While this is certainly liberating and democratizing for some, for others without a solid foundation in a particular discipline, this self-directed ideal of learning can become an easy way out, as there are no pre-established hierarchies of value to validate their projects. We see examples of this more often in the artistic realm, performance in particular. Coming from an acting background and educated in a maker-education-inspired program, I have seen numerous performance projects fail to connect with their audiences for this very reason. Thus, I believe this educational paradigm is most effective only after students have gained a firm foundational understanding of a particular discipline. The same should be taken into consideration when approaching the increasingly diverse landscape of music technology education.

Citations

Dougherty, D. (2012). The maker movement. Innovations: Technology, Governance, Globalization, 7(3), 11-14.

Pinch, T. J. (2001). Why do you go to a music store to buy a synthesizer: Path dependence and the social construction of technology. In R. Garud & P. Karnøe (Eds.), Path dependence and creation (pp. 381-400). Lawrence Erlbaum Associates.

Hughes, A. (2018). Maker music: Incorporating the maker and hacker community into music technology education. Journal of Music, Technology & Education, 11(3), 287-300.

Halverson, E. R., & Sheridan, K. (2014). The maker movement in education. Harvard Educational Review, 84(4), 495-504.

Martinez, S. L., & Stager, G. S. (2013). Invent to learn: Making, tinkering, and engineering in the classroom. Constructing Modern Knowledge Press.

From Stereo to Wavefield Synthesis: A Brief Overview of Current Multi-Channel Recording and Reproduction Technologies by Xiao Quan

Since our hunter-gatherer days, we have been looking for cost-effective ways to enter alternative narrative realities through aural manipulation, whether through ritual performance in the Paleolithic painted caves of France (Reznikoff, 2008) or in the classical amphitheaters of Ancient Greece, which reduce reverberation for speech clarity (Chourmouziadou & Kang, 2008). With the advent and advancement of recording and reproduction technologies in the 19th and 20th centuries, our capacity to be ‘taken away’ by sound has improved drastically. This article briefly summarizes the development of current recording and reproduction technologies.

In 1933, stereo recording technology was patented for EMI by Alan Blumlein (Blumlein, 1933). It was an early attempt at recording and reproducing sound with more detailed spatial information than mono recordings. The patent outlines how, using two directional microphones and two loudspeakers with proper technique, we can capture and reconstruct a “near-replica of the original directional sound image” (Geluso, 2017, p. 63). From a consumer’s perspective, this means that with just one extra speaker we get a dramatic improvement in the reproduced sound experience. It is no wonder that, following the invention of the world’s first stereo headphones in 1958, the stereo format became and remains the mainstay of sound reproduction in the consumer market (Geluso, 2017).
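For a sense of how such a coincident pair behaves, the directional pickup of two figure-of-eight microphones crossed at ±45° reduces to simple cosine laws. The snippet below is an illustrative calculation under that textbook model, not a reconstruction of the patent itself.

```python
import math

def blumlein_gains(source_azimuth_deg):
    """Pickup gains of two coincident figure-8 capsules angled 45 degrees
    to either side of front. A figure-8 capsule's gain is the cosine of
    the angle between the source and the capsule's axis (a negative value
    means the rear lobe, which is opposite in polarity). Azimuth is in
    degrees, positive to the left."""
    theta = math.radians(source_azimuth_deg)
    left = math.cos(theta - math.radians(45.0))   # capsule aimed 45 deg left
    right = math.cos(theta + math.radians(45.0))  # capsule aimed 45 deg right
    return left, right

# A source 45 degrees to the left lands entirely in the left channel:
print(blumlein_gains(45.0))   # -> (1.0, ~0.0)
# A centered source is picked up equally by both:
print(blumlein_gains(0.0))    # -> (~0.707, ~0.707)
```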

Since the commercial implementation of the stereo format, the audio industry has been intent on developing more immersive recording and reproduction technologies, though it has struggled to find commercial success comparable to that of stereo. The first attempt was the ‘quadraphonic’ format, proposed by Peter Scheiber in 1968, which required four speakers placed in the four corners of the listening environment for playback (Torick, 1998). Though much content was made for this new format, it was doomed to failure by its lack of cost-effectiveness compared to stereo. Nevertheless, this failed attempt inspired many other multi-channel efforts aimed at the consumer market.

Throughout the late 1970s, Dolby Laboratories made a series of surround sound innovations in cinema sound systems, from the four-channel matrix system in 1976’s A Star Is Born to the world’s first 5.1 surround sound system in 1978’s Superman (Davis, 2003). This setup provides a much more realistic spatialized sound image than stereo, with a wider sweet spot than ‘quad’ thanks to its emphasis on a dedicated center channel (Davis, 2003). On the strength of its success in cinemas, Dolby Surround became a major player in multi-channel sound systems, and more discrete channels were subsequently added on the horizontal plane and the height axis to form 7.1, 10.2, or even 22.2 channel surround sound systems (Davis, 2003; Rumsey, 2012).

Another method of reproducing spatialized sound is the sound field approach. Unlike channel-based systems, which are speaker- and listener-oriented, “the sound field approach is based on a non-speaker-centric physical representation of the sound waves” (Nicol, 2017, p. 290). In other words, the sounds recorded by individual microphones do not correspond to discrete channels. Instead, multiple microphones are placed in a spherical arrangement, such as a tetrahedron, to capture all the sound information in a particular environment, hence the term ‘Ambisonics’. The recorded signals are then algorithmically processed to form spatial sound components, W, X, Y, and Z, which can then be converted for use in various speaker configurations, from stereo and surround to binaural (Nicol, 2017). With more capsules in an Ambisonic microphone, we can encode higher-order sound components. However, this requires an extensive encoding and decoding process to map the signals to loudspeakers.
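To make the W, X, Y, Z components concrete, here is a first-order encode and a basic projection decode in Python, using the traditional B-format equations. This is a deliberately bare sketch: normalization conventions (FuMa vs. SN3D), higher orders, and psychoacoustically optimized decoders are all ignored.

```python
import numpy as np

def encode_b_format(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format components.
    Azimuth and elevation are in radians (azimuth 0 = front, positive left)."""
    w = signal * (1.0 / np.sqrt(2.0))                 # omnidirectional
    x = signal * np.cos(azimuth) * np.cos(elevation)  # front-back figure-8
    y = signal * np.sin(azimuth) * np.cos(elevation)  # left-right figure-8
    z = signal * np.sin(elevation)                    # up-down figure-8
    return w, x, y, z

def decode_to_speaker(w, x, y, z, spk_azimuth, spk_elevation):
    """Basic projection decode: project the sound field components onto
    one loudspeaker's direction to obtain that speaker's feed."""
    return (w * (1.0 / np.sqrt(2.0))
            + x * np.cos(spk_azimuth) * np.cos(spk_elevation)
            + y * np.sin(spk_azimuth) * np.cos(spk_elevation)
            + z * np.sin(spk_elevation))

# Encode a source hard left; the feed is strongest for the speaker
# closest to the encoded direction.
w, x, y, z = encode_b_format(1.0, np.pi / 2, 0.0)
left_feed = decode_to_speaker(w, x, y, z, np.pi / 2, 0.0)   # largest
front_feed = decode_to_speaker(w, x, y, z, 0.0, 0.0)        # smaller
```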

One such technique for accurately reproducing spatialized sound components is wave field synthesis (Daniel, Moreau, & Nicol, 2003). In essence, wave field synthesis reproduction is based on the idea of using “an infinitely large number of infinitely small loudspeakers… to generate sound fields that maintain the temporal and spatial properties” of a virtual sound source (Sporer, Brandenburg, Brix, & Sladeczek, 2017, p. 320). Furthermore, it is capable of placing virtual sound sources both in front of and behind the speaker array, giving it a unique advantage over channel-based formats (Sporer, Brandenburg, Brix, & Sladeczek, 2017). The disadvantage of wave field synthesis is the high cost associated with the large number of loudspeakers needed for the system to work effectively.
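The underlying intuition can be sketched numerically: each loudspeaker in the array re-emits the source signal with a delay and attenuation corresponding to its distance from the virtual source. The code below (assumed names) conveys only that delay-and-attenuation intuition; the actual WFS driving functions derived in the literature additionally include a spectral pre-filter and amplitude tapering toward the array edges.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def wfs_delays_and_gains(source_pos, speaker_positions):
    """Per-speaker delay (s) and gain for a virtual point source behind a
    loudspeaker array: delay by the travel time from the virtual source to
    each speaker, attenuate by 1/distance (spherical spreading)."""
    distances = np.linalg.norm(speaker_positions - source_pos, axis=1)
    delays = distances / SPEED_OF_SOUND
    gains = 1.0 / np.maximum(distances, 1e-6)  # avoid division by zero
    return delays, gains

# Example: a 16-speaker linear array with 10 cm spacing along the x-axis,
# and a virtual source 2 m behind it (the listening area is in front).
speakers = np.stack([np.linspace(-0.75, 0.75, 16), np.zeros(16)], axis=1)
virtual_source = np.array([0.0, -2.0])
delays, gains = wfs_delays_and_gains(virtual_source, speakers)
```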

In conclusion, the forefront of research in sound recording and reproduction is centered on immersive formats, with various methods aiming for the optimal balance between experience, cost, and ease of use. As I wrote at the beginning of this article, human beings have been looking for ways to enter alternative narrative realities for millennia; technology is almost always a means to that end. At this stage, it seems to me that the most successful way for the average consumer to strike this balance is still stereo headphones or earbuds. Thus, I envision a future where spatialized recording and synthesis, decoded into a binaural format, will be the prevailing method of sound reproduction in the years ahead.

Citations

Blumlein, A. (1933). British Patent Specification 394,325. Reprinted in Journal of the Audio Engineering Society, 6(2), 91.

Boren, B. (2017). History of 3D Sound. In Immersive Sound (pp. 40-62). Routledge.

Chourmouziadou, K., & Kang, J. (2008). Acoustic evolution of ancient Greek and Roman theatres. Applied Acoustics, 69(6), 514–529.

Daniel, J., Moreau, S., & Nicol, R. (2003, March). Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging. In Audio Engineering Society Convention 114. Audio Engineering Society.

Davis, M. F. (2003). History of spatial coding. Journal of the Audio Engineering Society, 51(6), 554-569.

Nicol, R. (2017). Sound field. In Immersive sound (pp. 290-324). Routledge.

Reznikoff, I. (2008). Sound resonance in prehistoric times: A study of Paleolithic painted caves and rocks. Journal of the Acoustical Society of America, 123(5), 3603.

Roginska, A., & Geluso, P. (Eds.). (2017). Immersive sound: The art and science of binaural and multi-channel audio. Taylor & Francis.

Rumsey, F. (2012). Spatial audio. Routledge.

Sporer, T., Brandenburg, K., Brix, S., & Sladeczek, C. (2017). Wave Field Synthesis. In Immersive Sound (pp. 311-332). Routledge.

Torick, E. (1998). Highlights in the history of multichannel sound. Journal of the Audio Engineering Society, 46(1/2), 27-31.