What Are We Talking About When We Talk About Immersion? by Xiao Quan


Introduction 

‘Immersion’ is a term used extensively by researchers, marketers, reviewers, and consumers alike, across different domains, to convey a sense of mental engagement, typically pertaining to a narrative experience such as reading a novel or watching a film. In recent years, with the popularization of the virtual reality and new media art industries, it is also frequently used to convey the experience of sensory envelopment created by technology. In the audio world, it is the dominant term for reproduction systems that provide height information in addition to horizontal information during playback. Currently, there is no consensus on its exact definition or measurement. This generalization of the term without a comprehensive framework of analysis creates confusion for researchers and potential disappointment for consumers. As an attempt at an overarching framework to define and measure both the contributing factors and the immersive experience itself, this paper summarizes how the term is currently defined in various domains, distinguishes immersion from other concepts such as presence, transportation, envelopment, and flow, and presents a tentative framework for the discussion of immersion as outlined in Agrawal et al.’s literature review of the term (2019).

Current Definitions in the Study of Films, Games, Virtual Reality, and Acoustics

1.     Film

In studies of film viewing, ‘immersion’ is typically used as a variable to investigate the cognitive processes that occur during film viewing. Recent studies often use virtual reality (CAVE), 3D, and 2D presentations of a film as gradations of immersion (Visch et al., 2010; Rooney et al., 2012). It is therefore often defined “in terms of the sensory information the technology provides to the user”, that is, as something external to the psychology of the individual (Rooney et al., 2012, p. 409).

2.     Games

Rather than being defined as a strictly perceptual concept, immersion in game studies typically refers to a psychological experience that occurs during gameplay. It resembles the feeling of engagement, engrossment, or flow, and is typically associated with enjoyment of the gaming experience (Brown & Cairns, 2004; Thon, 2008). It has also been shown that immersion as a psychological state can be measured quantitatively, both subjectively and objectively, using questionnaires and related tasks (Jennett et al., 2008).

3.     Virtual Reality

This is the field where ‘immersion’ is frequently used to describe both the perceptual and the psychological experience of engaging in VR. It is acknowledged as a ‘multifaceted construct’ that “involves physical and mental participation and implies getting away from everyday experience, playing a different role or taking on a new identity” (Carù & Cova, 2006; Hudson et al., 2019, p. 461).

4.     Immersive Audio

In the landscape of immersive audio, immersion is almost exclusively used to describe reproduction methods such as binaural, object-based, or higher-order Ambisonic reproduction (Roginska & Geluso, 2017). The level of immersion afforded by different systems, defined as “the impression of being submerged into, enveloped or surrounded by the environment”, has been tested by Aspöck et al. (2016, p. 565).

An Extended Understanding of Immersion

From the above overview, we can see that immersion is viewed in two main ways: one refers to an objective experience provided by technology, the other to an inner psychological state of absorption. The former is exemplified by Slater’s definition:

 

“Let’s reserve the term ‘immersion’ to stand simply for what the technology delivers from an objective point of view. The more that a system delivers displays (in all sensory modalities) and tracking that preserves fidelity in relation to their equivalent real-world sensory modalities, the more that it is ‘immersive’” (Slater, 2003, p. 1).

This definition serves to distinguish ‘immersion’ from ‘presence’: Slater argues that presence is a human reaction to immersion, defined as the feeling of ‘being there’, whereas immersion should be characterized strictly objectively (Slater, 2003). Nowadays, given that immersion is used to describe a much larger array of experiences, we need a broader framework for understanding the term.

In Agrawal et al.’s review of immersion terminology, immersion is characterized not only as an “objective property of a system/technology” but also “as an individual’s psychological state”. Further, they summarize the contributing factors that can lead to this state, expanding the definitional realm of the term. The first, similar to presence, is termed the “Subjective Sense of Being Surrounded or Experiencing Multisensory Stimulation” (Agrawal et al., 2019, p. 3). While this is not strictly equated with psychological immersion, it can facilitate it. The second has to do with the narrative of the experience, if there is one. This can also be understood as “narrative immersion”, in which “players shift their attention to the narrative structure” and develop empathy towards the characters in the story (Thon, 2008, p. 38). The third is characterized as ‘Absorption when Facing Strategic and/or Tactical Challenges’ (Agrawal et al., 2019, p. 3). This is akin to Csikszentmihalyi’s theory of flow, in which one attains an optimal state of performance when the challenge of a task is adequately balanced against one’s ability to overcome it (Csikszentmihalyi, 1991).

Distinguishing Immersion from Presence, Transportation, and Flow

As concluded by Agrawal et al., immersion should be distinguished from presence, transportation, and flow in the following ways:

            Presence

Whereas immersion implies a certain level of mental involvement, presence merely suggests the perceptual experience of being in a place other than one’s current physical reality. For example, one can feel present in a laundromat while listening to a binaural recording, yet not feel immersed, owing to the relatively uneventful nature of the experience.

            Transportation

Transportation is very similar to Agrawal et al.’s definition of psychological immersion, except that it is traditionally associated with literary experiences, whereas immersion is commonly used to describe a plethora of experiences in other domains. Efforts have been made to distinguish the two, but no adequate conclusions have been reached (Van Laer et al., 2014; Agrawal et al., 2019). Further studies should analyze the correlations and distinctions between narrative transportation and psychological immersion.

            Flow

The flow experience is typically associated with an extreme, “all-or-nothing”, optimal experience (Frochot et al., 2017), whereas immersion can be measured as a graded experience, either positive or negative (Jennett et al., 2008). Immersion can thus be viewed as a subset of experiences that may contribute to flow.

A Proposed Definition of Immersion

Based on the preceding discussion, we need a definition that can account for the various interpretations of immersion in different contexts. Agreeing that it is a highly subjective experience, Agrawal et al. proposed the following definition:

 

“Immersion is a phenomenon experienced by an individual when they are in a state of deep mental involvement in which their cognitive processes (with or without sensory stimulation) cause a shift in their attentional state such that one may experience disassociation from the awareness of the physical world” (Agrawal et al., 2019, p. 5).

           

It is important to note that in this definition, immersion is a mental process and can be achieved with or without sensory stimulation. This implies that when it comes to achieving immersive experiences, the developmental focus of technology should be on creating discernible differences in perception, rather than on blindly increasing the technical specifications of a system. The potential for technology to elicit immersion is defined as “Immersive Potential.” The subject-dependent aspect of immersive experience is defined as “Immersive Tendency”; it varies according to an “individual’s predisposition to experience immersion” and can be measured through questionnaires (Agrawal et al., 2019, p. 5).

Implications for Future Research

Though an overarching framework of immersion is provided in Agrawal et al.’s research, the validity of some of its assumptions remains to be tested. Whether the experience elicited by narrative transportation, or by absorption when facing intellectual and/or technical challenges, qualifies as immersion requires additional study (Agrawal et al., 2019). More importantly, few studies have accounted for the combined influence of content, individual tendencies, and the system/technology on the experience of immersion. This framework provides the groundwork for such a study. The challenge lies in the subjective nature of the experience. Questionnaires that measure a subject’s immersive tendency, and that also account for more complex dimensions such as empathetic reactions to narrative and/or preferences for intellectual complexity, should be developed and tested in conjunction with a system’s potential to elicit immersion.

 

 

Citations

Agrawal, S., Simon, A., Bech, S., Bærentsen, K., & Forchhammer, S. (2019). Defining Immersion: Literature Review and Implications for Research on Immersive Audiovisual Experiences. In 147th AES Pro Audio International Convention. Audio Engineering Society.

 

Biocca, F., & Levy, M. R. (2013). Communication in the age of virtual reality. Routledge.

 

Brown, E., & Cairns, P. (2004, April). A grounded investigation of game immersion. In CHI'04 extended abstracts on Human factors in computing systems (pp. 1297-1300).

 

Carù, A., & Cova, B. (2006). How to facilitate immersion in a consumption experience: Appropriation operations and service elements. Journal of Consumer Behaviour: An International Research Review, 5(1), 4-14.

 

Csikszentmihalyi, M. (1991). Flow: The psychology of optimal experience. New York: HarperPerennial.

 

Frochot, I., Elliot, S., & Kreziak, D. (2017). Digging deep into the experience–flow and immersion patterns in a mountain holiday. International Journal of Culture, Tourism and Hospitality Research.

 

Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., Tijs, T., & Walton, A. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66(9), 641-661.

 

Roginska, A., & Geluso, P. (Eds.). (2017). Immersive sound: The art and science of binaural and multi-channel audio. Taylor & Francis.

 

Rooney, B., Benson, C., & Hennessy, E. (2012). The apparent reality of movies and emotional arousal: A study using physiological and self-report measures. Poetics, 40(5), 405-422.

 

Slater, M. (2003). A note on presence terminology. Presence Connect, 3(3), 1-5.

 

Thon, J. N. (2008). Immersion revisited: on the value of a contested concept. Lapland University Press.

 

Van Laer, T., De Ruyter, K., Visconti, L. M., & Wetzels, M. (2014). The extended transportation-imagery model: A meta-analysis of the antecedents and consequences of consumers' narrative transportation. Journal of Consumer Research, 40(5), 797-817.

 

Visch, V. T., Tan, E. S., & Molenaar, D. (2010). The emotional and cognitive effect of immersion in film viewing. Cognition and Emotion, 24(8), 1439-1445.

 

 

 

 

Facilitating Flow Experience in Music Education by Xiao Quan

When asked to write autobiographical stories of music-making, graduate students at Boston Conservatory often produced writings that reflected transcendent or religious themes (Bernard, 2009). They recounted experiences that felt larger than life, often outside of conscious control; their attention was so hyper-focused on the musical activity at hand that they forgot the passage of time and lost track of self-consciousness. In her doctoral dissertation, for which she interviewed 10 college-level musicians, Mary Alberici (2004) recorded similar accounts of musical experiences and described them as “rising above normal physical and mental fears and concerns to a peak experience that is remembered and sought after again and again” (2004, p. 22). Though these experiences are often described and celebrated in traditional literature as ‘a stroke of genius’ or being ‘touched by God’, and are shared by many professional musicians, few systematic academic studies have dissected the underlying factors that contribute to them, particularly concerning music education. In this essay, I will briefly summarize prominent research that has attempted to do so.

Perhaps the most well-known attempt at tackling the phenomenon is Csikszentmihalyi’s theory of flow. Driven by a desire to understand what engages people in an activity, he set out to interview people from various professions, ranging from musicians and doctors to dancers and rock climbers (Csikszentmihalyi, 1975). From these interviews, he realized that a particular state of optimal performance was reported across professions. Similar to the accounts of conservatory students, his interviewees reported full engagement, deep concentration, and elevated ability in performing a task. Csikszentmihalyi described this as the ‘flow state’, analogous to the expression ‘being in the flow’ that many people used in their interviews. He concluded that achieving flow depends on both the challenge of the task and the subject’s ability to overcome that challenge (Csikszentmihalyi, 1991).

Subsequently, the preconditions for achieving and maintaining such a balance between challenge and ability in music education have been researched extensively by Lori Custodero. She conducted longitudinal studies with four- and five-year-old children, with follow-up studies when they were eleven and twelve, to derive developmental implementations of flow indicators in young children’s music learning (Custodero, 1998, 1999, 2000b). She then used this information to investigate the music education of infants, toddlers, and early school-aged children through systematic observation, aiming to derive the social factors that contribute to the aforementioned optimal challenge-ability balance (Custodero, 2000a). The results are summarized below.

1.     The challenge component of the flow experience is greatly facilitated by the presence of an outside figure, such as a family member or adult teacher. However, such presence can also be flow-inhibiting. Only when these adults are invited, provide children with clear goals and immediate feedback, and guarantee the children’s autonomy in solving the problem does their presence become flow-inducing.

2.     Students must have autonomy in grappling with a challenge. To encourage autonomy, the sequence of studied material should be arranged according to the students’ abilities. The material should also be presented in a somewhat simplified manner to allow for students’ own expansion of knowledge during instruction. Furthermore, extended time for exploring the studied material in different contexts outside school should be designed into the educational structure.

3.     The subject must be enjoyable to the learner for flow to emerge. To elicit maximum engagement, the learner should be encouraged to participate and make suggestions for the curriculum, rather than passively receiving explanations across a broad spectrum of knowledge. Physical movement, as observed during instrument playing, also facilitated flow experiences.

Simply put, to facilitate flow in music learning, we need: a) a clear, validated goal offered by adult presence; b) individual time and space to explore and iterate on learned knowledge in various contexts; and c) artistic authenticity in the activity, relevant to the learner. This insight is valuable not only to music education but to various other disciplines as well. We can already see analogues of a) and b) in machine learning algorithms. As the objective of maximally engaging people’s attention becomes ever more pervasive among tech companies, it is important to understand, from both a consumer and a corporate perspective, how we are being engaged in activities. Future studies of flow states could be relevant to music education, emotional wellbeing, immersive experiences, and more.

 

 

  

Citations

Alberici, M. (2004). A phenomenological study of transcendent music performance in higher education (Doctoral dissertation, University of Missouri-Saint Louis).

 

Bernard, R. (2009). Music Making, Transcendence, Flow, and Music Education. International Journal of Education & the Arts, 10(14).

 

Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass.

 

Csikszentmihalyi, M. (1991). Flow: The psychology of optimal experience (Vol. 41). New York: HarperPerennial.

 

Custodero, L. A. (1998). Observing flow in young children's music learning. General Music Today, 12(1), 21-27.

 

Custodero, L. A. (1999). Construction of Musical Understandings: The Cognition-Flow Interface.

Custodero, L. A. (2000a). Engagement and experience: a model for the study of children’s musical cognition. In Proceedings of the sixth international conference on music perception and cognition. Keele, UK: Keele University Department of Psychology.

 

Custodero, L. A. (2000b). Engagement and interaction: A multiple-perspective investigation of challenge in children's music learning.

 

Custodero, L. A. (2002). Seeking challenge, finding skill: Flow experience and music education. Arts education policy review, 103(3), 3-9.

 

 

 

Networked Musical Performance: A Dance with Latency? by Xiao Quan

With the zeitgeist of economic globalization and the rapid development of broadband internet infrastructure, human communication has embraced the virtual realm. From banking, stocks, and news to food, rides, and now education, more and more information exchange happens online rather than in person. In the domain of music, however, though post-production and distribution increasingly happen online, through talent collaboration platforms such as Fiverr and streaming services such as Spotify and YouTube, live networked performances by distributed musicians have yet to be embraced by the mainstream. This paper outlines the challenges faced by both technicians and musicians when attempting to play music over a network, and summarizes important research that attempts to work around these obstacles.

The idea of networked music is simple. Like the telephone, which enables people in different locations to talk to each other in ‘real time’, networked music attempts to enable musicians in different locations to play music together, “without all the bother of buying plane tickets and the time it takes to travel” (Oliveros, 2009, p. 433). The economic advantages of such a way of music-making are straightforward and its applications all-encompassing, from distributed recording sessions, to mobile music gaming, to online karaoke rooms. In a traditional performative sense, the audience of a networked performance can experience multicultural performances within their own context at the same time (Backman, 2011). However, realizing these visions at a technological level still faces inevitable obstacles that make music performance, as we understand it in real life, almost impossible to execute over a network.

The main reason for this is network latency. Because of the time it takes for a signal to travel through various network protocols and geopolitical barriers, a typical roundtrip latency for a long-distance voice call is around 200 ms (Backman, 2011). In the past two decades, various attempts at reducing this latency have been made in commercial and academic realms alike. Since its creation in 2007 at CCRMA, Stanford University, JackTrip has been extensively used as the software for transmitting high-quality, low-latency audio for networked music performances within high-speed research networks. Yet the ‘mouth-to-ear’ latency can seldom be kept within the requirement for group ensemble performance, which is about 24 ms (Gurevich et al., 2004; Bartlette et al., 2006; Hupke et al., 2019; Tsioutas et al., 2019). As a result, many research endeavors have sought creative solutions to this limitation.
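To see why this constraint is so hard, it helps to do the arithmetic: even over a perfect network, propagation delay alone can exceed the ensemble budget at intercity distances. The sketch below uses the common approximation that light travels at roughly two-thirds of c in optical fiber (about 200,000 km/s); the distances are rough figures chosen only for illustration.

```python
# Back-of-the-envelope propagation delay for networked music performance.
# Even ideal software cannot beat the physics of signal travel time.

SPEED_IN_FIBER_KM_S = 200_000   # light in optical fiber, roughly 2/3 of c
ENSEMBLE_BUDGET_MS = 24         # the one-way tolerance cited above

def one_way_delay_ms(distance_km: float) -> float:
    """Minimum one-way propagation delay over fiber, in milliseconds."""
    return distance_km / SPEED_IN_FIBER_KM_S * 1000.0

for route, km in [("New York-Boston", 300), ("New York-London", 5_600)]:
    delay = one_way_delay_ms(km)
    verdict = "within" if delay <= ENSEMBLE_BUDGET_MS else "over"
    print(f"{route}: {delay:.1f} ms one-way ({verdict} the 24 ms budget)")
```

Transatlantic distances alone consume the entire budget before any buffering, conversion, or routing overhead is added, which is why the research below looks for ways to work with latency rather than eliminate it.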

Technically, besides creating software like JackTrip to reduce latency, numerous attempts have been made to enhance synchronization between distributed musicians. Renaud (2011, p. 1) proposed “a semi-standardized cueing framework” for networks with over 50 ms of latency. Subsequently, Alexandraki and Bader (2014) proposed a scheme utilizing computer accompaniment techniques, triggering pre-recorded solos to represent remote musicians in real time. Recently, progress has been made in creating a ‘global metronome’ to accurately account for latencies across distributed musicians (Hupke et al., 2019). This method utilizes two distributed metronomes and GPS positioning to determine the latency between participants. It then adds artificial latency to the network path to realize a ‘delayed’ ensemble synchronization between musicians.
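The core arithmetic of this ‘delayed ensemble’ idea can be illustrated with a short sketch. This is a simplified reading of the approach, not Hupke et al.’s implementation; the function name and interface are illustrative only.

```python
def artificial_delay_ms(one_way_latency_ms: float, bpm: float) -> float:
    """Delay to add locally so that a remote musician's notes land exactly
    on the next shared metronome tick instead of at an arbitrary offset."""
    beat_period_ms = 60_000.0 / bpm                    # 500 ms at 120 BPM
    remainder = one_way_latency_ms % beat_period_ms
    return (beat_period_ms - remainder) % beat_period_ms

# A 180 ms network path at 120 BPM calls for 320 ms of added delay: the
# remote part is then heard exactly one beat late, but rhythmically locked
# to the shared grid that both GPS-synchronized metronomes follow.
print(artificial_delay_ms(180.0, 120.0))   # -> 320.0
```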

From the perspective of musicians, frameworks for the Quality of Service (QoS) and Quality of Experience (QoE) of Networked Music Performance have been proposed (Colmenares et al., 2013; Tsioutas et al., 2019). Tsioutas et al. propose a new measurement concept, “Quality of Musician’s Experience” (QoME), that combines subjective and objective measures of a networked session. These measurements take into account variables ranging from room acoustics, the performer’s affective state, and the classification of the music being played, to network latency, audio quality, and network jitter, forming a holistic picture of the experience. Interestingly, in a small-scale educational setting, Iorwerth et al.’s (2015) case studies of remote music performance learning, conducted between The University of the Highlands and Islands and Glasgow Caledonian University, have shown that network latency is not the factor that most degrades QoE; rather, time management and communication turn out to be more challenging online.

In conclusion, networked music performance is a fast-growing research topic. It will likely be referenced in a wide range of applications, not limited to stage performance. As the most recent research concludes, the quality and direction of future research in networked performance will greatly depend on technological developments, with latency reduction being the end goal. There may one day be ultra-low-latency communication within the minimum requirement for ensemble synchronization. In the meantime, more attention will be paid to improving the overall Quality of Experience of these applications at the individual level, as well as to creating specific genres that maximize the potency of spectatorship of such events.

 

 

 

Citations

 

Alexandraki, C., & Bader, R. (2014, January). Using computer accompaniment to assist networked music performance. In Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society.

 

Backman, J. (2011, September). Portable and Networked Devices for Musical Creativity. In Audio Engineering Society Conference: 43rd International Conference: Audio for Wirelessly Networked Personal Devices. Audio Engineering Society.

 

Bartlette, C., Headlam, D., Bocko, M., & Velikic, G. (2006). Effect of network latency on interactive musical performance. Music Perception: An Interdisciplinary Journal, 24(1), 49-62.

 

Colmenares, J. A., Peters, N., Eads, G., Saxton, I., Jacquez, I., Kubiatowicz, J. D., & Wessel, D. (2013). A multicore operating system with QoS guarantees for network audio applications. Journal of the Audio Engineering Society, 61(4), 174-184.

 

Gurevich, M., Chafe, C., Leslie, G., & Tyan, S. (2004, November). Simulation of Networked Ensemble Performance with Varying Time Delays: Characterization of Ensemble Accuracy. In ICMC.

 

Hupke, R., Beyer, L., Nophut, M., Preihs, S., & Peissig, J. (2019, October). Effect of a Global Metronome on Ensemble Accuracy in Networked Music Performance. In Audio Engineering Society Convention 147. Audio Engineering Society.

 

Hupke, R., Sridhar, S., Genovese, A., Nophut, M., Preihs, S., Beyer, T., ... & Peissig, J. (2019, October). A Latency Measurement Method for Networked Music Performances. In Audio Engineering Society Convention 147. Audio Engineering Society.

 

Iorwerth, M., Moore, D., & Knox, D. (2015, August). Challenges of using Networked Music Performance in education. In Audio Engineering Society Conference: UK 26th Conference: Audio Education. Audio Engineering Society.

 

Oliveros, P. (2009). Networked Music: Low and High Tech. Contemporary Music Review, 28(4-5), 433-435.

 

Oliveros, P., Weaver, S., Dresser, M., Pitcher, J., Braasch, J., & Chafe, C. (2009). Telematic music: six perspectives. Leonardo Music Journal, 19(1), 95-96.

 

Renaud, A. B. (2011, November). Cueing and composing for long distance network music collaborations. In Audio Engineering Society Conference: 44th International Conference: Audio Networking. Audio Engineering Society.

 

Tsioutas, K., Doumanis, I., & Xylomenos, G. (2019, March). A framework for understanding and defining Quality of Musicians’ Experience in Network Music Performance environments. In Audio Engineering Society Convention 146. Audio Engineering Society.

 

 

 

 

What to Do with The Data? Current Methods and Applications of Music Information Retrieval by Xiao Quan

The goal of Music Information Retrieval (MIR) is quite straightforward to state: to extract meaningful information from a piece of music. However, what to do with the extracted information is not as easy to generalize in one sentence. In this essay, I will outline some popular MIR tasks and approaches pertaining to digital audio content in research, as well as some current commercial implementations.

Ranked on a scale from low to high subjectivity, some popular MIR tasks include: Tempo Estimation, Key Detection, Note Onset Detection, Beat Tracking, Melody Extraction, Chord Estimation, Structural Segmentation, Music Auto-tagging, and Mood Recognition (Choi et al., 2017). Some of these features, such as pitch, tempo, and note onsets, can be articulated logically, with a large pool of domain knowledge for identification, while others, such as Structural Segmentation, Music Auto-tagging, and Mood Recognition, can be highly subjective, with little strict logic for conventional algorithms to follow (McFee, Nieto, & Bello, 2015; Lamere, 2008). However, with the development of higher computing power and machine learning algorithms, especially deep learning, these highly subjective tasks are also becoming feasible, and even accurate, for computers to perform (Choi et al., 2017).

Whatever the task, the first step in most MIR research is to transform a one-dimensional discrete-time signal into a two-dimensional time-frequency representation, i.e., a spectrogram (Choi et al., 2017). Because many popular machine learning libraries are written in Python, a popular tool for such conversion is the ‘librosa’ library written by McFee et al. (2015). Depending on the task, different spectrograms are created for subsequent feature extraction. Common representations include the Short-Time Fourier Transform (STFT), the Mel-spectrogram, the Constant-Q Transform (CQT), and the chromagram (McFee et al., 2015). Of the four, the STFT is the fastest and most efficient to compute, but often less useful in frequency- or pitch-related tasks, as it provides a linear distribution of center frequencies. The Mel-spectrogram and CQT, however, provide better results in more subjective tasks, such as boundary detection (Ullrich, Schlüter & Grill, 2014) or learning latent features for music recommendation (Van den Oord, Dieleman & Schrauwen, 2013, as cited in Choi et al., 2017). This is because in these representations the center frequencies are spaced logarithmically to match human perception, and in the case of the CQT, to align with pitch classes. Lastly, the chromagram can be seen as an extension of the CQT, with the frequencies belonging to each pitch class folded into a single set of scale notes on the y-axis.
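As a concrete illustration, all four representations can be computed in a few lines with librosa. This is a minimal sketch; the file name is a placeholder, and the parameter values are common defaults rather than recommendations from the cited papers.

```python
import numpy as np
import librosa

# Load audio as a mono 1-D signal (librosa resamples to 22,050 Hz by default).
y, sr = librosa.load("example.wav", duration=30.0)

# 1) STFT: linearly spaced center frequencies; fast and general-purpose.
stft = np.abs(librosa.stft(y=y, n_fft=2048, hop_length=512))

# 2) Mel-spectrogram: perceptually (logarithmically) spaced frequency bins.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

# 3) Constant-Q transform: log-spaced bins aligned with musical pitch.
cqt = np.abs(librosa.cqt(y=y, sr=sr, n_bins=84, bins_per_octave=12))

# 4) Chromagram: CQT energy folded into the 12 pitch classes.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

print(stft.shape, mel.shape, cqt.shape, chroma.shape)
```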

The recent research trend for extracting features from the aforementioned spectrograms is to utilize deep learning, a form of machine learning that, instead of using strict, pre-specified logic to perform a task, learns rules from large amounts of example data to guide its behavior. This method has proven effective for tasks that are complex, subjective, and have hard-to-define ground truths, such as music auto-tagging and genre classification (Choi et al., 2017, 2018). In the past decade, the success of online music services such as Spotify, Shazam, and Tidal has increased both commercial and academic attention on MIR research and applications. Though MIR applications are extensive and interdisciplinary, ranging from music information recognition plugins, score-following, and hit-song prediction to new interfaces for music interaction and browsing (Schedl, Gómez, & Urbano, 2014), the predominant application of MIR research is playlist generation and recommendation, as it is one of the highest-level problems in MIR. The current trend is to use deep learning methods to study both music content similarities and pre-existing human-selected sequences to calculate the next track recommendation, or to realize automatic playlist continuation (Schedl, 2019).
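To make the pipeline concrete, the sketch below shows the general shape of a convolutional auto-tagger that consumes mel-spectrograms. It is a toy model written with the Keras API, not the architecture of any cited paper; the input shape and tag count are illustrative.

```python
import tensorflow as tf

# Toy convolutional auto-tagger: mel-spectrogram in, tag probabilities out.
N_MELS, N_FRAMES, N_TAGS = 128, 1292, 50   # ~30 s at sr 22050, hop 512

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_MELS, N_FRAMES, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 4)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.GlobalMaxPooling2D(),
    # Sigmoid rather than softmax: a track can carry several tags at once.
    tf.keras.layers.Dense(N_TAGS, activation="sigmoid"),
])
# Multi-label tagging is trained with per-tag binary cross-entropy; the
# (often noisy) crowd-sourced tags serve as weak ground truth.
model.compile(optimizer="adam", loss="binary_crossentropy")
```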

In conclusion, the merging of MIR and data science techniques will continue to drive future developments of MIR applications. As we step into the age of 5G connectivity, exponentially more multi-dimensional data will inevitably become available to us. How exactly this will guide future MIR research topics is hard to predict, yet it is certain that the accuracy and quality of current MIR/deep learning tasks will continue to improve with more data to learn from.

 

  

 

 

Citations

 

Choi, K., Fazekas, G., Cho, K., & Sandler, M. (2017). A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396.

 

Choi, K., Fazekas, G., Cho, K., & Sandler, M. (2017). The effects of noisy labels on deep convolutional neural networks for music classification. arXiv preprint arXiv:1706.02361.

 

Choi, K., Fazekas, G., Cho, K., & Sandler, M. (2018). The effects of noisy labels on deep convolutional neural networks for music tagging. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(2), 139-149.

 

Lamere, P. (2008). Social tagging and music information retrieval. Journal of New Music Research, 37(2), 101-114.

 

McFee, B., Nieto, O., & Bello, J. P. (2015, October). Hierarchical Evaluation of Segment Boundary Detection. In ISMIR (pp. 406-412).

 

Schedl, M. (2019). Deep Learning in Music Recommendation Systems. Frontiers in Applied Mathematics and Statistics, 5, 44.

 

Schedl, M., Gómez, E., & Urbano, J. (2014). Music information retrieval: Recent developments and applications. Foundations and Trends® in Information Retrieval, 8(2-3), 127-261.

 

Ullrich, K., Schlüter, J., & Grill, T. (2014, October). Boundary Detection in Music Structure Analysis using Convolutional Neural Networks. In ISMIR (pp. 417-422).

 

Van den Oord, A., Dieleman, S., & Schrauwen, B. (2013). Deep content-based music recommendation. In Advances in neural information processing systems (pp. 2643-2651).

 

 

Spatial Music Composition: Summary of Current Approaches by Xiao Quan

Electroacoustic music has always been at the forefront of redefining how music can be created and listened to through technology. It is also for this reason that it sits outside the realm of mainstream music listening, as not all people are interested in doing the work required to entertain such explorations. Yet ideas originating from electroacoustic music populate the current mainstream scene. For years, pop and EDM genres have been obsessed with exploring new timbres while relying on traditional harmonies and rhythms to appeal to the masses. As commercial audio technology approaches immersive sound, space as a musical parameter will inevitably be explored in how mainstream music is made. This paper summarizes such explorations in electroacoustic music as outlined in Enda Bates’ doctoral thesis at Trinity College Dublin (2009), hoping to provide insights on how space can be integrated into mainstream music production beyond ‘adding reverb’.

Unlike timbre, melody, harmony, and rhythm, space is a much broader musical parameter that relates to “dimensions of individual sound objects, the relationships between sound objects and the relationship between the sound objects and the acoustic space in which they are heard” (Bates, 2009, p. 1). It is therefore useful, for clarity, to look at space in terms of its function. One such function is “to make complexity intelligible” (Harley, 1997, p. 75). The spatial separation of different sound sources permits more intelligibility than when they are all grouped in one spot. Charles Ives, Henry Brant, and John Cage are composers whose work makes use of this approach. In Ives’s ‘The Unanswered Question’, for example, three distinct, harmonically and rhythmically dissonant musical layers of strings, woodwinds, and brass are separated spatially to facilitate the intelligibility of the piece as a whole (Bates, 2009).

Another way electroacoustic composers utilize space is through the repetition of spatial motifs. This is evident in German composer Karlheinz Stockhausen’s work Kontakte, in which “rotation, looping, alternation” are extensively exploited by recording a rotating speaker with four encircling microphones and playing the result back through a quadraphonic loudspeaker arrangement around the audience. His main goal was to achieve a “serialization of angular direction” of the sound sources (Bates, 2009, p. 212).

Towards the end of the 20th century, Denis Smalley developed what is known as ‘the theory of spectromorphology’ as an attempt to create a unified aesthetic framework for listening to electroacoustic music, with the traditional restrictions of timbre, rhythm, and harmony no longer in force (Smalley, 1997). In it, he stresses that any perceived sound can be viewed as a form of physical gesture, which he terms ‘gestural surrogacy’. The characteristics of such surrogacies can be analyzed by looking at their ‘trajectory’, which includes “onset, continuant, and termination” (Smalley, 1997), similar to an ADSR envelope. The spectromorphological profile of a given sound object can then be used as an organizing principle for its movements inside a spatialized composition (Bates, 2009). This is the foundation of what Bates calls a gestural approach to spatial composition.
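To make this less abstract, here is one hypothetical way to turn a three-phase gesture profile into a spatial trajectory: most of the angular travel happens in the onset, a slow drift fills the continuant, and the termination settles the source. Everything here, from the breakpoint values to the function name, is an illustrative reading of the idea rather than an algorithm from Smalley or Bates.

```python
import numpy as np

def gesture_azimuth(duration_s, sr=100, onset=0.2, termination=0.3,
                    start_deg=-90.0, end_deg=90.0):
    """Map an onset-continuant-termination profile onto an azimuth path.

    60% of the angular travel is packed into the onset phase, a slow
    drift covers the continuant, and the final 10% eases in during the
    termination: one possible spatial reading of the gesture."""
    t = np.linspace(0.0, 1.0, int(duration_s * sr))
    knots_t = [0.0, onset, 1.0 - termination, 1.0]   # phase boundaries
    knots_p = [0.0, 0.6, 0.9, 1.0]                   # fraction of travel
    progress = np.interp(t, knots_t, knots_p)
    return start_deg + (end_deg - start_deg) * progress

# A 4-second gesture sweeping left to right, 100 control points per
# second, ready to drive a panner's azimuth automation.
azimuth = gesture_azimuth(4.0)
```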

On the side of spatial music instruments, the augmented instrument is an approach that links a performer’s musical gesture to the spatial gesture of a given composition (Bates, 2009). An augmented instrument often consists of a preexisting musical instrument, such as a guitar or a violin, with additional hardware to enable spatial or timbral signal processing. One such device is the polyphonic pickup, which outputs an individual signal channel for each string of a string instrument. This enables the creation of spatialized compositions using a surround speaker array, or the individual signals can be processed further by software plugins into different spectromorphological profiles (Bates, 2009).

In conclusion, research on spatial music composition techniques is still relatively sparse and restricted to the realm of electroacoustics. With new object-based distribution formats such as Dolby Atmos and MPEG-H, creators are granted more freedom in spatializing sound objects. It seems from Bates’ thesis that the theory of spectromorphology is a good way of organizing spatial and musical gestures compositionally, yet how this could influence the composition of traditional musical genres, or perhaps inform the creation of a new mainstream spatial genre in the age of the immersive, is still a relatively unknown world waiting to be explored.

 

 

Citations

 

Bates, E. (2009). The Composition and Performance of Spatial Music (Doctoral dissertation, Trinity College Dublin).

 

Harley, M. A. (1997). An American in Space: Henry Brant's "Spatial Music". American Music, 70-92.

 

Reynolds, C. W. (1987, August). Flocks, herds and schools: A distributed behavioral model. In Proceedings of the 14th annual conference on Computer graphics and interactive techniques (pp. 25-34).

 

Smalley, D. (1997). Spectromorphology: explaining sound-shapes. Organised Sound, 2(2), 107-126.

HRTF For the Masses: Current Approaches and Challenges by Xiao Quan

The perception of spatial sound is an inherent part of our natural listening experience. However, spatial sound reproduction remains a challenge when it comes to commercial implementation in 2020. Binaural sound through headphones seems to be the most cost-effective way, hardware-wise, to bring the experience of virtual spatial audio to the masses at an individual level, yet obstacles remain. In this essay, I will briefly describe the factors that influence how we perceive spatial sound. I will then explain what HRTFs are and how they can be measured, modeled, or selected; and lastly, I will survey the current approaches and challenges in finding the right fit of HRTFs for the average consumer.

Three main factors influence how we locate a sound source: Interaural Time Difference (ITD), Interaural Intensity Difference (IID), and spectral shaping by the pinnae (outer ear) structures (Wenzel et al., 2017). While ITD and IID are primarily responsible for helping us locate sound on the horizontal plane, our pinnae structure helps us locate sound on the vertical plane. The ITD and IID are determined by the size of our head and upper shoulders: the width and density of the head and shoulders cause subtle time and intensity differences between the ears for sounds coming from the side compared to sounds coming from the front. These subtle interaural differences in intensity and time enable us to determine the spatial characteristics of a sound source on the horizontal plane (Middlebrooks & Green, 1991). Research has shown that for broadband sounds, ITDs carried in the signal envelope remain usable up to around 4000 Hz (Bernstein, 2001), whereas ITDs in the temporal fine structure of a sound are useful only up to about 1400 Hz (Brughera, Dunai, & Hartmann, 2013). On the vertical dimension, the material and shape of our outer ear (pinna) act as a spectral coloration filter for all incoming sounds. This coloration effect is highly direction-dependent, especially on the vertical dimension. Therefore, our brain can use this variation in frequency to determine the vertical location of a sound source (Searle et al., 1975; Wenzel et al., 2017).
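The geometry behind the ITD has a well-known closed form: the classic Woodworth spherical-head approximation, sketched below. The head radius used is a common average value, not a measurement; individual deviation from it is part of why generic HRTFs fail, as discussed later.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head approximation of the ITD:
    ITD = (a / c) * (theta + sin(theta)), where a is the head radius,
    c the speed of sound, and theta the azimuth from the median plane."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + np.sin(theta))

# A source directly to one side (90 degrees) gives roughly 0.66 ms,
# near the commonly quoted maximum ITD for an average adult head.
print(woodworth_itd(90.0) * 1e3, "ms")
```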

All of the above factors that influence how we perceive spatial sound can be measured and captured as mathematical functions, with azimuth (θ), elevation (φ), distance (d), and angular frequency (ω) as variables, known as ‘Head-Related Transfer Functions’ (HRTFs). These transfer functions can then be selected and applied to an audio signal to mimic how it would be perceived as it reaches our ears, giving it spatial characteristics (Roginska, 2017). The acoustic measurement process for HRTFs is expensive and laborious. To do it properly, one must first create an anechoic environment. Then, a speaker array covering various points on the vertical plane, equidistant from the subject, must be set up. For the horizontal plane, either the test subject or the speaker array is rotated to obtain the positional data of the test signal. Next, the subject (either a human being or a test mannequin head) has binaural microphones inserted in their ears. The subject must remain stationary while test signals are played from various points along the speaker array in the virtual sphere surrounding the subject’s head. These signals are picked up by the binaural microphones. The differences in spectral information between the recorded signal and the original signal are encoded, in alignment with the changes in the aforementioned variables such as azimuth and elevation, to form the subject’s HRTFs. The whole process takes at least an hour (Roginska, 2017).
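Once measured, applying an HRTF is conceptually simple: convolve the source signal with the left- and right-ear impulse responses (HRIRs, the time-domain form of HRTFs) for the desired direction. A minimal sketch, assuming a measured HRIR pair is already available:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair, producing a
    two-channel binaural signal for headphone playback."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Placeholder HRIRs (unit impulses, i.e., no spatial coloration) just to
# show the plumbing; real HRIRs would come from a measured database.
mono = np.random.randn(22050)            # 1 s of noise at 22.05 kHz
hrir_l = np.zeros(256); hrir_l[0] = 1.0
hrir_r = hrir_l.copy()
stereo = render_binaural(mono, hrir_l, hrir_r)   # shape: (22305, 2)
```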

In an ideal world, one laboriously measured set of HRTFs could be generalized and applied to audio signals so that everyone could enjoy spatial sound in 3D audio applications. However, studies have shown that individual variations in pinna structure contribute significantly to where the notches of spectral coloration are located (Middlebrooks & Green, 1992). Thus, a set of HRTFs measured for one person can yield vastly different sound localization performance for others. Yet it is unrealistic to create customized HRTFs for every individual consumer. Therefore, methods need to be developed that strike the optimal balance between the accuracy of HRTFs and the ease of acquiring them, if binaural sound design is to be commercially viable.

Besides individually, acoustically measured HRTFs, there are three main approaches aimed at overcoming this dilemma: 1) reconstructing HRTFs from 3D model scans of the test subject (Katz, 2001); 2) user-selected HRTFs with customized IID and ITD characteristics, based on simple measurements of head width and torso size (Algazi et al., 2001); and 3) user-selected HRTFs from a database (Roginska et al., 2010). In conclusion, HRTF selection and synthesis for the optimal balance between accuracy and cost is an ongoing area of binaural research. Various fields such as gaming and virtual reality have recently rolled out hardware support for processing 3D audio. The results of upcoming studies on this topic will directly affect how we experience reproduced sound in the future.

           

 

Citations

Algazi, V. R., Duda, R. O., Morrison, R. P., & Thompson, D. M. (2001, October). Structural composition and decomposition of HRTFs. In Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575) (pp. 103-106). IEEE.

 

Bernstein, L. R. (2001). Auditory processing of interaural timing information: new insights. Journal of Neuroscience Research, 66(6), 1035-1046.

 

Brughera, A., Dunai, L., & Hartmann, W. M. (2013). Human interaural time difference thresholds for sine tones: The high-frequency limit. The Journal of the Acoustical Society of America, 133(5), 2839-2855.

 

Katz, B. F. (2001). Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation. The Journal of the Acoustical Society of America, 110(5), 2440-2448.

 

Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual Review of Psychology, 42(1), 135-159.

 

Middlebrooks, J. C., & Green, D. M. (1992). Observations on a principal components analysis of head‐related transfer functions. The Journal of the Acoustical Society of America, 92(1), 597-599.

 

Roginska, A., Santoro, T. S., & Wakefield, G. H. (2010, November). Stimulus-dependent HRTF preference. In Audio Engineering Society Convention 129. Audio Engineering Society.

 

Roginska, A. (2017). Binaural audio through headphones. In Immersive Sound (pp. 88-123). Routledge.

 

Searle, C. L., Braida, L. D., Cuddy, D. R., & Davis, M. F. (1975). Binaural pinna disparity: another auditory localization cue. The Journal of the Acoustical Society of America, 57(2), 448-455.

 

Wenzel, E. M., Begault, D. R., & Godfroy-Cooper, M. (2017). Perception of spatial sound. In Immersive sound (pp. 5-39). Routledge.

 

 

 

Why Aren’t We Listening to Everything in 3D Already? An Overview of Current Applications of Immersive Audio by Xiao Quan

Immersive audio (or 3D sound, spatial audio) has been somewhat of a buzzword in the field of music technology for quite some time. However, it was not until recent years, alongside the surging interest in virtual, augmented, and mixed reality technologies, that the general public gained more exposure to immersive audio. Branding terminology such as ‘Dolby Atmos’, ‘360 Reality Audio’, or ‘HRTF binaural’ is starting to appear in movie theaters, smartphone specifications, and some streaming services. Despite this increased public exposure, however, commercial implementations of immersive audio are still at an experimental, trial-and-error stage. No one is certain whether it will become the new industry standard for audio. This article provides an overview of current applications of immersive audio in different fields and offers an outlook on its development potential.

In essence, immersive audio stems from a continued attempt at reproducing spatialized sound. In contrast to stereo or multi-channel surround systems, immersive audio adds height information and improved spatial positioning of sound sources (Gerzon, 1973; Olivieri, Peters, & Sen, 2019). Currently, popular immersive audio distribution formats, such as Dolby Atmos, DTS:X, Auro-3D, and MPEG-H, are based on the idea of ‘Object-Based Audio’. The word ‘object’ can roughly be understood as audio information encoded with spatialization metadata, which is decoded at the reproduction stage (Susal, Krauss, Tsingos, & Altman, 2016; Flanagan, 2019). This implies that, in contrast to channel-based approaches, spatialization arrangements are not baked into the final product, but instead reconstructed at the consumer’s end, depending on the reproduction configuration available. In other words, Object-Based Audio presents the opportunity for audio producers and engineers to make a ‘one-size-fits-all’ version of immersive audio for all reproduction systems, from 22.2, to stereo, to binaural headphones. The supported receiver decodes the content and makes output decisions based on how the speakers are set up (Sexton, 2017). With this in mind, let’s look at some current applications of immersive audio.
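A toy sketch may help make this ‘late binding’ concrete: the producer ships audio plus positional metadata, and a renderer at the consumer’s end derives speaker gains for whatever layout it finds. Real decoders such as Atmos or MPEG-H renderers use far more sophisticated panning laws; every name and formula below is illustrative only.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AudioObject:
    """An 'object': the sound itself plus metadata, with the actual
    speaker assignment deferred to playback time."""
    samples: np.ndarray      # mono PCM
    azimuth_deg: float       # intended direction, carried as metadata

def render_to_layout(obj, speaker_azimuths_deg):
    """Distribute one object to an arbitrary speaker layout, weighting
    each speaker by angular proximity to the object's intended azimuth
    (a crude cosine panner, power-normalized)."""
    az = np.radians(obj.azimuth_deg)
    spk = np.radians(np.asarray(speaker_azimuths_deg, dtype=float))
    gains = np.maximum(np.cos(spk - az), 0.0)
    norm = np.sqrt(np.sum(gains ** 2))
    gains = gains / norm if norm > 0 else gains
    return np.outer(obj.samples, gains)          # (n_samples, n_speakers)

obj = AudioObject(np.random.randn(1024), azimuth_deg=45.0)
stereo_feed = render_to_layout(obj, [-30, 30])              # stereo
quad_feed = render_to_layout(obj, [-45, 45, -135, 135])     # quad
```

The same object renders to both layouts without re-authoring, which is exactly the ‘one-size-fits-all’ property described above.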

            Cinema

Perhaps the most prominent application of immersive audio is in the domain of film, where the establishment of a sense of place contributes greatly to the audience’s engagement (Lynch, 2016). This is also the domain where immersive sound systems can be utilized to their fullest potential, as more speakers in acoustically treated spaces are engaged at fuller volumes compared to other reproduction environments. Since the inception of Dolby Atmos in 2012, we have seen a steady increase in the implementation of such systems in the cinematic landscape (Dolby Laboratories, 2013).

            VR

Immersive audio is also closely tied to the development of VR, AR, and MR technologies. In these fields, highly precise spatialized sound reproduction is required. In addition, the advantages offered by object-based audio rendering are paramount to the success of a VR application. These include flexible manipulation of an audio object’s spatial parameters with head-tracking, increased user personalization in streamed VR content, better control over diegetic and non-diegetic sounds in post-production with binaural processing, et cetera (Susal, Krauss, Tsingos, & Altman, 2016).
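A minimal sketch of what head-tracked rendering means for an object’s metadata: the object’s world position stays fixed, and the rendering direction is counter-rotated by the listener’s head yaw each frame before the binaural (HRTF) stage. The function and angle convention here are illustrative, not any engine’s API.

```python
def head_relative_azimuth(object_azimuth_deg, head_yaw_deg):
    """Counter-rotate an object's azimuth by the tracked head yaw, so
    the source stays put in the virtual world rather than turning with
    the head. Angles wrap to the range (-180, 180] degrees."""
    return (object_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# The listener turns 30 degrees to the right; a source that was dead
# ahead is now rendered 30 degrees to the left.
print(head_relative_azimuth(0.0, 30.0))   # -> -30.0
```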

            Music Streaming

At the time of writing, music streaming services that offer immersive content are limited. The two main distribution formats used are Dolby Atmos Music and Sony’s ‘360 Reality Audio’, the latter based on MPEG-H 3D (Fraunhofer, 2019). The streaming platforms that distribute this content are also limited, with Dolby Atmos Music available only through Tidal on Android devices, and 360 Reality Audio on Tidal, Deezer, and nugs.net. Both services launched in the final quarter of 2019. It therefore remains to be seen whether consumers will prefer the immersive formats over stereo.

            Live Broadcasting

This is an area that has received much academic attention, particularly in conjunction with presentations of the MPEG-H 3D audio format. Similar to how VR takes advantage of audio object metadata for head-tracking and other creative decisions, live broadcasting in MPEG-H can utilize this metadata to give the audience more control over their experience (Stenzel & Scuda, 2014). For example, when viewing a sports event at home, MPEG-H allows “consumers to have personalized playback options ranging from simple adjustments (such as increasing or decreasing the level of announcer’s commentary or actor’s dialogue relative to the other audio elements) to conceivable future broadcasts where several audio elements may be adjusted in level or position to tailor the audio playback experience to the user’s liking” (Herre, Hilpert, Kuntz, & Plogsties, 2015, p. 823). However, commercial implementations of such systems are currently few.
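The personalization idea reduces to per-element gain applied before mixdown, since each named element arrives as its own stream. A toy sketch follows; the element names and dict-based interface are illustrative, not MPEG-H syntax.

```python
import numpy as np

def personalized_mix(elements, gains_db):
    """Rescale named audio elements by user-chosen gains (in dB), then
    sum to a mixdown: a toy version of MPEG-H-style playback
    personalization."""
    mix = None
    for name, samples in elements.items():
        gain = 10.0 ** (gains_db.get(name, 0.0) / 20.0)
        mix = samples * gain if mix is None else mix + samples * gain
    return mix

# The viewer turns the announcer down 6 dB relative to the crowd.
mix = personalized_mix(
    {"commentary": np.random.randn(1024), "crowd": np.random.randn(1024)},
    gains_db={"commentary": -6.0},
)
```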

            Live Sound

Immersive live sound is, at present, a niche market. The major players include d&b audiotechnik and L-Acoustics, with its L-ISA system (FOH Magazine, 2019). As in the cinematic realm, immersive audio systems in this domain take advantage of object-based audio rendering to create a more precise distribution of spatialized sound, and we are witnessing a steady increase in venues adopting new object-based live sound systems for more immersive concert experiences (FOH Magazine, 2019).

            Automotive

Lastly, let’s look at cars. Commercial attempts at integrating immersive audio systems into automobiles are few. However, a study experimenting with running object-based audio on automotive processors was published in May 2019 (Kovačević, Kaprocki, & Popović), concluding that no significant increase in processing power is required to run an object-based audio system in an automotive context. This suggests potential for future implementations of immersive audio systems in the automotive domain.

            Conclusion

From the overview above, we can see that the reasons behind the current attention on immersive audio are two-fold. The first is an improved listening experience for the consumer: with the additional height information and increased precision in spatial positioning, immersive audio is a more accurate representation of spatial sound. The second is the versatility it enables, for both consumers and producers, to create and consume the same content in different ways. As the possibilities of entertainment distribution platforms increase with new technological advancements, the scalability of a distribution format becomes very important. Object-Based Immersive Audio technology provides such scalability, making it a ‘future-proof’ way to produce content. Though the average consumer is not listening to everything in immersive formats right now, I suspect that we will be forced to adapt to them, not so much for the listening experience as for the versatility they provide. Thus, I believe the phrase ‘Immersive Audio’ is somewhat of a misnomer. Perhaps a more accurate description for these formats would be something like ‘Ubiquitous Spatial Audio’ instead.

Citations

Dolby Atmos Reaches 85-Title Milestone with New Films Announced at ShowEast 2013. (2013, October 22). Retrieved from http://investor.dolby.com/news-releases/news-release-details/dolby-atmos-reaches-85-title-milestone-new-films-announced

Herre, J., Hilpert, J., Kuntz, A., & Plogsties, J. (2015). MPEG-H audio—the new standard for universal spatial/3D audio coding. Journal of the Audio Engineering Society, 62(12), 821-830.

Kovačević, J., Kaprocki, N., & Popović, A. (2019, May). Review of automotive audio technologies: immersive audio case study. In 2019 Zooming Innovation in Consumer Technologies Conference (ZINC) (pp. 98-99). IEEE.

Lynch, D. (2016). Catching the big fish. Penguin.

Multi-Channel Arrays. (2019, October 17). Retrieved from https://fohonline.com/articles/techfeature/multi-channel-arrays/

Multi-channel Arrays: "Immersive" is the New Surround – Part 2. (2019, November 13). Retrieved from https://fohonline.com/articles/tech-feature/multi-channel-arraysimmersive-is-the-newsurround-part-2/

Gerzon, M. A. (1973). Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1), 2-10.

Olivieri, F., Peters, N., & Sen, D. (2019). Scene-Based Audio and Higher Order Ambisonics: A technology overview and application to Next-Generation Audio, VR and 360 Video.

Roginska, A., & Geluso, P. (Eds.). (2017). Immersive sound: The art and science of binaural and multi-channel audio. Taylor & Francis.

Flanagan, P. (2019, June 20). 5G and MPEG-H for Ultra-Immersive Gaming and Entertainment. Retrieved March 2, 2020, from https://www.youtube.com/watch?v=Jl8zBR9YgXE

Sexton, C. (2017, October). Immersive Audio: Optimizing Creative Impact without Increasing Production Costs. In Audio Engineering Society Convention 143. Audio Engineering Society.

Sony Introduces All New "360 Reality Audio" Based on MPEG-H. (2019, January 10). Retrieved from https://www.audioblog.iis.fraunhofer.com/sony-360-reality-audio-mpegh

Stenzel, H., & Scuda, U. (2014, October). Producing interactive immersive sound for MPEG-H: A field test for sports broadcasting. In Audio Engineering Society Convention 137. Audio Engineering Society.

Susal, J., Krauss, K., Tsingos, N., & Altman, M. (2016, September). Immersive audio for VR. In Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality. Audio Engineering Society.


The Emergence of The Maker Community and Its Impact on Music Technology Education by Xiao Quan

“The Maker Movement” is a recent social phenomenon that has garnered widespread attention from students, institutions, and businesses. It first emerged somewhat as an underground counter-culture movement. According to Dale Dougherty, the founder of Make magazine, “the maker movement has come about in part because of people’s need to engage passionately with objects in ways that make them more than just consumers” (Dougherty, 2012, p. 12). Indeed, the community typically consists of enthusiasts in robotics and electronics who, usually self-motivated, design and manufacture innovative technologies and objects with the help of increasingly available electronic components, digital fabrication tools, and open-source code-sharing platforms. These include microcontrollers, PCBs, 3D printers, CNC machines, and GitHub. Physical communal spaces that provide access to more comprehensive tools are often referred to as ‘makerspaces’ or ‘hackerspaces’, where people come together to make unconventional physical products and share ideas. ‘Makers’ can also connect via the internet, which offers a vibrant swarm of educational information and shared resources for potential projects.

The movement’s emergence can be traced to Seymour Papert’s theory of constructionism, which asserts that learning is done “by constructing knowledge through the act of making something shareable” (Martinez & Stager, 2013, p. 21, as cited in Halverson & Sheridan, 2014). This is significant because it aligns the movement with the pedagogies of many progressive educational institutions. MIT, for example, created the FabLabs in 2005 as “pedagogical environments that allow everyday people to solve their own problems by producing (rather than purchasing or outsourcing) the tools they need” (Halverson & Sheridan, 2014, p. 499). Nowadays, many prominent higher education institutions offer a makerspace or digital fabrication facilities for students interested in rapidly prototyping ideas, typically in disciplines that involve engineering and electronics. NYU, for example, provides a makerspace at its Tandon School of Engineering, a fabrication lab at Tisch’s ITP, as well as a Fab Lab in the Steinhardt School of Culture, Education, and Human Development.

In music technology, especially in the field of electronic music, the idea of independently making new, innovative technologies has always been prominent. Don Buchla and Robert Moog are two prime examples: in the ’60s, driven by two distinct visions of what electronic music is and should be, they invented the Buchla music box and the keyboard-controlled synthesizer, respectively (Garud & Karnoe, 2013). In the same philosophical vein as constructionist theory, the term ‘Maker Education’ was first introduced by Dougherty, who asserts that “by harnessing the power of making, Maker Education allows us to create engaging and motivating learning experiences” (Dougherty, 2012, as cited in Hughes, 2018, p. 292). Under this belief, current ‘maker’ projects within music technology include a) using microcontrollers, such as the Arduino, to rapidly prototype new MIDI controllers, musical instruments, audio effect processors, and other audio-visual performance devices; b) creating wearable technologies; and c) designing for music production in virtual reality and games, among others (Hughes, 2018).

In the realm of higher education, however, many music technology programs surveyed in the US, UK, and Europe remain centered on recording and mixing, with few exceptions such as NYU, Berklee, and Carnegie Mellon University, which offer a more interdisciplinary approach (Hughes, 2018). Alayna Hughes is a strong proponent of integrating the maker education philosophy into higher education. She argues that such curricula could provide students with a more diverse set of skills that broaden their employment opportunities. In addition to various engineering positions in recording studios and live sound, the maker education framework enables students to be employable in areas such as digital fabrication, wearable creation, game audio design, controller design, app design, C programming careers, et cetera (Hughes, 2018). The validity of this argument, however, remains to be examined.

In my view, the idea of maker education in music technology is a powerful learning paradigm that encourages innovation at the forefront of the field, where ideas from other disciplines start to converge. However, it can also backfire on learners if they don’t have a strong foundation in the discipline they are innovating in. I am a firm believer in knowing the rules before breaking them. In my opinion, one drawback of the maker movement, embedded in its very nature, is that makers themselves create the value and direction of their creations. While this is certainly liberating and democratizing for some, for others without a solid foundation in a particular discipline, this self-deterministic ideal of learning can become an easy way out, as there are no pre-established hierarchies of value to validate their projects. We see examples of this most often in the artistic realm, performance to be exact. Coming from an acting background educated in a maker-education-inspired program, I have seen numerous performance projects fail to connect with their audiences for this particular reason. Thus, I believe this educational paradigm can be most effective only after students have gained a firm foundational understanding of a particular discipline. The same should be taken into consideration when approaching the increasingly diverse landscape of music technology education.

 

 

 

Citations

Dougherty, D. (2012). The maker movement. Innovations: Technology, Governance, Globalization, 7(3), 11-14.

Pinch, T. J. (2001). Why do you go to a music store to buy a synthesizer: path dependence and the social construction of technology. Path dependence and creation, 381-400.

Hughes, A. (2018). Maker music: Incorporating the maker and hacker community into music technology education. Journal of Music, Technology & Education, 11(3), 287-300.

Halverson, E. R., & Sheridan, K. (2014). The maker movement in education. Harvard Educational Review, 84(4), 495-504.

Martinez, S. L., & Stager, G. S. (2013). Invent to learn: Making, tinkering, and engineering in the classroom.