IM2 Interactive Multimodal Information Management
The messages flowing through the streams are time stamped at origin with an originating time that is carried downstream through the application pipeline. Consider, for instance, a simple pipeline that performs face tracking: video frames are captured by a camera component and sent to a component that converts the images to grayscale and then to a tracking component that produces face tracking results.
The video frames emitted by a camera component are time stamped with an originating time that corresponds to the moment the frame was captured. As the image is passed along to a grayscale component and to the face tracker, the same originating time is carried along with the resulting messages. In addition, each message carries the time when it was created. This mechanism gives all components in the pipeline access to information about latencies with respect to the real world for all messages.
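The timestamping scheme in the face tracking pipeline above can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's actual API: the `Message` class, the `derive` helper and the example times are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Message:
    """Hypothetical message envelope (not the platform's real type)."""
    data: Any
    originating_time: float  # seconds: when the underlying event happened in the world
    creation_time: float     # seconds: when this particular message was produced

    @property
    def latency(self) -> float:
        # Delay of this message with respect to the real-world event.
        return self.creation_time - self.originating_time


def derive(msg: Message, transform: Callable[[Any], Any], now: float) -> Message:
    """Apply a component's transform, keep the originating time, stamp a new creation time."""
    return Message(transform(msg.data), msg.originating_time, now)


# Camera -> grayscale -> face tracker, with made-up clock readings.
frame = Message(data="rgb-frame", originating_time=10.000, creation_time=10.002)
gray = derive(frame, lambda d: "gray(" + d + ")", now=10.010)
faces = derive(gray, lambda d: "faces(" + d + ")", now=10.045)
```

Every message in the chain reports the same originating time, so any downstream component can compute how stale its input is from `latency` alone.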
Furthermore, these timestamps enable efficient scheduling as well as correct and reproducible synchronization. The temporal nature of the streams enables a time-algebra and set of synchronization primitives that simplify development. As a concrete example, suppose we want to correlate the face tracking results from the pipeline described above with sound source localization information to determine which one of multiple people present in front of a robot is talking.
Because the speech source identification component has access to the originating times of the messages on both incoming streams, it can pair and synchronize them, according to when the events actually happened in the world rather than according to when they arrived at the component. The runtime provides a stream join operator that enables reproducible synchronization, freeing up the developer from having to think through the intricacies of temporal reasoning. Other time-related primitives, like sampling and interpolation, are also available.
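As a rough illustration of pairing by originating time, the sketch below joins two time-sorted streams by picking, for each message on the first stream, the message on the second whose originating time is nearest within a tolerance. The function names, the tolerance value and the sample data are assumptions made for the example; the runtime's own join operator is not shown here and may behave differently.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Stamped = Tuple[float, str]  # (originating_time_seconds, payload)


def nearest(stream: List[Stamped], t: float, tolerance: float) -> Optional[Stamped]:
    """Return the message in a time-sorted stream closest to t, or None if none is within tolerance."""
    times = [ts for ts, _ in stream]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = stream[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda m: abs(m[0] - t), default=None)
    if best is None or abs(best[0] - t) > tolerance:
        return None
    return best


def join(primary: List[Stamped], secondary: List[Stamped], tolerance: float):
    """Pair each primary message with the secondary message nearest in originating time."""
    pairs = []
    for t, payload in primary:
        match = nearest(secondary, t, tolerance)
        if match is not None:
            pairs.append((t, payload, match[1]))
    return pairs


faces = [(0.00, "face@left"), (0.10, "face@right"), (0.20, "face@right")]
sounds = [(0.01, "sound@left"), (0.12, "sound@right"), (0.50, "sound@left")]
pairs = join(faces, sounds, tolerance=0.03)
```

Because the pairing depends only on originating times, the result is the same no matter in which order the messages reached the component, which is what makes the synchronization reproducible.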
The runtime implements a scheduler that controls the execution of the various components in the pipeline by paying attention to the originating times of the messages arriving at the components and giving priority to the oldest ones (that is, those with the earliest originating times). The developer has control over how messages flow through the streams via delivery policies that specify where and when it is acceptable to drop messages, or that describe throttling behaviors.
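A minimal sketch of these two ideas, assuming a single-threaded toy scheduler: work items are kept in a heap ordered by originating time so the oldest is always processed first, and a bounded queue stands in for a "keep only the latest messages" delivery policy. The `Scheduler` class and `latest_only_queue` helper are hypothetical and are not part of the runtime.

```python
import heapq
from collections import deque
from itertools import count


class Scheduler:
    """Toy scheduler: always runs the work item with the earliest originating time."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # preserves insertion order among equal originating times

    def schedule(self, originating_time, component, payload):
        heapq.heappush(self._heap, (originating_time, next(self._tie), component, payload))

    def run(self):
        """Drain the heap and return the order in which items would be executed."""
        order = []
        while self._heap:
            t, _, component, payload = heapq.heappop(self._heap)
            order.append((t, component, payload))
        return order


def latest_only_queue(max_size=1):
    """Bounded receiver queue: appending to a full queue silently drops the oldest message."""
    return deque(maxlen=max_size)


scheduler = Scheduler()
scheduler.schedule(0.2, "tracker", "frame-2")
scheduler.schedule(0.1, "grayscale", "frame-1")
scheduler.schedule(0.1, "tracker", "frame-0")
execution_order = scheduler.run()

queue = latest_only_queue(max_size=2)
for message in ("m1", "m2", "m3"):
    queue.append(message)
```

The bounded queue trades completeness for freshness: a slow component skips stale messages rather than falling further and further behind.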
The programming model implemented by the runtime allows for developing components as if they were single-threaded. At runtime, it couples them via a streaming message-passing system that allows for concurrent execution of components while providing isolation and protecting state. This approach is made possible by an automatic deep cloning subsystem and frees the component developer from having to think through the intricacies of concurrent execution, simplifying development efforts. Typically, an initial prototype is constructed and deployed and components are iteratively refined and tuned based on the data collected with the running system.
Data and experimentation play a central role in this process. The runtime enables automatic persistence of the data flowing through the streams. The persistence mechanism is optimized for throughput and allows a developer to log in a unified manner all relevant data flowing through the application. Furthermore, because timing information is also persisted, data can be replayed from a store in a variety of ways, enabling experimentation scenarios. For instance, in the example described above, once the video and audio streams have been captured, the developer can easily re-run the application opening these streams from a store rather than from the sensors, enabling the exploration of how tuning various parameters in the downstream components (face tracking, sound source localization and so on) might change the final results.
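The replay-and-tune workflow can be illustrated with a small sketch. It assumes a store that is simply a text buffer of JSON lines and a made-up downstream component (`detect_loud`) with one tunable threshold; the real persistence format and APIs are not described in this text and are certainly more elaborate.

```python
import io
import json


def persist(store, stream_name, originating_time, value):
    """Append one message, with its originating time, to a JSON-lines store."""
    record = {"stream": stream_name, "t": originating_time, "value": value}
    store.write(json.dumps(record) + "\n")


def replay(store, stream_name):
    """Yield (originating_time, value) pairs for one stream, in the order they were logged."""
    store.seek(0)
    for line in store:
        record = json.loads(line)
        if record["stream"] == stream_name:
            yield record["t"], record["value"]


def detect_loud(samples, threshold):
    """Hypothetical downstream component: times at which the audio level reaches a threshold."""
    return [t for t, level in samples if level >= threshold]


# Capture once (here from a hard-coded list instead of a sensor) ...
store = io.StringIO()
for t, level in [(0.0, 0.1), (0.1, 0.6), (0.2, 0.9)]:
    persist(store, "audio_level", t, level)

# ... then re-run the downstream component from the store with different parameters.
strict = detect_loud(replay(store, "audio_level"), threshold=0.8)
lenient = detect_loud(replay(store, "audio_level"), threshold=0.5)
```

Both runs see exactly the same input and timing, so any difference in the output is attributable to the parameter change alone.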
APIs and mechanisms for coupling multiple stores in larger datasets and operating over entire datasets are also available. The runtime enables parallel, coordinated computation in a single process or in a distributed fashion across multiple processes. In addition to the set of primitives and the core programming and execution model provided by the runtime, a number of specialized tools and APIs are available, further enabling and supporting the development process.
The framework includes a sophisticated visualization tool for temporal, multimodal data: Platform for Situated Intelligence Studio. The tool allows for inspecting and visualizing the various streams persisted by a Platform for Situated Intelligence application. Multiple visualizers are available: timeline visualizers show various types of data over time (for example, numerical, audio, speech recognition results and so on); 2D and 3D instant visualizers can show the data corresponding to a certain time-point (for example, images from a video stream).
The visualizers can be composited and overlaid in a variety of ways; for instance, the second panel in the video above shows an image stream visualizer overlaid with a visualizer for face tracking results. The tool enables temporal navigation with panning and zooming over time as well as working with datasets that encompass stores persisted from multiple runs of a system.
It enables both offline and live visualization; in live-mode, the tool can be invoked from or can connect to a running Platform for Situated Intelligence application and enables visualization of the live streams flowing through the application. Supported by the data replay abilities in the runtime, APIs are provided that enable developers to define datasets that wrap the data obtained from multiple runs of an application and process and analyze this data offline, generating relevant statistics, or running batch experiments.
The tools and APIs will include support for creating and manipulating datasets based on data logged by applications written using the platform, data annotation capabilities that are tightly integrated with visualization, support for feature engineering and development, integration for training with various ML frameworks and services, in-app evaluation and model deployment.
Ultimately, we believe lowering the barrier to entry for development of multimodal, integrative AI applications will rest to a large degree on creating an open, extensible, thriving ecosystem of reusable components.
During the golden era of automata, music also served as a tool for understanding human motor control during the performance of highly skilful tasks. The subject area of this book is inherently inter- and trans-disciplinary. Recent advances in a wide range of subject areas that contributed to the developments and possibilities presented in this book include computer science, multimodal interfaces and processing, artificial intelligence, electronics, robotics, mechatronics and beyond.
Over recent decades, computer science research on musical performance has been very active. For example, computer-based expressive performance systems are capable of transforming a symbolic musical score into an expressive musical performance by introducing deviations in time, sound and timbre.
At the same time, recent technological advances in robot technology, music content processing, machine learning, and others are enabling robots to emulate the physical dynamics and motor dexterity of musicians while playing musical instruments and exhibit cognitive capabilities for musical collaboration with human players. Nowadays, the research on musical robots opens many opportunities to study different aspects of humans.
These include understanding human motor control, how humans create expressive music performances, finding effective ways of musical interaction, and their applications to education and entertainment. For several decades, researchers have been developing more natural interfaces for musical analysis and composition and robots for imitating musical performance. Robotics has long been a fascinating subject area, encompassing the dreams of science fiction and industry alike.
Recent progress is shifting the focus of robotics. Once it was confined to highly specialized industrial applications and now it infiltrates our everyday lives and living spaces. This book consists of a collection of scientific papers to highlight cutting edge research related to this interdisciplinary field, exploring musical activities, interactive multimodal systems and their interactions with robots to further enhance musical understanding, interpretation, performance, education and enjoyment.
It covers some of the most important ongoing interactive multimodal systems and robotics research topics. This book contains 14 carefully selected and reviewed contributions. From this, more advanced methods for the analysis, modeling and understanding of musical performance and novel interfaces for musical expression can be conceived. The second section concentrates on the development of automated instruments and anthropomorphic robots designed to study the human motor control from an engineering point of view, to better understand how to facilitate the human-robot interaction from a musical point of view and to propose novel ways of musical expression.
The idea for this book has been formed over several meetings during related conferences between the two editors, observing and involving the developments from a wider range of related research topics as highlighted here. We can still remember the initial discussions during the i-Maestro workshops and the International Computer Music Conferences.
We would like to take this opportunity to thank all authors and reviewers for their invaluable contributions, and to thank Dr. Thomas Ditzinger from Springer for his kind support and invaluable insights during the development of this book. We are grateful to many people, including our families, for their support and understanding, and to the institutions and funding bodies acknowledged in separate chapters, without whom this book would not have been possible.
We hope you will enjoy this book and find it useful and exciting. Diana S. Dannenberg, H. Solis is the author and co-author of over technical papers for International Journals and Conferences. Kia Ng received his B. His research interests include interactive multimedia, gesture analysis, computer vision and computer music. Kia has also organised over 15 international events including conferences, exhibitions and a convention.
Kia is a chartered scientist, a fellow of the Royal Society of Arts and a fellow of the Institute of Directors. Web: www. However the greatest engineering challenge of developing a robot with human-like shape still required further technological advances. Thanks to the progress in many related subject areas including robot technology, artificial intelligence, computation power, and others, the first full-scale anthropomorphic robot, the Waseda Robot No.
Following this success, the first attempt at developing an anthropomorphic musical robot was carried out at the Waseda University in The Waseda Robot No. Kato argued that the artistic activity such as playing a keyboard instrument would require human-like intelligence and dexterity. The performance of any musical instrument is not well defined and far from a straightforward challenge due to the many different perspectives and subject areas.
State-of-the-art development of interactive multimodal systems provides advancements which enable enhanced human-machine interaction and novel possibilities for embodied robotic platforms. An idealized musical robot requires many different complex systems to work together; integrating musical representation, techniques, expressions, detailed analysis and control, for both playing and listening.
It also needs sensitive multimodal interactions within the context of a piece, interpretation and performance considerations, including tradition, individualistic and stylistic issues, as well as interactions between performers, and the list grows. Due to the inherent interdisciplinary nature of the topic, this book is a collection of scientific papers intended to highlight cutting edge research related to these interdisciplinary fields, exploring musical activities, interactive multimedia and multimodal systems and their interactions with robots, to further enhance musical understanding, interpretation, performance, education and enjoyment.
This book consists of 14 chapters with different key ideas, developments and innovations. These concepts and systems contribute to the basis of playing gesture and the understanding of musical instrument playing. Different systems and interfaces have been developed to measure, model and analyse musical performance.
Building on these advancements, further approaches for modeling, understanding and simulation of musical performance as well as novel interfaces for musical expression can be conceived. These Chapters also present a range of application scenarios including technology-enhanced learning. Furthermore, we can see people move to music in dancing, in walking, at concerts and in various everyday listening situations, making so-called sound-accompanying movements. Such common observations and more systematic research now converge in suggesting that sensations of body movement are indeed integral to musical experience as such.
This is the topic of Chapter 2 which includes an overview of current research within the field as well as an overview of various aspects of music-related body movement. This Chapter proposes that sound-movement relationships are manifest at the timescale of the chunk, meaning in excerpts in the approximately 0.
Focusing on sound-action chunks is useful because at this timescale we find many salient musical features: various rhythmical and textural patterns e. All these chunk-level musical features can be correlated with body movement features by carefully capturing and processing sound-producing and sound-accompanying movements as well as by extracting perceptually salient features from the sound. Needless to say, there are many technological and conceptual challenges in documenting such sound-action links, requiring good interdisciplinary teamwork with contributions from specialists in musicology, music perception, movement science, signal processing, machine learning and robotics.
This is investigated in Chapter 3 on audio analysis and transcription. Automatic music transcription is the process of analyzing a musical recorded signal, or a musical performance, and converting it into a symbolic notation or any equivalent representation concerning parameters such as pitch, onset time, duration and intensity.
It is one of the most challenging tasks in the field of Music Information Retrieval, and it is a problem of great interest for many fields and applications, from interactive music education to audio track recognition, music search on the Internet and via mobiles. This Chapter aims to analyze the evolution of music understanding algorithms and models from monophonic to polyphonic, showing and comparing the solutions.
Music transcription systems are typically based on two main tasks: pitch estimation and note tracking (the latter associated with the retrieval of temporal information such as onset times and note durations). Many different techniques have been proposed to cope with these problems. For pitch estimation, the most recent approaches are often based on a joint analysis of the signal in the time-frequency domain, since simple spectral amplitude has proved insufficient to achieve satisfactory transcription accuracy. Many other models have been developed: auditory-model-based front ends, grouped under Computational Auditory Scene Analysis, were widely studied and applied in the 90s; however, interest in this approach has decreased.
The most used techniques in recent literature are: Nonnegative Matrix Factorization, Hidden Markov Models, Bayesian models, generative harmonic models and the use of joint frequency and time information. Regarding temporal parameter information, the detection of note onsets and offsets usually relies on detecting rapid changes in spectral energy over time. Techniques such as phase-vocoder based functions, applied to the audio spectrogram, seem to be more robust than peak-picking algorithms applied to the signal envelope.
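To make the idea of onset detection from rapid changes in spectral energy concrete, here is a minimal sketch using spectral flux, a common and simple detection function. It is not one of the phase-vocoder based methods surveyed in the chapter; the frame size, hop, threshold and test signal are all assumptions chosen to keep the example small, and the naive DFT is used only to avoid external dependencies.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_flux(signal, frame_size=64, hop=64):
    """Sum of positive magnitude increases between consecutive frames (0.0 for the first frame)."""
    frames = [signal[s:s + frame_size] for s in range(0, len(signal) - frame_size + 1, hop)]
    spectra = [magnitude_spectrum(f) for f in frames]
    flux = [0.0]
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux


def pick_onsets(flux, threshold):
    """Frame indices where the flux is a local peak above the threshold."""
    onsets = []
    for i in range(1, len(flux)):
        is_peak = flux[i] > flux[i - 1] and (i + 1 == len(flux) or flux[i] >= flux[i + 1])
        if is_peak and flux[i] > threshold:
            onsets.append(i)
    return onsets


# Test signal: 256 samples of silence followed by a 1 kHz tone, sampled at 8 kHz.
sample_rate = 8000
silence = [0.0] * 256
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
flux = spectral_flux(silence + tone)
onset_frames = pick_onsets(flux, threshold=1.0)
onset_times = [frame * 64 / sample_rate for frame in onset_frames]
```

With a hop of 64 samples, the detected frame index maps back to the time at which the tone starts in this synthetic signal; real recordings need windowing, overlapping frames and an adaptive threshold.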
A strongly relevant and critical aspect is represented by the evaluation models and methods of the performance of music transcription systems. It joins, in a cross-disciplinary perspective, theoretical and experimental findings from several disciplines, from psychology to biomechanics, computer science, social science, and the performing arts. This Chapter presents a historical survey of research on multimodal analysis of expressive gesture and of how such a research has been applied to music performance.
It introduces models, techniques, and interfaces developed in several research projects involving works carried out in the framework of the EyesWeb project, and provides an overview of topics and challenges for future research. Key results described in this Chapter include automatic systems that can classify gestures according to basic emotion categories e.
The chapter also discusses current research trends involving the social dimension of expressive gesture which is particularly important for group playing. Interesting topics include interaction between performers, between performers and conductor, between performers and audience. This chapter discusses the conceptualization and design of digital musical instruments DMIs.
While certain guiding principles may exist and be applied globally in the field of digital instrument design, the chapter seeks to demonstrate that design choice for DMIs depends on particular goals and constraints present in the problem domain. Approaches to instrument design in 3 different contexts are presented: application to new music performance; use within specialized medical imaging environments; and interaction with virtual musical instruments. Chapter 5 begins with a short discussion on the idea of tangibility in instrument design and how this aspect of human interfacing has been handled in the computer interaction community vs.
It then builds on Rasmussen's typology of human information processing, a framework that divides human control into several categories of behaviour, and discusses how these categories can be applied to various types of musical interaction. This Chapter presents three use-cases corresponding to different development areas. First is a description of the motivations for the design of the T-Sticks, a family of cylindrical, hand-held digital musical instruments intended for live performance. Choices of sensing, mapping, sound synthesis and performance techniques are discussed.
This guided the choice and integration of sensors, as well as the design of the instrument body. The Ballagumi provides a sound-controlling tool subjects can interact with inside a scanner to help neuroscientists learn about the brain during musical creation. Finally, the idea of virtual DMIs and their interaction through haptic force-feedback devices is described. A software tool for construction of virtual instruments based on rigid body simulation is introduced, which can connect to audio programming environments for use with real-time sound synthesis.
The physics governing the interaction between the bow and the string are such that the sound output alone does not uniquely determine the physical input used to produce it. Therefore, a sound recording alone may be viewed as an incomplete representation of the performance of a violin, viola, cello, or double bass. Furthermore, despite our detailed understanding of the physics of the bowed string family, until recently, the physical constraints of these instruments and the performance technique they require have prevented detailed study of the intricacies of live bowing technique.
Today, advancements in sensor technology offer the ability to capture the richness of bowing gesture under realistic, unimpeded playing conditions. This Chapter reviews the significance of the primary bowing parameters of bow force, bow speed, and bow-bridge distance (position along the length of the string) and presents a measurement system for violin to accurately capture these parameters during realistic playing conditions.
This system uses inertial, force and position sensors for capturing right hand technique, and is optimized to be small, lightweight, portable and unobtrusive in realistic violin performances. Early investigations using this method elucidate the salient differences between standard bowing techniques, as well as reveal the diversity of individual players themselves. In addition to exploring how such studies may contribute to greater understanding of physical performance, a discussion of implications for gesture classification, virtual instrument development, performance archiving and bowed string acoustics is included.
It touches on one of the key requirements for an idealized musical robot to serve as a teacher or a classmate to support the learning. In order to understand the gesture of a player and to offer appropriate feedback or interactions, such a system would need to measure and analyze the movement of the instrumental playing.
There is a wide range of motion tracking technologies, including sensor, video tracking and 3D motion capture (mocap) systems, but applying them is not straightforward with traditional instruments such as the violin and cello. Currently, the majority of musical interfaces are designed as tools for multimedia performance and laboratory analysis of musical gesture. However, exciting explorations in pedagogical applications have started to appear. This Chapter focuses on the i-Maestro 3D Augmented Mirror (AMIR), which utilizes 3D motion capture and sensor technologies to offer online and offline feedback for technology-enhanced learning for strings.
It provides a survey of related pedagogical applications and describes the use of a mirror metaphor to provide a 3D visualization interface design, including motion trails to visualise shapes of bowing movement. Sonification is also applied to provide another modality of feedback. Learning to play an instrument is a physical activity. The technologies discussed here may be used to develop and enhance awareness of body gesture and posture and to avoid these problems. This technology can be used to capture a performance in greater detail than a video recording and has the potential to assist both teachers and students in numerous ways.
A musical robot that can provide technology-enhanced learning with multimodal analysis and feedback, such as those discussed in this chapter, would be able to contribute to musical education. Example works in this context, such as the systems described in Section II, have proved beneficial. It will not only motivate interest and inspire learning for learners but may also provide critical analysis for professional performers. While this subject has been researched since the beginning of electronic and computer music, nowadays the wide availability of cost-effective and miniature sensors creates unprecedented new opportunities for such applications.
Nevertheless the current software tools available to handle complex gesture-sounds interactions remain limited. The framework we present aims to complement standard practices in gesture-sound mapping, emphasizing particularly the role of time morphology, which seems too often neglected.
Our gesture analysis is divided into two stages to clearly separate low-level processing that is specific to the sensor interface and high-level processing performed on temporal profiles. This high-level processing is based on a tool that we specifically developed for the analysis of temporal data in real-time, called the gesture follower. It is based on machine learning techniques, comparing the incoming dataflow with stored templates.
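The notion of comparing incoming data with stored templates can be illustrated with a deliberately simple stand-in. The sketch below uses plain dynamic time warping (DTW) on one-dimensional profiles and works offline on a complete gesture; it is not the gesture follower itself, which the text describes only as machine-learning based and real-time. The template names and sample values are invented for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def closest_template(incoming, templates):
    """Name of the stored template with the smallest DTW distance to the incoming profile."""
    return min(templates, key=lambda name: dtw_distance(incoming, templates[name]))


# Hypothetical temporal profiles, e.g. one normalized sensor axis over time.
templates = {
    "swipe_up": [0.0, 0.2, 0.5, 0.8, 1.0],
    "shake": [0.0, 1.0, 0.0, 1.0, 0.0],
}
incoming = [0.0, 0.1, 0.2, 0.4, 0.5, 0.7, 0.9, 1.0]  # a slower "swipe_up"
label = closest_template(incoming, templates)
```

DTW tolerates differences in speed between the incoming gesture and the template, which is the property that makes temporal profiles comparable in the first place; a real-time follower would additionally have to estimate the current position within the template as data arrives.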
This Chapter introduces the notion of temporal mapping, as opposed to spatial mapping, to insist on the temporal aspects of the relationship between gesture, sound and musical structures. The general framework can be applied to numerous data types, from movement sensing systems, sensors or sounds descriptors. The Chapter discusses a typical scenario experimented in music and dance performances, and installations.
The authors believe that the methodology proposed can be applied with many other different paradigms and opens a large field of experimentation which is currently being pursued. It addresses the fundamental concepts for mimicking the performance and interactions of musicians. In particular, the evaluation of the technical and performance aspects of the proposed automatic instrument is stressed. Undoubtedly, the performance evaluation is not an easy task. In fact, it is rather difficult to evaluate the degree of excellence i.
Moreover, music is an expression of sound in time. Although a sequence of musical notes is arranged in a piece of music by time series, the implicit relationship between sound and time depends on the performer. Of course, if sound and time are not appropriately correlated, the musical expressivity cannot be displayed. There are relatively few woodwind robots, and experience has shown that controlling the air is always a critical and difficult problem.
The bagpipe literature is unclear about the air regulation requirements for successful playing, so this work offers some new insights. McBlare shows that bagpipes are playable over the entire range with constant pressure, although the range of acceptable pressure is narrow and depends upon the reed.
The finger mechanism of McBlare is based on electro-mechanical relay coils, which are very fast, compact enough to mount adjacent to tone holes, and inexpensive. This shows that the mechanics for woodwinds need not always involve complex linkages. One motivation for building robotic instruments is to explore new control methods. This opens new possibilities for composers and performers, and leads to new music that could not be created by a human player.
The chapter describes several modes of real-time gestural control that have been implemented. McBlare has been presented publicly at international festivals and conferences, playing both traditional bagpipe music and new compositions created especially for McBlare.
In order to build a robot that can produce good sounds and perform expressively, it is important to realize a tight human-robot interaction. Therefore, one of the purposes of this chapter is to develop an anthropomorphic violin playing robot that can perform expressive musical sounds. In this chapter, an anthropomorphic human sized manipulator for bowing is introduced. Also, interesting mechanisms of the left hand for fingering with three fingers are introduced. Although the robot is still under construction, both the right arm and the left hand will be connected and produce expressive sounds in the near future.
The other purpose of this chapter is introduction and analysis of kansei in violin-playing. Recently many Japanese researchers in various fields such as robotics, human-machine interface, psychology, sociology, and so on, are focusing on kansei.
Musical Robots and Interactive Multimodal Systems
However, there is no research on musical robots considering kansei at the moment. To develop a robot that can understand and express human kansei is also very important for smooth human-robot communication. For this purpose, kansei is defined and an information flow from musical notes to musical sounds including kansei is proposed.
Based on the flow, some analyses of human violin-playing were carried out and one of those results is discussed. In the first part of this Chapter, an overview of the development of flute-playing robots is briefly introduced and the details of the development of an anthropomorphic flutist robot are given. This research is focused on enabling the robot to play a flute by accurately controlling the air beam parameters (width, angle, velocity and length) by mechanically reproducing the following organs: lungs, lips, vocal cord, tongue, arms, fingers, neck, and eyes.
All the mechanically simulated organs are controlled by means of an auditory feedback controller. As a result, the developed flutist robot is capable of playing the flute to the level of proficiency comparable to that of an intermediate flute player. In the later part, an overview of the development of saxophone-playing robots is also introduced and the details on the development of an anthropomorphic saxophonist robot are given.
This research is focused on enabling the robot to play an alto saxophone by accurately controlling the air pressure and vibration of the single reed. For this purpose, the following organs were mechanically reproduced: lungs, lips, tongue, arms and fingers. All the mechanically simulated organs are controlled by means of a pressure-pitch feed-forward controller.
As a result, the developed saxophonist robot is capable of playing the saxophone to the level of proficiency comparable to that of a beginner saxophone player. However, one of the main novelties was completing the loop and fusing all three of these areas together. The work in this chapter presents research on how to build such a system in the specific genre of musical applications.
The body of work described in this chapter is truly an artistic venture calling on knowledge from a variety of engineering disciplines, musical traditions, and philosophical practices. Much of the research in the area of computer music has primarily been based on Western music theory. This chapter fully delves into applying the algorithms developed in the context of North Indian classical music. Most of the key contributions of this research are based on exploring the blending of both these worlds.
The goal of the work is to preserve and extend North Indian musical performance using state of the art technology including multimodal sensor systems, machine learning and robotics. The process of achieving our goal involved strong laboratory practice with regimented experiments with large data sets, as well as a series of concert performances showing how the technology can be used on stage to make new music, extending the tradition of Hindustani music.
Shimon represents a major step forward in both robotic musicianship and interactive improvisation. In contrast, Shimon plays a melodic instrument, has four, instead of two percussion actuators, and is able to present a much larger gestural and musical range than previous robots. Shimon employs a unique motor control system, taking into account the special requirements of a performing robotic musician: dynamic range, expressive movements, speed, and safety.
To achieve these aims, the system uses physical simulation, cartoon animation techniques, and empirical modeling of actuator movements. The robot is also interactive, meaning that it listens to a human musician play a live show, and improvises in real-time jointly with the human counterpart.
To resolve this seeming paradox of being both responsive and real-time, Shimon uses a novel anticipatory approach together with a gesture-based approach to music, viewing visual performance (in the form of movement) and music generation as parts of the same core process. A traditional interactive music system abstracts the musical information away from its physical source and then translates it back to movements. By taking an embodied approach, the movement and the music of the robot are one, making the stage performance a complete experience for both other musicians and the audience.
Embodied cognition is gaining popularity both in the field of cognitive psychology and in that of artificial intelligence.
However, this is the first time that such an approach has been used for robotic musicianship. The authors evaluate their system in a number of human-subject studies, testing how robotic presence affects synchronization with musicians, as well as the audience's appreciation of the duo. The findings show a promising path to the better understanding of the role of the physical robot's "body" in the field of computer-generated interactive musicianship.
However, humanoid robots are mainly equipped with sensors that allow them to acquire information about their environment. Based on the anthropomorphic design of humanoid robots, it is therefore important to emulate two of the human's most important perceptual organs: the eyes and the ears. For this purpose, the humanoid robot integrates vision sensors in its head and aural sensors attached to the sides for stereo-acoustic perception. In the case of a musical interaction, a major part of the typical performance is based on improvisation.
In these parts musicians take turns in playing solos based on the harmonies and rhythmical structure of the piece. Upon finishing his solo section, one musician will give a visual signal, a motion of the body or his instrument, to designate the next soloist. After both musicians get used to each other, they may musically interact. In this chapter, toward enabling the multimodal interaction between the musician and musical robot, the Musical-based Interaction System (MbIS) is introduced and described. The proposed MbIS is composed of two levels of interaction that enable partners with different musical skill levels to interact with the robot.
To verify the capabilities of the MbIS, a set of experiments was carried out on the interactive capabilities of an anthropomorphic flute robot introduced in Section II, Chapter 5. Will this produce a robotic musician capable of playing a musical instrument with expression, interacting with co-performers or teaching how to play the instrument? Obviously, there are many more layers and complex interactions between the systems and many more challenging research avenues, including interpretations, imitations, expressions, interactions and beyond.
At the time of writing, there remains a range of scientific research and technical issues e. playing of several different instruments i. There are many more qualities and features to be explored. For example, it would be an important feature for future musical robots to be able to improve their own musical performance by analyzing the sound they produce themselves and by listening to and comparing themselves with co-performers (human or other robots), using some kind of automated musical learning strategy. Currently, several research projects are focusing on producing robust computational models of music communication behavior e.
Musical appreciation, expression and development are inherently interdisciplinary, involving many different aspects of experiences and interactions. To further advance musicianship for robots, there is a wide range of relevant and interrelated subject areas including emotional research, music appreciation quantification, integration of music and dance, and many others. Developments in entertainment robotics have been increasing for the past few years and fundamental issues for human-robot musical interaction are starting to be addressed.
It would be exciting to explore novel methods for the quantification of music appreciation to assure the effectiveness of the interaction.
The combination of music with other performing arts i. The list of related topics mentioned above is far from complete and the list continues to grow over time. This is an exciting time! We have seen many scientific and technological developments that have transformed our life on many different levels. With the continuing advancements such as those discussed in this book, we look forward to continuing research and development to realize a musical robot who is capable of different musical skills and musicianship to play and teach music that can be applied in many different application scenarios to enrich our life with the expression of music.
One core issue in music production and perception is the relationship between sound features and action features. From various recent research, it seems reasonable to claim that most people, regardless of levels of musical expertise, have fairly good knowledge of the relationship between sound and sound production, as e. The challenge now is to explore these sound-action links further, in particular at the micro-levels of musical sound such as in timbre and texture, and at the meso-levels of various rhythmical and contoural features. As suggested by the seminal work of Pierre Schaeffer and co-workers on so-called sonic objects several decades ago, perceptually salient features can be found on the chunk-level in music, meaning in fragments of sound-action in the approximately 0.
In this chapter, research on the emergence of sound-action chunks and their features will be presented together with some ideas for practical applications. This may be an obvious observation, yet it is far from trivial when we consider the consequences of this dissociation of sound and action for the design and use of electronic musical instruments: losing the link between action and sound also makes us lose one of the most important mental schemas for how we conceive of, and perceive, musical sound.
One possible answer to this challenge is to try to correlate the input data of whatever synthesis model is used e. Such labels on the output sound signal may be extended further into the metaphorical realm with attributes such as 'wet', 'dry', 'smooth', 'rough', 'dark', 'bright', etc. However, another and complementary answer to this challenge is to extrapolate mental schemas of sound-action relationships from our past and presumably massive experiences of non-electronic musical instruments as well as everyday sonic environments, to new electronic instruments.
It could in particular be interesting to see how what we perceive as somehow meaningful sound-action units, what could be called sound-action chunks, emerge in our experiences of music, and how general principles for sound-action chunk formation may be applied to novel means for producing musical sound.
Actually, relating what we hear to mental images of how we assume what we hear is produced, is a basic phenomenon of auditory perception and cognition in general. In the field of ecological acoustics, it has been documented that listeners usually have quite accurate notions of how everyday and musical sounds are produced [9, 31], including both the types of actions e. As to the perception of the actions involved in sound production, the claim of the so-called motor theory and various variants of this theory has for several decades been that auditory perception, primarily in speech but also in other areas, is accompanied by mental simulations of the assumed sound-producing actions: when listening to language, there is a mostly covert simulation of the phonological gestures assumed to be at the cause of the sounds [6, 26].
And it has been documented that when people listen to music, there are similar activations of the motor-related areas of the brain, in particular in the case of expert musicians, but also in the case of novice musicians after a quite short period of training. Depending on level of expertise, these motor images of sound production may vary in acuity: a native speaker of a language, e. Chinese, will usually have much finer motor images of the various sounds of the Chinese language than a foreigner, yet the foreigner will be able to perceive, albeit coarsely, the difference between the phonological gestures of Chinese and another unfamiliar language, e.
Likewise in music, we may see significant differences in the acuity of motor images of sound production between novices and experts, yet there are still correspondences in the motor images at overall or coarse levels of acuity. According to the motor theory of perception, motor images are fairly robust yet also flexible because they are based on very general motor schemas, e.
These main classes are quite distinct, yet applicable to very many variant instances see section 2. In the so-called embodied view of perception and cognition [7, 8], motor schemas are seen as basic for all cognition, not only auditory perception. This means that all perception and reasoning, even rather abstract thinking, is understood as related to images of action. In our case, projecting images of sound-action relationships from past musical and environmental sonic experiences onto new musical instruments could be seen as a case of anthropomorphic, known-to-unknown, projection, and as a matter of basic functioning of our mental apparatus, what we see as a motormimetic element in music perception.
This includes both kinematic and effort-related images, raising some intriguing questions of how electronic instrument sounds, i. For this reason, we shall now first have a look at some main types of music-related actions, then consider features of musical sound at different timescales, before we turn to principles of chunking and various sonic features within chunks, and at the end of the chapter present some ideas on how concepts of sound-action chunks can be put to use in practical contexts.
As suggested in  it could be useful then to initially distinguish between sound-producing and sound-accompanying actions.
Sound-producing actions include both excitatory actions such as hitting, stroking, blowing, and modulatory or sound-modifying actions such as changing the pitch with the left hand on a string instrument or the mute position on a brass instrument, as well as in some cases selection actions such as pulling the stops on an organ.
Related to this, we find various communicative actions that performers use to communicate within an ensemble or to communicate expressive intent to the audience, such as swaying the body, nodding the head, or making exaggerated or theatrical hand movements on the guitar before a downbeat [3, 38]. Sound-accompanying actions include all kinds of body movements that listeners may make to music, such as moving the whole body or parts of the body to the beat of the music, gesticulating to some feature of the music, or imitating sound-producing actions of the music.
This latter category can be seen in various instances of so-called air-instrument performance such as air-drums, air-guitar, and air-piano. Imitations of sound-producing actions are interesting in our context because they in many cases attest to quite extensive knowledge of sound production even by untrained listeners. Another important element is the abovementioned variable acuity involved here: people who are not capable of playing percussion or guitar 'for real' still seem to have coarse, yet quite well informed, notions of how to hit and move between the drums and other instruments of an imaginary drum set, or of how to pluck the strings and move the left hand on the neck of an imaginary guitar.
But also on a more general level, listeners seem to readily catch on to the kinematics of sound production, e. Such general kinematic and dynamic correspondences between sound and movement can also be observed in dancers' spontaneous movements to musical sound, i. The kinematic and effort-related action images afforded by the sounds can then be correlated with some basic sound-producing actions. Sometimes also referred to as ballistic movement, impulsive sound-producing actions are typically based on so-called open loop motor control, meaning that the entire movement trajectory is preplanned and executed without feedback because it is so fast that continuous control and adjustment in the course of the trajectory would usually not be feasible; however, this has been much debated in the motor control literature.
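The difference between open loop and feedback-based (closed loop) control can be illustrated with a minimal sketch. It is not taken from the chapter and is not a model of human motor control: the one-dimensional "mallet position", the function names, the constant disturbance and the gain value are all illustrative assumptions. The open loop version fixes every increment before the movement starts, while the closed loop version re-measures the remaining distance at each step.

```python
def open_loop_strike(start, target, steps, disturbance=0.0):
    """Preplan the whole trajectory, then execute it without feedback.

    All names and the constant per-step disturbance are hypothetical,
    chosen only to illustrate the idea of open loop control.
    """
    planned_step = (target - start) / steps  # fixed before the movement begins
    position = start
    for _ in range(steps):
        # The plan is replayed as-is; the disturbance is never corrected.
        position += planned_step + disturbance
    return position


def closed_loop_reach(start, target, steps, disturbance=0.0, gain=0.5):
    """Sense the remaining error at every step and correct for it.

    The proportional gain is an assumed value, not taken from the text.
    """
    position = start
    for _ in range(steps):
        error = target - position  # feedback: compare current state to goal
        position += gain * error + disturbance
    return position
```

With no disturbance, the open loop strike lands on the target. With a small constant disturbance (for example 0.01 per step over 10 steps), it overshoots by the accumulated disturbance, whereas the feedback version ends closer to the target. The trade-off described in the text is that a very fast, impulsive movement may leave no time for such step-by-step correction.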