Enrico Caruso, Alfred Tomatis and modern neurobiology: prosocial listening, steady heartbeat and successful communication

Alfred Tomatis, a French ear, nose and throat physician, intensively explored the connections between hearing, voice and mind between 1950 and 1960. Enrico Caruso, the great golden voice of the 20th century, and the analysis of his singing played an important role in this work. Tomatis concluded that listening to medium and high tones makes the voice sound fuller and clearer. On the psychological level, this kind of listening also fosters the “desire to listen” and successful communication. Current neurobiological findings confirm this view and extend its meaning, in particular the Polyvagal Theory proposed by the American researcher Stephen Porges. According to these findings, listening, secure attachment and a peaceful state of the unconscious (autonomic) nervous system go hand in hand. A particular branch of the vagus nerve is of central importance. The vagus nerve is the “great calming nerve” and the main nerve of the parasympathetic nervous system. Activating this branch promotes an attentive attitude towards ourselves and others.

Tomatis’ Laws and Enrico Caruso’s voice

Dr. Tomatis worked in Paris in the 1950s as a conventional ear, nose and throat physician. His father, an opera singer, referred opera colleagues with vocal problems to him. Conventional tests such as laryngoscopy revealed no tangible physical cause for the voice disorders of some of these singers. An unusual idea for that time then occurred to him: he decided to examine the frequency spectrum of their voices and compare it with their hearing. The correspondence was surprising. The tones the singers heard poorly were also weak or missing in their voices [1].

These findings became the first Tomatis law, which was later confirmed by other researchers: the voice contains only those overtone frequencies that the ear can perceive. For example, hearing for high frequencies typically deteriorates with age, and as a result the overtones in older people’s voices are reduced.

Tomatis then built a device with amplifiers and filters. Singers with voice disorders sang into a microphone. Tomatis amplified the frequencies they did not hear well and let them listen to their improved voice through headphones in real time. The previously weak or missing tones immediately reappeared in their voices. These experiments demonstrated the feedback loop between hearing and voice and led to the second Tomatis law: if a person is enabled to hear normally the frequencies they no longer heard well, these frequencies immediately and unconsciously reappear in the voice. Based on these initial studies, a therapy device designed to train hearing was developed, the “Electronic Ear”.
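The following Python sketch illustrates the principle of this first device. It derives a per-band boost from a hearing test. Each frequency band that is heard worse than a reference threshold is amplified by the size of the deficit, up to a cap. The function name `compensating_gains`, the band frequencies, the 0 dB reference and the 30 dB cap are all hypothetical choices made for illustration. The source does not describe how Tomatis actually set his amplifiers.

```python
# Hypothetical sketch: derive per-band boosts from a hearing test so that
# poorly heard bands are amplified, in the spirit of Tomatis' first device.
# Band frequencies, reference threshold and gain cap are illustrative only.

def compensating_gains(thresholds_db, reference_db=0.0, max_gain_db=30.0):
    """Return the boost in dB for each band.

    thresholds_db maps band centre frequency (Hz) to hearing threshold (dB HL).
    Higher values mean the band is heard worse. Each band is boosted by the
    amount its threshold exceeds reference_db. The boost is never below 0 dB
    and never above max_gain_db.
    """
    gains = {}
    for freq_hz, threshold in thresholds_db.items():
        deficit = threshold - reference_db  # how much worse than the reference
        gains[freq_hz] = min(max(deficit, 0.0), max_gain_db)
    return gains


# Hypothetical audiogram of a singer whose hearing drops off in the upper range.
audiogram = {250: 5, 500: 5, 1000: 10, 2000: 25, 4000: 40, 8000: 45}
print(compensating_gains(audiogram))
# prints {250: 5.0, 500: 5.0, 1000: 10.0, 2000: 25.0, 4000: 30.0, 8000: 30.0}
```

The boosts grow towards the high bands, mirroring the age-related high-frequency loss mentioned above. A real device would also need the filters themselves and a real-time audio path, which this sketch omits.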

Tomatis then wondered how to achieve “ideal” hearing for the voice and, in a broader sense, for communication. He analyzed the voice of Enrico Caruso, which touched the hearts of so many people. Mass screenings in the 1940s had established that the human ear is most sensitive to the tonal range between 1000 and 5000 Hz [2]. Very deep and very high tones must be considerably louder before the human ear can perceive them at all. Tomatis realized that Caruso’s hearing emphasized medium and high tones to a far greater extent than usual. Through the feedback loop between hearing and voice described above, this allowed Caruso to produce extraordinary harmonic richness. That richness gave his voice its special charisma.
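The A-weighting curve standardized in IEC 61672-1 illustrates how the ear's sensitivity depends on frequency. The curve approximates how much less sensitive the ear is at each frequency, relative to 1 kHz, at moderate listening levels. It is a stand-in chosen for illustration, not the screening data cited above [2]. Roughly speaking, a negative value means a tone must be that much louder to be perceived as equally loud.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting gain in dB per IEC 61672-1 (about 0 dB at 1 kHz)."""
    f2 = f_hz ** 2
    r_a = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(r_a) + 2.0


for f in (50, 100, 500, 1000, 2500, 5000, 10000, 16000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

The output shows strongly negative values at 50 and 100 Hz and a maximum of roughly +1 dB near 2.5 kHz. The values fall off again towards 10 kHz and above, in line with the 1000 to 5000 Hz range named in the text.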

How did Caruso’s body and brain accomplish this special emphasis of high frequencies? In Tomatis’ explanation, two small muscles that unconsciously regulate the tension of the eardrum play an important role (Fig. 1). Many people do not know these muscles exist, yet they use them all the time. Suppose we suddenly hear a loud noise, such as the clinking of glass, followed by our child crying. We are immediately wide awake: our attention turns outward, our heartbeat accelerates and we jump up to find out what happened.


Fig. 1. Middle ear muscles and transmission of sound waves to the inner ear and hair cells.

These muscles are controlled by the autonomic nervous system (ANS). In classical theory, the ANS consists of two large, antagonistic branches, which are triggered one after the other in a situation like this. The parasympathetic branch calms; the sympathetic branch activates. The sympathetic nervous system is the stress mediator that triggers interest, desire and curiosity. The middle ear muscles then tense the eardrum, i.e. the ears are “pricked up”. Like the stretched membrane of a drum or timpani, the tensed eardrum transmits higher tones better to the inner ear and brain. We become attentive, awake and alert. Our heartbeat and respiratory rhythm also become closely synchronized, and the face takes on a lively tension. Decades after Alfred Tomatis formulated his laws, the Polyvagal Theory confirmed this unconscious response of our ear muscles and nervous system. The theory is the outcome of research by the American neurophysiologist Stephen Porges [3].

Bessel van der Kolk, a well-known trauma therapist, on the importance of the Polyvagal Theory: “It is an extraordinary experience to listen to a new piece of music or a new scientific idea. Something that […] switches on not only a light, but a whole gallery of lights in our mind, forever changing our understanding of the meaning and purpose of life.” Van der Kolk counts among such experiences Porges’ work from 1999, in which Porges presented the Polyvagal Theory [3, p. 11]. Porges’ research overturned older theories on the functions of the ANS and supported the views of many other neurobiologists: our nervous system is geared towards communication and contact. We are social beings.

Autonomic nervous system and the classical view of the duality between parasympathetic and sympathetic

In evolutionary terms, the ANS is an old part of our nervous system. Its control centers lie deep in the brain, below the cerebral cortex, and its fine nerve branches reach all internal organs as well as the skin. Thanks to the ANS, when we get up in the morning, we do not need to consciously adjust our body temperature to the ambient air. Nor do we need to check how much stomach acid is required to digest our breakfast. The ANS also controls the intensity and focus of our attention, as well as the blood flow in our skin and muscles. For example, on a Sunday afternoon we doze off on the couch. Our attention is low and unfocused and revolves around ourselves and our body.

Body and mind shift into a state of excitement when the brain recognizes a threat, an insecure situation or an unsolved problem, such as the loud clinking noise or the crying child from the example above. The sympathetic nervous system then takes over, with two basic options: fight or escape. Its name means “sympathy, compassion” [4]. Feelings and emotions were originally associated mainly with the sympathetic nervous system [5]. Complementarily, the calming parasympathetic nervous system is activated whenever the brain recognizes that the situation is peaceful and secure, both inside and outside. Darwin had already explored the function of the parasympathetic nervous system [6]. In the classical view, the parasympathetic nervous system mainly regulates the digestive system, calms the heartbeat and breathing, and remains active during sleep. Put pointedly, this classical view reduces the biological nature of humans to essentially two options: digest and sleep, or fight and escape.

Over the past few decades, researchers have increasingly emphasized the significance of certain parts of the parasympathetic nervous system. These parts matter for the attentive perception of our body and for communication with other people. The main nerve of the parasympathetic nervous system, the vagus nerve, is particularly important. Its name means “wandering”. It leaves the brain at about ear level and sends a branch towards the eardrum. It then wanders downwards along both sides of the esophagus and branches out towards the larynx to supply the vocal folds, so we speak and sing thanks to the vagus nerve. Further down, it branches out towards the heart and lungs. It continues along both sides of the stomach into the abdominal cavity, where it controls large parts of the intestine.

The vagus nerve is also called the “great calming nerve” [7] because it calms the heartbeat and respiratory rhythm and coordinates the two. Breathing makes our heartbeat fluctuate naturally: it accelerates somewhat when we inhale and slows slightly when we exhale. This can be compared with two reins that let the rhythm of our heart follow an ebb and flow. The fluctuation of the heartbeat with the breath is an important component of heart rate variability, which reflects the state of our health. During depression, heart rate variability is reduced, and the reins are figuratively pulled tight, as under constant stress [8]. This could also explain why the risk of a heart attack increases during depression [9].
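Physiologists call the “reins” effect respiratory sinus arrhythmia. Heart rate variability is often quantified from the intervals between successive heartbeats (RR intervals). The Python sketch below uses RMSSD, the root mean square of successive differences. RMSSD is one common time-domain HRV index, although the source does not mention it. The RR series are synthetic and their amplitudes are hypothetical. The sketch only shows that a strongly breathing-modulated rhythm scores higher than a tight, uniform one; it makes no clinical claim.

```python
import math


def simulate_rr(n_beats, mean_rr_ms, rsa_amplitude_ms, breaths_per_beat):
    """Synthetic RR intervals (ms): a mean interval modulated by a sinusoid
    that stands in for breathing. Longer intervals mean a slower heartbeat,
    as on exhalation; shorter intervals mean a faster one, as on inhalation."""
    return [
        mean_rr_ms + rsa_amplitude_ms * math.sin(2 * math.pi * breaths_per_beat * i)
        for i in range(n_beats)
    ]


def rmssd(rr_ms):
    """Root mean square of successive RR differences, a time-domain HRV index."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical values: one breath every 4 beats in both cases.
relaxed = simulate_rr(120, 900.0, 60.0, 0.25)  # "loose reins": strong breathing modulation
tense = simulate_rr(120, 700.0, 10.0, 0.25)    # "tight reins": faster, more uniform beat
print(f"relaxed RMSSD ≈ {rmssd(relaxed):.1f} ms, tense RMSSD ≈ {rmssd(tense):.1f} ms")
# prints: relaxed RMSSD ≈ 60.0 ms, tense RMSSD ≈ 10.0 ms
```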

Thus, a highly active vagus nerve is a sign of good health: the body is in a basic state of security and peace. Yet in rare cases the vagus nerve can also trigger a sudden, drastic drop in heart rate. This life-threatening bradycardia is known as neurogenic (nervous) shock. Why is high vagal activity with high heart rate variability a sign of health in some cases, while in others it can be life-threatening, as in neurogenic shock? This contradiction is known as the “vagal paradox” [3, pp. 23-25]. It could not be properly explained until a few years ago.

Another physical response pattern also remained a puzzle until a few years ago. As described above, the parasympathetic nervous system puts the body into a state of relaxation and calm alertness. If a problem arises, the sympathetic nervous system is switched on, with its basic response patterns of fight or escape. Unfortunately, people time and again experience situations so intense and prolonged that they appear hopeless, such as violence during war, sexual violence or accidents. Physically, the response is then often a freeze response accompanied by maximal internal excitement. It is the last natural option available to animals and humans: an antelope fleeing from a tiger plays dead when it sees no other way out [10]. The same happens to people suffering from psychological trauma. Here the freeze response typically takes the form of dissociation, i.e. a part of the soul hides in an inner place where events can no longer affect it.

How is this reaction to trauma produced on a biological level? It clearly differs from sympathetic activation. Outwardly it resembles parasympathetic calm, yet it is linked to extreme tension and excitement.

The Polyvagal Theory

The Polyvagal Theory, made public by Stephen Porges in the 1990s, provides a fundamental explanation for these phenomena.

Each vagus nerve consists of two parts, an evolutionarily older one and a younger one. Each part has its own nucleus in the brainstem. The older part is called the dorsal nucleus (dorsal = toward the back). The younger part is called the ventral branch (ventral = toward the abdomen, i.e. the front).

In evolution, the older dorsal nucleus originated in reptiles. Reptiles have thick skin. When confronted with new stimuli, they pause, or freeze, for a moment before orienting themselves and responding. In this moment of orienting freeze, their nervous system slows the heartbeat via the dorsal nucleus to save oxygen. This reaction is therefore adaptive for reptiles.

In mammals, the situation is completely different. Mammals are agile but usually weaker than reptiles. They survived by responding rapidly and had to learn to orient themselves and react quickly. Moreover, relative to body size, their oxygen consumption is five times higher than that of reptiles. The younger ventral branch originated with mammals. It also calms the heartbeat. When mammals orient themselves or face danger, however, this “vagal brake” on the heart is released and the heart rate rises, the opposite of the reptilian response.

In mammals, the skin became soft and sensitive, feelings became more differentiated and vocal communication gained importance. Mammals do not just want to sit in the sun and eat, as reptiles do. Besides defending their territory and hunting, they also want to snuggle up, rub against each other and play. Social communication became increasingly important. Along with these changes, a functional network of nerves enabling social communication developed in the brain. Porges called this neural circuit the “Social Engagement System” (SES) [3, p. 86].

The SES comprises five of the twelve cranial nerves. The first is the fifth cranial nerve (trigeminal nerve). It supplies the facial skin and the jaw muscles, as well as the tensor tympani, the muscle that unconsciously tenses the eardrum. The trigeminal nerve allows us to feel when someone touches our face. The SES also includes the seventh cranial nerve (facial nerve), which controls the muscles of facial expression. The facial nerve also controls the second muscle behind the eardrum, the stapedius muscle. (The role of these two middle ear muscles according to Tomatis and Porges is discussed in more detail below.) The third nerve of the SES is the ninth cranial nerve, the glossopharyngeal nerve, which allows us to swallow and move the muscles of the throat. It is therefore important for eating and making sounds. The fourth nerve of the SES is the tenth cranial nerve, the vagus nerve, mentioned earlier. Its mammalian ventral branch is the part that belongs to the SES. Activating this branch calms the heartbeat and breathing. The branch extends to the vocal folds and thus allows us to modulate our voice according to our inner feelings. Finally, the SES includes the eleventh cranial nerve, the accessory nerve, which allows us to turn our head and thus respond rapidly to auditory or visual stimuli.

The five nerves of the SES thus control the muscles most important for verbal and vocal communication. With the Polyvagal Theory, the classical duality of parasympathetic and sympathetic becomes a trio. This trio plausibly explains, on a biological level, the contradictions described above: the vagal paradox and the traumatic freeze response. It also paints a more nuanced and more benevolent picture of human and animal nature.

Our brain distinguishes three types of emotional interaction with the world:

  • i) security and well-being;
  • ii) serious problem, fear and danger;
  • iii) fear of death, helplessness and hopelessness.

First type: the ventral branch and the SES are active and create a feeling of safety and security. The body is in a state of relaxed alertness. The skin is well supplied with blood. The facial muscles have an appropriate tone, ready to express lively, joyful feelings. The heartbeat is calm and fluctuates with the breath. The vocal folds are ready to tense and be used.

This Social Engagement System, with its five nerves and associated muscles, is already functional at birth. The nerves controlling most other muscles of the body, by contrast, mature only much later. After a natural birth, a newborn baby can turn its head within a few hours. It recognizes its mother’s voice and can express its feelings with its own voice, whether it is content or calls attention to its discomfort by screaming. It can suck, swallow and feel. Control of its arms and legs develops fully only much later, and it will take the child several more months to roll over, crawl or walk. From birth, then, our neurobiology is geared not only to eating and sleeping, but also to communication and contact.

Second type: our brain reacts to fear, danger and unresolved problems. The activity of the ventral branch and the SES decreases, and the sympathetic nervous system is activated. The sympathetic nervous system offers the options of fight or escape. If these do not lead to a solution, or if the internal excitation and perception of threat become unbearable, the third type of emotional interaction is activated.

In the third case, the body reverts to the evolutionarily oldest component of the ANS, the dorsal nucleus. As in a reptile orienting itself, the body freezes and plays dead, with an abrupt drop in heart rate and blood pressure. This state is associated with maximal excitation and, at the same time, with detachment from feelings, i.e. dissociation.

Under normal circumstances, the sympathetic nervous system inhibits the dorsal nucleus, and the ventral branch in turn inhibits the sympathetic nervous system. The evolutionarily younger system thus always keeps the older one in check. The older system steps in only in an emergency, so to speak (Jackson’s principle [11]).

In the normal state, however, the dorsal nucleus also functions in humans: it controls the organs below the diaphragm, such as the stomach and intestines. A “healthy” dorsal nucleus thus controls the digestive organs and is more active during sleep.

Polyvagal Theory, listening, Tomatis and Caruso

Let us come back to Caruso’s golden voice and Dr. Tomatis’ studies on the connections between voice, mind and hearing. Hearing medium and high tones well produces extraordinary harmonic richness in the voice. The tension of the middle ear muscles enhances this hearing. These muscles are controlled mostly unconsciously, by the ANS. Where does this function fit within the Polyvagal Theory?

In mammals, including humans, listening is part of communication. In principle, hearing works as follows. The ear canal captures sound waves and directs them towards the eardrum. The eardrum and the ossicular chain (three tiny bones: malleus, incus and stapes) are part of the air-filled middle ear. These three bones transmit the pressure waves to the inner ear through a small membrane, the oval window. The snail-shaped inner ear (cochlea) makes about 2.5 turns, is filled with a salty fluid and contains the hair cells. The pressure wave travels through the inner ear and stimulates the hair cells. They respond with electrical signals, which the auditory nerve transmits to the brain.

The middle ear contains the two middle ear muscles mentioned earlier: the tensor tympani muscle and the stapedius muscle. The middle ear acts as a kind of lever mechanism that increases the sound pressure approximately thirtyfold, and these two muscles regulate it. When they tense, especially the stapedius muscle, they protect the inner ear and hair cells from excessive volume, mostly of deeper tones. This protective response is called the stapedius or acoustic reflex. The tensor tympani muscle sensitizes sound conduction by tensing the eardrum.
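As a worked example, the approximately thirtyfold increase in sound pressure attributed to the middle ear can be converted into decibels. The conversion uses the usual definition of level differences for pressure ratios. Here $p_{\text{out}}$ is the pressure delivered to the inner ear and $p_{\text{in}}$ the pressure at the eardrum. Because the factor of 30 is the text's approximate figure, the result is approximate too:

```latex
\Delta L = 20 \log_{10}\!\left(\frac{p_{\text{out}}}{p_{\text{in}}}\right)
         \approx 20 \log_{10}(30)
         \approx 20 \times 1.477
         \approx 29.5\ \text{dB}
```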

In the classical view of ear, nose and throat medicine, the regulation of these muscles is mostly a reflex. It works much as your hand withdraws from a hot stove before you even realize the stove is hot. According to Tomatis, however, emotional processes also significantly influence these muscles [1, 12]. Suppose our emotional brain says, “I want to hear this voice because it sounds so melodious and warm.” The middle ear muscles then contract unconsciously. Much of the information in the human voice lies in the middle and higher tonal range. Tensing the middle ear muscles and eardrum amplifies these medium and high tones and transmits them to the inner ear, so the voice stands out against deeper ambient noise. This explains why, with normal hearing, we can follow the voice of the person we are talking to despite a wild babble of voices. Our gut and brain say “yes” to the other person’s voice, and our middle ear muscles tense appropriately.

Stephen Porges explicitly assigns the middle ear muscles a role in this context: they are part of the SES. Together with the ventral branch, they tense whenever our brain perceives that we are in a state of security and well-being. In terms of child development, this means that the tensing of our middle ear muscles is most likely associated with an experience of secure attachment. In this sense, Caruso’s voice touches us not only because of its richness and exceptional timbre. It also touches us because our ANS associates it with comforting safety and exuberant joy, qualities of the ventral branch.

Systemic listening therapy is based on Dr. Tomatis’ research. In it, we conduct special hearing tests, a particular type of audiogram. These tests determine the sensitivity of a person’s hearing across a range of tones. A hearing test that shows a special emphasis on the medium and high tones described above indicates good voice control, good interpersonal skills and secure attachment.

During listening therapy tailored to their hearing tests, clients listen to Mozart’s music through special headphones. A sound converter alters the music. A fundamental principle of listening therapy is the so-called “noise gate”, which provides a kind of “musical micro-gymnastics” for the middle ear muscles. The emphasis of the music alternates constantly between low and high frequencies. The emphasis on low frequencies relaxes the middle ear muscles. When the eardrum is somewhat relaxed, deep tones are perceived better, and clients usually respond by relaxing deeply. The music then switches again, which stimulates the tensing of the middle ear muscles. High notes are transmitted better, and clients typically become more alert.
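The switching behaviour of the noise gate can be sketched as a simple two-channel gate. The source does not say what triggers the switch. For illustration only, this Python sketch assumes the gate follows the loudness of the music: loud passages go through a high-emphasis channel and quieter ones through a low-emphasis channel. A small hysteresis band keeps the gate from flickering around the threshold. The threshold, hysteresis and frame length are hypothetical. The actual low- and high-emphasis filtering is left out; the code only decides which channel each frame would use.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level of each complete frame in dB relative to full scale (dBFS)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-12)))  # guard against log(0)
    return levels


def gate_channels(levels_db, threshold_db=-20.0, hysteresis_db=3.0):
    """Assign each frame to a 'low' or 'high' emphasis channel.

    Assumption for illustration: switch to 'high' when the level rises above
    threshold + hysteresis/2, and back to 'low' when it falls below
    threshold - hysteresis/2. Between the two limits the current channel is kept.
    """
    upper = threshold_db + hysteresis_db / 2
    lower = threshold_db - hysteresis_db / 2
    channel = "low"
    out = []
    for level in levels_db:
        if channel == "low" and level > upper:
            channel = "high"
        elif channel == "high" and level < lower:
            channel = "low"
        out.append(channel)
    return out


# Synthetic test signal: quiet, loud, quiet passages (about -29 and -9 dBFS).
quiet = [0.05 * math.sin(2 * math.pi * i / 50) for i in range(300)]
loud = [0.5 * math.sin(2 * math.pi * i / 50) for i in range(300)]
signal = quiet + loud + quiet
print(gate_channels(frame_levels_db(signal, 100)))
# prints ['low', 'low', 'low', 'high', 'high', 'high', 'low', 'low', 'low']
```

The hysteresis makes the gate change channel only on a clear change in level, which keeps the alternation between the two emphases distinct rather than jittery.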

Based on the Polyvagal Theory, it can be assumed that the noise gate generates a pulsating activation, especially of the ventral branch. The noise gate’s emphasis on high frequencies activates the ventral, social branch. Its emphasis on low frequencies stimulates the “good” dorsal nucleus, which mediates a sense of security and “loving peace”. This is important because many recent studies indicate that activating the vagus nerve, in particular its ventral branch, promotes an attitude of attentiveness and meditation. The “musical micro-gymnastics” of the noise gate is thus much more than muscle training. It acts on the ANS and can lead to an experience of peace, letting go and well-being.

The therapy can be used very successfully with clients suffering from burnout, anxiety, depression or tinnitus, specifically to address the physical aspects of these disorders. For example, the heart’s “reins” are pulled too tight during depression, and listening therapy can loosen them. Once these reins are loosened and recover an individual rhythm of activity and rest, strength, timbre and joy can return to a tense or fragile voice. By finding oneself again, it is also possible to find others.


1 Tomatis A.A.: L’oreille et la vie. Itinéraire d’une recherche sur l’audition, la langue et la communication. Paris, Robert Laffont, 1977. (English: The Ear and the Voice. Scarecrow Press, 2004.)

2 Dieroff H.G.: Lärmschwerhörigkeit. (Noise-induced hearing loss) Jena, Gustav Fischer, 1994, p. 46.

3 Porges S.: Die Polyvagal-Theorie: Neurophysiologische Grundlagen der Therapie. (The Polyvagal Theory: Neurophysiological Foundations of Therapy) Paderborn, Junfermann Verlag, 2010.

4 Sympathetic nervous system. http://en.wikipedia.org/wiki/Sympathetic_nervous_system

5 Cannon W.B.: The mechanism of emotional disturbance of bodily functions. The New England Journal of Medicine 1928;198:877-884.

6 Darwin C.R.: The Expression of the Emotions in Man and Animals. London, John Murray, 1872.

7 Schnack G.: Der Grosse Ruhe-Nerv. 7 Sofort-Hilfen gegen Stress und Burnout. (The calming parasympathetic nervous system. 7 immediate tips against stress and burnout.) Freiburg i.Br., Kreuz, 2012.

8 Bauer J.: Das Gedächtnis des Körpers. Wie Beziehungen und Lebensstile unsere Gene steuern. (Body memory. How relationships and lifestyles control our genes.) München, Piper, 2008.

9 Kreuzer P.M., Vielsmeier V., Langguth B.: Chronic tinnitus: an interdisciplinary challenge. Dtsch Arztebl Int. 2013;110(16):278–284.

10 Levine P.A.: Waking the Tiger: Healing Trauma. Berkeley, CA, North Atlantic Books, 1997.

11 Jackson J.H.: Evolution and dissolution of the nervous system; in Taylor J (ed): Selected Writings of John Hughlings Jackson, vol 2. New York, Basic Books, 1958, pp 45–75.

12 Tomatis A.A.: La nuit utérine. Paris, Editions Stock, 1981.