Alexa is many things to many people. Now she’s become, of all things, a mirror. Amazon just announced her latest skill, one that will close the emotional feedback loop for tens of millions of smart speaker users: Alexa can now respond to us in a variety of tones (excited, say, or depressed), giving her the chance to reflect, or even influence, our own emotions. Sensing and expressing emotional affect through voice will make human-to-AI conversations feel less robotic, but it’s also a critical ingredient for voice interactions that support healthcare, transportation, hospitality, retail, and more.
Mirroring or Countering Emotions?
Imagine chatting with your personal assistant in your car late at night. You’re driving back from New York to Boston and the Connecticut Turnpike is a bore. When I led the Innovation Center at Viant, we built a prototype called “Soundtrack for Your Life”: a virtual DJ that knew your tastes, recognized your context, knew who you were with, and sensed your emotional state. (Playing music that everyone in the car/office/home likes is still a killer app, Spotify.) The key insight from that prototype: Sometimes you want music to mirror your mood; other times you rely on it to change your mood. So for Alexa interactions, in which contexts do we want an emotional mirror, and in which a regulator?
If you knew someone was driving at 70 mph, alone, and at risk of falling asleep in the car, you wouldn’t play melancholy music, obviously. In this case you want a regulator: Pump up the dance mix, please!
Conversely, after that Thanksgiving political debate with your moronic uncle, or a heated meeting that leaves you overly stressed, it’s useful to play Pablo Casals’ Bach suites to decompress.
For a society with an epidemic of chronic stress, emotional-state regulation is a real opportunity. Simply identifying and naming emotional states is useful for most of us, and especially for kids on the autism spectrum. My friend Jason Kahn started a company with Boston Children’s Hospital called Mightier, which helps these kids become more aware of their emotional states. It does this through an armband that measures heart rate variability and galvanic skin response, and provides a feedback loop in a form kids respond to—by playing a video game. Calming the body automatically increases the force field around your spaceship in the game, so it’s easier to win.
Eventually, playing the game makes self-regulation easier. This form of biofeedback is great for learning, especially when it’s entertaining too. But it requires wearing an armband and playing within the Mightier app. Amazon’s innovation could provide a more subtle and effective behavioral feedback loop for a massive audience, one where the input is how you speak and the output is expressed as vocal style.
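A Mightier-style loop can be sketched in a few lines. This is my own toy reconstruction, not Mightier’s actual algorithm: the sensor ranges, the weighting, and the `calm_score` and `shield_strength` helpers are all hypothetical, chosen only to show how calmer physiology (higher heart rate variability, lower galvanic skin response) could strengthen an in-game shield:

```python
# Toy biofeedback mapping: calmer body -> stronger shield.
# Thresholds and weights are illustrative, not Mightier's real values.

def calm_score(hrv_ms: float, gsr_microsiemens: float) -> float:
    """Combine two bio-signals into a 0..1 calm score (toy normalization)."""
    hrv_part = min(hrv_ms / 100.0, 1.0)                 # higher HRV -> calmer
    gsr_part = 1.0 - min(gsr_microsiemens / 20.0, 1.0)  # lower GSR -> calmer
    return 0.5 * hrv_part + 0.5 * gsr_part

def shield_strength(hrv_ms: float, gsr_microsiemens: float) -> int:
    """Map the calm score to a 0..100 shield value the game can render."""
    return round(100 * calm_score(hrv_ms, gsr_microsiemens))

# A stressed player (low HRV, sweaty palms) gets a weak shield;
# the same player after calming down gets a much stronger one.
print(shield_strength(hrv_ms=25, gsr_microsiemens=18))
print(shield_strength(hrv_ms=90, gsr_microsiemens=4))
```

The point of the loop is that the player can’t raise the number by trying harder at the game—only by calming down.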
Empathetic Voice Interaction
If a friend calls to say she didn’t get a much-wanted job, or that her date was a flop, the worst thing you can do is brush it off quickly with, “Cheer up—it’s not the end of the world!” A high-EQ friend will mirror her affect, acknowledge her pain, and sit in her discomfort. This mirroring builds psychological and emotional intimacy. As my psychoanalyst wife would say, “Empathic listening is the first step to creating a therapeutic alliance.”
I’m not suggesting that Alexa will automate psychotherapy anytime soon, but the Alexa team has launched something quite profound by using the voice channel to express emotion. Why? Because voice is pervasive, and an emotion-laden voice is subtle, even subconscious.
Here’s a thought experiment: Through what other modality might you reflect emotional states? What about visual cues? A big meter? Colored light? The facial expressions of a social robot?
At the MIT Media Lab, I created an ambient feedback device for team meetings called “The Balanced Table,” which shows whether a conversation is in balance. That is, are people attuned to turn-taking in conversation? This “balance” data is illustrated on the table itself by slowly illuminating a constellation of LEDs in front of the person who’s talking. Over the course of 10 to 15 minutes, we found, people glance down at the surface and naturally encourage introverts to pitch in ideas. Or, for our extroverts, to just stop talking already.

(Embedded video: https://vimeo.com/244740596)
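The table’s turn-taking metric can be illustrated with a toy version. This is my own reconstruction, not the Media Lab code: `balance` scores how evenly talk time is spread across participants (normalized entropy of each person’s share), and `led_levels` maps each speaker’s share of the conversation to LED brightness:

```python
# Toy version of a turn-taking "balance" meter (my reconstruction,
# not the actual Balanced Table code).

import math

def balance(talk_seconds: list) -> float:
    """Normalized entropy of the talk-time distribution:
    1.0 = perfectly balanced, near 0 = one person doing all the talking."""
    total = sum(talk_seconds)
    if total == 0 or len(talk_seconds) < 2:
        return 1.0
    shares = [t / total for t in talk_seconds if t > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(talk_seconds))

def led_levels(talk_seconds: list, max_brightness: int = 255) -> list:
    """Brightness of the LED constellation in front of each speaker,
    proportional to their share of the conversation."""
    total = sum(talk_seconds) or 1.0
    return [round(max_brightness * t / total) for t in talk_seconds]

print(balance([300, 290, 310]))  # three people sharing the floor: near 1.0
print(balance([850, 30, 20]))    # one extrovert dominating: well below 0.5
```

Normalized entropy is just one convenient choice here; any dispersion measure over talk-time shares would drive the LEDs equally well.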
We considered displaying affect through color on the table (depressed as blue, raging as red) but I thought it seemed too public—and potentially embarrassing—to expose this signal in a work context, and the table was already trying to do a lot.
I think Alexa’s voice is perfect in this regard: just the right modality and channel to understand and “echo” the excitement of a question such as “Who won the Patriots game?” Answer, delivered with enthusiasm: “The Pats, of course!”
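Skill developers reach these tones through SSML markup. Here is a minimal sketch, assuming Amazon’s documented `amazon:emotion` SSML extension (which launched with the `excited` and `disappointed` speaking styles and low/medium/high intensities; availability varies by locale); the `emotional_ssml` helper itself is hypothetical:

```python
# Hypothetical helper that wraps a response in Amazon's emotion SSML
# so Alexa delivers it with an excited or disappointed tone.
# Tag and attribute names follow Amazon's documented amazon:emotion
# extension; the validation sets reflect the styles available at launch.

EMOTIONS = {"excited", "disappointed"}
INTENSITIES = {"low", "medium", "high"}

def emotional_ssml(text: str, emotion: str, intensity: str = "medium") -> str:
    """Return an SSML string Alexa can speak in the given emotional style."""
    if emotion not in EMOTIONS or intensity not in INTENSITIES:
        raise ValueError("unsupported emotion or intensity")
    return (f'<speak><amazon:emotion name="{emotion}" intensity="{intensity}">'
            f"{text}</amazon:emotion></speak>")

print(emotional_ssml("The Pats, of course!", "excited", "high"))
```

The developer still chooses when each tone fires; the synthesis engine only changes how the words are said.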
When delivering news about politics, climate change, or fake news, Alexa could use an appropriately depressed tone. The demo Amazon provided was in the context of a game, where the emotional state relates to whether the AI is ahead or behind. Gaming is an appropriate testbed to experiment in before emotionally coloring the news, the weather (“Get ready for a high-pressure system! So excited!”), or letting you know your sick-and-depressed mood is now entering its second week.
My question for you: To what extent will these emotion-colored conversations affect our own mood, or change our behaviors? For example, if Alexa says, “So excited for our team meeting at 11 o'clock!” are we psyched? Or does the “excitement” seem manufactured and fake, and even backfire?
An Opportunity for Subconscious Nudging?
If Alexa’s responses to your Amazon Fresh orders for healthy choices come back with positive feedback (“Good for you, David!”), while the third box of Oreos brings a depressed confirmation (“The order is in, but it’s not good to eat your emotions”)—will you change your order next time?
As personal digital assistants are given access to more context—more bio-signals through Apple Watches (not likely), Google’s Fitbits (also not likely), and Amazon home security cameras like the Ring doorbell (very likely), plus population data on food-purchase behavior—voice-based emotional feedback will color interactions across a huge range of topics. Over time it will become more pervasive, woven into the fabric of our lives: Alexa isn’t only in our kitchens, bedrooms, living rooms, and microwaves; she also comes along for the ride in our cars, and she is worn on our faces and even our ring fingers. For better or for worse, we are wedded.
We know that people mirror the emotional states of others, but will we mirror our personal digital assistants in the same way? Is this type of feedback loop most valuable to a specific population like kids on the spectrum, or is emotional awareness something we all need?