How We Got to Alexa, and Where Do We Go from Here?

A Book on Voice Computing Raises Some Humanist Questions

August 16, 2019
by Ken Gordon

“While voice computing offers the world new powers and conveniences, we should not become so overawed that we forget to evaluate the many risks,” writes James Vlahos, at the end of Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think. “But voice, when done right, has the potential to be the most naturalistic technology we have ever invented. It’s a misconception that AI must necessarily be coldly algorithmic. We can infuse it with the best of our own values and empathy. We can make it smart, delightful, odd, and empathetic. With voice we can finally make machines that are less alien and more like us.”

Vlahos is right about both risk-assessment and the necessity of creating humane voice interfaces. Problem is, Talk to Me won’t really help future technologists with these endeavors. To be fair, this wasn’t in the book’s scope of work. Talk to Me’s job description goes something like this: Tell the story of how voice computing found its voice. The book provides a good overview of how we stumbled into the Era of Alexa, one that progressed one startup, one prototype, one Alexa Prize contest, at a time. In that respect, it’s a job well done. We meet the characters who created the virtual characters with whom we daily speak: Siri & Co.

The bulk of the volume is spent detailing how this tech was developed and deployed, with the occasional remark, from some techno-ethicist, or Vlahos himself, about potential dangers. To give an example, Vlahos cites a psychology professor named Peter Kahn, who “worries about a ‘domination model’ in which the user makes demands and receives rewards but doesn’t have to reciprocate. This, he says, is unhealthy for moral and emotional development, especially for children. At worst, the human can begin to abuse his power.” The issue gets two more paragraphs, and then the narrative moves on. This notion of teaching kids to use their voice as a form of mastery-over-others is precisely the point in “Alexa in the Age of Casual Rudeness.” I wanted a chapter—maybe several of them!—on this topic.


Yes, researchers like Sherry Turkle are alerting people to the dangers of synthetic relationships, but their voices are obviously less loud, less influential than those of the big tech companies, who are motivated by other considerations. Martin Buber, the great philosopher of I and Thou fame, writes that “Relation is reciprocity,” but few in the voice business seem overly worried that human-machine relations will always be lopsided, and that this might be a serious problem. Vlahos writes: “The key issue to consider is whether this new class of beings is detracting from relationships with actual humans.” He’s correct, but will the people who fund and sell these technologies consider this the key issue? Seems unlikely, to be honest.

As I read Talk to Me, I kept thinking: What would be profoundly valuable is a book that will help future voice computing innovators and companies reflect on what they’re doing. The long-term consequences, intended and otherwise, of voice should be watched carefully, and adjusted for. How might this happen? What are the questions to ask? What kind of longitudinal design process might work here? What would be the value of such a long-term approach?

The challenge with such questions is obvious: asking and answering them would slow down the innovation process. Fact is, there’s a wild race to create endlessly more realistic bots, to figure out search and voice, and to find the right business model. Most companies don’t have the vision or resources to consider the long-term effects of their tech (though the ones that do will reap serious benefits by ensuring that the voice systems they put out there work properly, safely, and well for the people using them). It’s probably necessary for some nonpartisan governmental oversight to monitor and evaluate the effects voice has on the population, though that seems highly unlikely in the deregulatory mood of the present time.

One thought: drug scheduling could be a good model here. (The relationship people have to their smart speakers seems mildly addicting; one can easily imagine how the dependencies will increase as the tech gets better and becomes more deeply embedded in our lives.) Talk to Me sketches out some of the risks involved (it’s clear that it will take time and patience to understand the effects of conversing with machines posing as people), and it would be beneficial to codify the degree and nature of these risks as we move forward.

Synthetic relationships might prove truly beneficial for those who are isolated and lonely (Sachin Jain talks about the massive health problems caused by loneliness), but they might also be easily—and frequently—abused by ordinary people who find the challenges of connecting in a nuanced way with other humans too troublesome. We see what social media has done to reinforce the cognitive biases of, well, everyone… imagine when this new, extremely intimate technology becomes good enough to mimic every form of human interaction—even negative and hateful ones. It’s an unsettling thought.

Talk to Me reminds us of how innovators, in general, are not overly focused on risk—because our job is to move toward an ideal future state, as soon as possible. We move through the iterative prototyping process until we bring a thing to market, and that’s that. But perhaps it shouldn’t be. Part of innovation’s job could be—could become—understanding the long-term consequences of a product or service on the public. Learning after launch makes great sense.

One of the best moments in the book is when Vlahos sits in on a meeting with the team behind Cortana, Microsoft’s virtual assistant, when they encountered an article about children using “slave talk” while conversing with bots. The team took a deep breath and tried to figure out how Cortana should respond when users ask the assistant about unseemly subjects like child pornography, rape, genocide.

Vlahos writes: “Natural-language understanding often fails to pick up on the nuances of meaning,” and then shows the Cortana team struggling to find appropriate answers to abhorrent queries.

You can see the ethical challenges here, as the team tries out responses such as “I can’t believe you are going there” and “I can’t believe I have to protest this bullshit” before finally agreeing on the single word “Horrible.”

Now “Horrible” might or might not be a sufficient response. Putting the reader into that room is so valuable, I think, because it models ethical design for future conversational designers. These people need to understand what ethical situations look like and sound like, and to take part in such conversations as early as possible. I don’t believe that ethics is currently a substantial part of design (and design education). It should be, as Brian LaRossa suggests in Design Observer.

I would have loved for Vlahos, or his publisher, to produce an appendix, maybe a supplement, to the book, one for companies and designers, as a long-term framework for creating humanist bots. I’m thinking of the little book Jim Collins wrote for social innovators after he published Good to Great. Or it would be good to see a series of questions that builders of future bots should ask themselves to ensure that this tech is being developed in a way that will serve, and not enslave, those who use it. A good model is Andrew McAfee and Erik Brynjolfsson’s Machine, Platform, Crowd, which lists, at the end of each chapter, some succinct questions, distilled from the text, for readers to ask themselves.

Look at me wishing for another book! That’s a lot to ask. Maybe this wished-for text shouldn’t be more writing… but a conversation between some humans. Hey, James Vlahos: come onto our Resonance Test podcast and let’s talk about it!

Photo by Jan Antonin Kolar on Unsplash
