Designing for the voice interface. Part one

This short introduction looks at what needs to be considered before deciding whether voice is an appropriate interface for your product.


With the ever-increasing power of artificial intelligence, more and more vendors are launching systems with voice user interfaces (VUIs). Some of these systems allow developers to integrate with them by creating what Google calls ‘conversation actions’ for Google Assistant on its Google Home hardware, or what Amazon calls ‘skills’ on its Echo hardware.

The design of the VUI builds on the early days of telephone-based interactive voice response (IVR) systems. Modern systems powered by artificial intelligence (AI) benefit from a massive improvement in recognition accuracy (especially in noisy environments), helped not only by advances in AI but also by improvements in noise-cancelling technology, which were driven by mobile device manufacturers' desire to improve clarity in human-to-human conversations.

In a recent report on current internet trends, produced by the venture capital firm Kleiner Perkins Caufield Byers (KPCB) [1], the benefits of voice technology were listed as speed (humans can speak 150 words per minute on average, but type only 40), ease (convenient, hands-free, instant) and personalisation (context-driven, keyboard-free, with the ability to understand the wide context of questions based on prior questions, interactions, location and other semantics). The report's authors appear to have overlooked the benefits of voice for users with disabilities that prevent them from using other interfaces without major adaptation and effort on the user's part.
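To put the speed figures in perspective, here is a rough sketch of the arithmetic. The two rates are the averages quoted from the report; the 30-word message length and the function name are assumptions made purely for illustration, and the calculation ignores the time a user might spend correcting recognition or typing errors.

```python
# Average rates quoted from the KPCB report.
SPEAKING_WPM = 150
TYPING_WPM = 40


def seconds_to_enter(word_count: int, words_per_minute: int) -> float:
    """Time in seconds to enter `word_count` words at a steady rate."""
    return word_count * 60 / words_per_minute


message_words = 30  # hypothetical message length, chosen for illustration
spoken = seconds_to_enter(message_words, SPEAKING_WPM)
typed = seconds_to_enter(message_words, TYPING_WPM)
print(f"Spoken: {spoken:.0f} s, typed: {typed:.0f} s, ratio: {typed / spoken:.2f}x")
# prints "Spoken: 12 s, typed: 45 s, ratio: 3.75x"
```

On these averages speaking is a little under four times faster than typing, whatever the message length.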

Before deciding whether a VUI is appropriate for the project we are working on or the product we are designing, we need to explore the needs of both the business and the user.

What are the reasons that a business may choose to use a voice user interface?

Businesses need to be very sure why they feel the need to use voice interfaces, and whether they need voice on its own or a multi-modal approach combining voice with other input/output methods.

A business may be motivated to use a VUI for a number of reasons, such as:

  • Economy (reduces the need for expensive call centres)
  • New business (VUIs, especially smart digital assistants, provide another channel for customers to access products and services)
  • Competition (getting to a new market first)
  • A new way to solve existing or new problems (giving the business a market advantage)
  • Providing a better user experience
  • The technology required (connectivity, microphone, speaker, processor) already exists and is carried by a large proportion of users in their smartphones

In addition, the authors of [2] suggest that the following criteria should be considered when deciding if and when a VUI would be appropriate:

  • When the user’s hands or eyes are busy
  • When only a limited keyboard and/or screen is available
  • When pronunciation is the subject matter of computer use
  • When natural language interaction is preferred [2]

If a business decides that a VUI is appropriate to its needs, there are three choices:

  • A custom solution
  • An 'off the peg' solution provided by a speech technology vendor
  • Plugging in to a third-party virtual assistant

The author of [3] is clear that the business is ultimately concerned about the bottom line, which is, 'Am I getting the ROI I expected from this automated solution?' and 'Is customer service being enhanced?' This will force the company into a prioritization of business opportunities (i.e., which customers/leads/tasks are most valuable to the business and should be handled by a live agent, and which are less valuable or have less impact on customer engagement and can be handled via automation?).

But what about the customer needs?

Although the authors of [4] suggest that there are also many appealing advantages from the point of view of the end user, who may find a VUI intuitive, efficient, ubiquitous or enjoyable, the main motivation when designing a digital product must be fulfilling user needs and identifying the tasks that users complete in order to meet those needs. It must not be forgotten that, in addition to task-specific needs, all users have, to a greater or lesser extent, the more general needs of security and privacy and, just as importantly, the need not to feel stupid or embarrassed. These concerns may make users less likely to use a VUI, especially in public, so other methods of completing the task will have to be considered.

Even with those needs identified, will users still be wary of using what may seem to be a very 'techy' system? Perhaps not: as [3] expresses it, end users are becoming increasingly technology-savvy, and thus speech-enabled applications must evolve quickly to meet consumer needs. Users expect highly effective, efficient solutions that are likable and quickly learned.

Users don’t like to feel that they are battling an impenetrable wall of malfunctioning and misbehaving technology, so the VUI will need a very low failure rate. Speech recognition accuracy is currently 95% [1], and Andrew Ng, chief scientist of the technology giant Baidu, is quoted in the KPCB report as saying that when accuracy gets to 99%, 'all of us in the room will go from barely using it to using it all the time. 99% is a game changer. No one wants to wait 10 seconds for a response. Accuracy, followed by latency, are the two key metrics for a production speech system.' [1]
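A back-of-the-envelope sketch suggests why the step from 95% to 99% matters so much. It assumes that the quoted accuracy is a per-word figure and that errors on each word are independent; both are simplifications, and the function name and the 10-word utterance length are hypothetical choices made for illustration.

```python
def utterance_success_rate(word_accuracy: float, words: int) -> float:
    """Probability that every word in an utterance is recognised correctly,
    assuming independent per-word errors (a simplification)."""
    return word_accuracy ** words


for accuracy in (0.95, 0.99):
    rate = utterance_success_rate(accuracy, words=10)
    print(f"{accuracy:.0%} per word -> {rate:.1%} of 10-word utterances error-free")
# prints:
# 95% per word -> 59.9% of 10-word utterances error-free
# 99% per word -> 90.4% of 10-word utterances error-free
```

Under these assumptions, roughly four in ten short requests would contain at least one misrecognised word at 95% accuracy, compared with about one in ten at 99%.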

Although there will always be a temptation to use the 'latest bit of kit', really understanding how the business and user needs will best be met will point you in the direction that the project should take. The voice interface may have many interesting uses, but it may become annoying or frustrating for users if it is not applied wisely.

Hopefully this short article will help get the conversation started within your teams and help in your discussions with the product stakeholders.

In part two I will be looking at user research.


  1. 2016 Internet trends report. Kleiner Perkins Caufield Byers. Retrieved January 15, 2017.
  2. The role of voice input for human-machine communication. Proceedings of the National Academy of Sciences, 92(22), 9921-9927. doi:10.1073/pnas.92.22.9921
  3. Voice user interface design – purpose and process. Microsoft.
  4. Voice: User Interface Design (2nd ed.). Boston, MA: Addison-Wesley Educational Publishers.

Also published on Medium.
