Voice or speaker recognition technology is the ability of a digital device to receive, recognize, and interpret spoken commands, and to carry them out, using sound input as an interface.
Voice recognition technologies backed by artificial intelligence (AI) are gaining traction and are set to reach billions of people in the decades ahead. Because it is human nature to seek convenience and social connection, many of us find it comforting to “talk” to a machine when interacting with it, instead of typing on a keyboard or navigating with a mouse.
Citing a report from Juniper Research, TechCrunch estimates that there will be at least 8 billion voice assistants in use by the year 2023. Google has announced plans to integrate its Assistant into one billion Android devices, and as of last year, Amazon’s Alexa was conversing with 100 million users.
Advancements in voice technology
The biggest names in tech are leading the way in AI, AI assistants, and voice recognition technology. Last August, Amazon and Microsoft announced that Alexa and Cortana would officially access each other’s features to extend their reach. In a manner of speaking, the AI voice tools of two competing companies will soon be “talking” to each other.
To give one possible outcome: Google Duplex’s smart speakers will allow full-stack conversation with machine AI, letting virtual assistants carry out human-like interactions instead of relying on a one-way command-and-control interface. It sounds straight out of science fiction: instead of the device owner placing a call, Google Duplex’s AI makes appointments on the owner’s behalf by interacting directly with the person on the other end of the line. Right now, Google Duplex is used mostly for restaurant reservations, but give it a few years and its capabilities may extend to appointments with doctors or school principals, or even to arranging face-to-face business meetings.
Current limitations in AI fuzzy logic
Conventional machine logic is Boolean, meaning it can only evaluate statements as true or false (1 or 0). Fuzzy logic expands these capabilities, letting AI represent the degrees of uncertainty needed to interpret the full range of sounds in human voices. Variations in accents, pronunciation, phonetics, homonyms, synonyms, and slang all present challenges in machine-human voice interaction.
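The contrast can be illustrated with a toy sketch (the thresholds and the similarity scores below are invented for illustration, not taken from any real speech engine): a Boolean decision forces a hard yes/no on an acoustic match score, while a fuzzy membership function keeps a partial match alive for later disambiguation.

```python
def boolean_match(score, threshold=0.6):
    # Conventional Boolean logic: a hard yes/no at a fixed threshold.
    return score >= threshold

def fuzzy_membership(score):
    """Map a similarity score in [0, 1] to a *degree* of membership
    in the set "matches the word", rather than a hard true/false.
    The 0.3 and 0.8 bounds are arbitrary illustrative choices."""
    if score <= 0.3:
        return 0.0
    if score >= 0.8:
        return 1.0
    return (score - 0.3) / (0.8 - 0.3)  # linear ramp between the bounds

# Two accent variants of the same word might score 0.9 and 0.55.
# Boolean logic accepts one and rejects the other outright;
# fuzzy logic records the second as a 50% match instead of discarding it.
print(boolean_match(0.55))     # False
print(fuzzy_membership(0.55))  # 0.5
```

Real recognizers use learned probabilistic models rather than hand-set ramps, but the principle is the same: graded confidence instead of a binary verdict.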
The majority of voice tech today uses English as its base language. But the challenges of supporting the native language of a particular group, people, or location are slowly being resolved through machine learning and fuzzy logic. Supporting other languages is becoming a priority for developers who want to expand their market reach.
Applications and advantages
Voice-activated commands and machine interaction have captured society’s imagination since science-fiction media first introduced them; the female computer voice of Star Trek: The Original Series was one of the pioneers.
Over time, users have learned to enjoy the convenience of interacting with machines, especially when spoken instructions lower the learning curve needed to operate digital devices.
That convenience further expands the reach of e-commerce. One Accenture study projected $40 billion worth of voice-driven purchases by 2022 in the USA and UK alone.
We can only expect consumer purchases made through voice technology to increase as more people find the experience satisfying and the technology improves alongside commercial growth.
Voice technologies also simulate a more personal interaction with machines. Whether it is just a curious novelty or a smart, compelling replication of the human voice, the (simulated) personalized service provided by a fully interactive AI assistant can drive more consumers to use voice tech and, eventually, to make purchases.
AI and machine learning are not only applicable to the accuracy of voice interaction. They can also predict consumer tastes and preferences and, in turn, recommend the best purchase options. We can already see this on YouTube, Amazon, and Facebook: content is curated based on the user’s previous engagement, which further enhances the experience by prioritizing content (including advertisements) that interests them.
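A deliberately simple sketch of this engagement-based curation might look like the following (the items, tags, and scoring rule are invented for illustration; production systems use learned ranking models, not tag counts):

```python
from collections import Counter

def recommend(history, catalog, top_n=2):
    """Rank catalog items by how often their tags appear in the
    user's engagement history. A toy stand-in for the learned
    models that real platforms use."""
    tag_weights = Counter(tag for item in history for tag in item["tags"])

    def score(item):
        # An item earns one point per past engagement sharing a tag.
        return sum(tag_weights[tag] for tag in item["tags"])

    return sorted(catalog, key=score, reverse=True)[:top_n]

history = [
    {"title": "Smart speaker review", "tags": ["voice", "gadgets"]},
    {"title": "Alexa tips", "tags": ["voice", "how-to"]},
]
catalog = [
    {"title": "Best voice assistants", "tags": ["voice"]},
    {"title": "Garden tools", "tags": ["outdoors"]},
    {"title": "Voice shopping guide", "tags": ["voice", "how-to"]},
]

for item in recommend(history, catalog):
    print(item["title"])
# Voice shopping guide
# Best voice assistants
```

The same scoring idea applies equally to advertisements: content whose tags overlap most with past engagement is surfaced first.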
The future of voice technology
Salespeople are traditionally the lifeblood of any company. Their ability to close deals with prospects fuels the organization’s cash flow. Though more and more commercial and financial transactions are handled by technology, these still lack the winning, personable touch of a human salesperson, an appeal commerce has relied on for decades.
AI assistants can change that status quo, too. People are inherently skeptical of salespeople, recognizing that their objective is to earn commissions rather than to give the buyer the best option. AI assistants may not present that problem: their machine nature can make them appear more objective, more neutral, and perhaps even less crafty in dealing with a human customer.
Companies providing content and data to support voice tech platforms can gain a significant head start on their rivals. Digiday reports that 43% of companies have already made that investment. Inevitably, many others will follow suit.