Databricks Simplifies Data Engineering Processes, Shortens Iterative Cycles for Core Technologies
Databricks, provider of the leading Unified Analytics Platform powered by Apache Spark™, today announced that Voicebox, the voice technology supplier for the automotive, mobile, home and IoT markets, has selected the Databricks Unified Analytics Platform to reduce the time to market of its voice recognition technology. Since incorporating Databricks’ Unified Analytics Platform in 2016, Voicebox’s natural language technology has been used to process over a billion spoken statements per month.
Register to hear more about Voicebox’s experience with the Databricks Unified Analytics Platform in a webinar taking place Wednesday, March 28 at 10:00 am Pacific Time: Data Contributions for a Conversational AI Platform.
Voicebox is the leader in conversational Artificial Intelligence (AI) development and has been building award-winning voice applications for over fifteen years. The company uses patented context management algorithms to model human conversation and go beyond the current one-question, one-answer paradigm. The technology includes components for automated speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS). Voicebox also supplies tools for authoring conversational domains and a set of domain showcases that use code and UX examples to demonstrate how to apply advanced ASR and NLU capabilities to build exceptional conversational assistants.
For example, when a Voicebox customer says, “Play the latest song from Coldplay,” the ASR component must map the spoken phonemes to the word “Coldplay,” while the NLU component maps “Coldplay” to an appropriate database query or ID value. Maintaining this capability in real time becomes very complex when the data is always changing, such as when new artists, albums, and songs are released. Customers don’t like waiting for lengthy updates to the company’s machine-learned ASR and NLU models. Prior to Databricks, Voicebox’s entity extraction pipeline used a custom in-house solution for cleaning the data, training models, and delivering them to customers. This in-house solution was difficult to maintain and often error-prone.
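The NLU half of that example, mapping a recognized name to an ID value and keeping the mapping current as new artists appear, can be illustrated with a minimal Python sketch. It is not Voicebox’s implementation: the `EntityCatalog` class, the `normalize` helper, and the ID strings are hypothetical names chosen for illustration, and a plain dictionary lookup stands in for the machine-learned models described above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(without_punctuation.split())


@dataclass
class EntityCatalog:
    """Hypothetical in-memory catalog mapping entity names to ID values."""

    ids_by_name: Dict[str, str] = field(default_factory=dict)

    def refresh(self, records: List[Tuple[str, str]]) -> None:
        """Merge new (name, id) records, e.g. from a scheduled pipeline run."""
        for name, entity_id in records:
            self.ids_by_name[normalize(name)] = entity_id

    def resolve(self, transcript: str) -> Optional[str]:
        """Return the ID of the longest catalog name found in the transcript."""
        words = normalize(transcript).split()
        # Try longer word spans first so multi-word names win over substrings.
        for length in range(len(words), 0, -1):
            for start in range(len(words) - length + 1):
                candidate = " ".join(words[start:start + length])
                if candidate in self.ids_by_name:
                    return self.ids_by_name[candidate]
        return None


catalog = EntityCatalog()
catalog.refresh([("Coldplay", "artist:001")])  # hypothetical ID value
print(catalog.resolve("Play the latest song from Coldplay"))
```

In this sketch a newly released artist resolves only after `refresh` is called with the new record, which is the kind of update delay an automated pipeline is meant to shorten.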
With the Unified Analytics Platform, Voicebox is able to build, schedule, and run automated production data pipelines that keep its ASR and NLU deep learning systems up to date. The company’s use of Databricks to build a performant data processing pipeline also enables Voicebox to measure more precisely the latency of each intermediate step before the end user sees results. As a result, Voicebox is able to operate at higher peak traffic volumes without sacrificing latency. Voicebox’s cloud platform captures anonymized audio recordings, then uses Databricks’ unified analytics in a crowdsourcing pipeline in which external and internal crowds evaluate accuracy. Thanks to CIP (Continuous Improvement Processes) and Databricks, these efforts have increased end-customer-facing ROI metrics such as ASR accuracy and end-to-end accuracy by between 14 percent and 24 percent since the program began; NLU accuracy has increased by over 3 percent.
“Our engineering team functions like night and day with the Databricks platform in terms of reliability and speed,” said Peyvand Khademi, director of Data Platform and Services at Voicebox. “The Unified Analytics Platform has reduced engineering complexity, facilitated a streamlined workflow, and enabled a shorter development cycle for our core AI voice recognition technologies, products, and solutions. It would be entirely fair to say that we have overhauled our engineering processes in the Voicebox Data Team since switching to Databricks.”