
Kardome Mobility Empowers Automotive OEMs to Create Next-Gen Voice Interfaces with AI-Powered Technology

The primary barrier slowing the adoption of voice assistants is insufficient speech recognition accuracy. Additionally, challenges such as language coverage, users' expectations, security, cost, and complexity of deployment and integration must also be addressed to provide the best possible voice user experience. Kardome has developed a software solution for the mobility segment that dramatically improves existing in-car speech recognition systems to address these challenges.

Dr. Dani Cherkassky
CEO, Co-founder


Integrating natural language processing and machine learning has enabled a more seamless and intuitive in-car voice interaction experience, making voice assistants a crucial part of modern automotive technology.

However, despite their many benefits, some challenges still need to be addressed. The primary barrier slowing the adoption of voice assistants is insufficient speech recognition accuracy, which can be especially challenging in noisy and crowded car environments. Additionally, challenges such as language coverage, users' expectations, security, cost, and complexity of deployment and integration must also be addressed to provide the best possible voice user experience.

These barriers become even more prominent in a typical car: a small, noisy space often packed with people. In-car voice assistants' "hearing capabilities" must be enhanced to ensure sufficient speech recognition accuracy.

Due to the lack of technological solutions that provide human-level hearing capabilities for in-car voice assistants, some car manufacturers have deployed microphones close to each car seat. Such microphone networks are expensive in terms of the respective bill of materials (BOM), installation, and maintenance costs. As a result, most vehicles use a single microphone array in the overhead compartment, which limits access to the voice assistant to the driver.

To address these challenges, Kardome has developed a software solution for the mobility segment that dramatically improves existing in-car speech recognition systems.

Kardome Mobility is the only software solution to capture up to six speakers over three seat rows with a single mic array in the overhead compartment. It isolates the desired speech, reduces background noise and echo, and can identify the person speaking with voice biometrics.

Examining Challenges with Effective Speech Capture for In-Car Voice Technology

Acoustic Complexity in Vehicles

Cars are noisy environments. Engine sound dominates at low speeds and on smooth surfaces, while at high speeds, wind noise becomes more prominent. When driving in the city, the most significant contributor is road noise, i.e., the friction between the tires and the road surface. Although electric cars have less engine noise, road and wind noise still pose substantial challenges to speech recognition systems.

Moreover, with the trend toward shared mobility and autonomous driving, cars are often occupied by more than one person. This means conversational noise, interference between speakers, and noise from speakers’ devices are also issues.

Manufacturers must tackle the problems of background noise and multiple people talking in a vehicle to provide the best possible voice user experience. According to an In-Car Voice Assistant Consumer Adoption Report by Voice.bot, 60 percent of drivers say that voice assistant quality is a factor in their purchase decision, while 13 percent consider it a significant factor.

Moreover, it’s not just the driver and front passengers who expect access to voice assistance; all passengers in a car anticipate flawless voice access. As the use of autonomous vehicles increases, the demand for accurate voice interaction by car drivers and passengers will become even more crucial.

Complex Integration and Costs

Because they rely on beamforming, OEMs must deploy microphone arrays in the roof liner of a vehicle, one for each passenger, to enable a reliable voice user interface. Each microphone array uses a beamforming algorithm to direct sound capture toward the target speaker while attempting to mitigate driving noise and interfering speakers.

Deploying multiple microphone arrays for improved reliability carries significant cost. It is expensive in terms of BOM, installation, and maintenance, and it incurs high design costs because the microphone network must be customized for each car's interior.

Moreover, deploying multiple microphone arrays imposes design constraints and compromises the car's aesthetics. For example, vehicles with glass tops cannot accommodate microphones above the seats.

Why don't car manufacturers simply use a single microphone array in the overhead compartment and employ beamforming to steer speech capture toward every seat in the car? The answer is simple: it doesn't work.

Beamforming models a soundscape using a set of one-dimensional parameters called the "direction of arrival." However, in any enclosed environment, such as a vehicle, sound waves travel along a direct path and also bounce off the windows and panels of the car, eventually reaching the microphone array from hundreds of different directions.

Beamforming can only focus on a single path, leading to an incorrect representation of the actual sound environment. Consequently, beamforming technology fails to capture speech effectively if a speaker is more than 50 centimeters away from the microphones.
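For concreteness, the single-direction assumption behind beamforming can be sketched as a textbook frequency-domain delay-and-sum beamformer, which phase-aligns each channel toward one assumed direction of arrival. This is a generic illustration, not any vendor's implementation; when the real sound field contains hundreds of reflected paths, this one-direction model is exactly what breaks down.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Textbook delay-and-sum beamformer (frequency domain).

    signals:       (n_mics, n_samples) time-domain microphone signals
    mic_positions: (n_mics, 3) mic coordinates in metres
    direction:     unit vector pointing toward the assumed source
    fs:            sample rate in Hz; c is the speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    # The model reduces the whole soundscape to one delay per mic:
    # the plane-wave travel time along the assumed direction of arrival.
    delays = mic_positions @ direction / c               # seconds
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)         # Hz
    spectra = np.fft.rfft(signals, axis=1)
    # Phase-align every channel to that single direction, then average.
    aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Energy arriving from the assumed direction adds coherently while other directions partially cancel; reflections arriving from many directions at once violate the model's core assumption.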

Kardome’s Innovations

Kardome's core innovation is spot-forming, a proprietary, multi-dimensional soundscape analysis method that decodes spatial cues, such as echoes, by extracting the relative location between each sound source in the environment and the microphone array.

Spot-forming is a technology that can infer the entire reflection pattern produced by each sound source in a soundscape. It does so without requiring sound sources (people speaking) to take any action. The environment geometry and relative location between the sources and the device define the reflection patterns. As a result, spot-forming is a location-based technology that can classify speakers based on their position in space. 

Spot-forming overcomes the inherent modeling deficiencies of beamforming and accurately decodes the multi-dimensional soundscape in closed environments. Kardome's solution has practical benefits in cars since a single microphone array in the overhead compartment can create an acoustic zoom toward each occupant in a vehicle. 
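Spot-forming itself is proprietary and cannot be reproduced here, but the underlying intuition, that each source position imprints a measurable spatial signature on the array, can be illustrated with a standard inter-channel cue. The sketch below estimates the time-difference-of-arrival between two microphone channels with the classic GCC-PHAT method; a location-based system could, in principle, cluster such cues per seat. This is a generic textbook technique, not Kardome's algorithm.

```python
import numpy as np

def tdoa_phat(x, y, fs):
    """Estimate how much later channel y arrives than channel x
    (in seconds), using GCC-PHAT cross-correlation."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    # Whiten the cross-spectrum so only phase (i.e. timing) is kept.
    R = Y * np.conj(X)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
    max_shift = n // 2
    # Re-centre so negative lags sit to the left of zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

A speaker in the driver's seat and one in the third row produce distinct delay patterns across the array, which is the kind of positional cue a location-based classifier can exploit.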

Kardome Mobility is the only software solution to capture up to six individual people speaking over three seat rows with a single mic array.

Kardome Mobility

Based on the spot-forming framework, Kardome has developed a complete edge audio stack for the automotive industry called Kardome Mobility.

Kardome Mobility Includes the Following Functionalities: 

  • Spot-forming-based Audio Front End (AFE): Spot-forming’s 3D model uses reverberation to separate speech coming from different locations. The AFE includes multichannel acoustic echo cancellation, noise reduction, source (speaker) separation, and the ability to identify where in the vehicle speech is coming from.
  • Wake Word: Proprietary Edge recognition model designed only to start listening when it hears specific trigger words, such as “Alexa” or “Hey Siri.”
  • Voice Biometrics: Proprietary Edge model for identifying/authenticating a user based on the individual’s voice.
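The three components above form a pipeline: the AFE cleans and separates the audio, the wake word model gates further processing, and voice biometrics identifies the speaker before hand-off to ASR. A minimal sketch of how such a stack might be wired together follows; all class and method names are hypothetical illustrations, not Kardome's actual API.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    seat: int          # which capture spot the speech came from
    audio: list        # separated, echo- and noise-reduced samples

class VoiceStack:
    """Hypothetical wiring of an AFE, wake-word model, and voice biometrics."""

    def __init__(self, afe, wake_word, biometrics):
        self.afe = afe                  # multichannel AEC + separation + localisation
        self.wake_word = wake_word      # edge trigger-word detector
        self.biometrics = biometrics    # edge speaker-ID model

    def process(self, mic_frames, playback_reference):
        # 1. AFE: cancel the car's own audio output, then split the
        #    soundscape into one clean stream per occupied seat.
        utterances = self.afe.separate(mic_frames, playback_reference)
        results = []
        for utt in utterances:
            # 2. Only streams containing the trigger word go further.
            if not self.wake_word.detect(utt.audio):
                continue
            # 3. Identify who is speaking before handing off to ASR.
            speaker = self.biometrics.identify(utt.audio)
            results.append((utt.seat, speaker, utt.audio))
        return results
```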

The Kardome Mobility software package is a comprehensive voice stack built on spot-forming. The proprietary voice biometrics and wake word AI models run on top of it and are designed and trained to work within the spot-forming framework.

Kardome’s spatial hearing software allows machines to accurately recognize the speaker's voice, location, and speech content, even in the most challenging sound environments. By integrating Speech AI modules into the AFE training process, Kardome's approach delivers superior performance compared to a fragmented system that independently develops the AFE and Speech AI.

Kardome Mobility supports the following in-car use cases:

  • Communication
      • Hands-free Telephony
      • In-car communication (announcement mode only)
  • Voice Artificial Intelligence (AI)
      • Wake Word recognition
      • Voice Identification
      • Automatic Speech Recognition (through an interface with a third-party ASR engine)

The AFE module operates in two regimes: one provides input to the voice AI modules, and the other feeds communication modules such as Hands-Free Telephony (HFT). In the voice AI regime, the AFE separates the target speaker to improve speech recognition rates; in the communication regime, it optimizes speech quality at the output. The system switches its parameters automatically to maximize performance in both scenarios.
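As a rough illustration of such automatic regime switching, the sketch below toggles between two hypothetical AFE tuning profiles depending on whether a hands-free call is active. The profile names and parameter values are invented for illustration; the real tuning set is not public.

```python
from enum import Enum

class Regime(Enum):
    VOICE_AI = "voice_ai"        # optimise recognition rate: aggressive separation
    COMMUNICATION = "hft"        # optimise perceived quality: gentler processing

# Hypothetical tuning profiles; values are placeholders, not real settings.
PROFILES = {
    Regime.VOICE_AI:      {"separation_strength": 1.0, "noise_floor_db": -60},
    Regime.COMMUNICATION: {"separation_strength": 0.5, "noise_floor_db": -45},
}

def select_profile(call_active: bool) -> dict:
    """Switch the AFE tuning when a hands-free call starts or ends."""
    regime = Regime.COMMUNICATION if call_active else Regime.VOICE_AI
    return PROFILES[regime]
```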

Below is a block diagram of the AFE module that utilizes a microphone array consisting of eight MEMS microphones. Such a system has up to six sound-capturing spots across three seat rows using a single mic array in the overhead compartment.

We have summarized the performance of the AFE for speech AI applications and HFT in this study: https://bit.ly/speechrecognitionstudy

Seamless Integration with Hardware

Kardome Mobility is a software solution integrated into the infotainment system firmware.

OEMs can implement the software on the primary Application Processor (AP) as a Linux library or an Android application. Alternatively, Kardome Mobility can run on dedicated DSPs integrated into the AP silicon, such as Qualcomm's Hexagon DSP and Samsung's HiFi DSP, or on external dedicated chips. In all cases, the Kardome Mobility software must have access to the audio output reference signals for the Acoustic Echo Canceller.

Kardome Mobility can work with any microphone array of four or more microphones; the elements used are typically simple MEMS microphones. A typical microphone array for Kardome Mobility comprises eight MEMS microphones with overall dimensions of 20 x 50 x 5 mm. As mentioned, OEMs typically prefer to locate such a microphone array in the overhead compartment and interface it with the central infotainment system over an A2B audio bus.

Conclusion: Kardome Mobility's In-Car Voice Tech Revolution

Kardome Mobility is leading the way in revolutionizing in-car voice technology. Its spot-forming voice AI technology eliminates the limitations of traditional voice systems, empowering each passenger with a personalized voice interface. Kardome's innovative approach ensures precise and accurate voice recognition, paving the way for a truly connected and personalized driving experience.

Learn more about Kardome Mobility here: https://bit.ly/Kardome-Mobility


For more information on how Kardome Mobility can improve your vehicles’ voice user interfaces, schedule an appointment with one of our experts.