Clarity on Acoustic Echo Cancellation Operation Brings Clarity to Audio Calls

It’s a standing joke within the audio, video, and lighting (AVL) industry to refer to the work of AVL integrators as “the dark arts.” If that’s true, then Acoustic Echo Cancellation (or AEC) is a truly special kind of dark magic that many use, but only the most talented of audio wizards truly understand. A lot of abstract thinking is involved, which makes it hard to explain what AEC actually does. And without a firm grasp of how AEC functions, it’s hard to understand the various ways it can be applied.

AEC is an audio processing effect found in Digital Signal Processors (DSPs) designed for audio conferencing. It applies when a room with microphones and loudspeakers sits on the “near side” of a phone or online call and someone else is on the “far side.” The goal of AEC technology is to remove what’s referred to as “acoustic echo,” a byproduct of holding an audio call in a room with microphones and loudspeakers.

Acoustic echo occurs on a phone call when the far-side speech (from the distant person) is played over local loudspeakers in the near-side room. This audio is picked up by microphones in the near-side room and then transmitted back to the far side. Because of the inherent delays of transmitting audio from one geographical location to another, this transmitted signal is a delayed version of the original voice, thus sounding like an echo. In addition, acoustic elements in the near-side room can introduce more echo into the sound picked up by the microphones, thus increasing the amount of echo in the output signal.

The goal of AEC is to remove this echo in the signal. It essentially works by “telling” the microphone to ignore any sounds coming out of the loudspeaker. The AEC algorithm is “told” about audio coming out of the loudspeaker through a special input called the “reference.” Sending the right signal to the reference is critical. Here’s a useful rule of thumb:

“Send to the reference whatever you’re sending to the loudspeaker.”

This keeps the AEC algorithm well informed of any audio processing that ultimately alters the sound of the loudspeakers. In particular, care must be taken that any dynamic processing such as compression or limiting employed on the loudspeaker output signal is also applied to the reference signal. Following the above rule of thumb ensures this requirement is met.
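To make the role of the reference concrete, here is a minimal sketch of an adaptive echo canceller. It assumes a normalized least-mean-squares (NLMS) filter, a common textbook technique, and is not the algorithm used in any particular DSP. The function name, tap count, step size, and the simulated room (echo arriving 3 samples late at 60% amplitude) are all hypothetical values chosen for illustration.

```python
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    reference: what is being sent to the loudspeaker
    mic:       what the near-side microphone picks up
    """
    weights = [0.0] * taps   # running estimate of the loudspeaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    output = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate   # signal sent on to the far side
        power = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / power for w, x in zip(weights, history)]
        output.append(error)
    return output


# Assumed test signal: noise stands in for far-side speech.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical room: the echo is the far-side audio, delayed and attenuated.
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]

cleaned = nlms_echo_cancel(far_end, mic)
print("echo level before:", rms(mic[-200:]))
print("echo level after: ", rms(cleaned[-200:]))
```

The filter can only remove what it can predict from the reference, which is why the reference has to match what actually reaches the loudspeaker.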

[Figure: AEC wiring in HARMAN Audio Architect software]

There are several parameters in AEC, but the single most important one to consider is Echo Return Loss (or ERL). ERL measures how loudly the secondary audio (that is, the “echo” from the loudspeakers) is coming into the near-side microphone, and it indicates how closely the AEC reference signal matches the AEC input signal from the microphone. An ERL meter measures the room’s natural attenuation of the audio as it leaves the loudspeaker and re-enters the microphone. The higher the ERL reading, the harder it is to completely remove the echo.
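As a rough illustration of what such a meter compares, the sketch below computes the level of the echo at the microphone relative to the reference, in dB. The sign convention (a higher reading means a louder echo) is an assumption chosen to match the description above; the exact computation behind any specific product's meter is not given here, and the function names and test signals are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def erl_reading_db(reference, mic_echo):
    """Echo level at the mic relative to the reference, in dB.

    Assumed convention: higher reading = louder echo = harder to cancel.
    """
    return 20.0 * math.log10(rms(mic_echo) / rms(reference))


# Assumed test tone: five full cycles of a sine wave.
reference = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
quiet_echo = [0.5 * s for s in reference]   # room attenuates the echo
loud_echo = [4.0 * s for s in reference]    # too much gain in the loop

print(round(erl_reading_db(reference, quiet_echo), 1))  # about -6.0
print(round(erl_reading_db(reference, loud_echo), 1))   # about 12.0
```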

For BSS audio devices, AEC works flawlessly with ERL meter readings below 10 dB. The algorithm will continue to work with ERL readings higher than 10 dB, but the convergence rate will decrease in that range. The convergence rate measures how fast the AEC algorithm can recognize and remove echo from the signal path. The faster the convergence rate, the faster the AEC processor can account for changes in the room, such as a person or microphone moving.

The best way to control the ERL is with a good gain structure setup. Ensuring there is suitable gain for the input signal will keep the signal-to-noise ratio healthy and the ERL reading as low as possible. Managing gain also provides reasonable headroom for the AEC input signal. This ensures distortion-free sound and optimal performance for AEC.

One factor that will complicate an AEC system is “voice lift” (also known as “local sound reinforcement”), where the local microphones feed both the far side and the near-side loudspeakers. This typically applies in large rooms where other participants located in the same room are unable to hear the person speaking. A basic method of voice lift is when a microphone is simply routed to all loudspeakers in the room, including the loudspeaker directly above it. A more advanced method of voice lift is called “mix minus,” wherein a microphone is routed only to distant loudspeakers in the room, not nearby loudspeakers (because nearby listeners can already hear the speech emanating from its natural source: human lips). As a result, listeners will hear the talker’s voice at a uniform level anywhere in the room. The ratio of speech coming from loudspeakers versus lips changes as you move in the room, but the combined level remains the same. Because mix minus increases the distance between microphone and loudspeaker, it also helps reduce feedback.
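The mix-minus idea can be sketched as a routing matrix in which each microphone is muted in the loudspeaker zone it sits in. This is only an illustration: the zone names, device names, and the simple on/off gains are hypothetical, and a real system would typically use graduated gains rather than 0 and 1.

```python
def mix_minus_matrix(mic_zones, speaker_zones):
    """Return {mic: {speaker: gain}} with each mic muted in its own zone."""
    return {
        mic: {
            spk: 0.0 if mic_zone == spk_zone else 1.0
            for spk, spk_zone in speaker_zones.items()
        }
        for mic, mic_zone in mic_zones.items()
    }


# Hypothetical two-zone room.
mic_zones = {"mic_front": "front", "mic_rear": "rear"}
speaker_zones = {"spk_front": "front", "spk_rear": "rear"}

routing = mix_minus_matrix(mic_zones, speaker_zones)
for mic, gains in routing.items():
    print(mic, gains)
```

Here the front microphone reaches only the rear loudspeaker and vice versa, so each talker is reinforced only where listeners cannot already hear them directly.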

In any case, when voice lift is involved, whether basic or mix minus, the signal sent to the reference must be carefully chosen, and the aforementioned rule of thumb (“send to the reference whatever you’re sending to the loudspeaker”) no longer applies. With voice lift, the signal being sent to the loudspeaker contains the local microphone, and sending a microphone signal to its own AEC reference would cause the AEC algorithm to cancel the talker’s own voice. This results in a highly distorted and unintelligible signal and a very frustrated remote caller. Therefore, systems with voice lift require a modified rule of thumb:

“Send to the reference whatever you’re sending to the loudspeaker, minus any local mics.”

This results in a pristine mic signal for the far-side (free of echo and distortion), while still feeding a mix of both near-side and far-side audio to the room loudspeakers.
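The modified rule can be sketched as two separate mixes built from the same sources: one for the loudspeaker and one for the AEC reference. This is a simplified illustration with a single loudspeaker feed; the function name, the lift gain, and the sample values are hypothetical.

```python
def build_feeds(far_end, local_mics, lift_gain=0.5):
    """Return (loudspeaker_feed, aec_reference) as lists of samples.

    far_end:    samples received from the far side
    local_mics: one list of samples per local microphone
    """
    # Voice lift: local mics are summed and added to the loudspeaker feed.
    lift = [lift_gain * sum(frame) for frame in zip(*local_mics)]
    loudspeaker_feed = [f + l for f, l in zip(far_end, lift)]
    # Reference: the loudspeaker feed minus any local mics.
    aec_reference = list(far_end)
    return loudspeaker_feed, aec_reference


# Hypothetical two-sample example with two local microphones.
far_end = [0.25, -0.5]
local_mics = [[1.0, 0.0], [0.0, 1.0]]

speaker_feed, reference = build_feeds(far_end, local_mics)
print(speaker_feed)  # [0.75, 0.0]
print(reference)     # [0.25, -0.5]
```

Because the reference carries no local microphone signal, the canceller has nothing of the talker’s own voice to subtract from the mic.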

Of course, this is just the tip of the iceberg when it comes to AEC. To learn more about how AEC works, you can sign up for training from HARMAN Professional University, where there are online courses available.

Kevin Brown