I am working on integrating a VoIP call (Asterisk setup) with a real-time WebSocket bot. The goal is to send incoming voice data to the bot over WebSocket and play back the bot’s response to the caller. I have attempted two approaches but am facing significant audio quality issues in both:
Approach 1: Using ARI
I connected to the ARI server and created a bridge.
A local channel and an external channel were created and added to the bridge.
A snoop channel was created on the local channel with spy="in" to capture incoming audio.
Another snoop channel was created with whisper="out" to inject the bot’s response.
I created an external media connection on the external channel and started a UDP server to receive the RTP stream.
Issue:
The audio sent over WebSocket is extremely noisy, making it incomprehensible to the bot. Additionally, the bot's response audio, when sent back to the caller, is not audible.
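For reference, here is a minimal sketch of what the RTP receive side of this approach involves (an illustration under stated assumptions, not the exact code from my setup). It assumes the external media channel delivers standard RTP version 2 packets over UDP; the helper names `rtp_payload` and `swap16` are hypothetical. Two things seem worth ruling out as noise sources: forwarding the RTP header bytes as if they were audio, and byte order. As far as I understand, signed-linear audio travels over RTP in network (big-endian) byte order, while most consumers expect little-endian PCM, so a swap may be needed; ulaw payloads are single bytes and would not need one. That byte-order point is an assumption to verify against a packet capture.

```python
import struct

RTP_HEADER_LEN = 12  # fixed part of the RTP header (RFC 3550)


def rtp_payload(packet: bytes) -> bytes:
    """Return the media payload of one RTP packet, skipping header, CSRCs and extension."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    # Each CSRC identifier adds 4 bytes after the fixed header.
    offset = RTP_HEADER_LEN + 4 * (first & 0x0F)
    if first & 0x10:  # extension bit: 4-byte extension header plus N 32-bit words
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        ext_words = struct.unpack_from("!H", packet, offset + 2)[0]
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:  # padding bit: last byte holds the padding length
        end -= packet[-1]
    return packet[offset:end]


def swap16(pcm: bytes) -> bytes:
    """Swap the byte order of every 16-bit sample (big-endian <-> little-endian)."""
    if len(pcm) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    out = bytearray(len(pcm))
    out[0::2] = pcm[1::2]
    out[1::2] = pcm[0::2]
    return bytes(out)
```

In a UDP receive loop, each datagram would go through `rtp_payload` first, and through `swap16` only if the negotiated format is 16-bit signed linear and the capture confirms big-endian samples.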
Approach 2: Using AudioSocket
I set up an AudioSocket server to handle the call’s audio.
The incoming audio is successfully sent to the bot, but similar noise issues persist.
Issue:
While the bot’s response audio is at least audible to the caller, it is still not understandable due to excessive noise.
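For reference, a minimal sketch of AudioSocket framing as I understand it (not the exact code from my server). Each message has a 3-byte header, a 1-byte kind followed by a 2-byte big-endian payload length, and audio payloads are 16-bit signed linear, 8 kHz, mono, little-endian PCM. Because AudioSocket runs over TCP, a single read can end in the middle of a frame, so the parser has to keep leftover bytes for the next read; treating header bytes as samples or splitting a sample across reads would both sound like noise. The function names `parse_frames` and `audio_frame` are hypothetical.

```python
import struct

KIND_HANGUP = 0x00  # terminate the connection
KIND_UUID = 0x01    # payload is the 16-byte call UUID
KIND_AUDIO = 0x10   # payload is 16-bit, 8 kHz, mono, little-endian PCM
KIND_ERROR = 0xFF   # payload is an error code

HEADER_LEN = 3  # 1-byte kind + 2-byte big-endian payload length


def parse_frames(buffer: bytes):
    """Split buffered TCP data into complete (kind, payload) frames.

    Returns (frames, leftover); the leftover bytes belong to a frame that has
    not fully arrived yet and must be prepended to the next read.
    """
    frames = []
    pos = 0
    while len(buffer) - pos >= HEADER_LEN:
        kind, length = struct.unpack_from("!BH", buffer, pos)
        if len(buffer) - pos - HEADER_LEN < length:
            break  # payload incomplete, wait for more data
        start = pos + HEADER_LEN
        frames.append((kind, buffer[start:start + length]))
        pos = start + length
    return frames, buffer[pos:]


def audio_frame(pcm: bytes) -> bytes:
    """Wrap raw PCM in an AudioSocket audio frame for playback to the caller."""
    return struct.pack("!BH", KIND_AUDIO, len(pcm)) + pcm
```

Only the payload of `KIND_AUDIO` frames would be forwarded to the bot, and audio coming back from the bot would be converted to 8 kHz PCM and wrapped with `audio_frame` before being written to the socket.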
Troubleshooting Attempts
I have tried resampling the audio before sending it to the bot, but this did not improve the quality.
I have spent several days troubleshooting this issue without success.
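To make the resampling step concrete, here is a minimal sketch of the conversion I believe is needed, assuming the Asterisk side carries 8 kHz, 16-bit, mono, little-endian PCM and the bot's `pcm16` format expects 24 kHz (both are assumptions to verify). It uses plain linear interpolation in one direction and three-sample averaging in the other, which is crude but should stay intelligible; a proper resampling library would do better. The function names are hypothetical.

```python
import sys
from array import array


def _to_samples(pcm: bytes) -> array:
    """Decode little-endian PCM16 bytes into an array of signed 16-bit samples."""
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError on an odd number of bytes
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def _to_bytes(samples: array) -> bytes:
    """Encode an array of signed 16-bit samples as little-endian PCM16 bytes."""
    if sys.byteorder == "big":
        samples = array("h", samples)
        samples.byteswap()
    return samples.tobytes()


def upsample_8k_to_24k(pcm: bytes) -> bytes:
    """Triple the sample rate by linear interpolation between neighbouring samples."""
    samples = _to_samples(pcm)
    out = array("h")
    for i, cur in enumerate(samples):
        # The last sample of a chunk has no neighbour, so it is simply held.
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        out.append(cur)
        out.append(cur + (nxt - cur) // 3)
        out.append(cur + 2 * (nxt - cur) // 3)
    return _to_bytes(out)


def downsample_24k_to_8k(pcm: bytes) -> bytes:
    """Reduce the sample rate by three, averaging each group of three samples."""
    samples = _to_samples(pcm)
    out = array("h")
    for i in range(0, len(samples) - 2, 3):
        out.append((samples[i] + samples[i + 1] + samples[i + 2]) // 3)
    return _to_bytes(out)
```

Caller audio would pass through `upsample_8k_to_24k` before being sent to the bot, and the bot's audio through `downsample_24k_to_8k` before playback. If the bot also accepts G.711 u-law input and output, which I believe it does, negotiating ulaw on the Asterisk side might avoid resampling entirely.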
Request for Help
I am unsure what I might be doing wrong in my setup. Is there a better way to handle the audio streams, or am I missing any critical configuration? Any guidance would be greatly appreciated.
I am using Asterisk 18.
The bot is the GPT-4o Realtime preview model (gpt-4o-realtime-preview).