Genuine question for the community.
With APIs like OpenAI and Claude, building conversational intelligence has become significantly easier. But when it comes to production voice applications, the telephony layer often becomes the bottleneck.
Things like:
- Handling call state and interruptions
- Managing latency for real-time conversation
- Supporting Indian languages (Hindi, Tamil, Telugu, etc.)
- Dealing with poor network conditions
Where do you spend most of your debugging time?
Would love to hear about the pain points you’re facing. We’re building resources around the most common challenges.
Good question. For us, the answer is neither the AI layer nor the telephony layer. The AI brain is getting easier to build thanks to the OpenAI and Claude APIs. The telephony layer is hard, but tools like Exotel and LiveKit are making it more accessible too.
The part where we spend the most debugging time is actually the coordination between agents after the call. A voice agent takes the call and classifies the request. A separate agent pulls the relevant data. Another one prepares the response or creates a ticket. A human sometimes needs to approve before anything goes out. Each agent works fine individually. Getting them to hand off to each other with shared context and without losing information between steps is the part that breaks most often.
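One way to make those handoffs less fragile is to pass a single shared context object through every stage and check, right after each step, that the fields the next step depends on were actually set. Below is a minimal sketch under that assumption. `CallContext`, `classify`, `fetch_data`, `prepare_response`, and `run_pipeline` are hypothetical illustrations, not part of any tool mentioned in this thread, and the `approve` callback stands in for the human approval step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CallContext:
    # Hypothetical shared context handed from agent to agent after the call.
    call_id: str
    transcript: str
    intent: Optional[str] = None
    data: Dict[str, object] = field(default_factory=dict)
    draft: Optional[str] = None
    approved: bool = False
    history: List[str] = field(default_factory=list)


Stage = Callable[[CallContext], CallContext]


def classify(ctx: CallContext) -> CallContext:
    # Toy stand-in for the voice agent's classification step.
    ctx.intent = "refund" if "refund" in ctx.transcript.lower() else "general"
    return ctx


def fetch_data(ctx: CallContext) -> CallContext:
    # Hypothetical lookup; a real agent would query a CRM or order database.
    ctx.data["order_status"] = "delivered" if ctx.intent == "refund" else "unknown"
    return ctx


def prepare_response(ctx: CallContext) -> CallContext:
    ctx.draft = f"[{ctx.intent}] order status: {ctx.data.get('order_status')}"
    return ctx


# Fields each stage must leave populated before the next stage runs.
REQUIRED: Dict[str, List[str]] = {
    "classify": ["intent"],
    "prepare_response": ["draft"],
}


def run_pipeline(
    ctx: CallContext,
    stages: List[Stage],
    required: Dict[str, List[str]],
    approve: Callable[[CallContext], bool],
) -> CallContext:
    for stage in stages:
        ctx = stage(ctx)
        ctx.history.append(stage.__name__)
        missing = [f for f in required.get(stage.__name__, []) if getattr(ctx, f) is None]
        if missing:
            # Fail loudly at the handoff instead of letting a later agent guess.
            raise ValueError(f"{stage.__name__} left {missing} unset")
    # Human-in-the-loop gate: nothing goes out unless approve() returns True.
    ctx.approved = bool(approve(ctx))
    return ctx


ctx = run_pipeline(
    CallContext("call-1", "I want a refund for my order"),
    [classify, fetch_data, prepare_response],
    REQUIRED,
    approve=lambda c: c.draft is not None,
)
```

The point of the per-stage check is that information loss shows up at the exact handoff where it happened, rather than as a confusing result several agents later.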
We have been trying a few tools for this coordination layer: n8n for routing between agents and teamoffsite.ai for the shared context and approval flow. Each handles a different piece. The telephony side, via Exotel, stays clean because the voice agent only handles the real-time conversation. Everything after the call is a coordination problem, not a telephony problem.
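The split described above, where the voice agent only emits a result when the call ends and everything afterwards is routing, can be sketched as a small dispatcher. This is plain Python standing in for the routing step. It is not n8n's or Exotel's API. The event fields, intent names, and handler functions are all hypothetical, and unknown intents fall back to human review instead of being dropped.

```python
import json
from typing import Callable, Dict


def end_of_call_event(call_id: str, intent: str, transcript: str) -> str:
    # Hypothetical event the voice agent emits at hang-up; this JSON string
    # is the only thing that crosses from the telephony side to coordination.
    return json.dumps({"call_id": call_id, "intent": intent, "transcript": transcript})


def create_ticket(evt: dict) -> str:
    return f"ticket created for {evt['call_id']}"


def answer_query(evt: dict) -> str:
    return f"answer drafted for {evt['call_id']}"


def human_review(evt: dict) -> str:
    return f"queued {evt['call_id']} for human review"


# Intent -> downstream agent. Anything not listed goes to a human.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "complaint": create_ticket,
    "question": answer_query,
}


def route(raw_event: str) -> str:
    evt = json.loads(raw_event)
    handler = ROUTES.get(evt.get("intent"), human_review)
    return handler(evt)
```

Keeping the boundary to a single serialized event means the real-time call handling can change without touching any of the post-call agents, and vice versa.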
Where does most of the debugging time go for others here? Curious whether people are running the full pipeline from call to follow-up action or stopping at the voice agent.