New Delhi, February 9, 2026: India’s technology scene has witnessed a bold new chapter with Sarvam AI, a Bengaluru-based startup, unveiling its homegrown artificial intelligence models that are designed specifically for Indian users. These models, called Sarvam Audio and Sarvam Vision, are not only competing with global leaders like Google and OpenAI but are showing stronger results in Indian contexts.

Voice First, Not Just Text

India is a country where voice communication dominates daily life. Farmers, delivery workers, and ordinary citizens often rely on spoken instructions rather than typing on keyboards. Recognizing this, Sarvam AI built its systems to handle speech naturally. Unlike global models that struggle with India’s unique mix of English and native languages, Sarvam Audio has been trained from scratch on 22 Indian languages. This allows it to understand “code-mixing,” where people switch between languages mid-sentence, something common in everyday conversations.

Beating the Benchmarks

Sarvam’s claims are backed by measurable results. On the IndicVoices benchmark, Sarvam Audio consistently outperformed Google’s Gemini 3 Flash and OpenAI’s GPT-4o in transcription accuracy. The model showed a lower Word Error Rate (WER), the share of words a system gets wrong when converting speech to text, so a lower figure means fewer mistakes.
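For readers curious how a Word Error Rate is typically calculated, the sketch below shows the standard textbook definition: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. It is only an illustration. The function name and the sample sentences are made up for this example, and it does not reproduce how the IndicVoices benchmark or Sarvam AI normalizes text or scores results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper only; real benchmarks usually normalize
    punctuation, casing, and numerals before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of four in a code-mixed sentence.
print(word_error_rate("mujhe nau tickets chahiye", "mujhe no tickets chahiye"))
```

In this made-up case a single substituted word in a four-word sentence gives a WER of 0.25, or 25 percent. Because insertions count as errors too, WER can exceed 100 percent when a transcript contains many extra words.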

The company’s visual model, Sarvam Vision, also achieved impressive scores. On the olmOCR-Bench, it reached 84.3 percent accuracy, surpassing Gemini 3 Pro and DeepSeek. In document analysis, it scored 93.28 percent on the OmniDoc benchmark, suggesting that a smaller, specialized model can outperform much larger global systems when dealing with Indian-style documents, tables, and formulas.

Direct Action from Speech

One of Sarvam’s biggest innovations is Speech-to-Command. While global systems usually convert speech into text before processing, Sarvam Audio can directly trigger actions from voice input. This reduces delays and avoids errors in noisy environments. For example, if someone says “Nau” in Hindi, Sarvam Audio correctly interprets it as the number “9,” while other systems might mistake it for the English word “No.”

Handling Real-World Challenges

Sarvam Audio also introduces advanced speaker diarization, meaning it can work out who spoke when in a recording, distinguishing up to eight different speakers. This is especially useful in India’s busy offices and call centers where multiple voices overlap. The model is optimized for 8 kHz telephony audio, making it effective even with the low-quality sound of traditional phone calls, a common reality in customer service.

Building Sovereign AI for India

Sarvam AI’s rise is supported by the IndiaAI Mission and government-backed GPU clusters. By creating models entirely within India, the company aims to ensure that Indian users are not dependent on foreign systems. This approach, often called sovereign AI, is about giving India control over its own digital future.