Sunday Special: How Bhashini is teaching AI to speak 22 Indian languages
“There are ground rules,” Dave explains. “The word ‘doctor,’ for instance, will not be rendered as ‘tabib’ or ‘daktar’. Similarly, medical terms like bronchial carcinoma, a variant of lung cancer, will not be given convoluted translations. So far, nearly 2 lakh such translations have been completed for Gujarati alone.”
Bhashini’s core team gathered recently at DAU to review four years of work as the initiative reached a significant milestone. With both an app and a website now live, the team took stock of what has been achieved and mapped the road ahead.
Top officials said Bhashini now supports 57 languages: 22 Indian and 35 foreign. In addition, there are 14 languages on the text platform. On February 22, the tribal language Bhili was also added.
Translation is only part of the story. “The focus is equally on pragmatics — it should not only convey the literal meaning but should also make the context clear. The use of Bhashini has been successfully demonstrated at events like the Maha Kumbh,” he added.
The team that met at DAU
Experts associated with the project said that one of its focus areas is data sovereignty — keeping user data within Indian systems rather than routing it through foreign servers. The platform is also designed to be voice-native, allowing users to interact directly through speech, without needing to type a single word. Recently, at the India AI Impact Summit 2026, MeitY unveiled VoicERA — an open-source, end-to-end Voice AI stack.
Govt departments can now quickly deploy voice-enabled citizen services in areas such as agriculture advisories, livelihood services, grievance redressal, citizen feedback, and scheme discovery.
Prof Dipti Misra Sharma, a computational linguistics expert from IIIT Hyderabad and chief investigator for the Indian Language to Indian Language Machine Translation (IL-IL MT) initiative, offers a window into both the scale and the complexity of the work.
The diversity of India’s languages presents both a challenge and an opportunity, she notes. “We increasingly encounter mixed-language usage such as ‘Jab We Met’. How do you train a model to translate that? These were the questions we wrestled with when working on the project. To give a sense of the computational scale involved, a supercomputer, Param Siddhi AI at C-DAC, was deployed for the task,” she adds.
Need for Green AI
A new lease of life for Dogri
Dr Preeti Dubey, assistant professor of Computer Applications at Government College for Women, Udhampur, represented the Dogri language cohort at the DAU gathering. Compared to major Indian languages, she explained, Dogri enters the AI age with a thinner training dataset.
“Like elsewhere in India, the Jammu and Himachal Pradesh region — home to the bulk of Dogri speakers — is seeing more children enrol in English-medium schools. Many learn Dogri at home, from parents or grandparents. For the project, we reached out to local libraries, converting physical pages to PDFs and then using optical character recognition to glean Dogri words and sentences,” she said.
With a speaker base estimated between 2 and 5 million, Dogri’s inclusion in Bhashini represents a meaningful step toward preserving the language in an increasingly AI-dominated digital world — a model, experts say, of how technology can be turned towards linguistic conservation rather than homogenization.
How AI learns to speak a new language
The process by which AI acquires a language is fundamentally different from how a child does. Where a baby begins by mimicking sounds and slowly learns to connect spoken words to visual symbols, AI learns by detecting statistical patterns across massive datasets.
“If we give an input for translating a Gujarati word into Bengali, the system first compares the Gujarati word to English or Hindi, which have far larger training corpora, and then maps that to Bengali, before generating the final translation. But for Bhashini, the larger goal is to improve direct Indian language-to-Indian language translation, and that demands much stronger parallel sentence data, rigorous evaluation and terminology control,” said one team member.
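The difference between routing through a larger language and translating directly can be sketched in a few lines of Python. This is a toy illustration only: the three dictionaries, their entries and the function names are assumptions made for the example, not Bhashini data or code, and real systems translate whole sentences with neural models rather than looking up single words.

```python
# Toy word-level lexicons. All entries are illustrative placeholders.
GU_TO_HI = {"પાણી": "पानी", "ઘર": "घर"}   # Gujarati -> Hindi (pivot language)
HI_TO_BN = {"पानी": "জল", "घर": "বাড়ি"}    # Hindi -> Bengali
GU_TO_BN_DIRECT = {"પાણી": "জল"}           # direct Gujarati -> Bengali pairs are sparser


def translate_via_pivot(word):
    """Map Gujarati -> Hindi -> Bengali; return None if either hop is missing."""
    pivot = GU_TO_HI.get(word)
    if pivot is None:
        return None
    return HI_TO_BN.get(pivot)


def translate(word):
    """Prefer a direct pair and fall back to the pivot route when none exists."""
    direct = GU_TO_BN_DIRECT.get(word)
    if direct is not None:
        return direct, "direct"
    pivoted = translate_via_pivot(word)
    if pivoted is not None:
        return pivoted, "pivot"
    return None, "missing"


for gujarati_word in GU_TO_HI:
    print(gujarati_word, translate(gujarati_word))
```

Every pivot hop is a chance to lose meaning, which is why the direct table has to grow, through more parallel sentence data, before the fallback can be dropped.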
A major part of generative AI involves predicting the next word in a sentence — an ability that only becomes reliable when the AI deeply understands a language’s syntax. This is where linguists become indispensable. They set terminology and style guidelines, review outputs for fluency and adequacy, and help the system navigate the natural variation in real-world usage. Gujarati, like most Indian languages, follows a subject-object-verb sentence structure — quite different from English’s subject-verb-object order — and teams must define a standard register while carefully documenting acceptable dialectal variations, from Surti to Kathiawadi, rather than imposing a single “correct” form.
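Next-word prediction can be illustrated with a far simpler stand-in than the neural networks used in practice: a table that counts which word most often follows another. The sketch below uses a handful of romanized Gujarati-style sentences invented for the example, and the function names are hypothetical. Note how the verb lands at the end of each sentence, in line with subject-object-verb order.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented toy corpus (romanized); not drawn from any real dataset.
corpus = [
    "hu ghare jau chhu",
    "hu bajar jau chhu",
    "te ghare jay chhe",
]
model = train_bigram(corpus)
print(predict_next(model, "jau"))
print(predict_next(model, "jay"))
```

A model like this knows nothing about grammar. It only repeats what it has counted, which is one reason linguist-reviewed data and guidelines matter so much for languages with small corpora.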
The building blocks
Data collection: Models are trained on large monolingual corpora drawn from books, news websites, govt documents, online articles and voice samples, supplemented by parallel sentence pairs across language pairs.
Cleaning and standardization: Scientists remove noise and duplicates, and normalize spelling, script, punctuation and formatting for consistency.
Finding and teaching comparable data: The system first learns to match exact equivalent words across languages, then moves to structure and grammar. Source sentences are matched with the target sentences and quality checks are applied throughout.
Tokenization: Words are broken down into smaller units called tokens, allowing the model to process each element individually and handle rare words and morphologically complex forms.
Training the neural machine translation (NMT) model: A transformer-based system is trained to transfer meaning at the sentence level, generating translations one token at a time.
Employing neural networks: The neural network begins predicting the next word and completing sentences, improving in accuracy through repeated exposure to data.
Beta testing and limited deployment: Once initial benchmarks are met, the model is connected to major databases and the broader internet, where it continues learning autonomously within defined parameters, developing a progressively deeper understanding of the language.
Transfer learning and execution: The model learns from existing data through knowledge transfer (for example, from Hindi to Gujarati) on a multilingual platform at regular intervals. It is assessed through metrics and human review, fine-tuned for specific domains such as medical or legal, and then deployed. Ongoing updates enable it to incorporate feedback and support real-time training.
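Two of the steps above, cleaning and tokenization, are simple enough to sketch in Python. This is a minimal illustration under stated assumptions: the function names and the tiny vocabulary are hypothetical, and the greedy longest-match splitter stands in for the learned subword tokenizers that production systems typically use.

```python
import unicodedata


def clean_corpus(lines):
    """Normalize Unicode, collapse whitespace, and drop empty or duplicate lines."""
    seen = set()
    cleaned = []
    for line in lines:
        text = unicodedata.normalize("NFC", line)  # one canonical form per character
        text = " ".join(text.split())              # trim and collapse whitespace
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def subword_tokenize(word, vocab):
    """Greedy longest-match split; pieces not in `vocab` fall back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Accept the longest known piece, or a single character as a last resort.
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


raw = ["  Hello   world ", "Hello world", "", "Namaste"]
print(clean_corpus(raw))

vocab = {"translat", "ion", "s"}  # hypothetical subword vocabulary
print(subword_tokenize("translations", vocab))
```

Splitting rare or heavily inflected words into known pieces lets a model handle forms it has never seen whole, which is particularly useful for morphologically rich languages.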