


LANGUAGE LEARNING MODEL (LLM)

 
 
 
1. Context
 
At the AI Impact Summit, the Bengaluru-based startup Sarvam AI released two Large Language Models (LLMs), the class of models that forms the foundation of AI systems such as Google’s Gemini and OpenAI’s ChatGPT. According to Sarvam co-founder Pratyush Kumar, the two models, with 35 billion and 105 billion parameters respectively, were less power- and compute-intensive than comparable models while demonstrating improvements over other models in Indian languages.
 
 
2. What are Language Learning Models (LLMs)?
 
 
  • Language Learning Models, more commonly referred to as Large Language Models (LLMs), are a type of artificial intelligence system designed to understand and generate human language.
  • They are built to read text, identify patterns in how language is used, and then produce responses that are coherent and contextually relevant. The term “large” refers to the enormous amount of data they are trained on, as well as the vast number of parameters—mathematical values—that help them process and predict language.
  • At their core, these models work by learning from examples. During training, they are exposed to massive collections of text drawn from books, articles, research papers, and other publicly available material.
  • Instead of memorizing specific answers, they learn the statistical relationships between words. In simple terms, they learn how likely one word is to follow another in a given context. Over time, this ability to predict the next word in a sentence becomes highly refined, allowing the model to generate complete paragraphs, essays, summaries, translations, or even computer code.
  • Modern language models are typically built using a neural network architecture known as the Transformer. This design allows the system to pay attention to the relationships between words in a sentence, even if those words are far apart.
  • Because of this, the model can understand context better than earlier language-processing systems. For example, it can distinguish between different meanings of the same word depending on how it is used in a sentence, and it can maintain coherence across longer passages of text.
  • Although these models can appear intelligent, they do not truly “understand” language in the human sense. They do not possess consciousness, personal experiences, or emotions.
  • Their responses are generated based on learned patterns rather than genuine comprehension. This means they can sometimes produce incorrect or misleading information, especially if the training data contained errors or biases.
  • Language Learning Models have become important because they change the way humans interact with technology. Instead of using rigid commands or technical instructions, users can communicate naturally in everyday language.
  • This has applications in education, business, governance, research, customer service, and many other fields. By enabling machines to process and generate language fluently, these models act as powerful tools that assist with writing, problem-solving, and information analysis.
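The next-word prediction described above can be illustrated with a toy sketch. The bigram model below is a deliberate simplification (real LLMs use neural networks over billions of words, not word-pair counts over a few sentences): it tallies how often one word follows another in a tiny illustrative corpus and returns the most likely continuation.

```python
from collections import Counter, defaultdict

# Toy corpus; real LLMs are trained on vastly larger text collections.
corpus = (
    "the model reads text . the model learns patterns . "
    "the model predicts the next word ."
).split()

# Count how often each word follows another (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # "model" follows "the" most often in this corpus
```

A real LLM refines essentially this same next-token prediction, but with learned parameters and long-range context instead of raw counts.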
 
 
3. How are LLMs Trained?
 
  • Large Language Models are developed and deployed using clusters of high-performance Graphics Processing Units (GPUs). The expense of procuring these GPUs, combined with the substantial electricity required to operate them for extended training periods, often amounts to several million dollars.
  • Equally critical to this process is access to vast volumes of data, much of which is sourced from the internet. However, online content is far more abundant in English, European languages, and East Asian languages such as Korean and Japanese, compared to most Indian languages.
  • This imbalance creates a dual difficulty for building LLMs within India using domestic funding.
  • First, the limited availability of high-quality data in Indian languages means that many models either deliver weaker performance in these languages or consume additional computational resources—often translating user inputs into English for processing and then translating responses back into the original language.
  • Although machine translation for Indian languages has improved significantly and is frequently relied upon to enhance output quality, this approach is not always optimal.
  • Second, financial constraints present another barrier. Developing and training large-scale language models requires significant capital investment, which can be difficult for Indian companies to justify, particularly in the absence of clear and immediate commercial applications tailored to local markets.
  • Dependence on translation layers also poses practical challenges for developers aiming to promote indigenous LLMs.
  • For instance, locally developed models such as Sarvam’s 35-billion-parameter system—demonstrated at a summit research symposium and adapted for use on feature phones—may face limitations if their performance in Indian languages is not robust. Such shortcomings can affect user experience, adoption rates, and overall effectiveness in real-world applications.
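The translation-layer workaround described above can be sketched as a simple pipeline. The `translate` and `generate` functions below are placeholders, not real APIs; the point is that every non-English query pays two extra model calls, adding compute cost and two extra places for errors to creep in.

```python
def translate(text, src, dst):
    # Placeholder for a machine-translation model; here it just tags
    # the text so each translation step is visible in the output.
    return f"[{src}->{dst}] {text}"

def generate(prompt_en):
    # Placeholder for an English-centric LLM.
    return f"answer to: {prompt_en}"

def answer(query, lang):
    """Translate in, generate in English, translate back out."""
    if lang == "en":
        return generate(query)            # one model call
    query_en = translate(query, lang, "en")   # extra step 1
    reply_en = generate(query_en)             # the actual LLM call
    return translate(reply_en, "en", lang)    # extra step 2

print(answer("namaste", "hi"))
# -> "[en->hi] answer to: [hi->en] namaste"
```

A model trained natively on Indian-language data would collapse the three calls into one, which is part of the efficiency argument for indigenous LLMs.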
 
4. Government Initiatives 
 
 
  • Under the IndiaAI Mission, the government has supported domestic AI development by facilitating large-scale computing infrastructure within the country.
  • More than 36,000 GPUs have been deployed across data centres run by Indian companies such as Yotta, enabling researchers and startups to undertake model training and inference at concessional rates.
  • As part of this initiative, Sarvam was provided access to 4,096 GPUs from a shared national compute facility, with government support for this effort estimated at nearly ₹100 crore.
  • The total infrastructure cost of this GPU cluster is reported to be around ₹246 crore, though the resources are expected to remain available for broader use beyond a single project.
  • The Ministry of Electronics and Information Technology has promoted indigenous LLM development for multiple strategic reasons. A central concern is that models created abroad may lack both the incentive and the contextual depth needed to effectively support India’s diverse linguistic landscape.
  • Additionally, building domestic capacity to train and deploy large language models is viewed as essential for strengthening India’s broader artificial intelligence ecosystem and nurturing homegrown expertise.
  • In this context, Sarvam’s unveiling of its two language models marks an important milestone in India’s efforts to build a high-performance yet cost-efficient LLM. The government appears keen to replicate the kind of cost innovation seen when China’s DeepSeek introduced its R1 model, which was rapidly adopted across the AI sector for reducing training and inference expenses without sacrificing performance quality. Policymakers hope to encourage a similar competitive advantage in India.
 
 
5. Way Forward
 
 

An important advancement for AI systems designed to operate efficiently in local environments has been the development of the Mixture of Experts (MoE) architecture. Early large language models were built with hundreds of billions—or even more than a trillion—parameters, and during inference they generally relied on activating the entire network of parameters to generate responses. This approach significantly increased computational costs and made each query resource-intensive.

In contrast, the MoE framework improves efficiency by engaging only a selected subset of the model’s parameters for any given task. By activating just a portion of the overall network rather than the whole system, MoE-based models can process requests more quickly while reducing computational load and operational expenses.
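The routing idea behind MoE can be sketched in a few lines. Everything below is illustrative, assuming a tiny gating network and small matrix "experts" (real experts are full feed-forward sub-networks): the gate scores all experts, but only the top-k highest-scoring ones are actually evaluated for a given input, so most of the network stays idle per query.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2

# Each "expert" is a small linear layer here; real MoE experts are far larger.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The gate scores how relevant each expert is to a given input.
gate_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route input x through only the TOP_K highest-scoring experts."""
    scores = x @ gate_weights                 # one score per expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the chosen experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                              # softmax over selected experts only
    # A dense model would evaluate all NUM_EXPERTS experts here.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top)), top

x = rng.standard_normal(DIM)
out, chosen = moe_forward(x)
print(f"activated {len(chosen)}/{NUM_EXPERTS} experts: {sorted(map(int, chosen))}")
```

Because only 2 of the 8 experts run per input, the per-query compute scales with the active subset rather than the full parameter count, which is the efficiency gain the section describes.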

 

 
 
For Prelims: Current events of national and international importance
For Mains: GS-III: Awareness in the fields of IT, Space, Computers, robotics, nano-technology, bio-technology and issues relating to intellectual property rights.
 
 
Previous Year Questions

1.With the present state of development, Artificial Intelligence can effectively do which of the following? (UPSC CSE 2020)

1. Bring down electricity consumption in industrial units

2. Create meaningful short stories and songs

3. Disease diagnosis

4. Text-to-Speech Conversion

5. Wireless transmission of electrical energy

Select the correct answer using the code given below:

(a) 1, 2, 3 and 5 only

(b) 1, 3 and 4 only 

(c) 2, 4 and 5 only 

(d) 1, 2, 3, 4 and 5

Answer: (b) 1, 3 and 4 only

Explanation:

  1. Bring down electricity consumption in industrial units - AI can optimize energy usage and reduce consumption in industrial settings through predictive maintenance and optimization algorithms.
  2. Create meaningful short stories and songs - While AI can generate text and music, creating truly meaningful and original artistic content remains a challenge.
  3. Disease diagnosis - AI has demonstrated capabilities in disease diagnosis through medical imaging analysis, pattern recognition, and data-driven diagnostics.
  4. Text-to-Speech Conversion - AI can effectively convert text into speech with high accuracy and natural-sounding voice synthesis.
  5. Wireless transmission of electrical energy - While AI may be involved in optimizing energy transmission systems, the direct wireless transmission of electrical energy is primarily a technological and engineering challenge, not directly related to AI capabilities.
 
Source: The Hindu

 
