Friday April 10, 2026 9:30am - 11:30am GMT+07

Authors - Makrand Dhanokar, Anirban Sarkar, Prajakta Dange Sant, Shivakarthik S, Krishnanjan Bhattacharjee, Swati Mehta
Abstract - Named Entity Recognition (NER) is an essential sequence-labelling and information-extraction task that underpins downstream Natural Language Processing (NLP) applications such as information retrieval, question answering, knowledge-graph construction, and machine translation. Although significant advances have been made in NER for high-resource languages, effective entity recognition in Indian languages remains an open research challenge because of linguistic diversity, complex morphology, typological differences, flexible word order, script variation, and prevalent code-mixing. The scarcity of annotated datasets and the lack of standardized evaluation metrics further limit supervised and transfer-learning methods in these low-resource settings. This paper introduces a multilingual NER framework built on sentence embeddings derived from Large Language Models (LLMs) and prompt-guided inference. The proposed method employs contextual, language-independent embeddings obtained from pretrained multilingual LLMs to encode semantic representations of Indian and foreign languages within a common embedding space. Rather than performing traditional token-level classification, entity recognition and classification are carried out via structured prompting, enabling zero-shot and few-shot generalization without task-specific fine-tuning. The system ensures that entity identification and retrieval occur in the same language as the input text, preserving linguistic fidelity and reducing error propagation caused by translation. To handle domain variability and informal writing, prompt constraints (guardrails) and simple rule-based normalization are used to manage orthographic differences, script inconsistencies, and code-mixed phrases common in user-generated content and social media.
Experimental evaluation across several Indian languages shows consistent improvements in precision, recall, and F1-score over conventional neural and transformer-based baselines, especially under low-resource conditions. The findings suggest that LLM-powered embeddings combined with prompt-based reasoning offer a scalable, data-efficient option for multilingual NER. This work advances the development of robust, inclusive, and language-adaptive information-extraction systems for linguistically diverse settings.
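The abstract's combination of rule-based normalization and guardrailed structured prompting can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, label set, and prompt wording are assumptions, and the LLM call itself is omitted — the sketch only shows how an input sentence might be Unicode-normalized and wrapped in a prompt that constrains the model to answer in the input language and script.

```python
import unicodedata

def normalize(text: str) -> str:
    """Rule-based normalization sketch: NFC-normalize to collapse
    orthographic variants (e.g., precomposed vs. decomposed Indic
    matras) and trim stray whitespace from user-generated text."""
    return unicodedata.normalize("NFC", text).strip()

def build_ner_prompt(sentence, labels=("PER", "LOC", "ORG"), examples=()):
    """Assemble a structured NER prompt with guardrails.

    `examples` is an optional sequence of (sentence, entities_json)
    pairs used as few-shot demonstrations; leaving it empty yields a
    zero-shot prompt. All names here are illustrative assumptions.
    """
    lines = [
        "Extract named entities from the sentence below.",
        f"Allowed labels: {', '.join(labels)}.",
        'Return a JSON list of {"text": ..., "label": ...} objects.',
        # Guardrail: keep entities in the same language and script as the
        # input, so no translation step can introduce errors.
        "Keep each entity exactly as written, in the same language and "
        "script as the input.",
    ]
    for ex_sentence, ex_entities in examples:
        lines += [f"Sentence: {ex_sentence}", f"Entities: {ex_entities}"]
    lines += [f"Sentence: {normalize(sentence)}", "Entities:"]
    return "\n".join(lines)
```

In use, the returned string would be sent to a pretrained multilingual LLM, and the JSON in its completion parsed back into entity spans; for a Hindi input such as "नई दिल्ली भारत की राजधानी है।", the guardrail line asks the model to emit "नई दिल्ली" in Devanagari rather than "New Delhi".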
Paper Presenter
Virtual Room D Bangkok, Thailand

