Authors - Fredy Gavilanes-Sagnay, Edison Loza-Aguirre, Luis Castillo-Salinas, Narcisa de Jesus Salazar Alvarez Abstract - Ayurveda, India's ancient system of medicine, is rich in interconnected knowledge about diseases, their symptoms, herbs, and formulations (compounds). However, texts such as the Charaka Samhita are mostly unstructured and cannot be readily analysed computationally. This work presents AyurKOSH, a machine-readable, high-quality Ayurvedic dataset designed as a Knowledge Graph (KG) to support Artificial-Intelligence-driven research. The dataset is represented as subject–predicate–object triplets, which enables semantic interoperability, graph traversal, and multi-hop inferencing across entities. It follows a schema-driven ontology that standardizes relationships between node types such as diseases, symptoms, pharmacological attributes, and compound formulations; the schema ensures consistency and computational tractability. AyurKOSH contains structured data on diseases and related symptoms, drug preparations, herbs, and the detailed pharmacological properties Rasa, Guna, Virya, Vipaka, and Karma. The graph exhibits real-world biomedical network characteristics such as high sparsity and low average degree, which makes it suitable for embedding-based learning, graph neural networks, and explainable AI frameworks. Moreover, botanical metadata and herb-substitution relationships are included to support synergy prediction and drug repurposing. The dataset facilitates applications in biomedical NLP, automated reasoning systems, clinical decision assistance, and pedagogy in integrative medicine. AyurKOSH is available for academic and non-commercial research under the CC BY-NC-SA 4.0 license.
Authors - W M I T Warnasooriya, T D Jayadeera, A M G S Adhikari, M A F Zumra, A J Vidanaralage, M Samaraweera Abstract - The integration of large language models (LLMs) into primary education remains limited in low-resource, diglossic languages like Sinhala. General-purpose models often produce grammatically inconsistent or cognitively overwhelming output for young learners. This paper introduces a grade-adaptive, constraint-driven framework for automated Sinhala story and quiz generation targeting Grades 1-5. Building upon an 8-billion-parameter Sinhala-adapted LLaMA 3 model, we apply Quantized Low-Rank Adaptation (QLoRA) using a curated multi-task educational dataset. The system enforces tier-specific linguistic constraints, separating conversational Sinhala for lower grades from formal written Sinhala for upper grades, while embedding strict structural rules such as controlled sentence counts (5-6 vs. 7-8) and validated multiple-choice formats (3 vs. 4 options). Evaluation on 100 structured prompts demonstrated substantial improvements over a zero-shot baseline: structural compliance increased from 64% to 93%, and hallucination-related failures decreased from 31% to 8%. Furthermore, evaluation against 50 unseen real-world classroom prompts yielded a 0.0% crash rate and 95% register adherence, confirming robust qualitative performance. Results demonstrate that diglossia-aware dataset engineering and constraint-aware fine-tuning enable reliable, pedagogically aligned deployment of LLMs in low-resource primary learning environments.
Authors - S. M. Mizanoor Rahman Abstract - Removable USB storage devices are widely used in day-to-day computing, but they also introduce risks such as unauthorized data transfer and misuse of external media. Understanding how these devices are used on a system is important during forensic investigations, especially when analyzing potential data leakage incidents. On Windows systems, traces of USB activity are not stored in a single location. Instead, they are distributed across registry entries, system logs, and file system records. Examining these sources individually often makes it difficult to form a clear picture of events. This paper introduces a forensic framework that brings together USB-related artifacts from multiple system components and analyzes them in a unified manner. The method gathers data from sources such as registry entries, Plug-and-Play logs, and file system structures, and then aligns them based on their timestamps. A Python-based implementation is used to automate this process and to relate device connection events with file operations. Experiments conducted on a Windows setup show that the framework can identify device usage and reconstruct the sequence of related activities with clarity. By combining evidence into a single timeline, the approach helps simplify analysis and supports consistent interpretation of results.
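The timestamp-alignment step this abstract describes can be illustrated with a minimal sketch: events from several artifact sources are merged and sorted into one chronological timeline. The event schema (`timestamp`, `source`, `description` keys) and the sample entries are illustrative assumptions, not the paper's actual data format.

```python
from datetime import datetime

def build_timeline(*artifact_sources):
    """Merge USB-related events from multiple artifact sources into one
    chronologically ordered timeline. Each event is a dict carrying an
    ISO-8601 'timestamp', a 'source' label, and a 'description'."""
    events = [e for source in artifact_sources for e in source]
    return sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))

# Illustrative artifacts from three separate Windows evidence sources
registry = [{"timestamp": "2024-03-01T10:02:00", "source": "registry",
             "description": "USBSTOR entry: Kingston DataTraveler"}]
pnp_log = [{"timestamp": "2024-03-01T10:01:58", "source": "pnp_log",
            "description": "Device install started"}]
filesystem = [{"timestamp": "2024-03-01T10:05:12", "source": "filesystem",
               "description": "report.docx copied to removable volume"}]

timeline = build_timeline(registry, pnp_log, filesystem)
```

Sorting by a shared timestamp key is what lets a device-connection event from the Plug-and-Play log be read directly next to the file operations that followed it.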
Authors - Shamita Jagarlamudi, Soormayee Joshi, Aman Aditya, Anushka Gangwar, Pratvina Talele Abstract - Federated Learning (FL) is a privacy-preserving, distributed learning framework where models are trained locally on client devices, and only the trained parameters are shared with a central server. Nevertheless, FL encounters substantial obstacles in real-world applications due to data heterogeneity, such as non-IID distributions leading to local inconsistencies and client drift, thereby diminishing global model efficacy. To tackle these challenges, we propose Federated Prox Drift Correction (FedPDC), an effective and practical method designed to mitigate client drift and local overfitting through drift correction and proximal terms. Comprehensive experiments conducted on public datasets demonstrate that FedPDC outperforms state-of-the-art methods.
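The proximal-term idea behind methods in this family can be sketched on a toy scalar objective. This shows only the standard FedProx-style proximal component (penalizing distance from the global model during local training); FedPDC's specific drift-correction term is not detailed in the abstract, so the hyperparameters and objective below are illustrative assumptions.

```python
def local_update(w_global, grad_fn, mu=0.1, lr=0.05, steps=200):
    """One client's local training with a FedProx-style proximal term:
    minimize f(w) + (mu/2) * (w - w_global)**2, whose gradient adds a
    pull back toward the global model, limiting client drift."""
    w = w_global
    for _ in range(steps):
        g = grad_fn(w) + mu * (w - w_global)  # task gradient + proximal pull
        w -= lr * g
    return w

# Toy client objective f(w) = (w - 2)^2 / 2, so grad f(w) = w - 2.
drifted = local_update(0.0, lambda w: w - 2.0, mu=0.0)    # plain local SGD
corrected = local_update(0.0, lambda w: w - 2.0, mu=0.5)  # proximal variant
```

With `mu = 0` the client converges to its own optimum (2.0); with `mu = 0.5` it settles at 2.0 / 1.5 ≈ 1.33, closer to the global model, which is exactly the drift-limiting effect the proximal term buys.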
Authors - U. A. Walke, G. A. Kulkarni, Pranav Mungankar, Om Kale, Tejas Kadam Abstract - Digitizing damaged historical texts requires multiple processing steps that can propagate semantic noise through the workflow. Efforts have been made to improve the recognition, correction, and normalization steps of the pipeline, but few studies have quantified model-level effects in isolation under a controlled architecture setup. Here we present Probanza, an extensible staged evaluation framework that decouples preprocessing normalization from semantic modeling to facilitate clean comparisons between LLMs. We perform super-resolution, contextual correction, and historical normalization before English translation. We selected 30 degraded pages from the Florentine Codex and digitized them with three LLM configurations: GPT-5, GPT-4o, and Gemini 3 Flash. Cosine similarity was computed between model predictions and archival baseline translations to measure semantic accuracy. A one-way repeated-measures ANOVA was conducted to examine differences across configurations. The analysis revealed a significant main effect of LLM configuration. Gemini 3 Flash produced the highest mean similarity (M = .881, SD = .075), while GPT-5 (M = .783, SD = .147) and GPT-4o (M = .769, SD = .135) were not significantly different from one another. Our results demonstrate that significant differences exist between LLM configurations for the task of digitizing damaged historical texts when preprocessing is held constant. Probanza enables isolated comparison of model-level effects in LLM-based historical digitization workflows.
Authors - Kushall Pal Singh, Vijay Kumar, Monu Verma, Dinesh Kumar Tyagi, Santosh Kumar Vipparthi Abstract - Hybrid enterprise environments spanning on-premises systems and public cloud services increase exposure to credential abuse, lateral movement, and misconfiguration-driven attack paths, motivating continuous verification and policy enforcement beyond perimeter assumptions. This paper presents an Azure-native, AI-enhanced Zero Trust framework that integrates identity-first enforcement (Microsoft Entra Conditional Access, Continuous Access Evaluation, and Privileged Identity Management), telemetry centralization (Microsoft Sentinel with UEBA), and an Azure Machine Learning classifier that outputs a probability-derived 0–100 trust score. Because identity policy engines consume bounded native signals, the framework binds external scoring to enforcement using SOAR automation that updates policy-targeted identity group membership via Microsoft Graph. A controlled A/B evaluation compares a static baseline (non-adaptive enforcement) with an adaptive mode (ML-in-the-loop scoring and automated score-to-policy binding) using MITRE ATT&CK-aligned scenarios: impossible travel sign-in, privilege escalation attempts via privileged activation workflows, and lateral movement via remote access/filesharing pathways. Quantitative outcomes are reported using median (P50) and tail (P95) time-to-detect, decision latency, and false-positive rate. To technically validate the adaptive control loop, the paper also reports an instrumented latency decomposition (trigger delay, playbook runtime, ML scoring call duration, and score-to-policy execution time) to show which components dominate end-to-end delay.
Authors - Karuppasamy E, Krithika V, Harish P, Pravinbaalaa V, Satheeskumar Abstract - Large volumes of online data contain duplicated and plagiarized content. Artificial Intelligence has made data generation very easy, but the generation process may lack ethical safeguards. Hence, there is a need to validate that data is plagiarism-free before authentic usage. In this research work, the authors focus on word-level plagiarism detection methods in Natural Language Processing. The proposed method performs a comparative analysis of cosine similarity, Euclidean distance, and Manhattan distance for word-level plagiarism detection across different n-gram sizes. Incorporating n-gram size improved accuracy compared to unigram-based methods. In the experimental results, the cosine similarity method outperforms the Euclidean and Manhattan distance methods, achieving average accuracy ranges of 88% to 92% for direct plagiarism and 75% to 80% for lightly paraphrased text. Future work will extend the approach to identify reused images and visual content.
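The core comparison this abstract describes, cosine similarity over word-level n-gram frequency vectors, can be sketched in a few lines. This is a minimal illustration of the general technique, not the authors' implementation; the tokenization (lowercased whitespace split) and the bigram default are assumptions.

```python
from collections import Counter
import math

def ngrams(text, n):
    """Split text into overlapping word-level n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def cosine_similarity(a, b, n=2):
    """Cosine similarity between the n-gram frequency vectors of two texts:
    dot(va, vb) / (||va|| * ||vb||), in [0, 1] for count vectors."""
    va, vb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Raising `n` makes matches stricter: two texts must share whole word sequences, not just vocabulary, which is why n-gram variants catch direct copying better than unigram counts.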
Authors - Nagaraj.M, V. Balamurugan, Matam Veera Chandra Kundan, M.J. Mathesh, V. Vijairam Abstract - Academic credential fraud is a global issue that undermines institutional trust. Although blockchain solutions provide immutability, they are generally reactive, securing documents only after potential errors or fraud have already occurred. This paper proposes a proactive approach to prevent inconsistencies before degree issuance. We introduce a hybrid model that integrates Digital Twins as a preventive validation layer and Multichain as an immutable ledger. The Digital Twin operates as a virtual sensor during the degree creation process at Universidad El Bosque, simulating and validating academic, financial, and national exam data (Saber Pro) in real time; if inconsistencies are detected, “red flags” are triggered prior to issuance. Once validated, the degree’s hash is anchored to a Multichain network. A functional prototype developed in Python achieved a 100% detection rate of inconsistent records during testing. The proposed model transforms the academic certification process into a proactive, secure, and trustworthy ecosystem by combining preventive validation with blockchain immutability.
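The two stages described here, preventive "red flag" validation followed by hash anchoring, can be sketched as below. The record fields and validation rules are hypothetical stand-ins for the academic, financial, and Saber Pro checks the abstract mentions, and the hash is computed locally; the actual Multichain anchoring step is omitted.

```python
import hashlib
import json

def validate_record(record):
    """Digital-twin-style preventive check: return the list of 'red flags'
    raised before issuance. Field names and rules are illustrative."""
    flags = []
    if record.get("credits_completed", 0) < record.get("credits_required", 0):
        flags.append("insufficient credits")
    if not record.get("fees_cleared", False):
        flags.append("outstanding fees")
    if not record.get("saber_pro_passed", False):
        flags.append("Saber Pro requirement not met")
    return flags

def anchor_hash(record):
    """Deterministic SHA-256 digest of a canonical JSON serialization,
    suitable for anchoring to an immutable ledger such as Multichain."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

record = {"student_id": "U1", "credits_completed": 160,
          "credits_required": 160, "fees_cleared": True,
          "saber_pro_passed": True}
flags = validate_record(record)
digest = anchor_hash(record) if not flags else None
```

Serializing with `sort_keys=True` matters: the same record must always produce the same digest, otherwise a later verification against the ledger would fail spuriously.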
Authors - S. M. Mizanoor Rahman Abstract - Driver fatigue is a major cause of road accidents and poses serious safety risks for drivers and passengers alike. Real-time detection of driver fatigue can help avert accidents by warning the driver about impending lapses in attention. This paper proposes a real-time automated system for detecting driver fatigue through observation of eye blinks and yawns, which are key indicators of fatigue. The system uses a combination of deep learning models that achieve high accuracy in detecting a drowsy driver. Eye blinks are detected using a state-of-the-art object detection model trained to locate the open and closed states of the eyes accurately via coordinate mapping, achieving 96 percent accuracy. Yawning is detected using a combined CNN and LSTM model that analyzes both spatial and temporal information from video, achieving 98 percent accuracy. Both modules operate on real-time camera inputs, enabling constant monitoring of the driver's alertness. Whenever the driver shows signs of dozing off, through excessive blinking or yawning, the system issues a real-time auditory alert to caution the driver. Experimental results confirm that the combined system operates reliably with low-latency responses in real time. This study shows that a hybrid detection strategy combining spatial and temporal analysis is effective in detecting a drowsy driver on the road, and that such a system can help improve road safety.
Authors - Kaniska D, Shreya J V, Srinidhi K, Sudhakar K S, Bagavathi Sivakumar P, Krishna Priya G Abstract - Language modeling of clinical text in healthcare demands both strong contextual grounding and a high level of security for sensitive patient information. Several large language models have shown strong clinical performance in documentation and summarization, and these models have been released freely. However, these models can generate hallucinated or non-verifiable outputs. Retrieval-augmented approaches mitigate this problem by constraining answers to the retrieved evidence. However, the majority of existing systems rely on textual records only, and the integration of diagnostic imaging is not done systematically. In this paper, we put forward a retrieval-grounded multimodal clinical modeling framework that unifies structured clinical text with imaging-derived contextual features. A patient-specific vector indexing approach is used for isolated retrieval, and a modality-aware visual analytics approach turns imaging outputs into structured signals that ground language generation. The entire framework runs fully offline, thus supporting privacy-preserving deployment in resource-limited clinical settings. Experimental results show consistent multimodal integration as well as semantic alignment between the retrieved evidence and the generated output.