Authors - Mohd. Zuhaib Ahmed, Akash Priya, Deepti Chopra, Pankaj Kumar Abstract - Effective landing and take-off (LTO) decision-making in military aviation is critically dependent on airfield serviceability and prevailing weather conditions. A fundamental challenge is the absence of structured expert pilot decision logs, as such data are operationally sensitive and access-restricted. This work presents a replicable methodological framework for developing machine learning-based decision support systems in domains where operational data are scarce or classified. The pipeline encompasses synthetic data generation using correlated Monte Carlo sampling, constrained by location-specific geographic, seasonal, and terrain parameters across ten Indian Air Force (IAF) stations, yielding approximately 60,000 simulated operational scenarios. The dataset is generated within domain-constrained operational bounds to ensure physical plausibility. A rule-based expert classification system assigns operational status as Green (Safe), Orange (Caution), or Red (Unsafe); four ML algorithms are subsequently evaluated: Logistic Regression, Naïve Bayes, Support Vector Machines, and Decision Trees. The Decision Tree achieves the highest performance, with an accuracy of 0.983, an F1 score of 0.983, and a ROC-AUC of 0.984. The proposed framework supports two deployment pathways: the rule engine as a deterministic automation tool for standard clearances, and the ML model as the inference core of a real-time Human-in-the-Loop (HIL) expert system requiring operator authorisation at every decision. As expert pilot decision logs become available, the system may be progressively elevated to a fully adaptive expert system.
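The rule-based Green/Orange/Red status assignment described in this abstract can be sketched as a threshold cascade over weather and airfield readings. The feature names and threshold values below are illustrative assumptions, not the paper's actual rule base:

```python
def lto_status(visibility_km, crosswind_kt, runway_friction):
    """Assign an LTO operational status from weather/airfield readings.

    All thresholds are hypothetical placeholders for the paper's
    expert-defined rules.
    """
    if visibility_km >= 5.0 and crosswind_kt <= 15 and runway_friction >= 0.5:
        return "Green"   # safe: eligible for automated standard clearance
    if visibility_km >= 1.5 and crosswind_kt <= 25 and runway_friction >= 0.3:
        return "Orange"  # caution: operator review required
    return "Red"         # unsafe: no clearance

print(lto_status(8.0, 10, 0.7))  # clear conditions  → Green
print(lto_status(2.0, 20, 0.4))  # marginal          → Orange
print(lto_status(0.5, 30, 0.2))  # below minima      → Red
```

A deterministic rule engine like this serves the first deployment pathway; its labels also provide the supervision targets for the ML models in the second.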
Authors - Ritesh Kumar Verma, Preethiya T Abstract - Contemporary customer support systems require processing a massive number of user queries with low latency and high semantic relevance. Rule-based systems fail to capture context, while fully LLM-based systems are computationally expensive and suffer from high latency. This paper introduces an adaptive AI-assisted customer support automation system using an optimized Retrieval Augmented Generation (RAG) model. The proposed system combines Azure OpenAI embeddings, FAISS-based vector search, selective Cross-Encoder re-ranking, and a Learning-to-Rank (LambdaMART) model for adaptive score fusion. Unlike vanilla RAG models, the proposed system adaptively re-ranks only the top-k retrieved candidates, trading off ranking precision and latency. Experiments were carried out on a 130,000-sample e-commerce customer support dataset with query-response pairs annotated with intent labels. Compared to rule-based retrieval, embedding+FAISS, and vanilla RAG models, the proposed hybrid system showed improved top-1 retrieval precision with a concurrent reduction in end-to-end latency from 0.414s to 0.365s (≈11.8% relative improvement). The LambdaMART model adaptively learned weights from FAISS and Cross-Encoder scores, improving ranking robustness and eliminating misranked top responses. The system was implemented on Azure Machine Learning with a cloud-scale pipeline and interactive Streamlit web interface, showcasing the cost-effective inference capabilities of the proposed system via selective re-ranking.
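The selective re-ranking idea in this abstract, scoring all documents cheaply with dense similarity but running the expensive Cross-Encoder only on the top-k, can be sketched as below. NumPy dot products stand in for FAISS, a callback stands in for the Cross-Encoder, and the fusion weights are fixed here whereas the paper learns them with LambdaMART:

```python
import numpy as np

def selective_rerank(query_vec, doc_vecs, cross_scores_fn, k=5, w=(0.6, 0.4)):
    """Retrieve by dense similarity, then re-rank only the top-k candidates.

    `cross_scores_fn` is a stand-in for a Cross-Encoder scoring the k
    candidate indices; `w` are hypothetical fixed fusion weights.
    """
    sims = doc_vecs @ query_vec                 # cheap FAISS-style scores, all docs
    top_k = np.argsort(sims)[::-1][:k]          # only these go to the Cross-Encoder
    cross = cross_scores_fn(top_k)              # expensive scoring on k docs only
    fused = w[0] * sims[top_k] + w[1] * cross   # score fusion
    return top_k[np.argsort(fused)[::-1]]       # final ranking of the candidates

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8))
query = rng.normal(size=8)
ranking = selective_rerank(query, docs, lambda idx: rng.normal(size=len(idx)))
print(ranking)
```

Since the Cross-Encoder cost scales with k rather than the corpus size, this is where the latency reduction over vanilla RAG comes from.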
Authors - Abdelrahman El Antably, Ali Hamdi, Ammar Mohamed Abstract - Large Language Models (LLMs) frequently generate plausible but incorrect information, known as hallucinations. Detecting these errors at a fine-grained level is crucial, especially for morphologically rich languages like Arabic with limited resources. We introduce BALANCE: Bi-perspective Analysis for LLM Accuracy via coNsensus ChEcking, a novel dual-judge framework for token-level hallucination detection in Arabic LLM outputs. Our six-module pipeline features context filtration, argument decomposition, and distinct strict and lenient LLM-based judges. A consensus coordinator then synthesizes their verdicts, and a span annotator precisely localizes errors. Evaluated on the Arabic subset of the SemEval-2025 MuSHROOM benchmark, BALANCE achieved an Intersection over Union (IoU) score of 72.87%. This significantly outperforms the task’s winning system by approximately 8.76% relative improvement and consistently surpasses zero-shot baselines across various LLMs by up to 39.80 percentage points.
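The IoU metric reported in this abstract compares predicted hallucination spans against gold spans at the character level. A plain re-implementation of that metric (not the BALANCE pipeline itself) might look like:

```python
def span_iou(pred_spans, gold_spans):
    """Character-level IoU between predicted and gold hallucination spans.

    Spans are (start, end) offsets, end-exclusive. Empty-vs-empty is
    scored 1.0 by convention here; the benchmark's exact edge-case
    handling is an assumption.
    """
    to_chars = lambda spans: {i for a, b in spans for i in range(a, b)}
    pred, gold = to_chars(pred_spans), to_chars(gold_spans)
    union = pred | gold
    return len(pred & gold) / len(union) if union else 1.0

# Prediction covers chars 0-4, gold covers 3-7: overlap {3,4}, union {0..7}
print(span_iou([(0, 5)], [(3, 8)]))  # → 0.25
```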
Authors - Duy Pham, Tung-Duong Le-Duc, Anh-Tai Pham-Nguyen, Trung Nguyen Mai, Long Nguyen, Dien Dinh Abstract - Multimodal knowledge graphs improve structured knowledge representation and tasks such as cross-graph entity alignment. However, most benchmarks focus on resource-rich languages and assume dense relational structures and balanced attributes. Low-resource languages like Vietnamese pose additional challenges, including structural sparsity, attribute asymmetry, and modality noise. To address this gap, we introduce DBWiki-VN15K, the first large-scale Vietnamese multimodal knowledge graph dataset for entity alignment. Built from Wikidata and DBpedia, it contains 15,000 aligned entity pairs with relational triples, localized numerical attributes, and visual modalities. The dataset provides both word-segmented and unsegmented text to support different linguistic processing approaches. Experiments with state-of-the-art multimodal entity alignment models reveal that structure-guided multimodal fusion and dynamic modality weighting are more robust to sparse and noisy features. Additionally, unsegmented subword tokenization better handles cross-graph translation inconsistencies than strict Vietnamese word segmentation. DBWiki-VN15K offers a realistic benchmark for studying multilingual and multimodal knowledge fusion. Our dataset is available at: https://github.com/Tim50c/DBWiki-VN15K.
Authors - Ritesh Jawarkar, Reena Satpute, Sudhir Agarmore Abstract - Because sleep disorders affect a person's health and quality of life, their diagnosis and treatment rely on accurate classification. Although individual deep learning and machine learning models have shown potential, they are limited by overfitting and model bias. To address these issues, this research proposes an ensemble learning-based approach to sleep disorder classification that combines the predictions of multiple machine learning models. A voting classifier aggregates the base classifier outputs, improving robustness and classification accuracy. On the Sleep Health and Lifestyle Dataset, the ensemble method achieves 97.3 percent accuracy, outperforming the individual models. The system is deployed as a Flask-based web interface with user authentication, supporting real-time interaction and use. The proposed approach provides reliable, accurate, and easy-to-use automated sleep disorder diagnosis.
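The voting classifier described here combines base model predictions by majority vote. A minimal hard-voting sketch, with hypothetical base models and labels (the paper's actual base learners and tie-breaking rule are not specified in the abstract):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: each base model casts one vote per sample.

    `predictions` is a list of per-model label lists; ties fall back to
    Counter's insertion order, an assumption for this sketch.
    """
    per_sample = zip(*predictions)  # transpose: one tuple of votes per sample
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

# Three hypothetical base models classifying four patients
svm_preds  = ["insomnia", "apnea", "none", "apnea"]
tree_preds = ["insomnia", "none",  "none", "apnea"]
knn_preds  = ["apnea",    "apnea", "none", "none"]
print(majority_vote([svm_preds, tree_preds, knn_preds]))
# → ['insomnia', 'apnea', 'none', 'apnea']
```

Because each model errs on different samples, the vote suppresses the individual models' idiosyncratic mistakes, which is where the robustness gain comes from.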
Authors - Aman Kumar, Mary Subaja christo Abstract - Content Delivery Networks (CDNs) play an essential role in enhancing content delivery speed by caching frequently requested data in edge servers distributed across geographical regions. Traditional CDNs utilize rule-based policies and machine learning approaches for cache optimization. In these systems, machine learning is performed centrally, using traffic logs collected by a central server. Although central learning is beneficial, aggregating raw data at a central server raises data privacy issues and incurs high communication cost. This paper proposes a secure federated learning architecture for cache hit prediction in CDNs. The proposed architecture is evaluated using a synthetic dataset containing 130,548 records, with temporal and network features. Compared with the traditional central learning approach, the secure federated learning model achieves an accuracy of 70.15%, which is comparable to central learning. The proposed architecture is found to reduce data privacy exposure by 30%.
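The federated alternative to central learning described here keeps raw traffic logs on the edge servers and shares only model parameters, which the server combines. A minimal FedAvg-style aggregation sketch (the paper's secure-aggregation layer and model architecture are omitted; names and sizes are illustrative):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style).

    Raw logs never leave the clients; only `client_weights` are shared.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    # Each client's contribution is proportional to its local data volume
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Two edge servers with different traffic volumes (3:1 ratio)
w_a = np.array([0.2, 0.8])
w_b = np.array([0.6, 0.4])
print(fed_avg([w_a, w_b], [3000, 1000]))  # → [0.3 0.7]
```

Iterating this aggregate-then-redistribute step over communication rounds yields a shared cache-hit predictor without centralizing any raw records.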
Authors - Bambang Marsudi Salim, Hudan Studiawan, Baskoro Adi Pratomo Abstract - Digital forensic investigations face a growing threat from sophisticated log tampering, in which adversaries delete or modify computer event logs to conceal evidence of criminal activity. This paper presents an empirical comparison of A* Search and Iterative Deepening A* (IDA*) for reconstructing falsified computer event logs, extending the previous bipartite graph framework. Three log artefacts were constructed from the public forensic timeline dataset: an original computer log, a trusted ISP log, and a deliberately falsified log containing 15 strategically deleted events. To address timestamp heterogeneity arising from different system and ISP browser log parsers, a window-based matching strategy is introduced. Experiments conducted across maximal consecutive event sequences (MCES) demonstrate that IDA* consistently explores fewer nodes than A*. Anomaly detection identified 60.7% of browser events as uncorroborated by ISP records, achieving 60.0% recall on the 15 deliberately deleted events.
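The window-based matching strategy in this abstract corroborates a browser event against the trusted ISP log if an ISP record falls within a tolerance window around its timestamp. A minimal sketch, assuming epoch-second timestamps and a ±5-second window (the paper's actual window size is not stated in the abstract):

```python
import bisect

def window_match(browser_events, isp_events, window_s=5.0):
    """Mark each browser event as corroborated if any ISP record lies
    within ±window_s of its timestamp.

    Timestamps are epoch seconds; window_s=5.0 is an illustrative value.
    """
    isp_sorted = sorted(isp_events)
    corroborated = []
    for t in browser_events:
        # First ISP record at or after the lower edge of the window
        i = bisect.bisect_left(isp_sorted, t - window_s)
        ok = i < len(isp_sorted) and isp_sorted[i] <= t + window_s
        corroborated.append(ok)
    return corroborated

print(window_match([100.0, 200.0, 300.0], [98.0, 305.0]))
# → [True, False, True]
```

Events flagged `False` are the uncorroborated ones that feed the anomaly-detection stage; tolerating small offsets absorbs the clock and parser heterogeneity between the two logs.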
Authors - Akshay Ladha, Supraja P Abstract - Social media platforms such as Twitter have become a primary channel for customer support, requiring rapid, accurate, and scalable response solutions. Conventional customer support mechanisms are primarily manual and inefficient in handling large volumes of real-time interactions. This paper presents an AI-Assisted Twitter Support System that combines deep learning with distributed streaming engines to automate real-time customer interactions. The system design utilizes Apache Kafka for tweet streaming, Apache Spark Streaming for distributed processing, and Long Short-Term Memory (LSTM) networks for sentiment analysis and multi-class complaint classification. A confidence-aware decision-making module ensures that automated responses are produced only when the prediction confidence exceeds certain thresholds, thus avoiding potential miscommunications. The system was trained and tested on the Kaggle Airline Sentiment dataset (146,400 tweets) with three sentiment classes and eight complaint categories. The sentiment analysis model attained an accuracy of 85.2% (F1-score of 0.846), and the complaint classification model attained an accuracy of 80.5% (F1-score of 0.792). The complete pipeline maintained an average latency of 2.9 seconds with a maximum processing rate of 2500 tweets per minute.
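The confidence-aware decision module described here gates automated replies on the classifier's top-class probability. A minimal sketch, where the 0.85 threshold and the label names are illustrative rather than the paper's tuned values:

```python
def decide(probs, threshold=0.85):
    """Auto-reply only when the top-class probability clears the threshold;
    otherwise escalate the tweet to a human agent.

    `probs` maps class labels to probabilities; 0.85 is an assumed cutoff.
    """
    label = max(probs, key=probs.get)
    if probs[label] >= threshold:
        return ("auto_reply", label)
    return ("escalate", label)

print(decide({"negative": 0.91, "neutral": 0.06, "positive": 0.03}))
# → ('auto_reply', 'negative')
print(decide({"negative": 0.55, "neutral": 0.40, "positive": 0.05}))
# → ('escalate', 'negative')
```

Routing low-confidence predictions to humans trades some automation coverage for a lower miscommunication rate, the same trade-off the abstract motivates.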
Authors - Pravitha N R, Sreelakshmi S R, Valsalachandran K, Savithri S Abstract - The rapid expansion of digital services has significantly increased the collection and processing of personal data through online platforms such as e-commerce systems, social media applications, and digital payment services. To regulate the use of personal information, governments worldwide have introduced data protection regulations such as the General Data Protection Regulation (GDPR), the Digital Personal Data Protection Act (DPDPA), and the California Consumer Privacy Act (CCPA). Organizations publish privacy policies to inform users about their data practices; however, these policies are often lengthy, complex, and difficult for users to understand. Consequently, users frequently accept privacy policies without fully reviewing how their personal data is collected, processed, and shared. Recent research has explored automated approaches for privacy policy analysis using artificial intelligence techniques, including machine learning, natural language processing, and large language models. Retrieval-Augmented Generation (RAG) has further enhanced compliance evaluation by linking policy statements with relevant regulatory clauses. Despite these advancements, challenges remain, such as the lack of standardised datasets, limited explainability of AI decisions, dependence on prompt design, and insufficient validation with regulatory experts. This paper discusses future research directions in AI-driven privacy policy compliance analysis and highlights emerging opportunities for improving regulatory compliance assessment, user privacy protection, and transparent privacy governance in digital ecosystems.
Authors - Ayushi Raj, Malathy C Abstract - The rapid growth of sensitive data requires backup systems that are both storage-efficient and risk-aware. Traditional backup approaches rely on static policies that ignore temporal changes, data sensitivity, and redundancy, leading to inefficient storage use and higher risk exposure. This work proposes a risk-adaptive backup optimization framework integrating temporal modelling, sensitivity-aware deduplication, and online learning. The system reconstructs data evolution using intrinsic timestamps and quantifies data criticality through continuous sensitivity scoring. A unified risk model combines sensitivity, change intensity, and exposure over time to determine backup urgency. An online reinforcement learning agent dynamically optimizes backup decisions based on evolving data patterns. The framework applies secure, sensitivity-based deduplication to reduce redundancy while preserving privacy. Operating in a read-only, metadata-driven manner, it ensures compliance with strict data governance requirements. By decoupling decision logic from storage, the system supports hybrid cloud environments. Experimental results show reduced storage costs and controlled risk, demonstrating its effectiveness for scalable, intelligent data protection.
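The unified risk model here combines sensitivity, change intensity, and time-based exposure into a single backup-urgency score. One plausible sketch, where the linear weighting and the half-life exposure curve are assumptions (the abstract states only that the three factors are combined):

```python
import math

def backup_risk(sensitivity, change_intensity, hours_exposed,
                weights=(0.5, 0.3, 0.2), half_life_h=24.0):
    """Combine three risk factors into a backup-urgency score in [0, 1].

    Inputs are normalized to [0, 1]; the weights and the exponential
    exposure curve are hypothetical choices for this sketch.
    """
    # Exposure saturates toward 1 as unprotected time accumulates
    exposure = 1.0 - math.exp(-hours_exposed * math.log(2) / half_life_h)
    w_s, w_c, w_e = weights
    return w_s * sensitivity + w_c * change_intensity + w_e * exposure

# Highly sensitive, moderately changing data left unprotected for 48 h
score = backup_risk(sensitivity=0.9, change_intensity=0.6, hours_exposed=48)
print(round(score, 3))  # → 0.78
```

The online learning agent would then prioritize backup actions for the items with the highest such scores, re-evaluating as the factors evolve.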