Authors - Linda Sara Mathew, Anna Irene Ditto, Anna Keerthana V, Cristal James Tomy Abstract - Accurate, real-time crop mapping and yield prediction are essential for agricultural planning, food security, and climate-resilient decision-making. Conventional field surveys are slow, expensive, and inconsistent, whereas the growing supply of multispectral, hyperspectral, and SAR satellite imagery has made automated crop surveillance possible. Nevertheless, operational methods continue to suffer significant setbacks, such as low accuracy under cloud cover, the absence of models that capture the complex temporal dynamics of crop growth, difficulties in handling mixed pixels in smallholder landscapes, and the lack of a unified framework that integrates optical, SAR, and phenology data. Although recent research has investigated deep spatio-temporal models for rice mapping, SAR–optical fusion, mixed-pixel decomposition, temporal attention networks, multi-GPU UNet architectures, and phenology-based yield estimation, none of these provides an all-encompassing, scalable framework. This study proposes a Multimodal Deep Spatio-Temporal Framework that combines multispectral and SAR imagery with phenological data for automated crop mapping and yield prediction. With CNN-LSTM encoders, attention-based TCNs, adaptive mixed-pixel processing, multimodal fusion, and multi-GPU segmentation, the framework aims to provide a powerful, scalable agricultural intelligence system for real-time regional and national monitoring.
Authors - Emerson Joey Caro Abstract - Detecting brain tumors, or Brain Tumor Detection (BTD), from MRI scans is an essential step in assessing the presence and characteristics of any tumors and formulating an appropriate clinical management plan. Manual interpretation of MRI images by radiologists is neither time-efficient nor free of mistakes, which drives the need for automated, accurate, and reliable computational methods. In this study we compare advanced Deep Learning (DL) architectures, including traditional CNNs (VGG19, ResNet50, DenseNet), modernized CNNs inspired by transformer design (ConvNext), and EfficientNet, to distinguish tumor from non-tumor categories in brain MRI scans. Each model is trained and evaluated on a standardized dataset using measurable metrics such as accuracy, precision, recall, F1-score, and the confusion matrix. Our results demonstrate that modern CNN architectures such as ConvNext and EfficientNet outperform traditional CNNs, as they capture both local texture and spatial patterns and the global spatial context, resulting in enhanced classification performance. This benchmark informs the selection and adoption of the best-performing deep learning models for brain tumor identification, which may in turn help optimize diagnostic decision-making and reduce the diagnostic burden.
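The metrics listed in this abstract all derive from the binary confusion matrix; a minimal sketch with illustrative counts (not results from the paper) shows how they relate:

```python
# Toy binary confusion matrix for tumor vs. non-tumor classification.
# Counts are illustrative placeholders, not figures from the study.
tp, fn = 90, 10   # actual tumor: correctly / incorrectly classified
fp, tn = 5, 95    # actual non-tumor: incorrectly / correctly classified

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all correct predictions
precision = tp / (tp + fp)                    # of predicted tumors, how many were real
recall    = tp / (tp + fn)                    # of real tumors, how many were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)
```

With these counts, accuracy is 0.925 while F1 is slightly lower, which is why papers report both: accuracy alone can hide an imbalance between precision and recall.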
Authors - Vasavi Ravuri, Indupriya Vempati, Sai Anuradha Kappaganthula, Pavani Muppalla, Navya Taduri Abstract - In the shadow of overlooked safety violations, factories have lost thousands, in terms of capital as well as lives. This is especially harrowing because these losses were caused by easily preventable work accidents or easily noticeable defective machinery. Our paper examines how artificial-intelligence-based methodologies in particular can help mitigate these risks, drawing on past and present research. Based on the findings from the literature we reviewed, we also recommend a potential prototype system for real-time worker safety checks and automated industrial machine quality inspection. We have reviewed four major topics pertaining to our system: [1] Personal Protective Equipment (PPE) compliance detection through CCTV monitoring as opposed to manual monitoring, [2] industrial machine quality inspection for automatic defect identification, [3] evaluation of previously used object detection models and their performance in industrial applications, and [4] system-level considerations for practical, large-scale deployment of the said systems. We have compared methods, deployment strategies, and results from existing studies to identify key criteria such as scalable architectures and low-latency processing. We also highlight challenges such as insufficient annotated data for rare machinery defects, maintaining accuracy in harsh industrial conditions that might hinder detection of safety violations, and ethical issues with worker monitoring.
Authors - Siddalingappagouda Biradar, Vinod B Durdi, Suganthi Neelagiri, Devaraju Ramakrishna, Preeti Khanwalkar, Shashi Raj K Abstract - Phishing attacks continue to evolve in scale and sophistication, exploiting weaknesses across infrastructure, content, and user behavior. Earlier studies demonstrated that hybrid feature representations combining URL, HTML, and infrastructure features significantly outperform single-source approaches, with tree-based and deep learning models achieving detection accuracies exceeding 95%. However, these studies also revealed limitations related to global feature selection, cluster-agnostic learning, and evaluation protocols that may lead to optimistic performance estimates. In this paper, we propose a multi-cluster phishing detection framework that organizes features into three complementary clusters: Cluster 0 (C0) for infrastructure and transport-layer characteristics, Cluster 1 (C1) for URL and HTML content features, and Cluster 2 (C2) for behavioral and campaign-level patterns. To address the limitations of traditional feature selection methods, we introduce HC²FS (Heuristic-Constrained Class-Conditional Feature Selection), a cluster-aware and class-conditional approach that preserves low-variance yet highly discriminative phishing indicators. The proposed system is evaluated on large-scale datasets comprising over 600 combined features, using a strict 80% training and 20% testing split enforced prior to feature selection and model training.
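The motivation for HC²FS, that a global variance filter discards low-variance yet discriminative phishing indicators, can be illustrated with a small sketch. The scoring rule below is an illustrative class-conditional stand-in on synthetic data, not the published HC²FS heuristic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 1000 samples, 3 features, binary labels (phish vs. benign).
y = rng.integers(0, 2, size=1000)
X = np.column_stack([
    rng.normal(0.0, 5.0, 1000),      # high variance, non-discriminative
    rng.normal(y * 0.2, 0.02),       # LOW variance but highly discriminative
    rng.normal(0.0, 1.0, 1000),      # moderate variance, non-discriminative
])

# Traditional global variance filter: rejects the informative low-variance feature.
global_var = X.var(axis=0)
kept_by_variance = global_var > 0.5

# Class-conditional separability: distance between class means
# normalized by the pooled within-class standard deviation.
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
pooled_std = np.sqrt(0.5 * (X[y == 0].var(axis=0) + X[y == 1].var(axis=0)))
separability = np.abs(mu1 - mu0) / (pooled_std + 1e-12)
kept_by_class = separability > 1.0

print(kept_by_variance)  # variance filter drops the discriminative feature
print(kept_by_class)     # class-conditional score retains it
```

The second feature barely varies overall, yet its class means are ten within-class standard deviations apart; a variance threshold removes it while a class-conditional score keeps it, which is the kind of indicator the abstract says HC²FS is designed to preserve.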
Authors - Koutaro HACHIYA, Ioannis PATIAS Abstract - Inference latency remains a critical bottleneck in deploying large language models in real-time and resource-constrained environments. Prior work has proposed formulations that express latency as a function of key parameters. However, they often assume a linear dependence on sequence length, which fails to generalize to tasks involving significantly longer sequences, such as document-level language modeling, long-context retrieval, or time-series forecasting, where latency scales nonlinearly and unpredictably. This paper addresses the limitations of existing latency formulations by proposing three complementary enhancements to improve generalization across varying sequence lengths. First, we introduce a nonlinear term for sequence length, capturing the superlinear growth in latency observed in transformer-based architectures due to quadratic attention mechanisms and memory overhead. Second, we propose a sequence-length-dependent scaling factor for the sequence length parameter itself, allowing the model to adaptively adjust its sensitivity based on empirical latency profiles across different tasks and hardware configurations. Third, we incorporate an empirical correction term enabling calibration of the latency model to account for hardware-specific and implementation-level nuances. By explicitly modeling the nonlinear and context-sensitive behavior of sequence length, our approach offers a more faithful representation of latency dynamics. This work lays the foundation for more adaptive and hardware-aware latency estimation frameworks, with implications for model deployment, scheduling, and cost optimization in production systems. We conclude by discussing future directions for integrating dynamic profiling and reinforcement learning to further refine latency predictions in evolving runtime environments.
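The three enhancements described in this abstract can be sketched as one formula. The symbols below are illustrative, not taken from the paper: starting from a linear baseline T(n) = a + b n for sequence length n,

```latex
% Linear baseline: T(n) = a + b n.
% Enhanced latency model with the three proposed terms:
%   (1) a nonlinear term c n^2 for quadratic-attention cost,
%   (2) a length-dependent scaling factor gamma(n) on the linear term,
%   (3) an empirical hardware/implementation correction epsilon_hw(n).
T(n) \;=\; a \;+\; b\,\gamma(n)\,n \;+\; c\,n^{2} \;+\; \epsilon_{\mathrm{hw}}(n)
```

Under this reading, calibrating c, gamma, and epsilon_hw against measured latency profiles per task and hardware target recovers the linear baseline as the special case c = 0, gamma = 1, epsilon_hw = 0.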
Authors - Joao Paulo Sousa, Tiago Lopes, Tatiana Ferreira, Tatiana Batista, Pedro Malheiro, Joao Vitorino, Barbara Barroso, Carlos Costa Abstract - Medical hyperspectral imaging (MHSI) represents a burgeoning paradigm in diagnostic visualization, capable of capturing contiguous spectral signatures across hundreds of narrow wavelengths to delineate pathological structures invisible to the human eye. Despite its diagnostic richness, the advancement of deep learning models in the MHSI domain is severely constrained by two primary challenges: the extreme scarcity of high-quality, pixel-level annotated datasets and the overwhelming data redundancy inherent in high-dimensional hypercubes. Traditional self-supervised methods, particularly masked image modeling, often fail to prioritize discriminative tissue signatures, while domain-agnostic transfer learning from natural images proves inappropriate due to structural and feature-level incongruities. This paper introduces a novel research methodology: Reinforced Spatio-Spectral In-Context Learning (RSS-ICL). This framework integrates an asynchronous advantage actor-critic (A3C) reinforcement learning agent with visual in-context learning (ICL). The proposed model employs the RL agent to dynamically learn adaptive masking strategies that prioritize high-entropy, "hard-to-reconstruct" spatio-spectral voxels, thereby forcing the backbone architecture to capture intricate biochemical signatures during pre-training. By reformulating segmentation as a support-query inpainting task, RSS-ICL facilitates universal medical segmentation, allowing the model to adapt to novel clinical tasks and unseen tissue types in a zero-shot or one-shot manner. Theoretical arguments suggest that this synergistic approach effectively bridges the gap between low-level signal recovery and high-level semantic understanding in hyperspectral analysis. Through rigorous methodological development and empirical support from existing self-supervised benchmarks, this paper outlines a path for accelerating the deployment of interpretable, annotation-efficient clinical AI.
Authors - Sushmita Sarkar, Sumit Kumar Debnath Abstract - Multi-angle image synthesis is highly important for the generation of 3D scenes, but current methods are either expensive in terms of computational costs or lack photorealism in their outputs. We propose a novel sketch- and text-based multiview image generation approach that solves the above-mentioned problems by making efficient use of multimodal diffusion models. Our pipeline utilises DreamShaper v8 for converting the input sketch and text into a photorealistic 2D image and then passes this 2D image into a fine-tuned Zero123plus model for the final generation of consistent multiview images, showing a 43.69% improvement in overall perceptual quality compared to baseline sketch-to-multiview models. Moreover, our pipeline shows flexibility in scalability by generating anywhere from 6 to 64 consistent multiview images according to the requirements of the downstream tasks. We demonstrate the success of our pipeline through extensive experiments conducted using voxel-based grid approaches and Neural Radiance Fields (NeRF). Our pipeline greatly reduces computational costs, all while maintaining photorealism in the outputs, confirming the potential of sketch- and text-based multimodal conditioning as an intuitive and efficient paradigm for controlled 3D content generation.
Authors - Carl Kugblenu, Petri Vuorimaa Abstract - Compressed-domain audio steganography poses a critical forensic challenge in modern VoIP systems, particularly within low-bitrate codecs. Traditional deep learning models often lack interpretability and struggle with low embedding rates. This paper introduces AUSPEX, a lightweight forensic framework (~170k parameters) optimized for universal compressed audio steganalysis. A novel three-channel tensorization strategy is proposed, incorporating raw bits, temporal derivatives, and bit stability to amplify subtle embedding perturbations. A non-trainable high-pass residual stream further enhances sensitivity to first- and second-order temporal noise. To ensure forensic transparency, a dual-level explainability framework integrates intrinsic spatial attention with post-hoc Integrated Gradients, providing bit-level evidence attribution. Experiments demonstrate detection across CNV and PMS algorithms at low embedding rates. AUSPEX advances the field by unifying efficient, edge-deployable detection with rigorous human-centric forensic interpretability.
Authors - Nitika Gawande, Pradnya Bapat, Sanyukta Sasane, Trupti Bankar, Rakhi Dongaonkar, Rashmi Apte, Mangesh Bedekar Abstract - This study presents a thorough discussion of cuss-word usage in Hollywood films over a period of thirty-five years, from 1990 to 2025, particularly in genres such as Action, Comedy, and Romance. On the basis of a carefully selected dataset of cuss words from Kaggle along with a considerable subtitle-file dataset (.srt), results are obtained to determine how the intensity of profanity usage has varied over the years in the respective film genres.
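The core measurement this abstract describes, counting listed cuss words in .srt subtitle files, reduces to stripping the index and timestamp lines and tokenizing the dialogue. A minimal sketch, where the word list and subtitle snippet are illustrative placeholders rather than the study's Kaggle data:

```python
import re

# Illustrative word list; the study uses a curated Kaggle dataset instead.
CUSS_WORDS = {"damn", "hell"}

# Tiny inline .srt sample: index line, timestamp line, then dialogue.
srt_text = """1
00:00:01,000 --> 00:00:03,000
Well, damn it all.

2
00:00:04,000 --> 00:00:06,000
What the hell was that? Damn!
"""

def count_cusswords(srt, words):
    # Keep only dialogue: drop blank lines, bare index numbers, and timestamps.
    dialogue = [ln for ln in srt.splitlines()
                if ln.strip() and not ln.strip().isdigit() and "-->" not in ln]
    tokens = re.findall(r"[a-z']+", " ".join(dialogue).lower())
    return sum(tok in words for tok in tokens)

n = count_cusswords(srt_text, CUSS_WORDS)
print(n)  # 3 (two "damn", one "hell")
```

Normalizing per film (e.g. hits per thousand subtitle tokens) rather than raw counts would make the year-over-year and genre comparisons robust to differing film lengths.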
Authors - Kostiantyn Hrishchenko, Oleksii Pysarchuk Abstract - Flexible Job Shop Scheduling Problems (FJSP) involve large discrete decision spaces and strict feasibility constraints, making them challenging for deep reinforcement learning methods. In this work, we study how state representation and feature extraction architecture influence the performance of action-masked Proximal Policy Optimization (PPO) in flexible scheduling. The scheduling task is formulated as a sequential assignment of operations to machines with a fixed discrete action space, where infeasible actions are removed using a feasibility mask. The environment state is represented using three heterogeneous feature blocks describing resource availability, operation readiness, and time-related attributes of assignment alternatives. We compare a baseline single-branch encoder with a multi-branch feature extraction architecture that processes these blocks separately before aggregation. Experiments were conducted on the Brandimarte MK benchmark suite (MK01–MK10). Under identical training conditions, the multi-branch representation achieved lower makespan on 9 out of 10 instances, with relative improvements ranging from 2.4% to 27.8% compared to the single-branch baseline. The largest reductions were observed on MK06 (−27.8%) and MK10 (−25.2%), while performance remained comparable on MK08. Training results indicate improved stability and more consistent convergence for structured representations. These results demonstrate that structured state design and feature extraction architecture are critical factors in action-masked reinforcement learning for flexible job shop scheduling.
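The feasibility mask this abstract relies on is conventionally applied by setting infeasible-action logits to negative infinity before the softmax, so those actions receive exactly zero probability. A minimal sketch of that standard trick (the logits and mask values are illustrative, not from the paper):

```python
import numpy as np

def masked_policy(logits, feasible):
    """Standard invalid-action masking: infeasible (operation, machine)
    assignments get logit -inf, hence probability exactly 0 after softmax."""
    masked = np.where(feasible, logits, -np.inf)
    z = masked - masked.max()        # shift for numerical stability
    p = np.exp(z)                    # exp(-inf) == 0.0, no special-casing needed
    return p / p.sum()

# Fixed discrete action space of 6 assignment alternatives;
# actions 2 and 5 are infeasible at this scheduling step.
logits = np.array([1.0, 2.0, 3.0, 0.5, -1.0, 4.0])
feasible = np.array([True, True, False, True, True, False])

probs = masked_policy(logits, feasible)
print(probs.round(3))
```

Note that action 5 has the highest raw logit but ends up with zero probability; the policy gradient then flows only through feasible actions, which is what makes masked PPO viable on FJSP's strict constraints.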