Authors - Abdelrahman El Antably, Ali Hamdi, Ammar Mohamed

Abstract - Large Language Models (LLMs) frequently generate plausible but incorrect information, known as hallucinations. Detecting these errors at a fine-grained level is crucial, especially for morphologically rich languages with limited resources, such as Arabic. We introduce BALANCE: Bi-perspective Analysis for LLM Accuracy via coNsensus ChEcking, a novel dual-judge framework for token-level hallucination detection in Arabic LLM outputs. Our six-module pipeline features context filtration, argument decomposition, and distinct strict and lenient LLM-based judges. A consensus coordinator then synthesizes their verdicts, and a span annotator precisely localizes errors. Evaluated on the Arabic subset of the SemEval-2025 MuSHROOM benchmark, BALANCE achieved an Intersection over Union (IoU) score of 72.87%. This significantly outperforms the task's winning system, by approximately 8.76% relative improvement, and consistently surpasses zero-shot baselines across various LLMs by up to 39.80 percentage points.
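For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how span-level Intersection over Union can be computed between predicted and gold hallucination spans. The function name, the (start, end) end-exclusive span convention, and the handling of the empty/empty case are illustrative assumptions, not the benchmark's official scorer.

```python
def span_iou(pred_spans, gold_spans):
    """Intersection over Union of two sets of character spans.

    pred_spans, gold_spans: iterables of (start, end) pairs, end-exclusive.
    Returns 1.0 when both are empty, i.e. both agree there is no
    hallucinated span (assumed convention, not necessarily the official one).
    """
    # Expand each span list into a set of covered character indices.
    pred_chars = {i for s, e in pred_spans for i in range(s, e)}
    gold_chars = {i for s, e in gold_spans for i in range(s, e)}
    if not pred_chars and not gold_chars:
        return 1.0
    # IoU = |intersection| / |union| over the covered indices.
    return len(pred_chars & gold_chars) / len(pred_chars | gold_chars)
```

For example, a predicted span (3, 9) against a gold span (5, 12) overlaps on 4 characters out of 9 covered in total, giving an IoU of 4/9.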