Friday April 10, 2026 9:30am - 11:30am GMT+07

Authors - Abdelrahman El Antably, Ali Hamdi, Ammar Mohamed
Abstract - Large Language Models (LLMs) frequently generate plausible but incorrect information, known as hallucinations. Detecting these errors at a fine-grained level is crucial, especially for morphologically rich languages like Arabic with limited resources. We introduce BALANCE: Bi-perspective Analysis for LLM Accuracy via coNsensus ChEcking, a novel dual-judge framework for token-level hallucination detection in Arabic LLM outputs. Our six-module pipeline features context filtration, argument decomposition, and distinct strict and lenient LLM-based judges. A consensus coordinator then synthesizes their verdicts, and a span annotator precisely localizes errors. Evaluated on the Arabic subset of the SemEval-2025 MuSHROOM benchmark, BALANCE achieved an Intersection over Union (IoU) score of 72.87%. This is a relative improvement of approximately 8.76% over the task's winning system, and it consistently surpasses zero-shot baselines across various LLMs by up to 39.80 percentage points.
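For readers unfamiliar with the metric, the following is a minimal sketch of how an Intersection over Union score between predicted and reference hallucination spans can be computed. It is not the authors' implementation or the benchmark's official scorer: the function names, the half-open `(start, end)` span representation, and the convention that two empty annotations score 1.0 are all assumptions made for illustration.

```python
def spans_to_indices(spans):
    """Expand half-open (start, end) spans into a set of positions.

    Positions may be token or character offsets; the span format is an
    assumption made for this sketch.
    """
    return {i for start, end in spans for i in range(start, end)}


def span_iou(predicted, reference):
    """Intersection over Union of two collections of flagged positions."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Assumed convention: nothing flagged by either side is full agreement.
        return 1.0
    return len(predicted & reference) / len(predicted | reference)


# Hypothetical example: two predicted spans against two reference spans.
pred = spans_to_indices([(0, 4), (10, 15)])
gold = spans_to_indices([(2, 6), (10, 15)])
score = span_iou(pred, gold)  # 7 shared positions out of 11 in the union
```

Under this definition, a prediction is rewarded only for the positions it shares with the reference, so both over-flagging and under-flagging lower the score.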
Virtual Room G Bangkok, Thailand
