Authors - Christian Vera, Christian Torres-Moran
Abstract - This study examines how students distribute visual attention and coordinate gaze with response selection when solving image-supported multiple-choice questions in a Google Forms interface. Twenty-five students, selected through convenience sampling under explicit inclusion and exclusion criteria, participated; both fixations and click events were recorded. Oculomotor signals were processed with clustering algorithms to derive participant-specific gaze AOIs and click AOIs, complemented by a 3×3 grid-based spatial analysis to quantify global space utilization. The computed metrics included time to first fixation, total fixation duration and fixation count per area, transitions between areas, and the proportion of pre-response fixations within the region where the click was executed. Results show a systematic concentration of fixations in the central band of the interface, where the image and response options are located, with one or two dominant areas accounting for most fixation time. The optimal number of gaze clusters ranged from two to eight across participants, reflecting more focused versus more exploratory strategies. A high level of attention–action coupling was observed, with 80% to 95% of clicks occurring within the same area that concentrated most fixations. These findings support the use of eye tracking as a tool for cognitive validation of item design and inform principles for more efficient and transparent digital assessments.
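To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes fixations arrive as `(start_ms, x, y, duration_ms)` tuples and a click as `(t_ms, x, y)` in screen pixels; the function names, the tuple layout, and the sample values are all hypothetical. For simplicity it uses the 3×3 grid cell as the "region" in the coupling measure, whereas the study derives that region from cluster-based AOIs.

```python
from collections import defaultdict


def grid_cell(x, y, width, height, n=3):
    """Map a screen coordinate to a (row, col) cell of an n x n grid."""
    # Clamp so points on or beyond the screen edge still land in a valid cell.
    col = max(0, min(int(x / width * n), n - 1))
    row = max(0, min(int(y / height * n), n - 1))
    return row, col


def dwell_share(fixations, width, height, n=3):
    """Share of total fixation duration falling in each occupied grid cell."""
    totals = defaultdict(float)
    for _start, x, y, duration in fixations:
        totals[grid_cell(x, y, width, height, n)] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cell: d / grand_total for cell, d in totals.items()}


def pre_click_share(fixations, click, width, height, n=3):
    """Proportion of fixations that start before the click and fall in the
    same grid cell as the click (a grid-based stand-in for the AOI-based
    attention-action coupling described in the abstract)."""
    t_click, cx, cy = click
    target = grid_cell(cx, cy, width, height, n)
    before = [f for f in fixations if f[0] < t_click]
    if not before:
        return 0.0
    hits = sum(
        1 for _start, x, y, _dur in before
        if grid_cell(x, y, width, height, n) == target
    )
    return hits / len(before)


# Hypothetical sample data on a 1920 x 1080 screen (illustrative only).
WIDTH, HEIGHT = 1920, 1080
sample_fixations = [
    (0, 960, 540, 300),
    (350, 1000, 600, 500),
    (900, 200, 100, 200),
    (1200, 900, 500, 400),
    (2000, 100, 1000, 100),  # occurs after the click, so excluded from coupling
]
sample_click = (1700, 950, 560)

shares = dwell_share(sample_fixations, WIDTH, HEIGHT)
coupling = pre_click_share(sample_fixations, sample_click, WIDTH, HEIGHT)
```

In this sketch a dominant central cell in `shares` would correspond to the central-band concentration the abstract reports, and `coupling` is the per-response value that, aggregated over items, would feed a figure such as the reported 80% to 95% range. Replacing `grid_cell` with a lookup into cluster-derived AOIs would bring the sketch closer to the described method.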