Salous, Mazen and Lange, Daniel and von Reeken, Timo and Wolters, Maria and Heuten, Wilko and Boll, Susanne and Abdenebaoui, Larbi
IEEE Access
Ensuring that blind and visually impaired (BVI) individuals can fully participate in today’s image-rich digital world remains a significant challenge. Current visual assistants often rely on annotations from sighted contributors, which may fail to capture BVI individuals’ preferences and expectations. Addressing this challenge, we explore how to develop optimal, conversational image descriptions that resonate with BVI individuals, and how to effectively scale their creation. We propose a semi-automatic approach that couples large language models (LLMs) with iterative, BVI-driven refinements. Starting with initial LLM-generated image descriptions, a small set of BVI experts refine them to better meet BVI individuals’ needs. These enhanced examples guide subsequent LLM outputs, which are then evaluated by BVI end-users to confirm improved quality and satisfaction. We contribute to constructing large-scale, BVI-centered training data, thereby advancing inclusive and conversational visual accessibility. Across our studies, the results show a significant improvement in BVI end-users’ satisfaction with conversational image descriptions when edited by BVI experts. Additionally, the results indicate promising improvements when the LLM regenerates descriptions using those edits as few-shot examples. Moreover, we gained valuable insights from BVI participants, focused mainly on optimizing clarity, relevance, and interaction patterns for image descriptions and conversational exchanges between AI and BVI individuals.
2025
article
1-1
Ability Haptic Tablet For The Accessibility of Digital Content to the Visually Impaired