March 10, 2025
A new paper from the lab, ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality, appears in IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG; IF: 6.5) and is being presented at IEEE VR 2025.
In this paper, Duke I3T Lab PhD student Yanming Xiu explores using vision-language models to detect when virtual elements obstruct important parts of the real world in AR. The paper introduces a new attack dataset and develops an edge-computing architecture that detects real-world obstruction attacks with over 92% accuracy and under 1 s of latency.
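For readers curious what querying a vision-language model for this kind of check can look like, below is a minimal sketch using an off-the-shelf visual question answering model from Hugging Face. This is not the ViDDAR architecture from the paper (which uses its own detection pipeline, attack dataset, and edge-computing deployment to reach the reported accuracy and latency); the model choice, question phrasing, and file path here are illustrative assumptions.

```python
# A minimal sketch of the general idea, not the paper's actual pipeline:
# feed a captured AR frame to a vision-language model and ask whether
# virtual content is covering important real-world content.
from PIL import Image
from transformers import pipeline

# Off-the-shelf VQA model, chosen purely for illustration.
vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

def frame_is_obstructed(frame_path: str) -> bool:
    """Ask the VLM whether the AR overlay hides important real-world content."""
    frame = Image.open(frame_path)
    # The question wording is an assumption; a real system would tune it.
    result = vqa(image=frame,
                 question="Is a sign or a person covered by a graphic?")
    # The pipeline returns answers ranked by score,
    # e.g. [{"answer": "yes", "score": 0.93}, ...]
    return result[0]["answer"].lower() == "yes"

if __name__ == "__main__":
    # "ar_frame.png" is a placeholder path for a composited AR frame.
    print(frame_is_obstructed("ar_frame.png"))
```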
- Paper PDF
- Codebase and dataset
- Accompanying IEEE VR 2025 demonstration: PDF and video
We thank DARPA and the NSF for supporting this research.