March 10, 2025
A new paper from the lab, ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality, appears in IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG; IF: 6.5) and is being presented at IEEE VR 2025.
In this paper, Duke I3T Lab PhD student Yanming Xiu explores using vision-language models to detect when virtual elements obstruct important parts of the real world in AR. The paper introduces a new attack dataset and develops an edge-computing architecture that detects real-world obstruction attacks with over 92% accuracy and under 1 s of latency.
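For readers curious what querying a vision-language model for this kind of check can look like, below is a minimal sketch using an off-the-shelf visual question answering model from Hugging Face. This is not the ViDDAR architecture from the paper (which uses its own detection pipeline, attack dataset, and edge-computing deployment to reach the reported accuracy and latency); the model choice, question phrasing, and file path here are illustrative assumptions.

```python
# A minimal sketch of the general idea, not the paper's actual pipeline:
# feed a captured AR frame to a vision-language model and ask whether
# virtual content is covering important real-world content.
from PIL import Image
from transformers import pipeline

# Off-the-shelf VQA model, chosen purely for illustration.
vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

def frame_is_obstructed(frame_path: str) -> bool:
    """Ask the VLM whether the AR overlay hides important real-world content."""
    frame = Image.open(frame_path)
    # The question wording is an assumption; a real system would tune it.
    result = vqa(image=frame,
                 question="Is a sign or a person covered by a graphic?")
    # The pipeline returns answers ranked by score,
    # e.g. [{"answer": "yes", "score": 0.93}, ...]
    return result[0]["answer"].lower() == "yes"

if __name__ == "__main__":
    # "ar_frame.png" is a placeholder path for a composited AR frame.
    print(frame_is_obstructed("ar_frame.png"))
```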
- Paper PDF
- Codebase and dataset
- Accompanying IEEE VR 2025 demonstration: PDF and video
We thank DARPA and the NSF for supporting this research.