Jedi Vision | ECE443-Sp26 Project Showcase

By

Ajinkya Gokule, Colin Pannikkat, Sarvesh Thiruppathi Ahila and David Gesl

C++

Python

Artificial Intelligence (AI)

OpenCV

Computer Vision

3D Printing

Hardware

Audio

Docker

Spring 2026

Jetson Orin Nano

Stereo Camera

Camera

Bone Conduction

Video Depth Anything

GStreamer

Audio Transformation

Serialization

Steam Audio

Stereo Depth

YOLO

Jedi Vision is a real-time assistive navigation system designed to help visually impaired individuals perceive their surroundings through spatialized audio feedback. The system runs on an NVIDIA Jetson Orin Nano edge computer equipped with stereo cameras and bone conduction headphones, translating live visual scene information into intuitive 3D sound cues that convey the direction and distance of nearby people.
The system implements a multi-stage perception pipeline spanning two languages and runtimes. A Python-based vision driver performs real-time object detection using YOLOv11, with ByteTrack keeping object identities persistent across frames. It computes stereo depth via Semi-Global Block Matching (SGBM) on calibrated and rectified camera pairs, and fuses 2D bounding boxes with 3D reprojected coordinates using Gaussian-weighted depth sampling. Detected object positions are serialized into a compact binary struct format and transmitted over ZeroMQ IPC sockets to a C++ spatial audio engine built on Valve's Steam Audio SDK. The audio engine applies Head-Related Transfer Function (HRTF) binaural rendering, inverse-distance attenuation, and air absorption modeling to produce spatialized tones or musical cues through PortAudio output, so users can perceive that a person is "to their left and two meters away" purely through sound.
Notable accomplishments include real-time inference at ~30 FPS on the Jetson Orin Nano using CUDA acceleration, stereo depth calibration and rectification for metric 3D coordinate extraction, a fully Dockerized deployment pipeline that packages the complete vision and audio stack, and an audio rendering engine that supports both synthesized pentatonic tones and pre-recorded spatial song playback with per-object note allocation.
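
The fusion and serialization steps described above can be illustrated with a minimal sketch. It is not the project's code: the function names, the `sigma_frac` parameter, and the 16-byte record layout (one unsigned 32-bit track ID plus three 32-bit floats) are assumptions chosen for illustration. The depth map is assumed to hold metric depths, with zero or non-finite values marking pixels where stereo matching failed.

```python
import math
import struct

# Hypothetical wire layout (not taken from the project): little-endian
# uint32 track id followed by three float32 coordinates in metres.
RECORD = struct.Struct("<Ifff")


def gaussian_weighted_depth(depth_map, box, sigma_frac=0.25):
    """Average the valid depths inside a box, weighting central pixels most.

    depth_map is a list of rows; box is (x0, y0, x1, y1) with half-open
    pixel ranges. Returns None when the box contains no valid depth.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1 - 1) / 2.0, (y0 + y1 - 1) / 2.0
    sx = max((x1 - x0) * sigma_frac, 1e-6)
    sy = max((y1 - y0) * sigma_frac, 1e-6)
    total = 0.0
    weight_sum = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            z = depth_map[y][x]
            if not (math.isfinite(z) and z > 0.0):
                continue  # skip holes left by failed stereo matches
            w = math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
            total += w * z
            weight_sum += w
    return total / weight_sum if weight_sum > 0.0 else None


def pack_detection(track_id, x, y, z):
    """Serialize one tracked object into the assumed fixed-size record."""
    return RECORD.pack(track_id, x, y, z)


def unpack_detection(payload):
    """Inverse of pack_detection; returns (track_id, x, y, z)."""
    return RECORD.unpack(payload)
```

Weighting toward the box center reduces the influence of background pixels near the box edges, and a fixed-size record lets the receiving side decode each message without a parsing step. The resulting bytes could be sent as a single message over a socket such as the ZeroMQ IPC link mentioned above.
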
Key challenges included managing the Jetson's constrained 8 GB of memory during inference and Docker builds, tuning stereo block matching parameters for reliable depth at close range, and synchronizing the Python vision loop with the C++ audio render thread across the IPC boundary. Future improvements include adding voice command input for hands-free interaction, integrating semantic segmentation for richer scene understanding, adopting the hardware-accelerated depth estimator in NVIDIA's Vision Programming Interface (VPI) to further reduce latency, adding haptic feedback so that users can sense objects outside their field of view through touch, and conducting user studies with visually impaired participants.
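
One reason close-range depth is sensitive to block matching parameters follows from the standard rectified-stereo relation Z = f * B / d: nearer objects produce larger disparities, so the largest disparity the matcher searches sets the nearest depth it can report. The sketch below works through that arithmetic. The focal length and baseline are illustrative values, not the project's calibration, and the helper names are hypothetical. The parameter names `min_disparity` and `num_disparities` mirror OpenCV's SGBM settings, and the search range is treated as ending at roughly their sum.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def nearest_measurable_depth(focal_px, baseline_m, min_disparity, num_disparities):
    """Closest depth the matcher can report, set by the largest disparity searched."""
    max_disparity = min_disparity + num_disparities  # approximate upper bound
    return depth_from_disparity(max_disparity, focal_px, baseline_m)


# Illustrative numbers only; not the project's calibration.
FOCAL_PX = 800.0
BASELINE_M = 0.06

for num_disp in (64, 128, 256):
    z_min = nearest_measurable_depth(FOCAL_PX, BASELINE_M, 0, num_disp)
    print(f"numDisparities={num_disp:3d} -> nearest depth ~ {z_min:.3f} m")
```

Under these assumed values, widening the search from 64 to 256 disparities moves the nearest measurable depth from 0.75 m to about 0.19 m, at the cost of more matching work per frame.
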



Awards

Artifacts

Name	Description	Access
Project Poster	Final expo poster design. Outlines the system overview, team roles, and technical architecture.	Download
Executive Summary	Summary outlining project description, achievements and challenges, top next steps for future development, and a major-milestone timeline visual.	Download
Showcase Video Upload	Direct upload of the showcase video.	Link
Project Document	The comprehensive project document.	Link
Featured on COE LinkedIn	The College of Engineering featured our project on its social media.	Link

Engineering Project Showcase
