Jedi Vision is a real-time assistive navigation system that helps visually impaired individuals perceive their surroundings through spatialized audio feedback. It runs on an NVIDIA Jetson Orin Nano edge computer equipped with stereo cameras and bone conduction headphones, translating live visual scene information into intuitive 3D sound cues that convey the direction and distance of nearby people.
The system implements a multi-stage perception pipeline spanning two languages and runtimes. A Python-based vision driver performs real-time object detection using YOLOv11, with ByteTrack keeping object identities persistent across frames. It computes stereo depth via Semi-Global Block Matching (SGBM) on calibrated and rectified camera pairs, and fuses 2D bounding boxes with 3D reprojected coordinates using Gaussian-weighted depth sampling. Detected object positions are serialized into a compact binary struct format and transmitted over ZeroMQ IPC sockets to a C++ spatial audio engine built on Valve's Steam Audio SDK. The audio engine applies Head-Related Transfer Function (HRTF) binaural rendering, inverse-distance attenuation, and air absorption modeling to produce spatialized tones or musical cues through PortAudio output, so that users can perceive that a person is "to their left and two meters away" purely through sound.
Notable accomplishments include real-time inference at ~30 FPS on the Jetson Orin Nano using CUDA acceleration, successful stereo calibration and rectification for metric 3D coordinate extraction, a fully Dockerized deployment pipeline that packages the complete vision and audio stack, and an audio rendering engine that supports both synthesized pentatonic tones and pre-recorded spatial song playback with per-object note allocation.
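The Gaussian-weighted depth sampling step mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function name, the `sigma_frac` parameter, and the nested-list depth map are assumptions chosen to keep the example dependency-free. The idea is that pixels near the center of a bounding box are more likely to belong to the detected person than pixels near the edges, so they receive more weight when reducing the box to a single depth value.

```python
import math


def gaussian_weighted_depth(depth, box, sigma_frac=0.25):
    """Reduce the depth values inside a bounding box to one estimate.

    depth: 2D list of floats in meters; values <= 0, NaN, or inf are
           treated as invalid (e.g. failed stereo matches).
    box: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    sigma_frac: hypothetical tuning knob; the Gaussian sigma is this
                fraction of the box width/height.
    Returns the weighted mean depth, or None if no pixel is valid.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0

    # Clamp the box to the image so out-of-frame detections do not crash.
    x1, y1 = max(0, box[0]), max(0, box[1])
    x2, y2 = min(width, box[2]), min(height, box[3])
    if x2 <= x1 or y2 <= y1:
        return None

    # Center of the box in pixel coordinates and per-axis sigma.
    cx = (x1 + x2 - 1) / 2.0
    cy = (y1 + y2 - 1) / 2.0
    sx = max((x2 - x1) * sigma_frac, 1e-6)
    sy = max((y2 - y1) * sigma_frac, 1e-6)

    weighted_sum = 0.0
    weight_total = 0.0
    for y in range(y1, y2):
        for x in range(x1, x2):
            z = depth[y][x]
            # "not (z > 0)" is also True for NaN, so NaN is skipped here.
            if not (z > 0.0) or math.isinf(z):
                continue
            w = math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
            weighted_sum += w * z
            weight_total += w

    return weighted_sum / weight_total if weight_total > 0.0 else None
```

Compared with a plain mean over the box, this weighting pulls the estimate toward the depth at the box center, which tends to reduce contamination from background pixels at the box corners. A production version would likely vectorize the loop with NumPy.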
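To make the binary hand-off between the vision driver and the audio engine concrete, here is a small sketch of a possible wire format together with a simple inverse-distance gain. The layout (a little-endian object count followed by `track_id, x, y, z` records), the function names, and the `min_distance` clamp are all assumptions for illustration; the project's real struct layout and Steam Audio's internal attenuation model are not specified in this description.

```python
import math
import struct

# Hypothetical wire format (an assumption, not the project's actual layout):
#   header: uint32 object count
#   record: uint32 track_id, float32 x, float32 y, float32 z (meters)
# "<" selects little-endian with no padding, so a C++ reader can use
# fixed-size fields without worrying about alignment.
HEADER = struct.Struct("<I")
RECORD = struct.Struct("<Ifff")


def pack_detections(detections):
    """Serialize [(track_id, x, y, z), ...] into one bytes payload."""
    payload = bytearray(HEADER.pack(len(detections)))
    for track_id, x, y, z in detections:
        payload += RECORD.pack(track_id, x, y, z)
    return bytes(payload)


def unpack_detections(payload):
    """Inverse of pack_detections; raises ValueError on a size mismatch."""
    if len(payload) < HEADER.size:
        raise ValueError("payload shorter than header")
    (count,) = HEADER.unpack_from(payload, 0)
    expected = HEADER.size + count * RECORD.size
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    return [
        RECORD.unpack_from(payload, HEADER.size + i * RECORD.size)
        for i in range(count)
    ]


def inverse_distance_gain(x, y, z, min_distance=1.0):
    """Gain that falls off as 1/d, clamped to 1.0 inside min_distance."""
    d = math.sqrt(x * x + y * y + z * z)
    return min_distance / max(d, min_distance)
```

With this layout each object costs 16 bytes, so a frame with a handful of people fits comfortably in a single message. On the Python side the resulting bytes could be passed to a pyzmq socket's `send` method; the socket type and endpoint are left out here because the description does not state them.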
Key challenges included managing the Jetson's constrained 8 GB of memory during inference and Docker builds, tuning stereo block matching parameters for reliable depth at close range, and synchronizing the Python vision loop with the C++ audio render thread across the IPC boundary. Future improvements include adding voice command input for hands-free interaction, integrating semantic segmentation for richer scene understanding, adopting the hardware-accelerated depth estimator in NVIDIA's VPI (Vision Programming Interface) to further reduce latency, adding haptic feedback so that users can sense objects outside the field of view (FOV) through touch, and conducting user studies with visually impaired participants.
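The close-range depth challenge noted above follows from the standard stereo relation Z = f · B / d, where f is the focal length in pixels, B is the baseline in meters, and d is the disparity in pixels. Nearby objects produce large disparities, so the matcher's disparity search range bounds the closest measurable distance. The sketch below works through this with hypothetical camera parameters (700 px focal length, 6 cm baseline); these numbers are illustrative assumptions, not the project's calibration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Metric depth from the pinhole stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def min_measurable_depth(focal_px, baseline_m, min_disparity, num_disparities):
    """Closest depth a block matcher can report for a given search range.

    The largest disparity searched is approximately
    min_disparity + num_disparities, which corresponds to the nearest
    object the matcher can still resolve.
    """
    return depth_from_disparity(
        focal_px, baseline_m, min_disparity + num_disparities
    )


def depth_step(focal_px, baseline_m, depth_m, disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step at depth_m.

    Derived from dZ/dd = -f * B / d^2 = -Z^2 / (f * B).
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


# Hypothetical camera: 700 px focal length, 6 cm baseline.
FOCAL_PX = 700.0
BASELINE_M = 0.06

near_limit_64 = min_measurable_depth(FOCAL_PX, BASELINE_M, 0, 64)
near_limit_128 = min_measurable_depth(FOCAL_PX, BASELINE_M, 0, 128)
```

Under these assumed parameters, doubling the disparity search range from 64 to 128 halves the near limit, at the cost of more matching work per frame. That trade-off between close-range coverage and compute is one reason this tuning is delicate on an embedded device.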