GamingVision
Making games accessible for visually impaired players
About
GamingVision is a Windows accessibility tool that uses computer vision and text-to-speech to make video games accessible to visually impaired players. The application detects UI elements in games using trained YOLO models, extracts text via OCR, and reads it aloud with configurable priority levels.
As a visually impaired gamer, I built this tool to solve my own accessibility challenges. Many games lack built-in screen reader support, making it difficult for players with low vision to read menus, inventory items, and in-game text. GamingVision bridges that gap.
This project evolved from my earlier Python-based "No Man's Access" tool, rebuilt from the ground up in C#/.NET 8 for better performance, easier distribution, and a proper Windows GUI.
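To make that pipeline concrete, here is a minimal sketch of the OCR step using Windows.Media.Ocr (the API listed under Key Features below). This is illustrative, not GamingVision's actual code; it assumes a Windows-targeted framework such as net8.0-windows10.0.19041.0, and the bitmap handed in would be a region cropped from a YOLO detection.

```csharp
// Illustrative OCR step: recognize text in a cropped screenshot region.
// Assumes a Windows-targeted TFM (e.g. net8.0-windows10.0.19041.0).
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;

public static class OcrStep
{
    public static async Task<string?> ReadRegionAsync(SoftwareBitmap region)
    {
        // Uses the OCR language packs installed in the user's profile.
        OcrEngine? engine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (engine is null)
            return null; // no OCR language available on this machine

        OcrResult result = await engine.RecognizeAsync(region);
        return result.Text; // recognized lines joined into one string
    }
}
```

Text recovered this way is what gets routed to the speech tiers described under How It Works.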
Key Features
- Real-time object detection using YOLOv11 models via ONNX Runtime
- GPU acceleration with DirectML (works with NVIDIA, AMD, and Intel GPUs; see the sketch after this list)
- Three-tier detection system: Primary (auto-read), Secondary, and Tertiary objects
- Windows text-to-speech with configurable voices and speeds per tier
- OCR integration using Windows.Media.Ocr for text extraction
- Global hotkeys so you can control the app while gaming
- Per-game profiles with custom hotkeys and voice settings
- Training Tool for collecting screenshots and creating models for new games
- High-contrast, accessibility-first interface
- Debug logging for troubleshooting
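As a rough sketch of what the GPU bullet amounts to in code: ONNX Runtime's DirectML execution provider is appended to the session options, with CPU as the fallback. The helper below is illustrative (it assumes the Microsoft.ML.OnnxRuntime.DirectML NuGet package), not GamingVision's actual implementation.

```csharp
// Illustrative: create an ONNX Runtime session that prefers DirectML
// (any NVIDIA, AMD, or Intel GPU) and falls back to CPU.
// Assumes the Microsoft.ML.OnnxRuntime.DirectML NuGet package.
using Microsoft.ML.OnnxRuntime;

public static class ModelLoader
{
    public static InferenceSession CreateSession(string modelPath, bool useGpu)
    {
        var options = new SessionOptions();
        if (useGpu)
        {
            try
            {
                // Device 0 is the default DirectML adapter.
                options.AppendExecutionProvider_DML(0);
            }
            catch (OnnxRuntimeException)
            {
                // No DirectML-capable device; CPU execution is used instead.
            }
        }
        return new InferenceSession(modelPath, options);
    }
}
```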
How It Works
GamingVision uses a three-tier detection system designed around how gamers actually interact with game UIs:
- Primary objects - Quick-reference items like menu titles, item names, and button labels. These can be set to auto-read when they change.
- Secondary objects - Detailed information like descriptions and quest logs. Read on-demand via hotkey.
- Tertiary objects - Additional context like controls, hints, and menus. Also read on-demand.
This approach lets you quickly navigate menus and only hear detailed information when you actually want it, rather than being overwhelmed with constant speech.
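Here is a hypothetical sketch of how this tiering might look in code, using .NET's System.Speech for Windows text-to-speech. The type names and rate values are illustrative, not GamingVision's internals; per-game profiles would supply the real voices and speeds.

```csharp
// Hypothetical sketch of tiered speech. Tier names mirror the list
// above; the rates and types are illustrative only.
using System.Collections.Generic;
using System.Speech.Synthesis; // System.Speech NuGet package on .NET 8

public enum DetectionTier { Primary, Secondary, Tertiary }

public sealed class TieredSpeaker
{
    private readonly SpeechSynthesizer _synth = new();

    // Per-tier speaking rates (-10..10); in GamingVision these would
    // come from the per-game profile.
    private readonly Dictionary<DetectionTier, int> _rates = new()
    {
        [DetectionTier.Primary] = 2,   // quick-reference items read faster
        [DetectionTier.Secondary] = 0, // longer descriptions at normal speed
        [DetectionTier.Tertiary] = 0,
    };

    public void Speak(string text, DetectionTier tier)
    {
        _synth.SpeakAsyncCancelAll();   // don't talk over stale announcements
        _synth.Rate = _rates[tier];
        _synth.SpeakAsync(text);
    }
}
```

Primary objects would additionally be watched for text changes so they can auto-read; Secondary and Tertiary objects would only be spoken when their hotkeys fire.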
Visual Overlay
In addition to text-to-speech, GamingVision can display a visual overlay that highlights detected objects with high-contrast markers. Waypoints and other tracked elements are overlaid with black-bordered white boxes, making them much easier to see for players with low vision.
Note: The overlay feature is still in early development. It currently reduces game performance and has noticeable latency between detection and drawing, which can cause visual artifacts. Despite these limitations, it's still incredibly helpful for tracking where you need to go in-game.
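For illustration, the marker itself is simple to draw; the hard parts (the ones behind the performance and latency issues noted above) are capturing frames and keeping a transparent overlay window in sync with detections. A minimal GDI+ sketch, assuming System.Drawing.Common on Windows:

```csharp
// Illustrative only: one high-contrast marker (white fill, thick
// black border) drawn over a detected bounding box.
using System.Drawing;

public static class OverlayMarker
{
    public static void Draw(Graphics g, Rectangle box)
    {
        using var fill = new SolidBrush(Color.White);
        using var border = new Pen(Color.Black, 4f);
        g.FillRectangle(fill, box);
        g.DrawRectangle(border, box);
    }
}
```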
Default Hotkeys
- Alt+1 - Read primary objects
- Alt+2 - Read secondary objects
- Alt+3 - Read tertiary objects
- Alt+4 - Stop reading
- Alt+5 - Toggle detection on/off
- Alt+Q - Quit application
All hotkeys are configurable per game in the Game Settings panel.
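Under the hood, global hotkeys on Windows typically come from the user32 RegisterHotKey API. A minimal sketch for Alt+1, showing the general technique rather than GamingVision's exact implementation:

```csharp
// Illustrative: registering Alt+1 as a global hotkey via user32.
// The owning window receives WM_HOTKEY (0x0312) when it fires.
using System;
using System.Runtime.InteropServices;

internal static class GlobalHotkeys
{
    [DllImport("user32.dll", SetLastError = true)]
    public static extern bool RegisterHotKey(IntPtr hWnd, int id, uint fsModifiers, uint vk);

    [DllImport("user32.dll", SetLastError = true)]
    public static extern bool UnregisterHotKey(IntPtr hWnd, int id);

    public const uint MOD_ALT = 0x0001;
    public const uint VK_1 = 0x31; // the '1' key

    // e.g. RegisterHotKey(windowHandle, id: 1, MOD_ALT, VK_1);
    // remember to UnregisterHotKey on shutdown.
}
```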
Getting Started
Requirements
- Windows 10 or 11 (64-bit)
- .NET 8.0 Runtime
- GPU with DirectML support (NVIDIA, AMD, or Intel) - falls back to CPU if unavailable
Application-wide settings like GPU acceleration and debug logging can be configured in the App Settings panel.
Quick Start
- Download the latest release from GitHub
- Extract the ZIP file to a folder of your choice
- Run GamingVision.exe
- Select a game from the dropdown (No Man's Sky is included)
- Click "Start Detection" and launch your game
- Use the hotkeys to have UI elements read aloud
Adding Support for New Games
GamingVision uses per-game YOLO models to detect UI elements. Each game requires its own trained model because UI layouts, fonts, and visual styles vary significantly between games. If you want GamingVision to support a game that isn't available yet, you can help by collecting training data.
Training Data Collection Tool
GamingVision includes a console-based tool for collecting screenshots. If a model already exists for a game, it will automatically pre-label new screenshots to speed up the process (see the label format example after the steps below).
- Run the Training Tool and select a game (or create a new profile)
- Launch your game and play normally
- Press F1 whenever there's an interesting UI element on screen
- Screenshots are saved automatically to the training_data folder
- Press Escape to return to the menu
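For context, YOLO-format labels (what pre-labeling produces, and what annotation ultimately creates) are plain text files, one per screenshot, with one line per object: a class index followed by the box's normalized center x, center y, width, and height. The values below are made up for illustration:

```
0 0.500 0.080 0.420 0.050
3 0.250 0.550 0.300 0.400
```

Class indices map to the UI element names defined for that game's model.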
Want a Model for Your Favorite Game?
The model training pipeline, which uses Python and CUDA, is still being refined, so I'm not asking everyone to train their own models just yet. However, if you're interested in getting a specific game supported:
- Use the Training Tool to collect screenshots from the game
- Contact me and I'll walk you through the annotation process
- Send me your annotated training data and I'll train the model for you
This collaborative approach ensures quality models while the training workflow matures. Reach out at jpdoesdev@gmail.com if you'd like to help add support for a new game.
Why This Project Exists
The primary goal of GamingVision is to help visually impaired players enjoy games that would otherwise be inaccessible. But there's a bigger picture here.
This tool also serves as a demonstration for game developers. Everything GamingVision does through computer vision and external screen reading could be done far more effectively if built directly into games. Native accessibility features would be faster, more accurate, and wouldn't require players to run additional software.
If you're a game developer interested in making your game more accessible, I'd love to collaborate. The techniques used in GamingVision - tiered UI reading, configurable speech priorities, hotkey-triggered announcements - could all be implemented natively with much better results.
Help Add New Games
If you'd like to help expand GamingVision's game support, you can collect training data for games you play. Use the Training Tool to capture screenshots, then get in touch and I'll help you through the annotation and training process. Every new game model helps more visually impaired players enjoy games they couldn't access before.
Get In Touch
Interested in working together on game accessibility? Reach out at jpdoesdev@gmail.com.