GamingVision
Making games accessible for visually impaired players
About
GamingVision is a Windows accessibility tool that uses computer vision and text-to-speech to make video games accessible to visually impaired players. The application detects UI elements in games using trained YOLO models, extracts text via OCR, and reads it aloud with configurable priority levels.
As a visually impaired gamer, I built this tool to solve my own accessibility challenges. Many games lack built-in screen reader support, making it difficult for players with low vision to read menus, inventory items, and in-game text. GamingVision bridges that gap.
This project evolved from my earlier Python-based "No Man's Access" tool, rebuilt from the ground up in C#/.NET 8 for better performance, easier distribution, and a proper Windows GUI.
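To make that pipeline concrete, here is a minimal sketch of the OCR step using Windows.Media.Ocr (the API listed under Key Features below). This is illustrative, not GamingVision's actual code; it assumes a Windows-targeted framework such as net8.0-windows10.0.19041.0, and the bitmap handed in would be a region cropped from a YOLO detection.

```csharp
// Illustrative OCR step: recognize text in a cropped screenshot region.
// Assumes a Windows-targeted TFM (e.g. net8.0-windows10.0.19041.0).
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;

public static class OcrStep
{
    public static async Task<string?> ReadRegionAsync(SoftwareBitmap region)
    {
        // Uses the OCR language packs installed in the user's profile.
        OcrEngine? engine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (engine is null)
            return null; // no OCR language available on this machine

        OcrResult result = await engine.RecognizeAsync(region);
        return result.Text; // recognized lines joined into one string
    }
}
```

Text recovered this way is what gets routed to the speech tiers described under How It Works.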
Key Features
- Real-time object detection using YOLOv11 models via ONNX Runtime
- GPU acceleration with DirectML (works with NVIDIA, AMD, and Intel GPUs; see the sketch after this list)
- Three-tier detection system: Primary (auto-read), Secondary, and Tertiary objects
- Windows text-to-speech with configurable voices and speeds per tier
- OCR integration using Windows.Media.Ocr for text extraction
- Global hotkeys so you can control the app while gaming
- Per-game profiles with custom hotkeys and voice settings
- Training Tool for collecting screenshots and creating models for new games
- High-contrast, accessibility-first interface
- Debug logging for troubleshooting
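As a rough sketch of what the GPU bullet amounts to in code: ONNX Runtime's DirectML execution provider is appended to the session options, with CPU as the fallback. The helper below is illustrative (it assumes the Microsoft.ML.OnnxRuntime.DirectML NuGet package), not GamingVision's actual implementation.

```csharp
// Illustrative: create an ONNX Runtime session that prefers DirectML
// (any NVIDIA, AMD, or Intel GPU) and falls back to CPU.
// Assumes the Microsoft.ML.OnnxRuntime.DirectML NuGet package.
using Microsoft.ML.OnnxRuntime;

public static class ModelLoader
{
    public static InferenceSession CreateSession(string modelPath, bool useGpu)
    {
        var options = new SessionOptions();
        if (useGpu)
        {
            try
            {
                // Device 0 is the default DirectML adapter.
                options.AppendExecutionProvider_DML(0);
            }
            catch (OnnxRuntimeException)
            {
                // No DirectML-capable device; CPU execution is used instead.
            }
        }
        return new InferenceSession(modelPath, options);
    }
}
```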
How It Works
GamingVision uses a three-tier detection system designed around how gamers actually interact with game UIs:
- Primary objects - Quick-reference items like menu titles, item names, and button labels. These can be set to auto-read when they change.
- Secondary objects - Detailed information like descriptions and quest logs. Read on-demand via hotkey.
- Tertiary objects - Additional context like controls, hints, and menus. Also read on-demand.
This approach lets you quickly navigate menus and only hear detailed information when you actually want it, rather than being overwhelmed with constant speech.
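Here is a hypothetical sketch of how this tiering might look in code, using .NET's System.Speech for Windows text-to-speech. The type names and rate values are illustrative, not GamingVision's internals; per-game profiles would supply the real voices and speeds.

```csharp
// Hypothetical sketch of tiered speech. Tier names mirror the list
// above; the rates and types are illustrative only.
using System.Collections.Generic;
using System.Speech.Synthesis; // System.Speech NuGet package on .NET 8

public enum DetectionTier { Primary, Secondary, Tertiary }

public sealed class TieredSpeaker
{
    private readonly SpeechSynthesizer _synth = new();

    // Per-tier speaking rates (-10..10); in GamingVision these would
    // come from the per-game profile.
    private readonly Dictionary<DetectionTier, int> _rates = new()
    {
        [DetectionTier.Primary] = 2,   // quick-reference items read faster
        [DetectionTier.Secondary] = 0, // longer descriptions at normal speed
        [DetectionTier.Tertiary] = 0,
    };

    public void Speak(string text, DetectionTier tier)
    {
        _synth.SpeakAsyncCancelAll();   // don't talk over stale announcements
        _synth.Rate = _rates[tier];
        _synth.SpeakAsync(text);
    }
}
```

Primary objects would additionally be watched for text changes so they can auto-read; Secondary and Tertiary objects would only be spoken when their hotkeys fire.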
Visual Overlay
In addition to text-to-speech, GamingVision can display a visual overlay that highlights detected objects with high-contrast markers. Waypoints and other tracked elements are overlaid with black-bordered white boxes, making them much easier to see for players with low vision.
Note: The overlay feature is still in early development. It currently reduces game performance and has noticeable latency between detection and drawing, which can cause visual artifacts. Despite these limitations, it's still incredibly helpful for tracking where you need to go in-game.
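For illustration, the marker itself is simple to draw; the hard parts (the ones behind the performance and latency issues noted above) are capturing frames and keeping a transparent overlay window in sync with detections. A minimal GDI+ sketch, assuming System.Drawing.Common on Windows:

```csharp
// Illustrative only: one high-contrast marker (white fill, thick
// black border) drawn over a detected bounding box.
using System.Drawing;

public static class OverlayMarker
{
    public static void Draw(Graphics g, Rectangle box)
    {
        using var fill = new SolidBrush(Color.White);
        using var border = new Pen(Color.Black, 4f);
        g.FillRectangle(fill, box);
        g.DrawRectangle(border, box);
    }
}
```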
Default Hotkeys
- Alt+1 - Read primary objects
- Alt+2 - Read secondary objects
- Alt+3 - Read tertiary objects
- Alt+4 - Stop reading
- Alt+5 - Toggle detection on/off
- Alt+Q - Quit application
All hotkeys are configurable per game in the Game Settings panel.
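Under the hood, global hotkeys on Windows typically come from the user32 RegisterHotKey API. A minimal sketch for Alt+1, showing the general technique rather than GamingVision's exact implementation:

```csharp
// Illustrative: registering Alt+1 as a global hotkey via user32.
// The owning window receives WM_HOTKEY (0x0312) when it fires.
using System;
using System.Runtime.InteropServices;

internal static class GlobalHotkeys
{
    [DllImport("user32.dll", SetLastError = true)]
    public static extern bool RegisterHotKey(IntPtr hWnd, int id, uint fsModifiers, uint vk);

    [DllImport("user32.dll", SetLastError = true)]
    public static extern bool UnregisterHotKey(IntPtr hWnd, int id);

    public const uint MOD_ALT = 0x0001;
    public const uint VK_1 = 0x31; // the '1' key

    // e.g. RegisterHotKey(windowHandle, id: 1, MOD_ALT, VK_1);
    // remember to UnregisterHotKey on shutdown.
}
```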
Getting Started
Requirements
- Windows 10 or 11 (64-bit)
- .NET 8.0 Runtime
- GPU with DirectML support (NVIDIA, AMD, or Intel) - falls back to CPU if unavailable
Application-wide settings like GPU acceleration and debug logging can be configured in the App Settings panel.
Quick Start
- Download the latest release from GitHub
- Extract the ZIP file to a folder of your choice
- Run GamingVision.exe
- Select a game from the dropdown (No Man's Sky is included)
- Click "Start Detection" and launch your game
- Use the hotkeys to have UI elements read aloud
Adding Support for New Games
GamingVision uses per-game YOLO models to detect UI elements. Each game requires its own trained model because UI layouts, fonts, and visual styles vary significantly between games. If you want GamingVision to support a game that isn't available yet, you can help by collecting training data.
Training Data Collection Tool
GamingVision includes a console-based tool for collecting screenshots. If a model already exists for a game, it will automatically pre-label new screenshots to speed up the process (see the label format example after the steps below).
- Run the Training Tool and select a game (or create a new profile)
- Launch your game and play normally
- Press F1 whenever there's an interesting UI element on screen
- Screenshots are saved automatically to the training_data folder
- Press Escape to return to the menu
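For context, YOLO-format labels (what pre-labeling produces, and what annotation ultimately creates) are plain text files, one per screenshot, with one line per object: a class index followed by the box's normalized center x, center y, width, and height. The values below are made up for illustration:

```
0 0.500 0.080 0.420 0.050
3 0.250 0.550 0.300 0.400
```

Class indices map to the UI element names defined for that game's model.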
Want a Model for Your Favorite Game?
The model training pipeline, which uses Python and CUDA, is still being refined, so I'm not asking everyone to train their own models just yet. However, if you're interested in getting a specific game supported:
- Use the Training Tool to collect screenshots from the game
- Contact me and I'll walk you through the annotation process
- Send me your annotated training data and I'll train the model for you
This collaborative approach ensures quality models while the training workflow matures. Reach out at jpdoesdev@gmail.com if you'd like to help add support for a new game.
Why This Project Exists
The primary goal of GamingVision is to help visually impaired players enjoy games that would otherwise be inaccessible. But there's a bigger picture here.
This tool also serves as a demonstration for game developers. Everything GamingVision does through computer vision and external screen reading could be done far more effectively if built directly into games. Native accessibility features would be faster, more accurate, and wouldn't require players to run additional software.
If you're a game developer interested in making your game more accessible, I'd love to collaborate. The techniques used in GamingVision - tiered UI reading, configurable speech priorities, hotkey-triggered announcements - could all be implemented natively with much better results.
Help Add New Games
If you'd like to help expand GamingVision's game support, you can collect training data for games you play. Use the Training Tool to capture screenshots, then get in touch and I'll help you through the annotation and training process. Every new game model helps more visually impaired players enjoy games they couldn't access before.
Get In Touch
Interested in working together on game accessibility? Reach out at jpdoesdev@gmail.com.