# Ball detector

Two detector backends. The HSV classical path is the default. The YOLO path
is the fallback for harder lighting (HDR tone mapping, stadium shadows,
white jerseys).

## Why a learned detector at all

The HSV + circularity tracker is fast and zero-dependency, but it produces
false positives on jersey numbers, scoreboard graphics, and field highlights
that share the baseball's near-white tone. Under HDR-to-SDR tone mapping
(Monster card output when the Xbox has HDR enabled), the ball loses saturation
and the HSV envelope has to be widened so much that the false-positive rate
becomes unworkable.

A single-class YOLO detector trained directly on the capture feed fixes this.

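To make the trade-off above concrete, here is a minimal pure-Python sketch of an HSV tone gate combined with a circularity check. It is not the tracker's actual code: the function names, the thresholds (`max_saturation`, `min_value`, `min_circularity`), and the sample pixel values are all hypothetical and chosen only for illustration.

```python
import colorsys
import math


def is_ball_tone(r, g, b, max_saturation=0.20, min_value=0.80):
    """Return True if an RGB pixel (0-255 per channel) is near-white in HSV.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return s <= max_saturation and v >= min_value


def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a perfect circle, lower for other shapes."""
    if perimeter <= 0:
        return 0.0
    return 4.0 * math.pi * area / (perimeter ** 2)


def looks_like_ball(mean_rgb, area, perimeter, min_circularity=0.80, **tone):
    """Accept a blob only if it passes both the tone and the shape gate."""
    return (is_ball_tone(*mean_rgb, **tone)
            and circularity(area, perimeter) >= min_circularity)


# A bright, round blob of radius 6 px passes both gates.
bright_ball = looks_like_ball((240, 238, 230),
                              area=math.pi * 6 ** 2,
                              perimeter=2 * math.pi * 6)

# A hypothetical duller ball pixel fails the default tone gate...
dull_ball = (150, 148, 140)
passes_default = is_ball_tone(*dull_ball)
# ...and only passes once the envelope is widened,
passes_widened = is_ball_tone(*dull_ball, min_value=0.5)
# at which point a plain grey graphic pixel passes the tone gate as well.
grey_graphic_passes = is_ball_tone(140, 140, 140, min_value=0.5)
```

In this sketch the widened envelope admits the grey graphic pixel that the default one rejected, which is the failure mode described above: the shape gate alone then has to reject every bright or grey blob in the frame.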
## Training summary

| Setting / metric | Value  |
| ---------------- | ------ |
| Architecture     | YOLO11n (single-class, `ball`) |
| Image size       | 640 px |
| Epochs           | 45 (early-stopped from 80) |
| Batch            | 8      |
| Final mAP50      | 0.94   |
| Final mAP50-95   | 0.38   |
| Final precision  | 0.92   |
| Final recall     | 0.93   |

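The settings in the table could be reproduced with a call along the following lines. This is a sketch that assumes the Ultralytics Python package was used for training (suggested by the `best.pt` artifact, but not stated here). The dataset file name `ball.yaml`, the starting weights `yolo11n.pt`, and the `patience` value are assumptions; the table only says the run was early-stopped from 80 epochs, not which patience triggered it.

```python
def build_train_args(data_yaml="ball.yaml", patience=10):
    """Collect training keyword arguments matching the summary table.

    `data_yaml` and `patience` are hypothetical placeholders.
    """
    return {
        "data": data_yaml,     # dataset description file (assumed name)
        "imgsz": 640,          # image size from the table
        "epochs": 80,          # upper bound; the run stopped at epoch 45
        "batch": 8,            # batch size from the table
        "single_cls": True,    # treat every label as the one `ball` class
        "patience": patience,  # early-stopping patience (assumed value)
    }


def train(weights="yolo11n.pt", data_yaml="ball.yaml"):
    """Start a training run; requires the `ultralytics` package."""
    from ultralytics import YOLO  # imported lazily so the sketch loads without it

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))


train_args = build_train_args()
```

Calling `train()` writes checkpoints such as `best.pt` into the run directory chosen by the library; nothing is trained until that function is invoked.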
Plots for this run (images not reproduced here): training curves; box
precision, recall, F1, and PR curves; confusion matrix (single class plus
background).

## Dataset

Frames are sampled at 12 FPS from live capture during batting practice and
labeled in YOLO format with a single class. The labeled set is split 70 / 20 /
10 into train / val / test by `tools/yolo_split_dataset.py`. A hard-negatives
pass (false-positive frames from the previous-generation HSV tracker, labeled
with no boxes) reduces background activations on jerseys and scoreboards.

Dataset volumes are not committed to this repository. Training is intended to
be reproduced from each operator's own capture feed; see
`tools/yolo_collect_frames.py` and `tools/yolo_label_ball.py`.

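For operators rebuilding the dataset, the 70 / 20 / 10 split can be illustrated with a short deterministic sketch. This is not the contents of `tools/yolo_split_dataset.py`, which is not shown here; the function name, the fixed seed, and the file-name pattern are assumptions.

```python
import random


def split_dataset(items, percents=(70, 20, 10), seed=0):
    """Shuffle items reproducibly and split them into train / val / test.

    Integer percentages avoid floating-point rounding; any remainder
    after the train and val cuts goes to the test split.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = sorted(items)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


frames = [f"frame_{i:04d}.jpg" for i in range(100)]  # hypothetical file names
splits = split_dataset(frames)
```

Since the frames come from continuous 12 FPS capture, neighbouring frames can be near-duplicates. Splitting by capture session rather than by individual frame may give a more honest validation score, at the cost of less exact ratios.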
## Runtime

The trained `best.pt` is exported to ONNX and consumed by
`io_titan/mlb26_gcv_yolo.py` inside Gtuner IV's Computer Vision worker. ONNX
keeps inference at ~22 ms per frame on CPU at 320x320 input, which sustains
about 45 FPS but is above the 16.7 ms per-frame budget of a 60 FPS feed. GPU
inference is roughly 4 ms, which is well inside that budget.
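
Latency figures like these can be checked against a frame-rate budget with a few lines of arithmetic. The helper names below are hypothetical, and whether the worker actually skips frames is an assumption, not something documented here.

```python
import math


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def max_rate_fps(latency_ms):
    """Highest inference rate a given per-frame latency can sustain."""
    return 1000.0 / latency_ms


def frame_stride(latency_ms, fps):
    """Smallest N such that running inference on every Nth frame keeps up."""
    return max(1, math.ceil(latency_ms / frame_budget_ms(fps)))


budget_60 = frame_budget_ms(60)    # about 16.7 ms
cpu_rate = max_rate_fps(22)        # about 45.5 inferences per second
cpu_stride = frame_stride(22, 60)  # 2: every second frame
gpu_stride = frame_stride(4, 60)   # 1: every frame
```

By this estimate a CPU-only worker at 22 ms would need to run detection on every second frame of a 60 FPS feed, while the 4 ms GPU path can process every frame with room to spare.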