docs/BALL_DETECTOR.md · Pitch Tracker CV

# Ball detector
 
Two detector backends. The classical HSV path is the default. The YOLO path
is the fallback for harder conditions (HDR tone mapping, stadium shadows,
white jerseys).
 
## Why a learned detector at all
 
The HSV + circularity tracker is fast and zero-dependency, but it produces false
positives on jersey numbers, scoreboard graphics, and field highlights that share
the baseball's near-white tone. Under HDR-to-SDR tone mapping (Monster card output
when the Xbox has HDR enabled), the ball loses saturation and the HSV envelope
has to be widened so far that the false-positive rate becomes unworkable.
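
The trade-off can be illustrated with a minimal sketch. It is not the tracker's actual code: the thresholds, pixel values, and function names below are hypothetical, and only the standard library is used. It shows a near-white HSV gate plus the usual circularity measure `4 * pi * area / perimeter^2`, and how relaxing the gate far enough to catch a dimmed ball also admits a jersey pixel.

```python
import colorsys
import math


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric quotient 4*pi*A / P^2: 1.0 for a circle, lower otherwise."""
    if perimeter <= 0:
        return 0.0
    return 4.0 * math.pi * area / (perimeter * perimeter)


def in_white_envelope(rgb, s_max: float, v_min: float) -> bool:
    """True if an 8-bit RGB pixel is near white: low saturation, high value."""
    r, g, b = (c / 255.0 for c in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    return s <= s_max and v >= v_min


# Hypothetical envelopes and pixel values, chosen only for illustration.
TIGHT = {"s_max": 0.08, "v_min": 0.90}
WIDE = {"s_max": 0.08, "v_min": 0.75}

ball_sdr = (245, 242, 235)         # bright ball on an SDR feed
ball_tonemapped = (200, 197, 192)  # same ball, dimmed by tone mapping
jersey = (205, 200, 190)           # off-white jersey fabric

for name, envelope in (("tight", TIGHT), ("wide", WIDE)):
    hits = {
        label: in_white_envelope(pixel, **envelope)
        for label, pixel in (
            ("ball_sdr", ball_sdr),
            ("ball_tonemapped", ball_tonemapped),
            ("jersey", jersey),
        )
    }
    print(name, hits)

# A circle scores 1.0; a square blob scores pi/4, about 0.785.
print(circularity(math.pi * 5 ** 2, 2 * math.pi * 5))
print(circularity(10 * 10, 4 * 10))
```

With these made-up values the tight envelope accepts only the SDR ball, while the wide envelope accepts the tone-mapped ball and the jersey pixel alike. The circularity check still rejects non-round blobs, but it cannot separate the ball from other small, round, near-white regions.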
 
A single-class YOLO detector trained directly on the capture feed fixes this.
 
## Training summary
 
| Setting / metric | Value  |
| ---------------- | ------ |
| Architecture     | YOLOv11n (single-class, `ball`) |
| Image size       | 640 px |
| Epochs           | 45 (early-stopped from 80) |
| Batch size       | 8      |
| Final mAP50      | 0.94   |
| Final mAP50-95   | 0.38   |
| Final precision  | 0.92   |
| Final recall     | 0.93   |
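
For reference, the final precision and recall in the table can be combined into a single F1 value. The helper below is a small illustrative sketch (the function name is not from this repository), and the result describes only the reported operating point, not necessarily the peak of the F1 curve.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Final precision and recall taken from the training summary table.
f1 = f1_score(0.92, 0.93)
print(f"F1 at the reported operating point: {f1:.3f}")
```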
 
Training curves: ![results](images/results.png)
 
Box precision / recall / F1 / PR curves:
 
| Curve | Plot |
| ----- | ---- |
| Precision | ![P](images/BoxP_curve.png) |
| Recall    | ![R](images/BoxR_curve.png) |
| F1        | ![F1](images/BoxF1_curve.png) |
| PR        | ![PR](images/BoxPR_curve.png) |
 
Confusion matrix (single class plus background):
 
![confusion](images/confusion_matrix_normalized.png)
 
## Dataset
 
Frames are sampled at 12 FPS from live capture during batting practice and
labeled in YOLO format with a single class. `tools/yolo_split_dataset.py` splits
the labeled set 70 / 20 / 10 into train / val / test. A hard-negatives pass
(false-positive frames from the previous-generation HSV tracker, labeled with
empty boxes) reduces background activations on jerseys and scoreboards.
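
A 70 / 20 / 10 split of this kind can be sketched as follows. This is an assumption about the general approach, not the contents of `tools/yolo_split_dataset.py`: the function name, the fixed seed, and the file names are hypothetical.

```python
import random


def split_dataset(items, percents=(70, 20, 10), seed=0):
    """Shuffle deterministically, then cut into train / val / test lists."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder, so no frame is dropped
    }


# Hypothetical frame names standing in for the labeled capture frames.
frames = [f"frame_{i:05d}.jpg" for i in range(100)]
splits = split_dataset(frames)
print({name: len(part) for name, part in splits.items()})
```

Integer percentages avoid floating-point rounding at the cut points, and the test split takes whatever remains. In YOLO-format datasets a hard-negative frame is commonly stored as an image with an empty label file, so such frames could pass through the same split unchanged.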
 
Dataset volumes are not committed to this repository. Training is intended to
be reproduced from each operator's own capture feed; see
`tools/yolo_collect_frames.py` and `tools/yolo_label_ball.py`.
 
## Runtime
 
The trained `best.pt` is exported to ONNX and consumed by
`io_titan/mlb26_gcv_yolo.py` inside Gtuner IV's Computer Vision worker. ONNX
keeps inference at ~22 ms per frame on CPU at 320x320 input, or roughly 45 FPS.
That is longer than the 16.7 ms per-frame interval of a 60 FPS feed, so the CPU
path cannot run on every frame. GPU inference, at roughly 4 ms, can.
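
As a quick sanity check, the per-frame time available at a given frame rate is `1000 / fps` milliseconds, and the highest sustainable rate for a given latency is `1000 / latency_ms`. The sketch below applies both to the latency figures quoted above; the helper names are illustrative and not part of the repository.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def max_fps(latency_ms: float) -> float:
    """Highest frame rate sustainable if every frame costs latency_ms."""
    return 1000.0 / latency_ms


budget = frame_budget_ms(60)
# Latencies as quoted in the Runtime section.
for name, latency in (("CPU", 22.0), ("GPU", 4.0)):
    verdict = "fits" if latency <= budget else "exceeds"
    print(
        f"{name}: {latency:.0f} ms {verdict} the {budget:.1f} ms budget "
        f"(max ~{max_fps(latency):.0f} FPS)"
    )
```

If the worker only needs detections at a lower rate than the capture feed, the relevant budget is the interval between processed frames, so the same helpers can be called with that rate instead of 60.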
