PyroGuardian
Edge-AI UAV Fire Detection System
“How do we deploy an 86.7M-parameter vision model on a <10W edge device at 30 FPS?”
90%
Inference Speedup
30
FPS at 720p
51 GB
Dataset Size
<1s
Alert Latency
01. The Problem
Early fire detection from UAVs can save lives — but existing approaches demand a choice: powerful cloud inference (too slow over cellular links, unacceptable in remote terrain) or lightweight edge models (too inaccurate for safety-critical alerting). No existing system deployed an 80M+ parameter detection model on a sub-10W edge device while sustaining real-time video throughput.
02. Why It's Hard
An 86.7M-parameter RT-DETR model running in PyTorch FP32 at 720p manages only ~3 FPS on Jetson Nano — far below real-time. The challenge is the precision–performance–power trade-off: FP16 quantization can introduce accuracy regressions in fine-grained fire-boundary detection; CUDA kernel fusions must be hand-validated against detection benchmarks; and the inference pipeline must remain stable across 8 fire conditions (smoldering, crown fire, grass fire, structural fire, and 4 night variants).
03. Our Approach: TensorRT FP16 + DeepStream Pipeline
We converted the RT-DETR checkpoint to ONNX, then applied TensorRT FP16 quantization, using custom calibration data from the curated 51 GB dataset to minimize accuracy regression on fire-specific low-contrast frames. The DeepStream SDK manages buffer pools, CUDA memory, and H.264 encoding in a single pipeline — eliminating Python GIL bottlenecks. AWS SNS severity scoring uses a sliding-window confidence aggregator to suppress single-frame false positives without sacrificing detection latency.
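The export path above can be sketched with NVIDIA's `trtexec` tool. File names, the export script, and the ONNX input tensor name (`images`) are illustrative assumptions, not the project's actual artifacts:

```shell
# 1. Export the PyTorch checkpoint to ONNX (script and file names hypothetical).
python export_onnx.py --weights rtdetr_fire.pt --output rtdetr_fire.onnx

# 2. Build a TensorRT FP16 engine on the Jetson itself —
#    TensorRT engines are specific to the GPU they are built on.
trtexec --onnx=rtdetr_fire.onnx \
        --fp16 \
        --saveEngine=rtdetr_fire_fp16.engine \
        --shapes=images:1x3x720x1280
```

Note that plain FP16 builds in TensorRT do not strictly require a calibration cache (that is an INT8 concept); here the curated fire frames serve to validate that FP16 accuracy holds on low-contrast inputs.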
Architecture — Four-stage pipeline: UAV camera → Jetson Nano inference → H.264 stream → AWS alerting & monitoring.
- 1. UAV camera captures 720p frames at 30 FPS via MIPI CSI
- 2. Jetson Nano runs RT-DETR via TensorRT FP16 engine — 90% faster than PyTorch baseline
- 3. DeepStream manages buffer pools + CUDA memory — zero-copy pipeline to H.264 encoder
- 4. H.264 stream transmitted over cellular link with 20% lower bandwidth vs. raw MJPEG
- 5. Severity score computed from sliding-window confidence aggregation (suppresses single-frame false positives)
- 6. AWS SNS dispatches prioritized alerts to 500+ users with RBAC role filtering, <1s latency
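Step 5's sliding-window aggregation can be sketched in a few lines of Python. Window size, threshold, and hit count here are illustrative defaults, not the deployed values; at 30 FPS a 15-frame window spans 0.5 s, consistent with the <1 s alert latency:

```python
from collections import deque

class SlidingWindowAggregator:
    """Suppress single-frame false positives by requiring sustained
    detections across a short window of frames (sketch; parameters
    are illustrative, not the deployed values)."""

    def __init__(self, window=15, threshold=0.5, min_hits=5):
        self.scores = deque(maxlen=window)  # per-frame max fire confidence
        self.threshold = threshold          # confidence needed to count a "hit"
        self.min_hits = min_hits            # hits in window needed to alert

    def update(self, frame_confidence):
        """Push one frame's confidence; return (alert, severity)."""
        self.scores.append(frame_confidence)
        hits = [s for s in self.scores if s >= self.threshold]
        alert = len(hits) >= self.min_hits
        severity = sum(hits) / len(hits) if hits else 0.0
        return alert, severity
```

A single spiky frame never fires an alert on its own; only a run of confident detections does, which is exactly the false-positive suppression described above.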
04. Key Results
- ▹Applied TensorRT FP16 + CUDA acceleration achieving 90% inference speedup vs. PyTorch FP32, sustaining 30 FPS at 720p
- ▹Curated 51GB multi-scenario dataset — 10K+ annotated frames across 8 fire conditions with targeted augmentations
- ▹AWS SNS alerts with RBAC roles: <1s latency for 500+ users, dynamic severity-score prioritization
- ▹Deployed in 3+ industries with H.264 low-bandwidth streaming, reducing video feed downtime by 20%
- ▹Runner-Up, Honeywell Drone Technologies Hackathon 2024 (200+ teams)
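The severity-prioritized, RBAC-filtered dispatch can be sketched as a severity-to-role routing step. The role names and thresholds below are hypothetical; in production the routing runs through AWS SNS, where severity can travel as a message attribute and subscription filter policies do this matching server-side:

```python
# Hypothetical role -> minimum-severity routing table (illustrative only).
ROLE_MIN_SEVERITY = {
    "fire_chief": 0.0,  # sees every alert
    "operator":   0.4,  # routine and serious alerts
    "observer":   0.8,  # only critical alerts
}

def recipients_for(severity, users):
    """Return the users whose role qualifies for an alert of this severity.
    Unknown roles default to a threshold of 1.0, i.e. they receive nothing."""
    return [u for u in users
            if severity >= ROLE_MIN_SEVERITY.get(u["role"], 1.0)]
```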
| Method | FPS @ 720p |
|---|---|
| PyTorch FP32 (baseline) | ~3 FPS |
| ONNX FP32 | ~9 FPS |
| TensorRT FP16 (ours) | 30 FPS |
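The headline "90% inference speedup" follows directly from the table as a per-frame latency reduction:

```python
# Per-frame latency at the baseline and optimized throughputs from the table.
baseline_fps, optimized_fps = 3, 30
baseline_ms = 1000 / baseline_fps    # ~333 ms/frame
optimized_ms = 1000 / optimized_fps  # ~33 ms/frame
reduction = 1 - optimized_ms / baseline_ms
print(f"{reduction:.0%}")  # → 90%
```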
05. What I Learned & Open Questions
TensorRT calibration data must match deployment conditions precisely — calibrating on generic ImageNet samples introduced 8% mAP regression on dark/smoky fire frames vs. calibrating on the target dataset.
The DeepStream buffer pool size is the single biggest tunable — too small and you drop frames; too large and you exceed Jetson's shared memory budget.
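The buffer-pool tuning lives in the `deepstream-app` configuration. The fragment below is a sketch with illustrative values, not the deployed config; `buffer-pool-size` under `[streammux]` is the tunable discussed above:

```ini
# Fragment of a deepstream-app config (values illustrative).
# buffer-pool-size: too small drops frames under load; too large
# eats into the Jetson's shared CPU/GPU memory budget.
[streammux]
live-source=1
batch-size=1
width=1280
height=720
buffer-pool-size=4
batched-push-timeout=33000   # microseconds, ~one frame interval at 30 FPS
```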
Open question: Can sparse attention mechanisms in the transformer backbone reduce VRAM pressure enough to fit larger detection models on next-generation edge hardware?
Open question: How does detection accuracy degrade under adversarial weather conditions (rain, fog) that weren't represented in the training dataset?