PyroGuardian
Edge-AI UAV Fire Detection System
“How do we deploy an 86.7M-parameter vision model on a <10W edge device at 30 FPS?”
90%
Inference Speedup
30
FPS at 720p
51 GB
Dataset Size
<1s
Alert Latency
01. The Problem
Early fire detection from UAVs can save lives — but existing approaches demand a choice: powerful cloud inference (too slow over cellular links, unacceptable in remote terrain) or lightweight edge models (too inaccurate for safety-critical alerting). No existing system deployed an 80M+ parameter detection model on a sub-10W edge device while sustaining real-time video throughput.
02. Why It's Hard
An 86.7M-parameter RT-DETR model running in PyTorch FP32 at 720p manages only ~3 FPS on Jetson Nano — far below real-time. The challenge is the precision–performance–power trade-off: FP16 quantization can introduce accuracy regressions in fine-grained fire-boundary detection; CUDA kernel fusions must be hand-validated against detection benchmarks; and the inference pipeline must remain stable across 8 fire conditions (smoldering, crown fire, grass fire, structural fire, and 4 night variants).
03. Our Approach: TensorRT FP16 + DeepStream Pipeline
We converted the RT-DETR checkpoint to ONNX, then applied TensorRT FP16 quantization, using custom calibration data from the curated 51 GB dataset to minimize accuracy regression on fire-specific low-contrast frames. The DeepStream SDK manages buffer pools, CUDA memory, and H.264 encoding in a single pipeline — eliminating Python GIL bottlenecks. AWS SNS severity scoring uses a sliding-window confidence aggregator to suppress single-frame false positives without sacrificing detection latency.
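The export path above can be sketched with NVIDIA's `trtexec` tool. File names, the export script, and the ONNX input tensor name (`images`) are illustrative assumptions, not the project's actual artifacts:

```shell
# 1. Export the PyTorch checkpoint to ONNX (script and file names hypothetical).
python export_onnx.py --weights rtdetr_fire.pt --output rtdetr_fire.onnx

# 2. Build a TensorRT FP16 engine on the Jetson itself —
#    TensorRT engines are specific to the GPU they are built on.
trtexec --onnx=rtdetr_fire.onnx \
        --fp16 \
        --saveEngine=rtdetr_fire_fp16.engine \
        --shapes=images:1x3x720x1280
```

Note that plain FP16 builds in TensorRT do not strictly require a calibration cache (that is an INT8 concept); here the curated fire frames serve to validate that FP16 accuracy holds on low-contrast inputs.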
Architecture — Four-stage pipeline: UAV camera → Jetson Nano inference → H.264 stream → AWS alerting & monitoring.
- 1. UAV camera captures 720p frames at 30 FPS via MIPI CSI
- 2. Jetson Nano runs RT-DETR via TensorRT FP16 engine — 90% faster than PyTorch baseline
- 3. DeepStream manages buffer pools + CUDA memory — zero-copy pipeline to H.264 encoder
- 4. H.264 stream transmitted over cellular link with 20% lower bandwidth vs. raw MJPEG
- 5. Severity score computed from sliding-window confidence aggregation (suppresses single-frame false positives)
- 6. AWS SNS dispatches prioritized alerts to 500+ users with RBAC role filtering, <1s latency
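Step 5's sliding-window aggregation can be sketched in a few lines of Python. Window size, threshold, and hit count here are illustrative defaults, not the deployed values; at 30 FPS a 15-frame window spans 0.5 s, consistent with the <1 s alert latency:

```python
from collections import deque

class SlidingWindowAggregator:
    """Suppress single-frame false positives by requiring sustained
    detections across a short window of frames (sketch; parameters
    are illustrative, not the deployed values)."""

    def __init__(self, window=15, threshold=0.5, min_hits=5):
        self.scores = deque(maxlen=window)  # per-frame max fire confidence
        self.threshold = threshold          # confidence needed to count a "hit"
        self.min_hits = min_hits            # hits in window needed to alert

    def update(self, frame_confidence):
        """Push one frame's confidence; return (alert, severity)."""
        self.scores.append(frame_confidence)
        hits = [s for s in self.scores if s >= self.threshold]
        alert = len(hits) >= self.min_hits
        severity = sum(hits) / len(hits) if hits else 0.0
        return alert, severity
```

A single spiky frame never fires an alert on its own; only a run of confident detections does, which is exactly the false-positive suppression described above.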
04. Key Results
- ▹Applied TensorRT FP16 + CUDA acceleration achieving 90% inference speedup vs. PyTorch FP32, sustaining 30 FPS at 720p
- ▹Curated 51GB multi-scenario dataset — 10K+ annotated frames across 8 fire conditions with targeted augmentations
- ▹AWS SNS alerts with RBAC roles: <1s latency for 500+ users, dynamic severity-score prioritization
- ▹Deployed in 3+ industries with H.264 low-bandwidth streaming, reducing video feed downtime by 20%
- ▹Runner-Up, Honeywell Drone Technologies Hackathon 2024 (200+ teams)
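The severity-prioritized, RBAC-filtered dispatch can be sketched as a severity-to-role routing step. The role names and thresholds below are hypothetical; in production the routing runs through AWS SNS, where severity can travel as a message attribute and subscription filter policies do this matching server-side:

```python
# Hypothetical role -> minimum-severity routing table (illustrative only).
ROLE_MIN_SEVERITY = {
    "fire_chief": 0.0,  # sees every alert
    "operator":   0.4,  # routine and serious alerts
    "observer":   0.8,  # only critical alerts
}

def recipients_for(severity, users):
    """Return the users whose role qualifies for an alert of this severity.
    Unknown roles default to a threshold of 1.0, i.e. they receive nothing."""
    return [u for u in users
            if severity >= ROLE_MIN_SEVERITY.get(u["role"], 1.0)]
```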
| Method | FPS @ 720p |
|---|---|
| PyTorch FP32 (baseline) | ~3 FPS |
| ONNX FP32 | ~9 FPS |
| TensorRT FP16 (ours) | 30 FPS |
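The headline "90% inference speedup" follows directly from the table as a per-frame latency reduction:

```python
# Per-frame latency at the baseline and optimized throughputs from the table.
baseline_fps, optimized_fps = 3, 30
baseline_ms = 1000 / baseline_fps    # ~333 ms/frame
optimized_ms = 1000 / optimized_fps  # ~33 ms/frame
reduction = 1 - optimized_ms / baseline_ms
print(f"{reduction:.0%}")  # → 90%
```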
05. What I Learned & Open Questions
TensorRT calibration data must match deployment conditions precisely — calibrating on generic ImageNet samples introduced 8% mAP regression on dark/smoky fire frames vs. calibrating on the target dataset.
The DeepStream buffer pool size is the single biggest tunable — too small and you drop frames; too large and you exceed Jetson's shared memory budget.
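The buffer-pool tuning lives in the `deepstream-app` configuration. The fragment below is a sketch with illustrative values, not the deployed config; `buffer-pool-size` under `[streammux]` is the tunable discussed above:

```ini
# Fragment of a deepstream-app config (values illustrative).
# buffer-pool-size: too small drops frames under load; too large
# eats into the Jetson's shared CPU/GPU memory budget.
[streammux]
live-source=1
batch-size=1
width=1280
height=720
buffer-pool-size=4
batched-push-timeout=33000   # microseconds, ~one frame interval at 30 FPS
```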
Open question: Can sparse attention mechanisms in the transformer backbone reduce VRAM pressure enough to fit larger detection models on next-generation edge hardware?
Open question: How does detection accuracy degrade under adversarial weather conditions (rain, fog) that weren't represented in the training dataset?