IoT 百萬設備架構選型 Part 1:核心架構與技術選型

IoT 1M Device Architecture Overview
Phase 1 核心架構:EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry

English Abstract — Part 1 of 3. Core architecture for 1M IoT devices: EMQX (Rule Engine direct write) → TimescaleDB (hot 7d + Continuous Aggregates) → FastAPIBFF → Dashboard. Includes protocol selection, broker comparison, three-tier storage, BFF design, cost estimation (~$17-33K/month), and AWS/GCP mapping. Scale-out (>1M): add Redpanda + ClickHouse.

前言

1 百萬台設備、每 10 秒回報一次 = 100K writes/sec~1.7 TB/day

本系列採用漸進式架構 — Phase 1 只需 4 個核心組件即可上線:

Phase 組件 適用規模
1 (MVP) EMQX + TimescaleDB + FastAPI + BFF < 100 萬
2 + ClickHouse + S3(冷熱分層) 同上
3 + Multi-tenant + DR 同上
4 + Redpanda + FastStream(event streaming) > 100 萬
系列文章: Part 1 核心架構(本篇) Part 2 安全與多租戶 Part 3 運維與可靠性

架構資料流

Telemetry(Device → Dashboard)

flowchart LR
    D[Device] -->|MQTT| B[EMQX]
    B -->|Rule Engine| T[TimescaleDB]
    T --> A[FastAPI]
    A --> BFF[BFF]
    BFF -->|WebSocket| U[Dashboard]

Command(Dashboard → Device)

flowchart RL
    U[Dashboard] -->|WS/REST| BFF[BFF]
    BFF --> A[FastAPI]
    A -->|MQTT QoS 1| B[EMQX]
    B --> D[Device]

EMQX Rule Engine 直寫 TimescaleDB,無 Event Streaming 中間層。延遲 <50ms,適合 1M 以下。>1M 需 dedup / event replay 時再加 Redpanda。


通訊協定與 Broker

協定 雙向 功耗 Overhead 適用
MQTT v5 Yes 極低 2-byte header IoT 預設
CoAP 有限 極低 UDP NB-IoT 受限設備
gRPC Yes HTTP/2 + protobuf Service-to-service

MQTT v5 關鍵功能:Correlation ID、Shared subscriptions、Message expiry、Retained messages、LWT。

Broker 1M 連線 Clustering 推薦 說明
EMQX ✓ (100M+) RAFT ★★★★★ 開源、Rule Engine 內建、社群最大
HiveMQ 原生 ★★★★ 商業授權、企業支援佳
Mosquitto x (~100K) 僅 dev 單線程、無 clustering
部署 AWS GCP 月費
EMQX Cloud AWS GCP ~$8-15K
Self-hosted K8s EKS GKE ~$3-8K + ops

Python 後端:asyncio

Free Threading (PEP 703) 預計 Python 3.16 (~2028) 才正式。1M 連線是 I/O-bound,asyncio + uvloop (2-4x 提升) + 多 worker 進程是正解。

HAProxy → Uvicorn worker 1..N (asyncio + uvloop, ~50-100K conn/worker)
           └── Redis/NATS cross-process pub/sub

BFF (Backend for Frontend)

Backend 不直接面對前端 UI。BFF 層負責 WebSocket、API 聚合、Response 裁切。

flowchart LR
    subgraph BE["FastAPI Backend"]
        D[Device API]
        T2[Telemetry API]
        C[Command API]
    end
    BE -->|gRPC / REST| W[BFF-Web]
    BE -->|gRPC / REST| M[BFF-Mobile]
    W -->|WebSocket| WD[Dashboard]
    M -->|REST + Push| MA[Mobile App]
職責 不做
Backend Device CRUD、Telemetry、Command、RBAC、MQTT UI 邏輯
BFF WS 管理、聚合查詢、裁切、i18n 直連 DB/MQTT
面向 選擇 說明
語言 FastAPI 或 Next.js API Routes 依前端團隊技術棧
BFF → Backend gRPC 或 REST gRPC 效能好,REST 開發快
快取 Redis Status cache + WS pub/sub

EMQX Rule Engine 資料寫入

不使用獨立 Event Streaming。Rule Engine 內建 PostgreSQL connector 直寫 TimescaleDB:

  • SQL-like 過濾:SELECT * FROM "telemetry/#" WHERE payload.temperature > 50
  • 訊息轉發、格式轉換、基本 dedup、Rate Limiting + 背壓

1M 設備需 event replay / 多消費者時,加入 Redpanda(Phase 4)。


三層儲存策略

DB 保留 查詢延遲 月成本/TB 說明
Hot TimescaleDB 7 days <10ms ~$200 原始解析度,Dashboard 即時查詢
Warm ClickHouse 30-90d 50-500ms ~$50 1min/5min 聚合,分析查詢
Cold S3 + Parquet 秒級 ~$2-5 時/日聚合,DuckDB ad-hoc

Continuous Aggregates 是 Dashboard 查詢的關鍵 — 自動預聚合 1min/5min/1hr,查詢量降 600x。

CREATE MATERIALIZED VIEW sensor_1min
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', time) AS bucket, device_id,
       avg(temperature) AS avg_temp, max(temperature) AS max_temp
FROM telemetry GROUP BY bucket, device_id;
flowchart TD
    D[Devices] -->|MQTT| E[EMQX]
    E -->|Rule Engine| H[TimescaleDB]
    H -->|Aggregate| W[ClickHouse]
    W -->|Archive| C[S3 Parquet]
    H -->|Query| A[FastAPI]
    A -->|gRPC| B[BFF]
    B -->|WS| U[Dashboard]

成本估算: 1M 設備雲端 managed 約 ~$17-33K/月(AWS/GCP),詳見 Part 3 成本與風險


技術選型總表

選擇 AWS GCP
設備協定 MQTT v5
Broker EMQX EMQX Cloud EMQX Cloud
Data Ingestion Rule Engine
Streaming (>1M) Redpanda MSK Redpanda Cloud
Backend FastAPI + asyncio Fargate Cloud Run
BFF FastAPI / Next.js Fargate Cloud Run
UI 推送 WebSocket via BFF ALB Cloud LB
Hot DB TimescaleDB Timescale Cloud Timescale Cloud
Warm DB ClickHouse ClickHouse Cloud ClickHouse Cloud
Cold S3 + Parquet S3 GCS
Observability OTel + Grafana CloudWatch Cloud Monitoring

下一篇

相關資源




    Enjoy Reading This Article?

    Here are some more articles you might like to read next:

  • /yt2pdf 全解析:YouTube 影片 → 雙語 PDF 摘要的 6 階段自動化 Pipeline
  • Claude Code Agent 架構深度拆解:8 個可複用的 Production 設計模式
  • 本地 Agent Swarm 框架全解析:從架構比較到簡單實作
  • IoT 百萬設備架構選型 Part 3:運維、成本與可靠性
  • IoT 百萬設備架構選型 Part 2:安全與多租戶