Kevin’s Tech Blog

/yt2pdf in Depth: A 6-Stage Automation Pipeline from YouTube Video to Bilingual PDF Summary

2026-04-04T02:00:00+00:00

yt2pdf: a 6-stage pipeline from YouTube URL to bilingual PDF

English Abstract — This post dissects /yt2pdf, a 6-stage automation pipeline that converts any YouTube video into bilingual (EN + Traditional Chinese) PDF summaries. The pipeline chains yt-dlp subtitle extraction with Whisper ASR fallback, AI-powered bilingual summarization, headless Chrome PDF rendering with base64-embedded images, and Backblaze B2 cloud upload with 7-day presigned URLs. We examine the transcript fallback strategy, the orchestrator pattern, and 6 key design decisions for production deployment.

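Introduction

In the earlier Channel Plugin walkthrough we built the infrastructure for two-way communication over Telegram and Discord. Users then started dropping YouTube links into the channel and asking "What is this video about?" Watching each video, writing up a summary, and replying by hand was far too slow.

/yt2pdf solves this with a single command that extracts the subtitles, generates a bilingual summary, renders a PDF, and uploads it to the cloud. A user types /yt2pdf https://youtube.com/watch?v=xxx in Telegram and receives a download link to a cleanly formatted PDF a few minutes later.

This is the core feature of claude-code-channels v1.1.0. This post breaks down its full pipeline architecture.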

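Pipeline Architecture Overview

The flow is split into 6 stages, with the work handled by separate Python modules:

Full 6-stage architecture: three-layer subtitle fallback + conversion pipeline + tech stack

Simplified flowchart: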

flowchart TB
    A["1. Parse URL"] --> B["2. Thumbnail + Metadata"]
    B --> D{"3. Subtitles?"}
    D -->|"Manual / Auto"| E["SRT → Text"]
    D -->|"No subs"| F["Whisper ASR"] --> E
    E --> G["4. AI Summary (EN + zh-TW)"]
    G --> H["5. Markdown → HTML → PDF"]
    H --> J["6. Upload B2 → Presigned URL"]

From the channel-integration point of view, the flow is even simpler:

flowchart LR
    U["User"] -->|"/yt2pdf URL"| T["Telegram / Discord"]
    T -->|"MCP notification"| CC["Agent Session"]
    CC -->|"run pipeline"| PY["Python Scripts"]
    PY -->|"JSON result"| CC
    CC -->|"reply + URLs"| T

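Each Python module has a clear responsibility:

Module	Responsibility	Input	Output	Notes
`get_transcript.py`	Subtitle extraction + Whisper fallback	Video ID	Plain text	Three-layer fallback to maximize transcript coverage
`build_html.py`	Markdown → styled HTML	`.md` file	`.html` file	Base64-embedded images
`build_pdf.py`	HTML → PDF	`.html` file	`.pdf` file	Rendered by headless Chrome
`upload_b2.py`	Upload to B2 + generate link	`.pdf` file	Presigned URL	7-day time-limited download
`yt2pdf.py`	Chains the build and upload steps above	`.md` files	JSON array	Single B2 authorization

Three-Layer Fallback Strategy for Subtitle Extraction

Subtitle extraction is the least predictable part of the pipeline, because not every video has subtitles. get_transcript.py implements a three-layer fallback:

Method	Source	Tool	Latency	Accuracy	Best for
Manual subtitles	Human-uploaded subtitles	yt-dlp `--write-subs`	~2s	Highest	Videos with manual subtitles
Auto-generated	Generated by YouTube	yt-dlp `--write-auto-subs`	~2s	Medium	English videos, popular languages
Whisper ASR	Speech-to-text from audio	ffmpeg + HuggingFace API	30-120s	High	Videos without subtitles

Core logic: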

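import tempfile
from pathlib import Path

# Helpers used below (download_subtitles, srt_to_text, whisper_transcribe_hf) are not shown in this excerpt.
def get_transcript(video_id: str) -> str | None: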
    video_url = f"https://www.youtube.com/watch?v={video_id}"

    with tempfile.TemporaryDirectory(prefix="yt_transcript_") as tmpdir:
        tmp = Path(tmpdir)

        # Layer 1 & 2: Try subtitles (manual → auto-generated)
        srt = download_subtitles(video_url, tmp, lang="en")
        if srt:
            text = srt_to_text(srt)
            if len(text) > 100:
                return text

        # Layer 3: Whisper fallback
        text = whisper_transcribe_hf(video_url, tmp)
        if text and len(text) > 100:
            return text

    return None

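Production Notes: The Whisper fallback adds 30-120 seconds of latency (depending on video length), and the HuggingFace Inference API is rate limited. Send a "Processing…" message to the channel first so users know the request is being handled. SRT parsing strips timestamps and sequence numbers and keeps only the plain text.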
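The SRT cleanup mentioned above can be sketched roughly as follows. The post names srt_to_text but does not show its body, so this is a hypothetical version that assumes the function receives the SRT content as a string; the duplicate-line skip is also an assumption, added because auto-generated captions often repeat lines.

```python
import re

# Matches SRT timing lines such as "00:00:02,000 --> 00:00:04,000".
_TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timing lines, keep the spoken text (hypothetical sketch)."""
    kept: list[str] = []
    for raw in srt.splitlines():
        line = raw.strip()
        # Skip blank lines, cue sequence numbers, and timing lines.
        if not line or line.isdigit() or _TIMESTAMP.match(line):
            continue
        # Skip a line that exactly repeats the previous kept line.
        if kept and kept[-1] == line:
            continue
        kept.append(line)
    return " ".join(kept)


sample = """1
00:00:00,000 --> 00:00:02,000
Hello world

2
00:00:02,000 --> 00:00:04,000
This is a test
"""
print(srt_to_text(sample))
```

One limitation of this sketch: a subtitle line made only of digits (for example "2024") is treated as a cue number and dropped.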

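Markdown → PDF: A 3-Step Conversion

Once the transcript is available, the AI generates a bilingual Markdown summary. The .md files then have to become downloadable PDFs. The orchestrator, yt2pdf.py, chains the three steps: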

def process_one(md_path, title, upload, b2_prefix, b2_authorized):
    lang = _detect_lang(md_path)  # *_en.md → "en", *_zh-tw.md → "zh-tw"
    result = {"lang": lang, "md": str(md_path)}

    # Step 1: Markdown → styled HTML (base64 embedded images)
    html_content = build_html(md_path, title=title, lang=lang)
    html_path = md_path.with_suffix(".html")
    html_path.write_text(html_content, encoding="utf-8")

    # Step 2: HTML → PDF via headless Chrome
    pdf_path = md_path.with_suffix(".pdf")
    pdf_result = html_to_pdf(html_path, pdf_path)

    # Step 3: Upload to B2 (optional)
    if upload and pdf_result and b2_authorized:
        b2_path = f"{b2_prefix}/{pdf_path.name}"
        url = upload_file(pdf_result, b2_path)
        result["url"] = url

    return result

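Key design points of the three steps:

Base64 image embedding: build_html.py converts local images (such as thumb.jpg) into data URIs embedded in the HTML. The PDF is therefore fully self-contained and displays correctly offline.

CJK font support: the HTML template specifies the font stack "Noto Sans TC", "Microsoft JhengHei", "PingFang TC" so Traditional Chinese renders correctly on every platform. The CSS uses @page { size: A4; margin: 2cm; } to control the page size.

Headless Chrome: build_pdf.py generates the PDF with google-chrome --headless --print-to-pdf. Of the PDF engines considered (wkhtmltopdf and WeasyPrint were the alternatives), Chrome's CSS engine gave the most complete CJK support.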
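As an illustration of the Base64 embedding step, here is a minimal sketch. The function name embed_local_images and the regex-based approach are assumptions for illustration, not the actual build_html.py code; it rewrites local img sources into data URIs and leaves remote or already-embedded images alone.

```python
import base64
import mimetypes
import re
import tempfile
from pathlib import Path

# Captures: (everything up to the opening quote of src)(the src value)(closing quote).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def embed_local_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> paths with base64 data URIs (hypothetical helper)."""

    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)  # remote or already embedded: leave untouched
        path = base_dir / src
        if not path.is_file():
            return match.group(0)  # missing file: keep the original reference
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)


with tempfile.TemporaryDirectory() as tmp:
    # Three bytes standing in for a real JPEG file.
    (Path(tmp) / "thumb.jpg").write_bytes(b"\xff\xd8\xff")
    result = embed_local_images('<img alt="cover" src="thumb.jpg">', Path(tmp))
print(result)
```

A real implementation would more likely walk the HTML with a parser than use a regex, but the data URI format is the same.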

Production Notes: Docker environments need the --no-sandbox flag and the fonts-noto-cjk package. build_pdf.py automatically searches for the google-chrome, google-chrome-stable, and chromium executables.

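B2 Upload and Presigned URLs

After the PDF is generated, it is uploaded to Backblaze B2 cloud storage and a presigned URL is created:

Single authorization: the orchestrator calls authorize_b2() once at startup, and all files share the same session
Date-partitioned paths: yt2pdf/2026-04-04/summary_en.pdf, which makes cleanup by date easy
7-day TTL: b2 get-download-url-with-auth --duration 604800 creates a time-limited download link
Fallback: if the B2 upload fails, the PDF is attached directly in the channel instead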
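The path and TTL conventions in the list above reduce to a few lines. This is a sketch only: b2_object_path is a hypothetical helper name, and the constant simply shows where the 604800 passed to --duration comes from.

```python
from datetime import date, timedelta

# 7 days expressed in seconds, the value passed to --duration.
PRESIGN_TTL_SECONDS = int(timedelta(days=7).total_seconds())


def b2_object_path(prefix: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key (hypothetical helper)."""
    return f"{prefix.strip('/')}/{day.isoformat()}/{filename}"


print(PRESIGN_TTL_SECONDS)
print(b2_object_path("yt2pdf", date(2026, 4, 4), "summary_en.pdf"))
```

Because the date is a path segment, cleaning up old summaries becomes a prefix delete per day rather than a scan over every object.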

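Why a presigned URL instead of a direct attachment? The Telegram Bot API sends each file as a separate message. Sending both the EN and zh-TW PDFs means the user receives 3 messages (text + 2 files), which is a poor experience. With URLs, everything fits in a single reply.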

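Agent Collaboration Driven by a Command Spec

The entry point of the pipeline is not Python but a Command Spec: .claude/commands/yt2pdf.md.

This 197-line Markdown document defines the full 6-step flow, from URL parsing, thumbnail download, metadata fetch, and transcript extraction to summary generation and PDF output. It is essentially a structured prompt that tells the Agent how to coordinate the Python scripts.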

Step 1: Parse & Acknowledge    → parse the URL, reply "Processing..."
Step 2: Download Thumbnail     → curl YouTube CDN
Step 3: Fetch Metadata         → yt-dlp --dump-json
Step 4: Generate Summary       → AI generates bilingual Markdown
Step 5: Build PDFs & Upload    → python3 scripts/yt/yt2pdf.py ...
Step 6: Reply with Results     → formatted reply + presigned URLs

This is the same pattern as the Command System in the Agent architecture dissected a few days earlier: a structured document defines the workflow, and the Agent executes it step by step. The difference is that this Command Spec also includes the error-handling strategy and channel-specific reply formats (Telegram vs Discord vs Slack).

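Design Decisions Overview

Decision	Choice	Alternatives	Rationale
Image embedding	Base64 data URI	External image links	Self-contained PDF, readable offline, images survive forwarding
PDF engine	headless Chrome	wkhtmltopdf / WeasyPrint	Best CJK font support, most complete CSS rendering
File delivery	Presigned URL (7 days)	Direct attachment	Avoids Telegram splitting the reply into multiple messages
Pipeline output	JSON stdout	File writes / exit code	Machine-parseable, read directly by the Agent
Language detection	File name convention (`*_en.md`)	Content detection / explicit parameter	Simple and reliable, zero external dependencies
Directory layout	`YYYY-MM-DD/VIDEO_ID/`	Flat directory	Cleanup by date, avoids ID collisions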
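Two of the decisions in the table, file-name-based language detection and JSON on stdout, can be shown together. The orchestrator calls a helper named _detect_lang whose body is not shown in the post, so detect_lang below is an assumed equivalent rather than the original.

```python
import json
from pathlib import Path


def detect_lang(md_path: Path) -> str:
    """Infer the language from the file name: *_en.md or *_zh-tw.md (assumed convention)."""
    stem = md_path.stem.lower()
    for lang in ("zh-tw", "en"):
        if stem.endswith(f"_{lang}"):
            return lang
    raise ValueError(f"cannot infer language from {md_path.name}")


# One result object per input file, emitted as a single JSON array on stdout.
results = [
    {"lang": detect_lang(Path(name)), "md": name}
    for name in ("summary_en.md", "summary_zh-tw.md")
]
print(json.dumps(results, ensure_ascii=False))
```

Printing one JSON array keeps the contract with the Agent simple: it parses stdout once and does not need to inspect exit codes or look for output files.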

Production Notes: For B2 credentials, use an Application Key (not the Master Key) restricted to a single bucket. Watch the HuggingFace token's rate limit: on the free plan, the Whisper large-v3 model has an hourly request cap.

Claude Code Agent Architecture Deep Dive: 8 Reusable Production Design Patterns

2026-04-01T02:00:00+00:00

8 reusable architecture patterns distilled from the Claude Code source

English Abstract — This post dissects the internal architecture of a production agent system, extracting 8 reusable design patterns from 1,902 TypeScript source files: Tool Registration Pipeline, Side-Query for cost-efficient routing, Coordinator/Worker with XML result injection, Hook Event System (16×5×7 combinatorics), and Context Compaction three-layer strategy. Each pattern includes pseudocode and practical adoption guidance.

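Yesterday's post compared the mainstream Agent Swarm frameworks from the outside. Today we change the angle and go into the source code of a production-grade agent system to see how it is designed.

We analyzed 1,902 TypeScript files across 21 subsystems and distilled 8 directly reusable design patterns. Whether you use CrewAI, LangGraph, or a framework of your own, these patterns carry over.

Architecture Overview: 21 Subsystems

The system can be divided into 5 layers:

Layer	Subsystems	Responsibility
Execution	Tool System, Skill System	Tool definition, registration, execution
Coordination	Agent/Subagent, Coordinator	Multi-Agent division of labor and result integration
Security	Permission, Hook System	Permission control, event interception
Memory	State Store, Memory, Context	State management, context compaction
Extension	Plugin, Command, Output Style	Modular extension mechanisms

This post goes deep on the 5 patterns with the highest adoption value (Patterns 1, 3, 4, 6, 8) and briefly covers the other 3.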

Pattern 1: Tool Registration Pipeline

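Problem: tools come from multiple sources (built-in, Plugin, MCP, user-defined) and need unified management and visibility control.

The solution is a four-stage pipeline, with a place to insert logic at every step: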

// Stage 1: Define - declare each tool's schema and capabilities
const toolDefs: ToolDef[] = [
  { name: "Bash", schema: bashSchema, isDestructive: true },
  { name: "FileRead", schema: readSchema, isReadOnly: true },
  // ... MCP tools, plugin tools, user tools
];

// Stage 2: Build - instantiate the tools and inject context
const tools = toolDefs.map(def => buildTool(def, context));

// Stage 3: Filter - remove tools blocked by deny rules
const filtered = tools.filter(t => !denyRules.matches(t.name));

// Stage 4: Assemble - sort (by name, for a stable prompt cache) and assemble
const pool = assembleToolPool(filtered.sort(byName));

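Key design decisions:

Fail-closed: deny by default; a tool must be explicitly allowed before it can be used
Sort by name: a stable tool order maximizes the prompt cache hit rate
Feature gate injection: the Build stage decides whether to include a tool based on feature flags

Production Notes: If you are building an agent system, do not write tool registration as one big if-else. A pipeline makes each stage independently testable. The Filter stage in particular lets you turn off specific tools without changing code.

Pattern 3: Side-Query Pattern

Problem: using the main model for every decision is too expensive. Auxiliary judgments such as memory retrieval, permission checks, and routing do not need the strongest model.

The solution is a side-query: a lightweight LLM query opened alongside the main conversation: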

// Memory retrieval: use a smaller model to pick relevant memories
async function recallMemories(query: string): Promise<Memory[]> {
  const allMemories = await scanMemoryFiles();    // scan all memory files
  const selected = await sideQuery({
    model: "fast",                                 // use the cheap model
    prompt: `From the memories below, select those relevant to "${query}" (at most 5)`,
    context: allMemories.map(m => m.summary),
  });
  return selected;
}

// Permission classification: 2-stage XML classifier
async function classifyPermission(toolCall: ToolCall): Promise<Decision> {
  const stage1 = await sideQuery({
    model: "fast",
    // JSON.stringify avoids "[object Object]" when args is an object
    prompt: `Assess the safety of this tool call: ${toolCall.name}(${JSON.stringify(toolCall.args)})`,
  });
  if (stage1 === "soft_deny") {
    return await sideQuery({ model: "fast", prompt: "Evaluate further..." });
  }
  return stage1;  // "allow" or "ask"
}

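Where it is used in practice:

Use case	Main model	Side-Query	Cost ratio	Notes
Memory retrieval	Opus	Haiku	~20:1	Fast filtering over many memories
Permission checks	Opus	Haiku	~20:1	Semantic judgment that needs no deep reasoning
Routing	Opus	Sonnet	~5:1	Medium-complexity routing

Production Notes: Design side-query prompts carefully, because they are the most frequent LLM calls in the whole system. Keep the prompt format fixed to maximize cache hits, and set a timeout so a side-query cannot slow down the main conversation.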
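The timeout advice can be sketched with asyncio. Everything here is illustrative: fake_side_query stands in for a real call to a small model, and falling back to "ask" on timeout is an assumed policy, not something taken from the analyzed source.

```python
import asyncio


async def side_query_with_timeout(query_coro, timeout_s: float, fallback):
    """Await a side-query, but return a fallback if it exceeds its time budget."""
    try:
        return await asyncio.wait_for(query_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def fake_side_query(delay_s: float, answer: str) -> str:
    # Stand-in for a real request to a small, fast model.
    await asyncio.sleep(delay_s)
    return answer


async def main():
    # Finishes well inside its budget, so the model's answer is used.
    fast = await side_query_with_timeout(fake_side_query(0.01, "allow"), 0.5, "ask")
    # Exceeds its budget, so the conservative fallback is used instead.
    slow = await side_query_with_timeout(fake_side_query(1.0, "allow"), 0.05, "ask")
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

For a permission check, a fallback that asks the user is the safer direction: a slow classifier should never turn into a silent allow.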

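Pattern 4: Coordinator / Worker + XML Result Injection

Problem: complex tasks need multiple Agents working in parallel, but shared state introduces race conditions.

The solution is the Coordinator/Worker pattern: the Coordinator only plans and integrates, while Workers execute asynchronously:

Coordinator (planning)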
    │
    ├── Worker A (Research)    ──async──→  XML
    ├── Worker B (Implement)   ──async──→  XML
    └── Worker C (Test)        ──async──→  XML
    │
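    └── Coordinator (integrates all results)

Four phases:

Research: gather information, understand the requirements
Synthesis: consolidate findings, decide on an approach
Implementation: carry out the concrete work in parallel
Verification: validate results, check quality

Result injection: when a Worker finishes, its result is injected into the Coordinator's conversation as XML: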

  worker-a
  completed
  Found 3 relevant APIs: ...

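Key design points:

The Coordinator does not use tools directly: it only has AgentTool, SendMessage, and TaskStop
Workers run in isolated contexts: no shared state, so no races
XML injection is append-only: existing conversation history is never modified

Production Notes: The core of this pattern is "no shared state". Of the frameworks compared yesterday, LangGraph shares state through graph state and CrewAI uses sequential task passing. Coordinator/Worker decouples completely, at the cost of requiring stronger integration ability from the Coordinator. It suits high-concurrency, low-coupling scenarios.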
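A small Python sketch of the append-only result injection described above. The XML tag names (worker-result, id, status, result) are hypothetical, since the post does not preserve the original tags, and run_worker is a stand-in for a real worker agent.

```python
import asyncio
from xml.sax.saxutils import escape


async def run_worker(worker_id: str, task: str) -> dict:
    # Stand-in for a worker agent running in its own isolated context.
    await asyncio.sleep(0)
    return {"id": worker_id, "status": "completed", "result": f"done: {task}"}


def to_xml(report: dict) -> str:
    # Hypothetical tag names; values are escaped so they cannot break the markup.
    return (
        "<worker-result>"
        f"<id>{escape(report['id'])}</id>"
        f"<status>{escape(report['status'])}</status>"
        f"<result>{escape(report['result'])}</result>"
        "</worker-result>"
    )


async def coordinate(tasks: dict) -> list:
    history = [{"role": "user", "content": "plan"}]
    # Workers run concurrently and share no state with each other.
    reports = await asyncio.gather(*(run_worker(w, t) for w, t in tasks.items()))
    for report in reports:
        # Append-only: earlier entries in history are never rewritten.
        history.append({"role": "user", "content": to_xml(report)})
    return history


history = asyncio.run(
    coordinate({"worker-a": "research <APIs>", "worker-b": "implement"})
)
for message in history:
    print(message["content"])
```

Escaping worker output before wrapping it matters here: a result containing angle brackets would otherwise be indistinguishable from the envelope itself.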

Pattern 6: Hook Event System

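Problem: the system needs extensibility without modifications to the core code.

The solution is a highly composable Hook system: 16 event types × 5 Hook types × 7 sources.

16 Event Types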

SessionStart, Setup, UserPromptSubmit,
PreToolUse, PostToolUse, PostToolUseFailure,
PermissionRequest, PermissionDenied,
Stop, FileChanged, WorktreeCreate,
SubagentStart, Notification, Elicitation, CwdChanged

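5 Hook Types

Hook type	Execution	Best for
Command	Shell script (exit 0=pass, 2=block)	Quick checks, git hooks
Prompt	LLM side-query	Semantic judgment, quality checks
Agent	Spawns a subagent to verify	Complex verification logic
HTTP	Remote callback	External review systems
Callback	JS function (runtime)	Internal extensions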

HookMatcher Pattern Matching

"Bash"              → intercepts all Bash calls
"Bash(git *)"       → intercepts only git commands
"Write(*.env)"      → intercepts writes to .env files
"Edit(**/*.ts)"     → intercepts edits to TypeScript files

Source precedence (high → low):

userSettings > projectSettings > localSettings > policySettings > pluginHook > sessionHook > builtinHook
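To make the HookMatcher patterns above concrete, here is a rough approximation in Python. It is an assumption-laden sketch: the real matcher's glob semantics are not documented in this post, and fnmatch lets * cross directory separators, which a real implementation may not.

```python
import re
from fnmatch import fnmatchcase

# "Tool" or "Tool(argument glob)".
_PATTERN = re.compile(r"^(?P<tool>\w+)(?:\((?P<arg>.*)\))?$")


def hook_matches(pattern: str, tool: str, arg: str = "") -> bool:
    """Return True if a hook pattern applies to a tool call (illustrative only)."""
    parsed = _PATTERN.match(pattern)
    if parsed is None or parsed["tool"] != tool:
        return False
    arg_glob = parsed["arg"]
    if arg_glob is None:
        return True  # a bare tool name matches every call to that tool
    return fnmatchcase(arg, arg_glob)


print(hook_matches("Bash", "Bash", "ls -la"))
print(hook_matches("Bash(git *)", "Bash", "git push origin main"))
print(hook_matches("Bash(git *)", "Bash", "rm -rf build"))
print(hook_matches("Edit(**/*.ts)", "Edit", "src/app.ts"))
```

Matching on the tool name first keeps the common case cheap: most hooks never reach the glob comparison.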

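Production Notes: The Hook system is the architecture component with the highest return on investment. A single PreToolUse command hook can provide code review (lint before write), safety checks (block dangerous commands), and logging (audit trail). Start with command hooks and move up to prompt hooks when you need semantic judgment.

Pattern 8: Three-Layer Context Compaction

Problem: long conversations exhaust the context window, but naive truncation loses key information.

The solution is three layers of progressive compaction: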

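Layer 1: Micro Compact (within a turn)
  → Trigger: a single response is too long
  → Compaction: remove redundant tool output, keep a summary

Layer 2: Auto Compact (token threshold)
  → Trigger: total tokens > context_window - 13,000
  → Compaction: summarize older history with an LLM, keep recent turns

Layer 3: Manual Compact (user-triggered)
  → Trigger: the /compact command
  → Compaction: most aggressive, keeps only the core context

Stop condition (Diminishing Returns Detection):

// After 3+ consecutive compactions that each save < 500 tokens → stop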
if (continuations >= 3 && tokenDelta < 500) {
  return "compaction_exhausted";
}
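The Auto Compact trigger and the stop condition can be restated as two small Python functions. This is one reading of the rules above, namely that the last three attempts each saved fewer than 500 tokens; the function and constant names are made up for the sketch.

```python
AUTO_COMPACT_HEADROOM = 13_000  # tokens kept free below the context window
MIN_SAVINGS = 500               # an attempt saving less than this counts as weak
MAX_WEAK_ATTEMPTS = 3


def should_auto_compact(total_tokens: int, context_window: int) -> bool:
    """Layer 2 trigger: total tokens > context_window - 13,000."""
    return total_tokens > context_window - AUTO_COMPACT_HEADROOM


def compaction_exhausted(savings_history: list[int]) -> bool:
    """True when the last three attempts each saved fewer than 500 tokens."""
    recent = savings_history[-MAX_WEAK_ATTEMPTS:]
    return len(recent) == MAX_WEAK_ATTEMPTS and all(s < MIN_SAVINGS for s in recent)


print(should_auto_compact(190_000, 200_000))
print(compaction_exhausted([4000, 300, 200, 100]))
```

Tracking savings per attempt rather than a single counter makes the "diminishing returns" part explicit: one productive compaction resets the streak.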

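Production Notes: This is why the previous post discussed the task planner: compaction loses in-progress task state. The fix is to persist tasks to the file system (.claude/tasks/) so they are unaffected by context compaction.

Other Patterns Worth Noting

Pattern 2: Immutable State Store + Change Hooks

A single immutable store paired with reactive change hooks. Object.is determines whether a value really changed, which avoids redundant side effects. It is more predictable than an event bus: every state change has an explicit causal chain.

Pattern 5: Permission Rule System

Every permission rule records its source (user/project/policy/plugin/builtin), with support for precedence and an audit trail. Policy settings can override user settings, enabling enterprise-grade control.

Pattern 7: Deferred Loading

When there are more tools than the prompt can hold, not every definition goes into the system prompt. Tools marked shouldDefer=true are loaded only when searched for, with searchHint keywords driving lazy discovery, which saves a large number of tokens.

Advice for Agent System Developers

Recommended adoption order (ranked by return on investment):

Hook System: least invasive, immediate extensibility
Tool Pipeline: unified management of tool sources, no if-else hell
Context Compaction: essential for long conversations, the earlier the better
Side-Query: the key to cost optimization, a must in production
Coordinator/Worker: introduce it once you need parallel processing

Minimum viable Agent architecture:

Tool Pipeline + Hook System + State Store = a production-ready Agent
Add Side-Query + Compaction = a scalable Agent
Add Coordinator/Worker = a parallel Agent Swarm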

Local Agent Swarm Frameworks Explained: From Architecture Comparison to a Simple Implementation

2026-03-31T02:00:00+00:00

Architecture overview of local Agent Swarm frameworks

English Abstract — This post surveys the mainstream local agent swarm frameworks in 2026: CrewAI (role-based crews), AutoGen/AG2 (actor-model conversations), LangGraph (graph-based state machines), and smolagents (code-first minimal agents). We compare their architectures, learning curves, and trade-offs, then implement a minimal 2-agent swarm using Hugging Face’s smolagents to demonstrate how lightweight multi-agent orchestration can be.

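Multi-Agent collaboration has moved from research papers into production. When your LLM application needs different roles (one to search, one to summarize, one to check quality), you need an Agent Swarm framework to coordinate them.

With so many frameworks, which one fits? This post starts from the architectural fundamentals to help you choose.

Why Run an Agent Swarm Locally?

Three core reasons:

Privacy and compliance: sensitive data never leaves the internal network, which suits finance and healthcare
Cost control: replacing API calls with local models (Ollama, vLLM) can cut long-run cost by 10x or more
Predictable latency: intranet communication < 1ms vs 200-500ms for API calls

Production Notes: Even with local models, you can still iterate quickly against cloud APIs during development and switch to local inference at deployment. Most frameworks support this hybrid mode.

Mainstream Framework Comparison

Framework	Stars	Architecture	Notable adopters	Downloads/month	Learning curve
LangGraph	~28k	Graph state machine (Nodes + Edges)	LinkedIn, Uber, Klarna	38.5M	Medium
CrewAI	~46k	Role-based (Role + Goal)	Novo Nordisk, Oracle	5.2M	Easy
AutoGen/AG2	~57k	Actor model / conversation-driven	⚠ Maintenance mode	n/a	Hard
smolagents	~26k	Code-first minimal	Early stage	n/a	Easy

Other frameworks: MetaGPT (~64k stars) simulates a software company through SOPs; it suits code generation but not general-purpose Agent collaboration. The OpenAI Agents SDK (which replaces the archived Swarm) is used by companies such as HP, Intuit, and Oracle, but is tied to the OpenAI API.

CrewAI: The Most Intuitive Role-Play

CrewAI's core concept is the "team": each Agent has a role, a goal, and a backstory, is assigned tasks, and is then assembled into a Crew that executes them.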

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Find the latest trends in AI agent frameworks",
    backstory="You are a senior tech analyst..."
)

writer = Agent(
    role="Content Writer",
    goal="Write a concise summary from research findings",
    backstory="You are a technical blogger..."
)

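# expected_output is a required Task field in current CrewAI releases
research_task = Task(
    description="Research top 5 agent frameworks",
    expected_output="A bullet list of 5 frameworks with one-line notes",
    agent=researcher,
)
write_task = Task(
    description="Write a summary article",
    expected_output="A short summary article",
    agent=writer,
)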

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()

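Pros: fastest to pick up, clear concepts, active community (the fastest-growing framework)
Cons: limited control over complex workflows

AutoGen: A Former Star, Now in Maintenance Mode

Microsoft fully rewrote AutoGen in v0.4 around the Actor model. Since October 2025, however, it has been in maintenance mode, with Microsoft merging it and Semantic Kernel into a unified Microsoft Agent Framework. The original creators (Chi Wang, Qingyun Wu) left Microsoft and started the community-driven AG2 fork.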

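import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    # Any chat completion client works; the OpenAI client and model name are only examples
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    researcher = AssistantAgent("researcher", model_client=model_client)
    writer = AssistantAgent("writer", model_client=model_client)
    team = RoundRobinGroupChat([researcher, writer], max_turns=3)
    result = await team.run(task="Research and summarize AI trends")
    return result

asyncio.run(main())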

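Pros: well-designed Actor model architecture, supports distributed deployment
Cons: no new feature development, v0.2 → v0.4 is incompatible, community split by the AG2 fork
⚠ Note: if you are choosing a framework now, AutoGen is not recommended for new projects

LangGraph: The Enterprise Production Pick

LangGraph uses a directed graph to define how control flows between Agents. Each node is a processing step, and edges decide what runs next. It is currently the most widely adopted multi-Agent framework in enterprise production:

LinkedIn: AI recruiting assistant that automates candidate matching
Uber: serves 5,000 engineers and has saved 21,000+ developer hours
Klarna: customer-service AI serving 85 million users, with response time cut by 80%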

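from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph

# AgentState, research_node and writer_node are defined elsewhere
graph = StateGraph(AgentState)
graph.add_node("researcher", research_node)
graph.add_node("writer", writer_node)
graph.add_edge(START, "researcher")   # entry point
graph.add_edge("researcher", "writer")
graph.add_edge("writer", END)

app = graph.compile(checkpointer=MemorySaver())
# A checkpointer needs a thread_id so state can be saved per conversation
config = {"configurable": {"thread_id": "demo"}}
result = app.invoke({"task": "Research AI trends"}, config)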

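Pros: visualizable workflows, checkpoints + human-in-the-loop, the most enterprise production validation
Cons: requires understanding graph data structures, more boilerplate

smolagents: Code-First Minimalism

The core of Hugging Face's smolagents is only ~1000 lines of code. Agents write Python code directly to call tools instead of using JSON schemas.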

# HfApiModel was renamed to InferenceClientModel in newer smolagents releases
from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

model = InferenceClientModel("Qwen/Qwen2.5-Coder-32B-Instruct")
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
result = agent.run("What are the top AI agent frameworks in 2026?")

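Pros: the lightest option, supports local HF models, code-first is more flexible than JSON
Cons: multi-Agent collaboration is relatively new, smaller ecosystem, no well-known enterprise adopters yet

Production Notes: If all you need is a single Agent + tool calls, smolagents is the best starting point. For multi-role collaboration, use CrewAI. For complex workflows + checkpoints + enterprise production, use LangGraph.

Simple Implementation: Two-Agent Collaboration with smolagents

smolagents was chosen because it is the lightest, does not depend on a specific API provider, and supports local models.

Installation

pip install "smolagents[litellm]" duckduckgo-search

Code

"""
minimal_swarm.py - a minimal two-Agent collaboration example
Agent A (Manager): coordinates task assignment
Agent B (WebSearch): searches the web
"""
from smolagents import CodeAgent, LiteLLMModel, DuckDuckGoSearchTool, tool

@tool
def summarize_text(text: str) -> str:
    """Summarize the given text into 3 bullet points (placeholder implementation).

    Args:
        text: The text to summarize.
    """
    return f"Summary of: {text[:100]}..."

# LiteLLM supports any LLM provider
model = LiteLLMModel(model_id="gpt-4o-mini")  # or ollama/llama3.2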

# Web Search Agent
web_agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    name="web_search_agent",
    description="Searches the web for information on a given topic",
)

# Manager Agent (orchestrates the web agent)
manager = CodeAgent(
    tools=[],
    model=model,
    name="manager",
    managed_agents=[web_agent],
)

# Run
result = manager.run(
    "Search for the top 3 local AI agent frameworks in 2026, "
    "and give me a brief comparison."
)
print(result)

Output

$ python minimal_swarm.py

╭─ Manager Agent ──────────────────────────────────────╮
│ I'll delegate the web search to my web_search_agent. │
╰──────────────────────────────────────────────────────╯
╭─ web_search_agent ───────────────────────────────────╮
│ Searching: "top local AI agent frameworks 2026"      │
│ Found 5 results...                                   │
╰──────────────────────────────────────────────────────╯
╭─ Manager Agent ──────────────────────────────────────╮
│ Based on web_search_agent's findings:                │
│                                                      │
│ 1. LangGraph (~28k stars) - Enterprise production    │
│ 2. CrewAI (~46k stars) - Role-based, easiest setup   │
│ 3. smolagents (~26k stars) - Code-first, minimal     │
│                                                      │
│ For quick prototyping: CrewAI or smolagents          │
│ For production at scale: LangGraph                   │
╰──────────────────────────────────────────────────────╯

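The output above is a simplified illustration; actual results vary with the model and the search results.

Production Notes: LiteLLMModel lets the same code switch between LLMs: gpt-4o-mini (cloud), ollama/llama3.2 (local), or anthropic/... (another provider). At deployment, only model_id changes.

Framework Selection Decision Tree

What do you need?
│
├── A single Agent + tools → smolagents
│
├── Multi-role collaboration
│   ├── Simple sequential/parallel execution → CrewAI
│   └── Complex conditional branches/loops → LangGraph
│
└── Enterprise production deployment → LangGraph (validated by LinkedIn, Uber, Klarna)

Summary

If you are…	Recommendation	Why
A developer new to Agents	smolagents	Least boilerplate, runs in about 10 lines
Building an Agent team quickly	CrewAI	Intuitive role concept, rich community resources
Building complex workflows	LangGraph	Graph model + checkpoints + human-in-the-loop
Deploying to enterprise production	LangGraph	Validated by LinkedIn, Uber, Klarna; 38.5M monthly downloads

⚠ AutoGen is no longer recommended for new projects: it has been in maintenance mode since October 2025, and Microsoft has merged it into the Microsoft Agent Framework.

The trend for Agent Swarms is a lighter core plus stronger interoperability. smolagents' ~1000-line core shows that a good Agent framework does not need to be complicated. The market is converging on graph-style workflows (with LangGraph in the lead), and CrewAI is also actively integrating with the LangChain ecosystem.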

IoT Architecture Selection for One Million Devices, Part 3: Operations, Cost, and Reliability

2026-03-30T02:02:00+00:00

Phase 1 core architecture: EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry

English Abstract — Part 3 of 3. Operations: three-layer rate limiting (EMQX → Rule Engine → App), content-based dedup, anomaly detection. Edge resilience: exponential backoff + jitter, offline buffering (RAM/SQLite/MQTT Session Expiry). Server HA with RPO/RTO per component. OpenTelemetry end-to-end tracing. Multi-region DR (Active-Passive). Team onboarding risk and phased rollout.

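Series: Part 1 Core Architecture | Part 2 Security and Multi-Tenancy | Part 3 Operations and Reliability (this post)

Rate Limiting + Dedup

A misbehaving device (firmware bug, sensor malfunction) can flood the system with data in an instant: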

flowchart LR
    D[Faulty Device] -->|10 msg/s limit| E[EMQX]
    E -->|SQL filter| T[TimescaleDB]
    T -->|circuit breaker| A[FastAPI]
    E -->|exceed 10x| X1[Disconnect]

Layer	Limit	Action when exceeded	Notes
EMQX	10 msg/s, 50 KB/s	Throttle → disconnect	First line of defense, per-client
Rule Engine	SQL filter + dedup	Drop non-matching messages	Basic filtering, no code required
FastAPI	Per-tenant rate limit	Alert + reject	Business-logic-layer protection

Dedup Strategy

Layer	Strategy
MQTT Broker	Packet ID tracking
Rule Engine	SQL WHERE + timestamp comparison
Application	`(device_id, timestamp, hash)`
Database	`ON CONFLICT DO NOTHING`
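The application-layer key from the table, (device_id, timestamp, hash), might look like this in Python. The Deduplicator class is illustrative, SHA-256 is an assumed choice of hash, and the in-memory set is unbounded; a real service would cap it with a TTL or a bounded cache.

```python
import hashlib


class Deduplicator:
    """Content-based dedup keyed on (device_id, timestamp, payload hash)."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, int, str]] = set()

    @staticmethod
    def key(device_id: str, timestamp: int, payload: bytes) -> tuple[str, int, str]:
        return (device_id, timestamp, hashlib.sha256(payload).hexdigest())

    def is_new(self, device_id: str, timestamp: int, payload: bytes) -> bool:
        """Return True the first time a message is seen, False for a repeat."""
        k = self.key(device_id, timestamp, payload)
        if k in self._seen:
            return False
        self._seen.add(k)
        return True


dedup = Deduplicator()
first = dedup.is_new("dev-1", 1_700_000_000, b'{"temperature": 21.5}')
repeat = dedup.is_new("dev-1", 1_700_000_000, b'{"temperature": 21.5}')
changed = dedup.is_new("dev-1", 1_700_000_000, b'{"temperature": 22.0}')
print(first, repeat, changed)
```

Including the payload hash means a QoS 1 resend of the same message is dropped, while a genuinely different reading with the same timestamp still gets through.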

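Anomaly Detection

Type	Detection	Handling
Over-frequency reporting	Rate > 10x	Broker throttle
Out-of-range values	Outside the physical range	Drop + alert
Timestamp anomaly	Skew > 5min	Mark as suspect
Silent device	> 3x the normal interval	LWT → offline alert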

Edge Resilience

Reconnect + Offline Buffer

stateDiagram-v2
    [*] --> Connected
    Connected --> Disconnected: Network down
    Disconnected --> Retry1s: Retry 1s
    Retry1s --> Connected: OK
    Retry1s --> Retry2s: Fail
    Retry2s --> RetryMax: Backoff + jitter
    RetryMax --> Connected: OK
    note right of Disconnected: Write to local buffer
    note right of Connected: Drain buffer on reconnect

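Buffer	Capacity	Durability	Best for
Ring buffer (RAM)	1-10K msg	Lost on power loss	MCU
SQLite on flash	100K+ msg	Persistent	Gateway
MQTT v5 Session Expiry	Broker side	While the broker is alive	All

QoS 1: PUBLISH → wait for PUBACK → 5s timeout → mark pending → resend after reconnect. Combined with application-level dedup, resends do not create duplicates.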
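The reconnect loop in the state diagram above (1s, 2s, then backoff with jitter up to a maximum) could be implemented along these lines. This uses the "full jitter" variant, which is one common choice rather than necessarily the one intended here, and the 300-second cap is an assumed value.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 300.0,  # assumed maximum; the post does not specify one
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter backoff: uniform in [0, min(cap_s, base_s * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(42)  # seeded only to make the demo repeatable
delays = [backoff_delay(n, rng=rng) for n in range(12)]
for n, delay in enumerate(delays):
    print(f"attempt {n}: sleep {delay:.2f}s")
```

Randomizing the whole interval, instead of adding a small jitter on top of a fixed delay, spreads devices that disconnected at the same moment across the full window.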

Thundering Herd

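When the broker recovers, 1M devices try to reconnect at the same time. Device-side jitter spreads reconnects over 0-5min, and EMQX max_conn_rate=10000/s caps the accept rate, so even the worst case drains in an orderly 1,000,000 / 10,000 = 100s.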

Server-Side HA

Component	HA strategy	RPO	RTO
EMQX	3-5 node RAFT	0	<30s
TimescaleDB	Patroni + streaming replication	~0	<30s
ClickHouse	ReplicatedMergeTree	~0	<60s
FastAPI	K8s 3+ replicas	—	<5s

Multi-Region DR

Layer	AWS	GCP
MQTT	EMQX cluster linking	Cross-zone
DB	RDS cross-region replica	Cloud SQL cross-region
Cold	S3 CRR	GCS Dual-Region
DNS	Route 53 failover	Cloud DNS routing

Active-Passive: the primary handles traffic, the secondary holds a replica, and DNS failover switches over → RPO ~minutes, RTO ~5-10min.

Observability

Build it in on Day 1 rather than bolting it on afterwards.

flowchart TD
    E[EMQX] -->|metrics| P[Prometheus]
    A[FastAPI] -->|traces| OT[OpenTelemetry]
    BF[BFF] -->|traces| OT
    T[TimescaleDB] -->|metrics| P
    OT --> J[Jaeger]
    OT --> P
    OT --> L[Loki]
    P --> G[Grafana]
    J --> G
    L --> G

Component	What to monitor	Alert threshold
EMQX	Connections, msg rate, Rule Engine	> 900K, deny > 1%
TimescaleDB	Write throughput, disk	< 80K/s, > 80%
FastAPI	Latency, error rate	P99 > 200ms
BFF	WS connections	> 10K

Every telemetry message carries a trace_id (injected by the Rule Engine), so Jaeger can show the full device → Dashboard path in one lookup.

Cloud vs On-Premises

Service	AWS	GCP	On-prem
Broker	EMQX Cloud	EMQX Cloud	EMQX on K8s
K8s	EKS	GKE Autopilot	K3s
Hot DB	Timescale Cloud	Timescale Cloud	VM
Warm DB	ClickHouse Cloud	ClickHouse Cloud	K8s
Cold	S3	GCS	MinIO
Cold query	Athena	BigQuery	DuckDB
Monitoring	CloudWatch	Cloud Monitoring	Grafana
Monthly cost	~$17-33K	~$17-33K	~$8-15K + ops

GKE Autopilot is easier to get started with than EKS. BigQuery's per-scan pricing tends to be economical for IoT analytics.

Cost Estimate

Monthly cost for 1M devices (cloud managed)

Component	AWS monthly	GCP monthly	Notes
EMQX Cloud (3 node)	~$8-15K	~$8-15K	MQTT Broker
TimescaleDB	~$3-5K	~$3-5K	Hot 7d + Continuous Agg
ClickHouse Cloud	~$2-4K	~$2-4K	Warm 30-90d analytics
S3/GCS (~50 TB)	~$1-2K	~$1-2K	Cold long-term archive
K8s (Backend+BFF)	~$2-4K	~$2-4K	3-5 nodes
Observability	~$1-3K	~$1-3K	Grafana + OTel
Total	~$17-33K	~$17-33K
+ Redpanda (>1M)	+$5-10K	+$5-10K	Added at scale-out

By Scale

Scale	Architecture	Monthly cost	Notes
< 100K	EMQX + TimescaleDB + FastAPI + BFF	~$3-8K	2-3 person team
100K-1M	+ ClickHouse + S3	~$10-20K	Core architecture of this series
> 1M	+ Redpanda + FastStream + DR	~$30-60K	Event streaming

Recommendation: validate with a 100K-device PoC first (roughly 1/5 of the cost), then decide between managed and self-hosted once you have real numbers.

Team and Delivery Risk

Risk	Mitigation	Notes
Operating 4 core systems	OTel from Day 1 + Grafana	A unified dashboard reduces cognitive load
2-3 months to onboard new hires	Start from the minimal version	Add components gradually instead of all at once
Multi-tenant ACL mistakes	Unit tests + staging	A misconfiguration is a security incident
Cost overruns	100K-device PoC	Decide managed vs self-hosted after measuring

Rollout order:

Phase 1 (Month 1-2): EMQX + TimescaleDB + FastAPI + BFF + OTel
Phase 2 (Month 3-4): + ClickHouse + S3
Phase 3 (Month 5-6): + Multi-tenant + DR
Phase 4 (>1M):       + Redpanda + FastStream

Future Considerations

OTA firmware update pipeline
Edge computing / gateway aggregation
Active-Active geo-replication (EMQX cluster linking + CRDT)

Series Links

Part 1 (Core Architecture): EMQX + TimescaleDB + FastAPI + BFF, cost estimate
Part 2 (Security and Multi-Tenancy): HTTPS/TLS, mTLS, Cert Rotation, RBAC, Topic ACL
Redpanda Documentation: Event Streaming (Scale-out >1M)

IoT Architecture Selection for One Million Devices, Part 2: Security and Multi-Tenancy

2026-03-30T02:01:00+00:00

Phase 1 core architecture: EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry

English Abstract — Part 2 of 3. Device security: HTTPS/TLS for all communication, mTLS X.509 for device authentication, software-based certificate rotation (90-365d), JIT provisioning, anti-spoofing measures. Multi-tenancy: MQTT Topic ACL namespace isolation, PostgreSQL Row-Level Security, 4-role RBAC model, dual-layer command authorization.

Series: Part 1 Core Architecture | Part 2 Security and Multi-Tenancy (this post) | Part 3 Operations and Reliability

Device Identity

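Authentication Methods

Method	Security	Best for	Notes
mTLS (X.509)	Highest	Default	CA chain removes the need to store per-device credentials
PSK	Medium	Constrained devices	Use behind a gateway; rotation is painful
JWT	High	OAuth2 integration	Stateless verification, requires refresh

MAC addresses can be spoofed and serial numbers can be guessed, so a Device ID must be paired with a cryptographic credential:

MQTT Client ID: {tenant}:{type}:{serial}
X.509 CN matches the Client ID → mTLS binds them automatically
DB PK: UUID v4

Communication Security

Layer	Mechanism	Notes
Transport	TLS 1.2+ (8883)	Encryption + integrity
Identity	mTLS mutual authentication	Broker verifies device, device verifies broker
Application	Payload HMAC (optional)	Guards against in-transit tampering

Certificate Rotation:

Validity of 90-365 days, with automatic CSR-based renewal 30 days before expiry
Dual CA chains keep connections up during rotation
Expired without renewal → CRL revocation → forced disconnect + alert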

Provisioning

flowchart TD
    R[Root CA] --> F[Intermediate CA]
    F -->|Bootstrap| D[First Connect]
    D -->|Verify| REG[Registry]
    REG -->|Issue cert| D2[Online]

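Method	Security	Best for
JIT	High	General fleets (recommended)
Claim-based	Medium	Batches of the same model
API pre-registration	High	Known device lists

Anti-spoofing: one-time bootstrap token, device fingerprint hash, Provisioning API rate limit, allowlist/denylist.

EMQX Authentication Chain

mTLS → device identity taken from the cert CN (peer_cert_as_clientid = cn)
JWT → RS256 signature + claims validation
HTTP → external auth service (legacy devices)

EMQX supports CRL + OCSP Stapling, so a compromised device can be revoked immediately.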

Multi-Tenancy

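Broker Isolation

Mode	Isolation	Best for	Notes
Shared EMQX + Topic ACL	Logical	95% of tenants	Lowest cost, managed via ACLs
Broker-per-tenant	Process	Regulatory requirements	Compliance scenarios such as healthcare/finance
Hybrid	Depends on tier	Recommended	Standard shared + Enterprise dedicated

Topic Namespace

{tenant}/d/{device}/telemetry      # telemetry
{tenant}/d/{device}/cmd/request    # command
{tenant}/d/{device}/cmd/response   # response
{tenant}/d/{device}/config/desired # desired configuration
{tenant}/g/{group}/cmd/request     # group broadcast

The Tenant ID is always the first level → ACLs match on the prefix. Devices are not allowed to subscribe with wildcards.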
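The prefix rule above is easy to get subtly wrong with plain string comparison, so here is a small sketch of the check. topic_allowed is a hypothetical helper; in EMQX the same rule would normally live in the ACL configuration rather than in application code.

```python
def topic_allowed(tenant: str, topic: str, allow_wildcards: bool = False) -> bool:
    """Check that a topic stays inside the tenant's namespace (illustrative)."""
    levels = topic.split("/")
    # Compare the whole first level, not a string prefix:
    # "acme-evil/..." must not pass as tenant "acme".
    if levels[0] != tenant:
        return False
    # Devices may not use MQTT wildcards ("+" single level, "#" multi level).
    if not allow_wildcards and any(level in ("+", "#") for level in levels):
        return False
    return True


print(topic_allowed("acme", "acme/d/dev-1/telemetry"))
print(topic_allowed("acme", "acme-evil/d/dev-1/telemetry"))
print(topic_allowed("acme", "acme/d/+/telemetry"))
```

Splitting on the level separator is the important detail: a check such as topic.startswith(tenant) would accept any tenant whose name merely begins with the same characters.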

RBAC

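Permission	Super Admin	Tenant Admin	Operator	Viewer
Manage tenants	✓
Register/disable devices	✓	✓
Send arbitrary commands	✓	✓
Send pre-approved commands	✓	✓	✓
View Dashboard	✓	✓	✓	✓
OTA deployment	✓	✓

Two-Layer Command Verification

API side: user role + command permission + device status + rate limit
Device side: verify signature (prevents injection) + verify timestamp (prevents replay) + verify command_type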
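The three device-side checks can be sketched as below. Several things are assumptions: HMAC-SHA256 with a shared secret stands in for whatever signature scheme is actually used (it could equally be asymmetric), and the 60-second window and the command allowlist are invented values.

```python
import hashlib
import hmac
import json
import time

ALLOWED_COMMANDS = {"reboot", "set_interval"}  # hypothetical allowlist
MAX_SKEW_S = 60                                # hypothetical replay window


def sign(secret: bytes, body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the command body."""
    message = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_command(secret: bytes, body: dict, signature: str, now=None) -> bool:
    now = time.time() if now is None else now
    # 1. Signature: reject anything not produced with the shared secret.
    if not hmac.compare_digest(sign(secret, body), signature):
        return False
    # 2. Timestamp: reject commands outside the freshness window.
    if abs(now - body.get("timestamp", 0)) > MAX_SKEW_S:
        return False
    # 3. Command type: only allowlisted commands are executed.
    return body.get("command_type") in ALLOWED_COMMANDS


secret = b"device-shared-secret"
command = {"command_type": "reboot", "timestamp": 1_700_000_000}
signature = sign(secret, command)
print(verify_command(secret, command, signature, now=1_700_000_010))
print(verify_command(secret, command, signature, now=1_700_001_000))
```

A timestamp window alone still permits a replay inside the window; tracking a per-command nonce or sequence number on the device would close that gap.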

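DB Tenant Isolation

Strategy	Isolation	Best for	Notes
Row-Level Security	Logical	Default	Single schema, policies filter automatically
Schema-per-tenant	Medium	Moderate requirements	N schema migrations
DB-per-tenant	Strongest	Enterprise	Highest cost, full isolation

TimescaleDB partitions by (tenant_id, time) → queries are pruned automatically, and retention can be set per tenant.

Part 1 (Core Architecture): EMQX + TimescaleDB + FastAPI + BFF, cost estimate
Part 3 (Operations, Cost, and Reliability): Rate Limiting, Edge Resilience, DR, Observability, cost estimate

IoT Architecture Selection for One Million Devices, Part 1: Core Architecture and Technology Choices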

2026-03-30T02:00:00+00:00

Phase 1 core architecture: EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry

English Abstract: Part 1 of 3. Core architecture for 1M IoT devices: EMQX (Rule Engine direct write) → TimescaleDB (hot 7d + Continuous Aggregates) → FastAPI → BFF → Dashboard. Includes protocol selection, broker comparison, three-tier storage, BFF design, cost estimation (~$17-33K/month), and AWS/GCP mapping. Scale-out (>1M): add Redpanda + FastStream.

Introduction

One million devices reporting every 10 seconds = 100K writes/sec, ~1.7 TB/day.
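As a quick sanity check on those figures, the arithmetic works out as follows. The per-message size is not stated in the post; it is derived here from the ~1.7 TB/day figure, using decimal terabytes.

```python
DEVICES = 1_000_000
REPORT_INTERVAL_S = 10
DAILY_VOLUME_BYTES = 1.7e12  # ~1.7 TB/day, decimal terabytes

writes_per_sec = DEVICES // REPORT_INTERVAL_S
messages_per_day = writes_per_sec * 86_400
# Average stored size per message implied by the daily volume.
implied_bytes_per_message = DAILY_VOLUME_BYTES / messages_per_day

print(f"{writes_per_sec:,} writes/sec")
print(f"{messages_per_day:,} messages/day")
print(f"~{implied_bytes_per_message:.0f} bytes per message")
```

In other words, the daily volume corresponds to roughly 200 bytes per stored message, which is plausible for a small telemetry record plus per-row overhead.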

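This series uses an incremental architecture: Phase 1 needs only 4 core components to go live:

Phase	Components	Target scale
1 (MVP)	EMQX + TimescaleDB + FastAPI + BFF	< 1M
2	+ ClickHouse + S3 (hot/cold tiering)	Same as above
3	+ Multi-tenant + DR	Same as above
4	+ Redpanda + FastStream (event streaming)	> 1M

Series: Part 1 Core Architecture (this post) | Part 2 Security and Multi-Tenancy | Part 3 Operations and Reliability

Architecture Data Flow

Telemetry (Device → Dashboard)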

flowchart LR
    D[Device] -->|MQTT| B[EMQX]
    B -->|Rule Engine| T[TimescaleDB]
    T --> A[FastAPI]
    A --> BFF[BFF]
    BFF -->|WebSocket| U[Dashboard]

Command (Dashboard → Device)

flowchart RL
    U[Dashboard] -->|WS/REST| BFF[BFF]
    BFF --> A[FastAPI]
    A -->|MQTT QoS 1| B[EMQX]
    B --> D[Device]

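The EMQX Rule Engine writes directly to TimescaleDB, with no event streaming layer in between. Latency is <50ms, which suits deployments under 1M devices. Add Redpanda once you pass 1M and need dedup or event replay.

Protocols and Broker

Protocol	Bidirectional	Power	Overhead	Best for
MQTT v5	Yes	Very low	2-byte header	IoT default
CoAP	Limited	Very low	UDP	Constrained NB-IoT devices
gRPC	Yes	High	HTTP/2 + protobuf	Service-to-service

Key MQTT v5 features: Correlation ID, Shared subscriptions, Message expiry, Retained messages, LWT.

Broker	1M connections	Clustering	Rating	Notes
EMQX	✓ (100M+)	RAFT	★★★★★	Open source, built-in Rule Engine, largest community
HiveMQ	✓	Native	★★★★	Commercial license, strong enterprise support
Mosquitto	x (~100K)	None	Dev only	Single-threaded, no clustering

Deployment	AWS	GCP	Monthly cost
EMQX Cloud	AWS	GCP	~$8-15K
Self-hosted K8s	EKS	GKE	~$3-8K + ops

Python Backend: asyncio

Free Threading (PEP 703) is not expected to become the default build until around Python 3.16 (~2028). One million connections is an I/O-bound workload, so asyncio + uvloop (2-4x speedup) + multiple worker processes is the right fit.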

HAProxy → Uvicorn worker 1..N (asyncio + uvloop, ~50-100K conn/worker)
           └── Redis/NATS cross-process pub/sub

BFF (Backend for Frontend)

The Backend never faces the front-end UI directly. The BFF layer handles WebSockets, API aggregation, and response trimming.

flowchart LR
    subgraph BE["FastAPI Backend"]
        D[Device API]
        T2[Telemetry API]
        C[Command API]
    end
    BE -->|gRPC / REST| W[BFF-Web]
    BE -->|gRPC / REST| M[BFF-Mobile]
    W -->|WebSocket| WD[Dashboard]
    M -->|REST + Push| MA[Mobile App]

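Layer	Responsibilities	Does not do
Backend	Device CRUD, Telemetry, Command, RBAC, MQTT	UI logic
BFF	WS management, aggregated queries, trimming, i18n	Direct DB/MQTT access

Aspect	Choice	Notes
Framework	FastAPI or Next.js API Routes	Depends on the front-end team's stack
BFF → Backend	gRPC or REST	gRPC performs better, REST is faster to develop
Cache	Redis	Status cache + WS pub/sub

Data Ingestion via the EMQX Rule Engine

No separate event streaming layer is used. The Rule Engine's built-in PostgreSQL connector writes directly to TimescaleDB:

SQL-like filtering: SELECT * FROM "telemetry/#" WHERE payload.temperature > 50
Message forwarding, format conversion, basic dedup, rate limiting + backpressure

Beyond 1M devices, when event replay or multiple consumers are needed, add Redpanda (Phase 4).

Three-Tier Storage Strategy

Tier	DB	Retention	Query latency	Monthly cost/TB	Notes
Hot	TimescaleDB	7 days	<10ms	~$200	Raw resolution, real-time Dashboard queries
Warm	ClickHouse	30-90d	50-500ms	~$50	1min/5min aggregates, analytical queries
Cold	S3 + Parquet	Years	Seconds	~$2-5	Hourly/daily aggregates, DuckDB ad-hoc

Continuous Aggregates are the key to Dashboard queries: automatic 1min/5min/1hr pre-aggregation cuts the data each query has to scan by up to 600x.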

CREATE MATERIALIZED VIEW sensor_1min
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', time) AS bucket, device_id,
       avg(temperature) AS avg_temp, max(temperature) AS max_temp
FROM telemetry GROUP BY bucket, device_id;

flowchart TD
    D[Devices] -->|MQTT| E[EMQX]
    E -->|Rule Engine| H[TimescaleDB]
    H -->|Aggregate| W[ClickHouse]
    W -->|Archive| C[S3 Parquet]
    H -->|Query| A[FastAPI]
    A -->|gRPC| B[BFF]
    B -->|WS| U[Dashboard]

Cost estimate: roughly $17-33K/month for 1M devices on managed cloud (AWS/GCP); see the cost and risk sections in Part 3.

Technology Choices Summary

Layer	Choice	AWS	GCP
Device protocol	MQTT v5	n/a	n/a
Broker	EMQX	EMQX Cloud	EMQX Cloud
Data Ingestion	Rule Engine	n/a	n/a
Streaming (>1M)	Redpanda	MSK	Redpanda Cloud
Backend	FastAPI + asyncio	Fargate	Cloud Run
BFF	FastAPI / Next.js	Fargate	Cloud Run
UI push	WebSocket via BFF	ALB	Cloud LB
Hot DB	TimescaleDB	Timescale Cloud	Timescale Cloud
Warm DB	ClickHouse	ClickHouse Cloud	ClickHouse Cloud
Cold	S3 + Parquet	S3	GCS
Observability	OTel + Grafana	CloudWatch	Cloud Monitoring

Part 2 (Security and Multi-Tenancy): HTTPS/TLS, mTLS, Cert Rotation, RBAC, Topic ACL, DB isolation
Part 3 (Operations, Cost, and Reliability): Rate Limiting, Edge Resilience, DR, Observability, cost estimate, team risk

Claude Code Channel Plugin Development in Practice: Telegram Inline Buttons

2026-03-27T02:00:00+00:00

Channel Plugin architecture: Telegram Inline Buttons and the cache patching mechanism

English Abstract — Claude Code’s --channels flag only accepts official plugin identifiers and re-extracts the plugin into a cache directory on every launch, overwriting local modifications. After trying 6 different approaches (pre-sync copy, background watcher, --plugin-dir, --mcp-config, symlink, and cache patching), we found that cache patching — rewriting the cached .mcp.json to redirect --cwd to a local fork — is the cleanest workaround: idempotent, no residual state, and compatible with inbound channel notifications. This article also covers implementing Telegram inline keyboard buttons via raw Bot API format (bypassing grammy’s serialization issue), callback query handling, and credential isolation with TELEGRAM_STATE_DIR.

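Introduction

claude-code-channels is an open-source project that lets Claude Code interact through messaging platforms such as Telegram, Discord, Slack, LINE, and WhatsApp. Each channel is an MCP server that runs as a Bun subprocess and talks to the Claude Code session over the stdio transport.

Today's goal sounds simple: make the Telegram reply tool support inline keyboard buttons. Implementing the buttons was not hard, but along the way we ran into the overwrite behavior of the Claude Code plugin cache and ended up spending more time on architecture. This post records the whole process.

Channel Data Flow

The channel message flow looks like this: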

Normal Message Flow:

flowchart LR
    U[User] -->|message| B[Bot]
    B -->|notification| M[MCP Server]
    M -->|stdio| C[Claude Code]
    C -->|reply tool| M
    M -->|sendMessage| B
    B --> U

Inline Button Flow:

flowchart LR
    C[Claude Code] -->|reply + buttons| M[MCP Server]
    M -->|inline_keyboard| B[Bot]
    B --> U[User]
    U -->|callback_query| B
    B -->|notification| M
    M -->|forward| C

Telegram Inline Buttons 實作

需求

當 Claude 需要用戶回應一組固定選項時（Yes/No、Approve/Reject、1~5 數字），讓用戶直接按按鈕比打字更直覺。官方 Telegram plugin 的 reply tool 只支援純文字，沒有按鈕參數。

方案設計

在 reply tool 加一個 optional buttons 參數，二維字串陣列，每個內層陣列代表一排按鈕：

// Claude 可以發送任意按鈕組合
reply({ chat_id, text: "確認部署?", buttons: [["Yes", "No"]] })
reply({ chat_id, text: "選擇方案:", buttons: [["方案A", "方案B"], ["取消"]] })

實際效果

三種 inline button 場景：部署確認（Approve/Reject）、功能評分（1-5）、方案選擇（Plan A/B/Skip），底部為 /session status 指令回傳的 STM 狀態

截圖展示了三種常見的互動場景：

部署確認 — 二選一的 Approve/Reject，按下後按鈕消失並顯示結果
功能評分 — 單排 5 個數字按鈕，適合量化回饋
方案選擇 — 兩排按鈕（多選項 + 跳過），支援任意排列組合

每次按鈕點擊都會作為 inbound message 回傳給 Claude Code session，meta 帶 button: "true" 標記，Claude 可以直接根據選擇繼續執行。

關鍵技術決策

使用 raw Telegram API format，而非 grammy 的 InlineKeyboard class。

grammy 的 InlineKeyboard class 搭配 spread operator 傳入 sendMessage options 時，reply_markup 會在序列化過程中丟失 — API 回傳成功但 Telegram 不顯示按鈕。用 curl 直接呼叫 Bot API 測試正常，確認是 grammy class 的問題。改用 raw format 後立即解決：

const replyMarkup = {
  inline_keyboard: buttons.map(row =>
    row.map(label => ({
      text: String(label),
      callback_data: `btn:${String(label).slice(0, 59)}`, // 64 bytes 上限
    }))
  ),
}
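One caveat with the snippet above: Telegram's 64-byte limit on callback_data is measured in bytes, while `slice(0, 59)` counts UTF-16 code units, so a long CJK label (3 bytes per character in UTF-8) could still exceed the budget. Below is a minimal sketch of a byte-aware truncation, written in Python for illustration only. The helper names `fit_utf8` and `build_reply_markup` are hypothetical and are not part of the plugin, which is TypeScript.

```python
PREFIX = "btn:"
MAX_CALLBACK_BYTES = 64  # Telegram Bot API limit for callback_data


def fit_utf8(text: str, max_bytes: int) -> str:
    """Truncate text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the partial multi-byte sequence left at the cut point
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def build_reply_markup(buttons: list[list[str]]) -> dict:
    """Build a raw Bot API reply_markup dict from a 2D list of labels (hypothetical helper)."""
    budget = MAX_CALLBACK_BYTES - len(PREFIX.encode("utf-8"))
    return {
        "inline_keyboard": [
            [
                {"text": str(label), "callback_data": PREFIX + fit_utf8(str(label), budget)}
                for label in row
            ]
            for row in buttons
        ]
    }
```

The visible `text` keeps the full label; only the callback payload is shortened, so two very long labels sharing the same first 60 bytes would collide and would need a different encoding (for example an index).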

按鈕點擊後的 Callback 處理： 攔截 btn: prefix 的 callback_query，三步完成：

answerCallbackQuery() — 消除 Telegram loading 動畫
editMessageText() — 更新訊息顯示選擇結果（防止重複點擊）
mcp.notification() — 將按鈕 label 作為 inbound message 轉發給 Claude Code session（meta: button=true）

最後在 MCP server 的 instructions 加上一句引導：

Prefer buttons over asking the user to type whenever the response is a small fixed set of choices.

這樣 Claude 在猜數字、確認操作等場景會主動使用 buttons，不需用戶提醒。

Plugin Cache 覆蓋問題

發現問題

按鈕功能寫好了，直接改 ~/.claude/plugins/cache/claude-plugins-official/telegram/0.0.4/server.ts，重啟 Claude Code，按鈕不出現。加了 diagnostic watermark 到回傳值：

sent (id: 54)          ← 沒有 [local-v1] watermark

確認：Claude Code 在 --channels plugin:telegram@claude-plugins-official 啟動時，會 re-extract 官方 plugin 到 cache，覆蓋所有修改。

嘗試過的 6 種方案

#	方案	結果
1	Pre-sync cp	x — re-extract 覆蓋
2	Background watcher	x — race condition
3	`--plugin-dir`	x — 不支援 channel plugins
4	`--mcp-config`	x — 無 channel notification
5	Symlink	△ — 可行，殘留管理麻煩
6	Cache patching	✓ 穩定、無殘留、idempotent

Bun Transpile Cache 的額外坑

即使成功把修改放進 cache，重啟後仍可能跑舊 code。原因是 bun 會 cache transpiled TypeScript，即使 server.ts 檔案改了，bun 仍可能使用舊的 cached bytecode。需要 rm -rf /tmp/bun-* 清除。

這個問題已開 upstream issue：anthropics/claude-plugins-official#1057

Cache Patching 架構

最終採用的方案：將官方 plugin fork 到專案裡做版控，啟動時改寫 plugin cache 的啟動設定（.mcp.json），讓 Claude Code 跑我們的 fork code。

工作原理

Claude Code 啟動 --channels plugin:telegram@claude-plugins-official 時，會把官方 plugin 解壓到 cache 目錄。Cache patching 不改 server.ts，而是改寫 cache 裡的 .mcp.json，把 --cwd 指向專案裡的 local fork：

~/.claude/plugins/cache/.../telegram/0.0.4/.mcp.json（改寫後）：

{
  "mcpServers": {
    "telegram": {
      "command": "bun",
      "args": ["run", "--cwd", "/external_plugins/telegram-channel/", "server.ts"],
      "env": { "TELEGRAM_STATE_DIR": "/.claude/channels/telegram" }
    }
  }
}

這樣 Claude Code 仍用官方 plugin identifier 啟動（保留 inbound notification 路由），但實際執行的是我們 fork 過的 code：

/external_plugins/telegram-channel/
    ├── .mcp.json
    ├── server.ts          ← fork（版控裡的 source of truth）
    ├── skills/
    └── node_modules/

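Why not a symlink? The symlink approach works, but it leaves residue behind (a .official backup directory) and needs extra cleanup on upgrade. Cache patching is idempotent: the file is rewritten on every launch and no state is left over.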
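The rewrite step can be sketched as a small idempotent function. This is a minimal sketch under assumptions, not the project's actual launch script: the function name `patch_mcp_json` and its parameters are hypothetical, and the JSON shape simply mirrors the rewritten .mcp.json shown above.

```python
import json
from pathlib import Path


def patch_mcp_json(mcp_json: Path, server_name: str, fork_dir: str, env: dict[str, str]) -> bool:
    """Point the cached server entry at a local fork. Returns True if the file was changed."""
    desired = {
        "command": "bun",
        "args": ["run", "--cwd", fork_dir, "server.ts"],
        "env": dict(env),
    }
    config = json.loads(mcp_json.read_text(encoding="utf-8")) if mcp_json.exists() else {}
    servers = config.setdefault("mcpServers", {})
    if servers.get(server_name) == desired:
        return False  # already patched: running again is a no-op
    servers[server_name] = desired
    mcp_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Because the function compares against the desired entry before writing, calling it on every launch leaves no backup files or markers behind, which is the property the symlink approach lacks.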

通用場景

Cache patching 不限於 Telegram — 任何 --channels plugin:xxx@claude-plugins-official 都適用同一套機制。只要將官方 plugin fork 到 external_plugins/-channel/，啟動腳本就能自動改寫對應的 .mcp.json。目前 claude-code-channels 已對 Telegram 和 Discord 使用此方案。

適用條件：

需要修改官方 channel plugin 的行為（加功能、修 bug、改 skill 路徑）
需要保留官方 plugin identifier 的 inbound notification 路由
不想維護 symlink 或其他有狀態的 workaround

重要設定：`channelsEnabled`

Claude Code 的 channel notification（inbound 訊息）預設是關閉的。需要在 settings 裡開啟：

{
  "channelsEnabled": true
}

沒有這個設定，outbound（發訊息）正常運作，但 inbound 會被靜默丟棄 — 這是最容易被忽略的坑。

Credential 與 STATE_DIR 隔離

Bot token 和 access control 只存在專案目錄內（透過 TELEGRAM_STATE_DIR 環境變數指定），不會暴露到 ~/.claude/channels/telegram/ 這個全域路徑。這確保每個專案的 credentials 互相隔離，不會被其他 Claude Code session 讀取。

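The official plugin's skills (/telegram:access, /telegram:configure) originally hardcoded the path as ~/.claude/channels/telegram/, so pairing failed once TELEGRAM_STATE_DIR was set. The fork fixes this: the skills now use a $STATE shorthand that resolves from $TELEGRAM_STATE_DIR and falls back to the global path. The same fix was applied to the Discord plugin and the ACCESS.md documentation.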
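The resolution order described above (environment variable first, global path as fallback) can be sketched as follows. The function name `resolve_state_dir` and its parameters are hypothetical; the fork's skills express the same idea as a shell shorthand rather than Python.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_state_dir(env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None) -> Path:
    """Return the project-scoped state dir if TELEGRAM_STATE_DIR is set, else the global default."""
    env = os.environ if env is None else env
    override = env.get("TELEGRAM_STATE_DIR", "").strip()
    if override:
        return Path(override).expanduser()
    # Fallback: the global path the official skills originally hardcoded
    return (home or Path.home()) / ".claude" / "channels" / "telegram"
```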

總結

Takeaways

Plugin 開發最大障礙：Claude Code 的 --channels 只接受官方 plugin identifier，啟動時會 re-extract 覆蓋 cache。目前沒有官方的 local plugin 載入方式。
Cache patching 是最乾淨的 workaround：改寫 plugin cache 的啟動設定指向 local fork，idempotent 且無殘留，比 symlink、pre-sync、background watcher 都穩定。
channelsEnabled: true 容易被忽略：沒有這個設定，outbound 正常但 inbound 被靜默丟棄，debug 時很容易誤判為 bot polling 問題。
Bun transpile cache 是隱性坑：改了 TypeScript 原始碼，bun 可能仍跑舊版本。清 /tmp/bun-* 或設定 BUN_DISABLE_CACHE=1 可解。
建議官方改進：
- --channels 支援 local path（類似 --plugin-dir）
- Plugin 啟動時加 --no-cache 避免 transpile cache 問題

深入解析 Claude Code 的 Ralph Loop Stop Hook

2026-03-26T02:00:00+00:00

Ralph Loop Stop Hook 運作流程與 State File 結構

English Abstract — The Ralph Loop Stop Hook is a bash-based hook for Claude Code that enables autonomous, iterative AI agent sessions. When Claude finishes a response, the Stop Hook intercepts the session exit, reads the agent’s transcript, checks for a completion promise, and — if the task isn’t done — re-injects the original prompt to continue the loop. This article dissects the 191-line script: state file architecture (YAML frontmatter + markdown prompt), session isolation to prevent cross-session interference, JSONL transcript parsing, Perl-based tag detection, and atomic state updates. Includes the actual source code with production safety considerations.

Claude Code 的 Hook 機制讓開發者可以在 AI agent 的生命週期中插入自訂邏輯。其中 Stop Hook 是最強大的一種 — 它在 Claude 每次完成回應時觸發，可以決定是否阻止 session 結束並繼續執行。Ralph Loop 正是利用這個機制，實現了 AI agent 的自主迭代。

起因：一個神秘的 Permission Denied

事情的起點是我的 Claude Code session 底部不斷閃過這行錯誤：

Ran 1 stop hook (ctrl+o to expand)
⎿ Stop hook error: Failed with non-blocking status code:
  /bin/sh: 1: ~/.claude/plugins/marketplaces/claude-plugins-official/
  plugins/ralph-loop/hooks/stop-hook.sh: Permission denied

每次 Claude 完成回應都會觸發一次，雖然標示 non-blocking（不影響正常使用），但反覆出現讓人好奇 — Ralph Loop 到底是什麼？為什麼它的 Stop Hook 會在我的 session 裡觸發？

追查後才發現，這是安裝 claude-plugins-official marketplace 時一起帶入的 plugin。腳本沒有執行權限（chmod +x），所以每次都報 Permission Denied。修正權限後錯誤消失，但也因此讓我深入研究了這個設計精巧的 Stop Hook。

什麼是 Ralph Loop？

Ralph Loop 是一個 Stop Hook 腳本，核心功能很簡單：

Claude 完成回應 → Stop Hook 觸發
檢查是否有活躍的迴圈（state file 是否存在）
如果任務未完成 → 阻止 session 結束，重新注入 prompt
Claude 讀取自己上一輪的輸出，繼續改進

這創造了一個自我參照的迭代迴路 — Claude 反覆檢視並改進自己的工作，直到達成完成條件或達到迭代上限。

運作流程

1. Hook 觸發與 State 檢查

Stop Hook 首先讀取 stdin 的 JSON 輸入，然後檢查 state file 是否存在：

# Read hook input from stdin (advanced stop hook API)
HOOK_INPUT=$(cat)

# Check if ralph-loop is active
RALPH_STATE_FILE=".claude/ralph-loop.local.md"

if [[ ! -f "$RALPH_STATE_FILE" ]]; then
  # No active loop - allow exit
  exit 0
fi

Production Notes — exit 0 代表 hook 正常完成但不阻擋。只有輸出 {"decision": "block"} 的 JSON 才能阻止 session 結束。State file 不存在時，hook 是完全透明的。

2. YAML Frontmatter 解析

State file 使用 YAML frontmatter + Markdown body 的格式，與 Jekyll post 結構一致：

# Parse markdown frontmatter and extract values
FRONTMATTER=$(sed -n '/^---$/,/^---$/{ /^---$/d; p; }' "$RALPH_STATE_FILE")
ITERATION=$(echo "$FRONTMATTER" | grep '^iteration:' | sed 's/iteration: *//')
MAX_ITERATIONS=$(echo "$FRONTMATTER" | grep '^max_iterations:' | sed 's/max_iterations: *//')
COMPLETION_PROMISE=$(echo "$FRONTMATTER" | grep '^completion_promise:' | \
  sed 's/completion_promise: *//' | sed 's/^"\(.*\)"$/\1/')

State file 結構如下：

---
iteration: 3
max_iterations: 10
completion_promise: "DONE"
session_id: abc123
---
Your prompt text here.
每次迭代都會將這段 prompt 重新注入 Claude。
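For readers more comfortable with Python than sed and grep, the same frontmatter split can be sketched as below. This is an illustration, not the hook's code: `parse_state` is a hypothetical name, and it only handles flat `key: value` lines like those in the sample above, not general YAML.

```python
def parse_state(text: str) -> tuple[dict[str, str], str]:
    """Split a state file into (frontmatter fields, prompt body)."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("missing opening --- delimiter")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("missing closing --- delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Strip surrounding whitespace and optional double quotes
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body
```

All values come back as strings, so numeric fields such as `iteration` still need validation before arithmetic.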

3. Session 隔離

State file 是 project-scoped（位於 .claude/ 目錄），但 Stop Hook 會在該 project 下的所有 Claude Code session 中觸發。如果另一個 session 開了同一個 project，不應該被這個 loop 阻擋：

STATE_SESSION=$(echo "$FRONTMATTER" | grep '^session_id:' | \
  sed 's/session_id: *//' || true)
HOOK_SESSION=$(echo "$HOOK_INPUT" | jq -r '.session_id // ""')

if [[ -n "$STATE_SESSION" ]] && [[ "$STATE_SESSION" != "$HOOK_SESSION" ]]; then
  exit 0  # Wrong session - don't interfere
fi

Production Notes — 沒有 session isolation 的話，在同一個 project 開兩個 terminal 跑 Claude Code，一個 session 的 loop 會阻擋另一個 session 的正常退出。這是實際部署中很容易踩到的坑。

4. 迭代上限與數值驗證

在做算術運算前，先驗證欄位是否為合法數字 — 防止 state file 被手動編輯後導致 bash 報錯：

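if [[ ! "$ITERATION" =~ ^[0-9]+$ ]] || [[ ! "$MAX_ITERATIONS" =~ ^[0-9]+$ ]]; then
  # Either counter is non-numeric: treat the state file as corrupted and stop the loop
  echo "Warning: State file corrupted" >&2
  rm "$RALPH_STATE_FILE"
  exit 0
fi

# Check if max iterations reached (the -gt 0 guard means 0 disables the limit)
if [[ $MAX_ITERATIONS -gt 0 ]] && [[ $ITERATION -ge $MAX_ITERATIONS ]]; then
  echo "Ralph loop: Max iterations ($MAX_ITERATIONS) reached."
  rm "$RALPH_STATE_FILE"
  exit 0
fi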

5. Transcript 解析

Claude Code 的 transcript 是 JSONL 格式（每行一個 JSON），每個 content block（text / tool_use / thinking）都是獨立的一行。Hook 需要從中提取最後一段 assistant 文字：

TRANSCRIPT_PATH=$(echo "$HOOK_INPUT" | jq -r '.transcript_path')

# Extract last 100 assistant lines for performance
LAST_LINES=$(grep '"role":"assistant"' "$TRANSCRIPT_PATH" | tail -n 100)

# Parse and get the final text block
LAST_OUTPUT=$(echo "$LAST_LINES" | jq -rs '
  map(.message.content[]? | select(.type == "text") | .text) | last // ""
')

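Production Notes: tail -n 100 is a performance measure. A long-running session's transcript can hold thousands of lines, and slurping all of them with jq would be slow. 100 lines are enough to cover the most recent assistant response.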
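The grep, tail and jq steps above can be mirrored in Python as a rough sketch. The entry shape (`message.content[]` blocks with a `type` field) is taken from the jq filter shown earlier; the function name `last_assistant_text` is hypothetical.

```python
import json
from typing import Iterable


def last_assistant_text(jsonl_lines: Iterable[str], tail: int = 100) -> str:
    """Return the last text block among the most recent assistant transcript lines."""
    # Same cheap pre-filter as the grep, followed by the tail cap
    assistant = [ln for ln in jsonl_lines if '"role":"assistant"' in ln][-tail:]
    last = ""
    for line in assistant:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the hook
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        content = message.get("content", []) if isinstance(message, dict) else []
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                last = block.get("text", "")
    return last
```

Like the shell version, the substring filter assumes the transcript is written without a space after the colon in `"role":"assistant"`.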

6. Completion Promise 偵測

Ralph Loop uses a `<promise>` tag as its completion signal. When Claude writes `<promise>DONE</promise>` in its output, the task is considered complete:

if [[ "$COMPLETION_PROMISE" != "null" ]] && [[ -n "$COMPLETION_PROMISE" ]]; then
  # Extract text from <promise> tags using Perl for multiline support
  PROMISE_TEXT=$(echo "$LAST_OUTPUT" | \
    perl -0777 -pe 's/.*?<promise>(.*?)<\/promise>.*/$1/s; s/^\s+|\s+$//g; s/\s+/ /g' \
    2>/dev/null || echo "")

  # Quoted right-hand side forces a literal comparison (no glob pattern matching)
  if [[ -n "$PROMISE_TEXT" ]] && [[ "$PROMISE_TEXT" = "$COMPLETION_PROMISE" ]]; then
    echo "Ralph loop: Detected <promise>$COMPLETION_PROMISE</promise>"
    rm "$RALPH_STATE_FILE"
    exit 0
  fi
fi

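Production Notes: The comparison is literal because the right-hand side is quoted. Inside [[ ]], = and == are equivalent, and an unquoted right-hand side is treated as a glob pattern, so a promise containing * or ? could match unintended text. Quoting "$COMPLETION_PROMISE" forces a literal string comparison, which is the safer choice.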
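The extract, normalize and compare steps can be sketched in Python as follows. This is an illustration with hypothetical function names, not the hook's implementation. One deliberate difference: it returns an empty string when no `<promise>` tag is present, whereas the Perl one-liner leaves the input unsubstituted in that case.

```python
import re

PROMISE_RE = re.compile(r"<promise>(.*?)</promise>", flags=re.DOTALL)


def extract_promise(output: str) -> str:
    """Return the text of the first <promise> tag with whitespace collapsed, or ''."""
    match = PROMISE_RE.search(output)
    if match is None:
        return ""
    # split() + join trims both ends and collapses internal whitespace runs
    return " ".join(match.group(1).split())


def is_complete(output: str, completion_promise: str) -> bool:
    """Literal equality check: glob characters in the promise have no special meaning."""
    promise = extract_promise(output)
    return bool(promise) and promise == completion_promise
```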

7. 迴圈繼續

如果 promise 未達成且迭代未到上限，hook 會：

更新 state file 的 iteration 計數（原子操作）
提取 prompt 文字
輸出 JSON 阻止 session 結束

NEXT_ITERATION=$((ITERATION + 1))

# Atomic state update: temp file + mv
TEMP_FILE="${RALPH_STATE_FILE}.tmp.$$"
sed "s/^iteration: .*/iteration: $NEXT_ITERATION/" "$RALPH_STATE_FILE" > "$TEMP_FILE"
mv "$TEMP_FILE" "$RALPH_STATE_FILE"

# Extract prompt (everything after the closing ---)
PROMPT_TEXT=$(awk '/^---$/{i++; next} i>=2' "$RALPH_STATE_FILE")

# Output JSON to block the stop and feed prompt back
jq -n \
  --arg prompt "$PROMPT_TEXT" \
  --arg msg "Ralph iteration $NEXT_ITERATION | To stop: output <promise>$COMPLETION_PROMISE</promise>" \
  '{ "decision": "block", "reason": $prompt, "systemMessage": $msg }'

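Production Notes: When the temp file and the target are on the same filesystem, mv is a rename, which POSIX defines as atomic: a reader sees either the old file or the new one, never a partial write. Writing into the state file in place risks leaving it truncated if the process is killed midway; temp file + mv keeps the state file complete at all times.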
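The same temp-file-then-rename pattern looks like this in Python, where `os.replace` plays the role of `mv`. This is a minimal sketch with a hypothetical function name; unlike the sed command above, it only rewrites the first `iteration:` line.

```python
import os
import re
import tempfile

ITERATION_RE = re.compile(r"^iteration:[ \t]*(\d+)[ \t]*$", flags=re.MULTILINE)


def bump_iteration(state_path: str) -> int:
    """Increment the iteration counter and atomically replace the state file."""
    with open(state_path, encoding="utf-8") as fh:
        text = fh.read()
    match = ITERATION_RE.search(text)
    if match is None:
        raise ValueError("state file has no numeric iteration field")
    next_iteration = int(match.group(1)) + 1
    updated = text[:match.start()] + f"iteration: {next_iteration}" + text[match.end():]

    # Create the temp file in the same directory so the rename stays on one filesystem
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ralph-state.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(updated)
        os.replace(tmp_path, state_path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
    return next_iteration
```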

實際應用場景

自動化測試修復迴圈：

/ralph-loop "Run the failing tests. Fix the code. Re-run tests.
Repeat until all pass." --max-iterations 5 --completion-promise "ALL TESTS PASS"

文件品質自審迴圈：

/ralph-loop "Review the PR diff. Check for bugs, security issues,
and style violations. If you find issues, fix them and re-review."
--max-iterations 3 --completion-promise "REVIEW COMPLETE"

漸進式重構：

/ralph-loop "Refactor the auth module. Each iteration, improve one aspect:
naming, error handling, or test coverage."
--max-iterations 4 --completion-promise "REFACTOR DONE"

安全機制總結

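Mechanism	Purpose	Implementation
`max_iterations`	Prevent infinite loops	Delete the state file and exit 0 once the limit is reached
Session Isolation	Prevent cross-session interference	Compare `session_id`
Numeric validation	Prevent crashes from a corrupted state file	Regex check + cleanup
Atomic Update	Prevent a half-written state file	temp file + `mv`
Promise Literal Match	Prevent glob characters from mis-matching	Quoted right-hand side in `[[ ]]`
Transcript Cap	Prevent slowdowns on long sessions	`tail -n 100`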

References

Claude Code Hooks — Official Documentation
Ralph Loop Plugin — ralph-wiggum on npm
Stop Hook Deep Dive — Claude Code Stop Hook: Force Task Completion
Source Script — stop-hook.sh

Source: osisdie/osisdie.github.io — PRs and Issues welcome!

LLM 整合 RAG 技術的核心挑戰與突破方向

2026-03-25T02:00:00+00:00

RAG 核心挑戰與對應的突破解決方案

English Abstract — As RAG (Retrieval-Augmented Generation) moves from proof-of-concept to production in 2026, six core challenges have emerged: retrieval quality gaps, the “Lost in the Middle” attention problem, knowledge conflicts between retrieved documents and parametric memory, hallucination propagation from bad retrievals, inability to perform multi-hop reasoning, and latency/cost at scale. This article examines each challenge and maps them to four breakthrough solutions: Hybrid Search + Reranking, Agentic RAG / Graph RAG, Self-RAG, and the RAGAS evaluation framework — with pseudocode examples and production considerations.

2026 年生產環境中，RAG 不再是「加分項」，而是「必備項」— 但多數團隊仍在踩雷。本文全面分析 RAG 面臨的六大核心挑戰與四大突破方向，附帶 pseudocode 與實戰注意事項。

核心挑戰

1. 檢索品質的瓶頸

RAG 的效果高度依賴「找得到」的前提。傳統向量相似度搜尋（cosine similarity）在語意模糊或多義詞情境下容易失準，例如查詢「蘋果市值」時可能同時召回水果和科技公司的文件。此外，文件切分（chunking）策略若處理不當，同一個概念被切斷後，單獨的 chunk 會失去上下文意義。→ 這正是 Hybrid Search + Reranking 要解決的問題。

2. 知識整合的挑戰（Lost in the Middle）

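Research shows that when an LLM's context window is packed with many retrieved documents, the model pays markedly less attention to documents in the middle and tends to miss key information there. The problem is especially visible once the context exceeds 4k tokens. → The remedy is long-context reordering combined with compressive summarization.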
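One common way to read "reordering" is to place the strongest documents at the two ends of the context and let the weakest ones sink to the middle. The sketch below illustrates that idea under the assumption that the input is already sorted by relevance, best first; the function name is hypothetical and the article does not prescribe this exact scheme.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs_by_relevance: Sequence[T]) -> list[T]:
    """Alternate documents between the front and the back so the least relevant land in the middle."""
    front: list[T] = []
    back: list[T] = []
    for index, doc in enumerate(docs_by_relevance):
        # Even ranks fill the front, odd ranks fill the back
        (front if index % 2 == 0 else back).append(doc)
    return front + back[::-1]
```

With five documents ranked 1 (best) to 5, the resulting order is 1, 3, 5, 4, 2: the best document opens the context, the second best closes it, and the weakest sits in the middle.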

3. 知識衝突（Knowledge Conflict）

外部檢索到的文件與 LLM 本身的參數知識（parametric knowledge）可能互相矛盾。例如模型訓練時學到「X 是 CEO」，但最新文件顯示已換人，模型可能固執地相信自己的舊知識。→ 需要指令強化明確提示「以文件為準」。

4. 幻覺傳染（Hallucination Propagation）

若 retriever 召回了錯誤或無關文件，LLM 傾向於「信任」並據此生成，反而比不做 RAG 更糟，因為模型會把錯誤資訊包裝成有根據的回答。→ Faithfulness 評估模型與 RAGAS 框架能有效偵測這個問題。

5. 跨文件推理受限（Multi-hop Reasoning）

複雜問題需要跨多份文件進行推理（A → B → C），但標準 RAG 是「一次性」檢索，無法像人類一樣逐步找到中間線索再繼續深挖。→ Agentic RAG 與 Graph RAG 正是為此而生。

6. 延遲與成本

每次請求需要即時做 embedding 搜尋、重排序（reranking），加上 LLM 推理，整體延遲在生產環境中是顯著挑戰。→ 透過快取 + 預計算索引可有效緩解。

深入解析：突破解決方案

Hybrid Search + Reranking

結合稀疏檢索（BM25，擅長精確關鍵字匹配）與稠密向量檢索，再透過 Cross-Encoder 做二次排序。這種兩階段架構（召回 100 篇 → 精排 top-5）大幅提升最終送入 LLM 的文件品質，是目前業界主流作法。

# Hybrid Search + Reranking pseudocode
bm25_results = bm25_search(query, top_k=50)
vector_results = vector_search(embed(query), top_k=50)

# Reciprocal Rank Fusion
candidates = rrf_merge(bm25_results, vector_results, k=60)

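# Cross-Encoder reranking: score each (query, doc) pair, then sort docs by score
scores = cross_encoder.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
top_docs = [doc for _, doc in ranked[:5]]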

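Production Notes: Cross-Encoder reranking adds roughly 50-200ms of latency, depending on model size. A lightweight reranker (such as bge-reranker-v2-m3) can finish in under 50ms. For the recall stage, use approximate nearest-neighbor search (HNSW) rather than brute-force search to lower p99 latency.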
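The pseudocode above calls `rrf_merge` without defining it. A minimal sketch of Reciprocal Rank Fusion is shown below, assuming each input is a list of document IDs ordered best first; the signature is an assumption chosen to match the call in the pseudocode.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def rrf_merge(*rankings: Sequence[Hashable], k: int = 60) -> list[Hashable]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only uses rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale before merging.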

Agentic RAG 與 Graph RAG

Agentic RAG 讓 LLM 作為 agent，根據前一次檢索的結果決定下一個查詢，支援跨文件多步推理。Graph RAG（Microsoft 2024 年提出）則將知識以圖結構儲存，能捕捉實體間的關係，對「比較型」和「概念聯結型」問題效果顯著優於傳統向量 RAG。

# Agentic RAG pseudocode — iterative retrieval loop
context = []
for step in range(MAX_ITERATIONS):  # guard: prevent infinite loops
    action = llm.decide(query, context)  # "search" | "answer" | "refine"
    if action == "answer":
        return llm.generate(query, context)
    elif action == "search":
        new_query = llm.rewrite_query(query, context)
        docs = retriever.search(new_query)
        context.extend(docs)
    elif action == "refine":
        query = llm.decompose(query)  # break into sub-questions

Production Notes — 務必設定 MAX_ITERATIONS（建議 3-5），避免 agent 陷入無限循環。每輪迭代的 token 消耗會累積，需監控成本。Graph RAG 的建圖成本高（indexing 階段），但查詢階段效率與向量 RAG 相當。

Self-RAG

這是一個較根本的架構改變：模型學會在生成過程中自行插入特殊 token，決定「現在需不需要檢索」、「這段生成是否有文件支持」，把檢索決策內化到模型本身，而非外部固定流程。

# Self-RAG — model generates special tokens during inference
output_tokens = []
for segment in generate_segments(query):
    # Model outputs a retrieval decision token
    if segment.retrieval_token == "[Retrieve=Yes]":
        docs = retriever.search(segment.text)
        segment = regenerate_with_context(segment, docs)
    # Model self-evaluates with support token
    if segment.support_token == "[Fully Supported]":
        output_tokens.append(segment)
    elif segment.support_token == "[No Support]":
        output_tokens.append(flag_as_uncertain(segment))

Production Notes — Self-RAG 需要專門微調的模型（原論文使用 Llama 2 微調）。推論延遲比標準 RAG 高約 1.5-2x，因為需要多次生成 + 評估。適合高精度場景（醫療、法律），不適合低延遲需求。

RAGAS 評估框架

RAG 系統的評估一直是痛點。RAGAS 提供了四個維度的自動化評估：

Faithfulness – 生成是否忠實於文件
Answer Relevancy – 答案是否回答問題
Context Recall – 需要的資訊是否被召回
Context Precision – 召回的文件是否相關

# RAGAS evaluation pseudocode
for question, ground_truth in eval_dataset:
    contexts = retriever.search(question)
    answer = llm.generate(question, contexts)

    scores = {
        "faithfulness":      ragas.faithfulness(answer, contexts),
        "answer_relevancy":  ragas.relevancy(answer, question),
        "context_recall":    ragas.recall(contexts, ground_truth),
        "context_precision": ragas.precision(contexts, question),
    }
# Aggregate scores to track system improvements over time

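Production Notes: RAGAS itself uses an LLM for evaluation (LLM-as-judge), so the evaluation cost is comparable to the inference cost of the system under test. A practical setup is to run RAGAS in CI/CD against a golden dataset (50-100 samples) and treat a score threshold as the quality gate.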
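The threshold gate can be sketched independently of any evaluation library. The code below assumes per-sample scores have already been computed and collected as dictionaries; the function name `quality_gate` and the threshold values in the usage note are hypothetical placeholders, not recommended settings.

```python
from statistics import mean


def quality_gate(per_sample_scores: list[dict[str, float]], thresholds: dict[str, float]) -> dict[str, float]:
    """Return the metrics whose average falls below its threshold (empty dict means the gate passes)."""
    failures: dict[str, float] = {}
    for metric, minimum in thresholds.items():
        values = [sample[metric] for sample in per_sample_scores if metric in sample]
        average = mean(values) if values else 0.0  # a missing metric counts as a failure
        if average < minimum:
            failures[metric] = round(average, 4)
    return failures
```

In a CI job, a non-empty result from `quality_gate(scores, {"faithfulness": 0.8, "context_recall": 0.7})` would fail the build and report which metric regressed.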

有了可量化的指標，系統改進才有方向。

總結趨勢

目前領域的方向是從「靜態一次性檢索」走向「動態、自反式、多輪」的架構。Long Context 模型的崛起（如 Gemini 1.5 Pro 的 1M token window）讓部分人質疑 RAG 是否仍有必要，但實際上 RAG 的價值在於知識的可更新性與可溯源性，而非只是解決 context 長度問題，這是純粹增大 context window 無法取代的。兩者更可能是互補而非替代關係。

References

Lost in the Middle — Liu et al., 2023. Lost in the Middle: How Language Models Use Long Contexts
Graph RAG — Microsoft, 2024. From Local to Global: A Graph RAG Approach · GitHub
Self-RAG — Asai et al., 2023. Self-RAG: Learning to Retrieve, Generate, and Critique
RAGAS — GitHub · Documentation
Corrective RAG — Yan et al., 2024. Corrective Retrieval Augmented Generation

Recommended Repos

microsoft/graphrag — Production-ready Graph RAG implementation
explodinggradients/ragas — RAG evaluation framework
run-llama/llama_index — Full-featured RAG framework
langchain-ai/langchain — LLM application framework with RAG support

Source: osisdie/osisdie.github.io — PRs and Issues welcome!
