<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://osisdie.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://osisdie.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-04-05T01:05:07+00:00</updated><id>https://osisdie.github.io/feed.xml</id><title type="html">Kevin’s Tech Blog</title><subtitle>A tech blog covering LLM, AI, RAG, .NET, Python, and Cloud engineering. </subtitle><entry><title type="html">/yt2pdf Deep Dive: A 6-Stage Automated Pipeline from YouTube Video to Bilingual PDF Summary</title><link href="https://osisdie.github.io/blog/2026/yt2pdf-pipeline/" rel="alternate" type="text/html" title="/yt2pdf Deep Dive: A 6-Stage Automated Pipeline from YouTube Video to Bilingual PDF Summary"/><published>2026-04-04T02:00:00+00:00</published><updated>2026-04-04T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/yt2pdf-pipeline</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/yt2pdf-pipeline/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-overview-480.webp 480w,/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-overview-800.webp 800w,/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="yt2pdf 6-stage pipeline overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">yt2pdf: a 6-stage pipeline from YouTube URL to bilingual PDF</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: This post dissects <code class="language-plaintext highlighter-rouge">/yt2pdf</code>, a 6-stage automation pipeline that converts any YouTube video into bilingual (EN + Traditional 
Chinese) PDF summaries. The pipeline chains yt-dlp subtitle extraction with Whisper ASR fallback, AI-powered bilingual summarization, headless Chrome PDF rendering with base64-embedded images, and Backblaze B2 cloud upload with 7-day presigned URLs. We examine the transcript fallback strategy, the orchestrator pattern, and 6 key design decisions for production deployment.</p> </blockquote> <h2 id="前言">Introduction</h2> <p>In the <a href="/blog/2026/claude-code-channel-plugin-dev/">earlier Channel Plugin walkthrough</a>, we built the infrastructure for two-way communication over Telegram and Discord. Users then started dropping YouTube links into the channel and asking "What is this video about?", but watching each video, writing up a summary, and replying by hand was far too slow.</p> <p><code class="language-plaintext highlighter-rouge">/yt2pdf</code> exists to solve exactly this problem: <strong>one command that extracts the transcript, generates a bilingual summary, renders a PDF, and uploads it to the cloud</strong>. A user types <code class="language-plaintext highlighter-rouge">/yt2pdf https://youtube.com/watch?v=xxx</code> in Telegram and, a few minutes later, receives a download link to a cleanly typeset PDF.</p> <p>This is the core feature of <a href="https://github.com/osisdie/claude-code-channels">claude-code-channels</a> v1.1.0. This post breaks down its complete pipeline architecture.</p> <hr/> <h2 id="pipeline-架構總覽">Pipeline Architecture Overview</h2> <p>The whole flow is split into 6 stages, and the scripted stages are each handled by an independent Python module:</p> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-detail-480.webp 480w,/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-detail-800.webp 800w,/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-detail-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/yt2pdf-pipeline/yt2pdf-pipeline-detail.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="yt2pdf 6-stage pipeline detail" loading="lazy" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Full 6-stage architecture: three-layer transcript fallback + conversion pipeline + tech stack</figcaption> </figure> <p>Simplified flowchart:</p> <pre><code class="language-mermaid">flowchart TB
    A["1. Parse URL"] --&gt; B["2. Thumbnail + Metadata"]
    B --&gt; D{"3. Subtitles?"}
    D --&gt;|"Manual / Auto"| E["SRT → Text"]
    D --&gt;|"No subs"| F["Whisper ASR"] --&gt; E
    E --&gt; G["4. AI Summary (EN + zh-TW)"]
    G --&gt; H["5. Markdown → HTML → PDF"]
    H --&gt; J["6. Upload B2 → Presigned URL"]
</code></pre> <p>From the channel-integration point of view, the flow is even simpler:</p> <pre><code class="language-mermaid">flowchart LR
    U["User"] --&gt;|"/yt2pdf URL"| T["Telegram / Discord"]
    T --&gt;|"MCP notification"| CC["Agent Session"]
    CC --&gt;|"run pipeline"| PY["Python Scripts"]
    PY --&gt;|"JSON result"| CC
    CC --&gt;|"reply + URLs"| T
</code></pre> <p>Each Python module has a clearly defined responsibility:</p> <table> <thead> <tr> <th>Module</th> <th>Responsibility</th> <th>Input</th> <th>Output</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">get_transcript.py</code></td> <td>Transcript extraction + Whisper fallback</td> <td>Video ID</td> <td>Plain text</td> <td>Three-layer fallback to make sure a transcript is available</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">build_html.py</code></td> <td>Markdown → styled HTML</td> <td><code class="language-plaintext highlighter-rouge">.md</code> file</td> <td><code class="language-plaintext highlighter-rouge">.html</code> file</td> <td>Base64 image embedding</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">build_pdf.py</code></td> <td>HTML → PDF</td> <td><code class="language-plaintext highlighter-rouge">.html</code> file</td> <td><code class="language-plaintext highlighter-rouge">.pdf</code> file</td> <td>Headless Chrome rendering</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">upload_b2.py</code></td> <td>Upload to B2 + generate link</td> <td><code class="language-plaintext highlighter-rouge">.pdf</code> file</td> <td>Presigned URL</td> <td>Download link valid for 7 days</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">yt2pdf.py</code></td> <td>Chains the HTML, PDF, and upload steps above</td> <td><code class="language-plaintext highlighter-rouge">.md</code> files</td> <td>JSON array</td> <td>Authorizes B2 once per run</td> </tr> </tbody> </table> <hr/> <h2 id="字幕擷取的三層-fallback策略">Three-Layer Fallback Strategy for Transcript Extraction</h2> <p>Transcript extraction is the least predictable part of the pipeline, because not every video has subtitles. <code class="language-plaintext highlighter-rouge">get_transcript.py</code> implements a three-layer fallback:</p> <table> <thead> <tr> <th>Method</th> <th>Source</th> <th>Tool</th> <th>Latency</th> <th>Accuracy</th> <th>Best for</th> </tr> </thead> <tbody> <tr> <td>Manual subtitles</td> <td>Human-uploaded subtitles</td> <td>yt-dlp <code class="language-plaintext highlighter-rouge">--write-subs</code></td> <td>~2s</td> <td>Highest</td> <td>Videos with human-made subtitles</td> </tr> <tr> <td>Auto-generated</td> <td>Generated automatically by YouTube</td> <td>yt-dlp <code class="language-plaintext highlighter-rouge">--write-auto-subs</code></td> <td>~2s</td> <td>Medium</td> <td>English videos and other widely spoken languages</td> </tr> <tr> <td>Whisper ASR</td> <td>Speech-to-text from audio</td> <td>ffmpeg + HuggingFace API</td> <td>30-120s</td> <td>High</td> <td>Videos without subtitles</td> </tr> </tbody> </table> <p>Core logic:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_transcript</span><span class="p">(</span><span class="n">video_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span> <span class="o">|</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">video_url</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="s">https://www.youtube.com/watch?v=</span><span class="si">{</span><span class="n">video_id</span><span class="si">}</span><span class="sh">"</span>

    <span class="k">with</span> <span class="n">tempfile</span><span class="p">.</span><span class="nc">TemporaryDirectory</span><span class="p">(</span><span class="n">prefix</span><span class="o">=</span><span class="sh">"</span><span class="s">yt_transcript_</span><span class="sh">"</span><span class="p">)</span> <span class="k">as</span> <span class="n">tmpdir</span><span class="p">:</span>
        <span class="n">tmp</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="n">tmpdir</span><span class="p">)</span>

        <span class="c1"># Layer 1 &amp; 2: Try subtitles (manual → auto-generated)
</span>        <span class="n">srt</span> <span class="o">=</span> <span class="nf">download_subtitles</span><span class="p">(</span><span class="n">video_url</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="n">lang</span><span class="o">=</span><span class="sh">"</span><span class="s">en</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">srt</span><span class="p">:</span>
            <span class="n">text</span> <span class="o">=</span> <span class="nf">srt_to_text</span><span class="p">(</span><span class="n">srt</span><span class="p">)</span>
            <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">100</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">text</span>

        <span class="c1"># Layer 3: Whisper fallback
</span>        <span class="n">text</span> <span class="o">=</span> <span class="nf">whisper_transcribe_hf</span><span class="p">(</span><span class="n">video_url</span><span class="p">,</span> <span class="n">tmp</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">text</span> <span class="ow">and</span> <span class="nf">len</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">100</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">text</span>

    <span class="k">return</span> <span class="bp">None</span>
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: the Whisper fallback adds 30-120 seconds of latency (depending on video length), and the HuggingFace Inference API is rate-limited. It is a good idea to send a “Processing…” message in the channel reply first, so users know the system is working. SRT parsing automatically strips timestamps and sequence numbers, keeping only the plain text.</p> </blockquote> <hr/> <h2 id="markdown--pdf3-步轉換">Markdown → PDF: A 3-Step Conversion</h2> <p>Once the transcript is in hand, the AI generates the bilingual Markdown summaries. The next job is turning each <code class="language-plaintext highlighter-rouge">.md</code> into a downloadable PDF. The orchestrator <code class="language-plaintext highlighter-rouge">yt2pdf.py</code> chains the three steps:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">process_one</span><span class="p">(</span><span class="n">md_path</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">upload</span><span class="p">,</span> <span class="n">b2_prefix</span><span class="p">,</span> <span class="n">b2_authorized</span><span class="p">):</span>
    <span class="n">lang</span> <span class="o">=</span> <span class="nf">_detect_lang</span><span class="p">(</span><span class="n">md_path</span><span class="p">)</span>  <span class="c1"># *_en.md → "en", *_zh-tw.md → "zh-tw"
</span>    <span class="n">result</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">lang</span><span class="sh">"</span><span class="p">:</span> <span class="n">lang</span><span class="p">,</span> <span class="sh">"</span><span class="s">md</span><span class="sh">"</span><span class="p">:</span> <span class="nf">str</span><span class="p">(</span><span class="n">md_path</span><span class="p">)}</span>

    <span class="c1"># Step 1: Markdown → styled HTML (base64 embedded images)
</span>    <span class="n">html_content</span> <span class="o">=</span> <span class="nf">build_html</span><span class="p">(</span><span class="n">md_path</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">,</span> <span class="n">lang</span><span class="o">=</span><span class="n">lang</span><span class="p">)</span>
    <span class="n">html_path</span> <span class="o">=</span> <span class="n">md_path</span><span class="p">.</span><span class="nf">with_suffix</span><span class="p">(</span><span class="sh">"</span><span class="s">.html</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">html_path</span><span class="p">.</span><span class="nf">write_text</span><span class="p">(</span><span class="n">html_content</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="sh">"</span><span class="s">utf-8</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Step 2: HTML → PDF via headless Chrome
</span>    <span class="n">pdf_path</span> <span class="o">=</span> <span class="n">md_path</span><span class="p">.</span><span class="nf">with_suffix</span><span class="p">(</span><span class="sh">"</span><span class="s">.pdf</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">pdf_result</span> <span class="o">=</span> <span class="nf">html_to_pdf</span><span class="p">(</span><span class="n">html_path</span><span class="p">,</span> <span class="n">pdf_path</span><span class="p">)</span>

    <span class="c1"># Step 3: Upload to B2 (optional)
</span>    <span class="k">if</span> <span class="n">upload</span> <span class="ow">and</span> <span class="n">pdf_result</span> <span class="ow">and</span> <span class="n">b2_authorized</span><span class="p">:</span>
        <span class="n">b2_path</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">b2_prefix</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">pdf_path</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="sh">"</span>
        <span class="n">url</span> <span class="o">=</span> <span class="nf">upload_file</span><span class="p">(</span><span class="n">pdf_result</span><span class="p">,</span> <span class="n">b2_path</span><span class="p">)</span>
        <span class="n">result</span><span class="p">[</span><span class="sh">"</span><span class="s">url</span><span class="sh">"</span><span class="p">]</span> <span class="o">=</span> <span class="n">url</span>

    <span class="k">return</span> <span class="n">result</span>
</code></pre></div></div> <p>Key design points of the three steps:</p> <p><strong>Base64 image embedding</strong>: <code class="language-plaintext highlighter-rouge">build_html.py</code> converts local images (such as <code class="language-plaintext highlighter-rouge">thumb.jpg</code>) into data URIs embedded in the HTML. This makes the PDF fully self-contained, so it displays correctly even offline.</p> <p><strong>CJK font support</strong>: the HTML template specifies the font stack <code class="language-plaintext highlighter-rouge">"Noto Sans TC", "Microsoft JhengHei", "PingFang TC"</code> so that Traditional Chinese renders correctly on every platform. The CSS uses <code class="language-plaintext highlighter-rouge">@page { size: A4; margin: 2cm; }</code> to control the page size.</p> <p><strong>Headless Chrome</strong>: <code class="language-plaintext highlighter-rouge">build_pdf.py</code> produces the PDF with <code class="language-plaintext highlighter-rouge">google-chrome --headless --print-to-pdf</code>. Of the PDF options considered (wkhtmltopdf, WeasyPrint, headless Chrome), Chrome's CSS engine has the most complete CJK support.</p> <blockquote> <p><strong>Production Notes</strong>: Docker environments need the <code class="language-plaintext highlighter-rouge">--no-sandbox</code> flag and the <code class="language-plaintext highlighter-rouge">fonts-noto-cjk</code> package. <code class="language-plaintext highlighter-rouge">build_pdf.py</code> automatically searches for executables such as <code class="language-plaintext highlighter-rouge">google-chrome</code>, <code class="language-plaintext highlighter-rouge">google-chrome-stable</code>, and <code class="language-plaintext highlighter-rouge">chromium</code>.</p> </blockquote> <hr/> <h2 id="b2-上傳與-presigned-url">B2 Upload and Presigned URLs</h2> <p>Once the PDF is generated, it is uploaded to Backblaze B2 cloud storage and a presigned URL is created:</p> <ul> <li><strong>One-time authorization</strong>: the orchestrator calls <code class="language-plaintext highlighter-rouge">authorize_b2()</code> once at startup, and all files share the same session</li> <li><strong>Date-partitioned paths</strong>: <code class="language-plaintext highlighter-rouge">yt2pdf/2026-04-04/summary_en.pdf</code>, which makes cleanup by date easy</li> <li><strong>7-day TTL</strong>: <code class="language-plaintext highlighter-rouge">b2 get-download-url-with-auth --duration 604800</code> generates a time-limited download link (604800 seconds = 7 days)</li> <li><strong>Graceful degradation</strong>: if the B2 upload fails, the PDF file is attached directly in the channel as a fallback</li> </ul> <p>Why use a presigned 
URL instead of a direct attachment? The Telegram Bot API sends each file as a <strong>separate message</strong>, so sending both the EN and zh-TW PDFs means the user receives 3 messages (text + 2 files), which is a poor experience. With URLs, everything fits into a single reply.</p> <hr/> <h2 id="command-spec-驅動的-agent-協作">Command Spec-Driven Agent Collaboration</h2> <p>The entry point of the pipeline is not Python but a <strong>Command Spec</strong>: <code class="language-plaintext highlighter-rouge">.claude/commands/yt2pdf.md</code>.</p> <p>This 197-line Markdown document defines the complete 6-step flow, covering URL parsing, thumbnail download, metadata fetching, transcript extraction, summary generation, and PDF output. In essence it is a <strong>structured prompt</strong> that tells the Agent how to coordinate the individual Python scripts.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Parse &amp; Acknowledge    → Parse the URL, reply "Processing..."
Step 2: Download Thumbnail     → curl YouTube CDN
Step 3: Fetch Metadata         → yt-dlp --dump-json
Step 4: Generate Summary       → AI generates bilingual Markdown
Step 5: Build PDFs &amp; Upload    → python3 scripts/yt/yt2pdf.py ...
Step 6: Reply with Results     → Formatted reply + presigned URLs
</code></pre></div></div> <p>This is the same pattern as the <strong>Command System</strong> in the <a href="/blog/2026/claude-code-agent-architecture/">Agent architecture breakdown from a few days ago</a>: a structured document defines the workflow, and the Agent executes it step by step. The difference is that this Command Spec defines not only the steps but also the <strong>error-handling strategy</strong> and the <strong>channel-specific reply formats</strong> (Telegram vs Discord vs Slack).</p> <hr/> <h2 id="設計決策總覽">Design Decisions at a Glance</h2> <table> <thead> <tr> <th>Decision</th> <th>Choice</th> <th>Alternatives</th> <th>Rationale</th> </tr> </thead> <tbody> <tr> <td>Image embedding</td> <td>Base64 data URI</td> <td>External image links</td> <td>Self-contained PDF, readable offline, images survive forwarding</td> </tr> <tr> <td>PDF engine</td> <td>headless Chrome</td> <td>wkhtmltopdf / WeasyPrint</td> <td>Best CJK font support, most complete CSS rendering</td> </tr> <tr> <td>File delivery</td> <td>Presigned URL (7 days)</td> <td>Direct attachment</td> <td>Avoids Telegram splitting the reply into multiple messages</td> </tr> <tr> <td>Pipeline output</td> <td>JSON stdout</td> <td>File output / exit code</td> <td>Machine-parseable, the Agent reads the result directly</td> </tr> <tr> <td>Language detection</td> <td>Filename convention (<code class="language-plaintext highlighter-rouge">*_en.md</code>)</td> <td>Content detection / explicit parameter</td> <td>Simple and reliable, zero external dependencies</td> </tr> <tr> <td>Directory layout</td> <td><code class="language-plaintext highlighter-rouge">YYYY-MM-DD/VIDEO_ID/</code></td> <td>Flat directory</td> <td>Cleanup by date, avoids ID collisions</td> </tr> </tbody> </table> <blockquote> <p><strong>Production Notes</strong>: for B2 credentials, use an Application Key (not the Master Key) restricted to a single bucket. Keep an eye on rate limits for the HuggingFace token: on the free tier, the Whisper large-v3 model has an hourly request cap.</p> </blockquote> <hr/> <h2 id="相關連結">Related Links</h2> <ul> <li><strong>Channel Plugin walkthrough</strong>: <a href="/blog/2026/claude-code-channel-plugin-dev/">Building a two-way Telegram / Discord channel from scratch</a></li> <li><strong>Agent architecture breakdown</strong>: <a href="/blog/2026/claude-code-agent-architecture/">8 reusable production design patterns</a></li> <li><strong>Agent Swarm framework comparison</strong>: <a href="/blog/2026/local-agent-swarm/">A full look at 5 major local Agent Swarm frameworks</a></li> </ul>]]></content><author><name></name></author><category term="claude-code"/><category term="youtube"/><category term="pdf"/><category term="pipeline"/><category term="automation"/><category term="python"/><category term="whisper"/><category 
term="b2"/><summary type="html"><![CDATA[A breakdown of the complete 6-stage /yt2pdf pipeline: yt-dlp subtitle extraction, Whisper speech recognition, AI summary generation, headless Chrome PDF output, and B2 cloud upload]]></summary></entry><entry><title type="html">Claude Code Agent Architecture Deep Dive: 8 Reusable Production Design Patterns</title><link href="https://osisdie.github.io/blog/2026/claude-code-agent-architecture/" rel="alternate" type="text/html" title="Claude Code Agent Architecture Deep Dive: 8 Reusable Production Design Patterns"/><published>2026-04-01T02:00:00+00:00</published><updated>2026-04-01T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/claude-code-agent-architecture</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/claude-code-agent-architecture/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/claude-code-architecture/claude-code-architecture-overview-480.webp 480w,/assets/img/blog/2026/claude-code-architecture/claude-code-architecture-overview-800.webp 800w,/assets/img/blog/2026/claude-code-architecture/claude-code-architecture-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/claude-code-architecture/claude-code-architecture-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="Agent Architecture Patterns Overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">8 reusable architecture patterns distilled from the Claude Code source</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: This post dissects the internal architecture of a production agent system, extracting 8 reusable design patterns from 1,902 TypeScript source files, including Tool Registration Pipeline, Side-Query for cost-efficient routing, Coordinator/Worker with XML result injection, Hook Event System (16×5×7 combinatorics), and Context Compaction three-layer strategy. 
Each pattern includes pseudocode and practical adoption guidance.</p> </blockquote> <p><a href="/blog/2026/local-agent-swarm/">Yesterday's post</a> compared the mainstream Agent Swarm frameworks from the outside. Today we take a different angle: <strong>going deep into the source code of a production-grade agent system</strong> to see how it is designed.</p> <p>We analyzed 1,902 TypeScript files across 21 subsystems and distilled <strong>8 directly reusable design patterns</strong>. Whether you use CrewAI, LangGraph, or a framework of your own, these patterns apply as-is.</p> <hr/> <h2 id="架構總覽21-個子系統">Architecture Overview: 21 Subsystems</h2> <p>The system can be divided into 5 major layers:</p> <table> <thead> <tr> <th>Layer</th> <th>Subsystems</th> <th>Responsibility</th> </tr> </thead> <tbody> <tr> <td><strong>Execution</strong></td> <td>Tool System, Skill System</td> <td>Tool definition, registration, execution</td> </tr> <tr> <td><strong>Coordination</strong></td> <td>Agent/Subagent, Coordinator</td> <td>Multi-agent division of labor and result integration</td> </tr> <tr> <td><strong>Security</strong></td> <td>Permission, Hook System</td> <td>Permission control, event interception</td> </tr> <tr> <td><strong>Memory</strong></td> <td>State Store, Memory, Context</td> <td>State management, context compaction</td> </tr> <tr> <td><strong>Extension</strong></td> <td>Plugin, Command, Output Style</td> <td>Modular extension mechanisms</td> </tr> </tbody> </table> <p>This post goes deep on the 5 patterns with the highest adoption value (Patterns 1, 3, 4, 6, 8) and briefly covers the other 3.</p> <hr/> <h2 id="pattern-1-tool-registration-pipeline">Pattern 1: Tool Registration Pipeline</h2> <p><strong>Problem</strong>: tools come from multiple sources (built-in, plugins, MCP, user-defined) and need unified management and visibility control.</p> <p>The solution is a <strong>four-stage pipeline</strong> in which logic can be inserted at every step:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Stage 1: Define (declare each tool's schema and capabilities)</span>
<span class="kd">const</span> <span class="nx">toolDefs</span><span class="p">:</span> <span class="nx">ToolDef</span><span class="p">[]</span> <span class="o">=</span> <span class="p">[</span>
  <span class="p">{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Bash</span><span class="dl">"</span><span class="p">,</span> <span class="na">schema</span><span class="p">:</span> <span class="nx">bashSchema</span><span class="p">,</span> <span class="na">isDestructive</span><span class="p">:</span> <span class="kc">true</span> <span class="p">},</span>
  <span class="p">{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">FileRead</span><span class="dl">"</span><span class="p">,</span> <span class="na">schema</span><span class="p">:</span> <span class="nx">readSchema</span><span class="p">,</span> <span class="na">isReadOnly</span><span class="p">:</span> <span class="kc">true</span> <span class="p">},</span>
  <span class="c1">// ... MCP tools, plugin tools, user tools</span>
<span class="p">];</span>

<span class="c1">// Stage 2: Build (instantiate tools and inject context)</span>
<span class="kd">const</span> <span class="nx">tools</span> <span class="o">=</span> <span class="nx">toolDefs</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="nx">def</span> <span class="o">=&gt;</span> <span class="nf">buildTool</span><span class="p">(</span><span class="nx">def</span><span class="p">,</span> <span class="nx">context</span><span class="p">));</span>

<span class="c1">// Stage 3: Filter (remove tools disallowed by deny rules)</span>
<span class="kd">const</span> <span class="nx">filtered</span> <span class="o">=</span> <span class="nx">tools</span><span class="p">.</span><span class="nf">filter</span><span class="p">(</span><span class="nx">t</span> <span class="o">=&gt;</span> <span class="o">!</span><span class="nx">denyRules</span><span class="p">.</span><span class="nf">matches</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">));</span>

<span class="c1">// Stage 4: Assemble (sort by name to keep the prompt cache stable, then assemble)</span>
<span class="kd">const</span> <span class="nx">pool</span> <span class="o">=</span> <span class="nf">assembleToolPool</span><span class="p">(</span><span class="nx">filtered</span><span class="p">.</span><span class="nf">sort</span><span class="p">(</span><span class="nx">byName</span><span class="p">));</span>
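</code></pre></div></div> <p><strong>Key design decisions</strong>:</p> <ul> <li><strong>Fail-closed</strong>: deny by default; a tool must be explicitly allowed before it can be used</li> <li><strong>Sort by name</strong>: a stable tool order maximizes the prompt cache hit rate</li> <li><strong>Feature gate injection</strong>: the Build stage decides whether to include a tool based on feature flags</li> </ul> <blockquote> <p><strong>Production Notes</strong>: if you are building an agent system, do not write tool registration as one big if-else. Use the pipeline pattern so that each stage can be tested independently. The Filter stage is especially valuable, because it lets you turn off specific tools without changing code.</p> </blockquote> <hr/> <h2 id="pattern-3-side-query-pattern">Pattern 3: Side-Query Pattern</h2> <p><strong>Problem</strong>: using the main model for every decision is too expensive. Auxiliary judgments such as memory retrieval, permission checks, and routing do not need the strongest model.</p> <p>The solution is a <strong>side-query</strong>: a lightweight LLM query that runs alongside the main conversation:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Memory retrieval: use a smaller model to pick the relevant memories</span>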
<span class="k">async</span> <span class="kd">function</span> <span class="nf">recallMemories</span><span class="p">(</span><span class="nx">query</span><span class="p">:</span> <span class="kr">string</span><span class="p">):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="nx">Memory</span><span class="p">[]</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">allMemories</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">scanMemoryFiles</span><span class="p">();</span>    <span class="c1">// scan all memory files</span>
  <span class="kd">const</span> <span class="nx">selected</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">sideQuery</span><span class="p">({</span>
    <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">fast</span><span class="dl">"</span><span class="p">,</span>                                 <span class="c1">// use the cheap model</span>
    <span class="na">prompt</span><span class="p">:</span> <span class="s2">`From the memories below, select those relevant to "</span><span class="p">${</span><span class="nx">query</span><span class="p">}</span><span class="s2">" (at most 5)`</span><span class="p">,</span>
    <span class="na">context</span><span class="p">:</span> <span class="nx">allMemories</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="nx">m</span> <span class="o">=&gt;</span> <span class="nx">m</span><span class="p">.</span><span class="nx">summary</span><span class="p">),</span>
  <span class="p">});</span>
  <span class="k">return</span> <span class="nx">selected</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// Permission classification: 2-stage XML classifier</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nf">classifyPermission</span><span class="p">(</span><span class="nx">toolCall</span><span class="p">:</span> <span class="nx">ToolCall</span><span class="p">):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="nx">Decision</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">stage1</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">sideQuery</span><span class="p">({</span>
    <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">fast</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">prompt</span><span class="p">:</span> <span class="s2">`Assess the safety of this tool call: </span><span class="p">${</span><span class="nx">toolCall</span><span class="p">.</span><span class="nx">name</span><span class="p">}</span><span class="s2">(</span><span class="p">${</span><span class="nx">JSON</span><span class="p">.</span><span class="nf">stringify</span><span class="p">(</span><span class="nx">toolCall</span><span class="p">.</span><span class="nx">args</span><span class="p">)}</span><span class="s2">)`</span><span class="p">,</span>
  <span class="p">});</span>
  <span class="k">if </span><span class="p">(</span><span class="nx">stage1</span> <span class="o">===</span> <span class="dl">"</span><span class="s2">soft_deny</span><span class="dl">"</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">await</span> <span class="nf">sideQuery</span><span class="p">({</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">fast</span><span class="dl">"</span><span class="p">,</span> <span class="na">prompt</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Evaluate further...</span><span class="dl">"</span> <span class="p">});</span>
  <span class="p">}</span>
  <span class="k">return</span> <span class="nx">stage1</span><span class="p">;</span>  <span class="c1">// "allow" or "ask"</span>
<span class="p">}</span>
</code></pre></div></div> <p><strong>Real-world use cases</strong>:</p> <table> <thead> <tr> <th>Use case</th> <th>Main model</th> <th>Side-Query</th> <th>Cost ratio</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>Memory retrieval</td> <td>Opus</td> <td>Haiku</td> <td>~20:1</td> <td>Fast filtering over many memories</td> </tr> <tr> <td>Permission checks</td> <td>Opus</td> <td>Haiku</td> <td>~20:1</td> <td>Semantic judgment that does not need reasoning</td> </tr> <tr> <td>Routing</td> <td>Opus</td> <td>Sonnet</td> <td>~5:1</td> <td>Medium-complexity routing</td> </tr> </tbody> </table> <blockquote> <p><strong>Production Notes</strong>: design the side-query prompt carefully, because it is the most frequent LLM call in the whole system. Keep the prompt format fixed to maximize cache hits, and set a timeout so that a side-query cannot slow down the main conversation.</p> </blockquote> <hr/> <h2 id="pattern-4-coordinator--worker--xml-結果注入">Pattern 4: Coordinator / Worker + XML Result Injection</h2> <p><strong>Problem</strong>: complex tasks need several Agents working in parallel, but shared state introduces race conditions.</p> <p>The solution is the <strong>Coordinator/Worker pattern</strong>: the Coordinator only plans and integrates, while Workers execute asynchronously:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Coordinator (planning)
    │
    ├── Worker A (Research)    ──async──→ &lt;task-notification&gt; XML
    ├── Worker B (Implement)   ──async──→ &lt;task-notification&gt; XML
    └── Worker C (Test)        ──async──→ &lt;task-notification&gt; XML
    │
    └── Coordinator (integrates all results)
</code></pre></div></div> <p><strong>Four phases</strong>:</p> <ol> <li><strong>Research</strong>: gather information and understand the requirements</li> <li><strong>Synthesis</strong>: consolidate findings and decide on an approach</li> <li><strong>Implementation</strong>: carry out the concrete work in parallel</li> <li><strong>Verification</strong>: validate the results and check quality</li> </ol> <p><strong>Result injection mechanism</strong>: when a Worker finishes, its result is injected into the Coordinator's conversation as XML:</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;task-notification&gt;</span>
  <span class="nt">&lt;task-id&gt;</span>worker-a<span class="nt">&lt;/task-id&gt;</span>
  <span class="nt">&lt;status&gt;</span>completed<span class="nt">&lt;/status&gt;</span>
  <span class="nt">&lt;result&gt;</span>Found 3 relevant APIs: ...<span class="nt">&lt;/result&gt;</span>
<span class="nt">&lt;/task-notification&gt;</span>
</code></pre></div></div> <p><strong>Key design points</strong>:</p> <ul> <li>The Coordinator <strong>does not use tools directly</strong>: it has only AgentTool, SendMessage, and TaskStop</li> <li>Workers run in <strong>isolated contexts</strong>: no shared state, so no race conditions</li> <li>XML injection is <strong>append-only</strong>: existing conversation history is never modified</li> </ul> <blockquote> <p><strong>Production Notes</strong>: the heart of this pattern is "no shared state". Of the frameworks we compared yesterday, LangGraph shares state through graph state and CrewAI uses sequential task passing. Coordinator/Worker decouples completely, at the cost of requiring a Coordinator with stronger integration ability. It suits high-concurrency, low-coupling scenarios.</p> </blockquote> <hr/> <h2 id="pattern-6-hook-event-system">Pattern 6: Hook Event System</h2> <p><strong>Problem</strong>: the system needs to be extensible without the core code being modified.</p> <p>The solution is a <strong>highly composable Hook system</strong>: 16 events × 5 Hook types × 7 sources.</p> <h3 id="16-種事件">16 Events</h3> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SessionStart, Setup, UserPromptSubmit,
PreToolUse, PostToolUse, PostToolUseFailure,
PermissionRequest, PermissionDenied,
Stop, FileChanged, WorktreeCreate,
SubagentStart, Notification, Elicitation, CwdChanged
</code></pre></div></div> <h3 id="5-種-hook-類型">5 種 Hook 類型</h3> <table> <thead> <tr> <th>Hook 類型</th> <th>執行方式</th> <th>適用場景</th> </tr> </thead> <tbody> <tr> <td><strong>Command</strong></td> <td>Shell script (exit 0=pass, 2=block)</td> <td>快速檢查、git hooks</td> </tr> <tr> <td><strong>Prompt</strong></td> <td>LLM side-query</td> <td>語意判斷、品質檢查</td> </tr> <tr> <td><strong>Agent</strong></td> <td>生成 subagent 驗證</td> <td>複雜驗證邏輯</td> </tr> <tr> <td><strong>HTTP</strong></td> <td>Remote callback</td> <td>外部審核系統</td> </tr> <tr> <td><strong>Callback</strong></td> <td>JS function (runtime)</td> <td>內部擴展</td> </tr> </tbody> </table> <h3 id="hookmatcher-模式匹配">HookMatcher 模式匹配</h3> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"Bash"              → 攔截所有 Bash 呼叫
"Bash(git *)"       → 只攔截 git 相關指令
"Write(*.env)"      → 攔截寫入 .env 檔案
"Edit(**/*.ts)"     → 攔截編輯 TypeScript 檔案
</code></pre></div></div> <p><strong>來源優先序</strong>（高 → 低）：</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>userSettings &gt; projectSettings &gt; localSettings &gt; policySettings &gt; pluginHook &gt; sessionHook &gt; builtinHook
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong> — Hook 系統是投資報酬率最高的架構元件。一個 <code class="language-plaintext highlighter-rouge">PreToolUse</code> command hook 可以實現：程式碼審查（lint before write）、安全檢查（block dangerous commands）、日誌記錄（audit trail）。建議從 command hook 開始，需要語意判斷時再升級到 prompt hook。</p> </blockquote> <hr/> <h2 id="pattern-8-context-compaction-三層策略">Pattern 8: Context Compaction 三層策略</h2> <p><strong>問題</strong>：長對話耗盡 context window，但簡單截斷會丟失關鍵資訊。</p> <p>解法是 <strong>三層漸進式壓縮</strong>：</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Layer 1: Micro Compact（turn 內壓縮）
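  → Trigger: a single response is too long
  → Compaction: drop redundant tool output, keep a summary

Layer 2: Auto Compact (triggered by a token threshold)
  → Trigger: total tokens &gt; context_window - 13,000
  → Compaction: summarize older history with an LLM, keep recent turns

Layer 3: Manual Compact (triggered by the user)
  → Trigger: the /compact command
  → Compaction: most aggressive; keep only the core context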
</code></pre></div></div> <p><strong>停止條件（Diminishing Returns Detection）</strong>：</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 連續壓縮 3+ 次，且每次只省 &lt; 500 tokens → 停止</span>
<span class="k">if </span><span class="p">(</span><span class="nx">continuations</span> <span class="o">&gt;=</span> <span class="mi">3</span> <span class="o">&amp;&amp;</span> <span class="nx">tokenDelta</span> <span class="o">&lt;</span> <span class="mi">500</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="dl">"</span><span class="s2">compaction_exhausted</span><span class="dl">"</span><span class="p">;</span>
<span class="p">}</span>
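</code></pre></div></div> <p>The Auto Compact trigger and the stop condition can be combined into a runnable Python sketch. The constants come from the text above; the function names and the <code class="language-plaintext highlighter-rouge">savings</code> list (tokens saved per pass) are hypothetical, and treating the rule as "the last three passes each saved fewer than 500 tokens" is one possible reading of it.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RESERVED_TOKENS = 13_000   # headroom quoted above for Auto Compact
MIN_SAVING = 500           # a pass that saves fewer tokens than this is low-yield
MAX_LOW_YIELD_PASSES = 3   # stop after this many low-yield passes in a row


def should_auto_compact(total_tokens, context_window):
    """Layer 2 trigger: total tokens exceed the window minus the reserve."""
    return total_tokens > context_window - RESERVED_TOKENS


def compaction_exhausted(savings):
    """True once the last three passes each saved fewer than MIN_SAVING tokens."""
    recent = savings[-MAX_LOW_YIELD_PASSES:]
    return len(recent) == MAX_LOW_YIELD_PASSES and all(MIN_SAVING > s for s in recent)


print(should_auto_compact(190_000, 200_000))
print(compaction_exhausted([4_000, 450, 300, 120]))
print(compaction_exhausted([4_000, 450, 900, 120]))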
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong> — 這就是我們<a href="/blog/2026/local-agent-swarm/">上一篇</a>討論 task planner 的原因 — 壓縮會丟失 in-progress 的 task 狀態。解法是把 task 持久化到檔案系統（<code class="language-plaintext highlighter-rouge">.claude/tasks/</code>），讓它不受 context 壓縮影響。</p> </blockquote> <hr/> <h2 id="其他值得關注的模式">其他值得關注的模式</h2> <h3 id="pattern-2-immutable-state-store--change-hooks">Pattern 2: Immutable State Store + Change Hooks</h3> <p>單一 immutable store 搭配 reactive change hooks。用 <code class="language-plaintext highlighter-rouge">Object.is</code> 判斷是否真正變更，避免多餘的 side effect。比 event bus 更可預測 — 每個 state change 都有明確的因果鏈。</p> <h3 id="pattern-5-permission-rule-system">Pattern 5: Permission Rule System</h3> <p>每條權限規則記錄 <strong>來源</strong>（user/project/policy/plugin/builtin），支援優先序和 audit trail。Policy settings 可以覆蓋使用者設定，實現企業級管控。</p> <h3 id="pattern-7-deferred-loading">Pattern 7: Deferred Loading</h3> <p>當工具數量超過 prompt 容量時，不把所有定義放進 system prompt。標記 <code class="language-plaintext highlighter-rouge">shouldDefer=true</code> 的工具只在被搜尋時才載入 — 用 <code class="language-plaintext highlighter-rouge">searchHint</code> 關鍵字做 lazy discovery，節省大量 tokens。</p> <hr/> <h2 id="給-agent-系統開發者的建議">給 Agent 系統開發者的建議</h2> <p><strong>優先採用順序</strong>（從投資報酬率排列）：</p> <ol> <li><strong>Hook System</strong> — 最小侵入性，立即獲得可擴展性</li> <li><strong>Tool Pipeline</strong> — 統一管理工具來源，避免 if-else 地獄</li> <li><strong>Context Compaction</strong> — 長對話必備，越早做越好</li> <li><strong>Side-Query</strong> — 成本最佳化的關鍵，production 必須有</li> <li><strong>Coordinator/Worker</strong> — 需要並行處理時再引入</li> </ol> <p><strong>最小可行 Agent 架構</strong>：</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tool Pipeline + Hook System + State Store = 可生產的 Agent
Add Side-Query + Compaction = an Agent that scales
Add Coordinator/Worker = a parallel Agent Swarm
</code></pre></div></div> <hr/> <h2 id="相關連結">相關連結</h2> <ul> <li><strong>昨日文章</strong> — <a href="/blog/2026/local-agent-swarm/">本地 Agent Swarm 框架全解析</a></li> <li><strong>Agent Architecture Reference</strong> — <a href="/docs/architecture/agent-architecture/">本文分析的架構文件來源</a></li> <li><strong>Claude Code</strong> — <a href="https://docs.anthropic.com/en/docs/claude-code">Anthropic 官方 Agent Coding Tool</a></li> <li><strong>LangGraph</strong> — <a href="https://github.com/langchain-ai/langgraph">github.com/langchain-ai/langgraph</a> — 企業級圖工作流引擎</li> <li><strong>CrewAI</strong> — <a href="https://github.com/crewAIInc/crewAI">github.com/crewAIInc/crewAI</a> — 角色分工框架</li> </ul> <hr/> <blockquote> <p>Source: <a href="https://github.com/osisdie/osisdie.github.io">osisdie/osisdie.github.io</a> — PRs and Issues welcome!</p> </blockquote>]]></content><author><name></name></author><category term="claude-code"/><category term="agent-swarm"/><category term="architecture"/><category term="automation"/><category term="llm"/><summary type="html"><![CDATA[從 1,902 個 TypeScript 檔案中提煉出 8 個可直接採用的 Agent 架構模式 — Tool Pipeline、Side-Query、Coordinator/Worker、Hook System、Context Compaction]]></summary></entry><entry><title type="html">本地 Agent Swarm 框架全解析：從架構比較到簡單實作</title><link href="https://osisdie.github.io/blog/2026/local-agent-swarm/" rel="alternate" type="text/html" title="本地 Agent Swarm 框架全解析：從架構比較到簡單實作"/><published>2026-03-31T02:00:00+00:00</published><updated>2026-03-31T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/local-agent-swarm</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/local-agent-swarm/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/local-agent-swarm/local-agent-swarm-overview-480.webp 480w,/assets/img/blog/2026/local-agent-swarm/local-agent-swarm-overview-800.webp 800w,/assets/img/blog/2026/local-agent-swarm/local-agent-swarm-overview-1400.webp 1400w," type="image/webp" 
sizes="95vw"/> <img src="/assets/img/blog/2026/local-agent-swarm/local-agent-swarm-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="Local Agent Swarm Frameworks Overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">本地 Agent Swarm 框架架構總覽</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong> — This post surveys the mainstream local agent swarm frameworks in 2026: CrewAI (role-based crews), AutoGen/AG2 (actor-model conversations), LangGraph (graph-based state machines), and smolagents (code-first minimal agents). We compare their architectures, learning curves, and trade-offs, then implement a minimal 2-agent swarm using Hugging Face’s smolagents to demonstrate how lightweight multi-agent orchestration can be.</p> </blockquote> <p><strong>Multi-Agent 協作</strong>已經從研究論文走進生產環境。當你的 LLM 應用需要不同角色分工——一個搜資料、一個寫摘要、一個檢查品質——你需要一個 <strong>Agent Swarm</strong> 框架來協調它們。</p> <p>但框架那麼多，哪個適合你？本文從架構本質出發，幫你做出選擇。</p> <hr/> <h2 id="為什麼要用本地-agent-swarm">為什麼要用本地 Agent Swarm？</h2> <p>三個核心理由：</p> <ol> <li><strong>隱私與合規</strong> — 敏感資料不出內網，適合金融、醫療場景</li> <li><strong>成本控制</strong> — 用本地模型（Ollama、vLLM）取代 API 調用，長期成本降 10 倍以上</li> <li><strong>延遲可控</strong> — 內網通訊 &lt; 1ms vs API 調用 200-500ms</li> </ol> <blockquote> <p><strong>Production Notes</strong> — 即使用本地模型，你仍然可以在開發階段用雲端 API 快速迭代，部署時再切換到本地推理。大部分框架都支援這種混合模式。</p> </blockquote> <hr/> <h2 id="主流框架比較">主流框架比較</h2> <table> <thead> <tr> <th>框架</th> <th>Stars</th> <th>架構模式</th> <th>代表企業</th> <th>下載量/月</th> <th>學習曲線</th> </tr> </thead> <tbody> <tr> <td><strong>LangGraph</strong></td> <td>~28k</td> <td>圖狀態機（Nodes + Edges）</td> <td>LinkedIn, Uber, Klarna</td> <td>38.5M</td> <td>中等</td> </tr> <tr> <td><strong>CrewAI</strong></td> <td>~46k</td> <td>角色分工（Role + Goal）</td> <td>Novo Nordisk, Oracle</td> <td>5.2M</td> <td>簡單</td> </tr> <tr> <td><strong>AutoGen/AG2</strong></td> <td>~57k</td> <td>Actor 模型 / 對話驅動</td> <td>⚠ 
維護模式</td> <td>—</td> <td>困難</td> </tr> <tr> <td><strong>smolagents</strong></td> <td>~26k</td> <td>Code-first 極簡</td> <td>早期階段</td> <td>—</td> <td>簡單</td> </tr> </tbody> </table> <blockquote> <p><strong>補充框架</strong> — <strong>MetaGPT</strong>（~64k stars）以 SOP 模擬軟體公司運作，適合程式碼生成場景但不適用通用 Agent 協作。<strong>OpenAI Agents SDK</strong>（取代已封存的 Swarm）由 HP、Intuit、Oracle 等企業採用，但綁定 OpenAI API。</p> </blockquote> <h3 id="crewai--最直覺的角色扮演">CrewAI — 最直覺的角色扮演</h3> <p>CrewAI 的核心概念是「團隊」：每個 Agent 有 <strong>角色</strong>、<strong>目標</strong> 和 <strong>背景故事</strong>，被分配到 <strong>任務</strong>，然後組成 <strong>Crew</strong> 執行。</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">crewai</span> <span class="kn">import</span> <span class="n">Agent</span><span class="p">,</span> <span class="n">Task</span><span class="p">,</span> <span class="n">Crew</span>

<span class="n">researcher</span> <span class="o">=</span> <span class="nc">Agent</span><span class="p">(</span>
    <span class="n">role</span><span class="o">=</span><span class="sh">"</span><span class="s">Research Analyst</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">goal</span><span class="o">=</span><span class="sh">"</span><span class="s">Find the latest trends in AI agent frameworks</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">backstory</span><span class="o">=</span><span class="sh">"</span><span class="s">You are a senior tech analyst...</span><span class="sh">"</span>
<span class="p">)</span>

<span class="n">writer</span> <span class="o">=</span> <span class="nc">Agent</span><span class="p">(</span>
    <span class="n">role</span><span class="o">=</span><span class="sh">"</span><span class="s">Content Writer</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">goal</span><span class="o">=</span><span class="sh">"</span><span class="s">Write a concise summary from research findings</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">backstory</span><span class="o">=</span><span class="sh">"</span><span class="s">You are a technical blogger...</span><span class="sh">"</span>
<span class="p">)</span>

<span class="n">research_task</span> <span class="o">=</span> <span class="nc">Task</span><span class="p">(</span><span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">Research top 5 agent frameworks</span><span class="sh">"</span><span class="p">,</span> <span class="n">agent</span><span class="o">=</span><span class="n">researcher</span><span class="p">)</span>
<span class="n">write_task</span> <span class="o">=</span> <span class="nc">Task</span><span class="p">(</span><span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">Write a summary article</span><span class="sh">"</span><span class="p">,</span> <span class="n">agent</span><span class="o">=</span><span class="n">writer</span><span class="p">)</span>

<span class="n">crew</span> <span class="o">=</span> <span class="nc">Crew</span><span class="p">(</span><span class="n">agents</span><span class="o">=</span><span class="p">[</span><span class="n">researcher</span><span class="p">,</span> <span class="n">writer</span><span class="p">],</span> <span class="n">tasks</span><span class="o">=</span><span class="p">[</span><span class="n">research_task</span><span class="p">,</span> <span class="n">write_task</span><span class="p">])</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">crew</span><span class="p">.</span><span class="nf">kickoff</span><span class="p">()</span>
</code></pre></div></div> <ul> <li><strong>優點</strong>：上手最快，概念清晰，社群活躍（成長最快的框架）</li> <li><strong>缺點</strong>：複雜工作流的控制力有限</li> </ul> <h3 id="autogen--曾經的明星現已進入維護模式">AutoGen — 曾經的明星，現已進入維護模式</h3> <p>Microsoft 的 AutoGen 在 v0.4 做了完全重寫，採用 <strong>Actor 模型</strong>。但 <strong>2025 年 10 月起已進入維護模式</strong>，Microsoft 將其與 Semantic Kernel 合併為統一的 Microsoft Agent Framework。原始創作者（Chi Wang、Qingyun Wu）離開 Microsoft，建立了社群驅動的 <strong>AG2</strong> fork。</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">autogen_agentchat.agents</span> <span class="kn">import</span> <span class="n">AssistantAgent</span>
<span class="kn">from</span> <span class="n">autogen_agentchat.teams</span> <span class="kn">import</span> <span class="n">RoundRobinGroupChat</span>

<span class="n">researcher</span> <span class="o">=</span> <span class="nc">AssistantAgent</span><span class="p">(</span><span class="sh">"</span><span class="s">researcher</span><span class="sh">"</span><span class="p">,</span> <span class="n">model_client</span><span class="o">=</span><span class="n">model_client</span><span class="p">)</span>
<span class="n">writer</span> <span class="o">=</span> <span class="nc">AssistantAgent</span><span class="p">(</span><span class="sh">"</span><span class="s">writer</span><span class="sh">"</span><span class="p">,</span> <span class="n">model_client</span><span class="o">=</span><span class="n">model_client</span><span class="p">)</span>

<span class="n">team</span> <span class="o">=</span> <span class="nc">RoundRobinGroupChat</span><span class="p">([</span><span class="n">researcher</span><span class="p">,</span> <span class="n">writer</span><span class="p">],</span> <span class="n">max_turns</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">team</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="n">task</span><span class="o">=</span><span class="sh">"</span><span class="s">Research and summarize AI trends</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div> <ul> <li><strong>優點</strong>：Actor 模型架構設計優秀，可分散式部署</li> <li><strong>缺點</strong>：已停止新功能開發，v0.2 → v0.4 不相容，社群分裂為 AG2 fork</li> <li><strong>⚠ 注意</strong>：如果你現在才要選框架，不建議新專案採用 AutoGen</li> </ul> <h3 id="langgraph--企業級生產首選">LangGraph — 企業級生產首選</h3> <p>LangGraph 用有向圖來定義 Agent 之間的流轉邏輯。每個節點是一個處理步驟，邊決定下一步走向。它是目前<strong>企業生產環境採用率最高</strong>的多 Agent 框架：</p> <ul> <li><strong>LinkedIn</strong> — AI 招募助手，自動化候選人配對</li> <li><strong>Uber</strong> — 服務 5,000 名工程師，節省 21,000+ 開發小時</li> <li><strong>Klarna</strong> — 客服 AI 處理 8,500 萬用戶，回覆時間縮短 80%</li> </ul> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">langgraph.graph</span> <span class="kn">import</span> <span class="n">StateGraph</span>
<span class="kn">from</span> <span class="n">langgraph.checkpoint.memory</span> <span class="kn">import</span> <span class="n">MemorySaver</span>

<span class="n">graph</span> <span class="o">=</span> <span class="nc">StateGraph</span><span class="p">(</span><span class="n">AgentState</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="nf">add_node</span><span class="p">(</span><span class="sh">"</span><span class="s">researcher</span><span class="sh">"</span><span class="p">,</span> <span class="n">research_node</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="nf">add_node</span><span class="p">(</span><span class="sh">"</span><span class="s">writer</span><span class="sh">"</span><span class="p">,</span> <span class="n">writer_node</span><span class="p">)</span>
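<span class="n">graph</span><span class="p">.</span><span class="nf">set_entry_point</span><span class="p">(</span><span class="sh">"</span><span class="s">researcher</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># the graph needs an entry point before it can compile</span>
<span class="n">graph</span><span class="p">.</span><span class="nf">add_edge</span><span class="p">(</span><span class="sh">"</span><span class="s">researcher</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">writer</span><span class="sh">"</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="nf">set_finish_point</span><span class="p">(</span><span class="sh">"</span><span class="s">writer</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># the run ends after the writer node</span>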

<span class="n">app</span> <span class="o">=</span> <span class="n">graph</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="n">checkpointer</span><span class="o">=</span><span class="nc">MemorySaver</span><span class="p">())</span>
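<span class="n">config</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">configurable</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">thread_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">demo-1</span><span class="sh">"</span><span class="p">}}</span>  <span class="c1"># a checkpointer requires a thread_id</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">app</span><span class="p">.</span><span class="nf">invoke</span><span class="p">({</span><span class="sh">"</span><span class="s">task</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Research AI trends</span><span class="sh">"</span><span class="p">},</span> <span class="n">config</span><span class="o">=</span><span class="n">config</span><span class="p">)</span>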
</code></pre></div></div> <ul> <li><strong>優點</strong>：工作流可視化，checkpoint + human-in-the-loop，企業實戰驗證最多</li> <li><strong>缺點</strong>：需要理解圖資料結構，boilerplate 較多</li> </ul> <h3 id="smolagents--code-first-極簡主義">smolagents — Code-first 極簡主義</h3> <p>Hugging Face 的 smolagents 核心只有 ~1000 行程式碼。Agent 直接寫 Python code 來呼叫工具，不用 JSON schema。</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">smolagents</span> <span class="kn">import</span> <span class="n">CodeAgent</span><span class="p">,</span> <span class="n">HfApiModel</span><span class="p">,</span> <span class="n">DuckDuckGoSearchTool</span>

<span class="n">model</span> <span class="o">=</span> <span class="nc">HfApiModel</span><span class="p">(</span><span class="sh">"</span><span class="s">Qwen/Qwen2.5-Coder-32B-Instruct</span><span class="sh">"</span><span class="p">)</span>
<span class="n">agent</span> <span class="o">=</span> <span class="nc">CodeAgent</span><span class="p">(</span><span class="n">tools</span><span class="o">=</span><span class="p">[</span><span class="nc">DuckDuckGoSearchTool</span><span class="p">()],</span> <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="sh">"</span><span class="s">What are the top AI agent frameworks in 2026?</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div> <ul> <li><strong>優點</strong>：最輕量，支援本地 HF 模型，code-first 比 JSON 更靈活</li> <li><strong>缺點</strong>：多 Agent 協作功能較新，生態系較小，尚無知名企業採用案例</li> </ul> <blockquote> <p><strong>Production Notes</strong> — 如果你只是需要 <strong>單一 Agent + 工具呼叫</strong>，smolagents 是最佳起點。需要 <strong>多角色協作</strong> 用 CrewAI。需要 <strong>複雜工作流 + checkpoint + 企業級生產</strong> 用 LangGraph。</p> </blockquote> <hr/> <h2 id="簡單實作smolagents-雙-agent-協作">簡單實作：smolagents 雙 Agent 協作</h2> <p>選擇 smolagents 是因為它最輕量、不依賴特定 API provider、且支援本地模型。</p> <h3 id="安裝">安裝</h3> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>smolagents[litellm] duckduckgo-search
</code></pre></div></div> <h3 id="程式碼">程式碼</h3> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">
minimal_swarm.py - a minimal two-agent collaboration example
Agent A (Manager): coordinates and delegates tasks
Agent B (WebSearch): searches the web for information
</span><span class="sh">"""</span>
<span class="kn">from</span> <span class="n">smolagents</span> <span class="kn">import</span> <span class="n">CodeAgent</span><span class="p">,</span> <span class="n">LiteLLMModel</span><span class="p">,</span> <span class="n">DuckDuckGoSearchTool</span><span class="p">,</span> <span class="n">tool</span>

<span class="nd">@tool</span>
<span class="k">def</span> <span class="nf">summarize_text</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Summarize the given text into 3 bullet points.

    Args:
        text: The text to summarize.
    </span><span class="sh">"""</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Summary of: </span><span class="si">{</span><span class="n">text</span><span class="p">[</span><span class="si">:</span><span class="mi">100</span><span class="p">]</span><span class="si">}</span><span class="s">...</span><span class="sh">"</span>

<span class="c1"># LiteLLM lets the same code target any LLM provider
</span><span class="n">model</span> <span class="o">=</span> <span class="nc">LiteLLMModel</span><span class="p">(</span><span class="n">model_id</span><span class="o">=</span><span class="sh">"</span><span class="s">gpt-4o-mini</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># or ollama/llama3.2
</span>
<span class="c1"># Web Search Agent
</span><span class="n">web_agent</span> <span class="o">=</span> <span class="nc">CodeAgent</span><span class="p">(</span>
    <span class="n">tools</span><span class="o">=</span><span class="p">[</span><span class="nc">DuckDuckGoSearchTool</span><span class="p">()],</span>
    <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
    <span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">web_search_agent</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">Searches the web for information on a given topic</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="c1"># Manager Agent (orchestrates the web agent)
</span><span class="n">manager</span> <span class="o">=</span> <span class="nc">CodeAgent</span><span class="p">(</span>
    <span class="n">tools</span><span class="o">=</span><span class="p">[</span><span class="n">summarize_text</span><span class="p">],</span>
    <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
    <span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">manager</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">managed_agents</span><span class="o">=</span><span class="p">[</span><span class="n">web_agent</span><span class="p">],</span>
<span class="p">)</span>

<span class="c1"># Run the manager
</span><span class="n">result</span> <span class="o">=</span> <span class="n">manager</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span>
    <span class="sh">"</span><span class="s">Search for the top 3 local AI agent frameworks in 2026, </span><span class="sh">"</span>
    <span class="sh">"</span><span class="s">and give me a brief comparison.</span><span class="sh">"</span>
<span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</code></pre></div></div> <h3 id="執行結果">執行結果</h3> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python minimal_swarm.py

╭─ Manager Agent ──────────────────────────────────────╮
│ I'll delegate the web search to my web_search_agent. │
╰──────────────────────────────────────────────────────╯
╭─ web_search_agent ───────────────────────────────────╮
│ Searching: "top local AI agent frameworks 2026"      │
│ Found 5 results...                                   │
╰──────────────────────────────────────────────────────╯
╭─ Manager Agent ──────────────────────────────────────╮
│ Based on web_search_agent's findings:                │
│                                                      │
│ 1. LangGraph (~28k stars) - Enterprise production    │
│ 2. CrewAI (~46k stars) - Role-based, easiest setup   │
│ 3. smolagents (~26k stars) - Code-first, minimal     │
│                                                      │
│ For quick prototyping: CrewAI or smolagents          │
│ For production at scale: LangGraph                   │
╰──────────────────────────────────────────────────────╯
</code></pre></div></div> <blockquote> <p>以上為簡化的示意輸出，實際執行結果會因模型和搜尋結果而異。</p> </blockquote> <blockquote> <p><strong>Production Notes</strong> — <code class="language-plaintext highlighter-rouge">LiteLLMModel</code> 讓你用同一份程式碼切換任意 LLM：<code class="language-plaintext highlighter-rouge">gpt-4o-mini</code>（雲端）、<code class="language-plaintext highlighter-rouge">ollama/llama3.2</code>（本地）、或 <code class="language-plaintext highlighter-rouge">anthropic/...</code>（其他 provider）。部署時只改 <code class="language-plaintext highlighter-rouge">model_id</code> 即可。</p> </blockquote> <hr/> <h2 id="框架選型決策樹">框架選型決策樹</h2> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>你的需求是什麼？
│
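├── Only a single Agent + tools → smolagents
│
├── Multi-role collaboration
│   ├── Simple sequential/parallel execution → CrewAI
│   └── Complex conditional branches/loops → LangGraph
│
└── Enterprise production deployment → LangGraph (validated by LinkedIn, Uber, Klarna)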
</code></pre></div></div> <hr/> <h2 id="總結">總結</h2> <table> <thead> <tr> <th>如果你是…</th> <th>推薦</th> <th>理由</th> </tr> </thead> <tbody> <tr> <td>剛接觸 Agent 的開發者</td> <td><strong>smolagents</strong></td> <td>最少 boilerplate，10 行就能跑</td> </tr> <tr> <td>需要快速建立 Agent 團隊</td> <td><strong>CrewAI</strong></td> <td>角色概念直覺，社群資源豐富</td> </tr> <tr> <td>建構複雜工作流</td> <td><strong>LangGraph</strong></td> <td>圖模型 + checkpoint + human-in-the-loop</td> </tr> <tr> <td>企業級生產部署</td> <td><strong>LangGraph</strong></td> <td>LinkedIn, Uber, Klarna 驗證，38.5M 月下載</td> </tr> </tbody> </table> <blockquote> <p><strong>⚠ AutoGen 已不建議新專案採用</strong> — 自 2025 年 10 月起進入維護模式，Microsoft 已將其合併至 Microsoft Agent Framework。</p> </blockquote> <p>Agent Swarm 的未來趨勢是 <strong>更輕量的核心 + 更強的互操作性</strong>。smolagents 的 ~1000 行核心證明了一個好的 Agent 框架不需要很複雜。市場正在向 <strong>圖式工作流</strong>（LangGraph 領先）收斂，CrewAI 也在積極整合 LangChain 生態。</p> <hr/> <h2 id="相關連結">相關連結</h2> <ul> <li><strong>CrewAI</strong> — <a href="https://github.com/crewAIInc/crewAI">github.com/crewAIInc/crewAI</a></li> <li><strong>AutoGen</strong> — <a href="https://github.com/microsoft/autogen">github.com/microsoft/autogen</a></li> <li><strong>AG2 (AutoGen fork)</strong> — <a href="https://github.com/ag2ai/ag2">github.com/ag2ai/ag2</a></li> <li><strong>LangGraph</strong> — <a href="https://github.com/langchain-ai/langgraph">github.com/langchain-ai/langgraph</a></li> <li><strong>smolagents</strong> — <a href="https://github.com/huggingface/smolagents">github.com/huggingface/smolagents</a></li> <li><strong>OpenAI Agents SDK</strong> — <a href="https://github.com/openai/openai-agents-python">github.com/openai/openai-agents-python</a></li> <li><strong>MetaGPT</strong> — <a href="https://github.com/FoundationAgents/MetaGPT">github.com/FoundationAgents/MetaGPT</a></li> <li><strong>企業採用數據</strong> — <a href="https://blog.langchain.com/is-langgraph-used-in-production/">Is LangGraph Used In Production?</a></li> <li><strong>AutoGen 維護模式</strong> — <a 
href="https://venturebeat.com/ai/microsoft-retires-autogen-and-debuts-agent-framework-to-unify-and-govern">Microsoft retires AutoGen</a></li> </ul> <hr/> <blockquote> <p>Source: <a href="https://github.com/osisdie/osisdie.github.io">osisdie/osisdie.github.io</a> — PRs and Issues welcome!</p> </blockquote>]]></content><author><name></name></author><category term="agent-swarm"/><category term="multi-agent"/><category term="orchestration"/><category term="llm"/><category term="automation"/><category term="python"/><summary type="html"><![CDATA[比較主流本地 Agent Swarm 框架（CrewAI、AutoGen、LangGraph、smolagents），並用 smolagents 實作一個最小化的雙 Agent 協作範例]]></summary></entry><entry><title type="html">IoT 百萬設備架構選型 Part 3：運維、成本與可靠性</title><link href="https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part3/" rel="alternate" type="text/html" title="IoT 百萬設備架構選型 Part 3：運維、成本與可靠性"/><published>2026-03-30T02:02:00+00:00</published><updated>2026-03-30T02:02:00+00:00</updated><id>https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part3</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part3/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/iot-architecture/iot-architecture-overview-480.webp 480w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-800.webp 800w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/iot-architecture/iot-architecture-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="IoT 1M Device Architecture Overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Phase 1 核心架構：EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong> — Part 3 of 3. 
Operations: three-layer <strong>rate limiting</strong> (EMQX → Rule Engine → App), content-based <strong>dedup</strong>, anomaly detection. Edge resilience: <strong>exponential backoff + jitter</strong>, offline buffering (RAM/SQLite/MQTT Session Expiry). Server HA with RPO/RTO per component. <strong>OpenTelemetry</strong> end-to-end tracing. Multi-region DR (Active-Passive). Team onboarding risk and phased rollout.</p> </blockquote> <blockquote> <table> <tbody> <tr> <td><strong>系列文章：</strong> <a href="/blog/2026/iot-1m-device-architecture/">Part 1 核心架構</a></td> <td><a href="/blog/2026/iot-1m-device-architecture-part2/">Part 2 安全與多租戶</a></td> <td>Part 3 運維與可靠性（本篇）</td> </tr> </tbody> </table> </blockquote> <hr/> <h2 id="rate-limiting--dedup">Rate Limiting + Dedup</h2> <p>設備異常（firmware bug、sensor malfunction）可能瞬間灌入大量資料：</p> <pre><code class="language-mermaid">flowchart LR
    D[Faulty Device] --&gt;|10 msg/s limit| E[EMQX]
    E --&gt;|SQL filter| T[TimescaleDB]
    T --&gt;|circuit breaker| A[FastAPI]
    E --&gt;|exceed 10x| X1[Disconnect]
</code></pre> <table> <thead> <tr> <th>層</th> <th>限制</th> <th>超限動作</th> <th>說明</th> </tr> </thead> <tbody> <tr> <td><strong>EMQX</strong></td> <td>10 msg/s, 50 KB/s</td> <td>Throttle → disconnect</td> <td>第一道防線，per-client</td> </tr> <tr> <td><strong>Rule Engine</strong></td> <td>SQL filter + dedup</td> <td>丟棄不符條件</td> <td>基本過濾，無需代碼</td> </tr> <tr> <td><strong>FastAPI</strong></td> <td>Per-tenant rate limit</td> <td>Alert + reject</td> <td>業務邏輯層防護</td> </tr> </tbody> </table> <h3 id="dedup-策略">Dedup 策略</h3> <table> <thead> <tr> <th>層</th> <th>策略</th> </tr> </thead> <tbody> <tr> <td>MQTT Broker</td> <td>Packet ID tracking</td> </tr> <tr> <td>Rule Engine</td> <td>SQL WHERE + timestamp 比對</td> </tr> <tr> <td>Application</td> <td><code class="language-plaintext highlighter-rouge">(device_id, timestamp, hash)</code></td> </tr> <tr> <td>Database</td> <td><code class="language-plaintext highlighter-rouge">ON CONFLICT DO NOTHING</code></td> </tr> </tbody> </table> <h3 id="異常偵測">異常偵測</h3> <table> <thead> <tr> <th>類型</th> <th>偵測</th> <th>處理</th> </tr> </thead> <tbody> <tr> <td>超頻上報</td> <td>Rate &gt; 10x</td> <td>Broker throttle</td> </tr> <tr> <td>範圍異常</td> <td>超 physical range</td> <td>丟棄 + 告警</td> </tr> <tr> <td>時序異常</td> <td>偏差 &gt; 5min</td> <td>標記 suspect</td> </tr> <tr> <td>靜默設備</td> <td>&gt; 3x 正常間隔</td> <td>LWT → offline 告警</td> </tr> </tbody> </table> <hr/> <h2 id="edge-resilience">Edge Resilience</h2> <h3 id="reconnect--offline-buffer">Reconnect + Offline Buffer</h3> <pre><code class="language-mermaid">stateDiagram-v2
    [*] --&gt; Connected
    Connected --&gt; Disconnected: Network down
    Disconnected --&gt; Retry1s: Retry 1s
    Retry1s --&gt; Connected: OK
    Retry1s --&gt; Retry2s: Fail
    Retry2s --&gt; RetryMax: Backoff + jitter
    RetryMax --&gt; Connected: OK
    note right of Disconnected: Write to local buffer
    note right of Connected: Drain buffer on reconnect
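</code></pre> <p>The retry transitions in the state diagram above could be implemented as exponential backoff with full jitter. This is a minimal Python sketch under stated assumptions: the function name <code class="language-plaintext highlighter-rouge">backoff_delays</code>, the 1 s base, and the 300 s cap are hypothetical (the cap mirrors the 0-5 min jitter window this post mentions), and a real device would sleep for each delay between reconnect attempts.</p> <pre><code class="language-python">import random


def backoff_delays(attempts, base=1.0, cap=300.0, rng=random.random):
    """Yield one randomized delay (in seconds) per reconnect attempt.

    The upper bound doubles on every attempt (1s, 2s, 4s, ...) up to `cap`.
    Full jitter then picks a uniform value in [0, bound) so that a fleet of
    devices does not reconnect in lockstep.
    """
    for attempt in range(attempts):
        bound = min(cap, base * 2 ** attempt)
        yield rng() * bound


# Example: print a retry schedule for five failed attempts.
for n, delay in enumerate(backoff_delays(5), start=1):
    print(f"attempt {n}: wait {delay:.2f}s")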
</code></pre> <table> <thead> <tr> <th>Buffer</th> <th>Capacity</th> <th>Durability</th> <th>Used on</th> </tr> </thead> <tbody> <tr> <td>Ring buffer (RAM)</td> <td>1-10K msg</td> <td>Lost on power loss</td> <td>MCU</td> </tr> <tr> <td>SQLite on flash</td> <td>100K+ msg</td> <td>Persistent</td> <td>Gateway</td> </tr> <tr> <td>MQTT v5 Session Expiry</td> <td>Broker-side</td> <td>While the broker is up</td> <td>All</td> </tr> </tbody> </table> <p>QoS 1: PUBLISH → wait for PUBACK → 5 s timeout → mark pending → resend after reconnect. Combined with application-level dedup, a resent message is not stored twice.</p> <h3 id="thundering-herd">Thundering Herd</h3> <p>When the broker recovers, 1M devices try to reconnect at once. Device-side jitter spreads the reconnects over 0-5 min, and EMQX <code class="language-plaintext highlighter-rouge">max_conn_rate=10000/s</code> caps admission, so 1M devices at 10,000 connections per second recover in an orderly ~100 s.</p> <hr/> <h2 id="server-side-ha">Server-Side HA</h2> <table> <thead> <tr> <th>Component</th> <th>HA strategy</th> <th>RPO</th> <th>RTO</th> </tr> </thead> <tbody> <tr> <td>EMQX</td> <td>3-5 node RAFT</td> <td>0</td> <td>&lt;30s</td> </tr> <tr> <td>TimescaleDB</td> <td>Patroni + streaming replication</td> <td>~0</td> <td>&lt;30s</td> </tr> <tr> <td>ClickHouse</td> <td>ReplicatedMergeTree</td> <td>~0</td> <td>&lt;60s</td> </tr> <tr> <td>FastAPI</td> <td>K8s 3+ replicas</td> <td>n/a</td> <td>&lt;5s</td> </tr> </tbody> </table> <h3 id="multi-region-dr">Multi-Region DR</h3> <table> <thead> <tr> <th>Layer</th> <th>AWS</th> <th>GCP</th> </tr> </thead> <tbody> <tr> <td>MQTT</td> <td>EMQX cluster linking</td> <td>Cross-zone</td> </tr> <tr> <td>DB</td> <td>RDS cross-region replica</td> <td>Cloud SQL cross-region</td> </tr> <tr> <td>Cold</td> <td>S3 CRR</td> <td>GCS Dual-Region</td> </tr> <tr> <td>DNS</td> <td>Route 53 failover</td> <td>Cloud DNS routing</td> </tr> </tbody> </table> <p><strong>Active-Passive:</strong> The primary region serves traffic while the secondary holds replicas; DNS failover gives RPO ~minutes and RTO ~5-10 min.</p> <hr/> <h2 id="observability">Observability</h2> <p><strong>Build it in on Day 1</strong>, not as an afterthought.</p> <pre><code class="language-mermaid">flowchart TD
    E[EMQX] --&gt;|metrics| P[Prometheus]
    A[FastAPI] --&gt;|traces| OT[OpenTelemetry]
    BF[BFF] --&gt;|traces| OT
    T[TimescaleDB] --&gt;|metrics| P
    OT --&gt; J[Jaeger]
    OT --&gt; P
    OT --&gt; L[Loki]
    P --&gt; G[Grafana]
    J --&gt; G
    L --&gt; G
</code></pre> <table> <thead> <tr> <th>Component</th> <th>What to monitor</th> <th>Alert threshold</th> </tr> </thead> <tbody> <tr> <td>EMQX</td> <td>Connections, msg rate, Rule Engine</td> <td>&gt; 900K, deny &gt; 1%</td> </tr> <tr> <td>TimescaleDB</td> <td>Write throughput, disk</td> <td>&lt; 80K/s, &gt; 80%</td> </tr> <tr> <td>FastAPI</td> <td>Latency, error rate</td> <td>P99 &gt; 200ms</td> </tr> <tr> <td>BFF</td> <td>WS connections</td> <td>&gt; 10K</td> </tr> </tbody> </table> <p>Every telemetry message carries a <code class="language-plaintext highlighter-rouge">trace_id</code> (injected by the Rule Engine), so Jaeger can show the full device → Dashboard path in a single lookup.</p> <hr/> <h2 id="雲端-vs-地端">Cloud vs On-Premises</h2> <table> <thead> <tr> <th>Service</th> <th>AWS</th> <th>GCP</th> <th>On-prem</th> </tr> </thead> <tbody> <tr> <td>Broker</td> <td>EMQX Cloud</td> <td>EMQX Cloud</td> <td>EMQX on K8s</td> </tr> <tr> <td>K8s</td> <td>EKS</td> <td>GKE Autopilot</td> <td>K3s</td> </tr> <tr> <td>Hot DB</td> <td>Timescale Cloud</td> <td>Timescale Cloud</td> <td>VM</td> </tr> <tr> <td>Warm DB</td> <td>ClickHouse Cloud</td> <td>ClickHouse Cloud</td> <td>K8s</td> </tr> <tr> <td>Cold</td> <td>S3</td> <td>GCS</td> <td>MinIO</td> </tr> <tr> <td>Cold queries</td> <td>Athena</td> <td>BigQuery</td> <td>DuckDB</td> </tr> <tr> <td>Monitoring</td> <td>CloudWatch</td> <td>Cloud Monitoring</td> <td>Grafana</td> </tr> <tr> <td>Monthly cost</td> <td>~$17-33K</td> <td>~$17-33K</td> <td>~$8-15K + ops</td> </tr> </tbody> </table> <p>GKE Autopilot is easier to get started with than EKS. BigQuery's per-scan pricing tends to be more cost-effective for IoT analytics.</p> <hr/> <h2 id="成本估算">Cost Estimate</h2> <h3 id="1m-設備月費雲端-managed">Monthly Cost for 1M Devices (Cloud Managed)</h3> <table> <thead> <tr> <th>Component</th> <th>AWS monthly</th> <th>GCP monthly</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>EMQX Cloud (3 node)</td> <td>~$8-15K</td> <td>~$8-15K</td> <td>MQTT Broker</td> </tr> <tr> <td>TimescaleDB</td> <td>~$3-5K</td> <td>~$3-5K</td> <td>Hot 7d + Continuous Agg</td> </tr> <tr> <td>ClickHouse Cloud</td> <td>~$2-4K</td> <td>~$2-4K</td> <td>Warm 30-90d analytics</td> </tr> <tr> <td>S3/GCS (~50 TB)</td> <td>~$1-2K</td> <td>~$1-2K</td> <td>Cold long-term archive</td> 
</tr> <tr> <td>K8s (Backend+BFF)</td> <td>~$2-4K</td> <td>~$2-4K</td> <td>3-5 nodes</td> </tr> <tr> <td>Observability</td> <td>~$1-3K</td> <td>~$1-3K</td> <td>Grafana + OTel</td> </tr> <tr> <td><strong>Total</strong></td> <td><strong>~$17-33K</strong></td> <td><strong>~$17-33K</strong></td> <td> </td> </tr> <tr> <td>+ Redpanda (&gt;1M)</td> <td>+$5-10K</td> <td>+$5-10K</td> <td>Added when scaling out</td> </tr> </tbody> </table> <h3 id="不同規模">By Scale</h3> <table> <thead> <tr> <th>Scale</th> <th>Architecture</th> <th>Monthly cost</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>&lt; 100K</td> <td>EMQX + TimescaleDB + FastAPI + BFF</td> <td>~$3-8K</td> <td>2-3 person team</td> </tr> <tr> <td>100K-1M</td> <td>+ ClickHouse + S3</td> <td>~$10-20K</td> <td>Core architecture of this post</td> </tr> <tr> <td>&gt; 1M</td> <td>+ Redpanda + FastStream + DR</td> <td>~$30-60K</td> <td>Event streaming</td> </tr> </tbody> </table> <p><strong>Recommendation:</strong> Validate with a 100K-device PoC first (roughly 1/5 of the cost), then decide between managed and self-hosted once you have measured numbers.</p> <hr/> <h2 id="團隊與交付風險">Team and Delivery Risks</h2> <table> <thead> <tr> <th>Risk</th> <th>Mitigation</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>Operating 4 core systems</td> <td>OTel Day 1 + Grafana</td> <td>A unified dashboard reduces cognitive load</td> </tr> <tr> <td>New hires need 2-3 months to ramp up</td> <td>Start with the minimal version</td> <td>Add components gradually; avoid rolling out everything at once</td> </tr> <tr> <td>Multi-tenant ACL misconfiguration</td> <td>Unit test + staging</td> <td>A misconfiguration is a security incident</td> </tr> <tr> <td>Cost overruns</td> <td>100K-device PoC</td> <td>Decide managed vs self-hosted after measuring</td> </tr> </tbody> </table> <p><strong>Rollout order:</strong></p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Phase 1 (Month 1-2): EMQX + TimescaleDB + FastAPI + BFF + OTel
Phase 2 (Month 3-4): + ClickHouse + S3
Phase 3 (Month 5-6): + Multi-tenant + DR
Phase 4 (&gt;1M):       + Redpanda + FastStream
</code></pre></div></div> <hr/> <h2 id="後續考慮">Future Considerations</h2> <ul> <li><strong>OTA firmware update pipeline</strong></li> <li><strong>Edge computing / gateway aggregation</strong></li> <li><strong>Active-Active geo-replication</strong> (EMQX cluster linking + CRDT)</li> </ul> <hr/> <h2 id="系列連結">Series Links</h2> <ul> <li><a href="/blog/2026/iot-1m-device-architecture/">Part 1: Core Architecture</a> - EMQX + TimescaleDB + FastAPI + BFF, cost estimate</li> <li><a href="/blog/2026/iot-1m-device-architecture-part2/">Part 2: Security and Multi-Tenancy</a> - HTTPS/TLS, mTLS, Cert Rotation, RBAC, Topic ACL</li> <li><a href="https://docs.redpanda.com/">Redpanda Documentation</a> - Event Streaming (Scale-out &gt;1M)</li> </ul>]]></content><author><name></name></author><category term="iot"/><category term="mqtt"/><category term="devops"/><category term="observability"/><category term="disaster-recovery"/><summary type="html"><![CDATA[Rate limiting, edge resilience, server HA, DR, OpenTelemetry, cost estimate (~$17-33K/month), team onboarding risks, and phased rollout order]]></summary></entry><entry><title type="html">IoT Architecture Selection for One Million Devices, Part 2: Security and Multi-Tenancy</title><link href="https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part2/" rel="alternate" type="text/html" title="IoT Architecture Selection for One Million Devices, Part 2: Security and Multi-Tenancy"/><published>2026-03-30T02:01:00+00:00</published><updated>2026-03-30T02:01:00+00:00</updated><id>https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part2</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/iot-1m-device-architecture-part2/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/iot-architecture/iot-architecture-overview-480.webp 480w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-800.webp 800w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/iot-architecture/iot-architecture-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="IoT 1M Device Architecture 
Overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Phase 1 core architecture: EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: Part 2 of 3. Device security: <strong>HTTPS/TLS</strong> for all communication, <strong>mTLS X.509</strong> for device authentication, software-based certificate rotation (90-365d), JIT provisioning, anti-spoofing measures. Multi-tenancy: MQTT <strong>Topic ACL</strong> namespace isolation, PostgreSQL <strong>Row-Level Security</strong>, 4-role <strong>RBAC</strong> model, dual-layer command authorization.</p> </blockquote> <blockquote> <table> <tbody> <tr> <td><strong>Series:</strong> <a href="/blog/2026/iot-1m-device-architecture/">Part 1: Core Architecture</a></td> <td>Part 2: Security and Multi-Tenancy (this post)</td> <td><a href="/blog/2026/iot-1m-device-architecture-part3/">Part 3: Operations and Reliability</a></td> </tr> </tbody> </table> </blockquote> <hr/> <h2 id="device-identity">Device Identity</h2> <h3 id="認證方式">Authentication Methods</h3> <table> <thead> <tr> <th>Method</th> <th>Security</th> <th>Use case</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><strong>mTLS (X.509)</strong></td> <td>Highest</td> <td><strong>Default</strong></td> <td>The CA chain removes the need to store per-device credentials</td> </tr> <tr> <td>PSK</td> <td>Medium</td> <td>Constrained devices</td> <td>Used behind a gateway; rotation is painful</td> </tr> <tr> <td>JWT</td> <td>High</td> <td>OAuth2 integration</td> <td>Stateless verification; requires refresh</td> </tr> </tbody> </table> <p>MAC addresses can be spoofed and serial numbers can be guessed, so <strong>a Device ID must be backed by a cryptographic credential</strong>:</p> <ul> <li>MQTT Client ID: <code class="language-plaintext highlighter-rouge">{tenant}:{type}:{serial}</code></li> <li>X.509 CN matches the Client ID → mTLS binds the identity automatically</li> <li>DB PK: UUID v4</li> </ul> <h3 id="通訊安全">Communication Security</h3> <table> <thead> <tr> <th>Layer</th> <th>Mechanism</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>Transport</td> <td>TLS 1.2+ (8883)</td> <td>Encryption + integrity</td> </tr> <tr> <td>Identity</td> <td>mTLS mutual authentication</td> <td>Broker verifies device, device verifies broker</td> </tr> <tr> <td>Application</td> 
<td>Payload HMAC (optional)</td> <td>Protects against in-transit tampering</td> </tr> </tbody> </table> <p><strong>Certificate Rotation:</strong></p> <ul> <li>Validity of 90-365 days; a CSR-based renewal is triggered automatically 30 days before expiry</li> <li>Dual CA chains keep devices connected during rotation</li> <li>Expired without renewal → revoked via CRL → forced disconnect + alert</li> </ul> <h3 id="provisioning">Provisioning</h3> <pre><code class="language-mermaid">flowchart TD
    R[Root CA] --&gt; F[Intermediate CA]
    F --&gt;|Bootstrap| D[First Connect]
    D --&gt;|Verify| REG[Registry]
    REG --&gt;|Issue cert| D2[Online]
</code></pre> <table> <thead> <tr> <th>Method</th> <th>Security</th> <th>Use case</th> </tr> </thead> <tbody> <tr> <td><strong>JIT</strong></td> <td>High</td> <td>General fleets (recommended)</td> </tr> <tr> <td>Claim-based</td> <td>Medium</td> <td>Batches of the same model</td> </tr> <tr> <td>API pre-registration</td> <td>High</td> <td>Known device list</td> </tr> </tbody> </table> <p><strong>Anti-spoofing:</strong> One-time bootstrap token, device fingerprint hash, provisioning API rate limit, allowlist/denylist.</p> <h3 id="emqx-認證鏈">EMQX Authentication Chain</h3> <ol> <li><strong>mTLS</strong> → device identity taken from the cert CN (<code class="language-plaintext highlighter-rouge">peer_cert_as_clientid = cn</code>)</li> <li><strong>JWT</strong> → RS256 signature + claims validation</li> <li><strong>HTTP</strong> → external auth service (legacy devices)</li> </ol> <p>EMQX supports CRL and OCSP Stapling, so a compromised device's certificate can be revoked promptly.</p> <hr/> <h2 id="multi-tenancy">Multi-Tenancy</h2> <h3 id="broker-隔離">Broker Isolation</h3> <table> <thead> <tr> <th>Mode</th> <th>Isolation</th> <th>Use case</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><strong>Shared EMQX + Topic ACL</strong></td> <td>Logical</td> <td>95% of tenants</td> <td>Lowest cost; managed via ACLs</td> </tr> <tr> <td>Broker-per-tenant</td> <td>Process</td> <td>Regulatory requirements</td> <td>Compliance scenarios such as healthcare/finance</td> </tr> <tr> <td><strong>Hybrid</strong></td> <td>Depends on tier</td> <td><strong>Recommended</strong></td> <td>Standard tier shared + Enterprise tier dedicated</td> </tr> </tbody> </table> <h3 id="topic-命名空間">Topic Namespace</h3> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{tenant}/d/{device}/telemetry      # telemetry
{tenant}/d/{device}/cmd/request    # command
{tenant}/d/{device}/cmd/response   # response
{tenant}/d/{device}/config/desired # desired configuration
{tenant}/g/{group}/cmd/request     # group broadcast
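</code></pre></div></div> <p>Because the tenant ID is always the first level of the topic layout above, a per-device ACL check can reduce to a prefix comparison. The following is a minimal Python sketch of that idea, not EMQX's ACL engine: the function name <code class="language-plaintext highlighter-rouge">device_may_use_topic</code> and its arguments are hypothetical, only the <code class="language-plaintext highlighter-rouge">{tenant}/d/{device}/...</code> shape and the no-wildcard rule come from this post, and group topics are left out for brevity.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def device_may_use_topic(tenant, device, topic):
    """Return True if `topic` stays inside this device's own namespace.

    Hypothetical helper: a device may only touch {tenant}/d/{device}/...
    and may never use the MQTT wildcards + or #.
    """
    levels = topic.split("/")
    if "+" in levels or "#" in levels:
        return False  # devices are not allowed to subscribe with wildcards
    # The trailing slash stops "dev1" from matching "dev10".
    return topic.startswith(f"{tenant}/d/{device}/")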
</code></pre></div></div> <p>The tenant ID is always the first topic level, so ACLs reduce to a prefix match. Devices are not allowed to subscribe with wildcards.</p> <h3 id="rbac">RBAC</h3> <table> <thead> <tr> <th>Permission</th> <th style="text-align: center">Super Admin</th> <th style="text-align: center">Tenant Admin</th> <th style="text-align: center">Operator</th> <th style="text-align: center">Viewer</th> </tr> </thead> <tbody> <tr> <td>Manage tenants</td> <td style="text-align: center">✓</td> <td style="text-align: center"> </td> <td style="text-align: center"> </td> <td style="text-align: center"> </td> </tr> <tr> <td>Register/disable devices</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center"> </td> <td style="text-align: center"> </td> </tr> <tr> <td>Send arbitrary commands</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center"> </td> <td style="text-align: center"> </td> </tr> <tr> <td>Send pre-approved commands</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center"> </td> </tr> <tr> <td>View Dashboard</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> </tr> <tr> <td>OTA deployment</td> <td style="text-align: center">✓</td> <td style="text-align: center">✓</td> <td style="text-align: center"> </td> <td style="text-align: center"> </td> </tr> </tbody> </table> <h3 id="command-雙層驗證">Two-Layer Command Authorization</h3> <ul> <li><strong>API side:</strong> User role + command permission + device status + rate limit</li> <li><strong>Device side:</strong> Verify the signature (prevents injection) + verify the timestamp (prevents replay) + verify command_type</li> </ul> <h3 id="db-tenant-隔離">DB Tenant Isolation</h3> <table> <thead> <tr> <th>Strategy</th> <th>Isolation</th> <th>Use case</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><strong>Row-Level Security</strong></td> <td>Logical</td> <td><strong>Default</strong></td> <td>Single schema; policies filter rows automatically</td> </tr> <tr> <td>Schema-per-tenant</td> <td>Medium</td> <td>Moderate requirements</td> 
<td>Migrations must run across N schemas</td> </tr> <tr> <td>DB-per-tenant</td> <td>Strongest</td> <td>Enterprise</td> <td>Highest cost, full isolation</td> </tr> </tbody> </table> <p>TimescaleDB is partitioned by <code class="language-plaintext highlighter-rouge">(tenant_id, time)</code>, so queries are pruned automatically and retention can be set per tenant.</p> <hr/> <h2 id="下一篇">Other Posts in This Series</h2> <ul> <li><a href="/blog/2026/iot-1m-device-architecture/">Part 1: Core Architecture</a> - EMQX + TimescaleDB + FastAPI + BFF, cost estimate</li> <li><a href="/blog/2026/iot-1m-device-architecture-part3/">Part 3: Operations, Cost, and Reliability</a> - Rate Limiting, Edge Resilience, DR, Observability, cost estimate</li> </ul>]]></content><author><name></name></author><category term="iot"/><category term="mqtt"/><category term="security"/><category term="multi-tenancy"/><category term="rbac"/><summary type="html"><![CDATA[HTTPS/TLS transport encryption, mTLS X.509 device authentication, Certificate Rotation, JIT Provisioning, Multi-Tenancy Topic ACL + RLS, four-role RBAC]]></summary></entry><entry><title type="html">IoT Architecture Selection for One Million Devices, Part 1: Core Architecture and Technology Choices</title><link href="https://osisdie.github.io/blog/2026/iot-1m-device-architecture/" rel="alternate" type="text/html" title="IoT Architecture Selection for One Million Devices, Part 1: Core Architecture and Technology Choices"/><published>2026-03-30T02:00:00+00:00</published><updated>2026-03-30T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/iot-1m-device-architecture</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/iot-1m-device-architecture/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/iot-architecture/iot-architecture-overview-480.webp 480w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-800.webp 800w,/assets/img/blog/2026/iot-architecture/iot-architecture-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/iot-architecture/iot-architecture-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="IoT 1M Device Architecture Overview" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> 
</picture> <figcaption class="caption">Phase 1 core architecture: EMQX + TimescaleDB + FastAPI + BFF + OpenTelemetry</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: Part 1 of 3. Core architecture for 1M IoT devices: <strong>EMQX</strong> (Rule Engine direct write) → <strong>TimescaleDB</strong> (hot 7d + Continuous Aggregates) → <strong>FastAPI</strong> → <strong>BFF</strong> → Dashboard. Includes protocol selection, broker comparison, three-tier storage, BFF design, cost estimation (~$17-33K/month), and AWS/GCP mapping. Scale-out (&gt;1M): add Redpanda + FastStream.</p> </blockquote> <h2 id="前言">Introduction</h2> <p>One million devices each reporting every 10 seconds works out to <strong>100K writes/sec</strong> and <strong>~1.7 TB/day</strong> (about 8.64 billion messages per day, which implies roughly 200 bytes per message).</p> <p>This series takes an <strong>incremental architecture</strong> approach, and Phase 1 can go live with just 4 core components:</p> <table> <thead> <tr> <th>Phase</th> <th>Components</th> <th>Target scale</th> </tr> </thead> <tbody> <tr> <td><strong>1 (MVP)</strong></td> <td>EMQX + TimescaleDB + FastAPI + BFF</td> <td>&lt; 1M</td> </tr> <tr> <td>2</td> <td>+ ClickHouse + S3 (hot/cold tiering)</td> <td>Same as above</td> </tr> <tr> <td>3</td> <td>+ Multi-tenant + DR</td> <td>Same as above</td> </tr> <tr> <td>4</td> <td>+ Redpanda + FastStream (event streaming)</td> <td>&gt; 1M</td> </tr> </tbody> </table> <blockquote> <table> <tbody> <tr> <td><strong>Series:</strong> Part 1: Core Architecture (this post)</td> <td><a href="/blog/2026/iot-1m-device-architecture-part2/">Part 2: Security and Multi-Tenancy</a></td> <td><a href="/blog/2026/iot-1m-device-architecture-part3/">Part 3: Operations and Reliability</a></td> </tr> </tbody> </table> </blockquote> <hr/> <h2 id="架構資料流">Architecture Data Flow</h2> <h3 id="telemetrydevice--dashboard">Telemetry (Device → Dashboard)</h3> <pre><code class="language-mermaid">flowchart LR
    D[Device] --&gt;|MQTT| B[EMQX]
    B --&gt;|Rule Engine| T[TimescaleDB]
    T --&gt; A[FastAPI]
    A --&gt; BFF[BFF]
    BFF --&gt;|WebSocket| U[Dashboard]
</code></pre> <h3 id="commanddashboard--device">Command (Dashboard → Device)</h3> <pre><code class="language-mermaid">flowchart RL
    U[Dashboard] --&gt;|WS/REST| BFF[BFF]
    BFF --&gt; A[FastAPI]
    A --&gt;|MQTT QoS 1| B[EMQX]
    B --&gt; D[Device]
</code></pre> <p>EMQX's Rule Engine writes directly to TimescaleDB with no event-streaming layer in between. Latency is &lt;50 ms, which suits deployments of up to 1M devices. Add Redpanda only when going beyond 1M and dedup / event replay is needed.</p> <hr/> <h2 id="通訊協定與-broker">Protocols and Broker</h2> <table> <thead> <tr> <th>Protocol</th> <th>Bidirectional</th> <th>Power</th> <th>Overhead</th> <th>Use case</th> </tr> </thead> <tbody> <tr> <td><strong>MQTT v5</strong></td> <td>Yes</td> <td>Very low</td> <td>2-byte header</td> <td><strong>IoT default</strong></td> </tr> <tr> <td>CoAP</td> <td>Limited</td> <td>Very low</td> <td>UDP</td> <td>NB-IoT constrained devices</td> </tr> <tr> <td>gRPC</td> <td>Yes</td> <td>High</td> <td>HTTP/2 + protobuf</td> <td>Service-to-service</td> </tr> </tbody> </table> <p>Key MQTT v5 features: Correlation ID, Shared subscriptions, Message expiry, Retained messages, LWT.</p> <table> <thead> <tr> <th>Broker</th> <th>1M connections</th> <th>Clustering</th> <th>Rating</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><strong>EMQX</strong></td> <td>✓ (100M+)</td> <td>RAFT</td> <td>★★★★★</td> <td>Open source, built-in Rule Engine, largest community</td> </tr> <tr> <td>HiveMQ</td> <td>✓</td> <td>Native</td> <td>★★★★</td> <td>Commercial license, strong enterprise support</td> </tr> <tr> <td>Mosquitto</td> <td>x (~100K)</td> <td>None</td> <td>Dev only</td> <td>Single-threaded, no clustering</td> </tr> </tbody> </table> <table> <thead> <tr> <th>Deployment</th> <th>AWS</th> <th>GCP</th> <th>Monthly cost</th> </tr> </thead> <tbody> <tr> <td><strong>EMQX Cloud</strong></td> <td>AWS</td> <td>GCP</td> <td>~$8-15K</td> </tr> <tr> <td>Self-hosted K8s</td> <td>EKS</td> <td>GKE</td> <td>~$3-8K + ops</td> </tr> </tbody> </table> <hr/> <h2 id="python-後端asyncio">Python Backend: asyncio</h2> <p>Free threading (PEP 703) is still an optional CPython build rather than the default, and this post assumes it will not be mainstream until around Python 3.16 (~2028). Serving 1M connections is I/O-bound, so asyncio + <strong>uvloop</strong> (2-4x speedup) + multiple worker processes is the right fit.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>HAProxy → Uvicorn worker 1..N (asyncio + uvloop, ~50-100K conn/worker)
           └── Redis/NATS cross-process pub/sub
</code></pre></div></div> <hr/> <h2 id="bff-backend-for-frontend">BFF (Backend for Frontend)</h2> <p>The backend never faces the frontend UI directly. The BFF layer handles WebSocket connections, API aggregation, and response trimming.</p> <pre><code class="language-mermaid">flowchart LR
    subgraph BE["FastAPI Backend"]
        D[Device API]
        T2[Telemetry API]
        C[Command API]
    end
    BE --&gt;|gRPC / REST| W[BFF-Web]
    BE --&gt;|gRPC / REST| M[BFF-Mobile]
    W --&gt;|WebSocket| WD[Dashboard]
    M --&gt;|REST + Push| MA[Mobile App]
</code></pre> <table> <thead> <tr> <th>Layer</th> <th>Responsibilities</th> <th>Does not do</th> </tr> </thead> <tbody> <tr> <td><strong>Backend</strong></td> <td>Device CRUD, Telemetry, Command, RBAC, MQTT</td> <td>UI logic</td> </tr> <tr> <td><strong>BFF</strong></td> <td>WS management, aggregated queries, response trimming, i18n</td> <td>Direct DB/MQTT access</td> </tr> </tbody> </table> <table> <thead> <tr> <th>Aspect</th> <th>Choice</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td>Language</td> <td>FastAPI or Next.js API Routes</td> <td>Depends on the frontend team's stack</td> </tr> <tr> <td>BFF → Backend</td> <td>gRPC or REST</td> <td>gRPC performs better; REST is faster to develop</td> </tr> <tr> <td>Cache</td> <td>Redis</td> <td>Status cache + WS pub/sub</td> </tr> </tbody> </table> <hr/> <h2 id="emqx-rule-engine-資料寫入">EMQX Rule Engine Data Ingestion</h2> <p>No separate event-streaming layer is used. The Rule Engine's built-in PostgreSQL connector writes directly to TimescaleDB:</p> <ul> <li>SQL-like filtering: <code class="language-plaintext highlighter-rouge">SELECT * FROM "telemetry/#" WHERE payload.temperature &gt; 50</code></li> <li>Message forwarding, format conversion, basic dedup, rate limiting + backpressure</li> </ul> <blockquote> <p>When a 1M-device deployment needs event replay or multiple consumers, add Redpanda (Phase 4).</p> </blockquote> <hr/> <h2 id="三層儲存策略">Three-Tier Storage Strategy</h2> <table> <thead> <tr> <th>Tier</th> <th>DB</th> <th>Retention</th> <th>Query latency</th> <th>Monthly cost/TB</th> <th>Notes</th> </tr> </thead> <tbody> <tr> <td><strong>Hot</strong></td> <td>TimescaleDB</td> <td>7 days</td> <td>&lt;10ms</td> <td>~$200</td> <td>Raw resolution, real-time Dashboard queries</td> </tr> <tr> <td><strong>Warm</strong></td> <td>ClickHouse</td> <td>30-90d</td> <td>50-500ms</td> <td>~$50</td> <td>1min/5min aggregates, analytical queries</td> </tr> <tr> <td><strong>Cold</strong></td> <td>S3 + Parquet</td> <td>Years</td> <td>Seconds</td> <td>~$2-5</td> <td>Hourly/daily aggregates, DuckDB ad hoc</td> </tr> </tbody> </table> <p><strong>Continuous Aggregates</strong> are the key to Dashboard queries: they pre-aggregate automatically at 1min/5min/1hr, reducing query volume by about 600x.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">MATERIALIZED</span> <span class="k">VIEW</span> <span class="n">sensor_1min</span>
<span class="k">WITH</span> <span class="p">(</span><span class="n">timescaledb</span><span class="p">.</span><span class="n">continuous</span><span class="p">)</span> <span class="k">AS</span>
<span class="k">SELECT</span> <span class="n">time_bucket</span><span class="p">(</span><span class="s1">'1 minute'</span><span class="p">,</span> <span class="nb">time</span><span class="p">)</span> <span class="k">AS</span> <span class="n">bucket</span><span class="p">,</span> <span class="n">device_id</span><span class="p">,</span>
       <span class="k">avg</span><span class="p">(</span><span class="n">temperature</span><span class="p">)</span> <span class="k">AS</span> <span class="n">avg_temp</span><span class="p">,</span> <span class="k">max</span><span class="p">(</span><span class="n">temperature</span><span class="p">)</span> <span class="k">AS</span> <span class="n">max_temp</span>
<span class="k">FROM</span> <span class="n">telemetry</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">bucket</span><span class="p">,</span> <span class="n">device_id</span><span class="p">;</span>
</code></pre></div></div> <pre><code class="language-mermaid">flowchart TD
    D[Devices] --&gt;|MQTT| E[EMQX]
    E --&gt;|Rule Engine| H[TimescaleDB]
    H --&gt;|Aggregate| W[ClickHouse]
    W --&gt;|Archive| C[S3 Parquet]
    H --&gt;|Query| A[FastAPI]
    A --&gt;|gRPC| B[BFF]
    B --&gt;|WS| U[Dashboard]
</code></pre> <blockquote> <p><strong>Cost estimate:</strong> 1M devices on managed cloud services cost roughly ~$17-33K/month (AWS/GCP); see <a href="/blog/2026/iot-1m-device-architecture-part3/#成本估算">Part 3: Cost and Risks</a> for details.</p> </blockquote> <hr/> <h2 id="技術選型總表">Technology Selection Summary</h2> <table> <thead> <tr> <th>Layer</th> <th>Choice</th> <th>AWS</th> <th>GCP</th> </tr> </thead> <tbody> <tr> <td>Device protocol</td> <td><strong>MQTT v5</strong></td> <td>n/a</td> <td>n/a</td> </tr> <tr> <td>Broker</td> <td><strong>EMQX</strong></td> <td>EMQX Cloud</td> <td>EMQX Cloud</td> </tr> <tr> <td>Data Ingestion</td> <td><strong>Rule Engine</strong></td> <td>n/a</td> <td>n/a</td> </tr> <tr> <td>Streaming (&gt;1M)</td> <td><strong>Redpanda</strong></td> <td>MSK</td> <td>Redpanda Cloud</td> </tr> <tr> <td>Backend</td> <td><strong>FastAPI + asyncio</strong></td> <td>Fargate</td> <td>Cloud Run</td> </tr> <tr> <td>BFF</td> <td><strong>FastAPI / Next.js</strong></td> <td>Fargate</td> <td>Cloud Run</td> </tr> <tr> <td>UI push</td> <td><strong>WebSocket via BFF</strong></td> <td>ALB</td> <td>Cloud LB</td> </tr> <tr> <td>Hot DB</td> <td><strong>TimescaleDB</strong></td> <td>Timescale Cloud</td> <td>Timescale Cloud</td> </tr> <tr> <td>Warm DB</td> <td><strong>ClickHouse</strong></td> <td>ClickHouse Cloud</td> <td>ClickHouse Cloud</td> </tr> <tr> <td>Cold</td> <td><strong>S3 + Parquet</strong></td> <td>S3</td> <td>GCS</td> </tr> <tr> <td>Observability</td> <td><strong>OTel + Grafana</strong></td> <td>CloudWatch</td> <td>Cloud Monitoring</td> </tr> </tbody> </table> <hr/> <h2 id="下一篇">Next in This Series</h2> <ul> <li><a href="/blog/2026/iot-1m-device-architecture-part2/">Part 2: Security and Multi-Tenancy</a> - HTTPS/TLS, mTLS, Cert Rotation, RBAC, Topic ACL, DB isolation</li> <li><a href="/blog/2026/iot-1m-device-architecture-part3/">Part 3: Operations, Cost, and Reliability</a> - Rate Limiting, Edge Resilience, DR, Observability, cost estimate, team risks</li> </ul> <h3 id="相關資源">Related Resources</h3> <ul> <li><a href="https://docs.emqx.com/">EMQX Documentation</a> - MQTT Broker + Rule Engine</li> <li><a href="https://docs.timescale.com/">TimescaleDB Documentation</a> - Time-Series 
DB + Continuous Aggregates</li> </ul>]]></content><author><name></name></author><category term="iot"/><category term="mqtt"/><category term="architecture"/><category term="python"/><category term="event-driven"/><summary type="html"><![CDATA[Phase 1 core architecture: EMQX Rule Engine + TimescaleDB + FastAPI + BFF, with three-tier storage, cost estimate, and AWS/GCP mapping]]></summary></entry><entry><title type="html">Claude Code Channel Plugin Development in Practice: Telegram Inline Buttons</title><link href="https://osisdie.github.io/blog/2026/claude-code-channel-plugin-dev/" rel="alternate" type="text/html" title="Claude Code Channel Plugin Development in Practice: Telegram Inline Buttons"/><published>2026-03-27T02:00:00+00:00</published><updated>2026-03-27T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/claude-code-channel-plugin-dev</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/claude-code-channel-plugin-dev/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/channel-plugin-dev/channel-plugin-dev-overview-480.webp 480w,/assets/img/blog/2026/channel-plugin-dev/channel-plugin-dev-overview-800.webp 800w,/assets/img/blog/2026/channel-plugin-dev/channel-plugin-dev-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/channel-plugin-dev/channel-plugin-dev-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="Claude Code Channel Plugin Architecture" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Channel Plugin architecture: Telegram Inline Buttons and the Cache Patching mechanism</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: Claude Code’s <code class="language-plaintext highlighter-rouge">--channels</code> flag only accepts official plugin identifiers and re-extracts the plugin into a cache directory on every launch, overwriting local modifications. 
After trying 6 different approaches (pre-sync copy, background watcher, <code class="language-plaintext highlighter-rouge">--plugin-dir</code>, <code class="language-plaintext highlighter-rouge">--mcp-config</code>, symlink, and cache patching), we found that <strong>cache patching</strong> (rewriting the cached <code class="language-plaintext highlighter-rouge">.mcp.json</code> to redirect <code class="language-plaintext highlighter-rouge">--cwd</code> to a local fork) is the cleanest workaround: idempotent, no residual state, and compatible with inbound channel notifications. This article also covers implementing Telegram inline keyboard buttons via raw Bot API format (bypassing grammy’s serialization issue), callback query handling, and credential isolation with <code class="language-plaintext highlighter-rouge">TELEGRAM_STATE_DIR</code>.</p> </blockquote> <h2 id="前言">Introduction</h2> <p><a href="https://github.com/osisdie/claude-code-channels">claude-code-channels</a> is an open-source project that lets Claude Code interact through messaging platforms such as Telegram, Discord, Slack, LINE, and WhatsApp. Each channel is an MCP server that runs as a Bun subprocess and talks to the Claude Code session over the stdio transport.</p> <p>Today's goal looked simple: make the Telegram <code class="language-plaintext highlighter-rouge">reply</code> tool support <strong>inline keyboard buttons</strong>. Implementing the buttons themselves was not hard, but along the way we ran into the overwrite behavior of the Claude Code plugin cache and ended up spending more time on architecture than on the feature. This post documents the whole process.</p> <hr/> <h2 id="channel-資料流">Channel Data Flow</h2> <p>The channel message flows are shown below:</p> <p><strong>Normal Message Flow:</strong></p> <pre><code class="language-mermaid">flowchart LR
    U[User] --&gt;|message| B[Bot]
    B --&gt;|notification| M[MCP Server]
    M --&gt;|stdio| C[Claude Code]
    C --&gt;|reply tool| M
    M --&gt;|sendMessage| B
    B --&gt; U
</code></pre> <p><strong>Inline Button Flow:</strong></p> <pre><code class="language-mermaid">flowchart LR
    C[Claude Code] --&gt;|reply + buttons| M[MCP Server]
    M --&gt;|inline_keyboard| B[Bot]
    B --&gt; U[User]
    U --&gt;|callback_query| B
    B --&gt;|notification| M
    M --&gt;|forward| C
</code></pre> <hr/> <h2 id="telegram-inline-buttons-實作">Implementing Telegram Inline Buttons</h2> <h3 id="需求">Requirements</h3> <p>When Claude needs the user to pick from a fixed set of options (Yes/No, Approve/Reject, a number from 1 to 5), tapping a button is more intuitive than typing. The official Telegram plugin's <code class="language-plaintext highlighter-rouge">reply</code> tool supports plain text only and has no button parameter.</p> <h3 id="方案設計">Design</h3> <p>Add an optional <code class="language-plaintext highlighter-rouge">buttons</code> parameter to the <code class="language-plaintext highlighter-rouge">reply</code> tool: a two-dimensional string array in which each inner array represents one row of buttons:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Claude can send any combination of buttons</span>
<span class="nf">reply</span><span class="p">({</span> <span class="nx">chat_id</span><span class="p">,</span> <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Confirm deployment?</span><span class="dl">"</span><span class="p">,</span> <span class="na">buttons</span><span class="p">:</span> <span class="p">[[</span><span class="dl">"</span><span class="s2">Yes</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">No</span><span class="dl">"</span><span class="p">]]</span> <span class="p">})</span>
<span class="nf">reply</span><span class="p">({</span> <span class="nx">chat_id</span><span class="p">,</span> <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Choose a plan:</span><span class="dl">"</span><span class="p">,</span> <span class="na">buttons</span><span class="p">:</span> <span class="p">[[</span><span class="dl">"</span><span class="s2">Plan A</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Plan B</span><span class="dl">"</span><span class="p">],</span> <span class="p">[</span><span class="dl">"</span><span class="s2">Cancel</span><span class="dl">"</span><span class="p">]]</span> <span class="p">})</span>
</code></pre></div></div> <h3 id="實際效果">Result</h3> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/channel-plugin-dev/telegram-buttons-480.webp 480w,/assets/img/blog/2026/channel-plugin-dev/telegram-buttons-800.webp 800w,/assets/img/blog/2026/channel-plugin-dev/telegram-buttons-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/channel-plugin-dev/telegram-buttons.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="Telegram Inline Buttons Demo" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Three inline button scenarios: deployment confirmation (Approve/Reject), feature rating (1-5), and plan selection (Plan A/B/Skip). The bottom shows the STM status returned by the /session status command.</figcaption> </figure> <p>The screenshot shows three common interaction scenarios:</p> <ul> <li><strong>Deployment confirmation</strong>: a two-way Approve/Reject choice; once pressed, the buttons disappear and the result is shown</li> <li><strong>Feature rating</strong>: a single row of 5 number buttons, suited to quantitative feedback</li> <li><strong>Plan selection</strong>: two rows of buttons (several options plus skip), supporting any layout</li> </ul> <p>Every button click is sent back to the Claude Code session as an inbound message whose meta carries a <code class="language-plaintext highlighter-rouge">button: "true"</code> flag, so Claude can continue directly based on the selection.</p> <h3 id="關鍵技術決策">Key Technical Decision</h3> <p><strong>Use the raw Telegram API format rather than grammy's InlineKeyboard class.</strong></p> <p>When grammy's <code class="language-plaintext highlighter-rouge">InlineKeyboard</code> class was passed into the <code class="language-plaintext highlighter-rouge">sendMessage</code> options with the spread operator, <code class="language-plaintext highlighter-rouge">reply_markup</code> was lost during serialization: the API call returned success, but Telegram showed no buttons. Calling the Bot API directly with curl worked, which pointed to the way the grammy class was being used rather than to the Bot API. Switching to the raw format fixed it immediately:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">replyMarkup</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">inline_keyboard</span><span class="p">:</span> <span class="nx">buttons</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="nx">row</span> <span class="o">=&gt;</span>
    <span class="nx">row</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="nx">label</span> <span class="o">=&gt;</span> <span class="p">({</span>
      <span class="na">text</span><span class="p">:</span> <span class="nc">String</span><span class="p">(</span><span class="nx">label</span><span class="p">),</span>
      <span class="na">callback_data</span><span class="p">:</span> <span class="s2">`btn:</span><span class="p">${</span><span class="nc">String</span><span class="p">(</span><span class="nx">label</span><span class="p">).</span><span class="nf">slice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">59</span><span class="p">)}</span><span class="s2">`</span><span class="p">,</span> <span class="c1">// callback_data is capped at 64 bytes; slice() counts UTF-16 units, so non-ASCII labels can still exceed it</span>
    <span class="p">}))</span>
  <span class="p">),</span>
<span class="p">}</span>
</code></pre></div></div> <p><strong>Callback handling after a button click:</strong> intercept every <code class="language-plaintext highlighter-rouge">callback_query</code> whose data carries the <code class="language-plaintext highlighter-rouge">btn:</code> prefix and handle it in three steps:</p> <ol> <li><code class="language-plaintext highlighter-rouge">answerCallbackQuery()</code>: dismisses Telegram's loading animation</li> <li><code class="language-plaintext highlighter-rouge">editMessageText()</code>: updates the message to show the selected option (prevents repeated clicks)</li> <li><code class="language-plaintext highlighter-rouge">mcp.notification()</code>: forwards the button label to the Claude Code session as an inbound message (meta: <code class="language-plaintext highlighter-rouge">button=true</code>)</li> </ol> <p>Finally, add one line of guidance to the MCP server's <code class="language-plaintext highlighter-rouge">instructions</code>:</p> <blockquote> <p>Prefer buttons over asking the user to type whenever the response is a small fixed set of choices.</p> </blockquote> <p>With this in place, Claude uses buttons on its own in scenarios such as number guessing and action confirmation, with no reminder from the user.</p> <hr/> <h2 id="plugin-cache-覆蓋問題">The Plugin Cache Overwrite Problem</h2> <h3 id="發現問題">Discovering the Problem</h3> <p>With the button feature written, I edited <code class="language-plaintext highlighter-rouge">~/.claude/plugins/cache/claude-plugins-official/telegram/0.0.4/server.ts</code> directly and restarted Claude Code, but no buttons appeared. I then added a diagnostic watermark to the return value:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sent (id: 54)          ← no [local-v1] watermark
</code></pre></div></div> <p><strong>Confirmed: when Claude Code starts with <code class="language-plaintext highlighter-rouge">--channels plugin:telegram@claude-plugins-official</code>, it re-extracts the official plugin into the cache and overwrites every modification.</strong></p> <h3 id="嘗試過的-6-種方案">The 6 Approaches Tried</h3> <table> <thead> <tr> <th>#</th> <th>Approach</th> <th>Result</th> </tr> </thead> <tbody> <tr> <td>1</td> <td>Pre-sync cp</td> <td>x: overwritten by re-extract</td> </tr> <tr> <td>2</td> <td>Background watcher</td> <td>x: race condition</td> </tr> <tr> <td>3</td> <td><code class="language-plaintext highlighter-rouge">--plugin-dir</code></td> <td>x: channel plugins not supported</td> </tr> <tr> <td>4</td> <td><code class="language-plaintext highlighter-rouge">--mcp-config</code></td> <td>x: no channel notification</td> </tr> <tr> <td>5</td> <td>Symlink</td> <td>△: works, but managing leftovers is a hassle</td> </tr> <tr> <td>6</td> <td><strong>Cache patching</strong></td> <td><strong>✓ stable, no leftovers, idempotent</strong></td> </tr> </tbody> </table> <h3 id="bun-transpile-cache-的額外坑">An Extra Pitfall: Bun Transpile Cache</h3> <p>Even after the modification makes it into the cache, a restart may still run the old code. The cause is that <strong>bun caches transpiled TypeScript</strong>: even when <code class="language-plaintext highlighter-rouge">server.ts</code> has changed, bun may still use the old cached output. Clearing it requires <code class="language-plaintext highlighter-rouge">rm -rf /tmp/bun-*</code>.</p> <p>An upstream issue has been opened for this: <a href="https://github.com/anthropics/claude-plugins-official/issues/1057">anthropics/claude-plugins-official#1057</a></p> <hr/> <h2 id="cache-patching-架構">Cache Patching Architecture</h2> <p>The approach finally adopted: fork the official plugin into the project so it is under version control, and at startup rewrite the launch config in the plugin cache (<code class="language-plaintext highlighter-rouge">.mcp.json</code>) so that Claude Code runs our forked code.</p> <h3 id="工作原理">How It Works</h3> <p>When Claude Code starts with <code class="language-plaintext highlighter-rouge">--channels plugin:telegram@claude-plugins-official</code>, it unpacks the official plugin into the cache directory. Cache patching <strong>does not modify <code class="language-plaintext highlighter-rouge">server.ts</code></strong>; instead it rewrites the <code class="language-plaintext
highlighter-rouge">.mcp.json</code> in the cache so that <code class="language-plaintext highlighter-rouge">--cwd</code> points at the local fork inside the project:</p> <p><code class="language-plaintext highlighter-rouge">~/.claude/plugins/cache/.../telegram/0.0.4/.mcp.json</code> (after rewriting):</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"mcpServers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"telegram"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"bun"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"run"</span><span class="p">,</span><span class="w"> </span><span class="s2">"--cwd"</span><span class="p">,</span><span class="w"> </span><span class="s2">"&lt;project&gt;/external_plugins/telegram-channel/"</span><span class="p">,</span><span class="w"> </span><span class="s2">"server.ts"</span><span class="p">],</span><span class="w">
      </span><span class="nl">"env"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"TELEGRAM_STATE_DIR"</span><span class="p">:</span><span class="w"> </span><span class="s2">"&lt;project&gt;/.claude/channels/telegram"</span><span class="w"> </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div> <p>This way Claude Code still starts with the official plugin identifier (keeping the inbound notification routing), but what actually runs is our forked code:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;project&gt;/external_plugins/telegram-channel/
    ├── .mcp.json
    ├── server.ts          ← fork (the source of truth under version control)
    ├── skills/
    └── node_modules/
</code></pre></div></div> <p><strong>Why not a symlink?</strong> The symlink approach works, but it leaves files behind (an <code class="language-plaintext highlighter-rouge">.official</code> backup directory) and needs extra cleanup on upgrade. Cache patching is <strong>idempotent</strong>: the config is rewritten on every start and no state is left behind.</p> <h3 id="通用場景">General Applicability</h3> <p>Cache patching is not limited to Telegram: <strong>any</strong> <code class="language-plaintext highlighter-rouge">--channels plugin:xxx@claude-plugins-official</code> can use the same mechanism. Once the official plugin is forked into <code class="language-plaintext highlighter-rouge">external_plugins/&lt;channel&gt;-channel/</code>, the startup script can rewrite the corresponding <code class="language-plaintext highlighter-rouge">.mcp.json</code> automatically. <a href="https://github.com/osisdie/claude-code-channels">claude-code-channels</a> currently uses this approach for both Telegram and Discord.</p> <p>It applies when:</p> <ul> <li>You need to change the behavior of an official channel plugin (add a feature, fix a bug, change a skill path)</li> <li>You need to keep the inbound notification routing tied to the official plugin identifier</li> <li>You do not want to maintain a symlink or another stateful workaround</li> </ul> <h3 id="重要設定channelsenabled">Important Setting: <code class="language-plaintext highlighter-rouge">channelsEnabled</code></h3> <p>Claude Code's channel notification (inbound messages) is off by default. It has to be enabled in settings:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"channelsEnabled"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div> <p><strong>Without this setting, outbound (sending messages) works normally, but inbound is silently dropped.</strong> This is the easiest pitfall to overlook.</p> <h3 id="credential-與-state_dir-隔離">Credential and STATE_DIR Isolation</h3> <p>The bot token and access control live only inside the <strong>project directory</strong> (selected through the <code class="language-plaintext highlighter-rouge">TELEGRAM_STATE_DIR</code> environment variable) and are not exposed at the global path <code class="language-plaintext highlighter-rouge">~/.claude/channels/telegram/</code>. This keeps each project's credentials isolated so that other Claude Code sessions do not read them.</p> <p>The official plugin's skills (<code class="language-plaintext highlighter-rouge">/telegram:access</code>, <code class="language-plaintext highlighter-rouge">/telegram:configure</code>) originally hardcoded the path as <code class="language-plaintext highlighter-rouge">~/.claude/channels/telegram/</code>, so pairing failed once <code class="language-plaintext highlighter-rouge">TELEGRAM_STATE_DIR</code> was set. The fork fixes this: the skills now use a <code class="language-plaintext highlighter-rouge">$STATE</code> shorthand that resolves from <code class="language-plaintext highlighter-rouge">$TELEGRAM_STATE_DIR</code> and falls back to the global path. The same fix was applied to the Discord plugin and the <code class="language-plaintext highlighter-rouge">ACCESS.md</code> document.</p> <hr/> <h2 id="總結">Summary</h2> <h3 id="takeaways">Takeaways</h3> <ol> <li> <p><strong>The biggest obstacle to plugin development</strong>: Claude Code's <code class="language-plaintext highlighter-rouge">--channels</code> only accepts official plugin identifiers and re-extracts the plugin at startup, overwriting the cache. There is currently no official way to load a local plugin.</p> </li> <li> <p><strong>Cache patching is the cleanest workaround</strong>: rewriting the launch config in the plugin cache to point at a local fork is idempotent and leaves nothing behind, and it is more stable than symlink, pre-sync, or a background watcher.</p> </li> <li> <p><strong><code class="language-plaintext highlighter-rouge">channelsEnabled: true</code> is easy to overlook</strong>: without it, outbound works but inbound is silently dropped, which is easy to misdiagnose as a bot polling problem while debugging.</p> </li> <li> <p><strong>Bun transpile cache is a hidden pitfall</strong>: after the TypeScript source changes, bun may still run the old version. Clearing <code class="language-plaintext highlighter-rouge">/tmp/bun-*</code> or disabling the transpiler cache with <code class="language-plaintext
highlighter-rouge">BUN_RUNTIME_TRANSPILER_CACHE_PATH=0</code> resolves it.</p> </li> <li> <p><strong>Suggested upstream improvements</strong>:</p> <ul> <li><code class="language-plaintext highlighter-rouge">--channels</code> accepting a local path (similar to <code class="language-plaintext highlighter-rouge">--plugin-dir</code>)</li> <li>Adding <code class="language-plaintext highlighter-rouge">--no-cache</code> when a plugin is launched, to avoid transpile cache problems</li> </ul> </li> </ol> <h3 id="相關連結">Related Links</h3> <ul> <li><a href="https://github.com/osisdie/claude-code-channels">claude-code-channels</a>: this project</li> <li><a href="https://github.com/anthropics/claude-plugins-official/issues/1057">anthropics/claude-plugins-official#1057</a>: Bun cache issue</li> </ul>]]></content><author><name></name></author><category term="claude-code"/><category term="telegram"/><category term="plugin"/><category term="mcp"/><category term="automation"/><summary type="html"><![CDATA[From Telegram inline buttons to the plugin cache overwrite problem: a full record of trying 6 approaches and finally solving it with cache patching]]></summary></entry><entry><title type="html">深入解析 Claude Code 的 Ralph Loop Stop Hook</title><link href="https://osisdie.github.io/blog/2026/ralph-loop-stop-hook/" rel="alternate" type="text/html" title="深入解析 Claude Code 的 Ralph Loop Stop Hook"/><published>2026-03-26T02:00:00+00:00</published><updated>2026-03-26T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/ralph-loop-stop-hook</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/ralph-loop-stop-hook/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/ralph-loop/ralph-loop-overview-480.webp 480w,/assets/img/blog/2026/ralph-loop/ralph-loop-overview-800.webp 800w,/assets/img/blog/2026/ralph-loop/ralph-loop-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/ralph-loop/ralph-loop-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="Ralph Loop Stop Hook Architecture" loading="eager" onerror="this.onerror=null; 
$('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Ralph Loop Stop Hook flow and state file structure</figcaption> </figure> <blockquote> <p><strong>English Abstract</strong>: The Ralph Loop Stop Hook is a bash-based hook for Claude Code that enables autonomous, iterative AI agent sessions. When Claude finishes a response, the Stop Hook intercepts the session exit, reads the agent’s transcript, checks for a completion promise, and, if the task isn’t done, re-injects the original prompt to continue the loop. This article dissects the 191-line script: state file architecture (YAML frontmatter + markdown prompt), session isolation to prevent cross-session interference, JSONL transcript parsing, Perl-based <code class="language-plaintext highlighter-rouge">&lt;promise&gt;</code> tag detection, and atomic state updates. Includes the actual source code with production safety considerations.</p> </blockquote> <p>Claude Code's hook mechanism lets developers insert custom logic into the AI agent's lifecycle. The <strong>Stop Hook</strong> is among the most powerful of these hooks: it fires every time Claude finishes a response and can decide whether to <strong>block the session from ending and keep it running</strong>. Ralph Loop uses exactly this mechanism to give an AI agent autonomous iteration.</p> <hr/> <h2 id="起因一個神秘的-permission-denied">Origin: A Mysterious Permission Denied</h2> <p>It all started with this error flashing repeatedly at the bottom of my Claude Code session:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ran 1 stop hook (ctrl+o to expand)
⎿ Stop hook error: Failed with non-blocking status code:
  /bin/sh: 1: ~/.claude/plugins/marketplaces/claude-plugins-official/
  plugins/ralph-loop/hooks/stop-hook.sh: Permission denied
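</code></pre></div></div> <p>It fired every time Claude finished a response. It was marked <strong>non-blocking</strong> (normal use was unaffected), but seeing it over and over made me curious: <strong>what exactly is Ralph Loop, and why is its Stop Hook firing in my session?</strong></p> <p>Tracing it back, I found that the plugin had come along when I installed the <a href="https://github.com/anthropics/claude-plugins-official">claude-plugins-official</a> marketplace. The script had no execute permission (it needed <code class="language-plaintext highlighter-rouge">chmod +x</code>), so it reported Permission Denied on every run. Fixing the permission made the error go away, and it also led me to study this carefully designed Stop Hook in depth.</p> <hr/> <h2 id="什麼是-ralph-loop">What Is Ralph Loop?</h2> <p>Ralph Loop is a <strong>Stop Hook script</strong> whose core behavior is simple:</p> <ol> <li>Claude finishes a response → the Stop Hook fires</li> <li>It checks whether a loop is active (whether the state file exists)</li> <li>If the task is not finished → it <strong>blocks the session from ending</strong> and re-injects the prompt</li> <li>Claude reads its own output from the previous round and keeps improving it</li> </ol> <p>This creates a <strong>self-referential iteration loop</strong>: Claude repeatedly reviews and improves its own work until the completion condition is met or the iteration limit is reached.</p> <hr/> <h2 id="運作流程">How It Works</h2> <h3 id="1-hook-觸發與-state-檢查">1. Hook Trigger and State Check</h3> <p>The Stop Hook first reads its JSON input from stdin, then checks whether the state file exists:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Read hook input from stdin (advanced stop hook API)</span>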
<span class="nv">HOOK_INPUT</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span><span class="si">)</span>

<span class="c"># Check if ralph-loop is active</span>
<span class="nv">RALPH_STATE_FILE</span><span class="o">=</span><span class="s2">".claude/ralph-loop.local.md"</span>

<span class="k">if</span> <span class="o">[[</span> <span class="o">!</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
  <span class="c"># No active loop - allow exit</span>
  <span class="nb">exit </span>0
<span class="k">fi</span>
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: <code class="language-plaintext highlighter-rouge">exit 0</code> means the hook completed normally without blocking. In this script, the session is kept alive only when the hook prints JSON containing <code class="language-plaintext highlighter-rouge">{"decision": "block"}</code> (Claude Code also treats exit code 2 from a Stop hook as blocking, which this script does not rely on). When the state file does not exist, the hook is completely transparent.</p> </blockquote> <h3 id="2-yaml-frontmatter-解析">2. YAML Frontmatter Parsing</h3> <p>The state file uses a <strong>YAML frontmatter + Markdown body</strong> format, the same structure as a Jekyll post:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Parse markdown frontmatter and extract values</span>
<span class="nv">FRONTMATTER</span><span class="o">=</span><span class="si">$(</span><span class="nb">sed</span> <span class="nt">-n</span> <span class="s1">'/^---$/,/^---$/{ /^---$/d; p; }'</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span><span class="si">)</span>
<span class="nv">ITERATION</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$FRONTMATTER</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'^iteration:'</span> | <span class="nb">sed</span> <span class="s1">'s/iteration: *//'</span><span class="si">)</span>
<span class="nv">MAX_ITERATIONS</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$FRONTMATTER</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'^max_iterations:'</span> | <span class="nb">sed</span> <span class="s1">'s/max_iterations: *//'</span><span class="si">)</span>
<span class="nv">COMPLETION_PROMISE</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$FRONTMATTER</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'^completion_promise:'</span> | <span class="se">\</span>
  <span class="nb">sed</span> <span class="s1">'s/completion_promise: *//'</span> | <span class="nb">sed</span> <span class="s1">'s/^"\(.*\)"$/\1/'</span><span class="si">)</span>
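
# Sketch only, not part of the original script: fm_get is a hypothetical helper that
# wraps the grep + sed pattern above. It assumes flat "key: value" lines and strips
# one pair of surrounding double quotes from the value.
fm_get() {
  local key="$1" text="$2"
  printf '%s\n' "$text" | sed -n "s/^${key}: *//p" | sed 's/^"\(.*\)"$/\1/' | head -n 1
}
# Possible usage: ITERATION=$(fm_get iteration "$FRONTMATTER")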
</code></pre></div></div> <p>The state file is structured as follows:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">iteration</span><span class="pi">:</span> <span class="m">3</span>
<span class="na">max_iterations</span><span class="pi">:</span> <span class="m">10</span>
<span class="na">completion_promise</span><span class="pi">:</span> <span class="s2">"</span><span class="s">DONE"</span>
<span class="na">session_id</span><span class="pi">:</span> <span class="s">abc123</span>
<span class="nn">---</span>
<span class="s">Your prompt text here.</span>
<span class="s">This prompt is re-injected into Claude on every iteration.</span>
</code></pre></div></div> <h3 id="3-session-隔離">3. Session Isolation</h3> <p>The state file is <strong>project-scoped</strong> (it lives in the <code class="language-plaintext highlighter-rouge">.claude/</code> directory), but the Stop Hook fires in <strong>every Claude Code session</strong> opened on that project. If another session opens the same project, it should not be blocked by this loop:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">STATE_SESSION</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$FRONTMATTER</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'^session_id:'</span> | <span class="se">\</span>
  <span class="nb">sed</span> <span class="s1">'s/session_id: *//'</span> <span class="o">||</span> <span class="nb">true</span><span class="si">)</span>
<span class="nv">HOOK_SESSION</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$HOOK_INPUT</span><span class="s2">"</span> | jq <span class="nt">-r</span> <span class="s1">'.session_id // ""'</span><span class="si">)</span>

<span class="k">if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$STATE_SESSION</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$STATE_SESSION</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"</span><span class="nv">$HOOK_SESSION</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">exit </span>0  <span class="c"># Wrong session - don't interfere</span>
<span class="k">fi</span>
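
# Sketch only, not part of the original script: the isolation rule above restated as a
# predicate. owns_loop is a hypothetical name. A state file without session_id is
# treated as belonging to every session, which matches the check above.
owns_loop() {
  local state_session="$1" hook_session="$2"
  [[ -z "$state_session" || "$state_session" == "$hook_session" ]]
}
# Possible usage: owns_loop "$STATE_SESSION" "$HOOK_SESSION" || exit 0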
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: without session isolation, running Claude Code in two terminals on the same project lets one session's loop block the other session from exiting normally. This is an easy trap to fall into in real deployments.</p> </blockquote> <h3 id="4-迭代上限與數值驗證">4. Iteration Limit and Numeric Validation</h3> <p>Before doing any arithmetic, the hook checks that the field is a valid number, so that a hand-edited state file cannot make bash throw an error:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[[</span> <span class="o">!</span> <span class="s2">"</span><span class="nv">$ITERATION</span><span class="s2">"</span> <span class="o">=</span>~ ^[0-9]+<span class="nv">$ </span><span class="o">]]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Warning: State file corrupted"</span> <span class="o">&gt;</span>&amp;2
  <span class="nb">rm</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span>
  <span class="nb">exit </span>0
<span class="k">fi</span>
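
# Sketch only, not part of the original script: this excerpt validates ITERATION only,
# while MAX_ITERATIONS feeds the same arithmetic tests below. is_uint is a hypothetical
# helper that could guard both fields with the same regex.
is_uint() { [[ "$1" =~ ^[0-9]+$ ]]; }
# Possible usage: is_uint "$MAX_ITERATIONS" || { rm "$RALPH_STATE_FILE"; exit 0; }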

<span class="c"># Check if max iterations reached</span>
<span class="k">if</span> <span class="o">[[</span> <span class="nv">$MAX_ITERATIONS</span> <span class="nt">-gt</span> 0 <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="o">[[</span> <span class="nv">$ITERATION</span> <span class="nt">-ge</span> <span class="nv">$MAX_ITERATIONS</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Ralph loop: Max iterations (</span><span class="nv">$MAX_ITERATIONS</span><span class="s2">) reached."</span>
  <span class="nb">rm</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span>
  <span class="nb">exit </span>0
<span class="k">fi</span>
</code></pre></div></div> <h3 id="5-transcript-解析">5. Transcript Parsing</h3> <p>Claude Code's transcript is in <strong>JSONL format</strong> (one JSON object per line), and each content block (text / tool_use / thinking) is a separate line. The hook has to extract the last piece of assistant text from it:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">TRANSCRIPT_PATH</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$HOOK_INPUT</span><span class="s2">"</span> | jq <span class="nt">-r</span> <span class="s1">'.transcript_path'</span><span class="si">)</span>

<span class="c"># Extract last 100 assistant lines for performance</span>
<span class="nv">LAST_LINES</span><span class="o">=</span><span class="si">$(</span><span class="nb">grep</span> <span class="s1">'"role":"assistant"'</span> <span class="s2">"</span><span class="nv">$TRANSCRIPT_PATH</span><span class="s2">"</span> | <span class="nb">tail</span> <span class="nt">-n</span> 100<span class="si">)</span>

<span class="c"># Parse and get the final text block</span>
<span class="nv">LAST_OUTPUT</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$LAST_LINES</span><span class="s2">"</span> | jq <span class="nt">-rs</span> <span class="s1">'
  map(.message.content[]? | select(.type == "text") | .text) | last // ""
'</span><span class="si">)</span>
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: <code class="language-plaintext highlighter-rouge">tail -n 100</code> is a performance measure. The transcript of a long session can run to thousands of lines, and slurping all of it with jq would be slow. The last 100 assistant lines are normally enough to cover the most recent assistant response.</p> </blockquote> <h3 id="6-completion-promise-偵測">6. Completion Promise Detection</h3> <p>Ralph Loop uses a <code class="language-plaintext highlighter-rouge">&lt;promise&gt;</code> tag as its completion signal. When Claude writes <code class="language-plaintext highlighter-rouge">&lt;promise&gt;DONE&lt;/promise&gt;</code> in its output, the task is considered complete:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$COMPLETION_PROMISE</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"null"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$COMPLETION_PROMISE</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
  <span class="c"># Extract text from &lt;promise&gt; tags using Perl for multiline support</span>
  <span class="nv">PROMISE_TEXT</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$LAST_OUTPUT</span><span class="s2">"</span> | <span class="se">\</span>
    perl <span class="nt">-0777</span> <span class="nt">-pe</span> <span class="s1">'s/.*?&lt;promise&gt;(.*?)&lt;\/promise&gt;.*/$1/s; s/^\s+|\s+$//g; s/\s+/ /g'</span> <span class="se">\</span>
    2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">""</span><span class="si">)</span>

  <span class="c"># Literal string comparison: the quoted right-hand side disables glob matching</span>
  <span class="k">if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$PROMISE_TEXT</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$PROMISE_TEXT</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"</span><span class="nv">$COMPLETION_PROMISE</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Ralph loop: Detected &lt;promise&gt;</span><span class="nv">$COMPLETION_PROMISE</span><span class="s2">&lt;/promise&gt;"</span>
    <span class="nb">rm</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span>
    <span class="nb">exit </span>0
  <span class="k">fi
fi</span>
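
# Sketch only, not part of the original script: inside [[ ]] the operators = and ==
# behave the same. What keeps the comparison above literal is the double-quoted
# right-hand side; left unquoted, it is treated as a glob pattern.
# promise_matches and promise_globs are hypothetical names used for the contrast.
promise_matches() { [[ "$1" == "$2" ]]; }   # quoted: literal comparison
promise_globs()   { [[ "$1" == $2 ]]; }     # unquoted: pattern match, shown for contrast only
# promise_matches "ALL DONE" "ALL*" is false, promise_globs "ALL DONE" "ALL*" is true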
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: the comparison is literal because the right-hand side, <code class="language-plaintext highlighter-rouge">"$COMPLETION_PROMISE"</code>, is double-quoted. Inside <code class="language-plaintext highlighter-rouge">[[ ]]</code>, <code class="language-plaintext highlighter-rouge">=</code> and <code class="language-plaintext highlighter-rouge">==</code> are equivalent, and both perform <strong>glob pattern matching</strong> when the right-hand side is unquoted, so a promise containing <code class="language-plaintext highlighter-rouge">*</code> or <code class="language-plaintext highlighter-rouge">?</code> would match unintended text. Quoting is what makes it a safe literal string comparison.</p> </blockquote> <h3 id="7-迴圈繼續">7. Continuing the Loop</h3> <p>If the promise has not been met and the iteration limit has not been reached, the hook:</p> <ol> <li>Updates the iteration count in the state file (atomically)</li> <li>Extracts the prompt text</li> <li>Outputs JSON that blocks the session from ending</li> </ol> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">NEXT_ITERATION</span><span class="o">=</span><span class="k">$((</span>ITERATION <span class="o">+</span> <span class="m">1</span><span class="k">))</span>

<span class="c"># Atomic state update: temp file + mv</span>
<span class="nv">TEMP_FILE</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">RALPH_STATE_FILE</span><span class="k">}</span><span class="s2">.tmp.</span><span class="nv">$$</span><span class="s2">"</span>
<span class="nb">sed</span> <span class="s2">"s/^iteration: .*/iteration: </span><span class="nv">$NEXT_ITERATION</span><span class="s2">/"</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span> <span class="o">&gt;</span> <span class="s2">"</span><span class="nv">$TEMP_FILE</span><span class="s2">"</span>
<span class="nb">mv</span> <span class="s2">"</span><span class="nv">$TEMP_FILE</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span>

<span class="c"># Extract prompt (everything after the closing ---)</span>
<span class="nv">PROMPT_TEXT</span><span class="o">=</span><span class="si">$(</span><span class="nb">awk</span> <span class="s1">'/^---$/{i++; next} i&gt;=2'</span> <span class="s2">"</span><span class="nv">$RALPH_STATE_FILE</span><span class="s2">"</span><span class="si">)</span>

<span class="c"># Output JSON to block the stop and feed prompt back</span>
jq <span class="nt">-n</span> <span class="se">\</span>
  <span class="nt">--arg</span> prompt <span class="s2">"</span><span class="nv">$PROMPT_TEXT</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--arg</span> msg <span class="s2">"Ralph iteration </span><span class="nv">$NEXT_ITERATION</span><span class="s2"> | To stop: output &lt;promise&gt;</span><span class="nv">$COMPLETION_PROMISE</span><span class="s2">&lt;/promise&gt;"</span> <span class="se">\</span>
  <span class="s1">'{ "decision": "block", "reason": $prompt, "systemMessage": $msg }'</span>
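
# Sketch only, not part of the original script: the temp-file-plus-mv update above as a
# function. bump_iteration is a hypothetical name. The temp file is created next to the
# target so the rename stays on one filesystem, and it is removed if sed fails.
bump_iteration() {
  local file="$1" next="$2" tmp
  tmp="$(mktemp "${file}.tmp.XXXXXX")" || return 1
  if sed "s/^iteration: .*/iteration: ${next}/" "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
# Possible usage: bump_iteration "$RALPH_STATE_FILE" "$NEXT_ITERATION"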
</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: within a single filesystem, <code class="language-plaintext highlighter-rouge">mv</code> is a rename, which POSIX specifies as an <strong>atomic operation</strong>. Writing over the state file directly can leave it truncated if the process is killed mid-write, and <code class="language-plaintext highlighter-rouge">sed -i</code> behaves differently between GNU and BSD sed. Temp file + mv ensures the state file is always complete.</p> </blockquote> <hr/> <h2 id="實際應用場景">Practical Use Cases</h2> <p><strong>Automated test-fix loop:</strong></p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/ralph-loop <span class="s2">"Run the failing tests. Fix the code. Re-run tests.
Repeat until all pass."</span> <span class="nt">--max-iterations</span> 5 <span class="nt">--completion-promise</span> <span class="s2">"ALL TESTS PASS"</span>
</code></pre></div></div> <p><strong>Quality self-review loop:</strong></p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/ralph-loop <span class="s2">"Review the PR diff. Check for bugs, security issues,
and style violations. If you find issues, fix them and re-review."</span> <span class="nt">--max-iterations</span> 3 <span class="nt">--completion-promise</span> <span class="s2">"REVIEW COMPLETE"</span>
</code></pre></div></div> <p><strong>Incremental refactoring:</strong></p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/ralph-loop <span class="s2">"Refactor the auth module. Each iteration, improve one aspect:
naming, error handling, or test coverage."</span> <span class="nt">--max-iterations</span> 4 <span class="nt">--completion-promise</span> <span class="s2">"REFACTOR DONE"</span>
</code></pre></div></div> <hr/> <h2 id="安全機制總結">Safety Mechanisms at a Glance</h2> <table> <thead> <tr> <th>Mechanism</th> <th>Purpose</th> <th>Implementation</th> </tr> </thead> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">max_iterations</code></td> <td>Prevents infinite loops</td> <td>Deletes the state file and exits 0 once the limit is reached</td> </tr> <tr> <td>Session Isolation</td> <td>Prevents cross-session interference</td> <td>Compares <code class="language-plaintext highlighter-rouge">session_id</code></td> </tr> <tr> <td>Numeric Validation</td> <td>Prevents crashes caused by a corrupted state file</td> <td>Regex validation + cleanup</td> </tr> <tr> <td>Atomic Update</td> <td>Prevents a half-written state file</td> <td>temp file + <code class="language-plaintext highlighter-rouge">mv</code></td> </tr> <tr> <td>Promise Literal Match</td> <td>Prevents glob characters from matching by accident</td> <td><code class="language-plaintext highlighter-rouge">=</code> instead of <code class="language-plaintext highlighter-rouge">==</code></td> </tr> <tr> <td>Transcript Cap</td> <td>Prevents slowdowns in long sessions</td> <td><code class="language-plaintext highlighter-rouge">tail -n 100</code></td> </tr> </tbody> </table> <hr/> <h2 id="references">References</h2> <ul> <li><strong>Claude Code Hooks</strong>: <a href="https://docs.anthropic.com/en/docs/claude-code/hooks">Official Documentation</a></li> <li><strong>Ralph Loop Plugin</strong>: <a href="https://www.npmjs.com/package/ralph-wiggum">ralph-wiggum on npm</a></li> <li><strong>Stop Hook Deep Dive</strong>: <a href="https://claudefa.st/blog/tools/hooks/stop-hook-task-enforcement">Claude Code Stop Hook: Force Task Completion</a></li> <li><strong>Source Script</strong>: <a href="https://github.com/osisdie/osisdie.github.io/blob/main/assets/docs/code/ralph-loop/stop-hook.sh"><code class="language-plaintext highlighter-rouge">stop-hook.sh</code></a></li> </ul> <hr/> <blockquote> <p>Source: <a href="https://github.com/osisdie/osisdie.github.io">osisdie/osisdie.github.io</a> (PRs and Issues welcome!)</p> </blockquote>]]></content><author><name></name></author><category term="claude-code"/><category term="hooks"/><category 
term="agent-loop"/><category term="automation"/><category term="bash"/><summary type="html"><![CDATA[Dissecting how the Ralph Loop Stop Hook works: the key technique that lets an AI agent iterate autonomously]]></summary></entry><entry><title type="html">Core Challenges and Breakthrough Directions for Integrating RAG with LLMs</title><link href="https://osisdie.github.io/blog/2026/rag-challenges-and-breakthroughs/" rel="alternate" type="text/html" title="Core Challenges and Breakthrough Directions for Integrating RAG with LLMs"/><published>2026-03-25T02:00:00+00:00</published><updated>2026-03-25T02:00:00+00:00</updated><id>https://osisdie.github.io/blog/2026/rag-challenges-and-breakthroughs</id><content type="html" xml:base="https://osisdie.github.io/blog/2026/rag-challenges-and-breakthroughs/"><![CDATA[<figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/blog/2026/rag-challenges/rag-challenges-overview-480.webp 480w,/assets/img/blog/2026/rag-challenges/rag-challenges-overview-800.webp 800w,/assets/img/blog/2026/rag-challenges/rag-challenges-overview-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/blog/2026/rag-challenges/rag-challenges-overview.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" alt="RAG Challenges and Solutions Architecture" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> <figcaption class="caption">Core RAG challenges and the breakthrough solutions that address them</figcaption> </figure> <blockquote> <p><strong>Abstract</strong>: As RAG (Retrieval-Augmented Generation) moves from proof-of-concept to production in 2026, six core challenges have emerged: retrieval quality gaps, the “Lost in the Middle” attention problem, knowledge conflicts between retrieved documents and parametric memory, hallucination propagation from bad retrievals, inability to perform multi-hop reasoning, and latency/cost at scale. 
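This article examines each challenge and maps it to four breakthrough solutions: Hybrid Search + Reranking, Agentic RAG / Graph RAG, Self-RAG, and the RAGAS evaluation framework, with pseudocode examples and production considerations.</p> </blockquote> <p>In 2026 production environments, RAG is no longer a nice-to-have but a must-have, yet most teams are still stepping on landmines. This article analyzes the six core challenges RAG faces and four breakthrough directions, with pseudocode and practical notes.</p> <hr/> <h2 id="核心挑戰">Core Challenges</h2> <h3 id="1-檢索品質的瓶頸">1. The Retrieval Quality Bottleneck</h3> <p>RAG only works if the right document is <strong>findable</strong> in the first place. Traditional vector similarity search (<strong>cosine similarity</strong>) easily goes wrong on ambiguous or polysemous queries: a query for “Apple market cap”, for example, may recall documents about both the fruit and the tech company. In addition, a poorly chosen <strong>chunking</strong> strategy can split a single concept across chunks, leaving each chunk without the context that gives it meaning. → This is exactly the problem <strong>Hybrid Search + Reranking</strong> addresses.</p> <h3 id="2-知識整合的挑戰lost-in-the-middle">2. The Knowledge Integration Challenge (Lost in the Middle)</h3> <p>Research shows that when an LLM’s <strong>context window</strong> is packed with many retrieved documents, <strong>the model’s attention to documents in the middle positions drops significantly</strong>, so key information is easily overlooked. The problem is especially pronounced once the context exceeds 4k tokens. → The remedy is <strong>long-context reordering</strong> combined with compressive summarization.</p> <h3 id="3-知識衝突knowledge-conflict">3. Knowledge Conflict</h3> <p>Retrieved documents may contradict the LLM’s own <strong>parametric knowledge</strong>. For example, the model learned during training that “X is the CEO”, but the latest documents show the role has changed hands; the model may stubbornly stick to its stale knowledge. → This calls for <strong>instruction reinforcement</strong>: an explicit prompt stating that the documents take precedence.</p> <h3 id="4-幻覺傳染hallucination-propagation">4. Hallucination Propagation</h3> <p>If the retriever recalls wrong or irrelevant documents, the LLM tends to “trust” them and generate from them, which can be <strong>worse than not using RAG at all</strong>, because the model packages the wrong information as a well-grounded answer. → <strong>Faithfulness evaluation models</strong> and the RAGAS framework can detect this problem effectively.</p> <h3 id="5-跨文件推理受限multi-hop-reasoning">5. Limited Cross-Document Reasoning (Multi-hop Reasoning)</h3> <p>Complex questions require reasoning across several documents (A → B → C), but standard RAG retrieves in a single shot and cannot, as a human would, find intermediate clues step by step and then dig further. → <strong>Agentic RAG</strong> and <strong>Graph RAG</strong> were created for exactly this.</p> <h3 id="6-延遲與成本">6. 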
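Latency and Cost</h3> <p>Every request needs a real-time embedding search and reranking on top of LLM inference, so end-to-end latency is a significant challenge in production. → <strong>Caching plus precomputed indexes</strong> can mitigate it effectively.</p> <hr/> <h2 id="深入解析突破解決方案">Deep Dive: Breakthrough Solutions</h2> <h3 id="hybrid-search--reranking">Hybrid Search + Reranking</h3> <p>Combine <strong>sparse retrieval</strong> (BM25, which excels at exact keyword matching) with <strong>dense vector retrieval</strong>, then rerank the merged results with a <strong>Cross-Encoder</strong>. This <strong>two-stage architecture</strong> (recall up to 100 documents → rerank to the top 5) greatly improves the quality of the documents finally passed to the LLM and is the mainstream approach in industry today.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Hybrid Search + Reranking pseudocode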
</span><span class="n">bm25_results</span> <span class="o">=</span> <span class="nf">bm25_search</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">top_k</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>
<span class="n">vector_results</span> <span class="o">=</span> <span class="nf">vector_search</span><span class="p">(</span><span class="nf">embed</span><span class="p">(</span><span class="n">query</span><span class="p">),</span> <span class="n">top_k</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

<span class="c1"># Reciprocal Rank Fusion
</span><span class="n">candidates</span> <span class="o">=</span> <span class="nf">rrf_merge</span><span class="p">(</span><span class="n">bm25_results</span><span class="p">,</span> <span class="n">vector_results</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">60</span><span class="p">)</span>

<span class="c1"># Cross-Encoder reranking
</span><span class="n">scored</span> <span class="o">=</span> <span class="n">cross_encoder</span><span class="p">.</span><span class="nf">predict</span><span class="p">([(</span><span class="n">query</span><span class="p">,</span> <span class="n">doc</span><span class="p">)</span> <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">candidates</span><span class="p">])</span>
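<span class="c1"># Sort (score, doc) pairs so the ranking keeps the documents, not just the scores
</span><span class="n">ranked</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="nf">zip</span><span class="p">(</span><span class="n">scored</span><span class="p">,</span> <span class="n">candidates</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">pair</span><span class="p">:</span> <span class="n">pair</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">top_docs</span> <span class="o">=</span> <span class="p">[</span><span class="n">doc</span> <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">ranked</span><span class="p">[:</span><span class="mi">5</span><span class="p">]]</span>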
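</code></pre></div></div> <p>The <code class="language-plaintext highlighter-rouge">rrf_merge</code> step in the pseudocode above is left undefined. A minimal sketch of Reciprocal Rank Fusion follows, assuming each retriever returns a best-first list of document IDs. The function body and the sample IDs are illustrative assumptions, not code from a specific library. Each document scores the sum of 1 / (k + rank) over the lists it appears in, and <code class="language-plaintext highlighter-rouge">k=60</code> matches the constant used above.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import defaultdict


def rrf_merge(*rankings, k=60):
    """Fuse best-first lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document ranked high in any list earns a larger share
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrievers
bm25_ids = ["d3", "d1", "d7"]
vector_ids = ["d1", "d9", "d3"]
fused = rrf_merge(bm25_ids, vector_ids, k=60)
print(fused)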
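</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: Cross-Encoder reranking adds roughly 50-200ms of latency, depending on model size. A lightweight reranker (such as <code class="language-plaintext highlighter-rouge">bge-reranker-v2-m3</code>) can finish in &lt;50ms. In the recall stage, use <strong>ANN (approximate nearest neighbor) search</strong> (HNSW) rather than brute-force search to lower p99 latency.</p> </blockquote> <h3 id="agentic-rag-與-graph-rag">Agentic RAG and Graph RAG</h3> <p>Agentic RAG lets the LLM act as an agent that decides the next query based on the results of the previous retrieval, which supports multi-step reasoning across documents. <strong>Graph RAG</strong> (proposed by Microsoft in 2024) instead stores knowledge as a graph that captures the relationships between entities, and performs markedly better than traditional vector RAG on “comparison” and “concept-linking” questions.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Agentic RAG pseudocode: iterative retrieval loop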
</span><span class="n">context</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">step</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">MAX_ITERATIONS</span><span class="p">):</span>  <span class="c1"># guard: prevent infinite loops
</span>    <span class="n">action</span> <span class="o">=</span> <span class="n">llm</span><span class="p">.</span><span class="nf">decide</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>  <span class="c1"># "search" | "answer" | "refine"
</span>    <span class="k">if</span> <span class="n">action</span> <span class="o">==</span> <span class="sh">"</span><span class="s">answer</span><span class="sh">"</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">llm</span><span class="p">.</span><span class="nf">generate</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">action</span> <span class="o">==</span> <span class="sh">"</span><span class="s">search</span><span class="sh">"</span><span class="p">:</span>
        <span class="n">new_query</span> <span class="o">=</span> <span class="n">llm</span><span class="p">.</span><span class="nf">rewrite_query</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>
        <span class="n">docs</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="n">new_query</span><span class="p">)</span>
        <span class="n">context</span><span class="p">.</span><span class="nf">extend</span><span class="p">(</span><span class="n">docs</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">action</span> <span class="o">==</span> <span class="sh">"</span><span class="s">refine</span><span class="sh">"</span><span class="p">:</span>
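        <span class="n">query</span> <span class="o">=</span> <span class="n">llm</span><span class="p">.</span><span class="nf">decompose</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>  <span class="c1"># break into sub-questions
</span><span class="c1"># Budget exhausted without an "answer" action: fall back to the context gathered so far
</span><span class="k">return</span> <span class="n">llm</span><span class="p">.</span><span class="nf">generate</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>  <span class="c1"># best-effort answer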
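</span></code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: Always set <code class="language-plaintext highlighter-rouge">MAX_ITERATIONS</code> (3-5 is a reasonable range) so the agent cannot get stuck in an infinite loop. Token consumption accumulates with every iteration, so monitor cost. Graph RAG has a high graph-construction cost (the indexing stage), but its query-time efficiency is comparable to vector RAG.</p> </blockquote> <h3 id="self-rag">Self-RAG</h3> <p>This is a more fundamental architectural change: the model learns to insert special tokens during generation that decide “do I need to retrieve now?” and “is this passage supported by the documents?”, so the retrieval decision is internalized in the model itself rather than handled by a fixed external pipeline.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Self-RAG: model generates special tokens during inference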
</span><span class="n">output_tokens</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">segment</span> <span class="ow">in</span> <span class="nf">generate_segments</span><span class="p">(</span><span class="n">query</span><span class="p">):</span>
    <span class="c1"># Model outputs a retrieval decision token
</span>    <span class="k">if</span> <span class="n">segment</span><span class="p">.</span><span class="n">retrieval_token</span> <span class="o">==</span> <span class="sh">"</span><span class="s">[Retrieve=Yes]</span><span class="sh">"</span><span class="p">:</span>
        <span class="n">docs</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="n">segment</span><span class="p">.</span><span class="n">text</span><span class="p">)</span>
        <span class="n">segment</span> <span class="o">=</span> <span class="nf">regenerate_with_context</span><span class="p">(</span><span class="n">segment</span><span class="p">,</span> <span class="n">docs</span><span class="p">)</span>
    <span class="c1"># Model self-evaluates with support token
</span>    <span class="k">if</span> <span class="n">segment</span><span class="p">.</span><span class="n">support_token</span> <span class="o">==</span> <span class="sh">"</span><span class="s">[Fully Supported]</span><span class="sh">"</span><span class="p">:</span>
        <span class="n">output_tokens</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">segment</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>  <span class="c1"># partially supported or unsupported: keep the segment but flag it</span>
        <span class="n">output_tokens</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nf">flag_as_uncertain</span><span class="p">(</span><span class="n">segment</span><span class="p">))</span>
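</code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: Self-RAG requires a specially fine-tuned model (the original paper fine-tuned Llama 2). Inference latency is roughly 1.5-2x that of standard RAG because of the repeated generation and evaluation passes. It suits <strong>high-precision scenarios</strong> (medical, legal) and is a poor fit for low-latency requirements.</p> </blockquote> <h3 id="ragas-評估框架">RAGAS Evaluation Framework</h3> <p>Evaluating RAG systems has long been a pain point. RAGAS provides automated evaluation along four dimensions:</p> <ul> <li><strong>Faithfulness</strong>: whether the generation is faithful to the documents</li> <li><strong>Answer Relevancy</strong>: whether the answer addresses the question</li> <li><strong>Context Recall</strong>: whether the needed information was retrieved</li> <li><strong>Context Precision</strong>: whether the retrieved documents are relevant</li> </ul> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># RAGAS evaluation pseudocode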
</span><span class="k">for</span> <span class="n">question</span><span class="p">,</span> <span class="n">ground_truth</span> <span class="ow">in</span> <span class="n">eval_dataset</span><span class="p">:</span>
    <span class="n">contexts</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="n">question</span><span class="p">)</span>
    <span class="n">answer</span> <span class="o">=</span> <span class="n">llm</span><span class="p">.</span><span class="nf">generate</span><span class="p">(</span><span class="n">question</span><span class="p">,</span> <span class="n">contexts</span><span class="p">)</span>

    <span class="n">scores</span> <span class="o">=</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">faithfulness</span><span class="sh">"</span><span class="p">:</span>      <span class="n">ragas</span><span class="p">.</span><span class="nf">faithfulness</span><span class="p">(</span><span class="n">answer</span><span class="p">,</span> <span class="n">contexts</span><span class="p">),</span>
        <span class="sh">"</span><span class="s">answer_relevancy</span><span class="sh">"</span><span class="p">:</span>  <span class="n">ragas</span><span class="p">.</span><span class="nf">relevancy</span><span class="p">(</span><span class="n">answer</span><span class="p">,</span> <span class="n">question</span><span class="p">),</span>
        <span class="sh">"</span><span class="s">context_recall</span><span class="sh">"</span><span class="p">:</span>    <span class="n">ragas</span><span class="p">.</span><span class="nf">recall</span><span class="p">(</span><span class="n">contexts</span><span class="p">,</span> <span class="n">ground_truth</span><span class="p">),</span>
        <span class="sh">"</span><span class="s">context_precision</span><span class="sh">"</span><span class="p">:</span> <span class="n">ragas</span><span class="p">.</span><span class="nf">precision</span><span class="p">(</span><span class="n">contexts</span><span class="p">,</span> <span class="n">question</span><span class="p">),</span>
    <span class="p">}</span>
<span class="c1"># Aggregate scores to track system improvements over time
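</span></code></pre></div></div> <p>The closing comment above calls for aggregating scores. A minimal sketch, assuming each evaluated question yields a dict of metric scores like the <code class="language-plaintext highlighter-rouge">scores</code> dict above: average every metric, then compare the averages with minimum thresholds. <code class="language-plaintext highlighter-rouge">aggregate_scores</code>, <code class="language-plaintext highlighter-rouge">failing_metrics</code>, the sample numbers, and the 0.75 thresholds are illustrative assumptions, not part of RAGAS.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from statistics import mean


def aggregate_scores(per_question_scores):
    """Average each metric over all evaluated questions."""
    metrics = per_question_scores[0].keys()
    return {m: mean(row[m] for row in per_question_scores) for m in metrics}


def failing_metrics(averages, thresholds):
    """Return the metrics whose average falls below the required minimum."""
    return {m: avg for m, avg in averages.items() if thresholds.get(m, 0.0) > avg}


# Hypothetical per-question scores from two evaluated questions
runs = [
    {"faithfulness": 0.9, "context_recall": 0.6},
    {"faithfulness": 0.7, "context_recall": 0.8},
]
averages = aggregate_scores(runs)
failures = failing_metrics(averages, {"faithfulness": 0.75, "context_recall": 0.75})
print(averages, failures)
<span class="c1"># A non-empty failures dict is the signal a pipeline could use to fail the run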
</span></code></pre></div></div> <blockquote> <p><strong>Production Notes</strong>: RAGAS itself uses an LLM for evaluation (LLM-as-judge), so evaluating costs about as much as running inference on the system under evaluation. A practical setup is to run RAGAS in CI/CD against a <strong>golden dataset</strong> (50-100 samples) and set thresholds as a quality gate.</p> </blockquote> <p>Only with quantifiable metrics does system improvement have a direction.</p> <hr/> <h2 id="總結趨勢">Summary and Trends</h2> <p>The field is moving from “static, one-shot retrieval” toward “dynamic, self-reflective, multi-turn” architectures. The rise of long-context models (such as Gemini 1.5 Pro with its 1M-token window) has led some to question whether RAG is still necessary, but the real value of RAG lies in keeping knowledge <strong>updatable and traceable</strong>, not merely in working around context length, and a larger context window alone cannot replace that. The two are more likely complementary than substitutes.</p> <hr/> <h2 id="references">References</h2> <ul> <li><strong>Lost in the Middle</strong>: Liu et al., 2023. <a href="https://arxiv.org/abs/2307.03172">Lost in the Middle: How Language Models Use Long Contexts</a></li> <li><strong>Graph RAG</strong>: Microsoft, 2024. <a href="https://arxiv.org/abs/2404.16130">From Local to Global: A Graph RAG Approach</a> · <a href="https://github.com/microsoft/graphrag">GitHub</a></li> <li><strong>Self-RAG</strong>: Asai et al., 2023. <a href="https://arxiv.org/abs/2310.11511">Self-RAG: Learning to Retrieve, Generate, and Critique</a></li> <li><strong>RAGAS</strong>: <a href="https://github.com/explodinggradients/ragas">GitHub</a> · <a href="https://docs.ragas.io/">Documentation</a></li> <li><strong>Corrective RAG</strong>: Yan et al., 2024. 
<a href="https://arxiv.org/abs/2401.15884">Corrective Retrieval Augmented Generation</a></li> </ul> <h3 id="recommended-repos">Recommended Repos</h3> <ul> <li><a href="https://github.com/microsoft/graphrag">microsoft/graphrag</a> — Production-ready Graph RAG implementation</li> <li><a href="https://github.com/explodinggradients/ragas">explodinggradients/ragas</a> — RAG evaluation framework</li> <li><a href="https://github.com/run-llama/llama_index">run-llama/llama_index</a> — Full-featured RAG framework</li> <li><a href="https://github.com/langchain-ai/langchain">langchain-ai/langchain</a> — LLM application framework with RAG support</li> </ul> <hr/> <blockquote> <p>Source: <a href="https://github.com/osisdie/osisdie.github.io">osisdie/osisdie.github.io</a> — PRs and Issues welcome!</p> </blockquote>]]></content><author><name></name></author><category term="rag"/><category term="llm"/><category term="ai"/><category term="graph-rag"/><category term="agentic-rag"/><category term="self-rag"/><category term="ragas"/><summary type="html"><![CDATA[深入分析 2026 年 RAG 技術面臨的六大挑戰與四大突破解決方案]]></summary></entry></feed>