LLM、推理模型、Agent、Harness大解析：揭秘编程智能体的强大内核！

大家好，我是讯享网，很高兴认识大家。这里提供最前沿的Ai技术和互联网信息。

这篇文章来自 Sebastian Raschka 大神，这篇文章的核心观点是：编程智能体之所以比普通聊天式 LLM 更强，往往不是单靠模型本身，而是靠 live repo context、提示词缓存、结构化工具、上下文压缩、会话记忆和有边界委派等系统设计共同撑起来的。

最后作者还对比了编程工具Codex，Claude Code和OpenClaw的区别到底是什么。

如果你对最近正流行的各种AI概念有迷惑，那么本文正好适合你。

第 1 段 · 先抛出总问题：编程智能体到底由什么组成

In this article, I want to cover the overall design of coding agents and agent harnesses: what they are, how they work, and how the different pieces fit together in practice.

在这篇文章里，作者想系统介绍编程智能体与智能体脚手架的整体设计：它们是什么、如何运作，以及各个部件在实际中怎样彼此配合。

Readers of my Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books often ask about agents, so I thought it would be useful to write a reference I can point to.

不少读者在看完他关于大语言模型和推理模型的书后，常常会继续追问“智能体到底是怎么回事”，所以他想写一篇可以反复引用的系统性说明。

重点词汇与表达

英文中文笔记 overall design 整体设计常用于引出系统、产品或架构的全局视角。文中不是讲某个零件，而是讲整套编程智能体的设计框架。 coding agents 编程智能体这里不是泛泛的 agent，而是专门针对软件开发任务设计的智能体系统。 agent harnesses 智能体脚手架 / 智能体框架层这是本文最关键的术语之一，指围绕模型搭建起来的那层软件系统。 fit together in practice 在实际中如何拼接协同常用于讲多个模块如何配合。文中强调的是工程实践里的协同，而不是抽象定义。 write a reference 写一篇参考说明这里的 reference 不是“引用文献”，而是“可以反复指给别人看的说明文”。

第 2 段 · 为什么这个话题重要：真实进步很多来自“怎么用模型”

More generally, agents have become an important topic because much of the recent progress in practical LLM systems is not just about better models, but about how we use them.

更广义地说，智能体之所以成为重要议题，是因为近来许多实用型 LLM 系统的进步，并不只是来自模型更强，还来自我们如何使用模型。

In many real-world applications, the surrounding system, such as tool use, context management, and memory, plays as much of a role as the model itself. This also helps explain why systems like Claude Code or Codex can feel significantly more capable than the same models used in a plain chat interface.

在很多真实应用里，工具调用、上下文管理和记忆等外围系统，与模型本身一样重要。这也解释了为什么 Claude Code 或 Codex 这类系统，往往会让人觉得比相同模型在普通聊天界面里更能干。

重点词汇与表达

英文中文笔记 practical LLM systems 实用型 LLM 系统常用来强调“真实落地”的系统，而不是实验室里的裸模型对比。文中在为后文的系统视角铺路。 surrounding system 外围系统很适合写“模型之外”的那些支撑层。文中指工具、上下文和记忆等配套机制。 plays as much of a role as 起到同样重要的作用这是很有力量的比较表达。文中明确说外围系统的作用不亚于模型本身。 plain chat interface 普通聊天界面用于和 agent 产品形成对照。文中暗示“同模型不同包装，体验差距很大”。 feel significantly more capable 让人感觉强得多这个表达适合写用户感知层面的能力差异。文中说的是产品体验，不只是 benchmark 分数。

第 3 段 · 先把 Claude Code 和 Codex 放到正确位置上

Claude Code or the Codex CLI are essentially agentic coding tools that wrap an LLM in an application layer, a so-called agentic harness, to be more convenient and better-performing for coding tasks.

Claude Code 和 Codex CLI 本质上都是“把 LLM 包在应用层里的编程智能体工具”。这层应用层，也就是所谓的 agentic harness，会让模型在编码任务里更方便也更高效。

Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity.

编程智能体是为软件工作专门工程化出来的系统。真正关键的并不只有模型选择，还有代码仓上下文、工具设计、提示词缓存稳定性、记忆，以及长会话延续能力。

重点词汇与表达

英文中文笔记 wrap an LLM in an application layer 在 LLM 外包一层应用层这是理解 harness 的核心说法。文中表示模型不是直接裸跑，而是被系统化封装起来。 agentic coding tools 具备智能体特征的编程工具这个词组强调它们不是普通 IDE 插件，而是能循环观察、调用工具和推进任务的系统。 engineered for software work 为软件工作专门工程化常见于系统设计语境。文中强调 coding agent 是为特定任务深度优化出来的。 prompt-cache stability 提示词缓存稳定性比较技术化的表达。文中把它列为 coding agent 的关键工程特征之一。 long-session continuity 长会话延续能力很适合写 agent 能否持续处理复杂任务。文中强调编程不是几轮对话就结束。

第 4 段 · 把 LLM、推理模型、agent、harness 分清楚

An LLM is the core next-token model. A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers.

LLM 是核心的 next-token 模型。推理模型本质上仍然是 LLM，只是通常经过训练和提示优化，会在中间推理、验证或候选答案搜索上投入更多推理时算力。

An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer decides what to inspect next, which tools to call, how to update its state, and when to stop.

智能体则是套在模型外层的一层控制循环。给定一个目标之后，agent 会决定下一步检查什么、调用什么工具、怎样更新状态，以及什么时候结束。

Roughly, the LLM is the engine, a reasoning model is a beefed-up engine, and an agent harness is what helps us use the model inside a working system.

粗略地说，LLM 像引擎，推理模型像强化版引擎，而 agent harness 则是帮助我们把这个引擎真正装进可运作系统里的那一层。

重点词汇与表达

英文中文笔记 core next-token model 核心的下一个 token 预测模型这是对 LLM 最基础、最技术化的定义。文中用它把讨论拉回模型本体。 inference-time compute 推理时算力适合写 reasoning model 的成本与能力差异。文中强调推理模型更“肯花算力”。 control loop 控制循环是 agent 的关键定义，不是一次性回答，而是循环做判断和行动。 beefed-up engine 强化版引擎一个很形象的比喻。文中借它说明推理模型更强，但通常也更贵。 working system 可运作的系统文中强调 harness 的价值在于把模型放进真实工作流，而不只是单次问答。

第 5 段 · 作者真正想强调的是：harness 往往决定实际体验

Coding work is only partly about next-token generation. A lot of it is about repo navigation, search, function lookup, diff application, test execution, error inspection, and keeping all the relevant information in context.

编码工作只是在一部分层面上属于“继续生成 token”。更大一部分其实是代码仓导航、搜索、函数定位、应用 diff、运行测试、查看报错，以及在整个过程中维持相关上下文。

The takeaway here is that a good coding harness can make a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box.

这里的核心 takeaway 是：一个好的 coding harness，会让推理模型和非推理模型都显得比在普通聊天框里更强。

重点词汇与表达

英文中文笔记 only partly about 只是在一部分意义上属于这个结构很适合写“某事不只是……”。文中用它反驳“编程就是继续写 token”的简化理解。 repo navigation 代码仓导航编程 agent 场景中的高频表达。文中说写代码前要先会在仓库里行动。 diff application 应用 diff 很贴近真实开发工作流。文中强调 agent 的能力包含修改与落地，而不只是解释。 keeping all the relevant information in context 让相关信息始终留在上下文里这是全文最核心的能力要求之一。文中实际上在为“上下文质量决定体验”做铺垫。 takeaway 核心结论很常见的总结词。文中用它明确收束上半篇的概念铺垫。

第 6 段 · 组件一与二：实时代码仓上下文 + 稳定提示前缀缓存

When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, and which project documents might contain instructions.

当用户说“修好测试”或“实现某功能”时，模型最好知道自己是不是在 Git 仓库里、当前分支是什么、哪些项目文档可能带有指令。

The coding agent collects stable facts as a workspace summary upfront so that it is not starting from zero on every prompt.

因此，coding agent 会先收集一组稳定事实，整理成 workspace summary，而不是每次都在零上下文中重新开始。

Smart runtimes do not rebuild everything as one giant undifferentiated prompt on every turn. They keep a stable prompt prefix and only update the parts that change frequently, such as short-term memory, recent transcript, and the newest request.

聪明的 runtime 不会在每一轮都把所有信息重新拼成一个巨大而不分层的 prompt。它们会保留一个稳定的 prompt prefix，只更新那些变化更频繁的部分，比如短期记忆、最近的 transcript 和最新用户请求。

重点词汇与表达

英文中文笔记 stable facts 稳定事实很适合写系统在运行前先抓取的“不太会变”的背景信息。文中指 repo 根目录、分支、文档等。 workspace summary 工作区摘要 coding agent 的核心输入之一。文中说它能避免模型每轮都从零开始猜。 upfront 预先地 / 一开始就常见于流程设计表述。文中强调上下文收集应发生在真正执行任务之前。 giant undifferentiated prompt 巨大且不分层的提示词这是批评低效 prompt 组装方式的形象说法。文中主张要分层、可缓存。 stable prompt prefix 稳定提示前缀 prompt caching 的核心对象。文中说它通常包含规则、工具定义和工作区摘要。

第 7 段 · 组件三：真正像 agent 的地方，是结构化工具调用

A plain model can suggest commands in prose, but an LLM in a coding harness should be able to execute the command and retrieve the results.

普通模型只能在文字里建议你去执行某个命令；但处在 coding harness 里的 LLM，应该能够真正执行命令，并把结果拿回来继续用。

Instead of letting the model improvise arbitrary syntax, the harness usually provides a predefined list of allowed tools with clear inputs and boundaries.

与其让模型自由发挥任意语法，不如由 harness 提供一份预先定义好的工具列表，每个工具都有清晰的输入格式和边界。

The runtime can then ask: Is this a known tool? Are the arguments valid? Does this need user approval? Is the path inside the workspace?

这样一来，runtime 就可以程序化地检查：这是已知工具吗？参数合法吗？需不需要用户批准？目标路径是不是在工作区之内？

重点词汇与表达

英文中文笔记 suggest commands in prose 用文字建议命令用来对比“聊天模型只是建议”与“agent 真执行”之间的差别。 predefined list of allowed tools 预定义且受允许的工具列表这是结构化工具设计的核心。文中强调工具不是随便调用，而是先被约束好。 clear inputs and boundaries 清晰的输入和边界适合写受控系统设计。文中认为它既提高安全性，也提高可靠性。 user approval 用户批准与权限控制绑定在一起。文中说 agent 的执行不是毫无阻拦的。 inside the workspace 在工作区内典型的路径安全表达。文中强调文件访问需要被限制在仓库边界内。

第 8 段 · 组件四：很多“模型质量”，其实是上下文质量

Coding agents are even more susceptible to context bloat than regular LLMs during multi-turn chats, because of repeated file reads, lengthy tool outputs, and logs.

与普通 LLM 多轮聊天相比，编程智能体更容易遭遇上下文膨胀，因为它们会反复读取文件、产生长工具输出和日志。

A good coding harness clips large outputs, deduplicates older file reads, and compresses the transcript before it goes back into the prompt.

好的 coding harness 会裁剪大输出、去重旧文件读取记录，并在内容重新喂回 prompt 前压缩 transcript。

A lot of apparent model quality is really context quality.

很多表面上的“模型质量”，其实本质上是“上下文质量”。

重点词汇与表达

英文中文笔记 susceptible to 更容易受到……影响很适合写系统脆弱点。文中说编程 agent 比普通聊天更容易出现上下文膨胀。 context bloat 上下文膨胀全文非常关键的概念。文中指太多无关或重复信息占满 prompt 预算。 clip large outputs 裁剪大输出典型的工程措施表达。文中说不能让一段长日志占掉整个 prompt。 deduplicate 去重技术写作中常用。文中指别让模型反复看到同一份旧文件内容。 apparent model quality 表面上的模型质量很有观点色彩的写法。文中借它强调用户感受到的“模型强”，经常是上下**得好。

第 9 段 · 组件五：记忆要分层，不能一股脑塞给模型

A coding agent usually separates state into at least two layers: a small working memory and a full transcript.

一个成熟的 coding agent 通常至少会把状态分成两层：一层是小而精炼的 working memory，另一层是完整的 transcript。

The full transcript stores the whole history and makes the session resumable. The working memory is a distilled version that keeps the most important current information.

完整 transcript 保存整个历史，因此会话可以恢复；working memory 则是经过提炼的版本，保留当前最重要的信息。

The compact transcript is for prompt reconstruction, while the working memory is for task continuity.

compact transcript 的职责是帮助重建 prompt，而 working memory 的职责则是维持任务连续性。

重点词汇与表达

英文中文笔记 separates state into two layers 把状态拆成两层这是系统设计里非常重要的表达。文中用它说明“存什么”和“喂什么”不能混为一谈。 full transcript 完整会话记录指可恢复、可追溯的完整历史。文中强调它偏“持久化档案”。 distilled version 提炼后的版本适合写压缩后的高价值信息。文中说 working memory 不是原样拷贝，而是精炼摘要。 prompt reconstruction 提示词重建一个很系统化的说法。文中指出 compact transcript 是服务于下次继续对话的。 task continuity 任务连续性在 agent 场景里很关键。文中说 working memory 的目标是让任务不中断。

第 10 段 · 组件六：委派有用，但前提是“有边界”

Once an agent has tools and state, one of the next useful capabilities is delegation.

当一个智能体已经拥有工具和状态之后，下一项非常有价值的能力就是委派。

A subagent is only useful if it inherits enough context to do real work. But if we do not restrict it, multiple agents may duplicate work, touch the same files, or keep spawning more subagents.

子智能体只有在继承了足够上下文时才真正有用；但如果不给它边界，多个智能体就可能重复劳动、碰同一批文件，甚至不断继续生出更多子智能体。

The design challenge is not only how to spawn a subagent, but how to bind one.

因此，真正的设计难题并不只是“怎样生成一个子智能体”，而是“怎样把它约束住”。

重点词汇与表达

英文中文笔记 delegation 委派在 agent 语境中指把子任务分给别的 agent。文中强调它的价值在于并行化和解耦。 inherits enough context 继承足够的上下文适合写子系统能否真正完成任务。文中说没上下文的 subagent 基本没有用。 duplicate work 重复劳动很常见也很实用。文中指出 subagent 设计不好会产生低效协作。 spawning more subagents 继续生成更多子智能体带一点“失控扩散”的意味。文中借它提醒委派必须设深度和权限边界。 bind one 给它设边界 / 约束住它这是全文很有记忆点的表达。文中说真正难的不是 spawn，而是 bound。

第 11 段 · 最后的比较：OpenClaw 不是同一种产品，但有很多重叠

OpenClaw is more like a local, general agent platform that can also code, rather than being a specialized coding assistant.

OpenClaw 更像是一个本地的通用智能体平台，它也能写代码，但并不是一个高度专门化的 coding assistant。

There are still several overlaps with a coding harness: it uses workspace instruction files, keeps session files, performs transcript compaction, and can spawn helper sessions and subagents.

不过它和 coding harness 依然有许多重叠之处，例如读取工作区指令文件、保存会话文件、做 transcript compaction，以及生成 helper session 和 subagent。

The emphasis is different: coding harnesses optimize for a person working in a repository, while OpenClaw optimizes for many long-lived local agents across chats, channels, and workspaces.

真正的区别在于优化目标不同：coding harness 主要优化“人在仓库里高效完成编码任务”，而 OpenClaw 更偏向“在多个聊天、频道和工作区里运行长生命周期本地智能体”。

重点词汇与表达

英文中文笔记 local, general agent platform 本地的通用智能体平台用于强调产品定位更广。文中说 OpenClaw 不是只为编程而生。 specialized coding assistant 专门化编程助手很适合拿来和更通用的平台做定位对比。 overlaps with 与……有重叠分析异同点时非常实用。文中强调两类系统并非完全不同阵营。 long-lived local agents 长生命周期本地智能体一个很有平台意味的说法。文中强调 OpenClaw 更重持续运行和多场景。 emphasis is different 侧重点不同是做系统对比时很稳的收束句型。文中没有说谁更好，而是说目标不同。

英文中文语境 coding agent 编程智能体执行软件任务的 agent 系统 agent harness 智能体脚手架围绕模型的控制与执行层 coding harness 编程脚手架面向软件工作的专用 harness live repo context 实时代码仓上下文当前仓库、分支、文档等稳定事实 prompt caching 提示词缓存复用稳定提示前缀，降低成本 structured tool use 结构化工具使用工具调用有验证、有边界 context bloat 上下文膨胀日志、文件、输出让 prompt 过载 transcript compaction 会话压缩把完整历史压缩成更短摘要 working memory 工作记忆任务连续性所需的精炼状态 full transcript 完整会话记录可恢复的完整历史 bounded delegation 有边界的委派子智能体并行，但受限制 runtime 运行时实际执行业务逻辑的系统层