Agent

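I recently watched Andrew Ng's talk on agent design patterns. Thinking it over, I realized that Reflection is useful not only at the code level but also at the prompt level. Video: What's next for AI agentic workflows ft. Andrew Ng of AI Fund

Reflection is a technique that lets an AI examine and analyze its own decision process and behavior. By reviewing its past actions and the feedback it received, the agent identifies gaps in its decisions and knowledge, then adjusts and optimizes so that it performs better on future tasks.

1. Framework

The Actor receives state information from the environment and combines it with short-term memory (the trajectory) to generate an initial output or action. The agent then reflects on that output through the self-reflection mechanism, using both internal and external feedback, and draws on long-term memory (experience) to refine the generated content or action.

Components:

Actor: outputs the necessary text and actions based on the observed state.

Evaluator: the core module, responsible for checking the quality of the Actor's output. It analyzes the generated result and computes a reward score for it based on the specific task context.

Internal feedback: obtained by comparing the output against a predefined goal or standard, so that the model can adjust and optimize itself without external input.

External feedback: not produced by the Evaluator. It comes from real-world application feedback, user interaction, expert review, or the output of other machine learning systems.

Self-reflection: combines the evaluation criteria, internal feedback (past lessons), and external feedback (application feedback, expert review, etc.) to generate a reflection on the initial content or action.

Memory: the trajectory history serves as short-term memory, while the output of the self-reflection model is stored as long-term memory. Together, the two give the agent context that is both specific to the current attempt and informed by lessons learned across multiple attempts. When there is too much content to search through, long-term memory helps the agent locate and retrieve what it needs quickly.

2. Details

Steps:

The Actor interacts with the environment and produces a trajectory of actions $\tau_0$. The Evaluator assigns this trajectory a score $r_0 = M_e(\tau_0)$ (in general, $r_t = M_e(\tau_t)$ for trial $t$). The score represents how well the attempt went and increases as task performance improves.

After the first attempt, the score $r_0$ has to be turned into concrete feedback that the LLM can use to improve. The Self-reflection model analyzes the pair $\{\tau_0, r_0\}$ and produces a summary $sr_0$, which is then stored in memory. In general, the summary $sr_t$ provides direct experiential feedback on trial $t$.

The Actor, Evaluator, and Self-reflection model work together, refining the result over repeated trials until the Evaluator judges that the latest trajectory $\tau_t$ is correct.

The refined content or action is then generated based on the feedback.

3. Conclusion

Base models show substantially better performance across different tasks after applying Self-Refine. With Reflexion, the first-pass (pass@1) accuracy of all the models tested on the HotPotQA dataset also improves.

4. Examples

4.1 Decision-making

Environment: You are in the middle of a room. Looking quickly around you, you see a bed 1, a desk 2, a desk 1, a drawer 6, a drawer 5, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a garbagecan 1, a laundryhamper 1, a safe 1, a shelf 6, a shelf 5, a shelf 4, a shelf 3, a shelf 2, and a shelf 1.

Task: examine the mug with the desklamp.

Trial #1

-think: To solve the task, I need to find and take a mug, then find and use a desklamp.
-think: First I need to find a mug. A mug is more likely to appear in drawer (1-6), desk (1-2), shelf (1-6), garbagecan (1), laundryhamper (1). I can check one by one, starting with drawer 1.
-go to drawer 1
The drawer 1 is closed.
...
-open drawer 6
You open the drawer 6. The drawer 6 is open. In it, you see a keychain 2.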
-go to desk 1
On the desk 1, you see a creditcard 3, a desklamp 1, a laptop 2, a mug 1, a pen 1, and a pencil 1.
-think: Now I find a mug (1). Next, I need to take it.
-take mug 1 from desk 1
You pick up the mug 1 from the desk 1.
-think: Now I take a mug (1). Next, I need to find a desklamp. A desklamp is more likely to appear in desk (1-2), sidetable (1-2), shelf (1-6), bed (1), drawer (1-6). I can check one by one, starting with desk 1.
-go to desk 1
Nothing happens.
-go to desk 2
On the desk 2, you see a alarmclock 1, a bowl 1, a mug 3, a pencil 3, and a pencil 2.
-think: Now I find a desklamp (1). Next, I need to use it.
-use desklamp 1
Nothing happens.
-use desklamp 1
Nothing happens.

Status: Fail

Reflection: In this environment, my plan was to find a mug then find and use a desklamp. However, the task says to examine the mug with the desklamp. I should have looked for the desklamp first, then looked for the mug. I noticed that the desklamp was found on desk 1. In the next trial, I will go to desk 1, find the lamp, then look for the mug and examine it with the desklamp.

Trial #2

-go to desk 1
On the desk 1, you see a creditcard 3, a desklamp 1, a laptop 2, a mug 1, a pen 1, and a pencil 1.
-think: To solve the task, I need to find and take a mug, then find and use a desklamp.
-take mug 1 from desk 1
You pick up the mug 1 from the desk 1.
-think: To solve the task, I need to find and take a mug, then find and use a desklamp.
-use desklamp 1
You turn on the desklamp 1.

Status: Success

By reflecting on Trial #1, the agent starts directly from desk 1 in Trial #2.

...
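To make the Actor, Evaluator, and Self-reflection loop from section 2 more concrete, here is a minimal Python sketch that replays the mug and desklamp example. Everything in it is an assumption for illustration: the names `ReflexionAgent`, `toy_actor`, `toy_evaluator`, and `toy_reflector` are hypothetical, the `threshold` and `max_trials` parameters are made up, and the three toy functions are hard-coded stand-ins for what would really be LLM calls and an environment. It is not the implementation from the talk or from any library.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Trajectory = List[str]  # short-term memory: the actions taken in one trial


@dataclass
class ReflexionAgent:
    """Hypothetical wrapper around the Actor / Evaluator / Self-reflection loop."""
    actor: Callable[[str, List[str]], Trajectory]    # (task, reflections) -> trajectory
    evaluator: Callable[[Trajectory], float]         # trajectory -> reward score
    reflector: Callable[[Trajectory, float], str]    # (trajectory, reward) -> summary
    threshold: float = 1.0                           # assumed success cutoff
    max_trials: int = 3                              # assumed trial budget
    memory: List[str] = field(default_factory=list)  # long-term memory of reflections

    def run(self, task: str) -> List[Tuple[Trajectory, float]]:
        history = []
        for _ in range(self.max_trials):
            trajectory = self.actor(task, self.memory)  # act, conditioned on past lessons
            reward = self.evaluator(trajectory)         # score this trial
            history.append((trajectory, reward))
            if reward >= self.threshold:                # evaluator accepts the trajectory
                break
            # Turn the (trajectory, reward) pair into a verbal lesson and store it.
            self.memory.append(self.reflector(trajectory, reward))
        return history


def toy_actor(task: str, reflections: List[str]) -> Trajectory:
    # Stand-in for an LLM: follows the stored lesson if one mentions desk 1.
    if any("desk 1" in r for r in reflections):
        return ["go to desk 1", "take mug 1 from desk 1", "use desklamp 1"]
    return ["go to desk 1", "take mug 1 from desk 1", "go to desk 2", "use desklamp 1"]


def toy_evaluator(trajectory: Trajectory) -> float:
    # Stand-in for the environment: the desklamp only works while at desk 1.
    location = None
    for action in trajectory:
        if action.startswith("go to "):
            location = action[len("go to "):]
        elif action == "use desklamp 1" and location == "desk 1":
            return 1.0
    return 0.0


def toy_reflector(trajectory: Trajectory, reward: float) -> str:
    # Stand-in for the self-reflection model: a fixed lesson for this task.
    return "The desklamp is on desk 1. Go to desk 1 and use the desklamp there."


agent = ReflexionAgent(toy_actor, toy_evaluator, toy_reflector)
history = agent.run("examine the mug with the desklamp")
for trial, (trajectory, reward) in enumerate(history, start=1):
    print(f"Trial #{trial}: reward={reward}, actions={trajectory}")
```

With these stubs the first trial fails, one reflection is written to memory, and the second trial succeeds, which mirrors Trial #1 and Trial #2 above. In a real setup each callable would wrap an LLM call, and the evaluator could be a heuristic, a test suite, or another model.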