第一章 · 四个现象
如果人们在过去一年去审视自己的信息消费,会发现几个现象同时发生。单看每一个都不算稀奇,把它们并在一起看,会指向一个没被充分讨论的结构性事实。
先看四个现象。
一、反刍链
每天打开小红书、公众号、X 的时候看到的 AI 相关内容,大部分是这样生产出来的:原始信息——一篇论文、一段代码、一次实验结果、某个从业者的一手观察——先被某个账号用 AI 初次总结;然后被另一个营销号挑选并加上一个留住注意力的钩子标题;然后被下一个账号用 AI 再重新加工成自己语气的版本;最后推送到终端读者面前。
你一定见过这样的标题:“又变天了”、“Gemini 杀疯了”、“百万失业倒计时”、“OpenAI 刚刚曝光”、“00 后用 AI 月入百万”、“硅谷连夜震动”、“这次是真的不一样了”——每一条都在制造紧迫感,点进去是几百字的空话。
这就是反刍链。每经过一层,三件事同时发生:信息密度下降(细节、条件、不确定性被摘要吃掉),误差累积(每次总结都会引入小偏差,多层之后还会相互强化),情绪浓度上升(为了在算法分发里活下来,每一层都必须加戏)。到达终端读者手里的是一个经过处理、营养降低、但变得更易消化的东西——一种内容形态的预制垃圾食品。
过去,这个链条上都是人,且速度和数量远远没有爆发。但今天我们看到,巨量的反刍垃圾正在被快速重新发明和制造出来。
人们每天以为自己在“接触信息”的时候,接触的是这条链的末端,距离原始信息可能已经有三四层加工。这个事实本身不算新闻,有人说过很多次。它是接下来三层现象的前提。
二、反刍消费的工具化
更值得盯着看的,不是反刍链本身,是消费者的反应。
面对无限供应的反刍内容,一部分人的反应不是“少消费点、去找一手源”,而是“我需要更高效的工具来消费更多”。
以 OpenClaw 为例。它本身的能力远不止于此,但它最广为传播的使用场景不是“帮我写代码”“帮我分析数据”,而是“帮我每天整理 AI 圈发生了什么”。用户在社交媒体上兴奋地展示自己的工作流:AI 帮我每天读几十个信息源、自动生成摘要、给我推送关键更新。
这件事表面看是提升效率,实际是一个奇怪的自我强化循环:一个 AI 工具,帮助我更高效地消费另一个 AI 吐出来的反刍物。主动吞食反刍物的效率本身成为了一种被宣传的成就。
读者的位置发生了一个微妙的变化——从“被动接受反刍”升级到“主动优化反刍消费管道”。看起来是主动性的提升,本质是消费量的放大。
三、生产者的分化
同时,在另一个不太重叠的圈子里,有一批人在做完全不同的事。
Andrej Karpathy 在 X 上曾经精确地描述过这个分化:对 AI 能力的判断,在不同人群里呈现极端的两极化。一边是主要用免费或旧版 ChatGPT 的人,他们的印象停留在“这东西会产生幻觉、会写出一堆 AI 垃圾内容”——社交媒体上那种“AI 连该走路还是开车去洗车都搞不清楚”的翻车例子,就是这群人每天看到的 AI。另一边是每月花几百美元、每天用 Claude Code / Codex 做真实技术工作的人,他们看到的是“一小时内重构整个代码库”“自主找出系统漏洞”——这个能力上限在过去一年里是爆炸式增长的。
两群人所处的现实差异巨大。前者的信息来源是反刍链末端的总结;后者的信息来源是自己和模型实时协作的体验。前者通过“我读了什么”判断 AI,后者通过“我让它做成了什么”判断 AI。
两群人用同一个词——“AI”——但指的不是同一件事。一方说“AI 也就那样”,另一方说“这东西正在爆炸”,两句话都是真的。差别不在模型,在使用者在做什么。
四、三层叠在一起看
单看任何一层,都只是一个局部观察。把三层并在一起看——反刍链、反刍消费的工具化、生产者分化——会出现一个没被单独拎出来讨论的问题。
消费反刍的人,不是在缺信息。他们接触的信息比任何时代的任何人都多。他们也不是“学不到东西”,因为产品设计就不是为了让他们学到。他们真正在做的,是用“消费信息”这个动作管理“我没跟上时代”的恐惧——一种焦虑的自我缓解。
做真正认知活动的人(Karpathy 说的第二群)之所以不同,不是因为他们消费的信息更多,而是因为他们在做另一件性质完全不同的事:他们的认知活动在启动。面对一个问题,他们会去推一推、试一试、验证一下,然后形成自己的判断。消费反刍的人不会做这些,他们只是读成品、记关键词、关掉、下一条。
所以三层现象合起来指向的不是“信息过载”(这是个陈词滥调),也不是“AI 让人变懒”(这是一种回避)。指向的是一件更尖锐的事实——大部分人把知识、经验、熟练度错当成了认知能力本身。他们从未识别过什么是真正的认知活动。
什么叫真正的认知活动?先排除几种常见的错认——它不是脑子在转(脑子一直在转,人活着就在转);不是读了很多、记了很多(那是知识储备);不是某个领域做久了很熟练(那是经验);不是表达流畅(那是语言能力)。这些东西过去被误认为认知能力,是因为在老环境里它们常常和认知能力一起出现——但它们不是认知能力本身。
真正的认知活动指的是:产生可以被现实校准的、结构性判断的那种活动。形成具体的、能被后来的事实验证或反驳的判断。在一个陌生问题上识别深层结构。把自己此刻的判断和现实之间的距离,看成一个可以操作的对象。这个意义上的认知活动才是 AI 时代被剥离出来单独标价的东西。大部分人从未意识到这个区别的存在——没有意识到,就没有机会做。
过去的评价体系不需要它启动。经验、模式、熟练、博闻强记、流畅表达、社交灵敏度——这些旧的“聪明”和“能力”的表达方式,没有一项需要真正的认知活动。一个精明的生意人、一个博学的编辑、一个能干的中层经理,他们的认知系统里真正的反思可能从未发生过一次。但在老环境里,他们确实是高能力者——因为老环境就是奖励这些的,而且奖励得不错。
AI 时代的变化不是“让人变懒”,而是第一次把“认知活动本身”从其他所有能力里剥离出来、单独标价。当知识储备被模型摊平、重复性任务被吃掉、判断成品的生产成本降到零之后,剩下的那部分——把现实切成结构、识别输出里的错误、在陌生问题上做真正的抽象——大部分人根本没有。而他们自己从前的成功经验告诉他们“我是有能力的”。
这不是一个新问题。它是一个一直存在但被旧评价体系遮盖的事实,第一次在新技术条件下暴露出来。不是因为人突然变蠢了,而是因为旧的“有能力感”的来源正在消失。
第二章 · AI 其实没那么均匀强
在进入主话题之前,有一个常见的误解需要先处理。
这个误解是:AI 已经全方位超过了人类,所以大部分人在它面前变得渺小也是正常的、被动消费反刍也算理所当然。
这个判断只对了一半,另一半完全错。
AI 强在哪里。在结构化、可测量的任务上,AI 已经稳定超过中位数人类,甚至超过很多专家。数学奥赛(AIME)、博士级科学题(GPQA)、真实软件工程任务(SWE-Bench)、各种语言理解测试——这些基准在过去一年里被前沿模型陆续突破。加上知识检索、流畅表达、按指令执行这类能力,AI 远超大部分人。对于这些方向的工作,最好的用法是把 AI 当一个超级杠杆——它能把一个有能力的人的产出放大十倍以上。
AI 弱在哪里。同样多的证据显示,AI 的强是尖锐的、不均匀的。把一道题的条件稍微改一下(换成表面不相关但结构相同的场景),模型的表现会突然崩溃——这种脆弱在 SimpleBench、BrainBench 这类基准里反复出现,模型在日常常识问题上连中位数成人都不如。还有一整类任务它做得不好:没有清晰对错的判断(这个设计好不好、这个决策对不对)、处理模糊的现实(一段混乱的反馈到底在说什么)、在完全陌生的结构上做抽象(一个从未见过的问题类型如何拆解)。
为什么是这样不均匀。有一个技术上的原因。前沿模型的主要进步来自强化学习——给模型“这个对、那个错”的信号让它校准。这种训练在有明确对错的领域极其有效:代码跑不跑、数学答案对不对、测试过不过——可验证奖励,信号清晰,训练效率高。反过来,没有清晰对错的领域——好的判断、恰当的分寸、审美——根本没办法给可靠的奖励信号。你怎么教一个模型“这段文字感觉不对”?没法。所以这些领域的进步远慢于可验证领域。再叠加商业动机:代码和结构化任务直接带来 B2B 收入,所以算力和研究资源绝大部分投在这里。结果是 AI 的能力在“可验证 + 商业价值高”的领域指数增长,在其他领域进步缓慢。这个技术解释也是 Karpathy 在 X 上反复讲过的——第一章提到的两群人对 AI 能力的判断差异,本质是他们在用同一个模型的不同维度。
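上面说的“可验证奖励”可以用一个极简的示意来体会。下面的 Python 草图只是为说明概念而写的假设性例子,不是任何实验室的真实训练代码;`math_reward`、`code_reward` 以及“候选代码必须定义 `solve` 函数”这条约定,都是为演示而虚构的。

```python
from typing import Iterable, Tuple


def math_reward(model_answer: str, reference_answer: str) -> float:
    """数学题:答案与标准答案一致记 1 分,否则记 0 分。"""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def code_reward(candidate_source: str,
                test_cases: Iterable[Tuple[tuple, object]]) -> float:
    """代码题:候选代码需定义 solve(假设的约定),全部测试通过记 1 分。"""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # 仅作演示;真实系统需要沙箱隔离
        solve = namespace["solve"]
        passed = all(solve(*args) == expected for args, expected in test_cases)
    except Exception:
        return 0.0  # 跑不起来或缺少 solve,同样是清晰的“错”
    return 1.0 if passed else 0.0


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    print(code_reward("def solve(a, b):\n    return a + b", tests))
    print(code_reward("def solve(a, b):\n    return a - b", tests))
    print(math_reward(" 42 ", "42"))
```

这两个函数之所以好写,是因为“对”和“错”可以被程序直接判定。对“这段文字感觉不对”这类判断,写不出一个同样干净的函数,这正是正文所说的奖励信号缺失。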
真正的推论:AI 是放大器,不是均衡器。它放大的不是“人类的整体能力”,而是“使用者这一侧能向它输入什么结构”。给一个已经能拆解问题、能判断输出对错的人,AI 能把他的产出放大十倍以上;给一个只会让 AI “写点东西”的人,AI 吐出来的是反刍物,放大的是他的焦虑管理频率。
最关键的:AI 没有替代“构造问题”这件事。
这里需要主动处理一个合理的反驳——“AI 在强化学习里已经展现出结构抽象能力了:o 系列推理模型解数学大题时会自己推翻错误假设、重构解题框架;AlphaGo 早就走出过人类从未见过的新定式。你说 AI 不能做结构抽象,是不是过时了?”
这个观察是对的。但这里有一个致命的区分必须讲清楚。
AI 的结构抽象永远发生在给定的奖励函数之下。围棋的目标是赢、数学题的目标是对、代码的目标是跑通——在这些边界清晰、胜利条件已经被人类定义好的封闭空间里,AI 可以比人类更好地寻找结构。这是一种工具性的去耦——在已定义的游戏里找最优解。AI 在这一层已经在超过人类,也会继续超过。
但还有另一层去耦——定义什么是值得追求的东西、什么问题值得问。这个产品到底解决了什么隐秘的人类需求?这项政策会引发何种人性反弹?一段混乱的用户反馈到底在说什么?一个组织长期问题的真正根源在哪里?这一层没有现成的奖励函数——因为“什么是对的”这件事本身要先由人来裁决。AI 可以生成一万种逻辑严密的结构,但判断哪一种真正对应了现实世界里的痛点,这一步的裁决者必须是一个处在现实中、要承担后果、有肉身的主体。没有这个主体,奖励函数从何而来?
所以 AI 抢走的是“在已定义的游戏里寻找最优结构”这件事(这件事人类本来也不擅长,让给 AI 是合理的);它没有抢走、也无法自发产生“决定玩哪个游戏”这件事——后者的结构性本身就不属于 AI 能触达的层。
回到分工层面——什么问题值得问、哪个方向是对的、输出是否真的解决了问题——这些判断现在仍然完全依赖人。而这类判断恰好是第一章最后一节说的那件事——需要真正的认知活动参与才能做的事。
所以“AI 太强、我跟不上”是一个偷懒的叙事。它在你该放弃的事情上(记忆、流畅表达、按指令执行)确实超过你了;但它在你真正该参与的事情上(判断、构造、审美、结构抽象)并没有。那些在能参与的地方选择放弃的人,才是真正被 AI 甩开的人。而他们常常不自知——因为反刍消费给了他们一种“我还在跟上”的错觉。
第三章 · 反刍链里人的位置:学习还是焦虑管理?
如果 AI 是放大器而不是均衡器,“不同使用者的产出差距在拉大”就是一个必然推论——放大器把使用者这一端的原有差距按倍数放出。但这还不足以解释为什么两群人的差距如此之大、而且方向完全相反:一边是通过 AI 做出复杂系统的人,另一边不是“AI 用得少”,是通过 AI 消费越来越多却产出越来越少。
这个反向动力学需要一个额外的解释。它不是“能力差”或“使用技巧差”的问题。真正的原因是——在反刍链的末端,大部分人做的根本不是学习。他们在做另一件看起来很像学习、但机制完全相反的事。
学习和焦虑管理
先把两件事的定义干净地放在一起。
学习是这样一个闭环:遇到一件不明白的事 → 自己先形成一个初步判断 → 带着具体问题去找信息 → 用信息校准判断 → 形成新的、自己拥有的理解。四个动作缺一不可。其中前置判断和校准动作是整个过程的核心——没有这两步,进来的信息不会被任何已有结构吸收,就像水倒进没有容器的地方,过一会儿就干了,什么都没留下。
焦虑管理是另一个闭环:感到落后 → 消费信息 → 产生“我在跟上”的感觉 → 焦虑短暂缓解 → 因为没有任何东西被内化,一段时间后再次感到落后 → 再次消费。这个循环里没有前置判断、没有校准、没有内化。它的运行逻辑和学习完全不同——学习的产物是认知结构,焦虑管理的产物是情绪缓和。
两件事在表面上很像,因为都伴随“读了东西”这个动作。但在机制上完全相反。更麻烦的是——大脑无法区分“读到一个结论”和“自己想到一个结论”。读一篇总结得头头是道的文章,大脑会产生一种“我现在理解了这件事”的感觉,和自己真正推导出一个结论时的满足感几乎一致。但前者只是一瞬间的错觉,文章关掉几天后什么都留不下;后者是真实的认知内化,会长期改变一个人对这件事的判断方式。
这个脑内机制是整条反刍链能够持续运转的基础——它让焦虑管理感觉上和学习一样,所以消费者每天都有“今天没白过”的确认,而实际上什么都没有被真正吸收。
反刍链的真正目的
第一章已经讲过反刍链的机制——每一层加工都在让信息密度下降、误差累积、情绪浓度上升。但更关键的事实是:这整条生产链的设计目标不是传递信息,是留住注意力。
这个区分很重要。一个为传递信息而生产的东西(比如学术论文、严谨的实证报告、技术文档),它在设计上会优先保留“让读者能真正理解和使用”的内容——定义、条件、反例、不确定性。这些东西在注意力经济里是负资产,因为它们降低阅读流畅度、增加认知负担。所以一个为留住注意力而生产的东西,在设计上会系统性地删掉这些东西。
这意味着——即使花再多时间认真读这些内容,也学不到东西。不是读者不够努力,是这个产品从设计上就不是为让人学到东西而造的。你读得越认真,越是在一个不设输出口的房间里转圈。
人在这个闭环里的位置
现在把焦虑管理的循环和反刍链叠在一起看,可以精确地描述大部分人每天在做的事:
打开一个 app → 刷到一个带钩子标题的判断成品 → 快速读完 → 记住一两个关键词或情绪印象 → 关掉 → 获得“我今天跟上了”的感觉 → 焦虑短暂缓解。
这整个过程里,没有任何一步是学习。没有前置判断(读之前不知道自己要验证什么),没有校准(没有拿信息和任何已有判断对照),没有内化(关掉之后什么都没留下,下次被问起复述不出来)。全部的动作都是情绪调节——用一次“接触信息”的动作,完成一次“我没落下”的心理仪式。
这里有一个残酷的验证方法:随便找一个每天刷 AI 动态的人,让他复述三天前读到的任何一条内容的核心观点和他对它的判断。
绝大多数人做不到。
这不是记忆力问题。是那些内容从来没有被加工过,所以没有任何东西可以被记住。大脑只记得住被自己处理过的东西——被判断过、被反驳过、被对照过已有理解的东西。被动流过去的信息,不管当时感觉有多“信息量大”,几天之后都不存在。
对照着看真正在做 AI 相关工作的人,会发现一个完全相反的现象。他们不消费反刍内容——因为每天手上的实验、模型反馈、代码调试,已经在提供远高于任何二手总结的信息密度。他们对一个模型能做什么、不能做什么的判断,来自昨天自己让它做失败的一件事,而不是别人写的评测。
更反常的是:消费反刍的人越消费越感觉落后,做事的人越做越感觉清晰。这不是因为做事的人懂更多——是因为两种活动的信息处理方向完全相反。焦虑管理是只进不出,信息变成情绪消耗掉了;真正的工作是处理输入并产生输出,每一次处理都在加固认知结构。前者的人对 AI 的印象是“东西太多了、跟不上了、又出新的了”;后者的人对 AI 的印象是“我昨天发现它在 X 上特别好、在 Y 上还是不行”——具体、清晰、有边界。
两群人用同一个词,但背后对应的是完全不同的心智对象。
当消费反刍变成成就
把这个诊断推到底,会遇到一个值得盯着看的现象。
OpenClaw 这类工具最广为传播的用途,是“帮我每天整理 AI 圈发生的事”。看起来是提升了效率,这是以前做不到的事情,但仔细想:这是一件值得被自豪宣传的成就吗?
在前面的分析框架下,它的意义变成了:一个工具帮我更高效地、持续地消费反刍出的低密度内容;而我把这件事当作值得分享的使用案例。主动吞食反刍物的效率本身成为了一种生产力表达。
这不是某个工具的问题。这是一个更深的集体信念的外化:跟上信息等于有能力。在前 AI 时代这个信念有一定依据——信息稀缺,能持续获取一手信息本身是稀缺技能。但在 AI 让判断成品的生产成本降到零之后,这个信念变成了一个纯粹的伪需求。你能消费的内容无穷多,但没有一份是为了让你学到什么而造的。
于是出现了一个结构上很讽刺的画面:AI 让信息供给从匮乏变成无限,一部分人的反应不是“终于可以从消费信息转向产生判断”,而是“我需要更强的工具来消费更多信息”。工具越强,消费得越快;消费得越快,接触的信息维度越大;接触维度越大,焦虑就越深——因为每消费一条都在暗示还有十条没消费到。
当判断成品的生产成本降到零,“消费判断成品的效率”就成了伪需求的最后堡垒。这个堡垒之所以还立着,不是因为它提供任何真正的价值,而是因为它提供了一种可以被展示的忙碌——一种在 AI 时代看起来像“跟上了”的表演。
而做真正认知活动的人,早已不在这个表演里。他们甚至根本不观看这个表演。
第四章 · 一个更隐蔽的陷阱:知识管理
反刍消费是明显的形态——刷、读、下一条。但它还有一个更隐蔽的变体:不是被动吸收,而是主动建构;不是刷信息,而是整理信息。因为它伪装得更像“正经事”,所以对认真的人杀伤力更大。
这个变体叫知识管理。
为什么它在 AI 时代变成伪命题
“知识管理”这个词里藏着一个旧时代的假设——知识是一种可以被管理的静态对象,像图书馆里的书一样,能被归类、索引、检索。过去几十年的整个知识管理工具链(Evernote、Notion、Roam、Obsidian、Logseq)都建立在这个假设上。
但真实的认知过程不是这样的。脑子里的“知识”不是静态存储的信息,是一张不断被重构的关系网络。“理解”不等于“记住”,“能用”不等于“能找到”。这个错位让知识管理从一开始就在做一件错位的事。
几个机制在过去几年同时发生,让这件错位的事彻底变成了伪命题:
检索问题被 AI 解决了。过去整理笔记的核心目的之一是“以后能找到”。现在的 AI 几乎了解世界上所有的文本信息,检索成本降到几乎为零。整个“为检索而整理”的传统目的,80% 已经不成立。
记忆外置的陷阱。心理学上有个现象叫 Google effect——知道“信息被保存了”会让大脑更少真的记住它。这个效应在重度笔记用户身上特别明显:笔记里有 ≠ 脑子里有。能被调用的知识是脑子里的活跃模型,不是笔记里的标签。大量重度笔记用户的脑内活跃知识反而在退化,因为他们把大脑当成了索引而不是工作台。
整理笔记是最高级的焦虑管理。这件事在焦虑缓解量表上得分极高——它看起来像工作、有成就感(“今天新增了 12 个 backlink”)、不需要承担任何判断风险、提供“我在成长”的感觉但完全不需要输出。这比刷反刍内容更危险,因为它有更强的“我在做正经事”的伪装。
整理笔记和形成判断是两件不同的活动。Luhmann 的 Zettelkasten 之所以产生了他 70 多本书的核心材料,关键不是他用了什么系统,而是他每张卡片写的都是自己的思考——每张卡片都是一个小判断、一次对已有知识的校准。这个系统的本质不是“管理知识”,是“强制思考”。现代人用 Obsidian 大多是在管理别人的思考——高亮、摘抄、引用——这和 Luhmann 做的事没有任何共同点,只是借了同一个视觉形式。
Karpathy 的 wiki 方案:它真正在解决什么
在这个背景下,Andrej Karpathy 在 2026 年 4 月初提出过一个 LLM Wiki 的方案:把原始资料扔到 raw/ 文件夹、让 LLM 自动编译成结构化的 markdown wiki、人只负责策展输入内容。他自己一个研究话题的 wiki 已经生长到约 100 篇文章、40 万字,他几乎不亲手编辑。
这个方案核心解决三件事:session 之间的上下文丢失、笔记维护的不可持续性(LLM 做 bookkeeping 比人强)、知识不复利(每次新资料进来自动更新多个已有页面、建立 cross-reference)。
听起来像是对“知识管理是伪命题”的反驳——毕竟他做的是一种知识管理,而且在他身上真的管用。
但这里有一个最底层的事实决定了它对他管用的真实原因:Karpathy 是科学家。
他的 raw sources 是 arxiv 论文、实验结果、代码、他自己还没发表的工作——这些都是大模型训练数据里没有或者已经过时的内容。模型训练数据有 cutoff,前沿研究的最新进展模型常常不知道,而科学家恰好生活在这个 cutoff 之后的信息空间里。他不是在“管理知识”,是在维护一个模型知识的 delta——把前沿补充到已有底座之上。
这和知识管理社区那套“我要管理我读过的所有内容”完全是两件事。
普通人复制它为什么失败
理解了 Karpathy 的前提,就能理解普通追随者复制他方案时为什么会变成新的焦虑管理。
普通人感兴趣的内容——商业新闻、AI 动态、科普、管理方法论、行业分析——模型基本都知道得比他们多。他们试图“管理”的“知识”其实是模型训练数据里早就有的东西的反刍版本。
在这种情况下复制 Karpathy 的方案会变成什么?让 AI 从已经被反刍过的内容里生成一个看起来结构化的 wiki——这是在给反刍链又加了一层。产物看起来更像“知识”、更像“系统”、更像“研究”——但离原始信息更远、更空。
而且由于 wiki 的视觉结构感比笔记更强,它产生的“我在学习”的错觉也更强。一个人每天看着自己的 wiki 在变大、cross-reference 在增加,会比刷小红书有更强的“我在成长”的感觉。但真正发生的事情,和小红书读者本质上一样:两者都在消费自己不能真正调用的信息。
区别只在于一个是在公开产品上消费,一个是在自己搭建的系统里消费。后者更 cope,不是更健康。
什么是合理的个人信息基础设施
要避开这个陷阱,需要问一个根本的问题:模型相对于我,到底缺什么?
大部分知识管理实践从来不问这个问题。他们只问“我应该记录点什么”,然后记录了一堆模型其实知道得比他们更多、更准的东西。
这个问题的合理答案只有两类:
第一类 · 学科前沿的 delta。你在一个快速变化的前沿领域工作——研究、尖端工程、未被训练数据覆盖的新兴实践。在这种情况下,维护一个持续更新的知识底座有真实价值,因为你积累的是模型不知道的东西。Karpathy 的 wiki 方案适合这类场景。前提是——你真的在做前沿工作、你的 raw sources 是一手的、你有判断能力去检验 wiki 是否偏了。
第二类 · 个人独特认知的 delta。你不在前沿工作,但你有自己对具体事物的判断、偏好、非共识模型、个人经验教训,这些是通用模型从训练数据里无法生成的。这种情况下合理的做法是最小对齐层:只记录“模型尚不知晓的东西”,让下次和模型协作时它能从你上一次的终点开始,而不是从零开始。
两类方案看起来很不同,底层逻辑完全一致——都是在回答同一个问题:“模型相对于我,缺什么?”只是两种人的“缺”落在不同的信息分布上。
第二类方案有一个反直觉但重要的特性:它无法被表演。因为它的准入门槛是“我能识别什么是模型不知道的”——不具备这个判断力的人用不了这个方法(他们会发现自己其实没什么可记的);具备这个判断力的人记录下来的内容一定有效。这个自我筛选机制比方法本身还值钱——它从结构上就防止了这个方法变成新的反刍仪式。
对大多数人,第二类方案是合理的起点。在使用过程中如果发现“模型在我的领域里知识不够用”,再考虑加一层 Karpathy 式的学科底座。顺序反过来则容易陷入仪式式的知识管理陷阱——这也是整个知识管理社区犯的最大错误:他们建议新人先建底座,结果新人花几个月整理别人的判断,认知活动反而被整理动作本身替代。
一个更通用的原则
把知识管理这个具体话题推到最一般的层面,会得到一个能用来评估几乎所有未来“新方案”的原则:
在 AI 时代,任何不要求使用者具备独立判断能力的方法论,都已经或正在变成反刍消费的一种变体。
区别不在方法论的内容——它可以是提升效率、促进学习、组织知识、加强创造力。区别在使用者是不是把判断外包给了这个方法论。
这也解释了一个普遍现象:所有“方法论的流行”本质上都是认知外包的集体仪式。一个方法论越流行,就越说明它满足的不是具体的工具需求,而是“不用自己判断”的需求。真正有效的方法论往往不流行,因为它们太依赖使用者的具体情境,无法被批量复制。
Karpathy 的 wiki 方案本身,在他自己身上,是有效的。但它流行起来之后产生的那批“Karpathy Wiki 教程”“Second Brain 2.0”“AI 知识管理工作流”等等,已经在变成新一轮的表演方案。判断标准很简单:这个方案在你身上用多久之后,开始让你感觉“我不用亲自判断”?这一刻就是它开始腐烂的时间点。
第五章 · 他们都说对了什么:通才、品味、智力的重新折叠
到这里,反刍消费的两种形态(被动刷信息与主动整理知识)都拆过了。它们看起来不同,本质都在做焦虑管理而不是学习。浮上来的就是那个一直被绕开的根本问题——什么才是真正重要的能力?
这个问题其实在 AI 之前就已经被多次触及。不同的圈子里,有几种流行的说法在从不同角度描述同一件事——但每一种都只描述了其中的一部分,而且都没意识到自己在描述的其实是同一个东西。把它们放在一起看,会出现一个更完整的图景。
通才论
近几年在创作者和独立开发者圈子里,“通才”这个概念被重新推到前台。最常被引用的版本来自 Dan Koe——他的核心论点是:专业化正在贬值,多领域的通才反而在变得稀缺且有价值。
他说对的那部分:广度不是杂学。真正的通才不是“什么都知道一点”,而是在多个领域都积累到足够深度,因此能看到不同领域之间的同构结构。一个同时深入研究过经济激励、生物进化、组织行为的人,会发现这三个领域的底层机制大量重叠——因为它们都在讲“多个行动者在约束条件下追求自身目标时,系统层面会涌现出什么”。识别出这种跨域同构,是通才的真正价值所在,也是把广度转化为判断力的唯一路径。
他没说清的那部分:什么样的深度才算数。这是通才论最模糊的一块,也是追随者最容易踩坑的地方。有人按通才论去扩展自己的阅读,读了十个领域各一本入门书,最后脑子里装的是十个领域的表面叙事——这种“深度”对跨域映射毫无用处,因为表面叙事在不同领域里本来就不一样,没有同构可言。真正能被映射的是深层结构——一个领域里真正决定行为的因果机制、激励约束、反馈回路。一个可操作的深度标准是:你能不能说出一个领域的从业者集体相信但实际上是错的一件事。说得出的,是有深度;说不出的,还停在标准叙事层。
他留下的漏洞:通才论没把做跨域映射的底层机制单独拎出来。它读起来像一个方法论——多读几个领域的书、建立联系——但它暗含一个前提:读者有能力做这种映射。而这个能力本身才是稀缺的。这解释了为什么同样的方法论,有人用出来了、有人用不出来:差的不是努力,是映射能力本身。
品味(taste)
另一条线来自创作者和设计圈——Paul Graham、乔布斯、Rick Rubin 这一路反复谈论的 taste。这个词被翻译成中文的“品味”之后容易被理解成一种模糊的审美感受,但他们实际在描述的是一个更具体的机制。
他们说对的那部分:在你能清楚说出理由之前,你已经能判断一个东西好不好。这种前语言的判断不是神秘主义,它来自大量高质量样本的长期内化。一个从小在博物馆里长大的人,看到一幅画能在几秒内判断它的好坏——不是因为他懂艺术史,而是因为他的视觉系统已经被大量高质量画作校准过了。这种校准的产物就是品味。
但他们描述的是低阶品味——纯粹的模式识别。一个人能判断“这个字体不好看”,但让他解释为什么,他说不出。这种品味是真实的、也有用——它能让一个人在自己熟悉的领域快速过滤大量选项——但它有一个严重的限制:不可迁移、不可教、不可验证。你没办法把你的品味传给别人,也没办法在一个新领域从零建立品味,因为没有足够的样本。
他们没说清的那部分:品味其实有两层。低阶品味是前语言的模式识别,高阶品味是模式识别 + 需要时能拆出结构化理由。一个真正厉害的设计师不仅能说“这个字体难看”,还能分析“是因为它的 x-height 和字重的比例破坏了视觉节奏”。高阶品味之所以重要,是因为它可教、可迁移、可在新领域加速建立新品味——它把直觉背后的结构显式化了,于是结构可以被学习。
高阶品味和通才论在底层其实是同一个能力——都是把一个具体判断抽象成可操作的结构,然后把这个结构用在新的地方。不同的是,通才论从“跨域”的角度谈它,品味论从“审美直觉”的角度谈它。两者说的是同一个机制的两个侧面。
认知折叠论
第三条线在技术圈和独立创作者圈子里越来越常见。它的核心论点简单粗暴:AI 不会让差距缩小,只会让差距以前所未有的速度拉开。类比的对象通常是工业革命——蒸汽机和后来的装配线让体力劳动的相对价值急剧下降,能设计机器、能组织生产、能调度系统的人,和只能出卖体力的人之间的差距从线性扩大变成了指数分化。现在轮到认知了。AI 正在摊平一批认知能力(知识、记忆、检索、流畅表达),剩下那一部分不被摊平的会被爆炸式放大。结果不是“贫富差距变大”,是“不同人群的产出在同一个单位时间里被折叠到完全不同的量级”。
他们说对的那部分:差距会拉开,且速度前所未有。这个判断是对的,而且类比工业革命非常合适。历史上每一次通用技术跃迁之后,掌握了新杠杆的人和没掌握的人之间的差距都不是线性扩大,是指数分化——蒸汽机如此、电如此、互联网如此,AI 只会更极端,因为它是第一次直接作用于认知本身,而认知恰好是判断、创造、决策的源头。被放大的那部分,会比历史上任何一次都显眼。
他们没说清的那部分:被放大的到底是哪种认知能力。
这是整条论述最模糊的一块,也是大部分追随者会踩坑的地方。他们默认的假设通常是“高认知 = 高 IQ”——所以结论变成“高 IQ 的人会越来越富、低 IQ 的人会越来越被甩开”。这个结论的前半句大致对,但原因完全错。
被放大的那件事不是 IQ。IQ 测的是原始处理能力——工作记忆、处理速度、信息提取、按规则推演——这些 AI 现在已经比绝大多数人强。一个单纯 IQ 高但从不做真正认知活动的人,在 AI 时代反而会被最快淘汰——因为他过去用 IQ 做的那些事(快速学习、记忆调取、流畅推理),AI 都做得更快更准更便宜。他的优势直接被摊平。
真正被放大的是另一层——在一个陌生问题上识别深层结构、判断什么问题值得问、在模糊的现实里切出可操作的对象。这件事和 IQ 相关但不等同。一个 IQ 中等但识别出了真正的认知活动是什么、并持续投入去做的人,比一个 IQ 高却一辈子活在模式识别舒适区里的人,会被 AI 放大得更多。
他们留下的漏洞:“高认知”被当成了一个先天的、固定的属性。这让整个“折叠”叙事听起来像一个不可抗力——高认知的人会赢,低认知的人会被甩开,没有中间地带、没有可操作空间。
但真正的分化不是“高认知 vs 低认知”的先天分化,是“识别出了 vs 没识别出”的意识分化。前者是一个不可改变的命运,后者是一个可以跨越的门槛——虽然跨越这个门槛本身不容易,但它在原理上是开放的。认知折叠论里最让人绝望的那种命运感,大部分来自于把“可跨越的门槛”误认成了“不可改变的先天属性”。
三条线在说同一件事
把三条线放在一起看。
通才论说的“跨域映射能力”——那个让多领域深度变成判断力的底层机制——其实就是把多个领域的表象剥离、抓住它们共同的深层结构的能力。没有这种能力,读十个领域只能得到十堆碎片;有了它,三个领域就能跨出新的判断。
品味论说的“高阶品味”——那个把直觉背后的结构显式化的能力——其实是同一种能力作用在大量样本内化之上的结果。低阶品味只需要样本,高阶品味需要样本加上把模式抽象成结构的能力。
认知折叠论说的“不被 AI 替代、反而被 AI 放大”的那一层——其实也是这个能力。AI 摊平了知识、记忆、流畅表达、按指令执行之后,唯一没被摊平的就是把现实从它的表象里剥离、当作结构来操作的那个动作。被放大的就是它。
三条线从完全不同的角度触及了同一件事,只是给它起了不同的名字——通才、品味、折叠。核心都是同一件东西:把具体情境抽象成可操作结构的能力。
认知科学里有一个专门的名字给它:认知去耦——把一个表征从它所指的现实里剥离、当作独立对象来操作。所有抽象、假设、反事实推理、自我审视都建立在它上面。三条流行说法都在从不同角度描述它,只是每一条都只摸到了其中一部分。
这件事过去一直存在,但一直没有被单独标价。因为在前 AI 时代,它和大量其他能力混在一起——和知识储备混、和记忆力混、和流畅表达混、和熟练度混。一个“聪明”的人通常这些都有,但没人知道其中哪一项才是真正起作用的。所以每一种流行说法都在盲人摸象——摸到哪一部分就用哪一部分的语言去描述它。
AI 时代第一次让这件事可见。因为 AI 替代了知识储备(它知道的比任何人多),替代了记忆力(它能随时调取),替代了流畅表达(它写得比大部分人好),替代了熟练度(它不疲劳、不出错、不需要练习)。这些能力一个个被剥离之后,剩下的就是那个一直被遮盖的底层机制——把现实切成结构、判断哪个结构是对的、在陌生领域里建立新结构的能力。
这就是通才论、品味论、认知折叠论三条线共同指向的那件事。它不是新出现的能力,是一直存在但第一次被独立命名的能力。
AI 能干什么,人剩下什么,要学什么
把这个判断推到实用层面。
AI 现在能干的事情范围在迅速扩大:模式合成、信息检索、流畅表达、按指令执行,以及在可验证奖励领域(代码、数学、结构化推理)里的复杂任务。这个范围每隔几个月都在扩张,且没有明显的边界。
AI 做不了的事情范围在缩小,但有几件事它现在做不了、近期也做不了:把模糊的现实切成可操作的问题(问题构造)、识别自己输出里的结构性错误(元判断)、在没有外部奖励信号时判断什么是好的(审美和价值权衡)、在完全陌生的结构上做真正的抽象(去耦本身)。
前三条都依赖最后一条。没有真正的去耦能力,前三条都只是在高维表征空间里做插值模拟——看起来像,但本质不是。
所以 AI 时代人真正剩下的是什么?指挥模型的那一层——提出正确的问题、判断输出是否真的解决了那个问题、在模型走偏时把它拉回来、对最终结果做价值决定。这一层的全部工作,都建立在认知去耦这个底层机制上。
然后是最关键的问题:要学的到底是什么?
不是更多知识,模型知道的比任何人多。
不是更流畅的表达,模型写得比大部分人好。
不是更多的“思维框架”,市面上大部分思维框架是装成工具的话术。
真正要学的是少数几件具体的事:几种核心的推理工具(让判断不靠直觉也能做)、几个非本行领域的规律性理解(让跨域映射有东西可映射)、把自己放进会被现实校准的环境(让所有这些不会慢慢腐烂)。加上那个部分不可训练的底层机制——认知去耦。
这四件事各自独立又相互作用,构成一个完整的系统。
第六章 · 四元公式:逐项讲透
去耦本身只是一个算子。一个人的认知产出能力由四项东西共同决定,它们之间的关系不是加法,是乘法。
硬件 × 软件 × 数据库 × 运行环境。
任何一项为零,整体就是零。这解释了为什么绝大多数“提升认知”的努力失败——那些努力通常只作用于四项里的一项,而乘法意味着其他三项的短板会彻底抵消这项的增益。一个天生聪明但从不学推理工具的人,和一个装了一堆推理工具但从不让判断碰到反馈的人,两者的产出都会接近零,只是失败的方式不同。
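“乘法而不是加法”的含义可以用几行 Python 直观地看一下。这只是一个示意:把四项各自打成 0 到 1 之间的分数是为了演示而做的假设,正文并没有给出度量方法;这里按正文写出的连乘形式处理,函数名和数值都是虚构的。

```python
from math import prod


def multiplicative_output(hardware, software, database, environment):
    """四项连乘:任何一项为 0,整体就是 0。"""
    return prod([hardware, software, database, environment])


def additive_output(hardware, software, database, environment):
    """作为对照的加法模型:短板可以被其他项“平均掉”。"""
    return (hardware + software + database + environment) / 4


# 假设的两个人:一个天生聪明但没装推理工具,一个工具很多但判断从不碰反馈
gifted_without_tools = (0.9, 0.0, 0.6, 0.6)
tools_without_feedback = (0.6, 0.9, 0.6, 0.0)

if __name__ == "__main__":
    for person in (gifted_without_tools, tools_without_feedback):
        print(multiplicative_output(*person), additive_output(*person))
```

在加法模型里这两个人都还有一个不低的平均分,在乘法模型里两人的产出都是零,只是归零的那一项不同。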
下面把四项分别拆开。
第一项 · 认知去耦(硬件)
认知去耦是把一个表征从它所指的现实里剥离、当作独立对象来操作的能力。所有抽象、假设、反事实推理、自我审视都建立在这个能力之上。没有它,一个人永远和眼前的具体刺激绑在一起思考——“我的想法”和“事实”混在一起,“这个情境”和“它背后的结构”无法分开。
心理学上这项能力最接近的概念叫流体智力——面对完全陌生问题时、在没有任何先验经验的情况下识别结构的能力。注意它和“知识储备”没有关系。一个博学的人流体智力不一定高,一个读书不多的人流体智力也可能很高。它是纯粹的“遇到没见过的东西能不能自己想出结构”的底层能力。
去耦能力在日常中有几个可观察的信号。给两个表面描述不同但底层结构相同的问题,去耦强的人在两者上表现接近,弱的人会随着表面差异的增大而崩溃,且自己意识不到崩溃。反事实推理的稳定性是另一个信号——“如果 X 没发生、其他保持不变,你能推出什么”——能稳定持有“X 没发生”这个假设并推演的人,去耦在线;立刻被现实覆盖掉假设的人,去耦不在线。还有一个极简单的代理:面对完全陌生的问题时的第一反应。去耦强的人会开始做结构抽象(“这类问题的一般形式是什么”),弱的人会卡在“这我不懂”,或者用最表面的相似性硬套。
这里必须无情。
流体智力的上限由基因决定了大部分。大量比较双胞胎(一组是同卵双胞胎,基因几乎完全相同;一组是异卵双胞胎,基因差异和普通兄弟姐妹一样)的研究显示,这项能力的遗传度远高于人们愿意承认的程度。而且有一个反常的现象——越老越像遗传决定的。童年和青少年阶段还有环境的空间,但随着年龄增长,基因的决定作用反而越来越强,成年之后这项能力基本锁定。
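双胞胎比较如何变成“遗传度”这个数字,可以用经典的 Falconer 公式粗略说明。这是教科书式的简化,现代研究使用更复杂的模型;下面代入的相关系数是为演示而假设的数值,不是任何具体研究的结果。其中 $r_{\mathrm{MZ}}$ 是同卵双胞胎之间的得分相关,$r_{\mathrm{DZ}}$ 是异卵双胞胎之间的得分相关。

```latex
% Falconer 公式:用两类双胞胎的相关之差估计遗传度 h^2
h^2 \approx 2\,\bigl(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}\bigr)

% 假设 r_MZ = 0.85,r_DZ = 0.50(演示用数值)
h^2 \approx 2 \times (0.85 - 0.50) = 0.70
```

直觉是:同卵双胞胎共享的基因约为异卵双胞胎的两倍,所以两组相关之差的两倍,可以粗略归因于基因。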
过去二十年有大量号称能“提升智力”的训练产品,各种大脑训练游戏、工作记忆训练 app、思维训练课程。后续的大规模研究基本否定了这条路径:你练什么就在什么上进步,但这个进步无法迁移到真正的推理任务上。练大脑训练游戏会让你更擅长这个游戏,但不会让你面对陌生问题时想得更清楚。
这意味着市面上几乎所有承诺“提升认知能力”“训练思维”“提高智商”的课程和产品,都在出售幻觉。成年之后,这一项基本锁定。
所以能做的只有两件事。
第一,诚实识别自己在这一项的位置。不是为了放弃,是为了合理分配后面三项的投入——一个流体智力中等的人和一个流体智力很高的人,装上同样的推理工具之后,输出的复杂度上限不同。假装上限不存在,只会让人在错误的目标上消耗自己。
第二,不要让这一项的限制污染对其他三项的判断。后面三项都可训练,而且它们决定了一个人距离自己上限还有多远——这个距离对绝大多数人来说远大于上限本身的差距。
这里必须加一个诚实的附注:流体智力不仅决定去耦能力的上限,也影响后面三项的训练效率本身。装推理工具的速度、装到什么深度、能不能迁移到新领域——这些不是匀速发展的,流体智力高的人会装得更快、理解得更深、迁移得更广。这个规律在认知科学里有扎实的实证支持:学习任务越复杂,流体智力和学习速率的相关性越强;在技能习得的早期阶段,流体智力解释复杂问题解决能力差异的 30% 到 40%。
所以更诚实的表述是——这不是“每个人努力一下就能追平那些聪明人”。天花板之下的空间对所有人都开放,但空间的形状和攀爬的斜率不同。流体智力中等偏下的人,装齐推理工具可能需要比后文给的数字(6 到 12 个月)更长的时间;最终能达到的深度也会比流体智力高的人浅一些。
但这并不改变核心的事实:绝大多数人——包括流体智力高的人——都没有把自己可达到的空间用完。不去用,是真正的浪费。至于“能不能用到最顶端”,那是另一个话题,而且对大多数人来说不相关。
过去那些让人感觉“我其实很聪明只是没发挥”的自助文学,主要功能是焦虑管理,不是帮助。真正有用的信息是冷的:天花板存在,且成年后基本不动;但天花板之下的空间,绝大多数人远远没有用完。
第二项 · 推理工具(软件)
推理工具是装载在认知系统里、可随时调用的具体思考方法。它和去耦的关系类似软件和硬件——硬件决定能跑多复杂的程序,软件决定这台硬件在具体问题上实际输出什么。
这是整个四元公式里最被严重低估的一项。大部分自称“爱思考”的人,推理工具装备是空的。他们所谓的思考,是用直觉对现象做合理化,然后用流畅的语言把合理化表达出来——过程里没有工具参与,所以产出主要是情绪和模式匹配的副产品,不是判断。
覆盖最广、回报最高的四门推理工具是:
概率与不确定性推理——真正理解基准率、条件概率、样本偏差、选择效应、校准这些东西。不是学统计课程,是把它们变成判断的反射。
因果推理——区分相关和因果,理解混淆变量、反事实对照。大部分人把“A 伴随 B 发生”等同于“A 导致 B”,这一门就是系统地纠正这个习惯。
博弈论与激励结构——不需要数学深度,要的是在任何现象里自动识别“谁在为谁的决策买单”的反射。这一门装上后看新闻、看政策、看商业现象的方式会完全不一样。
系统动力学——理解非线性、延迟反馈、涌现。大部分复杂问题的错误归因,都是因为没在脑子里跑反馈回路的模拟。
每一门都有经典的入门读物(见文末附录)。四门装齐后,面对任何现象,认知系统会自动把它拆成“这里面哪些变量、因果方向如何、各方激励如何分布、反馈回路在哪里”——这不是思考技巧,是一种反射。
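以第一门工具里的“基准率”为例,可以用贝叶斯公式做一次最小的演算。下面的函数名和数值都是为演示而假设的,只用来说明忽略基准率会把判断带偏多远;它假设证据出现的总概率大于零。

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """贝叶斯更新:看到一条证据之后,假设为真的概率。

    prior:            基准率,看到证据之前假设为真的概率
    hit_rate:         假设为真时出现这条证据的概率
    false_alarm_rate: 假设为假时仍出现这条证据的概率
    """
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    return hit_rate * prior / evidence


if __name__ == "__main__":
    # 假设:基准率 1%,证据在“真”时 90% 出现,在“假”时 9% 出现
    print(round(posterior(0.01, 0.90, 0.09), 3))
```

在这组假设下,一条看上去“90% 准”的证据只把概率从 1% 推到约 9%。直觉通常会给出高得多的数字,这就是“把基准率变成反射”要纠正的偏差。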
装没装上,有一个极简单的测试:面对新现象时,此人是直接给一个结论,还是会自动把它放进某个推理框架。前者贫乏——他的结论可能对可能错,但他自己分不清。后者已经内化——他会先问基准率、因果方向、激励分布、反馈延迟。
另一个诊断更直接:让他估算一个完全不熟悉的量,比如“一个城市里一年被雨淋湿过的自行车有多少辆”。装备完整的人会自发拆解成几个独立因子相乘;装备空的人会直接给一个数字,或者说“我怎么知道”。这个差异不是知识差,是工具差。
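“拆解成几个独立因子相乘”可以写成下面这样的草图。所有因子和数值都是为演示而编的假设,不是真实统计数据;重点是拆解的动作,不是最后的数字。

```python
def fermi_estimate(factors):
    """把一个陌生的量拆成若干因子,再逐个相乘。"""
    estimate = 1.0
    for value in factors.values():
        estimate *= value
    return estimate


# 假设的拆解:每个因子都可以被单独质疑、单独修正
bike_factors = {
    "city_population": 5_000_000,               # 城市人口(假设)
    "bikes_per_person": 0.2,                    # 人均自行车数(假设)
    "share_used_outdoors_in_a_year": 0.7,       # 一年内在户外用过的比例(假设)
    "share_caught_in_rain_at_least_once": 0.8,  # 其中至少淋过一次雨的比例(假设)
}

if __name__ == "__main__":
    print(round(fermi_estimate(bike_factors)))
```

这样做的好处是错误变得可定位:如果最后的数字离谱,可以逐个检查是哪个因子估错了,而一个直接拍出来的数字没有任何可以检查的地方。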
这一项完全可训练,而且训练回报在四项里最高——因为大部分人起点接近零。装齐最小集大约需要 6 到 12 个月的认真阅读加上刻意在日常判断里套用。这个时间跨度对大部分人来说短得惊人,远短于任何学位课程,远短于学一门手艺,但回报大得多。
关键不是读完书,是在日常判断里能自动调用。验证方法:下次对某个现象有判断时,暂停一下,问自己“这里面哪个工具在起作用”。答得上且答得准——装上了。答不上——还没装上。答得上但发现自己的判断其实绕过了工具、直接走了直觉——这是最常见的中间状态,需要刻意练习把工具前置到直觉之前。
这一项和第一项的关系必须讲清楚。学术上反复验证的一个结论是:智商和实际判断质量的相关性低得惊人——高智商完全不保证不做蠢事,因为智商测的是算力,不测有没有装上推理工具。流体智力中等但工具装齐的人,实际产出可以稳定超过流体智力高但工具空缺的人。
这对在第一项上没拿到好起点的人是真正的好消息。对在第一项上拿到好起点却懒得装第二项的人,是坏消息。
第三项 · 跨域规律深度(数据库)
去耦能力需要有东西可操作——这个东西就是对多个领域的规律性理解。但“深度”需要精确定义,因为它极容易被误解成“精通”或“博学”。
这里有个核心区分。一个领域的深层结构是真正决定行为的因果机制和约束关系;表面特征是术语、流程、案例、行业黑话。真正能被去耦调用的是深层结构,不是表面特征。
所以跨域规律深度不是精通。精通是能执行一个领域的工作——会开庭、会做手术、会写生产级代码。规律深度是能解释为什么这个领域的事情是这样发生的——背后的激励结构、信息不对称、反馈延迟、幸存者偏差、标准叙事在哪里撒谎。前者需要十年以上的专门投入;后者,好的观察者几个月可以到位。
最严厉的一个诊断:能不能说出一个领域的从业者集体相信但实际上是错的一件事。
这个诊断的背后逻辑是——每个领域都有自己的“标准叙事”,那是利益相关者构造出来的、自我美化的、有时是反向的。真正抓住一个领域规律的人,能识别这套叙事在哪里撒谎。说得出的,是真懂;说不出或者复述的都是行业里已经人尽皆知的元吐槽(“资本逐利”“体制问题”之类)的,说明还停在标准叙事层,没进入深层结构。
另一个诊断:听到这个领域的新现象时,能不能在不查资料的情况下预测它的走向,且在预测错时知道自己错在什么假设上。能做到这一点,说明脑子里已经建立了这个领域的因果模型。做不到的——你知道这个领域的事情,但你没有它的模型。有事情没模型,对去耦能力完全没有帮助。
这一项完全可训练,但门槛比推理工具高——需要时间和持续的好奇心投入。每个领域大约需要 6 个月到一年才能到“能预测并识别错误”的深度。
选领域有三类分布回报最高:
激励扭曲明显的领域——医疗、教育、学术出版、政府采购、保险、慈善。这些领域的表象和真实动力差距最大,每理解一个都能装上一批可迁移的思考工具。
历史数据丰富的领域——金融市场、战争史、流行病史、技术迭代史。有真实反馈、可证伪、规律经过长时间压力测试。
与主场邻接的领域——你已有深度的领域的相邻学科。学习斜率最快,且迁移回主场的价值最高。
每个领域的最小学习路径:读一本内部人吐槽本行的书(不是入门教材),读 2 到 3 篇实证研究或系统综述,定期追一个高信息密度的来源(不是新闻,不是 KOL,是行业内的深度 newsletter 或研究者博客),做至少 10 条可验证预测并跟踪。没有最后这一步,前面三步全是娱乐。
需要主动避开的是主要靠故事和叙事驱动的领域——时尚、娱乐八卦、政治评论、鸡汤商业书。它们的“规律”大部分是事后合理化,学了之后脑子里装的是更多表面特征,不是深层结构。
第四项 · 反馈暴露(运行环境,作为指数)
前三项决定瞬时能力。第四项决定前三项能不能长期维持且持续增长。
反馈暴露指的是一个人的判断在多大程度上被现实系统性地检验。高反馈环境里,每个判断都会被事实打脸或确认;低反馈环境里,判断可以无限期飘在空中而不被校准。
为什么这是指数项而不是加法项——因为没有反馈,前三项的任何水平都会随时间腐烂。去耦能力会退化成自嗨(以为自己在做结构抽象,实际在产出听起来深刻的废话);推理工具会变成仪式(用贝叶斯的语言但从没真的更新过先验);跨域规律会变成学究式的收集(知道各种规律但分不清哪些在当前情境下适用)。反之,即使前三项中等,在高反馈环境里会持续自我校准,时间拉长后产出质量远超前三项满配但活在低反馈环境里的人。
最简单的一个诊断:过去一年你做的重要判断里,有多少被现实明确地验证或推翻过?
数量接近零的人,无论他自认为多会思考、读过多少书、表达多么流畅,认知系统实际上已经很久没被校准过了。他可能在某个阶段达到过相当高的水平,但那个水平正在缓慢失真,而他自己察觉不到——低反馈环境的定义就是没东西来告诉他失真了。
有一个著名的长期研究,跟踪了近 300 位各领域“专家”在二十年里的大量预测。核心发现不是“哪些人预测得准”,而是绝大多数专家的预测准确率接近抛硬币,但他们对自己准确率的自我评估远高于实际。二者的差距来自同一件事——这些人从未被系统记录过自己的预测。没有记分板,所以没有校准。
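“记分板”具体在算什么,可以用一个小例子说明。下面用 Brier score(概率预测常用的一种评分)和“平均自信度减去实际命中率”来示意;这只是常见做法之一,样本数据是虚构的,函数名也是为演示而起的。每条记录是(预测“会发生”的概率,实际是否发生)。

```python
def brier_score(forecasts):
    """预测概率与实际结果(0 或 1)之差的平方的平均值,越低越好。"""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def overconfidence_gap(forecasts):
    """平均自信度减去实际命中率;正值表示过度自信。"""
    mean_confidence = sum(p for p, _ in forecasts) / len(forecasts)
    hit_rate = sum(outcome for _, outcome in forecasts) / len(forecasts)
    return mean_confidence - hit_rate


# 虚构的五条预测:每条都很有把握,但只中了两条
sample_forecasts = [(0.9, 1), (0.9, 0), (0.8, 0), (0.9, 1), (0.8, 0)]

if __name__ == "__main__":
    print(round(brier_score(sample_forecasts), 3))
    print(round(overconfidence_gap(sample_forecasts), 2))
```

作为参照,对每件事都报 50% 的人 Brier score 是 0.25。在这组虚构数据里,得分比这个基准还差,而自信度比命中率高出四成多。没有这样的记录,这个差距就无从被当事人看见。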
严格说这一项不在训练层面,而在环境选择层面。反馈暴露不能通过努力提高,只能通过进入什么环境、拒绝什么环境来选择。这让它成为四项里最隐蔽的一项——前三项的水平别人能从交流中大致判断,反馈暴露是结构性的,外人看不出来,当事人也经常自己看不出来。一个在大公司做了十年“战略”的人可能前三项都很高,但如果这十年里他的判断从未被市场、被用户、被具体结果检验过,那他这十年的“思考”产出大部分是噪音。
值得进入的环境是这些:创业(产品卖不卖、用户留不留,每周都在验证判断);交易和投资(每个决策带着盈亏作为真实反馈);实证研究(假设被实验证伪是常态);做面向真实用户的产品(不是做给 KPI 的产品);竞技博弈(围棋、扑克、电竞——结果清晰)。
以上是以“职业”为单位的反馈环境。但反馈闭环不只存在于职业生活里——日常生活里有大量同样严苛的反馈结构,只是容易被忽略:做饭(菜咸了淡了、火候过了不过,每一餐都在对一个具体判断打分);带孩子(孩子的反应是几秒内的反馈,你的每个判断都在被哭声、笑声、行为反应即时校准);锻炼身体(动作对不对、强度合不合适、饮食调整有没有效果,身体会用很短的周期告诉你);养宠物(和带孩子类似的结构,周期更短,因为反应更直接);学乐器(每个音是不是对、节奏是不是稳,耳朵即刻反馈);修理东西(修好了还是没修好,毫无空间让你自我美化);谈判(对方的反应就是你判断有没有对齐他们实际想法的校准);园艺(植物不会给你面子,它活或者死)。
这些都是极高密度的反馈闭环。相比之下,一个在大公司会议室里做战略 PPT 十年、从不下场执行的人,在反馈暴露这一项上,可能远远输给一个每天给两个孩子做饭同时带一只狗的家长。这不是修辞,是结构——前者的判断从不接触现实,后者的每个判断都在几小时之内被现实打脸或确认。
文化上我们倾向于把“认知能力”和特定职业绑定(科学家、工程师、投资人),但反馈暴露这一项完全不尊重职业标签。一个真正投入在一件需要持续反馈的生活实践里的人——哪怕那件事是做饭、带孩子、锻炼身体——认知系统保持在线的概率,反而高于很多在“体面白领岗位”上做着永远不被结果验证的工作的人。前三项(去耦、推理工具、跨域深度)确实需要专门训练,但第四项——让自己的判断持续碰到现实——在任何生活形态里都是可得的。
应该避开的是纯观点生产(KOL、评论、专栏),咨询式“战略思考”(建议给出就结束,不跟踪执行),低频决策长延迟反馈的大公司岗位,以及任何“说得好就赢”的领域——说服力和判断准确性在这些领域完全脱钩。
如果暂时不能改变主环境,可以自己建反馈机制。但要先破除一个常见的幻觉——市面上流传的那套“预测日志、决策日志、对抗性同伴小组”基本没人能坚持。不是它们错,是它们把反馈机制从真实生活里剥离成了一个附加仪式。一个每天忙着工作、生活的人,靠意志力每月写 5 条预测,两周之内就会停。缺反馈的人缺的不是表格,是一种让自己的判断必须接触现实的生活结构。表格解决不了结构问题。
真正可持续的反馈来自两件事——在自己已经关心的领域里完成判断-验证闭环,以及进入新领域时把学习本身变成有验证的过程。两件都不是日志,是行动。
第一种 · 在已有领域里完成闭环。大部分人在自己的领域里其实已经每天都在做判断——只是这些判断没有被明确化,所以无法验证。同事问“这个方案行不行”、朋友问“这家公司值得跳吗”、自己决定“这个功能做不做”、看到一条新闻觉得“这事过两周会反转”——每一个都是判断,每一个都可能被后来的事实检验,但绝大多数人从不把这些判断说出口或写下来。它们以一种模糊的“我觉得”的形式存在,之后不管结果如何,都可以被大脑重写成“我当时就是这么想的”。
要做的事情只有一件——在判断发生的那一刻,把它具体化。和同事说方案有问题时,明确说“我觉得它会在 X 上卡住,两个月内”;做产品决策时,直接在文档里写“我们预期做完之后 DAU 会涨 Y,如果到时没涨我错了”;看新闻做预测时,告诉身边一个具体的人“我赌 X 不会发生”。不需要日志、不需要打分、不需要 Brier score——把判断说出口这个动作本身就强制了后续的自我校准,因为说过的话会在几个月后被对方或被自己记起来。
这个做法的门槛是承担被打脸的风险。大部分人不把判断说出口,不是因为懒,是因为说出口之后错了会丢脸。正是这个“丢脸”的压力让判断变得真实——它强制你在发出判断之前再想一遍。每一次这样的具体化,都是一次小型的认知校准。
第二种 · 把学习新领域变成验证过程。想理解一个新领域时,大部分人的默认路径是先读、再读、再读,读到某个模糊的“我觉得懂了”为止——然后这个“懂”从未被验证过。正确的路径是先形成一个能错的预判,再去验证。
具体做法:选一个你最近想搞懂的领域,不读任何东西的情况下,先写下三到五句你对这个领域的基础假设——你觉得它是怎么运作的、哪些因素在驱动它、近期会怎么变化。写下来之后再去找材料读。读的过程不是“吸收信息”,而是持续拿材料和自己的预判撞——哪些假设被证实、哪些被推翻、哪些需要修正。读完之后,你脑子里不是一堆别人的结论,是一个被校准过的自己的模型。
这个方法的核心不是效率,是信息的加工路径。先读后想是被动吸收,产物是别人判断的复述;先想后读是主动校准,产物是自己的因果模型。花费的时间其实差不多,但输出质量完全不同——前者几个月后什么都记不住,后者在这个领域里建立了一个可持续使用的判断工具。
为什么这两种方法管用而日志方法失败。日志方法要求你为“记录”本身投入额外的精力,这件事除了少数格外有纪律性的人之外对绝大多数人不可持续。上面这两种方法不需要额外精力——它们只是改变你本来就在做的动作的方式。你本来就在和同事讨论、做决定、读东西、关心新闻;这些方法做的事是让这些动作自带验证环节,而不是在它们之外另起一个仪式。
可持续性来自不新增生活内容,只改变已有内容的结构。这个原则比任何具体工具都重要。
四项里只有一项决定了上限,另外三项决定了一个人离自己上限还有多远。
绝大多数人距离自己的上限差得非常远。
这是坏消息,对不想动的人。这是好消息,对愿意动的人。
第七章 · 最后
一个愿意动的人,读完前面的框架,自然会问——“所以我应该从哪里开始?”
这个问题本身值得警惕。
如果读完四项公式之后第一个动作是索要一份“开始的步骤清单”,说明还在用消费方法论的方式消费这篇文章。清单再好,拿清单的人大概率两周内停下来——因为清单所描述的那些动作都建立在一件更底层的事情之上,而那件事不是靠清单能启动的。
这件底层的事只有一件:
停止把“消费反刍”当作学习。
这不是“少刷半小时手机”,不是“每天早起读书”——这些都是把问题理解成纪律问题。真正要停的是一种自我欺骗:把“我今天跟上了 AI 圈”的感觉误认成“我今天学到了东西”。承认前者只是情绪管理、和刷剧放松同属一类——这个承认本身就是第一步。
承认之后,会发生一件具体的事:未被反刍占据的认知带宽开始出现。这个带宽一旦出现,它会自己找到用处——不是因为你“决心学习”,是因为一个不被噪音淹没的头脑本来就会朝自己真正关心的问题走。
接下来的问题——怎么和模型打交道、什么问题值得问、怎么用 AI 打磨自己的判断、怎么在已有经验上长出跨域深度——不在本篇的范围里。本篇只做诊断:讲清楚你面前的分化是什么、它由什么组成、大部分人距离自己的上限差多远。
这套东西不是给所有人的。
认知去耦的硬天花板决定了有些人装不进这个系统——这是残酷但必须说的事实。这篇文章从头到尾没有提供希望的普惠,它提供的是一个能让人判断自己位置的框架。
但能读到这里的人不是那群人。能识别出“我在消费反刍”这件事、能跟着一路读到这里而没有关掉页面的人,已经完成了基础自测——前面那些残酷的话里,有一部分没有打中你。
AI 时代真正的重新分层正在发生。过去的评价体系奖励晶体智力和模式库,所以把知识和经验错当成认知能力的人也可以显得“有能力”。AI 把那些东西摊平之后,剩下的那部分——真正的认知活动——第一次被单独标价。大部分人还没意识到这件事。等他们意识到的时候,差距已经不是可追赶的量级。
最后一个自检,比前面任何框架都更有杀伤力——
回想你最近一次真正改变重要看法是什么时候、因为什么改变。
答得出具体的、近期的、非鸡毛蒜皮的——这套系统在你身上已经在运转。
答不出的——问题不在 AI,不在时代,在自己。而且问题从来就在这里,只是过去没有暴露而已。
附录 · 入门书单
第六章提到的四门核心推理工具,各有经典的入门读物。按难度从低到高排列,从第一本开始读就好。
概率与不确定性推理
- 《思考,快与慢》Daniel Kahneman
- 《超越智商》Keith Stanovich
因果推理
- 《为什么:关于因果关系的新科学》Judea Pearl
- 《因果推断实用指南》Scott Cunningham
博弈论与激励结构
- 《冲突的战略》Thomas Schelling
- 《大脑中的大象》Robin Hanson & Kevin Simler
系统动力学
- 《系统之美》Donella Meadows
关于反馈暴露和判断质量(第六章第四项的理论基础)
- 《专家的政治判断》Philip Tetlock
- 《超预测》Philip Tetlock & Dan Gardner
读的时候有个实用提示:每本书读完先不要读下一本,花 1 到 2 周在日常判断里刻意用这本书的思考方式。没有这一步,读完等于没读,因为知识没有在脑子里跑过实际推理,不会形成反射。
引用与出处
本文涉及的主要事实性断言的来源。
关于 AI 能力分化(第一、二章)
- Andrej Karpathy 关于 “growing gap in understanding of AI capability” 的 X 推文(2026 年 4 月),原文提到免费 ChatGPT 用户和付费 Claude Code / Codex 用户对 AI 能力判断的两极分化,以及强化学习 + 可验证奖励的技术解释。相关二手报道:The New Stack, “Karpathy says developers have ‘AI Psychosis’”(2026 年 4 月)
- “该走路还是开车去洗车”例子不是 Karpathy 提出的,是 2026 年 2-3 月在 Threads / Twitter 上独立流传的 viral test,被多家媒体报道(Newsweek, Cybernews 等)
- SimpleBench:simple-bench.com(由 AI Explained 频道的 Philip 发布,2024),213 道多选题,人类 baseline 约 84%,持续高于 SOTA 模型
- BrainBench:Exposing the Commonsense Reasoning Gap in Large Language Models(arXiv:2603.14761,2026)——系统化 LLM 在常识推理上的失败模式
关于反刍链与知识管理(第三、四章)
- Google effect / digital amnesia:Sparrow, Liu, Wegner, “Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips,” Science 333 (2011): 776-778(注:该研究的效应量在后续复制中有争议)
- Karpathy 的 LLM Wiki 方案:https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f(2026 年 4 月 3 日发布,wiki 规模约 100 篇文章、40 万字)
- Luhmann 的 Zettelkasten 系统:一生积累约 90,000 张卡片,发表 70+ 本书、400+ 篇论文。参考 zettelkasten.de 和 Sönke Ahrens, How to Take Smart Notes(2017)
关于通才、品味、认知折叠(第五章)
- Dan Koe 的通才论:thedankoe.com 上多篇相关文章,如 “The Rise of the Generalist”、“The Future of Work”
- Paul Graham 的品味理论:paulgraham.com/taste.html(“Taste for Makers”, 2002)
- Rick Rubin:The Creative Act: A Way of Being(2023)
- 认知去耦、反思心智、IQ 与理性判断相关性低:Keith E. Stanovich, Rationality and the Reflective Mind(Oxford, 2011)和 What Intelligence Tests Miss(Yale, 2009)。Stanovich & West 系列研究显示思考倾向和 IQ 的相关性通常 < 0.30
- 流体智力与结构抽象能力:François Chollet, “On the Measure of Intelligence”(arXiv:1911.01547, 2019)
关于认知能力的可塑性(第六章第一项)
- IQ 遗传度随年龄增长的 Wilson Effect:Haworth et al., “The heritability of general cognitive ability increases linearly from childhood to young adulthood,” Molecular Psychiatry 15 (2010): 1112-1120;Plomin & von Stumm, “The new genetics of intelligence,” Nature Reviews Genetics 19 (2018): 148-159。成年期 IQ 遗传度估计在 0.70-0.80
- 脑训练无法迁移到流体智力:Melby-Lervåg & Hulme 等多项 meta-analysis;Simons et al., “Do ‘Brain-Training’ Programs Work?” Psychological Science in the Public Interest(2016)
关于专家预测与反馈暴露(第六章第四项)
- Philip E. Tetlock, Expert Political Judgment: How Good Is It? How Can We Know?(Princeton University Press, 2005/2017)——从 1985 到 2003 年,284 位各领域专家,共 27,451 条可验证预测。核心结论:专家预测准确率接近随机,且对自身准确率的自评远高于实际
关于 AI 前沿能力基准(第二章)
- AIME(American Invitational Mathematics Examination):数学奥赛初选级别基准
- GPQA(Graduate-Level Google-Proof Q&A):博士级科学题基准
- SWE-Bench:真实 GitHub issue 软件工程任务基准
- 这些基准在 2025-2026 年间被前沿模型(OpenAI o 系列、Anthropic Claude 4/4.5/4.6 系列、Google Gemini 2.x/3.x 系列等)陆续逼近或达到专家水平
本文对上述来源做了概念上的转述而非直接引用。如需核查原文表述,请参考原始链接与出版物。
Complete English translation.
Chapter 1 · Four Phenomena
If one audits a year of one’s own information consumption, several patterns appear at once. Any one of them on its own is unremarkable; read together, they point at a structural fact that has not been discussed enough.
Four phenomena, then.
I. The rumination chain
Most of the AI-related content you see on Instagram, Substack, or X each day is produced like this. A piece of primary material, whether a paper, a snippet of code, an experiment result, or some practitioner’s firsthand observation, is summarised once by an account using an LLM. Then a marketing account picks it up and gives it a hook-shaped headline. Then the next account’s LLM rewrites it in its own voice. Then it is pushed to the end reader.
You’ve seen the headlines: “You’re using ChatGPT wrong,” “Gemini just cooked GPT-5,” “The white-collar bloodbath has begun,” “Sam Altman just confirmed it,” “How this 19-year-old is making $100K a month with AI,” “Silicon Valley is quietly panicking,” “This time is actually different.” Each one manufactures urgency; click through, and you find a few hundred words of nothing.
This is the rumination chain. At every layer, three things happen at once. Information density drops, as details, caveats, and uncertainty are eaten by summarisation. Error accumulates, because every pass introduces small distortions that reinforce each other over multiple layers. Emotional intensity rises, because every handler has to crank up the drama to survive algorithmic distribution. What arrives at the end reader is something processed, nutritionally depleted, but easier to swallow: a prefabricated junk food of the content world.
The chain used to be entirely human, and its speed and volume were nothing like today’s. Now we’re watching a vast quantity of ruminated garbage being reinvented and manufactured at scale.
When people think they’re “taking in information” every day, what they’re taking in is the tail end of this chain, probably three or four layers removed from the primary source. This fact is not itself news; it’s been said many times. It’s the premise for the three layers that follow.
II. The tooling-up of ruminated consumption
The more arresting thing to watch is not the rumination chain itself. It is the consumer’s response.
Faced with an infinite supply of ruminated content, a subset of people responds not with “consume less, find a primary source” but with “I need better tools to consume more.”
Take OpenClaw. Its underlying capability goes well beyond this, but its most virally shared use case is not “help me write code” or “help me analyse data.” It is “help me summarise what happened in the AI world today.” Users enthusiastically show off their workflows on social media: AI reads dozens of sources for me each day, auto-generates summaries, pushes key updates to my phone.
On the surface this looks like an efficiency gain. It is in fact a strange self-reinforcing loop: an AI tool helping me consume, more efficiently, the rumination another AI has spat out. The throughput of actively eating ruminated matter has itself become a publicised achievement.
The reader’s position has shifted subtly, from passively receiving rumination to actively optimising the ruminated-consumption pipeline. What looks like a step up in agency is, at bottom, an amplification of consumption volume.
III. The producers split
Meanwhile, in a different and barely overlapping circle, a different group of people is doing something entirely different.
Andrej Karpathy has described this split precisely on X: judgments of AI capability are extremely bimodal across populations. On one side are people who mainly use free or older ChatGPT. Their impression sits somewhere around “this thing hallucinates, it produces AI slop.” The viral social-media examples of AI not being sure whether to walk or drive to the car wash are the AI these people see every day. On the other side are people paying a few hundred dollars a month and using Claude Code or Codex for real technical work daily. They see “refactor this entire codebase in an hour,” “find a vulnerability in this system autonomously.” The ceiling of this capability has been growing explosively over the past year.
The two groups inhabit radically different realities. The first group’s information source is summaries from the tail of the rumination chain; the second group’s information source is their own real-time collaboration with the model. The first judges AI by “what did I read about it”; the second judges AI by “what have I gotten it to do.”
Two groups use the same word, “AI,” and are not referring to the same thing. One says “AI is whatever”; the other says “this is exploding.” Both sentences are true. The difference isn’t in the model; it’s in what the user is doing.
IV. Three layers at once
Any single layer, read in isolation, is just a local observation. Read the three together, the rumination chain, the tooling-up of ruminated consumption, and the producer split, and a question surfaces that hasn’t been singled out for discussion.
People consuming rumination are not short of information. They are exposed to more of it than anyone in any previous era. Nor are they “failing to learn,” because the product was never designed to teach them anything. What they are actually doing is using the act of “consuming information” to manage the fear of having fallen behind the times, a kind of self-soothing against anxiety.
People doing real cognitive work (Karpathy’s second group) aren’t doing so because they consume more information. It’s that they are doing something categorically different: their cognitive activity is running. Faced with a problem they push on it, test it, verify it, and form their own judgment. Rumination consumers don’t do any of this. They read the finished product, register a keyword or two, close it, next one.
So what the three layers together point to is not “information overload” (a cliché), nor “AI is making people lazy” (an evasion). They point at a sharper fact: most people have been mistaking knowledge, experience, and fluency for cognitive ability itself. They have never identified what a real cognitive act is.
What counts as real cognitive activity? Start by ruling out several familiar mistakes. It’s not a brain turning over (brains are always turning over; you’re alive). It’s not having read or memorised a lot (that’s knowledge storage). It’s not being skilled in a field after years of practice (that’s experience). It’s not fluent expression (that’s linguistic capacity). These were once confused for cognitive ability because, in the old environment, they tended to show up together with it. But they aren’t it.
Real cognitive activity means producing structural judgments that reality can calibrate: forming concrete judgments that future facts can verify or refute, recognising deep structure in an unfamiliar problem, and treating the distance between one’s current judgment and reality as a manipulable object. Cognitive activity in that sense is the thing the AI era has separated out and priced on its own. Most people have never registered that this distinction exists. Without registering it, there is no opportunity to do it.
The old evaluation system did not require it to run. Experience, patterns, fluency, broad reading, expressive ease, social sensitivity: none of the old expressions of “smart” or “capable” required real cognitive activity. A shrewd businessman, a well-read editor, a competent middle manager. Their cognitive systems may never have performed a single genuine reflection. Yet in the old environment they were genuinely high-ability people, because the old environment rewarded those traits, and rewarded them well.
The AI-era shift isn’t “AI makes people lazy.” It’s that for the first time, cognitive activity itself has been peeled off from every other ability and priced separately. Once knowledge storage has been flattened by the model, repetitive tasks swallowed, and the production cost of finished judgments driven to zero, what remains is cutting reality into structure, catching errors in the output, and doing real abstraction on unfamiliar problems. Most people simply don’t have that. And their own past successes tell them “I am capable.”
This isn’t a new problem. It’s a fact that has always existed, masked by the old evaluation system, now exposed for the first time under new technical conditions. Not because people suddenly got dumber, but because the old sources of “feeling capable” are disappearing.
Chapter 2 · AI Isn’t Uniformly Strong
Before the main argument, a common misconception needs to be cleared out.
The misconception is this: AI has surpassed humans across the board, so most people feeling small in front of it is natural, and passively consuming rumination is a reasonable response.
This is half right and half completely wrong.
Where AI is strong. On structured, measurable tasks, AI has stably outperformed the median human and, in many cases, the domain expert. Competition math (AIME), PhD-level science questions (GPQA), real software-engineering tasks (SWE-Bench), various language-understanding tests: these benchmarks have been beaten one after another by frontier models over the past year. Add knowledge retrieval, fluent expression, and instruction-following to the list, and AI far outstrips most people. For work in these directions, the best use of AI is as a super-lever. It can amplify a capable person’s output by more than ten times.
Where AI is weak. An equal amount of evidence shows that AI’s strength is sharp and uneven. Change a problem’s surface conditions slightly, or substitute a scenario that looks unrelated but is structurally identical, and the model’s performance can suddenly collapse. This brittleness shows up repeatedly in benchmarks like SimpleBench and BrainBench, where models underperform the median adult on ordinary common-sense questions. And there is a whole class of tasks AI is bad at: judgments with no clear right or wrong (is this design good; is this decision correct), processing murky reality (what is this tangle of user feedback actually saying), abstracting on a completely unfamiliar structure (how do you decompose a problem type you’ve never seen before).
Why it’s uneven. There is a technical reason. The frontier models’ progress is driven mostly by reinforcement learning, which gives the model “right / wrong” signals so it can calibrate. That kind of training is extraordinarily effective in domains with unambiguous correctness. Does the code run? Is the math answer correct? Does the test pass? These are verifiable rewards: clean signals, high training efficiency. Conversely, in domains without clean correctness, such as good judgment, appropriate restraint, or taste, there is simply no way to produce a reliable reward signal. How do you teach a model that “this paragraph feels off”? You can’t. So progress in these domains lags far behind the verifiable ones. Add commercial incentives: code and structured tasks generate direct B2B revenue, so the vast majority of compute and research effort flows there. The result is that AI capability grows exponentially in “verifiable + commercially valuable” domains, and slowly everywhere else. This is the technical story Karpathy has been repeating on X. Chapter 1’s bimodal capability split between the two groups is, at bottom, the two groups using different axes of the same model.
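To make “verifiable reward” concrete, here is a minimal Python sketch of what such a signal can look like. It is a toy illustration under stated assumptions, not how any lab actually trains its models: the function names `math_reward` and `code_reward` and the all-or-nothing 0/1 scoring are hypothetical choices made for this example.

```python
from typing import Callable, Iterable, Tuple


def math_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference exactly, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def code_reward(source: str, func_name: str,
                cases: Iterable[Tuple[tuple, object]]) -> float:
    """Return 1.0 only if the candidate defines func_name and passes every case."""
    namespace: dict = {}
    try:
        # Toy only: real systems run untrusted code inside a sandbox.
        exec(source, namespace)
        func: Callable = namespace[func_name]
        passed = all(func(*args) == expected for args, expected in cases)
        return 1.0 if passed else 0.0
    except Exception:
        # A crash, a missing function, or a syntax error simply counts as wrong.
        return 0.0


# Hypothetical candidate solutions and test cases, invented for this example.
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
cases = [((1, 2), 3), ((0, 0), 0)]

math_ok = math_reward(" 42 ", "42")
good_score = code_reward(good, "add", cases)
bad_score = code_reward(bad, "add", cases)
```

Both checks fit in a few lines because the task carries its own definition of “correct.” It is far harder to write a comparably clean check for “this paragraph feels off,” which is the asymmetry the paragraph above describes.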
The real corollary: AI is an amplifier, not an equaliser. What it amplifies is not “human ability overall” but “the structure the user can feed into it.” Give it to a person who can already break a problem down and judge output quality, and AI amplifies their work more than tenfold. Give it to a person who can only ask AI to “write something,” and AI returns rumination. What gets amplified is the frequency of their anxiety management.
The crux: AI has not replaced the act of constructing the problem.
A reasonable objection needs to be addressed here. “AI has already demonstrated structural abstraction under reinforcement learning: the o-series reasoning models, working on hard math problems, overturn their own wrong assumptions and rebuild their approach mid-solve; AlphaGo was inventing novel joseki a decade ago. Isn’t saying ‘AI can’t do structural abstraction’ already out of date?”
The observation is correct. But there is a critical distinction that has to be drawn explicitly.
AI’s structural abstraction always happens under a given reward function. Go’s objective is to win; a math problem’s is to be correct; code’s is to run. Inside those well-bounded spaces, where the victory condition has already been defined by humans, AI can indeed search for structure better than humans can. This is instrumental decoupling: finding the optimum inside a defined game. AI is already past humans at that layer, and will keep pulling further ahead.
But there is another layer of decoupling: defining what is worth pursuing, what question is worth asking. What hidden human need does this product actually solve? What kind of human reflex will this policy provoke? What is this tangle of user feedback really trying to say? Where is the real root of an organisation’s long-running problem? This layer has no preset reward function, because “what counts as right” here has to be adjudicated by a human in the first place. AI can generate ten thousand logically airtight structures, but the step of judging which one actually corresponds to the real world’s pain requires an adjudicator who is an embodied agent, living inside reality, bearing consequences. Without that agent, where does the reward function come from?
So what AI has taken over is “finding the optimal structure inside a defined game” (something humans were not particularly good at anyway; handing it to AI is reasonable). What it has not taken, and cannot generate on its own, is “deciding which game to play.” The structure of that second question is, by nature, not in a layer AI can reach.
Back to the division of labour: which questions are worth asking, which direction is right, whether the output actually solves the problem. These judgments still rest entirely with humans. And this class of judgment is exactly what the last section of Chapter 1 was pointing at: the kind of thing only real cognitive activity can do.
So “AI is too strong, I can’t keep up” is a lazy story. AI has indeed surpassed you at the things you should have let go of (memorisation, fluent expression, instruction-following); it has not surpassed you at the things you should actually be participating in (judgment, construction, taste, structural abstraction). The people who have opted out where they could participate are the ones AI is actually leaving behind. And they often don’t notice, because ruminated consumption gives them the illusion of “still keeping up.”
Chapter 3 · Where the Reader Stands on the Chain: Learning, or Anxiety Management?
If AI is an amplifier rather than an equaliser, then “the output gap between users is widening” is a necessary corollary. An amplifier blows up whatever gap was already there on the user’s side. But that alone doesn’t explain why the gap between the two groups is so large, and pointed in the opposite direction: on one side, people building complex systems with AI; on the other, not “people who use AI less” but people who consume more and more through AI and produce less and less.
This reverse dynamic needs an additional explanation. It isn’t about ability or skill with the tool. The real reason is that at the tail of the rumination chain, most people aren’t learning at all. They’re doing something that looks like learning but whose mechanism is the exact inverse.
Learning and anxiety management
Put the two definitions cleanly side by side.
Learning is this loop: encounter something you don’t understand → form a preliminary judgment on your own → go find information with specific questions in mind → use the information to calibrate your judgment → form a new understanding that is now yours. All four moves after the initial encounter are required. Of them, the prior judgment and the calibration are the core of the process. Without those two steps, incoming information won’t be absorbed into any existing structure. It’s like pouring water where there is no container: a little while later it’s dry, nothing left.
Anxiety management is a different loop: feel behind → consume information → feel “I’m keeping up” → anxiety briefly eases → because nothing was internalised, the feeling of being behind returns a little later → consume again. This loop has no prior judgment, no calibration, no internalisation. Its operating logic is the opposite of learning’s. Learning produces cognitive structure; anxiety management produces emotional relief.
On the surface the two look alike, because both involve the act of “reading something.” Mechanically they are inverses. Worse: the brain cannot distinguish between “reading a conclusion” and “arriving at a conclusion oneself.” Read a well-constructed article and the brain produces the feeling of “I now understand this,” nearly identical to the satisfaction of having reasoned to a conclusion yourself. But the first is a moment’s illusion. Close the tab, and a few days later nothing is left. The second is genuine cognitive internalisation, and changes how the person judges the matter for a long time afterward.
This quirk of the brain is the foundation that lets the rumination chain keep running. It makes anxiety management feel identical to learning, so the consumer gets a daily “today wasn’t wasted” confirmation while in fact nothing has been taken in.
What the rumination chain is actually for
Chapter 1 already laid out the mechanics of the chain: every layer drops information density, accumulates error, and raises emotional intensity. The more important fact is this: the chain’s design objective is not to convey information; it is to hold attention.
That distinction matters. A thing produced to convey information (an academic paper, a rigorous empirical report, technical documentation) keeps, by design, the content that lets the reader truly understand and use it: definitions, conditions, counterexamples, uncertainties. In the attention economy those things are liabilities. They reduce reading fluency and raise cognitive load. So a thing produced to hold attention will, by design, systematically delete exactly that kind of content.
Which means: no matter how hard you read this kind of material, you will not learn from it. Not because the reader isn’t trying hard enough. Because the product was never designed for learning to occur. The more carefully you read, the more you are just walking circles inside a room with no exit.
The reader’s position inside the loop
Now stack the anxiety-management loop on top of the rumination chain, and you can describe precisely what most people do each day:
Open an app → scroll into a hook-shaped finished judgment → read quickly → retain one or two keywords or an emotional impression → close → land on the feeling “I kept up today” → anxiety briefly eases.
Not one step in that entire sequence is learning. No prior judgment (before reading, they didn’t know what they were trying to verify), no calibration (they didn’t check the information against any existing judgment), no internalisation (close the tab and nothing remains; ask them later, they can’t recap it). Every action in the sequence is emotional regulation, using the act of “contacting information” to complete a ritual of “I didn’t fall behind.”
Here’s a cruel test: find someone who scrolls AI news every day, and ask them to recap the core argument, and their own judgment of it, for any single piece they read three days ago.
Most of them can’t.
This isn’t a memory problem. It’s that the content was never processed, so there is nothing there to remember. The brain only holds onto what it has processed itself: what it has judged, argued with, tested against its existing understanding. Information that has flowed past passively, no matter how “information-dense” it felt at the time, is gone a few days later.
The contrast with people doing actual AI-adjacent work is complete. They don’t consume ruminated content. The day’s experiments, model feedback, and debug output already provide information at a density far higher than any secondhand summary. Their judgment of what a model can and cannot do comes from the specific thing they tried and failed to get it to do yesterday, not somebody else’s review.
And here is the counterintuitive part: the more rumination consumers consume, the further behind they feel; the more doers do, the clearer they feel. Not because the doers know more. Because the two activities move information in opposite directions. Anxiety management is input-only: information becomes emotion, burned up. Real work processes input and produces output; every pass reinforces cognitive structure. Consumer impressions of AI: “there’s too much, I can’t keep up, something new dropped again.” Doer impressions: “I noticed yesterday it’s especially strong at X, still weak at Y.” Specific, clean, bounded.
The two groups use the same word while referring to entirely different mental objects.
When consuming rumination becomes an achievement
Push the diagnosis all the way, and a phenomenon worth staring at appears.
OpenClaw’s most virally shared use is “help me summarise what happened in the AI world today.” It looks like an efficiency gain; it’s something that wasn’t possible before. But think about it. Is this actually an achievement worth broadcasting?
Under the framework built up so far, the meaning of that use case becomes: a tool that lets me more efficiently, more continuously consume low-density ruminated content; and I treat it as a use case worth sharing. The throughput of actively eating ruminated matter is itself presented as a display of productivity.
This is not a problem with a particular tool. It is the externalisation of a deeper collective belief: keeping up with information equals being capable. In the pre-AI era that belief had some basis. Information was scarce; sustained access to primary sources was itself a rare skill. After AI collapses the production cost of finished judgments to zero, that belief becomes a purely manufactured need. The content you can consume is infinite, and not one piece of it was produced so that you could learn something.
So we get a structurally ironic picture. AI takes the supply of information from scarce to infinite, and a portion of the population responds not with “I can finally shift from consuming information to producing judgment,” but with “I need stronger tools to consume more.” The stronger the tool, the faster the consumption; the faster the consumption, the more information dimensions brushed against; the more dimensions brushed against, the deeper the anxiety. Every piece consumed implies ten more unconsumed.
Once the production cost of finished judgments is zero, “efficient consumption of finished judgments” becomes the last citadel of a manufactured need. The citadel stands not because it provides any real value, but because it provides a kind of busyness that can be displayed, a performance of “keeping up” that the AI era still accepts.
People doing real cognitive work are not in that performance. They aren’t even watching it.
Chapter 4 · A Subtler Trap: Knowledge Management
Ruminated consumption is the obvious form: scroll, read, next. But it has a subtler variant: not passive absorption but active construction, not scrolling information but organising it. Because it camouflages itself better as “serious work,” it does more damage to serious people.
The variant is called knowledge management.
Why it has become a false premise in the AI era
The phrase “knowledge management” hides an older era’s assumption: that knowledge is a static object that can be managed, like a book in a library, able to be categorised, indexed, retrieved. The entire tool-chain of the past few decades (Evernote, Notion, Roam, Obsidian, Logseq) is built on that assumption.
Real cognition doesn’t work that way. What’s in your head as “knowledge” is not statically stored information; it’s a relational network constantly being re-composed. “Understanding” is not “remembering.” “Being able to use it” is not “being able to find it.” That mismatch means knowledge management has been aimed at the wrong target from day one.
Several mechanisms came together in the past few years and pushed that mismatch into full false-premise territory:
Retrieval has been solved by AI. One of the core reasons to organise notes used to be “so I can find it later.” Today’s models have absorbed a large share of the world’s public text, and retrieval cost has gone to nearly zero. About 80% of the traditional “organise so you can retrieve” purpose no longer holds.
The memory-outsourcing trap. There’s a psychological phenomenon called the Google effect: knowing that “the information is saved” makes the brain remember it less. The effect is extreme in heavy note-takers: a note containing X ≠ your head containing X. Knowledge that can actually be called on lives as an active model in your head, not as a tag in a notebook. Many heavy note-takers’ active internal knowledge is in fact atrophying, because they treat the brain as an index rather than a workbench.
Organising notes is the most sophisticated form of anxiety management. It scores exceptionally high on every measure of anxiety relief: it looks like work, produces a sense of accomplishment (“added twelve backlinks today”), carries no judgment risk, and supplies the feeling of “growing” without requiring any output. It is more dangerous than scrolling ruminated content, because its “I’m doing real work” disguise is stronger.
Organising notes and forming judgments are different activities. Luhmann’s Zettelkasten generated the core material for seventy-plus books not because of the particular system he used, but because every card held his own thinking. Each card was a small judgment, a calibration against existing knowledge. The essence of that system is not “managing knowledge” but “forcing thought.” Modern Obsidian users are mostly managing other people’s thinking: highlighting, quoting, citing. That has nothing in common with what Luhmann did. It only borrows the same visual form.
Karpathy’s wiki: what it actually solves
In that context, in early April 2026 Andrej Karpathy proposed an LLM Wiki scheme: drop raw material into a raw/ folder, let the LLM compile it into a structured markdown wiki, and let the human curate only the input. His own wiki for a single research topic has grown to roughly 100 articles and 400,000 characters; he barely edits it by hand.
The scheme solves three things: loss of context between sessions, the unsustainability of note maintenance (LLMs do the bookkeeping better than humans do), and the fact that knowledge doesn’t compound (every new piece of material automatically updates multiple existing pages and establishes cross-references).
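The shape of such a pipeline can be sketched in a few lines of Python. This is a hypothetical illustration, not Karpathy’s implementation: the `wiki/` output folder, the `compile_wiki` function, and the `compile_page` callback are all assumptions, and the LLM step is replaced with a trivial stub so the sketch runs on its own. It writes one page per source and an index, and does not attempt the cross-page updates described above.

```python
import tempfile
from pathlib import Path
from typing import Callable


def compile_wiki(raw_dir: Path, wiki_dir: Path,
                 compile_page: Callable[[str, str], str]) -> list[str]:
    """Turn every file in raw_dir into one markdown page plus a linked index."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    titles = []
    for source in sorted(raw_dir.iterdir()):
        if not source.is_file():
            continue
        title = source.stem
        # compile_page stands in for the LLM call (an assumption of this sketch).
        body = compile_page(title, source.read_text(encoding="utf-8"))
        (wiki_dir / f"{title}.md").write_text(body, encoding="utf-8")
        titles.append(title)
    index = "# Index\n\n" + "\n".join(f"- [{t}]({t}.md)" for t in titles) + "\n"
    (wiki_dir / "index.md").write_text(index, encoding="utf-8")
    return titles


def stub_compile(title: str, text: str) -> str:
    """Stand-in for the LLM step: keeps only the first line as a 'summary'."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return f"# {title}\n\n{first_line}\n"


# Demo in a throwaway directory; the file name and content are invented.
with tempfile.TemporaryDirectory() as tmp:
    raw, wiki = Path(tmp) / "raw", Path(tmp) / "wiki"
    raw.mkdir()
    (raw / "paper-notes.txt").write_text(
        "Finding A holds only under condition B.\nMore detail here.",
        encoding="utf-8",
    )
    pages = compile_wiki(raw, wiki, stub_compile)
    index_text = (wiki / "index.md").read_text(encoding="utf-8")
    page_text = (wiki / "paper-notes.md").read_text(encoding="utf-8")
```

Note where the human sits in this sketch: the only thing they control is what goes into `raw/`. Everything downstream is bookkeeping, which is why the quality of the raw sources decides whether the result is worth anything.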
This sounds like a refutation of “knowledge management is a false premise.” After all, he’s doing a form of knowledge management, and in his case it clearly works.
But a bottom-level fact determines why it works for him in particular: Karpathy is a scientist.
His raw sources are arXiv papers, experiment results, code, and his own unpublished work. All of these are either absent from, or already stale in, the training data of the large models. Model training data has a cutoff; the frontier research the model doesn’t know about is exactly the information space a scientist lives in. He isn’t “managing knowledge”; he’s maintaining a delta against the model’s knowledge base, layering the frontier on top of the existing foundation.
That is categorically different from the knowledge-management community’s version of “I have to manage everything I’ve ever read.”
Why ordinary people fail when they copy it
Once Karpathy’s precondition is clear, you can see why ordinary followers replicating his scheme end up producing new anxiety management.
The topics ordinary people care about, such as business news, AI developments, popular science, management frameworks, and industry analysis, are things the model basically knows better than they do. The “knowledge” they’re trying to “manage” is actually a ruminated version of material the model already has in its training data.
Copying Karpathy’s scheme under those conditions becomes what? Having an AI generate a structured-looking wiki out of already-ruminated content. You’re adding another layer to the rumination chain. The output looks more like “knowledge,” more like “a system,” more like “research.” But it’s further from the primary source and emptier.
And because a wiki’s visual structure is stronger than that of a notebook, the illusion of “I’m learning” it produces is also stronger. A person who watches their wiki grow by the day, cross-references multiplying, will feel more strongly that they’re “growing” than one scrolling Instagram. What’s actually happening is the same as what happens to the Instagram reader: they are consuming information they cannot actually call on.
The only difference is that one consumes on a public product and the other consumes inside a self-built system. The latter is more of a cope, not a healthier habit.
What a reasonable personal information infrastructure looks like
To avoid the trap, ask a fundamental question: relative to the model, what am I short of?
Most knowledge-management practice never asks this. It asks only “what should I record,” and ends up recording a pile of things the model already knows in greater quantity and with greater accuracy.
There are only two reasonable classes of answer:
Class I · The frontier delta. You work in a fast-moving frontier field: research, cutting-edge engineering, emerging practice not yet covered by training data. In that case maintaining a continuously updated knowledge base has real value, because what you are accumulating is what the model doesn’t know. Karpathy’s wiki fits this scenario. Preconditions: you are actually doing frontier work, your raw sources are primary, and you have the judgment to check whether the wiki has drifted.
Class II · The personal-cognition delta. You’re not working at the frontier, but you have your own judgments on specific matters, preferences, non-consensus models, personal lessons. None of these can be generated by a generic model from its training data. In this case the right move is a minimum alignment layer: record only “things the model doesn’t already know,” so that next time you collaborate with the model it can start from where you left off instead of from zero.
The two schemes look very different; the underlying logic is identical. Both are answering the same question: “relative to the model, what am I short of?” What each person is short of simply lies in a different part of the information landscape.
The second class has a counterintuitive but important property: it cannot be performed. Its entry requirement is “I can identify what the model doesn’t know.” People without that judgment can’t use the method (they discover they have nothing worth recording); people with it end up recording material that’s guaranteed to matter. That self-selection is worth more than the method itself. It structurally prevents the method from becoming a new rumination ritual.
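A minimum alignment layer can be very small. The sketch below is one hypothetical shape for it in Python: `DeltaNote`, `alignment_preamble`, and the `model_already_knows` flag are names invented for this example, and the flag is set by the person, since deciding what the model does not know is exactly the judgment the method depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaNote:
    kind: str                  # e.g. "judgment", "preference", "lesson"
    text: str
    model_already_knows: bool  # the user's own call; nothing here automates it


def alignment_preamble(notes: list[DeltaNote]) -> str:
    """Render only the notes the model could not have produced on its own."""
    kept = [n for n in notes if not n.model_already_knows]
    if not kept:
        return ""
    lines = [f"- ({n.kind}) {n.text}" for n in kept]
    return "Context the model cannot infer on its own:\n" + "\n".join(lines)


# Example notes, invented for illustration.
notes = [
    DeltaNote("lesson", "Our last two launches slipped because QA started late.", False),
    DeltaNote("summary", "Transformers use attention.", True),
]
preamble = alignment_preamble(notes)
```

The filter is the whole design. If every note a person writes ends up flagged as already known to the model, the preamble comes back empty, which is the self-selection property described above showing up in code.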
For most people, Class II is the reasonable starting point. If, during use, you discover “the model doesn’t know enough about my field,” you can consider adding a Karpathy-style disciplinary base. The opposite order lands people straight in ritual knowledge management. This is the single biggest mistake the knowledge-management community has made: advising beginners to build the base first. The beginners then spend months organising other people’s judgments, and the organising substitutes for the cognitive activity itself.
A more general principle
Lift knowledge management to its most general level and you get a principle that evaluates almost every future “new scheme”:
In the AI era, any methodology that doesn’t require the user to exercise independent judgment has become, or is becoming, a variant of ruminated consumption.
The distinction is not in the content of the methodology. It can be about productivity, learning, knowledge organisation, creativity. The distinction is whether the user has outsourced judgment to it.
Which also explains a common pattern: every “popular methodology” is, at its core, a collective ritual of cognitive outsourcing. The more popular a methodology, the more it indicates that what it satisfies is not a concrete tool need but a “don’t have to judge for myself” need. Genuinely effective methodologies tend not to go viral, because they depend too heavily on the user’s specific context to be replicated in bulk.
Karpathy’s wiki works for him. But the wave of “Karpathy Wiki Tutorials,” “Second Brain 2.0,” “AI Knowledge Management Workflows” that followed is already turning into the next round of performance schemes. The test is simple: how long into using this scheme does it take before you start feeling “I don’t have to judge for myself”? That moment is the moment it starts rotting.
Chapter 5 · What They All Got Right: Generalists, Taste, and the Refolding of Intelligence
By this point both shapes of ruminated consumption have been pulled apart: passive scrolling of information and active organising of knowledge. They look different; underneath, both are doing anxiety management rather than learning. That surfaces the question that has been circled all along: what actually matters as a capability?
This question had already been brushed against several times before AI. A few widely circulated framings, from different corners, were describing the same underlying thing from different angles. Each one described only a fragment, and none of them realised it was describing the same thing. Laid side by side, a more complete picture emerges.
The generalist thesis
Over the past few years, in creator and indie-developer circles, the concept of the “generalist” has been pushed back to the front. The most-quoted version comes from Dan Koe. The core claim is that specialisation is losing value, and the multi-domain generalist is becoming scarce and valuable.
What he got right: breadth is not miscellany. A real generalist is not “someone who knows a little bit of everything,” but someone who has accumulated enough depth in multiple fields to see the isomorphic structures between them. A person who has studied economic incentives, biological evolution, and organisational behaviour, all in depth, discovers that the underlying mechanisms of the three fields overlap heavily. All three describe what happens at the system level when multiple actors pursue their own goals under constraints. Recognising that cross-domain isomorphism is the generalist’s real value, and the only route from breadth to judgment.
What he didn’t make explicit: what kind of depth counts. This is the vaguest part of the generalist thesis, and the part where followers most often slip. Some people, following the thesis, read one introductory book from each of ten fields and end up with ten fields’ worth of surface narratives in their head. That “depth” is useless for cross-domain mapping, because surface narratives are different across fields by definition; there is no isomorphism to be found. What can actually be mapped is deep structure: the causal mechanisms, incentive constraints, and feedback loops that genuinely drive behaviour in a field. A workable test for depth: can you name one thing that practitioners of a field collectively believe but that is actually wrong? If yes, you have depth. If no, you’re still stuck at the standard-narrative layer.
The gap he left behind: the generalist thesis never singles out the underlying mechanism of doing cross-domain mapping. It reads as a methodology: read books from several fields, draw connections. But it rests on an unstated precondition: that the reader is capable of making such mappings. And it’s that capacity that’s actually scarce. Which explains why, given the same methodology, some people pull it off and others don’t: the difference isn’t effort, it’s the mapping capacity itself.
Taste
A second line comes out of the creator and design circles: the taste that Paul Graham, Steve Jobs, and Rick Rubin have been circling for decades. The Chinese translation of the word (品味, pǐnwèi) makes it sound like a fuzzy aesthetic sensation, but what they are describing is a more specific mechanism.
What they got right: before you can articulate the reason, you can already tell whether a thing is good. That pre-linguistic judgment is not mysticism; it comes from the long internalisation of a large body of high-quality samples. A person who grew up in museums can judge a painting’s quality within seconds, not because they know art history, but because their visual system has been calibrated by a large body of high-quality paintings. The output of that calibration is taste.
But what they are describing is low-order taste: pure pattern recognition. Someone can tell that “this typeface is ugly” but can’t say why. That taste is real and useful. It lets a person rapidly filter options inside a domain they know well. But it has a severe limitation: it doesn’t transfer, can’t be taught, can’t be verified. You cannot pass your taste to someone else; you can’t build taste from zero in a new domain, because you don’t have the sample base yet.
What they didn’t make explicit: taste actually has two layers. Low-order taste is pre-linguistic pattern recognition. High-order taste is pattern recognition plus the ability to extract structured reasons on demand. A genuinely strong designer doesn’t only say “this typeface is ugly”; they can analyse why. It may be that the ratio between x-height and weight breaks the visual rhythm. High-order taste matters because it is teachable, transferable, and lets you build taste in a new domain faster. It has made the structure behind the intuition explicit, so the structure can be learned.
High-order taste and the generalist thesis turn out to be, at bottom, the same capability. Both are abstracting a concrete judgment into an operable structure, then applying that structure somewhere new. The generalist thesis talks about it from the cross-domain angle; the taste argument talks about it from the aesthetic-intuition angle. Two faces of one mechanism.
The cognitive-folding thesis
A third line has become common in tech circles and among indie creators. Its core claim is blunt: AI will not narrow the gap; it will widen it at an unprecedented speed. The analogy is usually the industrial revolution. Steam engines and assembly lines crashed the relative value of physical labour. The gap between the people who could design machines, organise production, and schedule systems, and the people who could only sell their physical labour, went from linear widening to exponential splitting. Now it’s cognition’s turn. AI is flattening a batch of cognitive abilities (knowledge, memory, retrieval, fluent expression), and whatever isn’t flattened will be amplified explosively. The result isn’t “inequality grows”; it’s “different populations’ output in the same unit of time gets folded onto completely different orders of magnitude.”
What they got right: the gap will widen, at a speed without historical precedent. The judgment is correct, and the industrial-revolution analogy is apt. Every previous general-purpose-technology leap has produced not linear widening but exponential splitting between people who seized the new lever and people who didn’t. True of steam, true of electricity, true of the internet. AI will only be more extreme, because it is the first time the technology acts directly on cognition itself, which is the source of judgment, creation, and decision. What gets amplified will be more visible than it has ever been.
What they didn’t make explicit: which kind of cognitive capability gets amplified.
This is the murkiest part of the argument, and where most followers slip. The default assumption is usually “high cognition = high IQ,” so the conclusion slides into “high-IQ people will grow richer; low-IQ people will be left behind faster.” The first half of that is roughly right; the reason behind it is completely wrong.
What gets amplified is not IQ. IQ measures raw processing: working memory, processing speed, information extraction, rule-based inference. AI already beats almost everyone at these capabilities. A person with high IQ who never does real cognitive activity is, in the AI era, the one eliminated fastest, because the things they used to use their IQ for (rapid learning, memory recall, fluent deduction) are now done faster, more accurately, and more cheaply by AI. Their advantage is flattened.
What actually gets amplified is another layer: recognising deep structure on an unfamiliar problem, judging which question is worth asking, carving operable objects out of a fog of reality. This is related to IQ but not identical. A person with moderate IQ who has identified what real cognitive activity is and sustained effort at it will be amplified more by AI than a person with high IQ who has spent their life inside the comfort zone of pattern recognition.
The gap they left behind: “high cognition” gets treated as an innate, fixed attribute. This makes the whole folding narrative sound like an unstoppable force: high-cognition people win, low-cognition people get left behind, no middle ground, no agency.
But the real divide isn’t the innate “high cognition vs. low cognition” divide; it is the conscious one: “recognised it vs. didn’t.” The first is an immovable fate; the second is a crossable threshold. Not easy to cross, but in principle open. Most of the hopelessness that colours the cognitive-folding narrative comes from mistaking the crossable threshold for the immovable attribute.
Three lines are describing one thing
Put the three together.
The generalist thesis’s “cross-domain mapping ability,” the underlying mechanism that turns multi-domain depth into judgment, is really the ability to peel the surface off multiple domains and grasp the deep structure they share. Without it, reading across ten fields produces ten piles of fragments; with it, three fields can produce a new judgment.
The taste argument’s “high-order taste,” the ability to make the structure behind intuition explicit, is really the same ability operating on top of a deeply internalised sample base. Low-order taste needs only samples; high-order taste needs samples plus the capacity to abstract the pattern into structure.
The cognitive-folding thesis’s “the layer AI doesn’t replace but amplifies” is also this ability. Once AI has flattened knowledge, memory, fluent expression, and instruction-following, the one thing left unflattened is the act of peeling reality off its surface and operating on it as structure. That is what gets amplified.
Three lines, three different angles, touching the same thing and giving it three names: generalist, taste, folding. The core is one thing: the ability to abstract a concrete situation into an operable structure.
Cognitive science has a name for it: cognitive decoupling. It means peeling a representation off the reality it refers to and operating on it as an independent object. Every abstraction, hypothesis, counterfactual, and act of self-examination is built on it. The three popular framings are each touching a different facet.
This capability has always existed, but it was never priced on its own. In the pre-AI era it sat mixed in with a large number of other capabilities: knowledge storage, memory, fluent expression, skill through practice. A “smart” person typically had all of them, but nobody knew which of them was actually doing the work. So each popular framing was a blind person touching an elephant, grabbing a part and describing it with the vocabulary of that part.
The AI era makes this visible for the first time. AI has substituted for knowledge storage (it knows more than anyone), memory (it retrieves instantly), fluent expression (it writes better than most), skill through practice (it doesn’t tire, doesn’t slip, doesn’t need reps). With those capabilities stripped away one by one, what remains is the underlying mechanism that was always masked: the ability to cut reality into structure, judge which structure is right, and build new structure in unfamiliar territory.
That is what the generalist thesis, the taste argument, and the cognitive-folding thesis are all converging on. It isn’t a new ability; it’s an ability that has always existed and is only now being named in isolation.
What AI can do, what is left to humans, what to learn
Take the judgment to the practical level.
The range of what AI can do is expanding quickly: pattern synthesis, information retrieval, fluent expression, instruction-following, and complex tasks in verifiable-reward domains (code, math, structured reasoning). That range is expanding every few months with no visible ceiling.
The range of what AI can’t do is shrinking, but there are a few things it cannot do now and will not be able to do soon: carving murky reality into operable problems (problem construction), recognising structural errors in its own output (meta-judgment), judging what is good in the absence of an external reward signal (taste and value trade-offs), and true abstraction on a completely unfamiliar structure (decoupling itself).
The first three all rest on the last one. Without real decoupling, the first three are just interpolative mimicry in a high-dimensional representation space. They look like the real thing but aren’t.
So what is actually left to humans in the AI era? The layer that directs the model: posing the right question, judging whether the output actually solves that question, pulling the model back when it drifts, making the value decision about the final result. Every part of that layer’s work rests on cognitive decoupling.
Then the crucial question: what, specifically, should one learn?
Not more knowledge. The model knows more than anyone. Not more fluent expression. The model writes better than most people. Not more “thinking frameworks.” Most of the thinking frameworks on the market are rhetoric dressed up as tools.
What one should actually learn is a small number of concrete things: several core reasoning tools (so judgment can be made without relying on intuition); regularities from a few domains outside one’s own (so cross-domain mapping has something to map); and putting oneself in environments where reality calibrates (so everything above doesn’t slowly rot). Plus the partly-untrainable underlying mechanism: cognitive decoupling.
Four things, independent of each other, acting on each other. Together, a whole system.
Chapter 6 · The Four-Part Formula, Item by Item
Decoupling itself is only an operator. A person’s capacity for cognitive output is jointly determined by four things. Their relationship is not additive. It is multiplicative.
Hardware × software × database × operating environment.
If any one item is zero, the whole thing is zero. This explains why most attempts to “improve cognition” fail. They usually work on only one of the four items, while the multiplicative structure lets weaknesses in the other three cancel the gain completely. A naturally smart person who never learns reasoning tools and a person who has loaded themselves with reasoning tools but never exposes their judgment to feedback will both produce close to zero. They just fail in different ways.
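The difference between an additive and a multiplicative structure can be made concrete with a toy calculation. This is a minimal sketch, not a measurement model: the 0-to-1 scores, the function names, and the three example profiles are all assumptions introduced here for illustration.

```python
from math import prod

def additive_output(scores):
    """Average of the four items: a weak item is merely diluted."""
    return sum(scores) / len(scores)

def multiplicative_output(scores):
    """Product of the four items: one zero wipes out everything else."""
    return prod(scores)

# Hypothetical profiles: (hardware, software, database, environment), each scored 0..1.
smart_no_tools = (0.9, 0.0, 0.6, 0.6)     # naturally smart, empty tool stack
tools_no_feedback = (0.5, 0.9, 0.5, 0.0)  # loaded with tools, judgment never tested
balanced = (0.5, 0.5, 0.5, 0.5)           # middling on every item

for name, profile in [
    ("smart_no_tools", smart_no_tools),
    ("tools_no_feedback", tools_no_feedback),
    ("balanced", balanced),
]:
    print(name, additive_output(profile), multiplicative_output(profile))
```

Under the additive reading, the first profile scores higher than the balanced one. Under the multiplicative reading, both lopsided profiles produce exactly zero while the balanced one does not.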
Take the four items in turn.
Item I · Cognitive decoupling as hardware
Cognitive decoupling is the ability to peel a representation away from the reality it refers to and operate on it as an independent object. Every abstraction, hypothesis, counterfactual, and act of self-examination rests on this ability. Without it, a person remains bound to the concrete stimulus in front of them. “My thought” and “the fact” blur together. “This situation” and “the structure behind it” cannot be separated.
The closest psychological concept is fluid intelligence: the ability to recognise structure when facing a completely unfamiliar problem, without relying on prior experience. Notice that it has nothing to do with knowledge storage. A well-read person may have low fluid intelligence; a person who has read very little may have high fluid intelligence. This is the underlying capacity for asking, when meeting something unseen, whether one can produce a structure for it by oneself.
There are several observable signals in daily life. Give someone two problems with different surface descriptions but the same underlying structure. A strong decoupler performs similarly on both. A weak decoupler collapses as the surface difference grows, and usually does not notice the collapse. Stability in counterfactual reasoning is another signal. If X had not happened, and everything else stayed the same, what follows? A person who can hold “X did not happen” steady and reason from it has decoupling online. A person whose mind is immediately overwritten by the actual world does not. There is also a very simple proxy: the first reaction to a completely unfamiliar problem. Strong decouplers begin abstracting the structure. “What is the general form of this problem?” Weak decouplers get stuck at “I don’t understand this,” or force-fit it to the most superficial similarity available.
This part has to be brutal.
The upper limit of fluid intelligence is largely genetic. Twin studies comparing identical twins, whose genes are almost the same, with fraternal twins, whose genetic difference is like that of ordinary siblings, show that this ability is far more heritable than people like to admit. There is also a counterintuitive pattern: the older you get, the more genetic it looks. Childhood and adolescence still leave some room for environmental effects. As people age, genetic influence becomes stronger, and by adulthood this ability is largely locked.
Over the past twenty years, many products have claimed to “raise intelligence”: brain-training games, working-memory apps, thinking courses. Later large-scale studies have mostly rejected that path. You improve at whatever you train, but the improvement does not transfer to real reasoning tasks. Practising a brain-training game makes you better at that game. It does not make you think more clearly when you face an unfamiliar problem.
That means almost every course or product promising to “improve cognitive ability,” “train thinking,” or “raise IQ” is selling an illusion. In adulthood, this item is basically locked.
So there are only two things to do.
First, identify honestly where you stand on this item. Not in order to give up, but in order to allocate effort rationally across the other three. A person with middling fluid intelligence and a person with very high fluid intelligence will have different ceilings of output complexity after installing the same reasoning tools. Pretending the ceiling does not exist only makes people burn themselves on the wrong target.
Second, do not let the limits of this item contaminate your judgment of the other three. The next three items are trainable, and they determine how far a person is from their own ceiling. For most people, that distance is far larger than the gap between ceilings.
An honest footnote has to be added here. Fluid intelligence does not only set the ceiling for decoupling. It also affects the training efficiency of the next three items. The speed at which reasoning tools are installed, the depth to which they are installed, and whether they transfer to new domains are not the same for everyone. People with high fluid intelligence install them faster, understand them more deeply, and transfer them more widely. This pattern has solid empirical support in cognitive science. The more complex a learning task is, the stronger the correlation between fluid intelligence and learning rate. In the early phase of skill acquisition, fluid intelligence explains roughly 30% to 40% of the variation in complex problem-solving ability.
So the more honest version is this: this is not a story in which everyone can catch up with the smart people by trying harder. The space below the ceiling is open to everyone, but the shape of that space and the slope of the climb differ. A person with below-average or lower-middle fluid intelligence may need longer than the 6 to 12 months given later to install the basic reasoning tools. The eventual depth may also be shallower than that of a person with high fluid intelligence.
But this does not change the central fact. Most people, including many people with high fluid intelligence, have not used the reachable space they actually have. Not using it is the real waste. Whether one can reach the very top is a separate question, and for most people it is irrelevant.
The self-help literature that makes people feel “I am actually very smart, I just haven’t used it yet” mainly serves anxiety management. It is not help. Useful information is cold: the ceiling exists, and after adulthood it mostly does not move. But below that ceiling, most people have left enormous space unused.
Item II · Reasoning tools as software
Reasoning tools are concrete methods of thought installed in the cognitive system and available for use. Their relationship to decoupling is like software to hardware. Hardware determines how complex a program can run. Software determines what the hardware actually outputs on a given problem.
This is the most severely underrated item in the four-part formula. Most people who describe themselves as “people who like to think” have an empty reasoning-tool stack. What they call thinking is intuition rationalising a phenomenon, then fluent language expressing the rationalisation. No tool participates in the process, so the output is mostly a by-product of emotion and pattern matching. It is not judgment.
The four reasoning tools with the widest coverage and highest return are these.
Probability and uncertainty reasoning. Truly understand base rates, conditional probability, sampling bias, selection effects, and calibration. This does not mean taking a statistics course. It means turning these ideas into judgment reflexes.
Causal reasoning. Distinguish correlation from causation. Understand confounders and counterfactual controls. Most people treat “A happened alongside B” as “A caused B.” This tool systematically corrects that habit.
Game theory and incentive structures. This does not require mathematical depth. What matters is the reflex of asking, in any phenomenon, who is paying for whose decision. Once this tool is installed, the way one reads news, policy, and business changes completely.
System dynamics. Understand nonlinearity, delayed feedback, and emergence. Most mistaken attributions in complex problems happen because no feedback loop is being simulated in the mind.
Each tool has classic entry-level books, listed in the appendix. Once the four are installed, the cognitive system automatically decomposes any phenomenon into variables, causal directions, incentive distributions, and feedback loops. This is not a thinking trick. It is a reflex.
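As one illustration of what a base-rate reflex looks like when written down, the sketch below applies Bayes' rule to a signal that sounds reliable. The function name and every number are hypothetical, chosen only to show how a low base rate dominates the answer.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(hypothesis | signal) by Bayes' rule."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: the signal fires for 90% of real cases and for 9% of
# non-cases, and only 1% of all cases are real to begin with.
p = posterior(base_rate=0.01, hit_rate=0.90, false_alarm_rate=0.09)
print(f"probability the case is real given the signal: {p:.1%}")
```

With these assumed inputs, the signal leaves the hypothesis below ten percent. Someone without the reflex hears "90% accurate" and stops there.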
There is a simple test for whether the tools are installed. When facing a new phenomenon, does the person jump straight to a conclusion, or do they automatically place it inside a reasoning frame? The first type is impoverished. The conclusion may be right or wrong, but the person cannot tell which. The second type has internalised the tools. They first ask about base rates, causal direction, incentive distribution, and feedback delay.
Another diagnosis is more direct. Ask the person to estimate a quantity they are completely unfamiliar with, such as how many bicycles in a city get rained on in a year. A well-equipped person will spontaneously decompose the estimate into several independent factors and multiply them. An empty-stack person will give a number directly or say, “How would I know?” The difference is not knowledge. It is tooling.
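The decomposition described above can be sketched in a few lines. The factor names and all of the values are placeholder guesses made up for this example, not data about any real city; the point is the shape of the estimate, not the result.

```python
from math import prod

def fermi_estimate(factors):
    """Multiply independent factors together to get an order-of-magnitude estimate."""
    return prod(factors.values())

# Every value here is a placeholder guess, to be replaced by the estimator's own.
factors = {
    "residents": 1_000_000,
    "bicycles_per_resident": 0.3,
    "share_left_outdoors_in_rain_at_least_once_a_year": 0.5,
}

estimate = fermi_estimate(factors)
print(f"roughly {estimate:,.0f} bicycles")
```

Each factor can be challenged and revised on its own, which is what separates a decomposed estimate from a single guessed number.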
This item is fully trainable, and its return is the highest of the four because most people start near zero. Installing the minimal set takes roughly 6 to 12 months of serious reading plus deliberate use in daily judgment. That time span is surprisingly short for most people. It is far shorter than any degree program, far shorter than learning a craft, and the return is much larger.
The key is not finishing the books. The key is being able to call the tools automatically in daily judgment. The verification method is simple. Next time you form a judgment about some phenomenon, pause and ask which tool is operating here. If you can answer accurately, it is installed. If you cannot answer, it is not. If you can answer but realise your judgment bypassed the tool and went straight through intuition, you are in the most common intermediate state. You need to practise putting the tool before the intuition.
The relationship between this item and the first has to be made explicit. A repeated academic finding is that the correlation between IQ and actual judgment quality is surprisingly low. High IQ does not prevent stupid decisions, because IQ measures processing power. It does not measure whether reasoning tools are installed. A person with middling fluid intelligence and a complete tool stack can steadily outperform a person with high fluid intelligence and an empty stack.
For people who did not get a strong starting point on the first item, this is genuinely good news. For people who did get a strong starting point and are too lazy to install the second, it is bad news.
Item III · Cross-domain regularity depth as database
Decoupling needs something to operate on. That something is an understanding of regularities across multiple domains. But “depth” has to be defined precisely, because it is easily mistaken for mastery or erudition.
There is a core distinction here. A domain’s deep structure is the causal mechanism and constraint relationship that actually determines behaviour. Its surface features are terminology, process, cases, and industry jargon. What decoupling can use is deep structure, not surface features.
So cross-domain regularity depth is not mastery. Mastery means being able to execute work inside a domain: arguing a case, performing surgery, writing production-grade code. Regularity depth means being able to explain why things in that domain happen the way they do: the incentive structure, information asymmetry, feedback delays, survivorship bias, and where the standard narrative lies. The first requires more than ten years of specialised investment. A good observer can reach the second in a few months.
The harshest diagnostic is this: can you name one thing practitioners in a field collectively believe that is actually wrong?
The logic behind this test is that every field has its own standard narrative. It is constructed by stakeholders, beautifies itself, and is sometimes inverted from reality. A person who has grasped a field’s regularities can identify where that narrative lies. If they can name it, they understand. If they cannot, or if all they can repeat is generic meta-complaint everyone in the industry already knows, such as “capital chases profit” or “the system is broken,” they are still at the standard-narrative layer. They have not entered deep structure.
Another diagnostic: when hearing about a new phenomenon in that field, can the person predict where it will go without looking anything up, and know which assumption was wrong when the prediction fails? If yes, there is already a causal model in the mind. If not, the person knows facts about the field, but does not have a model of it. Having facts without a model does nothing for decoupling.
This item is fully trainable, but the threshold is higher than for reasoning tools. It requires time and sustained curiosity. Each domain takes roughly 6 months to a year to reach the level where one can predict and identify error.
Three types of domains have the highest return.
Domains with obvious incentive distortion. Medicine, education, academic publishing, government procurement, insurance, charity. These fields have the largest gap between appearance and actual dynamics. Understanding any one of them installs a set of transferable thinking tools.
Domains rich in historical data. Financial markets, military history, epidemic history, technology iteration. They have real feedback, falsifiability, and regularities pressure-tested over long periods.
Domains adjacent to your home field. Neighbouring disciplines around a field where you already have depth. The learning slope is fastest, and the value transfers back to the home field most directly.
The minimal learning path for each domain is simple. Read one insider’s critique of the field, not an introductory textbook. Read 2 or 3 empirical studies or systematic reviews. Follow one high-information-density source regularly: not news, not a KOL, but a deep newsletter or researcher blog from inside the field. Then make at least 10 verifiable predictions and track them. Without the last step, the first three are entertainment.
What should be actively avoided are domains driven mainly by story and narrative: fashion, celebrity gossip, political commentary, inspirational business books. Their “regularities” are mostly post-hoc rationalisation. Studying them fills the mind with more surface features, not deep structure.
Item IV · Feedback exposure as operating environment and exponent
The first three items determine momentary ability. The fourth determines whether the first three can be maintained and keep growing over time.
Feedback exposure means the degree to which a person’s judgments are systematically tested by reality. In a high-feedback environment, each judgment is confirmed or corrected by facts. In a low-feedback environment, judgments can float indefinitely without being calibrated.
Why is this an exponent rather than an additive item? Because without feedback, any level of the first three decays over time. Decoupling turns into self-indulgence: the person thinks they are abstracting structure, but they are producing deep-sounding nonsense. Reasoning tools turn into ritual: Bayesian language without any actual prior update. Cross-domain regularity becomes scholarly collecting: the person knows many patterns but cannot tell which one applies in the current situation. Conversely, even if the first three items are only moderate, a high-feedback environment keeps calibrating them. Over time, output quality can exceed that of someone fully loaded on the first three but living in low feedback.
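A toy compounding model shows why the feedback item behaves like an exponent rather than one more factor. The yearly rates, starting levels, and function name below are assumptions picked for illustration; nothing in the essay specifies them.

```python
def level_after(start, yearly_rate, years):
    """Compound a starting level at a fixed yearly rate (a negative rate means decay)."""
    return start * (1 + yearly_rate) ** years

YEARS = range(11)

# Hypothetical: a strong start that decays 5% a year without feedback,
# versus a moderate start that is calibrated upward 5% a year.
strong_low_feedback = [level_after(0.9, -0.05, y) for y in YEARS]
moderate_high_feedback = [level_after(0.5, 0.05, y) for y in YEARS]

# First year in which the moderate, well-calibrated person is ahead.
crossover = next(
    y for y in YEARS if moderate_high_feedback[y] > strong_low_feedback[y]
)
print("moderate start overtakes strong start in year", crossover)
```

The exact crossover year depends entirely on the assumed rates. What the sketch illustrates is that any persistent difference in yearly rate eventually outweighs any fixed difference in starting level.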
The simplest diagnostic is this: among the important judgments you made in the past year, how many were clearly confirmed or refuted by reality?
If the number is close to zero, then no matter how much the person thinks, how many books they have read, or how fluent their expression is, the cognitive system has not been calibrated for a long time. They may once have reached a fairly high level, but that level is slowly distorting, and they cannot see it. A low-feedback environment is defined by the absence of anything that tells you the distortion is happening.
A famous long-term study tracked nearly 300 “experts” across fields and collected a large number of their predictions over twenty years. The core finding was not which people predicted well. It was that most experts’ predictive accuracy was close to a coin toss, while their self-assessment of accuracy was far higher than their actual performance. The gap came from one thing: their predictions had never been systematically recorded. No scoreboard, no calibration.
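What a scoreboard measures can be sketched as follows. The records are invented for illustration and are not taken from the study; the Brier score is a standard accuracy measure for probability forecasts, and the function names are hypothetical.

```python
def hit_rate(records):
    """Share of predictions that came true."""
    return sum(outcome for _, outcome in records) / len(records)

def mean_confidence(records):
    """Average probability the forecaster assigned to their own predictions."""
    return sum(confidence for confidence, _ in records) / len(records)

def brier_score(records):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((c - o) ** 2 for c, o in records) / len(records)

# Invented scoreboard: (stated probability of being right, 1 if right else 0).
scoreboard = [
    (0.9, 1), (0.9, 0), (0.9, 1), (0.9, 1), (0.9, 0),
    (0.8, 0), (0.8, 1), (0.8, 0), (0.8, 1), (0.8, 0),
]

overconfidence = mean_confidence(scoreboard) - hit_rate(scoreboard)
print(f"hit rate {hit_rate(scoreboard):.2f}, "
      f"mean confidence {mean_confidence(scoreboard):.2f}, "
      f"overconfidence {overconfidence:.2f}, "
      f"Brier {brier_score(scoreboard):.3f}")
```

In this invented record the forecaster is right half the time while claiming 85% confidence. Without the recorded pairs, neither number would exist to be compared.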
Strictly speaking, this item is not about training. It is about environment selection. Feedback exposure cannot be raised by effort alone. It can only be selected by entering some environments and refusing others. This makes it the most hidden of the four items. People can roughly infer the first three from conversation. Feedback exposure is structural. Outsiders cannot see it, and the person often cannot see it either. Someone who has spent ten years doing “strategy” at a large company may be strong on the first three items. But if none of those judgments has ever been tested by the market, users, or concrete outcomes, most of that decade’s “thinking” output is noise.
The environments worth entering include entrepreneurship, where sales and retention test judgment every week; trading and investing, where decisions carry profit and loss; empirical research, where hypotheses are routinely falsified; building products for real users rather than KPI products; and competitive games such as Go, poker, and esports, where outcomes are clear.
Those are feedback environments at the level of career. But feedback loops are not limited to professional life. Daily life contains many equally strict feedback structures, although they are easy to ignore. Cooking tests salt, heat, and timing at every meal. Raising children gives feedback within seconds through crying, laughter, and behaviour. Physical training lets the body report on movement, intensity, and diet adjustments over short cycles. Caring for pets resembles raising children but often gives faster feedback because reactions are more direct. Learning an instrument gives immediate feedback on pitch and rhythm. Repairing things leaves no room for self-beautification: it works or it does not. Negotiation tests whether your judgment has aligned with what the other side actually wants. Gardening is equally indifferent. The plant lives or dies.
These are high-density feedback loops. By contrast, someone who has spent ten years in a big-company meeting room making strategy slides and never executing may score far lower on feedback exposure than a parent who cooks daily for two children while caring for a dog. This is not rhetoric. It is structure. The former’s judgments never touch reality. The latter’s judgments are confirmed or corrected within hours.
Culturally, we tend to bind “cognitive ability” to certain occupations: scientists, engineers, investors. But feedback exposure does not respect occupational labels. A person genuinely invested in a life practice that requires continuous feedback, even if it is cooking, parenting, or training, may keep their cognitive system more online than many people in respectable white-collar roles whose work is never validated by outcomes. The first three items, decoupling, reasoning tools, and cross-domain depth, do require specialised training. But the fourth item, keeping one’s judgment in contact with reality, is available in any life form.
The environments to avoid include pure opinion production such as KOL commentary, columns, and punditry; consulting-style “strategic thinking” where advice is given and execution is never tracked; large-company roles with low-frequency decisions and long-delayed feedback; and any domain where speaking well is enough to win. Persuasiveness and judgment accuracy are completely decoupled in those domains.
If changing the main environment is temporarily impossible, one can build feedback mechanisms. But first a common illusion has to be broken. The popular trio of prediction logs, decision journals, and adversarial peer groups is something almost nobody sustains. The problem is not that those tools are wrong. The problem is that they turn feedback into an extra ritual detached from real life. A person busy with work and life who relies on willpower to write five predictions a month will stop within two weeks. People who lack feedback do not lack a spreadsheet. They lack a life structure that forces judgment to meet reality. A spreadsheet cannot solve a structural problem.
Sustainable feedback comes from two things: closing judgment-verification loops inside domains one already cares about, and turning the learning of new domains into a process with verification built in. Neither is a journal. Both are actions.
First: close loops in existing domains. Most people are already making judgments in their own field every day. The judgments simply remain implicit, so they cannot be verified. A colleague asks whether a plan will work. A friend asks whether a company is worth joining. You decide whether to build a feature. You read a news item and feel that it will reverse in two weeks. Each of these is a judgment, and each could be tested by later facts. But most people never say or write the judgment concretely. It remains a vague “I feel,” and afterward, whatever happens, the brain can rewrite the memory into “that is what I thought all along.”
There is only one thing to do: make the judgment concrete at the moment it occurs. When telling a colleague that a plan has a problem, say specifically, “I think it will get stuck on X within two months.” When making a product decision, write directly in the document, “We expect DAU to rise by Y after this ships. If it does not, I was wrong.” When making a prediction about a news event, tell one concrete person, “I bet X will not happen.” No journal, no scoring system, no Brier score is required. Saying the judgment out loud already forces later self-calibration, because what was said will be remembered by the other person or by you a few months later.
The threshold is accepting the risk of being wrong in public. Most people do not say their judgments because if they are wrong, they lose face. Exactly that pressure makes the judgment real. It forces you to think again before issuing it. Every such concretisation is a small act of cognitive calibration.
Second: turn learning a new domain into verification. When trying to understand a new field, most people default to reading, then reading more, then reading still more until they reach a vague sense of “I think I get it.” That “understanding” is never tested. The correct path is to form a prediction that can be wrong first, and only then check it against the material.
The procedure is concrete. Choose a field you want to understand. Before reading anything, write down three to five basic assumptions about how the field works, what factors drive it, and how it may change soon. After writing them down, begin reading. The reading process is not “absorbing information.” It is colliding the material with your own prior assumptions. Which assumptions are confirmed? Which are refuted? Which need revision? Afterward, what remains in the mind is not a pile of other people’s conclusions. It is your own model, calibrated by contact with material.
The point of this method is not efficiency. It is the path the information takes through your mind. Reading before thinking is passive absorption, and its product is a repetition of other people’s judgments. Thinking before reading is active calibration, and its product is your own causal model. The time cost is roughly the same, but the output quality is completely different. The first path leaves almost nothing after a few months. The second builds a reusable judgment tool in that field.
Why these two methods work while journals fail. Journal methods require extra energy for the act of recording itself. Except for a small number of unusually disciplined people, that is not sustainable. The two methods above add nothing new to your life. They simply change the structure of actions you are already taking. You are already discussing things with colleagues, making decisions, reading, and following news. These methods make those actions carry a verification loop inside them, instead of adding a separate ritual beside them.
Sustainability comes not from adding new activities but from changing the structure of existing ones. That principle matters more than any specific tool.
Of the four items, only one determines the ceiling. The other three determine how far a person is from that ceiling.
Most people are very far from their own ceiling.
This is bad news for people who do not want to move. It is good news for people who do.
Chapter 7 · Conclusion
A person willing to move, after reading the framework above, will naturally ask: “So where should I start?”
The question itself deserves suspicion.
If the first move after reading the four-part formula is to ask for a starter checklist, that means the article is still being consumed as a methodology. However good the checklist is, the person holding it will probably stop within two weeks. The actions described by the checklist all depend on something deeper, and that deeper thing cannot be started by a checklist.
That deeper thing is only this:
Stop treating ruminated consumption as learning.
This does not mean “scroll half an hour less” or “wake up early to read every day.” Those frame the problem as discipline. What actually has to stop is a form of self-deception: mistaking the feeling of “I kept up with the AI world today” for “I learned something today.” Admit that the former is only emotional regulation, in the same category as relaxing with a TV show. That admission itself is the first step.
After that admission, something concrete happens. Cognitive bandwidth no longer occupied by rumination begins to appear. Once it appears, it finds its own use. Not because you have “resolved to learn,” but because a mind not drowned in noise naturally moves toward the problems it actually cares about.
The next questions are outside the scope of this essay: how to work with models, which questions are worth asking, how to use AI to sharpen one’s own judgment, and how to grow cross-domain depth out of existing experience. This essay is only a diagnosis. It explains what the divide in front of you is, what it is made of, and how far most people are from their own ceiling.
This system is not for everyone.
The hard ceiling of cognitive decoupling means some people cannot install this system. That is cruel, but it has to be said. This essay has not offered universal hope. It has offered a framework for locating oneself.
But the people who can read this far are not that group. If you can recognise “I am consuming rumination” and follow the argument all the way here without closing the page, you have already completed the basic self-test. Some of the harsher statements above do not apply to you.
The real re-stratification of the AI era is already happening. The old evaluation system rewarded crystallised intelligence and pattern libraries, so people who mistook knowledge and experience for cognitive ability could still appear capable. After AI flattens those things, the remaining part, real cognitive activity, is being priced on its own for the first time. Most people have not noticed. By the time they do, the gap will be too wide to close by catching up.
One final self-test cuts deeper than any framework above.
Think back to the last time you genuinely changed an important view. When was it, and why did it change?
If you can answer with something concrete, recent, and non-trivial, this system is already running in you.
If you cannot, the problem is not AI, and it is not the era. It is you. And the problem was always there. It simply had not been exposed before.
Appendix · Starter Reading List
The four core reasoning tools mentioned in Chapter 6 each have classic entry-level books. They are arranged from easier to harder. Start with the first one.
Probability and uncertainty reasoning
- Thinking, Fast and Slow, Daniel Kahneman
- What Intelligence Tests Miss, Keith Stanovich
Causal reasoning
- The Book of Why: The New Science of Cause and Effect, Judea Pearl
- Causal Inference: The Mixtape, Scott Cunningham
Game theory and incentive structures
- The Strategy of Conflict, Thomas Schelling
- The Elephant in the Brain, Kevin Simler and Robin Hanson
System dynamics
- Thinking in Systems, Donella Meadows
Feedback exposure and judgment quality, the theoretical basis for Chapter 6’s fourth item
- Expert Political Judgment, Philip Tetlock
- Superforecasting, Philip Tetlock and Dan Gardner
A practical note for reading: after finishing each book, do not immediately start the next one. Spend 1 to 2 weeks deliberately using that book’s way of thinking in daily judgments. Without this step, finishing the book is close to not reading it at all. The knowledge has not run through actual reasoning in the mind, so it will not become a reflex.
References and Sources
Sources for the main factual claims in this essay.
AI capability divergence, Chapters 1 and 2
- Andrej Karpathy’s X post about the “growing gap in understanding of AI capability” from April 2026. The original post discusses the bimodal difference between free ChatGPT users and paid Claude Code / Codex users, as well as the technical explanation involving reinforcement learning and verifiable rewards. A secondary report: The New Stack, “Karpathy says developers have ‘AI Psychosis’” (April 2026).
- The “walk or drive to the car wash” example was not proposed by Karpathy. It was a viral test that circulated independently on Threads / Twitter in February and March 2026 and was covered by several outlets, including Newsweek and Cybernews.
- SimpleBench: simple-bench.com (Philip of AI Explained, 2024), 213 multiple-choice questions, with a human baseline of around 84%, which remains above the scores of SOTA models.
- BrainBench: “Exposing the Commonsense Reasoning Gap in Large Language Models” (arXiv:2603.14761, 2026), a systematic account of LLM failure modes in common-sense reasoning.
Rumination chains and knowledge management, Chapters 3 and 4
- Google effect / digital amnesia: Sparrow, Liu, Wegner, “Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips,” Science 333 (2011): 776-778. Note that the effect size has been disputed in later replication work.
- Karpathy’s LLM Wiki scheme: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (published April 3, 2026). The wiki scale mentioned is roughly 100 articles and 400,000 characters.
- Luhmann’s Zettelkasten system: roughly 90,000 cards accumulated over a lifetime, producing 70+ books and 400+ papers. See zettelkasten.de and Sönke Ahrens, How to Take Smart Notes (2017).
Generalists, taste, and cognitive folding, Chapter 5
- Dan Koe’s generalist thesis: several essays at thedankoe.com, including “The Rise of the Generalist” and “The Future of Work.”
- Paul Graham’s theory of taste: paulgraham.com/taste.html, “Taste for Makers” (2002).
- Rick Rubin, The Creative Act: A Way of Being (2023).
- Cognitive decoupling, reflective mind, and the low correlation between IQ and rational judgment: Keith E. Stanovich, Rationality and the Reflective Mind (Oxford, 2011) and What Intelligence Tests Miss (Yale, 2009). The Stanovich & West line of work shows that correlations between thinking dispositions and IQ are usually below 0.30.
- Fluid intelligence and structural abstraction: François Chollet, “On the Measure of Intelligence” (arXiv:1911.01547, 2019).
Plasticity of cognitive ability, Chapter 6 item I
- IQ heritability and the Wilson Effect: Haworth et al., “The heritability of general cognitive ability increases linearly from childhood to young adulthood,” Molecular Psychiatry 15 (2010): 1112-1120; Plomin & von Stumm, “The new genetics of intelligence,” Nature Reviews Genetics 19 (2018): 148-159. Adult IQ heritability is commonly estimated at 0.70 to 0.80.
- Brain training and the failure of transfer to fluid intelligence: multiple meta-analyses by Melby-Lervåg & Hulme and others; Simons et al., “Do ‘Brain-Training’ Programs Work?” Psychological Science in the Public Interest (2016).
Expert prediction and feedback exposure, Chapter 6 item IV
- Philip E. Tetlock, Expert Political Judgment: How Good Is It? How Can We Know? (Princeton University Press, 2005/2017). From 1985 to 2003, the study tracked 284 experts across fields and collected 27,451 verifiable predictions. Core conclusion: expert predictive accuracy was close to random, while experts’ self-assessments of accuracy were far higher than actual performance.
Frontier AI benchmarks, Chapter 2
- AIME (American Invitational Mathematics Examination): an olympiad-track mathematics benchmark.
- GPQA (Graduate-Level Google-Proof Q&A): a benchmark of PhD-level science questions.
- SWE-Bench: a benchmark of real GitHub issue software-engineering tasks.
- Frontier models approached or reached expert-level performance on these benchmarks in 2025-2026, including OpenAI o-series models, Anthropic Claude 4/4.5/4.6-series models, and Google Gemini 2.x/3.x-series models.
This essay conceptually paraphrases the sources above rather than quoting them directly. For exact wording, consult the original links and publications.