序章 · 一千一百年
公元 587 年,长安。
杨坚翻着一份名单。
开皇七年,官员要补。这些人他大半不认识,每个名字旁边都有荐主。荐主他认得。三省的省郎,各州的中正,几十年里来回,就这几个家族。
崔、卢、王、谢。
这些字写在他签发的国书上,也写在他祖父签发的国书上,也写在再往前几位皇帝签发的国书上。北周亡了,北齐亡了,梁亡了,陈快要亡了。这些字没有亡。
他统一了北方,即将统一南方。从他这一代起,这片土地不再叫某朝的北、某朝的南,它叫隋。但他要任命的每一个人,都要先从这些字底下过一道。
九品中正制运转了三百多年。中正官在地方上把人评成九等,中央按品级授官。表面挑的是人才,底下数的是出身。一个皇帝坐在长安城里,他签出去的每一份任命,都要先经过几十个家族在地方上的笔。
杨坚要这个国家听他的命令。
他要一种办法。不经中正,不经荐举,不经家族过手。任何人,只要通得过一套公开的标准,就能直接被识别。
开皇七年,朝廷下令,分科举人。
这条线从这里露出形状。
往后一千三百年,这件事的细节会反复修改。科目会变,内容会变,殿试会出现,糊名誊录会出现,八股会出现。皇朝会换,一次又一次。底下不变的是同一件事:
让一个坐在中央的人,从一千万户人家里识别出可被使用的那一小批;这套识别不经出身,不经荐举,不经中间人。
科举的设计目的不在于让读书人有出路。读书人有出路,是这件事的副产品。
时间过去一千一百年。
公元 1763 年,波茨坦。
腓特烈二世坐在无忧宫里,签下一份命令。
七年战争半年多前刚结束。普鲁士赢了。和约是赢家的和约,但这个赢家也快空了。五十万人死在战场和饥荒里,东部的村庄整片整片没人,税源接近枯竭。靠他赢下这场仗的,是一支由国家长期训练的军队,以及一套由国家直接调度的工业。
打完之后,他知道下一仗早晚要打。他需要一种特定的人,能在工厂和军营之间互换,能在标准化流程里被分配位置。
这种人不会自然产生。
农村的孩子不识字,不会算术,不守时,不接受抽象命令。他们对父亲负责,对宗族负责,对教区的神父负责。国家不在他们的负责清单上。要让国家进入他们,必须在更早的时候介入。
8 月 12 日,腓特烈签下《一般地方学校规程》。
5 到 13 岁的儿童必须入学。国家规定教学内容。
这是普鲁士国家义务教育的奠基文本,也是后来欧洲国民教育模式的重要参照。
义务教育要训练四件事。
识字。算术。守时。服从分级权威。
识字让人读得懂命令。算术让人能在工厂和军队里完成基础任务。守时让人能配合规模化调度。服从让人能在科层里被分配位置。
一个孩子被生产成什么的过程,从此可以在他还不知道国家是什么的时候开始。
时间再过去一百五十年。
公元 1913 年,密歇根州,Highland Park。
Henry Ford 看着一份报表。
报表上写着:这一年,福特工厂为了维持约一万四千个岗位,一共雇佣了五万两千多人。年流动率约 370%。
T 型车从 1908 年开始量产。这一年,他把移动装配线一步步铺到了底盘总装上。底盘装配的工时从 12.5 小时压到 93 分钟。一辆车从开始组装到下线,比一个工人吃完午饭快。
但工厂出了问题。
工人不来,来了不留。一台机器停在那里等一个会拧螺丝的人,比拧螺丝本身更贵。
原因清楚。工厂里大半是刚到美国的移民。波兰人、意大利人、希腊人、东欧犹太人、墨西哥人。他们听不懂工头的英语指令,看不懂安全标识,记不住自己工号对应的工序。在自己的村子里习惯了一种节奏,工厂要的是另一种节奏。一个工人从乡下到流水线,中间缺的那一段东西,义务教育补不上。义务教育给了他识字、算术、守时、服从这四件底子。工厂要的是他在被分配的工位上,按 93 分钟的节拍,重复一个具体到秒的动作。这一段,没有任何已有的系统在教。
这一年,Ford 建立了 Ford English School。一年之内,几千名移民工人在工厂里、在下班后上课。课程包括英语、美国公民常识、个人卫生、如何使用银行、如何在美国的家庭里安排开支。次年扩展为 Sociological Department,五十名调查员上门访问工人家庭,评估他们 “是否准备好成为福特工人”。通过评估的人,可以拿到当时业界两倍的工资,每天 5 美元。
这是有清晰档案记录的最早一批由企业自己出钱、自己设计、自己执行的大型成人内训体系之一。
它要解决的不是工人会不会读书的问题。义务教育已经回答了那个。它要解决的是另一个问题:
一个人通过了义务教育的筛选、被国家识别为可使用的对象之后,如何把他进一步生产成某一个具体岗位上可立即嵌入的执行单元。
往后一百年,这件事的形式会反复修改。学徒制变成在职培训,在职培训变成 OJT,OJT 变成企业大学,企业大学变成员工发展体系,员工发展体系变成今天的领导力项目、岗位认证、内部商学院。底下不变的是同一件事:把通过了普遍筛选的人,进一步生产成特定组织能直接使用的人。
科举回答的是筛选问题:已经有一千万户人家了,如何从里面识别出可用对象。
普鲁士义务教育回答的是生产问题:如何让一千万户人家先变成可识别的对象。
福特的内训体系回答的是适配问题:如何把已经被识别的对象,进一步加工成某一个组织在某一个具体位置上可以直接使用的人。
三件事拼起来,是现代教育和培训的完整结构。先把人生产成可识别的对象,再把这些对象筛选出可用的那一小批,再把可用的人加工成可立即嵌入的岗位单元。
二十世纪以后,每一个工业化国家都同时运转这三套机器。义务教育,考试与学历,企业内训。形态各有不同。义务教育的年限不同,考试的形式不同,选拔的节点不同,使用的话语不同。企业培训的形态也不同,学徒制、岗前培训、OJT、内部商学院、领导力项目、外部 MBA。但识字算术、规模化课堂、年级制度、分科考试、学历层级,加上岗位说明书、SOP、培训手册、绩效考核、晋升通道,这套底层结构在所有现代国家、所有现代组织里都成立。
这件事在学术上也不新。Bourdieu 与 Passeron 1970 年的《再生产》、Bowles 与 Gintis 1976 年的《资本主义美国的学校教育》、Andy Green 1990 年的《教育与国家形成》,半个世纪里反复指向过同一件事:学校的真正功能,是为既存的国家与经济体制生产可被使用的人。
教育和培训共同构成一个为了使用人而设计的人力资本生产系统。
它在设计的时候就不为一个人活得好不好负责。它要回答的问题是:
这个人能否被使用。
第一章 · 当下的现状
杨坚、腓特烈、Ford 三个时点都已经过去。三个问题没有过去:谁能被筛选出来,谁能被生产成可用的人,谁能被适配到具体的位置上。这套系统在 1913 年的 Highland Park 之后又运转了一百一十多年。2026 年此刻,它仍然在运转,方式开始变形。
七项认知机制
「教育训练人」这件事说出来是一句话,拆开来是七项。每一项可塑性不同,AI 之后的命运不同,需要的训练方式也不同。
第一是知识(晶体智力,Gc)。一个人脑子里储存的事实、定理、词汇、案例。读书、上课、做题、积累经验,主要厚度都积累在这里。这一层可以无限累积。AI 之后它的稀缺性几乎归零,AI 储存的事实库远超任何人。但读过、记得、理解仍然有用:在 AI 给出答案时,能立刻判断真假。
第二是流体智力(Gf)。从一个从没见过的局面,识别结构、找突破口、构造路径的现场推理能力。它不依赖知识,依赖一个人能不能在新情境里启动思考。这件事很难被训练直接抬高。心理学几十年的研究反复指向同一个结论:教育和脑训练很难稳定、远迁移地提高 Gf 上限。教育能做的不是凭空抬高上限,是把后面三项做扎实。
第三是调用频率。一个 Gf 上限再高的人,几年不用就会闲置。Gf 在没有外力驱动的时候不会主动启动。一个人在成长过程里被反复推到「非用不可」的边缘的次数,决定了他成年后实际能调动多少推理。题库训练真正起作用的地方就在这里。好题制造结构性陌生感,迫使学生现场建模;坏题训练标准反应,越练越不动脑筋。AI 之后这一层不贬值,因为 AI 没有连续主体可被训练,无法替代你维持思考状态。
第四是策略库。每一次费力推导出来的解法,下一次遇到相似问题不必再次推理,可以直接调用。下棋有定式,物理有套路,工程师有手感。专业问题解决能力的差距,主要不在 Gf,在策略库的厚度。AI 之后这一层一分为二:表层的、模板化的策略 AI 全有,比你快也比你准;深一层和具体领域深度关联的判断(「这台机器一开机就有点不对劲」),AI 无法承接,因为它没有现场感受。
第五是基础自动化。识字、计算、阅读、书写训练到不需要专门动脑就能完成的程度。这一层决定了一个人在做高级任务时还剩下多少认知资源。日本基础教育在这一层投入最厚,PIAAC 成人测评里日本和北欧排在最前面,根子就在这里。AI 之后这一层的作用变了:不再是替你产出,而是让你在 AI 给出答案时立刻判断「这看起来不对」。基础不扎实的人使用 AI,相当于把判断权外包给一个不能负责的系统。
第六是认知去耦。把抽象结构从具体情境里剥离出来,作为形式对象操作。证一道几何题、推一段代数、修复一个程序的 bug,做的都是同一件事。它和 Gf 紧密相关,但更窄、更专、更费力。ISSUE 01 把它放在了「AI 时代真正的分化」的中心。AI 之后这一层稀缺性最高。AI 本身是认知去耦的产物(语言被剥离成概率,推理被剥离成参数),但它无法替代具体的人完成去耦。AI 能给出结果,但完成去耦这一步必须由具体的人来做。
第七是元认知。「我哪里不懂」、「这个解法在什么前提下成立」、「我刚才那一步会不会跳得太快」。它是长期通过真实任务和真实反馈反复磨炼出来的习惯。旧教育几乎不训练这一层。考试不考它,老师无法评分,它发生在学生自己脑子里。AI 之后这一层极其稀缺,因为面对 AI 的输出,「我现在该不该相信它」是 AI 无法替你完成的判断。
七项摆在一起看,我们有一个隐约的发现:旧教育投入最多的几项(知识、调用频率的浅层、策略库浅层、基础自动化),刚好是 AI 之后开始贬值的;旧教育几乎没有投入的两项(认知去耦、元认知),刚好是 AI 之后最稀缺的。
教与训
序章已经讲过这两个词。
「教」是学校教育,从小学到大学,做两件事:把人按能力分出层级,再把每一层塑造成可被识别的对象。学历、分数、证书、学校层级,是它的产出。
「训」是工作里的训练,包括学徒制、岗上带教、企业内训、师徒、行业认证。它做一件事:把已经有学历的人塑造成可以立刻嵌入某个岗位的人。岗位经验、流程熟练度、汇报能力、协作习惯,是它的产出。
教生产可排序的人,训生产可嵌入流程的人。两者过去都成立,因为过去的世界确实需要大量可排序、可训练、可嵌入的人。
失格
先把「失格」和「失败」分开。失败是事情做得不好。失格是这件事原本承担的资格不再成立。一件事可以做得很好,仍然失格。
旧教育和旧培训不是没有用。它们曾经能够使用,因为知识、经验、熟练度、形式合格在过去都是真实稀缺的。一个人能写出像样的文章,多半真的理解;能写出像样的代码,多半真的通晓工程;能稳定交付,多半真的适合岗位。形式间接证明了实质。
AI 改变的就是这个证明关系。形式仍然能用,仍然能交付,仍然能让人觉得「这个东西做得不错」。但它不再稳定证明背后有相应的主体。ISSUE 04 把这件事称为「形式可以脱离实质独立产生」。
教与训失格,失的是这一点:它们训练出来的「合格」,过去能间接证明实质,今天证明不了了。
参照系:四国体系
比较各国教育体系是危险的工作。它太容易掉进国别脸谱:东亚刷题、北欧自由、美国创新、日本严谨。这些标签每个都对一半,每个都不能用来论证。
本文选中国、美国、日本、北欧四个参照系,因为它们覆盖了「教与训」在 AI 之前的几乎全部可能形态。每个体系都是一种把人生产成「可被使用对象」的完整的、合理的设计,对应一个具体的历史问题。以前面确立的七项为坐标,每个体系投入的方向都不一样。
中国的核心是题库筛选。从隋代科举一路到当代高考,再到这几年长三角和北京推动的拔尖创新人才培养计划,逻辑没变:通过大规模、分级、带反馈的标准化题目,把学生反复推到已有能力的边缘上,触发现场推理,再按结果排出一个稳定的次序。它的强项是高密度调用 Gf 并稳定排序。同样上限的两个人,被反复推到边缘的那一个比从未经历过的那一个更能做事。弱项是一旦题型被充分模式化,训练就从触发判断退化为训练标准反应。七项里,中国体系的厚度集中在知识、调用频率、策略库这三项。
日本的核心是底盘和嵌入。从明治维新引入普鲁士教育模型那一刻起,这套系统就把识字、计算、阅读、注意力、规则意识训练得极厚,厚到学生进入工作时这些动作几乎都是自动化的。然后在企业里,通过岗上带教、轮岗、长期雇佣,把人塑造成组织流程里能长期承载具体职能的人。
日本管理学者野中郁次郎用「暗默知」(あんもくち)概括这种能力:它无法编码成文字,只能在岗位中长年沉淀。它的强项是基础自动化做得最厚,PIAAC 成人测评里日本始终在第一梯队,根子在这里:日本人成年后基础认知能力不大幅折旧。弱项是大规模社会化再培训不灵活,跨行业流动很难。七项里,日本体系的厚度集中在基础自动化和知识上。
美国采取的是另一种逻辑:不强求所有人,把资源集中在头部。从顶尖私校、研究型大学,到大学预修课程(AP)、荣誉班、磁石学校、本科科研项目,这套系统在头部建立一连串高强度认知场,让本来就具备高 Gf 的人在开放任务里被识别出来,再用资源加速放大。它愿意接受中位以下、底部惨淡的代价,换取顶部的右尾爆发力。
它头部的认知去耦和元认知训练强度,世界其他地方都达不到。直到今天,全球 AI 行业最顶尖的研究者绝大多数仍然在美国头部体系里训练出来。代价是中位往下基本不投入。这个体系在 2026 年的样子比这段描述还极端:头部和底部已经分裂为两个不同的国家,下面会用 NAEP 2024 的数据展开。
北欧(以芬兰为代表)采取的是第四条路径:低压、平均、长期。它不强求学生,依靠较小的校际差距、较高的教师自主权、较稳定的基础能力训练,把绝大多数学生送到一个不低的底盘上。然后真正的大头放在工作之后:成人再培训、终身学习入口、工会、雇主、政府三方共担的学习权利。一个人 30 岁、40 岁、50 岁仍然在学新东西。PIAAC 第一梯队是芬兰、瑞典、挪威、荷兰加日本,原因都在这里:北欧的成人不折旧。强项是全民底盘的长期维护,弱项是不集中训练顶尖深度图式。七项里,北欧体系的特点是没有特别强的项,也没有特别弱的项。
四个体系在七项上的偏向完全不同。背后是各自的历史问题不一样:中国要从极大的人口里筛出可用的工程师,日本要在战后重建里把人塑造成可靠的工业组织成员,美国要在开放社会里持续产出企业家和创新者,北欧要在小国体量里维持高福利社会的长期生产力。
放在一起看,「教与训」在 AI 之前的几乎全部可能形态,都被这四个体系覆盖了。
有几个体系不进主参照系,需要提及。韩国不进主参照,更适合作为反例:学生测评表现很强,成人能力画像却不强,hagwon 生态又放大了测评解释的难度。它会在后面作为反面案例出现:2025 年 AI 数字教科书四个月就被撤回,PIAAC 2023 成人三项都在 OECD 平均之下,是这条线索的两个端点。德国双元制是另一个职业培训标杆,覆盖面比中美窄,后面作为对比出现。新加坡是迷你版的混合体,规模太小,不作主参照。
这四个体系覆盖了「教与训」最不一样的几种形态。
学生层面:PISA 2022
PISA 是 OECD 每三年评估一次 15 岁学生数学、阅读、科学素养的项目。2022 年的结果如下(中国大陆未参加 2022 年评估)。
| 体系 | 数学 | 阅读 | 科学 |
|---|---|---|---|
| 日本 | 536 | 516 | 547 |
| 芬兰 | 484 | 490 | 511 |
| 美国 | 465 | 504 | 499 |
| OECD 平均 | 472 | 476 | 485 |
中国大陆最近一次参加 PISA 是 2018 年,B-S-J-Z(北京、上海、江苏、浙江四地联合样本)数学 591、阅读 555、科学 590,覆盖四地 81% 的 15 岁人口,并不是中国全国样本。中国大陆 PIAAC 也缺数据,PISA 又只有四地,所以严格讲,国际比较里没有「中国全国」这个对象。
PISA 2022 还有一项创造性思维评估。新加坡 41 分领先,韩国 38 分,加拿大、澳大利亚、新西兰、爱沙尼亚、芬兰在 36 到 38 分之间,OECD 平均 33 分。韩国的高分需要谨慎解释:它说明韩国学生在 PISA 这一测评框架下表现很强,但私人补习生态、国际应试及赛场上的作弊履历,均可能影响测评表现与成人长期能力之间的外部效度。这是本文的解释,不是 PISA 数据本身的结论。
成人层面:PIAAC 2023
PIAAC 是 OECD 的成人技能调查,评估 16 到 65 岁成人的读写、数理、适应性问题解决能力。它评估的是已经离开教育系统多年的人在工作和生活中实际的信息处理能力,比 PISA 更接近教育和培训系统的最终输出。2023 年是第二轮,结果如下。
| 国家 | Literacy | Numeracy | Adaptive PS |
|---|---|---|---|
| 芬兰 | 296 | 294 | 276 |
| 日本 | 289 | 291 | 276 |
| 瑞典 | 284 | 285 | 273 |
| 挪威 | 281 | 285 | 271 |
| 荷兰 | 279 | 284 | 265 |
| 德国 | 266 | 273 | 261 |
| 美国 | 258 | 249 | 247 |
| 韩国 | 249 | 253 | 238 |
| OECD 平均 | 260 | 263 | 251 |
中国大陆并未参加 PIAAC,无可比数据。
北欧加日本是成人技能最强的组合,三项都在第一梯队。美国数理和适应性问题解决都低于 OECD 平均,且 16 到 24 岁年轻人三项也低于平均。韩国三项都低于平均,相比上一轮 literacy 下降 23 分、numeracy 下降 10 分。
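上面几句判断可以直接用表里的数字复核。下面是一个最小的 Python 草稿,只使用上表已有的分数;变量名 `piaac_2023` 和函数名 `below_average` 是为说明而取的,不对应 OECD 的任何官方工具。它只比较原始平均分,不做统计显著性检验,所以 2 分以内的差距不宜过度解读。

```python
# PIAAC 2023 平均分,取自上表;顺序为 literacy, numeracy, adaptive problem solving
piaac_2023 = {
    "芬兰": (296, 294, 276),
    "日本": (289, 291, 276),
    "瑞典": (284, 285, 273),
    "挪威": (281, 285, 271),
    "荷兰": (279, 284, 265),
    "德国": (266, 273, 261),
    "美国": (258, 249, 247),
    "韩国": (249, 253, 238),
}
oecd_average = (260, 263, 251)
DOMAINS = ("literacy", "numeracy", "adaptive_ps")


def below_average(scores, average=oecd_average):
    """返回原始平均分低于 OECD 平均的项目名称列表(不考虑显著性)。"""
    return [d for d, s, a in zip(DOMAINS, scores, average) if s < a]


for country, scores in piaac_2023.items():
    print(country, below_average(scores))
```

按原始分数比较,美国的 literacy(258)也略低于 OECD 平均(260),只是差距很小;正文只把数理和适应性问题解决列为低于平均。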
课堂层面:TIMSS Video Study
PISA 和 PIAAC 评估的是结果。TIMSS 在 1995 年和 1999 年做过两轮罕见的研究:把真实课堂录像下来,分析 8 年级数学课的实际时间分配。1995 年覆盖德国、日本、美国,1999 年扩展到澳大利亚、捷克、荷兰、瑞士和中国香港。
学生工作时间在三类活动上的分配:
| 国家 | 练习已学过的程序 | 应用概念到新情境 | 发明新解法 |
|---|---|---|---|
| 德国 | 96% | 4% | 0% |
| 美国 | 90% | 8% | 2% |
| 日本 | 41% | 15% | 44% |
8 年级数学课获得最高演绎推理评分的比例:日本 39%,德国 28%,美国 0%。
TIMSS 1999 还有一组数据。教师设计的题目里,有些是要求把不同概念联系起来(「making connections」类问题)。课堂实际推进时,这类问题真正被讨论展开的比例:澳大利亚 8%,美国不到 1%,其他五个体系(日本、捷克、中国香港、荷兰、瑞士)37% 到 52% 不等。澳大利亚的这种现象,甚至被研究者直接命名为「shallow teaching syndrome」(浅层教学综合征)。
这两组数据是 1995 和 1999 年的研究。具体百分比可能已变,结构性差异基本仍然成立。
美国底部:NAEP 2024
NAEP 是美国国家教育进步评估,俗称「国家成绩单」。2024 年那一次给出了美国教育系统当下最清晰的画像:
4 年级阅读,最高 10% 和最低 10% 之间相差 107 分,是 NAEP 历史上最大的差距之一。8 年级数学的高低差距是 NAEP 测试有史以来最大。再看绝对水平:底部 25% 的学生,阅读和数学的成绩比 1992 年同一位置的学生还要差。32 年过去,最差的那一截人不仅没有进步,还比上一代倒退了。4 年级最低分位 20 年来最低,8 年级最低分位是 NAEP 历史最低。
差距扩大不是偶然现象。从 2005 年到 2024 年这二十年里,4 年级数学最高 10% 上涨 4 分,最低 10% 下降 3 分;4 年级阅读最高上涨 3 分,最低下降 5 分。哈佛和布朗大学合作的 Annenberg Institute 估算:这二十年里美国教育系统内部的成就不平等扩大幅度,相当于 1.3 个学年的学习增量。再加上疫情几年的冲击,到 2024 年春,全国学生平均仍然落后疫情前接近半个学年,没有任何一个州在阅读上回到 2022 年水平。
但同一时间,美国头部仍然在源源不断产出全球 AI 行业的核心力量。MacroPolo 的 Global AI Talent Tracker 3.0 追踪全球顶尖 AI 研究者的本科教育来源,美国头部高校(MIT、Stanford、Berkeley、CMU 这一组)是除中国之外世界上最重要的供给源;研究生院聚集了全球最高比例的顶尖 AI 人才。
底部 25% 的学生倒退到 1992 年水平以下,顶部训练和吸纳机器仍然站在世界最前列。同一个国家折叠了两个时代。
当下的教育改革
四个体系都在改革。改革的方向各不相同,都没有完成,且产出最早也要到 2030 年代中期才进入劳动力市场。
中国时间表:
- 2021 年 7 月,双减政策落地,校外学科类培训机构受到系统性压制
- 2022 年 4 月,教育部发布义务教育课程方案和课程标准,把核心素养放在中心位置
- 2024 年 12 月,教育部发布关于加强中小学人工智能教育的指导意见
- 2025 年 1 月,中共中央国务院印发《教育强国建设规划纲要(2024-2035 年)》,明确「重点强化学生关键能力、学科素养和思维品质考查」,并启动面向中小学生的「沃土计划」(科学素养培育)和面向高中生的「脱颖计划」(拔尖创新人才培养)
- 上海 2024 年起,4 年级和 7 年级全覆盖开设《人工智能基础》地方课程,每周 1 课时,每年级总计不少于 30 课时;自研「启创·InnoSpark 人工智能教育大模型 1.0」,支持各区学校接入
- 黄浦区「登峰计划」依托大同、格致、向明三所老牌实验性示范性高中,建立五大学科拔尖人才培养基地,3 年内选拔不少于 300 位学生进入;与复旦、交大、同济、华东师大共同开展高中-大学衔接的科研英才培养项目
- 2026 年 4 月,教育部等五部门印发《「人工智能+教育」行动计划》(教科信〔2026〕1 号),提出到 2030 年构建纵向贯通、横向联通的人工智能全学段教育和全社会通识教育体系,把 AI 教育从地方试点推进到全国层面的制度部署
美国时间表:
- 2020 年起,「science of reading」改革在多州推进,重新强调系统性语音教学和早期阅读干预
- 2024 年 NAEP 数据显示阅读全部州无改善,多州转向加强基础能力测评
- 2025 年 4 月,白宫发布关于推进美国青少年 AI 教育的行政命令
- 2025 年起,美国教育部在多份指导意见中将 AI literacy 和 AI proficiency 纳入联邦教育资金支持范围
日本时间表:
- 2020 年新课程标准实施,引入主动学习、跨学科项目式学习、小学编程必修
- GIGA School 计划在 2020 到 2024 完成全国学生数字设备和网络配置
- 2024 年 12 月,MEXT 发布《中小学使用生成式 AI 指南》
- AI Education Accelerator Program 启动,与产业合作扩大教师 AI 培训
韩国时间表:
- 2025 年 3 月,AI 数字教科书在数学、英语、信息、特殊教育韩语四个科目正式推行,覆盖小学 3-4 年级和初高一年级
- 2025 年 7 月,第一学期采用率仅 37%
- 2025 年 8 月,国会立法将 AI 数字教科书重新分类为「辅助材料」而非「核心教材」,学校自愿采用
- 2025 年下半年,新总统李在明撤销原 AI 教科书强制政策;第二学期采用率降至 19%
韩国是全世界对 AI 教育改革意愿最强的国家之一,但反弹和放弃也最快。
几个观察
把数据和时间表放在一起看,几件事浮现出来。
学生表现和成人表现并不一致。日本两个层面都强;芬兰主要在成人层面强;美国两个层面都不突出;中国发达地区学生层面强,成人无可比数据;韩国学生层面强但成人下降。从学生时代的分数推断不出成年之后的实际能力。
课堂层面的差异比测评层面更大。德国和美国 8 年级数学课 90% 以上的时间在练习已学过的程序,日本只有 41%。这件事在 PISA 的最终分数上看不出来,但它是基础教育实际形态的差异。
美国的内部分裂在四个参照系里最严重。底部 25% 倒退到 1992 年水平以下,顶部仍然在向世界其他地方输出顶尖人才。这是同一个国家。
四个体系都在改革。改革产出最早 2030 年代中期才进入劳动力市场。当下这一代劳动力,是被旧体系生产出来的。
核心问题
四个体系覆盖了「教与训」在 AI 之前的几乎全部可能形态。它们在 PISA、PIAAC、TIMSS Video Study、NAEP 上的实测画像比脸谱化叙事复杂得多。它们都在改革。改革的方向各不相同,改革的产出最早要到 2030 年代中期才进入劳动力市场。
当下的图景在这里。
教育和培训训练的是七项不同的认知机制。每一项的可塑性、AI 之后的命运、市场稀缺性都不同。旧体系投入最多的四项是知识、调用频率的浅层、策略库浅层、基础自动化,恰恰是 AI 之后开始贬值的几项;旧体系几乎没有投入的两项是认知去耦和元认知,恰恰是 AI 之后唯一被单独定价的能力。
这个错位不是当下劳动力的过失。这套系统按它自己的逻辑生产他们,按它自己的标准评价他们,把他们送进劳动力市场,然后这套系统开始变形。改革启动了,针对的是 5 年之后入学的孩子。还要在劳动力市场上工作 20 到 40 年的几代人,不在改革的覆盖范围里。
教育和培训没有失去意义。它们曾经承担的资格,正在失格。
第二章 · 教的失格
旧体系投入最多的几项,刚好是 AI 之后开始贬值的;旧体系几乎没投入的两项,刚好是 AI 之后唯一被单独定价的能力。这个错位是结构性的。教育系统的存在前提和 AI 的能力边界,覆盖的是同一片地带。
错位的根源
旧教育的存在前提是「规模化生产可识别的人」。一所国民学校要面对几百人,一个国家的基础教育系统要面对几千万到几亿人。这个量级决定了,进入教育系统的训练必须同时满足三件事。
可标准化。教什么必须能用教材和教学大纲定义清楚。否则不同地区、不同教师没办法按统一节奏交付。一套高考数学题,从乌鲁木齐到上海到县城中学的学生都能做、都按同一标准批改,靠的就是这一层。
可考核。教得怎么样必须能用一套外部标准评价。否则学生层级无法稳定排序,文凭无法被劳动力市场使用。能力如果只是「我自己知道我懂」,但没有任何外部信号能证明,整个学历体系就瓦解了。
可低成本大量复制。教师成本不能高到雇不起。一个普通中学教师,经过四年师范训练,能按教学大纲在教室里传递知识。这种能力必须能被大量培养出来。如果只有顶尖学者能教某门课,那门课就不可能成为基础教育的一部分。
七项认知机制里,能同时满足这三条的是知识、调用频率的浅层、策略库的浅层、基础自动化。这四项可以写进教材、可以出标准化试题、可以由不同水平的教师按统一节奏交付。这就是为什么旧教育的资源几乎全部投入在这四项上。
满足不了这三条的是认知去耦的深度训练和元认知。
认知去耦的深度训练要求长期处理结构性陌生情境。每一次的「陌生」都是一次性的。一旦被标准化进教材,「陌生」就消失了,训练退化为标准反应。这件事在制度上根本无法规模化生产。中国题库教育里最有效的部分,例如顶尖学校的奥数训练、研究型大学的数学专业课,恰好是脱离了规模化考核进入了小班教学和导师制。
元认知要求真实任务加真实反馈的反复循环。它没法考核,因为「你能监控自己思维的程度」没办法用一张试卷测出来。它也没法低成本复制:给一个学生提供真实任务和真实反馈,需要一个比学生水平更高的人长期一对一参与。这个成本在规模化教育里支撑不起。整个 K-12 教育里几乎没有元认知训练的位置。
这就是教育系统的内在边界。能规模化的层,规模化生产;不能规模化的层,不进入教育系统的覆盖范围,留给少数精英项目、师徒制、个体自学。
到这里为止,错位还只是一个有意思的事实。AI 出现之后,这件事变得严重。
AI 训练的本质,是把人类社会过去几百年沉淀下来的可标准化、可结构化的语料,变成可低成本大量复制的输出能力。
可标准化的部分,AI 全有。
可考核的部分,也就是有标准答案的部分,AI 给得比人快、比人准。
可低成本大量复制的部分(知识、模板、套路),AI 单位成本接近于零。
AI 和旧教育覆盖的,是同一类「可规模化生产的认知输出」。两者是替代关系,不是互补。
旧教育不是做错了什么。它做的就是规模化生产可识别对象这件事。AI 也在做这件事,做得更便宜、更快、覆盖范围更广。
这个错位来自规模化机制本身的两次被使用。第一次是教育用了一两百年,第二次是 AI 用了几年。两者覆盖的恰好是同一片地带,而 AI 这一次的覆盖更彻底。
题库失格
题库训练真正起作用的方式,是把学生反复推到 Gf 的工作边缘上:遇到一个没见过的局面,要在没有现成解法的前提下识别结构、找突破口、构造路径。这是一次真实的现场推理。越是如此推动一个学生,他会越接近自己的 Gf 可调用上限。
这个机制的有效前提是题既足够「熟悉」,又足够「陌生」。一道学生从未见过结构的题才会触发现场推理,但这道题需要的推理工具又必须是学生学过的东西。如果题目依赖完全陌生的知识基础,学生根本无从下手;反过来,如果同样结构的题被刷过几十遍,「陌生」就消失了。学生不再启动 Gf,直接调用策略库:识别题型、套用模板、按记忆顺序展开。
题库被反复刷过几十年之后,绝大多数题型已经被充分模式化。中国基础教育里高强度训练的「刷题能力」,本质上是这种已经退化的训练形态:学生不再是在做现场推理,是在做高速的模式识别加模板调用。
这件事 AI 接管起来毫无障碍。AI 自带巨大的题型库和解题策略库,比任何刷过题的高考状元都更快更准。一个能在高考里做对 95% 数学题的学生,做的事情 AI 一秒钟就能完成。
这件事不是猜测。笔者对 2021 到 2025 年高考数学真题的认知机制分析(567 道题,覆盖 7 种独立卷型)给出了具体数据。按「调用 Gf 的程度」把题分成三档:
| 类别 | 占比 | 含义 | 刷题红利 |
|---|---|---|---|
| GOOD | 8.8% | 真正调用流体智力,需要现场构造 | 0.32 |
| MIXED | 29.6% | 多步推理,但每一步都能匹配标准武器 | 0.50 |
| BAD | 57.3% | 题型识别加套用模板即可 | 0.47 |
「刷题红利」指刷过本章 30 道高考题的学生比未刷过同类题型的学生快多少。0 表示两者相同,1 表示刷题学生秒做、未刷题学生几乎做不出。类别则表示这道题是否真正把学生推到了 Gf 的工作边缘。
从七项认知机制看,37.9% 的题考「基础自动化」(求导、套公式),33% 的题考「策略库深度图式」(立体几何建系、解析几何联立后韦达定理),两者合计 71%。真正调用 Gf 的题占 18%。
几个反直觉的发现。
全国甲卷理 4 年(2021 到 2024)共 90 道题,GOOD 数等于 0。包括所谓「压轴大题」,每一步都能匹配标准武器。
刷题红利在 MIXED 题上最高(0.50),BAD 题上反而较低(0.47),GOOD 题上最低(0.32)。真正「刷过的学生秒做、没刷过的学生卡住」的是中等难度的解析几何、导数、概率大题,不是基础题。
2024 年起 GOOD 比例从前几年的 6% 到 8% 跳到 14%。这反映命题人在压轴位置加入「新定义」题型,北京卷 q21 是标杆。但 14% 意味着 100 道题里仍有 86 道可被刷题秒做。
这是题库失格在 2026 年此刻的具体规模。
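三档题的占比和刷题红利可以合成一个粗略的整体数字。下面的 Python 草稿只用上表的六个数,函数名 `covered_share` 和 `weighted_dividend` 是为说明而取的假设名称。需要注意:表中三档占比合计不到 100%,剩余部分原文没有归类,这里假设只在已归类的题目上做加权平均,得到的数字仅供体会量级,不是原研究的结论。

```python
# (占比, 刷题红利),取自上表
categories = {
    "GOOD": (0.088, 0.32),
    "MIXED": (0.296, 0.50),
    "BAD": (0.573, 0.47),
}


def covered_share(cats):
    """三档占比之和;小于 1 说明有一部分题目未被表格归类。"""
    return sum(share for share, _ in cats.values())


def weighted_dividend(cats):
    """按占比加权的平均刷题红利,只在已归类的题目上归一化(本文之外的假设)。"""
    total = covered_share(cats)
    return sum(share * dividend for share, dividend in cats.values()) / total


print(f"三档合计占比: {covered_share(categories):.1%}")
print(f"加权平均刷题红利: {weighted_dividend(categories):.2f}")
```

这样算出来的整体红利主要由 BAD 和 MIXED 两档决定,因为它们合计占了已归类题目的绝大部分。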
题库失格的具体形态在这里:训练机制本身没有错,但题型被充分模式化之后,训练已经从「触发判断」退化为「训练标准反应」,恰好和 AI 接管的层次完全重叠。
形式生产失格
教育系统训练的大量「形式合格」(写作训练、表达训练、PPT 训练、报告训练、简历训练)在过去是「实质的间接证据」。一个学生能写出像样的议论文,多半真的理解;能写出像样的代码,多半真的会工程;能在面试中清晰表达自己的项目,多半真的做过相关工作。形式合格成为筛选可信对象的低成本信号。
AI 之后形式合格的门槛瓦解。一份看起来完整的方案、一段能跑的代码、一篇结构整齐的文章、一份格式漂亮的简历,都不再稳定证明背后有相应的主体。ISSUE 04 把这件事称为「形式可以脱离实质独立产生」。
这意味着教育系统过去几十年训练的「产出能力」,作为筛选信号的有效性大幅下降。一个能写出像样东西的学生,和一个不能写但会用 AI 的学生,从外部产出上看不出区别。
雇主、研究生院、奖学金委员会面对的是同一个问题:申请文书、推荐信、研究计划书、个人陈述,过去用来筛选「会写、有想法、有沉淀」的申请者,今天可能筛选出「会用 ChatGPT」的申请者。基础教育训练的「作文功夫」,在 AI 工具面前贬值速度比题库还快。题库还需要学生记住题型,AI 写作不需要任何前置条件。
更深的一层:形式生产失格不只影响筛选,也影响学习本身。一个学生从小被训练「先把作文结构搭好,再填充内容」,这种训练的隐含假设是「形式承载实质」。学生在训练形式的过程中,本来会自然沉淀出对实质的理解。AI 之后这个假设被打破:学生可以用 AI 直接产出形式,跳过实质沉淀这一步。整套训练机制的内在反馈链中断。
这是教育系统在 AI 时代最棘手的处境之一:不仅产出能力贬值,连「通过产出训练实质」这条几百年验证有效的路径,也开始失效。
标准答案失格
考试体系的核心动作是把答案标准化。这件事的目的是降低评价成本,让规模化考核成为可能。一份标准答案让任何被合理培训的阅卷老师都能给出近似一致的分数。
这是教育系统几百年来运转的基础。从隋代科举到当代高考,一套能被标准化评价的考试体系,是文凭和学位作为社会信号的前提。
但 AI 最擅长生成的就是标准答案。
教育系统训练的能力越接近「能给出标准答案」,越证明这件事 AI 已经能做。一个高考状元能在规定时间内给出几乎完美的标准答案,但这件事在 AI 之后已经不是稀缺能力。
更深的一层:当一个考试可以用 AI 拿到 90 分以上,这个考试测的就不再是它原本想测的东西。它测的是「会用 AI 的能力」或「会作弊的能力」。这一层和考试设计目标完全脱钩。
近两年中美等国大学的考试体系都在面对这一问题。美国大学普遍发现 ChatGPT 可以独立完成大部分本科课程作业。中国大学陆续出现「代写加 AI 工具」的混合作弊。各国应对方式从禁止 AI、到检测 AI、到把考试改成现场口试或闭卷手写,本质上都是在试图重新让考试与「主体能力」挂钩。
但这只是延缓,没有解决。标准答案在非受控环境下的证明力失效。它仍可在受控考试里测基础自动化和部分策略库,但不能再独立证明一个人具备完整判断能力。
考试要么放弃标准答案(变成开放式评价,但成本高、可比较性差),要么坚持标准答案(但承认这个考试只能测 AI 已经擅长的部分)。两条路都通向同一个事实:考试系统作为筛选机制的可信度在瓦解。
这件事比题库失格和形式生产失格更深,因为它撼动的是教育系统据以运转的整个评价基础。
失格平方
考试失格是一阶问题:考试这种筛选机制本身在 AI 时代的可信度下降。围绕考试的辅导产业是二阶问题:用大量时间和金钱训练学生去配合一个已经失格的筛选机制。这件事的性质是失格平方。
具体形态在三个体系里都清晰可见。中国校外学科辅导被双减压制后转入地下、转入素养类、转入 AI 类,全国家长付费补课的总量仍然庞大。很多补课的目的是让孩子在已经失格的高考机制里多拿几分,不指望真正掌握什么能力。美国 SAT 备考、AP 备考、Common App 个人陈述代写代改是一个数十亿美元的产业。AI 之后这套产出对真实能力的证明价值大幅下降,产业仍在扩张。韩国 hagwon 是失格平方的极端案例:把孩子从六岁起送进补习班,结果他们成年之后基础能力反而下降(PIAAC 韩国成人 literacy 在两轮之间下降 23 分)。
但失格平方不会是顶端的常态。
顶端的家长和学生大部分已经在做不一样的事。美国精英家庭把孩子送进真正训练认知去耦的项目:研究型实验室的本科科研、competition math、debate、独立科研项目,而不是 SAT 培优。这些项目的产出在 AI 时代仍然有效。中国一线城市的高知家庭开始把更多时间精力投在拔尖创新人才培养基地、奥赛、海外暑期科研项目,绕开规模化考试机制的训练。北欧家庭从来就没把考试辅导当成主流方式。
顶端撤离的原因有两条。第一是信息优势:顶端家长更早识别出「配合失格机制」和「训练真实能力」之间的差别。第二是投资回报率:高 Gf 上限的孩子,把训练投在不能规模化的层(认知去耦、元认知)回报率最高,因为他们本来就有把这些层训练得厚的潜力。
但中段和长尾家庭很难做到。原因在于没有替代选项。一个三线城市的家庭,找不到本地的真实科研项目,找不到能进行高强度真实反馈的导师,孩子的所有「上升通道」仍然是高考。家长把钱投在补习班上,原因是没有别的可投。他们知道这件事在贬值。
这是失格平方真正的代价。它不只让被辅导的学生在 AI 时代的处境恶化,更让顶端和非顶端家庭的能力差距进一步扩大。顶端能识别并撤离失格机制,把资源投在仍然有效的训练上。非顶端即使识别得到,也仍然无法撤离。
教的失格在这个层面已经超出教育本身。它在重新分配下一代的能力分布。
四国失格点
中国。中国体系的强项是高密度调用 Gf 并稳定排序。但这个强项的有效前提是:题必须足够「陌生」。被反复刷过几十年之后,「陌生」消失,训练退化为模式识别和模板调用。这件事在 AI 时代等于「会让 AI 帮我刷题」。
上文中笔者的 567 题研究按卷型拆开后差异更明显。自主命题的上海(14.4%)和北京(13.6%)GOOD 比例显著高于新高考统一卷(Ⅰ卷 8.7%、Ⅱ卷 5.3%),浙江(6.8%)则落在统一卷的区间内。全国甲卷理 4 年 90 题里 GOOD 数等于 0。高考题库在「提升 Gf 考察」上失败了,统一卷把套路化推得更彻底。「新情境包装」在中国高考里大多是装饰,嫦娥二号连分数、信道传输、投篮赛这些题剥离外壳后是标准模板。
近年双减、拔尖创新人才培养计划、上海 AI 必修课、黄浦区登峰计划,都在试图把基础教育从「标准反应训练」调整到「陌生情境推理」。改革方向是对的,但覆盖率有限。黄浦登峰计划三年选拔不少于 300 位学生,规模仍在数百人量级,是一个城区内的精英项目,对全国大多数学生没有直接影响。
更深的难题:陌生情境推理的训练本身没法规模化。中国体系想在保留规模化筛选优势的同时引入去规模化的训练,本质是结构性矛盾。怎么解决,目前没有可推广的方案。
日本。日本体系的强项是基础自动化最厚。前面给过 PIAAC 数据:日本人成年后基础认知能力不大幅折旧,根子在 K-12 把识字、计算、阅读、注意力、规则意识训练到自动化程度。
但日本的失格点不在基础教育,在企业内训的「嵌入」假设:把人塑造成长期承载特定职能的对象。AI 之后「特定职能」本身在被重写。一个长期被训练成承载某个具体职能的人,当那个职能被 AI 重写时,他几十年积累的「嵌入能力」全部失效。
基础教育层面,日本面对的失格点是 AI 使用范式的缺失。MEXT 2024 年的生成式 AI 指南是一个起步,但离形成「AI 使用范式的基础自动化」还很远。日本基础教育还没有把 AI 使用纳入「练到不需要专门动脑」的训练范围。
美国。美国教育已经分裂为三层,每一层在 AI 时代的命运完全不同。
头部(AP、荣誉班、研究项目、顶级私校、研究型大学)在认知去耦和元认知上的训练强度世界第一,全球顶尖 AI 研究者绝大多数来自这里。AI 之后这一层会被进一步放大。他们已经具备的能力恰好是 AI 时代被单独定价的能力,加上 AI 工具加速效率,头部和其他人群的差距会扩大。
钟形领域。NAEP 2024 已经显示这一层在持续下滑,最低分位 20 年来最低。各州正在试图回归基础教育,「science of reading」改革就是这个回归动作。但回归基础教育对当下劳动力没有帮助,受益的是 5 年后入学的孩子。已经在劳动力市场上的钟形领域人群,没有救济。
长尾底部。底部 25% 已经退步到 1992 年水平以下。这一层在 AI 时代会进一步被压缩。基础不扎实的人使用 AI,等于把判断权外包给一个不能负责的系统。结果是被 AI 压低。他们没有能力判断 AI 输出是否合理,AI 给什么他们相信什么。
美国未来 10 年最可能出现的画像是中产分裂为两半:能完成自我重构的进入头部加速放大,不能完成的滑入底部被 AI 压低。原本钟形分布的中位人群在 AI 时代会逐步消失,整个社会能力分布从钟形变为双峰。这是 AI 之后所有四国体系里最戏剧性的失格画像。
北欧。北欧体系的强项是全民底盘维护得长。但北欧不集中训练顶尖深度图式,纯学生层面爆发力不足,因此在 AI 时代极少数高产出位置上不占优势。这是北欧体系的失格点:能保证全民不折旧,不能保证产出右尾。但北欧体系的真正机会反而在 AI 时代显现。终身学习入口的成熟让北欧成人不折旧,这套机制如果能扩展到 AI 使用层,就是把基础自动化的边界扩展到 AI 时代。
反例:什么样的训练没有失格
在 AI 之后仍然完整有效的训练形态,长什么样。
真正的数学训练:数学奥赛、数学系本科、数学研究。它训练的是认知去耦的极端形态:把抽象结构从语境里完全剥离出来,作为形式对象操作。证明一个定理需要在严格的形式系统里逐步推导。这件事 AI 能验证,但训练人本身的去耦能力只有人能完成。
顶级科研训练:博士阶段、博士后、研究型实验室的实习。它训练元认知(识别自己研究方向的边界、判断什么是重要的问题、知道自己什么时候在自欺)和领域深度图式(一个真正懂物理的人脑子里那套「什么解法可能不对」的判断)。这种训练完全依赖真实任务加真实反馈,AI 没法替代导师做出判断。
专业辩论、模拟法庭、debate club。在对抗中识别论证结构、识别隐含假设、构造反驳。AI 可以模拟辩论,但训练人在压力下、在真人对抗中现场调用判断,这件事必须由真实主体完成。
真实的工程训练。指的是在复杂系统中调试、推断异常、识别隐藏依赖、判断什么时候该重写架构而不是修补 bug 的工程,不是写出一段能跑的代码就算数。这种训练同时训练领域深度图式和元认知。
长期师徒制:医学住院医师、传统手工艺、临床心理。训练通过长期真实反馈才能形成的判断力。一个住院医师在床边判断病人状况的能力,需要几年时间和几百个真实病人才能形成。
高强度的写作训练加真实反馈。每周改稿改三十遍那种,不是大学写作课的形式课。这一类训练的核心是让学生在反复修改中真正形成自己的判断和声音,也就是 ISSUE 04 后半段讨论的「作者性」的训练通道。
这些形态的共同特征:
不能被规模化教学(学生数量必然小)。
必须有真实任务和真实反馈(不能靠考试评价)。
投入产出比从经济角度看不划算(教师成本高、学生转化率低)。
覆盖率必然低(中国十几亿人口里能做到这种训练的可能不到 0.1%)。
这是前面论证的镜像。旧教育不投入这些层,因为经济上不划算、规模化做不到。AI 之后这些恰好是唯一被单独定价的层。
没有失格的训练形态有一个共同特征:它们都做不到规模化。AI 时代被市场单独定价的能力,刚好是教育系统从来没有覆盖过的能力。
教的失格不在教育内部
教的失格本质上是教育系统的存在前提(规模化生产可识别对象)和 AI 时代的能力分布(去规模化的判断和去耦能力)之间的结构性错位。失格不在教师身上,不在课程身上,也不在考试身上。
这个错位是教育这件事在 AI 时代的根本处境,不是某一时刻可以纠正的问题。改革针对的是 5 年后入学的孩子。当下劳动力市场上几代被旧体系训练出来的人,必须自己面对这个错位。
第三章 · 训的失格
教的失格在于训练的内容刚好被 AI 接管。「训」的失格在另一处:训的整个体系建立在三个前提之上,岗位是稳定的、流程是稳定的、决策链是稳定的。AI 之后这三个前提同时失效。
训的失格不在它训练的内容上,而在它训练的对象正在消失。这是一个比「教」更深的失格。教只是产出贬值,训是连存在前提都被抽走。
训的存在前提
序章讲过亨利·福特的故事。1913 年 Highland Park 的移动装配线把汽车装配时间从 12.5 小时压缩到 93 分钟。Ford 解决的是一个具体问题:怎么把一群没有任何汽车制造经验的农场移民,在两周内塑造成可以站在某个工位上重复一个动作的工人。
这就是「训」作为现代制度的起源。它不是教育的延伸,而是工业化生产对人的一种特殊塑造方式。
训要解决三个问题。
第一,让通用人才成为特定岗位的可立即嵌入单元。一个大学毕业生进入一家公司,并不能立刻产出价值。他需要被训练成「这家公司这个岗位」上能稳定运转的零件。岗位说明书、SOP、入职培训、跟班学习、师徒制、轮岗,这一整套机制的核心动作就是这件事。
第二,让经验在岗位之间传递。一个高级员工脑子里那套「什么时候该怎么处理」的判断,没法用文字写完整。组织通过 OJT、师徒制、长期共事让这种隐性知识从老员工身上转移到新员工身上。这是日本企业内训的强项,也是 Nonaka 的「暗默知」概念真正适用的场景。
第三,让人在组织流程里形成稳定预期。我接到这种类型的需求会怎么处理、我向上汇报应该用什么格式、出了问题我应该找谁、这个客户应该怎么对接。这些预期承载的是组织文化,不是个人能力。一个员工在一家公司工作越久,他承载的组织预期越多,对组织的价值也越高。
训的产出是岗位经验、流程熟练度、汇报能力、协作习惯、组织文化承载。
这套体系在过去几十年里有效,是因为它的三个隐含前提同时成立。
岗位是稳定的。我今天被训练成的岗位形态,明年大体上还是这个形态。
流程是稳定的。组织的工作方式不会每年重写一遍。
决策链是稳定的。上游下游的关系、向上汇报的对象、分工边界,是可预期的。
只有这三件事同时成立,「训」作为一种长期投资才有意义。一个员工被训练 5 年达成的能力,组织和员工双方都假设它会在第 6 年、第 10 年、第 20 年仍然有效。如果第 6 年因为岗位被重新定义而失效,这 5 年的训练投入就是沉没成本。从企业视角是钱,从员工视角是 5 年人生的认知投入。
训的存在前提是一个稳定的世界:稳定的岗位、稳定的流程、稳定的决策链。AI 改变的是这三个前提同时成立的可能性。
三个前提的同时失效
岗位不再稳定。一个岗位本质上是「一组任务的稳定捆绑」。产品经理的岗位是写需求文档、用户调研、PRD、跟进开发、汇报上下游的捆绑。运营的岗位是数据分析、内容生产、流程化执行、KPI 拆解、日常运营动作的捆绑。中层的岗位是接需求、拆任务、派给下属、跟进进度、收结果、汇报上游的捆绑。每一个岗位都是过去几十年组织演化中形成的具体捆绑。这个捆绑稳定的前提是:捆绑里的任务彼此协同,单独拆出来都不容易完成。
AI 改变的就是这个协同假设。AI 之后,捆绑里的大量任务可以被独立完成,且独立完成的成本接近零。写需求文档、做用户调研综合、写 PRD、做基础数据分析、生成模板化内容、做需求澄清和任务树生成、跟进进度,每一项 AI 都能做。当捆绑里 60% 到 80% 的任务都可以被 AI 独立完成时,这个捆绑作为「岗位」的存在合理性消失。
岗位被解构了。它现在是一组可以被随时拆开、重新分配、外包给 AI 的任务集合,不再是一组稳定的、需要长期训练才能承担的任务。被这种岗位训练了 10 年、15 年、20 年的人,承载的是一组已经被解构的任务捆绑的「承担能力」,不是稳定的能力。这个能力在岗位被重新定义后无处可落。
流程不再稳定。一个组织的工作流程,是把上面那些岗位连接起来的具体路径。需求从产品经理到设计师到工程师到测试到上线。客户从销售到售前到方案到合同到交付。报销从财务到审计到合规到批准。每一条流程都是几代人在组织里磨出来的具体路径。
AI 之后这些路径可以被重写。重写的方式有三种。第一种是流程压缩:原本需要 5 个人协作的流程,1 个人加 AI 就能完成。第二种是流程重组:原本必须按特定顺序的流程,AI 让中间环节可以并行或合并。第三种是流程消失:原本需要的流程,因为 AI 让整个产出形态变化,流程本身不再必要。
被旧流程训练出来的人,熟悉每个环节的具体细节、知道找谁、知道怎么对接。他们掌握的是对一套具体流程的熟悉,不是通用能力。流程被重写后,这种熟悉本身贬值。
这件事和「岗位被解构」不是两件事。岗位被解构的同时,连接岗位的流程也被解构。两者一起发生。
决策链不再稳定。任何组织里的人都是一条决策链上的节点:上游交付任务,我理解,我拆解,我交给下游,我审查下游产出,我交付给上游。旧节点的默认假设是:上游和下游都是人。我作为中间节点的核心价值是在人和人之间做翻译、协调、审查。
AI 之后这个假设失效。下游不再都是人,也包括 AI、流程、模板、agent。我作为节点的价值不再是「协调上下游的人」,是「判断哪些工作交给 AI、哪些交给人、哪些自己保留」。
这是一个完全不同的价值定义。旧的中层管理者承载的能力是「协调人和人」。AI 时代的节点承担的能力是「重新画自己的下游边界」。这两件事的训练方式、判断标准、能力构成几乎没有重叠。
这正好是序章里 Henry Ford 那件事的反向。Ford 把通用人转化成可被使用的工业组织成员。AI 时代不合格的管理者,是还在用人去做机器已经能做的事的管理者。AI 时代的管理者必须把过去转化给人的工作重新分配,让 AI 承担机器能做的部分,让人承担只有人能做的部分。
岗位被解构、流程被重写、决策链需要重新画边界。三件事同时发生。旧训练系统训练的人,承载的是「在稳定岗位、稳定流程、稳定决策链上熟练运转」的能力。这三个前提同时失效之后,能力本身没有失效,但能力承载的对象(具体岗位)消失了。
这就是训的失格的真正机制。它不像教那样在内容上失格,是在前提上失格。训没有训错,是训的对象正在消失。
失格的具体形态
熟练度训练失格。最快被 AI 接管的层。写邮件、总结资料、做表格、生成代码、写报告、做 PPT、做会议纪要、整理调研结果。这些「熟练度」在过去是一个员工的核心价值之一。一个工作 5 年的人写报告比工作 1 年的人写得快、写得规范、写得齐整。
AI 之后这个差距被抹平。一个会用 AI 的工作 1 年的人,写报告的速度和质量超过不用 AI 的工作 5 年的人。工作 5 年的人仍然有价值。他可能在判断、在客户关系、在专业积累上仍然厚。但「写报告快、写得规范」这一层不再是他的价值来源。
被熟练度训练充分的中年员工,处境最尴尬。他们工作了 10 年、15 年、20 年,最厚的那一层正是熟练度。当熟练度不再被定价,他们手里能拿出来的资本就只剩判断、关系、客户、行业理解。这些东西在过去并没有被组织系统性训练,是个人零散积累的。
流程适配训练失格。旧训练把人塑造成「特定流程的稳定承载者」。这个员工知道这个 SOP 怎么走、那个客户怎么对接、内部审批怎么流转、跨部门协作的潜规则。AI 之后流程本身被重写。被流程训练出来的能力,作为「对一套具体流程的熟悉」,跟随流程一起贬值。
更深的一层:当流程不稳定时,「按流程走」这件事本身的价值降低。组织开始更看重「判断什么时候不能按流程走」,也就是元认知能力。元认知恰好不是流程训练的覆盖范围。
经验积累训练失格。旧组织里「老员工值钱」的核心理由是经验:见过更多 case、知道遇到什么问题怎么处理、记得三年前那个客户的具体情况、能预判某种情境下会出什么问题。
经验作为一种能力是分层的。浅层经验是「见过类似 case」。这一层 AI 自带巨大的 case 库,覆盖范围比任何人都广。深层经验是「在大量真实失败和成功的反馈中形成的判断力」。这一层 AI 没有连续主体,无法替代。
但旧训练并不区分这两层。一个员工经常被组织当作「工作 X 年」这个数字来评价,背后的隐含假设是「工作时间 = 经验 = 判断力」。AI 之后这个等式失效:浅层经验贬值、深层经验仍然稀缺,但浅层经验是 90% 员工承载的部分。
工作 15 年的中层经常说「这种事我做过 100 次」。这句话在 AI 之前是能力证明。在 AI 之后变成尴尬:AI 看过的 case 比 100 次多 10000 倍。
标准化交付训练失格。旧训练强调交付物的标准化:方案要完整、PPT 要漂亮、报告要专业、邮件要正式、汇报要结构清晰。这一层在 AI 之前是组织运转的润滑剂。交付物形式合格让信息流转效率提高。AI 之后形式合格门槛消失。这一点和教的失格里「形式生产失格」对应,但场景在工作而不是教育。一份 PPT 看起来很专业,不再证明背后的人真的理解问题。一份报告结构完整,不再证明背后的人真的做过分析。被旧训练充分塑造的「职业感」(会写邮件、会做汇报、会包装方案),这些过去定义「职场人」的具体能力,在 AI 之后从竞争力变成「AI 即可产出的输出」。
中层最难
四种失格形态覆盖大部分员工。但有一个特殊群体在 AI 时代的处境最严峻:中层管理者。
中层是当下劳动力市场上规模最大的「被旧训练充分塑造」的群体。但中层的失格机制和基层员工不一样。AI 不是直接替代中层。它是把中层核心动作的标准,抬到了靠旧训练上来的中层达不到的位置。
中层的核心动作是一个循环。任何中层管理者的核心动作可以拆成六步:接需求、拆任务、派给下属、跟进进度、收结果、汇报。这六步组成一个判断和执行循环。中层的核心价值不在于步骤本身,在于这个循环跑得快不快、稳不稳、产出质量高不高。一个好的中层让循环跑得快、决策果断、产出整齐;一个差的中层让循环卡顿、决策反复、产出参差。
AI 不替代循环,它改写循环的标准。AI 不是替代这六步。AI 协助下的中层每一步都做得更快、更准。接需求时 AI 帮你澄清和重写需求。拆任务时 AI 帮你生成结构化任务树。派任务时一部分子任务可以直接交给 AI 完成,剩下的派给下属。跟进进度时自动化流程做大部分跟进动作。收结果时 AI 做初步质量检查。汇报时 AI 生成各种形式的汇报材料。整套循环的速度和质量都显著提升。ISSUE 02 讲过这件事的本质:AI 大幅提升效率的同时把质量保证在中位以上。这是任何完全由人组成的团队都无法做到的。
全人团队的质量天花板。一个由人组成的团队,无论怎么训练,质量分布都是正态的。最好的人产出质量极高,最差的人产出质量低。整体团队的产出受最弱环节拖累。一个 10 人团队的整体产出质量,永远低于团队里最强的那个人。
AI 把每个人的工作质量基线都抬到中位以上。一个工作 1 年的人借助 AI 完成的报告,质量可以达到工作 5 年水平的中位线。一个 5 年水平的人借助 AI,质量可以达到 10 年水平的中位线。最弱环节的质量被显著提升。
会用 AI 的中层管理的团队,整体产出质量稳定在中位以上。不会用 AI 的中层管理的团队,仍然是传统正态分布,受最弱环节拖累。同样的资源、同样的人员配置、同样的目标,两种团队的产出差距可能是 2 到 3 倍。
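「最弱环节」和「基线被抬到中位以上」这两个说法,可以用一个玩具模型体会。下面的 Python 草稿做了几条本文之外的假设:个人质量服从均值 100、标准差 15 的正态分布;团队产出质量由最弱的一环决定;AI 的作用被简化成把低于中位数的个人质量抬到中位数。函数名 `simulate_team`、`weakest_link` 都是为说明而取的。它不能推出正文里 2 到 3 倍这个量级,只用来展示机制的方向。

```python
import random
import statistics


def simulate_team(size=10, floor=None, seed=0):
    """抽取一个团队的个人质量分(正态分布,参数纯属假设)。
    floor 不为 None 时,把低于 floor 的分数抬到 floor,模拟 AI 托底。"""
    rng = random.Random(seed)
    scores = [rng.gauss(100, 15) for _ in range(size)]
    if floor is not None:
        scores = [max(s, floor) for s in scores]
    return scores


def weakest_link(scores):
    """假设团队产出质量等于最弱一环的质量。"""
    return min(scores)


baseline = simulate_team()           # 全人团队
with_ai = simulate_team(floor=100)   # 同一批人,AI 把基线抬到分布中位数

print("最弱环节:", round(weakest_link(baseline), 1), "->", round(weakest_link(with_ai), 1))
print("团队均值:", round(statistics.mean(baseline), 1), "->", round(statistics.mean(with_ai), 1))
```

两次调用使用同一个随机种子,所以比较的是同一批人在有无托底时的差别:最强的人没有变化,被改变的只是分布的下半截。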
旧训练里没有 AI 使用。中层在过去十几二十年被训练的,是路由器的能力:接需求、拆任务、跟进进度、汇报。这一整套动作的训练里没有「AI 使用」这一层,因为旧的训练对象是稳定岗位、稳定流程、稳定决策链。AI 使用是 2023 年之后才出现的范式。按旧路径培训上来的中层,路由能力很好,但不会用 AI 改写自己的判断和执行循环的效率和质量。
中层最难的真正含义:他承担的核心动作(判断和执行循环的效率和质量),AI 时代的标准被大幅提升,而旧训练没有教他怎么达到这个新标准。同样资源、同样团队、同样目标,会用 AI 的中层做完了,旧路径中层还在做。或者两者都做完了,但会用 AI 的产出质量稳定在中位以上,旧路径中层的产出质量受最弱环节拖累。这是中层在 AI 时代的真正失格:按旧训练上来的中层,相比会用 AI 的中层,效率和质量都大幅落后。
四国职业培训
日本。日本企业内训的强项是基础自动化加长期嵌入,通过 OJT、轮岗、师徒制把人塑造成能长期承载特定职能的对象。失格点在「特定职能」本身正在消失。一个被训练成「承载某个具体职能 30 年」的人,当那个职能被 AI 重写时,他的整套训练失效。日本组织的长期雇佣假设的两端都失效。一端是公司不再需要终身的专业人才,另一端是员工的专业能力也不再适配新岗位。
日本的失格不在企业训练得不好。是日本企业训练的方向(让人深度嵌入特定岗位)和 AI 时代的能力分布(让岗位本身可以被随时重写)相反。日本越是擅长企业内训,在 AI 时代越是脆弱。
中国。中国职业培训有一个独特的特征。它在过去 20 年里集中输出「执行型工程师」:能完成具体技术任务、能 996、能稳定交付的工程人才。这一层在 AI 时代的脆弱性,比日本和美国都更高。原因是 AI 最先接管的恰好是这种「标准化执行型工程任务」:写代码、调 bug、做集成、写文档。一个有 AI 协助的工程师可以承担过去需要两到三个人完成的工作量。
中位执行型工程师的岗位需求在未来几年可能会出现明显压缩。这件事在中国的影响比日本和美国更大,因为这一层在中国劳动力市场上的占比更高。中国的另一个失格点和日本相似:长期雇佣假设也在失效。但中国的失效速度更快,因为没有日本那种长期雇佣文化作为缓冲。
其他参照系。美国职业培训本来就碎片化,依赖横向流动(裸辞跳槽是常态)、外部认证(PMP、CFA、AWS 认证)、apprenticeship 复兴。碎片化在 AI 时代有意外的优势:因为没有像日本那样把人深度嵌入特定企业,美国劳动力对岗位变化的耐受性更高。但美国的失格点在底盘弱:基础不扎实的人在 AI 时代使用 AI,是被 AI 牵着走,不是用 AI 加持自己。
北欧的强项是终身学习入口的成熟。失格点是规模太小、覆盖人群太少,无法成为其他国家的可推广方案。但北欧的机制本身仍然成立。如果它能扩展到 AI 使用层(把「用 AI 工作」变成全民可以终身学习的能力),就是把基础自动化的边界扩展到 AI 时代。
韩国是「训」的失格的极端形态。从 hagwon 把孩子的所有时间填满应试训练,到大学的高强度就业准备,到职场的「在场即价值」文化,整套链条都把高强度训练本身当成价值。结果是 PIAAC 2023 韩国成人 literacy 比上一轮下降 23 分。被训练得最久的一代成人,在 PIAAC 上的下降幅度最大。整套链条同时在失效,文化让这件事自我强化。
训没有训错
训的失格本质上不是训练内容的失格。是训的存在前提(稳定岗位、稳定流程、稳定决策链)在 AI 时代同时失效。
被旧训练充分塑造的几代人,承载的是「在稳定结构里熟练运转」的能力。这个能力本身没有失效,是它承载的对象消失了。当一个人 20 年的工作经验对应的全部是已经解构的岗位、已经重写的流程、已经被 AI 接管的决策链,他需要重新画自己的能力边界。旧训练里没有教过他怎么画。
教的失格让 5 年后的孩子需要被重新教。训的失格让当下劳动力市场上的几代人需要被重新训。
教的失格还有一条改革路径。课程可以被重新设计,新的考试体系可以被构建,新的教学方式可以推广。这件事 5 年到 10 年内会有结果。
训的失格没有这条改革路径。原因是没有「AI 时代的训」作为现成参照。每一个组织、每一个行业、每一个具体岗位都需要自己摸索什么是新的训。这件事的速度取决于 AI 能力的边界、组织的反应速度、个人的自我重构能力。中间会有大量人被卡在旧训练和新需求之间。
他们脑子里被训练出来的认知结构,是为一个不再存在的世界准备的。
第四章 · 夹层世代
诊断要落到具体的人群上:现在 25 到 50 岁、已经离开 K-12 教育系统、还要在劳动力市场工作 20 到 40 年的几代人。
教的失格让他们被旧体系生产出来时就携带了错位。训的失格让他们工作之后被进一步塑造成不再被需要的形态。改革启动了,但改革不为他们准备。
这群人是谁
按当下年龄分子群:
| 当下年龄 | 出生年份 | K-12 完成 | 进入职场 | 还要工作年数 |
|---|---|---|---|---|
| 25-30 | 1996-2001 | 2014-2020 | 2018-2024 | 35-40 |
| 30-37 | 1989-1996 | 2007-2014 | 2011-2019 | 28-35 |
| 37-47 | 1979-1989 | 1997-2007 | 2001-2012 | 18-28 |
| 47-55 | 1971-1979 | 1989-1997 | 1993-2003 | 10-18 |
每一个子群进入职场的时点都是不同的「前 AI 时代」。25-30 岁的人在 2020 年前后进入职场,刚开始工作就遇到 ChatGPT。47-55 岁的人在 1990 年代进入职场,整个职业生涯的认知模式是为完全不同的世界塑造的。
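表中的「出生年份」和「还要工作年数」两列可以由年龄区间直接推出。下面的 Python 草稿以 2026 年为基准年,并假设 65 岁退休;退休年龄是这里为了复现表格而引入的假设,原文没有明说。「K-12 完成」和「进入职场」两列取决于具体的求学路径,这里不推算。

```python
REFERENCE_YEAR = 2026   # 文中的「此刻」
RETIREMENT_AGE = 65     # 假设值,用于复现「还要工作年数」一列


def cohort_row(age_low, age_high):
    """由当下年龄区间推出出生年份区间和剩余工作年数区间。"""
    return {
        "birth_years": (REFERENCE_YEAR - age_high, REFERENCE_YEAR - age_low),
        "years_left": (RETIREMENT_AGE - age_high, RETIREMENT_AGE - age_low),
    }


for low, high in [(25, 30), (30, 37), (37, 47), (47, 55)]:
    row = cohort_row(low, high)
    print(f"{low}-{high} 岁: 出生 {row['birth_years']}, 还要工作 {row['years_left']} 年")
```

在这两条假设下,四个子群的推算结果与表中数字一致。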
粗略按人口年龄结构、受教育水平和劳动参与率估算:中国 25-50 岁人口约 4.5 亿,其中受过高中以上教育、目前在劳动力市场上工作的约 3 亿。全球 25-50 岁人口约 27 亿,在工业化和半工业化经济体里、有正规教育和正规工作的约 12 到 15 亿。
按七项认知机制看,这群人脑子里被强训练的是知识、调用频率的浅层、策略库的浅层、基础自动化、形式合格。弱训练或没有训练的是认知去耦的深度、元认知、AI 使用范式(这件事在他们求学和早期职业阶段不存在)。
这个能力结构是为一个稳定岗位、稳定流程、稳定决策链的世界设计的。它在 AI 之前能完整支撑一份职业生涯。AI 之后这个支撑开始失效。
他们当年没有选择被训练成现在这个样子。
变革的时间错位
夹层世代面对的不只是能力错位,还面对时间错位。
教育改革产出到劳动力市场的速度:
| 改革 | 启动 | 受益人群进入职场 |
|---|---|---|
| 中国双减 | 2021 | 2030 年代初 |
| 中国教育强国规划纲要 | 2024-2035 | 2030 中-2040 |
| 上海 4-7 年级 AI 必修课 | 2024 起 | 2032-2036 |
| 美国联邦 AI 教育行政令 | 2025 | 2032-2037 |
| 日本生成式 AI 指南 | 2024 | 2030 年代初 |
AI 改写工作的速度:
| AI 工具 | 公开 | 工作影响显现 |
|---|---|---|
| GPT-3.5 | 2022.11 | 2023 起 |
| GPT-4 | 2023.3 | 2023 下半年起 |
| 各行业 AI 工作流 | 2024-2026 | 持续 |
| AI agent 工作流 | 2025-2026 | 进行中 |
教育改革按 10 年算,AI 改写工作按月算。两者的时间尺度差异是两个数量级。
夹层世代在这两个时间表的中间。他们不是改革的受益人,改革的产出在他们退休前才能到达。他们是 AI 改写工作的承受人,每个月都在发生。
四国教育改革有一个共同特征:它们都把「教育系统的下一代产出」作为目标。它们不针对当下劳动力市场上的成年人。
成年人的再教育不在传统教育部的覆盖范围。它属于「职业培训」或「终身学习」,由企业、行业协会、个人自己负责。但企业的 AI 培训停留在工具技能层,行业协会的反应速度比国家教育系统更慢,个人自己负责意味着大部分人没有合适的资源。
夹层世代被三个层面同时推卸。
三种推卸
政府推教育改革。教育改革对当下劳动力没有直接帮助,针对的是 10 年后的孩子。政府把「我们在改革」作为对当下劳动力市场失效的一种回应,但这个回应在时间上完全错位。四国都在做这件事。中国的 2024-2035 教育强国规划纲要、美国的 2025 联邦 AI 教育行政令、日本的 MEXT AI 指南、韩国 2025 年 AI 数字教科书。这些改革针对的全是基础教育阶段的学生,没有一项针对在职成年人的认知能力重构。夹层世代和教育改革的关系,本质上是不被覆盖的关系。改革做得多好都和他们无关。
企业推 AI 培训。多数企业的 AI 培训停在「教大家怎么写 prompt、怎么用某某工具、怎么总结文档」。这是把工具能力当成认知能力训练。
真正需要的不是工具能力。是任务重构能力:重新画自己工作的下游边界,判断哪些任务交给 AI、哪些交给人、哪些自己保留。这件事要求深度的元认知,不是任何工具培训能完成的。
企业不教这件事,因为它要求重新设计组织流程。企业自己也不知道怎么设计。一个公司不会培训员工「重新发明你的工作」,因为公司自己也不知道这个工作应该被发明成什么样。
企业的真实困境:AI 让每个岗位的边界都需要重新画,但企业的组织结构、KPI 体系、汇报关系是按旧岗位设计的。重新画岗位边界等于重新设计整个组织。这件事的成本和风险都太大。所以企业选择最小阻力的路径:教员工用 AI 工具,假装这就够了。
个体推「终身学习」。市面上的终身学习产品大量是把 AI 时代的焦虑产业化为知识付费课程。这些课程教的仍然是知识层和工具层,AI 已经接管的部分。学完这些课程的人,焦虑暂时缓解,工作方式不变。这是 ISSUE 01 讲过的「反刍消费」在教育市场的落地形态:付费、消费、心理满足、零迁移。更深的问题:真正的认知重构不能通过付费课程完成。它需要真实任务加真实反馈的反复循环,需要长期一对一的导师,需要持续的元认知校准。这件事的成本极高、覆盖率必然低,本质上无法被产品化。市场上看似在解决夹层世代的认知重构需求,实际上在做的是另一件事:消费焦虑。
焦虑与绝望
夹层世代不是均匀的。不同年龄、不同岗位、不同积累程度的人,处境完全不同。下面三种处境覆盖夹层世代里最有代表性的子群。
初入职场的文本型工作者。22-28 岁,刚毕业 0 到 5 年。工作内容是文本型的:写邮件、做 PPT、写报告、做表格、整理调研资料、做基础数据分析、对接客户的标准化沟通。
他们的全部价值在于「基础执行能力」。但他们刚进入职场,还没有积累任何东西:没有领域策略库,没有职场判断力,没有流程的内在熟悉,没有任何技能的自动化。所谓「基础执行」,本质上是「被组织带着完成具体任务」。这是过去职场新人的标准入口。
AI 之后这个入口在消失。这件事和第三章「失格的具体形态」一节的判断不矛盾:第三章说过「一个会用 AI 的工作 1 年的人,写报告的速度和质量超过不用 AI 的工作 5 年的人」,这话对新人本身是利好,借助 AI 他能做老人的活;但放到劳动力市场上看,结果反过来:当新人加 AI 的产出等于老人加 AI 的产出,又等于 AI 直接做的产出时,新人「借助 AI 能做老人的活」就不再构成稀缺性。劳动力市场上已经有足够多会用 AI 的人,且 AI 直接做也能做。新人没有比较优势。
他们的全部产出,AI 都能做。AI 写邮件、AI 做 PPT、AI 写报告、AI 整理调研资料、AI 做基础分析、AI 起草客户邮件,每一项都比新人快、比新人稳、比新人质量更整齐。AI 的边际成本以 token 计算,几乎为零。新人的成本是工资、社保、培训时间、试错容忍。企业的经济计算很简单:与其花一年时间把一个新人训练到能稳定交付,不如直接用 AI 完成这一年里的全部产出。
这件事的真正后果不是新人立刻失业。是新人原本应该在头几年获得的「积累入口」消失了。前一代人通过头几年的基础执行积累了领域策略库、流程熟悉、职场判断力,由此走向中层和高层。这一代人没有这个入口。基础执行的工作给了 AI,剩下的判断工作他们没有资格承担。他们既不能像前辈那样「打杂打到能上岗」,也没有获得 AI 时代的认知重构能力。
他们的处境是夹层世代里最难解的。他们被旧训练塑造得不够厚(还没来得及在职场里积累),又错过了积累入口(AI 把这个入口抹去了)。
具体感受:投了几百份简历得到的工作,做了几个月发现自己写的报告 AI 都能写。请教前辈,前辈说「你需要积累」,但具体什么是「积累」,前辈也说不清楚。AI 工具的教程一波一波,学完之后焦虑暂时缓解,但本质上没有给自己提供「积累」。
37-47 岁的中层管理者。工作 15 到 25 年,已经从专业岗走到中层管理岗。被训练的核心能力是判断和执行循环的运转、跨部门协调、向上汇报、向下管理。
AI 之后他们承担的核心动作(判断和执行循环),AI 时代的标准被大幅提升,而旧训练里没有 AI 使用。按旧路径培训的中层,相比会用 AI 的中层,在效率和质量上都大幅落后。
具体感受:发现自己手下的年轻人用 AI 完成的工作量是自己的几倍。开始尝试用 AI,但用得别扭、不顺手、效率反而不如年轻下属。一些人选择拒绝,强调「判断不能交给 AI」。一些人选择硬学,但学得慢。一些人选择观望,等公司给出明确的方向。
他们的位置在过去 10 年是组织里最稳的位置,在未来 10 年可能是最不稳的位置。
47-55 岁的资深专业人员。工作 25 到 33 年,积累了相对厚的领域深度图式。可能是某个细分领域的专家、技术骨干、行业老兵。
相比前两组,他们有一个真正的资本:领域深度图式。这是 AI 接管不了的层次。但他们也有一个真正的劣势:和年轻人相比的 AI 使用差距。一个 50 岁的专家如果不会用 AI,他的领域深度图式仍然有价值,但产出的速度和广度被严重限制。一个 30 岁、用 AI 用得熟的人,可能在三年内就把这个差距追近到一个让人不舒服的距离。
他们距离退休还有 10 到 18 年。这个时间足够完成一次自我重构,但需要主动选择重构。多数人会选择「用我熟悉的方式工作完最后这些年」。这个选择本身没有错,但它意味着这一代人的领域深度图式在退休时大量带走,没有传递给下一代。这是组织在 AI 时代的隐性成本。
责任的归属
夹层世代不是被遗忘的人群。他们规模太大、声音太响、对劳动力市场和消费市场的影响太深,没有人会真的忘记他们。
但他们是被推卸的人群。
教育系统不为他们设计。他们已经离开教育系统,教育改革针对的是教育系统的下一代产出。
企业不为他们设计。企业的 AI 培训停在工具层,不解决认知重构。重新设计组织流程的成本和风险企业承担不起。
终身学习市场不为他们设计。它把焦虑产业化,把消费当作解决方案。
夹层世代的认知重构最终落在他们自己身上。
夹层世代的认知结构是为一个不再存在的世界准备的。这不是他们的错,但弥补这个错位的责任落在他们自己身上。
第五章 · 演化
七项机制本身不会消失。失格的是它们的训练方式,不是机制本身。
七项机制如何演化
| 机制 | 旧训练对象 | 新训练对象 |
|---|---|---|
| 知识 | 事实、定理 | 基础知识仍必须 + 知识边界判断 + 通过 AI 寻找知识 |
| 流体智力 | 不训练(先天) | 不训练(先天) |
| 调用频率 | 题库 + 基础逻辑 | 题库仍必须,但要保持结构性陌生 |
| 策略库 | 套路、图式 | 浅层:AI 工作流模板;深层:领域深度图式(不可教) |
| 基础自动化 | 识字、计算、阅读、规则、道德 | 旧基础仍必须 + AI 使用范式的自动化 |
| 认知去耦 | 数学物理证明(精英) | 题库激活仍是核心 + AI 协助 |
| 元认知 | 不训练 | 一般性元认知可大规模训练 + 领域内元认知是个人责任 |
把训练分成三类。
A 类:可标准化训练。基础知识、规则、道德、企业文化、AI 使用范式、AI 寻找知识、一般性元认知。规模化生产,覆盖所有人。
B 类:可标准化工具激活不可标准化能力。题库、基础逻辑训练。出题、评分、覆盖可标准化,但激活的能力(Gf 调用、认知去耦深层)不可标准化。
C 类:完全不可标准化。领域深度图式、领域内的元认知。必须个人完成,AI 是工具不是替代。
A 类和 B 类的训练对象在 AI 时代有了新的形态。教育系统和企业培训都需要按这个新形态重建。C 类的训练永远是个人责任,AI 让它的工具变了,但责任主体没变。
基础、AI 使用、AI 寻找知识、一般性元认知
知识训练的三件事。
第一是基础知识仍必须。一个不懂牛顿三定律的人,AI 告诉他什么他都信;一个不懂基本经济学概念的人,AI 给的政策分析他无法判断。这一层不能因为 AI 自带知识就放弃。
第二是知识的边界。什么情况下这个知识成立、什么情况下不适用。这是知识训练在 AI 时代的真正升级。旧训练教知识本身,新训练同时教知识的适用条件。一个知道「x 在 y 条件下成立」的人,比一个只记得「x」的人,在 AI 协助下能做出更好的判断。
第三是如何通过 AI 寻找知识。AI 之前知识检索是个人化技能(搜索引擎、文献数据库、专业资料库)。AI 之后这件事变成结构化的、可训练的能力。如何提问让 AI 给出有效信息、如何识别 AI 输出里的可疑部分、如何让 AI 帮你定位真正的原始来源、如何在 AI 帮助下处理跨领域的知识连接。这一层在 2026 年还是新东西,没有成熟教材,但它会是未来基础教育和职业培训的标配。
规则、道德、法律、企业文化。这一类训练在过去和现在都必须。员工手册、合规培训、企业文化、社会规则。它们决定 AI 工具能不能被合法、合伦理地使用。不展开。
AI 使用范式作为新基础自动化。把过去的「识字、计算、阅读到不需要专门动脑」扩展到「AI 调用、AI 工作流到不需要专门动脑」。借鉴日本范式:高频、泛用、可标准化、记忆加实训。需要练到自动化的具体内容:知道什么任务该交给 AI、什么不该;知道怎么对 AI 提问(结构化、上下文清楚、要求明确);知道怎么读 AI 的输出,立刻判断真假;知道怎么用 AI 校准判断而不是替代;一组针对常见任务的 AI 协作模板;一组反例(AI 在什么情况下犯错)。具体训练形式:大量重复练习、标准化模板、实训反馈、跨任务迁移。这一套不要求高 Gf,要求大量重复加规范化实训,类似日本职业培训的现场学习(OJT)。
一般性元认知可以训练。元认知不是不可教。它是分层的。
一般性元认知是可以大规模训练的:关于寻找知识的知识、关于问问题的知识、关于判断结构的知识、关于复盘方法的知识。这些都是可教、可练、可考核的。它们有明确的训练形态、明确的评估标准、明确的进步路径。
在 AI 之前这一层已经有人在教:信息素养课、批判性思维课、研究方法课。但覆盖面窄,因为依赖高水平的教师,而高水平的教师永远是稀缺资源。
AI 之后这件事变了。AI 既是训练工具,也是反馈工具。学完一般性元认知之后可以立刻用 AI 校准:你提的问题质量高不高、你的追问方向对不对、你的判断结构清楚不清楚,AI 都可以给即时反馈。这是 AI 之前不存在的训练循环。
领域内元认知仍然是个人责任。在具体领域里「我什么时候可能错」、在具体决策里「哪一步推理需要再检查」、在具体任务中「我哪里在自欺」。这一层依赖具体领域的真实任务和真实失败,无法脱离领域单独训练。
教材从 AI 来。传统教材几年一更新。AI 工具和范式几个月一更新。任何固化教材都会立刻滞后。解决方案:教材本身从 AI 来。用 AI 生成针对当前 AI 工具的实训内容、案例、反例、评估标准。让教材随 AI 自身演化实时更新。如果未来 AI 知识进展变慢(达到成熟期),可以回到固化教材。如果继续加速,必须保持教材实时更新。
AI 让高认知教师的覆盖面无限扩大。教育系统的存在前提是规模化。规模化要求三件事:可标准化、可考核、可低成本大量复制。
用人训练人有一个根本问题:人的水平呈正态分布。再好的师范训练也不能保证每个老师在中位以上。一个学生在三线城市遇到的语文老师,水平可能在全国分布的 50 百分位以下。这个学生的训练质量受老师水平的限制。
AI 改变这件事的方式,是帮助一个高认知教师以教材的形式无限扩大覆盖面。一个真正高水平的教师,在过去的覆盖范围是他能直接教的几个班、几百个学生。他的教学方法、判断力、对学科的理解,无法被复制到其他教师身上。他的认知是稀缺资源,受地理分布限制。
AI 之后这件事变了。一个高认知教师可以让 AI 学习他的教学方法、判断标准、出题思路,然后让 AI 把这套方法以教材、习题、反馈、答疑的形式覆盖到任何学生身上。AI 在这里不是默认水平的输出(默认水平 AI 是中位偏上),是这位教师认知的放大器。学生通过 AI 接触到的训练,可以远超过 AI 默认输出的水平。
这是 AI 时代教育的真正机会:第一次让最好的教师的覆盖面,从几百人扩展到几千万人。需要的是高认知教师愿意把自己的方法结构化让 AI 学习、有支持这件事的工具链、有让学生接触到这种「被高认知教师塑形过的 AI」的渠道。这件事当下还处于早期。但它的方向已经清楚。
题库仍然必须
题库的真正价值。前面立过判断:好题制造结构性陌生感,迫使学生进入 Gf 工作状态。题库失格也讲过:题型被充分模式化之后,训练退化为标准反应。
题库本身不失格,失格的是已经被模式化的题。一道结构陌生的好题,在 AI 时代仍然能激活 Gf 调用、认知去耦、策略库的深层。它是不可标准化能力的可标准化触发器。题库是「可标准化工具激活不可标准化能力」。出题、评分、覆盖可标准化,但被激活的能力(现场推理、深度图式)不能规模化。这是题库在 AI 时代仍然必须的本质原因。
中国体系的强项。中国基础教育在过去 30 年训练出了世界上最厚的题库训练体系。这个体系在 AI 时代不应该被否定,应该被重组。旧体系的问题:题库充分模式化后退化为标准反应训练。新体系的方向:保留题库训练机制,更新题库内容。让大量题保持结构性陌生。
用 AI 自动出题。具体方法:用 AI 生成新题。每年甚至每月更新题库结构。让题库迭代速度跟上学生刷题速度。让「刷过」成为一个不可能完全实现的状态。这件事在 AI 之前做不到。出题速度受人类教师限制,几年才能出一套有质量的题。在 AI 之后做得到,因为 AI 可以低成本大量生成新题。
但这里有一个护栏。关键不是题面新,是深层结构新。AI 很容易生成「题面新、结构旧」的伪陌生题。AI 出题必须叠加结构去重、难度标定、人工抽样复核三层,否则只是把旧题换皮。
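The three-layer guardrail above (structural deduplication, difficulty calibration, sampled human review) can be sketched as a simple filter. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes each generated problem has already been tagged upstream with a set of deep-structure features and a calibrated difficulty, which is the hard part and is not shown. The names `Problem`, `structure_tags`, `jaccard`, and the thresholds are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """A generated problem. All fields are hypothetical illustrations."""
    text: str
    structure_tags: frozenset  # deep-structure features, assumed tagged upstream
    difficulty: float          # calibrated difficulty in [0, 1], assumed given


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two tag sets; 1.0 means identical structure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_new_problems(candidates, bank, max_similarity=0.6,
                        difficulty_range=(0.4, 0.9)):
    """Layer 1: structural dedup. Layer 2: difficulty window (assumed values)."""
    accepted = []
    low, high = difficulty_range
    for cand in candidates:
        # Compare against the existing bank and against problems accepted so far.
        seen = list(bank) + accepted
        too_similar = any(
            jaccard(cand.structure_tags, old.structure_tags) > max_similarity
            for old in seen
        )
        if not too_similar and low <= cand.difficulty <= high:
            accepted.append(cand)
    return accepted


def sample_for_human_review(accepted, rate=0.1, seed=0):
    """Layer 3: draw a random sample of accepted problems for manual review."""
    if not accepted:
        return []
    k = max(1, round(len(accepted) * rate))
    return random.Random(seed).sample(accepted, k)
```

A reskinned problem, with new surface text but the same `structure_tags` as a problem already in the bank, is rejected at layer 1 however different its wording looks. That is the "new surface, old structure" case the guardrail is meant to catch.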
题库训练加 AI 自动出题等于中国体系的真正升级路径。这一升级保留中国题库训练的强项,同时去掉它的失格点。
考核方式的根本变化。题库本身的升级只是一半。另一半是考核方式。
旧考核方式:测试学生能否独立给出标准答案。这个考核方式在 AI 之后失效(标准答案 AI 都能给)。
新考核方式:考核学生如何通过 AI prompt 解决认知边界的问题。审计学生(或在职培训参与者)与 AI 的完整对话过程。具体审计的维度:他给 AI 的初始问题是什么;他读到 AI 第一轮回答后做了什么判断;他追问的方向是什么;他如何验证 AI 的输出;他在哪里识别出 AI 的错误或局限;最终的结论是怎么形成的。
这个考核方式测的是判断、决策、深度。也就是认知去耦和元认知的具体应用。它比答案更难被 AI 替代,因为它考核的是过程和判断链,不是答案和结果。
这套考核必须放在受控环境里:完整日志、随机追问、现场口辩、版本记录、限制外部工具、要求解释每一步判断转向。否则学生可以让 AI 生成一段看起来合理的 prompt 轨迹,过程本身也可以被伪造。
这不是从零开始的想象。当下已经出现 AI competency framework、AI Assessment Scale、AI literacy rubric 和 oral defense 的回潮。它们都在指向同一件事:AI 时代的考核不能只看最终答案,而必须看人如何使用 AI、约束 AI、质疑 AI、验证 AI。
但真正缺的,是把这些零散做法合成一套过程审计:审计学生给 AI 的初始问题、读到第一轮回答后的判断、追问方向、验证方式、错误识别、结论形成,以及他能否在现场解释每一次判断转向。
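The audit dimensions listed above can be written down as a single scoring record, so that every examiner rates the same things. This is a minimal sketch, assuming a 0 to 4 scale per dimension. The class name `ProcessAudit`, the dimension identifiers, and the scale are hypothetical illustrations, not an established rubric.

```python
from dataclasses import dataclass, field

# The audited dimensions named in the text, including the live explanation.
DIMENSIONS = (
    "initial_question",
    "judgment_after_first_answer",
    "follow_up_direction",
    "verification",
    "error_identification",
    "conclusion_formation",
    "live_explanation",
)


@dataclass
class ProcessAudit:
    """One examiner's scores for one learner's AI session (hypothetical schema)."""
    learner: str
    scores: dict = field(default_factory=dict)  # dimension -> score in 0..4

    def rate(self, dimension: str, score: int) -> None:
        """Record a score, rejecting unknown dimensions and out-of-range values."""
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 0 <= score <= 4:
            raise ValueError("score must be between 0 and 4")
        self.scores[dimension] = score

    def missing(self) -> list:
        """Dimensions the examiner has not scored yet."""
        return [d for d in DIMENSIONS if d not in self.scores]

    def total(self) -> int:
        """Sum of scores; refuses to total an incomplete audit."""
        if self.missing():
            raise ValueError(f"unscored dimensions: {self.missing()}")
        return sum(self.scores.values())
```

Refusing to total an incomplete audit is a deliberate choice in this sketch. It prevents a strong final conclusion from standing in for the process dimensions, which are the part that is supposed to be examined.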
中国基础教育和职业培训如果要做真正的升级,方向是:题库内容用 AI 实时迭代,考核方式从「独立给标准答案」改成「审计 AI 协作过程」。
基础逻辑训练。数学、形式逻辑、基本论证训练。这一层是认知去耦的基础。中国基础教育已经在这一层有大量投入。新方向是让训练形态保持「陌生」。让学生每次面对题时仍然处于「必须现场推理」的状态。奥数、形式逻辑、几何证明这些在过去被认为是精英训练的项目,在 AI 时代应该回到基础教育。它们不培养 Gf 上限,但激活 Gf 调用频率和认知去耦,AI 之后最稀缺的几层。
全球借鉴的可能。中国题库训练体系如果完成升级(题库迭代加考核方式变化),可以为其他国家提供一个真正的参照。日本、北欧基础教育在 Gf 调用频率和认知去耦激活上的训练强度不如中国。美国头部有,中位以下没有。如果这个升级成功,中国基础教育第一次有可能成为全球可借鉴的模型。前提是要主动做这次升级,而不是等改革被动到来。
企业 knowhow 蒸馏
训的失格没有现成改革路径,因为没有「AI 时代的训」作为现成参照。
当下企业在做什么。 很多领先企业已经在做工作流和岗位 knowhow 的蒸馏。它们把工作流程、岗位职责、内部决策路径、专家经验、领域 know-how,用 AI 帮助文档化、结构化和可调用化。具体形态包括:通过 process mining / task mining 捕捉真实工作流;把内部 wiki、邮件、会议纪要、项目文档、决策记录接入企业 AI;让 AI 从案例库里总结「什么情况下用什么方案」;把客服、销售、IT、审计、运营等高频场景里的处理路径提炼成可复用规则;在部分场景里,甚至记录专家或高级员工的实际操作过程,把隐性的流程和判断显性化。
这件事在 2024 到 2026 年已经从实验进入领先公司的平台化部署。它还没有在所有企业里成熟,但方向已经非常清楚:企业正在把原本散落在员工、流程、文档和系统里的 know-how,转成 AI 可以调用的组织上下文。
当下的目标定位仍然偏窄。 主流企业叙事里,这件事首先被定位成两类目标。第一是效率提升:让员工用 AI 更快完成现有工作,把蒸馏出来的知识库当作助手、搜索入口、agent 上下文或自动化材料。第二是 AI 使用管理:衡量员工是不是在用 AI、用得多深、采用率和 ROI 是否提升。
这些目标都成立,但它们仍然是当下视角的优化。更深的一层,是企业还没有充分把这些被蒸馏出来的工作流和 know-how,当作未来培训体系的底座。真正重要的不是「让今天的员工更快完成工作」,而是:当岗位、流程和决策链都被 AI 重写之后,企业如何把组织里最稀缺的判断、路径、经验和隐性规则,重新变成新人可以学习、老人可以迁移、团队可以复用的训练材料。
这件事的真正意义,不在知识库,而在训练源。
回到前面的论点:AI 让高认知教师的覆盖面扩大。在企业里,「高认知教师」不一定是老师,而是公司里最好的中层、最资深的专家、积累了最多 know-how 的老员工。他们的判断方法、经验、决策标准,过去只能通过师徒制、长期共事,以一对一或小范围方式传递。这是日本企业内训为什么有效的根本原因:暗默知通过长期共事沉淀;也是为什么日本模式不能规模化:一对一传递覆盖率太低。
蒸馏到 AI 之后,这件事变了。一个新员工通过 AI 接触到的,不再只是 AI 的默认输出,而是被这家公司最好的人、最有效的流程、最长期积累的案例塑形过的输出。一个中层通过 AI 学到的,也不只是通用管理知识,而是这家公司特有的判断框架、决策标准、流程逻辑。
这套机制把 AI 从效率工具推进成训练工具:AI 默认只能提供通用中位输出,但企业 know-how 的蒸馏,能把它校准到组织内部的高质量经验上。原来这个论证讲的是 AI 协助员工工作;现在还可以讲 AI 协助员工被训练。
真正的企业培训变革。企业培训的真正变革有三件事。
第一,把 knowhow 蒸馏作为培训系统的核心,不只是当下效率工具。蒸馏出的内容要按「用 AI 培训新员工」的标准设计:结构化到能让 AI 理解、能让 AI 用来出题、能让 AI 用来评估学习者。
第二,培训方式从课堂讲授转向 AI 协助实训。新员工进入岗位后直接进入「被 AI 协助完成真实任务」的状态,跳过一周课堂培训。AI 基于公司蒸馏的 knowhow 来辅导他、纠正他、给他出题、审计他的协作过程。这一套机制让新员工在三个月里达到旧体系下两年才能达到的水平。
第三,考核方式从「考结果」改为「考 AI 协作过程」。和前面讲的考核改革对应。审计新员工与 AI 的对话过程:他的问题质量、追问方向、判断深度、识别 AI 错误的能力、最终结论的形成过程。这些维度直接反映他的真实判断力。
当下绝大多数企业还在哪里。绝大多数企业还停在「教员工用 AI 工具」的层次。少数领先公司已经在做 knowhow 蒸馏,但目标定位仍然是当下效率而不是未来培训。把蒸馏作为培训基础的企业,2026 年此刻几乎没有。
这件事会发生,因为逻辑上不可避免。但它的速度受三个因素限制。
第一,蒸馏出的 knowhow 质量取决于公司原本的文档化水平。一个内部知识库混乱的公司,蒸馏出的 AI 训练材料也混乱。
第二,企业管理层是否意识到这件事是培训基础。多数企业还在把它当效率工具。
第三,员工是否愿意配合。一个高级员工如果担心自己的 know-how 被 AI 学走后自己被替代,就不会真正配合蒸馏。
这三个因素决定了企业培训变革的速度。它会发生,但比技术能做到的速度慢得多。在企业完成这次升级之前,夹层世代仍然要靠自己重构。
不可标准化层:领域深度图式与领域内元认知
C 类训练永远不能规模化。前面论证过:领域深度图式需要长期真实任务和真实失败的反复反馈;领域内元认知没法考核,发生在每个人自己的判断过程里。这两层永远是个人责任。
AI 协助下的自我训练形态。接 ISSUE 01 和 ISSUE 02 已经讲过的几个动作:先有可被反驳的初步判断;用 AI 找反例、做审阅、拆变量;在反驳中修正自己的判断模型;把经验抽象成可结构化的规则;把学习接回真实任务。
针对每一项机制的自我训练。
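Gf and its activation frequency: each day, set yourself a few small tasks that "can only be solved by reasoning." Don't reach for AI out of habit; think for five minutes on your own first. AI's role here is to supply counterexamples. This does more than mechanically raise how often Gf is activated: repeatedly pushing Gf into use also brings you steadily closer to your Gf ceiling.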
认知去耦的深度:当你对具体问题有判断时,让 AI 帮你「把这个判断的核心结构抽象出来」,看这个抽象结构是否在其他领域适用。这是 ISSUE 01 讲过的「跨域同构」训练。
领域内元认知:每次重要判断之后,让 AI 帮你做复盘。这个判断的关键假设是什么、哪些证据支持、哪些反对、如果证据变化判断会怎么变。
领域深度图式:依赖真实任务和真实失败。AI 不能替代失败本身,但可以加速反思。每次完成一个真实任务后,让 AI 帮你梳理「如果重做一次,哪一步会变」。
AI 是校准工具,不是替代工具。具体感受:你对一个问题有判断,让 AI 反驳。AI 给出反驳后,你判断这个反驳是否成立。如果成立,修正你的判断;如果不成立,论证为什么。这个循环每一次都在训练领域内元认知。成本:AI 的 token 成本极低,但你的时间成本不低。每次复盘需要 30 分钟到几个小时。这件事不能被规模化。也没必要被规模化。它是个人和 AI 之间的一次次具体对话,是个人自己的认知重构过程。
谁能完成
一个具体的基础画像:不刷题但能考七八十分的人。每个人都见过这种人。中学和大学时不刷题、上课不听讲、记得有限的题型公式和解法,但考试现场能拿到七八十分。
他们的认知机制有三件事。
Gf 上限较高。同样一道结构陌生的题,他不需要见过同类题目,能现场推理出来。
少而精的策略库。他记得的「几个题型公式和解法」是已经被深度内化的核心结构,不是浅层套路。他用的是领域深度图式:知道什么时候该用什么、什么时候该绕开。
领域内元认知较强。他知道自己什么时候在自欺、知道哪一步推理需要再检查、知道考试时间该怎么分配。
这三件事,恰好是 AI 时代被单独定价的能力。这种人在 AI 时代的处境,会比靠刷题拿到同样分数的人好得多。
但有两个边界。
第一个边界:从考试场景迁移到真实工作场景不是自动的。考试场景结构化、有标准答案;真实工作非结构化、需要自己定义评价指标。有些人在考试里靠 Gf 拿高分,到了真实工作里仍然停留在「等任务下达」的状态。
第二个边界:这种人在分数顶部的占比仍然很小。绝大多数高考状元、保送生,是高强度刷题加高 Gf 的组合。「少刷题但拿高分」在统计上仍然是少数。
更精确的判断:在 AI 时代,「少刷题加中高分」的认知模式的市场价值,会显著高于「高强度刷题加高分」组合的市场价值。前者的认知模式接近 AI 时代被单独定价的形态,后者刚好是 AI 接管的部分。但前者要在职业生涯里真正脱颖而出,仍然需要在离开考试体系之后主动把这种认知模式迁移到真实任务上。这件事不会自动发生。
门槛不在年龄、学历、行业。夹层世代里能完成自我重构的人和不能完成的人之间,差别不在年龄、不在学历、不在所处行业。一个 50 岁的工程师可能完成,一个 30 岁的产品经理可能完成不了。
门槛在五件事。
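- A. Can give a preliminary judgment that is open to refutation
- B. Can revise the model after being refuted
- C. Can abstract experience into rules
- D. Can use AI to find counterexamples, run reviews, and do post-mortems
- E. Can connect learning back to real tasks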
每一件都不难。加在一起需要长期的坚持和真实的痛苦。
「不刷题但能考七八十分」的人,天然具备 A、B、C 三项中的一些。但 D、E 需要主动养成。AI 工具是新的,把学习接回真实任务是每个人的主动选择。
一个相对积极的判断。夹层世代里能完成不可标准化层(C 类)自我重构的人是少数。这件事必然如此,因为这一层的训练永远不能规模化。
但 A 类和 B 类的训练在 AI 时代有能力让覆盖率高、质量稳定。这意味着:只要 A 类和 B 类的 AI 转向真正完成,大量「基础可用 + 有基本元认知」的人会被生产出来。他们可能没有完成深度自我重构,但能完成基础的 AI 使用、能在 AI 协助下达到中位以上的工作输出、能保留基础的判断能力。这些人在劳动力市场上仍然有位置。
前提是 A 类和 B 类的转向真正发生。AI 使用范式作为新基础自动化、AI 寻找知识、一般性元认知训练、题库加 AI 自动出题加 AI 协作过程审计、AI 让高认知教师覆盖面扩大、企业 knowhow 蒸馏。这些转向在 2026 年都在早期,还没有成熟的实践,但路径是清楚的。
如果这些转向成功,AI 时代不会出现「大多数人活不下去」的局面。出现的会是分层:完成深度重构的少数人在头部,完成基础转向的大量人在中位以上稳定。
前提是 AGI 没有实现。如果 AGI 在 5 到 10 年内实现,所有这些路径都需要重写。AGI 的讨论不在这里展开。
教育的滞后与这件事的尺度
教育系统的设计本质上是滞后产物。它训练出来的人要在 5 到 10 年后进入劳动力市场,必须假设劳动力市场在这 5 到 10 年里相对稳定。
这个假设在过去 100 年大体成立。AI 时代第一次彻底打破。AI 改写工作的速度按月算,任何一个 5 到 10 年的教育课程,在毕业生进入劳动力市场时已经过时。
每一波改革到来时,都已经是上一个时代的方案。
对夹层世代的具体含义:不能等。
不能等政府的改革(10 年周期,不针对在职成年人)。不能等企业的培训(停在工具层)。不能等市场的产品(焦虑产业化)。
唯一可以现在开始的是 AI 协助下的自我训练。这件事不依赖任何外部体系准备好。
回到序章。杨坚要的是可被使用的官员。腓特烈要的是可被使用的国民。Ford 要的是可被使用的工人。每一次教育和培训的建立,都是在回答「这个人能否被使用」。
2026 年此刻。一个 35 岁的人坐在工位上,看着 AI agent 完成他过去十年熟练做的事。他知道自己当年被训练的不再是市场单独定价的能力。
这一代人没有被给一份新地图。他们当年被给的地图,描述的是一个已经不存在的地形。
教与训不是失去意义。是失去旧资格。
旧资格里有几样东西仍然必须:基础知识、规则、道德、基础逻辑、题库激活。也有几样东西必须新建:AI 使用范式、AI 寻找知识、一般性元认知训练、AI 协作过程的审计、AI 协助下的自我训练、企业 knowhow 蒸馏。
这次重建会发生。但它的速度比技术能做到的速度慢得多,比夹层世代承受得起的速度更慢。
今天先把今天的事做好。明天的事,明天的人去做。
引用与出处
本文涉及的主要事实性断言的来源。
关于序章三个历史时点
- 杨坚(隋文帝)开皇七年(587 年)下诏命各州「岁贡三人」应考「秀才」。《隋书·高祖纪》。学界对「科举开端」具体年份长期有争议,部分学者认为科举正式确立于隋炀帝大业年间(605 年起)设进士科。本文取 587 年「岁贡三人」作为分科取士的雏形与常见叙事
- 腓特烈二世(Frederick the Great)1763 年 8 月 12 日颁布《Generallandschulreglement》(一般地方学校规程),是普鲁士国家义务教育制度化的关键文本:James Van Horn Melton, Absolutism and the Eighteenth-Century Origins of Compulsory Schooling in Prussia and Austria (Cambridge University Press, 1988)。需要说明:普鲁士 1717 年腓特烈·威廉一世已颁布过早期义务教育敕令;马萨诸塞湾殖民地 1642 年也有更早的强制教育法规。1763 年规程不是「世界第一部国家级强制义务教育法」,而是国家义务教育制度化早期的标志性文本之一
- Henry Ford 1913 年 Highland Park 移动装配线把汽车装配时间从 12.5 小时压缩到 93 分钟;1914 年 1 月宣布 5 美元/日工资政策;1914 年成立 Sociological Department(50 名调查员上门访问工人家庭);Ford English School 教授英语和美国公民常识。来源:The Henry Ford 数字档案(Sociological Department & English School 集合);Stephen Meyer III, The Five Dollar Day: Labor Management and Social Control in the Ford Motor Company 1908–1921 (SUNY Press, 1981)。Ford English School 创立年份在不同档案中有 1913 与 1914 两种标注,本文取「1913 年装配线建立同期 + 1914 年 Sociological Department 扩展」的整合叙事
关于七项认知机制(第一章)
- 流体智力(Gf)与晶体智力(Gc)二分:Raymond B. Cattell, “Theory of Fluid and Crystallized Intelligence: A Critical Experiment,” Journal of Educational Psychology 54 (1963): 1-22;及 Cattell-Horn-Carroll 智力理论的后续发展
- 暗默知(あんもくち / tacit knowledge):Michael Polanyi, The Tacit Dimension (Routledge, 1966) 是源头;野中郁次郎(Ikujiro Nonaka)在企业知识管理语境下的应用见 Nonaka & Takeuchi, The Knowledge-Creating Company (Oxford University Press, 1995)
- 「认知去耦」概念框架出自 Offbook Press ISSUE 01《On Cognitive Decoupling》。「调用频率」「策略库」「基础自动化」「元认知」是本文为整合教育心理学相关概念而组织的工作框架,不完全对应单一学术理论
关于国际教育评估数据(第一章)
- PISA 2022:OECD (2023), PISA 2022 Results (Volume I): The State of Learning and Equity in Education, OECD Publishing, Paris, https://doi.org/10.1787/53f23881-en。中国大陆未参加 2022 年评估。文中中国 B-S-J-Z(北京、上海、江苏、浙江四地)数学 591、阅读 555、科学 590 的数据来自 PISA 2018:OECD (2019), PISA 2018 Results
- PISA 2022 创造性思维评估:OECD (2024), PISA 2022 Results (Volume III): Creative Minds, Creative Schools
- PIAAC 2023(OECD 成人技能调查第二轮):OECD (2024), Do Adults Have the Skills They Need to Thrive in a Changing World? Survey of Adult Skills 2023。韩国 literacy 比 2012 年下降 23 分;numeracy 下降;https://www.oecd.org/en/publications/survey-of-adults-skills-2023-country-notes_ab4f6b8c-en/korea-republic-of_5f95963c-en.html
- TIMSS Video Study 1995 / 1999:James W. Stigler & James Hiebert, The Teaching Gap: Best Ideas from the World’s Teachers for Improving Education in the Classroom (Free Press, 1999);James Hiebert et al., Teaching Mathematics in Seven Countries: Results from the TIMSS 1999 Video Study (NCES, 2003)。1995 年覆盖德国、日本、美国;1999 年扩展到 7 个体系:澳大利亚、捷克、香港、日本、荷兰、瑞士、美国(不含德国)。「shallow teaching syndrome」出自 1999 年澳大利亚国家报告:Hilary Hollingsworth, Jan Lokan & Barry McCrae, Teaching Mathematics in Australia: Results from the TIMSS 1999 Video Study (ACER, 2003)
- NAEP 2024:U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, 2024 NAEP Reading & Mathematics Assessment Results,https://www.nationsreportcard.gov/。4 年级阅读 10 / 90 百分位差 107 分
关于四国教育与培训改革(第一章、第四章)
- 中国「双减」政策:2021 年 7 月中共中央办公厅、国务院办公厅《关于进一步减轻义务教育阶段学生作业负担和校外培训负担的意见》
- 中国《教育强国建设规划纲要(2024–2035 年)》:2025 年 1 月中共中央、国务院印发
- 中国《「人工智能+教育」行动计划》:2026 年 4 月教育部、国家发展改革委、工业和信息化部、科技部、国家数据局五部门联合印发,教科信〔2026〕1 号
- 上海 4–7 年级 AI 必修课、黄浦区登峰计划:上海市教育委员会、黄浦区教育局公开文件
- 美国联邦 AI 教育行政令:“Advancing Artificial Intelligence Education for American Youth”(The White House, 2025 年 4 月)
- 日本 GIGA School 计划(2020–2024 完成全国中小学数字设备配置):MEXT(日本文部科学省)
- 日本 MEXT 2024 年 12 月《初中等教育阶段使用生成式 AI 指南》:MEXT
- 日本 AI Education Accelerator Program:通过与产业方(包括 SoftBank Robotics 等)合作扩大教师 AI 培训。该计划具体规模数字(如「训练约 5 万名教师」)目前仅见于二级来源(行业咨询博客与媒体报道),未在 MEXT 官方文件中直接核实,故文中以定性表述为主
- 韩国 AI 数字教科书(AI Digital Textbook,AIDT):2025 年 3 月在部分学科分阶段实施,因效果与教师反馈问题在 2025 年内被国会立法降级为「教育资料」(非法定教科书),并大幅缩减预算与覆盖范围。来源:韩国教育部公告与韩国国会相关立法记录
关于高考数学题认知机制分析(第二章)
- 笔者团队对 2021–2025 年高考数学真题的认知机制分析。样本:567 道题,覆盖上海卷、北京卷、浙江卷、全国甲卷理、全国乙卷理、新高考Ⅰ卷、新高考Ⅱ卷 7 种独立卷型。每题在 v2 校准版分类器下输出 verdict(GOOD / MIXED / BAD)、primary_mechanism(七项认知机制)、crammer_gap(刷题红利)等维度。数据资产:scored.jsonl(567 行)+ findings.md(项目结论文档)
关于 AI 与劳动力市场(第三章、第五章)
- 「AI 大幅提升效率的同时把质量保证在中位以上」的判断出自 Offbook Press ISSUE 02《Rebuilding Learning》和 ISSUE 03《Breakdown of Firms》中关于 AI 在工作产出层的影响分析
与 Offbook Press 此前各期的概念延续
- 「认知去耦」「反刍消费」「跨域同构」出自 ISSUE 01《On Cognitive Decoupling》
- 「AI 审阅循环」「AI 大幅提升效率的同时把质量保证在中位以上」出自 ISSUE 02《Rebuilding Learning》
- 「组织瓦解」「AI 在大型组织内落地的阻力」「knowhow 蒸馏」的早期讨论出自 ISSUE 03《Breakdown of Firms》
- 「形式可以脱离实质独立产生」「作者性」训练出自 ISSUE 04《Mirage of Form》
本文对上述来源做了概念上的转述而非直接引用。如需核查原文表述,请参考原始链接与出版物。
ISSUE 05 完 · Offbook Press · 2026
Complete English translation.
Preface · Eleven Hundred Years
Chang’an, 587 CE.
Yang Jian is paging through a list.
The seventh year of Kaihuang. Officials need filling. Most of these names he doesn’t know; each carries the name of a recommender. The recommenders he knows. The Three Departments’ clerks, the prefectural Rectifiers — for decades they’ve cycled through, the same handful of clans.
Cui. Lu. Wang. Xie.
These names appear on the dispatches he signs, and on the dispatches his grandfather signed, and on dispatches sent by emperors well before that. Northern Zhou is gone. Northern Qi is gone. Liang is gone. Chen is about to fall. These names have not.
He has unified the north and is about to unify the south. From his generation forward, this land will no longer be the north or the south of one dynasty or another. It will be Sui. But every appointment he intends to issue must first pass under these names.
The Nine-Rank System has run for more than three centuries. Rectifiers grade local men into nine ranks; the central court appoints by grade. On the surface it picks talent; underneath it counts pedigree. The emperor sits in Chang’an, yet every appointment he sends out has already passed through the brushes of a few dozen clans in the provinces.
Yang Jian wants this country to obey him.
He wants a method. No Rectifiers. No recommendations. No clans passing things along. Anyone, if they can pass a public standard, gets identified directly.
In the seventh year of Kaihuang, the court issues an edict: examination by category.
A line begins to show its shape from here.
For thirteen hundred years afterward, the details would be revised over and over. Subjects changed. Content changed. The palace examination appeared. Anonymous transcripts appeared. The eight-legged essay appeared. Dynasties turned over, again and again. What stayed unchanged underneath was one thing:
Let a man seated at the centre pick, out of ten million households, the small handful that can be used; and let that picking pass through no pedigree, no recommendation, no intermediary.
The exam was not designed to give scholars a path. Giving scholars a path was a by-product.
Eleven hundred years pass.
Potsdam, 1763.
Frederick II, seated at Sanssouci, signs a decree.
The Seven Years’ War ended half a year ago. Prussia won. The peace is a winner’s peace, but this winner is nearly hollow. Half a million dead in battle and famine. Eastern villages emptied wholesale. Tax revenue close to dry. What carried him through the war was a standing army trained by the state and an industrial apparatus directed by the state.
With the war over, he knows the next will come. He needs a particular kind of person — one who can shuttle between factories and barracks, who can be slotted into a position inside standardised processes.
This kind of person does not appear on his own.
Children in the countryside cannot read, cannot do arithmetic, cannot keep time, cannot accept abstract orders. They answer to their fathers, to their clans, to the parish priest. The state is not on their list. To get the state inside them, the state has to intervene earlier.
On 12 August, Frederick signs the Generallandschulreglement — the General School Regulations.
Children aged 5 to 13 must attend. The state prescribes the curriculum.
It is the foundational text of Prussian state-mandated schooling, and a key reference for later national-education models across Europe.
Compulsory education trains four things.
Literacy. Arithmetic. Punctuality. Obedience to graded authority.
Literacy makes orders legible. Arithmetic makes basic tasks in factories and armies possible. Punctuality makes mass-scale coordination possible. Obedience makes a bureaucracy able to assign each man his slot.
The process of producing a child into what the state needs can now begin before the child knows what the state is.
Another hundred and fifty years pass.
Highland Park, Michigan. 1913.
Henry Ford is looking at a report.
It says that this year, to maintain thirteen thousand positions, the Ford plant hired fifty-two thousand people. Turnover: 380%.
Mass production of the Model T began in 1908. Earlier this year, he pushed the moving assembly line to full load for the first time. Chassis assembly time dropped from 12.5 hours to 93 minutes. A car got assembled in less time than it took a worker to finish lunch.
But the plant has a problem.
Workers don’t come. Those who come don’t stay. A machine sitting idle while it waits for someone who can turn screws costs more than the screw-turning itself.
The reason is clear. Most of the floor is recent immigrants. Poles, Italians, Greeks, Eastern European Jews, Mexicans. They cannot follow the foreman’s English. They cannot read the safety signs. They cannot remember which step their badge number maps to. They were used to one tempo back home; the factory needs another. The gap between a peasant arriving from the countryside and a man on the assembly line is something compulsory education does not bridge. Compulsory education gave him the four basics: literacy, arithmetic, punctuality, obedience. What the plant needs is for him to stand at his assigned station and, on a 93-minute beat, repeat one specific motion calibrated to the second. No existing system teaches that.
That year, Ford founded the Ford English School. Within a year, several thousand immigrant workers were attending classes inside the plant and after hours. The curriculum included English, American civics, personal hygiene, how to use a bank, how a household in America budgets. The next year it expanded into the Sociological Department: fifty investigators visited workers’ homes to assess whether they were “ready to be a Ford worker.” Those who passed earned twice the industry rate — five dollars a day.
It is one of the earliest large-scale corporate adult-training systems documented in the archives — paid for, designed, and run by the company itself.
The problem it tackles is not whether workers can read. Compulsory education has already answered that. The problem is the next one:
Once a person has been filtered through compulsory schooling and identified by the state as a usable object, how do you produce him further into an execution unit that can drop, immediately, into one specific position.
For a century afterward, the form would change repeatedly. Apprenticeships became on-the-job training, OJT became corporate universities, corporate universities became employee development systems, and those became today’s leadership programs, role certifications, internal business schools. The thing underneath did not change: take people who have passed the universal filter and produce them, further, into people a specific organisation can use directly.
The imperial exam answers a selection problem: given ten million households, how do you identify the usable objects.
Prussian compulsory schooling answers a production problem: how do you turn ten million households into identifiable objects in the first place.
Ford’s training apparatus answers a fitting problem: how do you take an already-identified object and machine him further into someone a specific organisation can use at a specific position right now.
Stack the three together and you get the full structure of modern teaching and training. First produce people into identifiable objects. Then filter the small subset that is usable. Then machine the usable into units that drop straight into a specific role.
After the twentieth century, every industrialised country runs all three machines at once. Compulsory schooling, exams and credentials, corporate training. The shapes vary. School-leaving ages vary, exam formats vary, the selection points vary, the language varies. Corporate training varies too — apprenticeships, onboarding, OJT, internal business schools, leadership programs, external MBAs. But literacy and arithmetic, mass-scale classrooms, age-graded cohorts, subject exams, credential ladders — plus role descriptions, SOPs, training manuals, performance reviews, promotion tracks — that whole understructure holds in every modern state and every modern organisation.
The point isn’t new in the academic literature. Bourdieu and Passeron’s Reproduction (1970), Bowles and Gintis’s Schooling in Capitalist America (1976), Andy Green’s Education and State Formation (1990) — half a century pointing at the same thing: the school’s actual function is to produce, for the existing state and economy, people who can be used.
Teaching and training together form a human-capital production system designed for the purpose of using people.
At its design moment, the system does not assume responsibility for whether a person lives well. The question it answers is:
Can this person be used.
Chapter 1 · The Current State
Yang Jian, Frederick, Ford: three points already in the past. Three problems, not in the past: who can be filtered out, who can be produced into a usable person, who can be fitted to a specific position. The system has run for more than a hundred and ten years since Highland Park in 1913. In April 2026 it is still running, but its shape is starting to deform.
Seven Cognitive Mechanisms
“Education trains people” sounds like one thing. Pulled apart, it is seven. Each one has different plasticity, a different fate after AI, and a different training method.
First is knowledge (crystallised intelligence, Gc). The facts, theorems, vocabulary, and case stock a person stores in his head. Reading, classes, problem sets, accumulated experience — most of the thickness lives here. This layer can accumulate without limit. After AI its scarcity has fallen close to zero; AI’s fact base outstrips any single person’s. But having read it, remembered it, understood it still has a use: when AI gives an answer, you can decide right away whether it’s true.
Second is fluid intelligence (Gf). The on-the-spot reasoning ability that, given a situation you’ve never seen, picks out structure, finds an entry point, and constructs a path. It does not lean on knowledge; it leans on whether a person can fire up thought in a new setting at all. This is hard to lift directly through training. Decades of psychology converge on the same conclusion: education and brain training do not reliably raise the Gf ceiling in a way that transfers to distant tasks. What education can do is not raise the ceiling out of thin air; it is to make the next three layers solid.
Third is activation frequency. Even a person with a high Gf ceiling, if he doesn’t use it for a few years, leaves it idle. Gf, with no external pressure, does not start itself. The number of times a person is pushed in childhood and adolescence to the “have to use it” edge determines how much of his reasoning he can actually mobilise as an adult. This is where problem-bank training really earns its keep. Good problems manufacture structural unfamiliarity and force the student to model on the spot; bad problems train standard responses, and the more you grind them the less you think. After AI this layer does not depreciate, because AI has no continuous subject that can be trained — it cannot maintain your thinking-state for you.
Fourth is the strategy repertoire. Every solution painfully reasoned out the first time can be called up directly the next time the same shape appears. Chess has its formations, physics its tricks, engineering its feel. The gap in professional problem-solving is mostly not in Gf; it is in the depth of the repertoire. After AI this layer splits in two. The shallow, template-shaped strategies AI has all of, faster and more accurately than you. The deeper layer — judgments tied to specific domains (“this machine sounds wrong from the moment it starts up”) — AI cannot pick up, because it has no on-site sense.
Fifth is foundational automation. Reading, calculating, processing text, writing, all trained to the point where they no longer require dedicated thought. This layer determines how much cognitive capacity is left over when you do high-level work. Japanese basic education invests most heavily here; Japan and the Nordic countries top PIAAC adult assessments precisely because of this. After AI the role of this layer changes: it is no longer producing for you, it is letting you decide right away that “this looks wrong” when AI hands over an answer. A person without solid foundations using AI is outsourcing judgment to a system that cannot bear responsibility.
Sixth is cognitive decoupling. Stripping abstract structure out of concrete context and operating on it as a formal object. Proving a geometry theorem, pushing an algebraic step, debugging a program — they are doing the same thing. It is closely related to Gf but narrower, more specialised, more effortful. ISSUE 01 placed it at the centre of “the real divide in the age of AI.” After AI this layer is the scarcest. AI itself is a product of cognitive decoupling — language stripped into probabilities, reasoning stripped into parameters — but it cannot do the decoupling on a specific person’s behalf. AI can hand over the result; the act of decoupling has to be done by a specific person.
Seventh is metacognition. “Where do I not understand”, “under what conditions does this method hold”, “did I just skip a step too fast.” It is a habit ground in over years through real tasks and real feedback. The old education barely trains it. Exams don’t test it, teachers can’t grade it; it happens inside the student’s own head. After AI this layer is extremely scarce, because faced with AI’s output, “should I trust this right now” is the judgment AI cannot do for you.
Lay the seven side by side and a faint pattern shows up: the layers the old education invested most in (knowledge, the shallow part of activation frequency, the shallow part of the repertoire, foundational automation) are precisely the ones starting to depreciate after AI; the two layers it barely invested in at all (cognitive decoupling and metacognition) are precisely the ones AI has made scarcest.
Teaching and Training
The Preface introduced these two words.
Teaching is school education, from primary school to university. It does two things: sort people into ability tiers, and shape each tier into an identifiable object. Credentials, scores, certificates, school rankings — these are its outputs.
Training is what happens at work: apprenticeships, on-the-job mentoring, corporate training, master-apprentice pairs, industry certifications. It does one thing: take people who already have credentials and shape them into people who can drop, immediately, into a specific role. Role experience, process fluency, reporting skill, collaboration habits — these are its outputs.
Teaching produces sortable people. Training produces embeddable people. Both held in the past, because the world really did need large numbers of sortable, trainable, embeddable people.
Disqualified
Begin by separating “disqualified” from “failed”. A failure is something done badly. Disqualification is something whose qualification no longer holds. A thing can be done very well and still be disqualified.
The old education and the old training are not without use. They were once usable, because knowledge, experience, fluency, and formal compliance were genuinely scarce in the past. A person who could write a passable essay was, more often than not, someone who genuinely understood. A person who could write working code was, more often than not, someone with engineering chops. A person who could deliver consistently was, more often than not, fit for the role. The form indirectly testified to the substance.
What AI has changed is that proof relation. The form still works, still delivers, still gets people to feel “this is well done.” But it no longer reliably testifies that a corresponding subject stands behind it. ISSUE 04 called this “form being able to be produced independently of substance”.
Teaching and training are disqualified at exactly this point: the “qualified” they used to produce indirectly testified to actual qualification; today it doesn’t.
Four Reference Systems
Comparing national education systems is dangerous work. It falls too easily into national caricatures: East Asia drills, the Nordics free, America innovates, Japan disciplines. Each label is half right, and none of them can be used as an argument.
This essay picks China, the United States, Japan, and the Nordics as four reference systems, because between them they cover almost all of the possible shapes “teaching and training” took before AI. Each system is a complete, internally coherent design for producing people into “usable objects,” answering one specific historical problem. Plotted against the seven mechanisms above, each invests in a different direction.
China’s core is filtering by problem bank. From the Sui imperial exam to the contemporary gaokao (the make-or-break Chinese university entrance exam), to the recent wave of “top-tier innovative talent” programs in the Yangtze Delta and Beijing, the logic has not moved: through large-scale, tiered, feedback-rich standardised problems, push students repeatedly against the edge of their existing ability, trigger on-the-spot reasoning, and rank the results into a stable order. Its strength is high-density Gf activation and stable ranking. Of two people with the same ceiling, the one who has been pushed against the edge again and again will get more done than the one who never has. Its weakness is that once problem types have been exhaustively patternised, training degenerates from “triggering judgment” to “training standard responses.” Of the seven mechanisms, China’s thickness sits in knowledge, activation frequency, and the strategy repertoire.
Japan’s core is foundation and embedding. From the moment Meiji reform imported the Prussian model, the system trained literacy, calculation, reading, attention, and rule-awareness to extraordinary thickness — thick enough that by the time a student enters work, those motions are essentially automatic. Then, inside firms, on-the-job mentoring, rotation, and lifetime employment shape him into someone who can carry a specific function inside organisational processes for the long run.
The Japanese management scholar Ikujiro Nonaka summarised this kind of capacity as anmokuchi — tacit knowledge — that cannot be encoded in writing and only deposits over years on the post. Its strength is that foundational automation runs deepest: in PIAAC, Japan stays in the top tier because Japanese adults’ basic cognitive abilities don’t depreciate badly with age. Its weakness is that mass-scale social retraining is inflexible and cross-industry mobility is hard. Of the seven, Japanese thickness sits in foundational automation and knowledge.
The United States runs on a different logic: don’t try to push everyone, concentrate resources at the top. Top private schools, research universities, AP classes, honours classes, magnet schools, undergraduate research — that pipeline builds a chain of high-intensity cognitive environments, lets people who are already high-Gf get identified through open-ended tasks, and pours resources to amplify them. It is willing to accept that the median is unimpressive and the bottom is bleak, in exchange for the right-tail explosion at the top.
The intensity of cognitive-decoupling and metacognition training at its top end is unmatched anywhere else. To this day, the overwhelming majority of the world’s top AI researchers were trained in the American top tier. The cost is that almost nothing is invested below the median. By 2026 the picture is more extreme than that description suggests: the top and the bottom have already split into two different countries. The NAEP 2024 numbers below will lay this out.
The Nordics (with Finland as the case) take a fourth path: low pressure, average, long. They don’t push the student. They lean on small school-to-school variance, high teacher autonomy, and stable foundational training to bring the vast majority up to a respectable floor. Then the real weight goes onto post-school life: adult retraining, lifelong-learning entry points, and a learning-rights regime co-funded by unions, employers, and the state. A person at 30, 40, 50 is still learning new things. The PIAAC top tier is Finland, Sweden, Norway, the Netherlands, and Japan; the Nordic entries are there for this reason: their adults’ skills don’t depreciate. The strength is long-term maintenance of the population-wide floor; the weakness is a lack of concentrated training in deep elite schemas. Of the seven mechanisms, the Nordic system has neither a particularly strong nor a particularly weak layer.
The four systems lean in completely different directions because the historical problems they answered are not the same. China had to filter usable engineers out of an enormous population. Japan, in postwar reconstruction, had to shape people into reliable members of industrial organisations. America, in an open society, had to keep producing entrepreneurs and innovators. The Nordics had to maintain the long-run productivity of a small high-welfare society.
Put together, these four systems cover almost every shape “teaching and training” could take before AI.
Several systems sit outside the main set and need a brief mention. South Korea works better as a counter-example: students score very strongly on assessments, while adult skill profiles are noticeably weaker, and the hagwon ecosystem makes interpreting the assessments harder. It will appear later as a negative case: the AI digital textbook withdrawn within four months in 2025, and three sub-OECD-average PIAAC 2023 adult scores, two ends of the same thread. Germany’s dual-track system is another vocational benchmark, with narrower coverage than China or the US, and will appear later for contrast. Singapore is a miniature hybrid, too small in scale to be a primary reference.
Together, these four systems cover the most distinct shapes “teaching and training” took.
Students: PISA 2022
PISA is the OECD’s triennial assessment of 15-year-olds in mathematics, reading, and science. The 2022 results are below (mainland China did not participate in 2022).
| System | Math | Reading | Science |
|---|---|---|---|
| Japan | 536 | 516 | 547 |
| Finland | 484 | 490 | 511 |
| United States | 465 | 504 | 499 |
| OECD average | 472 | 476 | 485 |
Mainland China last participated in PISA in 2018. The B-S-J-Z sample (Beijing, Shanghai, Jiangsu, Zhejiang) scored 591 in math, 555 in reading, 590 in science, covering 81% of the 15-year-old population in those four regions — not a national sample. Mainland China is also missing from PIAAC, and PISA only has those four regions, so strictly speaking there is no “China-wide” object available for international comparison.
PISA 2022 also includes a creative-thinking assessment. Singapore led with 41, Korea 38, Canada / Australia / New Zealand / Estonia / Finland between 36 and 38, OECD average 33. Korea’s high score has to be read carefully: it shows that Korean students perform very strongly under PISA’s test framework, but the private hagwon tutoring ecosystem, international competition entries, and a track record of cheating in tests can all weaken the link between assessment performance and long-term adult ability. That is this essay’s reading, not a conclusion of the PISA data itself.
Adults: PIAAC 2023
PIAAC is the OECD’s adult-skills survey, assessing literacy, numeracy, and adaptive problem-solving in the 16-65 population. It assesses what people who left the education system years ago actually do with information at work and in life — closer to the ultimate output of teaching and training than PISA. 2023 was the second round.
| Country | Literacy | Numeracy | Adaptive PS |
|---|---|---|---|
| Finland | 296 | 294 | 276 |
| Japan | 289 | 291 | 276 |
| Sweden | 284 | 285 | 273 |
| Norway | 281 | 285 | 271 |
| Netherlands | 279 | 284 | 265 |
| Germany | 266 | 273 | 261 |
| United States | 258 | 249 | 247 |
| Korea | 249 | 253 | 238 |
| OECD average | 260 | 263 | 251 |
Mainland China did not participate.
The Nordics and Japan form the strongest adult-skills cluster, top tier across all three domains. The United States is below the OECD average on numeracy and adaptive problem-solving and two points under it on literacy, and its 16-24 cohort also scores below average on all three. Korea is below average on all three; relative to the previous round, literacy is down 23 points, numeracy down 10.
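The comparison with the OECD average can be read directly off the table. A minimal Python sketch, using only the scores listed above (the dictionary layout and the function name `gaps_vs_average` are illustrative choices, not part of any OECD tooling):

```python
# PIAAC 2023 mean scores copied from the table above:
# (literacy, numeracy, adaptive problem-solving)
SCORES = {
    "Finland": (296, 294, 276),
    "Japan": (289, 291, 276),
    "Sweden": (284, 285, 273),
    "Norway": (281, 285, 271),
    "Netherlands": (279, 284, 265),
    "Germany": (266, 273, 261),
    "United States": (258, 249, 247),
    "Korea": (249, 253, 238),
}
OECD_AVERAGE = (260, 263, 251)


def gaps_vs_average(scores, average):
    """Return each country's signed point gap to the average, per domain."""
    return {
        country: tuple(s - a for s, a in zip(values, average))
        for country, values in scores.items()
    }


gaps = gaps_vs_average(SCORES, OECD_AVERAGE)
for country, gap in gaps.items():
    # Columns: literacy, numeracy, adaptive problem-solving
    print(f"{country:<14} {gap[0]:+4d} {gap[1]:+4d} {gap[2]:+4d}")
```

The sketch compares raw means only; it says nothing about whether a given gap is statistically significant.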
Classrooms: The TIMSS Video Study
PISA and PIAAC measure outcomes. TIMSS, in 1995 and 1999, did something rare: it filmed real classrooms and analysed actual time allocation in 8th-grade math classes. 1995 covered Germany, Japan, and the United States; 1999 added Australia, the Czech Republic, the Netherlands, Switzerland, and Hong Kong.
Distribution of student work time across three categories of activity:
| Country | Practising procedures already learned | Applying concepts to new situations | Inventing new methods |
|---|---|---|---|
| Germany | 96% | 4% | 0% |
| United States | 90% | 8% | 2% |
| Japan | 41% | 15% | 44% |
Share of 8th-grade math lessons earning the highest deductive-reasoning rating: Japan 39%, Germany 28%, United States 0%.
TIMSS 1999 has another set of numbers. Among teacher-designed problems requiring connections between different concepts (“making connections” problems), the share that actually got discussed and worked through in class: United States 8%, Australia extremely low, the other five systems (Japan, Czech Republic, Hong Kong, Netherlands, Switzerland) between 37% and 52%. Researchers gave Australia’s pattern a name of its own: “shallow teaching syndrome.”
These numbers are from 1995 and 1999. The exact percentages may have shifted; the structural difference largely has not.
America’s Bottom: NAEP 2024
NAEP is the U.S. National Assessment of Educational Progress — popularly, the nation’s report card. The 2024 round produced the clearest picture of the American education system in its current state:
In 4th-grade reading, the gap between the top 10% and the bottom 10% is 107 points, one of the largest in NAEP’s history. The 8th-grade math top-bottom gap is the largest ever recorded. In absolute terms: the bottom 25% of students score lower in reading and math than students at the same percentile in 1992. Over 32 years, the worst-off slice has not improved — it has gone backward relative to a generation ago. The 4th-grade lowest percentile is at a 20-year low; the 8th-grade lowest is the lowest in NAEP history.
The widening gap is not noise. From 2005 to 2024, the top 10% in 4th-grade math rose 4 points and the bottom 10% fell 3; in reading, top up 3 and bottom down 5. Harvard and Brown’s joint Annenberg Institute estimates that, over those two decades, internal achievement inequality in U.S. education widened by an amount equivalent to 1.3 academic years of learning. Add the pandemic, and by spring 2024 the national student average is still about half an academic year behind pre-pandemic levels, with no state having returned to its 2022 reading level.
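The widening in each subject is simple arithmetic on the 4th-grade figures just quoted: the top decile's change minus the bottom decile's change. A minimal Python sketch (the names `CHANGES` and `gap_widening` are illustrative, not NAEP terminology):

```python
# 4th-grade score changes, 2005 to 2024, as quoted in the paragraph above.
CHANGES = {
    "math": {"top_10": +4, "bottom_10": -3},
    "reading": {"top_10": +3, "bottom_10": -5},
}


def gap_widening(change):
    """Points by which the top-bottom gap grew: top change minus bottom change."""
    return change["top_10"] - change["bottom_10"]


widening = {subject: gap_widening(c) for subject, c in CHANGES.items()}
print(widening)  # {'math': 7, 'reading': 8}
```

On these figures the 4th-grade gap grew by 7 points in math and 8 in reading over the period.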
At the same time, the American top end keeps producing the global AI industry’s core talent. MacroPolo’s Global AI Talent Tracker 3.0 traces the undergraduate origins of the world’s top AI researchers; the U.S. top-tier schools (MIT, Stanford, Berkeley, CMU) are, after China, the world’s most important supply source, and U.S. graduate schools concentrate the highest share of top AI talent.
The bottom 25% has slid back past 1992. The top end’s training-and-attraction machinery still leads the world. One country is folding two eras into itself.
Current Reforms
All four systems are reforming. The directions differ. None is finished. The earliest of their outputs reach the labour market sometime in the mid-2030s.
China timeline:
- July 2021, the “Double Reduction” policy lands; off-school subject tutoring is systematically suppressed.
- April 2022, the Ministry of Education issues a new compulsory-education curriculum framework, putting “core literacies” at the centre.
- December 2024, MoE issues guidance on strengthening AI education in primary and secondary schools.
- January 2025, the State Council issues the Education Strong Nation Plan (2024-2035), calling for “concentrated assessment of students’ key abilities, disciplinary literacies, and quality of thought,” and launching the “Fertile Soil Initiative” (science literacy for K-12) and the “Excellence Initiative” (top-tier innovative talent for high schoolers).
- From 2024, Shanghai introduces a mandatory AI Foundations curriculum for grades 4 and 7, one period a week, no fewer than 30 periods per grade per year, with the in-house “InnoSpark AI Education Model 1.0” available for any district school to plug into.
- Huangpu District’s “Climbing the Peak” program builds five subject-specific top-talent bases anchored in three flagship demonstration high schools (Datong, Gezhi, Xiangming), selecting no fewer than 300 students over three years; runs joint research-talent programs with Fudan, Jiao Tong, Tongji, and East China Normal that bridge high school and university.
- April 2026, five ministries (MoE, NDRC, MIIT, MoST, NDA) jointly issue the AI + Education Action Plan (Jiao Ke Xin [2026] No. 1), targeting by 2030 a vertically and horizontally connected AI curriculum across all education stages and a society-wide AI-literacy system — pushing AI education from local pilots to a national institutional deployment.
US timeline:
- From 2020, the “science of reading” reform is rolled out across multiple states, restoring systematic phonics instruction and early reading intervention.
- The 2024 NAEP shows zero state improvements in reading; multiple states pivot toward stronger foundational-skills assessment.
- April 2025, the White House issues an executive order on advancing AI education for American youth.
- From 2025, the U.S. Department of Education brings AI literacy and AI proficiency into federal-funding-eligible categories in multiple guidance documents.
Japan timeline:
- 2020, the new course of study takes effect; it introduces active learning, cross-disciplinary project-based learning, and mandatory primary-school programming.
- The GIGA School plan, 2020-2024, completes nationwide student devices and network deployment.
- December 2024, MEXT issues guidelines for using generative AI in primary and secondary schools.
- AI Education Accelerator Program launches; partnerships with industry expand teacher AI training.
Korea timeline:
- March 2025, AI digital textbooks are rolled out in math, English, information, and special-education Korean for grades 3-4 and the first year of middle and high school.
- July 2025, first-semester adoption rate is 37%.
- August 2025, the National Assembly downgrades AI digital textbooks from “core textbooks” to “supplementary materials”; school adoption becomes voluntary.
- Late 2025, newly elected President Lee Jae-myung rescinds the mandatory AI textbook policy; second-semester adoption rate falls to 19%.
Korea is among the most willing in the world to push AI education reform — and also the fastest to back away.
A Few Observations
Lay the data and the timelines side by side and a few things surface.
Student performance and adult performance don’t line up. Japan is strong at both layers. Finland is mostly strong at the adult layer. The U.S. is unimpressive at both. China’s developed regions are strong at the student layer; no comparable adult data exists. Korea is strong at the student layer and declining at the adult layer. You cannot read off adult ability from teenage scores.
Classroom-level differences are larger than assessment-level differences. In German and U.S. 8th-grade math classes, 90% or more of student work time goes to practising procedures already learned; in Japan it is 41%. PISA’s final scores do not reveal this, but it is the actual shape of basic-education delivery.
The U.S. internal split is the most severe of the four. The bottom 25% has fallen below its 1992 floor while the top end keeps shipping top-tier talent to the rest of the world. One country.
All four systems are reforming. The earliest reform outputs reach the labour market in the mid-2030s. The current generation of workers was produced by the old system.
The Core Problem
Four systems cover almost every shape “teaching and training” took before AI. Their measured profiles in PISA, PIAAC, the TIMSS Video Study, and NAEP are richer and stranger than the caricatures suggest. They are all reforming. The reforms point in different directions. The earliest of those reforms reach the labour market in the mid-2030s.
This is where things stand.
Education and training drill seven different cognitive mechanisms. Each has its own plasticity, its own fate after AI, and its own market scarcity. The four layers the old system poured its resources into — knowledge, the shallow part of activation frequency, the shallow part of the strategy repertoire, foundational automation — are precisely the ones depreciating after AI. The two layers the old system barely funded — cognitive decoupling and metacognition — are precisely the ones AI now prices on their own.
This mismatch is not the fault of the current workforce. The system produced them on its own logic, evaluated them by its own standards, sent them into the labour market — and then began to deform. Reform has started, and it is aimed at the children who will start school five years from now. The several generations who still have 20 to 40 years left to work in the labour market are not in scope.
Teaching and training have not lost meaning. The credentials they used to confer are coming undone.
Chapter 2 · Teaching, Disqualified
The four layers the old system invested most heavily in are precisely the ones depreciating after AI; the two layers it barely invested in are precisely the ones AI now prices on their own. The mismatch is structural. The premise behind the education system and the capability frontier of AI cover the same patch of ground.
The Source of the Mismatch
The premise of the old education was “produce identifiable people at scale.” A single school faces hundreds of students; a country’s basic education system faces tens of millions to hundreds of millions. That order of magnitude forces any training inside the system to satisfy three things at once.
Standardisable. What gets taught has to be defined precisely in textbooks and curricula. Otherwise different regions and different teachers cannot deliver on a shared cadence. A gaokao math problem that students from Urumqi to Shanghai to a county-town high school can all attempt and have graded by the same standard is what this layer makes possible.
Examinable. How well it was taught has to be evaluated against an external standard. Otherwise tiers can’t stably rank, and credentials lose their use to the labour market. If “ability” amounts to “I know I get it” with no external signal anywhere, the entire credential system unravels.
Cheap to mass-replicate. Teacher costs cannot run so high that you cannot hire them. An ordinary middle-school teacher, with four years of teacher training, has to be able to deliver curriculum knowledge in a classroom on a shared cadence. That capacity has to be reproducible at scale. If only top-tier scholars could teach a given subject, that subject cannot be part of basic education.
Of the seven mechanisms, the layers that satisfy all three conditions are knowledge, the shallow part of activation frequency, the shallow part of the strategy repertoire, and foundational automation. They can be written into textbooks, posed as standardised exam questions, and delivered on a shared cadence by teachers of varying quality. That is why nearly all of the old system’s resources end up here.
The ones that do not satisfy the three are deep cognitive-decoupling training and metacognition.
Deep cognitive decoupling demands sustained engagement with structurally unfamiliar situations. Each “unfamiliar” is single-use. Once it gets standardised into a textbook, the unfamiliarity is gone, and training degenerates into standard responses. Institutionally, this is something you cannot mass-produce. The most effective parts of the Chinese problem-bank tradition — top schools’ Math Olympiad training, the math-major core at research universities — sit precisely outside the standardised-assessment frame, in small classes and supervisor-driven mentoring.
Metacognition demands the loop of real tasks plus real feedback, repeated. It can’t be tested, because “the extent to which you can monitor your own thinking” is not something a single test sheet can measure. It can’t be cheaply replicated either: providing real tasks and real feedback to one student requires someone above the student’s level to participate one-on-one for a long time. That cost is unsupportable inside scale education. K-12 has almost no slot for metacognition training.
This is the system’s internal boundary. The layers that scale, the system mass-produces. The layers that don’t scale fall outside the system’s coverage and get left to a small set of elite programs, master-apprentice arrangements, and individual self-study.
Up to here, the mismatch is just an interesting fact. After AI, it gets serious.
The essence of AI training is taking the standardisable, structurable corpus that human society laid down over centuries and turning it into output capacity that mass-replicates cheaply.
The standardisable parts — AI has all of them.
The examinable parts, i.e. the parts with model answers — AI gives them faster and more accurately than people.
The mass-replicable parts (knowledge, templates, formulas) — AI’s per-unit cost is essentially zero.
What AI covers and what the old education covers is the same class of “scalable cognitive output.” The relation between them is substitution, not complement.
The old education did not do anything wrong. What it did was mass-produce identifiable objects. AI is doing the same thing, cheaper, faster, with broader coverage.
This mismatch comes from the scale mechanism being used twice. Education used it for one to two centuries. AI used it for a few years. The two cover the same patch of ground, and AI’s coverage is more complete.
Question Banks, Disqualified
Problem-bank training works by pushing the student repeatedly to the working edge of his Gf: a situation he hasn’t seen before, where he has to identify structure, find an entry point, and construct a path with no ready-made solution. That is real on-the-spot reasoning. The harder he is pushed against that edge, the closer he gets to his usable Gf ceiling.
The mechanism only works if the problem is at once familiar enough and unfamiliar enough. A problem whose structure the student has never seen triggers on-the-spot reasoning, but the reasoning tools required have to be tools he has already learned. If the problem leans on totally foreign knowledge, he can’t solve it. If the same structure has been drilled for forty rounds, the unfamiliarity is gone. He stops firing up Gf and instead reaches into the strategy repertoire: identify the type, plug in the template, expand from memory.
After decades of grinding, the bulk of problem types have been exhaustively patternised. The high-intensity “problem-bank ability” of Chinese basic education is, in essence, this degraded form of training: students no longer doing on-site reasoning, but doing high-speed pattern recognition plus template recall.
This is something AI takes over without resistance. AI carries an enormous library of problem types and solution strategies, and applies it faster and more accurately than any drill-hardened gaokao top scorer. A student who can get 95% of the gaokao math right is doing what AI does in one second.
This is not speculation. The author analysed the cognitive mechanisms of 2021-2025 gaokao math (567 problems across 7 distinct paper variants). Sorting by “extent to which Gf is invoked”:
| Category | Share | Meaning | Drilling premium |
|---|---|---|---|
| GOOD | 8.8% | Genuinely invokes fluid intelligence; requires on-site construction | 0.32 |
| MIXED | 29.6% | Multi-step reasoning, but each step matches a standard weapon | 0.50 |
| BAD | 57.3% | Pattern recognition + template substitution | 0.47 |
“Drilling premium” is how much faster a student who has drilled 30 problems of that chapter is than one who hasn’t. 0 means parity; 1 means the drilled student finishes instantly while the un-drilled student basically can’t solve it. The category captures whether the problem actually pushes the student to the working edge of Gf.
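One way to read that definition is as a relative time saving: 1 minus the ratio of the drilled student's solving time to the un-drilled student's. This is an assumption that matches the two stated endpoints, not a formula the essay gives. A minimal Python sketch, where `drilling_premium` and the timings are hypothetical:

```python
def drilling_premium(t_drilled: float, t_undrilled: float) -> float:
    """Relative time saved by drilling, clamped to the 0..1 scale.

    ASSUMPTION: the essay states only the endpoints (0 = parity, 1 = the
    drilled student finishes instantly). This formula is one reading that
    satisfies both endpoints, not the author's documented method.
    """
    if t_undrilled <= 0:
        raise ValueError("t_undrilled must be positive")
    saving = 1.0 - t_drilled / t_undrilled
    return max(0.0, min(1.0, saving))


# Hypothetical timings in minutes, chosen only to illustrate the scale.
print(drilling_premium(10.0, 10.0))  # 0.0
print(drilling_premium(5.0, 10.0))   # 0.5
print(drilling_premium(0.0, 10.0))   # 1.0
```

Under this reading, a premium of 0.50 would correspond to a drilled student needing about half the time of an un-drilled one.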
By mechanism, 37.9% of problems test “foundational automation” (taking derivatives, plugging formulas), 33% test “deep schemas in the strategy repertoire” (3-D coordinate setup, conic-section + Vieta), totalling 71%. Genuine Gf problems are 18%.
A few counter-intuitive findings.
The Quan-Guo-Jia (National Paper A) science section over four years (2021-2024), 90 problems in total, has zero GOOD problems. That includes the so-called “anchor problems”: every step matches a standard weapon.
The drilling premium is highest on MIXED (0.50), lower on BAD (0.47), and lowest on GOOD (0.32). The places where “the drilled student finishes instantly while the un-drilled gets stuck” are the medium-difficulty conic-section, derivative, and probability problems — not the basic ones.
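The ordering in this finding, and how much of the sample the three categories account for, can be checked against the category table. A minimal Python sketch using only the table's figures (the variable names are illustrative):

```python
# (share of problems in %, drilling premium) copied from the category table.
CATEGORIES = {
    "GOOD": (8.8, 0.32),
    "MIXED": (29.6, 0.50),
    "BAD": (57.3, 0.47),
}

# Categories ordered from highest to lowest drilling premium.
by_premium = sorted(CATEGORIES, key=lambda name: CATEGORIES[name][1], reverse=True)

# Total share covered by the three categories, and what is left over.
covered = round(sum(share for share, _ in CATEGORIES.values()), 1)
remainder = round(100 - covered, 1)

print(by_premium)  # ['MIXED', 'BAD', 'GOOD']
print(covered)     # 95.7
print(remainder)   # 4.3
```

The three listed shares add up to 95.7%; the table does not say how the remaining 4.3% of problems is classified.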
From 2024, the GOOD share jumps from 6-8% in earlier years to 14%. This reflects examiners adding “newly defined” problem types in the anchor slot — Beijing’s q21 is the marker. But 14% means 86 of every 100 problems can still be drilled into one-second answers.
This is the size of the question-bank disqualification at this exact moment in 2026.
The shape of question-bank disqualification: the mechanism itself is not wrong, but once problem types have been exhaustively patternised, training has shifted from “triggering judgment” to “training standard responses” — exactly the layer AI takes over.
Form-Production, Disqualified
The huge volume of “formal compliance” training the education system runs (writing, presenting, slide-making, report-writing, résumé-writing) used to serve as “indirect evidence of substance.” A student who could turn out a passable argumentative essay was, more often than not, someone who genuinely understood. A student who could write working code was, more often than not, someone with engineering chops. A student who could clearly articulate a project in interview was, more often than not, someone who had actually done it. Formal compliance was a low-cost signal for filtering trustworthy candidates.
After AI the threshold of formal compliance collapsed. A proposal that looks complete, code that runs, an article that reads structured, a résumé that looks neat — none of these reliably testifies to the corresponding subject behind it. ISSUE 04 named this “form being able to be produced independently of substance”.
Which means the “production capacity” that the education system has been training for decades has, as a filtering signal, sharply lost validity. A student who can write something passable and a student who can’t but knows how to use AI look the same from the outside.
Employers, graduate programs, scholarship committees all face the same problem: application essays, recommendation letters, research proposals, personal statements — the things they used to filter for “writes well, has ideas, has accumulation” — may today filter for “knows how to use ChatGPT.” The “essay craft” basic education trains depreciates faster against AI tools than even the question bank does. The question bank still requires that the student remember problem types; AI-writing requires no prerequisites.
A deeper layer: form-production disqualification doesn’t just affect filtering, it affects learning itself. A student trained from childhood to “build the essay structure first, then fill in content” is being trained on the implicit assumption “form carries substance.” In the act of training the form, he naturally deposits an understanding of the substance. After AI, that assumption breaks: the student can have AI produce the form directly, skipping the substance-deposition step. The internal feedback loop of the whole training mechanism is severed.
This is one of the education system’s most awkward positions in the age of AI: not only does production capacity depreciate, but “training substance through producing form” — a path that worked for centuries — also begins to fail.
The Standard Answer, Disqualified
The core move of an exam system is to standardise the answer. The point is to lower assessment cost so that scale assessment becomes possible. A model answer lets any reasonably trained grader produce nearly the same score.
This is the basis on which the education system has run for centuries. From the Sui imperial exam to the contemporary gaokao, an examination system that can be evaluated against a standard is the precondition for a credential to function as a social signal.
But what AI is best at producing is precisely the standard answer.
The closer the abilities the education system trains are to “can give the standard answer,” the more they overlap with what AI already does. A gaokao top scorer can give an essentially perfect standard answer in the time allotted, but in the age of AI, that ability is no longer scarce.
A deeper layer: when an exam can score above 90 with AI, that exam is no longer measuring what it set out to measure. It is measuring “how good you are at using AI” or “how good you are at cheating.” That layer has come fully unmoored from the exam’s design goal.
In the past two years, university exam systems in both China and the U.S. have been confronting this. American universities widely report that ChatGPT can independently complete most undergraduate coursework. Chinese universities increasingly find “ghostwriting + AI tools” mixed-mode cheating. National responses go from banning AI, to AI detection, to switching exams to live oral defence or closed-book longhand — all of it, at root, an attempt to re-anchor the exam to “subject ability.”
But this only delays the problem; it doesn’t solve it. Standard-answer evidentiary power has failed in uncontrolled environments. Inside controlled exams it can still test foundational automation and parts of the strategy repertoire, but it can no longer independently certify that a person has full judgment.
Either the exam abandons the standard answer (and becomes open-ended, but expensive and weakly comparable), or it keeps the standard answer (but admits the exam now only tests what AI already excels at). Both roads end at the same fact: the credibility of examination as a filtering mechanism is breaking down.
This is deeper than question-bank disqualification or form-production disqualification, because it rocks the entire evaluation basis on which the education system runs.
Disqualification Squared
Exam disqualification is a first-order problem: the credibility of exams as filtering mechanisms is falling in the age of AI. The tutoring industry built around exams is a second-order problem: spending massive time and money to train students to perform on a filtering mechanism that has already been disqualified. That is disqualification squared.
The shape is plain in China, the United States, and Korea. After Double Reduction suppressed off-school subject tutoring in China, demand went underground, into “literacy” courses, and into AI-themed courses; total parental spend on tutoring is still enormous. A great deal of that tutoring exists to win the child a few more points on a gaokao mechanism that has already been disqualified, with no expectation that the child will actually master anything. American SAT prep, AP prep, and Common App personal-statement editing are a multi-billion-dollar industry; after AI, the evidentiary value of what they produce has fallen sharply, and the industry keeps growing. Korean hagwon are the extreme case of disqualification squared: send the child to cram school from age six and find that, as an adult, his foundational abilities have actually gone backwards (Korean adult literacy in PIAAC fell 23 points between rounds).
But disqualification squared will not be the steady state at the top.
Top families and top students are mostly already doing something different. American elite families send their children into programs that genuinely train cognitive decoupling: undergraduate research in research labs, competition math, debate, independent research projects — not SAT cramming. The output of those programs remains valid in the age of AI. High-knowledge families in Chinese first-tier cities are putting more time into top-tier-talent bases, Olympiad programs, and overseas summer-research projects, sidestepping scale-exam training. Nordic families never made exam tutoring the dominant mode anyway.
Two reasons the top withdraws. First, information advantage: the top sees earlier the difference between “playing along with a disqualified mechanism” and “training real ability.” Second, return on investment: for a high-Gf-ceiling child, putting training where it can’t be standardised (cognitive decoupling, metacognition) has the highest ROI, because that child has the latent capacity to make those layers thick.
But middle and long-tail families struggle to do this. The reason is that there is no substitute. A family in a third-tier city cannot find a local genuine research project, cannot find a mentor capable of high-intensity real feedback. The child’s only “ladder up” is still the gaokao. Parents put money into tutoring not because they trust it, but because there is nothing else to put it into. They know the value is going down.
This is what disqualification squared really costs. It doesn’t just worsen the situation of the tutored students in the age of AI; it widens the ability gap between top and non-top families further. The top can recognise the disqualified mechanism and withdraw, redirecting resources to training that still works. The non-top can recognise it but cannot withdraw.
At this layer, the disqualification of teaching has gone past education itself. It is redistributing the ability distribution of the next generation.
Four Failure Points
China. The strength of the Chinese system is high-density Gf activation and stable ranking. But the working edge of that strength is: the problem must be sufficiently “unfamiliar.” After decades of grinding, the unfamiliarity is gone, and training has degenerated into pattern recognition plus template recall. In the age of AI this amounts to “I’ll have AI do the drilling for me.”
The author’s 567-problem study, broken out by paper, shows the difference more sharply. Two of the autonomous-paper provinces (Shanghai 14.4%, Beijing 13.6%) have a notably higher GOOD share than the unified new-gaokao papers (Paper I 8.7%, Paper II 5.3%); the third, Zhejiang, sits between the two unified papers at 6.8%. The Quan-Guo-Jia science paper, four years, 90 problems, has zero GOOD. The gaokao question bank has failed at “lifting Gf assessment”; the unified paper has pushed routinisation further. “New-context dressing” in the Chinese gaokao is mostly decoration; problems wrapped in Chang’e satellite signal transmission, basketball games, etc. turn out to be standard templates once the wrapper is stripped away.
Recent reforms (Double Reduction, top-tier-talent programs, Shanghai’s mandatory AI curriculum, Huangpu’s Climbing the Peak) are all attempts to shift basic education from “standard-response training” to “unfamiliar-context reasoning.” The direction is right; coverage is limited. Climbing the Peak’s selection target is on the order of 300 students over three years, which makes it an elite program inside one district, with no direct effect on the vast majority of students nationally.
The deeper problem: training for unfamiliar-context reasoning cannot itself be scaled. The Chinese system trying to keep its scale-filtering advantage while introducing un-scalable training is, structurally, a contradiction. There is no spreadable solution at present.
Japan. The strength of the Japanese system is the thickest foundational automation. The PIAAC numbers above tell the story: Japanese adults’ basic cognitive abilities don’t depreciate badly with age, because K-12 trains literacy, calculation, reading, attention, and rule-awareness to automation.
But Japan’s failure point is not in basic education; it is in the “embedding” assumption of corporate training: shape a person into a long-term carrier of a specific function. After AI, “specific function” is itself being rewritten. A person trained to carry a specific function for thirty years finds, when AI rewrites that function, that decades of “embedding capacity” have all gone to nothing.
At the basic-education layer, Japan’s failure point is the absence of an AI-use paradigm. MEXT’s 2024 generative-AI guidelines are a start, but they are far from forming an “AI-use paradigm in foundational automation.” Japanese basic education has not yet taken AI use into “trained to the point you don’t think about it.”
United States. American education has already split into three layers, each with a different fate in the age of AI.
The top end (AP, honours, research programs, top private schools, research universities) trains cognitive decoupling and metacognition at world-leading intensity. The overwhelming majority of top global AI researchers come from here. After AI this layer will be amplified further. The abilities they already have are precisely the ones the market prices on their own in the age of AI; with AI tools accelerating efficiency, the gap between the top end and everyone else widens.
The bell-curve interior. NAEP 2024 shows this layer in continued decline; the lowest percentile is at a 20-year low. States are trying to swing back to foundational basics — the “science of reading” reform is exactly that swing back. But returning to basics doesn’t help the current workforce; it helps children who will start school five years from now. The bell-interior workforce in the labour market today gets no relief.
The long-tail bottom. The bottom 25% has slid back past 1992. This layer will be further compressed in the age of AI. People without solid foundations using AI are outsourcing judgment to a system that cannot bear responsibility. The result is being held down by AI: they don’t have the ability to judge AI’s output, so they trust whatever AI gives them.
The most likely picture for the U.S. over the next ten years is the middle splitting in two: those who can complete a self-rebuild rise into the top end and accelerate; those who can’t slide into the bottom and get held down by AI. The bell-curve median, originally so populous, gradually disappears, and the social ability distribution moves from a bell curve to bimodal. This is the most dramatic disqualification picture among the four systems after AI.
The Nordics. The Nordic system’s strength is long-run maintenance of the population-wide floor. But the Nordics don’t train concentrated deep elite schemas, so at the student layer the right-tail explosive power isn’t there, and they don’t dominate the few extreme-output positions in the age of AI. That is the Nordic failure point: they can guarantee the floor doesn’t depreciate, but cannot guarantee a right-tail output. The real opportunity for the Nordic system, however, shows up in the age of AI. The mature lifelong-learning entry points keep Nordic adults from depreciating; if that mechanism extends to AI use, it amounts to extending the boundary of foundational automation into the age of AI.
Counter-examples: What Has Not Been Disqualified
The shape of training that remains fully valid after AI.
Real mathematics training: Math Olympiad, math undergraduate, math research. It trains the extreme form of cognitive decoupling: stripping abstract structure entirely out of context and operating on it as a formal object. Proving a theorem requires step-by-step derivation in a strict formal system. AI can verify; the act of training the human’s decoupling capacity itself can only be done by a human.
Top-tier research training: doctoral, postdoctoral, internships in research labs. It trains metacognition (knowing the boundaries of your own research direction, judging which problems matter, knowing when you’re deceiving yourself) and deep domain schemas (the “this solution looks wrong” judgment in the head of someone who actually understands physics). It depends entirely on real tasks plus real feedback; AI cannot do an advisor’s judgment.
Professional debate, moot court, debate club. Identifying argument structure under adversarial conditions, identifying hidden assumptions, constructing rebuttals. AI can simulate debate, but training people to call up judgment under pressure in live human-on-human exchange has to be done by real subjects.
Real engineering training. Engineering that requires debugging in complex systems, inferring anomalies, identifying hidden dependencies, judging when to rewrite an architecture rather than patch a bug. Engineering that is more than “code that runs”. This kind of training drills both deep domain schemas and metacognition.
Long-form mentorship: medical residency, traditional craft, clinical psychology. Training the kind of judgment that can only form through years of real feedback. A resident’s bedside judgment about a patient takes several years and several hundred real cases to form.
High-intensity writing training plus real feedback. The kind where you rewrite a draft thirty times a week — not the formality of a college writing class. The core of this kind of training is letting the student form, through repeated revision, his own judgment and his own voice — the channel through which “authorship,” in the sense ISSUE 04 discussed in its later half, is trained.
The shared characteristics of these forms:
Cannot be scale-taught (student counts have to be small).
Real tasks plus real feedback are required (assessment cannot be a stand-in).
The economics don’t add up (high teacher cost, low student conversion).
Coverage will inevitably be low (less than 0.1% of China’s 1.4 billion can plausibly be in such training).
This is the mirror of the earlier argument. The old education didn’t invest in these layers because they’re economically unattractive and can’t be done at scale. After AI, these are precisely the layers the market prices on their own.
The shapes of training that have not been disqualified share one property: none of them scale. The abilities the market prices on their own in the age of AI are exactly the abilities the education system has never covered.
Teaching’s Disqualification Is Not Inside Education
Teaching’s disqualification is, in essence, a structural mismatch between the education system’s premise (mass-produce identifiable objects) and the age-of-AI ability distribution (un-scalable judgment and decoupling). The disqualification is not in the teachers, not in the curricula, not in the exams.
This mismatch is the basic situation of education in the age of AI. It is not a one-shot problem to be corrected. Reform is aimed at the children starting school five years from now. The several generations on the labour market who were trained by the old system have to face the mismatch on their own.
Chapter 3 · Training, Disqualified
Teaching’s disqualification is in the content being taken over by AI. Training’s disqualification is somewhere else: training rests on three premises — the role is stable, the process is stable, the decision chain is stable — and after AI all three fail at once.
Training was not wrong in what it trained. It was disqualified because what it was training for is disappearing. This is a deeper disqualification than teaching’s. Teaching’s output has merely depreciated; training has lost its very premise.
Training’s Premises
The Preface told the Henry Ford story. The Highland Park moving line in 1913 compressed chassis assembly time from 12.5 hours to 93 minutes. What Ford solved was a specific problem: how do you take a group of farm-immigrants with no automobile-manufacturing background and shape them, in two weeks, into people who can stand at a single station and repeat one motion.
That is the origin of “training” as a modern institution. It is industrialisation’s particular way of shaping a person, not an extension of education.
Training has three problems to solve.
First, turn a generic talent into a unit that can drop, immediately, into a specific role. A college graduate joining a company can’t produce value right away. He has to be trained into “this company, this role”: a part that runs steadily inside the operation. Role descriptions, SOPs, onboarding, shadowing, master-apprentice pairings, rotation: all of these perform that same core motion.
Second, transmit experience between roles. The “what to do in what situation” judgment in a senior employee’s head cannot be fully written down. The organisation moves that tacit knowledge from older to newer employees through OJT, master-apprentice arrangements, long-term co-presence. This is the strong point of Japanese corporate training, and it is where Nonaka’s “tacit knowledge” actually applies.
Third, build stable expectations for a person inside the organisation’s process. “When I get this kind of request, this is how I handle it”; “when I report up, this is the format”; “when something breaks, this is who I find”; “this client gets handled this way.” These expectations carry the organisation’s culture, not the individual’s ability. The longer an employee is at a firm, the more organisational expectations he carries, and the more value he holds for the organisation.
Training’s outputs are role experience, process fluency, reporting skill, collaboration habits, and the capacity to carry the organisation’s culture.
This system worked over the past several decades because three premises held simultaneously.
The role is stable. The shape of the role I am being trained into today will largely still be that shape next year.
The process is stable. The organisation’s way of working doesn’t get rewritten every year.
The decision chain is stable. Upstream-downstream relationships, who I report to, the boundary lines of division of labour — these are predictable.
Only when all three hold does training, as a long-term investment, make sense. An employee trained for five years to a certain capability is, by both the organisation and the employee, assumed to still hold that capability in year six, year ten, year twenty. If by year six the role is redefined and the capability fails, the five years of training become sunk cost. From the company’s view it’s money; from the employee’s view it’s five years of cognitive investment in a life.
Training rests on a stable world: stable roles, stable processes, stable decision chains. What AI changes is the possibility of all three holding at once.
Three Premises Failing at Once
Roles are no longer stable. A role is, at root, “a stable bundle of tasks.” A product manager’s role is the bundle of writing requirement docs, doing user research, drafting PRDs, tracking development, reporting up and down. An operations role is the bundle of data analysis, content production, process execution, KPI breakdown, daily ops. A middle manager’s role is the bundle of intake-requirements, break-down-tasks, hand-down-to-reports, track-progress, collect-results, report-up. Each role is a specific bundle that emerged from decades of organisational evolution. The bundle is stable only on the precondition that the tasks inside depend on one another, so that pulling any one of them out and doing it separately is hard.
What AI changes is exactly that collaboration assumption. After AI, a large fraction of the tasks in the bundle can be done independently, and the cost of doing them independently approaches zero. Writing the requirement doc, synthesising user research, writing the PRD, doing baseline data analysis, generating templated content, clarifying requirements and producing task trees, tracking progress — every one of these AI can do. When 60-80% of the tasks in the bundle can be done independently by AI, the existence rationale of the bundle as a “role” evaporates.
The role is being deconstructed. It is now a collection of tasks that can be split apart, redistributed, and outsourced to AI at any time, not a stable cluster requiring long training to carry. People who were trained for ten, fifteen, twenty years on this kind of role hold the capacity to carry a bundle that has now been deconstructed, not a stable capacity. That capacity has nowhere to land once the role is redefined.
Processes are no longer stable. An organisation’s processes are the specific paths that string those roles together. A request goes from PM to designer to engineer to QA to release. A client goes from sales to pre-sales to proposal to contract to delivery. A reimbursement goes from finance to audit to compliance to approval. Every process is a specific path ground out by several generations inside the organisation.
After AI, those paths can be rewritten. Three modes. The first is process compression: a path that needed five people now needs one with AI. The second is process recomposition: stages that once had to run sequentially can now run in parallel or merge with AI. The third is process disappearance: with AI changing the shape of the output, the process itself is no longer needed.
People trained on the old processes know every step’s specifics, who to find, how to interface. What they own is familiarity with one specific process, not a generic ability. Once the process is rewritten, that familiarity itself depreciates.
This is not separate from “the role being deconstructed.” As the role gets deconstructed, the process linking the roles also gets deconstructed. They happen together.
Decision chains are no longer stable. Every person in an organisation is a node on a decision chain: upstream hands me a task, I understand it, I break it down, I hand to downstream, I review the downstream’s output, I deliver to upstream. The default assumption of the old node is that upstream and downstream are people. My core value, as a middle node, is translation, coordination, and review between people.
After AI, that assumption fails. Downstream is no longer all people; it includes AI, processes, templates, agents. My value as a node is no longer “coordinate the people upstream and downstream” — it is “judge what work goes to AI, what goes to people, what I keep.”
This is a completely different value definition. The capability the old middle manager carried was “coordinate between people.” The capability the age-of-AI node carries is “redraw your own downstream boundary.” The training method, the judgment criteria, and the capability composition of the two have almost no overlap.
This is the inverse of the Henry Ford story in the Preface. Ford turned generic people into useable members of an industrial organisation. The disqualified manager in the age of AI is one who is still using people to do what machines can already do. The age-of-AI manager has to redistribute work that used to go to people: let AI carry what machines can do, and let people carry what only people can do.
The role gets deconstructed, the process gets rewritten, the decision chain has to redraw its own boundary. Three things at once. Old training systems trained people in the capability of “running fluently on stable roles, stable processes, stable decision chains.” When the three premises fail at once, the capability itself does not fail, but the object the capability was carrying (the specific role) disappears.
This is what training-disqualification really is. It does not disqualify on content the way teaching does. It disqualifies on premise. Training was not wrong; what it trained for is disappearing.
Specific Forms of Disqualification
Fluency training, disqualified. The fastest-taken-over layer. Writing emails, summarising material, building tables, generating code, writing reports, making slides, taking minutes, organising research notes. This “fluency” used to be one of the core values of an employee. A person five years in could write reports faster, more cleanly, more uniformly than a person one year in.
After AI, that gap is gone. A one-year-in person who knows how to use AI writes reports faster and at higher quality than a five-year-in person who doesn’t. The five-year-in person still has value. He may still be thicker in judgment, in client relationships, in domain accumulation. But “writes reports fast and well-formatted” is no longer where his value comes from.
Mid-career employees thoroughly trained on fluency are in the most awkward position. They’ve worked ten, fifteen, twenty years; the layer they’re thickest at is precisely fluency. When fluency stops being priced, what they have to bring to the table is judgment, relationships, clients, domain understanding. None of which the organisation systematically trained — they accumulated those in fragments on their own.
Process-fitting training, disqualified. Old training shaped people into “stable carriers of a specific process.” That employee knew this SOP, that client, the internal approval flow, the implicit rules of cross-team collaboration. After AI the process itself gets rewritten. The capability trained on a process, being “familiarity with one specific process,” depreciates with the process.
A deeper layer: when the process is unstable, “follow the process” itself is worth less. Organisations start to value “judging when not to follow the process” more — i.e., metacognition. Metacognition is precisely not what process training covers.
Experience-accumulation training, disqualified. The core reason “old employees are valuable” inside the old organisation was experience: more cases seen, knows what to do when something breaks, remembers the specifics of that client three years ago, anticipates what will go wrong in a given context.
Experience as a capability is layered. Shallow experience is “I’ve seen this kind of case.” For that layer, AI carries an enormous case library, broader than any single person’s. Deep experience is “the judgment formed by feedback through huge numbers of real failures and successes.” That layer AI cannot replicate, because it has no continuous subject.
The old training does not separate the two. An employee is often valued on “X years of work experience,” with the implicit assumption “time = experience = judgment.” After AI that equation breaks: shallow experience depreciates; deep experience stays scarce — but shallow is what 90% of employees are carrying.
A 15-year middle manager often says, “I’ve done this kind of thing 100 times.” Before AI, that was a proof of capability. After AI, it becomes awkward: AI has seen ten thousand times as many cases as those 100.
Standardised-delivery training, disqualified. Old training emphasised delivery standardisation: a complete plan, polished slides, professional report, formal email, structured presentation. Before AI this layer was the lubricant of the organisation. Formally compliant deliverables raised the efficiency of information flow. After AI the threshold of formal compliance is gone. This corresponds to “form-production disqualification” in teaching, but the scene is at work, not at school. A slick-looking deck no longer testifies that the person behind it understands the problem. A structurally complete report no longer testifies that the person did the analysis. The “professionalism” old training thoroughly shaped (writing emails, presenting, packaging plans) — the specific abilities that used to define “office worker” — has shifted, after AI, from competitive value to “stuff AI can produce.”
The Middle Has It Worst
Those four forms cover most employees. But one specific group has the harshest position in the age of AI: middle managers.
Middle managers are the largest “thoroughly trained by the old training” group on the labour market. Their disqualification mechanism, however, is not the same as front-line employees’. AI does not directly replace middle managers. It raises the standard of the middle manager’s core motion to a level the old-trained middle manager can’t reach.
The middle manager’s core motion is a loop. Any middle manager’s core motion can be broken into six steps: intake-requirements, break-down-tasks, hand-down-to-reports, track-progress, collect-results, report-up. Those six form a judgment-and-execution loop. The middle manager’s core value is not in the steps themselves, but in how fast, stable, and high-quality the loop runs. A good middle manager runs the loop fast, decides decisively, ships consistently; a bad one chokes the loop, decides indecisively, ships unevenly.
AI doesn’t replace the loop; it rewrites the standard of the loop. AI doesn’t replace the six steps. With AI, every step is faster, more accurate. At intake, AI helps clarify and rewrite the requirement. At breakdown, AI generates a structured task tree. At delegation, parts can go directly to AI; the rest goes to reports. At progress-tracking, automation does most of it. At results, AI does an initial quality check. At reporting, AI generates the various forms of reporting material. The whole loop runs noticeably faster and at higher quality. ISSUE 02 made the underlying point: AI lifts efficiency dramatically while keeping quality at or above the median. No company’s all-human team, no matter how it’s trained, can do that.
The all-human team’s quality ceiling. Any team made of people has normally distributed quality, no matter how it is trained. The best output is very high, the worst very low. The team’s overall output is held back by the weakest link. A 10-person team’s overall output quality is forever below that of the strongest person on the team.
AI raises every person’s work-quality baseline above the median. A one-year-in employee writing a report with AI can hit the median of five-year-in quality. A five-year-in employee with AI can hit the median of ten-year-in quality. The weakest link’s quality goes up sharply.
A middle manager who knows how to use AI runs a team whose total output quality stays above median. A middle manager who doesn’t runs a team that is still a normal distribution, dragged by its weakest link. Same resources, same headcount, same goal — the output gap between the two teams can be 2-3×.
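The weakest-link argument above can be sketched numerically. This is a minimal illustration under stated assumptions, not the essay's own model: the quality scores are hypothetical, team output is modelled as the minimum member quality, and AI is assumed to lift every member to at least the team's pre-AI median. It does not attempt to reproduce the 2-3× figure, which depends on how wide the real quality spread is.

```python
from statistics import median

def team_output(qualities):
    """Weakest-link model (assumption): output is capped by the lowest-quality member."""
    return min(qualities)

def with_ai_floor(qualities):
    """Assumed effect of AI: every member is lifted to at least the pre-AI median."""
    floor = median(qualities)
    return [max(q, floor) for q in qualities]

# Hypothetical quality scores (0-100) for a 10-person team; not data from the essay.
team = [40, 55, 60, 62, 65, 68, 70, 75, 80, 95]

before = team_output(team)                # all-human team, dragged by its weakest member
after = team_output(with_ai_floor(team))  # same team with the AI-raised floor

print(f"all-human team output: {before}")
print(f"AI-assisted team output: {after}")
print(f"ratio: {after / before:.2f}")
```

Under these assumptions the strongest member's score never changes; the whole gain comes from the floor moving up, which is the mechanism the paragraph describes.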
Old training doesn’t include AI use. What middle managers were trained on for the past 10-20 years is router capability: intake, break-down, track, report-up. There is no “AI use” anywhere in that training, because the old training’s object was stable roles, stable processes, stable decision chains. AI use is a paradigm that emerged after 2023. Middle managers trained on the old path have good router capability but cannot use AI to rewrite the speed and quality of their own judgment-and-execution loop.
What “the middle has it worst” really means: the core motion he carries (the speed and quality of the judgment-and-execution loop) has had its age-of-AI standard raised dramatically, and the old training never taught him how to hit that new standard. Same resources, same team, same goal: the AI-using middle is done while the old-path middle is still going. Or both are done, and the AI-using middle’s output quality is steady above median while the old-path middle’s is dragged by the weakest link. This is the real disqualification of the middle manager in the age of AI: old-trained middles are sharply behind AI-using middles in both efficiency and quality.
Four Reference Systems for Vocational Training
Japan. The strength of Japanese corporate training is foundational automation plus long-term embedding, shaping a person, through OJT, rotation, and master-apprentice arrangements, into someone who can carry a specific function for the long term. The failure point is that “specific function” itself is disappearing. A person trained to “carry a specific function for thirty years” finds, when AI rewrites that function, that the entire training has failed. Both ends of the long-term-employment assumption fail in Japanese organisations. One end: the firm no longer needs lifetime specialists. The other end: the employee’s specialism no longer fits new roles.
Japan’s failure is not in training poorly. It is that the direction Japanese firms train (deep embedding into a specific role) and the age-of-AI ability distribution (the role itself can be rewritten at any time) point opposite ways. The better Japan is at corporate training, the more brittle it is in the age of AI.
China. Chinese vocational training has a peculiar feature. Over the past 20 years it has concentrated on outputting “execution-type engineers”: people who can execute specific technical tasks, work 996, ship reliably. This layer is more brittle in the age of AI than Japan’s or America’s. The reason is that AI takes over precisely this kind of “standardised execution-type engineering”: writing code, debugging, integration, documentation. An engineer with AI can carry the workload of two or three engineers without it.
The job-demand for median execution-type engineers is likely to compress noticeably in the next few years. The impact of this in China is greater than in Japan or the U.S. because that layer’s share of the labour market in China is bigger. China’s other failure point is similar to Japan’s: the lifetime-employment assumption is also failing. But China’s failure happens faster, because there is no Japanese-style lifetime-employment culture as a buffer.
Other reference systems. American vocational training has always been fragmented, leaning on lateral mobility (job-hopping is normal), external certifications (PMP, CFA, AWS), and an apprenticeship revival. Fragmentation has an unexpected advantage in the age of AI: because Americans are not deeply embedded in specific firms the way the Japanese are, U.S. labour tolerates role change better. But the U.S. failure point is in the floor: people without solid foundations who use AI are led around by it rather than amplified by it.
The Nordics’ strength is mature lifelong-learning entry points. The failure point is small scale and small population covered, so it cannot be a transferable solution for other countries. But the mechanism itself still holds. If it extends to AI use (turn “working with AI” into a population-wide lifelong-learnable capability), it amounts to extending the boundary of foundational automation into the age of AI.
Korea is the extreme form of training disqualification. From hagwon filling all of a child’s time with exam training, to high-intensity employment preparation in university, to “presence-as-value” workplace culture — the entire chain treats high-intensity training as value itself. The result: Korean adult literacy in PIAAC 2023 is 23 points below the previous round. The most heavily trained generation of adults shows the largest decline. The whole chain is failing simultaneously, and the culture self-reinforces it.
Training Wasn’t Wrong
Training’s disqualification is not, in essence, a disqualification of training content. It is the simultaneous failure of training’s premises (stable roles, stable processes, stable decision chains) in the age of AI.
The several generations thoroughly shaped by old training carry the capability of “running fluently inside stable structures.” That capability is not failing; the object the capability was carrying is disappearing. When 20 years of work experience all corresponds to deconstructed roles, rewritten processes, and AI-taken-over decision chains, the person has to redraw his ability boundary. Old training did not teach him how to redraw.
Teaching’s disqualification means the children five years from now have to be re-taught. Training’s disqualification means the several generations on the labour market today have to be re-trained.
Teaching’s disqualification still has a reform path. Curricula can be redesigned, new exam systems built, new teaching methods spread. Results in 5-10 years.
Training’s disqualification has no such reform path. The reason: there is no “age-of-AI training” sitting around as a ready reference. Every organisation, every industry, every specific role has to find on its own what the new training is. The pace depends on AI’s capability frontier, the organisation’s reaction speed, and the individual’s self-rebuild capacity. Many people will be stuck in between old training and new demand.
The cognitive structure inside their heads was built for a world that no longer exists.
Chapter 4 · The Sandwich Generation
The diagnosis has to land on a specific group: the several generations who are currently 25 to 50, already out of K-12, with 20 to 40 years left on the labour market.
Teaching’s disqualification means they were already produced with a mismatch when the old system shipped them. Training’s disqualification means work then shaped them further into a form that is no longer needed. Reform has started, but reform is not for them.
Who They Are
By current age, sub-cohorts:
| Age now | Born | K-12 finished | Entered workforce | Years left to work |
|---|---|---|---|---|
| 25-30 | 1996-2001 | 2014-2020 | 2018-2024 | 35-40 |
| 30-37 | 1989-1996 | 2007-2014 | 2011-2019 | 28-35 |
| 37-47 | 1979-1989 | 1997-2007 | 2001-2012 | 18-28 |
| 47-55 | 1971-1979 | 1989-1997 | 1993-2003 | 10-18 |
Each sub-cohort entered work at a different “pre-AI moment”. The 25-30s entered around 2020 and met ChatGPT essentially as soon as they started working. The 47-55s entered in the 1990s; their entire career’s cognitive mode was shaped for a different world.
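The birth-year and years-left columns of the cohort table can be checked with simple arithmetic. The sketch below rests on two assumptions that the essay does not state but that the table's numbers imply: a reference year of 2026 and a retirement age of 65. The K-12 and workforce-entry columns are rougher estimates and are not recomputed here.

```python
REFERENCE_YEAR = 2026  # assumption: inferred from the table's age and birth-year columns
RETIREMENT_AGE = 65    # assumption: inferred from the "years left to work" column

def cohort(age_low, age_high):
    """Return (birth-year range, years-left range) for an age band."""
    born = (REFERENCE_YEAR - age_high, REFERENCE_YEAR - age_low)
    years_left = (RETIREMENT_AGE - age_high, RETIREMENT_AGE - age_low)
    return born, years_left

# The four age bands from the table.
bands = [(25, 30), (30, 37), (37, 47), (47, 55)]
rows = {band: cohort(*band) for band in bands}

for (lo, hi), (born, left) in rows.items():
    print(f"{lo}-{hi} | born {born[0]}-{born[1]} | {left[0]}-{left[1]} years left")
```

Changing `RETIREMENT_AGE` shifts every years-left figure by the same amount, so the ordering between cohorts does not depend on that assumption.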
Rough estimates by age structure, education, and labour-force participation: China has about 450 million people aged 25-50, of whom roughly 300 million have completed at least high school and are currently in the workforce. Globally there are about 2.7 billion people aged 25-50; of those in industrialised and semi-industrialised economies with formal education and formal jobs, about 1.2 to 1.5 billion.
By the seven mechanisms, what this group was strongly trained on is knowledge, the shallow part of activation frequency, the shallow part of the strategy repertoire, foundational automation, and formal compliance. What they were weakly trained on or not trained on at all is the depth of cognitive decoupling, metacognition, and the AI-use paradigm (which did not exist during their schooling and early career).
This ability structure was designed for a world of stable roles, stable processes, stable decision chains. Before AI it could fully sustain a career. After AI the support starts to fail.
They did not choose, back then, to be trained into who they are now.
A Temporal Mismatch
What the sandwich generation faces is not just an ability mismatch. It is a temporal mismatch.
Pace at which education reform reaches the labour market:
| Reform | Launched | When beneficiaries reach the workforce |
|---|---|---|
| China Double Reduction | 2021 | Early 2030s |
| China Education Strong Nation Plan | 2024-2035 | Mid-2030s to 2040 |
| Shanghai grades 4-7 mandatory AI curriculum | From 2024 | 2032-2036 |
| US federal AI education executive order | 2025 | 2032-2037 |
| Japan generative-AI guidelines | 2024 | Early 2030s |
Pace at which AI is rewriting work:
| AI tool | Public release | Workplace impact appears |
|---|---|---|
| GPT-3.5 | Nov 2022 | From 2023 |
| GPT-4 | Mar 2023 | From late 2023 |
| Industry-specific AI workflows | 2024-2026 | Continuing |
| AI agent workflows | 2025-2026 | Underway |
Education reform runs on a 10-year clock. AI rewriting work runs on a monthly clock. The two timescales differ by two orders of magnitude.
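The "two orders of magnitude" claim follows directly from the two clock speeds. A minimal check, assuming the reform cycle is taken as exactly ten years and the AI cycle as exactly one month (both are round figures from the sentence above, not measured values):

```python
import math

REFORM_CYCLE_MONTHS = 10 * 12  # assumption: a 10-year reform clock, in months
AI_CYCLE_MONTHS = 1            # assumption: a monthly clock for AI rewriting work

ratio = REFORM_CYCLE_MONTHS / AI_CYCLE_MONTHS
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

A ratio of 120 gives a base-10 logarithm slightly above 2, which is what "two orders of magnitude" means here.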
The sandwich generation sits between the two timetables. They are not the beneficiaries of reform; the reforms reach them only after they retire. They are the absorbers of AI’s rewrite of work; that is happening every month.
Education reforms in all four systems share one feature: they target “the next generation of education-system output.” None of them target adults already on the labour market.
Adult retraining is not in the remit of a traditional ministry of education. It belongs to “vocational training” or “lifelong learning,” carried by employers, industry associations, and individuals themselves. But corporate AI training stops at tool-skills, industry associations move slower than national education systems, and “individuals carry it themselves” means most people have no good resources.
The sandwich generation gets pushed off by three layers at once.
Three Forms of Deflection
Government punts to education reform. Education reform offers no direct help to the current workforce; it targets the children who will enter the labour market ten years from now. Government offers “we are reforming” as its response to the disqualification of the labour market, but that response is entirely mistimed. All four systems do this: China’s 2024-2035 Education Strong Nation Plan, the U.S.’s 2025 federal AI-education executive order, Japan’s MEXT AI guidelines, Korea’s 2025 AI digital textbooks. All target students at the basic-education stage. None targets adult cognitive rebuild for the working population. The sandwich generation’s relationship to education reform is, in essence, one of non-coverage. However well reform goes, it has nothing to do with them.
Companies punt to AI training. Most corporate AI training stops at “teach everyone how to write a prompt, how to use this tool, how to summarise documents.” That treats tool capacity as cognitive capacity.
What is actually needed is not tool capacity. It is task-redesign capacity: redraw the downstream boundary of your own work, and judge which tasks go to AI, which go to other people, and which you keep. That demands deep metacognition; no tool training can reach it.
Companies don’t teach this, because it requires redesigning the organisation’s processes. Companies don’t know how to redesign them either. A company will not train its employees to “reinvent your job,” because the company itself doesn’t know what the job should be reinvented into.
The real corporate dilemma: AI requires every role boundary to be redrawn, but the company’s structure, KPIs, and reporting lines are designed around the old roles. Redrawing role boundaries equals redesigning the organisation. The cost and risk of that are too high. So companies pick the path of least resistance: teach employees to use AI tools, pretend that’s enough.
Individuals are punted to “lifelong learning.” A large part of the lifelong-learning market commodifies age-of-AI anxiety into paid courses. What those courses teach is still the knowledge layer and the tool layer, exactly the parts AI has already taken over. People who finish them feel temporary relief from anxiety; their way of working doesn’t change. That is the educational-market form of “rumination consumption” from ISSUE 01: pay, consume, psychological satisfaction, zero transfer. The deeper problem: real cognitive rebuild can’t be done through paid courses. It needs a repeated loop of real tasks plus real feedback; it needs sustained one-on-one mentorship; it needs continuous metacognitive calibration. The cost is extreme, the coverage will inevitably be low, and it cannot be productised. The market looks like it’s solving the sandwich generation’s cognitive-rebuild need; what it’s actually doing is monetising anxiety.
Anxiety and Despair
The sandwich generation is not uniform. Different ages, different roles, different accumulations — completely different positions. The three positions below cover the most representative sub-cohorts.
Entry-level text workers, 22-28. 0-5 years out of school. Work is text-shaped: emails, slide decks, reports, tables, research notes, baseline data analysis, standardised client communication.
Their entire value sits in “baseline execution capacity.” But they’re new on the job and have accumulated nothing yet: no domain repertoire, no work judgment, no internal familiarity with the process, no automation of any skill. “Baseline execution,” in essence, is “being walked through specific tasks by the organisation.” That is the standard onboarding entry point for a new hire.
After AI, that entry is disappearing. This doesn’t conflict with Chapter 3.3, where I said “a one-year-in person who knows AI writes faster and better than a five-year-in person who doesn’t.” That is a positive for the new hire individually: with AI, they can do an old hand’s work. But across the labour market, the picture flips: when a new hire with AI, an old hand with AI, and AI working directly all produce equivalent output, “the new hire can do the old hand’s work with AI” stops being scarce. The labour market already has plenty of people who use AI, and direct AI output is good enough too. The new hire has no comparative advantage.
Their entire output, AI can do. AI writes emails, AI makes slides, AI writes reports, AI organises research notes, AI does baseline analysis, AI drafts client emails — every one faster, more reliable, and more uniform than the new hire. AI’s marginal cost is in tokens, near zero. The new hire’s cost is salary, social insurance, training time, error tolerance. The company’s calculation is simple: rather than spend a year training a new hire to consistent delivery, just have AI do that year’s output directly.
The real consequence is not immediate unemployment. It is that the “accumulation entry” the new hire was supposed to earn in the first few years is gone. Earlier generations used those early baseline-execution years to accumulate a domain repertoire, process familiarity, and work judgment, and so moved up to middle and senior roles. This generation has no such entry. The baseline-execution work has gone to AI; the judgment work, they are not yet entitled to. They cannot “do the grunt work to earn their stripes” the way previous generations did, and they have not earned the cognitive-rebuild ability that the age of AI demands.
Their position is the hardest to resolve in the sandwich generation. They were not thoroughly enough shaped by old training (they hadn’t yet accumulated on the job), and they missed the accumulation entry (AI erased it).
The lived experience: you win a job after sending out hundreds of résumés; a few months in, you realise that AI can write the reports you write. You ask a senior; the senior says “you need accumulation,” but cannot tell you exactly what “accumulation” is. AI-tool tutorials roll out in waves; finishing them eases anxiety briefly, but does not produce accumulation in any real sense.
Middle managers, 37-47. 15-25 years on the job, already moved from specialist to middle management. The core capabilities they were trained on are the running of the judgment-and-execution loop, cross-team coordination, reporting up, managing down.
After AI, the standard for the core motion they carry (the judgment-and-execution loop) has risen sharply, and their old training included no AI use. Old-path middle managers, compared to AI-using middle managers, lag noticeably on both efficiency and quality.
Lived experience: they realise that their younger direct reports, using AI, turn out reports at several times their own throughput. They start trying to use AI, but the use is awkward, clumsy, and slower than their juniors’. Some refuse, insisting “judgment can’t be handed to AI.” Some force themselves to learn but learn slowly. Some hang back and wait for the company to give a clear direction.
Their position has been the most stable in the organisation for the last decade and may be the least stable for the next.
Senior specialists, 47-55. 25-33 years on the job, with relatively thick domain-deep schemas. May be a sub-domain expert, a technical anchor, an industry veteran.
Compared to the previous two groups, they hold one real piece of capital: domain-deep schemas. That layer AI cannot take over. They also hold one real disadvantage: the AI-use gap relative to younger people. A 50-year-old specialist who doesn’t use AI still has valuable domain-deep schemas, but his output speed and breadth are sharply limited. A 30-year-old comfortable with AI may close the gap to within an uncomfortable distance in three years.
They have 10-18 years to retirement. That is enough time to complete a self-rebuild, but it requires actively choosing to rebuild. Most will choose “work the last few years in the way I’m used to.” That choice isn’t itself wrong, but it means this generation’s domain-deep schemas leave with them at retirement, in bulk, without being passed down. That is a hidden cost for organisations in the age of AI.
Where the Responsibility Lands
The sandwich generation is not a forgotten group. They are too large, too loud, too consequential to the labour market and consumer market for anyone to actually forget them.
But they are a deflected group.
The education system isn’t designed for them. They’ve already left the education system; reform targets the system’s next-generation output.
Companies aren’t designed for them. Corporate AI training stops at the tool layer; it does not solve cognitive rebuild. Companies cannot bear the cost and risk of redesigning their processes.
The lifelong-learning market isn’t designed for them. It commodifies anxiety and treats consumption as the solution.
The cognitive rebuild of the sandwich generation lands, at the end, on themselves.
The sandwich generation’s cognitive structure was built for a world that no longer exists. That is not their fault, but the responsibility for repairing the mismatch lands on them.
Chapter 5 · Evolution
The seven mechanisms themselves do not disappear. What gets disqualified is the way of training them, not the mechanisms.
How the Seven Mechanisms Evolve
| Mechanism | Old training object | New training object |
|---|---|---|
| Knowledge | Facts, theorems | Foundational knowledge still required + boundary-of-knowledge judgment + retrieving knowledge through AI |
| Fluid intelligence | Not trained (innate) | Not trained (innate) |
| Activation frequency | Problem bank + basic logic | Problem bank still required, but kept structurally unfamiliar |
| Strategy repertoire | Tricks, schemas | Shallow: AI-workflow templates; deep: domain-deep schemas (untrainable at scale) |
| Foundational automation | Literacy, calculation, reading, rules, ethics | Old basics still required + automation of the AI-use paradigm |
| Cognitive decoupling | Math/physics proof (elite) | Problem-bank activation still core + AI assistance |
| Metacognition | Not trained | General metacognition can be trained at scale + domain metacognition is individual responsibility |
Sort training into three classes.
Class A: standardisable training. Foundational knowledge, rules, ethics, corporate culture, the AI-use paradigm, AI knowledge retrieval, general metacognition. Mass-produced; covers everyone.
Class B: standardisable tools that activate non-standardisable abilities. Problem banks, basic logic. Question authoring, scoring, and coverage are standardisable; the abilities activated (Gf invocation, deep cognitive decoupling) are not.
Class C: not standardisable at all. Domain-deep schemas, in-domain metacognition. Has to be done by the individual; AI is a tool, not a substitute.
Class A and B’s training objects in the age of AI take new shapes. The education system and corporate training both need to rebuild around the new shapes. Class C training is forever individual responsibility. AI changes the tools, not the responsible party.
Foundation, AI Use, AI Knowledge Retrieval, General Metacognition
Three things in knowledge training.
First, foundational knowledge is still required. Someone who doesn’t know Newton’s three laws will believe whatever AI tells him; someone without basic economics cannot judge AI’s policy analysis. This layer can’t be abandoned just because AI carries the knowledge.
Second, the boundary of knowledge. Under what conditions does this knowledge hold; under what conditions does it not. This is the real upgrade for knowledge training in the age of AI. Old training taught knowledge itself; new training simultaneously teaches the conditions of applicability. Someone who knows “X holds under Y” makes better judgments with AI than someone who only remembers “X.”
Third, how to retrieve knowledge through AI. Before AI, knowledge retrieval was a personal skill (search engines, literature databases, professional databases). After AI it becomes a structurable, trainable capability. How to ask so AI returns useful information, how to identify suspicious parts of AI’s output, how to have AI locate the actual primary sources, how to handle cross-domain knowledge connections with AI’s help. In 2026 this is still new; there are no mature curricula. But it will be standard in basic education and vocational training of the future.
Rules, ethics, law, corporate culture. This class of training was needed in the past and is needed now. Employee handbooks, compliance training, corporate culture, social rules. They decide whether AI tools can be used legally and ethically. Not unpacked here.
The AI-use paradigm as new foundational automation. Extend the old “literacy, calculation, reading until you don’t think about it” to “AI calls, AI workflows until you don’t think about them.” Borrow from the Japanese paradigm: high-frequency, broadly applicable, standardisable, memorisation plus practice. What needs to be drilled to automation: knowing what tasks go to AI and what don’t; knowing how to ask AI (structured, context-clear, demand-clear); knowing how to read AI’s output and decide right away whether it’s true; knowing how to use AI to calibrate judgment rather than substitute it; a set of AI-collaboration templates for common tasks; a set of counter-examples (when AI is wrong). The training form: a lot of repetition, standard templates, practice feedback, cross-task transfer. This doesn’t require a high Gf; it requires high repetition plus standardised practice — closely parallel to Japanese vocational on-the-job learning.
General metacognition can be trained. Metacognition isn’t unteachable. It is layered.
General metacognition can be mass-trained: knowledge about how to retrieve knowledge, knowledge about how to ask questions, knowledge about how to judge structure, knowledge about how to do post-mortems. All of these are teachable, drillable, examinable. Their training shape is clear, their evaluation criteria are clear, and there’s a defined path to improvement.
Some people were teaching this layer before AI: information-literacy classes, critical-thinking classes, research-methods classes. But coverage was narrow, because it relied on high-quality teachers, and high-quality teachers are forever a scarce resource.
After AI this changes. AI is both a training tool and a feedback tool. After learning general metacognition, you can immediately calibrate against AI: is the question you asked good, is the follow-up direction right, is the judgment structure clear — AI can give immediate feedback. This is a training loop that didn’t exist before AI.
Domain metacognition remains individual responsibility. “When could I be wrong in this specific domain”; “which step of reasoning needs another check”; “where am I deceiving myself in this task.” This layer depends on real tasks and real failures in a specific domain; you cannot train it independent of the domain.
Curricula come from AI. Traditional curricula update on multi-year cycles. AI tools and paradigms update on multi-month cycles. Any frozen curriculum will lag immediately. The solution: curricula come from AI. Use AI to generate practice content, cases, counter-examples, and evaluation rubrics for the current AI tools. Let curricula update in real time alongside AI’s own evolution. If AI knowledge progress slows in the future (a maturity phase), one can return to frozen curricula. If it keeps accelerating, real-time curriculum updates are mandatory.
AI lets the high-cognition teacher’s coverage scale without limit. The premise of the education system is scale. Scale demands three things: standardisable, examinable, cheap to mass-replicate.
Training people with people has a fundamental problem: people are normally distributed in quality. No teacher-training will guarantee every teacher is above the median. A student in a third-tier city may meet a Chinese-language teacher whose level is below the 50th percentile nationwide. That student’s training quality is bounded by that teacher’s level.
What AI changes: it lets one high-cognition teacher scale his teaching without limit. A genuinely high-level teacher used to reach only the few classes and few hundred students he could teach in person. His teaching method, judgment, and understanding of the discipline could not be copied to other teachers. His cognition was a scarce resource, geographically bounded.
After AI this changes. A high-cognition teacher can have AI learn his teaching methods, judgment criteria, and problem-design philosophy, and then have AI deliver that method, in the form of curricula, problem sets, feedback, and Q&A, to any student. AI here is not the default-level output (default AI is median or slightly above) — it is an amplifier of that teacher’s cognition. The training a student touches via AI can run far above AI’s default level.
This is the real opportunity for education in the age of AI: for the first time, the best teacher’s reach extends from a few hundred to several tens of millions. What is needed: high-cognition teachers willing to structure their methods so AI can learn them, a tool chain that supports it, and a channel for students to access “the AI shaped by the high-cognition teacher.” This is still early in 2026. But the direction is clear.
Question Banks Are Still Necessary
The real value of the problem bank. The argument earlier: good problems manufacture structural unfamiliarity and force students into the working state of Gf. The argument under “question-bank disqualification” was: once problem types are exhaustively patternised, training degenerates into standard responses.
The problem bank itself is not disqualified; what is disqualified is patternised problems. A structurally unfamiliar problem still activates Gf invocation, cognitive decoupling, and the deep layers of the strategy repertoire even in the age of AI. It is a standardisable trigger of non-standardisable abilities. The problem bank is “a standardisable tool that activates a non-standardisable ability.” Authoring, scoring, and coverage are standardisable; the abilities triggered (on-site reasoning, deep schemas) are not. That is why the problem bank is still necessary in the age of AI.
The strength of the Chinese system. Chinese basic education has, over 30 years, built the world’s thickest problem-bank training apparatus. This system shouldn’t be discarded in the age of AI; it should be reorganised. The problem with the old system: after exhaustive patternisation, training degenerates into standard responses. The new direction: keep the problem-bank training mechanism, refresh the contents. Keep large numbers of problems structurally unfamiliar.
Use AI to author problems. The method: generate new problems with AI. Update the problem bank yearly or even monthly. Let the bank’s iteration speed match the student’s drilling speed. Let “drilled-it” become an unreachable state. Before AI this was impossible. Authoring speed was constrained by human teachers; producing one good set took years. After AI it is possible, because AI can mass-generate new problems at low cost.
But there is a guardrail. The key isn’t surface novelty; it is structural novelty. AI can easily generate “surface-novel, structure-old” pseudo-unfamiliar problems. AI authoring has to layer in structural deduplication, difficulty calibration, and human spot-checks; otherwise it’s just old problems in new wrappers.
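A minimal sketch of what the structural-deduplication step could look like, assuming each problem carries a set of tags describing the reasoning steps it requires. The tag vocabulary, the `Problem` class, the Jaccard similarity measure, and the 0.6 threshold are all illustrative assumptions, not anything the essay specifies; in practice the tags themselves would need human spot-checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """A candidate problem plus a (hypothetical) structure signature."""
    text: str
    structure: frozenset[str]  # assumed tags for the reasoning steps required


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two tag sets; 1.0 means identical structure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_structurally_novel(
    candidates: list[Problem],
    bank: list[Problem],
    max_similarity: float = 0.6,  # assumed threshold, to be tuned by spot-checks
) -> list[Problem]:
    """Keep candidates whose structure differs enough from every known problem."""
    accepted: list[Problem] = []
    for cand in candidates:
        seen = bank + accepted  # also compare against problems accepted this round
        if all(jaccard(cand.structure, p.structure) <= max_similarity for p in seen):
            accepted.append(cand)
    return accepted


# Usage: one "new wrapper, old structure" problem and one structurally new one.
bank = [
    Problem("Two trains leave opposite stations...",
            frozenset({"relative_speed", "linear_equation"})),
]
candidates = [
    Problem("Two boats leave opposite docks...",
            frozenset({"relative_speed", "linear_equation"})),
    Problem("Two pipes fill a tank at different rates...",
            frozenset({"rate_addition", "inverse_proportion", "linear_equation"})),
]
kept = filter_structurally_novel(candidates, bank)
print([p.text for p in kept])
```

The boat problem is rejected because its tag set matches the train problem exactly, however different the wording. The pipe problem shares only one tag out of four and passes. Surface text never enters the comparison, which is the point of the guardrail.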
Problem-bank training plus AI auto-authoring equals the real upgrade path for the Chinese system. The upgrade preserves China’s strength in problem-bank training and removes the failure point.
A fundamental change in how to assess. The bank’s upgrade is only half. The other half is the assessment.
Old assessment: test whether the student can independently produce the standard answer. That assessment fails after AI, because AI can supply every standard answer.
New assessment: assess how the student uses AI prompting to solve a problem at a cognitive boundary. Audit the student’s (or in-job trainee’s) full conversation with AI. The audit dimensions: what was the initial question; what judgment did he make after the first AI reply; what was the direction of his follow-up; how did he verify AI’s output; where did he identify AI’s error or limit; how did the final conclusion form.
What this assessment measures is judgment, decision, depth — the application of cognitive decoupling and metacognition. It is harder to substitute with AI than answers, because what it scores is process and judgment chain, not answer and result.
The assessment has to live in a controlled environment: complete logs, random follow-ups, on-site oral defence, version records, restricted external tools, required explanation of every judgment-pivot. Otherwise the student can have AI generate a plausible prompt trace; the process itself can be faked.
This is not imagined from scratch. AI competency frameworks, AI Assessment Scales, AI literacy rubrics, and the return of oral defence have all surfaced. They all point at the same thing: assessment in the age of AI cannot only look at the final answer; it has to look at how the person uses, constrains, questions, and verifies AI.
What’s missing is the synthesis of those scattered practices into a coherent process audit: audit the student’s initial question, the judgment after the first reply, the follow-up direction, the verification, the error identification, the conclusion formation, and the ability to explain every judgment-pivot live.
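One way to picture such a process audit is as a record with one score per audit dimension plus a log of the student’s turns. The sketch below is only an illustration: the class names, the 0-4 scoring scale, and the `is_judgment_pivot` flag are assumptions introduced here, not part of any existing framework.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One student message in the logged AI conversation (hypothetical schema)."""
    student_text: str
    is_judgment_pivot: bool = False  # student changed direction or rejected AI output
    explanation: str = ""            # student's live explanation of that pivot


@dataclass
class ProcessAudit:
    """One score per audit dimension; the 0-4 scale is an assumption."""
    initial_question: int
    judgment_after_first_reply: int
    follow_up_direction: int
    verification: int
    error_identification: int
    conclusion_formation: int
    turns: list[Turn] = field(default_factory=list)

    def total(self) -> int:
        """Sum of the six dimension scores."""
        return sum((
            self.initial_question,
            self.judgment_after_first_reply,
            self.follow_up_direction,
            self.verification,
            self.error_identification,
            self.conclusion_formation,
        ))

    def unexplained_pivots(self) -> list[int]:
        """Indices of judgment-pivots the student has not yet explained."""
        return [
            i for i, t in enumerate(self.turns)
            if t.is_judgment_pivot and not t.explanation.strip()
        ]


audit = ProcessAudit(
    initial_question=3,
    judgment_after_first_reply=2,
    follow_up_direction=3,
    verification=1,
    error_identification=2,
    conclusion_formation=3,
    turns=[
        Turn("Why does the estimate differ from the textbook value?"),
        Turn("Switching approach; the reply assumed constant density.",
             is_judgment_pivot=True,
             explanation="The first reply ignored the temperature gradient."),
        Turn("Dropping the second source.", is_judgment_pivot=True),
    ],
)
print(audit.total())               # 14
print(audit.unexplained_pivots())  # [2]
```

Keeping the pivots separate from the scores reflects the controlled-environment requirement: an examiner can take the list of unexplained pivots straight into the oral defence and ask about each one.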
If Chinese basic education and vocational training are to do a real upgrade, the direction is: refresh problem-bank content via AI in real time; shift assessment from “independently give the standard answer” to “audit AI-collaboration process.”
Basic logic training. Mathematics, formal logic, basic argumentation. This layer is the foundation of cognitive decoupling. Chinese basic education has invested heavily here. The new direction is to keep the training shape “unfamiliar”: keep the student in the state of “must reason on the spot” each time he faces a problem. Math Olympiad, formal logic, and geometry proofs, programs once considered elite, should return to basic education in the age of AI. They don’t lift the Gf ceiling, but they raise Gf activation frequency and exercise cognitive decoupling, which are the scarcest layers after AI.
Possibility for global borrowing. If the Chinese problem-bank apparatus completes the upgrade (bank iteration plus assessment change), it can become a real reference for other countries. Japanese and Nordic basic education don’t train Gf invocation and cognitive-decoupling activation as intensely as China. The American top end does; everything below the median doesn’t. If this upgrade succeeds, Chinese basic education for the first time has the chance to be a globally borrowable model. The premise is doing the upgrade actively, not waiting for reform to arrive on its own.
Corporate Know-how Distillation
Training disqualification has no ready-made reform path because there is no “age-of-AI training” sitting around as a reference.
What companies are doing now. Many leading firms are already distilling workflow and role know-how. They use AI to document, structure, and make callable their workflows, role responsibilities, internal decision paths, expert experience, and domain know-how. Concrete forms include: capturing real workflows via process mining or task mining; piping internal wikis, emails, meeting notes, project docs, and decision records into corporate AI; having AI synthesise “in this situation, use this approach” guidance from case libraries; distilling reusable rules from high-frequency situations in customer service, sales, IT, audit, and operations; and, in some cases, recording the actual operational steps of experts or senior employees to make tacit processes and judgment explicit.
This work, between 2024 and 2026, has moved from experiments to platform deployments at leading firms. It is not yet mature everywhere, but the direction is clear: companies are turning the know-how that used to live scattered across employees, processes, documents, and systems into AI-callable organisational context.
The current goal-setting is still narrow. In the dominant corporate narrative, this work is positioned for two goals. First, efficiency lift: let employees use AI to finish current work faster, treat the distilled knowledge base as assistant, search entry, agent context, or automation material. Second, AI-use management: measure whether employees are using AI, how deeply, whether adoption rate and ROI are improving.
Both goals stand, but they are still optimisations from the current view. The deeper layer is that companies have not fully treated those distilled workflows and know-how as the foundation of a future training system. What really matters is not “let today’s employees finish work faster,” it is: when roles, processes, and decision chains are all rewritten by AI, how does the company turn the organisation’s most scarce judgments, paths, experiences, and tacit rules into training materials that new hires can learn, old hands can transfer, and teams can reuse.
The real meaning of this is not in the knowledge base, it is in the training source.
Recall the earlier point: AI extends the high-cognition teacher’s reach. Inside a company, the “high-cognition teacher” isn’t necessarily a teacher. It is the company’s best middle managers, most senior experts, and the old hands with the most accumulated know-how. Before, their judgment methods, experience, and decision criteria could be transmitted only through master-apprentice relationships and long co-presence, one-on-one or in small groups. That is why Japanese corporate training works: tacit knowledge deposits over years of co-presence. It is also why the Japanese model can’t scale: one-on-one transmission covers too few people.
Distilled into AI, this changes. What a new employee touches via AI is no longer just AI’s default output, but output shaped by the company’s best people, most effective processes, and longest-accumulated cases. What a middle manager learns via AI is no longer just generic management knowledge, but the specific judgment frameworks, decision criteria, and process logic of this firm.
That mechanism pushes AI from efficiency tool to training tool: AI’s default is generic median output; corporate know-how distillation calibrates it to high-quality experience inside the organisation. The original argument was that AI helps employees work; now we can also say that AI helps employees be trained.
The real corporate-training transformation. Three moves.
First, treat know-how distillation as the core of the training system, not just a current efficiency tool. The distilled material has to be designed against the standard of “use AI to train new hires”: structured enough that AI can understand it, can use it to author problems, can use it to assess learners.
Second, shift training from classroom lecture to AI-assisted on-the-job practice. New hires, on entering the role, go straight into “AI-assisted real-task” mode and skip the week of classroom training. AI tutors them, corrects them, sets them problems, and audits their collaboration process based on the company’s distilled know-how. This mechanism lets new hires reach in three months what the old system needed two years to produce.
Third, shift assessment from “scoring outcomes” to “scoring AI-collaboration processes.” Same as the assessment reform above. Audit the new hire’s conversation with AI: question quality, follow-up direction, judgment depth, ability to identify AI errors, formation of the final conclusion. These dimensions reflect real judgment.
Where most companies still are. Most are still at “teach employees to use AI tools.” A small set of leading firms are doing know-how distillation, but the goal is still current efficiency, not future training. As of 2026, essentially no company treats distillation as the training base.
This will happen, because the logic is unavoidable. Three factors gate the speed.
First, the quality of distilled know-how depends on the company’s pre-existing documentation level. A company with a messy internal knowledge base produces messy AI training material.
Second, whether senior management recognises this as the training base. Most companies still treat it as an efficiency tool.
Third, whether employees cooperate. A senior employee who fears AI learning his know-how then making him replaceable will not genuinely cooperate with distillation.
These three gate the speed of corporate-training transformation. It will happen, but slower than the technology can move. Until companies complete this upgrade, the sandwich generation has to rebuild on its own.
The Non-Standardisable Layer: Domain-Deep Schemas and Domain Metacognition
Class C training will never scale. As argued: domain-deep schemas need long real tasks plus real failures in repeated feedback; in-domain metacognition cannot be assessed, it happens inside each person’s own judgment process. These two layers are forever individual responsibility.
The shape of AI-assisted self-training. Picking up the moves ISSUE 01 and ISSUE 02 already discussed: have a refutable initial judgment first; use AI to find counter-examples, do reviews, decompose variables; revise the judgment model under refutation; abstract experience into structurable rules; reconnect learning to real tasks.
Self-training for each mechanism.
Gf and its activation frequency: set yourself daily small tasks that “must be solved by reasoning.” Don’t let yourself ask AI by reflex; think for 5 minutes first. AI’s role here is counter-example provider. This isn’t just mechanically lifting Gf’s activation frequency; pushing Gf invocation repeatedly also pushes you closer to your Gf ceiling.
The depth of cognitive decoupling: when you have a judgment about a specific problem, ask AI to “abstract the core structure of this judgment” and check whether that abstract structure applies in other domains. This is the “cross-domain isomorphism” training from ISSUE 01.
Domain metacognition: after every important judgment, have AI run a post-mortem with you. What are the key assumptions of this judgment, what evidence supports it, what evidence opposes it, how would the judgment shift if the evidence changed.
Domain-deep schemas: depend on real tasks and real failures. AI cannot replace the failure itself, but it can speed up reflection. After each completed real task, have AI walk you through “if I redid it, which step would change.”
AI is a calibration tool, not a substitute. The lived experience: you have a judgment on a problem; you ask AI to refute it. Once AI gives the refutation, you decide whether the refutation holds. If it holds, revise the judgment; if not, argue why not. Each pass through that loop trains in-domain metacognition. Cost: AI’s token cost is very low, but your time cost is not. Each post-mortem costs 30 minutes to a few hours. This cannot be scaled. It also doesn’t need to be scaled. It is a series of specific conversations between an individual and AI — the individual’s own cognitive rebuild.
Who Can Do This
One concrete baseline picture: the person who scores 70-80 without drilling. Everyone has met this kind of person. In secondary school and college, doesn’t drill problems, doesn’t pay attention in class, remembers a limited number of formulas and methods, and yet on test day pulls 70-80.
Three things in their cognitive setup.
Higher Gf ceiling. Faced with the same structurally unfamiliar problem, he doesn’t need to have seen the type before to reason it out on the spot.
A small but high-quality strategy repertoire. The “few formulas and methods” he remembers are deeply internalised core structures, not shallow tricks. He uses domain-deep schemas: knows what to use when and what to avoid.
Stronger in-domain metacognition. He knows when he’s deceiving himself, knows which step of reasoning needs another check, knows how to allocate exam time.
These three are precisely the abilities the age of AI prices on their own. This kind of person will be in a far better position in the age of AI than someone who reaches the same score by drilling.
But there are two boundaries.
First boundary: the transfer from exam scenes to real work scenes is not automatic. Exam scenes are structured and come with model answers; real work is unstructured and you have to define your own metrics. Some high-Gf people who scored well in exams stay in “wait for the task to be assigned” mode at work.
Second boundary: this kind of person is still rare at the top of the score distribution. Most gaokao top scorers and recommended-admission students are heavy-drilling-plus-high-Gf combinations. “Low-drill, high-score” is statistically a minority.
A more precise judgment: in the age of AI, the market value of the “low-drill plus mid-to-high score” cognitive mode will be sharply higher than that of the “heavy-drill plus high-score” combination. The former is close to the cognitive shape the age of AI prices on its own; the latter is precisely what AI takes over. But to genuinely stand out in a career, the former still has to actively transfer the cognitive mode to real tasks after leaving the exam system. That doesn’t happen automatically.
The threshold isn’t age, credential, or industry. Inside the sandwich generation, the difference between those who can complete a self-rebuild and those who can’t is not age, not credential, not industry. A 50-year-old engineer may complete it; a 30-year-old PM may fail.
The threshold is five things.
A. Can produce a refutable initial judgment. B. Can revise the model from being refuted. C. Can abstract experience into rules. D. Uses AI to find counter-examples, do reviews, run post-mortems. E. Can reconnect learning to real tasks.
None is hard alone. Together they take long persistence and real pain.
The “low-drill, score-70-80” person naturally has some of A, B, C. D and E need active cultivation. The AI tooling is new; reconnecting learning to real tasks is everyone’s individual choice.
A relatively positive judgment. The number of sandwich-generation people who can complete the non-standardisable (Class C) self-rebuild is small. That is unavoidable, because that layer will never scale.
But Class A and B training can, in the age of AI, hit high coverage at stable quality. Which means: as long as the AI turn in Class A and B training happens, large numbers of people who are basically usable and have basic metacognition will be produced. They may not have completed deep self-rebuild, but they can complete basic AI use, hit above-median work output with AI assistance, and keep basic judgment. Those people still have a place in the labour market.
The premise is that the Class A and B turn actually happens. That turn covers the AI-use paradigm as new foundational automation, AI knowledge retrieval, general-metacognition training, the problem bank combined with AI auto-authoring and AI-collaboration process audit, AI extending the high-cognition teacher’s reach, and corporate know-how distillation. As of 2026 all of these turns are at an early stage, with no mature practice yet, but the path is clear.
If those turns succeed, the age of AI will not produce “most people can’t make a living.” What will appear is layering: a small group of deep-rebuilders at the top, a large group of basic-turners stable above the median.
A further premise is that AGI hasn’t arrived. If AGI arrives in 5-10 years, all of these paths need rewriting. The AGI discussion isn’t unpacked here.
Education’s Lag, and the Scale of This
Education systems are, by design, lag products. The people they train enter the labour market 5-10 years later, on the assumption that the labour market is relatively stable in the meantime.
That assumption held for the past 100 years. The age of AI breaks it for the first time. AI rewrites work on a monthly clock; any 5-10-year curriculum is already out of date when its graduates reach the labour market.
Every wave of reform arrives as a solution to the previous era.
The concrete meaning for the sandwich generation: don’t wait.
Don’t wait for government reform (10-year cycle, doesn’t target working adults). Don’t wait for corporate training (stuck at the tool layer). Don’t wait for market products (they sell commodified anxiety).
The only thing that can start now is AI-assisted self-training. It doesn’t depend on any external system being ready.
Back to the Preface. Yang Jian wanted usable officials. Frederick wanted usable subjects. Ford wanted usable workers. Every time teaching and training were established, they were answering “can this person be used.”
April 2026. A 35-year-old sits at his desk and watches an AI agent finish what he had spent ten years getting fluent at. He knows the abilities he was trained on are no longer the abilities the market prices on their own.
This generation has not been given a new map. The map they were given describes a terrain that no longer exists.
Teaching and training have not lost meaning. They have lost the credentials they used to confer.
Some things remain necessary inside the old credentials: foundational knowledge, rules, ethics, basic logic, problem-bank activation. Some things have to be built new: the AI-use paradigm, AI knowledge retrieval, general-metacognition training, AI-collaboration process auditing, AI-assisted self-training, corporate know-how distillation.
This rebuild will happen. It will be slower than the technology, and far slower than the sandwich generation can bear.
For today, do today’s work. Tomorrow’s work, tomorrow’s people will do.
Sources and Citations
The principal factual claims in this essay rely on the sources below.
On the three historical pivots in the Preface
- Yang Jian (Emperor Wen of Sui), in the seventh year of Kaihuang (587 CE), issued a decree instructing each prefecture to “annually present three” candidates for the xiucai examination. Sui Shu, Annals of Gaozu. There is long-standing scholarly debate over the start year of the imperial examination; some scholars place its formal establishment in the Daye era of Emperor Yang of Sui (605 onward), with the jinshi category. This essay takes the 587 “annual presentation of three” as the proto-form of subject-based recruitment — the conventional narrative.
- Frederick II (“the Great”) issued the Generallandschulreglement (General School Regulations) on 12 August 1763 — a key text in the institutionalisation of Prussian state-mandated schooling: James Van Horn Melton, Absolutism and the Eighteenth-Century Origins of Compulsory Schooling in Prussia and Austria (Cambridge University Press, 1988). Note: Frederick William I had already issued an early compulsory-schooling edict in Prussia in 1717; the Massachusetts Bay Colony had a still-earlier compulsory-education act in 1642. The 1763 regulation is not “the world’s first national-level compulsory-education law” but a landmark text in the early institutionalisation of state compulsory schooling.
- Henry Ford’s 1913 Highland Park moving assembly line compressed chassis assembly time from 12.5 hours to 93 minutes; the $5/day wage policy was announced in January 1914; the Sociological Department (50 home-visit investigators) was formed in 1914; the Ford English School taught English and American civics. Sources: The Henry Ford digital archives (Sociological Department & English School collections); Stephen Meyer III, The Five Dollar Day: Labor Management and Social Control in the Ford Motor Company 1908-1921 (SUNY Press, 1981). The founding year of the Ford English School appears as both 1913 and 1914 in different archival entries; this essay uses the integrated narrative of “1913 line + 1914 Sociological Department expansion.”
On the seven cognitive mechanisms (Chapter 1)
- The fluid (Gf) / crystallised (Gc) intelligence split: Raymond B. Cattell, “Theory of Fluid and Crystallized Intelligence: A Critical Experiment,” Journal of Educational Psychology 54 (1963): 1-22, and the Cattell-Horn-Carroll lineage that followed.
- Tacit knowledge (anmokuchi): Michael Polanyi, The Tacit Dimension (Routledge, 1966) is the source; Ikujiro Nonaka’s application in corporate knowledge management appears in Nonaka & Takeuchi, The Knowledge-Creating Company (Oxford University Press, 1995).
- The “cognitive decoupling” framework comes from Offbook Press ISSUE 01, On Cognitive Decoupling. “Activation frequency,” “strategy repertoire,” “foundational automation,” and “metacognition” are organising labels this essay uses to integrate related concepts in educational psychology; they do not map exactly to a single academic theory.
On the international assessments (Chapter 1)
- PISA 2022: OECD (2023), PISA 2022 Results (Volume I): The State of Learning and Equity in Education, OECD Publishing, Paris, https://doi.org/10.1787/53f23881-en. Mainland China did not participate in 2022. The mainland-China B-S-J-Z (Beijing, Shanghai, Jiangsu, Zhejiang) figures of 591 / 555 / 590 (mathematics / reading / science) used here are from PISA 2018: OECD (2019), PISA 2018 Results.
- PISA 2022 creative-thinking assessment: OECD (2024), PISA 2022 Results (Volume III): Creative Minds, Creative Schools.
- PIAAC 2023 (the OECD’s second-round Survey of Adult Skills): OECD (2024), Do Adults Have the Skills They Need to Thrive in a Changing World? Survey of Adult Skills 2023. Korean literacy is down 23 points relative to 2012; numeracy is also down. https://www.oecd.org/en/publications/survey-of-adults-skills-2023-country-notes_ab4f6b8c-en/korea-republic-of_5f95963c-en.html
- TIMSS Video Study 1995 / 1999: James W. Stigler & James Hiebert, The Teaching Gap: Best Ideas from the World’s Teachers for Improving Education in the Classroom (Free Press, 1999); James Hiebert et al., Teaching Mathematics in Seven Countries: Results from the TIMSS 1999 Video Study (NCES, 2003). The 1995 study covered Germany, Japan, and the U.S.; the 1999 study expanded to seven systems: Australia, Czech Republic, Hong Kong, Japan, the Netherlands, Switzerland, and the U.S. (without Germany). “Shallow teaching syndrome” comes from the 1999 Australian national report: Hilary Hollingsworth, Jan Lokan & Barry McCrae, Teaching Mathematics in Australia: Results from the TIMSS 1999 Video Study (ACER, 2003).
- NAEP 2024: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, 2024 NAEP Reading & Mathematics Assessment Results, https://www.nationsreportcard.gov/. The 4th-grade reading 10/90 percentile gap is 107 points.
On the four-system education and training reforms (Chapters 1 & 4)
- China’s Double Reduction policy: July 2021, Opinions on Further Reducing the Workload of Compulsory Education Students and the Burden of Off-Campus Tutoring (CCP Central Committee General Office and State Council General Office).
- China’s Education Strong Nation Plan (2024-2035): issued by the CCP Central Committee and the State Council in January 2025.
- China’s AI + Education Action Plan: jointly issued in April 2026 by the Ministry of Education, NDRC, MIIT, MoST, and the National Data Administration; reference number Jiao Ke Xin [2026] No. 1.
- Shanghai’s grades 4-7 mandatory AI curriculum and Huangpu District’s “Climbing the Peak” program: published documents of the Shanghai Municipal Education Commission and the Huangpu District Education Bureau.
- U.S. federal AI-education executive order: “Advancing Artificial Intelligence Education for American Youth” (The White House, April 2025).
- Japan’s GIGA School plan (2020-2024 nationwide K-12 device deployment): MEXT.
- Japan’s MEXT December 2024 Guidelines on the Use of Generative AI at the Primary and Lower-Secondary Stages: MEXT.
- Japan’s AI Education Accelerator Program: expanded teacher AI training through industry partnerships (including SoftBank Robotics among others). Specific scale figures (e.g. “about 50,000 teachers trained”) currently appear only in secondary sources (industry blogs and media reports) and have not been directly verified against MEXT official documents; the essay therefore uses qualitative phrasing.
- Korea’s AI Digital Textbook (AIDT): rolled out in stages in March 2025 across selected subjects; downgraded by the National Assembly within 2025 to “supplementary materials” (no longer statutory textbooks) in light of effectiveness concerns and teacher feedback, with budget and coverage substantially reduced. Sources: Korean Ministry of Education announcements and related National Assembly legislative records.
On the cognitive-mechanism analysis of gaokao math (Chapter 2)
- Analysis by the author’s team of cognitive mechanisms in 2021-2025 gaokao math papers. Sample: 567 problems across 7 distinct paper variants: Shanghai, Beijing, Zhejiang, Quan-Guo-Jia (National Paper A, Sciences), Quan-Guo-Yi (National Paper B, Sciences), New Gaokao Paper I, New Gaokao Paper II. For each problem, a v2 calibrated classifier outputs a verdict (GOOD / MIXED / BAD), a primary_mechanism (one of the seven mechanisms), and a crammer_gap (drilling premium), among other dimensions. Data assets: scored.jsonl (567 lines) plus findings.md (project conclusions document).
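For readers who want to check the tallies against the data file, a minimal sketch of how a file shaped like scored.jsonl could be summarised follows. It assumes, without confirmation from the project, that each line is one JSON object and that the keys are spelled exactly `verdict` and `primary_mechanism`. The function name `summarize_scored` and the placeholder mechanism labels are hypothetical, and the sample rows are invented for illustration, not taken from the 567 scored problems. The `crammer_gap` field is left out because its type is not stated.

```python
import json
from collections import Counter, defaultdict


def summarize_scored(lines):
    """Tally verdicts overall and per primary mechanism from JSONL lines.

    Assumes each non-blank line is a JSON object with the keys
    "verdict" and "primary_mechanism" (key spellings are an assumption).
    """
    overall = Counter()
    by_mechanism = defaultdict(Counter)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        verdict = record["verdict"]
        overall[verdict] += 1
        by_mechanism[record["primary_mechanism"]][verdict] += 1
    return overall, dict(by_mechanism)


# Invented sample rows with placeholder mechanism labels; not project data.
sample = [
    '{"verdict": "GOOD", "primary_mechanism": "mechanism_a"}',
    '{"verdict": "BAD", "primary_mechanism": "mechanism_b"}',
    '{"verdict": "GOOD", "primary_mechanism": "mechanism_a"}',
    '{"verdict": "MIXED", "primary_mechanism": "mechanism_b"}',
]
overall, by_mechanism = summarize_scored(sample)
```

With the real file, the same function could be called on an open file handle, for example `with open("scored.jsonl", encoding="utf-8") as f: summarize_scored(f)`, since iterating a text file yields one line at a time.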
On AI and the labour market (Chapters 3 & 5)
- The judgment that “AI lifts efficiency dramatically while keeping quality at or above the median” is drawn from the analysis of AI’s impact on workplace output in Offbook Press ISSUE 02 (Rebuilding Learning) and ISSUE 03 (Breakdown of Firms).
Continuities with earlier issues of Offbook Press
- “Cognitive decoupling,” “rumination consumption,” and “cross-domain isomorphism” come from ISSUE 01, On Cognitive Decoupling.
- “AI review loop” and “AI lifts efficiency dramatically while keeping quality at or above the median” come from ISSUE 02, Rebuilding Learning.
- “Organisational dissolution,” “the resistance to AI landing inside large organisations,” and the early discussion of “know-how distillation” come from ISSUE 03, Breakdown of Firms.
- “Form being able to be produced independently of substance” and the training of “authorship” come from ISSUE 04, Mirage of Form.
This essay paraphrases the sources above conceptually rather than quoting them directly. For verification of original wording, consult the original links and publications.
ISSUE 05 ends · Offbook Press · 2026