Vibe Coding Won't Work

Posted by eagleboost on August 16, 2025

I watched Sam Altman's interview with a YouTube host on Huge Conversations, following the release of GPT-5.

Shortly after the beginning of the interview, he said:

The thing that I am most excited about is this is a model for the first time where I feel like I can ask kind of any hard scientific or technical question and get a pretty good answer. And I’ll give a fun example: actually when I was in junior high, or maybe it was ninth grade, I got a TI-83, this old graphing calculator. And I spent so long making this game called Snake. It was a very popular game with kids in my school. And I was proud when it was done. But programming on a TI-83 was extremely painful. It took a long time. It was really hard to debug and whatever.

It's surprising that, to demonstrate GPT-5, the most powerful AI model ever created, Sam Altman chose something GPT-3.5 could already do years ago. (He went on to say that an early version of GPT-5 reproduced the game perfectly in about seven seconds.) Snake is neither a complex scientific problem nor a complex technical one. Even allowing for the need to keep the example accessible to a general audience, there must be better ones to cite.

One of OpenAI’s co-founders, Andrej Karpathy, earlier proposed the concept of “Vibe Coding”:

“You fully give in to the vibes… forget the code even exists.”

As he also put it, “the hottest new programming language is English.”

Undoubtedly, Sam Altman didn't write a single line of code; he just gave an instruction, and GPT-5 produced a Snake game. Is this Vibe Coding? I don't think so. Models like GPT have long been trained on code like the Snake game. Generating it in one shot, without even needing to spell out requirements, hardly counts as coding.

Whether Vibe Coding is useful is another question, but by definition, it should at least follow this workflow:

Human describes requirements → AI generates code → Human tests → Human provides feedback → AI iterates and optimizes
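The loop above can be sketched in code. Everything below is invented for illustration: `toy_model` stands in for the AI and `human_test` for the human tester; this toy "model" only fixes its bug once it receives explicit feedback, which mirrors the point that output quality tracks input quality.

```python
def toy_model(requirements, feedback):
    """Stand-in for the AI: returns source code as a string.

    It only produces the correct version after precise feedback arrives.
    """
    if any("subtract" in note for note in feedback):
        return "def add(a, b): return a + b"   # fixed after feedback
    return "def add(a, b): return a - b"       # first, buggy attempt

def human_test(code):
    """Stand-in for the human: run the generated code, report pass/fail."""
    ns = {}
    exec(code, ns)
    if ns["add"](2, 3) == 5:
        return True, ""
    return False, "bug: it subtracts instead of adding"

def vibe_coding(requirements, model, tester, max_rounds=5):
    """The workflow: describe -> generate -> test -> feedback -> iterate."""
    feedback = []
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = model(requirements, feedback)   # AI generates
        passed, note = tester(code)            # human runs and tests
        if passed:
            return code, round_no              # done
        feedback.append(note)                  # human feedback; AI iterates
    return code, max_rounds                    # best effort
```

In this toy run the loop converges in two rounds, but only because the tester's feedback named the exact defect, which is precisely the expertise question discussed below.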

So, is Vibe Coding useful? Or is it even feasible? My answer is:

For non-professionals creating simple prototypes, it’s somewhat feasible—and that’s it.

A simple test is enough to verify this claim.

I used an interview question I designed to assess candidates' knowledge: implementing a user management window with basic functionality. The AI assistant was Claude 4 Sonnet, widely regarded as the strongest model for coding.

Round 1: High-Level Requirements

I provided a very general description of the task. Since this is an extremely common scenario, the AI quickly generated a mediocre implementation—better than most average interviewees, but the code quality was nothing impressive. Due to the vague requirements, corners were cut (e.g., input validation was ignored).

The reason is simple: if humans don't ask the right questions, the AI won't give thoughtful answers. In software development, users rarely mention details like input validation or real-time feedback when describing requirements. Experienced business analysts can translate user needs into actionable documentation, but they, too, often skip such details; those only surface in developers' minds during actual coding.

So, what role does the human play in Vibe Coding? Non-experts can’t foresee details, so their requirements are vague, leading to mediocre outputs. Higher-quality results demand higher-quality input, which requires expertise.

Round 2: Detailed Requirements (500+ words)

This time, I gave the AI a description roughly 500 words long, covering:

  • UI styling
  • Data formats
  • Input validation
  • Real-time feedback
  • …and more

The AI’s output improved significantly—it used polymorphism for different modes and data templates for the view layer.
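As a rough illustration of what that pattern looks like (all names here are hypothetical, not taken from Claude's actual output): each mode is a subclass with its own save behavior, and the view layer selects a rendering template per field type.

```python
from abc import ABC, abstractmethod

class Mode(ABC):
    """Polymorphic editing mode: Add and Edit differ only in save()."""
    @abstractmethod
    def title(self): ...
    @abstractmethod
    def save(self, user, store): ...

class AddMode(Mode):
    def title(self):
        return "Add User"
    def save(self, user, store):
        store.append(dict(user))          # insert a new record

class EditMode(Mode):
    def __init__(self, index):
        self.index = index                # which record is being edited
    def title(self):
        return "Edit User"
    def save(self, user, store):
        store[self.index] = dict(user)    # overwrite the existing record

# "Data template": pick how to render a value from the field's kind.
TEMPLATES = {
    "text":  lambda v: str(v),
    "email": lambda v: f"<a href='mailto:{v}'>{v}</a>",
}

def render(field_kind, value):
    return TEMPLATES.get(field_kind, str)(value)
```

The window code can then hold a single `Mode` reference and call `title()` and `save()` without branching on add-versus-edit, which is what made the second-round output noticeably cleaner.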

But when running it:

  1. Input validation had issues. After feedback, it improved.
  2. Error messages didn’t auto-dismiss. Multiple feedback attempts failed—Claude even started hallucinating. I had to take over and point out the exact problem so it could fix it.
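The behavior I ultimately had to spell out for Claude is simple to state in code. Below is a minimal, hypothetical sketch of auto-dismiss logic only; the class name and API are mine, and a real UI would drive `visible()` from a timer tick rather than polling.

```python
import time

class TransientError:
    """A validation message that hides itself after a timeout."""

    def __init__(self, text, ttl=3.0, now=time.monotonic):
        self.text = text
        self.ttl = ttl          # seconds the message stays visible
        self._now = now         # injectable clock, useful for testing
        self.shown_at = None    # None until show() is called

    def show(self):
        self.shown_at = self._now()

    def visible(self):
        if self.shown_at is None:
            return False
        return self._now() - self.shown_at < self.ttl
```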

The conclusion is clear:

  1. Non-experts (and even junior devs) struggle to articulate requirements from a development perspective.
  2. The AI doesn't truly understand the code it generates (note: generates, not writes). Sometimes it appears to fix an issue, but it is really just swapping in a different solution from its training data. In my case that didn't work; the change needed was small, but if I weren't a programmer, the AI and I would have been stuck going in circles with no way to solve the problem.

From a professional standpoint, while Round 2’s code was better, it was still far from production-ready. This touches on another issue: training quality.

AI coding models are trained on publicly available internet code, but let’s be honest—most online code is low-quality. Some excellent code exists but is niche, carrying little weight in the AI’s “brain.” Unless explicitly prompted, the AI won’t generate it. And many high-quality private codebases are entirely absent from AI’s training.

Thus:

  • Non-experts can only guide AI to produce mediocre results.
  • Programmers can get better code from AI, but when problems arise, they can’t rely solely on verbal instructions to fix them.
  • For experts, Vibe Coding is redundant. AI does save some repetitive work, but these professionals were already building and using their own tools to automate such tasks long before AI. AI is certainly smarter, but its absence wouldn't hurt their efficiency. Besides, AI-generated code often needs extra polishing to meet high standards, which to some degree is a waste of time.

To me, Vibe Coding is a paradox:

  • Non-experts can’t do much with it.
  • Experts don’t need it.

It feels more like a marketing concept to promote AI. As for those hyping it up, I doubt many have done any real testing.

Serious programmers shouldn’t fear job loss (if a role can be fully replaced by AI, it probably wasn’t essential to begin with). Instead:

  1. Keep improving your skills.
  2. Understand AI’s strengths and limits.
  3. Use it as the best tool ever created—but don’t overestimate it.
