
Where to do Vibe Coding

The question is not whether to use it, but where.

Andrej Karpathy introduced the term vibe coding in February 2025, describing the practice of delegating code generation entirely to an LLM and accepting the output with minimal review. You describe what you want, the model writes it, you run it, and you iterate on feel rather than on understanding. The name stuck because it captured something real: a growing number of people were already doing exactly this and calling it something else.

By that time, GPT-4, Claude, and a wave of coding assistants had just crossed a threshold where the output was good enough to be impressive even on the first try, often on tasks that were previously inaccessible to non-specialists. The productivity narrative took over quickly, and vibe coding went from a half-joking description to a semi-serious methodology with advocates and critics. Some people declared software engineering changed forever. Others pointed out the obvious: models hallucinate a lot, and hallucinations in code are syntactically invisible.

The useful question, though, is narrower: for a given task, what is the cost of those hallucinations? Two criteria answer it: criticality (the potential impact of failure) and novelty (how unfamiliar or complex the task is for the model).

The go-for-it zone

Internal tooling is the obvious case. Scripts that rename files, reformat CSVs, generate reports from a database dump, stub out a UI for a design review. The failure mode here is a wrong output you catch immediately, fix in a few minutes, and forget. There is no argument for spending an afternoon writing careful, tested code when the LLM produces something usable in two minutes. The time saved is real and the downside is bounded.

Prototyping sits in the same category. If the point of the code is to answer a question or demonstrate a concept, correctness is provisional by design. A vibe-coded prototype that proves a layout works or sketches an integration pattern has done its job even if you throw it away afterward.

Vibe coding can also work on external-facing code for small companies and individual contributors, especially in front-end development. Low-stakes, non-critical projects such as advertising pages or demos follow patterns that are widely documented on the internet, so the model can draw on many good examples to produce acceptable code quickly.

The caution zone

Business logic is where people get into trouble. You can still take advantage of the model's speed for routine tasks: generating a switch statement or writing documentation is faster than typing it out. The output looks right. It often is right. But LLMs produce boundary condition bugs that are syntactically correct and logically wrong.

We usually catch those mistakes by following code smells, a heuristic that works because a human author's shaky reasoning tends to show up as shaky code. LLM output is stylistically polished regardless of whether its logic holds, so it hides those smells and makes the underlying bugs harder to detect.

A rounding error that affects transactions above a specific threshold; an off-by-one in a date range query that silently drops one day of records; authorization logic that covers the obvious cases but misses a role combination that exists in production. These are hard-to-trace bugs that surface weeks later as incidents taking days to track back to their origin.
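To make the off-by-one concrete, here is a minimal sketch (hypothetical code, not taken from any real incident) of the kind of date-range filter an LLM plausibly produces: it reads as an inclusive range, but the upper bound silently drops the last day of records.

```python
from datetime import date, datetime, timedelta

records = [
    {"id": 1, "ts": datetime(2025, 3, 1, 9, 0)},
    {"id": 2, "ts": datetime(2025, 3, 31, 14, 30)},  # falls on the last day
]

def records_in_range(records, start: date, end: date):
    """Return records between start and end inclusive -- or so the name suggests."""
    lo = datetime.combine(start, datetime.min.time())
    hi = datetime.combine(end, datetime.min.time())
    # Bug: `hi` is midnight at the *start* of the last day, so the comparison
    # silently drops every record timestamped later that same day.
    return [r for r in records if lo <= r["ts"] < hi]

def records_in_range_fixed(records, start: date, end: date):
    lo = datetime.combine(start, datetime.min.time())
    # Fix: the upper bound is midnight at the start of the day *after* `end`.
    hi = datetime.combine(end + timedelta(days=1), datetime.min.time())
    return [r for r in records if lo <= r["ts"] < hi]

march = records_in_range(records, date(2025, 3, 1), date(2025, 3, 31))
# `march` is missing one full day of records; the fixed version returns both
```

Both versions are syntactically correct and both pass a casual read; only the second one matches the docstring.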

At this point, AI tools are still useful, but you have to review their output carefully, and the review is harder than it looks. You cannot just read the code and understand it; you have to reason about the edge cases and the failure modes, and that is a skill that takes time to develop. Proper testing becomes correspondingly more critical and more extensive in these contexts.
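Reasoning about edge cases concretely means testing at the boundaries. A minimal sketch, assuming a hypothetical fee rule with a rate change at 10,000: the checks exercise the threshold itself and its immediate neighbors, not just a typical mid-band value.

```python
from decimal import Decimal, ROUND_HALF_UP

def fee(amount: Decimal) -> Decimal:
    """Hypothetical rule: 1.5% fee, rising to 2.5% for amounts above 10,000."""
    rate = Decimal("0.025") if amount > Decimal("10000") else Decimal("0.015")
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Boundary-focused checks: the threshold and its neighbors are exactly the
# cases where a generated comparison (> vs >=) is most likely to be wrong.
assert fee(Decimal("9999.99")) == Decimal("150.00")    # just below: low rate
assert fee(Decimal("10000.00")) == Decimal("150.00")   # at the threshold: low rate
assert fee(Decimal("10000.01")) == Decimal("250.00")   # just above: high rate
```

If the model had emitted `>=` instead of `>`, only the middle assertion would catch it; a test suite built from "typical" values never would.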

In this zone, you might consider switching from naive vibe coding to what is increasingly being called AI-assisted coding. The video below, from IBM, explains what AI-assisted coding is and how it differs from vibe coding: a more structured interaction with the model that integrates rigorous SDLC practices into the process.

For a detailed exploration of SDD (Specification-Driven Development), see my original article about using SDD in practice.

The no-go zone

In safety-critical applications, the problem with vibe coding is that reliability, in the technical sense required by these domains, is a property you have to prove, not observe.

DO-178C, the standard that governs software in civil aviation, requires traceability from requirements to code to tests, coverage analysis, and formal review at every stage. ISO 26262, its automotive equivalent, adds hazard analysis and functional safety requirements that tie software behavior to physical failure modes. IEC 62443 covers industrial control systems. None of these standards describe a desired outcome and trust that the implementation will follow. They require that you can demonstrate, at audit, how every line of code relates to a verified requirement and what happens when it fails.

An LLM cannot produce that documentation because it does not reason about requirements; it predicts tokens. You can ask it to generate a traceability matrix and it will give you one. It will look plausible and it will be meaningless, because the relationship between the code and the requirement was never established in the first place. A certification auditor will find this in minutes.

Beyond certification, there is the simpler physical argument. A subtle bug in a collision avoidance algorithm or a PLC ladder program can cause a crash, a fire, or an uncontrolled machine. The failure mode is irreversible. No productivity argument survives that tradeoff.

Novelty poses a separate problem. If you have already tried these models on research or genuinely innovative projects, you know that they hallucinate heavily while trying to map a seemingly simple problem onto something that exists in their training data. The model can work from documents already in the project and sometimes manipulate equations you have already established, but it will not go one step deeper than that surface.

The pattern

Vibe coding is a productivity multiplier in low-criticality contexts and a liability in high-criticality ones. The dividing line is set by the same two criteria, criticality and novelty, and the practical rule is simple: the maximum of the two determines your approach. If either is high, vibe coding is a no-go.

For example, prototyping a UI or automating a personal workflow is low risk; writing authorization logic or code for a collision avoidance system is not. If your project scores high on either criticality or novelty, that is a clear signal to avoid vibe coding. The safest path is always determined by the riskiest criterion.
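The max-of-two-criteria rule is simple enough to sketch directly (an illustrative toy, with scores and labels chosen here, not prescribed by any standard):

```python
def vibe_coding_verdict(criticality: int, novelty: int) -> str:
    """Score each criterion 0 (low), 1 (medium), or 2 (high);
    the riskiest criterion -- the max -- decides the approach."""
    verdicts = {
        0: "go for it",
        1: "caution: review carefully and test at the boundaries",
        2: "no-go",
    }
    return verdicts[max(criticality, novelty)]

# A throwaway CSV-reformatting script: low on both axes.
assert vibe_coding_verdict(0, 0) == "go for it"
# Routine business logic: familiar to the model, but failures hurt.
assert vibe_coding_verdict(1, 0).startswith("caution")
# A collision avoidance algorithm: maximal criticality ends the discussion.
assert vibe_coding_verdict(2, 1) == "no-go"
```

The point of using `max` rather than an average is that a low score on one axis never compensates for a high score on the other.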

Much of the confusion in the public debate comes from applying arguments that are correct at one level to a context at a different level. Vibe coding made low-criticality tasks much faster, and that created pressure to pretend that higher-criticality or higher-novelty contexts work the same way. They do not.