Why software won't become disposable despite the rise of AI-assisted coding
Andrej Karpathy calls it “vibe coding”: describing intent in natural language and letting an AI produce working code.
It is easy to conclude that this is the future of software. Cursor hit $500M ARR in twelve months.
Indeed, this was the conclusion I jumped to when I first started thinking about the future of software. But while we can experience vibe coding today, ephemeral software is so far only hype, and in this essay I want to argue that it will not become a reality as it has been pitched.
To be clear, I do believe that software engineering as a whole can be automated; indeed, any role within the process can, and likely will, be performed by AI agents in the near future. My core claim is this: AI does not make software ephemeral. It obviously makes code generation cheaper, but this shifts the bottlenecks to validation, integration, and ergonomics (UX and the like). The hard part of software engineering has never been writing code. It has been discovering correct behavior by resolving edge cases and operational ambiguity when software collides with reality. Neither the duration of this discovery process nor its cost vanishes just because code generation becomes fast.
Instead, I will argue that the future of software is malleable, not ephemeral. By this I mean that all artifacts of software engineering will become more malleable, but code will not become ephemeral, i.e., forgettable or disposable. Code will remain the source of truth, and institutional artifacts will become more important: code, version history, tests, specs, audit trails, and postmortems. The malleable software model persists code and higher-level artifacts together, with massively reduced friction and maintenance costs thanks to AI agents and tools.
Let’s contrast vibe coding and ephemeral software:
Vibe coding. For many small tasks, you can describe intent in natural language and get working code fast and cheaply enough to treat it as a one-off artifact. This is already true.
Ephemeral software. Much more software, including larger and more important projects, becomes throwaway: generated on demand, used once or briefly, and not meaningfully maintained. The persisted layer moves upward into stored prompts, specs, logs, policies, and UI mocks; the codebase is at most cached and can be discarded and regenerated with high trust. I am defining ephemeral software here somewhat generously to engage with the strongest defensible version of the claim, not only its most naive form.
There is a risk of a motte-and-bailey fallacy in the discourse around ephemeral software: evidence for cheap code and fast iteration is treated as evidence that persisted artifact stacks will disappear. Cursor’s ARR, Copilot adoption, AI-generated codebases at YC startups, Claude Code revenue, and the Stack Overflow survey mostly show developers producing code cheaper and faster within durable workflows: pull requests, CI, Git, code review. This is not evidence that software has become disposable, as the ephemeral software hypothesis claims.
To make the discussion more precise, we can decompose “ephemerality” along two axes: regeneration frequency (how often the implementation is regenerated from higher-level intent) and artifact durability (how much of the artifact stack, such as code, tests, specs, and history, is persisted across regenerations).
The strongest form of the ephemeral software hypothesis occupies the extreme of both axes: continuous regeneration with minimal persisted code artifacts. My argument is that this corner is unstable at scale, and systems that start there migrate toward persisted artifacts as they accumulate users, state, and integration complexity. Where a given piece of software falls on these axes depends on its stakes, its integration surface, and how long it lives.
Much of the evidence cited for ephemerality actually demonstrates movement along the regeneration frequency axis without comparable movement along the artifact durability axis. Developers are clearly able to produce code faster. But in most important cases, the code is still versioned, reviewed, tested, deployed carefully, and maintained. That is actually evidence for malleability.
The most defensible form of the ephemeral software hypothesis is therefore softer: not that all software becomes stateless promptware regenerated from vague natural language each time, but that implementation matters much less than today, that teams care less about preserving particular codebases, and that the durable layer moves upward into higher-level artifacts. This softer claim is harder to refute, and I want to engage with it directly. My argument is that even this weaker version runs into hard limits once systems become stateful, integrated, and repeatedly used, and that the zone of software that stays lightweight enough to avoid those limits is narrower than it appears.
Before arguing against ephemeral software, I want to acknowledge why many find it plausible. In a poll about the ephemeral software hypothesis, roughly two thirds of respondents told me this was a straw man: nobody seriously believes software will become disposable. The other third told me it was obviously the future.
The reasoning is straightforward: if software generation cost approaches zero, software maintenance looks like a waste of time.
Why debug code when we can regenerate it? Why refactor what we can redescribe? And crucially, why maintain institutional knowledge about code that appears cheaper to throw away and recreate than to understand?
Andrej Karpathy describes his experience vibe coding in his 2025 year-in-review.
The hypothesis has been explicitly endorsed by credible names in tech and backed by extraordinary capital. Andrej Karpathy has repeatedly called for “super custom, super ephemeral one-off apps by default.”
The combined valuations of top AI coding companies now exceed $60 billion.
Survey data suggests developers are already behaving as if code is becoming more disposable. The Stack Overflow 2025 survey found that 28% of developers use vibe coding professionally, just months after the term was coined.
Perhaps most telling is the empirical data on how code itself is changing. GitClear’s analysis of over 211 million changed lines of code found that code cloning increased eightfold in 2024, while refactoring and reuse declined from 25% to less than 10% of changed lines.
The most expansive phrasing of the ephemeral software case combines Tunguz’s framework (ephemeral apps will outnumber persisted SaaS millions to one), Acharya’s economic argument (the ROI constraint on software creation has been removed), and Karpathy’s experiential claim (code is free, ephemeral, malleable, discardable after single use). Together, they argue not just that ephemeral software is possible, but that the economic and cognitive barriers that previously made software durable were always artificial constraints, and those constraints have now been lifted.
This is a serious prediction, backed by real capital and taken seriously by experienced technologists.
Three observations from this evidence are worth highlighting.
First, the narrative is partially incentive-shaped. If someone’s business monetizes generation cycles, hosted runtimes, and proprietary context stores, then framing software as disposable supports their economics. That does not make the thesis false, but it should make us treat it as partially incentive-driven rather than purely technical forecasting.
Second, there is a rational economic case from the buyer’s side. Organizations often spend far more effort maintaining and adapting software than writing greenfield code. If regeneration reaches “good enough” reliability for non-critical systems at a fraction of that maintenance burden, many businesses will rationally accept the tradeoff. I expect it to drive adoption in the lightweight categories mentioned earlier. However, the savings are less dramatic than they first appear. Verification, integration testing, and incident response (the cost of diagnosing failures in regenerated code) do not disappear when generation becomes cheap. They shift from “maintaining code” to “maintaining trust in (re)generated code.” For big software systems with complex integration surfaces, these residual costs can rival or exceed the maintenance costs they were meant to replace.
Furthermore, many companies actually fail to maintain software. Legacy code persists but is not well-maintained. If the real comparison is “ephemeral vs. rotting,” rather than “ephemeral vs. well-maintained,” the ephemeral case looks considerably stronger. But ephemerality does not solve the problem of rot; it replaces one failure mode—accumulating neglect—with another: regeneration risk. (The right response to rot isn’t disposability but cheaper maintenance, which is precisely what the malleable software thesis proposes later in this essay.)
Third, adoption data deserves scrutiny. Companies with 50–90% AI-generated code, or YC startups with 95% AI-generated codebases, are succeeding. AI-generated code works. But these companies use Claude Code and Cursor within standard engineering workflows: pull requests, code review, CI, Git. “AI-generated” is not the same as “ephemeral.” This is exactly the malleable software thesis: AI produces the code more efficiently, but the code remains a persisted, maintained artifact within standard engineering workflows.
Nobody is running a bank or a hospital on ephemeral software. So instead of claiming to refute something that has not happened, I will argue that ephemeral software faces the same barriers that have hampered software rewrites for the past 50 years, and that these structural barriers do not disappear with cheap generation.
The history of software engineering is littered with failed rewrites. Initially, in 1975, Fred Brooks famously advised “plan to throw one away; you will, anyhow.”
When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
—Joel Spolsky, “Things You Should Never Do”
A bit later, in 2000, Joel Spolsky wrote “Things You Should Never Do,” arguing that rewriting a working codebase from scratch is the single worst strategic mistake a software company can make.
The strongest modern objection to this is obvious: unlike a human rewrite team, an AI can read the entire prior codebase, issue history, incident reports, and postmortems before regenerating. I agree this is a major capability jump, and it will reduce some rewrite costs. But working code has emergent properties that nobody explicitly designed, documented, or even noticed. These properties exist because this code ran against this data in this environment over time. Much of an existing system’s behavior is environmental, implicit, or never fully recorded.
There is an even more relevant historical precedent: compilers. Assembly was once hand-crafted and carefully maintained; now it is generated from higher-level languages and treated as disposable. Isn’t the ephemeral software hypothesis simply predicting another abstraction-layer shift?
The analogy is instructive, but it doesn’t support the conclusion. Every prior abstraction-layer shift in computing went from one formal language to another formal language: assembly to C, C to Python, raw queries to SQL. The higher-level language became the new persisted artifact precisely because it had formal semantics: unambiguous, deterministic, machine-verifiable. The ephemeral software thesis, however, asks for something with no historical precedent: that natural language—inherently ambiguous and underspecified—can serve as the persisted specification layer for important software.
This creates a structural dilemma independent of how capable future AI becomes: regeneration either preserves these emergent behavioral properties—in which case you are investing heavily in continuity, the opposite of ephemerality—or it does not, in which case you are accepting production risk with every cycle. There is no third option.
Even vibe coding, as a weaker form of ephemerality, has received pushback as we have gained experience with it.
A research study analyzing over 5,600 publicly available vibe-coded applications identified more than 2,000 vulnerabilities.
“Don’t remove a fence until you understand why it was built.” —G.K. Chesterton
This is obviously about gaps in institutional knowledge and the need for caution and the right amount of change aversion. One pushback is that change aversion only exists because writing code used to be expensive, and now that generation is cheap, we can drop it. I think this gets it backwards. Change aversion exists not because writing code was expensive, but because discovering correct behavior is expensive. Generation cost was never the dominant term for mature software systems; discovery, validation, integration, and coordination were. Cheap generation removes none of those costs.
A different perspective on the issue is that ambiguity has to go somewhere. In non-ephemeral systems, it is progressively resolved into stable code from tests, schemas, interfaces, and operational practice. Over time, users experience the result as continuity, familiarity, and reliability. In ephemeral systems, that same ambiguity is reintroduced at each regeneration and experienced instead as variance: the same user request can behave slightly differently across runs. This does not depend on system complexity. Any software used more than once creates expectations, and any ambiguity left unresolved in the specification becomes variance across regenerations that violates those expectations. A simple internal tool does not need the state machinery of a payment system for this to matter—it only needs a user who opens it on Tuesday and finds that the table sorts differently than it did on Monday. The barriers below are instances of this mechanism: each describes a domain where unresolved ambiguity produces variance, and where that variance carries a cost.
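As a toy illustration of how unresolved ambiguity becomes variance, consider two regenerations of the same one-line spec, “sort the contacts by name.” Both implementations below are faithful readings of the prose (the function names and data are illustrative), yet the user who saw one ordering on Monday gets a different one on Tuesday:

```python
# Two regenerations of the same prose spec, "sort the contacts by name."
# Both are defensible readings; they disagree on mixed-case data.

def sort_contacts_monday(names):
    # Reading 1: plain lexicographic sort (case-sensitive, ASCII order).
    return sorted(names)

def sort_contacts_tuesday(names):
    # Reading 2: case-insensitive, human-friendly sort.
    return sorted(names, key=str.lower)

contacts = ["alice", "Bob", "carol"]
print(sort_contacts_monday(contacts))   # ['Bob', 'alice', 'carol']
print(sort_contacts_tuesday(contacts))  # ['alice', 'Bob', 'carol']
```

Nothing in the specification distinguishes these; only resolving the ambiguity into persisted code does.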
With this in mind, here are four specific barriers that are primarily structural, that is, unlikely to disappear with better tooling.
Edge cases can emerge from real users doing unexpected things, from third-party services behaving inconsistently, and from data that violates assumptions you did not know you were making. This knowledge accumulates only over time in deployed systems.
Each regeneration resets the clock. Variations in an implementation can trigger new edge cases not previously encountered. Persisting logs and traces from previous runs helps preserve past lessons, but it does not cover novel failure modes caused by differences in the regenerated code.
The speed at which we can create trustworthy software is limited by unexpected interactions: failures that emerge only when new code meets the real world in combinations that no pre-deployment testing anticipated. Better monitoring, canary deployments, staged rollouts, and simulations can compress discovery time while minimizing risks. But novelty remains the issue: new code interacting with existing state, real users, and real environmental conditions produces failure modes that prior runs have not yet modeled.
The strongest counterargument here is persistent AI agents that monitor production, accumulate knowledge in vector stores, log anomalies, and feed rich context into every future generation. I think such agents will be genuinely valuable. But notice what happens as you make this approach robust: the ephemeral part shrinks to implementation details while everything that makes the system work correctly remains persisted. At that point, you have arrived at a malleable software thesis instead.
Vibe coding works best when the application is mostly stateless, the data model is simple, and integration surfaces are few. Most software lives in a different world.
Real systems accumulate schema evolution over time: columns renamed, types widened, nullable fields that were once required, foreign keys to tables that no longer exist. Each migration encodes a decision about backward compatibility that took effort to get right. Regenerate the application layer, and the new code must still interface correctly with the existing data, or you risk silent corruption.
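A minimal sketch of this failure mode, using sqlite3 and hypothetical table and column names: the live schema has drifted from the original spec (a column rename from an old migration), and code regenerated from that spec breaks against the real data. Here the break is at least loud; a widened type or a newly nullable field would fail silently instead:

```python
import sqlite3

# The live database reflects an old migration: `name` was renamed to
# `full_name` years ago. (Schema and names are hypothetical.)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
db.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

def regenerated_fetch(conn):
    # Code regenerated from the original spec, unaware of the rename.
    return conn.execute("SELECT name FROM users").fetchall()

try:
    regenerated_fetch(db)
except sqlite3.OperationalError as exc:
    print(f"regenerated code broke on the real schema: {exc}")
```

The knowledge that `name` became `full_name` lives in the migration history, not in the spec the code was regenerated from.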
Integration surfaces compound the problem: APIs from partners who version inconsistently, message queues with in-flight events, caches with assumptions about object shapes, systems with hidden timing dependencies. The code that manages these boundaries is often the thinnest but most consequential layer, and it is precisely the layer where regeneration is most dangerous because getting it almost right can be worse than getting it obviously wrong (a silent bug can corrupt data or cause costly failures that only become visible after the fact).
This issue is, historically, where software rewrites go to die: not in the abstract business logic, but in the accumulated state machinery that keeps everything consistent across versions, deploys, migrations, and downstream consumers.
Any software used repeatedly creates consistency expectations. Yet any variance in ephemeral software due to unresolved ambiguity is experienced as interface instability by the end user. Users build mental models of where UI elements are, what shortcuts do, which workflows are safe, and what kinds of behavior they can trust. This is not limited to power users: casual users develop habits quickly, and any change—even an objective improvement—imposes friction and a learning cost. QWERTY keyboards persist not because they are optimal but because switching costs are too high.
The stakes escalate in high-pressure environments. In healthcare IT, EHR interface changes during version upgrades have been identified as contributing factors in medication errors and missed alerts.
Some strong-form versions of the ephemeral software hypothesis imagine user interfaces regenerated for each use and user. But regeneration implies variance, and variance conflicts with the consistency that repeated use demands. The reply might be: “We keep stable UI templates and only regenerate the implementation.” But that concedes the point. If you constrain templates, interaction models, and behavioral contracts to remain stable, you have already committed a significant layer to durability. The “ephemeral” part shrinks to implementation details.
Systems with real stakes require traceable artifacts. “We ran this prompt” is not a sufficient answer for SOC2, HIPAA, financial controls, incident response, or liability disputes. When something goes wrong, you need to know what code ran, with what inputs, against which data, under which version, and why the system behaved as it did. This is not limited to formally regulated industries: payments, identity, fraud detection, ads delivery, supply chains, ranking systems, and production operations all depend on post-hoc explainability and reproducibility.
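As a sketch of what a traceable artifact means in practice (the field names are illustrative, not drawn from any real compliance standard), an execution record pins down exactly which code ran on which inputs. Note the consequence for ephemerality: if the code is regenerated on every run, the code hash changes every time, and the audit trail loses the stable reference point that post-hoc analysis relies on:

```python
import datetime
import hashlib
import json

def audit_record(code_text: str, inputs: dict, output) -> dict:
    """Minimal traceability record: what code ran, on what, producing what."""
    def digest(payload: str) -> str:
        return hashlib.sha256(payload.encode()).hexdigest()

    return {
        "code_sha256": digest(code_text),
        "inputs_sha256": digest(json.dumps(inputs, sort_keys=True)),
        "output_sha256": digest(json.dumps(output, sort_keys=True)),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = audit_record("def f(x): return x * 2", {"x": 21}, 42)
print(record["code_sha256"][:12])  # stable only if the code artifact persists
```

With a persisted codebase, `code_sha256` maps back to a reviewable commit; with regeneration, it maps to an artifact that may no longer exist.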
Even perfect determinism would not solve the deeper problem, because there are really two separate issues:
The first is non-determinism: LLMs do not always produce identical outputs from identical prompts; model versions change; and the same prompt run twice may generate slightly different code. This is a technical challenge that is solvable.
However, the second, deeper issue is ambiguity: natural language is inherently ambiguous as a specification medium. This is an epistemological property of natural language itself. The same English sentence can be parsed multiple ways, and edge cases that code forces you to resolve can remain implicit in prose. A requirement like “retry on transient failures” leaves open which failures count as transient, how many retries to attempt, which backoff schedule to use, and whether to alert after exhaustion. For a marketing email queue, this ambiguity may be acceptable. For a payment system, once these questions are resolved in code and validated in production, the resolution becomes part of the relied-upon contract.
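Here is a sketch of how code forces that ambiguity to be resolved. Every constant and branch below is a decision the prose requirement left open; the names and values are illustrative, not taken from any real system:

```python
import random
import time

# "Retry on transient failures" -- each line pins down a choice the prose left open.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # which failures count as transient
MAX_ATTEMPTS = 3                                    # how many retries to attempt
BASE_BACKOFF_SECONDS = 0.05                         # which backoff schedule to use

def call_with_retries(operation, alert):
    """Run `operation`, retrying transient failures; alert after exhaustion."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return operation()
        except TRANSIENT_ERRORS:
            if attempt == MAX_ATTEMPTS:
                alert(f"gave up after {MAX_ATTEMPTS} attempts")  # whether to alert
                raise
            # Exponential backoff with full jitter: yet another open decision.
            time.sleep(BASE_BACKOFF_SECONDS * 2 ** (attempt - 1) * random.random())
```

Once choices like these are deployed and validated, downstream systems come to depend on them; regenerating from the original prose can silently pick a different resolution.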
Of course, higher-level specifications are useful, and abstraction levels in programming languages have risen meaningfully over time. SQL lets you declare what you want without writing a query optimizer, and there is substantial productive space between “vague prompt” and “code in C++.” AI makes that space larger. But SQL works precisely because it has formal semantics. The residual ambiguity at any abstraction level above formal code still creates regeneration risk. The more critical the system, the more ambiguity must be pinned down, and the only way to eliminate this ambiguity is to make the specification increasingly formal and precise, at which point it converges towards code.
This is why code remains central. Implementation details matter eventually in any software system, and code is the final arbiter. Not because it is a perfect expression of human intent, but because it is where operational ambiguity has been concretely resolved into deployed behavior.
Returning to the big picture, ephemerality shifts variance risk onto someone. Every regeneration cycle introduces the possibility of subtle behavioral change, and someone must bear the cost when that change causes a failure. In domains where switching costs are low or where losses are fat-tailed, the equilibrium favors durability and standardization. This incentive toward durability operates independently of any specific technology trend.
The most significant development supporting this essay’s thesis is the emergence of spec-driven development (SDD) tools like GitHub’s Spec Kit, AWS’s Kiro, and others.
Concretely, in a malleable workflow, regeneration still happens locally—a module, an endpoint, a UI flow—and sometimes even more aggressively, but it is bounded by code and other persisted context. The key optimization target is not “discard code quickly” but “apply reliable change quickly while preserving accumulated knowledge.”
To make this concrete, consider how three teams handle adding multi-currency support to a payment system. A traditional team hand-edits the schema, code, and tests over weeks. An ephemeral team updates the prompt and regenerates the whole system, inheriting regeneration risk across every currency edge case already resolved in production. A malleable team describes the change in natural language, has an AI agent draft the schema migration, code, tests, and spec updates together, and then reviews and deploys the whole package through the usual verification pipeline.
Existing tools already point toward this model: GitHub Copilot Workspace generates implementation plans alongside code; Cursor’s agent mode edits code while maintaining project context; spec-driven tools like Kiro produce structured artifacts alongside implementation. None of these treat code as disposable. They treat it as more efficiently produced and modified while remaining persisted.
Cheap generation also creates a broader ecosystem issue. If every team regenerates its own HTTP client, authentication library, date parser, or serialization layer, the result is a combinatorial explosion of subtly incompatible implementations. This would cause a breakdown of the interoperability that makes software useful at scale.
Consider TLS: a single well-maintained open-source library like OpenSSL gets security patches from a global community, and a vulnerability fix propagates to every consumer. If instead a thousand teams each regenerate their own TLS implementation, a vulnerability discovered in one does not get fixed in the other 999. Shared stable abstractions—libraries, protocols, APIs, data formats—are likely to become more important in a world of cheap generation, not less. They are the glue between components that may be regenerated more frequently. Who maintains these shared dependencies becomes a more pressing question when cheap generation tempts every team to roll their own.
One more objection deserves explicit acknowledgment: perhaps AI makes it economically viable to keep more software in the lightweight category (minimal state, users, and integration complexity) for longer, where it remains essentially ephemeral. What if features that currently require complex, stateful systems get rebuilt as simpler, more modular services that can be regenerated more easily with minimal persisted context? If AI enables better decomposition, the zone of safely regenerable software could expand.
I think this is a real possibility, and it would move the boundary somewhat. But there are limits. State, users, and integration complexity are not artifacts of poor architecture but properties of the problems being solved. A payment system needs state because money moves between accounts. A healthcare system needs audit trails because lives depend on traceability. A multi-user app needs consistency because people share data. A single user already expects consistency in behavior and UI. Better decomposition can isolate the potentially ephemeral-friendly parts, but we have seen that even simple user interfaces already benefit from durability.
To avoid making an equally unfalsifiable claim, I want to be explicit about what would change my mind.
I would update moderately toward the ephemeral thesis if, within 2–3 years, the majority of new consumer and internal business applications are generated ephemerally with acceptable reliability, even if critical infrastructure and regulated systems remain durable. That outcome would mean ephemerality is correct for a large share of software by volume, and I would need to concede the boundary sits further toward ephemerality than I currently expect.
I would update strongly and concede that software durability has been structurally overestimated if we see the following pattern broadly:
To make these criteria more concrete, here are three measurable markers I would look for:
Vibe coding is here. One-off tools, exploratory analysis, internal dashboards, and prototypes are already becoming more ephemeral, and that is genuinely valuable.
Ephemeral software as a broader thesis, however, will likely not happen for the same reasons “just rewrite it” has never worked at scale. The dominant challenges in software engineering remain regardless of generation speed: discovering edge cases through real-world use, preserving state and compatibility, maintaining auditability, and sustaining interface stability.
The category of software that stays lightweight is smaller than it appears. Most apps accumulate state, users, integrations, and reliability expectations quickly, at which point durability pressures reassert themselves. The interesting question is not whether any software becomes more disposable—some clearly will—but whether the boundary between disposable and durable moves as far as ephemerality proponents predict. I am betting it does not, and I have laid out what would change my mind, though I do not expect to see it soon.
My prediction is that the future of software is malleable: codebases and artifact stacks that are much easier to modify, with many more persisted artifacts than in the past. AI allows us to interact with larger volumes of less structured information than was previously possible. Natural-language specifications, richer test suites, conversational change logs, production-memory stores, and postmortem summaries will exist alongside established artifacts such as code, version history, schemas, UI conventions, and other institutional knowledge. In practice, an engineer is increasingly able to describe a change in plain language, have an AI agent draft the implementation and update the surrounding artifacts, and then review the whole package. But those changes still go through verification, staged deployment, and real-world feedback.
This is a meaningful shift, but it is not “ephemeral software.”
Acknowledgment: I would like to thank everyone who gave feedback on earlier drafts of this essay: korigero, Alex Shtoff, Calvin McCarter, Sam Stevens (in no particular order).
Claude and GPT were used for editing and iterating over many drafts. I also used Claude, Gemini, and Gemini Deep Research for the literature review (“Why Ephemeral Software Is Seductive”).