Why software won't become disposable despite the rise of AI-assisted coding
Andrej Karpathy calls it “vibe coding”: describing intent in natural language and letting an AI produce working code.
It is easy to conclude that this is the future of software. Cursor hit $500M ARR in twelve months.
Indeed, this was the conclusion I jumped to when I first started thinking about the future of software. But while we can experience vibe coding today, ephemeral software is so far only hype, and in this essay I want to argue that it will not become a reality as it has been pitched.
To be clear, I do believe that software engineering as a whole can be automated; indeed, any role within the process can, and likely will, be performed by AI agents in the near future. My core claim is this: AI does not make software ephemeral. It obviously makes code generation cheaper, but this shifts the bottlenecks to validation, integration, and ergonomics (UX and the like). The hard part of software engineering has never been writing code. It has been discovering correct behavior by resolving edge cases and operational ambiguity when software collides with reality. Neither the duration of this discovery process nor its cost vanishes just because code generation becomes fast.
Instead, I will argue that the future of software is malleable, not ephemeral. By this I mean that all artifacts of software engineering will become more malleable, but code will not become ephemeral, i.e., forgettable or disposable. Code will remain the source of truth, and institutional artifacts will become more important: code, version history, tests, specs, audit trails, and postmortems. The malleable software model persists code and higher-level artifacts together, with massively reduced friction and maintenance costs thanks to AI agents and tools.
Let’s contrast vibe coding and ephemeral software:
Vibe coding. For many small tasks, you can describe intent in natural language and get working code fast and cheaply enough to treat it as a one-off artifact. This is already true.
Ephemeral software. Much more software, including larger and more important projects, becomes throwaway: generated on demand, used once or briefly, and not meaningfully maintained. The persisted layer moves upward into stored prompts, specs, logs, policies, and UI mocks; the codebase is at most cached and can be discarded and regenerated with high trust. I am defining ephemeral software here somewhat generously to engage with the strongest defensible version of the claim, not only its most naive form.
There is a risk of a motte-and-bailey fallacy in the discourse around ephemeral software: evidence for cheap code and fast iteration is treated as evidence that persisted artifact stacks will disappear. Cursor’s ARR, Copilot adoption, AI-generated codebases at YC startups, Claude Code revenue, and the Stack Overflow survey mostly show developers producing code cheaper and faster within durable workflows: pull requests, CI, Git, code review. This is not evidence that software has become disposable, as the ephemeral software hypothesis claims.
To make the discussion more precise, we can decompose “ephemerality” along two axes: regeneration frequency (how often the implementation is regenerated from higher-level intent) and artifact durability (how much of the artifact stack, such as code, tests, specs, and history, is persisted across regenerations).
The strongest form of the ephemeral software hypothesis occupies the extreme of both axes: continuous regeneration with minimal persisted code artifacts. My argument is that this corner is unstable at scale, and systems that start there migrate toward persisted artifacts as they accumulate users, state, and integration complexity. Where a given piece of software falls on these axes depends on its stakes, its integration surface, and how long it lives.
Much of the evidence cited for ephemerality actually demonstrates movement along the regeneration frequency axis without comparable movement along the artifact durability axis. Developers are clearly able to produce code faster. But in most important cases, the code is still versioned, reviewed, tested, deployed carefully, and maintained. That is actually evidence for malleability.
The most defensible form of the ephemeral software hypothesis is therefore softer: not that all software becomes stateless promptware regenerated from vague natural language each time, but that implementation matters much less than today, that teams care less about preserving particular codebases, and that the durable layer moves upward into higher-level artifacts. This softer claim is harder to refute, and I want to engage with it directly. My argument is that even this weaker version runs into hard limits once systems become stateful, integrated, and repeatedly used, and that the zone of software that stays lightweight enough to avoid those limits is narrower than it appears.
Before arguing against ephemeral software, I want to acknowledge why many find it plausible. In a poll about the ephemeral software hypothesis, roughly two thirds of respondents told me this was a straw man: nobody seriously believes software will become disposable. The other third told me it was obviously the future.
The reasoning is straightforward: if software generation cost approaches zero, software maintenance looks like a waste of time.
Why debug code when we can regenerate it? Why refactor what we can redescribe? And crucially, why maintain institutional knowledge about code that appears cheaper to throw away and recreate than to understand?
Andrej Karpathy describes his experience vibe coding in his 2025 year-in-review.
The hypothesis has been explicitly endorsed by credible names in tech and backed by extraordinary capital. Andrej Karpathy has repeatedly called for “super custom, super ephemeral one-off apps by default.”
The combined valuations of top AI coding companies now exceed $60 billion.
Survey data suggests developers are already behaving as if code is becoming more disposable. The Stack Overflow 2025 survey found that 28% of developers use vibe coding professionally, just months after the term was coined.
Perhaps most telling is the empirical data on how code itself is changing. GitClear’s analysis of over 211 million changed lines of code found that code cloning increased eightfold in 2024, while refactoring and reuse declined from 25% to less than 10% of changed lines.
The most expansive phrasing of the ephemeral software case combines Tunguz’s framework (ephemeral apps will outnumber persisted SaaS millions to one), Acharya’s economic argument (the ROI constraint on software creation has been removed), and Karpathy’s experiential claim (code is free, ephemeral, malleable, discardable after single use). Together, they argue not just that ephemeral software is possible, but that the economic and cognitive barriers that previously made software durable were always artificial constraints, and those constraints have now been lifted.
This is a serious prediction, backed by real capital and taken seriously by experienced technologists.
Three observations from this evidence are worth highlighting.
First, the narrative is partially incentive-shaped. If someone’s business monetizes generation cycles, hosted runtimes, and proprietary context stores, then framing software as disposable supports their economics. That does not make the thesis false, but it should make us treat it as partially incentive-driven rather than purely technical forecasting.
Second, there is a rational economic case from the buyer’s side. Organizations often spend far more effort maintaining and adapting software than writing greenfield code. If regeneration reaches “good enough” reliability for non-critical systems at a fraction of that maintenance burden, many businesses will rationally accept the tradeoff. I expect it to drive adoption in the lightweight categories mentioned earlier. However, the savings are less dramatic than they first appear. Verification, integration testing, and incident response (the cost of diagnosing failures in regenerated code) do not disappear when generation becomes cheap. They shift from “maintaining code” to “maintaining trust in (re)generated code.” For big software systems with complex integration surfaces, these residual costs can rival or exceed the maintenance costs they were meant to replace.
Furthermore, many companies actually fail to maintain software. Legacy code persists but is not well-maintained. If the real comparison is “ephemeral vs. rotting,” rather than “ephemeral vs. well-maintained,” the ephemeral case looks considerably stronger. But ephemerality does not solve the problem of rot; it replaces one failure mode—accumulating neglect—with another: regeneration risk. (The right response to rot isn’t disposability but cheaper maintenance, which is precisely what the malleable software thesis proposes later in this essay.)
Third, adoption data deserves scrutiny. Companies with 50–90% AI-generated code, or YC startups with 95% AI-generated codebases, are succeeding. AI-generated code works. But these companies use Claude Code and Cursor within standard engineering workflows: pull requests, code review, CI, Git. “AI-generated” is not the same as “ephemeral.” This is exactly the malleable software thesis: AI produces the code more efficiently, but the code remains a persisted, maintained artifact within standard engineering workflows.
Nobody is running a bank or a hospital on ephemeral software. So instead of claiming to refute something that has not happened, I will argue that ephemeral software faces the same barriers that have hampered software rewrites for the past 50 years, and that these structural barriers do not disappear with cheap generation.
The history of software engineering is littered with failed rewrites. Initially, in 1975, Fred Brooks famously advised “plan to throw one away; you will, anyhow.”
When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
—Joel Spolsky, “Things You Should Never Do”
A bit later, in 2000, Joel Spolsky wrote “Things You Should Never Do,” arguing that rewriting a working codebase from scratch is the single worst strategic mistake a software company can make.
The strongest modern objection to this is obvious: unlike a human rewrite team, an AI can read the entire prior codebase, issue history, incident reports, and postmortems before regenerating. I agree this is a major capability jump, and it will reduce some rewrite costs. But working code has emergent properties that nobody explicitly designed, documented, or even noticed. These properties exist because this code ran against this data in this environment over time. Much of an existing system’s behavior is environmental, implicit, or never fully recorded.
There is an even more relevant historical precedent: compilers. Assembly was once hand-crafted and carefully maintained; now it is generated from higher-level languages and treated as disposable. Isn’t the ephemeral software hypothesis simply predicting another abstraction-layer shift?
The analogy is instructive, but it doesn’t support the conclusion. Every prior abstraction-layer shift in computing went from one formal language to another formal language: assembly to C, C to Python, raw queries to SQL. The higher-level language became the new persisted artifact precisely because it had formal semantics: unambiguous, deterministic, machine-verifiable. The ephemeral software thesis, however, asks for something with no historical precedent: that natural language—inherently ambiguous and underspecified—can serve as the persisted specification layer for important software.
This creates a structural dilemma independent of how capable future AI becomes: regeneration either preserves these emergent behavioral properties—in which case you are investing heavily in continuity, the opposite of ephemerality—or it does not, in which case you are accepting production risk with every cycle. There is no third option.
Even vibe coding, as a weaker form of ephemerality, has received pushback as we have gained experience with it.
A research study analyzing over 5,600 publicly available vibe-coded applications identified more than 2,000 vulnerabilities.
“Don’t remove a fence until you understand why it was built.” —G.K. Chesterton
This is obviously about gaps in institutional knowledge and the need for caution and the right amount of change aversion. One pushback is that change aversion only exists because writing code used to be expensive, and now that generation is cheap, we can drop it. I think this gets it backwards. Change aversion exists not because writing code was expensive, but because discovering correct behavior is expensive. Generation cost was never the dominant term for mature software systems; discovery, validation, integration, and coordination were. Cheap generation removes none of those costs.
A different perspective on the issue is that ambiguity has to go somewhere. In non-ephemeral systems, it is progressively resolved into stable code from tests, schemas, interfaces, and operational practice. Over time, users experience the result as continuity, familiarity, and reliability. In ephemeral systems, that same ambiguity is reintroduced at each regeneration and experienced instead as variance: the same user request can behave slightly differently across runs. This does not depend on system complexity. Any software used more than once creates expectations, and any ambiguity left unresolved in the specification becomes variance across regenerations that violates those expectations. A simple internal tool does not need the state machinery of a payment system for this to matter—it only needs a user who opens it on Tuesday and finds that the table sorts differently than it did on Monday. The barriers below are instances of this mechanism: each describes a domain where unresolved ambiguity produces variance, and where that variance carries a cost.
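As a toy illustration of how unresolved ambiguity becomes variance, consider two regenerations of the same one-line spec, “sort the contacts by name.” Both implementations below are faithful readings of the prose (the function names and data are illustrative), yet the user who saw one ordering on Monday gets a different one on Tuesday:

```python
# Two regenerations of the same prose spec, "sort the contacts by name."
# Both are defensible readings; they disagree on mixed-case data.

def sort_contacts_monday(names):
    # Reading 1: plain lexicographic sort (case-sensitive, ASCII order).
    return sorted(names)

def sort_contacts_tuesday(names):
    # Reading 2: case-insensitive, human-friendly sort.
    return sorted(names, key=str.lower)

contacts = ["alice", "Bob", "carol"]
print(sort_contacts_monday(contacts))   # ['Bob', 'alice', 'carol']
print(sort_contacts_tuesday(contacts))  # ['alice', 'Bob', 'carol']
```

Nothing in the specification distinguishes these; only resolving the ambiguity into persisted code does.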
With this in mind, here are four specific barriers that are primarily structural, that is, unlikely to disappear with better tooling.
Edge cases can emerge from real users doing unexpected things, from third-party services behaving inconsistently, and from data that violates assumptions you did not know you were making. This knowledge accumulates only over time in deployed systems.
Each regeneration resets the clock. Variations in an implementation can trigger new edge cases not previously encountered. Persisting logs and traces from previous runs helps preserve past lessons, but it does not cover novel failure modes caused by differences in the regenerated code.
The speed at which we can create trustworthy software is limited by unexpected interactions: failures that emerge only when new code meets the real world in combinations that no pre-deployment testing anticipated. Better monitoring, canary deployments, staged rollouts, and simulations can compress discovery time while minimizing risks. But novelty remains the issue: new code interacting with existing state, real users, and real environmental conditions produces failure modes that prior runs have not yet modeled.
The strongest counterargument here is persistent AI agents that monitor production, accumulate knowledge in vector stores, log anomalies, and feed rich context into every future generation. I think such agents will be genuinely valuable. But notice what happens as you make this approach robust: the ephemeral part shrinks to implementation details while everything that makes the system work correctly remains persisted. At that point, you have arrived at a malleable software thesis instead.
Vibe coding works best when the application is mostly stateless, the data model is simple, and integration surfaces are few. Most software lives in a different world.
Real systems accumulate schema evolution over time: columns renamed, types widened, nullable fields that were once required, foreign keys to tables that no longer exist. Each migration encodes a decision about backward compatibility that took effort to get right. Regenerate the application layer, and the new code must still interface correctly with the existing data, or you risk silent corruption.
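A minimal sketch of this failure mode, using sqlite3 and hypothetical table and column names: the live schema has drifted from the original spec (a column rename from an old migration), and code regenerated from that spec breaks against the real data. Here the break is at least loud; a widened type or a newly nullable field would fail silently instead:

```python
import sqlite3

# The live database reflects an old migration: `name` was renamed to
# `full_name` years ago. (Schema and names are hypothetical.)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
db.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

def regenerated_fetch(conn):
    # Code regenerated from the original spec, unaware of the rename.
    return conn.execute("SELECT name FROM users").fetchall()

try:
    regenerated_fetch(db)
except sqlite3.OperationalError as exc:
    print(f"regenerated code broke on the real schema: {exc}")
```

The knowledge that `name` became `full_name` lives in the migration history, not in the spec the code was regenerated from.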
Integration surfaces compound the problem: APIs from partners who version inconsistently, message queues with in-flight events, caches with assumptions about object shapes, systems with hidden timing dependencies. The code that manages these boundaries is often the thinnest but most consequential layer, and it is precisely the layer where regeneration is most dangerous because getting it almost right can be worse than getting it obviously wrong (a silent bug can corrupt data or cause costly failures that only become visible after the fact).
This issue is, historically, where software rewrites go to die: not in the abstract business logic, but in the accumulated state machinery that keeps everything consistent across versions, deploys, migrations, and downstream consumers.
Any software used repeatedly creates consistency expectations. Yet any variance in ephemeral software due to unresolved ambiguity is experienced as interface instability by the end user. Users build mental models of where UI elements are, what shortcuts do, which workflows are safe, and what kinds of behavior they can trust. This is not limited to power users: casual users develop habits quickly, and any change—even an objective improvement—imposes friction and a learning cost. QWERTY keyboards persist not because they are optimal but because switching costs are too high.
The stakes escalate in high-pressure environments. In healthcare IT, EHR interface changes during version upgrades have been identified as contributing factors in medication errors and missed alerts.
Some strong-form versions of the ephemeral software hypothesis imagine user interfaces regenerated for each use and user. But regeneration implies variance, and variance conflicts with the consistency that repeated use demands. The reply might be: “We keep stable UI templates and only regenerate the implementation.” But that concedes the point. If you constrain templates, interaction models, and behavioral contracts to remain stable, you have already committed a significant layer to durability. The “ephemeral” part shrinks to implementation details.
Systems with real stakes require traceable artifacts. “We ran this prompt” is not a sufficient answer for SOC2, HIPAA, financial controls, incident response, or liability disputes. When something goes wrong, you need to know what code ran, with what inputs, against which data, under which version, and why the system behaved as it did. This is not limited to formally regulated industries: payments, identity, fraud detection, ads delivery, supply chains, ranking systems, and production operations all depend on post-hoc explainability and reproducibility.
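As a sketch of what a traceable artifact means in practice (the field names are illustrative, not drawn from any real compliance standard), an execution record pins down exactly which code ran on which inputs. Note the consequence for ephemerality: if the code is regenerated on every run, the code hash changes every time, and the audit trail loses the stable reference point that post-hoc analysis relies on:

```python
import datetime
import hashlib
import json

def audit_record(code_text: str, inputs: dict, output) -> dict:
    """Minimal traceability record: what code ran, on what, producing what."""
    def digest(payload: str) -> str:
        return hashlib.sha256(payload.encode()).hexdigest()

    return {
        "code_sha256": digest(code_text),
        "inputs_sha256": digest(json.dumps(inputs, sort_keys=True)),
        "output_sha256": digest(json.dumps(output, sort_keys=True)),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = audit_record("def f(x): return x * 2", {"x": 21}, 42)
print(record["code_sha256"][:12])  # stable only if the code artifact persists
```

With a persisted codebase, `code_sha256` maps back to a reviewable commit; with regeneration, it maps to an artifact that may no longer exist.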
Even perfect determinism would not solve the deeper problem, because there are really two separate issues:
The first is non-determinism: LLMs do not always produce identical outputs from identical prompts; model versions change; and the same prompt run twice may generate slightly different code. This is a technical challenge that is solvable.
However, the second, deeper issue is ambiguity: natural language is inherently ambiguous as a specification medium. This is an epistemological property of natural language itself. The same English sentence can be parsed multiple ways, and edge cases that code forces you to resolve can remain implicit in prose. A requirement like “retry on transient failures” leaves open which failures count as transient, how many retries to attempt, which backoff schedule to use, and whether to alert after exhaustion. For a marketing email queue, this ambiguity may be acceptable. For a payment system, once these questions are resolved in code and validated in production, the resolution becomes part of the relied-upon contract.
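Here is a sketch of how code forces that ambiguity to be resolved. Every constant and branch below is a decision the prose requirement left open; the names and values are illustrative, not taken from any real system:

```python
import random
import time

# "Retry on transient failures" -- each line pins down a choice the prose left open.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # which failures count as transient
MAX_ATTEMPTS = 3                                    # how many retries to attempt
BASE_BACKOFF_SECONDS = 0.05                         # which backoff schedule to use

def call_with_retries(operation, alert):
    """Run `operation`, retrying transient failures; alert after exhaustion."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return operation()
        except TRANSIENT_ERRORS:
            if attempt == MAX_ATTEMPTS:
                alert(f"gave up after {MAX_ATTEMPTS} attempts")  # whether to alert
                raise
            # Exponential backoff with full jitter: yet another open decision.
            time.sleep(BASE_BACKOFF_SECONDS * 2 ** (attempt - 1) * random.random())
```

Once choices like these are deployed and validated, downstream systems come to depend on them; regenerating from the original prose can silently pick a different resolution.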
Of course, higher-level specifications are useful, and abstraction levels in programming languages have risen meaningfully over time. SQL lets you declare what you want without writing a query optimizer, and there is substantial productive space between “vague prompt” and “code in C++.” AI makes that space larger. But SQL works precisely because it has formal semantics. The residual ambiguity at any abstraction level above formal code still creates regeneration risk. The more critical the system, the more ambiguity must be pinned down, and the only way to eliminate this ambiguity is to make the specification increasingly formal and precise, at which point it converges towards code.
This is why code remains central. Implementation details matter eventually in any software system, and code is the final arbiter. Not because it is a perfect expression of human intent, but because it is where operational ambiguity has been concretely resolved into deployed behavior.
Returning to the big picture, ephemerality shifts variance risk onto someone. Every regeneration cycle introduces the possibility of subtle behavioral change, and someone must bear the cost when that change causes a failure. In domains where switching costs are low or where losses are fat-tailed, the equilibrium favors durability and standardization. This incentive toward durability operates independently of any specific technology trend.
The most significant development supporting this essay’s thesis is the emergence of spec-driven development (SDD) tools like GitHub’s Spec Kit, AWS’s Kiro, and others.
Concretely, in a malleable workflow, regeneration still happens locally—a module, an endpoint, a UI flow—and sometimes even more aggressively, but it is bounded by code and other persisted context. The key optimization target is not “discard code quickly” but “apply reliable change quickly while preserving accumulated knowledge.”
To make this concrete, consider how three teams handle adding multi-currency support to a payment system. A traditional team hand-edits the schema, code, and tests over weeks. An ephemeral team updates the prompt and regenerates the whole system, inheriting regeneration risk across every currency edge case already resolved in production. A malleable team describes the change in natural language, has an AI agent draft the schema migration, code, tests, and spec updates together, and then reviews and deploys the whole package through the usual verification pipeline.
Existing tools already point toward this model: GitHub Copilot Workspace generates implementation plans alongside code; Cursor’s agent mode edits code while maintaining project context; spec-driven tools like Kiro produce structured artifacts alongside implementation. None of these treat code as disposable. They treat it as more efficiently produced and modified while remaining persisted.
Cheap generation also creates a broader ecosystem issue. If every team regenerates its own HTTP client, authentication library, date parser, or serialization layer, the result is a combinatorial explosion of subtly incompatible implementations. This would cause a breakdown of the interoperability that makes software useful at scale.
Consider TLS: a single well-maintained open-source library like OpenSSL gets security patches from a global community, and a vulnerability fix propagates to every consumer. If instead a thousand teams each regenerate their own TLS implementation, a vulnerability discovered in one does not get fixed in the other 999. Shared stable abstractions—libraries, protocols, APIs, data formats—are likely to become more important in a world of cheap generation, not less. They are the glue between components that may be regenerated more frequently. Who maintains these shared dependencies becomes a more pressing question when cheap generation tempts every team to roll their own.
One more objection deserves explicit acknowledgment: perhaps AI makes it economically viable to keep more software in the lightweight category (minimal state, users, and integration complexity) for longer, where it remains essentially ephemeral. What if features that currently require complex, stateful systems get rebuilt as simpler, more modular services that can be regenerated more easily with minimal persisted context? If AI enables better decomposition, the zone of safely regenerable software could expand.
I think this is a real possibility, and it would move the boundary somewhat. But there are limits. State, users, and integration complexity are not artifacts of poor architecture but properties of the problems being solved. A payment system needs state because money moves between accounts. A healthcare system needs audit trails because lives depend on traceability. A multi-user app needs consistency because people share data. A single user already expects consistency in behavior and UI. Better decomposition can isolate the potentially ephemeral-friendly parts, but we have seen that even simple user interfaces already benefit from durability.
To avoid making an equally unfalsifiable claim, I want to be explicit about what would change my mind.
I would update moderately toward the ephemeral thesis if, within 2–3 years, the majority of new consumer and internal business applications are generated ephemerally with acceptable reliability, even if critical infrastructure and regulated systems remain durable. That outcome would mean ephemerality is correct for a large share of software by volume, and I would need to concede the boundary sits further toward ephemerality than I currently expect.
I would update strongly and concede that software durability has been structurally overestimated if we see the following pattern broadly:
To make these criteria more concrete, here are three measurable markers I would look for:
Vibe coding is here. One-off tools, exploratory analysis, internal dashboards, and prototypes are already becoming more ephemeral, and that is genuinely valuable.
Ephemeral software as a broader thesis, however, will likely not happen for the same reasons “just rewrite it” has never worked at scale. The dominant challenges in software engineering remain regardless of generation speed: discovering edge cases through real-world use, preserving state and compatibility, maintaining auditability, and sustaining interface stability.
The category of software that stays lightweight is smaller than it appears. Most apps accumulate state, users, integrations, and reliability expectations quickly, at which point durability pressures reassert themselves. The interesting question is not whether any software becomes more disposable—some clearly will—but whether the boundary between disposable and durable moves as far as ephemerality proponents predict. I am betting it does not, and I have laid out what would change my mind, though I do not expect to see it soon.
My prediction is that the future of software is malleable: codebases and artifact stacks that are much easier to modify, with many more persisted artifacts than in the past. AI allows us to interact with larger volumes of less structured information than was previously possible. Natural-language specifications, richer test suites, conversational change logs, production-memory stores, and postmortem summaries will exist alongside established artifacts such as code, version history, schemas, UI conventions, and other institutional knowledge. In practice, an engineer is increasingly able to describe a change in plain language, have an AI agent draft the implementation and update the surrounding artifacts, and then review the whole package. But those changes still go through verification, staged deployment, and real-world feedback.
This is a meaningful shift, but it is not “ephemeral software.”
Acknowledgment: I would like to thank everyone who gave feedback on earlier drafts of this essay: korigero, Alex Shtoff, Calvin McCarter, Sam Stevens (in no particular order).
Claude and GPT were used for editing and iterating over many drafts. I also used Claude, Gemini, and Gemini Deep Research for the literature review (“Why Ephemeral Software Is Seductive”).