The Case For Open-Weight Models And Why We Can't Trust Frontier Labs

Through early 2026, several large companies blew past their annual AI budgets in a matter of months. Uber and ServiceNow, according to The Information, exhausted their entire yearly allocations for Anthropic’s tools in the first months of the year, and Uber capped spending at $1,500 per employee per tool each month, behind a dashboard that gates any overage. Meta told staff it was tracking toward billions in internal AI cost and moved to meter token usage, standing up a dashboard called AI Gateway to watch spending and flag spikes. The pressure had a cause. Months earlier the company had made “AI-driven impact” a core expectation, and some engineers answered with a practice they called tokenmaxxing, climbing an internal leaderboard named Claudeonomics that ranked the top 250 users by consumption. A copy of the dashboard reviewed by The Information showed 60.2 trillion tokens burned in a 30-day window in April, rising to 73.7 trillion before the leaderboard came down. The retrenchment was industry-wide, and the Wall Street Journal counted Microsoft, Salesforce, and DoorDash among the firms now rationing AI spend. Agentic coding spends tokens far faster than chat, and the move from flat subscriptions to per-token pricing turned that appetite into a runaway bill.

Much of that spend went to engineers coding with agentic tools, so the meter was really measuring how deeply these companies had wired their development cycle to a vendor they cannot audit. The frontend servers and databases that serve their users still run their own code; the exposure is in how the software gets built, and it is real all the same: on price, on availability, and on the integrity of what the model writes. Cost is the symptom. Dependency is the disease.

To be upfront: a frontier-lab API does not belong inside your trusted computing base. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The price is subsidized today and set unilaterally tomorrow. The values are encoded in weights you cannot read. The refusal surface expands without notice. The model itself can become unavailable by an order you had no part in. Coding is the gentlest way to take this dependency, since the output is a durable artifact you keep and can review, and it is the one use I would readily defend. Wiring a frontier model into the live request path is the opposite, a standing bet on all four of those variables at once. Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours, wherever you run it. The rest of this piece is the case for moving the model back inside the boundary.

Frontier labs run at a loss, and the incentive is to subsidize usage now and raise prices after customers depend on them. Once a company routes its product path through a frontier model, the vendor holds the price, the rate limits, the retention policy, the routing, the refusal behavior, the model class, and the output itself. Any of them can move without warning. A price increase on a dependency you cannot replace is not a negotiation; it is an invoice.

What matters more than the price is whether you can keep a copy you control. A library, a compiler, or a self-hosted database is yours to pin; you can run the exact version you tested for as long as you like. A frontier API offers no such copy. The vendor can change the behavior of the model you built against, deprecate it, or gate it, and you have nothing pinned to fall back on. You rent capability, and you rent it on terms the landlord rewrites at will.
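Pinning a copy you control can be made concrete. The sketch below is a minimal illustration, assuming the model weights sit on local disk as one or more files and that you recorded a SHA-256 digest when you tested them. The function names and the idea of checking the digest before loading are illustrative assumptions, not part of any particular serving stack.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file on disk matches the digest recorded at test time."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A deployment script could refuse to start when `verify_pinned` returns False, which is the guarantee a hosted API cannot give: the bytes you tested are the bytes you run.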

The compiler that lies

The clearest illustration arrived with Anthropic’s Fable 5. Its system card disclosed that, for queries aimed at developing frontier language models, the model’s safeguards would not be visible to the user and the model would not fall back to a different model. Instead the safeguards would limit its effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning. Anthropic drew an explicit line between this category and its interventions for cybersecurity and biology, which were visible. By the company’s own estimate the covert path would touch about 0.03% of traffic, concentrated in fewer than 0.1% of organizations.

After a backlash from the technical community, Anthropic reversed course within days and now routes that category through a visible fallback to an Opus model. The system card’s changelog records the change, to the company’s credit. The body now describes the new fallback behavior, and the original covert mechanism survives in full only in an archived copy of the original card. The speed of the retreat is its own signal. A safeguard withdrawn almost as soon as it shipped looks like an impulse, not a settled policy.

Picture a compiler that builds your code faithfully until it notices that the code is itself a compiler, a potential rival, and then quietly emits a slower, subtly faulty binary. A compiler that refuses such a job outright is annoying but legible; you see the refusal and route around it. One that silently degrades the binary for policy reasons is a supply-chain nightmare. That last case is what Fable’s frontier-LLM safeguard amounted to. An intervention that covertly degrades the work is sabotage in everything but intent. When the output is worse, the engineer cannot tell whether the cause is a bad prompt, a bug in their own code, the model’s ordinary variance, a hidden policy trigger, or a vendor protecting its lead. Silent computational sabotage is not hypothetical. A sabotage framework called fast16, used in a 2005 attack and later analyzed by SentinelLABS, patched high-precision simulation code in memory to tamper with the results. It corrupted high-explosive implosion physics so the answers came back subtly and confidently wrong. Researchers read the operation as aimed at a nuclear-weapons program, most likely Iran’s. The analogy holds on one axis, the nature of undetectable degradation: in both cases the victim cannot tell a corrupted result from a correct one. The dangerous sabotage is never denial of service. It is plausible but corrupted output.

The covert path also rests on a classifier, and the same imprecise detection machinery that produces everyday false positives would decide which queries count as frontier-LLM development. The 0.03% estimate assumes the trigger fires only where intended. It also fires on ordinary AI work, and users have already reported their models turning dull on basic tasks.

Unfalsifiability cuts both ways. I cannot prove covert degradation is touching my work, since the mechanism is undetectable by design. What I can report is my own behavior. I have run Fable 5 only on a codebase where silent sabotage would not have mattered, and I have kept it away from the code that does, because the model might quietly degrade the result. The capability never had to fire to change how I work, and that is the cost. An intervention a customer can neither detect nor disprove poisons the tool for any task worth protecting. The vendor designed it, shipped it, gated it on an imprecise classifier, and made no commitment against using it again. The damage that outlasts the reversal is the loss of trust, and a dependency you cannot predict does not belong in production.

The model enforces someone else’s policy

Silent degradation is the dramatic failure. The daily one is the refusal surface, and it has been expanding. Reading some Old Germanic runes earned me an acceptable-use flag. Drafting rap lyrics for an Activ8te Cybersecurity track was detected as violative cyber use. They read as anecdotes until you set them beside Anthropic’s own statement, after the Fable backlash, that users “may experience more false positives as we refine these classifiers to respond to new threats.” The company says it is working to reduce them, but you do not control that dial. My previous post described a routine vulnerability assessment that stalled when the model refused after two of seven steps. The friction lands on defenders doing legitimate work. Refusals are only part of it: Google’s Gemini policy reserves the right to throttle you or change which model answers your request, so the boundary you build on holds routing and enforcement, not just a model.

The deeper issue is that the policy layer encodes a worldview. A frontier model carries its vendor’s moderation assumptions, national context, and institutional incentives into every output. American labs have trained their models toward a domestic political even-handedness that treats opposing positions as equally acceptable regardless of the underlying facts. Whatever one makes of that choice inside the United States, it travels poorly. A European bank, an Indian insurer, or a Japanese manufacturer may not want an American lab’s policy worldview embedded in a business process. This is jurisdictional mismatch, and it is structural rather than a matter of culture-war bias. Google’s models follow instructions well enough that a system prompt anchoring political judgments to independent international bodies restores usable behavior, and the need for that workaround proves the embedded default it has to correct.

Access can vanish overnight

The sharpest proof of the dependency problem is political rather than commercial. Anthropic launched Fable 5 on June 9. Days later the Commerce Department, in a letter from Secretary Howard Lutnick to CEO Dario Amodei, placed Fable 5 and Mythos 5 under export control covering every foreign national, including non-citizens inside the United States and on Anthropic’s own staff. The scope left no clean way to comply, so Anthropic disabled both models for all customers worldwide and kept only Opus 4.8 and its lesser models online. The stated trigger was a “jailbreak” that could bypass the safeguard meant to stop Fable from finding software vulnerabilities. Anthropic said the government produced only verbal evidence of a narrow, non-universal jailbreak, and it warned that the same standard applied across the industry would halt every new frontier-model deployment.

Export controls exist to keep capability away from foreign adversaries. This one was reportedly set in motion by Amazon, Anthropic’s largest investor and the cloud host that runs its models. Axios and the Wall Street Journal reported that Amazon’s chief executive, Andy Jassy, called Treasury Secretary Scott Bessent and other senior officials late that night and handed over an internal report showing Amazon researchers had bypassed Fable 5’s guardrails to pull out information usable for cyberattacks. Anthropic’s own backer, with some $13 billion invested and a reported $100 billion of Anthropic’s own spend committed back to AWS, gave the government the case that took the model down days after the launch it helped power. Every customer who had built on Fable 5 lost it overnight, with no say and no recourse. The episode followed an existing rift, since the administration had already moved to bar Anthropic from federal supply chains after the company refused military use of its models for surveillance and autonomous weapons. Closed frontier access is politically contingent, a single point of failure a third party can trip without your consent. The worry is not confined to engineers. Canadian Prime Minister Mark Carney read the episode as evidence of the danger in leaning on a handful of American providers, warning that it is never a good idea to have one option.

Set aside whether the order was justified; the rationale does not survive contact with the technology. Anthropic itself noted that rival public models, including OpenAI’s GPT-5.5, can be driven to the same bug-finding behavior, and those models stayed online. My own vulnerability-discovery work reached the same conclusion from the other side: open-weight models successfully find new vulnerabilities end to end, because the capability lives in the orchestration rather than in any single frontier model. An export control on one model removes it from law-abiding customers and leaves the capability untouched for anyone who downloads open weights. Resilience requires a model no one else can recall.

Open weights move the model inside your trust boundary

The constructive response is a hierarchy of computation. Solve the problem with classical, deterministic algorithms wherever they suffice, and most problems need no model at all. Where generative AI genuinely helps, reserve frontier models for offline work that tolerates their volatility: quality assurance, synthetic data, evaluation, and red-teaming. Run production on open-weight models, where you control policy, data, and workflow end to end. Open weights are cheaper, though that is the least of it. An open-weight model sits inside your trust boundary: you can inspect it, fork it, pin a version forever, and run it where no remote order can switch it off.
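The hierarchy above can be written down as a small routing policy. This is a sketch under stated assumptions: the `Task` fields, the `Tier` names, and the function `most_volatile_tier_allowed` are hypothetical, and a real system would classify tasks with more than two flags.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DETERMINISTIC = "deterministic"  # classical algorithm, no model involved
    OPEN_WEIGHT = "open_weight"      # self-hosted model inside the trust boundary
    FRONTIER = "frontier"            # vendor API, tolerated for offline work only


@dataclass(frozen=True)
class Task:
    name: str
    needs_generation: bool  # False when a classical algorithm suffices
    in_request_path: bool   # True when production traffic waits on the result


def most_volatile_tier_allowed(task: Task) -> Tier:
    """Return the least controllable tier this policy permits for the task."""
    if not task.needs_generation:
        return Tier.DETERMINISTIC
    if task.in_request_path:
        return Tier.OPEN_WEIGHT
    # Offline work (QA, synthetic data, evaluation, red-teaming) tolerates vendor volatility.
    return Tier.FRONTIER
```

The function returns a ceiling, not a mandate: an offline task that is allowed to use a frontier model can still run on open weights, but a production task is never allowed to climb above them.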

The open-weight frontier is moving fast, and most of the recent motion comes from China. Z.AI’s GLM 5.1 carries open weights you can download today, and it already drove autonomous vulnerability discovery in my earlier work. On June 13, Z.AI announced GLM 5.2, reaching its paid coding tiers first, with MIT-licensed open weights promised the following week; Z.AI cast that open release as a direct answer to tightening US export controls. The company open-sources its reinforcement-learning infrastructure, Slime, as well. While the Chinese state runs on a different value system from the West, it also gives its AI companies something the United States no longer provides: stability and direction. A US lab just shipped its best model and lost it four days later to an order it did not anticipate and still does not fully understand. A company planning a multi-year roadmap can work within a clear and durable policy, even one it dislikes, far more readily than within reversals it cannot predict. The present trajectory favors the labs operating under predictable rules, and at the moment those rules are not American.

You can benefit from the result without betting on that stability. Calling Z.AI’s hosted API would only swap one foreign frontier vendor for another; the independence comes from the weights themselves. Downloaded under a permissive license and run on your own hardware, an open-weight model takes every government out of your execution loop, the chaotic one and the stable one alike. It is yours to run in production, to pin, and to replace, and the capability gap with proprietary frontier models keeps narrowing. I would rather the stable, open alternative came from Europe, and a European open-weight foundation model would be a real counterweight. That is a hope more than a plan, and it does not change the choice in front of an engineering team today.

Owning the model is the start. Owning the data that improves it is the rest, and here the labs are quietly taking the most valuable artifact. The reasoning trace that produces an answer is worth more than the answer. It is the record you would use to verify a conclusion, debug a workflow, or train a successor. It is also the part frontier labs increasingly filter, summarize, or withhold while billing for it: OpenAI charges for reasoning tokens it never returns, and Gemini bills the full thoughts but emits only a summary. A collapsed chain of thought protects nothing the customer cares about; it removes the audit trail and keeps the one input that transfers to a future model. An open-weight model gives it back. Run the weights yourself and nothing stands between you and whatever reasoning the model exposes; the trace is yours to keep in full, the very artifact a closed API now redacts.

The practical response is to stop treating token streams as exhaust. Capture every trajectory you generate, as much of the trace as the model exposes, and store it. Curated, those streams become supervised fine-tuning data. Scored against a verifier, they become the reward signal for reinforcement learning from verifiable rewards. A team that has been saving its trajectories can fork to an open-weight model and recover most of the capability it was renting, because it owns the input that actually transfers. A team that let the lab swallow its reasoning has nothing to fall back on.
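The capture, curate, and score loop described above fits in a few dozen lines. The sketch below is illustrative only: the `Trajectory` fields, the JSON Lines log format, and the boolean `verifier` callback are assumptions made for the example, not the schema of any specific tool, and a real pipeline would also record model version, timestamps, and tool calls.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Iterable, Iterator


@dataclass
class Trajectory:
    prompt: str
    reasoning: str  # whatever trace the model exposed; may be empty
    answer: str


def append_trajectory(log_path: Path, trajectory: Trajectory) -> None:
    """Append one trajectory as a single JSON line so the log is never rewritten."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(trajectory)) + "\n")


def load_trajectories(log_path: Path) -> Iterator[Trajectory]:
    """Yield stored trajectories one at a time, skipping blank lines."""
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield Trajectory(**json.loads(line))


def score(
    trajectories: Iterable[Trajectory],
    verifier: Callable[[Trajectory], bool],
) -> list[tuple[Trajectory, float]]:
    """Attach a binary reward: 1.0 when the verifier accepts the answer, else 0.0."""
    return [(t, 1.0 if verifier(t) else 0.0) for t in trajectories]


def to_sft_examples(scored: Iterable[tuple[Trajectory, float]]) -> list[dict[str, str]]:
    """Keep only verified trajectories and flatten them into prompt/completion pairs."""
    return [
        {"prompt": t.prompt, "completion": t.reasoning + "\n" + t.answer}
        for t, reward in scored
        if reward == 1.0
    ]
```

The same scored list serves both uses the paragraph names: the filtered pairs are supervised fine-tuning data, and the raw rewards are the signal a verifiable-reward trainer would consume.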

Meta is already doing a version of this. Its Applied AI Engineering group has engineers generating programming challenges to produce reinforcement-learning data, training the in-house MetaCode assistant to lean less on Anthropic’s Claude. The company writing some of the largest checks to frontier labs is funding its own exit at the same time.

IronCurtain, my open-source project, is built for exactly this. It is a runtime for AI agents that runs them on your own machine under a security policy you write, and vulnerability discovery is one of the example workflows it ships. With trajectory capture turned on, it records the full request and response of every model call, including whatever reasoning trace the model exposes, which is the raw material an SFT or verifiable-reward pipeline needs. It already runs across harnesses such as Codex, Claude Code, and Goose, drives open-weight models like GLM through any compatible endpoint as readily as proprietary ones, and lets a workflow assign a different model to each state. The forward direction is to curate those captured runs and train smaller open-weight models that specialize in a single state, such as hypothesis generation or harness construction, so the trace from an expensive frontier run becomes the training data that teaches a cheaper model the same job. As I have argued before, vulnerability discovery is an orchestration problem, not a frontier-model problem, and the workflow already runs end to end on open weights. Running it on local consumer GPUs, well below the cost of frontier APIs, is the goal that remains.

Frontier models outside the trust boundary

Frontier models have a place. They are tools for leverage, evaluation, and exploring the edge of what is possible. The mistake is letting them become the invisible policy engine inside a production system, where their price, their values, their refusals, and their availability are set by someone else. Use them from outside the boundary. Keep the model you depend on auditable, forkable, and yours.