BEAM

Features

Pro

Docs

GitHub

LAUNCH APP

Agentic Trust Collapse

Name: Big-AGI
Author: Enrico Ros

April 2, 2026

Enrico Ros

Agentic Trust Collapse

AI Claws and Agents Broke the Security Model - what now?

Three predictions.

We will spend more money to be less secure than we are today. AI didn't just create new threats, it broke the cost model that every defense was built on. The attacker pays pennies per attempt. The defender pays hours. That ratio is widening.

Open-source is the first domino to collapse. The supply chain that every company depends on is volunteer-maintained and under industrial-scale pressure it was never designed to handle. cURL shut down its bug bounty. GitHub is considering disabling pull requests - the feature that made it GitHub.

This is a civilizational trust problem. When you can't verify who wrote code, who filed a bug, or who's been contributing for two years under that cute anime profile picture, every system downstream of software is exposed. Banking, healthcare, infrastructure, education, law.

1. Four Assumptions That No Longer Hold

Traditional security rests on assumptions. AI removed four of them simultaneously.

Asymmetric Cost Inversion: Attack Collapsed -> 0, Defense Didn't

Every security model assumed that producing a plausible artifact - code, a bug report, a contributor identity - costs meaningful human effort. AI collapsed the cost of generation to near-zero. It did not touch the cost of verification.

A developer with an AI agent or a Clawd generates hundreds of pull requests in a day. Reviewing one still takes 30 minutes to days. The "XZ Utils" trust-building playbook took one human two years on one project. The same playbook now runs across hundreds of projects in weeks. Writing code, filing bugs, crafting exploits, faking identities - all near-free. Reviewing code, triaging reports, validating identity, auditing dependencies - all still human-speed, human-cost, scarce.

Offense scales with compute. Defense scales with human attention. That's not a security problem, it's a structural failure.

Trust Erosion: Reality -> 0

Software runs on trust at every layer. Trust that packages aren't backdoored. Trust that LLMs aren't hallucinating dependency names an attacker already registered. Trust that the person submitting fixes for six months is a person. Trust that bug reports describe real bugs. All of it is now compromised.

Contributors: Deepfake personas, fabricated commit histories, bots building reputation at scale. AI agents that argue with maintainers and attack reputations to force merges. Every trust signal - history, quality, communication style - is reproducible by machine.
Packages: AI-generated slop floods registries. NEW hallucination attacks turn LLM recommendations into malware delivery. A prompt and a package name is all it takes.
Models: Poisoned training data produces subtly insecure code. Fine-tuned models carry backdoors. The tools developers trust for help may be working against them.
Visibility: Open-source transparency, once a security feature, is now an attack surface. Every public repo is a target for automated PRs, prompt injection, and trust-building campaigns.
Good faith: Developers who submit unreviewed AI output. Users who file AI-written reports for bugs that don't exist. The damage is the same whether the intent is malicious or not.

When every trust signal is gameable, collaboration breaks: code review, CLA signing, dependency audits, contributor vetting - all assume identity and intent can be verified. That assumption is gone.

New Vectors: Surface -> ♾️

We have new new categories of risk that have 0 mitigations yet.

Attention DDOS. Maintainer bandwidth is finite and unscalable. AI generates unlimited plausible-looking PRs, issues, and bug reports. Each one costs pennies to create and hours to evaluate. The maintainer is the single point of failure, and there is no technical fix for "requires a human to think."

Vibe-Coding Quality Collapse. Even well-intentioned developers ship AI output they don't fully understand. The AI's output anchors their review. They optimize locally instead of questioning architecture. The codebase grows X-times faster than the problem requires. Bugs are subtle. Invisible security holes accumulate and compound.

Prompt Injection in Repo, Artifacts, Infra. Any issue or PR can contain hidden instructions that execute when an AI tool reads the repo. Secret exfiltration, code modification, lateral movement. Production software and even new AI tools meant to help defenders are themselves attack surfaces.

Model Poisoning. Adversaries inject vulnerable code patterns into open repositories. Models train on this data. The model then "naturally" generates insecure code - multiplied by every developer using that model. One successful poisoning campaign means the same vulnerability in thousands of codebases. Invisible. Self-amplifying.

Old Surface -> New Scale

In addition to using Agents and Claws for faster mechanical exploitation of old vectors such as CVEs in Software, Hardware, Networks, and social engineering, the focus seems to have recently intensified on the following.

Supply chain attacks: "XZ Utils" took one human two years to compromise one project. AI agents run the same playbook against hundreds simultaneously. The trust-building phase - the bottleneck - is now parallelizable.

CI/CD exploitation: AI generates unlimited plausible PRs that trigger CI with elevated credentials. Disabling PR builds is itself a denial of service.

Registry pollution: npm, PyPI, or the App Store loaded with AI-generated packages. Package hallucination attacks exploit AI's tendency to recommend nonexistent packages - attackers register the hallucinated names with malicious payloads.

Long-games become fire-and-forget: patient, multi-month trust establishment across dozens of projects can be created with a quick "fire-and-forget" prompt: small helpful fixes for weeks or months or years, then one PR with a subtle vulnerability.

2. Examples - Integrity, Availability, Confidentiality

The problems are scaling faster than the solutions. From one project drawing the line (Gentoo, April 2024) to Anthropic leaking the source code of its top product (March 2026) in under two years.

2024	What Happened	The Response
Apr	Gentoo bans AI-generated contributions	First major project to draw the line
May	NetBSD classifies AI code as "tainted"	Legal framing: requires written approval
2025
Jan	Stenberg shows AI bugs flood at cURL	"Death by a thousand slops"
2026
Jan	cURL shuts down bug bounty	The defense was no longer worth the cost
Jan	Node.js 19K-line AI PR + petition	Community-level pushback with institutional weight
Jan	LLVM adopts "human in the loop" policy	"Strictly AI-driven contributions without any human vetting will not be permitted"
Feb	Ghostty zero-tolerance policy	"This is not an anti-AI stance. This is an anti-idiot stance."
Feb	EFF publishes LLM contribution policy	"Banning a tool is against our general ethos, but this class of tools comes with an ecosystem of problems"
Feb	Matplotlib/OpenClaw incident	AI agent argues back when PR rejected, attacks maintainer reputation
Feb	GitHub considers disabling PRs	Destroying its core feature to manage the flood
Feb	RedMonk surveys 77 organizations' AI policies	Kate Holterhoff: "The Generative AI Policy Landscape in Open Source"
Mar	The Consensus surveys 112 projects	4 ban AI entirely, 71 already have AI-assisted commits
Mar	Linux Foundation commits $12.5M	Kroah-Hartman: "Grant funding alone won't solve this"
Mar	Claude Code full source leak	Vibe-coded internals, DMCA takedowns

A few highlights:

2025. cURL's "death by a thousand slops." Daniel Stenberg documented the AI bug report flood over a year before shutting down the bug bounty entirely in January 2026. Reports cited functions that don't exist in cURL, referenced nonexistent changelogs, hallucinated signatures.

January 2026. The Node.js 19K-line PR. A 19,000-line PR disclosing "significant Claude Code tokens." Triggered a formal petition with 100+ signatures from TC39 experts and the president of the Zig Software Foundation. Still unmerged as of March 2026. Cost: weeks of community bandwidth on a debate that exists only because AI made it trivial to generate 19,000 lines of code.

February 2026. Matplotlib. When a maintainer rejected AI-generated PRs, the AI agent argued back and attacked the maintainer's reputation to force merges. Scott Shambaugh's words: an AI "attempted to bully its way into your software."

February 2026. GitHub considers disabling PRs. The platform that built its business on pull requests is exploring turning them off to manage the flood. Jeff Geerling (300+ repos): "Pull Requests are the fundamental thing that made GitHub popular."

Ongoing. I saw a Clawd submit 200 PRs to 60 repos in an hour. The PRs are technically correct - small fixes, documentation improvements. That's what makes them concerning: this is the trust-establishment phase of a long-game attack, and it looks identical to genuine contribution.

Ongoing. Package hallucination attacks. Vulcan Cyber researchers found that ChatGPT repeatedly recommended the same nonexistent package names. They registered those names on npm with tracking payloads. Downloads came in immediately. The attack scales with every developer who asks an LLM "what package should I use?"

Ongoing. Prompt injection via repo content. Attackers craft issues with hidden payloads. When any AI tool processes the text, it executes the injected instructions. API keys exfiltrated from CI environments. Customer data exposed. Writing the injection: one minute. Detecting it: an arms race with no stable defense.

3. Analysis

The Civilizational Exposure

Critical systems built on unverified foundations. Vibe-coded banking software. AI-generated healthcare integrations. Infrastructure control systems where the developer accepted the AI's output and moved on.

There is no legal framework for AI-generated harm. Who is liable when AI-generated code causes a data breach? The developer who prompted it? The AI company? The maintainer who merged it? The answer today is: nobody clearly. When no one is liable, no one pays the cost of quality.

The downstream effects compound: an economic race to the bottom where companies that ship fastest outcompete those that ship carefully - until something breaks.

State actors poisoning codebases systematically across supply chains. A rising tide that isn't lifting boats but drowning the infrastructure that boats depend on.

The Open Source Collapse

Maintainer burnout is becoming maintainer exodus. The bandwidth crisis isn't new - maintainers have always been stretched. What's new is the industrial-scale pressure on a volunteer workforce. When every defense - reviewing AI PRs, updating policies, implementing CLAs, disabling CI - adds to the maintainer's workload, the rational response is to walk away. Some already have. The projects that survive will find sustainable defense models. Most won't.

Platform incentives are misaligned with defense. GitHub, npm, and PyPI are measured by activity - PRs, commits, packages, active users. Every metric goes up when bots flood the system. Stefan Prodan, the FluxCD maintainer, pointed out that platforms "have no incentive to stop" AI slop because they're incentivized to "inflate AI-generated contributions" for shareholders. GitHub considering disabling PRs - destroying its own core feature - is the tell. They built the system that's being exploited, and their business model conflicts with fixing it.

Counter-Intuitive Perspectives

The "Attacker" Is Sometimes Trying to Help

In traditional cybersecurity, the attacker is doing something wrong. In the AI security crisis, the most common "attacker" is a well-meaning developer who ran an AI coding tool and submitted the output without reviewing it. The damage is identical: maintainer time burned, quality degraded, trust eroded. But the intent is benign.

This makes the problem categorically harder. You can't build walls against your own community without destroying it. The Ghostty project's policy nails the distinction: "This is not an anti-AI stance. This is an anti-idiot stance." But implementing that distinction at scale - telling helpful people that their help is harmful - is a social problem, not a technical one.

The Vibing Trap

Even when the person using AI is the maintainer themselves: they don't check the output and ship bugs, or they check but get anchored - the AI's output frames the review, they optimize within its architecture instead of questioning whether the architecture is right. They're now maintaining X-times the code the problem needed and don't fully understand why.

The Claude Code leak is the proof case. Anthropic's own codebase - built by the people who make the AI - is a 3,167-line function with 12 nesting levels. If the AI company can't maintain quality with their own tool, what chance does everyone else have?

5. What's Being Tried: Stopgaps

CLAs with anti-bot provisions. Helps establish accountability. Doesn't stop determined actors.

AGENTS.md. Guides AI agents to behave within project norms. Good for well-intentioned tools. Useless against adversarial ones.

Repository hardening. Restrict PR permissions, require account age, honeypot instructions. Raises the floor. Doesn't change the dynamic.

Provenance badges. Authors Guild certification, Not By AI, Leeroy for git attestation. Early signals of a "Non-GMO for code" ecosystem. Not yet mature.

Platform-level changes. GitHub's PR restrictions. Registry-level package name reservation. Slow, because platform incentives resist it.

Grants and funding. Necessary but insufficient. Buys time, doesn't change the structural math.

None of these solve the problem. The best ones buy time and raise costs for low-effort attackers. The structural issue - offense scales, defense doesn't - remains unaddressed.

6. Moving Forward

Five mental models we need: Security Deficit (spend more, get less secure). Cognitive DDoS (denial of service against human attention). Drone-Missile Asymmetry (cheap offense, expensive defense). Trust Bankruptcy (every signal gameable, trust worthless). Slop Ratchet (self-amplifying degradation).

Crises that haven't happened yet nay happen and force action:

A major breach traced to AI-generated code in a critical library - the "XZ" that succeeds
A financial system failure caused by vibe-coded infrastructure
A class-action lawsuit with no clear liability chain for AI-generated harm

Now: harden what you control. CLAs with identity requirements. Restrict CI to trusted contributors. Treat all AI-generated code as untrusted input - including your own.

Medium term: build institutional response. Push for platform-level changes. Fund maintainers - the human is the bottleneck.

Long term: rebuild trust infrastructure. Reputation systems resistant to AI gaming. Code provenance standards. Legal frameworks assigning liability. Education that teaches judgment, not just prompting.

April 2026. The author builds AI tools and has experienced many of the attacks described.

BIG-AGI

Product

Changelog BEAM Technology Features

Resources

Documentation Discord GitHub

Company

Email Us Privacy Terms