Nepenthes: The Anti-AI Tarpit That Fights Back — And Why We Rebuilt It
March 2026 • 12 min read
I. The Problem: A Web Under Siege
The internet is being eaten alive. Not by users, not by hackers — by AI companies.
Since 2023, large language model (LLM) providers have deployed aggressive web crawlers that systematically scrape every website they can reach, hoovering up text to train their models. These crawlers operate at industrial scale: Anthropic's ClaudeBot was accused of hammering websites a million or more times a day. Facebook's crawler exceeded 30 million hits on individual sites. Reddit's CEO publicly called AI crawlers “a pain in the ass to block.”
The conventional defense — robots.txt — has proven toothless. A 2025 large-scale empirical study found that AI crawlers “rarely check robots.txt at all.” In December 2025, the court in Ziff Davis v. OpenAI ruled that robots.txt files do not constitute legally enforceable technological controls under the DMCA, comparing them to “a sign requesting that visitors ‘keep off the grass.’”
As of late 2025, over 5.6 million websites have added OpenAI's GPTBot to their disallow lists — up 70% from 3.3 million in July. Similar numbers block Anthropic's ClaudeBot and AppleBot. And yet the scraping continues unabated.
When asking nicely doesn't work, and the law won't help, what do you do?
You fight back. You build a tarpit.
II. The Original Creator
In the summer of 2024, a software developer we'll call “Aaron” (granted anonymity by Ars Technica) watched as Facebook's crawler exceeded 30 million hits on his personal website. He watched as AI company after AI company ignored robots.txt. And he decided to do something about it.
Drawing on an old anti-spam cybersecurity technique called tarpitting — deliberately slowing down connections to waste attackers' time — Aaron created Nepenthes. Named after a genus of carnivorous pitcher plants that digest anything unlucky enough to fall inside, the software was designed to be a digital equivalent: lure crawlers in, trap them, and make them suffer.
The original Nepenthes was written in Lua and released in mid-January 2025. Within days, tech journalist Cory Doctorow boosted a post by computer scientist Jürgen Geuter praising the tool on Mastodon, and the project exploded.
“That's when I realized, ‘oh this is going to be something,’” Aaron told Ars Technica. “I'm kind of shocked by how much it's blown up.”
As of his interview with Ars Technica, Aaron confirmed that Nepenthes could effectively trap every major web crawler he had observed. Only OpenAI's crawler managed to escape.
“Ultimately, it's like the Internet that I grew up on and loved is long gone. I'm just fed up, and you know what? Let's fight back, even if it's not successful. Be indigestible. Grow spikes.”
The original Nepenthes is still maintained at zadzmo.org and is currently at version 2.6.
III. Why This Is More Relevant Than Ever (2026)
When Nepenthes first appeared in January 2025, AI scraping was bad. A year later, it's worse. Here's why anti-AI tools matter more now than ever:
robots.txt Is Legally Worthless
The Ziff Davis v. OpenAI ruling in December 2025 settled the question definitively: robots.txt has no legal teeth. The court found that ignoring robots.txt instructions is “not ‘circumvention’ under the DMCA.” There is no penalty for crawlers that ignore it. The only thing standing between AI companies and your content is whatever technical measures you deploy yourself.
AI Companies Are Getting More Aggressive
An empirical study published in 2025 analyzing scraper behavior across thousands of websites found that “bots are less likely to comply with stricter robots.txt directives, and that certain categories of bots, including AI search crawlers, rarely check robots.txt at all.” Publishers are now blocking bots at the server level, moving beyond robots.txt entirely.
Model Collapse Is Scientifically Proven
In July 2024, Shumailov et al. published a landmark paper in Nature titled “AI models collapse when trained on recursively generated data.” They demonstrated that when AI models are trained on data produced by other AI models, they suffer irreversible defects. The distributions collapse. Tail knowledge disappears. The models get progressively dumber.
A 2025 ICLR paper took this further, showing that even 0.1% synthetic data (one in a thousand data points) in a training set can trigger model collapse and prevent larger datasets from improving performance.
This is the scientific basis for why Markov babble in tarpits isn't just noise — it's a weapon. Every page of generated nonsense that finds its way into an LLM's training pipeline contributes to degradation. The more tarpits there are, the more garbage enters the pipeline.
A Growing Ecosystem of Resistance
Nepenthes wasn't the last. Within days of its release, Gergely Nagy created Iocaine, a poisoning-focused tarpit that reportedly killed 94% of bot traffic to his site. Marcus Butler built Quixotic. The community has since produced KonterfAI, caddy-defender, markov-tarpit, spigot, and more. Jürgen Geuter maintains a list of tools built to sabotage AI.
As Gergely Nagy wrote on the Iocaine website: “Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.”
IV. How Nepenthes Works: A Technical Deep Dive
At its core, Nepenthes is deceptively simple. It only does three things: generate pages, link them together, and serve them slowly. But the way it does these things makes it remarkably effective.
The Infinite Maze
When a crawler requests a URL — say, /maze/abandon/crystal/harvest — Nepenthes generates a complete HTML page for that path. The page contains 10-40 links to other paths within the tarpit, each of which is also a valid URL that will generate yet another page. There are no exit links. The crawler follows link after link, going deeper, never finding the bottom, because there is no bottom.
The key insight is deterministic randomness. Each URL is hashed with an instance seed to produce a page seed. The same URL always generates the same page content and the same set of outbound links. To a crawler (or to a human visiting twice), the pages appear to be static files. There's no indication anything is being generated dynamically.
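The idea can be sketched in a few lines of Python. This is an illustration of the technique, not the actual implementation: the hash scheme, wordlist, and function names below are assumptions.

```python
import hashlib
import random

INSTANCE_SEED = "change-me"  # assumption: a per-instance secret seed

# assumption: a tiny stand-in for the real user-supplied wordlist
WORDLIST = ["abandon", "crystal", "harvest", "lantern", "meadow", "quarry"]

def links_for(path: str, n_min: int = 10, n_max: int = 40) -> list[str]:
    """Derive a stable page seed from the URL, then generate outbound links.

    The same path always yields the same links, so to any visitor the
    maze looks like a tree of static files."""
    page_seed = hashlib.sha256((INSTANCE_SEED + path).encode()).hexdigest()
    rng = random.Random(page_seed)  # deterministic per-URL RNG
    count = rng.randint(n_min, n_max)
    return [path.rstrip("/") + "/" + rng.choice(WORDLIST) for _ in range(count)]
```

Because the RNG is seeded from the URL, re-requesting a page reproduces it exactly; nothing needs to be stored.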
Drip-Feed Response
Most AI crawlers have timeout logic: if a server doesn't respond within a few seconds, they disconnect and move on. Nepenthes defeats this by sending the response byte by byte, with small delays between chunks. The crawler sees data arriving and keeps the connection open, waiting for the full page. A response that could be sent in milliseconds is stretched to 10-65 seconds (or more, depending on configuration).
This is why proxy_buffering off is critical in the nginx configuration. If the reverse proxy buffers the response, it defeats the drip-feed mechanism.
Markov Babble
Each page is filled with text generated by a Markov chain trained on a user-provided corpus. The text looks superficially like real content — it has sentence structure, vocabulary variation, and paragraph formatting — but it's nonsense. It means nothing. And if it gets into an LLM's training data, it contributes to model collapse.
The Markov engine keeps the corpus entirely in memory for a 40x speedup compared to SQLite-backed approaches used in earlier versions. Parameters like token count are controlled per-template, allowing different silos to generate different lengths and styles of babble.
Silos
Silos are virtual hosts within Nepenthes. Each silo can have its own corpus, wordlist, delay settings, templates, and URL prefixes. This allows a single instance to run multiple tarpits with different characteristics — for example, a fast silo for aggressive crawlers and a slow silo for more patient ones.
The Bogon Filter
Nepenthes validates incoming URLs against its configured wordlist. If a request comes in for a path that couldn't possibly have been generated (because it contains words not in the wordlist), the bogon filter fires and returns a 404. This makes it harder for crawlers to programmatically detect the tarpit — probing with random URLs will always get a 404, just like a real web server.
Redirect Chains
A configurable percentage of requests can be answered with a 302 redirect to another tarpit URL instead of serving a page. At redirect_rate: 100, this creates an infinite redirect chain where the crawler bounces endlessly between URLs without ever receiving a page. At lower rates, it adds variety to the trap and wastes additional time.
Statistics and Monitoring
The /stats API provides a rolling window of metrics: hits, unique addresses, unique user-agents, bytes sent, total delay imposed on crawlers, CPU usage, bogon count, redirect count, and active connections. This data can be exported to external analysis tools via the buffer endpoint.
V. What We Did: The Python Rewrite
Aaron's original Nepenthes is excellent. But it's written in Lua, which — while fast — creates a barrier to entry. Installing Lua 5.4, Luarocks, and five specific Lua modules (cqueues, luaossl, lpeg, lzlib, lunix) is not trivial, especially on platforms beyond Linux.
We rewrote Nepenthes from scratch in Python. Here's why:
- Accessibility: Python is among the most widely known programming languages. Installation is `pip install -r requirements.txt` and you're done.
- Docker: A `Dockerfile` and `docker-compose.yml` are included out of the box. `docker compose up -d` gives you a running tarpit in seconds.
- Modern async: Built on `aiohttp`, the Python rewrite uses native async/await for the drip-feed logic, handling many concurrent connections efficiently.
- Environment variables: In addition to YAML configuration, key settings can be overridden via environment variables — ideal for containerized deployments.
- Jinja2 templates: Replacing Lustache with Jinja2 gives template authors access to a much more powerful and well-documented templating language.
All core features are preserved: silos, Markov generation, drip-feeding, bogon filtering, redirect chains, and the complete statistics API.
Deploying in 5 Minutes
```shell
# Clone the repository
git clone https://github.com/YOUR_USERNAME/nepenthes-py.git
cd nepenthes-py

# Option A: Docker
docker compose up -d

# Option B: Manual
pip install -r requirements.txt
python -m nepenthes config.yml
```

Then configure nginx to proxy traffic into the tarpit, and wait.
VI. Ethical Considerations
This needs to be stated clearly: Nepenthes is deliberately adversarial software. Deploying it has real consequences.
- Search engine de-indexing: There is no way to target only AI crawlers. Search engine bots (Googlebot, Bingbot) will also be trapped, causing your site to disappear from search results.
- Server load: Aggressive crawlers with high concurrency can cause significant CPU and bandwidth usage. Monitor your stats and adjust delays accordingly.
- Collateral damage: Any legitimate user or service that follows links into the tarpit will also be affected, though most humans will notice the glacial page loads and leave.
- Legal uncertainty: While deploying software on your own server is generally within your rights, the legal landscape around adversarial anti-scraping measures is still evolving.
As Nathan VanHoudnos, a senior AI security researcher at Carnegie Mellon's CERT Division, told Ars Technica: tarpitting “does need to be taken seriously because it is a tool in a toolkit throughout the whole life cycle of these systems. There is no silver bullet, but this is an interesting tool in a toolkit.”
VII. Conclusion
The fight against unconsented AI scraping is not about winning. Aaron himself said it: “Let's fight back, even if it's not successful.”
It's about resistance. It's about raising the cost of extraction. It's about making AI companies pay — in compute, in bandwidth, in degraded model quality — for every byte they take without permission.
Every page of Markov babble that enters a training pipeline contributes to model collapse. Every minute a crawler spends trapped in an infinite maze is a minute it's not scraping real content. Every dollar spent processing garbage data is a dollar that won't generate shareholder returns.
Jürgen Geuter sees tarpits as a “powerful symbol” of resistance against technologies that are “not done ‘for us’ but ‘to us.’” Gergely Nagy wants to make AI poisoning the norm. Aaron just wants to make AI companies have to work for it.
We rebuilt Nepenthes in Python because we believe this tool should be as accessible as possible. Not everyone runs Lua. Almost everyone can run pip install.
Make your website indigestible. Grow some spikes.
Sources
- Belanger, A. (2025). “AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt.” Ars Technica.
- Shumailov, I. et al. (2024). “AI models collapse when trained on recursively generated data.” Nature.
- Goldman, E. (2025). “Are Robots.txt Instructions Legally Binding?” Technology & Marketing Law Blog.
- “Scrapers selectively respect robots.txt directives.” arXiv, 2025.
- “Publishers say no to AI scrapers, block bots at server level.” The Register, Dec 2025.
- Original Nepenthes: zadzmo.org/code/nepenthes
- Iocaine: iocaine.madhouse-project.org