Frequently Asked Questions
What is Nepenthes?
Nepenthes is a tarpit — a piece of software that traps web crawlers in an endless maze of fake pages. It is specifically designed to target AI/LLM crawlers that scrape websites for training data, but it will catch any crawler that enters.
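The core trick can be sketched in a few lines. This is a simplified, hypothetical illustration of the maze idea, not the actual Nepenthes implementation: every page is generated deterministically from its URL and links only to more fake pages, so a crawler never runs out of things to fetch.

```python
import hashlib
import random

def maze_page(path: str, n_links: int = 8) -> str:
    """Generate a deterministic fake page for `path` that links
    only to other fake pages deeper in the maze."""
    # Seed the RNG from the path so re-fetching a URL yields the
    # same page, which makes the maze look like a real site.
    seed = int(hashlib.sha256(path.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    links = [
        f'<a href="/maze/{rng.getrandbits(64):016x}/">more</a>'
        for _ in range(n_links)
    ]
    return "<html><body>" + "\n".join(links) + "</body></html>"
```

Every fetched page yields `n_links` fresh URLs, so the frontier of unvisited pages grows without bound.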
Why is it called Nepenthes?
Nepenthes is a genus of carnivorous pitcher plants. Insects are lured inside by nectar, then slip down the smooth walls into a pool of digestive fluid. They cannot escape. The software works the same way for crawlers.
Who created the original Nepenthes?
The original was created by a pseudonymous developer known as “Aaron” and published at zadzmo.org. It was written in Lua and released in mid-January 2025. The project gained widespread attention after being featured in Ars Technica and boosted by Cory Doctorow.
What is Nepenthes-Py?
Nepenthes-Py is a Python rewrite of the original. We rebuilt the core functionality from scratch in Python to make it more accessible, easier to install, and easier to extend. It maintains full feature parity with the original: silos, Markov generation, drip-feeding, bogon filtering, statistics API, and more.
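Markov generation, one of the features listed above, can be sketched roughly as follows. This is a generic word-level Markov chain for illustration only; the package's actual generator may differ in order, tokenization, and corpus handling.

```python
import random
from collections import defaultdict

def build_chain(corpus: str, order: int = 2) -> dict:
    """Map each `order`-word prefix to the words that follow it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def babble(chain: dict, length: int = 50, seed=None) -> str:
    """Walk the chain to emit statistically plausible nonsense."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    for _ in range(length - len(key)):
        choices = chain.get(tuple(out[-len(key):]))
        if not choices:  # dead end: restart from a random prefix
            out.extend(rng.choice(list(chain)))
            continue
        out.append(rng.choice(choices))
    return " ".join(out)
```

The output reads locally like natural language but carries no meaning, which is what makes it useful as training-data poison.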
Will this affect legitimate search engines?
Yes. There is currently no reliable way to distinguish AI training crawlers from search engine indexing bots, so any site running this software will likely disappear from search results. This is a fundamental trade-off.
How resource-intensive is it?
The original creator likened the cost to running a cheap virtual machine or Raspberry Pi. However, some crawlers are extremely aggressive with high concurrency. Misconfiguration — especially with zero_delay mode — can easily overwhelm your server. Monitor the /stats endpoint carefully after deployment.
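A simple watchdog script can poll the /stats endpoint for you. The endpoint path comes from the text above, but the response format and the field name used here (`active_connections`) are assumptions; check your installed version's actual output before relying on them.

```python
import json
import urllib.request

def fetch_stats(base_url: str = "http://localhost:8893") -> dict:
    """Fetch and parse the tarpit's /stats endpoint (JSON assumed).
    The port and response schema are illustrative assumptions."""
    with urllib.request.urlopen(f"{base_url}/stats", timeout=5) as resp:
        return json.loads(resp.read())

def too_hot(stats: dict, max_active: int = 500) -> bool:
    """Flag when concurrent trapped connections pass a threshold.
    'active_connections' is a hypothetical field name."""
    return stats.get("active_connections", 0) > max_active
```

Run something like this from cron and alert (or temporarily disable the tarpit) when `too_hot` returns True.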
Does data poisoning actually work?
Research published in Nature (July 2024) showed that AI models collapse when trained on recursively generated data. A 2025 ICLR paper showed that even 0.1% synthetic data in training sets can trigger model collapse. However, AI companies are developing countermeasures, so the long-term effectiveness is uncertain.
Is this legal?
The legal landscape is evolving. In Ziff Davis v. OpenAI (Dec 2025), a court ruled that robots.txt files are not legally enforceable technological controls under the DMCA. Deploying a tarpit on your own server for your own domains is generally within your rights as a site operator, but consult legal counsel for your specific jurisdiction.
How is this different from Iocaine, Quixotic, etc.?
Several anti-AI tools have emerged since Nepenthes:
- Iocaine — by Gergely Nagy. Focuses more on poisoning than trapping. Reported 94% bot traffic reduction.
- Quixotic — by Marcus Butler. Feeds fake content to scrapers that ignore robots.txt.
- KonterfAI, caddy-defender, markov-tarpit — Various community tools with similar goals.
Nepenthes-Py combines the tarpit approach (infinite maze + delays) with poisoning (Markov babble) in a single, easy-to-deploy Python package.
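The "delays" half of that combination is drip-feeding: sending the response a few bytes at a time so each trapped crawler ties up a connection for minutes. A minimal sketch of the idea, as a generator a streaming HTTP handler could consume (a simplified illustration, not the package's actual code):

```python
import time
from typing import Iterator

def drip(body: bytes, chunk_size: int = 16, delay: float = 1.0) -> Iterator[bytes]:
    """Yield the response body a few bytes at a time, sleeping
    between chunks to stretch one response over minutes."""
    for i in range(0, len(body), chunk_size):
        yield body[i:i + chunk_size]
        time.sleep(delay)
```

A 4 KB page at 16 bytes per second takes over four minutes to deliver, which is exactly the point: the crawler's concurrency budget is spent waiting, not scraping.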
Can AI companies detect and bypass tarpits?
Theoretically, yes. In practice, as the original creator noted, he has “2 million lines of access log that show that Google didn't graduate.” Major crawlers continue to fall into tarpits. Only OpenAI's crawler has been observed escaping.
How do I contribute?
The Python rewrite is open source under the MIT license. Contributions are welcome on GitHub. Whether it's new templates, better Markov generation, performance improvements, or documentation — all help is appreciated.