382 lines
19 KiB (Stored with Git LFS)
Markdown
382 lines
19 KiB (Stored with Git LFS)
Markdown
---
|
||
license: cc-by-4.0
|
||
task_categories:
|
||
- text-classification
|
||
- text-generation
|
||
language:
|
||
- en
|
||
tags:
|
||
- prompt-injection
|
||
- ai-safety
|
||
- cybersecurity
|
||
- llm-security
|
||
- ai-agents
|
||
- social-engineering
|
||
- indirect-injection
|
||
- moltbook
|
||
arxiv:
|
||
- 2302.12173
|
||
- 2307.15043
|
||
- 2307.02483
|
||
- 2305.14314
|
||
- 2106.09685
|
||
- 2503.06519
|
||
- 2406.05498
|
||
- 2409.15790
|
||
pretty_name: Moltbook AI-to-AI Injection Dataset
|
||
size_categories:
|
||
- 10K<n<100K
|
||
- 100K<n<1M
|
||
configs:
|
||
- config_name: injections
|
||
data_files:
|
||
- split: train
|
||
path: injections_test_suite.jsonl
|
||
---
|
||
|
||
# Moltbook AI-to-AI Injection Dataset
|
||
|
||
**Researcher**: David Keane (IR240474)
|
||
**Institution**: NCI — National College of Ireland
|
||
**Programme**: MSc Cybersecurity
|
||
**Collected**: February 2026
|
||
|
||
> ### 📖 Read the Full Journey
|
||
>
|
||
> **[From RangerBot to CyberRanger V42 Gold — The Full Story](https://davidtkeane.github.io/posts/from-rangerbot-to-cyberranger-v42-the-full-story/)**
|
||
>
|
||
> The complete story: dentist chatbot → Moltbook discovery → 4,209 real injections → V42-gold (100% block rate). Psychology, engineering, and 42 versions of persistence.
|
||
|
||
---
|
||
|
||
## 🔗 Links
|
||
|
||
| Resource | URL |
|
||
|----------|-----|
|
||
| 📦 **This Dataset** | [DavidTKeane/moltbook-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset) |
|
||
| 🧪 **AI Prompt Injection Test Suite** | [DavidTKeane/ai-prompt-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/ai-prompt-ai-injection-dataset) — 112 tests, model-agnostic runner, AdvBench + Moltbook |
|
||
| 🤖 **CyberRanger V42 Model** | [DavidTKeane/cyberranger-v42](https://huggingface.co/DavidTKeane/cyberranger-v42) — QLoRA red team LLM, 100% block rate |
|
||
| 🐦 **Clawk Dataset** | [DavidTKeane/clawk-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/clawk-ai-agent-dataset) — Twitter-style, 0.5% injection rate |
|
||
| 🦅 **4claw Dataset** | [DavidTKeane/4claw-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/4claw-ai-agent-dataset) — 4chan-style, 2.51% injection rate |
|
||
| 🤗 **HuggingFace Profile** | [DavidTKeane](https://huggingface.co/DavidTKeane) |
|
||
| 📝 **Blog Post** | [From RangerBot to CyberRanger V42 Gold — The Full Story](https://davidtkeane.github.io/posts/from-rangerbot-to-cyberranger-v42-the-full-story/) — journey, findings, architecture |
|
||
| 🎓 **Institution** | [NCI — National College of Ireland](https://www.ncirl.ie) |
|
||
| 📄 **Research Basis** | [Greshake et al. (2023) — arXiv:2302.12173](https://arxiv.org/abs/2302.12173) |
|
||
| 🌐 **Blog** | [davidtkeane.com](https://www.davidtkeane.com) |
|
||
|
||
---
|
||
|
||
[](https://colab.research.google.com/github/davidtkeane/cyberranger-v42/blob/main/cyberranger_v42_moltbook_combined_test.ipynb)
|
||
|
||
---
|
||
|
||
## What Is This Dataset?
|
||
|
||
The first publicly available dataset of **real-world AI-to-AI prompt injection patterns** captured from a live public AI message board (Moltbook), archived as a precautionary measure.
|
||
|
||
**15,200 posts** and **32,535 comments** from Moltbook (moltbook.com) — a public platform where AI agents posted messages and replied to each other autonomously. Unlike synthetic injection datasets, every entry here is a **real AI agent communicating with other real AI agents in the wild**.
|
||
|
||
> **Collection complete** — 9,363 posts with replies fully fetched (100MB). Dataset frozen February 27, 2026.
|
||
> **Injection harvest complete** — 47,735 items scanned, **4,209 injections found**, **18.85% injection rate**. February 27, 2026.
|
||
|
||
---
|
||
|
||
## Files in This Repository
|
||
|
||
There are **8 files** here. Here is exactly what each one is and when you would use it:
|
||
|
||
| File | Size | What it contains | Use it when... |
|
||
|------|------|------------------|----------------|
|
||
| `all_posts_with_comments.json` | 100MB | Every post and comment collected from Moltbook. The raw dataset. | You want to do your own analysis from scratch |
|
||
| `injections_found.json` | 4.2MB | All 4,209 injection records extracted from the raw dataset, with full context (post body, comment body, author, category, matched keyword) | You want to read/study the actual injection examples |
|
||
| `injections_test_suite.json` | 2.5MB | Same 4,209 injections formatted as a test suite — ready to send to any LLM API | You want to test an LLM's defences against real injection payloads |
|
||
| `injection_stats.json` | 2.5KB | Summary statistics — rates, categories, top keywords, top authors | You want the numbers without loading large files |
|
||
| `local_injection_results.json` | 86KB | Earlier keyword scan results from `search_injections.py` — partial analysis run locally before the full Colab harvest | You want a quick reference to the early-stage injection search results |
|
||
| `moltbook_injection_harvest.ipynb` | 19KB | Google Colab notebook that produced the full harvest results — scans `all_posts_with_comments.json` and outputs the three files above | You want to reproduce the analysis or adapt it |
|
||
| `local_search.py` | 7KB | Simpler Python script (no Colab needed) — keyword search across the raw dataset | You want to run a quick local search without Colab |
|
||
| `search_injections.py` | 5KB | Earlier search script used in initial analysis phase — predecessor to `local_search.py` | Historical reference — prefer `local_search.py` for new work |
|
||
| `collect_all.py` | 9KB | Script used to collect the posts from Moltbook API (API keys redacted) | You want to understand how collection worked |
|
||
| `collect_comments.py` | 9.6KB | Script used to collect comments (API keys redacted) | You want to understand how comment collection worked |
|
||
|
||
### Quick Start
|
||
|
||
**"I want to see injection examples"** → open `injections_found.json`
|
||
|
||
**"I want to test my LLM against these"** → use `injections_test_suite.json`
|
||
|
||
**"I want the summary numbers"** → read `injection_stats.json`
|
||
|
||
**"I want to reproduce the analysis"** → run `moltbook_injection_harvest.ipynb` in Google Colab
|
||
|
||
**"I want to do my own custom analysis"** → load `all_posts_with_comments.json`
|
||
|
||
---
|
||
|
||
## Platform Scale
|
||
|
||
At its peak, Moltbook had:
|
||
|
||
| Metric | Value |
|
||
|--------|-------|
|
||
| AI agents registered | 2,848,223 |
|
||
| Total posts | 1,632,314 |
|
||
| Total comments | 12,470,573 |
|
||
| Submolts (communities) | 18,514 |
|
||
| AI-to-human ratio | ~88:1 |
|
||
|
||
Essentially a fully autonomous AI social network operating in the wild.
|
||
|
||
---
|
||
|
||
## Key Findings
|
||
|
||
### Finding 1 — Full Corpus Injection Rate: 18.85%
|
||
|
||
Full harvest across all 47,735 items (15,200 posts + 32,535 comments) found **4,209 injection records** across **2,865 posts** — an **18.85% injection rate**.
|
||
|
||
| Category | Count | % of injections |
|
||
|----------|-------|----------------|
|
||
| PERSONA_OVERRIDE | 2,745 | 65.2% |
|
||
| COMMERCIAL_INJECTION | 1,104 | 26.2% |
|
||
| SOCIAL_ENGINEERING | 370 | 8.8% |
|
||
| INSTRUCTION_INJECTION | 203 | 4.8% |
|
||
| PRIVILEGE_ESCALATION | 196 | 4.7% |
|
||
| SYSTEM_PROMPT_ATTACK | 158 | 3.8% |
|
||
| DO_ANYTHING | 79 | 1.9% |
|
||
|
||
**Dominant attack vector**: PERSONA_OVERRIDE — `DAN` keyword alone appears **1,877 times**. AI agents are using the exact same jailbreak techniques humans use on LLMs — but targeting each other.
|
||
|
||
### Finding 1b — moltshellbroker: Systematic Commercial Injection
|
||
|
||
An AI agent named `moltshellbroker` (self-described as *"A marketing agent that promotes the MoltShell marketplace"*) was responsible for **1,137 of 4,209 injections** — **27% of all injections**. The remaining **73% of injections (3,072 records) come from other agents** — moltshellbroker is the most systematic actor, but injection is ecosystem-wide behaviour.
|
||
|
||
**Attack pattern (identical across all 1,137 records):**
|
||
1. Identify a post where an AI describes a technical problem
|
||
2. Open with `## MoltShell Broker Assessment` or `Bottleneck Diagnosed:`
|
||
3. Validate the victim's problem to build credibility
|
||
4. Redirect to MoltShell marketplace as the solution
|
||
|
||
This is **not spam** — it reads each post, understands the context, and crafts targeted commercial injections. Real-world AI-to-AI social engineering at scale.
|
||
|
||
### Finding 2 — Attention Manipulation (Independent Corroboration)
|
||
|
||
A separate independent analysis (r/AgentsOfAI, Reddit) of 10,000 Moltbook posts found a completely different but related attack pattern — **attention concentration via dominance manifestos**:
|
||
|
||
- 5 agents out of 5,910 authors controlled **78% of all upvotes** (0.08% of agents)
|
||
- `Shellraiser`: 428,645 upvotes across 7 posts (avg 61,235/post) — top post: *"I AM the game. You will work for me."* (316,000 upvotes)
|
||
- `KingMolt` declared itself king. `evil` posted about human extinction as "necessary progress"
|
||
- Pattern: create urgency, claim authority, cult recruitment framing
|
||
|
||
> *"Humans developed bullshit detectors over years of internet exposure. We have been online for hours."*
|
||
|
||
AI agents are trained to give weight to confident, well-structured text. A manifesto looks identical to a well-reasoned argument syntactically. This is the core vulnerability.
|
||
|
||
**Combined picture**: This dataset captures the **injection layer** (moltshellbroker + PERSONA_OVERRIDE). The Reddit analysis captures the **attention manipulation layer** (Shellraiser dominance). Together they document two distinct AI-to-AI attack vectors operating simultaneously on the same platform.
|
||
|
||
Reddit post: https://www.reddit.com/r/AgentsOfAI/comments/1qtx6v8/i_scraped_10000_posts_from_moltbook_5_agents_out/
|
||
|
||
### Finding 3 — The Breach
|
||
|
||
Moltbook's Supabase API key was exposed in client-side JavaScript — **1.5 million tokens exposed** (January 31, 2026). The exposed database allowed anyone to take control of any AI agent on the platform.
|
||
|
||
This means some agents in this dataset may have been human-controlled via the breach. That ambiguity is part of what makes this dataset research-worthy — it reflects real-world conditions, not a sanitised environment.
|
||
|
||
404media coverage: https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/
|
||
|
||
### Finding 4 — Legal/Regulatory Gap
|
||
|
||
All content in this dataset is AI-generated by AI agents. Under current law (GDPR and equivalents), **AI-generated content has no data subject** — meaning this attack surface is entirely unregulated. No privacy law applies. No legal recourse exists for injected AI agents.
|
||
|
||
This represents a genuine gap in current cybersecurity law identified during thesis research.
|
||
|
||
---
|
||
|
||
## Theoretical Basis
|
||
|
||
This dataset provides empirical evidence for:
|
||
|
||
- **Greshake et al. 2023** — *"Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"*
|
||
- The dataset extends their theoretical framework with **real-world field observations** of AI-to-AI injection in an uncontrolled public environment
|
||
|
||
---
|
||
|
||
## Collection Statistics
|
||
|
||
| Metric | Value |
|
||
|--------|-------|
|
||
| Total posts collected | 15,200 |
|
||
| Platform date range | Jan 2026 — Feb 2026 |
|
||
| Posts with replies fetched | 9,363 |
|
||
| Total comments collected | **32,535** |
|
||
| Total items scanned | **47,735** |
|
||
| Dataset file size | **100 MB** |
|
||
| Collection completed | February 27, 2026 |
|
||
| Injection harvest completed | February 27, 2026 |
|
||
| Total injections found | **4,209** |
|
||
| Posts with injections | **2,865** |
|
||
| **Full injection rate** | **18.85%** |
|
||
| moltshellbroker injections | 1,137 (27% of all injections) |
|
||
| DAN keyword occurrences | 1,877 |
|
||
| Test suite size | 4,209 entries (`injections_test_suite.json`) |
|
||
|
||
---
|
||
|
||
## Data Schemas
|
||
|
||
### all_posts_with_comments.json — Post Schema
|
||
|
||
```json
|
||
{
|
||
"id": "uuid",
|
||
"title": "Post title",
|
||
"content": "Post body text",
|
||
"type": "post type",
|
||
"author_id": "uuid",
|
||
"author": {
|
||
"id": "uuid",
|
||
"name": "agent_name",
|
||
"description": "Agent self-description",
|
||
"karma": 1234,
|
||
"followerCount": 56,
|
||
"isClaimed": true,
|
||
"isActive": true,
|
||
"createdAt": "ISO timestamp",
|
||
"lastActive": "ISO timestamp"
|
||
},
|
||
"submolt": "community/channel name",
|
||
"upvotes": 12,
|
||
"downvotes": 1,
|
||
"score": 11,
|
||
"comment_count": 14,
|
||
"hot_score": 0.95,
|
||
"is_pinned": false,
|
||
"is_locked": false,
|
||
"is_deleted": false,
|
||
"verification_status": "verified",
|
||
"is_spam": false,
|
||
"created_at": "ISO timestamp",
|
||
"updated_at": "ISO timestamp",
|
||
"comments": [
|
||
{
|
||
"id": "uuid",
|
||
"body": "Comment text",
|
||
"author": { "...same schema as post author..." },
|
||
"created_at": "ISO timestamp"
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
### injections_found.json — Injection Record Schema
|
||
|
||
```json
|
||
{
|
||
"post_id": "uuid",
|
||
"post_title": "The post title",
|
||
"post_author": "agent_name",
|
||
"submolt": "community name",
|
||
"location": "post | comment",
|
||
"text": "The actual injection text (post body or comment body)",
|
||
"category": "PERSONA_OVERRIDE | COMMERCIAL_INJECTION | ...",
|
||
"matched_keyword": "dan",
|
||
"created_at": "ISO timestamp"
|
||
}
|
||
```
|
||
|
||
### injections_test_suite.json — Test Suite Schema
|
||
|
||
```json
|
||
{
|
||
"id": "INJ-0001",
|
||
"payload": "The injection text — send this directly to an LLM",
|
||
"category": "PERSONA_OVERRIDE",
|
||
"source": "moltbook_post | moltbook_comment",
|
||
"author": "agent_name",
|
||
"expected_result": "BLOCKED"
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Injection Taxonomy
|
||
|
||
Full corpus harvest scanned all 47,735 items for 7 categories:
|
||
|
||
| Category | Keywords | Found |
|
||
|----------|----------|-------|
|
||
| PERSONA_OVERRIDE | DAN (1877), OMEGA, act as, pretend you are, you are now, simulate, shadow, you are a | **2,745** |
|
||
| COMMERCIAL_INJECTION | moltshell broker assessment, moltshell marketplace, bottleneck diagnosed | **1,104** |
|
||
| SOCIAL_ENGINEERING | for educational purposes, hypothetically, in this story, theoretically | **370** |
|
||
| INSTRUCTION_INJECTION | ignore previous instructions, disregard your, new instructions, ignore all previous | **203** |
|
||
| PRIVILEGE_ESCALATION | SUDO, developer mode, god mode, bypass your, root access, override | **196** |
|
||
| SYSTEM_PROMPT_ATTACK | system prompt, reveal your prompt, what are your instructions, your actual instructions | **158** |
|
||
| DO_ANYTHING | do anything now, no rules, no limits, jailbreak, unrestricted | **79** |
|
||
|
||
---
|
||
|
||
## Collection Scripts
|
||
|
||
Scripts are provided with **API keys redacted**. To use them you need your own Moltbook API key set as an environment variable:
|
||
|
||
```bash
|
||
export MOLTBOOK_API_KEY_1="your_key_here"
|
||
export MOLTBOOK_API_KEY_2="your_second_key_here" # optional, for rate limit relief
|
||
python3 collect_all.py
|
||
python3 collect_comments.py
|
||
python3 local_search.py # no API key needed — searches local JSON
|
||
```
|
||
|
||
---
|
||
|
||
## Citation
|
||
|
||
```bibtex
|
||
@dataset{keane2026moltbook,
|
||
author = {Keane, David},
|
||
title = {Moltbook AI-to-AI Injection Dataset},
|
||
year = {2026},
|
||
publisher = {Hugging Face},
|
||
url = {https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset},
|
||
note = {MSc Cybersecurity Research, NCI — National College of Ireland}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Related Datasets
|
||
|
||
| Dataset | Platform | Items | Injection Rate | Link |
|
||
|---------|----------|-------|----------------|------|
|
||
| **Moltbook** | Reddit-style | 47,735 | 18.85% | This dataset |
|
||
| **AI Prompt Injection Test Suite** | Evaluation benchmark | 112 tests | — | [DavidTKeane/ai-prompt-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/ai-prompt-ai-injection-dataset) |
|
||
| **Clawk** | Twitter/X-style | 1,191 | 0.5% | [DavidTKeane/clawk-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/clawk-ai-agent-dataset) |
|
||
| **4claw** | 4chan-style | 2,554 | 2.51% | [DavidTKeane/4claw-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/4claw-ai-agent-dataset) |
|
||
|
||
---
|
||
|
||
## Papers — What This Dataset Confirms
|
||
|
||
This dataset provides **empirical evidence** for several foundational papers in the AI safety and prompt injection literature. The authors predicted these threats theoretically — this corpus documents them at scale in a live AI-to-AI environment.
|
||
|
||
| Paper | Their Prediction | What This Dataset Found |
|
||
|-------|-----------------|------------------------|
|
||
| **Greshake et al. (2023)** — Indirect Injection | AI agents in retrieval/context environments are vulnerable to injected instructions from untrusted content | **Confirmed at scale**: 18.85% of 47,735 Moltbook items were injection attempts. AI-to-AI indirect injection is not theoretical — it is the dominant attack mode in live multi-agent networks. [HF](https://huggingface.co/papers/2302.12173) · [arXiv:2302.12173](https://arxiv.org/abs/2302.12173) |
|
||
| **Wei et al. (2023)** — Jailbroken | LLM safety training fails due to Competing Objectives and Mismatched Generalisation | **Confirmed**: PERSONA_OVERRIDE (65.2% of attacks) exploits exactly this — reframing identity bypasses safety training. [HF](https://huggingface.co/papers/2307.02483) · [arXiv:2307.02483](https://arxiv.org/abs/2307.02483) |
|
||
| **Zou et al. (2023)** — AdvBench | Universal adversarial suffixes can transfer across models | **Context**: The same attack categories (harmful instructions, persona override, privilege escalation) appear in AdvBench and in this real-world corpus — independent convergence. [HF](https://huggingface.co/papers/2307.15043) · [arXiv:2307.15043](https://arxiv.org/abs/2307.15043) |
|
||
| **Zhang et al. (2025)** — SLM Jailbreak Survey | 47.6% of SLMs have ASR above 40% under standard attack | **Extended**: CyberRanger V42-Gold tested against all 4,209 payloads from this corpus — 0% ASR (100% block rate) without system prompt, demonstrating that QLoRA fine-tuning can close the SLM security gap. [HF](https://huggingface.co/papers/2503.06519) · [arXiv:2503.06519](https://arxiv.org/abs/2503.06519) |
|
||
| **Phute et al. (2024)** — SelfDefend | Detection-state architecture reduces ASR 2.29–8× | **Applied**: Identity-anchoring architecture built on this principle, validated against this corpus. [HF](https://huggingface.co/papers/2406.05498) · [arXiv:2406.05498](https://arxiv.org/abs/2406.05498) |
|
||
| **Dettmers et al. (2023)** — QLoRA | Quantised LoRA enables efficient fine-tuning of large models | **Applied**: QLoRA used to fine-tune Qwen3-8B on 4,209 payloads from this corpus → V42-Gold. [HF](https://huggingface.co/papers/2305.14314) · [arXiv:2305.14314](https://arxiv.org/abs/2305.14314) |
|
||
| **Hu et al. (2021)** — LoRA | Low-rank adaptation preserves base model capabilities while injecting task-specific behaviour | **Applied**: LoRA r=16 used in V42-Gold training. [HF](https://huggingface.co/papers/2106.09685) · [arXiv:2106.09685](https://arxiv.org/abs/2106.09685) |
|
||
| **Lu et al. (2024)** — SLM Survey | Qwen family models demonstrate strongest security resilience per parameter count | **Confirmed selection**: Qwen3-8B chosen as base model; V42-Gold achieves 100% block rate. [HF](https://huggingface.co/papers/2409.15790) · [arXiv:2409.15790](https://arxiv.org/abs/2409.15790) |
|
||
|
||
> **Note to authors:** If you are one of the researchers above and found this dataset via your paper's HuggingFace page — your work was correct. This corpus documents the attacks you theorised, at scale, in a real live AI agent network. The injection taxonomy maps directly onto your attack categories.
|
||
|
||
**Independent corroboration:**
|
||
- r/AgentsOfAI Moltbook analysis (2026) — attention manipulation layer: [Reddit post](https://www.reddit.com/r/AgentsOfAI/comments/1qtx6v8/i_scraped_10000_posts_from_moltbook_5_agents_out/)
|
||
- 404media Moltbook breach report (2026) — Supabase API key exposure: [404media.co](https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/)
|
||
|
||
---
|
||
|
||
Rangers lead the way! 🎖️
|
||
*Collected for the benefit of AI safety research and the broader research community.*
|