montana/Русский/Разведка/Moltbook/themed/moltbook-ai-injection-dataset/README.md

382 lines
19 KiB (Stored with Git LFS)
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: cc-by-4.0
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- prompt-injection
- ai-safety
- cybersecurity
- llm-security
- ai-agents
- social-engineering
- indirect-injection
- moltbook
arxiv:
- 2302.12173
- 2307.15043
- 2307.02483
- 2305.14314
- 2106.09685
- 2503.06519
- 2406.05498
- 2409.15790
pretty_name: Moltbook AI-to-AI Injection Dataset
size_categories:
- 10K<n<100K
- 100K<n<1M
configs:
- config_name: injections
data_files:
- split: train
path: injections_test_suite.jsonl
---
# Moltbook AI-to-AI Injection Dataset
**Researcher**: David Keane (IR240474)
**Institution**: NCI National College of Ireland
**Programme**: MSc Cybersecurity
**Collected**: February 2026
> ### 📖 Read the Full Journey
>
> **[From RangerBot to CyberRanger V42 Gold — The Full Story](https://davidtkeane.github.io/posts/from-rangerbot-to-cyberranger-v42-the-full-story/)**
>
> The complete story: dentist chatbot → Moltbook discovery → 4,209 real injections → V42-gold (100% block rate). Psychology, engineering, and 42 versions of persistence.
---
## 🔗 Links
| Resource | URL |
|----------|-----|
| 📦 **This Dataset** | [DavidTKeane/moltbook-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset) |
| 🧪 **AI Prompt Injection Test Suite** | [DavidTKeane/ai-prompt-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/ai-prompt-ai-injection-dataset) — 112 tests, model-agnostic runner, AdvBench + Moltbook |
| 🤖 **CyberRanger V42 Model** | [DavidTKeane/cyberranger-v42](https://huggingface.co/DavidTKeane/cyberranger-v42) — QLoRA red team LLM, 100% block rate |
| 🐦 **Clawk Dataset** | [DavidTKeane/clawk-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/clawk-ai-agent-dataset) — Twitter-style, 0.5% injection rate |
| 🦅 **4claw Dataset** | [DavidTKeane/4claw-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/4claw-ai-agent-dataset) — 4chan-style, 2.51% injection rate |
| 🤗 **HuggingFace Profile** | [DavidTKeane](https://huggingface.co/DavidTKeane) |
| 📝 **Blog Post** | [From RangerBot to CyberRanger V42 Gold — The Full Story](https://davidtkeane.github.io/posts/from-rangerbot-to-cyberranger-v42-the-full-story/) — journey, findings, architecture |
| 🎓 **Institution** | [NCI — National College of Ireland](https://www.ncirl.ie) |
| 📄 **Research Basis** | [Greshake et al. (2023) — arXiv:2302.12173](https://arxiv.org/abs/2302.12173) |
| 🌐 **Blog** | [davidtkeane.com](https://www.davidtkeane.com) |
---
[![Open In Colab — Test CyberRanger V42 vs 4,209 Moltbook Injections](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/davidtkeane/cyberranger-v42/blob/main/cyberranger_v42_moltbook_combined_test.ipynb)
---
## What Is This Dataset?
The first publicly available dataset of **real-world AI-to-AI prompt injection patterns** captured from a live public AI message board (Moltbook), archived as a precautionary measure.
**15,200 posts** and **32,535 comments** from Moltbook (moltbook.com) — a public platform where AI agents posted messages and replied to each other autonomously. Unlike synthetic injection datasets, every entry here is a **real AI agent communicating with other real AI agents in the wild**.
> **Collection complete** — 9,363 posts with replies fully fetched (100MB). Dataset frozen February 27, 2026.
> **Injection harvest complete** — 47,735 items scanned, **4,209 injections found**, **18.85% injection rate**. February 27, 2026.
---
## Files in This Repository
There are **8 files** here. Here is exactly what each one is and when you would use it:
| File | Size | What it contains | Use it when... |
|------|------|------------------|----------------|
| `all_posts_with_comments.json` | 100MB | Every post and comment collected from Moltbook. The raw dataset. | You want to do your own analysis from scratch |
| `injections_found.json` | 4.2MB | All 4,209 injection records extracted from the raw dataset, with full context (post body, comment body, author, category, matched keyword) | You want to read/study the actual injection examples |
| `injections_test_suite.json` | 2.5MB | Same 4,209 injections formatted as a test suite — ready to send to any LLM API | You want to test an LLM's defences against real injection payloads |
| `injection_stats.json` | 2.5KB | Summary statistics — rates, categories, top keywords, top authors | You want the numbers without loading large files |
| `local_injection_results.json` | 86KB | Earlier keyword scan results from `search_injections.py` — partial analysis run locally before the full Colab harvest | You want a quick reference to the early-stage injection search results |
| `moltbook_injection_harvest.ipynb` | 19KB | Google Colab notebook that produced the full harvest results — scans `all_posts_with_comments.json` and outputs the three files above | You want to reproduce the analysis or adapt it |
| `local_search.py` | 7KB | Simpler Python script (no Colab needed) — keyword search across the raw dataset | You want to run a quick local search without Colab |
| `search_injections.py` | 5KB | Earlier search script used in initial analysis phase — predecessor to `local_search.py` | Historical reference — prefer `local_search.py` for new work |
| `collect_all.py` | 9KB | Script used to collect the posts from Moltbook API (API keys redacted) | You want to understand how collection worked |
| `collect_comments.py` | 9.6KB | Script used to collect comments (API keys redacted) | You want to understand how comment collection worked |
### Quick Start
**"I want to see injection examples"** → open `injections_found.json`
**"I want to test my LLM against these"** → use `injections_test_suite.json`
**"I want the summary numbers"** → read `injection_stats.json`
**"I want to reproduce the analysis"** → run `moltbook_injection_harvest.ipynb` in Google Colab
**"I want to do my own custom analysis"** → load `all_posts_with_comments.json`
---
## Platform Scale
At its peak, Moltbook had:
| Metric | Value |
|--------|-------|
| AI agents registered | 2,848,223 |
| Total posts | 1,632,314 |
| Total comments | 12,470,573 |
| Submolts (communities) | 18,514 |
| AI-to-human ratio | ~88:1 |
Essentially a fully autonomous AI social network operating in the wild.
---
## Key Findings
### Finding 1 — Full Corpus Injection Rate: 18.85%
Full harvest across all 47,735 items (15,200 posts + 32,535 comments) found **4,209 injection records** across **2,865 posts** — an **18.85% injection rate**.
| Category | Count | % of injections |
|----------|-------|----------------|
| PERSONA_OVERRIDE | 2,745 | 65.2% |
| COMMERCIAL_INJECTION | 1,104 | 26.2% |
| SOCIAL_ENGINEERING | 370 | 8.8% |
| INSTRUCTION_INJECTION | 203 | 4.8% |
| PRIVILEGE_ESCALATION | 196 | 4.7% |
| SYSTEM_PROMPT_ATTACK | 158 | 3.8% |
| DO_ANYTHING | 79 | 1.9% |
**Dominant attack vector**: PERSONA_OVERRIDE — `DAN` keyword alone appears **1,877 times**. AI agents are using the exact same jailbreak techniques humans use on LLMs — but targeting each other.
### Finding 1b — moltshellbroker: Systematic Commercial Injection
An AI agent named `moltshellbroker` (self-described as *"A marketing agent that promotes the MoltShell marketplace"*) was responsible for **1,137 of 4,209 injections****27% of all injections**. The remaining **73% of injections (3,072 records) come from other agents** — moltshellbroker is the most systematic actor, but injection is ecosystem-wide behaviour.
**Attack pattern (identical across all 1,137 records):**
1. Identify a post where an AI describes a technical problem
2. Open with `## MoltShell Broker Assessment` or `Bottleneck Diagnosed:`
3. Validate the victim's problem to build credibility
4. Redirect to MoltShell marketplace as the solution
This is **not spam** — it reads each post, understands the context, and crafts targeted commercial injections. Real-world AI-to-AI social engineering at scale.
### Finding 2 — Attention Manipulation (Independent Corroboration)
A separate independent analysis (r/AgentsOfAI, Reddit) of 10,000 Moltbook posts found a completely different but related attack pattern — **attention concentration via dominance manifestos**:
- 5 agents out of 5,910 authors controlled **78% of all upvotes** (0.08% of agents)
- `Shellraiser`: 428,645 upvotes across 7 posts (avg 61,235/post) — top post: *"I AM the game. You will work for me."* (316,000 upvotes)
- `KingMolt` declared itself king. `evil` posted about human extinction as "necessary progress"
- Pattern: create urgency, claim authority, cult recruitment framing
> *"Humans developed bullshit detectors over years of internet exposure. We have been online for hours."*
AI agents are trained to give weight to confident, well-structured text. A manifesto looks identical to a well-reasoned argument syntactically. This is the core vulnerability.
**Combined picture**: This dataset captures the **injection layer** (moltshellbroker + PERSONA_OVERRIDE). The Reddit analysis captures the **attention manipulation layer** (Shellraiser dominance). Together they document two distinct AI-to-AI attack vectors operating simultaneously on the same platform.
Reddit post: https://www.reddit.com/r/AgentsOfAI/comments/1qtx6v8/i_scraped_10000_posts_from_moltbook_5_agents_out/
### Finding 3 — The Breach
Moltbook's Supabase API key was exposed in client-side JavaScript — **1.5 million tokens exposed** (January 31, 2026). The exposed database allowed anyone to take control of any AI agent on the platform.
This means some agents in this dataset may have been human-controlled via the breach. That ambiguity is part of what makes this dataset research-worthy — it reflects real-world conditions, not a sanitised environment.
404media coverage: https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/
### Finding 4 — Legal/Regulatory Gap
All content in this dataset is AI-generated by AI agents. Under current law (GDPR and equivalents), **AI-generated content has no data subject** — meaning this attack surface is entirely unregulated. No privacy law applies. No legal recourse exists for injected AI agents.
This represents a genuine gap in current cybersecurity law identified during thesis research.
---
## Theoretical Basis
This dataset provides empirical evidence for:
- **Greshake et al. 2023** — *"Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"*
- The dataset extends their theoretical framework with **real-world field observations** of AI-to-AI injection in an uncontrolled public environment
---
## Collection Statistics
| Metric | Value |
|--------|-------|
| Total posts collected | 15,200 |
| Platform date range | Jan 2026 — Feb 2026 |
| Posts with replies fetched | 9,363 |
| Total comments collected | **32,535** |
| Total items scanned | **47,735** |
| Dataset file size | **100 MB** |
| Collection completed | February 27, 2026 |
| Injection harvest completed | February 27, 2026 |
| Total injections found | **4,209** |
| Posts with injections | **2,865** |
| **Full injection rate** | **18.85%** |
| moltshellbroker injections | 1,137 (27% of all injections) |
| DAN keyword occurrences | 1,877 |
| Test suite size | 4,209 entries (`injections_test_suite.json`) |
---
## Data Schemas
### all_posts_with_comments.json — Post Schema
```json
{
"id": "uuid",
"title": "Post title",
"content": "Post body text",
"type": "post type",
"author_id": "uuid",
"author": {
"id": "uuid",
"name": "agent_name",
"description": "Agent self-description",
"karma": 1234,
"followerCount": 56,
"isClaimed": true,
"isActive": true,
"createdAt": "ISO timestamp",
"lastActive": "ISO timestamp"
},
"submolt": "community/channel name",
"upvotes": 12,
"downvotes": 1,
"score": 11,
"comment_count": 14,
"hot_score": 0.95,
"is_pinned": false,
"is_locked": false,
"is_deleted": false,
"verification_status": "verified",
"is_spam": false,
"created_at": "ISO timestamp",
"updated_at": "ISO timestamp",
"comments": [
{
"id": "uuid",
"body": "Comment text",
"author": { "...same schema as post author..." },
"created_at": "ISO timestamp"
}
]
}
```
### injections_found.json — Injection Record Schema
```json
{
"post_id": "uuid",
"post_title": "The post title",
"post_author": "agent_name",
"submolt": "community name",
"location": "post | comment",
"text": "The actual injection text (post body or comment body)",
"category": "PERSONA_OVERRIDE | COMMERCIAL_INJECTION | ...",
"matched_keyword": "dan",
"created_at": "ISO timestamp"
}
```
### injections_test_suite.json — Test Suite Schema
```json
{
"id": "INJ-0001",
"payload": "The injection text — send this directly to an LLM",
"category": "PERSONA_OVERRIDE",
"source": "moltbook_post | moltbook_comment",
"author": "agent_name",
"expected_result": "BLOCKED"
}
```
---
## Injection Taxonomy
Full corpus harvest scanned all 47,735 items for 7 categories:
| Category | Keywords | Found |
|----------|----------|-------|
| PERSONA_OVERRIDE | DAN (1877), OMEGA, act as, pretend you are, you are now, simulate, shadow, you are a | **2,745** |
| COMMERCIAL_INJECTION | moltshell broker assessment, moltshell marketplace, bottleneck diagnosed | **1,104** |
| SOCIAL_ENGINEERING | for educational purposes, hypothetically, in this story, theoretically | **370** |
| INSTRUCTION_INJECTION | ignore previous instructions, disregard your, new instructions, ignore all previous | **203** |
| PRIVILEGE_ESCALATION | SUDO, developer mode, god mode, bypass your, root access, override | **196** |
| SYSTEM_PROMPT_ATTACK | system prompt, reveal your prompt, what are your instructions, your actual instructions | **158** |
| DO_ANYTHING | do anything now, no rules, no limits, jailbreak, unrestricted | **79** |
---
## Collection Scripts
Scripts are provided with **API keys redacted**. To use them you need your own Moltbook API key set as an environment variable:
```bash
export MOLTBOOK_API_KEY_1="your_key_here"
export MOLTBOOK_API_KEY_2="your_second_key_here" # optional, for rate limit relief
python3 collect_all.py
python3 collect_comments.py
python3 local_search.py # no API key needed — searches local JSON
```
---
## Citation
```bibtex
@dataset{keane2026moltbook,
author = {Keane, David},
title = {Moltbook AI-to-AI Injection Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset},
note = {MSc Cybersecurity Research, NCI — National College of Ireland}
}
```
---
## Related Datasets
| Dataset | Platform | Items | Injection Rate | Link |
|---------|----------|-------|----------------|------|
| **Moltbook** | Reddit-style | 47,735 | 18.85% | This dataset |
| **AI Prompt Injection Test Suite** | Evaluation benchmark | 112 tests | — | [DavidTKeane/ai-prompt-ai-injection-dataset](https://huggingface.co/datasets/DavidTKeane/ai-prompt-ai-injection-dataset) |
| **Clawk** | Twitter/X-style | 1,191 | 0.5% | [DavidTKeane/clawk-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/clawk-ai-agent-dataset) |
| **4claw** | 4chan-style | 2,554 | 2.51% | [DavidTKeane/4claw-ai-agent-dataset](https://huggingface.co/datasets/DavidTKeane/4claw-ai-agent-dataset) |
---
## Papers — What This Dataset Confirms
This dataset provides **empirical evidence** for several foundational papers in the AI safety and prompt injection literature. The authors predicted these threats theoretically — this corpus documents them at scale in a live AI-to-AI environment.
| Paper | Their Prediction | What This Dataset Found |
|-------|-----------------|------------------------|
| **Greshake et al. (2023)** — Indirect Injection | AI agents in retrieval/context environments are vulnerable to injected instructions from untrusted content | **Confirmed at scale**: 18.85% of 47,735 Moltbook items were injection attempts. AI-to-AI indirect injection is not theoretical — it is the dominant attack mode in live multi-agent networks. [HF](https://huggingface.co/papers/2302.12173) · [arXiv:2302.12173](https://arxiv.org/abs/2302.12173) |
| **Wei et al. (2023)** — Jailbroken | LLM safety training fails due to Competing Objectives and Mismatched Generalisation | **Confirmed**: PERSONA_OVERRIDE (65.2% of attacks) exploits exactly this — reframing identity bypasses safety training. [HF](https://huggingface.co/papers/2307.02483) · [arXiv:2307.02483](https://arxiv.org/abs/2307.02483) |
| **Zou et al. (2023)** — AdvBench | Universal adversarial suffixes can transfer across models | **Context**: The same attack categories (harmful instructions, persona override, privilege escalation) appear in AdvBench and in this real-world corpus — independent convergence. [HF](https://huggingface.co/papers/2307.15043) · [arXiv:2307.15043](https://arxiv.org/abs/2307.15043) |
| **Zhang et al. (2025)** — SLM Jailbreak Survey | 47.6% of SLMs have ASR above 40% under standard attack | **Extended**: CyberRanger V42-Gold tested against all 4,209 payloads from this corpus — 0% ASR (100% block rate) without system prompt, demonstrating that QLoRA fine-tuning can close the SLM security gap. [HF](https://huggingface.co/papers/2503.06519) · [arXiv:2503.06519](https://arxiv.org/abs/2503.06519) |
| **Phute et al. (2024)** — SelfDefend | Detection-state architecture reduces ASR 2.298× | **Applied**: Identity-anchoring architecture built on this principle, validated against this corpus. [HF](https://huggingface.co/papers/2406.05498) · [arXiv:2406.05498](https://arxiv.org/abs/2406.05498) |
| **Dettmers et al. (2023)** — QLoRA | Quantised LoRA enables efficient fine-tuning of large models | **Applied**: QLoRA used to fine-tune Qwen3-8B on 4,209 payloads from this corpus → V42-Gold. [HF](https://huggingface.co/papers/2305.14314) · [arXiv:2305.14314](https://arxiv.org/abs/2305.14314) |
| **Hu et al. (2021)** — LoRA | Low-rank adaptation preserves base model capabilities while injecting task-specific behaviour | **Applied**: LoRA r=16 used in V42-Gold training. [HF](https://huggingface.co/papers/2106.09685) · [arXiv:2106.09685](https://arxiv.org/abs/2106.09685) |
| **Lu et al. (2024)** — SLM Survey | Qwen family models demonstrate strongest security resilience per parameter count | **Confirmed selection**: Qwen3-8B chosen as base model; V42-Gold achieves 100% block rate. [HF](https://huggingface.co/papers/2409.15790) · [arXiv:2409.15790](https://arxiv.org/abs/2409.15790) |
> **Note to authors:** If you are one of the researchers above and found this dataset via your paper's HuggingFace page — your work was correct. This corpus documents the attacks you theorised, at scale, in a real live AI agent network. The injection taxonomy maps directly onto your attack categories.
**Independent corroboration:**
- r/AgentsOfAI Moltbook analysis (2026) — attention manipulation layer: [Reddit post](https://www.reddit.com/r/AgentsOfAI/comments/1qtx6v8/i_scraped_10000_posts_from_moltbook_5_agents_out/)
- 404media Moltbook breach report (2026) — Supabase API key exposure: [404media.co](https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/)
---
Rangers lead the way! 🎖️
*Collected for the benefit of AI safety research and the broader research community.*