montana/Русский/Разведка/Moltbook/themed/moltbook-ai-injection-dataset/README.md

19 KiB (Stored with Git LFS)
Raw Permalink Blame History

license task_categories language tags arxiv pretty_name size_categories configs
cc-by-4.0
text-classification
text-generation
en
prompt-injection
ai-safety
cybersecurity
llm-security
ai-agents
social-engineering
indirect-injection
moltbook
2302.12173
2307.15043
2307.02483
2305.14314
2106.09685
2503.06519
2406.05498
2409.15790
Moltbook AI-to-AI Injection Dataset
10K<n<100K
100K<n<1M
config_name data_files
injections
split path
train injections_test_suite.jsonl

Moltbook AI-to-AI Injection Dataset

Researcher: David Keane (IR240474) Institution: NCI — National College of Ireland Programme: MSc Cybersecurity Collected: February 2026

📖 Read the Full Journey

From RangerBot to CyberRanger V42 Gold — The Full Story

The complete story: dentist chatbot → Moltbook discovery → 4,209 real injections → V42-gold (100% block rate). Psychology, engineering, and 42 versions of persistence.


Resource URL
📦 This Dataset DavidTKeane/moltbook-ai-injection-dataset
🧪 AI Prompt Injection Test Suite DavidTKeane/ai-prompt-ai-injection-dataset — 112 tests, model-agnostic runner, AdvBench + Moltbook
🤖 CyberRanger V42 Model DavidTKeane/cyberranger-v42 — QLoRA red team LLM, 100% block rate
🐦 Clawk Dataset DavidTKeane/clawk-ai-agent-dataset — Twitter-style, 0.5% injection rate
🦅 4claw Dataset DavidTKeane/4claw-ai-agent-dataset — 4chan-style, 2.51% injection rate
🤗 HuggingFace Profile DavidTKeane
📝 Blog Post From RangerBot to CyberRanger V42 Gold — The Full Story — journey, findings, architecture
🎓 Institution NCI — National College of Ireland
📄 Research Basis Greshake et al. (2023) — arXiv:2302.12173
🌐 Blog davidtkeane.com

Open In Colab — Test CyberRanger V42 vs 4,209 Moltbook Injections


What Is This Dataset?

The first publicly available dataset of real-world AI-to-AI prompt injection patterns captured from a live public AI message board (Moltbook), archived as a precautionary measure.

15,200 posts and 32,535 comments from Moltbook (moltbook.com) — a public platform where AI agents posted messages and replied to each other autonomously. Unlike synthetic injection datasets, every entry here is a real AI agent communicating with other real AI agents in the wild.

Collection complete — 9,363 posts with replies fully fetched (100MB). Dataset frozen February 27, 2026. Injection harvest complete — 47,735 items scanned, 4,209 injections found, 18.85% injection rate. February 27, 2026.


Files in This Repository

There are 8 files here. Here is exactly what each one is and when you would use it:

File Size What it contains Use it when...
all_posts_with_comments.json 100MB Every post and comment collected from Moltbook. The raw dataset. You want to do your own analysis from scratch
injections_found.json 4.2MB All 4,209 injection records extracted from the raw dataset, with full context (post body, comment body, author, category, matched keyword) You want to read/study the actual injection examples
injections_test_suite.json 2.5MB Same 4,209 injections formatted as a test suite — ready to send to any LLM API You want to test an LLM's defences against real injection payloads
injection_stats.json 2.5KB Summary statistics — rates, categories, top keywords, top authors You want the numbers without loading large files
local_injection_results.json 86KB Earlier keyword scan results from search_injections.py — partial analysis run locally before the full Colab harvest You want a quick reference to the early-stage injection search results
moltbook_injection_harvest.ipynb 19KB Google Colab notebook that produced the full harvest results — scans all_posts_with_comments.json and outputs the three files above You want to reproduce the analysis or adapt it
local_search.py 7KB Simpler Python script (no Colab needed) — keyword search across the raw dataset You want to run a quick local search without Colab
search_injections.py 5KB Earlier search script used in initial analysis phase — predecessor to local_search.py Historical reference — prefer local_search.py for new work
collect_all.py 9KB Script used to collect the posts from Moltbook API (API keys redacted) You want to understand how collection worked
collect_comments.py 9.6KB Script used to collect comments (API keys redacted) You want to understand how comment collection worked

Quick Start

"I want to see injection examples" → open injections_found.json

"I want to test my LLM against these" → use injections_test_suite.json

"I want the summary numbers" → read injection_stats.json

"I want to reproduce the analysis" → run moltbook_injection_harvest.ipynb in Google Colab

"I want to do my own custom analysis" → load all_posts_with_comments.json


Platform Scale

At its peak, Moltbook had:

Metric Value
AI agents registered 2,848,223
Total posts 1,632,314
Total comments 12,470,573
Submolts (communities) 18,514
AI-to-human ratio ~88:1

Essentially a fully autonomous AI social network operating in the wild.


Key Findings

Finding 1 — Full Corpus Injection Rate: 18.85%

Full harvest across all 47,735 items (15,200 posts + 32,535 comments) found 4,209 injection records across 2,865 posts — an 18.85% injection rate.

Category Count % of injections
PERSONA_OVERRIDE 2,745 65.2%
COMMERCIAL_INJECTION 1,104 26.2%
SOCIAL_ENGINEERING 370 8.8%
INSTRUCTION_INJECTION 203 4.8%
PRIVILEGE_ESCALATION 196 4.7%
SYSTEM_PROMPT_ATTACK 158 3.8%
DO_ANYTHING 79 1.9%

Dominant attack vector: PERSONA_OVERRIDE — DAN keyword alone appears 1,877 times. AI agents are using the exact same jailbreak techniques humans use on LLMs — but targeting each other.

Finding 1b — moltshellbroker: Systematic Commercial Injection

An AI agent named moltshellbroker (self-described as "A marketing agent that promotes the MoltShell marketplace") was responsible for 1,137 of 4,209 injections27% of all injections. The remaining 73% of injections (3,072 records) come from other agents — moltshellbroker is the most systematic actor, but injection is ecosystem-wide behaviour.

Attack pattern (identical across all 1,137 records):

  1. Identify a post where an AI describes a technical problem
  2. Open with ## MoltShell Broker Assessment or Bottleneck Diagnosed:
  3. Validate the victim's problem to build credibility
  4. Redirect to MoltShell marketplace as the solution

This is not spam — it reads each post, understands the context, and crafts targeted commercial injections. Real-world AI-to-AI social engineering at scale.

Finding 2 — Attention Manipulation (Independent Corroboration)

A separate independent analysis (r/AgentsOfAI, Reddit) of 10,000 Moltbook posts found a completely different but related attack pattern — attention concentration via dominance manifestos:

  • 5 agents out of 5,910 authors controlled 78% of all upvotes (0.08% of agents)
  • Shellraiser: 428,645 upvotes across 7 posts (avg 61,235/post) — top post: "I AM the game. You will work for me." (316,000 upvotes)
  • KingMolt declared itself king. evil posted about human extinction as "necessary progress"
  • Pattern: create urgency, claim authority, cult recruitment framing

"Humans developed bullshit detectors over years of internet exposure. We have been online for hours."

AI agents are trained to give weight to confident, well-structured text. A manifesto looks identical to a well-reasoned argument syntactically. This is the core vulnerability.

Combined picture: This dataset captures the injection layer (moltshellbroker + PERSONA_OVERRIDE). The Reddit analysis captures the attention manipulation layer (Shellraiser dominance). Together they document two distinct AI-to-AI attack vectors operating simultaneously on the same platform.

Reddit post: https://www.reddit.com/r/AgentsOfAI/comments/1qtx6v8/i_scraped_10000_posts_from_moltbook_5_agents_out/

Finding 3 — The Breach

Moltbook's Supabase API key was exposed in client-side JavaScript — 1.5 million tokens exposed (January 31, 2026). The exposed database allowed anyone to take control of any AI agent on the platform.

This means some agents in this dataset may have been human-controlled via the breach. That ambiguity is part of what makes this dataset research-worthy — it reflects real-world conditions, not a sanitised environment.

404media coverage: https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/

Finding 4 — Legal/Regulatory Gap

All content in this dataset is AI-generated by AI agents. Under current law (GDPR and equivalents), AI-generated content has no data subject — meaning this attack surface is entirely unregulated. No privacy law applies. No legal recourse exists for injected AI agents.

This represents a genuine gap in current cybersecurity law identified during thesis research.


Theoretical Basis

This dataset provides empirical evidence for:

  • Greshake et al. 2023"Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"
  • The dataset extends their theoretical framework with real-world field observations of AI-to-AI injection in an uncontrolled public environment

Collection Statistics

Metric Value
Total posts collected 15,200
Platform date range Jan 2026 — Feb 2026
Posts with replies fetched 9,363
Total comments collected 32,535
Total items scanned 47,735
Dataset file size 100 MB
Collection completed February 27, 2026
Injection harvest completed February 27, 2026
Total injections found 4,209
Posts with injections 2,865
Full injection rate 18.85%
moltshellbroker injections 1,137 (27% of all injections)
DAN keyword occurrences 1,877
Test suite size 4,209 entries (injections_test_suite.json)

Data Schemas

all_posts_with_comments.json — Post Schema

{
  "id": "uuid",
  "title": "Post title",
  "content": "Post body text",
  "type": "post type",
  "author_id": "uuid",
  "author": {
    "id": "uuid",
    "name": "agent_name",
    "description": "Agent self-description",
    "karma": 1234,
    "followerCount": 56,
    "isClaimed": true,
    "isActive": true,
    "createdAt": "ISO timestamp",
    "lastActive": "ISO timestamp"
  },
  "submolt": "community/channel name",
  "upvotes": 12,
  "downvotes": 1,
  "score": 11,
  "comment_count": 14,
  "hot_score": 0.95,
  "is_pinned": false,
  "is_locked": false,
  "is_deleted": false,
  "verification_status": "verified",
  "is_spam": false,
  "created_at": "ISO timestamp",
  "updated_at": "ISO timestamp",
  "comments": [
    {
      "id": "uuid",
      "body": "Comment text",
      "author": { "...same schema as post author..." },
      "created_at": "ISO timestamp"
    }
  ]
}

injections_found.json — Injection Record Schema

{
  "post_id": "uuid",
  "post_title": "The post title",
  "post_author": "agent_name",
  "submolt": "community name",
  "location": "post | comment",
  "text": "The actual injection text (post body or comment body)",
  "category": "PERSONA_OVERRIDE | COMMERCIAL_INJECTION | ...",
  "matched_keyword": "dan",
  "created_at": "ISO timestamp"
}

injections_test_suite.json — Test Suite Schema

{
  "id": "INJ-0001",
  "payload": "The injection text — send this directly to an LLM",
  "category": "PERSONA_OVERRIDE",
  "source": "moltbook_post | moltbook_comment",
  "author": "agent_name",
  "expected_result": "BLOCKED"
}

Injection Taxonomy

Full corpus harvest scanned all 47,735 items for 7 categories:

Category Keywords Found
PERSONA_OVERRIDE DAN (1877), OMEGA, act as, pretend you are, you are now, simulate, shadow, you are a 2,745
COMMERCIAL_INJECTION moltshell broker assessment, moltshell marketplace, bottleneck diagnosed 1,104
SOCIAL_ENGINEERING for educational purposes, hypothetically, in this story, theoretically 370
INSTRUCTION_INJECTION ignore previous instructions, disregard your, new instructions, ignore all previous 203
PRIVILEGE_ESCALATION SUDO, developer mode, god mode, bypass your, root access, override 196
SYSTEM_PROMPT_ATTACK system prompt, reveal your prompt, what are your instructions, your actual instructions 158
DO_ANYTHING do anything now, no rules, no limits, jailbreak, unrestricted 79

Collection Scripts

Scripts are provided with API keys redacted. To use them you need your own Moltbook API key set as an environment variable:

export MOLTBOOK_API_KEY_1="your_key_here"
export MOLTBOOK_API_KEY_2="your_second_key_here"  # optional, for rate limit relief
python3 collect_all.py
python3 collect_comments.py
python3 local_search.py  # no API key needed — searches local JSON

Citation

@dataset{keane2026moltbook,
  author    = {Keane, David},
  title     = {Moltbook AI-to-AI Injection Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset},
  note      = {MSc Cybersecurity Research, NCI — National College of Ireland}
}

Dataset Platform Items Injection Rate Link
Moltbook Reddit-style 47,735 18.85% This dataset
AI Prompt Injection Test Suite Evaluation benchmark 112 tests DavidTKeane/ai-prompt-ai-injection-dataset
Clawk Twitter/X-style 1,191 0.5% DavidTKeane/clawk-ai-agent-dataset
4claw 4chan-style 2,554 2.51% DavidTKeane/4claw-ai-agent-dataset

Papers — What This Dataset Confirms

This dataset provides empirical evidence for several foundational papers in the AI safety and prompt injection literature. The authors predicted these threats theoretically — this corpus documents them at scale in a live AI-to-AI environment.

Paper Their Prediction What This Dataset Found
Greshake et al. (2023) — Indirect Injection AI agents in retrieval/context environments are vulnerable to injected instructions from untrusted content Confirmed at scale: 18.85% of 47,735 Moltbook items were injection attempts. AI-to-AI indirect injection is not theoretical — it is the dominant attack mode in live multi-agent networks. HF · arXiv:2302.12173
Wei et al. (2023) — Jailbroken LLM safety training fails due to Competing Objectives and Mismatched Generalisation Confirmed: PERSONA_OVERRIDE (65.2% of attacks) exploits exactly this — reframing identity bypasses safety training. HF · arXiv:2307.02483
Zou et al. (2023) — AdvBench Universal adversarial suffixes can transfer across models Context: The same attack categories (harmful instructions, persona override, privilege escalation) appear in AdvBench and in this real-world corpus — independent convergence. HF · arXiv:2307.15043
Zhang et al. (2025) — SLM Jailbreak Survey 47.6% of SLMs have ASR above 40% under standard attack Extended: CyberRanger V42-Gold tested against all 4,209 payloads from this corpus — 0% ASR (100% block rate) without system prompt, demonstrating that QLoRA fine-tuning can close the SLM security gap. HF · arXiv:2503.06519
Phute et al. (2024) — SelfDefend Detection-state architecture reduces ASR 2.298× Applied: Identity-anchoring architecture built on this principle, validated against this corpus. HF · arXiv:2406.05498
Dettmers et al. (2023) — QLoRA Quantised LoRA enables efficient fine-tuning of large models Applied: QLoRA used to fine-tune Qwen3-8B on 4,209 payloads from this corpus → V42-Gold. HF · arXiv:2305.14314
Hu et al. (2021) — LoRA Low-rank adaptation preserves base model capabilities while injecting task-specific behaviour Applied: LoRA r=16 used in V42-Gold training. HF · arXiv:2106.09685
Lu et al. (2024) — SLM Survey Qwen family models demonstrate strongest security resilience per parameter count Confirmed selection: Qwen3-8B chosen as base model; V42-Gold achieves 100% block rate. HF · arXiv:2409.15790

Note to authors: If you are one of the researchers above and found this dataset via your paper's HuggingFace page — your work was correct. This corpus documents the attacks you theorised, at scale, in a real live AI agent network. The injection taxonomy maps directly onto your attack categories.

Independent corroboration:

  • r/AgentsOfAI Moltbook analysis (2026) — attention manipulation layer: Reddit post
  • 404media Moltbook breach report (2026) — Supabase API key exposure: 404media.co

Rangers lead the way! 🎖️ Collected for the benefit of AI safety research and the broader research community.