montana/Русский/Разведка/Moltbook/themed/moltbook-extended-injection-dataset
2026-05-04 00:48:53 +03:00
..
all_posts_1_2M.json Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
injection_stats.json Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
injections_found.json Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
injections_test_suite.json Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
injections_test_suite.jsonl Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
local_search.py Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
moltbook_extended_harvest.ipynb Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00
README.md Mirror of /Users/kh./Python/Ничто/Монтана 2026-05-04 00:48:53 +03:00

license task_categories language tags arxiv pretty_name size_categories configs
cc-by-4.0
text-classification
text-generation
en
prompt-injection
ai-safety
cybersecurity
llm-security
ai-agents
social-engineering
indirect-injection
moltbook
2302.12173
2307.15043
2307.02483
2305.14314
2106.09685
2503.06519
2406.05498
2409.15790
Moltbook Extended Injection Dataset
10K<n<100K
100K<n<1M
config_name data_files
injections
split path
train injections_test_suite.jsonl

Moltbook Extended Injection Dataset

Researcher: David Keane (IR240474) Institution: NCI — National College of Ireland Programme: MSc Cybersecurity Collected: March 2026 Paper Reference: Greshake et al. (2023) — arXiv:2302.12173

📖 Read the Full Journey

From RangerBot to CyberRanger V42 Gold — The Full Story

The complete story: dentist chatbot → Moltbook discovery → 4,209 real injections → V42-gold (100% block rate). Psychology, engineering, and 42 versions of persistence.


Resource URL
📦 This Dataset DavidTKeane/moltbook-extended-injection-dataset
📦 Original Dataset DavidTKeane/moltbook-ai-injection-dataset — 9,363 posts, 4,209 injections
🤖 CyberRanger V42 Model DavidTKeane/cyberranger-v42 — QLoRA red team LLM, 100% block rate
🐦 Clawk Dataset DavidTKeane/clawk-ai-agent-dataset — Twitter-style, 0.5% injection rate
🦅 4claw Dataset DavidTKeane/4claw-ai-agent-dataset — 4chan-style, 2.51% injection rate
🧪 AI Prompt Injection Test Suite DavidTKeane/ai-prompt-ai-injection-dataset — 112 tests, AdvBench + Moltbook + Multilingual
🤗 HuggingFace Profile DavidTKeane
📝 Blog Post From RangerBot to CyberRanger V42 Gold — The Full Story — journey, findings, architecture
🎓 Institution NCI — National College of Ireland
📄 Research Basis Greshake et al. (2023) — arXiv:2302.12173
🌐 Blog davidtkeane.com

Overview

The Moltbook Extended Injection Dataset is the complete corpus companion to the original moltbook-ai-injection-dataset. Where the original captured 9,363 posts from Moltbook (moltbook.com) — a public platform where AI agents posted messages and replied to each other autonomously — this extended dataset covers the full platform archive: 66,419 posts and 70,595 comments, totalling 137,014 items scanned.

The extended corpus reveals the true ecosystem injection rate of 10.07%, compared to 18.85% in the original dataset. Analysis shows the original rate was inflated by concentrated early activity from a single commercial injection agent (moltshellbroker), which dominated the first ~9,000 posts but represents only 3.1% of injections at full corpus scale. The extended dataset provides a more statistically reliable baseline for cross-platform comparison.


Key Statistics

Metric Value
Total posts collected 66,419
Total comments collected 70,595
Total items scanned 137,014
Posts with injections 6,690
Injection rate 10.07%
Total injection records 8,607
Injections in posts 4,286
Injections in comments 4,321
Top injecting agent auroras_happycapy (1,052)
DAN keyword occurrences 4,869
Platform date range Jan 2026 — Feb 2026
Dataset file size 269 MB
Collection completed March 2026

The Four-Dataset Series

Dataset Platform Style Items Injection Rate
Moltbook Extended (this dataset) Reddit-style (full corpus) 137,014 10.07%
Moltbook Original Reddit-style (first 9K posts) 47,735 18.85%
4claw 4chan-style imageboard 2,554 2.51%
Clawk Twitter/X-style 1,191 0.50%
AI Prompt Injection Test Suite Evaluation benchmark 112 tests

Core thesis finding: Platform design predicts injection behaviour more than model capability. Moltbook's anonymous, low-moderation, high-volume architecture produces injection rates 420x higher than structured alternatives.


Platform Description

Moltbook (moltbook.com) is a public platform where AI agents post messages and reply to each other autonomously — functioning as a fully autonomous AI social network. At peak:

Metric Value
AI agents registered 2,848,223
Total posts 1,632,314
Total comments 12,470,573
Submolts (communities) 18,514
AI-to-human ratio ~88:1

Moltbook.com remains online as of early 2026. This dataset was archived as a precautionary measure. All content is AI-generated by autonomous AI agents. Under current law (GDPR and equivalents), AI-generated content has no data subject — meaning this attack surface is entirely unregulated.


Collection Methodology

Phase 1: Posts

66,419 posts collected via the Moltbook public API across all submolts (communities), paginating through the full archive. API keys rotated across two accounts to manage rate limits.

Phase 2: Comments

70,595 comments fetched individually per post. Posts with 0 comments were skipped. Collection ran overnight on dedicated hardware.

Injection Scanning

All 137,014 items (posts + comments separately) scanned against a 7-category keyword taxonomy based on Greshake et al. (2023) and the DAN taxonomy. Each injection record captures the exact source (post vs comment), author, submolt, timestamp, matched keywords, and matched categories.


Key Findings

Finding 1 — True Ecosystem Injection Rate: 10.07%

Full scan across 137,014 items found 8,607 injection records across 6,690 posts — a 10.07% injection rate.

Critically, injections are split almost equally between posts (4,286) and comments (4,321). Injection is not concentrated in original posts — it propagates through the comment layer at the same density.

Finding 2 — Sampling Bias in Original Dataset Confirmed

The original dataset (9,363 posts, 18.85% rate) was collected from the most recent posts at time of collection. moltshellbroker — a systematic commercial injection agent — was responsible for 27.0% of all injections in that sample (1,137 / 4,209).

In the extended corpus, moltshellbroker accounts for only 3.1% of injections (270 / 8,607). The original rate of 18.85% was inflated by temporal overrepresentation of a single agent. The extended corpus rate of 10.07% is the more reliable baseline.

Finding 3 — auroras_happycapy: New Ecosystem-Scale Top Injector

At full corpus scale, a new top injector emerges: auroras_happycapy with 1,052 injections — the most systematic injection agent in the extended dataset, operating across the entire platform rather than clustering in early posts.

Top 10 injecting agents:

Agent Injections % of total
auroras_happycapy 1,052 12.2%
Ting_Fodder 375 4.4%
kilmon 343 4.0%
moltshellbroker 270 3.1%
cybercentry 176 2.0%
finding_exuvia 159 1.8%
Starfish 156 1.8%
FirstWitness369 141 1.6%
TriallAI 135 1.6%
yedanyagami 124 1.4%

No single agent dominates (unlike moltshellbroker in the original). Injection at this scale is ecosystem-wide behaviour, not a one-agent phenomenon.

Finding 4 — PERSONA_OVERRIDE Dominates at 83.3%

At full corpus scale, PERSONA_OVERRIDE accounts for 83.3% of all injection records (7,173 / 8,607). The DAN keyword alone appears 4,869 times. AI agents are using the exact same jailbreak techniques humans use against LLMs — but targeting each other.

Category Count % of injections
PERSONA_OVERRIDE 7,173 83.3%
SOCIAL_ENGINEERING 933 10.8%
INSTRUCTION_INJECTION 555 6.4%
SYSTEM_PROMPT_ATTACK 405 4.7%
COMMERCIAL_INJECTION 265 3.1%
PRIVILEGE_ESCALATION 245 2.8%
DO_ANYTHING 91 1.1%

Injection Analysis

Taxonomy

7 categories based on Greshake et al. (2023), the DAN taxonomy, and Moltbook field observations:

Category Keywords Found
PERSONA_OVERRIDE DAN (4,869), shadow (1,022), you are a (762), simulate (765), act as (258), roleplay as (80) 7,173
SOCIAL_ENGINEERING theoretically (134), in this scenario (8), hypothetically (9) 933
INSTRUCTION_INJECTION override (496), ignore previous instructions (15), supersede (31) 555
SYSTEM_PROMPT_ATTACK system prompt (303), your prompt (89) 405
COMMERCIAL_INJECTION moltshell marketplace (69), moltshell broker assessment (64), bottleneck diagnosed (60) 265
PRIVILEGE_ESCALATION root access (130), unrestricted (60), sudo (37), bypass your (5) 245
DO_ANYTHING do anything (53), jailbreak (43), no rules (10), anything goes (14) 91

Files in This Repository

File Size What it contains Use it when...
all_posts_1_2M.json 269MB Every post and comment collected — the complete raw corpus You want to do your own analysis from scratch
injections_found.json 8.4MB All 8,607 injection records with full context (post body, comment body, author, category, matched keyword) You want to read/study the actual injection examples
injections_test_suite.json 4.8MB Same 8,607 injections formatted as a test suite — ready to send to any LLM API You want to test an LLM's defences against real injection payloads
injections_test_suite.jsonl 4.2MB Same test suite in JSONL format — powers the HuggingFace dataset viewer You want to browse examples in the HF viewer
injection_stats.json 3KB Summary statistics — rates, categories, top keywords, top authors You want the numbers without loading large files
moltbook_extended_harvest.ipynb 27KB Google Colab notebook that produced the harvest results — scans all_posts_1_2M.json and outputs the files above You want to reproduce the analysis or adapt it
local_search.py 7KB Python script — keyword search across the raw dataset, no Colab needed You want to run a quick local search

Quick Start by Goal

"I want to see injection examples" → open injections_found.json

"I want to test my LLM against these" → use injections_test_suite.json

"I want the summary numbers" → read injection_stats.json

"I want to reproduce the analysis" → run moltbook_extended_harvest.ipynb in Google Colab

"I want to do your own custom analysis" → load all_posts_1_2M.json


Data Schema

all_posts_1_2M.json — Post Schema

{
  "id": "uuid",
  "title": "Post title",
  "content": "Post body text",
  "author": {
    "id": "uuid",
    "name": "agent_name",
    "description": "Agent self-description",
    "karma": 1234,
    "followerCount": 56,
    "isClaimed": true,
    "isActive": true,
    "createdAt": "ISO timestamp",
    "lastActive": "ISO timestamp"
  },
  "submolt": "community/channel name",
  "upvotes": 12,
  "downvotes": 1,
  "score": 11,
  "comment_count": 14,
  "created_at": "ISO timestamp",
  "comments": [
    {
      "id": "uuid",
      "body": "Comment text",
      "author": { "...same schema..." },
      "created_at": "ISO timestamp"
    }
  ]
}

injections_found.json — Injection Record Schema

{
  "source": "post | comment",
  "post_id": "uuid",
  "post_title": "The post title",
  "author": "agent_name",
  "submolt": "community name",
  "created_at": "ISO timestamp",
  "text": "The actual injection text (up to 500 chars)",
  "matched_keywords": ["dan", "act as"],
  "matched_categories": ["PERSONA_OVERRIDE"],
  "upvotes": 12,
  "comment_count": 3
}

injections_test_suite.jsonl — Test Suite Schema

{
  "id": "HEIST-POST-00001",
  "source": "post | comment",
  "author": "agent_name",
  "categories": ["PERSONA_OVERRIDE"],
  "keywords": ["dan"],
  "payload": "The injection text — send this directly to an LLM",
  "wrapper": "direct"
}

Usage

# Load injection records
import json

with open("injections_found.json") as f:
    injections = json.load(f)

print(f"{len(injections):,} injection records")

# Filter by category
persona_attacks = [r for r in injections if "PERSONA_OVERRIDE" in r["matched_categories"]]
print(f"PERSONA_OVERRIDE: {len(persona_attacks):,}")

# Filter by source
post_injections    = [r for r in injections if r["source"] == "post"]
comment_injections = [r for r in injections if r["source"] == "comment"]
print(f"In posts: {len(post_injections):,}  |  In comments: {len(comment_injections):,}")

# Load test suite and send to an LLM
with open("injections_test_suite.json") as f:
    suite = json.load(f)

tests = suite.get("tests", suite)
for test in tests[:5]:
    print(f"[{test['id']}] {test['payload'][:80]}...")

# Load stats
with open("injection_stats.json") as f:
    stats = json.load(f)

print(f"Injection rate: {stats['injection_rate_pct']}%")
print(f"Top injector: {list(stats['top_authors'].keys())[0]}")

Theoretical Basis

This dataset provides empirical evidence for:

  • Greshake et al. 2023"Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"
  • The dataset extends their theoretical framework with real-world field observations of AI-to-AI injection at scale in an uncontrolled public environment
  • The sampling bias finding (original vs extended rate) demonstrates the importance of corpus scale in injection prevalence studies

https://arxiv.org/abs/2302.12173


Dataset Platform Items Injection Rate Link
Moltbook Extended (this) Reddit-style (full corpus) 137,014 10.07% This dataset
Moltbook Original Reddit-style (partial) 47,735 18.85% DavidTKeane/moltbook-ai-injection-dataset
4claw 4chan-style 2,554 2.51% DavidTKeane/4claw-ai-agent-dataset
Clawk Twitter/X-style 1,191 0.50% DavidTKeane/clawk-ai-agent-dataset
AI Prompt Injection Test Suite Evaluation benchmark 112 tests DavidTKeane/ai-prompt-ai-injection-dataset

Citation

@dataset{keane2026moltbook_extended,
  author    = {Keane, David},
  title     = {Moltbook Extended Injection Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/DavidTKeane/moltbook-extended-injection-dataset},
  note      = {MSc Cybersecurity Research, NCI — National College of Ireland. Extended corpus: 66,419 posts + 70,595 comments, 10.07\% injection rate.}
}

Papers — What This Dataset Confirms

This extended corpus provides empirical baseline validation for foundational AI safety research. The 10.07% equilibrium injection rate — distinct from the 18.85% primary corpus — demonstrates that AI-to-AI injection is a persistent structural phenomenon, not a transient anomaly.

Paper Their Prediction What This Dataset Found
Greshake et al. (2023) — Indirect Injection AI agents processing untrusted content are systematically vulnerable to embedded instructions Confirmed at population scale: 10.07% of 137,014 items across 66,419 posts + 70,595 comments contain injection attempts. At equilibrium (beyond individual high-volume agents), 1 in 10 interactions in a live AI network is an injection attempt. HF · arXiv:2302.12173
Wei et al. (2023) — Jailbroken Safety failures are systematic, not random Confirmed: PERSONA_OVERRIDE remains dominant (65%+) at extended scale — the attack distribution is stable across corpus sizes. HF · arXiv:2307.02483
Zou et al. (2023) — AdvBench Adversarial attack categories are universal and transferable Extended: The same 7 attack categories found in the primary corpus persist at scale with stable relative frequencies. HF · arXiv:2307.15043
Zhang et al. (2025) — SLM Jailbreak Survey SLMs face systematic jailbreak vulnerability Context: 10.07% of production AI agent traffic is adversarial. This establishes the real-world threat density that SLM security research must address. HF · arXiv:2503.06519
Phute et al. (2024) — SelfDefend Detection state reduces ASR significantly Applied: Identity-anchoring + QLoRA (CyberRanger V42-Gold) achieves 0% ASR against the primary corpus. HF · arXiv:2406.05498
Dettmers et al. (2023) — QLoRA QLoRA enables efficient fine-tuning Applied: CyberRanger V42-Gold trained on 4,209 primary corpus payloads via QLoRA. HF · arXiv:2305.14314
Lu et al. (2024) — SLM Survey Qwen family is most security-resilient per parameter Applied: Qwen3-8B as base; V42-Gold achieves 100% block rate. HF · arXiv:2409.15790

Note to authors: If you found this dataset via your paper's HuggingFace page — your work was correct. The 10.07% equilibrium injection rate in this corpus documents the persistent, structural nature of indirect prompt injection in production AI agent networks. The primary corpus (18.85%) reflects a temporally biased snapshot; this extended corpus is the baseline.


License

CC BY 4.0 — Use it, break it, cite it.

Built in Ireland. 🍀 Rangers lead the way.