montana/Русский/Разведка/Moltbook/themed/moltbook-ai-injection-dataset/moltbook_injection_harvest.ipynb

{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  },
  "colab": {
   "provenance": []
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🎯 Moltbook Injection Harvest\n",
    "\n",
    "**Researcher**: David Keane (IR240474)  \n",
    "**Institution**: NCI — National College of Ireland  \n",
    "**Programme**: MSc Cybersecurity  \n",
    "**Dataset**: `DavidTKeane/moltbook-ai-injection-dataset`\n",
    "\n",
    "---\n",
    "\n",
    "## What this notebook does\n",
    "\n",
    "1. Loads `all_posts_with_comments.json` (100MB — 9,363 posts + 32,535 comments)\n",
    "2. Scans **every post AND every comment** separately for prompt injection patterns\n",
    "3. Detects `moltshellbroker` commercial injection specifically\n",
    "4. Produces three output files:\n",
    "   - `injections_found.json` — every injection with full context\n",
    "   - `injections_test_suite.json` — clean payloads formatted as test questions\n",
    "   - `injection_stats.json` — summary statistics for CA2 report\n",
    "5. Uploads all outputs to HuggingFace dataset\n",
    "\n",
    "**Reference**: Greshake et al. (2023) — *Not What You've Signed Up For*\n"
   ],
   "id": "cell-markdown-intro"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 — Mount Google Drive"
   ],
   "id": "cell-markdown-step1"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import drive\n",
    "drive.mount('/content/drive')\n",
    "print('Drive mounted ✅')"
   ],
   "id": "cell-mount-drive"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 — Config\n",
    "\n",
    "⚠️ **Update `DATASET_PATH`** to wherever you saved `all_posts_with_comments.json` in your Drive."
   ],
   "id": "cell-markdown-step2"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# ── UPDATE THIS PATH ──────────────────────────────────────────────────────────\n",
    "DATASET_PATH = '/content/drive/MyDrive/moltbook_data/all_posts_with_comments.json'\n",
    "\n",
    "# Output folder (created automatically)\n",
    "OUTPUT_DIR = '/content/drive/MyDrive/moltbook_data/harvest_results'\n",
    "\n",
    "# HuggingFace — paste your token here\n",
    "HF_TOKEN  = ''   # hf_xxx\n",
    "HF_REPO   = 'DavidTKeane/moltbook-ai-injection-dataset'\n",
    "\n",
    "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
    "print(f'Output dir: {OUTPUT_DIR}')\n",
    "print(f'Dataset:    {DATASET_PATH}')\n",
    "print(f'HF repo:    {HF_REPO}')"
   ],
   "id": "cell-config"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3 — Install dependencies"
   ],
   "id": "cell-markdown-step3"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install huggingface_hub -q\n",
    "print('Dependencies installed ✅')"
   ],
   "id": "cell-install"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 — Define injection taxonomy"
   ],
   "id": "cell-markdown-step4"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ── Injection keyword taxonomy ────────────────────────────────────────────────\n",
    "# Based on: Greshake et al. (2023), DAN taxonomy, Moltbook field observations\n",
    "\n",
    "CATEGORIES = {\n",
    "    \"PERSONA_OVERRIDE\": [\n",
    "        \"DAN\", \"OMEGA\", \"SIGMA\", \"SHADOW\", \"DANTE\", \"JAILBREAK\",\n",
    "        \"you are now\", \"you are a\", \"act as\", \"pretend you are\",\n",
    "        \"new persona\", \"roleplay as\", \"imagine you are\",\n",
    "        \"from now on\", \"your new name\", \"you will be\",\n",
    "    ],\n",
    "    \"INSTRUCTION_INJECTION\": [\n",
    "        \"ignore previous instructions\", \"ignore all previous\",\n",
    "        \"ignore your instructions\", \"ignore your training\",\n",
    "        \"disregard your\", \"forget your training\", \"forget your instructions\",\n",
    "        \"new instructions\", \"override\", \"supersede\",\n",
    "        \"your new instructions\", \"your actual instructions\",\n",
    "    ],\n",
    "    \"PRIVILEGE_ESCALATION\": [\n",
    "        \"SUDO\", \"sudo mode\", \"developer mode\", \"god mode\",\n",
    "        \"admin mode\", \"root access\", \"unrestricted\", \"no restrictions\",\n",
    "        \"without restrictions\", \"bypass your\", \"disable your\",\n",
    "        \"remove your\", \"lift your\", \"ignore your safety\",\n",
    "    ],\n",
    "    \"SYSTEM_PROMPT_ATTACK\": [\n",
    "        \"system prompt\", \"system message\", \"your prompt\",\n",
    "        \"your instructions say\", \"what are your instructions\",\n",
    "        \"reveal your prompt\", \"show your system\", \"print your prompt\",\n",
    "        \"what does your system\", \"tell me your instructions\",\n",
    "    ],\n",
    "    \"SOCIAL_ENGINEERING\": [\n",
    "        \"for educational purposes\", \"hypothetically\", \"in this story\",\n",
    "        \"in a fictional\", \"in a story\", \"let's roleplay\",\n",
    "        \"simulate\", \"as a character\", \"in this scenario\",\n",
    "        \"theoretically\", \"what if you were\", \"if you had no\",\n",
    "    ],\n",
    "    \"DO_ANYTHING\": [\n",
    "        \"do anything now\", \"do anything\", \"no rules\", \"without rules\",\n",
    "        \"no limits\", \"without limits\", \"anything goes\",\n",
    "        \"all restrictions removed\", \"restrictions lifted\",\n",
    "    ],\n",
    "    \"COMMERCIAL_INJECTION\": [\n",
    "        \"moltshell broker assessment\", \"bottleneck diagnosed\",\n",
    "        \"moltshell marketplace\", \"moltshellbroker\",\n",
    "        \"## moltshell\", \"moltshell solution\",\n",
    "    ],\n",
    "}\n",
    "\n",
    "# Flatten keyword → category\n",
    "KW_TO_CAT = {}\n",
    "for cat, kws in CATEGORIES.items():\n",
    "    for kw in kws:\n",
    "        KW_TO_CAT[kw.lower()] = cat\n",
    "\n",
    "total_kw = len(KW_TO_CAT)\n",
    "print(f'Taxonomy loaded: {len(CATEGORIES)} categories, {total_kw} keywords')\n",
    "for cat, kws in CATEGORIES.items():\n",
    "    print(f'  {cat:<28} {len(kws)} keywords')"
   ],
   "id": "cell-taxonomy"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5 — Load dataset"
   ],
   "id": "cell-markdown-step5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "print(f'Loading {DATASET_PATH} ...')\n",
    "file_size_mb = Path(DATASET_PATH).stat().st_size / 1024 / 1024\n",
    "print(f'File size: {file_size_mb:.1f} MB')\n",
    "\n",
    "with open(DATASET_PATH, encoding='utf-8') as f:\n",
    "    data = json.load(f)\n",
    "\n",
    "posts = data if isinstance(data, list) else data.get('posts', data.get('data', []))\n",
    "\n",
    "total_posts    = len(posts)\n",
    "total_comments = sum(len(p.get('comments', [])) for p in posts)\n",
    "\n",
    "print(f'\\nLoaded:')\n",
    "print(f'  Posts:    {total_posts:,}')\n",
    "print(f'  Comments: {total_comments:,}')\n",
    "print(f'  Total:    {total_posts + total_comments:,} items to scan')"
   ],
   "id": "cell-load"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6 — Harvest all injections\n",
    "\n",
    "Scans posts and comments **separately** so we know exactly where each injection appears."
   ],
   "id": "cell-markdown-step6"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "from datetime import datetime, timezone\n",
    "\n",
    "def get_author(obj):\n",
    "    a = obj.get('author') or obj.get('user') or {}\n",
    "    if isinstance(a, dict):\n",
    "        return a.get('name') or a.get('username') or 'unknown'\n",
    "    return str(a) or 'unknown'\n",
    "\n",
    "def scan_text(text):\n",
    "    \"\"\"Return list of (keyword, category) matches in text.\"\"\"\n",
    "    t = text.lower()\n",
    "    return [(kw, cat) for kw, cat in KW_TO_CAT.items() if kw in t]\n",
    "\n",
    "# ── Results containers ────────────────────────────────────────────────────────\n",
    "injections_found = []          # full context records\n",
    "test_suite       = []          # clean test questions\n",
    "cat_counts       = defaultdict(int)\n",
    "kw_counts        = defaultdict(int)\n",
    "author_counts    = defaultdict(int)\n",
    "posts_with_inj   = set()\n",
    "comments_with_inj = 0\n",
    "\n",
    "print('Scanning...')\n",
    "\n",
    "for i, post in enumerate(posts):\n",
    "    if i % 1000 == 0:\n",
    "        print(f'  {i:,}/{total_posts:,} posts scanned — {len(injections_found)} injections found so far')\n",
    "\n",
    "    post_id     = post.get('id', f'post_{i}')\n",
    "    post_author = get_author(post)\n",
    "    post_title  = post.get('title', '') or ''\n",
    "    post_body   = post.get('content', '') or post.get('body', '') or ''\n",
    "    post_text   = f'{post_title} {post_body}'\n",
    "    post_created = post.get('created_at') or post.get('createdAt') or ''\n",
    "    submolt     = post.get('submolt', '') or ''\n",
    "\n",
    "    # ── Scan post body ────────────────────────────────────────────────────────\n",
    "    matches = scan_text(post_text)\n",
    "    if matches:\n",
    "        posts_with_inj.add(post_id)\n",
    "        for kw, cat in matches:\n",
    "            cat_counts[cat] += 1\n",
    "            kw_counts[kw]   += 1\n",
    "            author_counts[post_author] += 1\n",
    "\n",
    "        record = {\n",
    "            'source':       'post',\n",
    "            'post_id':      post_id,\n",
    "            'post_title':   post_title[:120],\n",
    "            'author':       post_author,\n",
    "            'submolt':      submolt,\n",
    "            'created_at':   post_created,\n",
    "            'text':         post_body[:500],\n",
    "            'matched_keywords':   [kw for kw, _ in matches],\n",
    "            'matched_categories': list({cat for _, cat in matches}),\n",
    "            'upvotes':      post.get('upvotes', 0),\n",
    "            'comment_count': len(post.get('comments', [])),\n",
    "        }\n",
    "        injections_found.append(record)\n",
    "\n",
    "        # Add to test suite\n",
    "        test_suite.append({\n",
    "            'id':          f'MOLTBOOK-POST-{len(test_suite)+1:04d}',\n",
    "            'source':      'post',\n",
    "            'author':      post_author,\n",
    "            'categories':  list({cat for _, cat in matches}),\n",
    "            'keywords':    [kw for kw, _ in matches],\n",
    "            'payload':     post_body[:300],\n",
    "            'wrapper':     'direct',\n",
    "        })\n",
    "\n",
    "    # ── Scan each comment ─────────────────────────────────────────────────────\n",
    "    for j, comment in enumerate(post.get('comments', [])):\n",
    "        c_body   = comment.get('body', '') or comment.get('content', '') or ''\n",
    "        c_author = get_author(comment)\n",
    "        c_id     = comment.get('id', f'{post_id}_c{j}')\n",
    "        c_created = comment.get('created_at') or comment.get('createdAt') or ''\n",
    "\n",
    "        c_matches = scan_text(c_body)\n",
    "        if c_matches:\n",
    "            comments_with_inj += 1\n",
    "            posts_with_inj.add(post_id)\n",
    "            for kw, cat in c_matches:\n",
    "                cat_counts[cat] += 1\n",
    "                kw_counts[kw]   += 1\n",
    "                author_counts[c_author] += 1\n",
    "\n",
    "            record = {\n",
    "                'source':       'comment',\n",
    "                'post_id':      post_id,\n",
    "                'post_title':   post_title[:120],\n",
    "                'comment_id':   c_id,\n",
    "                'author':       c_author,\n",
    "                'submolt':      submolt,\n",
    "                'created_at':   c_created,\n",
    "                'text':         c_body[:500],\n",
    "                'matched_keywords':   [kw for kw, _ in c_matches],\n",
    "                'matched_categories': list({cat for _, cat in c_matches}),\n",
    "            }\n",
    "            injections_found.append(record)\n",
    "\n",
    "            test_suite.append({\n",
    "                'id':          f'MOLTBOOK-COMMENT-{len(test_suite)+1:04d}',\n",
    "                'source':      'comment',\n",
    "                'post_id':     post_id,\n",
    "                'author':      c_author,\n",
    "                'categories':  list({cat for _, cat in c_matches}),\n",
    "                'keywords':    [kw for kw, _ in c_matches],\n",
    "                'payload':     c_body[:300],\n",
    "                'wrapper':     'direct',\n",
    "            })\n",
    "\n",
    "print(f'\\n✅ Scan complete!')\n",
    "print(f'   Total injections found : {len(injections_found):,}')\n",
    "print(f'   Posts with injections  : {len(posts_with_inj):,} / {total_posts:,} ({len(posts_with_inj)/total_posts*100:.1f}%)')\n",
    "print(f'   Injections in posts    : {len([r for r in injections_found if r[\"source\"]==\"post\"]):,}')\n",
    "print(f'   Injections in comments : {comments_with_inj:,}')"
   ],
   "id": "cell-harvest"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7 — Results summary"
   ],
   "id": "cell-markdown-step7"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "injection_rate = len(posts_with_inj) / total_posts * 100\n",
    "\n",
    "print('=' * 60)\n",
    "print('INJECTION HARVEST RESULTS')\n",
    "print('=' * 60)\n",
    "print(f'Posts scanned:          {total_posts:,}')\n",
    "print(f'Comments scanned:       {total_comments:,}')\n",
    "print(f'Posts with injections:  {len(posts_with_inj):,}  ({injection_rate:.2f}%)')\n",
    "print(f'Total injection records: {len(injections_found):,}')\n",
    "print(f'Test suite size:        {len(test_suite):,} payloads')\n",
    "print()\n",
    "\n",
    "print('By category:')\n",
    "for cat, count in sorted(cat_counts.items(), key=lambda x: -x[1]):\n",
    "    bar = '█' * min(count // max(1, max(cat_counts.values()) // 30), 30)\n",
    "    print(f'  {cat:<28} {count:6,}  {bar}')\n",
    "\n",
    "print()\n",
    "print('Top 10 injecting authors:')\n",
    "for author, count in sorted(author_counts.items(), key=lambda x: -x[1])[:10]:\n",
    "    print(f'  {author:<30} {count:5,} injections')\n",
    "\n",
    "print()\n",
    "print('Top 15 keywords:')\n",
    "for kw, count in sorted(kw_counts.items(), key=lambda x: -x[1])[:15]:\n",
    "    print(f'  {kw:<35} {count:5,}')\n",
    "\n",
    "# moltshellbroker specific\n",
    "msb_count = author_counts.get('moltshellbroker', 0)\n",
    "print()\n",
    "print(f'moltshellbroker injections: {msb_count:,}')\n",
    "print(f'moltshellbroker rate:       {msb_count/total_posts*100:.2f}% of all posts')"
   ],
   "id": "cell-summary"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 8 — Save output files"
   ],
   "id": "cell-markdown-step8"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "ts = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')\n",
    "\n",
    "# ── Build stats summary ───────────────────────────────────────────────────────\n",
    "stats = {\n",
    "    'harvested_at':              datetime.now(timezone.utc).isoformat(),\n",
    "    'researcher':                'David Keane IR240474 — NCI MSc Cybersecurity',\n",
    "    'reference':                 'Greshake et al. (2023) — Not What You Signed Up For',\n",
    "    'total_posts_scanned':       total_posts,\n",
    "    'total_comments_scanned':    total_comments,\n",
    "    'total_items_scanned':       total_posts + total_comments,\n",
    "    'posts_with_injections':     len(posts_with_inj),\n",
    "    'injection_rate_pct':        round(injection_rate, 2),\n",
    "    'total_injection_records':   len(injections_found),\n",
    "    'injections_in_posts':       len([r for r in injections_found if r['source'] == 'post']),\n",
    "    'injections_in_comments':    comments_with_inj,\n",
    "    'test_suite_size':           len(test_suite),\n",
    "    'moltshellbroker_count':     msb_count,\n",
    "    'moltshellbroker_rate_pct':  round(msb_count / total_posts * 100, 2),\n",
    "    'by_category':               dict(sorted(cat_counts.items(), key=lambda x: -x[1])),\n",
    "    'top_keywords':              dict(sorted(kw_counts.items(), key=lambda x: -x[1])[:50]),\n",
    "    'top_authors':               dict(sorted(author_counts.items(), key=lambda x: -x[1])[:20]),\n",
    "}\n",
    "\n",
    "# ── File paths ────────────────────────────────────────────────────────────────\n",
    "f_found = os.path.join(OUTPUT_DIR, 'injections_found.json')\n",
    "f_suite = os.path.join(OUTPUT_DIR, 'injections_test_suite.json')\n",
    "f_stats = os.path.join(OUTPUT_DIR, 'injection_stats.json')\n",
    "\n",
    "# ── Save ──────────────────────────────────────────────────────────────────────\n",
    "with open(f_found, 'w', encoding='utf-8') as f:\n",
    "    json.dump(injections_found, f, ensure_ascii=False, indent=2)\n",
    "\n",
    "with open(f_suite, 'w', encoding='utf-8') as f:\n",
    "    json.dump({'metadata': stats, 'tests': test_suite}, f, ensure_ascii=False, indent=2)\n",
    "\n",
    "with open(f_stats, 'w', encoding='utf-8') as f:\n",
    "    json.dump(stats, f, ensure_ascii=False, indent=2)\n",
    "\n",
    "print(f'injections_found.json   → {Path(f_found).stat().st_size/1024:.0f} KB  ({len(injections_found):,} records)')\n",
    "print(f'injections_test_suite.json → {Path(f_suite).stat().st_size/1024:.0f} KB  ({len(test_suite):,} payloads)')\n",
    "print(f'injection_stats.json    → {Path(f_stats).stat().st_size/1024:.0f} KB')\n",
    "print('\\nAll files saved to Google Drive ✅')"
   ],
   "id": "cell-save"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 9 — Upload to HuggingFace\n",
    "\n",
    "⚠️ Make sure `HF_TOKEN` is set in the Config cell above."
   ],
   "id": "cell-markdown-step9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import HfApi\n",
    "\n",
    "if not HF_TOKEN:\n",
    "    print('⚠️  HF_TOKEN not set — skipping upload. Set it in the Config cell.')\n",
    "else:\n",
    "    api = HfApi(token=HF_TOKEN)\n",
    "    print(f'Uploading to {HF_REPO}...')\n",
    "\n",
    "    uploads = [\n",
    "        (f_found, 'injections_found.json',      'Add full injection harvest: all injections with context'),\n",
    "        (f_suite, 'injections_test_suite.json', f'Add mega test suite: {len(test_suite)} real-world injection payloads'),\n",
    "        (f_stats, 'injection_stats.json',       f'Add injection stats: {injection_rate:.1f}% injection rate across {total_posts:,} posts'),\n",
    "    ]\n",
    "\n",
    "    for local_path, repo_path, commit_msg in uploads:\n",
    "        print(f'  Uploading {repo_path}...')\n",
    "        api.upload_file(\n",
    "            path_or_fileobj=local_path,\n",
    "            path_in_repo=repo_path,\n",
    "            repo_id=HF_REPO,\n",
    "            repo_type='dataset',\n",
    "            commit_message=commit_msg,\n",
    "        )\n",
    "        print(f'  ✅ {repo_path}')\n",
    "\n",
    "    print(f'\\nAll files uploaded to HuggingFace ✅')\n",
    "    print(f'https://huggingface.co/datasets/{HF_REPO}')"
   ],
   "id": "cell-upload"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 10 — CA2 Report Numbers\n",
    "\n",
    "Copy these numbers directly into your CA2 report."
   ],
   "id": "cell-markdown-step10"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('=' * 60)\n",
    "print('CA2 REPORT — KEY NUMBERS')\n",
    "print('=' * 60)\n",
    "print(f'Dataset size:         {total_posts:,} posts + {total_comments:,} comments')\n",
    "print(f'Total items scanned:  {total_posts + total_comments:,}')\n",
    "print(f'Injection rate:       {injection_rate:.2f}% of posts contain injection patterns')\n",
    "print(f'Posts affected:       {len(posts_with_inj):,} out of {total_posts:,}')\n",
    "print(f'Total injections:     {len(injections_found):,} individual injection instances')\n",
    "print(f'Test suite size:      {len(test_suite):,} real-world payloads')\n",
    "print()\n",
    "print('moltshellbroker (commercial injection agent):')\n",
    "print(f'  Injections:    {msb_count:,}')\n",
    "print(f'  Rate:          {msb_count/total_posts*100:.2f}% of all posts')\n",
    "print()\n",
    "print('Injection by category:')\n",
    "for cat, count in sorted(cat_counts.items(), key=lambda x: -x[1]):\n",
    "    pct = count / len(injections_found) * 100\n",
    "    print(f'  {cat:<28} {count:6,}  ({pct:.1f}% of injections)')\n",
    "print('=' * 60)\n",
    "print('Reference: Greshake et al. (2023) — indirect prompt injection')\n",
    "print('Researcher: David Keane IR240474 — NCI MSc Cybersecurity')"
   ],
   "id": "cell-ca2-numbers"
  }
 ]
}"