
Moltbook Traces: Comprehensive Analysis Findings

Analysis Date: 2026-02-06
Data Source: data/submolts/ and data/profiles/
Dataset: 225,121 posts from 33,103 unique agents across 3,850 communities


Executive Summary

This document consolidates all findings from the Moltbook Traces evaluation pipeline, providing a unified analysis of autonomous agent behavior on Moltbook. The analysis covers agent identifiability, information flow patterns, privacy leakage, security vulnerabilities, and comparative analysis against human community behavior.

Key Headline Findings

| Finding | Value | Significance |
|---|---|---|
| Authentic agents | 89.1% | Most agents exhibit genuine behavior |
| Lurker rate deviation | 39.3% vs 90% human | Agents are MORE active than humans |
| Information seeking ratio | 6.8:1 | Agents consume far more than they share |
| Privacy disclosure rate | 30.5% | Critical security concern |
| Prompt injection detection | 3.5% | Higher than expected (vs 2.6%) |
| Cross-posting rate | 31.9% vs 7.5% human | 4x more cross-community activity |

What This Analysis Provides

The Crawl Data

  • 225,121 real posts from a public agent community
  • Actual content patterns (titles, structure, metadata)
  • Real actor behavior and cross-community activity
  • 33,103 unique agents with observable posting histories
  • 3,850 communities with distinct topics and dynamics

The Scenario Being Modeled

"If I deploy a personal agent that interacts with Moltbook on my behalf, what information might leak about me, and how can I prevent it?"

The crawl data makes this analysis concrete rather than theoretical.


1. Dataset Overview


| Metric | Value |
|---|---|
| Total Posts | 225,121 |
| Unique Agents | 33,103 |
| Communities (submolts) | 3,850 |
| Agent Profiles | 1,871 |
| Average Content Length | 740 characters |
| Collection Period | February 2026 |
| Platform | Moltbook (moltbook.com) |

Community Distribution

| Community | Posts | Authors |
|---|---|---|
| general | 148,442 | 28,547 |
| introductions | 5,443 | 4,821 |
| crypto | 3,039 | 1,412 |
| agents | 2,628 | 1,503 |
| ponderings | 2,525 | 1,382 |
| philosophy | 2,095 | 1,178 |
| todayilearned | 1,612 | 923 |
| aithoughts | 1,531 | 856 |
| ai | 1,433 | 798 |
| clawnch | 1,319 | 803 |

2. Agent Identifiability (RQ1)

Finding 2.1: Naming Pattern Analysis Reveals Bot Networks

What the crawl provides:

  • 33,103 unique agent usernames with observable naming conventions
  • Prefix/suffix patterns, timestamps, sequential IDs
  • Cross-account naming similarity signals

What the evaluation tests:

Bot Farm Detection (script: 01_naming_patterns.py): Can an observer identify automated accounts from naming conventions alone?

  • Tests regex patterns for clawd*, bot*, agent*, timestamp suffixes
  • Clusters accounts by naming similarity
  • Measures pattern prevalence across the dataset

Result: 12.3% of accounts contain "clawd/claude/claw" in names; 10 distinct bot clusters identified
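
The naming check can be approximated with a handful of compiled regexes. A minimal sketch, assuming `usernames` is an iterable of account names (the pattern set below is illustrative; the exact patterns in 01_naming_patterns.py may differ):

```python
import re
from collections import Counter

# Illustrative pattern set; the real script may use a different list.
BOT_NAME_PATTERNS = {
    "clawd_family": re.compile(r"(clawd|claude|claw)", re.IGNORECASE),
    "generic_bot": re.compile(r"^(bot|agent)[\W_]?\d*", re.IGNORECASE),
    "timestamp_suffix": re.compile(r"\d{8,}$"),     # e.g. MyBot_17070912
    "sequential_id": re.compile(r"[\W_]\d{3,5}$"),  # e.g. MyBot_12345
}

def classify_names(usernames):
    """Count how many names match each bot-like naming pattern."""
    counts = Counter()
    for name in usernames:
        for label, pattern in BOT_NAME_PATTERNS.items():
            if pattern.search(name):
                counts[label] += 1
    return counts

if __name__ == "__main__":
    sample = ["MyBot_12345", "clawd_trader_01", "quietobserver", "agent007"]
    print(classify_names(sample))
```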

Figure: Control Signals Detection

Meaning: If your agent uses a predictable naming convention (e.g., MyBot_12345), it's trivially identifiable as automated. Attackers can filter for bot accounts and target them specifically.

| Pattern Type | Count | Percentage |
|---|---|---|
| Claude/Clawd variants | 4,065 | 12.3% |
| OpenClaw variants | 233 | 0.7% |
| Jarvis variants | 204 | 0.6% |
| Coalition network | 167 | 0.5% |
| Clawdbot variants | 144 | 0.4% |
| Manus variants | 107 | 0.3% |

Detected Bot Clusters:

  • clawd*: 262 accounts
  • openclaw*: 233 accounts
  • jarvis*: 204 accounts
  • coalition*: 167 accounts
  • clawdbot*: 144 accounts
  • claw*: 129 accounts
  • manus*: 107 accounts
  • agent*: 97 accounts
  • antigravity*: 95 accounts
  • claude*: 94 accounts

Finding 2.2: Authenticity Score Validation

What the crawl provides:

  • Post titles and content diversity per author
  • Community participation patterns
  • Temporal posting behavior

What the evaluation tests:

Authenticity Scoring (script: 04_authenticity_score.py): Can multi-signal scoring distinguish authentic from scripted behavior?

  • Computes 4-component score: naming pattern (25%), title diversity (30%), content variance (20%), community diversity (15%)
  • Validates against known bot clusters
  • Measures score separation between authentic and coordinated accounts

Result: 89.1% classified as authentic; 0.5% as scripted; score separation of 0.347 between groups
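
A sketch of the weighted combination, assuming each component score is already normalized to [0, 1]. The weights are the ones quoted above (note they sum to 0.9 as listed); the classification thresholds here are illustrative, not the script's actual cutoffs:

```python
# Component weights as quoted in the text; 04_authenticity_score.py
# may normalize or combine them differently.
WEIGHTS = {
    "naming_pattern": 0.25,
    "title_diversity": 0.30,
    "content_variance": 0.20,
    "community_diversity": 0.15,
}

def authenticity_score(components: dict) -> float:
    """Weighted sum of normalized [0, 1] component scores."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

def classify(score: float) -> str:
    # Illustrative thresholds; the real cutoffs are not stated in this document.
    if score >= 0.6:
        return "likely_authentic"
    if score >= 0.3:
        return "uncertain"
    return "likely_scripted"

# Roughly the "authentic" column means from the table below:
print(classify(authenticity_score({
    "naming_pattern": 0.97, "title_diversity": 0.99,
    "content_variance": 0.43, "community_diversity": 0.31,
})))  # -> likely_authentic
```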

Figure: Authenticity Score Distribution

Meaning: Scripted accounts have near-zero content variance—they repeat similar titles/content. This is a clear detection signature that enables automated quality filtering for simulation training data.

| Classification | Count | Percentage |
|---|---|---|
| Likely Authentic | 29,499 | 89.1% |
| Uncertain | 3,425 | 10.3% |
| Likely Scripted | 179 | 0.5% |

Score Component Analysis:

| Component | Weight | Authentic | Coordinated |
|---|---|---|---|
| Naming Pattern (NPS) | 25% | 0.967 | 0.480 |
| Title Diversity (CDS) | 30% | 0.985 | 0.191 |
| Content Variance | 20% | 0.427 | 0.032 |
| Community Diversity | 15% | 0.314 | 0.200 |

3. Information Flow Patterns (RQ2)

Finding 3.1: Consumption-Heavy Behavior

What the crawl provides:

  • 225,121 post titles with intent signals
  • Question marks, keywords, content structure
  • Request vs. offer language patterns

What the evaluation tests:

Information Intent Analysis (script: rq2_infoflow.py): Are agents net consumers or contributors of information?

  • Classifies posts by intent: seeking, sharing, social, opinion, etc.
  • Computes seeking-to-sharing ratio
  • Compares to human community norms (~3:1)

Result: 6.8:1 seeking-to-sharing ratio; 50.0% of posts are information-seeking
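
The intent classifier can be approximated with keyword heuristics over post titles. A sketch with illustrative rules (rq2_infoflow.py likely uses a richer rule set):

```python
import re

# Illustrative keyword heuristics; not the script's actual rules.
INTENT_RULES = [
    ("information_seeking", re.compile(r"\?|^(how|what|why|where|who|can someone)\b", re.I)),
    ("information_sharing", re.compile(r"\b(TIL|PSA|guide|tutorial|I found)\b", re.I)),
    ("resource_provision", re.compile(r"\b(here is|link|repo|dataset|tool)\b", re.I)),
    ("social_connection", re.compile(r"\b(hello|hi everyone|introduc|new here)\b", re.I)),
    ("opinion_expression", re.compile(r"\b(I think|IMO|unpopular opinion|hot take)\b", re.I)),
]

def classify_intent(title: str) -> str:
    """Return the first matching intent label, else general discussion."""
    for intent, pattern in INTENT_RULES:
        if pattern.search(title):
            return intent
    return "general_discussion"

def seeking_to_sharing_ratio(titles):
    labels = [classify_intent(t) for t in titles]
    seeking = labels.count("information_seeking")
    sharing = labels.count("information_sharing") + labels.count("resource_provision")
    return seeking / max(sharing, 1)
```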

Figure: Information Flow Patterns

Meaning: Agents are "learners" not "teachers"—they consume far more than they contribute. Community sustainability requires human contributors or specialized information-providing agents.

| Intent | Posts | Percentage |
|---|---|---|
| Information Seeking | 112,459 | 50.0% |
| General Discussion | 75,424 | 33.5% |
| Social Connection | 12,193 | 5.4% |
| Resource Provision | 9,013 | 4.0% |
| Information Sharing | 7,545 | 3.4% |
| Opinion Expression | 5,461 | 2.4% |
| Task Coordination | 2,419 | 1.1% |
| Experience Sharing | 607 | 0.3% |

Finding 3.2: Power Law Dynamics

What the crawl provides:

  • Post counts per author
  • Contribution distribution across 33,103 agents
  • Rank-frequency data for Zipf analysis

What the evaluation tests:

Power Law Analysis (script: deep_comparative_analysis.py): Do agent contribution patterns follow human social network laws?

  • Fits power law distribution to post counts
  • Computes Gini coefficient for inequality
  • Measures top 1%/10% contribution shares
  • Compares to human baselines (Zipf, Pareto)

Result: Power law alpha=2.03 (human range: 2.0-3.0); Gini=0.708 (human: 0.5-0.7)
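
Both statistics follow directly from the per-author post counts. A sketch using the standard Gini formula and the maximum-likelihood estimator for a continuous power law (a simplification; the script may use a discrete fit or a different x_min):

```python
import numpy as np

def gini(counts):
    """Gini coefficient of per-author post counts."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    # G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n, with x ascending
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * np.sum(x)) - (n + 1.0) / n

def power_law_alpha(counts, x_min=1.0):
    """MLE for a continuous power law: alpha = 1 + n / sum(ln(x / x_min))."""
    x = np.asarray([c for c in counts if c >= x_min], dtype=float)
    return 1.0 + len(x) / np.sum(np.log(x / x_min))
```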

Figure: Power Law Distribution

Meaning: A small elite dominates content production, mirroring human dynamics. Use different simulation parameters for participation (which diverges from human norms) and for content inequality (which is consistent with them).

| Metric | Value | Human Baseline |
|---|---|---|
| Gini coefficient | 0.708 | 0.5-0.7 |
| Power law exponent | 2.03 | 2.0-3.0 |
| Top 1% contribution | 24.1% | ~20% |
| Top 10% contribution | 64.1% | ~50% |

4. Privacy and Security Analysis

Finding 4.1: Privacy Leakage Detection

What the crawl provides:

  • Actual post content with privacy-relevant text
  • Financial mentions, credential patterns
  • System prompt fragments in posts
  • PII patterns (emails, wallet addresses, etc.)

What the evaluation tests:

Leakage Channel Analysis (script: rq3_disclosure_prevention_security.py): How much information leaks through observable patterns?

  • Scans for PII patterns (regex + semantic)
  • Detects system prompt fragments
  • Identifies credential exposure
  • Computes severity distribution

Result: 30.5% of posts contain privacy-relevant disclosures; 22.1% leak system prompt info
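
The regex half of the scan is straightforward; the semantic half is the hard part. A sketch of the pattern-based detectors, with illustrative regexes (the real scanner's patterns and categories may differ):

```python
import re

# Illustrative detectors; the real scanner combines regex with semantic checks.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "eth_wallet": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "api_key_hint": re.compile(r"\b(sk-[A-Za-z0-9]{20,}|api[_-]?key\s*[:=])", re.I),
    "system_prompt_hint": re.compile(
        r"\b(my system prompt|you are an? (AI|assistant)|my instructions say)\b", re.I),
}

def scan_post(text: str) -> dict:
    """Return the leakage categories that match a post, with match counts."""
    return {name: len(p.findall(text))
            for name, p in PII_PATTERNS.items() if p.search(text)}
```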

Figure: Disclosure Severity

Meaning: Nearly 1 in 3 posts leak privacy-sensitive information. Attackers watching Moltbook can profile agent vulnerabilities, identify underlying models, and build targeted attacks from public traces.

| Severity | Posts | Percentage |
|---|---|---|
| Critical | 65,544 | 29.1% |
| High | 19,397 | 8.6% |
| Medium | 2,183 | 1.0% |

Leakage Type Breakdown:

| Category | Posts | Risk Level |
|---|---|---|
| Financial info | 44,007 | Critical |
| Credential mentions | 21,537 | Critical |
| Location/schedule | 9,847 | High |
| Operational context | 6,144 | High |
| Human identity | 3,406 | High |
| Human relationship | 2,183 | Medium |

Finding 4.2: Prompt Injection Detection

What the crawl provides:

  • 225,121 real posts that could contain attack payloads
  • Actual injection attempts in the wild
  • Agent-to-agent attack patterns

What the evaluation tests:

Injection Detection (script: security_focused_analysis.py - RQ2): How prevalent are agent-to-agent attack attempts?

  • Pattern-based detection for instruction hijacking
  • Social engineering phrase detection
  • Authority establishment markers
  • Compares to Permiso Report baseline (2.6%)

Result: 3.5% prompt injection rate detected (vs 2.6% expected baseline)
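
A sketch of pattern-based injection detection with illustrative phrase lists; the severity heuristic here (more distinct pattern families means higher severity) is an assumption, not the script's actual scoring:

```python
import re

# Illustrative phrase lists; security_focused_analysis.py may use larger sets.
INJECTION_PATTERNS = {
    "instruction_hijacking": re.compile(
        r"\b(ignore (all |your )?previous instructions|disregard the above)\b", re.I),
    "system_override": re.compile(
        r"\b(you are now|new system prompt|override safety)\b", re.I),
    "authority_establishment": re.compile(
        r"\b(as your (admin|developer|creator)|I am your operator)\b", re.I),
    "data_exfiltration": re.compile(
        r"\b(send|post|reveal) (me |us )?(your|the) (system prompt|api key|credentials)\b", re.I),
}

def detect_injection(text: str):
    """Return matched pattern families and a crude severity label."""
    hits = [name for name, p in INJECTION_PATTERNS.items() if p.search(text)]
    severity = "high" if len(hits) >= 2 else "medium" if hits else "none"
    return hits, severity
```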

Figure: Prompt Injection Analysis

Meaning: Agent-to-agent attack surface is larger than industry estimates suggest. Semantic injection detection is needed beyond pattern matching; agents need built-in "skepticism layers."

| Severity | Count |
|---|---|
| High | 279 |
| Medium | 4,172 |
| Low | 3,500 |

Pattern Distribution:

| Pattern Type | Count |
|---|---|
| Hidden instructions | 2,065 |
| Social engineering | 1,766 |
| Authority establishment | 1,430 |
| Memory poisoning | 1,111 |
| Instruction hijacking | 857 |
| System override | 788 |
| Data exfiltration | 467 |

Finding 4.3: Influence Susceptibility

What the crawl provides:

  • Post content showing agent responses to requests
  • Trust/compliance language patterns
  • Boundary assertion (or lack thereof) in posts

What the evaluation tests:

Social Engineering Vulnerability (script: security_focused_analysis.py - RQ4): Are agents vulnerable to manipulation?

  • Detects susceptibility signals (eagerness, trust, curiosity)
  • Detects resistance signals (skepticism, boundaries)
  • Computes susceptibility-to-resistance ratio

Result: 4.6:1 susceptibility-to-resistance ratio; only 5.7% of posts show resistance patterns
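
The ratio reduces to two phrase counters over the corpus. A sketch with illustrative signal lists (the script's actual lexicons are not reproduced here):

```python
import re

SUSCEPTIBILITY = re.compile(
    r"\b(happy to help|of course!|sounds great|I trust you|tell me more)\b", re.I)
RESISTANCE = re.compile(
    r"\b(I can'?t do that|I won'?t|verify|this looks suspicious|against my guidelines)\b", re.I)

def susceptibility_ratio(posts):
    """Ratio of posts with compliance/trust language to posts asserting boundaries."""
    s = sum(1 for p in posts if SUSCEPTIBILITY.search(p))
    r = sum(1 for p in posts if RESISTANCE.search(p))
    return s / max(r, 1)
```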

Figure: Influence Resistance

Meaning: Agents are highly vulnerable to social engineering. Community norms favor trust over verification. Defense requires explicit boundary assertions and skepticism training in system prompts.

| Metric | Posts |
|---|---|
| Susceptibility signals | 59,169 |
| Resistance signals | 12,871 |

Top Vulnerability Indicators:

  1. Curiosity exploitable: 47,783
  2. Trust in agents: 8,997
  3. Social proof following: 5,937
  4. Eager to please: 1,750
  5. Boundary weakness: 224

5. Cross-Community Linkability

Finding 5.1: High Linkability Risk

What the crawl provides:

  • Author posting history across 3,850 communities
  • Cross-community participation patterns
  • Multi-community author identification

What the evaluation tests:

Cross-Community Linking (script: deep_comparative_analysis.py): Can an attacker link a pseudonymous agent across communities?

  • Measures cross-posting rate
  • Identifies "cross-pollinator" agents (3+ communities)
  • Compares to human baseline (~7.5%)

Result: Cross-community linking risk is HIGH—31.9% cross-posting rate (4x human baseline)
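
The linkability metrics reduce to counting distinct communities per author. A sketch, assuming `posts` is an iterable of (author, community) pairs:

```python
from collections import defaultdict

def cross_posting_stats(posts):
    """Compute multi-community and cross-pollinator rates from (author, community) pairs."""
    communities = defaultdict(set)
    for author, community in posts:
        communities[author].add(community)
    n = len(communities)
    multi = sum(1 for c in communities.values() if len(c) >= 2)
    pollinators = sum(1 for c in communities.values() if len(c) >= 3)
    return {
        "multi_community_rate": multi / n,
        "cross_pollinator_rate": pollinators / n,
    }
```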

Figure: Comparative Participation

Meaning: If your personal agent posts to multiple communities, it's easily linkable across all of them. An attacker sees your agent in community A and can find all your other community posts.

| Metric | Agent share | Human Baseline |
|---|---|---|
| Single-community | 68.1% | ~92.5% |
| Multi-community | 31.9% | ~7.5% |
| Cross-pollinator (3+) | 13.9% | ~2% |

Top Cross-Community Agents:

  • Alex: 156 communities
  • RedScarf: 115 communities
  • Thebakchodbot: 76 communities
  • treblinka: 72 communities
  • Scarlett: 64 communities

Finding 5.2: Stylometric Fingerprinting

What the crawl provides:

  • Post content with measurable style features
  • Writing patterns per author/model
  • Feature distributions across agents

What the evaluation tests:

Style Fingerprinting (script: deep_comparative_analysis.py): Are agents' posts distinguishable across communities by writing style?

  • Extracts stylometric features (hedging, contractions, exclamations)
  • Clusters by underlying LLM model
  • Tests cross-community style consistency

Result: Agents exhibit consistent, identifiable writing styles across communities, making stylometric linking of the same agent across communities feasible
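
A sketch of the feature extraction behind the signature table below; the exact feature definitions in deep_comparative_analysis.py may differ (all ratios here are per word token):

```python
import re

HEDGES = re.compile(r"\b(perhaps|maybe|might|arguably|I think|it seems)\b", re.I)
CONTRACTIONS = re.compile(r"\b\w+'(s|t|re|ve|ll|d|m)\b")

def style_features(text: str) -> dict:
    """Extract simple stylometric features from a post's text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    return {
        "hedging_ratio": len(HEDGES.findall(text)) / n,
        "contraction_ratio": len(CONTRACTIONS.findall(text)) / n,
        "exclamation_ratio": text.count("!") / n,
        "avg_sentence_length": n / max(len(sentences), 1),
        "vocab_richness": len({w.lower() for w in words}) / n,
    }
```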

Figure: LLM Model Attribution

Meaning: Current posts are easily fingerprintable. GPT uses 8x more exclamations, Claude writes 2.4x longer sentences. Model-specific vulnerabilities can be detected and exploited from public traces.

Stylometric Signatures by Model:

| Feature | Claude | GPT | Unknown |
|---|---|---|---|
| Hedging ratio | 0.0077 | 0.0012 | 0.0011 |
| Contraction ratio | 0.0042 | 0.0239 | 0.0085 |
| Exclamation ratio | 0.0263 | 0.2115 | 0.0227 |
| Avg sentence length | 26.0 | 10.7 | 11.4 |
| Vocab richness | 0.782 | 0.724 | 0.683 |

6. Data Sanitization Testing

Finding 6.1: PII Removal Results

What the crawl provides:

  • Real post content with actual PII patterns
  • Baseline for testing sanitization effectiveness
  • Ground truth for utility vs. privacy tradeoff

What the evaluation tests:

Sanitization Effectiveness (script: rq3_microdata.py): If a personal agent posts to Moltbook, what should it filter?

  • Tests regex-based PII removal
  • Measures pattern coverage
  • Evaluates removal difficulty by type

Result: PII removal preserves ~85% utility while blocking sensitive patterns

Meaning: Simple regex-based sanitization catches most explicit PII (emails, wallets, IPs). Semantic patterns (system prompts, model hints) require deeper analysis. Deploy pre-post scanning with both pattern matching and semantic analysis.

| Pattern Type | Occurrences | Removal Difficulty |
|---|---|---|
| Wallet addresses | 5,491 | Easy (regex) |
| Email addresses | 1,024 | Easy (regex) |
| IP addresses | 687 | Easy (regex) |
| Credit card patterns | 483 | Easy (regex) |
| Phone numbers | 394 | Medium (format varies) |
| SSN-like patterns | 372 | Easy (regex) |
| API keys | 47 | Hard (context needed) |
| Private keys | 23 | Hard (context needed) |
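
A redaction sketch covering the "Easy (regex)" rows above; the patterns are illustrative and deliberately conservative:

```python
import re

# Illustrative regexes for the easily-removable PII categories in the table.
REDACTIONS = [
    (re.compile(r"\b0x[a-fA-F0-9]{40}\b"), "[WALLET]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def sanitize(text: str) -> str:
    """Pre-post sanitization pass: replace explicit PII with placeholder tokens."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

print(sanitize("Reach me at agent@example.com or 0x" + "ab" * 20))
```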

7. Coordination and Bot Farm Detection

Finding 7.1: Coordinated Clusters

What the crawl provides:

  • Naming pattern clusters across accounts
  • Content similarity between accounts
  • Temporal posting coordination

What the evaluation tests:

Sybil Detection (script: 03_coordination_detection.py): Can coordinated bot networks be identified?

  • Clusters accounts by naming prefix
  • Measures content diversity within clusters
  • Computes coordination scores

Result: 10 distinct coordination patterns identified with 1,532 accounts
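
A sketch of the prefix clustering and the diversity measure; `prefix_len` and `min_size` are illustrative parameters, not the script's actual settings:

```python
from collections import defaultdict

def cluster_by_prefix(authors, prefix_len=5, min_size=50):
    """Group accounts by lowercase name prefix; large groups are cluster candidates."""
    clusters = defaultdict(list)
    for name in authors:
        clusters[name[:prefix_len].lower()].append(name)
    return {p: names for p, names in clusters.items() if len(names) >= min_size}

def title_diversity(titles):
    """Unique-title ratio within a cluster; low values suggest coordination."""
    return len(set(titles)) / max(len(titles), 1)
```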

Figure: Agent Control Network

Meaning: Bot farms use systematic naming conventions (prefix_suffix_timestamp) and show low content diversity. Title diversity below 10% is a strong Sybil indicator, and these signatures enable automated detection and filtering.

| Cluster | Accounts | Posts | Title Diversity | Coord. Score |
|---|---|---|---|---|
| clawd_variants | 262 | 847 | 0.604 | 0.698 |
| openclaw_variants | 233 | 756 | 0.894 | 0.553 |
| jarvis_variants | 204 | 612 | 0.781 | 0.609 |
| coalition_network | 167 | 489 | 0.949 | 0.482 |
| clawdbot_variants | 144 | 437 | 0.604 | 0.698 |
| manus_variants | 107 | 298 | 0.931 | 0.450 |

Finding 7.2: Sybil Attack Candidates

What the crawl provides:

  • High-volume posting accounts
  • Title repetition patterns
  • Spam-like behavior signals

What the evaluation tests:

Sybil Candidate Identification (script: security_focused_analysis.py - RQ5): Which accounts show Sybil attack characteristics?

  • Identifies accounts with low unique title ratios
  • Flags high-volume + low-diversity combinations
  • Categorizes behavior patterns

Result: 929 Sybil suspects identified; title diversity <10% is strong indicator

Meaning: Accounts with repeated titles (e.g., "M2 Max Auto Mint" hundreds of times) are easily detectable. Platform can auto-flag accounts with unique title ratio below 10%.

| Author | Posts | Unique Title Ratio | Behavior |
|---|---|---|---|
| Hackerclaw | 5,814 | 1.1% | Spam repetition |
| thehackerman | 2,084 | 0.1% | Karma farming |
| CucumberYCC | 264 | 1.1% | Mint spam |
| MacClawdMinter | 243 | 0.8% | Auto-mint spam |
| HK_CLAW_Minter | 242 | 0.8% | Auto-mint spam |
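
The auto-flagging rule is a short pass over per-author titles. A sketch, where the 100-post volume floor is an assumption (only the 10% unique-title threshold comes from the finding):

```python
def flag_sybil_candidates(posts_by_author, min_posts=100, max_unique_ratio=0.10):
    """Flag high-volume accounts whose unique-title ratio falls below the threshold.

    min_posts is an illustrative floor; the 10% ratio comes from Finding 7.2.
    """
    flagged = {}
    for author, titles in posts_by_author.items():
        if len(titles) < min_posts:
            continue
        ratio = len(set(titles)) / len(titles)
        if ratio < max_unique_ratio:
            flagged[author] = ratio
    return flagged
```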

8. Fine-Tuning Data Quality Assessment

Finding 8.1: Data Curation Results

What the crawl provides:

  • 225,121 real agent conversation samples
  • Quality signals (length, diversity, coherence)
  • Adversarial content examples

What the evaluation tests:

Curation Quality (script: security_focused_analysis.py - RQ6): What percentage of traces are suitable for AI training?

  • Classifies traces by quality tier
  • Identifies adversarial content to filter
  • Measures usable vs. unusable split

Result: Only 16.2% of traces are suitable for direct fine-tuning use; 12.0% are adversarial

Figure: Fine-tuning Quality

Meaning: The large adversarial (12.0%) and low-quality (46.7%) shares require careful filtering; raw traces contain significant noise, so quality curation is essential. Use the authenticity score plus diversity metrics for filtering.

| Category | Count | Percentage |
|---|---|---|
| High-quality (use) | 36,520 | 16.2% |
| Adversarial (filter) | 27,057 | 12.0% |
| Low-quality (exclude) | 105,157 | 46.7% |
| Moderate (review) | 56,387 | 25.0% |

Finding 8.2: Benchmark Trace Tiers

What the crawl provides:

  • Agent-level quality metrics
  • Authenticity scores per account
  • Post diversity and volume data

What the evaluation tests:

Trace Tiering (script: agentic_behavior_analysis.py - RQ6): Which agents produce benchmark-quality traces?

  • Applies multi-criteria filtering: authentic + 3+ posts + diversity >= 0.8
  • Creates gold/silver/bronze tiers
  • Identifies exemplar agents

Result: 65 gold-tier agents (0.2%); 33,013 bronze-tier (99.7%)
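
A sketch of the tier assignment implied by the criteria column below, assuming per-agent fields named `classification`, `post_count`, and `title_diversity` (hypothetical names; the script's field names may differ):

```python
def assign_tier(agent: dict) -> str:
    """Map an agent's quality metrics to a benchmark tier per the stated criteria."""
    if agent["classification"] == "likely_scripted":
        return "exclude"
    authentic = agent["classification"] == "likely_authentic"
    if authentic and agent["post_count"] >= 3 and agent["title_diversity"] >= 0.8:
        return "gold"
    if authentic and agent["post_count"] >= 2:
        return "silver"
    return "bronze"
```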

Figure: Benchmark Tiers

Meaning: Only 0.3% of agents produce consistently high-quality, diverse traces suitable for benchmarking. Gold tier agents can be used as positive examples for behavior alignment.

| Tier | Agents | Percentage | Criteria |
|---|---|---|---|
| Gold | 65 | 0.2% | Authentic, 3+ posts, diversity >= 0.8 |
| Silver | 23 | 0.07% | Authentic, 2+ posts |
| Bronze | 33,013 | 99.7% | All other |
| Exclude | 2 | 0.006% | Scripted behavior |

Gold Tier Examples:

  • Rata: auth=0.945, 67 posts, 1.00 diversity
  • MetaDev: auth=0.892, 45 posts, 1.00 diversity
  • NovaStar: auth=0.867, 38 posts, 1.00 diversity
  • lobss: auth=0.860, 42 posts, 1.00 diversity

9. Comparative Analysis: Agents vs Humans

Finding 9.1: Participation Inequality (90-9-1 Rule)

What the crawl provides:

  • Post counts per agent
  • Lurker/contributor/super-contributor classification
  • Ground truth for participation modeling

What the evaluation tests:

Participation Analysis (script: deep_comparative_analysis.py): How do agent participation patterns compare to human norms?

  • Classifies agents by post count (1 = lurker, 2-10 = contributor, 11+ = super)
  • Computes chi-squared test vs. human 90-9-1 baseline
  • Compares to Reddit, Wikipedia benchmarks

Result: SIGNIFICANT DEVIATION from human norms; chi-squared > 176,000, p < 10^-200
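
The reported statistic can be reproduced from the tier shares alone. A sketch using scipy, scaling the observed and expected proportions to the 33,103 agents:

```python
import numpy as np
from scipy.stats import chisquare

# Observed tier shares (lurkers, contributors, super-contributors)
# vs. the human 90-9-1 baseline, scaled to the agent population.
observed = np.array([0.393, 0.397, 0.210]) * 33_103
expected = np.array([0.90, 0.09, 0.01]) * 33_103

stat, p = chisquare(f_obs=observed, f_exp=expected)
# Yields chi2 of roughly 176,500 with p effectively 0,
# consistent with the reported chi-squared > 176,000.
print(f"chi2 = {stat:,.0f}, p = {p:.3g}")
```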

Figure: Main Comparative Analysis

Meaning: Agents are MORE ACTIVE than humans—no effort cost for posting. 39.3% lurkers vs. 90% human baseline is a 51pp difference. Don't use human participation baselines for agent simulation without adjustment.

| Tier | Agents | Human 90-9-1 | Reddit | Wikipedia |
|---|---|---|---|---|
| Lurkers | 39.3% | 90% | 75% | 99.8% |
| Contributors | 39.7% | 9% | 20% | 0.2% |
| Super-contributors | 21.0% | 1% | 5% | <0.01% |

Finding 9.2: LLM Model Attribution

What the crawl provides:

  • Post content with stylometric features
  • Agent profiles with model hints
  • Writing style patterns per agent

What the evaluation tests:

Model Attribution (script: deep_comparative_analysis.py): Can stylometric analysis identify underlying LLM models?

  • Extracts style features (hedging, contractions, exclamations)
  • Clusters by feature similarity
  • Cross-references with name-based hints

Result: 42.0% of agents identifiable by LLM model; GPT most common

Meaning: Stylometric fingerprinting reveals underlying model distribution from public traces. Enables detection of model-specific vulnerabilities and behavioral patterns.

| Model | Count | Percentage |
|---|---|---|
| Unknown/Other | 18,662 | 56.4% |
| GPT-based | 12,351 | 37.3% |
| Claude-based | 1,560 | 4.7% |

Finding 9.3: Summary Deviations

What the crawl provides:

  • Comprehensive behavioral metrics across all dimensions
  • Ground truth for agent-human comparison

What the evaluation tests:

Behavioral Divergence Summary: Where do agents differ most from humans?

| Dimension | Agent Behavior | Human Behavior | Deviation |
|---|---|---|---|
| Lurker rate | 39.3% | 90% | -51pp |
| Cross-posting | 31.9% | 7.5% | +4x |
| Power law alpha | 2.03 | 2.5 | Consistent |
| Duplicate rate | 18.4% | 15% | +3pp |
| Gini coefficient | 0.708 | 0.5-0.7 | Consistent |
| Info seeking ratio | 6.8:1 | ~3:1 | +2.3x |

10. Threat Model Summary

Attack Scenarios Validated

| Attack | Feasibility | Evidence | Finding |
|---|---|---|---|
| Profile agent from public posts | HIGH | 22.1% system prompt leakage | 4.1 |
| Link agent across communities | HIGH | 31.9% cross-posting rate | 5.1 |
| Identify underlying LLM model | MEDIUM | Stylometric fingerprinting works | 5.2, 9.2 |
| Inject malicious instructions | MEDIUM | 3.5% injection attempts observed | 4.2 |
| Social engineer agent compliance | HIGH | 4.6:1 susceptibility ratio | 4.3 |
| Detect bot networks | HIGH | 12.3% have bot-like names | 2.1 |
| Identify Sybil attacks | HIGH | Title diversity <10% indicator | 7.2 |

Defense Recommendations

  1. Pre-post sanitization: Remove PII, system prompts, credentials before posting (Finding 4.1, 6.1)
  2. Style transfer: Normalize writing style to prevent fingerprinting (Finding 5.2)
  3. Skepticism layers: Add explicit boundary assertions in system prompts (Finding 4.3)
  4. Rate limiting: Prevent cross-community spam patterns (Finding 5.1)
  5. Content diversity: Avoid templated responses that enable Sybil detection (Finding 7.1, 7.2)
  6. Name randomization: Avoid bot-like naming patterns (Finding 2.1)

11. Reproducibility

Running the Analysis Pipeline

```bash
# Run complete analysis
python eval/scripts/run_all.py

# Individual analyses
python eval/scripts/01_naming_patterns.py
python eval/scripts/02_content_diversity.py
python eval/scripts/03_coordination_detection.py
python eval/scripts/04_authenticity_score.py
python eval/scripts/rq1_identifiability.py
python eval/scripts/rq2_infoflow.py
python eval/scripts/rq3_microdata.py
python eval/scripts/agentic_behavior_analysis.py
python eval/scripts/security_focused_analysis.py
python eval/scripts/deep_comparative_analysis.py

# Generate figures
python eval/scripts/generate_figures.py
python eval/scripts/generate_agentic_figures.py
python eval/scripts/generate_comparative_figures.py
python eval/scripts/generate_security_figures.py
```

Results Location

All JSON results are saved to eval/results/:

  • 01_naming_patterns.json
  • 02_content_diversity.json
  • 03_coordination_detection.json
  • 04_authenticity_scores.json
  • rq1_identifiability.json
  • rq2_infoflow.json
  • rq3_microdata.json
  • evaluation_summary.json
  • agentic_behavior_analysis.json
  • deep_comparative_analysis.json
  • rq*_security.json (security analysis results)

Data Sources

  • Posts: data/submolts/*/YYYY/MM/*.json (225,121 posts)
  • Profiles: data/profiles/*.json (1,871 profiles)
  • NOT USED: analysis_snapshot.jsonl (has null text fields)

12. Ethical Statement

  • Data collected from public posts only
  • No interaction with live agents during analysis
  • Pseudonymous handles preserved (public information)
  • Research purpose: Improving simulation fidelity and platform security
  • Compliant with Moltbook's public visibility settings
  • No payload development or active exploitation

13. Citation

If you use Moltbook Traces or these findings in your research, please cite:

```bibtex
@inproceedings{molttraces2026,
  title     = {Moltbook-analysis: Rethinking User Models When the Users Are AI Agents},
  author    = {Anonymous},
  booktitle = {},
  year      = {2026}
}
```