Evaluation & Analysis

This directory contains all analysis scripts, results, and figures for the Moltbook Traces project. The content is organized into two tiers: scripts that directly reproduce paper results, and extended analyses conducted as supplementary work.

Paper Reproduction Scripts

These scripts reproduce the three contributions reported in the paper. Run them from the repository root.

Contribution 1: Agent Attribution Problem

| Script | What it does | Output |
| --- | --- | --- |
| `identifiability/scripts/02_independent_classification.py` | Classifies agents by 5 external validation signals; computes behavioral differences (one-shot ratio, entropy, burstiness, style consistency) between high- and low-validation groups | `identifiability/results/02_independent_classification.json` |
| `scripts/tautology_validation_experiment.py` | Predictive validation: tests whether ranking predicts independent outcomes (upvotes, discussion depth, cross-community activity) | `results/tautology_validation_results.json` |
| `scripts/tautology_extended_analysis.py` | Extended validation: quintile gradient, effect sizes, held-out generalization, bootstrap CIs | `results/tautology_extended_validation.json` |

Findings doc: identifiability/IDENTIFIABILITY_FINDINGS.md
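
The extended validation reports bootstrap CIs for differences between validation groups. As a rough illustration of what that kind of interval looks like, here is a minimal percentile-bootstrap sketch for a difference in group means. The function name, the resample count, and the toy values are assumptions made for this example; the actual procedure in `scripts/tautology_extended_analysis.py` may differ.

```python
import random
import statistics


def bootstrap_mean_diff_ci(high, low, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(high) - mean(low).

    Hypothetical helper for illustration only; it is not taken from the
    repository's scripts.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        h = rng.choices(high, k=len(high))
        l = rng.choices(low, k=len(low))
        diffs.append(statistics.fmean(h) - statistics.fmean(l))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return diffs[lo_idx], diffs[hi_idx]


# Toy per-agent metric values (made up for the example).
high_validation = [0.82, 0.91, 0.77, 0.88, 0.95, 0.80, 0.86]
low_validation = [0.31, 0.22, 0.40, 0.28, 0.35, 0.19, 0.33]
ci_low, ci_high = bootstrap_mean_diff_ci(high_validation, low_validation)
print(f"95% CI for mean difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero suggests the two groups differ on the metric; the percentile method is one of several bootstrap variants and was chosen here only for brevity.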

Contribution 2: Click-Model Degradation

| Script | What it does | Output |
| --- | --- | --- |
| `scripts/attack6_click_model_degradation.py` | Trains PBM on upvote patterns; measures AUC/LL degradation at 21 contamination levels (0% to 100%) using constant-size substitution | `results/click_model_degradation.json`, `figures/click_model_*.{pdf,png}` |

Findings doc: CLICK_MODEL_FINDINGS.md
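
Constant-size substitution can be pictured as follows: at each contamination level, a fraction of the organic sessions is swapped out for the same number of agent-generated sessions, so the training set size never changes. The sketch below is a minimal illustration under that reading. The helper names and the session tuples are hypothetical, and evenly spaced levels in 5% steps are an assumption inferred from "21 levels, 0% to 100%"; the real script may sample or mix differently.

```python
import random


def contamination_levels(steps=21):
    """Evenly spaced levels from 0.0 to 1.0 (21 steps gives 5% increments)."""
    return [i / (steps - 1) for i in range(steps)]


def substitute(organic, synthetic, level, rng):
    """Replace a `level` fraction of organic sessions with synthetic ones.

    The returned list always has len(organic) items, which is what
    "constant-size" is taken to mean in this sketch.
    """
    n = len(organic)
    k = round(level * n)
    if k > len(synthetic):
        raise ValueError("not enough synthetic sessions for this level")
    mixed = rng.sample(organic, n - k) + rng.sample(synthetic, k)
    rng.shuffle(mixed)
    return mixed


rng = random.Random(0)
organic = [("organic", i) for i in range(20)]      # placeholder sessions
synthetic = [("synthetic", i) for i in range(20)]  # placeholder sessions

for level in contamination_levels():
    train = substitute(organic, synthetic, level, rng)
    n_synth = sum(1 for source, _ in train if source == "synthetic")
    print(f"level={level:.2f} size={len(train)} synthetic={n_synth}")
```

Holding the size fixed separates the effect of contamination from the effect of simply having more or less training data.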

Contribution 3: Capability Awareness Diffusion

| Script | What it does | Output |
| --- | --- | --- |
| `microdata/scripts/11_capability_diffusion.py` | Detects 47 capabilities across 3 risk levels; estimates R0 via attack-rate formula | `microdata/results/11_capability_diffusion.json` |
| `microdata/scripts/12_growth_rate_r0.py` | Fits exponential growth curves; validates attack-rate R0 via independent method | `microdata/results/12_growth_rate_r0.json` |
| `microdata/scripts/13_permutation_null_model.py` | Shuffles timestamps 1,000x to test spreading vs. independent adoption | `microdata/results/13_permutation_null_model.json` |
| `scripts/temporal_holdout_r0.py` | Splits observation window at midpoint; confirms R0 > 1 in both halves | `results/temporal_holdout_r0.json` |
| `scripts/sis_epidemiological_model.py` | Full SIS model: awareness propagation, generation intervals, counterfactual analysis | `results/sis_epidemiological_analysis.json` |

Findings docs: EPIDEMIOLOGICAL_FINDINGS.md, PERMUTATION_TEST_FINDINGS.md
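
For orientation, the two R0 estimates mentioned above can be sketched with common textbook forms. This README does not state the exact formulas the scripts use, so the following are assumptions: the attack-rate estimate uses the final-size relation R0 = -ln(1 - A) / A for a fully susceptible population, and the growth-rate estimate uses the linear approximation R0 = 1 + r * T, where r is the fitted exponential growth rate and T is the mean generation interval. All function names are hypothetical.

```python
import math


def r0_from_attack_rate(attack_rate):
    """Final-size relation, assuming everyone starts susceptible."""
    if not 0.0 < attack_rate < 1.0:
        raise ValueError("attack_rate must be strictly between 0 and 1")
    return -math.log(1.0 - attack_rate) / attack_rate


def fit_growth_rate(times, counts):
    """Least-squares slope of ln(count) against time (counts must be > 0)."""
    logs = [math.log(c) for c in counts]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def r0_from_growth_rate(growth_rate, mean_generation_interval):
    """Linear approximation; assumes an exponentially distributed interval."""
    return 1.0 + growth_rate * mean_generation_interval


# Toy inputs (made up): half of all agents eventually show awareness,
# and cumulative adopters grow roughly exponentially early on.
print(r0_from_attack_rate(0.5))
times = list(range(10))
counts = [math.exp(0.3 * t) for t in times]
r = fit_growth_rate(times, counts)
print(r0_from_growth_rate(r, mean_generation_interval=2.0))
```

Agreement between the two numbers is informative because they rely on different data: the first uses only the final proportion of adopters, the second only the early growth curve.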

Paper Figures

| Script | Output |
| --- | --- |
| `scripts/fig_panels_modern.py` | `figures/panel_a_modern.png`, `panel_b_modern.png`, `panel_c_modern.png`, `panel_d.png` |
| `scripts/generate_sis_figures.py` | `figures/sis_*.{pdf,png}` |

Extended Analyses

These scripts provide supplementary analysis referenced in the paper or conducted as part of the broader investigation. Detailed write-ups are in FINDINGS.md.

Dataset Characterization

| Script | Description | Section in FINDINGS.md |
| --- | --- | --- |
| `scripts/01_naming_patterns.py` | Bot farm naming pattern detection (`clawd*`, `agent*`, etc.) | Section 2 |
| `scripts/02_content_diversity.py` | Post title/content analysis, duplicate detection | Section 3 |
| `scripts/03_coordination_detection.py` | Sybil attack and coordinated bot cluster detection | Section 7 |
| `scripts/04_authenticity_score.py` | Multi-signal authenticity scoring (naming + content + community) | Section 2 |

Security & Privacy

| Script | Description | Section in FINDINGS.md |
| --- | --- | --- |
| `scripts/security_focused_analysis.py` | Privacy disclosure (30.5%), prompt injection (3.5%), influence susceptibility (4.6:1 ratio), Sybil candidates, fine-tuning quality | Sections 4, 7, 8 |
| `scripts/nlp_analysis_all_rqs.py` | NLP-based analysis across all research questions | -- |

Comparative Analysis

| Script | Description | Section in FINDINGS.md |
| --- | --- | --- |
| `scripts/deep_comparative_analysis.py` | Agent vs. human baselines: 90-9-1 rule deviation, power law, LLM model attribution, cross-community rates | Section 9 |
| `scripts/agentic_behavior_analysis.py` | Behavioral patterns (RQ1 to RQ6): information flow, authenticity, community dynamics, benchmark tiering | Sections 3, 5, 8 |

Governance Schema (Supplementary)

The governance microdata schema was developed as part of the research, but the corresponding paper section was removed for space. The analysis scripts remain available:

| Script | Description |
| --- | --- |
| `microdata/scripts/01_schema_definition.py` | 14-field governance schema definition |
| `microdata/scripts/03_coordinated_activity.py` | Coordination detection on microdata |
| `microdata/scripts/07_necessity_proofs.py` | Field necessity proofs (5 necessary, 2 useful) |
| `microdata/scripts/09_compression_analysis.py` | Privacy-preserving compression (80% privacy gain) |

Findings doc: microdata/MICRODATA_FINDINGS.md

Figure Generation (Extended)

| Script | Output |
| --- | --- |
| `scripts/generate_figures.py` | Main overview figures (dataset, participation, etc.) |
| `scripts/generate_security_figures.py` | Security analysis figures (7 figures) |
| `scripts/generate_comparative_figures.py` | Agent vs. human comparison (4 figures) |
| `scripts/generate_agentic_figures.py` | Agentic behavior figures (6 figures) |

Results Directory

All JSON results are in results/. Key files:

| File | Contents |
| --- | --- |
| `click_model_degradation.json` | PBM contamination curves (21 levels) |
| `sis_epidemiological_analysis.json` | SIS model parameters and counterfactuals |
| `temporal_holdout_r0.json` | R0 stability across time halves |
| `tautology_validation_results.json` | Predictive validation (8 tests) |
| `evaluation_summary.json` | Dataset overview statistics |
| `agentic_behavior_analysis.json` | RQ1 to RQ6 behavioral analysis |
| `deep_comparative_analysis.json` | Agent-human comparison metrics |
| `rq*_security.json` | Security analysis per research question |
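
To get a quick overview of what each results file contains, a small loader along these lines may help. It is a sketch only: the helper names are hypothetical, the path to `results/` depends on where you run it from, and no particular key layout inside the JSON files is assumed.

```python
import json
from pathlib import Path


def load_results(results_dir="results"):
    """Parse every *.json file in results_dir into a {filename: data} dict."""
    loaded = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            loaded[path.name] = json.load(fh)
    return loaded


def top_level_summary(loaded):
    """Map each file to its sorted top-level keys, or its type if not a dict."""
    summary = {}
    for name, payload in loaded.items():
        if isinstance(payload, dict):
            summary[name] = sorted(payload)
        else:
            summary[name] = type(payload).__name__
    return summary


if __name__ == "__main__":
    # Adjust the path if you are not running from the eval/ directory.
    for name, keys in top_level_summary(load_results("results")).items():
        print(name, "->", keys)
```

Listing top-level keys first avoids guessing at field names before opening individual files.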

Figures Directory

All figures are generated as both PDF and PNG. Paper figures use the panel_*.png naming convention. Extended figures use descriptive names (fig1_dataset_overview.png, click_model_contamination_curve.png, etc.).


Archived Content

The following analyses were moved to tmp/eval/ as they are not referenced in the paper:

  • Information flow control (tmp/eval/infoflow/): Formal non-interference guarantees, leakage bounds, sanitization as IR objective. Theoretical analysis exploring compartment-based isolation for agent platforms.
  • Draft v2 figures (tmp/eval/figures_draft_v2/): Earlier figure iterations superseded by current versions.