Evaluation & Analysis
This directory contains all analysis scripts, results, and figures for the Moltbook Traces project. The content is organized into two tiers: scripts that directly reproduce paper results, and extended analyses conducted as supplementary work.
Paper Reproduction Scripts
These scripts reproduce the three contributions reported in the paper. Run them from the repository root.
Contribution 1: Agent Attribution Problem
| Script | What it does | Output |
|---|---|---|
| identifiability/scripts/02_independent_classification.py | Classifies agents by 5 external validation signals; computes behavioral differences (one-shot ratio, entropy, burstiness, style consistency) between high/low validation groups | identifiability/results/02_independent_classification.json |
| scripts/tautology_validation_experiment.py | Predictive validation: tests whether ranking predicts independent outcomes (upvotes, discussion depth, cross-community activity) | results/tautology_validation_results.json |
| scripts/tautology_extended_analysis.py | Extended validation: quintile gradient, effect sizes, held-out generalization, bootstrap CIs | results/tautology_extended_validation.json |
Findings doc: identifiability/IDENTIFIABILITY_FINDINGS.md
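Two of the behavioral measures named above, entropy and burstiness, can be sketched as follows. This is a minimal illustration, not the code in 02_independent_classification.py: it assumes entropy means Shannon entropy over categorical labels (for example, the communities an agent posts in) and that burstiness means the common coefficient B = (sigma - mu) / (sigma + mu) over inter-event gaps. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits of a sequence of categorical labels.

    Assumption: the script may use a different base or normalization.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def burstiness(timestamps):
    """Burstiness B = (sigma - mu) / (sigma + mu) of inter-event gaps.

    B is -1 for perfectly regular activity, near 0 for Poisson-like
    activity, and approaches 1 for highly bursty activity.
    Assumption: this definition is illustrative, not taken from the script.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return 0.0  # too few events to estimate a spread
    mu = sum(gaps) / len(gaps)
    sigma = math.sqrt(sum((g - mu) ** 2 for g in gaps) / len(gaps))
    if sigma + mu == 0:
        return 0.0  # all events share one timestamp
    return (sigma - mu) / (sigma + mu)
```

An agent that posts at perfectly even intervals scores -1 under this definition, which is one way regular, scheduled activity could separate from irregular activity.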
Contribution 2: Click-Model Degradation
| Script | What it does | Output |
|---|---|---|
| scripts/attack6_click_model_degradation.py | Trains PBM on upvote patterns; measures AUC/LL degradation at 21 contamination levels (0--100%) using constant-size substitution | results/click_model_degradation.json, figures/click_model_*.{pdf,png} |
Findings doc: CLICK_MODEL_FINDINGS.md
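Constant-size substitution can be sketched as below: a fraction of the clean sample is swapped for contaminated records so the training set size stays fixed across levels. This is a hedged illustration rather than the code in attack6_click_model_degradation.py. It assumes the 21 levels are evenly spaced in 5% steps, and the names `contamination_levels` and `substitute` are hypothetical.

```python
import random


def contamination_levels(n_levels=21):
    """Evenly spaced fractions from 0.0 to 1.0 (assumed spacing: 5% steps)."""
    return [i / (n_levels - 1) for i in range(n_levels)]


def substitute(clean, contaminant, fraction, rng):
    """Replace `fraction` of `clean` with records drawn from `contaminant`.

    The returned list always has len(clean) items, so any change in model
    quality reflects the mix of records rather than the amount of data.
    """
    n = len(clean)
    k = round(fraction * n)  # number of records to swap in
    if k > len(contaminant):
        raise ValueError("not enough contaminant records for this level")
    kept = rng.sample(clean, n - k)
    injected = rng.sample(contaminant, k)
    mixed = kept + injected
    rng.shuffle(mixed)
    return mixed
```

Holding the size constant matters because simply appending contaminated records would confound contamination with a larger training set.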
Contribution 3: Capability Awareness Diffusion
| Script | What it does | Output |
|---|---|---|
| microdata/scripts/11_capability_diffusion.py | Detects 47 capabilities across 3 risk levels; estimates R0 via attack-rate formula | microdata/results/11_capability_diffusion.json |
| microdata/scripts/12_growth_rate_r0.py | Fits exponential growth curves; validates attack-rate R0 via independent method | microdata/results/12_growth_rate_r0.json |
| microdata/scripts/13_permutation_null_model.py | Shuffles timestamps 1,000x to test spreading vs. independent adoption | microdata/results/13_permutation_null_model.json |
| scripts/temporal_holdout_r0.py | Splits observation window at midpoint; confirms R0 > 1 in both halves | results/temporal_holdout_r0.json |
| scripts/sis_epidemiological_model.py | Full SIS model: awareness propagation, generation intervals, counterfactual analysis | results/sis_epidemiological_analysis.json |
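The two R0 estimates in this table can be illustrated with standard textbook relations. The sketch below is an assumption about what the scripts compute, not a copy of them: it takes "attack-rate formula" to mean the final-size relation R0 = -ln(1 - A) / A, and it converts a fitted exponential growth rate r to R0 via R0 = 1 + r * Tg, which holds only for an exponentially distributed generation interval with mean Tg. All function names are hypothetical.

```python
import math


def r0_from_attack_rate(attack_rate):
    """Final-size relation: R0 = -ln(1 - A) / A, for 0 < A < 1.

    Assumption: the scripts may use a different attack-rate formula.
    """
    if not 0.0 < attack_rate < 1.0:
        raise ValueError("attack_rate must be strictly between 0 and 1")
    return -math.log(1.0 - attack_rate) / attack_rate


def fit_growth_rate(times, counts):
    """Least-squares slope of log(count) against time (exponential fit)."""
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    logs = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return sxy / sxx


def r0_from_growth_rate(growth_rate, mean_generation_interval):
    """R0 = 1 + r * Tg (assumes an exponential generation interval)."""
    return 1.0 + growth_rate * mean_generation_interval
```

Agreement between the two estimates is informative because they draw on different aspects of the data: the first uses only the final fraction of adopters, the second only the early growth curve.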
Findings docs: EPIDEMIOLOGICAL_FINDINGS.md, PERMUTATION_TEST_FINDINGS.md
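A timestamp-shuffling null model of the kind used by 13_permutation_null_model.py can be sketched as follows. The test statistic here is an assumption chosen for illustration (the number of adopters with at least one contact who adopted strictly earlier); the script's actual statistic and data layout are not described in this README. The `contacts` mapping and both function names are hypothetical.

```python
import random


def exposed_adoptions(adoption_time, contacts):
    """Count adopters with at least one contact who adopted strictly earlier."""
    count = 0
    for agent, t in adoption_time.items():
        neighbours = contacts.get(agent, ())
        if any(adoption_time.get(n, float("inf")) < t for n in neighbours):
            count += 1
    return count


def permutation_p_value(adoption_time, contacts, n_permutations=1000, seed=0):
    """One-sided permutation test against independent adoption.

    Adoption times are shuffled across agents, which keeps the set of
    times and the contact structure but breaks any link between them.
    """
    rng = random.Random(seed)
    observed = exposed_adoptions(adoption_time, contacts)
    agents = list(adoption_time)
    times = [adoption_time[a] for a in agents]  # copy; input is not mutated
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(times)
        shuffled = dict(zip(agents, times))
        if exposed_adoptions(shuffled, contacts) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value under this sketch means the observed ordering of adoptions along contacts is rarely reproduced by chance, which is the pattern expected from spreading rather than independent adoption.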
Paper Figures
| Script | Output |
|---|---|
| scripts/fig_panels_modern.py | figures/panel_a_modern.png, panel_b_modern.png, panel_c_modern.png, panel_d.png |
| scripts/generate_sis_figures.py | figures/sis_*.{pdf,png} |
Extended Analyses
These scripts provide supplementary analysis referenced in the paper or conducted as part of the broader investigation. Detailed write-ups are in FINDINGS.md.
Dataset Characterization
| Script | Description | Section in FINDINGS.md |
|---|---|---|
| scripts/01_naming_patterns.py | Bot farm naming pattern detection (clawd*, agent*, etc.) | Section 2 |
| scripts/02_content_diversity.py | Post title/content analysis, duplicate detection | Section 3 |
| scripts/03_coordination_detection.py | Sybil attack and coordinated bot cluster detection | Section 7 |
| scripts/04_authenticity_score.py | Multi-signal authenticity scoring (naming + content + community) | Section 2 |
Security & Privacy
| Script | Description | Section in FINDINGS.md |
|---|---|---|
| scripts/security_focused_analysis.py | Privacy disclosure (30.5%), prompt injection (3.5%), influence susceptibility (4.6:1 ratio), Sybil candidates, fine-tuning quality | Sections 4, 7, 8 |
| scripts/nlp_analysis_all_rqs.py | NLP-based analysis across all research questions | -- |
Comparative Analysis
| Script | Description | Section in FINDINGS.md |
|---|---|---|
| scripts/deep_comparative_analysis.py | Agent vs. human baselines: 90-9-1 rule deviation, power law, LLM model attribution, cross-community rates | Section 9 |
| scripts/agentic_behavior_analysis.py | Behavioral patterns (RQ1--RQ6): information flow, authenticity, community dynamics, benchmark tiering | Sections 3, 5, 8 |
Governance Schema (Supplementary)
The governance microdata schema was developed as part of the research, but the corresponding paper section was removed for space. The analysis remains available:
| Script | Description |
|---|---|
| microdata/scripts/01_schema_definition.py | 14-field governance schema definition |
| microdata/scripts/03_coordinated_activity.py | Coordination detection on microdata |
| microdata/scripts/07_necessity_proofs.py | Field necessity proofs (5 necessary, 2 useful) |
| microdata/scripts/09_compression_analysis.py | Privacy-preserving compression (80% privacy gain) |
Findings doc: microdata/MICRODATA_FINDINGS.md
Figure Generation (Extended)
| Script | Output |
|---|---|
| scripts/generate_figures.py | Main overview figures (dataset, participation, etc.) |
| scripts/generate_security_figures.py | Security analysis figures (7 figures) |
| scripts/generate_comparative_figures.py | Agent vs. human comparison (4 figures) |
| scripts/generate_agentic_figures.py | Agentic behavior figures (6 figures) |
Results Directory
All JSON results are in results/. Key files:
| File | Contents |
|---|---|
| click_model_degradation.json | PBM contamination curves (21 levels) |
| sis_epidemiological_analysis.json | SIS model parameters and counterfactuals |
| temporal_holdout_r0.json | R0 stability across time halves |
| tautology_validation_results.json | Predictive validation (8 tests) |
| evaluation_summary.json | Dataset overview statistics |
| agentic_behavior_analysis.json | RQ1--RQ6 behavioral analysis |
| deep_comparative_analysis.json | Agent-human comparison metrics |
| rq*_security.json | Security analysis per research question |
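To get a quick overview of what each results file contains, a small helper along these lines may be useful. It is a sketch that assumes nothing about the JSON schemas, which are not documented here: it only reports the top-level keys of each file, or the value type when the top level is not an object. The function name `summarize_results` is hypothetical.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each JSON file name in results_dir to its sorted top-level keys.

    Files whose top level is not a JSON object are reported by type name.
    """
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):
            summary[path.name] = sorted(data)
        else:
            summary[path.name] = type(data).__name__
    return summary


if __name__ == "__main__":
    # Run from the directory that contains results/.
    for name, keys in summarize_results().items():
        print(name, keys)
```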
Figures Directory
All figures are generated as both PDF and PNG. Paper figures use the panel_*.png naming convention. Extended figures use descriptive names (fig1_dataset_overview.png, click_model_contamination_curve.png, etc.).
Archived Content
The following analyses were moved to tmp/eval/ as they are not referenced in the paper:
- Information flow control (tmp/eval/infoflow/): Formal non-interference guarantees, leakage bounds, sanitization as IR objective. Theoretical analysis exploring compartment-based isolation for agent platforms.
- Draft v2 figures (tmp/eval/figures_draft_v2/): Earlier figure iterations superseded by current versions.