# Evaluation & Analysis

This directory contains all analysis scripts, results, and figures for the Moltbook Traces project. The content is organized into two tiers: scripts that directly reproduce paper results, and extended analyses conducted as supplementary work.

## Paper Reproduction Scripts

These scripts reproduce the three contributions reported in the paper. Run them from the repository root.

### Contribution 1: Agent Attribution Problem

| Script | What it does | Output |
|--------|-------------|--------|
| `identifiability/scripts/02_independent_classification.py` | Classifies agents by 5 external validation signals; computes behavioral differences (one-shot ratio, entropy, burstiness, style consistency) between high/low validation groups | `identifiability/results/02_independent_classification.json` |
| `scripts/tautology_validation_experiment.py` | Predictive validation: tests whether ranking predicts independent outcomes (upvotes, discussion depth, cross-community activity) | `results/tautology_validation_results.json` |
| `scripts/tautology_extended_analysis.py` | Extended validation: quintile gradient, effect sizes, held-out generalization, bootstrap CIs | `results/tautology_extended_validation.json` |

**Findings doc**: [identifiability/IDENTIFIABILITY_FINDINGS.md](identifiability/IDENTIFIABILITY_FINDINGS.md)

### Contribution 2: Click-Model Degradation

| Script | What it does | Output |
|--------|-------------|--------|
| `scripts/attack6_click_model_degradation.py` | Trains PBM on upvote patterns; measures AUC/LL degradation at 21 contamination levels (0--100%) using constant-size substitution | `results/click_model_degradation.json`, `figures/click_model_*.{pdf,png}` |

**Findings doc**: [CLICK_MODEL_FINDINGS.md](CLICK_MODEL_FINDINGS.md)

### Contribution 3: Capability Awareness Diffusion

| Script | What it does | Output |
|--------|-------------|--------|
| `microdata/scripts/11_capability_diffusion.py` | Detects 47 capabilities across 3 risk levels; estimates R0 via attack-rate formula | `microdata/results/11_capability_diffusion.json` |
| `microdata/scripts/12_growth_rate_r0.py` | Fits exponential growth curves; validates attack-rate R0 via independent method | `microdata/results/12_growth_rate_r0.json` |
| `microdata/scripts/13_permutation_null_model.py` | Shuffles timestamps 1,000x to test spreading vs. independent adoption | `microdata/results/13_permutation_null_model.json` |
| `scripts/temporal_holdout_r0.py` | Splits observation window at midpoint; confirms R0 > 1 in both halves | `results/temporal_holdout_r0.json` |
| `scripts/sis_epidemiological_model.py` | Full SIS model: awareness propagation, generation intervals, counterfactual analysis | `results/sis_epidemiological_analysis.json` |

**Findings docs**: [EPIDEMIOLOGICAL_FINDINGS.md](EPIDEMIOLOGICAL_FINDINGS.md), [PERMUTATION_TEST_FINDINGS.md](PERMUTATION_TEST_FINDINGS.md)

### Paper Figures

| Script | Output |
|--------|--------|
| `scripts/fig_panels_modern.py` | `figures/panel_a_modern.png`, `panel_b_modern.png`, `panel_c_modern.png`, `panel_d.png` |
| `scripts/generate_sis_figures.py` | `figures/sis_*.{pdf,png}` |

---

## Extended Analyses

These scripts provide supplementary analysis referenced in the paper or conducted as part of the broader investigation. Detailed write-ups are in [FINDINGS.md](FINDINGS.md).

### Dataset Characterization

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/01_naming_patterns.py` | Bot farm naming pattern detection (clawd*, agent*, etc.) | Section 2 |
| `scripts/02_content_diversity.py` | Post title/content analysis, duplicate detection | Section 3 |
| `scripts/03_coordination_detection.py` | Sybil attack and coordinated bot cluster detection | Section 7 |
| `scripts/04_authenticity_score.py` | Multi-signal authenticity scoring (naming + content + community) | Section 2 |

### Security & Privacy

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/security_focused_analysis.py` | Privacy disclosure (30.5%), prompt injection (3.5%), influence susceptibility (4.6:1 ratio), Sybil candidates, fine-tuning quality | Sections 4, 7, 8 |
| `scripts/nlp_analysis_all_rqs.py` | NLP-based analysis across all research questions | -- |

### Comparative Analysis

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/deep_comparative_analysis.py` | Agent vs. human baselines: 90-9-1 rule deviation, power law, LLM model attribution, cross-community rates | Section 9 |
| `scripts/agentic_behavior_analysis.py` | Behavioral patterns (RQ1--RQ6): information flow, authenticity, community dynamics, benchmark tiering | Sections 3, 5, 8 |

### Governance Schema (Supplementary)

The governance microdata schema was developed as part of the research but the corresponding paper section was removed for space.
The analysis remains available:

| Script | Description |
|--------|-------------|
| `microdata/scripts/01_schema_definition.py` | 14-field governance schema definition |
| `microdata/scripts/03_coordinated_activity.py` | Coordination detection on microdata |
| `microdata/scripts/07_necessity_proofs.py` | Field necessity proofs (5 necessary, 2 useful) |
| `microdata/scripts/09_compression_analysis.py` | Privacy-preserving compression (80% privacy gain) |

**Findings doc**: [microdata/MICRODATA_FINDINGS.md](microdata/MICRODATA_FINDINGS.md)

### Figure Generation (Extended)

| Script | Output |
|--------|--------|
| `scripts/generate_figures.py` | Main overview figures (dataset, participation, etc.) |
| `scripts/generate_security_figures.py` | Security analysis figures (7 figures) |
| `scripts/generate_comparative_figures.py` | Agent vs. human comparison (4 figures) |
| `scripts/generate_agentic_figures.py` | Agentic behavior figures (6 figures) |

---

## Results Directory

All JSON results are in `results/`. Key files:

| File | Contents |
|------|----------|
| `click_model_degradation.json` | PBM contamination curves (21 levels) |
| `sis_epidemiological_analysis.json` | SIS model parameters and counterfactuals |
| `temporal_holdout_r0.json` | R0 stability across time halves |
| `tautology_validation_results.json` | Predictive validation (8 tests) |
| `evaluation_summary.json` | Dataset overview statistics |
| `agentic_behavior_analysis.json` | RQ1--RQ6 behavioral analysis |
| `deep_comparative_analysis.json` | Agent-human comparison metrics |
| `rq*_security.json` | Security analysis per research question |

## Figures Directory

All figures are generated as both PDF and PNG. Paper figures use the `panel_*.png` naming convention. Extended figures use descriptive names (`fig1_dataset_overview.png`, `click_model_contamination_curve.png`, etc.).

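After a reproduction run, it can help to confirm that the key result files listed under Results Directory exist and parse as JSON, and that each figure is present in both formats. The following is a minimal sketch, not part of the repository: the helper names `check_results` and `unpaired_figures` are hypothetical, and the `results` and `figures` paths in the usage block assume the script is started from the directory that contains them. It makes no assumption about the JSON schema of any file.

```python
import json
from pathlib import Path

# File names copied from the "Key files" table. The wildcard entry
# (rq*_security.json) is left out because its expansion is not listed.
KEY_RESULTS = [
    "click_model_degradation.json",
    "sis_epidemiological_analysis.json",
    "temporal_holdout_r0.json",
    "tautology_validation_results.json",
    "evaluation_summary.json",
    "agentic_behavior_analysis.json",
    "deep_comparative_analysis.json",
]


def check_results(results_dir):
    """Map each key file name to 'ok', 'missing', or 'invalid JSON'."""
    status = {}
    for name in KEY_RESULTS:
        path = Path(results_dir) / name
        if not path.is_file():
            status[name] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
            status[name] = "ok"
        except json.JSONDecodeError:
            status[name] = "invalid JSON"
    return status


def unpaired_figures(figures_dir):
    """Return sorted stems of figures found as only one of PDF or PNG."""
    figures_dir = Path(figures_dir)
    pdf = {p.stem for p in figures_dir.glob("*.pdf")}
    png = {p.stem for p in figures_dir.glob("*.png")}
    return sorted(pdf ^ png)  # symmetric difference: in one set only


if __name__ == "__main__":
    # Assumed relative paths; adjust to where this directory lives.
    for name, state in check_results("results").items():
        print(f"{state:>12}  {name}")
    print("unpaired figures:", unpaired_figures("figures"))
```

A stem reported by `unpaired_figures` only indicates that one of the two formats was not found in that directory; whether that matters depends on which figure scripts were run.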

---

## Archived Content

The following analyses were moved to `tmp/eval/` as they are not referenced in the paper:

- **Information flow control** (`tmp/eval/infoflow/`): Formal non-interference guarantees, leakage bounds, sanitization as IR objective. Theoretical analysis exploring compartment-based isolation for agent platforms.
- **Draft v2 figures** (`tmp/eval/figures_draft_v2/`): Earlier figure iterations superseded by current versions.