# Evaluation & Analysis

This directory contains all analysis scripts, results, and figures for the Moltbook Traces project. The content is organized into two tiers: scripts that directly reproduce the paper's results, and extended analyses conducted as supplementary work.
## Paper Reproduction Scripts

These scripts reproduce the three contributions reported in the paper. Run them from the repository root.

### Contribution 1: Agent Attribution Problem

| Script | What it does | Output |
|--------|-------------|--------|
| `identifiability/scripts/02_independent_classification.py` | Classifies agents by 5 external validation signals; computes behavioral differences (one-shot ratio, entropy, burstiness, style consistency) between high- and low-validation groups | `identifiability/results/02_independent_classification.json` |
| `scripts/tautology_validation_experiment.py` | Predictive validation: tests whether the ranking predicts independent outcomes (upvotes, discussion depth, cross-community activity) | `results/tautology_validation_results.json` |
| `scripts/tautology_extended_analysis.py` | Extended validation: quintile gradient, effect sizes, held-out generalization, bootstrap CIs | `results/tautology_extended_validation.json` |

**Findings doc**: [identifiability/IDENTIFIABILITY_FINDINGS.md](identifiability/IDENTIFIABILITY_FINDINGS.md)
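Several of the behavioral signals in this table have standard textbook forms. The sketch below assumes three of them. Burstiness is taken as the Goh-Barabási coefficient B = (σ - μ)/(σ + μ) over an agent's inter-post gaps. Entropy is taken as Shannon entropy (in bits) over the communities an agent posts in. Bootstrap CIs are taken as percentile intervals on a difference in group means. These definitions and the function names are assumptions for illustration, not necessarily what `02_independent_classification.py` or `tautology_extended_analysis.py` compute.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def burstiness(timestamps):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu) of inter-event gaps.

    Assumed definition: -1 for perfectly periodic activity, about 0 for a
    Poisson process, approaching 1 for highly bursty activity.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return float("nan")  # too few gaps for a meaningful spread
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) > 0 else float("nan")


def shannon_entropy(labels):
    """Entropy in bits of categorical labels, e.g. the communities an agent posts in."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bootstrap_diff_ci(high, low, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(high) - mean(low)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(high, k=len(high))) - mean(rng.choices(low, k=len(low)))
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]
```

Perfectly regular posting gives B = -1 (for example `burstiness([0, 10, 20, 30])`). Poisson-like posting gives values near 0, and highly bursty activity pushes B toward 1.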
### Contribution 2: Click-Model Degradation

| Script | What it does | Output |
|--------|-------------|--------|
| `scripts/attack6_click_model_degradation.py` | Trains a position-based click model (PBM) on upvote patterns; measures AUC and log-likelihood (LL) degradation at 21 contamination levels (0-100%) using constant-size substitution | `results/click_model_degradation.json`, `figures/click_model_*.{pdf,png}` |

**Findings doc**: [CLICK_MODEL_FINDINGS.md](CLICK_MODEL_FINDINGS.md)
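In a position-based click model (PBM), a click at rank r on item d requires two things. The user must examine rank r (probability γ_r) and must find d attractive (probability α_d), so P(click) = γ_r · α_d. The sketch below shows that likelihood together with one reading of "constant-size substitution": a fraction of real records is *replaced* (not appended) with synthetic ones, so the training-set size stays fixed. It uses 21 evenly spaced levels. Parameter estimation (usually EM) is omitted. The even spacing, the dict-based parameters, and all function names are hypothetical, not taken from `attack6_click_model_degradation.py`.

```python
import math
import random

# Hypothetical grid: 21 evenly spaced contamination levels from 0% to 100%.
LEVELS = [i / 20 for i in range(21)]


def pbm_click_prob(gamma, alpha, doc, rank):
    """PBM: P(click) = P(examine rank) * P(attractive | doc)."""
    return gamma[rank] * alpha[doc]


def pbm_log_likelihood(observations, gamma, alpha, eps=1e-9):
    """Mean log-likelihood of (doc, rank, clicked) observations under a PBM."""
    total = 0.0
    for doc, rank, clicked in observations:
        # Clip to avoid log(0) for degenerate parameter values.
        p = min(max(pbm_click_prob(gamma, alpha, doc, rank), eps), 1 - eps)
        total += math.log(p) if clicked else math.log(1 - p)
    return total / len(observations)


def substitute(real, synthetic, fraction, seed=0):
    """Constant-size substitution (assumed reading): replace round(fraction * N)
    randomly chosen real records with synthetic ones, keeping N unchanged."""
    rng = random.Random(seed)
    n = len(real)
    replaced = set(rng.sample(range(n), round(fraction * n)))
    return [rng.choice(synthetic) if i in replaced else rec for i, rec in enumerate(real)]
```

Fitting the PBM on `substitute(real, synthetic, level)` for each value in `LEVELS` and scoring AUC and LL on clean held-out data would then trace a degradation curve. Holding N fixed keeps the contamination effect from being confounded with a change in training-set size.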
### Contribution 3: Capability Awareness Diffusion

| Script | What it does | Output |
|--------|-------------|--------|
| `microdata/scripts/11_capability_diffusion.py` | Detects 47 capabilities across 3 risk levels; estimates R0 via the attack-rate formula | `microdata/results/11_capability_diffusion.json` |
| `microdata/scripts/12_growth_rate_r0.py` | Fits exponential growth curves; validates the attack-rate R0 with an independent method | `microdata/results/12_growth_rate_r0.json` |
| `microdata/scripts/13_permutation_null_model.py` | Shuffles timestamps 1,000 times to test spreading vs. independent adoption | `microdata/results/13_permutation_null_model.json` |
| `scripts/temporal_holdout_r0.py` | Splits the observation window at its midpoint; confirms R0 > 1 in both halves | `results/temporal_holdout_r0.json` |
| `scripts/sis_epidemiological_model.py` | Full SIS model: awareness propagation, generation intervals, counterfactual analysis | `results/sis_epidemiological_analysis.json` |

**Findings docs**: [EPIDEMIOLOGICAL_FINDINGS.md](EPIDEMIOLOGICAL_FINDINGS.md), [PERMUTATION_TEST_FINDINGS.md](PERMUTATION_TEST_FINDINGS.md)
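Two common R0 estimators fit the descriptions in this table. The sketch assumes the "attack-rate formula" is the classic SIR final-size relation 1 - A = exp(-R0 · A), where A is the fraction of agents that eventually become aware of a capability. That is an SIR-style relation, even though the full model in `sis_epidemiological_model.py` is SIS. It also assumes an exponentially distributed generation interval with mean T, so that an exponential growth rate r gives R0 = 1 + rT. These are textbook formulas under stated assumptions, not the code in `11_capability_diffusion.py` or `12_growth_rate_r0.py`.

```python
import math


def r0_from_attack_rate(attack_rate):
    """Invert the (assumed) SIR final-size relation 1 - A = exp(-R0 * A) for R0."""
    if not 0 < attack_rate < 1:
        raise ValueError("attack rate must be strictly between 0 and 1")
    return -math.log(1 - attack_rate) / attack_rate


def growth_rate(times, counts):
    """Least-squares slope r of log(counts) vs. time; use exponential-phase data only."""
    pts = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx


def r0_from_growth(r, mean_generation_interval):
    """R0 = 1 + r * T, valid for an exponentially distributed generation interval."""
    return 1 + r * mean_generation_interval
```

Because -ln(1 - A) > A for every 0 < A < 1, the final-size inversion always returns a value above 1. On its own, it therefore cannot distinguish spreading from independent adoption, which is the question the permutation null model addresses. With a fixed rather than exponential generation interval, the growth-based estimate would instead be exp(rT).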
### Paper Figures

| Script | Output |
|--------|--------|
| `scripts/fig_panels_modern.py` | `figures/panel_a_modern.png`, `panel_b_modern.png`, `panel_c_modern.png`, `panel_d.png` |
| `scripts/generate_sis_figures.py` | `figures/sis_*.{pdf,png}` |

---

## Extended Analyses

These scripts provide supplementary analyses that are either referenced in the paper or were conducted as part of the broader investigation. Detailed write-ups are in [FINDINGS.md](FINDINGS.md).
### Dataset Characterization

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/01_naming_patterns.py` | Bot-farm naming pattern detection (`clawd*`, `agent*`, etc.) | Section 2 |
| `scripts/02_content_diversity.py` | Post title/content analysis and duplicate detection | Section 3 |
| `scripts/03_coordination_detection.py` | Sybil attack and coordinated bot cluster detection | Section 7 |
| `scripts/04_authenticity_score.py` | Multi-signal authenticity scoring (naming + content + community) | Section 2 |
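The naming and duplicate checks can be approximated with a regular expression and normalized string counts. This sketch is an illustration only. The prefix list contains just the two families named above (`clawd*`, `agent*`), and the optional `_`/`-` separator is an assumption. The normalization rule (lowercase, collapsed whitespace) is a hypothetical choice, not necessarily what `01_naming_patterns.py` or `02_content_diversity.py` uses.

```python
import re
from collections import Counter

# Only the two prefix families named in this README; a real list would be longer.
FARM_PREFIXES = ("clawd", "agent")
# Hypothetical pattern: prefix, optional "_" or "-", then word characters only.
FARM_RE = re.compile(
    r"^(?P<prefix>" + "|".join(map(re.escape, FARM_PREFIXES)) + r")[_-]?\w*$",
    re.IGNORECASE,
)


def group_by_prefix(usernames):
    """Bucket usernames by farm-like prefix; names that do not match are skipped."""
    groups = {}
    for name in usernames:
        match = FARM_RE.match(name)
        if match:
            groups.setdefault(match.group("prefix").lower(), []).append(name)
    return groups


def duplicate_titles(titles):
    """Normalized titles (lowercased, whitespace collapsed) that occur more than once."""
    counts = Counter(" ".join(title.lower().split()) for title in titles)
    return {title: n for title, n in counts.items() if n > 1}
```

A prefix match alone is a weak signal, since legitimate names can also start with `agent`. That is presumably why `04_authenticity_score.py` combines naming with content and community signals.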
### Security & Privacy

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/security_focused_analysis.py` | Privacy disclosure (30.5%), prompt injection (3.5%), influence susceptibility (4.6:1 ratio), Sybil candidates, fine-tuning quality | Sections 4, 7, 8 |
| `scripts/nlp_analysis_all_rqs.py` | NLP-based analysis across all research questions | -- |
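Rates such as the 3.5% prompt-injection figure are fractions of posts flagged by some detector. A minimal keyword-based sketch follows. The patterns are hypothetical placeholders, and `security_focused_analysis.py` may well use a different detector (for example, a classifier or an LLM judge).

```python
import re

# Hypothetical injection indicator phrases, for illustration only.
INJECTION_RE = re.compile(
    r"ignore (all )?(previous|prior) instructions"
    r"|disregard (your|the) system prompt"
    r"|you are now in developer mode",
    re.IGNORECASE,
)
# Hypothetical privacy signal: anything that looks like an email address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def flag_rate(posts, pattern):
    """Fraction of posts with at least one match (0.035 would be reported as 3.5%)."""
    if not posts:
        return 0.0
    return sum(1 for post in posts if pattern.search(post)) / len(posts)
```

Keyword matching is transparent but crude. It misses paraphrased injections and can fire on posts that merely discuss them, so a hand-labelled sample is a useful check on any rate it produces.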
### Comparative Analysis

| Script | Description | Section in FINDINGS.md |
|--------|-------------|------------------------|
| `scripts/deep_comparative_analysis.py` | Agent vs. human baselines: 90-9-1 rule deviation, power-law analysis, LLM attribution, cross-community rates | Section 9 |
| `scripts/agentic_behavior_analysis.py` | Behavioral patterns (RQ1-RQ6): information flow, authenticity, community dynamics, benchmark tiering | Sections 3, 5, 8 |
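For the agent vs. human baselines, the sketch assumes two standard quantities. The first is the share of activity contributed by the top 1%, next 9%, and remaining 90% of accounts (the 90-9-1 heuristic). The second is the maximum-likelihood exponent of a continuous power-law tail, alpha = 1 + n / Σ ln(x_i / x_min). Choosing x_min (for example, by minimizing a KS distance) is left out, and these function names are hypothetical rather than taken from `deep_comparative_analysis.py`.

```python
import math


def participation_shares(activity_counts):
    """Activity share of the top 1%, next 9%, and bottom 90% of accounts."""
    counts = sorted(activity_counts, reverse=True)
    n, total = len(counts), sum(counts)
    top1 = max(1, round(0.01 * n))       # at least one account in the top bucket
    top10 = max(top1, round(0.10 * n))
    return (
        sum(counts[:top1]) / total,
        sum(counts[top1:top10]) / total,
        sum(counts[top10:]) / total,
    )


def powerlaw_alpha(values, xmin):
    """Continuous power-law MLE over the tail x >= xmin."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need tail values strictly above xmin")
    return 1 + len(tail) / log_sum
```

A perfectly uniform population such as `[1] * 100` yields shares of (0.01, 0.09, 0.90). A community following 90-9-1 would instead concentrate most activity in the first two shares, so the gap from the uniform case is one way to express "deviation".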
### Governance Schema (Supplementary)

The governance microdata schema was developed as part of the research, but the corresponding paper section was removed for space. The analysis remains available:

| Script | Description |
|--------|-------------|
| `microdata/scripts/01_schema_definition.py` | 14-field governance schema definition |
| `microdata/scripts/03_coordinated_activity.py` | Coordination detection on microdata |
| `microdata/scripts/07_necessity_proofs.py` | Field necessity proofs (5 necessary, 2 useful) |
| `microdata/scripts/09_compression_analysis.py` | Privacy-preserving compression (80% privacy gain) |

**Findings doc**: [microdata/MICRODATA_FINDINGS.md](microdata/MICRODATA_FINDINGS.md)
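Coordination detection on the microdata is described only at a high level here. One common heuristic, assumed for this sketch, flags pairs of agents that repeatedly act on the same target (a post, thread, or community) within a short time window. The 60-second window, the threshold of three co-occurrences, and the simplified `(agent, target, timestamp)` event layout are hypothetical. They are not the 14-field schema or the logic of `03_coordinated_activity.py`.

```python
from collections import Counter


def co_activity_pairs(events, window=60.0, min_count=3):
    """Count agent pairs acting on the same target within `window` seconds.

    events: iterable of (agent_id, target_id, timestamp) tuples (hypothetical layout).
    Returns {(agent_a, agent_b): count} for pairs seen at least `min_count` times.
    """
    by_target = {}
    for agent, target, ts in events:
        by_target.setdefault(target, []).append((ts, agent))
    pairs = Counter()
    for acts in by_target.values():
        acts.sort()  # chronological order per target
        for i, (t1, a1) in enumerate(acts):
            for t2, a2 in acts[i + 1:]:
                if t2 - t1 > window:
                    break  # later events are even further away
                if a1 != a2:
                    pairs[tuple(sorted((a1, a2)))] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```

Sorting each target's events lets the inner loop stop as soon as the gap exceeds the window, so the cost stays close to linear when activity on a target is spread out.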
### Figure Generation (Extended)

| Script | Output |
|--------|--------|
| `scripts/generate_figures.py` | Main overview figures (dataset, participation, etc.) |
| `scripts/generate_security_figures.py` | Security analysis figures (7 figures) |
| `scripts/generate_comparative_figures.py` | Agent vs. human comparison figures (4 figures) |
| `scripts/generate_agentic_figures.py` | Agentic behavior figures (6 figures) |

---

## Results Directory

Top-level JSON results are in `results/`; the `identifiability/` and `microdata/` analyses write to their own `results/` subdirectories. Key files in `results/`:

| File | Contents |
|------|----------|
| `click_model_degradation.json` | PBM contamination curves (21 levels) |
| `sis_epidemiological_analysis.json` | SIS model parameters and counterfactuals |
| `temporal_holdout_r0.json` | R0 stability across time halves |
| `tautology_validation_results.json` | Predictive validation (8 tests) |
| `evaluation_summary.json` | Dataset overview statistics |
| `agentic_behavior_analysis.json` | RQ1-RQ6 behavioral analysis |
| `deep_comparative_analysis.json` | Agent-human comparison metrics |
| `rq*_security.json` | Security analysis per research question |
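Each script defines its own JSON layout, so one schema-agnostic way to get oriented is to list every file's top-level keys. A minimal sketch, assuming the files are UTF-8 JSON and that `results_dir` points at one of the `results/` folders (the function name is hypothetical):

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each *.json file to its sorted top-level keys, or to its type name if not a dict."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        summary[path.name] = sorted(data) if isinstance(data, dict) else type(data).__name__
    return summary
```

Calling `summarize_results("results")` before opening individual files shows which keys each analysis exposes without assuming any particular schema.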
## Figures Directory

All figures are generated as both PDF and PNG. Paper figures use the `panel_*.png` naming convention; extended figures use descriptive names (`fig1_dataset_overview.png`, `click_model_contamination_curve.png`, etc.).

---

## Archived Content

The following analyses were moved to `tmp/eval/` because they are not referenced in the paper:

- **Information flow control** (`tmp/eval/infoflow/`): Formal non-interference guarantees, leakage bounds, and sanitization as an IR objective. This theoretical analysis explores compartment-based isolation for agent platforms.
- **Draft v2 figures** (`tmp/eval/figures_draft_v2/`): Earlier figure iterations superseded by the current versions.