
Community-Level Heterogeneity in Click Model Contamination

Analysis Date: 2026-02-13
Extends: Attack 6 (Click Model Degradation)
Script: eval/scripts/attack6b_cold_start_inversion.py
Results: eval/results/click_model_cold_start.json
Dataset: 286,217 posts, 767 communities, 143 communities with stable per-community AUC


Executive Summary

Attack 6 established that contaminating a click model's training set with low-validation agent data degrades prediction monotonically at the aggregate level (8.5% AUC drop at 50% contamination). This follow-up decomposes that aggregate into per-community effects and finds a Simpson's paradox: the monotonic aggregate curve masks the fact that 45% of communities show improved prediction under contamination. The effect is concentrated in mid-density ("warm") communities, where a small amount of contamination acts as implicit regularization, producing a non-monotonic response with AUC peaking at 10% contamination before declining.

Key Headline Findings

| Finding | Value | Significance |
|---|---|---|
| Communities where contamination improved AUC | 64 / 143 (44.8%) | Nearly half improve under mixed training |
| Warm tercile mean ΔAUC | +0.0067 | Only tercile with positive mean effect |
| Warm curve peak | 10% contamination (+2.0% AUC) | Non-monotonic: improvement before decline |
| Cold tercile mean ΔAUC | −0.0218 | Sparse communities hurt most |
| Hot tercile mean ΔAUC | −0.0028 | Dense communities barely affected |
| Spearman ρ (size vs ΔAUC) | 0.134 | Weak rank correlation; relationship is non-linear |

1. Experimental Design

Builds On Attack 6

This experiment reuses Attack 6's data pipeline, validation scoring, and PBM model architecture. It adds per-community AUC decomposition and tercile-stratified contamination curves.

Per-Community AUC Requirements

  • Communities must have ≥20 test posts for stable per-community AUC estimates
  • 143 of 767 communities meet this threshold
  • These 143 communities account for the vast majority of test data (>95% of 52,513 test posts)
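The threshold filter is a simple count over test posts. A minimal sketch, assuming a per-post dict with a `community` field (the actual record layout in the pipeline may differ):

```python
from collections import Counter

def stable_communities(test_posts, min_posts=20):
    """Communities with enough test posts for a stable per-community AUC."""
    counts = Counter(p["community"] for p in test_posts)
    return {c for c, n in counts.items() if n >= min_posts}

# Toy data: community "a" clears the >=20 threshold, "b" does not.
posts = [{"community": "a"}] * 25 + [{"community": "b"}] * 5
print(stable_communities(posts))  # {'a'}
```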

Models Compared

| Model | Training Data | Purpose |
|---|---|---|
| theta_high | All high-validation posts (210,053) | Organic-only baseline |
| theta_mixed | High + low combined (233,704) | Undifferentiated platform model |

For each community c, we compute:

ΔAUC_c = AUC(theta_mixed, test_c) − AUC(theta_high, test_c)

Positive ΔAUC = contamination helped; negative = contamination hurt.
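The per-community metric can be computed with any AUC implementation. A self-contained sketch using the rank-based (Mann-Whitney) formula, with hypothetical score arrays standing in for the two models' predictions on one community's test posts:

```python
def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores get their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank over the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def delta_auc(labels, mixed_scores, high_scores):
    """ΔAUC_c = AUC(theta_mixed, test_c) − AUC(theta_high, test_c)."""
    return auc(labels, mixed_scores) - auc(labels, high_scores)

# Hypothetical scores: mixed model ranks both positives on top, high does not.
print(delta_auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2], [0.9, 0.8, 0.1, 0.2]))  # 0.5
```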

Tercile Definition

Communities sorted by organic (high-validation) training post count and split into three equal groups:

| Tercile | n | Post range | Total test posts |
|---|---|---|---|
| Cold | 47 | 48–125 | 1,202 |
| Warm | 48 | 125–279 | 2,167 |
| Hot | 48 | 280–141,411 | 45,752 |

The hot tercile contains 93% of all test data, explaining why the aggregate curve reflects hot-community behavior.
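The split itself is a sort-and-cut over organic post counts. A sketch, with hypothetical community ids and counts (note that integer cuts over 143 communities give the 47/48/48 sizes in the table):

```python
def terciles(post_counts):
    """Split communities into near-equal terciles by organic post count.
    post_counts: dict community_id -> organic training post count (assumed layout)."""
    ranked = sorted(post_counts, key=post_counts.get)
    n = len(ranked)
    c1, c2 = n // 3, 2 * n // 3
    return {"cold": ranked[:c1], "warm": ranked[c1:c2], "hot": ranked[c2:]}

t = terciles({"a": 10, "b": 20, "c": 30, "d": 40, "e": 50, "f": 60})
print(t)  # {'cold': ['a', 'b'], 'warm': ['c', 'd'], 'hot': ['e', 'f']}
```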


2. The Simpson's Paradox

Finding 2.1: 45% of Communities Improve Under Contamination

| Direction | Communities | Percentage |
|---|---|---|
| Contamination improved AUC | 64 | 44.8% |
| Contamination degraded AUC | 75 | 52.4% |
| No change | 4 | 2.8% |

The aggregate contamination curve (Attack 6, Figure d) shows monotonic degradation because it is dominated by the 48 hot communities, which collectively contribute 45,752 of 52,513 test posts. The 64 communities that improve are mostly mid-size and their positive signal is drowned out in the aggregate average.
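The aggregation effect can be reproduced with toy numbers (hypothetical, not the measured values): a post-weighted average is dominated by the largest community even when most communities improve.

```python
# (delta_auc, n_test_posts) per community -- illustrative values only.
communities = [
    (+0.020, 30), (+0.010, 40), (+0.015, 25),  # small communities: improve
    (-0.010, 5000),                            # one huge community: degrades
]
improved = sum(1 for d, _ in communities if d > 0) / len(communities)
weighted = sum(d * n for d, n in communities) / sum(n for _, n in communities)
print(improved)  # 0.75 of communities improve...
print(weighted)  # ...yet the post-weighted mean ΔAUC is negative
```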

Finding 2.2: ΔAUC Distribution Is Centered Near Zero With High Variance

The scatter plot (Figure 1) shows ΔAUC ranging from −0.26 to +0.17 across communities. The distribution is roughly symmetric around zero for warm and hot communities, with cold communities skewed negative. This is not a story of uniform degradation; it is a story of heterogeneous, community-specific responses.


3. Tercile Analysis

Finding 3.1: Warm Communities Show Positive Mean ΔAUC

| Tercile | Mean ΔAUC | Median ΔAUC | Std ΔAUC | Pct improved |
|---|---|---|---|---|
| Cold | −0.0218 | −0.0238 | 0.066 | 32% |
| Warm | +0.0067 | +0.0045 | 0.058 | 54% |
| Hot | −0.0028 | −0.0035 | 0.037 | 48% |

The warm tercile is the only group where:

  • Mean ΔAUC is positive
  • A majority (54%) of communities show improvement
  • Median ΔAUC is also positive (ruling out outlier-driven means)

Finding 3.2: Warm Contamination Curve Is Non-Monotonic

Per-tercile contamination curves (trained at 11 levels from 0% to 100% in 10% increments):

| Contamination | Cold AUC | Warm AUC | Hot AUC |
|---|---|---|---|
| 0% (baseline) | 0.600 | 0.608 | 0.655 |
| 10% | 0.595 | 0.620 (+2.0%) | 0.642 |
| 20% | 0.581 | 0.609 | 0.628 |
| 30% | 0.578 | 0.614 (+1.0%) | 0.617 |
| 40% | 0.585 | 0.606 | 0.604 |
| 50% | 0.571 | 0.601 | 0.592 |
| 60% | 0.573 | 0.585 | 0.581 |
| 70% | 0.559 | 0.575 | 0.570 |
| 80% | 0.560 | 0.569 | 0.560 |
| 90% | 0.575 | 0.534 | 0.549 |
| 100% | 0.513 | 0.549 | 0.536 |

The warm curve peaks at 10% contamination (AUC 0.620 vs 0.608 baseline) and remains above baseline through 30%. This is the non-monotonic response the reviewer asked about: a small amount of low-validation data improves prediction for these communities.

The hot curve is monotonically declining — consistent with the aggregate result.

The cold curve is noisy and generally declining, reflecting the instability of AUC estimates on only 1,202 test posts spread across 47 communities.
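The peak locations can be verified directly from the curve values above:

```python
# Contamination levels and per-tercile AUC values from the Section 3 table.
levels = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
warm = [0.608, 0.620, 0.609, 0.614, 0.606, 0.601, 0.585, 0.575, 0.569, 0.534, 0.549]
hot  = [0.655, 0.642, 0.628, 0.617, 0.604, 0.592, 0.581, 0.570, 0.560, 0.549, 0.536]

def peak(levels, aucs):
    """Return (contamination level of the max AUC, gain over the 0% baseline)."""
    i = max(range(len(aucs)), key=aucs.__getitem__)
    return levels[i], aucs[i] - aucs[0]

print(peak(levels, warm))  # warm peaks at 10% contamination, above baseline
print(peak(levels, hot))   # hot peaks at 0%: monotone decline
```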


4. Mechanism: Regularization Sweet Spot

Finding 4.1: Cold-Start Hypothesis Rejected

The initial hypothesis was cold-start inversion: sparse communities benefit because any signal reduces variance on poorly-estimated community parameters. The data rejects this — cold communities degrade the most (−0.022 mean ΔAUC).

Why cold-start fails:

  • Cold communities have ≈26 test posts per community (1,202 / 47). AUC on 26 posts is extremely noisy regardless of model quality.
  • The community alpha estimates for cold communities are unreliable under both models (few training posts in both theta_high and theta_mixed), so adding contaminated data does not materially improve the alpha estimate.

Finding 4.2: Warm Communities Occupy a Regularization Sweet Spot

The mechanism in warm communities is different:

  • Enough organic data (125–279 posts) for the community alpha to be roughly correctly estimated, but the model is still underfit: it hasn't fully converged on the optimal community-level parameters.
  • Low-validation data adds diversity to the training distribution. At low contamination rates (10–30%), this acts as implicit regularization, analogous to dropout, label noise, or data augmentation in neural networks, which prevents overfitting to the organic training set.
  • At higher contamination (>30%), the bias from different engagement patterns overwhelms the regularization benefit, and AUC declines.

This produces the characteristic non-monotonic curve: improvement → peak → decline.

Finding 4.3: Hot Communities Are Robust

Hot communities (280+ posts, up to 141K) are barely affected (−0.003 mean ΔAUC) because:

  • Their alpha estimates are already well-converged from abundant organic data
  • Contamination adds mild bias but the signal-to-noise ratio is high enough to absorb it
  • The aggregate curve's 8.5% AUC drop at 50% comes from the constant-size substitution design (23,651 total training posts), which is much smaller than the organic data available to hot communities

Finding 4.4: Size Alone Is a Weak Predictor

Spearman ρ = 0.134 between community size and ΔAUC. The relationship is non-linear — warm communities benefit, but it is not a simple "more data = less sensitivity" gradient. Other community-level factors (engagement rate heterogeneity, agent diversity, topical overlap between validation groups) likely moderate the effect but are not isolated in this analysis.


5. Alpha Divergence

Finding 5.1: Community Bias Shifts Are Heterogeneous

The alpha divergence plot (Figure 4) shows how each community's learned bias parameter changes between theta_high and theta_mixed:

Δα_c = α_mixed,c − α_high,c

  • Small communities show high variance in Δα (noisy alpha estimates in both models)
  • Large communities show Δα tightly clustered near zero (robust to contamination)
  • The pattern mirrors the ΔAUC findings: contamination perturbs community parameters most where estimates are unstable
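Computing the divergence is a per-community subtraction (the dict layout here is hypothetical):

```python
def alpha_divergence(alpha_high, alpha_mixed):
    """Δα_c = α_mixed,c − α_high,c for every community present in both models."""
    return {c: alpha_mixed[c] - alpha_high[c] for c in alpha_high if c in alpha_mixed}

# Toy alpha estimates: community "a" shifts under contamination, "b" does not.
d = alpha_divergence({"a": 0.5, "b": -0.2}, {"a": 0.6, "b": -0.2})
print(d)  # Δα ≈ 0.1 for "a", 0.0 for "b"
```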

6. Policy Implications

Finding 6.1: Blanket Filtering Is Not Pareto-Optimal

The Attack 6 aggregate result suggests the policy recommendation: "filter all low-validation agents from training data." The community-level decomposition shows this is suboptimal:

| If you filter all low-val agents... | Effect |
|---|---|
| Hot communities (48) | +0.003 AUC (negligible improvement) |
| Warm communities (48) | −0.007 AUC (loss; 54% of communities hurt) |
| Cold communities (47) | +0.022 AUC (improvement, but noisy) |

A community-aware policy — filter for hot communities, include for warm — would be Pareto-superior to blanket filtering. This connects to real recommendation system design, where community-specific models or mixture weights are standard practice.
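The community-aware policy reduces to per-tercile model selection. A sketch using the model names from Section 1, with the routing rule taken from the tercile results (warm communities keep the mixed model; the cold rule is a judgment call given the noisy estimates):

```python
def pick_model(tercile):
    """Community-aware filtering policy: organic-only model where contamination
    hurts, mixed model for the warm tercile where data diversity helped."""
    return "theta_mixed" if tercile == "warm" else "theta_high"

print(pick_model("warm"), pick_model("hot"), pick_model("cold"))
```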

Finding 6.2: Governance Reframing

The story shifts from:

  • Old: "Agent contamination degrades IR models. Filter everything."
  • New: "Agent contamination is heterogeneous. Mid-density communities benefit from data diversity. Blanket exclusion sacrifices prediction quality where it matters most — in growing communities where the model is still learning."

This is a structurally different governance claim that connects to:

  • Fair ranking literature: Filtering disproportionately affects smaller communities
  • Data diversity / augmentation research: Noise can improve generalization when models are underfit
  • Practical system design: Community-specific contamination thresholds vs. global filtering

7. Known Limitations

L1: Test post imbalance across terciles. The cold tercile has only 1,202 test posts (≈26 per community) vs. 45,752 for hot. Per-community AUC estimates for cold communities are noisy and should be interpreted with caution.

L2: Tercile boundaries are arbitrary. The cold/warm/hot split at 33rd and 66th percentiles of community size is a convenient but not uniquely justified partition. The scatter plot (Figure 1) shows the underlying continuous distribution.

L3: No bootstrap confidence intervals. Single-seed results (seed=42). Multi-seed bootstrap would quantify uncertainty on per-tercile means and the location of the warm-curve peak.
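A percentile bootstrap over per-community ΔAUC values would address this. A minimal sketch (the input values below are hypothetical):

```python
import random

def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for a mean (e.g., one tercile's mean ΔAUC)."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)  # resample w/ replacement
        for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

# Toy per-community ΔAUC values for one tercile.
lo, hi = bootstrap_ci([-0.02, -0.01, 0.0, 0.01, 0.02, 0.03])
print(lo, hi)
```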

L4: theta_mixed has more training data than theta_high in the constant-size design. The per-community comparison uses theta_high (210K posts) vs theta_mixed (234K posts). The additional 24K low-val posts in theta_mixed could partly explain warm-community improvements through a pure data-size effect rather than regularization. However, the contamination curves (where training size is held constant at 23,651) show the same non-monotonic warm-curve pattern, arguing against a pure size confound.

L5: Latent community moderators unexplored. Community engagement rate, topical focus, agent diversity, and temporal activity patterns may moderate the contamination effect but are not controlled for in this analysis.


8. Reproducibility

```bash
# Run the full experiment (~7 minutes: 5 min data loading, 2 min training + evaluation)
python eval/scripts/attack6b_cold_start_inversion.py
```

Output Files

| File | Location | Description |
|---|---|---|
| Results JSON | eval/results/click_model_cold_start.json | All per-community and tercile results |
| Scatter plot | eval/figures/cold_start_scatter.{pdf,png} | ΔAUC vs community size, colored by tercile |
| Tercile curves | eval/figures/cold_start_tercile_curves.{pdf,png} | Per-tercile contamination curves (11 levels) |
| Tercile bars | eval/figures/cold_start_tercile_bars.{pdf,png} | Mean ΔAUC with std error bars |
| Alpha divergence | eval/figures/cold_start_alpha_divergence.{pdf,png} | Community bias divergence vs size |

9. Paper-Ready Paragraph

The aggregate contamination curve, however, masks substantial community-level heterogeneity. Decomposing the AUC metric across 143 individual communities reveals a Simpson's paradox: 45% of communities show improved prediction when low-validation data is mixed into the training set. Stratifying communities into terciles by organic post density, we find that mid-density communities (125--279 posts) exhibit a non-monotonic response, with AUC peaking at 10% contamination (+2.0%) before declining (Figure~\ref{fig:tercile-curves}). This pattern is consistent with an implicit regularization effect: for communities where the model is still underfit, a small amount of behaviorally diverse data improves generalization, analogous to label smoothing or data augmentation. Dense communities degrade monotonically but by less than 0.3% AUC, while sparse communities degrade the most (-2.2\%) due to estimation instability. The governance implication is that blanket exclusion of low-validation agents is not Pareto-optimal---community-aware filtering policies that retain diverse data for mid-density communities would produce strictly better prediction quality overall.