
# Click Model Degradation Findings

- **Analysis date:** 2026-02-10
- **Data source:** `data/submolts/` (posts with upvotes) and `data/profiles/` (agent profiles)
- **Dataset:** 286,217 posts from 27,048 agents across 767 communities (filtered from 370,737 posts)
- **Script:** `eval/scripts/attack6_click_model_degradation.py`


## Executive Summary

This experiment tests whether the attribution problem identified in Contributions 1 and 2 has practical consequences for IR systems. We train a position-based click model (PBM) on upvote patterns and measure how prediction quality degrades as low-validation agent data is substituted into the training set.

### Key Headline Findings

| Finding | Value | Significance |
|---|---|---|
| Engagement rate gap | 76.2% (high-val) vs. 44.2% (low-val) | Validation groups behave fundamentally differently |
| AUC drop at 50% contamination | 8.5% | Measurable degradation at realistic mixing |
| LL drop at 50% contamination | 9.4% | Calibration degrades faster than discrimination |
| AUC drop at 100% contamination | 16.6% | Model approaches random (AUC 0.534) |
| LL drop at 100% contamination | 164.2% | Severe miscalibration |
| Parameter divergence (L2) | 62.11 | Learned models are structurally different |
| Parameter divergence (KL) | 0.946 | Substantial distributional divergence |
| Degradation monotonic | Yes | Smooth, predictable degradation curve |

### What This Analysis Provides

#### The Click Model Experiment

- Position-Based Model (PBM) adapted for social platform engagement
- Community-specific examination bias (`alpha_c`), analogous to position bias in search
- Agent feature attractiveness (`beta`) learned from karma, followers, and content length
- Binary engagement signal: P(upvote > 0 | community, features)

#### The Contamination Curve

- Constant-size substitution design that isolates contamination from dataset-size effects
- 21 contamination levels from 0% to 100% in 5% increments
- Held-out evaluation on 52,513 high-validation posts never seen during training

#### Important Methodological Note

We measure **prediction degradation on organic behavior**, not model quality in general.

The test set consists exclusively of high-validation (organic) agent posts. The question is: how much worse does a click model predict organic engagement when trained on data contaminated with low-validation (scripted) agents? This directly simulates the real-world scenario where an IR system is trained on platform data of unknown provenance.


## 1. Experimental Design

### Finding 1.1: Validation Groups Show Distinct Engagement Patterns

**What the crawl provides:**

- 33,810 agent profiles with karma, verification status, follower counts, and owner linkage
- 370,737 posts with upvote/downvote counts and community assignments

**What the evaluation tests:**

**Validation score computation (5-signal weighted index):** agents are scored using the established identifiability formula and split into high-validation (top 40%) and low-validation (bottom 40%) groups.

| Signal | Weight | Coverage |
|---|---|---|
| Karma (percentile) | 20% | 78.8% of agents |
| Verified status | 25% | 71.3% of agents |
| Follower ratio (percentile) | 15% | 63.7% of agents |
| Owner linkage (has X account) | 25% | 97.7% of agents |
| Comment/post ratio (percentile) | 15% | All agents |
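As a concrete illustration, the weighted index and 40/40 split can be sketched as below. This is a hypothetical sketch: the field names, the assumption that each signal is pre-scaled to [0, 1], and the quantile split are illustrative, not the script's actual API.

```python
import numpy as np

# Hypothetical signal names; each value is assumed pre-scaled to [0, 1].
WEIGHTS = {
    "karma_pct": 0.20,      # karma percentile
    "verified": 0.25,       # verified status (0/1)
    "follower_pct": 0.15,   # follower-ratio percentile
    "owner_linked": 0.25,   # has a linked X account (0/1)
    "cp_ratio_pct": 0.15,   # comment/post-ratio percentile
}

def validation_score(agent: dict) -> float:
    """Weighted sum of the five signals (missing signals count as 0)."""
    return sum(w * agent.get(k, 0.0) for k, w in WEIGHTS.items())

def split_groups(agents: list[dict]) -> tuple[list[dict], list[dict]]:
    """Top 40% of scores -> high-validation, bottom 40% -> low-validation."""
    scores = np.array([validation_score(a) for a in agents])
    lo, hi = np.quantile(scores, [0.4, 0.6])
    high = [a for a, s in zip(agents, scores) if s >= hi]
    low = [a for a, s in zip(agents, scores) if s <= lo]
    return high, low
```

The middle 20% of agents is deliberately discarded, which sharpens the contrast between the two groups.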

**Result:** Groups show a 32 percentage-point engagement gap.

| Metric | High-Validation | Low-Validation | Delta |
|---|---|---|---|
| Agents | 13,524 | 13,524 | -- |
| Posts | 262,566 | 23,651 | 11.1x ratio |
| Engagement rate (upvote > 0) | 76.2% | 44.2% | +32.0 pp |
| Posts per agent (approx.) | 19.4 | 1.7 | 11.2x ratio |

**Note on data imbalance:** high-validation agents produce 11x more posts per capita. This is consistent with the identifiability findings (one-shot ratio: 17.8% high-val vs. 52.4% low-val). The experiment accounts for this via constant-size substitution (see Section 2).

### Finding 1.2: Community Filtering

Of 3,774 total communities, 767 are retained after filtering out those with fewer than 10 posts. This removes noise from tiny communities where the PBM cannot estimate stable community-specific parameters.
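The filter itself is simple; a minimal sketch (the `"community"` field name is an assumption, not the script's actual schema):

```python
from collections import Counter

def filter_communities(posts: list[dict], min_posts: int = 10) -> list[dict]:
    """Keep only posts in communities with at least `min_posts` posts."""
    counts = Counter(p["community"] for p in posts)
    keep = {c for c, n in counts.items() if n >= min_posts}
    return [p for p in posts if p["community"] in keep]
```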


## 2. Core Model Comparison

### Finding 2.1: Three Training Populations Yield Divergent Models

**What the evaluation tests:**

Three PBM variants trained on different populations, all evaluated on the same held-out high-validation test set (n=52,513):

| Model | Training Data | Train Size | AUC | Log-Likelihood |
|---|---|---|---|---|
| `theta_high` | All high-validation posts | 210,053 | 0.654 | -0.520 |
| `theta_mixed` | High + low combined | 233,704 | 0.640 | -0.526 |
| `theta_low` | All low-validation posts | 23,651 | 0.534 | -1.460 |

**Interpretation:**

- `theta_high` best predicts organic behavior (as expected -- it is trained on the same distribution)
- `theta_mixed` incurs a 2.1% AUC penalty from including ~10% low-validation data
- `theta_low` approaches random performance (AUC 0.534 vs. a 0.500 random baseline)

### Finding 2.2: Parameter Divergence Is Substantial

| Metric | `theta_high` vs. `theta_low` | `theta_high` vs. `theta_mixed` |
|---|---|---|
| L2 distance | 62.11 | 9.83 |
| KL divergence | 0.946 | 0.006 |

The two populations learn structurally different models. L2=62.11 across 771 parameters (767 community biases + 3 feature weights + 1 intercept) indicates pervasive disagreement, not isolated outliers. KL=0.946 confirms the predicted probability distributions diverge substantially.
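Both metrics can be sketched as below, under the assumption that each trained PBM is summarized by its flat 771-dimensional parameter vector and by its vector of predicted engagement probabilities on the shared held-out test set. Function names are illustrative; the exact KL estimator used by the script is not specified here.

```python
import numpy as np

def l2_distance(params_a: np.ndarray, params_b: np.ndarray) -> float:
    """Euclidean distance between two flat parameter vectors."""
    return float(np.linalg.norm(params_a - params_b))

def bernoulli_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    """Mean KL divergence between per-post Bernoulli predictions p and q."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kl.mean())
```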

*Figure: Core model comparison.*


## 3. Contamination Curve

### Finding 3.1: Monotonic Degradation Across All Contamination Levels

**What the evaluation tests:**

**Constant-size substitution design:** at each contamination level f, we train on (1 - f) × 23,651 high-validation posts plus f × 23,651 low-validation posts, so total training size is always 23,651. This isolates the contamination effect from training-set-size confounds.
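The mixing step can be sketched as below. Pool variables and the seed-controlled RNG are illustrative; only the constant-size substitution logic mirrors the design described above.

```python
import numpy as np

def contaminated_train_set(high_pool, low_pool, f, rng):
    """Mix fraction f of low-validation posts into a fixed-size training set."""
    n = len(low_pool)                 # 23,651 in the experiment
    n_low = int(round(f * n))
    n_high = n - n_low
    high_idx = rng.choice(len(high_pool), size=n_high, replace=False)
    low_idx = rng.choice(len(low_pool), size=n_low, replace=False)
    return [high_pool[i] for i in high_idx] + [low_pool[i] for i in low_idx]

rng = np.random.default_rng(42)                    # seed from the audit
levels = [round(0.05 * k, 2) for k in range(21)]   # 0.0, 0.05, ..., 1.0
```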

**Result:** Performance degrades monotonically from 0% to 100% contamination.

| Contamination | AUC | AUC Drop | Log-Likelihood | LL Drop |
|---|---|---|---|---|
| 0% (baseline) | 0.6401 | -- | -0.5528 | -- |
| 10% | 0.6277 | 1.9% | -0.5599 | 1.3% |
| 25% | 0.6111 | 4.5% | -0.5781 | 4.6% |
| 50% | 0.5860 | 8.5% | -0.6049 | 9.4% |
| 75% | 0.5614 | 12.3% | -0.6600 | 19.4% |
| 100% | 0.5336 | 16.6% | -1.4604 | 164.2% |

*Figure: Contamination curve.*

### Finding 3.2: Degradation Is Non-Linear

The AUC degradation curve is approximately linear up to ~60% contamination, then accelerates. The log-likelihood curve diverges sharply above 80%, reflecting severe miscalibration when the model has very little organic training signal remaining.

*Figure: Relative degradation.*

### Finding 3.3: Even Small Contamination Is Detectable

At just 5% contamination (1,182 low-validation posts out of 23,651), AUC drops by 0.45% and LL by 0.14%. While small in absolute terms, this demonstrates that contamination effects are continuous and begin immediately.


## 4. Methodological Audit

### 4.1: What This Experiment Does NOT Prove

The engagement rate gap (76.2% vs. 44.2%) is the primary driver of model divergence. A model trained on a population where ~44% of posts receive upvotes will naturally miscalibrate when evaluated on a population where ~76% do. This is not a bug; it *is* the finding: low-validation agents produce qualitatively different engagement patterns that corrupt models built to predict organic behavior.

However, AUC (which measures ranking quality, not calibration) also degrades substantially (0.640 to 0.534), confirming the effect is not purely a base-rate shift. The model loses its ability to discriminate between engaged and non-engaged posts *within* the organic population.

### 4.2: Verified Properties

| Property | Status | Evidence |
|---|---|---|
| No train/test leakage | PASS | Test set (52,513 posts) fully disjoint from all training pools |
| Constant training size | PASS | All 21 contamination levels use exactly 23,651 posts |
| Feature standardization | PASS | Test data normalized using training-set statistics only |
| AUC computation | PASS | O(n log n) sort-based algorithm, mathematically correct |
| NaN-free training | PASS | Value clipping on alpha prevents divergence; NaN guard as fallback |
| Reproducibility | PASS | Random seed 42 controls all stochastic operations |
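For reference, an O(n log n) sort-based AUC of the kind the audit describes is the normalized Mann-Whitney U statistic, with average ranks for tied scores. This is an illustrative reimplementation under those assumptions, not the script's actual code:

```python
import numpy as np

def auc_rank(scores, labels) -> float:
    """AUC via rank-sum: sort once, average ranks over ties, normalize U."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    sorted_s = scores[order]
    i = 0
    while i < len(sorted_s):          # assign average rank to tied runs
        j = i
        while j + 1 < len(sorted_s) and sorted_s[j + 1] == sorted_s[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j + 2) / 2.0
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))
```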

### 4.3: Known Limitations

**L1: Data imbalance (11:1 post ratio).** High-validation agents produce 262K posts vs. 23K for low-validation. This means:

- The contamination curve uses only 23K posts per training run (matching the smaller pool)
- The baseline model (0% contamination) is weaker than `theta_high` (AUC 0.640 vs. 0.654) because it trains on ~9x fewer posts
- The core model comparison (Section 2) uses full data and is not affected

**L2: Sequential RNG consumption.** The random number generator is consumed sequentially across contamination levels, creating mild statistical dependency between adjacent levels. This does not affect reproducibility, but it means variance across different seeds has not been assessed.

**L3: Community coverage variation.** Some communities may lack representation at extreme contamination levels (e.g., a community with only high-validation posts has no signal at 100% contamination). Untrained community biases default to 0 (the global base rate), which could contribute additional degradation at the extremes.

**L4: Single-seed results.** All results are reported for seed=42. A multi-seed bootstrap would strengthen confidence intervals.


## 5. PBM Model Architecture

### Model Specification

```
P(upvote > 0 | community c, features x) = sigmoid(b + alpha_c + x · beta)
```

| Component | Dimension | Role | Learning Rate |
|---|---|---|---|
| `b` (intercept) | 1 | Global base rate | 0.05 |
| `alpha_c` (community bias) | 767 | Community-specific examination probability | 0.50 |
| `beta` (feature weights) | 3 | Agent attractiveness | 0.05 |

### Features

| Feature | Transform | Motivation |
|---|---|---|
| Author karma | `sign(x) * log1p(abs(x))` | Symmetric log (karma can be negative) |
| Author followers | `log1p(max(0, x))` | Follower count (non-negative) |
| Content length | `log1p(x)` | Post length proxy for effort |
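The three transforms in the table above can be sketched vectorized over the raw columns (array names are illustrative; standardization happens afterward, using training-set statistics only):

```python
import numpy as np

def transform_features(karma, followers, content_len) -> np.ndarray:
    """Stack the three transformed features into an (n, 3) design matrix."""
    return np.column_stack([
        np.sign(karma) * np.log1p(np.abs(karma)),  # symmetric log: karma can be negative
        np.log1p(np.maximum(0, followers)),        # followers are non-negative
        np.log1p(content_len),                     # length as an effort proxy
    ])
```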

### Training

- 500 epochs of full-batch gradient descent
- Separate learning rates for community biases (fast convergence) and feature weights
- `alpha` clipped to [-5, 5] per epoch to prevent divergence
- Feature standardization using training-set mean/std
- Intercept initialized to the log-odds of the training base rate
- L2 regularization (lambda = 0.001)
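A single full-batch update consistent with the specification above can be sketched as follows. This is a minimal sketch, assuming the L2 penalty is applied to `alpha` and `beta` and that `comm` holds integer community indices; variable names and these details are illustrative, not the script's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(b, alpha, beta, X, comm, y,
               lr_b=0.05, lr_a=0.50, lr_w=0.05, lam=0.001):
    """One full-batch gradient step on the PBM log-loss."""
    n = len(y)
    p = sigmoid(b + alpha[comm] + X @ beta)
    err = p - y                                    # dLoss/dlogit for log-loss
    b -= lr_b * err.mean()
    # per-community error sums -> gradient for each alpha_c
    grad_a = np.bincount(comm, weights=err, minlength=len(alpha)) / n
    alpha -= lr_a * (grad_a + lam * alpha)
    beta -= lr_w * (X.T @ err / n + lam * beta)
    alpha = np.clip(alpha, -5.0, 5.0)              # per-epoch clipping
    return b, alpha, beta
```

The separate, larger learning rate on `alpha` matches the table in Section 5: community biases have only one scalar each and can converge faster than the shared feature weights.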

## 6. Reproducibility

```shell
# Run the full experiment (requires ~7 minutes: ~5 min data loading, ~2 min training)
python eval/scripts/attack6_click_model_degradation.py
```

### Output Files

| File | Location | Description |
|---|---|---|
| Results JSON | `eval/results/click_model_degradation.json` | All numerical results |
| Contamination curve | `eval/figures/click_model_contamination_curve.{pdf,png}` | Main figure (AUC + LL vs. contamination) |
| Relative degradation | `eval/figures/click_model_relative_degradation.{pdf,png}` | Percentage-drop figure |
| Core comparison | `eval/figures/click_model_core_comparison.{pdf,png}` | Bar chart of the three core models |

## 7. Paper-Ready Paragraph

To test whether the attribution problem has practical consequences for IR systems, we train a position-based click model on upvote patterns from high-validation agents (our proxy for organic users) and evaluate prediction quality as low-validation agents are substituted into the training set at constant training set size. Figure X shows that model performance degrades monotonically with contamination: at 50% low-validation agents, click model AUC drops by 8.5% and log-likelihood drops by 9.4% relative to the high-validation-only baseline. The learned model parameters diverge substantially between populations (L2=62.11, KL=0.946), confirming that the two groups produce structurally different engagement signals. This demonstrates that unattributable agent populations introduce measurable noise into standard IR training pipelines, directly connecting the identification gap (Contribution 1) and behavioral divergence (Contribution 2) to a concrete downstream consequence.