Montana deep retrospective — 2026-05-21
Generated 2026-05-21T19:40:02Z on the Moscow orchestrator.
1. Current state of all four nodes
| Node | Phase | current_window | uptime since | CPU 1-min load |
|---|---|---|---|---|
| moscow | Active | 75849 | Thu 2026-05-21 22:24:54 MSK | 3.97 |
| frankfurt | CandidateVdf | 75847 | Thu 2026-05-21 19:29:17 UTC | 1.22 |
| helsinki | CandidateVdf | 75787 | Thu 2026-05-21 22:24:15 EEST | 7.68 |
| armenia | CandidateVdf | 75845 | Thu 2026-05-21 23:26:36 +04 | 1.26 |
2. Heartbeat health (last 1 hour)
Each node sends a Ping every 5 s to every connected peer, so the expected baseline is 3600 / 5 = 720 heartbeats per peer per hour. With three Genesis peers plus Armenia, that gives about 4 × 720 = 2880 heartbeats per node per hour in steady state.
| Node | heartbeat OK | outgoing errors | connection closed |
|---|---|---|---|
| moscow | 47934 | 0 | 8 |
| frankfurt | 77878 | 2 | 6 |
| helsinki | 91416 | 0 (?) | 610 |
| armenia | 2194 | 0 (?) | 4 |
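The baseline arithmetic above can be sketched in a few lines of Python, which makes it easier to compare an observed "heartbeat OK" count against the stated expectation. This is only an illustrative sketch: the function names are hypothetical and are not part of montana-node, and the peer count of 4 follows the report's own "three Genesis peers + Armenia" figure.

```python
PING_INTERVAL_S = 5       # from the report: one Ping every 5 s per connected peer
SECONDS_PER_HOUR = 3600


def expected_heartbeats_per_hour(peer_count: int, interval_s: int = PING_INTERVAL_S) -> int:
    """Heartbeats one node is expected to log per hour in steady state."""
    return (SECONDS_PER_HOUR // interval_s) * peer_count


def observed_ratio(observed: int, peer_count: int) -> float:
    """Observed count divided by the expected baseline (1.0 means on baseline)."""
    return observed / expected_heartbeats_per_hour(peer_count)


per_peer = expected_heartbeats_per_hour(1)   # 720
per_node = expected_heartbeats_per_hour(4)   # 2880
```

A ratio far from 1.0 in either direction is a prompt to check how the counter is collected (for example, which log lines are matched and over what interval) before drawing conclusions about the transport.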
3. Consensus state convergence
Moscow is the canonical bootstrap proposer. Followers replay Moscow's Proposal envelopes through the apply_proposal path on each incoming broadcast. Lag = (Moscow.current_window) − (follower.current_window). A positive value means the follower is behind; once a follower has caught up, the expected steady-state lag is ≤ 1.
| Node | current_window | lag vs Moscow (75851) |
|---|---|---|
| moscow | 75851 | 0 (proposer) |
| frankfurt | 75848 | 3 |
| helsinki | 75787 | 64 |
| armenia | 75845 | 6 |
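As a minimal sketch of the lag definition, the following Python recomputes the table from the window values reported in this section and flags followers above the steady-state bound. The names `follower_lag` and `STEADY_STATE_MAX_LAG` are hypothetical, chosen for illustration only.

```python
STEADY_STATE_MAX_LAG = 1  # from the report: expected lag <= 1 once caught up


def follower_lag(proposer_window: int, follower_window: int) -> int:
    """Positive result means the follower is behind the proposer."""
    return proposer_window - follower_window


# current_window values as reported in the table above
windows = {"moscow": 75851, "frankfurt": 75848, "helsinki": 75787, "armenia": 75845}

lags = {
    node: follower_lag(windows["moscow"], window)
    for node, window in windows.items()
    if node != "moscow"
}
behind = {node: lag for node, lag in lags.items() if lag > STEADY_STATE_MAX_LAG}
```

With the values in this report, all three followers land in `behind`, with Helsinki the furthest from the proposer.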
4. Resource pressure on each operator host
| Node | cores | RAM (MB) | mem used (MB) | swap used (MB) | montana-node RSS (MB) |
|---|---|---|---|---|---|
| moscow | 1 | 1968 | 677 | 376 | 5 |
| frankfurt | 1 | 1967 | 558 | 368 | 8 |
| helsinki | 1 | 961 | 541 | 581 | 3 |
| armenia | 1 | 961 | 330 | 447 | 8 |
5. Frequent error / warning lines (last 24 hours)
moscow
5341 Main process exited, code=exited, status=1/FAILURE
5341 Failed with result 'exit-code'.
5339 Permission denied (os error 13)
5 Error(Right(Closed)) }))
2 No space left on device [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
1 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
1 "Connection timed out" })))) }))
1 "Connection timed out" }))) }))
frankfurt
602 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
6 Error(Right(Closed)) }))
6 "Connection reset by peer" })))) }))
2 [active W=82123] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
2 [active W=82122] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
2 [active W=82121] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
2 [active W=82120] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
2 [active W=82119] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
helsinki
600 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
8 "Connection reset by peer" })))) }))
5 Error(Right(Closed)) }))
3 Failed with result 'timeout'.
2 Invalid argument
1 Connection refused (os error 111))]
armenia
4 Error(Right(Closed)) }))
1 Invalid argument
1 Failed with result 'timeout'.
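In the Frankfurt list, the per-window lines appear as separate entries with a count of 2 each because the window number is part of the message text. One way to fold such lines into a single entry is to normalize the variable field before counting. The sketch below assumes nothing about how this report is actually produced; `normalize` and `top_messages` are hypothetical helpers.

```python
import re
from collections import Counter

# Matches the window tag seen in lines such as "[active W=82123] ..."
WINDOW_TAG = re.compile(r"W=\d+")


def normalize(line: str) -> str:
    """Replace the window number with a placeholder so equal messages group together."""
    return WINDOW_TAG.sub("W=<n>", line.strip())


def top_messages(lines, limit=10):
    """Return (message, count) pairs, most frequent first, ignoring blank lines."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(limit)
```

Feeding the function the message part of each journal line would group every window-skip warning under one normalized message, which keeps the short list free for distinct error classes.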
6. Soak watchdog (5-minute polls)
montana-soak.timer writes one JSON line per poll to
/var/lib/montana-soak/soak.jsonl on the Moscow orchestrator covering
all four nodes. The 24-hour continuous record is the empirical evidence
for the Noise_PQ XX cross-machine soak (DEV-014 Phase 3 part 3
acceptance).
Total soak records to date: 54.
Last 3 records (one line per poll):
0,"closed_5m":0
0},{"label":"armenia","host":"yerevan","active":"active","window":75845,"phase":"CandidateVdf","D":325000000,"hb_5m":2204,"err_5m":0
0,"closed_5m":4}]}
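A minimal sketch of reading the newest soak record follows. The per-node field names (`label`, `window`, `hb_5m`, `err_5m`, `closed_5m`) are taken from the excerpt above. The name of the top-level key that holds the node list is not visible in the truncated records, so the code looks for the first list of objects instead of assuming a key. Both helper names are hypothetical.

```python
import json


def node_entries(record: dict) -> list:
    """Return the first value in the record that is a list of objects."""
    for value in record.values():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            return value
    return []


def summarize(jsonl_text: str) -> dict:
    """Map node label -> (window, hb_5m, err_5m, closed_5m) from the last record."""
    lines = [ln for ln in jsonl_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    latest = json.loads(lines[-1])  # one JSON object per poll, newest last
    return {
        node.get("label"): (
            node.get("window"),
            node.get("hb_5m"),
            node.get("err_5m"),
            node.get("closed_5m"),
        )
        for node in node_entries(latest)
    }
```

On the orchestrator, this could be called with the contents of /var/lib/montana-soak/soak.jsonl, for example via `summarize(open(path).read())`.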
7. Mainnet release candidate verdict
| Component | State |
|---|---|
| Noise_PQ XX production transport | active across four-node mesh (3 Genesis + 1 external operator) |
| Genesis manifest auto-sync | live (10-min timer) |
| VPN key auto-sync | live (5-min timer) |
| Explorer auto-discovery | live (1-min collector at /var/www/efir/explorer/data.json) |
| Soak watchdog | live (5-min timer at /var/lib/montana-soak/soak.jsonl) |
| External-operator onboarding | verified end-to-end on a fresh Yerevan VPS in ~16 min |
| Sixteen Metzdowd findings | 12 closed by construction + 2 rejected with citation + MONT-001/MONT-002/MONT-004 + DEV-014 all closed |
| DEV-012 follower drift fix | closed (commit e1a0bd0 follower_skip flag) |
| DEV-012 multi-confirmer protocol | open for v1.0.0 promotion (BundledConfirmation cross-node aggregation + quorum) |
| M7 fast-sync | open for v1.0.0 promotion (snapshot-based onboarding for million-account scale) |
The network is suitable for a public release candidate v1.0.0-rc.2 including the DEV-012 partial close (commit e1a0bd0). The two open items above are the explicit gates for promotion to v1.0.0 mainnet.