134 lines
5.4 KiB
Markdown
134 lines
5.4 KiB
Markdown
# Montana deep retrospective — 2026-05-21
|
||
|
||
Generated 2026-05-21T19:40:02Z on Moscow orchestrator.
|
||
|
||
## 1. Current state of all four nodes
|
||
|
||
| Node | Phase | current_window | uptime since | CPU 1-min load |
|
||
|------|-------|----------------|--------------|----------------|
|
||
| moscow | Active | 75849 | Thu 2026-05-21 22:24:54 MSK | 3.97 |
|
||
| frankfurt | CandidateVdf | 75847 | Thu 2026-05-21 19:29:17 UTC | 1.22 |
|
||
| helsinki | CandidateVdf | 75787 | Thu 2026-05-21 22:24:15 EEST | 7.68 |
|
||
| armenia | CandidateVdf | 75845 | Thu 2026-05-21 23:26:36 +04 | 1.26 |
|
||
|
||
## 2. Heartbeat health (last 1 hour)
|
||
|
||
Each node sends Ping every 5 s to every connected peer. Expected baseline
|
||
per hour per peer = 720. Three Genesis peers + Armenia = ~2880 heartbeats
|
||
per node per hour in steady state.
|
||
|
||
| Node | heartbeat OK | outgoing errors | connection closed |
|
||
|------|--------------|------------------|--------------------|
|
||
| moscow | 47934 | 0 | 8 |
|
||
| frankfurt | 77878 | 2 | 6 |
|
||
| helsinki | 91416 | 0
|
||
? | 610 |
|
||
| armenia | 2194 | 0
|
||
? | 4 |
|
||
|
||
## 3. Consensus state convergence
|
||
|
||
Moscow is the canonical bootstrap proposer. Followers replay Moscow's
|
||
Proposal envelopes through the apply_proposal path on each incoming
|
||
broadcast. Lag = (Moscow.current_window) − (follower.current_window),
|
||
positive means the follower is behind, expected ≤ 1 for the steady
|
||
state once the follower has caught up.
|
||
|
||
| Node | current_window | lag vs Moscow (75851) |
|
||
|------|----------------|------------------------------|
|
||
| moscow | 75851 | 0 (proposer) |
|
||
| frankfurt | 75848 | 3 |
|
||
| helsinki | 75787 | 64 |
|
||
| armenia | 75845 | 6 |
|
||
|
||
## 4. Resource pressure on each operator host
|
||
|
||
| Node | cores | RAM (MB) | mem used (MB) | swap used (MB) | montana-node RSS (MB) |
|
||
|------|-------|----------|---------------|-----------------|------------------------|
|
||
| moscow | 1 | 1968 | 677 | 376 | 5 |
|
||
| frankfurt | 1 | 1967 | 558 | 368 | 8 |
|
||
| helsinki | 1 | 961 | 541 | 581 | 3 |
|
||
| armenia | 1 | 961 | 330 | 447 | 8 |
|
||
|
||
## 5. Frequent error / warning lines (last 24 hours)
|
||
|
||
### moscow
|
||
```
|
||
5341 Main process exited, code=exited, status=1/FAILURE
|
||
5341 Failed with result 'exit-code'.
|
||
5339 Permission denied (os error 13)
|
||
5 Error(Right(Closed)) }))
|
||
2 No space left on device [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
|
||
1 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
|
||
1 "Connection timed out" })))) }))
|
||
1 "Connection timed out" }))) }))
|
||
```
|
||
|
||
### frankfurt
|
||
```
|
||
602 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
|
||
6 Error(Right(Closed)) }))
|
||
6 "Connection reset by peer" })))) }))
|
||
2 [active W=82123] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
|
||
2 [active W=82122] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
|
||
2 [active W=82121] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
|
||
2 [active W=82120] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
|
||
2 [active W=82119] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
|
||
```
|
||
|
||
### helsinki
|
||
```
|
||
600 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
|
||
8 "Connection reset by peer" })))) }))
|
||
5 Error(Right(Closed)) }))
|
||
3 Failed with result 'timeout'.
|
||
2 Invalid argument
|
||
1 Connection refused (os error 111))]
|
||
```
|
||
|
||
### armenia
|
||
```
|
||
4 Error(Right(Closed)) }))
|
||
1 Invalid argument
|
||
1 Failed with result 'timeout'.
|
||
```
|
||
|
||
|
||
## 6. Soak watchdog (5-minute polls)
|
||
|
||
`montana-soak.timer` writes one JSON line per poll to
|
||
`/var/lib/montana-soak/soak.jsonl` on the Moscow orchestrator covering
|
||
all four nodes. The 24-hour continuous record is the empirical evidence
|
||
for the Noise_PQ XX cross-machine soak (DEV-014 Phase 3 part 3
|
||
acceptance).
|
||
|
||
Total soak records to date: 54.
|
||
|
||
Last 3 records (one line per poll):
|
||
|
||
```
|
||
0,"closed_5m":0
|
||
0},{"label":"armenia","host":"yerevan","active":"active","window":75845,"phase":"CandidateVdf","D":325000000,"hb_5m":2204,"err_5m":0
|
||
0,"closed_5m":4}]}
|
||
```
|
||
|
||
|
||
## 7. Mainnet release candidate verdict
|
||
|
||
| Component | State |
|
||
|-----------|-------|
|
||
| Noise_PQ XX production transport | active across four-node mesh (3 Genesis + 1 external operator) |
|
||
| Genesis manifest auto-sync | live (10-min timer) |
|
||
| VPN key auto-sync | live (5-min timer) |
|
||
| Explorer auto-discovery | live (1-min collector at /var/www/efir/explorer/data.json) |
|
||
| Soak watchdog | live (5-min timer at /var/lib/montana-soak/soak.jsonl) |
|
||
| External-operator onboarding | verified end-to-end on a fresh Yerevan VPS in ~16 min |
|
||
| Sixteen Metzdowd findings | 12 closed by construction + 2 rejected with citation + MONT-001/MONT-002/MONT-004 + DEV-014 all closed |
|
||
| DEV-012 follower drift fix | **closed** (commit e1a0bd0 follower_skip flag) |
|
||
| DEV-012 multi-confirmer protocol | **open** for v1.0.0 promotion (BundledConfirmation cross-node aggregation + quorum) |
|
||
| M7 fast-sync | **open** for v1.0.0 promotion (snapshot-based onboarding for million-account scale) |
|
||
|
||
The network is suitable for a public release candidate v1.0.0-rc.2
|
||
including the DEV-012 partial close (commit e1a0bd0). The two open items
|
||
above are the explicit gates for promotion to v1.0.0 mainnet.
|