montana/Montana-Protocol/External-Audit/montana-deep-retrospective-2026-05-21.md
2026-05-26 21:14:51 +03:00

134 lines
5.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Montana deep retrospective — 2026-05-21
Generated 2026-05-21T19:40:02Z on Moscow orchestrator.
## 1. Current state of all four nodes
| Node | Phase | current_window | uptime since | CPU 1-min load |
|------|-------|----------------|--------------|----------------|
| moscow | Active | 75849 | Thu 2026-05-21 22:24:54 MSK | 3.97 |
| frankfurt | CandidateVdf | 75847 | Thu 2026-05-21 19:29:17 UTC | 1.22 |
| helsinki | CandidateVdf | 75787 | Thu 2026-05-21 22:24:15 EEST | 7.68 |
| armenia | CandidateVdf | 75845 | Thu 2026-05-21 23:26:36 +04 | 1.26 |
## 2. Heartbeat health (last 1 hour)
Each node sends Ping every 5 s to every connected peer. Expected baseline
per hour per peer = 720. Three Genesis peers + Armenia = ~2880 heartbeats
per node per hour in steady state.
| Node | heartbeat OK | outgoing errors | connection closed |
|------|--------------|------------------|--------------------|
| moscow | 47934 | 0 | 8 |
| frankfurt | 77878 | 2 | 6 |
| helsinki | 91416 | 0
? | 610 |
| armenia | 2194 | 0
? | 4 |
## 3. Consensus state convergence
Moscow is the canonical bootstrap proposer. Followers replay Moscow's
Proposal envelopes through the apply_proposal path on each incoming
broadcast. Lag = (Moscow.current_window) (follower.current_window),
positive means the follower is behind, expected ≤ 1 for the steady
state once the follower has caught up.
| Node | current_window | lag vs Moscow (75851) |
|------|----------------|------------------------------|
| moscow | 75851 | 0 (proposer) |
| frankfurt | 75848 | 3 |
| helsinki | 75787 | 64 |
| armenia | 75845 | 6 |
## 4. Resource pressure on each operator host
| Node | cores | RAM (MB) | mem used (MB) | swap used (MB) | montana-node RSS (MB) |
|------|-------|----------|---------------|-----------------|------------------------|
| moscow | 1 | 1968 | 677 | 376 | 5 |
| frankfurt | 1 | 1967 | 558 | 368 | 8 |
| helsinki | 1 | 961 | 541 | 581 | 3 |
| armenia | 1 | 961 | 330 | 447 | 8 |
## 5. Frequent error / warning lines (last 24 hours)
### moscow
```
5341 Main process exited, code=exited, status=1/FAILURE
5341 Failed with result 'exit-code'.
5339 Permission denied (os error 13)
5 Error(Right(Closed)) }))
2 No space left on device [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
1 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
1 "Connection timed out" })))) }))
1 "Connection timed out" }))) }))
```
### frankfurt
```
602 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
6 Error(Right(Closed)) }))
6 "Connection reset by peer" })))) }))
2 [active W=82123] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
2 [active W=82122] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
2 [active W=82121] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
2 [active W=82120] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
2 [active W=82119] singleton невозможен (NodeTable=2 узлов), пропуск окна — жду peer Proposal (M9 Phase 2)
```
### helsinki
```
600 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
8 "Connection reset by peer" })))) }))
5 Error(Right(Closed)) }))
3 Failed with result 'timeout'.
2 Invalid argument
1 Connection refused (os error 111))]
```
### armenia
```
4 Error(Right(Closed)) }))
1 Invalid argument
1 Failed with result 'timeout'.
```
## 6. Soak watchdog (5-minute polls)
`montana-soak.timer` writes one JSON line per poll to
`/var/lib/montana-soak/soak.jsonl` on the Moscow orchestrator covering
all four nodes. The 24-hour continuous record is the empirical evidence
for the Noise_PQ XX cross-machine soak (DEV-014 Phase 3 part 3
acceptance).
Total soak records to date: 54.
Last 3 records (one line per poll):
```
0,"closed_5m":0
0},{"label":"armenia","host":"yerevan","active":"active","window":75845,"phase":"CandidateVdf","D":325000000,"hb_5m":2204,"err_5m":0
0,"closed_5m":4}]}
```
## 7. Mainnet release candidate verdict
| Component | State |
|-----------|-------|
| Noise_PQ XX production transport | active across four-node mesh (3 Genesis + 1 external operator) |
| Genesis manifest auto-sync | live (10-min timer) |
| VPN key auto-sync | live (5-min timer) |
| Explorer auto-discovery | live (1-min collector at /var/www/efir/explorer/data.json) |
| Soak watchdog | live (5-min timer at /var/lib/montana-soak/soak.jsonl) |
| External-operator onboarding | verified end-to-end on a fresh Yerevan VPS in ~16 min |
| Sixteen Metzdowd findings | 12 closed by construction + 2 rejected with citation + MONT-001/MONT-002/MONT-004 + DEV-014 all closed |
| DEV-012 follower drift fix | **closed** (commit e1a0bd0 follower_skip flag) |
| DEV-012 multi-confirmer protocol | **open** for v1.0.0 promotion (BundledConfirmation cross-node aggregation + quorum) |
| M7 fast-sync | **open** for v1.0.0 promotion (snapshot-based onboarding for million-account scale) |
The network is suitable for a public release candidate v1.0.0-rc.2
including the DEV-012 partial close (commit e1a0bd0). The two open items
above are the explicit gates for promotion to v1.0.0 mainnet.