

Montana deep retrospective — 2026-05-21

Generated 2026-05-21T19:40:02Z on the Moscow orchestrator.

1. Current state of all four nodes

| Node | Phase | current_window | Uptime since | CPU 1-min load |
|---|---|---|---|---|
| moscow | Active | 75849 | Thu 2026-05-21 22:24:54 MSK | 3.97 |
| frankfurt | CandidateVdf | 75847 | Thu 2026-05-21 19:29:17 UTC | 1.22 |
| helsinki | CandidateVdf | 75787 | Thu 2026-05-21 22:24:15 EEST | 7.68 |
| armenia | CandidateVdf | 75845 | Thu 2026-05-21 23:26:36 +04 | 1.26 |

2. Heartbeat health (last 1 hour)

Each node sends Ping every 5 s to every connected peer. Expected baseline per hour per peer = 720. Three Genesis peers + Armenia = ~2880 heartbeats per node per hour in steady state.
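The baseline arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the peer count of 4 simply mirrors the "Three Genesis peers + Armenia" figure quoted in the text.

```python
# Hypothetical helper: reproduces the baseline arithmetic stated in the report.
PING_INTERVAL_S = 5        # from the report: one Ping every 5 s per connected peer
SECONDS_PER_HOUR = 3600


def expected_heartbeats_per_hour(peer_count: int,
                                 interval_s: int = PING_INTERVAL_S) -> int:
    """Heartbeats one node is expected to send per hour to `peer_count` peers."""
    return peer_count * (SECONDS_PER_HOUR // interval_s)


def ratio_to_baseline(observed: int, baseline: int) -> float:
    """How many times the observed count exceeds (or falls short of) the baseline."""
    return observed / baseline


per_peer = expected_heartbeats_per_hour(1)   # 720
baseline = expected_heartbeats_per_hour(4)   # 2880, the report's steady-state figure

# Observed "heartbeat OK" counts for the last hour, taken from the report's table.
observed = {"moscow": 47934, "frankfurt": 77878, "helsinki": 91416, "armenia": 2194}
for node, count in observed.items():
    print(f"{node}: {count} observed, {ratio_to_baseline(count, baseline):.2f}x baseline")
```

Comparing each node's observed count with the baseline in this way makes it easy to spot nodes that deviate strongly from the expected steady state in either direction.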

| Node | Heartbeat OK | Outgoing errors | Connection closed |
|---|---|---|---|
| moscow | 47934 | 0 | 8 |
| frankfurt | 77878 | 2 | 6 |
| helsinki | 91416 | 0 | 610 |
| armenia | 2194 | 0 | 4 |

3. Consensus state convergence

Moscow is the canonical bootstrap proposer. Followers replay Moscow's Proposal envelopes through the apply_proposal path on each incoming broadcast. Lag = Moscow.current_window - follower.current_window. A positive value means the follower is behind; the expected steady-state lag is ≤ 1 once the follower has caught up.

| Node | current_window | Lag vs Moscow (75851) |
|---|---|---|
| moscow | 75851 | 0 (proposer) |
| frankfurt | 75848 | 3 |
| helsinki | 75787 | 64 |
| armenia | 75845 | 6 |
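As a minimal sketch of the lag definition, the following Python snippet recomputes the lag column from the window values in the table and flags followers above the steady-state bound. The names `window_lag` and `STEADY_STATE_MAX_LAG` are hypothetical and are not taken from the Montana codebase.

```python
# Hypothetical names; the window values are the ones reported in the table above.
STEADY_STATE_MAX_LAG = 1   # the report expects lag <= 1 once a follower has caught up


def window_lag(proposer_window: int, follower_window: int) -> int:
    """Lag = proposer window minus follower window; positive means the follower is behind."""
    return proposer_window - follower_window


windows = {"moscow": 75851, "frankfurt": 75848, "helsinki": 75787, "armenia": 75845}
proposer = windows["moscow"]

lags = {node: window_lag(proposer, w) for node, w in windows.items()}
behind = [node for node, lag in lags.items() if lag > STEADY_STATE_MAX_LAG]

print(lags)
print("above steady-state bound:", behind)
```

With the values from this snapshot, all three followers exceed the steady-state bound, and Helsinki is the clear outlier.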

4. Resource pressure on each operator host

| Node | Cores | RAM (MB) | Mem used (MB) | Swap used (MB) | montana-node RSS (MB) |
|---|---|---|---|---|---|
| moscow | 1 | 1968 | 677 | 376 | 5 |
| frankfurt | 1 | 1967 | 558 | 368 | 8 |
| helsinki | 1 | 961 | 541 | 581 | 3 |
| armenia | 1 | 961 | 330 | 447 | 8 |

5. Frequent error / warning lines (last 24 hours)

moscow

   5341 Main process exited, code=exited, status=1/FAILURE
   5341 Failed with result 'exit-code'.
   5339 Permission denied (os error 13)
      5 Error(Right(Closed)) }))
      2 No space left on device [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
      1 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
      1 "Connection timed out" })))) }))
      1 "Connection timed out" }))) }))

frankfurt

    602 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
      6 Error(Right(Closed)) }))
      6 "Connection reset by peer" })))) }))
      2 [active W=82123] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
      2 [active W=82122] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
      2 [active W=82121] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
      2 [active W=82120] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)
      2 [active W=82119] singleton not possible (NodeTable=2 nodes), skipping window, waiting for peer Proposal (M9 Phase 2)

helsinki

    600 https://docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof" })))) }))
      8 "Connection reset by peer" })))) }))
      5 Error(Right(Closed)) }))
      3 Failed with result 'timeout'.
      2 Invalid argument
      1 Connection refused (os error 111))]

armenia

      4 Error(Right(Closed)) }))
      1 Invalid argument
      1 Failed with result 'timeout'.

6. Soak watchdog (5-minute polls)

montana-soak.timer writes one JSON line per poll to /var/lib/montana-soak/soak.jsonl on the Moscow orchestrator covering all four nodes. The 24-hour continuous record is the empirical evidence for the Noise_PQ XX cross-machine soak (DEV-014 Phase 3 part 3 acceptance).

Total soak records to date: 54.

Last 3 records (one line per poll):

0,"closed_5m":0
0},{"label":"armenia","host":"yerevan","active":"active","window":75845,"phase":"CandidateVdf","D":325000000,"hb_5m":2204,"err_5m":0
0,"closed_5m":4}]}
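A reader for the soak log might look like the sketch below. It is an assumption-laden illustration: only the per-node field names (`label`, `host`, `active`, `window`, `phase`, `D`, `hb_5m`, `err_5m`, `closed_5m`) come from the record fragments above, while the top-level key `nodes`, the `ts` field, and both function names are hypothetical. The reader skips any line that does not parse as a JSON object, so fragmentary lines do not abort the scan.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_soak_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield each line that parses as a JSON object; skip blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # fragment or corrupted line: ignore rather than fail
        if isinstance(record, dict):
            yield record


def nodes_with_incidents(record: Dict, nodes_key: str = "nodes") -> List[str]:
    """Labels of nodes reporting errors or closed connections in the 5-minute window.

    `nodes_key` is an assumed name for the top-level list of per-node entries.
    """
    return [
        node["label"]
        for node in record.get(nodes_key, [])
        if node.get("err_5m", 0) > 0 or node.get("closed_5m", 0) > 0
    ]


# Hypothetical sample: one well-formed record plus one fragment like those shown above.
sample_lines = [
    '{"ts": "hypothetical", "nodes": [{"label": "armenia", "host": "yerevan", '
    '"active": "active", "window": 75845, "phase": "CandidateVdf", "D": 325000000, '
    '"hb_5m": 2204, "err_5m": 0, "closed_5m": 4}]}',
    '0,"closed_5m":4}]}',
]

records = list(iter_soak_records(sample_lines))
print(len(records), nodes_with_incidents(records[0]))
```

In practice the same generator could be fed the open file handle for /var/lib/montana-soak/soak.jsonl instead of the in-memory sample.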

7. Mainnet release candidate verdict

| Component | State |
|---|---|
| Noise_PQ XX production transport | active across four-node mesh (3 Genesis + 1 external operator) |
| Genesis manifest auto-sync | live (10-min timer) |
| VPN key auto-sync | live (5-min timer) |
| Explorer auto-discovery | live (1-min collector at /var/www/efir/explorer/data.json) |
| Soak watchdog | live (5-min timer at /var/lib/montana-soak/soak.jsonl) |
| External-operator onboarding | verified end-to-end on a fresh Yerevan VPS in ~16 min |
| Sixteen Metzdowd findings | 12 closed by construction + 2 rejected with citation + MONT-001/MONT-002/MONT-004 + DEV-014 all closed |
| DEV-012 follower drift fix | closed (commit e1a0bd0 follower_skip flag) |
| DEV-012 multi-confirmer protocol | open for v1.0.0 promotion (BundledConfirmation cross-node aggregation + quorum) |
| M7 fast-sync | open for v1.0.0 promotion (snapshot-based onboarding for million-account scale) |

The network is suitable for a public release candidate v1.0.0-rc.2 including the DEV-012 partial close (commit e1a0bd0). The two open items above are the explicit gates for promotion to v1.0.0 mainnet.