One narrow-ALU macro-cycle, chunked across a wide virtual state, is now machine-checked bit-exactly equivalent to one native wide-W driftStep — for any physical chunk width $P \geq 1$, any chunk count $N$, and any input. The full chain is proven in Lean 4 against the pinned Mathlib, axiom-clean: [propext, Classical.choice, Quot.sound] only, no sorry.
The standard Drift Core executes the recurrence $S_{t+1} = \mathrm{Fold}((3 S_t + 1) \gg 1)$ in a single cycle on a state of native width $W$. The Drift-Flex embodiment (US PPA 63/929,897, Claims 1–2) trades that single-cycle native width for a sequence of macro-cycles: a narrower physical ALU of width $P$ iterates over a virtual state of total width $V = P \cdot N$ stored as $N$ chunks. Per macro-cycle, the chunked computation performs Phase A (a multi-precision $3\cdot\text{value}+1$ ripple across all $N$ chunks with cross-chunk carry propagation) followed by Phase B (a cross-chunk right shift by one). The result is the same updated state — just produced in $N$ ALU cycles instead of one.
Wall-clock per state update trades multiplicatively with $N$. A native width-1024 core produces one new state per cycle; the chunked $P=32, N=32$ Drift-Flex produces one per 32 cycles. The reverse trade is silicon area: chunked variants land at the SKY130 architectural floor (~12 GE per virtual-state bit at W=1024) versus ~22 GE/bit for the agile-state native-width family.
The output bitstream is identical — not approximately, but bit-for-bit. Every virtual-step result from flex_macro P cs equals the corresponding result from driftStep (P·N) on the assembled value. This is the theorem on the right.
The headline theorem and its supporting lemmas are mechanized in Lean 4 against Mathlib's pinned distribution. The crux is phaseA_exact: chunked multi-precision $3 \cdot c + \text{carry}$ ripple reconstructs the exact integer $3 \cdot \text{value} + \text{carry}_0$, by induction on chunk count. The composition lemma flex_macro_eq_driftStep closes the loop — one macro-cycle of the chunked machine equals one native wide driftStep, for all widths and inputs.
-- Verified File: DriftFlex.lean
-- Drift-Flex temporal-extension bit-exact equivalence (PPA 63/929,897 Claims 1-2).
import Mathlib.Tactic
import Mathlib.Data.BitVec
import FormalVerification.DriftRecurrence
namespace DriftFlex
/-- **Two-phase decomposition**: the native driftStep on a wide state
factors exactly into Phase A (3*value + 1 ripple) followed by
Phase B (cross-chunk right shift by one). -/
theorem driftStep_factor (V : â„•) (s : Fin (2 ^ V)) :
driftStep V s = phaseB V (phaseA V s.val) := by
rfl
/-- **Phase A multi-precision correctness** (the crux lemma): chunked
`3*chunk + carry_in` ripple across N chunks reconstructs the exact
integer `3*value + carry_0`. Proof by induction on chunk count;
holds for any chunk count and any chunk width P ≥ 1. -/
theorem phaseA_exact {P : ℕ} (hP : 1 ≤ P) (cs : List ℕ)
(hbound : ∀ c ∈ cs, c < 2 ^ P) (carry₀ : ℕ) :
chunkValue P (phaseAChunks P carryâ‚€ cs)
= 3 * chunkValue P cs + carryâ‚€ := by
-- induction body omitted in display; see file
...
/-- **Phase A modular form**: Phase A on the assembled value equals
`(3 * value + 1) mod 2^V` exactly. Corollary of `phaseA_exact`
with `carryâ‚€ = 1` and a chunk-count argument. -/
theorem phaseA_mod {P : ℕ} (hP : 1 ≤ P) (cs : List ℕ)
(hbound : ∀ c ∈ cs, c < 2 ^ P) :
chunkValue P (phaseAChunks P 1 cs)
= (3 * chunkValue P cs + 1) % 2 ^ (P * cs.length) := by
...
/-- **Phase B value preservation**: the chunked cross-chunk right-shift
by one yields exactly value / 2 of the assembled value. -/
theorem phaseB_value {P : ℕ} (hP : 1 ≤ P) (cs : List ℕ) :
chunkValue P (phaseBChunks P cs) = chunkValue P cs / 2 := by
...
/-- **HEADLINE**: one narrow-ALU macro-cycle of the chunked machine
(Phase A then Phase B over the P-bit chunks) is bit-exactly equal
to one native wide driftStep(P*N) of the assembled value, for any
physical chunk width P ≥ 1 and any chunk count N ≥ 1.
This is the formal statement of PPA 63/929,897 Claims 1-2. -/
theorem flex_macro_eq_driftStep
{P : ℕ} (hP : 1 ≤ P) (cs : List ℕ)
(hbound : chunkValue P cs < 2 ^ (P * cs.length)) :
chunkValue P (flexMacro P cs)
= (driftStep (P * cs.length) ⟨chunkValue P cs, hbound⟩).val := by
...
end DriftFlexEvery theorem is fully discharged in formal_verification/DriftFlex.lean with no sorry and no axioms beyond Mathlib's standard library. Axiom audit confirms dependence only on [propext, Classical.choice, Quot.sound]. The chunk-bound hypothesis in phaseA_exact is propagated through to the headline; the bound at the headline is exactly the well-formedness of the input as a Fin (2 ^ V).
SCOPE NOTE. What this theorem proves: functional bit-exactness of the chunked arithmetic versus the native recurrence, for the general case (all widths, all inputs). What it does not prove: RTL FSM and handshake correspondence (an RTL-equivalence obligation discharged separately by the iverilog campaign across the 13 measured variants), nor any cryptographic strength property (an open cryptanalytic question, per compute.html §3 and the Drift design rationale). The earlier iverilog campaign empirically validated 13 specific variants over 1,000 steps each; this Lean proof generalizes that to all variants over all inputs.