Lysis VM
Third VM: structural template instantiation with hash-consing and frame-spill heap. The canonical example of Bytecode-Oriented Compilation in Achronyme.
Why Lysis exists
Akron and Artik both execute once per proof. Lysis is different: it separates compile/instantiate from runtime emission. The walker emits a Lysis program that, when run, produces SSA IR — and the executor’s interning sink collapses structurally identical sub-trees during emission.
Concretely: the legacy eager template expander unrolled every loop and inlined every sub-template up front. Lysis defers that expansion to runtime and dedupes redundant sub-graphs via hash-consing. Current in-tree comments and tests identify SHA-256(64)-shaped circuits as the motivating scale target; exact wall-clock and constraint-delta figures are benchmark outputs, not architectural invariants.
Bytecode-Oriented Compilation (BOC)
Lysis is the canonical example of what we call Bytecode-Oriented Compilation (BOC) inside Achronyme: a compiler architecture where the compilation step itself is a bytecode program, and the output of compilation is the side-effect of running that bytecode on a VM.
The shape:
Source → Bytecode → Run on VM → Compile output
Two distinctions worth being explicit about, since BOC sits next to several closer-named patterns:
- BOC is not “a compiler that emits bytecode.” Java’s
javacand CPython produce bytecode as their final artifact — that bytecode is data, consumed later by a separate runtime. In BOC the bytecode is consumed as part of compilation itself: every step the VM executes pushes an entry into the output. In Lysis’s case, those entries are SSA IR nodes that go on to become R1CS constraints. The bytecode is the engine of compilation, not its product. - BOC is heterogeneous staged compilation. The closest neighbors in the literature are multi-stage programming (MetaOCaml, Template Haskell), compile-time function execution (Zig comptime, D CTFE), and metacompilation (Forth). All of them are typically homogeneous — stage-2 produces stage-1-shaped code, baked into the same target binary. BOC is heterogeneous: Lysis bytecode is a different ISA from its output, and the output is a constraint system, not an executable.
The practical consequence is that BOC pipelines fail and scale like programs, not like data structures. The frame-overflow handling, the spill heap, the validator’s single-static-store rule, the 8192 default call depth — none of these would exist for a passive IR. They exist because the meta-program is a real program subject to real resource limits, and the compiler’s compile-time is reified as this VM’s runtime. When Lysis pegs against a limit, the fix is rarely “change the data shape”; it is “give the executor more capacity” or “teach the executor a new opcode.” That is the signature of BOC.
Where Lysis sits
The walker is the IR-side lifter: it consumes ExtendedInstruction<F> (from ir-forge) and produces Lysis bytecode. The executor is the VM side: it runs that bytecode and emits SSA via a sink.
Lift pipeline (ir-forge::lysis_lift)
File: ir-forge/src/lysis_lift/mod.rs. Five stages:
| Stage | Module | Deliverable |
|---|---|---|
| Symbolic walk | ir-forge/src/lysis_lift/symbolic.rs | SymbolicTree with placeholder slots for loop-var derivatives |
| Structural diff | ir-forge/src/lysis_lift/diff.rs | Diff with AST-path slot identity |
| Binding-Time Analysis | ir-forge/src/lysis_lift/bta.rs | 3-point classification (Constant / Invariant / DataDependent) |
| Template extraction | ir-forge/src/lysis_lift/extract/ | Lambda lifting + frame size, capture layout |
| Walker (driver) | ir-forge/src/lysis_lift/walker/ | Orchestrates the full lift pass |
All five implement deferred evaluation — templates remain symbolic until proof-time.
Bytecode header
0..4 magic "LYSI"
4..6 version (u16 LE) — current: 2 (v1 streams decode with heap_size_hint = 0)
6 field_family (u8) — BnLike256 | Goldilocks64
7 flags (u8) — bit 0: has_witness_calls
8..12 const_pool_len (u32 LE)
12..16 body_len (u32 LE)
16..20 heap_size_hint (u32 LE) — spill heap size in entries (v2 only)
20..… const_pool (FieldConst entries + bytecode blobs for Artik calls)
….. body (instruction stream)
The decoder dispatches on version and reads the v1-shaped header (16 bytes) when it sees a v1 stream, defaulting heap_size_hint to 0.
Register / frame model
- Registers: 256 per frame, stored as
Vec<Option<FieldElement<F>>>(undefined until written). - Frame stack:
Vec<Frame>withpc,body_start_idx,body_end_idx,template_id,loop_stack: Vec<LoopState>. - Loop state: each
LoopUnrollpushes aLoopStatetracking iteration bounds; the dispatcher’sadvance_loops()step increments and detects completion. - Default call depth: 8192 (raised from the original 64 to accommodate deep template trees like SHA-256 message schedules).
Spill heap
The frame heap is a flat Vec<FieldElement<F>> allocated per program execution, sized by heap_size_hint. It holds values that outlive the frame that wrote them — captured loop-invariant values, materialised arrays from rolled loops, witness-call buffers — that don’t fit into the 256-register window of the active frame.
Each heap slot is governed by a single-static-store invariant: a slot is written exactly once per execution. The validator rejects programs that emit two StoreHeap to the same slot, and the executor traps on a LoadHeap from an unwritten slot. This makes the heap behave like SSA registers extended into a wider address space, with the same dataflow guarantees the rest of the VM relies on.
Opcode table (37 opcodes)
File: lysis/src/bytecode/opcode.rs.
| Code | Mnemonic | Operands | Semantics |
|---|---|---|---|
| 0x01 | LoadCapture | dst, idx | R[dst] = captures[idx] |
| 0x02 | LoadConst | dst, idx | R[dst] = const_pool[idx] |
| 0x03 | LoadInput | dst, name_idx, vis | R[dst] = inputs[name_idx] (Public/Witness) |
| 0x04 | EnterScope | — | scope_depth += 1 |
| 0x05 | ExitScope | — | scope_depth −= 1 |
| 0x10 | Jump | offset | PC += offset |
| 0x11 | JumpIf | cond, offset | conditional |
| 0x12 | Return | — | pop frame |
| 0x13 | Halt | — | end execution |
| 0x14 | Trap | code | abort |
| 0x20 | LoopUnroll | iter_var, start, end, body_len | inline body N times |
| 0x21 | LoopRolled | iter_var, start, end, template_id | call template repeatedly |
| 0x22 | LoopRange | iter_var, R(end), template_id | dynamic-bound loop |
| 0x30 | DefineTemplate | template_id, frame_size, n_params, body_offset, body_len | register template metadata |
| 0x31 | InstantiateTemplate | template_id, capture_regs, output_regs | push frame + call |
| 0x32 | TemplateOutput | output_idx, src | publish output value |
| 0x40 | EmitConst | dst, src | emit Const(R[src]) |
| 0x41 | EmitAdd | dst, lhs, rhs | emit Add |
| 0x42 | EmitSub | dst, lhs, rhs | emit Sub |
| 0x43 | EmitMul | dst, lhs, rhs | emit Mul |
| 0x44 | EmitNeg | dst, src | emit Neg |
| 0x45 | EmitMux | dst, cond, t, f | emit Mux |
| 0x46 | EmitDecompose | dst_arr, src, n_bits | emit Decompose |
| 0x47 | EmitAssertEq | lhs, rhs | emit AssertEq |
| 0x48 | EmitRangeCheck | var, max_bits | emit RangeCheck |
| 0x49 | EmitWitnessCall | bytecode_idx, in_regs, out_regs | emit WitnessCall (Artik) |
| 0x4A | EmitPoseidonHash | dst, in_regs | emit PoseidonHash |
| 0x4B | EmitIsEq | dst, lhs, rhs | emit IsEq |
| 0x4C | EmitIsLt | dst, lhs, rhs | emit IsLt |
| 0x4D | EmitIntDiv | dst, lhs, rhs, max_bits | emit IntDiv |
| 0x4E | EmitIntMod | dst, lhs, rhs, max_bits | emit IntMod |
| 0x4F | EmitDiv | dst, lhs, rhs | emit field-Div |
| 0x50 | StoreHeap | slot, src | spill R[src] into heap slot (single-static-store) |
| 0x51 | LoadHeap | dst, slot | read heap slot back into a register |
| 0x52 | EmitWitnessCallHeap | bytecode_idx, in_descs, out_regs | WitnessCall whose inputs may live in heap slots |
| 0x53 | EmitAssertEqMsg | lhs, rhs, msg_idx | AssertEq carrying a constant-pool message id |
| 0x54 | EmitIsLtBounded | dst, lhs, rhs, max_bits | emit bounded IsLt |
Emit* opcodes don’t compute results — they push entries into the sink.
Sinks
The dispatcher writes IR-emitting opcodes into a IrSink trait object. Two implementations exist:
StubSink— no-op, used by skeleton tests that exercise control flow without materialising IR.InterningSink— hash-cons viaIndexMap<NodeKey, NodeId>. Default in production. Materialises a flatVec<Instruction<F>>viasink.materialize().
Dispatcher loop
File: lysis/src/execute/mod.rs. Outer loop over frames + inner step:
loop {
check_instruction_budget()?;
let instr = program.body[frame.pc];
match dispatch(instr, ...)? {
Step::Next => frame.pc += 1,
Step::JumpToIndex(idx) => frame.pc = idx,
Step::PushFrame(f) => stack.push(f),
Step::PopFrame => stack.pop(),
Step::Halt => break,
}
advance_loops()?;
}
Memory model
No GC. Frame registers live in the frame struct. The interner internally uses IndexMap<NodeKey, NodeId> borrowed mutably during execution. The spill heap is a single Vec<FieldElement<F>> owned by the executor for the duration of one program run, reset between executions.
Lift output → walker
File: ir-forge/src/lysis_lift/walker/. Driver that consumes Vec<ExtendedInstruction<F>>:
| Variant | Walker action |
|---|---|
Plain(ir::Instruction) | emit corresponding Emit* opcode |
LoopUnroll { iter_var, start, end, body } | emit LoopUnroll opcode |
TemplateCall {…} | emit InstantiateTemplate; spill long-lived captures via StoreHeap |
TemplateBody {…} | bound to a DefineTemplate block; rolled-loop bodies live here |
When a body is too wide to keep its values in the 256-register window — the SHA-256 round body is the canonical example — the walker drops to per-iteration unroll and routes loop-carried values through heap slots so each iteration starts with a fresh register file.
E2E coverage
| Circuit | Pipeline | Result |
|---|---|---|
| SHA-256(64) | walker (per-iter unroll, heap-routed) → optimize → R1CS | digest equivalence and full R1CS witness verification are covered by ignored regression tests in circom/tests/e2e_sha256/witness.rs |
| Poseidon(2) | structural lift → R1CS → Groth16 | proof-generation path covered by Groth16 and Solidity E2E tests |
| EdDSAPoseidon | structural lift + heap captures | compile + R1CS clean |
| MiMCSponge(2,220,1) | structural lift → R1CS | covered by circomlib E2E and benchmark harnesses; exact constraint counts belong in benchmark output |
| Adversarial constraint preservation | benchmark + cross-mode tests | round-trip and preservation tests check byte-identical R1CS where that property is required |
Open questions
- Current benchmark numbers for SHA-256(64), MiMCSponge, and circom O2 deltas should be regenerated from the benchmark harness before publishing exact counts or wall-clock timings as facts.
Source files
| Component | File |
|---|---|
| Bytecode opcodes | lysis/src/bytecode/opcode.rs |
| Encoding | lysis/src/bytecode/encoding.rs |
| Validator | lysis/src/bytecode/validate.rs |
| Const pool | lysis/src/bytecode/const_pool.rs |
| Header (v1/v2) | lysis/src/header.rs |
| Dispatcher | lysis/src/execute/mod.rs |
| Frame | lysis/src/execute/frame.rs |
| Heap dispatch | lysis/src/execute/dispatch/heap.rs |
| Interning sink | lysis/src/execute/interning_sink.rs |
| Interner | lysis/src/intern/interner/ |
| Builder | lysis/src/builder.rs |
| Walker (lift) | ir-forge/src/lysis_lift/walker/ |
| BTA classifier | ir-forge/src/lysis_lift/bta.rs |
| Symbolic tree | ir-forge/src/lysis_lift/symbolic.rs |
| Structural diff | ir-forge/src/lysis_lift/diff.rs |
| Template extraction | ir-forge/src/lysis_lift/extract/ |
See ProveIR for the layer feeding Lysis. See Akron VM and Artik Witness VM for the other two VMs.