Lysis VM

Third VM: structural template instantiation with hash-consing and frame-spill heap. The canonical example of Bytecode-Oriented Compilation in Achronyme.

Why Lysis exists

Akron and Artik both execute once per proof. Lysis is different: it separates compile/instantiate from runtime emission. The walker emits a Lysis program that, when run, produces SSA IR — and the executor’s interning sink collapses structurally identical sub-trees during emission.

Concretely: the legacy eager template expander unrolled every loop and inlined every sub-template up front. Lysis defers that expansion to runtime and dedupes redundant sub-graphs via hash-consing. Current in-tree comments and tests identify SHA-256(64)-shaped circuits as the motivating scale target; exact wall-clock and constraint-delta figures are benchmark outputs, not architectural invariants.

Bytecode-Oriented Compilation (BOC)

Lysis is the canonical example of what we call Bytecode-Oriented Compilation (BOC) inside Achronyme: a compiler architecture where the compilation step itself is a bytecode program, and the output of compilation is the side-effect of running that bytecode on a VM.

The shape:

Source → Bytecode → Run on VM → Compile output

Two distinctions worth being explicit about, since BOC sits next to several closer-named patterns:

BOC is not “a compiler that emits bytecode.” Java’s javac and CPython produce bytecode as their final artifact — that bytecode is data, consumed later by a separate runtime. In BOC the bytecode is consumed as part of compilation itself: every step the VM executes pushes an entry into the output. In Lysis’s case, those entries are SSA IR nodes that go on to become R1CS constraints. The bytecode is the engine of compilation, not its product.
BOC is heterogeneous staged compilation. The closest neighbors in the literature are multi-stage programming (MetaOCaml, Template Haskell), compile-time function execution (Zig comptime, D CTFE), and metacompilation (Forth). All of them are typically homogeneous — stage-2 produces stage-1-shaped code, baked into the same target binary. BOC is heterogeneous: Lysis bytecode is a different ISA from its output, and the output is a constraint system, not an executable.

The practical consequence is that BOC pipelines fail and scale like programs, not like data structures. The frame-overflow handling, the spill heap, the validator’s single-static-store rule, the 8192 default call depth — none of these would exist for a passive IR. They exist because the meta-program is a real program subject to real resource limits, and the compiler’s compile-time is reified as this VM’s runtime. When Lysis pegs against a limit, the fix is rarely “change the data shape”; it is “give the executor more capacity” or “teach the executor a new opcode.” That is the signature of BOC.

Where Lysis sits

The walker is the IR-side lifter: it consumes ExtendedInstruction<F> (from ir-forge) and produces Lysis bytecode. The executor is the VM side: it runs that bytecode and emits SSA via a sink.

Lift pipeline (ir-forge::lysis_lift)

File: ir-forge/src/lysis_lift/mod.rs. Five stages:

Stage	Module	Deliverable
Symbolic walk	`ir-forge/src/lysis_lift/symbolic.rs`	`SymbolicTree` with placeholder slots for loop-var derivatives
Structural diff	`ir-forge/src/lysis_lift/diff.rs`	`Diff` with AST-path slot identity
Binding-Time Analysis	`ir-forge/src/lysis_lift/bta.rs`	3-point classification (Constant / Invariant / DataDependent)
Template extraction	`ir-forge/src/lysis_lift/extract/`	Lambda lifting + frame size, capture layout
Walker (driver)	`ir-forge/src/lysis_lift/walker/`	Orchestrates the full lift pass

All five implement deferred evaluation — templates remain symbolic until proof-time.

Bytecode header

0..4    magic "LYSI"
4..6    version (u16 LE)          — current: 2 (v1 streams decode with heap_size_hint = 0)
6       field_family (u8)         — BnLike256 | Goldilocks64
7       flags (u8)                — bit 0: has_witness_calls
8..12   const_pool_len (u32 LE)
12..16  body_len (u32 LE)
16..20  heap_size_hint (u32 LE)   — spill heap size in entries (v2 only)
20..…   const_pool (FieldConst entries + bytecode blobs for Artik calls)
…..     body (instruction stream)

The decoder dispatches on version and reads the v1-shaped header (16 bytes) when it sees a v1 stream, defaulting heap_size_hint to 0.

Register / frame model

Registers: 256 per frame, stored as Vec<Option<FieldElement<F>>> (undefined until written).
Frame stack: Vec<Frame> with pc, body_start_idx, body_end_idx, template_id, loop_stack: Vec<LoopState>.
Loop state: each LoopUnroll pushes a LoopState tracking iteration bounds; the dispatcher’s advance_loops() step increments and detects completion.
Default call depth: 8192 (raised from the original 64 to accommodate deep template trees like SHA-256 message schedules).

Spill heap

The frame heap is a flat Vec<FieldElement<F>> allocated per program execution, sized by heap_size_hint. It holds values that outlive the frame that wrote them — captured loop-invariant values, materialised arrays from rolled loops, witness-call buffers — that don’t fit into the 256-register window of the active frame.

Each heap slot is governed by a single-static-store invariant: a slot is written exactly once per execution. The validator rejects programs that emit two StoreHeap to the same slot, and the executor traps on a LoadHeap from an unwritten slot. This makes the heap behave like SSA registers extended into a wider address space, with the same dataflow guarantees the rest of the VM relies on.

Opcode table (37 opcodes)

File: lysis/src/bytecode/opcode.rs.

Code	Mnemonic	Operands	Semantics
0x01	`LoadCapture`	dst, idx	`R[dst] = captures[idx]`
0x02	`LoadConst`	dst, idx	`R[dst] = const_pool[idx]`
0x03	`LoadInput`	dst, name_idx, vis	`R[dst] = inputs[name_idx]` (Public/Witness)
0x04	`EnterScope`	—	scope_depth += 1
0x05	`ExitScope`	—	scope_depth −= 1
0x10	`Jump`	offset	`PC += offset`
0x11	`JumpIf`	cond, offset	conditional
0x12	`Return`	—	pop frame
0x13	`Halt`	—	end execution
0x14	`Trap`	code	abort
0x20	`LoopUnroll`	iter_var, start, end, body_len	inline body N times
0x21	`LoopRolled`	iter_var, start, end, template_id	call template repeatedly
0x22	`LoopRange`	iter_var, R(end), template_id	dynamic-bound loop
0x30	`DefineTemplate`	template_id, frame_size, n_params, body_offset, body_len	register template metadata
0x31	`InstantiateTemplate`	template_id, capture_regs, output_regs	push frame + call
0x32	`TemplateOutput`	output_idx, src	publish output value
0x40	`EmitConst`	dst, src	emit `Const(R[src])`
0x41	`EmitAdd`	dst, lhs, rhs	emit `Add`
0x42	`EmitSub`	dst, lhs, rhs	emit `Sub`
0x43	`EmitMul`	dst, lhs, rhs	emit `Mul`
0x44	`EmitNeg`	dst, src	emit `Neg`
0x45	`EmitMux`	dst, cond, t, f	emit `Mux`
0x46	`EmitDecompose`	dst_arr, src, n_bits	emit `Decompose`
0x47	`EmitAssertEq`	lhs, rhs	emit `AssertEq`
0x48	`EmitRangeCheck`	var, max_bits	emit `RangeCheck`
0x49	`EmitWitnessCall`	bytecode_idx, in_regs, out_regs	emit `WitnessCall` (Artik)
0x4A	`EmitPoseidonHash`	dst, in_regs	emit `PoseidonHash`
0x4B	`EmitIsEq`	dst, lhs, rhs	emit `IsEq`
0x4C	`EmitIsLt`	dst, lhs, rhs	emit `IsLt`
0x4D	`EmitIntDiv`	dst, lhs, rhs, max_bits	emit `IntDiv`
0x4E	`EmitIntMod`	dst, lhs, rhs, max_bits	emit `IntMod`
0x4F	`EmitDiv`	dst, lhs, rhs	emit field-`Div`
0x50	`StoreHeap`	slot, src	spill `R[src]` into heap slot (single-static-store)
0x51	`LoadHeap`	dst, slot	read heap slot back into a register
0x52	`EmitWitnessCallHeap`	bytecode_idx, in_descs, out_regs	`WitnessCall` whose inputs may live in heap slots
0x53	`EmitAssertEqMsg`	lhs, rhs, msg_idx	`AssertEq` carrying a constant-pool message id
0x54	`EmitIsLtBounded`	dst, lhs, rhs, max_bits	emit bounded `IsLt`

Emit* opcodes don’t compute results — they push entries into the sink.

Sinks

The dispatcher writes IR-emitting opcodes into a IrSink trait object. Two implementations exist:

StubSink — no-op, used by skeleton tests that exercise control flow without materialising IR.
InterningSink — hash-cons via IndexMap<NodeKey, NodeId>. Default in production. Materialises a flat Vec<Instruction<F>> via sink.materialize().

Dispatcher loop

File: lysis/src/execute/mod.rs. Outer loop over frames + inner step:

loop {
    check_instruction_budget()?;
    let instr = program.body[frame.pc];
    match dispatch(instr, ...)? {
        Step::Next => frame.pc += 1,
        Step::JumpToIndex(idx) => frame.pc = idx,
        Step::PushFrame(f) => stack.push(f),
        Step::PopFrame => stack.pop(),
        Step::Halt => break,
    }
    advance_loops()?;
}

Memory model

No GC. Frame registers live in the frame struct. The interner internally uses IndexMap<NodeKey, NodeId> borrowed mutably during execution. The spill heap is a single Vec<FieldElement<F>> owned by the executor for the duration of one program run, reset between executions.

Lift output → walker

File: ir-forge/src/lysis_lift/walker/. Driver that consumes Vec<ExtendedInstruction<F>>:

Variant	Walker action
`Plain(ir::Instruction)`	emit corresponding `Emit*` opcode
`LoopUnroll { iter_var, start, end, body }`	emit `LoopUnroll` opcode
`TemplateCall {…}`	emit `InstantiateTemplate`; spill long-lived captures via `StoreHeap`
`TemplateBody {…}`	bound to a `DefineTemplate` block; rolled-loop bodies live here

When a body is too wide to keep its values in the 256-register window — the SHA-256 round body is the canonical example — the walker drops to per-iteration unroll and routes loop-carried values through heap slots so each iteration starts with a fresh register file.

E2E coverage

Circuit	Pipeline	Result
SHA-256(64)	walker (per-iter unroll, heap-routed) → optimize → R1CS	digest equivalence and full R1CS witness verification are covered by ignored regression tests in `circom/tests/e2e_sha256/witness.rs`
Poseidon(2)	structural lift → R1CS → Groth16	proof-generation path covered by Groth16 and Solidity E2E tests
EdDSAPoseidon	structural lift + heap captures	compile + R1CS clean
MiMCSponge(2,220,1)	structural lift → R1CS	covered by circomlib E2E and benchmark harnesses; exact constraint counts belong in benchmark output
Adversarial constraint preservation	benchmark + cross-mode tests	round-trip and preservation tests check byte-identical R1CS where that property is required

Open questions

Current benchmark numbers for SHA-256(64), MiMCSponge, and circom O2 deltas should be regenerated from the benchmark harness before publishing exact counts or wall-clock timings as facts.

Source files

Component	File
Bytecode opcodes	`lysis/src/bytecode/opcode.rs`
Encoding	`lysis/src/bytecode/encoding.rs`
Validator	`lysis/src/bytecode/validate.rs`
Const pool	`lysis/src/bytecode/const_pool.rs`
Header (v1/v2)	`lysis/src/header.rs`
Dispatcher	`lysis/src/execute/mod.rs`
Frame	`lysis/src/execute/frame.rs`
Heap dispatch	`lysis/src/execute/dispatch/heap.rs`
Interning sink	`lysis/src/execute/interning_sink.rs`
Interner	`lysis/src/intern/interner/`
Builder	`lysis/src/builder.rs`
Walker (lift)	`ir-forge/src/lysis_lift/walker/`
BTA classifier	`ir-forge/src/lysis_lift/bta.rs`
Symbolic tree	`ir-forge/src/lysis_lift/symbolic.rs`
Structural diff	`ir-forge/src/lysis_lift/diff.rs`
Template extraction	`ir-forge/src/lysis_lift/extract/`

See ProveIR for the layer feeding Lysis. See Akron VM and Artik Witness VM for the other two VMs.