Grammar & Lexer
Token kinds, operator precedence, and EBNF grammar for the Achronyme surface language.
Lexer
Hand-written single-pass byte scanner. No external dependencies (no pest, no logos).
- File: `crates/achronyme-parser/src/lexer.rs`
- Entry: `Lexer::tokenize(source: &str) -> Result<Vec<Token>, ParseError>`
- Handles UTF-8 multibyte input via `advance_multibyte()`; tracks `(line, col, byte_offset)` for every produced token so downstream diagnostics can render carets at the correct visual column even when the source contains non-ASCII identifiers.
- Comments are stripped at lex time:
  - `//` line comment, terminated by newline (or EOF)
  - `/* ... */` block comment, non-nesting (single-pass scanner; nesting would require a counter and is not supported)
- Whitespace (`' '`, `'\t'`, `'\r'`, `'\n'`) is consumed between tokens.
- Numeric literals are recognised by their prefix (see the sketch below):
  - bare digits → `Number`
  - `0p` → `FieldLit` (decimal/hex/binary variant)
  - `0i<width>` → `BigIntLit` (width is the bit width: 256, 384, 512, …)
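A minimal sketch of that prefix dispatch, assuming a hypothetical `classify_numeric` helper (the real lexer decides while scanning bytes, not on a finished lexeme):

```rust
#[derive(Debug, PartialEq)]
enum TokenKind { Number, FieldLit, BigIntLit }

/// Illustrative prefix dispatch for numeric literals; not the actual
/// lexer code, but the classification follows the rules above.
fn classify_numeric(lexeme: &str) -> TokenKind {
    if lexeme.starts_with("0p") {
        TokenKind::FieldLit // 0p123, 0p1A2F, 0p0b1010
    } else if lexeme.starts_with("0i") {
        TokenKind::BigIntLit // 0i256_DEAD..., width follows "0i"
    } else {
        TokenKind::Number // 42, 0, 9999
    }
}

fn main() {
    assert_eq!(classify_numeric("0p1A2F"), TokenKind::FieldLit);
    assert_eq!(classify_numeric("0i256_1F"), TokenKind::BigIntLit);
    assert_eq!(classify_numeric("42"), TokenKind::Number);
}
```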
Token kinds
The full `TokenKind` enum lives in `crates/achronyme-parser/src/token.rs`; below it is grouped by role. A token is a `(kind, span, lexeme)` triple; the lexeme is borrowed from the source for cheap identifier comparison.
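A plausible shape for that triple, reconstructed from the description above; the field names are assumptions, the real definition is in `token.rs`:

```rust
enum TokenKind { /* grouped variants are listed in the sections below */ }

struct Span {
    line: u32,
    col: u32,
    byte_offset: usize,
}

struct Token<'src> {
    kind: TokenKind,
    span: Span,
    lexeme: &'src str, // borrowed from the source: comparison without allocation
}
```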
Literals
```rust
pub enum TokenKind {
    Number,    // 42, 0, 9999
    FieldLit,  // 0p123, 0p1A2F, 0p0b1010
    BigIntLit, // 0i256_DEAD..., 0i384_0b1010..., 0i512_999
    StringLit, // "hello"
    True,      // true
    False,     // false
    Nil,       // nil
    // ...
}
```
Keywords (23 reserved)
```
let      mut      if       else     while
for      in       fn       return   break
continue print    nil      true     false
public   witness  prove    circuit  forever
import   export   as
```
These are matched by exact lexeme after a generic identifier has been scanned; keywords cannot be used as identifiers.
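A sketch of that post-identifier keyword check; only the `True`, `False`, `Nil`, and `Ident` variants are confirmed on this page, the rest of the mapping is assumed:

```rust
enum TokenKind { Ident, True, False, Nil /* ...one variant per keyword */ }

/// After scanning an identifier, promote it to a keyword token if the
/// lexeme matches exactly; illustrative, not the actual lexer table.
fn keyword_or_ident(lexeme: &str) -> TokenKind {
    match lexeme {
        "true" => TokenKind::True,
        "false" => TokenKind::False,
        "nil" => TokenKind::Nil,
        // ...the remaining 20 keywords map to their own variants
        _ => TokenKind::Ident,
    }
}
```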
Identifiers
```rust
pub enum TokenKind {
    Ident, // x, foo_bar, MerkleProof
    // ...
}
```
`IDENT = letter { letter | digit | "_" }`. ASCII letters only at the start; `_` is allowed after the first character.
Operators
```rust
pub enum TokenKind {
    // Arithmetic
    Plus, Minus, Star, Slash, Percent, Caret,
    // Comparison
    Eq, NotEq, Lt, Le, Gt, Ge,
    // Logical
    AndAnd, OrOr, Bang,
    // Assignment
    Assign,     // =
    // Path / member
    ColonColon, // ::
    Dot,        // .
    Question,   // ? (ternary)
    // ...
}
```
Delimiters
```rust
pub enum TokenKind {
    LParen, RParen,     // ( )
    LBrace, RBrace,     // { }
    LBracket, RBracket, // [ ]
    Comma, Colon, Semicolon,
    DotDot, // .. (range)
    Arrow,  // -> (return type, unused in current grammar)
    Eof,
}
```
Operator precedence
Pratt parser. Source: `crates/achronyme-parser/src/parser/tables.rs`. The Pratt loop reads `left_bp` and `right_bp` from `infix_binding_power(token)`; it consumes an operator while `left_bp >= caller_bp` and recurses into the right operand with `right_bp` as the new `caller_bp`.
| Level | Operators | Associativity |
|---|---|---|
| 1 | ^ (pow) | right |
| 2 | *, /, % | left |
| 3 | +, - | left |
| 4 | ==, !=, <, <=, >, >= | left |
| 5 | && | left |
| 6 | || | left |
Lower level number = higher binding power. Exponent is right-associative because in tables.rs its left_bp > right_bp, so a ^ b ^ c parses as a ^ (b ^ c).
Unary - and ! are handled in the prefix table and bind tighter than any binary operator. The ternary cond ? a : b is handled at expression top level (lower than ||) and is right-associative for the chained-ternary case.
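To make the convention concrete, here is a self-contained sketch using illustrative binding-power numbers (the real pairs live in `tables.rs`; only the `left_bp > right_bp` trick for `^` is taken from this page):

```rust
use std::iter::Peekable;

#[derive(Debug)]
enum Expr {
    Num(i64),
    Binary(String, Box<Expr>, Box<Expr>),
}

// (left_bp, right_bp): left_bp < right_bp gives left associativity;
// left_bp > right_bp gives right associativity (see `^`).
fn infix_binding_power(op: &str) -> Option<(u8, u8)> {
    Some(match op {
        "||" => (1, 2),
        "&&" => (3, 4),
        "==" | "!=" | "<" | "<=" | ">" | ">=" => (5, 6),
        "+" | "-" => (7, 8),
        "*" | "/" | "%" => (9, 10),
        "^" => (12, 11),
        _ => return None,
    })
}

fn parse_expr<'a, I>(it: &mut Peekable<I>, caller_bp: u8) -> Expr
where
    I: Iterator<Item = &'a str>,
{
    let mut lhs = Expr::Num(it.next().unwrap().parse().unwrap());
    while let Some(&op) = it.peek() {
        let Some((left_bp, right_bp)) = infix_binding_power(op) else {
            break; // not an infix operator
        };
        if left_bp < caller_bp {
            break; // binds looser than the caller allows
        }
        it.next(); // consume the operator
        let rhs = parse_expr(it, right_bp); // right_bp becomes the new floor
        lhs = Expr::Binary(op.to_string(), Box::new(lhs), Box::new(rhs));
    }
    lhs
}

fn main() {
    let toks = ["2", "^", "3", "^", "2"];
    let mut it = toks.iter().copied().peekable();
    println!("{:?}", parse_expr(&mut it, 0)); // Binary("^", 2, Binary("^", 3, 2))
}
```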
Surface EBNF
```ebnf
program = { stmt } EOF ;
stmt = let_decl | mut_decl | assign | public_decl | witness_decl
| fn_decl | circuit_decl | import | export | import_circuit
| print | return | break | continue | expr_stmt ;
let_decl = "let" IDENT [ ":" type_ann ] "=" expr ";" ;
mut_decl = "mut" IDENT [ ":" type_ann ] "=" expr ";" ;
assign = lvalue "=" expr ";" ;
lvalue = IDENT { "[" expr "]" | "." IDENT } ;
public_decl = "public" input_decl { "," input_decl } ";" ;
witness_decl = "witness" input_decl { "," input_decl } ";" ;
input_decl = IDENT [ ":" type_ann ] ;
fn_decl = "fn" IDENT "(" [ params ] ")" [ ":" type_ann ] block ;
circuit_decl = "circuit" IDENT "(" [ params ] ")" block ;
params = typed_param { "," typed_param } ;
typed_param = IDENT ":" type_ann ;
import = "import" STRING [ "as" IDENT ] ";" ;
export = "export" stmt ;
import_circuit = "import" "circuit" STRING [ "as" IDENT ] ";" ;
block = "{" { stmt } "}" ;
expr = ternary ;
ternary = logical_or [ "?" expr ":" expr ] ;
logical_or = logical_and { "||" logical_and } ;
logical_and = comparison { "&&" comparison } ;
comparison = additive { ("=="|"!="|"<"|"<="|">"|">=") additive } ;
additive = multiplicative { ("+"|"-") multiplicative } ;
multiplicative = power { ("*"|"/"|"%") power } ;
power = unary [ "^" power ] ; (* right-associative *)
unary = ("-"|"!") unary | postfix ;
postfix = primary { post_op } ;
post_op = "(" [ args ] ")"
| "[" expr "]"
| "." IDENT [ "(" [ args ] ")" ] ;
primary = NUMBER | FIELD_LIT | BIGINT_LIT | STRING_LIT
| "true" | "false" | "nil"
| IDENT
| TYPE "::" MEMBER
| "(" expr ")"
| "[" [ expr { "," expr } ] "]"
| "{" [ map_pair { "," map_pair } ] "}"
| if_expr | for_expr | while_expr | forever_expr
| block | fn_expr | prove_expr ;
if_expr = "if" expr block [ "else" ( block | if_expr ) ] ;
for_expr = "for" IDENT "in" iterable block ;
iterable = range | expr ;
range = NUMBER ".." (NUMBER | expr) ;
while_expr = "while" expr block ;
forever_expr = "forever" block ;
fn_expr = "fn" [ IDENT ] "(" [ params ] ")" [ ":" type_ann ] block ;
prove_expr = "prove" [ IDENT ] "(" [ prove_params ] ")" block ;
prove_params = prove_param { "," prove_param } ;
prove_param = IDENT ":" ( "public" | "witness" ) base_type [ "[" "]" ] ;
args = arg { "," arg } ;
arg = [ IDENT ":" ] expr ; (* positional or keyword *)
map_pair = ( IDENT | STRING_LIT ) ":" expr ;
type_ann = [ visibility ] base_type [ "[" NUMBER "]" ] ;
visibility = "Public" | "Witness" ;
base_type = "Field" | "Bool" | "Int" | "String" ;
NUMBER = digit { digit } ;
FIELD_LIT = "0p" ( hex_digit { hex_digit } | "0" "1" { "0" | "1" } | digit { digit } ) ;
BIGINT_LIT = "0i" digit { digit } ( hex_digit { hex_digit } | "0" "1" { "0" | "1" } | digit { digit } ) ;
STRING_LIT = '"' { any_char_except_quote } '"' ;
IDENT = letter { letter | digit | "_" } ;
```
Literal forms
| Form | Example | Notes |
|---|---|---|
| Integer | 42, -7 | Bytecode VM uses i60 inline (range −2⁵⁹…2⁵⁹−1) |
| Field decimal | 0p123 | Hex variant 0p1A2F, binary 0p0b1010 |
| BigInt | 0i256_DEAD... | Width-prefixed (256, 384, 512 …) |
| String | "hello" | UTF-8, no escape sequences yet |
| Boolean | true, false | Tag values, not field |
| Nil | nil | VM only |
| Array | [1, 2, 3] | Homogeneous; circuit arrays must have static size |
| Map | {a: 1, b: 2} | VM only |
Number literals are stored as String in the AST and parsed lazily by the lowering layer — this lets the parser accept arbitrarily long integers without committing to a numeric type.
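As an illustration of what the lowering layer can then do with the raw string, a hypothetical `fits_i60` check for the inline range quoted in the table above (the real lowering code is not shown on this page):

```rust
/// Does a decimal literal fit the VM's inline i60 range
/// (-2^59 ... 2^59 - 1)? Illustrative sketch only.
fn fits_i60(raw: &str) -> bool {
    const MIN: i64 = -(1i64 << 59);
    const MAX: i64 = (1i64 << 59) - 1;
    raw.parse::<i64>().map_or(false, |v| (MIN..=MAX).contains(&v))
}

fn main() {
    assert!(fits_i60("42"));
    assert!(!fits_i60("99999999999999999999")); // overflows i64 → big-int path
}
```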
prove block syntax
Two forms are accepted. The new typed-parameter form is preferred; the legacy public-list form is kept for backwards compatibility with beta.14 sources.
```
// New explicit form (typed parameters)
prove (root: public Field, leaf: witness Field, path: witness Field[]) {
    // body sees root, leaf, path bound by the prove block
}

// Legacy public-list form
prove (public: [root]) {
    public root;
    witness leaf;
    // ...
}
```
Captured variables follow normal scope rules; the ProveIR compiler walks the surrounding OuterScope to bind them. Captures are serialised into the constant pool together with the ProveIR template (TAG_BYTES).
Comments + whitespace
- `//` line comment (newline-terminated)
- `/* ... */` block comment, non-nesting (single-pass scanner)
- Whitespace is significant only as a token separator
- Newlines are not statement terminators; semicolons are required between statements
The lexer never emits comment tokens; they are discarded inline. Doc-comment harvesting (for the LSP) happens at the parser layer by inspecting trivia stored alongside tokens.
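A byte-level sketch of the non-nesting comment skip (illustrative; the real logic is inlined in the scanner loop):

```rust
/// Return the index just past a comment starting at `i`, or `i` if no
/// comment starts there. Block comments do not nest: the skip ends at
/// the first `*/`. Not the actual lexer code.
fn skip_comment(src: &[u8], mut i: usize) -> usize {
    if src[i..].starts_with(b"//") {
        while i < src.len() && src[i] != b'\n' {
            i += 1; // stop at newline or EOF
        }
    } else if src[i..].starts_with(b"/*") {
        i += 2;
        while i + 1 < src.len() && !(src[i] == b'*' && src[i + 1] == b'/') {
            i += 1;
        }
        i = usize::min(i + 2, src.len()); // consume the closing */ if present
    }
    i
}

fn main() {
    assert_eq!(skip_comment(b"// hi\nx", 0), 5); // stops at the '\n'
    assert_eq!(skip_comment(b"/* a /* b */ x", 0), 12); // non-nesting
}
```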
Error recovery
The hand-written parser uses synchronization on ; and } boundaries to keep emitting Stmt::Error / Expr::Error placeholders so the LSP gets multiple diagnostics from one pass. The recovery strategy is:
- On a parse error in a statement, consume tokens until the next `;` or matching `}` is reached.
- Emit `Stmt::Error { span }` covering the swallowed range.
- Resume parsing at the next statement.
For expression errors inside a statement, the parser returns Expr::Error { id, span } and lets the statement-level recovery catch the rest. See crates/achronyme-parser/src/parser/core.rs for the synchronize() helper.
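A free-standing sketch of that walk, expressed over a slice of token kinds (the real `synchronize()` is a parser method; the shape here is an assumption):

```rust
enum TokenKind { Semicolon, LBrace, RBrace, Eof, Other }

/// Skip forward until just past the next `;` at brace depth 0, or stop
/// before a `}` that closes the enclosing block (the caller consumes it).
/// Illustrative version of the recovery walk in parser/core.rs.
fn synchronize(tokens: &[TokenKind], mut pos: usize) -> usize {
    let mut depth = 0usize;
    while pos < tokens.len() {
        match tokens[pos] {
            TokenKind::Semicolon if depth == 0 => return pos + 1,
            TokenKind::LBrace => depth += 1,
            TokenKind::RBrace if depth == 0 => return pos,
            TokenKind::RBrace => depth -= 1,
            TokenKind::Eof => return pos,
            _ => {}
        }
        pos += 1;
    }
    pos
}

fn main() {
    use TokenKind::*;
    let toks = [Other, Other, Semicolon, Other, Eof];
    assert_eq!(synchronize(&toks, 0), 3); // resumes after the `;`
}
```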
Source files
| Component | File |
|---|---|
| Lexer | crates/achronyme-parser/src/lexer.rs |
| Tokens | crates/achronyme-parser/src/token.rs |
| Parser core (Pratt) | crates/achronyme-parser/src/parser/core.rs |
| Expression parsing | crates/achronyme-parser/src/parser/exprs.rs |
| Statement parsing | crates/achronyme-parser/src/parser/stmts.rs |
| Precedence + dispatch tables | crates/achronyme-parser/src/parser/tables.rs |
See AST Reference for the data structure produced by the parser.