$ luarocks install ion7-llm

ion7-llm v0.2 - high-level chat pipeline built on ion7-core.
- Per-sequence KV snapshots, prefix cache, slot pool, fork.
- Engine (single-session) + Pool (N concurrent sessions, one batch per tick, ~6× aggregate speedup over sequential).
- Mid-generation eviction (sessions keep generating past n_ctx).
- RadixAttention exact-match prefix cache (warm-start identical prompts), Y-Token sink hook for dynamic attention sinks.
- Streaming on separate channels: content / thinking / tool_call_delta / tool_call_done / stop. Format-aware tool extraction (OpenAI, Qwen, Mistral, Hermes).
- Interleaved-thinking-aware tool loop, reasoning budget, dedicated embedding pipeline.
- Structured output is delegated to ion7-grammar: build the sampler there and pass it in via engine:chat opts.sampler.
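To illustrate the exact-match prefix cache idea from the feature list: token prefixes map to saved KV-snapshot handles, so an identical prompt can warm-start from a snapshot instead of re-prefilling. This is a minimal pure-Lua sketch of the concept only; the type and method names below are illustrative and are not ion7-llm's actual API.

```lua
-- Trie of token IDs; a node carrying a snapshot handle marks a cached prefix.
local PrefixCache = {}
PrefixCache.__index = PrefixCache

function PrefixCache.new()
  return setmetatable({ root = { children = {} } }, PrefixCache)
end

-- Record a snapshot handle for a full token sequence.
function PrefixCache:insert(tokens, snapshot)
  local node = self.root
  for _, tok in ipairs(tokens) do
    node.children[tok] = node.children[tok] or { children = {} }
    node = node.children[tok]
  end
  node.snapshot = snapshot
end

-- Walk the trie along `tokens`; return the length of the longest cached
-- prefix and its snapshot handle (0, nil when nothing matches).
function PrefixCache:longest_prefix(tokens)
  local node, best_len, best_snap = self.root, 0, nil
  for i, tok in ipairs(tokens) do
    node = node.children[tok]
    if not node then break end
    if node.snapshot then best_len, best_snap = i, node.snapshot end
  end
  return best_len, best_snap
end
```

A new prompt that shares a cached prefix would then restore the returned snapshot and prefill only the remaining suffix.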
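The Pool's "one batch per tick" scheduling can be sketched as follows: each tick collects one pending decode step from every active session and processes them as a single batch, which is where the aggregate speedup over running sessions sequentially comes from. Again a conceptual pure-Lua sketch under assumed names, not ion7-llm's implementation.

```lua
local Pool = {}
Pool.__index = Pool

function Pool.new()
  return setmetatable({ sessions = {} }, Pool)
end

-- Register a session that still needs `n_tokens` generated tokens.
function Pool:add(id, n_tokens)
  self.sessions[#self.sessions + 1] = { id = id, remaining = n_tokens, out = {} }
end

-- One tick: take one decode step for every active session in a single
-- batch, then drop sessions that have finished. Returns the batch size.
function Pool:tick()
  local batch_size = #self.sessions
  for _, s in ipairs(self.sessions) do
    -- In a real engine this whole loop would be one batched decode call.
    s.out[#s.out + 1] = "tok"        -- stand-in for the sampled token
    s.remaining = s.remaining - 1
  end
  local alive = {}
  for _, s in ipairs(self.sessions) do
    if s.remaining > 0 then alive[#alive + 1] = s end
  end
  self.sessions = alive
  return batch_size
end
```

Running ticks in a loop drains all sessions together; the batch shrinks as sessions complete, rather than each session paying the full decode cost on its own.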