ion7-llm

Chat pipeline + multi-session inference orchestration on top of ion7-core

$ luarocks install ion7-llm

ion7-llm v0.2 - high-level chat pipeline built on ion7-core.

- Per-seq KV snapshots, prefix cache, slot pool, fork.
- Engine (single-session) + Pool (N concurrent sessions, one batch per tick, ~6× aggregate speedup over running them sequentially); see the sketches after this list.
- Mid-generation eviction (sessions keep generating past n_ctx).
- RadixAttention exact-match prefix cache (warm-start identical prompts), Y-Token sink hook for dynamic attention sinks.
- 4-channel streaming (content / thinking / tool_call_delta / tool_call_done) plus a stop event. Format-aware tool-call extraction (OpenAI, Qwen, Mistral, Hermes).
- Interleaved-thinking-aware tool loop, reasoning budget, dedicated embedding pipeline.
- Structured output is delegated to ion7-grammar: build the sampler there and pass it to engine:chat via opts.sampler.
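
A minimal single-session sketch. Only engine:chat and opts.sampler are named in the feature list above; the module name, constructor, option fields, and event-callback shape below are assumptions, not the published API:

```lua
-- Hypothetical usage sketch: everything except engine:chat and
-- opts.sampler is a guess based on the feature list above.
local llm = require("ion7.llm")          -- assumed module name

local engine = llm.engine({              -- assumed constructor and options
  model = "qwen2.5-7b-instruct.gguf",
  n_ctx = 8192,
})

engine:chat({
  messages = {
    { role = "user", content = "Summarize this design in two sentences." },
  },
  -- A grammar-constrained sampler built with ion7-grammar can be
  -- passed through here (see the structured-output bullet above):
  -- sampler = grammar_sampler,
  on_event = function(ev)                -- assumed callback shape
    if ev.channel == "content" then
      io.write(ev.text)                  -- visible answer tokens
    elseif ev.channel == "thinking" then
      -- reasoning tokens, kept separate from visible content
    elseif ev.channel == "tool_call_delta" then
      -- partial tool-call text as it streams in
    elseif ev.channel == "tool_call_done" then
      -- a complete, format-extracted tool call (OpenAI/Qwen/Mistral/Hermes)
    elseif ev.channel == "stop" then
      io.write("\n")
    end
  end,
})
```

For N concurrent sessions, a Pool sketch under the same caveats: pool(), submit(), busy(), and tick() are assumed names, illustrating the "one batch per tick" behavior described above:

```lua
-- Hypothetical Pool sketch: the library batches all active sessions
-- into one decode step per tick.
local llm = require("ion7.llm")

local pool = llm.pool({ model = "qwen2.5-7b-instruct.gguf", slots = 8 })

-- Identical prompt prefixes warm-start from the exact-match prefix cache.
for i = 1, 8 do
  pool:submit({
    messages = { { role = "user", content = ("Task %d"):format(i) } },
    on_event = function(ev)
      if ev.channel == "content" then io.write(ev.text) end
    end,
  })
end

-- One batched decode step per tick across all live sessions.
while pool:busy() do
  pool:tick()
end
```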

Versions

0.2.0beta1-2 (16 hours ago, 0 downloads)
0.2.0beta1-1 (17 hours ago, 1 download)

Dependencies

ion7-core >= 0.1.0beta4
lua >= 5.1
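
To depend on ion7-llm from a downstream project, a rockspec excerpt (version pin illustrative, based on the versions listed above):

```lua
-- Excerpt from a downstream project's rockspec.
dependencies = {
  "lua >= 5.1",
  "ion7-llm >= 0.2.0beta1",
}
```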
