01
Near-term
Keep the site honest about the current implementation: one local runtime path, strict .slm validation, q8/q4 models, deterministic fixtures, adapter stack proof, selector registry routing, and clear diagnostics.
- Improve public docs
- Keep llms files aligned
- Publish support matrix
- Add evidence-oriented examples
02
Mid-term
Expand model candidates and tokenizer handling where the runtime can prove compatibility. Add more concrete demonstrations of Pythia, Llama-style, low-vocabulary, tied-output, and BPE paths only when conversion and smoke evidence exist.
- Tokenizer matrix
- Candidate model table
- Browser demo evidence
- Evaluation sidecars
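
The "only when conversion and smoke evidence exist" rule above can be sketched as a small gate over the candidate model table. This is a minimal illustration, not the project's actual code: the `CandidateEvidence` shape, its field names, and both function names are hypothetical.

```typescript
// Hypothetical row in a candidate model table; field names are illustrative only.
interface CandidateEvidence {
  model: string;             // candidate model label
  tokenizer: string;         // tokenizer family, e.g. "bpe"
  conversionPassed: boolean; // conversion to the runtime format succeeded
  smokeTestPassed: boolean;  // a smoke run produced recorded evidence
}

// A candidate is shown publicly only when both kinds of evidence exist.
function isPublishable(c: CandidateEvidence): boolean {
  return c.conversionPassed && c.smokeTestPassed;
}

// Filter the full table down to the rows that may appear on the site.
function publishableCandidates(all: CandidateEvidence[]): CandidateEvidence[] {
  return all.filter(isPublishable);
}
```

Keeping the gate as a pure function means the same check could run in a build step and in a test, so the published table cannot drift ahead of the evidence.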
03
Future lanes
Future work can explore multi-model routing, a richer adapter ecosystem, OPFS persistence, Web Workers, WebGPU comparison, fixed-point deterministic math, and skill-package routing. Each lane needs a manifest, a validator, a test, and a public-claim boundary.
- Multi-model routing
- OPFS cache
- Worker isolation
- WebGPU comparison
- Fixed-point fallback
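
The four requirements for each lane could be checked mechanically before a lane is mentioned on the site. The sketch below assumes a hypothetical `LaneManifest` record with one string field per requirement; the field names and the `missingLaneFields` helper are illustrative and do not describe an existing format.

```typescript
// Hypothetical per-lane record; every field name here is an assumption.
interface LaneManifest {
  lane: string;        // lane label, e.g. "opfs-cache"
  validator: string;   // reference to the validator for this lane
  test: string;        // reference to the test that exercises it
  publicClaim: string; // the exact claim the site is allowed to make
}

// Return the names of required fields that are absent or blank.
// An empty result means the lane meets all four requirements.
function missingLaneFields(m: Partial<LaneManifest>): string[] {
  const required: (keyof LaneManifest)[] = ["lane", "validator", "test", "publicClaim"];
  return required.filter((key) => {
    const value = m[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

For example, a lane that has a test but no validator or claim text would report `validator` and `publicClaim` as missing, which is a reasonable signal to keep it off the public pages.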