oMLX
Developed by jundot
oMLX is a high-performance local LLM inference server for Apple Silicon Macs. Built on the MLX framework, it serves text LLMs, vision-language models (VLMs), OCR models, and embedding models. Its tiered KV cache keeps hot entries in RAM and cold entries on SSD, so context persists across requests and server restarts, which makes it well suited to coding agents such as Claude Code. oMLX ships as a native macOS menubar app with a web dashboard, and supports continuous batching, automatic memory management through LRU eviction, and MCP integration.
- Tiered KV Cache (Hot/Cold)
- Continuous Batching
- Multi-model Serving & LRU Eviction
- Native macOS Menubar & Dashboard
- Claude Code Optimization
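
The hot/cold tiering and LRU eviction listed above can be illustrated with a small sketch. This is not oMLX's implementation: the `TieredCache` class, its method names, the pickle-based file layout, and the string keys are all hypothetical, chosen only to show the general idea of demoting the least recently used entry to disk instead of discarding it, and of reloading it after a restart.

```python
import hashlib
import pickle
import tempfile
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Toy two-tier cache (hypothetical, not oMLX's API):
    a bounded in-memory 'hot' tier backed by an on-disk 'cold' tier."""

    def __init__(self, cold_dir, hot_capacity=2):
        self.hot = OrderedDict()          # key -> value, least recently used first
        self.hot_capacity = hot_capacity  # max entries kept in RAM
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)

    def _cold_path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cold_dir / f"{digest}.pkl"

    def _write_cold(self, key, value):
        with open(self._cold_path(key), "wb") as f:
            pickle.dump(value, f)

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)         # mark as most recently used
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # LRU entry
            self._write_cold(old_key, old_value)  # demote instead of discarding

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._cold_path(key)
        if path.exists():
            with open(path, "rb") as f:
                value = pickle.load(f)
            self.put(key, value)          # promote back to the hot tier
            return value
        return None

    def flush(self):
        # Persist everything still in RAM, e.g. before shutdown.
        for key, value in self.hot.items():
            self._write_cold(key, value)


cold_dir = tempfile.mkdtemp()
cache = TieredCache(cold_dir, hot_capacity=2)
cache.put("prompt-a", [1, 2, 3])
cache.put("prompt-b", [4, 5])
cache.put("prompt-c", [6])                # pushes prompt-a out of RAM
demoted = "prompt-a" not in cache.hot
restored = cache.get("prompt-a")          # reloaded from disk

cache.flush()
restarted = TieredCache(cold_dir, hot_capacity=2)  # simulates a server restart
after_restart = restarted.get("prompt-c")
```

In a real server the cached values would be KV tensors keyed by prompt prefix rather than small lists keyed by name; the sketch only models the eviction and persistence behavior.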
desktop, web