An open-source terminal coding agent powered by Kimi K2.6, MCP tools, local memory, and direct Workers AI execution — no API middleman.
/health that returns { ok: true }.
Plan mode blocks all mutating tools for safe research. Edit mode prompts per call. Auto mode approves everything for trusted tasks.
For multi-step work, the agent publishes a live task list with progress icons, elapsed time, and token deltas, so you can see which steps are done and which are still running.
Drop image paths (PNG, JPG, WebP, GIF, BMP up to 5 MB) into any prompt. The model sees them inline — perfect for UI reviews, diagrams, and screenshots.
Twelve themes: dark, light, high-contrast, dracula, nord, one-dark, monokai, solarized, tokyo-night, gruvbox, catppuccin, rose-pine. Live preview with Ctrl+T.
Large pastes collapse to [pasted N lines #id]. Full content still goes to the model — scrollback stays clean.
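The paste-collapsing behavior can be sketched as follows. This is a minimal illustration, not the project's actual code: the `collapsePaste` function, the `CollapsedPaste` shape, the sequential id scheme, and the 10-line threshold are all assumptions made for the example.

```typescript
// Hypothetical sketch: show a short placeholder in the scrollback while
// keeping the full pasted text for the model.
const PASTE_LINE_THRESHOLD = 10; // assumed cutoff; the real value is not stated

interface CollapsedPaste {
  display: string; // what the terminal scrollback shows
  full: string; // what is sent to the model
}

let nextPasteId = 1; // assumed id scheme: a simple per-session counter

function collapsePaste(text: string): CollapsedPaste {
  const lineCount = text.split("\n").length;
  if (lineCount < PASTE_LINE_THRESHOLD) {
    // Small pastes are shown as typed.
    return { display: text, full: text };
  }
  const id = nextPasteId++;
  return { display: "[pasted " + lineCount + " lines #" + id + "]", full: text };
}
```

Keeping `display` and `full` separate is what lets the scrollback stay short without truncating what the model receives.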
Type your next prompt while the model is still working. Queued prompts fire in order. Ctrl+C aborts the current turn and clears the queue.
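One way to model this queueing behavior is a small first-in, first-out queue. The sketch below is an assumption about how it could work, not the shipped implementation; the `PromptQueue` class and its method names are hypothetical.

```typescript
// Hypothetical sketch of type-ahead prompt queueing.
class PromptQueue {
  private pending: string[] = [];
  private current: string | null = null;

  // Returns the prompt to start running now, or null if it was queued.
  submit(prompt: string): string | null {
    if (this.current === null) {
      this.current = prompt;
      return prompt;
    }
    this.pending.push(prompt);
    return null;
  }

  // Call when the current turn finishes; returns the next prompt, if any.
  finish(): string | null {
    const next = this.pending.shift();
    this.current = next === undefined ? null : next;
    return this.current;
  }

  // Abort behavior as described above: drop the running turn and the queue.
  abort(): void {
    this.current = null;
    this.pending = [];
  }

  get queued(): number {
    return this.pending.length;
  }
}
```

Prompts submitted while a turn is active wait in `pending` and are released one at a time by `finish()`, which preserves submission order.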
Compiled context extracts structured state and archives raw tool outputs as recallable artifacts. Auto-compaction kicks in at ~80% usage. The model never loses track of what it learned.
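A rough sketch of two pieces of this, the usage threshold and the archiving of raw tool outputs, is shown below. It does not cover structured-state extraction, and the `Message` shape, function names, and artifact id format are assumptions for illustration only. The 0.8 threshold mirrors the ~80% figure above.

```typescript
// Hypothetical sketch: decide when to compact, then swap raw tool outputs
// for short references that can be looked up later.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface CompactionResult {
  messages: Message[];
  artifacts: Record<string, string>; // artifact id -> original tool output
}

const COMPACTION_THRESHOLD = 0.8; // from the ~80% figure in the text

function shouldCompact(usedTokens: number, windowTokens: number): boolean {
  return windowTokens > 0 && usedTokens / windowTokens >= COMPACTION_THRESHOLD;
}

function archiveToolOutputs(messages: Message[]): CompactionResult {
  const artifacts: Record<string, string> = {};
  const compacted = messages.map((message, index): Message => {
    if (message.role !== "tool") return message;
    const id = "artifact-" + index; // assumed id format
    artifacts[id] = message.content;
    return { role: "tool", content: "[archived tool output: " + id + "]" };
  });
  return { messages: compacted, artifacts };
}
```

Because the original outputs stay in `artifacts`, a later recall step can restore any of them by id instead of re-running the tool.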
Toggle the model's chain-of-thought with /reasoning or Ctrl+R. See how it thinks in real time.
Every turn is auto-saved. /resume lists past sessions with message counts in a paginated picker. Never lose your place.
Bash session-allow is keyed by the command's first token, so approving one git command allows all git commands for the rest of the session. Write and edit show a unified diff before you approve.
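Keying approval by the first token could look roughly like this. The `BashSessionAllowList` class is a hypothetical name, and splitting on whitespace is an assumed tokenization; the real agent may parse commands differently.

```typescript
// Hypothetical sketch: remember approved commands by their first token only.
class BashSessionAllowList {
  private allowed: Record<string, boolean> = {};

  private static firstToken(command: string): string {
    // Assumed tokenization: the first whitespace-delimited word.
    const token = command.trim().split(/\s+/)[0];
    return token === undefined ? "" : token;
  }

  // Record that the user chose "allow for this session" on a command.
  allow(command: string): void {
    const token = BashSessionAllowList.firstToken(command);
    if (token !== "") this.allowed[token] = true;
  }

  // True when a later command can skip the approval prompt.
  isAllowed(command: string): boolean {
    const token = BashSessionAllowList.firstToken(command);
    return token !== "" && this.allowed[token] === true;
  }
}
```

Note the trade-off in this scheme: after approving `git status`, a later `git push --force` shares the same first token and would not prompt again.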
Read entire modules, large configs, and full stack traces without the model losing track. Direct to Cloudflare — no middleman.
Plug in external tools via the Model Context Protocol — local stdio servers or remote SSE endpoints. GitHub, Sentry, docs search, databases, and more.
Deterministic system prompts, stable JSON serialization, and session-affinity headers maximize Workers AI prefix-cache hits. Cached tokens are billed at a discount — cost drops as the conversation grows.
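Stable JSON serialization usually means emitting object keys in a fixed order, so that logically equal payloads produce byte-identical text and therefore the same cacheable prefix. The sketch below shows one common way to do that by sorting keys; `stableStringify` and the `Json` type are illustrative names, and this is not necessarily how the project implements it.

```typescript
// Hypothetical sketch: serialize JSON with object keys sorted recursively.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") {
    // Primitives serialize the same way regardless of ordering.
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const fields = keys.map((key) => JSON.stringify(key) + ":" + stableStringify(value[key]));
  return "{" + fields.join(",") + "}";
}
```

Plain `JSON.stringify` follows property insertion order, so two objects with the same content built in a different order can serialize differently; sorting removes that source of variation.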
The agent never surveils your conversation. Memories are stored only when you ask — via remember, recall, and forget tools — with SQLite + embeddings for durable, privacy-respecting retrieval across sessions.
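The explicit remember / recall / forget flow can be illustrated with a small in-memory store. This sketch deliberately swaps the SQLite backend for a plain array and uses a toy letter-count embedding, so it only shows the shape of the idea: nothing is stored unless `remember` is called, and recall ranks entries by cosine similarity. All names here are hypothetical.

```typescript
// Hypothetical in-memory stand-in for the SQLite + embeddings store.
type Embed = (text: string) => number[];

interface MemoryRow {
  id: number;
  text: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class MemoryStore {
  private rows: MemoryRow[] = [];
  private nextId = 1;
  private embed: Embed;

  constructor(embed: Embed) {
    this.embed = embed;
  }

  // Store a memory only when explicitly asked; returns its id.
  remember(text: string): number {
    const id = this.nextId++;
    this.rows.push({ id, text, vector: this.embed(text) });
    return id;
  }

  // Return the stored texts most similar to the query, best first.
  recall(query: string, limit = 3): string[] {
    const q = this.embed(query);
    return this.rows
      .map((row) => ({ text: row.text, score: cosineSimilarity(q, row.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit)
      .map((hit) => hit.text);
  }

  // Delete a memory by id; returns false if the id was unknown.
  forget(id: number): boolean {
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => row.id !== id);
    return this.rows.length < before;
  }
}

// Toy embedding for demonstration only: counts of the letters a-z.
function letterCountEmbed(text: string): number[] {
  const counts: number[] = [];
  for (let i = 0; i < 26; i++) counts.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i) - 97;
    if (code >= 0 && code < 26) counts[code] += 1;
  }
  return counts;
}
```

A real deployment would replace `letterCountEmbed` with a proper embedding model and persist rows to disk; the three-method surface stays the same.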
Read-only research. Mutating tools are hard-blocked. Ask "plan a refactor" and the agent investigates without touching your filesystem. Review, then exit plan mode to execute.
Default mode. The agent calls tools freely for read-only work; mutating tools pause for your approval with a unified diff preview.
Autonomous execution. Every tool call is auto-approved. Use for trusted, well-scoped tasks. The agent still warns before irreversible actions.
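The three modes described above reduce to a small decision table. The sketch below is an assumption about how such a gate could be written; the `decide` function and the `Mode` and `Decision` types are hypothetical, and the list of mutating tools is taken from the tools marked `prompt` in the tool table.

```typescript
// Hypothetical sketch of the plan / edit / auto permission gate.
type Mode = "plan" | "edit" | "auto";
type Decision = "allow" | "prompt" | "block";

// Assumed to match the tools whose permission is "prompt" in the tool table.
const MUTATING_TOOLS: string[] = ["write", "edit", "bash"];

function decide(mode: Mode, tool: string): Decision {
  const mutating = MUTATING_TOOLS.indexOf(tool) !== -1;
  if (!mutating) {
    // Read-only tools run freely in every mode.
    return "allow";
  }
  switch (mode) {
    case "plan":
      return "block"; // research only, nothing on disk changes
    case "edit":
      return "prompt"; // pause for approval on each call
    case "auto":
      return "allow"; // trusted, well-scoped tasks
  }
}
```

For example, `decide("plan", "bash")` yields `"block"`, while `decide("edit", "read")` yields `"allow"`.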
Or run without installing: npx kimiflare
| Tool | Permission | Description |
|---|---|---|
| read | auto | Read a text file (≤ 2 MB) with optional line range |
| write | prompt | Create or overwrite a file. Shows a diff before approval |
| edit | prompt | Replace an exact substring. Fails unless the match is unique |
| bash | prompt | Run a shell command. Session-allow keyed by first token |
| glob | auto | Match files by pattern, sorted by mtime |
| grep | auto | Regex search. Uses ripgrep if available |
| web_fetch | auto | Fetch a URL, convert HTML → markdown (≤ 100 KB) |
| tasks_set | auto | Publish a live task list for multi-step work |
Plus LSP intelligence (hover, go-to-definition, references, diagnostics), cross-session memory (remember / recall / forget), and MCP extensibility for plugging in external tool servers.
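The `edit` row in the table says a replacement fails unless the target substring matches exactly once. A minimal sketch of that rule is below; `applyEdit` and its error messages are illustrative and not taken from the project.

```typescript
// Hypothetical sketch: replace an exact substring only if it occurs once.
function applyEdit(content: string, oldText: string, newText: string): string {
  if (oldText === "") {
    throw new Error("old text must not be empty");
  }
  const first = content.indexOf(oldText);
  if (first === -1) {
    throw new Error("no match found");
  }
  // Searching from first + 1 also catches overlapping second matches.
  const second = content.indexOf(oldText, first + 1);
  if (second !== -1) {
    throw new Error("match is not unique");
  }
  return content.slice(0, first) + newText + content.slice(first + oldText.length);
}
```

Requiring a unique match forces the caller to include enough surrounding context to identify one location, which avoids silently editing the wrong occurrence.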