a cli that turns 7b reasoning into 70b

miii vs Claude Code,
Cursor, and Codex.

They charge you to use their model. miii runs yours for free. And makes it smarter than their cloud model on long tasks.

Every cloud coding agent picks a provider for you, locks you in, charges per token, and ships your code to a server you don't control. miii runs on Ollama by default — local, free, private. Same agent. Same Beacon context engine. Your provider, your choice, your cost. No subscription on top.

with Ollama

lock-in

60–70%

context saved

MIT

license

Feature matrix

	miii	Claude Code	Cursor	Codex CLI
Cost	$0 local · API at cost	$20–400+/mo	$20–40/mo	Pay-per-token
Your code stays local	✓	✗	✗	✗
Works offline	✓	✗	✗	✗
Open source	✓ MIT	✗	✗	✗
Autonomous agent	✓	✓	partial	✓
IDE required	✗	✗	VS Code fork	✗
Goal-aware context (Beacon)★	✓	✗	✗	✗
Per-tool context compression★	✓	✗	✗	✗
Dynamic context window	✓ auto	hardcoded	✗	✗
OS-level shell sandbox	✓	✗	✗	✗
Shadow git (model edit log)	✓	✗	✗	✗
Vendor lock-in	None	Anthropic	Cursor + OpenAI	OpenAI

The real cost of cloud agents

Cloud agents bill per token.
Here's the math.

Without context management, context grows every iteration. By depth 10, each LLM call carries the full history of every file read, every command run, every test output — verbatim.

Simple task

bug fix, 3–5 tool calls

miii

irrelevant

$0.00

Claude Code (Sonnet 4.5)

~17K in + 2K out

$0.08

Codex CLI (o4-mini)

~17K in + 2K out

$0.03

Cursor Pro

counted against 500/mo cap

~$0.04 equiv.

Complex task

refactor, 10–15 tool calls, multi-file

miii

irrelevant

$0.00

Claude Code (Sonnet 4.5)

~100K in + 8K out

$0.42

Codex CLI (o4-mini)

~100K in + 8K out

$0.15

Cursor Pro

exceeds 500/mo cap quickly

$20+ subscription

Annual cost at real usage

20 tasks/day · 220 working days · 50% complex. That's 4,400 tasks/year.

miii (Ollama)

miii (Anthropic API)

~$833–1,493/yr

miii (OpenAI API)

~$400–640/yr

Claude Code

$1,100–1,760/yr

Codex CLI

$400–640/yr

Cursor Pro

$240/yr

GitHub Copilot

$120/yr

bars scaled to Claude Code annual cost as reference

miii (Ollama)	$0	Local models, zero API fees
miii (Anthropic API)	~$833–1,493/yr	Same token rate as Claude Code — no subscription markup
miii (OpenAI API)	~$400–640/yr	Same token rate as Codex CLI — no subscription markup
Claude Code	$1,100–1,760/yr	Same API + Anthropic subscription on top
Codex CLI	$400–640/yr	OpenAI API only, no local fallback
Cursor Pro	$240/yr	Capped at 500 fast requests/month, no local fallback
GitHub Copilot	$120/yr	Not a full autonomous agent

Over 3 years

miii (Ollama)

~$2,500–4,479

miii (Anthropic API)

$3,300–5,280

Claude Code

$1,200–1,920

Codex CLI

$720

Cursor Pro

miii with Anthropic API costs the same tokens as Claude Code — but Beacon's 60–70% context compression means fewer tokens per task. No subscription fee. No markup.

Beacon — why miii wins on long tasks

Every cloud agent fails the same way.
Context fills. Task abandoned. You pay for every wasted token.

Context window at each depth

Without Beaconcrashes at depth 9

depth 1

depth 2

depth 3

depth 4

depth 5

depth 6

depth 7

depth 8

context full ✗

With Beaconcompletes at depth 20

depth 1

depth 2

depth 3

depth 4

depth 5

depth 6

depth 7

depth 8

Goal Block

· · ·

depth 20

✓ complete

no compression

Beacon compressed

Goal injection block

Beacon extracts your goal at depth 0, then injects a live state block just before the last message at every subsequent depth. No LLM call. Extracted in a single split. Injected every time.

miii · depth 12

╔════════════════════════════════════════════════════╗

║ Beacon — Goal State ║

║ Goal: refactor auth module to use JWT middleware ║

║ ║

║ Progress: ║

║ • Edited src/middleware/auth.ts ║

║ • Edited src/routes/user.ts ║

║ • Tests passing ║

║ ║

║ Stay focused. Do not stop until complete. ║

╚════════════════════════════════════════════════════╝

How Beacon compresses each tool

Tool result	Without Beacon	With Beacon	Reduction
read_file (200 lines)	200 lines verbatim	filename + line count + first 4 lines	97%
list_files (50 entries)	50 lines	8 entries + count	84%
run_command (100 lines)	100 lines	first 4 lines + last line	95%
run_tests (full output)	full stdout	first 10 lines (failures always kept)	90%
Error messages	verbatim	always verbatim — never touched	—

In a 15-step task

Without Beacon

~43,000

tokens in mid-history tool results

With Beacon

~2,300

same results, compressed

Total context reduction

95%

on tool history

60–70% on total context

API cost savings (Sonnet 4.5 · $3/MTok)

10 complex tasks

~406,000 tokens

~$1.22 saved

100 complex tasks

~4,060,000 tokens

~$12.18 saved

2,200 complex/year

~89M tokens

~$267/year saved

Why 7B beats 70B here

A 70B model drowning in noise fails before a focused 7B model running with Beacon. Context beats parameters. Beacon compresses tool output at the moment it's produced, keeps the goal in view, and lets an 8K-context model run to depth 20 — where every cloud agent is dead at depth 9. The model isn't the bottleneck. The context management is.

Privacy

Every line of code you send to Claude Code,
Cursor, or Codex leaves your machine.

Claude Code

Your .env files, proprietary algorithms, unreleased features, client codebases → Anthropic's servers.

Cursor

Your .env files, proprietary algorithms, unreleased features, client codebases → Cursor Inc + OpenAI or Anthropic.

Codex CLI

Your .env files, proprietary algorithms, unreleased features, client codebases → OpenAI's servers.

miii by default

Runs on Ollama. Your code never touches a network. When you opt into a cloud provider, you're making a conscious, per-session decision. You decide what leaves and when.

For regulated teams

Claude Code, Cursor, and Codex have no local fallback. Every task, every prompt, every file read goes to the cloud. For fintech, healthcare, legal, and defence: this is the difference between compliant and non-compliant.

Who miii is for

Zero recurring cost

Want a capable coding agent at $0 (local) or raw API cost (cloud). No subscription on top of your API key.

Sensitive codebases

Working on client code, IP, or credentials in-tree. Code must not leave the machine.

Offline / air-gapped

Travel, isolated networks, regulated environments, zero-internet setups. miii works where cloud AI can't.

Long task completion

Keep hitting context limits on complex autonomous tasks. Beacon compresses 60–70% of context — small models run to depth 20.

Model choice

Use Claude or OpenAI models without a subscription. Switch between Llama, Qwen, DeepSeek, and hosted models mid-session.

Open source ownership

MIT licensed. No vendor dependency. Own your tools. Audit every line.

The honest trade-off

Cloud models (Claude Sonnet 4.5, o3) have higher raw accuracy than most local Ollama models today. For a one-off question where cost/privacy don't matter, Claude Code or Codex CLI will produce a better answer.

For everyday development — refactoring, debugging, writing tests, navigating codebases — qwen2.5-coder, deepseek-coder-v2, and llama3.1 are more than sufficient. Beacon keeps them on task through the whole job.

When you hit a hard problem, /cloud in miii escalates one prompt to Claude Opus 4 or o3. You decide what leaves your machine and when.

Try it in three steps.

01Install Ollama

# install ollama first → ollama.ai

brew install ollama

ollama pull llama3.2

02Install miii-cli

npm i -g miii-cli

03Run it

miii

npm i -g miii-cli ← Back to miii

miii vs Claude Code,Cursor, and Codex.

Cloud agents bill per token.Here's the math.

Every cloud agent fails the same way.Context fills. Task abandoned. You pay for every wasted token.

Goal injection block

Every line of code you send to Claude Code,Cursor, or Codex leaves your machine.

Try it in three steps.

miii vs Claude Code,
Cursor, and Codex.

Cloud agents bill per token.
Here's the math.

Every cloud agent fails the same way.
Context fills. Task abandoned. You pay for every wasted token.

Every line of code you send to Claude Code,
Cursor, or Codex leaves your machine.