Selection Guide
Quick guide to choosing the right model without turning the decision into a pricing spreadsheet. Benchmarks move quickly; workflow fit lasts longer.
Start with the Workflow
| Workflow | Start With | Why |
|---|---|---|
| Long-running agentic coding | Frontier reasoning models with reliable tool use | Strong reasoning and tool use matter more than raw speed |
| Complex architecture or debugging | Highest-reasoning model you trust operationally | Deep work benefits from slower, more careful models |
| Visual/UI work | Strong multimodal models | Screenshot and layout understanding matter here |
| Fast completions and quick edits | Low-latency models | Low latency keeps editing flow intact |
| Huge repos or large investigations | Long-context models, used selectively | Context window size helps only when paired with good context hygiene |
| Local/private workflows | Strong open-weight coding models | Best when data control matters more than frontier performance |
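The table above is essentially a lookup from workflow to model class. A minimal sketch, with placeholder workflow keys and model-class labels rather than product names:

```python
# Illustrative mapping of workflows to starting model classes, mirroring
# the table above. Keys and labels are placeholders, not product names.
WORKFLOW_TO_MODEL = {
    "agentic-coding": "frontier-reasoning",
    "architecture-debugging": "highest-reasoning",
    "visual-ui": "multimodal",
    "quick-edits": "low-latency",
    "huge-repo": "long-context",
    "local-private": "open-weight",
}

def pick_model(workflow: str) -> str:
    """Return the starting model class for a workflow, falling back to a
    general-purpose coder when the workflow is unrecognized."""
    return WORKFLOW_TO_MODEL.get(workflow, "general-purpose")
```

The fallback matters: most tasks are fine on a general-purpose coding model, and the table only tells you where to start looking when they are not.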
Questions That Matter More Than Price
- Does this task need depth or speed?
- Do I need multimodal input such as screenshots or diagrams?
- Is the code sensitive enough that provider and jurisdiction matter?
- Will I be running a long agent loop or making one quick edit?
- Do I need local execution or is hosted inference acceptable?
Decision Tree
```
What matters most?
│
├─► Quality (hard problems, complex refactors)
│   ├─► Need deepest reasoning → Frontier reasoning model
│   └─► Need balance → Strong all-rounder model
│
├─► Speed (completions, quick iterations)
│   └─► Low-latency model
│
├─► Context (huge codebases)
│   └─► Long-context model + selective context loading
│
├─► Privacy (no cloud)
│   └─► Strong open-weight model + local serving stack
│
└─► "Just pick for me"
    └─► Strong general-purpose coding model with reliable tool use
```
Model Combinations
Many developers use multiple models:
| Role | Use For |
|---|---|
| Heavy hitter | Complex changes, architecture, debugging |
| Fast model | Tab completions, quick edits, iterative loops |
| Local/private model | Sensitive code or offline work |
This keeps the workflow responsive without forcing one model to do every job.
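A multi-model setup amounts to a small routing rule: sensitive work stays local, interactive edits go to the fast model, and everything else goes to the heavy hitter. A sketch, with illustrative task names and role labels not tied to any particular tool:

```python
def route(task: str, sensitive: bool = False) -> str:
    """Route a task to one of the three model roles in the table above.
    Task names and thresholds are illustrative."""
    if sensitive:
        return "local"   # sensitive code never leaves the machine
    if task in {"completion", "quick-edit"}:
        return "fast"    # keep the editing loop responsive
    return "heavy"       # architecture, debugging, complex changes
```

The order of the checks encodes the priority: privacy constraints override everything, and speed only matters once privacy is settled.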
Switching Costs
Switching difficulty depends more on tool architecture than on the model itself:
- Some tools expose a model picker in the UI
- Some rely on config files or CLI flags
- Some are tied to one provider family
If model switching matters to you, prefer tools that make provider changes explicit and reversible.
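One common pattern for keeping model choice explicit and reversible is routing it through a single environment variable with a sane default, so a switch is one line in a shell profile rather than a code change. A sketch; the variable name and default are hypothetical:

```python
import os

def current_model(default: str = "general-purpose-coder") -> str:
    """Return the configured model name. CODING_MODEL is a hypothetical
    variable name; any tool-specific config key works the same way."""
    return os.environ.get("CODING_MODEL", default)
```

Reverting a bad switch is then just unsetting the variable, which is exactly the "explicit and reversible" property worth looking for.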
Practical Heuristics
- Use your best reasoning model for tasks that would take a human hours.
- Use faster models for autocomplete, drafts, and tight feedback loops.
- Treat long context as a capability, not a permission slip to dump everything in.
- Prefer local models when privacy constraints are the dominant requirement.
- Re-check live benchmarks before making strong model claims in team docs or policies.
Local Models
Local models suit privacy, offline work, and air-gapped environments. They are always an option, but most developers start with hosted APIs.
| Model Size Class | Hardware Needed | When to Use |
|---|---|---|
| Large coding model (roughly 30B+) | 24GB+ VRAM | Best local quality if you have the hardware |
| Mid-size coding model (roughly 14B-16B) | 16GB+ VRAM | Good balance of quality and practicality |
| Small coding model (roughly 7B-8B) | 8GB+ VRAM | Lighter hardware and experimentation |
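The VRAM figures above follow from a common back-of-the-envelope rule: weights take roughly `parameters × bits-per-weight ÷ 8` gigabytes at a given quantization, plus overhead for the KV cache and runtime buffers. A rough heuristic, not a guarantee; the 20% overhead factor is an assumption:

```python
def rough_vram_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Ballpark VRAM needed to serve a model locally: quantized weights
    plus ~20% (assumed) for KV cache and runtime buffers."""
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb * 1.2, 1)
```

At 4-bit quantization this puts a 7B model around 4 GB, a 14B model around 8 GB, and a 32B model around 19 GB, which lines up with the hardware tiers in the table. Longer contexts and higher-precision quantizations push the real numbers up.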
Run with a local serving stack or desktop runtime. Pair with whatever editor or terminal workflow already fits your environment.
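Many local serving stacks (llama.cpp's server, Ollama, vLLM) expose an OpenAI-compatible chat completions API, so editor integrations can talk to them with the same request shape they use for hosted providers. A sketch of building such a request; the model name is an assumption for your setup:

```python
import json

def local_chat_request(prompt: str, model: str = "local-coder") -> str:
    """Build an OpenAI-compatible chat completions payload as JSON.
    "local-coder" is a placeholder for whatever model your local
    serving stack has loaded."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)
```

POST this to the stack's local endpoint (commonly something like `http://localhost:<port>/v1/chat/completions`, depending on the runtime) and the tooling side of the workflow stays unchanged when you swap between local and hosted models.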
Sources
| Benchmark | What It Measures | Link |
|---|---|---|
| SWE-bench | Real GitHub issue resolution | swebench.com |
| Aider Polyglot | Multi-language code editing | aider.chat/docs/leaderboards |
| Artificial Analysis | Speed, quality, model changes over time | artificialanalysis.ai |
| LLM Stats | Aggregated benchmarks | llm-stats.com |
Use live benchmark trackers for current details. Data moves quickly.