
Selection Guide

Quick guide to choosing the right model without turning the decision into a pricing spreadsheet. Benchmarks move quickly; workflow fit lasts longer.

| Workflow | Start With | Why |
|---|---|---|
| Long-running agentic coding | Frontier reasoning models with reliable tool use | Strong reasoning and tool use matter more than raw speed |
| Complex architecture or debugging | Highest-reasoning model you trust operationally | Deep work benefits from slower, more careful models |
| Visual/UI work | Strong multimodal models | Screenshot and layout understanding matter here |
| Fast completions and quick edits | Low-latency models | Low latency keeps editing flow intact |
| Huge repos or large investigations | Long-context models, used selectively | Context window size helps only when paired with good context hygiene |
| Local/private workflows | Strong open-weight coding models | Best when data control matters more than frontier performance |
Before choosing, ask yourself:

  1. Does this task need depth or speed?
  2. Do I need multimodal input such as screenshots or diagrams?
  3. Is the code sensitive enough that provider and jurisdiction matter?
  4. Will I be running a long agent loop or making one quick edit?
  5. Do I need local execution or is hosted inference acceptable?
What matters most?
├─► Quality (hard problems, complex refactors)
│ ├─► Need deepest reasoning → Frontier reasoning model
│ └─► Need balance → Strong all-rounder model
├─► Speed (completions, quick iterations)
│ └─► Low-latency model
├─► Context (huge codebases)
│ └─► Long-context model + selective context loading
├─► Privacy (no cloud)
│ └─► Strong open-weight model + local serving stack
└─► "Just pick for me"
  └─► Strong general-purpose coding model with reliable tool use
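The decision tree above can be sketched as a small lookup function. The category names are illustrative labels from the tree, not identifiers of specific products:

```python
# Sketch of the decision tree above. The priority values and returned
# category names are illustrative labels, not real model identifiers.
def pick_model(priority: str, need_deepest_reasoning: bool = False) -> str:
    """Map a workflow priority onto a model category from the tree."""
    if priority == "quality":
        return ("frontier reasoning model" if need_deepest_reasoning
                else "strong all-rounder model")
    if priority == "speed":
        return "low-latency model"
    if priority == "context":
        return "long-context model + selective context loading"
    if priority == "privacy":
        return "open-weight model + local serving stack"
    # "Just pick for me"
    return "general-purpose coding model with reliable tool use"
```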

Many developers use multiple models:

| Role | Use For |
|---|---|
| Heavy hitter | Complex changes, architecture, debugging |
| Fast model | Tab completions, quick edits, iterative loops |
| Local/private model | Sensitive code or offline work |

This keeps the workflow responsive without forcing one model to do every job.
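A multi-model setup amounts to a simple routing table. This is a minimal sketch; the task kinds and role names are placeholders, not recommendations of specific tools or models:

```python
# Hypothetical routing table mapping task kinds to the three model roles
# described above. Task-kind strings are illustrative placeholders.
ROLE_BY_TASK = {
    "architecture": "heavy-hitter",
    "debugging":    "heavy-hitter",
    "completion":   "fast",
    "quick-edit":   "fast",
    "sensitive":    "local",
}

def route(task_kind: str) -> str:
    """Pick which model role handles a task; default to quality."""
    return ROLE_BY_TASK.get(task_kind, "heavy-hitter")
```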

Switching difficulty depends more on tool architecture than on the model itself:

  • some tools expose a model picker in the UI
  • some rely on config files or CLI flags
  • some are tied to one provider family

If model switching matters to you, prefer tools that make provider changes explicit and reversible.
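One way a tool can make switching explicit and reversible is a small config file recording the current provider and model. The JSON schema below is hypothetical, a sketch of the pattern rather than any real tool's format:

```python
# Minimal sketch of config-driven model switching. The file schema
# ({"provider": ..., "model": ...}) is hypothetical.
import json
from pathlib import Path

def load_model_config(path: Path, default: str = "general-purpose") -> dict:
    """Read provider/model from a config file, with an explicit fallback."""
    if not path.exists():
        return {"provider": "default", "model": default}
    cfg = json.loads(path.read_text())
    # Explicit and reversible: the file records exactly what is in use,
    # and deleting it restores the default.
    return {"provider": cfg.get("provider", "default"),
            "model": cfg.get("model", default)}
```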

  • Use your best reasoning model for tasks that would take a human hours.
  • Use faster models for autocomplete, drafts, and tight feedback loops.
  • Treat long context as a capability, not a permission slip to dump everything in.
  • Prefer local models when privacy constraints are the dominant requirement.
  • Re-check live benchmarks before making strong model claims in team docs or policies.
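"Context hygiene" in practice means loading only the most relevant files up to a size budget instead of dumping the whole repo. The sketch below uses naive keyword overlap as a stand-in relevance score; real tools use embeddings or repository maps:

```python
# Selective context loading: rank files by a crude relevance score
# (keyword overlap with the query, a stand-in for real retrieval)
# and include them only while they fit a character budget.
def select_context(files: dict[str, str], query: str, budget_chars: int) -> list[str]:
    """Return file names ranked by naive relevance, within a size budget."""
    words = set(query.lower().split())
    ranked = sorted(files,
                    key=lambda name: -len(words & set(files[name].lower().split())))
    chosen, used = [], 0
    for name in ranked:
        size = len(files[name])
        if used + size > budget_chars:
            continue  # skip files that would blow the budget
        chosen.append(name)
        used += size
    return chosen
```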

Local models suit privacy, offline work, and air-gapped environments. They are always an option, but most developers start with hosted APIs.

| Model size class | Hardware needed | When to use |
|---|---|---|
| Large coding model (roughly 30B+) | 24GB+ VRAM | Best local quality if you have the hardware |
| Mid-size coding model (roughly 14B-16B) | 16GB+ VRAM | Good balance of quality and practicality |
| Small coding model (roughly 7B-8B) | 8GB+ VRAM | Lighter hardware and experimentation |

Run with a local serving stack or desktop runtime. Pair with whatever editor or terminal workflow already fits your environment.
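The hardware table reduces to a threshold check. The cutoffs below mirror the table and are approximations, not guarantees; actual fit depends on quantization and context length:

```python
# Rough helper mapping available VRAM to the size classes in the table
# above. Thresholds are approximate, not guarantees: quantization and
# context length shift what actually fits.
def local_size_class(vram_gb: int) -> str:
    if vram_gb >= 24:
        return "large (~30B+)"
    if vram_gb >= 16:
        return "mid-size (~14B-16B)"
    if vram_gb >= 8:
        return "small (~7B-8B)"
    return "below practical minimum for local coding models"
```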

| Benchmark | What it measures | Link |
|---|---|---|
| SWE-bench | Real GitHub issue resolution | swebench.com |
| Aider Polyglot | Multi-language code editing | aider.chat/docs/leaderboards |
| Artificial Analysis | Speed, quality, model changes over time | artificialanalysis.ai |
| LLM Stats | Aggregated benchmarks | llm-stats.com |

Use live benchmark trackers for current details. Data moves quickly.