Capability Patterns
Use this page for durable capability classes, not leaderboard snapshots. Specific rankings move too quickly to build the guide around.
Capability Classes That Matter
Deep reasoning models
Best for:
- multi-file changes
- architecture decisions
- debugging with several interacting causes
- long agent loops that need to recover from failure
Tradeoff:
- slower, and often more expensive in latency or usage budget
Fast iteration models
Best for:
- autocomplete and short edits
- drafting tests or boilerplate
- quick review loops where latency matters more than depth
Tradeoff:
- weaker at holding long plans and resolving ambiguous requirements
Multimodal models
Best for:
- UI implementation from screenshots or mockups
- debugging visual regressions
- working from diagrams, design files, or image-based documentation
Tradeoff:
- strong visual understanding does not guarantee equally strong coding ability; multimodal models vary widely here
Long-context models
Best for:
- large investigations
- broad repository mapping
- document-heavy workflows
Tradeoff:
- large context windows help only when context is selective and well-structured
Local or open-weight models
Best for:
- sensitive code
- offline or air-gapped environments
- teams that prioritize control over frontier performance
Tradeoff:
- capability may lag top hosted models, especially on hard agentic tasks
How to Choose by Workflow
| Workflow | Start with this capability class | Why |
|---|---|---|
| Complex bug fix | Deep reasoning | Root-cause analysis matters more than speed |
| New feature with many moving parts | Deep reasoning | Planning and recovery matter |
| UI build from design references | Multimodal | Visual understanding changes the result |
| Tight edit loop | Fast iteration | Lower latency keeps the workflow moving |
| Large codebase exploration | Long-context | Breadth helps when paired with context hygiene |
| Sensitive or regulated work | Local or open-weight | Operational boundaries may matter more than peak capability |
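The table above is essentially a lookup from workflow to starting capability class. As a minimal sketch, it could be encoded like this; the workflow keys and class labels are illustrative names, not any real tool's API:

```python
# Hypothetical mapping from workflow to the capability class to try first.
# The entries mirror the table above; rename keys to fit your own tooling.
WORKFLOW_TO_CLASS = {
    "complex_bug_fix": "deep_reasoning",
    "multi_part_feature": "deep_reasoning",
    "ui_from_design": "multimodal",
    "tight_edit_loop": "fast_iteration",
    "codebase_exploration": "long_context",
    "sensitive_work": "local_open_weight",
}

def starting_class(workflow: str) -> str:
    """Return the capability class to try first for a workflow.

    Defaults to deep reasoning, since it degrades most gracefully
    on unfamiliar or ambiguous tasks.
    """
    return WORKFLOW_TO_CLASS.get(workflow, "deep_reasoning")
```

The point is not the dictionary itself but the habit it encodes: pick a starting class by workflow, then adjust from observed results rather than from leaderboards.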
Context Reality
Advertised context is not the same thing as reliable context. Once the prompt gets noisy, even very large windows help less than people expect.
- prefer selective retrieval over giant prompt dumps
- treat long context as a tool for breadth, not permission to include everything
- keep core rules pushed into project context files and retrieve the rest on demand
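The "selective retrieval over giant prompt dumps" point can be made concrete with a toy sketch. This is an assumption-laden illustration, not a real retrieval system: relevance is approximated by naive word overlap, and token cost by word count.

```python
# Hypothetical sketch of selective context packing: score candidate chunks
# against the task, then keep only the best ones under a token budget,
# instead of dumping the whole repository into the prompt.
def select_context(task: str, chunks: list[str], budget_tokens: int) -> list[str]:
    task_words = set(task.lower().split())
    # Naive relevance score: word overlap with the task description.
    scored = sorted(
        chunks,
        key=lambda c: -len(task_words & set(c.lower().split())),
    )
    selected: list[str] = []
    used = 0
    for chunk in scored:
        cost = len(chunk.split())  # crude stand-in for a token count
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected
```

A real system would use embeddings or a code-aware index for scoring and a proper tokenizer for budgeting, but the shape is the same: rank, then pack under a budget, and let everything else stay out of the prompt.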
See Context Engineering for the workflow implications.
What This Page Intentionally Does Not Do
- maintain live model rankings
- promise a single “best model”
- freeze benchmark snapshots into durable guidance
For time-sensitive benchmark details, use Benchmarks That Matter and confirm current data before making team-level decisions.
Evidence Tags
- Research-backed: verification, selective context, and review costs matter more than leaderboard chasing
- Practitioner-backed: capability classes are how many teams actually choose models in daily work
The taxonomy on this page is a workflow-first simplification, not one benchmark’s official ontology.
Next Steps
- Choosing a Model: workflow-first chooser
- Selection Guide: practical decision heuristics
- Benchmarks That Matter: appendix-style benchmark interpretation