DeepSeek V4 compared with GPT‑5.4 and Claude Opus 4.6: prices, innovations and positioning

DeepSeek has released the V4 family in preview, consisting of Flash (284 billion total parameters, 13 billion active) and Pro (1.6 trillion total parameters, 49 billion active). Both offer a 1-million-token context window, are released under the MIT license, and introduce a hybrid attention architecture (CSA + HCA) that drastically reduces the computational cost of long sequences. The comparison with GPT‑5.4 (OpenAI, March 2026) and Claude Opus 4.6 (Anthropic, February 2026) shows a model that is competitive on benchmarks, with prices up to two orders of magnitude lower in the budget tier and, for the Pro model, output prices 77–86% lower than GPT‑5.4 and Claude Opus 4.6.


🔍 Model overview

| Model | Total parameters | Active parameters | Max context | Openness | Release date |
| --- | --- | --- | --- | --- | --- |
| DeepSeek V4 Flash | 284B | 13B | 1M tokens | Open weights (MIT) | 24 April 2026 |
| DeepSeek V4 Pro | 1.6T | 49B | 1M tokens | Open weights (MIT) | 24 April 2026 |
| GPT‑5.4 | Not disclosed | Not disclosed | 1.05M tokens | Proprietary | 5 March 2026 |
| GPT‑5.4 Pro | Not disclosed | Not disclosed | 1.05M tokens | Proprietary | 5 March 2026 |
| Claude Opus 4.6 | Not disclosed | Not disclosed | 1M tokens | Proprietary | 5 February 2026 |

“Active parameters” refers to the number of parameters actually used per token at inference time in Mixture-of-Experts (MoE) models.


💰 API price comparison (USD per million tokens)

| Model | Input (USD/1M tokens) | Output (USD/1M tokens) | Notes |
| --- | --- | --- | --- |
| DeepSeek V4 Flash | 0.14 | 0.28 | Base API price; cached: 0.02/0.05 |
| DeepSeek V4 Pro | 1.74 | 3.48 | Cached: 1.00/3.00; China pricing: ¥1/¥12 |
| GPT‑5.4 | 2.50 | 15.00 | Cached input: 0.25; above 272k tokens: 2× input, 1.5× output |
| GPT‑5.4 Pro | 30.00 | 180.00 | Available only on Pro/Enterprise plans |
| Claude Opus 4.6 | 5.00 | 25.00 | Standard tier; above 200k tokens: 10.00/37.50 |

Observations

  • DeepSeek V4 Flash's output price is about 99% lower than Claude Opus 4.6's (0.28 vs 25.00 USD/1M tokens) and more than 99% lower than GPT‑5.4 Pro's (0.28 vs 180.00).
  • DeepSeek V4 Pro keeps an output price 86% lower than Opus 4.6 and 77% lower than GPT‑5.4, while offering comparable performance in most tasks (the sketch after this list turns the table into a quick cost estimator).
  • GPT‑5.4 sits in an intermediate price band, while GPT‑5.4 Pro and Opus 4.6 represent the premium tier.
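
As a worked example (see the pointer in the list above), here is a minimal cost estimator in Python built only from the numbers in the price table. It ignores cache discounts, and applying the long-context surcharge to the whole request once the prompt crosses the threshold is my simplification, not the providers' documented billing rule.

```python
# Prices in USD per 1M tokens, copied from the table above:
# (input, output, long-context threshold, input multiplier, output multiplier)
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28, None, 1.0, 1.0),
    "deepseek-v4-pro":   (1.74, 3.48, None, 1.0, 1.0),
    "gpt-5.4":           (2.50, 15.00, 272_000, 2.0, 1.5),
    "gpt-5.4-pro":       (30.00, 180.00, None, 1.0, 1.0),
    "claude-opus-4.6":   (5.00, 25.00, 200_000, 2.0, 1.5),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the simplifications above."""
    inp, out, threshold, in_mult, out_mult = PRICES[model]
    if threshold is not None and input_tokens > threshold:
        inp, out = inp * in_mult, out * out_mult   # long-context surcharge
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 300k-token prompt with a 4k-token answer.
for model in PRICES:
    print(f"{model:>18}: ${request_cost(model, 300_000, 4_000):,.4f}")
```

At this prompt size both GPT‑5.4 and Opus 4.6 cross their long-context thresholds, which widens the gap with DeepSeek further than the base prices alone suggest.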

🧪 Performance comparison (selected benchmarks)

| Benchmark | DeepSeek V4 Pro | Claude Opus 4.6 | GPT‑5.4 | Notes |
| --- | --- | --- | --- | --- |
| SWE‑bench (code repair) | 83.7% | ~80% | ~79% | |
| HumanEval (code generation) | ~90% | ~88% | ~87% | |
| AIME (math) | Competitive | Competitive | Competitive | Models substantially aligned |
| Multistep Task Completion | 8.90 (29/38 completed) | 8.87 (Opus 4.7, 38/38 completed) | n/a | V4 Pro excels in medium tasks, struggles with the most complex ones |
| Arena.ai (code ranking) | 14th place (thinking mode), 1st among open‑weight models | n/a | n/a | |
| Vibe Code Benchmark | 1st among open‑weight models | n/a | n/a | Beats Kimi K2.6 and Gemini 3.1 Pro |

Reading the data
The benchmarks show that DeepSeek V4 Pro is in the leading group, with an edge in coding and math tasks. Claude Opus 4.6 retains a margin in the most complex tasks (long reasoning chains, precise recall in >100k token contexts), while GPT‑5.4 holds up well thanks to tool efficiency and reduced error rates. In any case, the differences are narrow and essentially concern the most difficult 10–15% of tasks.


💡 Innovations brought by each model

DeepSeek V4

  • CSA/HCA hybrid attention (Compressed Sparse Attention + Heavily Compressed Attention): the most significant innovation. It compresses the key‑value (KV) cache along the sequence dimension, reducing the computational cost of long sequences by 73% compared to V3.2 (for 1M tokens, Pro requires only 27% of V3.2’s compute, Flash just 10%). This makes the 1‑million‑token context economically viable; a toy sketch of the idea follows this list.
  • mHC (manifold‑constrained hyper‑connections): replaces traditional residual connections to improve numerical stability during training of very deep networks.
  • Muon optimizer: accelerates convergence and improves training stability over AdamW, particularly suited to the MoE architecture.
  • FP4/FP8 mixed precision: halves the memory footprint of weights, enabling deployment on less powerful hardware.
  • Compatibility with Huawei Ascend NPUs: the first trillion-parameter model verified on Chinese accelerators, with inference speed‑ups of 1.5–1.73×.
  • Full openness: weights, architecture and training details are public (MIT license), allowing fine‑tuning and local execution.
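
Neither the CSA/HCA details nor reference code appear in this article, so the following is only a toy illustration of the general idea named above: compress the KV cache along the sequence dimension so attention runs over fewer entries. The mean pooling, block size and function names are illustrative stand-ins, not DeepSeek's actual mechanism.

```python
import numpy as np

def compress_kv(keys: np.ndarray, values: np.ndarray, block: int = 4):
    """Mean-pool the KV cache along the sequence axis in blocks of `block`.

    keys, values: (seq_len, d) arrays; returns (seq_len // block, d) arrays.
    A learned compression (as in CSA/HCA) would replace the mean here.
    """
    n = (keys.shape[0] // block) * block              # drop the ragged tail
    k = keys[:n].reshape(-1, block, keys.shape[1]).mean(axis=1)
    v = values[:n].reshape(-1, block, values.shape[1]).mean(axis=1)
    return k, v

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-query softmax attention over a (possibly compressed) cache."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ v

rng = np.random.default_rng(0)
seq_len, d = 8192, 64                                  # scaled-down "long context"
K, V = rng.normal(size=(seq_len, d)), rng.normal(size=(seq_len, d))
q = rng.normal(size=d)

Kc, Vc = compress_kv(K, V)                             # 4x fewer cache entries
print("full cache:", attention(q, K, V)[:3])
print("compressed:", attention(q, Kc, Vc)[:3])
```

Attending over a cache that is several times smaller is the same lever the article credits for the 73% compute reduction at 1M tokens; the engineering is in compressing without losing recall.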

Claude Opus 4.6

  • Agent Teams: ability to orchestrate multiple Claude instances that work in parallel and coordinate autonomously, breaking down complex tasks (code review, document analysis) as a team of engineers would.
  • Context compaction: the model can autonomously summarise its own context to extend the execution of long-running tasks without external intervention.
  • Hybrid reasoning: supports both immediate responses and extended thinking, with granular API controls over the token budget dedicated to reasoning (see the API sketch after this list).
  • Improved tool use: native integration with Excel and PowerPoint; ability to work across entire codebases, legal documents and debugging sessions without loss of fidelity.
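
Anthropic's current Messages API already exposes this kind of thinking budget, so a minimal sketch is possible; note that the model id below is a placeholder for the hypothetical Opus 4.6, not a documented identifier.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking with an explicit token budget. The `thinking` parameter
# shape matches Anthropic's current Messages API; the model id is a guess.
response = client.messages.create(
    model="claude-opus-4-6",                       # hypothetical preview id
    max_tokens=16_000,                             # must exceed the budget below
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user",
               "content": "Review this diff for race conditions: ..."}],
)

# The reply interleaves thinking and text blocks; print only the final answer.
for block in response.content:
    if block.type == "text":
        print(block.text)
```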

GPT‑5.4

  • Native Computer Use: first general‑purpose OpenAI model capable of directly controlling a computer (mouse, keyboard, screenshots). Reaches 75% on the OSWorld benchmark, exceeding human performance (72.4%).
  • Tool Search: a mechanism that loads tool definitions only when needed, cutting the tokens used in multi‑tool workflows by 47% (a generic sketch of the pattern follows this list).
  • Improved factual reliability: 33% fewer incorrect claims than GPT‑5.2 and 18% fewer responses containing errors.
  • Upfront action plans: the interface shows the model’s plan before execution, allowing mid‑course corrections.
  • 1M token context window available via API, with the promise of improved token efficiency that offsets the higher per‑token price.
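
The article does not describe Tool Search's internals, so the snippet below is only a generic sketch of the pattern it implies: ship a lightweight index of tool names up front and inject a full JSON-schema definition only when the model asks for it. Registry contents and function names are illustrative.

```python
from typing import Any

# Full tool definitions stay out of the prompt; only names plus one-line
# summaries are sent up front. This is where the token saving comes from.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {
    "query_db": {
        "description": "Run a read-only SQL query against the sales database.",
        "parameters": {"type": "object",
                       "properties": {"sql": {"type": "string"}}},
    },
    "send_email": {
        "description": "Send an email to a teammate.",
        "parameters": {"type": "object",
                       "properties": {"to": {"type": "string"},
                                      "body": {"type": "string"}}},
    },
}

def tool_index() -> list[str]:
    """Cheap, always-in-context index: tool name plus short description."""
    return [f"{name}: {spec['description']}" for name, spec in TOOL_REGISTRY.items()]

def load_tool(name: str) -> dict[str, Any]:
    """Called when the model 'searches' for a tool; returns the full schema."""
    return {"name": name, **TOOL_REGISTRY[name]}

# A workflow would send tool_index() with every request and append the
# result of load_tool(...) only after the model picks one.
print(tool_index())
print(load_tool("query_db"))
```

The index stays a few lines long no matter how many tools exist, while full schemas enter the context one at a time, which is how a lazy-loading scheme can cut multi-tool token usage so sharply.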

🧭 Positioning and final considerations

DeepSeek V4 enters the market as an open‑weight alternative at drastically lower costs, bringing architectural innovations (CSA/HCA) that make long‑context use practical even on modest hardware. In benchmarks it is competitive with the best closed models and surpasses them in some areas (coding, mathematics, Chinese). The weak point remains reliability in the most complex tasks and production stability, where Anthropic’s and OpenAI’s models show lower variance.

Claude Opus 4.6 retains the lead in tasks requiring prolonged reasoning, accurate recall in broad context windows and agent orchestration. It is the reference model for those working on large codebases or needing maximum robustness in agentic workflows.

GPT‑5.4 brings the ability to “use a computer” natively to the market, a concrete step towards autonomous agents. Its token efficiency and reduction of factual errors make it suitable for professional environments and for generating verified content. The Pro version raises the reasoning bar further, but at a very high cost.

In summary, the choice between the three models depends on the type of workload:

  • Value for money and openness: DeepSeek V4.
  • Maximum robustness and multi‑agent orchestration: Claude Opus 4.6.
  • GUI automation and error reduction: GPT‑5.4 (or GPT‑5.4 Pro for the most demanding tasks).

Note: the prices and performance reported are updated to May 2026.
