DeepSeek V4: 1 Trillion Parameters, Open-Source, and 94.7% on HumanEval
DeepSeek V4 arrived in early April 2026 with fully open weights under the Apache 2.0 license. It is a one-trillion-parameter Mixture-of-Experts model — meaning its architecture activates only a fraction of parameters per inference, achieving high capability at manageable compute cost. On HumanEval (the standard code generation benchmark), DeepSeek V4 scored 94.7%. It excels in long-context reasoning and coding. Most remarkable: the estimated training cost was approximately $5.2 million — a fraction of the $100 million-plus budgets associated with comparable frontier models from US labs.
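The sparse-activation idea behind Mixture-of-Experts can be made concrete with back-of-the-envelope arithmetic. Only the one-trillion total comes from the announcement; the expert count, routing width, and shared-parameter fraction below are illustrative assumptions, not published DeepSeek V4 figures.

```python
# Back-of-the-envelope MoE arithmetic. Only the 1T total parameter
# count comes from the release; the expert count, routed experts per
# token, and shared (non-expert) fraction are illustrative assumptions.

TOTAL_PARAMS = 1_000_000_000_000  # 1T total (from the announcement)
N_EXPERTS = 256                   # assumed experts per MoE layer
K_ACTIVE = 8                      # assumed experts routed per token
SHARED_FRACTION = 0.10            # assumed params outside the experts
                                  # (attention, embeddings, router)

def active_params(total, n_experts, k_active, shared_fraction):
    """Parameters touched per token under top-k expert routing."""
    shared = total * shared_fraction
    per_expert = total * (1 - shared_fraction) / n_experts
    return shared + k_active * per_expert

active = active_params(TOTAL_PARAMS, N_EXPERTS, K_ACTIVE, SHARED_FRACTION)
print(f"active per token: {active / 1e9:.0f}B of {TOTAL_PARAMS / 1e9:.0f}B "
      f"({active / TOTAL_PARAMS:.1%})")
```

Under these assumed numbers, each token touches roughly an eighth of the weights, which is why a 1T-parameter MoE can serve inference at a fraction of the compute cost of a dense model the same size.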
DeepSeek V4 is the latest and most dramatic demonstration that frontier AI capability is no longer the exclusive preserve of companies with billion-dollar compute budgets.
The Training Efficiency Story
The $5.2 million training cost claim is the number that stops most AI professionals in their tracks. For context: training a comparable US frontier model — one that achieves HumanEval scores above 90% — has typically required compute budgets of $50-100 million or more. If DeepSeek V4’s training cost claim is accurate, it represents a 10-20x efficiency improvement over comparable Western models. The efficiency gains come from several architectural innovations: a tiered KV cache storage system that cuts memory requirements by 40%, sparse FP8 decoding that achieves a 1.8x inference speedup, and an enhanced pre-training curriculum that improves training efficiency by 30%.
These are not marginal improvements. They represent a different approach to the fundamental tradeoffs in large model training — an approach that was developed largely outside the US AI ecosystem and is now available to anyone, for free, under Apache 2.0.
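The claimed 40% KV-cache reduction is easiest to appreciate with a concrete memory estimate. Only the 40% figure comes from the efficiency claims above; the layer count, head geometry, context length, and precision below are assumptions chosen purely for illustration.

```python
# Rough KV-cache sizing for a single sequence. The 40% reduction is
# the figure claimed for the tiered cache; every architectural
# dimension below (layers, KV heads, head size, context) is assumed.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes):
    """Bytes needed to cache keys and values for one sequence."""
    # factor of 2 = one tensor for keys, one for values, per layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

full = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128,
                      seq_len=128_000, dtype_bytes=2)  # FP16/BF16
tiered = full * (1 - 0.40)  # claimed 40% reduction

print(f"plain cache:  {full / 2**30:.1f} GiB per sequence")
print(f"tiered cache: {tiered / 2**30:.1f} GiB per sequence")
```

At long context lengths the KV cache, not the weights, often dominates serving memory, which is why a 40% cache reduction translates directly into more concurrent long-context sequences per GPU.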
94.7% on HumanEval: What It Means
HumanEval tests code generation across a set of Python programming problems. Human expert programmers typically score around 77-80%. Claude Opus 4.6, the current leader on SWE-bench, scores around 80.8% on that more demanding coding benchmark. DeepSeek V4’s 94.7% on HumanEval puts it at the top of publicly available code generation benchmarks — though HumanEval and SWE-bench measure somewhat different capabilities, and direct comparisons require careful interpretation. What is not ambiguous: this is a model that writes very good code, available for free, that can be run on your own infrastructure.
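HumanEval scores are conventionally reported as pass@k: the probability that at least one of k sampled completions passes a problem's hidden unit tests, averaged over all 164 problems. A minimal sketch of the standard unbiased estimator used by the HumanEval evaluation harness, given n generated samples per problem of which c pass:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that a random draw of k
    completions out of n generated ones (c of them correct) contains
    at least one correct completion.
    """
    if n - c < k:      # fewer incorrect samples than k, so any draw
        return 1.0     # of k must include a correct completion
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 3 of 10 samples pass -> pass@1 is the plain fraction correct
print(f"{pass_at_k(n=10, c=3, k=1):.3f}")  # prints 0.300
```

A headline number like 94.7% is the mean of this quantity over the benchmark's problems; whether it is pass@1 or a higher k materially changes how impressive it is, which is one reason cross-model comparisons need care.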
The Geopolitical Dimension
DeepSeek is a Chinese lab. Its models are released with open weights and Apache 2.0 licensing. The combination creates a genuine geopolitical puzzle: the US export control framework for AI chips is designed to prevent China from training frontier models, but DeepSeek V4’s claimed training efficiency makes those controls less effective if the efficiency claims hold under scrutiny. Training a frontier model for $5.2 million on less powerful hardware changes the calculus of what export controls can achieve. This is a policy problem that Washington has not yet resolved, and DeepSeek V4’s release has reinvigorated the debate.
For enterprises evaluating DeepSeek V4 for production use, the practical considerations are straightforward: the model is capable, the license is permissive, and the weights are publicly available. The governance questions — around Chinese lab provenance, potential data exposure in self-hosted deployments, and regulatory compliance in specific industries — are real and require assessment. For use cases where those governance concerns are manageable, DeepSeek V4 represents a compelling capability-per-cost option that no comparable Western open-source model currently matches.