Fourteen months after DeepSeek R1 erased nearly $600 billion from Nvidia's market cap in a single day, the Hangzhou-based AI lab is preparing to do it again. According to reporting by the Financial Times and corroborated by sources at TechNode, DeepSeek's V4 model is expected to launch this week — and if the leaked specifications hold, it represents the most technically ambitious open-source AI release since GPT-4.
What DeepSeek V4 Actually Is
V4 is not a modest upgrade. It is a trillion-parameter sparse model that activates approximately 32 billion parameters per token — a Mixture-of-Experts (MoE) architecture DeepSeek calls "Top-16 routed." Unlike dense models, where every parameter fires on every input, MoE routing allows V4 to maintain the breadth of a trillion-parameter system while keeping per-token inference costs at a fraction of what a comparably sized dense model would require.
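The routing idea is simple to sketch. The following toy example, with made-up shapes and names (not DeepSeek's actual implementation), shows how top-k gating activates only a small subset of experts per token:

```python
import numpy as np

def moe_route(x, gate_W, experts, k=16):
    """Toy top-k MoE routing: only k of the experts run per token.
    x: (d,) token hidden state; gate_W: (n_experts, d); experts: list of fns.
    Names and shapes are illustrative, not DeepSeek's implementation."""
    logits = gate_W @ x                      # score every expert (cheap)
    top = np.argsort(logits)[-k:]            # keep the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected k only
    # Only the k routed experts execute; the rest of the parameters stay
    # idle, which is how a trillion-parameter model can activate ~32B per token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# toy usage: 64 tiny "experts", each token routed through 16 of them
rng = np.random.default_rng(0)
d, n = 8, 64
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((d, d)))
           for _ in range(n)]
gate_W = rng.standard_normal((n, d))
y = moe_route(rng.standard_normal(d), gate_W, experts)
print(y.shape)  # (8,)
```

The per-token compute scales with k, not with the total expert count, which is the entire economic argument for sparse models at this size.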
The model's headline feature is a 1 million token context window — nearly eight times larger than the 128K context of the V3 series. Internal testing, described by AIbase, showed the model processing "entire books or long code repositories at once" while maintaining coherent logical reasoning throughout. For enterprise deployments handling large codebases, lengthy legal documents, or complex multi-document research workflows, this is a qualitative shift rather than just a quantitative one.
V4 is also DeepSeek's first natively multimodal model. Sources told the Financial Times that the system can generate text, images, and video within a unified framework — putting it in direct competition with OpenAI's GPT-5 series, Google's Gemini 3.1 Pro, and Anthropic's Claude Opus 4.6, all of which have varying degrees of multimodal capability. The internal testing codename for the preview version is "Sealion Lite", per leakers who signed NDAs before participating in the closed evaluation.
Three Architectural Innovations
V4's capabilities emerge from three distinct research innovations published in late 2025 and early 2026 — each of which attracted significant attention from Western AI labs as standalone papers before their integration into V4 became apparent.
Manifold-Constrained Hyper-Connections (mHC), introduced in a December 31, 2025 paper co-authored by founder Liang Wenfeng, solves a fundamental problem in scaling large models: traditional hyper-connections that expand residual stream width create numerical instability at scale. The mHC solution constrains connection matrices to a mathematical manifold using the Sinkhorn-Knopp algorithm, limiting signal amplification to 1.6x compared to the 3,000x amplification seen with unconstrained methods. The practical result is that a 4× wider residual stream can be trained with only 6.7% additional compute overhead. IBM Principal Research Scientist Kaoutar El Maghraoui described the innovation as "scaling AI more intelligently rather than just making it bigger."
Engram conditional memory, released January 13, 2026, addresses what DeepSeek calls "silent LLM waste" — GPU cycles consumed by static knowledge lookups that don't require active reasoning. Engram decouples knowledge retrieval from reasoning by implementing an O(1) lookup module alongside the neural backbone, using multi-head hashing to map contexts to embedding tables without memory explosion. The system's context-aware gating suppresses retrieved embeddings when they conflict with global context. VentureBeat reported that benchmarks show needle-in-a-haystack retrieval jumping from 84.2% to 97.0% accuracy with Engram enabled — and a 100-billion-parameter embedding table can be offloaded to system DRAM with less than 3% throughput penalty.
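The description above (multi-head hashing into embedding tables, plus a gate keyed to global context) can be sketched as follows. Every name here is illustrative; this is a minimal reading of the published description, not DeepSeek's code:

```python
import numpy as np

def engram_retrieve(ngram, tables, global_ctx):
    """Sketch of an Engram-style conditional memory lookup (names are
    illustrative). A local n-gram is hashed by several independent hash
    heads into embedding tables -- an O(1) lookup that sidesteps the neural
    backbone for static knowledge -- and a sigmoid gate suppresses the
    result when it disagrees with the global context."""
    d = tables[0].shape[1]
    retrieved = np.zeros(d)
    for head, table in enumerate(tables):          # multi-head hashing
        slot = hash((head, ngram)) % len(table)    # O(1) bucket per head
        retrieved += table[slot]
    retrieved /= len(tables)
    # context-aware gating: cosine agreement with the global context vector
    cos = retrieved @ global_ctx / (np.linalg.norm(retrieved)
                                    * np.linalg.norm(global_ctx) + 1e-9)
    gate = 1.0 / (1.0 + np.exp(-cos))              # sigmoid gate in (0, 1)
    return gate * retrieved

rng = np.random.default_rng(2)
tables = [rng.standard_normal((1024, 16)) for _ in range(4)]
out = engram_retrieve(("large", "language"), tables, rng.standard_normal(16))
print(out.shape)  # (16,)
```

Because the tables are touched only by hash lookups, not dense matrix multiplies, they can live in slower, cheaper memory, which is consistent with the reported sub-3% penalty for offloading them to DRAM.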
DeepSeek Sparse Attention (DSA), which TechCrunch covered when it halved API costs on the V3 series, enables the million-token context window by breaking the quadratic scaling of standard transformer attention. DSA uses a "lightning indexer" to prioritize specific context excerpts, then a fine-grained token selection system to load only the most relevant tokens into the active attention window. The result is roughly linear rather than quadratic scaling with sequence length — the difference between a million-token context being economically viable and merely theoretically possible.
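The two-stage structure (cheap indexer scores everything, full attention runs only over the survivors) can be sketched like this. The shapes and the name `sparse_attention` are assumptions for illustration, not the real DSA kernel:

```python
import numpy as np

def sparse_attention(q, K, V, idx_q, idx_K, k=64):
    """Sketch of a DSA-style two-stage attention pass (illustrative only).
    A low-dimensional "lightning indexer" scores every position cheaply,
    then full attention runs over just the top-k tokens -- roughly linear
    rather than quadratic in sequence length."""
    scores = idx_K @ idx_q                     # cheap indexer pass, O(L * d_idx)
    keep = np.argsort(scores)[-k:]             # fine-grained token selection
    att = K[keep] @ q / np.sqrt(q.size)        # full attention on k tokens only
    w = np.exp(att - att.max())
    w /= w.sum()                               # softmax over the selected set
    return w @ V[keep]

rng = np.random.default_rng(3)
L, d, d_idx = 4096, 64, 8
K, V = rng.standard_normal((L, d)), rng.standard_normal((L, d))
idx_K = rng.standard_normal((L, d_idx))        # precomputed low-dim indexer keys
out = sparse_attention(rng.standard_normal(d), K, V,
                       rng.standard_normal(d_idx), idx_K)
print(out.shape)  # (64,)
```

With full attention, the cost per query grows with L²; here the expensive step touches only k tokens regardless of L, which is what makes a million-token window economically plausible.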
The Huawei Pivot and What It Signals
Perhaps the most geopolitically significant detail of the V4 launch is who got early access — and who didn't.
DeepSeek provided Huawei and Cambricon early access to V4 weeks before its official launch to ensure deep hardware optimization on Chinese-made chips. Nvidia and AMD were not given early access. The Times of India reported that by optimizing inference for Huawei's Ascend line and Cambricon's processors, DeepSeek is deliberately accelerating China's transition away from Western AI infrastructure for the inference layer — the part of the AI stack where end-users actually receive model responses.
This is a meaningful strategic distinction. Training still requires the most advanced GPUs, and DeepSeek V4 was almost certainly trained on Nvidia hardware (smuggled or otherwise obtained prior to export control tightening). But inference — the commercially critical workload that runs continuously to serve millions of users — can now be routed to domestic Chinese silicon with performance that, according to pre-release benchmarks, rivals Nvidia-optimized deployments.
The Center for Strategic and International Studies flagged this dynamic as early as March 2025 when analyzing the first DeepSeek shock: "Export controls on advanced AI chips have proven an effective deterrent for training, but the inference layer is becoming a different story." V4's Huawei-first optimization strategy is the clearest signal yet that DeepSeek is systematically building an AI stack that can run entirely on Chinese hardware — at least at the serving tier.
Why Markets Are Already Nervous
In January 2025, the launch of DeepSeek R1, a reasoning model that matched OpenAI's o1 on key benchmarks while its base model reportedly cost just $5.6 million to train, triggered a near-$600 billion single-day drop in Nvidia's market capitalization by raising a fundamental question: if frontier AI performance doesn't require billions in compute, what is all that GPU infrastructure actually worth?
V4's architecture deepens that question. With mHC enabling more efficient parameter scaling, Engram dramatically reducing memory overhead, and DSA making million-token contexts economically viable, DeepSeek is again demonstrating that architectural innovation compounds on itself. Internal benchmarks leaked to Reddit's r/DeepSeek community — unverified — claim V4 Lite outperforms Claude Opus 4.6 and Gemini 3.1 on code optimization and visual accuracy. Even discounting for self-serving internal numbers, the direction of travel is clear.
Markets are watching. Nvidia's stock has shown pre-launch volatility in the days following the FT report. The pattern from R1's launch — initial shock, partial recovery, followed by a recalibration of GPU demand assumptions — is well-understood at this point. Whether V4 triggers a comparable reaction depends largely on how credible its multimodal benchmarks look when they're independently verified.
Open Source, With Caveats
The V4 series is expected to be released under Apache 2.0 licensing, following DeepSeek's pattern with prior models. This means the weights will be publicly available for self-hosting, fine-tuning, and commercial deployment — including on the dual RTX 4090 setups that DeepSeek V3.2 was optimized to run on.
However, the Times of India and other sources have flagged a notable change from previous releases: DeepSeek is expected to release only a short technical note alongside V4, rather than the comprehensive technical paper that accompanied R1 and V3. Where R1 came with detailed methodological documentation that researchers worldwide picked apart for months, V4's documentation is expected to be significantly more guarded. The reason, according to sources, is the ongoing controversy over "distillation attacks" — the accusation, advanced by Anthropic and others, that DeepSeek used outputs from proprietary Western models to bootstrap its training data.
The distillation question cuts to the heart of the open-source AI ecosystem's credibility problem. If DeepSeek's efficiency advantages derive in part from training on Claude or GPT-4 outputs rather than from purely novel architectural work, the implications for both intellectual property and competitive dynamics are significant. By limiting documentation, DeepSeek is apparently choosing to sidestep that scrutiny — while still benefiting from the goodwill and developer adoption that open weights generate.
The Broader Race: Where V4 Fits
V4 arrives into a frontier model landscape that has accelerated dramatically since R1. February and early March 2026 alone saw Gemini 3.1 Pro, Claude Opus 4.6, GPT-5.3 Codex, Grok 4.20, and ByteDance Seed 2.0 launch within weeks of one another. The pace has forced every major lab to compress release cycles and accelerate capability roadmaps.
What distinguishes V4 from this pack is not just performance but economics and access. A trillion-parameter model that can run on consumer-grade GPU clusters — or on Chinese domestic chips — changes the cost structure of frontier AI in ways that have downstream effects on every enterprise deploying these systems. When inference costs drop by an order of magnitude, the business models built around premium API access face structural pressure.
For the US-China AI competition specifically, V4 marks a maturation point. The first DeepSeek shock was surprising because no one expected Chinese AI to reach this performance tier under export control pressure. The V4 launch, by contrast, comes with well-telegraphed advance signals — FT reporting, community leaks, GitHub activity analysis — because everyone now takes DeepSeek seriously. The shock, if it comes, will be smaller in magnitude but larger in strategic implication: proof that the initial R1 result wasn't a one-off but a repeatable capability.
What to Watch at Launch
When DeepSeek officially announces V4, the following will determine how consequential it actually is:
Multimodal benchmark quality — specifically, whether its image and video generation outputs are competitive with GPT-5's DALL-E integration and Gemini 3.1's native video capabilities, or whether multimodal is a marketing checkbox with limited real-world performance.
Context window reliability — million-token context windows are only useful if coherence and retrieval accuracy hold across the full window. The Engram benchmarks showing needle-in-a-haystack accuracy at 97% are promising, but independent evaluation on real-world long-context tasks will be the real test.
Huawei performance gap — whether inference on Ascend hardware is genuinely competitive with Nvidia-based deployments, or whether the optimization story is more aspirational than operational at launch.
API pricing — DeepSeek has historically priced aggressively relative to Western competitors. V4's API pricing will signal whether the company intends to press its cost advantage or move upmarket toward premium positioning.
The V4 launch is expected within days. Whatever the final benchmark numbers show, the trajectory is clear: DeepSeek has systematically closed the gap with Western AI labs while simultaneously building a parallel infrastructure stack designed to operate outside Western semiconductor supply chains. That combination — capability parity plus hardware independence — is the actual story the markets are pricing.