周天财经

DeepSeek Unveils Its Most Powerful Open-source Innovation, Challenging GPT-5 and Gemini 3

December 2, 2025
In: Business

Image source: unsplash


On the occasion of ChatGPT's third birthday, its competitor DeepSeek showed up with a "birthday gift" that seems a little too competitive, as if unwilling to let the pioneer of large language models enjoy an easy celebration.

On the evening of December 1, DeepSeek released two official models at once: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. According to the accompanying technical paper, these models achieve world-leading reasoning capabilities.

According to DeepSeek, V3.2, the newly updated "regular lineup" model now available on the web, in the app, and via API, strikes a balance between reasoning ability and output length, making it well suited for everyday use.

In reasoning benchmarks, V3.2, GPT-5, and Claude 4.5 each showed strengths in different domains. Only Gemini 3 Pro delivered noticeably stronger overall performance than these three.

Image source: DeepSeek official WeChat

Meanwhile, DeepSeek also stated that compared to the recently released Kimi-K2-Thinking from the domestic large model developer Moonshot AI, DeepSeek V3.2 has significantly reduced output length, which greatly decreases computational overhead and user wait time. In agent benchmarking, V3.2 also outperformed other open-source models such as Kimi-K2-Thinking and MiniMax M2, making it the strongest open-source large model to date. Its overall performance is now extremely close to that of the top closed-source models.

Image from DeepSeek official WeChat


What's even more noteworthy is V3.2's performance in certain Q&A scenarios and general agent tasks. In a specific case involving travel advice, for example, V3.2 leveraged deep reasoning along with web crawling and search engine tools to provide highly detailed and accurate travel tips and recommendations. The latest API update for V3.2 also supports tool usage in "thinking mode" for the first time, greatly enriching the usefulness and breadth of answers users receive.

In addition, DeepSeek specifically emphasized that V3.2 was not specially trained on the tools featured in these evaluation datasets.

We've observed that while benchmark scores for large models keep climbing, these models often make basic factual errors in everyday user interactions (a criticism directed especially at GPT-5 upon its release). Against this backdrop, DeepSeek has made a point of highlighting with each update that it avoids relying solely on correct answers as a reward mechanism. As a result, it has not produced a so-called "super-intelligent brain" that appears clever in benchmarks yet fails at the simple tasks and questions that matter to ordinary users, in other words a "low EQ" AI agent.

Overcoming this challenge at a fundamental level, by building a large model with both high IQ and high EQ, is the key to developing a truly versatile, reliable, and efficient AI agent. DeepSeek also believes that V3.2 can demonstrate strong generalization capabilities in real-world application scenarios.

In order to strike a balance between computational efficiency, powerful reasoning capabilities, and agent performance, DeepSeek has implemented comprehensive optimizations across training, integration, and application layers. According to its technical paper, V3.2 introduces DSA (DeepSeek Sparse Attention mechanism), which significantly reduces computational complexity in long-context scenarios while maintaining model performance.
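
To make the cost argument concrete, here is a minimal sketch of one generic form of sparse attention, in which a query attends only to its top-k highest-scoring keys instead of all of them. It is an illustration under assumptions, not DeepSeek's DSA: the function names (`dense_attention`, `topk_sparse_attention`), the use of the plain scaled dot product as the selection score, and the toy inputs are all hypothetical and are not taken from the technical paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(query, keys, values):
    """Standard attention for one query: a weight for every key."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def topk_sparse_attention(query, keys, values, k):
    """Attention restricted to the k highest-scoring keys.

    Assumption: selection reuses the full attention scores. A practical
    system would need a cheaper selector, otherwise nothing is saved.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy data: four keys, one-dimensional values.
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    query = [1.0, 0.5]
    print(dense_attention(query, keys, values))
    print(topk_sparse_attention(query, keys, values, k=2))
```

With k fixed, the softmax and weighted sum touch k values per query rather than the whole context, which is where the long-context savings in this kind of scheme would come from. Setting k to the number of keys recovers dense attention.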

At the same time, to integrate reasoning capabilities into tool-using scenarios, DeepSeek has developed a new synthesis pipeline that enables systematic, large-scale generation of training data. This approach facilitates scalable agent post-training optimization, substantially improving generalization in complex, interactive environments as well as the model's ability to follow instructions.

In addition, as mentioned earlier, V3.2 is also the first model from DeepSeek to incorporate reasoning into tool usage, greatly enhancing the model's generalization capabilities.

If V3.2 focuses on "saying things that make sense and getting things done", a balance-seeking approach for practical intelligent agents, then V3.2 Speciale, the "Special Forces" model, is positioned to push the reasoning ability of open-source models to the limit and to explore the boundaries of model capabilities through extended reasoning.

It's worth noting that a major highlight of V3.2 Speciale is its integration of the theorem-proving capabilities from DeepSeek-Math-V2, the most powerful mathematical large model released just last week.

Math-V2 not only achieved gold-medal-level performance in the 2025 International Mathematical Olympiad and the 2024 China Mathematical Olympiad, but also outperformed Gemini 3 in the IMO-Proof Bench benchmark evaluation.

Moreover, like the approach described earlier, this mathematical model also tries to overcome the limitations of correct-answer reward mechanisms and the so-called "test-solver" identity by adopting a self-verification process. In doing so, it seeks to break through the current bottlenecks in AI's deep reasoning and enable large models to truly understand mathematics and logical derivations, with the aim of more robust, reliable, and versatile theorem-proving capabilities.

With its greatly enhanced reasoning abilities, V3.2 Speciale has achieved Gemini 3.0 Pro-level results in mainstream reasoning benchmarks. However, V3.2 Speciale's performance advantages come at the cost of consuming a large number of tokens, which significantly increases its operational costs. As a result, it currently does not support tool calls or everyday conversation and writing, and is intended for research use only.

From OCR to Math-V2, then to V3.2 and V3.2 Speciale, each of DeepSeek's recent product launches has been met with widespread praise. At the same time, these releases have not only brought significant improvements in overall capabilities, but also continually clarified the main development trajectories of "practicality" and "generalization".

In the second half of 2025, GPT-5, Gemini 3, and Claude Opus 4.5 launched one after another, each outperforming the last in benchmark tests, while DeepSeek rapidly caught up. The race to be crowned the "most powerful large model" is already getting crowded. Leading large models now show clear distinctions in their training approaches as well as in their real-world performance, setting the stage for an even more exciting competition among large models in 2026. (Author: Hu Jiameng; Editor: Li Chengcheng)

© 2025 广州小舟天传媒有限公司 by 周天财经 - 粤 ICP 备 2025452169 号-1
