周天财经

OpenAI Unveils GPT-5.2 to Counter Google's Gemini 3 Dominance, Claims Strongest Agent Coding Capabilities

December 12, 2025

OpenAI on Thursday launched GPT-5.2, its most advanced artificial intelligence (AI) model to date, in its first major response to Google's Gemini 3. The release is meant to help OpenAI reclaim its lead in AI development after weeks of losing ground to rivals.

The company released GPT-5.2 in three tiers across ChatGPT and its API platform. Paid ChatGPT users on Plus, Pro, Go, Business and Enterprise plans gain immediate access, and they can keep using GPT-5.1 as a legacy model for three more months before it is retired. In the API, GPT-5.2 Thinking is available immediately at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs.
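The API prices quoted above lend themselves to a quick back-of-the-envelope estimate. The Python sketch below is illustrative only: the function name `estimate_cost_usd` and the token counts in the usage line are assumptions, not part of any OpenAI SDK, and it assumes the 90% discount is applied to the input rate for whatever share of the input tokens is served from cache.

```python
# Rates as reported in the article for GPT-5.2 Thinking (USD per million tokens).
INPUT_USD_PER_MILLION = 1.75
OUTPUT_USD_PER_MILLION = 14.00
CACHED_INPUT_DISCOUNT = 0.90  # assumption: discount applies to the input rate


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")

    fresh_input_tokens = input_tokens - cached_input_tokens
    cached_rate = INPUT_USD_PER_MILLION * (1 - CACHED_INPUT_DISCOUNT)

    cost = (
        fresh_input_tokens * INPUT_USD_PER_MILLION
        + cached_input_tokens * cached_rate
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000
    return cost


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens (150k of them cached), 20k output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000, cached_input_tokens=150_000):.4f}")
```

At these rates, output tokens cost eight times as much as uncached input tokens, so long reasoning-heavy responses are likely to dominate the bill.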

The launch comes weeks after CEO Sam Altman declared a "code red" in an internal memo, redirecting resources to ChatGPT improvements as the company faced declining traffic and market share losses to Google. OpenAI says GPT-5.2 surpasses human experts at professional tasks, setting new benchmarks in coding, mathematical reasoning and scientific research.

The competitive stakes have intensified as Google's Gemini 3 topped LMArena leaderboards and earned widespread praise for reasoning capabilities, threatening OpenAI's first-mover advantage at a time when the startup has committed over $1 trillion to AI infrastructure development.

Professional Performance Reaches Expert Level

GPT-5.2 represents OpenAI's first model to match or exceed human expert performance on GDPval, a benchmark measuring well-specified knowledge work across 44 occupations. The model beats or ties top industry professionals on 70.9% of comparisons, according to expert judges evaluating tasks spanning presentations, spreadsheets and other professional deliverables.

The model delivers these results at 11 times the speed and less than 1% the cost of expert professionals, OpenAI said. On internal benchmarks testing junior investment banking analyst tasks, GPT-5.2 Thinking scored 68.4%, a 9.3 percentage point improvement over GPT-5.1's 59.1%.
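The "percentage point" wording matters here, because the same two scores give a different figure when expressed as a relative gain. A minimal Python sketch of the distinction, using only the two scores reported above (the function names are hypothetical, chosen for illustration):

```python
def percentage_point_gain(new_pct, old_pct):
    """Absolute difference between two percentage scores."""
    return new_pct - old_pct


def relative_gain_pct(new_pct, old_pct):
    """Improvement expressed as a percentage of the old score."""
    if old_pct == 0:
        raise ValueError("old_pct must be non-zero")
    return (new_pct - old_pct) / old_pct * 100


if __name__ == "__main__":
    gpt_5_2, gpt_5_1 = 68.4, 59.1  # scores as reported in the article
    print(f"{percentage_point_gain(gpt_5_2, gpt_5_1):.1f} percentage points")
    print(f"{relative_gain_pct(gpt_5_2, gpt_5_1):.1f}% relative improvement")
```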

One GDPval judge reviewing outputs commented the work "appears to have been done by a professional company with staff, and has a surprisingly well designed layout and advice."

Coding Capabilities Target Developer Market

GPT-5.2 Thinking achieved 55.6% on SWE-Bench Pro, a rigorous evaluation testing real-world software engineering across four programming languages. It also reached 80% on SWE-bench Verified, a new high for OpenAI.

Coding platforms reported measurable improvements. Jeff Wang, CEO of Windsurf, said GPT-5.2 "represents the biggest leap for GPT models in agentic coding since GPT-5" and enabled his company to collapse fragile multi-agent systems into single mega-agents with 20-plus tools. Cognition, Warp, Charlie Labs, JetBrains and Augment Code reported state-of-the-art agentic coding performance.

Research lead Aidan Clark told reporters that stronger mathematical reasoning translates across workloads. "These are all properties that really matter across a wide range of different workloads," Clark said, citing financial modeling, forecasting and data analysis as key applications.

Product lead Max Schwarzer said GPT-5.2 Thinking responses contain 38% fewer errors than its predecessor, making the model more dependable for daily decision-making and research.

Scientific Research and Mathematical Breakthroughs

OpenAI positions GPT-5.2 Pro and Thinking as the world's best models for accelerating scientific research. GPT-5.2 Pro scored 93.2% on GPQA Diamond, a graduate-level benchmark testing science knowledge, while GPT-5.2 Thinking achieved 92.4%.

On FrontierMath's expert-level mathematics problems, GPT-5.2 Thinking solved 40.3% of Tier 1-3 challenges, setting a new state of the art. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1, up from o3-preview's 87%, while cutting the cost of reaching that score by a factor of roughly 390.

In recent research, GPT-5.2 Pro helped researchers explore an open question in statistical learning theory, proposing a proof subsequently verified by authors and external experts. The company said this demonstrates how frontier models can assist mathematical research under human oversight.

Strategic Response to Competitive Pressure

Fidji Simo, CEO of applications at OpenAI, told CNBC that GPT-5.2 development spanned many months, predating the recent code red directive. "While we are proud that we are able to have a cadence of releasing models fast, this particular integration has been in the works for a while," Simo said.

Altman told CNBC on Thursday that "Gemini 3 has had less of an impact on our metrics than maybe we feared." He said he expects OpenAI to exit code red mode by January "in a very strong position."

The company has committed more than $1 trillion to AI infrastructure alongside partners NVIDIA and Microsoft. Azure data centers and NVIDIA GPUs, including H100, H200 and GB200-NVL72, underpin OpenAI's training infrastructure.

However, the focus on compute-intensive reasoning models presents financial challenges. GPT-5.2's Thinking and Pro modes consume significantly more computing resources than standard chatbots, potentially creating pressure as OpenAI already spends more on inference compute than previously disclosed, according to recent reports.

New Safety Features and Product Roadmap

OpenAI announced it has begun rolling out age prediction software to apply content protections for users under 18. Simo said the company plans to launch "adult mode" in the first quarter of 2026, allowing uses such as "erotica for verified adults."

The company strengthened responses to sensitive conversations, with improvements in handling prompts indicating suicide risk, self-harm, mental health distress or emotional reliance on the model. Details appear in the updated GPT-5.2 System Card.

OpenAI has no current plans to deprecate GPT-5.1, GPT-5 or GPT-4.1 in the API and will provide advance notice of any future deprecations. The company expects to release a Codex-optimized version of GPT-5.2 in coming weeks.

Enterprise partners including Notion, Box, Shopify, Harvey, Zoom, Databricks, Hex and Triple Whale reported state-of-the-art performance for long-horizon reasoning, tool-calling, data science and document analysis tasks.

© 2025 广州小舟天传媒有限公司 by 周天财经 - 粤 ICP 备 2025452169 号-1
