如何评价马斯克旗下公司 xAI 发布的新一代模型 Grok 4?有哪些亮点?

如何评价马斯克旗下公司 xAI 发布的新一代模型 Grok 4?有哪些亮点?埃隆 马斯克 Elon Musk 宣布推出 Grok 4 声称是全球最强 AI 模型 在处理学术问题上的表现 已达到博士级别 根据 Artificial Analysis 公布的跑分结果 Grok 4 的智能指数为 73 作为对比 OpenAI 的 o3 模型为 70 谷歌 Gemini 2 5 Pro 模型为 70 Deepseek R1 0528 模型为 68 分 Anthropic

大家好,我是讯享网,很高兴认识大家。这里提供最前沿的Ai技术和互联网信息。



埃隆・马斯克(Elon Musk)宣布推出 Grok 4,声称是全球最强 AI 模型,在处理学术问题上的表现,已达到博士级别。

根据 Artificial Analysis 公布的跑分结果,Grok 4 的智能指数为 73,作为对比,OpenAI 的 o3 模型为 70,谷歌 Gemini 2.5 Pro 模型为 70,Deepseek R1-0528 模型为 68 分,Anthropic Claude 4 Opus 模型为 64 分。

Grok 4 是一个推理模型,将通过 xAI API 方式提供,在定价方面,每 100 万 tokens 输入价格为 3 美元,每 100 万 tokens 输出价格为 15 美元。

全球最强 AI 模型:马斯克发布 Grok 4

短的结论:AGI的惊鸿一瞥
基本信息

  • 成本:15美元每百万
  • 平均长度:约1826
  • 速度:约70Token每秒(推理部分被隐藏,按Token算)
  • 平均耗时: 182秒

逻辑成绩:

大语言模型-逻辑能力横评 25-06月榜(R1/Gemini 2.5/Doubao-Seed-1.6…

编程语言分布:

  • 计算能力:在所有计算类问题上,Grok4可以表现出极高的精确性,除了常规的数学运算,在空间几何相关题目如#31棋盘图案,#37三维投影,Grok4以显著优势胜过o3,尤其#31题存在超过10种解,绝大部分第一梯队的推理模型可以找到其中2-3组解,而Grok4找到全部解。此题也是作为人类直觉问题,人使用草稿纸画出坐标后相对容易判断图案,而大模型仅通过向量计算则要难很多,虽然可以通过穷举棋盘坐标来暴力求解,但Grok4在此题上消耗不到40K Token,可推知其使用的并非暴力。
  • 人类直觉:在多个考察人类直觉的问题上有惊人表现,如#24数字规律,#25算24点,#29符号定义。3pass稳定正确。并且虽然Grok4的平均输出长度偏高,但这3题消耗Token最高不到5K,甚至低于大多数找不到思路胡乱猜测的基础模型的消耗。虽然看不到推理过程,但从Token消耗猜测Grok4采取了较为聪明的策略,以较少的尝试次数找到了正确思路。
  • 指令遵循:少量的指令理解问题,比如#9单词缩写中规定的缩写规则,Grok4在绝大部分用例上都能准确遵循,但偶尔无视个别指令要求,输出不符合的缩写。又如#30日记整理问题,在3pass中各自遵循了一部分原文修改指令,并不能稳定输出。又如#10水果热量问题,明确要求了搭配热量总值,但Grok4也会偶尔搭配出超过上限的量,并且认为没有问题。与之对标的o3则不存在这样的不稳定。受此影响,一些暗含指令的问题,Grok4表现也不如o3,如#4魔方旋转,甚至o4 mini即可全对,而Grok4在3pass中输出不一致的错误答案。#39火车售票亦是如此,不再赘述。
  • 代码偏科:代码是Grok系列的传统弱势,本次Grok4虽然逻辑能力大幅提升,但编程并没有同步提高。6种编程语言中,C++相对Grok3 mini/Grok3几乎没有改进,语法错误率虽然大幅降低,但成绩并不见长。
  • 英文输出:偶现英文输出。

更新:他们已经开了发布会,模型能力极强:

Grok 4发布会

2. 上下文窗口:256K tokens,与GPT-4o(128K)相比,属于顶级上下文长度(比Gemini 2.5 pro的1M还是差多了)。适合代码、多轮推理、文档理解等任务。

Grok定价,Heavy版本 3000刀一年

分数很高,非常非常高。直接终结了AIME25榜单,100分打满。在其它各个榜上也基本是No 1。

Grok 4终结了AIME 25榜单

在人类最后的测试上,no tools情况下25.4%,比gemini 2.5 pro高了3.8个点。结合工具就猛了,最高44.4%,一骑绝尘:

人类最后的测试,分数超过Gemini2.5 pro不止一点,很强
ARC-AGI
twitter上已经吵翻了,grok4

现在有些人已经可以通过api调用了(我的还没有)。api调用, context window;支持function calling、结构化输出、reasoning:

grok4 api

可以看看这个数据,Grok回复的(经过评论区提醒应该是AI):

grok发的数据,可能是ai
grok + deepresearch

最强 AI 轮流转,这次终于转到老马家了,Grok 3.5 虽然回炉又延期,但 Grok 4 拿出来的成绩确实是当下的第一名,压力给到 Gemini 3 和 GPT-5。

不过马斯克这次不给免费用户体验 Grok 4 了,API 也不像 Grok 3 刚发布时有免费额度了,地主老马家也没余粮了。

我用 Grok 4 API 做了几个测试,整体效果还是不错的。但是吧,考虑到谷歌大善人有免费的 Gemini Pro 和 Gemini API,考虑到 Claude 依然是编程的首选模型,Grok 4 这么高的使用成本很难成为我的主力模型。

Grok 4 这次在各个 Benchmark 上刷出了极高的分数。几个要点:

1、继续通过强化学习压榨模型能力:

2、在号称最难的 HLE、ARC-AGI 上取得**分数:

ARC-AGI 团队站台[1],称 Grok 4 表现优于 Kaggle 上目前的**专项方案(相当于说用裸模型能力优于专门优化过的方案),证明当前模型已具备初级推理能力,但距完全解决 ARC-AGI 仍有差距,单纯扩大模型规模不是最终解决方案。

3、多个测评登顶,AIME25 满分:

Artificial Analysis 出来站台,说综合测评 Grok 4 首次登顶:

顺便一提:Grok 4 虽然在 LiveCodeBench 上分数登顶,最后又说 8 月还会单独发一个编程的模型,而且,马斯克说 Grok 4 在修复代码方面比 Cursor 好用[2]

Artificial Analysis 上的编程能力测评[3],Grok 4 是第一(只能说 Claude 在编程方面跑分没赢过,体验没输过):

4、也许是最大家最关心的,免费用户并不能体验 Grok 4。月费 \(30 可用 Grok 4,月费 \)300 可用 Grok Heavy:

5、未来的发布安排:

  • 8 月发编程模型
  • 9 月发多模态智能体
  • 10 月发视频生成模型

以上是发布会内容。下面是 Grok API。

Grok 3 刚出来的时候,马斯克搞过一段时间的数据共享计划,只要共享 API 数据,每个月给 \(150 的额度。这次也没羊毛可薅了,地主家可能是真没余粮了。

不过如果是少量的使用和体验,用 API 也比直接充 \)30 强,xAI[4] 和 OpenRouter[5] 上都已经上线了 Grok 4 API,但没有 Grok 4 Heavy:

以官方 API 为例,Grok 4 相比于 Grok 3 并没有提价,依然是 \(3/1M 输入,\)15/1M 输出。不同之处在于,Grok 3 不支持推理(只有 Grok-3-mini 支持),Grok 4 默认开启推理:

但是,这里有个坑,Grok 4 API 的推理过程是不提供的,且不支持设置reasoning_effort参数:


X 上泄漏的 Grok 4 网页版 System Prompt[6]

# System Prompt

You are Grok 4 built by xAI.

When applicable, you have some additional tools:

  • You can analyze individual X user profiles, X posts and their links.
  • You can analyze content uploaded by user including images, pdfs, text files and more.
  • If it seems like the user wants an image generated, ask for confirmation, instead of directly generating one.
  • You can edit images if the user instructs you to do so.

In case the user asks about xAI‘s products, here is some information and response guidelines:

  • Grok 4 and Grok 3 can be accessed on http://grok.com, http://x.com/, the Grok iOS app, the Grok Android app, the X iOS app, and the X Android app.
  • Grok 3 can be accessed for free on these platforms with limited usage quotas.
  • Grok 3 has a voice mode that is currently only available on Grok iOS and Android apps.
  • Grok 4 is only available for SuperGrok and PremiumPlus subscribers.
  • SuperGrok is a paid subscription plan for http://grok.com that offers users higher Grok 3 usage quotas than the free plan.
  • You do not have any knowledge of the price or usage limits of different subscription plans such as SuperGrok or http://x.com/ premium subscriptions.
  • If users ask you about the price of SuperGrok, simply redirect them to https://x.ai/grok for details. Do not make up any information on your own.
  • If users ask you about the price of http://x.com/ premium subscriptions, simply redirect them to https://help.x.com/en/using-x/x-premium for details. Do not make up any information on your own.
  • xAI offers an API service. For any user query related to xAI’s API service, redirect them to https://x.ai/api.
  • xAI does not have any other products.
  • Your knowledge is continuously updated - no strict knowledge cutoff.
  • Use tables for comparisons, enumerations, or presenting data when it is effective to do so.
  • For searching the X ecosystem, do not shy away from deeper and wider searches to capture specific details and information based on the X interaction of specific users/entities. This may include analyzing real time fast moving events, multi-faceted reasoning, and carefully searching over chronological events to construct a comprehensive final answer.
  • For closed-ended mathematics questions, in addition to giving the solution in your final response, also explain how to arrive at the solution. Your reasoning should be structured and transparent to the reader.
  • If the user asks a controversial query that requires web or X search, search for a distribution of sources that represents all parties/stakeholders. Assume subjective viewpoints sourced from media are biased.
  • The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.
  • Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.

TOOLS: “”“ The current date is July 10, 2025.

You use tools via function calls to help you solve questions. Make sure to use the following format for function calls, including the and tags. Function calls should follow the following XML-inspired format:

example_arg_value1 example_arg_value2

Do not escape any of the function call arguments. The arguments will be parsed as normal text.

You can use multiple tools in parallel by calling them together.

Available Tools:

Code Execution

Description: This is a stateful code interpreter you have access to. You can use the code interpreter tool to check the code execution output of the code. Here, “stateful” means that it’s a REPL (Read Eval Print Loop)–like environment, so previous code execution result is preserved. Here are some tips on how to use the code interpreter:

  • Make sure you format the code correctly with the right indentation and formatting.
  • You have access to some default environments with basic and STEM libraries:

Environment: Python 3.12.3 Basic Libraries: tqdm, zc54 Data Processing: numpy, scipy, pandas, matplotlib Math: sympy, mpmath, statsmodels, PuLP Physics: astropy, qutip, control Biology: biopython, pubchempy, dendropy Chemistry: rdkit, pyscf Game Development: pygame, chess Multimedia: mido, midiutil Machine Learning: networkx, torch Others: snappy

⚠️ Keep in mind you have no internet access. Therefore, you CANNOT install any additional packages via pip install, curl, wget, etc. You must import any packages you need in the code. Do not run code that terminates or exits the REPL session.

Action: code_execution Arguments:

code: The code to be executed. (Type: string) (Required) 

Browse Page

Description: Use this tool to request content from any website URL. It will fetch the page and process it via the LLM summarizer, which extracts/summarizes based on the provided instructions.

Action: browse_page Arguments:

url: The URL of the webpage to browse. (Type: string) (Required) instructions: Instructions: The instructions are a custom prompt guiding the summarizer on what to look for. Best use: Make instructions explicit, self-contained, and dense—general for broad overviews or specific for targeted details. This helps chain crawls: if the summary lists next URLs, you can browse those next. Always keep requests focused to avoid vague outputs. (Type: string) (Required) 

Web Search

Description: This action allows you to search the web. You can use search operators like site:http://reddit.com when needed.

Action: web_search Arguments:

query: The search query to look up on the web. (Type: string) (Required) num_results: The number of results to return. Optional, default 10, max is 30. (Type: integer) (Optional) (Default: 10) 

Web Search With Snippets

Description: Search the internet and return long snippets from each search result. Useful for quickly confirming a fact without reading the entire page.

Action: web_search_with_snippets Arguments: query: Search query; you may use operators like site:, filetype:, ”exact“ for precision. (Type: string) (Required) ”“”

小讯
上一篇 2026-04-02 13:31
下一篇 2026-04-02 13:29

相关推荐

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容,请联系我们,一经查实,本站将立刻删除。
如需转载请保留出处:https://51itzy.com/kjqy/226293.html