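| Evaluation dimension | Claude Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- | --- |
| Agentic coding (SWE-bench Verified) | 87.6% | — | 80.6% |
| Agentic search (BrowseComp) | 79.3% | 89.3% | 85.9% |
| Scaled tool use (MCP-Atlas) | 77.3% | 68.1% | 73.9% |
| Agentic computer use (OSWorld-Verified) | 78.0% | 75.0% | — |
| Agentic financial analysis (Finance Agent v1.1) | 64.4% | 61.5% | 59.7% |
| Visual reasoning (CharXiv w/ tools) | 91.0% | — | — |

Rows that are incomplete in the source (values listed in their original order; the benchmark name or the column assignment could not be recovered):

- (benchmark name missing): 64.3%, 54.2%
- Agentic terminal coding (Terminal-Bench 2.0): 75.1%, 68.5%
- Multidisciplinary reasoning (Humanity’s Last Exam w/ tools): 54.7%, 58.7%
- (benchmark name missing): 94.2%, 94.3%
- (benchmark name missing): 91.5%, —, 92.6%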
If you repost this article, please retain the source: https://51itzy.com/kjqy/270982.html