Benchmark Model - Search News

Claude Opus 4.7 Is Here: Anthropic’s Latest Model Delivers, But It’s a Token Eating Machine

Anthropic's new flagship model Claude Opus 4.7 beat every benchmark we threw at it, and eats tokens like a hungry teenager.

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

VentureBeat

Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don’t tell the whole story

Google has claimed the top spot in a ...

VentureBeat

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

A team of Abacus.AI, New York University, ...

Business Wire

New MLPerf Training and HPC Benchmark Results Showcase 49X Performance Gains in 5 Years

SAN FRANCISCO--(BUSINESS WIRE)--Today, MLCommons® announced new results from two industry-standard MLPerf™ benchmark suites: MLPerf Training v3.1 The MLPerf Training benchmark suite comprises full ...

10d on MSN

Meta unveils Muse Spark, its first AI model since hiring Alexandr Wang and a bellwether for CEO Mark Zuckerberg’s multibillion-dollar AI push

The new model is competitive with rival AI models, according to data from Meta, but won’t be widely available outside Meta’s ...

TechCrunch

Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Microsoft Open-Sources Harrier Embedding Model, Tops MTEB

UniPat AI Launches EchoZ Prediction Model, Demonstrating Performance Beyond Human Traders on Polymarket

MLPerf Training and HPC Benchmark Show 49X Performance Gains in 5 Years