Topics

AI Models

Foundation models, multimodal systems, reasoning, and product shifts from new model releases.

23 articles Latest 2026-05-21 Subscribe to topic RSS

Related tags

#AI-Model#LLM#Gemini#Reasoning#Multimodal#GPT#GPT-5.4#Image-Generation#Nano-Banana

Top sources

OpenAI (5)Anthropic (2)arXiv (2)Google (2)Alibaba / Tongyi Qianwen (1)ARC Prize (1)

Articles

AI Models 2026-05-21

OpenAI Model Autonomously Solves 80-Year-Old Geometry Problem

An OpenAI reasoning model disproves a central conjecture in discrete geometry that had stood for nearly 80 years — marking the first time an AI system has autonomously solved an open mathematical problem in an active field.

AI Agents 2026-05-17

δ-mem Brings Efficient Online Memory to Large Language Models

A new lightweight memory mechanism using only an 8×8 state matrix gives frozen LLMs associative memory through delta-rule learning, boosting agent benchmark performance by up to 31% without full fine-tuning.

AI Models 2026-05-16

Don't Expect AI Progress to Sigmoid Anytime Soon

Scott Alexander pushes back against the 'all exponentials become sigmoids' argument used to dismiss AI progress concerns, showing how history is littered with premature plateau predictions, and arguing Lindy's Law suggests continued progress for ~7 more years.

Industry 2026-05-13

Google Launches Googlebook AI-Native Laptop Line

Google unveils Googlebook, a laptop series designed for Gemini Intelligence with Magic Pointer AI cursor, AI widget generation, and deep Android phone integration, shipping Fall 2026.

AI Models 2026-05-10

Fields Medalist Tests ChatGPT 5.5 Pro: PhD-Level Math Research in Under Two Hours

Timothy Gowers put ChatGPT 5.5 Pro on open problems in additive number theory. The model produced original, verified mathematical proofs with zero substantive input from Gowers — forcing the math community to rethink PhD training and research attribution.

AI Agents 2026-05-10

When You Delegate to LLMs, Your Documents Get Corrupted

A new benchmark shows that even frontier models like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt roughly 25% of document content in long delegated workflows, and agentic tool use doesn't help.

Security 2026-05-01

How 6% of Users Turn to Claude for Personal Life Guidance

Anthropic's Privacy-preserving analysis of 1 million conversations reveals the most common domains of AI guidance-seeking—and where sycophancy remains a problem.

Industry 2026-04-28

OpenAI models, Codex, and Managed Agents land on AWS

OpenAI and AWS expand their partnership to bring GPT-5.5, Codex, and new Bedrock Managed Agents to AWS customers, giving enterprises a direct path to deploy frontier AI within their existing cloud infrastructure.

AI Agents 2026-04-25

LLMs make surface quality unreliable in knowledge work

One Happy Fellow argues that LLMs break the proxy measures organizations use to judge knowledge work. When spelling, formatting, review rituals, and professional tone can be generated cheaply, teams need better ways to verify whether work is actually true, useful, and decision-grade.

AI Agents 2026-04-24

DeepSeek V4 preview brings 1M context into open model competition

DeepSeek has released and open-sourced the V4 preview, with Pro and Flash variants and 1M context as the default across official services. The release matters less as a benchmark update than as a push to make long-context agent workflows cheaper and more deployable.

AI Agents 2026-04-24

Google deepens its Anthropic bet to own both model access and compute demand

Google plans to invest up to $40 billion in Anthropic, with $10 billion up front and the rest tied to performance milestones. The bigger story is how the deal binds equity, cloud distribution, and TPU demand into a single infrastructure value chain.

AI Models 2026-04-23

OpenAI launches GPT-5.5 with a bigger leap in autonomous work

OpenAI launches GPT-5.5 with stronger coding and knowledge-work performance while preserving speed, pushing the model closer to an execution layer for autonomous digital work.

Security 2026-04-21

Kelsey Piper Finds Claude Opus 4.7 Can Identify Authors from a Small Sample of Unpublished Text

Journalist Kelsey Piper demonstrates that Claude Opus 4.7 can identify her from as little as 125 words of unpublished text — across political commentary, education reports, movie reviews, and a 15-year-old college essay.

AI Agents 2026-04-16

Introducing Claude Opus 4.7 | Anthropic

Anthropic introduces Claude Opus 4.7 with enhanced AI capabilities.

AI Agents 2026-04-16

The Gemini App is now available on Mac OS

Google is bringing the Gemini app to macOS as a native desktop experience.

AI Agents 2026-04-09

Meta Introduces Muse Spark: Scaling Towards Personal Superintelligence

Meta announces initiative to provide everyone with their own superintelligent assistant, enabling truly personalized AI experiences.

AI Models 2026-04-03

Qwen3.6-Plus: AI Agent for Real-World Applications

Alibaba Tongyi Qianwen releases model for real-world agent scenarios, supporting complex task planning, code generation, multimodal understanding, and tool calling.

AI Models 2026-04-02

Google Releases Gemma 4: The Most Capable Open Models to Date

Purpose-built for advanced reasoning and agentic workflows. Four sizes: E2B/E4B/26B-MoE/31B. Apache 2.0 license. #3 on Arena AI leaderboard.

AI Models 2026-03-26

ARC-AGI-3: The Next-Gen Reasoning Benchmark for Measuring AGI

Third-generation ARC reasoning benchmark testing AI agents interactive reasoning, measuring the gap between AI and human intelligence.

AI Models 2026-03-26

OpenAI Announces Shutting Down Sora

OpenAI announces shutting down Sora app, just months after launching the AI video generation tool.

AI Models 2026-03-06

Introducing Forge | Mistral AI

OpenAI releases GPT-5.4, combining recent advances in reasoning, coding, and agentic workflows into a single frontier model. Achieves a new state-of-the-art 83.0% on GDPval benchmark with native computer-use capabilities.

AI Models 2026-02-27

Google Releases Nano Banana 2: Next-Gen Image Model Combining Pro Capabilities with Lightning Speed

Google DeepMind releases Nano Banana 2, combining Pro features with Flash speed. Supports subject consistency, precise text rendering, 4K resolution, now available across Gemini, Search, Flow and more.

AI Models 2026-02-23

OpenAI drops SWE-bench Verified after finding widespread contamination

OpenAI found that SWE-bench Verified suffers from flawed test cases and training data contamination across all major models, and is now recommending SWE-bench Pro instead.