Lemonade by AMD: Fast Open Source Local LLM Server

2026-04-03

AMD introduces Lemonade, a fast, open-source local LLM server supporting GPUs and NPUs, ready to deploy on any PC in minutes.

Key Features

GPU and NPU Support — Full utilization of AMD hardware acceleration
Multimodal Capabilities — Text, image generation, speech recognition and synthesis
OpenAI API Compatible — Works out-of-the-box with hundreds of applications
Lightweight Backend — Native C++ backend, only 2MB
One-Minute Install — Simple installer with automatic configuration
Multi-Engine Compatible — Supports llama.cpp, Ryzen AI SW, FastFlowLM, etc.
Multiple Models at Once — Load and run multiple models simultaneously
Cross-Platform — Windows, Linux, macOS (beta)

Run large models like gpt-oss-120b or Qwen-Coder-Next locally for advanced tool use, without cloud dependency.