Running Claude Code Locally with Ollama: A Performance Check
Claude Code, powered by Anthropic’s models, offers impressive capabilities but can quickly consume credits, especially during intensive coding tasks. As an alternative, running Claude Code with a local Large Language Model (LLM) through Ollama provides a cost-effective and potentially more sustainable solution. This article explores the process of setting up Claude Code with a local LLM, specifically Qwen 3.5, and assesses its performance for real-world coding scenarios.
Setting up Claude Code with a Local LLM
It’s Easier Than I Thought, But You Need a Capable Device
The setup process is surprisingly straightforward. Start by installing Ollama, available as a macOS application from ollama.com. The remaining steps run in the terminal. First, pull the model you want with the following command; Ollama downloads it and prepares it for use:
ollama pull qwen3.5:9b
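Before moving on, it can help to confirm that the pull finished. The sketch below assumes that `ollama list` prints a header row followed by one row per model with the model tag in the first column; `has_model` is a hypothetical helper name, not part of Ollama.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given tag appears in the first
# column of `ollama list`-style output read from stdin (header row skipped).
has_model() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Only query Ollama if the CLI is actually installed.
if command -v ollama >/dev/null 2>&1; then
  if ollama list | has_model "qwen3.5:9b"; then
    echo "qwen3.5:9b is available locally"
  else
    echo "qwen3.5:9b not found; run: ollama pull qwen3.5:9b"
  fi
else
  echo "ollama not found on PATH"
fi
```

Keeping the check in a small function means the same test can be reused for any other model tag you pull later.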
Next, install Claude Code using npm:
npm install -g @anthropic-ai/claude-code
To make Claude Code use your local Ollama server instead of Anthropic’s API, first set the environment variables that redirect its API requests to the local endpoint. Then navigate to your project directory and launch Claude Code with the following commands:
cd /path/to/your/project
claude --model qwen3.5:9b
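The commands above do not show the environment variables themselves. A minimal sketch of what they might look like follows, assuming Ollama is serving its Anthropic-compatible endpoint on its default local port (11434) and that Claude Code reads `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. The token value is a placeholder, and the exact names are worth checking against the current Ollama and Claude Code documentation.

```shell
#!/bin/sh
# Assumption: Ollama is running locally on its default port, 11434.
export ANTHROPIC_BASE_URL="http://localhost:11434"

# Assumption: a local server does not validate the token, so any
# non-empty placeholder value should do.
export ANTHROPIC_AUTH_TOKEN="ollama"

echo "Claude Code requests will go to: $ANTHROPIC_BASE_URL"
# With these exported in the current shell, launch Claude Code as shown above.
```

Exported this way, the variables apply only to the current shell session, so opening a new terminal returns Claude Code to Anthropic’s API.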
Once launched, run /init to allow Claude Code to scan your codebase and complete the setup. After initialization, you can interact with Claude Code as usual, issuing tasks and receiving responses from the local LLM.
Before attempting to run a local LLM, it’s crucial to assess your hardware capabilities. Local LLMs can be resource-intensive, consuming significant memory and processing power. Systems with 8GB of RAM may struggle, potentially leading to performance degradation due to swapping. 16GB of RAM offers a more usable experience, though limitations still exist. For optimal performance, especially with larger models, a dedicated GPU is recommended, particularly for non-Apple Silicon systems where CPU-only setups can be significantly slower.
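As a quick way to see where a machine falls, the sketch below reads total RAM and maps it onto the rough tiers described above. The function names and the wording of the tiers are illustrative assumptions, and the thresholds are a loose reading of this paragraph rather than measured limits.

```shell
#!/bin/sh
# Report total RAM in whole GB: sysctl on macOS, /proc/meminfo on Linux.
total_ram_gb() {
  if [ "$(uname)" = "Darwin" ]; then
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  else
    # MemTotal is reported in kB; round to the nearest GB.
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 524288) / 1048576 }' /proc/meminfo
  fi
}

# Hypothetical helper: map a RAM size in GB to a rough tier.
ram_tier() {
  if [ "$1" -le 8 ]; then
    echo "likely to struggle (swapping)"
  elif [ "$1" -le 16 ]; then
    echo "usable, with limits"
  else
    echo "more headroom for larger models"
  fi
}

gb=$(total_ram_gb)
echo "Total RAM: ${gb} GB -> $(ram_tier "$gb")"
```

RAM is only part of the picture: this check says nothing about GPU availability, which matters most on systems without Apple Silicon.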
Testing on a MacBook Air (M5, 16GB RAM) revealed a slight temperature increase while running Qwen 3.5 (9B). While manageable on this configuration, attempting to run a 16B model pushed the system closer to its limits.
Qwen Holds Up Better Than Expected for Real Work
It Handles Everyday Coding Well If You Keep Expectations Realistic
Local models often struggle with even basic tasks, but Qwen 3.5, when used with Ollama, proved surprisingly capable. It excels at reading and explaining code, providing clear breakdowns of unfamiliar files and accurately tracing data flow. While code generation is more variable, it performs well for boilerplate code, helper functions, and simple components, often requiring minimal edits.
Refactoring also works well, particularly for focused tasks like cleaning up functions or renaming variables. However, coordinating changes across multiple files can be challenging for a 9B parameter model due to context limitations.
A significant benefit of using a local LLM with Claude Code is that it sidesteps API usage limits. With cloud-based models like Opus or Sonnet, a long coding session can quickly exhaust your token allowance and halt your workflow. Switching to a local model lets you keep working, albeit with reduced capabilities, without waiting for the allowance to reset. Local models also provide a valuable fallback when reliable internet connectivity is unavailable.
A Local LLM Is Worth the Hassle
While not a direct replacement for powerful cloud-based models, a local LLM offers a viable alternative, especially considering its cost-effectiveness and availability. It enables you to ask questions, generate code, debug minor issues, and maintain productivity even when cloud services are unavailable or restricted. This is particularly relevant given recent outages experienced with Anthropic’s services, as reported on XDA Developers on March 26, 2026, where even the best cloud tools became temporarily inaccessible.