# Local Models

Mirascope supports local models through built-in providers and OpenAI-compatible routing. This lets you run models on your own hardware for privacy, cost savings, or offline access.

<Note>
This guide builds on concepts from [Providers](/docs/learn/llm/providers). Read that first to understand how provider registration and routing work.
</Note>

## Ollama

[Ollama](https://ollama.com/) makes it easy to run open-source models locally. Mirascope includes a built-in Ollama provider that works out of the box:

```python
# 1. Install Ollama: https://ollama.com/download
# 2. Pull a model: ollama pull deepseek-r1:1.5b
# 3. Run this script
from mirascope import llm


@llm.call("ollama/deepseek-r1:1.5b")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```

The Ollama provider connects to `http://localhost:11434/v1/` by default. Set the `OLLAMA_BASE_URL` environment variable to use a different endpoint, or use `llm.register_provider()` to configure it programmatically. See [Providers](/docs/learn/llm/providers) for details.

## vLLM and OpenAI-Compatible Servers

Many local inference servers expose OpenAI-compatible APIs, including [vLLM](https://docs.vllm.ai/) and [LM Studio](https://lmstudio.ai/). Use `llm.register_provider()` to route models through Mirascope's OpenAI provider:

```python
# 1. Install vLLM: uv pip install vllm
# 2. Start the server: vllm serve meta-llama/Llama-3.2-1B-Instruct
# 3. Run this script
from mirascope import llm

llm.register_provider(
    "openai:completions",
    scope="vllm/",
    base_url="http://localhost:8000/v1",
    api_key="vllm",  # required by client but unused
)


@llm.call("vllm/meta-llama/Llama-3.2-1B-Instruct")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```

This pattern works with any server that implements the OpenAI chat completions API. Adjust the `scope` prefix and `base_url` to match your setup.

## MLX (Apple Silicon)

On Mac, you can run models directly on Apple Silicon using [MLX](https://github.com/ml-explore/mlx). Mirascope's MLX provider loads models from Hugging Face and runs them locally:

```python
# Requires Apple Silicon Mac
# 1. Install MLX dependencies: uv pip install "mirascope[mlx]"
# 2. Run this script (model downloads automatically on first use)
from mirascope import llm


@llm.call("mlx-community/Llama-3.2-1B-Instruct-4bit")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```

Models download automatically on first use. The provider caches loaded models in memory, so subsequent calls are fast.

<Note>
The MLX provider currently supports basic text generation and streaming. Tools and structured output are not yet supported.
</Note>

## Next Steps

- [Providers](/docs/learn/llm/providers) — Configure provider settings and routing
- [Streaming](/docs/learn/llm/streaming) — Stream responses from local models
- [Tools](/docs/learn/llm/tools) — Use tools with Ollama and vLLM
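## Example: LM Studio (Sketch)

To make the OpenAI-compatible pattern above concrete for a second server, here is a minimal sketch that routes an `lmstudio/` prefix to LM Studio's local server. Treat it as illustrative rather than canonical: the `lmstudio/` scope, the `http://localhost:1234/v1` base URL (LM Studio's usual default), the placeholder API key, and the model identifier are assumptions you should adjust to match whatever your LM Studio instance reports.

```python
# 1. Install LM Studio: https://lmstudio.ai/
# 2. Load a model and start the local server in LM Studio
# 3. Run this script
from mirascope import llm

# Assumptions: LM Studio typically serves at http://localhost:1234/v1;
# the "lmstudio/" scope and the model id below are placeholders for your setup.
llm.register_provider(
    "openai:completions",
    scope="lmstudio/",
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # required by client but unused
)


@llm.call("lmstudio/llama-3.2-1b-instruct")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```

Because the routing layer only needs a scope prefix and a base URL, switching between vLLM, LM Studio, or any other OpenAI-compatible server is a configuration change rather than a code change.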
