# Local Models
Mirascope supports local models through built-in providers and OpenAI-compatible routing. This lets you run models on your own hardware for privacy, cost savings, or offline access.
<Note>
This guide builds on concepts from [Providers](/docs/learn/llm/providers). Read that first to understand how provider registration and routing work.
</Note>
## Ollama
[Ollama](https://ollama.com/) makes it easy to run open-source models locally. Mirascope includes a built-in Ollama provider that works out of the box:
```python
# 1. Install Ollama: https://ollama.com/download
# 2. Pull a model: ollama pull deepseek-r1:1.5b
# 3. Run this script
from mirascope import llm


@llm.call("ollama/deepseek-r1:1.5b")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```
The Ollama provider connects to `http://localhost:11434/v1/` by default. Set the `OLLAMA_BASE_URL` environment variable to use a different endpoint, or use `llm.register_provider()` to configure it programmatically. See [Providers](/docs/learn/llm/providers) for details.
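For example, if Ollama runs on another machine, you can point Mirascope at that host instead. This is a minimal sketch in the style of the vLLM example below; the `"ollama"` provider name, the `scope` value, and the host address are illustrative assumptions, not confirmed API:

```python
from mirascope import llm

# Hypothetical remote Ollama host; replace with your own address.
# Equivalent environment-variable approach:
#   export OLLAMA_BASE_URL=http://192.168.1.50:11434/v1/
llm.register_provider(
    "ollama",  # assumed name of the built-in Ollama provider
    scope="ollama/",
    base_url="http://192.168.1.50:11434/v1/",
)
```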
## vLLM and OpenAI-Compatible Servers
Many local inference servers expose OpenAI-compatible APIs, including [vLLM](https://docs.vllm.ai/) and [LM Studio](https://lmstudio.ai/). Use `llm.register_provider()` to route models through Mirascope's OpenAI provider:
```python
# 1. Install vLLM: uv pip install vllm
# 2. Start the server: vllm serve meta-llama/Llama-3.2-1B-Instruct
# 3. Run this script
from mirascope import llm

llm.register_provider(
    "openai:completions",
    scope="vllm/",
    base_url="http://localhost:8000/v1",
    api_key="vllm",  # required by client but unused
)


@llm.call("vllm/meta-llama/Llama-3.2-1B-Instruct")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```
This pattern works with any server that implements the OpenAI chat completions API. Adjust the `scope` prefix and `base_url` to match your setup.
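For instance, LM Studio's local server listens on `http://localhost:1234/v1` by default. The sketch below mirrors the vLLM registration above; the `lmstudio/` scope, placeholder API key, and model name are illustrative choices:

```python
from mirascope import llm

llm.register_provider(
    "openai:completions",
    scope="lmstudio/",  # illustrative scope prefix
    base_url="http://localhost:1234/v1",  # LM Studio's default local server
    api_key="lm-studio",  # required by the client but unused
)


# The model name must match one you have loaded in LM Studio.
@llm.call("lmstudio/qwen2.5-7b-instruct")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


print(recommend_book("fantasy").pretty())
```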
## MLX (Apple Silicon)
On Apple Silicon Macs, you can run models directly on-device with [MLX](https://github.com/ml-explore/mlx). Mirascope's MLX provider loads models from Hugging Face and runs them locally:
```python
# Requires Apple Silicon Mac
# 1. Install MLX dependencies: uv pip install "mirascope[mlx]"
# 2. Run this script (model downloads automatically on first use)
from mirascope import llm


@llm.call("mlx-community/Llama-3.2-1B-Instruct-4bit")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."


response = recommend_book("fantasy")
print(response.pretty())
```
Models download automatically on first use. The provider caches loaded models in memory, so subsequent calls are fast.
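Continuing the example above, only the first call pays the download and load cost; later calls reuse the in-memory model:

```python
# First call downloads the model (if needed) and loads the weights into memory.
print(recommend_book("fantasy").pretty())

# Subsequent calls reuse the cached model, so only inference time remains.
print(recommend_book("mystery").pretty())
```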
<Note>
The MLX provider currently supports basic text generation and streaming. Tools and structured output are not yet supported.
</Note>
## Next Steps
- [Providers](/docs/learn/llm/providers) — Configure provider settings and routing
- [Streaming](/docs/learn/llm/streaming) — Stream responses from local models
- [Tools](/docs/learn/llm/tools) — Use tools with Ollama and vLLM