# Async
Asynchronous programming is important when building LLM applications. LLM API calls are I/O-bound operations—your code spends most of its time waiting for network responses. Async lets you use that waiting time productively by running multiple operations concurrently, improving both responsiveness and throughput.
## Basic Async Calls
Make any prompt function async by adding the `async` keyword:
<TabbedSection>
<Tab value="Call">
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    response = await recommend_book("fantasy")
    print(response.text())

asyncio.run(main())
```
</Tab>
<Tab value="Prompt">
```python
import asyncio
from mirascope import llm

@llm.prompt
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    response: llm.AsyncResponse = await recommend_book("openai/gpt-5-mini", "fantasy")
    print(response.text())

asyncio.run(main())
```
</Tab>
<Tab value="Model">
```python
import asyncio
from mirascope import llm

model = llm.Model("openai/gpt-5-mini")

async def main():
    response: llm.AsyncResponse = await model.call_async("Recommend a fantasy book.")
    print(response.text())

asyncio.run(main())
```
</Tab>
</TabbedSection>
The changes from synchronous code are:
1. Use an async prompt function (for prompts and calls), or use `model.call_async`/`model.stream_async` for `llm.Model`
2. Use `await` when generating the response
3. Run inside an async context (like `asyncio.run()`)
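For comparison, a synchronous version of the same call might look like this (a minimal sketch inferred from the list above: a plain `def`, no `await`, and no event loop):
```python
from mirascope import llm

@llm.call("openai/gpt-5-mini")
def recommend_book(genre: str):
    return f"Recommend a {genre} book."

response = recommend_book("fantasy")
print(response.text())
```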
## Parallel Calls
One benefit of async is that you can run multiple calls concurrently. Use `asyncio.gather()` to run calls in parallel:
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    genres = ["fantasy", "mystery", "romance"]
    responses = await asyncio.gather(*[recommend_book(genre) for genre in genres])
    for genre, response in zip(genres, responses, strict=False):
        print(f"[{genre}]: {response.pretty()}\n")

asyncio.run(main())
```
All three book recommendations run simultaneously—the total time is roughly the time of the slowest call, not the sum of all three.
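If you want to verify this, time the concurrent and sequential versions side by side; a minimal sketch using `time.perf_counter()` (actual numbers will vary with the model and network):
```python
import asyncio
import time
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

genres = ["fantasy", "mystery", "romance"]

async def main():
    start = time.perf_counter()
    # Concurrent: total time tracks the slowest call.
    await asyncio.gather(*[recommend_book(genre) for genre in genres])
    print(f"Concurrent: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    # Sequential: total time is roughly the sum of all calls.
    for genre in genres:
        await recommend_book(genre)
    print(f"Sequential: {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```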
## Async Tools
When your tools perform I/O operations (API calls, database queries, file operations), make them async:
```python
import asyncio
from mirascope import llm

@llm.tool
async def fetch_weather(city: str) -> str:
    """Fetch current weather for a city."""
    await asyncio.sleep(0.1)  # Simulate an async API call
    return f"72°F and sunny in {city}"

@llm.call("openai/gpt-5-mini", tools=[fetch_weather])
async def weather_assistant(query: str):
    return query

async def main():
    response = await weather_assistant("What's the weather in Tokyo?")
    while response.tool_calls:
        tool_outputs = await response.execute_tools()
        response = await response.resume(tool_outputs)
    print(response.pretty())

asyncio.run(main())
```
When you call `response.execute_tools()`, it uses `asyncio.gather()` internally so that all requested tool calls execute concurrently.
<Note>
If any tool in the toolkit is async, all tools must be async and the call must be async. To convert sync tools or prompts, just change `def` to `async def`.
Tools from [MCP](/docs/learn/llm/mcp) servers are always async, so you'll need to use async when working with MCP.
</Note>
## Async Streaming
Streaming works the same way in async code; just use `async for`:
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    response = await recommend_book.stream("fantasy")
    async for chunk in response.text_stream():
        print(chunk, end="", flush=True)

asyncio.run(main())
```
All stream iterators (`pretty_stream()`, `streams()`, `chunk_stream()`, `structured_stream()`) support async iteration.
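For example, the same pattern works with `pretty_stream()`; a minimal sketch, assuming it yields printable string chunks the way `text_stream()` does:
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    response = await recommend_book.stream("fantasy")
    # Assumes pretty_stream() yields printable chunks, like text_stream() does.
    async for chunk in response.pretty_stream():
        print(chunk, end="", flush=True)

asyncio.run(main())
```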
## Response Types
Mirascope provides async variants of response types:
| Sync | Async |
| --- | --- |
| `Response` | `AsyncResponse` |
| `StreamResponse` | `AsyncStreamResponse` |
| `ContextResponse` | `AsyncContextResponse` |
| `ContextStreamResponse` | `AsyncContextStreamResponse` |
Async and sync responses expose the same properties and accumulate content the same way. The main differences are that `resume` and `execute_tools` must be awaited on async responses, and that an async response's toolkit contains async tools.
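As a rough illustration of the table, here is how the two async response types might appear in annotated code (a sketch; it assumes that awaiting `.stream()` on an async prompt function returns an `AsyncStreamResponse`):
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

async def main():
    # A non-streaming async call produces an AsyncResponse.
    response: llm.AsyncResponse = await recommend_book("fantasy")
    print(response.text())

    # An async streaming call is assumed to produce an AsyncStreamResponse.
    stream: llm.AsyncStreamResponse = await recommend_book.stream("fantasy")
    async for chunk in stream.text_stream():
        print(chunk, end="", flush=True)

asyncio.run(main())
```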
## Best Practices
<Note title="Best Practices" collapsible={true} defaultOpen={false}>
- **Avoid blocking operations**: Don't use blocking I/O (like `time.sleep()` or synchronous HTTP clients) inside async functions—this blocks the entire event loop
- **Be mindful of rate limits**: Async makes it easy to send many concurrent requests, but providers enforce rate limits. Throttle your concurrency if needed (see the sketch after this note)
</Note>
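A simple way to throttle is to wrap each call in an `asyncio.Semaphore`; a minimal sketch (the limit of 5 is arbitrary, not a recommendation for any particular provider):
```python
import asyncio
from mirascope import llm

@llm.call("openai/gpt-5-mini")
async def recommend_book(genre: str):
    return f"Recommend a {genre} book."

# Allow at most 5 requests in flight at a time (the limit here is arbitrary).
semaphore = asyncio.Semaphore(5)

async def recommend_with_limit(genre: str):
    async with semaphore:
        return await recommend_book(genre)

async def main():
    genres = ["fantasy", "mystery", "romance", "sci-fi", "horror", "thriller"]
    responses = await asyncio.gather(*[recommend_with_limit(genre) for genre in genres])
    for genre, response in zip(genres, responses, strict=False):
        print(f"[{genre}]: {response.text()}")

asyncio.run(main())
```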
## When to Use Async
Async is most beneficial when:
- Making multiple LLM calls that can run in parallel
- Your tools perform network requests or database queries
- Interacting with [MCP](/docs/learn/llm/mcp) servers
- Building web applications (most frameworks are async-native)
For simple scripts with single sequential calls, sync code is simpler and equally performant.
## Next Steps
- [Streaming](/docs/learn/llm/streaming) — Stream responses in real-time
- [Tools](/docs/learn/llm/tools) — Learn more about tool calling
- [Agents](/docs/learn/llm/agents) — Build agents with async patterns