Mirascope Frog Logo
Mirascope
DocsBlogPricingCloud
⌘K
Type to search
⌘Kto search
Escto close
mirascope
v1.25.7
1.3k
Join our
Docs v1 (legacy)API ReferenceGuidesDocs
Overview
Getting Started
QuickstartStructured OutputsDynamic Configuration & ChainingTools & Agents
Agents
Web Search AgentAgent Executor: Blog WritingDocumentation AgentLocal Chat with CodebaseLocalized AgentQwant Search Agent with SourcesGenerate SQL with LLM
More Advanced
Code Generation and ExecutionDocument SegmentationExtracting from PDFExtraction using VisionGenerate Captions for an ImageGenerate Synthetic DataLLM Validation With RetriesNamed Entity Recognitiono1 Style ThinkingPII ScrubbingQuery PlanRemoving Semantic DuplicatesSearch with SourcesTranscribing SpeechSupport Ticket RoutingText ClassificationText SummarizationText TranslationKnowledge Graph
Prompt Engineering
Evaluations
Evaluating Documentation AgentEvaluating Web Search Agent with LLMEvaluating Generating SQL with LLM
Langgraph Vs Mirascope
LangGraph Quickstart using Mirascope
# Generate Captions for an Image In this recipe, we go over how to use LLMs to generate a descriptive caption set of tags for an image with OpenAI’s `gpt-4o-mini`. <Info title="Mirascope Concepts Used" collapsible={true} defaultOpen={false}> <ul> <li><a href="/docs/v1/learn/prompts/">Prompts</a></li> <li><a href="/docs/v1/learn/calls/">Calls</a></li> <li><a href="/docs/v1/learn/response_models/">Response Models</a></li> </ul> </Info> <Note title="Background"> Caption generation evolved from manual human effort to machine learning techniques like Conditional Random Fields (CRFs) and Support Vector Machines (SVMs), which were time-consuming and resource-intensive. Large Language Models (LLMs) have revolutionized this field, enabling efficient multi-modal tasks through API calls and prompt engineering, dramatically improving caption generation speed and accuracy. </Note> ## Setup Let's start by installing Mirascope and its dependencies: ```python !pip install "mirascope[openai]" ``` ```python import os os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY" # Set the appropriate API key for the provider you're using ``` ## Generate Captions <Warning title="Warning"> This recipe will only work for those which support images (OpenAI, Gemini, Anthropic) as of 8/13/2024. Be sure to check if your provider has multimodal support. </Warning> With OpenAI’s multimodal capabilities, image inputs are treated just like text inputs, which means we can use it as context to ask questions or make requests. For the sake of reproducibility, we will get our image from a URL to save you the hassle of having to find and download an image. The image is [a public image from BBC Science of a wolf](https://c02.purpledshub.com/uploads/sites/41/2023/01/How-to-see-the-Wolf-Moon-in-2023--4bb6bb7.jpg?w=1880&webp=1) in front of the moon. Since we can treat the image like any other text context, we can simply ask the model to caption the image: ```python from mirascope.core import openai, prompt_template url = "https://c02.purpledshub.com/uploads/sites/41/2023/01/How-to-see-the-Wolf-Moon-in-2023--4bb6bb7.jpg?w=940&webp=1" @openai.call(model="gpt-4o-mini") @prompt_template("Generate a short, descriptive caption for this image: {url:image}") def generate_caption(url: str): ... response = generate_caption(url) print(response) ``` A lone wolf howls into the night, silhouetted against a glowing full moon, creating a hauntingly beautiful scene that captures the wild spirit of nature. <Info title="Additional Real-World Examples"> <ul> <li><b>Content Moderation</b>: Classify user-generated images as appropriate, inappropriate, or requiring manual review.</li> <li><b>Ecommerce Product Classification</b>: Create descriptions and features from product images.</li> <li><b>AI Assistant for People with Vision Impairments</b>: Convert images to text, then text-to-speech so people with vision impairments can be more independent.</li> </ul> </Info> When adapting this recipe to your specific use-case, consider the following: - Refine your prompts to provide clear instructions and relevant context for your caption generation task. - Experiment with different model providers and version to balance accuracy and speed. - Use multiple model providers to evaluate results for accuracy. - Use `async` for multiple images and run the calls in parallel.

Provider

On this page

Provider

On this page

© 2026 Mirascope. All rights reserved.

Mirascope® is a registered trademark of Mirascope, Inc. in the U.S.

Privacy PolicyTerms of Use