# Table of Contents # Welcome - An open-source prompt engineering framework # Open-Source - How Lilypad balances open-source and enterprise features # Lilypad Quickstart - Start using Lilypad in one line of code # Self Hosting - Run Lilypad in your own infrastructure # Playground - No-code interface for experimenting with Lilypad # OpenTelemetry - Observability made easy # Spans - Easily instrument arbitrary blocks of code with OpenTelemetry # Traces - Structured collections of spans # Versioning - Track versions of your LLM functions # Cost & Latency - Monitor the performance and cost of your LLM functions # Comparisons - Compare different LLM function implementations # Annotations - Add labels and feedback to your LLM outputs # Lilypad API Reference - API documentation for Lilypad # Welcome to Lilypad Welcome to the Lilypad documentation! We're excited you're here. ## Why Lilypad (we think you should read this) When building with LLMs, a typical development flow might look like this: 1. [Prototype](#1-prototype) — make sure everything is functional 2. [Vibe Check](#2-vibe-check) — gut feeling should be "good enough" 3. [Annotate](#3-annotate) — systematically label the data you look at 4. [Analyze](#4-analyze) — understand where and why the system is failing 5. [Optimize](#5-optimize) — apply your learnings to improve the system (e.g. your prompt) 6. [Iterate](#6-iterate) — repeat steps 3-5 (forever, or at least until it's "good enough") Let's break each of these steps down further. ## 1. Prototype The first and most important step is simply getting started. We recommend taking a look at our open-source LLM library `mirascope`, which we've purpose built to make both prototyping and the steps that follow simple, easy, and elegant. For the remaining sections, let's use a simple LLM call as an example: ```python from mirascope import llm @llm.call(provider="$PROVIDER", model="$MODEL") # [!code highlight] def answer_question(question: str) -> str: return f"Answer this question: {question}" # [!code highlight] response = answer_question("What is the capital of France?") print(response.content) # > The capital of France is Paris. ``` We're using the `@llm.call()` decorator to turn the `answer_question` function into an LLM API call. ## 2. Vibe Check How do we feel about "The capital of France is Paris." as the answer to our question? Let's say our gut feeling is "not good enough" because we want a single word answer, so we update our prompt to make this more clear: ```python from mirascope import llm @llm.call(provider="$PROVIDER", model="$MODEL") def answer_question(question: str) -> str: return f"Answer this question in one word: {question}" # [!code highlight] response = answer_question("What is the capital of France?") print(response.content) # > Paris ``` Oops, we forgot to commit our previous prompt. Not good. For a simple example this might not seems like a big deal, but LLMs are fickle. What if the prompt we just lost happened to be the one that would've performed the best, and now you can't replicate it. How do you decide when to commit what? And how do you properly keep track of all of the different versions? This is the point at which most people reach for observability tooling. This is *almost* the right choice. The issue is that today's observability tooling was not built for the LLM era. It was built for deterministic software, but LLMs are non-deterministic. You need more than just observability — you need to build a data flywheel. This requires: 1. Some place to put your data 2. Some place to see / query / etc. that data 3. Some way to annotate that data 4. Some way to track / version artifacts (so you can compare performance over time) Current observability tools provide 1 and 2 but not 3 or 4, which are critical. Lilypad provides all four — in just one line of code. ```python {6} import lilypad from mirascope import llm lilypad.configure(auto_llm=True) @lilypad.trace(versioning="automatic") # [!code highlight] @llm.call(provider="$PROVIDER", model="$MODEL") def answer_question(question: str) -> str: return f"Answer this question in one word: {question}" response = answer_question("What is the capital of France?") print(response.content) # > Paris ``` Check out the [Versioning](/docs/lilypad/observability/versioning) section for more information. ## 3. Annotate The next step is to look at real (or synthetic) data and systematically label it. With Lilypad, you annotate the data right where you look at it. This makes it seamless. ![Lilypad Annotation Queue](/assets/docs/lilypad/welcome/annotation-queue.webp) It's also extremely important that we annotate not just the inputs/outputs but also everything about the trace. This includes the code, the prompt, the call parameters, the cost, the latency — everything you might need to know if you'd consider it "good enough" or not. ## 4. Analyze Once you've annotated enough data, it's time to look for trends — common failure points. Compare outputs from different versions on the same input. Did the changes help? ![Lilypad Trace Annotation](/assets/docs/lilypad/welcome/trace-annotation.webp) Distilling your annotations into action items makes for much easier optimization. ## 5. Optimize Now we can apply our analysis and update the system to improve it. For example, we can identify the most common points of failure and work to resolve those first. Consider our earlier example. We identify that there are a lot of longer answers and we really want single word answers, so we add "in one word" to the prompt and run the process again. ![Lilypad Versioned Function Comparison](/assets/docs/lilypad/welcome/optimization-comparison.webp) This step is just the systematic version of our earlier "vibe check" process that results in real data and actionable insights. ## 6. Iterate Part of the optimization process involves making changes — a new version. All we have to do is repeat steps 3 through 5 until we deem the system "good enough". ## Getting Started

Quickstart

Get started with Lilypad in just a few minutes

Read more →

Open Source

Learn about Lilypad's open-source initiative

Read more →

Self-Hosting

Run Lilypad in your own infrastructure

Read more →
## Observability

Spans

Instrument arbitrary blocks of code

Read more →

Traces

Structured collections of spans

Read more →

Versioning

Track versions of your LLM functions

Read more →
## Evaluation

Cost & Latency

Monitor performance and cost

Read more →

Comparisons

Compare different LLM function implementations

Read more →

Annotations

Add labels and feedback to your LLM outputs

Read more →
# Open-Source As an open-source project, Lilypad makes it so: - You can see exactly what's going on under the hood and how it works. - You can [self-host](/docs/lilypad/getting-started/self-hosting) Lilypad on your own servers and customize it to your heart's content. - You can [contribute](https://github.com/Mirascope/lilypad/blob/main/CONTRIBUTING.md) to the project. We're excited to see where the community takes it! ## Lilypad License Except for the contents and code contained in any `ee` directory or sub-directory of this repository, which is covered by a commercial license (see below), all code in this repository is licensed under the terms of the [MIT License](https://github.com/Mirascope/lilypad/blob/main/LICENSE). The contents and code of any `ee` directory or sub-directory of this repository are licensed under the Enterprise Edition (EE) License. This content and code is only available to users using Mirascope's hosted version of Lilypad or those with a valid Enterprise Edition (EE) License. See [ee/LICENSE](https://github.com/Mirascope/lilypad/blob/main/ee/LICENSE) for the full terms. The [pricing](/pricing) page has more information on the features available in the Enterprise Edition (EE) of Lilypad and how to purchase a license. # Lilypad Quickstart ## Create an account You'll need a GitHub or Google account to sign up. First, navigate to [https://lilypad.mirascope.com](https://lilypad.mirascope.com/) and create an account. ## Environment Variables Navigate to [Settings -> Organization](https://lilypad.mirascope.com/settings/org) and: 1. Create a new project. 2. Generate an API key for that project. The API key will only be shown once, so make sure to copy and save it. ```bash LILYPAD_PROJECT_ID="YOUR_PROJECT_ID" LILYPAD_API_KEY="YOUR_API_KEY" ``` If using a `.env` file, remember to use something like [`load_dotenv()`](https://github.com/theskumar/python-dotenv) so they are properly loaded. ### LLM API Key You'll need to set the API key for the provider you're using in your environment. We recommend creating one in [Google AI Studio](https://aistudio.google.com/apikey) if you don't have one yet. They have a very generous free tier. ## Installation Install the `lilypad-sdk` with any additional dependencies you may need: ```bash # For spans / tracing uv add lilypad-sdk # For Google Gemini/Vertex support uv add "lilypad-sdk[google]" # For multiple providers uv add "lilypad-sdk[google,openai,anthropic]" ``` ```bash # For spans / tracing pip install lilypad-sdk # For OpenAI support pip install "lilypad-sdk[openai]" # For multiple providers pip install "lilypad-sdk[openai,anthropic,google]" ``` Available provider extras: - `google` - Google Gemini/Vertex models (genai SDK) - `openai` - OpenAI models - `anthropic` - Anthropic models - `bedrock` - AWS Bedrock models - `azure` - Azure AI models - `mistral` - Mistral models - `outlines` - Outlines framework ## Automatically Trace & Version Run your first automatically [traced](/docs/lilypad/observability/traces) and [versioned](/docs/lilypad/observability/versioning) function: ```python import os from google.genai import Client import lilypad # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) client = Client() @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.models.generate_content( model="gemini-2.0-flash-001", contents=f"Answer this question: {question}", ) return response.text response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` ```python import os import lilypad from openai import OpenAI # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) client = OpenAI() @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.chat.completions.create( model="gpt-4o-mini", messages=[{"role": "user", "content": f"Answer this question: {question}"}], ) return response.choices[0].message.content response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` ```python import os from anthropic import Anthropic import lilypad # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) client = Anthropic() @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.messages.create( model="claude-3-7-sonnet-latest", messages=[{"role": "user", "content": f"Answer this question: {question}"}], max_tokens=1024, ) content = response.content[0] return content.text if content.type == "text" else None response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` ```python import os import lilypad from mistralai import Mistral # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) client = Mistral(api_key=os.environ["MISTRAL_API_KEY"]) @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.chat.complete( model="mistral-small-latest", messages=[{"role": "user", "content": f"Answer this question: {question}"}], ) choices = response.choices if not choices: return None content = choices[0].message.content return str(content) if content is not None and content != "" else None response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` ```python import os from boto3.session import Session import lilypad # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) session = Session() client = session.client("bedrock-runtime") @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.converse( modelId="anthropic.claude-3-haiku-20240307-v1:0", messages=[ {"role": "user", "content": [{"text": f"Answer this question: {question}"}]} ], ) return response["output"]["message"]["content"][0]["text"] # type: ignore[reportTypedDictNotRequiredAccess] response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` ```python import os from azure.ai.inference import ChatCompletionsClient from azure.core.credentials import AzureKeyCredential import lilypad # [!code highlight:6] lilypad.configure( project_id=os.environ["LILYPAD_PROJECT_ID"], api_key=os.environ["LILYPAD_API_KEY"], auto_llm=True, ) client = ChatCompletionsClient( endpoint="https://your-endpoint.openai.azure.com/openai/deployments/gpt-4o-mini", credential=AzureKeyCredential(key="..."), ) @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.complete( model="gpt-4o-mini", messages=[{"role": "user", "content": f"Answer this question: {question}"}], ) return response.choices[0].message.content response = answer_question("What is the capital of France?") print(response) # > The capital of France is Paris. ``` Follow the link output in your terminal to view the captured version and corresponding trace. See the [observability](/docs/lilypad/observability/opentelemetry) section to learn more. ## Look At Your Data! Go build and run some LLM functions and inspect their results using Lilypad. While you're at it, try [annotating](/docs/lilypad/evaluation/annotations) some data too. # Self Hosting The Free version is available for self hosting with community support. Pro and Team plan features are available for beta testing upon request. Interested in participating in the closed beta? [Join our community](/slack-invite) and send a message to William Bakst. If we determine it's a good fit, you'll get early access to Pro and Team plan features when self hosting. Lilypad is fully [open-source](/docs/lilypad/open-source) and can be self-hosted in your own infrastructure. You can also run Lilypad on your local machine. ### Requirements: To self-host Lilypad, you'll need: 1. Python 3.10+ 2. PostgreSQL 3. A GitHub or Google account (for authentication) 4. Docker In the following sections we will be using `http://localhost:8000` as the host/port URL. Replace this with the URL where you are ultimately hosting Lilypad and making it accessible. ## Setup First, you'll need to clone the repository: ```bash git clone git@github.com:Mirascope/lilypad.git cd lilypad/app ``` Next, you'll need to install `uv` and the package's dependencies: ```bash curl -LsSf https://astral.sh/uv/install.sh | sh uv sync --all-extras ``` Last, create a `.env` file where you'll store all of the upcoming environment variables. ## Authentication First, you'll need to generate and set a JWT secret (e.g. using [https://jwtsecret.com/generate](https://jwtsecret.com/generate)). Then set it in your environment: ```bash LILYPAD_JWT_SECRET=... ``` Next, you'll need to set up OAuth 2.0 Authentication. You can choose between GitHub or Google: - Homepage URL: `http://localhost:8000` - Authorization Callback URL: `http://localhost:8000/auth/callback` ### GitHub To authenticate with GitHub, create a new [OAuth app on GitHub](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/creating-an-oauth-app) and set the following environment variables: ```bash LILYPAD_GITHUB_CLIENT_ID=... LILYPAD_GITHUB_CLIENT_SECRET=... ``` ### Google To authenticate with Google, create a new [OAuth app on Google](https://developers.google.com/identity/protocols/oauth2) and set the following environment variables: ```bash LILYPAD_GOOGLE_CLIENT_ID=... LILYPAD_GOOGLE_CLIENT_SECRET=... ``` ## Client First, set your environment variables: ```bash LILYPAD_API_SERVE_FRONTEND=true ``` You can optionally set `LILYPAD_CLIENT_BASE_URL` to the URL where you're hosting the client separately. In this case, you would set `LILYPAD_API_SERVE_FRONTEND=false`. Now, build the client: ```bash cd client touch .env echo "VITE_REMOTE_API_URL=http://localhost:8000/v0" > .env pnpm i pnpm run build cd .. ``` ## Server Set the remote API and client URLs so the server knows where to route everything: ```bash LILYPAD_REMOTE_API_URL=http://localhost:8000 LILYPAD_REMOTE_CLIENT_URL=http://localhost:8000 ``` Whether you're running this locally or hosted in the cloud, you'll need to build the image: ```bash DOCKER_BUILDKIT=1 docker build -t lilypad . ``` ## Local You can use Docker Compose to run everything together locally: ```bash docker-compose up ``` You must update `docker-compose.yml` to match your `.env` setup. You should now be able to access your Lilypad instance at `http://localhost:8000`. Note that spinning down the container will not result in a loss of data, as the database is persisted in a volume. ## Hosted You'll need to set up a PostgreSQL database and set the following environment variables: ```bash LILYPAD_DB_HOST=... LILYPAD_DB_NAME=... LILYPAD_DB_PASSWORD=... LILYPAD_DB_PORT=... LILYPAD_DB_USER=... ``` Now you can run the server: ```bash docker run -p 8000:8000 --env-file .env lilypad ``` You should now be able to access your Lilypad instance at `http://localhost:8000`. ## Application In your LLM code that uses Lilypad, you'll need to set the API and client URLs so that your code points to your self-hosted instance: ```bash LILYPAD_BASE_URL=http://localhost:8000/v0 LILYPAD_REMOTE_CLIENT_URL=http://localhost:8000 ``` Now you can use Lilypad as [documented](/docs/lilypad/getting-started/quickstart). The production-stable latest release of the SDK on PyPI may not be the same version as the one you're self-hosting. Make sure to install the version that matches your self-hosted instance. # Playground The Playground is under active development during the beta period. Any and all feedback, bug reports, and feature requests are welcome! # OpenTelemetry Lilypad instruments everything using [OpenTelemetry](https://opentelemetry.io/). This also means that any code that has already been instrumented with OpenTelemetry will already work with Lilypad — no changes necessary. ```python import lilypad # configure Lilypad's `TraceProvider` lilypad.configure() ``` ## Your Code Will Always Execute _This is important, so if you ever find that it's not the case, please let us know asap!_ Barring user errors that are supposed to throw errors during development, Lilypad will never crash your code. If there's ever an issue connecting to Lilypad or pulling information, your code will run as-written as though Lilypad was not there. ## Automatically Monitor LLM API Calls One additional benefit of Lilypad is that any call made to an LLM can be traced automatically in adherence with the [OpenTelemetry Gen AI Spec](https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/). You only need to call `configure`. No other code change necessary! For example, if you make a call to Google's Gemini API as in the example below, that call will show up in your Lilypad project as a detailed [span](/docs/lilypad/observability/spans). ```python from google.genai import Client import lilypad lilypad.configure(auto_llm=True) # [!code highlight] client = Client() question = "What is the capital of France?" response = client.models.generate_content( model="gemini-2.0-flash-001", contents=f"Answer this question: {question}", ) print(response.text) # > The capital of France is Paris. ``` Follow the link to see the trace (or navigate to your project's home page). ![Liypad Auto LLM Setting](/assets/docs/lilypad/observability/open-telemetry/auto-llm.webp) # Spans > If you haven't already, we recommend reading [OpenTelemetry](/docs/lilypad/observability/opentelemetry) first. None of the tooling below requires LLMs. It is, however, very handy when working with them. Lilypad implements a `span` context manager that makes it easy to instrument a block of code: ```python import lilypad lilypad.configure() with lilypad.span("Something") as span: # [!code highlight] span.log("Did something...") # [!code highlight] ``` This will automatically capture information such as the latency of the code run within the span, including any calls to `log` that you make inside of the block. ![Basic Span](/assets/docs/lilypad/observability/spans/basic-span.webp) ## Logging Levels The `span` context manager provides additional convenience methods for common logging levels: * `log` — alias for the `info` method. * `info` — log an informational message. * `debug` — log a debug message. * `warning` — log a warning message. * `error` — log an error message. * `critical` — log a critical message. For example, you may want to catch and log and exception: ```python import lilypad lilypad.configure() with lilypad.span("Possible Failure") as span: try: # do something that could error raise RuntimeError("For testing.") except RuntimeError as e: span.error(f"Runtime exception caught: {e}") # [!code highlight] ``` ![Error Span](/assets/docs/lilypad/observability/spans/error-span.webp) ## Metadata You can also capture arbitrary structured metadata: ```python import lilypad lilypad.configure() with lilypad.span("Structured Metadata") as span: span.metadata({"key": "value"}) # [!code highlight] ``` ![Metadata Span](/assets/docs/lilypad/observability/spans/metadata-span.webp) Calling `metadata` multiple times will merge the dictionaries. ## Nested Spans Lilypad will properly capture nested spans as what we call [Traces](/docs/lilypad/observability/traces), which are really just a collection of nested spans that share a single common parent. ```python import lilypad lilypad.configure() with lilypad.span("Parent Span") as parent_span: parent_span.log("I'm the parent!") with lilypad.span("Child span") as child_span: # [!code highlight] child_span.log("I'm the child!") # [!code highlight] ``` The project home page will only display the parent in each row. You can click the dropdown to view the entire collection of spans contained within that trace: ![Nested Spans](/assets/docs/lilypad/observability/spans/nested-spans.webp) # Traces > If you haven't already, we recommend reading [Spans](/docs/lilypad/observability/spans) first. None of the tooling below requires LLMs. It is, however, very handy when working with them. As mentioned in the section on [nested spans](/docs/lilypad/observability/spans#nested-spans), traces are really just collections of spans. In the context of observability, a "trace" often refers to a record of events in a system's execution. Generally this means capturing the inputs, outputs, and additional metadata for each function or module in your system's execution flow. Lilypad makes this simple and easy with the `trace` decorator: ```python import lilypad lilypad.configure() @lilypad.trace() # [!code highlight] def child(text: str) -> str: return "Child Finished!" @lilypad.trace() # [!code highlight] def parent(text: str) -> str: output = child("I'm the child!") print(output) return "Parent Finished!" output = parent("I'm' the parent!") print(output) # > Child Finished! # > Parent Finished! ``` ![Basic Trace](/assets/docs/lilypad/observability/traces/basic-trace.webp) ```python import lilypad lilypad.configure() def child(text: str) -> str: return "Child Finished!" def parent(text: str) -> str: with lilypad.span("child") as child_span: child_input = "I'm the child!" child_span.metadata({"input": child_input}) child_output = child(child_input) child_span.metadata({"output": child_output}) print(child_output) return "Parent Finished!" with lilypad.span("parent") as parent_span: parent_input = "I'm the parent!" parent_span.metadata({"input": parent_input}) parent_output = parent(parent_input) parent_span.metadata({"output": parent_output}) print(parent_output) ``` As you can see, the `trace` decorator above cleans things up significantly and reduces the cognitive overhead involved in instrumenting your code. Of course, everything works well together, and sometimes `span` makes more sense and will work better for certain use-cases where you want to instrument code that is not structured (or does not make sense to structure) as a function. ## Custom Name By default, the `trace` decorator will use the decorated function's name for the trace. Sometimes it makes more sense to use a custom (more readable) name: ```python import lilypad lilypad.configure() @lilypad.trace(name="Answer Question") # [!code highlight] def answer_question(question: str) -> str: return "The capital of France is Paris." answer = answer_question("What is the capital of France?") print(answer) # > The capital of France is Paris. ``` ![Custom Trace Name](/assets/docs/lilypad/observability/traces/custom-name.webp) ## Updating Trace Metadata The `trace` decorator captures information such as inputs/outputs by default, but often you'll want to log additional information or metadata as part of that function's span (and not a sub-span). _We've made this possible in a type-safe way with a special `trace_ctx` reserved argument name._ ```python import lilypad lilypad.configure() @lilypad.trace(name="Answer Question") def answer_question(trace_ctx: lilypad.Span, question: str) -> str: # [!code highlight] trace_ctx.log("I'm the span for Answer Question.") # [!code highlight] return "The capital of France is Paris." answer = answer_question("What is the capital of France?") print(answer) # > The capital of France is Paris. ``` If a `trace`-decorated functions has `trace_ctx: lilypad.Span` as it's first argument, the decorator will inject the span into the argument so that you can access it directly inside the function. The resulting decorated function's call signature will then be updated such that `trace_ctx` is excluded (since `trace_ctx` will be supplied by the decorator, not the user). Above, `answer_question` only expects `question` as an input — and your editor knows this. ![Trace With A Log](/assets/docs/lilypad/observability/traces/trace-log.webp) ## Tracing LLM Calls Since LLMs are non-deterministic, we recommend [versioning](/docs/lilypad/observability/versioning) any functions that use them. Since Lilypad can create spans for LLM API calls automatically, simple calling the API inside of a `trace`-decorated function will nest that span inside of the parent function: ```python from google.genai import Client import lilypad lilypad.configure(auto_llm=True) # [!code highlight] client = Client() @lilypad.trace(name="Answer Question") def answer_question(question: str) -> str | None: response = client.models.generate_content( model="gemini-2.0-flash-001", contents=f"Answer this question: {question}", ) return response.text response = answer_question("What is the capital of France?") print(response) ``` ![Trace With Auto LLM Span](/assets/docs/lilypad/observability/traces/auto-llm-trace.webp) # Versioning > If you haven't already, we recommend reading [Traces](/docs/lilypad/observability/traces) first. When working with non-determinism (cough cough LLMs), it's extremely important that we take a snapshot of the exact version of the code that was used to produce an output. This reproducibility is necessary for proper evaluation. ## Automatic Versioning Setting `versioning="automatic"` will tell Lilypad to automatically version the decorated function every time it's executed: ```python from google.genai import Client import lilypad lilypad.configure(auto_llm=True) client = Client() @lilypad.trace(versioning="automatic") # [!code highlight] def answer_question(question: str) -> str | None: response = client.models.generate_content( model="gemini-2.0-flash-001", contents=f"Answer this question: {question}", ) return response.text response = answer_question("What is the capital of France?") # automatically versioned print(response) # > The capital of France is Paris. ``` Whenever you run a `trace`-decorated function with `versioning="automatic"`, we compute the entire code execution graph necessary to reproduce the current execution of the function. _This is what we use to determine the version._ This means that changing the code to an existing version will automatically attach the trace to that version. It also means that changes to anything that would change how the function is executed will be captured: ```python import lilypad from google.genai import Client client = Client() # This function will be included in the version and any changes # to it will result in a new version of `answer_question` def answer_question_prompt(question: str) -> str: # [!code highlight] return f"Answer this question: {question}" # [!code highlight] @lilypad.trace(name="Answer Question", versioning="automatic") def answer_question(question: str) -> str | None: response = client.models.generate_content( model="gemini-2.0-flash-001", contents=answer_question_prompt(question), # [!code highlight] ) return response.text ``` ![Versioned Function](/assets/docs/lilypad/observability/versioning/basic-version.webp) ## Version Override Since a function's version is a snapshot, we can re-run the exact code for a given version. First, generate type-safe stubs for all available versions: ```bash lilypad sync ``` This ensures that the `.version` method return type-safe function signatures matching the expected arguments for that version, which you can then run in a sandbox: ```python-snippet-concat response = answer_question.version(1)("What is the capital of France?") ``` The `.version` method will have proper type hints when the function is imported but not if used in the same file. This is due to the way that `.pyi` files work in Python. ## A/B Testing You can use the `.version` method to run different versions of the same function easily. This makes A/B testing quite easy — just run each version for a subset of users and evaluate which one is better. ## Remote Deployment Environments are currently under active development during the beta period. If remote deployment is important to you, please reach out to us on Slack or via email. # Cost & Latency Lilypad automatically captures the latency of every span. For spans containing LLM API calls, Lilypad also automatically captures the cost of that call. ## Trace View When you select a trace to view more details, you'll find the latency of the span in the top right corner: ![Trace Cost & Latency](/assets/docs/lilypad/evaluation/cost-and-latency/trace.webp) If the span is or contains an LLM call, there will also be a pill for cost. ## Project Dashboard The home page for each project acts as a dashboard overview of the project. You can see cost and latency charts amongst other stats: ![Project Cost & Latency](/assets/docs/lilypad/evaluation/cost-and-latency/dashboard.webp) ## Versioned Function Dashboard The main page for a versioned function will show you cost and latency charts, which you can toggle to show by day, week, or month: ![Versioned Function Cost & Latency](/assets/docs/lilypad/evaluation/cost-and-latency/function.webp) # Comparisons Comparisons are a core to any evaluation process. Lilypad provides a simple way to compare spans and versioned functions. ## Comparing Spans You can compare spans by selecting the two spans you want to compare and clicking the "Compare" button. ![Click Compare Button For Traces](/assets/docs/lilypad/evaluation/comparisons/click-compare-spans.webp) This will open the comparison view, which shows you the two spans side by side: ![Span Comparison](/assets/docs/lilypad/evaluation/comparisons/compare-spans.webp) ## Comparing Function Versions In the versioned function view, you can click the "Compare" button to toggle a second version dropdown: ![Click Compare Button For Function Versions](/assets/docs/lilypad/evaluation/comparisons/click-compare-functions.webp) Selecting a second version will show you a comparison view of the two versions: ![Function Comparison](/assets/docs/lilypad/evaluation/comparisons/compare-functions.webp) # Annotations When you are viewing a trace in the full view, the right sidebar contains a section for discussions and annotations. ![Trace Annotation Sidebar](/assets/docs/lilypad/evaluation/annotations/annotation-sidebar.webp) ## Discussion The discussion tab is a place for you to leave open-ended comments on the trace. Often this is the first step towards annotating the trace. When you aren't yet sure how to annotate or trace (or if you want to), open-ended discussions are a great way to start. You can use this to ask questions, leave feedback, discuss the trace with your team, or even just leave a note to yourself. ![Trace Discussion](/assets/docs/lilypad/evaluation/annotations/trace-discussion.webp) ## Assigning Traces For Annotation In the trace view, annotations are read only. You can annotate a trace by assigning to a user, which adds it to their annotation queue. ![Click Assign Annotation](/assets/docs/lilypad/evaluation/annotations/click-assign-annotation.webp) ![Assign Annotation](/assets/docs/lilypad/evaluation/annotations/assign-annotation.webp) Once added to the queue, you can annotate the trace by clicking the label, (optionally) providing a reason, and clicking submit. ![Annotation Queue](/assets/docs/lilypad/evaluation/annotations/annotation-queue.webp) When annotating a trace, existing annotations are not shown to you. This is to prevent biasing your annotation. ## Annotating In Code You can also annotate traces in code. This is useful when you want to annotate a trace using a custom script or UI. ```python import os from lilypad import Lilypad client = Lilypad() client.ee.projects.annotations.create( project_uuid=os.environ["LILYPAD_PROJECT_ID"], body=[ { "span_uuid": "...", "project_uuid": os.environ["LILYPAD_PROJECT_ID"], "function_uuid": "...", "label": "pass", # [!code highlight] "reasoning": "this is a pass because...", # [!code highlight] "type": "manual", "data": {}, "assigned_to": ["..."], "assignee_email": ["e@mail.com"], } ], ) ``` ## Passthrough Annotations Setting `mode="wrap"` in the `trace` decorator converts the original return type of the decorated function into a `Traced[R]` that provides methods for operating on the corresponding trace directly. For example, you might want to enable passthrough annotation so that your users can annotate traces directly: ```python from google.genai import Client import lilypad client = Client() lilypad.configure() @lilypad.trace(name="Answer Question", versioning="automatic", mode="wrap") # [!code highlight] def answer_question(question: str) -> str | None: response = client.models.generate_content( model="gemini-2.0-flash-001", contents=f"Answer this question: {question}", ) return response.text trace: lilypad.Trace[str | None] = answer_question("What is the capital of France?") print(trace.response) # original response # > The capital of France is Paris. annotation = lilypad.Annotation( # [!code highlight] label="pass", # [!code highlight] reasoning="The answer was correct", # [!code highlight] data=None, # [!code highlight] type=None, # [!code highlight] ) # [!code highlight] trace.annotate(annotation) # [!code highlight] ``` # Lilypad API Reference The Lilypad API reference is under active development during the beta period. For now, please use the [Lilypad FastAPI Docs](https://lilypad-api.mirascope.com/v0/docs) for the most up-to-date reference of what's available. You can use the `Lilypad` (or `AsyncLilypad`) client to interace with the Lilypad API in code: ```python import os from lilypad import Lilypad client = Lilypad( base_url="https://lilypad-api.mirascope.com/v0", # can be ommitted if set in env api_key=os.environ["LILYPAD_API_KEY"], # can be ommitted if set in env ) traces = client.projects.traces.list( project_uuid=os.environ["LILYPAD_PROJECT_ID"], ) print(traces.items[0]) # > {'span_id': '...', ...} ``` ```python import asyncio import os from lilypad import AsyncLilypad client = AsyncLilypad( base_url="https://lilypad-api.mirascope.com/v0", # can be ommitted if set in env api_key=os.environ["LILYPAD_API_KEY"], # can be ommitted if set in env ) async def main(): traces = await client.projects.traces.list( project_uuid=os.environ["LILYPAD_PROJECT_ID"], ) print(traces.items[0]) # > {'span_id': '...', ...} asyncio.run(main()) ```