{ "cells": [ { "cell_type": "markdown", "id": "2f62b7946072f9ee", "metadata": {}, "source": [ "# DiVeRSe: Enhancing LLM Reasoning with Prompt Variations\n", "\n", "This recipe demonstrates how to implement the DiVeRSe (Diverse Verifier on Reasoning Steps) technique using Large Language Models (LLMs) with Mirascope. DiVeRSe is a prompt engineering method that enhances an LLM's reasoning capabilities by generating multiple reasoning chains from variations of the original prompt.\n", "\n", "
\n", "

Mirascope Concepts Used

\n", "\n", "
\n", "\n", "
\n", "

Background

\n", "

\n", "DiVeRSe is a variant of the self-consistency prompt engineering technique. Instead of generating multiple chains from the same prompt, DiVeRSe creates variations of the original prompt to generate different reasoning chains. This approach can significantly improve the LLM's ability to reason about complex problems by considering multiple perspectives and phrasings of the same question.\n", "

\n", "
\n", "\n", "## Implementation\n", "\n", "Let's implement the DiVeRSe technique using Mirascope:\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "c116d312900e2c35", "metadata": { "ExecuteTime": { "end_time": "2024-10-01T01:44:19.373585Z", "start_time": "2024-10-01T01:44:06.136569Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16\n" ] } ], "source": [ "import asyncio\n", "\n", "from mirascope.core import openai, prompt_template\n", "from pydantic import BaseModel, Field\n", "\n", "\n", "class PromptVariations(BaseModel):\n", " variations: list[str] = Field(..., description=\"Variations of the original prompt\")\n", "\n", "\n", "@openai.call(model=\"gpt-4o-mini\", response_model=PromptVariations)\n", "@prompt_template(\n", " \"\"\"\n", " Return the {num_prompts} alternate variations of the prompt which retain the\n", " full meaning but uses different phrasing.\n", " Prompt: {prompt}\n", " \"\"\"\n", ")\n", "def get_prompt_variations(prompt: str, num_prompts: int): ...\n", "\n", "\n", "@openai.call(model=\"gpt-4o-mini\")\n", "@prompt_template(\n", " \"\"\"\n", " Answer the following question going step by step:\n", " {query}\n", " \"\"\"\n", ")\n", "async def zero_shot_cot(query: str): ...\n", "\n", "\n", "class ResponseDetails(BaseModel):\n", " solution_number: int = Field(\n", " ..., description=\"The actual number given as the answer in a solution.\"\n", " )\n", " correctness_probability: float = Field(\n", " ...,\n", " ge=0,\n", " le=1,\n", " description=\"An estimated probability that the given solution is correct from 0.0 to 1.0\",\n", " )\n", "\n", "\n", "@openai.call(model=\"gpt-4o-mini\", response_model=ResponseDetails)\n", "@prompt_template(\n", " \"\"\"\n", " Here is a query and a response which attempts to answer the query.\n", " Prompt: {query}\n", " Response: {response}\n", "\n", " Extract the raw numerical value of the answer given by the response, and also\n", " give an estimate between 0.0 and 1.0 of the probability that this solution\n", " is correct.\n", " \"\"\"\n", ")\n", "async def evaluate_response(query: str, response: str): ...\n", "\n", "\n", "async def diverse(query: str, num_variations: int) -> str:\n", " # Gather the variations of the prompt\n", " alternate_variations = get_prompt_variations(query, num_variations - 1)\n", " all_variations = alternate_variations.variations + [query]\n", "\n", " # Generate a unique reasoning chain for each prompt variation with CoT\n", " cot_tasks = [zero_shot_cot(prompt) for prompt in all_variations]\n", " cot_responses = [response.content for response in await asyncio.gather(*cot_tasks)]\n", "\n", " # Evaluate each reasoning chain\n", " eval_tasks = [\n", " evaluate_response(query, cot_response) for cot_response in cot_responses\n", " ]\n", " eval_responses = await asyncio.gather(*eval_tasks)\n", "\n", " response_scores = {}\n", " for eval_response in eval_responses:\n", " if eval_response.solution_number not in response_scores:\n", " response_scores[eval_response.solution_number] = 0\n", " response_scores[eval_response.solution_number] += (\n", " eval_response.correctness_probability\n", " )\n", " best_response = max(response_scores.keys(), key=lambda k: response_scores[k])\n", " return best_response\n", "\n", "\n", "async def run_diverse(prompt, num_variations=3) -> str:\n", " response = await diverse(prompt, num_variations)\n", " return response\n", "\n", "\n", "query = \"\"\"\n", "A committee of 3 people must be formed from a pool of 6 people, but Amy and Bob do not\n", "get along 
and should not be on the committee at the same time. How many viable\n", "combinations are there?\n", "\"\"\"\n", "\n", "print(await run_diverse(query))" ] }, { "cell_type": "markdown", "id": "906c5c20b6309e42", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "158ac0e470b9041b", "metadata": {}, "source": [ "This implementation consists of several key components:\n", "\n", "1. `get_prompt_variations`: Generates variations of the original prompt.\n", "2. `zero_shot_cot`: Implements zero-shot chain-of-thought reasoning for each prompt variation.\n", "3. `evaluate_response`: Evaluates each reasoning chain and assigns a probability of correctness.\n", "4. `diverse`: Orchestrates the DiVeRSe technique by generating prompt variations, creating reasoning chains, and selecting the best response.\n", "\n", "## Benefits and Considerations\n", "\n", "The DiVeRSe implementation offers several advantages:\n", "\n", "1. Improved reasoning by considering multiple phrasings of the same problem.\n", "2. Enhanced robustness against potential misinterpretations of the original prompt.\n", "3. Potential for more accurate responses in complex reasoning tasks.\n", "\n", "When implementing this technique, consider:\n", "\n", "- Balancing the number of prompt variations with computational cost and time constraints.\n", "- Adjusting the evaluation criteria for different types of problems (e.g., numerical vs. categorical answers).\n", "- Fine-tuning the prompt variation generation to ensure meaningful diversity while maintaining the original question's intent.\n", "\n", "
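To make the consideration about evaluation criteria concrete, here is a minimal sketch of how the verifier step might be adapted for categorical (e.g., multiple-choice) answers, reusing the same decorators as the implementation above. The `CategoricalResponseDetails` model and `evaluate_categorical_response` function are hypothetical names introduced only for illustration; they are not part of the recipe itself:\n",
"\n",
"```python\n",
"from mirascope.core import openai, prompt_template\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class CategoricalResponseDetails(BaseModel):\n",
"    # Hypothetical counterpart to ResponseDetails for non-numerical answers.\n",
"    solution_label: str = Field(\n",
"        ..., description=\"The categorical answer (e.g., a letter choice) given in a solution.\"\n",
"    )\n",
"    correctness_probability: float = Field(\n",
"        ..., ge=0, le=1, description=\"An estimated probability that the given solution is correct.\"\n",
"    )\n",
"\n",
"\n",
"@openai.call(model=\"gpt-4o-mini\", response_model=CategoricalResponseDetails)\n",
"@prompt_template(\n",
"    \"\"\"\n",
"    Here is a query and a response which attempts to answer the query.\n",
"    Prompt: {query}\n",
"    Response: {response}\n",
"\n",
"    Extract the categorical answer given by the response, and also give an\n",
"    estimate between 0.0 and 1.0 of the probability that this solution is correct.\n",
"    \"\"\"\n",
")\n",
"async def evaluate_categorical_response(query: str, response: str): ...\n",
"```\n",
"\n",
"The aggregation in `diverse` would then key `response_scores` on `solution_label` instead of `solution_number`, with correctness probabilities still summed per extracted answer.\n",
"\n",
"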
\n", "

Additional Real-World Applications

\n", "\n", "
\n", "\n", "When adapting this recipe to your specific use-case, consider:\n", "\n", "- Tailoring the prompt variation generation to your domain for better performance.\n", "- Experimenting with different evaluation methods for the reasoning chains.\n", "- Implementing a feedback loop to refine the prompt variation process based on the accuracy of final answers.\n", "- Combining DiVeRSe with other techniques like Self-Ask or Sim to M for even more nuanced reasoning capabilities.\n", "\n", "By leveraging Mirascope's `call` decorator, response models, and dynamic configuration, you can easily implement and customize the DiVeRSe technique to enhance your LLM's ability to reason about complex problems across a wide range of applications.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }