Qwant Search Agent with Sources
This notebook tutorial walks through the implementation of a web agent that uses Large Language Models (LLMs) to perform intelligent web searches and extract relevant information. We'll use the Groq API for our LLM calls and the Qwant search engine for web queries.
LLM-generated output can be hallucinated, even when web search results are supplied to the model as context. It is therefore important to keep the sources and return them alongside the generated output so that the answer can be verified.
Setup
First, let's install the necessary packages:
!pip install "mirascope[groq]" requests beautifulsoup4 python-dotenv tenacity
Now, let's import the required libraries and load our environment variables:
import os
os.environ["GROQ_API_KEY"] = "gsk_..."
# Set the appropriate API key for the provider you're using
import re
import time
from typing import Any, Callable
import requests
from bs4 import BeautifulSoup
from pydantic import BaseModel, Field
from tenacity import retry, stop_after_attempt, wait_exponential
from mirascope.core import prompt_template
from mirascope.core.groq import groq_call
Qwant API Implementation
Let's implement the Qwant API class for performing web searches:
class QwantApi:
    BASE_URL = "https://api.qwant.com/v3"
    def __init__(self) -> None:
        self.session = requests.Session()
        self.session.headers.update(
            {
                "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
            }
        )
    def search(
        self,
        q: str,
        search_type: str = "web",
        locale: str = "en_US",
        offset: int = 0,
        safesearch: int = 1,
    ) -> dict[str, Any] | None:
        params = {"q": q, "locale": locale, "offset": offset, "safesearch": safesearch}
        url = f"{self.BASE_URL}/search/{search_type}"
        # Bound the request so an unresponsive API cannot stall the agent
        response = self.session.get(url, params=params, timeout=10)
        # Return the parsed JSON on success; None signals a failed request
        return response.json() if response.status_code == 200 else None
This class encapsulates the functionality to interact with the Qwant search API. It allows us to perform searches of different types (web, news, images, videos) and handles the API request details.
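To make the request shape concrete, here is a minimal sketch of roughly the URL that `search` asks `requests` to fetch. The helper `build_search_url` is hypothetical and exists only for illustration; it uses the standard library's `urlencode` to mimic how the query parameters are appended, and it makes no network call.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.qwant.com/v3"  # same base URL as the QwantApi class


def build_search_url(
    q: str,
    search_type: str = "web",
    locale: str = "en_US",
    offset: int = 0,
    safesearch: int = 1,
) -> str:
    """Hypothetical helper: assemble the search URL without sending a request."""
    params = {"q": q, "locale": locale, "offset": offset, "safesearch": safesearch}
    # The search type becomes part of the path; everything else is a query parameter
    return f"{BASE_URL}/search/{search_type}?{urlencode(params)}"


url = build_search_url("mirascope groq", search_type="news")
print(url)
# https://api.qwant.com/v3/search/news?q=mirascope+groq&locale=en_US&offset=0&safesearch=1
```

Printing the URL like this can help when debugging an unexpected `None` from `search`, since it shows exactly which endpoint and parameters were requested.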
Data Models
Next, let's define our data models using Pydantic:
class SearchResponse(BaseModel):
    answer: str = Field(description="The answer to the question")
    sources: list[str] = Field(description="The sources used to generate the answer")
class SearchType(BaseModel):
    search_type: str = Field(
        description="The type of search to perform (web, news, images, videos)"
    )
    reasoning: str = Field(description="The reasoning behind the search type selection")
class OptimizedQuery(BaseModel):
    query: str = Field(description="The optimized search query")
    reasoning: str = Field(description="The reasoning behind the query optimization")
These Pydantic models define the structure for our data:
- SearchResponse: Represents the final answer and its sources
- SearchType: Represents the chosen search type and the reasoning behind it
- OptimizedQuery: Represents an optimized search query and the reasoning for the optimization
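As a quick illustration of what these models buy us, the sketch below validates two hand-written JSON strings against `SearchResponse`. It assumes Pydantic v2 (for `model_validate_json`), and the JSON payloads and example URL are made up for the demo; they are not real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class SearchResponse(BaseModel):
    answer: str = Field(description="The answer to the question")
    sources: list[str] = Field(description="The sources used to generate the answer")


# A well-formed payload parses into a typed object
good = SearchResponse.model_validate_json(
    '{"answer": "Paris", "sources": ["https://example.com/france"]}'
)
print(good.sources)

# A payload without sources is rejected instead of silently accepted
try:
    SearchResponse.model_validate_json('{"answer": "Paris"}')
    missing = []
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]
print("missing fields:", missing)
```

Making `sources` a required field means an answer that arrives without any sources fails validation rather than reaching the user unverified.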
Implement Throttling
Let's create a decorator for throttling our API calls:
def throttle(calls_per_minute: int) -> Callable:
    min_interval = 60.0 / calls_per_minute
    last_called: list[float] = [0.0]
    def decorator(func: Callable) -> Callable:
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            if left_to_wait > 0:
                time.sleep(left_to_wait)
            ret = func(*args, **kwargs)
            last_called[0] = time.time()
            return ret
        return wrapper
    return decorator
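To see the throttling behavior in isolation, here is a small self-contained sketch. It restates the decorator with two substitutions that are not in the tutorial's version, `time.monotonic` in place of `time.time` and `functools.wraps` to preserve the wrapped function's name, and it applies it to a hypothetical `ping` function at an artificially high rate so the demo finishes quickly.

```python
import functools
import time
from typing import Any, Callable


def throttle(calls_per_minute: int) -> Callable:
    """Space out calls so that at most `calls_per_minute` happen per minute."""
    min_interval = 60.0 / calls_per_minute
    last_called = [None]  # timestamp of the previous call, None before the first

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if last_called[0] is not None:
                left_to_wait = min_interval - (time.monotonic() - last_called[0])
                if left_to_wait > 0:
                    time.sleep(left_to_wait)
            result = func(*args, **kwargs)
            last_called[0] = time.monotonic()
            return result

        return wrapper

    return decorator


@throttle(calls_per_minute=600)  # hypothetical rate: one call every 0.1 s
def ping(n: int) -> int:
    return n


start = time.monotonic()
results = [ping(i) for i in range(3)]
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")  # three calls, with two enforced waits in between
```

The first call goes through immediately and each later call waits out the remainder of the interval, so three calls take a little over two intervals in total.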
# Wrap the groq_call decorator so that every LLM call is throttled and retried
def throttled_groq_call(*args: Any, **kwargs: Any) -> Callable:
    def decorator(fn: Callable) -> Callable:
        # Build the Mirascope call once, at decoration time
        call = groq_call(*args, **kwargs)(fn)
        @retry(
            wait=wait_exponential(multiplier=1, min=4, max=10), stop=stop_after_attempt(5)
        )
        @throttle(calls_per_minute=6)  # Adjust this value based on your rate limit
        def wrapped_call(*call_args: Any, **call_kwargs: Any) -> Any:
            # Throttle and retry each invocation of the call
            return call(*call_args, **call_kwargs)
        return wrapped_call
    return decorator
Determine Search Type
Now, let's implement the function to determine the most appropriate search type:
@throttled_groq_call(
    "llama-3.3-70b-versatile", response_model=SearchType, json_mode=True
)
@prompt_template(
    """
SYSTEM:
You are an expert at determining the most appropriate type of search for a given query. Your task is to analyze the user's question and decide which Qwant search type to use: web, news, images, or videos.
Follow these strict guidelines:
1. For general information queries, use 'web'.
2. For recent events, breaking news, or time-sensitive information, use 'news'.
3. For queries explicitly asking for images or visual content, use 'images'.
4. For queries about video content or asking for video results, use 'videos'.
5. If unsure, default to 'web'.
Provide your decision in a structured format with the search type and a brief explanation of your reasoning.
USER:
Determine the most appropriate search type for the following question:
{question}
ASSISTANT:
Based on the question, I will determine the most appropriate search type and provide my reasoning.
"""
)
def determine_search_type(question: str) -> SearchType:
    """
    Determine the most appropriate search type for the given question.
    """
    ...
This function uses the Groq API to determine the most appropriate search type based on the user's question. It uses a prompt template to guide the LLM in making this decision.
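Because `search_type` is a free-form string, the model could in principle return a value outside the four supported types. One possible safeguard, sketched below, is to normalize the value before passing it to the search function. The helper `normalize_search_type` and the constant `ALLOWED_SEARCH_TYPES` are hypothetical names introduced for this example; the fallback to `web` mirrors guideline 5 of the prompt.

```python
ALLOWED_SEARCH_TYPES = ("web", "news", "images", "videos")


def normalize_search_type(raw: str, default: str = "web") -> str:
    """Hypothetical helper: map model output onto a supported search type."""
    # Tolerate stray whitespace, quotes, and capitalization in the model output
    candidate = raw.strip().strip("'\"").lower()
    return candidate if candidate in ALLOWED_SEARCH_TYPES else default


print(normalize_search_type(" News "))    # news
print(normalize_search_type("'images'"))  # images
print(normalize_search_type("shopping"))  # web
```

A stricter alternative would be to encode the allowed values in the model's type so that validation rejects anything else, at the cost of a failed call when the model strays.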
Web Search Function
Let's implement the function to perform the actual web search using Qwant:
def qwant_search(query: str, search_type: str, max_results: int = 5) -> dict[str, str]:
    """
    Use Qwant to get information about the query using the specified search type.
    """
    print(f"Searching Qwant for '{query}' using {search_type} search...")
    search_results = {}
    qwant = QwantApi()
    results = qwant.search(query, search_type=search_type)
    if (
        results
        and "data" in results
        and "result" in results["data"]
        and "items" in results["data"]["result"]
    ):
        items = results["data"]["result"]["items"]
        if isinstance(items, dict) and "mainline" in items:
            items = items["mainline"]
        count = 0
        for item in items:
            if "url" in item:
                url = item["url"]
                print(f"Fetching content from {url}...")
                content = get_content(url)
                search_results[url] = content
                count += 1
                if count >= max_results:
                    break
            elif isinstance(item, dict) and "items" in item:
                for subitem in item["items"]:
                    if "url" in subitem:
                        url = subitem["url"]
                        print(f"Fetching content from {url}...")
                        content = get_content(url)
                        search_results[url] = content
                        count += 1
                        if count >= max_results:
                            break
                if count >= max_results:
                    break
    return search_results
def get_content(url: str) -> str:
    """
    Fetch and parse content from a URL.
    """
    data = []
    try:
        # Bound the request so a slow site cannot stall the agent
        response = requests.get(url, timeout=10)
        content = response.content
        soup = BeautifulSoup(content, "html.parser")
        paragraphs = soup.find_all("p")
        for paragraph in paragraphs:
            data.append(paragraph.text)
    except Exception as e:
        print(f"Error fetching content from {url}: {e}")
    return "\n".join(data)
These functions handle the web search process:
- qwant_search: Performs the search using the Qwant API and fetches content from the resulting URLs
- get_content: Fetches and parses the content from a given URL using BeautifulSoup
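The traversal inside `qwant_search` is easier to follow when separated from the network calls. The sketch below pulls only the URLs out of a response-shaped dictionary. The helper `collect_urls` is hypothetical, and the sample payload is invented to match the structure the tutorial's code checks for (`data` → `result` → `items` → `mainline`); it is not a captured Qwant response.

```python
from typing import Any, Optional


def collect_urls(results: Optional[dict], max_results: int = 5) -> list:
    """Hypothetical helper: gather up to `max_results` URLs from a response dict."""
    if not results:
        return []
    items: Any = results.get("data", {}).get("result", {}).get("items", [])
    if isinstance(items, dict):
        items = items.get("mainline", [])
    urls = []
    for item in items:
        if not isinstance(item, dict):
            continue
        # A hit either carries its own URL or groups several hits under "items"
        candidates = [item] if "url" in item else item.get("items", [])
        for candidate in candidates:
            if "url" in candidate:
                urls.append(candidate["url"])
                if len(urls) >= max_results:
                    return urls
    return urls


sample = {
    "data": {"result": {"items": {"mainline": [
        {"type": "web", "items": [
            {"url": "https://example.com/a"},
            {"url": "https://example.com/b"},
        ]},
        {"url": "https://example.com/c"},
        {"type": "ads", "items": [{"title": "no url here"}]},
    ]}}}
}

print(collect_urls(sample, max_results=2))
# ['https://example.com/a', 'https://example.com/b']
```

Returning early from the nested loop avoids the pair of `break` statements needed to leave both loops once the limit is reached.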
Search and Extract Functions
Now, let's implement the functions to process the search results and extract the final answer:
@groq_call("llama-3.3-70b-versatile")
@prompt_template(
    """
SYSTEM:
You are an expert at finding information on the web.
Use the provided search results to answer the question.
Rewrite the question as needed to better find information on the web.
Search results:
{search_results}
USER:
{question}
"""
)
def search(question: str, search_results: dict[str, str]) -> str:
    """
    Use the search results to answer the user's question.
    """
    # The model will return an answer based on the search results and question
    ...
@throttled_groq_call(
    "llama-3.3-70b-versatile", response_model=SearchResponse, json_mode=True
)
@prompt_template(
    """
SYSTEM:
Extract the answer to the question based on the search results.
Provide the sources used to answer the question in a structured format.
Search results:
{results}
USER:
{question}
"""
)
def extract(
    question: str, results: dict[str, str]
) -> SearchResponse:
    """
    Extract a concise answer from the search results and include sources.
    """
    ...
def clean_text(text: str) -> str:
    """
    Clean the text data for better formatting and readability.
    """
    # Collapse runs of whitespace into single spaces and trim the ends
    return re.sub(r"\s+", " ", text).strip()
These functions process the search results:
- search: Uses the Groq API to generate an answer based on the search results
- extract: Extracts a concise answer and the sources used from the search results
- clean_text: Cleans the text output for better readability
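A short demonstration of what `clean_text` does to a string may help set expectations. The function is repeated here so the snippet runs on its own, and the sample input is made up.

```python
import re


def clean_text(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


raw = "  Paris is the capital\n\nof   France.\t"
print(clean_text(raw))       # Paris is the capital of France.
print(clean_text("a - b!"))  # a - b!
```

Only whitespace is normalized: newlines, tabs, and repeated spaces become single spaces, while punctuation and other characters pass through unchanged.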
Main Execution Function
Finally, let's implement the main execution function that orchestrates the entire process:
def run(question: str) -> SearchResponse:
    """
    Orchestrate the search and extraction process to answer the user's question.
    """
    print(f"Processing question: '{question}'")
    # Step 1: Determine the appropriate search type
    search_type_result = determine_search_type(question)
    print(f"Selected search type: {search_type_result.search_type}")
    print(f"Reasoning: {search_type_result.reasoning}")
    # Step 2: Search the web using Qwant with the determined search type
    search_results = qwant_search(question, search_type_result.search_type)
    # Step 3: Use Groq Llama model to summarize search results
    response = search(question, search_results)
    print(f"Search response: {response}")
    # Step 4: Extract the final answer and structured sources
    result = extract(question, search_results)
    # Step 5: Clean the output for readability
    result.answer = clean_text(result.answer)
    print(f"Final result: {result}")
    return result
This run function orchestrates the entire process:
- Determines the appropriate search type
- Performs the web search
- Summarizes the search results
- Extracts the final answer and sources
- Cleans the output for readability
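Since the sources in the final response are themselves generated by the model, one optional extra step is to check them against the URLs that were actually fetched. The sketch below is one way to do that. The helper `split_sources` is hypothetical, the example URLs are placeholders, and ignoring a trailing slash when comparing URLs is an assumption made for this example.

```python
def split_sources(sources: list, fetched_urls) -> tuple:
    """Hypothetical helper: separate sources we fetched from ones we did not."""
    # Compare without trailing slashes so trivially different spellings still match
    fetched = {url.rstrip("/") for url in fetched_urls}
    verified, unverified = [], []
    for source in sources:
        if source.strip().rstrip("/") in fetched:
            verified.append(source)
        else:
            unverified.append(source)
    return verified, unverified


# Keys play the role of the search_results dictionary built by qwant_search
search_results = {
    "https://example.com/a": "page text",
    "https://example.com/b/": "page text",
}
model_sources = ["https://example.com/a/", "https://example.com/made-up"]

verified, unverified = split_sources(model_sources, search_results)
print(verified)    # ['https://example.com/a/']
print(unverified)  # ['https://example.com/made-up']
```

Any entry that lands in the unverified list was not among the fetched pages and deserves a manual check before the answer is trusted.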
Usage Example
Let's add an example usage of our web agent:
if __name__ == "__main__":
    print("Example usage:")
    response = run("what is the latest on donald trump and elon musk?")
    print(response)
This example demonstrates how to use the run function to process a question and get a response.
Conclusion
This notebook tutorial has walked through the implementation of a web agent that uses LLMs to perform intelligent web searches and extract relevant information. The agent determines the most appropriate search type, performs the search, processes the results, and provides a structured response with sources.
Key components of this implementation include:
- Qwant API integration for web searches
- Groq API integration for LLM-powered decision making and information extraction
- Pydantic models for structured data handling
- Tenacity for retry logic with exponential backoff
- BeautifulSoup for web scraping
- Prompt engineering for guiding LLM behavior
This web agent can be extended and customized for various applications, such as research assistants, fact-checking tools, or automated information gathering systems.