Prompts & Notes

LLM Agents: What They Are, Tools, and Examples

We can define an "agent" as a person who acts on behalf of another person or group. However, the definition of an agent with respect to Large Language Models (LLMs) is a hotly debated topic, and no single definition has yet won out.

We like to refer to an LLM agent as an autonomous or semi-autonomous system that can act on your behalf. The core concept is tool use: equipping the LLM with tools that let it interact with its environment.

Agents can be used to handle complex, multi-step tasks that may require planning, data retrieval, or other dynamic paths that are not necessarily fully or well defined before starting the task.

This goes beyond what an LLM normally does on its own — which is to generate text responses to user queries based on its pre-training — and steps up its autonomy in planning, executing tasks, using tools, and retrieving external data.

What makes LLM agents useful is that they can function within workflows that integrate multiple systems and services without having to fully define every step of the process beforehand.
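
To make tool use concrete, here's a minimal sketch based on Mirascope's documented tool-calling pattern; the get_stock_price tool and its stubbed return value are hypothetical, and the exact API can vary by version:

```python
from mirascope.core import openai

# Hypothetical tool: in a real agent this would hit a market-data API.
def get_stock_price(ticker: str) -> str:
    """Returns the latest price for `ticker`."""
    return f"{ticker} is trading at $182.31"  # stubbed for illustration

@openai.call("gpt-4o-mini", tools=[get_stock_price])
def answer(question: str) -> str:
    return question

response = answer("What is AAPL trading at right now?")
if tool := response.tool:    # the model chose to use the tool
    print(tool.call())       # execute it on the model's behalf
else:
    print(response.content)  # the model answered directly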

How to Make a Chatbot from Scratch (That Uses LLMs)

Chatbots are built in generally one of two ways:

  1. Using no-code tools, messaging platforms (like Facebook Messenger or WhatsApp), or a chatbot builder like Botpress. This is often the preferred route for non-developers or those who don’t need specific functionality, and allows them to create chatbots quickly and with minimal technical expertise.
  2. Developing one yourself, which is a great option for those needing custom functionality, advanced integrations, or unique designs.

Building your own chatbot gives you complete control over the bot’s behavior and features, but requires a basic grasp of programming.

As far as programming is concerned, Python’s readability and selection of libraries for working with language models make it a good choice, although you can build chatbots in any language.

The sheer number of options and approaches for building a chatbot from scratch can seem overwhelming, and it can be challenging even to decide where to start.

In this article, we help you navigate the process of building your own LLM-driven chatbot from scratch, including how (and where) to get started, as well as things to look out for when choosing a chat development library.

Finally, we include a step-by-step example of building a basic chatbot and extending it to use tools. We build this using Mirascope, our lightweight, Pythonic toolkit for developing LLM-powered applications that’s designed with software developer best practices in mind.
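
As a preview of the shape such a bot takes, here's a minimal terminal chat loop written against the OpenAI Python SDK (the article's full build uses Mirascope, and the model name is illustrative). The key idea is keeping the conversation history and resending it each turn:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # keep context
    print(f"Bot: {reply}")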

Using LLM-as-a-Judge to Evaluate AI Outputs

LLM-as-a-judge is an evaluation technique that uses a language model to score the quality of another model’s answers.

This allows you to automate certain types of evaluations that typically have to be done by humans, such as:

  • Assessing the politeness or tone of chatbot responses
  • Comparing and ranking two or more LLM responses according to how well they directly address the prompt or question
  • Scoring the quality of a machine translation
  • Determining how creative or original a response is

Such tasks have traditionally been hard to measure using software since they require subjective human-level interpretation, and machine learning assessments typically rely on objective rules or evaluation metrics.

Yet there’s evidence that models like GPT-4 often reach conclusions on such tasks that are similar to those of human evaluators.

Below, we describe how LLM judges work, the steps required to set up an LLM as a judge, and the unique challenges they bring. We also show you a practical example of how to implement an LLM evaluator using our lightweight Python toolkit, Mirascope.
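
For a taste of the pattern before the full walkthrough, here's a minimal politeness judge written directly against the OpenAI SDK; the rubric, scale, and model choice are assumptions for illustration:

```python
from openai import OpenAI

client = OpenAI()

def judge_politeness(response_text: str) -> int:
    """Ask a judge model to rate politeness on a 1-5 scale."""
    prompt = (
        "Rate the politeness of the following chatbot response on a scale "
        "of 1 (rude) to 5 (very polite). Reply with the number only.\n\n"
        f"Response: {response_text}"
    )
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce run-to-run variance in scoring
    )
    return int(result.choices[0].message.content.strip())

print(judge_politeness("Sure, happy to help! What do you need?"))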

RAG Application: Benefits, Challenges & How to Build One

Retrieval augmented generation (RAG) is a workflow that incorporates real-time information retrieval into AI-assisted outputs.

The retrieved information gives the language model added context, allowing it to generate responses that are factual, up-to-date, and domain specific, thereby reducing the risk of misinformation or hallucinated content.

Hallucinations are LLM responses that sound plausible but are wrong, and the risk of them occurring is very real. This happens for a variety of reasons, not least because large language models rely on pre-training data that may be outdated, incomplete, or irrelevant to the specific context of a query.

This limitation is more pronounced in domains like healthcare, finance, or legal compliance, where the details of the background information need to be correct to be useful.

RAG, however, keeps responses anchored in reality, mitigating the chances of inaccuracies. It’s found in an increasing number of generative AI applications like:

  • Customer support chatbots — to intelligently extract information from relevant knowledge base articles.
  • Healthcare — to provide evidence-based medical advice.
  • Legal research — to retrieve relevant case law and regulations.
  • Personalized education platforms — to provide up-to-date answers tailored to learners’ specific questions and needs.
  • Financial advisory tools — to leverage timely market data for better decision-making.

In this article, we explain how RAG works and list its challenges. We also show an example of building an application using both LlamaIndex and Mirascope, our lightweight toolkit for building with LLMs.

We use LlamaIndex for data ingestion, indexing, and retrieval, while leveraging Mirascope’s straightforward and Pythonic approach to prompting.
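
A condensed sketch of that division of labor might look like the following; the data directory, query, and model name are illustrative, and Mirascope's exact call syntax can vary by version:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from mirascope.core import openai

# LlamaIndex side: ingest ./data, embed it, and build a retriever.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=3)

# Mirascope side: a plain function defines the augmented prompt.
@openai.call("gpt-4o-mini")
def answer(question: str, context: str) -> str:
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What does the refund policy say about digital goods?"
context = "\n".join(n.get_content() for n in retriever.retrieve(question))
print(answer(question, context).content)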

A Guide to Synthetic Data Generation

Synthetic data generation involves creating artificial data that closely imitates the characteristics of real-world data. It provides an alternative when real data is difficult to collect and helps address data bottlenecks where real-world data is scarce or inaccessible.

Instead of collecting data from actual events or observations, you use algorithms and models to produce data that replicates the patterns, structures, and statistical attributes found in real datasets.

Such data is widely used in industries where real data is either sensitive, unavailable, or limited in scope. In healthcare, for instance, synthetic data lets you test medical systems or train AI models on patient-like records without risking sensitive information.

Similarly, in finance, you can simulate customer transactions for fraud detection and risk analysis without exposing sensitive customer information or confidential business data.

You might want to generate artificial data in order to:

  • Save yourself the time and effort of collecting and managing real data.
  • Augment your datasets with synthetic data that covers more scenarios and edge cases; for example, simulating rare instances of poor lighting so your facial recognition model adapts better to such situations.
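
As a toy illustration of replicating statistical attributes, the sketch below fits two summary statistics to a small, made-up "real" sample and draws synthetic records from the same distribution. Real generators use far richer models than a single normal fit:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Pretend these are real transaction amounts we can't share directly.
real_amounts = np.array([12.5, 40.0, 33.2, 55.9, 21.7, 48.3, 39.1])

mean, std = real_amounts.mean(), real_amounts.std()
synthetic_amounts = rng.normal(mean, std, size=1000)  # same distribution shape

print(f"real mean={mean:.1f}, synthetic mean={synthetic_amounts.mean():.1f}")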

Retrieval Augmented Generation: Examples & How to Build One

RAG is a way to make LLM responses more accurate and relevant by connecting the model to an external knowledge base that pulls in useful information to include in the prompt.

This overcomes certain limitations of relying on language models alone, as responses can now include up-to-date, specific, and contextually relevant information that isn’t limited to what the model learned during its training.

It also contrasts with other techniques like semantic search, which retrieves relevant documents or snippets (based on the user’s meaning and intent) but leaves the task of understanding and contextualizing the information entirely to the user.

RAG helps reduce the risk of hallucination and offers benefits in fields where accuracy, timeliness, and specialized knowledge are highly valued, such as healthcare, science, legal, and others.

As an alternative to RAG, you can fine-tune a model to internalize domain-specific knowledge, which can result in faster and more consistent responses — as long as your tasks have specialized, fixed requirements — but it’s generally a time-consuming and potentially expensive process.

Also, the model’s knowledge is static, meaning you’ll need to fine-tune the model again to update it.

RAG, in contrast, gives you up-to-date responses from a knowledge base that can be adapted on the fly.

Below, we explain how RAG works and then show you examples of using RAG for different applications. Finally, we walk you through an example of setting up a simple RAG application in Python.

For the tutorial we use LlamaIndex for data ingestion and storage, and also Mirascope, our user-friendly development library for integrating large language models with retrieval systems to implement RAG.
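
Ahead of that tutorial, here's the retrieval step stripped to its essentials: embed documents and a query, rank by cosine similarity, and hand the best match to the LLM. The documents are made up and the embedding model choice is an assumption:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = [
    "Our support line is open 9am-5pm on weekdays.",
    "Refunds for digital goods are issued within 14 days.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
query = "How long do refunds take?"
q_vec = embed([query])[0]

# Cosine similarity: dot product over the product of vector norms.
scores = doc_vecs @ q_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
)
best_doc = docs[int(scores.argmax())]
print(best_doc)  # the snippet you'd inject into the LLM prompt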

How to Build a Knowledge Graph from Unstructured Information

A knowledge graph is a structured representation of interconnected information where entities are linked through defined relationships.

Knowledge graphs show you which entities are connected and how they’re related, and are most useful for structuring and giving context to unstructured data (like text, images, and audio), allowing you to:

  • Visualize subtle (or hidden) patterns or insights that might not be immediately apparent in traditional data formats.
  • Get accurate and context-aware search results by better connecting related entities and concepts.
  • Bring data together from multiple, often unrelated sources into a single, unified system.

Building a knowledge graph involves setting up these entities and their relationships:

  • Entities are the primary subjects within the graph — whether people, organizations, places, or events — and each holds attributes relevant to that subject, like a "Person" entity with attributes of name, age, and occupation.
  • Relationships between entities — often called edges — show how these entities connect and interact, such as a "Person" node being linked to a "Company" node by a "works for" relationship.
  • Properties add additional context, or metadata like dates or locations, to entities and edges.
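
Here's a tiny sketch of those three pieces using networkx, one graph library among many; the entities, relationship, and attributes are made up for illustration:

```python
import networkx as nx

graph = nx.DiGraph()

# Entities (nodes) with attributes.
graph.add_node("Ada Lovelace", type="Person", occupation="Mathematician")
graph.add_node("Analytical Engine Co.", type="Company")

# A relationship (edge) with a property adding context.
graph.add_edge("Ada Lovelace", "Analytical Engine Co.",
               relation="works for", since=1843)

for subject, obj, attrs in graph.edges(data=True):
    print(f"{subject} --{attrs['relation']}--> {obj}")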

Traditionally, building knowledge graphs involved bringing together a wide range of disciplines to manually design ontologies, curate data, and develop algorithms for extracting entities and relationships, which required expertise in areas like data science, natural language processing, and semantic web technologies.

Today, you no longer need to be an expert in graph theory or taxonomies to build your own graph, especially when LLMs can help simplify entity recognition and relationship extraction.

We dive into key concepts and steps for getting started with knowledge graphs, and show you how to leverage an LLM to build a graph using Mirascope, our lightweight toolkit for developing AI-driven applications.

Top LLM Frameworks For AI Application Development

LLM frameworks provide tools and libraries for building and scaling language model applications. They handle everything from model integration to deployment — allowing you to focus on your app’s functionality without having to build everything from scratch.

Frameworks:

  • Offer prompt engineering and quality assurance tools for accurate and relevant responses.
  • Provide pre-built modules for common tasks like data preprocessing, model fine-tuning, and response generation.
  • Make it easy to integrate with other tools and platforms like Hugging Face Transformers, TensorFlow, or PyTorch without having to deal with complex APIs.
  • Orchestrate workflows to manage complex, multi-step processes like input validation, output formatting, and more.

In general, these frameworks should simplify tasks that would otherwise require lots of manual coding and multiple iterations.

But modern frameworks (like LangChain) impose their own unique abstractions, requiring you to do things their way. This not only feels limiting but also makes development and maintenance harder than they need to be.

For this reason, we developed Mirascope, a lightweight Python toolkit that provides building blocks for developing LLM-powered applications without unnecessary constraints.
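
To give a sense of what "building blocks" means in practice, here's a single LLM call sketched from Mirascope's documented decorator syntax (the exact API can vary by version):

```python
from mirascope.core import openai

# A plain Python function plus a decorator is the whole abstraction:
# the returned string becomes the prompt sent to the model.
@openai.call("gpt-4o-mini")
def recommend_book(genre: str) -> str:
    return f"Recommend a {genre} book."

print(recommend_book("science fiction").content)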

Below, we’ve curated a list of the top LLM frameworks and highlighted the strengths and purpose of each framework in the following categories:

Getting Started with LangChain RAG

LangChain is well suited for retrieval augmented generation (RAG) because it offers modules and abstractions for practically anything you want to do in terms of data ingestion, chunking, embedding, retrieval, and LLM interaction.

An ecosystem in itself, LangChain lets you build out complete workflows for developing LLM-powered applications. Some of its unique abstractions, however, entail steep learning curves and opaque structures that are hard to debug.

Take runnables, for instance, which LangChain relies on for chaining together steps to handle prompt flows:
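
A representative chain looks something like this (a reconstruction of commonly documented LangChain patterns; imports vary by version, and the retriever is stubbed so the sketch is self-contained):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Stub standing in for a real retriever, so the sketch runs as-is.
retriever = RunnableLambda(lambda q: "Refunds are issued within 14 days.")

prompt = ChatPromptTemplate.from_template(
    "Answer using this context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
print(chain.invoke("How do refunds work?"))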

Although RunnablePassthrough lets you pass data such as user inputs unchanged through a sequence of processing steps, it doesn’t indicate what’s going on under the hood, making it harder to find the source of errors.

That’s why you might not want to rely on LangChain’s modules for absolutely everything; after all, simpler, more transparent Python logic can be more efficient and easier to manage.

It’s also why we designed Mirascope, a lightweight toolkit for building agents directly in native Python, without needing to resort to unique, complex abstractions that complicate development and debugging.

Mirascope avoids the rigid, all-or-nothing approach you find in the big frameworks, and instead offers a flexible architecture that lets you select only the modules you need, while giving you full control over prompting, context management, and LLM interactions.

In this article, we show you how to build a simple RAG application using LangChain’s functionality for data ingestion, preprocessing, and storage. We then integrate this with Mirascope to simplify query and response flows.

Overview of LLM Evaluation Metrics and Approaches

Methods of evaluating LLMs originated from early techniques used to assess classical NLP models (like Hidden Markov Models and Support Vector Machines). However, the rise of Transformer-based LLMs required more sophisticated evaluations that focused on scalability, generative quality, and ethical considerations.

These evaluations (or “evals” as they’re also called) systematically measure and assess model outputs to ensure they perform effectively within their intended application or context.

This is necessary because language models can generate incorrect, biased, or harmful content, with real consequences depending on your use case. You’ll need to verify that outputs meet your quality standards and are factually correct.

The whole process of doing evals can be boiled down to two tasks:

  1. Coming up with good criteria against which to evaluate LLM outputs.
  2. Developing systems that reliably and consistently measure those outputs against your criteria.
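
To ground these two tasks, here's a minimal sketch in which the criterion lives in an explicit rubric (task 1) and a small scoring function applies it the same way to every output (task 2). The rubric, model name, and sample answers are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Task 1: the evaluation criterion, written down as an explicit rubric.
RUBRIC = (
    "Score the answer for factual correctness from 1 (wrong) to 5 "
    "(fully correct). Reply with the number only.\n\nAnswer: {answer}"
)

# Task 2: a repeatable measurement that applies the rubric consistently.
def score(answer: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": RUBRIC.format(answer=answer)}],
        temperature=0,  # reduce run-to-run variance in scoring
    )
    return int(resp.choices[0].message.content.strip())

outputs = ["Paris is the capital of France.", "Paris is the capital of Spain."]
print([score(o) for o in outputs])  # e.g. [5, 1]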