LLM Chaining: Techniques and Best Practices
LLM chaining is a technique in artificial intelligence for connecting multiple LLMs, or their outputs, to other applications, tools, and services in order to improve responses or accomplish complex tasks.
Chaining lets applications do things like work with multiple files, refine content iteratively, and improve responses. It also overcomes some inherent limitations of LLMs themselves:
- They generally accept only a limited amount of information per prompt (even as “context windows” keep growing), so pairing an LLM with a service that splits long documents into chunks and feeds these to the model over several calls can be very useful.
- They remember only what’s been said within a given conversation, not across conversations, unless you store memory or state externally.
- They still generally only output their answers as text (e.g., prose, JSON, SQL code). But what if your application needs very specific outputs like a validated CSV file, flow chart, or knowledge graph?
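The first limitation above — feeding a long document to a model over several calls — can be sketched in a few lines. The `call_llm` function below is a hypothetical stand-in for any real model API client; the chunking and fan-out logic is the point:

```python
# Sketch: split a long document into chunks that fit a model's context
# window, summarize each chunk in a separate call, then combine results.
# `call_llm` is a hypothetical placeholder for a real LLM API call.

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into pieces no longer than max_chars, on word boundaries."""
    words, chunks, current = text.split(), [], ""
    for word in words:
        if len(current) + len(word) + 1 > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an OpenAI or Anthropic client)."""
    return f"[summary of {len(prompt)} chars]"

def summarize_long_document(document: str) -> str:
    # Map: summarize each chunk independently with its own model call.
    partials = [call_llm(f"Summarize:\n{chunk}") for chunk in chunk_text(document)]
    # Reduce: combine the partial summaries in one final call.
    return call_llm("Combine these summaries:\n" + "\n".join(partials))
```

This map-reduce pattern is the same idea that orchestration frameworks package up for you; the chunk size would be tuned to the target model's actual context window.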
This is why LLM chains are useful: they let you build sophisticated applications like LLM agents and retrieval-augmented generation (RAG) systems, which typically include:
- An input processing step, like preparing and formatting data that’ll be sent to the LLM — which often involves a prompt template.
- APIs enabling interaction with both LLMs and external services and applications.
- Language model output processing, such as parsing, validating, and formatting.
- Data retrieval from external data sources, such as fetching relevant embeddings from a vector database to enhance contextual understanding in LangChain RAG applications.
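The four components above can be wired together in a minimal sketch. Here `retrieve_context` and `call_llm` are hypothetical stand-ins for a real vector-database query and a real model API; the structure of the chain — retrieval, templating, the call, then parsing and validation — is what matters:

```python
import json

def retrieve_context(query: str) -> list[str]:
    """Stand-in for a vector-database lookup (e.g., an embeddings search)."""
    return ["LLM chaining connects model calls to tools and services."]

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; here it just returns a JSON string."""
    return json.dumps({"answer": "Chaining links LLM calls together.", "sources": 1})

PROMPT_TEMPLATE = """Answer using only the context below. Reply as JSON
with keys "answer" and "sources".

Context:
{context}

Question: {question}"""

def run_chain(question: str) -> dict:
    # 1. Data retrieval: fetch relevant context from an external source.
    context = "\n".join(retrieve_context(question))
    # 2. Input processing: fill the prompt template.
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # 3. Call the model (via its API).
    raw = call_llm(prompt)
    # 4. Output processing: parse and validate the response.
    parsed = json.loads(raw)
    if "answer" not in parsed:
        raise ValueError("model response missing required 'answer' field")
    return parsed
```

Each step is a plain function, which is exactly why chains compose well — any step can be swapped for a real client, a different parser, or another model without touching the rest.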
But putting together an LLM chain isn’t always straightforward, which is why orchestration frameworks like LangChain and LlamaIndex exist, though these have their own inefficiencies that we’ll discuss later on.
It’s for that reason we designed Mirascope, our developer-friendly, pythonic toolkit, to overcome the shortcomings of modern frameworks.
Mirascope works together with Lilypad, our open source prompt observability platform that allows software developers to version, trace, and optimize language model calls, treating these as non-deterministic functions.
Below, we dive into techniques for chaining and discuss what to look for in an LLM chaining framework.