

Building an LLM Pipeline: Tools and Techniques

In the context of building applications on language models, an LLM pipeline refers to the stages of a data workflow that ensure data is properly sourced, preprocessed, and integrated to obtain the best model results.

Accurate model outputs can positively influence an application's performance and user experience, leading to greater satisfaction, trust, and usage of the application.

To get accurate model outputs, you generally do one of two things:

  • Fine-tune and train the model itself to better align with specific tasks and datasets (as part of LLMOps), or
  • Improve the quality of your prompts through an ongoing process of iteration and refinement.

As we’re very focused on improving and revising prompts in the context of our own LLM app development library, Mirascope, we believe crafting good prompts is a cost-effective way to get reliable model responses.
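The sourcing, preprocessing, and integration stages described above can be sketched as a small pipeline. Everything here is illustrative: the stage functions and the stubbed `call_llm` are hypothetical stand-ins, not a real API.

```python
# A minimal sketch of an LLM pipeline: source -> preprocess -> prompt -> model.

def source_data() -> list[str]:
    """Stage 1: pull raw documents from wherever they live."""
    return ["  The Quick Brown Fox...  ", "JUMPS OVER THE LAZY DOG"]

def preprocess(docs: list[str]) -> list[str]:
    """Stage 2: normalize whitespace and casing before prompting."""
    return [d.strip().lower() for d in docs]

def build_prompt(docs: list[str]) -> str:
    """Stage 3: integrate the cleaned data into a single prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Summarize the following notes:\n{context}"

def call_llm(prompt: str) -> str:
    """Stage 4: stub standing in for a real model call."""
    return f"[model response to {len(prompt)} chars of prompt]"

result = call_llm(build_prompt(preprocess(source_data())))
```

Each stage is a plain function, so any one of them can be swapped out (a different data source, a revised prompt template) without touching the rest of the pipeline.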

Does LangChain Suck? Our Thoughts on Maximizing Its Potential

Some developers feel LangChain is overly complex, mainly because:

  • It requires you to learn unique abstractions for tasks that are often more straightforward in native Python and JavaScript. These abstractions tend to obscure what's going on under the hood, making the code hard to follow and debug.
  • The library wasn’t designed with software developer best practices in mind, so the lack of modularity, consistency, and clear documentation results in tightly coupled code that’s difficult to maintain and extend.

This complexity leads some to believe the framework is only good for building prototypes rather than production-grade codebases.

12 LLM Tools to Help You Build LLM Applications

While you can build LLM applications using just the raw model APIs (we've done this a few times ourselves, before helper libraries existed), leveraging specialized tools tailored to your Large Language Model (LLM) workflows can reduce complexity and the risk of errors, letting you focus on developing the application itself.

LLM tools typically consist of software libraries, frameworks, and platforms that cater to different stages of the LLM lifecycle — from data preparation and prompt engineering to model fine-tuning, deployment, and monitoring. In general, they take away the grunt work by giving you pre-built utilities and workflows to save you time you’d normally spend on repetitive tasks and complex infrastructure setups.

So, whether you’re creating conversational chatbots, question-answering systems, or recommendation engines, having the right tools in your LLM stack will generally make your workflows more productive.

A Guide to Function Calling in OpenAI

In OpenAI function calling (or tools, as it’s now known), you send descriptions of functions to the LLM so that it can produce structured outputs in valid JSON that conform to a particular schema. You then use these outputs to call a function in your application, or an API endpoint.

What makes these structured outputs useful is they can be a part of an automated workflow that integrates multiple systems and services.

With tools, LLMs become agents with greater scope to act on your behalf, autonomously choosing which functions or external services to use. For example, a travel chatbot might generate calls for the application to search Bing or book a flight.
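To make the flight-booking example concrete, here is a sketch of the two halves involved: a tool description in the JSON-schema shape the OpenAI chat completions API accepts under its `tools` parameter, and an application-side dispatcher that routes the model's structured output to real code. The `book_flight` function and its parameters are invented for illustration; only the schema shape follows OpenAI's documented format.

```python
import json

# Hypothetical tool description in the format OpenAI's "tools" parameter expects.
tools = [{
    "type": "function",
    "function": {
        "name": "book_flight",
        "description": "Book a flight for the user.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. SFO"},
                "destination": {"type": "string", "description": "IATA code, e.g. JFK"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

def book_flight(origin: str, destination: str, date: str) -> str:
    """Application-side function the model's structured output dispatches to."""
    return f"Booked {origin} -> {destination} on {date}"

def dispatch(tool_call: dict) -> str:
    """Route a tool call (name + JSON-encoded arguments) to the matching function."""
    args = json.loads(tool_call["arguments"])
    return {"book_flight": book_flight}[tool_call["name"]](**args)

# Shape of what a model tool call might contain:
result = dispatch({
    "name": "book_flight",
    "arguments": '{"origin": "SFO", "destination": "JFK", "date": "2025-07-01"}',
})
```

The key point is that the model never executes anything itself: it emits a function name plus JSON arguments matching your schema, and your code decides what to actually run.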

Prompt Engineering Examples and Techniques

If there’s one certainty when it comes to prompt engineering, it’s this: the more you put into your prompts in terms of clarity, richness of context, and specificity of detail, the better the model outputs you’ll receive.

Prompt engineering is the process of structuring and refining the inputs you feed to the LLM to get the best outputs. And if you build LLM applications, getting the model to output the best results possible is essential to providing a good user experience.

This is why adherence to best practices is key when it comes to prompting. But to bridge the gap between best practice in theory and actual practice, we thought it useful to present a number of prompt engineering examples. These not only provide useful snippets for your own use cases, but also illustrate how best practices are actually applied.
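As a small illustration of the clarity-and-specificity point above, compare a vague prompt to a templated one that makes audience, tone, and length explicit. The template and its fields are hypothetical examples, not prescribed best practice.

```python
# Illustrative only: the same request as a vague prompt and a specific one.

vague = "Write about our product."

def specific_prompt(product: str, audience: str, tone: str, length_words: int) -> str:
    """State role, audience, tone, and length constraints explicitly."""
    return (
        f"You are a marketing copywriter. Write a {length_words}-word blurb "
        f"about {product} for {audience}. Use a {tone} tone and end with a "
        "single call to action."
    )

prompt = specific_prompt("a note-taking app", "busy graduate students", "friendly", 80)
```

The vague version forces the model to guess every unstated detail; the templated version pins them down, which is also what makes the prompt testable and reusable across inputs.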

Understanding LangChain Runnables

LangChain’s runnable is a protocol that allows you to create and invoke custom chains. It’s designed to sequence tasks, taking the output of one call and feeding it as input to the next, making it suitable for straightforward, linear tasks where each step directly builds upon the previous one.

Runnables simplify the process of building, managing, and modifying complex workflows by providing a standardized way for different components to interact. With a single function call, you can execute a chain of operations — which is useful for scenarios where the same series of steps need to be applied multiple times.
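The core idea can be shown in a few lines of plain Python. This is a from-scratch sketch of the pattern, not LangChain's actual implementation: each component exposes `invoke`, and the `|` operator composes components into a chain that runs with a single call (LangChain's real runnables work similarly but add batching, streaming, and async variants).

```python
# Minimal sketch of a "runnable" protocol: invoke() runs a step,
# and | pipes one step's output into the next.

class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other: "Runnable") -> "Runnable":
        # Compose: feed this step's output into the next step.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

strip = Runnable(str.strip)
shout = Runnable(str.upper)
exclaim = Runnable(lambda s: s + "!")

chain = strip | shout | exclaim   # one object representing three steps
chain.invoke("  hello world  ")   # a single call executes the whole chain
```

Because every component speaks the same tiny interface, you can reorder, swap, or reuse steps without rewriting the glue code between them, which is exactly the standardization benefit described above.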

Prompt Chaining in AI Development

Prompt chaining is a way to sequence LLM calls (and their prompts) by using the output of the last call as input to the next, to guide an LLM to produce more useful answers than if it had been prompted only once.

By treating the entire chain of calls and prompts as part of a larger request to arrive at an ultimate response, you’re able to refine and steer the intermediate calls and responses at each step to achieve a better result.

Prompt chaining allows you to manage what may start out as a large, unwieldy prompt, whose implicitly defined subtasks and details can throw off language models and result in unsatisfying responses. LLMs lose focus when asked to process different ideas thrown together: they can misread the relationships between instructions and execute them incompletely.
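A two-step chain makes the idea concrete: rather than one prompt that both extracts topics and summarizes, the first call's output feeds the second call's prompt. The `fake_llm` stub below is purely illustrative, standing in for a real model call so the control flow is visible.

```python
# Prompt chaining sketch: the output of call 1 becomes input to call 2.

def fake_llm(prompt: str) -> str:
    """Deterministic stub standing in for a real LLM call."""
    if prompt.startswith("List the key topics"):
        return "pricing, onboarding, support"
    return f"Summary covering: {prompt.split(':', 1)[1].strip()}"

def chained_summary(document: str) -> str:
    # Step 1: a narrow subtask -- identify topics only.
    topics = fake_llm(f"List the key topics in this document: {document}")
    # Step 2: feed step 1's output into a focused follow-up prompt.
    return fake_llm(f"Write a summary of the document focusing on: {topics}")

result = chained_summary("...customer feedback transcript...")
```

Each call now carries one clearly scoped instruction, and the intermediate output (`topics`) is a natural place to inspect, log, or correct the chain mid-flight.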

8 Prompt Engineering Best Practices and Techniques

Eliciting the best answers from Large Language Models (LLMs) begins with giving them clear instructions. That’s really the goal of prompt engineering: it’s to guide the model to respond in kind to your precise and specific inputs. Prompt engineering done right introduces predictability in the model’s outputs and saves you the effort of having to iterate excessively on your prompts.

In our experience, there are two key aspects to prompting an LLM effectively:

  1. The language employed should be unambiguous and contextually rich. The more an LLM understands exactly what you want, the better it’ll respond.

  2. Beyond the language used, good software developer practices such as version control, ensuring the quality of prompt inputs, writing clean code, and others, help maintain a structured and dependable approach.
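Point 2 above can be sketched in code: treat a prompt like any other software artifact, with a version identifier and validation of its inputs before they ever reach the model. The `ReviewPrompt` class and its rules are hypothetical, shown only to illustrate the practice.

```python
from dataclasses import dataclass

# Illustrative: a versioned prompt template that validates its inputs.

@dataclass
class ReviewPrompt:
    VERSION = "0.2.0"  # bump when the wording changes, like any versioned artifact
    code: str
    max_lines: int = 200

    def render(self) -> str:
        # Ensure prompt-input quality before calling the model.
        if not self.code.strip():
            raise ValueError("empty code snippet")
        if self.code.count("\n") + 1 > self.max_lines:
            raise ValueError("snippet too long; split it before prompting")
        return (
            f"You are a careful code reviewer (prompt v{self.VERSION}). "
            f"Review the following snippet and list concrete issues:\n{self.code}"
        )

prompt = ReviewPrompt(code="print('hi')").render()
```

Embedding the version in the rendered text also means every logged model response records exactly which prompt wording produced it, which makes regressions traceable.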

LlamaIndex vs LangChain vs Mirascope: An In-Depth Comparison

In the context of building Large Language Model (LLM) applications—and notably Retrieval Augmented Generation (RAG) applications—the consensus seems to be that:

  • LlamaIndex excels in scenarios requiring robust data ingestion and management.
  • LangChain is suitable for chaining LLM calls and for designing autonomous agents.

In truth, the functionalities of both frameworks often overlap. For instance, LangChain offers document loader classes for data ingestion, while LlamaIndex lets you build autonomous agents.

Comparing Prompt Flow vs LangChain vs Mirascope

Currently, LangChain is one of the most popular frameworks among developers of Large Language Model (LLM) applications, and for good reason: its library is rather expansive and covers many use cases.

But teams using LangChain also report that it:

  • Takes a while to catch up with new features of the language models, which means users must wait as well.
  • Is an opinionated framework, and as such, encourages developers to implement solutions in its way.
  • Requires developers to learn its unique abstractions for doing tasks that might be easier to accomplish in native Python or JavaScript. This is in contrast to other frameworks that may offer abstractions, but don’t require that users learn or use them.
  • Sometimes uses a large number of dependencies, even for comparatively simple tasks.