
Tips & Inspiration

How to Build a Knowledge Graph from Unstructured Information

A knowledge graph is a structured representation of interconnected information where entities are linked through defined relationships.

Knowledge graphs show you which entities are connected and how they’re related, and are most useful for structuring and giving context to unstructured data (like text, images, and audio), allowing you to:

  • Visualize subtle (or hidden) patterns or insights that might not be immediately apparent in traditional data formats.
  • Get accurate and context-aware search results by better connecting related entities and concepts.
  • Bring data together from multiple, often unrelated sources into a single, unified system.

Building a knowledge graph involves setting up these entities and their relationships:

  • Entities are the primary subjects within the graph — whether people, organizations, places, or events — and each holds attributes relevant to that subject, like a "Person" entity with attributes of name, age, and occupation.
  • Relationships between entities — often called edges — show how these entities connect and interact, such as a "Person" node being linked to a "Company" node by a "works for" relationship.
  • Properties attach additional context or metadata, such as dates or locations, to entities and edges.
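
For example, here's a minimal sketch of these building blocks using the networkx library; the entities, relationships, and properties are invented for illustration:

```python
import networkx as nx

graph = nx.DiGraph()

# Entities (nodes) with their attributes
graph.add_node("Ada Lovelace", type="Person", occupation="Mathematician")
graph.add_node("Charles Babbage", type="Person", occupation="Inventor")
graph.add_node("Analytical Engine", type="Machine")

# Relationships (edges) with properties that add context
graph.add_edge("Charles Babbage", "Analytical Engine", relation="designed", year=1837)
graph.add_edge("Ada Lovelace", "Analytical Engine", relation="wrote notes on", year=1843)
graph.add_edge("Ada Lovelace", "Charles Babbage", relation="collaborated with")

# Traverse the graph: how is Ada Lovelace connected to other entities?
for _, target, data in graph.out_edges("Ada Lovelace", data=True):
    print(f"Ada Lovelace --[{data['relation']}]--> {target}")
```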

Traditionally, building a knowledge graph meant bringing together a wide range of disciplines to manually design ontologies, curate data, and develop algorithms for extracting entities and relationships, which required expertise in areas like data science, natural language processing, and semantic web technologies.

Today, you no longer need to be an expert in graph theory or taxonomies to build your own graph, especially when LLMs can help simplify entity recognition and relationship extraction.

We dive into key concepts and steps for getting started with knowledge graphs, and show you how to leverage an LLM to build a graph using Mirascope, our lightweight toolkit for developing AI-driven applications.

Top LLM Frameworks For AI Application Development

LLM frameworks provide tools and libraries for building and scaling language model applications. They handle everything from model integration to deployment — allowing you to focus on your app’s functionality without having to build everything from scratch.

Frameworks:

  • Offer prompt engineering and quality assurance tools for accurate and relevant responses.
  • Provide pre-built modules for common tasks like data preprocessing, model fine-tuning, and response generation.
  • Make it easy to integrate with other tools and platforms like Hugging Face Transformers, TensorFlow, or PyTorch without having to deal with complex APIs.
  • Orchestrate workflows to manage complex, multi-step processes like input validation, output formatting, and more.

In general, these frameworks should simplify tasks that would otherwise require lots of manual coding and multiple iterations.

But modern frameworks (like LangChain) impose their own unique abstractions, requiring you to do things their way. This not only feels limiting but also makes development and maintenance harder than they need to be.

For this reason, we developed Mirascope, a lightweight Python toolkit that provides building blocks for developing LLM-powered applications without unnecessary constraints.

Below, we’ve curated a list of the top LLM frameworks, highlighting the strengths and purpose of each.

Getting Started with LangChain RAG

LangChain is well suited for retrieval augmented generation (RAG) because it offers modules and abstractions for practically anything you want to do in terms of data ingestion, chunking, embedding, retrieval, and LLM interaction.

An ecosystem in itself, LangChain lets you build out complete workflows for developing LLM-powered applications. Some of its unique abstractions, however, entail steep learning curves and opaque structures that are hard to debug.

Take runnables, for instance, which LangChain relies on for chaining together steps to handle prompt flows:
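
Here's a minimal sketch of such a chain (the prompt, model name, and stand-in context function are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using this context:\n{context}\n\nQuestion: {question}"
)

# RunnablePassthrough forwards the user's question unchanged into the prompt,
# while a plain function (standing in for a real retriever) supplies the context.
chain = (
    {
        "context": lambda question: "Mirascope is a lightweight Python toolkit.",
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

answer = chain.invoke("What is Mirascope?")
```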

Although RunnablePassthrough lets you pass data such as user inputs unchanged through a sequence of processing steps, it doesn’t indicate what’s going on under the hood, making it harder to find the source of errors.

That’s why you might not want to rely on LangChain’s modules for absolutely everything; after all, simpler, more transparent Python logic can be more efficient and easier to manage.

So we designed Mirascope, a lightweight toolkit for building agents directly in native Python, without needing to resort to unique, complex abstractions that complicate development and debugging.

Mirascope avoids the rigid, all-or-nothing approach you find in the big frameworks, and instead offers a flexible architecture that lets you select only the modules you need, while giving you full control over prompting, context management, and LLM interactions.

In this article, we show you how to build a simple RAG application using LangChain’s functionality for data ingestion, preprocessing, and storage. We then integrate this with Mirascope to simplify query and response flows.

Overview of LLM Evaluation Metrics and Approaches

Methods of evaluating LLMs originated from early techniques used to assess classical NLP models (like Hidden Markov Models and Support Vector Machines). However, the rise of Transformer-based LLMs required more sophisticated evaluations that focused on scalability, generative quality, and ethical considerations.

These evaluations (or “evals” as they’re also called) systematically measure and assess model outputs to ensure they perform effectively within their intended application or context.

This is necessary because language models can generate incorrect, biased, or harmful content, which can have serious consequences depending on your use case. So you’ll need to verify that outputs meet your quality standards and are factually correct.

The whole process of doing evals can be boiled down to two tasks:

  1. Coming up with good criteria against which to evaluate LLM outputs.
  2. Developing systems that reliably and consistently measure those outputs against your criteria.
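
As a rough sketch of how these two tasks translate into code, the criteria below are deliberately simple stand-ins (a production eval might use an LLM judge or statistical metrics instead):

```python
# Task 1: define criteria to evaluate LLM outputs against.
def is_grounded(output: str, reference: str) -> bool:
    """Naive groundedness check: every sentence should appear in the reference.
    A real eval might use an LLM judge or an entailment model instead."""
    sentences = [s.strip().lower() for s in output.split(".") if s.strip()]
    return all(s in reference.lower() for s in sentences)

def is_concise(output: str, max_words: int = 100) -> bool:
    """Responses should stay within a word budget."""
    return len(output.split()) <= max_words

# Task 2: a system that reliably measures outputs against those criteria.
def evaluate(outputs: list[str], reference: str) -> dict[str, float]:
    criteria = {
        "grounded": lambda o: is_grounded(o, reference),
        "concise": is_concise,
    }
    # Report the pass rate for each criterion across all outputs.
    return {
        name: sum(check(o) for o in outputs) / len(outputs)
        for name, check in criteria.items()
    }
```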

A Guide to Advanced Prompt Engineering

Advanced prompt engineering, which usually involves techniques such as few-shot learning, multi-step reasoning, and other sophisticated prompting methods, enriches prompts with greater context and guidance to get better answers from the language model.

This is in contrast to basic prompting where you give the LLM simpler, direct instructions.

The more complex the task, the greater the need to structure prompts to guide the model with reasoning steps, external information, and specific examples.
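
For instance, a few-shot prompt that also asks for explicit reasoning might look like this (the reviews and labels are invented for illustration):

```python
# A basic prompt gives the model a bare instruction:
basic_prompt = "Classify the sentiment of: 'The battery died after an hour.'"

# An engineered prompt adds examples (few-shot) and asks for reasoning steps:
advanced_prompt = """Classify the sentiment of customer reviews as positive or negative.
Think through the reasoning first, then give the label.

Review: "Setup took five minutes and it just works."
Reasoning: The reviewer highlights ease of setup and reliability.
Sentiment: positive

Review: "Support never replied to my ticket."
Reasoning: The reviewer describes an unresolved problem with support.
Sentiment: negative

Review: "The battery died after an hour."
Reasoning:"""
```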

Done right, advanced prompt engineering techniques let you do some pretty impressive things, for instance:

  • Not only write about Ernest Hemingway but write like Hemingway.
  • Generate product descriptions solely from e-commerce images, automatically extracting features like color, size, and material.
  • Fact-check a debate or conversation in real time by allowing agents to search multiple web sources and present the most accurate information with citations.
  • Troubleshoot internal IT issues using a bot that identifies common issues and suggests solutions from the company’s knowledge base.

Comparing Prompt Engineering vs Fine-Tuning

Prompt engineering is about refining and iterating on inputs to get a desired output from a language model, while fine-tuning retrains a model on a specific dataset to get better performance out of it.

Both are ways to improve LLM results, but they’re very different in terms of approach and level of effort.

Prompt engineering, which involves creating specific instructions or queries to guide the model's output, is versatile and can be effective for a wide range of tasks, from simple to complex.

LangChain Structured Output: A Guide to Tools and Methods

The most popular LangChain tools for getting structured outputs are:

  • .with_structured_output, a method that uses a schema (like a JSON Schema or Pydantic model) to guide the structure and format of a language model’s response.
  • PydanticOutputParser, which parses raw LLM text into a Pydantic object to extract key information.
  • StructuredOutputParser, which extracts information from LLM responses according to a schema like a Python dictionary or JSON schema.

These tools modify or guide LLM responses to simplify further processing by other systems or applications.

For example, if you needed to extract a JSON object containing fields for "name," "date," and "location" out of the model’s response, you could use StructuredOutputParser to ensure the model’s output adheres to that specific schema.
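
Here's a minimal sketch of that same extraction, using .with_structured_output with a Pydantic schema rather than StructuredOutputParser (the model name and example text are placeholders):

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class Event(BaseModel):
    name: str
    date: str
    location: str

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(Event)

# The response is parsed and validated into an Event instance.
event = structured_llm.invoke(
    "The annual developer summit takes place in Austin on March 14, 2025."
)
print(event.name, event.date, event.location)
```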

LLM Prompt: Examples and Best Practices

An LLM prompt is an instruction you give a language model to guide it to a desired response, and can be anything from a simple question to an input spanning multiple calls.

For simple queries, a straightforward, one-liner prompt will often get you the response you want, but we recommend prompt engineering for anything more complex or when your initial prompts don’t achieve your desired results.

This article explores examples and techniques for effective prompt writing, including:

  • The role of prompts in dialoguing with language models
  • Prompt examples and their use cases
  • Advanced techniques for prompting
  • Challenges and limitations with prompt engineering

LLM Applications: What They Are and 6 Examples

Large language models (LLMs) have exploded in popularity since the release of ChatGPT in late 2022, which brought a surge of interest in the development community in building LLM applications across various domains like healthcare, finance, entertainment, and beyond.

Developers are integrating models into tools for content generation, customer service automation, personalized learning, and data analysis, among many others, driving innovation in areas where natural language understanding and generation excel.

As the range of applications continues to grow, we decided to explore six of the most popular and impactful use cases that are driving adoption of LLM applications today.

We also explain our top considerations for designing such apps, and show you how to implement an example question-answering chatbot using both Llama Index, a framework commonly used for ingesting and managing data, and Mirascope, our lightweight tool for building LLM applications.

A Guide to LLM Orchestration

LLM orchestration manages and coordinates interactions and workflows to improve the performance and effectiveness of LLM-driven applications.

Central to this is the large language model and its responses, which are non-deterministic and typically unstructured. This makes it challenging for other components of the application to use those responses.

But these challenges can be overcome through careful application design on the one hand and effective orchestration on the other.
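
As a rough sketch of what that can look like in code, one common orchestration pattern is to validate the model's raw output into a typed structure and retry when validation fails; the schema and the stubbed LLM call below are illustrative:

```python
import json

from pydantic import BaseModel, ValidationError

class Summary(BaseModel):
    title: str
    key_points: list[str]

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g., via Mirascope or a provider SDK).
    return '{"title": "Example", "key_points": ["point one", "point two"]}'

def summarize(text: str, max_retries: int = 3) -> Summary:
    prompt = (
        "Summarize the text below as JSON with keys 'title' and 'key_points'.\n\n"
        + text
    )
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            # Turn the unstructured response into a validated object that the
            # rest of the application can rely on.
            return Summary.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError):
            continue  # output failed validation; ask again
    raise RuntimeError("Could not get a valid structured response.")
```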