{ "cells": [ { "cell_type": "markdown", "id": "613ff8b3800647ea", "metadata": {}, "source": [ "# Document Segmentation\n", "\n", "In this recipe, we go over how to do semantic document segmentation. Topics and themes of articles can frequently be dispersed across multiple sections or even separate files. We will be using OpenAI GPT-4o-mini to break down an article into topics and themes.\n", "\n", "
Mirascope Concepts Used
\n", "Background
\n", "\n", "Traditional machine learning techniques often relied on handcrafted features, such as detecting paragraph breaks, identifying section headers, or using statistical measures of text coherence. While effective for well-structured documents, these approaches often struggled with more complex or inconsistently formatted texts. LLMs have revolutionized document segmentation by enabling more flexible and context-aware parsing of text, regardless of formatting or structure.\n", "
\n", "Additional Real-World Applications
\n", "