Text Classification¶
In this recipe we'll explore using Mirascope to implement binary classification, multi-class classification, and various other extensions of these classification techniques, specifically using Python and the OpenAI API. We will also compare these solutions with more traditional machine learning and Natural Language Processing (NLP) techniques.
Mirascope Concepts Used
Background
Text classification is a fundamental NLP task that involves categorizing text documents into predefined classes or categories. Historically, this has required training text classifiers using more traditional machine learning methods. Large Language Models (LLMs) have revolutionized this field, making sophisticated classification tasks accessible through simple API calls and thoughtful prompt engineering.
Setup¶
Let's start by installing Mirascope and its dependencies:
!pip install "mirascope[openai]"
import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
# Set the appropriate API key for the provider you're using
Binary Classification: Spam Detection¶
Binary classification involves categorizing text into one of two classes. We'll demonstrate this by creating a spam detector that classifies text as either spam or not spam.
For binary classification, we can extract a boolean value by setting response_model=bool and prompting the model to classify the text:
from mirascope.core import openai, prompt_template
@openai.call("gpt-4o-mini", response_model=bool)
def classify_spam(text: str) -> str:
    return f"Classify the following text as spam or not spam: {text}"
text = "Would you like to buy some cheap viagra?"
label = classify_spam(text)
assert label is True # This text is classified as spam
text = "Hi! It was great meeting you today. Let's stay in touch!"
label = classify_spam(text)
assert label is False # This text is classified as not spam
Multi-Class Classification: Sentiment Analysis¶
Multi-class classification extends the concept to scenarios where we need to categorize text into one of several classes. We'll demonstrate this with a sentiment analysis task.
First, we define an Enum to represent our sentiment labels:
from enum import Enum
class Sentiment(Enum):
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"
Then, we set response_model=Sentiment to analyze the sentiment of the given text:
from mirascope.core import openai
@openai.call("gpt-4o-mini", response_model=Sentiment)
def classify_sentiment(text: str) -> str:
    return f"Classify the sentiment of the following text: {text}"
text = "I hate this product. It's terrible."
label = classify_sentiment(text)
assert label == Sentiment.NEGATIVE
text = "I don't feel strongly about this product."
label = classify_sentiment(text)
assert label == Sentiment.NEUTRAL
text = "I love this product. It's amazing!"
label = classify_sentiment(text)
assert label == Sentiment.POSITIVE
Classification with Reasoning¶
So far we've demonstrated using simple types like bool and Enum for classification, but we can extend this approach using Pydantic's BaseModel class to extract additional information beyond just the classification label.
For example, we can gain insight into the LLM's reasoning for the classified label simply by including a reasoning field in our response model and updating the prompt:
from enum import Enum
from mirascope.core import openai, prompt_template
from pydantic import BaseModel
class Sentiment(Enum):
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


class SentimentWithReasoning(BaseModel):
    reasoning: str
    sentiment: Sentiment
@openai.call("gpt-4o-mini", response_model=SentimentWithReasoning)
@prompt_template(
    """
    Classify the sentiment of the following text: {text}.
    Explain your reasoning for the classified sentiment.
    """
)
def classify_sentiment_with_reasoning(text: str): ...
text = "I would recommend this product if it were cheaper..."
response = classify_sentiment_with_reasoning(text)
print(f"Sentiment: {response.sentiment}")
print(f"Reasoning: {response.reasoning}")
Sentiment: Sentiment.NEUTRAL
Reasoning: The text expresses a positive sentiment towards the product because the speaker is willing to recommend it. However, the mention of 'if it were cheaper' introduces a condition that makes the overall sentiment appear somewhat negative, as it suggests dissatisfaction with the current price. Therefore, the sentiment can be classified as neutral, as it acknowledges both a positive recommendation but also a negative aspect regarding pricing.
Handling Uncertainty¶
When dealing with LLMs for classification tasks, it's important to account for cases where the model might be uncertain about its prediction. We can modify our approach to include a certainty score and handle cases where the model's confidence is below a certain threshold.
from pydantic import BaseModel, Field


class SentimentWithCertainty(BaseModel):
    sentiment: Sentiment
    reasoning: str
    certainty: float = Field(..., ge=0, le=1)
@openai.call("gpt-4o-mini", response_model=SentimentWithCertainty)
@prompt_template(
    """
    Classify the sentiment of the following text: {text}
    Explain your reasoning for the classified sentiment.
    Also provide a certainty score between 0 and 1, where 1 is absolute certainty.
    """
)
def classify_sentiment_with_certainty(text: str): ...
text = "This is the best product ever. And the worst."
response = classify_sentiment_with_certainty(text)
if response.certainty > 0.8:
    print(f"Sentiment: {response.sentiment}")
    print(f"Reasoning: {response.reasoning}")
    print(f"Certainty: {response.certainty}")
else:
    print("The model is not certain enough about the classification.")
The model is not certain enough about the classification.
Additional Real-World Applications
- Content Moderation: Classify user-generated content as appropriate, inappropriate, or requiring manual review.
- Customer Support Triage: Categorize incoming support tickets by urgency or department (see the sketch after this list).
- News Article Categorization: Classify news articles into topics (e.g. politics, sports, technology).
- Intent Recognition: Identify user intent in chatbot interactions (e.g. making a purchase, asking for help).
- Email Classification: Sort emails into categories like personal, work-related, promotional, or urgent.
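As an illustration of adapting this recipe, here is a minimal sketch of a support-ticket triage classifier built with the same Enum pattern shown above; the Department labels and the example ticket are hypothetical placeholders for your own categories and data:

from enum import Enum

from mirascope.core import openai


class Department(Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    SALES = "sales"


# Same pattern as classify_sentiment above, just with domain-specific labels.
@openai.call("gpt-4o-mini", response_model=Department)
def classify_ticket(ticket: str) -> str:
    return f"Classify which department should handle the following support ticket: {ticket}"


ticket = "I was charged twice for my subscription this month."
print(classify_ticket(ticket))  # Likely Department.BILLING for this ticket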
When adapting this recipe to your specific use-case, consider the following:
- Refine your prompts to provide clear instructions and relevant context for your specific classification task.
- Experiment with different model providers and versions to balance accuracy and speed.
- Implement error handling and fallback mechanisms for cases where the model's classification is uncertain (see the sketch after this list).
- Consider using a combination of classifiers for more complex categorization tasks.
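As one possible approach, the sketch below wraps the classify_sentiment_with_certainty function from the previous section in a helper that returns None when the call fails or the reported certainty falls below a threshold; the 0.8 cutoff and the None fallback are illustrative choices, not requirements:

# A minimal fallback sketch around classify_sentiment_with_certainty (defined above).
def classify_with_fallback(text: str, threshold: float = 0.8) -> Sentiment | None:
    try:
        response = classify_sentiment_with_certainty(text)
    except Exception:
        # Provider errors or failed response validation fall through to manual review.
        return None
    if response.certainty >= threshold:
        return response.sentiment
    return None  # Low certainty: route the text to a human reviewer instead.


label = classify_with_fallback("This is the best product ever. And the worst.")
print(label)  # Likely None for such an ambiguous text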
Comparison with Traditional Machine Learning Models¶
Training text classification models requires a much more involved workflow:
- Preprocessing:
- Read in data, clean and standardize it, and split it into training, validation, and test datasets
- Feature Extraction:
- Basic: bag of words, TF-IDF
- Advanced: word embeddings, contextual embeddings
- Classification Algorithm / Machine Learning Algorithm:
- Basic: Naive Bayes, logistic regression, linear classifiers
- Advanced: Neural networks, transformers (e.g. BERT)
- Model Training:
- Train on training data and validate on validation data, adjusting batch size and epochs.
- Things like activation layers and optimizers can greatly impact the quality of the final trained model
- Model Evaluation:
- Evaluate model quality on the test dataset using metrics such as F1-score, recall, precision, accuracy — whichever metric best suits your use-case
Many frameworks such as TensorFlow and PyTorch make implementing such workflows easier, but it is still far more involved than the approach we showed at the beginning using Mirascope.
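To make the contrast concrete, here is a minimal sketch of that traditional workflow using scikit-learn (assumed to be installed), with a tiny toy dataset standing in for real training data:

# Toy example of the traditional workflow: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "Buy cheap meds now",  # spam
    "Limited time offer, click here",  # spam
    "Great meeting you today",  # not spam
    "See you at the standup tomorrow",  # not spam
]
labels = [1, 1, 0, 0]

# Preprocessing: split into train and test sets (a validation split is omitted for brevity).
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)

# Feature extraction and classification algorithm combined in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)  # Model training

# Model evaluation on the held-out test set.
print(f1_score(y_test, model.predict(X_test)))

Even this toy pipeline needs labeled training data, and changing the set of categories means relabeling and retraining, whereas the Mirascope approach above only requires editing a prompt or response model.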
If you’re interested in taking a deeper dive into this more traditional approach, the TensorFlow IMDB Text Classification tutorial is a great place to start.