Prompt Inversion > Blog > Choosing the Right Agentic Framework I

March 17, 2025

Choosing the Right Agentic Framework I

We dissect six of the most popular agentic frameworks: LangChain’s LangGraph, Microsoft’s AutoGen, Pydantic’s PydanticAI, CrewAI, OpenAI’s Swarm, and Hugging Face’s Smolagents.

AI Agents are having their breakthrough moment. At their core, agents are systems that combine models, tools, and human input to solve complex tasks autonomously. From OpenAI’s Operator and Deep Research to Google’s AI co-scientist, the biggest AI players kicked off 2025 by betting big on agentic technology. This rise of agents-as-products represents one of the most practical and profitable use cases for AI.

A wave of new frameworks for building agentic workflows has emerged, each with its own strengths and trade-offs. With so many options, choosing the right one can be tricky. In this two-part blog post we dissect six of the most popular agentic frameworks—LangChain’s LangGraph, Microsoft’s AutoGen, Pydantic’s PydanticAI, CrewAI, OpenAI’s Swarm, and Hugging Face’s Smolagents—across five factors—message passing, state management, tool calling, quality of documentation, and ease of use. In this first part, we’ll walk through the setup of our agentic workflow and define some key terminology that will guide our framework comparison.

To ensure a fair comparison, we implemented the same multi-agent spam classification system across all six frameworks. The system integrates a fine-tuned BERT spam classifier, GPT-4 for independent reasoning, and a human feedback loop for retraining BERT. Here’s how our multi-agent workflow operates:

Multi-Agent Spam Classification Workflow


1. Input Agent: Captures user input that contains the message to be classified and passes it to the BERT Agent.

2. BERT Agent: Uses a fine-tuned BERT model to predict whether the input message is spam or not. Passes the input message and the prediction to the GPT Agent.

3. GPT Agent: Evaluates BERT’s prediction using GPT-4, providing agreement or disagreement with a brief explanation.

  1. If GPT agrees with BERT’s prediction, then the message, BERT’s prediction and GPT’s explanation are passed to the final Output Agent.
  2. If GPT disagrees, then the message, BERT’s prediction and GPT’s explanation are passed to the Human Feedback Agent, initiating a retraining loop.

4. Human Feedback Agent: Prompts the human-in-the-loop to provide the correct classification label for the message based on their judgment and passes it to the Retrain Agent.

5. Retrain Agent: Retrains BERT with the new classification label obtained from human feedback and passes the message back to the BERT Agent for a fresh prediction.

6. Output Agent: Presents the final prediction, GPT’s reasoning, and details on human feedback and retraining to the user.

This system represents a self-improving workflow where GPT-4’s reasoning and human feedback refine BERT’s predictions over time. It is a complex, multi-agent setup in which each agent handles a specific task while collaborating with the other agents through precise message passing, state management and tool calling.
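The six-agent loop above can be sketched in plain, framework-agnostic Python. Here `bert_classify`, `gpt_evaluate`, and `retrain_bert` are hypothetical stand-ins for the real model calls (the keyword matching is a toy substitute for actual classification), but the control flow mirrors the workflow: classify, review, and either output or retrain and re-classify.

```python
# Toy "model weights": retraining just records a per-message override.
TRAINING_OVERRIDES: dict[str, str] = {}

def bert_classify(message: str) -> str:
    """Hypothetical stand-in for the fine-tuned BERT classifier."""
    if message in TRAINING_OVERRIDES:               # reflects retraining
        return TRAINING_OVERRIDES[message]
    return "spam" if "win a prize" in message.lower() else "not spam"

def gpt_evaluate(message: str, prediction: str) -> tuple[bool, str]:
    """Hypothetical stand-in for the GPT-4 reviewer: (agrees, explanation)."""
    looks_spammy = "prize" in message.lower() or "free" in message.lower()
    expected = "spam" if looks_spammy else "not spam"
    if prediction == expected:
        return True, "I agree with BERT's prediction."
    return False, f"I disagree; this message looks like {expected}."

def retrain_bert(message: str, label: str) -> None:
    """Toy 'retraining': store the human-provided label."""
    TRAINING_OVERRIDES[message] = label

def run_workflow(message: str, get_label) -> dict:
    """Input Agent -> BERT -> GPT -> (Output | Human Feedback -> Retrain)."""
    while True:
        prediction = bert_classify(message)                       # BERT Agent
        agrees, explanation = gpt_evaluate(message, prediction)   # GPT Agent
        if agrees:                                                # Output Agent
            return {"message": message,
                    "prediction": prediction,
                    "gpt_explanation": explanation}
        label = get_label(message)          # Human Feedback Agent
        retrain_bert(message, label)        # Retrain Agent, then loop to BERT
```

In a real system the retraining step would fine-tune BERT on the corrected example; the loop structure, however, is exactly what each framework below has to express.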

Defining some key agentic terminology:

Message Passing: This refers to how agents communicate with each other and pass data between them. Frameworks like AutoGen, Swarm, and LangGraph all refer to this as handoffs. Consistent handoff logic is critical to ensure the right data gets to the right agent at the right time.

Example: In our spam classification workflow, the BERT Agent passes its prediction and the input message to the GPT Agent for analysis.
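That handoff can be sketched as one agent bundling everything the next agent needs into a single payload. The function names and dict keys here are illustrative, not from any particular framework:

```python
def bert_agent(user_message: str) -> dict:
    """Produce the handoff payload for the GPT Agent."""
    prediction = "spam"  # placeholder for the fine-tuned BERT call
    # The handoff carries both the original message and the prediction,
    # so the next agent never has to re-fetch earlier data.
    return {"message": user_message, "bert_prediction": prediction}

def gpt_agent(handoff: dict) -> dict:
    """Read the handoff and append GPT's analysis to it."""
    analysis = f"Reviewing prediction '{handoff['bert_prediction']}'..."
    return {**handoff, "gpt_analysis": analysis}

result = gpt_agent(bert_agent("Claim your free gift now!"))
```

Frameworks differ mainly in who routes this payload: in Swarm an agent returns the next agent directly, while in LangGraph edges in the graph decide where the payload flows.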

State Management: This is about how the system tracks and updates the data (state) being passed between agents. Different frameworks handle data differently—LangGraph calls it state, PydanticAI refers to it as dependencies, Swarm uses context variables and CrewAI calls it expected output. Effective state management ensures the workflow’s data remains consistent and accessible across agents.

Example: The Output Agent receives all the data from previous agents, including the input message, BERT’s prediction, GPT’s analysis, and conditionally, the human feedback and retraining data.

Tools: Tools are functions that agents can use to call external libraries, APIs, or custom code. They help separate out logic that doesn’t need to be AI-powered, making agent behavior more deterministic and reliable.

Example: A custom tool checks if GPT’s response disagrees with BERT’s prediction by searching for the word “disagree” in GPT’s explanation.
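That disagreement check is a good illustration of why tools matter: it is plain Python with no model call, so its behavior is fully deterministic. A minimal sketch:

```python
def check_disagreement(gpt_explanation: str) -> bool:
    """Return True if GPT's explanation signals disagreement with BERT."""
    return "disagree" in gpt_explanation.lower()

check_disagreement("I disagree: this message is clearly spam.")  # True
check_disagreement("I agree with BERT's prediction.")            # False
```

Registered as a tool, this function lets the workflow branch on GPT's verdict without asking the model to also decide the routing.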

In the next part of this blog post, we’ll dive into a detailed comparison of LangGraph, AutoGen, PydanticAI, CrewAI, Swarm, and Smolagents—evaluating their strengths and weaknesses to determine which framework is best suited for specific use cases.
