
I'm Building Useful Multi-Step AI Agents for My Workflow

📖 10 min read · 1,962 words · Updated Mar 27, 2026

Hey everyone, Nina here from agntbox.com! Hope you’re all having a productive week. Today, I’m diving into something that’s been buzzing around my workflow a lot lately: AI agents. Specifically, I want to talk about how we can make these agents not just smart, but truly useful in a way that goes beyond a single, isolated task. We’re going to explore a framework for building multi-step, stateful AI agents.

I know, I know. “Framework” can sound a bit dry, but stick with me. This isn’t about some enterprise-level, complex system. It’s about a practical approach I’ve been refining to get my agents to do more than just answer a query or summarize a document. I want them to remember what we talked about, follow up on tasks, and even adapt their behavior based on ongoing interactions. Think of it less as a rigid structure and more as a mental model for designing better AI companions.

The problem I kept running into with simpler agent setups was context loss. I’d ask an agent to do X, and it would do X. Then I’d ask it to do Y, which depended on X, and it would often act like we’d never discussed X. It was like talking to someone with short-term memory loss. Frustrating, right?

The Challenge: Beyond Single-Shot Interactions

My initial forays into building AI agents often followed a straightforward pattern: user input -> LLM call -> agent output. This works great for simple tasks, like generating a quick email draft or finding a specific piece of information. But what if the task requires several steps, where each step’s outcome influences the next? What if the agent needs to remember preferences, past actions, or ongoing conversations?

Let’s say I want an agent to help me manage my blog’s content calendar. A simple agent might generate a blog post idea. But I want it to:

  • Suggest topics based on recent trends (which it needs to research).
  • Draft an outline for a chosen topic.
  • Critique my draft based on specific guidelines (e.g., SEO, tone).
  • Suggest improvements and then track if I’ve implemented them.
  • Remind me about upcoming deadlines.

That’s not a single-shot interaction. That’s a conversation, a collaboration. And that’s where the idea for a more structured approach started to crystallize.

Introducing the “Task-State-Tool” Framework

After a lot of trial and error, I’ve landed on a conceptual framework I’m calling “Task-State-Tool.” It’s not a library or a specific piece of software (though you can certainly build it with existing tools). It’s a way of thinking about agent design that makes them more capable and persistent.

1. Tasks: Defining the Agent’s Purpose

Every agent needs a clear purpose. Instead of just a prompt, I think of this as a “Task Definition.” This defines what the agent is supposed to achieve over a potentially long interaction. It’s the agent’s north star.

For my content calendar agent, a high-level task might be: “Assist Nina in managing her blog’s content creation process, from idea generation to publication, ensuring quality and timely delivery.”

Underneath this main task, there are sub-tasks. These are smaller, discrete actions the agent can take. For example:

  • GenerateTopicIdeas
  • OutlineBlogPost
  • ReviewDraft
  • TrackProgress
  • SendReminder

Each sub-task needs its own clear objective and expected output.
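One lightweight way to pin down those objectives and outputs is to write the sub-tasks down as data. Here's a minimal sketch; the `SubTask` class and the exact objective strings are my own illustration, not part of any library:

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    """A discrete action the agent can take, with a clear objective and expected output."""
    name: str
    objective: str
    expected_output: str

# Hypothetical sub-task registry for the content calendar agent
SUB_TASKS = [
    SubTask("GenerateTopicIdeas", "Suggest blog topics based on recent trends", "list of topic strings"),
    SubTask("OutlineBlogPost", "Draft an outline for a chosen topic", "structured outline"),
    SubTask("ReviewDraft", "Critique a draft against SEO and tone guidelines", "feedback notes"),
    SubTask("TrackProgress", "Check whether suggested revisions were implemented", "status report"),
    SubTask("SendReminder", "Notify about upcoming deadlines", "reminder message"),
]
```

Writing them out like this also gives you something concrete to feed into the LLM's system prompt later.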

2. State: The Agent’s Memory and Context

This is where the magic happens for persistence. The “State” is essentially the agent’s working memory. It’s a structured data store that holds all the information the agent needs to remember across interactions. This isn’t just the raw chat history; it’s parsed, organized information that’s relevant to the ongoing task.

Think of it as a dictionary or a JSON object that gets updated after every significant action. For our content agent, the state might include:

  • current_blog_post_id: The ID of the post currently being worked on.
  • topic_suggestions: A list of previously suggested topics.
  • selected_topic: The topic Nina chose.
  • outline_generated: Boolean, true if an outline exists.
  • draft_status: “pending,” “reviewed,” “revisions_needed.”
  • deadlines: A dictionary of post IDs to their due dates.
  • user_preferences: Nina’s preferred tone, length, SEO keywords.

Updating the state isn’t just about appending new messages. It often involves the LLM parsing the conversation and extracting key entities, decisions, or changes in status. This structured state is what allows the agent to pick up exactly where it left off, even if I close my browser and come back later.

Here’s a simplified Python example of how a state object might look and be updated:


class AgentState:
    def __init__(self):
        self.data = {
            "current_blog_post_id": None,
            "topic_suggestions": [],
            "selected_topic": None,
            "outline_generated": False,
            "draft_status": "not_started",  # "not_started", "pending", "reviewed", "revisions_needed", "published"
            "deadlines": {},  # {post_id: date_str}
            "user_preferences": {
                "tone": "conversational",
                "length": "1000-1500 words",
                "keywords": ["AI tools", "tech blogging"]
            }
        }

    def update(self, key, value):
        if key in self.data:
            self.data[key] = value
            print(f"State updated: {key} = {value}")
        else:
            print(f"Warning: Attempted to update non-existent state key: {key}")

    def get(self, key):
        return self.data.get(key)

# Example usage:
my_state = AgentState()
print(my_state.get("draft_status"))  # Output: not_started

# Imagine the LLM parses "I want to work on a post about LLM frameworks" and identifies a new post.
my_state.update("current_blog_post_id", "post_123")
my_state.update("selected_topic", "LLM Frameworks")
my_state.update("draft_status", "pending")

print(my_state.get("selected_topic"))  # Output: LLM Frameworks

3. Tools: The Agent’s Capabilities

Tools are the agent’s hands and feet. These are discrete functions or APIs that the agent can call to perform actions in the real world (or its simulated world). The LLM acts as the brain, deciding which tool to use and when, based on the current user input and the agent’s internal state.

For our content agent, tools might include:

  • SearchInternet(query): To research recent trends or specific facts.
  • GenerateOutline(topic, preferences): Takes a topic and user preferences, returns an outline.
  • CritiqueText(text, guidelines): Takes a draft and guidelines, returns feedback.
  • SaveDocument(content, post_id): Saves generated content to a database or file.
  • SetReminder(post_id, date): Integrates with a calendar API.
  • FetchPostContent(post_id): Retrieves previous content.

The crucial part here is that the LLM needs to be explicitly told about these tools, including their names, descriptions, and expected parameters. This is often done via function calling mechanisms available in modern LLMs (like OpenAI’s function calling or Google’s Gemini tool calling).

Here’s a simplified Python example of defining a tool for an LLM:


def get_current_weather(location: str):
    """Get the current weather in a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA
    """
    # In a real scenario, this would call a weather API
    if "london" in location.lower():
        return {"temperature": "10 Celsius", "forecast": "cloudy"}
    elif "new york" in location.lower():
        return {"temperature": "50 Fahrenheit", "forecast": "sunny"}
    else:
        return {"temperature": "unknown", "forecast": "unavailable"}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

# When an LLM decides to call this tool, it would return something like:
# {"name": "get_current_weather", "arguments": {"location": "London, UK"}}
# Your code then executes get_current_weather("London, UK")
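That last step, routing the LLM's tool call to an actual function, is easy to get wrong if you hard-code it per tool. Here's a minimal dispatch sketch; the `TOOL_REGISTRY` and `execute_tool_call` names are my own, and the snippet includes a stub weather function so it runs on its own:

```python
def get_current_weather(location: str):
    """Stub of the weather tool, just enough to make dispatch runnable."""
    if "london" in location.lower():
        return {"temperature": "10 Celsius", "forecast": "cloudy"}
    return {"temperature": "unknown", "forecast": "unavailable"}

# Map tool names (as the LLM knows them) to their Python implementations.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def execute_tool_call(tool_call: dict):
    """Dispatch an LLM tool call of the form
    {"name": "...", "arguments": {...}} to the matching function."""
    func = TOOL_REGISTRY.get(tool_call["name"])
    if func is None:
        # Surface unknown tools as data the LLM can react to, not a crash.
        return {"error": f"Unknown tool: {tool_call['name']}"}
    return func(**tool_call["arguments"])

result = execute_tool_call(
    {"name": "get_current_weather", "arguments": {"location": "London, UK"}}
)
print(result)  # {'temperature': '10 Celsius', 'forecast': 'cloudy'}
```

Keeping dispatch generic like this means adding a new tool is just one more entry in the registry plus its schema in the `tools` list.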

How It All Connects: The Agent’s Loop

So, how do these three pieces work together? It’s a continuous loop:

  1. User Input: Nina sends a message to the agent.
  2. Context Building: The agent takes the user input, the current AgentState, and the Task Definition. It combines these to create a comprehensive prompt for the LLM.
  3. LLM Decision: The LLM processes the prompt. Based on the task, state, and available tools, it decides:
    • What to say back to Nina.
    • Which tool(s) to call (if any), and with what arguments.
    • How to update the AgentState.
  4. Tool Execution (Optional): If the LLM decided to call a tool, the agent executes it. The result of the tool’s execution is then fed back into the loop.
  5. State Update: The AgentState is updated based on the LLM’s decision and/or the tool’s output.
  6. Agent Output: The agent responds to Nina, potentially asking for clarification or confirming an action.
  7. Repeat: The loop continues with Nina’s next input.

This loop allows the agent to maintain context, perform multi-step operations, and even correct itself if an action doesn’t go as planned. It’s a much more robust way to build truly helpful AI assistants.
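The seven steps above can be sketched as one function per turn. Everything here is a toy stand-in: `llm_decide` fakes the LLM call, and the decision format (a dict with `tool_call`, `state_updates`, and `reply`) is an assumption of mine, not any provider's API:

```python
def run_agent_turn(user_input, state, task_definition, llm_decide, tool_registry):
    """One pass through the Task-State-Tool loop (sketch)."""
    # Steps 1-2: build context from input, state, and the task definition.
    context = {"task": task_definition, "state": dict(state), "input": user_input}

    # Step 3: ask the LLM what to do.
    decision = llm_decide(context)

    # Step 4: execute a tool if the LLM requested one.
    tool_result = None
    if decision.get("tool_call"):
        call = decision["tool_call"]
        tool_result = tool_registry[call["name"]](**call["arguments"])

    # Step 5: apply the LLM's state updates.
    state.update(decision.get("state_updates", {}))

    # Step 6: respond; step 7 is the caller invoking this again.
    return decision["reply"], tool_result

# Toy stand-ins to show the flow end to end:
def fake_llm(context):
    return {
        "tool_call": {"name": "echo", "arguments": {"text": context["input"]}},
        "state_updates": {"last_input": context["input"]},
        "reply": "Done!",
    }

state = {}
reply, result = run_agent_turn("hello", state, "demo task", fake_llm,
                               {"echo": lambda text: text.upper()})
print(reply, result, state)  # Done! HELLO {'last_input': 'hello'}
```

In a real agent, `fake_llm` becomes a call to your model of choice with function calling enabled, but the shape of the loop stays the same.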

A Practical Example: “Outline My Next Post”

Let’s walk through a simple scenario with my blog content agent:

Nina: “Hey, I need an outline for my next post about the Task-State-Tool framework. Keep it around 1500 words, conversational tone, and make sure to include practical examples.”

  1. User Input: “Outline for Task-State-Tool framework post…”
  2. Context Building: Agent sees the current AgentState (which might be empty or have previous interactions). It combines this with the user’s request and its primary task of content management.
  3. LLM Decision: The LLM, seeing the phrase “outline for my next post” and knowing the GenerateOutline tool, decides to call it. It extracts “Task-State-Tool framework” as the topic, “1500 words” for length, “conversational” for tone, and notes “practical examples.” It also identifies this as a new post and suggests an update to current_blog_post_id and selected_topic in the state.
  4. Tool Execution: The agent calls GenerateOutline(topic="Task-State-Tool framework", length="1500 words", tone="conversational", keywords=["practical examples"]).
  5. State Update: The agent updates its state:
    • current_blog_post_id: “post_124” (a newly generated ID)
    • selected_topic: “Task-State-Tool framework”
    • outline_generated: True
    • draft_status: “pending_outline_review”

    The outline returned by the tool is also stored, perhaps in a separate content store linked by post_124.

  6. Agent Output: “Okay, Nina. I’ve generated an outline for your post on the ‘Task-State-Tool framework,’ aiming for about 1500 words and a conversational tone with practical examples. I’ve saved it as ‘post_124’. Would you like me to display it here, or perhaps suggest some sub-sections?”

Now, if I respond with “Show me the outline,” the agent doesn’t need to re-understand the topic or my preferences. It simply retrieves the stored outline associated with post_124 from its state and displays it. This is the power of stateful agents.
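To make that follow-up concrete, here's a sketch of what the retrieval looks like; the separate `content_store` keyed by post ID (with the state holding only the ID) is the layout I described above, and the outline text is made up for the example:

```python
# Hypothetical content store; the state only holds the ID, not the content itself.
content_store = {
    "post_124": "1. Intro\n2. Tasks\n3. State\n4. Tools\n5. The loop\n6. Wrap-up"
}

state = {
    "current_blog_post_id": "post_124",
    "selected_topic": "Task-State-Tool framework",
    "outline_generated": True,
    "draft_status": "pending_outline_review",
}

def show_outline(state, content_store):
    """A follow-up like 'Show me the outline' needs no re-parsing of
    topic or preferences: just read the ID from state and look it up."""
    post_id = state["current_blog_post_id"]
    return content_store[post_id]

print(show_outline(state, content_store))
```

No LLM call is strictly needed for this turn at all, which is a nice side effect of keeping structured state around.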

Actionable Takeaways for Your Own Agents

If you’re looking to build more capable and persistent AI agents, here’s what I recommend based on my experiences with the Task-State-Tool framework:

  1. Start with a Clear Task Definition: Before writing any code, clearly define what you want your agent to achieve over its entire lifecycle. Break it down into sub-tasks.
  2. Design Your State Schema Early: Think about all the pieces of information your agent will need to remember. What are the key entities, statuses, and user preferences? Structure this as a dictionary or a simple class.
  3. Identify Necessary Tools: What external actions does your agent need to take? Map these to specific functions or API calls. Make sure your LLM can understand how to call them (descriptions and parameters are key).
  4. Embrace the Loop: Understand that agent interaction isn’t a single call. It’s a continuous process of input, decision, action, and state update.
  5. Iterate and Refine State Updates: This is often the trickiest part. How does your LLM reliably extract information from user input to update the state? You might need a separate, smaller LLM call just for state parsing, or careful prompt engineering.
  6. Don’t Over-Engineer: Start simple. You don’t need a complex database for your state initially. A JSON file or an in-memory dictionary can work for prototypes. Scale up as your agent’s complexity grows.
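On that last point, persisting state to a JSON file really is just a few lines. A minimal sketch, assuming a file path of my choosing and the dict-shaped state from earlier:

```python
import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")  # hypothetical location for the prototype

def save_state(state: dict):
    """Serialize the whole state dict to disk after each turn."""
    STATE_FILE.write_text(json.dumps(state, indent=2))

def load_state() -> dict:
    """Restore state on startup, or fall back to a fresh default."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"draft_status": "not_started"}

save_state({"draft_status": "pending", "selected_topic": "LLM Frameworks"})
restored = load_state()
print(restored["selected_topic"])  # LLM Frameworks
STATE_FILE.unlink()  # clean up the demo file
```

The one caveat: everything in the state must be JSON-serializable, so keep it to strings, numbers, booleans, lists, and dicts until you genuinely need more.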

Building agents with this kind of persistent memory and capability moves us beyond simple chatbots to truly intelligent assistants. It’s a journey, and there will be bumps, but the results are incredibly rewarding. Give it a try, and let me know what you build!

Until next time, keep experimenting!

Written by Jake Chen

Software reviewer and AI tool expert. Independently tests and benchmarks AI products. No sponsored reviews — ever.
