
Building Real-Time AI Chat with Pydantic AI and Django

How to make AI agents stream, persist messages, and use tools—all in a Django backend

Themba Mahlangu


I've been building AI chat applications for a while now. Most tutorials show you the basics and skip the hard parts: how do you actually persist messages? How do you handle streaming without blocking your server? How do you give agents tools that can modify your database?

This post walks through how we solved these problems. We built a production chat system using Pydantic AI, Django, and the Vercel AI SDK. I'll show you the key patterns that made it work.

The Stack at a Glance

Before we dive in, here's what we're working with:

  • Backend: Django 5 + Pydantic AI
  • Frontend: React with @ai-sdk/react
  • Protocol: Vercel AI Data Stream Protocol
  • Storage: PostgreSQL via Django ORM

The magic happens when these pieces work together. The backend streams AI responses. The frontend renders them in real-time. Messages get saved to the database. Tools can read and write data. Let's see how.

Part 1: The Streaming Handler

The heart of our system is a single async view. It handles everything: authentication, message history, tool execution, and streaming.

Here's the simplified flow:

import json

from django.http import StreamingHttpResponse
from pydantic_ai import Agent

async def handle_chat(request) -> StreamingHttpResponse:
    # 1. Parse the request
    body = json.loads(request.body)
    thread_id = body.get("threadId")
    user = await request.auser()  # async user lookup (Django 5+)
    
    # 2. Get or create a conversation thread
    thread = await get_or_create_thread(user, thread_id)
    
    # 3. Load message history from the database
    message_history = await load_messages_from_db(thread)
    
    # 4. Create the agent with tools
    agent = Agent(
        model="anthropic:claude-sonnet-4-20250514",
        system_prompt="You are a helpful assistant.",
        toolsets=[content_toolset],
    )
    
    # 5. Stream the response
    return StreamingHttpResponse(
        stream_response(agent, message_history),
        content_type="text/event-stream",
    )

Notice something important. We load history from the database, not from the frontend. The frontend sends messages. We persist them. Then we load all messages from our source of truth. This keeps things consistent.
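
Neither helper is shown in the post. Here's a minimal sketch of both, assuming the Thread and Message models and the deserialize_messages function from Part 3, plus Django 4.1+ async ORM methods:

async def get_or_create_thread(user, thread_id=None):
    # aget/acreate are Django's native async ORM methods (4.1+)
    if thread_id:
        return await Thread.objects.aget(id=thread_id, user=user)
    return await Thread.objects.acreate(user=user)

async def load_messages_from_db(thread):
    # Rows come back oldest-first; deserialize_messages (Part 3)
    # rebuilds Pydantic AI's native message objects
    rows = [m.content async for m in thread.message_set.order_by("created")]
    return deserialize_messages(rows)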

Part 2: The Vercel AI Adapter

Pydantic AI has a built-in adapter for the Vercel AI protocol. But we needed to customize it. The key issue? Message IDs.

The frontend needs stable IDs to track which message is being streamed. The default adapter generates new IDs for each text chunk. That breaks things.

Our fix is simple:

from pydantic_ai.ui.vercel_ai import VercelAIAdapter
from pydantic_ai.ui.vercel_ai._event_stream import VercelAIEventStream
# Chunk types ship with the Vercel AI integration; the exact module
# path may differ between pydantic_ai releases
from pydantic_ai.ui.vercel_ai.response_types import TextDeltaChunk, TextStartChunk

class CustomVercelAIEventStream(VercelAIEventStream):
    """Keep a stable message ID for the entire response."""
    
    async def handle_text_start(self, part, follows_text=False):
        if not follows_text:
            yield TextStartChunk(id=self.message_id)
        if part.content:
            yield TextDeltaChunk(id=self.message_id, delta=part.content)

One small change. Big difference in reliability.
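
For context, here's roughly what the frontend sees on the wire with the SDK's SSE-based stream protocol (the ids here are made up). The stable id is what ties the deltas to one message:

data: {"type": "text-start", "id": "msg_abc123"}
data: {"type": "text-delta", "id": "msg_abc123", "delta": "Hello"}
data: {"type": "text-delta", "id": "msg_abc123", "delta": " world"}
data: {"type": "text-end", "id": "msg_abc123"}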

Part 3: Storing Messages

Chat messages are tricky to store. You have user messages, assistant messages, tool calls, and tool results. They all have different shapes.

We use a single Message model with a JSONField for content:

class Message(models.Model):
    class Role(models.TextChoices):
        user = "user"
        assistant = "assistant"
        system = "system"
        tool = "tool"
    
    thread = models.ForeignKey("Thread", on_delete=models.CASCADE)
    role = models.CharField(max_length=20, choices=Role.choices)
    content = models.JSONField(default=list)  # Stores message parts
    metadata = models.JSONField(default=dict)
    created = models.DateTimeField(auto_now_add=True)

The content field stores Pydantic AI's native message format. This is key. We don't try to flatten or transform it. We just serialize it directly:

import json

from pydantic_ai.messages import ModelMessagesTypeAdapter

def serialize_message(msg):
    """Convert a Pydantic AI message to a JSON-safe dict."""
    return json.loads(ModelMessagesTypeAdapter.dump_json([msg]))[0]

def deserialize_messages(data):
    """Convert JSON back to Pydantic AI messages."""
    return ModelMessagesTypeAdapter.validate_python(data)

When a chat completes, we persist the new messages:

async def on_complete(result):
    for model_msg in result.new_messages():
        role = "assistant" if model_msg.kind == "response" else "user"
        await thread.add_message(
            role=role,
            content=serialize_message(model_msg),
        )
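
The thread.add_message helper isn't defined anywhere above. A minimal version, assuming the Message model from earlier and Django 4.1+ async ORM support:

from django.conf import settings
from django.db import models

class Thread(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    created = models.DateTimeField(auto_now_add=True)

    async def add_message(self, role, content, **metadata):
        # acreate is the async counterpart of create (Django 4.1+)
        return await self.message_set.acreate(
            role=role, content=content, metadata=metadata
        )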

Load them back before the next request. The agent sees the full conversation.
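
The stream_response generator from Part 1 is never defined either. A stripped-down sketch: the production version encodes chunks through the Vercel adapter from Part 2, and here the user prompt is passed in explicitly (in the real handler it comes out of the request body):

async def stream_response(agent, message_history, user_prompt):
    # run_stream opens an async context over the streamed run
    async with agent.run_stream(
        user_prompt, message_history=message_history
    ) as result:
        # stream_text(delta=True) yields incremental text chunks
        async for delta in result.stream_text(delta=True):
            yield delta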

Part 4: Adding Tools

Tools are where things get fun. You can give your agent access to your Django models. It can query data, create records, update content.

We use Pydantic AI's FunctionToolset with a twist. Our SafeFunctionToolset catches errors gracefully:

from pydantic_ai.toolsets import FunctionToolset

class SafeFunctionToolset(FunctionToolset):
    """A toolset that returns errors instead of crashing."""
    
    async def call_tool(self, name, tool_args, ctx, tool):
        try:
            return await super().call_tool(name, tool_args, ctx, tool)
        except Exception as e:
            # Return error as data, not an exception
            return {
                "error": str(e),
                "error_type": type(e).__name__,
                "message": f"Tool '{name}' failed: {e}",
            }

Why does this matter? If a tool throws an exception, the whole stream dies. The user sees an error. Bad experience.

With SafeFunctionToolset, the agent sees the error. It can explain what went wrong. It can try a different approach. Much better.

Here's how you define a tool:

content_toolset = SafeFunctionToolset()

@content_toolset.tool
def list_pages(
    page_type: str | None = None,
    limit: int = 50,
) -> dict:
    """List pages with optional filtering.
    
    Args:
        page_type: Filter by type (blog, docs, etc.)
        limit: Max pages to return
        
    Returns:
        Dict with pages list and pagination info
    """
    qs = Page.objects.all()
    if page_type:
        qs = qs.filter(page_type=page_type)
    
    pages = qs[:limit]
    return {
        "total": qs.count(),
        "pages": [{"id": str(p.id), "title": p.title} for p in pages]
    }

The docstring becomes the tool description. Pydantic AI parses the type hints, so the agent knows exactly how to call it. And since the tool is a plain sync function, Pydantic AI runs it in a worker thread, which keeps blocking ORM calls off the event loop.
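
Write tools follow the same pattern. A hypothetical example (update_page_title isn't part of our toolset, it just shows the shape):

@content_toolset.tool
def update_page_title(page_id: str, title: str) -> dict:
    """Update a page's title.
    
    Args:
        page_id: ID of the page to update
        title: The new title
    """
    page = Page.objects.get(id=page_id)
    page.title = title
    page.save(update_fields=["title"])
    return {"id": str(page.id), "title": page.title}

If the page doesn't exist, Page.DoesNotExist bubbles up and SafeFunctionToolset turns it into an error payload the agent can read.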

Part 5: The Frontend

On the frontend, we use the Vercel AI SDK's useChat hook. It handles streaming, message state, and error handling.

import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

function ChatView({ threadId }) {
  const transport = new DefaultChatTransport({
    api: "/api/v1/ai/chat/stream",
    credentials: "include",
    body: () => ({
      threadId,
      model: "claude-sonnet-4-20250514",
    }),
  });
  
  const { messages, sendMessage, status } = useChat({ transport });
  
  return (
    <div>
      {messages.map((msg) => (
        <Message key={msg.id} message={msg} />
      ))}
      <ChatInput 
        onSubmit={sendMessage} 
        disabled={status === "streaming"} 
      />
    </div>
  );
}

The transport object configures how requests go to your backend. We include credentials for session auth. We pass the thread ID and model in the body.
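
ChatInput is left undefined above. A bare-bones sketch; the one detail worth noting is that sendMessage in AI SDK v5 takes an object like { text }:

import { useState } from "react";

function ChatInput({ onSubmit, disabled }) {
  const [text, setText] = useState("");
  return (
    <form
      onSubmit={(e) => {
        e.preventDefault();
        if (text.trim()) onSubmit({ text });
        setText("");
      }}
    >
      <input
        value={text}
        onChange={(e) => setText(e.target.value)}
        disabled={disabled}
        placeholder="Ask something..."
      />
    </form>
  );
}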

Part 6: Handling Events

The backend can send custom events during streaming. We use this for run status updates.

def encode_data_part(part_type: str, data: dict) -> str:
    """Encode a custom data part for the Vercel AI stream."""
    payload = {"type": part_type, "data": data}
    # The v5 UI message stream is server-sent events:
    # one JSON object per "data:" line
    return f"data: {json.dumps(payload)}\n\n"

We emit events at key moments:

# When streaming starts
yield encode_data_part("data-thread_status", {
    "threadId": str(thread.id),
    "runStatus": "running",
})

# When streaming completes
yield encode_data_part("data-thread_status", {
    "threadId": str(thread.id),
    "runStatus": "complete",
})

The frontend consumes these in onData:

useChat({
  transport,
  onData: (dataPart) => {
    // onData fires once per custom data part
    if (dataPart.type === "data-thread_status") {
      updateThreadStatus(dataPart.data.threadId, dataPart.data.runStatus);
    }
  },
});

Now the UI can show spinners, update sidebar badges, or trigger refreshes. All in sync with the backend.

Part 7: Async + Django = Careful

One gotcha that tripped us up: Django's async support is still evolving.

Database queries in async views need sync_to_async. Middleware that sets request.tenant might not work in async contexts. Session data might not be available.

Our pattern:

from asgiref.sync import sync_to_async

async def handle_chat(request):
    # Explicitly grab tenant from session
    tenant_id = await sync_to_async(
        lambda: request.session.get("tenant_id")
    )()
    
    # Wrap all DB queries
    thread = await sync_to_async(Thread.objects.get)(
        id=thread_id,
        tenant_id=tenant_id,
    )

Verbose? Yes. Reliable? Also yes.
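
Worth knowing: Django has been growing native async ORM methods since 4.1, which trims the wrapper in places:

# Equivalent to the sync_to_async-wrapped query above (Django 4.1+)
thread = await Thread.objects.aget(id=thread_id, tenant_id=tenant_id)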

Bringing It Together

Here's what we built:

  1. Streaming handler that loads history and runs the agent
  2. Custom adapter for stable message IDs
  3. Message storage using Pydantic AI's native format
  4. Safe toolsets that handle errors gracefully
  5. Frontend hooks that consume the stream
  6. Event system for status updates

It's not magic. It's just careful plumbing between well-designed libraries.

The code in this post is simplified from our production system. The full implementation handles more edge cases: partial message recovery on errors, tool approval workflows, file attachments, multiple toolsets, and more.

But the patterns are the same. Store messages in their native format. Stream with stable IDs. Catch tool errors. Send status events. Keep the frontend and backend in sync.

If you're building AI into a Django app, I hope this helps. The pieces fit together well once you know how they connect.


Built with Pydantic AI, Django, and the Vercel AI SDK.