
Intermediate · 1-2 weeks

Building AI-Powered Applications

Build applications powered by LLMs, RAG, and AI agents using Claude Code, Cursor, and modern AI frameworks.

Last reviewed Feb 27, 2026

Overview

This cookbook covers adding AI capabilities to your applications — from simple chatbots to full RAG systems, AI assistants, and autonomous agents. Whether you're augmenting an existing product or building from scratch, you'll find a workflow for your use case.

Target audience: Developers who want to add AI features to existing or new apps

Expected outcome: A working app with integrated AI features — chatbot, RAG pipeline, or agent — deployed to production


What You'll Need

SDKs & Frameworks

  • Vercel AI SDK (streaming chat interfaces)
  • LlamaIndex + ChromaDB (RAG pipelines)
  • LangChain + LangGraph (agents with tools)
  • CrewAI or AutoGen (multi-agent systems)
  • Ollama (local / private models)


Choosing Your AI Architecture

| Use Case | Best Stack | Approx. Code Volume |
| --- | --- | --- |
| Simple chat interface | Vercel AI SDK | ~50 lines |
| RAG (retrieval-augmented generation) | LlamaIndex + ChromaDB | ~150 lines |
| Complex agents with tools | LangChain + LangGraph | ~300 lines |
| Multi-agent systems | CrewAI or AutoGen | ~200+ lines |
| Local / private AI | Ollama + LlamaIndex | ~100 lines |

Rule of thumb: Start with the simplest architecture that solves your problem. Multi-agent systems are powerful but add significant complexity. A well-prompted single LLM call often beats an elaborate agent chain.


Workflow 1: Building a Chatbot (Vercel AI SDK)

Step 1 — Install dependencies

npm install ai @ai-sdk/openai

Step 2 — Create an API route with streaming

// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });

  return result.toDataStreamResponse();
}

Step 3 — Build the UI with useChat()

// app/page.tsx
'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

Step 4 — Add provider-agnostic model switching

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';

// Switch providers with one line:
const model = process.env.AI_PROVIDER === 'anthropic'
  ? anthropic('claude-3-5-sonnet-20241022')
  : openai('gpt-4o');

Step 5 — Deploy to Vercel

npx vercel deploy

Add OPENAI_API_KEY (or equivalent) to your Vercel environment variables. The AI SDK's streaming responses work natively with Vercel's Edge Runtime.

Docs: Vercel AI SDK Quickstart


Workflow 2: Building a RAG Application (LlamaIndex)

Step 1 — Document ingestion and chunking

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Load documents
documents = SimpleDirectoryReader('./docs').load_data()

# Chunk into nodes
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
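
Before indexing, it's worth eyeballing the chunks. Each node is a TextNode carrying the chunk text plus source metadata:

# Sanity-check the chunking before spending money on embeddings
print(f'{len(documents)} documents -> {len(nodes)} nodes')
print(nodes[0].text[:200])   # first chunk's text
print(nodes[0].metadata)     # source metadata, e.g. file_name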

Step 2 — Embedding generation

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model='text-embedding-3-small')
# Or use a local model:
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-small-en-v1.5')
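
A quick sanity check that the embedding model is wired up; get_text_embedding returns a plain list of floats:

vec = embed_model.get_text_embedding('hello world')
print(len(vec))  # 1536 dimensions for text-embedding-3-small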

Step 3 — Vector store setup with ChromaDB persistence

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

chroma_client = chromadb.PersistentClient(path='./chroma_db')
collection = chroma_client.get_or_create_collection('my_docs')
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model
)
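
On subsequent runs you can reopen the persisted collection instead of re-embedding everything; a minimal sketch, assuming the same ./chroma_db path and the embed_model from Step 2:

# Reload the persisted index on a later run -- no re-embedding needed
chroma_client = chromadb.PersistentClient(path='./chroma_db')
collection = chroma_client.get_or_create_collection('my_docs')
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)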

Step 4 — Query engine creation

query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode='compact'
)
response = query_engine.query('What is the refund policy?')
print(response)
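
The response object also carries the chunks it retrieved, which is handy for showing citations alongside the answer:

# Each source node has a similarity score, source metadata, and text
for node in response.source_nodes:
    print(node.score, node.metadata.get('file_name'))
    print(node.text[:100])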

Step 5 — Multi-turn chat with memory

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=4096)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    verbose=True
)

response = chat_engine.chat('What is the refund policy?')
response2 = chat_engine.chat('Can I get a refund after 30 days?')  # uses memory

Step 6 — Production deployment considerations

  • Swap ChromaDB for Pinecone or Weaviate for scalability
  • Cache embeddings: don't re-embed unchanged documents
  • Use async query engines for concurrent requests (see the sketch below)
  • Monitor token usage with LangSmith or Arize Phoenix
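
For the async point above, LlamaIndex query engines expose aquery() as the async counterpart of query(); a minimal sketch, assuming the index from Step 3:

import asyncio

async def answer(question: str) -> str:
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = await query_engine.aquery(question)
    return str(response)

async def main() -> list[str]:
    # Answer several questions concurrently instead of serially
    return await asyncio.gather(
        answer('What is the refund policy?'),
        answer('How do I contact support?'),
    )

answers = asyncio.run(main())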

Docs: LlamaIndex RAG Guide


Workflow 3: Building an AI Agent (LangChain)

Step 1 — Define tools

import json

from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query and return results as JSON."""
    # Placeholder body: connect to your database, run the query,
    # and serialize the rows to JSON before returning
    return results_as_json

@tool
def call_api(endpoint: str, params: dict) -> str:
    """Call an internal REST API endpoint."""
    import requests
    # .json() returns a dict; tools must return strings, so serialize it
    return json.dumps(requests.get(endpoint, params=params).json())

search = DuckDuckGoSearchRun()
tools = [query_database, call_api, search]
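
Tools created with @tool can be unit-tested directly before handing them to an agent (the orders table here is a stand-in):

# Invoke tools directly -- the same interface the agent will use
print(query_database.invoke({'sql': 'SELECT count(*) FROM orders'}))
print(search.invoke('latest LangChain release'))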

Step 2 — Create agent with ReAct pattern

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = ChatAnthropic(model='claude-3-5-sonnet-20241022')
prompt = hub.pull('hwchase17/react')
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True
)
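
Run the agent with a single input dict; with verbose=True it prints each Thought / Action / Observation step of the ReAct loop:

result = agent_executor.invoke({'input': 'How many orders were placed last week?'})
print(result['output'])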

Step 3 — Add memory and conversation history

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=10,  # keep the last 10 exchanges
    return_messages=True
)
# The prompt must include a {chat_history} variable for memory to work;
# pull hub.pull('hwchase17/react-chat') instead of the plain react prompt
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory)

Step 4 — Implement guardrails and error handling

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,       # prevent infinite loops
    max_execution_time=30,   # 30-second timeout
    handle_parsing_errors=True,
    early_stopping_method='force'  # return a standard stop message at the limits
)

Step 5 — Deploy as API endpoint

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str
    session_id: str

@app.post('/agent')
async def run_agent(query: Query):
    # ainvoke avoids blocking the event loop on the agent's LLM calls
    result = await agent_executor.ainvoke({'input': query.question})
    return {'output': result['output']}
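
The session_id field is unused above; a hedged sketch of per-session memory that replaces the endpoint, reusing ConversationBufferWindowMemory from Step 3 (the in-process executors cache is a simplification, not a production session store):

from collections import defaultdict

def make_executor() -> AgentExecutor:
    # One memory, and therefore one executor, per chat session
    memory = ConversationBufferWindowMemory(
        memory_key='chat_history', k=10, return_messages=True
    )
    return AgentExecutor(agent=agent, tools=tools, memory=memory)

executors: defaultdict[str, AgentExecutor] = defaultdict(make_executor)

@app.post('/agent')
async def run_agent(query: Query):
    executor = executors[query.session_id]
    result = await executor.ainvoke({'input': query.question})
    return {'output': result['output']}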

Docs: LangChain Agents


5 Core Agentic Design Patterns

| Pattern | How It Works | Best For |
| --- | --- | --- |
| Tool Use | Agent decides which tools to call based on the task | External APIs, databases, web search |
| Reflection | Agent reviews and critiques its own output, then revises | Writing, code generation, analysis |
| Planning | Agent decomposes complex tasks into sub-steps before executing | Multi-step research, complex workflows |
| Multi-Agent | Specialized agents collaborate — orchestrator delegates to specialists | Large codebases, complex domains |
| ReAct | Alternates between Reason (think) and Act (tool call) steps | General-purpose agents, debugging |

Reference: Anthropic's Building Effective Agents — the foundational guide to practical agent design
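
To make the Reflection row concrete, here is a minimal, framework-free sketch in which llm.complete is a stand-in for any completion call:

# Reflection loop: draft, critique, revise
def reflect(task: str, rounds: int = 2) -> str:
    draft = llm.complete(f'Complete this task:\n{task}')
    for _ in range(rounds):
        critique = llm.complete(f'Critique this answer for errors and omissions:\n{draft}')
        draft = llm.complete(
            f'Task: {task}\nDraft: {draft}\nCritique: {critique}\n'
            'Rewrite the draft, fixing every issue the critique raises.'
        )
    return draft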


Adding AI to Existing Apps

Step 1 — Identify where AI adds genuine value

  • Good fits: freeform text input, summarization, classification, personalization
  • Poor fits: deterministic calculations, structured data lookup, tasks users can do in 2 clicks

Step 2 — Context management strategy

  • Only send relevant context (use retrieval, not a full data dump)
  • Summarize long conversation histories before they hit the context limit
  • Use token counting utilities to budget context

Step 3 — Caching and rate limiting
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(prompt: str) -> str:
    # lru_cache keys on the prompt string itself, so identical
    # prompts skip the API call entirely
    return llm.complete(prompt)

# Use semantic caching for similar (not just identical) queries
# with GPTCache or Redis -- see the sketch below
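
The core of that semantic-cache idea fits in a few lines of plain numpy; a hedged sketch in which embed is a stand-in for any embedding call that returns unit-normalized vectors:

import numpy as np

_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def semantic_lookup(query: str, threshold: float = 0.92) -> str | None:
    q = embed(query)  # stand-in: unit-normalized embedding vector
    for vec, answer in _cache:
        if float(np.dot(q, vec)) >= threshold:  # cosine similarity
            return answer
    return None

def semantic_store(query: str, answer: str) -> None:
    _cache.append((embed(query), answer))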

Step 4 — Intent classification for routing

# Route simple queries to cheap models, complex ones to powerful models.
# fast_llm, cheap_model, and powerful_model are placeholders for
# whichever client objects your stack provides.
def route_query(query: str) -> str:
    classification = fast_llm.classify(query, labels=['simple', 'complex'])
    if classification == 'simple':
        return cheap_model.complete(query)
    return powerful_model.complete(query)

Step 5 — Graceful fallbacks

try:
    response = ai_service.complete(prompt, timeout=10)
except (RateLimitError, TimeoutError, APIError) as error:
    # Serve something useful instead of a hard failure
    response = fallback_response_or_cached_result
    log_ai_failure(prompt, error)

Meta-Agent Patterns

Building agents that build other agents — orchestrators that dynamically spin up specialized sub-agents based on the task at hand.

| Framework | Approach | Use Case |
| --- | --- | --- |
| MetaGPT | Role-playing agents (PM, engineer, QA) collaborate to build software | Software development simulation |
| AutoGPT | Recursive task decomposition with persistent memory | Open-ended research and task completion |
| MetaAgent (FSM-based) | Finite state machine auto-designs agent graph from spec | Enterprise workflow automation |

Real enterprise ROI: According to McKinsey's 2024 AI survey, enterprises deploying agentic workflows report 20–40% productivity gains in knowledge work tasks. Anthropic's case studies show specific deployments achieving 3–10x throughput improvements in document processing.

LangChain vs LlamaIndex Decision Guide

| Feature | LangChain | LlamaIndex |
| --- | --- | --- |
| Primary strength | Agent orchestration, tool use, chains | RAG pipelines, document indexing |
| Learning curve | Steeper (more abstractions) | Gentler for RAG use cases |
| RAG quality | Good (via LangChain RAG) | Excellent (native focus) |
| Agent support | Excellent (LangGraph) | Good (via agent modules) |
| Community/ecosystem | Very large | Large and growing |
| Production observability | LangSmith (paid) | Arize Phoenix (free tier) |
| Best for | Complex agents, multi-step pipelines | Document Q&A, knowledge bases |

Decision rule: If your primary use case is document/knowledge retrieval → LlamaIndex. If you need agents with multiple tools and complex orchestration → LangChain + LangGraph. For simple chatbots, use neither — the Vercel AI SDK is sufficient.

Common Pitfalls

Pitfall 1: Sending all data with every request. Don't stuff your entire database into the system prompt. Use function calling / tool use to let the model retrieve only what it needs.

Pitfall 2: Not implementing caching. Identical or semantically similar queries are common. Semantic caching with GPTCache can cut API costs 30–60%.

Pitfall 3: Ignoring rate limits and costs. Set hard budget limits in your API dashboard. Use token estimation before sending requests. Log all usage from day one.

Pitfall 4: No fallback when AI fails. AI APIs have outages and rate limits. Always implement fallbacks — a cached response, a simplified answer, or a graceful error message.

Pitfall 5: Over-engineering. Don't start with a multi-agent system when a single well-prompted LLM call would work. Per Anthropic's agent design guidance, "the simplest solution is often the most reliable." Multi-agent setups add latency, cost, and debugging complexity.
