Intermediate · 1-2 weeks
Building AI-Powered Applications
Build applications powered by LLMs, RAG, and AI agents using Claude Code, Cursor, and modern AI frameworks.
Last reviewed Feb 27, 2026
Overview
This cookbook covers adding AI capabilities to your applications — from simple chatbots to full RAG systems, AI assistants, and autonomous agents. Whether you're augmenting an existing product or building from scratch, you'll find a workflow for your use case.
Target audience: Developers who want to add AI features to existing or new apps
Expected outcome: A working app with integrated AI features — chatbot, RAG pipeline, or agent — deployed to production
What You'll Need
SDKs & Frameworks
- Vercel AI SDK — streaming chat UI + provider abstraction
- LangChain — agent orchestration, tool use, chains
- LlamaIndex — RAG pipelines, document ingestion, query engines
- CrewAI / AutoGen — multi-agent systems
- Ollama — local model inference
API Keys
- OpenAI or Anthropic API key
Vector Databases
- ChromaDB — local/embedded, great for dev
- Pinecone — managed, production-grade
- Weaviate — open-source, hybrid search
Dev Tools
- Claude Code or Cursor for AI-assisted development
Choosing Your AI Architecture
| Use Case | Best Stack | Approx. Code Volume |
|---|---|---|
| Simple chat interface | Vercel AI SDK | ~50 lines |
| RAG (retrieval-augmented generation) | LlamaIndex • ChromaDB | ~150 lines |
| Complex agents with tools | LangChain • LangGraph | ~300 lines |
| Multi-agent systems | CrewAI or AutoGen | ~200+ lines |
| Local / private AI | Ollama • LlamaIndex | ~100 lines |
Rule of thumb: Start with the simplest architecture that solves your problem. Multi-agent systems are powerful but add significant complexity. A well-prompted single LLM call often beats an elaborate agent chain.
Workflow 1: Building a Chatbot (Vercel AI SDK)
Step 1 — Install dependencies
```bash
npm install ai @ai-sdk/openai
```
Step 2 — Create an API route with streaming
```typescript
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });
  return result.toDataStreamResponse();
}
```
Step 3 — Build the UI with useChat()
```tsx
// app/page.tsx
'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```
Step 4 — Add provider-agnostic model switching
```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';

// Switch providers with one line:
const model = process.env.AI_PROVIDER === 'anthropic'
  ? anthropic('claude-3-5-sonnet-20241022')
  : openai('gpt-4o');
```
Step 5 — Deploy to Vercel
```bash
npx vercel deploy
```
Add OPENAI_API_KEY (or equivalent) to your Vercel environment variables. The AI SDK's streaming responses work natively with Vercel's Edge Runtime.
Docs: Vercel AI SDK Quickstart
Workflow 2: Building a RAG Application (LlamaIndex)
Step 1 — Document ingestion and chunking
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Load documents
documents = SimpleDirectoryReader('./docs').load_data()

# Chunk into nodes
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
```
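For intuition about what `chunk_size` and `chunk_overlap` do, here is a naive character-based splitter — an illustration only, not how SentenceSplitter works internally (it respects sentence boundaries and counts tokens, not characters):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Naive character-based chunking: each chunk repeats the last
    `chunk_overlap` characters of the previous one, so context that
    straddles a boundary appears in both chunks."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1000 chars with 400-char chunks and 100-char overlap -> chunks start every 300 chars
chunks = chunk_text("a" * 1000, chunk_size=400, chunk_overlap=100)
```

The overlap is what keeps a sentence split across a boundary retrievable from at least one chunk.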
Step 2 — Embedding generation
```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model='text-embedding-3-small')

# Or use a local model:
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-small-en-v1.5')
```
Step 3 — Vector store setup with ChromaDB persistence
```python
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

chroma_client = chromadb.PersistentClient(path='./chroma_db')
collection = chroma_client.get_or_create_collection('my_docs')
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model,
)
```
Step 4 — Query engine creation
```python
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode='compact',
)
response = query_engine.query('What is the refund policy?')
print(response)
```
Step 5 — Multi-turn chat with memory
```python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=4096)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    verbose=True,
)
response = chat_engine.chat('What is the refund policy?')
response2 = chat_engine.chat('Can I get a refund after 30 days?')  # uses memory
```
Step 6 — Production deployment considerations
- Swap ChromaDB for Pinecone or Weaviate for scalability
- Cache embeddings: don't re-embed unchanged documents
- Use async query engines for concurrent requests
- Monitor token usage with LangSmith or Arize Phoenix
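The "cache embeddings" point above can be sketched as a content-hash lookup: re-embed only text whose content changed. `embed_with_cache` and `fake_embed` are illustrative names; the real embedding call is passed in as a function:

```python
import hashlib

_embedding_cache: dict[str, list[float]] = {}

def embed_with_cache(text: str, embed) -> list[float]:
    """Skip the embedding call entirely if this exact text was embedded before."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed(text)
    return _embedding_cache[key]

# Demo with a stub embedder that records how often it is actually called:
calls = []
def fake_embed(text):
    calls.append(text)
    return [0.0]

embed_with_cache("doc one", fake_embed)
embed_with_cache("doc one", fake_embed)  # cache hit: no second embed call
```

In production you would persist the cache (e.g. alongside the vector store) so re-ingestion runs skip unchanged documents.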
Docs: LlamaIndex RAG Guide
Workflow 3: Building an AI Agent (LangChain)
Step 1 — Define tools
```python
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query and return results as JSON."""
    # Connect to your DB, run the query, serialize rows, e.g.:
    # rows = conn.execute(sql).fetchall()
    # return json.dumps(rows)
    ...

@tool
def call_api(endpoint: str, params: dict) -> str:
    """Call an internal REST API endpoint and return the raw JSON body."""
    import requests
    return requests.get(endpoint, params=params, timeout=10).text

search = DuckDuckGoSearchRun()
tools = [query_database, call_api, search]
```
Step 2 — Create agent with ReAct pattern
```python
from langchain_anthropic import ChatAnthropic
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = ChatAnthropic(model='claude-3-5-sonnet-20241022')
prompt = hub.pull('hwchase17/react')
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
)
```
Step 3 — Add memory and conversation history
```python
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=10,  # keep last 10 exchanges
    return_messages=True,
)
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory)
```
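A window memory like this amounts to a sliding window over (user, assistant) exchanges. A minimal stand-alone equivalent for intuition (not LangChain internals):

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (user, assistant) exchanges; older ones fall off."""
    def __init__(self, k: int = 10):
        self.exchanges = deque(maxlen=k)

    def save(self, user_msg: str, ai_msg: str) -> None:
        self.exchanges.append((user_msg, ai_msg))

    def history(self) -> list[tuple[str, str]]:
        return list(self.exchanges)

mem = WindowMemory(k=2)
for i in range(5):
    mem.save(f"q{i}", f"a{i}")
# Only the last two exchanges survive the window.
```

The trade-off is the same as with `k=10` above: a bounded prompt size in exchange for the agent forgetting anything older than the window.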
Step 4 — Implement guardrails and error handling
```python
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,          # prevent infinite loops
    max_execution_time=30,      # 30-second timeout
    handle_parsing_errors=True,
    early_stopping_method='force',  # runnable agents support only 'force'
)
```
Step 5 — Deploy as API endpoint
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str
    session_id: str

@app.post('/agent')
async def run_agent(query: Query):
    result = agent_executor.invoke({'input': query.question})
    return {'output': result['output']}
```
Docs: LangChain Agents
5 Core Agentic Design Patterns
| Pattern | How It Works | Best For |
|---|---|---|
| Tool Use | Agent decides which tools to call based on the task | External APIs, databases, web search |
| Reflection | Agent reviews and critiques its own output, then revises | Writing, code generation, analysis |
| Planning | Agent decomposes complex tasks into sub-steps before executing | Multi-step research, complex workflows |
| Multi-Agent | Specialized agents collaborate — orchestrator delegates to specialists | Large codebases, complex domains |
| ReAct | Alternates between Reason (think) and Act (tool call) steps | General-purpose agents, debugging |
Reference: Anthropic's Agentic Patterns — foundational guidance on practical agent design
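The ReAct row in the table reduces to a short loop: ask the model for a step, parse either an action or a final answer, run the named tool, and feed the observation back. A framework-free sketch with a stubbed model and tool (all names and the `Action:`/`Final:` protocol here are illustrative):

```python
def react_loop(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Alternate Reason -> Act -> Observe until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # stub returns 'Action: tool|input' or 'Final: answer'
        transcript += step + "\n"
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        if step.startswith("Action:"):
            tool_name, tool_input = step.removeprefix("Action:").strip().split("|", 1)
            observation = tools[tool_name](tool_input)
            transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"

# Stubbed model: searches once, then answers from the observation.
def fake_llm(transcript):
    if "Observation:" in transcript:
        return "Final: Paris"
    return "Action: search|capital of France"

answer = react_loop(fake_llm, {"search": lambda q: "Paris is the capital."},
                    "Capital of France?")
```

Real frameworks add prompt templates, output parsing, and error recovery around exactly this loop — which is why `max_iterations` guards matter.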
Adding AI to Existing Apps
Step 1 — Identify where AI adds genuine value
- Good fits: freeform text input, summarization, classification, personalization
- Poor fits: deterministic calculations, structured data lookup, tasks users can do in 2 clicks
Step 2 — Context management strategy
- Only send relevant context (use retrieval, not a full data dump)
- Summarize long conversation histories before they hit the context limit
- Use token counting utilities to budget context
Step 3 — Caching and rate limiting
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(prompt: str) -> str:
    # lru_cache hashes the argument itself, so no manual hashing is needed
    return llm.complete(prompt)

# Use semantic caching for similar queries with GPTCache or Redis
```
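Semantic caching, mentioned in the comment above, matches new queries against cached ones by embedding similarity rather than exact string equality. A minimal sketch (the `embed` callable stands in for a real embedding model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Return a cached answer when a query embeds close enough to a past one."""
    def __init__(self, embed, threshold: float = 0.9):
        self.embed, self.threshold = embed, threshold
        self.entries: list[tuple[list[float], str]] = []

    def get(self, query: str):
        vec = self.embed(query)
        for cached_vec, answer in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return answer
        return None  # cache miss: call the LLM, then put() the result

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))
```

Production systems (GPTCache, Redis with vector search) replace the linear scan with an index, but the threshold-on-similarity idea is the same.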
Step 4 — Intent classification for routing
```python
# Route simple queries to cheap models, complex ones to powerful models.
# (fast_llm, cheap_model, powerful_model are placeholders for your own clients.)
def route_query(query: str) -> str:
    classification = fast_llm.classify(query, labels=['simple', 'complex'])
    if classification == 'simple':
        return cheap_model.complete(query)
    return powerful_model.complete(query)
```
Step 5 — Graceful fallbacks
```python
try:
    response = ai_service.complete(prompt, timeout=10)
except (RateLimitError, TimeoutError, APIError) as error:
    response = fallback_response_or_cached_result
    log_ai_failure(prompt, error)
```
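Fallbacks pair naturally with bounded retries. A sketch of exponential backoff around any completion call (the broad `except Exception` is for illustration; narrow it to your SDK's error types):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5,
                 fallback: str = "Sorry, please try again later."):
    """Retry fn with exponential backoff (0.5s, 1s, 2s...), then fall back."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return fallback

# Stub that fails twice, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Combine this with the try/except fallback above so users never see a raw stack trace when the provider is down.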
Meta-Agent Patterns
Building agents that build other agents — orchestrators that dynamically spin up specialized sub-agents based on the task at hand.
| Framework | Approach | Use Case |
|---|---|---|
| MetaGPT | Role-playing agents (PM, engineer, QA) collaborate to build software | Software development simulation |
| AutoGPT | Recursive task decomposition with persistent memory | Open-ended research and task completion |
| MetaAgent (FSM-based) | Finite state machine auto-designs agent graph from spec | Enterprise workflow automation |
Real enterprise ROI: According to McKinsey's 2024 AI survey, enterprises deploying agentic workflows report 20–40% productivity gains in knowledge work tasks. Anthropic's case studies show specific deployments achieving 3–10x throughput improvements in document processing.
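The FSM approach in the table can be made concrete with a tiny state-machine driver: each state's handler updates shared context and names the next state. The handlers below are stubs standing in for specialized sub-agents:

```python
def run_fsm(handlers: dict, start: str, context: dict) -> dict:
    """Drive a finite state machine: each handler mutates context and
    returns the name of the next state, until one returns 'done'."""
    state = start
    while state != "done":
        state = handlers[state](context)
    return context

# Stub handlers standing in for planner / researcher / writer sub-agents:
handlers = {
    "plan":     lambda ctx: (ctx.setdefault("steps", ["research", "draft"]), "research")[1],
    "research": lambda ctx: (ctx.update(facts="collected"), "draft")[1],
    "draft":    lambda ctx: (ctx.update(output="report"), "done")[1],
}
result = run_fsm(handlers, "plan", {})
```

Frameworks in this space generate the `handlers` graph from a task spec; the execution loop itself stays this simple.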
LangChain vs LlamaIndex Decision Guide
| Feature | LangChain | LlamaIndex |
|---|---|---|
| Primary strength | Agent orchestration, tool use, chains | RAG pipelines, document indexing |
| Learning curve | Steeper (more abstractions) | Gentler for RAG use cases |
| RAG quality | Good (via LangChain RAG) | Excellent (native focus) |
| Agent support | Excellent (LangGraph) | Good (via agent modules) |
| Community/ecosystem | Very large | Large and growing |
| Production observability | LangSmith (paid) | Arize Phoenix (free tier) |
| Best for | Complex agents, multi-step pipelines | Document Q&A, knowledge bases |
Decision rule: If your primary use case is document/knowledge retrieval → LlamaIndex. If you need agents with multiple tools and complex orchestration → LangChain + LangGraph. For simple chatbots, use neither — Vercel AI SDK is sufficient.
Common Pitfalls
Pitfall 1: Sending all data with every request
Don't stuff your entire database into the system prompt. Use function calling / tool use to let the model retrieve only what it needs.
Pitfall 2: Not implementing caching
Identical or semantically similar queries are common. Semantic caching with GPTCache can cut API costs 30–60%.
Pitfall 3: Ignoring rate limits and costs
Set hard budget limits in your API dashboard. Use token estimation before sending requests. Log all usage from day one.
Pitfall 4: No fallback when AI fails
AI APIs have outages and rate limits. Always implement fallbacks — a cached response, a simplified answer, or a graceful error message.
Pitfall 5: Over-engineering
Starting with multi-agent systems when a single well-prompted LLM call would work. Per Anthropic's agent design guidance, "the simplest solution is often the most reliable." Multi-agent adds latency, cost, and debugging complexity.
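For Pitfall 3, token usage can be estimated before sending without any API call. A common rough heuristic for English prose is about 4 characters per token; use your provider's tokenizer (e.g. tiktoken) when you need exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def within_budget(prompt: str, max_input_tokens: int = 8000) -> bool:
    """Pre-flight check before spending money on an API call."""
    return estimate_tokens(prompt) <= max_input_tokens

prompt = "Summarize the attached report." * 100
ok = within_budget(prompt, max_input_tokens=8000)
```

Pair the check with usage logging so budget alerts fire before the invoice does.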
Related tools
MCP servers used
- No linked MCP servers yet.
Related cookbooks
AI-Powered Code Review & Quality
Automate code review and enforce quality standards using AI-powered tools and agentic workflows.
Building APIs & Backends with AI Agents
Design and build robust APIs and backend services with AI coding agents, from REST to GraphQL.
Debugging with AI Agents
Systematically debug complex issues using AI coding agents with structured workflows and MCP integrations.