
Intermediate · 1-2 weeks

Building AI-Powered Applications

Build applications powered by LLMs, RAG, and AI agents using Claude Code, Cursor, and modern AI frameworks.

Last reviewed Feb 27, 2026

Overview

This cookbook covers adding AI capabilities to your applications — from simple chatbots to full RAG systems, AI assistants, and autonomous agents. Whether you're augmenting an existing product or building from scratch, you'll find a workflow for your use case.

Target audience: Developers who want to add AI features to existing or new apps

Expected outcome: A working app with integrated AI features — chatbot, RAG pipeline, or agent — deployed to production


What You'll Need

SDKs & Frameworks

  • Vercel AI SDK (streaming chat interfaces)
  • LlamaIndex + ChromaDB (RAG pipelines)
  • LangChain + LangGraph (agents with tools)
  • CrewAI or AutoGen (multi-agent systems)
  • Ollama (local / private models)


Choosing Your AI Architecture

| Use Case | Best Stack | Approx. Code Volume |
| --- | --- | --- |
| Simple chat interface | Vercel AI SDK | ~50 lines |
| RAG (retrieval-augmented generation) | LlamaIndex + ChromaDB | ~150 lines |
| Complex agents with tools | LangChain + LangGraph | ~300 lines |
| Multi-agent systems | CrewAI or AutoGen | ~200+ lines |
| Local / private AI | Ollama + LlamaIndex | ~100 lines |

Rule of thumb: Start with the simplest architecture that solves your problem. Multi-agent systems are powerful but add significant complexity. A well-prompted single LLM call often beats an elaborate agent chain.


Workflow 1: Building a Chatbot (Vercel AI SDK)

Step 1 — Install dependencies

npm install ai @ai-sdk/openai

Step 2 — Create an API route with streaming

// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });

  return result.toDataStreamResponse();
}

Step 3 — Build the UI with useChat()

// app/page.tsx
'use client';
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

Step 4 — Add provider-agnostic model switching

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';

// Switch providers with one line:
const model = process.env.AI_PROVIDER === 'anthropic'
  ? anthropic('claude-3-5-sonnet-20241022')
  : openai('gpt-4o');

Step 5 — Deploy to Vercel

npx vercel deploy

Add OPENAI_API_KEY (or equivalent) to your Vercel environment variables. The AI SDK's streaming responses work natively with Vercel's Edge Runtime.

Docs: Vercel AI SDK Quickstart


Workflow 2: Building a RAG Application (LlamaIndex)

Step 1 — Document ingestion and chunking

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Load documents
documents = SimpleDirectoryReader('./docs').load_data()

# Chunk into nodes
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
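
Before indexing, it's worth eyeballing the chunks. Each node is a TextNode carrying the chunk text plus source metadata:

# Sanity-check the chunking before spending money on embeddings
print(f'{len(documents)} documents -> {len(nodes)} nodes')
print(nodes[0].text[:200])   # first chunk's text
print(nodes[0].metadata)     # source metadata, e.g. file_name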

Step 2 — Embedding generation

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model='text-embedding-3-small')
# Or use a local model:
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-small-en-v1.5')
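
A quick sanity check that the embedding model is wired up; get_text_embedding returns a plain list of floats:

vec = embed_model.get_text_embedding('hello world')
print(len(vec))  # 1536 dimensions for text-embedding-3-small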

Step 3 — Vector store setup with ChromaDB persistence

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

chroma_client = chromadb.PersistentClient(path='./chroma_db')
collection = chroma_client.get_or_create_collection('my_docs')
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model
)
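
On subsequent runs you can reopen the persisted collection instead of re-embedding everything; a minimal sketch, assuming the same ./chroma_db path and the embed_model from Step 2:

# Reload the persisted index on a later run -- no re-embedding needed
chroma_client = chromadb.PersistentClient(path='./chroma_db')
collection = chroma_client.get_or_create_collection('my_docs')
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)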

Step 4 — Query engine creation

query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode='compact'
)
response = query_engine.query('What is the refund policy?')
print(response)
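
The response object also carries the chunks it retrieved, which is handy for showing citations alongside the answer:

# Each source node has a similarity score, source metadata, and text
for node in response.source_nodes:
    print(node.score, node.metadata.get('file_name'))
    print(node.text[:100])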

Step 5 — Multi-turn chat with memory

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=4096)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    verbose=True
)

response = chat_engine.chat('What is the refund policy?')
response2 = chat_engine.chat('Can I get a refund after 30 days?')  # uses memory

Step 6 — Production deployment considerations

  • Swap ChromaDB for Pinecone or Weaviate for scalability
  • Cache embeddings: don't re-embed unchanged documents
  • Use async query engines for concurrent requests (see the sketch below)
  • Monitor token usage with LangSmith or Arize Phoenix
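
For the async point above, LlamaIndex query engines expose aquery() as the async counterpart of query(); a minimal sketch, assuming the index from Step 3:

import asyncio

async def answer(question: str) -> str:
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = await query_engine.aquery(question)
    return str(response)

async def main() -> list[str]:
    # Answer several questions concurrently instead of serially
    return await asyncio.gather(
        answer('What is the refund policy?'),
        answer('How do I contact support?'),
    )

answers = asyncio.run(main())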

Docs: LlamaIndex RAG Guide


Workflow 3: Building an AI Agent (LangChain)

Step 1 — Define tools

import json

from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query and return results as JSON."""
    # Placeholder body: connect to your database, run the query,
    # and serialize the rows to JSON before returning
    return results_as_json

@tool
def call_api(endpoint: str, params: dict) -> str:
    """Call an internal REST API endpoint."""
    import requests
    # .json() returns a dict; tools must return strings, so serialize it
    return json.dumps(requests.get(endpoint, params=params).json())

search = DuckDuckGoSearchRun()
tools = [query_database, call_api, search]
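
Tools created with @tool can be unit-tested directly before handing them to an agent (the orders table here is a stand-in):

# Invoke tools directly -- the same interface the agent will use
print(query_database.invoke({'sql': 'SELECT count(*) FROM orders'}))
print(search.invoke('latest LangChain release'))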

Step 2 — Create agent with ReAct pattern

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = ChatAnthropic(model='claude-3-5-sonnet-20241022')
prompt = hub.pull('hwchase17/react')
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True
)
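
Run the agent with a single input dict; with verbose=True it prints each Thought / Action / Observation step of the ReAct loop:

result = agent_executor.invoke({'input': 'How many orders were placed last week?'})
print(result['output'])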

Step 3 — Add memory and conversation history

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=10,  # keep the last 10 exchanges
    return_messages=True
)
# The prompt must include a {chat_history} variable for memory to work;
# pull hub.pull('hwchase17/react-chat') instead of the plain react prompt
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory)

Step 4 — Implement guardrails and error handling

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,       # prevent infinite loops
    max_execution_time=30,   # 30-second timeout
    handle_parsing_errors=True,
    early_stopping_method='force'  # return a standard stop message at the limits
)

Step 5 — Deploy as API endpoint

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str
    session_id: str

@app.post('/agent')
async def run_agent(query: Query):
    # ainvoke avoids blocking the event loop on the agent's LLM calls
    result = await agent_executor.ainvoke({'input': query.question})
    return {'output': result['output']}
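
The session_id field is unused above; a hedged sketch of per-session memory that replaces the endpoint, reusing ConversationBufferWindowMemory from Step 3 (the in-process executors cache is a simplification, not a production session store):

from collections import defaultdict

def make_executor() -> AgentExecutor:
    # One memory, and therefore one executor, per chat session
    memory = ConversationBufferWindowMemory(
        memory_key='chat_history', k=10, return_messages=True
    )
    return AgentExecutor(agent=agent, tools=tools, memory=memory)

executors: defaultdict[str, AgentExecutor] = defaultdict(make_executor)

@app.post('/agent')
async def run_agent(query: Query):
    executor = executors[query.session_id]
    result = await executor.ainvoke({'input': query.question})
    return {'output': result['output']}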

Docs: LangChain Agents


5 Core Agentic Design Patterns

| Pattern | How It Works | Best For |
| --- | --- | --- |
| Tool Use | Agent decides which tools to call based on the task | External APIs, databases, web search |
| Reflection | Agent reviews and critiques its own output, then revises | Writing, code generation, analysis |
| Planning | Agent decomposes complex tasks into sub-steps before executing | Multi-step research, complex workflows |
| Multi-Agent | Specialized agents collaborate — orchestrator delegates to specialists | Large codebases, complex domains |
| ReAct | Alternates between Reason (think) and Act (tool call) steps | General-purpose agents, debugging |

Reference: Anthropic's Building Effective Agents — the foundational guide to practical agent design
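
To make the Reflection row concrete, here is a minimal, framework-free sketch in which llm.complete is a stand-in for any completion call:

# Reflection loop: draft, critique, revise
def reflect(task: str, rounds: int = 2) -> str:
    draft = llm.complete(f'Complete this task:\n{task}')
    for _ in range(rounds):
        critique = llm.complete(f'Critique this answer for errors and omissions:\n{draft}')
        draft = llm.complete(
            f'Task: {task}\nDraft: {draft}\nCritique: {critique}\n'
            'Rewrite the draft, fixing every issue the critique raises.'
        )
    return draft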


Adding AI to Existing Apps

Step 1 — Identify where AI adds genuine value

  • Good fits: freeform text input, summarization, classification, personalization
  • Poor fits: deterministic calculations, structured data lookup, tasks users can do in 2 clicks

Step 2 — Context management strategy

  • Only send relevant context (use retrieval, not a full data dump)
  • Summarize long conversation histories before they hit the context limit
  • Use token counting utilities to budget context

Step 3 — Caching and rate limiting
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(prompt: str) -> str:
    # lru_cache keys on the prompt string itself, so identical
    # prompts skip the API call entirely
    return llm.complete(prompt)

# Use semantic caching for similar (not just identical) queries
# with GPTCache or Redis -- see the sketch below
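
The core of that semantic-cache idea fits in a few lines of plain numpy; a hedged sketch in which embed is a stand-in for any embedding call that returns unit-normalized vectors:

import numpy as np

_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def semantic_lookup(query: str, threshold: float = 0.92) -> str | None:
    q = embed(query)  # stand-in: unit-normalized embedding vector
    for vec, answer in _cache:
        if float(np.dot(q, vec)) >= threshold:  # cosine similarity
            return answer
    return None

def semantic_store(query: str, answer: str) -> None:
    _cache.append((embed(query), answer))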

Step 4 — Intent classification for routing

# Route simple queries to cheap models, complex ones to powerful models.
# fast_llm, cheap_model, and powerful_model are placeholders for
# whichever client objects your stack provides.
def route_query(query: str) -> str:
    classification = fast_llm.classify(query, labels=['simple', 'complex'])
    if classification == 'simple':
        return cheap_model.complete(query)
    return powerful_model.complete(query)

Step 5 — Graceful fallbacks

try:
    response = ai_service.complete(prompt, timeout=10)
except (RateLimitError, TimeoutError, APIError) as error:
    # Serve something useful instead of a hard failure
    response = fallback_response_or_cached_result
    log_ai_failure(prompt, error)

Meta-Agent Patterns

Building agents that build other agents — orchestrators that dynamically spin up specialized sub-agents based on the task at hand.

| Framework | Approach | Use Case |
| --- | --- | --- |
| MetaGPT | Role-playing agents (PM, engineer, QA) collaborate to build software | Software development simulation |
| AutoGPT | Recursive task decomposition with persistent memory | Open-ended research and task completion |
| MetaAgent (FSM-based) | Finite state machine auto-designs agent graph from spec | Enterprise workflow automation |

Real enterprise ROI: According to McKinsey's 2024 AI survey, enterprises deploying agentic workflows report 20–40% productivity gains in knowledge work tasks. Anthropic's case studies show specific deployments achieving 3–10x throughput improvements in document processing.

LangChain vs LlamaIndex Decision Guide

| Feature | LangChain | LlamaIndex |
| --- | --- | --- |
| Primary strength | Agent orchestration, tool use, chains | RAG pipelines, document indexing |
| Learning curve | Steeper (more abstractions) | Gentler for RAG use cases |
| RAG quality | Good (via LangChain RAG) | Excellent (native focus) |
| Agent support | Excellent (LangGraph) | Good (via agent modules) |
| Community/ecosystem | Very large | Large and growing |
| Production observability | LangSmith (paid) | Arize Phoenix (free tier) |
| Best for | Complex agents, multi-step pipelines | Document Q&A, knowledge bases |

Decision rule: If your primary use case is document/knowledge retrieval → LlamaIndex. If you need agents with multiple tools and complex orchestration → LangChain + LangGraph. For simple chatbots, use neither — the Vercel AI SDK is sufficient.

Common Pitfalls

Pitfall 1: Sending all data with every request. Don't stuff your entire database into the system prompt. Use function calling / tool use to let the model retrieve only what it needs.

Pitfall 2: Not implementing caching. Identical or semantically similar queries are common. Semantic caching with GPTCache can cut API costs 30–60%.

Pitfall 3: Ignoring rate limits and costs. Set hard budget limits in your API dashboard. Use token estimation before sending requests. Log all usage from day one.

Pitfall 4: No fallback when AI fails. AI APIs have outages and rate limits. Always implement fallbacks — a cached response, a simplified answer, or a graceful error message.

Pitfall 5: Over-engineering. Don't start with a multi-agent system when a single well-prompted LLM call would work. Per Anthropic's agent design guidance, "the simplest solution is often the most reliable." Multi-agent setups add latency, cost, and debugging complexity.
