# 🎯 **ULTIMATE AGENTIC RAG APPLICATION PROMPT**
## Complete Production-Ready Agentic RAG Application
I need you to generate a full-stack, production-ready Agentic RAG (Retrieval-Augmented Generation) application in Python. This is a learning project, so please include extensive comments, explanations, and educational content throughout.
## 🏗️ Architecture Requirements
### Frontend Layer
**Technology:** Streamlit
**Features:**
- Modern, intuitive UI for document upload (PDF, DOCX, TXT, Markdown)
- Interactive query interface with real-time streaming responses
- Display of retrieved source documents with confidence scores
- Conversation history panel for multi-turn dialogues
- Source citation display (show which document chunks were used)
- Visual indicators for agent reasoning steps
- Session state management (a minimal chat sketch follows this list)
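For reference, a minimal sketch of what the chat component and session-state handling might look like. The backend URL, endpoint path, and payload shape are assumptions, not a fixed contract:

```python
# frontend/components/chat.py -- minimal sketch; endpoint and payload names are assumptions
import requests
import streamlit as st

BACKEND_URL = "http://localhost:8000"  # assumed backend address

def render_chat() -> None:
    """Render the conversation history and a chat input box."""
    # Session state keeps the conversation across Streamlit reruns
    if "messages" not in st.session_state:
        st.session_state.messages = []

    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.markdown(msg["content"])

    if prompt := st.chat_input("Ask a question about your documents"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)
        # Hypothetical backend call -- adjust to the real API contract
        resp = requests.post(
            f"{BACKEND_URL}/api/v1/chat",
            json={"message": prompt, "history": st.session_state.messages},
            timeout=120,
        )
        answer = resp.json().get("answer", "")
        st.session_state.messages.append({"role": "assistant", "content": answer})
        with st.chat_message("assistant"):
            st.markdown(answer)
```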
### Backend Layer
**Technology:** FastAPI
**Features:**
- RESTful API with async/await patterns
- Automatic API documentation (Swagger/OpenAPI)
- CORS middleware for frontend integration
- Proper error handling with custom exception classes
- Structured logging (JSON format)
- Request validation using Pydantic models
- Health check and metrics endpoints
- API key authentication for security (see the app-wiring sketch after this list)
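For reference, one way the app wiring (CORS plus API key auth) might look; the header name, allowed origin, and environment variable are illustrative assumptions:

```python
# backend/app/main.py -- wiring sketch; header name, origin, and env var are illustrative
import os

from fastapi import FastAPI, HTTPException, Security
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import APIKeyHeader

app = FastAPI(title="Agentic RAG API", version="0.1.0")

# CORS so the Streamlit frontend (default port 8501) can call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8501"],
    allow_methods=["*"],
    allow_headers=["*"],
)

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: str = Security(api_key_header)) -> str:
    """Reject requests that do not carry the expected API key."""
    expected = os.getenv("API_KEY", "")  # never hardcode secrets
    if not expected or api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

@app.get("/api/v1/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}
```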
### Core Azure Services
- **Azure OpenAI Service:** GPT-4o or GPT-4-turbo (latest available model)
- **Azure AI Search:** Vector store with hybrid search (vector + keyword)
- **Azure Blob Storage:** Document persistence and management
- **Azure Key Vault:** Secure credential management (optional but recommended)
### Agentic Framework
- **Orchestration:** Use LangChain or LlamaIndex for agent logic
- **Agent Pattern:** ReAct (Reasoning + Acting) or OpenAI function calling
- **Tools/Capabilities:**
- Document retriever tool
- Summarization tool
- Query refinement tool
- Multi-step reasoning with thought traces
- Self-reflection and answer validation
## 📋 Functional Requirements
### 1. Document Ingestion Pipeline
```
User uploads document → Extract text → Intelligent chunking (with overlap) →
Generate embeddings → Store in Azure AI Search + Blob Storage → Return success
```
**Requirements:**
- Support PDF, DOCX, TXT, and Markdown files (a file-type dispatch sketch follows this list)
- Implement a smart chunking strategy (500-1000 tokens, 10-20% overlap)
- Extract and preserve metadata (filename, upload date, page numbers)
- Generate embeddings using Azure OpenAI (text-embedding-3-large or text-embedding-ada-002)
- Create Azure AI Search index with vector and text fields
- Handle large documents (chunking + batch processing)
- Progress indicators during upload
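For reference, a sketch of the file-type dispatch for text extraction, assuming `pypdf` and `python-docx` are installed; the function name and error handling are illustrative:

```python
# Text-extraction dispatch sketch; assumes pypdf and python-docx are installed
import io

from docx import Document
from pypdf import PdfReader

def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from PDF, DOCX, TXT, or Markdown bytes."""
    name = filename.lower()
    if name.endswith(".pdf"):
        reader = PdfReader(io.BytesIO(data))
        # extract_text() can return an empty string for image-only pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if name.endswith(".docx"):
        doc = Document(io.BytesIO(data))
        return "\n".join(p.text for p in doc.paragraphs)
    if name.endswith((".txt", ".md")):
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"Unsupported file type: {filename}")
```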
### 2. Agentic Query Processing
```
User query → Agent analyzes → Plans retrieval strategy → Retrieves context →
Reasons about information → Generates response → Cites sources → Returns to user
```
**Requirements:**
- Agent breaks down complex queries into sub-tasks
- Dynamic retrieval: fetch more context if the initial results are insufficient (see the loop sketch after this list)
- Hybrid search: combine vector similarity + keyword matching + semantic ranking
- Re-ranking of retrieved chunks for relevance
- Multi-step reasoning visible to the user (show the agent's "thoughts")
- Context-aware responses with proper citations
- Handle follow-up questions using conversation history
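For reference, a framework-agnostic sketch of the retrieve-reflect-retry loop. `hybrid_search`, `judge_sufficiency`, and `generate_answer` are stand-ins for the real service calls, not actual APIs:

```python
# Sketch of the agent's retrieve-reflect-retry loop. The three helper functions
# below are stand-ins (assumptions) for the real search and LLM service calls.
import asyncio

async def hybrid_search(query: str, top_k: int = 5) -> list[dict]:
    """Stand-in for the Azure AI Search hybrid query."""
    return [{"content": f"(chunk matching '{query}')", "score": 0.9}]

async def judge_sufficiency(query: str, context: list[str]) -> dict:
    """Stand-in for an LLM call that decides whether the context answers the query."""
    return {"sufficient": len(context) >= 3, "refined_query": query + " (refined)"}

async def generate_answer(query: str, context: list[str]) -> str:
    """Stand-in for the final grounded-generation LLM call."""
    return f"Answer to '{query}' grounded in {len(context)} chunks."

async def answer_query(query: str, max_steps: int = 3) -> dict:
    """Dynamic retrieval: fetch more context until the agent judges it sufficient."""
    context: list[str] = []
    search_query = query
    for step in range(1, max_steps + 1):
        hits = await hybrid_search(search_query)
        context.extend(hit["content"] for hit in hits)
        verdict = await judge_sufficiency(query, context)  # self-reflection step
        if verdict["sufficient"]:
            break
        search_query = verdict["refined_query"]  # query refinement tool in action
    answer = await generate_answer(query, context)
    return {"answer": answer, "sources": context, "reasoning_steps": step}

if __name__ == "__main__":
    print(asyncio.run(answer_query("What does the contract say about renewal?")))
```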
### 3. API Endpoints
**Document Management:**
- `POST /api/v1/documents/upload` - Upload and process documents
- `GET /api/v1/documents` - List all indexed documents
- `GET /api/v1/documents/{doc_id}` - Get document details
- `DELETE /api/v1/documents/{doc_id}` - Remove document and chunks
**Query & Chat:**
- `POST /api/v1/query` - Submit a query with a streaming response (a route sketch follows these endpoint lists)
- `POST /api/v1/chat` - Conversational endpoint with history
- `GET /api/v1/chat/history/{session_id}` - Retrieve chat history
**System:**
- `GET /api/v1/health` - Health check
- `GET /api/v1/metrics` - Basic usage metrics
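For reference, a sketch of the streaming query route; the request model and the `token_stream` generator are stand-ins for the real agent pipeline:

```python
# Streaming endpoint sketch; token_stream is a stand-in for the agent pipeline
from collections.abc import AsyncIterator

from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1")

class QueryRequest(BaseModel):
    query: str
    session_id: str | None = None

async def token_stream(query: str) -> AsyncIterator[str]:
    """Stand-in generator; the real version would yield LLM tokens as they arrive."""
    for token in ["This ", "is ", "a ", "streamed ", "answer."]:
        yield token

@router.post("/query")
async def submit_query(body: QueryRequest) -> StreamingResponse:
    """Stream the agent's answer back token by token."""
    return StreamingResponse(token_stream(body.query), media_type="text/plain")
```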
## 🗂️ Project Structure
```
agentic-rag-app/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py                     # FastAPI application entry
│   │   ├── config.py                   # Configuration and settings
│   │   ├── dependencies.py             # Dependency injection
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   └── routes/
│   │   │       ├── documents.py        # Document endpoints
│   │   │       ├── query.py            # Query endpoints
│   │   │       └── health.py           # Health check
│   │   ├── services/
│   │   │   ├── __init__.py
│   │   │   ├── document_processor.py   # Text extraction & chunking
│   │   │   ├── embedding_service.py    # Azure OpenAI embeddings
│   │   │   ├── search_service.py       # Azure AI Search operations
│   │   │   ├── agent_service.py        # Agentic orchestration
│   │   │   └── llm_service.py          # LLM interactions
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── requests.py             # Pydantic request models
│   │   │   ├── responses.py            # Pydantic response models
│   │   │   └── documents.py            # Document data models
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── logging.py              # Logging configuration
│   │   │   ├── exceptions.py           # Custom exceptions
│   │   │   └── azure_clients.py        # Azure SDK clients
│   │   └── core/
│   │       ├── __init__.py
│   │       ├── security.py             # Authentication
│   │       └── prompts.py              # System prompts
│   ├── tests/
│   │   ├── __init__.py
│   │   └── test_api.py
│   ├── requirements.txt
│   ├── .env.example
│   └── Dockerfile
├── frontend/
│   ├── app.py                          # Streamlit main app
│   ├── components/
│   │   ├── __init__.py
│   │   ├── upload.py                   # Upload component
│   │   ├── chat.py                     # Chat interface
│   │   └── sidebar.py                  # Sidebar with settings
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── api_client.py               # Backend API client
│   │   └── session.py                  # Session management
│   ├── requirements.txt
│   └── .streamlit/
│       └── config.toml
├── docs/
│   ├── README.md                       # Main documentation
│   ├── SETUP.md                        # Detailed setup guide
│   ├── ARCHITECTURE.md                 # Architecture explanation
│   ├── LEARNING_GUIDE.md               # Educational walkthrough
│   └── architecture-diagram.mmd        # Mermaid diagram
├── scripts/
│   ├── setup_azure.py                  # Azure resource setup script
│   └── seed_data.py                    # Sample data loader
├── sample_documents/
│   └── example.pdf                     # Test document
├── docker-compose.yml
├── .gitignore
└── README.md
```
## 🔧 Technical Implementation Details
### Technology Stack
- **Python:** 3.11+
- **Backend:** FastAPI 0.110+, uvicorn, python-multipart
- **Frontend:** Streamlit 1.32+
- **Azure SDKs:**
- `openai` (latest)
- `azure-search-documents`
- `azure-storage-blob`
- `azure-identity`
- **Agent Framework:** LangChain 0.1+ or LlamaIndex 0.10+
- **Document Processing:** pypdf (the maintained successor to PyPDF2), python-docx, markdown
- **Data Validation:** Pydantic 2.0+
- **Additional:** python-dotenv, httpx, aiohttp
### Chunking Strategy
- Use semantic chunking (sentence-aware); a runnable sketch follows this list
- Target chunk size: 500-1000 tokens
- Overlap: 10-20% (50-200 tokens)
- Preserve document structure metadata
- Include document title/filename in each chunk
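For reference, a runnable sketch of sentence-aware chunking with overlap. The 4-characters-per-token estimate is a simplifying assumption; swap in `tiktoken` for exact counts:

```python
# Sentence-aware chunking sketch with a token-budget overlap.
# Token counts use a ~4-chars-per-token heuristic (an assumption).
import re

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def chunk_text(text: str, title: str, max_tokens: int = 800, overlap_tokens: int = 100) -> list[str]:
    """Split text into sentence-aware chunks with overlap, prefixed by the document title."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        current.append(sentence)
        current_tokens += estimate_tokens(sentence)
        if current_tokens >= max_tokens:
            chunks.append(f"[{title}] " + " ".join(current))
            # Carry trailing sentences forward until the overlap budget is met
            carried: list[str] = []
            carried_tokens = 0
            for prev in reversed(current):
                carried_tokens += estimate_tokens(prev)
                carried.insert(0, prev)
                if carried_tokens >= overlap_tokens:
                    break
            current, current_tokens = carried, carried_tokens
    if current:
        chunks.append(f"[{title}] " + " ".join(current))
    return chunks
```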
### Embedding Configuration
- Model: `text-embedding-3-large` (3072 dimensions) or `text-embedding-ada-002`
- Batch processing for efficiency (see the sketch after this list)
- Normalize vectors for cosine similarity
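For reference, a batch-embedding sketch using the `openai` SDK's `AzureOpenAI` client; the API version and deployment name are placeholders to adjust for your resource:

```python
# Batch embedding sketch with the openai SDK (>= 1.0).
# Endpoint, key, deployment name, and API version are placeholders (assumptions).
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # pick the version your resource supports
)

def embed_batch(texts: list[str], batch_size: int = 16) -> list[list[float]]:
    """Embed texts in batches to stay within request-size limits."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        response = client.embeddings.create(
            input=batch,
            model="text-embedding-3-large",  # your Azure deployment name may differ
        )
        # The API returns one embedding per input, in input order
        vectors.extend(item.embedding for item in response.data)
    return vectors
```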
### Azure AI Search Index Schema
```json
{
  "name": "documents-index",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "content", "type": "Edm.String", "searchable": true},
    {"name": "embedding", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 3072, "vectorSearchProfile": "default"},
    {"name": "document_id", "type": "Edm.String", "filterable": true},
    {"name": "document_name", "type": "Edm.String", "filterable": true},
    {"name": "chunk_index", "type": "Edm.Int32"},
    {"name": "metadata", "type": "Edm.String"}
  ]
}
```

Note that vector fields must also be marked `searchable`, and the full index definition needs a companion `vectorSearch` section that declares the `default` profile (typically backed by an HNSW algorithm configuration).
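For reference, a hybrid-query sketch against this index using `azure-search-documents` (11.4+); the endpoint, key, and caller-supplied query vector (produced by the embedding service) are assumptions:

```python
# Hybrid query sketch (keyword + vector) with azure-search-documents >= 11.4.
# Endpoint, key, and index name are placeholders (assumptions).
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="documents-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

def hybrid_search(query: str, query_vector: list[float], top: int = 5) -> list[dict]:
    """Combine keyword (BM25) and vector retrieval in a single request."""
    results = search_client.search(
        search_text=query,  # keyword side of the hybrid query
        vector_queries=[
            VectorizedQuery(vector=query_vector, k_nearest_neighbors=top, fields="embedding")
        ],
        select=["content", "document_name", "chunk_index"],
        top=top,
    )
    return [dict(r) for r in results]
```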
### Agent Prompt Template
Include a clear system prompt that:
- Defines the agent's role as a helpful RAG assistant
- Instructs to use retrieved context
- Requires citation of sources
- Encourages asking clarifying questions
- Enables multi-step reasoning
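For reference, one possible prompt implementing these points; the wording is illustrative, and `{context}` is filled with retrieved chunks at query time:

```python
# backend/app/core/prompts.py -- illustrative system prompt, not the definitive wording
RAG_SYSTEM_PROMPT = """You are a helpful research assistant that answers questions
using ONLY the retrieved document excerpts provided in the context.

Rules:
1. Ground every claim in the context; if the context is insufficient, say so
   and ask a clarifying question instead of guessing.
2. Cite sources inline as [document_name, chunk N] after each supported claim.
3. For complex questions, reason step by step: break the question into parts,
   answer each from the context, then synthesize a final answer.
4. Never invent citations or facts that are not in the context.

Context:
{context}
"""
```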
## 📚 Learning Objectives & Documentation
### Include Detailed Explanations For:
1. **What is Agentic RAG?** How it differs from simple RAG
2. **Chunking Strategies:** Why overlap matters, semantic vs. fixed-size
3. **Embedding Models:** How vector similarity works
4. **Hybrid Search:** Combining vector + keyword + semantic ranking
5. **Agent Reasoning:** ReAct pattern, tool use, chain-of-thought
6. **Prompt Engineering:** System prompts, few-shot examples, context construction
7. **Performance Optimization:** Caching, batch processing, async operations
8. **Error Handling:** Graceful degradation, retry logic, user-friendly messages
### Create These Educational Documents:
- **ARCHITECTURE.md:** System design with Mermaid diagram
- **LEARNING_GUIDE.md:** Step-by-step explanation of each component
- **SETUP.md:** Local development setup, Azure configuration
- **API_DOCS.md:** Endpoint documentation with examples
## 🎨 Code Quality Requirements
- **Type Hints:** Use throughout (functions, variables, return types)
- **Comments:** Explain WHY, not just WHAT
- **Docstrings:** Google or NumPy style for all functions/classes
- **Error Handling:** Try-except blocks with specific exceptions
- **Logging:** Use structured logging (JSON) with appropriate levels (a minimal setup follows this list)
- **PEP 8:** Follow Python style guide
- **Async/Await:** Use for I/O operations
- **Configuration:** All credentials/settings in environment variables
- **Security:** Never hardcode secrets, validate all inputs
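For reference, a minimal structured-logging setup that satisfies the JSON requirement; the field names are illustrative:

```python
# Minimal structured (JSON) logging setup -- one way to satisfy the logging requirement
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logger = logging.getLogger("agentic_rag")
logger.info("Service started")  # -> {"level": "INFO", "logger": "agentic_rag", ...}
```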
## 🚀 Deployment & Running
### Local Development
1. Set up Azure resources (provide script or manual steps)
2. Configure `.env` file with credentials
3. Install dependencies: `pip install -r requirements.txt`
4. Run backend (from the `backend/` directory): `uvicorn app.main:app --reload`
5. Run frontend: `streamlit run frontend/app.py`
### Docker Support
- Include `Dockerfile` for both services
- `docker-compose.yml` to run full stack
- Health checks and proper networking
## ✨ Optional Enhancements (If Possible)
- **Memory/History:** Store conversation context for multi-turn chats
- **Observability:** Integration with Langfuse or OpenTelemetry
- **Caching:** Redis for frequently accessed results
- **Rate Limiting:** Protect API endpoints
- **Admin UI:** View usage statistics, manage documents
- **Export:** Download chat history or generated responses
- **Evaluation:** Include retrieval quality metrics
## 📦 Deliverables
1. **Complete working codebase** (all files in proper structure)
2. **requirements.txt** files (backend and frontend) with pinned versions
3. **.env.example** with all required variables documented
4. **README.md** with quick start guide
5. **Detailed documentation** (SETUP.md, ARCHITECTURE.md, LEARNING_GUIDE.md)
6. **Sample data** for testing (example.pdf or similar)
7. **Mermaid diagram** showing data flow
8. **Inline comments** explaining complex logic
## 🎯 Generation Instructions for Claude
Please generate this project **step-by-step**:
1. **First:** Show the complete project structure (folder tree)
2. **Second:** Generate backend core files (config, models, main.py)
3. **Third:** Implement services (document processing, embeddings, search, agent)
4. **Fourth:** Create API routes (documents, query, health)
5. **Fifth:** Build Streamlit frontend (main app, components)
6. **Sixth:** Add configuration files (requirements.txt, .env.example, docker files)
7. **Seventh:** Create documentation (README, SETUP, ARCHITECTURE, LEARNING_GUIDE)
8. **Eighth:** Include sample prompts and test data
For each file, add:
- Clear comments explaining key concepts
- Type hints for all functions
- Error handling examples
- Educational notes where relevant
## 📊 Architecture Diagram Request
Please also create a Mermaid diagram (`docs/architecture-diagram.mmd`) showing:
- User interaction with Streamlit UI
- HTTP requests to FastAPI backend
- Document upload flow (extraction → chunking → embedding → indexing)
- Query processing flow (query → agent → retrieval → LLM → response)
- Azure services interactions (OpenAI, AI Search, Blob Storage)
- Data flow between all components
Use proper Mermaid syntax (flowchart or sequence diagram) that can be rendered in VS Code or GitHub.
---
## 🎯 My Goal
Build this application to deeply understand Agentic RAG architecture, Azure AI services integration, and production-ready Python development. The code should be clean, well-documented, and serve as a reference implementation for building intelligent document retrieval systems with autonomous agent capabilities.
---
## 💡 Follow-Up Instructions
After pasting this prompt to Claude, follow up with:
> "Please start by generating the folder structure and backend configuration files first. Then proceed step-by-step through each component, ensuring all code includes detailed comments and explanations."
This will help Claude generate organized, manageable code blocks that you can review and learn from systematically.