# 🎯 **ULTIMATE AGENTIC RAG APPLICATION PROMPT**
## Complete Production-Ready Agentic RAG Application
I need you to generate a full-stack, production-ready Agentic RAG (Retrieval-Augmented Generation) application in Python. This is a learning project, so please include extensive comments, explanations, and educational content throughout.
## 🏗️ Architecture Requirements
### Frontend Layer
**Technology:** Streamlit
**Features:**
- Modern, intuitive UI for document upload (PDF, DOCX, TXT, Markdown)
- Interactive query interface with real-time streaming responses
- Display of retrieved source documents with confidence scores
- Conversation history panel for multi-turn dialogues
- Source citation display (show which document chunks were used)
- Visual indicators for agent reasoning steps
- Session state management (a minimal sketch follows this list)
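A minimal sketch of the chat-history pattern in Streamlit, assuming the `st.chat_message`/`st.chat_input` APIs (Streamlit 1.24+); the message shape is illustrative:
```python
# Streamlit keeps st.session_state across reruns, so chat history survives
# each interaction. The {"role", "content"} message shape is illustrative.
import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far on every rerun.
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask about your documents..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    # ...call the backend here and append the assistant's reply the same way.
```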
### Backend Layer
**Technology:** FastAPI
**Features:**
- RESTful API with async/await patterns
- Automatic API documentation (Swagger/OpenAPI)
- CORS middleware for frontend integration
- Proper error handling with custom exception classes
- Structured logging (JSON format)
- Request validation using Pydantic models
- Health check and metrics endpoints
- API key authentication for security (a dependency sketch follows this list)
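A sketch of the API-key check as a reusable FastAPI dependency; the header name and `APP_API_KEY` variable are illustrative assumptions:
```python
# Sketch: API-key auth via FastAPI's APIKeyHeader. Routes opt in with
# Security(require_api_key); header and env-var names are placeholders.
import os

from fastapi import HTTPException, Security, status
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)


async def require_api_key(api_key: str | None = Security(api_key_header)) -> str:
    if not api_key or api_key != os.environ.get("APP_API_KEY"):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid or missing API key"
        )
    return api_key
```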
### Core Azure Services
- **Azure OpenAI Service:** GPT-4o or GPT-4-turbo (latest available model)
- **Azure AI Search:** Vector store with hybrid search (vector + keyword)
- **Azure Blob Storage:** Document persistence and management
- **Azure Key Vault:** Secure credential management (optional but recommended)
### Agentic Framework
- **Orchestration:** Use LangChain or LlamaIndex for agent logic
- **Agent Pattern:** ReAct (Reasoning + Acting) or OpenAI function calling (a tool-schema sketch follows this list)
- **Tools/Capabilities:**
- Document retriever tool
- Summarization tool
- Query refinement tool
- Multi-step reasoning with thought traces
- Self-reflection and answer validation
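If the function-calling pattern is chosen, each tool is just a JSON schema passed alongside the chat completions call; a sketch of the retriever tool (name and fields are illustrative):
```python
# One retriever tool in the OpenAI function-calling schema. The agent loop
# passes this via tools=[RETRIEVER_TOOL]; the name and fields are illustrative.
RETRIEVER_TOOL = {
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "Search indexed documents for passages relevant to a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text."},
                "top_k": {"type": "integer", "description": "How many chunks to return."},
            },
            "required": ["query"],
        },
    },
}
```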
## 📋 Functional Requirements
### 1. Document Ingestion Pipeline
```
User uploads document → Extract text → Intelligent chunking (with overlap) →
Generate embeddings → Store in Azure AI Search + Blob Storage → Return success
```
**Requirements:**
- Support PDF, DOCX, TXT, and Markdown files (an extraction sketch follows this list)
- Implement smart chunking strategy (500-1000 tokens, 10-20% overlap)
- Extract and preserve metadata (filename, upload date, page numbers)
- Generate embeddings using Azure OpenAI (`text-embedding-3-large` or `text-embedding-ada-002`)
- Create Azure AI Search index with vector and text fields
- Handle large documents (chunking + batch processing)
- Progress indicators during upload
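A sketch of the extraction step, assuming `pypdf` and `python-docx`; the (page number, text) return shape keeps page metadata available for chunk-level citations:
```python
# Sketch: per-format text extraction. Returning (page_number, text) pairs
# preserves page metadata for later chunk-level citations.
from pathlib import Path

from docx import Document as DocxDocument  # python-docx
from pypdf import PdfReader


def extract_text(path: Path) -> list[tuple[int, str]]:
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(path)
        return [(i + 1, page.extract_text() or "") for i, page in enumerate(reader.pages)]
    if suffix == ".docx":
        # DOCX has no stable page concept; treat the whole body as one unit.
        doc = DocxDocument(path)
        return [(1, "\n".join(p.text for p in doc.paragraphs))]
    # TXT and Markdown are read as plain text.
    return [(1, path.read_text(encoding="utf-8"))]
```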
### 2. Agentic Query Processing
```
User query → Agent analyzes → Plans retrieval strategy → Retrieves context →
Reasons about information → Generates response → Cites sources → Returns to user
```
**Requirements:**
- Agent breaks down complex queries into sub-tasks
- Dynamic retrieval: fetch more context when the initial results are insufficient (sketched after this list)
- Hybrid search: combine vector similarity + keyword matching + semantic ranking
- Re-ranking of retrieved chunks for relevance
- Multi-step reasoning visible to user (show agent's "thoughts")
- Context-aware responses with proper citations
- Handle follow-up questions using conversation history
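A framework-agnostic sketch of the dynamic-retrieval requirement; `search_chunks` and `is_context_sufficient` are hypothetical placeholders for the hybrid search call and an LLM-based sufficiency check:
```python
# Hypothetical placeholders: wire these to the search service and an LLM
# yes/no judgment in the real implementation.
def search_chunks(query: str, top_k: int) -> list[str]:
    raise NotImplementedError


def is_context_sufficient(query: str, chunks: list[str]) -> bool:
    raise NotImplementedError


def retrieve_with_expansion(query: str, max_rounds: int = 3) -> list[str]:
    """Widen the search until the context looks sufficient or rounds run out."""
    chunks: list[str] = []
    top_k = 5
    for _ in range(max_rounds):
        chunks = search_chunks(query, top_k=top_k)
        if is_context_sufficient(query, chunks):
            break
        top_k *= 2  # fetch more context before the next attempt
    return chunks
```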
### 3. API Endpoints
**Document Management:**
- `POST /api/v1/documents/upload` - Upload and process documents
- `GET /api/v1/documents` - List all indexed documents
- `GET /api/v1/documents/{doc_id}` - Get document details
- `DELETE /api/v1/documents/{doc_id}` - Remove document and chunks
**Query & Chat:**
- `POST /api/v1/query` - Submit query with streaming response (a route sketch follows the endpoint list)
- `POST /api/v1/chat` - Conversational endpoint with history
- `GET /api/v1/chat/history/{session_id}` - Retrieve chat history
**System:**
- `GET /api/v1/health` - Health check
- `GET /api/v1/metrics` - Basic usage metrics
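A sketch of the query route's request/response contract (streaming omitted for brevity); the model fields are illustrative:
```python
# Sketch of the /query contract with Pydantic v2 models. The real handler
# would invoke the agent service and stream tokens; this stub shows the shape.
from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/api/v1")


class QueryRequest(BaseModel):
    question: str = Field(min_length=1, max_length=2000)
    session_id: str | None = None  # lets the agent reuse conversation history


class QueryResponse(BaseModel):
    answer: str
    sources: list[str]  # chunk/document identifiers used as citations


@router.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest) -> QueryResponse:
    return QueryResponse(answer="...", sources=[])
```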
## 🗂️ Project Structure
```
agentic-rag-app/
├── backend/
│ ├── app/
│ │ ├── __init__.py
│ │ ├── main.py # FastAPI application entry
│ │ ├── config.py # Configuration and settings
│ │ ├── dependencies.py # Dependency injection
│ │ ├── api/
│ │ │ ├── __init__.py
│ │ │ ├── routes/
│ │ │ │ ├── documents.py # Document endpoints
│ │ │ │ ├── query.py # Query endpoints
│ │ │ │ └── health.py # Health check
│ │ ├── services/
│ │ │ ├── __init__.py
│ │ │ ├── document_processor.py # Text extraction & chunking
│ │ │ ├── embedding_service.py # Azure OpenAI embeddings
│ │ │ ├── search_service.py # Azure AI Search operations
│ │ │ ├── agent_service.py # Agentic orchestration
│ │ │ └── llm_service.py # LLM interactions
│ │ ├── models/
│ │ │ ├── __init__.py
│ │ │ ├── requests.py # Pydantic request models
│ │ │ ├── responses.py # Pydantic response models
│ │ │ └── documents.py # Document data models
│ │ ├── utils/
│ │ │ ├── __init__.py
│ │ │ ├── logging.py # Logging configuration
│ │ │ ├── exceptions.py # Custom exceptions
│ │ │ └── azure_clients.py # Azure SDK clients
│ │ └── core/
│ │ ├── __init__.py
│ │ ├── security.py # Authentication
│ │ └── prompts.py # System prompts
│ ├── tests/
│ │ ├── __init__.py
│ │ └── test_api.py
│ ├── requirements.txt
│ ├── .env.example
│ └── Dockerfile
├── frontend/
│ ├── app.py # Streamlit main app
│ ├── components/
│ │ ├── __init__.py
│ │ ├── upload.py # Upload component
│ │ ├── chat.py # Chat interface
│ │ └── sidebar.py # Sidebar with settings
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── api_client.py # Backend API client
│ │ └── session.py # Session management
│ ├── requirements.txt
│ └── .streamlit/
│ └── config.toml
├── docs/
│ ├── README.md # Main documentation
│ ├── SETUP.md # Detailed setup guide
│ ├── ARCHITECTURE.md # Architecture explanation
│ ├── LEARNING_GUIDE.md # Educational walkthrough
│ └── architecture-diagram.mmd # Mermaid diagram
├── scripts/
│ ├── setup_azure.py # Azure resource setup script
│ └── seed_data.py # Sample data loader
├── sample_documents/
│ └── example.pdf # Test document
├── docker-compose.yml
├── .gitignore
└── README.md
```
## 🔧 Technical Implementation Details
### Technology Stack
- **Python:** 3.11+
- **Backend:** FastAPI 0.110+, uvicorn, python-multipart
- **Frontend:** Streamlit 1.32+
- **Azure SDKs:**
- `openai` (latest)
- `azure-search-documents`
- `azure-storage-blob`
- `azure-identity`
- **Agent Framework:** LangChain 0.1+ or LlamaIndex 0.10+
- **Document Processing:** pypdf (successor to the deprecated PyPDF2), python-docx, markdown
- **Data Validation:** Pydantic 2.0+
- **Additional:** python-dotenv, httpx, aiohttp
### Chunking Strategy
- Use semantic chunking (sentence-aware; a sketch follows this list)
- Target chunk size: 500-1000 tokens
- Overlap: 10-20% (50-200 tokens)
- Preserve document structure metadata
- Include document title/filename in each chunk
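A dependency-free sketch of sentence-aware chunking with overlap; a word count stands in for a real tokenizer (roughly 500 tokens ≈ 375 words), and the limits are illustrative:
```python
# Sketch: split on sentence boundaries, start a new chunk when the word
# budget is exceeded, and carry the last sentences forward as overlap.
import re


def chunk_text(text: str, max_words: int = 400, overlap_sentences: int = 2) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # overlap across boundaries
            count = sum(len(s.split()) for s in current)
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```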
### Embedding Configuration
- Model: `text-embedding-3-large` (3072 dimensions) or `text-embedding-ada-002`
- Batch processing for efficiency (sketched after this list)
- Normalize vectors for cosine similarity
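A sketch of batched embedding calls with the `openai` SDK's Azure client; the environment-variable names and API version are assumptions to adjust:
```python
# Sketch: batched embeddings via AsyncAzureOpenAI. `model` takes the Azure
# *deployment* name, not the base model name. Env-var names are placeholders.
import os

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)


async def embed_batch(texts: list[str], batch_size: int = 16) -> list[list[float]]:
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        response = await client.embeddings.create(
            model=os.environ["AZURE_OPENAI_EMBED_DEPLOYMENT"],
            input=texts[i : i + batch_size],
        )
        vectors.extend(item.embedding for item in response.data)
    return vectors
```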
### Azure AI Search Index Schema
```json
{
"name": "documents-index",
"fields": [
{"name": "id", "type": "Edm.String", "key": true},
{"name": "content", "type": "Edm.String", "searchable": true},
{"name": "embedding", "type": "Collection(Edm.Single)", "dimensions": 3072, "vectorSearchProfile": "default"},
{"name": "document_id", "type": "Edm.String", "filterable": true},
{"name": "document_name", "type": "Edm.String", "filterable": true},
{"name": "chunk_index", "type": "Edm.Int32"},
{"name": "metadata", "type": "Edm.String"}
]
}
```
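A sketch of creating that index with `azure-search-documents`; the class names match the 11.4 GA SDK, but verify against the installed version:
```python
# Sketch: build the index above programmatically. HNSW is the usual ANN
# algorithm choice; "default" matches the vectorSearchProfile in the schema.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
    SearchField(
        name="embedding",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=3072,
        vector_search_profile_name="default",
    ),
    SimpleField(name="document_id", type=SearchFieldDataType.String, filterable=True),
]

vector_search = VectorSearch(
    algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
    profiles=[VectorSearchProfile(name="default", algorithm_configuration_name="hnsw")],
)

index_client = SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)
index_client.create_or_update_index(
    SearchIndex(name="documents-index", fields=fields, vector_search=vector_search)
)
```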
### Agent Prompt Template
Include a clear system prompt (an example follows the list) that:
- Defines the agent's role as a helpful RAG assistant
- Instructs to use retrieved context
- Requires citation of sources
- Encourages asking clarifying questions
- Enables multi-step reasoning
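One possible shape for that prompt (in `prompts.py`); the wording is a starting point to tune, not a fixed requirement:
```python
# Illustrative system prompt covering grounding, citations, clarification,
# and step-by-step reasoning. Adjust tone and citation format as needed.
RAG_SYSTEM_PROMPT = """\
You are a helpful research assistant that answers questions using ONLY the
retrieved document context provided to you.

Rules:
1. Ground every claim in the retrieved chunks and cite them as [doc_name, chunk N].
2. If the context is insufficient, say so and ask a clarifying question
   instead of guessing.
3. For complex questions, reason step by step: break the question into
   sub-questions, answer each from the context, then synthesize a final answer.
"""
```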
## 📚 Learning Objectives & Documentation
### Include Detailed Explanations For:
1. **What is Agentic RAG?** How it differs from simple RAG
2. **Chunking Strategies:** Why overlap matters, semantic vs. fixed-size
3. **Embedding Models:** How vector similarity works
4. **Hybrid Search:** Combining vector + keyword + semantic ranking
5. **Agent Reasoning:** ReAct pattern, tool use, chain-of-thought
6. **Prompt Engineering:** System prompts, few-shot examples, context construction
7. **Performance Optimization:** Caching, batch processing, async operations
8. **Error Handling:** Graceful degradation, retry logic, user-friendly messages
### Create These Educational Documents:
- **ARCHITECTURE.md:** System design with Mermaid diagram
- **LEARNING_GUIDE.md:** Step-by-step explanation of each component
- **SETUP.md:** Local development setup, Azure configuration
- **API_DOCS.md:** Endpoint documentation with examples
## 🎨 Code Quality Requirements
- **Type Hints:** Use throughout (functions, variables, return types)
- **Comments:** Explain WHY, not just WHAT
- **Docstrings:** Google or NumPy style for all functions/classes
- **Error Handling:** Try-except blocks with specific exceptions (see the sketch after this list)
- **Logging:** Use structured logging (JSON) with appropriate levels
- **PEP 8:** Follow Python style guide
- **Async/Await:** Use for I/O operations
- **Configuration:** All credentials/settings in environment variables
- **Security:** Never hardcode secrets, validate all inputs
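A small sketch combining a specific custom exception with stdlib JSON logging; the names are illustrative, and a library like `structlog` would also fit:
```python
# Sketch: a domain-specific exception plus a JSON log formatter. FastAPI
# exception handlers would map DocumentProcessingError to an HTTP response.
import json
import logging


class DocumentProcessingError(Exception):
    """Raised when extraction, chunking, or indexing of an upload fails."""

    def __init__(self, doc_name: str, stage: str, detail: str) -> None:
        super().__init__(f"{stage} failed for {doc_name}: {detail}")
        self.doc_name, self.stage, self.detail = doc_name, stage, detail


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {"level": record.levelname, "logger": record.name, "message": record.getMessage()}
        )


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```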
## 🚀 Deployment & Running
### Local Development
1. Set up Azure resources (provide script or manual steps)
2. Configure `.env` file with credentials
3. Install dependencies: `pip install -r requirements.txt`
4. Run backend: `uvicorn app.main:app --reload` (from inside `backend/`)
5. Run frontend: `streamlit run frontend/app.py`
### Docker Support
- Include `Dockerfile` for both services
- `docker-compose.yml` to run full stack
- Health checks and proper networking
## ✨ Optional Enhancements (If Possible)
- **Memory/History:** Store conversation context for multi-turn chats
- **Observability:** Integration with Langfuse or OpenTelemetry
- **Caching:** Redis for frequently accessed results
- **Rate Limiting:** Protect API endpoints
- **Admin UI:** View usage statistics, manage documents
- **Export:** Download chat history or generated responses
- **Evaluation:** Include retrieval quality metrics
## 📦 Deliverables
1. **Complete working codebase** (all files in proper structure)
2. **requirements.txt** with pinned versions
3. **.env.example** with all required variables documented
4. **README.md** with quick start guide
5. **Detailed documentation** (SETUP.md, ARCHITECTURE.md, LEARNING_GUIDE.md)
6. **Sample data** for testing (example.pdf or similar)
7. **Mermaid diagram** showing data flow
8. **Inline comments** explaining complex logic
## 🎯 Generation Instructions for Claude
Please generate this project **step-by-step**:
1. **First:** Show the complete project structure (folder tree)
2. **Second:** Generate backend core files (config, models, main.py)
3. **Third:** Implement services (document processing, embeddings, search, agent)
4. **Fourth:** Create API routes (documents, query, health)
5. **Fifth:** Build Streamlit frontend (main app, components)
6. **Sixth:** Add configuration files (requirements.txt, .env.example, docker files)
7. **Seventh:** Create documentation (README, SETUP, ARCHITECTURE, LEARNING_GUIDE)
8. **Eighth:** Include sample prompts and test data
For each file, add:
- Clear comments explaining key concepts
- Type hints for all functions
- Error handling examples
- Educational notes where relevant
## 📊 Architecture Diagram Request
Please also create a Mermaid diagram (`docs/architecture-diagram.mmd`) showing:
- User interaction with Streamlit UI
- HTTP requests to FastAPI backend
- Document upload flow (extraction → chunking → embedding → indexing)
- Query processing flow (query → agent → retrieval → LLM → response)
- Azure services interactions (OpenAI, AI Search, Blob Storage)
- Data flow between all components
Use proper Mermaid syntax (flowchart or sequence diagram) that can be rendered in VS Code or GitHub.
---
## 🎯 My Goal
Build this application to deeply understand Agentic RAG architecture, Azure AI services integration, and production-ready Python development. The code should be clean, well-documented, and serve as a reference implementation for building intelligent document retrieval systems with autonomous agent capabilities.
---
## 💡 Follow-Up Instructions
After pasting this prompt to Claude, follow up with:
> "Please start by generating the folder structure and backend configuration files first. Then proceed step-by-step through each component, ensuring all code includes detailed comments and explanations."
This will help Claude generate organized, manageable code blocks that you can review and learn from systematically.