# 🎯 **ULTIMATE AGENTIC RAG APPLICATION PROMPT**
## Complete Production-Ready Agentic RAG Application
I need you to generate a full-stack, production-ready Agentic RAG (Retrieval-Augmented Generation) application in Python. This is a learning project, so please include extensive comments, explanations, and educational content throughout.
## 🏗️ Architecture Requirements
### Frontend Layer
**Technology:** Streamlit
**Features:**
- Modern, intuitive UI for document upload (PDF, DOCX, TXT, Markdown)
- Interactive query interface with real-time streaming responses
- Display of retrieved source documents with confidence scores
- Conversation history panel for multi-turn dialogues
- Source citation display (show which document chunks were used)
- Visual indicators for agent reasoning steps
- Session state management (a minimal chat sketch follows this list)
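For reference, a minimal sketch of what the chat component and session-state handling might look like. The backend URL, endpoint path, and payload shape are assumptions, not a fixed contract:

```python
# frontend/components/chat.py -- minimal sketch; endpoint and payload names are assumptions
import requests
import streamlit as st

BACKEND_URL = "http://localhost:8000"  # assumed backend address

def render_chat() -> None:
    """Render the conversation history and a chat input box."""
    # Session state keeps the conversation across Streamlit reruns
    if "messages" not in st.session_state:
        st.session_state.messages = []

    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.markdown(msg["content"])

    if prompt := st.chat_input("Ask a question about your documents"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)
        # Hypothetical backend call -- adjust to the real API contract
        resp = requests.post(
            f"{BACKEND_URL}/api/v1/chat",
            json={"message": prompt, "history": st.session_state.messages},
            timeout=120,
        )
        answer = resp.json().get("answer", "")
        st.session_state.messages.append({"role": "assistant", "content": answer})
        with st.chat_message("assistant"):
            st.markdown(answer)
```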
### Backend Layer
**Technology:** FastAPI
**Features:**
- RESTful API with async/await patterns
- Automatic API documentation (Swagger/OpenAPI)
- CORS middleware for frontend integration
- Proper error handling with custom exception classes
- Structured logging (JSON format)
- Request validation using Pydantic models
- Health check and metrics endpoints
- API key authentication for security (see the app-wiring sketch after this list)
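For reference, one way the app wiring (CORS plus API key auth) might look; the header name, allowed origin, and environment variable are illustrative assumptions:

```python
# backend/app/main.py -- wiring sketch; header name, origin, and env var are illustrative
import os

from fastapi import FastAPI, HTTPException, Security
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import APIKeyHeader

app = FastAPI(title="Agentic RAG API", version="0.1.0")

# CORS so the Streamlit frontend (default port 8501) can call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8501"],
    allow_methods=["*"],
    allow_headers=["*"],
)

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: str = Security(api_key_header)) -> str:
    """Reject requests that do not carry the expected API key."""
    expected = os.getenv("API_KEY", "")  # never hardcode secrets
    if not expected or api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

@app.get("/api/v1/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}
```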
### Core Azure Services
- **Azure OpenAI Service:** GPT-4o or GPT-4-turbo (latest available model)
- **Azure AI Search:** Vector store with hybrid search (vector + keyword)
- **Azure Blob Storage:** Document persistence and management
- **Azure Key Vault:** Secure credential management (optional but recommended)
### Agentic Framework
- **Orchestration:** Use LangChain or LlamaIndex for agent logic
- **Agent Pattern:** ReAct (Reasoning + Acting) or OpenAI function calling
- **Tools/Capabilities:**
- Document retriever tool
- Summarization tool
- Query refinement tool
- Multi-step reasoning with thought traces
- Self-reflection and answer validation
## 📋 Functional Requirements
### 1. Document Ingestion Pipeline
```
User uploads document → Extract text → Intelligent chunking (with overlap) →
Generate embeddings → Store in Azure AI Search + Blob Storage → Return success
```
**Requirements:**
- Support PDF, DOCX, TXT, and Markdown files (a file-type dispatch sketch follows this list)
- Implement a smart chunking strategy (500-1000 tokens, 10-20% overlap)
- Extract and preserve metadata (filename, upload date, page numbers)
- Generate embeddings using Azure OpenAI (text-embedding-3-large or text-embedding-ada-002)
- Create Azure AI Search index with vector and text fields
- Handle large documents (chunking + batch processing)
- Progress indicators during upload
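For reference, a sketch of the file-type dispatch for text extraction, assuming `pypdf` and `python-docx` are installed; the function name and error handling are illustrative:

```python
# Text-extraction dispatch sketch; assumes pypdf and python-docx are installed
import io

from docx import Document
from pypdf import PdfReader

def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from PDF, DOCX, TXT, or Markdown bytes."""
    name = filename.lower()
    if name.endswith(".pdf"):
        reader = PdfReader(io.BytesIO(data))
        # extract_text() can return an empty string for image-only pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if name.endswith(".docx"):
        doc = Document(io.BytesIO(data))
        return "\n".join(p.text for p in doc.paragraphs)
    if name.endswith((".txt", ".md")):
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"Unsupported file type: {filename}")
```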
### 2. Agentic Query Processing
```
User query → Agent analyzes → Plans retrieval strategy → Retrieves context →
Reasons about information → Generates response → Cites sources → Returns to user
```
**Requirements:**
- Agent breaks down complex queries into sub-tasks
- Dynamic retrieval: fetch more context if the initial results are insufficient (see the loop sketch after this list)
- Hybrid search: combine vector similarity + keyword matching + semantic ranking
- Re-ranking of retrieved chunks for relevance
- Multi-step reasoning visible to the user (show the agent's "thoughts")
- Context-aware responses with proper citations
- Handle follow-up questions using conversation history
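For reference, a framework-agnostic sketch of the retrieve-reflect-retry loop. `hybrid_search`, `judge_sufficiency`, and `generate_answer` are stand-ins for the real service calls, not actual APIs:

```python
# Sketch of the agent's retrieve-reflect-retry loop. The three helper functions
# below are stand-ins (assumptions) for the real search and LLM service calls.
import asyncio

async def hybrid_search(query: str, top_k: int = 5) -> list[dict]:
    """Stand-in for the Azure AI Search hybrid query."""
    return [{"content": f"(chunk matching '{query}')", "score": 0.9}]

async def judge_sufficiency(query: str, context: list[str]) -> dict:
    """Stand-in for an LLM call that decides whether the context answers the query."""
    return {"sufficient": len(context) >= 3, "refined_query": query + " (refined)"}

async def generate_answer(query: str, context: list[str]) -> str:
    """Stand-in for the final grounded-generation LLM call."""
    return f"Answer to '{query}' grounded in {len(context)} chunks."

async def answer_query(query: str, max_steps: int = 3) -> dict:
    """Dynamic retrieval: fetch more context until the agent judges it sufficient."""
    context: list[str] = []
    search_query = query
    for step in range(1, max_steps + 1):
        hits = await hybrid_search(search_query)
        context.extend(hit["content"] for hit in hits)
        verdict = await judge_sufficiency(query, context)  # self-reflection step
        if verdict["sufficient"]:
            break
        search_query = verdict["refined_query"]  # query refinement tool in action
    answer = await generate_answer(query, context)
    return {"answer": answer, "sources": context, "reasoning_steps": step}

if __name__ == "__main__":
    print(asyncio.run(answer_query("What does the contract say about renewal?")))
```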
### 3. API Endpoints
**Document Management:**
- `POST /api/v1/documents/upload` - Upload and process documents
- `GET /api/v1/documents` - List all indexed documents
- `GET /api/v1/documents/{doc_id}` - Get document details
- `DELETE /api/v1/documents/{doc_id}` - Remove document and chunks
**Query & Chat:**
- `POST /api/v1/query` - Submit a query with a streaming response (a route sketch follows these endpoint lists)
- `POST /api/v1/chat` - Conversational endpoint with history
- `GET /api/v1/chat/history/{session_id}` - Retrieve chat history
**System:**
- `GET /api/v1/health` - Health check
- `GET /api/v1/metrics` - Basic usage metrics
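For reference, a sketch of the streaming query route; the request model and the `token_stream` generator are stand-ins for the real agent pipeline:

```python
# Streaming endpoint sketch; token_stream is a stand-in for the agent pipeline
from collections.abc import AsyncIterator

from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1")

class QueryRequest(BaseModel):
    query: str
    session_id: str | None = None

async def token_stream(query: str) -> AsyncIterator[str]:
    """Stand-in generator; the real version would yield LLM tokens as they arrive."""
    for token in ["This ", "is ", "a ", "streamed ", "answer."]:
        yield token

@router.post("/query")
async def submit_query(body: QueryRequest) -> StreamingResponse:
    """Stream the agent's answer back token by token."""
    return StreamingResponse(token_stream(body.query), media_type="text/plain")
```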
## 🗂️ Project Structure
```
agentic-rag-app/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py                     # FastAPI application entry
│   │   ├── config.py                   # Configuration and settings
│   │   ├── dependencies.py             # Dependency injection
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   └── routes/
│   │   │       ├── documents.py        # Document endpoints
│   │   │       ├── query.py            # Query endpoints
│   │   │       └── health.py           # Health check
│   │   ├── services/
│   │   │   ├── __init__.py
│   │   │   ├── document_processor.py   # Text extraction & chunking
│   │   │   ├── embedding_service.py    # Azure OpenAI embeddings
│   │   │   ├── search_service.py       # Azure AI Search operations
│   │   │   ├── agent_service.py        # Agentic orchestration
│   │   │   └── llm_service.py          # LLM interactions
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── requests.py             # Pydantic request models
│   │   │   ├── responses.py            # Pydantic response models
│   │   │   └── documents.py            # Document data models
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── logging.py              # Logging configuration
│   │   │   ├── exceptions.py           # Custom exceptions
│   │   │   └── azure_clients.py        # Azure SDK clients
│   │   └── core/
│   │       ├── __init__.py
│   │       ├── security.py             # Authentication
│   │       └── prompts.py              # System prompts
│   ├── tests/
│   │   ├── __init__.py
│   │   └── test_api.py
│   ├── requirements.txt
│   ├── .env.example
│   └── Dockerfile
├── frontend/
│   ├── app.py                          # Streamlit main app
│   ├── components/
│   │   ├── __init__.py
│   │   ├── upload.py                   # Upload component
│   │   ├── chat.py                     # Chat interface
│   │   └── sidebar.py                  # Sidebar with settings
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── api_client.py               # Backend API client
│   │   └── session.py                  # Session management
│   ├── requirements.txt
│   └── .streamlit/
│       └── config.toml
├── docs/
│   ├── README.md                       # Main documentation
│   ├── SETUP.md                        # Detailed setup guide
│   ├── ARCHITECTURE.md                 # Architecture explanation
│   ├── LEARNING_GUIDE.md               # Educational walkthrough
│   └── architecture-diagram.mmd        # Mermaid diagram
├── scripts/
│   ├── setup_azure.py                  # Azure resource setup script
│   └── seed_data.py                    # Sample data loader
├── sample_documents/
│   └── example.pdf                     # Test document
├── docker-compose.yml
├── .gitignore
└── README.md
```
## 🔧 Technical Implementation Details
### Technology Stack
- **Python:** 3.11+
- **Backend:** FastAPI 0.110+, uvicorn, python-multipart
- **Frontend:** Streamlit 1.32+
- **Azure SDKs:**
- `openai` (latest)
- `azure-search-documents`
- `azure-storage-blob`
- `azure-identity`
- **Agent Framework:** LangChain 0.1+ or LlamaIndex 0.10+
- **Document Processing:** pypdf (the maintained successor to PyPDF2), python-docx, markdown
- **Data Validation:** Pydantic 2.0+
- **Additional:** python-dotenv, httpx, aiohttp
### Chunking Strategy
- Use semantic chunking (sentence-aware); a runnable sketch follows this list
- Target chunk size: 500-1000 tokens
- Overlap: 10-20% (50-200 tokens)
- Preserve document structure metadata
- Include document title/filename in each chunk
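For reference, a runnable sketch of sentence-aware chunking with overlap. The 4-characters-per-token estimate is a simplifying assumption; swap in `tiktoken` for exact counts:

```python
# Sentence-aware chunking sketch with a token-budget overlap.
# Token counts use a ~4-chars-per-token heuristic (an assumption).
import re

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def chunk_text(text: str, title: str, max_tokens: int = 800, overlap_tokens: int = 100) -> list[str]:
    """Split text into sentence-aware chunks with overlap, prefixed by the document title."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        current.append(sentence)
        current_tokens += estimate_tokens(sentence)
        if current_tokens >= max_tokens:
            chunks.append(f"[{title}] " + " ".join(current))
            # Carry trailing sentences forward until the overlap budget is met
            carried: list[str] = []
            carried_tokens = 0
            for prev in reversed(current):
                carried_tokens += estimate_tokens(prev)
                carried.insert(0, prev)
                if carried_tokens >= overlap_tokens:
                    break
            current, current_tokens = carried, carried_tokens
    if current:
        chunks.append(f"[{title}] " + " ".join(current))
    return chunks
```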
### Embedding Configuration
- Model: `text-embedding-3-large` (3072 dimensions) or `text-embedding-ada-002`
- Batch processing for efficiency (see the sketch after this list)
- Normalize vectors for cosine similarity
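For reference, a batch-embedding sketch using the `openai` SDK's `AzureOpenAI` client; the API version and deployment name are placeholders to adjust for your resource:

```python
# Batch embedding sketch with the openai SDK (>= 1.0).
# Endpoint, key, deployment name, and API version are placeholders (assumptions).
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # pick the version your resource supports
)

def embed_batch(texts: list[str], batch_size: int = 16) -> list[list[float]]:
    """Embed texts in batches to stay within request-size limits."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        response = client.embeddings.create(
            input=batch,
            model="text-embedding-3-large",  # your Azure deployment name may differ
        )
        # The API returns one embedding per input, in input order
        vectors.extend(item.embedding for item in response.data)
    return vectors
```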
### Azure AI Search Index Schema
```json
{
  "name": "documents-index",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "content", "type": "Edm.String", "searchable": true},
    {"name": "embedding", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 3072, "vectorSearchProfile": "default"},
    {"name": "document_id", "type": "Edm.String", "filterable": true},
    {"name": "document_name", "type": "Edm.String", "filterable": true},
    {"name": "chunk_index", "type": "Edm.Int32"},
    {"name": "metadata", "type": "Edm.String"}
  ]
}
```

Note that vector fields must also be marked `searchable`, and the full index definition needs a companion `vectorSearch` section that declares the `default` profile (typically backed by an HNSW algorithm configuration).
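For reference, a hybrid-query sketch against this index using `azure-search-documents` (11.4+); the endpoint, key, and caller-supplied query vector (produced by the embedding service) are assumptions:

```python
# Hybrid query sketch (keyword + vector) with azure-search-documents >= 11.4.
# Endpoint, key, and index name are placeholders (assumptions).
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="documents-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

def hybrid_search(query: str, query_vector: list[float], top: int = 5) -> list[dict]:
    """Combine keyword (BM25) and vector retrieval in a single request."""
    results = search_client.search(
        search_text=query,  # keyword side of the hybrid query
        vector_queries=[
            VectorizedQuery(vector=query_vector, k_nearest_neighbors=top, fields="embedding")
        ],
        select=["content", "document_name", "chunk_index"],
        top=top,
    )
    return [dict(r) for r in results]
```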
### Agent Prompt Template
Include a clear system prompt that:
- Defines the agent's role as a helpful RAG assistant
- Instructs to use retrieved context
- Requires citation of sources
- Encourages asking clarifying questions
- Enables multi-step reasoning
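For reference, one possible prompt implementing these points; the wording is illustrative, and `{context}` is filled with retrieved chunks at query time:

```python
# backend/app/core/prompts.py -- illustrative system prompt, not the definitive wording
RAG_SYSTEM_PROMPT = """You are a helpful research assistant that answers questions
using ONLY the retrieved document excerpts provided in the context.

Rules:
1. Ground every claim in the context; if the context is insufficient, say so
   and ask a clarifying question instead of guessing.
2. Cite sources inline as [document_name, chunk N] after each supported claim.
3. For complex questions, reason step by step: break the question into parts,
   answer each from the context, then synthesize a final answer.
4. Never invent citations or facts that are not in the context.

Context:
{context}
"""
```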
## 📚 Learning Objectives & Documentation
### Include Detailed Explanations For:
1. **What is Agentic RAG?** How it differs from simple RAG
2. **Chunking Strategies:** Why overlap matters, semantic vs. fixed-size
3. **Embedding Models:** How vector similarity works
4. **Hybrid Search:** Combining vector + keyword + semantic ranking
5. **Agent Reasoning:** ReAct pattern, tool use, chain-of-thought
6. **Prompt Engineering:** System prompts, few-shot examples, context construction
7. **Performance Optimization:** Caching, batch processing, async operations
8. **Error Handling:** Graceful degradation, retry logic, user-friendly messages
### Create These Educational Documents:
- **ARCHITECTURE.md:** System design with Mermaid diagram
- **LEARNING_GUIDE.md:** Step-by-step explanation of each component
- **SETUP.md:** Local development setup, Azure configuration
- **API_DOCS.md:** Endpoint documentation with examples
## 🎨 Code Quality Requirements
- **Type Hints:** Use throughout (functions, variables, return types)
- **Comments:** Explain WHY, not just WHAT
- **Docstrings:** Google or NumPy style for all functions/classes
- **Error Handling:** Try-except blocks with specific exceptions
- **Logging:** Use structured logging (JSON) with appropriate levels (a minimal setup follows this list)
- **PEP 8:** Follow Python style guide
- **Async/Await:** Use for I/O operations
- **Configuration:** All credentials/settings in environment variables
- **Security:** Never hardcode secrets, validate all inputs
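For reference, a minimal structured-logging setup that satisfies the JSON requirement; the field names are illustrative:

```python
# Minimal structured (JSON) logging setup -- one way to satisfy the logging requirement
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logger = logging.getLogger("agentic_rag")
logger.info("Service started")  # -> {"level": "INFO", "logger": "agentic_rag", ...}
```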
## 🚀 Deployment & Running
### Local Development
1. Set up Azure resources (provide script or manual steps)
2. Configure `.env` file with credentials
3. Install dependencies: `pip install -r requirements.txt`
4. Run backend (from the `backend/` directory): `uvicorn app.main:app --reload`
5. Run frontend: `streamlit run frontend/app.py`
### Docker Support
- Include `Dockerfile` for both services
- `docker-compose.yml` to run full stack
- Health checks and proper networking
## ✨ Optional Enhancements (If Possible)
- **Memory/History:** Store conversation context for multi-turn chats
- **Observability:** Integration with Langfuse or OpenTelemetry
- **Caching:** Redis for frequently accessed results
- **Rate Limiting:** Protect API endpoints
- **Admin UI:** View usage statistics, manage documents
- **Export:** Download chat history or generated responses
- **Evaluation:** Include retrieval quality metrics
## 📦 Deliverables
1. **Complete working codebase** (all files in proper structure)
2. **requirements.txt** files (backend and frontend) with pinned versions
3. **.env.example** with all required variables documented
4. **README.md** with quick start guide
5. **Detailed documentation** (SETUP.md, ARCHITECTURE.md, LEARNING_GUIDE.md)
6. **Sample data** for testing (example.pdf or similar)
7. **Mermaid diagram** showing data flow
8. **Inline comments** explaining complex logic
## 🎯 Generation Instructions for Claude
Please generate this project **step-by-step**:
1. **First:** Show the complete project structure (folder tree)
2. **Second:** Generate backend core files (config, models, main.py)
3. **Third:** Implement services (document processing, embeddings, search, agent)
4. **Fourth:** Create API routes (documents, query, health)
5. **Fifth:** Build Streamlit frontend (main app, components)
6. **Sixth:** Add configuration files (requirements.txt, .env.example, docker files)
7. **Seventh:** Create documentation (README, SETUP, ARCHITECTURE, LEARNING_GUIDE)
8. **Eighth:** Include sample prompts and test data
For each file, add:
- Clear comments explaining key concepts
- Type hints for all functions
- Error handling examples
- Educational notes where relevant
## 📊 Architecture Diagram Request
Please also create a Mermaid diagram (`docs/architecture-diagram.mmd`) showing:
- User interaction with Streamlit UI
- HTTP requests to FastAPI backend
- Document upload flow (extraction → chunking → embedding → indexing)
- Query processing flow (query → agent → retrieval → LLM → response)
- Azure services interactions (OpenAI, AI Search, Blob Storage)
- Data flow between all components
Use proper Mermaid syntax (flowchart or sequence diagram) that can be rendered in VS Code or GitHub.
---
## 🎯 My Goal
Build this application to deeply understand Agentic RAG architecture, Azure AI services integration, and production-ready Python development. The code should be clean, well-documented, and serve as a reference implementation for building intelligent document retrieval systems with autonomous agent capabilities.
---
## 💡 Follow-Up Instructions
After pasting this prompt to Claude, follow up with:
> "Please start by generating the folder structure and backend configuration files first. Then proceed step-by-step through each component, ensuring all code includes detailed comments and explanations."
This will help Claude generate organized, manageable code blocks that you can review and learn from systematically.