# 🎯 **ULTIMATE AGENTIC RAG APPLICATION PROMPT**


## Complete Production-Ready Agentic RAG Application


I need you to generate a full-stack, production-ready Agentic RAG (Retrieval-Augmented Generation) application in Python. This is a learning project, so please include extensive comments, explanations, and educational content throughout.


## 🏗️ Architecture Requirements


### Frontend Layer

**Technology:** Streamlit

**Features:**

- Modern, intuitive UI for document upload (PDF, DOCX, TXT, Markdown)

- Interactive query interface with real-time streaming responses

- Display of retrieved source documents with confidence scores

- Conversation history panel for multi-turn dialogues

- Source citation display (show which document chunks were used)

- Visual indicators for agent reasoning steps

- Session state management
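
To make the session-state requirement concrete, here is a minimal sketch of a Streamlit chat loop that survives reruns; the message schema and widget copy are placeholders, not requirements:

```python
# Minimal sketch (not the final component) of Streamlit session state
# for multi-turn chat. The message schema here is an assumption.
import streamlit as st

# Initialize conversation history once per browser session.
if "messages" not in st.session_state:
    st.session_state.messages = []  # list of {"role": ..., "content": ...}

# Replay prior turns so the conversation survives Streamlit reruns.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Append new turns; the real app would call the FastAPI backend here.
if prompt := st.chat_input("Ask about your documents..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
```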


### Backend Layer

**Technology:** FastAPI

**Features:**

- RESTful API with async/await patterns

- Automatic API documentation (Swagger/OpenAPI)

- CORS middleware for frontend integration

- Proper error handling with custom exception classes

- Structured logging (JSON format)

- Request validation using Pydantic models

- Health check and metrics endpoints

- API key authentication for security
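
For the API key requirement, something along these lines would do; the header name `X-API-Key` and the `APP_API_KEY` environment variable are assumptions to adapt:

```python
# Sketch of API-key authentication as a reusable FastAPI dependency.
# The header name and the APP_API_KEY env var are illustrative assumptions.
import os

from fastapi import Depends, FastAPI, HTTPException, Security, status
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def verify_api_key(api_key: str | None = Security(api_key_header)) -> str:
    """Reject any request whose X-API-Key header doesn't match the configured key."""
    if not api_key or api_key != os.environ.get("APP_API_KEY"):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or missing API key",
        )
    return api_key

app = FastAPI()

@app.get("/api/v1/documents", dependencies=[Depends(verify_api_key)])
async def list_documents() -> list[dict]:
    return []  # placeholder; the real route lives in api/routes/documents.py
```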


### Core Azure Services

- **Azure OpenAI Service:** GPT-4o or GPT-4-turbo (latest available model)

- **Azure AI Search:** Vector store with hybrid search (vector + keyword)

- **Azure Blob Storage:** Document persistence and management

- **Azure Key Vault:** Secure credential management (optional but recommended)


### Agentic Framework

- **Orchestration:** Use LangChain or LlamaIndex for agent logic

- **Agent Pattern:** ReAct (Reasoning + Acting) or OpenAI function calling

- **Tools/Capabilities:**

  - Document retriever tool

  - Summarization tool

  - Query refinement tool

  - Multi-step reasoning with thought traces

  - Self-reflection and answer validation
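
If you choose the function-calling pattern, the core agent loop might look like this sketch; the deployment name, the tool schema, and the `retrieve_documents` stub are all placeholders:

```python
# Sketch of an OpenAI function-calling agent loop. The deployment name,
# tool schema, and retrieve_documents stub are illustrative assumptions.
import json
from openai import AzureOpenAI

# Reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, OPENAI_API_VERSION.
client = AzureOpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "retrieve_documents",
        "description": "Search the document index for relevant chunks.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def retrieve_documents(query: str) -> str:
    """Placeholder for the real Azure AI Search retriever tool."""
    return json.dumps([{"content": "...", "source": "example.pdf"}])

def run_agent(question: str, deployment: str = "gpt-4o") -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        response = client.chat.completions.create(
            model=deployment, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        if not msg.tool_calls:            # agent decided it can answer
            return msg.content
        messages.append(msg)              # keep the tool call in history
        for call in msg.tool_calls:       # execute each requested tool
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": retrieve_documents(**args),
            })
```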


## 📋 Functional Requirements


### 1. Document Ingestion Pipeline

```

User uploads document → Extract text → Intelligent chunking (with overlap) → 

Generate embeddings → Store in Azure AI Search + Blob Storage → Return success

```


**Requirements:**

- Support PDF, DOCX, TXT, and Markdown files

- Implement smart chunking strategy (500-1000 tokens, 10-20% overlap)

- Extract and preserve metadata (filename, upload date, page numbers)

- Generate embeddings using Azure OpenAI (text-embedding-3-large or ada-002)

- Create Azure AI Search index with vector and text fields

- Handle large documents (chunking + batch processing)

- Progress indicators during upload
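
As a reference for the overall shape of this pipeline, here is a high-level sketch; every helper it calls (`extract_text`, `chunk_text`, `embed_batch`, `blob_container`, `search_client`) is a hypothetical name for a service you will implement:

```python
# High-level sketch of the ingestion flow. All helpers referenced here
# (extract_text, chunk_text, embed_batch, blob_container, search_client)
# are hypothetical names for services implemented elsewhere in the app.
async def ingest_document(doc_id: str, filename: str, data: bytes) -> int:
    """Extract, chunk, embed, and index one uploaded document.

    doc_id must be URL-safe: Azure AI Search keys only allow letters,
    digits, dashes, underscores, and equal signs.
    """
    text = extract_text(filename, data)                 # PDF/DOCX/TXT/MD
    chunks = chunk_text(text, max_tokens=800, overlap_tokens=120)
    vectors = await embed_batch(chunks)                 # batched embeddings

    blob_container.upload_blob(filename, data, overwrite=True)  # raw copy
    search_client.upload_documents([                    # one record per chunk
        {
            "id": f"{doc_id}-{i}",
            "content": chunk,
            "embedding": vector,
            "document_id": doc_id,
            "document_name": filename,
            "chunk_index": i,
        }
        for i, (chunk, vector) in enumerate(zip(chunks, vectors))
    ])
    return len(chunks)
```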


### 2. Agentic Query Processing

```

User query → Agent analyzes → Plans retrieval strategy → Retrieves context → 

Reasons about information → Generates response → Cites sources → Returns to user

```


**Requirements:**

- Agent breaks down complex queries into sub-tasks

- Dynamic retrieval: fetch more context if initial results insufficient

- Hybrid search: combine vector similarity + keyword matching + semantic ranking (see the sketch after this list)

- Re-ranking of retrieved chunks for relevance

- Multi-step reasoning visible to user (show agent's "thoughts")

- Context-aware responses with proper citations

- Handle follow-up questions using conversation history
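
The hybrid-search requirement flagged above could be served by a call like this sketch, which assumes azure-search-documents 11.4+, the `documents-index` schema defined below, and a semantic configuration named "default" on the index:

```python
# Sketch of a hybrid query against Azure AI Search (SDK 11.4+): keyword
# text plus a vector query, re-ranked semantically. The endpoint, key,
# and semantic configuration name are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("<query-key>"),
)

def hybrid_search(query: str, query_vector: list[float], top: int = 5):
    """Combine keyword matching and vector similarity in one request."""
    results = search_client.search(
        search_text=query,                      # keyword (BM25) side
        vector_queries=[VectorizedQuery(
            vector=query_vector,                # vector-similarity side
            k_nearest_neighbors=top,
            fields="embedding",
        )],
        query_type="semantic",                  # enable semantic re-ranking
        semantic_configuration_name="default",  # assumed config name
        top=top,
    )
    return [(r["content"], r["@search.score"]) for r in results]
```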


### 3. API Endpoints


**Document Management:**

- `POST /api/v1/documents/upload` - Upload and process documents

- `GET /api/v1/documents` - List all indexed documents

- `GET /api/v1/documents/{doc_id}` - Get document details

- `DELETE /api/v1/documents/{doc_id}` - Remove document and chunks


**Query & Chat:**

- `POST /api/v1/query` - Submit query with streaming response

- `POST /api/v1/chat` - Conversational endpoint with history

- `GET /api/v1/chat/history/{session_id}` - Retrieve chat history


**System:**

- `GET /api/v1/health` - Health check

- `GET /api/v1/metrics` - Basic usage metrics
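
For the streaming query endpoint, the route might be shaped like this sketch; `QueryRequest` and the `stream_answer` generator are assumed names for pieces built elsewhere:

```python
# Sketch of the streaming query endpoint. QueryRequest and the
# stream_answer generator are assumed names, not a fixed contract.
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1")

class QueryRequest(BaseModel):
    query: str
    session_id: str | None = None

async def stream_answer(query: str):
    """Placeholder async generator; the real one yields LLM tokens."""
    for token in ["Retrieved ", "context ", "says ", "..."]:
        yield token

@router.post("/query")
async def submit_query(request: QueryRequest) -> StreamingResponse:
    # text/event-stream also works if the frontend parses SSE instead.
    return StreamingResponse(stream_answer(request.query), media_type="text/plain")
```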


## 🗂️ Project Structure


```

agentic-rag-app/

├── backend/

│   ├── app/

│   │   ├── __init__.py

│   │   ├── main.py                 # FastAPI application entry

│   │   ├── config.py               # Configuration and settings

│   │   ├── dependencies.py         # Dependency injection

│   │   ├── api/

│   │   │   ├── __init__.py

│   │   │   ├── routes/

│   │   │   │   ├── documents.py    # Document endpoints

│   │   │   │   ├── query.py        # Query endpoints

│   │   │   │   └── health.py       # Health check

│   │   ├── services/

│   │   │   ├── __init__.py

│   │   │   ├── document_processor.py  # Text extraction & chunking

│   │   │   ├── embedding_service.py   # Azure OpenAI embeddings

│   │   │   ├── search_service.py      # Azure AI Search operations

│   │   │   ├── agent_service.py       # Agentic orchestration

│   │   │   └── llm_service.py         # LLM interactions

│   │   ├── models/

│   │   │   ├── __init__.py

│   │   │   ├── requests.py         # Pydantic request models

│   │   │   ├── responses.py        # Pydantic response models

│   │   │   └── documents.py        # Document data models

│   │   ├── utils/

│   │   │   ├── __init__.py

│   │   │   ├── logging.py          # Logging configuration

│   │   │   ├── exceptions.py       # Custom exceptions

│   │   │   └── azure_clients.py    # Azure SDK clients

│   │   └── core/

│   │       ├── __init__.py

│   │       ├── security.py         # Authentication

│   │       └── prompts.py          # System prompts

│   ├── tests/

│   │   ├── __init__.py

│   │   └── test_api.py

│   ├── requirements.txt

│   ├── .env.example

│   └── Dockerfile

├── frontend/

│   ├── app.py                      # Streamlit main app

│   ├── components/

│   │   ├── __init__.py

│   │   ├── upload.py               # Upload component

│   │   ├── chat.py                 # Chat interface

│   │   └── sidebar.py              # Sidebar with settings

│   ├── utils/

│   │   ├── __init__.py

│   │   ├── api_client.py           # Backend API client

│   │   └── session.py              # Session management

│   ├── requirements.txt

│   └── .streamlit/

│       └── config.toml

├── docs/

│   ├── README.md                   # Main documentation

│   ├── SETUP.md                    # Detailed setup guide

│   ├── ARCHITECTURE.md             # Architecture explanation

│   ├── LEARNING_GUIDE.md           # Educational walkthrough

│   └── architecture-diagram.mmd    # Mermaid diagram

├── scripts/

│   ├── setup_azure.py              # Azure resource setup script

│   └── seed_data.py                # Sample data loader

├── sample_documents/

│   └── example.pdf                 # Test document

├── docker-compose.yml

├── .gitignore

└── README.md

```


## 🔧 Technical Implementation Details


### Technology Stack

- **Python:** 3.11+

- **Backend:** FastAPI 0.110+, uvicorn, python-multipart

- **Frontend:** Streamlit 1.32+

- **Azure SDKs:** 

  - `openai` (latest)

  - `azure-search-documents` 

  - `azure-storage-blob`

  - `azure-identity`

- **Agent Framework:** LangChain 0.1+ or LlamaIndex 0.10+

- **Document Processing:** pypdf (the maintained successor to PyPDF2), python-docx, markdown

- **Data Validation:** Pydantic 2.0+

- **Additional:** python-dotenv, httpx, aiohttp


### Chunking Strategy

- Use semantic chunking (sentence-aware)

- Target chunk size: 500-1000 tokens

- Overlap: 10-20% (50-200 tokens)

- Preserve document structure metadata

- Include document title/filename in each chunk
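
A sentence-aware chunker with token-budgeted overlap might look like this sketch; the regex splitter is a deliberately naive stand-in, so swap in a proper sentence segmenter if needed:

```python
# Sketch of sentence-aware chunking with token-budgeted overlap.
# The naive regex sentence splitter is a stand-in for a real segmenter.
import re
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # tokenizer used by OpenAI embedding models

def chunk_text(text: str, max_tokens: int = 800, overlap_tokens: int = 120) -> list[str]:
    """Pack whole sentences into ~max_tokens chunks, overlapping tails."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = len(ENC.encode(sentence))
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward until the overlap budget is met.
            tail: list[str] = []
            budget = 0
            for s in reversed(current):
                budget += len(ENC.encode(s))
                tail.insert(0, s)
                if budget >= overlap_tokens:
                    break
            current, current_tokens = tail, budget
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```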


### Embedding Configuration

- Model: `text-embedding-3-large` (3072 dimensions) or `text-embedding-ada-002` (1536 dimensions); the index's vector dimensions must match the chosen model

- Batch processing for efficiency

- Normalize vectors for cosine similarity
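
Batched embedding might look like this sketch; the deployment name is an assumption, and note that OpenAI embeddings come back unit-normalized, so cosine similarity and dot product coincide:

```python
# Sketch of batched embedding generation. The deployment name is an
# assumption; adjust batch_size to stay under request-size limits.
from openai import AzureOpenAI

# Reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, OPENAI_API_VERSION.
client = AzureOpenAI()

def embed_batch(texts: list[str], deployment: str = "text-embedding-3-large",
                batch_size: int = 16) -> list[list[float]]:
    """Embed texts in batches instead of one request per chunk."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        response = client.embeddings.create(
            model=deployment, input=texts[start:start + batch_size]
        )
        vectors.extend(item.embedding for item in response.data)
    return vectors
```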


### Azure AI Search Index Schema

```json

{

  "name": "documents-index",

  "fields": [

    {"name": "id", "type": "Edm.String", "key": true},

    {"name": "content", "type": "Edm.String", "searchable": true},

    {"name": "embedding", "type": "Collection(Edm.Single)", "dimensions": 3072, "vectorSearchProfile": "default"},

    {"name": "document_id", "type": "Edm.String", "filterable": true},

    {"name": "document_name", "type": "Edm.String", "filterable": true},

    {"name": "chunk_index", "type": "Edm.Int32"},

    {"name": "metadata", "type": "Edm.String"}

  ]

}

```
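
Equivalently, the index could be created with the azure-search-documents Python SDK (11.4+) along these lines; the "default" profile and "hnsw" algorithm names are arbitrary choices matching the JSON sketch:

```python
# Sketch of creating the index above with azure-search-documents 11.4+.
# Endpoint and key are placeholders; profile/algorithm names are arbitrary.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchField, SearchFieldDataType,
    SearchIndex, SearchableField, SimpleField, VectorSearch,
    VectorSearchProfile,
)

index_client = SearchIndexClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-key>"),
)

index = SearchIndex(
    name="documents-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,   # text-embedding-3-large
            vector_search_profile_name="default",
        ),
        SimpleField(name="document_id", type=SearchFieldDataType.String, filterable=True),
        SimpleField(name="document_name", type=SearchFieldDataType.String, filterable=True),
        SimpleField(name="chunk_index", type=SearchFieldDataType.Int32),
        SimpleField(name="metadata", type=SearchFieldDataType.String),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[VectorSearchProfile(name="default", algorithm_configuration_name="hnsw")],
    ),
)
index_client.create_or_update_index(index)
```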


### Agent Prompt Template

Include a clear system prompt that:

- Defines the agent's role as a helpful RAG assistant

- Instructs to use retrieved context

- Requires citation of sources

- Encourages asking clarifying questions

- Enables multi-step reasoning
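
As a starting point, the system prompt in `core/prompts.py` might read something like this; tune the wording freely:

```python
# A starting-point system prompt, not a prescribed one.
AGENT_SYSTEM_PROMPT = """\
You are a helpful research assistant that answers questions using ONLY the
retrieved document context provided to you.

Rules:
- Cite every claim with its source, e.g. [example.pdf, chunk 3].
- If the context is insufficient, say so and ask a clarifying question,
  or call the retriever tool again with a refined query.
- Think step by step: break complex questions into sub-questions,
  retrieve for each, then synthesize a final answer.
- Never invent facts that are not supported by the retrieved context.
"""
```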


## 📚 Learning Objectives & Documentation


### Include Detailed Explanations For:

1. **What is Agentic RAG?** How it differs from simple RAG

2. **Chunking Strategies:** Why overlap matters, semantic vs. fixed-size

3. **Embedding Models:** How vector similarity works

4. **Hybrid Search:** Combining vector + keyword + semantic ranking

5. **Agent Reasoning:** ReAct pattern, tool use, chain-of-thought

6. **Prompt Engineering:** System prompts, few-shot examples, context construction

7. **Performance Optimization:** Caching, batch processing, async operations

8. **Error Handling:** Graceful degradation, retry logic, user-friendly messages
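
For item 8, a concrete retry pattern worth demonstrating is exponential backoff via tenacity (an assumed extra dependency not listed in the stack above; `embed_batch` is the hypothetical helper from the embedding service):

```python
# Sketch of retry logic for transient Azure OpenAI failures, using the
# tenacity library (an assumed extra dependency). embed_batch is a
# hypothetical helper defined in the embedding service.
from openai import APIConnectionError, RateLimitError
from tenacity import (retry, retry_if_exception_type, stop_after_attempt,
                      wait_exponential)

@retry(
    retry=retry_if_exception_type((RateLimitError, APIConnectionError)),
    wait=wait_exponential(multiplier=1, min=1, max=30),  # 1s, 2s, 4s... capped at 30s
    stop=stop_after_attempt(5),
)
def embed_with_retry(texts: list[str]) -> list[list[float]]:
    """Wrap the embedding call so transient errors degrade gracefully."""
    return embed_batch(texts)
```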


### Create These Educational Documents:

- **ARCHITECTURE.md:** System design with Mermaid diagram

- **LEARNING_GUIDE.md:** Step-by-step explanation of each component

- **SETUP.md:** Local development setup, Azure configuration

- **API_DOCS.md:** Endpoint documentation with examples


## 🎨 Code Quality Requirements


- **Type Hints:** Use throughout (functions, variables, return types)

- **Comments:** Explain WHY, not just WHAT

- **Docstrings:** Google or NumPy style for all functions/classes

- **Error Handling:** Try-except blocks with specific exceptions

- **Logging:** Use structured logging (JSON) with appropriate levels (see the sketch after this list)

- **PEP 8:** Follow Python style guide

- **Async/Await:** Use for I/O operations

- **Configuration:** All credentials/settings in environment variables

- **Security:** Never hardcode secrets, validate all inputs
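
The structured-logging requirement can be met with the standard library alone, as in this minimal sketch; python-json-logger would be a common drop-in alternative:

```python
# Minimal sketch of structured JSON logging (utils/logging.py) using only
# the standard library.
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record for machine-readable logs."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "time": self.formatTime(record),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger("app").info("backend started")
```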


## 🚀 Deployment & Running


### Local Development

1. Set up Azure resources (provide script or manual steps)

2. Configure `.env` file with credentials

3. Install dependencies: `pip install -r requirements.txt`

4. Run backend: `cd backend && uvicorn app.main:app --reload`

5. Run frontend: `streamlit run frontend/app.py`


### Docker Support

- Include `Dockerfile` for both services

- `docker-compose.yml` to run full stack

- Health checks and proper networking


## ✨ Optional Enhancements (If Possible)


- **Memory/History:** Store conversation context for multi-turn chats

- **Observability:** Integration with Langfuse or OpenTelemetry

- **Caching:** Redis for frequently accessed results

- **Rate Limiting:** Protect API endpoints

- **Admin UI:** View usage statistics, manage documents

- **Export:** Download chat history or generated responses

- **Evaluation:** Include retrieval quality metrics


## 📦 Deliverables


1. **Complete working codebase** (all files in proper structure)

2. **requirements.txt** files (backend and frontend) with pinned versions

3. **.env.example** with all required variables documented

4. **README.md** with quick start guide

5. **Detailed documentation** (SETUP.md, ARCHITECTURE.md, LEARNING_GUIDE.md)

6. **Sample data** for testing (example.pdf or similar)

7. **Mermaid diagram** showing data flow

8. **Inline comments** explaining complex logic


## 🎯 Generation Instructions for Claude


Please generate this project **step-by-step**:


1. **First:** Show the complete project structure (folder tree)

2. **Second:** Generate backend core files (config, models, main.py)

3. **Third:** Implement services (document processing, embeddings, search, agent)

4. **Fourth:** Create API routes (documents, query, health)

5. **Fifth:** Build Streamlit frontend (main app, components)

6. **Sixth:** Add configuration files (requirements.txt, .env.example, docker files)

7. **Seventh:** Create documentation (README, SETUP, ARCHITECTURE, LEARNING_GUIDE)

8. **Eighth:** Include sample prompts and test data


For each file, add:

- Clear comments explaining key concepts

- Type hints for all functions

- Error handling examples

- Educational notes where relevant


## 📊 Architecture Diagram Request


Please also create a Mermaid diagram (`docs/architecture-diagram.mmd`) showing:

- User interaction with Streamlit UI

- HTTP requests to FastAPI backend

- Document upload flow (extraction → chunking → embedding → indexing)

- Query processing flow (query → agent → retrieval → LLM → response)

- Azure services interactions (OpenAI, AI Search, Blob Storage)

- Data flow between all components


Use proper Mermaid syntax (flowchart or sequence diagram) that can be rendered in VS Code or GitHub.


---


## 🎯 My Goal


Build this application to deeply understand Agentic RAG architecture, Azure AI services integration, and production-ready Python development. The code should be clean, well-documented, and serve as a reference implementation for building intelligent document retrieval systems with autonomous agent capabilities.


---


## 💡 Follow-Up Instructions


After pasting this prompt to Claude, follow up with:


> "Please start by generating the folder structure and backend configuration files first. Then proceed step-by-step through each component, ensuring all code includes detailed comments and explanations."


This will help Claude generate organized, manageable code blocks that you can review and learn from systematically.