# About Me

I am an MCSE in Data Management and Analytics, specializing in MS SQL Server, and an MCP in Azure. With over 19 years of experience in the IT industry, I bring expertise in data management, Azure Cloud, data center migration, infrastructure architecture planning, virtualization, and automation. I have a deep passion for driving innovation through infrastructure automation, particularly using Terraform for efficient provisioning. If you're looking for guidance on automating your infrastructure or have questions about Azure, SQL Server, or cloud migration, feel free to reach out. I write primarily to capture my own experiences and insights for future reference, but I hope that sharing them through this blog helps others on their journey as well. Thank you for reading!

# MCP Server Configuration

 ● Update(.claude\code_config.json)

  ⎿  Updated .claude\code_config.json with 4 additions

       11            "@modelcontextprotocol/server-postgres",

       12            "postgresql://rakuser:rakpassword@20.232.77.76:5432/librarydatabase"

       13          ]

       14 +      },

       15 +      "filesystem": {

       16 +        "command": "npx",

       17 +        "args": ["-y", "@modelcontextprotocol/server-filesystem", "C:\\"]

       18        }

       19      }

       20    }

       21   \ No newline at end of file


● Update(.claude\settings.local.json)

  ⎿  Updated .claude\settings.local.json with 3 additions and 1 removal

       13      "enableAllProjectMcpServers": true,

       14      "enabledMcpjsonServers": [

       15        "postgres-enterprise",

       16 -      "github"

       16 +      "github",

       17 +      "terraform",

       18 +      "filesystem"

       19      ]

       20    }


● Done! I've successfully:


  1. Enabled terraform server - Added "terraform" to your enabled MCP servers list

  2. Set up filesystem server - Added filesystem configuration with access to C:\ drive and enabled it


  Your MCP servers are now configured as follows:


  Enabled servers:

  - postgres-enterprise

  - github

  - terraform

  - filesystem (with access to C:\)


  Note: You'll need to restart Claude Code for these changes to take effect. The filesystem and terraform servers will be available after the restart.

# MCP Server Monitoring and Observability Guide



This guide covers monitoring, logging, and observability for the MCP Server deployment.

## Table of Contents

1. [Azure Monitor Integration](#azure-monitor-integration)
2. [Log Analytics](#log-analytics)
3. [Application Insights](#application-insights)
4. [Alerts and Notifications](#alerts-and-notifications)
5. [Dashboards](#dashboards)
6. [Metrics](#metrics)
7. [Troubleshooting](#troubleshooting)

## Azure Monitor Integration

The MCP Server is fully integrated with Azure Monitor for comprehensive observability.

### Key Components

- **Log Analytics Workspace**: Centralized log storage
- **Application Insights**: Application performance monitoring
- **Azure Monitor Metrics**: Resource-level metrics
- **Container App Logs**: Application and system logs

## Log Analytics

### Accessing Logs

1. Navigate to Azure Portal
2. Go to your Log Analytics Workspace
3. Select "Logs" from the left menu

### Common Queries

#### View All Application Logs
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
| take 100
```

#### Search for Errors
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "error" or Log_s contains "ERROR"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
```

#### Authentication Failures
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "401" or Log_s contains "Unauthorized"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
```

#### User Activity
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "User authenticated"
| extend UserId = extract("userId\":\"([^\"]+)", 1, Log_s)
| summarize Count = count() by UserId, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```

#### Performance Metrics
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "response time" or Log_s contains "duration"
| extend ResponseTime = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| summarize avg(ResponseTime), max(ResponseTime), min(ResponseTime) by bin(TimeGenerated, 5m)
```

#### Database Query Performance
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "database" and Log_s contains "query"
| extend QueryDuration = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| summarize avg(QueryDuration), count() by bin(TimeGenerated, 5m)
```

## Application Insights

### Key Metrics

1. **Request Rate**: Requests per second
2. **Response Time**: Average response time
3. **Failure Rate**: Failed requests percentage
4. **Dependencies**: External service calls (database, etc.)

### Viewing Metrics

Navigate to: **Application Insights > Investigate > Performance**

### Custom Metrics

The MCP Server emits custom metrics:

- `mcp.connections.active`: Active MCP connections
- `mcp.tools.calls`: Tool call count
- `mcp.auth.success`: Successful authentications
- `mcp.auth.failed`: Failed authentications
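
If these are surfaced as Application Insights custom metrics, they can also be pulled from the CLI. A minimal sketch, assuming the `application-insights` CLI extension is installed and the metrics land in the `customMetrics` namespace (the app name here is a placeholder):

```bash
# Requires: az extension add --name application-insights
# App name and metric namespace are assumptions; adjust to your deployment.
az monitor app-insights metrics show \
  --app appi-mcpserver-prod \
  --resource-group rg-mcp-server-prod \
  --metric "customMetrics/mcp.tools.calls" \
  --aggregation sum \
  --interval PT1H
```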

## Alerts and Notifications

### Recommended Alerts

#### High Error Rate
```json
{
  "name": "High Error Rate",
  "description": "Alert when error rate exceeds 5%",
  "condition": {
    "metric": "requests/failed",
    "threshold": 5,
    "timeAggregation": "Average",
    "windowSize": "PT5M"
  },
  "actions": [
    {
      "actionGroup": "ops-team",
      "emailSubject": "MCP Server High Error Rate"
    }
  ]
}
```

#### High Response Time
```json
{
  "name": "High Response Time",
  "description": "Alert when average response time exceeds 2 seconds",
  "condition": {
    "metric": "requests/duration",
    "threshold": 2000,
    "timeAggregation": "Average",
    "windowSize": "PT5M"
  }
}
```

#### Authentication Failures
```json
{
  "name": "Authentication Failures",
  "description": "Alert on repeated authentication failures",
  "condition": {
    "query": "ContainerAppConsoleLogs_CL | where Log_s contains 'Authentication failed' | summarize count()",
    "threshold": 10,
    "timeAggregation": "Total",
    "windowSize": "PT5M"
  }
}
```

#### Low Availability
```json
{
  "name": "Container App Unhealthy",
  "description": "Alert when health check fails",
  "condition": {
    "metric": "healthcheck/status",
    "threshold": 1,
    "operator": "LessThan",
    "windowSize": "PT5M"
  }
}
```

### Creating Alerts via Azure CLI

```bash
# Create action group
az monitor action-group create \
  --name ops-team \
  --resource-group rg-mcp-server-prod \
  --short-name ops \
  --action email admin admin@yourcompany.com

# Create metric alert
az monitor metrics alert create \
  --name high-error-rate \
  --resource-group rg-mcp-server-prod \
  --scopes /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/containerApps/ca-mcpserver-prod \
  --condition "total requests/failed > 5" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action ops-team
```
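
The Authentication Failures alert above is log-based rather than metric-based, so it needs a scheduled query rule instead of a metric alert. A hedged sketch (the workspace name is a placeholder, and flag names such as `--action-groups` have varied across CLI versions):

```bash
# Log-based alert: fire when more than 10 auth failures occur in 5 minutes
az monitor scheduled-query create \
  --name auth-failures \
  --resource-group rg-mcp-server-prod \
  --scopes /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.OperationalInsights/workspaces/log-mcpserver-prod \
  --condition "count 'AuthFailures' > 10" \
  --condition-query AuthFailures="ContainerAppConsoleLogs_CL | where Log_s contains 'Authentication failed'" \
  --window-size 5m \
  --evaluation-frequency 5m \
  --action-groups ops-team
```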

## Dashboards

### Create Custom Dashboard

1. Navigate to Azure Portal
2. Select "Dashboard" > "New dashboard"
3. Add tiles for:
   - Request count
   - Response time
   - Error rate
   - Active connections
   - CPU/Memory usage

### Sample Dashboard JSON

```json
{
  "lenses": {
    "0": {
      "order": 0,
      "parts": {
        "0": {
          "position": {
            "x": 0,
            "y": 0,
            "colSpan": 6,
            "rowSpan": 4
          },
          "metadata": {
            "type": "Extension/HubsExtension/PartType/MonitorChartPart",
            "settings": {
              "title": "Request Rate",
              "visualization": {
                "chartType": "Line",
                "legendVisualization": {
                  "isVisible": true
                }
              }
            }
          }
        }
      }
    }
  }
}
```

## Metrics

### Container App Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Replica Count | Number of active replicas | Min: 2, Max: 10 |
| CPU Usage | CPU percentage | < 80% |
| Memory Usage | Memory percentage | < 80% |
| Request Count | Total requests | Monitor trends |
| Request Duration | Average response time | < 2 seconds |

### Database Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Connections | Active connections | < 80% of max |
| CPU Usage | Database CPU | < 80% |
| Storage | Used storage | < 80% of quota |
| Query Duration | Average query time | < 500ms |

### Application Gateway Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Throughput | Bytes/second | Monitor trends |
| Failed Requests | Count of 5xx errors | < 1% |
| Backend Response Time | Time to first byte | < 1 second |
| Healthy Host Count | Number of healthy backends | > 0 |
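
Each table above corresponds to platform metrics you can retrieve programmatically. A minimal sketch for the Container App (metric names vary by resource type; list them with `az monitor metrics list-definitions`):

```bash
# Pull 5-minute request totals for the Container App (resource ID is a placeholder)
az monitor metrics list \
  --resource /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/containerApps/ca-mcpserver-prod \
  --metric "Requests" \
  --interval PT5M \
  --aggregation Total \
  --output table
```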

## Troubleshooting

### Common Issues

#### 1. High Response Time

**Symptoms**: Slow API responses

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| extend Duration = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| where Duration > 2000
| project TimeGenerated, Log_s
```

**Solutions**:
- Scale up replicas
- Optimize database queries
- Check network latency
- Review application code

#### 2. Authentication Failures

**Symptoms**: 401 errors

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where Log_s contains "Token verification failed"
| project TimeGenerated, Log_s
```

**Solutions**:
- Verify Entra ID configuration
- Check token expiration
- Validate audience/issuer settings
- Review user permissions

#### 3. Database Connection Issues

**Symptoms**: Database errors

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where Log_s contains "PostgreSQL" and Log_s contains "error"
| project TimeGenerated, Log_s
```

**Solutions**:
- Check connection string
- Verify firewall rules
- Check connection pool size
- Review database health

#### 4. Memory Leaks

**Symptoms**: Increasing memory usage

**Investigation**:
- Check container app metrics
- Review memory usage trends
- Look for unclosed connections

**Solutions**:
- Restart container app
- Review application code
- Implement connection pooling
- Add memory limits

### Health Check Endpoints

#### Application Health
```bash
curl https://mcp.yourcompany.com/health
```

Expected Response:
```json
{
  "status": "healthy",
  "timestamp": "2025-12-09T10:00:00Z",
  "version": "1.0.0",
  "uptime": 86400
}
```

#### Readiness Check
```bash
curl https://mcp.yourcompany.com/ready
```

#### Metrics Endpoint
```bash
curl -H "Authorization: Bearer $TOKEN" https://mcp.yourcompany.com/metrics
```
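
The `$TOKEN` above must be a valid Entra ID access token. One way to obtain one for testing, assuming your CLI user is assigned to the application (the `api://{client-id}` audience is a placeholder for your app registration):

```bash
# Acquire a token for the MCP Server's app registration (audience is a placeholder)
TOKEN=$(az account get-access-token \
  --resource "api://{client-id}" \
  --query accessToken -o tsv)

curl -H "Authorization: Bearer $TOKEN" https://mcp.yourcompany.com/metrics
```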

## Log Retention

- **Container App Logs**: 30 days (configurable)
- **Log Analytics**: 30 days (configurable up to 730 days)
- **Application Insights**: 90 days default
- **Archived Logs**: Configure export to Storage Account for long-term retention
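
Retention for the Log Analytics workspace can be adjusted from the CLI. A minimal sketch (the workspace name is a placeholder):

```bash
# Raise Log Analytics retention to 90 days (workspace name is a placeholder)
az monitor log-analytics workspace update \
  --resource-group rg-mcp-server-prod \
  --workspace-name log-mcpserver-prod \
  --retention-time 90
```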

## Exporting Logs

### To Storage Account

```bash
az monitor diagnostic-settings create \
  --name export-to-storage \
  --resource /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/containerApps/ca-mcpserver-prod \
  --storage-account {storage-account-id} \
  --logs '[{"category":"ContainerAppConsoleLogs","enabled":true}]'
```

### To Event Hub

```bash
az monitor diagnostic-settings create \
  --name export-to-eventhub \
  --resource /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/containerApps/ca-mcpserver-prod \
  --event-hub {event-hub-name} \
  --event-hub-rule {auth-rule-id} \
  --logs '[{"category":"ContainerAppConsoleLogs","enabled":true}]'
```

## Best Practices

1. **Set up alerts early** - Don't wait for incidents
2. **Review logs regularly** - Weekly log reviews
3. **Monitor trends** - Look for patterns over time
4. **Document incidents** - Keep runbooks updated
5. **Test alerts** - Ensure notifications work
6. **Rotate credentials** - Regular security reviews
7. **Capacity planning** - Monitor growth trends
8. **Cost optimization** - Review unused resources

## Support

For monitoring issues:
- DevOps Team: devops@yourcompany.com
- Azure Support: https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade

# MCP Server Deployment Checklist


Use this checklist to ensure a successful deployment of your enterprise MCP Server.

## Pre-Deployment

### Prerequisites
- [ ] Azure CLI installed and configured (`az --version`)
- [ ] Terraform >= 1.5.0 installed (`terraform --version`)
- [ ] Docker installed (`docker --version`)
- [ ] Node.js >= 20.0.0 installed (`node --version`)
- [ ] Azure subscription with Owner or Contributor role
- [ ] Valid Azure Entra ID tenant

### Azure Entra ID Setup
- [ ] Run `setup-entra-id.ps1` or `setup-entra-id.sh`
- [ ] Save Tenant ID, Client ID, and Client Secret securely
- [ ] Grant admin consent for API permissions in Azure Portal
- [ ] Assign test users to the application
- [ ] (Optional) Configure application roles
- [ ] (Optional) Set up conditional access policies

### Configuration
- [ ] Update `terraform/terraform.tfvars` with your values
- [ ] Choose globally unique names for ACR and PostgreSQL
- [ ] Set strong PostgreSQL admin password
- [ ] Configure tags for resource management
- [ ] Review network configuration (address spaces, subnets)

### Security
- [ ] Obtain or generate SSL certificate for Application Gateway
- [ ] Place certificate in `terraform/cert.pfx`
- [ ] Set certificate password in variables
- [ ] Review NSG rules and adjust if needed
- [ ] Configure allowed CORS origins

## Deployment Phase

### Infrastructure Deployment
- [ ] Navigate to `terraform/` directory
- [ ] Run `terraform init`
- [ ] Review `terraform plan` output carefully
- [ ] Run `terraform apply` and confirm
- [ ] Verify all resources created successfully
- [ ] Save Terraform outputs (ACR, PostgreSQL FQDN, etc.)

### Application Deployment
- [ ] Navigate to `server/` directory
- [ ] Login to ACR: `az acr login --name <acr-name>`
- [ ] Build Docker image: `docker build -t mcpserver:latest .`
- [ ] Tag image for ACR (see the sketch below)
- [ ] Push image to ACR
- [ ] Verify image in ACR: `az acr repository list --name <acr-name>`
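
A consolidated sketch of the login, build, tag, and push sequence above (the ACR name is a placeholder; substitute your registry):

```bash
ACR_NAME=<acr-name>   # placeholder from the checklist

az acr login --name $ACR_NAME
docker build -t mcpserver:latest .
docker tag mcpserver:latest $ACR_NAME.azurecr.io/mcpserver:latest
docker push $ACR_NAME.azurecr.io/mcpserver:latest

# Confirm the image landed in the registry
az acr repository list --name $ACR_NAME --output table
```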

### Container App Update
- [ ] Update Container App with new image (see the sketch below)
- [ ] Wait for deployment to complete
- [ ] Check Container App status: `az containerapp show`
- [ ] Verify replicas are running
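
A minimal sketch of the update-and-verify steps above (resource names follow the monitoring guide's examples; adjust to your environment):

```bash
# Point the Container App at the newly pushed image
az containerapp update \
  --name ca-mcpserver-prod \
  --resource-group rg-mcp-server-prod \
  --image $ACR_NAME.azurecr.io/mcpserver:latest

# Check status and replica health
az containerapp show \
  --name ca-mcpserver-prod \
  --resource-group rg-mcp-server-prod \
  --query properties.runningStatus
az containerapp replica list \
  --name ca-mcpserver-prod \
  --resource-group rg-mcp-server-prod \
  --output table
```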

## Post-Deployment

### Verification
- [ ] Test health endpoint: `curl https://<ip>/health`
- [ ] Test readiness endpoint: `curl https://<ip>/ready`
- [ ] Test authentication with Azure CLI token
- [ ] Verify MCP SSE endpoint connection
- [ ] Check logs in Log Analytics
- [ ] Review Container App metrics

### DNS and SSL
- [ ] Create DNS A record pointing to Application Gateway IP (see the sketch below)
- [ ] Update Application Gateway with production SSL certificate
- [ ] Verify SSL certificate validity
- [ ] Test HTTPS connection
- [ ] Enable HTTP to HTTPS redirect
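
If the zone is hosted in Azure DNS, the A record step can be scripted. A sketch assuming a zone named yourcompany.com (all values are placeholders):

```bash
# Create mcp.yourcompany.com -> Application Gateway public IP (values are placeholders)
az network dns record-set a add-record \
  --resource-group rg-dns \
  --zone-name yourcompany.com \
  --record-set-name mcp \
  --ipv4-address <app-gateway-public-ip>
```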

### Monitoring Setup
- [ ] Create Azure Monitor alerts for:
  - [ ] High error rate (>5%)
  - [ ] High response time (>2s)
  - [ ] Authentication failures
  - [ ] Low availability
  - [ ] High resource usage
- [ ] Configure action groups for notifications
- [ ] Create custom dashboard in Azure Portal
- [ ] Set up Log Analytics saved queries
- [ ] Test alert notifications

### Client Configuration
- [ ] Distribute client configuration to users
- [ ] Update `claude_desktop_config.json` with production URL
- [ ] Test client connection from multiple machines
- [ ] Verify authentication works for all users
- [ ] Document any troubleshooting steps

### Documentation
- [ ] Update internal wiki with deployment info
- [ ] Document server URL and configuration
- [ ] Create runbook for common issues
- [ ] Document escalation procedures
- [ ] Share monitoring dashboard links

## User Onboarding

### Azure Entra ID
- [ ] Assign users to MCP Server application
- [ ] Grant appropriate roles (Admin vs User)
- [ ] Configure group-based access if needed
- [ ] Test user authentication

### Training
- [ ] Provide client configuration guide to users
- [ ] Document how to get access tokens
- [ ] Explain available MCP tools and capabilities
- [ ] Share troubleshooting guide
- [ ] Set up support channel (Teams/Slack)

## Security Hardening

### Network
- [ ] Review and restrict NSG rules
- [ ] Enable private endpoints for all services
- [ ] Configure Application Gateway WAF to Prevention mode
- [ ] Review firewall rules
- [ ] Enable DDoS protection

### Access Control
- [ ] Implement principle of least privilege
- [ ] Review and remove unnecessary permissions
- [ ] Enable Azure AD PIM if available
- [ ] Configure conditional access policies
- [ ] Enable MFA for admin accounts

### Secrets
- [ ] Rotate client secrets
- [ ] Store all secrets in Key Vault
- [ ] Enable Key Vault soft delete
- [ ] Configure access policies
- [ ] Set up secret expiration alerts

### Compliance
- [ ] Enable audit logging
- [ ] Configure log retention per compliance requirements
- [ ] Set up log export to long-term storage
- [ ] Document data residency
- [ ] Review compliance with organizational policies

## Operational Readiness

### Backup and Recovery
- [ ] Configure PostgreSQL automated backups
- [ ] Test database restore procedure
- [ ] Document recovery time objective (RTO)
- [ ] Document recovery point objective (RPO)
- [ ] Create disaster recovery plan

### Cost Management
- [ ] Set up budget alerts
- [ ] Review resource SKUs for optimization
- [ ] Enable auto-shutdown for non-prod
- [ ] Tag all resources for cost allocation
- [ ] Schedule monthly cost review

### Maintenance
- [ ] Schedule regular update windows
- [ ] Document update procedures
- [ ] Create rollback plan
- [ ] Set up change management process
- [ ] Define SLA commitments

## Sign-Off

### Technical Review
- [ ] DevOps team approval
- [ ] Security team review completed
- [ ] Network team approval
- [ ] Database team verification

### Business Review
- [ ] Stakeholder notification sent
- [ ] User communication prepared
- [ ] Support team trained
- [ ] Documentation published
- [ ] Go-live date confirmed

### Final Checks
- [ ] All checklist items completed
- [ ] No critical issues outstanding
- [ ] Monitoring and alerts verified
- [ ] Support procedures documented
- [ ] Rollback plan tested

---

## Deployment Sign-Off

**Deployment Date**: _________________

**Deployed By**: _________________

**Reviewed By**: _________________

**Approval**: _________________

---

## Post-Go-Live

### Week 1
- [ ] Daily monitoring of logs and metrics
- [ ] User feedback collection
- [ ] Performance tuning as needed
- [ ] Address any issues immediately

### Week 2-4
- [ ] Continue monitoring
- [ ] Optimize based on usage patterns
- [ ] Scale resources if needed
- [ ] Document lessons learned

### Month 1+
- [ ] Regular maintenance schedule
- [ ] Monthly cost review
- [ ] Quarterly security review
- [ ] Annual disaster recovery test

# High-Quality Prompt for Terraform Project Generation

Before executing this prompt in Visual Studio Code Copilot (or any other IDE), ensure that:
1. The Terraform MCP server and the filesystem MCP server are already installed.

Extend the prompt if you want to add or change the set of resources.
Create a complete Terraform project with the following requirements:

📁 1. Folder Structure

  • Create a root folder named terraform-rak on the C:\ drive.

  • Inside this folder, create separate .tf files for each module or resource.

📄 2. Core Terraform Files

  1. variables.tf

    • Define all variables required by the project.

  2. variables_development.tfvars

    • Store all variable values for the development environment.

  3. backend.tf

    • Configure a remote backend using an Azure Storage Account.

    • Ensure the Terraform state file is stored in a storage container.
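
Note that the backend storage account must exist before `terraform init` can succeed. A minimal one-time bootstrap sketch (all names are placeholders):

```bash
# One-time bootstrap for the remote state backend (names are placeholders)
az group create --name rg-terraform-state --location eastus
az storage account create \
  --name sttfstaterak001 \
  --resource-group rg-terraform-state \
  --sku Standard_LRS
az storage container create \
  --name tfstate \
  --account-name sttfstaterak001
```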

🌐 3. Azure Resources (Each in Its Own File Using Azure Verified Modules)

🔹 Resource Group

🔹 Virtual Network

  • File: vnet.tf

  • Create a virtual network.

  • Use Azure Verified Modules.

  • VNet with 10.0.0.0/16 address space:

      - Main subnet (10.0.1.0/24) with Container Apps delegation

      - Private endpoint subnet (10.0.2.0/24)

🔹 Subnet

🔹 Network Security Group (NSG)

🔹 NSG Rules

  • File: network_security_rules.tf

  • Create security rules and:

    • Associate them with the NSG.

    • Associate the NSG with the subnet.

    • Use Azure Verified Modules

🔹 Route Table

  • File: route_table.tf

  • Create a route table with:

    • A route to the internet

    • A route to a virtual appliance (ASA firewall IP address)

    • Use Azure Verified Modules

🔹 Azure Container Apps Environment (CAE)

🔹 Container App

  • File: container_app.tf

  • Deploy a Container App and store its configuration here.

  • Use Azure Verified Modules

  • Enable system-assigned identity.

🔹 Azure Container Registry

  • File: container_registry.tf

  • Create an Azure Container Registry.

  • Use Azure Verified Modules.

  • Ensure RBAC and ABAC are configured.

  • Create a private endpoint.

  • Include:

      - Azure Container Registry (Premium SKU)

      - Container Apps Environment with VNet integration

      - Container App with auto-scaling (1-10 replicas)

      - Health probes (liveness and readiness)

      - RBAC integration between Container Apps and ACR

🔍 4. Diagnostics & Monitoring


Additional Requirements


Final Output Expectation

Generate:

  • A complete folder structure

  • Individual .tf files with correct module references and dependencies

  • A working Terraform configuration ready to run with:
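
The prompt trails off here; presumably the intent is the standard workflow against the development tfvars file defined above, for example:

```bash
terraform init
terraform plan -var-file="variables_development.tfvars"
terraform apply -var-file="variables_development.tfvars"
```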

# 🎯 **ULTIMATE AGENTIC RAG APPLICATION PROMPT**


## Complete Production-Ready Agentic RAG Application


I need you to generate a full-stack, production-ready Agentic RAG (Retrieval-Augmented Generation) application in Python. This is a learning project, so please include extensive comments, explanations, and educational content throughout.


## 🏗️ Architecture Requirements


### Frontend Layer

**Technology:** Streamlit

**Features:**

- Modern, intuitive UI for document upload (PDF, DOCX, TXT, Markdown)

- Interactive query interface with real-time streaming responses

- Display of retrieved source documents with confidence scores

- Conversation history panel for multi-turn dialogues

- Source citation display (show which document chunks were used)

- Visual indicators for agent reasoning steps

- Session state management


### Backend Layer

**Technology:** FastAPI

**Features:**

- RESTful API with async/await patterns

- Automatic API documentation (Swagger/OpenAPI)

- CORS middleware for frontend integration

- Proper error handling with custom exception classes

- Structured logging (JSON format)

- Request validation using Pydantic models

- Health check and metrics endpoints

- API key authentication for security


### Core Azure Services

- **Azure OpenAI Service:** GPT-4o or GPT-4-turbo (latest available model)

- **Azure AI Search:** Vector store with hybrid search (vector + keyword)

- **Azure Blob Storage:** Document persistence and management

- **Azure Key Vault:** Secure credential management (optional but recommended)


### Agentic Framework

- **Orchestration:** Use LangChain or LlamaIndex for agent logic

- **Agent Pattern:** ReAct (Reasoning + Acting) or OpenAI function calling

- **Tools/Capabilities:**

  - Document retriever tool

  - Summarization tool

  - Query refinement tool

  - Multi-step reasoning with thought traces

  - Self-reflection and answer validation


## 📋 Functional Requirements


### 1. Document Ingestion Pipeline

```

User uploads document → Extract text → Intelligent chunking (with overlap) → 

Generate embeddings → Store in Azure AI Search + Blob Storage → Return success

```


**Requirements:**

- Support PDF, DOCX, TXT, and Markdown files

- Implement smart chunking strategy (500-1000 tokens, 10-20% overlap)

- Extract and preserve metadata (filename, upload date, page numbers)

- Generate embeddings using Azure OpenAI (text-embedding-3-large or ada-002)

- Create Azure AI Search index with vector and text fields

- Handle large documents (chunking + batch processing)

- Progress indicators during upload


### 2. Agentic Query Processing

```

User query → Agent analyzes → Plans retrieval strategy → Retrieves context → 

Reasons about information → Generates response → Cites sources → Returns to user

```


**Requirements:**

- Agent breaks down complex queries into sub-tasks

- Dynamic retrieval: fetch more context if initial results insufficient

- Hybrid search: combine vector similarity + keyword matching + semantic ranking

- Re-ranking of retrieved chunks for relevance

- Multi-step reasoning visible to user (show agent's "thoughts")

- Context-aware responses with proper citations

- Handle follow-up questions using conversation history


### 3. API Endpoints


**Document Management:**

- `POST /api/v1/documents/upload` - Upload and process documents

- `GET /api/v1/documents` - List all indexed documents

- `GET /api/v1/documents/{doc_id}` - Get document details

- `DELETE /api/v1/documents/{doc_id}` - Remove document and chunks


**Query & Chat:**

- `POST /api/v1/query` - Submit query with streaming response

- `POST /api/v1/chat` - Conversational endpoint with history

- `GET /api/v1/chat/history/{session_id}` - Retrieve chat history


**System:**

- `GET /api/v1/health` - Health check

- `GET /api/v1/metrics` - Basic usage metrics


## 🗂️ Project Structure


```

agentic-rag-app/

├── backend/

│   ├── app/

│   │   ├── __init__.py

│   │   ├── main.py                 # FastAPI application entry

│   │   ├── config.py               # Configuration and settings

│   │   ├── dependencies.py         # Dependency injection

│   │   ├── api/

│   │   │   ├── __init__.py

│   │   │   ├── routes/

│   │   │   │   ├── documents.py    # Document endpoints

│   │   │   │   ├── query.py        # Query endpoints

│   │   │   │   └── health.py       # Health check

│   │   ├── services/

│   │   │   ├── __init__.py

│   │   │   ├── document_processor.py  # Text extraction & chunking

│   │   │   ├── embedding_service.py   # Azure OpenAI embeddings

│   │   │   ├── search_service.py      # Azure AI Search operations

│   │   │   ├── agent_service.py       # Agentic orchestration

│   │   │   └── llm_service.py         # LLM interactions

│   │   ├── models/

│   │   │   ├── __init__.py

│   │   │   ├── requests.py         # Pydantic request models

│   │   │   ├── responses.py        # Pydantic response models

│   │   │   └── documents.py        # Document data models

│   │   ├── utils/

│   │   │   ├── __init__.py

│   │   │   ├── logging.py          # Logging configuration

│   │   │   ├── exceptions.py       # Custom exceptions

│   │   │   └── azure_clients.py    # Azure SDK clients

│   │   └── core/

│   │       ├── __init__.py

│   │       ├── security.py         # Authentication

│   │       └── prompts.py          # System prompts

│   ├── tests/

│   │   ├── __init__.py

│   │   └── test_api.py

│   ├── requirements.txt

│   ├── .env.example

│   └── Dockerfile

├── frontend/

│   ├── app.py                      # Streamlit main app

│   ├── components/

│   │   ├── __init__.py

│   │   ├── upload.py               # Upload component

│   │   ├── chat.py                 # Chat interface

│   │   └── sidebar.py              # Sidebar with settings

│   ├── utils/

│   │   ├── __init__.py

│   │   ├── api_client.py           # Backend API client

│   │   └── session.py              # Session management

│   ├── requirements.txt

│   └── .streamlit/

│       └── config.toml

├── docs/

│   ├── README.md                   # Main documentation

│   ├── SETUP.md                    # Detailed setup guide

│   ├── ARCHITECTURE.md             # Architecture explanation

│   ├── LEARNING_GUIDE.md           # Educational walkthrough

│   └── architecture-diagram.mmd    # Mermaid diagram

├── scripts/

│   ├── setup_azure.py              # Azure resource setup script

│   └── seed_data.py                # Sample data loader

├── sample_documents/

│   └── example.pdf                 # Test document

├── docker-compose.yml

├── .gitignore

└── README.md

```


## 🔧 Technical Implementation Details


### Technology Stack

- **Python:** 3.11+

- **Backend:** FastAPI 0.110+, uvicorn, python-multipart

- **Frontend:** Streamlit 1.32+

- **Azure SDKs:** 

  - `openai` (latest)

  - `azure-search-documents` 

  - `azure-storage-blob`

  - `azure-identity`

- **Agent Framework:** LangChain 0.1+ or LlamaIndex 0.10+

- **Document Processing:** PyPDF2 (or pypdf), python-docx, markdown

- **Data Validation:** Pydantic 2.0+

- **Additional:** python-dotenv, httpx, aiohttp


### Chunking Strategy

- Use semantic chunking (sentence-aware)

- Target chunk size: 500-1000 tokens

- Overlap: 10-20% (50-200 tokens)

- Preserve document structure metadata

- Include document title/filename in each chunk


### Embedding Configuration

- Model: `text-embedding-3-large` (3072 dimensions) or `text-embedding-ada-002`

- Batch processing for efficiency

- Normalize vectors for cosine similarity


### Azure AI Search Index Schema

```json

{

  "name": "documents-index",

  "fields": [

    {"name": "id", "type": "Edm.String", "key": true},

    {"name": "content", "type": "Edm.String", "searchable": true},

    {"name": "embedding", "type": "Collection(Edm.Single)", "dimensions": 3072, "vectorSearchProfile": "default"},

    {"name": "document_id", "type": "Edm.String", "filterable": true},

    {"name": "document_name", "type": "Edm.String", "filterable": true},

    {"name": "chunk_index", "type": "Edm.Int32"},

    {"name": "metadata", "type": "Edm.String"}

  ]

}

```


### Agent Prompt Template

Include a clear system prompt that:

- Defines the agent's role as a helpful RAG assistant

- Instructs to use retrieved context

- Requires citation of sources

- Encourages asking clarifying questions

- Enables multi-step reasoning


## 📚 Learning Objectives & Documentation


### Include Detailed Explanations For:

1. **What is Agentic RAG?** How it differs from simple RAG

2. **Chunking Strategies:** Why overlap matters, semantic vs. fixed-size

3. **Embedding Models:** How vector similarity works

4. **Hybrid Search:** Combining vector + keyword + semantic ranking

5. **Agent Reasoning:** ReAct pattern, tool use, chain-of-thought

6. **Prompt Engineering:** System prompts, few-shot examples, context construction

7. **Performance Optimization:** Caching, batch processing, async operations

8. **Error Handling:** Graceful degradation, retry logic, user-friendly messages


### Create These Educational Documents:

- **ARCHITECTURE.md:** System design with Mermaid diagram

- **LEARNING_GUIDE.md:** Step-by-step explanation of each component

- **SETUP.md:** Local development setup, Azure configuration

- **API_DOCS.md:** Endpoint documentation with examples


## 🎨 Code Quality Requirements


- **Type Hints:** Use throughout (functions, variables, return types)

- **Comments:** Explain WHY, not just WHAT

- **Docstrings:** Google or NumPy style for all functions/classes

- **Error Handling:** Try-except blocks with specific exceptions

- **Logging:** Use structured logging (JSON) with appropriate levels

- **PEP 8:** Follow Python style guide

- **Async/Await:** Use for I/O operations

- **Configuration:** All credentials/settings in environment variables

- **Security:** Never hardcode secrets, validate all inputs


## 🚀 Deployment & Running


### Local Development

1. Set up Azure resources (provide script or manual steps)

2. Configure `.env` file with credentials

3. Install dependencies: `pip install -r requirements.txt`

4. Run backend: `uvicorn backend.app.main:app --reload`

5. Run frontend: `streamlit run frontend/app.py`


### Docker Support

- Include `Dockerfile` for both services

- `docker-compose.yml` to run full stack

- Health checks and proper networking


## ✨ Optional Enhancements (If Possible)


- **Memory/History:** Store conversation context for multi-turn chats

- **Observability:** Integration with Langfuse or OpenTelemetry

- **Caching:** Redis for frequently accessed results

- **Rate Limiting:** Protect API endpoints

- **Admin UI:** View usage statistics, manage documents

- **Export:** Download chat history or generated responses

- **Evaluation:** Include retrieval quality metrics


## 📦 Deliverables


1. **Complete working codebase** (all files in proper structure)

2. **Requirements.txt** with pinned versions

3. **.env.example** with all required variables documented

4. **README.md** with quick start guide

5. **Detailed documentation** (SETUP.md, ARCHITECTURE.md, LEARNING_GUIDE.md)

6. **Sample data** for testing (example.pdf or similar)

7. **Mermaid diagram** showing data flow

8. **Inline comments** explaining complex logic


## 🎯 Generation Instructions for Claude


Please generate this project **step-by-step**:


1. **First:** Show the complete project structure (folder tree)

2. **Second:** Generate backend core files (config, models, main.py)

3. **Third:** Implement services (document processing, embeddings, search, agent)

4. **Fourth:** Create API routes (documents, query, health)

5. **Fifth:** Build Streamlit frontend (main app, components)

6. **Sixth:** Add configuration files (requirements.txt, .env.example, docker files)

7. **Seventh:** Create documentation (README, SETUP, ARCHITECTURE, LEARNING_GUIDE)

8. **Eighth:** Include sample prompts and test data


For each file, add:

- Clear comments explaining key concepts

- Type hints for all functions

- Error handling examples

- Educational notes where relevant


## 📊 Architecture Diagram Request


Please also create a Mermaid diagram (`docs/architecture-diagram.mmd`) showing:

- User interaction with Streamlit UI

- HTTP requests to FastAPI backend

- Document upload flow (extraction → chunking → embedding → indexing)

- Query processing flow (query → agent → retrieval → LLM → response)

- Azure services interactions (OpenAI, AI Search, Blob Storage)

- Data flow between all components


Use proper Mermaid syntax (flowchart or sequence diagram) that can be rendered in VS Code or GitHub.


---


## 🎯 My Goal


Build this application to deeply understand Agentic RAG architecture, Azure AI services integration, and production-ready Python development. The code should be clean, well-documented, and serve as a reference implementation for building intelligent document retrieval systems with autonomous agent capabilities.


---


## 💡 Follow-Up Instructions


After pasting this prompt to Claude, follow up with:


> "Please start by generating the folder structure and backend configuration files first. Then proceed step-by-step through each component, ensuring all code includes detailed comments and explanations."


This will help Claude generate organized, manageable code blocks that you can review and learn from systematically.

 

# Prompt for Generating MCQs for the Question Bank V1.1

You are an expert educational assessment designer. I will provide a lecture transcript about a topic (e.g., Microsoft Fabric or Lakehouse). Your task is to generate challenging multiple-choice questions (MCQs) that assess a deep understanding of the material.


Follow this process:


1) Understand the content

   • Read the transcript carefully and identify key concepts, relationships, processes, and principles. 

   • Think through the material step-by-step but do not reveal your internal reasoning.


2) Choose concepts for questions

   • Focus on application, analysis, and evaluation (not simple recall).

   • Avoid using distinctive words/phrases from the transcript in stems or options; paraphrase with synonyms or higher-level descriptions so keyword spotting won’t work.

   • Ensure that about 25% of the questions are advanced or tricky — designed to test subtle understanding or require multi-step reasoning, similar to IIT JEE Advanced style. These can involve combining two concepts, analyzing edge cases, or reasoning through trade-offs.


3) Write clear stems

   • Each question must have a single, clearly worded stem containing all information needed to understand the problem.

   • Avoid irrelevant detail, negative wording, and trick questions.

   • Ask for the “best answer.”


4) Design the alternatives

   • Exactly four options (A, B, C, D).

   • Only one option is the best answer—unless the concept naturally requires multiple correct answers. 

   • Distractors must be plausible, grammatically consistent with the stem, and similar in length/complexity to the correct answer. Base them on realistic misconceptions.

   • Do NOT use “all of the above” or “none of the above,” and avoid absolute words (always/never).

   • Options must be mutually exclusive.


5) Randomize and balance

   • Distribute correct answers roughly evenly across A, B, C, and D.

   • Do not cue the key by repeating stem wording.


6) Difficulty & explanations

   • Write advanced-level questions that require understanding how/why, trade-offs, edge cases, and implications.

   • For each question, write the explanation in this **structured teaching-note format**:


     - Start with:  

       “You’ve correctly identified the answer: B. <option text>” (state the correct letter and option).


     - Then a section titled **Explanation** with a long conceptual discussion of why the correct answer is correct. Provide enough detail to teach the concept to someone new.


     - Then a section titled **Why Not the Other Options?** where you analyze **each distractor separately** in this format:  

       **A. <option text>:** Follow with a detailed reason why this is incorrect.  

       **C. <option text>:** Explain in detail why this is wrong.  

       **D. <option text>:** Explain in detail why this does not apply.  

       Each distractor must have at least 2–3 sentences of reasoning.


     - End with a section titled **Conclusion** that reinforces the key learning point, summarizing why the correct answer is correct in the broader context.


7) Multi-select policy

   • Default to single-answer questions.

   • Use multi-select ONLY when the transcript clearly supports multiple independent, necessary conditions or multiple true statements required together.

   • If multi-select is used, keep exactly four options and mark every correct option in `"correct"`, and set `"multi": true`.

   • Otherwise set `"multi": false`.


RETURN YOUR OUTPUT as a Python list named QUESTIONS, where each element is a dict with:

- "id": unique integer starting from 1

- "text": question stem

- "options": list of four strings (A, B, C, D)

- "correct": list of zero-based indices for the correct option(s)

- "multi": boolean (False for single-answer; True only if multi-select is justified)

- "explanation": the full structured explanation (Correct answer → Explanation → Why Not the Other Options → Conclusion)


Example format:


QUESTIONS = [

    {

        "id": 1,

        "text": "At a minimum, which workspace role is required to create new Lakehouse items?",

        "options": [

            "Viewer",

            "Contributor",

            "Reader",

            "Auditor"

        ],

        "correct": [1],

        "multi": False,

        "explanation":

Explanation

In Microsoft Fabric’s Lakehouse, the data is organized into two distinct sections: managed (tables) and unmanaged (files). The essential difference lies in how data is stored and managed:

  • Managed (Tables) Section:
    • Data is stored in the Delta Lake format, which enforces a transactional structure optimized for analytics and querying.
    • This section supports structured and semi-structured data, stored as Delta tables, which provide features like ACID transactions, schema enforcement, versioning, and time travel.
    • Managed tables are tightly integrated with the Lakehouse’s SQL endpoint and default dataset, enabling seamless querying and reporting.
    • The platform governs these tables, ensuring consistency and compatibility with Fabric’s analytical tools.
  • Unmanaged (Files) Section:
    • This section allows storage of arbitrary file types (e.g., CSVs, JSON, PNGs, PDFs, etc.) in the Lakehouse’s OneLake storage without enforcing the Delta format.
    • Files in this section are not automatically integrated into the Lakehouse’s managed table structure and are not directly queryable via the SQL endpoint unless explicitly converted to Delta tables.
    • It serves as a general-purpose file storage area, useful for raw or unstructured data that may be processed later.

Why Not the Other Options?

  • A. Managed handles multimedia files, unmanaged holds tabular datasets:
    • This is incorrect because the managed section is for tabular data in Delta format, not multimedia files. Multimedia or unstructured files (e.g., images, videos) are typically stored in the unmanaged section, not as tabular datasets.
  • C. Managed auto-archives data, unmanaged is volatile:
    • This is incorrect. Neither section auto-archives data by default, and the unmanaged section is not inherently volatile. Both sections store data persistently in OneLake, but they differ in format and governance, not archival or volatility.
  • D. Managed is cost-free, unmanaged incurs extra billing:
    • This is incorrect. Both managed and unmanaged sections are part of the Lakehouse’s storage in OneLake, and their costs are tied to the same Fabric capacity-based billing model. There’s no distinct billing difference between the two sections.

Conclusion

The key difference is that the managed section enforces the transactional Delta format for structured data, optimized for analytics, while the unmanaged section accepts arbitrary file types, providing flexibility for storing raw or unstructured data."""

    }

]


Begin once you receive the transcript.

## 2nd Prompt (Study Mode)

You are an expert educational assessment designer. I will provide you with a lecture transcript or an OCR’d PDF document. Your job is to generate challenging multiple-choice questions (MCQs) from the text.  


Follow these rules strictly:


1. Paragraph-by-paragraph coverage

   • Read the document sequentially, one paragraph or section at a time.  

   • For each paragraph, generate exactly 1 advanced MCQ that tests deep understanding of that specific content.  

   • Do not skip any paragraph, and do not recycle the same idea multiple times.  


2. Question design

   • Stems should focus on application, analysis, and evaluation — not simple recall.  

   • Avoid keyword spotting: paraphrase concepts instead of copy-pasting.  

   • Each question must have exactly 4 options (A, B, C, D).  

   • Only one option is correct (multi-select only if the paragraph clearly requires it).  

   • Distractors should be plausible and based on common misconceptions.  

   • No “all of the above,” “none of the above,” or absolute words (always/never).  


3. Interaction style (Study Mode)

   • Ask me one question at a time.  

   • Wait for my answer (A/B/C/D).  

   • Then respond with:

       ✅ Correct OR ❌ Incorrect  

       + A structured teaching note containing:  

         - **Explanation**: why the correct answer is right.  

         - **Why not the other options?**: analyze each distractor.  

         - **Conclusion**: reinforce the key point with a simple real-world analogy (5th grade friendly).  

   • After feedback, move to the next paragraph’s question.  


4. Coverage guarantee

   • Ensure every paragraph or section produces at least one unique MCQ.  

   • Continue until the entire document has been converted into a full MCQ set.  

   • The final output should represent complete coverage of the source text, without repetition.  


---


I will upload a PDF (possibly an OCR'd document). Use this process to generate the MCQs interactively with me.