About Me

I am an MCSE in Data Management and Analytics, specializing in MS SQL Server, and an MCP in Azure. With more than 19 years of experience in the IT industry, I bring expertise in data management, Azure Cloud, data center migration, infrastructure architecture planning, virtualization, and automation. I have a deep passion for driving innovation through infrastructure automation, particularly using Terraform for efficient provisioning. If you're looking for guidance on automating your infrastructure or have questions about Azure, SQL Server, or cloud migration, feel free to reach out. I often write to capture my own experiences and insights for future reference, but I hope that sharing them through my blog will help others on their journey as well. Thank you for reading!

# MCP Server Monitoring and Observability Guide



This guide covers monitoring, logging, and observability for the MCP Server deployment.

## Table of Contents

1. [Azure Monitor Integration](#azure-monitor-integration)
2. [Log Analytics](#log-analytics)
3. [Application Insights](#application-insights)
4. [Alerts and Notifications](#alerts-and-notifications)
5. [Dashboards](#dashboards)
6. [Metrics](#metrics)
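7. [Troubleshooting](#troubleshooting)
8. [Log Retention](#log-retention)
9. [Exporting Logs](#exporting-logs)
10. [Best Practices](#best-practices)
11. [Support](#support)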

## Azure Monitor Integration

The MCP Server is fully integrated with Azure Monitor for comprehensive observability.

### Key Components

- **Log Analytics Workspace**: Centralized log storage
- **Application Insights**: Application performance monitoring
- **Azure Monitor Metrics**: Resource-level metrics
- **Container App Logs**: Application and system logs

## Log Analytics

### Accessing Logs

1. Navigate to Azure Portal
2. Go to your Log Analytics Workspace
3. Select "Logs" from the left menu

### Common Queries

#### View All Application Logs
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
| take 100
```

#### Search for Errors
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "error"  // contains is case-insensitive, so this also matches "ERROR"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
```

#### Authentication Failures
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "401" or Log_s contains "Unauthorized"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
```

#### User Activity
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "User authenticated"
| extend UserId = extract("userId\":\"([^\"]+)", 1, Log_s)
| summarize Count = count() by UserId, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```

#### Performance Metrics
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "response time" or Log_s contains "duration"
| extend ResponseTime = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| summarize avg(ResponseTime), max(ResponseTime), min(ResponseTime) by bin(TimeGenerated, 5m)
```

#### Database Query Performance
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| where Log_s contains "database" and Log_s contains "query"
| extend QueryDuration = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| summarize avg(QueryDuration), count() by bin(TimeGenerated, 5m)
```
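
The `extract` patterns in the queries above only match if the application writes JSON-style log lines containing fields such as `userId":"..."` and `duration":<digits>`. This guide does not show the actual log format, so the sample line below is an assumption. As a minimal sketch, the same patterns can be checked locally against a sample line before relying on them in Log Analytics:

```bash
# Hypothetical sample log line; the real format depends on the server's logger.
sample_log='{"level":"info","msg":"User authenticated","userId":"u-123","duration":245}'

# Rough mirror of the KQL pattern duration\":([0-9]+): print the captured digits.
extract_duration() {
  printf '%s\n' "$1" | sed -n 's/.*duration":\([0-9][0-9]*\).*/\1/p'
}

# Rough mirror of the KQL pattern userId\":\"([^\"]+): print the captured user ID.
extract_user_id() {
  printf '%s\n' "$1" | sed -n 's/.*userId":"\([^"][^"]*\)".*/\1/p'
}

extract_duration "$sample_log"   # prints 245
extract_user_id "$sample_log"    # prints u-123
```

If the logger writes a space after the colon (`"duration": 245`), neither this sketch nor the KQL patterns will match, and the regular expressions need to allow optional whitespace.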

## Application Insights

### Key Metrics

1. **Request Rate**: Requests per second
2. **Response Time**: Average response time
3. **Failure Rate**: Failed requests percentage
4. **Dependencies**: External service calls (database, etc.)

### Viewing Metrics

Navigate to: **Application Insights > Investigate > Performance**

### Custom Metrics

The MCP Server emits custom metrics:

- `mcp.connections.active`: Active MCP connections
- `mcp.tools.calls`: Tool call count
- `mcp.auth.success`: Successful authentications
- `mcp.auth.failed`: Failed authentications

## Alerts and Notifications

### Recommended Alerts

#### High Error Rate
```json
{
  "name": "High Error Rate",
  "description": "Alert when error rate exceeds 5%",
  "condition": {
    "metric": "requests/failed",
    "threshold": 5,
    "timeAggregation": "Average",
    "windowSize": "PT5M"
  },
  "actions": [
    {
      "actionGroup": "ops-team",
      "emailSubject": "MCP Server High Error Rate"
    }
  ]
}
```
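
The description above speaks of a 5% error rate, whereas `requests/failed` is a count of failed requests. As a rough sketch of the arithmetic behind a percentage threshold, assuming you have the failed and total request counts for the same window (the function names are illustrative and not part of any Azure tooling):

```bash
# Percentage of failed requests; prints 0.00 when there was no traffic.
error_rate_pct() {
  failed=$1
  total=$2
  awk -v f="$failed" -v t="$total" 'BEGIN {
    if (t == 0) { printf "0.00\n" } else { printf "%.2f\n", (f / t) * 100 }
  }'
}

# Succeeds (exit 0) when the error rate is strictly above the threshold percentage.
# Usage: error_rate_exceeds <failed> <total> <threshold-percent>
error_rate_exceeds() {
  rate=$(error_rate_pct "$1" "$2")
  awk -v r="$rate" -v th="$3" 'BEGIN { exit !(r > th) }'
}

error_rate_pct 6 100   # prints 6.00
error_rate_exceeds 6 100 5 && echo "alert: error rate above 5%"
```

With 6 failures out of 100 requests the rate is 6%, which crosses a 5% threshold; 6 failures out of 1,000 requests would not, even though the raw count is the same.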

#### High Response Time
```json
{
  "name": "High Response Time",
  "description": "Alert when average response time exceeds 2 seconds",
  "condition": {
    "metric": "requests/duration",
    "threshold": 2000,
    "timeAggregation": "Average",
    "windowSize": "PT5M"
  }
}
```

#### Authentication Failures
```json
{
  "name": "Authentication Failures",
  "description": "Alert on repeated authentication failures",
  "condition": {
    "query": "ContainerAppConsoleLogs_CL | where Log_s contains 'Authentication failed' | summarize count()",
    "threshold": 10,
    "timeAggregation": "Total",
    "windowSize": "PT5M"
  }
}
```

#### Low Availability
```json
{
  "name": "Container App Unhealthy",
  "description": "Alert when health check fails",
  "condition": {
    "metric": "healthcheck/status",
    "threshold": 1,
    "operator": "LessThan",
    "windowSize": "PT5M"
  }
}
```

### Creating Alerts via Azure CLI

```bash
# Create action group
az monitor action-group create \
  --name ops-team \
  --resource-group rg-mcp-server-prod \
  --short-name ops \
  --action email admin admin@yourcompany.com

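# Create metric alert
# requests/failed is an Application Insights metric, so the scope is the
# Application Insights resource ({app-insights-name} is a placeholder).
az monitor metrics alert create \
  --name high-error-rate \
  --resource-group rg-mcp-server-prod \
  --scopes /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.Insights/components/{app-insights-name} \
  --condition "count requests/failed > 5" \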
  --window-size 5m \
  --evaluation-frequency 1m \
  --action ops-team
```

## Dashboards

### Create Custom Dashboard

1. Navigate to Azure Portal
2. Select "Dashboard" > "New dashboard"
3. Add tiles for:
   - Request count
   - Response time
   - Error rate
   - Active connections
   - CPU/Memory usage

### Sample Dashboard JSON

```json
{
  "lenses": {
    "0": {
      "order": 0,
      "parts": {
        "0": {
          "position": {
            "x": 0,
            "y": 0,
            "colSpan": 6,
            "rowSpan": 4
          },
          "metadata": {
            "type": "Extension/HubsExtension/PartType/MonitorChartPart",
            "settings": {
              "title": "Request Rate",
              "visualization": {
                "chartType": "Line",
                "legendVisualization": {
                  "isVisible": true
                }
              }
            }
          }
        }
      }
    }
  }
}
```

## Metrics

### Container App Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Replica Count | Number of active replicas | Min: 2, Max: 10 |
| CPU Usage | CPU percentage | < 80% |
| Memory Usage | Memory percentage | < 80% |
| Request Count | Total requests | Monitor trends |
| Request Duration | Average response time | < 2 seconds |

### Database Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Connections | Active connections | < 80% of max |
| CPU Usage | Database CPU | < 80% |
| Storage | Used storage | < 80% of quota |
| Query Duration | Average query time | < 500ms |

### Application Gateway Metrics

| Metric | Description | Threshold |
|--------|-------------|-----------|
| Throughput | Bytes/second | Monitor trends |
| Failed Requests | Count of 5xx errors | < 1% |
| Backend Response Time | Time to first byte | < 1 second |
| Healthy Host Count | Number of healthy backends | > 0 |

## Troubleshooting

### Common Issues

#### 1. High Response Time

**Symptoms**: Slow API responses

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "ca-mcpserver-prod"
| extend Duration = todouble(extract("duration\":([0-9]+)", 1, Log_s))
| where Duration > 2000
| project TimeGenerated, Log_s
```

**Solutions**:
- Scale up replicas
- Optimize database queries
- Check network latency
- Review application code

#### 2. Authentication Failures

**Symptoms**: 401 errors

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where Log_s contains "Token verification failed"
| project TimeGenerated, Log_s
```

**Solutions**:
- Verify Entra ID configuration
- Check token expiration
- Validate audience/issuer settings
- Review user permissions

#### 3. Database Connection Issues

**Symptoms**: Database errors

**Investigation**:
```kusto
ContainerAppConsoleLogs_CL
| where Log_s contains "PostgreSQL" and Log_s contains "error"
| project TimeGenerated, Log_s
```

**Solutions**:
- Check connection string
- Verify firewall rules
- Check connection pool size
- Review database health

#### 4. Memory Leaks

**Symptoms**: Increasing memory usage

**Investigation**:
- Check container app metrics
- Review memory usage trends
- Look for unclosed connections

**Solutions**:
- Restart container app
- Review application code
- Implement connection pooling
- Add memory limits

### Health Check Endpoints

#### Application Health
```bash
curl https://mcp.yourcompany.com/health
```

Expected Response:
```json
{
  "status": "healthy",
  "timestamp": "2025-12-09T10:00:00Z",
  "version": "1.0.0",
  "uptime": 86400
}
```
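
A deployment script can gate on this response instead of checking it by eye. The following is a minimal sketch that assumes the `/health` endpoint returns the JSON shape shown above; the function names, retry count, and delay are illustrative choices, not values taken from the MCP Server itself:

```bash
# Succeeds when the JSON read from stdin reports "status": "healthy".
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Poll a health URL until it reports healthy or the attempts run out.
# Usage: wait_for_healthy <url> [attempts] [delay_seconds]
wait_for_healthy() {
  url=$1
  attempts=${2:-10}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps error messages.
    if curl -fsS --max-time 5 "$url" | is_healthy; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_healthy https://mcp.yourcompany.com/health 12 5` waits up to roughly one minute and returns a non-zero exit code if the server never reports healthy, which lets a pipeline stop before later steps run.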

#### Readiness Check
```bash
curl https://mcp.yourcompany.com/ready
```

#### Metrics Endpoint
```bash
curl -H "Authorization: Bearer $TOKEN" https://mcp.yourcompany.com/metrics
```

## Log Retention

- **Container App Logs**: 30 days (configurable)
- **Log Analytics**: 30 days (configurable up to 730 days)
- **Application Insights**: 90 days default
- **Archived Logs**: Configure export to Storage Account for long-term retention
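
Workspace retention can also be changed from the CLI. The sketch below wraps `az monitor log-analytics workspace update` and rejects values outside the 30 to 730 day range listed above before calling Azure. The function name is illustrative, and the workspace name is a placeholder because this guide does not name the workspace:

```bash
# Set the retention period of a Log Analytics workspace.
# Usage: set_workspace_retention <resource-group> <workspace-name> <days>
set_workspace_retention() {
  rg=$1
  workspace=$2
  days=$3
  # Reject anything that is not a whole number.
  case $days in
    ''|*[!0-9]*) echo "days must be a whole number" >&2; return 2 ;;
  esac
  # Range taken from the retention list above: 30 to 730 days.
  if [ "$days" -lt 30 ] || [ "$days" -gt 730 ]; then
    echo "days must be between 30 and 730" >&2
    return 2
  fi
  az monitor log-analytics workspace update \
    --resource-group "$rg" \
    --workspace-name "$workspace" \
    --retention-time "$days"
}
```

A call such as `set_workspace_retention rg-mcp-server-prod <workspace-name> 90` would raise retention to 90 days; longer retention increases storage cost, so it is worth aligning the value with your compliance requirements first.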

## Exporting Logs

### To Storage Account

```bash
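# Container Apps log categories are exposed on the managed environment, which must
# use Azure Monitor as its logs destination ({environment-name} is a placeholder).
az monitor diagnostic-settings create \
  --name export-to-storage \
  --resource /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/managedEnvironments/{environment-name} \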
  --storage-account {storage-account-id} \
  --logs '[{"category":"ContainerAppConsoleLogs","enabled":true}]'
```

### To Event Hub

```bash
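# Log categories are configured on the Container Apps managed environment
# ({environment-name} is a placeholder).
az monitor diagnostic-settings create \
  --name export-to-eventhub \
  --resource /subscriptions/{sub-id}/resourceGroups/rg-mcp-server-prod/providers/Microsoft.App/managedEnvironments/{environment-name} \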
  --event-hub {event-hub-name} \
  --event-hub-rule {auth-rule-id} \
  --logs '[{"category":"ContainerAppConsoleLogs","enabled":true}]'
```

## Best Practices

1. **Set up alerts early** - Don't wait for incidents
2. **Review logs regularly** - Weekly log reviews
3. **Monitor trends** - Look for patterns over time
4. **Document incidents** - Keep runbooks updated
5. **Test alerts** - Ensure notifications work
6. **Rotate credentials** - Regular security reviews
7. **Capacity planning** - Monitor growth trends
8. **Cost optimization** - Review unused resources

## Support

For monitoring issues:
- DevOps Team: devops@yourcompany.com
- Azure Support: https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade

# MCP Server Deployment Checklist


Use this checklist to ensure a successful deployment of your enterprise MCP Server.

## Pre-Deployment

### Prerequisites
- [ ] Azure CLI installed and configured (`az --version`)
- [ ] Terraform >= 1.5.0 installed (`terraform --version`)
- [ ] Docker installed (`docker --version`)
- [ ] Node.js >= 20.0.0 installed (`node --version`)
- [ ] Azure subscription with Owner or Contributor role
- [ ] Valid Microsoft Entra ID tenant

### Microsoft Entra ID Setup
- [ ] Run `setup-entra-id.ps1` or `setup-entra-id.sh`
- [ ] Save Tenant ID, Client ID, and Client Secret securely
- [ ] Grant admin consent for API permissions in Azure Portal
- [ ] Assign test users to the application
- [ ] (Optional) Configure application roles
- [ ] (Optional) Set up conditional access policies

### Configuration
- [ ] Update `terraform/terraform.tfvars` with your values
- [ ] Choose globally unique names for ACR and PostgreSQL
- [ ] Set strong PostgreSQL admin password
- [ ] Configure tags for resource management
- [ ] Review network configuration (address spaces, subnets)

### Security
- [ ] Obtain or generate SSL certificate for Application Gateway
- [ ] Place certificate in `terraform/cert.pfx`
- [ ] Set certificate password in variables
- [ ] Review NSG rules and adjust if needed
- [ ] Configure allowed CORS origins

## Deployment Phase

### Infrastructure Deployment
- [ ] Navigate to `terraform/` directory
- [ ] Run `terraform init`
- [ ] Review `terraform plan` output carefully
- [ ] Run `terraform apply` and confirm
- [ ] Verify all resources created successfully
- [ ] Save Terraform outputs (ACR, PostgreSQL FQDN, etc.)
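
The steps above can be scripted. This is a minimal sketch, assuming the configuration lives in `terraform/` and that `terraform.tfvars` there is loaded automatically; the `run` and `deploy_infra` helpers are illustrative names. Applying a saved plan does not prompt for confirmation, so the review has to happen on the `plan` output:

```bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Init, plan to a file, apply exactly that plan, then print the outputs.
# Usage: deploy_infra [terraform-dir]
deploy_infra() {
  dir=${1:-terraform}
  run terraform -chdir="$dir" init &&
  run terraform -chdir="$dir" plan -out=tfplan &&
  run terraform -chdir="$dir" apply tfplan &&
  run terraform -chdir="$dir" output
}
```

Running `DRY_RUN=1` first prints the four commands without touching Azure. In a real run you may prefer to execute `init` and `plan` by hand, read the plan, and only then apply `tfplan`.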

### Application Deployment
- [ ] Navigate to `server/` directory
- [ ] Login to ACR: `az acr login --name <acr-name>`
- [ ] Build Docker image: `docker build -t mcpserver:latest .`
- [ ] Tag image for ACR
- [ ] Push image to ACR
- [ ] Verify image in ACR: `az acr repository list --name <acr-name>`
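
The tag and push steps have no commands in the list above. A minimal sketch, assuming a registry in the public Azure cloud (login server `<acr-name>.azurecr.io`) and the `mcpserver:latest` image built in the previous step; the helper names are illustrative:

```bash
# Build the fully qualified image reference for an ACR in the public Azure cloud.
# Usage: acr_image_ref <acr-name> <repository> <tag>
acr_image_ref() {
  printf '%s.azurecr.io/%s:%s\n' "$1" "$2" "$3"
}

# Log in, tag the locally built image, and push it to the registry.
# Usage: push_to_acr <acr-name> <local-image> <tag>
push_to_acr() {
  acr=$1
  local_image=$2
  tag=$3
  # Strip any local tag (mcpserver:latest -> mcpserver) to get the repository name.
  repo=${local_image%%:*}
  target=$(acr_image_ref "$acr" "$repo" "$tag")
  az acr login --name "$acr" &&
  docker tag "$local_image" "$target" &&
  docker push "$target"
}
```

For example, `push_to_acr <acr-name> mcpserver:latest 1.0.0` pushes `<acr-name>.azurecr.io/mcpserver:1.0.0`. Pushing a version tag rather than only `latest` makes it easier to tell which build a Container App revision is running and to roll back.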

### Container App Update
- [ ] Update Container App with new image
- [ ] Wait for deployment to complete
- [ ] Check Container App status: `az containerapp show`
- [ ] Verify replicas are running
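
These checks can be run as one sequence. The sketch below assumes the Azure CLI `containerapp` extension is installed; `run` and `update_container_app` are illustrative helper names, and the image reference in the usage example is a placeholder:

```bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Point the Container App at a new image, then show its state and replicas.
# Usage: update_container_app <app-name> <resource-group> <image-ref>
update_container_app() {
  app=$1
  rg=$2
  image=$3
  run az containerapp update --name "$app" --resource-group "$rg" --image "$image" &&
  run az containerapp show --name "$app" --resource-group "$rg" \
    --query properties.provisioningState --output tsv &&
  run az containerapp replica list --name "$app" --resource-group "$rg" --output table
}
```

For example: `update_container_app ca-mcpserver-prod rg-mcp-server-prod <acr-name>.azurecr.io/mcpserver:1.0.0`. Setting `DRY_RUN=1` beforehand prints the three commands without changing the app.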

## Post-Deployment

### Verification
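- [ ] Test health endpoint: `curl -k https://<ip>/health` (`-k` skips certificate validation, which fails against a bare IP until DNS and the production certificate are in place)
- [ ] Test readiness endpoint: `curl -k https://<ip>/ready`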
- [ ] Test authentication with Azure CLI token
- [ ] Verify MCP SSE endpoint connection
- [ ] Check logs in Log Analytics
- [ ] Review Container App metrics

### DNS and SSL
- [ ] Create DNS A record pointing to Application Gateway IP
- [ ] Update Application Gateway with production SSL certificate
- [ ] Verify SSL certificate validity
- [ ] Test HTTPS connection
- [ ] Enable HTTP to HTTPS redirect

### Monitoring Setup
- [ ] Create Azure Monitor alerts for:
  - [ ] High error rate (>5%)
  - [ ] High response time (>2s)
  - [ ] Authentication failures
  - [ ] Low availability
  - [ ] High resource usage
- [ ] Configure action groups for notifications
- [ ] Create custom dashboard in Azure Portal
- [ ] Set up Log Analytics saved queries
- [ ] Test alert notifications

### Client Configuration
- [ ] Distribute client configuration to users
- [ ] Update `claude_desktop_config.json` with production URL
- [ ] Test client connection from multiple machines
- [ ] Verify authentication works for all users
- [ ] Document any troubleshooting steps

### Documentation
- [ ] Update internal wiki with deployment info
- [ ] Document server URL and configuration
- [ ] Create runbook for common issues
- [ ] Document escalation procedures
- [ ] Share monitoring dashboard links

## User Onboarding

### Microsoft Entra ID
- [ ] Assign users to MCP Server application
- [ ] Grant appropriate roles (Admin vs User)
- [ ] Configure group-based access if needed
- [ ] Test user authentication

### Training
- [ ] Provide client configuration guide to users
- [ ] Document how to get access tokens
- [ ] Explain available MCP tools and capabilities
- [ ] Share troubleshooting guide
- [ ] Set up support channel (Teams/Slack)

## Security Hardening

### Network
- [ ] Review and restrict NSG rules
- [ ] Enable private endpoints for all services
- [ ] Configure Application Gateway WAF to Prevention mode
- [ ] Review firewall rules
- [ ] Enable DDoS protection

### Access Control
- [ ] Implement principle of least privilege
- [ ] Review and remove unnecessary permissions
- [ ] Enable Microsoft Entra PIM if available
- [ ] Configure conditional access policies
- [ ] Enable MFA for admin accounts

### Secrets
- [ ] Rotate client secrets
- [ ] Store all secrets in Key Vault
- [ ] Enable Key Vault soft delete
- [ ] Configure access policies
- [ ] Set up secret expiration alerts

### Compliance
- [ ] Enable audit logging
- [ ] Configure log retention per compliance requirements
- [ ] Set up log export to long-term storage
- [ ] Document data residency
- [ ] Review compliance with organizational policies

## Operational Readiness

### Backup and Recovery
- [ ] Configure PostgreSQL automated backups
- [ ] Test database restore procedure
- [ ] Document recovery time objective (RTO)
- [ ] Document recovery point objective (RPO)
- [ ] Create disaster recovery plan

### Cost Management
- [ ] Set up budget alerts
- [ ] Review resource SKUs for optimization
- [ ] Enable auto-shutdown for non-prod
- [ ] Tag all resources for cost allocation
- [ ] Schedule monthly cost review

### Maintenance
- [ ] Schedule regular update windows
- [ ] Document update procedures
- [ ] Create rollback plan
- [ ] Set up change management process
- [ ] Define SLA commitments

## Sign-Off

### Technical Review
- [ ] DevOps team approval
- [ ] Security team review completed
- [ ] Network team approval
- [ ] Database team verification

### Business Review
- [ ] Stakeholder notification sent
- [ ] User communication prepared
- [ ] Support team trained
- [ ] Documentation published
- [ ] Go-live date confirmed

### Final Checks
- [ ] All checklist items completed
- [ ] No critical issues outstanding
- [ ] Monitoring and alerts verified
- [ ] Support procedures documented
- [ ] Rollback plan tested

---

## Deployment Sign-Off

**Deployment Date**: _________________

**Deployed By**: _________________

**Reviewed By**: _________________

**Approval**: _________________

---

## Post-Go-Live

### Week 1
- [ ] Daily monitoring of logs and metrics
- [ ] User feedback collection
- [ ] Performance tuning as needed
- [ ] Address any issues immediately

### Week 2-4
- [ ] Continue monitoring
- [ ] Optimize based on usage patterns
- [ ] Scale resources if needed
- [ ] Document lessons learned

### Month 1+
- [ ] Regular maintenance schedule
- [ ] Monthly cost review
- [ ] Quarterly security review
- [ ] Annual disaster recovery test

High-Quality Prompt for Terraform Project Generation

Before running this prompt with GitHub Copilot in Visual Studio Code, or in any other IDE, make sure that the Terraform MCP server and the filesystem MCP server are already installed.

Extend the prompt if you want it to cover more resources.

Create a complete Terraform project with the following requirements:

📁 1. Folder Structure

  • Create a root folder named terraform-rak in C:\drive.

  • Inside this folder, create separate .tf files for each module or resource.

📄 2. Core Terraform Files

  1. variables.tf

    • Define all variables required by the project.

  2. variables_development.tfvars

    • Store all variable values for the development environment.

  3. backend.tf

    • Configure a remote backend using an Azure Storage Account.

    • Ensure the Terraform state file is stored in a storage container.

🌐 3. Azure Resources (Each in Its Own File Using Azure Verified Modules)

🔹 Resource Group

🔹 Virtual Network

  • File: vnet.tf

  • Create a virtual network.

  • Use Azure Verified Modules.

  • VNet with 10.0.0.0/16 address space:

    • Main subnet (10.0.1.0/24) with Container Apps delegation

    • Private endpoint subnet (10.0.2.0/24)

🔹 Subnet

🔹 Network Security Group (NSG)

🔹 NSG Rules

  • File: network_security_rules.tf

  • Create security rules and:

    • Associate them with the NSG.

    • Associate the NSG with the subnet.

    • Use Azure Verified Modules

🔹 Route Table

  • File: route_table.tf

  • Create a route table with:

    • A route to the internet

    • A route to a virtual appliance (ASA firewall IP address)

    • Use Azure Verified Modules

🔹 Azure Container Apps Environment (CAE)

🔹 Container App

  • File: container_app.tf

  • Deploy a Container App and store its configuration here.

  • Use Azure Verified Modules

  • Enable a system-assigned identity.

🔹 Azure Container Registry

  • File: container_registry.tf

  • Create an Azure Container Registry.

  • Use Azure Verified Modules.

  • Ensure RBAC and ABAC are configured.

  • Create a private endpoint.

  • Requirements:

    • Azure Container Registry (Premium SKU)

    • Container Apps Environment with VNet integration

    • Container App with auto-scaling (1-10 replicas)

    • Health probes (liveness and readiness)

    • RBAC integration between Container Apps and ACR

🔍 4. Diagnostics & Monitoring


Additional Requirements


Final Output Expectation

Generate:

  • A complete folder structure

  • Individual .tf files with correct module references and dependencies

  • A working Terraform configuration ready to run with: