# MCP Server Deployment Checklist

Use this checklist to ensure a successful deployment of your enterprise MCP Server.

## Pre-Deployment

### Prerequisites
- [ ] Azure CLI installed and configured (`az --version`)
- [ ] Terraform >= 1.5.0 installed (`terraform --version`)
- [ ] Docker installed (`docker --version`)
- [ ] Node.js >= 20.0.0 installed (`node --version`)
- [ ] Azure subscription with Owner or Contributor role
- [ ] Valid Azure Entra ID tenant
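
The version requirements above can be checked with a small script rather than by eye. The following is a minimal sketch, assuming bash and GNU `sort -V`; the helper names `version_ge` and `check_min` are illustrative and not part of this project.

```bash
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort, GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints a status line for one tool.
# $1 = tool name, $2 = installed version (empty if missing), $3 = minimum.
check_min() {
  local name="$1" found="$2" minimum="$3"
  if [ -z "$found" ]; then
    echo "MISSING  $name (need >= $minimum)"
    return 1
  elif version_ge "$found" "$minimum"; then
    echo "OK       $name $found"
  else
    echo "TOO OLD  $name $found (need >= $minimum)"
    return 1
  fi
}

# Checks the two tools that have a minimum version in this checklist.
check_prerequisites() {
  local failed=0
  check_min node "$(node --version 2>/dev/null | tr -d 'v')" 20.0.0 || failed=1
  check_min terraform \
    "$(terraform version 2>/dev/null | head -n 1 | sed 's/^Terraform v//')" \
    1.5.0 || failed=1
  return "$failed"
}
```

Run `check_prerequisites` after sourcing the script. It returns non-zero if either tool is missing or too old, so it can gate a CI job.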

### Azure Entra ID Setup
- [ ] Run `setup-entra-id.ps1` or `setup-entra-id.sh`
- [ ] Save Tenant ID, Client ID, and Client Secret securely
- [ ] Grant admin consent for API permissions in Azure Portal
- [ ] Assign test users to the application
- [ ] (Optional) Configure application roles
- [ ] (Optional) Set up conditional access policies

### Configuration
- [ ] Update `terraform/terraform.tfvars` with your values
- [ ] Choose globally unique names for ACR and PostgreSQL
- [ ] Set strong PostgreSQL admin password
- [ ] Configure tags for resource management
- [ ] Review network configuration (address spaces, subnets)
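
Name problems usually surface only during `terraform apply`, so it can help to validate candidate names first. This sketch encodes the naming rules as I understand them from Azure's documentation (ACR: 5 to 50 alphanumeric characters; PostgreSQL server: 3 to 63 lowercase letters, digits, or hyphens, not starting or ending with a hyphen). Treat those limits as assumptions to confirm; the function names are illustrative.

```bash
#!/usr/bin/env bash
# Format check only: a well-formed name can still be taken by someone else.

# ACR registry name: 5-50 characters, letters and digits only (assumed rule).
valid_acr_name() {
  local LC_ALL=C
  [[ "$1" =~ ^[a-zA-Z0-9]{5,50}$ ]]
}

# PostgreSQL server name: 3-63 characters, lowercase letters, digits and
# hyphens, with no leading or trailing hyphen (assumed rule).
valid_postgres_name() {
  local LC_ALL=C
  [[ "$1" =~ ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ]]
}
```

A format check does not prove global uniqueness. For ACR, `az acr check-name --name <acr-name>` reports whether the name is still available.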

### Security
- [ ] Obtain or generate SSL certificate for Application Gateway
- [ ] Place certificate in `terraform/cert.pfx`
- [ ] Set certificate password in variables
- [ ] Review NSG rules and adjust if needed
- [ ] Configure allowed CORS origins
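
For a test environment, a self-signed certificate is often enough to get Application Gateway running before a production certificate is issued. The following is a minimal sketch using `openssl`; the function name, the 30-day lifetime, and the example host name are assumptions, and the output is not suitable for production.

```bash
#!/usr/bin/env bash
# Creates a self-signed certificate and bundles it as cert.pfx.
# $1 = output directory, $2 = common name, $3 = PFX password.
# Note: the password is passed on the command line, so it can be visible in
# the process list; prefer an env var or file for anything beyond testing.
make_test_pfx() {
  local out_dir="$1" common_name="$2" pfx_password="$3"
  mkdir -p "$out_dir" || return 1

  # Self-signed certificate with an unencrypted private key (-nodes).
  openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
    -subj "/CN=${common_name}" \
    -keyout "${out_dir}/key.pem" -out "${out_dir}/cert.pem" 2>/dev/null || return 1

  # Bundle key and certificate into a password-protected PKCS#12 file.
  openssl pkcs12 -export \
    -inkey "${out_dir}/key.pem" -in "${out_dir}/cert.pem" \
    -out "${out_dir}/cert.pfx" -passout "pass:${pfx_password}" || return 1
}
```

For example, `make_test_pfx terraform mcp.example.test "$CERT_PASSWORD"` writes `terraform/cert.pfx`, the location this checklist expects. Delete `key.pem` and `cert.pem` afterwards if they are not needed.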

## Deployment Phase

### Infrastructure Deployment
- [ ] Navigate to `terraform/` directory
- [ ] Run `terraform init`
- [ ] Review `terraform plan` output carefully
- [ ] Run `terraform apply` and confirm
- [ ] Verify all resources created successfully
- [ ] Save Terraform outputs (ACR, PostgreSQL FQDN, etc.)
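
One way to make sure the reviewed plan is exactly what gets applied is to save the plan to a file and apply that file. The sketch below assumes the `terraform/` directory from this checklist; the function names, the `tfplan` file name, and the `DRY_RUN` switch are illustrative.

```bash
#!/usr/bin/env bash
# Runs a command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Initializes the working directory and writes the plan to tfplan.
plan_infra() {
  local tf_dir="${1:-terraform}"
  run terraform -chdir="$tf_dir" init -input=false &&
    run terraform -chdir="$tf_dir" plan -input=false -out=tfplan
}

# Applies the saved plan. Terraform does not prompt for confirmation when
# given a plan file, so review the plan output before calling this.
apply_infra() {
  local tf_dir="${1:-terraform}"
  run terraform -chdir="$tf_dir" apply -input=false tfplan
}

# Saves outputs as JSON. The file can contain sensitive values, so it is
# created readable by the current user only.
save_outputs() {
  local tf_dir="${1:-terraform}" out_file="${2:-terraform-outputs.json}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ terraform -chdir=$tf_dir output -json > $out_file"
    return 0
  fi
  ( umask 077; terraform -chdir="$tf_dir" output -json > "$out_file" )
}
```

Running `DRY_RUN=1 plan_infra` prints the commands without executing them, which is a cheap way to check the sequence before touching real infrastructure.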

### Application Deployment
- [ ] Navigate to `server/` directory
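- [ ] Log in to ACR: `az acr login --name <acr-name>`
- [ ] Build Docker image: `docker build -t mcpserver:latest .`
- [ ] Tag image for ACR: `docker tag mcpserver:latest <acr-name>.azurecr.io/mcpserver:latest`
- [ ] Push image to ACR: `docker push <acr-name>.azurecr.io/mcpserver:latest`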
- [ ] Verify image in ACR: `az acr repository list --name <acr-name>`
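
The build, tag, and push steps can be wrapped in one function. This sketch assumes the default ACR login server form `<acr-name>.azurecr.io` and the `mcpserver` repository name used above. Using a unique tag per build instead of `latest` is a suggestion rather than something this checklist requires, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Prints the full image reference for a registry, repository and tag.
# ACR login servers are lowercase, so the registry name is lowercased.
image_ref() {
  local acr_name="$1" repository="$2" tag="$3"
  printf '%s.azurecr.io/%s:%s\n' \
    "$(printf '%s' "$acr_name" | tr '[:upper:]' '[:lower:]')" \
    "$repository" "$tag"
}

# Builds the image in the current directory and pushes it to ACR.
# $1 = ACR name, $2 = tag (for example a short git commit hash).
publish_image() {
  local acr_name="$1" tag="$2" target
  target="$(image_ref "$acr_name" mcpserver "$tag")"
  az acr login --name "$acr_name" &&
    docker build -t "mcpserver:${tag}" . &&
    docker tag "mcpserver:${tag}" "$target" &&
    docker push "$target" &&
    az acr repository show-tags --name "$acr_name" \
      --repository mcpserver --output table
}
```

From the `server/` directory, `publish_image <acr-name> "$(git rev-parse --short HEAD)"` pushes an image whose tag identifies the commit it was built from, which makes rollbacks easier to reason about.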

### Container App Update
- [ ] Update Container App with new image
- [ ] Wait for deployment to complete
- [ ] Check Container App status: `az containerapp show --name <app-name> --resource-group <resource-group>`
- [ ] Verify replicas are running
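
The update and status checks might look like the following. It is a sketch under assumptions: the app and resource group names are placeholders, `az containerapp` may require the Container Apps CLI extension, and the `DRY_RUN` switch is an illustrative convenience.

```bash
#!/usr/bin/env bash
# Runs a command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Points the Container App at a new image, then shows its provisioning
# state and the replicas of the active revision.
# $1 = app name, $2 = resource group, $3 = full image reference.
update_container_app() {
  local app="$1" rg="$2" image="$3"
  run az containerapp update --name "$app" --resource-group "$rg" \
      --image "$image" &&
    run az containerapp show --name "$app" --resource-group "$rg" \
      --query properties.provisioningState --output tsv &&
    run az containerapp replica list --name "$app" --resource-group "$rg" \
      --output table
}
```

A provisioning state of `Succeeded` only says the update was accepted. The replica list and the health endpoints are what confirm the new image is actually serving traffic.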

## Post-Deployment

### Verification
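- [ ] Test health endpoint: `curl https://<ip>/health` (add `-k` while the certificate does not yet match the address)
- [ ] Test readiness endpoint: `curl https://<ip>/ready` (add `-k` while the certificate does not yet match the address)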
- [ ] Test authentication with Azure CLI token
- [ ] Verify MCP SSE endpoint connection
- [ ] Check logs in Log Analytics
- [ ] Review Container App metrics
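
A freshly deployed app may need a minute before it answers, so a smoke test benefits from retries. The sketch below covers the endpoint and authentication checks above. The `/health` and `/ready` paths come from this checklist; the `/sse` path, the token resource (for example `api://<client-id>`), and all function names are assumptions to adapt.

```bash
#!/usr/bin/env bash
# Runs a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Succeeds when the URL answers with an HTTP status below 400.
probe() {
  curl -fsS --max-time 10 -o /dev/null "$1"
}

# Prints the HTTP status of an SSE endpoint. The stream stays open, so curl
# is expected to hit --max-time; -w still reports the status code.
# $1 = URL, $2 = bearer token.
sse_status() {
  curl -sS -N --max-time 5 -o /dev/null -w '%{http_code}' \
    -H "Accept: text/event-stream" \
    -H "Authorization: Bearer $2" "$1" 2>/dev/null || true
}

# $1 = base URL, $2 = token resource (assumed form: api://<client-id>).
smoke_test() {
  local base_url="$1" api_resource="$2" token
  retry 10 6 probe "${base_url}/health" || { echo "health check failed"; return 1; }
  retry 10 6 probe "${base_url}/ready" || { echo "readiness check failed"; return 1; }
  token="$(az account get-access-token --resource "$api_resource" \
    --query accessToken --output tsv)" || return 1
  [ "$(sse_status "${base_url}/sse" "$token")" = "200" ] ||
    { echo "SSE check failed"; return 1; }
  echo "smoke test passed"
}
```

`retry` takes any command, so the same helper can wrap other slow-to-settle checks, such as waiting for replicas to appear.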

### DNS and SSL
- [ ] Create DNS A record pointing to Application Gateway IP
- [ ] Update Application Gateway with production SSL certificate
- [ ] Verify SSL certificate validity
- [ ] Test HTTPS connection
- [ ] Enable HTTP to HTTPS redirect
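
Certificate validity can be checked from the command line with `openssl x509 -checkend`, which fails if the certificate expires within a given number of seconds. A minimal sketch follows; the function names and the example thresholds are illustrative.

```bash
#!/usr/bin/env bash
# Succeeds when the PEM certificate in file $1 is still valid $2 days from now.
cert_file_valid_for_days() {
  local cert_file="$1" days="$2"
  openssl x509 -in "$cert_file" -noout \
    -checkend "$((days * 86400))" >/dev/null
}

# Same check against the certificate that host $1 serves on port 443.
# -servername sends SNI so the gateway returns the certificate for that host.
served_cert_valid_for_days() {
  local host="$1" days="$2"
  openssl s_client -connect "${host}:443" -servername "$host" \
    </dev/null 2>/dev/null |
    openssl x509 -noout -checkend "$((days * 86400))" >/dev/null
}
```

For example, `served_cert_valid_for_days <hostname> 30 || echo "renew soon"` could run on a schedule once the DNS record is in place.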

### Monitoring Setup
- [ ] Create Azure Monitor alerts for:
  - [ ] High error rate (>5%)
  - [ ] High response time (>2s)
  - [ ] Authentication failures
  - [ ] Low availability
  - [ ] High resource usage
- [ ] Configure action groups for notifications
- [ ] Create custom dashboard in Azure Portal
- [ ] Set up Log Analytics saved queries
- [ ] Test alert notifications

### Client Configuration
- [ ] Distribute client configuration to users
- [ ] Update `claude_desktop_config.json` with production URL
- [ ] Test client connection from multiple machines
- [ ] Verify authentication works for all users
- [ ] Document any troubleshooting steps
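
Generating the client configuration from a template keeps the production URL consistent across users. The sketch below is hypothetical: it assumes clients reach the server through the `mcp-remote` npm bridge and that the SSE endpoint lives at `/sse`, neither of which this checklist confirms. Only the top-level `mcpServers` key is the standard Claude Desktop layout.

```bash
#!/usr/bin/env bash
# Prints a claude_desktop_config.json body for one remote MCP server.
# $1 = display name for the server entry, $2 = server URL.
# Values are inserted as-is, so they must not contain quotes or backslashes.
render_client_config() {
  local server_name="$1" server_url="$2"
  cat <<EOF
{
  "mcpServers": {
    "${server_name}": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "${server_url}"]
    }
  }
}
EOF
}
```

For example, `render_client_config enterprise-mcp https://<production-host>/sse > claude_desktop_config.json` produces a file that can be reviewed and distributed. Merge it by hand if users already have other servers configured.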

### Documentation
- [ ] Update internal wiki with deployment info
- [ ] Document server URL and configuration
- [ ] Create runbook for common issues
- [ ] Document escalation procedures
- [ ] Share monitoring dashboard links

## User Onboarding

### Azure Entra ID
- [ ] Assign users to MCP Server application
- [ ] Grant appropriate roles (Admin vs User)
- [ ] Configure group-based access if needed
- [ ] Test user authentication

### Training
- [ ] Provide client configuration guide to users
- [ ] Document how to get access tokens
- [ ] Explain available MCP tools and capabilities
- [ ] Share troubleshooting guide
- [ ] Set up support channel (Teams/Slack)

## Security Hardening

### Network
- [ ] Review and restrict NSG rules
- [ ] Enable private endpoints for all services
- [ ] Configure Application Gateway WAF to Prevention mode
- [ ] Review firewall rules
- [ ] Enable DDoS protection

### Access Control
- [ ] Implement principle of least privilege
- [ ] Review and remove unnecessary permissions
- [ ] Enable Microsoft Entra Privileged Identity Management (PIM) if available
- [ ] Configure conditional access policies
- [ ] Enable MFA for admin accounts

### Secrets
- [ ] Rotate client secrets
- [ ] Store all secrets in Key Vault
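- [ ] Confirm Key Vault soft delete is enabled (the default for new vaults) and consider purge protection
- [ ] Configure Key Vault access policies or Azure RBAC role assignments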
- [ ] Set up secret expiration alerts
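
Setting an expiry date when a secret is stored gives expiration alerts something to trigger on. The following is a minimal sketch, assuming GNU `date` and a 90-day rotation period; the vault and secret names are placeholders and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Prints a UTC timestamp $1 days from now in the format Key Vault expects.
# Uses GNU date; on macOS the equivalent is `date -u -v+90d ...`.
expiry_after_days() {
  date -u -d "+${1} days" '+%Y-%m-%dT%H:%M:%SZ'
}

# Stores a secret with an expiry date.
# $1 = vault name, $2 = secret name, $3 = value, $4 = lifetime in days.
# Note: --value can end up in shell history; --file avoids that.
store_secret() {
  local vault="$1" name="$2" value="$3" days="${4:-90}"
  az keyvault secret set --vault-name "$vault" --name "$name" \
    --value "$value" --expires "$(expiry_after_days "$days")" --output none
}
```

Key Vault does not enforce the expiry on reads by default, so the date mainly serves alerting and audit. Rotation still has to be scheduled.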

### Compliance
- [ ] Enable audit logging
- [ ] Configure log retention per compliance requirements
- [ ] Set up log export to long-term storage
- [ ] Document data residency
- [ ] Review compliance with organizational policies

## Operational Readiness

### Backup and Recovery
- [ ] Configure PostgreSQL automated backups
- [ ] Test database restore procedure
- [ ] Document recovery time objective (RTO)
- [ ] Document recovery point objective (RPO)
- [ ] Create disaster recovery plan

### Cost Management
- [ ] Set up budget alerts
- [ ] Review resource SKUs for optimization
- [ ] Enable auto-shutdown for non-production environments
- [ ] Tag all resources for cost allocation
- [ ] Schedule monthly cost review

### Maintenance
- [ ] Schedule regular update windows
- [ ] Document update procedures
- [ ] Create rollback plan
- [ ] Set up change management process
- [ ] Define SLA commitments

## Sign-Off

### Technical Review
- [ ] DevOps team approval
- [ ] Security team review completed
- [ ] Network team approval
- [ ] Database team verification

### Business Review
- [ ] Stakeholder notification sent
- [ ] User communication prepared
- [ ] Support team trained
- [ ] Documentation published
- [ ] Go-live date confirmed

### Final Checks
- [ ] All checklist items completed
- [ ] No critical issues outstanding
- [ ] Monitoring and alerts verified
- [ ] Support procedures documented
- [ ] Rollback plan tested

---

## Deployment Sign-Off

**Deployment Date**: _________________

**Deployed By**: _________________

**Reviewed By**: _________________

**Approval**: _________________

---

## Post-Go-Live

### Week 1
- [ ] Daily monitoring of logs and metrics
- [ ] User feedback collection
- [ ] Performance tuning as needed
- [ ] Address any issues immediately

### Weeks 2-4
- [ ] Continue monitoring
- [ ] Optimize based on usage patterns
- [ ] Scale resources if needed
- [ ] Document lessons learned

### Month 1+
- [ ] Regular maintenance schedule
- [ ] Monthly cost review
- [ ] Quarterly security review
- [ ] Annual disaster recovery test
