# DevOps Tutorial

Learn DevOps practices, tools, and methodologies to streamline development, deployment, and operations for modern applications.

## Overview

DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) to shorten the development lifecycle and provide continuous delivery with high software quality.

## DevOps Fundamentals

### Core Principles

- Collaboration: Break down silos between development and operations
- Automation: Automate repetitive tasks and processes
- Continuous Integration: Integrate code changes frequently
- Continuous Delivery: Deploy code changes automatically
- Monitoring: Monitor applications and infrastructure continuously
- Feedback: Implement fast feedback loops

### DevOps Culture

- Shared Responsibility: Everyone is responsible for the entire pipeline
- Fail Fast: Identify and fix issues quickly
- Learn from Failures: Use failures as learning opportunities
- Continuous Improvement: Always look for ways to improve

## Version Control with Git

### Git Workflow

```bash
# Feature branch workflow
git checkout -b feature/user-authentication
git add .
git commit -m "Add user authentication system"
git push origin feature/user-authentication

# Create a pull request for code review
# After approval, merge to the main branch
git checkout main
git pull origin main
git merge feature/user-authentication
git push origin main
```

### Git Hooks

```bash
#!/bin/sh
# pre-commit hook
echo "Running pre-commit checks..."

# Run linting
npm run lint
if [ $? -ne 0 ]; then
  echo "Linting failed. Please fix errors before committing."
  exit 1
fi

# Run tests
npm test
if [ $? -ne 0 ]; then
  echo "Tests failed. Please fix failing tests before committing."
  exit 1
fi

echo "Pre-commit checks passed!"
```

### Conventional Commits

```bash
# Format: <type>[optional scope]: <description>
git commit -m "feat(auth): add JWT token validation"
git commit -m "fix(api): resolve user registration bug"
git commit -m "docs(readme): update installation instructions"
git commit -m "refactor(database): optimize user queries"
```

## Continuous Integration (CI)

### GitHub Actions CI Pipeline

```yaml
# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16.x, 18.x, 20.x]
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linting
        run: npm run lint
      - name: Run tests
        run: npm test -- --coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info
      - name: Build application
        run: npm run build
      - name: Run security audit
        run: npm audit --audit-level=high

  docker:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          # Docker Hub pushes require the account namespace in the tag
          tags: |
            ${{ secrets.DOCKER_USERNAME }}/myapp:latest
            ${{ secrets.DOCKER_USERNAME }}/myapp:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

### GitLab CI/CD

```yaml
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run lint
    - npm test -- --coverage
    - npm run build
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
  coverage: '/Lines\s*:\s*(\d+\.\d+)%/'

build:
  stage: build
  # Log in to the registry only in the job that actually uses Docker;
  # a global before_script would fail in the node:18 test job
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
  only:
    - main
    - develop

deploy_staging:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE
    - kubectl rollout status deployment/myapp
  environment:
    name: staging
    url: https://staging.myapp.com
  only:
    - develop

deploy_production:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE
    - kubectl rollout status deployment/myapp
  environment:
    name: production
    url: https://myapp.com
  when: manual
  only:
    - main
```

## Containerization with Docker

### Dockerfile Best Practices

```dockerfile
# Multi-stage build for a Node.js application
FROM node:18-alpine AS builder

# Set working directory
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install all dependencies (dev dependencies are needed for the build step)
RUN npm ci

# Copy source code
COPY . .

# Build application
RUN npm run build

# Drop dev dependencies so only production modules are carried forward
RUN npm prune --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001

# Set working directory
WORKDIR /app

# Copy built application from builder stage
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package.json ./package.json

# Switch to non-root user
USER nextjs

# Expose port
EXPOSE 3000

# Health check (BusyBox wget ships with Alpine; curl does not)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget -qO- http://localhost:3000/health || exit 1

# Start application
CMD ["npm", "start"]
```

### Docker Compose for Development

```yaml
# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://user:password@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    volumes:
      - .:/app
      - /app/node_modules
    depends_on:
      - db
      - redis
    command: npm run dev

  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app

volumes:
  postgres_data:
  redis_data:
```
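
With this file in the project root, the whole stack comes up with a single command (Compose v2 syntax; older installs use the `docker-compose` binary instead):

```bash
docker compose up -d        # start all services in the background
docker compose logs -f app  # follow the app's logs
docker compose down         # stop and remove the containers
```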

## Infrastructure as Code

### Terraform Configuration

```hcl
# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "myapp-terraform-state"
    key    = "infrastructure/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.project_name}-vpc"
    Environment = var.environment
  }
}

# Subnets
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
    Type = "Public"
  }
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-${count.index + 1}"
    Type = "Private"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = "${var.project_name}-cluster"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
  ]
}
```
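
The standard workflow for applying this configuration is below; variable values such as `project_name` and `availability_zones` are assumed to come from a `terraform.tfvars` file:

```bash
terraform init   # download providers and configure the S3 backend
terraform plan   # preview the changes
terraform apply  # create or update the infrastructure
```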

### Ansible Playbooks

```yaml
# playbook.yml
---
- name: Deploy Application
  hosts: webservers
  become: yes
  vars:
    app_name: myapp
    app_version: "{{ version | default('latest') }}"
  tasks:
    - name: Update system packages
      apt:
        update_cache: yes
        upgrade: dist

    - name: Install Docker
      apt:
        name: docker.io
        state: present

    - name: Start Docker service
      systemd:
        name: docker
        state: started
        enabled: yes

    - name: Pull application image
      docker_image:
        name: "myapp:{{ app_version }}"
        source: pull

    - name: Stop existing container
      docker_container:
        name: "{{ app_name }}"
        state: stopped
      ignore_errors: yes

    - name: Remove existing container
      docker_container:
        name: "{{ app_name }}"
        state: absent
      ignore_errors: yes

    - name: Start new container
      docker_container:
        name: "{{ app_name }}"
        image: "myapp:{{ app_version }}"
        state: started
        restart_policy: always
        ports:
          - "3000:3000"
        env:
          NODE_ENV: production
          DATABASE_URL: "{{ database_url }}"

    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}
      notify: restart nginx

    - name: Enable Nginx site
      file:
        src: /etc/nginx/sites-available/{{ app_name }}
        dest: /etc/nginx/sites-enabled/{{ app_name }}
        state: link
      notify: restart nginx

  handlers:
    - name: restart nginx
      systemd:
        name: nginx
        state: restarted
```
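
A run against an inventory of web servers might look like this; the `inventory.ini` filename is an assumption, and `version` and `database_url` are the variables the playbook expects:

```bash
ansible-playbook -i inventory.ini playbook.yml \
  -e version=1.2.3 \
  -e database_url=postgresql://user:password@db:5432/myapp
```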

## Kubernetes Orchestration

### Deployment Configuration

```yaml
# k8s/deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.com
      secretName: myapp-tls
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 80
```
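
Applying the manifests and watching the rollout (assuming the files live under `k8s/`):

```bash
kubectl apply -f k8s/deployment.yml
kubectl rollout status deployment/myapp
kubectl get pods -l app=myapp
```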

### Helm Charts

```yaml
# Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for MyApp
type: application
version: 0.1.0
appVersion: "1.0.0"

# values.yaml
replicaCount: 3

image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "latest"

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: myapp.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - myapp.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```
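
Assuming the chart directory is named `myapp`, installing and upgrading follow the usual Helm workflow:

```bash
helm install myapp ./myapp                        # first install
helm upgrade myapp ./myapp --set image.tag=1.2.3  # roll out a new version
helm rollback myapp 1                             # revert to a previous revision
```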

## Monitoring and Observability

### Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:3000']
    metrics_path: /metrics
    scrape_interval: 5s

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```
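
The `rule_files` entry above references `alert_rules.yml`, which the config does not show. A minimal sketch of what it might contain, using the `http_requests_total` metric defined later in this tutorial (the alert name and threshold are illustrative):

```yaml
# alert_rules.yml (illustrative)
groups:
  - name: myapp-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of requests are failing"
```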

### Grafana Dashboard

```json
{
  "dashboard": {
    "title": "MyApp Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{status_code}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "singlestat",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total{status_code=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100"
          }
        ]
      }
    ]
  }
}
```

### Application Metrics

```javascript
// metrics.js
const prometheus = require('prom-client')

// Histogram of request durations in seconds
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
})

// Counter of all HTTP requests
const httpRequestTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
})

// Gauge for currently open connections
const activeConnections = new prometheus.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
})

// Express middleware to record duration and count per request
function metricsMiddleware(req, res, next) {
  const start = Date.now()
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    }
    httpRequestDuration.observe(labels, duration)
    httpRequestTotal.inc(labels)
  })
  next()
}

// Metrics endpoint handler; register.metrics() returns a Promise in prom-client v13+
async function metricsHandler(req, res) {
  res.set('Content-Type', prometheus.register.contentType)
  res.end(await prometheus.register.metrics())
}

// Wire up in the app: app.use(metricsMiddleware); app.get('/metrics', metricsHandler)
module.exports = { metricsMiddleware, metricsHandler }
```

## Logging and Log Management

### Structured Logging

```javascript
// logger.js
const winston = require('winston')

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'myapp',
    version: process.env.APP_VERSION || '1.0.0'
  },
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
})

if (process.env.NODE_ENV !== 'production') {
  logger.add(new winston.transports.Console({
    format: winston.format.simple()
  }))
}

// Usage
logger.info('User logged in', { userId: 123, email: 'user@example.com' })
// e.g. inside a catch (error) block:
logger.error('Database connection failed', { error: error.message, stack: error.stack })

module.exports = logger
```

### ELK Stack Configuration

```yaml
# docker-compose.elk.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.8.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  elasticsearch_data:
```

### Logstash Configuration

```ruby
# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][service] == "myapp" {
    json {
      source => "message"
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    if [level] == "error" {
      mutate {
        add_tag => [ "error" ]
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "myapp-logs-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}
```
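
The `beats` input above expects a shipper such as Filebeat on the application hosts. A minimal `filebeat.yml` that forwards the log files written by the winston config might look like this; the log path and the `service` field are assumptions chosen to match the filter above:

```yaml
# filebeat.yml (illustrative)
filebeat.inputs:
  - type: log
    paths:
      - /app/combined.log
    fields:
      service: myapp

output.logstash:
  hosts: ["logstash:5044"]
```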

## Security and Compliance

### Security Scanning

```yaml
# .github/workflows/security.yml
name: Security Scan

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 2 * * 1'  # Weekly scan, Mondays at 02:00 UTC

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myapp:latest'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy scan results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high
```

### Secrets Management

```yaml
# k8s/secrets.yml
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  database-url: <base64-encoded-url>
  jwt-secret: <base64-encoded-secret>
  api-key: <base64-encoded-key>
---
# Using External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "myapp"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-url
      remoteRef:
        key: myapp
        property: database_url
```
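
Values in the Secret's `data` field must be base64-encoded; in practice it is often easier to let kubectl do the encoding. The literal values here are placeholders:

```bash
# Encode a value manually (-n avoids a trailing newline in the encoded value)
echo -n 'postgresql://user:password@db:5432/myapp' | base64

# Or create the Secret directly from literals
kubectl create secret generic app-secrets \
  --from-literal=database-url='postgresql://user:password@db:5432/myapp'
```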

## Performance Optimization

### Load Testing

```javascript
// k6-load-test.js
import http from 'k6/http'
import { check, sleep } from 'k6'

export let options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up
    { duration: '5m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 200 }, // Ramp up to 200 users
    { duration: '5m', target: 200 }, // Stay at 200 users
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must complete below 500ms
    http_req_failed: ['rate<0.1'],    // Error rate must be below 10%
  },
}

export default function () {
  let response = http.get('https://myapp.com/api/users')
  check(response, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  })
  sleep(1)
}
```
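
Run the script with the k6 CLI; failed thresholds produce a non-zero exit code, which makes the test easy to gate on in CI:

```bash
k6 run k6-load-test.js
```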

### Auto Scaling

```yaml
# k8s/hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
```
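
Resource-based HPAs depend on the cluster's metrics-server add-on for CPU and memory utilization. Once that is in place, scaling activity can be observed directly:

```bash
kubectl apply -f k8s/hpa.yml
kubectl get hpa myapp-hpa --watch
```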

## Disaster Recovery

### Backup Strategy

```bash
#!/bin/bash
# backup.sh
set -euo pipefail

# Database backup (custom format, so pg_restore can list and verify it)
BACKUP_FILE="backup_$(date +%Y%m%d_%H%M%S).dump"
pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -Fc -f "$BACKUP_FILE"

# Upload to S3
aws s3 cp "$BACKUP_FILE" s3://myapp-backups/database/

# Clean up old local backups (keep last 30 days)
find . -name "backup_*.dump" -mtime +30 -delete

# Application data backup (Velero CLI; 'kubectl create backup' is not a built-in kubectl command)
velero backup create myapp-backup --from-schedule=daily-backup

# Verify backup integrity
if pg_restore --list "$BACKUP_FILE" > /dev/null; then
  echo "Backup verification successful"
else
  echo "Backup verification failed"
  exit 1
fi
```
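
Scheduling the script via cron keeps the backups automatic; the install path is an assumption for illustration:

```bash
# crontab entry -- run nightly at 02:00
0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1
```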

### Multi-Region Deployment

```hcl
# terraform/multi-region.tf
module "primary_region" {
  source      = "./modules/infrastructure"
  region      = "us-east-1"
  environment = "production"
  is_primary  = true
}

module "secondary_region" {
  source      = "./modules/infrastructure"
  region      = "us-west-2"
  environment = "production"
  is_primary  = false

  # Cross-region replication
  primary_db_identifier = module.primary_region.db_identifier
}

# Route 53 health checks and failover
resource "aws_route53_health_check" "primary" {
  fqdn              = module.primary_region.load_balancer_dns
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id        = var.hosted_zone_id
  name           = "myapp.com"
  type           = "A"
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id

  alias {
    name                   = module.primary_region.load_balancer_dns
    zone_id                = module.primary_region.load_balancer_zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = var.hosted_zone_id
  name           = "myapp.com"
  type           = "A"
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = module.secondary_region.load_balancer_dns
    zone_id                = module.secondary_region.load_balancer_zone_id
    evaluate_target_health = true
  }
}
```

## DevOps Best Practices

### Code Quality Gates

Quality gates block a merge or release when key metrics regress. The metric names below mirror SonarQube's quality-gate conditions, but the same thresholds can be enforced by any static-analysis tool:

```yaml
# quality-gates.yml
quality_gates:
  - name: "Code Coverage"
    threshold: 80
    metric: "coverage"
  - name: "Technical Debt"
    threshold: "A"
    metric: "maintainability_rating"
  - name: "Security Rating"
    threshold: "A"
    metric: "security_rating"
  - name: "Reliability Rating"
    threshold: "A"
    metric: "reliability_rating"
  - name: "Duplicated Lines"
    threshold: 3
    metric: "duplicated_lines_density"
```

### Deployment Strategies

```bash
#!/bin/bash
# blue-green-deployment.sh -- usage: ./blue-green-deployment.sh <image>

# Determine which color currently receives traffic
CURRENT_ENV=$(kubectl get service myapp-service -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT_ENV" = "blue" ]; then
  NEW_ENV="green"
  OLD_ENV="blue"
else
  NEW_ENV="blue"
  OLD_ENV="green"
fi

echo "Deploying to $NEW_ENV environment..."

# Deploy the new image to the idle environment
kubectl set image deployment/myapp-$NEW_ENV myapp="$1"

# Wait for the deployment to be ready
kubectl rollout status deployment/myapp-$NEW_ENV

# Run health checks
if kubectl exec deployment/myapp-$NEW_ENV -- curl -f http://localhost:3000/health; then
  echo "Health check passed. Switching traffic to $NEW_ENV..."
  # Switch traffic by repointing the Service selector
  kubectl patch service myapp-service -p '{"spec":{"selector":{"version":"'$NEW_ENV'"}}}'
  echo "Traffic switched to $NEW_ENV. Deployment successful!"
  # Scale down the old environment after 5 minutes
  sleep 300
  kubectl scale deployment myapp-$OLD_ENV --replicas=0
else
  echo "Health check failed. Rolling back..."
  kubectl rollout undo deployment/myapp-$NEW_ENV
  exit 1
fi
```
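
The script takes the image reference produced by CI as its only argument; the tag here is illustrative:

```bash
./blue-green-deployment.sh myapp:1.2.3
```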

### Incident Response

```yaml
# incident-response.yml
incident_response:
  severity_levels:
    P1: "Critical - Service down"
    P2: "High - Major feature impacted"
    P3: "Medium - Minor feature impacted"
    P4: "Low - Cosmetic issue"

  escalation_matrix:
    P1:
      immediate: ["on-call-engineer", "team-lead"]
      15_minutes: ["engineering-manager", "product-manager"]
      30_minutes: ["cto", "ceo"]
    P2:
      immediate: ["on-call-engineer"]
      30_minutes: ["team-lead"]
      2_hours: ["engineering-manager"]

  runbooks:
    - name: "Database Connection Issues"
      steps:
        - "Check database server status"
        - "Verify connection pool settings"
        - "Check network connectivity"
        - "Review recent deployments"
    - name: "High Response Times"
      steps:
        - "Check application metrics"
        - "Review database query performance"
        - "Check external service dependencies"
        - "Scale application if needed"
```
This comprehensive DevOps tutorial covers essential practices and tools for modern software delivery. Adapt these practices to your specific technology stack and organizational needs.