AWS CloudWatch Monitoring and Alerting - 2025 Edition

January 9, 2025

                                                                           

🚀 What’s New in This 2025 Update

Major Changes Since 2017

  • Managed Observability - AWS managed Prometheus and Grafana services
  • Container-Native Monitoring - Deep EKS/ECS integration with CloudWatch Container Insights
  • Infrastructure as Code - CloudFormation/Terraform for monitoring automation
  • AI/ML-Powered Alerts - Amazon Lookout for Metrics and SageMaker integration
  • Enhanced Security - New IAM Access Analyzer and mandatory MFA for root users
  • Cost Optimization - Advanced cost monitoring and resource optimization tools

Key Improvements

  • ✅ Modern Observability Stack - Prometheus, Grafana, and CloudWatch integration
  • ✅ Automated Monitoring - Infrastructure as Code and Lambda-based automation
  • ✅ Multi-Cloud Support - Cloud-agnostic monitoring strategies
  • ✅ Enhanced Security - Comprehensive security and compliance monitoring

Modern AWS Monitoring Architecture 2025

AWS monitoring has evolved significantly from custom solutions to managed observability platforms. This guide covers modern approaches using CloudWatch, Prometheus, Grafana, and Infrastructure as Code.

CloudWatch Enhancements 2025

Network Firewall Dashboard

AWS introduced a monitoring dashboard for AWS Network Firewall that provides detailed traffic visibility. The equivalent view can be defined as code with a CloudWatch dashboard:

# CloudFormation template for Network Firewall monitoring
# (the NetworkFirewall resource referenced in the widgets is assumed to be
# defined elsewhere in the template)
Resources:
  NetworkFirewallDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: NetworkFirewall-Traffic-Analysis
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "properties": {
                "metrics": [
                  [ "AWS/NetworkFirewall", "PacketsDropped", "FirewallName", "${NetworkFirewall}" ],
                  [ ".", "PacketsForwarded", ".", "." ],
                  [ ".", "PacketsInspected", ".", "." ]
                ],
                "period": 300,
                "stat": "Sum",
                "region": "${AWS::Region}",
                "title": "Network Firewall Traffic Metrics"
              }
            },
            {
              "type": "log",
              "properties": {
                "query": "SOURCE '/aws/networkfirewall/flowlogs'\n| fields @timestamp, srcaddr, dstaddr, srcport, dstport, protocol, action\n| filter action = \"DROP\"\n| stats count() by srcaddr\n| sort count desc\n| limit 10",
                "region": "${AWS::Region}",
                "title": "Top Blocked Source IPs"
              }
            }
          ]
        }
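
The same "top blocked source IPs" query used by the log widget can be run ad hoc with boto3, which is useful for validating the dashboard before deploying it. A minimal sketch; the log group name and time window are assumptions that should match your firewall's flow log configuration:

import time
import boto3

logs = boto3.client('logs')

# Start the same "top blocked source IPs" query the dashboard widget uses
query_id = logs.start_query(
    logGroupName='/aws/networkfirewall/flowlogs',   # assumed flow log group
    startTime=int(time.time()) - 3600,              # last hour
    endTime=int(time.time()),
    queryString=(
        'fields @timestamp, srcaddr, dstaddr, action '
        '| filter action = "DROP" '
        '| stats count() as drops by srcaddr '
        '| sort drops desc '
        '| limit 10'
    ),
)['queryId']

# Poll until the query finishes, then print the results
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] in ('Complete', 'Failed', 'Cancelled', 'Timeout'):
        break
    time.sleep(1)

for row in result.get('results', []):
    print({field['field']: field['value'] for field in row})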

Container Insights Integration

# EKS Cluster with Container Insights
Resources:
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      # RoleArn and ResourcesVpcConfig are required in a real template and are
      # omitted here for brevity
      Name: production-cluster
      Logging:
        ClusterLogging:
          EnabledTypes:
            - Type: api
            - Type: audit
            - Type: authenticator
            - Type: controllerManager
            - Type: scheduler
      
  # Container Insights metrics require the CloudWatch agent or the CloudWatch
  # Observability add-on to be running in the cluster; the EKS resource alone
  # does not enable them. MemoryAlarm and PodRestartAlarm follow the same
  # pattern as CPUAlarm and are omitted here for brevity.
  ContainerInsights:
    Type: AWS::CloudWatch::CompositeAlarm
    Properties:
      AlarmName: EKS-Container-Health
      AlarmRule: !Sub |
        ALARM("${CPUAlarm}") OR
        ALARM("${MemoryAlarm}") OR
        ALARM("${PodRestartAlarm}")

  CPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: EKS-CPU-High
      MetricName: node_cpu_utilization
      Namespace: ContainerInsights
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: ClusterName
          Value: !Ref EKSCluster
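
Once the CloudWatch agent or the Observability add-on is running in the cluster, the available Container Insights metrics (and their exact dimensions) can be verified with boto3 before wiring up alarms. The cluster name below is an assumption matching the template above:

import boto3

cloudwatch = boto3.client('cloudwatch')

# List the Container Insights metrics published for the cluster so the
# MetricName/Dimensions used in alarms can be checked against what exists.
paginator = cloudwatch.get_paginator('list_metrics')
for page in paginator.paginate(
    Namespace='ContainerInsights',
    Dimensions=[{'Name': 'ClusterName', 'Value': 'production-cluster'}],  # assumed name
):
    for metric in page['Metrics']:
        dims = {d['Name']: d['Value'] for d in metric['Dimensions']}
        print(metric['MetricName'], dims)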

Modern Observability Stack

Prometheus and Grafana on AWS

# Amazon Managed Prometheus
Resources:
  PrometheusWorkspace:
    Type: AWS::APS::Workspace
    Properties:
      Alias: production-monitoring
      
  GrafanaWorkspace:
    Type: AWS::Grafana::Workspace
    Properties:
      Name: production-grafana
      AccountAccessType: CURRENT_ACCOUNT
      AuthenticationProviders:
        - SAML
        - AWS_SSO
      DataSources:
        - PROMETHEUS
        - CLOUDWATCH
      PermissionType: SERVICE_MANAGED
      
  # IRSA role for the prometheus-server service account. In a real template the
  # "https://" scheme must be stripped from OpenIdConnectIssuerUrl when building
  # the provider ARN, and the dynamic condition key needs Fn::Sub applied to the
  # whole policy document; both are simplified below for readability.
  PrometheusServiceAccount:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Federated: !Sub 'arn:aws:iam::${AWS::AccountId}:oidc-provider/${EKSCluster.OpenIdConnectIssuerUrl}'
            Action: 'sts:AssumeRoleWithWebIdentity'
            Condition:
              StringEquals:
                '${EKSCluster.OpenIdConnectIssuerUrl}:sub': 'system:serviceaccount:monitoring:prometheus-server'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess'
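
The remote_write URL used in the Prometheus configuration below is derived from the workspace endpoint. A minimal sketch that looks it up with boto3, assuming the workspace alias from the template above:

import boto3

amp = boto3.client('amp')

# Look up the workspace created by CloudFormation via its alias
workspaces = amp.list_workspaces(alias='production-monitoring')['workspaces']
workspace_id = workspaces[0]['workspaceId']

# prometheusEndpoint looks like
# https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/
endpoint = amp.describe_workspace(workspaceId=workspace_id)['workspace']['prometheusEndpoint']
print('remote_write url:', endpoint + 'api/v1/remote_write')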

Kubernetes Monitoring Configuration

# Prometheus configuration for EKS
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    remote_write:
      - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxx/api/v1/remote_write
        sigv4:
          region: us-east-1
          service: aps
    
    rule_files:
      - "/etc/prometheus/rules/*.yml"
    
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
            
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
            
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true    
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      serviceAccountName: prometheus-server
      containers:
      - name: prometheus
        image: prom/prometheus:v2.50.0
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus/'
          - '--web.console.libraries=/etc/prometheus/console_libraries'
          - '--web.console.templates=/etc/prometheus/consoles'
          - '--storage.tsdb.retention.time=15d'
          - '--web.enable-lifecycle'
        ports:
        - containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-storage
          mountPath: /prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-storage
        emptyDir: {}

Infrastructure as Code for Monitoring

Terraform Configuration

# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "application_name" {
  description = "Application name"
  type        = string
  default     = "myapp"
}

# main.tf
# (aws_instance.web and aws_iam_role.lambda_role are assumed to be defined
# elsewhere in this configuration)
resource "aws_cloudwatch_log_group" "application" {
  name              = "/aws/application/${var.application_name}"
  retention_in_days = 30
  
  tags = {
    Environment = var.environment
    Application = var.application_name
  }
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "${var.application_name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_sns_topic.alerts.arn]
  
  dimensions = {
    InstanceId = aws_instance.web.id
  }
  
  tags = {
    Environment = var.environment
    Application = var.application_name
  }
}

resource "aws_cloudwatch_metric_alarm" "disk_usage" {
  alarm_name          = "${var.application_name}-disk-usage"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "disk_used_percent"
  namespace           = "CWAgent"
  period              = "300"
  statistic           = "Average"
  threshold           = "85"
  alarm_description   = "This metric monitors disk space utilization"
  alarm_actions       = [aws_sns_topic.alerts.arn]
  treat_missing_data  = "breaching"

  # Dimension names and values must exactly match what the CloudWatch agent
  # publishes (verify with `aws cloudwatch list-metrics --namespace CWAgent`)
  dimensions = {
    InstanceId = aws_instance.web.id
    path       = "/"
    device     = "xvda1"
    fstype     = "xfs"
  }
  
  tags = {
    Environment = var.environment
    Application = var.application_name
  }
}

resource "aws_sns_topic" "alerts" {
  name = "${var.application_name}-alerts"
  
  tags = {
    Environment = var.environment
    Application = var.application_name
  }
}

resource "aws_sns_topic_subscription" "email_alerts" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "alerts@mycompany.com"
}

resource "aws_sns_topic_subscription" "slack_alerts" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "lambda"
  endpoint  = aws_lambda_function.slack_notifier.arn
}
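
# SNS needs permission to invoke the Slack notifier Lambda; without a resource
# policy like this (a minimal sketch), the lambda subscription above is created
# but notifications are never delivered.
resource "aws_lambda_permission" "sns_invoke_slack_notifier" {
  statement_id  = "AllowExecutionFromSNS"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.slack_notifier.function_name
  principal     = "sns.amazonaws.com"
  source_arn    = aws_sns_topic.alerts.arn
}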

# Lambda function for Slack notifications
resource "aws_lambda_function" "slack_notifier" {
  filename         = "slack_notifier.zip"
  function_name    = "${var.application_name}-slack-notifier"
  role            = aws_iam_role.lambda_role.arn
  handler         = "lambda_function.lambda_handler"
  runtime         = "python3.11"
  timeout         = 10
  
  environment {
    variables = {
      SLACK_WEBHOOK_URL = var.slack_webhook_url
    }
  }
  
  tags = {
    Environment = var.environment
    Application = var.application_name
  }
}

# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.application_name}-dashboard"
  
  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/EC2", "CPUUtilization", "InstanceId", aws_instance.web.id],
            ["CWAgent", "DiskSpaceUtilization", "InstanceId", aws_instance.web.id, "MountPath", "/"],
            ["CWAgent", "MemoryUtilization", "InstanceId", aws_instance.web.id]
          ]
          period = 300
          stat   = "Average"
          region = data.aws_region.current.name
          title  = "EC2 Instance Metrics"
        }
      },
      {
        type   = "log"
        width  = 12
        height = 6
        properties = {
          query = "SOURCE '${aws_cloudwatch_log_group.application.name}' | fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20"
          region = data.aws_region.current.name
          title  = "Recent Errors"
        }
      }
    ]
  })
}

data "aws_region" "current" {}

CloudWatch Agent Configuration

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "/aws/ec2/system",
            "log_stream_name": "{instance_id}/messages",
            "retention_in_days": 7
          },
          {
            "file_path": "/var/log/myapp/app.log",
            "log_group_name": "/aws/application/myapp",
            "log_stream_name": "{instance_id}/app",
            "retention_in_days": 30
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ],
        "totalcpu": false
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
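
A common way to distribute this configuration across a fleet is to store it in SSM Parameter Store and point the agent at the parameter (the agent's fetch-config command accepts an ssm: prefix). A minimal sketch, assuming the JSON above is saved locally as cloudwatch-agent-config.json and the parameter name shown:

import json
import boto3

ssm = boto3.client('ssm')

# Load the agent configuration shown above (assumed local file name)
with open('cloudwatch-agent-config.json') as f:
    agent_config = json.load(f)

# Publish it as a parameter every instance can fetch at agent start
ssm.put_parameter(
    Name='AmazonCloudWatch-agent-config',   # assumed parameter name
    Description='CloudWatch agent configuration',
    Type='String',
    Value=json.dumps(agent_config),
    Overwrite=True,
)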

AI/ML-Powered Monitoring

Amazon Lookout for Metrics

# Python SDK for Lookout for Metrics
import boto3
import json

client = boto3.client('lookoutmetrics')

# Create an anomaly detector
response = client.create_anomaly_detector(
    AnomalyDetectorName='MyAppAnomalyDetector',
    AnomalyDetectorDescription='Detects anomalies in application metrics',
    AnomalyDetectorConfig={
        'AnomalyDetectorFrequency': 'PT1H'
    }
)

# Create a metric set
metric_set_response = client.create_metric_set(
    AnomalyDetectorArn=response['AnomalyDetectorArn'],
    MetricSetName='ApplicationMetrics',
    MetricSetDescription='Application performance metrics',
    MetricList=[
        {
            'MetricName': 'response_time',
            'AggregationFunction': 'AVG'
        },
        {
            'MetricName': 'error_rate',
            'AggregationFunction': 'SUM'
        }
    ],
    DimensionList=[
        'instance_id',
        'endpoint'
    ],
    MetricSetFrequency='PT1H',
    MetricSource={
        'CloudWatchConfig': {
            'RoleArn': 'arn:aws:iam::123456789012:role/LookoutMetricsRole'
        }
    }
)

# Create an alert
alert_response = client.create_alert(
    AlertName='HighAnomalyAlert',
    AlertDescription='Alert when anomaly score is high',
    AnomalyDetectorArn=response['AnomalyDetectorArn'],
    AlertSensitivityThreshold=75,
    Action={
        'SNSConfiguration': {
            'RoleArn': 'arn:aws:iam::123456789012:role/LookoutMetricsRole',
            'SnsTopicArn': 'arn:aws:sns:us-east-1:123456789012:anomaly-alerts'
        }
    }
)
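
Creating the detector, metric set, and alert does not start detection on its own; the detector has to be activated and then trains on historical data before it flags anomalies. Continuing with the client and response objects from the script above:

# Activation kicks off model training; the detector only starts
# flagging anomalies after it has learned a baseline.
client.activate_anomaly_detector(
    AnomalyDetectorArn=response['AnomalyDetectorArn']
)

status = client.describe_anomaly_detector(
    AnomalyDetectorArn=response['AnomalyDetectorArn']
)['Status']
print('Detector status:', status)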

Lambda Function for Intelligent Alerting

import json
import boto3
import os
from datetime import datetime, timedelta

def lambda_handler(event, context):
    """
    Intelligent alerting function that filters noise and provides context
    """
    
    cloudwatch = boto3.client('cloudwatch')
    sns = boto3.client('sns')
    
    # Parse the CloudWatch alarm notification delivered via SNS
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']
    trigger = message['Trigger']
    metric_name = trigger['MetricName']
    new_state = message['NewStateValue']       # e.g. "ALARM"
    state_reason = message['NewStateReason']   # describes the breaching datapoints
    
    # SNS alarm payloads use lowercase name/value keys for dimensions
    dimensions = [
        {'Name': d['name'], 'Value': d['value']}
        for d in trigger.get('Dimensions', [])
    ]
    
    # Get historical data for context
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=24)
    
    historical_data = cloudwatch.get_metric_statistics(
        Namespace=trigger['Namespace'],
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,
        Statistics=['Average']
    )
    
    # Calculate trend (datapoints are not returned in time order)
    datapoints = sorted(historical_data['Datapoints'], key=lambda d: d['Timestamp'])
    if len(datapoints) >= 12:
        recent = [d['Average'] for d in datapoints[-6:]]
        older = [d['Average'] for d in datapoints[-12:-6]]
        trend = "increasing" if sum(recent) / 6 > sum(older) / 6 else "decreasing"
    else:
        trend = "unknown"
    
    # Get related metrics for context
    related_metrics = get_related_metrics(cloudwatch, message)
    
    # Create intelligent alert message
    alert_message = f"""
    🚨 INTELLIGENT ALERT: {alarm_name}
    
    State: {new_state}
    Reason: {state_reason}
    24h Trend: {trend}
    
    Context:
    {format_context(related_metrics)}
    
    Recommended Actions:
    {get_recommendations(metric_name, new_state, trend)}
    """
    
    # Send enhanced alert
    sns.publish(
        TopicArn=os.environ['ALERT_TOPIC_ARN'],
        Message=alert_message,
        Subject=f"Intelligent Alert: {alarm_name}"
    )
    
    return {
        'statusCode': 200,
        'body': json.dumps('Alert processed successfully')
    }

def get_related_metrics(cloudwatch, message):
    """Get related metrics for context"""
    # Implementation depends on your specific metrics
    return []

def format_context(metrics):
    """Format related metrics for display"""
    return "Additional context from related metrics"

def get_recommendations(metric_name, value, trend):
    """Provide intelligent recommendations based on the alert"""
    recommendations = {
        'CPUUtilization': [
            "Check for runaway processes",
            "Consider scaling out if trend is increasing",
            "Review recent deployments"
        ],
        'DiskSpaceUtilization': [
            "Clean up old logs and temporary files",
            "Check for large files in /tmp",
            "Consider expanding storage"
        ]
    }
    
    return '\n'.join(f"• {rec}" for rec in recommendations.get(metric_name, ["Review system performance"]))

Security and Compliance Monitoring

IAM Access Analyzer Integration

# CloudFormation template for IAM Access Analyzer
Resources:
  AccessAnalyzer:
    Type: AWS::AccessAnalyzer::Analyzer
    Properties:
      AnalyzerName: SecurityComplianceAnalyzer
      Type: ORGANIZATION
      
  AccessAnalyzerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: access-analyzer.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AccessAnalyzerServiceRolePolicy
        
  # IAM Access Analyzer surfaces findings through EventBridge rather than
  # publishing service metrics, so the alarm below assumes finding events are
  # converted into a custom metric (see the boto3 sketch after this template).
  SecurityComplianceAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: AccessAnalyzer-NewFindings
      AlarmDescription: Alert when new security findings are detected
      MetricName: ActiveFindings
      Namespace: Security/AccessAnalyzer
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 1
      Threshold: 0
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref SecurityAlertsTopic # SNS topic assumed to be defined elsewhere
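
Findings can also be pulled directly with boto3 for reporting, or to produce the custom metric the alarm above watches. A minimal sketch (the Security/AccessAnalyzer namespace and ActiveFindings metric name are assumptions matching the alarm above):

import boto3

analyzer = boto3.client('accessanalyzer')

# Look up the organization analyzer created above and list its active findings
analyzer_arn = analyzer.list_analyzers(type='ORGANIZATION')['analyzers'][0]['arn']

findings = analyzer.list_findings(
    analyzerArn=analyzer_arn,
    filter={'status': {'eq': ['ACTIVE']}},
)['findings']

for finding in findings:
    print(finding['id'], finding['resourceType'], finding.get('resource'))

# Push the count to the custom metric the SecurityComplianceAlarm watches
boto3.client('cloudwatch').put_metric_data(
    Namespace='Security/AccessAnalyzer',   # custom namespace (assumption)
    MetricData=[{'MetricName': 'ActiveFindings', 'Value': len(findings), 'Unit': 'Count'}],
)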

VPC Flow Logs Analysis

# Lambda function for VPC Flow Logs analysis
import json
import boto3
import gzip
from datetime import datetime

def lambda_handler(event, context):
    """
    Analyze VPC Flow Logs for security anomalies
    """
    
    s3 = boto3.client('s3')
    cloudwatch = boto3.client('cloudwatch')
    
    # Process S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    
    # Download and decompress flow log
    response = s3.get_object(Bucket=bucket, Key=key)
    
    if key.endswith('.gz'):
        content = gzip.decompress(response['Body'].read()).decode('utf-8')
    else:
        content = response['Body'].read().decode('utf-8')
    
    # Analyze flow logs
    anomalies = analyze_flow_logs(content)
    
    # Send metrics to CloudWatch
    for anomaly in anomalies:
        cloudwatch.put_metric_data(
            Namespace='Security/VPCFlowLogs',
            MetricData=[
                {
                    'MetricName': f'Anomaly_{anomaly["type"]}',
                    'Value': anomaly['count'],
                    'Unit': 'Count',
                    'Dimensions': [
                        {
                            'Name': 'SourceIP',
                            'Value': anomaly['source_ip']
                        }
                    ]
                }
            ]
        )
    
    return {
        'statusCode': 200,
        'body': json.dumps(f'Processed {len(anomalies)} anomalies')
    }

def analyze_flow_logs(content):
    """Analyze flow logs for security anomalies"""
    anomalies = []
    
    for line in content.split('\n'):
        if not line.strip():
            continue
            
        fields = line.split()
        # Skip the header line and truncated records
        if fields[0] == 'version' or len(fields) < 14:
            continue
            
        # Extract relevant fields (default flow log format, version 2)
        srcaddr = fields[3]
        dstaddr = fields[4]
        srcport = fields[5]
        dstport = fields[6]
        protocol = fields[7]
        # NODATA/SKIPDATA records report '-' instead of a packet count
        packets = int(fields[8]) if fields[8].isdigit() else 0
        action = fields[12]
        
        # Detect anomalies
        if is_port_scan(srcaddr, dstaddr, srcport, dstport):
            anomalies.append({
                'type': 'port_scan',
                'source_ip': srcaddr,
                'count': 1
            })
            
        if is_ddos_attempt(srcaddr, packets):
            anomalies.append({
                'type': 'ddos_attempt',
                'source_ip': srcaddr,
                'count': packets
            })
    
    return anomalies

def is_port_scan(srcaddr, dstaddr, srcport, dstport):
    """Simple port scan detection logic"""
    # Implement your port scan detection logic
    return False

def is_ddos_attempt(srcaddr, packets):
    """Simple DDoS detection logic"""
    # Implement your DDoS detection logic
    return packets > 1000
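
The analyzer above is driven by S3 object-created events from the flow log delivery bucket. A minimal sketch of that wiring with boto3, assuming the bucket and function names shown (note that put_bucket_notification_configuration replaces any existing notification configuration on the bucket):

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:vpc-flow-log-analyzer'  # assumed
bucket = 'my-vpc-flow-logs-bucket'  # assumed flow log destination bucket

# Allow S3 to invoke the analyzer function
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='AllowS3Invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{bucket}',
)

# Invoke the function for every new flow log object delivered to the bucket
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': function_arn,
                'Events': ['s3:ObjectCreated:*'],
            }
        ]
    },
)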

Cost Optimization

Cost Monitoring Dashboard

Billing metrics live in the AWS/Billing namespace, are published only in us-east-1, and appear only after "Receive Billing Alerts" is enabled in the account's billing preferences:

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          [ "AWS/Billing", "EstimatedCharges", "Currency", "USD" ],
          [ "AWS/CloudWatch", "EstimatedCharges", "Currency", "USD", "ServiceName", "AmazonCloudWatch" ],
          [ "AWS/CloudWatch", "EstimatedCharges", "Currency", "USD", "ServiceName", "AmazonCloudWatchLogs" ]
        ],
        "period": 86400,
        "stat": "Maximum",
        "region": "us-east-1",
        "title": "Estimated Monthly Charges"
      }
    },
    {
      "type": "metric",
      "properties": {
        "metrics": [
          [ "AWS/CloudWatch", "MetricCount", "Namespace", "AWS/EC2" ],
          [ ".", ".", ".", "CWAgent" ],
          [ ".", ".", ".", "AWS/ApplicationELB" ]
        ],
        "period": 3600,
        "stat": "Sum",
        "region": "us-east-1",
        "title": "Metric Count by Namespace"
      }
    }
  ]
}
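
The AWS/Billing estimates are coarse and delayed; for a precise breakdown of what CloudWatch itself costs, Cost Explorer can be queried directly. A minimal sketch with boto3 (Cost Explorer must be enabled for the account):

import boto3
from datetime import date, timedelta

ce = boto3.client('ce')

end = date.today()
start = end - timedelta(days=30)

# Last 30 days of CloudWatch spend broken out by usage type
response = ce.get_cost_and_usage(
    TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['AmazonCloudWatch']}},
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'USAGE_TYPE'}],
)

for period in response['ResultsByTime']:
    for group in period['Groups']:
        usage_type = group['Keys'][0]
        cost = float(group['Metrics']['UnblendedCost']['Amount'])
        if cost > 0:
            print(f'{usage_type}: ${cost:.2f}')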

Cost Optimization Lambda

import boto3
import json
from datetime import datetime, timedelta

def lambda_handler(event, context):
    """
    Cost optimization function for CloudWatch resources
    """
    
    cloudwatch = boto3.client('cloudwatch')
    logs = boto3.client('logs')
    
    # Get cost metrics
    cost_metrics = get_cost_metrics()
    
    # Identify optimization opportunities
    optimizations = []
    
    # Check for unused log groups
    unused_log_groups = find_unused_log_groups(logs)
    optimizations.extend(unused_log_groups)
    
    # Check for low-value metrics
    low_value_metrics = find_low_value_metrics(cloudwatch)
    optimizations.extend(low_value_metrics)
    
    # Check for expensive dashboards
    expensive_dashboards = find_expensive_dashboards(cloudwatch)
    optimizations.extend(expensive_dashboards)
    
    # Generate report
    report = generate_optimization_report(optimizations)
    
    # Send to SNS
    sns = boto3.client('sns')
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:cost-optimization',
        Message=json.dumps(report, indent=2),
        Subject='CloudWatch Cost Optimization Report'
    )
    
    return {
        'statusCode': 200,
        'body': json.dumps(f'Found {len(optimizations)} optimization opportunities')
    }

def get_cost_metrics():
    """Get current cost metrics"""
    return {}

def find_unused_log_groups(logs):
    """Find log groups with no recent activity"""
    unused = []
    paginator = logs.get_paginator('describe_log_groups')
    
    for page in paginator.paginate():
        for log_group in page['logGroups']:
            # Check if log group has been inactive for 30 days
            if is_log_group_inactive(logs, log_group['logGroupName'], 30):
                unused.append({
                    'type': 'unused_log_group',
                    'resource': log_group['logGroupName'],
                    'potential_savings': calculate_log_group_cost(log_group)
                })
    
    return unused

def find_low_value_metrics(cloudwatch):
    """Find metrics that are rarely queried"""
    return []

def find_expensive_dashboards(cloudwatch):
    """Find dashboards with high metric count"""
    return []

def is_log_group_inactive(logs, log_group_name, days):
    """Check if log group has been inactive for specified days"""
    try:
        response = logs.describe_log_streams(
            logGroupName=log_group_name,
            orderBy='LastEventTime',
            descending=True,
            limit=1
        )
        
        if not response['logStreams']:
            return True
            
        last_event_time = response['logStreams'][0].get('lastEventTime', 0)
        if last_event_time == 0:
            return True
            
        last_event_date = datetime.fromtimestamp(last_event_time / 1000)
        cutoff_date = datetime.now() - timedelta(days=days)
        
        return last_event_date < cutoff_date
        
    except Exception as e:
        print(f"Error checking log group {log_group_name}: {e}")
        return False

def calculate_log_group_cost(log_group):
    """Calculate estimated cost of log group"""
    stored_bytes = log_group.get('storedBytes', 0)
    # Rough estimate: $0.50 per GB per month
    return (stored_bytes / (1024**3)) * 0.50

def generate_optimization_report(optimizations):
    """Generate cost optimization report"""
    total_savings = sum(opt.get('potential_savings', 0) for opt in optimizations)
    
    return {
        'total_potential_savings': total_savings,
        'optimization_count': len(optimizations),
        'optimizations': optimizations,
        'generated_at': datetime.utcnow().isoformat()
    }
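
To produce the report regularly, the function can be invoked on a schedule by an EventBridge rule. A minimal sketch, assuming the function ARN shown below:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:cloudwatch-cost-optimizer'  # assumed

# Run the cost optimization report once a week
rule_arn = events.put_rule(
    Name='cloudwatch-cost-optimizer-weekly',
    ScheduleExpression='rate(7 days)',
    State='ENABLED',
)['RuleArn']

# Let EventBridge invoke the function, then attach it as the rule target
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='AllowEventBridgeInvoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn,
)

events.put_targets(
    Rule='cloudwatch-cost-optimizer-weekly',
    Targets=[{'Id': 'cost-optimizer', 'Arn': function_arn}],
)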

Summary

Modern AWS monitoring in 2025 emphasizes:

  1. Managed Observability - Use AWS managed Prometheus and Grafana
  2. Infrastructure as Code - Automate monitoring setup with CloudFormation/Terraform
  3. AI-Powered Insights - Leverage Amazon Lookout for Metrics and intelligent alerting
  4. Security Integration - Comprehensive security monitoring with IAM Access Analyzer
  5. Cost Optimization - Proactive cost monitoring and resource optimization
  6. Multi-Cloud Support - Cloud-agnostic monitoring with open-source tools

These patterns provide a robust foundation for modern observability that scales with your applications and reduces operational overhead.

About Cloudurable

We hope you enjoyed this modernized monitoring guide. Please provide feedback.

Cloudurable provides Kafka training, Kafka consulting, Cassandra training, and AWS support for Cassandra and Kafka.


Last updated: January 2025 for AWS CloudWatch and modern observability practices

                                                                           