Pablo Rodriguez

Monitoring Resources

Monitoring Distributed Application Components

In a responsive system or application, distributed components respond in a timely manner even under heavy load. Meeting this requirement means paying special attention to latency.

  • For example, synchronous distributed application services must respond within a set time period
  • When something goes wrong, you need a low-latency error-handling mechanism to report the problem so that remedial action can be taken
  • Distributed applications whose services communicate by passing messages are also hard to debug
    • It is not always possible to reproduce the exact state of the whole application and replay events to recreate an issue

The solution is to consolidate each service's separate log files in a central service that provides a complete log of the whole application.

  • The consolidation service can then aggregate metrics from the logs
  • Alarms can also be set to fire when a metric exceeds a threshold
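
To make the consolidation idea concrete, here is a minimal in-memory sketch: it merges per-service log streams (each assumed to be sorted by timestamp) into one ordered log, aggregates an error count per service, and flags services that exceed a threshold. The `LogEntry` type, service names, and threshold are hypothetical; this is not the CloudWatch Logs API.

```python
import heapq
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LogEntry:
    # Hypothetical log record; timestamp is the first field, so entries sort by time.
    timestamp: float
    service: str
    level: str
    message: str


def consolidate(*service_logs):
    """Merge per-service logs (each already sorted by timestamp) into one ordered log."""
    return list(heapq.merge(*service_logs))


def error_counts(log):
    """Aggregate a simple metric from the consolidated log: ERROR entries per service."""
    return Counter(entry.service for entry in log if entry.level == "ERROR")


def breached(counts, threshold):
    """Return the services whose error count exceeds the alarm threshold."""
    return sorted(service for service, n in counts.items() if n > threshold)


orders = [LogEntry(1.0, "orders", "INFO", "created"), LogEntry(3.0, "orders", "ERROR", "timeout")]
payments = [LogEntry(2.0, "payments", "ERROR", "declined"), LogEntry(4.0, "payments", "ERROR", "declined")]

log = consolidate(orders, payments)
counts = error_counts(log)
print([entry.timestamp for entry in log])  # [1.0, 2.0, 3.0, 4.0]
print(breached(counts, threshold=1))       # ['payments']
```

Merging with `heapq.merge` keeps the combined log in time order without re-sorting everything, which is what lets events from different services be read as one sequence.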

Metrics that can be monitored for an application include:

  • Operational health metrics - monitor the application infrastructure to ensure that the runtime environment is running well
  • Resource utilization metrics - measure the use of the application components and make sure that the components are not being over- or underutilized
  • Application performance metrics - measure whether the application is meeting demand

Core Monitoring Service

Amazon CloudWatch is a repository for AWS service metrics and logs, with the following capabilities:

  • Collects and tracks metrics for AWS services across Regions in a metric repository
  • Collects logs by using Amazon CloudWatch Logs
  • Supports built-in AWS service metrics or custom metrics
  • Calculates statistics from metrics and displays graphs of the metric statistics
  • Provides alarms for responsive event-driven architectures
  • Sends notifications and can initiate changes to monitored resources

You can use Amazon CloudWatch Logs to monitor, store, and access your log files from Amazon EC2 instances, AWS CloudTrail, Amazon Route 53, and other AWS services. An AWS service such as Amazon EC2 puts metrics into the repository, and you retrieve statistics based on those metrics.

Statistics are aggregations of metric data over specified periods of time. Metrics are stored separately in each Region, but you can use CloudWatch cross-Region functionality to aggregate statistics from different Regions.
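
Aggregating data points into per-period statistics can be sketched as below. The statistic names match the ones CloudWatch reports (SampleCount, Sum, Average, Minimum, Maximum); the `period_statistics` function and the sample CPU values are hypothetical, and timestamps are plain seconds for simplicity.

```python
from collections import defaultdict


def period_statistics(datapoints, period):
    """Group (timestamp_seconds, value) pairs into fixed periods and compute
    SampleCount, Sum, Average, Minimum, and Maximum for each period."""
    buckets = defaultdict(list)
    for ts, value in datapoints:
        # Align each data point to the start of the period it falls in.
        buckets[ts - ts % period].append(value)
    stats = {}
    for start in sorted(buckets):
        values = buckets[start]
        stats[start] = {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return stats


cpu = [(0, 20.0), (30, 40.0), (60, 90.0), (90, 70.0)]
for start, s in period_statistics(cpu, period=60).items():
    print(start, s["Average"], s["Maximum"])
# 0 30.0 40.0
# 60 80.0 90.0
```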

An alarm watches a single metric over a specified time period and performs one or more specified actions based on the value of the metric relative to a threshold over time.

  • You can use an alarm to automatically initiate actions on your behalf
  • The action can be, for example, a notification sent to an Amazon SNS topic or an Auto Scaling policy
  • You can also add alarms to dashboards
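
The threshold comparison over time can be modeled as an "M out of N" check on the most recent periods. This is a simplified, hypothetical model: the state names (OK, ALARM, INSUFFICIENT_DATA) and comparison operator names mirror CloudWatch's, but missing-data treatment is omitted and a window shorter than N periods is simply reported as INSUFFICIENT_DATA.

```python
import operator

# Names mirror CloudWatch ComparisonOperator values; the mapping itself is illustrative.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def evaluate_alarm(period_values, threshold, comparison, evaluation_periods, datapoints_to_alarm=None):
    """Simplified 'M out of N' evaluation over the most recent period values.
    Missing-data handling and other CloudWatch options are not modeled."""
    m = datapoints_to_alarm or evaluation_periods
    window = period_values[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for v in window if COMPARATORS[comparison](v, threshold))
    return "ALARM" if breaching >= m else "OK"


averages = [55.0, 82.0, 91.0, 88.0]
print(evaluate_alarm(averages, 80.0, "GreaterThanThreshold", 3))              # ALARM
print(evaluate_alarm(averages, 80.0, "GreaterThanThreshold", 5))              # INSUFFICIENT_DATA
print(evaluate_alarm([85.0, 70.0, 90.0], 80.0, "GreaterThanThreshold", 3, 2))  # ALARM
```

Requiring only M of the N periods to breach lets an alarm tolerate a single dip below the threshold, while requiring all N makes it slower to fire but less noisy.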

When you create an alarm to monitor a specific metric, you have extensive control over how CloudWatch compares the metric to the threshold:

  • You can specify the period over which the comparison is made
  • You can specify how many evaluation periods are used to arrive at a conclusion
  • If you set an alarm on a high-resolution metric, you can specify a high-resolution alarm with a period of 10 seconds or 30 seconds
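
These settings map onto parameters of the PutMetricAlarm API. The sketch below only builds the keyword arguments for boto3's `put_metric_alarm` and sends nothing to AWS; the helper name, alarm names, instance ID, and SNS topic ARN are placeholders, and the period check assumes CloudWatch's rule that alarm periods are 10, 30, or a multiple of 60 seconds, with 10 and 30 reserved here for high-resolution alarms.

```python
def build_alarm_request(name, namespace, metric_name, dimensions, threshold,
                        period, evaluation_periods, action_arn, high_resolution=False):
    """Build keyword arguments for CloudWatch PutMetricAlarm (boto3 put_metric_alarm).
    Nothing is sent to AWS here."""
    if high_resolution:
        # High-resolution alarms may also use a 10- or 30-second period.
        allowed = period in (10, 30) or period % 60 == 0
    else:
        allowed = period % 60 == 0
    if not allowed or period <= 0:
        raise ValueError(f"invalid alarm period: {period}")
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Average",
        "Period": period,                          # seconds per evaluated period
        "EvaluationPeriods": evaluation_periods,   # how many periods to evaluate
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [action_arn],              # e.g. an SNS topic ARN (placeholder)
    }


TOPIC = "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # placeholder ARN
params = build_alarm_request("high-cpu", "AWS/EC2", "CPUUtilization",
                             {"InstanceId": "i-0123456789abcdef0"}, 80.0, 300, 3, TOPIC)
fast = build_alarm_request("latency-spike", "MyApp/Checkout", "OrderLatency",
                           {"Service": "checkout"}, 2.0, 10, 3, TOPIC, high_resolution=True)
# With AWS credentials configured: boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["Dimensions"])  # [{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}]
```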

Metrics are stored in the metrics repository in containers called namespaces. Metrics in different namespaces are isolated from each other so that metrics from different applications are not mistakenly aggregated into the same statistics.

  • AWS namespaces typically use the naming convention: AWS/service
  • For example, Amazon EC2 uses the AWS/EC2 namespace

A metric is a time-ordered set of data points with a unit of measurement such as bytes, seconds, count, and percentage. You can think of a metric as a variable to monitor, and the data points as representing the values of that variable over time.

  • Standard resolution - data points have 1-minute granularity
  • High resolution - data points have 1-second granularity
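
When you publish custom metrics, the resolution is chosen per data point: in the PutMetricData API, the `StorageResolution` field is 1 for high resolution and 60 for standard resolution. The sketch only builds the request for boto3's `put_metric_data`; the helper, namespace, metric names, and dimension are hypothetical.

```python
from datetime import datetime, timezone


def build_metric_datum(metric_name, value, unit, dimensions, high_resolution=False):
    """Build one entry for the MetricData list of CloudWatch PutMetricData.
    StorageResolution 1 = high resolution (1-second), 60 = standard (1-minute)."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
        "StorageResolution": 1 if high_resolution else 60,
    }


request = {
    "Namespace": "MyApp/Checkout",  # hypothetical custom namespace
    "MetricData": [
        build_metric_datum("OrderLatency", 0.42, "Seconds", {"Service": "checkout"}, high_resolution=True),
        build_metric_datum("OrdersPlaced", 3, "Count", {"Service": "checkout"}),
    ],
}
# With AWS credentials configured: boto3.client("cloudwatch").put_metric_data(**request)
print([d["StorageResolution"] for d in request["MetricData"]])  # [1, 60]
```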

You can optionally configure a metric dimension. A dimension is a name-value pair that specifies a characteristic of a metric.

  • For example, many Amazon EC2 metrics publish InstanceId as a dimension name, and the actual instance ID as the value for that dimension
  • You can use this dimension to filter results when searching for a specific instance's metrics
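
Namespace isolation and dimension filtering can be illustrated with a toy metric store in which a metric's identity is its namespace, name, and full set of dimensions. `MetricStore` is hypothetical, and its subset-style dimension filter approximates searching for metrics rather than reproducing CloudWatch's exact retrieval rules; the instance IDs are placeholders.

```python
from collections import defaultdict


class MetricStore:
    """Hypothetical in-memory model: a metric is identified by its namespace,
    its name, and its full set of dimensions."""

    def __init__(self):
        self._series = defaultdict(list)

    def put(self, namespace, name, dimensions, value):
        key = (namespace, name, frozenset(dimensions.items()))
        self._series[key].append(value)

    def query(self, namespace, name, **dimension_filter):
        """Collect data points from one namespace only, keeping series whose
        dimensions contain every name-value pair in the filter."""
        wanted = set(dimension_filter.items())
        results = []
        for (ns, metric, dims), values in self._series.items():
            if ns == namespace and metric == name and wanted <= dims:
                results.extend(values)
        return results


store = MetricStore()
store.put("AWS/EC2", "CPUUtilization", {"InstanceId": "i-aaa"}, 35.0)
store.put("AWS/EC2", "CPUUtilization", {"InstanceId": "i-bbb"}, 70.0)
store.put("MyApp", "CPUUtilization", {"InstanceId": "i-aaa"}, 99.0)  # different namespace

print(store.query("AWS/EC2", "CPUUtilization", InstanceId="i-aaa"))  # [35.0]
print(sorted(store.query("AWS/EC2", "CPUUtilization")))              # [35.0, 70.0]
```

The data point put in the `MyApp` namespace never shows up in the `AWS/EC2` queries, even though its metric name and dimension are identical.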

CloudWatch dashboards are customizable home pages in the CloudWatch console that you can use to monitor your resources in a single view, even those resources that are spread across different Regions.

  • Create a single view for selected metrics and alarms to help you assess the health of your resources and applications across one or more Regions
  • Select the color used for each metric on each graph so that you can track the same metric across multiple graphs
  • Create an operational playbook that provides guidance for team members during operational events
  • Create a common view of critical resource and application measurements that team members can share for faster communication flow
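
A dashboard is defined by a JSON body. The sketch below assembles a body with two metric widgets, reuses one color for the same instance's CPU metric in both graphs, and includes a metric from a second Region. The field names follow the CloudWatch dashboard body structure, but treat them as something to verify against that reference; the dashboard name, instance IDs, and colors are hypothetical.

```python
import json


def metric_widget(title, metrics, x, y, region):
    """One 'metric' widget in the CloudWatch dashboard body format (grid is 24 units wide)."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {"title": title, "metrics": metrics, "region": region,
                       "stat": "Average", "period": 300, "view": "timeSeries"},
    }


# Hypothetical instance IDs. Reusing CPU_COLOR keeps the same metric recognizable across graphs.
CPU_COLOR = "#1f77b4"
east = ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0aaa", {"color": CPU_COLOR, "region": "us-east-1"}]
west = ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0bbb", {"color": "#ff7f0e", "region": "us-west-2"}]

body = {"widgets": [
    metric_widget("CPU, both Regions", [east, west], 0, 0, "us-east-1"),
    metric_widget("CPU, us-east-1", [east], 12, 0, "us-east-1"),
]}
dashboard_body = json.dumps(body)
# With AWS credentials configured:
# boto3.client("cloudwatch").put_dashboard(DashboardName="ops-overview", DashboardBody=dashboard_body)
```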

Event Processing

When a CloudWatch alarm is breached, the event must be published so that automated or manual corrective actions can be taken. Amazon EventBridge is an event bus service used to route events.

EventBridge is a serverless service that uses events to connect application components, helping you build scalable event-driven applications:

  • Event buses are routers that receive events and deliver them to zero or more targets
  • Pipes are point-to-point integrations between one source and one target
  • Makes routing decisions with configurable rules
  • Rules run based on matching an event pattern or on a schedule with Amazon EventBridge Scheduler
  • Targets can be in another AWS account, in another Region, or in both
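
Scheduled rules use schedule expressions such as `rate(5 minutes)`. The parser below handles only rate expressions (not cron) and computes the spacing between invocations; `parse_rate` and `next_runs` are hypothetical helpers, and when the first invocation actually happens is decided by EventBridge, not by this code.

```python
import re
from datetime import datetime, timedelta, timezone

UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def parse_rate(expression):
    """Parse a rate expression such as 'rate(5 minutes)' into a timedelta.
    Cron expressions are not handled in this sketch."""
    match = re.fullmatch(r"rate\((\d+) (minutes?|hours?|days?)\)", expression)
    if not match:
        raise ValueError(f"unsupported expression: {expression}")
    value, unit = int(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError(f"rate value must be positive: {expression}")
    # The singular unit is used for a value of 1 and the plural otherwise.
    if (value == 1) != (not unit.endswith("s")):
        raise ValueError(f"unit does not agree with value: {expression}")
    return timedelta(seconds=value * UNIT_SECONDS[unit.rstrip("s")])


def next_runs(expression, start, count):
    """List `count` invocation times spaced by the rate, starting after `start`."""
    step = parse_rate(expression)
    return [start + step * i for i in range(1, count + 1)]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(next_runs("rate(5 minutes)", start, 2))
```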

An event bus is a router that receives events and delivers them to zero or more destinations, or targets. Use an event bus when you need to route events from many sources to many targets, optionally transforming events before they are delivered to a target.

Rules receive incoming events and send them as appropriate to targets for processing:

  • You can specify how each rule invokes its target or targets based on an event pattern or a schedule
  • An event pattern contains one or more filters to match events
  • A target is a resource or endpoint that EventBridge sends an event to when the event matches the event pattern defined for a rule
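
Rule matching and fan-out to targets can be sketched with a simplified matcher. It supports only exact-value matching (pattern leaves are lists of allowed values, nested objects are matched recursively, and an array in the event matches if any element is allowed); EventBridge content filters such as prefix or numeric matching are not modeled. The event mirrors the shape of a CloudWatch alarm state-change event, while the rules and targets are hypothetical Python callables rather than real AWS targets.

```python
def matches(pattern, event):
    """Simplified event-pattern match: every pattern key must be present in the
    event; leaf values are lists of allowed values; nested objects recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # An array in the event matches if any of its elements is allowed.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(c in expected for c in candidates):
                return False
    return True


def route(event, rules):
    """Deliver the event to the targets of every matching rule (zero or more)."""
    delivered = []
    for rule in rules:
        if matches(rule["pattern"], event):
            for target in rule["targets"]:
                delivered.append(target(event))
    return delivered


alarm_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "high-cpu", "state": {"value": "ALARM"}},
}

rules = [
    {   # Hypothetical rule: page the on-call team when any alarm enters ALARM.
        "pattern": {"source": ["aws.cloudwatch"], "detail": {"state": {"value": ["ALARM"]}}},
        "targets": [lambda e: f"page on-call: {e['detail']['alarmName']}"],
    },
    {   # Hypothetical rule for EC2 events only; it does not match this event.
        "pattern": {"source": ["aws.ec2"]},
        "targets": [lambda e: "ec2 handler"],
    },
]

print(route(alarm_event, rules))  # ['page on-call: high-cpu']
```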

AWS provides monitoring and reporting tools for cost management:

AWS Cost Explorer

Helps you visualize, understand, and manage your AWS costs and usage with daily or monthly granularity. You can view data for up to the last 13 months, which helps you see patterns in spending.

AWS Budgets

Helps you set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount.
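
The difference between an actual and a forecasted alert can be shown with a naive monthly budget check. The linear projection of month-to-date spend is an assumption made for illustration (AWS Budgets computes its own forecast); `ACTUAL` and `FORECASTED` echo the budget notification types, and the amounts are hypothetical.

```python
import calendar
from datetime import date


def budget_alerts(month_to_date_cost, today, budget, threshold_pct=100.0):
    """Return which alerts fire for a monthly cost budget, using a naive
    linear forecast (an illustrative assumption, not the AWS Budgets model)."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    forecast = month_to_date_cost / today.day * days_in_month
    limit = budget * threshold_pct / 100.0
    alerts = []
    if month_to_date_cost > limit:
        alerts.append("ACTUAL")
    if forecast > limit:
        alerts.append("FORECASTED")
    return alerts


# $400 spent by June 10 projects to $1,200 for the 30-day month.
print(budget_alerts(400.0, date(2024, 6, 10), budget=1000.0))   # ['FORECASTED']
print(budget_alerts(1100.0, date(2024, 6, 20), budget=1000.0))  # ['ACTUAL', 'FORECASTED']
```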

AWS Cost and Usage Report

Contains the most comprehensive set of AWS cost and usage data available, including additional metadata about AWS services, pricing, and reservations.

CloudWatch alarms can initiate Amazon EC2 Auto Scaling actions and send notifications to Amazon SNS topics. CloudWatch collects logs and metrics from AWS services across Regions, and you can use CloudWatch dashboards to visualize metrics and alarms. EventBridge processes and routes events with an event bus or a pipe, while AWS cost monitoring tools help you understand and manage your AWS infrastructure costs.