Scaling Your Compute Resources
The Need for Reactive Architectures
Modern applications are expected to handle petabytes of data, provide close to 100 percent uptime, and deliver sub-second response times to users. A growing number of enterprises have adopted reactive architectures and reactive systems to meet these demands.
Reactive Application Characteristics
To meet these requirements, you can implement a reactive application that is:
- Elastic - scales its resources dynamically, adding or removing capacity to react to changes in demand and to avoid downtime or traffic bottlenecks
- Resilient - stays responsive in the face of failure, with the capability to recover when stressed by load, attacks, or the failure of any component of the workload
- Responsive - responds in a timely manner with the lowest latency possible and remains responsive under varying workloads
- Message-driven - perhaps the most important characteristic. To establish boundaries between services, reactive applications rely on asynchronous message-passing to help ensure loose coupling, isolation, and location transparency
Achieve Elasticity with Scaling
Elasticity means that the infrastructure can expand and contract when capacity requirements change. You can acquire resources when you need them and release resources when you do not.
Scaling is the ability to increase or decrease the compute capacity of your application. Scaling is a technique that is used to achieve elasticity.
Vertical Scaling
Vertical scaling is where you increase or decrease the specifications of an individual resource:
- You could upgrade to a new server with a larger hard drive or a faster CPU
- Typically, the application and data have to be transferred to the new resource, which can lead to application downtime
- With Amazon EC2, you can stop an instance and resize it to an instance type that has more RAM, CPU, I/O, or networking capabilities (see the sketch after this list)
- Vertical scaling can eventually reach a limit because it is hardware bound
- Not always a cost-efficient or a highly available approach
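As an illustration of the resize step mentioned above, here is a minimal boto3 sketch of vertical scaling. The instance ID and target instance type are hypothetical, and the instance must tolerate the downtime of a stop/start cycle:

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# Vertical scaling: the instance must be stopped before its type can change.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Change to a larger instance type, then start the instance again.
ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": "m5.xlarge"})
ec2.start_instances(InstanceIds=[instance_id])
```

The stop/start cycle is exactly the downtime the list above warns about, which is one reason vertical scaling is not always a highly available approach.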
Horizontal Scaling
Horizontal scaling is where you add or remove resources available to the application:
- Adding resources is referred to as scaling out
- Removing resources is referred to as scaling in
- Good way to build internet-scale applications that take advantage of the elasticity of cloud computing
- Applications, data, or both are automatically transferred to added resources
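To make the scale-out and scale-in terminology concrete, here is a minimal boto3 sketch of manual horizontal scaling; the AMI ID and instance type are hypothetical. Later sections show how Amazon EC2 Auto Scaling automates this:

```python
import boto3

ec2 = boto3.client("ec2")

# Scale out: launch two more instances from the same image.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI ID
    InstanceType="t3.micro",
    MinCount=2,
    MaxCount=2,
)
new_ids = [i["InstanceId"] for i in response["Instances"]]

# Scale in: terminate the extra instances when demand drops.
ec2.terminate_instances(InstanceIds=new_ids)
```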
Amazon EC2 Auto Scaling
Amazon EC2 Auto Scaling scales capacity by grouping EC2 instances into a logical collection called an Amazon EC2 Auto Scaling group. The group can span multiple Availability Zones.
Key Features
- Manages a logical collection of Amazon EC2 instances called an Amazon EC2 Auto Scaling group across Availability Zones
- Launches or terminates EC2 instances configured by launch templates
- Resizes the group based on events from scaling policies, load balancer health check notifications, or scheduled actions
- Integrates with Elastic Load Balancing (ELB) to register new instances and receive health check notifications
- Balances the number of instances across Availability Zones
- Is available free of charge
Benefits
With Amazon EC2 Auto Scaling, your applications gain the following benefits:
- Better fault tolerance - can detect when an instance is unhealthy, terminate it, and launch an instance to replace it
- Better availability - helps ensure that your application always has the right amount of capacity to handle the current traffic demand
- Better cost management - can dynamically increase and decrease capacity as needed
Amazon EC2 Auto Scaling Group Components
Capacity Settings
The number of instances in the group is determined by the capacity settings:
- Minimum capacity - the smallest number of instances needed to run the application
- Maximum capacity - the largest number of instances permitted for the group
- Desired capacity - the optimal number of instances needed to run the application under normal circumstances
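The capacity settings map directly onto the group creation call in boto3. A minimal sketch, assuming a group named web-asg and an existing launch template named web-template (both hypothetical):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# The group never shrinks below MinSize, never grows beyond MaxSize,
# and targets DesiredCapacity under normal conditions.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",  # hypothetical group name
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # hypothetical subnets in two AZs
)
```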
Launch Templates
To launch an EC2 instance, the group needs to know which type of EC2 instance to launch. The group's launch template specifies the EC2 instance configuration details:
- Instance type and the Amazon Machine Image (AMI) ID
- What percentage of the desired capacity should be fulfilled with On-Demand Instances, Reserved Instances, and Spot Instances
- Launch templates can be versioned
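A minimal boto3 sketch of creating and versioning a launch template; the template name, AMI ID, and instance types are hypothetical:

```python
import boto3

ec2 = boto3.client("ec2")

# Version 1: the AMI and instance type the group should launch.
ec2.create_launch_template(
    LaunchTemplateName="web-template",  # hypothetical template name
    LaunchTemplateData={"ImageId": "ami-0123456789abcdef0", "InstanceType": "t3.micro"},
)

# Launch templates are versioned: publish version 2 with a larger instance type.
ec2.create_launch_template_version(
    LaunchTemplateName="web-template",
    SourceVersion="1",
    LaunchTemplateData={"InstanceType": "t3.large"},
)
```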
Scaling Mechanisms
A new Amazon EC2 Auto Scaling group has no scaling mechanisms. You can add scaling mechanisms, such as:
- Scheduled actions
- Dynamic scaling policies
- Predictive scaling policies
Amazon EC2 Auto Scaling Mechanisms
Scheduled actions - scale based on a date and time
- Useful for predictable workloads when you know exactly when to increase or decrease the number of instances
- Example: Traffic increases on Wednesday, remains high on Thursday, starts to decrease on Friday
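A scheduled action for the Wednesday-to-Friday pattern above might look like the following boto3 sketch; the group name, capacities, and UTC times are hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out ahead of the Wednesday traffic increase.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",            # hypothetical group name
    ScheduledActionName="wednesday-scale-out",
    Recurrence="0 8 * * 3",                    # 08:00 UTC every Wednesday
    DesiredCapacity=8,
)

# Scale back in as traffic decreases on Friday.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="friday-scale-in",
    Recurrence="0 20 * * 5",                   # 20:00 UTC every Friday
    DesiredCapacity=4,
)
```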
Dynamic scaling policies - scale based on tracked metrics
- Are for moderately spiky workloads
- More advanced way to scale your resources
- Define parameters that control the scaling process
- Gives you extra capacity to handle traffic spikes without maintaining an excessive number of idle resources
Predictive scaling policies - scale based on previous traffic patterns with machine learning
- Is for workload traffic that can be predicted with a pattern
- Analyzes historical load data to detect daily or weekly patterns in traffic flows
- Uses this information to forecast future capacity needs
Target Tracking Scaling
Target tracking scaling policies increase or decrease the current capacity of the group based on a target value for a specific metric. This type of scaling is similar to the way that your thermostat maintains the temperature of your home:
- You select a temperature, and the thermostat does the rest
- You select a scaling metric and set a target value
- Amazon EC2 Auto Scaling creates and manages the CloudWatch alarms that invoke the scaling policy
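A minimal boto3 sketch of a target tracking policy that keeps average CPU utilization near 50 percent; the group name, policy name, and target value are hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking: pick a metric and a target value; the service creates
# and manages the CloudWatch alarms behind the scenes.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",  # hypothetical group name
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)
```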
Step Scaling and Simple Scaling
- Step scaling - adjusts capacity in increments that vary with the size of the alarm breach
- Simple scaling - waits for the current scaling activity to finish before responding to additional alarms
- Both use scaling metrics and threshold values for the CloudWatch alarms that invoke the scaling process
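As an illustration of step scaling, here is a boto3 sketch that adds more instances the further the metric breaches the alarm threshold. The names, bounds, and adjustments are hypothetical, and the CloudWatch alarm that triggers the policy is created separately and not shown:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Step scaling: the adjustment grows with the size of the alarm breach.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",  # hypothetical group name
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[
        # Breach of 0-20 above the alarm threshold: add 1 instance.
        {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0, "ScalingAdjustment": 1},
        # Breach of 20 or more above the threshold: add 3 instances.
        {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 3},
    ],
)
```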
Predictive Scaling Use Cases
Predictive scaling is well suited for situations where you have:
- Cyclical traffic, such as high use of resources during regular business hours and low use during evenings and weekends
- Recurring on-and-off workload patterns, such as batch processing, testing, or periodic data analysis
- Applications that take a long time to initialize, causing a noticeable latency impact on application performance during scale-out events
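A minimal boto3 sketch of a predictive scaling policy driven by CPU utilization; the group name, policy name, and target value are hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Predictive scaling: forecast capacity from historical load and scale
# ahead of the predicted demand.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",  # hypothetical group name
    PolicyName="predictive-cpu",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 50.0,
                "PredefinedMetricPairSpecification": {"PredefinedMetricType": "ASGCPUUtilization"},
            }
        ],
        "Mode": "ForecastAndScale",  # "ForecastOnly" reviews forecasts without acting on them
    },
)
```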
More AWS Scaling Options
AWS Auto Scaling
Uses a scaling plan to configure auto scaling for multiple resources:
- Scale multiple AWS services:
- Amazon Aurora
- Amazon EC2 Auto Scaling
- Amazon ECS
- Amazon DynamoDB
- Use tags to group resources in categories such as production, testing, or development
- Search for and set up scaling plans for scalable resources that belong to each category
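A sketch of a scaling plan built from a tag-based application source, using the boto3 autoscaling-plans client; the plan name, tag, and resource IDs are hypothetical:

```python
import boto3

plans = boto3.client("autoscaling-plans")

# One scaling plan for every resource tagged environment=production.
plans.create_scaling_plan(
    ScalingPlanName="production-plan",  # hypothetical plan name
    ApplicationSource={"TagFilters": [{"Key": "environment", "Values": ["production"]}]},
    ScalingInstructions=[
        {
            "ServiceNamespace": "autoscaling",
            "ResourceId": "autoScalingGroup/web-asg",  # hypothetical group
            "ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
            "MinCapacity": 2,
            "MaxCapacity": 10,
            "TargetTrackingConfigurations": [
                {
                    "PredefinedScalingMetricSpecification": {
                        "PredefinedScalingMetricType": "ASGAverageCPUUtilization"
                    },
                    "TargetValue": 50.0,
                }
            ],
        }
    ],
)
```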
AWS Application Auto Scaling
Scale multiple resources with target tracking, step scaling, or scheduled scaling:
- Scale multiple AWS services:
- AWS Auto Scaling services
- AWS Lambda functions
- Amazon SageMaker
- Amazon ElastiCache for Redis
- Similar to Amazon EC2 Auto Scaling groups but for individual AWS services beyond Amazon EC2
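A minimal Application Auto Scaling sketch with boto3, scaling an Amazon SageMaker endpoint variant with target tracking; the endpoint name, variant name, capacity limits, and target value are hypothetical:

```python
import boto3

appscaling = boto3.client("application-autoscaling")

# Register the endpoint variant's instance count as a scalable target.
appscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",  # hypothetical endpoint/variant
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking: keep invocations per instance near the target value.
appscaling.put_scaling_policy(
    PolicyName="endpoint-invocations-tracking",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "TargetValue": 1000.0,
    },
)
```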
Amazon EC2 Auto Scaling creates and manages logical collections of EC2 instances called Auto Scaling groups. Groups have capacity settings that specify the minimum, maximum, and desired number of instances. Group size can be scaled in and out with scheduled actions, dynamic scaling policies, and predictive scaling policies, while AWS Auto Scaling and Application Auto Scaling extend scaling capabilities to services beyond EC2 instances.