Application Scaling helps you deliver consistent performance to your end users during high demand and reduce your spend during periods of low demand.
The following section covers the key terms used throughout this document:
Scaler is the E2E service that manages the Application Scaling functionality.
Scale Groups represent the nodes launched by Scaler. Each group adheres to a scaling policy (e.g., add a compute node when CPU utilization on an existing node stays at 70% for 20 seconds).
The nodes in a scale group are added or removed dynamically. These nodes are referred to as group nodes, or simply nodes, in the rest of this document.
The lifecycle of group nodes starts with the creation of the scale group and ends with the termination of the group. You are charged for the time between a node's start action and its termination.
Due to the dynamic nature of nodes, you will want to automate the launch sequence of your application as well. This is where a saved image comes into play.
A saved image is simply a compute node that you saved and that is capable of launching your application at startup.
The compute plan, or plan, is where you select the infrastructure or hardware requirements for your group nodes. It need not be the same as the plan you used to create your saved image.
This is the sequence you are most likely to follow when defining application scaling:
- Create a node with a conservative plan for your application (e.g., C Series, 1 CPU, 1 GB)
- Add a launch sequence to auto-install and start your application at startup
- Create a scale group with the actual plan you need for your production servers (e.g., C Series, 16 CPU, 64 GB)
A scaling policy determines the lifecycle of group nodes. It consists of an expression together with the following factors:
- Min. nodes
- Max. nodes
- Desired nodes or Cardinality
- Watch Period and Period Duration
- Cool down
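Taken together, these factors can be modeled roughly as below. This is only an illustrative sketch; the field names are hypothetical and do not represent the scaler's actual API:

```python
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Hypothetical model of a scaling policy; names are illustrative."""
    expression: str        # e.g. "CPU > 75"
    min_nodes: int         # lower bound on group size
    max_nodes: int         # upper bound on group size
    desired_nodes: int     # cardinality: preferred node count
    watch_period: int      # number of consecutive periods to watch
    period_duration: int   # length of each period, in seconds
    cooldown: int = 150    # seconds to block further scaling after an action

policy = ScalingPolicy(
    expression="CPU > 75",
    min_nodes=2,
    max_nodes=10,
    desired_nodes=2,
    watch_period=2,
    period_duration=10,
)
```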
A scaling policy determines how you want to add a node to the group. A negative policy is automatically created by Scaler to handle the termination of nodes. For example, when a user sets an expression of CPU > 80 for upscaling, the scaler automatically creates a downscaling policy of CPU < 80. Downscaling policies are managed internally by the scaler.
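The mirroring of an upscale expression into its downscale counterpart can be sketched as follows. This illustrates the idea only; the scaler's internal rule may differ:

```python
def downscale_expression(upscale: str) -> str:
    """Mirror an upscale expression (e.g. "CPU > 80") into its
    complementary downscale expression (e.g. "CPU < 80").
    Illustrative sketch, not the scaler's actual implementation."""
    metric, op, threshold = upscale.split()
    mirrored = {">": "<", "<": ">", ">=": "<=", "<=": ">="}[op]
    return f"{metric} {mirrored} {threshold}"

downscale_expression("CPU > 80")  # "CPU < 80"
```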
Min and Max nodes¶
Min and Max nodes determine the minimum and maximum guarantees from your scale group: the group never shrinks below the minimum or grows beyond the maximum number of nodes.
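However the policy evaluates, the resulting group size is bounded by these two settings. A minimal sketch of that clamping behaviour (illustrative helper, not the scaler's real code):

```python
def clamp_group_size(requested: int, min_nodes: int, max_nodes: int) -> int:
    """Bound a requested node count to the [min_nodes, max_nodes] range,
    as the min/max guarantees described above imply."""
    return max(min_nodes, min(requested, max_nodes))

clamp_group_size(1, 2, 10)   # 2  (never below the minimum)
clamp_group_size(15, 2, 10)  # 10 (never above the maximum)
```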
Cardinality or Desired Nodes¶
Though the actual number of group nodes is decided by the scaler through the policy configuration, you can influence this setting on certain occasions.
One such occasion is when you perform code or image updates. You can launch extra nodes that absorb the changes, then manually delete the existing nodes that run the older version of your code.
Keep it simple! Start with 2 nodes and let the scale group take over.
Performance or Target Metric¶
At this time, the scaling policy only supports CPU Utilization.
It is normal to see CPU spikes on servers; what you want before making a scaling decision is a consistent spike that lasts for a period of time. A watch period has two parts: Periods and Period Duration.
The duration determines how long each period lasts, and the number of periods determines how long the watch lasts. Let's go through an example to understand this better.
Consider this scaling policy:
Expression: CPU > 75
Watch Period: 2
Period Duration: 10 seconds
The scaler watches for 2 consecutive periods (of 10 seconds each) in which CPU utilization stays above 75%. When that condition occurs, the scaling operation is initiated.
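The watch logic described above can be sketched as a small simulation. This is a hypothetical helper assuming one averaged CPU reading per period; the scaler's real sampling and evaluation may differ:

```python
def should_scale(cpu_per_period, threshold=75.0, watch_period=2):
    """Return True once `watch_period` consecutive period readings
    exceed `threshold`. Each element of `cpu_per_period` is the average
    CPU utilization over one period (10 seconds in the example above)."""
    consecutive = 0
    for reading in cpu_per_period:
        consecutive = consecutive + 1 if reading > threshold else 0
        if consecutive >= watch_period:
            return True
    return False

should_scale([60, 80, 78])      # True: two consecutive periods above 75
should_scale([80, 60, 80, 70])  # False: spikes never last two periods
```

Requiring consecutive periods is what filters out momentary spikes: a single hot period resets as soon as utilization drops back under the threshold.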
A cooldown is a lull period during which all scaling operations are blocked. It typically starts right after a scaling operation. The idea of a cooldown is to wait and watch for the impact of a scaling operation before taking further action. The default is 150 seconds.
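The cooldown behaviour amounts to a simple time gate around scaling actions. A minimal sketch, assuming the 150-second default from above (illustrative only, not the scaler's actual code):

```python
import time

class CooldownGate:
    """Block scaling actions for `cooldown` seconds after the last one."""

    def __init__(self, cooldown=150.0):
        self.cooldown = cooldown
        self.last_action = None  # timestamp of the last scaling action

    def may_scale(self, now=None):
        """True if no action has happened yet, or the cooldown elapsed."""
        now = time.monotonic() if now is None else now
        return self.last_action is None or now - self.last_action >= self.cooldown

    def record_action(self, now=None):
        """Start a new cooldown window right after a scaling operation."""
        self.last_action = time.monotonic() if now is None else now
```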
Load balancers form the entry door to your scaling applications. While the actual group nodes (and their IPs) may keep changing, the load balancer enables consistent access for the external world.
Always bundle your scale groups with a load balancer.