Auto-Scaling Deep Dive


I wrote a blog post on Auto-Scaling and received many questions on the topic. Also, from vRealize Automation 7.3.1 onwards, the workflows from that post stopped working. This post answers those questions and provides an updated package that works with vRealize Automation 7.3 and later. Although it mainly covers the answers to the questions, it also discusses the subject as a whole and is divided into the following subtopics:

  • Definition
  • Why should we use it
  • Points to consider
  • Candidates
  • Methods
  • Lab example
  • Final words
  • Demo


Definition

Autoscaling, also spelled auto scaling or auto-scaling, and sometimes also called automatic scaling, is a method used in cloud computing, whereby the amount of computational resources in a server farm, typically measured in terms of the number of active servers, scales automatically based on the load on the farm. It is closely related to, and builds upon, the idea of load balancing. [Source: Wikipedia]

Auto-Scaling was mainly introduced by Amazon in AWS cloud services, so the terminology and types are heavily influenced by AWS. Today this service is offered by the other public cloud providers as well (Azure, Google Cloud and others). It can also be built easily in any private cloud or virtual environment.

In my opinion, Auto-Scaling is a feature and should not be tied to any particular environment (cloud, virtualization or otherwise). The same feature can be built easily in any private cloud or virtualized environment, or even in a purely physical environment (through monitoring and scripting).


Why should we use it

Autoscaling offers the following advantages:

  • Scaling Out/In or Up/Down as needed, thus saving resources and energy (mostly in a private datacenter environment)
  • In a cloud environment, this results in a lower bill, as the amount charged is directly proportional to the resources used
  • No need for upfront investment for future usage (e.g. a Black Friday sale), especially in a cloud environment
  • Easy replacement of crashed or unhealthy nodes, which in turn ensures uptime and performance as per requirement

Points to consider

Most of the time, I hear people say they want to implement “Auto-Scaling” without understanding the requirements.

There are two aspects of implementing Auto-Scaling. They are:

  • Implementing at Infrastructure layer
  • Implementing at Application layer

For the entire feature to be successful, both of the above aspects need to go hand in hand.

Implementing Auto-Scaling at the infrastructure level is easy. But to be truly effective, the application for which it is designed needs to be aware of the environment. The requirements are provided below.

  • Application must be auto-scale aware
  • In practice, this means the application must be clustered in nature (in case of scale out/in), so that nodes can be added and removed without hampering application operation
  • Infrastructure should be easily scalable (agile in nature)
  • Proper triggering mechanism (monitoring and alerting)

Provided below are the considerations required for the successful operation of the feature.

  • Initial waiting period – when to start scaling
  • Gap between scaling
  • Stateless vs stateful applications
    • Stateless applications (e.g. web servers with nothing stored locally)
    • Stateful applications (e.g. nodes processing payments)
  • When to stop (upper limit)
  • When to scale down
  • How far to scale down (lower limit)
  • Triggers for scaling: a few examples are CPU/memory usage, network traffic (number of hits) and number of transactions

As a general observation, the time taken to trigger scaling up should be less than the time taken to trigger scaling down. For example, if scaling out is triggered after 10 minutes of sustained network load, then scaling in should happen only after 30 minutes to 1 hour of low activity. Otherwise, the environment will scale back and forth too often, resulting in unwanted situations.
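The timing considerations above can be sketched as a small decision function. This is only an illustration under assumed values: the CPU thresholds, the cooldown gap and the node limits are hypothetical, and only the 10 minute and 30 minute windows come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not recommendations.
    min_nodes: int = 1                 # how far to scale down (lower limit)
    max_nodes: int = 4                 # when to stop (upper limit)
    high_cpu: float = 80.0             # percent CPU that counts as "loaded"
    low_cpu: float = 30.0              # percent CPU that counts as "idle"
    scale_out_after_s: int = 10 * 60   # sustained load before scaling out
    scale_in_after_s: int = 30 * 60    # sustained idle time before scaling in
    cooldown_s: int = 5 * 60           # gap between two scaling actions


def decide(policy: ScalingPolicy, nodes: int, cpu: float,
           seconds_in_state: int, seconds_since_last_action: int) -> Optional[str]:
    """Return "scale_out", "scale_in" or None for the current observation."""
    if seconds_since_last_action < policy.cooldown_s:
        return None  # still inside the gap between scaling actions
    if (cpu >= policy.high_cpu
            and seconds_in_state >= policy.scale_out_after_s
            and nodes < policy.max_nodes):
        return "scale_out"
    if (cpu <= policy.low_cpu
            and seconds_in_state >= policy.scale_in_after_s
            and nodes > policy.min_nodes):
        return "scale_in"
    return None
```

Because the scale-in window is longer than the scale-out window, a short dip in load does not immediately remove the nodes that were just added.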

Possible Candidates

Provided below is a list of possible candidates. Remember, this is not an exhaustive list.

  • Stateless webservers (web server farm)
  • Stateful front end servers (while scaling down, special precautions need to be taken)
  • Mid-tier application servers or other applications that have cluster functionality and support addition/removal of nodes at runtime
  • Database servers: scale up, or scale out where multiple nodes are supported or required

Consideration: In most cases, web servers or application servers are the targeted use cases. Database servers and similar applications are the least preferred candidates for this service.

Finally, implementation is limited only by your imagination.


Methods

There are two well accepted methods, among others. These are listed below:

  • Pre-Deployment
  • Runtime Deployment

Here, we will discuss only these two methods.

Pre-Deployment Method

In this method, all the required nodes are pre-deployed and kept in standby mode (powered off or suspended). When the triggering condition is met, they are either powered on or brought into the cluster. Generally, the following steps are performed.

  • Deploy the pre-decided maximum number of nodes and either keep them in powered off mode or standby mode
  • Nodes can be pre-configured to be part of the cluster (cluster nodes) or can be added at runtime
  • Power On/Off the nodes as and when required as per the triggering condition
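The steps above can be sketched as a small in-memory model. This is a simulation only: the `StandbyPool` class and the node names are hypothetical, and the "power on" and "power off" operations are represented by moving a node between two lists rather than by calls to any real platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandbyPool:
    max_nodes: int                 # pre-decided maximum number of nodes
    min_active: int = 1            # nodes that always stay powered on
    active: List[str] = field(default_factory=list)
    standby: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step 1: deploy every node up front; keep only min_active running.
        names = [f"node-{i}" for i in range(1, self.max_nodes + 1)]
        self.active = names[: self.min_active]
        self.standby = names[self.min_active:]

    def scale_out(self) -> Optional[str]:
        """Power on the next standby node, or return None if none is left."""
        if not self.standby:
            return None
        node = self.standby.pop(0)
        self.active.append(node)
        return node

    def scale_in(self) -> Optional[str]:
        """Power off the most recently activated node, respecting min_active."""
        if len(self.active) <= self.min_active:
            return None
        node = self.active.pop()
        self.standby.insert(0, node)
        return node
```

Since nothing is deployed at trigger time, the only work done on a scaling event is the power operation itself.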

Provided below is a list of advantages and disadvantages of this method.

Advantages

  • Less reaction time
  • Does not require resource intensive operations (storage IO or runtime deployment)

Disadvantages

  • Does not give maximum saving (storage is still consumed)

Runtime Deployment Method

In this method, the nodes are deployed at runtime as and when required. Generally, the following steps are performed for this method.

  • Nodes are deployed or removed at runtime
  • As the nodes are deployed, they become part of the cluster; as they are removed, they leave it (node addition/removal)

Advantages

  • Maximum resource saving
  • Maximum monetary saving

Disadvantages

  • Large reaction time (many time-consuming runtime operations, such as node deployment and addition to or removal from the cluster)
  • Resource intensive operations. If a large load arrives, multiple nodes are deployed in quick succession, leading to excessive IO operations. This can lead to unwanted situations.

Choose carefully between these two methods based on the requirement. I believe this is a balance between performance and savings. If the application is time sensitive, choose the pre-deployment method. Otherwise, if monetary or resource saving is the main consideration, choose the runtime deployment method.
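To make the trade-off concrete, here is a rough sketch that compares the reaction time of the two methods. The `Timings` fields and any numbers passed in are hypothetical placeholders; real values depend entirely on the environment and would need to be measured.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    power_on_s: float       # time to power on or resume a standby node
    deploy_s: float         # time to provision a brand new node
    join_cluster_s: float   # time for a node to be added to the cluster


def reaction_time(method: str, t: Timings, preconfigured: bool = True) -> float:
    """Estimate seconds from trigger to a node serving load."""
    if method == "pre-deployment":
        # Node already exists; it may or may not already be a cluster member.
        return t.power_on_s + (0.0 if preconfigured else t.join_cluster_s)
    if method == "runtime":
        # Node must be deployed, started and joined to the cluster.
        return t.deploy_s + t.power_on_s + t.join_cluster_s
    raise ValueError(f"unknown method: {method}")


def choose_method(time_sensitive: bool) -> str:
    """Apply the rule of thumb: speed favours pre-deployment, savings favour runtime."""
    return "pre-deployment" if time_sensitive else "runtime"
```

With assumed timings of 60 seconds to power on, 600 seconds to deploy and 120 seconds to join the cluster, the pre-deployed node reacts in about a minute while the runtime node needs about thirteen.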

Lab Example

As demonstrated in my earlier post, here I will provide an example from my lab. For this, I used the following VMware products.

  • vRealize Automation (version 7.5)
  • vRealize Operations Manager (version 7.0)
  • vRealize Orchestrator (version 7.5)
  • vCenter server (version 6.5 U1)
  • Webhook Shims

Provided below is a picture which shows the logical lab setup.

How it works

In my lab, the VMs are deployed through vRealize Automation (vRA), with vCenter Server used as the endpoint. The entire environment is monitored by vRealize Operations Manager (vROps), which is also used for alert notifications. The alerts are sent to vRealize Orchestrator (vRO), which in turn runs a workflow. Through this workflow we initiate the appropriate action (Scale Out or In) in vRA.

Note: For this demonstration we are utilizing an out-of-the-box feature of vRA. vRA provides the action items Scale Out and Scale In per deployment, which makes our task easier.
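The flow described above (alert in, scaling action out) can be sketched as follows. This is not the actual vRO workflow: the alert names, the alert dictionary keys and the `request_action` callback are all hypothetical stand-ins, with the callback representing whatever mechanism requests the Scale Out or Scale In action on a vRA deployment.

```python
from typing import Callable, Optional

# Hypothetical alert names; in a real setup these would match the alert
# definitions configured in vROps.
ALERT_ACTIONS = {
    "Web tier CPU high": "Scale Out",
    "Web tier CPU low": "Scale In",
}


def handle_alert(alert: dict,
                 request_action: Callable[[str, str], None]) -> Optional[str]:
    """Pick the deployment action for an alert and request it.

    Returns the action name, or None when the alert is ignored.
    """
    if alert.get("status") != "ACTIVE":
        return None  # ignore alerts that are not currently active
    action = ALERT_ACTIONS.get(alert.get("alert_name", ""))
    if action is None:
        return None  # alert is not related to scaling
    # request_action(deployment, action) stands in for the call into vRA.
    request_action(alert["deployment"], action)
    return action
```

Keeping the mapping from alert to action in one table means a new trigger only needs a new alert definition and one extra entry.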

Another point to note: vROps can send notifications to vRO using three methods: SNMP traps, REST notifications and the vRO plugin. The advantages and disadvantages of each are provided below:

SNMP Traps

Advantages

  • Does not require extra resources or configuration
  • Simpler to set up

Disadvantages

  • Irrespective of their nature, all the traps are sent over to vRO
  • vRO needs to constantly listen for the right trap. This puts extra pressure on vRO
  • Not a very clean way of handling the traps

REST Notification


Advantages

  • Much cleaner way of handling the alerts
  • Alert specific workflow can be initiated from vRO
  • vRO does not need to listen for alerts/traps. The workflow runs only when an alert is triggered.

Disadvantages

  • Needs extra resources and configuration
  • Right now, the dependent technologies are not tested for scale in production

vRO Plugin


Advantages

  • No extra overhead at vRO
  • Fully supported by VMware
  • Easier processing of alerts than SNMP traps
  • Workflows are fired as per alert, so no extra overhead at vRO

Disadvantages

  • Lengthier setup than SNMP traps
  • For some, the configuration may seem complex

For my lab, I have chosen to demonstrate REST Notification and vRO Plugin. If you need details on how to set up and use SNMP traps, then check my earlier post.

One point to note: REST notification sends data in JSON format. The format in which vROps sends the data is not understood by vRO, which expects the data in another format. To solve this, we need Webhook Shims in between, which work as a translator for vRO. This is the extra configuration needed for REST notification. In my opinion, the result is worth the extra effort. Provided below is a picture of how Webhook Shims work in the entire scenario.

Also, for this demonstration I am selecting only CPU as the triggering mechanism. For a more detailed demonstration with load balancers (NSX) and HTTP hits as triggering mechanism, please check my earlier post.
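Coming back to the translation step that Webhook Shims perform, the idea can be sketched as below. This is not the Webhook Shims code. The alert field names in the usage example are assumptions and should be checked against the payload vROps actually sends, and the output layout (a `parameters` list of typed values) is an assumption about the input format for starting a vRO workflow over REST that should be verified against the vRO documentation.

```python
import json


def to_vro_execution_body(alert: dict) -> str:
    """Wrap each alert field as a string input parameter for a workflow run.

    The input is a flat dictionary of alert fields (assumed shape); the
    output is a JSON document listing one typed parameter per field.
    """
    parameters = [
        {
            "name": name,
            "type": "string",
            "scope": "local",
            "value": {"string": {"value": str(value)}},
        }
        # Sort the fields so the output order is stable and predictable.
        for name, value in sorted(alert.items())
    ]
    return json.dumps({"parameters": parameters})


if __name__ == "__main__":
    # Hypothetical alert payload, for illustration only.
    sample = {"alertName": "Web tier CPU high", "resourceName": "web-01"}
    print(to_vro_execution_body(sample))
```

The point is that the shim only reshapes the data: the information in the alert is unchanged, it is just presented in the structure the receiving side expects.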

Special mention

I am impressed with the vRO plugin method of alert triggering. It is perhaps the most useful method so far. In future, I am going to depend more and more on this method.

Please check the video for a detailed description of how to configure and set this up. I have spent some time detailing this process.

For background reading, please check the blogs from John Dias.

For detailed information, check the documentation.


Demo

For more details, check the following videos.

I tried to record the video four times, and all four times it crashed my system. This is the reason I am so late in publishing this post. Finally, I divided the recording into two parts. The first part explains the theoretical aspects. The second part has the demo from my lab. So, depending on what you are looking for, you can watch the videos accordingly.

Auto Scale - Part 1 - Details

Auto Scale - Part 2 - Demo


I am thankful to VMware for the software in the lab and the presentation template used in the presentation. The quality of the ppt template is unparalleled and I like its simplicity. I am also thankful to Ron Tsai for sharing the updated workflows.


Final words

This is a very useful feature, especially in the age of virtualization and cloud computing. Though the examples I gave use VMware technologies, you can build this solution using any other tools. Please check the solution and let me know your feedback. As always, thanks for your time and patience.