The process begins with collectd, which is installed on each server and gathers data and sends it to the RightScale monitoring system. You can view this data on the monitoring graphs in the dashboard, and you can base an alert on anything you see on the monitoring tab. For each alert you set the condition, threshold, duration, and escalation. An escalation can, for example, send an email or cast a vote to grow or shrink an array. When the ratio of servers voting to grow or shrink is greater than the decision threshold set on the array, a server is added or removed. In summary, the progression is:
Collectd --> Monitoring System --> Alert Specification --> Alert Escalation --> Server Array
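The voting step can be sketched in a few lines of Python. This is only an illustration of the rule described above, not RightScale's implementation: the function name `array_decision`, the vote labels, and the choice to express the decision threshold as a percentage are all assumptions made for the example.

```python
def array_decision(votes, decision_threshold_pct):
    """Decide whether an array should grow, shrink, or stay the same.

    votes: one entry per running server, each "grow", "shrink", or None
           (None means the server is not currently voting).
    decision_threshold_pct: hypothetical threshold, expressed as a percentage.
    """
    total = len(votes)
    if total == 0:
        return "none"

    # Percentage of servers voting each way.
    grow_pct = 100.0 * votes.count("grow") / total
    shrink_pct = 100.0 * votes.count("shrink") / total

    # A resize happens only when the ratio is greater than the threshold.
    if grow_pct > decision_threshold_pct:
        return "grow"
    if shrink_pct > decision_threshold_pct:
        return "shrink"
    return "none"


# Three of four servers vote to grow (75%), which exceeds a 51% threshold.
print(array_decision(["grow", "grow", "grow", None], 51))  # grow
# One of four servers votes to grow (25%), so nothing happens.
print(array_decision(["grow", None, None, None], 51))      # none
```

In this sketch a single busy server cannot resize the array on its own unless the threshold is set low enough for one vote to exceed it.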
It is common to grow or shrink an array based on the CPU idle value. You can view the value of cpu-0/idle in your monitoring graph; it is the percentage of time that the CPU has not been doing any work. When the idle value starts to fall, the server is getting busy. This is similar to the Linux top command, which has a stat called %id (e.g. 100.0%id for a CPU that isn't doing anything). However, top only displays the current idle value. The monitoring system aggregates and graphs these values over a period of time, and the alert subsystem lets you specify durations so that you avoid knee-jerk reactions to sudden CPU spikes.
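To see why the duration matters, consider the following minimal sketch. It assumes the alert condition is "CPU idle below a threshold" and that the duration is measured in consecutive samples; the function name `alert_triggered` and its parameters are hypothetical and do not correspond to a RightScale API.

```python
def alert_triggered(idle_samples, threshold, duration_samples):
    """Return True only if CPU idle stayed below the threshold for the
    whole duration.

    idle_samples: CPU idle percentages, oldest first, one per sampling interval.
    threshold: idle percentage below which the server counts as busy.
    duration_samples: how many consecutive recent samples must be below
                      the threshold (a stand-in for the alert duration).
    """
    if duration_samples < 1:
        raise ValueError("duration_samples must be at least 1")
    if len(idle_samples) < duration_samples:
        return False  # not enough data yet to cover the duration

    recent = idle_samples[-duration_samples:]
    return all(value < threshold for value in recent)


# A single spike (idle drops to 20% once) does not trigger the alert.
print(alert_triggered([90, 20, 85, 88], threshold=30, duration_samples=3))  # False
# Sustained load (idle below 30% for three samples in a row) does.
print(alert_triggered([80, 25, 20, 15], threshold=30, duration_samples=3))  # True
```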
You can create an alert specification to grow or shrink a Server Array based on any monitored value (not just CPU idle).
You should design your application so that users are not affected if a server goes offline. Normally this is not an issue. If users are connected to a specific server, the web page they are currently on is cached; when they go to a new web page, a new HTTP request is created and HAProxy routes it to one of the servers in the pool that is still up.
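The routing behaviour can be illustrated with a small round-robin sketch. This is not how HAProxy is implemented or configured; the `Pool` class and its methods are hypothetical, and the example only shows the idea that each new request goes to a server that is currently up.

```python
class Pool:
    """Toy round-robin pool that skips servers marked as down."""

    def __init__(self, names):
        self.names = list(names)
        self.up = set(self.names)
        self._next = 0  # index of the next server to try

    def mark_down(self, name):
        self.up.discard(name)

    def mark_up(self, name):
        if name in self.names:
            self.up.add(name)

    def route(self):
        """Return the next server that is up, in round-robin order."""
        for _ in range(len(self.names)):
            name = self.names[self._next]
            self._next = (self._next + 1) % len(self.names)
            if name in self.up:
                return name
        raise RuntimeError("no servers are up")


pool = Pool(["app1", "app2", "app3"])
print(pool.route())      # app1
pool.mark_down("app2")   # app2 is removed from the array or fails
print(pool.route())      # app3 (app2 is skipped)
print(pool.route())      # app1
```

Because each request is routed independently, a user whose previous request went to a server that has since been removed simply lands on another server. This only works transparently if the application does not depend on state held on a single server.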
© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.