To create and view Cluster Monitoring graphs for a deployment with many servers.
Collectd must be installed on the instance for monitoring to work. Most ServerTemplates include this as part of their boot phase. If yours do not, see Setting up collectd for more information.
Many RightScale customers have deployments with hundreds or even thousands of servers. Standard monitoring graphs are not practical at that scale; Cluster Monitoring makes the task simpler and more effective. With Cluster Monitoring you can filter all servers in a deployment by the name of the ServerTemplate they use, by the server name itself, or by a Tag. You can filter against a specific value, or use "*" to match all strings (that is, all ServerTemplate names, all server names, or all Tags).
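The filter behavior described above can be sketched in a few lines of Python. This is only an illustration, not RightScale's implementation: the `Server` record, the `filter_servers` function, and the tag strings are hypothetical, and the case-insensitive substring match is an assumption based on the examples on this page rather than a documented rule.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server record; the field names are illustrative only."""
    name: str
    server_template: str
    tags: list = field(default_factory=list)


def filter_servers(servers, filter_by, value="*"):
    """Return the servers whose chosen attribute matches `value`.

    filter_by: "server_template", "name", or "tag".
    value: "*" matches any string; otherwise an assumed
           case-insensitive substring match is applied.
    """
    if filter_by not in ("server_template", "name", "tag"):
        raise ValueError(f"unknown filter type: {filter_by}")

    def matches(text):
        return value == "*" or value.lower() in text.lower()

    result = []
    for server in servers:
        if filter_by == "tag":
            # A server with no tags has nothing to match, even with "*".
            candidates = server.tags
        else:
            candidates = [getattr(server, filter_by)]
        if any(matches(candidate) for candidate in candidates):
            result.append(server)
    return result


# Hypothetical deployment used only for this example.
servers = [
    Server("EU FE 1", "Custom PHP App", ["shard:accounts-1"]),
    Server("US FE 1", "Custom PHP App"),
    Server("US DB 1", "Database Manager", ["shard:accounts-2"]),
]

eu_names = [s.name for s in filter_servers(servers, "name", "EU")]
php_names = [s.name for s in filter_servers(servers, "server_template", "custom php app")]
shard_names = [s.name for s in filter_servers(servers, "tag", "shard")]
```

Because the sketch matches on substrings, a short filter value such as "EU" would also match any unrelated name that happens to contain those letters, which is one reason a consistent naming convention matters.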
The following example use cases may help give you ideas on how you can better use Cluster Monitoring.
As an example, in a deployment that spans the US and EU, you could "Cluster Monitor" all of your EU servers, all of your US servers, or all front end (FE) servers across both regions, provided that "EU", "US", and/or "FE" are part of your server naming convention. At a glance you could see whether any of these servers are struggling with low CPU idle time, too much Apache/web traffic, etc.
As another use case, if your deployment uses database sharding, you could tag each shard with a descriptive name based on account numbers. For example, tag them with this basic naming convention:
Then you would simply apply a filter by tag "shard" in order to access all Cluster Monitor graphs for your database shard servers.
Let's say you suspect that a customized ServerTemplate has developed a memory leak. The suspected ServerTemplate is called "Custom PHP App". You could filter by ServerTemplate on "custom php app", and all servers based on this ServerTemplate would be viewable in your Cluster Monitoring graphs. From there, you could step through the various server graphs and view their memory data. If the free memory graphs slope steadily downward over time, you may have identified a trend that confirms your memory leak suspicions.
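Reading a slope off a graph by eye can be backed up with a quick calculation on exported samples. The sketch below fits a least-squares line to free memory readings; the sample values, the hourly interval, and the alert threshold are all made up for illustration and are not part of Cluster Monitoring.

```python
def slope(samples):
    """Least-squares slope of (time, value) pairs, in value units per time unit."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator


# Hypothetical data: free memory in MB, sampled once per hour.
free_memory_mb = [(0, 2048), (1, 1990), (2, 1935), (3, 1870), (4, 1812)]

trend_mb_per_hour = slope(free_memory_mb)

# Assumed threshold: losing more than 10 MB of free memory per hour
# is treated as worth investigating.
suspect_leak = trend_mb_per_hour < -10
```

A steadily negative slope across several servers that share the same ServerTemplate is stronger evidence than a dip on a single server, which may simply reflect a temporary load spike.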
Note that these example use cases are in addition to configuring monitoring and Alerts with Alert Escalations, etc. That is, Cluster Monitoring does not prohibit you from setting up Alerts the way you are accustomed to any more than standard monitoring graphs do. Both are additional features, orthogonal to automated alerts and escalations.
Understanding the basic screen layout allows you to create and view Cluster Monitoring graphs more effectively. At first, the layout and various operations may not seem obvious. This brief Overview of the screen layout will help you grasp the screen and its functions more quickly. When viewing the Monitoring tab of your deployment, your main focus should be the following three areas:
A dotted line separates the three areas from each other.
The design area is at the top of the Monitoring display area and is where you specify whether you want to create and display new standard graphs -OR- Cluster Monitor graphs.
The following screen shot shows the design area in the process of:
Once applied, the resulting set of servers that display Cluster Monitoring graph data is all servers in the current deployment that contain "staging" in their name.
After selecting the Apply action button, several graphs are added to the design area. The next example shows this. The key points of interest to note are:
A single server is always shown after applying a filter in the design area. In our example, this is the "Staging - Azure" server. This single server is known as the Representative Server. The screen area is too small to show all servers, so a representative server is shown, with the option of drilling down to other servers.
These thumbnail graphs show all of the monitored data for that individual Representative Server. As a starting point the metrics for "cpu-0/cpu_overview" are displayed.
A list of all servers in the result set of the filter that was applied. (Servers with "staging" in their name in our example.) You may select or browse through any and all servers in this list. If you select a different Representative Server, the graphs immediately below the list change to show the collectd data for that server.
A graph for each Server in the Representative Server drop-down is shown. (Note: Example screenshot is truncated for display purposes.)
Select the Save action button (not shown) to save graph data from any filters you have run/displayed. The saved graphs are placed at the bottom of your Monitoring display, and your browser automatically scrolls to the bottom.
Standard monitoring graphs appear below the design area, above any Cluster Monitor graphs, delimited by the dotted line as shown. See Viewing Graphical Data for more information on standard graphs. The functionality of standard monitoring graphs has not changed with the advent of Cluster Monitoring.
Cluster Monitor graphs display at the bottom of the Monitoring display, below the standard graphs. There is (virtually) no limit to how many Cluster Monitor graphs you can save. Once saved (by selecting the Save action button), you can perform the following basic operations on the graphs in the Cluster Monitoring display area:
Note: Once you save Cluster Monitoring graphs, a design area for Cluster Monitoring is placed at the top of the displayed graphs. It's like having a header on your Cluster Monitoring graphs section, so that you don't have to scroll back up to the top of your entire Monitoring display in order to change the contents of an individual Cluster Monitor. It functions the same as the design area at the top, except that you cannot choose between standard graphs and Cluster Monitor graphs; Cluster Monitor graphs are implied by the location. This feature is simply for ease of use and adds no functionality.
Note: this feature is only supported for AWS and Google.
When enabling cluster monitoring, you have the option of choosing to view your graphs in "hosts," "heat" or "stack":
Heat maps show a different view of the same activity. In a heat map, each color bar represents the activity of one server, and the color of the bar at each point in time shows how "hot" the server is. That is, the value of the variable being displayed is color coded from blue to red:
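To make the blue-to-red coding concrete, here is a minimal Python sketch that maps a metric value to a color. The linear interpolation, the RGB output, and the `heat_color` name are assumptions made for illustration; the actual color scale used by the graphs is not specified here.

```python
def heat_color(value, low, high):
    """Map a value in [low, high] to an (R, G, B) tuple from blue (cold) to red (hot).

    Values outside the range are clamped to the nearest end of the scale.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (value - low) / (high - low)
    fraction = max(0.0, min(1.0, fraction))
    red = round(255 * fraction)
    blue = round(255 * (1 - fraction))
    return (red, 0, blue)


# Hypothetical CPU busy percentages for one server over five time steps.
cpu_busy = [0, 25, 60, 100, 150]

# One row of the heat map: one color per point in time.
row = [heat_color(v, low=0, high=100) for v in cpu_busy]
```

Each server contributes one such row, so a deployment with many servers becomes a compact grid in which hot spots stand out by color.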
A stacked graph is an effective alternative for displaying many servers on one graph where the activity of each one contributes to a total or sum. Each color band shows the requests/sec for one server, and the color bands are stacked on top of one another in a way that allows you to read the requests/sec served by the application at the top. Stack graphs give a thorough overview of what is transpiring in sum:
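The stacking described above amounts to a running sum across servers at each point in time. The following Python sketch computes the lower and upper edge of each server's band; the server names and request rates are invented for the example, and `stack_bands` is a hypothetical helper, not part of any RightScale API.

```python
def stack_bands(series_by_server):
    """Compute stacked band edges for per-server time series.

    Returns (bands, totals), where bands[name] is a list of
    (lower, upper) pairs per time step and totals is the top
    edge of the stack, i.e. the sum across all servers.
    """
    names = list(series_by_server)
    length = len(series_by_server[names[0]]) if names else 0
    if any(len(series_by_server[name]) != length for name in names):
        raise ValueError("all series must have the same number of samples")

    baseline = [0.0] * length
    bands = {}
    for name in names:
        values = series_by_server[name]
        upper = [base + value for base, value in zip(baseline, values)]
        bands[name] = list(zip(baseline, upper))
        baseline = upper  # the next server is drawn on top of this one
    return bands, baseline


# Hypothetical requests/sec for three front end servers.
requests_per_sec = {
    "fe-1": [10, 12, 9],
    "fe-2": [8, 11, 10],
    "fe-3": [5, 5, 6],
}

bands, totals = stack_bands(requests_per_sec)
```

The thickness of a band at any time step is that server's own rate, while `totals` corresponds to the value read off the top of the stack.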
Users can add a Cluster Monitoring Widget to the Dashboard in two ways: either through the Dashboard or from the Monitoring tab of a Deployment. Cluster Monitoring Widgets help users view the health of their Deployments directly from the Dashboard Overview tab.
You are prompted to specify a Deployment, at which point you are presented with the fields that define your Cluster Monitoring Widget.
Click Save.
Your Cluster Monitoring Widget will be editable directly in the Dashboard.
If a Cluster Monitor does not already exist for a Deployment, you must navigate to a Deployment with running Servers in order to enable Cluster Monitoring.
Filter by ServerTemplate, Server, or Tag from the dropdown menu. You can also click Apply without choosing a filter to get all of the Servers in your Deployment to be included in the Cluster Monitor.
Your Cluster Monitor graph is added as a Widget to the Dashboard. You can then edit your Cluster Monitoring Widget directly in the Dashboard.
Be aware of the following parameters and limitations surrounding Cluster Monitoring when working with this feature:
For example, select Server Name from the drop-down menu.
As discussed earlier, you may want to fine tune your filter, look at several Representative Servers, and/or zoom in on graphs. Along the way, you can save various graphs for further analysis.
Sometimes you may want to drill down to a specific server when using Cluster Monitoring. This is useful when:
© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.