The initial alpha release of stacked graphs and heat maps is best described on the RightScale Blog. This initial release has a number of limitations described here.
All paid accounts have access to the cluster monitoring that displays one graph per server. This is accessed by choosing the "add cluster monitor" link on a deployment monitoring tab. The stacked graphs and heat maps are enabled upon request. When enabled, a color icon is displayed in the top-right of each cluster monitor to switch between the default individual graphs, a stacked graph, and a heat map.
Server selection works exactly as it does in the existing cluster monitoring, which uses individual per-server graphs. When switching between graph types, the set of servers remains unchanged. One side effect is that if the set includes servers in unsupported clouds or regions, the data for those servers will not be displayed (see the list of supported clouds below).
Generally, all graph types that display one or two variables can be displayed as stacked graphs or heat maps. Examples are cpu_idle, bytes_in & bytes_out, and requests/sec. A small number of such graphs use special-case calculations internally and cannot be shown as stacked graphs or heat maps; the df (disk usage) graphs are examples. Overview graphs that show several variables on one graph, such as apache scoreboard and mysql reads, also cannot be shown as stacked graphs or heat maps. The cpu_overview graph is the one exception: its stacked graph and heat map show "cpu busy", calculated as "100% - cpu_idle - cpu_steal". In all cases where a stacked graph or heat map cannot be displayed, an error message will appear.
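The "cpu busy" calculation for the cpu_overview graph can be illustrated with a minimal sketch. The function name cpu_busy and the plain-list representation of a data series are assumptions made for illustration; only the formula itself comes from the description above.

```python
def cpu_busy(cpu_idle: float, cpu_steal: float) -> float:
    """Return "cpu busy" in percent, given idle and steal percentages."""
    # Formula from the text: 100% - cpu_idle - cpu_steal
    return 100.0 - cpu_idle - cpu_steal


def cpu_busy_series(idle_series, steal_series):
    """Apply the formula sample by sample to two equally long series."""
    return [cpu_busy(idle, steal) for idle, steal in zip(idle_series, steal_series)]


# Example: a server that is 70% idle with 5% steal is shown as 25% busy.
busy = cpu_busy(70.0, 5.0)
```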
This release only supports servers running in EC2 US-EAST-1, RackSpace, and private clouds (Cloud.com / Eucalyptus). Servers running in other clouds / regions will appear in the legend but show no data.
Data is currently fetched serially, so it takes a few seconds to produce a graph, especially if many servers are shown. This is generally not a problem, but it can become an issue if one of the data storage servers is down, in which case delays of tens of seconds are possible.
Because of the serial fetching, cluster monitoring is limited to 100 servers (as it has always been). If the pattern matches more than 100 servers, a sample is shown instead. The sample includes the oldest running servers, the most recently launched servers, and randomly selected servers from each ServerTemplate.
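The sampling described above could be sketched roughly as follows. This is not the actual implementation: the Server record, the sample_servers function, the number of oldest and newest servers kept (n_oldest, n_newest), and the round-robin draw across ServerTemplates are all assumptions chosen for illustration. The text only states which kinds of servers end up in the sample.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical server record used only for this sketch."""
    name: str
    launched_at: float      # launch time as a Unix timestamp
    server_template: str


def sample_servers(servers, limit=100, n_oldest=10, n_newest=10, seed=None):
    """Pick at most `limit` servers: oldest, newest, then random per template."""
    if n_oldest + n_newest > limit:
        raise ValueError("n_oldest + n_newest must not exceed limit")
    if len(servers) <= limit:
        return list(servers)    # no sampling needed

    rng = random.Random(seed)
    by_age = sorted(servers, key=lambda s: s.launched_at)
    # Keep the oldest running and the most recently launched servers.
    chosen = by_age[:n_oldest] + by_age[len(by_age) - n_newest:]
    picked = set(chosen)

    # Group the remaining servers by ServerTemplate and shuffle each group.
    groups = {}
    for server in by_age:
        if server not in picked:
            groups.setdefault(server.server_template, []).append(server)
    for group in groups.values():
        rng.shuffle(group)

    # Draw round-robin across templates so each one is represented (assumption).
    while len(chosen) < limit and any(groups.values()):
        for group in groups.values():
            if group and len(chosen) < limit:
                chosen.append(group.pop())
    return chosen
```

Drawing round-robin across templates is one simple way to keep small ServerTemplates visible in the sample; a weighted draw would be an equally plausible reading of the description.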
The data series for each server is clipped at the 90th percentile so that data spikes, which are often erroneous, do not push the scale out of a reasonable range. This currently cannot be disabled. The exact calculation is as follows: a max data series is calculated across all servers, the 90th percentile value of that series is calculated, 10% is added, and all data values are then clipped at the resulting value when they are displayed.
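The clipping steps can be sketched as follows. This is an illustration under stated assumptions, not the actual code: the function name clip_series, the dict-of-lists input, and the nearest-rank percentile method are all hypothetical, and the text does not say which percentile definition is used.

```python
import math


def clip_series(series_by_server, percentile=90.0, headroom=0.10):
    """Clip every server's series at (90th percentile of the max series) + 10%.

    `series_by_server` maps a server name to a list of samples; all lists
    are assumed to have the same length and time alignment.
    """
    if not series_by_server:
        raise ValueError("at least one data series is required")

    # Step 1: max data series across all servers (pointwise maximum).
    max_series = [max(samples) for samples in zip(*series_by_server.values())]
    if not max_series:
        raise ValueError("data series must not be empty")

    # Step 2: 90th percentile of the max series (nearest-rank, an assumption).
    ordered = sorted(max_series)
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    p_value = ordered[rank - 1]

    # Step 3: add 10% headroom.
    limit = p_value * (1.0 + headroom)

    # Step 4: clip all values at that limit for display.
    clipped = {
        name: [min(value, limit) for value in samples]
        for name, samples in series_by_server.items()
    }
    return limit, clipped


# Example: the spike of 100 on server "a" is clipped to roughly 9.9.
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "b": [2, 1, 3, 4, 5, 6, 7, 8, 9, 10],
}
limit, clipped = clip_series(data)
```

Note that with this reading the limit is shared by all servers, so a single spike on one server does not rescale the graph, but sustained high values on any server raise the limit for everyone.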
The biggest limitation of the current stacked graphs and heat maps is the legend. A legend is shown for the larger graph types. For stacked graphs, the servers are listed from the bottom of the stack to the top. The color swatch can help identify a server in the graph, but this is often very difficult. We plan to change the legend so that clicking a server highlights its portion of the graph. For heat maps, the servers are listed in the legend from top to bottom. The numbers correspond to the Y axis, and the color spectrum changes every 10 servers to help with counting. Identifying servers is possible, if not very pleasant. We plan to switch to a roll-over legend.
Feedback on these graphs is appreciated. Please email tve at rightscale directly. Thanks!
© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.