Note: Please go to docs.rightscale.com to access the current RightScale documentation set.

Response to a Server Failure

Disclaimer: The following software upgrade strategies provide general guidelines and principles. Please use discretion at all times to develop your own software upgrade strategies that are sufficient for your application and environment.

Overview

When you experience a server failure in the cloud, always "roll forward" by replacing the problematic server instead of trying to fix it in place. The priority is to get the production site back to a stable state. Later you can troubleshoot the bad server and diagnose the cause of the problem without having to worry about site downtime.

Steps

For example, let's say one of your two front-end servers ("Front End-1") becomes problematic and stops serving traffic. All of a sudden your site is running at 50% capacity. How do you resolve this problem?

[Diagram: diag-ServerFailure2-v1.png]

  1. Clone the operational front-end server ("Front End-2"). Rename the clone "Front End-3".
  2. Assign the new server the Elastic IP of the down server (EIP-1) so that it inherits the EIP at boot time.  Launch the "Front End-3" server.

    [Diagram: diag-ServerFailure3-v1.png]

  3. As a safety precaution, disconnect the failed "Front End-1" server from the load balancer (if it's still connected) by running the "LB app to HA proxy disconnect" RightScript (exact name may vary) on "Front End-1". You do not want "Front End-2" to continue sending requests to a bad server.
  4. Connect the "Front End-2" and "Front End-3" servers to each other by running the "LB Application to HAProxy connect" RightScript (exact name may vary) on "Front End-3" so that traffic is properly load balanced across both front-end servers.
  5. Once the site is stable, you can go back to the "Front End-1" server and properly diagnose the problem. It might be useful to keep the bad server up and running to perform various tests. Use the monitoring features and audit entries to help you diagnose the cause of the problem. Once you've finished your diagnosis, shut down the server.

The key advantage of the cloud and RightScale is that you can dramatically reduce site downtime in the event of a server failure. Instead of spending time trying to fix a problematic server, you can quickly replace it and then troubleshoot the problem at a more convenient time.

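The roll-forward sequence in the steps above can be sketched as a small in-memory model. This is only an illustration of the ordering, not RightScale's API: the `Server` class, the `roll_forward` function, the `lb_pool` field, and the "EIP-2" address are all hypothetical names introduced here. The sketch assumes "Front End-1" is the failed server and "Front End-2" the operational one, as steps 3 to 5 indicate, and it models the Elastic IP as moving to the replacement when that server launches.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Set


@dataclass
class Server:
    """Hypothetical stand-in for a server in a deployment (not a RightScale type)."""
    name: str
    elastic_ip: Optional[str] = None
    running: bool = False
    healthy: bool = True
    # Names of the front-end servers this server load balances across.
    lb_pool: Set[str] = field(default_factory=set)


def roll_forward(servers: Dict[str, Server], failed: str,
                 operational: str, new_name: str) -> Server:
    """Replace a failed front end by cloning an operational one."""
    bad, good = servers[failed], servers[operational]

    # Step 1: clone the operational server and rename the clone.
    new = replace(good, name=new_name, elastic_ip=None,
                  running=False, lb_pool=set())

    # Step 2: assign the failed server's Elastic IP to the clone, then launch.
    # Assumption: the address moves to the clone when it boots.
    new.elastic_ip, bad.elastic_ip = bad.elastic_ip, None
    new.running = True
    servers[new_name] = new

    # Step 3: remove the failed server from every load-balancing pool.
    for server in servers.values():
        server.lb_pool.discard(failed)

    # Step 4: connect the surviving server and the replacement to each other.
    for server in (good, new):
        server.lb_pool.update({operational, new_name})

    # Step 5 is manual: the failed server is left running for diagnosis.
    return new


both = {"Front End-1", "Front End-2"}
servers = {
    "Front End-1": Server("Front End-1", "EIP-1", running=True,
                          healthy=False, lb_pool=set(both)),
    # "EIP-2" is an assumed address for the second front end.
    "Front End-2": Server("Front End-2", "EIP-2", running=True,
                          lb_pool=set(both)),
}
new = roll_forward(servers, failed="Front End-1",
                   operational="Front End-2", new_name="Front End-3")
print(new.name, new.elastic_ip, sorted(new.lb_pool))
```

Note that the failed server is never terminated inside `roll_forward`; it only loses its Elastic IP and its place in the load-balancing pools, which matches the advice to keep it running until the diagnosis is finished.
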
Last modified
22:59, 16 May 2013

© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.