How to set up an Intermediate Failover Architecture

Follow the instructions below to set up an Intermediate Failover Architecture as described in Best Practices for using Elastic IPs and Availability Zones.  In this setup, the production deployment runs instances in more than one availability zone, with three front ends that use Elastic IPs.  If the primary availability zone (us-east-1a) fails, you will still have a running front end instance that serves content from the secondary availability zone (us-east-1b).  To recover, promote the Slave-DB to "Master" and launch your backup instances in a different availability zone (us-east-1c).  Performance will suffer in proportion to the lost capacity during an availability zone failure, but the site will remain operational until you can launch additional front ends and a new Slave-DB.

Remember, if you have instances running in multiple availability zones, you will have to pay zone-to-zone transfer fees.  Data transferred between instances across different availability zones on EC2 costs $0.01 per GB.
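To get a feel for the zone-to-zone fee, the arithmetic can be sketched in a few lines of Python.  The rate is the one quoted above; the 50 GB per day workload is a made-up figure for illustration, and you should check current AWS pricing before relying on the result.

```python
# Rate quoted in the text above; an assumption for any other date or region.
CROSS_ZONE_RATE_USD_PER_GB = 0.01


def cross_zone_transfer_cost(gb_transferred, rate=CROSS_ZONE_RATE_USD_PER_GB):
    """Return the estimated charge in USD for data moved between zones."""
    if gb_transferred < 0:
        raise ValueError("gb_transferred must not be negative")
    return gb_transferred * rate


# Hypothetical workload: 50 GB of cross-zone traffic per day for 30 days.
monthly_gb = 50 * 30
print(f"Estimated monthly cross-zone cost: ${cross_zone_transfer_cost(monthly_gb):.2f}")
```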

In this example, we will use DNS with multiple A records.  One A record is created for each EIP, and all of the records share the same DNS name.  The DNS server rotates the order of the A records it returns.  See the Advanced Failover Architecture example for a setup that uses HAProxy, which offers better load balancing performance.
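The rotation of A records can be illustrated with a small simulation.  This is only a sketch of the round-robin idea, not how any particular DNS server is implemented; the class name, the host name www.example.com, and the 203.0.113.x addresses (a range reserved for documentation) are placeholders standing in for your own DNS name and Elastic IPs.

```python
from collections import deque


class RoundRobinRecordSet:
    """Simulates one DNS name with several A records that are rotated per query."""

    def __init__(self, name, addresses):
        if not addresses:
            raise ValueError("at least one A record is required")
        self.name = name
        self._records = deque(addresses)

    def resolve(self):
        """Return the current record order, then rotate it for the next query."""
        answer = list(self._records)
        self._records.rotate(-1)  # move the first record to the end
        return answer


# Placeholder Elastic IPs, one per front end.
record_set = RoundRobinRecordSet(
    "www.example.com", ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
)

# Many clients simply use the first address in the answer.
for query in range(4):
    print(query, record_set.resolve()[0])
```

Because each query sees a different first address, requests are spread across the three front ends.  The sketch does not remove a failed address from the rotation, which is why the recipes below reassign the Elastic IPs of failed servers to new instances.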

01-intermediate_prod_deployment_setup.gif

 

Setup Instructions 

Production Deployment 

If you follow the diagram above as an example, your production deployment should look like the following screenshot.  Notice that each front end uses an Elastic IP and that the Slave-DB and FrontEnd-3 are in a different availability zone from the other servers.  Make sure that you select the correct Availability Zone for each server.

WARNING! The current default for a server's Availability Zone is "-any-."
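A quick check like the one below can catch a production server that was left at the default.  It is a sketch that works on a plain dictionary you fill in by hand; it does not call the RightScale or AWS APIs.  The names www-1 and www-2 and their zone assignments are assumptions based on the diagram described above.

```python
ANY_ZONE = "-any-"  # the default value named in the warning above


def servers_without_explicit_zone(servers):
    """Return the names of servers whose zone is unset or left at the default."""
    return [name for name, zone in servers.items() if zone in (None, "", ANY_ZONE)]


# Assumed layout of the production servers (not the backups).
production = {
    "www-1": "us-east-1a",
    "www-2": "us-east-1a",
    "www-3": "us-east-1b",
    "Master-DB": "us-east-1a",
    "Slave-DB": "-any-",  # left at the default by mistake
}

print(servers_without_explicit_zone(production))
```

Run the check only against production servers; backup instances are intentionally left without a predefined zone.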

01-intermediate_prod_deployment.gif

 

Now that you have an operational production deployment distributed across multiple availability zones, you should create backup instances that are ready to be launched in a failure scenario.  It's important to be proactive when designing your deployment for quick failover and recovery.  Remember, since you only have to pay for instances that are launched, it's a good idea to create "backups" for each instance that are properly configured and ready to be launched in a failure scenario. 

For an intermediate setup like the diagram above, we recommend creating backups of each type of instance that are ready to be launched with the appropriate Elastic IP and into the appropriate availability zone.  Once your deployment is running, simply clone each type of instance.  In this example, you should have the following two backups:

  • Front end server (www-4)
  • Slave-DB instance

Once you add your backups, your deployment should look like the screenshot below.

WARNING! You should not use the "start all" link once you've added backup instances.

01-intermediate_bkup_deployment3.gif

Notice that the backups do not have Elastic IPs or Availability Zones predefined because you don't know where you will launch these instances when problems occur.

You will not have a backup for the Master-DB because you have a redundant MySQL setup.  If the Master-DB ever fails, simply promote the Slave-DB to Master-DB and launch a new Slave-DB instance.  By configuring these instances ahead of time, you'll be "ready to launch" when problems occur.

 

Failover Scenarios

Failure in Primary Availability Zone (us-east-1a)

In this example, the us-east-1a Availability Zone stops providing service, but your site is still up because a front end (www-3) is running in the secondary zone.  By designing your deployment according to Best Practices, you've created a configuration that tolerates the failure of one zone.
 

eip_fail_1.gif

Recipe

The first goal is to get back to full performance; the second is to restore tolerance of a zone failure.  During this recipe, service is never interrupted.

  1. Promote the Slave-DB to Master-DB.
  2. Launch a new front end server (www-4) in the same zone as the existing server (www-3) and assign it one of the Elastic IPs that were used by the terminated servers.
    See Assign an Elastic IP at Launch.
  3. Launch a new front end server (www-5) in a different availability zone (us-east-1c) and assign it the other Elastic IP that was used by the terminated servers.  (WARNING: You will be charged for cross-zone data transfer.)
    See Assign an Elastic IP at Launch.
  4. Launch a new Slave-DB instance in a new availability zone (us-east-1c).  Use operational scripts to attach the new Slave-DB to the current Master-DB to restart redundancy and replication.
  5. After a suitable delay, take a backup of the Master-DB and examine the backup.
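The recipe above can be sketched as a small planning function that turns a description of the deployment into an ordered list of steps.  This is an illustration only: the Server class, the function name, and the 203.0.113.x addresses are hypothetical, and nothing here calls the RightScale dashboard or the EC2 API.  You would still carry out each step by hand or with your own operational scripts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    role: str  # "frontend", "master-db", or "slave-db"
    zone: str
    elastic_ip: Optional[str] = None


def plan_primary_zone_recovery(servers: List[Server], failed_zone: str,
                               surviving_zone: str, spare_zone: str) -> List[str]:
    """Return the ordered recovery steps after the primary zone fails."""
    steps = []
    # Step 1: the Master-DB was in the failed zone, so promote the slave.
    if any(s.role == "master-db" and s.zone == failed_zone for s in servers):
        steps.append("Promote Slave-DB to Master-DB")
    # Steps 2-3: reuse the Elastic IPs freed by the failed front ends.
    freed_ips = [s.elastic_ip for s in servers
                 if s.role == "frontend" and s.zone == failed_zone and s.elastic_ip]
    # The first replacement joins the surviving zone; the rest go to the spare zone.
    target_zones = [surviving_zone] + [spare_zone] * (len(freed_ips) - 1)
    for ip, zone in zip(freed_ips, target_zones):
        steps.append(f"Launch front end in {zone} with Elastic IP {ip}")
    # Steps 4-5: restore database redundancy, then back up.
    steps.append(f"Launch new Slave-DB in {spare_zone} and attach it to the new Master-DB")
    steps.append("After a suitable delay, back up the Master-DB and examine the backup")
    return steps


deployment = [
    Server("www-1", "frontend", "us-east-1a", "203.0.113.10"),
    Server("www-2", "frontend", "us-east-1a", "203.0.113.11"),
    Server("www-3", "frontend", "us-east-1b", "203.0.113.12"),
    Server("Master-DB", "master-db", "us-east-1a"),
    Server("Slave-DB", "slave-db", "us-east-1b"),
]

for number, step in enumerate(
        plan_primary_zone_recovery(deployment, "us-east-1a", "us-east-1b", "us-east-1c"), 1):
    print(f"{number}. {step}")
```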

 

eip_fail_2.gif

Once again you have a production deployment distributed across multiple availability zones.  Remember, performance will drop proportionately if a zone stops, but the important thing is that your service will continue to run.

 

Failure in Secondary Availability Zone (us-east-1b)

 

eip_fail_3.gif

 

Recipe

In this example, only the redundant zone (us-east-1b) has stopped providing service.  In this recipe we will replace the failed servers to reestablish high reliability.  During this recipe, service is never interrupted.

  1. Launch a new front end server in a different availability zone (us-east-1c) using the Elastic IP that was used by the terminated front end server in us-east-1b.  See Assign an Elastic IP at Launch.
  2. Launch a new Slave-DB instance in the same availability zone as the new front end (us-east-1c). Use operational scripts to attach the new Slave-DB to the current Master-DB to restart redundancy and replication.
  3. After a suitable delay, take a backup of the Master-DB and examine the backup.

 

 

eip_fail_4.gif

 

Once again you have a production deployment distributed across multiple availability zones.  Remember, performance will drop proportionately if a zone stops, but the important thing is that your service will continue to run.

Last modified: 21:32, 16 May 2013

© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.