As cloud computing continues to evolve, many questions arise about the best way to design a highly reliable site on the cloud with failover and recovery. As a system administrator, you must always plan for failure. This document explains how to create affordable and reliable sites on EC2 using Elastic IPs (EIPs) and availability zones.
Elastic IPs and availability zones are two key tools for creating stable architectures. An Elastic IP lets you allocate an IP address and assign it to an instance of your choice. Elastic IPs are dynamically remappable, which makes servers in the cloud easier to manage because each address can be reassigned to a different instance when needed. An availability zone is essentially an isolated data center location within an EC2 region. In the Dashboard, you can specify an EIP and an availability zone when you launch a server on EC2, which lets you build anything from basic to complex site architectures that satisfy your site's cost and reliability requirements.
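The remapping behavior described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the EC2 or RightScale API: the class name `ElasticIPTable`, the instance names, and the address (taken from a documentation-only range) are all hypothetical.

```python
class ElasticIPTable:
    """Toy in-memory model of Elastic IP assignments (not the EC2 API)."""

    def __init__(self):
        # Maps each allocated address to an instance ID, or None if unassigned.
        self._owner = {}

    def allocate(self, address):
        # Reserve an address; it starts out unassigned.
        if address in self._owner:
            raise ValueError(f"{address} is already allocated")
        self._owner[address] = None

    def associate(self, address, instance_id):
        # Point the address at an instance and return the previous target.
        if address not in self._owner:
            raise KeyError(f"{address} has not been allocated")
        previous = self._owner[address]
        self._owner[address] = instance_id
        return previous

    def lookup(self, address):
        return self._owner[address]


eips = ElasticIPTable()
eips.allocate("203.0.113.10")                 # hypothetical address
eips.associate("203.0.113.10", "frontend-1")  # initial assignment
replaced = eips.associate("203.0.113.10", "frontend-2")  # remap after a failure
```

The point of the model is that clients keep using the same address; only the instance behind it changes.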
There are several factors to consider when designing your site's architecture.
The answers to these questions will help determine how you design your site. If you pick the wrong design, don't worry: you can change architectures at any time. That's the beauty of cloud computing! You are no longer limited by your physical hardware, and you only pay for what you use.
The table below highlights the key differences between three sample architectures.
|Architecture|Cost|Downtime|Setup|
|---|---|---|---|
|Basic|$|5-10 min|EIPs with 1 availability zone and a backup deployment (ready to launch)|
|Intermediate|$$|Minimal|EIPs with 2 availability zones (a few instances in zone 2)|
|Advanced|$$$|None|EIPs with 2 availability zones (duplicate setup in zone 2)|
Remember, you only pay for what you choose to run. There are several ways to design a highly reliable site; which one fits depends on how much you are willing to pay for a given level of failover and reliability.
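As a rough illustration of that trade-off, the sketch below picks the cheapest of the three sample architectures that still meets a downtime requirement. The numeric ranks are an assumed encoding of the table's "$" symbols and downtime labels, and the function name is hypothetical.

```python
# Rows from the comparison table. The ranks are an illustrative encoding:
# cost rank counts the "$" symbols; downtime rank is lower for less downtime.
ARCHITECTURES = [
    # (name, cost rank, downtime label, downtime rank)
    ("Basic", 1, "5-10 min", 2),
    ("Intermediate", 2, "Minimal", 1),
    ("Advanced", 3, "None", 0),
]


def cheapest_architecture(max_downtime_rank):
    """Return the lowest-cost architecture whose downtime rank is acceptable."""
    candidates = [a for a in ARCHITECTURES if a[3] <= max_downtime_rank]
    if not candidates:
        raise ValueError("no architecture meets that downtime requirement")
    return min(candidates, key=lambda a: a[1])[0]


choice = cheapest_architecture(1)  # tolerate at most "Minimal" downtime
```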
In the basic architecture, the production deployment runs in a primary availability zone. A backup "clone" deployment is ready to be launched in a different availability zone if the primary zone fails. This is the most affordable failover option because defining a deployment is completely free; you pay only for instances once they are launched. Simply create a clone of your production deployment and save it as a backup deployment. If the primary zone ever fails, all you have to do is manually launch the backup deployment. The estimated downtime is about 5-10 minutes after you launch the backup deployment, plus any additional time needed to load the database. When launching the backup deployment, make sure that the appropriate EIPs are associated with the new frontend instances.
In the example diagram below, the production deployment is live and running, while the backup (clone) deployment is already configured in the Dashboard and ready to be launched in a different availability zone if the primary zone fails and your production deployment disappears. The Master and Slave databases will be restored from the same backup file so that they are identical copies. However, because database backups are saved to S3 every 10 minutes, you may lose up to 10 minutes of data. If you cannot afford to lose data, consider a more advanced failover architecture. Notice that the same EIPs are used, but they now point to the new instances.
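The figures quoted for the basic architecture can be combined into a quick estimate. The sketch below assumes the 10-minute backup interval and the 5-10 minute launch time from the text; the function name and the 15-minute database load time are hypothetical, and the time it takes to notice the failure is not included.

```python
def basic_failover_estimate(backup_interval_min=10, launch_min=(5, 10), db_load_min=0):
    """Return (worst-case data loss, (min downtime, max downtime)) in minutes.

    Downtime is counted from the moment the backup deployment is launched.
    """
    # At worst, the failure happens just before the next backup is taken.
    worst_case_data_loss = backup_interval_min
    downtime_low = launch_min[0] + db_load_min
    downtime_high = launch_min[1] + db_load_min
    return worst_case_data_loss, (downtime_low, downtime_high)


# Hypothetical case: restoring the database takes an extra 15 minutes.
loss, downtime = basic_failover_estimate(db_load_min=15)
```

With these inputs, the estimate is up to 10 minutes of lost data and 20-25 minutes of downtime after launch.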
If your production deployment cannot afford a 10-minute downtime and you're willing to invest more for added redundancy, you can set up a slightly diversified deployment across two availability zones (the intermediate architecture). In this example, we've placed the Slave-DB in a different availability zone. This may be an ideal setup if you do not transfer much data between your application and database, and you're willing to pay for the extra instances and usage. Remember, data transferred between instances in different availability zones costs $0.01 per GB.
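To get a feel for the cross-zone transfer charge, a back-of-the-envelope calculation is enough. The sketch below uses the $0.01 per GB rate quoted above; the function name and the 50 GB per day traffic volume are hypothetical.

```python
CROSS_AZ_RATE_USD_PER_GB = 0.01  # rate quoted in the text


def cross_zone_transfer_cost(gb_per_day, days=30, rate=CROSS_AZ_RATE_USD_PER_GB):
    """Estimate the cost in USD of data moved between availability zones."""
    return round(gb_per_day * days * rate, 2)


# Hypothetical workload: 50 GB per day replicated to the second zone.
monthly = cross_zone_transfer_cost(50)
```

For that workload the estimate comes to $15 over 30 days, which is usually small next to the cost of the extra instances.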
If your production deployment requires the highest level of reliability, with as close to 100% uptime as possible, you can design an extremely reliable architecture, but you must be willing to pay for the extra redundancy. In this setup, any single instance can be deleted at any time and the site will continue to operate normally without any responsive action. If an availability zone fails completely, you will still have a fully functional site running in a different availability zone. However, the cost of running this type of deployment on EC2 doubles, and you also pay for data transfer between availability zones. From a performance perspective, your users might experience slightly slower performance because data is sent across availability zones, but you'll have an extremely reliable architecture on the cloud.
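The two guarantees described here (losing any one instance, or any one zone, leaves the site running) can be checked mechanically against a deployment layout. The sketch below is a simplified model: the instance names, tier names, and the us-east-1a zone are hypothetical, and a tier counts as "covered" if at least one of its instances survives, so a surviving Slave-DB is treated as enough to keep the database tier available.

```python
# Hypothetical advanced deployment: (instance name, tier, availability zone)
DEPLOYMENT = [
    ("lb-1", "load_balancer", "us-east-1a"),
    ("lb-2", "load_balancer", "us-east-1b"),
    ("app-1", "app", "us-east-1a"),
    ("app-2", "app", "us-east-1b"),
    ("master-db", "database", "us-east-1a"),
    ("slave-db", "database", "us-east-1b"),
]


def tiers_covered(instances):
    """Return the set of tiers that still have at least one instance."""
    return {tier for _, tier, _ in instances}


def survives_zone_loss(deployment):
    """True if every tier keeps an instance after any single zone fails."""
    all_tiers = tiers_covered(deployment)
    zones = {zone for _, _, zone in deployment}
    return all(
        tiers_covered([i for i in deployment if i[2] != failed]) == all_tiers
        for failed in zones
    )


def survives_instance_loss(deployment):
    """True if every tier keeps an instance after any single instance is lost."""
    all_tiers = tiers_covered(deployment)
    return all(
        tiers_covered([i for i in deployment if i[0] != name]) == all_tiers
        for name, _, _ in deployment
    )


zone_ok = survives_zone_loss(DEPLOYMENT)
instance_ok = survives_instance_loss(DEPLOYMENT)
```

Running the same checks on a layout where only the Slave-DB sits in the second zone shows why that design tolerates less: a zone failure there leaves no load balancer or app server.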
If you are using an advanced deployment architecture and the availability zone that contains your Master-DB fails, all incoming requests are automatically rerouted to load balancer-2, and content is served from the Slave-DB in us-east-1b. Visitors to your site might experience slower performance, but your site should remain fully functional. When you are ready, you can promote your Slave-DB to Master-DB and then launch a new load balancer, app servers, and a Slave-DB in a new availability zone. Notice that the EIP that was previously pointing to load balancer-1 (126.96.36.199) is now pointing to load balancer-3 in us-east-1c. When an EIP is reassigned, it takes approximately 3 minutes for the address to be transferred to the new instance. The "WaitFor" feature inside our Run RightScripts ensures that the EIP is fully settled before the instance receives any requests.
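The idea of waiting for a reassigned EIP to settle can be sketched as a simple polling loop. This is not the implementation of the "WaitFor" feature, which the text does not describe; the function name `wait_for_eip`, its parameters, and the stubbed check below are all hypothetical. In practice, `is_settled` would be whatever check confirms that the address now reaches the new instance.

```python
import time


def wait_for_eip(is_settled, timeout_s=300, interval_s=10,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll is_settled() until it returns True or timeout_s elapses.

    Returns True if the address settled in time, False on timeout.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_settled():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stubbed demonstration: the check succeeds on the third poll, and sleeping
# is replaced with a no-op so the example finishes immediately.
polls = {"count": 0}


def fake_check():
    polls["count"] += 1
    return polls["count"] >= 3


settled = wait_for_eip(fake_check, sleep=lambda seconds: None)
```

A 300-second default timeout leaves some margin over the roughly 3 minutes quoted above; both values are adjustable.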
© 2006-2014 RightScale, Inc. All rights reserved.