Note: Please go to docs.rightscale.com to access the current RightScale documentation set. Also, feel free to Chat with us!
Home > ServerTemplates > Archive > Pre-11H1 > MySQL-S3 > Database Manager for MySQL-S3 Runbook > MySQL-S3 Problem Scenarios > Both the Master and Slave DBs are corrupted

Both the Master and Slave DBs are corrupted

There are two main data corruption cases that can occur in the master DB.

  1. Data corruption due to the modification of the data stored in disk, not due to SQL queries. For example, data mutations due to disk malfunctions, filesystem errors, etc.
  2. Data corruption due to modifications of the DB caused by an unwanted or incorrect SQL statements. For example, a bug in our application might "destroy/corrupt" the contents of our DB.

Verification: how do I know my system is in this situation?

If you think that your DB is corrupted you probably have seen something strange that you understand.  If it's caused by disk or filesystem corruption, you'll probably see error messages in the instance log...(i.e., syslogrelated or even mysql-server related). If the corruption is due to case 2, you'll probably see that there is data missing, or perhaps application errors complaining about missing relationships, records not found or failed transactions.

Case B - Data on both master and slave DB is corrupted

It is possible that the data corruption happens at the database contents level rather than being due to errors at a lower level (i.e, disk, filesystem...). This is commonly due to errors or bugs in the application, which might cause things like inserting incorrect records into the DB, deleting valuable ones or disconnecting table relationships. Since these changes are valid from an SQL point of view, they will be propagated to the slave instances immediately, therefore both master and slave databases will most likely be corrupted in exactly the same way.

The way to recover from this situation is to restore the latest backup taken before such corruption took place. The recovery mechanism for this case is somehow similar to the situation where There are no running DB instances (master or slave)! with the exception that we have to be more careful about which backup file to restore from...since restoring from the latest backup might not be a good solution if the backup was taken after the corruption already existed.

Required actions: What recipes should I apply to solve the situation?

Overall Strategy: Existing database instances should be discarded and a new master-slave cluster should
be started. Basically we're disabling the current MySQL instances and starting from scratch.

  1. Disable automatic backups from existing databases. See Disable or Enable Continuous Backups.
  2. Disable existing databases to prevent additional application errors. Use an SSH console and login to the instance(s) and execute the "service mysqld stop" action from the command prompt.
  3. Launch a new master DB using the Bootstrap MySQL server template. See Launch a new Master-DB.
  4. Check out the restored database, and iterate (restoring older backups) until backup is not corrupted.  See Restore a previous DB backup.
  5. If necessary, re-point or restart your application to use the new master (same dns name but different IP).
  6. Launch a brand new slave DB using the Additional MySQL server template.  See Launch a new Slave-DB.
  7. Once the instance is operational, Initialize Slave-DB from Master-DB.
  8. Analyze old master/slave databases to determine the root cause of failure and then terminate the instance(s).


NOTE: To locate the latest non-corrupted backup file, you must choose from the available binary backup files in your S3 directory. The bucket and path for these are specified by the BACKUPFILE_PREFIX and BACKUP_S3_BUCKET input variables in our MySQL templates (viewable by locating either master or slave MySQL templates, and then clicking on the 'show' link next to input variables).  It's important to know when the corruption occurred in order to ensure that you are restoring the most recent non-corrupted backup.  

You must to post a comment.
Last modified
21:34, 16 May 2013

Tags

Classifications

This page has no classifications.

Announcements

None


© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.