
Apache Hadoop Cluster Setup

Prerequisites

Steps

Credentials

Create the following credentials:

  • PUBLIC_SSH_KEY - create a credential using your public SSH key as the value.
  • PRIVATE_SSH_KEY - create a credential using your private SSH key as the value.
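If you do not already have a keypair, you can generate one locally and paste each half into the corresponding credential. A minimal sketch (the file path and comment are arbitrary choices, not required by the ServerTemplate):

```shell
# Generate a 4096-bit RSA keypair with no passphrase (path is arbitrary).
ssh-keygen -t rsa -b 4096 -N "" -C "hadoop-cluster" -f ./hadoop_cluster_key -q

# Value for the PRIVATE_SSH_KEY credential:
cat ./hadoop_cluster_key
# Value for the PUBLIC_SSH_KEY credential:
cat ./hadoop_cluster_key.pub
```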

Import and Clone the ServerTemplate

  1. Import the Apache Hadoop ServerTemplate from the MultiCloud Marketplace.
  2. Clone the ServerTemplate to make an editable copy.
  3. Commit the revision.
  4. (optional) If you plan to make backups of the data, you should import the Storage Toolbox.

Launch the NameNode

  1. Use the committed, cloned ServerTemplate to add a server to your deployment.
  2. Click Launch and configure the following inputs at the deployment level:

     • Hadoop node type - The type of server being launched: either a namenode (master) or a datanode (slave). Recommended value: text:namenode
     • Hadoop namenode dfs.replication property - Sets the namenode's dfs.replication property. See the Hadoop documentation for more information about this property. Recommended value: text:3
     • Namenode firewall port - The firewall port to open for filesystem metadata operations. Recommended value: text:8020
     • Namenode http firewall port - The firewall port to open for namenode HTTP connections. Recommended value: text:50070
     • Public SSH Key - The public key installed on each datanode to allow namenode connections. This must be the public half of the private key below. Recommended value: cred:PUBLIC_SSH_KEY
     • Private SSH Key - The private SSH key that allows the namenode to connect to the datanodes. This must be the private half of the public key above. Recommended value: cred:PRIVATE_SSH_KEY
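For reference, the replication input above maps to the standard dfs.replication property in Hadoop's hdfs-site.xml. The sketch below is illustrative only: the ServerTemplate's scripts manage the real configuration file, and the exact path they write to may differ.

```shell
# Write a minimal hdfs-site.xml fragment reflecting the inputs above.
# (Illustrative only; the ServerTemplate's scripts manage the real file.)
mkdir -p ./hadoop-conf
cat > ./hadoop-conf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
cat ./hadoop-conf/hdfs-site.xml
```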

Block Device Inputs

If you are using a block device such as Amazon EBS enter the following inputs. Add the block_device:setup_block_device recipe below the block_device:default recipe and configure the following inputs:

  • Number of Volumes in the Stripe (1) - To use striped volumes with your databases, specify a volume quantity. The default is 1, indicating no volume striping. Ignored for clouds that do not support volume-based storage (e.g., Rackspace). Example value: text:1
  • Total Volume Size (1) - The total size, in GB, of the volume or striped volume set used for primary storage. If this value divided by the stripe volume quantity is not a whole number, each volume's size is rounded up to the nearest whole GB. For example, if "Number of Volumes in the Stripe" is 3 and "Total Volume Size" is 5 GB, each volume will be 2 GB. On CloudStack-based clouds that do not allow custom volume sizes, the smallest predefined volume size is used instead of the size specified here. Ignored for clouds that do not support volume storage (e.g., Rackspace). Important! This value is not the actual space available for data storage, because a percentage (by default, 10%) is reserved for taking LVM snapshots. Use the 'Percentage of the LVM used for data (1)' input to control how much of the volume stripe is used for data storage, and be sure to leave additional space for the growth of your data. Example value: text:10
  • Percentage of the LVM used for data (1) - The percentage of the total Volume Group extents (LVM) used for data storage; the remainder is reserved for taking LVM snapshots (e.g., 75 means three quarters for data storage and one quarter for overhead and snapshots). WARNING: If the database experiences a large amount of writes/changes, LVM snapshots may fail. In such cases, use a more conservative value for this input (e.g., 50). Example value: text:90
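The rounding and snapshot-reserve arithmetic above can be checked directly. Using the example values (a 5 GB total size striped over 3 volumes, and a 10 GB set with 90% used for data):

```shell
# Per-volume size is rounded up: ceil(total / stripe_count).
total_gb=5
stripe_count=3
per_volume_gb=$(( (total_gb + stripe_count - 1) / stripe_count ))
echo "per-volume size: ${per_volume_gb} GB"    # 2 GB each

# Usable data space when 90% of a 10 GB stripe set is used for data.
total_gb=10
data_pct=90
usable_gb=$(( total_gb * data_pct / 100 ))
echo "usable data space: ${usable_gb} GB"      # 9 GB
```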

 

Click Save.

Launch the server and wait until it becomes operational before moving on to the next step.

Launch a DataNode server

Update the deployment inputs with the following DataNode-specific inputs:

  • Hadoop node type - The type of server being launched: either a namenode (master) or a datanode (slave). Example value: text:datanode
  • Datanode address firewall port - The firewall port for the datanode address. Example value: text:50010
  • Datanode http firewall port - The firewall port for datanode HTTP connections. Example value: text:50075
  • Datanode ipc firewall port - The firewall port for datanode IPC connections. Example value: text:50020
  1. Click Save.
  2. Click Launch.

Additional steps

To launch more DataNode servers, clone the DataNode server, rename it appropriately, and launch it. Alternatively, create a server array for the datanodes and use the array's Min and Max server counts to maintain the number of DataNode servers your cluster needs.

Last modified
21:22, 16 May 2013


© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.