Set up one Hadoop NameNode and multiple DataNodes to form an HDFS cluster.
|Hadoop node type||The type of server that is being launched, either a namenode (master) or datanode (slave).||text:namenode|
|Hadoop namenode dfs.replication property||Sets the namenode dfs.replication property. See the Hadoop documentation for more information about this property.||text:3|
|Namenode firewall port||The firewall port to open for filesystem metadata operations.||text:8020|
|Namenode http firewall port||The firewall port to open for namenode http connections.||text:50070|
|Public SSH Key||The public key installed on each datanode to allow namenode connections. This must be the public half of the key pair whose private key is entered below.||cred:public_ssh_key|
|Private SSH Key||The private SSH key installed to allow namenode connections to the datanodes. This must be the private half of the key pair whose public key is entered above.||cred:private_ssh_key|
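The key-pair requirement in the last two inputs can be checked locally before the credentials are stored. The following is a minimal sketch, not part of the ServerTemplate: the function names are hypothetical, and it assumes the OpenSSH `ssh-keygen` tool is installed and that the private key has no passphrase.

```python
import subprocess


def key_fields(public_key_line: str) -> tuple:
    """Return the (key type, base64 key data) fields of an OpenSSH public key line.

    The optional trailing comment (for example user@host) is ignored.
    """
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line")
    return parts[0], parts[1]


def public_keys_match(key_a: str, key_b: str) -> bool:
    """Compare two public key lines, ignoring their comments."""
    return key_fields(key_a) == key_fields(key_b)


def derive_public_key(private_key_path: str) -> str:
    """Derive the public key from a private key file with `ssh-keygen -y`."""
    result = subprocess.run(
        ["ssh-keygen", "-y", "-f", private_key_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With these helpers, `public_keys_match(derive_public_key(path_to_private_key), public_key_text)` should return `True` when the two credentials belong to the same pair.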
If you are using a block device such as Amazon EBS, add the block_device:setup_block_device recipe below the block_device:default recipe and configure the following inputs:
|Number of Volumes in the Stripe (1)||To use striped volumes with your databases, specify a volume quantity. The default is 1, indicating no volume striping. Ignored for clouds that do not support volume-based storage (e.g., Rackspace).||text: 1|
|Total Volume Size (1)||Specify the total size, in GB, of the volume or striped volume set used for primary storage. If dividing this value by the stripe volume quantity does not yield a whole number, then each volume's size is rounded up to the nearest whole integer. For example, if "Number of Volumes in the Stripe" is 3 and you specify a "Total Volume Size" of 5 GB, each volume will be 2 GB. |
If deploying on a CloudStack-based cloud that does not allow custom volume sizes, the smallest predefined volume size is used instead of the size specified here. This input is ignored for clouds that do not support volume storage (e.g., Rackspace).
Important! The value for this input does not describe the actual amount of space available for data storage, because a percentage of it is reserved for taking LVM snapshots (by default, 90% is used for data). Use the 'Percentage of the LVM used for data (1)' input to control how much of the volume stripe is used for data storage. Be sure to account for the additional space required to accommodate the growth of your data.
|Percentage of the LVM used for data (1)|| |
The percentage of the total Volume Group extents (LVM) that is used for data storage. The remaining percentage is reserved for taking LVM snapshots. For example, at 75 percent, 3/4 is used for data storage and the remaining 1/4 is used for overhead and snapshots.
WARNING: If the data experiences a large number of writes or changes, LVM snapshots may fail. In such cases, use a more conservative value for this input (e.g., 50%).
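The sizing rules described for these inputs can be worked through with a short calculation. This is an illustrative sketch only: the function names are hypothetical, and it assumes the data percentage applies to the total provisioned size after each volume has been rounded up.

```python
def per_volume_size_gb(total_size_gb: int, stripe_count: int) -> int:
    """Size of each volume in the stripe, rounded up to a whole number of GB."""
    if total_size_gb <= 0 or stripe_count <= 0:
        raise ValueError("sizes and counts must be positive")
    # Integer ceiling division: 5 GB over 3 volumes gives 2 GB per volume.
    return (total_size_gb + stripe_count - 1) // stripe_count


def data_space_gb(total_size_gb: int, stripe_count: int, data_percent: float) -> float:
    """Approximate space left for data once the snapshot reserve is set aside.

    Assumption: the percentage is applied to the provisioned total
    (per-volume size multiplied by the number of volumes).
    """
    if not 0 < data_percent <= 100:
        raise ValueError("data_percent must be in the range (0, 100]")
    provisioned = per_volume_size_gb(total_size_gb, stripe_count) * stripe_count
    return provisioned * data_percent / 100
```

For the example in the table, `per_volume_size_gb(5, 3)` gives 2, so 6 GB is provisioned in total; with 75 percent used for data, `data_space_gb(5, 3, 75)` gives 4.5 GB, and the remaining 1.5 GB is held back for snapshots.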
Launch the server and wait until it becomes operational before moving on to the next step.
Update the deployment inputs with the following DataNode-specific inputs:
|Hadoop node type||Type of server that is being launched, either a namenode (master) or datanode (slave)||text:datanode|
|Datanode address firewall port||Firewall port for datanode address||text:50010|
|Datanode http firewall port||Firewall port for datanode http||text:50075|
|Datanode ipc firewall port||Firewall port for datanode ipc||text:50020|
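Once the servers are operational, the firewall ports listed in the NameNode and DataNode inputs can be probed from another machine in the deployment. The sketch below is a generic TCP reachability check, not a RightScale or Hadoop tool. The port numbers are the default values from the tables above, and the helper names are hypothetical.

```python
import socket

# Default values taken from the input tables in this tutorial.
NAMENODE_PORTS = {"metadata": 8020, "http": 50070}
DATANODE_PORTS = {"address": 50010, "http": 50075, "ipc": 50020}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_node(host: str, ports: dict) -> dict:
    """Probe every named port on a host and report which ones accept connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, `check_node(datanode_host, DATANODE_PORTS)` returns a dictionary such as `{"address": True, "http": True, "ipc": False}`, where `datanode_host` is a placeholder for the address of one of your DataNode servers. A `False` entry only shows that the port was unreachable from the machine running the check. It does not say whether the firewall or the Hadoop service is the cause.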
To launch more DataNode servers, clone the DataNode server, name the clone appropriately, and repeat. You can also create a server array for the DataNodes and use its Min and Max server settings to maintain the number of DataNode servers you need in your cluster.
© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.