
Customizing a Grid Computing System

Whether you're creating a new RightScale Grid application or porting an existing application to RightScale Grid, you must perform the following tasks:

  1. Create Work
  2. Do Work
  3. Process Results

The easiest way to build a Grid Computing System with RightScale is to start with the Quickstart Deployment and then modify it to suit your application. If time permits, it is also recommended that you create a Grid Quickstart Deployment using the Grid Quickstart Deployment User's Guide to familiarize yourself with Grid Computing. This guide walks you through the technical details of how the Grid Computing components operate to create work, do work, and process results.
 


Setting Up a Grid System

Subscribe to the Macro

  1. Navigate to Design -> Library -> Macros
  2. Select the "Shared" link under Categories.
  3. Locate the "Grid Quickstart Deployment" (published by "RightScale Services"). Your RightScale account must have a Grid Computing Solution Pack to view this macro. If you do not have access, contact your account manager for an evaluation or for upgrade options.
  4. Select the Subscribe action link.  The Grid Computing Deployment Macro is now in your 'Local' macro collection. You can now run the macro.
     

Run the Macro

  1. When viewing the subscribed macro in your 'Local' collection, click the Run action button to execute the macro.
  2. An input screen will appear that asks you to select resources to be reused. There is no need to enter any values in this screen.  Click the Run button.  Note: There will be a slight delay until you see feedback that the macro is processing.  A dialog box will show you the progress as the macro starts to execute its commands.  The macro will create the following components:
    • Grid Computing Deployment
    • Job Coordinator Server
    • Worker Server Array
       

Creating Work

Create a Job Producer

A Job Producer is a program (often written in Ruby) that can parse raw datasets from a local filesystem or database. It then decomposes these datasets into jobs. A job is a grouping of a work unit and any necessary input files.

Note: There is no requirement for the Job Producer to be running in the EC2 environment. It may be written in any language and may run anywhere on the Internet.  

The Job Producer uploads the input files to an S3 bucket and creates a work unit describing the work to be performed. It must encode the work unit so that it can be transmitted in the body of an AWS SQS message. When the message body is enqueued to AWS SQS, SQS returns a message ID that uniquely identifies the message.
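The following is a minimal Job Producer sketch, assuming the right_aws gem and the default YAML encoding. The bucket name, queue name, worker class name, and file path are placeholders for illustration only, not the names used by the Quickstart scripts.

  #!/usr/bin/env ruby
  # jobproducer sketch: upload an input file, build a work unit, and enqueue it.
  require 'rubygems'
  require 'right_aws'
  require 'yaml'

  s3  = RightAws::S3Interface.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])
  sqs = RightAws::SqsGen2.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])

  # 1. Upload the input file to an S3 bucket.
  s3.put('my-grid-input-bucket', 'jobs/job-0001/input.dat', File.read('input.dat'))

  # 2. Build the work unit (see Data Passing and Data Structures for the fields).
  work_unit = {
    'created_at'  => Time.now.utc.strftime('%Y-%m-%d %H:%M:%S UTC'),
    'worker_name' => 'MyWorker',
    's3_download' => ['my-grid-input-bucket/jobs/job-0001/input.dat']
  }

  # 3. Encode the work unit and enqueue it; SQS returns a message with a unique ID.
  queue   = sqs.queue('my-grid-input-queue', true)
  message = queue.send_message(work_unit.to_yaml)
  puts "Enqueued message #{message.id}"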

Modifying the Quickstart Job Producer

The "RSGrid Quickstart Job Coordinator"  ServerTemplate describes a Server which, when instantiated, installs both the Job Producer and the Job Consumer. The Job Producer is made up of a configuration file (jobspec.yml) and a Ruby program (jobproducer.rb). The "RSGrid Quickstart Job Coordinator" ServerTemplate has two scripts which installs these two elements and related resources:

  • The "RSGrid Quickstart JobSpec Generator" RightScript generates and installs the jobspec.yml configuration file based on Inputs set at the Deployment level.
  • The "RSGrid Quickstart Job Coordinator" RightScript installs the jobproducer.rb file as well as input files.
     

To modify the Job Producer (and Job Coordinator), you must clone the "RSGrid Quickstart Job Coordinator" ServerTemplate and related scripts before you make changes. You must also modify the defined Server in the Quickstart Deployment to refer to the newly cloned ServerTemplate.

The jobproducer.rb code is stored as an attachment to the RSGrid Quickstart Job Coordinator Scripts RightScript. You can make changes directly through the RightScale Dashboard, or you can download the attachment, make changes in an editor of your choice, and then re-upload the changes. As a best practice, do not store the code as an attachment to the script; instead, change the script to install the code from your code repository system.

Creating your Own Job Producer

The Ruby program (jobproducer.rb) is a good example of all the things necessary to create work for the Grid Computing System. If you would like to create your own, you must perform the following steps:

  1. You must have an account that allows you to interact with AWS SQS and AWS S3. If you use a ServerTemplate in RightScale to run your JobProducer, you can install the RightScale AWS interface gem (right_aws).
  2. Your code should upload the data files to S3.
  3. You must either use the default RightScale YAML encoder or use your own encoder to encode a work_unit before transmission to the input queue. If you decide to create your own, you will need to use it both in the Job Producer to encode the message and in the worker array for the Worker_Daemon to decode the message. See the Creating Your Own Message Encoder section for more details.
  4. After encoding the work_unit, you must then enqueue the message to the SQS input queue.


The Work Unit must contain an "s3_download:" field. It should also contain a "worker_name:" and a "created_at:" field. You are free to add any number of key/value pairs to the work unit structure. For a detailed description, see Data Passing and Data Structures. Remember that AWS SQS messages are limited to 256 KB; in most cases, the input message contains only the work unit metadata, while the actual input to the worker application is uploaded to S3. A minimal example of an encoded work unit follows.
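For example, a YAML-encoded work unit message body might look like the following (all values are illustrative only):

  s3_download:
    - my-grid-input-bucket/jobs/job-0001/input.dat
  worker_name: MyWorker
  created_at: 2013-05-16 23:00:00 UTC
  frame_count: 100          # an example user-defined key/value pair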
 


Doing Work

When work is queued in the system, messages are placed in the input queue. You must create a Worker Server Array that scales based on this input queue. The RightScale Platform (elasticity daemon) monitors the queue and launches new Worker Server Instances based on the parameters of the Worker Server Array.

The Worker Server Array

The Worker Server Array is a scalable RightScale Server Array that has a worker daemon and your worker code installed on it. The "RSGrid Quickstart Worker" ServerTemplate describes the characteristics of the server and the software packages to load. The macro creates a Worker Server Array Definition, which specifies the ServerTemplate to use and the parameters that define the scaling behavior. To customize RightScale Grid for your application, clone the RSGrid Quickstart Worker ServerTemplate and the associated RightScripts, then change the Worker Server Array Definition to use your newly cloned ServerTemplate. You can then customize the cloned template and related scripts for your application. In this section, we describe all the components that process work unit messages to create results.

Worker Server Array Parameters

The Worker Server Array Definition contains parameters that describe the array and control how it is scaled.


The Worker Daemon

At boot time, the RightScale Grid worker daemon ("rightworker") is started and reads the rightworker.yml config file. The daemon loads the default codecs from the gem's 'lib/codecs' directory and then loads the user's codecs and workers from the work folder.

The Worker Daemon is responsible for retrieving a message from the input queue, downloading any input files to the local machine, and constructing a message_env structure with all the relevant data needed by the worker code to perform its work.

The worker daemon passes control to the worker code by invoking a Ruby class with a do_work() method. The do_work() method is analogous to the main() function in other programming languages.

RightScale Grid configuration file (rightworker.yml)

The RightScale Grid configuration file (usually named rightworker.yml) is a YAML-formatted text file of configuration parameters. Some of these parameters specify the behavior of the Daemon, while others are passed to the worker code through the message environment data structure. This file is read by the Daemon only once at startup time. Configuration changes should be made in the "RSGrid Quickstart rightworker Generator" RightScript in the "RSGrid Quickstart Worker" ServerTemplate.

(Figure: Worker Daemon overview diagram)

The variables are grouped into sections:

'Environment' section
The environment section is the highest-level section in the configuration file and is commonly used to create different configurations for development, testing, production, and the like. Environments are selectable using the '-e <environmentname>' switch when starting rightworker. Each environment section should contain one subsection called 'RightWorkersDaemon'. Besides the 'RightWorkersDaemon' subsection, there are no other variables to be set within the 'Environment' section.

'RightWorkersDaemon' section
Within an environment section, this section holds all the RightScale Grid-specific configuration information.

'Array_downsize_manager' section
Within each 'RightWorkersDaemon' section, this optional section defines how to scale down your array when the array does not receive a sufficient amount of work. If this section is not defined, the default is used.

'User' section
Within each 'RightWorkersDaemon' section, this optional section holds application-specific information that you want the daemon to pass to your worker code. It can contain any number of key/value pairs. RightScale Grid does not read this information, but rather passes it on as part of the message_env hash.

'Queues' section
Each 'RightWorkersDaemon' section has a 'queues' subsection which defines one or more input queues to monitor. If multiple queues are specified, RightScale Grid will monitor them in round-robin order. The title of each input queue subsection of the Queues section must be the exact name of an SQS input queue.
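As a rough sketch of this layout (credentials, queue names, and worker class names are placeholders; see the parameter tables below for the full set of options), a rightworker.yml might look like:

  production:                          # an 'Environment' section
    RightWorkersDaemon:                # RightScale Grid-specific settings
      aws_access_key: YOUR_AWS_ACCESS_KEY
      aws_secret_access_key: YOUR_AWS_SECRET_KEY
      workers: 1
      halt_on_exit: true
      log: STDOUT
      array_downsize_manager:          # optional section
        name: default
      user:                            # optional section, passed through to the worker code
        render_quality: high
      queues:
        my-grid-input-queue:           # must exactly match an SQS input queue name
          default_worker_name: MyWorker
          result_queue: my-grid-output-queue
          error_queue: my-grid-error-queue
          audit_queue: my-grid-audit-queue
          receive_message_timeout: 600
          s3_out: my-grid-output-bucket/results
          s3_log: my-grid-output-bucket/logs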

RightWorkersDaemon Section

Name | Type | Default | Description
aws_access_key | AWS key | | The user's AWS access key
aws_secret_access_key | AWS password | | The user's AWS secret key
log | Path/filename | | Path to the RightScale Grid (not application) log file. Stream names such as STDOUT or STDERR are also legal
email | Email address | | Address to which to (optionally) email errors
halt_on_exit | Boolean | true | If set to 'true', the RightScale Grid daemon and the EC2 instance will both exit when no work remains and it has been 55-99 minutes; if set to 'false', the RightScale Grid daemon never exits
workers | Integer | 1 | The number of RightScale Grid workers to be started
array_downsize_manager | Section | | Optional section
user | Section | | Optional section

 

Array_Downsize_Manager Section

Name | Type | Default | Description
name | 'default' or 'enhanced' | 'default' | If set to 'default', we do not apply any additional rules: an instance will exit if no work remains and it has been 55-99 minutes since the start of the hour. If set to 'enhanced', the algorithm tries to get rid of old instances which are not getting fully utilized. We will let active instances check the message queue more often, which will lead to idle instances staying more idle and eventually halting. If this section is not mentioned, we use the 'default' manager.

 

User Section

Name | Type | Default | Description
any_key | Any value | | Any number of optional key/value pairs to pass to the worker code.

 

Input Queues Section

Name | Type | Default | Description
audit_queue | SQS queue name | | The SQS queue to which audit messages will be sent. If omitted, audit messages are not sent to any queue.
error_queue | SQS queue name | | If set, error results will be sent to this SQS queue. Sending error results to an error queue does not prevent them from also being sent to a result queue or result server.
result_queue | SQS queue name | | The SQS queue to which result messages will be sent. If omitted, results are not sent to any queue.
result_server | URL | | A URL to which result messages will be sent via HTTP POST. The results are sent as a YAML payload in the request body. If omitted, results are not sent to any HTTP server. Results may be sent to both a result server and a result queue.
result_server_ignore_errored | Boolean | false | If 'true', the Worker Daemon will place errors in the result server. The same behavior can be invoked by the worker code if it returns "ignore" as the first word in its results.
result_queue_ignore_errored | Boolean | false | If 'true', the Worker Daemon will place errors in the result queue. The same behavior can be invoked by the worker code if it returns "ignore" as the first word in its results.
message_decoder | Class name | | The name of the Ruby class to use as a message codec. The class must implement the codec interface (see the Creating Your Own Message Encoder section), and the file containing the codec class definition must reside in the working directory of RightScale Grid.
invocation_model | Oneshot | Oneshot | Deprecated in RightGrid v1.3 and above.
receive_message_timeout | Integer | | The number of seconds the daemon marks the message invisible while the worker code processes it. This is also the maximum time the worker may spend processing the message. If the processing time exceeds this value, the worker daemon terminates the worker thread and the message is deleted from the input queue and placed in the error queue. This reduces the possibility of processing the same work unit twice.
lifetime_after_fault | Integer | 3600 | If a non-permanent error occurs while processing a work unit, RightScale Grid will retry it for a maximum of 'lifetime_after_fault' seconds. If the work unit has not been successfully processed in that interval, it is deleted from the input queue and placed in the error queue. This parameter is only used if the message has a 'created_at' timestamp in its body.
default_worker_name | Class name | | The class name of the worker class to invoke on work units when worker_name is omitted from the message. If a work_unit specifies a worker_name but that class cannot be found, the work_unit is sent to the error queue.
publishkeys | Boolean | false | If 'false', the system will not include the values of the AWS credentials in log files.
s3_in | Path | | Specifies a location on the local filesystem under which all S3 input data will be placed. By default, input data is staged to an automatically generated location on the local filesystem.
s3_in_delete | Boolean | true | If 'true', downloaded files are removed when the worker finishes processing the work_unit. Only files are removed; directory structures are left intact.
s3_in_overwrite | Boolean | true | If 'true', files already present on the local filesystem are re-downloaded from S3 and overwritten when a new work_unit requires them.
s3_in_flat | Boolean | false | Controls whether file hierarchies on S3 are collapsed into a flat file space on the local filesystem. If a downloaded file does not specify a 'local_path_and_name', it is set to 'message_env['s3_in']/bucket/key' when s3_in_flat == false, or to 'message_env['s3_in']/filename' (the key base name without the bucket) when s3_in_flat == true. If the file specifies its own local name, 's3_in_flat' does not affect it.
s3_log | Bucket/Key | | Specifies a bucket and key on S3 under which RightScale Grid uploads any log files generated by the application. If omitted, logs are not uploaded.
s3_out | Bucket/Key | | Specifies a bucket and key on S3 under which RightScale Grid uploads any output files generated by the worker. If omitted, output is not uploaded.

Worker Daemon Logic

The worker daemon performs the following steps to call the Worker Code:

  1. Pull a new work unit from the "input" queue.
  2. Generate names and create folders for Temporary, Log, and Output files
  3. Decode the work unit using the RightDecoder.unpack method.
    1. If the decoder returns a hash, it may contain any keys the user wishes to define but it must contain the following keys:
      1. 'created_at' => 'string'
        Creation time of this work unit (UTC) in the format "YYYY-MM-DD HH:MM:SS UTC".
      2. 'worker_name' => 'string'
        The name of the worker to start. If omitted, the worker daemon will use the default_worker_name as specified in rightworker.yml. If 'default_worker_name' is unset, the RightScale Grid worker daemon stops processing the message.
      3. 's3_download' => [array]
        A list of files to download from S3. If this element is omitted, or no files can be downloaded from S3, your worker code will not be invoked.
    2. If the decoder returns an array, each entry in the array is interpreted as a file to be downloaded from S3. In this case, the worker_name is taken from the ‘default_worker_name’ setting in the rightworker.yml config file. If the ‘default_worker_name’ is unset, the RightScale Grid worker daemon stops processing the message.
    3. If the decoder returns another type of variable, the return data is converted to a hash.
  4. Download S3 objects. For the worker class to be called, at least one file must be specified in the 's3_download' key. This key must point to a hash in which every key points either to a hash (s3_key => local_name) or to an array (where each item is either a string s3_key or a hash s3_key => local_name). If 'local_name' is omitted, local files will have the same names as the S3 keys (see the download_s3_files method).
  5. Create the message environment to pass into the actual processing routine. Any parameters that contain substitution strings (e.g. date, time) are evaluated.
  6. Determine the worker class name to invoke based on the worker_name contained in the message. If the name of the worker to execute is omitted from the work unit, the default_worker_name found in the daemon's configuration file is used.
  7. Load the worker class into memory (if this is its first use), create a thread, and call the do_work() method of the worker class.

 

Invoking Worker Code

Worker Code is written as a Ruby class that has a do_work() method. If your worker code, like most, is not written in Ruby, then a Ruby "kicker" class should be created to invoke your code. The applications running on RightScale Grid span a wide range of languages, I/O techniques, and programming styles, and in practice few users want to make significant changes to their application when porting it to the cloud. To bridge language and other boundaries between RightScale Grid (a Ruby application) and your application (in any language), we recommend that you write and maintain a Ruby 'kicker' class.

The kicker class can be thought of as a script wrapper around your application that takes care of any setup and teardown required to run on EC2. Perhaps your application requires a filesystem layout for input data that is different from the layout delivered by RightScale Grid; a kicker class can easily transform input formats and re-transform output as needed. Other examples of setup done by kicker classes include setting environment variables, building chroot jails, setting load paths and loading libraries, running input integrity checks, and transforming application exceptions and errors. The kicker class must implement a do_work() function. The kicker gets execution control by being exec()'ed in a forked process. The configuration variable path_to_executable must be set to the filesystem path of the kicker script.
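Below is a minimal kicker class sketch. The class name, executable path, environment variable, return value, and the assumption that do_work() receives the message_env hash (and the raw message) are illustrative only; only the :s3_in key is documented in this guide.

  # my_kicker.rb -- a minimal kicker class sketch (names are placeholders).
  class MyKicker
    # Assumption: the worker daemon passes the message_env hash (and the raw
    # message) described under "The Passed Environment" below.
    def do_work(message_env, message = nil)
      input_dir = message_env[:s3_in]        # directory where the daemon staged the input files

      # Setup: anything your application needs before it runs.
      ENV['MY_APP_HOME'] = '/opt/my_app'     # placeholder environment variable

      # Hand control to the non-Ruby application.
      ok = system('/opt/my_app/bin/process', input_dir)
      raise "my_app failed with exit status #{$?.exitstatus}" unless ok

      'ok'                                    # illustrative return value
    end
  end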

Results Processing after the Worker Code Returns
  1. Check the return value of the worker (right_result). For a detailed flowchart of the result processing logic, see RightScale Results Processing Logic.
  2. Upload all output files (including subdirectories) from the 's3_out' directory to S3 (if @@config[:s3_out] is set), then upload any files or directories given in the right_result 's3_upload' hash. (Variable substitution works for this item.)
  3. Upload logs (including subdirectories) from the 's3_log' directory to S3 (if @@config[:s3_log] is set). (Variable substitution works for this item.)
  4. Delete the message from the work queue if no errors occurred in the previous steps.
  5. Delete all temp folders.
  6. Check the input queue and terminate the daemon (and the current EC2 instance) if there are no tasks in the queue and it is the 55th to 59th minute of the hour since the instance started. This policy optimizes for the fact that EC2 charges for each started hour.
     

Running the RightScale Grid Worker Daemon

The rightworker daemon can be started/restarted with the command:

rightworker start|restart [-i item_name] [-d] [-c path_to_rightworker.yml] [-e environment]
  • The -i <item_name> parameter selects a specific, single input queue to monitor. This option is useful if your rightworker.yml specifies several input queues or queue setups but you only wish to run one.
  • The -d switch forces rightworker to daemonize.
  • The -c parameter is optional; use it when the rightworker.yml file is not located in the same directory as the worker application.
  • The -e <environment> parameter selects a specific environment to run under. All queue configurations will be pulled from the corresponding environment section of the rightworker.yml configuration file.
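
For example, to start the daemon using the 'production' environment and an explicit configuration file (the environment name and path are illustrative):

  rightworker start -e production -c /mnt/rightgrid/rightworker.yml -d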

 

The daemon can be stopped with the command:

rightworker stop [-i item_name]

The -i <item_name> parameter selects a specific, single input queue to stop. Specifying this option is not recommended.
Stopping the rightworker daemon completely will cause the instance to be terminated.

The Worker Code

The Code

The code is typically written as a Ruby class with a do_work() method. The file must be located in the RightScale Grid directory, typically /mnt/rightgrid.

The Passed Environment

When the worker code is invoked by the worker daemon, it is given two structures: the message_env and the message. The message structure is the original, un-decoded message from the input queue. The message_env is constructed by the worker daemon from the unpacked message and contains all the key/value pairs from the message. In addition, the worker daemon adds key/value pairs needed by the worker code (e.g. input file directories, tmp directories, log directories, etc.). For an exhaustive description of this structure, see the section "Data Passing and Data Structures."

Accessing Input Files

In a typical RightScale Grid setup, input files are copied from S3 to the local machine by the worker daemon. The worker daemon specifies the location of this directory to the worker code in the message_env{} data structure as :s3_in.

The worker code, however, is free to retrieve data from wherever it needs to. If passing files to the worker via S3 is undesirable, simply write code to retrieve the data from wherever it resides (FTP, URL, database, etc.).

Generating Output Files

The worker code can generate any number of output files and place them in the local output directory as specified in the message_env. At worker code completion, the worker daemon will transfer all files in the local output directory to the S3 bucket/output directory as specified in the worker daemon’s configuration file and passed to the worker code as $outputdir.
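As a rough sketch, a pure-Ruby worker class might look like the following. Only the :s3_in key is documented above; the output-directory key name, class name, and return value are assumptions for illustration.

  # my_worker.rb -- a minimal worker class sketch (names and keys other than
  # :s3_in are illustrative assumptions).
  require 'fileutils'

  class MyWorker
    def do_work(message_env, message = nil)
      input_dir  = message_env[:s3_in]       # input files staged by the worker daemon
      output_dir = message_env[:output_dir]  # assumed key for the local output directory

      Dir.glob(File.join(input_dir, '**', '*')).each do |path|
        next unless File.file?(path)
        # Placeholder "work": copy each input file into the output directory;
        # the daemon uploads that directory to the configured s3_out location.
        FileUtils.cp(path, File.join(output_dir, File.basename(path)))
      end

      'ok'                                    # illustrative result value
    end
  end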

Constructing Results
  • Data Structure
  • Error conditions
  • Audit Fields
  • Chained results

 

Logging

The worker code can log in two different ways. It can write to stdout, which will be captured by the worker daemon, placed in a file called user_worker.log, and stored in the S3 bucket/log directory. Alternatively, the worker code can construct its own log file and place it in the local log directory specified by $log_dir in the message_env{}; the worker daemon will then move the file(s) to the S3 bucket/log directory.

Creating, Testing and Installing Custom Worker Code

Write your own Worker Class or Kicker Class in Ruby. Launch a worker array and transfer (FTP) your worker class and other necessary files to the running worker instance. You can then test your worker class by modifying and running the TestClass.rb program on the worker instance. Once you are sure that your worker class functions correctly, add your new Ruby file as an attachment to the RSGrid Quickstart Worker Code Install & Start Script, which is part of the RSGrid Quickstart Worker ServerTemplate. Lastly, terminate all running instances of your worker array and queue work into the system for your new Worker Class.


Processing Results

When the worker code and daemon have completed their work, result files will be stored on S3 and a corresponding message will be queued in the Output queue.

Create a Job Consumer to process Output

Similar to Job Producers, Job Consumers are often written in Ruby. A Job Consumer is a program that reads result messages from the output queue and processes the corresponding result data.

There is no requirement for this component to be running in the EC2 environment. It may be written in any language and may run anywhere on the Internet.

The Job Consumer dequeues a result message from AWS SQS, decodes it, and downloads the associated result files from S3 for further processing.

Modifying the Quickstart JobCoordinator ServerTemplate

The "RSGrid Quickstart Job Coordinator" ServerTemplate describes a Server which, when instantiated, installs both the Job Producer and the Job Consumer. The Job Consumer is made up of a configuration file (jobspec.yml) and a ruby program (jobconsumer.rb). The RSGrid Quickstart Job Coordinator ServerTemplate has two scripts which install these two elements and related resources:

  • The RSGrid Quickstart JobSpec Generator RightScript generates and installs the jobspec.yml configuration file based on Inputs set at the Deployment level.
  • The RSGrid Quickstart Job Coordinator RightScript installs the jobconsumer.rb file as well as input files.


To modify the Job Consumer (and Job Coordinator), you must clone the RSGrid Quickstart Job Coordinator ServerTemplate and related scripts before you can make modifications. If you have already cloned this ServerTemplate from a previous step, you do not need to clone it again.

The jobconsumer.rb code is stored as an attachment to the RSGrid Quickstart Job Coordinator Scripts RightScript. You can make changes directly through the RightScale Dashboard or you can download the attachment, make changes in an editor of your choice, and then re-upload the changes. As a best practice, you should not store the code as an attachment to the script.  Instead you should change the script to install the code from your code repository system.

The jobconsumer.rb program processes the output queue, error queue, and audit queue, depending on the parameters that you specify.

Creating your Own JobConsumer

The Ruby program, jobconsumer.rb, is a good example of all the things necessary to process results from the Grid Computing System. If you would like to create your own, you must perform the following steps (a minimal sketch follows the list):

  1. You must have a library that allows you to interact with AWS SQS and AWS S3. If you use a ServerTemplate in RightScale to run your JobConsumer, you can install the RightScale AWS interface gem (right_aws).
  2. You must read a message from the output queue.
  3. You must either use the default RightScale YAML decoder or use your own decoder to decode a message. If you decide to create your own, you will need to use it both in the Job Consumer to decode the message and in the worker array for the Worker_Daemon to encode the message. See Creating Your Own Message Encoder for details.
  4. Your code should also manipulate the resulting data files from S3 as needed.
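
A minimal Job Consumer sketch follows, assuming the right_aws gem and the default YAML encoding; the queue name and result handling are placeholders only.

  #!/usr/bin/env ruby
  # jobconsumer sketch: read result messages from the output queue and decode them.
  require 'rubygems'
  require 'right_aws'
  require 'yaml'

  sqs   = RightAws::SqsGen2.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])
  queue = sqs.queue('my-grid-output-queue')   # placeholder output queue name

  while (message = queue.receive)
    result = YAML.load(message.body)          # decode with the same codec the workers use
    puts "Received result: #{result.inspect}"
    # Download or otherwise manipulate the result files referenced by the
    # message, then delete the message so it is not processed again.
    message.delete
  end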

 


Processing Error Messages

Error messages are returned when the worker code's result indicates an exception. Specifically, if the :return element begins with the word 'exception' or 'aborted', the result is considered an error.

The error message is constructed by the Worker Daemon using the error_message{} data structure. For an exhaustive description of this structure, see Data Passing and Data Structures.

 


Processing Audit Messages

It is not necessary for you to process audit messages. You can tell the RightScale platform to process the audit messages and create statistics for you. However, if you want to access the raw audit messages for your grid application, simply disable the processing of the Audit Messages by the RightScale platform and process these messages yourself.

The audit message is constructed by the Worker Daemon using the audit_message{} data structure. For an exhaustive description of this structure, see Data Passing and Data Structures.
 

 
