Note: Please go to docs.rightscale.com to access the current RightScale documentation set. Also, feel free to Chat with us!
Home > Guides > RightScale Grid Technical User Guide > Anatomy of a RightScale Grid Workflow

Anatomy of a RightScale Grid Workflow

This section will describe the end-to-end flow of a job through the RightScale Grid framework, highlighting the components supplied and automated by RightScale, as well as the components and responsibilities of the end user.


Overview

The overall architecture and workflow of a grid processing system implemented with the RightScale Grid framework is shown in Figure 1 below.

diag-GridOverview-v1.png

Figure 1 – RightScale Grid architecture and workflow

In this figure, everything shown within the box is under the control of the RightScale Cloud Management Platform. This includes managing the AWS infrastructure components, the worker processes, and the elasticity metrics and controls. The user-supplied components (shown in green) are limited to the Job Producer, the input files needed by a job, the worker code to execute the job, and finally an (optional) Job Consumer process that is responsible for any post-processing or output file manipulation that may be required. Although these Job Producer and Job Consumer processes are shown in Figure 1 as external to the RightScale platform, they can be (and often are) run under control of the platform as well.  As a result, you can have one setup that serves as an entire end-to-end grid processing solution managed within a single interface.

Prior to creating a workflow, the four SQS queues indicated in Figure 1 (input, output, error, and audit) need to be created, which can easily be performed via the RightScale platform by using a predefined macro to assist in various system configuration tasks. By running the RightScale Grid macro within the RightScale dashboard, these four queues will be created automatically.

The workflow process is initiated by the Job Producer. This can be an application running within the cloud (and under management of the RightScale platform), or it can be running in a customer or hosting provider infrastructure. The first role of the Job Producer is to create the input file(s) required by the job. It then constructs the work unit data structure with details about the inputs, and the work to be performed. Next, it uploads the input files to S3 and creates a message data structure which encompasses the work unit information, in addition to details on the input file(s) location. Finally, the Job Producer inserts this new message into the SQS Input Queue. This process is repeated for every new job that is to be introduced into the workflow. The Job Producer application can take virtually any form: a PHP script, a compiled application, a Ruby script, etc. The RightScale Grid macro mentioned previously generates a Job Producer Ruby script that can be modified and used in new applications, or it can be used as a model or template for developing a new Job Producer. The RightScale Grid macro creates a single Job Producer for simplicity of illustration, but multiple Job Producers can be run simultaneously feeding the same input queue (or even multiple input queues in more complex scenarios).

 


Elasticity Daemon

The RightScale platform contains an Elasticity Daemon whose function is to monitor the input queues, and to launch worker instances to process jobs in the queue. Different scaling metrics can be used to determine the number of worker instances to launch and when to launch these instances, with the most common metric being the number of jobs in the queue. Within the RightScale dashboard, the user can specify that for every N jobs in the input queue, a worker instance should be launched. The second metric that can be used is related to the length of time that jobs have been in the queue. This is useful in situations where particular Service Level Agreements (SLA) need to be met. Using this time-based metric, the user may specify that when the average time a job spends in the queue exceeds a certain threshold, a worker instance should be launched. Similarly, when the maximum time that any job has been in the queue exceeds a specified value, a worker instance is launched to assist in processing the load. (These averages and maximums are calculated using a random sampling of 10 jobs in the input queue.)  

Once the Elasticity Daemon has determined that additional worker instances are required, a call is made within the RightScale platform to initiate the allocation of these server resources (illustrated by the orange blocks in Figure 1 containing the Worker Daemon and Worker Code). These worker instances are launched in a server array, which can be controlled via configuration settings in the RightScale dashboard. These settings allow the minimum and maximum number of worker instances to be specified. This provides a mechanism to ensure there are always one or more instances available to process any jobs that are inserted into the queue, and to limit the maximum capital expenditure in the case of unplanned or out-of-control input growth. Another feature of the RightScale platform that plays a major role in this phase of the workflow is the ServerTemplate. Prior to launching a grid computing application, a ServerTemplate for the worker instances is created, which defines the instance-specific details, such as the size of the instance, the image to use as the base operating system, the region and availability zone in which the instance should be launched, along with other configuration information. This ServerTemplate can be created manually or automatically by calling the RightScale Grid macro. Another key aspect of the ServerTemplate is that it specifies the technology stack required on the instance (Ruby, Perl, PHP, etc.) and performs the installation of these tools, as well as the installation of the worker code. This worker code is downloaded from an SVN repository, and installed on the worker instance as specified in a script run at the end of the instance’s boot cycle. Optionally, the worker code can be included as an attachment to the ServerTemplate, but this method is not as flexible as downloading from a repository, and is not considered a best practice.

 


Worker Daemon

In addition to the user-specified tools and worker code, the RightScale Grid Worker Daemon is also installed on each worker instance. This worker daemon is responsible for managing the workflow on the instance itself, executing the worker code and uploading output files to S3 to be processed by the Job Consumer. Numerous functions are performed by the RightScale Grid Worker Daemon and the worker code, but they can be generally categorized as follows:

  1. The Worker Daemon retrieves a message from the input queue.  It does not actually delete the message, but instead makes a local copy, while rendering the actual message in the queue invisible to other processes.   This way, if a processing error occurs, the message will again become visible so that another worker daemon can grab the message from the input queue, thus guaranteeing that no jobs are lost.   The input queue's "visibility timeout" parameter defines how long a message will remain hidden before it becomes visible again to other worker daemons.
  2. After message retrieval, the Worker Daemon processes the message and downloads any required files from S3 to an input directory on the local filesystem. (See Figure 2. Individual steps are indicated in square brackets [ ].)
    diag-DaemonOverview1-v1.png
    Figure 2 – Message and input file retrieval by the Worker Daemon
    The first message in the queue is copied by the Worker Daemon and rendered invisible in the Input Queue.
  3. The Worker Daemon then creates the necessary data structures from the message and passes the required inputs to the worker code.
    diag-DaemonOverview2-v1.png
    Figure 3 – Worker Daemon passes necessary inputs to the worker code on the instance
  4. The worker code processes the job, and places any result files in an output directory on the local filesystem, while any log files are placed in a local log directory.  The worker code returns a data structure containing information about the results of the job to the RightScale Grid Worker Daemon. (See Figure 4.)
    diag-DaemonOverview3-v1.png
    Figure 4 – Worker code has completed job, created output files, and passed results to Worker Daemon

 

At this point in the workflow, the job has been processed by the worker code, and the resulting data structure has been passed to the RightScale Worker Daemon, which is the daemon’s indication that the results-processing portion of the workflow can now commence. At this point, the Worker Daemon performs the following steps:

  1. The output files are uploaded to the appropriate S3 bucket.
    Any log files created during the processing of the job are uploaded to S3.
    Input and temporary files are deleted from the local filesystem (input files can be left on the local filesystem via a configuration file option to the RightScale Grid Worker Daemon). (See Figure 5.)

    diag-DaemonOverview4-v1.pngFigure 5 – Worker Daemon processes files in local filesystem for completed job
  2. If the job completed successfully, a results message is placed in the SQS Output Queue, and (optionally) results information is posted to a user-specified web server. If the job resulted in an error, a message is placed in the SQS Error Queue.
    Statistics gathered during job execution are posted to the SQS Audit Queue.
    The original message is now deleted from the SQS Input Queue. (See Figure 6.)
    diag-DaemonOverview5-v1.png
    Figure 6 – Worker Daemon has placed results in appropriate queues, and removed completed job from Input Queue
  3. As mentioned earlier, the original message is not deleted from the input queue until the entire process has been completed successfully. The amount of time that a message can be worked on (made invisible) is set via a setting in the RightScale Grid worker configuration file.  For complex jobs that may take a long time to process, this setting should be configured to a value much greater than the anticipated maximum job processing time. Currently, the maximum setting for the visibility timeout is 12 hours.
    Shown on the right side of Figure 1 is the Job Consumer. This is an optional component of the workflow, but one which is typically implemented. The Job Consumer is not necessary in configurations that utilize grid chaining, for in these cases the output queue of one workflow functions as the input queue of a separate workflow. In the deployment created via the RightScale Grid macro, the Job Consumer is implemented via three separate applications: a Result Consumer that processes the SQS Output Queue, an Audit Consumer that processes messages in the SQS Audit Queue and compiles these statistics into a .csv file for later analysis, and an Error Consumer that processes the SQS Error Queue. These consumer processes are basic examples of the types of tasks that the Job Consumer can perform, and are provided as an instructional sample and model for production tasks. As with the Job Producer, multiple Job Consumer processes can run simultaneously and process the same queues.



     

You must to post a comment.
Last modified
23:19, 16 May 2013

Tags

This page has no custom tags.

Classifications

This page has no classifications.

Announcements

None


© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.