This tutorial provides sample code that you can alter for your own purposes and walks you through the salient sections and parameters of a sample RightGrid configuration file.
This tutorial only applies to Grid or Premium accounts. If you have a Developer account and would like to upgrade, please contact firstname.lastname@example.org.
The rightworker.yml configuration file is the heart of a RightGrid application. The configuration file sets variables needed by the RightGrid worker daemon in order to call the user's application with the correct parameters. Be sure to place the rightworker.yml file in the same directory as the app worker.
The rightworker.yml file contains the following parameters:
- Defines the environment (ex: development, staging, production).
- Defines how to upload the results back to S3.
- Defines how to send result messages to the corresponding queues.
- Includes your AWS access and secret access keys.
- Look over the sample code below
- Read the section and configuration parameter details provided
- Use the above information to create your own rightworker.yml configuration file
The sample code below was written in Ruby. Use this code as a template for creating your own job producer.
In the sample code above, the parameters are defined in three sub-sections.
The 'Environment' section is the highest-level section in the configuration file and is commonly used to create different configuration setups, such as for development, testing, and production. You can have multiple environments and use a different RightGrid application for each environment. Each environment section requires a subsection called 'RightWorkersDaemon'.
In this example, we are defining a 'development' environment.
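As a rough illustration of the environment layout described above, the following Ruby sketch parses a YAML string containing two environments and checks that each one has the required 'RightWorkersDaemon' subsection. The exact nesting, the 'production' entry, the placeholder values, and the helper name missing_daemon_sections are assumptions made for illustration; they are not taken from the RightGrid sample file.

```ruby
require 'yaml'

# Hypothetical configuration text. Only the section name 'RightWorkersDaemon'
# and the key names come from this tutorial; the values are placeholders.
CONFIG_TEXT = <<~YAML
  development:
    RightWorkersDaemon:
      log: STDOUT
      halt_on_exit: false
  production:
    RightWorkersDaemon:
      log: rightgrid.log
      halt_on_exit: true
YAML

# Returns the names of environments that lack a 'RightWorkersDaemon' subsection.
def missing_daemon_sections(config)
  config.reject { |_env, body| body.is_a?(Hash) && body.key?('RightWorkersDaemon') }.keys
end

config = YAML.safe_load(CONFIG_TEXT)
puts "Environments: #{config.keys.join(', ')}"
puts "Missing RightWorkersDaemon: #{missing_daemon_sections(config).inspect}"
```

A check like this can be run before starting the daemon to catch an environment that was added without its required subsection.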
'RightWorkersDaemon' section:
The 'RightWorkersDaemon' section holds all of the RightGrid-specific configuration information. RightGrid will ignore any other subsections of the 'Environment' section.
The 'RightWorkersDaemon' section includes the following variables:
aws_access_key - your AWS access key ID.
aws_secret_access_key - your AWS secret key.
log - name of RightGrid's log file. This is not the application's log file. NOTE: Stream names like STDOUT or STDERR are also allowed.
email - (optional) an email address that will receive any error messages.
halt_on_exit - defines whether and when an instance will be shut down or terminated.
- If set to true, the RightGrid daemon and the EC2 instance will both exit if no more jobs are in the input queue and it has been 55-59 minutes since the start of the last paying hour. This process maximizes the usage time of all worker instances while minimizing overall usage costs since Amazon charges by whole hours. (Useful for production environments.)
- If set to false, the RightGrid daemon never exits and the instance is not terminated. (Useful for debugging purposes.)
workers - the number of RightGrid "workers" to be started on each worker instance. Each "worker" can process one work unit at a time. For small jobs, you might want to increase the number of "workers" per instance in order to maximize an instance's CPU usage. (Default = 1)
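The halt_on_exit timing rule described above can be sketched in Ruby as follows. This is not RightGrid's implementation: the method name halt_now? and its parameters are hypothetical, and the sketch assumes that "paying hours" are counted from the instance launch time.

```ruby
# Decide whether a worker instance should shut down, following the rule
# described above: halt only when halt_on_exit is true, the input queue is
# empty, and the instance is 55-59 minutes into its current billing hour.
# Assumption: billing hours are measured from launch_time (a Time object).
def halt_now?(halt_on_exit:, queue_empty:, launch_time:, now: Time.now)
  return false unless halt_on_exit && queue_empty

  minutes_into_hour = ((now - launch_time) / 60).floor % 60
  (55..59).cover?(minutes_into_hour)
end

launch = Time.utc(2010, 1, 1, 12, 0, 0)
# 57 minutes into the first hour, queue empty
puts halt_now?(halt_on_exit: true, queue_empty: true, launch_time: launch, now: launch + 57 * 60)
# 30 minutes into the hour: the hour is already paid for, so keep the instance
puts halt_now?(halt_on_exit: true, queue_empty: true, launch_time: launch, now: launch + 30 * 60)
```

Waiting until the end of the billing hour keeps an idle instance available for new jobs during time that has already been paid for.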
The 'User' section holds application-specific configuration information. It is a useful way to pass common variables and information to all worker instances, and it can contain any number of key/value pairs. RightGrid does not read this information, but rather passes it on to the do_work() method of the application class as part of the message_env hash. In the sample code above, the 'custom_entry_a' and 'custom_entry_b' values will be passed to all worker instances.
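To make the flow of 'User' values concrete, here is a minimal Ruby sketch. It assumes that the user entries appear as top-level keys of message_env and that do_work() takes the environment hash and the message as arguments; neither detail is confirmed by this tutorial. The class ExampleWorker and the helper build_message_env are hypothetical.

```ruby
require 'yaml'

# Hypothetical 'User' section. The key names are the ones mentioned in this
# tutorial; the values are placeholders.
USER_SECTION = YAML.safe_load(<<~YAML)
  custom_entry_a: value_a
  custom_entry_b: value_b
YAML

# Assumption: user entries are merged into message_env as top-level keys.
def build_message_env(user_section, queue_settings)
  queue_settings.merge(user_section)
end

# Stand-in for an application class; the do_work signature is assumed.
class ExampleWorker
  def do_work(message_env, message)
    "#{message} handled with #{message_env['custom_entry_a']}"
  end
end

env = build_message_env(USER_SECTION, { 's3_in' => '/tmp/s3_in' })
puts ExampleWorker.new.do_work(env, 'job-1')
```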
The 'Queue' subsection defines one or more input queues to monitor. If multiple queues are specified, RightGrid will monitor them in round-robin order. The title of each queue subsection must be the exact name of the input queue.
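Round-robin monitoring can be pictured with the short Ruby sketch below. In-memory arrays stand in for SQS input queues, the queue names are hypothetical, and the loop stops when everything is drained, whereas a real daemon would keep polling. It illustrates only the visiting order, not how RightGrid talks to SQS.

```ruby
# In-memory stand-ins for SQS input queues, keyed by (hypothetical) queue name.
queues = {
  'input_queue_a' => %w[a1 a2 a3],
  'input_queue_b' => %w[b1]
}

# Visit the queues in round-robin order, taking at most one message per visit,
# until every queue is empty. Returns [queue_name, message] pairs in the order
# the messages were taken. Note that the arrays passed in are emptied.
def drain_round_robin(queues)
  taken = []
  until queues.values.all?(&:empty?)
    queues.each do |name, messages|
      message = messages.shift
      taken << [name, message] if message
    end
  end
  taken
end

drain_round_robin(queues).each { |name, message| puts "#{name}: #{message}" }
```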
The following variables are common to all queues:
invocation_model - can take one of two values: 'oneshot' or 'persistent' (Default = oneshot)
message_decoder - the name of the Ruby class to use as a message codec. The class must implement the codec interface described in the code below, and the file containing the codec class definition must reside in the working directory of RightGrid.
result_queue - the SQS queue where result messages will be sent. If omitted, result messages are not sent to any queue.
s3_in - specifies a location on the local filesystem under which all S3 input data will be placed. By default, this input data is staged to an automatically generated location on the local filesystem.
s3_in_overwrite - if true, files already present on the local filesystem will be re-downloaded from S3 and overwritten when each new workunit requires them. (Default = True)
s3_in_delete - if true, it will remove downloaded files when the worker finishes processing the workunit. NOTE: Only files are removed. Directory structures are left intact. (Default = True)
s3_in_flat - controls whether file hierarchies on S3 are collapsed into a flat file space on the local filesystem. If no 'local_path_and_name' is specified for a downloaded file, its local path is set to:
- message_env['s3_in']/bucket/key if s3_in_flat == false;
- message_env['s3_in']/filename if s3_in_flat == true, where 'filename' is the base name of the key, without any bucket.
If the file has its own local name specified, 's3_in_flat' does not affect it. (Default = False)
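The path rules for s3_in_flat can be expressed as a small Ruby function. The method name local_download_path and its parameter names are hypothetical, and the sketch assumes that a specified 'local_path_and_name' is used exactly as given; only the two path patterns come from the description above.

```ruby
# Compute the local path for a file downloaded from S3.
# - With an explicit local_path_and_name, s3_in_flat has no effect.
# - Otherwise the path is s3_in/bucket/key (hierarchy kept) or
#   s3_in/<base name of key> (hierarchy flattened).
def local_download_path(s3_in:, bucket:, key:, s3_in_flat: false, local_path_and_name: nil)
  return local_path_and_name if local_path_and_name

  if s3_in_flat
    File.join(s3_in, File.basename(key))
  else
    File.join(s3_in, bucket, key)
  end
end

puts local_download_path(s3_in: '/tmp/s3_in', bucket: 'my-bucket', key: 'in/2010/data.txt')
puts local_download_path(s3_in: '/tmp/s3_in', bucket: 'my-bucket', key: 'in/2010/data.txt', s3_in_flat: true)
```

Note that with s3_in_flat set to true, two keys with the same base name map to the same local file, so flattening suits inputs whose file names are unique.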
s3_out - specifies a bucket and key on S3 under which RightGrid will upload any output files generated by the application. If omitted, output will not be uploaded.
s3_log - specifies a bucket and key on S3 under which RightGrid will upload any log files generated by the application. If omitted, logs will not be uploaded.
receive_message_timeout - SQS visibility timeout for messages retrieved from the input queue. While a message is invisible, other RightGrid instances will not be able to dequeue it. Note that this does not guarantee that a workunit won't be processed by multiple RightGrid instances.
life_time_after_fault - if errors occur while processing a work unit, RightGrid will process it again for a maximum of 'life_time_after_fault' seconds. If the work unit hasn't been successfully processed in that time interval, it is deleted. This parameter is only used if the message has a 'created_at' timestamp in its body. (Default = 3600 seconds, or 1 hour)
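The life_time_after_fault rule amounts to a simple age check, sketched below in Ruby. The method name retry_after_fault? is hypothetical, the message is represented as a plain Hash, and the 'created_at' value is assumed to be in a format that Time.parse accepts. The text does not say what happens when 'created_at' is absent, so this sketch simply keeps retrying in that case.

```ruby
require 'time'

DEFAULT_LIFE_TIME_AFTER_FAULT = 3600 # seconds, the default stated above

# Returns true if a failed work unit may still be retried, false if it has
# outlived life_time_after_fault and should be deleted.
def retry_after_fault?(message, now: Time.now, life_time_after_fault: DEFAULT_LIFE_TIME_AFTER_FAULT)
  created_at = message['created_at']
  # Assumption: without a timestamp the limit is not applied.
  return true if created_at.nil?

  now - Time.parse(created_at) <= life_time_after_fault
end

message = { 'created_at' => '2010-01-01 12:00:00 UTC' }
puts retry_after_fault?(message, now: Time.utc(2010, 1, 1, 12, 30, 0)) # 30 minutes old
puts retry_after_fault?(message, now: Time.utc(2010, 1, 1, 13, 30, 0)) # 90 minutes old
```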
Variables for "one-shot" queues:
default_worker_name - the name of the worker class to invoke on work units; each work unit will be passed to the 'do_work()' method of this class. The file containing the class definition should reside on the Ruby search path ($:). In this example, we define 'RGHelloWorld' as the worker class in the sample code of the customer application below.
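One way to picture how a configured class name becomes a call to do_work() is the Ruby sketch below. It is not RightGrid's dispatch code: the helper invoke_worker is hypothetical, the body of RGHelloWorld is invented for illustration, and the do_work signature (environment hash plus message hash) is an assumption.

```ruby
# Example worker class. The name comes from this tutorial; the body and the
# do_work signature are assumptions.
class RGHelloWorld
  def do_work(_message_env, message)
    { 'result' => "Hello, #{message['name']}" }
  end
end

# Look up the configured class by name and hand it one work unit.
# Object.const_get raises NameError if the class has not been loaded,
# which is why the class file must be on the Ruby search path.
def invoke_worker(default_worker_name, message_env, message)
  worker_class = Object.const_get(default_worker_name)
  worker_class.new.do_work(message_env, message)
end

puts invoke_worker('RGHelloWorld', {}, { 'name' => 'RightGrid' }).inspect
```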
Variables for "persistent" queues:
path_to_executable - the location of the application or kicker executable.
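The queue variables above might be combined as in the following Ruby sketch, which parses two hypothetical queue subsections and fills in the defaults stated in this tutorial. The queue names, the placeholder values (including the receive_message_timeout figure and the executable path), the YAML nesting, and the helper effective_settings are all assumptions; consult the RightGrid User Guide for the authoritative layout.

```ruby
require 'yaml'

# Hypothetical queue subsections: one "one-shot" queue and one "persistent" queue.
QUEUES_TEXT = <<~YAML
  my_oneshot_queue:
    default_worker_name: RGHelloWorld
    result_queue: my_result_queue
    s3_in: /tmp/s3_in
    receive_message_timeout: 3600
  my_persistent_queue:
    invocation_model: persistent
    path_to_executable: /path/to/your/executable
YAML

# Defaults stated in this tutorial for settings a queue may omit.
QUEUE_DEFAULTS = {
  'invocation_model' => 'oneshot',
  's3_in_overwrite' => true,
  's3_in_delete' => true,
  's3_in_flat' => false,
  'life_time_after_fault' => 3600
}.freeze

# Merge the documented defaults with the values a queue actually sets.
def effective_settings(queue_settings)
  QUEUE_DEFAULTS.merge(queue_settings)
end

YAML.safe_load(QUEUES_TEXT).each do |name, settings|
  puts "#{name}: #{effective_settings(settings)['invocation_model']}"
end
```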
These are only a sample of the variables that can be defined in the rightworker.yml file. For a complete list of all the variables, see the RightGrid User Guide.