Note: Please go to docs.rightscale.com to access the current RightScale documentation set.

Attributes and Error Handling

Attributes and Error Handling Overview

Some statements can be adorned with attributes that affect their behavior. Attributes appear after the statement on the same line, right before the do keyword for expressions that have one (e.g. define, sub, concurrent, map, foreach). Multiple attributes can be specified on a single statement, in which case they are separated with commas. Attributes make it possible to specify and handle timeouts and to handle errors and cancelations.

An attribute has a name and a value. The syntax is:


name: value

The acceptable types for attribute values depend on the attribute: they may be numbers (e.g. wait_task: 1), strings (e.g. on_timeout: skip), arrays (e.g. wait_task: [ "this_task", "that_task" ]), or definition names with arguments (e.g. on_timeout: handle_timeout()).

Some of the attributes define behavior that applies to tasks, while others define behavior for the whole process. Processes and tasks are described in detail in the Cloud Workflow Processes section. To understand the behavior of the attributes described below, it is enough to know that a single process consists of one or more tasks, and that a task is a single thread of execution.

Some attributes attach behavior to the expression they adorn and to all of its sub-expressions. The sub-expressions of an expression are the expressions that belong to the block defined by the parent expression. Not all expressions define blocks, so not all expressions have sub-expressions. Expressions that may have sub-expressions include define, sub, concurrent and the looping expressions (foreach, concurrent foreach, etc.).

All attributes supported by the language are listed below in alphabetical order:

| Attribute | Applies to | Possible values | Description |
|---|---|---|---|
| on_error | define, sub, call | name of a definition with arguments, skip or retry | Behavior to trigger when an error occurs in the expression or in one of its sub-expressions. |
| on_rollback | define, sub, call, concurrent, concurrent map, concurrent foreach | name of a definition with arguments | Definition called when the expression causes a rollback (due to an error or to the task being canceled). |
| on_timeout | sub, concurrent, call | name of a definition with arguments, skip or retry | Behavior to trigger when a timeout occurs in the expression or in one of its sub-expressions. |
| task_label | sub, call, define | string representing a label | Allows processes to return progress information to clients. The label is an arbitrary name that gets returned by the cloud workflow APIs. |
| task_name | sub | string representing the name of a task | Changes the current task name to the given value. |
| task_prefix | concurrent foreach, concurrent map | string representing the prefix of task names | Specifies the prefix of the names of tasks created by a concurrent loop (the suffix is the iteration index). |
| timeout | sub, call | string representing a duration | Maximum time allowed for the statement and any of its sub-expressions to execute. |
| wait_task | concurrent, concurrent foreach, concurrent map | number of tasks, or name(s) of the task(s), to wait on | Pauses the execution of a task until the condition defined by the value is met. |

The task_name, task_prefix and wait_task attributes are described in Cloud Workflow Processes. This section describes in detail the other attributes, which deal with errors and timeouts.

Errors and Error Handling

Defining the steps involved in handling error cases is an integral part of all cloud workflows. This is another area where workflows and traditional programs differ: a workflow needs to describe the steps taken when errors occur the same way it describes the steps taken in the normal flow. Error handlers are thus first-class citizens in RCL and are implemented as definitions themselves. Handling an error could mean alerting someone, cleaning up resources, triggering another workflow, etc. RCL makes it possible to do any of these things through the on_error attribute. Only the define, sub and concurrent expressions may be adorned with that attribute.

Besides the name of a handler definition (described below), the associated value can be one of the following strings:

  • skip: aborts the execution of the statement and any sub-expression, then proceeds to the next statement. No cancel handler is called in this case.
  • retry: retries the execution of the statement and any sub-expression.

To illustrate the behavior associated with the different values, consider the following snippet:


sub on_error: skip do
  raise "uh oh"
end
log_info("I'm here")

The engine raises an error when the raise expression executes. This error triggers the on_error attribute of the enclosing sub expression. The associated value is skip, which means that the engine ignores the error and proceeds to the first expression after the block (the log_info expression). Had the value been retry instead, the engine would have re-run the block, which in this case would result in an infinite loop.
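A few lines of Python can model the control flow above. This sketch is for intuition only and is neither RCL nor the engine's implementation. UserError, run_sub and the max_runs guard are hypothetical names. max_runs exists only so that the model terminates where the RCL snippet with retry would loop forever.

```python
class UserError(Exception):
    """Stands in for an error raised with RCL's raise keyword."""


def run_sub(block, on_error, max_runs=5):
    """Model `sub on_error: <behavior> do ... end`; return how often the block ran.

    max_runs exists only in this model so that retry terminates.
    """
    runs = 0
    while True:
        runs += 1
        try:
            block()
            return runs
        except UserError:
            if on_error == "skip":
                return runs  # abort the block, continue with the next statement
            if on_error == "retry" and runs < max_runs:
                continue  # re-run the whole block from the top
            raise  # anything else: let the error propagate


def always_fails():
    raise UserError("uh oh")


calls = []


def fails_twice():
    calls.append("run")
    if len(calls) < 3:
        raise UserError("transient")


log = []
run_sub(always_fails, "skip")
log.append("I'm here")  # reached because skip swallowed the error
print(log)
print(run_sub(fails_twice, "retry"))  # the third run succeeds
```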

As mentioned in the introduction, a cloud workflow may need to define additional steps that need to be executed in case of errors. The on_error attribute allows specifying a definition that gets run when an error occurs. The syntax allows for passing arguments to the definition so that the error handler can be provided with contextual information upon invocation. On top of arguments being passed explicitly, the error handler also has access to all the variables and references that were defined in the scope of the expression that raised the error.

The error handler can stipulate how the caller should behave once it completes by assigning one of the string values listed above (skip or retry) to the special $_error_behavior local variable. If the handler does not set $_error_behavior, the caller uses the default behavior (raise) after the handler completes. This default behavior re-raises the error so that any error handler defined in an enclosing scope may handle it. If no error handler is defined, or if all error handlers end up re-raising, the task terminates and its final status is failed.
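The propagation rule can be sketched as a walk over the enclosing on_error handlers, starting from the innermost one. The Python below is an illustrative model with hypothetical names (resolve_error, log_only, skip_it). Each handler returns the value it would assign to $_error_behavior, and returning nothing stands for the default raise.

```python
def resolve_error(handlers):
    """Return (depth, behavior) for an error raised in the innermost scope.

    `handlers` lists the on_error handlers from the innermost scope outwards;
    None means that scope has no on_error attribute.
    """
    for depth, handler in enumerate(handlers):
        if handler is None:
            continue  # no on_error here: the error keeps propagating
        behavior = handler() or "raise"  # unset $_error_behavior means raise
        if behavior in ("skip", "retry"):
            return depth, behavior
        # "raise": re-raise so that an enclosing handler can deal with it
    return None, "failed"  # nothing resolved the error: the task fails


def log_only():
    return None  # e.g. a handler that only logs and never sets the behavior


def skip_it():
    return "skip"


print(resolve_error([log_only, None, skip_it]))  # (2, 'skip')
print(resolve_error([log_only]))                 # (None, 'failed')
```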

The following example shows how to implement a limited number of retries using an error handler:


define handle_retries($attempts) do
  if $attempts <= 3
    $_error_behavior = "retry"
  else
    $_error_behavior = "skip"
  end
end
$attempts = 0
sub on_error: handle_retries($attempts) do
  $attempts = $attempts + 1
  ... # Statements that will get retried 3 times in case of errors
end
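The handler receives $attempts after the block has already incremented it, so the `<= 3` test allows one initial run plus three retries. The following Python model uses hypothetical function names, with RuntimeError standing in for any RCL error. It checks that count under the assumption that every run of the block fails:

```python
def handle_retries(attempts):
    # Mirrors the RCL handler: retry while attempts <= 3, then skip.
    return "retry" if attempts <= 3 else "skip"


def run_with_limited_retries(block):
    """Model `sub on_error: handle_retries($attempts) do ... end`.

    Returns the number of times the block body ran.
    """
    attempts = 0
    while True:
        attempts += 1  # $attempts = $attempts + 1 at the top of the block
        try:
            block()
            return attempts
        except RuntimeError:
            # The handler sees the value of $attempts at the time of the error.
            if handle_retries(attempts) == "skip":
                return attempts


def always_fails():
    raise RuntimeError("boom")


print(run_with_limited_retries(always_fails))  # 4: the first run plus 3 retries
```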

Errors can originate from evaluating expressions (e.g. division by 0) or from executing resource actions (e.g. trying to launch an already running server). Errors generated intentionally with the raise keyword are a variation on the former. In all these cases, the innermost error handler defined using the on_error attribute gets executed.

The raise keyword, optionally followed by a message, raises an error that can be caught by an error handler. All error handlers have access to a special variable that contains information about the error being raised: $_error is a hash that contains three keys:

  • "type": A string that describes the error type. All errors raised using the raise keyword have the type set to user.
  • "message": A string that contains information specific to this occurrence of the error. For user errors, the string contains the message given to the raise keyword.
  • "origin": A [ line, column ] array pointing at where the error occurred in the RCL source.
     

define handle_error() do
  log_error($_error["type"] + ": " + $_error["message"]) # Will log "user: ouch"
  $_error_behavior = "skip"
end
sub on_error: handle_error() do
  raise "ouch"
end

Resource Action Errors

Resource actions always operate atomically on resource collections. In other words, the expression @servers.launch() is semantically equivalent to making concurrent launch API calls to all resources in the @servers collection. This means that multiple errors may happen concurrently if multiple resources in the collection fail to run the action. When that happens, the error handler needs access to the set of resources that failed, the set that succeeded and the initial collection in order to take the appropriate actions. When the error results from calling an action on a resource collection, RCL makes the following variables available to the error handler, in addition to the special $_error variable described above:

  • @_original: The resource collection that initially executed the action that failed.
  • @_done: A resource collection containing all the resources that successfully executed the action.
  • @_partial: A resource collection containing the partial results of the action if the action returns a collection of resources.
  • $_partial: An array containing the partial results of the action if the action returns an array of values.
  • $_errors: An array of hashes containing specific error information.
     

The $_errors variable contains an array of hashes. Each element includes the following values:

  • "resource_href": Href of the underlying resource on which the action failed, e.g. "/account/71/instances/123"
  • "action": Name of the action that failed, e.g. "run_executable"
  • "action_arguments": Hash of action arguments as specified in the definition, e.g. { "recipe_name": "sys:timezone" }
  • "request": Hash containing information related to the request, with the following keys:
    1. "url": Full request URL, e.g. "https://my.rightscale.com/instances/...run_executable"
    2. "verb": HTTP verb used to make the request, e.g. "POST"
    3. "headers": Hash of HTTP request headers and associated value
    4. "body": Request body (string)
  • "response": Hash containing information related to the response, with the following keys:
    1. "code": HTTP response code (string)
    2. "headers": Hash of HTTP response headers
    3. "body": Response body (string)
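The relationship between @_original, @_done, @_partial and $_errors can be sketched in Python. The model below is an illustration built on assumptions, not the engine. It calls the action sequentially, whereas the engine makes the calls concurrently. It records only the resource_href and action keys of each $_errors entry, and it uses hypothetical hrefs and a hypothetical launch function.

```python
def run_action(hrefs, action):
    """Model calling `action` on every resource of a collection.

    Returns None when every call succeeds, otherwise a dict with the values
    an error handler would see as @_original, @_done, @_partial/$_partial
    and $_errors.
    """
    done, partial, errors = [], [], []
    for href in hrefs:  # the engine issues these calls concurrently
        try:
            result = action(href)
        except RuntimeError:
            errors.append({"resource_href": href, "action": action.__name__})
            continue
        done.append(href)        # the resource ran the action successfully
        partial.append(result)   # its result is part of the partial result
    if not errors:
        return None
    return {"original": list(hrefs), "done": done,
            "partial": partial, "errors": errors}


def launch(href):
    # Hypothetical action: the second server fails to launch.
    if href.endswith("/2"):
        raise RuntimeError("launch failed")
    return href.replace("/servers/", "/instances/")


servers = ["/api/servers/1", "/api/servers/2", "/api/servers/3"]
ctx = run_action(servers, launch)
print(ctx["done"])
print(ctx["errors"])
```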

For resource action errors, the $_error variable has the type "resource_action". Its message is a string containing the detailed error message, including the problem, summary and resolution fields.

Given the above, the following error handler retries terminating the instances that failed to terminate:


define handle_terminate_error do
  foreach $error in $_errors do
    @instance = rs.get($error["resource_href"]) # Retrieve the instance that failed to terminate
    if @instance.state != "stopped" # Make sure it is still running
      log_error("Instance " + @instance.name + " failed to terminate, retrying...")
      sub on_error: skip do
        @instance.terminate() # If so, try again to terminate but this time ignore any error
      end
    end
  end
  $_error_behavior = "skip" # Proceed with the next statement in caller
end
sub on_error: handle_terminate_error() do
  @instances.terminate()
end

In the definition above, the error handler sets the special $_error_behavior local variable to "skip". This means that the process proceeds to the next statement after the call to terminate() even if some of the instances fail to terminate. Note how the handler itself uses on_error to catch errors and ignore them (using skip).

Actions may return nothing, a collection of resources, or an array of values. When an action has a return value (collection or array), the error handler needs to be able to modify that value before it is returned as the result of the action expression. For example, an error handler may retry certain actions and, as a result, may need to add to the returned value, which initially only contains values for the resources that ran the action successfully. An error handler achieves this by updating the @_partial collection or the $_partial array: the content of this reference or variable when the handler finishes executing is what the action call returns.

To take a concrete example let's consider the RightScale servers resource launch() action. This action returns a collection of launched instances. The following handler retries any failure to launch and updates the @_partial collection with instances that successfully launched on retry:


define retry_launch do
  foreach $error in $_errors do
    @server = rs.get($error["resource_href"]) # Retrieve the server that failed to launch
    if @server.state == "stopped" # Make sure it is still stopped
      log_error("Server " + @server.name + " failed to launch, retrying...")
      sub on_error: skip do
        @instance = @server.launch() # If so, try again to launch but this time ignore any error
        @_partial = @_partial + @instance # Only reached if the retried launch succeeded
      end
    end
  end
  $_error_behavior = "skip" # Proceed with the next statement in caller
end
sub on_error: retry_launch() do
  @instances = @servers.launch()
end

The definition above adds any instance that is successfully launched on retry to the @_partial collection, thereby making sure that it gets assigned to @instances as the result of the launch() action.
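The effect of the retry_launch handler on the return value can be modeled the same way. In this Python sketch, which uses hypothetical names and hrefs, the handler starts from the partial result. It appends an instance only when a retry succeeds and returns the list that the action expression would evaluate to:

```python
def retry_launch(ctx, launch):
    """Model of the retry_launch handler.

    Retries each failed resource once; a successful retry is appended to the
    partial result, which becomes the value of the action call.
    """
    partial = list(ctx["partial"])
    for error in ctx["errors"]:
        try:
            instance = launch(error["resource_href"])
        except RuntimeError:
            continue  # like `sub on_error: skip`: ignore a second failure
        partial.append(instance)  # only reached when the retry succeeded
    return partial


def launch(href):
    # Hypothetical: servers/2 succeeds on retry, servers/3 keeps failing.
    if href.endswith("/3"):
        raise RuntimeError("launch failed")
    return href.replace("/servers/", "/instances/")


ctx = {
    "partial": ["/api/instances/1"],
    "errors": [{"resource_href": "/api/servers/2"},
               {"resource_href": "/api/servers/3"}],
}
instances = retry_launch(ctx, launch)
print(instances)  # ['/api/instances/1', '/api/instances/2']
```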

Handlers and State

We've seen before that definitions executed via call only have access to the references and variables passed as arguments (and to global references and variables). Definitions executed as handlers, on the other hand, inherit all the local variables and references defined at the time the handler is invoked (that is, when an error is raised, a timeout occurs or a cancelation is triggered).


define handle_errors do
  log_error("Process failed while handling " + inspect(@servers)) # Note: handler has access to @servers
  $_error_behavior = "skip"
end
@servers = rs.get(href: "/api/servers/123")
sub on_error: handle_errors() do
  @servers.launch()
end

In the snippet above, the error handler has access to @servers even though that collection is defined in the main scope (the various log_xxx() functions allow for appending messages to process logs and the inspect() function produces a human friendly string representation of the object it is given).
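Passing each definition an explicit scope dictionary is one way to sketch the difference between the two scoping rules in Python. This is a hypothetical model. call_definition and invoke_handler are illustrative names, and globals are ignored. Binding explicit handler arguments after the inherited locals is an assumption.

```python
def call_definition(definition, args):
    """`call`: the definition sees only the arguments it is given.

    Global references and variables are left out of this model.
    """
    return definition(dict(args))


def invoke_handler(definition, local_scope, args):
    """Handler: sees every local defined when it fires, plus its arguments.

    Binding arguments after the inherited locals is an assumption of this model.
    """
    scope = dict(local_scope)
    scope.update(args)
    return definition(scope)


def handle_errors(scope):
    # Reads @servers without receiving it as an argument.
    servers = scope.get("@servers")  # None stands for "not visible"
    return "Process failed while handling " + repr(servers)


local_scope = {"@servers": ["/api/servers/123"]}
print(invoke_handler(handle_errors, local_scope, {}))
print(call_definition(handle_errors, {}))
```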

Timeouts

The timeout and on_timeout attributes respectively set a time limit on the execution of expressions and specify the behavior when that limit is reached:


sub timeout: 30m, on_timeout: handle_launch_timeout() do
  @server = rs.get(href: "/api/servers/1234")
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
  @server = rs.get(href: "/api/servers/1235")
  @instance = @server.launch()
  sleep_until(@instance.state == "operational")
end

The block in the snippet above must execute in less than 30 minutes; otherwise its execution is canceled and the handle_launch_timeout definition is executed. Timeout values can be suffixed with d, h, m or s (day, hour, minute or second, respectively).
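Converting such a value to seconds is simple arithmetic: 30m is 30 * 60 = 1800 seconds. The Python helper below is a sketch. It assumes that a timeout value is a single non-negative integer followed by exactly one of the four suffixes. This section does not say whether unsuffixed or compound values (such as 1h30m) are accepted, so the helper rejects them. The name duration_seconds is hypothetical.

```python
import re

# Seconds per suffix, as listed above: d(ay), h(our), m(inute), s(econd).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_seconds(value):
    """Convert a timeout value such as "30m" or "2h" to seconds."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if match is None:
        raise ValueError("unsupported duration: " + value)
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(duration_seconds("30m"))  # 1800
```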

Not every timeout attribute needs an associated on_timeout. When a timeout occurs, the on_timeout attribute of the innermost expression that defines one is triggered:


sub on_timeout: outer_handler() do
  ...
  sub timeout: 10m, on_timeout: inner_handler() do
    ...
    @instance = @server.launch()
    sleep_until(@instance.state == "operational")
    ...
  end
  ...
end

In the snippet above, inner_handler gets executed if the sleep_until function takes more than 10 minutes to return.

As with the on_error attribute, the value of the on_timeout attribute can be a definition name or one of the behavior values (skip or retry).

Note: Using the raise behavior in an on_timeout attribute causes the next enclosing on_timeout handler to be executed. Timeouts never cause error handlers to be executed, and vice versa.
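A small Python model can summarize the lookup rules for timeouts. It is illustrative only and uses hypothetical names. Each entry is the on_timeout value of an enclosing expression, innermost first, and raise hands the timeout to the next entry. Only on_timeout values are consulted, which matches the note that timeouts never run error handlers. This section does not describe what happens when no handler resolves the timeout, so the model simply returns (None, None) in that case.

```python
def resolve_timeout(on_timeouts):
    """Return (depth, behavior) for a timeout, or (None, None) if unhandled.

    on_timeouts holds the on_timeout values of the enclosing expressions,
    innermost first; None means the expression has no on_timeout. A value is
    a behavior string or a handler returning its $_timeout_behavior.
    """
    for depth, value in enumerate(on_timeouts):
        if value is None:
            continue  # no on_timeout on this expression: look further out
        behavior = value() if callable(value) else value
        if behavior in ("skip", "retry"):
            return depth, behavior
        # "raise" (or anything else, in this model) escalates outwards
    return None, None  # case not described in this section


def inner_handler():
    return "raise"


def outer_handler():
    return "skip"


print(resolve_timeout([inner_handler, outer_handler]))  # (1, 'skip')
```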

On top of specifying the behavior directly in the on_timeout attribute as in:


sub timeout: 10m, on_timeout: skip do
  @instance = @server.launch()
end

It's also possible for a definition handling the timeout to specify what the behavior should be by setting the $_timeout_behavior local variable:


define handle_timeout do
  $_timeout_behavior = "retry"
end

Finally, the timeout handler may accept arguments that can be specified with the on_timeout attribute. The values of the references and variables at the point when the timeout occurs are given to the handler:


define handle_timeout($retries) do
  if $retries < 3
    $_timeout_behavior = "retry"
  else
    $_timeout_behavior = "skip"
  end
end
$retries = 0
sub timeout: 10m, on_timeout: handle_timeout($retries) do
  $retries = $retries + 1
  sleep(10 * 60 + 1) # Force the timeout handler to trigger
end

The snippet above causes the handle_timeout definition to execute three times. The third time, $retries is equal to 3, so the timeout handler sets $_timeout_behavior to skip and the block is canceled.
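This handler compares with `< 3`, so the block runs three times: the initial run plus two retries. The following Python model uses hypothetical names and assumes that every run of the block exceeds the timeout. It records the $retries value seen by each invocation of the handler:

```python
def handle_timeout(retries):
    # Mirrors the RCL handler: retry while $retries < 3, then skip.
    return "retry" if retries < 3 else "skip"


def run_until_skipped():
    """Model the snippet, where every run of the block times out.

    Returns the $retries value seen by each handler invocation.
    """
    retries = 0
    seen = []
    while True:
        retries += 1          # $retries = $retries + 1
        # sleep(10 * 60 + 1) exceeds the 10m timeout, so the handler fires
        seen.append(retries)  # the handler receives the current $retries
        if handle_timeout(retries) == "skip":
            return seen       # the block is canceled and execution moves on


print(run_until_skipped())  # [1, 2, 3]
```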

Labels

The task_label attribute is used to report progress information to clients. It does not affect the execution of the process; it is simply a way to report what the process is currently doing. The task_label attribute can be used on sub and call:


define main do
  sub task_label: "Initialization" do
    ...
  end
  sub task_label: "Launching servers" do
    ...
  end
  call setup_app() task_label: "Setting up application"
end


Logging

As shown in several of the snippets above, RCL has built-in support for logging, which helps when troubleshooting and developing cloud workflows. Each process is associated with a unique log that is automatically created on launch. Logging is done using the log_title(), log_info() and log_error() functions:

  • log_title(): appends a section title to the log
  • log_info(): appends an informational message to the log
  • log_error(): appends an error message to the log

Logs for a process can be retrieved using the RightScale API or through the RightScale dashboard by looking at the process audit entries.

Summary

We have seen how a cloud workflow may use attributes to annotate statements and define additional behaviors. Attributes apply to the statement they adorn, and some also apply to its sub-expressions. Definitions can be written to handle errors, timeouts and cancelations. Definitions handling errors that occur during resource action execution have access to all the underlying low-level errors and can modify the return value of the action.

 

 

 

Last modified
09:47, 10 Mar 2015


© 2006-2014 RightScale, Inc. All rights reserved.
RightScale is a registered trademark of RightScale, Inc. All other products and services may be trademarks or servicemarks of their respective owners.