Flow states retry and error handling
States can encounter runtime errors for various reasons:
- Transient errors, such as network outages and congestion
- JSON path resolution errors
By default, when a state encounters a runtime error, the entire ZIS flow fails.
Action and Map states support error handling using a Catch block. You can use a Catch block to avoid failing the entire ZIS flow when a state encounters an error. See Using a Catch block.
Specific runtime errors in specific states can also trigger a retry of a failed ZIS flow. See ZIS flow retry logic.
Using a Catch block
A Catch block specifies a fallback state to run if an Action or Map state encounters a runtime error. Example:
"Zendesk.GetTickets": {
  "Type": "Action",
  "ActionName": "zis:INTEGRATION:action:zendesk.get_tickets",
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "Next": "AnotherState",
      "ResultPath": "$.error_response"
    }
  ],
  "Next": "NextState"
}
Supported properties for a Catch block
Objects in the Catch array support the following properties.
Name | Type | Mandatory | Description |
---|---|---|---|
ErrorEquals | array of strings | true | Array of error names. The only valid value is "States.ALL", which matches all error names |
Next | string | true | Fallback state to run if the Action or Map state encounters a runtime error and can't retry |
ResultPath | string | false | Reference path used to store the state's output. Later states of the ZIS flow can access the output at this path. Defaults to "$", which replaces the state's input with its output |
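The property table above can be turned into a small pre-upload check. The following Python sketch is illustrative only: validate_catch_entry is a hypothetical helper, not part of ZIS, and it takes a deliberately strict reading of the table (ErrorEquals must be exactly ["States.ALL"]). Other keys, such as Comment, are ignored.

```python
from typing import Any, Dict, List


def validate_catch_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one object of a Catch array.

    Hypothetical helper: the rules mirror the property table above,
    read strictly. An empty list means no problems were found.
    """
    problems: List[str] = []
    # Mandatory: the only documented valid value is "States.ALL".
    if entry.get("ErrorEquals") != ["States.ALL"]:
        problems.append('ErrorEquals must be ["States.ALL"]')
    # Mandatory: name of the fallback state.
    next_state = entry.get("Next")
    if not isinstance(next_state, str) or not next_state:
        problems.append("Next must be a non-empty string")
    # Optional: reference path for the state's output.
    if "ResultPath" in entry and not isinstance(entry["ResultPath"], str):
        problems.append("ResultPath must be a string when present")
    return problems


catch_entry = {
    "ErrorEquals": ["States.ALL"],
    "Next": "AnotherState",
    "ResultPath": "$.error_response",
}
print(validate_catch_entry(catch_entry))  # []
```

A check like this only catches structural mistakes. It does not confirm that the state named in Next exists in the flow.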
Catching error response codes
If a custom action's HTTP request receives a non-2xx HTTP response status code, its Action state returns a runtime error. The state also passes the following error message to the $.Cause reference path:
external action failed due to status code: {http_status_code}
For example, if a custom action's request receives a 404 HTTP status code, the $.Cause path contains the following message:
external action failed due to status code: 404
You can use a Catch block and Choice state to conditionally run different flow states based on the $.Cause path's message. For an example, see Manually retrying a ZIS flow.
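Outside a flow, for example when reviewing logged error messages, the same message format can be parsed to recover the status code. The Python sketch below assumes the message looks exactly like the format shown above. The function name status_code_from_cause and the regular expression are illustrative assumptions, not part of ZIS.

```python
import re
from typing import Optional

# Matches the documented message format; the pattern itself is an assumption.
_CAUSE_PATTERN = re.compile(r"^external action failed due to status code: (\d{3})$")


def status_code_from_cause(cause: str) -> Optional[int]:
    """Return the HTTP status code embedded in a $.Cause message, or None."""
    match = _CAUSE_PATTERN.match(cause.strip())
    return int(match.group(1)) if match else None


print(status_code_from_cause("external action failed due to status code: 404"))  # 404
print(status_code_from_cause("some other runtime error"))  # None
```

Inside a flow, a Choice state compares the whole string instead, so each status code of interest needs its own choice rule.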
ZIS flow retry logic
The following table contains the states and runtime errors that trigger a retry of a failed ZIS flow. The interval between retries varies based on the state and error.
State | Runtime error | Interval between retries |
---|---|---|
Action | 429 rate limit error | See Retrying a ZIS flow after rate limiting |
Action | 5xx server error | See Retrying a ZIS flow after a server error |
Action | Any error other than the following: | Retry runs immediately |
Map | A child state encounters a runtime error that triggers a flow retry | Uses the retry interval for the child state and error |
A Succeed or Fail state won't trigger a retry of a ZIS flow, even if it encounters a runtime error.
ZIS attempts to run a flow at most four times: the initial attempt and up to three retry attempts. During a retry, the entire flow runs again from the beginning. Ensure your use case and flow logic account for this, for example by making sure that states that ran before the failure can safely run again.
Using a Catch block to capture an error that ZIS normally retries overrides ZIS's automatic retry behavior. In this case, you must manually retry the request. For an example, see Manually retrying a ZIS flow.
Retrying a ZIS flow after rate limiting
A 429 HTTP status code means "too many requests." When a web server returns an error with this code, it means rate limiting has kicked in because the client is sending too many requests too quickly.
Some servers include a Retry-After header in responses with a 429 error. This header specifies how long you should wait before retrying the request. The header may provide this interval as a number of seconds to wait or a date and time after which you can retry the call.
A custom action in an Action state can send an HTTP request. If a custom action's HTTP request receives a 429 HTTP response status code and the flow fails, the flow will retry based on the Retry-After interval.
Retry-After interval | Flow retry behavior |
---|---|
Less than 120 seconds | Retry after the Retry-After interval |
120 seconds or greater | Retry after 120 seconds |
No Retry-After header | Retry after 30–35 seconds. If the retry fails, attempt a second retry after a further 60–65 seconds. If the second retry fails, attempt a third and final retry after a further 120–125 seconds. |
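The table above can be read as a small decision rule. The Python sketch below is one interpretation, not ZIS's implementation. It assumes the Retry-After value has already been converted to seconds, and that the same rule applies to each of the three retries. The name retry_wait_window is hypothetical.

```python
from typing import Optional, Tuple

MAX_RETRY_AFTER_SECONDS = 120
# (min, max) wait in seconds before the first, second, and third retry
# when the response has no Retry-After header.
BACKOFF_WINDOWS = [(30, 35), (60, 65), (120, 125)]


def retry_wait_window(
    retry_number: int, retry_after_seconds: Optional[float] = None
) -> Tuple[float, float]:
    """Return the (min, max) wait before the given retry (1, 2, or 3)."""
    if not 1 <= retry_number <= len(BACKOFF_WINDOWS):
        raise ValueError("a flow is retried at most three times")
    if retry_after_seconds is not None:
        # Honor the header, but never wait longer than 120 seconds.
        wait = min(retry_after_seconds, MAX_RETRY_AFTER_SECONDS)
        return (wait, wait)
    return BACKOFF_WINDOWS[retry_number - 1]


print(retry_wait_window(1, retry_after_seconds=45))   # (45, 45)
print(retry_wait_window(1, retry_after_seconds=300))  # (120, 120)
print(retry_wait_window(2))                           # (60, 65)
```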
Retrying a ZIS flow after a server error
A 5xx HTTP status code means something has gone wrong with the responding web server.
A custom action in an Action state can send an HTTP request. If a custom action's HTTP request receives a 5xx HTTP response status code and the flow fails, the flow will retry after 30–35 seconds. If the retry fails, ZIS attempts a second retry after a further 60–65 seconds. If the second retry fails, ZIS attempts a third and final retry after a further 120–125 seconds.
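Because each wait starts after the previous retry fails, the waits add up. The Python sketch below totals the intervals stated above to estimate when each retry starts relative to the first failure. It is a rough estimate that ignores the time each attempt takes to run, and cumulative_retry_offsets is a hypothetical helper.

```python
from typing import List, Tuple

# (min, max) wait in seconds before the first, second, and third retry.
RETRY_WINDOWS: List[Tuple[int, int]] = [(30, 35), (60, 65), (120, 125)]


def cumulative_retry_offsets(windows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (earliest, latest) start offsets for each retry, in seconds."""
    offsets = []
    earliest = latest = 0
    for low, high in windows:
        earliest += low
        latest += high
        offsets.append((earliest, latest))
    return offsets


for number, (earliest, latest) in enumerate(cumulative_retry_offsets(RETRY_WINDOWS), start=1):
    print(f"retry {number}: {earliest}-{latest} s after the first failure")
# retry 1: 30-35 s after the first failure
# retry 2: 90-100 s after the first failure
# retry 3: 210-225 s after the first failure
```

Under these assumptions, a flow that keeps receiving 5xx responses makes its final attempt roughly three and a half minutes after the first failure.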
Manually retrying a ZIS flow
To manually retry a flow for other types of errors, use a Catch block in an Action state to catch the error. Then use a Choice state to check the $.Cause reference path for a specific error code.
For example, your workflow might look like this:
- GET an object
- Catch and check for a 404 error, indicating the object doesn't exist
- Create the object
- Resume the rest of your workflow
Example:
{
  "StartAt": "Zendesk.DoSomething",
  "States": {
    "Zendesk.DoSomething": {
      "Type": "Action",
      "ActionName": "zis:YOUR_INTEGRATION_NAME:action:zendesk.YOUR_ACTION_NAME",
      "Parameters": {
        "ticketId.$": "{{$.input.ticket.id}}"
      },
      "Catch": [
        {
          "Comment": "ZIS only supports catching all error types, i.e. States.ALL",
          "ErrorEquals": ["States.ALL"],
          "Next": "CheckErrorType"
        }
      ],
      "ResultPath": "$.do_something_result",
      "End": true
    },
    "CheckErrorType": {
      "Comment": "Checks whether the error caught is a 404",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.Cause",
          "StringEquals": "external action failed due to status code: 404",
          "Next": "log.errorCaught.404"
        }
      ],
      "Default": "log.errorCaught.other"
    },
    "log.errorCaught.404": {
      "Comment": "Use this branch to handle 404 error",
      "Type": "Succeed",
      "Message": "I caught a 404 error"
    },
    "log.errorCaught.other": {
      "Comment": "Use this branch to handle other errors",
      "Type": "Succeed",
      "Message": "I caught a non-404 error"
    }
  }
}
Flow circuit breaker
If more than 3,000 events occur for a flow within a 10-minute period, a circuit breaker is triggered if either of the following limits is reached:
- More than 50% of those events lead to flows failing with a retryable error
- More than 10,000 retryable errors occur within that period
If triggered, the circuit breaker will drop subsequent events for 30 seconds.
After 30 seconds, ZIS will process the next event for the flow. If successful, the error thresholds will be reset. If not, ZIS will wait another 30 seconds and repeat the cycle until the flow succeeds.
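The trigger conditions can be expressed as a single check over the counts for a 10-minute window. The Python sketch below is one reading of the thresholds described in this section, not ZIS's implementation. The function name circuit_breaker_trips and the way the counts are gathered are assumptions.

```python
MIN_EVENTS = 3_000             # breaker is only considered above this event count
MAX_FAILURE_RATIO = 0.5        # more than 50% of events fail with a retryable error
MAX_RETRYABLE_ERRORS = 10_000  # more than 10,000 retryable errors in the window


def circuit_breaker_trips(event_count: int, retryable_failure_count: int) -> bool:
    """Return True if the counts for one 10-minute window trip the breaker."""
    if event_count <= MIN_EVENTS:
        return False
    failure_ratio = retryable_failure_count / event_count
    return (
        failure_ratio > MAX_FAILURE_RATIO
        or retryable_failure_count > MAX_RETRYABLE_ERRORS
    )


print(circuit_breaker_trips(3_000, 3_000))    # False: not more than 3,000 events
print(circuit_breaker_trips(4_000, 2_001))    # True: more than 50% failed
print(circuit_breaker_trips(30_000, 10_001))  # True: more than 10,000 errors
```

A check like this can help estimate whether a burst of failing events from a noisy source is likely to cause dropped events.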