Disaster Recovery for AWS IoT

Disaster Recovery for AWS IoT provides failover guidance for your IoT devices. Customers with critical AWS IoT Core workloads can use this guidance to help store and process data in a second AWS Region if the primary Region is not available.

Benefits

Automatically replicates classic device shadows across Regions

Disaster Recovery for AWS IoT replicates classic device shadows and registry events by configuring a global Amazon DynamoDB table in the primary and secondary Regions.

Copies IoT devices, certificates, and policies across Regions

The Guidance implements an active-passive disaster recovery strategy and provides tools to copy existing IoT devices from your primary Region (active) to your secondary Region (passive).

Amazon Route 53 health checks

Disaster Recovery for AWS IoT uses Amazon Route 53 with health checks and traffic policies to direct traffic from the primary Region to the secondary Region in the event of a Region failover.

Overview

The diagram below presents the architecture you can build using the example code on GitHub.

Disaster Recovery for AWS IoT architecture

Replication flow

1. The code creates an Amazon DynamoDB table in each Region and then configures the two tables as one global table. You must turn on registry events in the primary Region.

2. The registry publishes event messages when AWS IoT things, thing types, and thing groups are created, updated, or deleted. A topic rule forwards these messages to the DynamoDB table in the primary Region. They are automatically replicated to the table in the secondary Region.

3. Amazon DynamoDB Streams captures the data on arrival in the secondary Region and invokes an AWS Lambda function (the Dynamo trigger).

4. The Dynamo trigger Lambda function initiates an AWS Step Functions workflow to forward the related event types to another Lambda function.

5. The related Lambda function creates, updates, or deletes several aspects of IoT things, thing groups, and thing types.

6. The Step Functions workflow creates or updates IoT things in the secondary Region. The Step Functions setup also includes retry rules to handle errors.
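The replication steps above can be sketched in a few lines of Python. This is only an illustration, not the code from the GitHub repository: it assumes that the DynamoDB item stores the registry event fields (`eventType`, `operation`, and so on) as top-level attributes and that the stream delivers the new item image. The names `unmarshal`, `plan_replication`, and `RESOURCE_BY_EVENT_TYPE` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def unmarshal(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-JSON attribute value into a plain Python value."""
    (type_tag, value), = attr.items()  # each attribute has exactly one type tag
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB transports numbers as strings
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    if type_tag == "L":
        return [unmarshal(v) for v in value]
    raise ValueError(f"unsupported attribute type: {type_tag}")


# Registry eventType -> kind of resource to replicate (hypothetical mapping names)
RESOURCE_BY_EVENT_TYPE = {
    "THING_EVENT": "thing",
    "THING_TYPE_EVENT": "thing_type",
    "THING_GROUP_EVENT": "thing_group",
}


def plan_replication(record: Dict[str, Any]) -> Optional[Tuple[str, str, Dict[str, Any]]]:
    """Return (resource, operation, event) for a stream record, or None to skip it."""
    # Assumption: only newly written or changed items carry a registry event.
    if record.get("eventName") not in ("INSERT", "MODIFY"):
        return None
    image = record.get("dynamodb", {}).get("NewImage")
    if not image:
        return None
    event = unmarshal({"M": image})
    resource = RESOURCE_BY_EVENT_TYPE.get(event.get("eventType"))
    operation = event.get("operation")
    if resource is None or operation not in ("CREATED", "UPDATED", "DELETED"):
        return None
    return resource, operation.lower(), event


# Example: a thing created in the primary Region arrives in the secondary Region.
example_record = {
    "eventName": "INSERT",
    "dynamodb": {
        "NewImage": {
            "eventType": {"S": "THING_EVENT"},
            "operation": {"S": "CREATED"},
            "thingName": {"S": "sensor-1"},
        }
    },
}
print(plan_replication(example_record)[:2])  # prints ('thing', 'created')
```

In a deployment along these lines, the returned resource kind and operation would select the branch of the Step Functions workflow that applies the change in the secondary Region; records that return `None` are ignored.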

Failover flow

A separate set of AWS CloudFormation templates creates health checks that can be used by Amazon Route 53 in the primary and secondary Regions.

7. Amazon Route 53 with health checks and traffic policies can be used for a Region failover. For more information about failover options, refer to Solution components. Amazon Route 53 currently supports only HTTP, HTTPS, and TCP health checks, so it cannot check an MQTT endpoint directly. This solution therefore bases the health check on the health of the Message Queuing Telemetry Transport (MQTT) message broker in AWS IoT Core, which it tests indirectly.

8. CloudFormation templates deploy an Amazon API Gateway resource, which calls a Lambda function. This Lambda function is configured as a device in IoT Core. When invoked, the Lambda function connects to IoT Core, subscribes to a topic, and publishes a configured number of messages. The Lambda function expects to receive the same number of messages on the topic it has subscribed to.

9. Amazon Route 53 health checks call the API Gateway resource and thereby test the MQTT message broker implicitly. As a layer of security, the Lambda function expects a query string and validates it before it connects to the message broker. If the query string does not match the expected value, the Lambda function issues an error message. The expected query string is configurable.
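The health check logic in the failover steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed Lambda function: the query parameter name `token`, the HTTP status codes, and the `mqtt_round_trip` callable (a stand-in for connecting to the message broker, subscribing, publishing, and counting the messages received) are all hypothetical. The sketch relies on the fact that Route 53 HTTP and HTTPS health checks treat 2xx and 3xx responses as healthy.

```python
import hmac
from typing import Any, Callable, Dict


def health_check_handler(
    event: Dict[str, Any],
    expected_token: str,
    messages_to_send: int,
    mqtt_round_trip: Callable[[int], int],
) -> Dict[str, Any]:
    """Validate the query string, then test the MQTT broker with a round trip.

    mqtt_round_trip(n) is a hypothetical helper that publishes n messages to a
    subscribed topic and returns how many of them were received back.
    """
    # API Gateway proxy events set queryStringParameters to None when absent.
    params = event.get("queryStringParameters") or {}
    supplied = params.get("token", "")  # parameter name is an assumption

    # Reject the request before touching the broker if the query string is wrong.
    if not hmac.compare_digest(supplied.encode(), expected_token.encode()):
        return {"statusCode": 403, "body": "invalid query string"}

    received = mqtt_round_trip(messages_to_send)
    if received == messages_to_send:
        return {"statusCode": 200, "body": "healthy"}

    # A non-2xx/3xx status makes the Route 53 health check fail.
    return {
        "statusCode": 503,
        "body": f"expected {messages_to_send} messages, received {received}",
    }


# Example with a fake broker that echoes every message back.
response = health_check_handler(
    {"queryStringParameters": {"token": "s3cret"}}, "s3cret", 5, lambda n: n
)
print(response["statusCode"])  # prints 200
```

Passing the round trip in as a callable keeps the decision logic testable without a broker connection; a real function would replace it with an MQTT client session against the AWS IoT Core endpoint.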

Disaster Recovery for AWS IoT

Version 1.0.0
Released: 05/2021
Author: AWS
