Skip to main content

2 posts tagged with "Strands Agents"

Strands Agents Framework

View All Tags

Real-Time Voice to Sign Language Translation - Part 3: Edge AI Agent with Strands Agents on NVIDIA Jetson

· 13 min read
Chiwai Chan
Tinkerer

This is Part 3 of a 3-part series covering a real-time voice-to-sign-language translation system. In Part 1, I covered the React frontend that captures speech, processes it with Amazon Nova 2 Sonic, and publishes cleaned sentence text via MQTT. In Part 2, I covered the AWS CDK stack that routes IoT Core messages through Lambda to AppSync for real-time GraphQL subscriptions.

NVIDIA Jetson AGX Thor Developer Kit

This post covers the final piece — the edge AI agent that actually makes the physical hand move. It is a Strands Agent running on an NVIDIA Jetson that subscribes to MQTT commands from the frontend, uses Amazon Nova 2 Lite to invoke the fingerspell tool, drives the Pollen Robotics Amazing Hand's Feetech SCS0009 servos for ASL fingerspelling letter by letter, records video of the hand in action, uploads it to S3, and publishes hand state back to IoT Core — which Part 2's infrastructure routes through to the frontend via AppSync.

The three repositories in the series:

  1. Part 1 - Frontend and Voice Processing (amplify-react-nova-sonic-voice-chat-amazing-hand) — React web app that captures speech, streams to Nova 2 Sonic, publishes cleaned sentence text via MQTT
  2. Part 2 - Cloud Infrastructure (cdk-iot-amazing-hand-streaming) — AWS CDK stack that routes IoT Core messages through Lambda to AppSync
  3. This post (Part 3) - Edge AI Agent (strands-agents-amazing-hands) — Strands Agent powered by Amazon Nova 2 Lite on NVIDIA Jetson that translates sentence text to ASL servo commands, drives the Amazing Hand, and publishes state back

Goals

  • Receive MQTT commands from the React frontend (plain text or JSON with sentence field) and drive the Amazing Hand servos for ASL fingerspelling
  • Use the Strands Agents framework with Amazon Nova 2 Lite (us.amazon.nova-2-lite-v1:0) to invoke the fingerspell tool — the LLM passes the incoming text verbatim to the tool for letter-by-letter ASL spelling
  • Fingerspell text using the 26-letter ASL alphabet (A-Z), with each letter held for 0.8 seconds and spaces adding a 0.4-second pause
  • Control 8 Feetech SCS0009 servos (4 fingers x 2 joints) on the Pollen Robotics Amazing Hand via serial bus at 1M baud using the rustypot library
  • Record video of the hand via OpenCV during each fingerspelling sequence, encode to H.264 MP4 via imageio-ffmpeg, upload to S3, and include a presigned URL in the state message
  • Publish real-time hand state (servo angles, letter, video URL) to IoT Core over MQTT — which Part 2's CDK stack routes to AppSync for the frontend to consume
  • Authenticate to AWS IoT Core using mTLS with X.509 device certificates
  • Create a fresh agent instance per MQTT message to prevent conversation history accumulation and unbounded token growth
  • Handle graceful shutdown with servo torque disable on SIGINT/SIGTERM

The Overall System

This diagram shows the complete end-to-end system. Part 3 is the edge device highlighted on the right — the NVIDIA Jetson running the Strands Agent that controls the Amazing Hand.

Overall System with Part 3 Highlighted

How Part 3 fits in:

  • Part 1 (Frontend) publishes cleaned sentence text to the-project/robotic-hand/{deviceName}/action via MQTT
  • Part 3 (This agent) subscribes to the /action topic, processes the command through the Strands Agent, drives the servos, records video, and publishes state back to /state
  • Part 2 (Infrastructure) picks up the /state messages and routes them through Lambda to AppSync, where the frontend receives them via GraphQL subscriptions

Architecture

The agent is a Python application built on the Strands Agents framework. It runs as a long-lived MQTT listener on the NVIDIA Jetson, creating a fresh agent instance for each incoming message to keep memory bounded.

Agent Architecture

Agent Architecture

Components:

  • MQTT Listener (agent.py) — Subscribes to the action topic, parses incoming messages (plain text or JSON), and submits each action to a single-threaded agent executor to keep the AWS CRT MQTT event loop free
  • Strands Agent — A fresh Agent instance created per message with Amazon Nova 2 Lite as the model, the fingerspell tool as the available action, and a MaxToolCallsHook (limit 3) to prevent runaway tool-call loops
  • Fingerspell Tool (hand_control.py) — A @tool decorated function that the LLM invokes to spell text letter-by-letter using the 26-letter ASL alphabet
  • Servo Controller — Uses rustypot.Scs0009PyController to communicate with 8 Feetech SCS0009 servos over serial at 1M baud. Each finger has two servos controlled by dedicated move functions (Move_Index, Move_Middle, Move_Ring, Move_Thumb)
  • Video Recorder (video_recorder.py) — Background daemon thread captures frames via OpenCV, encodes to H.264 MP4 via imageio-ffmpeg, uploads to S3, and returns a presigned URL (1-hour expiry)
  • State Publisher — Non-blocking MQTT publisher on a separate thread that sends hand state (finger angles, letter, video URL) to the /state topic with QoS 1

Data Flow

Interactive Sequence Diagram

Edge Agent: MQTT Command to Servo Control Flow

From MQTT command to ASL fingerspelling with video capture

0/13
IoT CoreListenerMQTT ListenerAgentStrands AgentNovaNova 2 LiteServosServo ControllerS3S3 + Video0msMQTT message: { "sentence": "hello world" }QoS 11msParse JSON, extract sentence field2msstart_recording() — launch camera daemon thread3msCreate fresh Agent instance + submit to executorNo history from prior messages5msConverse API: system prompt + action text + fingersp...200msTool selection: fingerspell(text="hello world")210msfingerspell: Move H-E-L-L-O (0.8s per letter)Serial bus @ 1M baud300msPublish state per letter: { letter: "H", fingers: {....Non-blocking thread4500msfingerspell: Move W-O-R-L-D (0.8s per letter)4600msPublish state per letter: { letter: "W", fingers: {....8800msstop_recording_and_upload() — encode H.264 + upload ...9000msPresigned URL (1hr expiry)9001msRe-publish last state with video_url appended
IoT Core
Listener
Agent
Nova
Servos
S3
Milestone
Complete
Total: 13 steps across 6 components
MQTT command → ASL fingerspelling + video in ~9 seconds

How it works

MQTT Command Reception

The agent subscribes to an MQTT action topic (e.g. the-project/robotic-hand/XIAOAmazingHandRight/action) using mTLS authentication with X.509 device certificates. The first connection uses clean_session=True to flush any stale session state, then reconnects with clean_session=False for normal operation.

When a message arrives, the handler tries to parse it as JSON and extract the sentence field. If JSON parsing fails, it treats the entire payload as plain text. The action is then submitted to a single-threaded executor (agent_executor) to keep the AWS CRT MQTT event loop free:

def on_message(topic, payload, dup, qos, retain, **kwargs):
payload_str = payload.decode("utf-8")
try:
data = json.loads(payload_str)
action = data.get("sentence", payload_str)
except json.JSONDecodeError:
action = payload_str
agent_executor.submit(_process_action, action)

Strands Agent and Amazon Nova 2 Lite

The Strands Agents framework provides the core AI reasoning loop. A fresh agent instance is created for every MQTT message — this is deliberate to prevent conversation history from accumulating across messages, which would cause unbounded token growth over time.

The agent uses Amazon Nova 2 Lite (us.amazon.nova-2-lite-v1:0) via the Bedrock Converse API. Nova 2 Lite was chosen for its low-latency tool-use responses, which is critical for real-time servo control. The agent is configured with a MaxToolCallsHook that cancels tool calls beyond 3 to prevent infinite LLM tool-call loops.

The agent runs in fingerspell-only mode — only the fingerspell tool is available. The system prompt instructs the LLM to pass the entire message verbatim to the fingerspell tool without shortening or modifying it. State messages include a letter field identifying the current ASL letter being signed.

Servo Hardware and Control

Pollen Robotics Amazing Hand

The Amazing Hand — an open-source robotic hand designed by Pollen Robotics and manufactured by Seeed Studio — has 4 fingers (index, middle, ring, thumb — no pinky) with 2 Feetech SCS0009 servos per finger (8 servos total) connected via a Waveshare driver board over serial USB at 1,000,000 baud.

Each servo has an angle range of -90 to +90 degrees. Per-servo calibration offsets (MiddlePos) are applied during move operations to account for physical alignment:

MiddlePos = [-17, 8, -16, -4, -12, 10, -9, 9]

The control sequence for each finger:

  1. Set goal speed for both servos (write_goal_speed) with a 0.2ms sleep between each speed write for serial bus timing
  2. Convert angle to radians with calibration offset: np.deg2rad(MiddlePos[i] + angle)
  3. Set goal position for both servos (write_goal_position)
  4. 5ms sleep after positions are set before the next finger's commands

ASL Fingerspelling Tool

The fingerspell(text) tool is decorated with @tool from the Strands framework, making it callable by the LLM during inference. It spells text letter-by-letter using the ASL alphabet. Each of the 26 letters (A-Z) is mapped to servo angle tuples for all 4 fingers. Each letter is held for 0.8 seconds, spaces add a 0.4-second pause, and non-letter characters are skipped. A state message with the current letter field is published after each letter.

Since the Amazing Hand has no pinky finger, ASL letters that require a pinky use the ring finger instead.

Video Recording Pipeline

Video is recorded concurrently with each fingerspelling sequence:

  1. Start recording — Before the agent is invoked, start_recording() launches a background daemon thread (video-capture) that captures frames from OpenCV VideoCapture(0) at the camera's native FPS (typically 30)
  2. Stop and encode — After the agent completes, stop_recording_and_upload() stops the capture thread, converts frames from BGR (OpenCV) to RGB, and encodes to H.264 MP4 using imageio.v3 with the libx264 codec. The temp file is named hand_YYYYMMDD_HHMMSS_
  3. Upload to S3 — The MP4 is uploaded to the configured S3 bucket (default: cc-amazing-video) with key videos/hand_YYYYMMDD_HHMMSS.mp4
  4. Presigned URL — A presigned URL is generated with 1-hour expiry and appended to the last state message, which is re-published to the /state topic

State Publishing

After each servo movement, the tool publishes a state message to the MQTT /state topic (e.g. the-project/robotic-hand/XIAOAmazingHandRight/state) with QoS 1. Publishing is non-blocking — it submits to a dedicated _publish_executor thread to avoid blocking the servo tool.

The state payload:

{
"gesture": "fingerspell",
"letter": "E",
"ts": 1770550850,
"fingers": {
"index": { "angle_1": 45, "angle_2": -45 },
"middle": { "angle_1": 45, "angle_2": -45 },
"ring": { "angle_1": 45, "angle_2": -45 },
"thumb": { "angle_1": 60, "angle_2": -60 }
},
"video_url": "https://cc-amazing-video.s3.amazonaws.com/videos/hand_20260228.mp4?..."
}

The last published state is cached so that publish_state_with_video_url() can re-publish it with the presigned URL appended after video upload completes — without needing to re-read servo angles.

This state payload is what Part 2's CDK stack picks up via the IoT Rule, flattens in Lambda, and pushes into AppSync for the frontend to consume.

Threading Model

The agent uses two thread pools and a daemon thread to keep operations non-blocking:

ThreadTypeWorkersPurpose
agent_executorThreadPoolExecutor1Runs Strands agent off the AWS CRT MQTT event loop
_publish_executorThreadPoolExecutor1Publishes state messages non-blocking
video-captureDaemon Thread1Background camera frame capture

Graceful Shutdown

On SIGINT or SIGTERM, the agent:

  1. Sets a stop event to exit the main loop
  2. Disables servo torque (write_torque_enable(1, 2)) to release the servos and prevent power draw
  3. Disconnects from MQTT
  4. Logs completion

Technical Challenges & Solutions

Challenge 1: Conversation History Bloat

Problem: Strands Agents maintain conversation history by default. Over time, as hundreds of MQTT messages are processed, the token count grows unboundedly, increasing latency and cost.

Solution: A fresh Agent instance is created for every MQTT message. This discards all prior conversation history, keeping each invocation lightweight. Token usage (input, output, total) is logged after each invocation for monitoring.

Challenge 2: Runaway Tool-Call Loops

Problem: The LLM might enter a loop of calling tools repeatedly — for example, calling fingerspell then deciding to call it again with modified text, then again.

Solution: A custom MaxToolCallsHook implementing the Strands HookProvider interface. It counts tool calls per agent invocation and cancels any tool call beyond the limit of 3. This is injected into the agent via hooks=[MaxToolCallsHook()].

Challenge 3: No Pinky Finger on the Amazing Hand

Problem: The Pollen Robotics Amazing Hand has only 4 fingers (index, middle, ring, thumb) — no pinky. Several ASL letters require specific pinky positions (e.g. I, J, Y).

Solution: ASL letters that require a pinky use the ring finger instead. The 26-letter ASL alphabet is manually mapped to 4-finger servo angle tuples, approximating the correct hand shape with the available fingers.

Challenge 4: Serial Bus Timing

Problem: Sending servo commands too quickly over the serial bus causes missed commands or erratic movement. The Feetech SCS0009 protocol requires time between operations.

Solution: A 0.2ms sleep is inserted between speed writes, and a 5ms sleep is added after both goal positions are set, giving the serial bus time to process each command before the next finger's sequence begins.

Getting Started

GitHub Repository: https://github.com/chiwaichan/strands-agents-amazing-hands

Prerequisites

  • NVIDIA Jetson (AGX Thor or Orin Nano Super) with Python 3.10+
  • Pollen Robotics Amazing Hand connected via USB serial (Waveshare driver board)
  • AWS IoT Core device certificates (certificate, private key, root CA)
  • Amazon Bedrock access enabled for Nova 2 Lite in us-east-1
  • USB camera connected to the Jetson
  • S3 bucket for video storage (default: cc-amazing-video)

Installation

git clone https://github.com/chiwaichan/strands-agents-amazing-hands.git
cd strands-agents-amazing-hands
pip install -e .

Running the Agent

amazing-hand-agent \
--endpoint your-iot-endpoint.iot.us-east-1.amazonaws.com \
--cert certs/device.pem.crt \
--key certs/device.pem.key \
--ca certs/AmazonRootCA1.pem \
--topic the-project/robotic-hand/XIAOAmazingHandRight/action \
--serial-port /dev/amazing-hand-right \
--s3-bucket cc-amazing-video

The agent will connect to IoT Core, subscribe to the action topic, and wait for commands. When a message arrives, it will process it through the Strands Agent, drive the servos, record video, and publish state back.

Summary

This post covered the edge AI agent — the final piece of the voice-to-sign-language translation system:

  • Strands Agents framework with Amazon Nova 2 Lite for tool-use — a fresh agent per MQTT message prevents history bloat, with MaxToolCallsHook limiting calls to 3
  • ASL fingerspelling with the 26-letter alphabet (A-Z), each letter held for 0.8 seconds — the fingerspell tool is decorated with @tool for LLM invocation
  • 8 Feetech SCS0009 servos on 4 fingers controlled via rustypot over serial at 1M baud, with per-servo calibration offsets
  • Video pipeline captures via OpenCV in a background daemon thread, encodes to H.264 MP4 via imageio-ffmpeg, uploads to S3, and includes a 1-hour presigned URL in the final state message
  • Non-blocking threading with 2 thread pools (agent executor off MQTT event loop, state publisher) and a daemon thread for video capture
  • Real-time state publishing to IoT Core after every servo movement — which Part 2's CDK stack routes through Lambda to AppSync, completing the feedback loop to the React frontend in Part 1
  • Graceful shutdown disables servo torque on SIGINT/SIGTERM to release the servos and prevent power draw

Agentic based Over-The-Air Firmware Management of Seeed Studio XIAO ESP32S3 IoT Device Firmware using Amazon AgentCore and Strands Agents

· 8 min read
Chiwai Chan
Tinkerer

I want to have the ability to be able to manage the firmware of all IoT devices using a prompt - it could be to upgrade a device to the latest version, or even to perform a rollback, whether across the entire IoT device fleet level - every device in all 20+ solution types, all the devices within a type of solution, or even at an individual device level.

Goals

  • To be able to over-the-air flash a new firmware version using a prompt
  • To have an Agentic Agent do all the work, give it a prompt and it takes cares of the rest
  • Scalable in the number of IoT devices, as well as, being able to scale as the number of new IoT solution Types increases; with no effort required - implement once and forget
  • Have the ability to rollback to any firmware version specified in the prompt
  • This same solution can be interfaced with using the Model Context Protocol (MCP): whether via Kiro CLI or Claude Code
  • This same solution can be interfaced with using a chatbot
  • Must be authenticated to interface with this solution
  • Must be a completely serverless-solution
  • Firmware integrity verification using SHA256 checksums before flashing to ensure firmware hasn't been corrupted during download
  • Safe rollout with rate limiting and automatic abort thresholds to prevent fleet-wide failures
  • Device firmware version tracking via device shadows to enable version-based targeting for updates
  • Configuration-gated deployments to enable or disable OTA updates per device type for controlled rollouts

Architecture

End-to-End OTA Firmware Update Flow

This diagram illustrates the complete flow from a user's natural language prompt to firmware being flashed on Seeed Studio XIAO ESP32S3 devices.

End-to-End OTA Firmware Update Flow

Flow Steps:

  1. User Prompt - Developer/Operator provides a natural language command (e.g., "Update all vision_ai_face_detector devices to v2.0.0")
  2. AgentCore Runtime - Amazon Bedrock AgentCore receives and processes the request
  3. Strands Agent - The agent with firmware_updater tool reasons about the task
  4. Config Check - Agent queries DynamoDB to verify the device type is enabled for OTA updates
  5. Firmware Metadata - Agent retrieves firmware binary, SHA256 checksum, and metadata from S3
  6. Create IoT Job - Agent creates a continuous IoT Job targeting the specified device group
  7. MQTT Notification - AWS IoT Core notifies devices via MQTT topic $aws/things/+/jobs/notify
  8. Firmware Download - Each XIAO ESP32S3 Vision AI Face Detector downloads the firmware directly from S3
  9. Version Reporting - Devices report their new firmware version to their Device Shadow

Interactive Sequence Diagram

End-to-End OTA Firmware Update Flow

From natural language prompt to firmware flashed on Seeed Studio XIAO ESP32

0/15
UserDeveloper/OperatorAgentCoreBedrock AgentCoreStrandsStrands AgentDynamoDBS3S3 BucketIoT CoreAWS IoT CoreXIAOXIAO ESP320.0s"Update all vision_ai_face_detector to v2.0.0"Natural language0.1sInvoke agent with prompt0.2sGetItem: Check if device type enabled0.3senabled: true0.4svalidate_files_exist()0.5sfirmware.bin, firmware.sha256, metadata.json ✓0.6sCreateJob (continuous, targeting device group)0.7sJob created: job-vision-ai-v2.0.00.8sSuccess: OTA job created for 5 devices0.9s"Created OTA job for vision_ai_face_detector..."1.0sMQTT: $aws/things/+/jobs/notifyAll devices in group2.0sGET firmware.bin (pre-signed URL)5.0sStreaming download (1.6MB)8.0sVerify SHA256 → Flash to APP1 → Reboot12.0sShadow update: firmwareVersion = "2.0.0"
User
AgentCore
Strands
DynamoDB
S3
IoT Core
XIAO
Milestone
Complete
Total: 15 message exchanges across 7 participants
~12 seconds end-to-end (prompt to firmware flashed)

Strands Agent Architecture on Amazon Bedrock AgentCore

This diagram details the internal architecture of the Strands Agent running on Amazon Bedrock AgentCore, showing how the LLM reasons about prompts and orchestrates tool execution.

Strands Agent Architecture

Components:

  • Amazon Bedrock AgentCore - Managed runtime that hosts and scales the agent
  • ECR Container - Docker image (Python 3.12) containing the Strands Agent code
  • Amazon Nova 2 Lite - The LLM that provides reasoning capabilities
  • Agent Loop - The core execution cycle: parse prompt → select tool → execute → respond
  • firmware_updater Tools:
    • push_firmware_update() - Main orchestrator that coordinates the entire OTA process
    • validate_files_exist() - Validates firmware.bin, firmware.sha256, and metadata.json exist in S3
    • create_dynamic_thing_group() - Creates Fleet Indexing queries to target devices by firmware version

Scalability Architecture

This diagram demonstrates how the solution scales effortlessly across multiple device types and large device fleets - implement once and forget.

Scalability Architecture

Key Scalability Features:

  • Single Agent, Multiple Device Types - One Strands Agent manages all 26+ device groups without code changes
  • S3 Folder Convention - Adding a new device type is as simple as creating a new folder (e.g., firmwares/v1.0.0/new_device_type/)
  • Auto-Discovery Mapping - Folder names automatically map to Thing Groups (e.g., vision_ai_face_detectorVisionAIFaceDetectorAWSDevice)
  • Fleet Indexing Queries - Dynamically target devices based on current firmware version, no hardcoded device lists
  • Horizontal Scaling - Add unlimited devices to any group; IoT Jobs handles distribution automatically

Firmware Rollback Architecture

This diagram shows how the solution enables rollback to any previous firmware version using a simple prompt, leveraging the dual-partition architecture of the Seeed Studio XIAO ESP32.

Firmware Rollback Architecture

Key Rollback Features:

  • Version History in S3 - All firmware versions are retained (v1.0.0, v2.0.0, v3.0.0, etc.) enabling rollback to any point
  • Dual-Partition Flash Layout - XIAO ESP32 uses APP0/APP1 partitions for safe ping-pong updates
  • Persistent Storage - NVS (WiFi, config) and SPIFFS (certificates) survive firmware updates
  • SHA256 Validation - Firmware integrity verified before committing to new partition
  • Automatic Rollback - If new firmware fails to boot and connect to MQTT, device automatically reverts to previous partition

Interactive Sequence Diagram

Firmware Rollback Sequence

Rollback to any previous firmware version with dual-partition safety

0/17
UserDeveloper/OperatorStrandsStrands AgentS3S3 (Version History)IoT CoreAWS IoT CoreXIAOXIAO ESP32FlashFlash Partitions0.0s"Rollback vision_ai_face_detector to v1.0.0"Natural language0.2sList versions: v1.0.0, v2.0.0, v3.0.00.3sv1.0.0 exists with firmware.bin + sha2560.5sCreateJob: target v1.0.0 firmware0.6sJob created: rollback-v1.0.00.7s"Rollback job created, targeting 5 devices"1.0sMQTT: Job notification with v1.0.0 URL2.0sGET v1.0.0/firmware.bin5.0sStream v1.0.0 firmware (1.6MB)6.0sWrite to APP1 partition (inactive)7.0sAPP1 written, SHA256 verified7.5sSet boot partition: APP18.0sESP.restart() → Reboot10.0sBoot from APP1 (v1.0.0)12.0sMQTT Connect successful12.5sMark APP1 as valid (ota_mark_valid)13.0sShadow: firmwareVersion = "1.0.0"
User
Strands
S3
IoT Core
XIAO
Flash
Dual-partition safety: APP0 preserved during rollback to APP1
~13 seconds to rollback (with auto-recovery on failure)

Multi-Interface Access Architecture

This diagram demonstrates how the Strands Agent can be accessed through multiple interfaces with different authentication methods - enabling developers to use their preferred tools while operators can use a web-based chatbot.

Multi-Interface Access Architecture

Interface Options:

  • MCP Clients (Developer Tools) - Claude Code and Kiro CLI connect via Model Context Protocol to a Streaming AgentCore Runtime using JWT/Cognito authentication
  • Chatbot (Web UI) - AWS Amplify React app with FirmwareAssistant component connects via Lambda proxy to an IAM AgentCore Runtime using SigV4 authentication for service-to-service communication
  • Two Runtimes, Same Agent Logic - Both runtimes run the same Strands Agent code but are deployed separately with different authentication methods suited to their use cases

Firmware AI Assistant Chatbot

The chatbot interface in an Amplify React App provides a conversational way to manage firmware updates. In this example, the assistant lists all available firmware versions across device groups, and then creates an OTA job to update the pet_feeder device group to the latest firmware version.

Firmware AI Assistant Chatbot

Authentication Architecture

This diagram illustrates the multi-layer security model ensuring that all access to the firmware management system is properly authenticated. Each interface uses a different authentication method suited to its use case.

Authentication Architecture

Authentication Layers:

  • Cognito JWT (MCP Path) - Developers using Claude Code and Kiro CLI authenticate via Amazon Cognito User Pool and receive JWT tokens, connecting to the Streaming AgentCore Runtime
  • IAM SigV4 (Chatbot Path) - The Lambda proxy authenticates using AWS IAM roles with SigV4 request signing for service-to-service communication with the IAM AgentCore Runtime
  • X.509 Certificates (Device Path) - XIAO ESP32 devices authenticate to AWS IoT Core using TLS 1.2 mutual authentication with per-device certificates
  • Certificate Chain - Amazon Root CA validates device certificates stored in SPIFFS (survives firmware updates)

Serverless Architecture Overview

This diagram provides a comprehensive view of all AWS services used in the solution - every component is fully serverless with no EC2 instances to manage.

Serverless Architecture Overview

Serverless Components:

  • Frontend - AWS Amplify Hosting, AppSync GraphQL, Cognito User Pool
  • Compute - Amazon Bedrock AgentCore, Lambda Functions, EventBridge Rules
  • Storage - S3 Firmware Bucket, DynamoDB Config Table
  • IoT - IoT Core, IoT Jobs, Device Shadows, Fleet Indexing
  • Monitoring - CloudWatch Logs & Alarms, SNS Notifications
  • CI/CD - CodeBuild (ARM64), ECR Container Registry

Firmware Integrity Verification (SHA256)

This diagram shows the firmware integrity verification process that ensures firmware hasn't been corrupted during download before flashing to the device.

Firmware Integrity Verification

Verification Flow:

  1. Download - XIAO ESP32 streams firmware.bin from S3 in chunks
  2. Calculate - SHA256 hash is calculated progressively during download (streaming hash)
  3. Compare - Calculated hash is compared against expected hash from firmware.sha256 file
  4. Flash Decision - Match: proceed to flash APP1 partition | Mismatch: abort OTA and report failure

Benefits:

  • Detects corruption during download (network issues, incomplete transfers)
  • Prevents flashing of tampered firmware
  • Memory-efficient streaming verification (no need to store entire firmware before hashing)

Interactive Sequence Diagram

SHA256 Integrity Verification Sequence

Streaming hash verification during firmware download

0/15
IoT JobAWS IoT JobXIAOXIAO ESP32S3S3 BucketOTA MgrOTA ManagerFlashFlash Memory0.0sJob document with firmware URL + expected SHA2560.1sStart OTA: updateFromURLWithChecksum(url, sha256)0.2sHTTP GET firmware.bin (Content-Length: 1.6MB)0.5sStream chunk 1 (1KB)0.6sSHA256.update(chunk1) → Running hash0.7sWrite chunk 1 to APP1 partition3.0sStream chunks 2...1600 (1KB each)~1600 iterations3.5sSHA256.update(chunks) → Accumulating hash4.0sWrite remaining chunks to APP15.0sDownload complete (EOF)5.1sSHA256.finalize() → Calculated hash5.2sCompare: calculated == expected ?5.3s✓ MATCH: Commit APP1, set boot partitionSafe to flash!5.5sAPP1 committed successfully5.6sOTA_SUCCESS: Ready to reboot
IoT Job
XIAO
S3
OTA Mgr
Flash
Memory-efficient: Hash calculated during download, not after
~5.6 seconds (download + verify + commit)

Safe Rollout with Rate Limiting & Abort Thresholds

This diagram illustrates the safety mechanisms that prevent fleet-wide failures during OTA updates by controlling rollout speed and automatically aborting when issues are detected.

Safe Rollout with Rate Limiting

Safety Mechanisms:

  • Rate Limiting - Updates are deployed to a maximum of 10 devices concurrently, preventing network congestion and allowing monitoring
  • Abort Thresholds - Job automatically cancels if failure rate exceeds 5% or more than 10 absolute failures occur
  • Batch Processing - Fleet of 100 devices is updated in batches, with completed, in-progress, and pending states tracked
  • Failure Monitoring - Real-time tracking of success/failure status feeds into abort decision logic
  • Auto-Cancel - When threshold is exceeded, all pending device updates are automatically cancelled
  • SNS Alerts - Operators are immediately notified when an OTA rollout is aborted

Interactive Sequence Diagram

Safe Rollout with Abort Threshold

Rate-limited deployment with automatic abort on failure threshold

0/16
StrandsStrands AgentIoT JobsAWS IoT JobsBatch 1Batch 1 (10 devices)Batch 2Batch 2 (10 devices)MonitorFailure MonitorSNSSNS Alerts0.0sCreateJob: maxConcurrent=10, abortThreshol...Rate limiting config0.1sJob created: 100 devices in queue0.5sDeploy to Batch 1 (devices 1-10)First 10 devices15.0sDevice 1-8: SUCCESS18.0sDevice 9-10: SUCCESS18.5sBatch 1 complete: 10/10 success (0% failure)18.6s✓ Below threshold, continue rollout19.0sDeploy to Batch 2 (devices 11-20)32.0sDevice 11-14: SUCCESS45.0sDevice 15-17: FAILED (network timeout)3 failures!45.5sRunning total: 13 success, 3 failed (18.75%)45.6sCheck: 18.75% > 5% threshold45.7s⚠️ ABORT: Failure threshold exceeded!Stop rollout46.0sCancel pending jobs (devices 18-100)46.5sPublish abort notification47.0sEmail/SMS: "OTA aborted: 3 failures i...
Strands
IoT Jobs
Batch 1
Batch 2
Monitor
SNS
Prevented 84 additional devices from receiving bad firmware
Fleet-wide failure avoided by abort threshold

Source Code

The source code for this project is available on GitHub:

note

This repository is not yet open sourced. It will be made public in a future update.