Skip to main content

12 posts tagged with "Serverless"

Serverless

View All Tags

Real-Time Voice to Sign Language Translation - Part 3: Edge AI Agent with Strands Agents on NVIDIA Jetson

· 13 min read
Chiwai Chan
Tinkerer

This is Part 3 of a 3-part series covering a real-time voice-to-sign-language translation system. In Part 1, I covered the React frontend that captures speech, processes it with Amazon Nova 2 Sonic, and publishes cleaned sentence text via MQTT. In Part 2, I covered the AWS CDK stack that routes IoT Core messages through Lambda to AppSync for real-time GraphQL subscriptions.

NVIDIA Jetson AGX Thor Developer Kit

This post covers the final piece — the edge AI agent that actually makes the physical hand move. It is a Strands Agent running on an NVIDIA Jetson that subscribes to MQTT commands from the frontend, uses Amazon Nova 2 Lite to invoke the fingerspell tool, drives the Pollen Robotics Amazing Hand's Feetech SCS0009 servos for ASL fingerspelling letter by letter, records video of the hand in action, uploads it to S3, and publishes hand state back to IoT Core — which Part 2's infrastructure routes through to the frontend via AppSync.

The three repositories in the series:

  1. Part 1 - Frontend and Voice Processing (amplify-react-nova-sonic-voice-chat-amazing-hand) — React web app that captures speech, streams to Nova 2 Sonic, publishes cleaned sentence text via MQTT
  2. Part 2 - Cloud Infrastructure (cdk-iot-amazing-hand-streaming) — AWS CDK stack that routes IoT Core messages through Lambda to AppSync
  3. This post (Part 3) - Edge AI Agent (strands-agents-amazing-hands) — Strands Agent powered by Amazon Nova 2 Lite on NVIDIA Jetson that translates sentence text to ASL servo commands, drives the Amazing Hand, and publishes state back

Goals

  • Receive MQTT commands from the React frontend (plain text or JSON with sentence field) and drive the Amazing Hand servos for ASL fingerspelling
  • Use the Strands Agents framework with Amazon Nova 2 Lite (us.amazon.nova-2-lite-v1:0) to invoke the fingerspell tool — the LLM passes the incoming text verbatim to the tool for letter-by-letter ASL spelling
  • Fingerspell text using the 26-letter ASL alphabet (A-Z), with each letter held for 0.8 seconds and spaces adding a 0.4-second pause
  • Control 8 Feetech SCS0009 servos (4 fingers x 2 joints) on the Pollen Robotics Amazing Hand via serial bus at 1M baud using the rustypot library
  • Record video of the hand via OpenCV during each fingerspelling sequence, encode to H.264 MP4 via imageio-ffmpeg, upload to S3, and include a presigned URL in the state message
  • Publish real-time hand state (servo angles, letter, video URL) to IoT Core over MQTT — which Part 2's CDK stack routes to AppSync for the frontend to consume
  • Authenticate to AWS IoT Core using mTLS with X.509 device certificates
  • Create a fresh agent instance per MQTT message to prevent conversation history accumulation and unbounded token growth
  • Handle graceful shutdown with servo torque disable on SIGINT/SIGTERM

The Overall System

This diagram shows the complete end-to-end system. Part 3 is the edge device highlighted on the right — the NVIDIA Jetson running the Strands Agent that controls the Amazing Hand.

Overall System with Part 3 Highlighted

How Part 3 fits in:

  • Part 1 (Frontend) publishes cleaned sentence text to the-project/robotic-hand/{deviceName}/action via MQTT
  • Part 3 (This agent) subscribes to the /action topic, processes the command through the Strands Agent, drives the servos, records video, and publishes state back to /state
  • Part 2 (Infrastructure) picks up the /state messages and routes them through Lambda to AppSync, where the frontend receives them via GraphQL subscriptions

Architecture

The agent is a Python application built on the Strands Agents framework. It runs as a long-lived MQTT listener on the NVIDIA Jetson, creating a fresh agent instance for each incoming message to keep memory bounded.

Agent Architecture

Agent Architecture

Components:

  • MQTT Listener (agent.py) — Subscribes to the action topic, parses incoming messages (plain text or JSON), and submits each action to a single-threaded agent executor to keep the AWS CRT MQTT event loop free
  • Strands Agent — A fresh Agent instance created per message with Amazon Nova 2 Lite as the model, the fingerspell tool as the available action, and a MaxToolCallsHook (limit 3) to prevent runaway tool-call loops
  • Fingerspell Tool (hand_control.py) — A @tool decorated function that the LLM invokes to spell text letter-by-letter using the 26-letter ASL alphabet
  • Servo Controller — Uses rustypot.Scs0009PyController to communicate with 8 Feetech SCS0009 servos over serial at 1M baud. Each finger has two servos controlled by dedicated move functions (Move_Index, Move_Middle, Move_Ring, Move_Thumb)
  • Video Recorder (video_recorder.py) — Background daemon thread captures frames via OpenCV, encodes to H.264 MP4 via imageio-ffmpeg, uploads to S3, and returns a presigned URL (1-hour expiry)
  • State Publisher — Non-blocking MQTT publisher on a separate thread that sends hand state (finger angles, letter, video URL) to the /state topic with QoS 1

Data Flow

Interactive Sequence Diagram

Edge Agent: MQTT Command to Servo Control Flow

From MQTT command to ASL fingerspelling with video capture

0/13
IoT CoreListenerMQTT ListenerAgentStrands AgentNovaNova 2 LiteServosServo ControllerS3S3 + Video0msMQTT message: { "sentence": "hello world" }QoS 11msParse JSON, extract sentence field2msstart_recording() — launch camera daemon thread3msCreate fresh Agent instance + submit to executorNo history from prior messages5msConverse API: system prompt + action text + fingersp...200msTool selection: fingerspell(text="hello world")210msfingerspell: Move H-E-L-L-O (0.8s per letter)Serial bus @ 1M baud300msPublish state per letter: { letter: "H", fingers: {....Non-blocking thread4500msfingerspell: Move W-O-R-L-D (0.8s per letter)4600msPublish state per letter: { letter: "W", fingers: {....8800msstop_recording_and_upload() — encode H.264 + upload ...9000msPresigned URL (1hr expiry)9001msRe-publish last state with video_url appended
IoT Core
Listener
Agent
Nova
Servos
S3
Milestone
Complete
Total: 13 steps across 6 components
MQTT command → ASL fingerspelling + video in ~9 seconds

How it works

MQTT Command Reception

The agent subscribes to an MQTT action topic (e.g. the-project/robotic-hand/XIAOAmazingHandRight/action) using mTLS authentication with X.509 device certificates. The first connection uses clean_session=True to flush any stale session state, then reconnects with clean_session=False for normal operation.

When a message arrives, the handler tries to parse it as JSON and extract the sentence field. If JSON parsing fails, it treats the entire payload as plain text. The action is then submitted to a single-threaded executor (agent_executor) to keep the AWS CRT MQTT event loop free:

def on_message(topic, payload, dup, qos, retain, **kwargs):
payload_str = payload.decode("utf-8")
try:
data = json.loads(payload_str)
action = data.get("sentence", payload_str)
except json.JSONDecodeError:
action = payload_str
agent_executor.submit(_process_action, action)

Strands Agent and Amazon Nova 2 Lite

The Strands Agents framework provides the core AI reasoning loop. A fresh agent instance is created for every MQTT message — this is deliberate to prevent conversation history from accumulating across messages, which would cause unbounded token growth over time.

The agent uses Amazon Nova 2 Lite (us.amazon.nova-2-lite-v1:0) via the Bedrock Converse API. Nova 2 Lite was chosen for its low-latency tool-use responses, which is critical for real-time servo control. The agent is configured with a MaxToolCallsHook that cancels tool calls beyond 3 to prevent infinite LLM tool-call loops.

The agent runs in fingerspell-only mode — only the fingerspell tool is available. The system prompt instructs the LLM to pass the entire message verbatim to the fingerspell tool without shortening or modifying it. State messages include a letter field identifying the current ASL letter being signed.

Servo Hardware and Control

Pollen Robotics Amazing Hand

The Amazing Hand — an open-source robotic hand designed by Pollen Robotics and manufactured by Seeed Studio — has 4 fingers (index, middle, ring, thumb — no pinky) with 2 Feetech SCS0009 servos per finger (8 servos total) connected via a Waveshare driver board over serial USB at 1,000,000 baud.

Each servo has an angle range of -90 to +90 degrees. Per-servo calibration offsets (MiddlePos) are applied during move operations to account for physical alignment:

MiddlePos = [-17, 8, -16, -4, -12, 10, -9, 9]

The control sequence for each finger:

  1. Set goal speed for both servos (write_goal_speed) with a 0.2ms sleep between each speed write for serial bus timing
  2. Convert angle to radians with calibration offset: np.deg2rad(MiddlePos[i] + angle)
  3. Set goal position for both servos (write_goal_position)
  4. 5ms sleep after positions are set before the next finger's commands

ASL Fingerspelling Tool

The fingerspell(text) tool is decorated with @tool from the Strands framework, making it callable by the LLM during inference. It spells text letter-by-letter using the ASL alphabet. Each of the 26 letters (A-Z) is mapped to servo angle tuples for all 4 fingers. Each letter is held for 0.8 seconds, spaces add a 0.4-second pause, and non-letter characters are skipped. A state message with the current letter field is published after each letter.

Since the Amazing Hand has no pinky finger, ASL letters that require a pinky use the ring finger instead.

Video Recording Pipeline

Video is recorded concurrently with each fingerspelling sequence:

  1. Start recording — Before the agent is invoked, start_recording() launches a background daemon thread (video-capture) that captures frames from OpenCV VideoCapture(0) at the camera's native FPS (typically 30)
  2. Stop and encode — After the agent completes, stop_recording_and_upload() stops the capture thread, converts frames from BGR (OpenCV) to RGB, and encodes to H.264 MP4 using imageio.v3 with the libx264 codec. The temp file is named hand_YYYYMMDD_HHMMSS_
  3. Upload to S3 — The MP4 is uploaded to the configured S3 bucket (default: cc-amazing-video) with key videos/hand_YYYYMMDD_HHMMSS.mp4
  4. Presigned URL — A presigned URL is generated with 1-hour expiry and appended to the last state message, which is re-published to the /state topic

State Publishing

After each servo movement, the tool publishes a state message to the MQTT /state topic (e.g. the-project/robotic-hand/XIAOAmazingHandRight/state) with QoS 1. Publishing is non-blocking — it submits to a dedicated _publish_executor thread to avoid blocking the servo tool.

The state payload:

{
"gesture": "fingerspell",
"letter": "E",
"ts": 1770550850,
"fingers": {
"index": { "angle_1": 45, "angle_2": -45 },
"middle": { "angle_1": 45, "angle_2": -45 },
"ring": { "angle_1": 45, "angle_2": -45 },
"thumb": { "angle_1": 60, "angle_2": -60 }
},
"video_url": "https://cc-amazing-video.s3.amazonaws.com/videos/hand_20260228.mp4?..."
}

The last published state is cached so that publish_state_with_video_url() can re-publish it with the presigned URL appended after video upload completes — without needing to re-read servo angles.

This state payload is what Part 2's CDK stack picks up via the IoT Rule, flattens in Lambda, and pushes into AppSync for the frontend to consume.

Threading Model

The agent uses two thread pools and a daemon thread to keep operations non-blocking:

ThreadTypeWorkersPurpose
agent_executorThreadPoolExecutor1Runs Strands agent off the AWS CRT MQTT event loop
_publish_executorThreadPoolExecutor1Publishes state messages non-blocking
video-captureDaemon Thread1Background camera frame capture

Graceful Shutdown

On SIGINT or SIGTERM, the agent:

  1. Sets a stop event to exit the main loop
  2. Disables servo torque (write_torque_enable(1, 2)) to release the servos and prevent power draw
  3. Disconnects from MQTT
  4. Logs completion

Technical Challenges & Solutions

Challenge 1: Conversation History Bloat

Problem: Strands Agents maintain conversation history by default. Over time, as hundreds of MQTT messages are processed, the token count grows unboundedly, increasing latency and cost.

Solution: A fresh Agent instance is created for every MQTT message. This discards all prior conversation history, keeping each invocation lightweight. Token usage (input, output, total) is logged after each invocation for monitoring.

Challenge 2: Runaway Tool-Call Loops

Problem: The LLM might enter a loop of calling tools repeatedly — for example, calling fingerspell then deciding to call it again with modified text, then again.

Solution: A custom MaxToolCallsHook implementing the Strands HookProvider interface. It counts tool calls per agent invocation and cancels any tool call beyond the limit of 3. This is injected into the agent via hooks=[MaxToolCallsHook()].

Challenge 3: No Pinky Finger on the Amazing Hand

Problem: The Pollen Robotics Amazing Hand has only 4 fingers (index, middle, ring, thumb) — no pinky. Several ASL letters require specific pinky positions (e.g. I, J, Y).

Solution: ASL letters that require a pinky use the ring finger instead. The 26-letter ASL alphabet is manually mapped to 4-finger servo angle tuples, approximating the correct hand shape with the available fingers.

Challenge 4: Serial Bus Timing

Problem: Sending servo commands too quickly over the serial bus causes missed commands or erratic movement. The Feetech SCS0009 protocol requires time between operations.

Solution: A 0.2ms sleep is inserted between speed writes, and a 5ms sleep is added after both goal positions are set, giving the serial bus time to process each command before the next finger's sequence begins.

Getting Started

GitHub Repository: https://github.com/chiwaichan/strands-agents-amazing-hands

Prerequisites

  • NVIDIA Jetson (AGX Thor or Orin Nano Super) with Python 3.10+
  • Pollen Robotics Amazing Hand connected via USB serial (Waveshare driver board)
  • AWS IoT Core device certificates (certificate, private key, root CA)
  • Amazon Bedrock access enabled for Nova 2 Lite in us-east-1
  • USB camera connected to the Jetson
  • S3 bucket for video storage (default: cc-amazing-video)

Installation

git clone https://github.com/chiwaichan/strands-agents-amazing-hands.git
cd strands-agents-amazing-hands
pip install -e .

Running the Agent

amazing-hand-agent \
--endpoint your-iot-endpoint.iot.us-east-1.amazonaws.com \
--cert certs/device.pem.crt \
--key certs/device.pem.key \
--ca certs/AmazonRootCA1.pem \
--topic the-project/robotic-hand/XIAOAmazingHandRight/action \
--serial-port /dev/amazing-hand-right \
--s3-bucket cc-amazing-video

The agent will connect to IoT Core, subscribe to the action topic, and wait for commands. When a message arrives, it will process it through the Strands Agent, drive the servos, record video, and publish state back.

Summary

This post covered the edge AI agent — the final piece of the voice-to-sign-language translation system:

  • Strands Agents framework with Amazon Nova 2 Lite for tool-use — a fresh agent per MQTT message prevents history bloat, with MaxToolCallsHook limiting calls to 3
  • ASL fingerspelling with the 26-letter alphabet (A-Z), each letter held for 0.8 seconds — the fingerspell tool is decorated with @tool for LLM invocation
  • 8 Feetech SCS0009 servos on 4 fingers controlled via rustypot over serial at 1M baud, with per-servo calibration offsets
  • Video pipeline captures via OpenCV in a background daemon thread, encodes to H.264 MP4 via imageio-ffmpeg, uploads to S3, and includes a 1-hour presigned URL in the final state message
  • Non-blocking threading with 2 thread pools (agent executor off MQTT event loop, state publisher) and a daemon thread for video capture
  • Real-time state publishing to IoT Core after every servo movement — which Part 2's CDK stack routes through Lambda to AppSync, completing the feedback loop to the React frontend in Part 1
  • Graceful shutdown disables servo torque on SIGINT/SIGTERM to release the servos and prevent power draw

Real-Time Voice to Sign Language Translation - Part 2: Cloud Infrastructure with AWS CDK, IoT Core, and AppSync

· 11 min read
Chiwai Chan
Tinkerer

This is Part 2 of a 3-part series covering a real-time voice-to-sign-language translation system. In Part 1, I covered the React frontend that captures speech, processes it with Amazon Nova 2 Sonic, and publishes cleaned sentence text via MQTT. But there is a missing piece — how does the frontend know what the physical hand is actually doing?

The answer is this repository: a small but critical AWS CDK stack that acts as the bridge between the edge device and the React frontend. It routes real-time hand state data from IoT Core to AppSync, enabling the frontend to receive live updates via GraphQL subscriptions — so the 3D hand animation stays synchronised with the physical Amazing Hand — an open-source robotic hand designed by Pollen Robotics and manufactured by Seeed Studio.

The three repositories in the series:

  1. Part 1 - Frontend and Voice Processing (amplify-react-nova-sonic-voice-chat-amazing-hand) — React web app that captures speech, streams to Nova 2 Sonic, publishes cleaned sentence text via MQTT
  2. This post (Part 2) - Cloud Infrastructure (cdk-iot-amazing-hand-streaming) — AWS CDK stack that routes IoT Core messages through Lambda to AppSync for real-time GraphQL subscriptions
  3. Part 3 - Edge AI Agent (strands-agents-amazing-hands) — Strands Agent powered by Amazon Nova 2 Lite on NVIDIA Jetson that translates sentence text to ASL servo commands, drives the Amazing Hand, and publishes state back

Goals

  • Route real-time hand state data from IoT Core MQTT to AppSync using an IoT Rules Engine SQL query and Lambda
  • Flatten nested MQTT finger angle payloads into a flat GraphQL schema for the createHandState mutation
  • Enable the React frontend to receive live hand state updates via AppSync onCreateHandState GraphQL subscriptions
  • Extract the device name dynamically from the MQTT topic path using topic(3) in the IoT Rule SQL
  • Define all infrastructure as code using AWS CDK in TypeScript
  • Integrate with the existing Amplify Gen 2 managed AppSync API and DynamoDB table from Part 1

The Overall System

This diagram shows the complete end-to-end system. Part 2 is the infrastructure highlighted in the middle — the IoT Rule, Lambda, and AppSync connection that enables real-time state feedback from the edge device back to the frontend.

Overall System with Part 2 Highlighted

How Part 2 fits in:

  • Part 1 (Frontend) publishes cleaned sentence text to the-project/robotic-hand/{deviceName}/action and subscribes to AppSync onCreateHandState for live updates
  • Part 3 (Edge Device) receives sentence text, translates it to ASL servo commands via the Strands Agent powered by Amazon Nova 2 Lite, drives the Amazing Hand, and publishes state back to the-project/robotic-hand/{deviceName}/state
  • Part 2 (This stack) listens on the /state topic, transforms the payload, and pushes it into AppSync — completing the real-time feedback loop

Architecture

The stack is intentionally small — a single IoT Rule, a single Lambda function, and the IAM glue to connect them. The AppSync API and DynamoDB table are managed by the Amplify Gen 2 backend in Part 1, so this stack only needs to call the existing createHandState mutation.

Infrastructure Overview

CDK Stack Architecture

Resources created by this CDK stack:

  • IoT Topic Rule (AmazingHandStateStreamingRule) — Matches MQTT messages on the-project/robotic-hand/+/state using SQL SELECT gesture, letter, ts, fingers, video_url, topic(3) AS device_name, then invokes the Lambda function
  • Lambda Function (AmazingHandToAppSyncFunction) — Node.js 18 function that receives the IoT event, flattens the nested fingers object into individual angle fields, and calls the AppSync createHandState GraphQL mutation using the Amplify v6 SDK with API Key authentication
  • Lambda IAM Role — Service role with AWSLambdaBasicExecutionRole for CloudWatch Logs and an inline policy granting appsync:GraphQL on the AppSync API
  • Lambda Permission — Allows the IoT service (iot.amazonaws.com) to invoke the Lambda function

Resources managed externally (by Amplify Gen 2 in Part 1):

  • AppSync API — GraphQL API with HandState model, createHandState mutation, and onCreateHandState subscription
  • DynamoDB TableHandState table with auto-generated resolvers from the @model directive

Data Flow

Interactive Sequence Diagram

IoT Core to AppSync Data Flow

From edge device MQTT publish to React real-time subscription update

0/10
JetsonNVIDIA JetsonIoT CoreRulesIoT Rules EngineLambdaAppSyncReactReact App0msMQTT publish: the-project/robotic-hand/XIAOAmazingHandRig...Nested fingers payload1msTopic matches: the-project/robotic-hand/+/state2msSQL: SELECT gesture, letter, ts, fingers, video_url, topi...Extract device_name from topic5msInvoke with enriched event (device_name added)6msValidate device_name exists7msFlatten: fingers.index.angle_1 → indexAngle1 (x8 fields)Defaults: 0 for angles, null for optional10mscreateHandState mutation (API Key auth)Amplify v6 SDK15msPersist to DynamoDB (HandState table)20msonCreateHandState subscription pushReal-time WebSocket21msUpdate 3D hand animation + letter history + video feed
Jetson
IoT Core
Rules
Lambda
AppSync
React
Milestone
Complete
Total: 10 steps across 6 components
Edge device state → React UI update in ~20ms

How it works

IoT Rules Engine SQL Query

The IoT Rule is the entry point. It listens on the MQTT topic pattern the-project/robotic-hand/+/state where + is a single-level wildcard matching any device name (e.g. XIAOAmazingHandRight).

The SQL query (using AWS IoT SQL version 2016-03-23) selects specific fields from the MQTT payload and enriches them with metadata extracted from the topic path:

SELECT gesture, letter, ts, fingers, video_url, topic(3) AS device_name
FROM 'the-project/robotic-hand/+/state'
  • gesture — The type of sign being performed (e.g. "fingerspell")
  • letter — The current letter being signed (e.g. "E")
  • ts — Unix timestamp in seconds
  • fingers — Nested JSON object containing servo angles for all four fingers, each with two joint angles
  • video_url — Optional pre-signed S3 URL for video of the hand in action
  • topic(3) AS device_name — Extracts the 3rd segment of the MQTT topic path as the device name, so the Lambda does not need to parse the topic itself

MQTT Payload Format

The edge device publishes hand state messages in this format:

{
"gesture": "fingerspell",
"letter": "E",
"ts": 1770550850,
"fingers": {
"index": { "angle_1": 45, "angle_2": -45 },
"middle": { "angle_1": 45, "angle_2": -45 },
"ring": { "angle_1": 45, "angle_2": -45 },
"thumb": { "angle_1": 60, "angle_2": -60 }
},
"video_url": "https://cc-amazing-video.s3.amazonaws.com/videos/hand_20260228.mp4?..."
}

The fingers object uses a nested structure with angle_1 and angle_2 per finger — representing the two joints of each finger on the Amazing Hand. This nested format is natural for the edge device to produce but needs to be flattened for the GraphQL schema.

Lambda Function — Payload Transformation

The Lambda function (AmazingHandToAppSyncFunction) receives the IoT event with the enriched fields from the SQL query. Its job is to:

  1. Validate that the device_name field exists (required for the GraphQL mutation)
  2. Flatten the nested fingers object into individual angle fields:
MQTT PayloadGraphQL Field
fingers.index.angle_1indexAngle1
fingers.index.angle_2indexAngle2
fingers.middle.angle_1middleAngle1
fingers.middle.angle_2middleAngle2
fingers.ring.angle_1ringAngle1
fingers.ring.angle_2ringAngle2
fingers.thumb.angle_1thumbAngle1
fingers.thumb.angle_2thumbAngle2
  1. Default missing angle values to 0, missing gesture/letter/videoUrl to null, and missing timestamp to Math.floor(Date.now() / 1000)
  2. Call AppSync createHandState mutation using the Amplify v6 SDK configured with API Key authentication

The Lambda uses the Amplify v6 SDK (aws-amplify@^6.0.0) to call AppSync, configured via environment variables:

Amplify.configure({
API: {
GraphQL: {
endpoint: process.env.APPSYNC_API_URL,
region: process.env.AWS_REGION,
defaultAuthMode: 'apiKey',
apiKey: process.env.APPSYNC_API_KEY
}
}
});

GraphQL Mutation

The Lambda calls this mutation to persist the hand state and trigger the real-time subscription:

mutation CreateHandState($input: CreateHandStateInput!) {
createHandState(input: $input) {
id
deviceName
gesture
letter
indexAngle1
indexAngle2
middleAngle1
middleAngle2
ringAngle1
ringAngle2
thumbAngle1
thumbAngle2
timestamp
videoUrl
createdAt
}
}

When AppSync receives this mutation, two things happen:

  1. The hand state record is persisted to DynamoDB via the auto-generated @model resolver
  2. The onCreateHandState subscription is triggered, pushing the new record to all subscribed clients — including the React frontend from Part 1, which uses this data to update the 3D hand animation, signed letter history, and video feed in real-time

CDK Stack Definition

The entire stack is defined in approximately 74 lines of TypeScript. The stack accepts the AppSync API URL, API key, and API ID as props, which are injected via environment variables during deployment:

interface IoTStreamingStackProps extends cdk.StackProps {
appSyncApiUrl: string;
appSyncApiKey: string;
appSyncApiId: string;
}

The stack creates the Lambda function with the AppSync connection details as environment variables, grants it appsync:GraphQL permissions scoped to the specific API, creates the IoT Topic Rule with the SQL query, and grants IoT permission to invoke the Lambda.

Two stack outputs are exported for reference:

  • AmazingHandIoTRuleArn — The IoT Rule ARN
  • AmazingHandLambdaFunctionArn — The Lambda function ARN

CI/CD Pipeline

The project includes a GitHub Actions workflow (.github/workflows/aws-cdk-deploy.yml) that automates deployment:

  1. Triggers on pushes to main and dev branches
  2. Authenticates using OIDC (no static AWS credentials stored in GitHub)
  3. Automatically discovers the AppSync configuration by:
    • Reading the Amplify App ID from SSM Parameter Store (/iot/amplify/amazinghand)
    • Finding the Amplify data CloudFormation stack
    • Extracting the AppSync API ID from CloudFormation stack resources, then querying the AppSync API directly for the URL and API key
  4. Runs cdk deploy with the discovered values

This means the stack automatically stays connected to the correct AppSync API without manual configuration.

Technical Challenges & Solutions

Challenge 1: Flattening Nested IoT Payloads for GraphQL

Problem: The edge device publishes finger angles in a nested JSON structure (fingers.index.angle_1), but the AppSync GraphQL schema uses flat fields (indexAngle1). The IoT Rules Engine SQL can select nested objects but cannot rename nested fields into flat ones.

Solution: The Lambda function handles the transformation. It receives the nested fingers object from the IoT Rule and manually flattens each field with safe defaults (0 for missing angles, null for optional fields). This keeps the IoT Rule SQL simple and the edge device payload natural.

Challenge 2: Connecting to Amplify-Managed AppSync

Problem: The AppSync API is managed by Amplify Gen 2 in Part 1's repository, not by this CDK stack. The API URL, API key, and API ID change between environments and deployments.

Solution: The CI/CD pipeline automatically discovers the AppSync configuration at deploy time by reading from SSM Parameter Store and CloudFormation stack outputs. For local development, the values are passed via environment variables in deploy.sh. The CDK stack accepts them as typed props, keeping the infrastructure code clean.

Challenge 3: Extracting Device Name from MQTT Topic

Problem: The device name is part of the MQTT topic path (the-project/robotic-hand/XIAOAmazingHandRight/state), not the message payload. The Lambda needs it to set the deviceName field in the GraphQL mutation.

Solution: The IoT Rules Engine SQL function topic(3) extracts the 3rd segment of the topic path and aliases it as device_name. This is passed to the Lambda as part of the event, so the Lambda does not need to parse the topic itself. The wildcard + in the topic filter means this works for any device name without configuration changes.

Getting Started

GitHub Repository: https://github.com/chiwaichan/cdk-iot-amazing-hand-streaming

Prerequisites

  • Node.js 18+
  • AWS CDK CLI installed (npm install -g aws-cdk)
  • AWS CLI configured with credentials
  • An existing AppSync API with the HandState schema (deployed via Part 1's Amplify Gen 2 backend)

Deployment Steps

  1. Get AppSync configuration from Part 1's Amplify deployment:
export APPSYNC_API_ID=your_api_id
export APPSYNC_API_URL=https://your-api-id.appsync-api.us-east-1.amazonaws.com/graphql
export APPSYNC_API_KEY=your_api_key
  1. Clone and Install:
git clone https://github.com/chiwaichan/cdk-iot-amazing-hand-streaming.git
cd cdk-iot-amazing-hand-streaming
npm install
cd lambda/amazing-hand-to-appsync && npm install && cd ../..
  1. Deploy:
./deploy.sh

This bootstraps CDK (if needed) and deploys the stack with the AppSync configuration.

What's Next

In Part 3, I will cover the edge AI agent (strands-agents-amazing-hands) — a Strands Agent powered by Amazon Nova 2 Lite running on an NVIDIA Jetson that subscribes to the MQTT sentence text published by the frontend in Part 1, translates them into physical servo movements on the Pollen Robotics Amazing Hand for ASL fingerspelling, records video, and publishes hand state back to IoT Core — which this Part 2 stack routes through to AppSync for the frontend to consume.

Summary

This post covered the cloud infrastructure layer of the voice-to-sign-language translation system:

  • IoT Rules Engine listens on the-project/robotic-hand/+/state and extracts device name from the topic path using topic(3)
  • Lambda function flattens nested finger angle payloads (fingers.index.angle_1indexAngle1) and calls the AppSync createHandState GraphQL mutation
  • AppSync persists to DynamoDB and broadcasts onCreateHandState subscriptions to connected React clients in real-time
  • CDK stack is intentionally small (~74 lines) — it creates only the IoT Rule, Lambda, and IAM glue, relying on the Amplify-managed AppSync API from Part 1
  • CI/CD pipeline automatically discovers AppSync configuration from SSM Parameter Store, CloudFormation stack resources, and direct AppSync API calls — no manual configuration needed
  • The stack completes the real-time feedback loop: edge device publishes state → IoT Core → Lambda → AppSync → React frontend updates 3D hand animation

Real-Time Voice to Sign Language Translation with Amazon Nova 2 Sonic and Pollen Robotics Amazing Hand - Part 1: Frontend and Voice Processing

· 16 min read
Chiwai Chan
Tinkerer

Pollen Robotics Amazing Hand

This is Part 1 of a 3-part series covering a real-time voice-to-sign-language translation system. The complete solution spans three separate repositories, each responsible for a distinct layer of the system:

  1. This post (Part 1) - Frontend and Voice Processing — The React web app that captures speech, streams it to Amazon Nova 2 Sonic on Bedrock, publishes cleaned sentence text via MQTT, and renders a real-time 3D hand visualisation
  2. Part 2 - Cloud Infrastructure (cdk-iot-amazing-hand-streaming) — The AWS CDK stack that routes IoT Core messages through Lambda to AppSync, enabling real-time GraphQL subscriptions between the edge device and the frontend
  3. Part 3 - Edge AI Agent (strands-agents-amazing-hands) — The Strands Agent powered by Amazon Nova 2 Lite running on an NVIDIA Jetson that receives MQTT sentence text, translates it to ASL servo commands, drives the Pollen Robotics Amazing Hand for fingerspelling, and streams video and state back

In this post, I focus on how speech enters the system, how Amazon Nova 2 Sonic processes and cleans up the spoken input, and how the frontend publishes cleaned sentence text over MQTT — setting the stage for Parts 2 and 3.

The key idea is that Nova 2 Sonic is not used as a chatbot here — it is configured as a dumb speech-to-text relay pipe that cleans up grammar, removes filler words like "um" and "uh", translates non-English speech to English, and forwards the cleaned text via a forced tool invocation (send_text) on every single utterance. The frontend then publishes the cleaned sentence text to AWS IoT Core over MQTT for the edge device to translate into ASL servo commands.

Goals

  • Capture speech in the browser and stream it to Amazon Nova 2 Sonic via bidirectional streaming — no backend servers required
  • Use Nova 2 Sonic's forced tool use (send_text) with toolChoice: { any: {} } to relay cleaned text on every utterance, not as a conversational chatbot
  • Publish cleaned sentence text to AWS IoT Core over MQTT for the edge device to translate into ASL servo commands
  • Subscribe to real-time hand state updates via GraphQL (AppSync) and synchronise a 3D Three.js hand animation with the physical hand
  • Use AWS Amplify Gen 2 for infrastructure-as-code backend definition in TypeScript (Cognito, AppSync, IAM policies)
  • Display a 3-column UI with signed letter history, 3D hand animation with video feed, and live transcript with microphone controls

The Overall System

The end-to-end system takes spoken words from a browser microphone all the way through to physical ASL fingerspelling on an Amazing Hand — an open-source robotic hand designed by Pollen Robotics and manufactured by Seeed Studio — passing through cloud AI, IoT messaging, and an edge AI agent along the way.

Overall System Architecture

System Components:

  1. React Frontend (this post) - Captures speech, streams to Bedrock, publishes cleaned sentence text to MQTT, renders 3D hand animation synchronised with the physical hand via GraphQL subscriptions
  2. Cloud Infrastructure (Part 2) - AWS CDK stack with IoT Core rules that route MQTT messages through Lambda to AppSync, enabling real-time GraphQL subscriptions between the edge device and the frontend
  3. Edge AI Agent (Part 3) - Strands Agent powered by Amazon Nova 2 Lite on an NVIDIA Jetson that receives MQTT sentence text, translates it to ASL servo commands, drives the Amazing Hand for fingerspelling letter by letter, records video, and publishes hand state back via IoT Core

Interactive Sequence Diagram

End-to-End Voice to Sign Language Flow

From user speech to ASL fingerspelling on the Amazing Hand

0/13
UserBrowserReact AppBedrockAmazon BedrockIoT CoreAWS IoT CoreJetsonNVIDIA JetsonHandAmazing Hand0.0sClick microphone and speakUser speaks naturally0.1sAudioWorklet: capture → resample → PCM → Base640.2sInvokeModelWithBidirectionalStream (audio chunks)Real-time streaming2.0ssend_text tool invocation (cleaned English text)Filler words removed, translated to English2.1sMQTT publish: { id, sentence, ts }Plain text, no servo data2.2sMQTT subscribe: receive sentence text2.5sStrands Agent + Nova 2 Lite: sentence → ASL letter l...LLM tool use + ASL_ALPHABET table3.0sDrive servos for ASL fingerspelling letter by letterLetter by letter via serial4.0sPublish hand state (servo angles + video)4.1sGraphQL subscription: hand state updateVia AppSync4.2sUpdate 3D hand animation + video feed + letter history4.5saudioOutput: voice confirmation ("Sent")4.6sPlay audio confirmation + see 3D hand + video
User
Browser
Bedrock
IoT Core
Jetson
Hand
Milestone
Complete
Total: 13 steps across 6 components (3 repos)
Speech → Sentence → Edge AI → ASL Fingerspelling

Architecture

The frontend is built with React 19, Vite 7, and TypeScript 5.9. The application is structured around a main VoiceChat.tsx component that orchestrates four custom hooks, three utility modules, and a Three.js-based hand animation component.

Application UI

React Hooks Architecture

React Hooks Architecture

Components:

  • VoiceChat.tsx - Main UI component with a 3-column responsive layout. Coordinates all hooks, renders the transcript feed, microphone controls, signed letter history, hand state data grid, video feed, and 3D animation. Collapses to a single column on screens under 1100px
  • useNovaSonic - Core hook managing the Bedrock bidirectional stream with InvokeModelWithBidirectionalStreamCommand. Handles authentication via Cognito, the Nova 2 Sonic event protocol (session/prompt/content lifecycle), the async generator input stream with backpressure, and send_text tool use responses. The tool is configured with toolChoice: { any: {} } to force tool invocation on every utterance
  • useAudioRecorder - Captures microphone input using an inline AudioWorklet running in a separate thread. Accumulates 2048 samples per buffer, resamples from the device sample rate (typically 48kHz) to 16kHz, converts Float32 to PCM16, and Base64 encodes for transmission
  • useAudioPlayer - Provides audio playback capability (FIFO queue of AudioBuffers at 24kHz). In the current implementation, Nova 2 Sonic's audio output is intentionally discarded since only the cleaned text via tool use is needed — the hook is available but not actively fed audio data
  • useHandStream - Subscribes to AppSync GraphQL onCreateHandState subscription filtered by device name. Fetches the last 20 hand states on mount and maintains a real-time list of 8 servo angles (thumb, index, middle, ring — each with two joint angles), letters, and video URLs
  • iotPublisher.ts - Publishes MQTT messages to the topic the-project/robotic-hand/XIAOAmazingHandRight/action. Publishes cleaned sentence text as { id, sentence, ts } payloads and handles IoT policy attachment to the Cognito identity
  • HandAnimation.tsx - Procedurally generated 3D robotic hand using Three.js with no external 3D models. The palm is built with LatheGeometry (curved cup shape), and each finger has a dual-joint rig (proximal + distal) with synchronised linkage. Uses WebGL rendering with PCFSoftShadowMap shadows, OrbitControls, and industrial-style materials with metalness/roughness

Authentication Flow

The frontend needs temporary AWS credentials to call both Bedrock (for Nova 2 Sonic streaming) and IoT Core (for MQTT publishing). No long-term credentials are stored in the browser.

Authentication Flow

Authentication Layers:

  • Cognito User Pool - Handles user registration and login with email/password. Configured via Amplify Gen 2 defineAuth with preferredUsername as an optional attribute
  • Cognito Identity Pool - Exchanges JWT tokens from the User Pool for temporary AWS credentials (access key, secret key, session token). Credentials are automatically refreshed by the Amplify SDK before expiration
  • IAM Role - The authenticated user role grants two sets of permissions: bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream scoped to amazon.nova-2-sonic-v1:0 in us-east-1, and iot:Publish, iot:Connect, iot:DescribeEndpoint, and iot:AttachPolicy for IoT Core MQTT access. An IoT Core policy (RoboticHandPolicy) is also attached to the Cognito identity at runtime to authorise MQTT publishing to the the-project/robotic-hand/* topic pattern

How it works

Audio Capture and Processing

The browser captures audio from the microphone using the Web Audio API and an AudioWorklet running in a separate thread. The AudioWorklet avoids main-thread blocking and processes audio in real-time with echo cancellation and noise suppression enabled.

Audio Processing Pipeline

Input Processing (Recording):

  1. Microphone - Browser calls getUserMedia() to capture audio at the device's native sample rate (typically 48kHz) with mono channel, echo cancellation, and noise suppression enabled
  2. AudioWorklet - An inline AudioCaptureProcessor (loaded as a Blob URL to avoid CORS issues) runs in a separate thread. It accumulates samples in a buffer and posts a Float32Array message to the main thread every 2048 samples
  3. Resample - Linear interpolation resampling converts from 48kHz to 16kHz (Nova 2 Sonic's required input rate). The ratio is calculated dynamically from the actual device sample rate
  4. Float32 to PCM16 - Floating point samples in the range [-1, 1] are converted to 16-bit signed integers. Negative values are multiplied by 0x8000 and positive values by 0x7FFF
  5. Base64 Encode - The binary PCM data is encoded to Base64 text for JSON transmission to Bedrock via a custom uint8ArrayToBase64() utility that iterates bytes into a binary string and then calls btoa()

Amazon Nova 2 Sonic Bidirectional Streaming

The heart of the system is the bidirectional stream to Amazon Nova 2 Sonic using InvokeModelWithBidirectionalStreamCommand. Nova 2 Sonic is configured not as a chatbot, but as a speech relay that cleans up input and forwards it via forced tool use.

Bidirectional Streaming Protocol

Input Events (sent to Bedrock):

  • sessionStart - Initialises the session with inference configuration: maxTokens: 1024, topP: 0.9, temperature: 0.7
  • promptStart - Configures audio output format: audio/lpcm at 24kHz, 16-bit, mono, voice matthew, Base64 encoding. Also defines the send_text tool with toolChoice: { any: {} } to force tool invocation on every utterance
  • contentStart (TEXT) - Sends the system prompt that instructs Nova 2 Sonic to act as a "dumb speech-to-text relay pipe" — clean up grammar, remove filler words, translate non-English to English, call send_text with the cleaned text, then respond with only "Sent"
  • contentStart (AUDIO) - Marks the beginning of audio input content
  • audioInput - Streams Base64-encoded 16kHz PCM audio chunks in real-time as the user speaks
  • contentEnd / promptEnd / sessionEnd - Lifecycle events to terminate content blocks, prompts, and sessions

Output Events (received from Bedrock):

  • textOutput - Returns transcribed user speech and the generated AI response text ("Sent")
  • toolUse - The send_text tool invocation containing the cleaned text in { sentence: "..." } format. This is the primary output — the frontend publishes the sentence to MQTT for the edge device to translate into ASL servo commands
  • audioOutput - Synthesised voice response as Base64-encoded 24kHz PCM. In the current implementation, audio output is intentionally discarded since only the cleaned text via tool use is needed

Tool Use — send_text:

  • The tool is defined with toolChoice: { any: {} }, which forces Nova 2 Sonic to call it on every single utterance without exception
  • The tool accepts a single sentence parameter — the cleaned-up, well-formed sentence
  • When the tool invocation arrives, the frontend extracts the sentence and publishes it as { id, sentence, ts } to IoT Core via MQTT using publishSentence(). The edge device then translates the sentence into ASL servo commands
  • A JSON tool result ({ "status": "success", "sentence": "..." }) is sent back to Nova 2 Sonic to complete the tool use cycle

Publishing Sentences via MQTT

Once the cleaned sentence is extracted from the send_text tool invocation, iotPublisher.ts publishes it to the MQTT topic the-project/robotic-hand/XIAOAmazingHandRight/action via AWS IoT Core.

MQTT Command Flow

The payload is a simple JSON object containing:

  • id - A UUID for the message
  • sentence - The cleaned sentence text from Nova 2 Sonic
  • ts - Unix timestamp in seconds

The edge device (covered in Part 3) receives this sentence and is responsible for translating it into ASL servo commands and driving the physical hand.

Interactive Sequence Diagram

MQTT Command Pipeline

From Nova 2 Sonic text output to IoT Core sentence publish

0/7
NovaNova 2 SonicHookuseNovaSonicVoiceChatVoiceChat.tsxPublisheriotPublisher.tsIoT CoreAWS IoT Core0mssend_text tool invocation: { sentence: "hello w...Cleaned text from speech1msonToolUse callback with tool event2msParse event.content JSON, extract sentence3mspublishSentence(sentence)4msBuild payload: { id: uuid, sentence, ts }Plain text, no servo data5msMQTT publish to .../action topic6msTool result: { status: "success", sentence: ".....
Nova
Hook
VoiceChat
Publisher
IoT Core
Milestone
Sentence Publish Pipeline
Speech → Nova 2 Sonic cleanup → send_text tool use → publishSentence → MQTT to edge device

The browser console logs the performance breakdown for each utterance through the voice-to-IoT pipeline. In this example, the end-to-end time from speech detection to IoT publish is approximately 2.9 seconds — with the majority spent on Speech-to-Text (2228ms) as Nova 2 Sonic processes the audio, followed by Text-to-Tool extraction (423ms) and IoT Publish (243ms):

Voice-to-IoT Pipeline Performance

Real-Time Hand State via GraphQL Subscription

The frontend subscribes to AppSync's onCreateHandState GraphQL subscription to receive real-time updates from the edge device. Each update includes the device name, current letter being signed, all 8 servo angles (thumb, index, middle, ring — each with two joint angles), a timestamp, and an optional video URL.

On mount, the hook fetches the last 20 hand states to populate the UI immediately. New states arrive in real-time as the edge device publishes them back through IoT Core → Lambda → AppSync. The data is displayed in both the signed letter history panel and the raw hand state data grid.

3D Hand Visualisation

The HandAnimation.tsx component renders a procedurally generated 3D robotic hand using Three.js — no external 3D models are loaded. The entire hand is built from code:

  • The palm uses LatheGeometry to create a curved cup shape that tapers from a narrow wrist (radius 0.18) to wide knuckles (radius 0.56)
  • Each finger has a dual-joint rig with proximal and distal segments, knuckle joints, linkage bars, and fingertips. The thumb is mounted on the side of the palm and rotates on the Z-axis, while the index, middle, and ring fingers are mounted on the front rim and rotate on the X-axis
  • The distal joint automatically follows the proximal joint at 50% of its angle, simulating a synchronised linkage mechanism
  • Materials use industrial-style metalness/roughness: dark gray frame (0x2a2a2a), light gray joints (0x888888), and darker gray tips (0x555555)
  • The scene includes PCFSoftShadowMap shadows, ambient lighting (0.8), directional light (1.0), and a fill light (0.4), with OrbitControls for interactive zoom and rotation

Servo angle updates from the GraphQL subscription drive the finger rotations in real-time, keeping the 3D animation synchronised with the physical Amazing Hand.

Audio Playback

The useAudioPlayer hook provides a FIFO queue-based audio playback capability for Web Audio AudioBuffer objects at 24kHz. However, in the current implementation, Nova 2 Sonic's audio output is intentionally discarded — the onAudioOutput callback is set to a no-op since only the cleaned text via the send_text tool use is needed to drive the MQTT pipeline. The hook remains available for future use if audio feedback is desired.

Technical Challenges & Solutions

Challenge 1: AudioWorklet CORS Issues

Problem: Loading an AudioWorklet processor from an external JavaScript file fails with CORS errors on some deployments, particularly when using Amplify Hosting.

Solution: Inline the AudioWorklet code as a Blob URL. The processor code is defined as a string, converted to a Blob with type application/javascript, and loaded via URL.createObjectURL(). The object URL is revoked after the module is added:

const blob = new Blob([audioWorkletCode], { type: 'application/javascript' });
const workletUrl = URL.createObjectURL(blob);
await audioContext.audioWorklet.addModule(workletUrl);
URL.revokeObjectURL(workletUrl);

Challenge 2: Forcing Tool Use on Every Utterance

Problem: Nova 2 Sonic is a conversational model by default — it wants to chat and respond naturally. But in this system, it needs to act as a pure relay, forwarding every single utterance as cleaned text without adding commentary or refusing any messages.

Solution: A combination of system prompt engineering and forced tool use. The system prompt explicitly instructs Nova 2 Sonic to act as a "dumb speech-to-text relay pipe" and never add commentary. The send_text tool is configured with toolChoice: { any: {} }, which forces the model to invoke a tool on every response. After calling the tool, it is instructed to only respond with "Sent".

Challenge 3: Keeping MQTT Payloads Simple

Problem: The system needs to transmit the user's intent from the frontend to the edge device reliably via IoT Core MQTT.

Solution: Rather than translating text to servo commands on the frontend (which would require large payloads with many servo poses), the frontend publishes only the cleaned sentence text as a compact { id, sentence, ts } JSON payload. The edge device is responsible for translating the sentence into ASL servo commands, keeping the MQTT messages small and the frontend simple.

Getting Started

GitHub Repository: https://github.com/chiwaichan/amplify-react-nova-sonic-voice-chat-amazing-hand

Prerequisites

  • Node.js 18+
  • AWS Account with Amazon Nova 2 Sonic model access enabled in Bedrock (us-east-1 region)
  • AWS CLI configured with credentials

Deployment Steps

  1. Enable Nova 2 Sonic in Bedrock Console (us-east-1 region)

  2. Clone and Install:

git clone https://github.com/chiwaichan/amplify-react-nova-sonic-voice-chat-amazing-hand.git
cd amplify-react-nova-sonic-voice-chat-amazing-hand
npm install
  1. Start Amplify Sandbox:
npx ampx sandbox
  1. Run Development Server:
npm run dev
  1. Open Application: Navigate to http://localhost:5173, create an account, and start talking. Note that the full system requires Parts 2 and 3 to be deployed for the physical hand to respond — but the frontend will still capture speech, process it through Nova 2 Sonic, and display the 3D hand animation independently.

What's Next

In Part 2, I will cover the cloud infrastructure layer — the AWS CDK stack (cdk-iot-amazing-hand-streaming) that routes IoT Core MQTT messages through Lambda to AppSync. This is the bridge that enables real-time GraphQL subscriptions, allowing the frontend to receive hand state updates from the edge device as they happen.

In Part 3, I will cover the edge AI agent (strands-agents-amazing-hands) — a Strands Agent powered by Amazon Nova 2 Lite running on an NVIDIA Jetson that subscribes to the MQTT sentence text published by this frontend, translates them into physical servo movements on the Pollen Robotics Amazing Hand for ASL fingerspelling, records video of the hand in action, and publishes state back through IoT Core.

Summary

This post covered the frontend and voice processing layer of a real-time voice-to-sign-language translation system:

  • Amazon Nova 2 Sonic is used not as a chatbot but as a speech relay — configured via system prompt and toolChoice: { any: {} } forced send_text tool use to clean up grammar, remove filler words, translate to English, and forward every utterance as text
  • Audio pipeline captures at 48kHz via AudioWorklet, resamples to 16kHz, converts to PCM16 Base64 for Bedrock input. Nova 2 Sonic's audio output is intentionally discarded since only the cleaned text is needed
  • MQTT publishing sends cleaned sentence text as { id, sentence, ts } to AWS IoT Core for the edge device to translate into ASL servo commands
  • Real-time feedback via GraphQL subscriptions keeps the 3D Three.js hand animation synchronised with the physical Amazing Hand using 8 servo angles (thumb, index, middle, ring — each with two joints)
  • Fully serverless frontend using AWS Amplify Gen 2 with Cognito authentication, no backend servers — direct browser-to-Bedrock and browser-to-IoT Core communication

Agentic based Over-The-Air Firmware Management of Seeed Studio XIAO ESP32S3 IoT Device Firmware using Amazon AgentCore and Strands Agents

· 8 min read
Chiwai Chan
Tinkerer

I want to have the ability to be able to manage the firmware of all IoT devices using a prompt - it could be to upgrade a device to the latest version, or even to perform a rollback, whether across the entire IoT device fleet level - every device in all 20+ solution types, all the devices within a type of solution, or even at an individual device level.

Goals

  • To be able to over-the-air flash a new firmware version using a prompt
  • To have an Agentic Agent do all the work, give it a prompt and it takes cares of the rest
  • Scalable in the number of IoT devices, as well as, being able to scale as the number of new IoT solution Types increases; with no effort required - implement once and forget
  • Have the ability to rollback to any firmware version specified in the prompt
  • This same solution can be interfaced with using the Model Context Protocol (MCP): whether via Kiro CLI or Claude Code
  • This same solution can be interfaced with using a chatbot
  • Must be authenticated to interface with this solution
  • Must be a completely serverless-solution
  • Firmware integrity verification using SHA256 checksums before flashing to ensure firmware hasn't been corrupted during download
  • Safe rollout with rate limiting and automatic abort thresholds to prevent fleet-wide failures
  • Device firmware version tracking via device shadows to enable version-based targeting for updates
  • Configuration-gated deployments to enable or disable OTA updates per device type for controlled rollouts

Architecture

End-to-End OTA Firmware Update Flow

This diagram illustrates the complete flow from a user's natural language prompt to firmware being flashed on Seeed Studio XIAO ESP32S3 devices.

End-to-End OTA Firmware Update Flow

Flow Steps:

  1. User Prompt - Developer/Operator provides a natural language command (e.g., "Update all vision_ai_face_detector devices to v2.0.0")
  2. AgentCore Runtime - Amazon Bedrock AgentCore receives and processes the request
  3. Strands Agent - The agent with firmware_updater tool reasons about the task
  4. Config Check - Agent queries DynamoDB to verify the device type is enabled for OTA updates
  5. Firmware Metadata - Agent retrieves firmware binary, SHA256 checksum, and metadata from S3
  6. Create IoT Job - Agent creates a continuous IoT Job targeting the specified device group
  7. MQTT Notification - AWS IoT Core notifies devices via MQTT topic $aws/things/+/jobs/notify
  8. Firmware Download - Each XIAO ESP32S3 Vision AI Face Detector downloads the firmware directly from S3
  9. Version Reporting - Devices report their new firmware version to their Device Shadow

Interactive Sequence Diagram

End-to-End OTA Firmware Update Flow

From natural language prompt to firmware flashed on Seeed Studio XIAO ESP32

0/15
UserDeveloper/OperatorAgentCoreBedrock AgentCoreStrandsStrands AgentDynamoDBS3S3 BucketIoT CoreAWS IoT CoreXIAOXIAO ESP320.0s"Update all vision_ai_face_detector to v2.0.0"Natural language0.1sInvoke agent with prompt0.2sGetItem: Check if device type enabled0.3senabled: true0.4svalidate_files_exist()0.5sfirmware.bin, firmware.sha256, metadata.json ✓0.6sCreateJob (continuous, targeting device group)0.7sJob created: job-vision-ai-v2.0.00.8sSuccess: OTA job created for 5 devices0.9s"Created OTA job for vision_ai_face_detector..."1.0sMQTT: $aws/things/+/jobs/notifyAll devices in group2.0sGET firmware.bin (pre-signed URL)5.0sStreaming download (1.6MB)8.0sVerify SHA256 → Flash to APP1 → Reboot12.0sShadow update: firmwareVersion = "2.0.0"
User
AgentCore
Strands
DynamoDB
S3
IoT Core
XIAO
Milestone
Complete
Total: 15 message exchanges across 7 participants
~12 seconds end-to-end (prompt to firmware flashed)

Strands Agent Architecture on Amazon Bedrock AgentCore

This diagram details the internal architecture of the Strands Agent running on Amazon Bedrock AgentCore, showing how the LLM reasons about prompts and orchestrates tool execution.

Strands Agent Architecture

Components:

  • Amazon Bedrock AgentCore - Managed runtime that hosts and scales the agent
  • ECR Container - Docker image (Python 3.12) containing the Strands Agent code
  • Amazon Nova 2 Lite - The LLM that provides reasoning capabilities
  • Agent Loop - The core execution cycle: parse prompt → select tool → execute → respond
  • firmware_updater Tools:
    • push_firmware_update() - Main orchestrator that coordinates the entire OTA process
    • validate_files_exist() - Validates firmware.bin, firmware.sha256, and metadata.json exist in S3
    • create_dynamic_thing_group() - Creates Fleet Indexing queries to target devices by firmware version

Scalability Architecture

This diagram demonstrates how the solution scales effortlessly across multiple device types and large device fleets - implement once and forget.

Scalability Architecture

Key Scalability Features:

  • Single Agent, Multiple Device Types - One Strands Agent manages all 26+ device groups without code changes
  • S3 Folder Convention - Adding a new device type is as simple as creating a new folder (e.g., firmwares/v1.0.0/new_device_type/)
  • Auto-Discovery Mapping - Folder names automatically map to Thing Groups (e.g., vision_ai_face_detectorVisionAIFaceDetectorAWSDevice)
  • Fleet Indexing Queries - Dynamically target devices based on current firmware version, no hardcoded device lists
  • Horizontal Scaling - Add unlimited devices to any group; IoT Jobs handles distribution automatically

Firmware Rollback Architecture

This diagram shows how the solution enables rollback to any previous firmware version using a simple prompt, leveraging the dual-partition architecture of the Seeed Studio XIAO ESP32.

Firmware Rollback Architecture

Key Rollback Features:

  • Version History in S3 - All firmware versions are retained (v1.0.0, v2.0.0, v3.0.0, etc.) enabling rollback to any point
  • Dual-Partition Flash Layout - XIAO ESP32 uses APP0/APP1 partitions for safe ping-pong updates
  • Persistent Storage - NVS (WiFi, config) and SPIFFS (certificates) survive firmware updates
  • SHA256 Validation - Firmware integrity verified before committing to new partition
  • Automatic Rollback - If new firmware fails to boot and connect to MQTT, device automatically reverts to previous partition

Interactive Sequence Diagram

Firmware Rollback Sequence

Rollback to any previous firmware version with dual-partition safety

0/17
UserDeveloper/OperatorStrandsStrands AgentS3S3 (Version History)IoT CoreAWS IoT CoreXIAOXIAO ESP32FlashFlash Partitions0.0s"Rollback vision_ai_face_detector to v1.0.0"Natural language0.2sList versions: v1.0.0, v2.0.0, v3.0.00.3sv1.0.0 exists with firmware.bin + sha2560.5sCreateJob: target v1.0.0 firmware0.6sJob created: rollback-v1.0.00.7s"Rollback job created, targeting 5 devices"1.0sMQTT: Job notification with v1.0.0 URL2.0sGET v1.0.0/firmware.bin5.0sStream v1.0.0 firmware (1.6MB)6.0sWrite to APP1 partition (inactive)7.0sAPP1 written, SHA256 verified7.5sSet boot partition: APP18.0sESP.restart() → Reboot10.0sBoot from APP1 (v1.0.0)12.0sMQTT Connect successful12.5sMark APP1 as valid (ota_mark_valid)13.0sShadow: firmwareVersion = "1.0.0"
User
Strands
S3
IoT Core
XIAO
Flash
Dual-partition safety: APP0 preserved during rollback to APP1
~13 seconds to rollback (with auto-recovery on failure)

Multi-Interface Access Architecture

This diagram demonstrates how the Strands Agent can be accessed through multiple interfaces with different authentication methods - enabling developers to use their preferred tools while operators can use a web-based chatbot.

Multi-Interface Access Architecture

Interface Options:

  • MCP Clients (Developer Tools) - Claude Code and Kiro CLI connect via Model Context Protocol to a Streaming AgentCore Runtime using JWT/Cognito authentication
  • Chatbot (Web UI) - AWS Amplify React app with FirmwareAssistant component connects via Lambda proxy to an IAM AgentCore Runtime using SigV4 authentication for service-to-service communication
  • Two Runtimes, Same Agent Logic - Both runtimes run the same Strands Agent code but are deployed separately with different authentication methods suited to their use cases

Firmware AI Assistant Chatbot

The chatbot interface in an Amplify React App provides a conversational way to manage firmware updates. In this example, the assistant lists all available firmware versions across device groups, and then creates an OTA job to update the pet_feeder device group to the latest firmware version.

Firmware AI Assistant Chatbot

Authentication Architecture

This diagram illustrates the multi-layer security model ensuring that all access to the firmware management system is properly authenticated. Each interface uses a different authentication method suited to its use case.

Authentication Architecture

Authentication Layers:

  • Cognito JWT (MCP Path) - Developers using Claude Code and Kiro CLI authenticate via Amazon Cognito User Pool and receive JWT tokens, connecting to the Streaming AgentCore Runtime
  • IAM SigV4 (Chatbot Path) - The Lambda proxy authenticates using AWS IAM roles with SigV4 request signing for service-to-service communication with the IAM AgentCore Runtime
  • X.509 Certificates (Device Path) - XIAO ESP32 devices authenticate to AWS IoT Core using TLS 1.2 mutual authentication with per-device certificates
  • Certificate Chain - Amazon Root CA validates device certificates stored in SPIFFS (survives firmware updates)

Serverless Architecture Overview

This diagram provides a comprehensive view of all AWS services used in the solution - every component is fully serverless with no EC2 instances to manage.

Serverless Architecture Overview

Serverless Components:

  • Frontend - AWS Amplify Hosting, AppSync GraphQL, Cognito User Pool
  • Compute - Amazon Bedrock AgentCore, Lambda Functions, EventBridge Rules
  • Storage - S3 Firmware Bucket, DynamoDB Config Table
  • IoT - IoT Core, IoT Jobs, Device Shadows, Fleet Indexing
  • Monitoring - CloudWatch Logs & Alarms, SNS Notifications
  • CI/CD - CodeBuild (ARM64), ECR Container Registry

Firmware Integrity Verification (SHA256)

This diagram shows the firmware integrity verification process that ensures firmware hasn't been corrupted during download before flashing to the device.

Firmware Integrity Verification

Verification Flow:

  1. Download - XIAO ESP32 streams firmware.bin from S3 in chunks
  2. Calculate - SHA256 hash is calculated progressively during download (streaming hash)
  3. Compare - Calculated hash is compared against expected hash from firmware.sha256 file
  4. Flash Decision - Match: proceed to flash APP1 partition | Mismatch: abort OTA and report failure

Benefits:

  • Detects corruption during download (network issues, incomplete transfers)
  • Prevents flashing of tampered firmware
  • Memory-efficient streaming verification (no need to store entire firmware before hashing)

Interactive Sequence Diagram

SHA256 Integrity Verification Sequence

Streaming hash verification during firmware download

0/15
IoT JobAWS IoT JobXIAOXIAO ESP32S3S3 BucketOTA MgrOTA ManagerFlashFlash Memory0.0sJob document with firmware URL + expected SHA2560.1sStart OTA: updateFromURLWithChecksum(url, sha256)0.2sHTTP GET firmware.bin (Content-Length: 1.6MB)0.5sStream chunk 1 (1KB)0.6sSHA256.update(chunk1) → Running hash0.7sWrite chunk 1 to APP1 partition3.0sStream chunks 2...1600 (1KB each)~1600 iterations3.5sSHA256.update(chunks) → Accumulating hash4.0sWrite remaining chunks to APP15.0sDownload complete (EOF)5.1sSHA256.finalize() → Calculated hash5.2sCompare: calculated == expected ?5.3s✓ MATCH: Commit APP1, set boot partitionSafe to flash!5.5sAPP1 committed successfully5.6sOTA_SUCCESS: Ready to reboot
IoT Job
XIAO
S3
OTA Mgr
Flash
Memory-efficient: Hash calculated during download, not after
~5.6 seconds (download + verify + commit)

Safe Rollout with Rate Limiting & Abort Thresholds

This diagram illustrates the safety mechanisms that prevent fleet-wide failures during OTA updates by controlling rollout speed and automatically aborting when issues are detected.

Safe Rollout with Rate Limiting

Safety Mechanisms:

  • Rate Limiting - Updates are deployed to a maximum of 10 devices concurrently, preventing network congestion and allowing monitoring
  • Abort Thresholds - Job automatically cancels if failure rate exceeds 5% or more than 10 absolute failures occur
  • Batch Processing - Fleet of 100 devices is updated in batches, with completed, in-progress, and pending states tracked
  • Failure Monitoring - Real-time tracking of success/failure status feeds into abort decision logic
  • Auto-Cancel - When threshold is exceeded, all pending device updates are automatically cancelled
  • SNS Alerts - Operators are immediately notified when an OTA rollout is aborted

Interactive Sequence Diagram

Safe Rollout with Abort Threshold

Rate-limited deployment with automatic abort on failure threshold

0/16
StrandsStrands AgentIoT JobsAWS IoT JobsBatch 1Batch 1 (10 devices)Batch 2Batch 2 (10 devices)MonitorFailure MonitorSNSSNS Alerts0.0sCreateJob: maxConcurrent=10, abortThreshol...Rate limiting config0.1sJob created: 100 devices in queue0.5sDeploy to Batch 1 (devices 1-10)First 10 devices15.0sDevice 1-8: SUCCESS18.0sDevice 9-10: SUCCESS18.5sBatch 1 complete: 10/10 success (0% failure)18.6s✓ Below threshold, continue rollout19.0sDeploy to Batch 2 (devices 11-20)32.0sDevice 11-14: SUCCESS45.0sDevice 15-17: FAILED (network timeout)3 failures!45.5sRunning total: 13 success, 3 failed (18.75%)45.6sCheck: 18.75% > 5% threshold45.7s⚠️ ABORT: Failure threshold exceeded!Stop rollout46.0sCancel pending jobs (devices 18-100)46.5sPublish abort notification47.0sEmail/SMS: "OTA aborted: 3 failures i...
Strands
IoT Jobs
Batch 1
Batch 2
Monitor
SNS
Prevented 84 additional devices from receiving bad firmware
Fleet-wide failure avoided by abort threshold

Source Code

The source code for this project is available on GitHub:

note

This repository is not yet open sourced. It will be made public in a future update.

Real-Time Voice Chat with Amazon Nova Sonic using React and AWS Amplify Gen 2

· 8 min read
Chiwai Chan
Tinkerer

These days I am often creating small generic re-usable building blocks that I can pontentially use across new or existing projects, in this blog I talk about the architecture for a LLM based voice chatbot in a web browser built entirely as a serverless based solution.

The key component of this solution is using Amazon Nova 2 Sonic, a speech-to-speech foundation model that can understand spoken audio directly and generate voice responses - all through a single bidirectional stream from the browser directly to Amazon Bedrock, with no backend servers required - no EC2 instances and no Containers.

Goals

  • Enable real-time voice-to-voice conversations with AI using Amazon Nova 2 Sonic
  • Direct browser-to-Bedrock communication using bidirectional streaming - no Lambda functions or API Gateway required
  • Use AWS Amplify Gen 2 for infrastructure-as-code backend definition in TypeScript
  • Implement secure authentication using Cognito User Pool and Identity Pool for temporary AWS credentials
  • Handle real-time audio capture, processing, and playback entirely in the browser
  • Must be a completely serverless solution with automatic scaling
  • Support click-to-talk interaction model for intuitive user experience
  • Display live transcripts of both user speech and AI responses

Architecture

End-to-End Voice Chat Flow

This diagram illustrates the complete flow from a user speaking into their microphone to hearing the AI assistant's voice response.

End-to-End Voice Chat Flow

Flow Steps:

  1. User Speaks - User clicks the microphone button and speaks naturally
  2. Audio Capture - Browser captures audio via Web Audio API at 48kHz
  3. Authentication - React app authenticates with Cognito User Pool
  4. Token Exchange - JWT tokens exchanged for Identity Pool credentials
  5. AWS Credentials - Temporary AWS credentials (access key, secret, session token) returned
  6. Bidirectional Stream - Audio streamed to Bedrock via InvokeModelWithBidirectionalStream
  7. Voice Response - Nova Sonic processes speech and returns synthesized voice response
  8. Audio Playback - Response audio decoded and played through speakers
  9. User Hears - User hears the AI assistant's natural voice response

Interactive Sequence Diagram

Voice Chat Sequence Flow

From user speech to AI voice response via Amazon Nova 2 Sonic

0/21
UserBrowserReact AppCognitoBedrockAmazon BedrockNovaNova 2 Sonic0.0sClick microphone buttonUser interaction0.1sfetchAuthSession()0.2sTemporary AWS credentials0.3sInvokeModelWithBidirectionalStreamHTTP/2 streaming0.4ssessionStart + promptStart + contentStart0.5sInitialize speech-to-speech model1.0sSpeak into microphoneAudio capture1.1sAudioWorklet: capture → resample → PCM → B...1.2saudioInput events (streaming chunks)Real-time3.0sClick mic to stop recording3.1scontentEnd + new contentStart (AUDIO)3.2sProcess audio → transcribe → generate resp...3.5stextOutput: transcription + response text3.6stextOutput event3.7sSynthesize voice response (24kHz)3.8saudioOutput events (streaming chunks)3.9sBase64 → PCM → Float32 → AudioBuffer4.0sPlay audio through speakers5.0sClick mic to continue conversationSame session5.1sStart recording (session already open)5.2sNew audioInput events (next turn)Repeat cycle
User
Browser
Cognito
Bedrock
Nova
Milestone
Complete
Total: 21 steps across 5 components
~4 seconds end-to-end latency

React Hooks Architecture

This diagram details the internal architecture of the React application, showing how custom hooks orchestrate audio capture, Bedrock communication, and playback.

React Hooks Architecture

Components:

  • VoiceChat.tsx - Main UI component that coordinates all hooks and renders the interface
  • useNovaSonic - Core hook managing Bedrock bidirectional stream, authentication, and event protocol
  • useAudioRecorder - Captures microphone input using AudioWorklet in a separate thread
  • useAudioPlayer - Manages audio playback queue and Web Audio API buffer sources
  • audioUtils.ts - Low-level utilities for PCM conversion, resampling, and Base64 encoding

Data Flow:

  1. Microphone audio captured by useAudioRecorder via MediaStream
  2. AudioWorklet processes samples in real-time (separate thread)
  3. Audio resampled from 48kHz to 16kHz, converted to PCM16, then Base64
  4. useNovaSonic streams audio chunks to Bedrock
  5. Response audio received as Base64, decoded to PCM, converted to Float32
  6. useAudioPlayer queues AudioBuffers and plays through AudioContext

Authentication Flow

This diagram shows the multi-layer authentication flow that enables secure browser-to-Bedrock communication without exposing long-term credentials.

Authentication Flow

Authentication Layers:

  • Cognito User Pool - Handles user registration and login with email/password
  • Cognito Identity Pool - Exchanges JWT tokens for temporary AWS credentials
  • IAM Role - Defines permissions for authenticated users (Bedrock invoke access)
  • SigV4 Signing - AWS SDK automatically signs all Bedrock requests

Key Security Features:

  • No AWS credentials stored in browser - only temporary session credentials
  • Credentials automatically refreshed by Amplify SDK before expiration
  • IAM policy scoped to specific Bedrock model (amazon.nova-2-sonic-v1:0)
  • All communication over HTTPS with TLS 1.2+

Audio Processing Pipeline

This diagram shows the real-time audio processing that converts browser audio to Bedrock's required format and vice versa.

Audio Processing Pipeline

Input Processing (Recording):

  1. Microphone - Browser captures audio at native sample rate (typically 48kHz)
  2. AudioWorklet - Processes audio in separate thread, accumulates 2048 samples
  3. Resample - Linear interpolation converts 48kHz → 16kHz (Nova Sonic requirement)
  4. Float32 → PCM16 - Converts floating point [-1,1] to 16-bit signed integers
  5. Base64 Encode - Binary PCM encoded for JSON transmission

Output Processing (Playback):

  1. Base64 Decode - Received audio converted from Base64 to binary
  2. PCM16 → Float32 - 16-bit integers converted to floating point
  3. AudioBuffer - Web Audio API buffer created at 24kHz (Nova Sonic output rate)
  4. Queue & Play - Buffers queued and played sequentially through speakers

Interactive Sequence Diagram

Audio Processing Pipeline

Real-time audio capture, format conversion, and playback

0/16
MicMicrophoneWorkletAudioWorkletUtilsaudioUtils.tsHookuseNovaSonicBedrockPlayeruseAudioPlayer0msgetUserMedia() → MediaStream (48kHz Float32)Browser native20msprocess() called ~50x/sec, accumulate 2048 samples40mspostMessage(Float32Array)To main thread41msresampleAudio(48kHz → 16kHz)Linear interp42msfloat32ToPcm16(): [-1,1] → 16-bit signed int43msuint8ArrayToBase64(): binary → text44msonAudioData(base64String)45msaudioInput event: { content: base64 }Via SDK2000msaudioOutput event: { content: base64 } (24kHz)Streaming2001msqueueAudio(base64String)2002msbase64ToUint8Array()2003msUint8Array (PCM bytes)2004mscreateAudioBufferFromPcm()2005mspcm16ToFloat32(): 16-bit int → [-1,1]2006msAudioBuffer (24kHz)2007msAudioBufferSourceNode.start() → speakers
Mic
Worklet
Utils
Hook
Bedrock
Player
Milestone
Audio Format Conversions
Input: 48kHz Float32 → 16kHz PCM16 → Base64 | Output: Base64 → PCM16 → Float32 → 24kHz AudioBuffer

Bidirectional Streaming Protocol

This diagram illustrates how the useNovaSonic hook manages the complex bidirectional streaming protocol with Amazon Bedrock.

Bidirectional Streaming Protocol

Event Protocol: Nova Sonic uses an event-based protocol where each interaction consists of named sessions, prompts, and content blocks.

Input Events (sent to Bedrock):

  • sessionStart - Initializes session with inference parameters (maxTokens: 1024, topP: 0.9, temperature: 0.7)
  • promptStart - Defines output audio format (24kHz, LPCM, voice "matthew")
  • contentStart - Marks beginning of content blocks (TEXT for system prompt, AUDIO for user speech)
  • textInput - Sends system prompt text content
  • audioInput - Streams user audio chunks as Base64-encoded 16kHz PCM
  • contentEnd - Marks end of content block
  • promptEnd / sessionEnd - Terminates prompt and session

Output Events (received from Bedrock):

  • contentStart - Marks role transitions (USER for ASR, ASSISTANT for response)
  • textOutput - Returns transcribed user speech and generated AI response text
  • audioOutput - Returns synthesized voice response as Base64-encoded 24kHz PCM
  • contentEnd - Marks end of response content

Async Generator Pattern: The SDK requires input as AsyncIterable<Uint8Array>. The hook implements this using:

  • Event Queue - Pre-queued initialization events before stream starts
  • Promise Resolver - Backpressure control for yielding events on demand
  • pushEvent() - Adds new events during conversation (audio chunks)

Serverless Architecture Overview

This diagram provides a comprehensive view of all components - the entire solution is serverless with no EC2 instances or containers to manage.

Serverless Architecture Overview

Frontend Stack:

  • React - Component-based UI framework
  • Vite - Build tool and dev server
  • TypeScript - Type-safe development
  • AWS Amplify Hosting - Static web hosting with global CDN

Backend Stack (Amplify Gen 2):

  • amplify/backend.ts - Infrastructure defined in TypeScript
  • Cognito User Pool - Email-based authentication
  • Cognito Identity Pool - AWS credential vending
  • IAM Policy - Grants bedrock:InvokeModel permission for bidirectional streaming

AI Service:

  • Amazon Bedrock - Managed foundation model inference
  • Nova 2 Sonic - Speech-to-speech model (us-east-1)
  • Bidirectional Streaming - Real-time duplex communication

Technical Challenges & Solutions

Challenge 1: AudioWorklet CORS Issues

Problem: Loading AudioWorklet from external file fails with CORS errors on some deployments.

Solution: Inline the AudioWorklet code as a Blob URL:

const blob = new Blob([audioWorkletCode], { type: 'application/javascript' });
const workletUrl = URL.createObjectURL(blob);
await audioContext.audioWorklet.addModule(workletUrl);
URL.revokeObjectURL(workletUrl);

Challenge 2: Sample Rate Mismatch

Problem: Browsers capture audio at 48kHz, but Nova Sonic requires 16kHz input.

Solution: Linear interpolation resampling in real-time:

const resampleAudio = (audioData: Float32Array, sourceSampleRate: number, targetSampleRate: number) => {
const ratio = sourceSampleRate / targetSampleRate;
const newLength = Math.floor(audioData.length / ratio);
const result = new Float32Array(newLength);
for (let i = 0; i < newLength; i++) {
const srcIndex = i * ratio;
const floor = Math.floor(srcIndex);
const ceil = Math.min(floor + 1, audioData.length - 1);
const t = srcIndex - floor;
result[i] = audioData[floor] * (1 - t) + audioData[ceil] * t;
}
return result;
};

Challenge 3: SDK Bidirectional Stream Input

Problem: AWS SDK requires input as AsyncIterable<Uint8Array>, but events need to be pushed dynamically during the conversation.

Solution: Async generator with event queue and promise-based backpressure:

async function* createInputStream() {
while (isActiveRef.current && !ctrl.closed) {
while (ctrl.eventQueue.length > 0) {
yield ctrl.eventQueue.shift();
}
const nextEvent = await new Promise(resolve => {
ctrl.resolver = resolve;
});
if (nextEvent === null) break;
yield nextEvent;
}
}

Getting Started

GitHub Repository: https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat

Prerequisites

  • Node.js 18+
  • AWS Account with Bedrock access enabled
  • AWS CLI configured with credentials

Deployment Steps

  1. Enable Nova 2 Sonic in Bedrock Console (us-east-1 region)

  2. Clone and Install:

git clone https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat.git
cd amplify-react-amazon-nova-2-sonic-voice-chat
npm install
  1. Start Amplify Sandbox:
npx ampx sandbox
  1. Run Development Server:
npm run dev
  1. Open Application: Navigate to http://localhost:5173, create an account, and start talking!

Summary

This architecture provides a reusable building block for voice-enabled AI applications:

  • Zero backend servers - Direct browser-to-Bedrock communication
  • Real-time streaming - HTTP/2 bidirectional streaming for low latency
  • Secure authentication - Cognito User Pool + Identity Pool + IAM policies
  • Audio processing pipeline - Web Audio API, AudioWorklet, PCM conversion
  • Infrastructure as code - AWS Amplify Gen 2 with TypeScript backend definition

The entire interaction happens in real-time: speak naturally, and hear the AI respond within seconds.

FeedMyFurBabies – I am switching to AWS CDK

· 7 min read
Chiwai Chan
Tinkerer

I have been a bit slack on this Cat Feeder IoT project for the last 12 months or so; there have been many challenges I've faced during that time that prevented me from materialising the ideas I had - many of them sounded a little crazy if you've had a conversation with me in passing, but they are not crazy to me in my crazy mind as I know what I ramble about is technically doable.

Examples of the technical related challenges I had were:

  • CloudFormation: the initial version of this project was implemented using CloudFormation for the IaC, here is the repository containing both the code and deployment instructions. If you read the deployment instructions, you will notice there are a lot of manual steps required - e.g. creating 2 sets of certificates in AWS Iot Core in the AWS Console; and copying and pasting values to and from the CloudFormation Parameters and Outputs, even though at the time I made my best efforts to minimise the manual effort required while coding them. It was not a good example to get it up and running especially if you are new to AWS, Arduino or IoT; as I myself struggled at times to deploy my own example.

  • Terraform: I ported the CloudFormation IaC code to Terraform some time last year, you can find it here. Nothing is wrong with Terraform itself; I just keep forgetting to save or misplaced my terraform state files every time I resume this project. In reality I might leverage both Terraform and CDK for the projects/micro-services I create in the future, but it all really depends on what I am trying to achieve at the end of the day.

Deploying the AWS CDK version of this Cat Feeder IoT project

So, the commands below are the deployment instructions taken from the AWS CDK version of this project, you can find it here: https://github.com/chiwaichan/feedmyfurbabies-cdk-iot

git clone git@github.com:chiwaichan/feedmyfurbabies-cdk-iot.git
cdk feedmyfurbabies-cdk-iot
cdk deploy

git remote rm origin
git remote add origin https://git-codecommit.us-east-1.amazonaws.com/v1/repos/feedmyfurbabies-cdk-iot-FeedMyFurBabiesCodeCommitRepo
git push --set-upstream origin main

The commands above are all you need to execute in order to deploy the Cat Feeder project in CDK - assuming you have the AWS CDK and your AWS credentials configured on the machine you are calling these commands on; the first group of commands checks out the CDK code which deploys an AWS CodeCommit repository and a CodePipeline pipeline - creates the 1st CloudFormation Stack using a CloudFormation template; and the second group of commands pushes the CDK code into the newly created CodeCommit repository created in the first group of commands, which in turns trigger an execution in CodePipeline and the pipeline deploys the resources for this Cat Feeder IoT project - creates the 2nd CloudFormation Stack using a different CloudFormation template.

The two groups of commands creates the 2 CloudFormation Stacks shown in the screenshot below, the stack "feedmyfurbabies-cdk-iot" provisions the CodeCommit repository and CodePipeline - using the 1st CloudFormation template, and the stack "Deploy-feedmyfurbabies-cdk-iot-deployed-service" provisions the resources for this Cat Feeder IoT project - using the 2nd CloudFormation template.

CloudFormation Stacks

FYI, I did not come up with the pattern I just described above that deployed the two CloudFormation Stacks: one for the pipeline and the other for the AWS resources for this Cat Feeder IoT project; I only came across it during one of those AWS online workshops I was using to learn CDK and noticed this pattern and found it useful, and pretty much decided to adopt it for my projects going forward.

Test out the deployed solution

The resources that are relevant to architecture of this AWS IoT solution are shown in the diagram below.

Deployed resources

There are 2 sets of certificates and 2 sets of AWS IoT Things and policies deployed by the "Deploy-feedmyfurbabies-cdk-iot-deployed-service":

IoT Certificates

The 1st set of certificates and IoT Thing is hooked up to the AWS Lambda function (Lambda Thing) shown in the diagram, this Lambda function acts as an AWS IoT Thing (uses the certificates saved in Systems Manager Parameter prefixed with "/feedmyfurbabies-cdk-iot-deployed-service/CatFeederThingLambda") and is fully configured as one along with all the neccessary certificates and permissions to send an MQTT message to the "cat-feeder/action" topic in AWS IoT Core; this is a very convenient way to see in action how one could send MQTT messages to AWS IoT Core using Python, as well as a good way to confirm the deployment was successful by testing it out!

Before we invoke the Lambda Thing/function, we need to subscribe to the "cat-feeder/action" topic so that we could see the incoming messages sent by the Lambda function.

Subscribe to IoT Topic

Then we invoke the Lambda function in the AWS Console:

Lambda Result

Make sure you get a green box confirming the MQTT message was sent.

The code in the Lambda is written in Python and it sends a JSON payload (the dictionary variable shown in the code below) to the IoT Topic "cat-feeder/action"

Lambda Code

Now lets go back to AWS IoT Core to confirm we have received the message:

AWS IoT Core MQTT received

We can see the message received in IoT Core is the dictionary object sent by the Lambda code

Conclusion

Using CDK does not eliminate all the issues you might encounter when using CloudFormation - I have a future blog on creating and using CloudFormation Custom Resources lined up; because at the end of the day CDK just generates a CloudFormation template and handles the deloyment of the CloudFormation Stack for you without you having to manage the CloudFormation Stacks or templates; the intent of this blog is to demonstrate how little effort is required to deploy an AWS IoT solution using CDK, compared with the same architecture I shared in my Github repo 2 years ago but with instructions using a CloudFormation template deployment that was long and tedious in manual steps.

The ultimate aim of change in IaC is to just focusing on building and iterating!

I do often talk too much in my blogs, but in this instance the instructions to deploy this solution for yourself to try out is very minimal, with the majority of the content focused on the resources deployed; and what each resource is for and how they interact with each other.

Extra

You may have noticed that there are 2 sets of certificates deployed in IoT Core and 2 IoT Things in this reference architecture, this is because you can take the 2nd set of certificates (prefixed with "/feedmyfurbabies-cdk-iot-deployed-service/CatFeederThingESP32") and Thing provisioned purely for you to send MQTT message to AWS IoT Core from your own IoT hardware devices / micro-controllers.

Your own Thing

If you want to try it out, you will need to use the IoT Core Endpoint specific to your AWS Account and Region; you can either find it in the AWS IoT Core Console, or copy it from the CloudFormation Stack's Output:

IoT Core Endpoint

The Lambda Thing we tested above can be used to send MQTT messages to your own IoT device/micro-controller, as the 2nd set of certificates is configured with the neccessary IoT Core Policies to receive the MQTT messages sent to the Topic "cat-feeder/action", and the certificates is also configured with the policies to send MQTT messages to a second IoT Topic called "cat-feeder/states"

Your own Thing Architecture

I have a future blog that will demonstrate how to do this using MicroPython and a Seeed Studio XIAO ESP32C3 - so watch this space.

FeedMyFurBabies – Event-Sourcing using Amazon EventBridge

· 9 min read
Chiwai Chan
Tinkerer

In my previous AWS IoT Cat feeder project I used a Lambda function as the event handler each time the Seeed Studio AWS IoT 1-click button was pressed, the Lambda function in turn published an MQTT message to AWS Iot Core which is received by the Cat Feeder (via a Seeed Studio XIAO ESP32C3 micro-controller) to dispense food into either one of the cat bowls or both (depending on the type of press performed on the IoT button). The long term goal is to integrate the AWS IoT Cat Feeder with the Feed My Fur Babies project.

In this Part 2 of the Feed My Fur Babies blog series, I will be introducing the Event-Sourcing pattern to the https://www.feedmyfurbabies.com architecture; describe the benefits of designing an architecture around Event-Souring and an example implemented using Terraform. I recently learnt Terraform and I now prefer it over the native IaC.

Current state architecture

Here is the current state of the Cat Feeder architecture amd the IoT related resources previously deployed in AWS using CloudFormation:

Current State Architecture

The responsibilities of each of the resources deployed in the diagram prior to the introduction of the Event-Sourcing pattern into the architecture are:

  • AWS IoT 1-Click Button: This is an IoT button I physically press to emit an event to dispense food into one or both of the cat bowls, this button can be used anywhere where there is a WIFI connection
  • AWS IoT Core Certificates: Certificates are associated with resources and devices that interacts with the AWS IoT Core Service, either publishing an MQTT message to an AWS IoT Topic, or receiving an MQTT message from a Topic
  • AWS Lambda - IoT 1-Click Event Handler & sends an MQTT message to an Iot topic: This Lambda function is responsible for handling incoming events created by the AWS IoT 1-Click Button, as well as translating the event into an MQTT message before sending it to an AWS IoT Core Topic. This is the component in the architecture that is the main focus of this blog post, we will describe how this component will be re-architectured and decomposed to work in conjunction with the introduction of the Event-Sourcing pattern.
  • AWS IoT Core: This is the IoT service that manages the IoT Topics and Subcriptions to said Topics
  • Seeed Studio XIAO ESP32C3: a micro-controller subscribed to the IoT Topic (the one the Lambda sent MQTT messages to) that will dispense food into 1 or 2 cat bowls when it receives an MQTT message from the Topic

For further details on what role this architecture plays in the Smart IoT Cat Feeder, visit Part 2 of the Smart Cat Feeder Blog Series.

What is Event-Sourcing?

The idea of Event-Sourcing is to capture all events that occurs within a system during its lifetime, these events are stored in an immutable ledger in the sequence in which they occurred in.

One of the biggest benefits of capturing all the events of a system is that we are able to replay every single event that has ever occured within the system (partially or as a whole) at a later time (lets say 5 years later), and have the ability to selectively replay the 5 years worth of events to one or more specific downstream event bus targets: an event bus target could be a new application that was deployed into your production environment 5 years after the first event was created; what this means is that we could hydrate this new application's datastore with 5 years worth of data as if it existed at the beginning when the first event occured. Also, imagine being able to re-create entire datastores with the full history for 100s of applications (where each application has its own datastore) within your system landscape, these datastores could be hydrated with the full history of events stored in the immutable Event-Sourcing ledger, or even replay the events that occur from the very first event and up to a specific event at a given point in time (e.g. half of the entire ledge) - effectively providing you with the ability to create any datastore in any datastore engine with the data inside in a state to any given point in time.

How do we introduce Event-Sourcing into the architecture?

Step 1

We start off with the AWS Lambda function shown in the current state architecture where its responsibilites is to handle the events received from the AWS IoT 1-Click Button each time it is pressed, as well as sending an MQTT message to an AWS Iot Core Topic in response to each incoming event; essentially it has 2 distinct responsibilities


Step 2

Next, we decompose the single Lambda function into 2 separate distinct Lambda functions based on its 2 responsibilities, then we chain the 2 Lambda functions together to preserve its functionality - what we have effectively achieved by doing this is decoupling the 2 responsibilities as 2 separate units of work - resulting in 2 separate compute resources.

The benefits by a decoupled architecture are:

  • Each of the Lambda functions can be implemented in different languages - e.g. one in Python and the other can be in Java
  • Independent release cycles for each of the Lambda functions
  • Changes to either one of the 2 responsibilities can be made independently of each other
  • Each Lambda function can be scaled independently of another

Step 3

In this step we use Amazon EventBridge as the Event-Sourcing store - known as the immutable ledger we described earlier, we will also leverage EventBridge as a serverless event bus to help us receive, filter, transform, route and deliver events to downstream services (event bus targets). In this instance we will slip EventBridge in between the 2 Lambda functions and we will be storing every single IoT event sent by the IoT Button into the immutable ledge,

Benefits of adding EventBridge to the architecture:

  • The IoT 1-Click Lambda handler no longer directly calls the downstream Lambda function - so it is unaware of the downstream targets
  • The IoT events are stored in an immutable ledger in the sequence in which they occurred in
  • Prepare the system landscape with the ability to more easily develop micro-services in an Event-Driven architecture using the orchestration pattern

Target State Architecture

Target State Architecture

This is the end result of introducing Event-Sourcing to the architecture; it may not look like much benefits has been gained from adding Amazon EventBridge - in fact one might think that we've added more components and in effect created more moving parts and complexity. But I have decided to specifically introduce this very early into the architecture as an investment so that I am in a position to rapdily build out my micro-service architecture - reaping the rewards from the get go.

Try it out for yourself

I have created a GitHub Repository to deploy a complete working example of the resources shown in the Target State Architecture using Terraform.

I suggest you deploy this to have a play for yourself:

  1. Clone the repository: "git clone git@github.com:chiwaichan/feedmyfurbabies-202303-eventsourcing-using-eventbridge.git"
  2. Setup your Terraform environment
  3. Run: "terraform init && terraform apply"

Also, check out each individual resource deployed by this Terrafrom code.

Create a test IoT 1-Click event to pass the event end-to-end through all the deployed resources

This is the IoT 1-Click Lambda function handler shown in the AWS Console

1-Click Handler Lambda

Create a test event so we can invoke the Lambda function to simulate an event as if a physical IoT Button is pressed

1-Click Handler Lambda - Test Event

Here we can view the logs for this Lambda function Test invocation

1-Click Handler Lambda - Test Event

The IoT 1-Click Lambda function handler sends an Event to the Custom EventBridge Event Bus named "feedmyfurbabies"

EventBridge Event Bus

The event sent to the Custom Event Bus matches on the "source" attribute with a value of "com.feedmyfurbabies" with the Custom Event Bus Rule named "feeds-rule"

EventBridge Event Bus Rule

This Lambda function is the downstream target of the Custom Event Bus Rule that was mactched by the event and is responsible for interpreting the event message and translate it into an MQTT message, then in turn sends it to the AWS IoT Core Topic "cat-feeder/action" that you can subscribe to using a micro-controller, e.g. Seeed Studio XIAO ESP32C3.

Send MQTT Message Lambda

Send MQTT Message Lambda - Monitoring

Here we can see the logs of the event received by the EventBridge Custom Bus Rule

Send MQTT Message Lambda - Logs

In the AWS Console for the AWS Iot Core Service, we can subscribe to Topics to receive an MQTT message right at the end of the downstream services - this is useful if you don't use a micro-controller

IoT Core - MQTT Client Subscribe Topic

Future State Architecture

Future State Architecture

We end up with an architecture that will enable us to easily add targets to consume events managed by the EventBridge Custom Event Bus, doing so in a way where the IoT 1-Click Lambda function has no knowledge of any newly created subscribers of the Custom Event Bus.

In a future blog I will demonstrate this.

Feed My Fur Babies – AWS Amplify and Route53

· 4 min read
Chiwai Chan
Tinkerer

I'm starting a new blog series where I will be documenting my build of a full-stack Web and Mobile application using AWS Amplify to implement both the frontend, as well as the backend; whilst developing dependent downstream Services outside of Amplify using AWS Serverless components to implement a Micro-Service architecture using Event-Driven design pattern - where we will break the application up into smaller consumable chunks that works together well.

Since we are creating from scratch a completely new application, I will also incorporate a vital pattern that will reduce complexity throughout the lifetime of the application: we will also be implementing the application using the Event-Sourcing pattern - this pattern ensures every Event ever published within a system is stored within an immutable ledger; this ledger will enable new Data Stores of any Data Store Engine to be created at any given time by replaying the Events in the ledger, of Events created from a start date and time to an end Date and Time.

CQRS is a pattern I will write up about with great detailed in a blog in the near future, CQRS will enable the ability to create mulitple Data Stores with identical data, each Data Store using a unique Data Store Engine instance.

What is AWS Amplify?

Amplify is an AWS Service that provides any frontend web or mobile developers with no cloud expertise the ability to build and host full-stack applications on AWS. As a frontend developer, you can leverage it to build and integrate AWS Services and components into your frontend without having to deal with the underlying AWS Services; all Services the frontend is built on top of is managed by AWS Amplify - e.g. no need to managed CloudFormation Stacks, S3 Storage or AWS Cognito.

What will I be doing with Amplify

My experience from a while ago was full-stack application development and I have worked under that role for over 10 years, I've used various frontend/backend frameworks, components and patterns.

I will be building a website called Feed My Fur Babies where I will provide video streams showing live feeds of my cats from web cams placed in various spots around my house, the website will also provide users with the ability to feed my cats using internet enabled devices like the IoT Cat Feeders I recently put together and watch them hoon on their favorite treats; although I am experienced with building websites from the ground up using AWS Service, I am aiming to build Feed My Fur Babies whilst leveraging as little as possible on that experience - this is so I am building the website as close to the targeted demographics skillset of a typical Amplify as possible, i.e. as a developer with only frontend experience.

Current Architecture State

Architecture

Update

Let's talk about what was done to get to the current architecture state.

First thing I did was buying the domain feedmyfurbabies.com using AWS Route53.

Route53 Feed My Fur Babies

Next, I created a new Amplify App called "Feedme".

Amplify - App

Within the App I created two Hosted Environments: one environment is to host the production environment, the other is to host a development environment. Each Hosted Environment is configured to be built and deployed from a specfic Branch in the shared CodeCommit Repository used to version control the frontend source code.

Amplify - Hosted Environments

Amplify - Hosted Environment - Main

Amplify - Hosted Environment - Dev

Leveraging AWS Prefix Lists

· 8 min read
Chiwai Chan
Tinkerer

AWS VPC Prefix List is a feature of the AWS Networking that has been around for a short while, however, I have yet to see it leveraged to its full potential, and more often than not I have not seen them used at all.

There are 2 types of Prefix Lists:

  • AWS-managed Prefix Lists: as the name indicates these lists are managed by AWS, and they are used to maintain a set of IP address ranges for AWS services, e.g. S3, DynamoDB and CloudFront.
  • Customer-managed Prefix Lists: these are created and maintained by anyone who has access to the AWS Console, AWS APIs or AWS SDKs. This is what we will be focusing on.

In this blog we will go into:

  • What Customer-managed Prefix Lists are
  • How they can be leveraged by AWS Security Groups
  • How they can be leveraged by AWS Subnet Route Tables
  • How they can be leveraged by AWS Transit Gateway Route Tables
  • Considerations

AWS VPC Customer-managed Prefix List is a great tool to have available as it provides the ability to track and maintain a list of CIDR block values, which can then be referenced by other AWS Networking components in their rules or route tables. Each Prefix List supports either IPv4 or IPv6 based addresses, and a number of expected Max Entries for the list must be defined; the number of entries in the list cannot exceed the Max Entries.

You can use Prefix List to maintain a list of CIDR blocks of Subnets or VPCs; or, track a list of similiar IP addresses based on a grouping of your choice, e.g. EC2 instances with a certain function - you can even track CIDR values of Subnets, VPCs and EC2 within the same list.

I have a blog on how to automatically maintain a list of EC2 instances Private IP addresses based on a Tag set against an EC2 instance: Maintain a Prefix List of EC2 Private IP Addresses using EventBridge

Let's create a Prefix List in the AWS Console

1

2

3

Prefix List – Security Group Reference

Customer-managed Prefix List is great option to have to centrally manage and track a list of CIDR blocks allowed to ingress an ENI by referencing Prefix Lists in Security Groups, a single Prefix List instance can be referenced by one or many Security Groups within the same account or cross-account.

Let's take a look at an example

This is especially useful in scenarios where you have fleet of EC2 instances where you like to allow the same network traffic sources to ingress on Port 22 to perform administration tasks, these fleet EC2 instances could scatter across multiple VPCs, and may even be scattered across multiple AWS accounts.

4

Often, we add a new Source CIDR to all Security Groups as we allow a new machine to perform administration tasks to the same fleet of EC2 instances, or even remove (or not when we forget) a CIDR Source when a machine is retired. In the past we would have modified each and every one of these Security Groups.

Here is how we can leverage Customer-managed Prefix Lists with Security Groups:

5

Here, under the same Security Group rules outcome we externalise the CIDR values into a Prefix List and reference the list in all 3 Security Groups; in the case of Security Groups spanning across multiple AWS accounts the Prefix Lists can be shared with other AWS accounts using Resource Access Manager (RAM). Now, we can allow a new machine to perform administration tasks across the entire fleet of EC2 instances by only adding a new CIDR Source to a single location, conversely, we can remove a machine by deleting a CIDR Source. There is also an added benefit of reduced effort in the need to identify which Security Groups have a rule for an IP address if we were to remove access across the entire fleet using this pattern – because it is maintained in a single location.

Prefix List – Subnet Route Table Reference

Another way to use Prefix Lists is to use them to centrally manage and track a list of CIDR block destinations to route traffic out of a Subnet’s Route Table to the same Target, a Prefix List can be referenced by one or many Subnet Route Tables within the same account or cross-account using RAM.

Let's take a look at an example

Below, we have a scenario with 3 different Route Tables across the two VPCs, with each Route Table with the same Transit Gateway Target for the same set of Destinations; and also the same Destinations routed to their respective Egress Only Internet Gateway (EIGW) for their VPC.

6

Here is how we can leverage Customer-managed Prefix Lists with Subnet Route Tables:

7

We have externalised the Destination CIDR values of the 3 Route Tables into 2 separate Prefix Lists: 1st Prefix List contains the CIDR block values of Destinations routed for the EIGW in their respective VPC; the 2nd Prefix List contains CIDR block values of Destinations routed for the same Transit Gateway instance all VPCs is an attachment of.

Prefix List – Transit Gateway Route Table Reference

Lastly, in a Transit Gateway Route Table you have the option to either to define static routes or have routes dynamically propagated from a Transit Gateway attachment. You also have the option to use a Prefix List for routing.

Here is how we can leverage Customer-managed Prefix Lists with Transit Gateway Route Tables:

8

To reference a Prefix List in a Transit Gateway Route Table, you have to reference it under the "Prefix list references" section:

9

Considerations

  • The aggregated total Max Entries of all Prefix Lists referenced by a resource (e.g. a Security Group) is counted towards the resource's quota - not the aggregated total of actual entries of all Prefix Lists. Be conscious of the Prefix List you reference in a resource, does the resource referencing the Prefix List require all the CIDR values offered in the list? if not, you are not using Prefix Lists economically.
  • If the same Prefix List instance is referenced by multiple AWS resources then consistency is enforced - operational effort is reduced due to fewer changes by not having to change a values in multiple locations.
  • Before you add or remove a CIDR value from a Prefix List, consider the flow on impact it may have to the downstream resources that reference this list, as you may inadvertently terminate some traffic flow, or worse, open up traffic to sources you don't intend to.

Conclusion

One of the things I have noticed during my short time in consulting so far is that organising Cloud resources (in particular Networking), structuring them correctly and consistently across multiple environments will set up a solid foundation for organisations in the long term, however, it is often an area that is overlooked and is only paid attention to when the rate of innovation is slowed down due to complexities and inconsistencies. Prefix Lists is a great option to have to improve consistency and operational efficiencies.

Here I have only detailed the basic use of Customer-managed Prefix Lists, but in my other blog I have a more advanced use case leveraging Prefix Lists: Work-around for cross-account Transit Gateway Security Group Reference

This solution compliments the use of networking solutions in other blogs I have written:

Maintain a Prefix List of EC2 Private IP Addresses using EventBridge

· 8 min read
Chiwai Chan
Tinkerer

AWS VPC customer-managed prefix list is a great feature to have in a tool box as it provides the ability to track and maintain a list of CIDR block values, that can be referenced by other AWS Networking component’s in their rules and tables. Each Prefix List supports either IPv4 or IPv6 based addresses, and a number of expected Max Entries for the list must be defined; the number of entries in the list cannot exceed the Max Entries. Check out my blog on AWS Prefix List to learn how it could be referenced and leveraged by other AWS Networking components.

In this blog we will:

  • Walk-through the proposed solution
  • Deploy the solution from a SAM project hosted in my GitHub repository
  • Stop the running EC2 instance provisioned by the SAM project's CloudFormation stack - this will de-register the Private IP address of the EC2 instance from the Prefix List (also provisioned by the CloudFormation stack)
  • Start the same EC2 instance - this will register the Private IP address of the provisioned EC2 instance back into the Prefix List
  • Manually create an EC2 instance with a Tag value of "prefix-list=eventbridge-managed-prefix-list"

In this solution we propose an architecture to maintain a list of EC2 Private IPs in a Prefix List by leveraging EventBridge to listen for EC2 Instance State Change Events.

1

Depending on the EC2 Instance State Change value we will perform a different action against the Prefix List using a Lambda Function: if the Instance State is “running" then we register the Private IP address into the Prefix List; or, deregister the Private IP address from the Prefix list when the Instance State is “stopping”.

2

When the event is received by the Lambda function, it will perform a lookup on the Tags of the EC2 instance for a Tag (e.g. prefix-list=eventbridge-managed-prefix-list) that indicates which Prefix List (or Lists) the Lambda function will register/de-register the Private IP against. The Prefix List should be maintained economically - because it affects the quotas of resources that reference this Prefix List as described by the AWS documentation: Prefix lists concepts and rules, so the Lambda function should ideally set the Prefix List Max Entries to the number of entries expected in the list before an entry is registered, or, afterwards if an entry de-registered.

By maintaining a Prefix List and leveraging this pattern in your solutions, your solutions may potentially benefit in the following ways:

  • Reusability of configurations which will reduce the operational burden and improve consistency.
  • Re-use of Prefix Lists by sharing it with other AWS accounts by leveraging Resource Access Manager
  • Creates an automated mechanism to track and maintain a definitive list of Private IP addresses of similarly grouped of EC2 instances with non-deterministic IP addresses
  • High cohesion and low Coupling designs: reduce manual flow on changes when a change is implemented
  • Leverage programmatic mechanisms for automatically changes and maintenance – minimise deployments and/or manual tasks
  • Improve Security posture: this may potentially reduce occurances of overly broad CIDR values used in rules or route tables where it is used to encompass a few number of IP address within a wide IP range

Deploying the solution

Here we will walk-through the steps involved to deploy a SAM project of this solution hosted in my GitHub repository: https://github.com/chiwaichan/prefix-list-of-ec2-private-ip-addresses-using-eventbridge

Prerequisites:

Run the following command to checkout the code

git clone git@github.com:chiwaichan/prefix-list-of-ec2-private-ip-addresses-using-eventbridge.git

cd prefix-list-of-ec2-private-ip-addresses-using-eventbridge/

3

Run the following command to configure the SAM deploy

sam deploy --guided

Enter the following arguments in the prompt:

  • Stack Name: prefix-list-of-ec2-private-ip-addresses-using-eventbridge
  • AWS Region: ap-southeast-2 or the value of your preferred Region
  • Parameter ImageID: ami-0c6120f461d6b39e9 (the Amazon Linux AMI ID in ap-southeast-2), you can use any AMI ID for your Region
  • Parameter SecurityGroupId: the Security Group ID to use for the EC2 instance provisioned, e.g. sg-0123456789
  • Parameter SubnetId: the Subnet ID of the Subnet to deploy the EC2 instance in, e.g. subnet-0123456678

4

5

6

Confirm the deployment

Let's check to see that everything has been deployed correctly in our AWS account.

Here we can see the list of AWS resources deployed in the CloudFormation Stack

7

Here we can see the details of the EC2 instance provisioned in a "Running" state. Take note of the Private IPv4 address. 8

This is the Prefix List provisioned; here we can see the Private IPv4 address of the EC2 instance in the Prefix list entries. Also, note that the Max Entries is currently set to 1. 9

Stopping the running EC2 Instance

Let's stop the EC2 instance

10

We should see the Private IP address of the EC2 instance removed from the Prefix List Entries, the Max Entries remains as 1 - this is because the minimum value must be 1 even when there are no Entries in the Prefix List

11

This is the sniplet of Python code in the Lambda function that removes the Private IP address from the Prefix List:

# if the instance state change is 'stopping' so we remove the private IP CIDR to the Prefix List
elif ec2_state == "stopping":
if is_in_list:
print("remove")

response = client.modify_managed_prefix_list(
PrefixListId=prefix_list_id,
CurrentVersion=current_prefix_list_version,
RemoveEntries=[
{
'Cidr': private_id_address + "/32"
},
]
)

if len(current_entries) != 1:
sleep(3)

response = client.modify_managed_prefix_list(
PrefixListId=prefix_list_id,
MaxEntries=len(current_entries) - 1
)
else:
print("not in list so no action")

Starting the stopped EC2 Instance

Let's start the EC2 instance

12

We should see the Private IP address of the EC2 instance added back to the Prefix List Entries. Note the description is different to what it was when we first saw it earlier.

13

This is the sniplet of Python code in the Lambda function that adds the Private IP address to the Prefix List:

# if the instance state change is 'running' so we add the private IP CIDR to the Prefix List
if ec2_state == "running":
if is_in_list:
print("already in list so no action")
else:
print("add")

if len(current_entries) + 1 != prefix_list["MaxEntries"]:
response = client.modify_managed_prefix_list(
PrefixListId=prefix_list_id,
MaxEntries=len(current_entries) + 1
)

sleep(3)

response = client.modify_managed_prefix_list(
PrefixListId=prefix_list_id,
CurrentVersion=current_prefix_list_version,
AddEntries=[
{
'Cidr': private_id_address + "/32",
'Description': 'added by EventBridge Lambda'
},
]
)

Manually create an EC2 instance with a Prefix List Tag

Let's launch a new EC2 instance (using any AMI and deploy it in any Subnet with any Security Group) with a value of "eventbridge-managed-prefix-list" for the "prefix-list" Tag, the EventBridge and Lambda will register the Private IP address of this newly created instance into the Prefix List "eventbridge-managed-prefix-list".

14

15

16

Here we see the Private IP address of the new manually created EC2 instance appear in the Prefix List Entries; also, the Max Entries has been updated to 2 by the Lambda function.

17

FYI, You can adapted this pattern and Lambda function to add or remove Private IP addresses based on the EC2 instance state change value of your choosing.

Clean up

  • Delete the manually created EC2 instance; afterwards, you can see it removed from the Prefix List and the Prefix List's Max Entries decreased back down to 1 by the Lambda function
  • Delete the CloudFormation stack with the name "prefix-list-of-ec2-private-ip-addresses-using-eventbridge"

This solution compliments the use of networking solutions in other blogs I have written: