I Got Me One of These - NVIDIA Jetson AGX Thor Developer Kit
I got me one of these!


This thing is a beast and I am keen to see what kind of AI robotics experiments I can create from this.
These days I am often creating small, generic, re-usable building blocks that I can potentially use across new or existing projects. In this blog I talk about the architecture for an LLM-based voice chatbot that runs in a web browser and is built as an entirely serverless solution.
The key component of this solution is Amazon Nova 2 Sonic, a speech-to-speech foundation model that understands spoken audio directly and generates voice responses - all through a single bidirectional stream from the browser straight to Amazon Bedrock, with no backend servers required: no EC2 instances and no containers.
This diagram illustrates the complete flow from a user speaking into their microphone to hearing the AI assistant's voice response.

Flow Steps:
InvokeModelWithBidirectionalStream

From user speech to AI voice response via Amazon Nova 2 Sonic
This diagram details the internal architecture of the React application, showing how custom hooks orchestrate audio capture, Bedrock communication, and playback.

Components:
Data Flow:
useAudioRecorder captures audio via MediaStream
useNovaSonic streams audio chunks to Bedrock
useAudioPlayer queues AudioBuffers and plays through AudioContext

This diagram shows the multi-layer authentication flow that enables secure browser-to-Bedrock communication without exposing long-term credentials.

Authentication Layers:
Key Security Features:
IAM permissions scoped to the Nova 2 Sonic model (amazon.nova-2-sonic-v1:0)

This diagram shows the real-time audio processing that converts browser audio to Bedrock's required format and vice versa.

Input Processing (Recording):
Output Processing (Playback):
Real-time audio capture, format conversion, and playback
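As a concrete illustration of the format-conversion step, here is a minimal sketch (not the repo's exact code; the helper names are mine) of converting the browser's Float32 samples to the 16-bit PCM Nova Sonic expects, and converting Bedrock's PCM output back to Float32 for playback:

// Convert Float32 samples (-1..1) from the AudioWorklet into 16-bit PCM.
// Hypothetical helper names; the real conversion lives in the repo's hooks.
const floatTo16BitPCM = (samples: Float32Array): Int16Array => {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
};

// Convert 16-bit PCM received from Bedrock back into Float32 for an AudioBuffer.
const pcm16ToFloat = (pcm: Int16Array): Float32Array => {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / (pcm[i] < 0 ? 0x8000 : 0x7fff);
  }
  return out;
};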
This diagram illustrates how the useNovaSonic hook manages the complex bidirectional streaming protocol with Amazon Bedrock.

Event Protocol: Nova Sonic uses an event-based protocol where each interaction consists of named sessions, prompts, and content blocks.
Input Events (sent to Bedrock):
Output Events (received from Bedrock):
Async Generator Pattern:
The SDK requires input as AsyncIterable<Uint8Array>. The hook implements this using an async generator backed by an event queue with promise-based signalling, shown in full under the challenges below.
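To make the end-to-end call concrete, here is a minimal sketch of how the browser could obtain scoped credentials from the Cognito Identity Pool and open the bidirectional stream. It assumes the AWS SDK for JavaScript v3 packages @aws-sdk/client-bedrock-runtime and @aws-sdk/credential-providers, uses placeholder identifiers, and follows the description above of the input being an AsyncIterable<Uint8Array>; the exact request and event shapes live in the repo, so treat this as an outline rather than the implementation.

import { BedrockRuntimeClient, InvokeModelWithBidirectionalStreamCommand } from "@aws-sdk/client-bedrock-runtime";
import { fromCognitoIdentityPool } from "@aws-sdk/credential-providers";

const region = "us-east-1";

// Short-lived credentials vended through the Cognito Identity Pool's authenticated role.
const client = new BedrockRuntimeClient({
  region,
  credentials: fromCognitoIdentityPool({
    clientConfig: { region },
    identityPoolId: "us-east-1:example-identity-pool-id", // placeholder value
    // logins: { "cognito-idp.us-east-1.amazonaws.com/<userPoolId>": idToken },
  }),
});

// createInputStream() is the async generator shown under the challenges below;
// it yields encoded Nova Sonic events (session, prompt and audio content blocks).
const response = await client.send(
  new InvokeModelWithBidirectionalStreamCommand({
    modelId: "amazon.nova-2-sonic-v1:0",
    body: createInputStream(),
  })
);

// Consume output events (text and audio chunks) as they arrive.
for await (const event of response.body ?? []) {
  // decode the event bytes and route textOutput / audioOutput to the audio player
}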
This diagram provides a comprehensive view of all components - the entire solution is serverless with no EC2 instances or containers to manage.

Frontend Stack:
Backend Stack (Amplify Gen 2):
bedrock:InvokeModel permission for bidirectional streaming

AI Service:
Problem: Loading AudioWorklet from external file fails with CORS errors on some deployments.
Solution: Inline the AudioWorklet code as a Blob URL:
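// audioWorkletCode below is assumed to be the worklet processor source kept as a
// plain JavaScript string, and audioContext an existing AudioContext instance.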
const blob = new Blob([audioWorkletCode], { type: 'application/javascript' });
const workletUrl = URL.createObjectURL(blob);
await audioContext.audioWorklet.addModule(workletUrl);
URL.revokeObjectURL(workletUrl);
Problem: Browsers capture audio at 48kHz, but Nova Sonic requires 16kHz input.
Solution: Linear interpolation resampling in real-time:
const resampleAudio = (audioData: Float32Array, sourceSampleRate: number, targetSampleRate: number) => {
  const ratio = sourceSampleRate / targetSampleRate;
  const newLength = Math.floor(audioData.length / ratio);
  const result = new Float32Array(newLength);
  for (let i = 0; i < newLength; i++) {
    const srcIndex = i * ratio;
    const floor = Math.floor(srcIndex);
    const ceil = Math.min(floor + 1, audioData.length - 1);
    const t = srcIndex - floor;
    result[i] = audioData[floor] * (1 - t) + audioData[ceil] * t;
  }
  return result;
};
Problem: AWS SDK requires input as AsyncIterable<Uint8Array>, but events need to be pushed dynamically during the conversation.
Solution: Async generator with event queue and promise-based backpressure:
async function* createInputStream() {
  while (isActiveRef.current && !ctrl.closed) {
    while (ctrl.eventQueue.length > 0) {
      yield ctrl.eventQueue.shift();
    }
    const nextEvent = await new Promise(resolve => {
      ctrl.resolver = resolve;
    });
    if (nextEvent === null) break;
    yield nextEvent;
  }
}
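For completeness, the producer side of this pattern looks roughly like the following - a hedged sketch assuming a ctrl object with eventQueue, resolver and closed fields as in the generator above; the function names enqueueEvent and endStream are mine, not the repo's.

// Push an encoded event into the stream: hand it straight to a waiting
// generator if one is parked on the promise, otherwise buffer it.
const enqueueEvent = (eventBytes: Uint8Array) => {
  if (ctrl.resolver) {
    const resolve = ctrl.resolver;
    ctrl.resolver = null;
    resolve(eventBytes);
  } else {
    ctrl.eventQueue.push(eventBytes);
  }
};

// Ending the conversation: resolving with null makes the generator break.
const endStream = () => {
  ctrl.closed = true;
  if (ctrl.resolver) {
    const resolve = ctrl.resolver;
    ctrl.resolver = null;
    resolve(null);
  }
};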
GitHub Repository: https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat
Enable Nova 2 Sonic in Bedrock Console (us-east-1 region)
Clone and Install:
git clone https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat.git
cd amplify-react-amazon-nova-2-sonic-voice-chat
npm install
npx ampx sandbox
npm run dev
Open http://localhost:5173, create an account, and start talking!

This architecture provides a reusable building block for voice-enabled AI applications:
The entire interaction happens in real-time: speak naturally, and hear the AI respond within seconds.


A Sphero RVR integrated with a Seeed Studio XIAO ESP32S3: telemetry is uploaded to AWS IoT Core, and basic drive remote-control commands can be received from anywhere, also via AWS IoT Core.
Lately I have been aiming to go deep on AI Robotics, and over the last year I have been slowly experimenting more and more with anything that is AI, IoT and Robotics related, with the intention of learning and going as wide and as deep as possible in any pillars I can think of. You can check out my blogs under the [Robotics](/projects/robotics) Project to see what I have been up to. This year I want to focus on enabling mobility for my experiments - as in providing wheels for solutions to move around the house, ideally autonomously; starting off with wheel-based solutions bought off the shelf, followed by solutions that I build myself from open-source projects people have kindly contributed online, and then, ambitiously, ones designed, 3D printed and built from the ground up - perhaps in a couple of years' time.
This project uses a Seeed Studio XIAO ESP32S3 microcontroller to communicate with a Sphero RVR robot via UART, while simultaneously connecting to AWS IoT Core over WiFi. The system publishes real-time sensor telemetry and accepts remote drive commands through MQTT.
| Component | Description |
|---|---|
| Seeed Studio XIAO ESP32S3 | Compact ESP32-S3 microcontroller with WiFi, 8MB flash |
| Sphero RVR | Programmable robot with motors, IMU, color sensor, encoders |
| XIAO Expansion Board | Provides OLED display (128x64 SSD1306) for status info |

The system publishes comprehensive sensor data every 60 seconds:
Control the RVR from anywhere using JSON commands:
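To give a feel for the command format, here is a hypothetical drive command payload published to the device's command topic - the topic and field names below are illustrative only; the actual schema is defined in the repo's firmware.

// Hypothetical MQTT drive command - field names are illustrative, not the
// repo's actual schema. Positive speed drives forward, heading in degrees.
const driveCommand = {
  command: "drive",
  speed: 64,        // raw motor speed
  heading: 90,      // degrees, relative to the RVR's current forward direction
  durationMs: 1000  // stop after one second
};

// Published (for example from the AWS IoT console or any MQTT client) to a
// topic such as "rvr/<thing-name>/commands".
console.log(JSON.stringify(driveCommand));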
The XIAO Expansion Board's OLED display shows real-time sensor readings for local monitoring.


The XIAO ESP32S3 acts as a bridge between the Sphero RVR and AWS IoT Core:


The Sphero RVR uses a binary packet-based protocol over UART. Each packet contains a start-of-packet byte (0x8D), an 8-byte header with device ID and command ID, variable-length data body, checksum, and end-of-packet byte (0xD8). The RVR has two internal processors: Nordic (handles BLE, power, color detection) and ST (handles motors, IMU, encoders).

I ported code based on the Sphero SDK into this project to control the RVR over its UART protocol.
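To illustrate the framing described above, here is a rough sketch of how a raw command packet could be assembled - the header bytes, flag values and checksum rule shown are assumptions drawn from the Sphero SDK's public documentation, so refer to the repo for the actual implementation.

// Sketch of the RVR serial framing: SOP, header, body, checksum, EOP.
// The header here is illustrative; the real packet described above uses an
// 8-byte header carrying the device ID and command ID among other fields.
const SOP = 0x8d;
const EOP = 0xd8;

const buildPacket = (deviceId: number, commandId: number, body: number[]): number[] => {
  // Illustrative header: flags, target, source, device ID, command ID, sequence.
  const header = [0x06, 0x02, 0x01, deviceId, commandId, 0x00];
  const payload = [...header, ...body];
  // Checksum: bitwise inverse of the byte sum, modulo 256 (assumption per the Sphero SDK).
  const checksum = ~payload.reduce((sum, b) => (sum + b) & 0xff, 0) & 0xff;
  return [SOP, ...payload, checksum, EOP];
};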
You can find the source code for this project here: https://github.com/chiwaichan/platformio-aws-iot-seeed-studio-esp32s3-sphero-rvr









The LeRobot Follower arm is subscribed to an IoT Topic that is published to in real-time by the LeRobot Leader arm over AWS IoT Core, using a Seeed Studio XIAO ESP32C3 integrated with a Seeed Studio Bus Servo Driver Board; the driver board controls the 6 Feetech 3215 servos over the UART protocol.
In this video I demonstrate how to control a set of Hugging Face SO-101 arms over AWS IoT Core - without the LeRobot framework, and without a host device such as a Mac or an NVIDIA Jetson Orin Nano Super Developer Kit. Only a Seeed Studio XIAO ESP32C3 and AWS IoT are used.
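To sketch what travels over the topic, a leader-arm message might look like the following - the topic and field names are hypothetical, chosen only to illustrate streaming six servo positions; the real message format is in the repo.

// Hypothetical joint-state message from the leader arm - field names are
// illustrative only; the actual payload format is defined in the repo.
const jointState = {
  timestampMs: Date.now(),
  // One raw position per Feetech servo, in the order the follower expects.
  positions: [2048, 1900, 2200, 2048, 1024, 3072]
};

// The follower's ESP32C3 subscribes to a topic such as "so101/leader/state"
// and writes each received position to the corresponding servo over UART.
console.log(JSON.stringify(jointState));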
You can find the source code for this solution here: https://github.com/chiwaichan/aws-iot-core-lerobot-so101
Good times


I was hungry and tried to feed myself but I forgot to bring cat food.
Here is the source code for the Streamlit app I demoed at the AWS Sydney Summit, showing how to control IoT devices using Strands Agents: https://github.com/chiwaichan/AWSSydneySummit2025Demo