
15 posts tagged with "AWS"


Real-Time Voice Chat with Amazon Nova Sonic using React and AWS Amplify Gen 2

· 8 min read
Chiwai Chan
Tinkerer

These days I am often creating small, generic, re-usable building blocks that I can potentially use across new or existing projects. In this blog I talk about the architecture for an LLM-based voice chatbot that runs in a web browser and is built entirely as a serverless solution.

The key component of this solution is Amazon Nova 2 Sonic, a speech-to-speech foundation model that understands spoken audio directly and generates voice responses - all through a single bidirectional stream from the browser directly to Amazon Bedrock, with no backend servers required: no EC2 instances and no containers.

Goals

  • Enable real-time voice-to-voice conversations with AI using Amazon Nova 2 Sonic
  • Direct browser-to-Bedrock communication using bidirectional streaming - no Lambda functions or API Gateway required
  • Use AWS Amplify Gen 2 for infrastructure-as-code backend definition in TypeScript
  • Implement secure authentication using Cognito User Pool and Identity Pool for temporary AWS credentials
  • Handle real-time audio capture, processing, and playback entirely in the browser
  • Must be a completely serverless solution with automatic scaling
  • Support click-to-talk interaction model for intuitive user experience
  • Display live transcripts of both user speech and AI responses

Architecture

End-to-End Voice Chat Flow

This diagram illustrates the complete flow from a user speaking into their microphone to hearing the AI assistant's voice response.

End-to-End Voice Chat Flow

Flow Steps:

  1. User Speaks - User clicks the microphone button and speaks naturally
  2. Audio Capture - Browser captures audio via Web Audio API at 48kHz
  3. Authentication - React app authenticates with Cognito User Pool
  4. Token Exchange - JWT tokens exchanged for Identity Pool credentials
  5. AWS Credentials - Temporary AWS credentials (access key, secret, session token) returned
  6. Bidirectional Stream - Audio streamed to Bedrock via InvokeModelWithBidirectionalStream
  7. Voice Response - Nova Sonic processes speech and returns synthesized voice response
  8. Audio Playback - Response audio decoded and played through speakers
  9. User Hears - User hears the AI assistant's natural voice response
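
To make steps 3 to 6 concrete, here is a minimal TypeScript sketch of the browser-side call - it assumes Amplify Auth is already configured, and the output event shapes are simplified (the real event protocol is covered further down):

import { fetchAuthSession } from 'aws-amplify/auth';
import {
  BedrockRuntimeClient,
  InvokeModelWithBidirectionalStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';

// Steps 3-5: exchange the Cognito session for temporary AWS credentials
const { credentials } = await fetchAuthSession();
if (!credentials) throw new Error('Not signed in');

// The SDK signs every request with SigV4 using those temporary credentials
const client = new BedrockRuntimeClient({ region: 'us-east-1', credentials });

// Step 6: open the bidirectional stream; createInputStream() is the async generator
// (shown under Challenge 3 below) that yields protocol events as byte chunks
const response = await client.send(
  new InvokeModelWithBidirectionalStreamCommand({
    modelId: 'amazon.nova-2-sonic-v1:0',
    body: createInputStream(),
  })
);

// Steps 7-9: consume textOutput / audioOutput events as they stream back
for await (const event of response.body ?? []) {
  if (event.chunk?.bytes) {
    const payload = JSON.parse(new TextDecoder().decode(event.chunk.bytes));
    // hand payload.event.textOutput / payload.event.audioOutput to the UI and player
  }
}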

Interactive Sequence Diagram

Voice Chat Sequence Flow

From user speech to AI voice response via Amazon Nova 2 Sonic

[Interactive sequence diagram: a 21-step timeline across User, Browser/React App, Cognito, Amazon Bedrock and Nova 2 Sonic - click the mic, stream audioInput chunks over the bidirectional stream, receive textOutput transcripts and audioOutput voice chunks, then continue the conversation in the same session.]
Total: 21 steps across 5 components
~4 seconds end-to-end latency

React Hooks Architecture

This diagram details the internal architecture of the React application, showing how custom hooks orchestrate audio capture, Bedrock communication, and playback.

React Hooks Architecture

Components:

  • VoiceChat.tsx - Main UI component that coordinates all hooks and renders the interface
  • useNovaSonic - Core hook managing Bedrock bidirectional stream, authentication, and event protocol
  • useAudioRecorder - Captures microphone input using AudioWorklet in a separate thread
  • useAudioPlayer - Manages audio playback queue and Web Audio API buffer sources
  • audioUtils.ts - Low-level utilities for PCM conversion, resampling, and Base64 encoding

Data Flow:

  1. Microphone audio captured by useAudioRecorder via MediaStream
  2. AudioWorklet processes samples in real-time (separate thread)
  3. Audio resampled from 48kHz to 16kHz, converted to PCM16, then Base64
  4. useNovaSonic streams audio chunks to Bedrock
  5. Response audio received as Base64, decoded to PCM, converted to Float32
  6. useAudioPlayer queues AudioBuffers and plays through AudioContext
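
As a rough sketch of how these pieces might be wired together in VoiceChat.tsx - the hook names come from the repository, but the import paths, props and return values below are my assumption, not the actual API:

// Illustrative wiring only; the exact hook signatures live in the repository
import { useNovaSonic } from './hooks/useNovaSonic';
import { useAudioRecorder } from './hooks/useAudioRecorder';
import { useAudioPlayer } from './hooks/useAudioPlayer';

export function VoiceChat() {
  const player = useAudioPlayer();                            // queues 24kHz AudioBuffers for playback
  const nova = useNovaSonic({
    onAudioOutput: (b64) => player.queueAudio(b64),           // assistant voice chunks
    onTextOutput: (text) => console.log('transcript:', text), // ASR + response text
  });
  const recorder = useAudioRecorder({
    onAudioData: (b64) => nova.sendAudioChunk(b64),           // 16kHz PCM16 chunks as Base64
  });

  return (
    <button onClick={recorder.isRecording ? recorder.stop : recorder.start}>
      {recorder.isRecording ? 'Stop' : 'Talk'}
    </button>
  );
}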

Authentication Flow

This diagram shows the multi-layer authentication flow that enables secure browser-to-Bedrock communication without exposing long-term credentials.

Authentication Flow

Authentication Layers:

  • Cognito User Pool - Handles user registration and login with email/password
  • Cognito Identity Pool - Exchanges JWT tokens for temporary AWS credentials
  • IAM Role - Defines permissions for authenticated users (Bedrock invoke access)
  • SigV4 Signing - AWS SDK automatically signs all Bedrock requests

Key Security Features:

  • No AWS credentials stored in browser - only temporary session credentials
  • Credentials automatically refreshed by Amplify SDK before expiration
  • IAM policy scoped to specific Bedrock model (amazon.nova-2-sonic-v1:0)
  • All communication over HTTPS with TLS 1.2+

Audio Processing Pipeline

This diagram shows the real-time audio processing that converts browser audio to Bedrock's required format and vice versa.

Audio Processing Pipeline

Input Processing (Recording):

  1. Microphone - Browser captures audio at native sample rate (typically 48kHz)
  2. AudioWorklet - Processes audio in separate thread, accumulates 2048 samples
  3. Resample - Linear interpolation converts 48kHz → 16kHz (Nova Sonic requirement)
  4. Float32 → PCM16 - Converts floating point [-1,1] to 16-bit signed integers
  5. Base64 Encode - Binary PCM encoded for JSON transmission

Output Processing (Playback):

  1. Base64 Decode - Received audio converted from Base64 to binary
  2. PCM16 → Float32 - 16-bit integers converted to floating point
  3. AudioBuffer - Web Audio API buffer created at 24kHz (Nova Sonic output rate)
  4. Queue & Play - Buffers queued and played sequentially through speakers
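
The two PCM conversions are only a few lines each; here is a sketch of what the audioUtils.ts helpers likely look like (the function names appear in the diagrams above, the bodies are my own reconstruction):

// Float32 [-1, 1] -> 16-bit signed PCM (recording direction)
export const float32ToPcm16 = (input: Float32Array): Int16Array => {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));   // clamp to avoid overflow
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;     // scale to the int16 range
  }
  return output;
};

// 16-bit signed PCM -> Float32 [-1, 1] (playback direction)
export const pcm16ToFloat32 = (input: Int16Array): Float32Array => {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] / (input[i] < 0 ? 0x8000 : 0x7fff); // undo the int16 scaling
  }
  return output;
};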

Interactive Sequence Diagram

Audio Processing Pipeline

Real-time audio capture, format conversion, and playback

[Interactive sequence diagram: a 16-step timeline across Microphone, AudioWorklet, audioUtils.ts, useNovaSonic, Bedrock and useAudioPlayer - capture 48kHz Float32 audio, resample and encode it into audioInput events, then decode audioOutput events into 24kHz AudioBuffers for playback.]
Audio Format Conversions
Input: 48kHz Float32 → 16kHz PCM16 → Base64 | Output: Base64 → PCM16 → Float32 → 24kHz AudioBuffer

Bidirectional Streaming Protocol

This diagram illustrates how the useNovaSonic hook manages the complex bidirectional streaming protocol with Amazon Bedrock.

Bidirectional Streaming Protocol

Event Protocol: Nova Sonic uses an event-based protocol where each interaction consists of named sessions, prompts, and content blocks.

Input Events (sent to Bedrock):

  • sessionStart - Initializes session with inference parameters (maxTokens: 1024, topP: 0.9, temperature: 0.7)
  • promptStart - Defines output audio format (24kHz, LPCM, voice "matthew")
  • contentStart - Marks beginning of content blocks (TEXT for system prompt, AUDIO for user speech)
  • textInput - Sends system prompt text content
  • audioInput - Streams user audio chunks as Base64-encoded 16kHz PCM
  • contentEnd - Marks end of content block
  • promptEnd / sessionEnd - Terminates prompt and session
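
As an illustration, the first two events pushed onto the stream might look something like the following - the parameter values come from the list above, but treat the exact nesting and property names as an approximation of the Nova Sonic event schema:

// Approximate shapes only; the authoritative event schema is in the Nova Sonic docs
const sessionStart = {
  event: {
    sessionStart: {
      inferenceConfiguration: { maxTokens: 1024, topP: 0.9, temperature: 0.7 },
    },
  },
};

const promptStart = {
  event: {
    promptStart: {
      promptName: 'voice-chat', // hypothetical prompt name
      audioOutputConfiguration: {
        mediaType: 'audio/lpcm',
        sampleRateHertz: 24000,
        voiceId: 'matthew',
      },
    },
  },
};

// Each event is serialized to bytes before being yielded on the input stream
const toChunk = (event: object): Uint8Array =>
  new TextEncoder().encode(JSON.stringify(event));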

Output Events (received from Bedrock):

  • contentStart - Marks role transitions (USER for ASR, ASSISTANT for response)
  • textOutput - Returns transcribed user speech and generated AI response text
  • audioOutput - Returns synthesized voice response as Base64-encoded 24kHz PCM
  • contentEnd - Marks end of response content

Async Generator Pattern: The SDK requires input as AsyncIterable<Uint8Array>. The hook implements this using:

  • Event Queue - Pre-queued initialization events before stream starts
  • Promise Resolver - Backpressure control for yielding events on demand
  • pushEvent() - Adds new events during conversation (audio chunks)
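
A sketch of what the pushEvent() side of that pattern could look like - the ctrl fields mirror the async generator shown under Challenge 3 below, and this is a reconstruction rather than the repository's exact code:

// Reconstruction of the push side of the queue/resolver pattern
const pushEvent = (eventBytes: Uint8Array) => {
  if (ctrl.resolver) {
    // The generator is parked on its promise: hand the event over directly
    const resolve = ctrl.resolver;
    ctrl.resolver = null;
    resolve(eventBytes);
  } else {
    // Otherwise buffer it until the generator drains the queue
    ctrl.eventQueue.push(eventBytes);
  }
};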

Serverless Architecture Overview

This diagram provides a comprehensive view of all components - the entire solution is serverless with no EC2 instances or containers to manage.

Serverless Architecture Overview

Frontend Stack:

  • React - Component-based UI framework
  • Vite - Build tool and dev server
  • TypeScript - Type-safe development
  • AWS Amplify Hosting - Static web hosting with global CDN

Backend Stack (Amplify Gen 2):

  • amplify/backend.ts - Infrastructure defined in TypeScript
  • Cognito User Pool - Email-based authentication
  • Cognito Identity Pool - AWS credential vending
  • IAM Policy - Grants bedrock:InvokeModel permission for bidirectional streaming
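
For illustration, a policy like that can be attached to the authenticated role in amplify/backend.ts roughly as follows - treat the exact property names and the action/ARN pair as an approximation rather than the repository's exact code:

import { defineBackend } from '@aws-amplify/backend';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { auth } from './auth/resource';

const backend = defineBackend({ auth });

// Allow signed-in users (the Identity Pool authenticated role) to invoke Nova 2 Sonic
backend.auth.resources.authenticatedUserIamRole.addToPrincipalPolicy(
  new PolicyStatement({
    actions: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithBidirectionalStream'],
    resources: [
      'arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-sonic-v1:0',
    ],
  })
);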

AI Service:

  • Amazon Bedrock - Managed foundation model inference
  • Nova 2 Sonic - Speech-to-speech model (us-east-1)
  • Bidirectional Streaming - Real-time duplex communication

Technical Challenges & Solutions

Challenge 1: AudioWorklet CORS Issues

Problem: Loading the AudioWorklet processor from an external file fails with CORS errors on some deployments.

Solution: Inline the AudioWorklet code as a Blob URL:

const blob = new Blob([audioWorkletCode], { type: 'application/javascript' }); // inline the worklet source, same origin
const workletUrl = URL.createObjectURL(blob);
await audioContext.audioWorklet.addModule(workletUrl);
URL.revokeObjectURL(workletUrl); // the module is loaded, so the Blob URL is no longer needed

Challenge 2: Sample Rate Mismatch

Problem: Browsers capture audio at 48kHz, but Nova Sonic requires 16kHz input.

Solution: Linear interpolation resampling in real-time:

const resampleAudio = (audioData: Float32Array, sourceSampleRate: number, targetSampleRate: number) => {
  const ratio = sourceSampleRate / targetSampleRate;
  const newLength = Math.floor(audioData.length / ratio);
  const result = new Float32Array(newLength);
  for (let i = 0; i < newLength; i++) {
    // Map each output sample back onto the source timeline and interpolate
    const srcIndex = i * ratio;
    const floor = Math.floor(srcIndex);
    const ceil = Math.min(floor + 1, audioData.length - 1);
    const t = srcIndex - floor;
    result[i] = audioData[floor] * (1 - t) + audioData[ceil] * t;
  }
  return result;
};

Challenge 3: SDK Bidirectional Stream Input

Problem: AWS SDK requires input as AsyncIterable<Uint8Array>, but events need to be pushed dynamically during the conversation.

Solution: Async generator with event queue and promise-based backpressure:

async function* createInputStream() {
  while (isActiveRef.current && !ctrl.closed) {
    // Drain anything queued since the last yield (initialization events, audio chunks)
    while (ctrl.eventQueue.length > 0) {
      yield ctrl.eventQueue.shift();
    }
    // Park until pushEvent() resolves this promise with the next event
    const nextEvent = await new Promise(resolve => {
      ctrl.resolver = resolve;
    });
    if (nextEvent === null) break; // null signals the end of the session
    yield nextEvent;
  }
}

Getting Started

GitHub Repository: https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat

Prerequisites

  • Node.js 18+
  • AWS Account with Bedrock access enabled
  • AWS CLI configured with credentials

Deployment Steps

  1. Enable Nova 2 Sonic in Bedrock Console (us-east-1 region)

  2. Clone and Install:

git clone https://github.com/chiwaichan/amplify-react-amazon-nova-2-sonic-voice-chat.git
cd amplify-react-amazon-nova-2-sonic-voice-chat
npm install

  3. Start Amplify Sandbox:

npx ampx sandbox

  4. Run Development Server:

npm run dev

  5. Open Application: Navigate to http://localhost:5173, create an account, and start talking!

Summary

This architecture provides a reusable building block for voice-enabled AI applications:

  • Zero backend servers - Direct browser-to-Bedrock communication
  • Real-time streaming - HTTP/2 bidirectional streaming for low latency
  • Secure authentication - Cognito User Pool + Identity Pool + IAM policies
  • Audio processing pipeline - Web Audio API, AudioWorklet, PCM conversion
  • Infrastructure as code - AWS Amplify Gen 2 with TypeScript backend definition

The entire interaction happens in real-time: speak naturally, and hear the AI respond within seconds.

Cloud-Connected Sphero RVR Robot with AWS IoT Core and Seeed Studio XIAO ESP32S3

· 4 min read
Chiwai Chan
Tinkerer

Seeed Studio XIAO ESP32S3

Sphero RVR

A Sphero RVR integrated with a Seeed Studio XIAO ESP32S3, with telemetry uploaded to AWS IoT Core and basic remote drive commands received from anywhere, also via AWS IoT Core.

Overview

Lately I have been aiming to go deep on AI Robotics, and over the last year I have been slowly experimenting more and more with anything that is AI, IoT or Robotics related, with the intention of learning and going as wide and as deep as possible in any pillars I can think of. You can check out my blogs under the Robotics Project (/projects/robotics) to see what I have been up to. This year I want to focus on enabling mobility for my experiments - as in providing wheels for solutions to move around the house, ideally autonomously; starting off with wheel-based solutions bought off the shelf, followed by solutions that I build myself from open-source projects people have kindly contributed online, and then, ambitiously, ones I design, 3D print and build from the ground up - perhaps in a couple of years' time.

This project uses a Seeed Studio XIAO ESP32S3 microcontroller to communicate with a Sphero RVR robot via UART, while simultaneously connecting to AWS IoT Core over WiFi. The system publishes real-time sensor telemetry and accepts remote drive commands through MQTT.

Hardware Components

Component | Description
Seeed Studio XIAO ESP32S3 | Compact ESP32-S3 microcontroller with WiFi, 8MB flash
Sphero RVR | Programmable robot with motors, IMU, color sensor, encoders
XIAO Expansion Board | Provides OLED display (128x64 SSD1306) for status info

Hardware Wiring

Hardware Wiring

Features

Real-time Telemetry

The system publishes comprehensive sensor data every 60 seconds:

  • IMU Data: Pitch, roll, yaw orientation
  • Accelerometer & Gyroscope: Motion and rotation data
  • Color Sensor: RGB values with confidence
  • Compass: Heading in degrees
  • Ambient Light: Lux measurements
  • Motor Thermal: Temperature and protection status
  • Encoders: Wheel tick counts
  • Position & Velocity: Locator data in meters

Remote Commands via MQTT

Control the RVR from anywhere using JSON commands:

  • Drive: Speed and heading control
  • Tank: Independent left/right motor control
  • Raw Motors: Direct motor speed control
  • LED Control: Headlights, brakelights, status LEDs
  • Navigation: Reset yaw, reset locator
  • Power: Wake and sleep commands
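
The exact JSON schema lives in the repository, but a drive command published to the command topic would look something like this hypothetical payload:

// Hypothetical drive command payload; check the repository for the actual schema
const driveCommand = {
  command: 'drive',
  speed: 64,    // motor speed, 0-255
  heading: 90,  // degrees, relative to the RVR's reference yaw
};
// Published to the RVR's command topic over MQTT, e.g. from the AWS IoT MQTT test client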

Local OLED Display

The XIAO Expansion Board's OLED display shows real-time sensor readings for local monitoring.

MQTT Message Flow

MQTT Message Flow

Sensor Data Pipeline

Sensor Data Pipeline

Architecture

The XIAO ESP32S3 acts as a bridge between the Sphero RVR and AWS IoT Core:

  1. UART Communication: The ESP32S3 communicates with the RVR via UART (GPIO43/44)
  2. WiFi Connection: Connects to local WiFi network
  3. MQTT over TLS: Secure connection to AWS IoT Core with X.509 certificates
  4. Bidirectional: Publishes telemetry and subscribes to command topics

High-Level System Architecture

Communication Protocol Stack

Sphero RVR Protocol

The Sphero RVR uses a binary packet-based protocol over UART. Each packet contains a start-of-packet byte (0x8D), an 8-byte header with device ID and command ID, variable-length data body, checksum, and end-of-packet byte (0xD8). The RVR has two internal processors: Nordic (handles BLE, power, color detection) and ST (handles motors, IMU, encoders).
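
For illustration, the framing could be sketched like this (in TypeScript rather than the firmware's C++, with the checksum rule - the inverted low byte of the sum of the header and body - being my assumption of the Sphero API convention, and byte-escaping omitted):

const SOP = 0x8d; // start-of-packet byte
const EOP = 0xd8; // end-of-packet byte

// Frame a Sphero API packet: header (flags/ids/sequence) + payload, then checksum + EOP
const buildPacket = (header: number[], payload: number[]): Uint8Array => {
  const body = [...header, ...payload];
  const checksum = ~body.reduce((sum, b) => sum + b, 0) & 0xff; // assumed checksum rule
  return Uint8Array.from([SOP, ...body, checksum, EOP]);
};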

Sphero RVR Protocol Architecture

Source Code

I ported code based on the Sphero SDK into this project to control the RVR over the UART protocol.

You can find the source code for this project here: https://github.com/chiwaichan/platformio-aws-iot-seeed-studio-esp32s3-sphero-rvr

Controlling Hugging Face LeRobot SO101 arms over AWS IoT Core using a Seeed Studio XIAO ESP32C3

· One min read
Chiwai Chan
Tinkerer

LeRobot Architecture

Seeed Studio XIAO ESP32C3 and Bus Servo Driver Board

The LeRobot Follower arm subscribes to an IoT Topic that the LeRobot Leader arm publishes to in real time over AWS IoT Core, using a Seeed Studio XIAO ESP32C3 integrated with a Seeed Studio Bus Servo Driver Board; the driver board controls the 6 Feetech 3215 servos over the UART protocol.

In this video I demonstrate how to control a set of Hugging Face SO-101 arms over AWS IoT Core without using the LeRobot framework, and without a device such as a Mac or an Nvidia Jetson Orin Nano Super Developer Kit - only a Seeed Studio XIAO ESP32C3 and AWS IoT.

You can find the source code for this solution here: https://github.com/chiwaichan/aws-iot-core-lerobot-so101

AWS IoT Core – Iron Man – Part 1

· One min read
Chiwai Chan
Tinkerer

FeedMyFurBabies – Storing Historical AWS IoT Core MQTT State data in Amazon Timestream

· 3 min read
Chiwai Chan
Tinkerer

In the code examples I shared in the past, when I sent and received IoT messages and states to and from AWS IoT Core Topics, I only implemented subscribers that perform some functionality when an MQTT message is received on a Topic. That was useful for knowing when my FurBaby was fed whenever the Cat Feeder was triggered to drop Temptations into the bowls; however, we did not keep a record of the feeds or the state of the Cat Feeder in any form of data store over time - meaning we could not track when or how many times food was dropped into a bowl.

In this blog, I will demonstrate how to take the data in the MQTT messages sent to AWS IoT Core and ingest it into an Amazon Timestream database; Timestream is a fully managed, serverless time-series database, so we can leverage it without worrying about maintaining the database infrastructure.

Architecture

Architecture

In this architecture we have two AWS IoT Core Topics, where each IoT Topic has an IoT Rule associated with it that sends the data from every MQTT message received on that Topic - there is the ability to filter the messages, but we are not using it - and that data is ingested into a corresponding Amazon Timestream table.

Deploying the reference architecture

git clone git@github.com:chiwaichan/feedmyfurbabies-cdk-iot-timestream.git
cd feedmyfurbabies-cdk-iot-timestream
cdk deploy

git remote rm origin
git remote add origin https://git-codecommit.us-east-1.amazonaws.com/v1/repos/feedmyfurbabies-cdk-iot-timestream-FeedMyFurBabiesCodeCommitRepo
git push --set-upstream origin main

Here is a link to my GitHub repository where this reference architecture is hosted: https://github.com/chiwaichan/feedmyfurbabies-cdk-iot-timestream

Simulate an IoT Thing to Publish MQTT Messages to IoT Core Topic

In the root directory of the repository is a script that simulates an IoT Thing by constantly publishing MQTT messages to the "cat-feeder/states" Topic; ensure you have the AWS CLI installed on your machine with a default profile, as the script relies on it, and that the access keys used by the default profile have permission to call "iot:Publish".

It sends a random number for "food_capacity" that ranges from 0 to 100 to represent the percentage of food remaining in a cat feeder, and a value for "device_location", since we are scaling out the number of cat feeders placed around the house. Be sure to send the same JSON structure in your MQTT message if you decide not to use the provided script to send messages to the Topic (there is a sketch of the payload shape below).

publish mqtt messages script
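
If you write your own publisher instead of using the script, the payload only needs those two fields; the shape below is illustrative of what the IoT Rule forwards into Timestream:

// Illustrative payload shape for the "cat-feeder/states" topic
interface CatFeederState {
  food_capacity: number;    // 0-100, percentage of food remaining
  device_location: string;  // e.g. "lounge" - which feeder around the house
}

const sample: CatFeederState = { food_capacity: 73, device_location: 'lounge' };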

Query the data stored in the Amazon Timestream Database/Table

Now let's jump into the AWS Console, go to the Timestream service and open the "catFeedersStates" table; then click on "Actions" and choose "Query table" to go to the Query editor.

timestream table

The Query editor will show a default query statement; click "Run" and you will see in the query results the data from the MQTT messages generated by the script, ingested from the IoT Topic "cat-feeder/states".

timestream table query

FeedMyFurBabies – Send and Receive MQTT messages between AWS IoT Core and your micro-controller – I am switching from Arduino CPP to MicroPython

· 8 min read
Chiwai Chan
Tinkerer

Recently I switched my Cat Feeder project's IaC to AWS CDK in favour of increasing my focus and productivity on building and iterating, rather than constantly mucking around with infrastructure every time I resume my project after a break - which is few and far between these days.

Just as with coding IoT microcontrollers such as the ESP32s, I want to get straight back into building at every opportunity I get; so I am also switching away from Arduino-based microcontroller development written in C++ - I don't have a background in C++ and, to be honest, this is the aspect I struggled with the most because I tend to forget things after not touching it for 6 months or so.

So I am switching to MicroPython to develop the logic for all my IoT devices going forward. This means I get to use Python - a programming language I work with frequently, so there is less chance of me being forgetful as long as I use it at least once a month. MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a subset of the Python standard library and is optimized to run on microcontrollers and in constrained environments - a good fit for IoT devices such as the ESP32!

What about all the Arduino hardware and components I already invested in?

The good news is MicroPython is supported on all the ESP32 devices I myself have purchased; all I need to do to each ESP32 device is flash it with new firmware - if you are impatient, you can skip down to the flashing-the-firmware section below. When I first started with Arduino, MicroPython was already available, but that was 2 years ago and there was not as much good blog and tutorial content out there as there is today; at the time I couldn't work out how to control components such as sensors, servos and motors as well as I could with C++-based Arduino coding. Nowadays there is far more content to learn from, and I've learnt enough (by PoCing individual components) to switch to MicroPython. As far as I understand it, any components you have for Arduino can be used in MicroPython, provided there is a library out there that supports it - and if there isn't, you can always write your own!

What's covered in this blog?

By the end of this blog, you will be able to send and receive MQTT messages between AWS IoT Core and MicroPython; I will also cover the steps involved in flashing a MicroPython firmware image onto an ESP32C3. Although this blog focuses on an ESP32 example, it can be applied to micro-controllers of any brand or flavour, provided the micro-controller you are using supports MicroPython.

Flashing the MicroPython firmware onto a Seeed Studio XIAO ESP32C3

Seeed Studio XIAO ESP32C3

The following instructions work for any generic ESP32C3 device!

Download the latest firmware from micropython.org

https://micropython.org/download/ESP32_GENERIC_C3/

MicroPython firmware esp32c3

Next, I connected my ESP32C3 to my Mac and ran the following command to find the name of the device port

 /dev/ttyUSB0

Find port device

My ESP32C3 is named "/dev/tty.usbmodem142401"; the name of your ESP32C3 may be different.

Next, install esptool onto your computer, then run the following commands to flash the MicroPython firmware onto the ESP32C3 using the bin file you've just downloaded.

esptool.py --chip esp32c3 --port /dev/tty.usbmodem142401 erase_flash

esptool.py --chip esp32c3 --port /dev/tty.usbmodem142401 --baud 460800 write_flash -z 0x0 ESP32_GENERIC_C3-20240105-v1.22.1.bin

It should look something like this when you run the commands.

esptool Flashing Firmware

Install Thonny and run it. Then go to Tools -> Options, to configure the ESP32C3 device in Thonny to match the settings shown in the screenshot below.

esptool Flashing Firmware

If everything went well, you should see these 2 sections in Thonny: "MicroPython Device" and "Shell"; if not, try clicking the Stop button in the top menu.

Thonny MicroPython Device

AWS IoT Core Certificates and Keys

In order to send MQTT messages to an AWS IoT Core Topic, or to receive messages from a Topic in the other direction, you will need a Certificate and Key pair for your micro-controller, as well as the AWS IoT Endpoint specific to your AWS Account and Region.

If you already have those, you can skip to the next section; if not, don't worry, I've got you covered. In a past blog I shared a reference architecture, accompanied by a GitHub repository, on how to deploy resources for an AWS IoT Core solution using AWS CDK. Follow that blog to the end and you will have a Certificate and Key to use for this MicroPython example; the CDK Stack will deploy all the necessary resources and policies in AWS IoT Core to enable you to both send and receive MQTT messages on two separate IoT Topics.

Reference AWS IoT Core Architecture: https://chiwaichan.co.nz/blog/2024/02/02/feedmyfurbabies-i-am-switching-to-aws-cdk/

Upload the MicroPython Code to your device

Now let's upload the MicroPython code to your micro-controller and prepare the IoT Certificate and Key used to authenticate the micro-controller, enabling it to send and receive MQTT messages to and from IoT Core.

Clone my GitHub repository that contains the MicroPython example code to publish and receive MQTT messages with AWS IoT Core: https://github.com/chiwaichan/feedmyfurbabies-micropython-iot

It should look something like this.

GitHub repo

Copy your Certificate and Key into the respective files shown in the above screenshot; otherwise, if you are using the Certificate and Key from my reference architecture, then you should use the 2 Systems Manager Parameter Store values created by the CDK Stack.

Systems Manager Parameter Store values

Next we convert the Certificate and Key to DER format - converting the files to DER turns them into a binary format and makes them more compact, which is especially necessary when we use them on small devices like the ESP32s.

In a terminal go to the certs directory and run the following commands to convert the certificate.pem and private.key files into DER format.

openssl rsa -in private.key -out key.der -outform DER
openssl x509 -in certificate.pem -out cert.der -outform DER

You should see two new files with the DER extension appear in the directory if all goes well; if not, you probably need to install openssl.

Systems Manager Parameter Store values

In Thonny, in the Files explorer, navigate to the GitHub repository's root directory and open the main.py file. Fill in the values for the variables shown in the screenshot below to match your environment. If you are using my AWS CDK IoT reference architecture, you are only required to fill in the WiFi details and the AWS IoT Endpoint specific to your AWS Account and Region.

Wifi and iot Core Settings

Select both the certs folder and main.py in the Files explorer, then right click and select "Upload to /" to upload the code to your micro-controller; the files will appear in the "MicroPython Device" file explorer.

Upload files to Thonny

This is the moment we've been waiting for: let's run the main.py Python script by clicking on the green Play icon.

Run main

If all goes well you should see some output in the Shell section of Thonny.

Thonny Shell

The main.py file has a piece of code that generates a random number for the food_capacity percentage property in the MQTT message; you can customise the message to fit your use case.

But let's verify it is actually received by AWS IoT Core.

aws iot mqtt test client

Alright, let's go the other way and see if we can receive MQTT messages from AWS IoT Core using the other Topic, "cat-feeder/action", which we subscribed to in the MicroPython code.

Let's go back to the AWS Console and use the MQTT test client to publish a message.

publlish mqtt from aws

thonny message received

In the Thonny Shell we can see the message "Hello from AWS IoT console" sent from the AWS IoT Core side being received by the micro-controller.

Feed My Fur Babies – AWS Amplify and Route53

· 4 min read
Chiwai Chan
Tinkerer

I'm starting a new blog series where I will be documenting my build of a full-stack web and mobile application using AWS Amplify to implement both the frontend and the backend, whilst developing dependent downstream Services outside of Amplify using AWS Serverless components to implement a micro-service architecture with an event-driven design pattern - breaking the application up into smaller consumable chunks that work well together.

Since we are creating a completely new application from scratch, I will also incorporate a pattern that will reduce complexity throughout the lifetime of the application: Event Sourcing. This pattern ensures every Event ever published within a system is stored in an immutable ledger; the ledger enables new Data Stores of any Data Store Engine to be created at any given time by replaying the Events in the ledger from a start date and time to an end date and time.

CQRS is a pattern I will write about in great detail in a blog in the near future; CQRS enables the creation of multiple Data Stores with identical data, each Data Store using its own Data Store Engine instance.

What is AWS Amplify?

Amplify is an AWS Service that gives frontend web or mobile developers with no cloud expertise the ability to build and host full-stack applications on AWS. As a frontend developer, you can leverage it to build and integrate AWS Services and components into your frontend without having to deal with the underlying AWS Services; all the Services the frontend is built on top of are managed by AWS Amplify - e.g. no need to manage CloudFormation Stacks, S3 storage or AWS Cognito.

What will I be doing with Amplify

My background is in full-stack application development - I worked in that role for over 10 years and have used various frontend/backend frameworks, components and patterns.

I will be building a website called Feed My Fur Babies, where I will provide live video streams of my cats from web cams placed in various spots around my house; the website will also give users the ability to feed my cats using internet-enabled devices like the IoT Cat Feeders I recently put together, and watch them hoon on their favorite treats. Although I am experienced with building websites from the ground up on AWS Services, I am aiming to build Feed My Fur Babies while leveraging that experience as little as possible - so that I build the website as close as possible to the skillset of Amplify's target demographic, i.e. a developer with only frontend experience.

Current Architecture State

Architecture

Update

Let's talk about what was done to get to the current architecture state.

The first thing I did was buy the domain feedmyfurbabies.com using AWS Route53.

Route53 Feed My Fur Babies

Next, I created a new Amplify App called "Feedme".

Amplify - App

Within the App I created two Hosted Environments: one to host production, the other to host development. Each Hosted Environment is configured to be built and deployed from a specific branch in the shared CodeCommit repository used to version control the frontend source code.

Amplify - Hosted Environments

Amplify - Hosted Environment - Main

Amplify - Hosted Environment - Dev

AWS DeepRacer

· 9 min read
Chiwai Chan
Tinkerer

This blog details my first experiences with AWS DeepRacer as somebody who knows very little about how AI works under the hood, and who at first didn't fully understand the difference between Supervised Learning, Unsupervised Learning and Reinforcement Learning when writing my first Python code for the "reward_function".

AWS DeepRacer Training

DeepRacer is a Reinforcement Learning based AWS Machine Learning Service that provides a quick and fun way to get into ML by letting you build and train an ML model that can be used to drive around on a virtual, as well as a physical race track.

I'm a racing fan in many ways, whether it is watching Formula 1, racing my mates in go-karting or having a hoon on the scooter, so ever since I found out about the AWS DeepRacer service I've wanted to dip my toes in it. More than 2 years later I found an excuse to play with it during my preparation for the AWS Certified Machine Learning Specialty exam; I am expecting a question or two on DeepRacer, so what better way to learn about it than to try it out by cutting some Python code.

Goal

Have a realistic one this time and keep it simple: produce an ML model that can drive a car around a few different virtual tracks for a few laps without going off the track.

My Machine Learning background and relevant experience

  • Statistics, Calculus and Physics: was pretty good at these during high school and did ok in Statistics and Calculus during uni.
  • Python: have been writing some Python code in the past couple of years on and off, mainly in AWS Lambda functions.
  • Machine Learning: none
  • Writing code for mathematics: had a job that involved writing complex mathematical equations and tree-based algorithms in Ruby and Java for about 7 years

Approach

Code a Python Reward Function to return a Reinforcement Learning reward value based on the state of the DeepRacer vehicle - the reward can be positive for good behaviour, and negative to discourage the agent (vehicle) from a state that is not going to give us a good race pace. The state of the vehicle is the set of key/values shown below, available to the Python Reward Function at runtime for us to use to calculate a reward value.

    # "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
# "x": float, # agent's x-coordinate in meters
# "y": float, # agent's y-coordinate in meters
# "closest_objects": [int, int], # zero-based indices of the two closest objects to the agent's current position of (x, y).
# "closest_waypoints": [int, int], # indices of the two nearest waypoints.
# "distance_from_center": float, # distance in meters from the track center
# "is_crashed": Boolean, # Boolean flag to indicate whether the agent has crashed.
# "is_left_of_center": Boolean, # Flag to indicate if the agent is on the left side to the track center or not.
# "is_offtrack": Boolean, # Boolean flag to indicate whether the agent has gone off track.
# "is_reversed": Boolean, # flag to indicate if the agent is driving clockwise (True) or counter clockwise (False).
# "heading": float, # agent's yaw in degrees
# "objects_distance": [float, ], # list of the objects' distances in meters between 0 and track_length in relation to the starting line.
# "objects_heading": [float, ], # list of the objects' headings in degrees between -180 and 180.
# "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether elements' objects are left of the center (True) or not (False).
# "objects_location": [(float, float),], # list of object locations [(x,y), ...].
# "objects_speed": [float, ], # list of the objects' speeds in meters per second.
# "progress": float, # percentage of track completed
# "speed": float, # agent's speed in meters per second (m/s)
# "steering_angle": float, # agent's steering angle in degrees
# "steps": int, # number steps completed
# "track_length": float, # track length in meters.
# "track_width": float, # width of the track
# "waypoints": [(float, float), ] # list of (x,y) as milestones along the track center

Based on this set of key/values we can get a pretty good idea of the state of the vehicle/agent and what it was getting up to on the track.

So using these key/values we calculate and return a value from the Reward Function in Python. For example, if the value for "is_offtrack" is "true", then the vehicle has come off the track, so we can return a negative value from the Reward Function; we might also want to amplify the negative reward if the vehicle was doing something else it should not be doing - like steering right into a left turn (steering_angle).

Conversely, we return a positive reward value for good behaviour such as steering straight on a long stretch of the track going as fast as possible within the center of the track.

My approach to coding the Reward Functions was pretty simple: calculate the reward value based on how I myself would physically drive on a go-kart track, factoring as much into the calculations as possible - how the vehicle is hitting the apex, whether it is hitting it from the outside of the track or the inside, and whether the vehicle is in a good position to take the next turn or two. For each iteration of the code, I train a new model in AWS DeepRacer with it; I normally watch the video of the simulation to pay attention to what could be improved in the next iteration, then do the whole process all over again.

Within the Reward Function I work out a bunch of sub-rewards such as:

  • steering straight on a long stretch of the track as fast as possible within the center of the track
  • is the vehicle in a good position to take the next turn or two
  • is the vehicle doing something it should not be doing, like steering right into a left turn

These are just some examples of the sub-rewards I work out - and the list grows as I iterate and improve (or make things worse) with each version of the reward function. At the end of each function I calculate the net reward value as the weighted sum of the sub-rewards; each sub-reward can have a higher importance than another, so I've taken a weighted approach to the calculation to allow a sub-reward to amplify the effect it has on the net reward value.

Code

Here is the very first version of the Reward Function I coded:

MAX_SPEED = 4.0

def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering_angle = params['steering_angle']
    speed = params['speed']

    weighted_sub_rewards = []

    half_track_width = track_width / 2.0

    within_percentage_of_center_weight = 1.0
    steering_angle_weight = 1.0
    speed_weight = 0.5
    steering_angle_and_speed_weight = 1.0

    add_weighted_sub_reward(weighted_sub_rewards, "within_percentage_of_center_weight", within_percentage_of_center_weight, get_sub_reward_within_percentage_of_center(distance_from_center, track_width))
    add_weighted_sub_reward(weighted_sub_rewards, "steering_angle_weight", steering_angle_weight, get_sub_reward_steering_angle(steering_angle))
    add_weighted_sub_reward(weighted_sub_rewards, "speed_weight", speed_weight, get_sub_reward_speed(speed))
    add_weighted_sub_reward(weighted_sub_rewards, "steering_angle_and_speed_weight", steering_angle_and_speed_weight, get_sub_reward_steering_angle_and_speed_weight(steering_angle, speed))

    print(weighted_sub_rewards)

    weight_total = 0.0
    numerator = 0.0

    for weighted_sub_reward in weighted_sub_rewards:
        sub_reward = weighted_sub_reward["sub_reward"]
        weight = weighted_sub_reward["weight"]

        weight_total += weight
        numerator += sub_reward * weight

        print("sub numerator", weighted_sub_reward["sub_reward_name"], (sub_reward * weight))

    print(numerator)
    print(weight_total)
    print(numerator / weight_total)

    return numerator / weight_total


def add_weighted_sub_reward(weighted_sub_rewards, sub_reward_name, weight, sub_reward):
    weighted_sub_rewards.append({"sub_reward_name": sub_reward_name, "sub_reward": sub_reward, "weight": weight})


def get_sub_reward_within_percentage_of_center(distance_from_center, track_width):
    half_track_width = track_width / 2.0
    percentage_from_center = (distance_from_center / half_track_width * 100.0)

    if percentage_from_center <= 10.0:
        return 1.0
    elif percentage_from_center <= 20.0:
        return 0.8
    elif percentage_from_center <= 40.0:
        return 0.5
    elif percentage_from_center <= 50.0:
        return 0.4
    elif percentage_from_center <= 70.0:
        return 0.15
    else:
        return 1e-3


# The reward is better if going straight
# steering_angle of -30.0 is max right
def get_sub_reward_steering_angle(steering_angle):
    is_left_turn = True if steering_angle > 0.0 else False
    abs_steering_angle = abs(steering_angle)

    print("abs_steering_angle", abs_steering_angle)

    if abs_steering_angle <= 3.0:
        return 1.0
    elif abs_steering_angle <= 5.0:
        return 0.9
    elif abs_steering_angle <= 8.0:
        return 0.75
    elif abs_steering_angle <= 10.0:
        return 0.7
    elif abs_steering_angle <= 15.0:
        return 0.5
    elif abs_steering_angle <= 23.0:
        return 0.35
    elif abs_steering_angle <= 27.0:
        return 0.2
    else:
        return 1e-3


def get_sub_reward_speed(speed):
    percentage_of_max_speed = speed / MAX_SPEED * 100.0

    print("percentage_of_max_speed", percentage_of_max_speed)

    if percentage_of_max_speed >= 90.0:
        return 0.7
    elif percentage_of_max_speed >= 65.0:
        return 0.8
    elif percentage_of_max_speed >= 50.0:
        return 0.9
    else:
        return 1.0


def get_sub_reward_steering_angle_and_speed_weight(steering_angle, speed):
    abs_steering_angle = abs(steering_angle)
    percentage_of_max_speed = speed / MAX_SPEED * 100.0

    steering_angle_weight = 1.0
    speed_weight = 1.0

    steering_angle_reward = get_sub_reward_steering_angle(steering_angle)
    speed_reward = get_sub_reward_speed(speed)

    return (((steering_angle_reward * steering_angle_weight) + (speed_reward * speed_weight)) / (steering_angle_weight + speed_weight))

Here is a video of one of the simulation runs:

Here is a link to my GitHub repository where I have all the versions of the reward functions I created: https://github.com/chiwaichan/aws-deepracer/tree/main/models

Conclusion

After a few weeks of training and about 20 runs, each using a different reward function, I did not meet the goal I set out to achieve - getting the agent/vehicle to do 3 laps without coming off the track on a few different tracks. On average each model was only able to race the virtual car around each track for a little over a lap without crashing. At times it felt like I hit a bit of a wall and could not improve the results, and in some instances the model got worse. I need to take a break from this to think of a better approach; the way I am doing it now is improving areas without measuring the progress or the amount of improvement made in each.

Next steps

  • Learn how to train a DeepRacer model in SageMaker Studio (outside of the DeepRacer Service) using a Jupyter notebook so I can have more control over how models are trained
  • Learn and perform HyperParameter Optimizations using some of the SageMaker features and services
  • Take a Data and Visualisation driven approach to derive insights into where improvements can be made to the next model iteration
  • Learn to optimise the training, e.g. stop the training early when the model is not performing well
  • Sit the AWS Certified Machine Learning Specialty exam