Voice to Robotics

Fingerspelling American Sign Language

Nova 2 Sonic · Bedrock · IoT Core · AppSync · Strands

Chiwai Chan — AWS User Group Wellington meet-up
29 April 2026

Who am I?

Tinkerer · Cloud · IoT · Robotics · Generative AI


https://chiwaichan.co.nz
https://x.com/chiwaichanconz
https://github.com/chiwaichan

What are we building?

  • Speak into a browser microphone
  • Cloud cleans + relays the text
  • An edge agent on an NVIDIA Jetson translates it
  • A robotic hand fingerspells it in ASL — in real time

Voice → Cloud → Edge AI → Robot hand

Why fingerspelling?

  • ASL fingerspelling is a constrained, well-defined target
  • One letter ≈ one hand pose — perfect for a 4-finger robotic hand
  • Builds a foundation for richer signs later
  • Real accessibility use cases

Voice in: Nova 2 Sonic as a "dumb relay"

Amazon Nova 2 Sonic on Amazon Bedrock.

  • Not used as a chatbot
  • Forced tool use via Bedrock's tool API: toolChoice: { any: {} }
  • Every utterance fires send_text with cleaned-up text
  • Removes filler ("um", "uh"), fixes grammar
  • Translates non-English speech to English
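The "dumb relay" trick can be sketched as a tool config. This is a minimal sketch following Bedrock's tool-use schema; the `send_text` tool name comes from the talk, but the parameter name (`text`) and description wording are illustrative assumptions.

```python
# Sketch of the tool config that forces the model to call send_text on
# every utterance instead of chatting back. Shape follows Bedrock's
# tool-use schema; parameter names are assumptions for illustration.
def build_tool_config():
    return {
        "tools": [{
            "toolSpec": {
                "name": "send_text",
                "description": (
                    "Relay the user's utterance as cleaned-up English: "
                    "strip filler words, fix grammar, translate if needed."
                ),
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                    "required": ["text"],
                }},
            }
        }],
        # toolChoice: { any: {} } -- the model must call a tool every turn.
        "toolChoice": {"any": {}},
    }
```

With `any` forced, the model never produces a free-text reply; every utterance becomes a structured `send_text` call.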

Why MQTT in the middle?

  • Decouples the browser from the robot
  • AWS IoT Core handles auth, fan-out, retained messages
  • Fan-out to the browser via AWS Lambda → AWS AppSync subscriptions
  • Edge device can be anywhere on the internet
  • Same pattern scales to many hands / many speakers
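The relay step can be sketched as building an MQTT message for AWS IoT Core. The topic name and envelope fields here are assumptions (the talk only says sentences flow over an MQTT topic); publishing itself would use boto3's IoT data-plane client.

```python
import json

# Hypothetical topic name -- the real topic is not named in the talk.
SENTENCE_TOPIC = "fingerspell/sentences"

def build_sentence_message(text: str, session_id: str):
    # Wrap the cleaned sentence in a small JSON envelope so the edge
    # agent and the AppSync fan-out can share one message shape.
    payload = json.dumps({"sessionId": session_id, "text": text})
    return SENTENCE_TOPIC, payload

# Publishing would then be a one-liner with the IoT data-plane client:
#   boto3.client("iot-data").publish(topic=topic, qos=1, payload=payload)
```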

The edge agent (NVIDIA Jetson)

  • Strands Agent powered by Amazon Nova 2 Lite (Amazon Bedrock)
  • Subscribes to the sentence MQTT topic
  • Tools: split sentence → letters → servo poses
  • Streams video + state back via AWS IoT Core
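The first tool in that chain can be sketched as a plain function (in Strands it would be registered as an agent tool). A minimal sketch, assuming only A–Z letters have poses; digits and punctuation are simply dropped here.

```python
# "sentence -> letters" sketch: keep only characters with a
# fingerspelling pose (assumed a-z), lowercased, in order.
def sentence_to_letters(sentence: str) -> list[str]:
    return [ch for ch in sentence.lower() if "a" <= ch <= "z"]
```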

The robot hand

  • Pollen Robotics Amazing Hand
  • Open-source, manufactured by Seeed Studio
  • 4 fingers, multiple servos per finger
  • Custom calibration per ASL letter
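Per-letter calibration can be sketched as a lookup table of servo targets. All angles and servo names below are made-up illustrations; the real values are tuned by hand for each Amazing Hand.

```python
# Hypothetical calibration table: ASL letter -> target servo angles
# (degrees). Values are illustrative, not real calibration data.
POSES = {
    "a": {"index": 170, "middle": 170, "ring": 170, "pinky": 170},
    "b": {"index": 10,  "middle": 10,  "ring": 10,  "pinky": 10},
    # one entry per letter, tuned on the physical hand
}

def pose_for(letter: str, default_angle: int = 90):
    # Fall back to a neutral pose for letters not yet calibrated.
    neutral = {f: default_angle for f in ("index", "middle", "ring", "pinky")}
    return POSES.get(letter, neutral)
```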

Live demo

🎤☁️🤖 ✋

(Fingers crossed.)

Latency: where the time goes

Measured from New Zealand → AWS us-east-1.

  • Browser → Nova 2 Sonic (Bedrock): sub-second to first token
  • Nova → MQTT publish: ~tens of ms
  • MQTT → Jetson: ~tens of ms
  • Letter pose → next letter: paced for legibility, not speed

Bottleneck is physical movement, not the cloud.
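That claim is easy to see with back-of-envelope numbers. The constants below are illustrative assumptions consistent with the slide (sub-second cloud hops, pose timing paced for legibility), not measurements from the talk.

```python
# Rough pacing model: cloud latency is paid once per sentence, while
# pose hold + servo travel is paid per letter -- so the hand dominates.
CLOUD_LATENCY_S = 0.5   # browser -> Nova -> MQTT -> Jetson (assumed)
POSE_HOLD_S = 0.6       # hold each letter long enough to read (assumed)
TRANSITION_S = 0.3      # servo travel between letters (assumed)

def spell_time(word: str) -> float:
    n = len(word)
    return CLOUD_LATENCY_S + n * POSE_HOLD_S + max(n - 1, 0) * TRANSITION_S
```

Even a short word spends most of its wall-clock time in servo motion, which matches the slide's conclusion.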

What surprised me

  • Bedrock's forced tool use is a great "shape the output" trick
  • AWS IoT Core + AppSync gives you real-time UI almost for free
  • Servo calibration is the hardest part 😅
  • Nova 2 Sonic handles accents and code-switching well

What's next

  • More signs beyond fingerspelling
  • Two-handed signing
  • On-device speech for offline mode
  • Attach the Amazing Hand to my LeRobot SO-101 arms
  • Vision-Language-Action (VLA) models on the Jetson Thor

Thank you! — Questions?

Blog series:
Part 1 — Frontend & Voice · Part 2 — Cloud Infrastructure · Part 3 — Edge AI Agent

https://chiwaichan.co.nz

https://x.com/chiwaichanconz

https://github.com/chiwaichan