## Related Blog Posts
- Part 1: Frontend and Voice Processing
- Part 2: Cloud Infrastructure with CDK, IoT Core, and AppSync
- Part 3: Edge AI Agent with Strands on NVIDIA Jetson
## Demo

## Overview
This project is a real-time voice-to-sign-language translation system that takes spoken words from a browser microphone and translates them into physical ASL fingerspelling on a Pollen Robotics Amazing Hand. The system spans three repositories covering the frontend, cloud infrastructure, and edge AI agent.
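The core translation step above (spoken words in, a sequence of hand poses out) can be sketched as follows. The pose names (`"A"`–`"Z"`, `"REST"`) are assumptions for illustration; the actual repositories may represent poses as servo angle arrays instead.

```typescript
// Sketch: turn a transcribed utterance into a fingerspelling sequence.
// Pose names are hypothetical placeholders, not the repos' real types.
type Pose = string;

function toFingerspelling(utterance: string): Pose[] {
  const poses: Pose[] = [];
  for (const ch of utterance.toUpperCase()) {
    if (ch >= "A" && ch <= "Z") {
      poses.push(ch); // one hand pose per letter
    } else if (ch === " " && poses[poses.length - 1] !== "REST") {
      poses.push("REST"); // brief rest pose between words
    }
    // digits and punctuation are simply skipped in this sketch
  }
  return poses;
}
```

For example, `toFingerspelling("Hi A")` yields `["H", "I", "REST", "A"]`; the edge agent would then play each pose on the physical hand in order.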
## Technologies Used
- Frontend: React 19, Vite 7, TypeScript, Three.js, AWS Amplify Gen 2
- Voice AI: Amazon Nova 2 Sonic (speech-to-speech, bidirectional streaming)
- Edge AI: Strands Agents framework, Amazon Nova 2 Lite (tool-use reasoning)
- Cloud: AWS IoT Core, AWS AppSync, AWS Lambda, Amazon DynamoDB, Amazon S3
- Hardware: NVIDIA Jetson, Pollen Robotics Amazing Hand, Feetech SCS0009 servos
- Infrastructure: AWS CDK, GitHub Actions CI/CD
- Protocols: MQTT, GraphQL subscriptions, Serial (1M baud)
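At the serial end of the stack, Feetech SCS-series servos are typically driven with Dynamixel-style frames (`0xFF 0xFF`, ID, length, instruction, parameters, inverted-sum checksum) at 1,000,000 baud. A hedged sketch of a "write goal position" frame is below; the goal-position register address (`0x2A`) and high-byte-first ordering are assumptions for the SCS0009 and should be checked against the servo datasheet.

```typescript
// Hedged sketch of a Feetech SCS-style "write goal position" serial frame.
// Register address 0x2A and big-endian position bytes are assumptions.
function writePositionFrame(servoId: number, position: number): Uint8Array {
  const INSTR_WRITE = 0x03; // "write data" instruction
  const REG_GOAL_POSITION = 0x2a; // assumed register address
  const params = [
    REG_GOAL_POSITION,
    (position >> 8) & 0xff, // high byte first (assumed for SCS series)
    position & 0xff,
  ];
  const length = params.length + 2; // counts instruction + checksum
  const sum =
    servoId + length + INSTR_WRITE + params.reduce((a, b) => a + b, 0);
  const checksum = ~sum & 0xff; // inverted low byte of the sum
  return Uint8Array.from([
    0xff, 0xff, servoId, length, INSTR_WRITE, ...params, checksum,
  ]);
}
```

The edge agent would write one such frame per finger servo over the shared serial bus.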
## Key Features
- Direct browser-to-Bedrock bidirectional streaming with Nova 2 Sonic
- Forced tool use (`send_text`) to relay cleaned speech as text on every utterance
- ASL fingerspelling of all 26 alphabet letters (A-Z)
- Real-time 3D hand visualisation synchronised with the physical hand via GraphQL subscriptions
- Video recording with H.264 encoding, S3 upload, and presigned URLs
- Fully serverless frontend with Cognito authentication
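The "forced tool use" feature means the model must call `send_text` on every turn rather than deciding on its own. A hedged sketch of what that configuration looks like is below, shaped after the Bedrock Converse API's `toolConfig`/`toolChoice`; the exact event format for Nova Sonic's bidirectional stream may differ, and the description text is illustrative.

```typescript
// Hedged sketch of a tool configuration that forces the send_text tool,
// in the shape of the Bedrock Converse API toolConfig.
const toolConfig = {
  tools: [
    {
      toolSpec: {
        name: "send_text",
        description: "Relay the cleaned transcript of the user's speech.",
        inputSchema: {
          json: {
            type: "object",
            properties: { text: { type: "string" } },
            required: ["text"],
          },
        },
      },
    },
  ],
  // toolChoice.tool forces the named tool instead of letting the model choose
  toolChoice: { tool: { name: "send_text" } },
};
```

With this in place, every utterance produces a `send_text` tool call whose `text` argument can be published downstream as servo commands.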
## GitHub Repositories
### amplify-react-nova-sonic-voice-chat-amazing-hand
React frontend with Nova 2 Sonic voice processing and MQTT servo command publishing.

### cdk-iot-amazing-hand-streaming
AWS CDK stack routing IoT Core MQTT messages through Lambda to AppSync.

### strands-agents-amazing-hands
Strands Agent on NVIDIA Jetson driving Amazing Hand servos for ASL fingerspelling.
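The middle hop of the CDK stack (IoT Core message in, AppSync mutation out) can be sketched as a Lambda that wraps the incoming MQTT payload in a GraphQL mutation request body. The mutation name `publishServoCommand` and its fields are assumptions for illustration, not the actual schema from the CDK repository.

```typescript
// Hedged sketch: build the AppSync GraphQL request body for a servo
// command received from IoT Core. Mutation and field names are assumed.
interface ServoCommand {
  letter: string;
  angles: number[];
}

function buildAppSyncRequest(cmd: ServoCommand): {
  query: string;
  variables: { letter: string; angles: number[] };
} {
  const query = `
    mutation PublishServoCommand($letter: String!, $angles: [Int!]!) {
      publishServoCommand(letter: $letter, angles: $angles) {
        letter
        angles
      }
    }`;
  return { query, variables: { letter: cmd.letter, angles: cmd.angles } };
}
```

The Lambda would POST this body to the AppSync HTTPS endpoint (signed with IAM credentials); browser clients holding a GraphQL subscription on the same mutation then receive the update and animate the 3D hand.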