🤟🤖

Reachy Mini ASL QA

A robot that sees, understands, and responds to American Sign Language

An estimated 500,000 people in the US use ASL as their primary language. Most AI systems ignore them. This one doesn't.

Side-by-side view: webcam with hand tracking (left) + Reachy Mini robot simulation with answer overlays (right)

Seven-Stage Pipeline

Raw webcam pixels become a spoken robot response in under 3 seconds. Everything runs locally.

1. Hand Tracking -- MediaPipe HandLandmarker
Extracts 21 3D keypoints per hand at 30 FPS, covering every joint from wrist to fingertip and giving precise spatial data for classification.
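MediaPipe's 21-landmark layout is fixed: index 0 is the wrist, then four points per finger ending at the tip (thumb tip = 4, index = 8, middle = 12, ring = 16, pinky = 20). A minimal sketch of pulling the fingertips from one hand's keypoints; the dummy frame is illustrative, not real tracker output:

```python
# Documented MediaPipe hand-landmark convention: wrist at 0, fingertips
# at 4, 8, 12, 16, 20. The dummy frame below stands in for real output.
WRIST = 0
TIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}

def fingertip_positions(landmarks):
    """Pull the five fingertip (x, y, z) points from one hand's 21 landmarks."""
    assert len(landmarks) == 21, "HandLandmarker emits exactly 21 keypoints"
    return {name: landmarks[i] for name, i in TIPS.items()}

# Dummy 21-keypoint frame: point i at (i, 0, 0)
frame = [(i, 0.0, 0.0) for i in range(21)]
tips = fingertip_positions(frame)
```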

2. Sign Classification -- Geometric Heuristics
Dual-check finger detection: palm-center distance ratio (1.2x threshold) plus PIP joint angle (150° threshold). Eliminates the false positives that single-metric approaches let through.
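The dual check can be sketched in a few lines of geometry. This is an illustrative reimplementation, not the project's actual code; function names and the exact palm-center point are assumptions, but the two thresholds match the ones above:

```python
import math

def dist(a, b):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))

def angle_deg(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def finger_extended(palm, mcp, pip, tip,
                    ratio_thresh=1.2, angle_thresh=150.0):
    """A finger counts as extended only if BOTH checks pass:
    1) the tip is >= 1.2x farther from the palm center than the MCP knuckle,
    2) the finger is straight at the PIP joint (angle >= 150 degrees)."""
    ratio_ok = dist(palm, tip) >= ratio_thresh * dist(palm, mcp)
    angle_ok = angle_deg(mcp, pip, tip) >= angle_thresh
    return ratio_ok and angle_ok
```

A straight finger passes both checks; a curled one fails both, so neither metric alone can trigger a false positive.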

3. Gloss Sequencing -- Temporal Buffer
Hold-time validation (0.4 s), duplicate removal, and timeout detection (2 s) turn noisy per-frame detections into clean sign sequences.
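The buffering logic can be sketched as a small state machine. A hypothetical reconstruction, assuming the thresholds stated above; class and method names are illustrative:

```python
class GlossBuffer:
    """Turn noisy per-frame sign detections into a clean gloss sequence:
    hold a sign for >= 0.4 s to commit it, drop consecutive duplicates,
    and flush the sequence after 2 s with no detection."""

    HOLD_TIME = 0.4   # seconds a sign must be held before it counts
    TIMEOUT = 2.0     # seconds of silence that ends a sequence

    def __init__(self):
        self.sequence = []
        self._candidate = None
        self._candidate_since = None
        self._last_seen = None

    def update(self, sign, now):
        """Feed one per-frame detection (sign may be None).
        Returns a completed gloss sequence on timeout, else None."""
        if sign is None:
            if (self.sequence and self._last_seen is not None
                    and now - self._last_seen >= self.TIMEOUT):
                done, self.sequence = self.sequence, []
                self._candidate = None
                return done
            return None
        self._last_seen = now
        if sign != self._candidate:
            self._candidate, self._candidate_since = sign, now
        elif now - self._candidate_since >= self.HOLD_TIME:
            # duplicate removal: never append the same sign twice in a row
            if not self.sequence or self.sequence[-1] != sign:
                self.sequence.append(sign)
        return None
```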

4. Gloss Translation -- Glossa-BART (Seq2Seq Transformer)
ASL gloss is not English -- word order differs and function words are dropped. A fine-tuned BART model with beam search translates gloss into natural sentences.
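When the transformer is unavailable, a rule-based fallback takes over (see Design Principles below). A minimal sketch of that fallback layer; the templates and function names are illustrative assumptions, not the project's actual rules:

```python
# Hypothetical rule-based gloss-to-English fallback, the kind of layer
# that takes over when the BART model fails to load.
TEMPLATES = {
    ("WHAT", "TIME"): "What time is it?",
    ("WHERE", "WATER"): "Where is the water?",
    ("HELP", "YOU"): "Can you help me?",
}

def translate_gloss(glosses):
    """Map a gloss sequence to an English sentence, degrading to a
    naive word-by-word rendering when no template matches."""
    key = tuple(g.upper() for g in glosses)
    if key in TEMPLATES:
        return TEMPLATES[key]
    return " ".join(g.capitalize() for g in key) + "."
```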

5. Intent Classification -- Sentence-Transformers (all-MiniLM-L6-v2)
384-dimensional embeddings and cosine similarity classify questions into 5 intent categories: identity, time, date, location, general.
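The classification step itself is just nearest-prototype by cosine similarity. In the real pipeline both the prototype questions and the incoming sentence are embedded with all-MiniLM-L6-v2 (384 dimensions); the tiny hand-made vectors here are stand-ins to show the mechanism:

```python
import math

# Toy stand-ins for 384-dim MiniLM embeddings of prototype questions.
PROTOTYPES = {
    "time":     [0.9, 0.1, 0.0],
    "location": [0.1, 0.9, 0.1],
    "identity": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_intent(embedding, prototypes=PROTOTYPES):
    """Return the intent whose prototype embedding is most similar."""
    return max(prototypes, key=lambda k: cosine(embedding, prototypes[k]))
```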

6. Answer Generation -- Ollama + Llama 3.2 3B
A local LLM generates contextual answers with intent-aware prompts, capped at 2 sentences for natural spoken output. No cloud API needed.
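A sketch of the generation call against Ollama's local HTTP endpoint (`/api/generate`), with the pre-written fallback the design principles call for. The prompt wording and fallback text are illustrative assumptions:

```python
import json
import urllib.error
import urllib.request

FALLBACKS = {  # canned answers used when Ollama is unreachable
    "identity": "I'm Reachy Mini, a robot that understands ASL.",
    "general": "I'm not sure, but I'm happy to help.",
}

def generate_answer(question, intent, model="llama3.2:3b"):
    """Ask a local Ollama server for a short answer; fall back to a
    pre-written response if the server is down."""
    prompt = (f"Answer in at most 2 sentences, as a friendly robot. "
              f"Intent: {intent}. Question: {question}")
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.loads(resp.read())["response"].strip()
    except (urllib.error.URLError, OSError, KeyError):
        return FALLBACKS.get(intent, FALLBACKS["general"])
```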

7. Robot Response -- Reachy Mini SDK + MuJoCo + TTS
Reachy Mini nods and wiggles its antennas, MuJoCo renders the simulation, and TTS speaks the answer aloud -- three response channels running simultaneously.

11 Supported Signs

🖐️ HELLO -- All 5 fingers open
✊ YES -- Closed fist
☝️ YOU -- Index finger only
✌️ WHAT -- Index + Middle
👍 GOOD -- Thumb only
🤙 I -- Pinky only
👆 WHERE -- Thumb + Index
🤙 HELP -- Thumb + Pinky
🤟 WATER -- Index + Middle + Ring
🖐️ STOP -- 4 fingers, no thumb
🤟 TIME -- Thumb + Index + Middle
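The table above is effectively a lookup from "which fingers are extended" to a gloss. One possible encoding, keyed by a frozenset of finger names (a hypothetical representation; the project's actual one may differ):

```python
# The 11-sign table as a lookup from extended fingers to gloss.
SIGNS = {
    frozenset(["thumb", "index", "middle", "ring", "pinky"]): "HELLO",
    frozenset([]): "YES",
    frozenset(["index"]): "YOU",
    frozenset(["index", "middle"]): "WHAT",
    frozenset(["thumb"]): "GOOD",
    frozenset(["pinky"]): "I",
    frozenset(["thumb", "index"]): "WHERE",
    frozenset(["thumb", "pinky"]): "HELP",
    frozenset(["index", "middle", "ring"]): "WATER",
    frozenset(["index", "middle", "ring", "pinky"]): "STOP",
    frozenset(["thumb", "index", "middle"]): "TIME",
}

def classify(extended):
    """Map the set of extended fingers to a gloss, or None if unknown."""
    return SIGNS.get(frozenset(extended))
```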

Three-Model Architecture

Three specialized models that no single approach can replace.

Vision: MediaPipe HandLandmarker -- 21 3D keypoints per hand, 30 FPS on CPU, TFLite runtime

Translation: Glossa-BART -- fine-tuned BART seq2seq for ASL gloss-to-English, beam search decoding

Reasoning: Llama 3.2 3B -- local LLM via Ollama for contextual answers, intent-aware prompts

Design Principles

Privacy-First

Everything runs locally. No cloud APIs, no telemetry, no data leaving the machine. The webcam feed is processed in-memory and never saved.

Respond, Don't Interrogate

The robot answers questions -- it doesn't ask users to repeat, rephrase, or slow down. If detection is uncertain, the system waits rather than guessing wrong.

Fail Gracefully

Every component has a fallback. Transformer won't load? Rules take over. Ollama down? Pre-written responses. No TTS? Text output. The system never crashes on a missing dependency.
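The same pattern sits behind each of these fallbacks: try the primary channel, silently degrade. A hypothetical helper (not the project's actual API) showing the shape of it:

```python
def with_fallback(primary, fallback):
    """Wrap two callables so a failure in the primary silently
    degrades to the fallback instead of crashing."""
    def run(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            return fallback(*args, **kwargs)
    return run

def broken_tts(text):
    # Simulate a missing dependency (e.g. no TTS engine installed)
    raise RuntimeError("no TTS engine")

say = with_fallback(broken_tts, lambda text: f"[text] {text}")
```

Here `say("Hello")` yields "[text] Hello" instead of raising, which is the "No TTS? Text output." behavior described above.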

Concise Answers

Responses capped at 2 sentences. Nobody wants to listen to a robot monologue. Short, clear, helpful.

Example Interactions

Sign: WHAT TIME -> "It is 10:08 PM."
Sign: WHERE WATER -> "I can help you find water nearby."
Sign: HELLO -> "Hello! I'm Reachy Mini, a robot designed to understand ASL."
Sign: HELP YOU -> "I'm here to help! What do you need?"

Quick Start

1. Install Ollama + pull the model:
   curl -fsSL https://ollama.com/install.sh | sh && ollama pull llama3.2:3b

2. Start the Reachy Mini daemon:
   reachy-mini-daemon --sim --headless

3. Install and run:
   pip install -e . && python -m reachy_mini_asl_qa.main

Prerequisites: Python 3.10+, webcam, macOS or Linux

Tech Stack

Vision: MediaPipe HandLandmarker
Classification: Geometric Heuristics (custom)
Translation: Glossa-BART (Seq2Seq)
Intent: all-MiniLM-L6-v2
LLM: Llama 3.2 3B (Ollama)
Simulation: MuJoCo Physics Engine
Robot: Reachy Mini SDK
Speech: Piper TTS / System TTS