AdaptLearn
AI-Powered Adaptive Learning Companion
Gemini 3 Hackathon Submission
Inspiration
We have all experienced that moment of hitting a wall while learning—staring at a problem, feeling frustrated, and finding that the standard textbook explanation just isn't clicking. We realized that a human teacher doesn't just repeat the same answer; they look at your face, sense your confusion, and change their approach. We wanted to bring that same empathetic, adaptive capability to AI, creating a tutor that meets students exactly where they are emotionally and intellectually.
What it does
AdaptLearn is a multimodal learning companion that "watches" you learn. It uses your webcam to detect real-time facial expressions—like confusion, frustration, or confidence—and monitors your screen to understand the context of your work. Instead of a static chatbot response, it uses these inputs to dynamically alter its teaching style. If you look confused, it might switch to a step-by-step breakdown or a visual analogy. If you look confident, it keeps things concise. It essentially orchestrates multiple data streams to provide truly personalized education.
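To make the adaptation loop concrete, here is a minimal TypeScript sketch of the core idea. The type and function names (LearnerState, pickStrategy) are purely illustrative, not our production code: a detected emotional state plus the context inferred from the screen maps to a teaching strategy.

```typescript
// Illustrative sketch only: emotional state + screen context -> teaching strategy.
type Emotion = "confused" | "frustrated" | "confident" | "neutral";
type Strategy = "step_by_step" | "visual_analogy" | "concise" | "socratic";

interface LearnerState {
  emotion: Emotion;   // inferred from the webcam feed
  topic: string;      // inferred from the screen capture
  attempts: number;   // how many times the learner has retried the problem
}

function pickStrategy(state: LearnerState): Strategy {
  if (state.emotion === "frustrated" && state.attempts > 1) return "visual_analogy";
  if (state.emotion === "confused") return "step_by_step";
  if (state.emotion === "confident") return "concise";
  return "socratic";
}

// Example: a learner who looks confused after two attempts at recursion
// gets a step-by-step breakdown instead of another terse answer.
console.log(pickStrategy({ emotion: "confused", topic: "recursion", attempts: 2 }));
```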
How we built it
We built the core using the Gemini 2.0 Flash Exp API, leveraging its multimodal capabilities to process video and text simultaneously. The frontend is built with React, utilizing the browser's MediaStream API to capture webcam and screen feeds securely. We created an orchestration engine that fuses the emotional data from the webcam with the technical context from the screen, feeding this combined state into Gemini to generate JSON-structured responses that control the UI and the pedagogical approach.
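For a rough idea of the capture side, the sketch below shows how the two feeds can be opened and turned into frames in the browser. The helper names (openStreams, captureFrame, buildPrompt) are ours for illustration and heavily simplified from the real orchestration engine.

```typescript
// Simplified sketch of the browser-side capture (illustrative, not our full engine).

async function openStreams(): Promise<{ webcam: MediaStream; screen: MediaStream }> {
  // Both calls prompt the user for permission before any capture starts.
  const webcam = await navigator.mediaDevices.getUserMedia({ video: true });
  const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
  return { webcam, screen };
}

function captureFrame(video: HTMLVideoElement): string {
  // Draw the current frame to an off-screen canvas and return it as base64 JPEG,
  // ready to be sent to Gemini as inline image data.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  return canvas.toDataURL("image/jpeg").split(",")[1]; // strip the data-URL prefix
}

function buildPrompt(question: string): string {
  // Fuse both visual streams with the learner's question into a single request.
  return [
    "You are an adaptive tutor. The first image is the learner's face,",
    "the second is their screen. Infer their emotional state and current task,",
    "then answer the question in the most suitable teaching style.",
    `Question: ${question}`,
  ].join("\n");
}
```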
Challenges we ran into
Our biggest challenge was latency. Orchestrating three simultaneous data streams (video, screen, and text) to provide a sub-second response felt impossible at first. We also struggled to get the model to consistently output structured JSON for our UI while maintaining a natural, conversational tone in its actual teaching explanations. Balancing the "robotic" need for structure with the "human" need for empathy required significant prompt engineering.
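For the structured-output half of that balancing act, the snippet below sketches the general pattern we converged on: let a response schema lock down the machine-readable fields the UI needs, and reserve one free-form field for the conversational explanation. It assumes the @google/generative-ai JavaScript SDK, and the field names are illustrative rather than our exact schema.

```typescript
// Sketch of constraining Gemini to JSON for the UI while keeping the teaching
// text conversational. Field names are illustrative; assumes @google/generative-ai.
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash-exp",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        detectedEmotion: { type: SchemaType.STRING },   // drives the UI state
        teachingStrategy: { type: SchemaType.STRING },  // drives the UI state
        explanation: { type: SchemaType.STRING },       // stays free-form and human
      },
      required: ["detectedEmotion", "teachingStrategy", "explanation"],
    },
  },
});

// The UI consumes the structured fields; only `explanation` is rendered as chat text.
const result = await model.generateContent("Explain this stack trace to a frustrated learner.");
const reply = JSON.parse(result.response.text());
```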
Accomplishments that we're proud of
We are incredibly proud that we moved beyond simple keyword matching. The system actually "sees" a user's confused face and autonomously decides to pivot its teaching strategy without being told to do so. Integrating vision, emotion detection, and pedagogical adaptation into a single, cohesive feedback loop in such a short timeframe was a major technical victory for us.
What we learned
We discovered that emotional context is often the missing link in AI education; a technically correct answer is useless if the student is too frustrated to process it. We also learned that modern multimodal models like Gemini are finally fast enough to render this kind of complex, real-time orchestration practical in a standard browser environment, opening new doors for accessible education.
What's next for AdaptLearn
We plan to implement long-term memory so the AI remembers a student's specific struggle points across different sessions, creating a continuous learning arc. We also want to add full voice interaction for a hands-free experience and explore mobile integration to make high-quality, adaptive tutoring accessible to students anywhere in the world.