What This Article Includes
In this article, I’ll share everything you need to recreate the Vision Pro–style 3D interactive website I built.
You’ll get:
The exact prompt used to generate the Vision Pro–style 3D interactive web application
The .gltf 3D product model with animation used in the demo
The full source code of the website, including the 3D scene, hand-tracking logic, and UI components
You’ll be able to study, modify, and reuse each part to showcase any product in a high-fidelity, interactive 3D experience.
The purpose of this app is straightforward:
To showcase any product in a high-fidelity 3D view and allow users to interact with it naturally.
Users can rotate, zoom, and explore a product using hand gestures via their webcam, creating a Vision Pro–style interaction that runs directly in the browser.
This tutorial includes:
The core prompt that generates ~70% of the application
A step-by-step workflow for completing the remaining 30%
Notes on adding features, fixing bugs, and improving UI
A live preview and test 3D model
Live App Preview
Interactive demo: https://gesturemodel.emergent.host/
Important Note: The “70% Ready” Reality
The prompt below typically generates an application that is approximately 70% complete.
The remaining work is done by:
Prompting the AI to fix bugs
Adding new product features
Refining UI and performance
This is expected and normal.
The workflow is iterative, not one-shot.
Step 1: Generate the Core Application (Primary Prompt)
This prompt establishes the full architecture: frontend framework, 3D engine, hand tracking, UI, and interaction logic.
Primary Prompt
Role: Expert Creative Technologist and Frontend Developer.
Task: Create a single-page immersive web application that features a high-fidelity 3D model viewer controlled by hand gestures via the webcam.
Design Aesthetic:
Vibe: Similar to igloo.inc or Apple's product pages, minimalist, premium, smooth motion, and highly responsive.
Background: Deep dark grey/black flexible gradient or blurred ambient lights to make the 3D model pop.
Typography: Clean sans-serif fonts (Inter or SF Pro).
Core Tech Stack:
Framework: React (Next.js App Router preferred).
3D Engine: React Three Fiber (R3F) + Drei.
Styling: Tailwind CSS.
Computer Vision: Google MediaPipe Hands (specifically @mediapipe/tasks-vision) or react-webcam with a hand tracking model.
Functional Requirements:
3D Scene:
Initialize a canvas with a realistic environment map (lighting).
Load a placeholder 3D model (a simple geometry for now, but configured to accept a .glb or .gltf file of an Apple Vision Pro later).
The model should float in the center with a gentle idle animation (sine wave hovering).
Webcam & Hand Tracking:
Ask for camera permissions immediately on load with a sleek UI overlay.
Display a small, stylized video feed in the corner (circular mask) so the user can see their hand.
Detect hand landmarks in real-time.
This prompt produces a strong, structured baseline.
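The math behind two of the prompt's requirements — the sine-wave idle hover and the gesture-to-interaction mapping — can be sketched as pure helper functions. This is an illustrative sketch, not the generated app's actual code: the amplitude, frequency, and sensitivity values are assumptions, and the landmark indices come from the MediaPipe Hands model (21 landmarks per hand, x/y normalized to [0, 1], landmark 4 = thumb tip, landmark 8 = index-finger tip).

```typescript
// Gentle sine-wave hover: vertical offset for a given elapsed time.
// In an R3F component this would be driven by useFrame, e.g.
//   modelRef.current.position.y = idleHoverY(clock.getElapsedTime());
export function idleHoverY(
  elapsedSeconds: number,
  amplitude = 0.1, // world units of vertical drift (illustrative)
  frequency = 0.5  // oscillations per second (illustrative)
): number {
  return amplitude * Math.sin(2 * Math.PI * frequency * elapsedSeconds);
}

// MediaPipe Hands landmark, with x/y normalized to [0, 1].
export interface Landmark { x: number; y: number; z: number; }

// Thumb-to-index distance: small values indicate a pinch,
// which the viewer can map to zoom.
export function pinchDistance(landmarks: Landmark[]): number {
  const thumb = landmarks[4];
  const index = landmarks[8];
  return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}

// Map hand movement between two frames to yaw/pitch rotation deltas.
export function rotationDelta(
  prev: Landmark,
  curr: Landmark,
  sensitivity = Math.PI // radians per full-screen hand sweep (illustrative)
): { yaw: number; pitch: number } {
  return {
    yaw: (curr.x - prev.x) * sensitivity,
    pitch: (curr.y - prev.y) * sensitivity,
  };
}
```

Keeping this logic in plain functions like these also makes the gesture behavior easy to unit-test and tune without touching the rendering code.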
Step 2: Prepare the 3D Product Model
The application currently works best with .gltf or .glb files.
You can convert models to these formats using tools such as:
Blender
Cinema 4D
Maya
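Before uploading, it can help to sanity-check that an exported file really is a binary glTF. Per the glTF 2.0 specification, a .glb file begins with a 12-byte little-endian header: the magic number 0x46546C67 (ASCII "glTF"), the version (2 for glTF 2.0), and the total byte length. A minimal check, as a sketch:

```typescript
// Returns true if the buffer starts with a valid glTF 2.0 binary header.
export function looksLikeGlb(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  const view = new DataView(bytes.buffer, bytes.byteOffset, 12);
  const magic = view.getUint32(0, true);   // little-endian, per spec
  const version = view.getUint32(4, true);
  return magic === 0x46546c67 && version === 2;
}
```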
Ensure:
Textures are correctly applied
Materials are embedded or referenced properly
Scale and orientation are correct
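The "scale is correct" check can also be handled defensively in the viewer. A hypothetical helper (not part of the generated app): given a model's axis-aligned bounding box — in three.js this would come from `new THREE.Box3().setFromObject(model)` — compute a uniform scale so the largest dimension fits a target size, plus the center point to subtract so the model sits at the origin.

```typescript
export interface Vec3 { x: number; y: number; z: number; }

// Fit a bounding box (min..max corners) into a cube of side targetSize,
// returning the uniform scale and the box center for re-centering.
export function fitTransform(min: Vec3, max: Vec3, targetSize = 2) {
  const size = { x: max.x - min.x, y: max.y - min.y, z: max.z - min.z };
  const largest = Math.max(size.x, size.y, size.z);
  return {
    scale: largest > 0 ? targetSize / largest : 1,
    center: {
      x: (min.x + max.x) / 2,
      y: (min.y + max.y) / 2,
      z: (min.z + max.z) / 2,
    },
  };
}
```

Normalizing like this means any product model — whatever units it was authored in — fills roughly the same portion of the viewport.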
Test 3D Model (Vision Pro)
Step 3: Upload the 3D Model
I used Gemini 3 Pro on emergent.sh, which allows direct upload of the .gltf file into the project.
This avoids external storage links and simplifies iteration.
Step 4: Extend Functionality Through Prompts
Once the core viewer works, additional features can be added conversationally.
Examples:
Sliders to control X / Y / Z position
UI panels for product parameters
Additional camera behaviors
Animation controls
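As one concrete example of a feature you might prompt for, the X / Y / Z position sliders boil down to a small mapping from a 0–100 UI value into a world-space coordinate range. This is an illustrative sketch; the -5..5 range is an assumption, not a value from the generated app.

```typescript
// Convert a 0-100 slider value to a position within [min, max],
// clamping out-of-range input.
export function sliderToPosition(
  sliderValue: number,
  min = -5,
  max = 5
): number {
  const t = Math.min(100, Math.max(0, sliderValue)) / 100; // clamp, normalize
  return min + t * (max - min);
}
```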
These features can usually be added in seconds with clear prompts.
Step 5: Build a Dedicated Product Environment
Beyond a simple viewer, the AI can generate:
Floor planes
Soft shadows and contact shadows
HDR environment lighting
Ambient backgrounds
This transforms the app into a true product showcase environment.
Step 6: Debug and Refine
When issues occur:
Paste the error output
Or provide a screenshot of the problem
In most cases, the AI can diagnose and fix issues automatically while preserving structure and performance.
Final Notes
This approach shifts development from writing code to designing interactions and experiences.
You define:
How the product should look
How users should interact with it
How it should feel
The AI handles implementation, iteration, and refinement.